2021-12-29



(Continuing from last time, mainly on MDPs.) Supplementing MDP-related material from other books and blogs.

Continuing from here: 🔗 [2021-12-26 - Truxton's blog] https://truxton2blog.com/2021-12-26/

Open question:

If the policy π stops changing early (before V has converged to within θ), should the iteration terminate early?

(Hold on, this question is a bit muddled: it mixes value iteration and policy iteration together. See the sketch below.)
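To untangle it: in policy iteration, θ only controls the inner policy-evaluation loop; the outer loop already terminates as soon as π stops changing, so "stopping early" is in fact the normal stopping rule. A minimal sketch, assuming a made-up tabular MDP stored as P[s][a] = [(prob, next_state, reward), ...] (all names and numbers below are my own illustration, not from the cited sources):

```python
import numpy as np

# A tiny hypothetical MDP (transitions and rewards made up for illustration).
# P[s][a] = [(prob, next_state, reward), ...], Gym-style convention.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
n_states, n_actions = 2, 2
gamma, theta = 0.9, 1e-8


def policy_iteration(P):
    pi = np.zeros(n_states, dtype=int)   # arbitrary initial policy
    V = np.zeros(n_states)
    while True:
        # 1) Policy evaluation: sweep until V changes by less than theta.
        while True:
            delta = 0.0
            for s in range(n_states):
                v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][pi[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # 2) Policy improvement: act greedily with respect to V.
        policy_stable = True
        for s in range(n_states):
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                 for a in range(n_actions)]
            best = int(np.argmax(q))
            if best != pi[s]:
                policy_stable = False
            pi[s] = best
        # The outer loop stops as soon as pi stops changing; theta never
        # appears in this stopping test, only in the evaluation loop above.
        if policy_stable:
            return pi, V
```

So the answer is yes: once the greedy policy is stable, further iterations cannot change it, and the algorithm stops. In value iteration there is no explicit π during the loop at all, which is where the confusion came from.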

(Main content comes from Zhou Zhihua's "Machine Learning"; still needs organizing.)


value iteration and policy iteration

🔗 [Is there really no essential difference between value iteration and policy iteration in a Markov decision process? - Zhihu] https://www.zhihu.com/question/41477987
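For comparison with the linked Zhihu discussion, here is the matching value-iteration sketch over the same hypothetical MDP (reusing P, gamma, theta, n_states, n_actions defined in the sketch above):

```python
def value_iteration(P):
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Each sweep fuses one truncated evaluation step with an
            # improvement step (the max over actions).
            v = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                    for a in range(n_actions))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:   # the single theta-based stopping test
            break
    # The greedy policy is extracted only once, after V has converged.
    pi = np.array([np.argmax([sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                              for a in range(n_actions)])
                   for s in range(n_states)])
    return pi, V
```

Seen side by side, value iteration is essentially policy iteration with the evaluation loop truncated to a single sweep, which is presumably why the two feel like variants of the same algorithm rather than fundamentally different ones.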


On real-time DP and the choice between the two iteration strategies

🔗 [8.7 Real-Time Dynamic Programming - Zhihu] https://zhuanlan.zhihu.com/p/60444532
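As a rough illustration of the idea in that section (it corresponds to section 8.7 of Sutton and Barto's Reinforcement Learning: An Introduction): real-time DP replaces full sweeps over the state space with Bellman optimality backups only at the states actually visited along trajectories. A hedged sketch reusing the hypothetical MDP above; the function name and parameters are my own:

```python
import random

def rtdp(P, V, episodes=100, max_steps=50, start_state=0):
    """Real-time DP: back up only the states visited along trajectories."""
    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            # Full Bellman optimality backup, but only for the current state.
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                 for a in range(n_actions)]
            V[s] = max(q)
            a = int(np.argmax(q))   # then act greedily on the updated values
            # Sample the next state from the chosen action's transitions.
            probs, next_states, _ = zip(*P[s][a])
            s = random.choices(next_states, weights=probs)[0]
    return V
```

The appeal is that, when only a small fraction of states is actually reachable and relevant from the start states, trajectory-focused backups can converge with far less work than exhaustive sweeps.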


Draft from that day

Ugh, something went wrong here: I left a messy draft that day and only remembered it much later, by which point it was too hard to edit into anything clear and readable. A proper rewrite will have to wait:

Different PDF readers display this differently; if the rendered PDF looks off, please try another PDF reader.


