WARNING: This article may be obsolete
This post was published on 2021-12-29. Obviously, expired content is less useful to readers once it has passed its expiration date.
This article is categorized as "Garbage". It should NEVER appear in your search engine's results.
Table of Contents
(Continuing from last time, mainly MDP) Supplementing MDP-related knowledge from other books and blog posts
Continuing from here: 🔗 [2021-12-26 - Truxton's blog] https://truxton2blog.com/2021-12-26/
Remaining questions:
If the policy π stops changing early (before the values converge to within θ), should the iteration stop early?
(Wait, this question is a bit muddled: it mixes up value iteration and policy iteration; see the sketch below.)
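A minimal sketch of the difference, using a made-up 2-state MDP (the transition table P, the toy rewards, and the helper q_value are assumptions for illustration only, not from the original notes): value iteration stops when the largest value change falls below θ and only then extracts a greedy policy, while policy iteration's outer loop stops as soon as the greedy policy stops changing.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP used only for illustration.
# P[s][a] is a list of (prob, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
n_states, n_actions, gamma = 2, 2, 0.9

def q_value(V, s, a):
    # One-step look-ahead: expected reward plus discounted value of successors.
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def value_iteration(theta=1e-6):
    # Stops when the largest change in V falls below theta;
    # the greedy policy is only extracted at the end.
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_value(V, s, a) for a in range(n_actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    pi = [int(np.argmax([q_value(V, s, a) for a in range(n_actions)]))
          for s in range(n_states)]
    return V, pi

def policy_iteration(theta=1e-6):
    # Outer loop stops as soon as the greedy policy stops changing,
    # even though V could still be refined further.
    V = np.zeros(n_states)
    pi = [0] * n_states
    while True:
        # Policy evaluation (iterative, to tolerance theta).
        while True:
            delta = 0.0
            for s in range(n_states):
                v = q_value(V, s, pi[s])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement.
        stable = True
        for s in range(n_states):
            best_a = int(np.argmax([q_value(V, s, a) for a in range(n_actions)]))
            if best_a != pi[s]:
                pi[s] = best_a
                stable = False
        if stable:  # <- stopping is driven by the policy, not by theta
            return V, pi

print(value_iteration())
print(policy_iteration())
```

So the answer to the question above depends on which algorithm is meant: θ only controls how accurately values are evaluated, while "the policy stops changing" is the natural stopping rule for policy iteration.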
(Main content comes from Zhou Zhihua's Machine Learning; still to be organized)
value iteration and policy iteration
🔗 [Is there really no essential difference between value iteration and policy iteration in an MDP? - Zhihu] https://www.zhihu.com/question/41477987
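Roughly (standard Sutton & Barto notation, my own summary rather than a quote from the linked answers): policy iteration alternates a full policy-evaluation sweep with a greedy improvement step, while value iteration folds the max over actions directly into the backup:

$$
\begin{aligned}
\text{policy evaluation:} \quad & V_{k+1}(s) = \sum_{s', r} p(s', r \mid s, \pi(s))\,\bigl[r + \gamma V_k(s')\bigr] \\
\text{policy improvement:} \quad & \pi'(s) = \arg\max_a \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V_{\pi}(s')\bigr] \\
\text{value iteration:} \quad & V_{k+1}(s) = \max_a \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V_k(s')\bigr]
\end{aligned}
$$

In other words, value iteration can be read as policy iteration with policy evaluation truncated to a single sweep before each improvement, which is probably why the two feel like they have no essential difference.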
On real-time DP and choosing between the two iteration strategies (a small sketch of the idea follows the link below):
🔗 [8.7 Real-Time Dynamic Programming - Zhihu] https://zhuanlan.zhihu.com/p/60444532
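A minimal sketch of the real-time DP idea (again a made-up toy MDP; the chain layout, rewards, and trial count are assumptions for illustration only): instead of sweeping every state as in ordinary value iteration, backups are applied only to the states actually visited along greedy trajectories from a start state.

```python
import random

# Hypothetical 4-state chain: states 0..3, state 3 is terminal.
# P[s][a] = list of (prob, next_state, reward); every step costs -1.
P = {
    s: {0: [(1.0, max(s - 1, 0), -1.0)], 1: [(1.0, s + 1, -1.0)]}
    for s in range(3)
}
gamma, terminal = 1.0, 3
V = {s: 0.0 for s in range(4)}   # value table, updated only where visited

def greedy_backup(s):
    # One-step Bellman optimality backup at a single state.
    qs = {a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s]}
    best_a = max(qs, key=qs.get)
    V[s] = qs[best_a]
    return best_a

def rtdp_trial(start=0, max_steps=50):
    # Follow the greedy policy from the start state, backing up visited states only.
    s = start
    for _ in range(max_steps):
        if s == terminal:
            break
        a = greedy_backup(s)
        # Sample a successor according to the transition probabilities.
        probs, nexts, _ = zip(*P[s][a])
        s = random.choices(nexts, weights=probs)[0]

for _ in range(20):
    rtdp_trial()
print(V)  # states along the greedy trajectories approach -(distance to goal)
```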
Draft from that day
Oops, something went wrong: I left a messy draft that day and only remembered it much later. By then it was hard to revise into anything clear and readable, so I can only hope to rewrite it later:
Last modified on 2022-07-28