WARNING: This article may be obsolete
This post was published on 2021-12-29. Obviously, expired content is less useful to readers once it has passed its expiration date.
This article is categorized as "Garbage". It should NEVER appear in your search engine's results.
Table of Contents
(Continuing from last time, mainly MDP) Supplementing MDP-related knowledge from other books and blog posts
Continuing from here: 🔗 [2021-12-26 - Truxton's blog] https://truxton2blog.com/2021-12-26/
Remaining questions:
If the policy π stops changing early (before the values converge to within θ), should the iteration stop early?
(Wait, this question is a bit muddled: it mixes up value iteration and policy iteration; see the sketch below.)
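A minimal sketch of the difference, using a made-up 2-state MDP (the transition table P, the toy rewards, and the helper q_value are assumptions for illustration only, not from the original notes): value iteration stops when the largest value change falls below θ and only then extracts a greedy policy, while policy iteration's outer loop stops as soon as the greedy policy stops changing.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP used only for illustration.
# P[s][a] is a list of (prob, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
n_states, n_actions, gamma = 2, 2, 0.9

def q_value(V, s, a):
    # One-step look-ahead: expected reward plus discounted value of successors.
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def value_iteration(theta=1e-6):
    # Stops when the largest change in V falls below theta;
    # the greedy policy is only extracted at the end.
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_value(V, s, a) for a in range(n_actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    pi = [int(np.argmax([q_value(V, s, a) for a in range(n_actions)]))
          for s in range(n_states)]
    return V, pi

def policy_iteration(theta=1e-6):
    # Outer loop stops as soon as the greedy policy stops changing,
    # even though V could still be refined further.
    V = np.zeros(n_states)
    pi = [0] * n_states
    while True:
        # Policy evaluation (iterative, to tolerance theta).
        while True:
            delta = 0.0
            for s in range(n_states):
                v = q_value(V, s, pi[s])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement.
        stable = True
        for s in range(n_states):
            best_a = int(np.argmax([q_value(V, s, a) for a in range(n_actions)]))
            if best_a != pi[s]:
                pi[s] = best_a
                stable = False
        if stable:  # <- stopping is driven by the policy, not by theta
            return V, pi

print(value_iteration())
print(policy_iteration())
```

So the answer to the question above depends on which algorithm is meant: θ only controls how accurately values are evaluated, while "the policy stops changing" is the natural stopping rule for policy iteration.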
(Main content comes from Zhou Zhihua's Machine Learning; still to be organized)
value iteration and policy iteration
🔗 [Is there really no essential difference between value iteration and policy iteration in an MDP? - Zhihu] https://www.zhihu.com/question/41477987
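Roughly (standard Sutton & Barto notation, my own summary rather than a quote from the linked answers): policy iteration alternates a full policy-evaluation sweep with a greedy improvement step, while value iteration folds the max over actions directly into the backup:

$$
\begin{aligned}
\text{policy evaluation:} \quad & V_{k+1}(s) = \sum_{s', r} p(s', r \mid s, \pi(s))\,\bigl[r + \gamma V_k(s')\bigr] \\
\text{policy improvement:} \quad & \pi'(s) = \arg\max_a \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V_{\pi}(s')\bigr] \\
\text{value iteration:} \quad & V_{k+1}(s) = \max_a \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V_k(s')\bigr]
\end{aligned}
$$

In other words, value iteration can be read as policy iteration with policy evaluation truncated to a single sweep before each improvement, which is probably why the two feel like they have no essential difference.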
On real-time DP and choosing between the two iteration strategies (a small sketch of the idea follows the link below):
🔗 [8.7 Real-Time Dynamic Programming - Zhihu] https://zhuanlan.zhihu.com/p/60444532
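A minimal sketch of the real-time DP idea (again a made-up toy MDP; the chain layout, rewards, and trial count are assumptions for illustration only): instead of sweeping every state as in ordinary value iteration, backups are applied only to the states actually visited along greedy trajectories from a start state.

```python
import random

# Hypothetical 4-state chain: states 0..3, state 3 is terminal.
# P[s][a] = list of (prob, next_state, reward); every step costs -1.
P = {
    s: {0: [(1.0, max(s - 1, 0), -1.0)], 1: [(1.0, s + 1, -1.0)]}
    for s in range(3)
}
gamma, terminal = 1.0, 3
V = {s: 0.0 for s in range(4)}   # value table, updated only where visited

def greedy_backup(s):
    # One-step Bellman optimality backup at a single state.
    qs = {a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s]}
    best_a = max(qs, key=qs.get)
    V[s] = qs[best_a]
    return best_a

def rtdp_trial(start=0, max_steps=50):
    # Follow the greedy policy from the start state, backing up visited states only.
    s = start
    for _ in range(max_steps):
        if s == terminal:
            break
        a = greedy_backup(s)
        # Sample a successor according to the transition probabilities.
        probs, nexts, _ = zip(*P[s][a])
        s = random.choices(nexts, weights=probs)[0]

for _ in range(20):
    rtdp_trial()
print(V)  # states along the greedy trajectories approach -(distance to goal)
```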
Draft from that day
Oops, something went wrong: I left a messy draft that day and only remembered it much later. By then it was hard to revise into anything clear and readable, so I can only hope to rewrite it later:
Last modified on 2022-07-28