What Is Reinforcement Learning? - 白红宇's Personal Blog

Published: 2021-07-01 05:05:08 | Category: Technical articles

Reinforcement learning:

Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The problem, due to its generality, is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics, and genetic algorithms. In the operations research and control literature, the field where reinforcement learning methods are studied is called approximate dynamic programming. The problem has been studied in the theory of optimal control, though most studies are concerned with the existence of optimal solutions and their characterization, and not with the learning or approximation aspects. In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality.

In machine learning, the environment is typically formulated as a Markov decision process (MDP) as many reinforcement learning algorithms for this context utilize dynamic programming techniques.[1] The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge about the MDP and they target large MDPs where exact methods become infeasible.
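
As a rough sketch of the classical dynamic programming approach mentioned above, the following Python runs value iteration on a made-up two-state MDP. The states, action names, transition probabilities, rewards, and discount factor are all illustrative assumptions and do not come from the original text. Note that the method reads the full transition model `P` directly, which is exactly the knowledge a reinforcement learning algorithm is not assumed to have.

```python
# Hypothetical two-state MDP, invented for illustration only.
# P[state][action] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
GAMMA = 0.9  # discount factor (assumed)


def q_value(V, state, action):
    """Expected one-step reward plus discounted value of the next state."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[state][action])


def value_iteration(tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V


def greedy_policy(V):
    """Pick, in each state, the action with the highest backed-up value."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


V = value_iteration()
policy = greedy_policy(V)
print(V, policy)
```

Exact sweeps like this are practical only when the state space is small enough to enumerate, which is why large MDPs call for approximate or sample-based methods.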

Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).[2] The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.
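
The exploration vs. exploitation trade-off can be illustrated with a small multi-armed bandit simulation. The sketch below uses an epsilon-greedy rule, which is one simple strategy chosen here for illustration rather than one the text prescribes. The arm means, the unit-variance Gaussian noise, and the parameter values are all assumptions.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Gaussian bandit; return (estimates, pull counts)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average reward per arm
    counts = [0] * n_arms       # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the best current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward (assumed unit variance)
        counts[arm] += 1
        # Incremental mean update: new = old + (sample - old) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Hypothetical arm means; the agent never sees these directly.
estimates, counts = epsilon_greedy_bandit([0.1, 0.5, 1.0])
print(estimates, counts)
```

With `epsilon=0` the agent can lock onto whichever arm looked good first; with `epsilon=1` it never uses what it has learned. Values in between trade one off against the other.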

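Reinforcement learning, also known in Chinese as 再励学习 or 评价学习 (evaluative learning), is an important machine learning method with many applications in areas such as intelligent control, robotics, and analysis and prediction. Traditional taxonomies of machine learning did not mention reinforcement learning, whereas connectionist learning divides learning algorithms into three types: unsupervised learning, supervised learning, and reinforcement learning.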

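Reinforcement learning is inspired by behaviorist theory in psychology: how an organism, stimulated by rewards or punishments from its environment, gradually forms expectations about those stimuli and develops the habitual behavior that yields the greatest benefit. The approach is general, so it is studied in many other fields, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics, and genetic algorithms.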

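Reinforcement learning is also a product of many intersecting disciplines and fields. At its core it addresses the problem of decision making, that is, learning to make decisions automatically.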

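As a sequential decision making problem, reinforcement learning requires choosing a series of actions, and the best outcome is the one in which those actions together yield the greatest return. With no label telling the algorithm what to do, it first tries some actions and observes a result, then uses a judgment of whether that result was good or bad as feedback on the earlier actions. That feedback is used to adjust the behavior, and through repeated adjustment the algorithm learns which action to choose in which situation to get the best result.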
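
The try, observe, adjust loop described above can be sketched with tabular Q-learning. Q-learning is one common algorithm picked here for illustration; the text does not name a specific one. The five-state corridor environment, the `step` function, and all parameter values are hypothetical.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right


def step(state, action):
    """Toy environment: reward 1 only on reaching the rightmost state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: act, observe the reward, adjust the estimate."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Try something: usually the best-known action, sometimes a random one.
            # Ties are also broken randomly so the untrained agent still moves around.
            if rng.random() < epsilon or Q[state][0] == Q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[state][0] > Q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Feedback: move Q toward the reward plus the discounted best future value.
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][a] += alpha * (target - Q[state][a])
            state = next_state
    return Q


Q = train()
# Greedy action in each non-terminal state after training.
policy = ["left" if q[0] > q[1] else "right" for q in Q[:-1]]
print(policy)
```

No label ever tells the agent that "right" is correct. The preference emerges only from the reward at the goal being propagated backwards through repeated updates.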

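Reinforcement learning differs from supervised learning in several ways. Supervised learning has a label, which tells the algorithm which output corresponds to which input. Reinforcement learning has no label telling it what action to take in a given situation; it has only a reward signal, fed back after a series of actions, that indicates whether the chosen actions were good or bad. Feedback in reinforcement learning is also delayed: it may take many steps before it becomes clear whether an earlier choice was good or bad, whereas in supervised learning a bad choice is reported to the algorithm immediately. The input that reinforcement learning faces keeps changing, unlike in supervised learning, where inputs are assumed to be independently distributed: every action the algorithm takes affects the input to its next decision. In short, unlike standard supervised learning, correct input/output pairs are never presented, and sub-optimal actions are not explicitly corrected. Reinforcement learning also focuses more on online performance, which requires finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).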
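
One common way to handle delayed feedback is to credit each earlier step with a discounted sum of the rewards that follow it. The sketch below computes these discounted returns for a made-up episode. The reward sequence and the discount factor `gamma` are illustrative assumptions, and discounting is a standard convention rather than something the text specifies.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: no reward for three steps, then a reward of 1 at the end.
returns = discounted_returns([0.0, 0.0, 0.0, 1.0])
print(returns)
```

Even though only the last step is rewarded, every earlier step receives a nonzero return, with steps further from the reward credited less.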

Reposted from: https://mysoft.blog.csdn.net/article/details/60749058
