1 Preface
Deep reinforcement learning (DRL) is arguably the most active research direction in deep learning. Its goal is to give machines the ability to make decisions and control their own motion. The physical agility of the machines humans have built is still far below that of even simple organisms such as bees; DRL aims to close this gap, and its key idea is to use neural networks for decision-making and control.
After some thought, I have therefore decided to launch a DRL Frontier series, pushing out the latest research results as soon as they appear. It will mainly introduce new work without explaining the methods in detail (considering that I myself cannot digest them that quickly). This article is thus suited to readers who are not yet familiar with the field, or who are simply curious about it.
Let's cut to the chase.
2 Benchmarking Deep Reinforcement Learning for Continuous Control
Article Source: http://arxiv.org/abs/1604.06778
Date: April 25, 2016
Open-source software address: https://github.com/rllab/rllab
This paper does not propose a new algorithm, but it is nonetheless extremely important, as is clear at first glance. It establishes a benchmark for DRL on continuous-control problems, and crucially, the authors have open-sourced their code. In their own words:
To encourage adoption by other researchers!
In this paper, or rather in this open-source package, the authors use Python to reimplement both mainstream and cutting-edge algorithms for continuous control, and then apply them to 31 continuous-control tasks of varying difficulty.
The tasks fall into four categories:
1) Basic tasks: keeping an inverted pendulum balanced, and the like.
2) Locomotion tasks: making a virtual creature run forward, the faster the better.
3) Partially observable tasks: the virtual creature receives only limited sensory information, e.g. it knows the position of each joint but not its velocity.
4) Hierarchical tasks: a high-level decision layer on top of low-level control, e.g. a virtual ant foraging for food, or a virtual snake navigating a maze. These are much harder.
With the same test environments, different algorithms can be compared fairly.
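To make the benchmarking idea concrete, here is a minimal, self-contained sketch: a toy 1-D continuous-control task (keep a point near the origin) on which two policies are evaluated by average return. The environment, policies, and function names are all hypothetical illustrations, not code from the rllab package.

```python
import random

class PointEnv:
    """Toy continuous-control task: keep a 1-D point near the origin.
    A stand-in for the benchmark's basic tasks (hypothetical, not rllab code)."""
    def reset(self):
        self.x = random.uniform(-1.0, 1.0)
        return self.x

    def step(self, action):
        # Clip the action to a bounded range, like real torque limits.
        a = max(-1.0, min(1.0, action))
        self.x += 0.1 * a
        reward = -abs(self.x)  # closer to the origin = higher reward
        return self.x, reward

def run_episode(env, policy, horizon=50):
    """Roll out one episode and return the total (undiscounted) reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(horizon):
        obs, r = env.step(policy(obs))
        total += r
    return total

def benchmark(policies, n_episodes=100):
    """Average return per policy on the same task: the core benchmarking idea."""
    env = PointEnv()
    return {name: sum(run_episode(env, pi) for _ in range(n_episodes)) / n_episodes
            for name, pi in policies.items()}

random.seed(0)
scores = benchmark({
    "random": lambda obs: random.uniform(-1.0, 1.0),
    "proportional": lambda obs: -obs,  # push back toward the origin
})
print(scores)
```

Because every policy is scored on the identical environment and episode count, the numbers are directly comparable; the paper does the same thing at scale, with 31 tasks and real DRL algorithms instead of these two toy controllers.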
The result of the comparison: TNPG and TRPO (from John Schulman's group at UC Berkeley; Schulman is now at OpenAI) perform best, followed by DDPG (from David Silver's team at DeepMind). No current algorithm can solve the hierarchical tasks, so new algorithms are needed.
Note that the paper did not test DeepMind's A3C algorithm (http://arxiv.org/pdf/1602.01783), which, according to DeepMind's own paper, is currently the best algorithm.
3 Summary
This open-source release from UC Berkeley should have a significant impact on academia: many researchers will benefit from the reference implementations, and later studies can be evaluated on this benchmark. "This article is original; when reprinting, please credit the source: blog.csdn.net/songrotek"