DRL Frontier: Benchmarking Deep Reinforcement Learning for Continuous Control

1 Preface

Deep reinforcement learning (DRL) is arguably the most cutting-edge research direction in deep learning. Its goal is to give robots the ability to make decisions and control their own motion. The physical dexterity of the machines humans have built so far is still far below that of even simple organisms such as bees. DRL aims to close this gap, and its key idea is to use neural networks for decision-making and control.

With that in mind, I have decided to launch the DRL Frontier series to track the latest developments in this field as they appear. The series will mainly introduce new research results rather than explain the methods in detail (the blogger cannot digest them that quickly either). This article is therefore well suited to readers who are new to the field or simply curious about it.

Let's cut to the chase.

2 Benchmarking Deep Reinforcement Learning for Continuous Control

Paper: http://arxiv.org/abs/1604.06778
Date: April 25, 2016
Open-source code: https://github.com/rllab/rllab

This paper does not propose a new algorithm, but it is nonetheless extremely important, as is clear at first glance. It establishes a benchmark for DRL on continuous control problems, and crucially, the authors have open-sourced their code. In their own words:

To encourage adoption by other researchers!

In this paper, and in the accompanying open-source package, the authors reimplement the mainstream and state-of-the-art continuous-control algorithms in Python and apply them to 31 continuous control problems of varying difficulty.
The tasks fall into four categories:
1) Basic tasks: balancing an inverted pendulum, and the like.

2) Locomotion tasks: making virtual creatures run forward, the faster the better.

3) Partially observed tasks: the virtual creature receives only limited sensory information, e.g. it knows the position of each joint but not its velocity (a small sketch of this idea follows the list).

4) Hierarchical tasks: combining high-level decision making with low-level control, e.g. making a virtual ant forage for food or a virtual snake navigate a maze. These are considerably harder.
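To make the partial-observation idea concrete, here is a minimal, hypothetical wrapper that hides joint velocities from the agent. This is not rllab's actual implementation; the gym-style interface and the assumption that joint positions occupy the leading entries of the observation vector are both illustrative assumptions:

```python
import numpy as np

class PositionOnlyWrapper:
    """Hypothetical wrapper illustrating partially observed tasks:
    the agent sees joint positions but not joint velocities.
    Assumes a gym-style env whose observation vector stores the
    n_positions joint positions first and velocities after."""

    def __init__(self, env, n_positions):
        self.env = env
        self.n_positions = n_positions

    def reset(self):
        # Truncate the full observation to positions only.
        return np.asarray(self.env.reset())[:self.n_positions]

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # The agent never observes the velocity entries.
        return np.asarray(obs)[:self.n_positions], reward, done, info
```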

With a shared set of test environments, different algorithms can be compared fairly.
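As an example of how the benchmark is used, the sketch below trains TRPO on a cart-pole task. The module paths and hyperparameters follow rllab's published quickstart example and may differ in later versions of the package:

```python
from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.box2d.cartpole_env import CartpoleEnv
from rllab.envs.normalized_env import normalize
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

# Normalize the environment so observations and actions are well scaled.
env = normalize(CartpoleEnv())

# Gaussian policy represented by a small MLP.
policy = GaussianMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))

# Linear baseline for variance reduction in the policy gradient.
baseline = LinearFeatureBaseline(env_spec=env.spec)

algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=4000,       # samples per iteration
    max_path_length=100,   # episode length cap
    n_itr=40,              # training iterations
    discount=0.99,
    step_size=0.01,        # KL constraint for TRPO
)
algo.train()
```

Because every algorithm in the package is driven through the same environment and policy interfaces, swapping TRPO for another method is largely a matter of changing the algorithm class.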

The comparison shows that TNPG and TRPO (from John Schulman of UC Berkeley, now at OpenAI) perform best, followed by DDPG (from David Silver's team at DeepMind). No existing algorithm can solve the hierarchical tasks, so new algorithms are needed there.

Note that the paper does not test DeepMind's A3C algorithm (http://arxiv.org/pdf/1602.01783), which DeepMind's own paper reports as the current state of the art.

3 Summary

UC Berkeley's open-source release should have a significant impact on academia: many researchers will benefit from these open reimplementations, and later work can be evaluated against this benchmark.

This is an original article; when reprinting, please credit the source: blog.csdn.net/songrotek
