Source: http://wanghaitao8118.blog.163.com/blog/static/13986977220153811210319/
Google's DeepMind team published a stunning paper at NIPS in 2013 that blinded many people, and unfortunately I was one of them. Some time ago I collected a lot of material on this topic, which has been sitting in my bookmarks ever since; I am currently doing some related work (and would love to find a few people to discuss it with).
First, Related Articles
Work on DRL rose alongside the recent explosion of deep learning, and the people doing this research are generally heavyweights in the AI field. One of the earliest works (arguably) dates back to 2010: Lange [1] proposed using a deep auto-encoder for vision-based control, with an architecture very similar to the current one. Next, in 2011, Cuccu et al. [2] (from Jurgen Schmidhuber's group) did some closely related work (close to DRL in spirit). About the Swiss heavyweight Jurgen: he wrote a review of DL last year; that is not the point, the point is that the man cited 888 references, so he must be a China expert (Chinese people love that number), and in fact he has also worked on Chinese character recognition (I was stunned). Also in 2011, Abtahi et al. [3] used a DBN to replace the traditional function approximator in reinforcement learning (they came from the RL side, and were only a small step away from DeepMind's idea! Don't you feel it is a pity, they almost touched the door of Nature). In 2012, Lange [4] pushed further into applications, proposing deep fitted Q-learning for vehicle control, though the results were not great. After 2012, people began writing about the application prospects of deep reinforcement learning and reviewing it, e.g. Arel [5] (the heavyweights do see far ahead). Then came 2013: the DeepMind team published their paper at NIPS [6] and the whole field was stunned (so RL and DL can play together after all). But when it first came out they released no code, so shocked experts everywhere started reverse-engineering it, and in the end a bunch of people wrote their own implementations (embarrassing, ah! Why couldn't I write one?). Even more impressively, DeepMind [7] then went further and got the work into Nature. Anyway, I was stunned when I learned of it; AI people began to rave, and now the topic has become hot. Who knows whether it will end up like Google Glass. As for where DRL develops from here, let's see what people are shouting!
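A one-equation summary may help before the link dump. Per [6], DQN trains a convolutional network Q(s, a; θ) on raw frames by minimizing the one-step Q-learning loss over transitions (s, a, r, s') sampled from an experience-replay memory D; the Nature version [7] additionally holds the bootstrap target fixed with a periodically copied parameter set θ⁻:

```latex
% DQN objective from [6]/[7]: regress Q toward the bootstrapped one-step target
L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}
    \left[ \Big( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \Big)^{2} \right]
```

Here γ is the discount factor. Everything in the code section below is essentially engineering around this objective.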
Second, Popular Science and Reviews
- First the Chinese ones. This analysis of DRL is fairly objective; recommended index: 3 stars. http://www.infoq.com/cn/articles/atari-reinforcement-learning. In fact it only scratches the surface; if you really want the substance, go read the papers.
- Pure popular science: http://36kr.com/p/220012.html and http://36kr.com/p/217570.html, both reports from 36Kr, one of the more conscientious domestic outlets; recommended index: 2 stars.
Now let's see what people abroad are saying!
- This one reads like a review, suitable for academic readers, and it comes with demos and tutorials! Some of the videos require getting past the firewall: http://nextbigfuture.com/2014/12/deep-learning-and-deep-reinforcement.html. Recommended index: 5 stars.
- http://arstechnica.com/science/2015/02/ai-masters-49-atari-2600-games-without-instructions/: popular science with videos, better than the domestic coverage; recommended index: 3 stars.
- There is an overview that basically picks out and discusses the main parts of the DeepMind paper, suitable for readers with some ML background; recommended index: 3 stars. http://artent.net/2014/12/10/a-review-of-playing-atari-with-deep-reinforcement-learning/.
- Nature also interviewed the Eastern European researchers who replicated the deep reinforcement learning work, and they point out a flaw of the DRL algorithm; a Chinese translation is at http://www.7huoxing.com/?p=13035. Recommended index: 2 stars, suitable as popular science.
There are many more that I won't list here.
Third, Related Code
This part is probably what everyone cares about most. I suspect most people's first thought after reading the paper is: show me the code! Honestly, that was mine too; blame my insufficient coding ability! So I dug around GitHub for a long time, and it turns out the experts really are many!
- First, of course, is the code Google published themselves, the conscience of the industry! https://sites.google.com/a/deepmind.com/dqn/. Unfortunately the comments are sparse..... It is written for Torch 7, so I had to bite the bullet and learn Lua; I have been wrestling with all kinds of scripting languages these past few months, all tears! Note DeepMind's git address: https://github.com/deepmind. And let me say, without a GPU it really is not runnable: I ran it for 13 hours and got through maybe 1/20 of the training~~~ I suggest running it under Ubuntu, the newer the version the better;
- Then there are the reverse-engineered versions by various experts. https://github.com/spragunr/deep_q_rl: the author Spragunr published code built on all kinds of Python tools, and his command of external libraries is truly sky-high; so many tools, ah! OpenCV, Cython, RL-Glue, Theano, and so on. I spent ages configuring it on Ubuntu 14.04 LTS, then discovered my computer has no GPU so it would not run; I was not OK for a while. There is also what seems to be a repo by one of his students, https://github.com/brian473/neural_rl, which likewise needs the Python libraries, Theano and the like configured; quite a few steps~ But the student of an expert is an expert too. (For the algorithmic skeleton all these implementations share, see the sketch after this list.)
- https://github.com/kristjankorjus/Replicating-DeepMind: here the expert Kristjan Korjus published code based on convnet, also in Python. As I said, I have not run it, so I do not know how it goes; it seems to want a GPU and also needs a heap of libraries configured.
- Stanford's Karpathy and collaborators also reproduced DQN, but in JavaScript; at first it genuinely scared me, it runs in the browser! https://github.com/karpathy/convnetjs. The demo is nice, though it can only show small-scale values; the page is at http://cs.stanford.edu/people/karpathy/convnetjs/. Also, one of his students is even more impressive and implemented it directly in Torch 7 (after reading the code I found the structure almost identical to Google's release, and the comments are quite detailed): https://github.com/fangzai/DeepQLearning. I forget the original source address; I pushed this copy to my own git, with apologies to the original author.
- Some people have implemented it with Caffe; one of them is Japanese. There are currently two addresses: https://github.com/chiggum/AI/tree/master/rl/atari_agent and https://github.com/muupan/dqn-in-the-caffe; the second is the one written by the Japanese author (really impressive). Many people like Caffe, though it does not seem as convenient as Torch 7; one benefit is that it is straight C++, so efficiency is high. I have not run these two programs either, but the results the authors show look very good.
- As mentioned above, Estonia's Ilya Kuzovkin and colleagues published improvements on Google's source code, along with the relevant machine-configuration information: https://github.com/kuz/DeepMind-Atari-Deep-Q-Learner. Also attached is their report, https://courses.cs.ut.ee/MTAT.03.291/2014_spring/uploads/Main/Replicating%20DeepMind.pdf, which is especially professional; I have been in contact with the author, and he is very nice.
- The rest are basically forks of the above with improvements; readers can dig them up themselves!
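Before leaving the code section, here is the skeleton all these implementations share, as promised above. This is my own toy sketch, not code from any of the repos: epsilon-greedy action selection plus Q-learning updates sampled from an experience-replay memory, which is the core loop of DQN [6]. The real systems replace the Q-table with a convolutional network trained by gradient descent on game frames; the toy chain MDP and all hyperparameters here are invented for illustration.

```python
# Minimal sketch of the DQN training loop on a toy chain MDP (pure NumPy).
# Real implementations swap the Q-table for a conv net over raw pixels.
import random
import numpy as np

N_STATES, N_ACTIONS = 8, 2          # chain world: action 0 = left, action 1 = right
GAMMA, ALPHA, EPSILON = 0.95, 0.1, 0.1

def step(s, a):
    """Deterministic chain: reward 1 only on reaching the rightmost state."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == N_STATES - 1), s2 == N_STATES - 1

def greedy(q_row):
    """Argmax with random tie-breaking, so an all-zero table still explores."""
    best = np.flatnonzero(q_row == q_row.max())
    return int(random.choice(best))

Q = np.zeros((N_STATES, N_ACTIONS))
replay = []                          # replay memory of (s, a, r, s2, done) tuples

for episode in range(300):
    s = 0
    for t in range(100):             # cap episode length
        a = random.randrange(N_ACTIONS) if random.random() < EPSILON else greedy(Q[s])
        s2, r, done = step(s, a)
        replay.append((s, a, r, s2, done))
        # minibatch of past transitions -> standard Q-learning update on each
        for bs, ba, br, bs2, bdone in random.sample(replay, min(32, len(replay))):
            target = br + (0.0 if bdone else GAMMA * Q[bs2].max())
            Q[bs, ba] += ALPHA * (target - Q[bs, ba])
        s = s2
        if done:
            break

print(Q.round(2))   # action 1 (right) should come to dominate in every state
```

The replay memory is the key trick: sampling old transitions breaks the correlation between consecutive frames, which is what makes the gradient updates stable in the full networked version.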
That is basically it; peripheral matters such as installing Torch 7 I will not go into here.
Fourth, the Forum
This is a group on Google Groups where many people discuss the DQN algorithm and share experience using the code; those interested can join.
https://groups.google.com/forum/#!topic/deep-q-learning
Finally, to anchor the post, DeepMind's own homepage: http://deepmind.com/.
Fifth, Supplementary
You may encounter some problems while running DeepMind's program.
Question 1
At line 22 of convnet.lua an error is thrown about a returned nil value; this is caused by a problem with the Torch setup. Please see the following URL:
http://stackoverflow.com/questions/29564360/bug-encountered-when-running-googles-deep-q-network-code
PS: Additions from experts of all stripes are welcome ~ ~ ~
[1] S. Lange and M. Riedmiller, "Deep auto-encoder neural networks in reinforcement learning," in Proc. International Joint Conference on Neural Networks (IJCNN), 2010, pp. 1-8.
[2] G. Cuccu, M. Luciw, J. Schmidhuber, and F. Gomez, "Intrinsically motivated neuroevolution for vision-based reinforcement learning," in Proc. IEEE International Conference on Development and Learning (ICDL), 2011, pp. 1-7.
[3] F. Abtahi and I. Fasel, "Deep belief nets as function approximators for reinforcement learning," 2011.
[4] S. Lange, M. Riedmiller, and A. Voigtlander, "Autonomous reinforcement learning on raw visual input data in a real world application," in Proc. International Joint Conference on Neural Networks (IJCNN), 2012, pp. 1-8.
[5] I. Arel, "Deep reinforcement learning as foundation for artificial general intelligence," in Theoretical Foundations of Artificial General Intelligence, Springer, 2012, pp. 89-102.
[6] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, et al., "Playing Atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602, 2013.
[7] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015.