Deep Q Network
- 4.1 DQN Algorithm Update
- 4.2 DQN Neural Network
- 4.3 DQN Decision-Making
- 4.4 OpenAI Gym Environment Library
Notes: deep Q-learning algorithm
This gives us the final deep Q-learning algorithm with experience replay:
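As a rough illustration of that loop, here is a minimal sketch rather than DeepMind's implementation. It rests on several assumptions that are not in the original notes: a hypothetical five-state chain environment (`step`), a plain lookup table standing in for the deep network so the example runs with only the standard library, and illustrative values for `GAMMA`, `LR`, and `EPSILON`. What it does keep is the structure: act epsilon-greedily, store each transition in a replay memory, then update towards `r + GAMMA * max Q(s', a')` on a random minibatch of stored transitions.

```python
import random
from collections import deque

# Hypothetical toy problem: a chain of 5 states; action 1 moves right, action 0 moves left.
N_STATES, N_ACTIONS = 5, 2
GAMMA, LR, EPSILON = 0.9, 0.1, 0.5  # illustrative values, not taken from the source


def step(state, action):
    """Toy environment (an assumption): reward 1 only when the right end is reached."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=300, batch_size=16, capacity=500, seed=0):
    rng = random.Random(seed)
    # Randomly initialised Q-function. A table stands in for the neural network here.
    q = [[rng.uniform(-0.01, 0.01) for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    replay = deque(maxlen=capacity)  # experience replay memory, oldest entries dropped first

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Epsilon-greedy action selection.
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])

            next_state, reward, done = step(state, action)
            replay.append((state, action, reward, next_state, done))
            state = next_state

            # Learn from a random minibatch of stored transitions, not just the latest one.
            batch = rng.sample(list(replay), min(batch_size, len(replay)))
            for s, a, r, s2, terminal in batch:
                # The target bootstraps from the current (initially garbage) estimates.
                target = r if terminal else r + GAMMA * max(q[s2])
                q[s][a] += LR * (target - q[s][a])

            if done:
                break
    return q


if __name__ == "__main__":
    for s, row in enumerate(train()):
        print(s, [round(v, 3) for v in row])
```

In a real DQN the table update would be replaced by a gradient step on the network's squared error against the same target, and the state would be fed to the network as an input (for example, stacked game frames) rather than used as an index.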
There are many more tricks that DeepMind used to actually make it work, such as a target network, error clipping, and reward clipping, but these are out of scope for this introduction.
The most amazing part of this algorithm is that it learns anything at all. Just think about it: because our Q-function is initialized randomly, it initially outputs complete garbage. And we are using this garbage (the maximum Q-value of the next state) as targets for the network, only occasionally folding in a tiny reward. That sounds insane; how could it learn anything meaningful at all? The fact is that it does.
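To make the "garbage as targets" point concrete, the training target and loss for a single transition can be written as follows. The notation is the standard one and is an assumption here, since the notes do not define symbols: $s, a, r, s'$ are the state, action, reward and next state, $\gamma$ is the discount factor, and $\theta$ are the network weights.

```latex
y = r + \gamma \max_{a'} Q(s', a'; \theta)
\qquad
L(\theta) = \bigl( y - Q(s, a; \theta) \bigr)^2
```

Early in training the $\max_{a'} Q(s', a'; \theta)$ term comes from the randomly initialized network, so the only grounded signal in $y$ is the reward $r$.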
Extension
- Using Keras and Deep Q-Network to Play FlappyBird | Ben Lau
- Demystifying Deep Reinforcement Learning
- The above post is a must-read for anyone interested in deep reinforcement learning.
Learning notes: Morvan Reinforcement Learning, Part 4: Deep Q Network