TicTacToe by reinforcement learning: learning by doing

Source: Internet
Author: User

Students who are new to reinforcement learning are often not comfortable with the mathematical formulas. I hope that some simple, clear code can help build an intuitive understanding of reinforcement learning. This is introductory code, and I hope it serves as a basic starting point for your own study.

For the rules of the game, search online (for example, on Baidu). This article uses Q-learning, a reinforcement learning method, to compute Q(s, a) for the states that occur during play. The end of the code is a human-versus-computer game in which the computer moves first.

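In Q-learning, each Q(s, a) is updated as follows:

Q(s, a) = Q(s, a) + 0.1 * (reward(s, a) + 0.9 * Q(s', a') - Q(s, a))

Here 0.1 is the learning rate and 0.9 is the discount factor. s' is the state reached after taking action a in state s, and in standard Q-learning a' is the action with the largest Q value in s'.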
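The update above can be sketched in a few lines of Python. This is only an illustration, not the code from the linked repository: the names Q, q_update, ALPHA and GAMMA are hypothetical, the Q table is assumed to be a dictionary keyed by (state, action), and the value of a finished game is assumed to be 0.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate, the 0.1 in the formula
GAMMA = 0.9  # discount factor, the 0.9 in the formula

# Assumed layout: Q maps (state, action) to a value, and unseen pairs
# default to 0.0. The state is a tuple so it can be used as a key.
Q = defaultdict(float)


def q_update(state, action, reward, next_state, next_actions):
    """Apply one Q-learning update and return the new Q(state, action)."""
    # Largest Q value available in the next state. If the game is over
    # there are no next actions, and the value is taken to be 0.0.
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    old = Q[(state, action)]
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
    return Q[(state, action)]


if __name__ == "__main__":
    empty = (0,) * 9
    after = (0, 0, 0, 1, 0, 0, 0, 0, 0)
    # Pretend the move at position 3 ended the game with reward 1.
    print(q_update(empty, 3, 1, after, []))
```

With a Q table that starts at zero, a single winning update moves Q(s, a) one tenth of the way toward the reward, which is why many games are needed before the values settle.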

reward(s, a) returns 1 when the computer wins, -1 when the computer loses, and 0 in all other cases.
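One way such a reward could be computed is sketched below. The assumptions are mine, not the author's: the board is a list of nine cells indexed 0 to 8 row by row, 0 marks an empty cell, 1 the computer and 2 the human, and the function looks at the board after the move has been made rather than taking (s, a) as arguments. The names WIN_LINES, winner and reward are hypothetical.

```python
# The eight winning lines on a 3x3 board indexed 0..8, row by row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

COMPUTER, HUMAN = 1, 2  # assumed cell markers; 0 means empty


def winner(board):
    """Return 1 or 2 if that player has three in a line, otherwise 0."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


def reward(board):
    """Reward from the computer's point of view for the given board."""
    w = winner(board)
    if w == COMPUTER:
        return 1
    if w == HUMAN:
        return -1
    return 0  # draw, or game still in progress


if __name__ == "__main__":
    print(reward([1, 1, 1, 2, 2, 0, 0, 0, 0]))  # computer holds the top row
```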

After Q(s, a) has been computed, the code runs a human-versus-computer game. board = [0, 0, 0, 0, 0, 0, 0, 0, 0] is the initial state of the board. The computer takes the first move by choosing the action with the largest Q(s, a). If the computer chooses position 3, the board becomes [0, 0, 0, 1, 0, 0, 0, 0, 0]. The human player can then pick any empty position by typing its index: for example, entering 0 on the keyboard changes the board to board = [2, 0, 0, 1, 0, 0, 0, 0, 0]. Play continues in this way until the game reaches a terminal state.
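The move selection described above might look roughly like this. It is a sketch under assumptions: the Q table is a plain dictionary keyed by (board as a tuple, position), missing entries count as 0.0, and the helper names empty_positions, greedy_move and place are hypothetical rather than taken from the linked code. The tiny q_table in the demo is made up purely to reproduce the example in the text.

```python
def empty_positions(board):
    """Indices of the cells that are still empty (value 0)."""
    return [i for i, cell in enumerate(board) if cell == 0]


def greedy_move(q_table, board):
    """Pick the empty position with the largest Q value for this board."""
    state = tuple(board)
    # max() returns the first maximal item, so ties go to the lowest index.
    return max(empty_positions(board),
               key=lambda a: q_table.get((state, a), 0.0))


def place(board, position, mark):
    """Return a new board with `mark` (1 = computer, 2 = human) placed."""
    if board[position] != 0:
        raise ValueError("position %d is already taken" % position)
    new_board = list(board)
    new_board[position] = mark
    return new_board


if __name__ == "__main__":
    # Made-up Q table in which position 3 is the best opening move.
    q_table = {((0,) * 9, 3): 0.5}
    board = [0] * 9
    board = place(board, greedy_move(q_table, board), 1)  # computer moves
    board = place(board, 0, 2)                            # human enters 0
    print(board)
```

In an interactive version, the human's position would come from input() instead of the hard-coded 0, and the loop would stop once a winner is found or no empty positions remain.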

I have never beaten the computer.

Code: https://github.com/k13795263/TicTacToe/blob/master/TicTacToe.py
