TicTacToe by Reinforcement Learning: Learning by Doing
This article is for students who are new to reinforcement learning and not yet comfortable with the mathematical formulas. I hope some simple, clear code can deepen their intuitive understanding of it. This is introductory code, and I hope it serves as a basic starting point for your study of reinforcement learning.
For the rules of the game, refer to Baidu. This article uses Q-learning, a reinforcement learning method, to compute Q(s, a) for the states that arise during play. The end of the code is a human-versus-computer match in which the computer takes the first step.
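In Q-learning, each Q(s, a) is updated with the following formula:
Q(s, a) = Q(s, a) + 0.1 * (reward(s, a) + 0.9 * Q(s', a') - Q(s, a))
Here 0.1 is the learning rate and 0.9 is the discount factor. s' is the state reached after taking action a in state s, and a' is an action available in s' (standard Q-learning uses the action with the largest Q value there).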
reward(s, a) returns 1 when the computer wins, -1 when the computer loses, and 0 otherwise (for example, on a draw).
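The update rule and the reward can be sketched in a few lines of Python. This is a minimal sketch, not the code from the repository: the names Q, winner, reward, and q_update are hypothetical, storing the board as a tuple is an assumption, and taking the maximum over the next actions follows standard Q-learning rather than anything stated in the formula above.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate, taken from the formula above
GAMMA = 0.9  # discount factor, taken from the formula above

# Q table mapping (state, action) to a value; unseen pairs start at 0.0.
# Using a tuple of 9 cells as the state is an assumption of this sketch.
Q = defaultdict(float)

# The eight winning lines: three rows, three columns, two diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 1 or 2 if that player owns a full line, otherwise 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


def reward(board):
    """1 if the computer (1) has won, -1 if the human (2) has won, else 0."""
    w = winner(board)
    if w == 1:
        return 1
    if w == 2:
        return -1
    return 0


def q_update(state, action, r, next_state, next_actions):
    """Apply one Q-learning update and return the new Q(state, action)."""
    # With no next actions (terminal state) the future value is 0.
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
    return Q[(state, action)]


# The computer completes the top row by playing position 2.
before = (1, 1, 0, 2, 2, 0, 0, 0, 0)
after = (1, 1, 1, 2, 2, 0, 0, 0, 0)
value = q_update(before, 2, reward(after), after, [])
print(value)  # 0.1
```

Repeating the same update moves the value closer to 1 each time, which is how winning moves gradually earn large Q values over many training games.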
After Q(s, a) has been computed, the code runs the human-versus-computer match. board = [0, 0, 0, 0, 0, 0, 0, 0, 0] represents the initial state of the board, with the nine positions numbered 0 to 8. The computer takes the first step by selecting the action with the maximum Q(s, a). If the computer chooses position 3, the board becomes [0, 0, 0, 1, 0, 0, 0, 0, 0] (1 marks the computer). The human player can then pick any free position. For example, to play position 0, enter 0 on the keyboard, and the board changes to board = [2, 0, 0, 1, 0, 0, 0, 0, 0] (2 marks the human). Play continues this way until a terminal state is reached.
I have never beaten the computer.
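The move selection described here can be sketched as follows. This is only an illustration under stated assumptions: legal_moves, greedy_move, and place are hypothetical helper names, the Q table is a plain dict keyed by (state, action), and the two Q values are made up so that position 3 comes out on top, as in the example.

```python
def legal_moves(board):
    """Return the indices of the empty cells (cells holding 0)."""
    return [i for i, cell in enumerate(board) if cell == 0]


def greedy_move(q_table, board):
    """Pick the legal move with the largest Q value; unseen pairs count as 0.0."""
    state = tuple(board)
    return max(legal_moves(board), key=lambda a: q_table.get((state, a), 0.0))


def place(board, position, player):
    """Return a new board with player's mark (1 or 2) at position."""
    if board[position] != 0:
        raise ValueError("position %d is already taken" % position)
    new_board = list(board)
    new_board[position] = player
    return new_board


board = [0] * 9  # empty 3x3 board, positions 0 to 8

# Made-up Q values for the empty board, for illustration only.
q_table = {(tuple(board), 3): 0.5, (tuple(board), 4): 0.2}

move = greedy_move(q_table, board)  # the computer picks position 3
board = place(board, move, 1)       # 1 marks the computer
board = place(board, 0, 2)          # the human answers at position 0
print(board)  # [2, 0, 0, 1, 0, 0, 0, 0, 0]
```

In a real match the human's position would come from keyboard input instead of the fixed 0 used here, and the loop would repeat until someone wins or the board is full.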
Location: https://github.com/k13795263/TicTacToe/blob/master/TicTacToe.py