MDP: Markov Decision Process
Definition:
An MDP consists of the following components:
- State set S (states)
- Action set A (actions)
- Reward function R (reward function)
- Transition function T, which describes the effect of performing action a in state s
We assume that the effect of performing action a depends only on the current state, not on any earlier history of states (the Markov property).
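As a minimal sketch of how these components fit together (the class and field names here are illustrative assumptions, not from the source):

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

State = str
Action = str

@dataclass
class MDP:
    states: Set[State]                                          # state set S
    actions: Set[Action]                                        # action set A
    reward: Callable[[State, Action], float]                    # reward function R(s, a)
    transition: Callable[[State, Action], Dict[State, float]]   # T(s, a): distribution over next states
```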
Action representations: actions are divided into deterministic actions and stochastic actions, contrasted in the sketch after this list.
- Deterministic action: T: S × A → S, where each state-action pair determines the next state
- Stochastic action: T: S × A → Prob(S), where each state-action pair determines a probability distribution over next states
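A sketch contrasting the two signatures (the type aliases and the example transitions are assumptions for illustration only):

```python
from typing import Callable, Dict

State = str
Action = str

# Deterministic: T: S x A -> S, one fixed next state
DeterministicT = Callable[[State, Action], State]

# Stochastic: T: S x A -> Prob(S), a distribution over next states
StochasticT = Callable[[State, Action], Dict[State, float]]

def det_t(s: State, a: Action) -> State:
    # e.g. moving "right" from "s0" always lands in "s1"
    return "s1" if (s, a) == ("s0", "right") else s

def stoch_t(s: State, a: Action) -> Dict[State, float]:
    # e.g. moving "right" from "s0" succeeds 80% of the time
    if (s, a) == ("s0", "right"):
        return {"s1": 0.8, "s0": 0.2}
    return {s: 1.0}
```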
Policy π (pi): specifies which action a to select in the current state s
The execution process of policy π (sketched in code after these steps):
1. Determine the current state s
2. Execute the action a given by policy π for the current state
3. Return to step 1
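A minimal sketch of this loop, assuming the stochastic transition signature from the sketch above and a dictionary-based policy (all names are illustrative, not from the source):

```python
import random
from typing import Callable, Dict

State = str
Action = str

def run_policy(
    policy: Dict[State, Action],                           # pi: maps each state to an action
    transition: Callable[[State, Action], Dict[State, float]],
    start: State,
    steps: int = 10,
) -> State:
    s = start
    for _ in range(steps):
        a = policy[s]                                      # step 2: choose a = pi(s)
        dist = transition(s, a)                            # effect of performing a in s
        # sample the next state; under full observability the system knows this state
        s = random.choices(list(dist), weights=dist.values())[0]
        # the loop repeats, returning to step 1 with the new current state
    return s
```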
Full observability (fully observable) means that after performing action a, the system always knows which next state it has arrived in.