This article covers sample.py, the module that defines the data holder for a single LSPI sample.
# -*- coding: utf-8 -*-
"""Contains class representing an LSPI sample."""


# Defined as a class (rather than a bare tuple) to make it convenient
# to construct and pass samples around in different ways.
class Sample(object):

    """Represents an LSPI sample tuple ``(s, a, r, s', absorb)``.

    Parameters
    ----------
    state : numpy.array
        State of the environment at the start of the sample;
        ``s`` in the sample tuple. (The usual type is a numpy array.)
    action : int
        Index of the action that was executed;
        ``a`` in the sample tuple.
    reward : float
        Reward received from the environment;
        ``r`` in the sample tuple.
    next_state : numpy.array
        State of the environment after executing the sample's action;
        ``s'`` in the sample tuple. (The type should match that of state.)
    absorb : bool, optional
        True if this sample ended the episode, False otherwise;
        ``absorb`` in the sample tuple. (The default is False, i.e. a
        non-episode-ending, non-absorbing sample, as the vast majority
        of samples are non-absorbing.)

    This is just a dumb data holder, so the types of the different
    fields can be anything convenient for the problem domain.
    For states represented by vectors, numpy arrays work well.

    """

    def __init__(self, state, action, reward, next_state, absorb=False):
        """Initialize Sample instance."""
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):  # Called when the object is printed.
        """Create string representation of the tuple."""
        return 'Sample(%s, %s, %s, %s, %s)' % (self.state,
                                               self.action,
                                               self.reward,
                                               self.next_state,
                                               self.absorb)
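As a quick sanity check of how Sample is used, here is a minimal sketch; the state vectors, action indices, and reward values below are made-up illustration values, and only the Sample class itself comes from the code above:

import numpy as np

# Hypothetical non-absorbing transition: action 1 is taken in state s,
# a reward of 1.0 is received, and the environment moves to next_state.
step = Sample(np.array([0.0, 1.0]), 1, 1.0, np.array([1.0, 0.0]))

# Hypothetical absorbing transition: this sample ends the episode,
# so absorb is set to True.
last_step = Sample(np.array([1.0, 0.0]), 0, -1.0, np.array([0.0, 0.0]),
                   absorb=True)

print(step)       # prints something like: Sample([0. 1.], 1, 1.0, [1. 0.], False)
print(last_step)  # prints something like: Sample([1. 0.], 0, -1.0, [0. 0.], True)

Because Sample is just a data holder, nothing about it is numpy-specific: per its docstring, any state representation that the rest of the problem domain agrees on (tuples, state indices, feature vectors) would work equally well.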
6. Value Function Approximation: LSPI Code (5)