This post covers sample.py.
# -*- coding: utf-8 -*-
"""Contains class representing an LSPI sample."""


class Sample(object):

    """Represents an LSPI sample tuple ``(s, a, r, s', absorb)``.

    Parameters
    ----------

    state : numpy.array  # state vector
        State of the environment at the start of the sample.
        ``s`` in the sample tuple.
        (The usual type is a numpy array.)
    action : int
        Index of action that was executed.
        ``a`` in the sample tuple
    reward : float
        Reward received from the environment.
        ``r`` in the sample tuple
    next_state : numpy.array
        State of the environment after executing the sample's action.
        ``s'`` in the sample tuple
        (The type should match that of state.)
    absorb : bool, optional
        True if this sample ended the episode. False otherwise.
        ``absorb`` in the sample tuple
        (The default is False, which implies that this is a
        non-episode-ending sample)


    Assumes that this is a non-absorbing sample (as the vast majority
    of samples will be non-absorbing).
    # Annotation: the sample is wrapped in a class so that it is
    # convenient to use from different kinds of calling code.
    This class is just a dumb data holder so the types of the different
    fields can be anything convenient for the problem domain.

    For states represented by vectors a numpy array works well.

    """

    def __init__(self, state, action, reward, next_state, absorb=False):
        """Initialize Sample instance."""
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):  # called when the object is printed
        """Create string representation of tuple."""
        return 'Sample(%s, %s, %s, %s, %s)' % (self.state,
                                               self.action,
                                               self.reward,
                                               self.next_state,
                                               self.absorb)
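A minimal usage sketch of the class above. The two-step episode, the state values, and the names `samples` and `num_absorbing` are made up for illustration and are not part of sample.py. States are plain tuples here so the snippet needs no third-party imports; the docstring suggests numpy arrays for vector states, and a compact copy of `Sample` is repeated so the snippet runs on its own.

```python
class Sample(object):
    """Compact copy of the Sample class above, so this snippet runs alone."""

    def __init__(self, state, action, reward, next_state, absorb=False):
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):
        return 'Sample(%s, %s, %s, %s, %s)' % (self.state, self.action,
                                               self.reward, self.next_state,
                                               self.absorb)


# Hypothetical two-step episode. Plain tuples stand in for state vectors,
# which works because Sample does not constrain the types of its fields.
samples = [
    Sample((0.0, 1.0), 1, 0.0, (0.5, 1.0)),               # absorb defaults to False
    Sample((0.5, 1.0), 0, 1.0, (1.0, 1.0), absorb=True),  # this step ends the episode
]

for sample in samples:
    # print() falls back to __repr__ because the class defines no __str__.
    print(sample)

# Count how many samples ended an episode.
num_absorbing = sum(1 for s in samples if s.absorb)
```

Because `absorb` defaults to False, only the episode-ending sample has to set it explicitly, which matches the docstring's remark that most samples are non-absorbing.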
(6) Value Function Approximation - LSPI code (5)