(6) Value Function Approximation - LSPI code (5)


This part covers sample.py.

# -*- coding: utf-8 -*-
"""Contains class representing an LSPI sample."""


class Sample(object):

    """Represents an LSPI sample tuple ``(s, a, r, s', absorb)``.

    Parameters
    ----------

    state : numpy.array  # state vector
        State of the environment at the start of the sample.
        ``s`` in the sample tuple.
        (The usual type is a numpy array.)
    action : int
        Index of action that was executed.
        ``a`` in the sample tuple
    reward : float
        Reward received from the environment.
        ``r`` in the sample tuple
    next_state : numpy.array
        State of the environment after executing the sample's action.
        ``s'`` in the sample tuple
        (The type should match that of state.)
    absorb : bool, optional
        True if this sample ended the episode. False otherwise.
        ``absorb`` in the sample tuple
        (The default is False, which implies that this is a
        non-episode-ending sample)


    Assumes that this is a non-absorbing sample (as the vast majority
    of samples will be non-absorbing).
    # Annotation: wrapping the fields in a class makes the sample
    # convenient to use from different call sites.
    This class is just a dumb data holder so the types of the different
    fields can be anything convenient for the problem domain.

    For states represented by vectors a numpy array works well.

    """

    def __init__(self, state, action, reward, next_state, absorb=False):
        """Initialize Sample instance."""
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):  # used when the sample is printed
        """Create string representation of tuple."""
        return 'Sample(%s, %s, %s, %s, %s)' % (self.state,
                                               self.action,
                                               self.reward,
                                               self.next_state,
                                               self.absorb)
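To make the role of the `absorb` flag concrete, here is a minimal usage sketch. It repeats a stripped-down copy of the `Sample` class so that it runs on its own. The helper `discounted_next_value`, the stand-in `toy_value` function, the discount of 0.9, and the two hand-written transitions are all assumptions made for illustration; none of them come from the original sample.py. States are plain Python lists here to keep the sketch dependency-free, although the docstring suggests numpy arrays.

```python
class Sample(object):
    """Stripped-down copy of the data holder, repeated so the sketch runs alone."""

    def __init__(self, state, action, reward, next_state, absorb=False):
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):
        return 'Sample(%s, %s, %s, %s, %s)' % (
            self.state, self.action, self.reward, self.next_state, self.absorb)


def toy_value(state):
    """Stand-in value function (hypothetical, not part of LSPI)."""
    return state[0]


def discounted_next_value(sample, value_of, discount=0.9):
    """Hypothetical helper: discounted value of the successor state.

    An absorbing sample ends the episode, so its successor contributes nothing.
    """
    if sample.absorb:
        return 0.0
    return discount * value_of(sample.next_state)


# Two made-up transitions on a 1-D walk.
step = Sample([0.0], 1, 0.0, [1.0])                 # absorb defaults to False
final = Sample([1.0], 1, 1.0, [2.0], absorb=True)   # episode-ending transition

print(step)
print(final)

# One-step targets: reward plus the discounted successor value.
targets = [s.reward + discounted_next_value(s, toy_value) for s in (step, final)]
print(targets)
```

Because the class is only a data holder, any consumer of the samples has to check `absorb` itself, as the hypothetical helper does here.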

 
