(6) Value Function Approximation - LSPI code (5)

Last updated: 2016-05-13 Source: Internet




This post covers sample.py.

```python
# -*- coding: utf-8 -*-
"""Contains class representing an LSPI sample."""


# Note: the sample is wrapped in a class (rather than a bare tuple) so that
# different callers can conveniently access its fields by name.
class Sample(object):

    """Represents an LSPI sample tuple ``(s, a, r, s', absorb)``.

    Parameters
    ----------

    state : numpy.array
        State of the environment at the start of the sample.
        ``s`` in the sample tuple.
        (The usual type is a numpy array.)
    action : int
        Index of action that was executed.
        ``a`` in the sample tuple
    reward : float
        Reward received from the environment.
        ``r`` in the sample tuple
    next_state : numpy.array
        State of the environment after executing the sample's action.
        ``s'`` in the sample tuple
        (The type should match that of state.)
    absorb : bool, optional
        True if this sample ended the episode. False otherwise.
        ``absorb`` in the sample tuple
        (The default is False, which implies that this is a
        non-episode-ending sample)


    Assumes that this is a non-absorbing sample (as the vast majority
    of samples will be non-absorbing).

    This class is just a dumb data holder so the types of the different
    fields can be anything convenient for the problem domain.

    For states represented by vectors a numpy array works well.

    """

    def __init__(self, state, action, reward, next_state, absorb=False):
        """Initialize Sample instance."""
        # Store each element of the (s, a, r, s', absorb) tuple unchanged.
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):
        """Create string representation of tuple."""
        # Called when the sample is printed or shown in the interpreter.
        return 'Sample(%s, %s, %s, %s, %s)' % (self.state,
                                               self.action,
                                               self.reward,
                                               self.next_state,
                                               self.absorb)
```
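As a usage sketch, the snippet below wraps one recorded episode into `Sample` objects. It assumes states are plain tuples rather than numpy arrays (the docstring allows any convenient type), and it repeats a minimal copy of `Sample` so it runs on its own. `samples_from_episode` and `td_target` are hypothetical helpers, not part of sample.py; the second only illustrates the usual meaning of `absorb`: an episode-ending sample has no successor state whose value should be bootstrapped.

```python
# Minimal copy of the Sample class above so this sketch runs on its own.
class Sample(object):
    """Plain data holder for an (s, a, r, s', absorb) tuple."""

    def __init__(self, state, action, reward, next_state, absorb=False):
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):
        return 'Sample(%s, %s, %s, %s, %s)' % (self.state, self.action,
                                               self.reward, self.next_state,
                                               self.absorb)


def samples_from_episode(transitions, terminated):
    """Hypothetical helper: wrap (s, a, r, s') transitions as Samples.

    Only the final transition is marked absorbing, and only when the
    episode actually reached a terminal state (not when it was cut off).
    """
    samples = []
    last = len(transitions) - 1
    for i, (s, a, r, s_next) in enumerate(transitions):
        samples.append(Sample(s, a, r, s_next,
                              absorb=terminated and i == last))
    return samples


def td_target(sample, q_next_max, gamma=0.9):
    """Hypothetical one-step target; gamma=0.9 is an arbitrary assumption.

    Absorbing samples return only the reward: there is no next-state
    value to add.
    """
    if sample.absorb:
        return sample.reward
    return sample.reward + gamma * q_next_max


# A three-step episode with one-dimensional tuple states.
episode = [((0.0,), 1, 0.0, (1.0,)),
           ((1.0,), 0, 0.0, (2.0,)),
           ((2.0,), 1, 1.0, (3.0,))]
samples = samples_from_episode(episode, terminated=True)
for sample in samples:
    print(sample)
```

Printing the list shows `absorb` set to `True` only on the last sample, `Sample((2.0,), 1, 1.0, (3.0,), True)`; passing `terminated=False` would leave every sample non-absorbing, matching the class's default.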

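This article was originally written in Chinese and published on aliyun.com; an English version is also available for informational purposes only. This website makes no express or implied representation or warranty as to the accuracy, completeness, or reliability of the article or any translation of it. If you have any concerns or complaints about this article, please email info-contact@alibabacloud.com with a detailed description. Staff will contact you within 5 working days and, once the claim is verified, will remove the infringing content.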
