OpenAI Gym Learning

Observations

The previous post introduced OpenAI Gym's CartPole (inverted pendulum) demo. If you want to do better than taking a random action at each step, it helps to actually understand how actions affect the environment.
The environment's step function returns exactly the information we need. step returns four values: observation, reward, done, and info. Specifically:

Observation (object): an environment-specific object describing your observation of the environment, such as camera pixel data, a robot's joint angles and angular velocities, or the board state in a board game.
Reward (float): the amount of reward achieved by the previous action. The scale varies between environments, but the goal is always to increase the total reward.
Done (boolean): whether it is time to reset the environment. Most tasks are divided into well-defined episodes, and done being True indicates the episode has terminated.
Info (dict): diagnostic information useful for debugging. It can occasionally be useful for learning, but official evaluations of an agent are not allowed to use this information for learning.

This is a typical implementation of the agent-environment loop: at each timestep, the agent chooses an action, and the environment returns an observation and a reward.
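As a minimal illustration of one turn of that loop (a hedged sketch using the classic Gym API and the CartPole environment used throughout this post):

import gym

env = gym.make('CartPole-v0')
observation = env.reset()                  # reset returns the initial observation
action = env.action_space.sample()         # the agent picks an action (random here)
observation, reward, done, info = env.step(action)  # the environment responds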

The process starts by calling reset, which returns an initial observation. So a more proper way to write the code from the previous post is to respect the done flag:

import gym
env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print('Episode finished after {} timesteps'.format(t + 1))
            break

When done is True, the controller has failed and the episode ends. The return of an episode can be measured by how long it lasts, i.e. t+1 timesteps: the longer the pole stays up, the higher the return. With the random action choice above, the average return is about 20.
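To check that figure, here is a hedged sketch (classic Gym API, rendering omitted) that estimates the average episode length of the random policy; the count of 100 episodes is an arbitrary choice:

import gym

env = gym.make('CartPole-v0')
lengths = []
for _ in range(100):
    env.reset()
    done = False
    t = 0
    while not done:
        _, reward, done, _ = env.step(env.action_space.sample())
        t += 1
    lengths.append(t)
print(sum(lengths) / len(lengths))  # roughly 20 for CartPole-v0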

[ 0.00753165  0.8075176  -0.15841931 -1.63740717]
[ 0.023682    1.00410306 -0.19116745 -1.97497356]
Episode finished after ... timesteps
[-0.01027234 -0.00503277  0.01774634  0.01849733]
[-0.01037299 -0.20040467  0.01811628  0.31672619]
[-0.01438109 -0.00554538  0.02445081  0.02981111]
[-0.01449199  0.18921755  0.02504703 -0.25505814]
[-0.01070764  0.38397309  0.01994587 -0.53973677]
[-0.00302818  0.57880906  0.00915113 -0.8260689 ]
[ 0.008548    0.77380468 -0.00737025 -1.11585968]
[ 0.02402409  0.9690226  -0.02968744 -1.41084543]
[ 0.04340455  1.16449982 -0.05790435 -1.71265888]
[ 0.06669454  1.36023677 -0.09215753 -2.0227866 ]
[ 0.09389928  1.55618414 -0.13261326 -2.34251638]
[ 0.12502296  1.75222707 -0.17946359 -2.67287294]
Episode finished after ... timesteps
Spaces

In the examples above, we have been sampling random actions from the environment's action space. But what are those actions, exactly? Every environment comes with first-class Space objects that describe its valid actions and observations:

import gym
env = gym.make('CartPole-v0')
print(env.action_space)
#> Discrete(2)
print(env.observation_space)
#> Box(4,)

The Discrete space allows a fixed range of non-negative integers, so in this case the valid actions are 0 and 1. The Box space represents an n-dimensional box, so valid observations here are arrays of 4 numbers. We can also check the Box's bounds:

print(env.observation_space.high)
#> array([ 2.4       ,         inf,  0.20943951,         inf])
print(env.observation_space.low)
#> array([-2.4       ,        -inf, -0.20943951,        -inf])

This kind of introspection can help you write generic code that works for many different environments. Box and Discrete are the most common Spaces; you can sample from a Space or check that something belongs to it:

from gym import spaces
space = spaces.Discrete(8)  # set with 8 elements {0, 1, 2, ..., 7}
x = space.sample()
assert space.contains(x)
assert space.n == 8
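Building on this, here is a hedged sketch of what such introspection-driven generic code might look like; describe_space is an illustrative helper of my own, not part of Gym:

import gym
from gym import spaces

def describe_space(space):
    # Dispatch on the concrete Space type, so the same code
    # works whatever environment produced the space.
    if isinstance(space, spaces.Discrete):
        return 'Discrete with {} actions'.format(space.n)
    if isinstance(space, spaces.Box):
        return 'Box of shape {}'.format(space.shape)
    return str(space)

env = gym.make('CartPole-v0')
print(describe_space(env.action_space))       # Discrete with 2 actions
print(describe_space(env.observation_space))  # Box of shape (4,)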

For CartPole-v0, one of the two actions applies force to the cart to the left, and the other applies force to the right.
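A quick way to see this for yourself (a hedged sketch that assumes the standard CartPole observation layout, where index 1 is the cart velocity):

import gym

env = gym.make('CartPole-v0')
env.reset()
obs, _, _, _ = env.step(0)  # action 0: push the cart to the left
print(obs[1])               # cart velocity, negative after a left push
env.reset()
obs, _, _, _ = env.step(1)  # action 1: push the cart to the right
print(obs[1])               # positive after a right push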

Environments

Gym's main purpose is to provide a large collection of environments that expose a common interface and are versioned so that comparisons remain meaningful. To see which environments your installation provides:

from gym import envs
print(envs.registry.all())
#> [EnvSpec(PredictActionsCartpole-v0), EnvSpec(Asteroids-ramDeterministic-v0), EnvSpec(Asteroids-ramDeterministic-v3), EnvSpec(Gopher-ramDeterministic-v3), EnvSpec(Gopher-ramDeterministic-v0), EnvSpec(DoubleDunk-ramDeterministic-v3), EnvSpec(DoubleDunk-ramDeterministic-v0), EnvSpec(Tennis-ramNoFrameskip-v3), EnvSpec(RoadRunner-ramDeterministic-v0), EnvSpec(Robotank-ram-v3), EnvSpec(CartPole-v0), EnvSpec(CartPole-v1), EnvSpec(Gopher-ram-v3), EnvSpec(Gopher-ram-v0), ...
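If the full specs are too noisy, a hedged sketch that prints only the registered ids (EnvSpec objects carry an id attribute in classic Gym):

from gym import envs

for spec in envs.registry.all():
    print(spec.id)  # e.g. CartPole-v0, CartPole-v1, ...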
