@iambrian
Last active November 13, 2020 21:12

Revisions

  1. iambrian renamed this gist Oct 18, 2016. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  2. iambrian renamed this gist Oct 6, 2016. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  3. iambrian created this gist Oct 6, 2016.
    55 changes: 55 additions & 0 deletions tutorial
    @@ -0,0 +1,55 @@
    Getting Set Up:
    Follow the instructions at https://gym.openai.com/docs

    ```
    git clone https://github.com/openai/gym
    cd gym
    pip install -e . # minimal install
    ```
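After installing, a quick way to confirm that Python can find the package is to look it up without importing it. This is only a minimal sketch using the standard library; the helper name `is_installed` is made up for illustration and is not part of gym.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Reports whether the editable install above is visible to this interpreter.
print("gym installed:", is_installed("gym"))
```

If this prints `False`, the install probably went into a different Python environment than the one running the script.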

    Basic Example using CartPole-v0:

    Level 1: Getting environment up and running
    ```
    import gym
    env = gym.make('CartPole-v0')
    env.reset()
    for _ in range(1000): # run for 1000 steps
        env.render()
        action = env.action_space.sample() # pick a random action
        env.step(action) # take action
    ```

    Level 2: Running trials (AKA episodes)
    ```
    import gym
    env = gym.make('CartPole-v0')
    for i_episode in range(20):
        observation = env.reset() # reset for each new trial
        for t in range(100): # run for 100 timesteps or until done, whichever is first
            env.render()
            action = env.action_space.sample() # select a random action (see https://github.com/openai/gym/wiki/CartPole-v0)
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break
    ```
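The episode loop above depends only on two calls: `reset()` returns a first observation, and `step(action)` returns `(observation, reward, done, info)`. To make that contract concrete without needing gym or a display, here is a rough sketch with a toy stand-in environment. `CoinFlipEnv` and `run_episode` are hypothetical names invented for this example, and the reward rule is arbitrary.

```python
import random


class CoinFlipEnv:
    """Hypothetical stand-in environment (NOT part of gym).

    It only mimics the reset()/step() shape used in Level 2.
    """

    def __init__(self, max_steps=10, seed=None):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # initial observation

    def step(self, action):
        self.steps += 1
        observation = self.rng.choice([0, 1])
        reward = 1.0 if action == observation else 0.0  # arbitrary toy reward
        done = self.steps >= self.max_steps  # episode ends after max_steps
        return observation, reward, done, {}


def run_episode(env, max_timesteps=100):
    """Run one episode with random actions; return how many timesteps it lasted."""
    env.reset()
    for t in range(max_timesteps):
        action = random.choice([0, 1])  # random action, like action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            return t + 1
    return max_timesteps  # hit the timestep cap before the episode ended


print("Episode finished after {} timesteps".format(run_episode(CoinFlipEnv())))
```

The same `run_episode` structure applies to CartPole: the loop stops either when the environment reports `done` or when the timestep cap is reached, whichever comes first.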

    Level 3: Non-random actions
    ```
    import gym
    env = gym.make('CartPole-v0')
    highscore = 0
    for i_episode in range(20): # run 20 episodes
        observation = env.reset()
        points = 0 # keep track of the reward each episode
        while True: # run until episode is done
            env.render()
            action = 1 if observation[2] > 0 else 0 # if angle is positive, move right. if angle is negative, move left
            observation, reward, done, info = env.step(action)
            points += reward
            if done:
                if points > highscore: # record high score
                    highscore = points
                break
    print("High score: {}".format(highscore)) # report the best episode
    ```
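The Level 3 rule can be pulled out into a small function so it can be checked without running the simulation. This is a sketch under the same assumption the code above makes, namely that `observation[2]` is the pole angle and that action 1 pushes right while action 0 pushes left. The name `choose_action` is made up here.

```python
def choose_action(observation):
    """Push right (1) if the pole angle is positive, otherwise push left (0).

    Assumes observation[2] is the pole angle, as in the Level 3 code.
    """
    return 1 if observation[2] > 0 else 0


# Hand-written observations, only to exercise the rule.
print(choose_action([0.0, 0.0, 0.05, 0.0]))   # pole leaning one way
print(choose_action([0.0, 0.0, -0.05, 0.0]))  # pole leaning the other way
```

Inside the Level 3 loop, `action = choose_action(observation)` would replace the inline conditional. Note that an angle of exactly zero falls into the "push left" branch.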