iambrian created this gist on Oct 6, 2016.
Getting Setup:

Follow the instructions at https://gym.openai.com/docs

```
git clone https://github.com/openai/gym
cd gym
pip install -e . # minimal install
```

Basic Example using CartPole-v0:

Level 1: Getting the environment up and running

```
import gym

env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):  # run for 1000 steps
    env.render()
    action = env.action_space.sample()  # pick a random action
    env.step(action)  # take the action
```

Level 2: Running trials (AKA episodes)

```
import gym

env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()  # reset for each new trial
    for t in range(100):  # run for 100 timesteps or until done, whichever comes first
        env.render()
        action = env.action_space.sample()  # select a random action (see https://github.com/openai/gym/wiki/CartPole-v0)
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
```

Level 3: Non-random actions

```
import gym

env = gym.make('CartPole-v0')
highscore = 0
for i_episode in range(20):  # run 20 episodes
    observation = env.reset()
    points = 0  # keep track of the reward each episode
    while True:  # run until the episode is done
        env.render()
        action = 1 if observation[2] > 0 else 0  # if the pole angle is positive, push right; if negative, push left
        observation, reward, done, info = env.step(action)
        points += reward
        if done:
            if points > highscore:  # record the high score
                highscore = points
            break
```
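The Level 3 episode loop relies only on Gym's `reset()`/`step()` contract: `step(action)` returns `(observation, reward, done, info)`. As a minimal sketch of that contract, the same loop can be run against a hypothetical `MockCartPole` stand-in (an assumption for illustration, not part of Gym), which is handy for checking loop logic without installing anything:

```python
import random


class MockCartPole:
    """Hypothetical stand-in for CartPole-v0 (not part of Gym).

    Mimics the reset()/step() contract: step(action) returns
    (observation, reward, done, info).
    """

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        # observation layout mirrors CartPole: [cart pos, cart vel, pole angle, pole vel]
        return [0.0, 0.0, random.uniform(-0.1, 0.1), 0.0]

    def step(self, action):
        self.t += 1
        observation = [0.0, 0.0, random.uniform(-0.2, 0.2), 0.0]
        reward = 1.0                     # CartPole grants +1 for every surviving step
        done = self.t >= self.max_steps  # episode ends after max_steps
        return observation, reward, done, {}


env = MockCartPole()
highscore = 0
for i_episode in range(20):
    observation = env.reset()
    points = 0
    while True:
        action = 1 if observation[2] > 0 else 0  # same angle-based policy as Level 3
        observation, reward, done, info = env.step(action)
        points += reward
        if done:
            if points > highscore:
                highscore = points
            break

print(highscore)
```

Swapping `MockCartPole()` back for `gym.make('CartPole-v0')` recovers the Level 3 example; the loop body is unchanged because it only touches the `(observation, reward, done, info)` tuple.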