# Qlearning_vs_SARSA

Code that compares two learning agents in a classic gridworld game: one uses the off-policy Q-learning approach, and the other uses the on-policy State-Action-Reward-State-Action (SARSA) approach.
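The off-policy/on-policy distinction comes down to which value the update bootstraps from. A minimal sketch of the two update rules (function and variable names here are illustrative assumptions, not taken from this repository's code):

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=1.0):
    """Off-policy: bootstrap from the greedy (max-value) action in s_next,
    regardless of which action the behaviour policy actually takes next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=1.0):
    """On-policy: bootstrap from the action a_next actually chosen by the
    (epsilon-greedy) behaviour policy in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
```

Because SARSA's target includes the exploratory actions the agent really takes, it tends to learn the safer path along the cliff walk, while Q-learning learns the greedy cliff-edge path.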

Image of the grid used in the game: *cliff walk* (figure omitted here).

The code should produce graphs like the ones below, which show the average rewards for the agents over 500 epochs at varying levels of exploration (epsilon values).
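Both agents choose actions epsilon-greedily, so epsilon controls how often they explore. A minimal sketch of such a selection rule (the exact implementation in this repository may differ):

```python
import random

def epsilon_greedy(Q, s, epsilon, n_actions):
    """With probability epsilon take a uniformly random action (explore);
    otherwise take the current highest-value action (exploit)."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    values = Q[s]
    return max(range(n_actions), key=lambda a: values[a])
```

Higher epsilon values make exploratory slips off the cliff more frequent, which widens the gap between SARSA's cautious path and Q-learning's greedy one in the plots below.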

The image below compares the two agents for an epsilon value of 0.1:

*qvsSARSA_ep0.1* (figure omitted here)

The image below compares the two agents for an epsilon value of 0.25:

*qvsSARSA_ep0.25* (figure omitted here)

The image below compares the two agents for an epsilon value of 0.75:

*qvsSARSA_ep0.75* (figure omitted here)