Reinforcement Learning
Programming Assignment 1
          Due: Friday, Nov 15, 2024 (11:59pm)
Instructions
  • This assignment is to be done in groups of 3. It is fine to work in groups of 2
    or individually, but I prefer groups of 3.
  • This assignment is to be submitted online via LMS on or before Nov 15, 2024
    (11:59pm). Please submit two programs, grid-world-1.py and grid-world-2.py,
    and a single PDF file containing your written answer to Question 1 and the
    policy and value function images for Questions 3 and 4.
  • This assignment is for 100 points. It is worth 10% of your final grade.
Problems
The objective of this assignment is to give you practical experience with the
MDP algorithms we learned in class. We will work with the Grid World problem in
Example 3.5 (page 61) from Chapter 3 of the textbook. Please download the entire
project from this link: Python code for examples in the textbook. We will take
the program grid-world.py in the folder chapter03 from this project and make
changes to it.
  1. (15 points) Describe in 1-2 sentences what these functions do in grid-world.py.
        • figure_3_2_linear_system()
        • figure_3_2()
        • figure_3_5()
  2. (25 points) Make a copy of grid-world.py and rename it as grid-world-1.py. Add
     a function get_epsilon_greedy_policy(value_vector, epsilon) that computes and
     returns an ϵ-greedy policy π_ϵ^V with respect to the value vector V. The
     function should take V and ϵ as arguments.
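     As a starting point, here is a minimal sketch of one possible interface. It
     assumes the helpers step(state, action), ACTIONS, WORLD_SIZE, and DISCOUNT
     from grid-world.py, and that value_vector is the WORLD_SIZE x WORLD_SIZE
     array of state values; this is a sketch, not the required implementation.

         import numpy as np

         def get_epsilon_greedy_policy(value_vector, epsilon):
             # Returns a (WORLD_SIZE, WORLD_SIZE, len(ACTIONS)) array of action
             # probabilities: every action gets epsilon / |A| of the mass, and
             # the greedy action w.r.t. value_vector gets 1 - epsilon on top.
             policy = np.zeros((WORLD_SIZE, WORLD_SIZE, len(ACTIONS)))
             for i in range(WORLD_SIZE):
                 for j in range(WORLD_SIZE):
                     # One-step lookahead: value of each action from (i, j).
                     action_values = []
                     for action in ACTIONS:
                         (ni, nj), reward = step([i, j], action)
                         action_values.append(reward + DISCOUNT * value_vector[ni, nj])
                     best = np.argmax(action_values)
                     policy[i, j, :] = epsilon / len(ACTIONS)   # exploration mass
                     policy[i, j, best] += 1.0 - epsilon        # greedy mass
             return policy

     Note that with ϵ = 0 this reduces to the plain greedy policy, which is the
     ϵ = 0.0 case you are asked to test in Question 3.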
  3. (30 points) Add to grid-world-1.py a function policy_iteration(epsilon) that
     computes and returns an optimal ϵ-greedy policy π_ϵ^* using the policy iter-
     ation algorithm. The function should take ϵ as an argument. Use the func-
     tions draw_image(image) and draw_policy(optimal_values) to save the pol-
     icy and value function to the Images folder. You need to make a copy of
     figure_3_2_linear_system() and change it to take a policy as an argument and
     evaluate it. Test the function with ϵ values 0.2 and 0.0.
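     The overall loop is ordinary policy iteration with ϵ-greedy improvement. A
     minimal sketch, assuming get_epsilon_greedy_policy() from Question 2 and a
     hypothetical helper evaluate_policy(policy) (your modified copy of
     figure_3_2_linear_system()) that solves the linear system for the given
     policy and returns its value function as a WORLD_SIZE x WORLD_SIZE array:

         def policy_iteration(epsilon):
             value = np.zeros((WORLD_SIZE, WORLD_SIZE))     # initial guess V = 0
             policy = get_epsilon_greedy_policy(value, epsilon)
             while True:
                 value = evaluate_policy(policy)            # policy evaluation
                 new_policy = get_epsilon_greedy_policy(value, epsilon)  # improvement
                 if np.allclose(new_policy, policy):        # policy stable: done
                     return new_policy, value
                 policy = new_policy

     Because the ϵ-greedy improvement step is deterministic given V, comparing
     successive policies with np.allclose is a reasonable stopping test.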
  4. (30 points) Make a copy of grid-world-1.py and rename it as grid-world-2.py.
     Change the function step(state, action) and all the other functions to make
     actions stochastic as defined next. For example, for action left the next state
     should be the square to the left with probability 0.85, the square to the right
     with probability 0.05, the square above with probability 0.05, and the square
     below with probability 0.05. If the action takes you off the grid, do what the
     existing code does to handle that. The other 3 actions are defined similarly.
     Test all these functions for the new action definition (listed below; a sketch
     of the stochastic step follows the list).
        • figure_3_2_linear_system()
        • figure_3_2()
        • figure_3_5()
        • policy_iteration(epsilon) with ϵ values 0.2 and 0.0
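     A minimal sketch of the stochastic step, under two assumptions: the original
     deterministic step(state, action) is kept under the hypothetical name
     deterministic_step(), so off-grid moves are still handled by the existing
     code, and ACTIONS holds the four moves as in grid-world.py:

         def step(state, action):
             # Returns a list of (probability, next_state, reward) outcomes:
             # the commanded action happens with probability 0.85, and each
             # of the other three directions with probability 0.05.
             outcomes = []
             for move in ACTIONS:
                 prob = 0.85 if np.array_equal(move, action) else 0.05
                 next_state, reward = deterministic_step(state, move)
                 outcomes.append((prob, next_state, reward))
             return outcomes

     Every place that previously used a single (next_state, reward) pair then
     computes an expectation over the outcomes, i.e. the sum of
     prob * (reward + DISCOUNT * V[next_state]) over all four entries.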