
Reinforcement Learning

Programming Assignment 1
Due: Friday, Nov 15, 2024 (11:59pm)

Instructions
• This assignment is to be done in groups of 3. It is fine to work in groups of 2
or individually, but I prefer groups of 3.

• This assignment is to be submitted online via LMS on or before Nov 15, 2024
(11:59pm). Please submit two programs, grid-world-1.py and grid-world-2.py, and
a single PDF file containing your written answer to Question 1 and the policy
and value-function images for Questions 3 and 4.

• This assignment is for 100 points. It is worth 10% of your final grade.

Problems
The objective of this assignment is to give you some practical experience with the
MDP algorithms we learned in class. We will work with the Grid World problem in
Example 3.5 (page 61) from chapter 3 in the textbook. Please download the entire
project from this link: Python code for examples in the textbook. We will take
the program grid-world.py in the folder chapter03 from this project and make
changes to it.

1. (15 points) Describe in 1-2 sentences what these functions do in grid-world.py.

• figure_3_2_linear_system()
• figure_3_2()
• figure_3_5()

2. (25 points) Make a copy of grid-world.py and rename it as grid-world-1.py. Add
a function get_epsilon_greedy_policy(value_vector, epsilon) that computes and
returns an ϵ-greedy policy π_ϵ^V with respect to the value vector V. The function
should take V and ϵ as arguments.
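As one possible sketch of what this function could look like (the grid layout,
reward constants, and deterministic step() below follow Example 3.5, but the
exact names and conventions in your copy of grid-world.py may differ):

```python
import numpy as np

WORLD_SIZE = 5
# Actions as (row, col) offsets: left, up, right, down
ACTIONS = [np.array([0, -1]), np.array([-1, 0]),
           np.array([0, 1]), np.array([1, 0])]
A_POS, A_PRIME_POS = [0, 1], [4, 1]
B_POS, B_PRIME_POS = [0, 3], [2, 3]
DISCOUNT = 0.9

def step(state, action):
    """Deterministic dynamics of Example 3.5."""
    if state == A_POS:
        return A_PRIME_POS, 10.0
    if state == B_POS:
        return B_PRIME_POS, 5.0
    next_state = (np.array(state) + action).tolist()
    x, y = next_state
    if x < 0 or x >= WORLD_SIZE or y < 0 or y >= WORLD_SIZE:
        return state, -1.0   # off the grid: stay put, reward -1
    return next_state, 0.0

def get_epsilon_greedy_policy(value_vector, epsilon):
    """Return a (25, 4) array: row s is the epsilon-greedy action
    distribution at state s with respect to value_vector."""
    value = np.asarray(value_vector).reshape(WORLD_SIZE, WORLD_SIZE)
    policy = np.zeros((WORLD_SIZE * WORLD_SIZE, len(ACTIONS)))
    for i in range(WORLD_SIZE):
        for j in range(WORLD_SIZE):
            # One-step lookahead q-value for each action
            q = []
            for action in ACTIONS:
                (ni, nj), reward = step([i, j], action)
                q.append(reward + DISCOUNT * value[ni, nj])
            s = i * WORLD_SIZE + j
            policy[s, :] = epsilon / len(ACTIONS)          # explore uniformly
            policy[s, int(np.argmax(q))] += 1.0 - epsilon  # exploit greedy action
    return policy
```

Each row of the returned array is a probability distribution over the four
actions, which makes the policy easy to plug into a policy-evaluation routine
later.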

3. (30 points) Add to grid-world-1.py a function policy_iteration(epsilon) that
computes and returns an optimal ϵ-greedy policy π_ϵ* using the policy iteration
algorithm. The function should take ϵ as argument. Use the functions
draw_image(image) and draw_policy(optimal_values) to save the policy and value
function to the Images folder. You need to make a copy of
figure_3_2_linear_system() and change it to take a policy as argument and
evaluate it. Test the function with ϵ values 0.2 and 0.0.

4. (30 points) Make a copy of grid-world-1.py and rename it as grid-world-2.py.
Change the function step(state, action), and all the other functions that use
it, to make actions stochastic as defined next. For example, for action left
the next-state should be the square to the left with probability 0.85, the
square to the right with probability 0.05, the square above with probability
0.05, and the square below with probability 0.05. If the action takes you off
the grid, do what the existing code does to handle that. The other 3 actions
are defined similarly. Test all of the following functions with the new action
definition:

• figure_3_2_linear_system()
• figure_3_2()
• figure_3_5()
• policy_iteration(epsilon) with epsilon values 0.2 and 0
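The stochastic dynamics above can be captured by a helper that returns the full
outcome distribution of an action, which the DP routines can then average over.
A sketch, assuming the deterministic setup of Example 3.5 (the name
stochastic_transitions is my own, not from the provided code):

```python
import numpy as np

WORLD_SIZE = 5
ACTIONS = [np.array([0, -1]), np.array([-1, 0]),
           np.array([0, 1]), np.array([1, 0])]
A_POS, A_PRIME_POS = [0, 1], [4, 1]
B_POS, B_PRIME_POS = [0, 3], [2, 3]

def step(state, action):
    """Deterministic single-action dynamics, as in the original code."""
    if state == A_POS:
        return A_PRIME_POS, 10.0
    if state == B_POS:
        return B_PRIME_POS, 5.0
    ns = (np.array(state) + action).tolist()
    if not (0 <= ns[0] < WORLD_SIZE and 0 <= ns[1] < WORLD_SIZE):
        return state, -1.0   # off the grid: stay put, reward -1 (existing behavior)
    return ns, 0.0

def stochastic_transitions(state, intended_action):
    """Outcome distribution of an action: the intended direction with
    probability 0.85, each of the other three with probability 0.05.
    Returns a list of (probability, next_state, reward) triples."""
    outcomes = []
    for action in ACTIONS:
        p = 0.85 if np.array_equal(action, intended_action) else 0.05
        next_state, reward = step(state, action)
        outcomes.append((p, next_state, reward))
    return outcomes
```

Functions such as figure_3_2_linear_system() and policy_iteration(epsilon) would
then replace each single step() call with a probability-weighted sum over these
outcomes.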
