
Assignment 8 (Sol.)
Reinforcement Learning
Prof. B. Ravindran

1. Is the problem of non-stationary targets an issue when using Monte Carlo returns as targets?

(a) no
(b) yes

Sol. (a)
When Monte Carlo returns are used as targets, the targets are stationary: the target values are
computed from complete sampled episodes and do not change as the parameters evolve, unlike
bootstrapped targets, which depend on the current parameter estimates.
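To make the contrast concrete, below is a minimal sketch in Python (hypothetical feature vectors and step sizes, not from the course material) of the two kinds of semi-gradient updates for a linear value function: the Monte Carlo target G is fixed once the episode has been sampled, whereas the TD(0) target is recomputed from the current weights and therefore shifts as learning proceeds.

import numpy as np

# Sketch: linear value function v_hat(s) = w . phi(s), updated towards two
# kinds of targets. Feature vectors and constants below are illustrative.

def mc_update(w, phi_s, G, alpha=0.1):
    # Monte Carlo target: the return G comes from a completed episode and
    # does not depend on the current parameters w (stationary target).
    return w + alpha * (G - w @ phi_s) * phi_s

def td_update(w, phi_s, r, phi_s_next, gamma=0.99, alpha=0.1):
    # Bootstrapped TD(0) target: r + gamma * v_hat(s') uses the current w,
    # so the target itself changes as w evolves (non-stationary target).
    target = r + gamma * (w @ phi_s_next)
    return w + alpha * (target - w @ phi_s) * phi_s

w = np.zeros(3)
phi_s = np.array([1.0, 0.0, 1.0])
phi_s_next = np.array([0.0, 1.0, 0.0])
w = mc_update(w, phi_s, G=1.0)
w = td_update(w, phi_s, r=0.0, phi_s_next=phi_s_next)
print(w)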
2. In a parameterised representation of the value function, we use a feature which acts as a counter
of some concept in the environment (number of cans the robot has collected, for example).
Does such a feature used for representing the state space lead to a violation of the Markov
property?

(a) no
(b) yes

Sol. (a)
As long as states contain adequate information so that the conditional probability distribution
of the future states depends only upon the current state of the environment, any kind of feature
can be used to describe the state space without violating the Markov property.
3. Which of the following will affect generalisation when using the tile coding method?

(a) modify the number of tiles in each tiling (assuming the range covered along each dimension
by the tilings remains unchanged)
(b) modify the number of tilings
(c) modify the size of tiles
(d) modify the shape of tiles

Sol. (a), (b), (c), (d)
Options (a) and (c) are equivalent and affect the range over which the value of one state
generalises to the values of other states. The same effect is also achieved by the number of
tilings: for example, more tilings (with different overlapping regions) result in more states
being affected by an update to the value of a single state. Finally, the shape of the tiles
affects generalisation because it lets us vary between uniform generalisation (for example,
using square tiles) and non-uniform generalisation (for example, using rectangular tiles),
where generalisation is more pronounced along some dimensions than along others.
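As an illustration, the sketch below (a 1-D tile coder with assumed tile width, number of tilings, and offsets; not part of the assignment) shows how nearby states share active tiles, which is what produces generalisation; changing the tile width or the number of tilings changes how many tiles two states share.

import numpy as np

# Sketch of 1-D tile coding: each tiling is offset by a fraction of the tile
# width, and a state activates exactly one tile per tiling. Parameters are
# illustrative assumptions.

def active_tiles(x, tile_width=0.5, num_tilings=4, low=0.0):
    tiles = []
    for t in range(num_tilings):
        offset = t * tile_width / num_tilings          # evenly spaced offsets
        idx = int(np.floor((x - low + offset) / tile_width))
        tiles.append((t, idx))                         # (tiling, tile index)
    return tiles

# Nearby states share some active tiles and hence generalise to each other;
# wider tiles or more tilings increase this sharing.
a = set(active_tiles(0.30))
b = set(active_tiles(0.40))
print(len(a & b), "of", len(a), "tiles shared")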

4. For a particular MDP, suppose we use function approximation and, using the gradient descent
approach, converge to the value function that is the global optimum. Is this value function,
in general, the same as the true value function of the MDP?

(a) no
(b) yes

Sol. (a)
The value function corresponding to the global optimum and the true value function need not
be the same: given the representation used, the function approximator may not even be able
to express the true value function.
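A toy numerical sketch (hypothetical values, not from the assignment) makes this concrete: if the representation forces two states to share the same feature vector, even the globally optimal weights cannot reproduce the true values.

import numpy as np

# Two states with different true values but identical (single) features:
# any linear approximation must assign them the same value.
v_true = np.array([1.0, 3.0])
Phi = np.array([[1.0],
                [1.0]])

# Least-squares solution: the global optimum under this representation.
w, *_ = np.linalg.lstsq(Phi, v_true, rcond=None)
print(Phi @ w)    # [2. 2.] -- the best expressible values, not the true ones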
5. Which of the following methods would benefit from normalising the magnitudes of the basis
functions?

(a) on-line gradient descent TD(λ)
(b) linear gradient descent Sarsa(λ)
(c) LSPI
(d) none of the above

Sol. (a), (b)
Order-of-magnitude differences in the values of the features can impact the performance of
gradient-descent procedures, and hence such methods can benefit from normalisation of the
inputs. LSPI, on the other hand, involves solving a system of linear equations, for which
procedures exist that are not affected by scaling issues.
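As a small sketch (with assumed feature ranges, not from the course material), normalisation simply rescales each feature into a comparable range before the gradient-descent update, so that a single step size is reasonable for every component of the weight vector.

import numpy as np

# Rescale each feature into [0, 1] using known (or estimated) ranges, so that
# no single feature dominates the gradient-descent update. Ranges below are
# illustrative assumptions.

def normalise(phi, low, high):
    return (phi - low) / (high - low)

low = np.array([0.0, 0.0])
high = np.array([1.0, 1000.0])          # second feature is on a much larger scale
phi_raw = np.array([0.7, 640.0])
print(normalise(phi_raw, low, high))    # -> [0.7, 0.64], comparable magnitudes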
6. Suppose that individual features, φi(s, a), used in the representation of the action value
function are non-linear functions of s and a. Is it possible to use the LSTDQ method in such
scenarios?

(a) no
(b) yes

Sol. (b)
When we call methods such as LSTDQ linear, we mean that the approximated function is
linear in the parameter vector; the features themselves may be arbitrary, non-linear functions
of s and a.
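For instance, the sketch below (with assumed radial-basis features over a 1-D state space; the centres, widths, and action count are hypothetical) is non-linear in s but still linear in the parameters w, which is all that LSTDQ requires.

import numpy as np

# Q_hat(s, a) = w . phi(s, a) with radial-basis features: non-linear in the
# state, linear in the parameters. All constants are illustrative assumptions.

CENTRES = np.array([0.0, 0.5, 1.0])
NUM_ACTIONS = 2

def phi(s, a, width=0.25):
    rbf = np.exp(-((s - CENTRES) ** 2) / (2 * width ** 2))   # non-linear in s
    feats = np.zeros(len(CENTRES) * NUM_ACTIONS)
    feats[a * len(CENTRES):(a + 1) * len(CENTRES)] = rbf     # one block per action
    return feats

def q_hat(w, s, a):
    return w @ phi(s, a)                                     # linear in w

w = np.ones(len(CENTRES) * NUM_ACTIONS)
print(q_hat(w, s=0.3, a=1))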

7. Which among the following statements about the LSTD and LSTDQ methods is/are correct?

(a) LSTD learns the state value function
(b) LSTDQ learns the action value function
(c) both LSTD and LSTDQ can reuse samples
(d) both LSTD and LSTDQ can be used along with tabular representations of value functions

Sol. (a), (b), (d)
Option (c) is not correct. Given a set of samples collected from the actual process (rather than
from a generative model, in which case reusing samples is perhaps not that important), it is
useful for these samples to be reused in evaluating different policies. Recall that LSPI is a
policy iteration algorithm in which the policy is constantly being improved upon (and hence
changing). Such sample reuse is possible in LSTDQ if, for any policy π, π(s′) is available for
each s′ in the set of samples: the same samples can then be used for different policies, since
an individual policy only determines the φ(s′, π(s′)) component of Ã. This is not the case
with LSTD, where this freedom in the choice of actions is not available, because the actions
must come from the policy that is being evaluated.
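The following sketch of the LSTDQ construction (assuming samples of the form (s, a, r, s′) and caller-supplied phi and pi functions; not the course's reference implementation) shows where the policy enters: only inside φ(s′, π(s′)), which is why the same batch of samples can be reused for every policy produced during LSPI.

import numpy as np

# Sketch of LSTDQ: build A_tilde and b_tilde from a fixed batch of samples.
# 'phi' maps (s, a) to a k-dimensional feature vector and 'pi' is the policy
# being evaluated; both are assumed to be supplied by the caller.

def lstdq(samples, phi, pi, k, gamma=0.99):
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        x = phi(s, a)
        x_next = phi(s_next, pi(s_next))      # the only place the policy appears
        A += np.outer(x, x - gamma * x_next)
        b += r * x
    # Small ridge term for numerical stability before solving A w = b.
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)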

8. Consider the five-state random walk task described in the book. There are five states,
{s1, s2, ..., s5}, in a row, with two actions each, left and right. There are two terminal states,
one at each end, with a reward of +1 for terminating on the right (after s5) and a reward of
0 for all other transitions, including the one terminating on the left (after s1). In designing a
linear function approximator, what is the least number of state features required to represent
the value function of the equiprobable random policy?

(a) 1
(b) 2
(c) 3
(d) 5

Sol. (a)
The values of the states s1 to s5 are 1/6, 2/6, ..., 5/6 respectively. Hence a single feature,
φ(si) = i, together with the weight w = 1/6, is adequate to represent these values exactly.
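As a quick check (a sketch, not part of the official solution), a single feature φ(si) = i with weight w = 1/6 reproduces these values.

import numpy as np

# Single feature phi(s_i) = i with weight w = 1/6 reproduces the values
# 1/6, 2/6, ..., 5/6 of the five random-walk states.
w = 1.0 / 6.0
print(np.array([w * i for i in range(1, 6)]))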
