
Reinforcement Learning Refresher

Dr. Somdyuti Paul


Department of Artificial Intelligence
IIT Kharagpur
https://somdyuti2.github.io/

Hands-on approach to AI for real-world applications
What is Reinforcement Learning?
● Reinforcement Learning (RL) is, simply put, learning by doing, or in other words, learning from experience.

● The objective of a learner, also referred to as an agent, is to figure out the best ways to achieve specific goals
under the constraints imposed by the system in which it acts, referred to as the environment.

(Figure: the interaction loop between the agent and the environment.)

What is Reinforcement Learning?
● At any instance, the information about the environment available to the agent is encapsulated by a random variable
called the state.
● The agent interacts with the environment by taking specific actions, which correspond to the permissible moves or
decisions in the environment.
● Actions result in rewards (or penalties), which the agent receives as feedback from the environment.

(Figure: the agent navigates toward a goal worth +100 points.)
What is Reinforcement Learning?
● The reward (positive or negative) allows the agent to update its rule for deciding on actions to
execute at each state, which is called its policy.

(Figure: a positive reward signals "Great! Follow the same path again.")

RL Notations and Markov Decision Process
● Notations used:
● St : state of the agent at step t
● At: action taken at step t
● Rt+1: reward received after executing At at St.

● An RL problem is typically formulated as a Markov Decision Process (MDP), which has the Markov property of being
memoryless, i.e. the next state depends only on the current state and action taken, and not on the history of past states.

Markov Property

Pr(St+1 = st+1 | St = st, St−1 = st−1, ⋯, S1 = s1) = Pr(St+1 = st+1 | St = st)
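
As a small illustration of this property (the states, probabilities, and rewards below are made up, not taken from the lecture), a tabular transition model only needs to be indexed by the current state and action; no history is required:

```python
import random

# Made-up transition model: for each (state, action) pair, a list of
# (probability, next_state, reward) outcomes. The distribution over next
# states depends only on the current state and action -- the Markov property.
P = {
    ("s1", "right"): [(0.9, "s2", -1.0), (0.1, "s1", -1.0)],
    ("s2", "right"): [(0.8, "s3", -1.0), (0.2, "s2", -1.0)],
}

def step(state, action):
    """Sample (next_state, reward) using only the current state and action."""
    outcomes = P[(state, action)]
    probs = [p for p, _, _ in outcomes]
    _, next_state, reward = random.choices(outcomes, weights=probs, k=1)[0]
    return next_state, reward
```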

RL Assignment: Question 1

You are developing a reinforcement learning-based system for an autonomous drone tasked with delivering
packages in an urban environment. The drone must navigate through a city, avoiding obstacles such as
buildings, trees, and other flying objects. It must also optimize its flight path to minimize delivery time while
ensuring safety and battery efficiency. The drone is equipped with sensors for detecting obstacles and GPS for
navigation.

Formulate this problem as an RL task. Specify the following components of the associated MDP:
• State Space (S): Describe how you would define the state space for this problem.
• Action Space (A): Define the action space available for the drone.
• Reward Function (R): Propose a reward function that encourages efficient and safe deliveries.

RL Assignment: Question 1
Solution:

State Space (S): The state space should capture all relevant information about the drone's current situation and environment. A possible state
representation could include:
• The current position and altitude of the drone.
• The current velocity of the drone.
• The position of the delivery destination.
• Obstacles in the vicinity.
• The remaining battery level of the drone.
• The current weather conditions (e.g., wind speed, rain).

Action Space (A): The action space consists of the set of possible maneuvers the drone can perform. These actions include:
• Move Forward: Increase the drone's forward velocity.
• Move Backward: Decrease the drone's forward velocity.
• Ascend: Increase the drone's altitude.
• Descend: Decrease the drone's altitude.
• Turn Left: Adjust the drone's direction to the left.
• Turn Right: Adjust the drone's direction to the right.
• Hover: Maintain the current position and altitude.
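
As an illustrative sketch only (the class and field names below are hypothetical and not tied to any particular simulator), the proposed state and action spaces could be encoded as:

```python
from dataclasses import dataclass
from enum import Enum, auto

class DroneAction(Enum):
    """Discrete action space from the solution above."""
    MOVE_FORWARD = auto()
    MOVE_BACKWARD = auto()
    ASCEND = auto()
    DESCEND = auto()
    TURN_LEFT = auto()
    TURN_RIGHT = auto()
    HOVER = auto()

@dataclass
class DroneState:
    """One possible state representation; fields mirror the bullet list above."""
    position: tuple[float, float, float]            # (x, y, altitude)
    velocity: tuple[float, float, float]
    destination: tuple[float, float]
    nearby_obstacles: list[tuple[float, float, float]]
    battery_level: float                            # remaining charge, 0-100 %
    wind_speed: float                               # simple stand-in for weather
```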

RL Assignment: Question 1

Reward Function (R): The reward function should incentivize efficient, safe, and timely deliveries. A possible reward
function could be:
• +100 for successfully delivering the package to the destination.
• -1 for each second of flight time to encourage minimizing delivery time.
• -10 for collisions with obstacles or other drones.
• -0.1 point for each percent drop in battery level to encourage energy efficiency.
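
A minimal sketch of this reward scheme, assuming a hypothetical info dictionary with delivered, collided, and battery_drop_pct fields supplied by the simulator at each step:

```python
def drone_reward(info: dict, dt_seconds: float = 1.0) -> float:
    """Reward for one time step, following the scheme proposed above."""
    reward = 0.0
    if info.get("delivered"):
        reward += 100.0                                 # successful delivery
    reward -= 1.0 * dt_seconds                          # per-second flight-time penalty
    if info.get("collided"):
        reward -= 10.0                                  # collision with obstacle or drone
    reward -= 0.1 * info.get("battery_drop_pct", 0.0)   # energy-efficiency penalty
    return reward
```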

Discounted Rewards and Return
● Often, an immediate reward is more valuable
to an agent than the same or even higher
reward received in the future.
● e.g., in the task of managing an investment
portfolio with RL, an action that yields $100
today might be preferred over another
action that yields $100 after 12 months.

● To account for this effect, a discount factor γ ∈ (0, 1] is introduced to scale future rewards.
● The cumulative discounted reward is called the return:

Gt = Rt+1 + γRt+2 + γ²Rt+3 + ⋯ + γ^(T−t−1)RT = Rt+1 + γGt+1
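
A short sketch of computing the return for a finite episode, both directly from the definition and via the recursion Gt = Rt+1 + γGt+1 (the reward values are illustrative):

```python
def discounted_return(rewards, gamma=0.9):
    """G_t computed directly from the definition, with rewards = [R_{t+1}, ..., R_T]."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

def returns_from_rewards(rewards, gamma=0.9):
    """All returns G_t via the backward recursion G_t = R_{t+1} + gamma * G_{t+1}."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return list(reversed(out))

rewards = [-1, -1, -1, 10]   # example episode
assert abs(discounted_return(rewards) - returns_from_rewards(rewards)[0]) < 1e-9
```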

State and Action Value Functions
● Value functions are used to quantify the desirability of particular states or (state, action) pairs from the agent's perspective.
● State Value Function V(s): how good is it to be in state s.
● Action Value Function Q(s, a): how good is it to be in state s and take action a in state s.

● The value functions are learned with respect to a policy (π).


● Learning the value functions enables the agent to find the optimal policy, which is the
policy that maximizes the agent’s return.

Exploration vs. Exploitation Tradeoff
● While learning from the environment, the agent has the following choices:
● Exploration: trying new actions to explore new states with unknown rewards.
● Exploitation: following past actions to visit known states with known rewards.
(Figure: "Where to dine?")
● An agent that does not explore the environment enough may miss out on discovering actions that could lead to
potentially higher rewards.
● An agent that keeps exploring cannot effectively maximize its return, as it keeps on selecting suboptimal actions
without leveraging any information learned about the environment.

Exploration vs. Exploitation Tradeoff
● A simple, yet effective strategy to balance exploration and exploitation is to randomly choose between
exploration and exploitation at each learning step:
● With probability ϵ, the agent selects a random action at a given state (exploration).
● With probability 1 − ϵ, the agent selects the greedy action, i.e. the action which has the maximum action
value (q-value) at a given state (exploitation).
● The above policy is called ϵ-greedy.
(Figure: draw Z ∼ U[0, 1); if Z ≤ ϵ, pick a random action; if Z > ϵ, pick the greedy action.)
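
A minimal sketch of ϵ-greedy action selection, assuming the q-values are stored in a NumPy array of shape (n_states, n_actions):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, state, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))   # exploration: random action
    return int(np.argmax(Q[state]))            # exploitation: greedy action
```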

Exploration vs. Exploitation Tradeoff
● To encourage effective exploration in the early stages of learning, a large ϵ is chosen initially.
● ϵ is gradually reduced as learning progresses, i.e. as more information about the environment becomes
available to the agent for effective exploitation.

Epsilon Decay

ϵt = max(ϵmin, ϵ0 ⋅ δ^t), where δ is the decay rate (e.g., δ = 0.99)

● A fully trained agent relies on exploitation to maximize its return.
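
The decay schedule above as a small function; the values of ϵ0, ϵmin, and δ are illustrative defaults, not prescribed by the lecture:

```python
def epsilon_at(t, eps0=1.0, eps_min=0.05, delta=0.99):
    """Epsilon at learning step t: eps_t = max(eps_min, eps0 * delta**t)."""
    return max(eps_min, eps0 * delta ** t)

# epsilon_at(0) == 1.0, epsilon_at(100) is about 0.366, and it bottoms out at 0.05
```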

The Q-Table
For finite state-action spaces, the q-value function could be represented as a table:
Consider a 4 × 4 gridworld with 16 states, where s1 corresponds to cell (0, 0) and s16 to cell (3, 3), and 4 actions
(a1: ↑, a2: ↓, a3: ←, a4: →). The corresponding Q-table is a 16 × 4 matrix:

States       a1: ↑        a2: ↓        a3: ←        a4: →
s1: (0,0)    Q(s1, a1)    Q(s1, a2)    Q(s1, a3)    Q(s1, a4)
s2: (0,1)    Q(s2, a1)    Q(s2, a2)    Q(s2, a3)    Q(s2, a4)
s3: (0,2)    Q(s3, a1)    Q(s3, a2)    Q(s3, a3)    Q(s3, a4)
s4: (0,3)    Q(s4, a1)    Q(s4, a2)    Q(s4, a3)    Q(s4, a4)
⋮            ⋮            ⋮            ⋮            ⋮
s16: (3,3)   Q(s16, a1)   Q(s16, a2)   Q(s16, a3)   Q(s16, a4)
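
A minimal sketch of this Q-table as a NumPy array, with a small helper (hypothetical, for illustration) that maps a grid cell to its row in the table:

```python
import numpy as np

n_rows, n_cols, n_actions = 4, 4, 4
Q = np.zeros((n_rows * n_cols, n_actions))   # 16 x 4 table, initialized to zero

def state_index(row, col):
    """Map grid cell (row, col) to its row in the Q-table: s1 -> 0, ..., s16 -> 15."""
    return row * n_cols + col

Q[state_index(0, 0), 3] = -0.1   # e.g. set Q(s1, a4)
```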

Q-learning
● The goal of Q-learning is to learn the action values in the Q-table, such that the learned q-values give a policy
that maximizes the agent's return.
● This learning is accomplished using a recursive formulation called the Bellman equation, which allows the q-value
function of a given state to be expressed in terms of the q-value function of the next state, under a given policy π.

The Bellman Equation

Qπ(s, a) = 𝔼[Rt+1 + γQπ(St+1, At+1) | St = s, At = a]

This is the expected return on taking action a in state s, and following policy π thereafter.

Q-learning
The Q-learning loop (the behavior policy is taken to be ϵ-greedy):

1. Initialize the Q-table.
2. Pick an action A for the current state S using the behavior policy.
3. Perform action A.
4. Observe the reward R and the next state S′.
5. Update the Q-table using the update rule (based on the Bellman equation):
   Q(S, A) ← Q(S, A) + α[R + γ max_a Q(S′, a) − Q(S, A)]
6. Update the current state and repeat from step 2.



Understanding the Q-learning Update Rule
From the Bellman equation, we can estimate the q-value of taking action A at state S and following a greedy policy
thereafter as follows:

Q̂(S, A) = R + γ max_a Q(S′, a)

The difference between the q-value estimated from the Bellman equation and the current q-value is called the
temporal difference error, or TD-error:

TD-Error = Q̂(S, A) − Q(S, A)

The q-value is updated to correct the previous estimate according to the new estimate:

Q(S, A) ← Q(S, A) + α ⋅ TD-Error, where α is the learning rate, i.e.
Q(S, A) ← Q(S, A) + α[R + γ max_a Q(S′, a) − Q(S, A)]
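
Putting the last two slides together, here is a minimal tabular Q-learning sketch. It assumes the Gymnasium package listed in the resources slide and uses its FrozenLake-v1 environment as a convenient stand-in with discrete states and actions; the environment choice and hyperparameter values are illustrative, not part of the original example.

```python
import numpy as np
import gymnasium as gym   # assumes the Gymnasium package from the resources slide

env = gym.make("FrozenLake-v1")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma = 0.1, 0.9                 # learning rate and discount factor
eps, eps_min, decay = 1.0, 0.05, 0.99   # epsilon-greedy schedule
rng = np.random.default_rng(0)

for episode in range(5000):
    state, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy behavior policy
        if rng.random() < eps:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # TD error and update rule from the slide
        td_error = reward + gamma * np.max(Q[next_state]) - Q[state, action]
        Q[state, action] += alpha * td_error
        state = next_state
    eps = max(eps_min, eps * decay)     # decay epsilon between episodes
```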



Q-learning Example
Consider the following 3 × 3 gridworld, where the robot’s goal is to navigate from the initial state, s1,
to the goal state, s9, to make a delivery while avoiding potential obstacles or hazards.

• On visiting any state, the robot's battery depletes by 1%, which corresponds to a reward of -1.
• On visiting the state s5, which has a hazard, the robot gets damaged, cannot continue its task, and has to start
again from s1, which corresponds to a reward of -10.
• On making a successful delivery by reaching the goal state, the robot receives a reward of +10.

Gridworld layout (reward received on visiting each state):
s1: -1   s2: -1    s3: -1
s4: -1   s5: -10   s6: -1
s7: -1   s8: -1    s9: +10

Q-learning Example
• The Q-table is initialized with 0s. Let γ = 0.9 and α = 0.1.
• Let the agent take the right action (a4) from its initial state s1, reaching s2 and getting a reward of -1.
• The TD error for Q(s1, a4) is thus
  (R + γ ⋅ max_a Q(s2, a)) − Q(s1, a4) = (−1 + 0.9 × 0) − 0 = −1
• So, Q(s1, a4) is updated as
  Q(s1, a4) ← Q(s1, a4) + α ⋅ TD-Error
  Q(s1, a4) ← 0 + 0.1 ⋅ (−1) = −0.1

(Actions: a1: up, a2: down, a3: left, a4: right.)
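
The same single update reproduced numerically, as a quick check:

```python
alpha, gamma = 0.1, 0.9
Q_s1_a4, max_Q_s2, reward = 0.0, 0.0, -1.0   # initial Q-table is all zeros

td_error = reward + gamma * max_Q_s2 - Q_s1_a4   # -1.0
Q_s1_a4 += alpha * td_error                      # -0.1
print(td_error, Q_s1_a4)                         # -1.0 -0.1
```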

Q-learning Example
Suppose the Q-table learned by the agent after 1 episode is as shown. (Figure: Q-table after 1 episode.)

In the next episode, if the agent starting at state s1 takes the right action, the corresponding update to
Q(s1, a4) would be performed as follows:

TD-Error = (R + γ ⋅ max_a Q(s2, a)) − Q(s1, a4)
         = (−1 + 0.9 × 0) − (−0.19)
         = −0.81

Q(s1, a4) ← Q(s1, a4) + α ⋅ TD-Error
Q(s1, a4) ← −0.19 + 0.1 ⋅ (−0.81) = −0.271

Q-learning Example
After several episodes, the q-values converge to the values shown in the learned Q-table.

(Figures: the learned Q-table, and the optimal policy of the agent using the learned Q-table, shown on the
3 × 3 gridworld.)
RL Assignment Question 3
In a simple MDP, an agent is in a state s, and the actions it can take can lead to the following outcomes:
With probability 0.4, the agent transitions to state s′, with reward R = 10, and v(s′) = 5.
With probability 0.6, the agent transitions to state s′′, with reward R = 2, and v(s′′) = 3.

The discount factor γ is 0.5. Using the Bellman equation, find the expected value of state s.

Solution:
vπ(s) = 𝔼π[Rt+1 + γvπ(St+1) | St = s]
According to the given policy,
𝔼[Rt+1] = 0.4 × 10 + 0.6 × 2 = 5.2
𝔼[vπ(St+1)] = 0.4 × 5 + 0.6 × 3 = 3.8
So, we have:
vπ(s) = 5.2 + 0.5 × 3.8 = 7.1
Thus, the expected value of state s is 7.1.
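
A quick numerical check of this result:

```python
gamma = 0.5
outcomes = [(0.4, 10, 5), (0.6, 2, 3)]   # (probability, reward, v(next state))

expected_reward = sum(p * r for p, r, _ in outcomes)       # 5.2
expected_next_value = sum(p * v for p, _, v in outcomes)   # 3.8
print(expected_reward + gamma * expected_next_value)       # 7.1
```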
Key Learnings

This lecture should enable you to:


• Identify if a particular problem is suitable to be formulated as an RL problem.
• Identify states, actions and rewards in a particular environment.
• Understand the exploration-exploitation trade-off involved in RL
• Understand the basics of Q-learning
• Apply concepts to simple scenarios

RL Resources
• This was a very high-level overview!
• Resources for learning RL:
• Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
• RL course lectures by David Silver
• Resources for implementing RL algorithms:
• Gymnasium - a diverse collection of RL environments, including classic control, robotics, games etc.
• Stable Baselines3 - implementations of well-known and useful RL algorithms in PyTorch with trained
models and example code snippets.
• TensorFlow Agents - A TensorFlow based library for implementing and testing RL algorithms.
• https://www.reddit.com/r/reinforcementlearning/ - community for discussing everything related to RL
(such as project ideas, help in understanding research papers, etc.)
Thank You!!
Presentation link:
https://www.icloud.com/keynote/08fwyHyWUKQb32IpZIR2Nvoww#RL%5FOctober%5F19

