RL Unit 1

The document provides an overview of Reinforcement Learning (RL), its principles, and applications, emphasizing its difference from supervised learning. It outlines course objectives, key concepts such as policies, reward functions, and types of reinforcement, as well as historical developments in the field. Additionally, it discusses examples like Tic-Tac-Toe and highlights the evolution of RL techniques over time.

Reinforcement Learning

Ms Aayushi Jain
Assistant Professor
Dept. of CSE/IT
Contact Email:
jain.aayushi@manipal.edu
Course Objectives and Outcomes
This course will enable learners to
• Learn how to define RL Tasks
• Understand the core principles behind RL
• Understand policies and value functions
At the end of this course, learners will be able to
• Understand the Reinforcement Learning Problem.
• Develop an insight into the Basics of Probability and Multi-arm Bandits.
• Recognize Finite Markov Decision Processes and Dynamic
Programming Techniques.
• Understand Monte Carlo Methods and Temporal-Difference Learning.
• Enumerate Approximate Solution Methods
What is ML?

• Machine learning (ML) is a branch of artificial intelligence (AI) and computer science that focuses on using data and algorithms to enable AI to imitate the way that humans learn, gradually improving its accuracy. (Source: IBM)
Reinforcement Learning
• Reinforcement learning is an area of machine learning. It is about taking suitable actions to maximize reward in a particular situation.

• It is employed by various software systems and machines to find the best possible behavior or path to take in a specific situation.

• It differs from supervised learning: in supervised learning, the training data comes with the answer key, so the model is trained with the correct answers themselves, whereas in reinforcement learning there is no answer key; the reinforcement agent decides what to do to perform the given task.
Reinforcement Learning
• In the absence of a training dataset, it is bound to learn from its own experience.
• Reinforcement Learning (RL) is the science of decision making.
• It is about learning the optimal behavior in an environment so as to obtain maximum reward.
• In RL, data is accumulated by the learning system itself through trial and error; a labeled dataset is not part of the input, as it would be in supervised or unsupervised machine learning.
Reinforcement Learning(contd..)
• It uses algorithms that learn from outcomes and decide which action to take next.
After each action, the algorithm receives feedback that helps it determine whether
the choice it made was correct, neutral or incorrect.
• It is a good technique to use for automated systems that have to make a lot of
small decisions without human guidance.
Example

[Figure: a grid world with a robot, a diamond (the reward), and fire (the hurdles)]

Example (contd..)

The image shows a robot, a diamond, and fire. The goal of the robot is to get the reward, the diamond, while avoiding the hurdles, the fire. The robot learns by trying all the possible paths and then choosing the path that reaches the reward with the fewest hurdles. Each correct step gives the robot a reward, and each wrong step subtracts from the robot's reward. The total reward is calculated when it reaches the final reward, the diamond.
Main points in Reinforcement learning

Input: The input is an initial state from which the model starts.
Output: There are many possible outputs, as there are a variety of solutions to a particular problem.
Training: The training is based on the input; the model returns a state, and the user decides whether to reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
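As a concrete illustration of this loop, here is a minimal trial-and-error sketch in Python. The environment, its methods (reset/actions/step), and the reward scheme are illustrative assumptions, not part of the slides.

import random

def run_episode(env, q_values, epsilon=0.1):
    """One pass of the trial-and-error loop described above."""
    state = env.reset()                  # input: an initial state
    total_reward = 0.0
    done = False
    while not done:
        # Explore occasionally; otherwise exploit what has been learned.
        if random.random() < epsilon:
            action = random.choice(env.actions(state))
        else:
            action = max(env.actions(state),
                         key=lambda a: q_values.get((state, a), 0.0))
        # The environment returns the next state and a reward signal
        # (the "reward or punish" feedback in the slides).
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward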
Difference between Reinforcement and Supervised
Learning

Reinforcement learning:
• Decisions are made sequentially. In simple words, the output depends on the state of the current input, and the next input depends on the output of the previous input.
• Decisions are dependent on one another, so labels are given to sequences of dependent decisions.
• Examples: chess, text summarization.

Supervised learning:
• The decision is made on the initial input, or the input given at the start.
• Decisions are independent of each other, so a label is given to each decision.
• Examples: object recognition, spam detection.
Types of Reinforcement Learning

There are two types of reinforcement:

• Positive: Positive reinforcement occurs when an event, occurring as a consequence of a particular behavior, increases the strength and frequency of that behavior. In other words, it has a positive effect on the behavior.
• Advantages of positive reinforcement:
  • Maximizes performance
  • Sustains change for a long period of time
• Disadvantage: too much reinforcement can lead to an overload of states, which can diminish the results.
Types of Reinforcement Learning

• Negative: Negative reinforcement is the strengthening of a behavior because a negative condition is stopped or avoided.

• Advantages of negative reinforcement:
  1. Increases the desired behavior.
  2. Ensures a minimum standard of performance is met.
• Disadvantage: it only provides enough to meet the minimum behavior.
Elements of Reinforcement Learning
Reinforcement learning elements are as follows:
• Policy
• Reward function
• Value function
• Model of the environment
• Policy: A policy defines the learning agent's way of behaving at a given time. It is a mapping from perceived states of the environment to the actions to be taken when in those states.
• Reward function: The reward function defines the goal in a reinforcement learning problem. It provides a numerical score based on the state of the environment.
Elements of Reinforcement Learning
• Value function: Value functions specify what is good in the long run. The value
of a state is the total amount of reward an agent can expect to accumulate over the
future, starting from that state.
• Model of the environment: A model mimics the behavior of the environment, allowing the agent to predict what will happen next; models are used for planning.
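To make these four elements concrete, here is a minimal sketch in Python for a tiny grid world. All names and the environment itself are illustrative assumptions, not part of the slides.

# Policy: a mapping from perceived states to actions.
policy = {"s0": "right", "s1": "right", "s2": "down"}

# Reward function: a numerical score based on the state of the environment.
def reward(state):
    return 10.0 if state == "goal" else -1.0   # goal is good, each step costs

# Value function: expected total reward accumulated starting from a state;
# here just a table that a learning algorithm would fill in over time.
value = {"s0": 0.0, "s1": 0.0, "s2": 0.0, "goal": 0.0}

# Model of the environment: predicts the next state, used for planning.
model = {("s0", "right"): "s1", ("s1", "right"): "s2", ("s2", "down"): "goal"}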
Tic-Tac-Toe Game (Lookup Table Approach)
• It's a paper-and-pencil game for two players, X and O, who take turns marking spaces on a 3×3 grid.
• The game is won by the player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal row.
Tic-Tac-Toe Game Playing
• Two Players – Human, Computer
• The objective is to write a computer program such that the computer wins most of the time.
• Three approaches to playing this game are presented, which increase in:
  • Complexity
  • Use of generalization
  • Clarity of their knowledge
  • Extensibility of their approach
Tic-Tac-Toe Game Playing
• Let us assume we have two players: Player 1, represented as X, and Player 2, represented as O.
• The player who first gets three consecutive marks wins the game.
Tic-Tac-Toe Game Playing

• Data structures: board and move table

• Consider the board as a nine-element vector. Each element contains:
  • 0 for blank
  • 1 for an X move
  • 2 for an O move
• The computer may play as X or O.
• The first player is always X.
Tic-Tac-Toe Game Playing

• Move table:
  • It is a vector of 3^9 elements, each element of which is a nine-element vector representing a board position.
  • In total, 19683 entries are present in the move table.
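A sketch of how the lookup works: the board vector is read as a base-3 number, which indexes into the move table. The function name and the commented usage are illustrative assumptions.

def board_to_index(board):
    """Interpret the nine-element board (digits 0/1/2) as a base-3 number."""
    index = 0
    for cell in board:          # board[0] is the most significant digit
        index = index * 3 + cell
    return index

# The move table has 3**9 = 19683 entries; entry i is the new nine-element
# board vector to play when the current board encodes to index i.
# move_table = [...]  # filled in by hand in this approach
# new_board = move_table[board_to_index(current_board)]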
Tic-Tac-Toe Game Playing
• Advantages
  • This program is very efficient in time.
• Disadvantages
  • A lot of space is needed to store the move table.
  • A lot of work is needed to specify all the entries in the move table.
  • Difficult to extend.
  • Highly error-prone, since the large amount of data must be entered by hand.
  • Poor extensibility.
  • Not an intelligent program.
Tic-Tac-Toe Game Playing (Magic Squares)
• Here we assign each board position an element of a magic square.
• The sum of every row, column, and diagonal of the magic square is 15.
Tic-Tac-Toe Game Playing(Magic Squares)

• Algorithm:
  • First, the machine checks its own chances to win:
    - Compute the difference between 15 and the sum of two of its squares.
    - If this difference is not positive, or is greater than 9, then the original two squares are not collinear and can be ignored.
    - If the difference names an empty square, playing it wins the game.
  • Otherwise, it checks whether the opponent can win and blocks that square.
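A minimal sketch of this check in Python. The magic-square layout below is the standard Lo Shu square (one common choice); the function name and commented usage are illustrative assumptions.

# Standard 3x3 magic square: every row, column, and diagonal sums to 15.
#   8 1 6
#   3 5 7
#   4 9 2

def winning_square(owned, occupied):
    """Given the magic values of the squares a player owns and the set of
    all occupied magic values, return the value of an empty square that
    completes a line summing to 15, or None if there is none."""
    for i, a in enumerate(owned):
        for b in owned[i + 1:]:
            diff = 15 - (a + b)
            # Not positive, greater than 9, or duplicating an owned square:
            # the two squares are not collinear, so ignore the pair.
            if diff < 1 or diff > 9 or diff in (a, b):
                continue
            if diff not in occupied:        # the completing square is empty
                return diff
    return None

# First look for the computer's own win; otherwise block the human's win.
# move = winning_square(computer_squares, occupied) or \
#        winning_square(human_squares, occupied)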
Tic-Tac-Toe Game Playing (Magic Squares)

• Difference = 15 − (4 + 5) = 15 − 9 = 6. Square 6 is not empty in the current state, so the computer cannot win; it then checks the human's chances of winning.
• Difference = 15 − (8 + 6) = 15 − 14 = 1. Square 1 is free, so the human has a chance to win; the computer must block square 1 on its next move.
Tic-Tac-Toe Game Playing (Magic Squares)

• Difference = 15 − (5 + 4) = 6. Square 6 is not empty, so the computer can't win on that line.
• Difference = 15 − (1 + 4) = 10. 10 is greater than 9, so the squares are not collinear and the computer can't win on that line.
• Difference = 15 − (1 + 5) = 9. Square 9 is empty, so the computer wins the game by moving to 9.
Tic-Tac-Toe Game (Third Approach)
Data structures required:
A board position is a structure containing
• A nine-element array representing the board.
• A list of board positions that could result from the next move.
• A number (rating) representing an estimate of how likely the position is to lead to a win.
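A minimal sketch of this structure in Python; the field names are illustrative assumptions.

from dataclasses import dataclass, field
from typing import List

@dataclass
class BoardPosition:
    board: List[int] = field(default_factory=lambda: [0] * 9)          # 0 blank, 1 X, 2 O
    successors: List["BoardPosition"] = field(default_factory=list)    # positions reachable by the next move
    rating: float = 0.0    # estimate of how likely this position is to lead to a win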
Tic-Tac-Toe Game (Third Approach)

• This program requires more time than the other approaches: it has to search a tree representing all possible move sequences before making each move (see the sketch below).
• This approach is extensible:
  • It can handle three-dimensional tic-tac-toe.
  • It could be extended to handle games that are more complicated than tic-tac-toe.
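The slides do not name the tree-search algorithm; a minimax search is one standard way to realize it, sketched here under the board encoding defined earlier (0 blank, 1 X, 2 O). All function names are illustrative.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),    # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),    # columns
         (0, 4, 8), (2, 4, 6)]               # diagonals

def winner(board):
    """Return 1 or 2 if that player has three in a row, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def minimax(board, player, me):
    """Rate `board` from `me`'s point of view; `player` moves next."""
    w = winner(board)
    if w:
        return 1 if w == me else -1
    if 0 not in board:
        return 0                             # draw
    scores = []
    for i in range(9):
        if board[i] == 0:
            board[i] = player
            scores.append(minimax(board, 3 - player, me))  # 3 - p flips 1/2
            board[i] = 0
    return max(scores) if player == me else min(scores)

def best_move(board, me):
    """Search all move sequences and pick the highest-rated next move."""
    def score(i):
        board[i] = me
        s = minimax(board, 3 - me, me)
        board[i] = 0
        return s
    return max((i for i in range(9) if board[i] == 0), key=score)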
History of Reinforcement Learning

• 1911: The Law of Effect, by Thorndike, states that behavioral responses most proximal to a satisfying result are more likely to become patterns.
• 1948: Alan Turing described a machine called the P-type, consisting of neuron-like elements connected into networks. It had two interfering inputs: pleasure/reward and pain/punishment. He observed that this resembles the way a child learns, which is analogous to the training procedure of a machine.
• 1950s to 1960s: Richard Bellman developed the Bellman equation to identify the optimal return function (see the equation after this list).
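For reference, the Bellman optimality equation in its standard modern form (not written out on the slides) expresses the optimal value of a state recursively:

V*(s) = max_a Σ_{s'} P(s' | s, a) [ R(s, a, s') + γ V*(s') ]

where P(s' | s, a) is the transition probability, R the reward, and γ ∈ [0, 1) the discount factor.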
History of Reinforcement Learning

• 1950s to 1960s: Ronald Howard developed the Markov decision process, a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.
• 1960s to 1970s: Minsky's credit assignment problem refers to the problem of measuring the influence and impact of an action taken by an agent on future rewards. The core aim is to guide the agent to take corrective actions that maximize the reward.
Credit Assignment Problem
History of Reinforcement Learning

• 1970s to 1980s: Creation of low-memory machines that improve the probability of rewards, called learning automata.
• A learning automaton is an adaptive decision-making unit situated in a random environment that learns the optimal action through repeated interactions with its environment.
• The actions are chosen according to a specific probability distribution, which is updated based on the response the automaton obtains from the environment when it performs a particular action.
History of Reinforcement Learning

• 1970s to 1980s: Temporal-difference (TD) learning. In reinforcement learning it works as an unsupervised prediction method: it helps predict the total expected future reward.
• The trick is simple: instead of trying to compute the total future reward directly, temporal-difference learning estimates a mix of the immediate reward and its own reward prediction for the next moment.
• It aims to bring the old prediction and the new prediction together, matching expectations with reality and gradually increasing the accuracy of the entire chain of predictions (see the update rule below).
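Stated concisely, this is the standard TD(0) update (a well-known formula, not written out on the slides):

V(s_t) ← V(s_t) + α [ r_{t+1} + γ V(s_{t+1}) − V(s_t) ]

where α is the step size (learning rate), γ the discount factor, and the bracketed term is the TD error.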
Example of Temporal-Difference Learning
Parameters in Temporal-Difference Learning
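To illustrate the parameters involved (α, γ, number of episodes), here is a minimal TD(0) prediction sketch in Python. The environment interface (reset/step) and the policy are assumptions in the style of common RL toolkits, not part of the slides.

import collections

def td0_prediction(env, policy, episodes=1000, alpha=0.1, gamma=0.9):
    """Estimate state values V under `policy` using the TD(0) update."""
    V = collections.defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            next_state, reward, done = env.step(policy(state))
            # TD target: immediate reward plus the discounted prediction
            # for the next moment (zero beyond a terminal state).
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])   # move toward the target
            state = next_state
    return V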
History of Reinforcement Learning
• 1990s: TD-Gammon, a computer backgammon program developed in 1992 by Gerald Tesauro at IBM's Thomas J. Watson Research Center.
• Its name comes from the fact that it is an artificial neural network trained by a form of temporal-difference learning, specifically TD-Lambda.
• 2000s to 2020s: AlphaGo, the first computer program to defeat a professional human Go player and a Go world champion.
• Google DeepMind patented an application of deep Q-learning that can play Atari 2600 games at expert human level.
• Deep reinforcement learning (DRL) for autonomous driving vehicles and robotics.
