RL Unit 1

The document provides an overview of Reinforcement Learning (RL), its principles, and applications, emphasizing its difference from supervised learning. It outlines course objectives, key concepts such as policies, reward functions, and types of reinforcement, as well as historical developments in the field. Additionally, it discusses examples like Tic-Tac-Toe and highlights the evolution of RL techniques over time.

Reinforcement Learning

Ms Aayushi Jain
Assistant Professor
Dept. of CSE/IT
Contact Email:
jain.aayushi@manipal.edu
Course Objectives and Outcomes
This course will enable learners to
• Learn how to define RL Tasks
• Understand the core principles behind RL
• Understand policies and value functions
At the end of this course, learners will be able to
• Understand the Reinforcement Learning Problem.
• Develop an insight into the Basics of Probability and Multi-arm Bandits.
• Recognize Finite Markov Decision Processes and Dynamic
Programming Techniques.
• Understand Monte Carlo Methods and Temporal-Difference Learning.
• Enumerate Approximate Solution Methods
What is ML?

• Machine learning (ML) is a branch of artificial intelligence (AI) and computer science that focuses on using data and algorithms to enable AI to imitate the way that humans learn, gradually improving its accuracy. (Source: IBM)
Reinforcement Learning
• Reinforcement learning is an area of machine learning. It is about taking suitable actions to maximize reward in a particular situation.

• It is employed by various software systems and machines to find the best possible behavior or path to take in a specific situation.

• It differs from supervised learning: in supervised learning, the training data comes with the answer key, so the model is trained with the correct answers themselves, whereas in reinforcement learning there is no answer key; the reinforcement agent decides what to do to perform the given task.
Reinforcement Learning
• In the absence of a training dataset, it is bound to learn from its own experience.
• Reinforcement Learning (RL) is the science of decision making.
• It is about learning the optimal behavior in an environment so as to obtain maximum reward.
• In RL, data is accumulated by the learning system itself through trial and error; a labeled dataset is not part of the input, as it would be in supervised or unsupervised machine learning.
Reinforcement Learning(contd..)
• It uses algorithms that learn from outcomes and decide which action to take next.
After each action, the algorithm receives feedback that helps it determine whether
the choice it made was correct, neutral or incorrect.
• It is a good technique to use for automated systems that have to make a lot of
small decisions without human guidance.
Example

[Figure: a grid world with a robot, a diamond (the reward), and fire (the hurdles)]

Example (contd..)

The image shows a robot, a diamond, and fire. The goal of the robot is to get the reward, the diamond, while avoiding the hurdles, the fire. The robot learns by trying all the possible paths and then choosing the path that reaches the reward with the fewest hurdles. Each correct step gives the robot a reward, and each wrong step subtracts from the robot's reward. The total reward is calculated when it reaches the final reward, the diamond.
Main points in Reinforcement learning

Input: The input is an initial state from which the model starts.
Output: There are many possible outputs, as there are a variety of solutions to a particular problem.
Training: The training is based on the input; the model returns a state, and the user decides whether to reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
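As a concrete illustration of this loop, here is a minimal trial-and-error sketch in Python. The environment, its methods (reset/actions/step), and the reward scheme are illustrative assumptions, not part of the slides.

import random

def run_episode(env, q_values, epsilon=0.1):
    """One pass of the trial-and-error loop described above."""
    state = env.reset()                  # input: an initial state
    total_reward = 0.0
    done = False
    while not done:
        # Explore occasionally; otherwise exploit what has been learned.
        if random.random() < epsilon:
            action = random.choice(env.actions(state))
        else:
            action = max(env.actions(state),
                         key=lambda a: q_values.get((state, a), 0.0))
        # The environment returns the next state and a reward signal
        # (the "reward or punish" feedback in the slides).
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward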
Difference between Reinforcement and Supervised
Learning

Reinforcement learning:
• Decisions are made sequentially. In simple words, the output depends on the state of the current input, and the next input depends on the output of the previous input.
• Decisions are dependent on one another, so labels are given to sequences of dependent decisions.
• Examples: chess, text summarization.

Supervised learning:
• The decision is made on the initial input, or the input given at the start.
• Decisions are independent of each other, so a label is given to each decision.
• Examples: object recognition, spam detection.
Types of Reinforcement Learning

There are two types of reinforcement:

• Positive: Positive reinforcement occurs when an event, occurring as a consequence of a particular behavior, increases the strength and frequency of that behavior. In other words, it has a positive effect on the behavior.
• Advantages of positive reinforcement:
  • Maximizes performance
  • Sustains change for a long period of time
• Disadvantage: too much reinforcement can lead to an overload of states, which can diminish the results.
Types of Reinforcement Learning

• Negative: Negative reinforcement is the strengthening of a behavior because a negative condition is stopped or avoided.

• Advantages of negative reinforcement:
  1. Increases the desired behavior.
  2. Ensures a minimum standard of performance is met.
• Disadvantage: it only provides enough to meet the minimum behavior.
Elements of Reinforcement Learning
Reinforcement learning elements are as follows:
• Policy
• Reward function
• Value function
• Model of the environment
• Policy: A policy defines the learning agent's way of behaving at a given time. It is a mapping from perceived states of the environment to the actions to be taken when in those states.
• Reward function: The reward function defines the goal in a reinforcement learning problem. It provides a numerical score based on the state of the environment.
Elements of Reinforcement Learning
• Value function: Value functions specify what is good in the long run. The value
of a state is the total amount of reward an agent can expect to accumulate over the
future, starting from that state.
• Model of the environment: A model mimics the behavior of the environment, allowing the agent to predict what will happen next; models are used for planning.
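To make these four elements concrete, here is a minimal sketch in Python for a tiny grid world. All names and the environment itself are illustrative assumptions, not part of the slides.

# Policy: a mapping from perceived states to actions.
policy = {"s0": "right", "s1": "right", "s2": "down"}

# Reward function: a numerical score based on the state of the environment.
def reward(state):
    return 10.0 if state == "goal" else -1.0   # goal is good, each step costs

# Value function: expected total reward accumulated starting from a state;
# here just a table that a learning algorithm would fill in over time.
value = {"s0": 0.0, "s1": 0.0, "s2": 0.0, "goal": 0.0}

# Model of the environment: predicts the next state, used for planning.
model = {("s0", "right"): "s1", ("s1", "right"): "s2", ("s2", "down"): "goal"}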
Tic-Tac-Toe Game (Lookup Table Approach)
• It's a paper-and-pencil game for two players, X and O, who take turns marking spaces on a 3×3 grid.
• The game is won by the player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal row.
Tic-Tac-Toe Game Playing
• Two Players – Human, Computer
• The objective is to write a computer program such that the computer wins most of the time.
• Three approaches to playing this game are presented, which increase in:
  • Complexity
  • Use of generalization
  • Clarity of their knowledge
  • Extensibility of their approach
Tic-Tac-Toe Game Playing
• Let us assume we have two players: Player 1, represented as X, and Player 2, represented as O.
• The player who first gets three consecutive marks wins the game.
Tic-Tac-Toe Game Playing

• Data structures: board and move table

• Consider the board as a nine-element vector. Each element contains:
  • 0 for blank
  • 1 for an X move
  • 2 for an O move
• The computer may play as X or O.
• The first player is always X.
Tic-Tac-Toe Game Playing

• Move table:
  • It is a vector of 3^9 elements, each element of which is a nine-element vector representing a board position.
  • In total, 19683 entries are present in the move table.
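A sketch of how the lookup works: the board vector is read as a base-3 number, which indexes into the move table. The function name and the commented usage are illustrative assumptions.

def board_to_index(board):
    """Interpret the nine-element board (digits 0/1/2) as a base-3 number."""
    index = 0
    for cell in board:          # board[0] is the most significant digit
        index = index * 3 + cell
    return index

# The move table has 3**9 = 19683 entries; entry i is the new nine-element
# board vector to play when the current board encodes to index i.
# move_table = [...]  # filled in by hand in this approach
# new_board = move_table[board_to_index(current_board)]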
Tic-Tac-Toe Game Playing
• Advantages
  • This program is very efficient in time.
• Disadvantages
  • A lot of space is needed to store the move table.
  • A lot of work is needed to specify all the entries in the move table.
  • Difficult to extend.
  • Highly error-prone, since the large amount of data must be entered by hand.
  • Poor extensibility.
  • Not an intelligent program.
Tic-Tac-Toe Game Playing (Magic Squares)
• Here we assign each board position an element of a magic square.
• The sum of every row, column, and diagonal of the magic square is 15.
Tic-Tac-Toe Game Playing(Magic Squares)

• Algorithm:
  • First, the machine checks its own chances to win:
    - Compute the difference between 15 and the sum of two of its squares.
    - If this difference is not positive, or is greater than 9, then the original two squares are not collinear and can be ignored.
    - If the difference names an empty square, playing it wins the game.
  • Otherwise, it checks whether the opponent can win and blocks that square.
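A minimal sketch of this check in Python. The magic-square layout below is the standard Lo Shu square (one common choice); the function name and commented usage are illustrative assumptions.

# Standard 3x3 magic square: every row, column, and diagonal sums to 15.
#   8 1 6
#   3 5 7
#   4 9 2

def winning_square(owned, occupied):
    """Given the magic values of the squares a player owns and the set of
    all occupied magic values, return the value of an empty square that
    completes a line summing to 15, or None if there is none."""
    for i, a in enumerate(owned):
        for b in owned[i + 1:]:
            diff = 15 - (a + b)
            # Not positive, greater than 9, or duplicating an owned square:
            # the two squares are not collinear, so ignore the pair.
            if diff < 1 or diff > 9 or diff in (a, b):
                continue
            if diff not in occupied:        # the completing square is empty
                return diff
    return None

# First look for the computer's own win; otherwise block the human's win.
# move = winning_square(computer_squares, occupied) or \
#        winning_square(human_squares, occupied)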
Tic-Tac-Toe Game Playing (Magic Squares)

• Difference = 15 − (4 + 5) = 15 − 9 = 6. Square 6 is not empty in the current state, so the computer cannot win; it then checks the human's chances of winning.
• Difference = 15 − (8 + 6) = 15 − 14 = 1. Square 1 is free, so the human has a chance to win; the computer must block square 1 on its next move.
Tic-Tac-Toe Game Playing (Magic Squares)

• Difference = 15 − (5 + 4) = 6. Square 6 is not empty, so the computer can't win on that line.
• Difference = 15 − (1 + 4) = 10. 10 is greater than 9, so the squares are not collinear and the computer can't win on that line.
• Difference = 15 − (1 + 5) = 9. Square 9 is empty, so the computer wins the game by moving to 9.
Tic-Tac-Toe Game (Third Approach)
Data structures required:
A board position is a structure containing
• A nine-element array representing the board.
• A list of board positions that could result from the next move.
• A number (rating) representing an estimate of how likely the position is to lead to a win.
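A minimal sketch of this structure in Python; the field names are illustrative assumptions.

from dataclasses import dataclass, field
from typing import List

@dataclass
class BoardPosition:
    board: List[int] = field(default_factory=lambda: [0] * 9)          # 0 blank, 1 X, 2 O
    successors: List["BoardPosition"] = field(default_factory=list)    # positions reachable by the next move
    rating: float = 0.0    # estimate of how likely this position is to lead to a win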
Tic-Tac-Toe Game (Third Approach)

• This program requires more time than the other approaches: it has to search a tree representing all possible move sequences before making each move (see the sketch below).
• This approach is extensible:
  • It can handle three-dimensional tic-tac-toe.
  • It could be extended to handle games that are more complicated than tic-tac-toe.
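The slides do not name the tree-search algorithm; a minimax search is one standard way to realize it, sketched here under the board encoding defined earlier (0 blank, 1 X, 2 O). All function names are illustrative.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),    # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),    # columns
         (0, 4, 8), (2, 4, 6)]               # diagonals

def winner(board):
    """Return 1 or 2 if that player has three in a row, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def minimax(board, player, me):
    """Rate `board` from `me`'s point of view; `player` moves next."""
    w = winner(board)
    if w:
        return 1 if w == me else -1
    if 0 not in board:
        return 0                             # draw
    scores = []
    for i in range(9):
        if board[i] == 0:
            board[i] = player
            scores.append(minimax(board, 3 - player, me))  # 3 - p flips 1/2
            board[i] = 0
    return max(scores) if player == me else min(scores)

def best_move(board, me):
    """Search all move sequences and pick the highest-rated next move."""
    def score(i):
        board[i] = me
        s = minimax(board, 3 - me, me)
        board[i] = 0
        return s
    return max((i for i in range(9) if board[i] == 0), key=score)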
History of Reinforcement Learning

• 1911: The Law of Effect, by Thorndike, states that behavioral responses most proximal to a satisfying result are more likely to become patterns.
• 1948: Alan Turing described a machine called the P-type, consisting of neuron-like elements connected into networks. It had two interfering inputs: pleasure/reward and pain/punishment. He observed that this resembles the way a child learns, which is analogous to the training procedure of a machine.
• 1950s to 1960s: Richard Bellman developed the Bellman equation to identify the optimal return function (see the equation after this list).
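For reference, the Bellman optimality equation in its standard modern form (not written out on the slides) expresses the optimal value of a state recursively:

V*(s) = max_a Σ_{s'} P(s' | s, a) [ R(s, a, s') + γ V*(s') ]

where P(s' | s, a) is the transition probability, R the reward, and γ ∈ [0, 1) the discount factor.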
History of Reinforcement Learning

• 1950s to 1960s: Ronald Howard developed the Markov decision process, a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.
• 1960s to 1970s: Minsky's credit assignment problem refers to the problem of measuring the influence and impact of an action taken by an agent on future rewards. The core aim is to guide the agent to take corrective actions that maximize the reward.
Credit Assignment Problem
History of Reinforcement Learning

• 1970s to 1980s: Creation of low-memory machines that improve the probability of rewards, called learning automata.
• A learning automaton is an adaptive decision-making unit situated in a random environment that learns the optimal action through repeated interactions with its environment.
• The actions are chosen according to a specific probability distribution, which is updated based on the response the automaton obtains from the environment when it performs a particular action.
History of Reinforcement Learning

• 1970s to 1980s: Temporal-difference (TD) learning. In reinforcement learning it works as an unsupervised prediction method: it helps predict the total expected future reward.
• The trick is simple: instead of trying to compute the total future reward directly, temporal-difference learning estimates a mix of the immediate reward and its own reward prediction for the next moment.
• It aims to bring the old prediction and the new prediction together, matching expectations with reality and gradually increasing the accuracy of the entire chain of predictions (see the update rule below).
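Stated concisely, this is the standard TD(0) update (a well-known formula, not written out on the slides):

V(s_t) ← V(s_t) + α [ r_{t+1} + γ V(s_{t+1}) − V(s_t) ]

where α is the step size (learning rate), γ the discount factor, and the bracketed term is the TD error.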
Example of Temporal-Difference Learning
Parameters in Temporal-Difference Learning
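To illustrate the parameters involved (α, γ, number of episodes), here is a minimal TD(0) prediction sketch in Python. The environment interface (reset/step) and the policy are assumptions in the style of common RL toolkits, not part of the slides.

import collections

def td0_prediction(env, policy, episodes=1000, alpha=0.1, gamma=0.9):
    """Estimate state values V under `policy` using the TD(0) update."""
    V = collections.defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            next_state, reward, done = env.step(policy(state))
            # TD target: immediate reward plus the discounted prediction
            # for the next moment (zero beyond a terminal state).
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])   # move toward the target
            state = next_state
    return V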
History of Reinforcement Learning
• 1990s: TD-Gammon, a computer backgammon program developed in 1992 by Gerald Tesauro at IBM's Thomas J. Watson Research Center.
• Its name comes from the fact that it is an artificial neural network trained by a form of temporal-difference learning, specifically TD-Lambda.
• 2000s to 2020s: AlphaGo, the first computer program to defeat a professional human Go player and a Go world champion.
• Google DeepMind patented an application of deep Q-learning that can play Atari 2600 games at expert human level.
• Deep reinforcement learning (DRL) for autonomous driving vehicles and robotics.
