Reinforcement Learning
o Reinforcement Learning is a feedback-based machine learning technique in
which an agent learns to behave in an environment by performing actions
and seeing the results of those actions. For each good action, the agent gets positive
feedback, and for each bad action, the agent gets negative feedback or a penalty.
o In reinforcement learning, the agent learns automatically using feedback,
without any labelled data, unlike supervised learning.
o Since there is no labelled data, the agent is bound to learn from its experience
only.
o RL solves a specific type of problem where decision making is sequential, and
the goal is long-term, such as game-playing, robotics, etc.
o The agent interacts with the environment and explores it by itself. The primary
goal of an agent in reinforcement learning is to improve its performance by
getting the maximum positive rewards.
o The agent learns through trial and error, and based on this experience, it
learns to perform the task in a better way. Hence, we can say
that "Reinforcement learning is a type of machine learning method where an
intelligent agent (computer program) interacts with the environment and
learns to act within it." How a robotic dog learns the movement of its limbs
is an example of reinforcement learning.
o It is a core part of Artificial Intelligence, and all AI agents work on the concept of
reinforcement learning. Here we do not need to pre-program the agent, as it
learns from its own experience without any human intervention.
o Example: Suppose there is an AI agent present within a maze environment, and
its goal is to find the diamond. The agent interacts with the environment by
performing some actions, and based on those actions, the state of the agent
changes, and it also receives a reward or a penalty as feedback.
o The agent continues doing these three things (take an action, change state or remain
in the same state, and get feedback), and by doing these actions, it learns and
explores the environment.
o The agent learns which actions lead to positive feedback or rewards and
which actions lead to negative feedback or penalties. As a positive reward, the agent
gets a positive point, and as a penalty, it gets a negative point. A minimal code
sketch of this interaction loop is given below.
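Below is a minimal sketch (not from the original text) of this agent-environment loop, assuming a hypothetical 4x4 grid maze with the diamond in the bottom-right corner; the layout, reward values, and purely random exploration are illustrative choices.

```python
import random

ACTIONS = ["up", "down", "left", "right"]
GOAL = (3, 3)  # diamond position (assumed for illustration)

def step(state, action):
    """Environment: return (next_state, reward) for one action."""
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    dr, dc = moves[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < 4 and 0 <= c < 4):   # bumped into a wall: penalty, stay put
        return state, -1.0
    if (r, c) == GOAL:                    # found the diamond: positive reward
        return (r, c), 1.0
    return (r, c), 0.0                    # ordinary move: neutral feedback

state = (0, 0)                            # start in the top-left corner
for t in range(20):                       # the agent explores by trial and error
    action = random.choice(ACTIONS)       # purely exploratory agent
    next_state, reward = step(state, action)
    print(t, state, action, reward)
    state = next_state
    if state == GOAL:
        break
```

Each iteration is exactly the "take action, change state, get feedback" cycle described above.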
Terms used in Reinforcement Learning
o Agent: An entity that can perceive/explore the environment and act upon it.
o Environment: The situation in which the agent is present or by which it is
surrounded. In RL, we assume a stochastic environment, which means it is random in nature.
o Action: Actions are the moves taken by an agent within the environment.
o State: The situation returned by the environment after each action taken
by the agent.
o Reward: Feedback returned to the agent from the environment to evaluate
the agent's action.
o Policy: A strategy applied by the agent to decide the next action based on
the current state.
o Value: The expected long-term return with the discount factor, as opposed
to the short-term reward.
o Q-value: Mostly similar to the value, but it takes one additional parameter,
the current action (a).
Key Features of Reinforcement Learning
o In RL, the agent is not instructed about the environment and what actions need
to be taken.
o It is based on a trial-and-error process.
o The agent takes the next action and changes states according to the feedback of
the previous action.
o The agent may get a delayed reward.
o The environment is stochastic, and the agent needs to explore it to get
the maximum positive rewards.
Approaches to implement Reinforcement Learning
There are mainly three ways to implement reinforcement learning in ML, which are:
1. Value-based:
The value-based approach is about finding the optimal value function, which is
the maximum value of a state under any policy. Therefore, the agent expects the
long-term return at any state s under policy π.
2. Policy-based:
The policy-based approach is about finding the optimal policy for the maximum future
reward without using the value function. In this approach, the agent tries to
apply a policy such that the action performed at each step helps to maximize
the future reward.
The policy-based approach has mainly two types of policy (a small sketch contrasting the two is given after this list):
o Deterministic: The same action is produced by the policy (π) for a given state.
o Stochastic: In this policy, probability determines the produced action.
3. Model-based: In the model-based approach, a virtual model is created for the
environment, and the agent explores that environment to learn it. There is no
particular solution or algorithm for this approach because the model
representation is different for each environment.
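The following small sketch (an illustrative addition, not from the text) contrasts a deterministic policy, which always returns the same action for a given state, with a stochastic policy, which samples an action from a state-dependent probability distribution; the states and probabilities are made up.

```python
import random

ACTIONS = ["left", "right"]

def deterministic_policy(state):
    # pi(s): always the same action for a given state
    return "right" if state >= 0 else "left"

def stochastic_policy(state):
    # pi(a | s): the action is sampled from a state-dependent distribution
    p_right = 0.8 if state >= 0 else 0.2
    return "right" if random.random() < p_right else "left"

print(deterministic_policy(1))                    # always "right"
print([stochastic_policy(1) for _ in range(5)])   # mostly "right", occasionally "left"
```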
Types of Reinforcement learning
There are mainly two types of reinforcement learning, which are:
o Positive Reinforcement
o Negative Reinforcement
Positive Reinforcement:
Positive reinforcement means adding something to increase the tendency
that the expected behaviour will occur again. It has a positive impact on the behaviour of
the agent and increases the strength of that behaviour.
This type of reinforcement can sustain the changes for a long time, but too much
positive reinforcement may lead to an overload of states, which can diminish the
results.
Negative Reinforcement:
Negative reinforcement is the opposite of positive reinforcement, as it
increases the tendency that a specific behaviour will occur again by avoiding a
negative condition.
It can be more effective than positive reinforcement, depending on the situation and
behaviour, but it provides reinforcement only enough to meet the minimum behaviour.
How to represent the agent state?
We can represent the agent state using the Markov state, which contains all the required
information from the history. The state St is a Markov state if it satisfies the following
condition:
P[St+1 | St] = P[St+1 | S1, ..., St]
The Markov state follows the Markov property, which says that the future is
independent of the past and can be defined using only the present. RL works on
fully observable environments, where the agent can observe the environment and act
for the new state. The complete process is known as the Markov Decision Process, which
is explained below:
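As an illustration (not from the text), the short sketch below simulates a two-state Markov process in which the next state is sampled using only the current state, i.e. P[St+1 | St]; the states and transition probabilities are invented for the example.

```python
import random

# Transition probabilities P[S_{t+1} | S_t] for two invented states.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(state):
    # The next state is sampled using only the current state (memoryless).
    states, probs = zip(*P[state].items())
    return random.choices(states, weights=probs)[0]

state = "sunny"
trajectory = [state]
for _ in range(10):
    state = next_state(state)
    trajectory.append(state)
print(trajectory)
```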
Markov Decision Process
The Markov Decision Process, or MDP, is used to formalize reinforcement learning
problems. If the environment is completely observable, then its dynamics can be
modelled as a Markov process. In an MDP, the agent constantly interacts with the
environment and performs actions; for each action, the environment responds and
generates a new state.
MDP is used to describe the environment for RL, and almost all RL problems can
be formalized using an MDP.
MDP contains a tuple of four elements (S, A, Pa, Ra); a small code sketch of this tuple follows the list:
o A set of finite states S
o A set of finite actions A
o A reward Ra, received after transitioning from state S to state S' due to action a
o A state transition probability Pa
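As a hedged illustration of this tuple, the sketch below writes out (S, A, Pa, Ra) for a made-up two-state, two-action MDP; all values are invented for the example.

```python
# (S, A, Pa, Ra) for a made-up two-state, two-action MDP.
S = ["s0", "s1"]                 # finite set of states
A = ["stay", "move"]             # finite set of actions

# Pa[s][a][s'] : probability of moving to s' from s under action a
Pa = {
    "s0": {"stay": {"s0": 1.0}, "move": {"s1": 0.9, "s0": 0.1}},
    "s1": {"stay": {"s1": 1.0}, "move": {"s0": 0.9, "s1": 0.1}},
}

# Ra[s][a][s'] : reward received after moving from s to s' under action a
Ra = {
    "s0": {"stay": {"s0": 0.0}, "move": {"s1": 1.0, "s0": 0.0}},
    "s1": {"stay": {"s1": 0.0}, "move": {"s0": 1.0, "s1": 0.0}},
}

print(Pa["s0"]["move"])          # {'s1': 0.9, 's0': 0.1}
```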
MDP uses the Markov property, and to better understand MDP, we need to learn about
it.
Markov Property:
It says that "If the agent is present in the current state S1, performs an action a1 and
move to the state s2, then the state transition from s1 to s2 only depends on the
current state and future action and states do not depend on past actions, rewards, or
states."
Or, in other words, as per Markov Property, the current state transition does not
depend on any past action or state. Hence, MDP is an RL problem that satisfies the
Markov property. Such as in a Chess game, the players only focus on the current state
and do not need to remember past actions or states.
Finite MDP:
A finite MDP is one in which the sets of states, actions, and rewards are all finite. In RL, we
consider only finite MDPs.
Markov Process:
A Markov process is a memoryless process with a sequence of random states S1, S2, .....,
St that satisfies the Markov property. A Markov process is also known as a Markov chain, which
is a tuple (S, P) of a state space S and a transition function P. These two components (S and P)
can define the dynamics of the system.
Reinforcement Learning Algorithms
Reinforcement learning algorithms are mainly used in AI applications and gaming
applications. The most commonly used algorithms include:
o Q-Learning:
o Q-learning is an off-policy RL algorithm used for temporal difference
learning. Temporal difference learning methods are ways of
comparing temporally successive predictions.
o It learns the action-value function Q(s, a), which tells how good it is to take action "a"
in a particular state "s".
o Q-learning repeatedly updates Q(s, a) towards the observed reward plus the
discounted value of the best action available in the next state.
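Below is a minimal sketch of tabular Q-learning on a toy 4x4 grid maze (the environment, and the alpha, gamma, and epsilon values, are illustrative assumptions, not part of the original text); it follows the standard update Q(s, a) ← Q(s, a) + α[ r + γ·max a' Q(s', a') − Q(s, a) ].

```python
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]
GOAL = (3, 3)

def step(state, action):
    # Same toy maze as before: walls give -1, the diamond gives +1.
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    r, c = state[0] + moves[action][0], state[1] + moves[action][1]
    if not (0 <= r < 4 and 0 <= c < 4):
        return state, -1.0
    return (r, c), (1.0 if (r, c) == GOAL else 0.0)

Q = defaultdict(float)                     # Q(s, a), initialised to 0
alpha, gamma, epsilon = 0.1, 0.9, 0.2      # illustrative hyperparameters

for episode in range(500):
    s = (0, 0)
    while s != GOAL:
        # Epsilon-greedy behaviour policy (off-policy: the target is greedy).
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        best_next = max(Q[(s2, x)] for x in ACTIONS)
        # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

print(max(ACTIONS, key=lambda x: Q[((0, 0), x)]))  # greedy action from the start
```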
Reinforcement Learning Applications
1. Robotics: RL is used in robot navigation, Robo-soccer, walking, juggling, etc.
2. Control: RL can be used for adaptive control, such as factory processes, admission
control in telecommunication, and helicopter piloting.
3. Game Playing: RL can be used in game playing, such as tic-tac-toe, chess, etc.
4. Chemistry: RL can be used for optimizing chemical reactions.
5. Business: RL is now used for business strategy planning.
6. Manufacturing: In various automobile manufacturing companies, robots use deep
reinforcement learning to pick goods and put them in containers.
7. Finance Sector: RL is currently used in the finance sector for evaluating trading
strategies.
Temporal difference learning
● It is a combination of the Monte Carlo method and dynamic programming.
● It is a model-free learning method; hence it does not require the model of the
environment to be known in advance.
● It can be applied to non-episodic tasks.
(In an episodic environment, an agent's current action will not affect a future
action, whereas in a non-episodic environment, an agent's current action will
affect a future action; this is also called a sequential environment.)
Temporal difference learning uses an update rule to update the value of a state (a one-step worked example is given below):
V(St) ← V(St) + α[ Rt+1 + γ·V(St+1) − V(St) ]
where
V(St) – the value of the current state
α (alpha) – the learning rate
γ (gamma) – the discount factor
V(St+1) – the value of the next state
Rt+1 – the reward received after the transition
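The following short sketch applies this TD(0) update once, using made-up values for the two states, the reward, the learning rate, and the discount factor.

```python
# One TD(0) update with made-up numbers.
V = {"s1": 0.5, "s2": 0.8}     # current value estimates (illustrative)
alpha, gamma = 0.1, 0.9        # learning rate and discount factor
reward = 1.0                   # R_{t+1} observed on the transition s1 -> s2

# V(St) <- V(St) + alpha * [ R_{t+1} + gamma * V(St+1) - V(St) ]
td_target = reward + gamma * V["s2"]
td_error = td_target - V["s1"]
V["s1"] = V["s1"] + alpha * td_error

print(V["s1"])                 # 0.5 + 0.1 * (1.72 - 0.5) = 0.622
```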
Monte Carlo idea
It learns directly from raw experience, i.e. without a model of the environment; there is
no predefined model.
Dynamic programming
It updates estimates based in part on other learned estimates, without waiting for the final
outcome (bootstrapping).
Independent Component Analysis (ICA)
● ICA is a machine learning technique used to separate independent sources from
a mixed signal.
● Unlike principal component analysis, which focuses on maximizing the
variance of the data points, independent component analysis
focuses on independence, i.e. independent components.
Problem: To extract independent sources’ signals from a mixed signal
composed of the signals from those sources.
Given: Mixed signal from five different independent sources.
Aim: To decompose the mixed signal into independent sources:
● Source 1
● Source 2
● Source 3
● Source 4
● Source 5
Solution: Independent Component Analysis (ICA).
Consider the Cocktail Party Problem or Blind Source Separation problem to
understand the problem that is solved by independent component analysis.
Here, a party is going on in a room full of people. There are 'n'
speakers in that room, and they are speaking simultaneously at the party. In
the same room, there are also 'n' microphones placed at different
distances from the speakers, which are recording the 'n' speakers' voice signals.
Hence, the number of speakers is equal to the number of microphones in the
room.
Now, using these microphones' recordings, we want to separate all the 'n'
speakers' voice signals in the room, given that each microphone recorded the voice
signals coming from each speaker at a different intensity due to the difference in
distances between them. Decomposing the mixed signal of each microphone's
recording into independent sources' speech signals can be done using
the machine learning technique independent component analysis.
[ X1, X2, ….., Xn ] => [ Y1, Y2, ….., Yn ]
where X1, X2, …, Xn are the observed mixed signals (the microphone recordings) and
Y1, Y2, …, Yn are the new features, the independent components, which
are independent of each other.
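As an illustrative sketch (assuming scikit-learn's FastICA is available; the signals and mixing matrix are synthetic), the code below mixes two sources into two "microphone" recordings and recovers the independent components, which ICA returns only up to order and scale.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

s1 = np.sin(2 * t)                        # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))               # source 2: square wave
S = np.c_[s1, s2]
S += 0.02 * rng.standard_normal(S.shape)  # a little noise

A = np.array([[1.0, 0.5],                 # mixing matrix (the "room")
              [0.5, 1.0]])
X = S @ A.T                               # observed "microphone" recordings

ica = FastICA(n_components=2, random_state=0)
Y = ica.fit_transform(X)                  # recovered independent components
print(Y.shape)                            # (2000, 2)
```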
Restrictions on ICA –
1. The independent components generated by the ICA are assumed to
be statistically independent of each other.
2. The number of independent components generated by the ICA is
equal to the number of observed mixtures.
Difference between PCA and ICA –
● PCA reduces the dimensions to avoid the problem of overfitting, whereas ICA
decomposes the mixed signal into its independent sources' signals.
● PCA deals with the principal components, whereas ICA deals with the
independent components.
● PCA focuses on maximizing the variance, whereas ICA does not focus on the
issue of variance among the data points.
● PCA does not focus on the mutual independence of the components, whereas ICA
focuses on the mutual independence of the components.