
Reinforcement Learning

Reinforcement learning is a branch of machine learning (Figure 1). Unlike unsupervised and
supervised machine learning, reinforcement learning does not rely on a static dataset, but
operates in a dynamic environment and learns from collected experiences. Data points, or
experiences, are collected during training through trial-and-error interactions between the
environment and a software agent. This aspect of reinforcement learning is important,
because it alleviates the need for data collection, preprocessing, and labeling before training,
otherwise necessary in supervised and unsupervised learning. Practically, this means that,
given the right incentive, a reinforcement learning model can start learning a behavior on its
own, without (human) supervision.

Deep learning spans all three types of machine learning; reinforcement learning and deep
learning are not mutually exclusive. Complex reinforcement learning problems often rely on
deep neural networks, a field known as deep reinforcement learning.

Figure 1. Three broad categories of machine learning: unsupervised learning, supervised learning, and reinforcement learning.

Examples of Reinforcement Learning Applications


Deep neural networks trained with reinforcement learning can encode complex behaviors. This enables an alternative approach to applications that are otherwise intractable or challenging to tackle with more traditional methods. For example, in autonomous driving, a neural network can replace the driver and decide how to turn the steering wheel by simultaneously looking at multiple sensors, such as camera frames and lidar measurements. Without neural networks, the problem would normally be broken down into smaller pieces: extracting features from camera frames, filtering the lidar measurements, fusing the sensor outputs, and making driving decisions based on the fused inputs.

While reinforcement learning as an approach is still under evaluation for production systems,
some industrial applications are good candidates for this technology.

Advanced controls: Controlling nonlinear systems is a challenging problem that is often addressed by linearizing the system at different operating points. Reinforcement learning can be applied directly to the nonlinear system.
Automated driving: Making driving decisions based on camera input is an area where
reinforcement learning is suitable considering the success of deep neural networks in image
applications.

Robotics: Reinforcement learning can help with applications like robotic grasping, such as teaching a robotic arm how to manipulate a variety of objects for pick-and-place applications. Other robotics applications include human-robot and robot-robot collaboration.

Scheduling: Scheduling problems appear in many scenarios including traffic light control
and coordinating resources on the factory floor towards some objective. Reinforcement
learning is a good alternative to evolutionary methods to solve these combinatorial
optimization problems.

Calibration: Applications that involve manual calibration of parameters, such as electronic control unit (ECU) calibration, may be good candidates for reinforcement learning.

How Reinforcement Learning Works

The training mechanism behind reinforcement learning reflects many real-world scenarios.
Consider, for example, pet training through positive reinforcement.

Figure 2. Reinforcement learning in dog training.

Using reinforcement learning terminology (Figure 2), the goal of learning in this case is to train the dog (agent) to complete a task within an environment, which includes the surroundings of the dog as well as the trainer. First, the trainer issues a command or cue, which the dog observes (observation). The dog then responds by taking an action. If the action is close to the desired behavior, the trainer will likely provide a reward, such as a food treat or a toy; otherwise, no reward is provided. At the beginning of training, the dog will likely take more random actions, like rolling over when the command given is "sit," as it tries to associate specific observations with actions and rewards. This association, or mapping, between observations and actions is called the policy.

From the dog's perspective, the ideal case would be to respond correctly to every cue, so that it gets as many treats as possible. The whole point of reinforcement learning training, then, is to "tune" the dog's policy so that it learns the desired behaviors that maximize some reward. After training is complete, the dog should be able to observe the owner and take the appropriate action, for example, sitting when commanded to "sit," using the internal policy it has developed. By this point, treats are welcome but, theoretically, shouldn't be necessary.
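To make the idea of a policy concrete, below is a minimal Python sketch of a lookup-table policy for the dog example, tuned by trial and error. The cues, actions, and scoring scheme are illustrative assumptions, not part of any particular library:

```python
import random

# Hypothetical cues (observations) and responses (actions) for the dog example.
CUES = ["sit", "roll_over", "fetch"]
ACTIONS = ["sits", "rolls_over", "fetches"]

# The policy: a table of preference scores for every (cue, action) pair.
policy = {cue: {a: 0.0 for a in ACTIONS} for cue in CUES}

def choose_action(cue, epsilon=0.2):
    """Mostly pick the best-scored action, sometimes explore at random."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(policy[cue], key=policy[cue].get)

# Trial and error: reward only the correct response to each cue.
for _ in range(500):
    cue = random.choice(CUES)
    action = choose_action(cue)
    reward = 1.0 if ACTIONS.index(action) == CUES.index(cue) else 0.0
    # Nudge the score for the taken action toward the observed reward.
    policy[cue][action] += 0.1 * (reward - policy[cue][action])

print(policy["sit"])  # "sits" should now have the highest score
```

After enough trials, exploiting the table reproduces the desired behavior, which mirrors the dog eventually sitting on cue without needing a treat every time.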

Keeping in mind the dog training example, consider the task of parking a vehicle using an
automated driving system (Figure 3). The goal is to teach the vehicle computer (agent) to
park in the correct parking spot with reinforcement learning. As in the dog training case, the
environment is everything outside the agent and could include the dynamics of the vehicle,
other vehicles that may be nearby, weather conditions, and so on. During training, the agent
uses readings from sensors such as cameras, GPS, and lidar (observations) to generate
steering, braking, and acceleration commands (actions). To learn how to generate the correct
actions from the observations (policy tuning), the agent repeatedly tries to park the vehicle
using a trial-and-error process. A reward signal can be provided to evaluate the goodness of a
trial and to guide the learning process.
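As an illustration of such a reward signal, a per-step reward for the parking task might combine distance to the target spot, heading alignment, and a collision penalty. All quantities and weights below are hypothetical:

```python
import math

def parking_reward(vehicle_xy, target_xy, heading_error, collided):
    """Hypothetical per-step reward for one parking trial."""
    distance = math.dist(vehicle_xy, target_xy)
    reward = -0.1 * distance             # getting closer is better
    reward -= 0.05 * abs(heading_error)  # aligning with the spot is better
    if collided:
        reward -= 100.0                  # collisions are heavily penalized
    if distance < 0.5 and abs(heading_error) < 0.1:
        reward += 10.0                   # bonus for a successful park
    return reward
```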

Figure 3. Reinforcement learning in autonomous parking.

In the dog training example, training is happening inside the dog’s brain. In the autonomous
parking example, training is handled by a training algorithm. The training algorithm is
responsible for tuning the agent’s policy based on the collected sensor readings, actions, and
rewards. After training is complete, the vehicle’s computer should be able to park using only
the tuned policy and sensor readings.
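Schematically, the deployed system reduces to a closed loop of reading sensors and querying the tuned policy. The two functions below are placeholders standing in for the real sensor stack and trained policy:

```python
def read_sensors():
    """Placeholder for camera, GPS, and lidar readings."""
    return [0.0] * 8  # dummy observation vector

def tuned_policy(observation):
    """Placeholder for the trained policy: observation -> action."""
    return {"steer": 0.0, "brake": 0.0, "accel": 0.1}

# After training, parking is just observe -> act, repeated until parked.
for _ in range(100):
    observation = read_sensors()
    action = tuned_policy(observation)
    # In the real system, the action would be sent to the actuators here.
```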

One thing to keep in mind is that reinforcement learning is not sample efficient; it requires a large number of interactions between the agent and the environment to collect training data. For example, AlphaGo, the first computer program to defeat a world champion at the game of Go, was trained non-stop over a few days by playing millions of games, accumulating the equivalent of thousands of years of human experience. Even for relatively simple applications, training can take anywhere from minutes to hours or days. Setting the problem up correctly can also be challenging, as there is a list of design decisions to make that may require a few iterations to get right. These include, for example, selecting the appropriate architecture for the neural networks, tuning hyperparameters, and shaping the reward signal.

Reinforcement Learning Workflow


The general workflow for training an agent using reinforcement learning includes the
following steps (Figure 4):

Figure 4. Reinforcement learning workflow.

1. Create the environment

First, you need to define the environment within which the reinforcement learning agent operates, including the interface between the agent and the environment. The environment can be either a simulation model or a real physical system, but simulated environments are usually a good first step since they are safer and allow experimentation.
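As a sketch of what a simulated environment can look like, here is a toy parking-style environment written against the open-source Gymnasium interface (an assumption; the workflow itself is tool-agnostic). The agent drives a point toward a target spot:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ParkingEnv(gym.Env):
    """Toy environment: move a point vehicle onto a target spot."""

    def __init__(self):
        # Observation: vehicle (x, y); action: velocity command (dx, dy).
        self.observation_space = spaces.Box(low=-10.0, high=10.0, shape=(2,))
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,))
        self.target = np.array([5.0, 5.0], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = np.zeros(2, dtype=np.float32)
        return self.state, {}

    def step(self, action):
        self.state = np.clip(self.state + action, -10.0, 10.0).astype(np.float32)
        distance = float(np.linalg.norm(self.state - self.target))
        reward = -distance               # closer to the spot is better
        terminated = distance < 0.5      # close enough counts as parked
        return self.state, reward, terminated, False, {}
```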

2. Define the reward

Next, specify the reward signal that the agent uses to measure its performance against the task
goals and how this signal is calculated from the environment. Reward shaping can be tricky
and may require a few iterations to get it right.
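To illustrate why shaping matters, compare a sparse reward, which only pays off at the goal, with a shaped one that adds a dense progress term. The weights here are hypothetical:

```python
def sparse_reward(distance):
    """Pays off only at the goal; gives the agent little to learn from."""
    return 10.0 if distance < 0.5 else 0.0

def shaped_reward(distance, prev_distance):
    """Adds a progress term so every step carries a learning signal."""
    progress = prev_distance - distance  # positive when moving closer
    return sparse_reward(distance) + 1.0 * progress
```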

3. Create the agent

Then you create the agent, which consists of the policy and the reinforcement learning training algorithm. To do so, you need to:

a) Choose a way to represent the policy (such as using neural networks or look-up tables).

b) Select the appropriate training algorithm. Different representations are often tied to specific categories of training algorithms, but in general, most modern reinforcement learning algorithms rely on neural networks, as they are good candidates for large state/action spaces and complex problems.
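As one possible sketch, using the open-source Stable-Baselines3 library (an assumption, since the workflow is tool-agnostic), choosing the policy representation and the training algorithm is a one-liner: "MlpPolicy" selects a feed-forward neural network as the policy, and PPO is the training algorithm.

```python
from stable_baselines3 import PPO

env = ParkingEnv()  # the toy environment sketched in step 1
agent = PPO("MlpPolicy", env, learning_rate=3e-4, verbose=1)
```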

4. Train and validate the agent

Set up training options (like stopping criteria) and train the agent to tune the policy. Make
sure to validate the trained policy after training ends. If necessary, revisit design choices like
the reward signal and policy architecture and train again. Reinforcement learning is generally
known to be sample inefficient; training can take anywhere from minutes to days depending
on the application. For complex applications, parallelizing training on multiple CPUs, GPUs,
and computer clusters will speed things up (Figure 5).
Figure 5. Training a sample-inefficient learning problem with parallel computing.
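Continuing the Stable-Baselines3 sketch from step 3, training and a quick validation rollout might look like the following; the timestep budget is an arbitrary assumption, and parallel data collection (for example, via vectorized environments) would follow the same pattern:

```python
# Train the agent, tuning the policy from collected experience.
agent.learn(total_timesteps=200_000)

# Validate: run the tuned policy deterministically on a fresh episode.
obs, _ = env.reset()
for _ in range(200):
    action, _ = agent.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        break  # parked (or episode cut short)
```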

5. Deploy the policy

Deploy the trained policy representation using, for example, generated C/C++ or CUDA
code. At this point, the policy is a standalone decision-making system.
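The exact deployment step depends on the toolchain. As one hedged illustration, a PyTorch policy network can be exported to ONNX and then run through an ONNX runtime or converted to C/C++ inference code; the network below is a stand-in with assumed input/output sizes, not the trained agent from the earlier steps:

```python
import torch

# Stand-in policy network: 8 sensor readings in, 2 actuator commands out.
policy_net = torch.nn.Sequential(
    torch.nn.Linear(8, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 2), torch.nn.Tanh(),
)

# Export to ONNX; the resulting file can be embedded in the target
# software stack and executed with no training code present.
dummy_obs = torch.zeros(1, 8)
torch.onnx.export(policy_net, dummy_obs, "parking_policy.onnx")
```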

Training an agent using reinforcement learning is an iterative process. Decisions and results
in later stages can require you to return to an earlier stage in the learning workflow. For
example, if the training process does not converge to an optimal policy within a reasonable
amount of time, you may have to update any of the following before retraining the agent:

• Training settings
• Reinforcement learning algorithm configuration
• Policy representation
• Reward signal definition
• Action and observation signals
• Environment dynamics
