
Reinforcement Learning

Reinforcement learning is a branch of machine learning (Figure 1). Unlike unsupervised and
supervised machine learning, reinforcement learning does not rely on a static dataset, but
operates in a dynamic environment and learns from collected experiences. Data points, or
experiences, are collected during training through trial-and-error interactions between the
environment and a software agent. This aspect of reinforcement learning is important,
because it alleviates the need for data collection, preprocessing, and labeling before training,
otherwise necessary in supervised and unsupervised learning. Practically, this means that,
given the right incentive, a reinforcement learning model can start learning a behavior on its
own, without (human) supervision.

Deep learning spans all three types of machine learning; reinforcement learning and deep
learning are not mutually exclusive. Complex reinforcement learning problems often rely on
deep neural networks, a field known as deep reinforcement learning.

Figure 1. Three broad categories of machine learning: unsupervised learning, supervised learning, and reinforcement learning.

Examples of Reinforcement Learning Applications


Deep neural networks trained with reinforcement learning can encode complex behaviors. This enables an alternative approach to applications that are otherwise intractable or challenging to tackle with more traditional methods. For example, in autonomous driving, a neural network can replace the driver and decide how to turn the steering wheel by simultaneously looking at multiple sensors, such as camera frames and lidar measurements. Without neural networks, the problem would normally be broken down into smaller pieces: extracting features from camera frames, filtering the lidar measurements, fusing the sensor outputs, and making driving decisions based on the fused inputs.

While reinforcement learning as an approach is still under evaluation for production systems,
some industrial applications are good candidates for this technology.

Advanced controls: Controlling nonlinear systems is a challenging problem that is often addressed by linearizing the system at different operating points. Reinforcement learning can be applied directly to the nonlinear system.
Automated driving: Making driving decisions based on camera input is an area where
reinforcement learning is suitable considering the success of deep neural networks in image
applications.

Robotics: Reinforcement learning can help with applications like robotic grasping, such as teaching a robotic arm how to manipulate a variety of objects for pick-and-place applications. Other robotics applications include human-robot and robot-robot collaboration.

Scheduling: Scheduling problems appear in many scenarios including traffic light control
and coordinating resources on the factory floor towards some objective. Reinforcement
learning is a good alternative to evolutionary methods to solve these combinatorial
optimization problems.

Calibration: Applications that involve manual calibration of parameters, such as electronic control unit (ECU) calibration, may be good candidates for reinforcement learning.

How Reinforcement Learning Works

The training mechanism behind reinforcement learning reflects many real-world scenarios.
Consider, for example, pet training through positive reinforcement.

Figure 2. Reinforcement learning in dog training.

Using reinforcement learning terminology (Figure 2), the goal of learning in this case is to train the dog (agent) to complete a task within an environment, which includes the surroundings of the dog as well as the trainer. First, the trainer issues a command or cue, which the dog observes (observation). The dog then responds by taking an action. If the action is close to the desired behavior, the trainer will likely provide a reward, such as a food treat or a toy; otherwise, no reward is provided. At the beginning of training, the dog will likely take more random actions, like rolling over when the command given is "sit," as it tries to associate specific observations with actions and rewards. This association, or mapping, between observations and actions is called the policy.

From the dog's perspective, the ideal case would be to respond correctly to every cue, so that it gets as many treats as possible. The whole point of reinforcement learning training, then, is to "tune" the dog's policy so that it learns the desired behaviors that maximize some reward. After training is complete, the dog should be able to observe the owner and take the appropriate action, for example, sitting when commanded to "sit," using the internal policy it has developed. By this point, treats are welcome but, theoretically, shouldn't be necessary.
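To make the idea of a policy concrete, below is a minimal Python sketch of a lookup-table policy for the dog example, tuned by trial and error. The cues, actions, and scoring scheme are illustrative assumptions, not part of any particular library:

```python
import random

# Hypothetical cues (observations) and responses (actions) for the dog example.
CUES = ["sit", "roll_over", "fetch"]
ACTIONS = ["sits", "rolls_over", "fetches"]

# The policy: a table of preference scores for every (cue, action) pair.
policy = {cue: {a: 0.0 for a in ACTIONS} for cue in CUES}

def choose_action(cue, epsilon=0.2):
    """Mostly pick the best-scored action, sometimes explore at random."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(policy[cue], key=policy[cue].get)

# Trial and error: reward only the correct response to each cue.
for _ in range(500):
    cue = random.choice(CUES)
    action = choose_action(cue)
    reward = 1.0 if ACTIONS.index(action) == CUES.index(cue) else 0.0
    # Nudge the score for the taken action toward the observed reward.
    policy[cue][action] += 0.1 * (reward - policy[cue][action])

print(policy["sit"])  # "sits" should now have the highest score
```

After enough trials, exploiting the table reproduces the desired behavior, which mirrors the dog eventually sitting on cue without needing a treat every time.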

Keeping in mind the dog training example, consider the task of parking a vehicle using an
automated driving system (Figure 3). The goal is to teach the vehicle computer (agent) to
park in the correct parking spot with reinforcement learning. As in the dog training case, the
environment is everything outside the agent and could include the dynamics of the vehicle,
other vehicles that may be nearby, weather conditions, and so on. During training, the agent
uses readings from sensors such as cameras, GPS, and lidar (observations) to generate
steering, braking, and acceleration commands (actions). To learn how to generate the correct
actions from the observations (policy tuning), the agent repeatedly tries to park the vehicle
using a trial-and-error process. A reward signal can be provided to evaluate the goodness of a
trial and to guide the learning process.
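As an illustration of such a reward signal, a per-step reward for the parking task might combine distance to the target spot, heading alignment, and a collision penalty. All quantities and weights below are hypothetical:

```python
import math

def parking_reward(vehicle_xy, target_xy, heading_error, collided):
    """Hypothetical per-step reward for one parking trial."""
    distance = math.dist(vehicle_xy, target_xy)
    reward = -0.1 * distance             # getting closer is better
    reward -= 0.05 * abs(heading_error)  # aligning with the spot is better
    if collided:
        reward -= 100.0                  # collisions are heavily penalized
    if distance < 0.5 and abs(heading_error) < 0.1:
        reward += 10.0                   # bonus for a successful park
    return reward
```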

Figure 3. Reinforcement learning in autonomous parking.

In the dog training example, training is happening inside the dog’s brain. In the autonomous
parking example, training is handled by a training algorithm. The training algorithm is
responsible for tuning the agent’s policy based on the collected sensor readings, actions, and
rewards. After training is complete, the vehicle’s computer should be able to park using only
the tuned policy and sensor readings.
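Schematically, the deployed system reduces to a closed loop of reading sensors and querying the tuned policy. The two functions below are placeholders standing in for the real sensor stack and trained policy:

```python
def read_sensors():
    """Placeholder for camera, GPS, and lidar readings."""
    return [0.0] * 8  # dummy observation vector

def tuned_policy(observation):
    """Placeholder for the trained policy: observation -> action."""
    return {"steer": 0.0, "brake": 0.0, "accel": 0.1}

# After training, parking is just observe -> act, repeated until parked.
for _ in range(100):
    observation = read_sensors()
    action = tuned_policy(observation)
    # In the real system, the action would be sent to the actuators here.
```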

One thing to keep in mind is that reinforcement learning is not sample efficient; it requires a large number of interactions between the agent and the environment to collect training data. For example, AlphaGo, the first computer program to defeat a world champion at the game of Go, was trained non-stop over a few days by playing millions of games, accumulating the equivalent of thousands of years of human experience. Even for relatively simple applications, training can take anywhere from minutes to hours or days. Setting the problem up correctly can also be challenging, as there is a list of design decisions to make that may require a few iterations to get right. These include, for example, selecting the appropriate architecture for the neural networks, tuning hyperparameters, and shaping the reward signal.

Reinforcement Learning Workflow


The general workflow for training an agent using reinforcement learning includes the
following steps (Figure 4):

Figure 4. Reinforcement learning workflow.

1. Create the environment

First, you need to define the environment within which the reinforcement learning agent operates, including the interface between the agent and the environment. The environment can be either a simulation model or a real physical system, but simulated environments are usually a good first step since they are safer and allow experimentation.
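As a sketch of what a simulated environment can look like, here is a toy parking-style environment written against the open-source Gymnasium interface (an assumption; the workflow itself is tool-agnostic). The agent drives a point toward a target spot:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ParkingEnv(gym.Env):
    """Toy environment: move a point vehicle onto a target spot."""

    def __init__(self):
        # Observation: vehicle (x, y); action: velocity command (dx, dy).
        self.observation_space = spaces.Box(low=-10.0, high=10.0, shape=(2,))
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,))
        self.target = np.array([5.0, 5.0], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = np.zeros(2, dtype=np.float32)
        return self.state, {}

    def step(self, action):
        self.state = np.clip(self.state + action, -10.0, 10.0).astype(np.float32)
        distance = float(np.linalg.norm(self.state - self.target))
        reward = -distance               # closer to the spot is better
        terminated = distance < 0.5      # close enough counts as parked
        return self.state, reward, terminated, False, {}
```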

2. Define the reward

Next, specify the reward signal that the agent uses to measure its performance against the task
goals and how this signal is calculated from the environment. Reward shaping can be tricky
and may require a few iterations to get it right.
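To illustrate why shaping matters, compare a sparse reward, which only pays off at the goal, with a shaped one that adds a dense progress term. The weights here are hypothetical:

```python
def sparse_reward(distance):
    """Pays off only at the goal; gives the agent little to learn from."""
    return 10.0 if distance < 0.5 else 0.0

def shaped_reward(distance, prev_distance):
    """Adds a progress term so every step carries a learning signal."""
    progress = prev_distance - distance  # positive when moving closer
    return sparse_reward(distance) + 1.0 * progress
```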

3. Create the agent

Then you create the agent, which consists of the policy and the reinforcement learning training algorithm. To do so, you need to:

a) Choose a way to represent the policy (such as using neural networks or look-up tables).

b) Select the appropriate training algorithm. Different representations are often tied to specific categories of training algorithms, but in general, most modern reinforcement learning algorithms rely on neural networks, as they are good candidates for large state/action spaces and complex problems.
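As one possible sketch, using the open-source Stable-Baselines3 library (an assumption, since the workflow is tool-agnostic), choosing the policy representation and the training algorithm is a one-liner: "MlpPolicy" selects a feed-forward neural network as the policy, and PPO is the training algorithm.

```python
from stable_baselines3 import PPO

env = ParkingEnv()  # the toy environment sketched in step 1
agent = PPO("MlpPolicy", env, learning_rate=3e-4, verbose=1)
```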

4. Train and validate the agent

Set up training options (like stopping criteria) and train the agent to tune the policy. Make
sure to validate the trained policy after training ends. If necessary, revisit design choices like
the reward signal and policy architecture and train again. Reinforcement learning is generally
known to be sample inefficient; training can take anywhere from minutes to days depending
on the application. For complex applications, parallelizing training on multiple CPUs, GPUs,
and computer clusters will speed things up (Figure 5).
Figure 5. Training a sample-inefficient learning problem with parallel computing.
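Continuing the Stable-Baselines3 sketch from step 3, training and a quick validation rollout might look like the following; the timestep budget is an arbitrary assumption, and parallel data collection (for example, via vectorized environments) would follow the same pattern:

```python
# Train the agent, tuning the policy from collected experience.
agent.learn(total_timesteps=200_000)

# Validate: run the tuned policy deterministically on a fresh episode.
obs, _ = env.reset()
for _ in range(200):
    action, _ = agent.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        break  # parked (or episode cut short)
```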

5. Deploy the policy

Deploy the trained policy representation using, for example, generated C/C++ or CUDA
code. At this point, the policy is a standalone decision-making system.
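The exact deployment step depends on the toolchain. As one hedged illustration, a PyTorch policy network can be exported to ONNX and then run through an ONNX runtime or converted to C/C++ inference code; the network below is a stand-in with assumed input/output sizes, not the trained agent from the earlier steps:

```python
import torch

# Stand-in policy network: 8 sensor readings in, 2 actuator commands out.
policy_net = torch.nn.Sequential(
    torch.nn.Linear(8, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 2), torch.nn.Tanh(),
)

# Export to ONNX; the resulting file can be embedded in the target
# software stack and executed with no training code present.
dummy_obs = torch.zeros(1, 8)
torch.onnx.export(policy_net, dummy_obs, "parking_policy.onnx")
```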

Training an agent using reinforcement learning is an iterative process. Decisions and results
in later stages can require you to return to an earlier stage in the learning workflow. For
example, if the training process does not converge to an optimal policy within a reasonable
amount of time, you may have to update any of the following before retraining the agent:

• Training settings
• Reinforcement learning algorithm configuration
• Policy representation
• Reward signal definition
• Action and observation signals
• Environment dynamics
