UNIT-5 PART C
1) Explain the Q function and the Q-learning algorithm, assuming deterministic rewards and actions, with an example.
ans)
https://www.freecodecamp.org/news/an-introduction-to-q-learning-reinforcement-learning-14ac0b4493cc/
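The answer above is only a reference link; as a minimal sketch (not from the source), the deterministic Q-learning rule can be written as Q(s, a) <- r(s, a) + gamma * max_a' Q(delta(s, a), a'), where r is the reward function and delta the deterministic state transition function. The tiny Python example below uses a hypothetical three-state chain (A -> B -> goal G) only to illustrate the update.

# Minimal sketch: tabular Q-learning with deterministic rewards and transitions.
# The environment (states A, B, G and actions left/right) is hypothetical.
GAMMA = 0.9
ACTIONS = ("left", "right")

# Deterministic environment: (state, action) -> (reward, next_state).
# Entering the goal G pays 100; G is absorbing with zero further reward.
ENV = {
    ("A", "right"): (0, "B"),   ("A", "left"): (0, "A"),
    ("B", "right"): (100, "G"), ("B", "left"): (0, "A"),
    ("G", "right"): (0, "G"),   ("G", "left"): (0, "G"),
}

Q = {key: 0.0 for key in ENV}  # Q-table initialised to zero

def update(state, action):
    # Deterministic update: Q(s,a) = r(s,a) + gamma * max_a' Q(delta(s,a), a')
    reward, next_state = ENV[(state, action)]
    Q[(state, action)] = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)

# In real Q-learning the agent experiences (s, a) pairs by acting; in a
# deterministic world, sweeping over all pairs a few times gives the same values.
for _ in range(10):
    for state, action in ENV:
        update(state, action)

print(Q)  # Q[("B","right")] -> 100.0, Q[("A","right")] -> 90.0, Q[("A","left")] -> 81.0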
2) Explain the k-nearest neighbor algorithm for approximating a discrete-valued function f : ℝⁿ → V with pseudocode.
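No answer is recorded for this question; as a rough sketch only, the discrete-valued k-NN rule stores all training examples and classifies a query x_q by a majority vote over the target values of its k nearest neighbours. The Python below is a hypothetical illustration (toy 2-D data, Euclidean distance), not taken from the source.

# Sketch of k-nearest neighbour for a discrete-valued target f: R^n -> V.
# Training is just storing the examples; classification is a majority vote
# among the k stored examples closest to the query under Euclidean distance.
from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(training, x_q, k=3):
    # training: list of (feature_vector, label) pairs; x_q: query vector.
    nearest = sorted(training, key=lambda ex: euclidean(ex[0], x_q))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]   # most common label wins

# Hypothetical toy data: two classes in a 2-D feature space.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((4.0, 4.2), "B"), ((3.8, 4.0), "B")]
print(knn_classify(train, (1.1, 0.9), k=3))  # -> "A"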
3)Compare unsupervised learning and reinforcement learning
with examples.
4) Develop a Q-learning task for the recommendation system of an online shopping website. What will be the environment of the system? Write the cost function and value function for the system.
A Q-learning task has these components: Agent, Environment, State, Action, Reward Function, Value Function and Policy.
1) To simplify the problem, we assume a hypothetical user whose experience on an online shopping store is pooled from all the actual users.
2) Our recommender model is going to be the agent of the system, proposing products to this hypothetical user, who will buy or not buy the recommendation.
3) The user behaves as the system's environment, responding to the system's recommendation depending on the state of the system.
4) User feedback determines our reward: a score of 1 only if the user buys, and 0 otherwise.
5) The agent's action is the product it recommends.
6) Our state is defined as the product features and the corresponding user reactions over the past 5 steps, excluding the current step.
7) Therefore, the feedback and the action together give us the next state.
The goal of the agent is to learn a policy that maximizes
accumulated rewards.
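As a rough sketch of the reward and value functions asked for above, under the assumptions already listed (a reward of 1 on a purchase and 0 otherwise); the discount factor and all names below are illustrative, not from the source:

# Sketch of the reward function and value (discounted return) for the recommender.
GAMMA = 0.9  # assumed discount factor for future purchases

def reward(user_bought):
    # Reward function: 1 only if the user buys the recommended product, else 0.
    return 1.0 if user_bought else 0.0

def discounted_return(rewards):
    # Value of a state under the current policy: discounted sum of future rewards,
    # here estimated from a single observed episode r_0, r_1, ...
    return sum((GAMMA ** t) * r for t, r in enumerate(rewards))

# Example: the hypothetical user buys at steps 1 and 3 of a 5-step episode.
episode = [reward(b) for b in (False, True, False, True, False)]
print(discounted_return(episode))  # 0.9 + 0.9**3 = 1.629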
5)Identify the suitable learning method for training a robotic
arm and explain it.
ans) Industrial robots deployed today across various industries are mostly doing repetitive tasks, basically moving or placing objects along predefined trajectories. But the reality is that the ability of robots to handle different or complex environments is very limited in today's manufacturing. The main challenge we have to overcome is designing control algorithms that can easily adapt to new environments.
Reinforcement learning (RL) is a type of Machine Learning where
we can teach an agent how to behave in an environment by
performing actions and seeing the results.
The concept of reinforcement learning has been around for a while, but early algorithms were not very adaptable and were incapable of handling continuous tasks.
For RL, we use a framework called the Markov Decision Process (MDP), which provides a simple framework for a really complex problem. An agent (e.g. a robotic arm) first observes the environment it is in and then takes actions accordingly. Rewards are given out according to the result.
For robotic control, the state is measured using sensors that record the joint angles, joint velocities, and the end-effector pose.
Policy
The main objective is to find a policy. A policy is something that tells us how to act in a particular state. The objective is to find a policy that makes the most rewarding decisions.
Putting the objective together: we want to find a sequence of actions that maximizes the expected reward, or equivalently minimizes the cost.
Q-Learning
Q-learning is a model-free reinforcement learning algorithm, which means that it does not require a model of the environment. It is especially effective because it can handle problems with stochastic transitions and rewards without requiring adaptations. The most common Q-learning method repeats these steps (a sketch of the loop follows the list):
1. From the current state, choose an action, typically the one with the highest Q-value, with some exploration.
2. Perform the action and observe the reward and the next state.
3. Update the Q-value of that state-action pair using the observed reward and the highest Q-value of the next state.
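A minimal sketch of that loop in Python, assuming a hypothetical environment object with Gym-style reset()/step() methods and discrete states and actions (these interfaces and parameter values are assumptions, not from the source):

# Sketch of the Q-learning loop described above. Assumed interface:
# env.reset() -> state, env.step(action) -> (next_state, reward, done).
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)  # Q[(state, action)], default 0.0
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # 1. Choose an action: mostly the greedy one, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            # 2. Perform it and observe the reward and the next state.
            next_state, reward, done = env.step(action)
            # 3. Move Q(s,a) towards reward + gamma * best Q of the next state.
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q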
7) How does the Q function learn with and without complete knowledge of the reward function and the state transition function?
Q-learning is a model-free reinforcement learning algorithm, which means that it does not require a model of the environment. It is especially effective because it can handle problems with stochastic transitions and rewards without requiring adaptations.
Q-learning is an off-policy learner. This means it learns the value of the optimal policy independently of the agent's actions. On the other hand, an on-policy learner learns the value of the policy being carried out by the agent, including the exploration steps, and it will find a policy that is optimal taking into account the exploration inherent in that policy.
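To make the contrast concrete: with complete knowledge of the reward function r(s, a) and transition function delta(s, a), the Q function can be computed offline by iterating the Bellman equation; without that knowledge, Q-learning estimates the same values from observed (s, a, r, s') transitions. The Python below is a rough sketch for a small deterministic MDP; the dictionaries R and T and all parameter values are assumptions for illustration.

# Rough sketch contrasting the two cases (deterministic finite MDP for simplicity).
# R[(s, a)] = reward, T[(s, a)] = next state; both are hypothetical inputs.
GAMMA = 0.9

def q_with_model(states, actions, R, T, sweeps=50):
    # Complete knowledge: iterate Q(s,a) = R(s,a) + gamma * max_a' Q(T(s,a), a').
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(sweeps):
        for s in states:
            for a in actions:
                Q[(s, a)] = R[(s, a)] + GAMMA * max(Q[(T[(s, a)], b)] for b in actions)
    return Q

def q_update_without_model(Q, s, a, r, s_next, actions, alpha=0.1):
    # No model: the same target is estimated from one observed transition (s, a, r, s').
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])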
8) How does setting up a Reinforcement Learning problem require an understanding of the following parameters of the problem?
(a) Delayed reward
(b) Exploring unknown states and actions, or exploiting already learned ones
(c) The number of old states that should be considered when deciding an action
Ans. (a) Delayed reward:
In the general case of the reinforcement learning problem, the
agent's actions determine not only its immediate reward, but
also the next state of the environment. The agent must take into
account the next state as well as the immediate reward when it
decides which action to take. The model of long-run optimality
the agent is using determines exactly how it should take the
value of the future into account. The agent will have to be able
to learn from delayed reinforcement: it may take a long
sequence of actions, receiving insignificant reinforcement, then
finally arrive at a state with high reinforcement. The agent must
be able to learn which of its actions are desirable based on
reward that can take place arbitrarily far in the future.
(b) The agent has to explore in order to discover states and actions that potentially yield higher rewards in the future, or exploit the state that yields the highest reward based on its existing knowledge. Pure exploration degrades the agent's learning but increases the flexibility of the agent to adapt in a dynamic environment. On the other hand, pure exploitation drives the agent's learning process to locally optimal solutions.
(c) The state is the current board position, the actions are the different places in which you can place an 'X' or 'O' in a game of Tic Tac Toe, and the reward is +1 or -1 depending on whether you win or lose the game. The "state space" is the total number of possible states in a particular RL setup. Tic Tac Toe has a small enough state space (one reasonable estimate being 593) that we can actually remember a value for each individual state, using a table; this is called a tabular method for this reason. For games like chess we use value function approximation, as the total number of possibilities is around 10^49.