Unit 3 AI

Passive Reinforcement Learning (RL) is a type of reinforcement learning in which an
agent evaluates a fixed, pre-determined policy rather than actively exploring the
environment to discover a better one. The key difference in passive learning is that
the agent does not choose actions to deliberately influence or change the
environment; instead, it learns the utilities of states by observing the outcomes of
the actions its fixed policy prescribes, with no active exploration or
experimentation.

Based on this concept, we discuss three methods of estimating utility:

Direct Utility Estimation (DUE)

The first, most naive method of estimating utility comes from the simplest
interpretation of the above definition. We construct an agent that follows the
policy until it reaches the terminal state. At each step, it logs its current
state and the reward received. Once it reaches the terminal state, it can estimate
the utility of each state for that trial by simply summing the discounted rewards
from that state to the terminal one.

It can now run this 'simulation' n times and calculate the average utility of
each state. If a state occurs more than once in a trial, its utility values are
counted separately.
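
A minimal sketch of this procedure, assuming each trial is recorded as a list of
(state, reward) pairs gathered while following the fixed policy (the trial format,
the discount value, and the function name are illustrative, not from the original
notes):

from collections import defaultdict

def direct_utility_estimation(trials, gamma=0.9):
    # trials: list of trials, each a list of (state, reward) pairs recorded
    # while following the fixed policy until the terminal state.
    totals = defaultdict(float)   # sum of sampled returns for each state
    counts = defaultdict(int)     # number of samples for each state
    for trial in trials:
        ret = 0.0
        # Walk backwards so each state's discounted return accumulates easily.
        for state, reward in reversed(trial):
            ret = reward + gamma * ret
            totals[state] += ret   # repeated visits give separate samples
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}

# Example: two short trials on a tiny chain ending in a terminal state T.
trials = [[("A", -0.04), ("B", -0.04), ("T", 1.0)],
          [("A", -0.04), ("T", 1.0)]]
print(direct_utility_estimation(trials))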

Adaptive Dynamic Programming (ADP)

This method makes use of knowledge of the past state s, the action a, and the newly
perceived state s′ to estimate the transition probability P(s′ | s, a). It does this
by simply counting the new states that result from previous states and actions.
The program runs through the policy a number of times, keeping track of:

each occurrence of state s and the policy-recommended action a, in N_sa

each occurrence of s′ resulting from taking a in s, in N_{s′|sa}.

The transition probability is then estimated as P(s′ | s, a) = N_{s′|sa} / N_sa.
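
A short sketch of this counting step, assuming the agent's experience is available
as (s, a, s′) triples collected while running the policy (the names N_sa and N_s_sa
mirror the counts above and are purely illustrative):

from collections import defaultdict

def estimate_transition_model(transitions):
    # transitions: iterable of (s, a, s_next) triples observed under the policy.
    N_sa = defaultdict(int)     # how often action a was taken in state s
    N_s_sa = defaultdict(int)   # how often s' followed the pair (s, a)
    for s, a, s_next in transitions:
        N_sa[(s, a)] += 1
        N_s_sa[(s, a, s_next)] += 1
    # P(s' | s, a) is estimated as N_s'|sa / N_sa.
    return {(s, a, s_next): n / N_sa[(s, a)]
            for (s, a, s_next), n in N_s_sa.items()}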

Temporal-Difference Learning (TD)

Temporal-difference learning maintains a running utility estimate U(s) for each
state and, after every transition, nudges that estimate toward the observed reward
plus the discounted utility of the successor state, without ever building an
explicit transition model. After a transition from state s to state s′ with reward
R(s), the update is:

U(s) ← U(s) + α [ R(s) + γ U(s′) − U(s) ]

where α is the learning rate and γ is the discount factor. Each update is cheap
because only the observed successor is involved, but convergence is typically
slower than ADP, which makes use of the full learned model.
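
A minimal sketch of one TD(0) update along these lines, with the utility table
stored as a dictionary and illustrative values for alpha and gamma:

def td_update(U, s, reward, s_next, alpha=0.1, gamma=0.9):
    # Move U(s) toward the one-step sample: reward + gamma * U(s').
    U.setdefault(s, 0.0)
    U.setdefault(s_next, 0.0)
    U[s] += alpha * (reward + gamma * U[s_next] - U[s])
    return U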

Applications of Passive RL:


Policy Evaluation: Used when we want to evaluate how good a given policy is, rather
than learn a new one. This is useful in scenarios where a good policy is already
available and only its quality needs to be estimated.
Simulated Environments: In certain applications where exploration is costly or
dangerous, passive learning allows agents to be trained using a fixed sequence of
interactions that might simulate real-world conditions, such as robotics or
healthcare.
--------------------------------------------------------
Active Reinforcement Learning (RL) is a type of reinforcement learning where the
agent actively interacts with the environment to learn an optimal policy that
maximizes cumulative rewards over time. Unlike passive reinforcement learning,
where the agent only evaluates a fixed policy, active reinforcement learning allows
the agent to explore the environment, take actions, and adjust its behavior based
on the outcomes of its actions.

Key Characteristics of Active Reinforcement Learning


Exploration and Exploitation:
Exploration refers to the agent trying out new actions in the environment to
discover better strategies.
Exploitation refers to the agent using the knowledge it has gathered to maximize
rewards by taking actions that it believes will yield the best outcomes.
Balancing exploration and exploitation is critical in active RL. If the agent
explores too much, it may waste time on suboptimal actions, but if it exploits too
much, it might miss better strategies.
Learning an Optimal Policy:

In active RL, the agent’s goal is to learn a policy, which is a strategy that
defines the best action to take from any given state to maximize long-term rewards.
The agent learns this policy through trial and error by interacting with the
environment.
State-Action-Reward-State-Action (SARSA):

Active RL models often use a state-action-reward-state-action (SARSA) or Q-learning
approach, where the agent learns the value of taking specific actions in given
states and how that leads to future rewards.
Reward Signal:

The agent receives feedback in the form of rewards (or penalties) after taking
actions in the environment. The goal is for the agent to learn which actions lead
to higher rewards over time and optimize its decision-making process.
Dynamic Behavior:

Since the agent is actively exploring and interacting with the environment, its
behavior is dynamic and continuously changing based on the accumulated knowledge.
It adapts to the feedback it receives from the environment.

Key Algorithms in Active RL:


Q-Learning:

A model-free, off-policy algorithm that aims to learn the value of taking an action
in a given state. The core of Q-learning is the Q-function, which estimates the
future rewards for each action in each state.
The agent updates the Q-values using the formula:

Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]

where:
Q(s_t, a_t) is the current Q-value for state s_t and action a_t.
r_{t+1} is the reward received after taking action a_t from state s_t.
γ (gamma) is the discount factor that determines the importance of future rewards.
α (alpha) is the learning rate.
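
A compact sketch of this update rule, assuming a tabular Q-function stored as a
dictionary keyed by (state, action) and a known list of actions (both are
illustrative assumptions, not part of the notes):

def q_learning_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    # Off-policy: bootstrap from the greedy (maximum) Q-value in the next state.
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    td_target = reward + gamma * best_next
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
    return Q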
SARSA (State-Action-Reward-State-Action):

A model-free, on-policy algorithm that updates the Q-value based on the action
actually taken, as opposed to Q-learning, which uses the greedy action from the
next state. The SARSA update formula is:

Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t) ]
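
For comparison, a SARSA step in the same illustrative tabular setup differs only in
bootstrapping from the action a_{t+1} that was actually taken:

def sarsa_update(Q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy: bootstrap from the Q-value of the action actually taken next.
    td_target = reward + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
    return Q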
Deep Q-Networks (DQN):

DQN combines Q-learning with deep neural networks. It uses a neural network to
approximate the Q-values for large state spaces, allowing RL to scale to more
complex tasks like playing video games or robotics.
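
A minimal sketch of the function-approximation idea behind DQN, using a small
PyTorch network that maps a state vector to one Q-value per action (the layer sizes
and names are illustrative, and a full DQN would also need experience replay and a
target network, which are omitted here):

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Approximates Q(s, .): input is a state vector, output is one value per action.
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork(state_dim=4, n_actions=2)
state = torch.zeros(4)                        # placeholder state vector
greedy_action = q_net(state).argmax().item()  # pick the highest-valued action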
Policy Gradient Methods:
These methods, including REINFORCE and Actor-Critic methods, directly optimize the
policy rather than learning a value function. The agent adjusts the probability
distribution over actions to maximize expected rewards.
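
A small sketch of the REINFORCE idea for a linear-softmax policy, assuming one
episode is recorded as (state_features, action, reward) tuples (the
parameterization, names, and hyperparameters are illustrative):

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(theta, episode, alpha=0.01, gamma=0.99):
    # theta: (n_actions, n_features) weights of a linear-softmax policy.
    # episode: list of (features, action, reward) tuples from one rollout.
    G, returns = 0.0, []
    for _, _, r in reversed(episode):      # discounted return from each step
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    for (phi, a, _), G_t in zip(episode, returns):
        probs = softmax(theta @ phi)       # action probabilities in this state
        grad_log = -np.outer(probs, phi)   # d log pi(a|s) / d theta
        grad_log[a] += phi
        theta += alpha * G_t * grad_log    # gradient ascent on expected return
    return theta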

Applications of Active RL:


Game Playing: Active RL has been famously applied in gaming environments, such as
AlphaGo and OpenAI's Dota 2 bot, where the agent learns through trial and error and
continuously improves its strategy.
Robotics: Robots can learn to perform tasks like object manipulation, walking, or
grasping by actively interacting with the environment.
Autonomous Vehicles: Self-driving cars use active RL to navigate complex
environments, learn optimal driving strategies, and improve safety.
Healthcare: Active RL can be used to learn treatment strategies in personalized
medicine, such as optimizing dosage or intervention timing.
Finance: Active RL is used to learn trading strategies and optimize portfolio
management by continuously interacting with market conditions.
-------------------------------------------------------------------------
Reinforcement Learning (RL) is a type of machine learning in which an agent learns
to make decisions by interacting with an environment. The goal of RL is for the
agent to learn a strategy, called a policy, that maximizes cumulative rewards over
time by choosing actions that lead to favorable outcomes.
--------------------------------------------
Policy Search in Artificial Intelligence (AI) refers to the process of finding the
optimal policy that allows an agent to make the best possible decisions in a given
environment to maximize a specific objective, typically a cumulative reward. The
policy is a mapping from states (or observations) to actions, and the goal of
policy search is to determine which actions to take in different states in order to
achieve the highest possible long-term reward.

Key Concepts in Policy Search:


Policy:

A policy is a strategy or function that defines the actions an agent should take at
each state in the environment. It can be deterministic (specific action for each
state) or stochastic (probability distribution over actions for each state).
Objective:

The objective of policy search is to optimize the policy so that the agent
maximizes the cumulative reward or minimizes a cost function over time. This often
involves balancing exploration (trying new actions) and exploitation (choosing the
best-known actions).
Direct vs. Indirect Policy Search:

Direct Policy Search: Involves optimizing the policy directly, without needing to
learn the value function (which predicts future rewards). This can be done through
techniques like reinforcement learning and policy gradient methods.
Indirect Policy Search: Involves learning a value function (or model) first, and
then deriving the policy from it. This is common in value-based methods like Q-
learning and SARSA, where the policy is derived by selecting actions that maximize
the value function.

Applications of Policy Search:


Robotics: Teaching robots to perform complex tasks, like walking, grasping objects,
or navigating through environments.
Game AI: Training agents to play games like chess, Go, or Dota 2 by searching for
optimal policies for decision-making in competitive environments.
Autonomous Vehicles: Enabling self-driving cars to make real-time decisions by
searching for policies that ensure safe and efficient driving.
Healthcare: Developing treatment policies, such as personalized medicine or
optimizing medical intervention strategies based on patient data.
Finance: Learning trading strategies where the goal is to maximize return on
investment over time.
Advertising and Marketing: Learning which content, offers, or advertisements to show
users in order to maximize long-term engagement or conversions.
-----------------------------------------------------

Text Classification: A Summary for 10 Marks


Introduction to Text Classification

Text classification is a core task in natural language processing (NLP) that
involves assigning predefined categories or labels to text. This process helps in
organizing and structuring text data, such as articles, emails, or social media
posts. Common applications of text classification include sentiment analysis, topic
labeling, spam detection, and intent detection. For instance, a text classifier can
analyze a sentence like “This product is so easy to use and has a great user
interface” and assign relevant tags such as UI and Easy to Use.

There are three main approaches to text classification:

Rule-based Systems
Machine Learning-based Systems
Hybrid Systems

1. Rule-Based Systems
Rule-based systems use a set of predefined, handcrafted linguistic rules to
classify text. These rules rely on the identification of certain keywords or
patterns in the text. For example, to classify news articles into Sports and
Politics, you would define two lists of words associated with each category (e.g.,
football, LeBron James for sports and Donald Trump, Putin for politics). When
classifying a new text, the system counts how many times words from each list
appear in the text. The category with more matching words is chosen.

Example: For the headline “When is LeBron James' first game with the Lakers?”, the
rule-based system would classify it under Sports because it contains the term
LeBron James, which appears in the sports word list.
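
A minimal sketch of such a keyword-counting classifier (the word lists and the
tie-breaking choice are illustrative, not a complete system):

SPORTS_WORDS = {"football", "lebron", "james", "lakers", "game"}
POLITICS_WORDS = {"trump", "putin", "election", "senate"}

def classify(headline):
    # Pick the category whose keyword list matches more words in the text.
    words = headline.lower().replace("?", " ").replace("'", " ").split()
    sports_hits = sum(w in SPORTS_WORDS for w in words)
    politics_hits = sum(w in POLITICS_WORDS for w in words)
    return "Sports" if sports_hits >= politics_hits else "Politics"

print(classify("When is LeBron James' first game with the Lakers?"))  # Sports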
Advantages:

Easy to understand and modify.


Can be improved over time by adding new rules.
Disadvantages:

Time-consuming and requires deep domain knowledge.


Difficult to maintain as the system grows more complex.
Doesn’t scale well; adding new rules may interfere with existing ones.

2. Machine Learning-based Systems


Machine learning-based systems do not rely on manually crafted rules. Instead, they
learn to classify text based on labeled training data. These systems use algorithms
that analyze past examples to recognize patterns and associations between text
features and their corresponding categories.

Feature Extraction: The first step in machine learning-based text classification is
to convert text into numerical representations. One common method is Bag of Words
(BoW), where the frequency of words is used to represent the text as a vector. For
instance, if the dictionary of words contains {This, is, awesome, bad, basketball},
and the text is “This is awesome,” it would be represented as the vector (1, 1, 1,
0, 0), indicating the frequency of each word in the text.
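
A small sketch of that Bag of Words step, using the same five-word dictionary as in
the example above (the helper name is illustrative):

def bag_of_words(text, vocabulary):
    # Represent text as a vector of word counts over a fixed vocabulary.
    words = text.lower().split()
    return [words.count(term.lower()) for term in vocabulary]

vocab = ["This", "is", "awesome", "bad", "basketball"]
print(bag_of_words("This is awesome", vocab))   # -> [1, 1, 1, 0, 0]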

Training: The model is trained using labeled data, where each text is associated
with a category (e.g., Sports, Politics). The machine learning algorithm learns to
associate specific patterns in the text with these categories. After training, the
model can classify unseen text based on the learned patterns.
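
A brief sketch of this train-then-predict workflow, using scikit-learn's
CountVectorizer with a Multinomial Naive Bayes classifier (one of the popular
algorithms listed below); the tiny labelled dataset here is made up purely for
illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["LeBron James scored in the Lakers game",
         "The senate passed a new election law",
         "The coach praised the football team",
         "The president met foreign leaders"]
labels = ["Sports", "Politics", "Sports", "Politics"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)                                # learn from labelled data
print(model.predict(["Who won the game last night?"]))  # e.g. ['Sports']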

Advantages:

More accurate than rule-based systems, especially for complex tasks.


Easier to maintain; new examples can be used to retrain the model.
Can handle a variety of tasks with large and diverse datasets.
Popular Algorithms:

Naive Bayes: A probabilistic classifier often used for text classification.


Support Vector Machine (SVM): A powerful classifier used for separating text into
distinct categories.
Deep Learning Models: Advanced models like neural networks, including Recurrent
Neural Networks (RNNs) and transformers like BERT, that can learn more complex
relationships in text data.

3. Hybrid Systems
Hybrid systems combine the strengths of both rule-based and machine learning-based
approaches. These systems use a machine learning classifier as the base and add
rule-based systems to handle specific cases where the classifier may fail or be
less accurate. This allows the system to improve its accuracy by handling edge
cases or ambiguous classifications.

Advantages:

Combines the flexibility and learning power of machine learning with the precision
of rule-based systems.
Allows fine-tuning to improve performance for specific categories or difficult
cases.
