Unit 3 AI

Passive Reinforcement Learning (RL) is a type of reinforcement learning in which an
agent evaluates a fixed, pre-determined policy rather than actively exploring the
environment to discover a better one. The key difference in passive learning is that
the agent does not choose actions to deliberately influence or change the
environment; instead, it learns the utilities of states by observing the outcomes of
the actions its fixed policy prescribes, with no active exploration or
experimentation.

Based on this concept, we discuss three methods of estimating utility:

Direct Utility Estimation (DUE)

The first, most naive method of estimating utility comes from the simplest
interpretation of the above definition. We construct an agent that follows the
policy until it reaches the terminal state. At each step, it logs its current
state and the reward received. Once it reaches the terminal state, it can estimate
the utility of each state for that trial by simply summing the discounted rewards
from that state to the terminal one.

It can now run this 'simulation' n times and calculate the average utility of
each state. If a state occurs more than once in a trial, its utility values are
counted separately.
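
A minimal sketch of this procedure, assuming each trial is recorded as a list of
(state, reward) pairs gathered while following the fixed policy (the trial format,
the discount value, and the function name are illustrative, not from the original
notes):

from collections import defaultdict

def direct_utility_estimation(trials, gamma=0.9):
    # trials: list of trials, each a list of (state, reward) pairs recorded
    # while following the fixed policy until the terminal state.
    totals = defaultdict(float)   # sum of sampled returns for each state
    counts = defaultdict(int)     # number of samples for each state
    for trial in trials:
        ret = 0.0
        # Walk backwards so each state's discounted return accumulates easily.
        for state, reward in reversed(trial):
            ret = reward + gamma * ret
            totals[state] += ret   # repeated visits give separate samples
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}

# Example: two short trials on a tiny chain ending in a terminal state T.
trials = [[("A", -0.04), ("B", -0.04), ("T", 1.0)],
          [("A", -0.04), ("T", 1.0)]]
print(direct_utility_estimation(trials))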

Adaptive Dynamic Programming (ADP)

This method makes use of knowledge of the past state s, the action a, and the newly
perceived state s′ to estimate the transition probability P(s′ | s, a). It does this
by simply counting the new states that result from previous states and actions.
The program runs through the policy a number of times, keeping track of:

each occurrence of state s and the policy-recommended action a, in N_sa

each occurrence of s′ resulting from taking a in s, in N_{s′|sa}.

The transition probability is then estimated as P(s′ | s, a) = N_{s′|sa} / N_sa.
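
A short sketch of this counting step, assuming the agent's experience is available
as (s, a, s′) triples collected while running the policy (the names N_sa and N_s_sa
mirror the counts above and are purely illustrative):

from collections import defaultdict

def estimate_transition_model(transitions):
    # transitions: iterable of (s, a, s_next) triples observed under the policy.
    N_sa = defaultdict(int)     # how often action a was taken in state s
    N_s_sa = defaultdict(int)   # how often s' followed the pair (s, a)
    for s, a, s_next in transitions:
        N_sa[(s, a)] += 1
        N_s_sa[(s, a, s_next)] += 1
    # P(s' | s, a) is estimated as N_s'|sa / N_sa.
    return {(s, a, s_next): n / N_sa[(s, a)]
            for (s, a, s_next), n in N_s_sa.items()}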

Temporal-Difference Learning (TD)

Temporal-difference learning maintains a running utility estimate U(s) for each
state and, after every transition, nudges that estimate toward the observed reward
plus the discounted utility of the successor state, without ever building an
explicit transition model. After a transition from state s to state s′ with reward
R(s), the update is:

U(s) ← U(s) + α [ R(s) + γ U(s′) − U(s) ]

where α is the learning rate and γ is the discount factor. Each update is cheap
because only the observed successor is involved, but convergence is typically
slower than ADP, which makes use of the full learned model.
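
A minimal sketch of one TD(0) update along these lines, with the utility table
stored as a dictionary and illustrative values for alpha and gamma:

def td_update(U, s, reward, s_next, alpha=0.1, gamma=0.9):
    # Move U(s) toward the one-step sample: reward + gamma * U(s').
    U.setdefault(s, 0.0)
    U.setdefault(s_next, 0.0)
    U[s] += alpha * (reward + gamma * U[s_next] - U[s])
    return U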

Applications of Passive RL:


Policy Evaluation: Used when we want to evaluate how good a given policy is, rather
than learn a new one. This is useful in scenarios where a good policy is already
available and only its quality needs to be estimated.
Simulated Environments: In certain applications where exploration is costly or
dangerous, passive learning allows agents to be trained using a fixed sequence of
interactions that might simulate real-world conditions, such as robotics or
healthcare.
--------------------------------------------------------
Active Reinforcement Learning (RL) is a type of reinforcement learning where the
agent actively interacts with the environment to learn an optimal policy that
maximizes cumulative rewards over time. Unlike passive reinforcement learning,
where the agent only evaluates a fixed policy, active reinforcement learning allows
the agent to explore the environment, take actions, and adjust its behavior based
on the outcomes of its actions.

Key Characteristics of Active Reinforcement Learning


Exploration and Exploitation:
Exploration refers to the agent trying out new actions in the environment to
discover better strategies.
Exploitation refers to the agent using the knowledge it has gathered to maximize
rewards by taking actions that it believes will yield the best outcomes.
Balancing exploration and exploitation is critical in active RL. If the agent
explores too much, it may waste time on suboptimal actions, but if it exploits too
much, it might miss better strategies.
Learning an Optimal Policy:

In active RL, the agent’s goal is to learn a policy, which is a strategy that
defines the best action to take from any given state to maximize long-term rewards.
The agent learns this policy through trial and error by interacting with the
environment.
State-Action-Reward-State-Action (SARSA):

Active RL models often use a state-action-reward-state-action (SARSA) or Q-learning
approach, where the agent learns the value of taking specific actions in given
states and how that leads to future rewards.
Reward Signal:

The agent receives feedback in the form of rewards (or penalties) after taking
actions in the environment. The goal is for the agent to learn which actions lead
to higher rewards over time and optimize its decision-making process.
Dynamic Behavior:

Since the agent is actively exploring and interacting with the environment, its
behavior is dynamic and continuously changing based on the accumulated knowledge.
It adapts to the feedback it receives from the environment.

Key Algorithms in Active RL:


Q-Learning:

A model-free, off-policy algorithm that aims to learn the value of taking an action
in a given state. The core of Q-learning is the Q-function, which estimates the
future rewards for each action in each state.
The agent updates the Q-values using the formula:

Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]

where:
Q(s_t, a_t) is the current Q-value for state s_t and action a_t.
r_{t+1} is the reward received after taking action a_t from state s_t.
γ (gamma) is the discount factor that determines the importance of future rewards.
α (alpha) is the learning rate.
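
A compact sketch of this update rule, assuming a tabular Q-function stored as a
dictionary keyed by (state, action) and a known list of actions (both are
illustrative assumptions, not part of the notes):

def q_learning_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    # Off-policy: bootstrap from the greedy (maximum) Q-value in the next state.
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    td_target = reward + gamma * best_next
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
    return Q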
SARSA (State-Action-Reward-State-Action):

A model-free, on-policy algorithm that updates the Q-value based on the action
actually taken, as opposed to Q-learning, which uses the greedy action from the
next state. The SARSA update formula is:

Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t) ]
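
For comparison, a SARSA step in the same illustrative tabular setup differs only in
bootstrapping from the action a_{t+1} that was actually taken:

def sarsa_update(Q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy: bootstrap from the Q-value of the action actually taken next.
    td_target = reward + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
    return Q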
Deep Q-Networks (DQN):

DQN combines Q-learning with deep neural networks. It uses a neural network to
approximate the Q-values for large state spaces, allowing RL to scale to more
complex tasks like playing video games or robotics.
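
A minimal sketch of the function-approximation idea behind DQN, using a small
PyTorch network that maps a state vector to one Q-value per action (the layer sizes
and names are illustrative, and a full DQN would also need experience replay and a
target network, which are omitted here):

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Approximates Q(s, .): input is a state vector, output is one value per action.
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork(state_dim=4, n_actions=2)
state = torch.zeros(4)                        # placeholder state vector
greedy_action = q_net(state).argmax().item()  # pick the highest-valued action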
Policy Gradient Methods:
These methods, including REINFORCE and Actor-Critic methods, directly optimize the
policy rather than learning a value function. The agent adjusts the probability
distribution over actions to maximize expected rewards.
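
A small sketch of the REINFORCE idea for a linear-softmax policy, assuming one
episode is recorded as (state_features, action, reward) tuples (the
parameterization, names, and hyperparameters are illustrative):

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(theta, episode, alpha=0.01, gamma=0.99):
    # theta: (n_actions, n_features) weights of a linear-softmax policy.
    # episode: list of (features, action, reward) tuples from one rollout.
    G, returns = 0.0, []
    for _, _, r in reversed(episode):      # discounted return from each step
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    for (phi, a, _), G_t in zip(episode, returns):
        probs = softmax(theta @ phi)       # action probabilities in this state
        grad_log = -np.outer(probs, phi)   # d log pi(a|s) / d theta
        grad_log[a] += phi
        theta += alpha * G_t * grad_log    # gradient ascent on expected return
    return theta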

Applications of Active RL:


Game Playing: Active RL has been famously applied in gaming environments, such as
AlphaGo and OpenAI's Dota 2 bot, where the agent learns through trial and error and
continuously improves its strategy.
Robotics: Robots can learn to perform tasks like object manipulation, walking, or
grasping by actively interacting with the environment.
Autonomous Vehicles: Self-driving cars use active RL to navigate complex
environments, learn optimal driving strategies, and improve safety.
Healthcare: Active RL can be used to learn treatment strategies in personalized
medicine, such as optimizing dosage or intervention timing.
Finance: Active RL is used to learn trading strategies and optimize portfolio
management by continuously interacting with market conditions.
-------------------------------------------------------------------------
Reinforcement Learning (RL) is a type of machine learning in which an agent learns
to make decisions by interacting with an environment. The goal of RL is for the
agent to learn a strategy, called a policy, that maximizes cumulative rewards over
time by choosing actions that lead to favorable outcomes.
--------------------------------------------
Policy Search in Artificial Intelligence (AI) refers to the process of finding the
optimal policy that allows an agent to make the best possible decisions in a given
environment to maximize a specific objective, typically a cumulative reward. The
policy is a mapping from states (or observations) to actions, and the goal of
policy search is to determine which actions to take in different states in order to
achieve the highest possible long-term reward.

Key Concepts in Policy Search:


Policy:

A policy is a strategy or function that defines the actions an agent should take at
each state in the environment. It can be deterministic (specific action for each
state) or stochastic (probability distribution over actions for each state).
Objective:

The objective of policy search is to optimize the policy so that the agent
maximizes the cumulative reward or minimizes a cost function over time. This often
involves balancing exploration (trying new actions) and exploitation (choosing the
best-known actions).
Direct vs. Indirect Policy Search:

Direct Policy Search: Involves optimizing the policy directly, without needing to
learn the value function (which predicts future rewards). This can be done through
techniques like reinforcement learning and policy gradient methods.
Indirect Policy Search: Involves learning a value function (or model) first, and
then deriving the policy from it. This is common in value-based methods like Q-
learning and SARSA, where the policy is derived by selecting actions that maximize
the value function.

Applications of Policy Search:


Robotics: Teaching robots to perform complex tasks, like walking, grasping objects,
or navigating through environments.
Game AI: Training agents to play games like chess, Go, or Dota 2 by searching for
optimal policies for decision-making in competitive environments.
Autonomous Vehicles: Enabling self-driving cars to make real-time decisions by
searching for policies that ensure safe and efficient driving.
Healthcare: Developing treatment policies, such as personalized medicine or
optimizing medical intervention strategies based on patient data.
Finance: Learning trading strategies where the goal is to maximize return on
investment over time.
Advertising and Marketing: Learning which content, offers, or advertisements to show
users in order to maximize long-term engagement or conversions.
-----------------------------------------------------

Text Classification: A Summary for 10 Marks


Introduction to Text Classification

Text classification is a core task in natural language processing (NLP) that
involves assigning predefined categories or labels to text. This process helps in
organizing and structuring text data, such as articles, emails, or social media
posts. Common applications of text classification include sentiment analysis, topic
labeling, spam detection, and intent detection. For instance, a text classifier can
analyze a sentence like “This product is so easy to use and has a great user
interface” and assign relevant tags such as UI and Easy to Use.

There are three main approaches to text classification:

Rule-based Systems
Machine Learning-based Systems
Hybrid Systems

1. Rule-Based Systems
Rule-based systems use a set of predefined, handcrafted linguistic rules to
classify text. These rules rely on the identification of certain keywords or
patterns in the text. For example, to classify news articles into Sports and
Politics, you would define two lists of words associated with each category (e.g.,
football, LeBron James for sports and Donald Trump, Putin for politics). When
classifying a new text, the system counts how many times words from each list
appear in the text. The category with more matching words is chosen.

Example: For the headline “When is LeBron James' first game with the Lakers?”, the
rule-based system would classify it under Sports because it contains the term
LeBron James, which appears in the sports word list.
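
A minimal sketch of such a keyword-counting classifier (the word lists and the
tie-breaking choice are illustrative, not a complete system):

SPORTS_WORDS = {"football", "lebron", "james", "lakers", "game"}
POLITICS_WORDS = {"trump", "putin", "election", "senate"}

def classify(headline):
    # Pick the category whose keyword list matches more words in the text.
    words = headline.lower().replace("?", " ").replace("'", " ").split()
    sports_hits = sum(w in SPORTS_WORDS for w in words)
    politics_hits = sum(w in POLITICS_WORDS for w in words)
    return "Sports" if sports_hits >= politics_hits else "Politics"

print(classify("When is LeBron James' first game with the Lakers?"))  # Sports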
Advantages:

Easy to understand and modify.


Can be improved over time by adding new rules.
Disadvantages:

Time-consuming and requires deep domain knowledge.


Difficult to maintain as the system grows more complex.
Doesn’t scale well; adding new rules may interfere with existing ones.

2. Machine Learning-based Systems


Machine learning-based systems do not rely on manually crafted rules. Instead, they
learn to classify text based on labeled training data. These systems use algorithms
that analyze past examples to recognize patterns and associations between text
features and their corresponding categories.

Feature Extraction: The first step in machine learning-based text classification is
to convert text into numerical representations. One common method is Bag of Words
(BoW), where the frequency of words is used to represent the text as a vector. For
instance, if the dictionary of words contains {This, is, awesome, bad, basketball},
and the text is “This is awesome,” it would be represented as the vector (1, 1, 1,
0, 0), indicating the frequency of each word in the text.
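
A small sketch of that Bag of Words step, using the same five-word dictionary as in
the example above (the helper name is illustrative):

def bag_of_words(text, vocabulary):
    # Represent text as a vector of word counts over a fixed vocabulary.
    words = text.lower().split()
    return [words.count(term.lower()) for term in vocabulary]

vocab = ["This", "is", "awesome", "bad", "basketball"]
print(bag_of_words("This is awesome", vocab))   # -> [1, 1, 1, 0, 0]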

Training: The model is trained using labeled data, where each text is associated
with a category (e.g., Sports, Politics). The machine learning algorithm learns to
associate specific patterns in the text with these categories. After training, the
model can classify unseen text based on the learned patterns.
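
A brief sketch of this train-then-predict workflow, using scikit-learn's
CountVectorizer with a Multinomial Naive Bayes classifier (one of the popular
algorithms listed below); the tiny labelled dataset here is made up purely for
illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["LeBron James scored in the Lakers game",
         "The senate passed a new election law",
         "The coach praised the football team",
         "The president met foreign leaders"]
labels = ["Sports", "Politics", "Sports", "Politics"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)                                # learn from labelled data
print(model.predict(["Who won the game last night?"]))  # e.g. ['Sports']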

Advantages:

More accurate than rule-based systems, especially for complex tasks.


Easier to maintain; new examples can be used to retrain the model.
Can handle a variety of tasks with large and diverse datasets.
Popular Algorithms:

Naive Bayes: A probabilistic classifier often used for text classification.


Support Vector Machine (SVM): A powerful classifier used for separating text into
distinct categories.
Deep Learning Models: Advanced models like neural networks, including Recurrent
Neural Networks (RNNs) and transformers like BERT, that can learn more complex
relationships in text data.

3. Hybrid Systems
Hybrid systems combine the strengths of both rule-based and machine learning-based
approaches. These systems use a machine learning classifier as the base and add
rule-based systems to handle specific cases where the classifier may fail or be
less accurate. This allows the system to improve its accuracy by handling edge
cases or ambiguous classifications.

Advantages:

Combines the flexibility and learning power of machine learning with the precision
of rule-based systems.
Allows fine-tuning to improve performance for specific categories or difficult
cases.
