Unit-6
Learning and Expert System
• Introduction of learning, Forms of learning,
inductive learning, Learning Decision trees,
Introduction of Expert System & its examples,
Expert system Architecture and its shells.
Learning
• In which we describe agents that can improve
their behavior through diligent study of their
own experiences.
• An agent is learning if it improves its
performance on future tasks after making
observations about the world.
FORMS OF LEARNING
Any component of an agent can be improved
by learning from data. The improvements, and
the techniques used to make them, depend
on four major factors:
• Which component is to be improved.
• What prior knowledge the agent already has.
• What representation is used for the data and
the component.
• What feedback is available to learn from.
Components to be learned
The components of these agents include:
1. A direct mapping from conditions on the current state to actions.
2. A means to infer relevant properties of the world from the percept sequence.
3. Information about the way the world evolves and about the results of possible
actions the agent can take.
4. Utility information indicating the desirability of world states.
5. Action-value information indicating the desirability of actions.
6. Goals that describe classes of states whose achievement maximizes the agent’s
utility. Each of these components can be learned.
• Consider, for example, an agent training to become a taxi driver. Every time the
instructor shouts “Brake!” the agent might learn a condition–action rule for when
to brake (component 1); the agent also learns every time the instructor does not
shout. By seeing many camera images that it is told contain buses, it can learn to
recognize them (2). By trying actions and observing the results—for example,
braking hard on a wet road—it can learn the effects of its actions (3). Then, when
it receives no tip from passengers who have been thoroughly shaken up during the
trip, it can learn a useful component of its overall utility function (4).
Representation and prior knowledge
• We have seen several examples of representations for agent components:
propositional and first-order logical sentences for the components in a
logical agent; Bayesian networks for the inferential components of a
decision-theoretic agent, and so on.
• Effective learning algorithms have been devised for all of these
representations.
• We focus on factored representations—a vector of attribute values—with outputs that can be either a continuous numerical value or a discrete value. Learning can also cover functions and prior knowledge composed of first-order logic sentences and Bayesian networks.
• There is another way to look at the various types of learning. We say that learning a (possibly incorrect) general function or rule from specific input–output pairs is called inductive learning.
• Analytical or deductive learning goes from a known general rule to a new rule that is logically entailed, but is useful because it allows more efficient processing.
Feedback to learn from
• There are three types of feedback that determine the three main types of learning. In unsupervised learning the agent learns patterns in the input even though no explicit feedback is supplied. The most common unsupervised learning task is clustering: detecting potentially useful clusters of input examples. For example, a taxi agent might gradually develop a concept of “good traffic days” and “bad traffic days” without ever being given labeled examples of each by a teacher.
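The taxi agent’s “good traffic days” idea can be sketched as one-dimensional clustering. Below is a minimal two-means loop over invented daily-delay figures — an illustrative toy, not a production clustering algorithm:

```python
# Hypothetical sketch: cluster daily trip delays (minutes) into
# "good" and "bad" traffic days with a tiny 1-D two-means loop.
# The delay values are made up for illustration.

def two_means(values, iterations=10):
    """Cluster 1-D values into two groups by alternating assignment/update."""
    lo, hi = min(values), max(values)          # initial centroids
    for _ in range(iterations):
        a = [v for v in values if abs(v - lo) <= abs(v - hi)]
        b = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not a or not b:                     # degenerate split: stop
            break
        lo = sum(a) / len(a)
        hi = sum(b) / len(b)
    return lo, hi

delays = [4, 5, 6, 5, 30, 35, 28, 6]           # minutes of delay per day
good, bad = two_means(delays)
print(round(good, 1), round(bad, 1))           # centroids of the two clusters
```

No labels were supplied; the two clusters ("about 5 minutes" vs "about 30 minutes") emerge from the data alone, which is the point of unsupervised learning.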
• In reinforcement learning the agent learns from a series of reinforcements—rewards or punishments. For example, the lack of a tip at the end of the journey gives the taxi agent an indication that it did something wrong. The two points for a win at the end of a chess game tell the agent it did something right. It is up to the agent to decide which of the actions prior to the reinforcement were most responsible for it.
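Learning from a scalar reinforcement signal can be illustrated, under heavy simplification, as an action-value estimate updated by an incremental mean — a one-state bandit, not a full reinforcement-learning algorithm, and the “tip” rewards are invented:

```python
# Minimal sketch: estimate the value of the current driving policy from
# reward signals (1 = passenger tipped, 0 = no tip; values invented).

def update(value, reward, count):
    """Incremental mean: new estimate after observing one more reward."""
    return value + (reward - value) / count

q = 0.0
for i, reward in enumerate([0, 1, 1, 0, 1], start=1):
    q = update(q, reward, i)
print(q)   # running estimate of expected reward
```

A real reinforcement learner must additionally solve the credit-assignment problem mentioned above: deciding which earlier actions produced the reward.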
Feedback to learn from
• In supervised learning the agent observes some example input–output pairs and learns a function that maps from input to output. In component 1 above, the inputs are percepts and the outputs are provided by a teacher who says “Brake!” or “Turn left.” In component 2, the inputs are camera images and the outputs again come from a teacher who says “that’s a bus.” In component 3, the theory of braking is a function from states and braking actions to stopping distance in feet. In this case the output value is available directly from the agent’s percepts (after the fact); the environment is the teacher. In practice, these distinctions are not always so crisp.
• In semi-supervised learning we are given a few labeled examples and must make what we can of a large collection of unlabeled examples. Even the labels themselves may not be the oracular truths that we hope for. Imagine that you are trying to build a system to guess a person’s age from a photo. You gather some labeled examples by snapping pictures of people and asking their age. That’s supervised learning. But in reality some of the people lied about their age. It’s not just that there is random noise in the data; rather, the inaccuracies are systematic, and uncovering them is an unsupervised learning problem involving images, self-reported ages, and true (unknown) ages. Thus, both noise and lack of labels create a continuum between supervised and unsupervised learning.
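The braking theory in component 3 — a function from states and braking actions to stopping distance — shows how supervised learning reduces to function fitting. A minimal sketch, assuming an invented one-parameter model d = a·v² and made-up (speed, distance) observations:

```python
# Fit d = a * v**2 by least squares over observed (speed, stopping distance)
# pairs; both the model form and the data are illustrative assumptions.

def fit_braking(pairs):
    num = sum(v * v * d for v, d in pairs)
    den = sum(v ** 4 for v, _ in pairs)
    return num / den

pairs = [(10, 5.1), (20, 19.8), (30, 45.3)]    # (speed, feet to stop)
a = fit_braking(pairs)
print(round(a, 4))                              # learned coefficient
print(round(a * 25 ** 2, 1))                    # predicted distance at speed 25
```

The “teacher” here is the environment itself: the agent observes the stopping distance after the fact and fits the function to those observations.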
LEARNING DECISION TREES
• Decision tree induction is one of the simplest
and yet most successful forms of machine
learning.
The decision tree representation
• A decision tree represents a function that takes as input a vector of
attribute values and returns a “decision”—a single output value. The input
and output values can be discrete or continuous.
• For now we will concentrate on problems where the inputs have discrete values and the output has exactly two possible values; this is Boolean classification, where each example input will be classified as true (a positive example) or false (a negative example). A decision tree reaches its decision by performing a sequence of tests.
• Each internal node in the tree corresponds to a test of the value of one of
the input attributes, Ai, and the branches from the node are labeled with
the possible values of the attribute, Ai = vik.
• Each leaf node in the tree specifies a value to be returned by the function.
The decision tree representation is natural for humans; indeed, many
“How To” manuals (e.g., for car repair) are written entirely as a single
decision tree stretching over hundreds of pages.
Example
• As an example, we will build a decision tree to decide whether to wait for a table at a restaurant. The aim here is to learn a definition for the goal predicate WillWait. First we list the attributes that we will consider as part of the input:
1. Alternate: whether there is a suitable alternative restaurant nearby.
2. Bar : whether the restaurant has a comfortable bar area to wait in.
3. Fri/Sat: true on Fridays and Saturdays.
4. Hungry: whether we are hungry.
5. Patrons: how many people are in the restaurant (values are None, Some, and Full).
6. Price: the restaurant’s price range ($, $$, $$$).
7. Raining: whether it is raining outside.
8. Reservation: whether we made a reservation.
9. Type: the kind of restaurant (French, Italian, Thai, or burger).
10. WaitEstimate: the wait estimated by the host (0–10 minutes, 10–30, 30–60, or
>60).
Example
• Note that every variable has a small set of
possible values; the value of WaitEstimate, for
example, is not an integer, rather it is one of the
four discrete values 0–10, 10–30, 30–60, or >60.
• Notice that the tree ignores the Price and Type
attributes. Examples are processed by the tree
starting at the root and following the appropriate
branch until a leaf is reached.
• For instance, an example with Patrons = Full and
WaitEstimate = 0–10 will be classified as positive
(i.e., yes, we will wait for a table).
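The walkthrough above (Patrons = Full, WaitEstimate = 0–10 classified as yes) can be traced in code with the tree stored as nested dicts: internal nodes map an attribute name to a dict of branches, and leaves are "yes"/"no". This is a simplified fragment of the restaurant tree, not the complete one:

```python
# Simplified fragment of the restaurant decision tree (further tests on
# the Full branch, e.g. Hungry, are omitted for illustration).
tree = {"Patrons": {"None": "no",
                    "Some": "yes",
                    "Full": {"WaitEstimate": {">60": "no",
                                              "30-60": "no",
                                              "10-30": "yes",
                                              "0-10": "yes"}}}}

def classify(node, example):
    """Follow branches from the root until a leaf value is reached."""
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[example[attribute]]
    return node

print(classify(tree, {"Patrons": "Full", "WaitEstimate": "0-10"}))
```

Note that, as the slide says, the tree never consults Price or Type: classification only reads the attributes actually tested along the path.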
Inducing decision trees from examples
A training set of 12 examples
DECISION-TREE-LEARNING algorithm
• The DECISION-TREE-LEARNING algorithm adopts a
greedy divide-and-conquer strategy: always test the
most important attribute first.
• This test divides the problem up into smaller
subproblems that can then be solved recursively. By
“most important attribute,” we mean the one that
makes the most difference to the classification of an
example.
• That way, we hope to get to the correct classification
with a small number of tests, meaning that all paths in
the tree will be short and the tree as a whole will be
shallow.
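“Most important attribute” is usually made precise as the attribute with the highest information gain: the reduction in entropy of the positive/negative labels after splitting on it. A minimal sketch on invented Boolean-labelled examples:

```python
from math import log2

def entropy(pos, neg):
    """Entropy (in bits) of a Boolean label distribution."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h

def information_gain(examples, attribute):
    """examples: list of (attribute_dict, bool_label) pairs."""
    pos = sum(1 for _, y in examples if y)
    before = entropy(pos, len(examples) - pos)
    remainder = 0.0
    for v in {e[attribute] for e, _ in examples}:
        subset = [(e, y) for e, y in examples if e[attribute] == v]
        p = sum(1 for _, y in subset if y)
        remainder += len(subset) / len(examples) * entropy(p, len(subset) - p)
    return before - remainder

# Tiny invented data: Patrons perfectly splits the labels, Type does not,
# so the greedy learner would test Patrons first.
data = [({"Patrons": "Some", "Type": "Thai"}, True),
        ({"Patrons": "Some", "Type": "French"}, True),
        ({"Patrons": "None", "Type": "Thai"}, False),
        ({"Patrons": "None", "Type": "French"}, False)]
print(information_gain(data, "Patrons") > information_gain(data, "Type"))
```

The attribute chosen at each node is the one with the largest gain; the algorithm then recurses on each branch’s subset of examples.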
THE THEORY OF LEARNING
• We’ll start with the question of how many examples are needed for learning. We saw from the learning curve for decision tree learning on the restaurant problem that performance improves with more training data. Learning curves are useful, but they are specific to a particular learning algorithm on a particular problem. Are there some more general principles governing the number of examples needed?
• Questions like this are addressed by computational learning theory, which lies at the intersection of AI, statistics, and theoretical computer science. The underlying principle is that any hypothesis that is seriously wrong will almost certainly be “found out” with high probability after a small number of examples, because it will make an incorrect prediction.
• Thus, any hypothesis that is consistent with a sufficiently large set of training examples is unlikely to be seriously wrong: that is, it must be probably approximately correct. Any learning algorithm that returns hypotheses that are probably approximately correct is called a PAC learning algorithm; we can use this approach to provide bounds on the performance of various learning algorithms.
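One standard PAC bound (stated here from general learning theory, not derived in these slides) says that for a finite hypothesis space H, N ≥ (1/ε)(ln(1/δ) + ln|H|) examples suffice so that, with probability at least 1−δ, any hypothesis consistent with all N examples has error at most ε:

```python
from math import ceil, log

def pac_sample_bound(hypothesis_space_size, epsilon, delta):
    """Examples sufficient for a consistent hypothesis to be
    probably (1 - delta) approximately (error <= epsilon) correct."""
    return ceil((log(1 / delta) + log(hypothesis_space_size)) / epsilon)

# e.g. |H| = 2**10 Boolean hypotheses, 5% error, 95% confidence
print(pac_sample_bound(2 ** 10, epsilon=0.05, delta=0.05))
```

The bound grows only logarithmically in |H| and 1/δ, but linearly in 1/ε — tightening the allowed error is what costs the most examples.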
Types of learning
• Explanation-based learning
• Computational learning theory
• Relevance-based learning
• Instance-based learning
• Inference learning
• Statistical learning
Explanation-Based Learning
• Artificial Intelligence (AI) has made significant strides in learning and problem-
solving, thanks to various approaches that allow machines to improve their
performance over time. One such approach is Explanation-Based Learning (EBL).
• EBL is a form of machine learning where the system learns by understanding and
generalizing from a single example, focusing on the underlying principles that
make the example work.
• This article explores the concept of EBL, its working mechanism, applications,
and its advantages and challenges.
What is Explanation-Based Learning?
• Explanation-Based Learning is a machine learning technique where an AI
system learns by analyzing and understanding the underlying structure or
reasoning behind a specific example. Unlike traditional learning methods that
require numerous examples to generalize, EBL leverages domain knowledge to
form a general rule or concept from just one or a few examples. This makes EBL
particularly useful in domains where data is scarce or where understanding the
rationale behind examples is more critical than just recognizing patterns.
Key Characteristics of EBL
• Use of Domain Knowledge: EBL relies heavily on pre-existing domain knowledge to explain why a particular
example is a valid instance of a concept. This knowledge helps the system to generalize the learned concept to
new, similar situations.
• Focused Learning: EBL focuses on understanding the essential features of an example that are necessary to
achieve a goal or solve a problem. This contrasts with other learning methods that may treat all features equally or
rely on statistical correlations.
• Efficiency: Since EBL can learn from a single example by generalizing from it, it is computationally efficient
compared to other learning methods that require large datasets for training.
How Explanation-Based Learning Works
• Explanation-Based Learning follows a systematic process that involves the following steps:
• Input Example: The learning process begins with a single example that the system needs to learn from. This
example is typically a positive instance of a concept that the system needs to understand.
• Domain Knowledge: The system uses domain knowledge, which includes rules, concepts, and relationships
relevant to the problem domain. This knowledge is crucial for explaining why the example is valid.
• Explanation Generation: The system generates an explanation for why the example satisfies the concept. This
involves identifying the relevant features and their relationships that make the example a valid instance.
• Generalization: Once the explanation is generated, the system generalizes it to form a broader concept that can
apply to other similar examples. This generalization is typically in the form of a rule or a set of rules that describe
the concept.
• Learning Outcome: The outcome of EBL is a generalized rule or concept that can be applied to new situations. The
system can now use this rule to identify or solve similar problems in the future.
Example of Explanation-Based
Learning in AI
• Scenario: Diagnosing a Faulty Component in a Car Engine
• Context: Imagine you have an AI system designed to diagnose problems in car engines. One day, the system is
given a specific example where the engine fails to start. After analyzing the case, the system learns that the failure
was due to a faulty ignition coil.
Step 1: Input Example
The system is provided with a scenario where a car engine fails to start. The diagnostic information indicates that the
cause is a faulty ignition coil.
Step 2: Use of Domain Knowledge
The AI system has pre-existing domain knowledge about car engines. It knows how the ignition system works, the role
of the ignition coil, and the conditions under which an engine would fail to start.
Step 3: Explanation Generation
Using this domain knowledge, the system generates an explanation for why the engine failure occurred:
Ignition System Knowledge: The system understands that the ignition coil is responsible for converting the battery's
low voltage to the high voltage needed to create a spark in the spark plugs.
Faulty Coil Impact: It explains that if the ignition coil is faulty, it will fail to generate the necessary high voltage,
resulting in no spark, which prevents the engine from starting.
Step 4: Generalization
The system then generalizes this explanation to form a rule:
General Rule: "If the engine fails to start and the ignition coil is faulty, then the cause of the failure is likely due to the
ignition coil not providing the necessary voltage to the spark plugs."
Step 5: Learning Outcome
• The AI system has now learned a new diagnostic rule that can be applied to future cases:
• Future Application: In future diagnostics, if the system encounters a similar scenario where the engine fails to
start, it can use this learned rule to quickly check the ignition coil as a potential cause.
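The five steps above can be sketched with a toy rule base. The predicate names and rules are invented for illustration, and a real EBL system would generalize the proof structure automatically rather than hand-encode the result:

```python
# Toy domain knowledge (Step 2): effect -> list of possible causes.
domain_rules = {
    "no_high_voltage": ["faulty_ignition_coil"],
    "no_spark": ["no_high_voltage"],
    "engine_fails_to_start": ["no_spark"],
}

def explain(symptom, facts, rules):
    """Step 3: chain backwards from the symptom to the observed facts."""
    if symptom in facts:
        return [symptom]
    for cause in rules.get(symptom, []):
        chain = explain(cause, facts, rules)
        if chain:
            return chain + [symptom]
    return []

# Steps 1 and 3: explain the single training case.
chain = explain("engine_fails_to_start", {"faulty_ignition_coil"}, domain_rules)
print(" -> ".join(chain))

# Steps 4-5: the generalized rule retained for future diagnosis.
def learned_rule(observed_faults):
    # faulty coil -> no high voltage -> no spark -> engine fails to start
    return "faulty_ignition_coil" in observed_faults
```

One explained example yields a reusable rule — contrast this with the many labelled examples an inductive learner would need for the same conclusion.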
Applications of Explanation-Based
Learning
• Explanation-Based Learning is particularly useful in domains where
understanding the reasoning behind decisions is critical.
• Some of the notable applications of EBL include:
• Medical Diagnosis: EBL can be used in medical diagnosis systems to learn
from specific cases and generalize the underlying principles for diagnosing
similar conditions in other patients.
• Legal Reasoning: In legal systems, EBL can help in understanding the
principles behind legal precedents and applying them to new cases with
similar circumstances.
• Automated Planning: EBL is useful in automated planning systems, where
it can learn from successful plans and generalize the steps required to
achieve similar goals in different contexts.
• Natural Language Processing: EBL can be applied in natural language
processing tasks where understanding the structure and meaning behind
language is more important than statistical correlations.
Explanation-Based Learning
Advantages of Explanation-Based Learning
• Efficiency in Learning: EBL can learn effectively from a single example, making it efficient in
situations where data is scarce or expensive to obtain.
• Understanding and Generalization: EBL focuses on understanding the rationale behind examples,
leading to more robust generalizations that can be applied to a wide range of situations.
• Interpretable Models: The rules or concepts learned through EBL are often more interpretable than
those learned through other methods, making it easier to understand and trust the system's
decisions.
Challenges and Limitations
• Dependency on Domain Knowledge: EBL relies heavily on accurate and comprehensive domain
knowledge. If the domain knowledge is incomplete or incorrect, the system may generate flawed
explanations and generalizations.
• Limited to Well-Defined Problems: EBL is most effective in well-defined problem domains where
the rules and relationships are clear. It may struggle in more complex or ambiguous domains.
• Complexity of Explanation Generation: Generating explanations can be computationally intensive,
especially in domains with complex relationships and a large number of features.
Computational learning
• Computational learning theory (CoLT) is a subfield of artificial intelligence (AI) that
studies how machines learn from data. It's a cornerstone of AI that examines the
fundamental processes, algorithms, and mechanisms that enable machines to
understand and perceive data. CoLT's main goals include:
• Developing algorithms that can learn from data
• Understanding the limits of what can be learned from data
• Quantifying learning tasks and algorithms using mathematical frameworks
• CoLT's theoretical results are often applied to supervised learning, a type of
inductive learning where an algorithm is given labeled samples. For example, an
algorithm might be given descriptions of mushrooms, with labels indicating
whether or not they are edible. The algorithm then uses these samples to create a
classifier that can assign labels to new samples.
• CoLT concepts like VC dimension and sample complexity are used to inform
regularization techniques that prevent overfitting. Regularization methods help
ensure that the learned model performs well on both training data and new,
unseen data.
Computational learning
• One of the main goals of computational learning theory is to develop algorithms
that can learn from data. This includes both supervised and unsupervised learning
algorithms. Supervised learning algorithms are given a set of training data that
includes the correct answers for a set of tasks. The goal of a supervised learning
algorithm is to learn a function that can map from the input data to the correct
answers. Unsupervised learning algorithms are given a set of data but not the
correct answers. The goal of an unsupervised learning algorithm is to find patterns
in the data.
• Another goal of computational learning theory is to understand the limits of what
can be learned from data. This includes understanding the sample complexity of
learning algorithms. The sample complexity of a learning algorithm is the number
of data points that the algorithm needs to see in order to learn a function. The
sample complexity tells us how much data an algorithm needs in order to learn a
function.
• The goals of computational learning theory are to develop algorithms that can
learn from data and to understand the limits of what can be learned from data.
These goals are important for developing AI systems that can learn from data and
for understanding the limits of AI.
What are some of the main methods used in
computational learning theory?
• Computational learning theory is a branch of artificial intelligence (AI) that deals
with the design and analysis of algorithms that learn from data. The main methods
used in computational learning theory are:
1. Inductive learning: This is the most common type of learning used in AI. In inductive
learning, a computer program is given a set of training data (examples of correct
inputs and outputs), and it is then tasked with generating a general rule or model
that can be used to predict the output for new inputs.
2. Deductive learning: In deductive learning, a computer program is given a set of
rules or a model, and it is then tasked with using these to generate correct outputs
for new inputs.
3. Abductive learning: In abductive learning, a computer program is given a set of
training data, and it is then tasked with generating a hypothesis (a possible
explanation for the data) that is consistent with the data.
4. Reinforcement learning: In reinforcement learning, a computer program is given a
set of rewards and punishments, and it is then tasked with learning a policy (a set
of rules) that will maximize the rewards and minimize the punishments.
Relevance based learning
• Relevance Learning is a critical concept in Artificial
Intelligence (AI) that enables models to identify and
prioritize the most important information within a dataset.
• This technique is essential for enhancing the performance
of various AI applications, such as search engines,
recommendation systems, and even in fields like medical
diagnostics.
• Understanding how AI determines the importance of data
can significantly improve the effectiveness of these
systems.
What is Relevance Learning?
• Relevance learning in AI refers to the process by which models learn to determine what is relevant
between input data and the expected output. This involves ranking or ordering components based
on their significance, often using a predefined criterion and labeled data.
• Various mathematical models are trained to mimic human-like judgments of importance, producing
results that make sense within the given context. Relevance learning is particularly crucial in areas
like Information Retrieval, Recommender Systems, and Natural Language Processing, where the
primary goal is to deliver only the most relevant information to the end-user.
• Key Features of Relevance Learning
• Contextual Understanding: Models learn to understand the context and evaluate the significance
of objects within that context.
• Data-Driven: Relevance Learning leverages large datasets with relevance annotations to train
models effectively.
• Ranking Optimization: items are positioned so that the most significant ones are ranked higher.
• Personalization: Relevance learning allows for the adjustment of relevance according to user
preferences and experiences.
• Continuous Improvement: Models are continually updated with new data, enabling them to adapt
and improve their relevance assessments over time.
Relevance Learning Architecture
Input Layer
• Data Ingestion: Involves gathering raw data from various sources, such as text documents, images, and user interactions. This data
includes features (keywords, metadata) and labels indicating relevance.
• Preprocessing: The raw data is cleaned, formatted, and aligned with the descriptions of ideal training data. This step might involve
tokenization, embedding generation, or feature extraction.
Feature Extraction
• Domain-Specific Features: Extract features that are directly useful for the task at hand, such as term frequency or document length in a
document retrieval system.
• Deep Feature Representation: High-level abstract features are extracted using advanced architectures like Convolutional Neural
Networks (CNNs) or Transformers.
Relevance Model
• Learning to Rank (LTR): This core model learns the relevance of various inputs, often using algorithms like RankNet, LambdaRank, or
neural ranking models to score items according to their relevance.
• Loss Function: Specific ranking loss functions, such as cross-entropy for classification or pairwise loss for ranking, are used during the
model's training phase.
• Supervised Learning: Relies on datasets where relevance has been predefined for each data item.
Scoring and Ranking
• Relevance Scoring: The trained model predicts a relevance score for each item, which helps in ranking them according to their
significance to the query or task.
• Sorting Mechanism: Items are sorted based on their relevance scores, with the highest-scoring items placed at the top.
Feedback Loop
• User Feedback: The system gathers user feedback, such as clicks or ratings, to refine and update the relevance model.
• Model Retraining: The model is periodically retrained with new data and feedback to enhance its performance over time.
Output Layer
• Result Presentation: The most relevant items are ranked and presented to the user based on the relevance scores.
• Evaluation Metrics: Metrics like Precision at K, Mean Average Precision (MAP), or Normalized Discounted Cumulative Gain (NDCG) are used to evaluate the model’s performance.
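The scoring-and-ranking stage and the Precision-at-K metric mentioned above can be sketched in a few lines. The documents and scores are invented, and the score dictionary stands in for a trained relevance model:

```python
def rank(items, score):
    """Order items by descending predicted relevance score."""
    return sorted(items, key=score, reverse=True)

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are actually relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

docs = ["d1", "d2", "d3", "d4"]
scores = {"d1": 0.2, "d2": 0.9, "d3": 0.4, "d4": 0.7}   # stand-in model output
ranked = rank(docs, scores.get)
print(ranked)
print(precision_at_k(ranked, relevant={"d2", "d3"}, k=2))
```

In a real system the feedback loop would compare such metrics against user clicks or ratings and retrain the scoring model accordingly.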
The Relevance Learning Hypothesis
• The Relevance Learning Hypothesis suggests that a model must be trained to understand the relevance between data points to predict
or rank information in a manner consistent with human reasoning. This hypothesis forms the foundation of research in fields like
information retrieval, recommendation systems, and natural language processing.
Relevance Learning
Key Points of the Relevance Learning Hypothesis:
• Contextual Relevance: Relevance varies depending on how the query or task is framed.
• Dynamic Nature of Relevance: Relevance is not static; it changes over time with new data and shifting user
preferences.
• Human-Centric Understanding: Models must replicate human-like judgments about what aspects are important
or useful.
• Optimization for Utility: Accurately predicting relevance maximizes the utility of the system, providing users with
results that exceed their expectations.
• Feedback Loop: Continuous feedback helps models improve their relevance predictions over time.
Examples of Relevance Learning
• Search Engines: Google Search uses relevance learning algorithms to rank billions of web pages, ensuring the most
relevant pages appear at the top.
• Recommendation Systems: Platforms like Amazon and Netflix use relevance learning to suggest products or
content based on user history and preferences.
• Personalized News Feeds: Social media platforms like Facebook and Twitter use relevance learning to deliver news
feeds tailored to user interests.
• Information Retrieval Systems: Legal research tools and academic search engines like Google Scholar use
relevance learning to sort and rank documents based on their relevance to the user's query.
• Online Advertising: Google Ads and social media platforms like Facebook use relevance learning to determine
which ads to display based on user behavior and search history.
Instance-based learning
• The Machine Learning systems which are categorized as instance-based learning are the systems that learn the training examples by heart and then generalize to new instances based on some similarity measure. It is called instance-based because it builds the hypotheses from the training instances.
• It is also known as memory-based learning or lazy learning (because processing is delayed until a new instance must be classified). The time complexity of this algorithm depends upon the size of the training data. Each time a new query is encountered, its previously stored data is examined and a target-function value is assigned to the new instance.
• The worst-case time complexity of this algorithm is O(n), where n is the number of training instances. For example, if we were to create a spam filter with an instance-based learning algorithm, instead of just flagging emails that are already marked as spam, our spam filter would be programmed to also flag emails that are very similar to them. This requires a measure of resemblance between two emails. A similarity measure between two emails could be the same sender, the repetitive use of the same keywords, or something else.
Advantages:
• Instead of estimating for the entire instance set, local approximations can be made to the target function.
• This algorithm can adapt easily to new data, which is collected as we go.
Disadvantages:
• Classification costs are high.
• A large amount of memory is required to store the data, and each query involves building a local model from scratch.
• Some of the instance-based learning algorithms are :
• K Nearest Neighbor (KNN)
• Self-Organizing Map (SOM)
• Learning Vector Quantization (LVQ)
• Locally Weighted Learning (LWL)
• Case-Based Reasoning
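The spam-filter idea above — classify a new email by its similarity to stored examples — is k-nearest-neighbor in miniature. Here the features are keyword sets and the similarity measure is Jaccard overlap; both choices are illustrative, not prescriptive:

```python
def jaccard(a, b):
    """Similarity of two keyword sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

def knn_classify(query, examples, k=3):
    """examples: list of (keyword_set, label). Majority label of k nearest."""
    nearest = sorted(examples, key=lambda ex: jaccard(query, ex[0]),
                     reverse=True)[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# Invented training emails, reduced to keyword sets.
train = [({"win", "prize", "money"}, "spam"),
         ({"free", "money", "now"}, "spam"),
         ({"meeting", "agenda"}, "ham"),
         ({"project", "deadline"}, "ham")]
print(knn_classify({"free", "prize", "money"}, train, k=3))
```

Note that all the work happens at query time — nothing is precomputed — which is exactly why such algorithms are called “lazy”.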
Memory-based learning (MBL)
• Memory-based learning (MBL) is a machine learning technique that uses
stored training data to classify new data items by their similarity. It's also
known as instance-based learning.
Key characteristics of memory-based learning:
• Stores training data: MBL stores all available instances of a task in
memory.
• Compares new data to training data: MBL compares new data items to
the training data to classify them.
• Uses similarity metrics: MBL uses similarity metrics to determine which
training data items are most similar to the new data item.
• Adapts to new data: MBL can adapt to new data by storing new instances
or removing old ones.
• Lazy algorithms: MBL algorithms are sometimes called "lazy" because
they postpone computation until a new instance is observed.
• Some generalization techniques used in MBL include: nearest-neighbor
search, space decomposition techniques, and clustering.
• Examples of MBL algorithms include: k-nearest neighbors algorithm,
Kernel machines, and RBF networks.
Statistical Learning
Difference between Statistical Model and Machine Learning
Statistical Model:
• A mathematical process that attempts to describe the population from which a sample came, which allows us to
make predictions of future samples from that population.
• Examples: Hypothesis testing, Correlation, etc.
Some problem statements solved by statistical modeling:
• employing inferential statistics to calculate the average income of a population from a random sample
• estimating a stock’s future price using previous data and time-series analysis.
Objectives of Statistical Model:
• proving a result, for example via hypothesis testing and p-values.
• searching data for interesting information (exploratory analysis), such as generating hypotheses.
• building a predictive model.
Assumptions in Statistical Model:
• Independence, states that there shouldn’t be any relationships between the observations in the collection.
• Normality requires that the response variable’s distribution is approximately normal, with data symmetric around
the mean.
• Linearity indicates that the relationship between the response variable and predictor variable(s) should be linear.
• No multicollinearity, suggesting the independence of predictor variables from each other.
• No outliers: the dataset should not contain any outliers that may influence the results.
Types of Statistical Model
• The group of probability distributions that
have a finite number of parameters is known
as parametric.
• Nonparametric models are those where the
kind and quantity of parameters are
adjustable and not predetermined.
• Semiparametric models have both a parametric and a nonparametric component.
Machine Learning:
• Machine Learning is the science that allows computers to learn and improve their learning over time, by feeding them data
and information in the form of observations and real-world interactions.
• According to Arthur Samuel, machine learning is “the field of study that gives computers the ability to learn without being explicitly programmed.”
• OR
• According to Tom Mitchell, “Machine learning is the study of computer algorithms that allow computer programs to improve
through experience automatically”.
• Example: predicting a house price with a machine learning model from attributes such as location and area. Machine
learning lets us find the relationship between the dependent variable (the house price) and the independent features
(location, area, year of construction), and then predict the price for a new input from that learned relationship.
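The house-price example above can be made concrete with a minimal sketch: ordinary least squares on a single feature. All numbers below are made up for illustration; real models would use more features and more data.

```python
# Minimal sketch: fit price = a * area + b by least squares on hypothetical data.
areas  = [50.0, 80.0, 100.0, 120.0]   # area in square metres (made-up values)
prices = [110.0, 170.0, 210.0, 250.0] # price in thousands (made-up values)

n = len(areas)
mean_x = sum(areas) / n
mean_y = sum(prices) / n

# Slope = covariance(area, price) / variance(area); intercept from the means.
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(areas, prices)) \
    / sum((x - mean_x) ** 2 for x in areas)
b = mean_y - a * mean_x

def predict(area):
    return a * area + b

print(round(predict(90), 1))  # → 190.0 for this perfectly linear toy data
```

Since the toy data lie exactly on the line price = 2 × area + 10, the fit recovers a = 2 and b = 10; real data would only be approximated by the line.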
Some problem statements for machine learning :
• Recommendation: Utilize collaborative filtering to suggest movies to viewers based on their prior viewing habits and ratings.
• Disease Prediction: Employing a support vector machine to predict a patient’s propensity to develop a
specific disease based on their medical history and genetic information.
Assumptions in Machine Learning:
• Data is independent and identically distributed (IID), which means that every data point is independent of the others and
has the same distribution.
• The assumption that there is a linear relationship between the input variables and the output variable underlies some
models, such as linear regression.
• Normality: some models presuppose that the model’s input variables and/or error terms are normally distributed.
• No multicollinearity: linear models presuppose that the input variables are not highly correlated with one another, i.e. do
not exhibit multicollinearity.
• Large sample size: certain models rely on the sample size being sufficiently large to guarantee precise parameter estimates.
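As a small illustration of checking one of these assumptions, the no-multicollinearity condition can be probed with the Pearson correlation between two predictors. This is a pure-Python sketch; the column values are made up.

```python
# Sketch: checking the no-multicollinearity assumption with Pearson correlation
# between two hypothetical predictor columns (values are made up).
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

area  = [50.0, 80.0, 100.0, 120.0]
rooms = [2.0, 3.0, 4.0, 5.0]   # strongly tied to area in this toy data

r = pearson(area, rooms)
# |r| close to 1 signals multicollinearity; one of the predictors
# would typically be dropped or the features combined.
```

For this toy data r is close to 1, so a linear model should not use both columns as independent predictors.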
Statistical Learning
What is Statistical Learning?
• Statistical learning is the use of data to make predictions or identify patterns. It applies statistical
algorithms to analyze and model relationships between variables in a dataset. These algorithms can
perform tasks such as classification, regression, and clustering.
• How is Statistical Learning Used?
• Statistical learning is used in various fields, including finance, healthcare, and marketing. It can
predict customer behavior, identify fraud, and analyze medical data. Some common applications of
statistical learning include:
• Predictive modeling
• Time series forecasting
• Image recognition
• Natural language processing
• Anomaly detection
• Benefits of Statistical Learning
• Statistical learning has many benefits, including:
• Accurate predictions: statistical learning algorithms often make more accurate predictions than
traditional methods.
• Scalability: statistical learning algorithms can be applied to large datasets with many variables.
• Flexibility: statistical learning algorithms can be used for various tasks, from classification to clustering.
• Automation: statistical learning algorithms can automate many tasks, saving time and reducing errors.
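As a tiny illustration of one of the tasks mentioned above (clustering), here is a one-dimensional k-means sketch with k = 2. The data values and the simple min/max initialisation are illustrative choices, not from the slides.

```python
# Toy sketch of a statistical-learning task (clustering): 1-D k-means, k = 2.
def kmeans_1d(points, iters=10):
    # Simple initialisation: assumes the data has at least two distinct values.
    c1, c2 = min(points), max(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        # Update step: centroids move to the mean of their group.
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted([c1, c2])

centroids = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])
# centroids end up near [1.0, 9.0]: the two obvious groups in the data
```

The same assign/update loop generalises to higher dimensions and larger k; library implementations add better initialisation and convergence checks.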
Inductive learning
What is Inductive Learning Algorithm?
• Inductive Learning Algorithm (ILA) is an iterative and inductive machine learning algorithm used to
generate a set of classification rules of the form “IF-THEN” from a set of examples, producing rules at
each iteration and appending them to the rule set.
• There are basically two methods for knowledge extraction: from domain experts, and through machine
learning. For very large amounts of data, relying on domain experts is neither practical nor reliable, so we
turn to the machine learning approach. One way to use machine learning is to replicate the expert’s logic
as hand-written algorithms, but this is tedious, time-consuming, and expensive. Inductive algorithms
instead generate the strategy for performing a task from examples, without being instructed separately at
each step.
Why you should use Inductive Learning?
• The ILA was a new algorithm that was needed even when other inductive learning algorithms like ID3 and
AQ were available.
• The need was due to the pitfalls which were present in the previous algorithms, one of the major
pitfalls was the lack of generalization of rules.
• ID3 and AQ used the decision-tree production method, which was too specific: the resulting trees were
difficult to analyze and very slow to apply even for basic short classification problems.
• The decision-tree-based algorithms were unable to handle a new problem if some attributes were
missing.
• The ILA uses the method of production of a general set of rules instead of decision trees, which
overcomes the above problems
Basic Requirements to Apply
Inductive Learning Algorithm
• List the examples in the form of a table ‘T’
where each row corresponds to an example
and each column contains an attribute value.
• Create a set of m training examples, each
example composed of k attributes and a class
attribute with n possible decisions.
• Create a rule set, R, having the initial value
false.
• Initially, all rows in the table are unmarked.
Necessary Steps for Implementation
• Step 1: divide the table ‘T’ containing m examples into n sub-tables (t1, t2,…..tn). One table for
each possible value of the class attribute. (repeat steps 2-8 for each sub-table)
• Step 2: Initialize the attribute combination count ‘ j ‘ = 1.
• Step 3: For the sub-table on which work is going on, divide the attribute list into distinct
combinations, each combination with ‘j ‘ distinct attributes.
• Step 4: For each combination of attributes, count the number of occurrences of attribute values
that appear under the same combination of attributes in unmarked rows of the sub-table under
consideration and, at the same time, do not appear under the same combination of attributes in
other sub-tables. Call the first combination with the maximum number of occurrences the max-
combination ‘MAX’.
• Step 5: If ‘MAX’ == null, increase ‘ j ‘ by 1 and go to Step 3.
• Step 6: In the sub-table being worked on, mark all rows in which the values of ‘MAX’ appear as
classified.
• Step 7: Add a rule (IF attribute = “XYZ” –> THEN decision is YES/ NO) to R whose left-hand side will
have attribute names of the ‘MAX’ with their values separated by AND, and its right-hand side
contains the decision attribute value associated with the sub-table.
• Step 8: If all rows are marked as classified, then move on to process another sub-table and go to
Step 2. Else, go to Step 4. If no sub-tables are available, exit with the set of rules obtained till then.
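The steps above can be sketched in Python roughly as follows. This is a simplified implementation; the block-description table at the bottom is a toy illustration (not from these slides), and the tie-breaking between equally good combinations simply takes the first one found.

```python
from itertools import combinations

def ila(examples, attributes, class_attr):
    """Simplified sketch of the ILA steps above.
    examples: list of dicts mapping each attribute (and class_attr) to a value."""
    rules = []
    classes = sorted({e[class_attr] for e in examples})
    for cls in classes:                                  # Step 1: one sub-table per class
        sub = [e for e in examples if e[class_attr] == cls]
        others = [e for e in examples if e[class_attr] != cls]
        unmarked = set(range(len(sub)))
        j = 1                                            # Step 2
        while unmarked and j <= len(attributes):
            best, best_count = None, 0
            for combo in combinations(attributes, j):    # Step 3: combinations of size j
                counts = {}                              # Step 4: count value tuples that
                for i in sorted(unmarked):               # occur here but in no other sub-table
                    vals = tuple(sub[i][a] for a in combo)
                    if any(tuple(o[a] for a in combo) == vals for o in others):
                        continue
                    counts[vals] = counts.get(vals, 0) + 1
                for vals, c in counts.items():
                    if c > best_count:
                        best, best_count = (combo, vals), c
            if best is None:                             # Step 5: MAX is null
                j += 1
                continue
            combo, vals = best
            unmarked = {i for i in unmarked              # Step 6: mark matching rows
                        if tuple(sub[i][a] for a in combo) != vals}
            cond = " AND ".join(f"{a} = {v}" for a, v in zip(combo, vals))
            rules.append(f"IF {cond} THEN {class_attr} is {cls}")  # Step 7
    return rules                                         # Step 8: all sub-tables processed

# Toy block-description table (illustrative data)
examples = [
    {"size": "medium", "color": "blue",  "shape": "brick",  "cls": "yes"},
    {"size": "small",  "color": "red",   "shape": "wedge",  "cls": "no"},
    {"size": "small",  "color": "red",   "shape": "sphere", "cls": "yes"},
    {"size": "large",  "color": "red",   "shape": "wedge",  "cls": "no"},
    {"size": "large",  "color": "green", "shape": "pillar", "cls": "yes"},
    {"size": "large",  "color": "red",   "shape": "pillar", "cls": "no"},
    {"size": "large",  "color": "green", "shape": "sphere", "cls": "yes"},
]
rules = ila(examples, ["size", "color", "shape"], "cls")
for r in rules:
    print(r)
```

On this table the sketch produces general rules such as "IF shape = wedge THEN cls is no" and "IF color = green THEN cls is yes", rather than a per-row decision tree, which is exactly the generalization advantage the slides attribute to ILA.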
Expert Systems in AI
• Expert systems are a crucial subset of artificial
intelligence (AI) that simulate the decision-
making ability of a human expert.
• These systems use a knowledge base filled
with domain-specific information and rules to
interpret and solve complex problems.
• Expert systems are widely used in fields such
as medical diagnosis, accounting, coding, and
even in games.
Expert Systems in AI
Components and Architecture of an
Expert System
• Knowledge Base: The knowledge base represents facts and rules. It
consists of knowledge in a particular domain as well as rules to solve a
problem, procedures and intrinsic data relevant to the domain.
• Inference Engine: The function of the inference engine is to fetch the
relevant knowledge from the knowledge base, interpret it and to find a
solution relevant to the user’s problem. The inference engine acquires the
rules from its knowledge base and applies them to the known facts to
infer new facts. Inference engines can also include explanation and
debugging abilities.
• Knowledge Acquisition and Learning Module: The function of this
component is to allow the expert system to acquire more and more
knowledge from various sources and store it in the knowledge base.
• User Interface: This module makes it possible for a non-expert user to
interact with the expert system and find a solution to the problem.
• Explanation Module: This module helps the expert system to give the user
an explanation about how the expert system reached a particular
conclusion.
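The interplay between the knowledge base and the inference engine can be sketched with a minimal forward-chaining loop: rules fire whenever all their conditions are known facts, and newly inferred facts may enable further rules. The toy medical rules below are made up for illustration and are not from any real expert system.

```python
def forward_chain(rules, facts):
    """Minimal forward-chaining inference engine sketch: keep firing IF-THEN
    rules whose conditions are all known facts until nothing new is inferred."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in facts and conditions <= facts:
                facts.add(conclusion)   # the rule's conclusion becomes a new fact
                changed = True
    return facts

# Hypothetical toy knowledge base (rule contents are made up for illustration)
kb = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "short_of_breath"}, "refer_to_doctor"),
]
result = forward_chain(kb, {"fever", "cough", "short_of_breath"})
print(result)  # includes both inferred facts: flu_suspected, refer_to_doctor
```

Note the chaining: "refer_to_doctor" only becomes derivable after "flu_suspected" has been inferred. An explanation module would additionally record which rules fired, so the system can justify its conclusion to the user.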
Applications of Expert Systems
• Medical Diagnosis: Expert systems assist doctors by analyzing
symptoms and medical history to suggest possible diagnoses or
treatment options. For example, MYCIN, an early expert system,
helped identify bacterial infections and recommend antibiotics.
• Financial Services: In finance, expert systems are used for credit
scoring, fraud detection, and investment advice. They analyze
financial data and patterns to make informed decisions.
• Technical Support: Expert systems can troubleshoot and provide
solutions for technical issues. They guide users through problem-
solving steps based on pre-defined rules and knowledge.
• Manufacturing: In manufacturing, expert systems help optimize
production processes, perform quality control, and manage
inventory by analyzing data and making recommendations.
Expert system Architecture and its
shells.
• Expert system shells are software toolkits that help build expert systems
by providing a knowledge representation scheme and an inference
engine. They are used by knowledge and system engineers to create
expert systems.
Here are some things to consider when choosing an expert system shell:
• Knowledge representation schemes: How knowledge is represented
• Knowledge engineering tools: Tools for knowledge engineering
• Inference engine and problem solving strategies: How the inference
engine and problem solving strategies work
• Compatibility and portability: How compatible and portable the shell is
• User interface: How user-friendly the shell's interface is
• Performance and productivity: How well the shell performs and how
productive it is