Unit-1 MLT

This document provides an introduction to machine learning, including definitions, techniques, applications, and types of learning. It discusses how machine learning allows computers to learn from experience and improve automatically over time. The three main types of learning covered are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled training data, unsupervised learning finds patterns in unlabeled data, and reinforcement learning involves trial-and-error learning with feedback to maximize rewards. Examples are given of how machine learning has been successfully applied in areas like speech recognition, autonomous vehicles, and playing games.



Introduction

Ever since computers were invented, we have wondered whether they might be made to learn. If we could understand how to program them to learn, to improve automatically with experience, the impact would be dramatic.

Machine Learning Techniques

• Imagine computers learning from medical records which treatments are most effective for new diseases.
• Houses learning from experience to optimize energy costs based on the particular usage patterns of their occupants.
• Personal software assistants learning the evolving interests of their users in order to highlight especially relevant stories from the online morning newspaper.

Examples of Successful Applications of Machine Learning

• Learning to recognize spoken words
• Learning to drive an autonomous vehicle
• Learning to classify new astronomical structures
• Learning to play world-class backgammon

Why is Machine Learning Important?

• Some tasks cannot be defined well, except by examples (e.g., recognizing people).
• Relationships and correlations can be hidden within large amounts of data. Machine Learning/Data Mining may be able to find these relationships.
• Human designers often produce machines that do not work as well as desired in the environments in which they are used.
• The amount of knowledge available about certain tasks might be too large for explicit encoding by humans (e.g., medical diagnosis).
• Environments change over time.
• New knowledge about tasks is constantly being discovered by humans. It may be difficult to continuously re-design systems "by hand".

Areas of Influence for Machine Learning

• Statistics: How best to use samples drawn from unknown probability distributions to help decide from which distribution some new sample is drawn?
• Brain Models: Non-linear elements with weighted inputs (Artificial Neural Networks) have been suggested as simple models of biological neurons.
• Adaptive Control Theory: How to deal with controlling a process having unknown parameters that must be estimated during operation?
• Psychology: How to model human performance on various learning tasks?
• Artificial Intelligence: How to write algorithms to acquire the knowledge humans are able to acquire, at least as well as humans?
• Evolutionary Models: How to model certain aspects of biological evolution to improve the performance of computer programs?

Machine Learning: A Definition

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Why "Learn"?

Learning is used when:
• Human expertise does not exist (navigating on Mars)
• Humans are unable to explain their expertise (speech recognition)
• Solution changes in time (routing on a computer network)
• Solution needs to be adapted to particular cases (user biometrics)

TYPES OF LEARNING

In general, machine learning algorithms can be classified into three types:
• Supervised learning
• Unsupervised learning
• Reinforcement learning

SUPERVISED LEARNING

A training set of examples with the correct responses (targets) is provided and, based on this training set, the algorithm generalises to respond correctly to all possible inputs. This is also called learning from examples. Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs.

Supervised learning is used to solve classification and regression problems.

Classification: A classification problem is when the output variable is a category, such as "red" or "blue", or "disease" or "no disease".
Regression: A regression problem is when the output variable is a real value, such as "dollars" or "weight".
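To make the distinction concrete, here is a minimal sketch (not from the original slides; the toy data sets and the choice of scikit-learn models are illustrative assumptions):

    # Classification vs. regression, illustrated with scikit-learn.
    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    # Classification: inputs -> a category (0 = "no disease", 1 = "disease").
    X_cls = np.array([[35, 120], [50, 160], [29, 115], [62, 170]])  # age, blood pressure
    y_cls = np.array([0, 1, 0, 1])
    clf = LogisticRegression().fit(X_cls, y_cls)
    print(clf.predict([[45, 150]]))       # predicted category

    # Regression: inputs -> a real value (a price in dollars).
    X_reg = np.array([[1200], [1500], [900], [2000]])  # house size in sq. ft.
    y_reg = np.array([150000.0, 190000.0, 120000.0, 260000.0])
    reg = LinearRegression().fit(X_reg, y_reg)
    print(reg.predict([[1700]]))          # predicted real value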

UNSUPERVISED LEARNING

Correct responses are not provided, but instead the algorithm tries to identify similarities between the inputs so that inputs that have something in common are categorised together. The statistical approach to unsupervised learning is known as density estimation.

Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses. In unsupervised learning algorithms, a classification or categorization is not included in the observations. There are no output values and so there is no estimation of functions. Since the examples given to the learner are unlabeled, the accuracy of the structure that is output by the algorithm cannot be evaluated. The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns. Association and dimensionality reduction are some other tasks that can be performed by unsupervised learning.

REINFORCEMENT LEARNING

This is somewhere between supervised and unsupervised learning. This type of learning is used to control the behavior of a system.

Reinforcement learning is sometimes called learning with a critic, because of this monitor that scores the answer but does not suggest improvements. Reinforcement learning is the problem of getting an agent to act in the world so as to maximize its rewards. A learner (the program) is not told what actions to take as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situations and, through that, all subsequent rewards.

Example: Consider teaching a dog a new trick: we cannot tell it what to do, but we can reward/punish it if it does the right/wrong thing. It has to find out what it did that made it get the reward/punishment. We can use a similar method to train computers to do many tasks, such as playing backgammon or chess, scheduling jobs, and controlling robot limbs.
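As a toy illustration of this reward-driven trial and error (the slides do not name a specific algorithm; tabular Q-learning on a five-state corridor is an assumed example):

    # Q-learning sketch: only reaching the last state of a 5-state corridor
    # gives reward +1; the agent discovers this purely by trying actions.
    import random

    n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
    Q = [[0.0] * n_actions for _ in range(n_states)]
    alpha, gamma, epsilon = 0.5, 0.9, 0.1

    for episode in range(500):
        s = 0
        while s != n_states - 1:
            if random.random() < epsilon:               # sometimes explore
                a = random.randrange(n_actions)
            else:                                       # mostly exploit
                a = max(range(n_actions), key=lambda act: Q[s][act])
            s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Update toward reward plus discounted value of the next state.
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next

    print(Q)  # learned action values; "right" should dominate in every state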

Well-Posed Learning Problem

Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

To have a well-defined learning problem, three features need to be identified:

1. The class of tasks
2. The measure of performance to be improved
3. The source of experience

Checkers Game

Game Basics

• Checkers is played by two players. Each player begins the game with 12 colored discs. (One set of pieces is black and the other red.) Each player places his or her pieces on the 12 dark squares closest to him or her. Black moves first. Players then alternate moves.
• The board consists of 64 squares, alternating between 32 dark and 32 light squares.
• It is positioned so that each player has a light square on the right side corner closest to him or her.
• A player wins the game when the opponent cannot make a move. In most cases, this is because all of the opponent's pieces have been captured, but it could also be because all of his pieces are blocked in.

Rules of the Game

• Moves are allowed only on the dark squares, so pieces always move diagonally. Single pieces are always limited to forward moves (toward the opponent).
• A piece making a non-capturing move (not involving a jump) may move only one square.
• A piece making a capturing move (a jump) leaps over one of the opponent's pieces, landing in a straight diagonal line on the other side. Only one piece may be captured in a single jump; however, multiple jumps are allowed during a single turn.
• When a piece is captured, it is removed from the board.
• If a player is able to make a capture, there is no option; the jump must be made.
• If more than one capture is available, the player is free to choose whichever he or she prefers.

Rules of the Game (cont.)

• When a piece reaches the furthest row from the player who controls that piece, it is crowned and becomes a king. One of the pieces which had been captured is placed on top of the king so that it is twice as high as a single piece.
• Kings are limited to moving diagonally but may move both forward and backward. (Remember that single pieces, i.e. non-kings, are always limited to forward moves.)
• Kings may combine jumps in several directions, forward and backward, on the same turn. Single pieces may shift direction diagonally during a multiple capture turn, but must always jump forward (toward the opponent).

Well-Defined Learning Problem

A checkers learning problem:
• Task T: playing checkers
• Performance measure P: percent of games won against opponents
• Training experience E: playing practice games against itself

A handwriting recognition learning problem:
• Task T: recognizing and classifying handwritten words within images
• Performance measure P: percent of words correctly classified
• Training experience E: a database of handwritten words with given classifications

A robot driving learning problem:
• Task T: driving on public four-lane highways using vision sensors
• Performance measure P: average distance travelled before an error (as judged by a human overseer)
• Training experience E: a sequence of images and steering commands recorded while observing a human driver

Designing a Learning System

1. Choosing the Training Experience
2. Choosing the Target Function
3. Choosing a Representation for the Target Function
4. Choosing a Function Approximation Algorithm
   1. Estimating training values
   2. Adjusting the weights
5. The Final Design

The basic design issues and approaches to machine learning are illustrated by considering the design of a program that learns to play checkers, with the goal of entering it in the world checkers tournament.

1. Choosing the Training Experience

• The first design choice is to choose the type of training experience from which the system will learn.
• The type of training experience available can have a significant impact on the success or failure of the learner.

There are three attributes which impact the success or failure of the learner:

1. Whether the training experience provides direct or indirect feedback regarding the choices made by the performance system.
2. The degree to which the learner controls the sequence of training examples.
3. How well it represents the distribution of examples over which the final system performance P must be measured.

1. Whether the training experience provides direct or indirect feedback regarding the choices made by the performance system.

For example, in the checkers game:
• In learning to play checkers, the system might learn from direct training examples consisting of individual checkers board states and the correct move for each.
• Indirect training examples consist of the move sequences and final outcomes of various games played.
• The information about the correctness of specific moves early in the game must be inferred indirectly from the fact that the game was eventually won or lost.
• Here the learner faces an additional problem of credit assignment, or determining the degree to which each move in the sequence deserves credit or blame for the final outcome.
• Credit assignment can be a particularly difficult problem because the game can be lost even when early moves are optimal, if these are followed later by poor moves.
• Hence, learning from direct training feedback is typically easier than learning from indirect feedback.

2. A second important attribute of the training experience is the degree to which the learner controls the sequence of training examples.

For example, in the checkers game:
• The learner might depend on the teacher to select informative board states and to provide the correct move for each.
• Alternatively, the learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move.
• The learner may have complete control over both the board states and (indirect) training classifications, as it does when it learns by playing against itself with no teacher present.
• Notice in this last case the learner may choose between experimenting with novel board states that it has not yet considered, or honing its skill by playing minor variations of lines of play it currently finds most promising.

3. A third attribute of the training experience is how well it represents the distribution of examples over which the final system performance P must be measured.

Learning is most reliable when the training examples follow a distribution similar to that of future test examples.

For example, in the checkers game:
• In the checkers learning scenario, the performance metric P is the percent of games the system wins in the world tournament.
• If its training experience E consists only of games played against itself, there is a danger that this training experience might not be fully representative of the distribution of situations over which it will later be tested. For example, the learner might never encounter certain crucial board states that are very likely to be played by the human checkers champion.
• It is often necessary to learn from a distribution of examples that is somewhat different from those on which the final system will be evaluated. Such situations are problematic because mastery of one distribution of examples will not necessarily lead to strong performance over some other distribution.

2. Choosing the Target Function

The next design choice is to determine exactly what type of knowledge will be learned and how this will be used by the performance program.
• Let us begin with a checkers-playing program that can generate the legal moves from any board state.
• The program needs only to learn how to choose the best move from among these legal moves. This learning task is representative of a large class of tasks for which the legal moves that define some large search space are known a priori, but for which the best search strategy is not known.

Given this setting where we must learn to choose among the legal moves, the most obvious choice for the type of information to be learned is a program, or function, that chooses the best move for any given board state.

1. Let ChooseMove be the target function, with the notation

   ChooseMove : B → M

which indicates that this function accepts as input any board from the set of legal board states B and produces as output some move from the set of legal moves M. ChooseMove is an obvious choice for the target function in the checkers example, but this function will turn out to be very difficult to learn given the kind of indirect training experience available to our system.

2. An alternative target function is an evaluation function that assigns a numerical score to any given board state. Let the target function be V, with the notation

   V : B → R

which denotes that V maps any legal board state from the set B to some real value. We intend for this target function V to assign higher scores to better board states. If the system can successfully learn such a target function V, then it can easily use it to select the best move from any current board position.

Let us define the target value V(b) for an arbitrary board state b in B as follows:

1. If b is a final board state that is won, then V(b) = 100
2. If b is a final board state that is lost, then V(b) = -100
3. If b is a final board state that is drawn, then V(b) = 0
4. If b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game

3. Choosing a Representation for the Target Function

Let us choose a simple representation: for any given board state, the function V̂ will be calculated as a linear combination of the following board features:

x1: the number of black pieces on the board
x2: the number of red pieces on the board
x3: the number of black kings on the board
x4: the number of red kings on the board
x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
x6: the number of red pieces threatened by black

Thus, the learning program will represent V̂(b) as a linear function of the form

   V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6

where:
• w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm.
• Learned values for the weights w1 through w6 will determine the relative importance of the various board features in determining the value of the board.
• The weight w0 will provide an additive constant to the board value.

Partial design of a checkers learning program:

• Task T: playing checkers
• Performance measure P: percent of games won in the world tournament
• Training experience E: games played against itself
• Target function: V : Board → R
• Target function representation: V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6

The first three items above correspond to the specification of the learning task, whereas the final two items constitute design choices for the implementation of the learning program.
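A minimal sketch of this representation in code (only the linear form comes from the text; the feature values and weights below are placeholder assumptions):

    # Evaluate V̂(b) = w0 + w1*x1 + ... + w6*x6 for a board described by
    # its six features x1..x6 as defined above.
    def v_hat(features, weights):
        """features: [x1..x6]; weights: [w0..w6]."""
        score = weights[0]                     # w0: additive constant
        for xi, wi in zip(features, weights[1:]):
            score += wi * xi
        return score

    # Example: 3 black pieces, 0 red pieces, 1 black king, no threats.
    features = [3, 0, 1, 0, 0, 0]
    weights = [0.0, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]  # illustrative values only
    print(v_hat(features, weights))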

4. Choosing a Function Approximation Algorithm

• In order to learn the target function V̂ we require a set of training examples, each describing a specific board state b and the training value Vtrain(b) for b.
• Each training example is an ordered pair of the form (b, Vtrain(b)).
• For instance, the following training example describes a board state b in which black has won the game (note that x2 = 0 indicates that red has no remaining pieces) and for which the target function value Vtrain(b) is therefore +100:

   ((x1=3, x2=0, x3=1, x4=0, x5=0, x6=0), +100)

Function Approximation Procedure

1. Derive training examples from the indirect training experience available to the learner.
2. Adjust the weights wi to best fit these training examples.

1. Estimating training values

A simple approach for estimating training values for intermediate board states is to assign the training value Vtrain(b) for any intermediate board state b to be V̂(Successor(b)), where:
• V̂ is the learner's current approximation to V
• Successor(b) denotes the next board state following b for which it is again the program's turn to move

Rule for estimating training values:

   Vtrain(b) ← V̂(Successor(b))

2. Adjusting the weights

Specify the learning algorithm for choosing the weights wi to best fit the set of training examples {(b, Vtrain(b))}.

A first step is to define what we mean by the best fit to the training data.
• One common approach is to define the best hypothesis, or set of weights, as that which minimizes the squared error E between the training values and the values predicted by the hypothesis:

   E = Σ over training examples (b, Vtrain(b)) of (Vtrain(b) - V̂(b))²

• Several algorithms are known for finding weights of a linear function that minimize E.

In our case, we require an algorithm that will incrementally refine the weights as new training examples become available and that will be robust to errors in these estimated training values.

One such algorithm is called the least mean squares, or LMS, training rule. For each observed training example it adjusts the weights a small amount in the direction that reduces the error on this training example.

LMS weight update rule: for each training example (b, Vtrain(b)):
• Use the current weights to calculate V̂(b)
• For each weight wi, update it as

   wi ← wi + ƞ (Vtrain(b) - V̂(b)) xi

Here ƞ is a small constant (e.g., 0.1) that moderates the size of the weight update.

Working of the weight update rule:
• When the error (Vtrain(b) - V̂(b)) is zero, no weights are changed.
• When (Vtrain(b) - V̂(b)) is positive (i.e., when V̂(b) is too low), then each weight is increased in proportion to the value of its corresponding feature. This will raise the value of V̂(b), reducing the error.
• If the value of some feature xi is zero, then its weight is not altered regardless of the error, so that the only weights updated are those whose features actually occur on the training example board.
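A minimal sketch of the LMS rule in code (the toy training example and iteration count are assumptions; the update rule itself is the one above, with a constant feature x0 = 1 paired with w0):

    # One LMS step: wi <- wi + eta * (Vtrain(b) - V̂(b)) * xi
    def lms_update(weights, features, v_train, eta=0.1):
        x = [1.0] + list(features)             # x0 = 1 pairs with the constant w0
        v_hat = sum(w * xi for w, xi in zip(weights, x))
        error = v_train - v_hat
        return [w + eta * error * xi for w, xi in zip(weights, x)]

    weights = [0.0] * 7                         # w0..w6 start at zero
    example = ([3, 0, 1, 0, 0, 0], 100.0)       # the winning board shown earlier
    for _ in range(50):                         # repeat until the error shrinks
        weights = lms_update(weights, *example)
    print(weights)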

5. The Final Design

The final design of the checkers learning system can be described by four distinct program modules that represent the central components in many learning systems.

1. The Performance System is the module that must solve the given performance task by using the learned target function(s). It takes an instance of a new problem (a new game) as input and produces a trace of its solution (the game history) as output. In the checkers game, the strategy used by the Performance System to select its next move at each step is determined by the learned V̂ evaluation function. Therefore, we expect its performance to improve as this evaluation function becomes increasingly accurate.

2. The Critic takes as input the history or trace of the game and produces as output a set of training examples of the target function. Each training example in this case corresponds to some game state in the trace, along with an estimate Vtrain of the target function value for this example.

3. The Generalizer takes as input the training examples and produces an output hypothesis that is its estimate of the target function. It generalizes from the specific training examples, hypothesizing a general function that covers these examples and other cases beyond the training examples. In our example, the Generalizer corresponds to the LMS algorithm, and the output hypothesis is the function V̂ described by the learned weights w0, ..., w6.

4. The Experiment Generator takes as input the current hypothesis and outputs a new problem (i.e., an initial board state) for the Performance System to explore. Its role is to pick new practice problems that will maximize the learning rate of the overall system. In our example, the Experiment Generator always proposes the same initial game board to begin a new game.

(Figure: the sequence of design choices made for the checkers program.)

Issues in Machine Learning

• What algorithms exist for learning general target functions from specific training examples? In what settings will particular algorithms converge to the desired function, given sufficient training data? Which algorithms perform best for which types of problems and representations?
• How much training data is sufficient? What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience and the character of the learner's hypothesis space?
• When and how can prior knowledge held by the learner guide the process of generalizing from examples? Can prior knowledge be helpful even when it is only approximately correct?
• What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?
• What is the best way to reduce the learning task to one or more function approximation problems? Put another way, what specific functions should the system attempt to learn? Can this process itself be automated?
• How can the learner automatically alter its representation to improve its ability to represent and learn the target function?

History of ML

1946: First computer, called ENIAC, built to perform numerical computations.
1950: Alan Turing proposes the Turing test: can machines think?
1952: First game-playing program for checkers, by Arthur Samuel at IBM.
1957: Perceptron developed by Frank Rosenblatt; perceptrons can be combined to form a neural network.
1980s: Decision trees.
Early 1990s: Statistical learning theory emphasizes learning from data instead of rule-based inference; Bayesian learning.
2000: Support Vector Machines.
Current status: used widely in industry; a combination of various approaches, but the data-driven approach is prevalent.

Introduction of Machine Learning Approaches

Artificial Neural Network (ANN)

An Artificial Neural Network is able to learn from data and provide responses in the form of predictions or classifications. It learns from example data sets, and it can enhance existing data analysis techniques owing to its advanced predictive capabilities.

An ANN comprises 3 layers:

Input Layer: The input layer is the first layer of an ANN; it receives the input information.
Hidden Layer: Hidden layers sit in the middle of the neural network; there can be one or more hidden layers, which perform computations on the input data.
Output Layer: The output layer is the last layer of the neural network; it determines the output through the computations performed by the hidden layers.

Applications:
• Image recognition
• Speech recognition
• Machine translation
• Medical diagnosis
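A minimal sketch of this three-layer structure (the layer sizes, random weights, and tanh activation are illustrative assumptions, not from the text):

    # Forward pass through an input -> hidden -> output network.
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input (3) -> hidden (4)
    W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # hidden (4) -> output (2)

    def forward(x):
        h = np.tanh(W1 @ x + b1)   # hidden layer: computations on the input
        return W2 @ h + b2         # output layer: determines the output

    print(forward(np.array([0.5, -1.0, 2.0])))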

Clustering

Clustering is a part of unsupervised machine learning where data is grouped based on the similarity of the data points. The goal is to find a natural grouping in the data so that items in the same cluster are more similar to each other than to objects in other groups.

Clusters arise when data is arranged into subgroups in which every group contains similar items; the idea reflects the human tendency to classify objects into classes or subgroups based on their behaviour, nature, and the activities they perform. The machine learns all the features and patterns by itself.

Applications:
• Identification of cancer cells: clustering separates data into cancerous and non-cancerous groups, which aids cancer detection.
• Search engines: when we search something on Google, clustering groups similar objects into a single cluster and presents them to us.
• Wireless networks: clustering helps improve energy consumption and optimize data transmission.
• Customer segmentation: clustering allows businesses to segment customers into several clusters, based on which they can adopt new strategies to appeal to their customer base.
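For illustration (the toy data and the choice of the k-means algorithm are assumptions; the text does not prescribe a method), here is a small clustering sketch:

    # k-means clustering with scikit-learn: group unlabeled 2-D points
    # by similarity, with no output labels provided.
    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1, 2], [1, 4], [1, 0],        # one natural group
                  [10, 2], [10, 4], [10, 0]])    # another natural group
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_)           # cluster assignment for each point
    print(km.cluster_centers_)  # learned cluster centres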

Types of clustering:

Hierarchical Clustering: Used to group together unlabeled data points having similar characteristics.
Divisive Clustering: All the data points are treated as one big cluster, and the clustering process involves dividing (top-down approach) the one big cluster into various small clusters.
Agglomerative Clustering: A bottom-up approach that forms a big cluster from small clusters.
Centroid Clustering: Organizes the data into non-hierarchical clusters.
Model-Based Clustering: One uses certain models for clusters and tries to optimize the fit between the data and the models. In the model-based clustering approach, the data are viewed as coming from a mixture of probability distributions, each of which represents a different cluster.
Graph-Theoretic Clustering: Graph clustering intends to partition the nodes of a graph into disjoint groups; it is the process of grouping the nodes of the graph into clusters.
Spectral Clustering: A growing clustering algorithm which has performed better than many traditional clustering algorithms in many cases. It treats each data point as a graph node and thus transforms the clustering problem into a graph-partitioning problem.

Reinforcement Learning Introduction

Reinforcement learning is the problem of getting an agent to act in the world so as to maximize its rewards. A learner (the program) is not told what actions to take as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situations and, through that, all subsequent rewards.

Example: Consider teaching a dog a new trick: we cannot tell it what to do, but we can reward/punish it if it does the right/wrong thing. It has to find out what it did that made it get the reward/punishment.

Decision tree learning

• One of the most widely used and practical methods for inductive inference
• Approximates discrete-valued target functions
• The learned function is represented by a decision tree
• A decision tree can also be re-represented as a set of if-then rules to improve human readability
• Robust to noisy data and capable of learning disjunctive expressions

Decision Tree Representation (1/2)

Classification of instances: decision trees classify instances by sorting them down the tree from the root to some leaf node.
• Node: specifies a test of some attribute
• Branch: corresponds to one of the possible values for this attribute

Examples of Decision Tree

Play tennis?

Outlook?
  Sunny → Humidity?
    High → No
    Normal → Yes
  Overcast → Yes
  Rain → Wind?
    Strong → No
    Weak → Yes

Decision Tree Representation (2/2)

e.g. a Saturday morning described by (Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong): the path (Outlook=Sunny ∧ Humidity=High) is followed, so the answer is No.

Each path corresponds to a conjunction of attribute tests. Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances:

(Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak)
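The tree above can be re-read as exactly this disjunction of conjunctions; a small sketch (the function name and string encoding of attribute values are illustrative):

    # The PlayTennis tree, rewritten as if-then rules.
    def play_tennis(outlook, humidity, wind):
        if outlook == "Sunny":
            return humidity == "Normal"   # Sunny ∧ Normal -> Yes; Sunny ∧ High -> No
        if outlook == "Overcast":
            return True                   # Overcast -> Yes
        if outlook == "Rain":
            return wind == "Weak"         # Rain ∧ Weak -> Yes; Rain ∧ Strong -> No
        raise ValueError("unknown outlook")

    # The Saturday-morning instance from the slide:
    print(play_tennis("Sunny", "High", "Strong"))  # False, i.e. No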

Bayesian reasoning is ...

• e.g. probability, statistics, data-fitting
• a theory of mind
• a theory of artificial intelligence [Thrun et al.]
• a standard tool of computer vision

... with applications in:
• Data mining
• Robotics
• Signal processing
• Bioinformatics
• Text analysis (inc. spam filters)
• and (increasingly) graphics!

Core Concepts

Goal: find the best hypothesis from some space H, given the observed data D. We consider the best hypothesis to be the most probable hypothesis in H, but in order to determine that, we must assume a probability distribution over the hypothesis class H. Further, we must also know something about the relationship between the observed data and the hypothesis.

Basic Formulas for Probabilities

Product Rule: the probability P(A ∧ B) of a conjunction of two events A and B:
   P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A)
Sum Rule: the probability of a disjunction of two events A and B:
   P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
Theorem of total probability: if events A1, ..., An are mutually exclusive with Σi P(Ai) = 1, then
   P(B) = Σi P(B | Ai) P(Ai)

Example: Let us assume that I received 100 e-mails. The percentage of spam in the whole set of e-mails is 30%, so I have 30 spam e-mails and 70 desired e-mails among the 100. The percentage of spam e-mails that contain the word 'offer' is 80%; that is 80% of the 30 spam e-mails, which makes 24. So 30 of the 100 e-mails are spam; 24 of them contain 'offer' and 6 of them do not.

The question is: what is the probability that a mail is spam, given that it contains the word 'offer'?

We need to find the total number of mails which contain 'offer':

   24 + 7 = 31 mails contain the word 'offer'

(The example implicitly assumes that 10% of the 70 desired e-mails, i.e. 7 of them, contain 'offer'; this is the 0.1 that appears in the formula below.)

Find the probability of spam if the mail contains 'offer': of the 31 mails containing 'offer', 24 are spam, so 24/31 ≈ 77.4% = 0.774.

Solution with Bayes' equation:

   P(A | B) = P(B | A) P(A) / P(B)

   A = spam
   B = contains the word 'offer'

   P(contains offer | spam) = 0.8 (given in the question)
   P(spam) = 0.3 (given in the question)
   P(contains offer) = 0.3 × 0.8 + 0.7 × 0.1 = 0.31
   P(spam | contains offer) = (0.8 × 0.3) / 0.31 ≈ 0.774

SUPPORT VECTOR MACHINE
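The same computation in code (numbers taken from the example; the 10% 'offer' rate among desired e-mails is the implicit assumption noted above):

    # Bayes' rule for the spam example.
    p_spam = 0.3
    p_offer_given_spam = 0.8
    p_offer_given_ham = 0.1   # assumed rate of 'offer' in non-spam mail

    p_offer = p_spam * p_offer_given_spam + (1 - p_spam) * p_offer_given_ham
    p_spam_given_offer = p_offer_given_spam * p_spam / p_offer
    print(round(p_offer, 2), round(p_spam_given_offer, 3))  # 0.31 0.774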

History of SVM

• SVM is related to statistical learning theory [3]
• SVM was first introduced in 1992 [1]
• SVM became popular because of its success in handwritten digit recognition
• SVM is now regarded as an important example of "kernel methods", one of the key areas in machine learning

Linear Classifiers

Estimation: x → f → yest, with

   f(x, w, b) = sign(w · x - b)

where w is the weight vector and x is the data vector; each training point is labelled +1 or -1. Given a set of such labelled points, how would you classify this data?

There are many candidate separating lines for the same data: any of these would be fine... but which is best?

Classifier Margin

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

Maximum Margin

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called an LSVM, a linear SVM). Support vectors are those datapoints that the margin pushes up against.

Why Maximum Margin?

Intuitively, the boundary with the widest margin leaves the most room for error: it can be moved the furthest before it hits a datapoint.

How to calculate the distance from a point to a line?

• What is the distance expression for a point x to a line wx + b = 0? Here x is a vector, w is the normal vector of the line, and b is a scale value.
• In our two-dimensional case the line is w1*x1 + w2*x2 + b = 0; thus w = (w1, w2) and x = (x1, x2).
• See http://mathworld.wolfram.com/Point-LineDistance2-Dimensional.html

Estimate the Margin

The distance from a point x to the line wx + b = 0 is

   d(x) = |w · x + b| / ||w||₂ = |w · x + b| / sqrt(Σ_{i=1..d} wi²)
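A quick numeric check of this formula (the line and point below are arbitrary example values):

    # d(x) = |w·x + b| / ||w||
    import math

    def distance(w, b, x):
        dot = sum(wi * xi for wi, xi in zip(w, x))
        return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))

    w, b = (3.0, 4.0), -5.0            # the line 3*x1 + 4*x2 - 5 = 0
    print(distance(w, b, (1.0, 2.0)))  # |3 + 8 - 5| / 5 = 1.2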

Large-margin Decision Boundary

• The decision boundary should be as far away from the data of both classes as possible; that is, we should maximize the margin, m.
• The distance between the origin and the line wᵀx = -b is b/||w||.

Finding the Decision Boundary

• Let {x1, ..., xn} be our data set and let yi ∈ {1, -1} be the class label of xi.
• The decision boundary should classify all points correctly. To see this: when yi = -1, we wish (w · xi + b) ≤ -1; when yi = 1, we wish (w · xi + b) ≥ 1. For support vectors, we wish yi(w · xi + b) = 1.
• The decision boundary can be found by solving the following constrained optimization problem:

   minimize (1/2) ||w||²   subject to   yi (w · xi + b) ≥ 1 for all i
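For illustration, a sketch of fitting a linear SVM with scikit-learn (the toy data and the large-C hard-margin setting are assumptions; the slides do not specify a solver):

    # Fit a maximum-margin linear classifier and read off its support vectors.
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1, 1], [2, 2], [2, 0],          # class +1
                  [-1, -1], [-2, -2], [-2, 0]])    # class -1
    y = np.array([1, 1, 1, -1, -1, -1])

    clf = SVC(kernel="linear", C=1e6).fit(X, y)    # large C ≈ hard margin
    print(clf.coef_, clf.intercept_)   # w and b of the learned boundary
    print(clf.support_vectors_)        # the datapoints the margin pushes against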

Data science vs Machine Learning

Data science is the field that studies data and how to extract meaning from it, while machine learning focuses on tools and techniques for building models that can learn by themselves from data.
