PR Unit 1 ....
Patterns are everywhere in the digital world. A pattern can either be seen physically or be
observed mathematically by applying algorithms.
Example: the colours on clothes, speech patterns, etc. In computer science, a pattern is
represented using a vector of feature values.
Pattern recognition is the process of recognizing patterns using machine learning algorithms.
It can be defined as the classification of data based on knowledge already gained
or on statistical information extracted from patterns and/or their representation. One of the
important aspects of pattern recognition is its application potential.
Example: for a face, the eyes, ears, nose, etc. are its features.
A set of features taken together forms the feature vector.
Example: in the face example above, if all the features (eyes, ears, nose, etc.) are taken together,
the resulting sequence is the feature vector ([eyes, ears, nose]). A feature vector is a sequence of
features represented as a d-dimensional column vector. In the case of speech, MFCCs (Mel-Frequency
Cepstral Coefficients) are spectral features of the speech signal. The sequence of the first 13
coefficients forms a feature vector.
Feature Vector:
A pattern recognition system should recognise familiar patterns quickly and accurately
Recognize and classify unfamiliar objects
Accurately recognize shapes and objects from different angles
Identify patterns and objects even when partly hidden
Recognise patterns quickly, with ease, and with automaticity.
Learning is a phenomenon through which a system is trained and becomes adaptable, so that it gives
accurate results. Learning is the most important phase, because how well the system
performs on the data provided to it depends on which algorithms are applied to the data. The entire
dataset is divided into two parts: one used to train the model (the training set) and
the other used to test the model after training (the testing set).
Training set:
The training set is used to build a model. It consists of the set of images that are used to train the
system. The training rules and algorithms give relevant information on how to associate input
data with output decisions. The system is trained by applying these algorithms to the dataset; all
the relevant information is extracted from the data and results are obtained. Generally, 80% of the
dataset is used as training data.
Testing set:
Testing data is used to test the system. It is the set of data used to verify whether the
system produces the correct output after being trained. Generally, 20% of the
dataset is used for testing. Testing data is used to measure the accuracy of the system. Example: if a
system that identifies which category a particular flower belongs to identifies seven
flowers out of ten correctly and the rest incorrectly, then its accuracy is 70%.
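The split and the accuracy calculation described above can be sketched as follows. This is a minimal illustration, not a prescribed method: the function names, the fixed random seed and the flower labels are assumptions made for the example.

```python
import random

def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and cut them into a training and a testing part."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed (an assumption) for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(predicted, actual):
    """Fraction of test samples whose predicted label equals the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# 100 hypothetical sample ids split 80/20.
train_set, test_set = train_test_split(range(100))

# Seven of ten flowers identified correctly, as in the example above.
actual = ["rose"] * 10
predicted = ["rose"] * 7 + ["tulip"] * 3
print(len(train_set), len(test_set), accuracy(predicted, actual))
```

In practice the split is usually made once, before training, so that the testing set never influences the model.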
A pattern is a physical object or an abstract notion. When talking about the classes of animals, a
description of an animal would be a pattern. When talking about various types of balls, a
description of a ball is a pattern. When balls are considered as patterns, the classes could be
football, cricket ball, table tennis ball, etc. Given a new pattern, the class of the pattern is to be
determined. The choice of attributes and the representation of patterns is a very important step in
pattern classification. A good representation is one that makes use of discriminating attributes
and also reduces the computational burden of pattern classification.
An obvious representation of a pattern will be a vector. Each element of the vector can represent
one attribute of the pattern. The first element of the vector will contain the value of the first
attribute for the pattern being considered.
Example: when representing spherical objects, (25, 1) may represent a spherical object
with 25 units of weight and a diameter of 1 unit. The class label can form a part of the vector. If
spherical objects belong to class 1, the vector would be (25, 1, 1), where the first element
represents the weight of the object, the second element the diameter of the object, and the third
element the class of the object.
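The vector representation above can be sketched in code. The type name LabeledPattern and the helper split_pattern are hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple

class LabeledPattern(NamedTuple):
    weight: float    # first attribute of the pattern
    diameter: float  # second attribute of the pattern
    label: int       # class label stored as the last element

def split_pattern(pattern):
    """Separate the attribute values from the class label (the last element)."""
    *features, label = pattern
    return features, label

ball = LabeledPattern(weight=25, diameter=1, label=1)
features, label = split_pattern(ball)
print(tuple(ball), features, label)
```

Keeping the label as the last element is only a convention; storing it separately from the attributes works just as well.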
Advantages:
Disadvantages:
The syntactic pattern recognition approach is complex to implement and is a very slow process.
Sometimes a larger dataset is required to get better accuracy.
It cannot explain why a particular object is recognized.
Example: my face vs my friend’s face.
Applications:
Computer vision
Pattern recognition is used to extract meaningful features from given image/video samples
and is used in computer vision for various applications like biological and biomedical
imaging.
Seismic analysis
The pattern recognition approach is used for the discovery, imaging and interpretation of
temporal patterns in seismic array recordings. Statistical pattern recognition is implemented
and used in different types of seismic analysis models.
Radar signal classification
Pattern recognition and signal processing methods are used in various applications of radar
signal classification, such as anti-personnel (AP) mine detection and identification.
Speech recognition
The greatest success in speech recognition has been obtained using pattern recognition
paradigms. Pattern recognition is used in various speech recognition algorithms that try to avoid the
problems of using a phoneme level of description and treat larger units such as words as
patterns.
Machine Vision:
A machine vision system captures images via a camera and analyzes them to produce
descriptions of the imaged objects. For example, during inspection in the manufacturing industry,
when the manufactured objects are passed in front of the camera, the images have to be
analyzed online.
Computer-Aided Diagnosis:
Computer-aided diagnosis (CAD) assists doctors in making diagnostic decisions. It has
been applied to medical data such as X-rays, ECGs, ultrasound images, etc.
Speech Recognition:
This process recognizes spoken information. The software is built around a pattern
recognition system that recognizes the spoken text and translates it into ASCII characters,
which are shown on the screen. The identity of the speaker can also be determined.
Character Recognition:
This application recognizes both letters and numbers. An optically scanned image is
provided as input and alphanumeric characters are generated as output. Its major use
is in automation and information handling. It is also used in page readers, zip code reading, license
plate recognition, etc.
Manufacturing:
Here 3-D images (from structured light, laser, stereo, etc.) are provided as input and, as a
result, the objects can be identified.
Fingerprint Identification:
Here the input image is obtained from fingerprint sensors. With this technique various
fingerprint classes are obtained and the owner of the fingerprint can be identified.
Industrial Automation:
Here the intensity or range image of the product is provided, and the product is identified as
defective or non-defective.
A pattern recognition system can be described in terms of phases, since it can be divided into
components.
2. Segmentation and Grouping: One of the deepest problems in pattern recognition; it deals with
recognizing or grouping together the various parts of an object.
5. Post Processing: It deals with deciding on an action using the output of the
classifier, for example choosing the minimum-error-rate classification or, more generally, the
action that minimizes the total expected cost.
The following sequence of activities is used for designing a pattern recognition
system:
Data Collection
Feature Choice
Model Choice
Training
Evaluation
In a pattern recognition system, two basic approaches are used for recognizing a pattern or
structure, and each can be implemented with different techniques. These are:
Statistical Approach and
Structural Approach
Statistical Approach:
Statistical methods are mathematical formulas, models, and techniques that are used in the
statistical analysis of raw research data. Applying statistical methods extracts
information from research data and provides different ways to assess the robustness of research
outputs.
Two main kinds of statistical methods are used:
1. Descriptive Statistics: It summarizes data from a sample using indexes such as the mean or
standard deviation.
2. Inferential Statistics: It draws conclusions from data that are subject to random variation.
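As a small illustration of descriptive statistics, the sketch below summarizes a sample by its mean and standard deviation using Python's standard statistics module. The sample values are made up for the example.

```python
from statistics import mean, pstdev, stdev

# Hypothetical sample of eight measurements.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

sample_mean = mean(sample)       # arithmetic mean of the sample
sample_std = stdev(sample)       # sample standard deviation (divides by n - 1)
population_std = pstdev(sample)  # population standard deviation (divides by n)

print(sample_mean, sample_std, population_std)
```

The two standard deviations differ only in the divisor; which one is appropriate depends on whether the data are treated as a sample or as the whole population.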
Structural Approach:
The Structural Approach is a technique in which the learner masters the patterns of sentences.
Structures are the different arrangements of words in one accepted style or another.
Types of structures:
Sentence Patterns
Phrase Patterns
Formulas
Idioms
Supervised learning
Supervised learning, as the name indicates, involves a supervisor acting as a teacher. Basically,
supervised learning is learning in which we teach or train the machine using data that is well
labeled, meaning that the data is already tagged with the correct answer. After that, the machine is
provided with a new set of examples (data), and the supervised learning algorithm, having analysed
the training data (the set of training examples), produces an outcome for the new data based on
what it learned from the labeled data.
For instance, suppose you are given a basket filled with different kinds of fruit. The first
step is to train the machine with all the different fruits one by one, like this:
If the shape of the object is rounded with a depression at the top and its color is red, then it
will be labelled as Apple.
If the shape of the object is a long curving cylinder and its color is green-yellow, then it
will be labelled as Banana.
Now suppose that, after training, you are given a new, separate fruit from the basket, say a banana,
and asked to identify it.
Since the machine has already learned from the previous data, this time it has to use that knowledge
wisely. It will first classify the fruit by its shape and color, confirm the fruit name as
BANANA and put it in the Banana category. Thus the machine learns from the training
data (the basket containing fruits) and then applies the knowledge to the test data (the new fruit).
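The fruit example can be sketched as a very small supervised learner. This is an illustration under simplifying assumptions: each fruit is described by a (shape, color) pair of strings, and the "model" is just a lookup table with a majority vote per feature combination. Real supervised learners generalize to unseen feature values, which this sketch does not.

```python
from collections import Counter, defaultdict

def train(examples):
    """Learn the most frequent label for each (shape, color) combination."""
    votes = defaultdict(Counter)
    for features, label in examples:
        votes[features][label] += 1
    return {features: counts.most_common(1)[0][0] for features, counts in votes.items()}

def predict(model, features, default="unknown"):
    """Return the learned label, or a default for combinations never seen in training."""
    return model.get(features, default)

# Labeled training data: the basket of fruits (feature strings are hypothetical).
training_data = [
    (("rounded with depression at top", "red"), "Apple"),
    (("long curving cylinder", "green-yellow"), "Banana"),
]
model = train(training_data)

# Test data: a new fruit taken from the basket.
new_fruit = ("long curving cylinder", "green-yellow")
print(predict(model, new_fruit))
```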
Unsupervised learning is the training of a machine using information that is neither classified nor
labeled, allowing the algorithm to act on that information without guidance. Here the task of the
machine is to group unsorted information according to similarities, patterns and differences
without any prior training on the data.
Unlike supervised learning, no teacher is provided, which means no training is given to the
machine. Therefore the machine has to find the hidden structure in the unlabeled data by
itself.
For instance, suppose the machine is given an image containing both dogs and cats, which it has
never seen before. The machine has no idea about the features of dogs and cats, so it cannot label
them as dogs and cats. But it can categorize them according to their similarities, patterns, and
differences, i.e., it can easily divide the picture into two parts. The first part may contain all
the pictures of dogs and the second part may contain all the pictures of cats. Here the machine did
not learn anything beforehand, which means there was no training data or examples.
Unsupervised learning is classified into two categories of algorithms:
Clustering: A clustering problem is where you want to discover the inherent groupings in
the data, such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules that
describe large portions of your data, such as people that buy X also tend to buy Y.
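The "people that buy X also tend to buy Y" idea can be made concrete with the confidence of a rule, i.e. the fraction of transactions containing X that also contain Y. The sketch below is a minimal illustration; the transactions and item names are invented, and a full association rule learner would also consider support and search over many candidate rules.

```python
def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    return sum(consequent in t for t in with_antecedent) / len(with_antecedent)

# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
rule_confidence = confidence(transactions, "bread", "butter")
print(rule_confidence)
```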
Introduction to Clustering
Data points are clustered using the basic idea that a data point belongs to a cluster if it lies
within a given constraint (distance) of the cluster centre. Various distance measures and
techniques are used to identify outliers.
Why Clustering ?
Clustering is important because it determines the intrinsic grouping among the unlabeled
data. There is no absolute criterion for a good clustering; it depends on the user and on the
criteria that satisfy their need. For instance, we could be interested in finding
representatives for homogeneous groups (data reduction), in finding “natural clusters” and
describing their unknown properties (“natural” data types), in finding useful and suitable groupings
(“useful” data classes) or in finding unusual data objects (outlier detection). A clustering
algorithm must make some assumptions about what constitutes the similarity of points, and each
assumption produces different but equally valid clusters.
Clustering Methods :
Density-Based Methods : These methods consider clusters to be dense regions of the space that have
some similarity and differ from the lower-density regions around them. These methods have
good accuracy and the ability to merge two clusters. Examples: DBSCAN (Density-Based Spatial
Clustering of Applications with Noise), OPTICS (Ordering Points To Identify the Clustering
Structure), etc.
Hierarchical Based Methods : The clusters formed in these methods form a tree-type
structure based on the hierarchy. New clusters are formed using previously formed ones. The
methods are divided into two categories:
Agglomerative (bottom up approach)
Divisive (top down approach)
Examples: CURE (Clustering Using Representatives), BIRCH (Balanced Iterative Reducing and
Clustering using Hierarchies), etc.
Partitioning Methods : These methods partition the objects into k clusters, and each
partition forms one cluster. They optimize an objective criterion (a similarity
function), for example one in which distance is the major parameter. Examples: K-means, CLARANS
(Clustering Large Applications based upon Randomized Search), etc.
Grid-based Methods : In these methods the data space is divided into a finite number of
cells that form a grid-like structure. All the clustering operations performed on these grids are
fast and independent of the number of data objects. Examples: STING (Statistical Information
Grid), WaveCluster, CLIQUE (CLustering In QUEst), etc.
Clustering Algorithms :
K-means clustering algorithm – It is one of the simplest unsupervised learning algorithms that
solve the clustering problem. The K-means algorithm partitions n observations into k clusters, where
each observation belongs to the cluster with the nearest mean, which serves as a prototype of the
cluster.
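The K-means idea can be sketched in a few lines of plain Python. This is a minimal illustration of the usual alternating scheme (assign each point to the nearest mean, then recompute the means), not a production implementation. Using the first k points as the initial means is an assumption made here for repeatability; practical implementations usually choose initial means more carefully.

```python
import math

def kmeans(points, k, max_iterations=100):
    """Partition points into k clusters by alternating assignment and mean update."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: first k points as initial means
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the means will not move either
        assignment = new_assignment
        # Update step: each mean becomes the average of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignment

# Two visibly separate groups of hypothetical 2-D points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
centroids, assignment = kmeans(points, k=2)
print(centroids, assignment)
```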
Marketing : It can be used to characterize & discover customer segments for marketing
purposes.
Biology : It can be used for classification among different species of plants and animals.
Libraries : It is used in clustering different books on the basis of topics and information.
Insurance : It is used to understand customers and their policies and to identify
fraud.
City Planning: It is used to make groups of houses and to study their values based on their
geographical locations and other factors present.
Earthquake studies: By learning the earthquake-affected areas we can determine the dangerous
zones.
Pattern Recognition approaches
Patterns generated from the raw data depend on the nature of the data. Patterns may
be generated based on the statistical features of the data. In some situations, the underlying
structure of the data decides the type of pattern generated. In some other instances,
neither of the two situations exists. In such scenarios a system is developed and trained for
the desired responses. Thus, for a given problem, one or more of these different approaches may
be used to obtain the solution. Hence, there are many different mathematical techniques for
obtaining the desired attributes of a pattern recognition system. The four best-known approaches
to pattern recognition are:
1. Template matching
2. Statistical classification
3. Syntactic matching
4. Neural networks
In template matching, a prototype of the pattern to be recognized is compared
against the pattern to be recognized. In the statistical approach, the patterns are described as
random variables, from which class densities can be inferred. Classification is done based on
the statistical modeling of data. In the syntactic approach, a pattern is seen as being
composed of simple sub-patterns which are themselves built from yet simpler sub-patterns,
the simplest being the primitives. Interrelationships between these primitive patterns are
used to represent a more complex pattern. The neural network approach to pattern
recognition is strongly related to the statistical methods, since neural networks can be regarded as
parametric models with their own learning scheme.
The models proposed need not be independent and sometimes the same pattern
recognition method exists with different interpretations. A hybrid system may be built
involving multiple models. The comparison of different approaches is summarized in
Table 1.1.
Table 1.1: Pattern Recognition Models
Template matching
One of the simplest and earliest approaches to pattern recognition is based on
template matching. Matching is carried out to determine the similarity between two entities
such as points, curves, or shapes of the same type. In template matching, a template or a
prototype of the pattern to be recognized is available. The pattern to be recognized is
matched against the stored template while taking into account all allowable operations such
as translation, rotation and scale changes. The similarity measure, often a correlation, may be
optimized based on the available training set. Often, the template itself is learned from the
training set. Template matching is computationally demanding, but present-day computers, with
higher computation power due to their faster processors, have made this approach more
feasible. Rigid template matching, even though effective in some application domains, has
a number of disadvantages. For example, it would fail if the patterns are distorted due to the
imaging process, viewpoint change, or large intra-class variations among the patterns. When
the deformation cannot be easily explained or modeled directly, deformable template models
or rubber sheet deformations can be used to match the patterns.
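Rigid template matching with a correlation measure can be sketched for one-dimensional signals. This is a simplified illustration: it searches over translations only (not rotation or scale), and it uses normalized correlation as the similarity measure, which is one common choice rather than the only one. The signal and template values are invented.

```python
import math

def normalized_correlation(a, b):
    """Correlation coefficient between two equal-length sequences (0.0 if either is flat)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denominator == 0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denominator

def best_match(signal, template):
    """Slide the template over the signal and return the best offset and its score."""
    best_offset, best_score = None, -math.inf
    for offset in range(len(signal) - len(template) + 1):
        window = signal[offset:offset + len(template)]
        score = normalized_correlation(window, template)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

signal = [0, 0, 1, 3, 1, 0, 0, 0]  # hypothetical input pattern
template = [1, 3, 1]               # stored prototype
offset, score = best_match(signal, template)
print(offset, score)
```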
Statistical Pattern Recognition
The statistical pattern recognition approach assumes a statistical basis for the classification
of data. It models the properties of the pattern to be recognized as random variables.
The main goal of statistical pattern classification is to find the category or
class to which a given sample belongs. Statistical methodologies such as statistical hypothesis
testing, correlation and Bayes classification are used to implement this method. The effectiveness
of the representation is determined by how well patterns from different classes are
separated.
To measure the nearness of the given sample to one of the classes, statistical
pattern recognition uses the probability of error. The Bayesian classifier is a natural choice when
applying statistical methods to pattern recognition. However, its implementation is often
difficult due to the complexity of the problems, especially when the dimensionality of the
system is high. One can also consider a simpler solution such as a parametric classifier based
on assumed mathematical forms such as linear, quadratic or piecewise. Initially a parametric
form of the decision boundary is specified; then the best decision boundary of the specified
form is found based on the classification of training samples. Another important issue
in statistical pattern recognition is the estimation of the values of the parameters,
since they are not given in practice. In these systems it is always important to understand
how the number of samples affects the classifier design and performance.
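A parametric Bayes classifier of the kind described above can be sketched for a single feature. The sketch assumes, purely for illustration, that each class follows a one-dimensional Gaussian density; the mean, variance and prior of each class are estimated from hypothetical training samples, and a new sample is assigned to the class with the largest posterior.

```python
import math
from statistics import mean, pvariance

def fit(samples_by_class):
    """Estimate (mean, variance, prior) for each class from its training samples."""
    total = sum(len(values) for values in samples_by_class.values())
    return {
        label: (mean(values), pvariance(values), len(values) / total)
        for label, values in samples_by_class.items()
    }

def log_posterior(x, mu, variance, prior):
    """Log of prior times Gaussian likelihood (the shared evidence term is omitted)."""
    return (math.log(prior)
            - 0.5 * math.log(2 * math.pi * variance)
            - (x - mu) ** 2 / (2 * variance))

def classify(model, x):
    """Assign x to the class with the largest (log) posterior."""
    return max(model, key=lambda label: log_posterior(x, *model[label]))

# Hypothetical one-feature training samples for two classes.
training = {
    "small": [1.0, 1.2, 0.8, 1.1],
    "large": [3.0, 3.2, 2.9, 3.1],
}
model = fit(training)
print(classify(model, 1.05), classify(model, 2.9))
```

With only four samples per class the estimated parameters are rough, which is the point made above about the number of samples affecting classifier design and performance.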
Syntactic Pattern Recognition
In many situations there exists an interrelationship or interconnection between the
features associated with a pattern. In such circumstances it is appropriate to assume a
hierarchical relationship in which a pattern is viewed as consisting of simple sub-patterns
which are themselves built from yet simpler sub-patterns. This is the basis of syntactic pattern
recognition. In this method symbolic data structures such as arrays, strings, trees, or graphs
are used for pattern representation. These data structures define the relations between
fundamental pattern components and allow the representation of hierarchical models. Thus
complex patterns can be represented from simpler ones. The recognition of an unknown
pattern is accomplished by comparing its symbolic representation with a number of
predefined objects. This comparison helps to compute a similarity measure between
the unknown input and the known patterns.
The simplest symbolic data structures used for the representation of patterns are
words of symbols, or strings. The individual symbols in a string usually
represent the atomic components of the pattern. Strings, however, are one-dimensional in
nature, whereas many patterns are inherently two- or higher-dimensional. One of the most used and
powerful symbolic structures for higher-dimensional data representation is the graph. A graph is
composed of a set of nodes and a set of edges, in which the nodes represent simpler
sub-patterns and the edges the relations between those sub-patterns. These relations may be
spatial, temporal or of any other type, depending on the problem. An important subclass of
graphs is the tree. A tree has three different classes of nodes: root, interior and leaf.
Trees are intermediate between strings and graphs. They are interesting for pattern
recognition applications since they are more powerful than strings as a representation of the
object and computationally less expensive than graphs. Another form of symbolic
representation is the array, which is a special type of graph in which the nodes and edges
are arranged in a regular form. This type of data structure is very useful for low-level pattern
representation.
Structural pattern recognition is attractive because, in addition to classification, it provides
a description of how the given pattern is constructed from the primitives. This
method is useful in situations where the patterns have a definite structure that can be
captured in terms of a set of rules. However, due to parsing difficulties, the implementation of
a syntactic approach is limited. It is very difficult to use this method for the segmentation of
noisy patterns, and another problem is the inference of the grammar from training data. Powerful
pattern recognition capabilities can be achieved by combining the syntactic and statistical
pattern recognition techniques [Fu 1986].
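The idea of recognizing a pattern by checking whether its string of primitives obeys a set of rules can be sketched with a toy grammar. The grammar below (S -> a S b | a b) and the primitives "a" and "b" are hypothetical and chosen only because they are easy to parse by hand; they are not taken from any particular application.

```python
def matches(primitives):
    """Return True if the string is generated by the toy grammar S -> 'a' S 'b' | 'a' 'b'."""
    if primitives == "ab":
        return True  # base rule: S -> a b
    # Recursive rule: S -> a S b, so strip one 'a' and one 'b' and parse the inside.
    return (len(primitives) >= 4
            and primitives[0] == "a"
            and primitives[-1] == "b"
            and matches(primitives[1:-1]))

for candidate in ["ab", "aabb", "abab", "aab"]:
    print(candidate, matches(candidate))
```

A successful parse also tells us how the pattern was built (how many times the recursive rule was applied), which is the structural description mentioned above.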
Neural Network
Neural computing is based on the way in which biological neural systems store and
manipulate information. It can be viewed as a parallel computing environment consisting of
an interconnection of a large number of simple processors. Neural networks have been successfully
applied in many tasks of pattern recognition and machine learning systems. The structure of
a neural system is drawn from analogies with biological neural systems. Many algorithms have
been developed for neural network learning. In these
algorithms, a set of rules defines the evolution process undertaken by the synaptic
connections of the networks, thus allowing them to learn how to perform specified tasks.
Neural network models use a network of weighted directed graphs in which the nodes
are artificial neurons and the directed edges are connections between neuron outputs and
neuron inputs. Neural networks have the ability to learn complex nonlinear input-output
relationships, use sequential training procedures, and adapt themselves to the data.
Different types of neural networks are used for pattern classification. Among them, the
feed-forward network and the Kohonen network are commonly used. The learning process
involves updating the network architecture and connection weights so that a network can
efficiently perform a specific classification/clustering task. Neural network models
are gaining popularity because of their ability to solve pattern recognition problems,
their seemingly low dependence on domain-specific knowledge, and the availability of
efficient learning algorithms for practitioners to use. Neural networks are also useful for
implementing nonlinear algorithms for feature extraction and classification. In addition,
existing feature extraction and classification algorithms can also be mapped onto neural
network architectures for efficient implementation. In spite of the seemingly different
underlying principles, most of the well-known neural network models are implicitly
equivalent or similar to classical statistical pattern recognition methods.
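Learning by updating connection weights can be illustrated with the simplest possible case, a single artificial neuron trained with the perceptron rule. This is a deliberately reduced stand-in for the multi-layer networks discussed above, not an implementation of them; the logical AND task, the learning rate and the number of epochs are assumptions chosen for the example.

```python
def predict(weights, bias, x):
    """Output 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Adjust weights and bias after every sample in proportion to the output error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

# Training samples for logical AND: (inputs, desired output).
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
print([predict(weights, bias, x) for x, _ in and_samples])
```

A single neuron of this kind can only learn linearly separable tasks; networks with hidden layers are needed for the nonlinear input-output relationships mentioned above.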
OR
Well-known concepts from statistical decision theory are utilized to establish decision
boundaries between pattern classes.
The recognition system is operated in two modes: training (learning) and classification
(testing) (see Fig. 1).
The role of the preprocessing module is to segment the pattern of interest from the
background, remove noise, normalize the pattern, and perform any other operation that
contributes to defining a compact representation of the pattern.
In the training mode, the feature extraction/selection module finds the appropriate
features for representing the input patterns and the classifier is trained to partition
the feature space.
The feedback path allows a designer to optimize the preprocessing and feature
extraction/selection strategies. In the classification mode, the trained classifier assigns the
input pattern to one of the pattern classes under consideration based on the measured
features.
Designing a neural network that uses the error back-propagation algorithm is not only a
science but also an experimental craft.
The reason is that many of the factors involved in designing a network come from the
researcher's experience; however, by taking certain matters into consideration we can lead the
back-propagation algorithm to better performance.
The model of the network comprises analog cells that behave like neurons. Fig. shows an instance of
these cells as used in a network.
This multi-layer hierarchical network is made of many layers of cells. In this network there are
forward and backward links between cells. If the network is used for recognizing patterns,
then in this hierarchy the forward signals handle the process of recognizing patterns, whereas the
backward signals handle the process of separating patterns and recall.
We can teach this network to recognize any set of patterns. Even when a pattern contains extra
stimuli or is missing parts, the model can recognize it. Complete recall is not necessary for it to
recognize distorted shapes or shapes that have changed in size, or to restore imperfect
parts to their original form.
•The goal of pattern recognition is to reach an optimal decision rule that assigns
incoming data to their respective categories.
•The decision boundary separates points belonging to one class from points belonging to the others.
•The decision boundary partitions the feature space into decision regions.
•The nature of the decision boundary is determined by the discriminant function used for the
decision. It is a function of the feature vector.
In general, a pattern classifier carves up (or tessellates or partitions) the feature space into
volumes called decision regions. All feature vectors in a decision region are assigned to the
same category. The decision regions are often simply connected, but they can be multiply
connected as well, consisting of two or more non-touching regions.
The decision regions are separated by surfaces called the decision boundaries. These
separating surfaces represent points where there are ties between two or more categories.
For a minimum-distance classifier, the decision boundaries are the points that are equally
distant from two or more of the templates. With a Euclidean metric, the decision boundary
between Region i and Region j lies on the line or plane that is the perpendicular bisector of the
line from template m_i to template m_j. Analytically, these linear boundaries are a consequence of
the fact that the discriminant functions are linear. (With the Mahalanobis metric, the decision
boundaries are quadratic surfaces, such as ellipsoids, paraboloids or hyperboloids.)
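A minimum-distance classifier with the Euclidean metric can be sketched directly from this description. The two templates below are hypothetical; the last check illustrates that a point on the perpendicular bisector of the line joining them is equally distant from both, i.e. it lies on the decision boundary.

```python
import math

def nearest_template(x, templates):
    """Return the label of the template closest to x under the Euclidean metric."""
    return min(templates, key=lambda label: math.dist(x, templates[label]))

# Hypothetical templates (class means) m_i and m_j in a 2-D feature space.
templates = {"i": (0, 0), "j": (4, 0)}

print(nearest_template((1, 5), templates))   # closer to m_i
print(nearest_template((3, -2), templates))  # closer to m_j

# The perpendicular bisector of the line from m_i to m_j is the vertical line x1 = 2.
on_boundary = (2, 7)
print(math.dist(on_boundary, templates["i"]) == math.dist(on_boundary, templates["j"]))
```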
Decision boundary
Figure 3.2 graphically defines the input space, decision regions, decision boundaries, and
transition regions for a two-dimensional classification problem. To define the input space, we
use a simple two-component input pattern, the input vector x = (x1, x2). The output vector y
takes one of three possible classes, i.e., y = {class I, class II, class III}. Note that for every
point within the input space, there must be one and only one class specified. This example has only
three possible output vectors for training the network: y = [1,0,0], [0,1,0], or [0,0,1]. The
two-dimensional input space in Figure 3.2 is constrained by the feasible operating limits of the
input variables xi (i = 1 to n), that is, (1) x1,min < x1 < x1,max; and (2) x2,min < x2 <
x2,max. The possible output classes are mapped within this two-dimensional space.
1. Decision region: a specific region within the input space that corresponds to a unique
output class. All points within this region are assigned one and only one output class. Note that the
input space can have multiple decision regions corresponding to multiple output
classes. Figure 3.2 has three decision regions, one for each output class.
2. Decision boundary: the boundary is the intersection of two different decision regions.
In Figure 3.2, the decision boundary between classes I and II would have an output vector of
y = [0.5,0.5,0]. This example has three decision boundaries: (i) between class I and class II,
(ii) between class I and class III, and (iii) between class II and class III.
3. Transition region: this area is the buffer between two different decision regions. Here,
we can make only fuzzy inferences about the classification because the predicted output
vector is not one of [1,0,0], [0,1,0], or [0,0,1]. For example, the transition region between
class I and class II begins where the class I output response starts to decrease from 1 and
ends where it reaches 0. Similarly, in the transition region, the class II output response
increases from 0 to 1.
If the decision surface is a hyperplane, then the classification problem is linear, and the
classes are linearly separable.
Decision boundaries are not always clear cut. That is, the transition from one class in the
feature space to another is not abrupt but gradual. This effect is common in fuzzy-logic-based
classification algorithms, where membership in one class or another is
ambiguous.
•For the two-category case, a positive value of the discriminant function decides class 1 and a
negative value decides the other.
•If the number of dimensions is three, then the decision boundary will be a plane or a 3-D
surface. The decision regions become semi-infinite volumes.
•If the number of dimensions increases to more than three, then the decision boundary
becomes a hyperplane or a hypersurface. The decision regions become semi-infinite
hyperspaces.
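The sign rule for the two-category case can be illustrated with a linear discriminant g(x) = w.x + b. The sketch below is only an example under assumed values: the weight vector w, the offset b, and the function names are hypothetical, chosen so that the decision boundary is the plane x1 + x2 + x3 = 3 in a three-dimensional feature space.

```python
def g(x, w, b):
    """Linear discriminant g(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def decide(x, w, b):
    """Class 1 for a positive discriminant, class 2 for a negative one."""
    value = g(x, w, b)
    if value > 0:
        return 1
    if value < 0:
        return 2
    return None  # exactly on the decision boundary: a tie

# Hypothetical weights: the boundary is the plane x1 + x2 + x3 = 3.
w = (1.0, 1.0, 1.0)
b = -3.0

side_one = decide((2.0, 2.0, 2.0), w, b)   # g = 3  -> class 1
side_two = decide((0.0, 0.0, 0.0), w, b)   # g = -3 -> class 2
on_plane = decide((1.0, 1.0, 1.0), w, b)   # g = 0  -> tie
```

The same code works unchanged for more than three features; only the lengths of x and w grow, and the boundary becomes a hyperplane.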
Learning
•The classifier to be designed is built using input samples, which are a mixture of all the
classes.
•If the learning is offline (i.e., the supervised method), then the classifier is first given a set of
training samples, the optimal decision boundary is found, and then the classification is done.
•If the learning is online, then there is no teacher and no training samples (unsupervised). The
input samples are the test samples themselves. The classifier learns and classifies at the same time.
Straight line decision boundary
Features
We might add other features that are not highly correlated with the ones we already have. Be
sure not to reduce the performance by adding “noisy features”. Ideally, you might think the
best decision boundary is the one that provides optimal performance on the training data (see
the following figure).
Our satisfaction is premature because the central aim of designing a classifier is to correctly
classify new (test) input.
Decision Boundary Choice
Supervised learning: A teacher provides a category label for each pattern in the training set.
Unsupervised learning: The system forms clusters or “natural groupings” of the unlabeled
input patterns.
METRIC SPACE
A point set S is a metric space if there is a distance function d that takes ordered pairs (s, t)
of elements of S and returns a distance satisfying the following conditions:
1. For each pair s, t in S, d(s,t) > 0 if s and t are distinct points, and d(s,t) = 0 if s and t are
identical.
2. For each pair s, t in S, the distance from s to t is equal to the distance from t to s: d(s,t) = d(t,s).
3. For each triple s, t, u in S, the sum of the distances from s to t and from t to u is always at least
as large as the distance from s to u: d(s,t) + d(t,u) >= d(s,u).
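The three conditions can be checked numerically on a small sample of points. The sketch below is illustrative only: is_metric is a hypothetical helper, the points are made up, and a finite check like this can expose a violation but cannot prove that a function is a metric on the whole space. Squared Euclidean distance is included as an assumed counterexample because it breaks the third condition.

```python
import math
from itertools import product

def is_metric(points, d, tol=1e-12):
    """Check the three metric-space conditions on every pair and triple."""
    for s, t in product(points, repeat=2):
        if s == t and abs(d(s, t)) > tol:
            return False          # d(s, s) must be 0
        if s != t and d(s, t) <= 0:
            return False          # distinct points must be a positive distance apart
        if abs(d(s, t) - d(t, s)) > tol:
            return False          # symmetry
    for s, t, u in product(points, repeat=3):
        if d(s, t) + d(t, u) < d(s, u) - tol:
            return False          # triangle inequality
    return True

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def squared_euclidean(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]

euclidean_ok = is_metric(points, euclidean)        # passes all three conditions
squared_ok = is_metric(points, squared_euclidean)  # fails: 1 + 1 < 4 on the collinear points
```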
DISTANCE
Distance measures provide the foundation for many popular and effective algorithms, such as
k-nearest neighbours for supervised learning and k-means clustering for unsupervised learning.
Different distance measures must be chosen and used depending on the types of the data. As such, it is
important to know how to implement and calculate a range of different popular distance measures and the
intuitions for the resulting scores.
Most commonly, the two objects being compared are rows of data that describe a subject (such as a
person, car, or house), or an event (such as a purchase, a claim, or a diagnosis).
Perhaps the most likely way you will encounter distance measures is when you are using a specific PR
algorithm that uses distance measures at its core. The most famous algorithm of this type is the k-nearest
neighbours algorithm, or KNN for short.
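To show where the distance measure sits inside KNN, here is a minimal sketch. It assumes Euclidean distance and a simple majority vote; the function name knn_predict, the training rows, and the labels are all hypothetical, and a practical implementation would also handle ties and feature scaling.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(query, rows, labels, k=3, distance=euclidean):
    """Predict the majority label among the k training rows nearest to query."""
    ranked = sorted(zip(rows, labels), key=lambda pair: distance(query, pair[0]))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: two features per row, two classes.
rows = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["A", "A", "A", "B", "B", "B"]

prediction_near_a = knn_predict((1.2, 1.4), rows, labels, k=3)
prediction_near_b = knn_predict((6.4, 6.3), rows, labels, k=3)
```

Because the distance function is passed in as a parameter, a different measure can be swapped in without changing the rest of the algorithm.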
Distance measures play an important role in PR. Perhaps four of the most commonly used distance
measures in PR are as follows:
Hamming Distance
Euclidean Distance
Manhattan Distance
Minkowski Distance
Hamming Distance
Hamming distance calculates the distance between two binary vectors, also referred to as binary strings or
bitstrings for short.
You are most likely going to encounter bitstrings when you one-hot encode categorical columns of data.
For example, if a column had the categories ‘red,’ ‘green,’ and ‘blue,’ you might one-hot encode each
example as a bitstring with one bit for each category.
red = [1, 0, 0]
green = [0, 1, 0]
blue = [0, 0, 1]
The distance between red and green could be calculated as the sum or the average number of bit
differences between the two bitstrings. This is the Hamming distance.
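A minimal sketch of both variants (the count of differing bits and the average) might look as follows. The function name hamming_distance and its average flag are illustrative choices, not a reference to any particular library.

```python
def hamming_distance(a, b, average=False):
    """Count positions where two equal-length bitstrings differ.

    With average=True, return the fraction of differing positions instead.
    """
    if len(a) != len(b):
        raise ValueError("bitstrings must have the same length")
    differences = sum(1 for ai, bi in zip(a, b) if ai != bi)
    return differences / len(a) if average else differences

red = [1, 0, 0]
green = [0, 1, 0]

count = hamming_distance(red, green)                   # 2 differing bits
fraction = hamming_distance(red, green, average=True)  # 2 / 3
```

Any two different one-hot vectors differ in exactly two positions, so under this encoding every pair of distinct categories is the same distance apart.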
Euclidean Distance
Euclidean distance calculates the distance between two real-valued vectors.
You are most likely to use Euclidean distance when calculating the distance between two rows of data that
have numerical values, such as floating-point or integer values.
If columns have values with differing scales, it is common to normalize or standardize the numerical
values across all columns prior to calculating the Euclidean distance. Otherwise, columns that have large
values will dominate the distance measure.
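The effect of differing scales can be seen in a small sketch. Euclidean distance is the square root of the sum of squared differences; the min-max normalization shown here is one common choice among several, and the rows (age, income) are made-up values used only for illustration.

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def min_max_normalize(rows):
    """Rescale every column to [0, 1] so no column dominates by scale alone."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

# Hypothetical rows: (age in years, income in currency units).
rows = [(25.0, 30000.0), (65.0, 31000.0), (26.0, 90000.0)]

raw_d01 = euclidean_distance(rows[0], rows[1])   # dominated by income
raw_d02 = euclidean_distance(rows[0], rows[2])

scaled = min_max_normalize(rows)
scaled_d01 = euclidean_distance(scaled[0], scaled[1])
scaled_d02 = euclidean_distance(scaled[0], scaled[2])
```

On the raw values, the second distance is far larger than the first because income spans tens of thousands while age spans tens. After normalization the two distances are of similar size, since a 40-year age gap now counts about as much as a 60000-unit income gap.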
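Manhattan Distance
Manhattan distance calculates the distance between two real-valued vectors as the sum of the absolute
differences between their components.
It might make sense to calculate Manhattan distance instead of Euclidean distance for two vectors in an
integer feature space.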
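As a brief sketch, Manhattan distance is the sum of the absolute differences between corresponding components. The function name and the grid positions below are illustrative assumptions.

```python
def manhattan_distance(a, b):
    """Sum of the absolute differences between corresponding components."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

# Two hypothetical positions on an integer grid, such as squares on a board.
start = (1, 1)
end = (4, 5)

blocks = manhattan_distance(start, end)   # |1 - 4| + |1 - 5| = 7
```

With integer inputs the result is also an integer, which is one reason the measure suits integer feature spaces.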
Minkowski Distance
Minkowski distance calculates the distance between two real-valued vectors.
It is a generalization of the Euclidean and Manhattan distance measures and adds a parameter, called the
“order” or “p”, that allows different distance measures to be calculated.
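The role of the order p can be sketched as follows: the distance is the p-th root of the sum of the absolute differences raised to the power p, so p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance. The function name and the example vectors are illustrative.

```python
def minkowski_distance(a, b, p):
    """(sum of |a_i - b_i|^p) ^ (1/p), for order p >= 1."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)

a = (0.0, 0.0)
b = (3.0, 4.0)

manhattan = minkowski_distance(a, b, 1)   # 3 + 4 = 7
euclidean = minkowski_distance(a, b, 2)   # sqrt(9 + 16) = 5
```

Intermediate values of p give distances between these two cases, which makes p a parameter that can be tuned for a given problem.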