20CS610   Machine Learning
BE Sixth Semester 20CS610
    Course Outcomes
    ◻   After completing this course, students should be able to:
    ◻   CO1: Understand the basic concepts of Machine
        Learning.
    ◻   CO2: Formulate machine learning problems
        corresponding to different applications.
    ◻   CO3: Understand the unsupervised learning techniques.
    ◻   CO4: Understand the pre-processing activities done
        for Machine Learning algorithms.
    ◻   CO5: Understand neural networks implemented for
        various applications.
    Text Books
    ◻   Pattern Recognition and Image Analysis, Earl Gose,
        Richard Johnsonbaugh, Steve Jost, Pearson, 2015
    ◻   Pattern Classification, Richard O. Duda, Peter E.
        Hart, David G. Stork, Second edition, Wiley
        publication.
    ◻   Ethem Alpaydin (2014). Introduction to
        Machine Learning, Third Edition, MIT Press.
           The textbook website is https://www.cmpe.boun.edu.tr/~ethem/i2ml3e/
    Reference Books
    ◻   1. Machine Learning, Tom M. Mitchell, McGraw-Hill
        Publishers, 1997.
    ◻   2. Pattern Recognition and Machine Learning,
        Christopher M. Bishop, Springer Publishers, 2011.
    ◻   3. Kevin Murphy, Machine Learning: A Probabilistic
        Perspective, MIT Press, 2012.
    ◻   4. Understanding Machine Learning, Shai Shalev-
        Shwartz and Shai Ben-David, Cambridge University
        Press, 2017.
    Assessment                                    Weightage in Marks
    Class Test I                                  10
    Quiz/Mini Projects/Assignments/Seminars       10
    Class Test II                                 10
    Quiz/Mini Projects/Assignments/Seminars       10
    Class Test III                                10
    Total                                         50
    Question Paper Pattern
    ◻   Semester End Examination (SEE)
    ◻   Semester End Examination (SEE) is a written examination of three
        hours duration of 100 marks with 50% weightage.
    ◻   Note:
    ◻   •  The question paper consists of TWO parts: PART-A and PART-B.
    ◻   •  PART-A consists of Questions 1-5, which are compulsory
           (ONE question from each unit).
    ◻   •  PART-B consists of Questions 6-15, which have internal
           choice (TWO questions from each unit).
    ◻   •  Each question carries 10 marks and may consist of sub-
           questions.
    ◻   •  Answer 10 full questions of 10 marks each.
    UNIT – 1 Introduction & Bayesian Decision
    Theory:
    ◻   Introduction: What Is Machine Learning?
        Applications of Machine Learning, Types of
        Machine Learning, Statistical Decision Theory and
        Analysis
    ◻   Probability: Introduction, Basics of Probability,
        Combination, Permutation, Union, Intersection,
        Complement, Conditional Probability, Random
        Variables, Binomial Distribution, Normal Distribution,
        Joint Distributions and Densities, Moments of
        Random Variables
MACHINE LEARNING – UNIT 1
Introduction
◻   Artificial Intelligence (AI)
◻   Machine Learning (ML)
◻   Deep Learning (DL)
◻   Data Science
Artificial Intelligence
◻   Artificial intelligence is intelligence demonstrated
    by machines, as opposed to natural intelligence
    displayed by animals including humans.
Machine Learning
◻   Machine Learning – Statistical Tool to explore the data.
Machine learning (ML) is a type of artificial intelligence (AI) that allows
software applications to “self-learn” from training data and improve
over time, to become more accurate at predicting outcomes without being
explicitly programmed to do so.
Machine learning algorithms use historical data as input to predict new
output values.
Machine learning algorithms are able to detect patterns in data and
learn from them, in order to make their own predictions
For example, if you search for an item on Amazon, similar items will be
suggested the next time without your asking.
Deep Learning
◻   It is the subset of ML which mimics the human brain.
◻   Three popular Deep Learning techniques are:
      ANN – Artificial Neural Network
      CNN – Convolutional Neural Network
      RNN – Recurrent Neural Network
Summary:
What is Machine Learning?
◻   Learning : is any process by which a system improves
    performance from experience.
◻   Example: we speak of an experienced doctor, an experienced
    teacher, an experienced driver.
◻   Machine Learning is the study of algorithms that
    • improve their performance P
     • at some task T
     • with experience E.
    A well-defined learning task is given by <P,T,E>
    Defining the Learning Task
Improve on task T, with respect to performance metric P, based
on experience E
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words
T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
  E: A sequence of images and steering commands recorded
 while observing a human driver.
◻   A Machine Learning system learns from historical data,
    builds the prediction models, and whenever it
    receives new data, predicts the output for it.
◻   The accuracy of the predicted output depends upon the
    amount of data: a huge amount of data helps to
    build a better model which predicts the output more
    accurately.
Types of Machine Learning:
  Supervised Learning
  Unsupervised Learning
  Semi-supervised Learning
  Reinforcement Learning
•   Classification is the task of assigning a class label to an input
    pattern. The class label indicates one of a given set of classes. The
    classification is carried out with the help of a model obtained
    using a learning procedure. According to the type of learning
    used, there are two categories of classification: supervised
    learning and unsupervised learning.
•   Supervised learning makes use of a set of examples which
    already have the class labels assigned to them.
•   Unsupervised learning attempts to find inherent structures in
    the data.
•   Semi-supervised learning makes use of a small number of
    labeled data and a large number of unlabeled data to learn the
    classifier.
          1. Supervised Learning
• Supervised learning: classification is seen as supervised
  learning from examples.
 – Supervision: The data (observations, measurements, etc.) are
    labeled with pre-defined classes, as if a “teacher” gives the
    classes (supervision).
 – It is called supervised learning because the process of an
    algorithm learning from the training dataset can be thought
    of as a teacher supervising the learning process.
 – Test data are classified into these classes too.
 – Example: Classify mails as spam or non-spam based on
    predecided parameters.
    Supervised Learning
◻   The aim of a supervised learning algorithm
    is to find a mapping function to map the
    input variable (x) to the output variable (y).
◻   In the real world, supervised learning can be
    used for risk assessment, image
    classification, fraud detection, spam
    filtering, etc.
◻   Types of supervised Machine learning
    Algorithms:
Supervised learning can be further divided into two types of problems:
    Regression
◻   Regression algorithms are used if there is a relationship between the
    input variable and the output variable. It is used for the prediction of
    continuous variables, such as Weather forecasting, Market Trends, etc.
Below are some popular Regression algorithms which come under
supervised learning:
•  Linear Regression
•  Regression Trees
•  Non-Linear Regression
•  Bayesian Linear Regression
•  Polynomial Regression
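As a quick illustration of regression (a minimal sketch added here, not from the slides; scikit-learn and the toy house-size data are assumptions):

    # Minimal linear regression sketch: predict a continuous target (price)
    # from a single input variable (house size). All values are made up.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[600], [800], [1000], [1200], [1500]])  # size in sq. ft
    y = np.array([150, 190, 240, 280, 350])               # price (toy units)

    model = LinearRegression().fit(X, y)   # learn the mapping y = f(x)
    print(model.predict([[1100]]))         # predict price of an unseen house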
    Classification
◻   Classification algorithms are used when the output variable is
    categorical, which means there are two classes such as Yes-
    No, Male-Female, True-false, etc.
Some of the Classification algorithms which come under
supervised learning are:
• Random Forest
• Decision Trees
• Logistic Regression
• Support vector Machines
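As a quick illustration of classification (a minimal sketch, not from the slides; the two per-mail features and all values are made-up placeholders):

    # Minimal classification sketch: predict a categorical label (spam/ham).
    from sklearn.tree import DecisionTreeClassifier

    X = [[0, 1], [1, 0], [7, 9], [8, 6], [0, 0], [9, 8]]  # [links, CAPS words]
    y = ["ham", "ham", "spam", "spam", "ham", "spam"]     # class labels

    clf = DecisionTreeClassifier().fit(X, y)
    print(clf.predict([[6, 7]]))   # classify an unseen mail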
Regression vs classification
◻   Regression models are used to predict a continuous
    value.
◻   Predicting prices of a house given the features of
    house like size, price etc is one of the common
    examples of Regression. It is a supervised technique.
◻   Classification is applied on discrete values.
◻   Examples of continuous prediction targets: age vs. height,
    temperature of a city, house price.
          2. Unsupervised learning
• Unsupervised learning (clustering)
     – Class labels of the data are unknown.
     – Given a set of data, the task is to establish the existence
          of classes or clusters in the data.
      ◻   Unsupervised learning is a type of machine
          learning in which models are trained on an
          unlabeled dataset and are allowed to act on that
          data without any supervision.
Types of unsupervised Algorithm
•   Clustering: Clustering is a method of grouping objects
    into clusters such that objects with the most similarities
    remain in one group and have few or no similarities with
    the objects of another group.
•   Association: An association rule is an unsupervised
    learning method used for finding relationships
    between variables in a large database. It determines the
    sets of items that occur together in the dataset.
    Association rules make marketing strategies more effective:
    for example, people who buy item X (say, bread) also
    tend to purchase item Y (say, butter or jam). A typical
    example of association rules is Market Basket Analysis.
    Unsupervised Learning algorithms:
◻   Below is the list of some popular unsupervised learning
    algorithms:
•   K-means clustering
•   KNN (k-nearest neighbors)
•   Hierarchical clustering
•   Anomaly detection
•   Neural Networks
•   Principal Component Analysis
•   Independent Component Analysis
•   Apriori algorithm
•   Singular value decomposition
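As a quick illustration of one of these (a minimal K-means sketch, not from the slides; scikit-learn and the toy points are assumptions):

    # Minimal K-means sketch: group unlabeled points into 2 clusters.
    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1, 2], [1, 4], [1, 0],       # no class labels given
                  [10, 2], [10, 4], [10, 0]])

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)            # cluster index assigned to each point
    print(kmeans.cluster_centers_)   # learned cluster centers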
Supervised Learning vs. Unsupervised Learning
•  Supervised learning algorithms are trained using labeled data;
   unsupervised learning algorithms are trained using unlabeled data.
•  A supervised learning model takes direct feedback to check if it is
   predicting the correct output; an unsupervised learning model does not
   take any feedback.
•  A supervised learning model predicts the output; an unsupervised
   learning model finds the hidden patterns in data.
•  In supervised learning, input data is provided to the model along with
   the output; in unsupervised learning, only input data is provided.
•  The goal of supervised learning is to train the model so that it can
   predict the output when given new data; the goal of unsupervised
   learning is to find the hidden patterns and useful insights in an
   unknown dataset.
•  Supervised learning needs supervision to train the model; unsupervised
   learning does not need any supervision.
•  Supervised learning can be categorized into Classification and
   Regression problems; unsupervised learning can be classified into
   Clustering and Association problems.
Supervised vs. unsupervised learning
3. Semi supervised learning
It makes use of a small number of labeled data and a large number
of unlabeled data to learn.
The model uses labeled data as an input to make inferences
about the unlabeled data.
     4. Reinforcement Learning
     ◻   Reinforcement learning is a machine learning
         training method based on rewarding desired
          behaviors and/or punishing undesired ones.
     ◻   In general, a reinforcement learning agent is able
         to perceive and interpret its environment, take
         actions and learn through trial and error.
     ◻   Since there is no training data, machines learn
         from their own mistakes and choose the
         actions that lead to the best solution or
         maximum reward.
     Reinforcement – Learning from the environment
Life cycle of Machine learning
             Learning
     •   The classifier to be designed is built using input samples
         which are a mixture of all the classes.
     •   The classifier learns how to discriminate between samples of
         different classes.
     •   If the learning is offline, i.e. a supervised method, then the
         classifier is first given a set of training samples, the
         optimal decision boundary is found, and then the classification
         is done.
     •   If the learning is online then there is no teacher and no
         training samples (unsupervised). The input samples are the
         test samples themselves. The classifier learns and classifies at the
         same time.
    Training and Testing data
◻   Two types of data set in supervised classifier.
      Training set : 70 to 80% of the available data will be used for
      training the system.
       In Supervised classification Training data is the data you use to
      train an algorithm or machine learning model to predict the
      outcome you design your model to predict.
      Testing set : around 20-30% will be used for testing the system. Test
      data is used to measure the performance, such as accuracy or
      efficiency, of the algorithm you are using to train the machine.
      Testing is the measure of quality of your algorithm.
      Often, even after training on 80% of the data, failures can be seen
      during testing; the reason is that the test data is not well
      represented in the training set.
◻   Unsupervised classifier does not use training data
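A minimal sketch of such a split (not from the slides; scikit-learn, the iris dataset, and the k-NN classifier are assumptions):

    # 70/30 train-test split and accuracy measurement on the held-out data.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)   # 70% train, 30% test

    clf = KNeighborsClassifier().fit(X_train, y_train)
    print(accuracy_score(y_test, clf.predict(X_test)))  # quality on unseen data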
      Model Selection?
◻   It is the process of selecting the optimal model from the set of candidate
    models, for a given data.
◻   Data used in learning:
      Training Set (Usually 60%)
      Validation set – Cross Validation (20%) : Validated on different models on the
      same training set
      Test data (20%) : Unseen data
◻   Model Selection is finding the optimal model which minimizes both bias and
    variance.
◻   Bias is the error during training and Variance is the error during testing
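A minimal model-selection sketch (an illustration under assumptions: scikit-learn, the iris dataset, and k in k-NN standing in for the candidate models):

    # Score each candidate model with 5-fold cross-validation on the
    # training data, then pick the best-scoring one.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    for k in (1, 3, 5, 7):   # candidate models
        scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                 X, y, cv=5)
        print(k, scores.mean())  # choose the k with the best mean score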
    Features and classes
◻   Properties or attributes used to classify the objects are
    called features.
◻   A collection of “similar” (not necessarily same) objects are
    grouped together as one “class”.
◻   For example: various shapes and fonts of the letter T (as shown on
    the slide) are all classified as the character T.
◻   Classes are identified by a label.
◻   Most of the pattern recognition tasks are first done by
    humans and automated later.
Samples or patterns
◻   The individual items or objects or situations to be
    classified will be referred as samples or patterns
    or data.
◻   The set of data is called “Data Set”.
• A pattern is anything which has a regular sequence of occurrence. A pattern
   can either be observed through visualization or be derived mathematically by
   applying algorithms.
• Patterns include repeated trends in various forms of data
• Examples : Speech pattern, print on clothes, design of outfits, jewelry pattern,
   Sound wave, tree species, fingerprint, face, barcode, QR-code, handwriting,
   or character image etc.
• Pattern recognition is the process of recognizing patterns by using a machine
   learning algorithm. Pattern recognition can be defined as the classification of
   data based on knowledge already gained or on statistical information
   extracted from patterns and/or their representation. One of the important
   aspects of pattern recognition is its application potential.
• Examples: Speech recognition, speaker identification, multimedia document
   recognition (MDR), automatic medical diagnosis.
The data inputs for pattern recognition can be words or texts, images, or audio
files. Hence, pattern recognition is broader than computer vision, which
focuses on image recognition.
In a typical pattern recognition application, the raw data is processed and
converted into a form that is convenient for a machine to use. Pattern
recognition involves the classification and clustering of patterns.
    Definition of Pattern Recognition
•     Pattern recognition is defined as the study of how machines can observe the
      environment, learn to distinguish various patterns of interest from their
      background, and make logical decisions about the categories of the patterns.
      During recognition, the given objects are assigned to a specific category.
•     In general, pattern recognition can be described as an information reduction,
      information mapping, or information labeling process.
•     In computer science, pattern recognition refers to the process of matching
      information already stored in a database with incoming data based on
      their attributes or features.
Input Data and Output Response for Various Applications
Task of Classification         Input Data                   Output Response
Character Recognition          Optical Signals or Strokes   Name of the character
Speech Recognition             Acoustic Waveforms           Name of the word
Speaker Recognition            Voice                        Name of the speaker
Weather Prediction             Weather Maps                 Weather Forecasts
Medical Diagnosis              Symptoms                     Disease
Stock Market Prediction        Financial News and Charts    Predicted Market ups and Downs
     Image Processing Example
•     Sorting Fish: incoming fish are sorted
      according to species using optical sensing
      (sea bass or salmon?)
•     Problem Analysis:
    ▪ set up a camera and take some sample
       images to extract features
    ▪ Consider features such as length, lightness,
       width, number and shape of fins, position of
       mouth, etc.
                 Preprocessing
A critical step for reliable feature extraction!
Examples:
• Noise removal
• Image enhancement
• Separate touching or occluding fish
• Extract boundary of each fish
                        Feature Extraction
• How to choose a good set of features?
 – Discriminative features
 – Invariant features (e.g., invariant to geometric
    transformations such as translation, rotation and scale)
• Are there ways to automatically learn which features are best?
                      Feature Extraction (cont’d)
[Figure: histogram of the “length” feature for the two classes, with
decision threshold l*.]
•   Even though sea bass is longer than salmon on average, there are
    many examples of fish where this observation does not hold.
Add Another Feature
  Lightness is a better feature than length because it reduces the
  misclassification error.
  Can we combine features in such a way that we improve performance?
  (Hint: correlation)
                            Multiple Features
•     To improve recognition accuracy, we might need to use more than one
      feature.
    – Single features might not yield the best performance.
    – Using combinations of features might yield better performance.
• If the feature space cannot be perfectly separated by a
  straight line, a more complex boundary might be used.
  (non-linear)
• Alternatively, a simple decision boundary such as a straight
  line might be used even if it does not perfectly separate
  the classes, provided that the error rates are
  acceptably low.
Decision region and Decision Boundary
•   Our goal of Machine learning is to reach an optimal decision rule to
    categorize the incoming data into their respective categories
•   The decision boundary separates points belonging to one class from
    points of other
•    The decision boundary partitions the feature space into decision
    regions.
•   The nature of the decision boundary is decided by the discriminant
    function which is used for decision. It is a function of the feature vector.
    Hyper planes and Hyper surfaces
    For the two-category case, a positive value of the discriminant function
    decides class 1 and a negative value decides the other.
•      If the number of dimensions is three, then the decision boundary will be a
       plane or a 3-D surface. The decision regions become semi-infinite volumes.
•     If the number of dimensions increases to more than three, then the decision
      boundary becomes a hyper-plane or a hyper-surface. The decision regions
      become semi-infinite hyperspaces.
    Decision Theory
•   Can we do better than a linear classifier?
EXAMPLES FOR MACHINE
LEARNING APPLICATIONS
Handwriting Recognition
License Plate Recognition
Biometric Recognition
Face Detection/Recognition
 Detection
             Matching
                             Recognition
Fingerprint Classification
  Important step for speeding up identification
Autonomous Systems
  Obstacle detection and avoidance
       Object recognition
Medical Applications
Skin Cancer Detection Breast Cancer Detection
Land Cover Classification
(using aerial or satellite images)
     Many applications including “precision” agriculture.
Statistical Decision Theory
◻   Decision theory, in statistics, is a set of
    quantitative methods for reaching optimal
    decisions.
              Example for Statistical Decision Theory
•   Consider a hypothetical Basketball Association:
•   The prediction could be based on the difference between the home
    team’s average number of points per game (apg) and the visiting
    team’s apg for previous games.
•   The training set consists of scores of previously played games, with
    each home team classified as winner or loser.
•   Now the prediction problem is: given a game to be played, predict
    whether the home team will win or lose using the feature ‘dapg’,
•   where dapg = Home team apg – Visiting team apg.
Data set of games showing outcomes, differences between average numbers of points scored and differences
between winning percentages for the participating teams in previous games
•  The figure shown in the previous slide lists 30 games, gives the value of
   dapg for each game, and tells whether the home team won or lost.
• Notice that in this data set the team with the higher apg usually wins.
• For example, in the 9th game the home team, on average, scored 10.8 fewer
  points in previous games than the visiting team, and the home team lost.
• When the teams have about the same apgs, the outcome is less certain. For
  example, in the 10th game, the home team on average scored 0.4 fewer points
  than the visiting team, yet the home team won the match.
• Similarly, in the 12th game, the home team had an apg 1.1 less than the
  visiting team on average, and the team lost.
                 Histogram of dapg
• Histogram is a convenient way to describe the data.
• To form a histogram, the data from a single class are
  grouped into intervals.
• Over each interval a rectangle is drawn, with height
  proportional to the number of data points falling in that
  interval. In the example the interval is chosen to have a width
  of two units.
• The general observation is that the prediction is not accurate
  with the single feature ‘dapg’.
[Histogram of dapg for the ‘Lost’ and ‘Won’ classes.]
                                       Prediction
•   To predict, normally a threshold value T is used:
•   ‘dapg’ > T is predicted as a win,
•   ‘dapg’ < T is predicted as a loss.
•    T is called the decision boundary or threshold.
•    If T = -1, four samples in the original data are misclassified.
    – Here 3 winners are called losers and one loser is called a winner.
•    If T = 0.8, no samples from the loser class are misclassified as winners, but 5
     samples from the winner class are misclassified as losers.
•    If T = -6.5, no samples from the winner class are misclassified as losers, but 7
     samples from the loser class are misclassified as winners.
•    By inspection, we see that when a decision boundary is used to classify the
     samples, the minimum number of samples misclassified is four, attained at T = -1.
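A small sketch of this threshold rule (the full 30-game table lives in the slide’s figure; the dapg values below mix the three quoted in the text with made-up stand-ins):

    # Count misclassifications of the rule "dapg > T => predict win"
    # for candidate thresholds T.
    games = [(-10.8, "lost"), (-0.4, "won"), (-1.1, "lost"),
             (5.2, "won"), (3.1, "won"), (-6.0, "lost")]   # (dapg, outcome)

    def errors(T):
        return sum((dapg > T) != (outcome == "won")
                   for dapg, outcome in games)

    for T in (-6.5, -1.0, 0.8):
        print(T, errors(T))   # compare candidate decision boundaries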
                 Using Additional Feature dwp
•   To make it more accurate let us consider two features.
•   Additional features often increases the accuracy of
    classification.
•   Along with ‘dapg’ another feature ‘dwp’ is considered.
•   wp= winning percentage of a team in previous games
•   dwp = difference in winning percentage between teams
•   dwp = Home team wp – visiting team wp
Data set of games showing outcomes, differences between average number of points scored and differences between
                        winning percentages for the participating teams in previous games
•   Now observe the results on a scatterplot
•   Each sample has a corresponding feature vector (dapg,dwp), which determines its position in the plot.
•   Note that the feature space can be divided into two decision regions by a straight line, called a
    linear decision boundary (refer to the line equation). Finding such a separating line is what a
    method like logistic regression does.
•   If the sample lies above the decision boundary, the home team is classified as the winner; if it
    lies below the decision boundary, it is classified as the loser.
         Prediction with two parameters.
• Consider the following: Springfield (home team)
• dapg = home team apg – visiting team apg = 98.3 – 102.9 = -4.6
• dwp = home team wp – visiting team wp = 21.4 – 58.1 = -36.7
• Since the point (dapg, dwp) = (-4.6,-36.7) lies below the decision
  boundary, we predict that the home team will lose the game.
◻   If the feature space cannot be perfectly separated by a
    straight line, a more complex boundary might be used. (non-
    linear)
◻   Alternatively, a simple decision boundary such as a straight line
    might be used even if it does not perfectly separate the classes,
    provided that the error rates are acceptably low.
◻   Having the model shown in the previous slide, we can
    use it for any type of recognition and classification.
◻    It can be
      speaker recognition
      Speech recognition
      Image classification
      Video recognition and so on…
◻   It is now very important to learn:
      Different techniques to extract the features
      Then in the second stage, different methods to
      recognize the pattern and classify
     ■   Some of them use a statistical approach
     ■   A few use probabilistic models with mean, variance, etc.
     ■   Other methods are - neural network, deep neural networks
     ■   Hyper box classifier
     ■   Fuzzy measure
     ■   And mixture of some of the above
    When do we use Machine Learning?
◻    ML is used when:
    • Human expertise does not exist (navigating on Mars)
    • Humans can’t explain their expertise (speech
      recognition)
    • Models must be customized (personalized medicine)
    • Models are based on huge amounts of data
      (genomics)
PROBABILITY:
INTRODUCTION TO PROBABILITY
PROBABILITIES OF EVENTS
What is covered?
◻   Basics of Probability
◻   Combination
◻   Permutation
◻   Examples for the above
◻   Union
◻   Intersection
◻   Complement
     What is a probability
◻   Probability is the branch of mathematics concerning numerical
    descriptions of how likely an event is to occur.
◻   The probability of an event is a number between 0
    and 1, where, roughly speaking, 0 indicates that
    the event will not happen and 1 indicates that the
    event happens all the time.
     Experiment
◻   The term experiment is used in probability theory
    to describe a process for which the outcome is not
    known with certainty.
    Example of experiments are:
        Rolling a fair six-sided die.
       Randomly choosing 5 apples from a lot of 100 apples.
Event
◻   An event is an outcome of an experiment. It is
    denoted by a capital letter, say E1, E2, … or A, B, …
    and so on.
◻   For example, in tossing a coin, H and T are two events.
◻   The event consisting of all possible outcomes of a
    statistical experiment is called the “Sample Space”.
    Ex: { E1,E2…}
Examples
  Sample Space of Tossing a coin = {H,T}
  Tossing 2 Coins = {HH,HT,TH,TT}
Random phenomena
 –  We are unable to predict the outcomes, but in the long run the
    outcomes exhibit statistical regularity.
 Examples
1.   Tossing a coin – outcomes S = {Head, Tail}
 We are unable to predict on each toss whether it is Head or Tail.
 In the long run we can predict that 50% of the time heads will occur
 and 50% of the time tails will occur.
2.   Rolling a die – outcomes
          S = {1, 2, 3, 4, 5, 6}
 We are unable to predict the outcome, but in the long run one can
 determine that each outcome will occur 1/6 of the time.
 Use symmetry: each side is the same, so one side should not
 occur more frequently than another side in the long run. If the
 die is not balanced this may not be true.
Example
  ◻   The die toss:
  ◻   Simple events: E1, E2, E3, E4, E5, E6 (rolling 1 through 6)
  ◻   Sample space: S = {E1, E2, E3, E4, E5, E6}
     Frequency of an event
     ◻   Frequency of occurrence is measured by the relative frequency
         nA/n, where nA is the number of times event A occurred in n
         trials (after the event has occurred).
     •   If we let n get infinitely large, P(A) = lim (n→∞) nA/n.
    The Probability of an Event
◻   The probability of an event A measures “how
    often” A will occur. We write P(A).
◻   P(A) must be between 0 and 1.
      If event A can never occur, P(A) = 0. If event A always
      occurs when the experiment is performed, P(A) =1.
      Then P(A) + P(not A) = 1.
      So P(not A) = 1-P(A)
◻   The sum of the probabilities for all simple events in S
    equals 1.
Example 1
    Toss a fair coin twice. What is the probability
     of observing at least one head?
    The simple events are HH, HT, TH, TT, each with probability 1/4.
    P(at least 1 head) = P(HH) + P(HT) + P(TH)
                       = 1/4 + 1/4 + 1/4 = 3/4
Example 2
 A bowl contains three M&Ms®, one red, one
 blue and one green. A child selects two M&Ms at
 random. What is the probability that at least one
 is red?
 The ordered outcomes are RB, RG, BR, BG, GB, GR,
 each with probability 1/6.
 P(at least 1 red) = P(RB) + P(BR) + P(RG) + P(GR)
                   = 4/6 = 2/3
Example 3
The sample space of throwing a pair of dice consists of the 36 equally
likely ordered pairs (i, j), with i, j = 1, …, 6.
Example 3
Event            Simple events        Probability
Dice add to 3    (1,2),(2,1)          2/36
Dice add to 6    (1,5),(2,4),(3,3),   5/36
                 (4,2),(5,1)
Red die show 1   (1,1),(1,2),(1,3),   6/36
                 (1,4),(1,5),(1,6)
Green die        (1,1),(2,1),(3,1),   6/36
show 1           (4,1),(5,1),(6,1)
Permutations
  ◻     The number of ways you can arrange n distinct
        objects, taking them r at a time, is
        nPr = n!/(n − r)!
        Example: How many 3-digit lock
        combinations can we make from the
        numbers 1, 2, 3, and 4?
  The order of the choice is important!
  P(4,3) = 4!/(4 − 3)! = 24
    Examples
Example: A lock consists of five parts and can be
assembled in any order. A quality control engineer wants
to test each order for efficiency of assembly. How many
orders are there?
        The order of the choice is important!
        P(5,5) = 5!/0! = 120
 It is required to seat 5 men and 4 women in a row so that the women occupy the
 even places. How many such arrangements are possible?
 Solution:
 Given: 5 men and 4 women
 Total number of people = 9
 The women occupy even places, which means they will be sitting in the 2nd, 4th, 6th
 and 8th places, whereas the men will be sitting in the 1st, 3rd, 5th, 7th and 9th
 places.
 The number of arrangements in which 4 women can sit in 4 places = 4P4 = 4!/(4 – 4)!
 = 4!/0! = 24/1 = 24
 5 men can occupy 5 seats in 5P5 ways.
 That means the number of ways they can be seated = 5P5 = 5!/(5 – 5)! = 5!/0! =
 120/1 = 120
 Therefore, the total numbers of possible sitting arrangements = 24 × 120 = 2880
Combinations
◻    The number of distinct combinations of n
     distinct objects that can be formed, taking
     them r at a time, is
     nCr = n!/(r!(n − r)!)
     Example: Three members of a 5-person committee
     must be chosen to form a subcommittee. How many
     different subcommittees could be formed?
    The order of the choice is not important!
    C(5,3) = 5!/(3! 2!) = 10
                            Example
•   A box contains six M&Ms®, four red
    and two green. A child selects two M&Ms at random. What is the
    probability that exactly one is red?
    The order of the choice is not important!
    There are C(6,2) = 15 ways to choose two of the six M&Ms, and
    4 × 2 = 8 ways to choose 1 red and 1 green M&M.
    P(exactly one red) = 8/15
  A team of four has to be selected from 6 boys and 4 girls. How many different ways a
  team can be selected if at least one boy must be there in the team?
  Solution:
  Combination of a four-member team with at least one boy are:
  {(BGGG), (BBGG), (BBBG), (BBBB)}
  Number of ways one boy and three girls can be selected = 6 C 1 × 4 C 3 = 6 × 4 = 24
  Number of ways two boys and two girls can be selected = 6 C 2 × 4 C 2 = 15 × 6 = 90
  Number of ways three boys and one girl can be selected = 6 C 3 × 4 C 1 = 20 × 4 = 80
  Number of ways four boys can be selected = 6 C 4 = 15
  Total number of ways to form such a team = 24 + 90 + 80 + 15 = 209.
Is it combination or permutation?
◻   Having 6 dots in a braille cell, how many different characters can
    be made?
◻   It is a problem of combinations:
◻   C(6,0) + C(6,1) + C(6,2) + C(6,3) + C(6,4) + C(6,5) + C(6,6)
    = 1 + 6 + 15 + 20 + 15 + 6 + 1 = 64
◻   (Why are combinations used and not permutations? Because each dot is
    of the same nature.)
◻   64 different characters can be made.
◻   (It is the summation of the combinations C(6, N), where N runs from 0 to 6.)
Having 4 characters, how many 2-character words can be formed?
Permutation: P(4,2) = 12
Combination: C(4,2) = 6
Remember that the number of permutations is larger than the number of combinations.
Summary:
◻   Formula for Permutations (order is relevant): nPr = n!/(n − r)!
◻   Formula for Combinations (order is not relevant): nCr = n!/(r!(n − r)!)
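These formulas and the worked examples above can be checked directly in Python (a sketch; math.perm and math.comb require Python 3.8+):

    # Re-deriving the worked examples with Python's math module.
    from math import perm, comb

    print(perm(4, 3))                  # 3-digit lock from 4 digits: 24
    print(perm(4, 4) * perm(5, 5))     # women x men seating: 24 * 120 = 2880
    print(comb(5, 3))                  # subcommittees of 3 from 5: 10
    print(sum(comb(6, k) for k in range(7)))  # braille characters: 64
    print(perm(4, 2), comb(4, 2))      # 12 vs 6: permutations exceed combinations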
EVENT RELATIONS
                       An Event, E
The event, E, is any subset of the sample space, S, i.e. any set
of outcomes (not necessarily all outcomes) of the random
phenomena. (Venn diagram: E is drawn as a region inside S.)
The event, E, is said to have occurred if, after the
outcome has been observed, the outcome lies in E.
Examples
1.   Rolling a die – outcomes
         S = {1, 2, 3, 4, 5, 6}
       E = the event that an even number is rolled
         = {2, 4, 6}
 Special Events
The null event, also called the empty event, is
  represented by φ:
φ = { } = the event that contains no
 outcomes
  The Entire Event, The Sample Space - S
   S = the event that contains all outcomes
3 Basic Event relations
1. Union if you see the word or,
2. Intersection if you see the word and,
3. Complement if you see the word not.
Union
Let A and B be two events; then the
union of A and B is the event (denoted
by A∪B) defined by:
 A ∪ B = {e | e belongs to A or e belongs
 to B}
The event A ∪ B occurs if the event A
occurs, or the event B occurs, or both occur.
Intersection
Let A and B be two events; then the
intersection of A and B is the event
(denoted by A∩B) defined by:
  A ∩ B = {e | e belongs to A and e
  belongs to B}
The event A ∩ B occurs if the event A
occurs and the event B occurs.
Complement
Let A be any event; then the
complement of A (denoted by Ā) is
defined by:
    Ā = {e | e does not belong to A}
The event Ā occurs if the event A does
not occur.
Mutually Exclusive
Two events A and B are called
mutually exclusive if A ∩ B = φ.
If two events A and B are mutually
exclusive then:
1. They have no outcomes in common.
   They can’t occur at the same time: the outcome of
   the random experiment cannot belong to both A
   and B.
RULES OF PROBABILITY
ADDITIVE RULE
RULE FOR COMPLEMENTS
Additive rule (General case)
 P[A ∪ B] = P[A] + P[B] – P[A ∩ B]
       or
 P[A or B] = P[A] + P[B] – P[A and
 B]
The additive rule (mutually exclusive events): if A ∩ B = φ,
            P[A ∪ B] = P[A] + P[B]
            i.e.
            P[A or B] = P[A] + P[B]
         if A ∩ B = φ
         (A and B mutually exclusive)
When P[A] is added to P[B], the outcomes in A ∩ B are counted
twice; hence
  P[A ∪ B] = P[A] + P[B] – P[A ∩ B]
Example:
Bangalore and Mohali are two of the cities competing for the National
university games. (There are also many others).
The organizers are narrowing the competition to the final 5 cities.
There is a 20% chance that Bangalore will be amongst the final 5.
There is a 35% chance that Mohali will be amongst the final 5 and
 an 8% chance that both Bangalore and Mohali will be amongst the
final 5.
What is the probability that Bangalore or Mohali will be amongst the
final 5?
Solution:
Let A = the event that Bangalore is amongst the
final 5.
Let B = the event that Mohali is amongst the final 5.
Given P[A] = 0.20, P[B] = 0.35, and P[A ∩ B] = 0.08.
What is P[A ∪ B]?
P[A ∪ B] = P[A] + P[B] – P[A ∩ B] = 0.20 + 0.35 – 0.08 = 0.47
Note: “and” ≡ ∩, “or” ≡ ∪ .
 Find the probability of drawing an ace or a spade
              from a deck of cards.
There are 52 cards in a deck; 13 are spades, 4 are aces.
Probability of a single card being a spade: 13/52 = 1/4.
Probability of drawing an ace: 4/52 = 1/13.
Probability of a single card being both a spade and an ace = 1/52.
Let A = the event of drawing a spade.
Let B = the event of drawing an ace.
Given P[A] = 1/4, P[B] = 1/13, and P[A ∩ B] = 1/52,
P[A ∪ B] = 1/4 + 1/13 – 1/52 = 16/52 = 4/13.
     Rule for complements
P[Ā] = 1 – P[A]
Complement: let A be any event; then the complement of A
(denoted by Ā) is defined by:
      Ā = {e | e does not belong to A}
The event Ā occurs if the event A does not occur.
Logic: A and Ā are mutually exclusive and A ∪ Ā = S, so
P[A] + P[Ā] = 1, i.e. P[Ā] = 1 – P[A].
What Is Conditional Probability?
◻   Conditional probability is defined as the likelihood
    of an event or outcome occurring, based on the
    occurrence of a previous event or outcome.
◻   Conditional probability is calculated by multiplying
    the probability of the preceding event by the
    updated probability of the succeeding, or
    conditional, event.
◻   Bayes' theorem is a mathematical formula used
    in calculating conditional probability.
Definition
Suppose that we are interested in computing the probability of
event A and we have been told event B has occurred.
Then the conditional probability of A given B is defined to be:
    P[A|B] = P[A ∩ B] / P[B]
    Similarly,   P[B|A] = P[A ∩ B] / P[A]
•     From the previous two expressions,
    P[A ∩ B] = P[B]·P[A|B]
    and      P[A ∩ B] = P[A]·P[B|A];
    either can be used to calculate P[A ∩ B].
    The Multiplication Rule
•    In many cases, P(A) may not depend on whether B has occurred. We say
     that the event A is independent of B if P(A) = P(A|B). An important
     consequence of the definition of independence is the multiplication rule, which
     is obtained by substituting P(A) for P(A|B) in the above expressions:
• P[A ∩ B] = P[A].P[B] whenever A is independent of B
Rationale:
If we’re told that event B has occurred, then the sample space is restricted to B.
The probability within B has to be normalized; this is achieved by dividing by P[B].
The event A can now only occur if the outcome is in A ∩ B. Hence the new probability
of A is:
    P[A|B] = P[A ∩ B] / P[B]
An Example
The academy awards is soon to be shown.
For a specific married couple the probability that the
husband watches the show is 80%, the probability that
his wife watches the show is 65%, while the
probability that they both watch the show is 60%.
If the husband is watching the show, what is the
probability that his wife is also watching the show?
Solution:
Let B = the event that the husband watches the show:
P[B] = 0.80
Let A = the event that his wife watches the show:
P[A] = 0.65 and P[A ∩ B] = 0.60
P[A|B] = P[A ∩ B] / P[B] = 0.60 / 0.80 = 0.75
Another example
◻   There are 100 students in a class.
◻   40 students like Apple.
        Consider this event as A, so the probability of occurrence of A is 40/100 = 0.4.
◻   30 students like Orange.
        Consider this event as B, so the probability of occurrence of B is 30/100 = 0.3.
◻   The remaining students like neither Apple nor Orange.
◻   20 students like both Apple and Orange, so the probability of both A and B occurring
    is P(A ∩ B) = 20/100 = 0.2.
◻   What is the probability of A given B, i.e. the probability that A occurs given that B
    has occurred?
        P(A|B) = P(A ∩ B) / P(B) = 0.2/0.3 ≈ 0.67
        P(A|B) indicates A occurring within the sample space of B:
        here we are not considering the entire sample space of 100
        students, but only the 30 students in B.
Example: Calculating the conditional probability of rain given that the
barometric pressure is high.
Weather records show that high barometric pressure (defined as being over
760 mm of mercury) occurred on 160 of the 200 days in a data set, and it
rained on 20 of the 160 days with high barometric pressure. If we let R denote
the event “rain occurred” and H the event “high barometric pressure occurred”
and use the frequentist approach to define probabilities, then
     P(H) = 160/200 = 0.8
and P(R and H) = 20/200 = 0.10.
We can obtain the probability of rain given high pressure directly from the
data:
     P(R|H) = 20/160 = 0.125
Using conditional probability
      P(R|H) = P(R and H)/P(H) = 0.10/0.8 = 0.125.
 Example : In my town, it's rainy one third of the days. Given that it is rainy,
 there will be heavy traffic with probability 1/2, and given that it is not rainy,
 there will be heavy traffic with probability 1/4. If it's rainy and there is heavy
 traffic, I arrive late for work with probability 1/2. On the other hand, the
 probability of being late is reduced to 1/8 if it is not rainy and there is no
 heavy traffic. In other situations (rainy and no traffic, not rainy and traffic) the
 probability of being late is 0.25. You pick a random day.
• What is the probability that it's not raining and there is heavy traffic and I am
   not late?
• What is the probability that I am late?
• Given that I arrived late at work, what is the probability that it rained that
   day?
 Let R be the event that it's rainy, T be the event that there is heavy traffic,
 and L be the event that I am late for work. As it is seen from the problem
 statement, we are given conditional probabilities in a chain format. Thus, it is
 useful to draw a tree diagram for this problem. In this figure, each leaf in the
 tree corresponds to a single outcome in the sample space. We can calculate the
 probabilities of each outcome in the sample space by multiplying the
 probabilities on the edges of the tree that lead to the corresponding outcome
a. The probability that it's not raining and there is heavy traffic and I am not
    late can be found using the tree diagram which is in fact applying the chain
    rule:
      P(Rc∩T∩Lc) =P(Rc)P(T|Rc)P(Lc|Rc∩T)
               =2/3⋅1/4⋅3/4
               =1/8.
b. The probability that I am late can be found from the tree. All we need to do
    is sum the probabilities of the outcomes that correspond to me being late. In
    fact, we are using the law of total probability here.
      P(L) =P(R and T and L)+P(R and Tc and L) + P(Rc and T and L) + P(Rc and
  Tc and L)
            =1/12+1/24+1/24+1/16
            =11/48.
  c. We can find P(R|L) using
      P(R|L)=P(R∩L)/P(L)
      We have already found P(L)=11/48 and we can find P(R∩L) similarly by
  adding the probabilities of the outcomes that belong to R∩L.
In particular,
    P(R∩L) =P(R,T,L)+P(R,Tc,L)
         =1/12+1/24
         =1/8
Thus we obtain
     P(R|L) =P(R∩L)/P(L)
     =(1/8)/(11/48)
     =6/11.
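The same tree computation can be scripted; a sketch using exact fractions (the probabilities are the ones given in the problem statement):

    # Chain rule over the rain/traffic/late tree from the example.
    from fractions import Fraction as F

    P_R = F(1, 3)                              # P(rain)
    P_T = {True: F(1, 2), False: F(1, 4)}      # P(traffic | rain?)
    P_L = {(True, True): F(1, 2), (True, False): F(1, 4),
           (False, True): F(1, 4), (False, False): F(1, 8)}  # P(late | rain, traffic)

    p_late = F(0)
    p_rain_and_late = F(0)
    for rain in (True, False):
        for traffic in (True, False):
            # probability of one leaf = product of edge probabilities
            p = ((P_R if rain else 1 - P_R)
                 * (P_T[rain] if traffic else 1 - P_T[rain])
                 * P_L[(rain, traffic)])
            p_late += p
            if rain:
                p_rain_and_late += p

    print(p_late)                     # 11/48
    print(p_rain_and_late / p_late)   # P(rain | late) = 6/11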
Random Variables
A random variable takes a random value, which is real and can be finite
or infinite, and it is generated by a random experiment.
The random value is generated by a function.
Example: Let us consider an experiment of tossing two coins.
Then sample space is S= { HH, HT, TH, TT}
Given X as random variable with condition: number of heads.
X(HH) =2
X(HT) =1
X(TH) =1
X(TT) = 0
• The distribution function for the number of heads from two flips of a coin.
• The random variable k is defined to be the total number of heads that occur
   when a fair coin is flipped two times.
• This random variable can have only 3 values 0, 1, 2, so it is discrete.
• The sample space is (T, T), (T, H), (H, T), (H, H), and the distribution
   function is:
                     k      P(k)
                     0      1/4
                     1      2/4
                     2      1/4
•   In general, a probability distribution function takes the following form.
•   The table shows the pmf of a dataset. Areas under pmf graphs correspond to probability
•   For example:
    Pr(X = 2)
    = shaded rectangle
    = height × base
• Two types of random variables
    – Discrete random variables (countable set of possible outcomes)
    – Continuous random variable (unbroken chain of possible
       outcomes)
     A discrete random variable is described by its distribution
     function which lists for each outcome x the probability P(x) of x.
•    Discrete random variables are understood in terms of their probability mass function (pmf)
•    pmf ≡ a mathematical function that assigns probabilities to all possible outcomes for a discrete
     random variable.
Discrete random variables
◻   If the variable value is finite or infinite but
    countable, then it is called discrete random variable.
◻   Example of tossing two coins and to get the count
    of number of heads is an example for discrete
    random variable.
◻   Sample space of real values is fixed.
Example: Random variable
        Probability of customers purchasing more than 3 ice creams:
•   Consider P(X > 3).
•   Then it is the sum P(X=4) + P(X=5) + P(X=6) =
    0.04 + 0.04 + 0.02 = 0.10.
•   10 percent of the customers will purchase more than 3 ice creams.
•   Note that the pmf must satisfy the summation over i = 1 to n of P(Xi) = 1.
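A sketch of this pmf computation (the full table is in the slide’s figure; the probabilities for X = 0..3 below are made-up placeholders chosen so the pmf sums to 1):

    # P(X > 3) for the ice-cream pmf; only X = 4, 5, 6 come from the slide.
    pmf = {0: 0.20, 1: 0.35, 2: 0.20, 3: 0.15,   # assumed values
           4: 0.04, 5: 0.04, 6: 0.02}            # values quoted above

    print(sum(pmf.values()))                        # ~1.0 (pmf must sum to 1)
    print(sum(p for x, p in pmf.items() if x > 3))  # P(X > 3) = 0.10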
    Continuous Random Variable
◻   If the random variable’s values lie between two fixed numbers, then it
    is called a continuous random variable. The result can be finite or infinite.
◻   The sample space of real values is not fixed, but lies within a range.
◻   If X is the random variable and its values lie between a and b, then
         it is represented by: a <= X <= b
Examples: temperature, age, weight, height, etc. range over a specific
interval.
              Here the values in the sample space are infinite.
Probability distribution
◻   A frequency distribution is a listing of the observed
    frequencies of all the outputs of an experiment that
    actually occurred when the experiment was done,
◻   whereas a probability distribution is a listing of
    the probabilities of all possible outcomes that could
    result if the experiment were done (a distribution
    of expectations).
Broad classification of Probability
distribution
◻   Discrete probability distribution
      Binomial distribution
      Poisson distribution
◻   Continuous Probability distribution
      Normal distribution
Discrete Probability Distribution:
Binomial Distribution
◻   A binomial distribution can be thought of as
    simply the probability of a SUCCESS or FAILURE
    outcome in an experiment or survey that is
    repeated multiple times. (When we have only two
    possible outcomes)
◻   Example, a coin toss has only two possible outcomes:
    heads or tails and taking a test could have two
    possible outcomes: pass or fail.
    Assumptions of the Binomial distribution
    (It is also called the Bernoulli distribution)
◻   Assumptions:
      The random experiment is performed repeatedly with a fixed and
      finite number of trials. The number is denoted by ‘n’.
      There are two mutually exclusive possible outcomes on each trial,
      known as “success” and “failure”. Success is denoted by ‘p’
      and failure by ‘q’, and p + q = 1, i.e. q = 1 - p.
      The outcome of any given trial does not affect the outcomes of the
      subsequent trials. That means all trials are independent.
      The probability of success and failure (p & q) remains constant for all
      trials. If it does not remain constant then it is not a binomial distribution.
      Examples: tossing a coin, or drawing a red ball from a pool of
      colored balls where every time the ball taken out is replaced
      back into the pool.
      With these assumptions, let us see the formula.
Formula for the Binomial Distribution
    P(X = r) = nCr · p^r · q^(n-r)
where p is the probability of success and
      q = 1 - p is the probability of failure.
Binomial distribution, generally
Note the general pattern emerging: if you have only two possible outcomes
(call them 1/0 or yes/no or success/failure) in n independent trials, the total
number of successes X obtained in the trials is called a binomial random
variable, and the probability of exactly X “successes” is
            P(X) = nCX · p^X · (1-p)^(n-X)
where n = number of trials, X = number of successes out of n trials,
p = probability of success, and 1-p = probability of failure.
            Binomial Probability Distribution
■   A fixed number of observations (trials), n
    ■   e.g., 15 tosses of a coin; 20 patients; 1000 people surveyed
■   A binary outcome
    ■   e.g., head or tail in each toss of a coin; disease or no disease
    ■   Generally called “success” and “failure”
    ■   Probability of success is p, probability of failure is 1 – p
■   Constant probability for each observation
    ■   e.g., Probability of getting a tail is the same each time we toss the coin
◻   Consider a pen manufacturing company
◻   10% of the pens are defective
◻   (i)Find the probability that at least 2 pens are defective
    in a box of 12
◻   So n = 12,
◻   p = 10% = 10/100 = 1/10
◻   q = (1 - p) = 90/100 = 9/10
◻   X >= 2
◻   P(X >= 2) = 1 - [P(X < 2)]
◻             = 1 - [P(X=0) + P(X=1)] ≈ 0.341
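The same answer can be checked numerically (a sketch; scipy.stats is an assumption, and any binomial pmf/cdf routine works equally well):

    # Defective-pen problem: X ~ Binomial(n=12, p=0.1).
    from scipy.stats import binom

    n, p = 12, 0.1
    print(1 - (binom.pmf(0, n, p) + binom.pmf(1, n, p)))  # P(X >= 2) ~ 0.341
    print(1 - binom.cdf(1, n, p))                          # same via the CDF
    print(binom.pmf(2, n, p))                              # P(X = 2) ~ 0.230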
Binomial Distribution: Illustration with example
◻   Consider a pen manufacturing company
◻   10% of the pens are defective
◻   (i)Find the probability that exactly 2 pens are
    defective in a box of 12
◻   So n = 12,
◻   p = 10% = 10/100 = 1/10
◻   q = (1 - p) = 90/100 = 9/10
◻   X = 2
Binomial distribution: Another example
◻   If I toss a coin 20 times, what’s the
    probability of getting exactly 10
    heads?
       Binomial distribution: example
• If I toss a coin 20 times, what’s the probability of
  getting 2 or fewer heads?
The Binomial Distribution: another example
 ◻   Say 40% of the class is
     female.
 ◻   What is the probability
     that 6 of the first 10
     students walking in will
     be female?
        Continuous Probability Distributions
◻   When the random variable of interest can take any value in an interval,
    it is called continuous random variable.
        Every continuous random variable has an infinite, uncountable
        number of possible values (i.e., any value in an interval).
•  Examples: temperature on a given day, length, height, intensity of light
   falling on a given region.
◻   The length of time it takes a truck driver to go from New York City to Miami.
◻   The depth of drilling to find oil.
◻   The weight of a truck in a truck-weighing station.
◻   The amount of water in a 12-ounce bottle.
For each of these, if the variable is X, then x > 0 and x is less than some maximum
possible value, but X can take on any value within this range.
◻   A continuous random variable differs from a discrete random
    variable: discrete random variables can take on only a finite
    number of values or at most a countable infinity of values.
◻   A continuous random variable is described by Probability
    density function. This function is used to obtain the probability
    that the value of a continuous random variable is in the given
    interval.
Continuous Uniform Distribution
◻   For Uniform distribution, f(x) is constant over the
    possible value of x.
◻   Area looks like a rectangle.
◻   For the area in continuous distribution we need to
    do integration of the function.
◻   However in this case it is the area of rectangle.
◻   Example: the time taken to wash clothes in a washing machine
    (under standard conditions).
NORMAL DISTRIBUTION
◻   The most often used continuous probability
    distribution is the normal distribution; it is also known
    as Gaussian distribution.
◻   Its graph called the normal curve is the bell-shaped
    curve.
◻   Such a curve approximately describes many
    phenomena that occur in nature, industry and research.
      Physical   measurement     in    areas    such    as
      meteorological experiments, rainfall studies and
      measurement of manufacturing parts are often more
      than adequately explained with normal distribution.
 NORMAL DISTRIBUTION
 The normal (or Gaussian) distribution, is a very commonly used
 (occurring) function in the fields of probability theory, and has wide
 applications in the fields of:
- Pattern Recognition;
- Machine Learning;
- Artificial Neural Networks and Soft computing;
- Digital Signal (image, sound , video etc.) processing
- Vibrations, Graphics etc.
The probability distribution of the normal variable depends upon the two parameters 𝜇 and 𝜎
–   The parameter μ is called the mean or expectation of the
    distribution.
–   The parameter σ is the standard deviation; the variance is
    thus σ².
–   A few terms:
    •   Mode: the most frequently occurring value.
    •   Median: the middle value (if there are 9 data points, the 5th one is the median).
    •   Mean: the average of all the data points.
    •   SD (standard deviation): indicates how much the data deviate
        from the mean.
        –   A low SD indicates that the data points lie close to the mean.
        –   A high SD indicates that the data points are spread out and are not close by.
    •   The sample SD is given by  s = √( Σ (xᵢ − x̄)² / (n − 1) ),  where x̄ is the sample mean.
    •   The population SD, represented by σ, is  σ = √( Σ (xᵢ − μ)² / N )  (see the sketch below).
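A minimal sketch of the sample/population distinction with NumPy: ddof=1 divides by n − 1 (sample SD s), ddof=0 divides by N (population SD σ). The data values are illustrative:

    # Sketch: sample vs population standard deviation with NumPy.
    import numpy as np

    data = np.array([4, 5, 5, 6, 6, 6, 7, 7, 8])  # illustrative values
    print(np.std(data, ddof=1))  # sample SD s, divides by n - 1
    print(np.std(data, ddof=0))  # population SD sigma, divides by N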
• The density of the normal variable 𝑥 with mean 𝜇 and variance 𝜎² is

      f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²)),   −∞ < x < ∞

  where 𝜋 = 3.14159… and 𝑒 = 2.71828…, the Naperian constant.
The Normal distribution (mean μ, standard deviation σ):

      f(x) = (1 / (√(2π)σ)) · e^(−(x − μ)² / (2σ²))

[Figure: a plot of the normal distribution (bell-shaped curve) where each
band has a width of 1 standard deviation – see also the 68–95–99.7 rule.]
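A minimal sketch evaluating the density formula above directly, with a crude numerical check that the total area under the curve is (about) 1:

    # Sketch: the normal density from the formula above, plus a crude
    # Riemann-sum check that it integrates to approximately 1.
    import math

    def normal_pdf(x, mu, sigma):
        return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

    mu, sigma, dx = 0.0, 1.0, 0.001
    # sum the density over mu ± 6*sigma in steps of dx
    total = sum(normal_pdf(mu - 6*sigma + i*dx, mu, sigma) * dx for i in range(12001))
    print(total)  # ≈ 1.0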
Standard Normal Distribution: in the above equation, f(x) gives the
density at a particular value of x. To obtain the probability over a
range, the density must be integrated.
     For the standard normal distribution, z = (x − μ)/σ, so that Z
     has mean 0 and standard deviation 1.
• The area under the curve over a given range is then
  P(a <= X <= b) = Φ((b − μ)/σ) − Φ((a − μ)/σ),
  where Φ is the standard normal cumulative distribution function.
          Problem: Normal distribution
• Consider an electrical circuit in which the voltage is
    normally distributed with mean 120 and standard
    deviation of 3. What is the probability that the next
    reading will be between 119 and 121 volts?
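A worked sketch of this problem using the standard normal CDF, computed here via math.erf (one standard way to get Φ without extra libraries):

    # X ~ N(mu=120, sigma=3); P(119 <= X <= 121) = Phi(1/3) - Phi(-1/3),
    # where Phi is the standard normal CDF, computed via math.erf.
    import math

    def phi(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    mu, sigma = 120.0, 3.0
    p = phi((121 - mu) / sigma) - phi((119 - mu) / sigma)
    print(p)  # ≈ 0.2611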
Difference between PDF and PMF
◻   A probability mass function (PMF) applies to a discrete random
    variable and gives the probability P(X = x) of each individual value.
◻   A probability density function (PDF) applies to a continuous random
    variable; probabilities are obtained by integrating it over an interval.
        Joint Distributions and Densities
• The joint random variable (x, y) signifies that,
  simultaneously, the first feature has the value x and
  the second feature has the value y.
• If the random variables x and y are discrete, the joint
  distribution function of the joint random variable (x, y)
  is the probability P(x, y) that both x and y occur.
                   Joint distribution in continuous random variables
•   If x and y are continuous, a joint probability density function
    p(x, y) is used over the region R in which (x, y) lies.
•   It is given by:
        P((x, y) ∈ R) = ∬_R p(x, y) dx dy
•   where the integral is taken over the region R. This integral represents a
    volume in the (x, y, p)-space, as the sketch below approximates.
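A minimal sketch approximating this double integral on a grid. The density p(x, y) = x + y on [0, 1] × [0, 1] is an illustrative choice (it integrates to 1), not from the slides:

    # Sketch: approximate P((x, y) in R) = ∬_R p(x, y) dx dy on a grid.
    import numpy as np

    def joint_prob(p, x_lo, x_hi, y_lo, y_hi, n=1000):
        xs = np.linspace(x_lo, x_hi, n)
        ys = np.linspace(y_lo, y_hi, n)
        X, Y = np.meshgrid(xs, ys)
        dx = (x_hi - x_lo) / (n - 1)
        dy = (y_hi - y_lo) / (n - 1)
        return np.sum(p(X, Y)) * dx * dy  # volume under the density over R

    p = lambda x, y: x + y               # illustrative density on [0,1]^2
    print(joint_prob(p, 0.0, 0.5, 0.0, 0.5))  # P(x<=0.5, y<=0.5) ≈ 0.125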
Probability distributions can be used to describe the
population, just as we described samples.
–    Shape: symmetric, skewed, mound-shaped…
–    Outliers: unusual or unlikely measurements.
–    Center and spread: mean and standard deviation. A population mean is
     called μ and a population standard deviation is called σ.
    Let x be a discrete random variable with probability distribution p(x). Then
    the mean, variance and standard deviation of x are given by
        μ = Σ x·p(x),    σ² = Σ (x − μ)²·p(x),    σ = √σ²
            Variance, continuous
Discrete case:     σ² = Σ (x − μ)²·p(x)
Continuous case:   σ² = ∫ (x − μ)²·f(x) dx
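A minimal sketch of the discrete case, using the pmf p(1) = 3/4, p(2) = 1/4 that also appears in the worked examples later in this unit:

    # Sketch: mean, variance and SD of a discrete random variable from
    # its pmf, following the formulas above.
    xs = [1, 2]          # support
    ps = [0.75, 0.25]    # pmf values (3/4 and 1/4)
    mean = sum(x * p for x, p in zip(xs, ps))
    var = sum((x - mean)**2 * p for x, p in zip(xs, ps))
    sd = var ** 0.5
    print(mean, var, sd)  # 1.25, 0.1875, ≈ 0.433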
    Moments of Random Variables
 Moments are very useful in statistics because they tell us much about our data.
• In mathematics, the moments of a function are quantitative measures related to the
  shape of the function's graph.
• The “moments” of a random variable (or of its distribution) are expected values of
  powers or related functions of the random variable.
•   If the function represents mass, then the first moment is the center of the mass, and
    the second moment is the rotational inertia. The mathematical concept is closely
    related to the concept of moment in physics.
•   If the function is a probability distribution, then there are four commonly used
    moments in statistics
         The first moment is the expected value - measure of center of the data
         The second central moment is the variance - spread of our data about the mean
         The third standardized moment is the skewness - the shape of the distribution
         The fourth standardized moment is the kurtosis - measures the peakedness or
              flatness of the distribution.
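A minimal sketch computing all four of these moments for a sample with NumPy (scipy.stats.skew/kurtosis would also work; this version keeps everything explicit). The data values are illustrative:

    # Sketch: the four commonly used moments of a sample.
    import numpy as np

    data = np.array([5, 5, 5, 6, 6, 7, 8, 9, 10], dtype=float)
    mean = data.mean()                           # 1st moment: center
    var = ((data - mean) ** 2).mean()            # 2nd central moment: spread
    sd = var ** 0.5
    skew = ((data - mean) ** 3).mean() / sd**3   # 3rd standardized: asymmetry
    kurt = ((data - mean) ** 4).mean() / sd**4   # 4th standardized: peakedness
    print(mean, var, skew, kurt)                 # skew > 0: positively skewed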
Computing Moments for a population: the kth raw moment is E(X^k) and
the kth central moment is E[(X − μ)^k]; both are developed below.
Moment 3: To know the Skewness
[Figure: skewed distributions. In positive skewness, mean > median and
median > mode; the ordering is reversed in the case of negative skewness.]
Moment 4 : To know the Kurtosis
Normal Distribution
◻   Consider an example of x values:
◻   4, 5, 5, 6, 6, 6, 7, 7, 8
◻   Mode, median and mean will all be equal:
◻   Mode is 6
◻   Median is 6
◻   Mean is also 6
Positive Skew
◻   Consider an example of x values:
◻   5, 5, 5, 6, 6, 7, 8, 9, 10
◻   (This is an example of a positively skewed distribution.)
◻   Mode is 5
◻   Median is 6
◻   Mean is ≈ 6.8
◻   Since mean > median > mode, the data are positively skewed.
[Figure: example curves of +ve skew and -ve skew distributions.]
The “moments” of a random variable (or of its distribution)
are expected values of powers or related functions of the
random variable.
In particular, the first moment is the mean, µX = E(X).
The kth moment of X:

    μ_k = E(X^k) = Σ_x x^k·p(x)                 if X is discrete
    μ_k = E(X^k) = ∫_{−∞}^{∞} x^k·f(x) dx       if X is continuous

The mean is a measure of the “center” or “location” of a distribution.
A central moment is a moment of a probability distribution of a
random variable about the random variable's mean (expected value).
The kth central moment of X is

    E[(X − μ)^k] = Σ_x (x − μ)^k·p(x)                 if X is discrete
    E[(X − μ)^k] = ∫_{−∞}^{∞} (x − μ)^k·f(x) dx       if X is continuous
       Expected Values of Discrete Random
                   Variables
•   The variance of a discrete random variable x is
        σ² = E[(x − μ)²] = Σ (x − μ)²·p(x)
•   The standard deviation of a discrete random variable x is
        σ = √σ²
Let X be a discrete random variable having support Rx = {1, 2}
and pmf p(1) = 3/4, p(2) = 1/4 (the same example used below).
                          Using this, compute the mean (first-order moment).
The first-order moment is the mean.
Solution:  E(X) = 1·(3/4) + 2·(1/4) = 3/4 + 2/4 = 5/4
•   Example: Let X be a discrete random variable having support Rx = {1, 2, 3}
    and a given pmf.
•   The third moment is then computed as E(X³) = Σ x³·p(x), summing over the
    support.
Example: computation of the 3rd-order moment
◻   The third central moment of X can be computed as follows:
◻   Here X takes the values 1 and 2 with probabilities 3/4 and 1/4
    respectively, and the mean is 5/4. Then
        E[(X − 5/4)³] = (1 − 5/4)³·(3/4) + (2 − 5/4)³·(1/4)
                      = (−1/4)³·(3/4) + (3/4)³·(1/4)
                      = −3/256 + 27/256 = 24/256 = 3/32
Estimation of Parameters from Samples
◻   There are 3 kinds of estimates for these
    parameters:
      Method of moments estimates
      Maximum likelihood estimates
      Unbiased estimates.
Estimation of Parameters from samples: Method of moments
To estimate parameters using the method of moments, n independent samples or
patterns x1, x2, x3, …, xn are collected from the random variable x, which may
be continuous or discrete.
Randomly choose one of these samples from the data set and
let its value be a new discrete random variable x′, called the empirical random
variable, which takes on the values x1, x2, x3, …, xn,
each with probability 1/n.
Compute the sample mean and sample variance using the formulas
    x̄ = (1/n) Σ xᵢ,        s² = (1/n) Σ (xᵢ − x̄)²
The method of moments can also be used to compute covariance:
covariance is a measure of the relationship between two random variables.
The metric evaluates how much – to what extent – the variables change together.
In other words, it is essentially a measure of the variance between two variables:
    cov(x, y) = (1/n) Σ (xᵢ − x̄)(yᵢ − ȳ)
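A minimal sketch of these method-of-moments estimates: each sample is treated as equally likely (probability 1/n), so the empirical moments are plain averages. The sample values are illustrative:

    # Sketch: method-of-moments estimates from n samples.
    import numpy as np

    x = np.array([2.1, 2.9, 3.4, 4.0, 4.6])  # illustrative samples
    y = np.array([1.0, 1.8, 2.2, 3.1, 3.9])

    mean_x = x.mean()                                 # sample mean
    var_x = ((x - mean_x) ** 2).mean()                # sample variance (1/n)
    cov_xy = ((x - mean_x) * (y - y.mean())).mean()   # sample covariance (1/n)
    print(mean_x, var_x, cov_xy)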
 Maximum Likelihood Estimates:
• Maximum likelihood estimation is a method that determines
  values for the parameters of a model. The parameter values are
  found such that they maximise the likelihood that the process
  described by the model produced the data that were actually
  observed.
• To compute a maximum likelihood estimate, choose the parameter (or
  set of parameters) that maximises the joint distribution function
  or multivariate density function for the entire data set when it is
  evaluated at the sample points x1, x2, x3, …, xn.
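A minimal sketch for one concrete model choice: for a normal model, the parameters maximising the likelihood of the observed samples have the well-known closed forms below. The model choice and data values are illustrative, not from the slides:

    # Sketch: maximum likelihood estimation for a normal model.
    import numpy as np

    x = np.array([119.2, 120.5, 118.9, 121.3, 120.1])  # observed samples
    mu_hat = x.mean()                        # MLE of the mean
    sigma2_hat = ((x - mu_hat) ** 2).mean()  # MLE of the variance (1/n, not 1/(n-1))
    print(mu_hat, sigma2_hat)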
END OF UNIT 1