Lecture 04: Introduction to Probability Theory (Part II)
Ilias Bilionis
ibilion@purdue.edu
School of Mechanical Engineering
Purdue University
predictivesciencelab.org
Lecture objectives
• Quick reminder.
• Introduce the concept of a random variable.
• Introduce the basics of discrete random variables.
• Introduce the basics of continuous random variables.
• Introduce the expectation and the variance operator.
• Introduce conditional independence.
Frequency interpretation
The probability of heads = the long-run observed frequency of heads in repeated random experiments.
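A minimal sketch of this interpretation in Python (our own demo, not from the slides): simulate fair coin tosses and watch the observed frequency of heads approach 1/2 as the number of tosses grows.

import numpy as np

rng = np.random.default_rng(0)
for n in [10, 100, 10_000, 1_000_000]:
    tosses = rng.integers(0, 2, size=n)  # 1 = heads, 0 = tails
    print(n, tosses.mean())              # observed frequency of heads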
Dynamics of a coin toss: Rise of uncertainty
Talking About Probabilities
p(A|BI) = the probability of A being true given that we know that B and I are true
or (assuming I is implied)
        = the probability of A being true given that we know that B is true
or (assuming arguments about truth are implied)
        = the probability of A given B
The Rules of Probability Theory
The obvious rule (for lack of a better name):
p(A|I) + p(¬A|I) = 1
The product rule (Bayes rule, Bayes theorem):
p(AB|I) = p(A|BI)p(B|I)
That’s it!
These are enough to compute any probability we want!
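A minimal sketch (hypothetical numbers, not from the slides): encode a toy joint distribution over two binary sentences A and B and verify both rules numerically.

p_AB, p_AnB, p_nAB, p_nAnB = 0.3, 0.2, 0.1, 0.4  # p(A,B|I), p(A,¬B|I), p(¬A,B|I), p(¬A,¬B|I)

p_A = p_AB + p_AnB              # p(A|I)
p_nA = p_nAB + p_nAnB           # p(¬A|I)
p_B = p_AB + p_nAB              # p(B|I)
p_A_given_B = p_AB / p_B        # p(A|BI)
print(p_A + p_nA)               # obvious rule: p(A|I) + p(¬A|I) = 1
print(p_AB, p_A_given_B * p_B)  # product rule: p(AB|I) = p(A|BI)p(B|I)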
Other Rules of Probability
All the other rules of probability theory can be derived from these two
rules!
p(A + B|I) = p(A|I) + p(B|I) − p(AB|I)
The Sum Rule
Consider the sequence of logical sentences B_1, ..., B_n such that:
• One of them is definitely true:
p(B_1 + ··· + B_n|I) = 1
• They are mutually exclusive:
p(B_i B_j|I) = 0, if i ≠ j
The sum rule states that for any logical sentence A:
p(A|I) = Σ_i p(A B_i|I) = Σ_i p(A|B_i I)p(B_i|I)
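A minimal sketch (hypothetical numbers): compute p(A|I) by summing over a partition B_1, ..., B_4, exactly as the sum rule prescribes.

import numpy as np

p_B = np.array([0.1, 0.2, 0.3, 0.4])          # p(B_i|I), sums to 1
p_A_given_B = np.array([0.9, 0.5, 0.2, 0.7])  # p(A|B_i I)
p_A = np.sum(p_A_given_B * p_B)               # p(A|I) = Σ_i p(A|B_i I)p(B_i|I)
print(p_A)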
Intuition about the Sum Rule
p(A|I) = Σ_i p(A B_i|I) = Σ_i p(A|B_i I)p(B_i|I)
(Figure: Venn diagram of I partitioned into mutually exclusive regions B_1, B_2, B_3, B_4.)
Random Variables
Definition of a Random Variable
The math definition is based on measure theory and does not really give us physical intuition about what a random variable is.
We’ll just take:
A random variable (r.v.) X is a variable of our problem whose value is unknown to us (whether the uncertainty is aleatory or epistemic, we don’t care!)
Discrete Random Variables
… are r.v.’s that take at most countably many discrete values.
Talking About Discrete Random Variables
p(X = x|I) = the probability that the value of X is x given our current information I.
If there is no ambiguity, we may write:
p(x) ≡ p(X = x|I)
This is known as the probability density function (PDF) of X.
Properties of the Probability Density Function: Non-negativity
p(X = x|I) ≥ 0
Properties of the Probability Density Function: Normalization
Σ_x p(X = x|I) = 1
or
Σ_x p(x) = 1
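A minimal sketch (hypothetical PMF values): check non-negativity and normalization for a discrete r.v. taking the values 0, 1, 2.

import numpy as np

p = np.array([0.2, 0.5, 0.3])    # p(X = x|I) for x = 0, 1, 2
print(np.all(p >= 0))            # non-negativity: p(X = x|I) ≥ 0
print(np.isclose(p.sum(), 1.0))  # normalization: Σ_x p(x) = 1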
Properties of the Probability Density Function: Product Rule
p(X = x, Y = y|I) = p(X = x|Y = y, I)p(Y = y|I)
or
p(x, y) = p(x|y)p(y)
Properties of the Probability Density Function: Sum Rule
p(X = x|I) = Σ_y p(X = x, Y = y|I) = Σ_y p(X = x|Y = y, I)p(Y = y|I)
or
p(x) = Σ_y p(x, y) = Σ_y p(x|y)p(y)
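A minimal sketch (hypothetical joint PMF): the sum rule as summing the joint p(x, y) over y to obtain the marginal p(x).

import numpy as np

p_xy = np.array([[0.10, 0.20],   # rows: x = 0, 1; columns: y = 0, 1
                 [0.30, 0.40]])
p_x = p_xy.sum(axis=1)           # p(x) = Σ_y p(x, y)
print(p_x)                       # [0.3 0.7]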
Continuous Random Variables
… are r.v.’s that take continuous scalar values.
Continuous Random Variables
p(X = x|I) = 0 for any single value x, so we must describe a continuous X through probabilities of intervals.
Continuous Random Variables: Cumulative Distribution Function (CDF)
F(x) = p(X ≤ x|I)
Properties
F(−∞) = 0 and F(+∞) = 1
x_1 ≤ x_2 ⇒ F(x_1) ≤ F(x_2)
p(x_1 ≤ X ≤ x_2|I) = F(x_2) − F(x_1)
Continuous Random Variables: Probability Density Function
f(x) = dF(x)/dx
Properties
p(x_1 ≤ X ≤ x_2|I) = ∫_{x_1}^{x_2} f(x) dx
f(x) ≥ 0
∫_{−∞}^{+∞} f(x) dx = 1
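A minimal sketch: verify these properties numerically for a standard normal r.v. (our choice of distribution, not from the slides), using scipy for the PDF and CDF.

import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

x = np.linspace(-10.0, 10.0, 100_001)
f = norm.pdf(x)
print(np.all(f >= 0))               # f(x) ≥ 0
print(trapezoid(f, x))              # ∫ f(x) dx over the real line ≈ 1
x1, x2 = -1.0, 2.0
mask = (x >= x1) & (x <= x2)
print(trapezoid(f[mask], x[mask]))  # p(x1 ≤ X ≤ x2|I) by integration
print(norm.cdf(x2) - norm.cdf(x1))  # the same via F(x2) − F(x1)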
Continuous Random Variables: Simplifying the Notation
p(x) := f(x) = dF(x)/dx = d/dx p(X ≤ x|I)
Continuous Random Variables: The Product Rule
p(x, y) = p(x|y)p(y)
Continuous Random Variables: The Sum Rule
p(x) = ∫ p(x, y) dy = ∫ p(x|y)p(y) dy
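A minimal sketch: the continuous sum rule by numerical integration, with a joint density of our own choosing: Y ~ N(0, 1) and, given y, X ~ N(y, 1), so that the exact marginal is X ~ N(0, 2).

import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

x0 = 0.7                                       # evaluate the marginal here
y = np.linspace(-12.0, 12.0, 100_001)
integrand = norm.pdf(x0, loc=y) * norm.pdf(y)  # p(x0|y) p(y)
print(trapezoid(integrand, y))                 # p(x0) = ∫ p(x0|y)p(y) dy
print(norm.pdf(x0, scale=np.sqrt(2.0)))        # exact marginal density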
Simplifying the Notation Even Further
Consider a discrete random variable X taking values:
x_1, x_2, ...
with probabilities:
p_1, p_2, ...
We may write it as a fake continuous random variable using a PDF with a Dirac delta:
p(x) = Σ_i p_i δ(x − x_i)
Because of the properties of the Dirac delta, all the formulas are identical for both discrete and continuous random variables.
Expectations of Random Variables
Expectations of Random Variables
E[X] := E[X|I] = ∫ x p(x) dx
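A minimal sketch: compute E[X] two ways for X ~ Exponential(1) (a choice of ours, with exact mean 1): numerical integration of x p(x), and a Monte Carlo average.

import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import expon

x = np.linspace(0.0, 50.0, 200_001)
print(trapezoid(x * expon.pdf(x), x))          # E[X] = ∫ x p(x) dx ≈ 1
rng = np.random.default_rng(0)
print(rng.exponential(size=1_000_000).mean())  # Monte Carlo estimate ≈ 1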
Properties of the Expectation Operator
E[X + c] = E[X] + c
Properties of the Expectation Operator
E[cX] = cE[X]
Properties of the Expectation Operator
E[X + Y] = E[X] + E[Y]
Properties of the Expectation Operator
E[g(X)] := E[g(X)|I] = ∫ g(x) p(x) dx
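A minimal sketch: check the four properties above by Monte Carlo, with X ~ N(2, 1) and Y ~ Uniform(0, 1) (our choices; e.g. E[X²] = 5 exactly for this X).

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(2.0, 1.0, size=1_000_000)
Y = rng.uniform(0.0, 1.0, size=1_000_000)
print((X + 3.0).mean(), X.mean() + 3.0)     # E[X + c] = E[X] + c
print((5.0 * X).mean(), 5.0 * X.mean())     # E[cX] = cE[X]
print((X + Y).mean(), X.mean() + Y.mean())  # E[X + Y] = E[X] + E[Y]
print((X ** 2).mean())                      # E[g(X)] with g(x) = x², ≈ 5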
Properties of Expectations: Jensen’s Inequality
If f is convex, then:
f(E[X]) ≤ E[f(X)]
NOTE: Equality only if f is linear.
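A minimal sketch of Jensen’s inequality with the convex choice f(x) = x² and X ~ Uniform(0, 1): exactly, f(E[X]) = 1/4 ≤ E[f(X)] = 1/3.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=1_000_000)
print(X.mean() ** 2)    # f(E[X]) ≈ 0.25
print((X ** 2).mean())  # E[f(X)] ≈ 0.333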
The Variance Operator
V[X] = E[(X − E[X])²]
Properties of the Variance Operator
V[X] = E[X²] − (E[X])²
Properties of the Variance Operator
V[X + c] = V[X]
V[cX] = c²V[X]
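A minimal sketch: check the variance identities by Monte Carlo for a standard normal X (our choice, V[X] = 1).

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=1_000_000)
print((X ** 2).mean() - X.mean() ** 2)   # V[X] = E[X²] − (E[X])²
print(np.var(X + 7.0), np.var(X))        # V[X + c] = V[X]
print(np.var(3.0 * X), 9.0 * np.var(X))  # V[cX] = c²V[X]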
The Covariance Operator
C[X, Y] = E[(X − E[X])(Y − E[Y])]
Properties of the Variance Operator
V[X + c] = V[X]
V[cX] = c²V[X]
V[X + Y] = V[X] + V[Y] + 2C[X, Y]
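A minimal sketch: V[X + Y] = V[X] + V[Y] + 2C[X, Y] checked on deliberately correlated samples (Y = X + noise, our construction).

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=1_000_000)
Y = X + rng.normal(size=1_000_000)            # correlated with X
C = ((X - X.mean()) * (Y - Y.mean())).mean()  # sample covariance
print(np.var(X + Y))
print(np.var(X) + np.var(Y) + 2.0 * C)        # the same number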
Conditional Expectation
E[X|Y = y] := E[X|Y = y, I] = ∫ x p(x|y) dx
Conditional Anything
V[X|Y = y] := E[(X − E[X|Y = y, I])²|Y = y, I]
Conditional Independence
Conditional Independence of Logical Sentences
A ⊥ B|I
means that
p(A|B, I) = p(A|I)
We can show that:
A ⊥ B|I ⇔ p(AB|I) = p(A|I)p(B|I)
Conditional Independence of Logical Sentences
A ⊥ B|C, I
means that
p(A|B, C, I) = p(A|C, I)
We can show that:
A ⊥ B|C, I ⇔ p(A, B|C, I) = p(A|C, I)p(B|C, I)
Conditional Independence of Random Variables
X ⊥ Y|I
means that
p(x|y) = p(x)
We can show that:
p(x, y) = p(x)p(y)
Expectation of Product of Independent Random Variables
X ⊥ Y|I ⇒ E[XY] = E[X]E[Y]
Covariance of Independent Random Variables
X ⊥ Y|I ⇒ C[X, Y] = 0
Variance of Sum of Independent Random Variables
X ⊥ Y|I ⇒ V[X + Y] = V[X] + V[Y]
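A minimal sketch: the three consequences of independence checked by Monte Carlo with independent X, Y ~ N(0, 1) (our choice; the small residuals are sampling error).

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=1_000_000)
Y = rng.normal(size=1_000_000)                   # independent of X
print((X * Y).mean(), X.mean() * Y.mean())       # E[XY] = E[X]E[Y]
print(((X - X.mean()) * (Y - Y.mean())).mean())  # C[X, Y] ≈ 0
print(np.var(X + Y), np.var(X) + np.var(Y))      # V[X + Y] = V[X] + V[Y]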