CS 215
Data Analysis and Interpretation
Expectation
Suyash P. Awate
Expectation
• “Expectation” of the random variable;
“Expected value” of the random variable;
“Mean” of the random variable.
• “Expected value” isn’t necessarily the value that is
most likely to be observed in the
random experiment
• Can think of it as the center of mass of
the probability mass/density function
Expectation
• Definition:
Expectation of a Discrete Random Variable: E[X] := ∑_i x_i P(X = x_i)
• Frequentist interpretation of probabilities and expectation
• If a random experiment is repeated infinitely many times,
then the proportion of number of times event E occurs is the probability P(E)
• If a random experiment underlying a discrete random variable X
is repeated infinitely many times,
then the proportion of number of experiments when X takes value x is P(X=x)
• So, in N→∞ experiments, number of times X takes value xi will → N.P(X=xi)
• So, across all N→∞ experiments,
arithmetic average of observed values will
→ (1/N) ∑_i x_i (N·P(X=x_i))
= ∑_i x_i P(X=x_i) = E[X]
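• A minimal numpy sketch (not from the slides; the fair die and the seed are arbitrary choices) of the frequentist argument above: the long-run arithmetic average of observed values approaches ∑_i x_i P(X=x_i).
```python
# Frequentist view of expectation: the average of many die rolls
# approaches sum_i x_i * P(X = x_i).
import numpy as np

rng = np.random.default_rng(0)                 # arbitrary seed
values = np.arange(1, 7)                       # faces of a fair die
pmf = np.full(6, 1 / 6)                        # P(X = x_i)

expectation = np.sum(values * pmf)             # E[X] from the definition
samples = rng.integers(1, 7, size=1_000_000)   # N repeated experiments
empirical_average = samples.mean()             # (1/N) * sum of observed values

print(expectation, empirical_average)          # both close to 3.5
```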
Expectation
• Another Formulation of Expectation
• Recall:
• Discrete random variable X is a function defined on a probability space {Ω,ℬ,P}
• Function X:Ω→R, maps each element in sample space Ω to a single numerical value
belonging to the set of real numbers
[Diagram: X(·) maps s ∈ Ω to x = X(s)]
• E[X] := ∑_i x_i P(X = x_i) = ∑_{s∈Ω} X(s) P(s)
Expectation
• Example
• “Expected value” for the uniform random variable modelling a die roll
• Values on die are {1,2,3,4,5,6}
• E[X] = 3.5
• Expectation of a uniform random variable (discrete case)
• If X has a uniform distribution over the n consecutive integers in [a,b],
then E[X] = (a+b)/2
Expectation
• Example
• Expectation of a binomial random variable (when n=1, this is Bernoulli)
• E[X] = ∑_{k=0}^{n} k C(n,k) p^k (1−p)^{n−k} = np ∑_{j=0}^{m} C(m,j) p^j (1−p)^{m−j} = np,
using the substitutions j := k – 1 and m := n – 1
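• A small numeric check (an illustration, not part of the slides; n and p are arbitrary) that summing k·P(X=k) over the binomial PMF indeed gives np.
```python
# Binomial mean check: sum_k k * C(n,k) p^k (1-p)^(n-k) equals n*p.
import math
import numpy as np

n, p = 20, 0.3                                       # arbitrary parameters
k = np.arange(n + 1)
pmf = np.array([math.comb(n, int(ki)) * p**ki * (1 - p)**(n - ki) for ki in k])

print(np.sum(k * pmf), n * p)                        # both 6.0 (up to float error)
```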
Expectation
• Example
• Expectation of a Poisson random variable: E[X] = ∑_{k=0}^{∞} k e^{−λ} λ^k / k! = λ
• Consider random arrivals/hits occurring at a constant average rate λ>0,
i.e., λ arrivals/hits (typically) per unit time
• This gives meaning to parameter λ as the average number of arrivals in unit time
Expectation
• Definition:
Expectation of a Continuous Random Variable: E[X] := ∫_{−∞}^{+∞} x P(x) dx
• Frequentist interpretation of probabilities and expectation
• If a random experiment underlying a continuous random variable X
is repeated N→∞ times,
then,
for a tiny interval [x,x+Δx],
the proportion of time X takes values within interval is approximately P(x)Δx
• So, in N→∞ experiments,
number of times we will get X within [xi,xi+Δx] is approximately N.P(xi)Δx
• So, across all N→∞ experiments,
arithmetic average of all observed values is
approximately (1/N) ∑_i x_i (N·P(x_i)Δx) = ∑_i x_i P(x_i)Δx
• In the limit that Δx→0, this average→E[X]
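• A minimal sketch of the Riemann-sum intuition above, using numpy and the exponential PDF as an arbitrary example (true mean 1/λ): E[X] ≈ ∑_i x_i P(x_i)Δx on a fine grid.
```python
# Approximate E[X] = integral of x * P(x) dx by a Riemann sum on a fine grid.
import numpy as np

lam = 2.0                                  # arbitrary rate
dx = 1e-4
x = np.arange(0.0, 50.0, dx)               # [0, 50] holds essentially all the mass
pdf = lam * np.exp(-lam * x)               # P(x) for the exponential

approx_mean = np.sum(x * pdf * dx)         # sum_i x_i * P(x_i) * dx
print(approx_mean, 1 / lam)                # both close to 0.5
```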
Expectation
• Another Formulation of Expectation
[Diagram: X(·) maps s ∈ Ω to x = X(s)]
• Recall:
• Random variable X is a function defined on a probability space {Ω,ℬ,P}
• Function X:Ω→R, maps each element in sample space Ω to a single numerical value
belonging to the set of real numbers
• E[X] := ∫_{−∞}^{+∞} x P(x) dx = ∫_Ω X(s) P(s) ds
• Intuition remains the same as in the discrete case
• Using probability-mass conservation:
P(x)Δx is approximated by P(s1)Δs1 + P(s2)Δs2 + …,
where the intervals [s1,s1+Δs1], [s2,s2+Δs2], … in Ω map into [x,x+Δx]
• Thus, x·P(x)Δx is approximated by
X(s1)·P(s1)Δs1 + X(s2)·P(s2)Δs2 + …
• A more rigorous proof needs advanced results in real analysis
Expectation
• Mean as the center of mass
• By definition,
mean m := E[X] := ∫ x P(x) dx
• Thus, ∫ (x − m) P(x) dx = 0
• Mass P(x)dx
placed around location ‘x’
applies a torque ∝ P(x)dx.(x−m)
at the fulcrum placed at location ‘m’
• Because the integral ∫ (x − m) P(x) dx is zero,
the net torque around the fulcrum ‘m’ is zero
• Hence, ‘m’ is the center of mass
Expectation
• Example
• Expectation of a uniform random variable (continuous case):
if X is uniform over [a,b], then E[X] = ∫_a^b x · 1/(b−a) dx = (a+b)/2
Expectation
• Example
• Expectation of an exponential random variable
PDF: P(x) = 0, for all x < 0; P(x) = λ exp(−λx), ∀x ≥ 0
CDF: F(x) = 0, for all x < 0; F(x) = 1 − exp(−λx), ∀x ≥ 0
• Consider random arrivals/hits occurring
at a constant average rate λ > 0
• E[X] = ∫_0^∞ x λ exp(−λx) dx = 1/λ; define β := 1/λ
• This gives meaning to parameter β as the average inter-arrival time
• A larger arrival/hit rate leads to a shorter average inter-arrival time
Expectation
• Example
• Expectation of a Gaussian random variable: E[X] = ∫ x N(x; μ, σ2) dx = μ,
by symmetry of the PDF around μ
Expectation
• Example
• Expectation of a limiting case of binomial
• As n tends to infinity,
binomial
tends to a
“Gaussian” form
• Gaussian expectation μ(=np here) is
consistent with binomial expectation np
Expectation
• Linearity of Expectation
• For both discrete and continuous random variables
• For random variables X and Y having a joint probability space (Ω,ẞ,P),
the following rules hold:
• E[X + Y] = E[X] + E[Y]
• Either (discrete case) LHS = ∑_x ∑_y (x+y) P(x,y) = ∑_x x P(x) + ∑_y y P(y) = RHS
• Or (continuous case) LHS = ∫_x ∫_y (x+y) P(x,y) dx dy = ∫_x x (∫_y P(x,y) dy) dx + ∫_y y (∫_x P(x,y) dx) dy = RHS
• E[X + c] = E[X] + c, where ‘c’ is a constant
• E[a X] = a E[X], where ‘a’ is a scalar constant
• This generalizes to: E[a1X1 + … + anXn + c] = a1E[X1] + … + anE[Xn] + c
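• A simulation sketch (not from the slides; the distributions, constants, and seed are arbitrary) of linearity: E[aX + bY + c] = aE[X] + bE[Y] + c holds even when X and Y are dependent.
```python
# Linearity of expectation, checked on deliberately dependent X and Y.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(3.0, 1.0, size=1_000_000)
y = x**2 + rng.uniform(0.0, 1.0, size=x.size)     # Y depends on X on purpose
a, b, c = 2.0, -0.5, 4.0

lhs = np.mean(a * x + b * y + c)                  # E[aX + bY + c]
rhs = a * np.mean(x) + b * np.mean(y) + c         # aE[X] + bE[Y] + c
print(lhs, rhs)                                   # agree up to sampling noise
```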
Expectation
• Expectation of a “function of a random variable”
• Let us define values y := Y(x), or “Y(.) is a function of the random variable X”
[Diagram: s ↦ x := X(s) ↦ y := Y(x) := Y(X(s))]
• Discrete random variable: E[Y(X)] := E_{P(X)}[Y(X)] := ∑_{x_i} Y(x_i) P(x_i)
• Continuous random variable: E[Y(X)] := E_{P(X)}[Y(X)] := ∫_x Y(x) P(x) dx
• Property:
• Just as E_{P(S)}[X(S)] = E_{P(X)}[X], …
• … we get E_{P(X)}[Y(X)] = E_{P(Y)}[Y]
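• A minimal sketch of the discrete formula above (the fair die and Y(x) := x2 are arbitrary illustrative choices): ∑_i Y(x_i) P(x_i) matches the average of Y over simulated draws of X.
```python
# E[Y(X)] = sum_i Y(x_i) P(x_i), checked against E_{P(Y)}[Y] estimated by sampling.
import numpy as np

rng = np.random.default_rng(2)
values = np.arange(1, 7)                         # fair die
pmf = np.full(6, 1 / 6)

lotus = np.sum(values**2 * pmf)                  # sum_i Y(x_i) P(x_i), Y(x) = x^2
samples = rng.integers(1, 7, size=1_000_000)
empirical = np.mean(samples**2)                  # average of observed Y values

print(lotus, empirical)                          # both close to 91/6 ≈ 15.17
```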
Expectation
• Expectation of a function of multiple random variables
• Definition: When we have multiple random variables X1,…,Xn with
a joint PMF/PDF P(X1,…,Xn) and
a function of the multiple random variables g(X1,…,Xn),
then we define the expectation of g(X1,…,Xn) as:
E[g(X1, …, Xn)] := ∑_{x1,…,xn} g(x1, …, xn) P(X1 = x1, …, Xn = xn)   (discrete)
or
E[g(X1, …, Xn)] := ∫_{x1,…,xn} g(x1, …, xn) P(x1, …, xn) dx1 … dxn   (continuous)
• If X and Y are independent, then E[XY] = E[X] E[Y]
• Proof:
• ∑_{x,y} x y P(X=x, Y=y) = ∑_{x,y} x y P(X=x) P(Y=y) = (∑_x x P(X=x)) (∑_y y P(Y=y))
Expectation
• Tail-sum formula
• Let X be a discrete random variable taking values in set of natural numbers
• Then, E[X] = ∑_{k=1}^{∞} P(X ≥ k)
• Proof: arrange the masses in a triangular array, where row x repeats P(X=x) exactly x times:
P(x=1)
P(x=2) P(x=2)
P(x=3) P(x=3) P(x=3)
P(x=4) P(x=4) P(x=4) P(x=4)
…
Sum over rows (row number = x): ∑_x x P(X=x) = E[X]
Sum over columns (column number = k): ∑_k P(X ≥ k)
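• A small numeric check (an illustration; the fair die is an arbitrary choice) of the tail-sum formula for a discrete variable taking values 1..6.
```python
# Tail-sum formula: E[X] = sum_{k>=1} P(X >= k) for a fair die.
import numpy as np

values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)

mean_direct = np.sum(values * pmf)                              # E[X]
tail_sum = sum(pmf[values >= k].sum() for k in range(1, 7))     # sum_k P(X >= k)
print(mean_direct, tail_sum)                                    # both 3.5
```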
Expectation
• Tail-sum formula
• Let X be a continuous random variable taking non-negative values
• Notation: For random variable X, PDF is fX(.) and CDF is FX(.)
• Then, E[X] = ∫_0^∞ P(X > t) dt = ∫_0^∞ (1 − F_X(t)) dt
• Proof:
E[X] = ∫_0^∞ x f_X(x) dx = ∫_0^∞ (∫_0^x dt) f_X(x) dx = ∫_0^∞ (∫_t^∞ f_X(x) dx) dt = ∫_0^∞ (1 − F_X(t)) dt,
by swapping the order of integration over the region {(t,x): 0 ≤ t ≤ x}
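• A numeric sketch of the continuous tail-sum formula, using the exponential CDF as an arbitrary example: ∫_0^∞ (1 − F_X(t)) dt ≈ E[X] = 1/λ.
```python
# Continuous tail-sum: E[X] = integral over t >= 0 of (1 - F_X(t)) dt.
import numpy as np

lam = 2.0                                  # arbitrary rate
dt = 1e-4
t = np.arange(0.0, 50.0, dt)
survival = np.exp(-lam * t)                # 1 - F_X(t) for the exponential

print(np.sum(survival * dt), 1 / lam)      # both close to 0.5
```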
Expectation in Life
• Action without expectation → Happiness [Indian Philosophy]
Quantile, Quartile
• Definition: For a discrete/continuous random variable
with a PMF/PDF P(.), the q-th quantile
(where 0<q<1) is any real number ‘xq’
such that P(X≤xq) ≥ q and P(X≥xq) ≥ 1-q
• Quartiles: q = 0.25 (1st quartile),
q = 0.5 (2nd), q = 0.75 (3rd)
• Percentiles
• q=0.25 → 25th percentile
• Box plot,
box-and-whisker plot
• Inter-Quartile Range
(IQR)
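• A minimal numpy sketch (synthetic data; parameters and seed are arbitrary) of the quartiles and the IQR; np.quantile computes empirical quantiles of a data sample.
```python
# Quartiles and inter-quartile range of a data sample.
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(10.0, 2.0, size=10_000)        # synthetic data

q1, q2, q3 = np.quantile(data, [0.25, 0.50, 0.75])
iqr = q3 - q1
print(q1, q2, q3, iqr)                           # q2 is the median
```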
Quantile, Median
• Definition:
For a discrete/continuous random variable with a PMF/PDF P(.),
the median is any real number ‘m’
such that P(X≤m) ≥ 0.5 and P(X≥m) ≥ 0.5
• Median = second quartile
• Definition:
For a continuous random variable with a PDF P(.),
the median is any real number ‘m’
such that P(X≤m) = P(X>m)
• CDF: F_X(m) = 0.5
• A PDF can be associated with multiple medians
Mode
• For discrete X
• Mode m is a value for which the PMF value P(X=m) is maximum
• A PMF can have multiple modes
• For continuous X
• Mode ‘m’ is any local maximum of the PDF P(.)
• A PDF can have multiple modes
• Unimodal PDF = A PDF having only 1 local maximum
• Bimodal PDF:
2 local maxima
• Multimodal PDF:
2 or more
local maxima
Mean, Median, Mode
• For continuous X, for unimodal and symmetric distributions,
mode = mean = median
• Assuming symmetry
around mode,
mass on left of mode =
mass on right of mode
• So, mode = median
• Assuming symmetry
around mode,
every P(x)dx mass on left of mode
is matched by
a P(x)dx mass on right of mode
• So, mode = mean
Variance
• Definition: Var(X) := E[(X-E[X])2]
• A measure of the spread of the mass (in PMF or PDF) around the mean
• Property: Variance is always non-negative
• Property: Var(X) = E[X2] – (E[X])2
• Proof: LHS =
E[(X-E[X])2]
= E[ X2 + (E[X])2 – 2.X.E[X] ]
= E[X2] + (E[X])2 – 2(E[X])2
= E[X2] – (E[X])2 = RHS
• Definition: Standard deviation is the square root of the variance
• Units of variance = square of units of values taken by random variable
• Units of standard deviation = units of values taken by random variable
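• A quick numeric check (not from the slides; the distribution and seed are arbitrary) that the two variance formulas above agree on sampled data.
```python
# Variance two ways: E[(X - E[X])^2] versus E[X^2] - (E[X])^2.
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=1_000_000)   # true variance = scale^2 = 4

m = x.mean()
var_def = np.mean((x - m) ** 2)                  # E[(X - E[X])^2]
var_alt = np.mean(x**2) - m**2                   # E[X^2] - (E[X])^2
print(var_def, var_alt)                          # both close to 4
```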
Variance
• Variance of a Uniform Random Variable
• Discrete case
• X has uniform distribution over n integers {a, a+1, …, b}
• Here, n = b–a+1
• Variance = (n2 – 1) / 12
Variance
• Variance of a Binomial Random Variable
• Var(X) = E[X2] – (E[X])2 , where E[X] = np
Variance
• Variance of a Binomial Random Variable
• Var(X) = E[X2] – (E[X])2 , where E[X] = np
• So, E[X2]
= np (mp + 1)
= np ((n–1)p + 1)
= (np)2 + np(1-p)
• Thus, Var(X) = np(1–p) = npq
• Interpretation
• When p=0 or p=1,
then Var(X) = 0,
which is the minimum possible
• When p=q=0.5,
then Var(X) is maximized
Variance
• Variance of a Poisson Random Variable
• Var(X) = E[X2] – (E[X])2 , where E[X] = λ
Variance
• Variance of a Poisson Random Variable
• Var(X) = E[X2] – (E[X])2 , where E[X] = λ
• So, E[X2]
= λ (λ.1 + 1)
= λ2 + λ
• Thus, Var(X) = λ
• Interpretation
• Mean of Poisson random variable was also λ
• Standard deviation of Poisson random variable is λ0.5
• As mean increases, so does variance (and standard deviation)
• When the mean increases by a factor of N (i.e., an N-times larger signal = number of arrivals/hits),
the standard deviation (spread) increases only by a factor of N^0.5
• So, as N increases,
the variability in the number of arrivals/hits, relative to the average arrival/hit rate, decreases
Variance
• Variance of a Uniform Random Variable
• Continuous case
• X has uniform distribution over [a,b]
• Variance = (b – a)2 / 12
Variance
• Variance of an Exponential Random Variable
PDF: P(x) = 0, for all x < 0; P(x) = λ exp(−λx), ∀x ≥ 0
CDF: F(x) = 0, for all x < 0; F(x) = 1 − exp(−λx), ∀x ≥ 0
• Var(X) = E[X2] – (E[X])2, where E[X] = β := 1/λ
• So, Var(X) = β2, i.e., β = E[X] = SD(X); unlike the Poisson case, where SD(X) = (E[X])^0.5
Variance
• Variance of a Gaussian Random Variable
• Var(X) = E[X2] – (E[X])2, where E[X] = μ
Variance
• Variance of a Gaussian Random Variable
• Var(X) = E[X2] – (E[X])2 , where E[X] = μ
• Hint: the integrand reduces to terms of the form t·(t·exp(−t2)), which can be integrated by parts
Variance
• Example
• Variance of a limiting case of binomial
• As n tends to infinity,
binomial
tends to
Gaussian
• Gaussian variance σ2 (= npq in this case) is
consistent with binomial variance npq
Variance
• Property: Var(aX+c) = a2Var(X)
• Adding a constant to a random variable doesn’t change the variance (spread)
• This only shifts the PDF/PMF
• If Y := X + c, then Var(Y) = Var(X)
• If we scale a random variable by ‘a’, then the variance gets scaled by a2
• If Y := aX, then Var(Y) = a2Var(X)
• Proof: Var(aX+c) = E[(aX + c – aE[X] – c)2] = E[a2(X – E[X])2] = a2Var(X)
Variance
• Property: Var(X+Y) = Var(X) + Var(Y) + 2(E[XY] – E[X]E[Y])
• Proof: Var(X+Y) = E[(X+Y)2] – (E[X+Y])2
= E[X2] + E[Y2] + 2E[XY] – (E[X])2 – (E[Y])2 – 2E[X]E[Y]
= Var(X) + Var(Y) + 2(E[XY] – E[X]E[Y])
• If X and Y are independent,
then E[XY] = E[X] E[Y], and so Var(X+Y) = Var(X) + Var(Y)
• If X,Y,Z are independent, then
Var(X+Y+Z) = Var(X+Y) + Var(Z) = Var(X) + Var(Y) + Var(Z)
• For independent random variables X1, …, Xn;
Var(X1 + … + Xn) = Var(X1) + … + Var(Xn)
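• A simulation sketch (arbitrary distributions, constants, and seed) of the two properties above: Var(aX+c) = a2Var(X), and Var(X+Y) = Var(X) + Var(Y) for independent X and Y.
```python
# Variance under scaling/shifting, and additivity for independent variables.
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(1.0, 3.0, size=1_000_000)          # Var(X) ~ 9
y = rng.uniform(0.0, 6.0, size=x.size)            # independent of X, Var(Y) ~ 3
a, c = -2.0, 7.0

print(np.var(a * x + c), a**2 * np.var(x))        # both close to 36
print(np.var(x + y), np.var(x) + np.var(y))       # both close to 12
```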
Markov’s Inequality
• Theorem: Let X be a random variable with PDF P(.).
Let u(.) be a non-negative-valued function.
Let ‘c’ be a positive constant.
Then, P(u(X) ≥ c) ≤ E[u(X)] / c
• Proof:
• E[u(X)] = ∫x:u(x)≥c u(x) P(x) dx + ∫x:u(x)<c u(x) P(x) dx
• Because u(.) takes non-negative values, each integral above is non-negative
• So, E[u(X)] ≥ ∫x:u(x)≥c u(x) P(x) dx
≥ c ∫x:u(x)≥c P(x) dx
= c P(u(X) ≥ c)
• Because c>0, we get E[u(X)]/c ≥ P(u(X) ≥ c)
• Special case → when X takes non-negative values & u(x) := x,
we get P(X ≥ c) ≤ E[X] / c
Chebyshev’s Inequality
(Recall Markov’s Inequality: P(u(X) ≥ c) ≤ E[u(X)] / c)
• Theorem: Let X be a random variable with PDF P(.),
finite expectation E[X], and finite variance Var(X).
Then, P(|X-E[X]| ≥ a) ≤ Var(X) / a2
• Proof:
• Define random variable u(X) := (X-E[X])2
• Then, by Markov’s inequality, P(u(X) ≥ a2) ≤ E[u(X)] / a2
• LHS = P(|X-E[X]| ≥ a)
• RHS = Var(X) / a2
• Q.E.D.
• Corollary: If random variable X has standard deviation σ, then
P(|X-E[X]| ≥ kσ) ≤ 1/k2
• This is consistent with the notion of standard deviation (σ) or variance (σ2)
measuring the spread of the PDF around the mean (center of mass)
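• A numeric sketch (an arbitrary exponential example and seed) checking both Markov’s and Chebyshev’s bounds on simulated data: the empirical tail probabilities never exceed the bounds.
```python
# Markov: P(X >= c) <= E[X]/c for non-negative X; Chebyshev: P(|X-E[X]| >= k*sigma) <= 1/k^2.
import numpy as np

rng = np.random.default_rng(6)
x = rng.exponential(scale=1.0, size=1_000_000)    # non-negative, E[X] = 1

c = 3.0
print(np.mean(x >= c), x.mean() / c)              # empirical tail vs Markov bound

k = 2.0
dev = np.abs(x - x.mean())
print(np.mean(dev >= k * x.std()), 1 / k**2)      # empirical tail vs Chebyshev bound
```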
Chebyshev’s Inequality
Chebyshev
• Pafnuty Chebyshev
• Founding father of Russian mathematics
• Students: Lyapunov, Markov
• First person to think
systematically in terms of
random variables and their
moments and expectations
Markov
• Andrey Markov
• Russian mathematician best known for
his work on stochastic processes
• Advisor: Chebyshev
• Students: Voronoy
• One year after doctoral defense,
appointed extraordinary professor
• He figured out that he could use chains to model
the alternation of vowels and consonants
in Russian literature
Jensen’s Inequality
• Theorem: Let X be any random variable; f(.) be any convex function.
Then, E[f(X)] ≥ f(E[X])
(A real-valued function is called convex if the line segment between any two points
on the graph of the function lies above/never-below the graph between the two points.)
• Proof:
• Let m := E[X], can be anywhere on real line
• Consider a tangent (subderivative line) to f(.) at [m,f(m)]
• This line is, say, Y = aX+b,
which lies at/below (never above) f(X)
• Then, f(m) = am+b
• Then,
E[f(X)] ≥ E[aX+b]
= aE[X] + b
= f(E[X])
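• A minimal numeric illustration (the convex function exp(.) and the Gaussian X are arbitrary choices) of E[f(X)] ≥ f(E[X]).
```python
# Jensen's inequality for the convex function f(x) = exp(x).
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, size=1_000_000)

lhs = np.mean(np.exp(x))          # E[f(X)], close to exp(0.5) ~ 1.65 here
rhs = np.exp(np.mean(x))          # f(E[X]), close to exp(0) = 1
print(lhs, rhs, lhs >= rhs)
```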
Jensen’s Inequality
• Corollary: Let X be any random variable; g(.) be any concave function.
Then, E[g(X)] ≤ g(E[X])
(A real-valued function is called concave if the line segment between any two points
on the graph of the function lies below/never-above the graph between the two points.)
• Proof:
• Let m := E[X], can be anywhere on real line
• Consider a tangent (subderivative line) to g(.) at [m,g(m)]
• This line is, say, Y = aX+b,
which lies at/above (never below) g(X)
• Then, g(m) = am+b
• Then,
E[g(X)] ≤ E[aX+b]
= aE[X] + b
= g(E[X])
Jensen
• Johan Jensen
• Danish mathematician and engineer
• President of the Danish Mathematical Society
from 1892 to 1903
• Never held any academic position
• Engineer for Copenhagen Telephone Company
• Became head of its technical department
• Learned advanced math topics by himself
• All his mathematics research
was carried out in his spare time
Minimizer of Expected Absolute Deviation
• Theorem: E[|X – c|] is minimum when c = Median(X)
• Case 1: Let c ≤ m := Median(X)
• E[|X – c|] = ∫_{−∞}^{c} (c − x) P(x) dx + ∫_{c}^{∞} (x − c) P(x) dx   (say, A + B)
• A = ∫_{−∞}^{m} (c − x) P(x) dx − ∫_{c}^{m} (c − x) P(x) dx   (say, A1 – A2)
• B = ∫_{c}^{m} (x − c) P(x) dx + ∫_{m}^{∞} (x − c) P(x) dx   (say, B1 + B2)
• Now, B1 – A2 = 2 ∫_{c}^{m} (x − c) P(x) dx ≥ 0
• A1 = ∫_{−∞}^{m} (c − m) P(x) dx + ∫_{−∞}^{m} (m − x) P(x) dx   (say, A11 + A12)
• B2 = ∫_{m}^{∞} (x − m) P(x) dx + ∫_{m}^{∞} (m − c) P(x) dx   (say, B21 + B22)
• Now, A11 + B22 = –(m–c) (1–P(x≥m)) + (m–c) P(x≥m) = (m–c) (2P(x≥m)–1) ≥ 0
• Now, A12 + B21 = E[|X – m|]
• So, A+B = E[|X – m|] + (m–c) (2P(x≥m) – 1) + 2 ∫_{c}^{m} (x − c) P(x) dx
• Value of c minimizing A+B is c = m
Minimizer of Expected Absolute Deviation
• Theorem: E[|X – c|] is minimum when c = Median(X)
• Case 2: Let m := Median(X) ≤ c
• E[|X – c|] = ∫_{−∞}^{c} (c − x) P(x) dx + ∫_{c}^{∞} (x − c) P(x) dx   (say, A + B)
• A = ∫_{−∞}^{m} (c − x) P(x) dx + ∫_{m}^{c} (c − x) P(x) dx   (say, A1 + A2)
• B = − ∫_{m}^{c} (x − c) P(x) dx + ∫_{m}^{∞} (x − c) P(x) dx   (say, – B1 + B2)
• Now, A2 – B1 = 2 ∫_{m}^{c} (c − x) P(x) dx ≥ 0
• A1 = ∫_{−∞}^{m} (c − m) P(x) dx + ∫_{−∞}^{m} (m − x) P(x) dx   (say, A11 + A12)
• B2 = ∫_{m}^{∞} (x − m) P(x) dx + ∫_{m}^{∞} (m − c) P(x) dx   (say, B21 + B22)
• Now, A11 + B22 = (c–m) P(x≤m) – (c–m) (1–P(x≤m)) = (c–m) (2P(x≤m)–1) ≥ 0
• Now, A12 + B21 = E[|X – m|]
• So, A+B = E[|X – m|] + (c–m) (2P(x≤m) – 1) + 2 ∫_{m}^{c} (c − x) P(x) dx
• Value of c minimizing A+B is c = m
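• A simulation sketch (a skewed exponential example; the grid and seed are arbitrary) of the theorem: over a grid of candidate values c, the empirical E[|X − c|] is smallest near the median.
```python
# Sweep c and find the minimizer of the mean absolute deviation E[|X - c|].
import numpy as np

rng = np.random.default_rng(8)
x = rng.exponential(scale=1.0, size=200_000)     # skewed; median is ln 2 ~ 0.693

cs = np.linspace(0.0, 3.0, 301)
mad = np.array([np.mean(np.abs(x - c)) for c in cs])
best_c = cs[np.argmin(mad)]

print(best_c, np.median(x), np.log(2))           # all close to 0.693
```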
Mean, Median, Standard Deviation
• Theorem:
Mean(X) and Median(X) are within a distance of SD(X) of each other
• Proof:
• Distance between mean and median
= |E[X] – Median(X)|
= |E[X – Median(X)]|
This is |E[.]|, where |.| is a convex function. Apply Jensen’s inequality.
≤ E[|X – Median(X)|]
≤ E[|X – E[X]|] (because Median(X) minimizes expected absolute deviation)
= E[Sqrt{ (X – E[X])2 }]
This is E[Sqrt(.)], where Sqrt(.) is a concave function. Apply Jensen’s inequality.
≤ Sqrt{ E[ (X – E[X])2 ] }
= Sqrt{ Var(X) } = SD(X)
Law of Large Numbers
• This justifies why the expectation is motivated as an average over a
large number of random experiments (“long-term average”)
• Let random variables X1, …, Xi, …, Xn be ‘n’ independent and identically
distributed (i.i.d.), each with mean μ=E[Xi] and finite variance v=Var(Xi)
• Let the average, over ‘n’ experiments, be modeled by
a random variable X̄ := (X1 + … + Xn) / n
• Then, the expected average E[X̄] = μ, by the linearity of expectation
• But, in specific runs, how close is X̄ to the expectation μ ?
• So, we analyze the spread of X̄ around μ
• Var(X̄) = Var(X1/n) + … + Var(Xn/n) = n(v/n2) = v/n
Law of Large Numbers
• This justifies why the expectation is motivated as an average over a
large number of random experiments
• Law of large numbers: For all ε > 0, as n→∞, P(|X̄ – μ| ≥ ε) → 0
• Proof: Using Chebyshev’s inequality,
P(|X̄ – μ| ≥ ε)
≤ Var(X̄) / ε2
= v / (nε2)
→0, as n→∞
• Thus, as the average X̄ uses data from a larger number of experiments ‘n’,
the event of “X̄ being farther from μ than ε” has a probability that tends to 0
Law of Large Numbers
• Example
• This also gives us a way to
compute an “estimate” of
the expectation μ of a
random variable X
from “observations”/data
• What is the estimate ?
• X̄
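• A minimal sketch of the law of large numbers (uniform [0,1] samples, so μ = 0.5; the run counts, ε, and seed are arbitrary): the probability that X̄ deviates from μ by more than ε shrinks as n grows.
```python
# Empirical P(|Xbar - mu| >= eps) for increasing sample sizes n.
import numpy as np

rng = np.random.default_rng(9)
mu, eps = 0.5, 0.05
for n in [10, 100, 1_000, 10_000]:
    xbar = rng.uniform(0.0, 1.0, size=(2_000, n)).mean(axis=1)   # 2000 runs of size n
    print(n, np.mean(np.abs(xbar - mu) >= eps))                  # deviation probability
```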
Law of Large Numbers
www.nature.com/articles/nmeth.2613
Covariance
• For random variables X and Y, consider the joint PMF/PDF P(X,Y)
• Covariance: A measure of how the values taken by X and Y vary
together (“co”-“vary”)
• Definition: Cov(X,Y) := E[(X – E[X])(Y – E[Y])]
• Interpretation:
• Define U(X) := X – E[X] and V(Y) := Y – E[Y] (Note: U and V have expectation 0)
• In the joint distribution P(U,V),
if larger (more +ve) values of U typically correspond to larger values of V, and
smaller (more –ve) values of U typically correspond to smaller values of V,
then U and V co-vary positively
• In the joint distribution P(U,V),
if larger values of U typically correspond to smaller values of V, and …
then U and V co-vary negatively
• Property: Symmetry: Cov(X,Y) = Cov(Y,X)
Covariance
• Examples
Covariance
• Property: Cov(X,Y) = E[XY] – E[X]E[Y]
• Proof:
• Cov(X,Y) = E[(X – E[X])(Y – E[Y])] = E[XY] – E[X]E[Y] – E[X]E[Y] + E[X]E[Y] = E[XY] – E[X]E[Y]
• So, Var(X+Y) = Var(X) + Var(Y) + 2(E[XY] – E[X]E[Y]) = Var(X) + Var(Y) + 2Cov(X,Y)
• Also, when X and Y are independent, then Cov(X,Y) = 0
• Property: When Var(X) and Var(Y) are finite, and one of them is 0,
then Cov(X,Y)=0
• Property: When Y := mX + c (with finite m), what is Cov(X,Y) ?
• Cov(X,Y) = E[XY] – E[X]E[Y]
= E[mX2 + cX] – E[X](m.E[X] + c)
= m.E[X2] – m(E[X])2 = m.Var(X)
• When Var(X)>0, covariance is ∝ line-slope ‘m’, and has same sign as that of m
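• A numeric check (arbitrary distribution, slope, intercept, and seed) of Cov(X,Y) = E[XY] − E[X]E[Y], and of Cov(X, mX + c) = m·Var(X) for a linearly related Y.
```python
# Covariance two ways, and covariance with a linear function of X.
import numpy as np

rng = np.random.default_rng(10)
x = rng.normal(2.0, 1.5, size=1_000_000)              # Var(X) ~ 2.25
m, c = -3.0, 5.0
y = m * x + c

cov_def = np.mean((x - x.mean()) * (y - y.mean()))    # E[(X-E[X])(Y-E[Y])]
cov_alt = np.mean(x * y) - x.mean() * y.mean()        # E[XY] - E[X]E[Y]
print(cov_def, cov_alt, m * np.var(x))                # all close to -6.75
```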
Covariance
• Bilinearity of Covariance
• Let X, X1, X2, Y, Y1, Y2 be random variables. Let c be a scalar constant.
• Property: Cov(X1 + X2, Y) = Cov(X1, Y) + Cov(X2, Y) = Cov(Y, X1 + X2)
• Proof (first part; second part follows from symmetry):
Cov(X1 + X2, Y) = E[(X1 + X2)Y] – E[X1 + X2]E[Y]
= (E[X1Y] – E[X1]E[Y]) + (E[X2Y] – E[X2]E[Y]) = Cov(X1, Y) + Cov(X2, Y)
• Property: Cov(aX, Y) = a.Cov(X, Y) = Cov(X, aY)
• Proof (first part):
• Cov(aX, Y)
= E[ aXY ] − E[ aX ]E[ Y ]
= a (E[ XY ] − E[ X ]E[ Y ])
= a Cov(X,Y)
Standardized Random Variable
• Definition:
If X is a random variable, then its standardized form is given by
X* := (X – E[X]) / SD(X), where SD(.) gives the standard deviation
• Property: E[X*] = 0, Var(X*) = 1
• Proof: E[X*] = (E[X] – E[X]) / SD(X) = 0;
Var(X*) = Var(X – E[X]) / (SD(X))2 = Var(X) / Var(X) = 1
• X* is unit-less
• X* is obtained by:
• First shifting/translating X to make mean 0, and
• Then scaling the shifted variable to make variance 1
Correlation
• For covariance, the magnitude isn’t easy to interpret (unlike its sign)
• Correlation: A measure of how the values taken by X and Y vary
together (“co”-“relate”) obtained by rescaling covariance
• Pearson’s correlation coefficient
• Assuming X and Y are linearly related, correlation magnitude shows the
strength of the (functional/deterministic) relationship between X and Y
• Let ‘SD’ = standard deviation
• Definition: Cor(X,Y) := Cov(X,Y) / (SD(X) SD(Y))
• Thus, Cor(X,Y) = E[X*Y*], where X* and Y* are the standardized variables
= E[X*Y*] – E[X*]E[Y*]
= Cov(X*,Y*)
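• A minimal sketch (the 0.6 correlation is constructed by design; the seed is arbitrary) of Cor(X,Y) = E[X*Y*]: standardize both variables, average their product, and compare with np.corrcoef.
```python
# Pearson correlation as the expectation of the product of standardized variables.
import numpy as np

rng = np.random.default_rng(11)
x = rng.normal(0.0, 1.0, size=1_000_000)
y = 0.6 * x + 0.8 * rng.normal(0.0, 1.0, size=x.size)    # Cor(X,Y) = 0.6 by design

xs = (x - x.mean()) / x.std()                            # X*
ys = (y - y.mean()) / y.std()                            # Y*
print(np.mean(xs * ys), np.corrcoef(x, y)[0, 1])         # both close to 0.6
```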
Correlation
• Property: -1 ≤ Cor(X,Y) ≤ 1
• Proof:
• First inequality
• 0 ≤ E[(X*+Y*)2]
= E[(X*)2] + E[(Y*)2] + 2E[X*Y*]
= 2(1 + Cor(X,Y))
• So, –1 ≤ Cor(X,Y)
• Second inequality
• 0 ≤ E[(X*–Y*)2]
= E[(X*)2] + E[(Y*)2] – 2E[X*Y*]
= 2(1 – Cor(X,Y))
• So, Cor(X,Y) ≤ 1
Correlation
• Property: If X and Y are linearly related, i.e., Y = mX + c,
and are non-constant (i.e., SD(X)>0 and SD(Y)>0),
then |Cor(X,Y)| = 1
• Proof:
• When Y = mX + c, then SD(Y) = |m| SD(X)
• Cor(X,Y)
= Cov(X,Y) / (SD(X) SD(Y))
= mVar(X) / (SD(X) |m|SD(X))
= ±1
= sign of the slope m
Correlation
• Property: If |Cor(X,Y)| = 1, then X and Y are linearly related
• Proof:
• If Cor(X,Y) = 1, then E[(X*–Y*)2] = 2(1 – Cor(X,Y)) = 0
• For discrete X,Y: this must imply X*=Y* for all (x’,y’) where P(X=x’,Y=y’) > 0
• Else the summation underlying the expectation cannot be zero
• For continuous X,Y: this must imply X*=Y* for all measures (dx’,dy’) where P(dx’,dy’) > 0
• X* and Y* can be unequal only on a countable set of isolated points where P(dx’,dy’) > 0
• Else the integral underlying the expectation cannot be zero
• If Cor(X,Y) = (–1), then E[(X*+Y*)2] = 2(1 + Cor(X,Y)) = 0
• For discrete X,Y: this must imply X*=(–Y*) for all (x’,y’) where P(X=x’,Y=y’) > 0
• For continuous X,Y: this must imply X*=(–Y*) for all measures (dx’,dy’) where P(dx’,dy’) > 0
• Inequality can hold only on a countable set of isolated points where P(dx’,dy’) > 0
• If X* = ±Y*, then Y must be of the form mX+c
Correlation
• If |Cor(X,Y)|=1 (or Y=mX+c), then
how to find the equation of the line from data {(xi,yi): i=1,…,n}?
• By the way: line must pass through (E[X],E[Y])
• Because, when X=E[X], value of Y must be mE[X]+c, but that also equals E[Y]
• We proved that: if Y=mX+c, then |Cor(X,Y)|=1 and Y* = ±X* = Cor(X,Y) X*
• So, (Y – E[Y]) / SD(Y) = Cor(X,Y) (X – E[X]) / SD(X)
• So, Y = E[Y] + SD(Y) Cor(X,Y) (X – E[X]) / SD(X)
• So, Y = E[Y] + Cov(X,Y) (X – E[X]) / Var(X)
• This gives the equation of the line with:
• Slope m := Cov(X,Y) / Var(X)
• Intercept c := E[Y] – Cov(X,Y) E[X] / Var(X)
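• A small sketch of the slope/intercept formulas above, estimated from data {(xi,yi)}; the data here is synthetic and exactly linear (an assumption made only for illustration).
```python
# Line through the data: slope = Cov(X,Y)/Var(X), intercept = E[Y] - slope*E[X].
import numpy as np

rng = np.random.default_rng(12)
x = rng.uniform(0.0, 10.0, size=100_000)
y = 2.5 * x - 4.0                                  # exactly linear, so |Cor| = 1

slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)     # Cov(X,Y) / Var(X)
intercept = y.mean() - slope * x.mean()
print(slope, intercept)                            # close to 2.5 and -4.0
```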
Correlation
• Examples
Correlation
• Four sets of data with the same correlation of 0.816
• Blue line indicates the line passing through (E[X],E[Y]) with slope = 0.816
(more on this when we study estimation)
• So, correlation = 0.816
doesn’t always mean that data
lies along a line of slope 0.816
• This indicates the likely
misinterpretation of correlation
when variables underlying data
aren’t linearly related
Correlation
• Zero correlation doesn’t imply independence
• We showed that independence implies zero covariance/correlation,
but the converse isn’t always true
• Example: Let X be uniformly distributed within [-1,+1]. Let Y := X2.
• Cov(X,X2) = E[X.X2] – E[X]E[X2] = E[X3] – 0.E[X2] = 0
• Thus, Cov(X,Y) = 0 = Cor(X,Y) even though Y is a deterministic function of X
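• A numeric version of the example above (the seed is arbitrary): X uniform on [−1,+1] and Y = X2 are clearly dependent, yet their sample correlation is essentially zero.
```python
# Zero correlation despite a deterministic (nonlinear) relationship.
import numpy as np

rng = np.random.default_rng(13)
x = rng.uniform(-1.0, 1.0, size=1_000_000)
y = x**2

print(np.corrcoef(x, y)[0, 1])     # close to 0 even though Y is a function of X
```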
Correlation
• Non-zero correlation doesn’t imply causation
• https://hbr.org/2015/06/beware-spurious-correlations
• https://science.sciencemag.org/content/348/6238/980.2
• http://www.tylervigen.com/spurious-correlations
Correlation
• Non-zero correlation
doesn’t imply causation