Understanding Conditional Expectation via Vector Projection
Cheng-Shang Chang
Department of Electrical Engineering
National Tsing Hua University
Hsinchu, Taiwan, R.O.C.
Jan. 14, 2008

Motivation and References

- Many students are confused by conditional expectation.
- In this talk, we explain how conditional expectation (taught in probability) is related to linear transformation and vector projection (taught in linear algebra).
- References:
  - S. J. Leon. Linear Algebra with Applications. New Jersey: Prentice Hall, 1998.
  - S. Ghahramani. Fundamentals of Probability. Pearson Prentice Hall, 2005.

Conditional Expectation

- Consider two discrete random variables X and Y.
- Let p(x, y) = P(X = x, Y = y) be the joint probability mass function.
- Then the marginal distribution of X is
      p_X(x) = P(X = x) = \sum_{y ∈ B} p(x, y),
  where B is the set of possible values of Y.
- Similarly,
      p_Y(y) = P(Y = y) = \sum_{x ∈ A} p(x, y),
  where A is the set of possible values of X.
- Then the conditional probability mass function of X given Y = y is
      p_{X|Y}(x|y) = P(X = x | Y = y) = \frac{p(x, y)}{p_Y(y)}.
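- To make the definitions concrete, the following Python sketch computes the marginals and the conditional pmf from a joint pmf table. The table itself is a made-up example, not one taken from these slides.

```python
# A minimal sketch; the joint pmf below is a hypothetical example.

# Joint pmf p(x, y) = P(X = x, Y = y) on A = {0, 1, 2}, B = {0, 1}.
p = {(0, 0): 0.10, (0, 1): 0.20,
     (1, 0): 0.25, (1, 1): 0.15,
     (2, 0): 0.05, (2, 1): 0.25}

A = sorted({x for (x, y) in p})   # possible values of X
B = sorted({y for (x, y) in p})   # possible values of Y

# Marginal distributions.
p_X = {x: sum(p[(x, y)] for y in B) for x in A}
p_Y = {y: sum(p[(x, y)] for x in A) for y in B}
print("p_X =", p_X)

# Conditional pmf of X given Y = y: p_{X|Y}(x|y) = p(x, y) / p_Y(y).
p_X_given_Y = {(x, y): p[(x, y)] / p_Y[y] for (x, y) in p}

for y in B:
    total = sum(p_X_given_Y[(x, y)] for x in A)
    print(f"y = {y}: p_X|Y(.|y) = "
          f"{[round(p_X_given_Y[(x, y)], 3) for x in A]}, sums to {total:.3f}")
```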

Conditional Expectation

- The conditional expectation of X given Y = y is defined as
      E[X|Y = y] = \sum_{x ∈ A} x p_{X|Y}(x|y).   (1)
- Consider a real-valued function h from ℝ to ℝ.
- From the law of the unconscious statistician, the conditional expectation of h(X) given Y = y is
      E[h(X)|Y = y] = \sum_{x ∈ A} h(x) p_{X|Y}(x|y).
- The conditional expectation of X given Y, denoted by E[X|Y], is the function of Y that is defined to be E[X|Y = y] when Y = y.
- Specifically, let δ(x) be the function with δ(0) = 1 and δ(x) = 0 for all x ≠ 0.
- Also, let δ_y(Y) = δ(Y - y) be the indicator random variable such that δ_y(Y) = 1 if the event {Y = y} occurs and δ_y(Y) = 0 otherwise.
- Then
      E[X|Y] = \sum_{y ∈ B} E[X|Y = y] δ_y(Y) = \sum_{y ∈ B} \sum_{x ∈ A} x p_{X|Y}(x|y) δ_y(Y).   (2)
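- A short Python sketch of (1) and (2), reusing the same hypothetical joint pmf as above: it tabulates E[X|Y = y] for each y, and E[X|Y] is then the function of Y built from that table.

```python
# A minimal sketch; the joint pmf is the same hypothetical example as above.
p = {(0, 0): 0.10, (0, 1): 0.20,
     (1, 0): 0.25, (1, 1): 0.15,
     (2, 0): 0.05, (2, 1): 0.25}
A = sorted({x for (x, y) in p})
B = sorted({y for (x, y) in p})
p_Y = {y: sum(p[(x, y)] for x in A) for y in B}

def cond_exp(h):
    """Return {y: E[h(X) | Y = y]} following (1) and the LOTUS form."""
    return {y: sum(h(x) * p[(x, y)] / p_Y[y] for x in A) for y in B}

# E[X | Y = y] for each y in B (take h = identity).
table = cond_exp(lambda x: x)
print(table)

# E[X | Y] is the random variable that takes the value table[y] on the event
# {Y = y}, as in equation (2); here we simply evaluate it for each y.
print([table[y] for y in B])
```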

Properties of Conditional Expectation

- The expectation of the conditional expectation of X given Y is the same as the expectation of X, i.e.,
      E[X] = E[E[X|Y]].   (3)
- Let h be a real-valued function from ℝ to ℝ. Then
      E[h(Y)X|Y] = h(Y)E[X|Y].   (4)
  As E[X|Y] is a function of Y,
      E[E[X|Y]|Y] = E[X|Y]E[1|Y] = E[X|Y].
- This then implies
      E[X - E[X|Y] | Y] = 0.   (5)
- Using (3) and (5) yields
      E[h(Y)(X - E[X|Y])] = E[E[h(Y)(X - E[X|Y]) | Y]]
                          = E[h(Y) E[X - E[X|Y] | Y]] = 0.   (6)
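- A quick numerical check of (3) and (6) by exact enumeration over a hypothetical joint pmf:

```python
# A small numerical check of (3) and (6) on a hypothetical joint pmf.
p = {(0, 0): 0.10, (0, 1): 0.20,
     (1, 0): 0.25, (1, 1): 0.15,
     (2, 0): 0.05, (2, 1): 0.25}
A = sorted({x for (x, y) in p})
B = sorted({y for (x, y) in p})
p_Y = {y: sum(p[(x, y)] for x in A) for y in B}
E_X_given_Y = {y: sum(x * p[(x, y)] / p_Y[y] for x in A) for y in B}

# (3): E[X] = E[E[X|Y]].
E_X = sum(x * pxy for (x, y), pxy in p.items())
E_E_X_given_Y = sum(E_X_given_Y[y] * p_Y[y] for y in B)
print(abs(E_X - E_E_X_given_Y) < 1e-12)

# (6): E[h(Y)(X - E[X|Y])] = 0 for an arbitrary h, e.g. h(y) = 3y + 1.
h = lambda y: 3 * y + 1
lhs = sum(h(y) * (x - E_X_given_Y[y]) * pxy for (x, y), pxy in p.items())
print(abs(lhs) < 1e-12)
```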

Properties of Conditional Expectation

- Let f be a real-valued function from ℝ to ℝ. Then
      E[(X - f(Y))^2] = E[((X - E[X|Y]) + (E[X|Y] - f(Y)))^2]
                      = E[(X - E[X|Y])^2] + 2E[(X - E[X|Y])(E[X|Y] - f(Y))] + E[(E[X|Y] - f(Y))^2]
                      = E[(X - E[X|Y])^2] + E[(E[X|Y] - f(Y))^2],
  where the cross term is 0 from (6) (take h(Y) = E[X|Y] - f(Y)).
- The conditional expectation of X given Y is the function of Y that minimizes E[(X - f(Y))^2] over the set of functions of Y, i.e.,
      E[(X - E[X|Y])^2] ≤ E[(X - f(Y))^2],   (7)
  for any function f.

Vector Space

- Let V be a set on which the operations of vector addition and scalar multiplication are defined.
- Axioms:
  - (Commutative law) u + v = v + u for all u and v in V.
  - (Associative law (i)) (u + v) + w = u + (v + w) for all u, v, w in V.
  - (Zero element) There exists an element 0 such that u + 0 = u for any u ∈ V.
  - (Inverse) For any u ∈ V, there exists an element -u ∈ V such that u + (-u) = 0.
  - (Distributive law (i)) α(u + v) = αu + αv for any scalar α and u, v ∈ V.
  - (Distributive law (ii)) (α + β)u = αu + βu for any scalars α and β and any u ∈ V.
  - (Associative law (ii)) (αβ)u = α(βu) for any scalars α and β and any u ∈ V.
  - (Identity) 1 · u = u for any u ∈ V.

Vector Space

- Closure properties:
  - If u ∈ V and α is a scalar, then αu ∈ V.
  - If u, v ∈ V, then u + v ∈ V.
- Additional properties from the axioms and the closure properties:
  - 0 · u = 0.
  - u + v = 0 implies that v = -u.
  - (-1) · u = -u.
- Example: the vector space C[a, b]
  - Let C[a, b] be the set of real-valued functions that are defined and continuous on the closed interval [a, b].
  - Vector addition: (f + g)(x) = f(x) + g(x).
  - Scalar multiplication: (αf)(x) = αf(x).

Subspace

- (Subspace) If S is a nonempty subset of a vector space V, and S satisfies the closure properties, then S is called a subspace of V.
- (Linear combination) Let v_1, v_2, ..., v_n be vectors in a vector space V. A sum of the form α_1 v_1 + α_2 v_2 + ... + α_n v_n is called a linear combination of v_1, v_2, ..., v_n.
- (Span) The set of all linear combinations of v_1, v_2, ..., v_n is called the span of v_1, v_2, ..., v_n (denoted by Span(v_1, v_2, ..., v_n)).
- (Spanning set) The set {v_1, v_2, ..., v_n} is a spanning set for V if and only if every vector in V can be written as a linear combination of v_1, v_2, ..., v_n, i.e.,
      V = Span(v_1, v_2, ..., v_n).
- (Linearly independent) The vectors v_1, v_2, ..., v_n in a vector space V are said to be linearly independent if
      c_1 v_1 + c_2 v_2 + ... + c_n v_n = 0
  implies that all of the scalars c_1, ..., c_n must be 0.

Basis and Dimension

- (Basis) The vectors v_1, v_2, ..., v_n form a basis for a vector space V if and only if
  - v_1, v_2, ..., v_n are linearly independent, and
  - v_1, v_2, ..., v_n span V.
- (Dimension) If a vector space V has a basis consisting of n vectors, we say that V has dimension n.
  - Finite-dimensional vector space: there is a finite set of vectors that spans the vector space.
  - Infinite-dimensional vector space: for example, C[a, b].
- Theorem: Suppose that V is a vector space of dimension n > 0.
  - Any set of n linearly independent vectors spans V.
  - Any n vectors that span V are linearly independent.
  - No set of fewer than n vectors can span V.

Coordinates

- Let E = {v_1, v_2, ..., v_n} be an ordered basis for a vector space V.
- Any vector v ∈ V can be written uniquely in the form
      v = c_1 v_1 + c_2 v_2 + ... + c_n v_n.
- The vector c = (c_1, c_2, ..., c_n)^T in ℝ^n is called the coordinate vector of v with respect to the ordered basis E (denoted by [v]_E).
- The c_i's are called the coordinates of v relative to E.
- A vector space with dimension n is isomorphic to ℝ^n once an ordered basis is chosen.

Random Variables on the Same Probability Space

- A probability space is a triplet (S, ℱ, P), where S is the sample space, ℱ is the set of (measurable) events, and P is the probability measure.
- A random variable X on a probability space (S, ℱ, P) is a mapping X : S → ℝ.
- The set of all random variables on the same probability space forms a vector space, with each random variable being a vector.
  - Vector addition: (X + Y)(s) = X(s) + Y(s) for every sample point s in the sample space S.
  - Scalar multiplication: (αX)(s) = αX(s) for every sample point s in the sample space S.
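- A minimal illustration of this viewpoint: on a finite sample space, every random variable is just a vector of its values at the sample points. The die example below is hypothetical.

```python
import numpy as np

# A minimal sketch: on a finite sample space S = {1, ..., 6} (the outcomes of
# a fair die), every random variable is just a vector in R^6.
S = np.arange(1, 7)                 # sample points (die faces)
P = np.full(6, 1 / 6)               # probability of each sample point

X = S.astype(float)                 # X(s) = face value
Y = (S % 2 == 0).astype(float)      # Y(s) = 1 if the face is even

# Vector addition and scalar multiplication are the usual pointwise operations.
Z = 2.0 * X + Y                     # another random variable on the same space
print(Z)                            # its values at the six sample points

# Expectation is a probability-weighted sum of the vector's entries.
print("E[X] =", np.dot(P, X), " E[Y] =", np.dot(P, Y))
```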

The Set of Functions of a Discrete Random Variable

- Suppose that X is a discrete random variable with the set of possible values A = {x_1, x_2, ..., x_n}.
- Let δ_{x_i}(X) = δ(X - x_i) be the indicator random variable with δ_{x_i}(X) = 1 if the event {X = x_i} occurs and 0 otherwise.
- Let σ(X) = Span(δ_{x_1}(X), δ_{x_2}(X), ..., δ_{x_n}(X)).
- δ_{x_1}(X), δ_{x_2}(X), ..., δ_{x_n}(X) are linearly independent. To see this, suppose s_i is a sample point such that X(s_i) = x_i. Then
      (c_1 δ_{x_1}(X) + c_2 δ_{x_2}(X) + ... + c_n δ_{x_n}(X))(s_i) = 0(s_i) = 0
  implies that c_i = 0.
- {δ_{x_1}(X), δ_{x_2}(X), ..., δ_{x_n}(X)} is a basis of σ(X).
- σ(X) is a vector space with dimension n.
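- The following sketch builds the indicator vectors δ_{x_i}(X) for a hypothetical X on a finite sample space, checks that they are linearly independent, and verifies that a function g(X) lies in their span.

```python
import numpy as np

# A minimal sketch with a hypothetical discrete X on a finite sample space:
# S = {0, ..., 5} with equal probabilities, and X takes values in A = {0, 1, 2}.
S = np.arange(6)
X = np.array([0, 1, 1, 2, 2, 2], dtype=float)    # values X(s) at each sample point
A = np.unique(X)                                  # possible values x_1, ..., x_n

# Indicator random variables delta_{x_i}(X), one vector per possible value.
deltas = np.stack([(X == x).astype(float) for x in A])   # shape (n, |S|)

# They are linearly independent: the matrix of their values has full row rank.
print("rank =", np.linalg.matrix_rank(deltas), " n =", len(A))

# Any function g(X) is in their span: g(X) = sum_i g(x_i) * delta_{x_i}(X).
g = lambda x: x ** 2 + 1
gX_direct = g(X)
gX_span = np.array([g(x) for x in A]) @ deltas
print(np.allclose(gX_direct, gX_span))
```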

The Set of Functions of a Discrete Random Variable

- σ(X) is the set of (measurable) functions of the random variable X.
  - For any real-valued function g from ℝ to ℝ, g(X) is a vector in σ(X), as
        g(X) = \sum_{i=1}^n g(x_i) δ_{x_i}(X).
  - For any vector v in σ(X), there is a real-valued function g from ℝ to ℝ such that v = g(X). To see this, suppose that
        v = \sum_{i=1}^n c_i δ_{x_i}(X).
    We simply find a function g such that g(x_i) = c_i for all i.
- The vector (g(x_1), g(x_2), ..., g(x_n))^T ∈ ℝ^n is the coordinate vector of g(X) with respect to the ordered basis {δ_{x_1}(X), δ_{x_2}(X), ..., δ_{x_n}(X)}.
- In probability theory, σ(X) is often called the σ-algebra generated by the random variable X, and a random variable Y is called σ(X)-measurable if there is a (measurable) function g such that Y = g(X).

Linear Transformation

- A mapping L from a vector space V into a vector space W is said to be a linear transformation if
      L(αv_1 + βv_2) = αL(v_1) + βL(v_2)
  for all v_1, v_2 ∈ V and for all scalars α, β.
- (Matrix representation theorem) If E = [v_1, v_2, ..., v_n] and F = [w_1, w_2, ..., w_m] are ordered bases for vector spaces V and W, respectively, then corresponding to each linear transformation L : V → W there is an m×n matrix A such that
      [L(v)]_F = A[v]_E   for each v ∈ V.
- The matrix A is called the matrix representing the linear transformation L relative to the ordered bases E and F.
- The j-th column of the matrix A is simply the coordinate vector of L(v_j) with respect to the ordered basis F, i.e.,
      a_j = [L(v_j)]_F.

Conditional Expectation As a Linear Transformation

- Suppose that X is a discrete random variable with the set of possible values A = {x_1, x_2, ..., x_n}.
- Suppose that Y is a discrete random variable with the set of possible values B = {y_1, y_2, ..., y_m}.
- Let σ(X) = Span(δ_{x_1}(X), δ_{x_2}(X), ..., δ_{x_n}(X)) be the vector space that consists of the set of functions of the random variable X.
- Let σ(Y) = Span(δ_{y_1}(Y), δ_{y_2}(Y), ..., δ_{y_m}(Y)) be the vector space that consists of the set of functions of the random variable Y.
- Consider the linear transformation L : σ(X) → σ(Y) with
      L(δ_{x_i}(X)) = \sum_{j=1}^m P(X = x_i | Y = y_j) δ_{y_j}(Y),   i = 1, 2, ..., n.
- The linear transformation L can be represented by the m×n matrix A whose (j, i) entry is
      a_{j,i} = P(X = x_i | Y = y_j).
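- A small numerical sketch of this matrix representation, using a hypothetical joint pmf: the matrix A maps the coordinates (g(x_1), ..., g(x_n)) of g(X) to the coordinates (E[g(X)|Y = y_1], ..., E[g(X)|Y = y_m]) of E[g(X)|Y].

```python
import numpy as np

# A minimal sketch with a hypothetical joint pmf on A = {0, 1, 2}, B = {0, 1}.
A_vals = [0, 1, 2]
B_vals = [0, 1]
# p[i, j] = P(X = x_i, Y = y_j)
p = np.array([[0.10, 0.20],
              [0.25, 0.15],
              [0.05, 0.25]])
p_Y = p.sum(axis=0)                      # P(Y = y_j)

# Matrix representing L relative to the indicator bases:
# row j, column i holds P(X = x_i | Y = y_j).
A_mat = (p / p_Y).T                      # shape (m, n)

# Coordinates of g(X) in the basis {delta_{x_i}(X)} are (g(x_1), ..., g(x_n)).
g = lambda x: x ** 2
g_coords = np.array([g(x) for x in A_vals])

# Applying A gives the coordinates of E[g(X) | Y] in the basis {delta_{y_j}(Y)},
# i.e., the values E[g(X) | Y = y_j].
lhs = A_mat @ g_coords
rhs = np.array([sum(g(A_vals[i]) * p[i, j] for i in range(len(A_vals))) / p_Y[j]
                for j in range(len(B_vals))])
print(np.allclose(lhs, rhs))             # True
```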

Conditional Expectation As a Linear Transformation

- Since g(X) = \sum_{i=1}^n g(x_i) δ_{x_i}(X), we then have
      L(g(X)) = L(\sum_{i=1}^n g(x_i) δ_{x_i}(X))
              = \sum_{i=1}^n g(x_i) L(δ_{x_i}(X))
              = \sum_{i=1}^n g(x_i) \sum_{j=1}^m P(X = x_i | Y = y_j) δ_{y_j}(Y)
              = \sum_{j=1}^m \sum_{i=1}^n g(x_i) P(X = x_i | Y = y_j) δ_{y_j}(Y)
              = \sum_{j=1}^m E[g(X)|Y = y_j] δ_{y_j}(Y)
              = E[g(X)|Y].
- The linear transformation L applied to the random variable g(X) is the conditional expectation of g(X) given Y.

Inner Product

- (Inner product) An inner product on a vector space V is a mapping that assigns to each pair of vectors u and v in V a real number ⟨u, v⟩ with the following three properties:
  - ⟨u, u⟩ ≥ 0, with equality if and only if u = 0.
  - ⟨u, v⟩ = ⟨v, u⟩ for all u and v in V.
  - ⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩ for all u, v, w in V and all scalars α and β.
- (Inner product space) A vector space with an inner product is called an inner product space.
- (Length) The length of a vector u is given by
      ||u|| = \sqrt{⟨u, u⟩}.
- (Orthogonality) Two vectors u and v are orthogonal if ⟨u, v⟩ = 0.
- (The Pythagorean law) If u and v are orthogonal vectors, then
      ||u + v||^2 = ||u||^2 + ||v||^2.

Inner Product on the Vector Space of Random Variables

- Consider the vector space of random variables on the same probability space.
- Then
      ⟨X, Y⟩ = E[XY]
  is an inner product on that vector space.
- Note that E[X^2] = 0 implies that X = 0 with probability 1.
- If we restrict ourselves to the set of random variables with mean 0, then two vectors are orthogonal if and only if they are uncorrelated.
- As a direct consequence, two independent random variables with mean 0 are orthogonal.
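- A minimal sketch of this inner product on a hypothetical four-point probability space (two independent fair coins written as ±1):

```python
import numpy as np

# A minimal sketch on a hypothetical finite probability space: S = {0, ..., 3},
# all sample points equally likely.
P = np.full(4, 0.25)

def inner(U, V):
    """Inner product <U, V> = E[UV] on this probability space."""
    return np.dot(P, U * V)

# Two independent, zero-mean random variables: X depends only on the first
# coin, Y only on the second, when each sample point is read as two coin flips.
X = np.array([-1.0, -1.0, 1.0, 1.0])   # first coin: -1, -1, +1, +1
Y = np.array([-1.0, 1.0, -1.0, 1.0])   # second coin: -1, +1, -1, +1

print(inner(X, Y))                      # 0.0: independent zero-mean RVs are orthogonal
print(inner(X, X) ** 0.5)               # length ||X|| = sqrt(E[X^2]) = 1.0
```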

Scalar Projection and Vector Projection

- (Scalar projection) If u and v are vectors in an inner product space V and v ≠ 0, then the scalar projection of u onto v is given by
      α = \frac{⟨u, v⟩}{||v||}.
- (Vector projection) The vector projection of u onto v is given by
      p = α \frac{1}{||v||} v = \frac{⟨u, v⟩}{⟨v, v⟩} v.
- Properties:
  - u - p and p are orthogonal.
  - u = p if and only if u is a scalar multiple of v.

Vector Projection on a Vector Space with an Orthogonal Basis

- An ordered basis {v_1, v_2, ..., v_n} for a vector space V is said to be an orthogonal basis for V if ⟨v_i, v_j⟩ = 0 for all i ≠ j.
- Let S be a subspace of an inner product space V. Suppose that S has an orthogonal basis {v_1, v_2, ..., v_n}. Then the vector projection of u onto S is given by
      p = \sum_{i=1}^n \frac{⟨u, v_i⟩}{⟨v_i, v_i⟩} v_i.
- Properties:
  - u - p is orthogonal to every vector in S.
  - u = p if and only if u ∈ S.
  - (Least squares) p is the element of S that is closest to u, i.e.,
        ||u - v|| > ||u - p||
    for any v ≠ p in S. This follows from the Pythagorean law:
        ||u - v||^2 = ||(u - p) + (p - v)||^2 = ||u - p||^2 + ||p - v||^2.

Conditional Expectation as a Vector Projection

- We have shown that E[g(X)|Y] is the image L(g(X)) under the linear transformation L from σ(X) to σ(Y) with
      L(δ_{x_i}(X)) = \sum_{j=1}^m P(X = x_i | Y = y_j) δ_{y_j}(Y) = E[δ_{x_i}(X)|Y],   i = 1, 2, ..., n.
- Note that δ_{y_i}(Y) δ_{y_j}(Y) = 0 for all i ≠ j.
- Thus, E[δ_{y_i}(Y) δ_{y_j}(Y)] = 0 for all i ≠ j.
- {δ_{y_1}(Y), δ_{y_2}(Y), ..., δ_{y_m}(Y)} is an orthogonal basis for σ(Y).
- The vector projection of δ_{x_i}(X) onto σ(Y) is then given by
      \sum_{j=1}^m \frac{⟨δ_{x_i}(X), δ_{y_j}(Y)⟩}{⟨δ_{y_j}(Y), δ_{y_j}(Y)⟩} δ_{y_j}(Y)
        = \sum_{j=1}^m \frac{E[δ_{x_i}(X) δ_{y_j}(Y)]}{E[δ_{y_j}(Y) δ_{y_j}(Y)]} δ_{y_j}(Y)
        = \sum_{j=1}^m \frac{E[δ_{x_i}(X) δ_{y_j}(Y)]}{E[δ_{y_j}(Y)]} δ_{y_j}(Y)
        = \sum_{j=1}^m \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)} δ_{y_j}(Y)
        = \sum_{j=1}^m P(X = x_i | Y = y_j) δ_{y_j}(Y) = E[δ_{x_i}(X)|Y].

Conditional Expectation as a Vector Projection

- Recall that an inner product is linear in its first argument, i.e.,
      ⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩
  for all u, v, w in V and all scalars α and β.
- Since g(X) = \sum_{i=1}^n g(x_i) δ_{x_i}(X), the vector projection of g(X) onto σ(Y) is then given by
      \sum_{j=1}^m \frac{⟨g(X), δ_{y_j}(Y)⟩}{⟨δ_{y_j}(Y), δ_{y_j}(Y)⟩} δ_{y_j}(Y)
        = \sum_{j=1}^m \sum_{i=1}^n g(x_i) \frac{⟨δ_{x_i}(X), δ_{y_j}(Y)⟩}{⟨δ_{y_j}(Y), δ_{y_j}(Y)⟩} δ_{y_j}(Y)
        = \sum_{i=1}^n g(x_i) \sum_{j=1}^m \frac{⟨δ_{x_i}(X), δ_{y_j}(Y)⟩}{⟨δ_{y_j}(Y), δ_{y_j}(Y)⟩} δ_{y_j}(Y)
        = \sum_{i=1}^n g(x_i) E[δ_{x_i}(X)|Y] = E[\sum_{i=1}^n g(x_i) δ_{x_i}(X) | Y]
        = E[g(X)|Y].
- Thus, E[g(X)|Y] is the vector projection of g(X) onto σ(Y).

Conditional Expectation as a Vector Projection

- It then follows from the properties of vector projection that
  - g(X) - E[g(X)|Y] is orthogonal to every random variable in σ(Y), i.e., for any real-valued function h : ℝ → ℝ,
        ⟨g(X) - E[g(X)|Y], h(Y)⟩ = E[(g(X) - E[g(X)|Y]) h(Y)] = 0.
  - (Least squares) E[g(X)|Y] is the element of σ(Y) that is closest to g(X), i.e., for any real-valued function h : ℝ → ℝ with h(Y) ≠ E[g(X)|Y],
        E[(g(X) - h(Y))^2] = ||g(X) - h(Y)||^2 > ||g(X) - E[g(X)|Y]||^2 = E[(g(X) - E[g(X)|Y])^2].
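- The following sketch verifies these claims numerically on a hypothetical finite probability space: the orthogonal-basis projection of g(X) onto σ(Y) coincides with E[g(X)|Y] computed directly, the residual is orthogonal to members of σ(Y), and a competing h(Y) has a larger squared error.

```python
import numpy as np

# A minimal sketch on a hypothetical finite probability space.  Sample points
# s = 0..5 with probabilities P[s]; X(s) and Y(s) are given as vectors.
P = np.array([0.1, 0.2, 0.25, 0.15, 0.05, 0.25])
X = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0])
Y = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])

inner = lambda U, V: np.dot(P, U * V)        # <U, V> = E[UV]

g = lambda x: x ** 2
gX = g(X)

# Orthogonal basis of sigma(Y): the indicators delta_{y_j}(Y).
B_vals = np.unique(Y)
deltas = [(Y == y).astype(float) for y in B_vals]

# Projection of g(X) onto sigma(Y) using the orthogonal-basis formula.
proj = sum(inner(gX, d) / inner(d, d) * d for d in deltas)

# Direct computation of E[g(X) | Y] as a random variable.
cond = np.zeros_like(gX)
for y, d in zip(B_vals, deltas):
    cond += (np.dot(P, gX * d) / np.dot(P, d)) * d

print(np.allclose(proj, cond))                             # True: projection = E[g(X)|Y]
print(inner(gX - proj, Y), inner(gX - proj, np.ones(6)))   # both ~ 0 (orthogonality)

# Least squares: another h(Y) does no better.
h_of_Y = 0.5 * Y + 1.0
print(inner(gX - h_of_Y, gX - h_of_Y) >= inner(gX - proj, gX - proj))   # True
```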

Conditioning on a Set of Random Variables

- Note that Y only needs to be a random element in the previous development.
- In particular, if Y = (Y_1, Y_2, ..., Y_d) is a d-dimensional random vector, then σ(Y) = σ(Y_1, Y_2, ..., Y_d) is the set of functions of Y_1, Y_2, ..., Y_d.
- E[g(X)|Y] = E[g(X)|Y_1, Y_2, ..., Y_d] is the vector projection of g(X) onto σ(Y_1, Y_2, ..., Y_d).
- g(X) - E[g(X)|Y_1, Y_2, ..., Y_d] is orthogonal to every random variable in σ(Y_1, Y_2, ..., Y_d), i.e., for any function h : ℝ^d → ℝ,
      ⟨g(X) - E[g(X)|Y_1, Y_2, ..., Y_d], h(Y_1, Y_2, ..., Y_d)⟩
        = E[(g(X) - E[g(X)|Y_1, Y_2, ..., Y_d]) h(Y_1, Y_2, ..., Y_d)] = 0.
- (Least squares) E[g(X)|Y_1, Y_2, ..., Y_d] is the element of σ(Y_1, Y_2, ..., Y_d) that is closest to g(X), i.e., for any function h : ℝ^d → ℝ with h(Y_1, Y_2, ..., Y_d) ≠ E[g(X)|Y_1, Y_2, ..., Y_d],
      E[(g(X) - h(Y_1, Y_2, ..., Y_d))^2] > E[(g(X) - E[g(X)|Y_1, Y_2, ..., Y_d])^2].

General Definition of Conditional Expectation

- In some advanced probability books, conditional expectation is defined in a more general way.
- For a σ-algebra 𝒢, E[X|𝒢] is defined to be the random variable that satisfies
  (i) E[X|𝒢] is 𝒢-measurable, and
  (ii) \int_A X dP = \int_A E[X|𝒢] dP for all A ∈ 𝒢.
- To understand this definition, consider the σ-algebra generated by the random variable Y (denoted by σ(Y)).
- The condition that E[X|Y] is σ(Y)-measurable simply means that E[X|Y] is a (measurable) function of Y, i.e., E[X|Y] = h(Y) for some (measurable) function h.
- To understand the second condition, one may rewrite it as follows:
      E[1_A X] = E[1_A E[X|Y]],   (8)
  for every event A in σ(Y), where 1_A is the indicator random variable with 1_A = 1 when the event A occurs.

General Definition of Conditional Expectation

- Since 1_A is σ(Y)-measurable, it must be a function of Y. Thus, (8) is equivalent to
      E[g(Y)X] = E[g(Y)E[X|Y]],   (9)
  for any (measurable) function g.
- Now rewriting (9) using the inner product yields
      ⟨g(Y), X - E[X|Y]⟩ = 0,   (10)
  for any function g.
- The condition in (10) simply says that X - E[X|Y] is orthogonal to every vector in σ(Y) (X - E[X|Y] is in the orthogonal complement of σ(Y)).
- To summarize, the first condition says that the vector projection should be in the projected space, and the second condition says that the difference between the vector being projected and the vector projection should be in the orthogonal complement of the projected space.
- These two conditions are exactly the same as those used to define projections in linear algebra.
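- A small check of conditions (i) and (ii) on a hypothetical finite probability space; here every event in σ(Y) is a union of the atoms {Y = y}.

```python
import numpy as np

# A sketch of conditions (i) and (ii) on a hypothetical finite space:
# E[X|Y] is a function of Y, and E[1_A X] = E[1_A E[X|Y]] for events A in sigma(Y).
P = np.array([0.1, 0.2, 0.25, 0.15, 0.05, 0.25])
X = np.array([3.0, -1.0, 2.0, 0.0, 5.0, 1.0])
Y = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])

# E[X|Y]: constant on each event {Y = y}, hence sigma(Y)-measurable (condition (i)).
E_X_given_Y = np.zeros_like(X)
for y in np.unique(Y):
    atom = (Y == y)
    E_X_given_Y[atom] = np.dot(P[atom], X[atom]) / P[atom].sum()

# Condition (ii): E[1_A X] = E[1_A E[X|Y]] for every event A in sigma(Y);
# here each such A is a union of the atoms {Y = 0} and {Y = 1}.
for A in [Y == 0, Y == 1, np.ones(6, dtype=bool)]:
    print(np.isclose(np.dot(P[A], X[A]), np.dot(P[A], E_X_given_Y[A])))
```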

Projections on the Set of Linear Functions of Y

- Recall that σ(Y) = Span(δ_{y_1}(Y), δ_{y_2}(Y), ..., δ_{y_m}(Y)) is the set of functions of Y.
- Let σ_L(Y) = Span(Y, 1) be the set of linear functions of Y, i.e., the set of functions of the form aY + b for some constants a and b.
- σ_L(Y) is a subspace of σ(Y).
- However, Y and 1 are in general not orthogonal, as E[Y · 1] = E[Y] may not be 0.
- (Gram-Schmidt orthogonalization) {Y - E[Y], 1} is an orthogonal basis for σ_L(Y), as
      E[(Y - E[Y]) · 1] = E[Y] - E[Y] = 0.
- The projection of a random variable X onto σ_L(Y) is then given by
      p_L = \frac{⟨X, Y - E[Y]⟩}{⟨Y - E[Y], Y - E[Y]⟩} (Y - E[Y]) + \frac{⟨X, 1⟩}{⟨1, 1⟩} · 1
          = \frac{E[XY] - E[X]E[Y]}{E[(Y - E[Y])^2]} (Y - E[Y]) + E[X].

Projections on the Set of Linear Functions of Y

- It then follows from the properties of vector projection that, with
      p_L = \frac{E[XY] - E[X]E[Y]}{E[(Y - E[Y])^2]} (Y - E[Y]) + E[X]
  as above,
  - X - p_L is orthogonal to every random variable in σ_L(Y), i.e., for any constants a and b,
        E[(X - p_L)(aY + b)] = 0.
  - (Least squares) p_L is the element of σ_L(Y) that is closest to X, i.e., for any constants a and b,
        E[(X - aY - b)^2] ≥ E[(X - p_L)^2].
- When X and Y are jointly normal, the vector projection of X onto σ(Y) is the same as that onto σ_L(Y), i.e.,
      E[X|Y] = \frac{E[XY] - E[X]E[Y]}{E[(Y - E[Y])^2]} (Y - E[Y]) + E[X] = p_L.
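- A Monte Carlo sketch of the jointly normal case, with hypothetical parameters: since E[X|Y] is itself linear here, the residual X - p_L is (nearly) orthogonal not only to linear functions of Y but to other functions of Y as well.

```python
import numpy as np

# A Monte Carlo sketch (not from the slides): for jointly normal X and Y, the
# projection onto sigma_L(Y) should match E[X|Y].  Parameters are hypothetical.
rng = np.random.default_rng(0)
n = 200_000
mu = np.array([1.0, -2.0])
cov = np.array([[2.0, 1.2],
                [1.2, 1.5]])
X, Y = rng.multivariate_normal(mu, cov, size=n).T

# Projection onto sigma_L(Y): (E[XY]-E[X]E[Y]) / E[(Y-E[Y])^2] * (Y-E[Y]) + E[X],
# computed here with the known population moments.
slope = cov[0, 1] / cov[1, 1]
p_L = slope * (Y - mu[1]) + mu[0]

# For jointly normal (X, Y), E[X|Y] has exactly this linear form, so the
# residual X - p_L is (up to Monte Carlo error) orthogonal to functions of Y.
resid = X - p_L
print(np.mean(resid * Y))          # ~ 0
print(np.mean(resid * np.sin(Y)))  # ~ 0 as well, since E[X|Y] is linear here

# Any other linear predictor aY + b has a larger mean squared error.
other = 0.5 * Y + 0.0
print(np.mean((X - other) ** 2) > np.mean(resid ** 2))   # True
```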

Projections on a Subspace of σ(Y)

- Let Y_i = φ_i(Y), i = 1, 2, ..., d, where the φ_i(·)'s are some known functions of Y.
- Let σ_φ(Y) = Span(1, Y_1, Y_2, ..., Y_d).
- σ_φ(Y) is a subspace of σ(Y).
- In general, {1, Y_1, Y_2, ..., Y_d} is not an orthogonal basis of σ_φ(Y).
- How do we find an orthogonal basis of σ_φ(Y)?
  - (Zero mean) Let Ỹ_i = Y_i - E[Y_i]. Then ⟨1, Ỹ_i⟩ = E[Ỹ_i] = 0.
  - (Matrix diagonalization) Let Ỹ = (Ỹ_1, Ỹ_2, ..., Ỹ_d)^T. Let A = E[Ỹ Ỹ^T] be the d×d covariance matrix. As A is symmetric, there is an orthogonal matrix U and a diagonal matrix D such that
        D = U^T A U.
    Let Z = (Z_1, Z_2, ..., Z_d)^T = U^T Ỹ. Then
        E[ZZ^T] = E[U^T Ỹ Ỹ^T U] = U^T E[Ỹ Ỹ^T] U = U^T A U = D.
- Thus, {1, Z_1, Z_2, ..., Z_d} is an orthogonal basis of σ_φ(Y).

Projections on a Subspace of σ(Y)

- The projection of a random variable X onto σ_φ(Y) is then given by
      p_φ = \sum_{k=1}^d \frac{⟨X, Z_k⟩}{⟨Z_k, Z_k⟩} Z_k + \frac{⟨X, 1⟩}{⟨1, 1⟩} · 1
          = \sum_{k=1}^d \frac{E[XZ_k]}{E[Z_k^2]} Z_k + E[X].
- It then follows from the properties of vector projection that
  - X - p_φ is orthogonal to every random variable in σ_φ(Y), i.e., for any constants a_k, k = 1, 2, ..., d, and b,
        E[(X - p_φ)(\sum_{k=1}^d a_k φ_k(Y) + b)] = 0.
  - (Least squares) p_φ is the element of σ_φ(Y) that is closest to X, i.e., for any constants a_k, k = 1, 2, ..., d, and b,
        E[(X - \sum_{k=1}^d a_k φ_k(Y) - b)^2] ≥ E[(X - p_φ)^2].
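- The recipe above, applied to samples (so all expectations become empirical averages): center the features φ_k(Y), diagonalize their covariance to get uncorrelated coordinates Z_k, and project. The data and feature maps below are hypothetical, and the result is checked against an ordinary least-squares fit on the same span.

```python
import numpy as np

# A sketch of the orthogonalization recipe above, applied to samples, so the
# expectations below are empirical averages.  X, Y, and the feature maps
# phi_k are all hypothetical choices for illustration.
rng = np.random.default_rng(1)
n = 100_000
Y = rng.normal(size=n)
X = np.sin(Y) + 0.3 * Y ** 2 + 0.1 * rng.normal(size=n)

# Features Y_k = phi_k(Y) spanning (together with 1) the subspace sigma_phi(Y).
features = np.column_stack([Y, Y ** 2, Y ** 3])          # shape (n, d)

# (Zero mean) center the features, then (diagonalization) rotate them so that
# the empirical covariance becomes diagonal.
centered = features - features.mean(axis=0)
A = centered.T @ centered / n                            # d x d covariance
eigvals, U = np.linalg.eigh(A)                           # A = U diag(eigvals) U^T
Z = centered @ U                                         # uncorrelated coordinates Z_k

# Projection p_phi = sum_k E[X Z_k] / E[Z_k^2] * Z_k + E[X].
coef = (Z * X[:, None]).mean(axis=0) / (Z ** 2).mean(axis=0)
p_phi = Z @ coef + X.mean()

# Sanity check: this equals ordinary least squares of X on [1, features].
design = np.column_stack([np.ones(n), features])
beta, *_ = np.linalg.lstsq(design, X, rcond=None)
print(np.allclose(p_phi, design @ beta, atol=1e-8))
```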

Regression

- We have shown how to compute the conditional expectation (and other projections onto a subspace of σ(Y)) when the joint distribution of X and Y is known.
- Suppose that the joint distribution of X and Y is unknown.
- Instead, a random sample of size n is given, i.e., {(x_k, y_k), k = 1, 2, ..., n} is known.
- How do we find h(Y) such that E[(X - h(Y))^2] is minimized?
- (Empirical distribution) Even though we do not know the true distribution, we still have the empirical distribution, i.e.,
      P(X = x_k, Y = y_k) = \frac{1}{n},   k = 1, 2, ..., n.
- One can then use the empirical distribution to compute the conditional expectation (and other projections onto a subspace of σ(Y)).

Linear Regression

- (Linear regression) Use the empirical distribution as the distribution of X and Y. Then
      p_L = \frac{E[XY] - E[X]E[Y]}{E[(Y - E[Y])^2]} (Y - E[Y]) + E[X],
  where
      E[XY] = \frac{1}{n} \sum_{k=1}^n x_k y_k,
      E[X] = \frac{1}{n} \sum_{k=1}^n x_k,   E[Y] = \frac{1}{n} \sum_{k=1}^n y_k,
      E[Y^2] = \frac{1}{n} \sum_{k=1}^n y_k^2.
- p_L minimizes the empirical squared error (risk)
      E[(X - aY - b)^2] = \frac{1}{n} \sum_{k=1}^n (x_k - a y_k - b)^2
  over all constants a and b.
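- A sketch of this empirical linear regression on hypothetical samples, checked against a standard least-squares fit:

```python
import numpy as np

# A sketch of the empirical linear regression above on hypothetical samples.
rng = np.random.default_rng(2)
n = 1_000
y = rng.uniform(-2, 2, size=n)
x = 1.5 * y - 0.7 + rng.normal(scale=0.5, size=n)   # samples (x_k, y_k)

# Empirical moments (expectations under the empirical distribution).
E_X, E_Y = x.mean(), y.mean()
E_XY = (x * y).mean()
var_Y = ((y - E_Y) ** 2).mean()

# p_L = ((E[XY] - E[X]E[Y]) / E[(Y - E[Y])^2]) (Y - E[Y]) + E[X] = a*Y + b*.
a_hat = (E_XY - E_X * E_Y) / var_Y
b_hat = E_X - a_hat * E_Y
print(a_hat, b_hat)                      # close to the true 1.5 and -0.7

# Same answer as a standard least-squares fit of x on y.
a_ls, b_ls = np.polyfit(y, x, deg=1)
print(np.allclose([a_hat, b_hat], [a_ls, b_ls]))
```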