Textbook: “Matrix algebra useful for statistics”, Searle.
Webpage:
1. Course notes:
http://mail.thu.edu.tw/~wenwei/cgi,
then click on 統計教材 (Statistics Teaching Materials), and then click on
Math Algebra ( Word , PDF )
2. Online grades:
http://mail.thu.edu.tw/~wenwei
Then, click on
Online Grade: 2008, Summer, Basic Statistics
Objective: introduce basic concepts and skills in matrix
algebra. In addition, some applications of
matrix algebra in statistics are described.
Section 1. Introduction and Matrix Operations
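Definition of an $r \times c$ matrix:
An $r \times c$ matrix $A$ is a rectangular array of $rc$ real numbers arranged in $r$
horizontal rows and $c$ vertical columns:
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1c} \\
a_{21} & a_{22} & \cdots & a_{2c} \\
\vdots & \vdots & \ddots & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rc}
\end{bmatrix}.
\]
The $i$'th row of $A$ is
\[
\mathrm{row}_i(A) = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{ic} \end{bmatrix}, \quad i = 1, 2, \ldots, r,
\]
and the $j$'th column of $A$ is
\[
\mathrm{col}_j(A) = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{rj} \end{bmatrix}, \quad j = 1, 2, \ldots, c.
\]
We often write $A$ as
\[
A = [a_{ij}] = A_{r \times c}.
\]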
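Matrix addition:
Let
\[
A = A_{r \times c} = [a_{ij}] = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1c} \\
a_{21} & a_{22} & \cdots & a_{2c} \\
\vdots & \vdots & \ddots & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rc}
\end{bmatrix},
\quad
B = B_{c \times s} = [b_{ij}] = \begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1s} \\
b_{21} & b_{22} & \cdots & b_{2s} \\
\vdots & \vdots & \ddots & \vdots \\
b_{c1} & b_{c2} & \cdots & b_{cs}
\end{bmatrix},
\]
\[
D = D_{r \times c} = [d_{ij}] = \begin{bmatrix}
d_{11} & d_{12} & \cdots & d_{1c} \\
d_{21} & d_{22} & \cdots & d_{2c} \\
\vdots & \vdots & \ddots & \vdots \\
d_{r1} & d_{r2} & \cdots & d_{rc}
\end{bmatrix}.
\]
Then,
\[
A + D = [a_{ij} + d_{ij}] = \begin{bmatrix}
(a_{11} + d_{11}) & (a_{12} + d_{12}) & \cdots & (a_{1c} + d_{1c}) \\
(a_{21} + d_{21}) & (a_{22} + d_{22}) & \cdots & (a_{2c} + d_{2c}) \\
\vdots & \vdots & \ddots & \vdots \\
(a_{r1} + d_{r1}) & (a_{r2} + d_{r2}) & \cdots & (a_{rc} + d_{rc})
\end{bmatrix},
\]
\[
pA = [p\,a_{ij}] = \begin{bmatrix}
pa_{11} & pa_{12} & \cdots & pa_{1c} \\
pa_{21} & pa_{22} & \cdots & pa_{2c} \\
\vdots & \vdots & \ddots & \vdots \\
pa_{r1} & pa_{r2} & \cdots & pa_{rc}
\end{bmatrix}, \quad p \in R,
\]
and the transpose of $A$ is denoted as
\[
A^t = A^t_{c \times r} = [a_{ji}] = \begin{bmatrix}
a_{11} & a_{21} & \cdots & a_{r1} \\
a_{12} & a_{22} & \cdots & a_{r2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1c} & a_{2c} & \cdots & a_{rc}
\end{bmatrix}.
\]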
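Example 1:
Let
\[
A = \begin{bmatrix} 1 & 3 & 1 \\ 4 & 5 & 0 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 3 & -7 & 0 \\ -8 & 1 & 1 \end{bmatrix}.
\]
Then,
\[
A + B = \begin{bmatrix} 1+3 & 3+(-7) & 1+0 \\ 4+(-8) & 5+1 & 0+1 \end{bmatrix}
= \begin{bmatrix} 4 & -4 & 1 \\ -4 & 6 & 1 \end{bmatrix},
\]
\[
2A = \begin{bmatrix} 1 \cdot 2 & 3 \cdot 2 & 1 \cdot 2 \\ 4 \cdot 2 & 5 \cdot 2 & 0 \cdot 2 \end{bmatrix}
= \begin{bmatrix} 2 & 6 & 2 \\ 8 & 10 & 0 \end{bmatrix}
\]
and
\[
A^t = \begin{bmatrix} 1 & 4 \\ 3 & 5 \\ 1 & 0 \end{bmatrix}.
\]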
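The three operations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: matrices are stored as lists of rows, the function names mat_add, scalar_mul and transpose are hypothetical, and the matrix D below is made up for the sketch rather than taken from the notes.

```python
def mat_add(A, D):
    """Entrywise sum [a_ij + d_ij]; both matrices must be r x c."""
    if len(A) != len(D) or any(len(ra) != len(rd) for ra, rd in zip(A, D)):
        raise ValueError("matrices must have the same shape")
    return [[a + d for a, d in zip(ra, rd)] for ra, rd in zip(A, D)]


def scalar_mul(p, A):
    """Scalar multiple [p * a_ij]."""
    return [[p * a for a in row] for row in A]


def transpose(A):
    """Transpose [a_ji]: rows of A become columns."""
    return [list(col) for col in zip(*A)]


A = [[1, 3, 1], [4, 5, 0]]
D = [[2, 0, -1], [1, -2, 3]]  # illustrative matrix, not from the notes

S = mat_add(A, D)        # 2 x 3
T = scalar_mul(2, A)     # 2 x 3
At = transpose(A)        # 3 x 2
```

Transposing twice returns the original matrix, which is a quick sanity check on transpose.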
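Matrix multiplication:
We first define the dot product or inner product of two vectors.
Definition of dot product:
The dot product or inner product of the $c$-vectors
\[
a = \begin{bmatrix} a_1 & a_2 & \cdots & a_c \end{bmatrix}
\quad \text{and} \quad
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_c \end{bmatrix}
\]
is
\[
a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_c b_c = \sum_{i=1}^{c} a_i b_i .
\]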
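Example 1:
Let $a = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$ and $b = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$. Then, $a \cdot b = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 32$.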
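Definition of matrix multiplication:
The product of $A_{r \times c}$ and $B_{c \times s}$ is the $r \times s$ matrix
\[
\begin{aligned}
E = E_{r \times s} = [e_{ij}]
&= \begin{bmatrix}
e_{11} & e_{12} & \cdots & e_{1s} \\
e_{21} & e_{22} & \cdots & e_{2s} \\
\vdots & \vdots & \ddots & \vdots \\
e_{r1} & e_{r2} & \cdots & e_{rs}
\end{bmatrix} \\
&= \begin{bmatrix}
\mathrm{row}_1(A) \cdot \mathrm{col}_1(B) & \mathrm{row}_1(A) \cdot \mathrm{col}_2(B) & \cdots & \mathrm{row}_1(A) \cdot \mathrm{col}_s(B) \\
\mathrm{row}_2(A) \cdot \mathrm{col}_1(B) & \mathrm{row}_2(A) \cdot \mathrm{col}_2(B) & \cdots & \mathrm{row}_2(A) \cdot \mathrm{col}_s(B) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{row}_r(A) \cdot \mathrm{col}_1(B) & \mathrm{row}_r(A) \cdot \mathrm{col}_2(B) & \cdots & \mathrm{row}_r(A) \cdot \mathrm{col}_s(B)
\end{bmatrix} \\
&= \begin{bmatrix} \mathrm{row}_1(A) \\ \mathrm{row}_2(A) \\ \vdots \\ \mathrm{row}_r(A) \end{bmatrix}
\begin{bmatrix} \mathrm{col}_1(B) & \mathrm{col}_2(B) & \cdots & \mathrm{col}_s(B) \end{bmatrix} \\
&= \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1c} \\
a_{21} & a_{22} & \cdots & a_{2c} \\
\vdots & \vdots & \ddots & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rc}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1s} \\
b_{21} & b_{22} & \cdots & b_{2s} \\
\vdots & \vdots & \ddots & \vdots \\
b_{c1} & b_{c2} & \cdots & b_{cs}
\end{bmatrix}
= A_{r \times c} B_{c \times s}.
\end{aligned}
\]
That is,
\[
e_{ij} = \mathrm{row}_i(A) \cdot \mathrm{col}_j(B) = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{ic} b_{cj}, \quad i = 1, \ldots, r, \; j = 1, \ldots, s.
\]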
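Example 2:
\[
A_{2 \times 2} = \begin{bmatrix} 1 & -2 \\ 3 & 1 \end{bmatrix}, \quad
B_{2 \times 3} = \begin{bmatrix} 0 & 1 & 3 \\ 1 & 0 & 2 \end{bmatrix}.
\]
Then,
\[
E_{2 \times 3} = \begin{bmatrix}
\mathrm{row}_1(A) \cdot \mathrm{col}_1(B) & \mathrm{row}_1(A) \cdot \mathrm{col}_2(B) & \mathrm{row}_1(A) \cdot \mathrm{col}_3(B) \\
\mathrm{row}_2(A) \cdot \mathrm{col}_1(B) & \mathrm{row}_2(A) \cdot \mathrm{col}_2(B) & \mathrm{row}_2(A) \cdot \mathrm{col}_3(B)
\end{bmatrix}
= \begin{bmatrix} -2 & 1 & -1 \\ 1 & 3 & 11 \end{bmatrix}
\]
since
\[
\mathrm{row}_1(A) \cdot \mathrm{col}_1(B) = \begin{bmatrix} 1 & -2 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = -2, \quad
\mathrm{row}_2(A) \cdot \mathrm{col}_1(B) = \begin{bmatrix} 3 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 1,
\]
\[
\mathrm{row}_1(A) \cdot \mathrm{col}_2(B) = \begin{bmatrix} 1 & -2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = 1, \quad
\mathrm{row}_2(A) \cdot \mathrm{col}_2(B) = \begin{bmatrix} 3 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = 3,
\]
\[
\mathrm{row}_1(A) \cdot \mathrm{col}_3(B) = \begin{bmatrix} 1 & -2 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \end{bmatrix} = -1, \quad
\mathrm{row}_2(A) \cdot \mathrm{col}_3(B) = \begin{bmatrix} 3 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \end{bmatrix} = 11.
\]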
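Example 3:
\[
a_{3 \times 1} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad
b_{1 \times 2} = \begin{bmatrix} 4 & 5 \end{bmatrix}
\quad \Rightarrow \quad
a_{3 \times 1} b_{1 \times 2} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 4 & 5 \end{bmatrix}
= \begin{bmatrix} 4 & 5 \\ 8 & 10 \\ 12 & 15 \end{bmatrix}.
\]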
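Another expression of matrix multiplication:
\[
\begin{aligned}
A_{r \times c} B_{c \times s}
&= \begin{bmatrix} \mathrm{col}_1(A) & \mathrm{col}_2(A) & \cdots & \mathrm{col}_c(A) \end{bmatrix}
\begin{bmatrix} \mathrm{row}_1(B) \\ \mathrm{row}_2(B) \\ \vdots \\ \mathrm{row}_c(B) \end{bmatrix} \\
&= \mathrm{col}_1(A)\,\mathrm{row}_1(B) + \mathrm{col}_2(A)\,\mathrm{row}_2(B) + \cdots + \mathrm{col}_c(A)\,\mathrm{row}_c(B)
= \sum_{i=1}^{c} \mathrm{col}_i(A)\,\mathrm{row}_i(B),
\end{aligned}
\]
where the $\mathrm{col}_i(A)\,\mathrm{row}_i(B)$ are $r \times s$ matrices.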
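Example 2 (continued):
\[
\begin{aligned}
A_{2 \times 2} B_{2 \times 3}
&= \begin{bmatrix} \mathrm{col}_1(A) & \mathrm{col}_2(A) \end{bmatrix}
\begin{bmatrix} \mathrm{row}_1(B) \\ \mathrm{row}_2(B) \end{bmatrix}
= \mathrm{col}_1(A)\,\mathrm{row}_1(B) + \mathrm{col}_2(A)\,\mathrm{row}_2(B) \\
&= \begin{bmatrix} 1 \\ 3 \end{bmatrix} \begin{bmatrix} 0 & 1 & 3 \end{bmatrix}
+ \begin{bmatrix} -2 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \end{bmatrix} \\
&= \begin{bmatrix} 0 & 1 & 3 \\ 0 & 3 & 9 \end{bmatrix}
+ \begin{bmatrix} -2 & 0 & -4 \\ 1 & 0 & 2 \end{bmatrix}
= \begin{bmatrix} -2 & 1 & -1 \\ 1 & 3 & 11 \end{bmatrix}.
\end{aligned}
\]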
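The two expressions for the product (entry by entry as row-times-column dot products, and as a sum of column-times-row matrices) can be compared numerically. The sketch below is a minimal illustration, assuming matrices stored as lists of rows; the names matmul_dot and matmul_outer and the small matrices are hypothetical choices made for this sketch.

```python
def matmul_dot(A, B):
    """e_ij = row_i(A) . col_j(B), for A (r x c) and B (c x s)."""
    c = len(B)
    if any(len(row) != c for row in A):
        raise ValueError("columns of A must match rows of B")
    s = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(c)) for j in range(s)]
            for i in range(len(A))]


def matmul_outer(A, B):
    """AB as the sum over k of col_k(A) row_k(B), each an r x s matrix."""
    r, c, s = len(A), len(B), len(B[0])
    E = [[0] * s for _ in range(r)]
    for k in range(c):
        # add the r x s matrix col_k(A) row_k(B) to the running sum
        for i in range(r):
            for j in range(s):
                E[i][j] += A[i][k] * B[k][j]
    return E


A = [[2, 1], [0, 3]]            # illustrative 2 x 2 matrix
B = [[1, 0, 2], [4, 1, 0]]      # illustrative 2 x 3 matrix

E_dot = matmul_dot(A, B)
E_outer = matmul_outer(A, B)
```

Both functions perform the same multiplications; they differ only in the order in which the terms are accumulated.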
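Note:
Heuristically, the matrices $A$ and $B$, written as
\[
A = \begin{bmatrix} \mathrm{row}_1(A) \\ \mathrm{row}_2(A) \\ \vdots \\ \mathrm{row}_r(A) \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} \mathrm{col}_1(B) & \mathrm{col}_2(B) & \cdots & \mathrm{col}_s(B) \end{bmatrix},
\]
can be thought of as $r \times 1$ and $1 \times s$ vectors.
Thus,
\[
A_{r \times c} B_{c \times s} =
\begin{bmatrix} \mathrm{row}_1(A) \\ \mathrm{row}_2(A) \\ \vdots \\ \mathrm{row}_r(A) \end{bmatrix}
\begin{bmatrix} \mathrm{col}_1(B) & \mathrm{col}_2(B) & \cdots & \mathrm{col}_s(B) \end{bmatrix}
\]
can be thought of as the multiplication of $r \times 1$ and $1 \times s$ vectors. Similarly,
\[
A_{r \times c} B_{c \times s} =
\begin{bmatrix} \mathrm{col}_1(A) & \mathrm{col}_2(A) & \cdots & \mathrm{col}_c(A) \end{bmatrix}
\begin{bmatrix} \mathrm{row}_1(B) \\ \mathrm{row}_2(B) \\ \vdots \\ \mathrm{row}_c(B) \end{bmatrix}
\]
can be thought of as the multiplication of $1 \times c$ and $c \times 1$ vectors.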
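Note:
I. $AB$ is not necessarily equal to $BA$. For instance,
\[
A = \begin{bmatrix} 1 & 3 \\ 2 & -1 \end{bmatrix} \text{ and } B = \begin{bmatrix} 2 & -1 \\ 0 & 2 \end{bmatrix}
\quad \Rightarrow \quad
AB = \begin{bmatrix} 2 & 5 \\ 4 & -4 \end{bmatrix} \neq BA = \begin{bmatrix} 0 & 7 \\ 4 & -2 \end{bmatrix}.
\]
II. $AC = BC$ does not imply $A = B$. For instance,
\[
A = \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & 4 \\ 2 & 3 \end{bmatrix} \text{ and }
C = \begin{bmatrix} -1 & -2 \\ 1 & 2 \end{bmatrix}
\quad \Rightarrow \quad
AC = BC = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix} \text{ but } A \neq B.
\]
III. $AB = 0$ does not imply $A = 0$ or $B = 0$. For instance,
\[
A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \text{ and } B = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}
\quad \Rightarrow \quad
AB = BA = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \text{ but } A \neq 0, \; B \neq 0.
\]
IV. For a square matrix $A$,
\[
A^p = \underbrace{A A \cdots A}_{p \text{ factors}}, \quad A^p A^q = A^{p+q}, \quad (A^p)^q = A^{pq}.
\]
Also, $(AB)^p$ is not necessarily equal to $A^p B^p$.
V. $(AB)^t = B^t A^t$.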
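Notes I, III and V can be checked numerically. The following Python sketch is illustrative only: the helper names matmul and transpose and the small matrices A, B, P and Q are assumptions chosen for this example, not values taken from the notes.

```python
def matmul(A, B):
    """Row-by-column product of A (r x c) and B (c x s)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Note I: AB and BA can differ.
A = [[1, 2], [0, 1]]
B = [[1, 0], [3, 1]]
AB = matmul(A, B)
BA = matmul(B, A)

# Note III: a zero product with two nonzero factors.
P = [[1, 1], [1, 1]]
Q = [[1, -1], [-1, 1]]
PQ = matmul(P, Q)

# Note V: (AB)^t equals B^t A^t (the order of the factors is reversed).
lhs = transpose(AB)
rhs = matmul(transpose(B), transpose(A))
```

A single numerical example cannot prove Note V, but it does show why the order of the factors must be reversed: transpose(A) times transpose(B) gives a different matrix here.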
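Trace:
Definition of the trace of a matrix:
The sum of the diagonal elements of an $r \times r$ square matrix is called the trace of
the matrix, written $\mathrm{tr}(A)$, i.e., for
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1r} \\
a_{21} & a_{22} & \cdots & a_{2r} \\
\vdots & \vdots & \ddots & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rr}
\end{bmatrix},
\]
\[
\mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{rr} = \sum_{i=1}^{r} a_{ii} .
\]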
Example 4:
Let $A = \begin{bmatrix} 1 & 5 & 6 \\ 4 & 2 & 7 \\ 8 & 9 & 3 \end{bmatrix}$. Then, $\mathrm{tr}(A) = 1 + 2 + 3 = 6$.
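The trace is simple to compute directly from the definition. The sketch below reproduces Example 4 and also evaluates tr(AB) and tr(BA) for one pair of rectangular matrices; the helper names and the matrices M and N are hypothetical, and a numerical agreement for one pair is of course not a proof.

```python
def trace(A):
    """Sum of the diagonal elements of a square matrix."""
    if any(len(row) != len(A) for row in A):
        raise ValueError("trace is defined for square matrices only")
    return sum(A[i][i] for i in range(len(A)))


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[1, 5, 6], [4, 2, 7], [8, 9, 3]]   # matrix of Example 4
tr_A = trace(A)

M = [[1, 2, 3], [4, 5, 6]]       # illustrative 2 x 3 matrix
N = [[1, 0], [2, 1], [0, 3]]     # illustrative 3 x 2 matrix
tr_MN = trace(matmul(M, N))      # trace of a 2 x 2 matrix
tr_NM = trace(matmul(N, M))      # trace of a 3 x 3 matrix
```

Note that MN is 2 x 2 while NM is 3 x 3, so the two products are different matrices even though their traces coincide in this example.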
Homework 1
1. Prove $\mathrm{tr}(AB) = \mathrm{tr}(BA)$, where $A$ and $B$ are $r \times c$ and $c \times r$
matrices, respectively.
2.
(a) When does $(A + B)(A - B) = A^2 - B^2$?
(b) When $A^t = A$, prove $\mathrm{tr}(AB) = \mathrm{tr}(AB^t)$.
(c) When $X^t X G X^t X = X^t X$, prove $X^t X G^t X^t X = X^t X$.
Section 2 Special Matrices
2.1 Symmetric Matrices:
Definition of symmetric matrix:
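An $r \times r$ matrix $A_{r \times r}$ is defined as symmetric if $A = A^t$. That is,
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1r} \\
a_{12} & a_{22} & \cdots & a_{2r} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1r} & a_{2r} & \cdots & a_{rr}
\end{bmatrix}, \quad a_{ij} = a_{ji}.
\]
Example 1:
$A = \begin{bmatrix} 1 & 2 & 5 \\ 2 & 3 & 6 \\ 5 & 6 & 4 \end{bmatrix}$ is symmetric since $A = A^t$.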
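Example 2:
Let $X_1, X_2, \ldots, X_r$ be random variables. Then,
\[
V = \begin{bmatrix}
\mathrm{Cov}(X_1, X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_r) \\
\mathrm{Cov}(X_2, X_1) & \mathrm{Cov}(X_2, X_2) & \cdots & \mathrm{Cov}(X_2, X_r) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(X_r, X_1) & \mathrm{Cov}(X_r, X_2) & \cdots & \mathrm{Cov}(X_r, X_r)
\end{bmatrix}
= \begin{bmatrix}
\mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_r) \\
\mathrm{Cov}(X_1, X_2) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_r) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(X_1, X_r) & \mathrm{Cov}(X_2, X_r) & \cdots & \mathrm{Var}(X_r)
\end{bmatrix}
\]
is called the covariance matrix, where $\mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i)$, $i, j = 1, 2, \ldots, r$,
is the covariance of the random variables $X_i$ and $X_j$, and $\mathrm{Var}(X_i)$ is the variance
of $X_i$. The rows and columns of $V$ are both indexed by $X_1, X_2, \ldots, X_r$. $V$ is a symmetric matrix. The correlation matrix for $X_1, X_2, \ldots, X_r$ is defined
as
\[
R = \begin{bmatrix}
\mathrm{Corr}(X_1, X_1) & \mathrm{Corr}(X_1, X_2) & \cdots & \mathrm{Corr}(X_1, X_r) \\
\mathrm{Corr}(X_2, X_1) & \mathrm{Corr}(X_2, X_2) & \cdots & \mathrm{Corr}(X_2, X_r) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Corr}(X_r, X_1) & \mathrm{Corr}(X_r, X_2) & \cdots & \mathrm{Corr}(X_r, X_r)
\end{bmatrix}
= \begin{bmatrix}
1 & \mathrm{Corr}(X_1, X_2) & \cdots & \mathrm{Corr}(X_1, X_r) \\
\mathrm{Corr}(X_1, X_2) & 1 & \cdots & \mathrm{Corr}(X_2, X_r) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Corr}(X_1, X_r) & \mathrm{Corr}(X_2, X_r) & \cdots & 1
\end{bmatrix},
\]
where
\[
\mathrm{Corr}(X_i, X_j) = \frac{\mathrm{Cov}(X_i, X_j)}{\sqrt{\mathrm{Var}(X_i)\,\mathrm{Var}(X_j)}} = \mathrm{Corr}(X_j, X_i), \quad i, j = 1, 2, \ldots, r,
\]
is the correlation of $X_i$ and $X_j$. $R$ is also a symmetric matrix. For instance, let $X_1$ be the
random variable representing the sales amount of some product and $X_2$ be the random
variable representing the cost spent on advertisement. Suppose
\[
\mathrm{Var}(X_1) = 20, \quad \mathrm{Var}(X_2) = 80, \quad \mathrm{Cov}(X_1, X_2) = 15.
\]
Then,
\[
V = \begin{bmatrix} 20 & 15 \\ 15 & 80 \end{bmatrix}
\]
and
\[
R = \begin{bmatrix} 1 & \dfrac{15}{\sqrt{20 \cdot 80}} \\ \dfrac{15}{\sqrt{20 \cdot 80}} & 1 \end{bmatrix}
= \begin{bmatrix} 1 & \dfrac{3}{8} \\ \dfrac{3}{8} & 1 \end{bmatrix}.
\]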
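The passage from a covariance matrix to a correlation matrix is a one-line formula per entry. The sketch below applies it to the 2 x 2 example above; the function name correlation_from_covariance is a hypothetical choice, and the code assumes every variance on the diagonal is strictly positive.

```python
import math


def correlation_from_covariance(V):
    """R_ij = V_ij / sqrt(V_ii * V_jj); assumes all variances V_ii > 0."""
    r = len(V)
    return [[V[i][j] / math.sqrt(V[i][i] * V[j][j]) for j in range(r)]
            for i in range(r)]


# Covariance matrix from the sales/advertisement example.
V = [[20, 15], [15, 80]]
R = correlation_from_covariance(V)
# Off-diagonal entry: 15 / sqrt(20 * 80) = 15 / 40 = 3/8
```

Because V is symmetric and the denominator is symmetric in i and j, the resulting R is symmetric with ones on the diagonal.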
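Example 3:
Let $A_{r \times c}$ be an $r \times c$ matrix. Then, both $AA^t$ and $A^t A$ are symmetric since
\[
(AA^t)^t = (A^t)^t A^t = AA^t \quad \text{and} \quad (A^t A)^t = A^t (A^t)^t = A^t A .
\]
$AA^t$ is an $r \times r$ symmetric matrix while $A^t A$ is a $c \times c$ symmetric matrix.
\[
\begin{aligned}
AA^t &= \begin{bmatrix} \mathrm{col}_1(A) & \mathrm{col}_2(A) & \cdots & \mathrm{col}_c(A) \end{bmatrix}
\begin{bmatrix} \mathrm{row}_1(A^t) \\ \mathrm{row}_2(A^t) \\ \vdots \\ \mathrm{row}_c(A^t) \end{bmatrix}
= \begin{bmatrix} \mathrm{col}_1(A) & \mathrm{col}_2(A) & \cdots & \mathrm{col}_c(A) \end{bmatrix}
\begin{bmatrix} \mathrm{col}_1^t(A) \\ \mathrm{col}_2^t(A) \\ \vdots \\ \mathrm{col}_c^t(A) \end{bmatrix} \\
&= \mathrm{col}_1(A)\,\mathrm{col}_1^t(A) + \mathrm{col}_2(A)\,\mathrm{col}_2^t(A) + \cdots + \mathrm{col}_c(A)\,\mathrm{col}_c^t(A)
= \sum_{i=1}^{c} \mathrm{col}_i(A)\,\mathrm{col}_i^t(A).
\end{aligned}
\]
Also,
\[
\begin{aligned}
AA^t &= \begin{bmatrix} \mathrm{row}_1(A) \\ \mathrm{row}_2(A) \\ \vdots \\ \mathrm{row}_r(A) \end{bmatrix}
\begin{bmatrix} \mathrm{row}_1^t(A) & \mathrm{row}_2^t(A) & \cdots & \mathrm{row}_r^t(A) \end{bmatrix} \\
&= \begin{bmatrix}
\mathrm{row}_1(A)\,\mathrm{row}_1^t(A) & \mathrm{row}_1(A)\,\mathrm{row}_2^t(A) & \cdots & \mathrm{row}_1(A)\,\mathrm{row}_r^t(A) \\
\mathrm{row}_2(A)\,\mathrm{row}_1^t(A) & \mathrm{row}_2(A)\,\mathrm{row}_2^t(A) & \cdots & \mathrm{row}_2(A)\,\mathrm{row}_r^t(A) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{row}_r(A)\,\mathrm{row}_1^t(A) & \mathrm{row}_r(A)\,\mathrm{row}_2^t(A) & \cdots & \mathrm{row}_r(A)\,\mathrm{row}_r^t(A)
\end{bmatrix}.
\end{aligned}
\]
Similarly,
\[
\begin{aligned}
A^t A &= \begin{bmatrix} \mathrm{row}_1^t(A) & \mathrm{row}_2^t(A) & \cdots & \mathrm{row}_r^t(A) \end{bmatrix}
\begin{bmatrix} \mathrm{row}_1(A) \\ \mathrm{row}_2(A) \\ \vdots \\ \mathrm{row}_r(A) \end{bmatrix} \\
&= \mathrm{row}_1^t(A)\,\mathrm{row}_1(A) + \mathrm{row}_2^t(A)\,\mathrm{row}_2(A) + \cdots + \mathrm{row}_r^t(A)\,\mathrm{row}_r(A)
= \sum_{i=1}^{r} \mathrm{row}_i^t(A)\,\mathrm{row}_i(A)
\end{aligned}
\]
and
\[
\begin{aligned}
A^t A &= \begin{bmatrix} \mathrm{col}_1^t(A) \\ \mathrm{col}_2^t(A) \\ \vdots \\ \mathrm{col}_c^t(A) \end{bmatrix}
\begin{bmatrix} \mathrm{col}_1(A) & \mathrm{col}_2(A) & \cdots & \mathrm{col}_c(A) \end{bmatrix} \\
&= \begin{bmatrix}
\mathrm{col}_1^t(A)\,\mathrm{col}_1(A) & \mathrm{col}_1^t(A)\,\mathrm{col}_2(A) & \cdots & \mathrm{col}_1^t(A)\,\mathrm{col}_c(A) \\
\mathrm{col}_2^t(A)\,\mathrm{col}_1(A) & \mathrm{col}_2^t(A)\,\mathrm{col}_2(A) & \cdots & \mathrm{col}_2^t(A)\,\mathrm{col}_c(A) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{col}_c^t(A)\,\mathrm{col}_1(A) & \mathrm{col}_c^t(A)\,\mathrm{col}_2(A) & \cdots & \mathrm{col}_c^t(A)\,\mathrm{col}_c(A)
\end{bmatrix}.
\end{aligned}
\]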
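For instance, let
\[
A = \begin{bmatrix} 1 & 2 & -1 \\ 3 & 0 & 1 \end{bmatrix}
\quad \text{and} \quad
A^t = \begin{bmatrix} 1 & 3 \\ 2 & 0 \\ -1 & 1 \end{bmatrix}.
\]
Then,
\[
\begin{aligned}
AA^t &= \begin{bmatrix} \mathrm{col}_1(A) & \mathrm{col}_2(A) & \mathrm{col}_3(A) \end{bmatrix}
\begin{bmatrix} \mathrm{row}_1(A^t) \\ \mathrm{row}_2(A^t) \\ \mathrm{row}_3(A^t) \end{bmatrix}
= \begin{bmatrix} \mathrm{col}_1(A) & \mathrm{col}_2(A) & \mathrm{col}_3(A) \end{bmatrix}
\begin{bmatrix} \mathrm{col}_1^t(A) \\ \mathrm{col}_2^t(A) \\ \mathrm{col}_3^t(A) \end{bmatrix} \\
&= \mathrm{col}_1(A)\,\mathrm{col}_1^t(A) + \mathrm{col}_2(A)\,\mathrm{col}_2^t(A) + \mathrm{col}_3(A)\,\mathrm{col}_3^t(A) \\
&= \begin{bmatrix} 1 \\ 3 \end{bmatrix} \begin{bmatrix} 1 & 3 \end{bmatrix}
+ \begin{bmatrix} 2 \\ 0 \end{bmatrix} \begin{bmatrix} 2 & 0 \end{bmatrix}
+ \begin{bmatrix} -1 \\ 1 \end{bmatrix} \begin{bmatrix} -1 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 3 \\ 3 & 9 \end{bmatrix}
+ \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix}
+ \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 6 & 2 \\ 2 & 10 \end{bmatrix}.
\end{aligned}
\]
In addition,
\[
\begin{aligned}
A^t A &= \mathrm{row}_1^t(A)\,\mathrm{row}_1(A) + \mathrm{row}_2^t(A)\,\mathrm{row}_2(A) \\
&= \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} \begin{bmatrix} 1 & 2 & -1 \end{bmatrix}
+ \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix} \begin{bmatrix} 3 & 0 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 2 & -1 \\ 2 & 4 & -2 \\ -1 & -2 & 1 \end{bmatrix}
+ \begin{bmatrix} 9 & 0 & 3 \\ 0 & 0 & 0 \\ 3 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 10 & 2 & 2 \\ 2 & 4 & -2 \\ 2 & -2 & 2 \end{bmatrix}.
\end{aligned}
\]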
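Note:
Let $A$ and $B$ be symmetric matrices. Then, $AB$ is not necessarily equal to
$BA = (AB)^t$. That is, $AB$ might not be a symmetric matrix.
Example 4:
\[
A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 3 & 7 \\ 7 & 6 \end{bmatrix}.
\]
Then,
\[
AB = \begin{bmatrix} 17 & 19 \\ 27 & 32 \end{bmatrix} \neq BA = \begin{bmatrix} 17 & 27 \\ 19 & 32 \end{bmatrix}.
\]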
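The symmetry facts of this subsection can be verified mechanically. The sketch below uses the matrices A and B of Example 4; the helper names and the rectangular matrix M are hypothetical choices made for illustration.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def is_symmetric(A):
    return A == transpose(A)


# Symmetric matrices from Example 4: their product is not symmetric.
A = [[1, 2], [2, 3]]
B = [[3, 7], [7, 6]]
AB = matmul(A, B)
BA = matmul(B, A)

# For any rectangular M, both M M^t and M^t M are symmetric.
M = [[1, 2, -1], [3, 0, 1]]          # illustrative 2 x 3 matrix
MMt = matmul(M, transpose(M))        # 2 x 2
MtM = matmul(transpose(M), M)        # 3 x 3
```

Here BA equals the transpose of AB, as the note states for symmetric A and B.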
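Properties of $AA^t$ and $A^t A$:
(a)
\[
A^t A = 0 \iff A = 0, \qquad \mathrm{tr}(A^t A) = 0 \iff A = 0 .
\]
(b)
\[
PAA^t = QAA^t \iff PA = QA .
\]
[proof]
(a)
Let
\[
S = A^t A = \begin{bmatrix}
\mathrm{col}_1^t(A)\,\mathrm{col}_1(A) & \mathrm{col}_1^t(A)\,\mathrm{col}_2(A) & \cdots & \mathrm{col}_1^t(A)\,\mathrm{col}_c(A) \\
\mathrm{col}_2^t(A)\,\mathrm{col}_1(A) & \mathrm{col}_2^t(A)\,\mathrm{col}_2(A) & \cdots & \mathrm{col}_2^t(A)\,\mathrm{col}_c(A) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{col}_c^t(A)\,\mathrm{col}_1(A) & \mathrm{col}_c^t(A)\,\mathrm{col}_2(A) & \cdots & \mathrm{col}_c^t(A)\,\mathrm{col}_c(A)
\end{bmatrix}
= [s_{ij}] = 0 .
\]
Thus, for $j = 1, 2, \ldots, c$,
\[
s_{jj} = \mathrm{col}_j^t(A)\,\mathrm{col}_j(A)
= \begin{bmatrix} a_{1j} & a_{2j} & \cdots & a_{rj} \end{bmatrix}
\begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{rj} \end{bmatrix}
= a_{1j}^2 + a_{2j}^2 + \cdots + a_{rj}^2 = 0
\]
\[
\iff a_{1j} = a_{2j} = \cdots = a_{rj} = 0 \iff A = 0 .
\]
Likewise,
\[
\begin{aligned}
\mathrm{tr}(A^t A) = \mathrm{tr}(S) &= s_{11} + s_{22} + \cdots + s_{cc} \\
&= \mathrm{col}_1^t(A)\,\mathrm{col}_1(A) + \mathrm{col}_2^t(A)\,\mathrm{col}_2(A) + \cdots + \mathrm{col}_c^t(A)\,\mathrm{col}_c(A) \\
&= a_{11}^2 + a_{21}^2 + \cdots + a_{r1}^2 + a_{12}^2 + a_{22}^2 + \cdots + a_{r2}^2 + \cdots + a_{1c}^2 + a_{2c}^2 + \cdots + a_{rc}^2 \\
&= 0
\end{aligned}
\]
\[
\iff a_{ij}^2 = 0, \; i = 1, 2, \ldots, r; \; j = 1, 2, \ldots, c \iff a_{ij} = 0 \iff A = 0 .
\]
(b)
Since $PAA^t = QAA^t$, we have $PAA^t - QAA^t = 0$, so
\[
\begin{aligned}
(PAA^t - QAA^t)(P^t - Q^t) &= (PA - QA)A^t (P^t - Q^t) \\
&= (PA - QA)(A^t P^t - A^t Q^t) \\
&= (PA - QA)(PA - QA)^t \\
&= 0 .
\end{aligned}
\]
By (a),
\[
(PA - QA)^t = 0 \iff PA - QA = 0 \iff PA = QA .
\]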
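Note:
An $r \times r$ matrix $B_{r \times r}$ is defined as skew-symmetric if $B = -B^t$. That is,
$b_{ij} = -b_{ji}$, $b_{ii} = 0$.
Example 5:
\[
B = \begin{bmatrix} 0 & 4 & 5 \\ -4 & 0 & 6 \\ -5 & -6 & 0 \end{bmatrix}.
\]
Thus,
\[
B^t = \begin{bmatrix} 0 & -4 & -5 \\ 4 & 0 & -6 \\ 5 & 6 & 0 \end{bmatrix}
\quad \Rightarrow \quad
-B^t = \begin{bmatrix} 0 & 4 & 5 \\ -4 & 0 & 6 \\ -5 & -6 & 0 \end{bmatrix} = B .
\]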
2.2 Idempotent Matrices:
Definition of idempotent matrices:
A square matrix $K$ is said to be idempotent if
\[
K^2 = K .
\]
Properties of idempotent matrices:
1. $K^r = K$ for $r$ being a positive integer.
2. $I - K$ is idempotent.
3. If $K_1$ and $K_2$ are idempotent matrices and $K_1 K_2 = K_2 K_1$, then
$K_1 K_2$ is idempotent.
[proof:]
1.
For $r = 1$, $K^1 = K$.
Suppose $K^r = K$ is true; then $K^{r+1} = K^r K = K K = K^2 = K$.
By induction, $K^r = K$ for $r$ being any positive integer.
2.
\[
(I - K)(I - K) = I - K - K + K^2 = I - K - K + K = I - K .
\]
3.
\[
(K_1 K_2)(K_1 K_2) = K_1 K_2 K_1 K_2 = K_1 K_1 K_2 K_2 \;\; (\text{since } K_1 K_2 = K_2 K_1)
= K_1^2 K_2^2 = K_1 K_2 .
\]
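Example 1:
Let $A_{r \times c}$ be an $r \times c$ matrix such that $A^t A$ is invertible. Then,
$K = A (A^t A)^{-1} A^t$ is an idempotent matrix since
\[
KK = A (A^t A)^{-1} A^t A (A^t A)^{-1} A^t = A I (A^t A)^{-1} A^t = A (A^t A)^{-1} A^t = K .
\]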
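The idempotency of $A(A^tA)^{-1}A^t$ can be checked with exact rational arithmetic. The sketch below is a minimal illustration under stated assumptions: the 3 x 2 matrix A is made up for the example, the helper names are hypothetical, and the inverse is hard-coded for the 2 x 2 case only.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse_2x2(M):
    """Exact inverse of a 2 x 2 integer matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


A = [[1, 0], [1, 1], [1, 2]]                  # illustrative 3 x 2 matrix
At = transpose(A)
K = matmul(matmul(A, inverse_2x2(matmul(At, A))), At)   # 3 x 3

KK = matmul(K, K)
# Property 2: I - K is idempotent as well.
I_minus_K = [[(1 if i == j else 0) - K[i][j] for j in range(3)] for i in range(3)]
```

Using Fraction avoids floating-point round-off, so the comparison of KK with K can be exact.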
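Note:
A matrix $A$ satisfying $A^2 = 0$ is called nilpotent, and one for which $A^2 = I$ could
be called unipotent.
Example 2:
\[
A = \begin{bmatrix} 1 & 2 & 5 \\ 2 & 4 & 10 \\ -1 & -2 & -5 \end{bmatrix}
\;\Rightarrow\; A^2 = 0 \;\Rightarrow\; A \text{ is nilpotent.}
\]
\[
B = \begin{bmatrix} 1 & 3 \\ 0 & -1 \end{bmatrix}
\;\Rightarrow\; B^2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\;\Rightarrow\; B \text{ is unipotent.}
\]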
Note:
If $K$ is an idempotent matrix, then $K - I$ might not be idempotent.
2.3 Orthogonal Matrices:
Definition of orthogonality:
Two $n \times 1$ vectors $u$ and $v$ are said to be orthogonal if
\[
u^t v = v^t u = 0 .
\]
A set of $n \times 1$ vectors $\{x_1, x_2, \ldots, x_n\}$ is said to be orthonormal if
\[
x_i^t x_i = 1, \quad x_i^t x_j = 0, \quad i \neq j, \; i, j = 1, 2, \ldots, n .
\]
Definition of orthogonal matrix:
An $n \times n$ square matrix $P$ is said to be orthogonal if
\[
PP^t = P^t P = I_{n \times n} .
\]
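Note:
\[
\begin{aligned}
PP^t &= \begin{bmatrix}
\mathrm{row}_1(P)\,\mathrm{row}_1^t(P) & \mathrm{row}_1(P)\,\mathrm{row}_2^t(P) & \cdots & \mathrm{row}_1(P)\,\mathrm{row}_n^t(P) \\
\mathrm{row}_2(P)\,\mathrm{row}_1^t(P) & \mathrm{row}_2(P)\,\mathrm{row}_2^t(P) & \cdots & \mathrm{row}_2(P)\,\mathrm{row}_n^t(P) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{row}_n(P)\,\mathrm{row}_1^t(P) & \mathrm{row}_n(P)\,\mathrm{row}_2^t(P) & \cdots & \mathrm{row}_n(P)\,\mathrm{row}_n^t(P)
\end{bmatrix}
= \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix} \\
&= \begin{bmatrix}
\mathrm{col}_1^t(P)\,\mathrm{col}_1(P) & \mathrm{col}_1^t(P)\,\mathrm{col}_2(P) & \cdots & \mathrm{col}_1^t(P)\,\mathrm{col}_n(P) \\
\mathrm{col}_2^t(P)\,\mathrm{col}_1(P) & \mathrm{col}_2^t(P)\,\mathrm{col}_2(P) & \cdots & \mathrm{col}_2^t(P)\,\mathrm{col}_n(P) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{col}_n^t(P)\,\mathrm{col}_1(P) & \mathrm{col}_n^t(P)\,\mathrm{col}_2(P) & \cdots & \mathrm{col}_n^t(P)\,\mathrm{col}_n(P)
\end{bmatrix}
= P^t P
\end{aligned}
\]
\[
\iff
\begin{cases}
\mathrm{row}_i(P)\,\mathrm{row}_i^t(P) = 1, \quad \mathrm{row}_i(P)\,\mathrm{row}_j^t(P) = 0, \\
\mathrm{col}_i^t(P)\,\mathrm{col}_i(P) = 1, \quad \mathrm{col}_i^t(P)\,\mathrm{col}_j(P) = 0,
\end{cases}
\quad i \neq j .
\]
Thus,
\[
\{\mathrm{row}_1^t(P), \mathrm{row}_2^t(P), \ldots, \mathrm{row}_n^t(P)\}
\quad \text{and} \quad
\{\mathrm{col}_1(P), \mathrm{col}_2(P), \ldots, \mathrm{col}_n(P)\}
\]
are both orthonormal sets.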
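Example 1:
(a) Helmert Matrices:
The Helmert matrix of order $n$ has the first row
\[
\begin{bmatrix} 1/\sqrt{n} & 1/\sqrt{n} & \cdots & 1/\sqrt{n} \end{bmatrix},
\]
and the other $n - 1$ rows ($i = 2, 3, \ldots, n$) have the form
\[
\Big[\; \underbrace{\tfrac{1}{\sqrt{(i-1)i}} \;\; \tfrac{1}{\sqrt{(i-1)i}} \;\; \cdots \;\; \tfrac{1}{\sqrt{(i-1)i}}}_{(i-1) \text{ items}} \;\;\; \tfrac{-(i-1)}{\sqrt{(i-1)i}} \;\;\; \underbrace{0 \;\; \cdots \;\; 0}_{(n-i) \text{ items}} \;\Big].
\]
For example, for $n = 4$,
\[
H_4 = \begin{bmatrix}
1/\sqrt{4} & 1/\sqrt{4} & 1/\sqrt{4} & 1/\sqrt{4} \\
1/\sqrt{1 \cdot 2} & -1/\sqrt{1 \cdot 2} & 0 & 0 \\
1/\sqrt{2 \cdot 3} & 1/\sqrt{2 \cdot 3} & -2/\sqrt{2 \cdot 3} & 0 \\
1/\sqrt{3 \cdot 4} & 1/\sqrt{3 \cdot 4} & 1/\sqrt{3 \cdot 4} & -3/\sqrt{3 \cdot 4}
\end{bmatrix}
= \begin{bmatrix}
1/\sqrt{4} & 1/\sqrt{4} & 1/\sqrt{4} & 1/\sqrt{4} \\
1/\sqrt{2} & -1/\sqrt{2} & 0 & 0 \\
1/\sqrt{6} & 1/\sqrt{6} & -2/\sqrt{6} & 0 \\
1/\sqrt{12} & 1/\sqrt{12} & 1/\sqrt{12} & -3/\sqrt{12}
\end{bmatrix}.
\]
In statistics, we can use $H$ to find a set of uncorrelated random variables.
Suppose $Z_1, Z_2, Z_3, Z_4$ are random variables with
\[
\mathrm{Cov}(Z_i, Z_j) = 0, \quad \mathrm{Cov}(Z_i, Z_i) = \sigma^2, \quad i \neq j, \; i, j = 1, 2, 3, 4 .
\]
Let
\[
X = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix} = H_4 Z
= \begin{bmatrix}
1/\sqrt{4} & 1/\sqrt{4} & 1/\sqrt{4} & 1/\sqrt{4} \\
1/\sqrt{2} & -1/\sqrt{2} & 0 & 0 \\
1/\sqrt{6} & 1/\sqrt{6} & -2/\sqrt{6} & 0 \\
1/\sqrt{12} & 1/\sqrt{12} & 1/\sqrt{12} & -3/\sqrt{12}
\end{bmatrix}
\begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{bmatrix}
= \begin{bmatrix}
(1/\sqrt{4})(Z_1 + Z_2 + Z_3 + Z_4) \\
(1/\sqrt{2})(Z_1 - Z_2) \\
(1/\sqrt{6})(Z_1 + Z_2 - 2Z_3) \\
(1/\sqrt{12})(Z_1 + Z_2 + Z_3 - 3Z_4)
\end{bmatrix}.
\]
Then,
\[
\mathrm{Cov}(X_i, X_j) = \sigma^2\, \mathrm{row}_i(H_4)\, \mathrm{row}_j^t(H_4) = 0, \quad i \neq j,
\]
since $\{\mathrm{row}_1^t(H_4), \mathrm{row}_2^t(H_4), \mathrm{row}_3^t(H_4), \mathrm{row}_4^t(H_4)\}$ is an orthonormal set of
vectors. That is, $X_1, X_2, X_3, X_4$ are uncorrelated random variables. Also,
\[
X_2^2 + X_3^2 + X_4^2 = \sum_{i=1}^{4} (Z_i - \bar{Z})^2 ,
\]
where
\[
\bar{Z} = \frac{\sum_{i=1}^{4} Z_i}{4} .
\]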
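The Helmert construction translates directly into code. The sketch below builds the matrix of order n from the row pattern described above and checks, for one made-up vector z, that the squared entries of Hz after the first add up to the sum of squared deviations from the mean. The function names and the vector z are illustrative assumptions.

```python
import math


def helmert(n):
    """Helmert matrix of order n, built row by row."""
    H = [[1 / math.sqrt(n)] * n]                      # first row
    for i in range(2, n + 1):
        d = math.sqrt((i - 1) * i)
        # (i-1) copies of 1/d, then -(i-1)/d, then (n-i) zeros
        H.append([1 / d] * (i - 1) + [-(i - 1) / d] + [0.0] * (n - i))
    return H


def row_gram(P):
    """P P^t: all inner products between pairs of rows of P."""
    n = len(P)
    return [[sum(P[i][k] * P[j][k] for k in range(len(P[0]))) for j in range(n)]
            for i in range(n)]


H4 = helmert(4)
G = row_gram(H4)      # should be close to the 4 x 4 identity

z = [1.0, 4.0, 2.0, 7.0]                                   # illustrative data
x = [sum(h * zi for h, zi in zip(row, z)) for row in H4]   # x = H4 z
zbar = sum(z) / len(z)
lhs = x[1] ** 2 + x[2] ** 2 + x[3] ** 2
rhs = sum((zi - zbar) ** 2 for zi in z)
```

Floating-point arithmetic is used here, so the comparisons hold only up to a small tolerance.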
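(b) Givens Matrices:
Let the orthogonal matrix be
\[
G = \begin{bmatrix} \cos(\alpha) & \sin(\alpha) \\ -\sin(\alpha) & \cos(\alpha) \end{bmatrix}.
\]
$G$ is referred to as a Givens matrix of order 2. For a Givens matrix of order 3,
there are $\binom{3}{2} = 3$ different forms,
\[
G_{12} = \begin{bmatrix} \cos(\alpha) & \sin(\alpha) & 0 \\ -\sin(\alpha) & \cos(\alpha) & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
G_{13} = \begin{bmatrix} \cos(\alpha) & 0 & \sin(\alpha) \\ 0 & 1 & 0 \\ -\sin(\alpha) & 0 & \cos(\alpha) \end{bmatrix}, \quad
G_{23} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\alpha) & \sin(\alpha) \\ 0 & -\sin(\alpha) & \cos(\alpha) \end{bmatrix}.
\]
The general form of a Givens matrix $G_{ij}$ of order 3 is an identity matrix except
for 4 elements: $\cos(\alpha)$, $\sin(\alpha)$ and $-\sin(\alpha)$ are in the $i$'th and $j$'th rows and
columns. Similarly, for a Givens matrix of order 4, there are $\binom{4}{2} = 6$ different
forms,
\[
G_{12} = \begin{bmatrix} \cos(\alpha) & \sin(\alpha) & 0 & 0 \\ -\sin(\alpha) & \cos(\alpha) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
G_{13} = \begin{bmatrix} \cos(\alpha) & 0 & \sin(\alpha) & 0 \\ 0 & 1 & 0 & 0 \\ -\sin(\alpha) & 0 & \cos(\alpha) & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},
\]
\[
G_{14} = \begin{bmatrix} \cos(\alpha) & 0 & 0 & \sin(\alpha) \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\sin(\alpha) & 0 & 0 & \cos(\alpha) \end{bmatrix}, \quad
G_{23} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos(\alpha) & \sin(\alpha) & 0 \\ 0 & -\sin(\alpha) & \cos(\alpha) & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},
\]
\[
G_{24} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos(\alpha) & 0 & \sin(\alpha) \\ 0 & 0 & 1 & 0 \\ 0 & -\sin(\alpha) & 0 & \cos(\alpha) \end{bmatrix}, \quad
G_{34} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \cos(\alpha) & \sin(\alpha) \\ 0 & 0 & -\sin(\alpha) & \cos(\alpha) \end{bmatrix}.
\]
For the Givens matrix of order $n$, there are $\binom{n}{2}$ different forms. The general
form of $G_{rs} = [g_{ij}]$ is an identity matrix except for 4 elements,
\[
g_{rr} = g_{ss} = \cos(\alpha), \quad g_{rs} = -g_{sr} = \sin(\alpha), \quad r \neq s .
\]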
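A Givens matrix of any order can be generated from the general form just described. The sketch below is an illustration under stated assumptions: the function name givens, the 1-based index convention with r < s, and the angle 0.7 are all choices made for this example.

```python
import math
from itertools import combinations


def givens(n, r, s, alpha):
    """Order-n Givens matrix G_rs (1-based indices, r < s)."""
    if not (1 <= r < s <= n):
        raise ValueError("need 1 <= r < s <= n")
    G = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    G[r - 1][r - 1] = G[s - 1][s - 1] = math.cos(alpha)
    G[r - 1][s - 1] = math.sin(alpha)
    G[s - 1][r - 1] = -math.sin(alpha)
    return G


def is_orthogonal(P, tol=1e-12):
    """Check P P^t = I; for a square matrix this also gives P^t P = I."""
    n = len(P)
    for i in range(n):
        for j in range(n):
            dot = sum(P[i][k] * P[j][k] for k in range(n))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


# All index pairs (r, s) with r < s for order 4: one Givens form per pair.
pairs = list(combinations(range(1, 5), 2))
forms = [givens(4, r, s, 0.7) for r, s in pairs]
```

Orthogonality follows from $\cos^2(\alpha) + \sin^2(\alpha) = 1$, so the check should pass for any angle.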
2.4 Positive Definite Matrices:
Definition of positive definite matrix:
A symmetric $n \times n$ matrix $A$ satisfying
\[
x^t_{1 \times n} A_{n \times n} x_{n \times 1} > 0 \quad \text{for all } x \neq 0
\]
is referred to as a positive definite (p.d.) matrix.
Intuition:
If $a x^2 > 0$ for all real numbers $x$, $x \neq 0$, then the real number $a$ is positive.
Similarly, when $x$ is an $n \times 1$ vector, $A$ is an $n \times n$ matrix and $x^t A x > 0$ for all $x \neq 0$, the
matrix $A$ is “positive”.
Note:
A symmetric $n \times n$ matrix $A$ satisfying
\[
x^t_{1 \times n} A_{n \times n} x_{n \times 1} \geq 0 \quad \text{for all } x \neq 0
\]
is referred to as a positive semidefinite (p.s.d.) matrix.
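Example 1:
Let
\[
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\quad \text{and} \quad
l = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}.
\]
Thus, with $\bar{x} = \sum_{i=1}^{n} x_i / n$,
\[
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2 &= \sum_{i=1}^{n} x_i^2 - n \bar{x}^2 \\
&= \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
- n \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} 1/n \\ 1/n \\ \vdots \\ 1/n \end{bmatrix}
\begin{bmatrix} 1/n & 1/n & \cdots & 1/n \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \\
&= x^t I x - x^t \left( n \cdot \frac{1}{n} \cdot \frac{1}{n}\, l l^t \right) x
= x^t I x - x^t \left( \frac{l l^t}{n} \right) x \\
&= x^t \left( I - \frac{l l^t}{n} \right) x .
\end{aligned}
\]
Let $A = I - \dfrac{l l^t}{n}$. Then, $A$ is positive semidefinite since for $x \neq 0$,
\[
x^t A x = \sum_{i=1}^{n} (x_i - \bar{x})^2 \geq 0 .
\]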
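The identity between the quadratic form and the sum of squared deviations can be confirmed with exact arithmetic. In the sketch below, the function names and the data vector are hypothetical; the second vector, with all entries equal, shows why the matrix is only semidefinite rather than definite.

```python
from fractions import Fraction


def centering_matrix(n):
    """A = I - l l^t / n, with l the n x 1 vector of ones."""
    return [[(1 if i == j else 0) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def quadratic_form(x, A):
    """x^t A x for a vector x and a square matrix A."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def sum_sq_dev(x):
    """Sum of squared deviations of the entries of x from their mean."""
    xbar = Fraction(sum(x), len(x))
    return sum((xi - xbar) ** 2 for xi in x)


A = centering_matrix(3)
x = [2, 4, 9]                       # illustrative data, mean 5
q = quadratic_form(x, A)            # (-3)^2 + (-1)^2 + 4^2

# A nonzero x with all entries equal gives x^t A x = 0.
q_const = quadratic_form([5, 5, 5], A)
```

Since a nonzero vector can produce a zero quadratic form, this A satisfies the semidefinite condition but not the strict inequality required for positive definiteness.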