
Solutions and Hints for Selected Exercises

Matrix Algebra by James E. Gentle

Exercises Beginning on Page 63


2.1a. Assume {v1, v2, . . . , vk} is a linearly independent set of vectors. Now
consider a pair of vectors vi and vj from this set. If {vi, vj} is not linearly
independent, then there exist ai and aj, not both 0, such that ai vi + aj vj = 0.
In that case, however, we would have

a1 v1 + · · · + ai vi + · · · + aj vj + · · · + ak vk = 0,

in which at least one of ai, aj is nonzero (and all other coefficients are 0),
contradicting the condition that the full set is linearly independent.
2.1b. Let v1 = (1, 1, 0), v2 = (1, 0, 1), and v3 = (2, 1, 1). Now {v1 , v2 }, {v1 , v3 },
and {v2 , v3 } are all linearly independent sets. We can see this because
each pair has a vector with a zero that is unmatched by the other vector.
The full set, however, is not linearly independent because v1 +v2 −v3 = 0.
2.4b. Let one vector space consist of all vectors of the form (a, 0) and the
other consist of all vectors of the form (0, b). The vector (a, b) is not in
the union if a ≠ 0 and b ≠ 0.
2.9. Give a counterexample to the triangle inequality; for example, let x =
(9, 25) and y = (16, 144).
2.12a. We first observe that if $\|x\|_p = 0$ or $\|y\|_q = 0$, we have $x = 0$ or $y = 0$,
and so the inequality is satisfied because both sides are 0; hence, we need
only consider the case $\|x\|_p > 0$ and $\|y\|_q > 0$. We also observe that if
$p = 1$ or $q = 1$, we have the Manhattan and Chebyshev norms and the
inequality is satisfied; hence we need only consider the case $1 < p < \infty$.
Now, for $p$ and $q$ as given, for any numbers $a_i$ and $b_i$, there are numbers
$s_i$ and $t_i$ such that $|a_i| = e^{s_i/p}$ and $|b_i| = e^{t_i/q}$. Because $e^x$ is a convex
function, we have $e^{s_i/p + t_i/q} \le \frac{1}{p}e^{s_i} + \frac{1}{q}e^{t_i}$, or
$$a_i b_i \le |a_i||b_i| \le |a_i|^p/p + |b_i|^q/q.$$
Now let
$$a_i = \frac{x_i}{\|x\|_p} \quad\text{and}\quad b_i = \frac{y_i}{\|y\|_q},$$
and so
$$\frac{x_i y_i}{\|x\|_p \|y\|_q} \le \frac{1}{p}\frac{|x_i|^p}{\|x\|_p^p} + \frac{1}{q}\frac{|y_i|^q}{\|y\|_q^q}.$$
Now, summing these inequalities over $i$, we have
$$\frac{\langle x, y\rangle}{\|x\|_p \|y\|_q} \le \frac{1}{p}\frac{\|x\|_p^p}{\|x\|_p^p} + \frac{1}{q}\frac{\|y\|_q^q}{\|y\|_q^q} = 1.$$
Hence, we have the desired result.

As we see from this proof, the inequality is actually a little stronger
than stated. If we define $u$ and $v$ by $u_i = |x_i|$ and $v_i = |y_i|$, we have
$$\langle x, y\rangle \le \langle u, v\rangle \le \|x\|_p\|y\|_q.$$
We observe that equality occurs if and only if
$$\left(\frac{|x_i|}{\|x\|_p}\right)^{1/q} = \left(\frac{|y_i|}{\|y\|_q}\right)^{1/p}$$
and
$$\mathrm{sign}(x_i) = \mathrm{sign}(y_i)$$
for all $i$.
We note a special case by letting $y = 1$:
$$\bar{x} \le \|x\|_p,$$
and with $p = 2$, we have a special case of the Cauchy-Schwarz inequality,
$$n\bar{x}^2 \le \|x\|_2^2,$$
which guarantees that $\mathrm{V}(x) \ge 0$.
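
A quick numerical check of the inequality can be done in R; the vectors and the exponent p below are arbitrary choices, with q taken as the conjugate exponent.

x <- c(1, -2, 3)
y <- c(-1, 2, -3)
p <- 3
q <- p/(p-1)                          # conjugate exponent, 1/p + 1/q = 1
lhs <- abs(sum(x*y))                  # |<x, y>|
rhs <- sum(abs(x)^p)^(1/p) * sum(abs(y)^q)^(1/q)
lhs <= rhs
# should be TRUE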


2.12b. Using the triangle inequality for the absolute value, we have $|x_i + y_i| \le
|x_i| + |y_i|$. This yields the result for $p = 1$ and $p = \infty$ (in the limit).
Now assume $1 < p < \infty$. We have
$$\|x + y\|_p^p \le \sum_{i=1}^n |x_i + y_i|^{p-1}|x_i| + \sum_{i=1}^n |x_i + y_i|^{p-1}|y_i|.$$
Now, letting $q = p/(p-1)$, we apply Hölder's inequality to each of the
terms on the right:
$$\sum_{i=1}^n |x_i + y_i|^{p-1}|x_i| \le \left(\sum_{i=1}^n |x_i + y_i|^{(p-1)q}\right)^{1/q}\left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$$
and
$$\sum_{i=1}^n |x_i + y_i|^{p-1}|y_i| \le \left(\sum_{i=1}^n |x_i + y_i|^{(p-1)q}\right)^{1/q}\left(\sum_{i=1}^n |y_i|^p\right)^{1/p},$$
so
$$\sum_{i=1}^n |x_i + y_i|^p \le \left(\sum_{i=1}^n |x_i + y_i|^{(p-1)q}\right)^{1/q}\left[\left(\sum_{i=1}^n |x_i|^p\right)^{1/p} + \left(\sum_{i=1}^n |y_i|^p\right)^{1/p}\right]$$
or, because $(p-1)q = p$ and $1 - \frac{1}{q} = \frac{1}{p}$,
$$\left(\sum_{i=1}^n |x_i + y_i|^p\right)^{1/p} \le \left(\sum_{i=1}^n |x_i|^p\right)^{1/p} + \left(\sum_{i=1}^n |y_i|^p\right)^{1/p},$$
which is the same as
$$\|x + y\|_p \le \|x\|_p + \|y\|_p,$$
the triangle inequality.
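
The triangle inequality can likewise be checked numerically in R for a few arbitrary values of p (a sketch only):

normp <- function(x, p) sum(abs(x)^p)^(1/p)
x <- c(1, 2, 3)
y <- c(-1, 2, -3)
for (p in c(1, 1.5, 2, 3))
  print(normp(x+y, p) <= normp(x, p) + normp(y, p))
# each comparison should be TRUE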
2.21e. In $\mathbb{R}^3$,
$$\mathrm{angle}(x, y) = \sin^{-1}\left(\frac{\|x \times y\|}{\|x\|\,\|y\|}\right).$$
Because $x \times y = -y \times x$, this allows us to determine the angle from $x$
to $y$; that is, the direction within $(-\pi, \pi]$ in which $x$ would be rotated
to $y$.
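
In R, the magnitude of this angle can be computed directly from the formula above, reusing the cross product computation of Exercise 2.27 (a sketch; the vectors are arbitrary):

x <- c(1, 2, 3)
y <- c(-1, 2, -3)
cross <- c(x[2]*y[3]-x[3]*y[2],
           x[3]*y[1]-x[1]*y[3],
           x[1]*y[2]-x[2]*y[1])
asin(sqrt(sum(cross^2))/(sqrt(sum(x^2))*sqrt(sum(y^2))))   # angle in radians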
2.23. Just consider the orthogonal vectors $x = (1, 0)$ and $y = (0, 1)$. The
centered vectors are $x_c = (\frac{1}{2}, -\frac{1}{2})$ and $y_c = (-\frac{1}{2}, \frac{1}{2})$. The angle between
the uncentered vectors is $\pi/2$, while that between the centered vectors
is $\pi$.
2.26a.

x <- c(1,2,3)
y <- c(-1,2,-3)
sum(x*y)
# [1] -6

2.26b.

Dot <- function(x, y){


return(sum(x*y))
}
Dot(x, y)
# [1] -6

2.26c.

"%.%" <- function(x, y){


return(sum(x*y))
}
x %.% y
# [1] -6

2.27.

x <- c(1,2,3)
y <- c(-1,2,-3)
c(x[2]*y[3]-x[3]*y[2],
x[3]*y[1]-x[1]*y[3],
x[1]*y[2]-x[2]*y[1])
# [1] -12 0 4

2.28.

x <- c(1,2,3)
sum(abs(x))
# [1] 6
sqrt(sum(x*x))
# [1] 3.741657
max(abs(x))
# [1] 3

2.29.

x <- c(1,2,3)
y <- x/sum(x*x)
y
# [1] 0.07142857 0.14285714 0.21428571
sum(x*y)
# [1] 1

2.30a.

x <- c(1,2,3)
y <- c(-1,2,-3)
ypx <- sum(x*y)*x/sum(x*x)
ypx
# [1] -0.4285714 -0.8571429 -1.2857143

2.30b.

c(x[2]*ypx[3]-x[3]*ypx[2],
  x[3]*ypx[1]-x[1]*ypx[3],
  x[1]*ypx[2]-x[2]*ypx[1])
# [1] -4.440892e-16 2.220446e-16 0.000000e+00

The actual cross product is (0, 0, 0). In the computed solution above, two of the
zeros come out as nonzero values of order 10^-16, which is essentially an additive zero.
2.31a.

x <- c(1,2,3)
y <- c(-1,2,-3)
z <- c(1,0,-3)
u1 <- x; u2 <- y; u3 <- z
u1 <- u1/sqrt(sum(u1*u1))
u2 <- u2 - sum(u1*u2)*u1
u3 <- u3 - sum(u1*u3)*u1
u2 <- u2/sqrt(sum(u2*u2))
u3 <- u3 - sum(u2*u3)*u2
u3 <- u3/sqrt(sum(u3*u3))

2.31b.

w <- c(5,4,3)
c1 <-sum(w*u1)
c2 <-sum(w*u2)
c3 <-sum(w*u3)
c1*u1 + c2*u2 + c3*u3
# [1] 5 4 3

2.32.

n <- 100
set.seed(12345)
u <- rnorm(n)
z <- rnorm(n)
v <- u+z

2.32a.

mu <- mean(u)
vu <- var(u)
mv <- mean(v)
vv <- var(v)
round(c(mu,vu,mv,vv),2)
# [1] 0.25 1.24 0.29 2.50

2.32b.

covuv <- cov(u,v)


coruv <- covuv/sqrt(vu*vv)
round(c(covuv,coruv),2)
# [1] 1.36 0.77

Exercises Beginning on Page 203


3.4. This exercise occurs in various guises in many different places, and the
simple approach is to add and subtract $\bar{x}$:
$$\begin{aligned}
(x - a)^{\mathrm{T}}(x - a) &= (x - \bar{x} - (a - \bar{x}))^{\mathrm{T}}(x - \bar{x} - (a - \bar{x})) \\
&= (x_c - (a - \bar{x}))^{\mathrm{T}}(x_c - (a - \bar{x})) \\
&= x_c^{\mathrm{T}}x_c + n(a - \bar{x})^2 - 2(a - \bar{x})\,1^{\mathrm{T}}x_c \\
&= x_c^{\mathrm{T}}x_c + n(a - \bar{x})^2,
\end{aligned}$$
because $1^{\mathrm{T}}x_c = 0$.
Finally, we get the expressions in equation (3.99) by writing $x_c^{\mathrm{T}}x_c$ as
$\mathrm{tr}(x_c x_c^{\mathrm{T}})$.
3.16. Write the n × m matrix A as
$$A = (a_{ij}) = [a_{*1}, \ldots, a_{*m}].$$
If A is of rank one, the maximum number of linearly independent
columns is one; hence, for $k = 2, \ldots, m$, $a_{*k} = c_k a_{*1}$, for some $c_k$.
Now let $x = a_{*1}$, which is an n-vector, and let y be an m-vector whose
first element is 1 and whose $k = 2, \ldots, m$ elements are the $c_k$s. We see
that $A = xy^{\mathrm{T}}$ by direct multiplication.
This decomposition is not unique, of course.
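
As a small R illustration (the rank-one matrix here is constructed arbitrarily, and the first row is assumed to have a nonzero first element):

A <- matrix(c(1, 2, 3), ncol=1) %*% t(c(1, 2, -1, 4))  # a 3 x 4 rank-one matrix
x <- A[, 1]                       # x = a_{*1}
y <- A[1, ]/A[1, 1]               # y_1 = 1 and y_k = c_k, assuming A[1,1] != 0
all.equal(x %*% t(y), A)          # should be TRUE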
3.17. If the elements of the square matrix A are integers, then each cofactor
$a_{(ij)}$ is an integer, and hence the elements of adj(A) are integers. If
|A| = ±1, then $A^{-1} = \pm\mathrm{adj}(A)$, and so the elements of $A^{-1}$ are integers.
An easy way to form a matrix whose determinant is ±1 is to form an
upper triangular matrix with either 1 or −1 on the diagonal. A more
“interesting matrix” that has the same determinant can then be formed
by use of elementary operations. (People teaching matrix algebra find
this useful!)
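
For instance, the following R sketch builds such a matrix from a unit upper triangular integer matrix and one elementary axpy operation, and checks that the inverse is an integer matrix (the particular entries are arbitrary):

U <- matrix(c(1, 0, 0,
              2, 1, 0,
              3, 4, 1), nrow=3)   # unit upper triangular, det = 1
E <- diag(3)
E[3, 1] <- 2                      # elementary operation: add 2*(row 1) to row 3
A <- E %*% U                      # integer matrix with det(A) = 1
det(A)                            # 1 (up to rounding)
solve(A)                          # all entries are integers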
3.18. Because the inverse of a matrix is unique, we can verify each equation
by multiplication by the inverse at the appropriate place. We can often
reduce an expression to a desired form by multiplication by the
identity $MM^{-1}$, where $M$ is some appropriate matrix. For example,
equation (3.187) can be verified by the equations
$$\begin{aligned}
(A + B)(A^{-1} - A^{-1}(A^{-1} + B^{-1})^{-1}A^{-1})
&= I - (A^{-1} + B^{-1})^{-1}A^{-1} + BA^{-1} - BA^{-1}(A^{-1} + B^{-1})^{-1}A^{-1} \\
&= I + BA^{-1} - (I + BA^{-1})(A^{-1} + B^{-1})^{-1}A^{-1} \\
&= (I + BA^{-1})(I - (A^{-1} + B^{-1})^{-1}A^{-1}) \\
&= B(B^{-1} + A^{-1})(I - (A^{-1} + B^{-1})^{-1}A^{-1}) \\
&= B(B^{-1} + A^{-1}) - BA^{-1} \\
&= I.
\end{aligned}$$
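
The identity can also be spot-checked numerically; in the following R sketch, A and B are arbitrary nonsingular matrices.

set.seed(1)
A <- crossprod(matrix(rnorm(9), 3, 3)) + diag(3)   # nonsingular
B <- crossprod(matrix(rnorm(9), 3, 3)) + diag(3)   # nonsingular
lhs <- solve(A + B)
rhs <- solve(A) - solve(A) %*% solve(solve(A) + solve(B)) %*% solve(A)
all.equal(lhs, rhs)                # should be TRUE up to rounding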

3.21. Express the nonzero elements of row i in the n × n matrix A as ai bk


for k = i, . . . , n, and the nonzero elements of column j as ak bj for
k = j, . . . , n. Then obtain expressions for the elements of A−1 . Show,
for example, that the diagonal elements of A−1 are (ai bi )−1 .
3.25. For property 8, let c be a nonzero eigenvalue of AB. Then there exists
v (≠ 0) such that ABv = cv, that is, BABv = Bcv. But this means
BAw = cw, where w = Bv ≠ 0 (because ABv ≠ 0), and so c is an
eigenvalue of BA. We use the same argument starting with an eigenvalue
of BA. For square matrices, there are no other eigenvalues, so the set
of eigenvalues is the same.
For property 9, see the discussion of similarity transformations on
page 164.
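
For property 8, a quick R illustration with arbitrary rectangular A and B:

set.seed(1)
A <- matrix(rnorm(12), nrow=3)    # 3 x 4
B <- matrix(rnorm(12), nrow=4)    # 4 x 3
eigen(A %*% B)$values             # eigenvalues of the 3 x 3 product AB
eigen(B %*% A)$values             # the same nonzero values, plus an extra zero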
3.37. Let A and B be such that AB is defined.
$$\begin{aligned}
\|AB\|_{\mathrm{F}}^2 &= \sum_{ij}\left(\sum_k a_{ik}b_{kj}\right)^2 \\
&\le \sum_{ij}\left(\sum_k a_{ik}^2\right)\left(\sum_k b_{kj}^2\right) \quad\text{(Cauchy-Schwarz)} \\
&= \left(\sum_{i,k} a_{ik}^2\right)\left(\sum_{k,j} b_{kj}^2\right) \\
&= \|A\|_{\mathrm{F}}^2\|B\|_{\mathrm{F}}^2.
\end{aligned}$$
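
A numerical spot check in R (A and B are arbitrary conformable matrices):

frob <- function(X) sqrt(sum(X^2))
set.seed(1)
A <- matrix(rnorm(12), nrow=3)
B <- matrix(rnorm(20), nrow=4)
frob(A %*% B) <= frob(A)*frob(B)  # should be TRUE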

3.40. Hints.
For inequality (3.325), use the Cauchy-Schwarz inequality.
For inequality (3.327), use Hölder’s inequality.
3.45a.

Epqa <- function(n, p=0, q=0, a=0){
  # Function to create an elementary operator matrix.
  # The operator matrix may be E_pq, E_p(a), or E_pq(a).
  Epqa <- NA
  if (n<2){
    print('n must be at least 2')
    return()
  }
  perm <- FALSE
  mult <- FALSE
  axpy <- FALSE
  if (p>0&q>0&a==0) perm <- TRUE
  if (p>0&q==0) mult <- TRUE
  if (p>0&q>0&a!=0) axpy <- TRUE
  if (perm+mult+axpy==0){
    print('No valid elementary operator matrix')
    return()
  }
  # Set Epqa to the identity
  Epqa <- matrix(c(1,rep(c(rep(0,n),1),n-1)),ncol=n)
  if (perm){ # permute rows p and q
    Epqa[p,p] <- 0
    Epqa[p,q] <- 1
    Epqa[q,q] <- 0
    Epqa[q,p] <- 1
  } else
  if (mult){ # multiply row p by a
    Epqa[p,p] <- a
  } else
  if (axpy){ # axpy: add a times row q to row p
    Epqa[p,q] <- a
  }
  return(Epqa)
}

3.45b. Epq is its own inverse.

n <- 5
p <- 2
q <- 4
E <- Epqa(n,p,q)
E%*%E

3.45c. For a ≠ 0, E_p(1/a) E_p(a) = I.

n <- 5
p <- 2
q <- 0
a <- 3
E1 <- Epqa(n,p,q,a)
E2 <- Epqa(n,p,q,1/a)
E1%*%E2

3.45d. An axpy operation: E_pq(−a) E_pq(a) = I.

n <- 5
p <- 2
q <- 4
a <- 3
E1 <- Epqa(n,p,q,a)
E2 <- Epqa(n,p,q,-a)
E1%*%E2

Exercises Beginning on Page 247


4.4. Let A be an n × n matrix whose columns are the vectors $a_j$, and let
QR be the QR factorization of A. Because Q is orthogonal, $|\det(Q)| = 1$,
and so $|\det(R)| = |\det(A)|$. Hence, we have
$$\begin{aligned}
|\det(A)| &= |\det(R)| \\
&= \prod_{j=1}^n |r_{jj}| \\
&\le \prod_{j=1}^n \|r_j\|_2 \\
&= \prod_{j=1}^n \|a_j\|_2,
\end{aligned}$$
where $r_j$ is the vector whose elements are the same as the elements in
the $j^{\mathrm{th}}$ column of R.
If equality holds, then either some $a_j$ is zero, or else $|r_{jj}| = \|r_j\|_2$ for
$j = 1, \ldots, n$. In the latter case, R is diagonal, and hence $A^{\mathrm{T}}A$ is diagonal,
and so the columns of A are orthogonal.
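
The inequality is easy to check numerically with R's qr function; the matrix below is arbitrary.

set.seed(1)
A <- matrix(rnorm(16), nrow=4)
abs(det(A))                       # |det(A)|
prod(sqrt(colSums(A^2)))          # product of column norms; at least |det(A)|
R <- qr.R(qr(A))
abs(prod(diag(R)))                # |det(R)|, which equals |det(A)|
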
4.10. The R code that will produce the graph is

x<-c(0,1)
y<-c(0,1)
z<-matrix(c(0,0,1,1),nrow=2)
trans<-persp(x, y, z, theta = 45, phi = 30)


bottom<-c(.5,0,0,1)%*%trans
top<-c(.5,1,1,1)%*%trans
xends<-c(top[,1]/top[,4],bottom[,1]/bottom[,4])
yends<-c(top[,2]/top[,4],bottom[,2]/bottom[,4])
lines(xends,yends,lwd=2)

4.12a.

Gaus <- function(n, p, ap, tol=sqrt(.Machine$double.eps)){
  # Elementary operator matrix for one step of Gaussian elimination,
  # given the pivot column ap and pivot position p.
  if (tol*min(abs(ap[-p]))>abs(ap[p])) return(NA)
  Gaus <- Epqa(n,p,0,1/ap[p])
  if (p>1) {
    for (i in 1:(p-1)) Gaus <- Epqa(n,i,p,-ap[i])%*%Gaus
  }
  if (n>p) {
    for (i in (p+1):n) Gaus <- Epqa(n,i,p,-ap[i])%*%Gaus
  }
  return(Gaus)
}

4.12b.

n <- 3
A <- matrix(c(1,3,2,2,3,1,2,1,2,4,2,1),nrow=n)
p <- 2
Gaus(n,p,A[,p])%*%A

4.12c.

m <- dim(A)[2]
A12 <- cbind(A,matrix(rep(0,n*n),nrow=n))
for (i in 1:n) A12[i,m+i]<-1
A12 <- Gaus(3,1,A12[,1])%*%A12
A12 <- Gaus(3,2,A12[,2])%*%A12
A12 <- Gaus(3,3,A12[,3])%*%A12
A12 <- rbind(A12[,5:7],c(0,0,0))
A%*%A12%*%A
A
A12%*%A%*%A12
A12
A12%*%A
t(A12%*%A)
A%*%A12
t(A%*%A12)

It is a g_{12} inverse, but it is not a Moore-Penrose inverse, because neither
$A^{-}A$ nor $AA^{-}$ is symmetric.
4.13a.

A <- matrix(c(1,3,2,2,3,1,2,1,2,4,2,1),nrow=3)
SVDA <- svd(A)
SVDA$u%*%diag(SVDA$d)%*%t(SVDA$v)
A

4.13b.

Ainv <- t(SVDA$u%*%diag(1/SVDA$d)%*%t(SVDA$v))


A%*%Ainv%*%A
A
Ainv%*%A%*%Ainv
Ainv
Ainv%*%A
t(Ainv%*%A)
A%*%Ainv
t(A%*%Ainv)

It is not a Moore-Penrose inverse. It is a g3 generalized inverse.


4.14a. B is nonnegative definite.

> A <- matrix(c(1,3,2,2,3,1,2,1,2,4,2,1),nrow=3)


> B <- t(A)%*%A
> B
[,1] [,2] [,3] [,4]
[1,] 14 13 9 12
[2,] 13 14 9 15
[3,] 9 9 9 12
[4,] 12 15 12 21
> SVDB <- svd(B)
> SVDB$d
[1] 5.069044e+01 5.352440e+00 1.957116e+00 1.089186e-15
> sqrtB <- SVDB$u%*%sqrt(diag(SVDB$d))%*%t(SVDB$v)
> sqrtB%*%sqrtB
[,1] [,2] [,3] [,4]


[1,] 14 13 9 12
[2,] 13 14 9 15
[3,] 9 9 9 12
[4,] 12 15 12 21

4.14b. D is positive definite.

> C <- matrix(c(4,2,2,2,3,1,2,1,4),nrow=3)


> D <- C%*%C
> D
[,1] [,2] [,3]
[1,] 24 16 18
[2,] 16 14 11
[3,] 18 11 21
> SVDD <- svd(D)
> SVDD$d
[1] 50.680541 6.596539 1.722920
> sqrtD <- SVDD$u%*%sqrt(diag(SVDD$d))%*%t(SVDD$v)
> sqrtD%*%sqrtD
[,1] [,2] [,3]
[1,] 24 16 18
[2,] 16 14 11
[3,] 18 11 21
> sqrtD
[,1] [,2] [,3]
[1,] 4 2 2
[2,] 2 3 1
[3,] 2 1 4

C is the square root of D.


4.14c. F is positive definite.

> E <- matrix(c(4,2,2,2,-3,1,2,1,4),nrow=3)


> F <- E%*%E
> SVDF <- svd(F)
> SVDF$d
[1] 50.680541 6.596539 1.722920
> sqrtF <- SVDF$u%*%sqrt(diag(SVDF$d))%*%t(SVDF$v)
> sqrtF%*%sqrtF
[,1] [,2] [,3]
[1,] 24 16 18
[2,] 16 14 11
[3,] 18 11 21
> sqrtF
[,1] [,2] [,3]
[1,] 4 2 2
[2,] 2 3 1
[3,] 2 1 4

E is not the square root of F .

Exercises Beginning on Page 294


5.1. First, show that
$$\max_{x \ne 0} \frac{\|Ax\|}{\|x\|} = \left(\min_{x \ne 0} \frac{\|A^{-1}x\|}{\|x\|}\right)^{-1}$$
and
$$\max_{x \ne 0} \frac{\|A^{-1}x\|}{\|x\|} = \left(\min_{x \ne 0} \frac{\|Ax\|}{\|x\|}\right)^{-1}.$$

5.2a. The matrix prior to the first elimination is
$$\begin{pmatrix} 2 & 5 & 3 & 19 \\ 1 & 4 & 1 & 12 \\ 1 & 2 & 2 & 9 \end{pmatrix}.$$
The solution is (3, 2, 1).

5.2b. The matrix prior to the first elimination is
$$\begin{pmatrix} 5 & 2 & 3 & 19 \\ 4 & 1 & 1 & 12 \\ 2 & 1 & 2 & 9 \end{pmatrix},$$
and $x_1$ and $x_2$ have been interchanged.


5.2c.
$$D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 2 \end{pmatrix},\qquad
L = \begin{pmatrix} 0 & 0 & 0 \\ 2 & 0 & 0 \\ 1 & 2 & 0 \end{pmatrix},\qquad
U = \begin{pmatrix} 0 & 4 & 1 \\ 0 & 0 & 3 \\ 0 & 0 & 0 \end{pmatrix},$$
$$\rho((D + L)^{-1}U) = 1.50.$$
5.2e. $\rho((\tilde{D} + \tilde{L})^{-1}\tilde{U}) = 0.9045.$
5.2g. Some R code for this is

tildeD <- matrix(c(2, 0, 0,
                   0, 4, 0,
                   0, 0, 2), nrow=3, byrow=T)
tildeL <- matrix(c(0, 0, 0,
                   1, 0, 0,
                   1, 2, 0), nrow=3, byrow=T)
tildeU <- matrix(c(0, 5, 3,
                   0, 0, 1,
                   0, 0, 0), nrow=3, byrow=T)
b <- c(12,19,9)

omega <- 0.1
tildeDUadj <- (1-omega)*tildeD - omega*tildeU
tildeAk <- tildeD+omega*tildeL
badj <- omega*b
xk <- c(1,1,1)
nstep <- 2
for (i in 1:nstep){
  bk <- tildeDUadj%*%xk + badj
  xkp1 <- solve(tildeAk,bk)
  dif <- sqrt(sum((xkp1-xk)^2))
  print(dif)
  xk <- xkp1
}

5.4a. nm(m + 1) − m(m + 1)/2. (Remember AT A is symmetric.)


5.4g. Using the normal equations with the Cholesky decomposition requires
only about half as many flops as the QR approach when n is much larger than
m. The QR method often yields better accuracy, however.
5.5a. 1. $X(X^{\mathrm{T}}X)^{-1}X^{\mathrm{T}}X = X$.
2. $(X^{\mathrm{T}}X)^{-1}X^{\mathrm{T}}X(X^{\mathrm{T}}X)^{-1}X^{\mathrm{T}} = (X^{\mathrm{T}}X)^{-1}X^{\mathrm{T}}$.
3. $X(X^{\mathrm{T}}X)^{-1}X^{\mathrm{T}}$ is symmetric (take its transpose).
4. $(X^{\mathrm{T}}X)^{-1}X^{\mathrm{T}}X$ is symmetric.
Therefore, $(X^{\mathrm{T}}X)^{-1}X^{\mathrm{T}} = X^{+}$.
5.5b. We want to show $X^{\mathrm{T}}(y - XX^{+}y) = 0$. Using the properties of $X^{+}$, we
have
$$\begin{aligned}
X^{\mathrm{T}}(y - XX^{+}y) &= X^{\mathrm{T}}y - X^{\mathrm{T}}XX^{+}y \\
&= X^{\mathrm{T}}y - X^{\mathrm{T}}(XX^{+})^{\mathrm{T}}y &&\text{because of symmetry} \\
&= X^{\mathrm{T}}y - X^{\mathrm{T}}(X^{+})^{\mathrm{T}}X^{\mathrm{T}}y \\
&= X^{\mathrm{T}}y - X^{\mathrm{T}}(X^{\mathrm{T}})^{+}X^{\mathrm{T}}y &&\text{property of Moore-Penrose inverses and transposes} \\
&= X^{\mathrm{T}}y - X^{\mathrm{T}}y &&\text{property of Moore-Penrose inverses} \\
&= 0.
\end{aligned}$$
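
Both parts can be spot-checked numerically. The following R sketch uses an arbitrary full-column-rank X and verifies the four Moore-Penrose conditions for $(X^{\mathrm{T}}X)^{-1}X^{\mathrm{T}}$ as well as the orthogonality of the residual to the columns of X:

set.seed(1)
X <- matrix(rnorm(20), nrow=5)             # 5 x 4, full column rank with probability 1
y <- rnorm(5)
Xplus <- solve(t(X) %*% X) %*% t(X)
all.equal(X %*% Xplus %*% X, X)            # condition 1
all.equal(Xplus %*% X %*% Xplus, Xplus)    # condition 2
all.equal(X %*% Xplus, t(X %*% Xplus))     # condition 3 (symmetry)
all.equal(Xplus %*% X, t(Xplus %*% X))     # condition 4 (symmetry)
crossprod(X, y - X %*% Xplus %*% y)        # approximately zero (Exercise 5.5b)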

Exercises Beginning on Page 314


6.1a. 1.
6.1b. 1.
6.1d. 1. (All that was left was to determine the probability that $c_n \ne 0$ and
$c_{n-1} \ne 0$.)
6.2a. 11.6315.
6.3.
$$\begin{pmatrix}
3.08 & -0.66 & 0 & 0 \\
-0.66 & 4.92 & -3.27 & 0 \\
0 & -3.27 & 7.00 & -3.74 \\
0 & 0 & -3.74 & 7.00
\end{pmatrix}.$$

Exercises Beginning on Page 353


7.7. This is because the subspace that generates a singular matrix is a lower-
dimensional space than the full sample space, and so its measure is 0.
7.9. $2^{dn/2}\,\Gamma_d(n/2)\,|\Sigma|^{n/2}$.
Make the change of variables $W = 2\Sigma^{1/2}X\Sigma^{1/2}$, determine the Jacobian,
and integrate.

Exercises Beginning on Page 425


8.3. 1. See Exercise 7.7, page 353.
8.10a.
$$p(c) = c^m - \alpha_1 c^{m-1} - \alpha_2\sigma_1 c^{m-2} - \alpha_3\sigma_1\sigma_2 c^{m-3} - \cdots - \alpha_m\sigma_1\sigma_2\cdots\sigma_{m-1}.$$
8.10b. Define
$$f(c) = 1 - \frac{p(c)}{c^m}.$$
This is a monotone decreasing continuous function in c, with $f(c) \to \infty$
as $c \to 0^{+}$ and $f(c) \to 0$ as $c \to \infty$. Therefore, there is a unique value
$c^{*}$ for which $f(c^{*}) = 1$. The uniqueness also follows from Descartes' rule
of signs, which states that the maximum number of positive roots of a
polynomial is the number of sign changes of the coefficients, and in the
case of the polynomial p(c), this is one.

8.18. This is a simple case of matrix multiplication.
To illustrate the use of R with complex matrices, I will show some code
that is relevant to this problem, for a given order, of course.

omegajn <- function(n){
  # Function to create the n^th roots of 1
  omegajn <- complex(n)
  omegajn[1] <- 1
  if (n>=2) omegajn[2] <-
    complex(re=cos(2*pi/n),im=sin(2*pi/n))
  if (n>=3) for (j in 3:n) omegajn[j] <-
    omegajn[2]*omegajn[j-1]
  return(omegajn)
}

Fn <- function(n){
  # Function to create a Fourier matrix
  rts <- omegajn(n)
  Fn <- matrix(c(rep(1,n),rts),nrow=n)
  if (n>=3) for (j in 3:n) Fn <- cbind(Fn,rts^(j-1))
  Fn <- Fn/sqrt(n)
  return(Fn)
}

# perform multiplications to get the elementary
# circulant matrix of order 5
rts5 <- omegajn(5)
F5 <- Fn(5)
round(Conj(t(F5))%*%diag(rts5)%*%F5)

8.19. $(-1)^{\lfloor n/2\rfloor}\, n^n$, where $\lfloor\cdot\rfloor$ is the floor function (the greatest integer func-
tion). For n = 1, 2, 3, 4, the determinants are 1, −4, −27, 256.

Exercises Beginning on Page 506


9.3d. Assuming W is positive definite, we have
$$\hat{\beta}_{W,C} = (X^{\mathrm{T}}WX)^{-1}X^{\mathrm{T}}Wy +
(X^{\mathrm{T}}WX)^{-1}L^{\mathrm{T}}\left(L(X^{\mathrm{T}}WX)^{+}L^{\mathrm{T}}\right)^{-1}\left(c - L(X^{\mathrm{T}}WX)^{+}X^{\mathrm{T}}Wy\right).$$

9.8. Let $X = [X_i \mid X_o]$ and $Z = X_o^{\mathrm{T}}X_o - X_o^{\mathrm{T}}X_i(X_i^{\mathrm{T}}X_i)^{-1}X_i^{\mathrm{T}}X_o$. Note that
$X_o^{\mathrm{T}}X_i = (X_i^{\mathrm{T}}X_o)^{\mathrm{T}}$. Using the expression for the inverse of a partitioned
matrix, we have
$$\begin{aligned}
X_i^{\mathrm{T}}X(X^{\mathrm{T}}X)^{-1}X^{\mathrm{T}}
&= X_i^{\mathrm{T}}[X_i \mid X_o]
\begin{pmatrix} X_i^{\mathrm{T}}X_i & X_i^{\mathrm{T}}X_o \\ X_o^{\mathrm{T}}X_i & X_o^{\mathrm{T}}X_o \end{pmatrix}^{-1}
\begin{pmatrix} X_i^{\mathrm{T}} \\ X_o^{\mathrm{T}} \end{pmatrix} \\
&= [\,X_i^{\mathrm{T}}X_i \mid X_i^{\mathrm{T}}X_o\,]
\begin{pmatrix}
(X_i^{\mathrm{T}}X_i)^{-1} + (X_i^{\mathrm{T}}X_i)^{-1}X_i^{\mathrm{T}}X_o Z^{-1}X_o^{\mathrm{T}}X_i(X_i^{\mathrm{T}}X_i)^{-1} &
-(X_i^{\mathrm{T}}X_i)^{-1}X_i^{\mathrm{T}}X_o Z^{-1} \\
-Z^{-1}X_o^{\mathrm{T}}X_i(X_i^{\mathrm{T}}X_i)^{-1} & Z^{-1}
\end{pmatrix}
\begin{pmatrix} X_i^{\mathrm{T}} \\ X_o^{\mathrm{T}} \end{pmatrix} \\
&= [\,I + X_i^{\mathrm{T}}X_o Z^{-1}X_o^{\mathrm{T}}X_i(X_i^{\mathrm{T}}X_i)^{-1} - X_i^{\mathrm{T}}X_o Z^{-1}X_o^{\mathrm{T}}X_i(X_i^{\mathrm{T}}X_i)^{-1}
\mid -X_i^{\mathrm{T}}X_o Z^{-1} + X_i^{\mathrm{T}}X_o Z^{-1}\,]
\begin{pmatrix} X_i^{\mathrm{T}} \\ X_o^{\mathrm{T}} \end{pmatrix} \\
&= [\,I \mid 0\,]\begin{pmatrix} X_i^{\mathrm{T}} \\ X_o^{\mathrm{T}} \end{pmatrix} \\
&= X_i^{\mathrm{T}}.
\end{aligned}$$
9.9. One possibility is
$$\begin{pmatrix}
20 & 100 \\
5 & 25 \\
5 & 25 \\
10 & \mathrm{NA} \\
10 & \mathrm{NA} \\
10 & \mathrm{NA} \\
\mathrm{NA} & 10 \\
\mathrm{NA} & 10 \\
\mathrm{NA} & 10
\end{pmatrix}.$$
The variance-covariance matrix computed from all pairwise complete
observations is
$$\begin{pmatrix} 30 & 375 \\ 375 & 1230 \end{pmatrix},$$
while that computed only from complete cases is
$$\begin{pmatrix} 75 & 375 \\ 375 & 1875 \end{pmatrix}.$$
The correlation matrix computed from all pairwise complete observations is
$$\begin{pmatrix} 1.00 & 1.95 \\ 1.95 & 1.00 \end{pmatrix}.$$
Note that this example is not a pseudo-correlation matrix.
In the R software system, the cov and cor functions have an argument
called "use", which can take the values "all.obs", "complete.obs",
or "pairwise.complete.obs". The value "all.obs" yields an error if the
data matrix contains any missing values. In cov, the values "complete.obs"
and "pairwise.complete.obs" yield the variance-covariances
shown above. The function cor with use="pairwise.complete.obs"
yields
$$\begin{pmatrix} 1.00 & 1.00 \\ 1.00 & 1.00 \end{pmatrix}.$$
However, if cov is invoked with use="pairwise.complete.obs" and
the function cov2cor is applied to the result, the correlations are 1.95,
as in the first correlation matrix above.
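
The behavior described here can be reproduced with a short R sketch on the data matrix given above; it should yield the matrices displayed in this solution.

Z <- matrix(c(20, 100,
               5,  25,
               5,  25,
              10,  NA,
              10,  NA,
              10,  NA,
              NA,  10,
              NA,  10,
              NA,  10), ncol=2, byrow=TRUE)
cov(Z, use="pairwise.complete.obs")
cov(Z, use="complete.obs")
cor(Z, use="pairwise.complete.obs")
cov2cor(cov(Z, use="pairwise.complete.obs"))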
9.12b. The first step is to use the trick of equation (3.97), $x^{\mathrm{T}}Ax = \mathrm{tr}(Axx^{\mathrm{T}})$,
again to undo the earlier expression, and write the last term in equation
(9.120) as
$$-\frac{n}{2}\mathrm{tr}\left(\Sigma^{-1}(\bar{y} - \mu)(\bar{y} - \mu)^{\mathrm{T}}\right) = -\frac{n}{2}(\bar{y} - \mu)^{\mathrm{T}}\Sigma^{-1}(\bar{y} - \mu).$$
Now $\Sigma^{-1}$ is positive definite, so $(\bar{y} - \mu)^{\mathrm{T}}\Sigma^{-1}(\bar{y} - \mu) \ge 0$, and hence
this term is minimized for $\hat{\mu} = \bar{y}$. Decreasing this term increases the value of
$l(\mu, \Sigma; y)$, and so $l(\hat{\mu}, \Sigma; y) \ge l(\mu, \Sigma; y)$ for all positive definite $\Sigma^{-1}$.
Now, we consider the other term. Let $A = \sum_{i=1}^n (y_i - \bar{y})(y_i - \bar{y})^{\mathrm{T}}$. The
first question is whether A is positive definite. We will refer to a text
on multivariate statistics for the proof that A is positive definite with
probability 1 (see Muirhead, 1982, for example). We have
$$\begin{aligned}
l(\hat{\mu}, \Sigma; y) &= c - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\mathrm{tr}\left(\Sigma^{-1}A\right) \\
&= c - \frac{n}{2}\left(\log|\Sigma| + \mathrm{tr}\left(\Sigma^{-1}A/n\right)\right).
\end{aligned}$$
Because c is constant, the function is maximized at the minimum of the
latter term subject to $\Sigma$ being positive definite, which, as shown for
expression (9.137), occurs at $\hat{\Sigma} = A/n$.
9.14. We can develop a recursion for $p_{11}^{(t)}$ based on $p_{11}^{(t-1)}$ and $p_{12}^{(t-1)}$,
$$p_{11}^{(t)} = p_{11}^{(t-1)}(1 - \alpha) + p_{12}^{(t-1)}\beta,$$
and because $p_{11} + p_{12} = 1$, we have $p_{11}^{(t)} = p_{11}^{(t-1)}(1 - \alpha - \beta) + \beta$. Putting
this together, we have
$$\lim_{t\to\infty} P^t = \begin{pmatrix} \beta/(\alpha+\beta) & \alpha/(\alpha+\beta) \\ \beta/(\alpha+\beta) & \alpha/(\alpha+\beta) \end{pmatrix},$$
and so the limiting (and invariant) distribution is $\pi_s = (\beta/(\alpha+\beta),\ \alpha/(\alpha+\beta))$.
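
A numerical illustration in R for one choice of α and β (the values are arbitrary):

alpha <- 0.2
beta  <- 0.5
P <- matrix(c(1-alpha, alpha,
              beta,    1-beta), nrow=2, byrow=TRUE)
Pt <- diag(2)
for (i in 1:100) Pt <- Pt %*% P
Pt                                  # both rows approach (beta, alpha)/(alpha+beta)
c(beta, alpha)/(alpha+beta)         # the limiting (invariant) distribution
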
9.15c. From the exponential growth, we have $N(T) = N(0)\,\mathrm{e}^{rT}$; hence,
$$r = \frac{1}{T}\log\left(N(T)/N(0)\right) = \frac{1}{T}\log(r_0).$$

9.23. This is an open question. If you get a proof of convergence, submit it
for publication. You may wish to try several examples and observe the
performance of the intermediate steps. I know of no case in which the
method has not converged.
9.24b. Starting with the correlation matrix given above as a possible solution
for Exercise 9.9, four iterations of equation (9.81) using $\delta = 0.05$ and
$f(x) = \tanh(x)$ yield
$$\begin{pmatrix} 1.000 & 0.997 \\ 0.997 & 1.000 \end{pmatrix}.$$

Exercises Beginning on Page 573


10.1a. The computations do not overflow. The first floating-point number x
such that x + 1 = x is
$$0.10\cdots0 \times b^{p+1}.$$
Therefore, the series converges at the value of i such that $i(i + 1)/2 = x$.
Now solve for i.
10.2. The function is log(n), and Euler’s constant is 0.57721....
10.5. $2^{-56}$. (The standard has 53 bits normalized, so the last bit is $2^{-55}$, and
half of that is $2^{-56}$.)
10.6a. Normalized: $2b^{p-1}(b - 1)(e_{\max} - e_{\min} + 1) + 1$.
Nonnormalized: $2b^{p-1}(b - 1)(e_{\max} - e_{\min} + 1) + 1 + 2b^{p-1}$.
10.6b. Normalized: $b^{e_{\min}-1}$.
Nonnormalized: $b^{e_{\min}-p}$.
10.6c. $1 + b^{-p+1}$, or $1 + b^{-p}$ when b = 2 and the first bit is hidden.
10.6d. $b^p$.
10.6e. 22.
10.11. First of all, we recognize that the full sum in each case is 1. We therefore
accumulate the sum from the direction in which there are fewer terms.
After computing the first term from the appropriate direction, take a
logarithm to determine a scaling factor, say sk . (This term will be the
smallest in the sum.) Next, proceed to accumulate terms until the sum
is of a different order of magnitude than the next term. At that point,
perform a scale adjustment by dividing by s. Resume summing, making
similar scale adjustments as necessary, until the limit of the summation
is reached.
10.13. The result is close to 1.
What is relevant here is that numbers close to 1 have only a very few
digits of accuracy; therefore, it would be better to design this program so
that it returns 1−Pr(X ≤ x) (the “significance level”). The purpose and
the anticipated use of a program determine how it should be designed.
10.16a. 2.
10.16b. 0.
10.16c. No (because the operations in the “for” loop are not chained).
20

10.17c.
    a = x_1
    b = y_1
    s = 0
    for i = 2, n
    {
      d = (x_i - a)/i
      e = (y_i - b)/i
      a = d + a
      b = e + b
      s = i(i - 1)de + s
    }
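
A direct R transcription of this pseudocode, compared with a two-pass computation (a sketch only):

onepass <- function(x, y){
  a <- x[1]
  b <- y[1]
  s <- 0
  for (i in 2:length(x)){
    d <- (x[i] - a)/i
    e <- (y[i] - b)/i
    a <- d + a
    b <- e + b
    s <- i*(i-1)*d*e + s
  }
  s
}
x <- c(1, 2, 4, 7)
y <- c(2, 1, 3, 5)
onepass(x, y)
sum((x - mean(x))*(y - mean(y)))    # same value, computed in two passes
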
10.20. 1. No; 2. Yes; 3. No; 4. No.
10.21. A very simple example is
$$\begin{pmatrix} 1 & 1+\epsilon \\ 1 & 1 \end{pmatrix},$$
where $\epsilon < b^{-p}$, because in this case the matrix stored in the computer
would be singular. Another example is
$$\begin{pmatrix} 1 & a(1+\epsilon) \\ a(1+\epsilon) & a^2(1+2\epsilon) \end{pmatrix},$$
where $\epsilon$ is the machine epsilon.
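
The first example is easy to demonstrate in R, with .Machine$double.eps playing the role of the machine precision (a sketch):

eps <- .Machine$double.eps/4        # smaller than the rounding threshold for 1 + eps
A <- matrix(c(1, 1, 1 + eps, 1), nrow=2)
A[1, 2] == A[2, 2]                  # TRUE: 1 + eps is stored as 1
det(A)                              # 0: the stored matrix is singular
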


10.23a.

> 1/0
[1] Inf
> -5*(1/0)
[1] -Inf
> (1/0)*(1/0)
[1] Inf
> 1/0==5*(1/0)
[1] TRUE

10.23b.

> 0/0
[1] NaN
> -5*(0/0)
[1] NaN
> 0/0==NaN
[1] NA
> x <- NaN


> x==NaN
[1] NA
> is.nan(x)
[1] TRUE
> is.na(x)
[1] TRUE

10.23c.

> x<-NA
> x==NA
[1] NA
> is.nan(x)
[1] FALSE
> is.na(x)
[1] TRUE

10.26. By decomposing 24 and multiplying some factors, or by using Stirling's
approximation, we know that there are approximately 24 or 25 digits in
24!. By considering the number of times 10 occurs as a factor, we know
that the last four digits are 0s.

> library(Rmpfr)
> x24mpfr <- mpfr(24, prec=25*log2(10))
> x24mpfrfac <- factorial(x24mpfr)
> print(x24mpfrfac, digits=26)
1 'mpfr' number of precision 83 bits
[1] 620448401733239439360000

For comparison,

> x24 <- 24


> x24fac <- factorial(x24)
> print(x24fac, digits=22)
[1] 6.2044840173323941e+23

Exercises Beginning on Page 592


11.2a. O(nk).
11.2c. At each successive stage in the fan-in, the number of processors doing
the additions goes down by approximately one-half.
If p ≈ k, then O(n log k) (fan-in on one element of c at a time)


If p ≈ nk, then O(log k) (fan-in on all elements of c simultaneously)
If p is a fixed constant smaller than k, the order of time does not change;
only the multiplicative constant changes.
Notice the difference in the order of time and the order of the number of
computations. Often there is very little that can be done about the order
of computations.
11.2d. Because in a serial algorithm the magnitudes of the summands become
more and more different, whereas in the fan-in they are more likely to remain
relatively equal. Adding quantities of very different magnitudes results in
benign roundoff, but many benign roundoffs can accumulate into a bad
result. (This is not catastrophic cancellation.) Clearly, if all elements are
nonnegative, this argument holds. Even if the elements are randomly distributed,
there is likely to be a drift in the sum (this can be thought of as a
random walk). There is no difference in the number of computations.
11.2e. Case 1: p ≈ n. Give each ci a processor – do an outer loop on each.
This would likely be more efficient because all processors are active at
once.
Case 2: p ≈ nk. Give each aij bj a processor – fan-in for each. This would
be the same as the other.
If p is a fixed constant smaller than n, set it up as in Case 1, using n/p
groups of ci ’s.
11.2f. If p ≈ n, then O(k).
If p ≈ nk, then O(log k).
If p is some small fixed constant, the order of time does not change; only
the multiplicative constant changes.

Exercises Beginning on Page 633


12.2. Here is a recursive Matlab function for the Strassen algorithm due to
Coleman and Van Loan. When it uses the Strassen algorithm, it requires
the matrices to have even dimension.
function C = strass(A,B,nmin)
%
% Strassen matrix multiplication C=AB
% A, B must be square and of even dimension
% From Coleman and Van Loan
% If n <= nmin, the multiplication is done conventionally
%
[n n] = size(A);
if n <= nmin
   C = A * B; % n is small, get C conventionally
else
   m = n/2; u = 1:m; v = m+1:n;
   P1 = strass(A(u,u)+A(v,v), B(u,u)+B(v,v), nmin);
   P2 = strass(A(v,u)+A(v,v), B(u,u), nmin);
   P3 = strass(A(u,u), B(u,v)-B(v,v), nmin);
   P4 = strass(A(v,v), B(v,u)-B(u,u), nmin);
   P5 = strass(A(u,u)+A(u,v), B(v,v), nmin);
   P6 = strass(A(v,u)-A(u,u), B(u,u)+B(u,v), nmin);
   P7 = strass(A(u,v)-A(v,v), B(v,u)+B(v,v), nmin);
   C = [P1+P4-P5+P7 P3+P5; P2+P4 P1+P3-P2+P6];
end
12.5a.

real a(4,3)
data a/3.,6.,8.,2.,5.,1.,6.,3.,6.,2.,7.,1./
n = 4
m = 3
x1 = a(2,2) ! Temporary variables must be used because of
x2 = a(4,2) ! the side effects of srotg.
call srotg(x1, x2, c, s)
call srot(m, a(2,1), n, a(4,1), n, c, s)
print *, c, s
print *, a
end

This yields 0.3162278 and 0.9486833 for c and s. The transformed matrix
is
$$\begin{pmatrix}
3.000000 & 5.000000 & 6.000000 \\
3.794733 & 3.162278 & 1.581139 \\
8.000000 & 6.000000 & 7.000000 \\
-5.059644 & -0.00000002980232 & -1.581139
\end{pmatrix}.$$
12.7b. Using the Matrix package in R, after initializing rho and sig2, this is

library(Matrix)
Vinv <- sparseMatrix(i=c(1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8,9,
                         9,9,10,10),
                     j=c(1,2,1:3,2:4,3:5,4:6,5:7,6:8,7:9,8:10,9,10),
                     x=c(1,-rho,rep(c(-rho,1+rho^2,-rho),8),-rho,1))/
        (1-rho^2)*sig2

12.9. 10.7461941829033 and 10.7461941829034.


12.11.
$$\frac{1}{aei - afh - bdi + bfg + cdh - ceg}
\begin{pmatrix}
ei - fh & ch - bi & bf - ce \\
fg - di & ai - cg & cd - af \\
dh - eg & bg - ah & ae - bd
\end{pmatrix}.$$
