Chapter 1
Linear Systems and Matrices
Solving linear systems (a system of linear equations) is the most important problem of linear
algebra and possibly of applied mathematics as well. Information in a linear system is
usually arranged into a rectangular array, called a "matrix". The matrix is particularly
important in developing computer programs to solve linear systems with huge sizes because
computers are suitable to manage numerical data in arrays. Moreover, matrices are not only
a simple tool for solving linear systems but also mathematical objects in their own right. In
fact, matrix theory has a variety of applications in science, engineering, and mathematics.
Therefore, we begin our study of linear systems and matrices in this first chapter.
Let R denote the set of real numbers. We now introduce linear equations, linear systems,
and matrices.
We consider
a1 x1 + a2 x2 + · · · + an xn = b,
where ai ∈ R (i = 1, 2, . . . , n) are coefficients, xi (i = 1, 2, . . . , n) are variables (unknowns),
n is a positive integer, and b ∈ R is a constant. An equation of this form is called a linear
equation, in which all variables occur to the first power. When b = 0, the linear equation
is called a homogeneous linear equation. A sequence of numbers s1 , s2 , . . . , sn is called
a solution of the equation if substituting x1 = s1 , x2 = s2 , . . . , xn = sn gives
a1 s1 + a2 s2 + · · · + an sn = b.
The set of all solutions of the equation is called the solution set of the equation.
Example. Consider the linear equations
(a) x + y = 1.
(b) x + y + z = 1.
It is easy to see that the solution set of (a) is a line in the xy-plane and the solution set of (b)
is a plane in xyz-space.
Consider a system of two linear equations in the unknowns x and y. The graphs of these
equations are lines, called l1 and l2 . There are three possible cases for the lines l1 and l2 in
the xy-plane. See Figure 1.1.
When l1 and l2 are parallel and distinct, there is no solution of the system.
When l1 and l2 intersect at only one point, there is exactly one solution of the system.
When l1 and l2 coincide, there are infinitely many solutions of the system.
Figure 1.1: the three cases for two lines in the xy-plane: no solution (parallel lines), one
solution (intersecting lines), infinitely many solutions (coincident lines).
1.1.2 Matrices
The term matrix was first introduced by the British mathematician James Sylvester in the
19th century. Another British mathematician, Arthur Cayley, developed basic algebraic
operations on matrices in the 1850s. Since then, matrices have become a standard language
for describing and solving linear systems.
Definition. A matrix is a rectangular array of numbers. The numbers in the array are
called the entries in the matrix.
Remark. The size of a matrix is described in terms of the number of rows and columns it
contains. Usually, a matrix with m rows and n columns is called an m × n matrix. If A
is an m × n matrix, then we denote the entry in row i and column j of A by the symbol
(A)ij = aij . Moreover, a matrix with real entries will be called a real matrix and the set of
all m × n real matrices will be denoted by the symbol Rm×n . For instance, a matrix A in
Rm×n can be written as
A = [ a11  a12  · · ·  a1n ]
    [ a21  a22  · · ·  a2n ]
    [  ⋮    ⋮           ⋮  ]
    [ am1  am2  · · ·  amn ] ,
where aij ∈ R
for any i and j. When compactness of notation is desired, the preceding
matrix can be written as
A = [aij ] .
A row matrix is a general 1 × n matrix a given by
a = [a1 , a2 , . . . , an ] ∈ R1×n .
A column matrix is a general m × 1 matrix b given by
b = [ b1 ]
    [ b2 ]
    [ ⋮  ]  ∈ Rm×1 .
    [ bm ]
The main diagonal of the square matrix A is the set of entries a11 , a22 , . . . , ann in (1.2).
Linear system (1.1) can be written briefly in the following matrix form, called the
augmented matrix for the system:

[ a11  a12  · · ·  a1n  b1 ]
[ a21  a22  · · ·  a2n  b2 ]
[  ⋮    ⋮           ⋮   ⋮  ]
[ am1  am2  · · ·  amn  bm ] .
Remark. When we construct an augmented matrix associated with a given linear system,
the unknowns must be written in the same order in each equation and the constants must
be on the right.
In order to solve a linear system efficiently, we replace the given system with its augmented
matrix and then solve the same system by operating on the rows of the augmented matrix.
There are three elementary row operations on matrices, defined as follows:
1. Multiply a row by a nonzero constant.
2. Interchange two rows.
3. Add a multiple of one row to another row.
By using elementary row operations, we can always reduce the augmented matrix of a given
system to a simpler augmented matrix from which the solution of the corresponding system
is evident. In fact, the elementary row operations do not change the solution set of the given
system. See the following example.
Example. Solve the following linear system.
Solution. In the left column below we solve the system by operating on the equations in
the system, and in the right column we use the corresponding operations on the rows of its
augmented matrix.
x + y + z = 3            [  1  1   1  3 ]
−x + y − z = 3           [ −1  1  −1  3 ]
2x + 2y − 3z = 1.        [  2  2  −3  1 ] .
After the reduction, we obtain
x = −1, y = 3, z = 1.
1.2 Gauss-Jordan Elimination
In this section, we develop a method called Gauss-Jordan elimination [1, p. 15] for solving
linear systems. In fact, Gauss-Jordan elimination is one of the most frequently used
algorithms in scientific computing.
In the example of Subsection 1.1.3, we solved the given linear system by reducing the aug-
mented matrix to
[ 1  0  0  −1 ]
[ 0  1  0   3 ]
[ 0  0  1   1 ]
from which the solution of the system was evident. This is an example of a matrix that is
in reduced row-echelon form. We therefore give the following definition.
Definition. A matrix is said to be in reduced row-echelon form if it satisfies the following
conditions.
(i) The rows that consist entirely of zeros, if any, are grouped together at the bottom of
the matrix. Such rows are called zero rows.
(ii) If a row does not consist entirely of zeros, then the first nonzero number in the row is
a 1. We call this a leading 1.
(iii) For two successive rows that both contain leading 1’s, the leading 1 in the higher row
occurs farther to the left than the leading 1 in the lower row.
(iv) Each column that contains a leading 1 has zeros in all its other entries.
Remark. A matrix having properties (i), (ii), (iii), but not necessarily (iv), is said to be in
row-echelon form.
Gauss-Jordan elimination is a standard technique for solving linear systems. Actually Gauss-
Jordan elimination is a step-by-step elimination procedure which reduces an augmented
matrix of a given linear system to reduced row-echelon form. Then the solution set of the
system can be found by inspection. We illustrate the idea by the following example.
Example. Consider the following augmented matrix of a linear system in the variables
x1 , x2 , x3 , x4 , x5 :
[ 0  −3  0  0   7  15 ]
[ 2   6  6  4   2  28 ]
[ 2  11  6  4  −9   5 ] .
Now, using the elementary row operations, we reduce this matrix to reduced row-echelon
form.
Step 1. Interchange the top row with another row, if necessary, to bring a nonzero entry
to the top of the leftmost column that does not consist entirely of zeros:
Interchange 1st row and 2nd row:
[ 2   6  6  4   2  28 ]
[ 0  −3  0  0   7  15 ]
[ 2  11  6  4  −9   5 ] .
Step 2. If the entry that is now at top of the column found in Step 1 is a ̸= 0, multiply
the first row by 1/a in order to introduce the leading 1:
1/2 × 1st row:
[ 1   3  3  2   1  14 ]
[ 0  −3  0  0   7  15 ]
[ 2  11  6  4  −9   5 ] .
Step 3. Add suitable multiples of the top row to the rows below so that all entries below
the leading 1 become zeros:
3rd row + (−2) × 1st row:
[ 1   3  3  2    1   14 ]
[ 0  −3  0  0    7   15 ]
[ 0   5  0  0  −11  −23 ] .
Step 4. Now cover the top row in the matrix and begin again with Step 1 applied to the
remaining submatrix. Continue in this way until the entire matrix is in row-echelon
form:

(−1/3) × 2nd row:
[ 1  3  3  2    1    14 ]
[ 0  1  0  0  −7/3   −5 ]
[ 0  5  0  0  −11   −23 ]

3rd row + (−5) × 2nd row:
[ 1  3  3  2    1    14 ]
[ 0  1  0  0  −7/3   −5 ]
[ 0  0  0  0   2/3    2 ]

3/2 × 3rd row:
[ 1  3  3  2    1    14 ]
[ 0  1  0  0  −7/3   −5 ]
[ 0  0  0  0    1     3 ] .
The entire matrix is now in row-echelon form. To find the reduced row-echelon
form we need the following additional step.
Step 5. Beginning with the last nonzero row and working upward, add suitable multiples
of each row to the rows above to introduce zeros above the leading 1’s:
2nd row + (7/3) × 3rd row:
[ 1  3  3  2  1  14 ]
[ 0  1  0  0  0   2 ]
[ 0  0  0  0  1   3 ]

1st row + (−1) × 3rd row:
[ 1  3  3  2  0  11 ]
[ 0  1  0  0  0   2 ]
[ 0  0  0  0  1   3 ]

1st row + (−3) × 2nd row:
[ 1  0  3  2  0   5 ]
[ 0  1  0  0  0   2 ]
[ 0  0  0  0  1   3 ] .
Since x1 , x2 , and x5 correspond to leading 1's in the reduced row-echelon form of the augmented
matrix, we call them leading variables. The remaining variables x3 and x4 are called free
variables. Solving for the leading variables yields
x1 = −3x3 − 2x4 + 5
x2 = 2
x5 = 3.
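The five-step procedure above can be sketched as a short routine. The following is an illustrative implementation (not code from the text), using exact fractions to avoid round-off; applied to the example's augmented matrix, it reproduces the reduced row-echelon form obtained in Step 5.

```python
from fractions import Fraction

def rref(M):
    """Gauss-Jordan elimination: return the reduced row-echelon
    form of M (a list of rows) using exact fractions."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0  # row that receives the next leading 1
    for c in range(cols):
        # Step 1: find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        # Step 2: scale the pivot row to create a leading 1.
        A[r] = [x / A[r][c] for x in A[r]]
        # Steps 3 and 5: make every other entry of column c zero.
        for i in range(rows):
            if i != r and A[i][c] != 0:
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        r += 1
    return A

# Augmented matrix of the worked example (rows as they stood before Step 1).
M = [[0, -3, 0, 0, 7, 15],
     [2, 6, 6, 4, 2, 28],
     [2, 11, 6, 4, -9, 5]]
R = rref(M)
# R == [[1, 0, 3, 2, 0, 5], [0, 1, 0, 0, 0, 2], [0, 0, 0, 0, 1, 3]]
```

Working over Fraction rather than floating point keeps every intermediate matrix exact, which matches the hand computation above entry for entry.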
A linear system is said to be homogeneous if the constant terms are all zero. Consider
a11 x1 + a12 x2 + · · · + a1n xn = 0
a21 x1 + a22 x2 + · · · + a2n xn = 0
⋮
am1 x1 + am2 x2 + · · · + amn xn = 0.
Theorem 1.1. A homogeneous linear system has infinitely many solutions if there are more
variables than equations.
Proof. Let
a11 x1 + a12 x2 + · · · + a1m xm + · · · + a1n xn = 0
a21 x1 + a22 x2 + · · · + a2m xm + · · · + a2n xn = 0
⋮
am1 x1 + am2 x2 + · · · + amm xm + · · · + amn xn = 0,
where m < n. By using elementary row operations, one can obtain the reduced row-echelon
form of the augmented matrix of the system. It follows from the reduced row-echelon form
that the corresponding system has the following form
xk1 + Σ( ) = 0
xk2 + Σ( ) = 0
⋮
xkr + Σ( ) = 0,

where k1 < k2 < · · · < kr are numbers in the set {1, 2, . . . , m} and Σ( ) denotes sums that
involve the n − r free variables. We remark that r ≤ m < n and usually k1 = 1. Finally, we
obtain

xk1 = −Σ( )
xk2 = −Σ( )
⋮
xkr = −Σ( ),

where the n − r free variables can be assigned arbitrary values. Hence the system has
infinitely many solutions.
Example. We consider the following linear system with 6 variables and 4 equations,
a11 x1 + a12 x2 + a13 x3 + a14 x4 + a15 x5 + a16 x6 = 0
a21 x1 + a22 x2 + a23 x3 + a24 x4 + a25 x5 + a26 x6 = 0
a31 x1 + a32 x2 + a33 x3 + a34 x4 + a35 x5 + a36 x6 = 0
a41 x1 + a42 x2 + a43 x3 + a44 x4 + a45 x5 + a46 x6 = 0.
The augmented matrix A of the system is

A = [ a11  a12  a13  a14  a15  a16  0 ]
    [ a21  a22  a23  a24  a25  a26  0 ]
    [ a31  a32  a33  a34  a35  a36  0 ]  ∈ R4×7 .
    [ a41  a42  a43  a44  a45  a46  0 ]
Suppose that, by using elementary row operations, the reduced row-echelon form of A is obtained as
[ 1  b12  0   0   0  b16  0 ]
[ 0   0   1  b24  0  b26  0 ]
[ 0   0   0   0   1  b36  0 ]
[ 0   0   0   0   0   0   0 ] ,
then the corresponding system is
x1 + b12 x2 + b16 x6 = 0
x3 + b24 x4 + b26 x6 = 0
x5 + b36 x6 = 0.
We therefore have
x1 = −b12 x2 − b16 x6
x3 = −b24 x4 − b26 x6
x5 = −b36 x6 .
From the proof of Theorem 1.1, it follows that r = 3, k1 = 1, k2 = 3, and k3 = 5. There are
three free variables, namely, x2 , x4 , and x6 . Thus, the system has infinitely many solutions.
1.3 Matrix Operations
Matrices appear in many contexts other than as augmented matrices for linear systems.
In this section, we begin our study of matrix theory by giving some basic definitions of
the subject. We also introduce some operations on matrices and discuss their fundamental
properties.
Now, we develop an arithmetic of matrices which contains the sum, difference, product of
matrices, and so on. We have the following definition of operations on matrices.
Definition.
(i) Equality of matrices: Two matrices A = [aij ] and B = [bij ] are said to be equal, denoted
by A = B, if they have the same size and aij = bij for all i, j.
(ii) Sum and difference: Let A = [aij ] and B = [bij ] have the same size. Then A + B is
a matrix with the entries given by (A + B)ij := aij + bij for all i, j, and A − B is a
matrix with the entries given by (A − B)ij := aij − bij for all i, j.
(iii) Scalar multiplication: Let A = [aij ] and c be any scalar. Then cA is a matrix with the
entries given by (cA)ij := caij for all i, j.
(iv) Linear combination of matrices: c1 A(1) + c2 A(2) + · · · + cs A(s) , where A(i) (1 ≤ i ≤ s)
are matrices of the same size and ci (1 ≤ i ≤ s) are scalars.
(v) Matrix product: Let A = [aij ] be a general m × r matrix and B = [bij ] be a general
r × n matrix. Then the product of A and B is the m × n matrix AB whose (i, j) entry is
obtained by multiplying the entries of the ith row of A with the corresponding entries of
the jth column of B and adding up the products:

(AB)ij := ai1 b1j + ai2 b2j + · · · + air brj .
Remark. In the definition of matrix product, the number of columns of the first factor A
must be the same as the number of rows of the second factor B in order to form the product
AB. If the condition is not satisfied, then the product is undefined. Even if A and B are
both n × n matrices, we usually have AB ̸= BA.
Example 2. Let

A = [ 1   5  4 ]        B = [ 3   1   2 ]
    [ 2  −3  0 ]  ,         [ 0  −1  −2 ]  .
    [ 0   4  1 ]            [ 1   6   2 ]

One can check that AB ≠ BA.
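This non-commutativity is easy to check numerically; the sketch below (assuming a plain list-of-rows representation, not code from the text) multiplies the two matrices of Example 2 in both orders.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 5, 4], [2, -3, 0], [0, 4, 1]]
B = [[3, 1, 2], [0, -1, -2], [1, 6, 2]]

AB = matmul(A, B)   # [[7, 20, 0], [6, 5, 10], [1, 2, -6]]
BA = matmul(B, A)   # [[5, 20, 14], [-2, -5, -2], [13, -5, 6]]
assert AB != BA     # square-matrix multiplication is not commutative
```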
A matrix can be partitioned into smaller matrices by inserting horizontal and vertical rules
between selected rows and columns. For instance, below are three possible partitions of a
general 4 × 5 matrix A. The first one is a partition of A into four submatrices A11 , A12 ,
A21 , and A22 ; the second one is a partition of A into its row matrices r1 , r2 , r3 , and r4 ; the
third one is a partition of A into its column matrices c1 , c2 , c3 , c4 , and c5 :
A = [ a11  a12  a13 ¦ a14  a15 ]
    [ a21  a22  a23 ¦ a24  a25 ]     [ A11  A12 ]
    [ −−−−−−−−−−−−−−−−−−−−−−− ]  =  [ A21  A22 ]  ;
    [ a31  a32  a33 ¦ a34  a35 ]
    [ a41  a42  a43 ¦ a44  a45 ]

A = [ a11  a12  a13  a14  a15 ]     [ r1 ]
    [ a21  a22  a23  a24  a25 ]     [ r2 ]
    [ a31  a32  a33  a34  a35 ]  =  [ r3 ]  ;
    [ a41  a42  a43  a44  a45 ]     [ r4 ]

A = [ a11 ¦ a12 ¦ a13 ¦ a14 ¦ a15 ]
    [ a21 ¦ a22 ¦ a23 ¦ a24 ¦ a25 ]  =  [ c1 ¦ c2 ¦ c3 ¦ c4 ¦ c5 ] .
    [ a31 ¦ a32 ¦ a33 ¦ a34 ¦ a35 ]
    [ a41 ¦ a42 ¦ a43 ¦ a44 ¦ a45 ]
and

ith row matrix of AB = [ Σ_{k=1}^{r} aik bk1 , . . . , Σ_{k=1}^{r} aik bkn ] = [ ith row matrix of A ]B.
From the remark in Subsection 1.3.3, we know that a matrix product can be computed by
partitioning the factors in suitable ways. We now introduce the general case.
Let

A = [ A11  A12  · · ·  A1s ]              B = [ B11  B12  · · ·  B1t ]
    [ A21  A22  · · ·  A2s ]  ∈ Rm×l ,        [ B21  B22  · · ·  B2t ]  ∈ Rl×n ,
    [  ⋮    ⋮           ⋮  ]                  [  ⋮    ⋮           ⋮  ]
    [ Ar1  Ar2  · · ·  Ars ]                  [ Bs1  Bs2  · · ·  Bst ]
where the number of columns of submatrix Aik is equal to the number of rows of submatrix
Bkj for each 1 ≤ i ≤ r, 1 ≤ k ≤ s, and 1 ≤ j ≤ t. Then we construct
C = [ C11  C12  · · ·  C1t ]
    [ C21  C22  · · ·  C2t ]  ∈ Rm×n ,
    [  ⋮    ⋮           ⋮  ]
    [ Cr1  Cr2  · · ·  Crt ]

where

Cij := Ai1 B1j + Ai2 B2j + · · · + Ais Bsj

for 1 ≤ i ≤ r and 1 ≤ j ≤ t. In fact, the partitioned matrix C is nothing but the
product of matrices A and B, i.e., C = AB. See the following example.
Example. Let A ∈ R3×3 and B ∈ R3×2 . Below are the partitions of A and B:

A = [ a11 ¦ a12  a13 ]
    [ −−−−−−−−−−−−− ]     [ A11  A12 ]
    [ a21 ¦ a22  a23 ]  =  [ A21  A22 ]  ,
    [ a31 ¦ a32  a33 ]

B = [ b11 ¦ b12 ]
    [ −−−−−−−−− ]     [ B11  B12 ]
    [ b21 ¦ b22 ]  =  [ B21  B22 ]  .
    [ b31 ¦ b32 ]
Then

[ A11  A12 ] [ B11  B12 ]     [ A11 B11 + A12 B21    A11 B12 + A12 B22 ]
[ A21  A22 ] [ B21  B22 ]  =  [ A21 B11 + A22 B21    A21 B12 + A22 B22 ]

   [ a11 b11 + a12 b21 + a13 b31    a11 b12 + a12 b22 + a13 b32 ]
=  [ a21 b11 + a22 b21 + a23 b31    a21 b12 + a22 b22 + a23 b32 ]  =  AB.
   [ a31 b11 + a32 b21 + a33 b31    a31 b12 + a32 b22 + a33 b32 ]
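The block computation above can be verified numerically. The following NumPy sketch (an illustration, not part of the text) partitions two small random matrices into 2 × 2 blocks, as in the example, and checks that the block formula reproduces the ordinary product.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 3))
B = rng.integers(-5, 5, size=(3, 2))

# Partition A after its first row and column, and B after its first row,
# matching the 2 x 2 block structure of the example.
A11, A12 = A[:1, :1], A[:1, 1:]
A21, A22 = A[1:, :1], A[1:, 1:]
B11, B12 = B[:1, :1], B[:1, 1:]
B21, B22 = B[1:, :1], B[1:, 1:]

# Assemble C block by block using Cij = Ai1 B1j + Ai2 B2j.
C = np.block([[A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
              [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22]])
assert np.array_equal(C, A @ B)  # block product equals the ordinary product
```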
In fact, the matrix product has an important application in solving linear systems. Consider
linear system (1.1) of m linear equations in n unknowns. We can replace the m equations
in this system with the single matrix equation
[ a11 x1 + a12 x2 + · · · + a1n xn ]     [ b1 ]
[ a21 x1 + a22 x2 + · · · + a2n xn ]     [ b2 ]
[                ⋮                 ]  =  [ ⋮  ]  .
[ am1 x1 + am2 x2 + · · · + amn xn ]     [ bm ]

By using the product of matrices, it follows that

[ a11  a12  · · ·  a1n ] [ x1 ]     [ b1 ]
[ a21  a22  · · ·  a2n ] [ x2 ]     [ b2 ]
[  ⋮    ⋮           ⋮  ] [ ⋮  ]  =  [ ⋮  ]  .
[ am1  am2  · · ·  amn ] [ xn ]     [ bm ]
Let

A = [ a11  a12  · · ·  a1n ]         [ x1 ]         [ b1 ]
    [ a21  a22  · · ·  a2n ]         [ x2 ]         [ b2 ]
    [  ⋮    ⋮           ⋮  ]  ,  x = [ ⋮  ]  ,  b = [ ⋮  ]  .
    [ am1  am2  · · ·  amn ]         [ xn ]         [ bm ]
Then the original system has been replaced by the single matrix equation
Ax = b.
Here A is called the coefficient matrix of the system. Thus, the augmented matrix for
the system is obtained by adjoining b to A as the last column, i.e., [ A ¦ b ].
Remark. Note that by using matrix operations on the above linear system, it can also be
written as follows:
   [ a11 ]        [ a12 ]                  [ a1n ]     [ b1 ]
x1 [ a21 ]  +  x2 [ a22 ]  +  · · ·  +  xn [ a2n ]  =  [ b2 ]  ,
   [  ⋮  ]        [  ⋮  ]                  [  ⋮  ]     [ ⋮  ]
   [ am1 ]        [ am2 ]                  [ amn ]     [ bm ]
i.e.,
x1 c1 + x2 c2 + · · · + xn cn = b, (1.5)
where cj is the jth column matrix of A for 1 ≤ j ≤ n.
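Equation (1.5) says that Ax is a linear combination of the columns of A with coefficients x1 , . . . , xn . A quick NumPy check on a small example (the numbers are chosen for illustration):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
x = np.array([2, -1, 3])

# Build x1*c1 + x2*c2 + x3*c3 from the columns of A ...
combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))
# ... and compare with the matrix-vector product Ax.
assert np.array_equal(combo, A @ x)  # combo == [9, 21]
```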
Recall that the transpose AT of a matrix A is obtained by interchanging its rows and
columns, and the trace tr(A) of a square matrix A is the sum of the entries on its main
diagonal. For instance,

A = [ a11  a12  a13 ]          [ a11  a21  a31 ]
    [ a21  a22  a23 ]  ,  AT = [ a12  a22  a32 ]  ,   tr(A) = a11 + a22 + a33 .
    [ a31  a32  a33 ]          [ a13  a23  a33 ]
Some important properties of the transpose are listed in the following theorem.
Theorem 1.2. Let the sizes of matrices A and B be such that the stated operations can be
performed. Then
(a) (AT )T = A.
(b) (A ± B)T = AT ± BT .
(c) (kA)T = kAT , where k is any scalar.
(d) (AB)T = BT AT .
Proof. Parts (a), (b), and (c) are self-evident. We therefore only prove (d). To evaluate
the right-hand side of (1.6), let AT = [a′ij ] and BT = [b′ij ], and compare the (i, j) entries
of both sides.
Some important properties of the trace are included in the following theorem.
Theorem 1.3. Let A and B be n × n matrices. Then
(a) tr(A) = tr(AT ).
(b) tr(AB) = tr(BA).
(c) tr(αA + βB) = αtr(A) + βtr(B), where α and β are any scalars.
(e) If B is skew-symmetric, i.e., BT = −B, then tr(B) = 0.
Proof. Part (a) is obvious. For (b), because addition is associative and commutative, we
have
tr(AB) = Σ_{i=1}^{n} (AB)ii = Σ_{i=1}^{n} Σ_{k=1}^{n} aik bki = Σ_{k=1}^{n} Σ_{i=1}^{n} bki aik = Σ_{k=1}^{n} (BA)kk = tr(BA).
For (c), we have

tr(αA + βB) = Σ_{i=1}^{n} (αA + βB)ii = Σ_{i=1}^{n} (α aii + β bii ) = α Σ_{i=1}^{n} aii + β Σ_{i=1}^{n} bii = αtr(A) + βtr(B).
For (e), by using (a), the given condition B T = −B, and (c), we deduce
Thus, tr(B) = 0.
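The trace identities above can be spot-checked numerically; the sketch below (an illustration, not part of the text) uses random integer matrices so that all comparisons are exact.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-9, 9, size=(4, 4))
B = rng.integers(-9, 9, size=(4, 4))

# tr(AB) = tr(BA), even though AB != BA in general.
assert np.trace(A @ B) == np.trace(B @ A)
# Linearity of the trace.
assert np.trace(3 * A + 2 * B) == 3 * np.trace(A) + 2 * np.trace(B)
# A skew-symmetric matrix has zero trace.
S = A - A.T
assert np.array_equal(S.T, -S) and np.trace(S) == 0
```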
In this section, we study some basic properties of the arithmetic operations on matrices.
Theorem 1.4. Let A, B, and C be matrices and the sizes of matrices be assumed such that
the indicated operations can be performed. The following rules of matrix operations are valid.
Proof. We only prove (c) of the associative law for matrix product. The other parts here are
left as an exercise. Assume that A = [aij ] is an s × n matrix, B = [bjk ] is an n × m matrix,
and C = [ckl ] is an m × r matrix. Let V = AB = [vik ] and W = BC = [wjl ], so that
vik = Σ_{j=1}^{n} aij bjk for 1 ≤ i ≤ s and 1 ≤ k ≤ m, and wjl = Σ_{k=1}^{m} bjk ckl
for 1 ≤ j ≤ n and 1 ≤ l ≤ r. Since (AB)C = V C, the entry in row i and column l of matrix
V C is given as follows:
(V C)il = Σ_{k=1}^{m} vik ckl = Σ_{k=1}^{m} ( Σ_{j=1}^{n} aij bjk ) ckl = Σ_{k=1}^{m} Σ_{j=1}^{n} aij bjk ckl    (1.7)
for 1 ≤ i ≤ s and 1 ≤ l ≤ r. Since A(BC) = AW , the entry in row i and column l of matrix
AW is given as follows:
(AW )il = Σ_{j=1}^{n} aij wjl = Σ_{j=1}^{n} aij ( Σ_{k=1}^{m} bjk ckl ) = Σ_{j=1}^{n} Σ_{k=1}^{m} aij bjk ckl .    (1.8)
Since the right-hand sides of (1.7) and (1.8) are equal, we conclude that (AB)C = A(BC).
Remark. Throughout the book, we use the symbol In to denote the n×n identity matrix. If
there is no confusion, we sometimes use I to denote the identity matrix with an appropriate
size. Besides, we also use ei to denote the ith column matrix of In , i.e.,
ei = [ 0 · · · 0 1 0 · · · 0 ]T ,
where the 1 occurs in the ith position.
Theorem 1.5. Let the sizes of the matrices be such that the indicated operations can be
performed. The following rules of matrix operations are valid.
(a) AI = A, IA = A.
(b) A + 0 = 0 + A = A.
(c) A0 = 0, 0A = 0.
Example. Let
A = [  1  −1 ]        B = [ 1  1 ]
    [ −1   1 ]  ,         [ 1  1 ]  .
Then AB = 0 even though both A and B are nonzero matrices. Thus, AB = 0 for A ∈ Rm×n
and B ∈ Rn×r does not imply that A = 0 or B = 0.
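A quick check of this example in NumPy:

```python
import numpy as np

A = np.array([[1, -1], [-1, 1]])
B = np.array([[1, 1], [1, 1]])

# AB is the zero matrix although neither factor is zero.
assert np.array_equal(A @ B, np.zeros((2, 2)))
assert A.any() and B.any()  # both factors have nonzero entries
```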
As the following theorem shows, the identity matrix is useful in studying reduced row-
echelon forms of square matrices. The proof of the theorem is left as an exercise.
Theorem 1.6. Let R be the reduced row-echelon form of a square matrix A. Then either R
has a row of zeros or R = I.
Definition. Let A and B be square matrices of the same size such that
AB = BA = I.
Then A is said to be invertible and B is called an inverse of A, denoted by B = A−1 .
Example. Let
A = [ 1  1 ]        B = [  3  −1 ]
    [ 2  3 ]  ,         [ −2   1 ]  .
Then
AB = [ 1  1 ] [  3  −1 ]  =  [ 1  0 ]  =  I
     [ 2  3 ] [ −2   1 ]     [ 0  1 ]
and
BA = [  3  −1 ] [ 1  1 ]  =  [ 1  0 ]  =  I.
     [ −2   1 ] [ 2  3 ]     [ 0  1 ]
The next theorem shows that an invertible matrix has exactly one inverse.
Theorem 1.7. Let B and C both be inverses of the matrix A, i.e.,
AB = BA = I,    AC = CA = I.
Then B = C.
Proof. On one hand,
(BA)C = IC = C.
On the other hand,
(BA)C = B(AC) = BI = B.
Thus, C = B.
For a 2 × 2 invertible matrix, the following theorem gives a formula for constructing the
inverse.
Theorem 1.8. Let
A = [ a  b ]
    [ c  d ]
with ad − bc ≠ 0. Then A is invertible and
A−1 = 1/(ad − bc) [  d  −b ]
                  [ −c   a ]  .
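Assuming the standard 2 × 2 formula A−1 = (1/(ad − bc)) [ d −b ; −c a ], a small sketch with exact arithmetic:

```python
from fractions import Fraction

def inv2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] via the 2 x 2 formula,
    assuming the determinant ad - bc is nonzero."""
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

Ainv = inv2(3, 1, 5, 2)  # det = 3*2 - 1*5 = 1
assert Ainv == [[2, -1], [-5, 3]]
```

The matrix [[3, 1], [5, 2]] is the one appearing in Exercise 1.22(a) below, so the result can be compared against a hand computation.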
The following theorem is concerned with the invertibility of the product of invertible
matrices.
Theorem 1.9. Let A and B be n × n invertible matrices. Then AB is invertible and
(AB)−1 = B−1 A−1 .
In general,
(A(1) A(2) · · · A(p) )−1 = A(p)−1 · · · A(2)−1 A(1)−1 ,    (1.9)
where A(i) (1 ≤ i ≤ p) are n × n invertible matrices and p is any positive integer.
24 CHAPTER 1. LINEAR SYSTEMS AND MATRICES
Proof. Since A and B are invertible, we have
(AB)(B−1 A−1 ) = A(BB−1 )A−1 = AA−1 = I
and
(B−1 A−1 )(AB) = B−1 (A−1 A)B = B−1 B = I.
Thus,
(AB)−1 = B−1 A−1 .
We are now going to prove the general case of (1.9) by using induction. When p = 2, we
have just proved that (1.9) is true. We assume that (1.9) is true for p = k − 1, i.e.,
(A(1) A(2) · · · A(k−1) )−1 = A(k−1)−1 · · · A(2)−1 A(1)−1 .    (1.10)
Considering the case of p = k, it follows from the case of p = 2 and (1.10) that
(A(1) A(2) · · · A(k−1) A(k) )−1 = ( (A(1) A(2) · · · A(k−1) ) A(k) )−1
= A(k)−1 (A(1) A(2) · · · A(k−1) )−1
= A(k)−1 A(k−1)−1 · · · A(2)−1 A(1)−1 .
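A concrete check of the product rule on two small invertible matrices (the matrices below are chosen for illustration, not taken from the text):

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[1, 0], [1, 1]])

lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
assert np.allclose(lhs, rhs)  # (AB)^{-1} = B^{-1} A^{-1}
# The order matters: inv(A) @ inv(B) gives a different matrix here.
assert not np.allclose(lhs, np.linalg.inv(A) @ np.linalg.inv(B))
```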
The following theorem gives a relationship between the inverse of an invertible matrix A
and the inverse of AT .
Theorem 1.10. Let A be an invertible matrix. Then AT is also invertible and
(AT )−1 = (A−1 )T .
Proof. Since A is invertible, we have
AT (A−1 )T = (A−1 A)T = I T = I
and
(A−1 )T AT = (AA−1 )T = I T = I.
Thus, (AT )−1 = (A−1 )T .
1.5 Elementary Matrices and a Method for Finding A−1
A0 := I,    An := AA · · · A (n factors).
If A is invertible, we also define
A−n := (A−1 )n = A−1 A−1 · · · A−1 (n factors).
If r and s are integers, then it follows from the definition of powers that
Ar As = Ar+s ,    (Ar )s = Ars .
We develop an algorithm for finding the inverse of an invertible matrix in this section.
Theorem 1.12. Let A be an m × n matrix. If the elementary matrix E results from per-
forming a certain row operation on Im , then the product EA is the matrix that results when
this same row operation is performed on A.
Proof. We only prove the statement concerned with the elementary matrix E(i, j). One can
prove the statements concerned with E(i(c)) and E(i, j(k)) easily by using the same trick.
Let r1 , r2 , . . . , rm denote the row matrices of A and e1 , e2 , . . . , em denote the column matrices
of Im . It follows that

A = [ r1 ]                 [ e1T ]
    [ ⋮  ]                 [  ⋮  ]
    [ ri ]                 [ ejT ]
    [ ⋮  ]  ,   E(i, j) =  [  ⋮  ]  .
    [ rj ]                 [ eiT ]
    [ ⋮  ]                 [  ⋮  ]
    [ rm ]                 [ emT ]
Since ekT A = rk for 1 ≤ k ≤ m, we have by (1.4),

E(i, j)A = [ e1T ]        [ e1T A ]     [ r1 ]
           [  ⋮  ]        [   ⋮   ]     [ ⋮  ]
           [ ejT ]        [ ejT A ]     [ rj ]
           [  ⋮  ]  A  =  [   ⋮   ]  =  [ ⋮  ]  .
           [ eiT ]        [ eiT A ]     [ ri ]
           [  ⋮  ]        [   ⋮   ]     [ ⋮  ]
           [ emT ]        [ emT A ]     [ rm ]
Hence E(i, j)A is a matrix that results when rows i and j of A are interchanged.
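The effect of E(i, j) is easy to demonstrate numerically; the sketch below (an illustration, not code from the text) builds the elementary matrix that interchanges rows 1 and 3 of I3 and applies it to a sample 3 × 3 matrix.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# E results from interchanging rows 0 and 2 (1st and 3rd) of the identity.
E = np.eye(3, dtype=int)
E[[0, 2]] = E[[2, 0]]

# Multiplying by E on the left performs the same row swap on A.
assert np.array_equal(E @ A, A[[2, 1, 0]])
```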
The following theorem is concerned with the invertibility of elementary matrices. The
proof of the theorem is left as an exercise.
Theorem 1.13. Every elementary matrix is invertible, and the inverse is also an elementary
matrix.
Remark. It follows from Theorem 1.13 that the inverse of any elementary matrix is still an
elementary matrix.
The next theorem establishes some equivalent statements of the invertibility of a matrix.
These results are extremely important and will be used many times later.
Theorem 1.14. Let A be an n × n matrix. Then the following statements are equivalent,
i.e., all true or all false.
(a) A is invertible.
(b) Ax = 0 has only the trivial solution.
(c) The reduced row-echelon form of A is In .
(d) A is expressible as a product of elementary matrices.
Proof.
(a) ⇒ (b): Let x0 be any solution of Ax = 0, i.e., Ax0 = 0. Since A is invertible, multiplying
both sides of Ax0 = 0 by A−1 , we have A−1 Ax0 = A−1 0, which implies In x0 = 0. Thus,
x0 = 0. Therefore, Ax = 0 has only the trivial solution.
(b) ⇒ (c): Let R be the reduced row-echelon form of A. Then by Theorem 1.6, R = In
or R has a zero row. If R has a zero row, then it follows from Theorem 1.1 that Rx = 0
has infinitely many solutions, i.e., nontrivial solutions. Therefore, Ax = 0 has nontrivial
solutions, which contradicts (b). Thus, R = In .
(c) ⇒ (d): If the reduced row-echelon form of A is In , then there exist some elementary
matrices E(1) , E(2) , . . . , E(k) such that
E(k) · · · E(2) E(1) A = In .    (1.11)
By Theorem 1.13, we know that every elementary matrix is invertible and the inverse of an
elementary matrix is still an elementary matrix. It follows from (1.11) that
A = E(1)−1 E(2)−1 · · · E(k)−1 ,
a product of elementary matrices, which proves (d).
(d) ⇒ (a): Since elementary matrices are invertible and a product of invertible matrices is
invertible (Theorem 1.9), A is invertible.
Moreover, since (1.11) gives E(k) · · · E(2) E(1) = A−1 , applying the same sequence of
elementary row operations to the partitioned matrix [ A ¦ I ] yields
[ E(k) · · · E(2) E(1) A ¦ E(k) · · · E(2) E(1) I ] = [ I ¦ A−1 ].
Thus, the sequence of elementary row operations that reduces A to I actually converts I to
A−1 simultaneously.
Thus,
A−1 = [  39  −14   2 ]
      [  −3    1   0 ]
      [ −19    7  −1 ] .
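The [ A ¦ I ] → [ I ¦ A−1 ] procedure can be sketched as a short routine (illustrative code, not the book's). As a check, inverting the matrix A−1 displayed above must produce a matrix A whose product with A−1 is the identity.

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by row-reducing [A | I] to [I | A^{-1}],
    using exact fractions.  Raises ValueError if A is singular."""
    n = len(A)
    # Adjoin the identity matrix to form the partitioned matrix [A | I].
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [x / M[c][c] for x in M[c]]   # create the leading 1
        for r in range(n):
            if r != c and M[r][c] != 0:      # clear the rest of column c
                M[r] = [a - M[r][c] * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]            # right half is the inverse

Ainv = [[39, -14, 2], [-3, 1, 0], [-19, 7, -1]]
A = inverse(Ainv)
I = [[sum(A[i][k] * Ainv[k][j] for k in range(3)) for j in range(3)]
     for i in range(3)]
assert I == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```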
We develop more results concerned with linear systems and invertibility of matrices in this
section.
Theorem 1.15. Every linear system has either no solution, exactly one solution, or infinitely
many solutions.
Proof. If Ax = b is a system of linear equations, then exactly one of the following is true:
(a) the system has no solution, (b) the system has exactly one solution, or (c) the system
has more than one solution.
The proof will be completed if we can show that the system has infinitely many solutions in
case (c). Assume that Ax = b has more than one solution, and let x0 = x1 − x2 , where x1
and x2 are any two distinct solutions. Therefore, x0 is nonzero. Moreover, for any scalar k,
A(x1 + kx0 ) = Ax1 + kAx0 = b + k(Ax1 − Ax2 ) = b + k(b − b) = b,
i.e., x1 + kx0 is a solution of Ax = b. Since x0 ≠ 0 and there are infinitely many choices for
k, we conclude that Ax = b has infinitely many solutions.
1.6 Further Results on Systems and Invertibility
From the definition of the inverse of an invertible matrix A, it is necessary to find a square
matrix B such that
AB = I, BA = I.
The next theorem shows that if B satisfies either condition, then the other condition holds
automatically.
Theorem 1.16. Let A and B be square matrices of the same size.
(a) If BA = I, then A is invertible and B = A−1 .
(b) If AB = I, then A is invertible and B = A−1 .
Proof. We only prove (a); the proof of (b) is similar. We consider the system Ax = 0 and
show that this system only has the trivial solution. Let x0 be any solution of this system,
i.e., Ax0 = 0. Multiplying both sides by B yields
BAx0 = B0.
Since BA = I, we deduce
Ix0 = 0, i.e., x0 = 0.
Thus, the system Ax = 0 has only the trivial solution. It follows from Theorem 1.14 that
A−1 exists. Multiplying both sides of BA = I on the right by A−1 , we obtain B = A−1 .
Theorem 1.17. Let A and B be square matrices of the same size. If AB is invertible, then
A and B must also be invertible.
Proof. Since AB is invertible, we have
(AB)(AB)−1 = I =⇒ A[B(AB)−1 ] = I,
and
(AB)−1 (AB) = I =⇒ [(AB)−1 A]B = I.
By Theorem 1.16, both A and B are invertible.
Example. Let A and B be square matrices of the same size. Show that if I − AB is
invertible, then I − BA is also invertible.
Proof. By Theorem 1.16, we only need to find a matrix X such that (I − BA)X = I.
Actually,
I = I − BA + BA = I − BA + BIA
= I − BA + B(I − AB)(I − AB)−1 A
= I − BA + (B − BAB)(I − AB)−1 A
= I − BA + (I − BA)B(I − AB)−1 A
= (I − BA)(I + B(I − AB)−1 A).
Thus, X = I + B(I − AB)−1 A satisfies (I − BA)X = I, and I − BA is invertible.
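The proof yields an explicit formula, (I − BA)−1 = I + B(I − AB)−1 A, which can be checked on a small example (the matrices below are chosen for illustration, so that I − AB is invertible):

```python
import numpy as np

A = np.array([[0, 1], [0, 0]])
B = np.array([[0, 0], [2, 0]])
I = np.eye(2)

# Explicit inverse of I - BA built from the inverse of I - AB.
X = I + B @ np.linalg.inv(I - A @ B) @ A
assert np.allclose((I - B @ A) @ X, I)
assert np.allclose(X @ (I - B @ A), I)
```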
The following theorem shows that we can solve a certain linear system by using the
inverse of its coefficient matrix.
Theorem 1.18. Let A be an n × n invertible matrix. Then for each n × 1 matrix b, the
system Ax = b has exactly one solution, namely, x = A−1 b.
Proof. Since A is invertible, there exists A−1 . For each n × 1 matrix b, let x0 be an arbitrary
solution of Ax = b, i.e., Ax0 = b. Multiplying both sides of Ax0 = b by A−1 , we have
x0 = A−1 b. Conversely, x = A−1 b is indeed a solution, since A(A−1 b) = b. Thus, the
system has exactly one solution, namely, x = A−1 b.
Theorem 1.19. Let A be an n × n matrix. Then the following statements are equivalent.
(a) A is invertible.
(b) Ax = 0 has only the trivial solution.
(c) The reduced row-echelon form of A is In .
(d) A is expressible as a product of elementary matrices.
(e) Ax = b has exactly one solution for every n × 1 matrix b.
Proof. Since we know that (a), (b), (c), and (d) are equivalent, it is sufficient to prove that
(a) ⇔ (e). The implication (a) ⇒ (e) follows directly from Theorem 1.18.
(e) ⇒ (a): If the system Ax = b has exactly one solution for every n × 1 matrix b, then in
particular, the systems
Ax = e1 , Ax = e2 , ..., Ax = en
are consistent, where ei denotes the ith column matrix of In for 1 ≤ i ≤ n. Let x1 , x2 , . . . , xn
be solutions of the respective systems, and let us form an n × n matrix C having these
solutions as columns:
C = [ x1 ¦ x2 ¦ · · · ¦ xn ].
Then we have by (1.3),
AC = [ Ax1 ¦ Ax2 ¦ · · · ¦ Axn ] = [ e1 ¦ e2 ¦ · · · ¦ en ] = In .
It follows from Theorem 1.16 that A is invertible.
1.7 Some Special Matrices
Certain classes of matrices have special structures, which are useful in linear algebra and
also have many applications in practice.
The following square matrices are called triangular matrices: a lower triangular matrix

L = [ a11   0    0   · · ·   0  ]
    [ a21  a22   0   · · ·   0  ]
    [ a31  a32  a33  · · ·   0  ]        (aij = 0 for i < j)
    [  ⋮    ⋮    ⋮           ⋮  ]
    [ an1  an2  an3  · · ·  ann ]

and an upper triangular matrix

U = [ a11  a12  a13  · · ·  a1n ]
    [  0   a22  a23  · · ·  a2n ]
    [  0    0   a33  · · ·  a3n ]        (aij = 0 for i > j).
    [  ⋮    ⋮    ⋮           ⋮  ]
    [  0    0    0   · · ·  ann ]

Theorem 1.20.
(a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an
upper triangular matrix is lower triangular.
(b) The product of lower triangular matrices is lower triangular, and the product of upper
triangular matrices is upper triangular.
(c) A triangular matrix is invertible if and only if its diagonal entries are all nonzero.
(d) The inverse of an invertible lower triangular matrix is lower triangular, and the inverse
of an invertible upper triangular matrix is upper triangular.
Proof. Part (a) is obvious. We defer the proof of (c) until the next chapter (after Theorem
2.8). Here we prove (b) and (d) only.
For (b), we will prove the result for lower triangular matrices. The proof for upper triangular
matrices is similar. Let A = [aij ] and B = [bij ] be n × n lower triangular matrices, and let
C = AB = [cij ]. Obviously, aij = bij = 0 for i < j. We can prove that C is lower triangular
by showing that cij = 0 for i < j. If i < j, then the terms in the expression of cij can be
grouped as follows:
cij = (ai1 b1j + ai2 b2j + · · · + ai,j−1 bj−1,j ) + (aij bjj + ai,j+1 bj+1,j + · · · + ain bnj ),
where in the first group the row number of b is less than the column number of b, and in
the second group the row number of a is less than the column number of a.
In the first grouping all of the b factors are zero since bij = 0 for i < j, and in the second
grouping all of the a factors are zero since aij = 0 for i < j. Thus, cij = 0 for i < j. It
follows that C is a lower triangular matrix.
For (d), we only prove the result for lower triangular matrices again. Let A = [aij ] be an
n × n invertible lower triangular matrix, where aij = 0 for i < j. From (c), we know that
aii ̸= 0 for all i. Suppose that B = [bij ] is the inverse of A. Then
AB = I. (1.12)
We now compare the entries in both sides of (1.12) row by row. Beginning with the first
row, we have for j > 1,
0 = (I)1j = a11 b1j + a12 b2j + · · · + a1n bnj = a11 b1j + 0 · b2j + · · · + 0 · bnj = a11 b1j ,
which implies b1j = 0. For the second row, we have for j > 2,
0 = (I)2j = Σ_{t=1}^{n} a2t btj = a21 b1j + a22 b2j = a22 b2j =⇒ b2j = 0.
By induction, we suppose that bij = 0 for i < j in the top k − 1 rows. For the kth row, we
have for j > k,
0 = (I)kj = Σ_{t=1}^{n} akt btj = Σ_{t=1}^{k} akt btj = akk bkj =⇒ bkj = 0.
Hence bij = 0 whenever i < j, i.e., B = A−1 is lower triangular.
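Parts (b) and (d) can be spot-checked numerically; the sketch below (an illustration, not part of the text) uses randomly chosen lower triangular matrices whose diagonal entries are nonzero, hence invertible by part (c).

```python
import numpy as np

rng = np.random.default_rng(4)
# Lower triangular matrices with entries in {1, ..., 4} on and below
# the diagonal, so every diagonal entry is nonzero.
A = np.tril(rng.integers(1, 5, size=(4, 4)))
B = np.tril(rng.integers(1, 5, size=(4, 4)))

# (b) the product of lower triangular matrices is lower triangular.
assert np.array_equal(A @ B, np.tril(A @ B))
# (d) the inverse of an invertible lower triangular matrix is lower triangular.
Ainv = np.linalg.inv(A)
assert np.allclose(Ainv, np.tril(Ainv))
```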
Theorem 1.21. Let A and B be symmetric matrices of the same size. Then
(a) AT is symmetric.
(b) A ± B are symmetric.
(c) kA is symmetric, where k is any scalar.
(d) AB is symmetric if and only if AB = BA.
(e) If A is invertible, then A−1 is symmetric.
Proof. Parts (a), (b), and (c) are obvious. Here we only prove (d) and (e).
For (d), since A and B are symmetric, it follows from Theorem 1.2 (d) that
AB = (AB)T ⇐⇒ AB = BT AT ⇐⇒ AB = BA.
For (e), if A is invertible, then by Theorem 1.10, (A−1 )T = (AT )−1 = A−1 , which shows
that A−1 is symmetric.
Theorem 1.22. Let A be an arbitrary matrix. Then AAT and AT A are symmetric. Further-
more, if A is square and invertible, then both AAT and AT A are symmetric and invertible.
Proof. It directly follows from Theorem 1.2 (d) and (a) that
(AAT )T = (AT )T AT = AAT and (AT A)T = AT (AT )T = AT A.
If A is square and invertible, it follows from Theorem 1.10 that AT is also invertible. By
Theorem 1.9, we find that AAT and AT A are invertible.
Remark. Every square matrix can be written as a sum of a symmetric matrix and a skew-
symmetric matrix. This is one of the most famous results in matrix theory. See Exercise
1.35.
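The decomposition mentioned in the remark is A = (A + AT )/2 + (A − AT )/2, where the first term is symmetric and the second is skew-symmetric; a quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.integers(-9, 9, size=(3, 3))

S = (A + A.T) / 2   # symmetric part
K = (A - A.T) / 2   # skew-symmetric part

assert np.array_equal(S, S.T)     # S is symmetric
assert np.array_equal(K, -K.T)    # K is skew-symmetric
assert np.array_equal(S + K, A)   # A is their sum
```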
Exercises
Elementary exercises
1.1. Determine which equations are linear in variables x, y, and z. If an equation is not
linear, explain why not.
(a) x − πy + 3√5 z = 0.    (b) x2 + y 2 + z 2 = 1.
(c) x−1 + 7y + z = sin(π/9).    (d) 3 cos x − 4y + z = √3.
(e) (cos 3)x − 4y + z = √3.    (f) x = −7xy + 3z.
(g) xy + z + 1 = 0.    (h) x−2 + y + 8z = 5.
1.2. Determine whether each system has a unique solution, infinitely many solutions, or no
solution. Then try to solve each system to confirm your answer.
(a) x + y = 0
    2x + y = 3.
(b) x + 5y = −1
    −x + y = −5
    2x + 4y = 4.
1.3. Find the augmented and coefficient matrices for each of the following linear systems.
(a) 2x − 3y + 5 = 0
    4x + 2y − 2 = 0
    3x + 5z + 3 = 0.
(b) 5x1 + x2 − x4 + 2x5 = 1
    3x2 + 2x3 − x4 = 3
    5x1 + 3x5 = 2.
1.4. Consider
x + y + 3z = a
2x − y + 2z = b
x − 2y − z = c.
Show that if this system is consistent, then the constants a, b, and c must satisfy c = b − a.
1.5. For which values of a, will the following system have no solution? Exactly one solution?
Infinitely many solutions?
x + 2y − 3z = 5
3x − y + 5z = 1
4x + y + (a2 − 14)z = a + 2.
1.8. Let
A = [  0  1 ]        B = [ 1  0 ]        C = [ 1  1 ]
    [ −1  0 ]  ,         [ 0  1 ]  ,         [ 1  1 ]  .

(a) Is M = [ 1  4 ] a linear combination of A, B, and C?
           [ 2  1 ]

(b) Is N = [ 1  2 ] a linear combination of A, B, and C?
           [ 3  4 ]
1.9. Let A ∈ R4×5 , B ∈ R4×5 , C ∈ R5×2 , D ∈ R4×2 , and E ∈ R5×4 . Determine which of the
following matrix operations can be performed. If so, find the size of each resulting matrix.
1.11. Let A, B, and C be square matrices of the same size. Find an example to show that
AB = AC but B ̸= C.
1.13. Let

C = [ 1  4  2 ]        D = [  1  5  2 ]        E = [  6  1  3 ]
    [ 3  1  5 ]  ,         [ −1  0  1 ]  ,         [ −1  1  2 ]  .
                           [  3  2  4 ]            [  4  1  3 ]
1.14. In each part, compute the product of A and B by the method of product of partitioned
matrices in Subsection 1.3.4. Then check your results by multiplying AB directly.
(a) A = [ −1   2 ¦ 1  5 ]        B = [  2   1 ¦  4 ]
        [  0  −3 ¦ 4  2 ]            [ −3   5 ¦  2 ]
        [ −−−−−−−−−−−−− ]            [ −−−−−−−−−−− ]  .
        [  1   5 ¦ 6  1 ]            [  7  −1 ¦  5 ]
                                     [  0   3 ¦ −3 ]

(b) A = [ −1   2  1 ¦ 5 ]        B = [  2   1 ¦  4 ]
        [ −−−−−−−−−−−−− ]            [ −3   5 ¦  2 ]
        [  0  −3  4 ¦ 2 ]            [  7  −1 ¦  5 ]  .
        [  1   5  6 ¦ 1 ]            [ −−−−−−−−−−− ]
                                     [  0   3 ¦ −3 ]
(c) A = [ 2  −5 ]        B = [ 2  −1  3  −4 ]
        [ 1   3 ]            [ 0   1  5   7 ]  .
        [ 0   5 ]
        [ −−−−− ]
        [ 1   4 ]
AB = a1 b1 + a2 b2 + · · · + an bn . (1.13)
1.16. Let A = [aij ] ∈ Rn×n . Find two n × 1 matrices x and y such that xT Ay = aij , where
1 ≤ i ≤ n and 1 ≤ j ≤ n.
1.17. Let
A = [ 3  −2  7 ]        B = [ 6  −2  4 ]
    [ 6   5  4 ]  ,         [ 0   1  3 ]  .
    [ 0   4  9 ]            [ 7   7  5 ]
Compute
(a) tr(3A − 5B T ). (b) tr(A2 ). (c) tr(AB).
1.22. Use Theorem 1.8 to compute the inverses of the following matrices.
(a) A = [ 3  1 ]  .    (b) B = [  cos θ  sin θ ]  .
        [ 5  2 ]               [ −sin θ  cos θ ]
1.23. Find
[ 1  0  1 ]n
[ 0  1  0 ]    ,  where n is any positive integer.
[ 0  0  1 ]
Let A be a square matrix such that A4 = 0. Show that
(I − A)−1 = I + A + A2 + A3 .
Let
A = [ A11  A12 ]
    [ A21  A22 ]  ,
where A11 and A22 are square matrices. Find a permutation matrix P such that P T AP = B,
where
B = [ A22  A21 ]
    [ A12  A11 ]  .
Challenge exercises
1.36. Determine the value of λ such that the following linear system has only the trivial
solution.
λx1 + x2 + x3 = 0
x1 + λx2 + x3 = 0
x1 + x2 + x3 = 0.
1.43. Let A ∈ Rn×n. Show that if AB = BA for all B ∈ Rn×n, then A = cI, where c is a
scalar.
(a) I − A is invertible.
(d) I + M is invertible.