Math 2114 Notes
Written by Joe Wells
Virginia Tech
Fall 2023
Contents

0 Preface

1 Vectors
  1.1 The Geometry and Algebra of Vectors
    1.1.1 Definitions and Examples
    1.1.2 Geometric Interpretation of Vector Operations
    1.1.3 Linear Combinations
    1.1.4 Geometry of Linear Combinations
    Exercises
  1.2 Length and Angle: The Dot Product
    1.2.1 Length
    1.2.2 Distances
    1.2.3 Angles
    Exercises

3 Matrices

5 Orthogonality
  5.1 Orthogonality in Rn
    5.1.1 Orthonormality
  5.2 Orthogonal Complements and Orthogonal Projections
    5.2.1 Orthogonal Complement

6 Vector Spaces
  6.3 Change of Basis
    Exercises
  6.6 The Matrix of a Linear Transformation

Appendix
  C Complex Numbers
    C.1 Matrices with complex numbers

Index
Chapter 0
Preface
There are many different approaches to linear algebra, and everyone has their preference. This
document is largely compiled from the course I taught in the Spring of 2020 at Virginia Tech, where
the book (Linear Algebra: A Modern Introduction 4th Ed. by David Poole) was out of my control.
Over the years, I’ve modified the precise topic ordering, which mostly follows the book with a few
exceptions.
Although not formally stated anywhere, this class is largely geared towards math-adjacent students
(engineering, physics, computer science, etc.) and so these notes and the presentation are at a lower
level of abstraction (and rigor) than what one might experience in another introductory linear
algebra course. In hindsight, I probably would have picked both a different text and order in which
to introduce the topics – it seems perverse to leave the phrase “vector space” until the 6th chapter!
Nevertheless, I did my best to gently introduce concepts as needed in order to more smoothly segue
the topics. As well, many of the homework exercises are designed to bridge certain theoretical gaps
in the material and introduce concepts much earlier than the text (notably, linear transformations).
I would like to thank the many students who inadvertently served as my copy editors over the years.
Chapter 1

Vectors

1.1 The Geometry and Algebra of Vectors

1.1.1 Definitions and Examples
Definition
A (real) vector space, V , is a set of objects (called vectors) with two operations – vec-
tor addition (denoted +) and scalar multiplication (no symbol) – satisfying the following
properties: for all vectors u, v, w in V and for all real numbers a, b (called scalars),
(a) u + v is in V [closure]
(b) u + v = v + u [commutativity]
(c) (u + v) + w = u + (v + w) [associativity]
(d) There is some vector 0, called the zero vector, so that u + 0 = u for all vectors u. [additive identity]
(e) For each u in V , there is some vector −u for which u + (−u) = 0. [additive inverse]
(f) au is in V [closure]
(g) a(u + v) = au + av [distributivity]
(h) (a + b)u = au + bu [distributivity]
(i) (ab)u = a(bu) [associativity]
(j) 1u = u [multiplicative identity]
It turns out that vector spaces are very common and you’re probably already familiar with many of
them without even knowing it.
Example 1.1.1.1: R is a vector space.
The real numbers, denoted R, form a real vector space when endowed with the normal addition
and multiplication operations.
Example 1.1.1.2: R2 is a vector space.
The set of all ordered pairs of real numbers, (x, y), is a real vector space when endowed with the
following operations.
addition: (x1 , y1 ) + (x2 , y2 ) = (x1 + x2 , y1 + y2 )
scalar multiplication: r(x, y) = (rx, ry)
The pair (0, 0) is the zero vector in this space.
Example 1.1.1.3: P n (x) is a vector space.
The set of all polynomials with real coefficients and degree at most n (these are polynomials of the form an xⁿ + · · · + a1 x + a0 ), denoted P n (x), is a vector space when considered with the usual addition and scalar multiplication.
addition: (an xⁿ + · · · + a0 ) + (bn xⁿ + · · · + b0 ) = (an + bn )xⁿ + · · · + (a0 + b0 )
scalar multiplication: r(an xⁿ + · · · + a0 ) = (ran )xⁿ + · · · + (ra0 )
The zero polynomial 0 is the zero vector in this space, and this space is sometimes denoted P n .
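For instance, in P 2 (x) (a quick illustration of these operations):
(1 + 2x + 3x²) + (4 − x²) = 5 + 2x + 2x²   and   3(1 − x) = 3 − 3x.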
Example 1.1.1.4: C(R) is a vector space.
The set of all continuous real-valued functions on R (these are functions f : R → R), denoted C(R), is a vector space when considered with the usual function addition and scalar multiplication.
addition: (f1 + f2 )(x) = f1 (x) + f2 (x)
scalar multiplication: (rf )(x) = r f (x)
The constant function f (x) = 0 is the zero vector in this space.
It is straightforward to show that each of the above is a vector space and we leave it as an exercise to
the reader.
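For instance, verifying property (b) for Example 1.1.1.2 amounts to the computation
(x1 , y1 ) + (x2 , y2 ) = (x1 + x2 , y1 + y2 ) = (x2 + x1 , y2 + y1 ) = (x2 , y2 ) + (x1 , y1 ),
where the middle equality holds because addition of real numbers is commutative; the other nine properties are checked similarly.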
1.1.2 Geometric Interpretation of Vector Operations

Now we’ll take a geometric interpretation of vectors to help justify the naturality of the operations of vector addition and scalar multiplication. Let o = (0, 0), p1 = (x1 , y1 ), p2 = (x2 , y2 ) be some points in the plane. Let op1 denote the arrow from o to p1 , and similarly let op2 denote the arrow from o to p2 . Furthermore, let p3 = p1 + p2 (with addition as described in Example 1.1.1.2). Since arrows communicate to us a notion of length and direction, the arrow op3 can be described as the total displacement and direction indicated by placing the two arrows op1 and op2 “head-to-tail”, as is illustrated in Figure 1.1.
Figure 1.1: The original vectors (left) and “head-to-tail” vector addition (right).
With p1 as before, consider some real number r. By the scalar multiplication operation described in Example 1.1.1.2, we can consider the point p4 = rp1 = (rx1 , ry1 ). As the name suggests, scalar multiplication by a real number r has the effect of scaling the arrow op1 . In the case that r > 0, the arrow op4 points in the same direction as op1 and its length is scaled by r. In the case that r < 0, the arrow op4 points in the opposite direction of op1 and its length is scaled by |r|. (See Figure 1.2.)
Figure 1.2: The original vector scaled by r > 0 (left) and r < 0 (right).
We can extend this same idea to ordered n-tuples of real numbers (x1 , x2 , . . . , xn ), associating them
with arrows in n-dimensional space (the word “dimension” here should be understood only in an
intuitive sense; the definition will be made precise in a later chapter), which leads us to the following
definition.
Definition
Rn is the set of arrays with n real entries, which may be written as a row vector
[x1 , . . . , xn ]
or as a column vector
v = [v1 , . . . , vn ]^T .
Each of these presentations represents the same object and should be regarded as the same.
However, certain computations are very much reliant upon the choice of representation. Throughout
this text, we will almost exclusively prefer column vectors and will be very deliberate whenever using
row vectors. One could equally well develop the theory of linear algebra using row vectors, so this is
merely a stylistic choice on the author’s part.
For the sake of concreteness, the remainder of the text will be devoted almost exclusively to
developing the theory of linear algebra using Rn . It is a fact that every finite-dimensional vector
space can be regarded as being “the same” as Rn , and so there is no loss of generality in making this
specification. Most of these notions do carry over to infinite-dimensional vector spaces, although
there is considerably more prerequisite knowledge and technical detail needed to discuss such things
with any sort of rigor.
1.1.3 Linear Combinations
With the operations of addition and scalar multiplication, the fundamental building blocks of any
vector space are linear combinations.
Definition: Linear Combination
A vector u in Rn is a linear combination of the vectors v1 , . . . vk if there are scalars r1 , . . . , rk
so that
u = r1 v1 + · · · + rk vk .
You can think of a linear combination as some sort of recipe - the vi ’s are the ingredients, the ri ’s are
the quantities of those ingredients, and u is the finished product. We also note that there is no
obvious relationship between k and n in the definition above. It could be that k < n, k = n, or that
k > n.
Definition: Standard Basis Vectors
In Rn , there are n vectors
e1 = [1, 0, . . . , 0]^T ,   e2 = [0, 1, . . . , 0]^T ,   · · · ,   en = [0, 0, . . . , 1]^T .
For now, ignore the word basis above; we will give technical meaning to that later. The reason these are standard is that, when looking to decompose a vector u into a linear combination of vectors, simply picking apart the components is probably the most natural thing to try first.
Example 1.1.3.1
Show that the vector u = [5, 6, 7]^T is a linear combination of the standard basis vectors.

u = [5, 6, 7]^T = [5, 0, 0]^T + [0, 6, 0]^T + [0, 0, 7]^T = 5e1 + 6e2 + 7e3
With the standard basis vectors above, one can be convinced that the linear combination that
appears is the unique such combination. However, in general, linear combinations need not be unique.
Example 1.1.3.2
Show that the vector u = [1, −1, 0]^T is a linear combination of the vectors v1 = [1, −1, 1]^T , v2 = [−1, 1, 1]^T , and v3 = [0, 0, 1]^T in multiple ways.
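For instance, checking entry by entry shows that
u = v1 − v3   and   u = −v2 + v3
both work; in fact, u = a v1 + (a − 1) v2 + (1 − 2a) v3 for every real number a, so there are infinitely many such linear combinations.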
The reader may be wondering precisely when a given vector admits a unique linear combination.
This is an important question with far-reaching implications, and so we will postpone this
discussion for a later chapter.
1.1.4 Geometry of Linear Combinations

The reader is probably familiar with the Cartesian grid, which provides a useful geometric depiction of the algebra. We similarly want to construct a grid that is uniquely suited to a given set of vectors in Rn . We’ll call this a coordinate grid (which is nonstandard terminology), and its construction is simple: the lines of the grid should be parallel to the vectors (in standard position) and the intersections of these grid lines correspond to integer linear combinations of the vectors.
Example 1.1.4.1
Show that the coordinate grid for R2 formed from the standard basis vectors e1 and e2 is the
usual Cartesian grid.
[Figure: the Cartesian grid generated by e1 and e2 .]
CHAPTER 0. PREFACE 11
Example 1.1.4.2
Draw the coordinate grid for R2 formed from the vectors v1 = [1, 1]^T and v2 = [1, −1]^T .

[Figure: the coordinate grid generated by v1 and v2 .]
Combined with the geometric intuition about vector addition and scalar multiplication, these
coordinate grids provide us with a way to visually identify the linear combination.
Example 1.1.4.3
Draw a picture which shows how the vector u = [2, 4]^T is a linear combination of the standard basis vectors e1 and e2 .

u = 2e1 + 4e2

[Figure: u reached from the origin by stepping along 2e1 and then 4e2 .]
Example 1.1.4.4
Show that the vector u = [2, 4]^T is a linear combination of the vectors v1 = [1, 1]^T and v2 = [1, −1]^T .

[Figure: u reached by 3v1 followed by −v2 on the coordinate grid for v1 and v2 .]

u = 3v1 − v2 .
Of course, this coordinate grid can also help to show us when linear combinations are not unique.
Example 1.1.4.5
Show that u = [2, 4]^T is a linear combination of the vectors v1 = [2, 0]^T , v2 = [2, 2]^T , and v3 = [0, 2]^T in multiple different ways.

[Figure: two copies of the coordinate grid, one showing u = v1 + 2v3 and the other showing u = v2 + v3 .]

u = v1 + 2v3
  = v2 + v3 .
1.2 Length and Angle: The Dot Product

Definition: dot product
Given vectors u = [u1 , . . . , un ]^T and v = [v1 , . . . , vn ]^T in Rn , the dot product of u and v is the scalar
u • v = u1 v1 + · · · + un vn .

Remark. Thinking of u and v as n × 1 matrices, the dot product agrees with the 1 × 1 matrix product
u^T v = [u1 v1 + · · · + un vn ].
Theorem 1.2.0.1: Properties of the Dot Product
For u, v, w ∈ Rn and a scalar k,
1. u • v = v • u [commutativity]
2. u • (v + w) = u • v + u • w [distributivity]
3. (ku) • v = k(u • v)
4. u • u ≥ 0, and u • u = 0 if and only if u = 0.

Proof. The proof is entirely straightforward and left as an exercise to the reader.
1.2.1 Length
For a vector v = [x, y]^T in R2 , notice that
v • v = x² + y² ,
which, from the Pythagorean theorem, is precisely the square of the length of v.
[Figure: v = [x, y]^T drawn as the hypotenuse, of length √(x² + y²), of a right triangle with legs x and y.]

Definition: length
The length (or norm) of a vector v = [v1 , . . . , vn ]^T in Rn is the scalar
‖v‖ = √(v • v) = √(v1² + · · · + vn²).
The following are immediate consequences of the properties of the dot product in Theorem 1.2.0.1.

Theorem 1.2.1.1: Properties of Length
For v ∈ Rn and a scalar k,
1. ‖v‖ = 0 if and only if v = 0.
2. ‖kv‖ = |k| ‖v‖.
The next result follows from the classical geometry result of the same name.
Theorem 1.2.1.2: Triangle Inequality
For u, v ∈ Rn ,
‖u + v‖ ≤ ‖u‖ + ‖v‖.
[Figure: u, v, and u + v forming a triangle; the side u + v is never longer than the other two sides combined.]

Definition: unit vector
A vector u ∈ Rn is a unit vector if ‖u‖ = 1.
Remark. Every unit vector in R2 corresponds to a point on the unit circle. Every unit vector in R3
corresponds to a point on the unit sphere. Generally, every unit vector in Rn corresponds to a point
on the unit (n − 1)-sphere.
Let v be any nonzero vector and let ℓ = ‖v‖ be its length. Then the vector v/ℓ is a unit vector because
‖v/ℓ‖ = ‖v‖/ℓ = ℓ/ℓ = 1.
The process of replacing v with the unit vector v/‖v‖ is called normalization.

[Figure: v and the unit vector v/‖v‖ pointing in the same direction.]

Remark. If ‖v‖ > 1, then normalization corresponds to shrinking v (pictured above), but if ‖v‖ < 1, then normalization stretches v.
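For example, if v = [3, 4]^T , then ‖v‖ = √(3² + 4²) = 5, and normalizing v produces the unit vector
v/‖v‖ = [3/5, 4/5]^T .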
Remark. Despite the similarities in name, “normalization” is unrelated to the concept of a “normal
vector.” What you’ll find is that “normal” is probably the most over-used word in mathematics.
Because there aren’t any around me as I type this, I’m going to go ahead and blame the physicists
for the abuse of language.
1.2.2 Distances
Recall that, for two points P (x1 , y1 ) and Q(x2 , y2 ) in the plane, the distance between them is given by
d(P, Q) = √((x1 − x2 )² + (y1 − y2 )²).
If we identify the point P (x1 , y1 ) with the vector u = [x1 , y1 ]^T and the point Q(x2 , y2 ) with the vector v = [x2 , y2 ]^T , then the right-hand side of the equation is just ‖u − v‖. As such, we can define
distances between vectors using the obvious analog.
Definition: distance
Given two vectors u, v ∈ Rn , the distance between u and v is
d(u, v) = ‖u − v‖.
Remark. Visualizing vectors as arrows emanating from the origin, distance, as above, is actually measuring the distance between the heads of the arrows.

[Figure: the vector u − v connecting the heads of u and v.]
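For instance, if u = [7, 1]^T and v = [3, −2]^T , then
d(u, v) = ‖u − v‖ = ‖[4, 3]^T ‖ = √(4² + 3²) = 5.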
1.2.3 Angles
[Figure: a triangle with side lengths a, b, c and angle θ between sides a and c (left); the triangle formed by u, v, and u − v, with angle θ between u and v (right).]

Recall the Law of Cosines: for a triangle with side lengths a, b, c and angle θ between the sides of lengths a and c,
b² = a² + c² − 2ac cos(θ).

Replacing the triangle △ABC with the triangle formed from vectors u, v, u − v (as in the picture above on the right), we have
‖u − v‖² = ‖u‖² + ‖v‖² − 2‖u‖‖v‖ cos(θ).

Expanding out the left-hand side of the above equation in terms of dot products, we get
‖u‖² − 2(u • v) + ‖v‖² = ‖u‖² + ‖v‖² − 2‖u‖‖v‖ cos(θ),
which simplifies to u • v = ‖u‖‖v‖ cos(θ). This motivates the following definition.

Definition: angle
The angle θ between two nonzero vectors u, v ∈ Rn is
θ = arccos( (u • v) / (‖u‖‖v‖) ).
Example 1.2.3.1
Compute the angle between the vectors u = [0, 3, 3]^T and v = [−1, 2, 1]^T .

We compute u • v = (0)(−1) + (3)(2) + (3)(1) = 9, ‖u‖ = √18 = 3√2, and ‖v‖ = √6, so
cos(θ) = 9 / (3√2 · √6) = 9 / (6√3) = √3/2,
and thus
θ = arccos(√3/2) = π/6.
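As a quick numerical sanity check of a computation like this, a few lines of Python (here using the numpy library; any matrix software works similarly) suffice:

    import numpy as np

    u = np.array([0.0, 3.0, 3.0])
    v = np.array([-1.0, 2.0, 1.0])

    # cos(theta) = (u . v) / (||u|| ||v||)
    cos_theta = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    theta = np.arccos(cos_theta)

    print(theta)      # 0.5235987755982988
    print(np.pi / 6)  # 0.5235987755982988, i.e., the same angle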
Chapter 2

Systems of Linear Equations

2.1 Introduction to Systems of Linear Equations

Definition
A linear equation in the variables x1 , . . . , xn is an equation that can be written in the form
a1 x1 + · · · + an xn = b
where a1 , . . . , an are real numbers called coefficients and b is a real number called the constant term. A solution of the equation is a vector [v1 , . . . , vn ]^T so that
a1 v1 + · · · + an vn = b.
Example 2.1.0.1
4x − y = 2 is an example of a linear equation. And notice we can rearrange it as y = 4x − 2,
which is the equation of a line (hence why we call these “linear”). The vector [1, 2]^T is a solution
because
4(1) − (2) = 2.
In fact, for any real number t, the vector [t, 4t − 2]^T is a solution because
4(t) − (4t − 2) = 2.
Definition
The collection of all solutions to a linear equation is called the solution set of that equation.
Noticing that
[t, 4t − 2]^T = [0, −2]^T + [t, 4t]^T = [0, −2]^T + t [1, 4]^T ,
we can write the solution set to the previous example as
{ [0, −2]^T + t [1, 4]^T where t ∈ R }.

Definition
Definition
The parametric form of the solution set is an expression of the form
{v0 + t1 v1 + · · · + tn vn where ti ∈ R}.
Example 2.1.0.2
sin(π/82364423) x + 540.6464 y + z = e^√71 is a linear equation because, complicated as they are, sin(π/82364423), 540.6464, and e^√71 are just real numbers.
Example 2.1.0.3
x + xy + y = 7 is not a linear equation because of that xy term.
Example 2.1.0.4
Definition
A system of linear equations is a finite set of linear equations, each with the same variables
(and probably different coefficients). A solution of a system of linear equations is a vector
that is simultaneously a solution for each linear equation in the system. A solution set is the
collection of all possible solutions to the system.
Example 2.1.0.5
The system
2x − y = 3
x + 3y = 5
has the vector [2, 1]^T as a solution; in fact, this is the only solution.
Definition
A system of linear equations is called consistent if it has at least one solution, and inconsistent
if it has no solutions.
It is a fact (which we will be able to prove later) that every system of linear equations has either no solutions, exactly one solution, or infinitely many solutions. You can convince yourself of this trichotomy by considering how it works for systems of equations with two variables (whose solution sets are graphically lines in the Cartesian plane). Two lines can intersect in a single point (if they are transverse), intersect in infinitely many points (if they coincide), or intersect in no points (if they are parallel).
Example 2.1.0.6
The system in Example 2.1.0.5 is consistent and the solution is unique.
Example 2.1.0.7
The system
x − y = 0
2x − 2y = 0
is consistent. It has the solution [x, y]^T = [1, 1]^T , but this is not the only solution. For any real
number t, the vector [t, t]^T is a solution, so there are infinitely many.
Example 2.1.0.8
The system
x + y = 0
x + y = 2
has no solutions.
Definition
Two systems of linear equations are called equivalent if they have the same solution set.
Example 2.1.0.9
Consider the system
x + 3y + 5z = 7
2y − 4z = 6
8z = 16
Because of this kind of “triangular structure,” we quickly deduce z = 2, and then 2y − 4(2) = 6
implies that y = 7, and then x + 3(7) + 5(2) = 7 implies that x = −24.
Since the variables themselves aren’t changing, we can save time and represent any linear system by
a matrix.
Definition
Given a system of linear equations

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
          ⋮
am1 x1 + am2 x2 + · · · + amn xn = bm

the coefficient matrix of the system is the m × n matrix A = [aij ] whose (i, j) entry is the coefficient aij , and the augmented matrix of the system is the coefficient matrix together with one extra column holding the constant terms b1 , . . . , bm .
Remark. If A is the coefficient matrix for some system and b = [b1 , . . . , bm ]^T is the column vector of constant terms, we may write [A | b] to represent the augmented matrix.
Remark. We will always be very explicit when we are making claims about augmented matrices
specifically, and we will take care to always draw the line for an augmented matrix. When
programming with matrices, however, the vertical line isn’t there, so you’ll have to be especially
careful when considering whether the matrix you’ve used is representative of an augmented matrix or
something else.
Example 2.1.0.10
The “triangular structure” of the system in Example 2.1.0.9 is also apparent in the corresponding augmented and coefficient matrices:

[ 1  3   5 |  7 ]        [ 1  3   5 ]
[ 0  2  −4 |  6 ]   and  [ 0  2  −4 ]
[ 0  0   8 | 16 ]        [ 0  0   8 ]
2.2 Direct Methods for Solving Linear Systems
Example 2.2.1.1
In Example 2.1.0.5 we saw that the system
2x − y = 3
x + 3y = 5
was consistent and had the unique solution [x, y]T = [2, 1]T . Consider now the following three
systems of equations. The first system is obtained by merely swapping the equations. The second
system is obtained by scaling the second equation. The third system is obtained by replacing
the second equation with the sum of the first and second equations.
x + 3y = 5 2x − y = 3 2x − y = 3
2x − y = 3 100x + 300y = 500 3x + 2y = 8
All of these have the (unique) solution [x, y]T = [2, 1]T (this is left as an exercise for the reader),
and so they are all equivalent.
It turns out that this fact isn’t specific to this system, but is generally true of any linear system:
these three operations do not change the solution set of the system! The “elimination method”
(which you may be familiar with from a previous algebra/precalculus class) uses this fact to solve
systems of linear equations. If we think about what this is doing to the corresponding augmented
matrices, we get what we call the elementary row operations.
Definition: Elementary Row Operations
The elementary row operations of a given matrix are the following operations:
1. Swapping Row i and Row j (denoted Ri ↔ Rj ).
2. Multiplying Row i by a nonzero constant (denoted kRi ↦ Ri ).
3. Adding (a multiple of) Row j to Row i (denoted Ri + kRj ↦ Ri ).
Remark. These operations are not specific to augmented matrices, but are true of any matrices. In
fact, unless explicitly stated otherwise, you should probably not ever assume that a matrix is
augmented.
Given two (augmented) matrices, the above operations do not change the solution set for the
corresponding linear system. So since two linear systems are equivalent if they have the same
solution set, the following is a natural definition.
Definition: row equivalence
Two matrices A and B are row equivalent if there is a sequence of elementary row operations
transforming A into B.
Example 2.2.1.2
Using the systems in Example 2.2.1.1, show that the corresponding augmented matrices are row
equivalent.
Starting from the augmented matrix of the original system,

[ 2  −1 | 3 ]
[ 1   3 | 5 ]

a single elementary row operation produces each of the other three augmented matrices:

R1 ↔ R2 :
[ 1   3 | 5 ]
[ 2  −1 | 3 ]

100R2 ↦ R2 :
[   2   −1 |   3 ]
[ 100  300 | 500 ]

R2 + R1 ↦ R2 :
[ 2  −1 | 3 ]
[ 3   2 | 8 ]
The following systems are equivalent (it’s again an exercise to the reader to verify this):
 x −  y −  z =  2          x − y − z = 2          x =  3
3x − 3y + 2z = 16              y + 3z = 5          y = −1
2x −  y +  z =  9                  5z = 10         z =  2

and thus they correspond to the following row equivalent augmented matrices

[ 1  −1  −1 |  2 ]     [ 1  −1  −1 |  2 ]     [ 1  0  0 |  3 ]
[ 3  −3   2 | 16 ]     [ 0   1   3 |  5 ]     [ 0  1  0 | −1 ]
[ 2  −1   1 |  9 ]     [ 0   0   5 | 10 ]     [ 0  0  1 |  2 ]
The second and third systems are much more useful for actually solving the system because they
have the nice triangular structure that allows us to back-substitute (or in the case of the third one,
simply reading off the solution). Let’s give names to this triangular structure that we like so much.
Definition: row echelon form
A matrix is in row echelon form if
1. any rows consisting entirely of zeros are at the bottom, and
2. in each nonzero row, the first nonzero entry (called the leading entry) lies in a column to the left of the leading entries of all rows below it.

Example 2.2.2.1
The following matrices are in row echelon form.
[ 1  2  3 ]     [ 1  2  3  0 ]     [ 0  1  2  3   4   5 ]
[ 0  4  5 ]     [ 0  4  5  0 ]     [ 0  0  6  7   8   9 ]
                [ 0  0  0  6 ]     [ 0  0  0  0  10  11 ]
                                   [ 0  0  0  0   0   0 ]
Example 2.2.2.2
The following matrices are not in row echelon form. (Why?)

[ 0  0  0 ]     [ 1  3  0 ]     [ 0  0  1  2  3 ]
[ 2  4  5 ]     [ 0  0  0 ]     [ 0  0  0  4  5 ]
                [ 0  2  3 ]     [ 0  0  0  6  7 ]
Definition: reduced row echelon form
A matrix is in reduced row echelon form if it is in row echelon form and, additionally,
3. the leading entry of each nonzero row is a 1 (called a leading 1), and
4. each leading 1 is the only nonzero entry in its column.

Example 2.2.2.3
The following matrices are in reduced row echelon form.
[ 1  5  0 ]     [ 1  0  3  0 ]     [ 0  1  0  3  0  0 ]
[ 0  0  1 ]     [ 0  1  7  0 ]     [ 0  0  1  7  0  0 ]
                [ 0  0  0  1 ]     [ 0  0  0  0  1  0 ]
                                   [ 0  0  0  0  0  1 ]
Example 2.2.2.4
The following matrices are not in reduced row echelon form. (Why?)
[ 1  0  0 ]     [ 0  1  0 ]     [ 0  1  1  1 ]
[ 0  2  0 ]     [ 1  0  0 ]     [ 0  0  1  1 ]
[ 0  0  1 ]     [ 0  0  1 ]     [ 0  0  0  1 ]
Theorem 2.2.2.5
Every matrix is row equivalent to a matrix in (reduced) row echelon form.
The proof of this is entirely procedural, so let’s see it done in the context of an example.
Example 2.2.2.6
Using elementary row operations, find a matrix R that is row equivalent to the following matrix.
[ 1  −1  −1   2 ]
[ 3  −3   2  16 ]
[ 2  −1   1   9 ]
1. Working left to right, find the first nonzero column in the matrix.
(The first column is nonzero.)
2. Among all of the rows with nonzero entries in this column, choose one and move it to Row
1.
(We’ll just keep the first row where it is.)
3. Use elementary row operations to clear all other nonzero entries in this column (below
Row 1).
R2 − 3R1 ↦ R2 :
[ 1  −1  −1   2 ]
[ 0   0   5  10 ]      (2.1)
[ 2  −1   1   9 ]

R3 − 2R1 ↦ R3 :
[ 1  −1  −1   2 ]
[ 0   0   5  10 ]      (2.2)
[ 0   1   3   5 ]

4. Working left to right, find the next column with a nonzero entry below Row 1.
(Here, that is the second column.)

5. Among all of the rows below Row 1 with a nonzero entry in this column, choose one and move it to Row 2.

R2 ↔ R3 :
[ 1  −1  −1   2 ]
[ 0   1   3   5 ]      (2.4)
[ 0   0   5  10 ]
6. Use elementary row operations to clear all other nonzero entries in this column (below
Row 2).
(Already done.)
7. Repeat this process until the matrix is in row echelon form.
(Huzzah, the matrix in Equation (2.4) is in row echelon form!)
8. Now scale every row so that its leading entry is a 1.
(1/5)R3 ↦ R3 :
[ 1  −1  −1   2 ]
[ 0   1   3   5 ]      (2.5)
[ 0   0   1   2 ]
9. Working from left to right, use elementary row operations to clear all nonzero entries above each leading 1. The result will be in reduced row echelon form.
R1 + R2 ↦ R1 :
[ 1  0  2  7 ]
[ 0  1  3  5 ]      (2.6)
[ 0  0  1  2 ]

R1 − 2R3 ↦ R1 :
[ 1  0  0  3 ]
[ 0  1  3  5 ]      (2.7)
[ 0  0  1  2 ]

R2 − 3R3 ↦ R2 :
[ 1  0  0   3 ]
[ 0  1  0  −1 ]     (2.8)
[ 0  0  1   2 ]
Remark. It’s worth noting that row operations can be performed in any order, and so the order of
operations carried out above is not strictly necessary. Indeed, if you are working through this
procedure by hand, you may choose to do operations that retain integers and positive entries for as
long as possible to reduce arithmetic mistakes.
Remark. The row echelon form of a given matrix is not unique, but the reduced row echelon form of
a matrix is unique.
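For a small illustration, consider

[ 1  1 ]
[ 1  3 ]

Applying R2 − R1 ↦ R2 yields the row echelon form

[ 1  1 ]
[ 0  2 ]

while instead applying R1 ↔ R2 and then R2 − R1 ↦ R2 yields the different row echelon form

[ 1   3 ]
[ 0  −2 ]

Both, however, row reduce to the same reduced row echelon form, the 2 × 2 identity matrix.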
Definition
The process described in the example above is called row reduction.
Theorem 2.2.2.7
Matrices A and B are row equivalent if and only if they can be row reduced to the same echelon
form.
Proof. Let α1 , α2 , . . . , αk be the sequence of row operations that row reduces A, and let β1 , β2 , . . . , βℓ be the sequence of row operations that row reduces B. In other words,

αk ◦ · · · ◦ α2 ◦ α1 (A) = RREF(A)   and   βℓ ◦ · · · ◦ β2 ◦ β1 (B) = RREF(B).

If A and B are row equivalent, then there is a sequence of row operations σ1 , σ2 , . . . , σn for which σn ◦ · · · ◦ σ2 ◦ σ1 (A) = B, and therefore

βℓ ◦ · · · ◦ β1 ◦ σn ◦ · · · ◦ σ1 ◦ α1⁻¹ ◦ · · · ◦ αk⁻¹ (RREF(A)) = βℓ ◦ · · · ◦ β1 ◦ σn ◦ · · · ◦ σ1 (A)
                                                               = βℓ ◦ · · · ◦ β1 (B)
                                                               = RREF(B).

Since any row operation to a matrix in reduced row echelon form will take it out of reduced row echelon form, it must be that RREF(A) = RREF(B). Conversely, if RREF(A) = RREF(B), then we have a sequence of row operations which changes A into B:

β1⁻¹ ◦ · · · ◦ βℓ⁻¹ ◦ αk ◦ · · · ◦ α1 (A) = β1⁻¹ ◦ · · · ◦ βℓ⁻¹ (RREF(A))
                                         = β1⁻¹ ◦ · · · ◦ βℓ⁻¹ (RREF(B))
                                         = B.
Definition
Given a linear system with augmented matrix [A | b] in (reduced) row echelon form, the columns of A that contain a leading entry are called pivot columns. Pivot columns correspond to leading variables in the system, and the remaining columns of A correspond to free variables in the system.
Definition
Gaussian elimination is the process of solving a linear system by row reducing its augmented matrix to row echelon form and then back-substituting. Gauss–Jordan elimination is the process of solving a linear system by row reducing its augmented matrix all the way to reduced row echelon form and reading off the solution.

Both processes take about the same amount of time by hand. But since the reduced row echelon form is unique and most matrix algebra software has an RREF feature, Gauss–Jordan is more efficient in practice.
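For instance, here is how one might check the row reduction from Example 2.2.2.6 with the RREF feature of the sympy Python library (one concrete software choice among many). Recall from the earlier remark that software has no vertical bar: the constants simply occupy the last column.

    import sympy as sp

    # The matrix from Example 2.2.2.6, entered as a plain 3x4 matrix.
    M = sp.Matrix([[1, -1, -1,  2],
                   [3, -3,  2, 16],
                   [2, -1,  1,  9]])

    R, pivot_columns = M.rref()

    print(R)              # Matrix([[1, 0, 0, 3], [0, 1, 0, -1], [0, 0, 1, 2]])
    print(pivot_columns)  # (0, 1, 2) -- sympy counts columns starting from 0

This agrees with the matrix in Equation (2.8).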
Example 2.2.3.1
Use Gauss–Jordan elimination to find the solution set for the given system
x1 − x2 + x3 + 4x4 = 0
2x1 + x2 − x3 + 2x4 = 9
3x1 − 3x2 + 3x3 + 12x4 = 0
Creating the augmented matrix and row reducing (the individual row operations are left as an exercise), we arrive at the reduced row echelon form

[ 1  −1   1   4 | 0 ]        [ 1  0   0   2 | 3 ]
[ 2   1  −1   2 | 9 ]   →    [ 0  1  −1  −2 | 3 ]
[ 3  −3   3  12 | 0 ]        [ 0  0   0   0 | 0 ]

so x1 and x2 are leading variables while x3 and x4 are free. Replacing our free variables x3 and x4 with parameters s and t (respectively), our solution set is

{ [3, 3, 0, 0]^T + s [0, 1, 1, 0]^T + t [−2, 2, 0, 1]^T where s, t ∈ R }
What we have seen is that both row echelon form and reduced row echelon form are useful in the
same way, but both have pros and cons. Row echelon form isn’t unique and, in the case of
augmented matrices, it takes a little bit more work to solve the system at the end. Reduced row
echelon form is unique and makes the solution at the end easier, but requires more steps initially.
Example 2.2.4.1
In Example 2.1.0.8, we stated that the system
x + y = 0
x + y = 2
was inconsistent. What do you observe when you row reduce the augmented matrix?
Applying R2 − R1 ↦ R2 to the augmented matrix:

[ 1  1 | 0 ]      [ 1  1 | 0 ]
[ 1  1 | 2 ]  →   [ 0  0 | 2 ]
That last row corresponds to the linear equation 0 = 2, which is patently false. This means
there can’t possibly be a solution to the system, i.e., it is inconsistent. We state this observation
as a proposition.
Proposition 2.2.4.2
Let A = [aij ] be the coefficient matrix for a system of m equations in n variables and b = [b1 , . . . , bm ]^T the vector of constant terms. If the row echelon form of the augmented matrix [A | b] contains a row i where ai1 = ai2 = · · · = ain = 0 and bi ≠ 0, then the system is inconsistent.
One might ask if we can say anything about a consistent system from its (reduced) row echelon form. To answer this, we first introduce the following definition.
Definition: rank
The rank of a matrix A is the number of nonzero rows in its (reduced) row echelon form, and
is denoted Rank(A).
Example 2.2.4.3
What is the rank of the coefficient matrix in Example 2.1.0.8? What about in Example 2.1.0.5?

1 and 2, respectively.
Theorem 2.2.4.4: The Rank Theorem
Let A be the coefficient matrix of a consistent system of linear equations with n variables. Then

(number of free variables) = n − Rank(A).

Remark. It turns out this theorem is actually just a special interpretation of a much more powerful theorem called the “Rank–Nullity Theorem,” but that discussion will have to wait for a later section.
Definition: homogeneous
A system of linear equations Ax = b is homogeneous if b = 0.
Remark. Homogeneous systems are nice because they ALWAYS have at least one solution (namely
the trivial solution x = 0).
Theorem 2.2.4.5
If Ax = 0 is a homogeneous system of m linear equations and n variables, where m < n, then
the system has infinitely many solutions.
Proof. Since the system is homogeneous, it has at least one solution. Since Rank(A) ≤ m, then by the Rank Theorem,

(number of free variables) = n − Rank(A) ≥ n − m > 0,

and a nonzero number of free variables implies that there are infinitely-many solutions.
Example 2.2.4.6
Use Gauss-Jordan elimination to find the solution set for the given system
x1 − x2 + 3x3 + 4x4 = 0
x1 + x2 − x3 − 2x4 = 0
Creating the augmented matrix and doing the corresponding row operations, we have

[ 1  −1   3   4 | 0 ]
[ 1   1  −1  −2 | 0 ]

R2 − R1 ↦ R2 :
[ 1  −1   3   4 | 0 ]
[ 0   2  −4  −6 | 0 ]

(1/2)R2 ↦ R2 :
[ 1  −1   3   4 | 0 ]
[ 0   1  −2  −3 | 0 ]

R1 + R2 ↦ R1 :
[ 1   0   1   1 | 0 ]
[ 0   1  −2  −3 | 0 ]

From here, we can see that x3 and x4 are free variables, so letting x3 = s and x4 = t, we get that the solution is

[x1 , x2 , x3 , x4 ]^T = [−s − t, 2s + 3t, s, t]^T = s [−1, 2, 1, 0]^T + t [−1, 3, 0, 1]^T
The way this last example differs from Example 2.2.3.1 is that we have exactly as many free variables
as we have vectors in the linear combination (instead of also having the extra constant vector added
on). This is more ideal because, with the usual vector operations, the collection of all of these
solutions is actually a vector space! We will explore this idea a bit further in the next section.
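Incidentally, this is a computation that software can double-check directly: sympy’s nullspace method returns a basis of solutions to the homogeneous system Ax = 0. A minimal sketch:

    import sympy as sp

    # Coefficient matrix from Example 2.2.4.6; since the system is
    # homogeneous, there is no need to carry the column of zeros.
    A = sp.Matrix([[1, -1,  3,  4],
                   [1,  1, -1, -2]])

    for vec in A.nullspace():
        print(vec.T)  # Matrix([[-1, 2, 1, 0]]) then Matrix([[-1, 3, 0, 1]])

These are exactly the two vectors appearing in the parametric solution above.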
Exercises
1. Give an example of three vectors u, v, w ∈ R2 for which {u, v, w} is a linearly dependent set,
but u cannot be written as a linear combination of v and w.
3.1.4 Transpose
Exercises
1. Suppose the sizes of the matrices A through F are given by the following table.

   matrix   A     B     C     D     E     F
   size    2×2   2×3   3×2   2×2   1×2   2×1

   Determine the sizes of each of the following matrices, if possible. If it is not possible, explain why.
(a) 3D − 2A
(b) D + BC
(c) BB^T
(d) B^T C^T − (CB)^T
(e) DA − AD
(f) (I2 − D)²
2. Compute (A − I2 )² where A =
   [ 1  −7 ]
   [ 0   1 ]
3. Give an example of a nonzero 2 × 2 matrix A for which A² = O2×2 . Show some work to verify it has this property.
4. Give an example of a nonzero 3 × 3 matrix A for which A³ = O3×3 . Show some work to verify it has this property.
5. Let A =
   [ 0  2 ]
   [ 0  0 ]
   Give an example of two 2 × 2 matrices B and C such that AB = AC but B ≠ C. Show some work to verify it has this property.
6. Give an example of two 2 × 2 matrices A and B for which AB 6= BA. Show some work to
verify it has this property.
7. Let A =
   [ 1  1 ]
   [ 0  1 ]
(a) Compute A² , A³ , and A⁴ .
(b) What is Aⁿ for a positive integer n?
3.5.3 Coordinates
4.2 Determinants
4.4.2 Diagonalization
3.7 Applications
5.1 Orthogonality in Rn
5.1.1 Orthonormality
5.3.1 Gram–Schmidt
5.3.2 QR Factorization
Index

C(R)
P n (x)
R
Rn
column vector
coordinate grid
linear combination
row vector
scalar
scalar multiplication
vector
vector addition
vector space
zero vector