MATH 212 - Linear Algebra I: Last Revision: November 22, 2012
solution: x = 2, y = 1
How many tons of ore should be extracted from each mine, to exactly fill an order for a total of 8
oz. gold, 78 oz. silver and 144 oz. copper?
Let x, y, z be the number of tons of raw ore from mines A, B, C, respectively. Then the total
amount of gold produced will be x + 2y + z. So we require that x + 2y + z = 8 if the order for gold
is to be filled exactly. We can write a similar equation for the total quantities of silver and copper
produced. In summary:
x +2y +z = 8
10x +18y +12z = 78
20x +22y +40z = 144
For any given system of linear equations, the fundamental questions are:
• Does a solution exist (is the system consistent)?
• If a solution exists, is it unique?
• How can we find all the solutions?
In the first section of the course we will develop some computational tools and some theory that
we can use to answer these questions.
2 Gauss-Jordan Elimination
Lec. #2
Throughout the algebraic manipulations required to solve the previous example, many of the sym-
bols were redundant—they act only as placeholders. All the essential information can be represented
in the augmented matrix:
1 2 1 8
10 18 12 78
20 22 40 144
The variable names and equals sign are implicit. The essential operations required to solve the
system can be performed by operating on rows of this matrix. This cuts down on the book-keeping.
Permissible operations on the equations become the following permissible elementary row operations
on the augmented matrix:
• interchange two rows;
• multiply a row by a non-zero constant;
• add a multiple of one row to another row.
These operations allow us to modify the form of the equations without changing their solution.
By a systematic application of these operations (the row reduction algorithm) we arrive at the
following:
1 2 1 8
0 1 −1 1
0 0 1 1
This gives z = 1 immediately; x and y can be found by back substitution.
solution: x = 3, y = 2, z = 1
1. Go to the first non-zero column; interchange rows so the top entry (the pivot) is non-zero.
2. Apply row operations to obtain all zeroes below the pivot.
3. Consider the sub-matrix below and to the right of the pivot; repeat steps 1–2 on this subma-
trix. Continue in this way until there are no more pivots available.
4. Scale each row so the leading entry is 1.
5. Go to the right-most pivot; use row operations to obtain all zeroes above the pivot.
6. Move one pivot to the left, and repeat. Continue in this way for all pivots.
After step 3 the augmented matrix is in row echelon form. It has the following properties:
• On each row, the leading entry is to the right of the leading entry in the row above, and has
only zeroes below it.
• Any all-zero rows are at the bottom.
At this stage (if the solution is unique) we can read off one solution, and proceed with back-
substitution. Or we can continue with steps 4–6, after which the matrix is in reduced row echelon
form (RREF). It then has the following additional properties:
• Every leading entry is a 1, and is the only non-zero entry in its column.
From the reduced row echelon form, we can just read off the solution, if it is unique.
Note: there’s more than one way to get to the RREF, so you might think there’s more than one
possible final answer. It turns out that for any matrix, the RREF is unique: everyone will get the
same final answer, no matter what steps they take to get there (we’ll prove this later).
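For readers who like to experiment, here is a minimal computational sketch of the Gauss-Jordan algorithm in Python (the code, and the use of Python at all, are our own additions; the course itself does not assume any software):

    def rref(M, tol=1e-12):
        """Return the reduced row echelon form of M (a list of rows of numbers)."""
        A = [row[:] for row in M]                      # work on a copy
        rows, cols = len(A), len(A[0])
        pivot_row = 0
        for col in range(cols):
            # find a row at or below pivot_row with a non-zero entry in this column
            r = next((i for i in range(pivot_row, rows) if abs(A[i][col]) > tol), None)
            if r is None:
                continue                               # no pivot in this column
            A[pivot_row], A[r] = A[r], A[pivot_row]    # interchange rows
            piv = A[pivot_row][col]
            A[pivot_row] = [x / piv for x in A[pivot_row]]   # scale the pivot to 1
            for i in range(rows):                      # zero out the rest of the column
                if i != pivot_row and abs(A[i][col]) > tol:
                    f = A[i][col]
                    A[i] = [a - f * b for a, b in zip(A[i], A[pivot_row])]
            pivot_row += 1
        return A

    # The augmented matrix of the mining example:
    print(rref([[1, 2, 1, 8], [10, 18, 12, 78], [20, 22, 40, 144]]))
    # the last column gives x = 3, y = 2, z = 1, as above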
Example 2.1. Use row operations to put the following matrix in reduced row echelon form:
0 5 3 −1
1 2 1 0
−3 −1 2 1
solution:
REF:
1 2 1  0
0 5 3 −1
0 0 2  2
RREF:
1 0 0  3/5
0 1 0 −4/5
0 0 1   1
Example 2.2. Use Gauss-Jordan elimination to solve the following linear systems:
(a)  x1 + x2 + x3 = 3            (b)  x1 − 2x2 + 3x3 = 9
     2x1 + 3x2 + x3 = 5               −x1 + 3x2 = −4
     x1 − x2 − 2x3 = −5               2x1 − 5x2 + 5x3 = 17
There is no point of intersection, and the system has no solution. We say this system is inconsistent.
When we apply Gauss-Jordan to this system, we get:
1 −2 0
0 0 8
The last row asserts 0 = 8 (which is obviously false). We say the system is inconsistent. In other
words, the original system is equivalent to the false statement “0 = 8”. We can generalize this
observation as follows:
• If a row echelon form of the augmented matrix has a row of the form [0 · · · 0 k] where k ≠ 0,
then the system is inconsistent.
• Conversely, if there is no such row then the system is consistent: there exists at least one
solution.
Example 3.1. For the linear system with the following augmented matrix, how many solutions
are there? (interpret the result graphically.)
0 1 2
1 −1 1
1 1 3
3.2 General Solution
Notice that the final row expresses the equation “0 = 0”, which is redundant. In essence,
Gauss-Jordan elimination has eliminated one of the equations. Also, this is the simplest system of equations
that is equivalent to the original; it can't be reduced further.
It will be useful to distinguish between the basic variables (x1 and x2 ) which correspond to pivot
columns, and the other, free variable (x3 ). This system has an infinite number of solutions, which
can be conveniently expressed by solving for the basic variables in terms of the free variables,
yielding the general solution
x1 = 3x3 + 1
x2 = x3
x3 is free.
which describes every possible solution of the original system. This means x3 can take on any
value, while x1 and x2 are determined by x3 according to the formulas above. So, for example, one
solution is:
x3 = 0, x1 = 1, x2 = 0
and another is
x3 = 2, x1 = 7, x2 = 2.
We get a different solution for each choice of x3 ; thus this system has an infinite number of solutions.
Note that the variable x3 acts as a parameter in the general solution: it can take any value, but this
value then determines all of the other variables. It is sometimes convenient to introduce a dummy
variable to act as this parameter. Then, for example, if we let x3 = t we can write the general
solution in parametric form as
x1 = 3t + 1
x2 = t
x3 = t
where the parameter t can take any value.
Example 3.2. Find the general solution of the linear system
x1 − x2 − x3 + 2x4 = 1
2x1 − 2x2 − x3 + 3x4 = 3
−x1 + x2 − x3 = −3,
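(For checking work, a computer algebra system can produce the RREF directly. The following sympy sketch, our own addition, sets up the augmented matrix of Example 3.2; rref() returns the reduced matrix together with the pivot columns.)

    import sympy as sp

    aug = sp.Matrix([[ 1, -1, -1, 2,  1],
                     [ 2, -2, -1, 3,  3],
                     [-1,  1, -1, 0, -3]])
    R, pivots = aug.rref()
    print(R)        # reduced row echelon form
    print(pivots)   # indices of the pivot columns; the remaining variables are free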
3.3 General results
Lec. #4
Note that for any system of linear equations, there are only three possibilities for the number of
solutions: there must be either:
• exactly one solution (when the number of pivot columns equals the number of variables)
• no solutions (when a row echelon form has a row of the form [0 · · · 0 k] with k ≠ 0)
• an infinite number of solutions (when there is no such row and there are fewer pivot columns
than the number of variables, so there is at least one free variable)
So, e.g. it is impossible for a linear system to have exactly two solutions.
Example 3.3. Consider the linear system
x + ky = 2
−3x + 4y = 1.
For what value(s) of k is there (a) a unique solution (b) no solution (c) an infinite number of
solutions?
Example 3.4. Consider the linear system
2x − y = 3
−4x + 2y = k
4x − 2y = 6.
For what value(s) of k is there (a) a unique solution (b) no solution (c) an infinite number of
solutions?
A linear system with all zeroes on the right-hand side is said to be homogeneous. For example,
the system
3x + y + z = 0
x−y+z =0
x + 5y − 2z = 0
is homogeneous. Homogeneous systems have some special properties that are worth noting. For
example, a homogeneous system always has at least one solution, the “zero solution” x = y = z =
· · · = 0.
Theorem 3.1. For a homogeneous linear system, if the number of equations is less than the number
of variables then the system has infinitely many solutions.
Proof. Since the augmented matrix has fewer rows than variables, the RREF will have fewer pivots
(there is at most one pivot per row) than the number of variables. Thus there must be at least one
free variable. The system cannot be inconsistent (the zero solution will always be a solution) so,
since there is a free variable, there must be infinitely many solutions.
4 Applications of Linear Systems
A standard application is balancing a chemical reaction, e.g. the combustion of propane:
x1 C3H8 + x2 O2 −→ x3 CO2 + x4 H2O
Write equations expressing the balance of numbers of each atom on both sides of the reaction:
3x1 = x3
8x1 = 2x4
2x2 = 2x3 + x4
Bring all terms to the left-hand sides, write the corresponding augmented matrix, and RREF:
3 0 −1  0  0            1 0 0 −1/4  0
8 0  0 −2  0    −→      0 1 0 −5/4  0
0 2 −2 −1  0            0 0 1 −3/4  0
So x4 is a free variable, and the general solution can be written in parametric form as
x1 = (1/4)t
x2 = (5/4)t
x3 = (3/4)t
x4 = t.
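(As a check, the same parametric solution can be obtained from the null space of the coefficient matrix. The sympy code below is our own illustration and is not part of the notes.)

    import sympy as sp

    A = sp.Matrix([[3, 0, -1,  0],
                   [8, 0,  0, -2],
                   [0, 2, -2, -1]])
    print(A.nullspace())   # a single basis vector, a multiple of (1, 5, 3, 4)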
A similar approach can be used to balance the reaction
B2S3 + H2O −→ H3BO3 + H2S
Example 4.3. Solve the following network flow (i.e. find the flow in each branch of the network).
[Figure: a network with three branches carrying flows x1, x2, x3 and external flows 10 and 7.]
Using the principle that “flow in = flow out” at each node, we get the following equations:
x1 + x2 = 10
x2 + x3 = 7
x1 = 3 + x3
The augmented matrix and its RREF are
1 1  0 10                1 0 −1 3
0 1  1  7   −−RREF−→     0 1  1 7
1 0 −1  3                0 0  0 0
so the general solution is
x1 = 3 + t
x2 = 7 − t
x3 = t
If the directions of flow are required to be as shown, then the range of possible solutions becomes
restricted as follows:
3 + t ≥ 0
7 − t ≥ 0     =⇒     0 ≤ t ≤ 7     =⇒     3 ≤ x1 ≤ 10,   0 ≤ x2 ≤ 7,   0 ≤ x3 ≤ 7
t ≥ 0
Three companies (A, B and C) compete for market share (e.g. customers). Each week, each
company loses a fixed proportion of its market share to its competitors, as shown below.
[Figure: diagram of the weekly market-share transfers among companies A, B and C.]
After many weeks, the distribution of market share reaches an equilibrium. What share of the
market does each of the companies have?
At equilibrium, the weekly net change of each company’s market share is zero. This gives the
following system of equations.
−0.4x1 + 0.2x2 + 0.1x3 = 0
0.1x1 − 0.2x2 + 0.1x3 = 0
0.3x1 − 0.2x3 = 0
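(A quick computational check of this system, using sympy with exact rational arithmetic; the code is our own addition. The equilibrium direction is the null space of the coefficient matrix, scaled so the shares sum to 1.)

    import sympy as sp
    from sympy import Rational as R

    A = sp.Matrix([[R(-4, 10), R(2, 10),  R(1, 10)],
                   [R(1, 10),  R(-2, 10), R(1, 10)],
                   [R(3, 10),  0,         R(-2, 10)]])
    v = A.nullspace()[0]     # direction of the equilibrium solutions
    print(v / sum(v))        # shares summing to 1: (4/15, 1/3, 2/5)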
5 Matrix Algebra
Lec. #7
We have already seen that matrices are useful to keep track of the “book-keeping” involved in
solving linear systems. Matrices have many other uses as well; to this end we need to develop our
theory of matrices and the mathematical operations that can be performed with them.
It is convenient to use symbols to represent matrices, just as symbols are used to represent numbers
in ordinary algebra. For example, let
A =  5  3        B =  1  2        C =  2  4        and c = 3.
     1 −2             3  4             6  8
     4  0             5  6
Throughout this course we will follow the convention of using uppercase letters to represent matri-
ces. Lowercase letters will be used to represent scalars (i.e. numbers).
Definition 5.1. Matrix addition (A + B) and subtraction (A − B) are defined by element-wise
addition/subtraction of the individual components of the matrices.
Note that when two matrices are added or subtracted, the result is a matrix of the same dimensions
as the original two. Also, addition and subtraction are only defined for matrices that have the
same dimensions. So, with A and C as given above, both A + C and A − C are undefined.
Definition 5.2. Multiplication by a scalar (cA) is defined by element-wise multiplication.
Again, the result is a matrix of the same dimensions as the original.
As in regular algebra, we can combine addition and scalar multiplication to form compound ex-
pressions.
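(A brief numerical illustration using Python's numpy library; numpy is our own choice and is not part of the course.)

    import numpy as np

    A = np.array([[5, 3], [1, -2], [4, 0]])
    B = np.array([[1, 2], [3, 4], [5, 6]])
    c = 3

    print(A + B)    # element-wise addition; the result has the same 3x2 shape
    print(A - B)    # element-wise subtraction
    print(c * A)    # scalar multiplication
    # A + C, with C the 2x2 matrix above, raises an error: the shapes don't match.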
The following rules of matrix algebra are consequences of the way we have defined addition, sub-
traction and scalar multiplication.
Theorem 5.1. Let A, B and C be given matrices with the same dimensions; let c and d be given
scalars. Let 0 be the zero matrix. Then:
1. A + B = B + A
2. (A + B) + C = A + (B + C)
3. A + 0 = A
4. A + (−A) = 0
5. c(A + B) = cA + cB
6. (c + d)A = cA + dA
7. c(dA) = (cd)A
8. 1A = A
We need to prove that these algebraic identities are indeed true in general. The proofs can be
done simply (but tediously) by considering arbitrary matrices whose components are represented
symbolically, and showing that both sides of each identity do indeed agree.
Consider, for example, identity 1 (A + B = B + A). From the definition of matrix addition, the left-hand side of the identity becomes:

A + B =  a11 + b11   a12 + b12   . . .   a1n + b1n
         a21 + b21   a22 + b22   . . .   a2n + b2n
            ...         ...                 ...
         am1 + bm1   am2 + bm2   . . .   amn + bmn

whereas

B + A =  b11 + a11   b12 + a12   . . .   b1n + a1n        a11 + b11   a12 + b12   . . .   a1n + b1n
         b21 + a21   b22 + a22   . . .   b2n + a2n   =    a21 + b21   a22 + b22   . . .   a2n + b2n
            ...         ...                 ...              ...         ...                 ...
         bm1 + am1   bm2 + am2   . . .   bmn + amn        am1 + bm1   am2 + bm2   . . .   amn + bmn
where in the last step we used the commutative law for ordinary addition of scalars. Comparing
final results from these two equations, we see that indeed A + B = B + A.
In essence, the algebraic properties listed above imply that matrices can be treated as algebraic
quantities that follow the same rules as in ordinary algebra, at least as far as addition, subtraction,
and scalar multiplication are concerned. That is, when doing algebra with matrices we can treat
them like ordinary variables — up to a point. So, for example, we can use algebra to solve equations
involving matrices: given
4A + 3B = 8C
we can solve for A to obtain A = 2C − (3/4)B.
5.3 Vectors
Matrices having just a single column play a special role, and are called vectors. A vector is sometimes
written as a column matrix, sometimes just as a comma-separated list. For example:
      1
x =   5   = (1, 5, 3).
      3
Throughout this course we will follow the convention of writing vectors using boldface symbols (or
with an underline, when writing by hand).
We identify the individual components of a vector using subscripts. So, for example, the vector x
above has components x1 = 1, x2 = 5, and x3 = 3. The order of the components matters:
(1, 2, 3) ≠ (1, 3, 2).
5.4 Vector form of the general solution of a linear system
Lec. #8
Go back to a previous example:
Example 5.4. Find, in vector form, all solutions of the linear system
3x1 + 5x2 − 4x3 = 0
−3x1 − 2x2 + 4x3 = 0.
RREF gives
1 0 −4/3 0
0 1 0 0
so the general solution is
x1 = (4/3) t
x2 = 0
x3 = t
where t ∈ IR is an arbitrary parameter. Introducing the vector of unknowns x = (x1 , x2 , x3 ) we can
write the general solution in vector form as
x = (x1 , x2 , x3 ) = ((4/3)t, 0, t) = t (4/3, 0, 1).
We see that every solution is a scalar multiple of the constant vector (4/3, 0, 1).
Example 5.5. Find, in vector form, the general solution of
x1 − x2 − x3 + 2x4 = 1
2x1 − 2x2 − x3 + 3x4 = 3
−x1 + x2 − x3 = −3.
Having defined addition, subtraction and scalar multiplication for matrices, we now turn to defining
matrix multiplication. It turns out that the most useful way to define the product of two matrices is
not by element-wise multiplication, as you might have expected. The matrix product has a rather
more complicated definition, whose utility will become apparent as we proceed.
Definition 5.3. Given two matrices A and B, the matrix product (AB) is defined by
AB = C
where Cij , the (i, j)'th component of C (i.e. the entry in the i'th row and j'th column), is given by
the sum of products of the entries of the i'th row of A with those of the j'th column of B. That is,
Cij = Ai1 B1j + Ai2 B2j + · · · + Ain Bnj   (the sum of Aik Bkj over k = 1, . . . , n).
Important: the product AB is only defined if the number of columns of A equals the number of
rows of B.
Example 5.6. Evaluate:

(a)   1 2 3      1  2         (b)   1 2 3      1  2         (c)   1  2      3
      4 5 6      0 −3               4 5 6      0 −3               0 −3      1
                −1  5
Example (c), the product of a matrix and a vector, is a common special case that’s worth extra
attention. Note that the product Ax always results in a vector.
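(A small numpy sketch of matrix products; the @ operator implements exactly the row-by-column rule of Definition 5.3. The particular matrices are just examples.)

    import numpy as np

    A = np.array([[1, 2, 3], [4, 5, 6]])        # 2x3
    B = np.array([[1, 2], [0, -3], [-1, 5]])    # 3x2

    print(A @ B)                                # a 2x2 matrix
    print(B @ A)                                # a 3x3 matrix: AB and BA differ, even in size
    print(np.array([[1, 2], [0, -3]]) @ np.array([3, 1]))   # matrix times vector gives a vector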
Matrix powers can be defined in terms of iterated multiplication. So, for example, if A is a square
matrix then A^3 = AAA. In general, A^n = AA · · · A (n factors).
Example 5.7. Calculate A^3 where
A =  1  2
     0  3
There is another useful way to think of matrix multiplication. Suppose the columns of A are the
vectors v1 , . . . , vn (we write A = [v1 v2 · · · vn ]). Then
Ax = x1 v1 + x2 v2 + · · · + xn vn .
So the matrix-vector product is the vector sum of the columns of A, each weighted by the corre-
sponding component of x (such a weighted sum is called a linear combination of the columns of
A.)
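(The column view can also be checked numerically; this numpy snippet is our own addition.)

    import numpy as np

    A = np.array([[1, 2], [0, -3], [-1, 5]])
    x = np.array([3, 1])

    print(A @ x)                          # the matrix-vector product
    print(3 * A[:, 0] + 1 * A[:, 1])      # the same vector, built as a linear combination of the columns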
The following algebraic properties are consequences of our definition of matrix multiplication (al-
though they are tedious to prove):
1. A(BC) = (AB)C
2. A(B + C) = AB + AC
3. (A + B)C = AC + BC
4. c(AB) = (cA)B = A(cB)
Thus, under matrix multiplication, matrices can be treated as ordinary algebraic quantities except
that matrix multiplication does not commute: left-multiplication is fundamentally different from
right-multiplication; they are not interchangeable. This is very important to remember. It is the
key difference between linear algebra and the ‘ordinary’ algebra, and accounts for much of the
richness in the theory of linear algebra.
Example 5.8. Calculate AB and BA where
A =  1  2        and    B =  3  4
     0 −3                    1  0
Example 5.9. Calculate AB and BA where
A =  1  2        and    B =  5
     0 −3                    2
Now consider the matrix
A =  1  2
     3  4
The components of this matrix look like the left-hand side of a linear system. Indeed, the linear
system
x1 + 2x2 = 5
3x1 + 4x2 = 8
can be written as the single matrix equation
Ax = b
where x = (x1 , x2 ) and b = (5, 8).
The matrix A is called the coefficient matrix; its components can be read directly from the system
of equations.
Any system of linear equations can be written as a single equivalent matrix equation Ax = b. This
turns out to be very useful.
Example 5.10. Write a system of linear equations that is equivalent to the matrix equation Ax = b
with
A =  1  3        and    b =  0
     2  5                    1
Example 5.11. Find matrices A and b such that the matrix equation Ax = b is equivalent to the
linear system
x1 − 2x2 = 0
2x1 − 4x2 = 8.
Identifying matrix equations with linear systems has many advantages, including:
• any linear system can be represented by a single equation of the form Ax = b, no matter
how many unknowns or equations there are
Since any linear system can be written in matrix form as Ax = b, there is much to be gained by
discovering how to solve this and other matrix equations.
To solve Ax = b, one is tempted to simply “divide” both sides by A. However, matrix division is
an undefined operation, so this won’t work. But consider the equation
ax = b
where a, x and b are simply scalars. To solve this equation we could divide both sides by a, but it
is also possible to solve it using only multiplication, by multiplying both sides by the number a−1 :
ax = b =⇒ x = (a−1 )b.
Our goal in the following sections is to do something similar for matrix equations, i.e. to solve
Ax = b using only multiplication. We will do this by finding a matrix A−1 so that the solution
can be written as x = A−1 b. To do this, we need to introduce the “identity matrix”.
Definition 5.4. The n × n identity matrix is the (unique) matrix I such that IA = AI = A for
every n × n matrix A. Every identity matrix I has the form
I =  1  0  . . .  0
     0  1  . . .  0
     ...          ...
     0  0  . . .  1
For the purposes of matrix multiplication, I plays a role similar to the number 1 in ordinary
multiplication. This simple idea proves to be very useful.
Note that multiplication by the identity does commute.
We are now in a position to present the main tool for solving matrix equations like Ax = b:
Definition 5.5. Given an n × n matrix A, the inverse of A (if it exists) is the (unique) n × n
matrix, written A−1 , such that
AA−1 = A−1 A = I.
If A−1 exists then A is said to be invertible or non-singular.
Now consider the equation Ax = b. By multiplying this equation on both sides by A−1 and
simplifying, using the rules of matrix algebra, we can solve for x as follows:
Ax = b =⇒ A−1 (Ax) = A−1 b
=⇒ (A−1 A)x = A−1 b
=⇒ Ix = A−1 b
=⇒ x = A−1 b.
This gives an algebraic method of solving a system of linear equations, without using the Gauss-
Jordan algorithm, and it gives an explicit formula for the solution. All we need is a way to find
A−1 . Often this involves a lot of work, but for special cases it’s easy.
x−y =4
x−z =6
6x − 2y − 3z = 2.
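(A numerical check of this system using numpy, our own addition: form A and b and compute x = A−1 b.)

    import numpy as np

    A = np.array([[1, -1,  0],
                  [1,  0, -1],
                  [6, -2, -3]], dtype=float)
    b = np.array([4, 6, 2], dtype=float)

    print(np.linalg.inv(A) @ b)    # x = A^{-1} b
    print(np.linalg.solve(A, b))   # same answer, computed without forming the inverse explicitly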
5.11 Non-invertible matrices
We have found an algebraic way of solving a system of linear equations, by transforming the system
into a matrix equation:
Ax = b =⇒ x = A−1 b.
Notice that this equation implies that there is a unique solution. Thus, if a linear system does not
have a unique solution then the coefficient matrix does not have an inverse. Of course, not all
matrices have inverses, since not all linear systems have unique solutions.
Definition 5.6. If a matrix A has an inverse then we say A is invertible or nonsingular. If A does
not have an inverse we say A is non-invertible or singular.
Example 5.17. If we try to find the inverse of
A =  1  2
     2  4
we get a division by zero. The matrix is singular: it does not have an inverse. Consider the linear
system corresponding to Ax = b:
x1 + 2x2 = b1
2x1 + 4x2 = b2
1 2 b1    −−RREF−→    1 2 b1
2 4 b2                0 0 b2 − 2b1
Depending on the values b1 , b2 this system is either inconsistent or has an infinite number of
solutions. Either way, there will not be a unique solution, so we cannot solve the corresponding
matrix equation in the form x = A−1 b. Therefore A cannot have an inverse.
We have seen that matrix inverses are useful for expressing the solution of linear systems, but this
will only be possible if we can find a matrix inverse when we need one.
Here is a general method for finding A−1 , if it exists:
1. Form the augmented matrix [ A | I ] where the identity I has the dimensions of A.
2. Compute the RREF of this augmented matrix.
3. If the left-hand block of the RREF is I, then the right-hand block is A−1 ; otherwise A is not invertible.
To summarize:
[ A | I ] −−RREF−→ [ I | A−1 ].
Later we’ll see why this process works; for now we’ll just take it for granted and use it whenever
we need to compute a matrix inverse.
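(Here is a short sympy sketch of the procedure, using a small matrix of our own choosing; sympy is not assumed by the course.)

    import sympy as sp

    A = sp.Matrix([[1, 2], [3, 4]])
    aug = A.row_join(sp.eye(2))    # form [ A | I ]
    R, _ = aug.rref()              # row reduce
    A_inv = R[:, 2:]               # the right-hand block is A^{-1}
    print(A_inv)
    print(A * A_inv)               # the identity, as a check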
Example 5.18. Find the inverse of
A =  1  4
     1  3
Example 5.19. Find the inverse of
1  2  3
2  5  3
1  0  8
if it exists.
Example 5.20. Show that
 1 −2 −1
−1  5  6
 5 −4  5
does not have an inverse.
The inverse can also be used to solve other matrix equations. For example, if A is invertible then
we can solve XA = B for the unknown matrix X:
XA = B =⇒ (XA)A−1 = BA−1
=⇒ X(AA−1 ) = BA−1
=⇒ XI = BA−1
=⇒ X = BA−1 .
Example 5.21. Solve the matrix equation ABXC = D for X, supposing A, B and C are invertible.
Matrix inverses have the following properties (assuming A and B are invertible n × n matrices and
c ≠ 0 is a scalar):
1. A−1 is unique.
2. (A−1 )−1 = A.
3. (AB)−1 = B −1 A−1 .
4. (cA)−1 = (1/c) A−1 .
Proof of 1: (by contradiction) Suppose B and C are two different inverses of A. This means that
BA = I, AB = I, CA = I, AC = I. Then:
B = IB = (CA)B = C(AB) = CI = C
Proof of 2: The matrix (A−1 )−1 is the (unique) matrix C such that A−1 C = I and CA−1 = I. But
we already know that A is such a matrix, so the inverse of A−1 is A.
Proof of 3: It suffices to demonstrate that both (B −1 A−1 )(AB) = I and (AB)(B −1 A−1 ) = I:
(B−1 A−1 )(AB) = B−1 (A−1 A)B = B−1 IB = B−1 B = I, and similarly (AB)(B−1 A−1 ) = A(BB−1 )A−1 = AA−1 = I.
The transpose AT of a matrix A is the matrix obtained by interchanging the rows and columns of
A. The transpose has the following algebraic properties:
1. (AT )T = A
2. (A + B)T = AT + B T
3. (kA)T = k(AT )
4. (AB)T = B T AT
Here we will see why it is that calculating the RREF of [ A | I ] gives [ I | A−1 ].
The key observation is that every elementary row operation can be represented by a certain matrix
multiplication:
Example 5.23. Show that performing the row operation R2 + 2R1 on the augmented matrix
−1 3 2 5
2 4 1 1
−4 9 0 2
is equivalent to left multiplication by the matrix
1 0 0
E = 2 1 0 .
0 0 1
In the previous example, notice that the matrix E is what we would get if we applied the same row
operation to the identity matrix. In fact, this is always the case. To see why, suppose a certain row
operation (or combination of row operations) corresponds to left-multiplication by some matrix E.
We have, of course, that
E = EI,
but now the right-hand side is the result of applying the given row operations to I, so this equation
states that we can find E by applying the corresponding row operations to I.
Example 5.24. Find the 3 × 3 matrix E such that left multiplication of a 3 × n matrix by E
corresponds to the row operations R1 ↔ R3 followed by R2 − 5R1 .
Now consider calculating the RREF of [ A | I ]. This means finding a sequence of row operations
that reduces the left side of this matrix to the identity. Equivalently, it means finding a series of
matrices E1 , E2 , . . . , Ek so that left multiplication by E1 , followed by E2 , etc., reduces A to I:
Ek · · · E2 E1 A = I.
Now this says that the matrix Ek · · · E2 E1 must be A−1 , since multiplication of A by Ek · · · E2 E1
gives the identity. When we calculate the RREF of [ A | I ], we apply the same row operations to
both sides, so we have
RREF
[ A | I ] −−−−→ [ Ek · · · E2 E1 A | Ek · · · E2 E1 I ] = [ I | Ek · · · E2 E1 ] = [ I | A−1 ].
Note that we have found an interesting characterization of A−1 : it is the matrix that performs the
series of elementary row operations required to reduce A to I. So, for example, when we solve
Ax = b by multiplying both sides by A−1 = Ek · · · E2 E1 , we are effectively performing the row
operations that result in the RREF of the augmented matrix [ A | b ]:
RREF
[ A | b ] −−−−→ [ Ek · · · E2 E1 A | Ek · · · E2 E1 b ] = [ I | A−1 b ].
In other words, A−1 is a record of all the row operations required to compute the RREF of the
augmented matrix; calculating A−1 b performs these row operations on b only, resulting in the final
column of the RREF, from which we can read off the solution.
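(The following numpy sketch, our own addition, checks this idea on the matrix of Example 5.23: applying R2 + 2R1 to the identity gives E, and left multiplication by E performs the same row operation.)

    import numpy as np

    A = np.array([[-1, 3, 2, 5], [2, 4, 1, 1], [-4, 9, 0, 2]], dtype=float)

    E = np.eye(3)
    E[1] += 2 * E[0]                # apply R2 + 2R1 to the identity
    B = A.copy()
    B[1] += 2 * B[0]                # apply R2 + 2R1 directly to A
    print(np.allclose(E @ A, B))    # True: E A equals the row-reduced A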
6 Applications
Lec. #13
Having built up a collection of the most essential ideas and techniques of linear algebra, we’re in a
position to explore some applications. The techniques discussed so far are used in a huge variety
of different fields; we’ll only sample some of the possibilities.
6.1 Balancing chemical equations: vector form
x1 C3 H8 + x2 O2 −→ x3 CO2 + x4 H2 O.
Rather than balance all species separately, we can balance all at once using vectors. For each
molecule, define a constant vector whose elements are the number of atoms of carbon, hydrogen
and oxygen, respectively. That is, represent C3H8 with the vector (3, 8, 0), O2 with the vector
(0, 0, 2), etc. The balanced chemical equation then becomes the vector equation:
x1 (3, 8, 0) + x2 (0, 0, 2) = x3 (1, 0, 2) + x4 (0, 2, 1).
Of course, component-by-component this represents three equations, each expressing the balance of
one species. But working at the level of vectors is more concise, there’s only one equation, there’s
a more intuitive connection to the chemical equation, and it’s obvious how to generalize to more
complicated reactions with more species.
Taking everything to the left-hand side we get the homogeneous matrix equation:
3  0 −1  0            0
8  0  0 −2    x   =   0 .
0  2 −2 −1            0
So one solution (taking x4 = 4) is x = (1, 5, 3, 4). The structure of the entire solution set is also
apparent: every solution is some multiple of this constant vector.
[Figure: migration diagram. Each year 5% of BC's population moves to Alberta (95% stays), and
10% of Alberta's population moves to BC (90% stays).]
1. How many people will be living in each province after 10 years? (Assume the foregoing
assumptions continue to hold throughout this time interval.) 50 years?
2. Eventually the population distribution reaches an equilibrium. How to predict this equilib-
rium distribution from the transition matrix A?
To solve 1, let x1(n) = the number of people in BC after n years and x2(n) = the number of people
in Alberta after n years. Let x(n) = (x1(n) , x2(n) ); so e.g. x(0) = (4 × 10^6 , 3 × 10^6 ). We can write
the following equations for the populations after one year:
x1(1) = 0.95 x1(0) + 0.1 x2(0)
x2(1) = 0.05 x1(0) + 0.9 x2(0)
In matrix form these equations read x(1) = Ax(0) , where
A =  0.95  0.1
     0.05  0.9
is the transition matrix. In general,
x(n) = A^n x(0) .
Note: the matrix A10 can be interpreted as a transition matrix, representing the net migration that
has occurred after 10 years.
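(A numpy sketch for question 1, with the starting populations taken from the text; the code itself is our own addition.)

    import numpy as np

    A = np.array([[0.95, 0.10],
                  [0.05, 0.90]])
    x0 = np.array([4e6, 3e6])

    print(np.linalg.matrix_power(A, 10) @ x0)   # populations after 10 years
    print(np.linalg.matrix_power(A, 50) @ x0)   # after 50 years (close to the equilibrium found below)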
To answer 2. notice that if x is the population vector at equilibrium then
Ax = x   =⇒   (A − I)x = 0   =⇒
( 0.95  0.1 )     ( 1  0 )   ( x1 )     ( 0 )
( 0.05  0.9 )  −  ( 0  1 )   ( x2 )  =  ( 0 ) .
We can’t solve this by using the matrix inverse of (A − I), since this would give x = (0, 0) (the
trivial solution). In fact (A − I) doesn’t have an inverse, because there are an infinite number of
solutions. Instead, we use Gauss-Jordan on the augmented matrix for this linear system:
−0.05   0.1  0    −−RREF−→    1 −2 0
 0.05  −0.1  0                0  0 0
So the general solution (in vector form) is
x = (x1 , x2 ) = (2x2 , x2 ) = x2 (2, 1)     where x2 is free.
So every equilibrium solution is one where BC has twice the population of Alberta. Of course the
particular value of x2 depends on the total population of the two regions:
4,000,000 + 3,000,000 = 2x2 + x2   =⇒   x2 = 7,000,000/3   =⇒   x ≈ (4,666,667, 2,333,333).
[Figure: a directed graph on five nodes, numbered 1 through 5.]
A directed graph consists of a set of vertices or nodes (numbered 1 through 5 above) and a set of
directed edges between nodes.
Some important types of questions that might be asked about such a graph are:
• How many different paths from node 1 to node 4 take exactly 6 steps?
• How many different paths are there from node 1 to node 4 that take fewer than 6 steps?
• What is the shortest path from node 1 to node 4?
Of course for a simple graph we can answer these questions easily; but for a complicated graph
these are hard problems. For general graphs, all of these questions can be answered using methods
of matrix algebra. The key is to associate with the graph an incidence matrix A with elements
aij = number of edges from the i'th node to the j'th node.
So for the graph above the incidence matrix is
0 1 0 0 1
0 0 1 1 0
A= 0 0 0 0 1
0 0 0 0 1
1 0 0 1 0
Given A, it is possible to recover a drawing of the graph. So A is as complete a description of the
graph as the picture.
Definition 6.1. A path or chain is a route (sequence of edges) from one node to another. A path
that consists of n edges is called an n-chain.
Theorem 6.1. The number of n-chains from node i to node j is equal to the element aij of An .
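(A numpy illustration of the theorem for the incidence matrix above; note that numpy indexes from 0, so node 1 corresponds to index 0. The code is our own addition.)

    import numpy as np

    A = np.array([[0, 1, 0, 0, 1],
                  [0, 0, 1, 1, 0],
                  [0, 0, 0, 0, 1],
                  [0, 0, 0, 0, 1],
                  [1, 0, 0, 1, 0]])

    print(np.linalg.matrix_power(A, 6)[0, 3])                               # 6-chains from node 1 to node 4
    print(sum(np.linalg.matrix_power(A, k)[0, 3] for k in range(1, 6)))     # chains of fewer than 6 steps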
7 Determinants
Lec. #16
Recall that for a 2 × 2 matrix
A =  a  b
     c  d
the inverse (when it exists) is given by
A−1 = 1/(ad − bc)  ·    d  −b
                       −c   a
The factor ad − bc in the denominator is called the determinant of the matrix, and plays an
important role. In particular, if ad − bc = 0 the inverse of the matrix is undefined. This provides
a quick check for whether a matrix has an inverse, and hence whether the corresponding linear
system has a unique solution. Determinants are important for many other reasons, so we'll spend
a couple of days studying them.
Definition 7.1. If A =  a  b    then det A = |A| = ad − bc.
                        c  d
Theorem 7.1. If A is a square matrix, then A−1 exists if and only if det A ≠ 0.
This really becomes useful for larger matrices, because it's a much faster alternative to trying to
compute A−1 by the Gauss-Jordan method. But how do we compute det A if A is a general n × n
matrix? This requires some more definitions:
Definition 7.2. Let A be an n × n matrix with elements aij . The matrix Mij formed by deleting
row i and column j from A is called the ij’th minor of A. The determinant of A, written det(A)
or |A|, is given by
det(A) = Σ (j = 1, . . . , n) of (−1)^(i+j) aij |Mij |     (cofactor expansion along row i, for any i = 1, . . . , n)     (1)
So, for example, if A is a 3 × 3 matrix then the cofactor expansion along row 1 gives
det(A) = a11 |M11 | − a12 |M12 | + a13 |M13 |.
The same value is obtained by cofactor expansion along any row or any column (with the signs
(−1)^(i+j) ). For example, consider
A =  1  5  0
     2  4 −1
     0 −2  0
Expanding along the first row:
det A = (1) | 4 −1 |  −  (5) | 2 −1 |  +  (0) | 2  4 |
            |−2  0 |         | 0  0 |         | 0 −2 |
      = (1)[(4)(0) − (−1)(−2)] − (5)[(2)(0) − (−1)(0)] + 0
      = −2
This often makes calculating a determinant easier, because we are free to make a convenient choice
of row or column along which to do the cofactor expansion — typically this is the row or column
with the most zeroes.
Example 7.2. To calculate the determinant in the previous example, we can use a cofactor ex-
pansion down the third column:
det A = (0) | 2  4 |  −  (−1) | 1  5 |  +  (0) | 1  5 |
            | 0 −2 |          | 0 −2 |         | 2  4 |
      = (1)[(1)(−2) − (5)(0)] = −2.
Example 7.3. Calculate the determinants of
(a)  A =  3  0  4            (b)  B =  3  5 −8  4
          2  3  2                      0 −2  3 −7
          0  5 −1                      0  0  1  5
                                       0  0  0  2
Aside: Students who have studied recursion (e.g. computer science) will recognize the determinant
calculation as a recursive algorithm: the determinant of an n × n matrix is expressed as a sum of
determinants of n smaller sub-matrices of dimension (n − 1) × (n − 1). These in turn are expressed
as determinants of yet smaller sub-matrices, of dimension (n − 2) × (n − 2), and so on until the
algorithm terminates at the 2 × 2 case, for which we have a simple formula.
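(For the curious, here is that recursive algorithm written out in Python; it is an illustration only, and far too slow for large matrices.)

    def det(M):
        """Determinant by cofactor expansion along the first row."""
        n = len(M)
        if n == 1:
            return M[0][0]
        total = 0
        for j in range(n):
            minor = [row[:j] + row[j+1:] for row in M[1:]]   # delete row 1 and column j
            total += (-1) ** j * M[0][j] * det(minor)
        return total

    print(det([[1, 5, 0], [2, 4, -1], [0, -2, 0]]))    # -2, agreeing with the worked example above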
The number of arithmetic operations needed to calculate det A grows very quickly as A gets bigger.
Counting just the multiplications (the most expensive operation), cofactor expansion of an n × n
determinant requires on the order of n! multiplications. For the 10 × 10 case this is 10! = 3,628,800;
for the 20 × 20 case, 20! ≈ 2 × 10^18 (more than the age of the universe in seconds!)
The determinant of a triangular matrix is the product of its diagonal entries.
In particular, a triangular matrix is invertible if and only if its diagonal entries are all non-zero.
Applying elementary row operations to a matrix changes the value of its determinant in predictable
ways:
Theorem 7.4. Let A be an n × n matrix. Then:
1. interchanging two rows of A changes the sign of the determinant;
2. multiplying a row of A by a constant c multiplies the determinant by c;
3. adding a multiple of one row of A to another row leaves the determinant unchanged.
7.4 Summing up
Lec. #18
The determinant helps to tie together the disparate parts of the theory discussed so far, as sum-
marized in the following theorem:
Theorem 7.6. Let A be an n × n matrix. Then all of the following statements are equivalent:
1. A is invertible.
2. The linear system Ax = 0 has only the trivial solution.
3. The linear system Ax = b has a unique solution.
4. A is row equivalent to the n × n identity matrix I.
5. The row echelon form of A has n pivots.
6. det A ≠ 0.
8 Vectors
Lec. #19
Recall: a vector is a column matrix, e.g.
      1
x =   0   = (1, 0, 2).
      2
Geometrically, 2-vectors can be identified with points in the plane; similarly, 3-vectors represent
points in 3-space. It is also useful to think of a vector as displacement, e.g. relative to the origin,
which specifies a magnitude and direction, but without any particular location; we represent this
graphically by drawing a vector as an arrow, often but not necessarily anchored at the origin.
Definition 8.1. The set of all 2-vectors is denoted IR2 . The set of all 3-vectors is denoted IR3 .
More generally, for any fixed integer n ≥ 0, IRn denotes the set of all n-vectors. A set IRn is called
a Euclidean space.
Geometrically, IR2 is the set of points in the plane; IR3 is the set of all points in 3-space. An
ordinary scalar is an element of IR1 , or simply IR, the real number line.
- scalar multiplication lengthens or shortens a vector while preserving its orientation (if the scalar
multiple is negative, there is also a reflection)
Vector subtraction can be viewed similarly. The difference x − y asks “what must I add to y to get
to x?”
Definition 8.3 (Span). The set of all possible linear combinations of vectors v1 , . . . , vn is denoted
span{v1 , . . . , vn }.
Essentially, the span of a given set of vectors is the set of points you can “get to” by forming linear
combinations.
Example 8.1. Let u = (1, 3) ∈ IR2 . Then span{u} is the set of all vectors of the form cu where c
is a scalar. Geometrically, span{u} is the set of vectors on a line with direction u passing through
the origin.
Example 8.2. Let u = (1, 2, 3), v = (2, 0, 0) ∈ IR3 . Then span{u, v} is the set of all vectors of the
form c1 u + c2 v where c1 , c2 are scalars. Geometrically, span{u, v} is the plane through the origin
containing both u and v.
Example 8.3. Consider v1 = (1, 2), v2 = (1, −1). We see that span{v1 , v2 } contains x = (5, 1),
since x = 2v1 + 3v2 (i.e. x is a linear combination of v1 , v2 ).
Example 8.4. Let v1 = (2, 3), v2 = (4, 6). Consider x = (4, 4). Is x ∈ span{v1 , v2 }?
Example 8.5. Let v1 = (1, 1), v2 = (1, −1). Prove that span{v1 , v2 } = IR2 .
(Since you can “get to” all of IR2 by forming linear combinations of v1 and v2 , we say that these
vectors span IR2 .)
Example 8.6. Let v1 = (1, 2, 0), v2 = (1, 0, 1), v3 = (3, 2, 2). Do v1 , v2 , v3 span IR3 ?
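(One way to check Example 8.6 computationally: three vectors span IR3 exactly when the matrix having them as columns has three pivots, i.e. rank 3. The numpy snippet below is our own addition.)

    import numpy as np

    V = np.column_stack(([1, 2, 0], [1, 0, 1], [3, 2, 2]))
    print(np.linalg.matrix_rank(V))   # 2, so these vectors do not span IR3 (indeed v3 = v1 + 2 v2)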
Example 8.7. Let u = (1, 3), v = (2, 6) ∈ IR2 . Then span{u, v} = span{u} = span{v} is a line
through the origin with direction u, since u and v point in the same direction. In other words, u
and v don’t span a plane because they don’t point in independent directions.
[need a picture here]
Example 8.8. Let u = (1, 0, 0), v = (0, 1, 0), w = (1, 2, 0) ∈ IR3 . Then span{u, v, w} is the plane
with equation z = 0. This is the same plane obtained by considering just span{u, v} because w
does not point in a third independent direction. Loosely speaking, including w in the set of vectors
does not increase the number of places we can get to by forming linear combinations.
[need a picture here]
In the previous example it’s easy to see that all three vectors lie in the same plane, but consider:
Example 8.9. Suppose u = (1, 2, 3), v = (0, 1, −1), w = (1, 4, 1) ∈ IR3 . Do all three vectors point
in essentially independent directions, or do they all lie in a single plane? To answer this question
we need to check whether any one of the vectors can be expressed as a linear combination of the
other two (in which case this vector does not point in a third direction independent of the other
two). In fact you can check that w = u + 2v, so indeed w is a linear combination of u and v, and
all three vectors lie in a single plane. In other words, the three vectors span a set in which there
are only two independent directions.
Definition (Linear Dependence). A set of vectors v1 , . . . , vn is said to be linearly dependent if
c1 v1 + c2 v2 + · · · + cn vn = 0
for some scalars c1 , . . . cn that are not all zero. Otherwise the vectors are linearly independent.
That is, the vectors are linearly independent if the equation above has only the trivial solution
c1 = c2 = · · · = cn = 0.
The motivation behind the definition is that if it were possible to find scalars c1 , . . . , cn that satisfy
the equation in the definition, then it would be possible to express one of the vectors as a linear
combination of the others; e.g. if c1 ≠ 0 then
v1 = −(c2 /c1 ) v2 − · · · − (cn /c1 ) vn
so that v1 is a linear combination of the other vectors; that is, it doesn’t point in an independent
direction.
Example 8.10. Let u = (1, 2, 3), v = (0, 1, −1), w = (1, 4, 0) ∈ IR3 . To test for linear indepen-
dence, we could painstakingly check directly whether any of these vectors can be expressed as a
linear combination of the other two (as in the previous example). It’s easier to apply the definition,
in which we seek numbers c1 , c2 , c3 that satisfy the equation
c1 (1, 2, 3) + c2 (0, 1, −1) + c3 (1, 4, 0) = (0, 0, 0).
Row reduction of the corresponding augmented matrix shows that there are no free variables,
so the only solution is the trivial solution c1 = c2 = c3 = 0. Therefore the vectors are linearly
independent. In fact they span all of IR3 .
If you have two linearly independent vectors in IR2 , then it should be impossible to choose a third
vector that points in a third independent direction. This idea is formalized in the following theorem.
Theorem 8.1. A set of m vectors in IRn is always linearly dependent if m > n.
Proof. (sketch) Following the same method as the previous example, we set up a system of equations
for the unknowns c1 , . . . , cm , and the corresponding augmented matrix. We get n equations and m
unknowns. If m > n there will be at least one free variable, hence an infinite number of nontrivial
solutions. The existence of a nontrivial solution means the vectors are linearly dependent.
Lec. #22
An immediate consequence is the following corollary:
Corollary 8.1. Any set of linearly independent vectors in IRn contains at most n vectors.
Example 8.11. By the theorem just given, we know that the set of vectors u = (1, 2), v =
(0, 1), w = (3, 4) ∈ IR2 is necessarily linearly dependent.
Linear independence gives us another way to think of existence of solutions for systems of linear
equations:
Theorem 8.2. Let A be an n × n matrix. The homogeneous system Ax = 0 has nontrivial
solutions if and only if the columns of the coefficient matrix A are a set of linearly dependent
vectors. Conversely, if the columns of A are linearly independent then the equation Ax = 0 has
only the trivial solution x = 0.
Proof. The proof is basically the same as for Theorem 8.1 with m = n.
Theorem 8.3. Any set of n linearly independent vectors in IRn spans IRn .
Definition (Basis). A basis for a vector space V is a linearly independent set of vectors that spans V .
A basis defines a coordinate system for V , in that any x ∈ V can be expressed as a (unique) linear
combination
x = c1 v1 + c2 v2 + · · · + cn vn .
The vector c = (c1 , . . . , cn ) gives the coordinates of x relative to the basis B = {v1 , . . . , vn }.
Definition 8.6 (Standard Basis for IRn ). In IRn the vectors
e1 = (1, 0, . . . , 0),   e2 = (0, 1, 0, . . . , 0),   . . . ,   en = (0, . . . , 0, 1)
form the standard basis.
Example 8.12. The vector x = (2, 5) ∈ IR2 can be expressed as
x = 2e1 + 5e2 .
The numbers 2, 5 are the familiar coordinates of x relative to the standard basis. An alternative
basis for IR2 is defined by the vectors v1 = (1, 1), v2 = (−1, 1). The same vector x = (2, 5) can be
expressed with coordinates (c1 , c2 ) relative to the basis B = {v1 , v2 }, according to:
(2, 5) = c1 (1, 1) + c2 (−1, 1).
Solving this linear system gives c1 = 7/2, c2 = 3/2. We can think of these as the coordinates of x
relative to a coordinate system with axes in the directions of v1 and v2 . We write (x)B = (7/2, 3/2)
to denote that the coordinates are being specified relative to the basis B.
Corollary 8.2. Every set of n linearly independent vectors in IRn is a basis for IRn .
Example 8.13. Consider the set of vectors x ∈ IR3 that lie on the plane
x + 2y + 3z = 0.
We’ve seen already that such a plane through the origin is a vector space (a subspace of IR3 ); call
it V . To find a basis for V , write the general solution of the equation in vector form:
x = (x, y, z) = (−2y − 3z, y, z) = y (−2, 1, 0) + z (−3, 0, 1)
Thus any point x in the plane can be expressed as a linear combination of v1 = (−2, 1, 0) and
v2 = (−3, 0, 1). So B = {v1 , v2 } provides a basis for V ; you can think of B as a coordinate system
in V , with axes in the v1 - and v2 -directions.
Lec. #23
The number of vectors required to span a vector space V provides some measure of the “size” of
V ; we call this number the dimension of V :
Definition 8.7 (Dimension). If a vector space V has a finite basis, then the dimension of V
(written dim(V )) is the number of vectors in the basis.
There’s some ambiguity in this definition, since at first it seems that different bases might give
different dimensions. The following theorem establishes that in fact there is just one dimension:
Theorem 8.4. If {u1 , . . . , um } and {v1 , . . . , vn } are both bases for a vector space V , then m = n.
That is, any two bases for V have the same number of vectors.
Example 8.14. In IRn , a basis (e.g. the standard basis {e1 , . . . , en }) consists of n vectors, so that
dim(IRn ) = n.
Example 8.15. In a previous example we found a basis B = {v1 , v2 } for the plane V with equation
x + 2y + 3z = 0. Therefore dim(V ) = 2.
Suppose we have a vector space V and a given vector x ∈ V , expressed relative to the standard basis. Given
some other basis B = {v1 , . . . , vn }, we would like to find the coordinates of x relative to B. That
is, we wish to find the coordinates (x)B = (c1 , . . . , cn ) such that
c1 v1 + · · · + cn vn = x.
We can write a simpler formula for the c1 , . . . , cn by writing this equation in matrix form:
[ v1 v2 · · · vn ] (c1 , . . . , cn ) = x.
Letting PB = v1 · · · vn and (x)B = (c1 , . . . , cn ), this becomes simply
x = PB (x)B .
The matrix PB is called the change of basis matrix or transition matrix from the standard basis to
the basis B. We can solve algebraically for xB :
(x)B = PB−1 x.
Example 8.17. Let x = (3, 1) and B = {v1 , v2 } where v1 = (1, 1), v2 = (−1, 1). The transition
matrix for the change of basis from the standard basis to the basis B is
PB = [ v1 v2 ] =  1 −1        so    PB−1 =   1/2  1/2
                  1  1                      −1/2  1/2
Thus
x = PB (x)B   =⇒   (x)B = PB−1 x =    1/2  1/2     3      =     2
                                     −1/2  1/2     1           −1
9 Abstract Vector Spaces
Lec. #24
9.1 Abstract vector spaces and their properties
So far we have been studying only vectors in IR2 , IR3 and (generally) IRn — the so-called real
Euclidean spaces. Most of our considerations were motivated by either systems of linear equations
or coordinate geometry. But it turns out that most of what we have done can be generalized, to the
case of general vector spaces. While this new setting is more abstract, we gain greatly in that the
ideas applicable to vectors in IRn (about which you have hopefully developed some intuitive ideas
by now) become powerful tools applicable to much broader classes of mathematical objects. The
ideas developed in this section are fundamental to many mathematical fields, including advanced
calculus, differential equations, and analysis in general.
9.1.1 Definitions
Definition 9.1 (Real vector space). A real vector space V is a set of mathematical objects, called
vectors, together with two operations called addition and scalar multiplication that satisfy the
following axioms (for all x, y, z ∈ V and all scalars α, β):
1. x + y ∈ V (closure under addition)
2. x + y = y + x
3. (x + y) + z = x + (y + z)
4. there is a zero vector 0 ∈ V such that x + 0 = x for every x ∈ V
5. every x ∈ V has an additive inverse −x ∈ V such that x + (−x) = 0
6. αx ∈ V (closure under scalar multiplication)
7. α(x + y) = αx + αy
8. (α + β)x = αx + βx
9. α(βx) = (αβ)x
10. 1x = x
Recall that a scalar is a real number (thus the term real in the definition).
The axioms above are fairly obviously true for the sets IR, IR2 , IR3 and IRn in general, so each of
these is a real vector space. However, there are much more exotic vector spaces that also satisfy
these axioms. The power of the concept is that any mathematical fact that we can prove about
vector spaces in general will be true for these exotic spaces, and not just for IRn . Any theorems we
prove in this section will apply to any vector space.
To get a sense for the generality of the concept of a vector space, it will help to consider some
examples. Often the meaning of a definition becomes more clear if we consider some examples that
do not satisfy the requirements of the definition.
Example 9.1. The unit interval V = [0, 1] ⊂ IR is not a vector space. This is because V is not
closed under addition: both x = 0.5 ∈ V and y = 0.8 ∈ V , but x + y = 1.3 ∉ V . In other words, it
is possible to get outside of V using addition of elements of V .
Example 9.2. The set of non-negative integers V = {0, 1, 2, . . .} is not a vector space. This is
because V does not contain additive inverses: 1 ∈ V but −1 ∉ V .
Example 9.3. The set of integers V = {. . . , −2, −1, 0, 1, 2, . . .} is not a real vector space. This is
because V is not closed under scalar multiplication: 2 ∈ V but 0.3(2) = 0.6 ∉ V . In other words,
it is possible to get outside of V using scalar multiplication.
Example 9.4. The set of points (x, y) on the line y = mx is a vector space, i.e. V = {(x, y) ∈ IR2 :
y = mx}.
Most of the axioms are obvious. We’ll prove some of the less obvious ones. Let x = (x1 , y1 ) and
y = (x2 , y2 ) ∈ V , so y1 = mx1 and y2 = mx2 . To show that x + y ∈ V :
x + y = (x1 , mx1 ) + (x2 , mx2 ) = (x1 + x2 , mx1 + mx2 ) = (x1 + x2 , m(x1 + x2 )) ∈ V.
To show that −x ∈ V :
−x = −(x1 , mx1 ) = (−x1 , −mx1 ) = (−x1 , m(−x1 )) ∈ V.
To show that αx ∈ V :
αx = α(x1 , mx1 ) = (αx1 , αmx1 ) = (αx1 , m(αx1 )) ∈ V.
Notice that the line V in the previous example is equivalent to IR. One sometimes speaks of such
a V as being an embedding of IR in IR2 . More commonly we say V is a subspace of IR2 . Similarly
a plane (a 2-dimensional vector space) in 3 dimensions is basically an embedding of IR2 in IR3 , or
a 2-dimensional subspace of IR3 :
Now some more exotic examples.
Example 9.5. Let Pn = {polynomials P (x): P has real coefficients and deg(P ) ≤ n}. Then Pn is
a vector space. Clearly the sum of two polynomials of degree ≤ n is also a polynomial of degree
≤ n, so Pn is closed under addition. The additive identity is given by 0 = 0x^n + 0x^(n−1) + · · · +
0x + 0 ∈ Pn . If p = an x^n + an−1 x^(n−1) + · · · + a1 x + a0 then the corresponding additive inverse is
−p = −an x^n − an−1 x^(n−1) − · · · − a1 x − a0 , so that p + (−p) = 0. All other properties are at least
as obvious.
Example 9.6. The space C[a, b] = {real-valued continuous functions on the interval [a, b]} is a
real vector space. It is closed under addition because the sum of two continuous functions is also
a continuous function. The addition operation is the usual addition of functions: (f + g)(x) =
f (x) + g(x). Similarly, scalar multiplication is defined by (αf )(x) = αf (x). The zero element 0
is the zero function f (x) = 0. The additive inverse is given by (−f )(x) = −f (x). With these
identifications, the rest of the axioms are evident.
9.1.2 Subspaces
Lec. #25
To check that a given set is a vector space is tedious, requiring that we check many axioms.
Fortunately the situation is simpler if we already have a vector space V and we wonder whether
some subset U ⊂ V is a vector space:
Theorem 9.1. Let V be a vector space. A nonempty subset U ⊂ V is a vector space if the following
closure axioms hold:
1. If x, y ∈ U then x + y ∈ U .
2. If x ∈ U and α is a scalar, then αx ∈ U .
Basically, this works because U inherits all of the other axioms (algebraic properties of addition
and multiplication) from V . The only axioms that might be broken by taking a subset of V are
the closure axioms, so we need to check that these hold on U .
An important consequence is that every subspace of a vector space V automatically contains the 0
element. This is because for any x ∈ U , closure under scalar multiplication implies that 0x = 0 ∈ U
(we’ll prove this in the next section).
Example 9.7. If U ⊂ IR3 is the set of points lying in some plane that passes through the origin,
then U is a vector space (a subspace of IR3 ). To prove this, write U = {(x, y, z) : Ax+By +Cz = 0}
(points in the plane satisfy the equation of the plane in standard form). To check closure under
addition, suppose x = (x1 , y1 , z1 ) ∈ U and y = (x2 , y2 , z2 ) ∈ U . To prove that x + y ∈ U , write
x + y = (x1 + x2 , y1 + y2 , z1 + z2 ).
Now show that the coordinates of this vector satisfy the equation of the plane:
A(x1 + x2 ) + B(y1 + y2 ) + C(z1 + z2 ) = (Ax1 + By1 + Cz1 ) + (Ax2 + By2 + Cz2 ) = 0 + 0 = 0,
since x ∈ U and y ∈ U . Hence x + y ∈ U .
Example 9.8. Let V = C[0, 1] and U = {f ∈ V : ∫0^1 f (x) dx = 0}. Then U is a subspace of V :
if f, g ∈ U then ∫0^1 (f + g)(x) dx = ∫0^1 f (x) dx + ∫0^1 g(x) dx = 0, so f + g ∈ U ; and for any scalar α,
∫0^1 (αf )(x) dx = α ∫0^1 f (x) dx = 0, so αf ∈ U .
Example 9.9. Let V = C[0, 1] and U = {f ∈ V : f (0) = 0}. Then U is a subspace of V .
Lec. #26
Example 9.10. Let V = C[0, 1] and U = {f ∈ V : f ′ (x) exists and is continuous}. Then U is a
subspace of V .
Given some set U that we wish to prove is a vector space, sometimes it's possible to recognize U as
the intersection of two sets that have already been proved to be vector spaces. Then the following
theorem guarantees that U also is a vector space.
Theorem 9.2. If U1 and U2 are both subspaces of a vector space V , then the intersection U1 ∩ U2
is also a subspace of V .
Example 9.11. We've already proved that V = C[0, 1] = {continuous functions on [0, 1]} is a
vector space. We've also proved that U1 = {f ∈ V : ∫0^1 f (x) dx = 0} and U2 = {f ∈ V :
f ′ (x) is continuous} are both subspaces of V . From the previous theorem it follows that the
subset U = {f ∈ V : f has a continuous first derivative and ∫0^1 f (x) dx = 0} is also a subspace of V ,
since we recognize that U = U1 ∩ U2 .
Example 9.12. Let u = (1, 2, 3), v = (2, 0, 0) ∈ IR3 . Then the plane spanned by u and v is a
subspace of IR3 .
Example 9.13. Consider V = C[0, 1], and let f (x) = 1, g(x) = x, and h(x) = x2 . Then
U = span{f, g, h} is the set of functions of the form A + Bx + Cx2 for some coefficients A, B, C
(i.e. U is the set of quadratic functions). Then U is a subspace of V . In fact U is identically the
space P2 from example 9.5.
Numerous logical consequences follow from the axioms defining a vector space. For example:
Theorem 9.4. Let V be a vector space. Then for any x ∈ V and any scalar α,
1. α0 = 0
2. 0x = 0
3. (−1)x = −x.
Proof. 1. Using the fact that 0 + 0 = 0 and the distributive axiom,
α0 = α(0 + 0) = α0 + α0.
Adding −(α0) to both sides we get 0 = α0 + (α0 + (−(α0))) = α0 + 0 = α0, as required.
2. The proof is basically the same as above. Start with 0+0 = 0, so that 0x = (0+0)x = 0x+0x.
Adding −0x to both sides we get 0 = 0x + (0x + (−0x)) = 0x + 0 = 0x.
It may seem that we’re being inordinately fussy in carrying out the steps of the proof, but there are
important reasons. Notice that every step is justified by one of the defining axioms; no algebraic
“shortcuts” are taken, because it isn’t apparent what shortcuts might be valid. Everything must
follow directly from the defining axioms.
Every vector space is equipped with an “addition” operation we label “+”, and a “zero element”
we label “0”. In doing so we’re re-using some familiar mathematical symbols, but they need not
correspond to addition and the number 0 as we have come to know them. So no algebraic operations
with them can be carried out without justification either by the defining axioms or by some theorem
that follows from them. This is typical of axiomatic reasoning that is common throughout advanced
mathematics.
The usual laws of algebra lead to the following properties of the scalar product:
1. x · y = y · x
2. x · (y + z) = x · y + x · z
3. (ax) · y = a(x · y)
10.2 Length of a vector
The length (or magnitude) of a vector v = (x, y) ∈ IR2 is |v| = √(x^2 + y^2 ); this follows from the
Pythagorean Theorem. More generally, if v = (x1 , x2 , . . . , xn ) is any vector in IRn then its magnitude is
|v| = √(x1^2 + x2^2 + · · · + xn^2 ).
From the definition it follows that for any vector v and scalar c,
|cv| = |c||v|.
Example 10.2. A wire is extended between the tops of two flag poles, whose tops are located at
points with coordinates (3, 4, 1) and (2, 2, 2) in IR3 . Find the length of the wire.
Theorem (Triangle Inequality). For any vectors u, v ∈ IRn ,
|u + v| ≤ |u| + |v|.
This theorem states the intuitive fact that a straight line is the shortest distance between two
points.
Proof. Use the algebraic properties of the dot product (and the Cauchy-Schwarz inequality, |u·v| ≤
|u||v|, which we haven’t discussed):
|u + v|2 = (u + v) · (u + v)
= u · u + 2u · v + v · v
= |u|2 + 2u · v + |v|2
≤ |u|2 + 2|u · v| + |v|2
≤ |u|2 + 2|u||v| + |v|2
= (|u| + |v|)2
Given any nonzero vector v, we can form a unit vector having the same direction by dividing v by
its length: u = v/|v|. Then u points in the same direction as v, but has unit magnitude. This
is useful because it makes it easy to form other vectors, of any length we want, that point in the
same direction as v; e.g. 5u is a vector of length 5 in the same direction as v.
Example 10.3. You walk from the origin along a straight line through the point v = (2, 3). If you
keep walking along this line until your distance from the origin is 10, what is your final position
vector?
Definition 10.3. The standard unit vectors are defined as i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1).
These point in the directions of the coordinate axes.
Example 10.4. Write the vector v = (4, −3, 7) as a combination of the standard unit vectors.
The angle θ between two nonzero vectors u and v is defined by
cos θ = (u · v) / (|u| |v|).
Proof. (Special case, in 2- and 3-D, that this agrees with the usual geometric angle.) This follows
from the cosine law, which in terms of vectors can be written:
|u − v|^2 = |u|^2 + |v|^2 − 2|u||v| cos θ.
Rewriting the left-hand side using algebraic properties of the dot product:
|u − v|^2 = (u − v) · (u − v)
         = u · u − 2u · v + v · v
         = |u|^2 − 2u · v + |v|^2 ,
and comparing the two expressions for |u − v|^2 gives u · v = |u||v| cos θ, which justifies the definition.
Angles between vectors really only make geometrical sense in IR2 and IR3 ; nevertheless, this defini-
tion extends the notion of “angle” to higher dimensions.
Example 10.5. Guy wires are extended from the origin to the tops of two flag poles whose tops
are located at points with coordinates (3, 4, 1) and (2, 2, 2) in IR3 . Find the angle between the wires
at the origin.
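(A numpy sketch for this example, using the angle formula cos θ = u · v/(|u||v|); the code is our own addition.)

    import numpy as np

    u, v = np.array([3, 4, 1]), np.array([2, 2, 2])
    cos_theta = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    print(np.degrees(np.arccos(cos_theta)))   # roughly 25 degrees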
10.5 Orthogonality
In 2- and 3-D, orthogonality of vectors is equivalent to their being perpendicular, i.e. the angle θ
between them is 90◦ . From the definition of angle between vectors:
(u · v)/(|u||v|) = cos 90◦ = 0   ⇐⇒   u · v = 0.
Definition. Vectors u and v are said to be orthogonal if u · v = 0.
Example 10.6. Let u = (1, 2, 3), v = (3, 2, 1), w = (−3, 0, 1). Which pairs of vectors (if any) are
orthogonal?
Again, this definition only makes geometrical sense in 2- and 3-D, but it applies in higher dimensions
as well. Basically, vectors are orthogonal if they point in independent directions.
Definition 10.6. Two vectors u, v are parallel if the angle θ between them is 0 or 180◦ ; that is, if
cos θ = (u · v)/(|u||v|) = ±1.
Lec. #28
Theorem 10.2 (Pythagoras). Vectors u, v are orthogonal if and only if |u + v|2 = |u|2 + |v|2 .
Proof. Using the algebraic properties of the dot product:
|u + v|^2 = (u + v) · (u + v)
         = u · u + 2u · v + v · v
         = |u|^2 + 2u · v + |v|^2 ,
which equals |u|^2 + |v|^2 if and only if u · v = 0, i.e. if and only if u and v are orthogonal.
Given vectors u and v in IRn , it will be useful to have a method for resolving u into two components,
one parallel to v and the other perpendicular to v:
u = u∥ + u⊥ .
The (signed) length of the parallel component is |u| cos θ = (u · v)/|v|. Now we can form the vector
u∥ since we know both its length (by the formula above) and its direction (v/|v|). The result u∥ is
called the orthogonal projection of u onto v:
Definition 10.7. Let u, v be nonzero vectors. The orthogonal projection of u onto v, denoted by
projv (u), is the vector
projv (u) = ((u · v)/|v|) (v/|v|) = ((u · v)/|v|^2 ) v = ((u · v)/(v · v)) v.
Example 10.7. Find the orthogonal projection of u = (3, 4) onto v = (5, 2).
Example 10.8. Given u = (3, −1, 4) and v = (4, 1, 1) find projv (u).
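(The projection formula translates directly into code; this numpy sketch, our own addition, checks it on Example 10.7.)

    import numpy as np

    def proj(u, v):
        """Orthogonal projection of u onto v."""
        return (u @ v) / (v @ v) * v

    u, v = np.array([3.0, 4.0]), np.array([5.0, 2.0])
    p = proj(u, v)
    print(p)              # (u.v)/(v.v) v = (23/29)(5, 2)
    print((u - p) @ v)    # essentially 0: the remainder u - p is orthogonal to v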
In two dimensions the standard form for the equation of a line is y = mx + b. It turns out this
doesn’t generalize well to describing lines in higher dimensions. For this purpose it is better to
write the equation of the line in parametric form.
A line through the point r ∈ IR2 parallel to v can be represented by the parametric equation
x = r + tv, t ∈ IR
Example 11.1. To find the parametric form of the line through the points (3, 2, 1) and (1, −1, 2),
we first find the direction vector v = (3, 2, 1) − (1, −1, 2) = (2, 3, −1), so the line is
x = (3, 2, 1) + t(2, 3, −1), t ∈ IR.
We can readily find other points on the line, e.g. taking t = 2 we get x = (3 + 2 · 2, 2 + 3 · 2, 1 − 2) =
(7, 8, −1).
11.1.2 Symmetric form
By solving for the parameter t the equation of a line can be written in symmetric form. Following
the previous example, we get
t = (x − 3)/2,        t = (y − 2)/3,        t = 1 − z,
so every point on the line must satisfy the equation
(x − 3)/2 = (y − 2)/3 = 1 − z.
From the first equality we have
3(x − 3) = 2(y − 2) =⇒ 3x − 2y = 5.
This gives the equation of the line as projected onto the xy-plane (i.e. the top-down view). We can get
the projection onto the xz- or yz-plane by considering other pairs of equalities.
Example 11.2. Consider two lines: one through the points (0, 1, 2) and (1, 2, 3), the other through
the points (−2, 0, 1) and (1, 1, 5). Do these lines intersect?
It helps to first express both lines in parametric form. The direction vectors are (1, 1, 1) and (3, 1, 4),
respectively, so in vector form the lines have parametric equations:
x = (0, 1, 2) + t1 (1, 1, 1)
x = (1, 1, 5) + t2 (3, 1, 4).
Note that we need two parameters t1 and t2 (one to parametrize each line). The lines intersect if
there they have a point in common; that is, if there are numbers t1 , t2 such that both equations
above describe the same point. Component-wise, this condition can be expressed as a system of
linear equations:
0 + t1 = 1 + 3t2 (x-coordinate)
1 + t1 = 1 + t2 (y-coordinate)
2 + t1 = 5 + 4t2 (z-coordinate).
The augmented matrix for this system is:
1 −3 1                 1 0 0
1 −1 0    −−RREF−→     0 1 0
1 −4 3                 0 0 1
The last row asserts 0 = 1, so the system is inconsistent: there are no values of t1 and t2 satisfying
all three equations, and hence the lines do not intersect.
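(The same conclusion can be checked with sympy, our own addition: asking for parameter values at which the two lines meet returns no solutions.)

    import sympy as sp

    t1, t2 = sp.symbols('t1 t2')
    eqs = [sp.Eq(0 + t1, 1 + 3*t2),
           sp.Eq(1 + t1, 1 + t2),
           sp.Eq(2 + t1, 5 + 4*t2)]
    print(sp.solve(eqs, [t1, t2]))   # [] : the system is inconsistent, so the lines do not intersect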
11.2 Planes
Lec. #31
11.2.1 Standard form
We have already seen how a linear equation in three variables describes a plane, e.g. the set of points
(x, y, z) ∈ IR3 that satisfy the equation
3x + 2y − 5z = 8.
all lie in a plane. In general, any plane in IR3 can be described by an equation in standard form:
ax + by + cz = d,
where a, b, c, d are constants. For the example above we can read off the intercepts with the
coordinate axes:
y = z = 0 =⇒ x = 8/3 (x-intercept)
x = z = 0 =⇒ y = 4 (y-intercept)
x = y = 0 =⇒ z = −8/5 (z-intercept)
Another way to describe a plane is to parametrize it much as we did for lines in the previous section.
To obtain a parametric form for the equation of a plane, consider its equation in standard form as
a system of linear equations (a somewhat trivial system consisting of just one equation), with one
basic variable (say, x) and two free variables (y and z).
For our example above (3x + 2y − 5z = 8 in standard form) the general solution is
x = 8/3 − (2/3)y + (5/3)z
where y, z are free; the free variables are parameters that parametrize the set of all points in the
plane.
In general the parametric form of the equation of any plane is
x = r + su + tv, s, t ∈ IR.
In our example r = (8/3, 0, 0), u = (−2/3, 1, 0) and v = (5/3, 0, 1). The plane consists exactly of
the set of points x of the form x = r + su + tv where s, t can be any real numbers. The vector r is
some point in the plane; vectors u, v specify two independent directions in the plane, along which
we can travel from r to arrive at any other point in the plane. It makes sense that there should be
two independent directions (and two corresponding parameters), since a plane is a 2-dimensional
surface.
If we fix s = 0 then the equation reduces to x = r + tv, which is the equation of a line through r
with direction v. This line lies in the plane. Fixing any other value of s we get a different (parallel)
line, through a different point. We get another collection of lines in the plane if we hold t fixed and
allow s to vary. Thus we can see that the plane consists of collections of parallel lines.
In standard form the equation of a plane doesn’t give us much insight into the plane’s orientation.
We might wonder, for example, what direction does the plane face?
A natural way to describe the orientation of a plane is by its normal vector, i.e. the direction of any
vector perpendicular to the plane. This also leads to a natural form for the equation of the plane.
Given any point r in the plane and the normal vector n, we have that for any point x in the plane,
the vector x − r is perpendicular to n.
[need a picture here]
Using the dot product to express orthogonality, we have:
(x − r) · n = 0.
This is the equation of the plane in normal form. The plane consists of all points x that satisfy
this equation.
Example 11.3. The normal form of the equation of the plane that passes through the point (2, 3, 1)
and is oriented with normal vector (−1, −1, 1) is
[x − (2, 3, 1)] · (−1, −1, 1) = 0.
Expanding this in terms of the components of x = (x, y, z) we have
[(x, y, z) − (2, 3, 1)] · (−1, −1, 1) = 0 =⇒ (x − 2, y − 3, z − 1) · (−1, −1, 1) = 0
=⇒ −(x − 2) − (y − 3) + (z − 1) = 0
=⇒ −x − y + z = −4,
which gives the equation of the plane in standard form.
Example 11.4. To find the normal vector for a plane expressed in standard form, e.g.
3x + 2y − 5z = 8,
we need only rearrange the equation:
3x + 2y − 5z − 8 = 0 =⇒ 3(x − 0) + 2(y − 4) − 5(z − 0) = 0
=⇒ [(x, y, z) − (0, 4, 0)] · (3, 2, −5) = 0
=⇒ (x − r) · n = 0
with r = (0, 4, 0) and n = (3, 2, −5). Notice that the components of n are exactly the coefficients
from the equation in standard form. This is true in general, so we can always easily find the normal
vector n directly from the equation in standard form.
12 Linear Transformations