Introduction To Number Theory

This document is an introduction to number theory, specifically tailored for junior and senior mathematics majors at UCLA. It covers topics such as unique factorization, modular arithmetic, and quadratic number theory, emphasizing the connections between algebra, analysis, and geometry. The course is designed for students with minimal prior exposure to abstract algebra and includes historical context and key mathematical figures in the development of number theory.


Introduction to Number Theory

Tim Smits

June 13, 2023


Contents

I Unique Factorization
1 Unique factorization in Z
1.1 Key properties of Z
1.2 Divisibility
1.3 The Euclidean algorithm
1.4 Linear Diophantine equations
1.5 Unique factorization in Z
1.6 Primes
1.7 Exercises

2 Unique factorization in Euclidean domains
2.1 Unique factorization in F[x]
2.2 Unique factorization in Z[i]
2.3 Euclidean domains
2.4 Exercises

II Modular Arithmetic
3 Arithmetic in quotient rings
3.1 The quotient ring and Z/nZ
3.2 Quotients of Z[i]
3.3 Quotients of F[x] and finite fields
3.4 Exercises

4 The structure of (Z/nZ)×
4.1 Chinese remainder theorem
4.2 Euler’s theorem and orders mod n
4.3 Cyclicity of (Z/pZ)×
4.4 Hensel’s lemma
4.5 Cyclicity of (Z/nZ)×
4.6 Application: Cryptography
4.7 Application: Decimal expansions
4.8 Exercises

III Quadratic Number Theory
5 Quadratic congruences
5.1 Quadratic residues
5.2 Quadratic reciprocity
5.3 Jacobi reciprocity
5.4 Squares mod n
5.5 Exercises

6 Conic sections
6.1 Rational points on conics
6.2 Sums of squares
6.3 Sums of squares and lattices
6.4 Pell’s equation and quadratic rings
6.5 Quadratic fields
6.6 Exercises

A Induction

B Algebraic Structures
B.1 Groups
B.2 Rings and fields
B.3 Morphisms

Introduction

These notes are from the math 111 course I taught at UCLA in Spring 2023, which serves
as an introduction to number theory intended for junior and senior mathematics majors.
The prerequisite knowledge is minimal exposure to abstract algebra: familiarity with the
definitions of a group, a ring, an ideal, a field, and a homomorphism is sufficient background
for almost everything in these notes. The basics are contained in the appendix for those
who need it. Additionally, it is assumed the reader is familiar with linear algebra in finite
dimensional vector spaces. There may be many typos. Please let me know if any are found!

Number theory is one of the oldest branches of mathematics, with many fundamental
results dating back to the ancient Greeks. At its core, number theory studies the integers. It
might be hard to believe that this could be particularly hard, or even interesting. However,
number theory is an incredibly deep and rich area of mathematics. Here are two reasons
why studying number theory is worthwhile:

• Number theory is a central branch of mathematics, meaning that it heavily utilizes
the three “core” branches of mathematics: algebra, analysis, and geometry. At an
undergraduate level, many of the core courses are “disjoint” from each other. A first
time learner of abstract algebra is not going to see any connections to real analysis,
and in a first course in real analysis, students are unlikely to see the connections
with geometry and topology. With number theory, however, it’s much easier to see
the interplay between the different branches of mathematics at an introductory level.
In particular, the study of Diophantine equations bridges ideas from algebra (unique
factorization), and geometry (rational points on curves).
From a modern perspective, there are many objects that come from “algebraic”, “an-
alytic”, and “geometric” ideas. One of the most important examples of this arises in
the study of elliptic curves, where there is a geometric object (the elliptic curve E),
an algebraic object (the group of rational points E(Q)), and an analytic object (an
L-function attached to E). Much of modern number theory revolves around trying to
unify these three different perspectives. No matter what your mathematical interests
are, there are applications of it to number theory!

• As a consequence of number theory being such an old branch of mathematics, it’s a
subject with a very interesting history, that motivated a lot of the development of
mathematics. In particular, one of the most famous problems is Fermat’s last theorem,
which says there are no non-trivial integer solutions to x^n + y^n = z^n for n ≥ 3.
When n = 2, solutions to x^2 + y^2 = z^2 correspond to Pythagorean triples, which were

studied by the Greek mathematician Pythagoras. Fermat famously claimed that he
had a proof of his theorem, which he did not write down. Fermat’s last theorem went
unproven for over 350 years, and it captured the interest of many historically prominent
mathematicians, who made significant advancements to the field of number theory.
For those interested, here is a list of some of the mathematicians whose ideas were
fundamental to the development of the topics that will be covered in these notes:

– Brahmagupta, Euclid, Pythagoras, Diophantus


– Fermat, Euler, Lagrange, Legendre, Gauss
– Dirichlet, Kummer, Dedekind, Hensel

Part I

Unique Factorization

Chapter 1

Unique factorization in Z

1.1 Key properties of Z


The set Z of integers comes with two binary operations, addition (+) and multiplication (·),
meaning that for any a, b ∈ Z we get well-defined integers a + b and a · b. We will take the
construction of the integers and the formal definition of these operations for granted, but
these operations satisfy the following properties for any a, b, c ∈ Z:

1. (Commutativity) a + b = b + a and ab = ba.

2. (Associativity) a + (b + c) = (a + b) + c and a(bc) = (ab)c.

3. (Distributivity) a(b + c) = ab + ac.

4. (Zero) There exists 0 ∈ Z such that a + 0 = a.

5. (Negatives) There exists −a ∈ Z such that a + (−a) = 0.

6. (One) There exists 1 ∈ Z such that a · 1 = a.

More compactly, addition and multiplication are both associative and commutative, mul-
tiplication distributes over addition, and there are special elements 0 and 1 that don’t
“change” an integer a with respect to the relevant operation. Addition has the extra prop-
erty that integers have additive inverses, which give rise to negative numbers.

Sitting inside of Z is a non-empty subset P ⊂ Z with the following properties:

1. For any a, b ∈ P , a + b ∈ P .

2. For any a, b ∈ P , a · b ∈ P .

3. 0 ∉ P .

4. (Trichotomy) For all a ∈ Z, exactly one of the following holds: a ∈ P , a = 0, or
−a ∈ P .

This subset P is, of course, the positive integers. The existence of this subset P allows us
to define an ordering on Z, by defining a < b ⇐⇒ b − a ∈ P . The ordering of the integers
gives rise to one of its most basic properties:
Proposition 1 (Zero product property). Let a, b ∈ Z with ab = 0. Then a = 0 or b = 0.
Proof. We prove the contrapositive, that if a ≠ 0 and b ≠ 0 then ab ≠ 0. First suppose
that b > 0. Since multiplication of integers is defined via repeated addition, i.e.
$a \cdot b = \underbrace{a + a + \cdots + a}_{b \text{ times}}$, we have ab ≥ a > 0 if a > 0 and
ab ≤ a < 0 if a < 0, so ab ≠ 0 in either case. If b < 0, do the same
thing with (−a)(−b) = ab.
The integers are very special, because not only can they be ordered, but they can be
well-ordered.
Theorem 1.1.1 (Well-ordering principle). Let S ⊂ N be non-empty. Then S has a smallest
element with respect to <.
The well-ordering principle will be very important for proving key theorems that hold
in the integers. One can show that the well-ordering principle is equivalent to the principle
of mathematical induction, and so many of the proofs that follow could be formulated with
induction instead. The proof of their equivalence is given in the appendix.

1.2 Divisibility
Definition 1.2.1. We say for integers a, b that a divides b if there is an integer k such that
b = ak, and write this as a | b.
Example 1.2.2. We have 2 | 10, because 10 = 2 · 5, (−7) | 49, because 49 = (−7) · (−7), and
5 | 0 because 0 = 5 · 0. Clearly ±1 | n for any integer n, however, 0 ∤ n for any non-zero
integer n, because there is no integer k with 0 · k = n.
From the definition of divisibility, we have the following basic properties:
Proposition 2. Let a, b, c ∈ Z.
1. If a | b then a | bℓ for any ℓ ∈ Z.

2. If a | b and b | c, then a | c.

3. If a | b and a | c then a | bx + cy for any x, y ∈ Z.

4. If a | b for b ≠ 0, then |a| ≤ |b|.


Proof. These follow more or less immediately from the definition and are left as an exercise.

One of the first things one learns about the integers in grade school is that it’s possible
to perform division with remainder :

Theorem 1.2.3 (Division Algorithm). Let a and b be integers with b > 0. Then there exist
unique integers q and r such that a = bq + r with 0 ≤ r < b.

Proof. Let S = {a − bq ≥ 0 : q ∈ Z}. Then S is non-empty: this is clear if a ≥ 0 (take q = 0), and if
a < 0, then a − (a − 1)b = a(1 − b) + b > 0. By the well-ordering principle, S contains a
smallest element, say r, which we can write as a − bq = r for some q. This then gives
a = bq + r. First we show that 0 ≤ r < b. We have r ≥ 0 by the definition of S. If r ≥ b, then
a = b(q + 1) + (r − b) with 0 ≤ r − b < r, so r − b is a smaller element of S.
This contradicts the minimality of r, so 0 ≤ r < b.
This proves the existence of such q and r, so it remains to show the uniqueness of q and r.

Suppose that a = bq + r and a = bq′ + r′ with 0 ≤ r < b and 0 ≤ r′ < b. Without loss of
generality, suppose that r ≤ r′. Then from bq + r = bq′ + r′ we get b(q − q′) = (r′ − r). This
says (r′ − r) is a multiple of b, and since 0 ≤ r′ − r < b this says r′ − r = 0, i.e. r = r′. It’s
then immediate that q = q′ so uniqueness follows.

Example 1.2.4. With a = 3203 and b = 17, we have 3203 = 17 · 188 + 7. This can be
determined from the usual grade school algorithm of long division.

It’s possible to extend the division algorithm to allow negative divisors. The proof is
very similar, and we leave it as an exercise.

Theorem 1.2.5 (Extended Division Algorithm). Let a, b ∈ Z with b ≠ 0. Then there exist
unique integers q, r such that a = bq + r and 0 ≤ r < |b|.

Proof. Exercise.
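
The statement of the extended division algorithm can be sketched in Python. This is only an illustration and not part of the original notes: the function name `euclid_divmod` is hypothetical. It relies on Python's built-in `divmod`, whose remainder takes the sign of the divisor, and shifts that remainder so that 0 ≤ r < |b| also holds when b is negative.

```python
def euclid_divmod(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r < |b| (b must be non-zero)."""
    if b == 0:
        raise ValueError("b must be non-zero")
    q, r = divmod(a, b)       # Python's remainder r has the same sign as b
    if r < 0:                 # this can only happen when b < 0
        q, r = q + 1, r - b   # shift r into the range [0, |b|)
    return q, r


# The example from the text: 3203 = 17 * 188 + 7.
print(euclid_divmod(3203, 17))
# A negative divisor still yields a non-negative remainder.
print(euclid_divmod(7, -3))
```

For b > 0 the adjustment never triggers, so the function agrees with `divmod` in the setting of Theorem 1.2.3.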

Definition 1.2.6. The greatest common divisor of two integers a and b is the largest
positive integer d such that:

1. d | a and d | b

2. If c | a and c | b, then c ≤ d.

We write gcd(a, b) or (a, b) to denote the greatest common divisor. If (a, b) = 1, we say that
a and b are relatively prime.

If d is any divisor of a non-zero integer a, then property 4 of proposition 2 says there are only
finitely many possibilities for d, so it makes sense to speak of a largest one. If a = 0, then
all integers are divisors of a, so (0, b) = |b| for any non-zero integer b. Note that (0, 0) is therefore undefined.

Example 1.2.7. The greatest common divisor of a = 315 and b = 195 is d = 15, which is
easily found by factoring.

The greatest common divisor of two integers a and b has the important property that
it’s an integer linear combination of a and b:

Theorem 1.2.8 (Bezout’s lemma). Let a, b ∈ Z and set d = gcd(a, b). Then there exist
integers x and y such that ax + by = d.

Proof. Let S = {ax + by > 0 : x, y ∈ Z}, the set of all integer linear combinations of a and b
that are positive. Notice that S is non-empty: if, say, a is non-zero, then either a or −a is positive,
and both are integer linear combinations of a and b.
By the well-ordering principle, S has a least element, say d′, so there are integers x
and y such that ax + by = d′. We will show that d = d′.

Since d | a and d | b by definition, d | (ax + by) so d | d′, and therefore d ≤ d′. To show
that d′ ≤ d, we will show that d′ divides every element of S. Let s ∈ S, so s = ax′ + by′ for
some x′, y′ ∈ Z. By the division algorithm, we can write s = d′q + r for unique q and r with
0 ≤ r < d′. Then r = s − d′q = ax′ + by′ − (ax + by)q = a(x′ − xq) + b(y′ − yq). However,
r < d′ and d′ is the smallest positive element of S, so this forces r = 0. Therefore, s = d′q
so d′ | s for any s ∈ S. In particular, |a| and |b| lie in S whenever they are non-zero, so d′ | a
and d′ | b. Hence d′ is a common divisor
of a and b, so d′ ≤ d by definition of d.
Note that we not only proved that the greatest common divisor is an integer linear
combination of a and b, but that it’s the smallest positive integer linear combination of a
and b. In particular, this means that if ax + by = 1 for some integers x, y, then we must
have gcd(a, b) = 1.
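
The remark above, that gcd(a, b) is the smallest positive integer linear combination of a and b, can be checked by brute force on small inputs. The following is a rough sketch rather than anything from the notes: the function name `smallest_positive_combination` and the search window `bound` are assumptions made for illustration, and the search only succeeds when some pair (x, y) attaining the gcd lies inside the window.

```python
from math import gcd


def smallest_positive_combination(a, b, bound=50):
    """Smallest value a*x + b*y > 0 with |x|, |y| <= bound (brute-force search)."""
    values = [
        a * x + b * y
        for x in range(-bound, bound + 1)
        for y in range(-bound, bound + 1)
        if a * x + b * y > 0
    ]
    return min(values)


# Compare the brute-force minimum with the gcd for a few small pairs.
for a, b in [(12, 18), (4, 7), (315, 195)]:
    print(a, b, smallest_positive_combination(a, b), gcd(a, b))
```

This is only a sanity check of the statement, not an efficient way to compute a gcd.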

1.3 The Euclidean algorithm


In the example in the previous section, it was very easy to factor the given numbers to
compute their greatest common divisor. In general, factoring is a very hard problem.

Example 1.3.1. Let a = 1002001 and b = 379427895. Then a = 7^2 · 11^2 · 13^2 and b =
3^4 · 5 · 7 · 11 · 23^3, so we have gcd(a, b) = 7 · 11 = 77.

This example would already be hard to do by hand, and if the integers are large enough,
even a computer will have trouble factoring them. The following algorithm (of Euclid!) is
a significantly more efficient way of doing this computation. Before we state and give the
proof, we need a lemma about greatest common divisors.

Lemma 1.3.2. Let a, b ∈ Z such that a = bq + r for some integers q, r. Then gcd(a, b) =
gcd(b, r).

Proof. Let d = gcd(a, b). Then d | a and d | b so d | a − bq, which means d | r. This says d
is a common divisor of b, r, so d ≤ gcd(b, r). On the other hand, set d′ = gcd(b, r). We have
d′ | b and d′ | r so d′ | bq + r, so that d′ | a, so d′ is a common divisor of a, b so d′ ≤ gcd(a, b).
Combining the two inequalities says d′ = d.

Theorem 1.3.3 (Euclidean algorithm). Let a, b be non-zero integers. Repeatedly carry out
the division algorithm as follows:

a = bq_1 + r_1,        0 ≤ r_1 < |b|
b = r_1 q_2 + r_2,     0 ≤ r_2 < r_1
r_1 = r_2 q_3 + r_3,   0 ≤ r_3 < r_2
⋮
The last non-zero remainder is the greatest common divisor of a and b.
Proof. The proof has two steps: first, we show that the algorithm eventually terminates, so
that there are finitely many remainders. Then, we show that the last non-zero remainder
is indeed gcd(a, b). Consider the sequence of remainders {r_n}. As long as the remainders are
non-zero, {r_n} is a strictly decreasing sequence of positive integers, and we claim that there
must be some N such that r_{N+1} = 0, at which point the algorithm stops. To see this, suppose otherwise, that all terms of the
sequence are strictly positive. Then by the well-ordering principle, {r_n} has a least
element, say L. Therefore, there is N such that r_N = L. However, since {r_n} is a strictly
decreasing sequence, we have r_{N+1} < r_N = L, contradicting the definition of L. Therefore,
there are finitely many non-zero remainders output by the algorithm, call them r_1, . . . , r_N.
Repeatedly applying the above lemma, we find gcd(a, b) = gcd(b, r_1) = gcd(r_1, r_2) = . . . =
gcd(r_{N−1}, r_N) = gcd(r_N, 0) = r_N.
Example 1.3.4. Let a = 75 and b = 45. We have
75 = 45 · 1 + 30
45 = 30 · 1 + 15
30 = 15 · 2 + 0
So that (75, 45) = 15.
Example 1.3.5. Let a = 517 and b = 89. We have

517 = 89 · 5 + 72
89 = 72 · 1 + 17
72 = 17 · 4 + 4
17 = 4 · 4 + 1
4 = 1 · 4 + 0
So that (517, 89) = 1.
Example 1.3.6. Let a = 379427895 and b = 1002001. We have

379427895 = 1002001 · 378 + 671517


1002001 = 671517 · 1 + 330484
671517 = 330484 · 2 + 10549
330484 = 10549 · 31 + 3465
10549 = 3465 · 3 + 154
3465 = 154 · 22 + 77
154 = 77 · 2 + 0

So that (379427895, 1002001) = 77.
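
The repeated divisions in the examples above can be sketched in Python as follows. This is an illustrative sketch, not code from the notes: the names `euclid_steps` and `euclid_gcd` are made up here. Each recorded step is one line `dividend = divisor · quotient + remainder` of the algorithm, and the gcd is read off as the divisor of the final step, i.e. the last non-zero remainder.

```python
def euclid_steps(a, b):
    """Return the divisions (dividend, divisor, quotient, remainder) performed."""
    a, b = abs(a), abs(b)
    steps = []
    while b != 0:
        q, r = divmod(a, b)
        steps.append((a, b, q, r))
        a, b = b, r           # gcd(a, b) = gcd(b, r) by Lemma 1.3.2
    return steps


def euclid_gcd(a, b):
    """gcd(a, b): the divisor in the final step (the last non-zero remainder)."""
    steps = euclid_steps(a, b)
    return steps[-1][1] if steps else abs(a)


# Reproduce Example 1.3.6 line by line.
for dividend, divisor, q, r in euclid_steps(379427895, 1002001):
    print(f"{dividend} = {divisor} * {q} + {r}")
print(euclid_gcd(379427895, 1002001))
```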

From the above example, even though the numbers were moderately large, the Euclidean
algorithm finished rather quickly. In general, the algorithm is very efficient. It requires only
O(log(min(a, b))) division steps! The Euclidean algorithm is not only useful for computing the greatest
common divisor; we can use it to tell us what integer linear combination of a and b to take
to get the greatest common divisor!

Example 1.3.7. Let a = 517 and b = 89. We have

517 = 89 · 5 + 72
89 = 72 · 1 + 17
72 = 17 · 4 + 4
17 = 4 · 4 + 1
Which tells us that (517, 89) = 1. We now back substitute to solve for 1 in terms of the
previous lines. We have

1 = 17 − 4 · 4
= 17 − 4 · (72 − 17 · 4) = 17 · 17 − 4 · 72
= 17 · (89 − 72 · 1) − 4 · 72 = 89 · 17 − 72 · 21
= 89 · 17 − (517 − 89 · 5) · 21 = 89 · 122 − 517 · 21
This says the integers we are looking for in Bezout’s lemma are x = −21 and y = 122.

In particular, the Euclidean Algorithm provides an alternate proof of Bezout’s lemma


that doesn’t use the well-ordering principle: run the Euclidean algorithm on the pair (a, b)
and then perform back substitution to construct x, y such that ax + by = gcd(a, b).
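
The construction of x and y can also be sketched in code. Note that the sketch below is a swapped-in variant: instead of back-substituting at the end, it tracks the coefficients forward alongside the remainders, which is a common iterative form of the extended Euclidean algorithm. The function name `extended_gcd` is hypothetical, and the inputs are assumed to be non-negative and not both zero.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b).

    Assumes a, b >= 0 and not both zero. Invariant at every step:
    old_r == a*old_x + b*old_y and r == a*x + b*y.
    """
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


g, x, y = extended_gcd(517, 89)
print(g, x, y)                 # the same coefficients as Example 1.3.7
assert 517 * x + 89 * y == g
```

On the pair (517, 89) this produces x = −21 and y = 122, matching the back substitution carried out by hand above.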

1.4 Linear Diophantine equations


A Diophantine equation is an equation, typically a polynomial equation with integer
coefficients, for which integer solutions are sought. Historically, Diophantine equations were studied by the Greek mathematician
Diophantus. Many problems in number theory can be translated into some question about
finding integer solutions to Diophantine equations, e.g. Pythagorean triples correspond to
integer solutions to the Diophantine equation x^2 + y^2 = z^2.

We begin with some useful properties of the greatest common divisor that we’ll frequently
use.

Proposition 3. Let a, b, c ∈ Z.

(a) gcd(ka, kb) = k · gcd(a, b) for any integer k > 0.

(b) Let g = gcd(a, b). Then gcd(a/g, b/g) = 1.

(c) If gcd(a, c) = 1 and gcd(b, c) = 1 then gcd(ab, c) = 1.

Proof.

(a) We have gcd(a, b) = min{ax + by > 0 : x, y ∈ Z}, the smallest positive integer linear
combination of a and b. From this, it’s then quite clear that k · min{ax + by > 0 :
x, y ∈ Z} = min{(ka)x + (kb)y > 0 : x, y ∈ Z} = gcd(ka, kb).

(b) Let g = gcd(a, b). Then g = gcd((a/g) · g, (b/g) · g) = g · gcd(a/g, b/g) by (a). Since g > 0, this
means gcd(a/g, b/g) = 1.

(c) By Bezout’s lemma, there exist integers x, y such that ax + cy = 1 and x′, y′ such that
bx′ + cy′ = 1. Multiplying these two equations together, (ax + cy)(bx′ + cy′) = 1, so
ab(xx′) + c(axy′ + bx′y + cyy′) = 1. This says 1 is an integer linear combination of ab
and c, so that gcd(ab, c) = 1.

Proposition 4. Let a, b, c ∈ Z with a | bc and gcd(a, c) = 1. Then a | b.

Proof. Since gcd(a, c) = 1, by Bezout’s lemma there are integers x, y such that ax + cy = 1.
Multiplying by b says abx + bcy = b. Since a | bc, then a | abx + bcy, i.e. a | b.
We will now give a complete classification of integer solutions to linear Diophantine
equations.

Theorem 1.4.1. Consider the equation ax + by = c for a, b, c ∈ Z.

(a) A solution exists in Z^2 if and only if gcd(a, b) | c.

(b) If (x_0, y_0) ∈ Z^2 is one solution, then all solutions are of the form (x_0 + b′k, y_0 − a′k)
for some k ∈ Z, where a′ = a / gcd(a, b) and b′ = b / gcd(a, b).

Proof.

(a) Suppose that (x, y) ∈ Z^2 is a solution to the equation. Then gcd(a, b) divides both
a and b, so gcd(a, b) | ax + by, which says gcd(a, b) | c. Conversely, suppose that
gcd(a, b) | c. Write c = gcd(a, b)k for some integer k. By Bezout’s lemma, there exist
integers x, y such that ax + by = gcd(a, b). Multiplying by k, we have a(kx) + b(ky) = c,
so that (kx, ky) is a solution.

(b) Suppose that (x, y) is a second solution to the equation, so that ax_0 + by_0 = c and
ax + by = c. Equating, we find a(x − x_0) = b(y_0 − y), so dividing through by gcd(a, b) we
get a′(x − x_0) = b′(y_0 − y). Since a′ | b′(y_0 − y) and gcd(a′, b′) = 1, this means a′ | y_0 − y
so we can write y_0 − y = ka′ for some integer k. Similarly, we can write x − x_0 = b′ℓ
for some integer ℓ. Substituting back in, this means that a′b′ℓ = a′b′k, so that ℓ = k.
This says (x, y) = (x_0 + b′k, y_0 − a′k) for some integer k, so any solution must be of
this form. Finally, it’s clear by a simple substitution that any pair (x_0 + b′k, y_0 − a′k)
is a solution to ax + by = c, so that these constitute all possible solutions.

Example 1.4.2. Suppose we want to find all integer solutions to the equation 37x+47y = 22.
First, we run the Euclidean algorithm:

47 = 37 · 1 + 10
37 = 10 · 3 + 7
10 = 7 · 1 + 3
7=3·2+1

So that gcd(47, 37) = 1. Performing back substitution, we find that (14, −11) is one solution
to the equation 37x + 47y = 1. Multiplying by 22, we have (308, −242) is a solution to the
equation 37x + 47y = 22. The above theorem then says that the solution set to this equation
is {(308 + 47k, −242 − 37k) : k ∈ Z}.
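
The procedure of this example can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's code: the names `extended_gcd` and `solve_linear_diophantine` are hypothetical, a and b are assumed to be positive, and the Bezout coefficients are computed with an iterative extended Euclidean algorithm rather than by hand back substitution.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), assuming a, b > 0."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0


def solve_linear_diophantine(a, b, c):
    """Solve a*x + b*y == c over the integers, for positive a and b.

    Returns None if gcd(a, b) does not divide c. Otherwise returns
    ((x0, y0), (dx, dy)), meaning all solutions are (x0 + dx*k, y0 + dy*k).
    """
    g, x, y = extended_gcd(a, b)
    if c % g != 0:
        return None                        # part (a) of Theorem 1.4.1
    k = c // g
    return (x * k, y * k), (b // g, -(a // g))   # part (b) of Theorem 1.4.1


particular, step = solve_linear_diophantine(37, 47, 22)
print(particular, step)
```

For 37x + 47y = 22 this returns the particular solution (308, −242) and the step (47, −37), which is the solution set found above.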

1.5 Unique factorization in Z


Definition 1.5.1. An integer p > 1 is called prime if the only positive divisors of p are 1
and p.

The sequence of prime numbers starts off as 2, 3, 5, 7, 11, . . .. Primes are the “building
blocks” of the integers, in the sense that all integers are constructed from primes. An
alternate characterization of primes is the following:

Proposition 5 (Euclid’s lemma). Let p > 1. Then p is prime if and only if for any integers
a, b, p | ab =⇒ p | a or p | b.

Proof. First, suppose that p is prime. Let p | ab, and suppose that p ∤ a. Then (a, p) = 1 by
definition of p being prime, so by Bezout’s lemma, there are integers x, y with px + ay = 1.
Multiplying by b says pbx + aby = b, and since p | ab, we have p | pbx + aby, so p | b.
Conversely, suppose that p > 1 is an integer with the property that p | ab =⇒ p | a or
p | b. Let d be a positive divisor of p. Then we can write p = dk for some positive integer k.
Certainly, p | p, so p | dk means that p | d or p | k. However, since 1 ≤ d, k ≤ p, the only way
this is possible is if d = 1 or d = p, so that the only positive divisors of p are 1 and p.
This characterization of primes will be the key to proving unique factorization.

Theorem 1.5.2 (Fundamental Theorem of Arithmetic). Let n > 1 be an integer. Then there
exist unique primes p_1, . . . , p_k and unique positive integers e_1, . . . , e_k such that n = p_1^{e_1} · · · p_k^{e_k}.
That is, every integer has a unique factorization (up to order of factors) into a product of
primes.

Proof. There are two parts to the proof. First, we show that we can write every integer n > 1
as some product of prime numbers, and then we will show that such a choice of primes is
unique. Both of these statements will be proven using strong induction.

Existence:
Note that n = 2 is a prime. Now suppose for some k that the integers 2, 3, . . . , k can

be written as a product of primes. Consider the integer k + 1: if it is prime, we are done.
Otherwise, k + 1 is not prime, so by definition it has a non-trivial positive divisor. Write
k + 1 = ab for some integers a and b. Necessarily, 1 < a, b < k + 1, so in particular a and
b are integers between 2 and k. By induction hypothesis, both a and b can be written as a
product of primes, and therefore k + 1 is a product of primes as well. By induction, we then
see that every integer n > 1 is a product of primes.

Uniqueness:
Note that 2 is a prime so it’s a product of primes in a unique way. Now suppose that for
some k, we have that 2, 3, . . . , k all have a factorization using a unique set of primes. If k + 1
is prime, again we are done. Otherwise, suppose that we can write k + 1 = p_1 · · · p_m = q_1 · · · q_ℓ
for some primes p_i and q_j. Then p_1 | q_1 · · · q_ℓ, so by inductively applying Euclid’s lemma,
one finds p_1 | q_j for some j, and since q_j is prime, this says p_1 = q_j. By
reordering the factors as necessary, assume p_1 = q_1. Cancelling p_1 from both sides, we have
a = p_2 · · · p_m = q_2 · · · q_ℓ. However, 1 < a < k + 1 so by assumption, a has unique factorization,
i.e. m = ℓ and p_i = q_i for all 2 ≤ i ≤ m (after reordering if necessary). Since k + 1 = ap_1,
and p_1 = q_1, this shows k + 1 has unique factorization as desired. By induction, every n > 1
has unique factorization. Collecting terms of the same prime together shows that n is of the
form listed in the statement and the uniqueness of the exponents is immediate.
Another way of thinking about the statement of unique factorization is the following:

Corollary 1.5.3. For any integer n > 1, we can write $n = \prod_p p^{e_p}$ with e_p ≥ 0, and e_p > 0
for only finitely many primes p.

This point of view allows one to view an integer as an infinite tuple (e_2, e_3, e_5, . . .) where
only finitely many e_p are non-zero, the point being that the data of an integer is given by
the data of the exponents of each prime in Z.
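
To make the "exponent data" viewpoint concrete, here is a small Python sketch that recovers the exponents e_p by trial division. It is an illustration only: the function name `factorize` is hypothetical, the primes with e_p = 0 are simply omitted from the returned dictionary, and trial division is chosen for simplicity rather than speed.

```python
from math import prod


def factorize(n):
    """Return {p: e_p} listing only the primes with e_p > 0, for an integer n >= 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    exponents = {}
    p = 2
    while p * p <= n:
        while n % p == 0:          # divide out p as many times as possible
            exponents[p] = exponents.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                      # whatever is left is a single prime factor
        exponents[n] = exponents.get(n, 0) + 1
    return exponents


f = factorize(379427895)
print(f)
# Multiplying the prime powers back together recovers the integer.
assert prod(p ** e for p, e in f.items()) == 379427895
```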

We now explore some consequences of the Fundamental Theorem of Arithmetic.

Definition 1.5.4. The least common multiple of two integers a, b, denoted lcm(a, b) is
the unique positive integer ℓ with the following properties:

1. a | ℓ and b | ℓ.

2. For any positive m such that a | m and b | m, we have ℓ ≤ m.

For any two integers a and b, we can always factor them into a common set of primes
by allowing exponents to be 0. This leads to another computation of the greatest common
divisor, and a computation of the least common multiple.

Proposition 6. Let a = p_1^{e_1} · · · p_k^{e_k} and b = p_1^{f_1} · · · p_k^{f_k} be prime factorizations of a and
b into a common set of primes. Then gcd(a, b) = p_1^{min{e_1,f_1}} · · · p_k^{min{e_k,f_k}} and
lcm(a, b) = p_1^{max{e_1,f_1}} · · · p_k^{max{e_k,f_k}}.

Proof. We prove the first statement, and leave the second as an exercise as it is very similar.
Let d = gcd(a, b). Then since d divides both a and b, we can write d = p_1^{t_1} · · · p_k^{t_k} for some
non-negative integers t_i where t_i ≤ e_i and t_i ≤ f_i. Therefore, t_i ≤ min{e_i, f_i} for all i. It’s
clear that setting t_i = min{e_i, f_i} for all i gives us a divisor of a and b, which is therefore
the greatest common divisor because the size of each exponent has been maximized.

Corollary 1.5.5. Let a, b > 0 be integers. Then ab = gcd(a, b) · lcm(a, b).

Proof. The prime factorization of ab is given by multiplying the prime factorizations of a
and b together. Writing a = p_1^{e_1} · · · p_k^{e_k} and b = p_1^{f_1} · · · p_k^{f_k}, the exponent of p_i in ab is given
by e_i + f_i = min{e_i, f_i} + max{e_i, f_i} from which the result follows.
The above relation gives an efficient way to compute the least common multiple of two
integers! There is no “Bezout-like” relation for the least common multiple like there is for
the greatest common divisor, but one has lcm(a, b) = ab / gcd(a, b). The Euclidean Algorithm can
efficiently compute gcd(a, b), and therefore can be used to efficiently compute lcm(a, b).
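
As a sanity check of Proposition 6 and Corollary 1.5.5, the sketch below computes gcd and lcm from the min and max of the exponents and compares the result with the Euclidean route lcm(a, b) = ab / gcd(a, b). The helper names `factorize`, `gcd_lcm_from_exponents`, and `lcm_via_gcd` are made up for this illustration, and positive inputs are assumed.

```python
from math import gcd


def factorize(n):
    """Trial division: return {p: e_p} for an integer n >= 1."""
    exponents = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            exponents[p] = exponents.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exponents[n] = exponents.get(n, 0) + 1
    return exponents


def gcd_lcm_from_exponents(a, b):
    """Proposition 6: take min (resp. max) of the exponents prime by prime."""
    fa, fb = factorize(a), factorize(b)
    g, l = 1, 1
    for p in set(fa) | set(fb):            # common set of primes, exponent 0 allowed
        e, f = fa.get(p, 0), fb.get(p, 0)
        g *= p ** min(e, f)
        l *= p ** max(e, f)
    return g, l


def lcm_via_gcd(a, b):
    """Corollary 1.5.5: lcm(a, b) = a*b / gcd(a, b), with no factoring needed."""
    return a * b // gcd(a, b)


g, l = gcd_lcm_from_exponents(315, 195)
print(g, l, lcm_via_gcd(315, 195))
```

The second route never factors anything, which is why it remains practical for large inputs.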

Proposition 7. For any integer n > 1 there exist unique positive integers a, b such that n = ab^2
with a squarefree (not divisible by the square of any prime).

Proof. Write n = p_1^{e_1} · · · p_k^{e_k} as a product of primes. For each exponent, write e_i = 2f_i + r_i
with r_i = 0 or r_i = 1 by the division algorithm. Then
$n = \prod_{i=1}^{k} p_i^{r_i} \cdot \left(\prod_{i=1}^{k} p_i^{f_i}\right)^2 = ab^2$. That
a and b are the unique positive integers with this property is clear.
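
The proof translates directly into a short computation: split each exponent e_i as 2f_i + r_i and collect the two products. The Python sketch below is illustrative only; the function name `squarefree_decomposition` is hypothetical and trial division is used for simplicity.

```python
def squarefree_decomposition(n):
    """Return (a, b) with n == a * b**2 and a squarefree, for an integer n >= 1."""
    a, b = 1, 1
    p = 2
    while p * p <= n:
        e = 0
        while n % p == 0:          # e becomes the exponent of p in n
            n //= p
            e += 1
        a *= p ** (e % 2)          # r_i = e_i mod 2 goes into the squarefree part
        b *= p ** (e // 2)         # f_i = e_i div 2 goes into the square part
        p += 1
    a *= n                         # leftover factor is 1 or a prime with exponent 1
    return a, b


for n in (72, 1002001, 379427895):
    a, b = squarefree_decomposition(n)
    print(n, a, b)
    assert a * b * b == n
```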
As a final application of unique factorization, we give another proof of the next result:

Proposition 8. Let a, b, c > 0 such that gcd(a, c) = 1 and gcd(b, c) = 1. Then gcd(ab, c) = 1.

Proof. The prime factorization of ab comes from multiplying the prime factorizations of
a and b together. Let p be a prime dividing c. Then by assumption, p ∤ a and p ∤ b, so p does
not appear in the factorization of ab, i.e. p ∤ ab.
Since any common divisor d > 1 of ab and c must be divisible by some prime dividing c, this
proves gcd(ab, c) = 1.

1.6 Primes
Since primes build the integers, many questions in number theory are about trying to under-
stand the set of prime numbers. The most basic question one can ask, is how many prime
numbers are there? This was first answered by Euclid, around 300 BC.

Theorem 1.6.1 (Euclid). There are infinitely many primes.

Proof. Suppose we start with a finite list of primes p_1, . . . , p_n. Consider N = p_1 · · · p_n + 1.
We know that N must have a prime divisor, however N is not divisible by any of the primes
p_i because it leaves remainder of 1 upon division. Therefore, there is some other prime p_{n+1}
such that p_{n+1} | N . Therefore, given any finite set of primes we can produce a new prime,
so there must be infinitely many primes.
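
Euclid's construction is easy to try out. The sketch below is an illustration with hypothetical function names (`smallest_prime_factor`, `euclid_new_prime`): it forms N = p_1 · · · p_n + 1 and returns its smallest prime factor, which cannot be in the input list. Note that N itself need not be prime, as the example with the first six primes shows.

```python
def smallest_prime_factor(n):
    """Smallest prime dividing n, for an integer n >= 2 (trial division)."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n                      # no divisor up to sqrt(n), so n is prime


def euclid_new_prime(primes):
    """Given a finite list of primes, return a prime not in the list."""
    N = 1
    for p in primes:
        N *= p
    N += 1                        # N leaves remainder 1 on division by each p
    return smallest_prime_factor(N)


primes = [2, 3, 5, 7, 11, 13]
new_prime = euclid_new_prime(primes)
print(new_prime)                  # 2*3*5*7*11*13 + 1 = 30031 = 59 * 509
assert new_prime not in primes
```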
While satisfying, there is a much better argument for this fact that was given by Euler.

Theorem 1.6.2 (Euler). There are infinitely many primes.
Proof. Suppose there were finitely many primes, p_1, . . . , p_N. Then we get the following product
expansion
\[ \prod_{i=1}^{N} \frac{1}{1 - \frac{1}{p_i}} = \prod_{i=1}^{N} \left(1 + \frac{1}{p_i} + \frac{1}{p_i^2} + \cdots\right) \]
by recognizing each term $\frac{1}{1 - \frac{1}{p_i}}$ as the sum of the geometric series
$\sum_{k=0}^{\infty} \frac{1}{p_i^k}$. By the Fundamental Theorem of Arithmetic, we can write any
integer n > 1 as n = p_1^{e_1} · · · p_N^{e_N} where
e_i ≥ 0. We can recover any such choice of integer as a denominator of a term in the expanded
product by picking the appropriate term $\frac{1}{p_i^{e_i}}$ from each infinite series in the product.
Therefore, for any n ≥ 1 we see that $\frac{1}{n}$ is a term in the product. Therefore, we must have
\[ \sum_{n=1}^{\infty} \frac{1}{n} \le \prod_{i=1}^{N} \frac{1}{1 - \frac{1}{p_i}}. \]
The left hand side is the harmonic series, which from calculus diverges to ∞, while the right
hand side is a finite product of finite numbers. This
is a contradiction. Therefore, there must be infinitely many primes.
Euler’s argument is much better than Euclid’s argument, because it actually provides
some additional information about how “large” the set of primes is.
Theorem 1.6.3 (Euler). The infinite series $\sum_p \frac{1}{p}$ diverges, where the sum is taken over all
primes p.
Proof. The argument is similar to the argument above, but here we need to be a bit more
careful with the analysis. For any integer N, similar to before, we can write
\[ \prod_{p \leq N} \frac{1}{1 - \frac{1}{p}} = \prod_{p \leq N} \left(1 + \frac{1}{p} + \cdots\right) \]
by expanding out the terms in the product as infinite series. Again, the same logic tells us
that we must have
\[ \sum_{n=1}^{N} \frac{1}{n} \leq \prod_{p \leq N} \frac{1}{1 - \frac{1}{p}}. \]
Taking logarithms of both sides,
\[ \log\left(\sum_{n=1}^{N} \frac{1}{n}\right) \leq \sum_{p \leq N} -\log\left(1 - \frac{1}{p}\right). \]
Using calculus, one can show that $-\log(1 - \frac{1}{p}) \leq \frac{1}{p} + \frac{1}{p^2}$, so we find that
\[ \log\left(\sum_{n=1}^{N} \frac{1}{n}\right) \leq \sum_{p \leq N} \frac{1}{p} + \sum_{p \leq N} \frac{1}{p^2}. \]
Taking N → ∞, the second term on the right converges by a comparison with $\sum_{n=1}^{\infty} \frac{1}{n^2}$, and
the term on the left diverges to ∞ because the harmonic series diverges. This then proves
that $\sum_p \frac{1}{p}$ diverges.

Euler’s result tells us that not only are there infinitely many primes, but the set of primes
is “large enough” for $\sum_p \frac{1}{p}$ to diverge. This gives us some information that we didn’t have
before! Note that not all infinite sets S satisfy the property that $\sum_{x \in S} \frac{1}{x} = \infty$, e.g. taking S
to be the set of squares shows this is false. This really is saying something interesting about
the primes!

Using the integral test, one can show that $\sum_{n=1}^{N} \frac{1}{n} \approx \log(N)$, so that the partial sums
of the harmonic series grow like logarithms. Therefore, the harmonic series diverges slowly.
It’s a theorem of Mertens that $\sum_{p \leq N} \frac{1}{p} \approx \log(\log(N))$, which diverges even slower! Using
this approximation, we would need N > 528491311 before $\sum_{p \leq N} \frac{1}{p} \approx \log(\log(N))$ exceeds 3.
That this sum diverges is very much not obvious, and if one did not see a proof it would be very
easy to convince oneself that it is bounded!
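To get a feeling for how slowly these sums grow, one can simply compute them. The sketch below is an illustration under stated assumptions: it uses a Sieve of Eratosthenes to list primes, and the helper names `primes_up_to`, `harmonic_sum` and `prime_reciprocal_sum` are hypothetical. The precise form of Mertens' theorem includes an additive constant (approximately 0.2615), which the rough approximation log(log(N)) ignores.

```python
from math import isqrt, log


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes: return all primes p <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, isqrt(n) + 1):
        if is_prime[i]:
            # cross off every multiple of i starting from i*i
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def harmonic_sum(n: int) -> float:
    """Partial sum 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(1.0 / k for k in range(1, n + 1))


def prime_reciprocal_sum(n: int) -> float:
    """Partial sum of 1/p over all primes p <= n."""
    return sum(1.0 / p for p in primes_up_to(n))


for N in (10, 10**3, 10**6):
    print(N, harmonic_sum(N), log(N), prime_reciprocal_sum(N), log(log(N)))
```

Even at N = 10^6 the sum over primes is still below 3, which illustrates why the divergence is hard to guess from numerical data alone.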

Definition 1.6.4. The prime counting function π(x) is defined by π(x) = #{p ≤ x :
p is prime}.

Example 1.6.5. π(5) = 3 and π(10.5) = 4.
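For small x, the prime counting function can be computed by brute force. The following is a minimal sketch, assuming a simple sieve is fast enough for the range of interest; the name `prime_pi` is a hypothetical choice.

```python
from math import floor, isqrt


def prime_pi(x: float) -> int:
    """Count the primes p <= x, where x may be any real number."""
    n = floor(x)
    if n < 2:
        return 0
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, isqrt(n) + 1):
        if is_prime[i]:
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
    return sum(is_prime)  # True counts as 1


print(prime_pi(5), prime_pi(10.5))
```

Since only the integers up to x matter, the function first replaces x by its floor, which is why non-integer inputs such as 10.5 make sense.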

The benefit of considering π(x) as a function of real numbers is so that calculus can be used
to analyze it. The previous results tell us that limx→∞ π(x) = ∞. One of the most important
problems in all of number theory is to understand how π(x) grows. The fundamental result
about the growth rate of primes is the following:
Theorem 1.6.6 (Prime Number Theorem). $\lim_{x \to \infty} \frac{\pi(x)}{x/\log(x)} = 1$.

In other words, the Prime Number Theorem says that as x → ∞, $\pi(x) \approx \frac{x}{\log(x)}$. There is
a much better approximation to π(x), first discovered by Gauss.
Definition 1.6.7. The logarithmic integral Li(x) is defined by $\mathrm{Li}(x) = \int_2^x \frac{1}{\log(t)} \, dt$.
It’s a standard calculus exercise to show that $\lim_{x \to \infty} \frac{\mathrm{Li}(x)}{x/\log(x)} = 1$, so if one believes the
Prime Number Theorem, then one also has π(x) ≈ Li(x) as x → ∞. Since $\frac{\pi(x)}{x} \approx \frac{1}{\log(x)}$,
and $\frac{\pi(x)}{x}$ is the “probability” that a randomly chosen number in the interval [2, x] is prime,
informally we see that the “density” of primes is given by $\frac{1}{\log(x)}$. Integrating this “density”
function on the interval [2, x] should give the number of primes in the interval [2, x]. The
Prime Number Theorem says that this is actually a pretty reasonable approximation, so that
primes roughly “behave randomly”.
By repeatedly integrating by parts, one may show that
\[ \mathrm{Li}(x) \approx \frac{x}{\log(x)} + \frac{x}{\log(x)^2} + \frac{2x}{\log(x)^3} + \cdots \]
which shows that Li(x) is indeed a “better” approximation than $\frac{x}{\log(x)}$, and that other easy
approximations may be found by just taking a few terms of this expansion. How accurate of
an approximation is Li(x) exactly? Here’s a table of π(x) and Li(x) for various values of x.

x          π(x)          Li(x)
10^4       1229          ≈ 1245
10^5       9592          ≈ 9629
10^6       78498         ≈ 78627
10^10      455052511     ≈ 455055614

As can be seen, it’s generally quite close! Note that although all the values in this table are
overestimates, π(x) − Li(x) changes sign infinitely often!
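The approximations to π(x) discussed above can be compared numerically. The sketch below is only an illustration: it approximates Li(x) with Simpson's rule after the substitution t = e^u (a numerical technique chosen here for convenience, not one used in these notes), and the names `Li`, `three_term` and `known_pi` are hypothetical. The values of π(x) are the ones quoted in the table.

```python
from math import exp, log


def Li(x: float, steps: int = 20000) -> float:
    """Approximate Li(x), the integral of 1/log(t) from 2 to x, by Simpson's rule.

    Substituting t = e^u turns the integrand into e^u / u on [log 2, log x].
    `steps` must be even.
    """
    a, b = log(2), log(x)
    h = (b - a) / steps
    f = lambda u: exp(u) / u
    total = f(a) + f(b)
    for k in range(1, steps):
        # Simpson weights: 4 at odd nodes, 2 at even interior nodes
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def three_term(x: float) -> float:
    """First three terms x/log x + x/log^2 x + 2x/log^3 x of the expansion of Li(x)."""
    L = log(x)
    return x / L + x / L**2 + 2 * x / L**3


known_pi = {10**4: 1229, 10**5: 9592, 10**6: 78498}  # values of pi(x) from the table
for x, pi_x in known_pi.items():
    print(x, pi_x, round(x / log(x)), round(three_term(x)), round(Li(x)))
```

Truncating the expansion after three terms gives a noticeably different number from Li(x) itself, so the two should not be confused when comparing against π(x).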

How can one quantify how “good” of an approximation Li(x) is to π(x)? This is the
Riemann Hypothesis!

Conjecture 1.6.8 (Riemann Hypothesis). There exists a real number C > 0 such that for
x sufficiently large,
\[ |\pi(x) - \mathrm{Li}(x)| \leq C \sqrt{x} \log(x). \]

In other words, the Riemann Hypothesis conjectures that the “error term” π(x) − Li(x)
grows at most like $\sqrt{x} \log(x)$ (up to constant factor and sign), which gives important
information about the distribution of prime numbers. The “classical” statement of the Riemann
Hypothesis is the following:

Conjecture 1.6.9 (Riemann Hypothesis). For s ∈ C with Re(s) > 1, let $\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}$,
extended to the rest of C by analytic continuation.
Then any zero of ζ(s) outside of −2, −4, −6, · · · must have a real part of $\frac{1}{2}$.

Although not obvious, these two formulations are indeed equivalent. That there is some
connection between the Riemann Zeta function and the distribution of primes can be seen
from Euler’s proof that there are infinitely many of them. One may factor
\[ \zeta(s) = \prod_p \frac{1}{1 - p^{-s}} \]

as Euler did, and note that taking s → 1 results in the proof that there are infinitely many
primes!

In general, most results about prime numbers are very hard to prove. Another famous
open problem is the Twin Prime Conjecture, which states the following:

Conjecture 1.6.10 (Twin Prime Conjecture). Let pn denote the n-th prime number. There
are infinitely many n such that pn+1 − pn = 2.

The best known result, proven in 2014, is that there are infinitely many values of n with
pn+1 − pn ≤ 246.

Another famous question about prime numbers is the distribution of primes in arithmetic
progressions. The following is a famous result of Dirichlet:

Theorem 1.6.11 (Dirichlet). Let a, b > 0 with gcd(a, b) = 1. There are infinitely many
primes of the form ak + b.

The idea of the proof is a modified version of Euler’s argument. Dirichlet shows that
with S = {p : p = ak + b, p prime}, one has $\sum_{p \in S} \frac{1}{p} = \infty$. However, the argument is
significantly harder. Simple cases of Dirichlet’s Theorem can be proven by hand, but there
is no “universal” technique for proving there are infinitely many primes of form ak + b for
fixed values of a, b. To show there are infinitely many primes of the form 23 + 31k, for
example, there is likely no known method outside of just proving Dirichlet’s Theorem.
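While proving Dirichlet's Theorem is hard, finding primes in a given progression is easy to do by machine. The following is a minimal sketch, assuming trial division is sufficient for small values; `is_prime` and `primes_in_progression` are hypothetical helper names.

```python
from math import gcd, isqrt


def is_prime(n: int) -> bool:
    """Trial-division primality test."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def primes_in_progression(a: int, b: int, count: int) -> list[int]:
    """Return the first `count` primes of the form a*k + b with k >= 0."""
    if gcd(a, b) != 1:
        # without this hypothesis the progression contains at most one prime
        raise ValueError("Dirichlet's Theorem requires gcd(a, b) = 1")
    found, k = [], 0
    while len(found) < count:
        n = a * k + b
        if is_prime(n):
            found.append(n)
        k += 1
    return found


print(primes_in_progression(31, 23, 5))
```

Of course, a computation like this can only ever exhibit finitely many primes of the form 23 + 31k; it gives evidence for the theorem, not a proof.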

1.7 Exercises
1. Apply the extended division algorithm to each pair of numbers a, b. That is, for each
one, find integers q and r such that a = b · q + r with 0 ≤ r < |b|.

(a) a = 47, b = −13


(b) a = 956, b = −27
(c) a = 29657452, b = −4382

2. For each of the pairs of integers (a, b) below, do the following:

(i) Run the Euclidean algorithm to compute gcd(a, b).


(ii) Use back substitution to find integers x, y such that ax + by = gcd(a, b).

(a) (504, 94)


(b) (−1260, 816)

3. Compute the gcd and lcm of $2^2 \cdot 3^3 \cdot 5^5 \cdot 7^7 \cdot 11^{11}$ and $2^7 \cdot 3^5 \cdot 5^3 \cdot 7^2 \cdot 13$.

4. Let a, b be integers with gcd(a, 25) = 5 and gcd(b, 125) = 25. Compute gcd(ab, 625),
gcd(a + b, 625) and lcm(ab, 625).

5. Use the Euclidean algorithm to compute lcm(3134376, 17599768).

6. (a) Find all integer solutions to 2260x + 816y = 6968.


(b) Find all positive integer solutions to 54x + 21y = 906.
(c) Find all integer solutions to 2x + 3y + 5z = 1.

7. Prove Proposition 1.2.2.

8. Prove Theorem 1.2.5.

9. Suppose that n > 1 is an odd integer. Prove that $8 \mid n^2 - 1$.

10. Show that if n > 1 is odd, then $24 \mid n^3 - n$.

11. Show that the sum of three consecutive odd integers is divisible by 9.

12. Find all rectangles with integer sides with equal area and perimeter.

13. (a) For any integers x > 1 and m, n ≥ 0, prove that $(x^n - 1) \mid (x^m - 1)$ if and only if
n | m.
(b) For any integer x, prove that if n is odd, $(x + 1) \mid (x^n + 1)$.

14. In this problem, you will give a proof that $\sqrt{2}$ is irrational using the Euclidean
algorithm. Suppose that $\sqrt{2}$ was rational, so it can be written as $\sqrt{2} = \frac{a}{b}$ for some positive
integers a, b with b ≠ 0.

(a) Show that a = b · 1 + (a − b) with 0 ≤ a − b < b, so that this is the first step in
the Euclidean algorithm on the pair (a, b) with q1 = 1 and r1 = a − b.
(b) Write down the next step in the Euclidean algorithm by performing the division
algorithm on the pair (b, a − b). What is q2 ? What is the ratio r1 /r2 ? (Your
answers should be numbers, not involving the letters a, b).
(c) Prove that $q_n = q_2$ and $\frac{r_{n-1}}{r_n} = \frac{r_1}{r_2}$ for all n ≥ 2. (Hint: prove these both
simultaneously via induction.)
(d) Explain why the truth of the statement in (c) yields a contradiction, therefore
proving that $\sqrt{2}$ must not be rational.

15. The Fibonacci sequence is defined by

F0 = 1, F1 = 1
Fn+1 = Fn + Fn−1 , n≥1

Prove by induction that for all n ≥ 1, the number of steps required for the Euclidean
algorithm on the pair (Fn+1 , Fn ) to terminate is exactly n.

16. Prove that if gcd(a, b) = d then $\gcd(a^2, b^2) = d^2$.

17. Prove that if gcd(a, b) = 1, then lcm(a, b) = ab.

18. Prove that gcd(a, bc) | gcd(a, b)gcd(a, c).

19. Prove that if gcd(a, b) = 1, then gcd(a + b, ab) = 1.

20. Prove that if gcd(a, b) = 1 and $ab = c^2$, then a and b are both squares.

21. Suppose that a, b ∈ Z with a | c and b | c. Prove that lcm(a, b) | c.

22. Let a, b be integers and let n ≥ 1. Prove that $a^n \mid b^n$ if and only if a | b.

23. Prove that for any integer n ≥ 1 there exist unique integers r, m with m odd such that
$n = 2^r m$.

24. Find an integer n such that $\frac{n}{2}$ is a square, $\frac{n}{3}$ is a cube, and $\frac{n}{5}$ is a fifth power.

25. Let n be a positive integer such that the exponent of each prime in its factorization is
at least two. For example, $1944 = 2^3 \cdot 3^5$. Prove that there are integers a, b such that
$n = a^2 b^3$.

26. (a) Let a, b ≥ 1 be integers. Prove that if $(a^2 - b^2) \mid (a^2 + b^2)$, then
$(a^2 - b^2) \mid 2\gcd(a, b)^2$.
(b) Prove there are no integers a, b ≥ 1 such that $(a^2 - b^2) \mid (a^2 + b^2)$.

27. Define $v_p(n)$ to be the exponent of p in the prime factorization of n. For example,
$v_3(45) = 2$ because $45 = 3^2 \cdot 5$. Prove that $v_p(n!) = \lfloor \frac{n}{p} \rfloor + \lfloor \frac{n}{p^2} \rfloor + \cdots$, where $\lfloor x \rfloor$ denotes
the greatest integer that is at most x.

28. We define the function τ (n) to be the number of positive divisors of n. For example,
τ (12) = 6 because it has positive divisors 1, 2, 3, 4, 6, 12.

(a) Let m, n be integers with gcd(m, n) = 1. Prove that if d | mn, then d = d1 d2 for
unique d1 , d2 with d1 | n and d2 | m. Explain why this means τ (mn) = τ (m)τ (n).
(b) Write down a formula for τ(n) in terms of the prime factorization of n. (Hint:
start by computing $\tau(p^e)$ for p prime and e ≥ 1.)

29. Compute the number of divisors of 20!.

30. How many zeroes does 169! end in?

31. Find all integers n < 100 such that τ (n) = 12.

32. Let p ≥ 5 be a prime. Show that $p^2 + 2$ is composite (Hint: show that $p^2 + 2$ is divisible
by 3).

33. Prove that if n > 1 is composite, then n has a prime factor p with $p \leq \sqrt{n}$.

34. (a) Prove that if $2^n - 1$ is prime, then n is prime. Similarly, prove that if $2^n + 1$ is
prime, then n is a power of 2.
(b) Show that the converse of each statement is false (you may use WolframAlpha for
computations).

35. Prove that there are infinitely many primes of the form 4k + 3.

36. (Sieve of Eratosthenes) Write down all natural numbers from 1 to 100, perhaps on a
10 × 10 array. Circle the number 2, the smallest prime. Cross off all numbers divisible
by 2. Circle 3, the next number that is not crossed out. Cross off all larger numbers
that are divisible by 3. Continue to circle the smallest number that is not crossed out
and cross out all its multiples. Repeat. Why are the circled numbers all the primes
less than 100?
37. Prove that $\lim_{x \to \infty} \frac{\mathrm{Li}(x)}{x/\log(x)} = 1$.

Chapter 2

Unique factorization in Euclidean domains

2.1 Unique factorization in F [x]


The integers are not the only object of interest to number theorists. There is a very
close link between numbers and polynomials. For example, instead of thinking of the number
$\frac{1}{2}$ ∈ Q, one may instead think of it as the (unique) root of the polynomial 2x − 1 ∈ Z[x].
Throughout these notes, we assume that all rings are commutative.

Definition 2.1.1. Let R be a ring and let $p(x) = a_n x^n + \ldots + a_0 \in R[x]$ be a polynomial.


The degree of p(x) is the largest exponent out of the non-zero terms in the sum. We call
p(x) monic if the leading coefficient (coefficient of the highest exponent term) is 1.

Definition 2.1.2. Let f (x), g(x) ∈ R[x]. We say that f (x) divides g(x) and write f (x) |
g(x) if there is a polynomial h(x) ∈ R[x] such that g(x) = f (x)h(x).

For our purposes, we’re going to mainly focus on when R = F is a field, as the polynomial
rings F [x] will turn out to be those that are most similar to Z. This is not to say that other
polynomial rings are not of interest! In particular, Z[x] is rather important.

Proposition 9. Suppose that f (x), g(x) ∈ R[x] for R = Z or R = F a field. Then


deg(f (x)g(x)) = deg(f (x)) + deg(g(x)).

Proof. Suppose that the leading terms of f (x) and g(x) are $a_n x^n$ and $b_m x^m$ respectively.
Then the leading term of f (x)g(x) is $a_n b_m x^{n+m}$. Since $a_n b_m \in R$, we must have $a_n b_m \neq 0$
because $a_n, b_m \neq 0$ and the product of two non-zero elements in R cannot be zero.

Definition 2.1.3. Let f (x) ∈ F [x] be non-constant. We say that f (x) is irreducible if
f (x) = g(x)h(x) for some g(x), h(x) ∈ F [x] means either g(x) is constant or h(x) is constant.
If f (x) is not irreducible, we call it reducible.

We have the following analogies between Z and F [x]:

Z            F [x]
±1           non-zero constant polynomials
|n|          deg(f (x))
positive     monic
prime        irreducible

In Z, we got unique factorization by following the chain of reasoning:

Division Algorithm =⇒ Euclidean Algorithm =⇒ Bezout’s lemma =⇒ Euclid’s lemma


=⇒ Unique factorization.

In F [x], we’re going to prove analogous versions of all of these results and follow the same
chain of reasoning to come to the same conclusion!
Theorem 2.1.4 (Division Algorithm). Let f (x), g(x) ∈ F [x] with g(x) ≠ 0. There exist
unique q(x), r(x) ∈ F [x] such that f (x) = g(x)q(x) + r(x) with r(x) = 0 or deg(r(x)) <
deg(g(x)).
Proof. If g(x) | f (x) this is obvious (with r(x) = 0), so assume it doesn’t. Let S =
{f (x) − g(x)q(x) : q(x) ∈ F [x]}. Note that f (x) ∈ S, so that S is non-empty. By the
well-ordering principle, choose r(x) ∈ S of minimal non-negative degree. This means that
there is q(x) ∈ F [x] such that f (x) − g(x)q(x) = r(x), i.e. f (x) = g(x)q(x) + r(x). First,
we show the remainder bound. If deg(r(x)) ≥ deg(g(x)), then write $r(x) = r_d x^d + \ldots$ and
$g(x) = g_k x^k + \ldots$, where $r_d x^d$ and $g_k x^k$ are the leading terms of r(x) and g(x) respectively.
Then $r(x) - g(x) \cdot \frac{r_d}{g_k} x^{d-k} = f(x) - g(x)\left(q(x) + \frac{r_d}{g_k} x^{d-k}\right) \in S$, however this polynomial has
smaller degree than r(x), which was a polynomial of minimal degree among the elements of
S. This is a contradiction, so this means deg(r(x)) < deg(g(x)).

For the uniqueness statement, if two pairs (q1 (x), r1 (x)) and (q2 (x), r2 (x)) satisfy the
conditions of the theorem, this means that g(x)(q1 (x)−q2 (x)) = r2 (x)−r1 (x). Taking degrees,
we must have deg(g(x)) + deg(q1 (x) − q2 (x)) = deg(r2 (x) − r1 (x)). If both q1 (x) − q2 (x) and
r2 (x) − r1 (x) are non-zero, then we would have deg(g(x)) + deg(q1 (x) − q2 (x)) < deg(g(x)),
which is impossible. Therefore, at least one of these polynomials is zero, and once one of
them is it’s clear the other must be as well. Therefore, q1 (x) = q2 (x) and r1 (x) = r2 (x) as
desired.
Example 2.1.5. Let $f(x) = x^4 - 4x^3 + 3x^2 + 2x - 1$ and $g(x) = x^2 - 2x + 1$ in Q[x]. Then
we have $f(x) = (x^2 - 2x - 2)g(x) + 1$. Once again, this process is equivalent to the process
of polynomial long division that one learns in middle school mathematics.
In the proof of the division algorithm, we needed to divide by the leading coefficient of
g(x). Therefore, we really did use that F was a field, and so for a general ring R[x], there’s
no hope for a general division algorithm. However, in any ring R[x], we can always divide
by monic polynomials, as the same proof goes through!
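The proof of the division algorithm is effectively a procedure: repeatedly cancel the leading term of the remainder until its degree drops below deg(g(x)). The following is a minimal sketch over Q, assuming polynomials are stored as coefficient lists with the constant term first; this representation and the names `trim` and `poly_divmod` are hypothetical choices made for illustration.

```python
from fractions import Fraction


def trim(p):
    """Drop zero coefficients from the high-degree end; [] is the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(f, g):
    """Return (q, r) with f = g*q + r and r = 0 or deg(r) < deg(g), over Q.

    Polynomials are coefficient lists, lowest degree first.
    """
    r = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)      # d - k in the notation of the proof
        coeff = r[-1] / g[-1]        # r_d / g_k: this is where a field is needed
        q[shift] = coeff
        for i, c in enumerate(g):
            r[i + shift] -= coeff * c  # subtract coeff * x^shift * g(x)
        r = trim(r)
    return trim(q), r


# Example 2.1.5: f = x^4 - 4x^3 + 3x^2 + 2x - 1, g = x^2 - 2x + 1
q, r = poly_divmod([-1, 2, 3, -4, 1], [1, -2, 1])
print(q, r)
```

The only division performed is by the leading coefficient of g(x), which matches the remark above: the same loop works over any ring whenever g(x) is monic.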
Definition 2.1.6. Let f (x), g(x) ∈ F [x]. The greatest common divisor of f (x) and g(x),
gcd(f (x), g(x)) is the monic polynomial d(x) that satisfies:

1. d(x) | f (x) and d(x) | g(x).

2. For any monic h(x) such that h(x) | f (x) and h(x) | g(x) we have deg(h(x)) ≤
deg(d(x)).

In Z[x], we cannot always rescale a polynomial to be monic like we can in F [x], so
if we want to define a greatest common divisor in this ring we need to alter the
definition slightly. The second condition can instead be replaced with “h(x) | f (x) and
h(x) | g(x) =⇒ h(x) | d(x)”. If we then require the leading coefficient of d(x) to be
positive, we can then pick out “a” greatest common divisor. Note that in our definition for F [x],
we require the greatest common divisor to be monic, otherwise there are many polynomials
that satisfy the above properties (which all just differ by some constant multiple).

Just as before, we can repeatedly run the division algorithm over and over again to
compute the greatest common divisor of two polynomials f (x) and g(x) in F [x], giving us
a Euclidean Algorithm. Also, just as before, once we have a Euclidean Algorithm, it can
be run backwards to write gcd(f (x), g(x)) as an F [x]-linear combination of f (x) and g(x),
giving us a Bezout lemma. We’ll provide precise statements of these results for completeness’
sake, but the proofs are left as exercises for the reader.

Theorem 2.1.7 (Euclidean Algorithm). Let f (x), g(x) be non-zero polynomials. Repeatedly
carry out the division algorithm as follows:

\[ \begin{aligned}
f(x) &= g(x)q_1(x) + r_1(x), & r_1(x) = 0 \text{ or } 0 \leq \deg(r_1(x)) < \deg(g(x)) \\
g(x) &= r_1(x)q_2(x) + r_2(x), & r_2(x) = 0 \text{ or } 0 \leq \deg(r_2(x)) < \deg(r_1(x)) \\
r_1(x) &= r_2(x)q_3(x) + r_3(x), & r_3(x) = 0 \text{ or } 0 \leq \deg(r_3(x)) < \deg(r_2(x)) \\
&\ \ \vdots
\end{aligned} \]
The last non-zero remainder is the greatest common divisor of f (x) and g(x), up to constant
multiple.

Theorem 2.1.8 (Bezout’s lemma). Let f (x), g(x) ∈ F [x]. Then there are p(x), q(x) ∈ F [x]
such that f (x)p(x) + g(x)q(x) = gcd(f (x), g(x)).

Example 2.1.9. Let $f(x) = x^3 + x^2 + 1$ and $g(x) = x^2 + x + 1$ in Q[x]. First, let’s compute
gcd(f (x), g(x)). Running the Euclidean Algorithm produces the following sequence of steps:
\[ \begin{aligned}
f(x) &= x \cdot g(x) + (1 - x) \\
g(x) &= (1 - x)(-x - 2) + 3 \\
1 - x &= 3 \cdot \tfrac{1}{3}(1 - x) + 0
\end{aligned} \]
The last non-zero remainder is 3, which when rescaled to be monic, is 1. This says that
gcd(f (x), g(x)) = 1, so that f (x) and g(x) are relatively prime. To find p(x), q(x) such
that f (x)p(x) + g(x)q(x) = 1, first, we perform back substitution. Doing so, we find that
$(1 - x^2 - 2x)g(x) + (x + 2)f(x) = 3$, and then rescaling gives $\frac{1 - x^2 - 2x}{3} g(x) + \frac{x + 2}{3} f(x) = 1$.
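The back substitution in this example can be automated by carrying the Bezout coefficients along while running the Euclidean Algorithm. The following is a minimal sketch over Q under the same assumptions as before (coefficient lists with the constant term first); every helper name here (`trim`, `add`, `mul`, `scale`, `poly_divmod`, `extended_gcd`) is hypothetical.

```python
from fractions import Fraction


def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def scale(p, c):
    return trim([c * a for a in p])


def poly_divmod(f, g):
    """Division algorithm in Q[x]: f = g*q + r with r = [] or deg r < deg g."""
    r, g = trim(f), trim(g)
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift, coeff = len(r) - len(g), r[-1] / g[-1]
        q[shift] = coeff
        for i, c in enumerate(g):
            r[i + shift] -= coeff * c
        r = trim(r)
    return trim(q), r


def extended_gcd(f, g):
    """Return (d, s, t) with s*f + t*g = d, where d is the monic gcd of f and g."""
    r0, r1 = [Fraction(c) for c in f], [Fraction(c) for c in g]
    s0, s1, t0, t1 = [Fraction(1)], [], [], [Fraction(1)]
    while r1:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, scale(mul(q, s1), -1))
        t0, t1 = t1, add(t0, scale(mul(q, t1), -1))
    c = 1 / r0[-1]  # rescale the last non-zero remainder to be monic
    return scale(r0, c), scale(s0, c), scale(t0, c)


# Example 2.1.9: f = x^3 + x^2 + 1, g = x^2 + x + 1
d, s, t = extended_gcd([1, 0, 1, 1], [1, 1, 1])
print(d, s, t)
```

For the polynomials of Example 2.1.9, the returned coefficients can be compared with the ones found by hand, and the identity s(x)f(x) + t(x)g(x) = d(x) can be checked by multiplying out with `mul` and `add`.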

Now that we have a Euclidean Algorithm and a Bezout lemma, we need a Euclid lemma
before we can get unique factorization.

Theorem 2.1.10 (Euclid’s lemma). Suppose that f (x) ∈ F [x] is irreducible and f (x) |
g(x)h(x) in F [x]. Then f (x) | g(x) or f (x) | h(x).

Proof. The proof is the same as in Z: suppose that f (x) ∤ g(x). Then since f (x) is irreducible,
we have gcd(f (x), g(x)) = 1, so by Bezout’s lemma there are p(x), q(x) such that f (x)p(x) +
g(x)q(x) = 1. Multiplying by h(x), we have f (x)h(x)p(x) + g(x)h(x)q(x) = h(x). Since f (x)
divides both terms in the sum on the left, we must have f (x) | h(x) as desired.

Theorem 2.1.11 (Unique factorization). Let f (x) ∈ F [x] be a non-constant monic
polynomial. There exist unique monic irreducibles $\pi_1(x), \ldots, \pi_k(x)$ and integers $e_1, \ldots, e_k > 0$ such
that $f(x) = \pi_1(x)^{e_1} \cdots \pi_k(x)^{e_k}$.

Proof. As before, the proof has two parts. First, we show that every non-constant polyno-
mial in F [x] is a product of monic irreducibles, and then we show this happens for a unique
set of monic irreducibles.

Existence:
We proceed by induction on deg(f (x)). If deg(f (x)) = 1, then we’re done, as any degree
1 polynomial is irreducible. Suppose we know that any monic polynomial of degree ≤ k can
be written as a product of monic irreducibles for some k. Let f (x) be a monic degree k + 1
polynomial. If f (x) is irreducible, we’re done, otherwise f (x) is reducible, so we can write
f (x) = g(x)h(x) for some g(x), h(x) ∈ F [x] with 1 ≤ deg(g(x)), deg(h(x)) ≤ k. Without
loss of generality, since f (x) is monic and we’re working over a field, we can rescale g(x) and
h(x) to both be monic. By assumption, both g(x) and h(x) factor into products of monic
irreducibles, and therefore f (x) does as well. By induction, the result is true for any monic
polynomial of degree at least 1.

Uniqueness:
If deg(f (x)) = 1, we’re again done, as f (x) is irreducible. Now, suppose that we know
the factorization is unique for all monic f (x) of degree ≤ k for some k. Let f (x) be a
monic polynomial of degree k + 1. Once more, if f (x) is irreducible, we’re done. Otherwise,
f (x) is reducible, so it can be written as a product of monic irreducibles. Let $f(x) =
\pi_1(x) \cdots \pi_k(x) = q_1(x) \cdots q_\ell(x)$ be two factorizations of f (x) into monic irreducibles. Since
$\pi_1(x)$ divides the left hand side, it divides the right hand side, so it must divide $q_i(x)$ for
some i by Euclid’s lemma. Without loss of generality, suppose that $\pi_1(x) \mid q_1(x)$. Since both
$\pi_1(x)$ and $q_1(x)$ are monic and irreducible, this means that $\pi_1(x) = q_1(x)$. Therefore, we must
have $\pi_2(x) \cdots \pi_k(x) = q_2(x) \cdots q_\ell(x)$. This is a polynomial of degree ≤ k, so by induction
hypothesis, it has a unique factorization into a product of monic irreducibles. This forces
$k = \ell$ and that $\{\pi_i(x)\} = \{q_j(x)\}$ for i ≥ 2. Since $\pi_1(x) = q_1(x)$, this means we actually have $\{\pi_i(x)\} = \{q_j(x)\}$
for all i, so that the two sets of monic irreducible factors of f (x) are actually the same.
Therefore by induction, any monic non-constant polynomial has a unique factorization into
monic irreducibles. Collecting powers of the same irreducible factors gives the form stated
in the theorem.

Note that if we didn’t require that the irreducible factors be monic, then the factorization
would only be unique up to a constant multiple. The key point is that two monic irreducible
polynomials in F [x] are equal if they divide each other.

2.2 Unique factorization in Z[i]


Definition 2.2.1. The Gaussian integers Z[i] are defined by Z[i] = {a + bi : a, b ∈ Z}.

We have that Z[i] ⊆ C is a subring, so we can add and multiply Gaussian integers the same
way that we can add and multiply complex numbers.

Definition 2.2.2. For α = a + bi ∈ Z[i], the norm of α is defined by $N(\alpha) = \alpha\bar{\alpha} = a^2 + b^2$.

In the Gaussian integers, the norm is the notion of size, analogous to the degree of a
polynomial or the absolute value of an integer.

Proposition 10. Let α, β ∈ Z[i]. Then N (αβ) = N (α)N (β).

Proof. Let α = a + bi and β = c + di, so that αβ = (ac − bd) + (ad + bc)i. Then we have
$N(\alpha)N(\beta) = (a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2 = N(\alpha\beta)$.

Definition 2.2.3. We call α ∈ Z[i] a unit if there is β ∈ Z[i] such that αβ = 1.

Proposition 11. α ∈ Z[i] is a unit if and only if α = ±1, ±i.

Proof. If αβ = 1 in Z[i], then taking norms gives N (α)N (β) = 1 in Z. Since N (α) ≥ 0, this
means N (α) = 1. Writing α = a + bi, this means $a^2 + b^2 = 1$. The integer solutions to this
equation are easily seen to be (±1, 0) and (0, ±1), corresponding to the four numbers ±1
and ±i. Each of these is obviously a unit, so there are exactly four.
In Z the units are ±1, and in F [x] the units are non-zero constant polynomials. In both
of these settings, we have a “canonical” choice of unit: in Z, we can pick out 1 because it’s
positive, and in F [x] we can always rescale to make polynomials monic. However, in Z[i],
we do not have such a canonical choice of unit. This is going to result in a lot of statements
in Z[i] only being unique up to unit multiple.

Definition 2.2.4. For α, β ∈ Z[i], we say that α divides β and write α | β if there is
γ ∈ Z[i] such that β = αγ.

Example 2.2.5. 14 − 3i = (4 + 5i)(1 − 2i), so 4 + 5i | 14 − 3i. However, 4 + 5i ∤ 14 + 3i. In
C, we have $\frac{14 + 3i}{4 + 5i} = \frac{71}{41} - \frac{58}{41} i \notin \mathbb{Z}[i]$.

Oftentimes, it’s useful to pass from Z[i] to Z by taking norms. One such use is for
checking for potential divisibility:

Proposition 12. Let α, β ∈ Z[i]. If α | β then N (α) | N (β) in Z.

Proof. Suppose that α | β. Then β = αγ for some γ ∈ Z[i], and taking norms results in
N (β) = N (α)N (γ), which says that N (α) | N (β) in Z.

Next, we state and prove the division algorithm in Z[i].
Theorem 2.2.6 (Division Algorithm). Let α, β ∈ Z[i] with β ≠ 0. Then there exist γ, ρ ∈
Z[i] such that α = βγ + ρ with $0 \leq N(\rho) \leq \frac{1}{2} N(\beta)$.

Proof. Let α = a + bi and β = c + di. In C, we have
\[ \frac{\alpha}{\beta} = \frac{\alpha\bar{\beta}}{N(\beta)} = \frac{ac + bd}{c^2 + d^2} + \frac{bc - ad}{c^2 + d^2} i = r + si \]
for $r = \frac{ac + bd}{c^2 + d^2}$ and $s = \frac{bc - ad}{c^2 + d^2}$. Choose p, q ∈ Z such that $|r - p| \leq \frac{1}{2}$ and $|s - q| \leq \frac{1}{2}$
(which may be achieved by rounding r, s to the nearest integer). Set γ = p + qi, and let
$\theta = (r - p) + (s - q)i = \frac{\alpha}{\beta} - \gamma$. This says that α = βθ + βγ, so that βθ = α − βγ ∈ Z[i]. We then
have $N(\beta\theta) = N(\beta)N(\theta) \leq N(\beta)\left((\tfrac{1}{2})^2 + (\tfrac{1}{2})^2\right) = \frac{1}{2} N(\beta)$. Therefore, we’re done by taking
ρ = βθ.
Example 2.2.7. With α = 11 + 10i and β = 4 + i, we have $\frac{\alpha}{\beta} = \frac{54}{17} + \frac{29}{17} i \in \mathbb{C}$. We take
γ = 3 + 2i, and ρ = α − βγ = 1 − i. Then the division algorithm says that α = β(3 + 2i) + (1 − i).
Unlike in both Z and F [x], we do not have uniqueness of γ and ρ in the division algorithm
for Z[i]. For example,
1 + 8i = (2 − 4i)(i − 1) − (1 − 2i)
1 + 8i = (2 − 4i)(−2 + i) + (1 − 2i)
This is related to not being able to pick out a canonical choice of unit in Z[i]. If one dropped
the requirement that the remainder in Z is non-negative, uniqueness would also be lost. For
example,
9 = 4 · 2 + 1
9 = 4 · 3 + (−3)
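The rounding procedure from the proof of the division algorithm in Z[i] is easy to carry out exactly. The following is a minimal sketch, assuming a Gaussian integer a + bi is stored as a pair of Python integers (a, b); the names `g_mul`, `round_div` and `g_divmod` are hypothetical, and ties are broken by rounding up, which is just one of several valid choices.

```python
def norm(z):
    a, b = z
    return a * a + b * b


def g_mul(z, w):
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


def round_div(n: int, d: int) -> int:
    """Nearest integer to n/d for d > 0 (ties round up), using integer arithmetic only."""
    return (2 * n + d) // (2 * d)


def g_divmod(alpha, beta):
    """Return (gamma, rho) with alpha = beta*gamma + rho and N(rho) <= N(beta)/2."""
    a, b = alpha
    c, d = beta
    n = norm(beta)
    if n == 0:
        raise ZeroDivisionError("division by zero in Z[i]")
    # alpha * conj(beta) = (ac + bd) + (bc - ad) i, so alpha/beta = r + s i
    p = round_div(a * c + b * d, n)   # nearest integer to r
    q = round_div(b * c - a * d, n)   # nearest integer to s
    gamma = (p, q)
    prod = g_mul(beta, gamma)
    rho = (a - prod[0], b - prod[1])
    return gamma, rho


print(g_divmod((11, 10), (4, 1)))   # Example 2.2.7
print(g_divmod((1, 8), (2, -4)))    # one of the two divisions displayed above
```

Working with the integers ac + bd, bc − ad and N(β) avoids floating point entirely, so the bound N(ρ) ≤ N(β)/2 holds exactly rather than up to rounding error.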
Once again, the division algorithm in Z[i] leads to a Euclidean algorithm in Z[i]. This gives
us a notion of greatest common divisor, and by running the Euclidean algorithm backwards,
a Bezout lemma. However, due to the unit issue in Z[i], we have to be a bit careful. It’s not
possible to pick out a canonical greatest common divisor like in Z and F [x]. There is some
inherent ambiguity in the definition!
Definition 2.2.8. Let α, β ∈ Z[i]. We call δ ∈ Z[i] a greatest common divisor of α, β if:
1. δ | α and δ | β.

2. For any other γ ∈ Z[i] with γ | α and γ | β we have γ | δ.


In particular, if δ is a greatest common divisor of α, β then so are ±δ, ±iδ. When we
write gcd(α, β), we mean a choice of greatest common divisor. If gcd(α, β) is a unit, then
we say that α, β are relatively prime.
Theorem 2.2.9. Let α, β ∈ Z[i]. Then there exist γ, ρ ∈ Z[i] such that αγ + βρ = gcd(α, β)
for any choice of gcd(α, β).
Proof. The proof is as usual. The Euclidean algorithm terminates in a greatest common
divisor δ of α and β, so running it backwards produces γ, ρ such that αγ + βρ = δ. We just
need to check that any two greatest common divisors differ by a unit multiple, so that we
can obtain any of them from one equation. However, this is easy, as if δ, δ′ are two greatest
common divisors of α, β, then δ′ | δ and δ | δ′. This means N (δ′) | N (δ) and N (δ) | N (δ′),
so that N (δ) = N (δ′). Then from δ′ | δ, we see that δ = εδ′ for some ε ∈ Z[i], so taking
norms says N (ε) = 1, which means that ε is a unit as desired.

Example 2.2.10. Let α = 32 + 9i and β = 4 + 11i. Running the Euclidean algorithm,

32 + 9i = (4 + 11i)(2 − 2i) + (2 − 5i)


4 + 11i = (2 − 5i)(−2 + i) + (3 − i)
2 − 5i = (3 − i)(1 − i) − i
3 − i = (−i)(1 + 3i) + 0
The last non-zero remainder is a unit, so α and β are relatively prime. If we back substitute,
we can solve the Bezout equation. Doing so, we find that α(3i) + β(−7 − 5i) = −i.
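The Euclidean algorithm in Z[i], together with the back substitution, can be sketched in a few lines. As before, this is only an illustration: Gaussian integers are stored as pairs (a, b), ties in the rounding step are broken by rounding up, and the names `g_sub`, `g_divmod` and `g_extended_gcd` are hypothetical. A different tie-breaking rule could produce a different (but equally valid) greatest common divisor.

```python
def g_mul(z, w):
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


def g_sub(z, w):
    return (z[0] - w[0], z[1] - w[1])


def g_divmod(alpha, beta):
    """Division algorithm in Z[i] by rounding alpha/beta to the nearest Gaussian integer."""
    (a, b), (c, d) = alpha, beta
    n = c * c + d * d
    p = (2 * (a * c + b * d) + n) // (2 * n)
    q = (2 * (b * c - a * d) + n) // (2 * n)
    gamma = (p, q)
    return gamma, g_sub(alpha, g_mul(beta, gamma))


def g_extended_gcd(alpha, beta):
    """Return (delta, gamma, rho) with alpha*gamma + beta*rho = delta, a gcd of alpha, beta."""
    r0, r1 = alpha, beta
    s0, s1 = (1, 0), (0, 0)
    t0, t1 = (0, 0), (1, 0)
    while r1 != (0, 0):
        q, r = g_divmod(r0, r1)
        r0, r1 = r1, r
        # update the Bezout coefficients alongside the remainders
        s0, s1 = s1, g_sub(s0, g_mul(q, s1))
        t0, t1 = t1, g_sub(t0, g_mul(q, t1))
    return r0, s0, t0


# Example 2.2.10: alpha = 32 + 9i, beta = 4 + 11i
print(g_extended_gcd((32, 9), (4, 11)))
```

The greatest common divisor returned is the last non-zero remainder, so it is only determined up to multiplication by ±1, ±i, in line with Definition 2.2.8.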

In order to talk about unique factorization, we need to decide what it means for α ∈ Z[i]
to be prime.

Definition 2.2.11. Let α ∈ Z[i] with N (α) > 1. We call α prime if α has no non-trivial
divisors in Z[i], i.e. if α = βγ then either β or γ is a unit in Z[i].

One can sometimes easily detect if a Gaussian integer is prime or not by looking at its
norm:

Proposition 13. Suppose that α ∈ Z[i] has N (α) prime in Z. Then α is prime in Z[i].

Proof. Suppose that α = βγ for some β, γ ∈ Z[i]. Taking norms, this means p = N (α) =
N (β)N (γ) for some prime p ∈ Z. This means either N (β) or N (γ) equals 1 since p is prime,
and therefore that either β or γ is a unit in Z[i].
Be careful! The converse of this statement is not true. For example, 3 is prime in Z[i],
but its norm N (3) = 9 is not prime in Z. To see that 3 is prime, suppose that 3 = αβ
for some α, β ∈ Z[i]. Taking norms, we would have 9 = N (α)N (β). If the factorization
is non-trivial, then N (α) = 3. Writing α = a + bi, we would need $a^2 + b^2 = 3$. However,
one can easily see that this has no integer solutions! Therefore, it’s not possible to have a
non-trivial factorization.

Once again, before we can get unique factorization, we need Euclid’s lemma.

Proposition 14 (Euclid’s lemma). Suppose that π ∈ Z[i] is prime and π | αβ for some
α, β ∈ Z[i]. Then π | α or π | β.

Proof. Same as usual. If π ∤ α, then since π is prime gcd(π, α) is a unit. Therefore by Bezout,
we can find γ, ρ such that πγ + αρ = 1. Multiplying through by β shows that π | β.
We’re now ready to state and prove unique factorization in Z[i].

Theorem 2.2.12. Let α ∈ Z[i] with N (α) > 1. Then there exist $\pi_1, \ldots, \pi_k \in \mathbb{Z}[i]$ prime
such that $\alpha = \pi_1 \cdots \pi_k$. This is unique, in the following sense: if $\pi_1', \ldots, \pi_\ell'$ is another prime
factorization, then $k = \ell$ and there is some reordering so that $\pi_i$ and $\pi_i'$ differ by a unit.

Proof. We follow the same proof structure as we have done for the other two settings. First
we prove every such α can be written as a product of primes, and then we show the notion
of uniqueness. We do this by induction on the norm.
Existence:
If N (α) = 2, we’re done since α is prime. Suppose that any α with 1 < N (α) ≤ k for some k can
be written as a product of primes, and let α ∈ Z[i] have N (α) = k + 1. If α is prime, we’re
done. Otherwise, write α = βγ for some non-units β, γ, so that 1 < N (β), N (γ) ≤ k. By assumption,
both β and γ can then be written as a product of primes, and therefore α can as well. By
induction, any such α is a product of primes.

Uniqueness:
Again, if N (α) = 2 we’re done. Assume we know that for any α with N (α) ≤ k for some
k, that the prime factorization is unique in the sense as stated in the theorem. Let α ∈ Z[i]
with N (α) = k + 1. If α is prime, again we’re done, otherwise α can be written as a product
of primes. Write $\alpha = \pi_1 \cdots \pi_k = \pi_1' \cdots \pi_\ell'$ as two prime factorizations. Since $\pi_1$ is prime and
$\pi_1$ divides the left hand side, and hence the right hand side, by Euclid’s lemma we must have
$\pi_1 \mid \pi_i'$ for some i. Without loss of generality, assume that $\pi_1 \mid \pi_1'$. This means that
$\pi_1' = \pi_1 \gamma$ for some γ, and since $\pi_1'$ is prime, this means that γ must be a unit. Canceling
$\pi_1$ from both sides, this says $\pi_2 \cdots \pi_k = \gamma \pi_2' \cdots \pi_\ell'$. Since $\gamma\pi_2'$ is still a prime, by induction
hypothesis, we must have $k = \ell$, and after reordering, $\pi_i$ must differ from $\pi_i'$ by a unit (here,
by abuse of notation, we are taking $\pi_2' = \gamma\pi_2'$ in the original sense of $\pi_2'$). The point, then, is
that a unit multiple of $\gamma\pi_2'$ is still just a unit multiple of $\pi_2'$, and so $\pi_i$ differs from $\pi_i'$ by a
unit multiple for all i ≥ 1. Therefore by induction, we’re done.
You’re probably used to factoring both integers and polynomials throughout your life,
but it’s unlikely you have experience with factoring Gaussian integers. How can this be
done? Take norms!

Example 2.2.13. Let α = 3 + 4i. Then N (α) = 25, so any non-trivial divisor of α must
have norm 5. Let β = a + bi, so we want to solve $N(\beta) = a^2 + b^2 = 5$. One may check that the
only solutions to this are (±2, ±1) and (±1, ±2), which correspond to the Gaussian integers
1 + 2i, 1 − 2i, and all their unit multiples. Both of these are prime because their norm is 5,
so we can just check for divisibility by hand. We find that $\frac{3 + 4i}{1 - 2i} = \frac{-5 + 10i}{5} = -1 + 2i$, so that
$3 + 4i = (1 - 2i)(-1 + 2i) = -(1 - 2i)^2$ is a prime factorization of 3 + 4i.
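The "take norms" strategy can be turned into a brute-force factoring procedure for small Gaussian integers. The sketch below is an illustration under stated assumptions, not an efficient algorithm: it searches candidate divisors in order of increasing norm, using the fact that a divisor of least norm greater than 1 must be prime, and it normalizes each prime factor to have positive real part and non-negative imaginary part. The names `exact_div`, `smallest_prime_divisor` and `factor` are hypothetical.

```python
from math import isqrt


def exact_div(alpha, beta):
    """Return alpha/beta if beta divides alpha in Z[i], otherwise None."""
    (a, b), (c, d) = alpha, beta
    n = c * c + d * d
    re, im = a * c + b * d, b * c - a * d  # alpha * conj(beta)
    if re % n == 0 and im % n == 0:
        return (re // n, im // n)
    return None


def smallest_prime_divisor(alpha):
    """Return (pi, alpha/pi), where pi = a + bi with a > 0, b >= 0 has least norm > 1."""
    target = alpha[0] ** 2 + alpha[1] ** 2
    for m in range(2, target + 1):          # candidate norms
        for a in range(1, isqrt(m) + 1):    # solve a^2 + b^2 = m with b >= 0
            b2 = m - a * a
            b = isqrt(b2)
            if b * b != b2:
                continue
            quotient = exact_div(alpha, (a, b))
            if quotient is not None:
                return (a, b), quotient
    raise ValueError("alpha must have norm greater than 1")


def factor(alpha):
    """Return (unit, primes) with alpha = unit * product of the listed primes."""
    primes = []
    while alpha[0] ** 2 + alpha[1] ** 2 > 1:
        pi, alpha = smallest_prime_divisor(alpha)
        primes.append(pi)
    return alpha, primes


print(factor((3, 4)))
```

For 3 + 4i this produces the factor 2 + i twice; since 2 + i = i(1 − 2i), this agrees with the factorization −(1 − 2i)² found above up to unit multiples, as Theorem 2.2.12 predicts.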

2.3 Euclidean domains


In the above two sections, we’ve seen two other settings that have unique factorization, just
like in the integers. In both of these places, we followed the exact same chain of reasoning
to reach this conclusion. In this section, we’ll use the language of rings to abstract away the
properties needed to follow this chain of reasoning.

Definition 2.3.1. An integral domain is a ring R such that for any a, b ∈ R with a, b ≠ 0,
ab ≠ 0.

Definition 2.3.2. A Euclidean domain is an integral domain R with a function N :
R → N with N(0) = 0 such that for any a, b ∈ R with b ≠ 0 there exist q, r ∈ R such that
a = bq + r with r = 0 or N(r) < N(b). The function N is called the norm.
This definition says that a Euclidean domain is an integral domain that has a division
algorithm.
Example 2.3.3. The ring Z is a Euclidean domain with N (n) = |n|. For F a field, F [x] is
a Euclidean domain with norm given by N (p(x)) = deg(p(x)). Z[i] is a Euclidean domain
with norm N (α) = αᾱ.
If R is a Euclidean domain, then because R has a division algorithm, R has a Euclidean
algorithm. In Z[i], we saw that the greatest common divisor is only defined up to unit
multiple. There, it was easy to find all the units, because the norm was multiplicative. This
need not be true in general, and a general Euclidean domain could have many units. For
this reason, it will be more beneficial to state results in terms of ideals of R: two principal
ideals (d) and (d') are equal if and only if d = ud' for some u ∈ R^×, so the ideal generated
by an element does not care about multiplication by a unit.
Proposition 15. Let R be a Euclidean domain. Then every ideal of R is principal.
Proof. Let I ≠ (0) be an ideal of R. By the well-ordering principle, there exists a non-zero d ∈ I such
that N(d) is minimal among all non-zero elements of I. Then clearly (d) ⊆ I. For any a ∈ I, since R
is Euclidean, we may write a = dq + r with r = 0 or N(r) < N(d). As r = a − dq ∈ I, by
the definition of d we must have r = 0, so a ∈ (d) and therefore I = (d).
The next corollary follows immediately:
Corollary 2.3.4. (Bezout’s lemma) Let R be a Euclidean domain. For any a, b ∈ R there
exists d ∈ R such that (a, b) = (d).
This means that there exist x, y ∈ R such that ax + by = d, so this is indeed the same as
the form of Bezout’s lemma we are used to. We now give a definition of the greatest common
divisor of two elements:
Definition 2.3.5. Let R be a Euclidean domain. Let a, b ∈ R. We say that a divides b
and write a | b if there is x ∈ R such that ax = b. We call d ≠ 0 a greatest common
divisor of a and b if d is a generator of the ideal (a, b).
Proposition 16. Let R be a Euclidean domain and let a, b ∈ R. Then d is a greatest
common divisor of a and b if and only if d is a common divisor of a and b and for any other
common divisor d' of a and b, we have d' | d.
Proof. First, suppose that d is a greatest common divisor of a and b, so that (a, b) = (d).
Since a, b ∈ (d), d is a common divisor of a and b.
Suppose that d' | a and d' | b. Then we may write a = d'r and b = d's for some r, s ∈ R. As
(a, b) = (d), there are x, y ∈ R with ax + by = d, so d'(rx + sy) = d, which says d' | d.

Conversely, suppose that d has the stated property. We need to show that (a, b) = (d).
We know that (a, b) = (d') for some d' ∈ R, so there are x, y ∈ R with ax + by = d'. As d | a
and d | b this means d | d'. On the other hand, as a, b ∈ (d') this means that d' | a and d' | b,
so d' | d. This means there are r, s ∈ R with d' = rd and d = sd', so (1 − rs)d = 0. As R is
a domain, this means rs = 1 so that d and d' differ by a unit multiple, so (a, b) = (d') = (d)
as desired.
We will use the notation gcd(a, b) to mean a choice of greatest common divisor as in the
previous section. Just like in Z[i], we can use the same proof as in Z to show that for any
a, b ∈ R not both zero, the last non-zero remainder upon running the Euclidean algorithm
is a greatest common divisor of a and b.
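
To see the Euclidean algorithm at work outside of Z, here is a small sketch in Z[i]. It assumes the division algorithm is carried out by rounding the exact quotient to the nearest Gaussian integer; the pair representation (a, b) for a + bi and the names gmul, gdivmod and ggcd are choices made for this illustration.

```python
def gmul(x, y):
    """Multiply Gaussian integers stored as (real, imag) integer pairs."""
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c)

def gdivmod(x, y):
    """Division algorithm in Z[i]: return (q, r) with x = y*q + r and N(r) < N(y)."""
    n = y[0] ** 2 + y[1] ** 2
    a, b = gmul(x, (y[0], -y[1]))        # x / y = (a + bi) / n
    # Round each coordinate of x / y to the nearest integer (integer arithmetic only).
    q = ((2 * a + n) // (2 * n), (2 * b + n) // (2 * n))
    yq = gmul(y, q)
    return q, (x[0] - yq[0], x[1] - yq[1])

def ggcd(x, y):
    """Last non-zero remainder of the Euclidean algorithm: a gcd, up to a unit."""
    while y != (0, 0):
        _, r = gdivmod(x, y)
        x, y = y, r
    return x

print(gdivmod((3, 4), (5, 0)))  # ((1, 1), (-2, -1))
print(ggcd((3, 4), (5, 0)))     # (-2, -1), i.e. -2 - i
```

The output −2 − i equals −i(1 − 2i), so it generates the same ideal as 1 − 2i. This is exactly the sense in which a greatest common divisor is only defined up to a unit multiple.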

Next, we need a notion of what it means for an element in R to be prime.

Definition 2.3.6. Let R be an integral domain. We call an element p ∈ R prime if p is
non-zero, not a unit, and for any a, b ∈ R, if p | ab then p | a or p | b.

Proposition 17. Let R be a Euclidean domain, and let p ∈ R be non-zero and not a unit. Then p is prime if and only if p = ab
for a, b ∈ R means either a or b is a unit.

Proof. First, suppose that p is prime in R, and p = ab. Then p | ab clearly, so p | a or
p | b by definition. Without loss of generality, assume that p | a, so a = pr for some r ∈ R.
Multiplying by b gives p = ab = prb, and canceling p shows that br = 1, so that b is a unit.

Conversely, suppose that p ∈ R has no non-trivial divisors, and suppose that p | ab. If
p ∤ a, the claim is that (a, p) = (1), so that there are x, y ∈ R with ax + py = 1, and then the
usual argument of multiplying by b shows that p | b, so that p is prime. Let d be a greatest
common divisor of a and p, so (a, p) = (d). This means that p = dr for some r ∈ R, and
therefore either d or r is a unit. Since a ∈ (d), there is s ∈ R with a = ds. If d is not a unit,
then this means r is a unit, and so a = p(r^{-1}s), which is a contradiction, as p ∤ a. Therefore,
d is a unit, and we’re done.
Note that in the above argument, the first half did not use anywhere that R was a
Euclidean domain. This means that in any integral domain, if p is prime then p has no
non-trivial divisors. However, the backwards direction used that R was Euclidean in a non-
trivial way. If R is a domain where every ideal of R is principal, we call R a principal ideal
domain (PID for short). We then have a well-defined notion of a greatest common divisor
in a PID by the same definition, and the same argument shows that the above proposition
holds in any PID. For an example of a ring where these notions are not the same, see the
exercises.

We are now ready to discuss unique factorization in Euclidean domains. We’d like to
give a proof in the spirit that was given for Z, F [x], Z[i]. All of those proofs proceeded by
induction on the norm of an element, which all relied on the key fact that in these rings,
if a | b, then N (a) ≤ N (b). Our definition of a Euclidean domain does not guarantee that
a norm function has this property. The standard proof in many abstract algebra books is
to use the property that a PID (and therefore, a Euclidean domain) is Noetherian, which
means that any ascending chain of ideals I_1 ⊆ I_2 ⊆ … eventually stabilizes, i.e. there is
some n such that I_n = I_k for all k ≥ n.

However, it turns out that if R is a Euclidean domain, we can always find a different
norm for R that does have the desired property.

Lemma 2.3.7. Let R be a Euclidean domain with norm function N. Then there exists
another norm N' on R such that for all non-zero a, b ∈ R, N'(a) ≤ N'(ab) and R is
Euclidean with respect to N'.

Proof. Define N' : R → N by N'(0) = 0 and N'(a) = min_{b ≠ 0} N(ab) for a ≠ 0, i.e. N'(a) is the smallest value
of N among all non-zero multiples of a. First, we show that N'(a) ≤ N'(ab) for non-zero
a, b ∈ R. By definition, we can write N'(ab) = N(abc) for some non-zero c ∈ R. Then as abc
is a non-zero multiple of a, N'(a) ≤ N(abc) = N'(ab). Next, we show that R admits a
division algorithm with respect to N'. Let a, b ∈ R with b ≠ 0. Write N'(b) = N(bc) for
some non-zero c ∈ R. As R has a division algorithm with respect to N, there are q, r ∈ R with
a = (bc)q + r and r = 0 or N(r) < N(bc). Set q' = qc, so a = bq' + r with r = 0 or
N(r) < N(bc). As N(bc) = N'(b), and N'(r) ≤ N(r · 1) = N(r) when r ≠ 0, we have r = 0 or N'(r) < N'(b), so
R is Euclidean with respect to N'.

Theorem 2.3.8. Let R be a Euclidean domain. For any non-zero r ∈ R that is not a unit,
there exist primes p_1, …, p_k ∈ R such that r = p_1 ⋯ p_k. This factorization is unique in the
following sense: if r = q_1 ⋯ q_ℓ is another prime factorization of r, then k = ℓ and, after reordering, p_i = u_i q_i
for some units u_i.

Proof. By the above, we may assume that the norm on R satisfies the inequality in the
statement of the lemma. The proof then proceeds as usual, by induction on the norm of r.
We leave the details as an exercise.

2.4 Exercises
1. Verify that the same proof of the division algorithm works in an arbitrary polynomial
ring R[x] when g(x) is monic.

2. Run the Euclidean algorithm backwards in Q[x] to find p(x), q(x) ∈ Q[x] such that
(4x^3 − 2x^2 − 3x + 1)p(x) + (2x^2 − x − 2)q(x) = 1.

3. Show that Z[x] does not have a “Bezout lemma” by showing that gcd(2, x) = 1, but
there is no way to write 1 as a Z[x]-linear combination of 2 and x.

4. Let F be a field. Prove there are infinitely many monic irreducible polynomials in
F [x]. Does your proof still work in Z[x]?

5. Let p(x), q(x), r(x) ∈ R[x] be non-constant. Prove that if gcd(p(x), q(x)) = 1 and
r(x) | p(x) + q(x), then there exist polynomials a(x), b(x) ∈ R[x] such that a(x)p(x) +
(q(x)r(x))b(x) = 1.

6. Use norms to find prime factorizations of 3 + 5i and 1 + 18i in Z[i].

7. Run the Euclidean algorithm backwards in Z[i] to find α, β ∈ Z[i] such that (8 − i)α +
(9 + 2i)β = 2 + i.

8. Suppose that α, β, γ ∈ Z[i] with α^7 | β^8 and β^5 | γ^4. Prove that α | γ.


9. Define Z[√−5] = {a + b√−5 : a, b ∈ Z}. This forms a ring, with the operations of
addition and multiplication in Z[√−5] analogous to how addition and multiplication
in Z[i] work. For α = a + b√−5 ∈ Z[√−5], the norm of α is defined by N(α) = (a + b√−5)(a −
b√−5) = a^2 + 5b^2. The same argument that shows the norm in Z[i] is multiplicative
also shows the norm in Z[√−5] is multiplicative.

(a) A number α ∈ Z[√−5] is called irreducible if α = βγ in Z[√−5] means either
β = ±1 or γ = ±1. Show that 3 is irreducible in Z[√−5].

(b) (If you’ve seen ring theory) A number α ∈ Z[√−5] is called prime if α | βγ in
Z[√−5] means α | β or α | γ. Show that 3 + 2√−5 is prime in Z[√−5].

(c) Show that 3 is not prime in Z[√−5] by showing that 2 · 3 = (1 + √−5)(1 −
√−5), but 3 divides neither 1 ± √−5. Deduce that Z[√−5] does not have unique
factorization into irreducibles.

10. A Pythagorean triple is a tuple (x, y, z) of positive integers such that x^2 + y^2 = z^2. A
Pythagorean triple (x, y, z) is called primitive if gcd(x, y, z) = 1. You’ll use the unique
factorization of Z[i] to classify all primitive Pythagorean triples.

(a) Show that if (x, y, z) is primitive, then z is odd and exactly one of x, y is even.
(b) Let α, β, γ ∈ Z[i] such that α, β are relatively prime and αβ = γ^2. Prove that
both α and β are perfect squares in Z[i].
(c) Suppose α ∈ Z[i]. Show that (1 + i) | α if and only if N(α) is even.
(d) Let α = x + yi with x odd, y even, and gcd(x, y) = 1. Prove that α and ᾱ are
relatively prime. (Hint: if δ is a gcd, show that δ | 2 = −i(1 + i)^2. Then, use part
(c))
(e) Deduce that (x, y, z) is a primitive Pythagorean triple if and only if x = m^2 −
n^2, y = 2mn, z = m^2 + n^2 for some integers m, n.

11. Prove theorem 2.3.8.

Part II

Modular Arithmetic

Chapter 3

Arithmetic in quotient rings

3.1 The quotient ring and Z/nZ


Let R be a commutative ring and let I be an ideal of R. We start with a general ring
theoretic construction, before specializing to our particular cases of interest.

Define an equivalence relation on R by a ∼ b if and only if a − b ∈ I. One may check
that this is indeed an equivalence relation. For any a ∈ R, we will interchangeably write any
of the following to mean the equivalence class of a: [a], a + I, ā, a mod I.

Definition 3.1.1. The quotient ring R/I is, as a set, R/I = {[a] : a ∈ R}, equipped with
operations + and · defined by [a] + [b] := [a + b] and [a] · [b] := [ab]. The additive and
multiplicative identity elements with respect to these operations are [0] and [1].

We leave it to the reader to check that these operations are well-defined (i.e. do not
depend on a choice of representative of the equivalence class) and satisfy the ring axioms. In
terms of notation, we usually write a ≡ b mod I to mean [a] = [b] in R/I, or when I = (r)
is principal, we write a ≡ b mod r.

There is a natural ring homomorphism π : R → R/I given by π(r) = r mod I called
the reduction map. The process of passing from a ring R to a quotient ring R/I via the
reduction map is a powerful technique for understanding R.

In Z, all non-zero ideals look like (n) for some integer n > 0. The quotient ring Z/(n) is called
the integers modulo n, and is often instead written Z/nZ. By the division algorithm, any
integer a can be written as a = nq + r with 0 ≤ r < n. This means that a ≡ r mod n,
and so any [a] ∈ Z/nZ looks like [r] for some 0 ≤ r ≤ n − 1. Since all these equivalence
classes are distinct, this means that Z/nZ = {[r] : 0 ≤ r ≤ n − 1}, so that Z/nZ is a finite
ring of size n.

Example 3.1.2. In Z/10Z, we have 5 + 5 ≡ 10 ≡ 0 mod 10, 5 + 8 ≡ 13 ≡ 3 mod 10,
−6 + 2 ≡ −4 ≡ 6 mod 10, and 3 · 7 ≡ 21 ≡ 1 mod 10.

Example 3.1.3. In Z/7Z, we have 2^3 ≡ 8 ≡ 1 mod 7 and 3^4 ≡ 9^2 ≡ 2^2 ≡ 4 mod 7.

General advice for doing computations in Z/nZ would be to “reduce” as you go along,
as seen in the computation of 3^4 mod 7 in the previous example. Another useful trick is
changing to a different representative of the equivalence class, one of smaller absolute
value. For example, it might be annoying to try computing 1001^2 mod 1003 by directly
squaring 1001, however this is easy if you note that 1001 ≡ −2 mod 1003, so that 1001^2 ≡
(−2)^2 ≡ 4 mod 1003.
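
The advice to reduce as you go is also how a computer handles large powers. Below is a minimal sketch of repeated squaring in which every intermediate product is reduced mod n; the function name power_mod is made up here, and Python’s built-in pow(base, exponent, n) computes the same thing.

```python
def power_mod(base, exponent, n):
    """Compute base**exponent mod n, reducing after every multiplication."""
    result = 1 % n
    base %= n
    while exponent > 0:
        if exponent & 1:                 # current binary digit of the exponent is 1
            result = (result * base) % n
        base = (base * base) % n         # square and reduce
        exponent >>= 1
    return result

print(power_mod(3, 4, 7))         # 4, matching Example 3.1.3
print(power_mod(1001, 2, 1003))   # 4, since 1001 ≡ -2 mod 1003
```

No intermediate value ever exceeds n^2, no matter how large the exponent is.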

Definition 3.1.4. We say that [a] ∈ Z/nZ is a unit (or is invertible) if there is [b] ∈ Z/nZ
such that [a][b] = [1] in Z/nZ. If such a [b] exists, we write [b] = [a]^{-1} and call [b] the inverse
of [a]. The units of Z/nZ are denoted by (Z/nZ)^×.

Our first goal is the classification of the units mod n, which is a familiar result in disguise.

Theorem 3.1.5. [a] ∈ Z/nZ is a unit if and only if gcd(a, n) = 1.

Proof. By definition, [a] is a unit in Z/nZ if and only if there is [b] such that [a][b] = [1], i.e.
ab ≡ 1 mod n. By definition of Z/nZ, this is true if and only if ab − 1 = ny for some y ∈ Z,
i.e. ab − ny = 1 in Z. By Bezout’s lemma, this is equivalent to saying that gcd(a, n) = 1.
The units of Z/nZ are the elements of Z/nZ that we can divide by: dividing by [a] in
Z/nZ is the same thing as multiplying by [a]^{-1}, just like how in Q dividing by n is the same
as multiplying by n^{-1}.

Corollary 3.1.6. Z/nZ is a field if and only if n is prime.

Proof. If n = p is prime, then gcd(a, p) = 1 for all 1 ≤ a ≤ p − 1 so all non-zero elements
of Z/pZ are units, which by definition means that Z/pZ is a field. On the other hand, if
n = ab is a non-trivial factorization, then we have ab ≡ 0 mod n with a ≢ 0 mod n and
b ≢ 0 mod n, which is not possible in a field.
The quotients Z/pZ for p prime are examples of finite fields. When we want to think
of Z/pZ as a field and not just as a ring, we will write Fp instead. We’ll later see that
all finite fields are “built” from the finite fields Fp , so that these are the most important
examples of finite fields. We’ll also later see how the rings Z/pZ are the most important
rings for understanding the structure of Z/nZ, analogously to how understanding Z means
understanding primes.

Since we know how to determine if a mod n is a unit or not, how can we find its inverse?
This can be done by solving the Bezout equation.

Example 3.1.7. Computing the inverse of 5 mod 13 is the same as trying to solve the
equation 5x ≡ 1 mod 13. This means finding integers x, y such that 5x = 1 + 13y, i.e.
5x − 13y = 1. By inspection (or by running the usual algorithm), note that (−5, −2) is one
such solution, so that 5 · (−5) ≡ 1 mod 13. This means that 5^{-1} ≡ −5 ≡ 8 mod 13.
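
“Running the usual algorithm” can be made explicit. The sketch below is one standard way to organize the extended Euclidean algorithm; the function names extended_gcd and inverse_mod are chosen for this illustration, and the inputs are assumed to satisfy n ≥ 2.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), for non-negative a, b."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

def inverse_mod(a, n):
    """Return the inverse of a mod n in range(n), or raise if gcd(a, n) != 1."""
    g, x, _ = extended_gcd(a % n, n)
    if g != 1:
        raise ValueError(f"{a} is not a unit mod {n}")
    return x % n

print(extended_gcd(5, 13))   # (1, -5, 2): 5*(-5) + 13*2 = 1
print(inverse_mod(5, 13))    # 8
```

The Bezout coefficient −5 found here is the same one found by inspection in the example, and reducing it mod 13 gives the inverse 8.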

Next, we determine when a linear equation in Z/nZ is solvable. Just like in Z, this will
not be particularly difficult to do.

Theorem 3.1.8. The equation [a]x = [b] in Z/nZ is solvable if and only if gcd(a, n) | b.
If [x_0] is one solution, then all solutions are given by [x_0 + k(n/d)] for 0 ≤ k ≤ d − 1 where
d = gcd(a, n).

Proof. Saying [a]x = [b] in Z/nZ is the same thing as saying ax ≡ b mod n. When is the
congruence ax ≡ b mod n solvable? This happens if and only if there are x, y ∈ Z such that
ax = b + ny, i.e. ax − ny = b. We know this has integer solutions if and only if gcd(a, n) | b.

Now, suppose that we have two solutions, [x] and [x_0], in Z/nZ. This means that ax ≡
b mod n and ax_0 ≡ b mod n, so that a(x − x_0) ≡ 0 mod n. This means that there is y ∈ Z
such that a(x − x_0) = ny, so dividing through by d = gcd(a, n) means that a'(x − x_0) = n'y
where a' = a/d and n' = n/d. Since gcd(a', n') = 1, this means that n' | (x − x_0), so that
x = x_0 + n'k for some k ∈ Z. As k runs through the various integers, what are the
possible equivalence classes [x]? Well, [x_0 + n'k] = [x_0 + n'ℓ] is the same as saying that
n'(k − ℓ) ≡ 0 mod n, which means that n'(k − ℓ) = nm for some m. Dividing through by n'
means k − ℓ = dm, so that k ≡ ℓ mod d. This means that there are d incongruent equivalence
classes mod n, coming from taking k = 0, …, d − 1, and by what we’ve shown, these are
the only possible solutions to [a]x = [b]. It’s clear that each value of k in this range produces
a distinct equivalence class mod n, and it’s easy to check that they are solutions, so
these are all the solutions.
The above theorem highlights some of the strangeness that happens when you work in
rings and not fields. A linear equation in Z/nZ can have more than one solution! When [a]
is a unit, however, there is indeed a unique solution to [a]x = [b], namely, x = [a]^{-1}[b].

Example 3.1.9. What are the solutions to the equation [42]x = [12] in Z/78Z? Equivalently,
how do you solve the congruence 42x ≡ 12 mod 78?
Since gcd(42, 78) = 6 divides 12, there are 6 solutions to this equation. To find one, first
we find a solution to 42x + 78y = 12 in Z^2. Running the Euclidean algorithm, we find that
42 · 2 + 78 · (−1) = 6, so that 42 · 4 + 78 · (−2) = 12. Working mod 78, this means 42 · 4 ≡ 12 mod 78,
so that one solution is 4 mod 78. The above theorem then says that all solutions are given
by x ≡ 4, 17, 30, 43, 56, 69 mod 78.
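
The recipe in the theorem translates directly into a short routine. This is a sketch under the assumption that Python 3.8 or later is available, so that pow(a, -1, m) returns an inverse mod m; the function name solve_linear_congruence is made up for this illustration.

```python
from math import gcd

def solve_linear_congruence(a, b, n):
    """Return all x in range(n) with a*x ≡ b (mod n), in increasing order."""
    d = gcd(a, n)
    if b % d != 0:
        return []                        # not solvable: gcd(a, n) does not divide b
    a_red, b_red, n_red = a // d, b // d, n // d
    # a_red is a unit mod n_red, so the reduced congruence has a unique solution.
    x0 = (b_red * pow(a_red, -1, n_red)) % n_red
    return [x0 + k * n_red for k in range(d)]

print(solve_linear_congruence(42, 12, 78))   # [4, 17, 30, 43, 56, 69]
print(solve_linear_congruence(42, 13, 78))   # []
```

Dividing through by d first turns the problem into one where the coefficient is invertible, and the d solutions mod n are then the lifts of that single solution.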

The strength of working inside Z/nZ instead of Z is that it translates problems of divisibility
into a finite setting. It’s very similar to how in linear algebra it’s much easier to
prove things in a finite dimensional vector space by reducing to just working with a basis.
Statements about divisibility get translated to statements in Z/nZ by simply realizing that
saying n | a is the same thing as saying a ≡ 0 mod n. For example, suppose we wished
to prove that 3 | n^3 − n for any integer n. In Z, one such argument would be to factor
n^3 − n = (n − 1)n(n + 1) into a product of three consecutive integers, one of which
must therefore be divisible by 3. Working mod 3, however, this just means verifying that
n^3 ≡ n mod 3 for any integer n. The only congruence classes mod 3 are 0, 1, 2 mod 3, so just
verifying directly that 0^3 ≡ 0 mod 3, 1^3 ≡ 1 mod 3, and 2^3 ≡ 8 ≡ 2 mod 3 is enough!

In particular, one very important application of passing to the quotient is that it can
help detect obstructions to solutions of Diophantine equations.

Example 3.1.10. Does x^2 + y^2 = 3 have integer solutions? It’s easy to verify by hand
that this is not possible, since clearly 0 ≤ |x|, |y| ≤ 1. However, a better argument is the
following. Suppose that x^2 + y^2 = 3 did have a solution (x, y). Then working mod 4, we
would have x^2 + y^2 ≡ 3 mod 4. All integers square to either 0 or 1 mod 4, so the left hand
side is congruent to 0, 1, or 2 mod 4, and never to 3 mod 4, which is a contradiction. Therefore, there cannot be
any integer solutions.
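
Because Z/nZ is finite, checks like these are finite computations. The sketch below simply tabulates squares mod n; the function names are made up for this illustration, and the code only confirms the two small cases discussed above.

```python
def square_residues(n):
    """The set of values x^2 mod n, as a sorted list."""
    return sorted({(x * x) % n for x in range(n)})

def sums_of_two_squares_mod(n):
    """All residues of the form x^2 + y^2 mod n, as a sorted list."""
    squares = square_residues(n)
    return sorted({(s + t) % n for s in squares for t in squares})

print(square_residues(4))           # [0, 1]
print(sums_of_two_squares_mod(4))   # [0, 1, 2], so 3 mod 4 never occurs

# n^3 ≡ n mod 3 only needs to be checked on the three residue classes.
print(all((n ** 3 - n) % 3 == 0 for n in range(3)))   # True
```

Finding a modulus that exposes an obstruction is the creative part; once a candidate modulus is chosen, the verification is mechanical.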

3.2 Quotients of Z[i]


Next, we’ll apply the quotient ring construction to Z[i]. Any ideal of Z[i] is principal, so
I = (α) for some α ∈ Z[i]. Saying that β ≡ β' mod α means that β = β' + αγ for some
γ ∈ Z[i], or equivalently, that β − β' = αγ. What do Z[i]-multiples of α look like? Set
α = a + bi and γ = c + di. By direct computation, one can verify that αγ = cα + d(iα).
This means that αγ is a Z-linear combination of α and iα!

In the complex plane, α and iα are orthogonal to each other, and therefore form an
R-basis of C. Let L = Span_Z{α, iα} be the lattice in C generated by α and iα. As a set, L
consists of all the Z[i]-multiples of α!
Definition 3.2.1. Let α ∈ Z[i] and let L = Span_Z{α, iα}. The fundamental parallelogram
of L is the parallelogram in C whose vertices are given by 0, α, iα, α + iα.
The fundamental parallelogram is named as such because any z ∈ C can be translated
to a point inside the fundamental parallelogram by moving along the lattice L. What this
means, then, is that for any β ∈ Z[i], there is γ ∈ Z[i] inside the fundamental parallelogram
of L such that β ≡ γ mod α. This means that the equivalence classes inside Z[i]/(α) come
from the Gaussian integers in the fundamental parallelogram.
Example 3.2.2. Let α = 1 + 2i, so iα = −2 + i. The lattice of interest is L = Span_Z{1 +
2i, −2 + i}, and the fundamental parallelogram has vertices 0, 1 + 2i, −2 + i, −1 + 3i. To find
a set of representatives for Z[i]/(1 + 2i), we need to find the incongruent lattice points in the
fundamental parallelogram. The first observation is that all lattice points in the interior are
incongruent. The second observation is that we need only consider boundary lattice points
on the lower half of the parallelogram. This is because boundary points on the upper half
can be translated down to a point on the lower half by moving along the boundary. The final
observation is that all corner points are congruent to 0, as they are simply just Z[i]-multiples
of α. We then find that Z[i]/(1 + 2i) = {[0], [i], [2i], [−1 + i], [−1 + 2i]} so that Z[i]/(1 + 2i)
has size 5.
In general, what is the size of Z[i]/(α)? By our observations, this is determined by the
number of lattice points inside the interior of the fundamental parallelogram of the lattice
L, and half the number of lattice points on the boundary. However, this counts the origin
twice, so subtracting 1 gives the correct count. There is a result known as Pick’s theorem,
which says that this lattice point count is precisely the area of the parallelogram, which is
N (α). We’ll later see how Pick’s theorem can be proved.
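
The count can be checked by brute force for small α. The sketch below lists the Gaussian integers β = sα + t(iα) with 0 ≤ s, t < 1, i.e. those in the half-open fundamental parallelogram, which is one convenient way to pick exactly one point from each class. The pair representation (a, b) for a + bi and the function name representatives are choices made for this illustration.

```python
def representatives(alpha):
    """Gaussian integers x + yi = s*alpha + t*(i*alpha) with 0 <= s, t < 1."""
    a, b = alpha
    n = a * a + b * b                    # N(alpha)
    box = abs(a) + abs(b)                # the parallelogram fits in this box
    reps = []
    for x in range(-box, box + 1):
        for y in range(-box, box + 1):
            # (x + yi) * conj(alpha) = n*s + (n*t)i, so these are n*s and n*t.
            s_num = x * a + y * b
            t_num = y * a - x * b
            if 0 <= s_num < n and 0 <= t_num < n:
                reps.append((x, y))
    return reps

print(representatives((1, 2)))        # [(-1, 1), (-1, 2), (0, 0), (0, 1), (0, 2)]
print(len(representatives((3, 1))))   # 10, which is N(3 + i)
```

For α = 1 + 2i this recovers the five classes [0], [i], [2i], [−1 + i], [−1 + 2i] from the example above.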

Algebraically, arithmetic in Z[i]/(α) is similar to arithmetic in Z/nZ.

Example 3.2.3. In Z[i]/(1 + 2i), we have (−1 + 2i)(2i) ≡ −4 − 2i ≡ i mod 1 + 2i because
−4 − 2i = (−2 + i)α + i. Additionally, (2i)^2 ≡ −4 ≡ −1 + i mod 1 + 2i.

Both of these computations can also be done geometrically by drawing the lattice and
seeing which Gaussian integer −4 − 2i and −4 get translated to, respectively. However, in
general, it’s going to be significantly easier to just perform the division algorithm to compute
the remainder upon division by α.
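
Here is a sketch of reducing mod α with the division algorithm, assuming the quotient is found by rounding β/α to the nearest Gaussian integer. The names gmul, gmod and congruent are made up for this illustration, and a + bi is stored as the pair (a, b). Note that the remainder produced this way is a valid representative of the class, but it need not be the one lying in the fundamental parallelogram.

```python
def gmul(x, y):
    """Multiply Gaussian integers stored as (real, imag) integer pairs."""
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c)

def gmod(beta, alpha):
    """Remainder of beta upon division by alpha, using a nearest-integer quotient."""
    n = alpha[0] ** 2 + alpha[1] ** 2
    a, b = gmul(beta, (alpha[0], -alpha[1]))      # beta / alpha = (a + bi) / n
    q = ((2 * a + n) // (2 * n), (2 * b + n) // (2 * n))
    aq = gmul(alpha, q)
    return (beta[0] - aq[0], beta[1] - aq[1])

def congruent(x, y, alpha):
    """True if x ≡ y mod alpha, i.e. alpha divides x - y."""
    return gmod((x[0] - y[0], x[1] - y[1]), alpha) == (0, 0)

alpha = (1, 2)
print(gmod((-4, -2), alpha))             # (0, 1), i.e. i
print(gmod((-4, 0), alpha))              # (1, 0), i.e. 1
print(congruent((1, 0), (-1, 1), alpha)) # True: 1 and -1 + i are the same class
```

The second computation returns 1 rather than −1 + i; both are correct, since 1 − (−1 + i) = 2 − i = −i(1 + 2i).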

Proposition 18. [β] is a unit in Z[i]/(α) if and only if α, β are relatively prime in Z[i].

Proof. The proof is the same as in Z/nZ: [β]x = [1] is solvable in Z[i]/(α) if and only if
βx + αy = 1 is solvable in Z[i]^2, which is equivalent to saying that gcd(α, β) is a unit.
Just like in Z/nZ, finding the inverse of [β] in Z[i]/(α) can be done by solving the corresponding
Bezout equation in Z[i]. The useful corollary of the above result is the following:

Corollary 3.2.4. If α is prime in Z[i], then Z[i]/(α) is a field.

Example 3.2.5. With α = 1 + 2i as before, since N(α) = 5 is prime, this means that
α is prime in Z[i]. Therefore, Z[i]/(α) is a field of size 5. One can construct an explicit
isomorphism F_5 ≅ Z[i]/(1 + 2i) by the map x mod 5 ↦ x mod 1 + 2i.

3.3 Quotients of F [x] and finite fields


Similar to before, we will mostly focus our attention on quotients of F [x] instead of arbitrary
polynomial rings. This is largely because in F [x], all ideals are principal, and so we only have
to consider the quotient rings F [x]/(p(x)) for p(x) ∈ F [x], which will be rather simple. For
other rings, such as Z[x], the structure of ideals is more complicated. For example, I = (2, x)
is not a principal ideal of Z[x]. For the reader familiar with ring theory, it might be a familiar
fact that Z[x]/(2, x) ≅ F_2, explicitly realized by the map p(x) mod I ↦ p(0) mod 2.
Understanding such quotients is important in number theory, as, for example, one may
identify Z[x]/(x^2 + 1) ≅ Z[i] via p(x) mod x^2 + 1 ↦ p(i), so that the Gaussian integers are
actually just a quotient of Z[x]!

Let I = (p(x)) be an ideal of F [x]. By the division algorithm, any f (x) ∈ F [x] can be
uniquely written as f (x) = p(x)q(x) + r(x) with r(x) = 0 or deg(r(x)) < deg(p(x)), and so
in the quotient ring, f(x) ≡ r(x) mod p(x). Since the classes [r(x)] are all distinct for r(x) satisfying
the remainder bound, just as in Z this tells us that F[x]/(p(x)) = {[r(x)] : r(x) =
0 or deg(r(x)) < deg(p(x))}. Unlike with Z or Z[i], quotients of F[x] may or may not be
infinite, depending on whether F is finite or not.

Example 3.3.1. The equivalence classes in the quotient ring R[x]/(x^2 + 1) are given by
[a + bx] for a, b ∈ R. There is an isomorphism R[x]/(x^2 + 1) ≅ C given by [a + bx] ↦ a + bi,
so that R[x]/(x^2 + 1) is a field.

Example 3.3.2. In F_2[x]/(x^2 + x + 1), the equivalence classes are given by [0], [1], [x], [x + 1]
where we think of 0, 1 as living in F_2, and there are four elements in the quotient. In
this quotient, addition of the coefficients works “mod 2”. For example, we have x^2 ≡
x + 1 mod x^2 + x + 1 because x^2 + x + 1 ≡ 0 mod x^2 + x + 1. Similarly, x(x + 1) ≡ 1 mod x^2 + x + 1
because x(x + 1) + 1 ≡ 0 mod x^2 + x + 1.

Example 3.3.3. F_3[x]/(x^2 + 1) has 9 elements: the equivalence classes [a + bx] with a, b ∈ F_3.
As [x^2 + 1] = [0], this tells us that [x]^2 = [2]. We have (2x + 1) + (x + 2) ≡ 0 mod x^2 + 1,
and (2x + 1)(x + 2) = 2x^2 + 5x + 2 ≡ 2x mod x^2 + 1.
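
The computations in this example are easy to mechanize. In the sketch below, the class [a + bx] in F_3[x]/(x^2 + 1) is stored as the pair (a, b), and multiplication uses the relation x^2 ≡ −1; the names add and mul are made up for this illustration.

```python
P = 3  # we work with coefficients in F_3

def add(u, v):
    """[a + bx] + [c + dx] in F_3[x]/(x^2 + 1)."""
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)

def mul(u, v):
    """[a + bx][c + dx] = [(ac - bd) + (ad + bc)x], using x^2 ≡ -1."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % P, (a * d + b * c) % P)

x = (0, 1)
print(mul(x, x))             # (2, 0): [x]^2 = [2]
print(add((1, 2), (2, 1)))   # (0, 0): (2x + 1) + (x + 2) ≡ 0
print(mul((1, 2), (2, 1)))   # (0, 2): (2x + 1)(x + 2) ≡ 2x
```

This is the same bookkeeping as multiplying complex numbers, except that the coefficients are reduced mod 3.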

Unsurprisingly, we have similar results as before:

Proposition 19. [f (x)] ∈ F [x]/(p(x)) is a unit if and only if gcd(f (x), p(x)) = 1.

Corollary 3.3.4. F [x]/(p(x)) is a field if and only if p(x) is irreducible in F [x].

We omit the proofs and leave them to the reader.

Example 3.3.5. We saw before that R[x]/(x^2 + 1) ≅ C via an explicit isomorphism, so
it’s a field. Alternatively, x^2 + 1 is irreducible in R[x] because it is a quadratic with no root in R, which
tells us that R[x]/(x^2 + 1) is a field (but without helping us identify its isomorphism class).
Similarly, x^2 + x + 1 is irreducible in F_2[x] because it has no root in F_2. This tells us that
F_2[x]/(x^2 + x + 1) is a field of size 4.
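
For small p and small degree, the corollary can be tested by brute force. The sketch below assumes p is prime and the modulus is monic, with polynomials stored as coefficient tuples from the constant term upward; the names poly_mulmod and is_field are made up for this illustration. It checks directly whether every non-zero class has a multiplicative inverse.

```python
from itertools import product

def poly_mulmod(u, v, modulus, p):
    """Multiply u and v in F_p[x]/(modulus); modulus is monic of degree d = len(u)."""
    d = len(modulus) - 1
    prod = [0] * (2 * d - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            prod[i + j] = (prod[i + j] + ui * vj) % p
    # Clear the terms of degree >= d, highest first, by subtracting multiples of modulus.
    for k in range(len(prod) - 1, d - 1, -1):
        c = prod[k]
        for i in range(d + 1):
            prod[k - d + i] = (prod[k - d + i] - c * modulus[i]) % p
    return tuple(prod[:d])

def is_field(modulus, p):
    """Check by exhaustive search that every non-zero class has an inverse."""
    d = len(modulus) - 1
    one = (1,) + (0,) * (d - 1)
    nonzero = [e for e in product(range(p), repeat=d) if any(e)]
    return all(any(poly_mulmod(u, v, modulus, p) == one for v in nonzero)
               for u in nonzero)

print(is_field((1, 1, 1), 2))   # True:  x^2 + x + 1 is irreducible in F_2[x]
print(is_field((1, 0, 1), 3))   # True:  x^2 + 1 is irreducible in F_3[x]
print(is_field((1, 0, 1), 2))   # False: x^2 + 1 = (x + 1)^2 in F_2[x]
```

An exhaustive search like this is only sensible for tiny examples, but it makes the dichotomy concrete: a reducible modulus produces zero divisors, and these cannot be inverted.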

Example 3.3.6. Since gcd(x + 2, x^2 − 1) = 1 in Q[x], this means that [x + 2] is invertible
in Q[x]/(x^2 − 1). To find a multiplicative inverse, we run the Euclidean algorithm backwards
to find that (1/3)(x^2 − 1) + (x + 2) · (1/3)(2 − x) = 1 is a solution to the Bezout equation. Taking this
mod x^2 − 1, we find [x + 2]^{-1} = [(1/3)(2 − x)].

One particular application of the above corollary is that it gives us a method of constructing
finite fields! If F is a finite field of size q and p(x) is an irreducible polynomial of
degree d in F[x], then F[x]/(p(x)) is a field with q^d elements, because there are q choices
for each of the d coefficients of a polynomial of degree at most d − 1. It turns out that all finite
fields arise from this construction, although we will not prove this. We will, however, prove
the following basic fact:

Proposition 20. Let F be a finite field. Then |F| = p^n for some prime p and n ≥ 1.

Proof. Since F is finite, we must have k · 1 = 1 + 1 + … + 1 (k times) = 0 for some positive integer k. Let c
be the smallest positive integer with this property; note that c > 1 because 1 ≠ 0 in F. Then c must be prime, because if c = ab
is a non-trivial factorization, we have c · 1 = (ab) · 1 = (a · 1)(b · 1) = 0 in F. Since F is a
field, this means a · 1 = 0 or b · 1 = 0, contradicting the definition of c. Writing p instead of
c, there is a natural isomorphism of fields F_p ≅ ⟨1⟩, given by x mod p ↦ x · 1, where ⟨1⟩ is
the additive subgroup of F generated by 1. Therefore, we may view F_p as a subfield of F
by identifying it with ⟨1⟩. We then see that F_p acts on F by multiplication, so we may view
F as a vector space over F_p. Since F is finite, it must be a finite dimensional vector space,
and therefore we must have F ≅ F_p^n for some n by the classification of finite dimensional
vector spaces. Taking cardinalities gives us what we want.

A slightly easier task than classifying all finite fields is showing that for each n ≥ 1, there
is a finite field of size p^n (the above proposition just says that it’s possible, not that there is
one!). This task is equivalent to showing that there is at least one irreducible polynomial of
degree n in F_p[x] for each n. Since irreducible polynomials in F_p[x] play the role of primes
in F_p[x], this is basically asking if there are primes of each possible “size” in F_p[x]. By
adapting Euler’s argument that Σ_p 1/p diverges to the setting of F_p[x], one can show that
such polynomials must exist. It is, however, a much more complicated argument!

3.4 Exercises
1. Compute the following:

(a) 1748 − 524 mod 39


(b) 84526 · 8629673 − 448184 · 5911832 mod 15
(c) 1477^{-1} mod 9235
(d) 1769234 mod 31
(e) 1! + 2! + . . . + 100! mod 25

2. Solve 140x ≡ 133 mod 301.

3. Prove that each of the following Diophantine equations has no integer solutions.

(a) 7x^2 + 2 = y^3
(b) x^3 + y^3 + z^3 = 4
(c) x^2 − 2y^2 = 10

4. Prove that there are no integers m, n such that 3^m + 3^n + 1 is a perfect square.

5. Prove that (1/5)n^5 + (1/3)n^3 + (7/15)n is an integer for all n ∈ Z.

6. Let n = a_n · 10^n + … + a_1 · 10 + a_0 be the decimal expansion of n. For example,
123 = 1 · 100 + 2 · 10 + 3.

(a) Show that n is divisible by 3 if and only if a_0 + a_1 + … + a_n is divisible by 3.
Explain why the same condition also holds for divisibility by 9. Similarly, show
that n is divisible by 11 if and only if a_0 − a_1 + … + (−1)^n a_n is divisible by 11.
(b) Let gcd(k, 10) = 1, and choose b so that 10b ≡ 1 mod k. Show that n is divisible
by k if and only if (n − a_0)/10 + b·a_0 is divisible by k. Apply the divisibility test for
k = 29 to test if 16559 is divisible by 29.

7. (a) Let p be a prime. Show that the only solutions to x2 ≡ 1 mod p are x ≡ ±1 mod p.
(b) Prove that if p is prime, then (p − 1)! ≡ −1 mod p. (Hint: pair up integers with
their inverses mod p).

(c) Prove that if n > 4 is composite, then (n − 1)! ≡ 0 mod n.
Combining parts (b) and (c) says that an integer n > 4 is prime if and only if n ∤ (n − 1)!. Of course, this is
a very bad way of checking an integer is prime, because (n − 1)! gets very large, very
fast!

8. Carefully go through the steps to construct the quotient ring R/I: verify that ∼ is an
equivalence relation, prove that addition and multiplication are well-defined, and that
R/I is actually a ring.

9. Verify that the map in example 3.2.5 is an isomorphism of fields.

10. Find a set of representatives for the ring Z[i]/(2 + 3i).

11. Compute [1 + 2i]^{-1} in Z[i]/(2 + 3i).

12. Show that the ideal (2, x) of Z[x] is not principal.

13. Verify that Z[x]/(x^2 + 1) ≅ Z[i] via the map p(x) mod x^2 + 1 ↦ p(i).

14. Let F be a field and let p(x) be an irreducible polynomial of degree d in F [x], so that
K := F [x]/(p(x)) is a field.

(a) Identify F with its image in K. Show that α = [x] ∈ K is a root of p(T ) ∈ K[T ].
(b) Explain why {1, α, α^2, …, α^{d−1}} is a basis for K as an F-vector space.
(c) Let p(x) = x^4 + x + 1 ∈ F_2[x]. Prove p(x) is irreducible in F_2[x] and write down
all the elements of K in terms of the basis in part (b). Then, compute α^6 + α + 1.

15. Let p(x) be an irreducible polynomial of degree d in F_p[x], so that F = F_p[x]/(p(x)) is
a field of size p^d.

(a) Prove that for any α, β ∈ F, (α + β)^p = α^p + β^p.
(b) Viewing F_p as a subfield of F, suppose that g(x) ∈ F_p[x] has a root α in F. Prove
that α^{p^k} is a root of g(x) for all k ≥ 0.
(c) Let F = F_2[x]/(x^4 + x + 1), which by the previous problem is a field where
f(x) = x^4 + x + 1 has a root α. Find all the roots of f(x) in F. Find all the roots of
g(x) = x^4 + x^3 + 1 in F. Make sure to write your answers in terms of the basis
{1, α, α^2, α^3}!

16. Let α be a root of x^2 + x + 2 in F = F_3[x]/(x^2 + x + 2). Compute the other root of
x^2 + x + 2 in F.

Chapter 4

The structure of (Z/nZ)^×

4.1 Chinese remainder theorem


The Chinese Remainder Theorem is a “structure theorem” for the ring Z/nZ. It is arguably
the most important theorem in all of number theory!
Theorem 4.1.1 (Chinese remainder theorem). Let n, m be positive integers with gcd(n, m) =
1. Then Z/mnZ ≅ Z/mZ × Z/nZ as rings.
Proof. Define a map f : Z/mnZ → Z/mZ × Z/nZ by x mod mn ↦ (x mod m, x mod n).
First, we must check that f is well-defined. Suppose that x ≡ x' mod mn, so that mn | x − x'.
Then we certainly have that m | x − x' and n | x − x' so that x ≡ x' mod m and x ≡ x' mod n,
meaning f(x mod mn) = f(x' mod mn).

Next, we check the map is injective. If f(x mod mn) = f(y mod mn), then x ≡ y mod m
and x ≡ y mod n. This means m | x − y and n | x − y, which means that mn | x − y because
m, n are relatively prime. Therefore, x ≡ y mod mn. To show that f is surjective,
for any pair (a mod m, b mod n) we wish to find x mod mn such that x ≡ a mod m and
x ≡ b mod n. We do this as follows. From x ≡ a mod m we must have x = a + mk for some
integer k. Therefore, a + mk ≡ b mod n, so that mk ≡ b − a mod n. Since gcd(m, n) = 1, m
is invertible mod n, and so k ≡ m^{-1}(b − a) mod n. Let ℓ be any integer with ℓ ≡ m^{-1} mod n.
Then we see that a + mℓ(b − a) mod mn is the desired class.

That f is a ring homomorphism easily follows from the definitions of addition and mul-
tiplication in the product rings.
By writing n in terms of its prime factorization and repeatedly applying the Chinese
Remainder Theorem, we get the following corollary (often times also called the Chinese
Remainder Theorem).
Corollary 4.1.2. Let n = p_1^{e_1} ··· p_k^{e_k} be the prime factorization of n. Then as rings,
Z/nZ ≅ Z/p_1^{e_1}Z × ... × Z/p_k^{e_k}Z.
The Chinese Remainder Theorem is important because it says that if you understand
what’s happening in Z/p^e Z for prime p and e ≥ 1, then you can “glue” all the information
together to understand what’s happening in Z/nZ. The proof of the Chinese Remainder
Theorem is constructive, and so by following the proof we can explicitly solve systems of
congruences.
Example 4.1.3. Suppose we wish to find all integers that satisfy the system of congruences

x ≡ 1 mod 2

x ≡ 2 mod 3

x ≡ 3 mod 5

The first equation says that x = 1 + 2k for some integer k, so plugging into the second yields
1 + 2k ≡ 2 mod 3. This means that k ≡ 2 mod 3, so k = 2 + 3ℓ for some ℓ. Plugging back
in, we get that x = 1 + 2(2 + 3ℓ) = 5 + 6ℓ. Finally, plugging this into the last equation,
5 + 6ℓ ≡ 3 mod 5 means that ℓ ≡ 3 mod 5, so that ℓ = 3 + 5s for some s. Plugging this
back in, x = 5 + 6(3 + 5s) = 23 + 30s. This says that x ≡ 23 mod 30, and clearly any
such choice of integer actually works. Therefore, any x with x ≡ 23 mod 30 solves the
system. Ring theoretically, this computation can be interpreted as computing the pre-image
of (1 mod 2, 2 mod 3, 3 mod 5) under the isomorphism Z/30Z ≅ Z/2Z × Z/3Z × Z/5Z from
the Chinese Remainder Theorem.
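
The substitution procedure in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the names crt_pair and crt are ours, and it assumes Python 3.8 or later so that pow(m, -1, n) returns an inverse of m mod n.

```python
from math import gcd

def crt_pair(a, m, b, n):
    """Solve x ≡ a (mod m), x ≡ b (mod n) for coprime m, n; return (x, m*n)."""
    if gcd(m, n) != 1:
        raise ValueError("moduli must be coprime")
    # As in the proof: x = a + m*k with k ≡ m^{-1} (b - a) (mod n).
    k = (pow(m, -1, n) * (b - a)) % n
    return (a + m * k) % (m * n), m * n

def crt(congruences):
    """Fold crt_pair over a list of (residue, modulus) pairs."""
    x, modulus = 0, 1
    for residue, next_modulus in congruences:
        x, modulus = crt_pair(x, modulus, residue, next_modulus)
    return x, modulus

print(crt([(1, 2), (2, 3), (3, 5)]))  # (23, 30)
```

Folding the two-modulus step over the list mirrors how the example substitutes one congruence at a time.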
Since the Chinese remainder theorem is an isomorphism of rings, it induces an isomor-
phism on unit groups, too:
Corollary 4.1.4. Let n, m be positive integers with gcd(n, m) = 1. Then
(Z/mnZ)× ≅ (Z/mZ)× × (Z/nZ)×.
Proof. The Chinese Remainder Theorem says that Z/mnZ ≅ Z/mZ × Z/nZ, so (Z/mnZ)× ≅
(Z/mZ × Z/nZ)×. By definition of multiplication in the product ring, we see that (a mod
m, b mod n) is a unit if and only if there is (x mod m, y mod n) such that ax ≡ 1 mod m and
by ≡ 1 mod n, which is equivalent to saying that a mod m and b mod n are units in Z/mZ
and Z/nZ respectively. This means that (Z/mZ × Z/nZ)× = (Z/mZ)× × (Z/nZ)×.
Definition 4.1.5. We define ϕ(n) = #{1 ≤ x ≤ n : gcd(x, n) = 1}.
From what we know about Z/nZ, saying that gcd(x, n) = 1 is the same as saying that x
is a unit mod n, so this means that ϕ(n) = |(Z/nZ)× | counts the size of the unit group mod
n.
Proposition 21. Let m, n be positive integers with gcd(m, n) = 1. Then ϕ(mn) = ϕ(m)ϕ(n).
Proof. This follows immediately from the above corollary by definition of ϕ and noting that
the cardinality of a product ring is the product of cardinalities.
Writing n in terms of its prime factorization, we get an immediate corollary.
Corollary 4.1.6. Let n = p_1^{e_1} ··· p_k^{e_k} be the prime factorization of n. Then
ϕ(n) = ϕ(p_1^{e_1}) ··· ϕ(p_k^{e_k}).
To compute ϕ(n), then, we just have to compute ϕ(p^e) for prime p and e ≥ 1.
However, this is rather simple.

Proposition 22. Let p be a prime and e ≥ 1. Then ϕ(p^e) = p^{e−1}(p − 1).

Proof. ϕ(p^e) counts, by definition, the number of integers between 1 and p^e that are co-prime
to p^e. There are p^{e−1} integers in this range not co-prime to p^e, namely, the p^{e−1} multiples of
p. Therefore, there are p^e − p^{e−1} = p^{e−1}(p − 1) integers that are co-prime.

Corollary 4.1.7. ϕ(n) = n ∏_{p|n} (1 − 1/p), where the product is taken over all primes p that
divide n.

Proof. Write ϕ(p^e) = p^e(1 − 1/p). The formula then follows immediately by writing n in terms of
its prime factorization and using the multiplicativity of ϕ.

Example 4.1.8. 360 = 2^3 · 3^2 · 5, so ϕ(360) = ϕ(8) · ϕ(9) · ϕ(5) = 4 · 6 · 4 = 96.
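
As a rough illustration of how the multiplicativity of ϕ turns into a computation, here is a small Python sketch. The helper names factorize and phi are ours, and trial division is only assumed to be adequate for small n.

```python
def factorize(n):
    """Trial-division factorization: returns {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def phi(n):
    """Euler's function via phi(p^e) = p^(e-1) * (p - 1) on each prime power."""
    result = 1
    for p, e in factorize(n).items():
        result *= p ** (e - 1) * (p - 1)
    return result

print(factorize(360))  # {2: 3, 3: 2, 5: 1}
print(phi(360))        # 96
```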

4.2 Euler’s theorem and orders mod n


Theorem 4.2.1 (Euler). Let a, n ∈ Z with n > 1 and gcd(a, n) = 1. Then a^{ϕ(n)} ≡ 1 mod n.

Proof. Consider the map f : Z/nZ → Z/nZ given by f([x]) = [a][x]. This is a bijection,
with inverse given by f^{-1}([x]) = [a]^{-1}[x]. There are ϕ(n) units in Z/nZ, call them
u_1, ..., u_{ϕ(n)}. Since [a] is a unit mod n and the product of two units is a unit, this means
that f bijectively maps the set of units onto itself, so that the sets {u_1, ..., u_{ϕ(n)}} and
{[a]u_1, ..., [a]u_{ϕ(n)}} are the same. Taking the product of all elements in these sets, we see
that ∏_{i=1}^{ϕ(n)} u_i ≡ ∏_{i=1}^{ϕ(n)} a·u_i ≡ a^{ϕ(n)} ∏_{i=1}^{ϕ(n)} u_i mod n. Note that
∏_{i=1}^{ϕ(n)} u_i is invertible because it’s a product of
units, and therefore dividing through on both sides we find a^{ϕ(n)} ≡ 1 mod n as desired.
When n = p is prime, this result is usually known as Fermat’s little theorem.

Corollary 4.2.2 (Fermat’s little theorem). Let p be a prime and p ∤ a. Then a^{p−1} ≡ 1 mod p.

Both Euler’s theorem and Fermat’s little theorem are special cases of the more general
Lagrange’s theorem in group theory, which says that for any finite group G, and a ∈ G,
one has a^{|G|} = e, where e is the identity element of G. When G is abelian, the proof is
the same as the proof of Euler’s theorem: one shows that the multiplication-by-a
map is a bijection on G. Euler and Fermat’s results are the cases where G = (Z/nZ)× and
(Z/pZ)×, respectively.

Euler’s theorem is a very useful computational tool for simplifying exponentials mod n,
as it says that the exponent only matters mod ϕ(n).

Example 4.2.3. Suppose we wanted to compute 3^256 mod 100. Since ϕ(100) = ϕ(25) ·
ϕ(4) = 20 · 2 = 40, by Euler’s theorem, we have 3^40 ≡ 1 mod 100. Since 256 ≡ 16 mod 40,
this means that 3^256 ≡ 3^16 mod 100. This is much more manageable, and we handle this
with repeated squaring: 3^2 ≡ 9 mod 100, 3^4 ≡ 81 ≡ −19 mod 100, 3^8 ≡ 361 ≡ 61 mod 100,
3^16 ≡ 3721 ≡ 21 mod 100.
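
The two steps of this example (reduce the exponent mod ϕ(n), then square repeatedly) can be sketched as follows. The function name power_mod is ours; Python's built-in three-argument pow does the same job, and the sketch only spells out the squaring loop.

```python
def power_mod(a, e, n):
    """Compute a^e mod n by square-and-multiply, for e >= 0."""
    result = 1
    a %= n
    while e > 0:
        if e & 1:                 # this binary digit of e is 1
            result = result * a % n
        a = a * a % n             # a, a^2, a^4, a^8, ...
        e >>= 1
    return result

reduced_exponent = 256 % 40       # exponents only matter mod phi(100) = 40
print(reduced_exponent)           # 16
print(power_mod(3, reduced_exponent, 100))  # 21
```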

Euler’s theorem is also quite useful for computing k-th roots mod n:

Example 4.2.4. Suppose we wish to solve x^5 ≡ 2 mod 7. By Fermat’s little theorem, any
unit x mod 7 satisfies x^6 ≡ 1 mod 7. If we can find k such that 5k ≡ 1 mod 6, then we must
have x ≡ x^{5k} ≡ 2^k mod 7 by exponentiating. This is a standard inverse computation, and we find
that k ≡ 5 mod 6 works. This means that x ≡ 2^5 ≡ 4 mod 7 is the unique solution to this
congruence equation.

Euler’s theorem can even be used to compute inverses, although it’s not generally an
efficient way to do so by hand.

Example 4.2.5. By Euler’s theorem, we have 3^58 ≡ 1 mod 118, which means that 3^{-1} ≡
3^57 mod 118. One may compute that 3^5 ≡ 7 mod 118, 3^25 ≡ 51 mod 118, 3^29 ≡ 1 mod 118,
so that 3^57 ≡ 3^28 ≡ 3^25 · 3^3 ≡ 79 mod 118.
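
Both of the last two examples are quick to check numerically. The snippet below is just a sanity check using Python's built-in pow (the negative exponent form assumes Python 3.8 or later); the variable names are ours.

```python
# Example 4.2.4: solve x^5 ≡ 2 (mod 7) by inverting the exponent 5 mod 6.
k = pow(5, -1, 6)            # k with 5k ≡ 1 (mod 6)
x = pow(2, k, 7)             # x ≡ 2^k (mod 7)
print(k, x, pow(x, 5, 7))    # 5 4 2

# Example 4.2.5: the inverse of 3 mod 118 as 3^(phi(118) - 1), phi(118) = 58.
inverse_of_3 = pow(3, 58 - 1, 118)
print(inverse_of_3, 3 * inverse_of_3 % 118)  # 79 1
```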

Definition 4.2.6. Let a, n ∈ Z with n > 1 and gcd(a, n) = 1. The order mod n of a,
denoted ord_n(a), is the smallest positive integer k such that a^k ≡ 1 mod n.

Since a^{ϕ(n)} ≡ 1 mod n by Euler’s theorem, this tells us that ord_n(a) is actually defined,
and that ord_n(a) ≤ ϕ(n).

Example 4.2.7. We have ord_10(7) = 4 because 7^4 ≡ 1 mod 10 and 7, 7^2, 7^3 ≢ 1 mod 10.

Proposition 23. Let a, n, k ∈ Z with n > 1 and gcd(a, n) = 1. Then a^k ≡ 1 mod n if and only
if ord_n(a) | k.

Proof. If ord_n(a) | k then k = ord_n(a)ℓ for some ℓ, from which we see that a^k ≡ (a^{ord_n(a)})^ℓ ≡
1 mod n. Conversely, suppose that a^k ≡ 1 mod n. By the division algorithm, we may write
k = ord_n(a)q + r with 0 ≤ r < ord_n(a). Plugging this in, we have a^k ≡ a^r ≡ 1 mod n. By
definition of ord_n(a), this is possible only if r = 0, in which case we see that ord_n(a) | k.
The above proposition is useful for attempting to compute ordn (a), because in combina-
tion with Euler’s theorem, this means that ordn (a) must be a divisor of ϕ(n). This means
that one can compute ordn (a) by ruling out divisors one by one.

Example 4.2.8. What is ord_23(5)? Since 23 is prime, we have ϕ(23) = 22, so
ord_23(5) must divide 22, and therefore we must have ord_23(5) = 1, 2, 11, or 22. It’s clearly not 1 or 2,
and we compute 5^11 ≡ 5 · (25)^5 ≡ 5 · 2^5 ≡ 45 ≡ −1 mod 23, so that it’s not 11 either. This
then tells us that ord_23(5) = 22.
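
Ruling out divisors one by one is easy to mechanize. The following is a small sketch with brute-force helpers (the names divisors, phi and order are ours), meant for small moduli only.

```python
from math import gcd

def divisors(n):
    """All positive divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]

def phi(n):
    """Euler's function by direct count (fine for small n)."""
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)

def order(a, n):
    """ord_n(a): the smallest divisor d of phi(n) with a^d ≡ 1 (mod n)."""
    if gcd(a, n) != 1:
        raise ValueError("a must be a unit mod n")
    for d in divisors(phi(n)):
        if pow(a, d, n) == 1:
            return d

print(order(5, 23))  # 22
print(order(7, 10))  # 4
```

Because the divisors are tried in increasing order, the first hit is the order.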

Proposition 24. Let a, n, k, ℓ ∈ Z with n > 1 and gcd(a, n) = 1. Then a^k ≡ a^ℓ mod n if
and only if k ≡ ℓ mod ord_n(a).

Proof. Suppose that a^k ≡ a^ℓ mod n. Multiplying through by a^{−ℓ} we see that a^{k−ℓ} ≡ 1 mod n,
so that by the above proposition, ord_n(a) | k − ℓ so that k ≡ ℓ mod ord_n(a). Conversely,
suppose that k ≡ ℓ mod ord_n(a), so that k = ℓ + ord_n(a)t for some t. It’s then clear that
a^k ≡ a^ℓ · (a^{ord_n(a)})^t ≡ a^ℓ mod n.
The above proposition says that exponentiation only matters modulo ord_n(a), which
means that knowing the value of ord_n(a) is of computational interest. Our next proposition
tells us how to compute the order of a power.

Proposition 25. Let a, n, k ∈ Z with n > 1, k ≥ 1, and gcd(a, n) = 1. Then
ord_n(a^k) = ord_n(a) / gcd(k, ord_n(a)).

Proof. Set t = ord_n(a) and d = gcd(k, t). Then we may write t = t′d and k = k′d for
some k′, t′. We have (a^k)^{t/d} ≡ a^{kt/d} ≡ a^{k′t′d} ≡ a^{k′t} ≡ (a^t)^{k′} ≡ 1 mod n, which says that
ord_n(a^k) | t/d = t′. On the other hand, (a^k)^{ord_n(a^k)} ≡ a^{k·ord_n(a^k)} ≡ 1 mod n, so this means
that t | k·ord_n(a^k). Dividing through by d, this means t′ | k′·ord_n(a^k), and since gcd(k′, t′) = 1
this means that t′ | ord_n(a^k). Combining the two divisibilities gives us what we want.
Since Euler’s theorem tells us that ordn (a) ≤ ϕ(n), a natural question is, when does
equality happen?

Definition 4.2.9. Let n > 1 be an integer and gcd(a, n) = 1. We say that a is a generator
mod n if ordn (a) = ϕ(n).

The term “generator” is used because if ord_n(a) = ϕ(n), the powers a, a^2, ..., a^{ϕ(n)} must all
be distinct residues modulo n. Since there are ϕ(n) units mod n, this means that each
unit is congruent to some power of a modulo n. Group theoretically, this says ⟨a mod n⟩ =
(Z/nZ)×. Since cyclic groups are significantly easier to work in, it will be rather useful to
understand when generators exist. The Chinese Remainder Theorem tells us that
(Z/nZ)× ≅ (Z/p_1^{e_1}Z)× × ... × (Z/p_k^{e_k}Z)× for n = p_1^{e_1} ··· p_k^{e_k}, so we’ll start by
investigating when (Z/p^e Z)× is cyclic.

4.3 Cyclicity of (Z/pZ)×


The first step in our investigation of when (Z/p^e Z)× is cyclic for prime p and e ≥ 1 is to
start with the e = 1 case. As it turns out, there is always a generator mod p. We’ll deduce
this by proving a stronger result, that the multiplicative group of any finite field is cyclic.

Proposition 26. Let F be a field and f(x) ∈ F[x] a polynomial of degree d ≥ 1. Then f(x)
has at most d distinct roots in F.

Proof. If d = 1, then f(x) = ax + b for some a, b ∈ F with a ≠ 0, and clearly the only root of f(x) in F is
−b/a. Now suppose we know the result is true for any polynomial of degree k. Let f(x) be an
arbitrary polynomial of degree k + 1. If f(x) has no root in F, then we’re done. Otherwise,
it has some root α ∈ F. By the division algorithm, we may write f(x) = (x − α)g(x) + r(x)
where r(x) = 0 or r(x) = c for some c ∈ F. Plugging in α shows that r(α) = 0, so that
r(x) is the zero polynomial. This means that f(x) = (x − α)g(x), and deg(g(x)) = k. Since F is
a field, any root of f(x) other than α must be a root of g(x). By
induction hypothesis, g(x) has at most k distinct roots in F, which means that f(x) has
at most k + 1 roots in F. By induction, we’re done.

Proposition 27. Let F be a finite field of size q. Then for any α ∈ F×, α^{q−1} = 1.

Proof. The proof is the same as the proof of Euler’s theorem: consider the map f : F× → F×
given by f(x) = αx. This is a bijection with inverse map f^{-1}(x) = α^{-1}x. Therefore,
∏_{x∈F×} αx = ∏_{x∈F×} x, which says α^{q−1} = 1 after canceling the product from both sides.

Proposition 28. Let F be a finite field of size q, and define S_d = #{α ∈ F× : ord(α) = d}.
Then if S_d > 0, we have S_d = ϕ(d).

Proof. Suppose that ord(α) = d. Then α^d = 1, so α is a root of f(x) = x^d − 1 ∈ F[x]. Since
α has order d, this means the powers 1, α, α^2, ..., α^{d−1} are all distinct elements of F, and
note that f(α^k) = (α^d)^k − 1 = 0. Since f(x) has at most d roots in F and we’ve found d
distinct roots, this must be all of them. Which roots have order d? Using the formula for
the order of a power, ord(α^k) = d if and only if gcd(d, k) = 1, and there are precisely ϕ(d)
values of k in the range 1 ≤ k ≤ d with this property. Since any element of order d is a root
of f(x), this shows that if an element of order d exists, there are exactly ϕ(d) elements of
order d.
Proposition 29. For any integer n ≥ 1, we have Σ_{d|n} ϕ(d) = n.

Proof. The trick is to compute the sum backwards: Σ_{d|n} ϕ(d) = Σ_{d|n} ϕ(n/d). By definition,
ϕ(n/d) counts the number of integers in the range 1 ≤ k ≤ n/d with gcd(k, n/d) = 1.
Now, saying that gcd(k, n/d) = 1 is the same as saying that gcd(dk, n) = d. If 1 ≤ m ≤ n
and gcd(m, n) = d, then m = dk for some 1 ≤ k ≤ n/d anyway, so this means that
ϕ(n/d) actually counts the number of integers in the range 1 ≤ m ≤ n with gcd(m, n) = d.
Now, any integer 1 ≤ m ≤ n has gcd(m, n) = d for some d | n. Therefore, the sets
A_d = {1 ≤ m ≤ n : gcd(m, n) = d} for d | n form a partition of {1, ..., n}. By our
observation, |A_d| = ϕ(n/d), so taking cardinalities yields Σ_{d|n} ϕ(n/d) = Σ_{d|n} |A_d| = n as
desired.

Theorem 4.3.1. Let F be a finite field of size q. Then F× is a cyclic group. In particular,
(Z/pZ)× always has a generator for p prime.

Proof. For any α ∈ F× we must have ord(α) | q − 1, since α^{q−1} = 1. Every element of F× has
exactly one order, so q − 1 = Σ_{d|q−1} S_d in the notation of Proposition 28. By the previous
proposition, we also have q − 1 = Σ_{d|q−1} ϕ(d). By Proposition 28, S_d ≤ ϕ(d) for all d
(there could be no elements of order d!). Therefore, q − 1 = Σ_{d|q−1} S_d ≤ Σ_{d|q−1} ϕ(d) = q − 1.
This forces equality term by term, so that S_d = ϕ(d) > 0 for all d | q − 1. In particular,
S_{q−1} = ϕ(q − 1) > 0, so an element of order q − 1 exists, as desired.
It’s worth pointing out that this argument really does use the fact that F is a field in a
non-trivial way. The crux of the argument relies on the fact that over a field, the number of
roots of a polynomial is bounded by the degree. Over a ring, this need not happen. For
example, in (Z/8Z)[x], the polynomial x^2 − 1 has 4 roots, so the argument cannot generalize.

One very useful application of our main theorem is being able to detect if elements of
finite fields are n-th powers or not.

Proposition 30. Let F be a finite field of size q and let α ∈ F×. Then the equation x^n = α
is solvable in F if and only if α^{(q−1)/d} = 1, where d = gcd(q − 1, n). When there is a solution,
there are exactly d solutions.

Proof. Let g be a generator of F×. Write α = g^ℓ for some ℓ, and let x = g^k. Then solving
x^n = α is the same as solving g^{nk} = g^ℓ in F. This is equivalent to the solvability of the
congruence nk ≡ ℓ mod q − 1, which we know is solvable if and only if gcd(n, q − 1) | ℓ,
and then has exactly d = gcd(n, q − 1) solutions mod q − 1. If d | ℓ, then
α^{(q−1)/d} = g^{(q−1)ℓ/d} = (g^{q−1})^{ℓ/d} = 1. Conversely, if α^{(q−1)/d} = 1, this means that g^{(q−1)ℓ/d} = 1,
so q − 1 | (q − 1)ℓ/d. This means d(q − 1) | (q − 1)ℓ, so that d | ℓ as desired.
Example 4.3.2. Suppose we wanted to solve x^44 ≡ 81 mod 257. First, note that 3 is a
generator mod 257. We must have ord_257(3) | 256, so we just need to rule out all lower
powers of 2. One may compute 3^5 ≡ −14 mod 257, 3^10 ≡ 196 mod 257, 3^11 ≡ 74 mod 257,
3^22 ≡ 79 mod 257, 3^44 ≡ 73 mod 257, and 3^128 ≡ 3^132/3^4 ≡ 176/81 ≡ 2 + 14/81 ≡ −1 mod 257.
Since every proper divisor of 256 divides 128, this shows that
we must have ord_257(3) = 256. Therefore, writing x ≡ 3^k, we want to solve 3^{44k} ≡ 3^4 mod 257. This is the
same as solving 44k ≡ 4 mod 256, i.e. 11k ≡ 1 mod 64. This means k ≡ 35 mod 64, so that
k ≡ 35, 99, 163, 227 mod 256. This yields the four solutions x ≡ 186, 108, 71, 149 mod 257.
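
The procedure in this example can be sketched in Python as follows. This is a hedged illustration rather than an efficient algorithm: the function name nth_roots_mod_prime is ours, the discrete logarithm is found by brute force, and the generator g is assumed to be supplied by the caller.

```python
from math import gcd

def nth_roots_mod_prime(n, a, p, g):
    """All x with x^n ≡ a (mod p), for a prime p not dividing a and a generator g mod p."""
    d = gcd(n, p - 1)
    # Solvability criterion: a^((p-1)/d) ≡ 1 (mod p).
    if pow(a, (p - 1) // d, p) != 1:
        return []
    # Brute-force discrete logarithm: log_a with g^log_a ≡ a (mod p).
    log_a = next(l for l in range(p - 1) if pow(g, l, p) == a % p)
    # Solve n*k ≡ log_a (mod p-1): divide through by d, then invert n/d mod (p-1)/d.
    m = (p - 1) // d
    k0 = (log_a // d) * pow(n // d, -1, m) % m
    return sorted(pow(g, k0 + j * m, p) for j in range(d))

print(nth_roots_mod_prime(44, 81, 257, 3))  # [71, 108, 149, 186]
```

Since 44 is even, the solutions come in pairs ±x: here 186 ≡ −71 and 149 ≡ −108 mod 257.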

4.4 Hensel’s lemma


If the Chinese Remainder Theorem is in the running for the most important theorem in
number theory, then Hensel’s lemma might be a contender for the second most important
theorem. Hensel’s lemma is a “lifting” theorem: it says that if you understand what’s
happening in Z/pZ for prime p, then you can often times “lift” up to understand what’s
happening in Z/pe Z for e ≥ 1.
Theorem 4.4.1 (Hensel’s lemma). Let p be a prime. Suppose that f(x) ∈ Z[x] and that
f(c) ≡ 0 mod p and f′(c) ≢ 0 mod p for some c ∈ Z. Then for any k ≥ 1, there is c_k ∈ Z
such that c_k ≡ c mod p with f(c_k) ≡ 0 mod p^k, and c_k is unique modulo p^k.
Proof. We will explicitly construct a root of f(x) mod p^k for all k ≥ 1. By assumption,
we may take c_1 = c, and certainly there is a unique root of f(x) mod p with this property.
Now, suppose that we have constructed c_k with the desired properties. We wish to construct
c_{k+1}. Consider c_k + tp^k for t ∈ Z. We’ll show that a choice of t gives the desired properties.
We have f(c_k + tp^k) = f(c_k) + tp^k f′(c_k) + (1/2) t^2 p^{2k} f″(c_k) + ... by Taylor expanding
f(x) around c_k. Working mod p^{k+1}, we see that f(c_k + tp^k) ≡ f(c_k) + tp^k f′(c_k) mod p^{k+1},
because all higher terms are divisible by p^{2k}, a larger power of p. We wish to find t such
that 0 ≡ f(c_k) + tp^k f′(c_k) mod p^{k+1}. By induction hypothesis, f(c_k) ≡ 0 mod p^k, so dividing
through by p^k, we want to solve 0 ≡ f(c_k)/p^k + t f′(c_k) mod p. Since c_k ≡ c mod p and
f′(c) ≢ 0 mod p by assumption, we can indeed solve for t: t ≡ −(f(c_k)/p^k) · f′(c)^{-1} mod p. We
then set c_{k+1} = c_k + tp^k, from which f(c_{k+1}) ≡ 0 mod p^{k+1} by construction, and clearly
c_{k+1} ≡ c mod p. By induction, this shows there is a root mod p^k for all k ≥ 1.

This shows existence, so now we need to prove uniqueness: if r, s ∈ Z satisfy
r ≡ s ≡ c mod p and f(r) ≡ f(s) ≡ 0 mod p^k, then r ≡ s mod p^k. Once again, we proceed
by induction on k. For k = 1, this is clear. Now suppose the claim holds for k, and suppose
we have two roots r, s such that f(r) ≡ f(s) ≡ 0 mod p^{k+1} that both satisfy r ≡ s ≡ c mod p.
We need to show that r ≡ s mod p^{k+1}. We have f(r) ≡ f(s) ≡ 0 mod p^k as
well, so by induction hypothesis, r ≡ s mod p^k. Then we can write s = r + tp^k for some
t. Plugging in and Taylor expanding at r, we have f(s) ≡ f(r) + tp^k f′(r) mod p^{k+1},
and since f(r) ≡ f(s) ≡ 0 mod p^{k+1}, this means that 0 ≡ tp^k f′(r) mod p^{k+1}, i.e.
0 ≡ tf′(r) mod p. Since
f′(r) ≡ f′(c) ≢ 0 mod p, this means t ≡ 0 mod p, so that r ≡ s mod p^{k+1} as desired.
Therefore by induction, we’re done.
The condition that f′(c) ≢ 0 mod p means that c is not a repeated root of f(x) in F_p.
Therefore, what Hensel’s lemma says is that simple roots of polynomials mod p uniquely lift
up to roots mod p^k for any k ≥ 1.
Example 4.4.2. Let f(x) = x^2 − 2 ∈ Z[x]. Note that f(3) ≡ 0 mod 7 and f′(3) ≡ 6 ≢
0 mod 7. By Hensel’s lemma, this means for any k ≥ 1 we can find a unique lift of 3 mod 7
to a root of f(x) mod 7^k. The proof of Hensel’s lemma explicitly tells us how to do this!

To find a root mod 49, our lift is of the form c_2 = 3 + 7t for some t. Therefore, we need to
solve f(3)/7 + tf′(3) ≡ 0 mod 7. Since f(3)/7 = 1 and f′(3) ≡ −1 mod 7, this means we need
1 − t ≡ 0 mod 7, so t ≡ 1 mod 7. Thus, 10 mod 49 is the unique lift of 3 mod 7 to a root of
f(x) mod 49.

We can keep repeating this process to lift up roots further. For example, to compute
the lift of the root to a root mod 343, we have c_3 = 10 + 49t for some t. To compute t,
we solve f(10)/49 + tf′(10) ≡ 0 mod 7. Since f(10)/49 = 2 and f′(10) ≡ f′(3) ≡ −1 mod 7, this
means we solve 2 − t ≡ 0 mod 7, so that t ≡ 2 mod 7. Then 108 mod 343 is the unique lift of
3 mod 7 to a root of f(x) mod 343.
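
The lifting step in this example can be written as a short loop. The sketch below is one possible way to organize it: hensel_lift is our name, and f and its derivative are assumed to be passed in as ordinary Python functions on integers.

```python
def hensel_lift(f, df, c, p, k):
    """Lift a simple root c of f mod p to the unique root mod p^k congruent to c.

    f and df are callables evaluating f and its derivative at integers.
    """
    if f(c) % p != 0 or df(c) % p == 0:
        raise ValueError("c must be a simple root of f mod p")
    inv = pow(df(c), -1, p)        # f'(c)^{-1} mod p, reused at every step
    root, modulus = c % p, p       # invariant: f(root) ≡ 0 (mod modulus)
    for _ in range(k - 1):
        # Solve f(root)/modulus + t * f'(c) ≡ 0 (mod p) for t.
        t = (-(f(root) // modulus) * inv) % p
        root += t * modulus
        modulus *= p
    return root

f = lambda x: x * x - 2
df = lambda x: 2 * x
print(hensel_lift(f, df, 3, 7, 2))  # 10
print(hensel_lift(f, df, 3, 7, 3))  # 108
```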
By combining Hensel’s lemma with the Chinese Remainder Theorem, we get a general
process for trying to solve polynomial congruences f (x) ≡ 0 mod n for f (x) ∈ Z[x].

• Write n = p_1^{e_1} ··· p_k^{e_k} as its prime factorization.

• If applicable, try to lift roots of f(x) mod p_i to roots of f(x) mod p_i^{e_i} using Hensel’s
lemma.
• Use the Chinese Remainder Theorem to glue the roots mod p_i^{e_i} to a root mod n.

That this last step works follows from the next proposition:
Proposition 31. Suppose that f (x) ∈ Z[x]. Let m, n > 1 with gcd(m, n) = 1. If f (r) ≡
0 mod m and f (s) ≡ 0 mod n then f (c) ≡ 0 mod mn, where c ≡ r mod m and c ≡ s mod n.
Proof. Exercise.
Example 4.4.3. Suppose we wanted to find all solutions to the congruence x^3 − 6 ≡ 0 mod
245. We note that 245 = 5 · 7^2, so solving x^3 ≡ 6 mod 245 is equivalent to solving the system

x^3 ≡ 1 mod 5
x^3 ≡ 6 mod 49

There is a unique solution to the first congruence, namely, x ≡ 1 mod 5. For the second
congruence, note that there are three solutions to x^3 ≡ 6 mod 7, given by x ≡ 3, 5, 6 mod 7.
Since none of these are repeated roots of f(x) = x^3 − 6 mod 7, by Hensel’s lemma, they all uniquely
lift to roots mod 49. One may run through the lifting computation and verify that these lifts
are x ≡ 24, 34, 40 mod 49. Since any root of f(x) mod 49 is a lift of some root of f(x) mod 7,
these are all the roots of f(x) mod 49. We then get three systems of congruences

x ≡ 1 mod 5, x ≡ 24 mod 49
x ≡ 1 mod 5, x ≡ 34 mod 49
x ≡ 1 mod 5, x ≡ 40 mod 49

which glue together to the roots x ≡ 171, 181, 236 mod 245 by the Chinese Remainder
Theorem. Since any root of f(x) mod 245 corresponds to a pair of roots mod 5 and 49, these
are all roots of f(x) mod 245.
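
The gluing step can be checked with a short script. To keep the sketch self-contained, the roots mod 5 and mod 49 are found here by exhaustive search rather than by Hensel lifting; roots_mod and glue are our own helper names.

```python
from itertools import product

def roots_mod(f, n):
    """All roots of f mod n by exhaustive search (small n only)."""
    return [x for x in range(n) if f(x) % n == 0]

def glue(r, m, s, n):
    """The class x mod m*n with x ≡ r (mod m) and x ≡ s (mod n), for coprime m, n."""
    k = (pow(m, -1, n) * (s - r)) % n
    return (r + m * k) % (m * n)

f = lambda x: x**3 - 6
roots_5 = roots_mod(f, 5)
roots_49 = roots_mod(f, 49)
glued = sorted(glue(r, 5, s, 49) for r, s in product(roots_5, roots_49))

print(roots_5)   # [1]
print(roots_49)  # [24, 34, 40]
print(glued)     # [171, 181, 236]
```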

What happens when the condition f′(c) ≢ 0 mod p is not met? Suppose that c is a root of
f(x) mod p^k with f′(c) ≡ 0 mod p, and let r = c + tp^k be a
lift of c mod p^k to a congruence class mod p^{k+1}. Then we have f(r) ≡ f(c) + tp^k f′(c) ≡
f(c) mod p^{k+1}. This means that either all lifts are roots mod p^{k+1}, if f(c) ≡ 0 mod p^{k+1}, or no
lifts are roots mod p^{k+1}, if f(c) ≢ 0 mod p^{k+1}.

Example 4.4.4. Consider f(x) = x^2 + 1 ∈ Z[x]. Then f(1) ≡ 0 mod 2 and f′(1) ≡ 0 mod 2,
so Hensel’s lemma doesn’t apply. The lifts of 1 mod 2 to a congruence class mod 4 are
1, 3 mod 4, and neither of these is a root because f(3) ≡ f(1) ≡ 2 ≢ 0 mod 4.

Hensel’s lemma can also be used to compute inverses mod p^k!

Example 4.4.5. Suppose we wanted to compute 5^{-1} mod 343. This is equivalent to finding
the root of f(x) = 5x − 1 mod 343. Note that f(3) ≡ 0 mod 7 and f′(3) ≡ 5 mod 7, so we
can lift 3 mod 7 to a root mod 343. One may verify that 10 mod 49 is the lift to a root mod
49, and that 206 mod 343 is the lift to a root mod 343. This means that 5^{-1} ≡ 206 mod 343.

Since Hensel’s lemma can be used to compute inverses mod p^k, one may then combine
Hensel’s lemma with the Chinese Remainder Theorem to compute inverses mod n for any
n. However, this is, in general, going to be slower than just running the Euclidean Algorithm.

The astute reader may have noticed the following recursive formula for computing lifts
of roots:

Proposition 32. Suppose that f(x) ∈ Z[x] satisfies the conditions of Hensel’s lemma, and
that c_k is a root of f(x) mod p^k for k ≥ 1 with c_k ≡ c mod p. Then c_{k+1} ≡ c_k − f(c_k)·v mod p^{k+1} is the
unique lift of c_k mod p^k to a root of f(x) mod p^{k+1}, where v is any integer with v ≡ f′(c)^{-1} mod p.
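
As an illustration of this recursion, here is a sketch that recomputes 5^{-1} mod 343 as a root of f(x) = 5x − 1. The function name lift_root and the variable v (an integer inverse of f′(c) mod p) are our own labels.

```python
def lift_root(f, c, v, p, k):
    """Iterate c_{j+1} = c_j - f(c_j) * v (mod p^(j+1)), starting from c_1 = c.

    v is assumed to be an integer with v * f'(c) ≡ 1 (mod p).
    """
    root, modulus = c % p, p
    for _ in range(k - 1):
        modulus *= p
        root = (root - f(root) * v) % modulus
    return root

p = 7
f = lambda x: 5 * x - 1
v = pow(5, -1, p)                    # f'(x) = 5, so v ≡ 5^{-1} ≡ 3 (mod 7)
inverse_of_5 = lift_root(f, 3, v, p, 3)
print(inverse_of_5)                  # 206
print(5 * inverse_of_5 % 343)        # 1
```

Note that v is computed once mod p and reused at every step, unlike the real-valued Newton iteration, which re-evaluates the derivative each time.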

This recursion looks suspiciously similar to Newton’s method from calculus, which says
that the successive iterates x_{n+1} = x_n − f(x_n)/f′(x_n) provide good approximations to a root of f(x).
This is not coincidental!

For each prime p and k ≥ 1, there is a natural ring homomorphism π_{k+1} : Z/p^{k+1}Z →
Z/p^kZ given by x mod p^{k+1} ↦ x mod p^k. This produces an infinite sequence of maps

... → Z/p^kZ --π_k--> Z/p^{k−1}Z → ... → Z/p^2Z --π_2--> Z/pZ

There is a formal algebraic construction called the inverse limit. One may form the inverse
limit Z_p = lim← Z/p^kZ with respect to these projection maps, which is called the ring of p-adic
integers.

Algebraically, one may think of Z_p as an “infinite” version of Z/p^kZ. By the construction
of the inverse limit, there are projection maps Z_p → Z/p^kZ for any k ≥ 1. Elements of
Z_p look like “power series in p”: for α ∈ Z_p one can write α = c_0 + c_1 p + c_2 p^2 + ... where
c_i ∈ {0, ..., p − 1} (one has to make sense of what it means for this infinite sum to converge
in Z_p, but we won’t get into this).

Hensel’s lemma is then a p-adic version of Newton’s method! At each step, it produces a
root r_k of f(x) mod p^k, which looks like something of the form r_k = r_0 + r_1 p + ... + r_{k−1} p^{k−1},
coming from lifting an initial root r_0 mod p. Taking a limit as k → ∞ will produce a root
α ∈ Z_p of f(x). The k-th iterate r_k is a p-adic approximation of α that agrees mod p^k.

4.5 Cyclicity of (Z/nZ)×


We’ve seen that (Z/pZ)× for p prime is a cyclic group. Using the lifting philosophy of Hensel,
we’re going to show that for odd p, a generator mod p can be lifted to a generator mod p^k for all k ≥ 1,
so that (Z/p^kZ)× is cyclic. Once we’ve proven this, we’ll be ready to tackle the general case
of when (Z/nZ)× is cyclic.

Proposition 33. Suppose that ord_n(a) = m and ord_n(b) = ℓ. If gcd(m, ℓ) = 1 then
ord_n(ab) = mℓ.

Proof. Set t = ord_n(ab). We see that (ab)^{mℓ} = (a^m)^ℓ (b^ℓ)^m ≡ 1 mod n, so that t | mℓ. On the
other hand, we have (ab)^t ≡ 1 mod n, so raising to the m-th power yields b^{tm} ≡ 1 mod n.
Similarly, a^{tℓ} ≡ 1 mod n. This means that ℓ | tm and since gcd(ℓ, m) = 1 this means that
ℓ | t. Similarly, m | t, and once again because gcd(ℓ, m) = 1 we conclude that ℓm | t, so that
t = mℓ.
The proof really requires that gcd(m, ℓ) = 1 (it was used twice!). Without this condition,
it’s not even true that ord_n(ab) = lcm(ord_n(a), ord_n(b)) like one might expect!

Proposition 34. There is a generator mod p^2 for odd prime p.

Proof. Let g be a generator mod p, and let h = g + cp be any lift of g to a class mod p^2.
The order of h divides ϕ(p^2) = p(p − 1). Moreover, for any d with h^d ≡ 1 mod p^2, we must have
g^d ≡ 1 mod p, so that p − 1 | d; in particular, ord_{p^2}(h) is divisible by p − 1. Hence
ord_{p^2}(h) = p − 1 or p(p − 1), and it equals p − 1 exactly when h is a root of
f(x) = x^{p−1} − 1 ∈ Z[x] mod p^2. Now g is a root of f(x) mod p, and since
f′(g) ≡ −g^{p−2} ≢ 0 mod p, by Hensel’s lemma, there is a unique lift of g to a root
of f(x) mod p^2. This means that all other lifts of g
must have order p(p − 1), so they are generators mod p^2 and we’re done.

In particular, this means that for each generator g mod p, there are p − 1 lifts of g that are
generators mod p^2. Since there are ϕ(p − 1) generators mod p, there are (p − 1)ϕ(p − 1) =
ϕ(ϕ(p^2)) generators mod p^2 (note that a generator mod p^2 must also be a generator mod
p!).

Proposition 35. If g is a generator mod p^2 for odd prime p then g is a generator mod p^k
for all k ≥ 2.

Proof. The case k = 2 is the hypothesis. Suppose that g is a generator mod p^k for some k ≥ 2, and set t =
ord_{p^{k+1}}(g). Since g^t ≡ 1 mod p^k, we have ϕ(p^k) = p^{k−1}(p − 1) | t, and t | ϕ(p^{k+1}) = p^k(p − 1)
by Euler’s theorem. This means t = p^k(p − 1) or t = p^{k−1}(p − 1). By Fermat’s little theorem,
g^{p−1} ≡ 1 mod p, so we may write g^{p−1} = 1 + pℓ_1 for some ℓ_1 with p ∤ ℓ_1 (because g is a generator
mod p^2, and so g^{p−1} ≢ 1 mod p^2). Then g^{p(p−1)} = (g^{p−1})^p = (1 + pℓ_1)^p ≡ 1 + ℓ_1 p^2 mod p^3
because p is odd and the binomial coefficients C(p, j) are divisible by p for j = 1, ..., p − 1. Therefore,
we may write g^{p(p−1)} = 1 + ℓ_2 p^2 with p ∤ ℓ_2. Taking a p-th power once more, we find
g^{p^2(p−1)} ≡ 1 + ℓ_2 p^3 mod p^4. Repeating this process, we eventually find that g^{p^{k−1}(p−1)} ≡
1 + ℓ_{k−1} p^k mod p^{k+1} with p ∤ ℓ_{k−1}. In particular, g^{p^{k−1}(p−1)} ≢ 1 mod p^{k+1}, so
t = p^k(p − 1) must be the only
possibility, so that g is a generator mod p^{k+1}. By induction, we’re done.
Note that the proofs above explicitly tell us how to find a generator mod p^k for all k ≥ 2.
Let g be a generator mod p. Since
g^p ≡ g mod p, g^p mod p^2 is a lift of g mod p. Then note that (g^p)^{p−1} ≡ 1 mod p^2 by Euler’s
theorem, so g^p is the “bad” lift of g that cannot be a generator mod p^2. This means that any
other lift of g mod p that’s not congruent to g^p mod p^2 is a generator mod p^2! In particular,
note that g^{p−1}(g + p) ≡ g mod p, and g^{p−1}(g + p) ≢ g^p mod p^2, so that g^{p−1}(g + p) is always
a generator mod p^2.

Example 4.5.1. 3 is a generator mod 7, so 3^6 · 10 ≡ 38 mod 49 is a generator mod 7^k for
all k ≥ 2.

We now have to handle the case of p = 2, for which Hensel’s lemma obviously does not
apply, because x^{p−1} − 1 = x − 1 always has a single root!
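
The recipe g^{p−1}(g + p) is easy to test numerically. In the sketch below, order is our own helper that finds the multiplicative order by repeated multiplication, which is only practical for small moduli.

```python
def order(a, n):
    """Multiplicative order of a mod n by repeated multiplication (assumes gcd(a, n) = 1)."""
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

p, g = 7, 3
good_lift = pow(g, p - 1, p * p) * (g + p) % (p * p)   # g^(p-1) * (g + p) mod p^2
bad_lift = pow(g, p, p * p)                            # g^p mod p^2

print(good_lift, order(good_lift, 49), order(good_lift, 343))  # 38 42 294
print(bad_lift, order(bad_lift, 49))                           # 31 6
```

Here 42 = ϕ(49) and 294 = ϕ(343), while the lift g^p ≡ 31 only has order p − 1 = 6 mod 49.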

Proposition 36. There is a generator mod 2^k if and only if k = 1, 2.

Proof. That there is a generator mod 2 and a generator mod 4 is obvious:
1 mod 2 and 3 mod 4 are generators, respectively. Therefore, suppose that k ≥ 3 and g is a generator
mod 2^k. Then the set of powers {1, g, ..., g^{2^{k−1}−1}} must hit all 2^{k−1} units mod 2^k. Therefore,
there is ℓ with 1 ≤ ℓ < 2^{k−1} such that g^ℓ ≡ −1 mod 2^k. Squaring, g^{2ℓ} ≡ 1 mod 2^k so this means 2^{k−1} | 2ℓ
so that 2^{k−2} | ℓ, which forces ℓ = 2^{k−2} since 1 ≤ ℓ < 2^{k−1}. On the other hand, note that
g^2 ≡ 1 mod 8 since all units square to 1 mod 8. This means that g^4 ≡ 1 mod 16 using the
Binomial theorem, and so repeatedly squaring, we arrive at g^{2^{k−2}} ≡ 1 mod 2^k. However, this
means that 1 ≡ −1 mod 2^k, which is a contradiction. Therefore, there are no generators mod
2^k for k ≥ 3.
We need one more property of orders before we’re ready to tackle our goal.

Proposition 37. Suppose that gcd(m, n) = 1 and gcd(a, mn) = 1. Then ord_{mn}(a) = lcm(ord_m(a), ord_n(a)).

Proof. Let t = ord_{mn}(a) and k = lcm(ord_m(a), ord_n(a)). Then a^t ≡ 1 mod mn, so a^t ≡
1 mod m and a^t ≡ 1 mod n, so this means that ord_m(a) | t and ord_n(a) | t so k | t. On the
other hand, a^k ≡ 1 mod m and a^k ≡ 1 mod n, so in particular since m, n are relatively prime
this means that a^k ≡ 1 mod mn, so that t | k. This proves t = k as we wanted.
Theorem 4.5.2. There is a generator mod n if and only if n = 2, 4, p^k, 2p^k for p an odd
prime and k ≥ 1.
Proof. We’ve already shown the result for n = 2, 4, p^k, so it remains to see that there is a
generator mod 2p^k. To see this, suppose that g is a generator mod p^k. If g is odd, then g has
order 1 mod 2, so applying the previous proposition says that ord_{2p^k}(g) = ϕ(p^k) = ϕ(2p^k),
so that g is a generator. If g is even, then g + p^k is odd, and ord_{p^k}(g + p^k) = ord_{p^k}(g), so
g + p^k is a generator mod 2p^k.

Conversely, write n = 2^e p_1^{e_1} ··· p_k^{e_k} as a product of primes where the p_i are odd. Then
ord_n(a) = lcm(ord_{2^e}(a), ord_{p_1^{e_1}}(a), ..., ord_{p_k^{e_k}}(a)). If e ≥ 3, then ord_{2^e}(a) ≤ 2^{e−2}, which means that
ord_n(a) ≤ 2^{e−2} ϕ(p_1^{e_1} ··· p_k^{e_k}) < ϕ(n). If k ≥ 2, then ϕ(p_1^{e_1}) and ϕ(p_2^{e_2}) are both even,
and so this means that lcm(ord_{p_1^{e_1}}(a), ord_{p_2^{e_2}}(a)) ≤ ϕ(p_1^{e_1})ϕ(p_2^{e_2})/2 < ϕ(p_1^{e_1} p_2^{e_2}), which once again results in
ord_n(a) < ϕ(n). So it remains to rule out n = 4p^k, but the same argument works because
both ϕ(4) and ϕ(p^k) are even. This leaves the only possibilities as n = 2, 4, p^k, 2p^k.
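
The classification can be sanity-checked by brute force for small n. This is only an experiment under our own helper names (order, has_generator), not part of the proof.

```python
from math import gcd

def order(a, n):
    """Multiplicative order of a mod n by repeated multiplication (assumes gcd(a, n) = 1)."""
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

def has_generator(n):
    """True if some unit mod n has order equal to the number of units mod n."""
    units = [a for a in range(1, n) if gcd(a, n) == 1]
    return any(order(a, n) == len(units) for a in units)

print([n for n in range(2, 31) if has_generator(n)])
# [2, 3, 4, 5, 6, 7, 9, 10, 11, 13, 14, 17, 18, 19, 22, 23, 25, 26, 27, 29]
```

The moduli missing from the list (8, 12, 15, 16, 20, 21, 24, 28, 30) are exactly those not of the form 2, 4, p^k, or 2p^k.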
Earlier, we gave a criterion for when an element in a finite field is an n-th power. Since we
now know when there is a generator mod n, we can give a similar criterion for such values.
Proposition 38. Suppose that there is a generator mod m, and gcd(a, m) = 1. Then
x^n ≡ a mod m is solvable if and only if a^{ϕ(m)/d} ≡ 1 mod m, where d = gcd(n, ϕ(m)).
The proof is identical to the finite field case, and so we omit it. A particular test of
interest will be the following special case:
Corollary 4.5.3. Let p be an odd prime and p ∤ a. Then x^2 ≡ a mod p is solvable if and only if
a^{(p−1)/2} ≡ 1 mod p.

4.6 Application: Cryptography


As an application of all this theory, we’ll give a short discussion about cryptography.
Definition 4.6.1. Let $g$ be a generator mod $n$. For any $a$ with $\gcd(a, n) = 1$, the base $g$
logarithm of $a$, $\log_g(a)$, is the congruence class $x$ mod $\varphi(n)$ that solves $g^x \equiv a \bmod n$.
This is, of course, similar to how logarithms in $\mathbb{R}$ are defined: $\log_b(a)$ is the real number $x$
such that $b^x = a$ in $\mathbb{R}$. Here the same thing happens, except because exponents are only
unique modulo $\mathrm{ord}_n(g) = \varphi(n)$, the logarithm is a congruence class, and not an integer. We
often refer to $\log_g(a)$ as a “discrete logarithm”.
Example 4.6.2. 3 is a generator mod 7, and $3^4 \equiv 4 \bmod 7$, so $\log_3(4) \equiv 4 \bmod 6$.
Discrete logarithms have all the properties that you would expect.

Proposition 39. Let $g$ be a generator mod $n$. Then for any integers $a, b$ coprime to $n$ and any integer $k \ge 0$ we have:
1. $\log_g(1) \equiv 0 \bmod \varphi(n)$.

2. $\log_g(ab) \equiv \log_g(a) + \log_g(b) \bmod \varphi(n)$.

3. $\log_g(a^k) \equiv k \log_g(a) \bmod \varphi(n)$.

Proof. These all follow rather easily. The first is obvious, so we just prove the second. The
third then follows similarly. By definition, $g^{\log_g(ab)} \equiv ab \bmod n$, and $a \equiv g^{\log_g(a)} \bmod n$ and
$b \equiv g^{\log_g(b)} \bmod n$, so this means $g^{\log_g(ab)} \equiv g^{\log_g(a) + \log_g(b)} \bmod n$. Since $g$ has order $\varphi(n)$, this immediately tells
us that $\log_g(ab) \equiv \log_g(a) + \log_g(b) \bmod \varphi(n)$.
Just as logarithms are useful for solving equations, so are discrete logarithms.
Example 4.6.3. Suppose we wanted to solve the congruence $6x^{12} \equiv 11 \bmod 17$. One may
check that 3 is a generator mod 17, so taking logarithms of both sides, $\log_3(6) + 12\log_3(x) \equiv \log_3(11) \bmod 16$.
Since $\log_3(6) \equiv 1 + \log_3(2) \bmod 16$, this means $1 + 12\log_3(x) \equiv \log_3(\frac{11}{2}) \equiv \log_3(14) \bmod 16$,
because $\frac{11}{2} \equiv 14 \bmod 17$. Note that $14 \equiv -3 \bmod 17$, so $\log_3(14) \equiv \log_3(-3) \equiv \log_3(-1) + 1 \equiv 9 \bmod 16$,
as $\log_3(-1) \equiv 8 \bmod 16$ because 3 is a generator mod 17.
Therefore, $12\log_3(x) \equiv 8 \bmod 16$, and so this tells us that $3\log_3(x) \equiv 2 \bmod 4$. Solving says
$\log_3(x) \equiv 2, 6, 10, 14 \bmod 16$, so that $x \equiv 9, 15, 8, 2 \bmod 17$.
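The computation in Example 4.6.3 can be replayed with a small table of discrete logarithms. The sketch below is only an illustration: the function name `discrete_log_table` is made up for this example, and building the full table is only reasonable for small primes.

```python
def discrete_log_table(g, p):
    """Map each unit a mod p to the exponent k in [0, p-2] with g^k ≡ a (mod p).

    Assumes g is a generator mod the prime p.
    """
    table, x = {}, 1
    for k in range(p - 1):
        table[x] = k
        x = (x * g) % p
    return table

p, g = 17, 3
log3 = discrete_log_table(g, p)

# 6 x^12 ≡ 11 (mod 17) becomes 12 * log3(x) ≡ log3(11) - log3(6) (mod 16).
rhs = (log3[11] - log3[6]) % (p - 1)
exponents = [L for L in range(p - 1) if (12 * L - rhs) % (p - 1) == 0]
solutions = sorted(pow(g, L, p) for L in exponents)

# Independent check by trying every unit directly.
brute_force = [x for x in range(1, p) if (6 * pow(x, 12, p)) % p == 11]
print(exponents, solutions)
```

The list `exponents` holds the possible values of $\log_3(x)$ mod 16, and `solutions` holds the corresponding $x$ mod 17.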
Classical algorithms in cryptography are based on “hard” problems in number theory. In
particular, we’ll discuss the following two algorithms, each based on the following ideas:

• RSA: Taking $k$-th roots mod $n$ is hard!

• ElGamal: Computing $\log_g(a)$ is hard!

It’s worth pointing out that discrete logarithms make sense in any cyclic group, since the
definition only depends on the existence of a generator. Classically, ElGamal was done in
$(\mathbb{Z}/p\mathbb{Z})^\times$, although other common choices are $F^\times$ for $F$ a finite field of size $q$, or the
group of rational points on certain elliptic curves. It’s more difficult to do computations in
these settings, so it increases the security of the cryptosystem.

We’ll start with RSA. We have two participants, Alice and Bob, who wish to secretly
communicate with each other over a public channel. Each participant has a public key,
and a private key. The data of each of these is as follows:

• The public key is a pair $(e, m)$ of integers such that $\gcd(e, \varphi(m)) = 1$.

• The private key is an integer $d$ such that $ed \equiv 1 \bmod \varphi(m)$.

The idea behind RSA is quite straightforward. Let the public keys of Alice and Bob be
$(e_A, m_A)$ and $(e_B, m_B)$ respectively, with private keys $d_A$ and $d_B$. Alice takes the message
she wants to send, and converts it via some shared scheme between Alice and Bob into an
integer $M$ (by, say, associating letters to numbers via a cipher). Alice takes her message $M$,
and encrypts it by computing $E \equiv M^{e_B} \bmod m_B$. Bob can then decrypt the message by taking
a $d_B$-th power, as $(M^{e_B})^{d_B} \equiv M^{e_B d_B} \equiv M \bmod m_B$ because $e_B d_B \equiv 1 \bmod \varphi(m_B)$. Where
does the security come from? Suppose that Eve the spy wants to listen in on their conversation.
Eve always knows $(e_A, m_A)$ and $(e_B, m_B)$. If she intercepts the encrypted message $E$,
she needs to either solve $e_A x \equiv 1 \bmod \varphi(m_A)$ or $e_B x \equiv 1 \bmod \varphi(m_B)$, depending on whether the
message is being sent to Alice or Bob. The point is that doing so generally requires computing
$\varphi(m_i)$, and this is not really feasible without knowing the factorization of the modulus!

Traditionally, one takes $m = pq$ for distinct primes $p, q$. Then $\varphi(m) = (p-1)(q-1)$. Note
that $x^2 - (m + 1 - \varphi(m))x + m = (x - p)(x - q)$, so knowing both $m$ and $\varphi(m)$ is equivalent
to knowing $p, q$, and therefore how $m$ factors. The security of RSA, then, relies on the fact
that there’s no known efficient algorithm for factoring integers on a classical computer. With modern computational power,
factoring the moduli used in practice is not yet possible. However, with quantum computers, there is an efficient factoring
algorithm known as Shor’s algorithm. The world is still quite a ways away from RSA no
longer being actually secure, but the existence of this algorithm might make it a less attractive
choice of cryptosystem. We give a numerical example to illustrate how RSA works.
Example 4.6.4. Suppose that you are Alice, and Bob’s public key is $(3, 2823907)$. You wish
to send the message “MOM” to Bob. Let A = 01, . . . , Z = 26 and let a space correspond to
27. Using this rule, the message “MOM” becomes the integer 131513. We encrypt the message
by computing $131513^3 \equiv 1842379 \bmod 2823907$, and we send Bob the integer 1842379.
Bob’s private key is 1880251 (verify this using WolframAlpha!). Bob can then decrypt the
received message by computing $1842379^{1880251} \equiv 131513 \bmod 2823907$, and translating back
into text using the predetermined association of letters and numbers.
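The numbers in Example 4.6.4 can be reproduced with Python's built-in modular exponentiation. This is a sketch under stated assumptions: the factorization $2823907 = 1223 \cdot 2309$ is not given in the text (the code checks it), the names `encode` and `decode` are made up for the letter scheme A = 01, . . . , Z = 26, space = 27, and `pow(e, -1, phi)` requires Python 3.8 or later.

```python
def encode(text):
    """Turn uppercase text into an integer using A=01, ..., Z=26, space=27."""
    return int("".join(f"{27 if ch == ' ' else ord(ch) - 64:02d}" for ch in text))

def decode(number):
    """Invert encode; a leading zero is restored if the digit count is odd."""
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return "".join(" " if pair == "27" else chr(int(pair) + 64) for pair in pairs)

p, q = 1223, 2309            # assumed factorization of Bob's modulus (checked below)
m = p * q
assert m == 2823907
phi = (p - 1) * (q - 1)

e = 3                        # Bob's public exponent
d = pow(e, -1, phi)          # Bob's private key: inverse of e mod phi(m)

M = encode("MOM")            # 131513
E = pow(M, e, m)             # what Alice sends
recovered = pow(E, d, m)     # what Bob computes
print(d, E, decode(recovered))
```

Note that only Bob can carry out the last step efficiently, since computing `d` used the factorization of `m`.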
Some remarks about RSA:
• If the message is long, it needs to be broken up into chunks before it can be sent over.
For example, if the modulus in Bob’s public key is 7 digits long, then using the scheme
in the above example, Alice can only send over messages consisting of at most three
letters at a time. This is because any message of at least four letters will correspond
to an 8 digit integer, which would be reduced mod the modulus, losing information.
• If the modulus was, say, 12964553, then the message can have at most 4 letters, as long
as the starting letter is not after “K”. For example, a four letter message starting
with “M” would not be able to be sent, because we would get an 8 digit number larger than the modulus that gets
reduced mod 12964553.
Example 4.6.5. Assume that Bob has the same public key as in the previous example. If
Bob receives the block message (1294545, 1214153) from Alice, this decodes to 11209, 2205.
Since any message needs to have an even number of digits, we need to insert a leading 0 into
the first block, leaving Bob with 011209, 2205. This decodes to the message “ALIVE”.

Next, we take a look at the ElGamal cryptosystem. Once again, we assume a similar
set up: Alice and Bob wish to secretly communicate over a public channel. Once again, the
system consists of a public key and a private key. The data is as follows:
• The public key is a triple $(p, g, h)$ such that $p$ is a prime, $g$ is a generator mod $p$, and
$h \equiv g^a \bmod p$.
• The private key is any integer $a$.
Suppose that the public keys of Alice and Bob are $(p, g, h)$ and $(p, g, h')$ respectively, with private keys $a$ and $b$, so that $h \equiv g^a \bmod p$ and $h' \equiv g^b \bmod p$. Alice
and Bob share a secret! Alice can compute $(h')^a \equiv (g^b)^a \equiv g^{ab} \bmod p$ and Bob can compute
$h^b \equiv (g^a)^b \equiv g^{ab} \bmod p$. We set $s \equiv g^{ab} \bmod p$. As before, Alice converts her message
into an integer $M$, and she can encrypt it by sending $Ms \bmod p$ over to Bob. Since Bob
knows $s \bmod p$, Bob can compute its inverse, and decrypt the message by computing
$(Ms) \cdot s^{-1} \equiv M \bmod p$. Why is this secure? If Eve the spy tries to listen in to their
conversation, Eve will know $g, g^a, g^b$. In order to decrypt the message that’s getting sent,
she’d need to know how to compute $g^{ab}$ using only these quantities. Currently, there is no
known efficient algorithm for doing so!
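A toy run of the scheme just described can be sketched as follows. The specific values are assumptions chosen for illustration only: the prime 17 with generator 3 comes from Example 4.6.3, while the private keys `a`, `b` and the message `M` are arbitrary, and a prime this small offers no security.

```python
p, g = 17, 3                 # public prime and generator (3 generates the units mod 17)
a, b = 5, 11                 # hypothetical private keys of Alice and Bob

h_alice = pow(g, a, p)       # Alice publishes (p, g, h_alice)
h_bob = pow(g, b, p)         # Bob publishes (p, g, h_bob)

# Each side computes the shared secret s = g^(ab) mod p from the other's public key.
s_alice = pow(h_bob, a, p)
s_bob = pow(h_alice, b, p)

M = 7                        # message, already converted to a unit mod p
cipher = (M * s_alice) % p   # Alice sends M * s mod p

# Bob inverts s mod p (Python 3.8+) and recovers M.
plain = (cipher * pow(s_bob, -1, p)) % p
print(s_alice, s_bob, cipher, plain)
```

Eve sees `p`, `g`, `h_alice`, `h_bob` and `cipher`, but neither `a` nor `b`.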

4.7 Application: Decimal expansions


In elementary school, you learn that a rational number $x$ between 0 and 1 has either a finite
decimal expansion $x = .c_1 c_2 \ldots c_d$, or is eventually periodic, that is $x = .b_1 b_2 \ldots b_m \overline{c_1 c_2 \ldots c_d}$.
The goal of this section will be to explain this phenomenon, and to give algorithms for
computing decimal expansions. Surprisingly, the key to this will be Euler’s theorem.

We’ll start by showing that rational numbers are precisely those with eventually periodic
decimal expansions.
Theorem 4.7.1. Let x be a real number with 0 < x < 1. Then x is rational if and only if
the decimal expansion of x is eventually periodic.
Proof. First, suppose that $x$ is rational. Write $x = \frac{a}{b}$ with $(a, b) = 1$, and suppose the decimal
expansion of $x$ is $x = .c_1 c_2 c_3 \ldots$. Then $10^k x = c_1 c_2 \ldots c_k . c_{k+1} \ldots$. By the division algorithm,
write $10^k a = b q_k + r_k$ where $0 \le r_k < b$. Thus, $c_1 c_2 \ldots c_k . c_{k+1} \ldots = 10^k x = \frac{10^k a}{b} = q_k + \frac{r_k}{b}$,
so that $q_k = c_1 \ldots c_k$ and $\frac{r_k}{b} = .c_{k+1} c_{k+2} \ldots$. Since there are only finitely many possible
values for $r_k$, there exist some integers $m, n$ with $m < n$ such that $r_m = r_n$, so that
$.c_{m+1} c_{m+2} \ldots = .c_{n+1} c_{n+2} \ldots$. This says the decimal expansion of $x$ is $.c_1 c_2 \ldots c_m \overline{c_{m+1} \ldots c_n}$.

Next, assume that $x = .b_1 b_2 \ldots b_m \overline{c_1 \ldots c_d}$. Then $10^m x = b_1 \ldots b_m . \overline{c_1 \ldots c_d}$. Therefore
if we can show $.\overline{c_1 \ldots c_d}$ is rational, we’re done, as $10^m x$ will then be an integer plus a
rational number, and therefore solving for $x$ says $x$ is rational. Set $y = .\overline{c_1 \ldots c_d}$. Then
$10^d y = c_1 \ldots c_d . \overline{c_1 \ldots c_d}$, so $(10^d - 1)y = c_1 \ldots c_d$ says $y = \frac{c_1 \ldots c_d}{10^d - 1}$, so that $y$ is rational as
desired.
Saying the decimal expansion is finite means that the repeating part is a block of 0’s, so
we really have proved what we wanted (the repeating block is either all 0’s or it isn’t). Note
that the above proof is constructive: given a rational number, it gives us an algorithm for
computing its decimal expansion, and given a decimal expansion, it tells us what rational
number it comes from.
Example 4.7.2. Let $x = .11\overline{123}$. Then $100x = 11.\overline{123}$, so we need to compute $y = .\overline{123}$.
We have $1000y = 123.\overline{123}$, so $999y = 123$ says $y = \frac{123}{999}$. Therefore, $100x = 11 + \frac{123}{999} = \frac{11112}{999}$,
so $x = \frac{11112}{99900} = \frac{926}{8325}$.

Example 4.7.3. Let $x = \frac{1}{303}$. To compute the decimal expansion of $x$, we follow the proof:

$10^1 \cdot 1 = 303 \cdot 0 + 10$
$10^2 \cdot 1 = 303 \cdot 0 + 100$
$10^3 \cdot 1 = 303 \cdot 3 + 91$
$10^4 \cdot 1 = 303 \cdot 33 + 1$
$10^5 \cdot 1 = 303 \cdot 330 + 10$

We have found two integers $m$ and $n$ with $r_m = r_n$, namely, $m = 1$ and $n = 5$. This says
$x = .c_1 \overline{c_2 c_3 c_4 c_5}$. We can now read off the digits by looking at the quotients:

$c_1 = q_1 = 0$
$10c_1 + c_2 = q_2 = 0 \implies c_2 = 0$
$100c_1 + 10c_2 + c_3 = q_3 = 3 \implies c_3 = 3$
$1000c_1 + 100c_2 + 10c_3 + c_4 = q_4 = 33 \implies c_4 = 3$
$10000c_1 + 1000c_2 + 100c_3 + 10c_4 + c_5 = q_5 = 330 \implies c_5 = 0$

Therefore, $x = .0\overline{0330} = .\overline{0033}$.
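The long-division procedure from the proof can be sketched in Python as below. As a small variation that is not in the text, the sketch also records the starting remainder $r_0 = a$, so it detects the earliest repeated remainder; the function name `decimal_expansion` is illustrative.

```python
def decimal_expansion(a, b):
    """Return (non_repeating, repeating) digit strings of a/b for 0 < a < b.

    Digits are produced by long division; the expansion becomes periodic as
    soon as a remainder repeats. A finite expansion shows up as the
    repeating block "0".
    """
    first_seen = {}            # remainder -> index of the digit it produces
    digits = []
    r = a
    while r not in first_seen:
        first_seen[r] = len(digits)
        r *= 10
        digits.append(str(r // b))
        r %= b
    start = first_seen[r]
    return "".join(digits[:start]), "".join(digits[start:])

print(decimal_expansion(1, 303))
print(decimal_expansion(926, 8325))
print(decimal_expansion(1, 200))
```

The loop runs at most $b$ times, since there are only $b$ possible remainders.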


This algorithm is not terribly useful: the output for $\frac{1}{303}$ had an initial non-repeating
block, however we could actually write it as a purely repeating decimal!

Our next goal will be to determine whether a rational number $x = \frac{a}{b}$ has a finite decimal
expansion, or an eventually periodic decimal expansion (with non-zero repeating block), and
to determine a better algorithm for computing the decimal expansion. The key step in our
proof was that there are finitely many remainders upon division by $b$, so that $r_m = r_n$ for
some integers $m < n$. Translated into a statement about modular arithmetic, there are
integers $m < n$ such that $10^m \equiv 10^n \bmod b$. If 10 is invertible mod $b$, this is the same thing as
saying $10^{n-m} \equiv 1 \bmod b$, so that there is a solution to $10^d \equiv 1 \bmod b$.

We’ll first tackle the case where $x$ has a purely periodic decimal expansion, i.e. $x = .\overline{c_1 \ldots c_d}$.

Theorem 4.7.4. Let $x = \frac{a}{b}$ with $(a, b) = 1$ be a rational number between 0 and 1. Then the
decimal expansion of $x$ is purely periodic if and only if $(10, b) = 1$. In this case, the period
length is given by $\mathrm{ord}_b(10)$.

Proof. First, suppose that $x = .\overline{c_1 \ldots c_d}$ is purely periodic. Then $10^d x = c_1 c_2 \ldots c_d . \overline{c_1 \ldots c_d}$,
so $x = \frac{c_1 \ldots c_d}{10^d - 1}$. Since $10^d - 1 \equiv -1 \bmod 10$, the denominator is co-prime to 10, and it remains co-prime to 10 even after
canceling common factors with the numerator. Hence $(10, b) = 1$.

Now suppose that $(10, b) = 1$, and let $d = \mathrm{ord}_b(10)$. Then $10^d \equiv 1 \bmod b$, so $10^d - 1$
is a multiple of $b$. Write $10^d - 1 = bk$ for some $k$, so that $x = \frac{a}{b} = \frac{ak}{bk} = \frac{ak}{10^d - 1}$. Since
$\frac{a}{b} < 1$ we have $ak < 10^d - 1$, so the decimal expansion of $ak$ requires at most $d$ digits. Write
$ak = c_1 \ldots c_d$ (padding with leading zeros if necessary), so $x = \frac{c_1 \ldots c_d}{10^d - 1} = .\overline{c_1 \ldots c_d}$.

Finally, suppose that $x$ can be written as a repeating decimal with block length $\ell$. The
above argument shows that $x$ can be written as a fraction with denominator $10^\ell - 1$, so that
$10^\ell \equiv 1 \bmod b$. This means $d = \mathrm{ord}_b(10)$ satisfies $d \mid \ell$, so $d \le \ell$. Since we’ve shown that $x$
can be written as a repeating decimal with block length $d$, this shows it is the minimal such
length, and therefore the period.
Now, we’ll tackle when x has a finite decimal expansion.
Theorem 4.7.5. Let $x = \frac{a}{b}$ with $(a, b) = 1$ be a rational number between 0 and 1. Then the
decimal expansion of $x$ is finite if and only if the only possible prime factors of $b$ are 2 and
5.
Proof. First, suppose that $b = 2^e 5^f$ for some $e, f$. Let $d = \max\{e, f\}$. Then $b \mid 10^d$, so $10^d x = ka$ for
the integer $k = 10^d / b$, and $x = \frac{ka}{10^d}$ says the decimal expansion of $x$ is the integer $ka$ with the decimal point
shifted $d$ places to the left, i.e. is finite.

Next, suppose that $x$ has a finite decimal expansion, $x = .c_1 c_2 \ldots c_d$. Then $10^d x = c_1 \ldots c_d$,
so $x = \frac{c_1 \ldots c_d}{10^d}$. Canceling common factors from the numerator and denominator to
reduce to lowest terms, this says the only prime factors of $b$ must divide $10^d$, i.e. must
be 2 or 5.
If we combine the two statements, we get the following theorem:
Theorem 4.7.6. Let $x = \frac{a}{b}$ where $(a, b) = 1$ be a rational number between 0 and 1. Depending
on the prime factorization of $b$, exactly one of the following holds:

(1) $x$ has a finite decimal expansion, if and only if $b = 2^e 5^f$ for some $e, f$ not both 0.

(2) $x$ is purely periodic with period length $\mathrm{ord}_b(10)$ if and only if $(10, b) = 1$.

(3) $x$ is eventually periodic with an initial non-repeating block if and only if $(10, b) \ne 1$
and $b$ is divisible by a prime other than 2 or 5. If $b = 2^e 5^f b'$ with $(10, b') = 1$, then the period length is
$\mathrm{ord}_{b'}(10)$.

Proof. Everything is immediate from what we have done so far, except the claim about the
period length in statement (3). Let $k = \max\{e, f\}$, then $10^k x = \frac{ma}{b'}$ for some integer $m$ dividing $10^k$.
Writing $ma = b'q + r$, with $0 \le r < b'$, we have $\frac{ma}{b'} = q + \frac{r}{b'}$. Since $(ma, b') = 1$ and $(10, b') = 1$, this says
$\frac{r}{b'}$ is in lowest terms and is periodic of length $\mathrm{ord}_{b'}(10)$, so $10^k x$ has a purely periodic fractional part of period
$\mathrm{ord}_{b'}(10)$. Dividing by $10^k$ shifts the decimal point left by $k$ places, so that $x$ has an initial
non-repeating block (the digits of $q$) followed by a repeating block.
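Theorem 4.7.6 translates into a short classification routine that looks only at the denominator. The sketch below is an illustration rather than part of the text: the name `classify_decimal` and the returned labels are made up, and the order of 10 is found by naive repeated multiplication.

```python
def classify_decimal(b):
    """Classify the expansion of a/b (lowest terms, 0 < a < b) using only b.

    Returns (kind, period), where period is ord_{b'}(10) for b = 2^e 5^f b',
    or 0 when the expansion is finite.
    """
    reduced = b
    for prime in (2, 5):             # strip all factors of 2 and 5, leaving b'
        while reduced % prime == 0:
            reduced //= prime
    if reduced == 1:
        return ("finite", 0)
    period, power = 1, 10 % reduced  # compute the order of 10 mod b'
    while power != 1:
        power = (power * 10) % reduced
        period += 1
    kind = "purely periodic" if reduced == b else "eventually periodic"
    return (kind, period)

for denominator in (200, 303, 8325, 7):
    print(denominator, classify_decimal(denominator))
```

The loop computing the order terminates because 10 is a unit mod $b'$ once all factors of 2 and 5 have been removed.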
By using modular arithmetic, we can improve the algorithm from our earlier example.
Example 4.7.7. Let $x = \frac{1}{303}$. Since $(10, 303) = 1$, the above theorem says $x$ is purely
periodic with period length $d = \mathrm{ord}_{303}(10)$. We have $\varphi(303) = \varphi(3)\varphi(101) = 200$,
so by Euler’s theorem $d \mid 200$. One can check manually that $10^2 \not\equiv 1 \bmod 303$ and $10^4 \equiv 1 \bmod 303$, so $d = 4$. We have
$10^4 - 1 = 303 \cdot 33$, so $\frac{1}{303} = \frac{33}{10^4 - 1} = .\overline{0033}$.

Example 4.7.8. Let $x = \frac{1}{200}$. Then $200 = 2^3 \cdot 5^2$, so $x$ has a finite decimal expansion. We
have $\max\{3, 2\} = 3$, so $1000x = 5$. Shifting the decimal to the left 3 places says $x = .005$.
Example 4.7.9. Let $x = \frac{926}{8325}$. We have $8325 = 3^2 \cdot 5^2 \cdot 37$, so $x$ has an initial non-repeating
block followed by a repeating block. To compute the decimal expansion of $x$,
we’ll use a combination of the previous two methods. Multiply $x$ by 100, so that
$100x = \frac{926 \cdot 4}{9 \cdot 37} = \frac{3704}{333} = 11 + \frac{41}{333}$. Since $(10, 333) = 1$, $\frac{41}{333}$ is purely periodic. We find $\varphi(333) = 216$ so
$d = \mathrm{ord}_{333}(10) \mid 216$. One can check that $10^3 \equiv 1 \bmod 333$, so $d = 3$. We have $10^3 - 1 = 333 \cdot 3$,
so $\frac{1}{333} = \frac{3}{10^3 - 1}$. This says $\frac{41}{333} = \frac{123}{10^3 - 1}$, so $\frac{41}{333} = .\overline{123}$. Therefore, $100x = 11.\overline{123}$, so shifting
the decimal two places left gives $x = .11\overline{123}$.

Example 4.7.10. As a final example, the fractions $\frac{1}{7}, \frac{2}{7}, \ldots, \frac{6}{7}$ are all purely periodic because
$(10, 7) = 1$. One can check that $\mathrm{ord}_7(10) = 6$, so that each of these fractions is periodic of
length 6. As you’re probably aware, these fractions are cyclic shifts of each other:

$\frac{1}{7} = .\overline{142857}$    $\frac{2}{7} = .\overline{285714}$    $\frac{3}{7} = .\overline{428571}$
$\frac{4}{7} = .\overline{571428}$    $\frac{5}{7} = .\overline{714285}$    $\frac{6}{7} = .\overline{857142}$

Why does this happen? Since $\mathrm{ord}_7(10) = 6$, the powers $10^k \bmod 7$ for $1 \le k \le 6$ must all
be distinct. Since there are 6 non-zero elements mod 7, we actually hit all of them: for
any $1 \le m \le 6$, there is $k$ such that $m \equiv 10^k \bmod 7$. This says that $\frac{m}{7}$ and $\frac{10^k}{7}$ have the
same fractional part, but the latter we can compute by just shifting the decimal point! To
illustrate this, suppose we wanted to compute $\frac{5}{7}$. One can check that $10^5 \equiv 5 \bmod 7$, so $\frac{5}{7}$
and $\frac{10^5}{7}$ have the same fractional part. From $\frac{1}{7} = .\overline{142857}$, we find $\frac{10^5}{7} = 14285.\overline{714285}$ (we
know this repeats because the period length is independent of the numerator, so the first 6
digits must be the repeating block), which gives $\frac{5}{7} = .\overline{714285}$.

Integers $n$ for which $\frac{1}{n}$ has the cyclic shifting property listed above are quite rare: it turns
out, these are precisely the integers $n$ such that $\mathrm{ord}_n(10) = \varphi(n)$. The fractions with $n \le 100$
for which this holds are $\frac{1}{7}, \frac{1}{17}, \frac{1}{19}, \frac{1}{23}, \frac{1}{29}, \frac{1}{47}, \frac{1}{49}, \frac{1}{59}, \frac{1}{61}, \frac{1}{97}$. In general, finding such an $n$ where
this condition holds is very hard!
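The list of denominators above can be reproduced by direct search. The following sketch assumes nothing beyond the condition $\mathrm{ord}_n(10) = \varphi(n)$ stated in the text; the helper names are illustrative and the brute-force approach is only meant for small $n$.

```python
from math import gcd

def order_of_10(n):
    """Multiplicative order of 10 mod n; assumes gcd(10, n) == 1 and n >= 2."""
    k, x = 1, 10 % n
    while x != 1:
        x = (x * 10) % n
        k += 1
    return k

def euler_phi(n):
    """Count the units mod n directly from the definition."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

full_period = [
    n for n in range(2, 101)
    if gcd(n, 10) == 1 and order_of_10(n) == euler_phi(n)
]
print(full_period)
```

Apart from 49, every entry is prime, which is consistent with Theorem 4.5.2: a generator mod an odd $n$ exists only when $n$ is a prime power.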

4.8 Exercises
1. Solve the following systems of congruences.

(a)
$x \equiv 3 \bmod 5$
$x \equiv 1 \bmod 7$
$x \equiv 4 \bmod 9$

(b)
$x \equiv 5 \bmod 6$
$x \equiv 14 \bmod 15$
$x \equiv 24 \bmod 35$

(Note: since the moduli are not co-prime, you need to split each single congruence
into a system of congruences, first!)
(c) Show that the system below does not have a solution.

$x \equiv 5 \bmod 6$
$x \equiv 8 \bmod 15$
$x \equiv 6 \bmod 35$

2. Solve the system of congruences

$x^2 \equiv 2 \bmod 7$
$x^2 \equiv 3 \bmod 11$
$x^2 \equiv 4 \bmod 13$

3. Compute $50^{51^{52}} \bmod 101$.

4. Solve $x^{11} \equiv 7 \bmod 18$.

5. Prove that the map $f : \mathbb{F}_{29}^\times \to \mathbb{F}_{29}^\times$ given by $x \bmod 29 \mapsto x^{17} \bmod 29$ is a bijection.

6. Find all solutions to ϕ(n) = 16.

7. Prove there is no solution to ϕ(n) = 14.

8. Find all solutions to $x^5 - 2x - 42 \equiv 0 \bmod 1323$ (Note: $1323 = 3^3 \cdot 7^2$).

9. Create a table of the values of $\log_5(a)$ for all units mod 23. Use your table and properties
of the discrete logarithm to solve the equations $3x \equiv 2 \bmod 23$ and $3x^{14} \equiv 2 \bmod 23$.

10. Find all generators of $F^\times$ for the field $F = \mathbb{F}_3[x]/(x^2 + 1)$.

11. Suppose you know that $a^{40} \equiv 1 \bmod n$, $a^{72} \equiv 1 \bmod n$, and $a^4 \not\equiv 1 \bmod n$. What is
$\mathrm{ord}_n(a^2)$?

12. Find all primes $p$ such that $\mathrm{ord}_p(2) = 10$.

13. Prove for integers $m, n > 1$ that $\varphi(mn) = \varphi(m)\varphi(n) \frac{d}{\varphi(d)}$ where $d = \gcd(m, n)$.

14. Let $p, q$ be distinct odd primes such that $p - 1 \mid q - 1$. If $\gcd(a, pq) = 1$, prove that
$a^{q-1} \equiv 1 \bmod pq$.

15. Prove that $n \mid \varphi(2^n - 1)$ for all $n \ge 1$.

16. Mersenne primes are primes of the form $2^p - 1$ for prime $p$, and Fermat primes are
primes of the form $2^{2^n} + 1$.

(a) Let $p$ be an odd prime, and let $q$ be a prime divisor of $2^p - 1$. Prove that
$\mathrm{ord}_q(2) = p$. Deduce that if $q$ is a prime divisor of $2^p - 1$, then $q = 2pk + 1$ for
some integer $k$.
(b) Similarly, for odd prime $p$ prove that if $p \mid 2^{2^n} + 1$ then $\mathrm{ord}_p(2) = 2^{n+1}$. Deduce
that if $p$ is a prime divisor of $2^{2^n} + 1$, then $p$ must be of the form $2^{n+1}k + 1$ for
some integer $k$.
(c) Using WolframAlpha, $2^{32} + 1 = 641 \cdot 6700417$. Use the previous part to explain how
one could identify 641 as a possible factor, and prove by hand that $641 \mid 2^{32} + 1$
using modular arithmetic, so that 641 is the smallest prime divisor of $2^{32} + 1$.
Similarly, find by hand the smallest prime divisor of $2^{29} - 1$.

17. Let $a > 0$ with $a$ not divisible by $p$ for prime $p$. For $1 \le j \le p - 1$, write $a \cdot j = pq_j + r_j$
by the division algorithm. Fermat’s little theorem was proven by multiplying these
equations together.

(a) Adding them instead, prove that $\sum_{j=1}^{p-1} \lfloor \frac{aj}{p} \rfloor = \frac{(a-1)(p-1)}{2}$.
(b) Consider the triangle with vertices $(0, 0)$, $(p, 0)$ and $(p, a)$. Show that the expression
in part (a) counts the number of lattice points in the interior of this
triangle.

18. This exercise is a primality test based on Fermat’s little theorem. For any integer $a$
and prime $p$ with $p \nmid a$, Fermat’s little theorem says $a^{p-1} \equiv 1 \bmod p$, so taking the
contrapositive of this statement says that if $a^{p-1} \not\equiv 1 \bmod p$, then $p$ is composite! This
gives rise to an algorithm for testing if an integer $n$ is prime or not:

• Randomly pick an integer $a$ with $n \nmid a$.

• If $a^{n-1} \not\equiv 1 \bmod n$, then $n$ is composite and we’re done!
• Otherwise, if $a^{n-1} \equiv 1 \bmod n$, pick a different integer $a$ and repeat.

If $a$ is an integer such that $a^{n-1} \not\equiv 1 \bmod n$, we call $a$ a Fermat witness for the
compositeness of $n$.

(a) Using WolframAlpha or any other computer algebra system, find the smallest
Fermat witness for 2821.
(b) Let m = 56052361. Using WolframAlpha or any other computer algebra system,
determine if 2, 3, 5, 6, 7, 10, or 11 are Fermat witnesses for m. What do you find?
Is this enough information to tell you with certainty if m is prime or composite,
and why?

We call n a Carmichael number if an−1 ≡ 1 mod n for all integers a with gcd(a, n) = 1.
Carmichael numbers are the integers for which our primality test will never give us any
information.

(c) Prove that 561 is a Carmichael number. (Hint: 561 = 3 · 11 · 17. Euler’s theorem
might be useful.)

19. Euler’s theorem says for any integer $a$ with $\gcd(a, n) = 1$, that $a^{\varphi(n)} \equiv 1 \bmod n$. However,
$\varphi(n)$ is oftentimes not the smallest exponent we can choose with this property.
The Carmichael function $\lambda(n)$ is defined to be the smallest positive integer $k$ such that
$a^k \equiv 1 \bmod n$ for all integers $a$ with $\gcd(a, n) = 1$. It turns out, for example, that
$\lambda(1729) = 36$, and so every integer $a$ with $\gcd(a, 1729) = 1$ satisfies $a^{36} \equiv 1 \bmod 1729$.

(a) Prove that $n$ is a Carmichael number if and only if $\lambda(n) \mid n - 1$.

(b) Compute $\lambda(3)$, $\lambda(11)$ and $\lambda(17)$.
(c) Compute $\lambda(561)$.

20. The goal of this problem is to give an alternate proof of Euler’s theorem.

(a) Prove that for any integer $k$ with $1 \le k \le p - 1$, that $\binom{p}{k} \equiv 0 \bmod p$.

(b) For any integer $a \ge 0$, prove by induction on $a$ that $a^p \equiv a \bmod p$. Deduce that
for $\gcd(a, p) = 1$, $a^{\varphi(p)} \equiv 1 \bmod p$.
(c) Prove that for any integers $a, k$ with $k \ge 1$ and $\gcd(a, p) = 1$, that $a^{\varphi(p^k)} \equiv 1 \bmod p^k$.
(d) Prove that for integers $a, n$ with $\gcd(a, n) = 1$ that $a^{\varphi(n)} \equiv 1 \bmod n$.

21. Let $p$ be an odd prime and let $k \ge 0$ be an integer. Prove that

$$1^k + 2^k + \ldots + (p-1)^k \equiv \begin{cases} 0 \bmod p & p - 1 \nmid k \\ -1 \bmod p & p - 1 \mid k \end{cases}$$

22. Prove the following generalization of Wilson’s theorem: if (Z/nZ)× has a generator,
then the product of all units mod n is congruent to −1 mod n.

23. The goal of this problem is to count the number of solutions to the congruence $x^2 \equiv 1 \bmod n$
for arbitrary $n > 1$.

(a) Let $p > 2$ be prime and let $k \ge 1$. Prove that $x^2 \equiv 1 \bmod p^k$ if and only if
$x \equiv \pm 1 \bmod p^k$.
(b) Prove by induction that for $k \ge 3$, $x^2 \equiv 1 \bmod 2^k$ has exactly four solutions:
$x \equiv \pm 1 \bmod 2^k$ and $x \equiv \pm 1 + 2^{k-1} \bmod 2^k$.
(c) Let $f(x) \in \mathbb{Z}[x]$, and let $\gcd(m, n) = 1$. Let $N(k)$ denote the number of solutions
to the congruence $f(x) \equiv 0 \bmod k$. Prove that $N(mn) = N(m)N(n)$. For $f(x) = x^2 - 1$,
compute $N(n)$ for arbitrary $n > 1$ in terms of the prime factorization of
$n$.
(d) Find all solutions to $x^2 \equiv 1 \bmod 1708$.

24. Now, you’ll do the same for the congruence $x^2 \equiv -1 \bmod n$.

(a) Let $p$ be an odd prime and $k \ge 1$. Prove $x^2 \equiv -1 \bmod p^k$ has either 0 or 2
solutions, depending on the congruence class of $p$ mod 4.
(b) Prove that $x^2 \equiv -1 \bmod 2^k$ is solvable if and only if $k = 0$ or $k = 1$.
(c) Give a formula that counts the number of solutions to $x^2 \equiv -1 \bmod n$ in terms of
the prime factorization of $n$. Use your formula to count the number of solutions
to $x^2 \equiv -1 \bmod 2 \cdot 53 \cdot 13$.
25. Let $p$ be a prime. By Hensel’s lemma, for any $n \ge 2$ there are exactly $p - 1$ roots of
$x^{p-1} - 1 \bmod p^n$, coming from the lifts of $1, 2, \ldots, p - 1 \bmod p$.
(a) Let $n > 0$ and suppose that $x \equiv y \bmod p^n$. Prove that $x^p \equiv y^p \bmod p^{n+1}$.
(b) Define a map $\omega : \mathbb{F}_p^\times \to (\mathbb{Z}/p^n\mathbb{Z})^\times$ by $x \bmod p \mapsto x^{p^{n-1}} \bmod p^n$. Show that $\omega$ is
well-defined, injective, and satisfies $\omega(x)^{p-1} \equiv 1 \bmod p^n$.
(c) Use the previous part to compute the solutions to $x^4 \equiv 1 \bmod 6125$.
26. Let $v_p(n)$ denote the highest power of $p$ that divides $n$ in its prime factorization, e.g.
$v_3(7) = 0$ and $v_3(9) = 2$. In this problem, you’ll see how you can use lifting to compute
orders mod $n$.
(a) Let $p$ be an odd prime and suppose that $a \equiv 1 \bmod p$. Prove that $v_p(a^n - 1) = v_p(n) + v_p(a - 1)$.
(Hint: consider separately the cases $p \nmid n$ and $n = p$ before
handling the general case.)
(b) Let $p$ be an odd prime and let $d = \mathrm{ord}_p(a)$. Prove that
$$\mathrm{ord}_{p^k}(a) = \begin{cases} d & m \ge k \\ d \cdot p^{k-m} & m < k \end{cases}$$
where $m = v_p(a^d - 1)$.

(c) Compute $\mathrm{ord}_{847847}(3)$ using the previous part and properties of orders. (Note:
$847847 = 7^2 \cdot 11^3 \cdot 13$).
27. ASCII is the standard way of converting symbols into numbers. Let A = 65, B =
66, . . . , Z = 90, and let a space correspond to 32. (This is not a particularly secure
method for converting text into numbers, but this is a simple illustration of how your
computer might do so without any other algorithm.)
(a) Let (m, e) = (4951760154835678088235319297, 1850567623300615966303954877)
be the public key. Convert the message “HELLO WORLD” into a number and
using WolframAlpha, use the RSA method to encrypt your message.
(b) Suppose that you are Eve the spy. Using WolframAlpha, compute the factoriza-
tion of m and then compute ϕ(m). Then, compute the decryption key d. Use
this to decode the intercepted message
(55799119760817384352725395, 3132339983985735578472674402) that Bob sent to
Alice, and convert it back to plain text (using a decimal to ASCII converter, per-
haps).

As you can see, the numbers used in this example are not even remotely difficult for a
computer to break RSA with. The moduli actually recommended for serious
applications of RSA are between 309 and 617 digits long!

Part III

Quadratic Number Theory

Chapter 5

Quadratic congruences

5.1 Quadratic residues


Now that we have a good understanding of the structure of $(\mathbb{Z}/n\mathbb{Z})^\times$, we focus our attention
on quadratic congruences of the form $x^2 \equiv a \bmod n$.

Definition 5.1.1. We call $a$ a quadratic residue mod $n$ if the congruence $x^2 \equiv a \bmod n$ is solvable.
Otherwise, we call $a$ a quadratic non-residue mod $n$. Notationally, we will write
$a \equiv \square \bmod n$ to mean that $a$ is a quadratic residue mod $n$.

In the previous chapter, we saw that when $n = p$ is prime, we have the following criterion
for determining whether or not $a$ is a quadratic residue mod $n$.

Proposition 40 (Euler’s criterion). Let $p$ be an odd prime and $p \nmid a$. Then

$$a^{\frac{p-1}{2}} \equiv \begin{cases} 1 \bmod p & a \equiv \square \bmod p \\ -1 \bmod p & a \not\equiv \square \bmod p \end{cases}$$

Proof. Note that $a^{p-1} \equiv 1 \bmod p$ means that $(a^{\frac{p-1}{2}} - 1)(a^{\frac{p-1}{2}} + 1) \equiv 0 \bmod p$. Therefore,
$a^{\frac{p-1}{2}} \equiv \pm 1 \bmod p$. We saw before that $a^{\frac{p-1}{2}} \equiv 1 \bmod p$ if and only if $a \equiv \square \bmod p$, which
means that if $a \not\equiv \square \bmod p$ we must have $a^{\frac{p-1}{2}} \equiv -1 \bmod p$.
Corollary 5.1.2. For $p$ odd, there are exactly $\frac{p-1}{2}$ non-zero quadratic residues mod $p$.
Proof. There are $p - 1$ non-zero elements mod $p$, each of which is either a root of $x^{\frac{p-1}{2}} - 1$ or
$x^{\frac{p-1}{2}} + 1$ mod $p$ by the above. This means each polynomial has exactly $\frac{p-1}{2}$ roots because they
have at most $\frac{p-1}{2}$ roots each. We’re done once we combine with the above proposition.

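Definition 5.1.3. Let $p$ be an odd prime and $p \nmid a$. The Legendre symbol $(\frac{a}{p})$ is defined
by
$$\left(\frac{a}{p}\right) = \begin{cases} 1 & a \equiv \square \bmod p \\ -1 & a \not\equiv \square \bmod p \end{cases}$$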

Proposition 41. Let $p$ be an odd prime and $p \nmid a, b$.

1. $a^{\frac{p-1}{2}} \equiv (\frac{a}{p}) \bmod p$.

2. $(\frac{ab}{p}) \equiv (\frac{a}{p})(\frac{b}{p}) \bmod p$.

3. If $a \equiv b \bmod p$ then $(\frac{a}{p}) = (\frac{b}{p})$.

Proof.

1. This follows immediately from Euler’s criterion.

2. By 1., $(ab)^{\frac{p-1}{2}} \equiv (\frac{ab}{p}) \bmod p$, and $(ab)^{\frac{p-1}{2}} \equiv a^{\frac{p-1}{2}} b^{\frac{p-1}{2}} \equiv (\frac{a}{p})(\frac{b}{p}) \bmod p$. Since both sides are $\pm 1$ and $p$ is odd, this is in fact an equality.

3. Trivial.

Corollary 5.1.4. Let $p$ be an odd prime. Then $(\frac{-1}{p}) = (-1)^{\frac{p-1}{2}}$.
Proof. The above proposition tells us that $(\frac{-1}{p}) \equiv (-1)^{\frac{p-1}{2}} \bmod p$. Since both sides of the
congruence are $\pm 1$, because $p$ is odd, this is only possible if they are actually equal.
In particular, casing on whether $\frac{p-1}{2}$ is even or not, this tells us that $-1 \equiv \square \bmod p$ if and
only if $p \equiv 1 \bmod 4$.
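Euler’s criterion gives a direct way to evaluate Legendre symbols on a computer. The sketch below is an illustration with made-up function names; it assumes $p$ is an odd prime not dividing $a$, and compares the result against a brute-force search for square roots.

```python
def legendre(a, p):
    """Legendre symbol (a/p) via Euler's criterion; assumes p is an odd prime, p does not divide a."""
    return 1 if pow(a % p, (p - 1) // 2, p) == 1 else -1

def is_square_mod(a, p):
    """Brute-force test of whether x^2 ≡ a (mod p) has a solution."""
    return any((x * x - a) % p == 0 for x in range(1, p))

small_primes = [3, 5, 7, 11, 13, 17, 19, 23]

# Euler's criterion agrees with the definition of a quadratic residue.
criterion_ok = all(
    (legendre(a, p) == 1) == is_square_mod(a, p)
    for p in small_primes for a in range(1, p)
)
# Corollary 5.1.2: exactly (p - 1)/2 non-zero residues.
counts_ok = all(
    sum(1 for a in range(1, p) if legendre(a, p) == 1) == (p - 1) // 2
    for p in small_primes
)
# Corollary 5.1.4: -1 is a square mod p exactly when p ≡ 1 (mod 4).
minus_one_ok = all((legendre(-1, p) == 1) == (p % 4 == 1) for p in small_primes)
print(criterion_ok, counts_ok, minus_one_ok)
```

Using `pow` with a modulus keeps the computation fast even for large primes, since it never forms the full power $a^{(p-1)/2}$.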

As the Legendre symbol is multiplicative, this means that if $a = p_1^{e_1} \cdots p_k^{e_k}$ is the prime
factorization of $a$, we have $(\frac{a}{p}) = (\frac{p_1}{p})^{e_1} \cdots (\frac{p_k}{p})^{e_k}$. Therefore, understanding how to compute
Legendre symbols reduces to understanding how to compute Legendre symbols of the form
$(\frac{q}{p})$ for prime $q$.

It’s also worth pointing out that the Legendre symbol can be viewed as a group homomorphism
$(\frac{\cdot}{p}) : \mathbb{F}_p^\times \to \mathbb{Z}/2\mathbb{Z} \cong \langle -1 \rangle \le \mathbb{C}^\times$. In other words, the Legendre symbol is a character of
the group $\mathbb{F}_p^\times$. The benefit of this viewpoint is that it allows one to deduce properties of the
Legendre symbols via Fourier analysis. The main result of the next section, the quadratic
reciprocity law, is frequently proven using this viewpoint in many number theory textbooks
by examining Gauss sums, which are, essentially, just the Fourier transform of the Legendre
symbol $(\frac{\cdot}{p})$.

5.2 Quadratic reciprocity


Numerically, if one tabulates whether $p$ is a square mod $q$ and $q$ is a square mod $p$ for various
odd primes $p$ and $q$, there is a (non-obvious!) pattern:

• If at least one of $p, q$ is $\equiv 1 \bmod 4$, then $p \equiv \square \bmod q$ if and only if $q \equiv \square \bmod p$.

• If both $p, q \equiv 3 \bmod 4$, then $p \equiv \square \bmod q$ if and only if $q \not\equiv \square \bmod p$.

This is the so-called law of quadratic reciprocity, and was first proven by Gauss. He gave
8 different proofs of this result! Today, there are over 250 different proofs of quadratic
reciprocity. It’s a very central result in number theory!

Theorem 5.2.1 (Quadratic reciprocity). Let $p$ and $q$ be distinct odd primes.

1. $(\frac{-1}{p}) = (-1)^{\frac{p-1}{2}}$

2. $(\frac{2}{p}) = (-1)^{\frac{p^2-1}{8}}$

3. $(\frac{p}{q})(\frac{q}{p}) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}$

The first two parts of the theorem are often known as the supplementary laws, with the
third part often being synonymous with the phrase “quadratic reciprocity”. We are going
to give Eisenstein’s proof of quadratic reciprocity, which is a modification of Gauss’s third
proof. We’ve already proven the first supplementary law in the previous section, so we will
proceed by proving the second.
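Before working through the proofs, all three parts of the theorem can be checked numerically for small primes. The sketch below is only a sanity check, not a proof; the `legendre` helper (Euler’s criterion) and the bound 60 on the primes are choices made for this illustration.

```python
def legendre(a, p):
    """Legendre symbol (a/p) via Euler's criterion; assumes p is an odd prime, p does not divide a."""
    return 1 if pow(a % p, (p - 1) // 2, p) == 1 else -1

# Odd primes below 60, found by trial division.
odd_primes = [n for n in range(3, 60) if all(n % d for d in range(2, int(n ** 0.5) + 1))]

first_law = all(legendre(-1, p) == (-1) ** ((p - 1) // 2) for p in odd_primes)
second_law = all(legendre(2, p) == (-1) ** ((p * p - 1) // 8) for p in odd_primes)
reciprocity = all(
    legendre(p, q) * legendre(q, p) == (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    for p in odd_primes for q in odd_primes if p != q
)
print(first_law, second_law, reciprocity)
```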
Proposition 42. $(\frac{2}{p}) = (-1)^{\frac{p^2-1}{8}}$

Proof. The idea is a rather clever counting argument. Set $s = \frac{p-1}{2}$, and consider the $s$
equations:

$1 = (-1) \cdot (-1)^1$
$2 = 2 \cdot (-1)^2$
$3 = (-3) \cdot (-1)^3$
$\vdots$
$s = (\pm s) \cdot (-1)^s$

where the sign in the last equation is chosen so that the equality is correct. Multiply all
these equations together. The left hand side is just $s!$. What happens on the right hand
side? There’s clearly a factor of $(-1)^{1+2+\ldots+s} = (-1)^{\frac{s(s+1)}{2}}$, all the even integers $2, 4, \ldots$, up
until $s$, and some negative odd numbers. Note that $2s \equiv -1 \bmod p$, $2(s-1) \equiv -3 \bmod p$,
etc. so that modulo $p$, these negative odd numbers are really just even numbers in disguise!
Therefore, we find that $s! \equiv (-1)^{\frac{s(s+1)}{2}} s! \, 2^s \bmod p$, where each factor of 2 gets paired up
with one of the integers in $s!$ to produce the alternating sign pattern that we have. Since
$s!$ is a unit mod $p$, we find that $2^s \equiv (-1)^{\frac{s(s+1)}{2}} \bmod p$. Using that $s = \frac{p-1}{2}$, we find that
$(\frac{2}{p}) \equiv 2^{\frac{p-1}{2}} \equiv (-1)^{\frac{p^2-1}{8}} \bmod p$. Since both sides are $\pm 1$ and $p$ is odd, this forces actual equality,
which is what we wanted.
As part of our proof, we need the following technical lemmas:

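Proposition 43 (Gauss’s lemma). Suppose that $p$ is an odd prime and $p \nmid a$. For $1 \le k \le \frac{p-1}{2}$,
write $a \cdot k \equiv \varepsilon_k r_k \bmod p$ with $\varepsilon_k = \pm 1$ and $0 < r_k < \frac{p}{2}$. Then $(\frac{a}{p}) = (-1)^\mu$, where
$\mu = \#\{k : \varepsilon_k = -1\}$.

Proof. First, we show that all such $r_k$ are distinct. Suppose that $r_i \equiv r_j \bmod p$ for some $i, j$.
Then $a \cdot i \equiv \varepsilon_i r_i \equiv \varepsilon_i r_j \bmod p$, and $a \cdot j \equiv \varepsilon_j r_j \bmod p$. Therefore, $a \cdot i \cdot \varepsilon_j \equiv a \cdot j \cdot \varepsilon_i \bmod p$,
so that $i \equiv \pm j \bmod p$, as $\varepsilon_i, \varepsilon_j$ are each $\pm 1$. As $1 \le i, j \le \frac{p-1}{2}$, the only way this is possible
is if $i = j$. Therefore, for all $k$ in the range of interest, the $r_k$ are distinct.
As $1 \le r_k \le \frac{p-1}{2}$, this then says that $\{r_k\} = \{1, 2, \ldots, \frac{p-1}{2}\}$. Taking the product of the $\frac{p-1}{2}$
congruences $a \cdot k \equiv \varepsilon_k r_k \bmod p$, we find that $a^{\frac{p-1}{2}} \left(\frac{p-1}{2}\right)! \equiv \left(\frac{p-1}{2}\right)! \prod_{k=1}^{\frac{p-1}{2}} \varepsilon_k \bmod p$. Since $\left(\frac{p-1}{2}\right)!$
is invertible mod $p$, and the value of $\prod_{k=1}^{\frac{p-1}{2}} \varepsilon_k$ only depends on the number of negative 1’s in
the product, this yields $(\frac{a}{p}) \equiv a^{\frac{p-1}{2}} \equiv (-1)^\mu \bmod p$, and then actual equality follows since
both sides are $\pm 1$ mod $p$.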
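Gauss’s lemma is easy to test numerically: $\varepsilon_k = -1$ exactly when the least positive residue of $a \cdot k$ mod $p$ exceeds $\frac{p-1}{2}$. The sketch below counts those $k$ and compares $(-1)^\mu$ with Euler’s criterion; the function names are illustrative only.

```python
def legendre(a, p):
    """Legendre symbol (a/p) via Euler's criterion; assumes p is an odd prime, p does not divide a."""
    return 1 if pow(a % p, (p - 1) // 2, p) == 1 else -1

def gauss_mu(a, p):
    """Number of k in [1, (p-1)/2] whose product a*k reduces mod p to a residue above (p-1)/2."""
    half = (p - 1) // 2
    return sum(1 for k in range(1, half + 1) if (a * k) % p > half)

small_primes = [3, 5, 7, 11, 13, 17, 19, 23]
gauss_ok = all(
    (-1) ** gauss_mu(a, p) == legendre(a, p)
    for p in small_primes for a in range(1, p)
)
print(gauss_mu(3, 7), gauss_ok)
```

For $a = 3$, $p = 7$ the products $3, 6, 9$ reduce to $3, 6, 2$, and only 6 exceeds 3, so $\mu = 1$ and 3 is a non-residue mod 7.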

Proposition 44 (Eisenstein’s lemma). Let p be an odd prime and p - a. Then ( ap ) = (−1)m ,


P p−1
where m = k=12
b 2ak
p
c.

Proof. Write $2ak = p q_k + r'_k$ by the division algorithm for $1 \le k \le \frac{p-1}{2}$ and $0 \le r'_k \le p - 1$.
Note that $q_k = \lfloor \frac{2ak}{p} \rfloor$, and that because $p$ is odd, $q_k \equiv r'_k \bmod 2$. Therefore,
$m = \sum_{k=1}^{(p-1)/2} \lfloor \frac{2ak}{p} \rfloor = \sum_{k=1}^{(p-1)/2} q_k \equiv \sum_{k=1}^{(p-1)/2} r'_k \bmod 2$. Thus, $(-1)^m = (-1)^{\sum_k r'_k} = \prod_{k=1}^{(p-1)/2} (-1)^{r'_k}$. Using
the notation of the previous proposition, write $a \cdot k \equiv \varepsilon_k r_k \bmod p$. Then this says $2ak \equiv r'_k \equiv 2\varepsilon_k r_k \bmod p$.
In the proof of Gauss’s lemma, we showed that $\{r_k\}$ is a rearrangement of
$\{1, 2, \dots, \frac{p-1}{2}\}$, and therefore this means that $\{2r_k\}$ consists of the even integers $\{2, 4, \dots, p-1\}$.
Note that if $\varepsilon_k = 1$, then $r'_k$ must be even. Otherwise, if $\varepsilon_k = -1$, then $r'_k$ is congruent to
a negative even integer mod $p$, which is congruent to a positive odd integer because $p$ is odd!
Therefore, $\prod_{k=1}^{(p-1)/2} (-1)^{r'_k} = (-1)^{\mu}$. This means that $m \equiv \mu \bmod 2$, so we’re done by Gauss’s
lemma.
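Both lemmas are easy to test numerically against Euler’s criterion. The following is a minimal Python sketch, not part of the original notes; the function names `legendre`, `gauss_mu` and `eisenstein_m` are our own labels, and the code assumes that $p$ is an odd prime with $p \nmid a$.

```python
def legendre(a, p):
    """Legendre symbol (a/p) via Euler's criterion; assumes p is an odd prime, p does not divide a."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def gauss_mu(a, p):
    """mu from Gauss's lemma: number of k in 1..(p-1)/2 with eps_k = -1,
    i.e. whose least positive residue of a*k mod p exceeds p/2."""
    return sum(1 for k in range(1, (p - 1) // 2 + 1) if (a * k) % p > p // 2)

def eisenstein_m(a, p):
    """m from Eisenstein's lemma: sum of floor(2*a*k/p) for k in 1..(p-1)/2."""
    return sum((2 * a * k) // p for k in range(1, (p - 1) // 2 + 1))

# Compare the three computations for a few small odd primes.
for p in (3, 5, 7, 11, 13, 101):
    for a in range(1, p):
        assert legendre(a, p) == (-1) ** gauss_mu(a, p) == (-1) ** eisenstein_m(a, p)
```

Only the parities of `gauss_mu` and `eisenstein_m` matter; the two counts themselves are generally different.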
The above lemmas are not particularly enlightening, and do not seem very useful on their own.
The brilliance in Eisenstein’s proof is that the sum in the previous proposition has a very
simple geometric interpretation, which leads to a very clean geometric proof of quadratic
reciprocity.

Theorem 5.2.2 (Quadratic Reciprocity). Let $p, q$ be distinct odd primes. Then
$(\frac{p}{q})(\frac{q}{p}) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}$.
Proof. The key observation is that the sum $\sum_{k=1}^{(p-1)/2} \lfloor \frac{2q}{p} k \rfloor$ counts the number of lattice points
in the interior of the triangle with vertices $(0, 0)$, $(p, 0)$ and $(p, q)$ that have even x-coordinate
(this is an exercise!).

Consider the rectangle with vertices (0, 0), (p, 0), (0, q), (p, q). There are p − 1 columns,
each with q − 1 lattice points, and these correspond to all the lattice points inside the rectan-
gle. Consider the diagonal from (0, 0) to (p, q). Since p and q are relatively prime, there are
no lattice points on this diagonal. Therefore, for each column, the number of lattice points
above the diagonal has the same parity as the number of lattice points below the diagonal,
because each column has an even number of points.

Consider the triangle with vertices $(\frac{p}{2}, \frac{q}{2})$, $(\frac{p}{2}, q)$, $(p, q)$. There is a bijection between the
lattice points in the interior of this triangle with even x-coordinate and lattice points in
the interior of the triangle whose vertices are $(0, 0)$, $(\frac{p}{2}, 0)$, $(\frac{p}{2}, \frac{q}{2})$ that have odd x-coordinate,
given by reflecting across the line $y = \frac{q}{2}$ and then across the line $x = \frac{p}{2}$. Putting this together,
the number of lattice points with even x-coordinate inside the large triangle that are in the right
half of the rectangle has the same parity as the number of lattice points with even x-coordinate
inside the upper triangle, which is the same as the number of lattice
points in the lower left triangle with odd x-coordinate. Hence $\sum_{k=1}^{(p-1)/2} \lfloor \frac{2q}{p} k \rfloor$ has the same parity as the total
number of lattice points inside the lower left triangle.

By symmetry, $\sum_{k=1}^{(q-1)/2} \lfloor \frac{2p}{q} k \rfloor$ has the same parity as the total number of lattice points inside the triangle
with vertices $(0, 0)$, $(0, \frac{q}{2})$, $(\frac{p}{2}, \frac{q}{2})$, so by Eisenstein’s lemma
$(\frac{p}{q})(\frac{q}{p}) = (-1)^{\sum_{k=1}^{(p-1)/2} \lfloor \frac{2q}{p} k \rfloor + \sum_{k=1}^{(q-1)/2} \lfloor \frac{2p}{q} k \rfloor}$.

Modulo 2, this exponent is the number of lattice points inside the rectangle whose vertices are
$(0, 0)$, $(\frac{p}{2}, 0)$, $(0, \frac{q}{2})$, $(\frac{p}{2}, \frac{q}{2})$, which is simply $\frac{p-1}{2} \cdot \frac{q-1}{2}$. This proves $(\frac{p}{q})(\frac{q}{p}) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}$, which
is what we wanted.
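The counting argument can be checked by brute force for small primes. The Python sketch below is illustrative only; the helper names are hypothetical. It compares the parity of each floor sum with a direct count of lattice points in the corresponding small triangle, and checks that the two triangles together fill the $\frac{p-1}{2} \times \frac{q-1}{2}$ rectangle.

```python
def eisenstein_sum(a, p):
    """Sum of floor(2*a*k/p) for k = 1..(p-1)/2, the exponent in Eisenstein's lemma."""
    return sum((2 * a * k) // p for k in range(1, (p - 1) // 2 + 1))

def points_in_lower_left_triangle(p, q):
    """Lattice points strictly inside the triangle (0,0), (p/2,0), (p/2,q/2)."""
    return sum(1
               for x in range(1, (p - 1) // 2 + 1)
               for y in range(1, (q - 1) // 2 + 1)
               if p * y < q * x)          # strictly below the diagonal y = q*x/p

for p, q in [(3, 5), (5, 7), (7, 11), (11, 13), (13, 17)]:
    below = points_in_lower_left_triangle(p, q)
    above = points_in_lower_left_triangle(q, p)   # mirror image: the upper left triangle
    assert eisenstein_sum(q, p) % 2 == below % 2
    assert eisenstein_sum(p, q) % 2 == above % 2
    assert below + above == ((p - 1) // 2) * ((q - 1) // 2)
```

The last assertion uses that no lattice point of the small rectangle lies on the diagonal, since $p$ and $q$ are coprime.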
Armed with quadratic reciprocity, we can now see some applications.

Example 5.2.3. The quadratic congruence $2x^2 + 5x - 9 \equiv 0 \bmod 101$ has discriminant
$5^2 - 4 \cdot 2 \cdot (-9) = 97$, so by the quadratic formula it has a solution if and
only if the expression $\sqrt{97}$ makes sense in $\mathbb{F}_{101}$. This is equivalent
to computing $(\frac{97}{101})$. Since $101 \equiv 1 \bmod 4$, by quadratic reciprocity, $(\frac{97}{101}) = (\frac{101}{97}) = (\frac{4}{97}) = 1$,
so the congruence is solvable. The solutions are given by $x \equiv \frac{-5 \pm \sqrt{97}}{4} \bmod 101$, which after
doing the computation yields $x \equiv 19, 29 \bmod 101$.
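For a modulus this small, the answer can be confirmed by simply trying every residue. This is a brute-force Python sketch (the function name `solve_quadratic_mod` is our own), not an efficient method for large moduli.

```python
def solve_quadratic_mod(a, b, c, n):
    """All residues x in 0..n-1 with a*x^2 + b*x + c ≡ 0 (mod n), found by exhaustive search."""
    return [x for x in range(n) if (a * x * x + b * x + c) % n == 0]

print(solve_quadratic_mod(2, 5, -9, 101))
```

Here $\sqrt{97} \equiv \pm 20 \bmod 101$, since $20^2 = 400 \equiv 97 \bmod 101$.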
Example 5.2.4. To compute $(\frac{79}{101})$, we repeatedly apply quadratic reciprocity over and
over until we arrive at a symbol we can easily calculate. We have
$(\frac{79}{101}) = (\frac{101}{79}) = (\frac{22}{79}) = (\frac{2}{79})(\frac{11}{79}) = (\frac{11}{79}) = -(\frac{79}{11}) = -(\frac{2}{11}) = 1$.

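Example 5.2.5. For which primes $p \neq 2, 5$ is $-5 \equiv \square \bmod p$? Equivalently, for which primes
$p$ is $(\frac{-5}{p}) = 1$? Factoring the Legendre symbol, we want to find all primes $p$ such that
$(\frac{-1}{p})(\frac{5}{p}) = 1$, so we need $(\frac{-1}{p}) = (\frac{5}{p}) = 1$ or $(\frac{-1}{p}) = (\frac{5}{p}) = -1$. By quadratic reciprocity,
$(\frac{5}{p}) = (\frac{p}{5}) \equiv p^2 \bmod 5$. Therefore, we see that $(\frac{5}{p}) = 1$ for $p \equiv \pm 1 \bmod 5$ and $(\frac{5}{p}) = -1$ for
$p \equiv \pm 2 \bmod 5$. Since $(\frac{-1}{p}) = 1$ for $p \equiv 1 \bmod 4$ and $(\frac{-1}{p}) = -1$ for $p \equiv 3 \bmod 4$, we get the
following systems
$$\begin{cases} p \equiv 1 \bmod 4 \\ p \equiv \pm 1 \bmod 5 \end{cases} \qquad \begin{cases} p \equiv 3 \bmod 4 \\ p \equiv \pm 2 \bmod 5 \end{cases}$$
resulting in $p \equiv 1, 3, 7, 9 \bmod 20$.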
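The classification mod 20 can be spot-checked by brute force. The sketch below is in Python with hypothetical helper names; it tests every prime below 300 other than 2 and 5.

```python
def is_prime(n):
    """Trial division; adequate for the small range used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def is_square_mod(a, p):
    """True if x^2 ≡ a (mod p) has a solution with x not divisible by p."""
    return any((x * x - a) % p == 0 for x in range(1, p))

for p in range(3, 300):
    if is_prime(p) and p != 5:
        assert is_square_mod(-5, p) == (p % 20 in (1, 3, 7, 9))
```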

5.3 Jacobi reciprocity
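Definition 5.3.1. Let $n > 1$ be odd, and write $n = p_1^{e_1} \cdots p_k^{e_k}$. For any $a$ with $\gcd(a, n) = 1$,
the Jacobi symbol $(\frac{a}{n})$ is defined by
$$\left(\frac{a}{n}\right) = \left(\frac{a}{p_1}\right)^{e_1} \cdots \left(\frac{a}{p_k}\right)^{e_k}$$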
The Jacobi symbol is then an extension of the Legendre symbol to non-prime denom-
inator. It should be clear why such a definition would be interesting: the Jacobi symbol
allows factorization of both the numerator and the denominator, so the computations are
potentially easier.
Proposition 45. Let $m, n > 1$ be odd and $a, b$ with $\gcd(a, mn) = \gcd(b, n) = 1$. Then the
following properties hold:
1. $(\frac{ab}{n}) = (\frac{a}{n})(\frac{b}{n})$
2. $(\frac{a}{mn}) = (\frac{a}{m})(\frac{a}{n})$
Proof. This is immediate from the definition of the Jacobi symbol.
Unlike the Legendre symbol, the Jacobi symbol does not detect if $a \equiv \square \bmod n$. For
example, $(\frac{2}{15}) = (\frac{2}{3})(\frac{2}{5}) = (-1) \cdot (-1) = 1$ by the above proposition, but obviously $x^2 \equiv 2 \bmod 15$
is not solvable because it’s not solvable mod 3 or mod 5.

However, it is true that if $(\frac{a}{n}) = -1$, then $a \not\equiv \square \bmod n$, simply because this means
$x^2 \equiv a \bmod p_i$ is not solvable for some prime factor $p_i$ of $n$.
Example 5.3.2. $(\frac{28}{45}) = (\frac{2}{45})^2 (\frac{7}{3})^2 (\frac{7}{5}) = -1$, so $x^2 \equiv 28 \bmod 45$ is not solvable. The obstruction
comes from $x^2 \equiv 28 \equiv 3 \bmod 5$ not being solvable.
Much like Legendre symbols, Jacobi symbols have a version of reciprocity.
Theorem 5.3.3 (Jacobi Reciprocity). Let $n, m > 1$ be odd integers that are co-prime. Then
the following hold:
1. $(\frac{-1}{n}) = (-1)^{\frac{n-1}{2}}$

2. $(\frac{2}{n}) = (-1)^{\frac{n^2-1}{8}}$

3. $(\frac{n}{m})(\frac{m}{n}) = (-1)^{\frac{m-1}{2} \cdot \frac{n-1}{2}}$

We postpone the proof until after the following remark. The main application of Jacobi
symbols is that they make calculating Legendre symbols easier. The Legendre symbol requires
one to factor the numerator before being able to apply quadratic reciprocity, whereas
Jacobi symbols do not require factoring to apply Jacobi reciprocity unless the numerator is
even! It’s easy to compute the exponent of 2 in the factorization of any integer, so this is not
computationally an issue. As there are no known efficient factoring algorithms, this means using
Jacobi reciprocity is, in general, going to be superior.

Example 5.3.4. Suppose we wanted to compute $(\frac{1001}{9907})$ using Jacobi reciprocity. We just continually
flip the symbol, and factor if even integers appear. We have the following sequence
of steps: $(\frac{1001}{9907}) = (\frac{898}{1001}) = (\frac{2}{1001})(\frac{449}{1001}) = (\frac{449}{1001}) = (\frac{103}{449}) = (\frac{37}{103}) = (\frac{29}{37}) = (\frac{8}{29}) = (\frac{2}{29})^3 = -1$.
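The flip-and-reduce procedure in this example can be written as a short loop. The following Python sketch is one standard way to organize it, offered as an illustration rather than as the notes’ own algorithm; the function name `jacobi` is our choice. It uses only the three rules of Theorem 5.3.3 and never factors anything except powers of 2.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0; returns 0 when gcd(a, n) > 1."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        # Strip factors of 2 using (2/n) = (-1)^((n^2 - 1)/8),
        # which is -1 exactly when n ≡ 3 or 5 mod 8.
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        # Flip the symbol; the sign changes only when both entries are 3 mod 4.
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

print(jacobi(1001, 9907))
```

The loop terminates because each flip followed by a reduction strictly decreases the denominator, just as in the Euclidean algorithm.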
We now proceed to the proof of Jacobi reciprocity. Since the Jacobi symbol is defined in
terms of the Legendre symbol, the proof will not be very hard.
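Lemma 5.3.5. For any $a, b > 1$ odd and $e \ge 1$ we have:
1. $\frac{ab-1}{2} \equiv \frac{a-1}{2} + \frac{b-1}{2} \bmod 2$
2. $\frac{a^2b^2-1}{8} \equiv \frac{a^2-1}{8} + \frac{b^2-1}{8} \bmod 2$
3. $\frac{a^e-1}{2} \equiv e\,\frac{a-1}{2} \bmod 2$
4. $\frac{a^{2e}-1}{8} \equiv e\,\frac{a^2-1}{8} \bmod 2$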
Proof.
1. Since $a, b$ are odd we have $(a-1)(b-1) \equiv 0 \bmod 4$. This means $ab - 1 \equiv (a-1) + (b-1) \bmod 4$,
so dividing by 2 yields the result.
2. The proof is very similar: note that $(a^2-1)(b^2-1) \equiv 0 \bmod 16$ so that $a^2b^2 - 1 \equiv (a^2-1) + (b^2-1) \bmod 16$.
The result follows upon dividing by 8.
3. We have $a^e - 1 = (a-1)(1 + a + \dots + a^{e-1})$. Since $a$ is odd, $a - 1$ is even and
$1 + a + \dots + a^{e-1} \equiv e \bmod 2$, so $a^e - 1 \equiv e(a-1) \bmod 4$. Divide by 2
to get $\frac{a^e-1}{2} \equiv e\,\frac{a-1}{2} \bmod 2$.
4. Identical to 3., just work mod 16 and divide by 8 instead.

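Corollary 5.3.6. Let $a_1, \dots, a_k$ be odd positive integers and $e_1, \dots, e_k \ge 1$.

1. $\sum_{i=1}^{k} e_i \frac{a_i-1}{2} \equiv \frac{(a_1^{e_1} \cdots a_k^{e_k}) - 1}{2} \bmod 2$
2. $\sum_{i=1}^{k} e_i \frac{a_i^2-1}{8} \equiv \frac{(a_1^{e_1} \cdots a_k^{e_k})^2 - 1}{8} \bmod 2$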
Proof. Both statements easily follow by induction and using the previous lemma.
Proof. (of Jacobi reciprocity)
1. Write $n = p_1^{e_1} \cdots p_k^{e_k}$. By definition, we have
$(\frac{-1}{n}) = (\frac{-1}{p_1})^{e_1} \cdots (\frac{-1}{p_k})^{e_k} = (-1)^{\sum_{i=1}^{k} e_i \frac{p_i-1}{2}} = (-1)^{\frac{n-1}{2}}$ by the above corollary.
2. Same as the above, except using the other part of the corollary.
3. Let $m = p_1^{e_1} \cdots p_k^{e_k}$ and $n = q_1^{f_1} \cdots q_\ell^{f_\ell}$ be the prime factorizations of $m$ and $n$ respectively.
Factoring and using quadratic reciprocity, we have
$(\frac{n}{m})(\frac{m}{n}) = (-1)^{\sum_{i=1}^{k} \sum_{j=1}^{\ell} e_i f_j \frac{p_i-1}{2} \cdot \frac{q_j-1}{2}} = (-1)^{\frac{m-1}{2} \cdot \frac{n-1}{2}}$ by applying the corollary twice.

5.4 Squares mod n
To wrap up the discussion of quadratic congruences, we’ll talk briefly about solutions to
quadratic congruences mod $n$. For a general quadratic congruence $ax^2 + bx + c \equiv 0 \bmod n$,
if $\gcd(n, 2a) = 1$ then by the quadratic formula, the congruence will have a solution if and
only if $b^2 - 4ac$ is a square mod $n$. Under these assumptions, understanding when quadratic
congruences are solvable reduces to understanding when quadratic congruences of the form
$x^2 \equiv a \bmod n$ are solvable.

For the sake of keeping the analysis simple, we’ll investigate what happens when a is odd,
so that the headaches involved with trying to solve quadratic equations mod powers of two
are kept to a minimum.
Proposition 46. Let $e \ge 3$ and let $a$ be odd. Then $x^2 \equiv a \bmod 2^e$ is solvable if and only if $a \equiv 1 \bmod 8$.
Proof. The forward direction is clear, as if $x^2 \equiv a \bmod 2^e$ then $x^2 \equiv a \bmod 8$ and all odd
integers square to 1 mod 8. Conversely, we induct on $e$. If $e = 3$ and $a \equiv 1 \bmod 8$, then obviously $x^2 \equiv 1 \bmod 8$
is solvable. Now, assume that for some $k \ge 3$ we know the statement “$a \equiv 1 \bmod 8 \implies
x^2 \equiv a \bmod 2^k$ is solvable”. We need to then construct a solution mod
$2^{k+1}$. If $x$ were a solution to $x^2 \equiv a \bmod 2^{k+1}$, then necessarily we must have $x^2 \equiv a \bmod 2^k$. By the
induction hypothesis, this has a solution, say $c^2 \equiv a \bmod 2^k$. This means that $a = c^2 + 2^k \ell$
for some $\ell$. The claim is that $c' = c + 2^{k-1} \ell$ is a solution to the desired congruence. Indeed,
squaring we find that $(c')^2 = c^2 + 2^k c\ell + 2^{2k-2} \ell^2 \equiv a + 2^k \ell (c - 1) \bmod 2^{k+1}$, as $2k - 2 \ge k + 1$
for $k \ge 3$. Note that $c$ must be odd, as $a$ is odd, which means that $2 \mid c - 1$. This then
shows that $(c')^2 \equiv a \bmod 2^{k+1}$, as desired. Therefore by induction, if $a \equiv 1 \bmod 8$ then
$x^2 \equiv a \bmod 2^e$ is solvable for all $e \ge 3$.
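The induction in this proof is constructive: each step turns a square root mod $2^k$ into one mod $2^{k+1}$. A minimal Python sketch of that lifting step follows; the function name is hypothetical, and the code assumes $e \ge 3$ and $a \equiv 1 \bmod 8$ as in the proposition.

```python
def sqrt_mod_power_of_two(a, e):
    """Return c with c*c ≡ a (mod 2**e), assuming e >= 3 and a ≡ 1 (mod 8)."""
    if e < 3 or a % 8 != 1:
        raise ValueError("need e >= 3 and a congruent to 1 mod 8")
    c, k = 1, 3                     # base case: 1^2 ≡ a mod 8
    while k < e:
        ell = (a - c * c) >> k      # a = c^2 + 2^k * ell (the division is exact)
        c = (c + (ell << (k - 1))) % (1 << (k + 1))   # c' = c + 2^(k-1) * ell
        k += 1
    return c

c = sqrt_mod_power_of_two(17, 10)
assert (c * c - 17) % 2**10 == 0
```

The returned root is one of several; for $e \ge 3$ an odd square has four square roots mod $2^e$.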
Theorem 5.4.1. Let $n = 2^e p_1^{e_1} \cdots p_k^{e_k}$ be the prime factorization of $n$, and suppose that
$\gcd(a, n) = 1$. Then the congruence $x^2 \equiv a \bmod n$ is solvable if and only if $(\frac{a}{p_i}) = 1$ for all
$i$, $a \equiv 1 \bmod 4$ if $e = 2$, and $a \equiv 1 \bmod 8$ if $e \ge 3$.
Proof. If $x^2 \equiv a \bmod n$ is solvable then clearly $x^2 \equiv a \bmod p_i$ is solvable for all $i$. If $e = 2$ then
$a \equiv 1 \bmod 4$ because all odd integers square to 1 mod 4, and if $e \ge 3$ then $a \equiv 1 \bmod 8$
because all odd integers square to 1 mod 8.

Conversely, suppose that $(\frac{a}{p_i}) = 1$ for all $i$. Then by Hensel’s lemma, for each $i$ there
are exactly two solutions to $x^2 \equiv a \bmod p_i^{e_i}$, given by lifting up the two solutions to the
congruence $x^2 \equiv a \bmod p_i$. If $e \le 1$, then $x^2 \equiv a \bmod 2^e$ is trivially solvable.
If $e = 2$ and $a \equiv 1 \bmod 4$, then clearly $x^2 \equiv a \bmod 4$ is solvable.
If $e \ge 3$ and $a \equiv 1 \bmod 8$ then by the above proposition, $x^2 \equiv a \bmod 2^e$ is solvable. Therefore, we have
$x^2 \equiv a \bmod 2^e$ is solvable and $x^2 \equiv a \bmod p_i^{e_i}$ are all solvable, so by the Chinese remainder
theorem, we must have $x^2 \equiv a \bmod n$ is solvable by gluing together a solution to each
congruence.

5.5 Exercises
1. Determine if the congruence $3x^2 + 6x + 5 \equiv 0 \bmod 89$ is solvable.

2. Use Jacobi reciprocity to compute the Legendre symbols $(\frac{113}{997})$, $(\frac{514}{1093})$, and $(\frac{401}{757})$.

3. Use quadratic reciprocity to find all primes $p$ such that $(\frac{7}{p}) = 1$. Do the same for
$(\frac{15}{p}) = 1$.

4. Find a prime number $p$ that can be simultaneously expressed in all three of the following
forms: $x^2 + y^2$, $x^2 + 2y^2$, $x^2 + 3y^2$.

5. Show that for any prime $p$, the polynomial $f(x) = x^4 - 10x^2 + 1 \in \mathbb{Z}[x]$ is reducible in
$\mathbb{F}_p[x]$.

6. Recall that a Mersenne prime is a prime of the form $2^p - 1$, and a Fermat prime is a
prime of the form $2^{2^n} + 1$. (Refer to exercise 4.7.16).
(a) Let $p$ be a prime dividing $2^{2^n} + 1$. Use Legendre symbols to prove that
$p \equiv 1 \bmod 2^{n+2}$ (Hint: show that $2^{n+1} \mid \frac{p-1}{2}$).
(b) Show that if $p$ is an odd prime, then every prime divisor $q$ of $2^p - 1$ must satisfy
$q \equiv \pm 1 \bmod 8$.
(c) Prove by hand that 217 − 1 is not prime.
7. (a) Let $p$ be an odd prime. Prove that $\sum_{a=1}^{p-1} (\frac{a}{p}) = 0$.
(b) Let $p$ be a prime with $p \equiv 1 \bmod 4$. Prove that $\sum_{a=1}^{p-1} a (\frac{a}{p}) = 0$.

8. Let $N = p_1 \cdots p_k$ be a product of distinct odd primes. Prove that there exists an
integer $a$ such that $(\frac{a}{N}) = -1$.

9. In this exercise, you’ll see how to prove special cases of Dirichlet’s theorem using your
knowledge of Legendre symbols!

(a) Prove there are infinitely many primes of the form $4k + 1$. (Hint: consider $N =
(2p_1 \cdots p_n)^2 + 1$.)
(b) Prove there are infinitely many primes of the form 8k + 7. (Hint: Construct a
similar choice of N which has a prime factor that is 7 mod 8.)

Chapter 6

Conic sections

6.1 Rational points on conics


Now having a solid understanding of when quadratic equations have solutions mod $n$, we
turn our attention back to the integers. It’s trivial to determine when $ax^2 + bx + c = 0$
has solutions in $\mathbb{Z}$, so our next goal will be trying to understand when the curve
$C : ax^2 + bxy + cy^2 + dx + ey + f = 0$ has solutions in $\mathbb{Z}^2$. Such curves $C$ are called conic sections.

There are many interesting problems in number theory that essentially boil down to
finding integer or rational solutions on curves. The classical example that we’ll start with is
the problem of trying to classify all Pythagorean triples, which are the integer points on the
surface $x^2 + y^2 = z^2$. We start with the following simple observation: if $(x, y, z)$ is a solution
in $\mathbb{Z}^3$ to $x^2 + y^2 = z^2$ with $z \neq 0$, then $(x/z, y/z)$ is a solution to $x^2 + y^2 = 1$. In other words, $(x/z, y/z)$
is a rational point on the unit circle $S^1$. Conversely, any rational point $(r, s)$ on $S^1$ can be
written so that both coordinates have a common denominator, say $r = \frac{a}{c}$ and $s = \frac{b}{c}$, from
which $r^2 + s^2 = 1$ results in $a^2 + b^2 = c^2$. Therefore, there is a correspondence
{Pythagorean triples} ←→ {Rational points of $S^1$}
As it turns out, there is a rather easy geometric way to find rational points on the unit
circle! First, we note that $(0, 1)$ lives on $S^1$. Consider the line $L_m : y - 1 = mx$ that passes
through $(0, 1)$ and has slope $m$. For each value of $m$, $L_m$ intersects the circle at a second point
(where we count intersection with multiplicity). To find the intersection point, we must solve
the quadratic equation $x^2 + (1 + mx)^2 = 1$. Expanding and solving for $x$ yields $x = \frac{-2m}{1+m^2}$, and
plugging back into the equation of $L_m$ says that $y = \frac{1-m^2}{1+m^2}$. If $m \in \mathbb{Q}$, then the coordinates of
this intersection point are rational, and so we’ve found a new rational point on $S^1$! On the
other hand, starting with any rational point $(r, s) \neq (0, -1) \in S^1$ (this would correspond to
an infinite slope!), the line passing through $(0, 1)$ and $(r, s)$ has slope $\frac{s-1}{r}$, which is rational.
This tells us the set of rational points on $S^1$ is $\{(\frac{-2m}{1+m^2}, \frac{1-m^2}{1+m^2}) : m \in \mathbb{Q}\} \cup \{(0, -1)\}$, and that
there is a bijection
{Rational points of $S^1$} \ $\{(0, -1)\}$ ←→ $\mathbb{Q}$
Writing $m = \frac{u}{v}$ for $\gcd(u, v) = 1$ as a fraction in lowest terms, the rational point corresponding
to $m$ is $(\frac{-2uv}{u^2+v^2}, \frac{v^2-u^2}{u^2+v^2})$. From our initial observation, this then corresponds to the
Pythagorean triple $(-2uv, v^2 - u^2, u^2 + v^2)$.
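The slope parametrization is easy to experiment with using exact rational arithmetic. The sketch below is in Python; the function names are ours and are not from the notes. It computes the second intersection point for a rational slope and the associated integer triple.

```python
from fractions import Fraction

def circle_point(m):
    """Second intersection of the line y - 1 = m*x with x^2 + y^2 = 1, for rational m."""
    m = Fraction(m)
    return (-2 * m / (1 + m * m), (1 - m * m) / (1 + m * m))

def triple_from_slope(u, v):
    """Integer triple (-2uv, v^2 - u^2, u^2 + v^2) attached to the slope m = u/v."""
    return (-2 * u * v, v * v - u * u, u * u + v * v)

x, y = circle_point(Fraction(1, 2))
assert x * x + y * y == 1
a, b, c = triple_from_slope(1, 2)
assert a * a + b * b == c * c
```

Negative entries are expected here: the triple records a point on the circle, so signs track the quadrant of that point.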

This idea can be adapted to find rational points on any conic section, provided we have
found one point to start with!

Example 6.1.1. Suppose we wanted to find all rational points on the hyperbola $x^2 - y^2 = 1$.
We note that $(1, 0)$ is such a point, and that for $m \in \mathbb{Q}$, the line $L_m : y = m(x - 1)$ passing
through $(1, 0)$ has at most one other intersection point with the hyperbola. The coordinates
of this intersection come from solving the equation $x^2 - m^2(x - 1)^2 = 1$, which is the quadratic
equation $(1 - m^2)x^2 + 2m^2 x - (m^2 + 1) = 0$. For $m = \pm 1$, we just recover the point $(1, 0)$
(which is because the hyperbola has asymptotes of $y = \pm x$!). For $m \neq \pm 1$, we may divide
to write this as $x^2 + \frac{2m^2}{1-m^2} x - \frac{m^2+1}{1-m^2} = 0$.

There’s a trick to avoid messy algebra from trying to use the quadratic formula. For a
general quadratic equation $x^2 + ax + b = 0$, there’s a relation between the roots and the coefficients
of the polynomial: for two roots $r_1, r_2 \in \mathbb{C}$, the factorization $x^2 + ax + b = (x - r_1)(x - r_2)$
tells us that $r_1 r_2 = b$ and $r_1 + r_2 = -a$. We know that one of the roots must be $r_1 = 1$. Why?
Solving the quadratic equation tells us the x-coordinates of the points on the hyperbola that
the line passes through. By construction, one of those points is $(1, 0)$! Therefore, we find that
the roots are $r_1 = 1$ and $r_2 = \frac{m^2+1}{m^2-1}$. Therefore, the intersection point is $(\frac{m^2+1}{m^2-1}, \frac{2m}{m^2-1})$.
Once more, we note that rational choices of $m$ produce rational points, and any rational
point must come from a rational slope. This tells us that the set of rational points is
$\{(\frac{m^2+1}{m^2-1}, \frac{2m}{m^2-1}) : m \neq \pm 1 \in \mathbb{Q}\} \cup \{(1, 0)\}$.
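As a quick check of the formula for the second intersection point, here is a small Python sketch using exact fractions; `hyperbola_point` is a hypothetical helper name.

```python
from fractions import Fraction

def hyperbola_point(m):
    """Second intersection of y = m*(x - 1) with x^2 - y^2 = 1, for rational m != ±1."""
    m = Fraction(m)
    if m * m == 1:
        raise ValueError("slopes +1 and -1 are parallel to the asymptotes")
    return ((m * m + 1) / (m * m - 1), 2 * m / (m * m - 1))

for m in (Fraction(0), Fraction(2), Fraction(1, 2), Fraction(-3, 7)):
    x, y = hyperbola_point(m)
    assert x * x - y * y == 1      # the point lies on the hyperbola
    assert y == m * (x - 1)        # and on the line L_m
```

The slope $m = 0$ gives the point $(-1, 0)$, while $(1, 0)$ itself corresponds to the vertical tangent line.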

That this procedure works is not entirely obvious. For the conic $C : ax^2 + bxy + cy^2 +
dx + ey + f = 0$, one may prove that there is a new coordinate system given by an invertible
affine change of variables such that, with respect to these new coordinates $X$ and $Y$, $C$
takes on one of three “reduced” forms: $X^2 + BY^2 = D$, $X^2 = Y$, or $X^2 = E$ for some
$B, D, E \in \mathbb{Z}$ with $B \neq 0$. The proof of this fact is rather technical and not enlightening, so
we just take it for granted. From there, however, we may observe that the procedure works
for any of these three reduced forms by using the relation between roots and coefficients of a
quadratic: the quadratic whose roots give the intersection points of the conic $C$ and $L_m$
has rational coefficients, and one root is, by
assumption, rational, so the other must be as well.

There is another method for finding Pythagorean triples, using algebra instead of geometry.
The idea comes from the observation that if $x^2 + y^2 = z^2$ in $\mathbb{Z}^3$, then this means
$(x + iy)(x - iy) = z^2$ in $\mathbb{Z}[i]$, and so this is now a question about factorization in $\mathbb{Z}[i]$.

Theorem 6.1.2. Let $(x, y, z)$ be a primitive Pythagorean triple (with $x$ odd and $y$ even, after
swapping $x$ and $y$ if necessary). Then $x = m^2 - n^2$, $y = 2mn$
and $z = m^2 + n^2$ for some integers $m, n$. For any choice of $m > n > 0$ with $\gcd(m, n) = 1$ and
$m \not\equiv n \bmod 2$, the
resulting values of $x, y, z$ form a primitive Pythagorean triple, which is uniquely determined
by $(m, n)$.

Proof. Suppose that $(x, y, z)$ is a primitive Pythagorean triple, so that $x^2 + y^2 = z^2$. First, we
note that $z$ must be odd. If it were even, then working mod 4 we must have $x^2 + y^2 \equiv 0 \bmod 4$.
The only way this is possible is if $x, y \equiv 0 \bmod 2$, which would mean both $x, y$ are even,
contradicting that $(x, y, z)$ is primitive. As a consequence, exactly one of $x, y$ can be even.
Without loss of generality, we will assume that $x$ is odd and $y$ is even.

In $\mathbb{Z}[i]$, this means that $(x + iy)(x - iy) = \alpha\bar{\alpha} = z^2$, where $\alpha = x + iy$. First, observe that $\alpha$ and $\bar{\alpha}$ are
relatively prime in $\mathbb{Z}[i]$. This is because if $\delta$ is a greatest common divisor of $\alpha$ and $\bar{\alpha}$, then
$\delta \mid \alpha \pm \bar{\alpha}$, so $\delta \mid 2\gcd(x, y) = 2$. The factorization of 2 in $\mathbb{Z}[i]$ is $2 = -i(1 + i)^2$, so the only
prime divisor of 2 is (up to unit multiple) $1 + i$. Could $1 + i$ be a common divisor? The
answer is no. If $1 + i \mid \alpha$ then taking norms, $2 \mid N(\alpha) = z^2$ which is not possible, as $z^2$ is
odd. Therefore, $\gcd(\alpha, \bar{\alpha})$ must be a unit. Since $\alpha\bar{\alpha} = z^2$ and $\alpha, \bar{\alpha}$ share no prime factors
in common, by unique factorization the only way this is possible is if both $\alpha, \bar{\alpha}$ are unit multiples of perfect
squares in $\mathbb{Z}[i]$. The unit $-1 = i^2$ is itself a square, and $\alpha = \pm i(m + ni)^2$ would force $x = \mp 2mn$
to be even, so we may write $\alpha = (m + ni)^2$ for some integers $m, n$. Equating real
and imaginary parts yields $x = m^2 - n^2$ and $y = 2mn$. This says $z^2 = x^2 + y^2 = (m^2 + n^2)^2$.
As $z > 0$, this means $z = m^2 + n^2$.

That all such choices $x = m^2 - n^2$, $y = 2mn$, $z = m^2 + n^2$ for $m > n > 0$, $\gcd(m, n) = 1$ and $m \not\equiv n \bmod 2$
actually are solutions to $x^2 + y^2 = z^2$ is easy to verify. Must $(x, y, z)$ be primitive? If it were
not, suppose that $p$ is a common prime factor. As $m$ and $n$ have opposite parity, $z$ is odd, so we must have $p \neq 2$. Therefore,
from $p \mid y = 2mn$ we must have $p \mid m$ or $p \mid n$. Either way, $m^2 \equiv n^2 \bmod p$ (which holds because $p \mid x$) would mean both
$m, n \equiv 0 \bmod p$, which is a contradiction. Therefore, $(x, y, z)$ is indeed a primitive triple.
Finally, from the above paragraph we see that $(m, n)$ corresponds to one of the square roots
of $x + yi \in \mathbb{Z}[i]$. The other square root is $-m - ni$, which is not included as a possibility.
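The parametrization gives a direct way to list primitive triples. The following Python sketch (the generator name is our own) enumerates pairs $m > n > 0$ that are coprime and of opposite parity, which is the condition that makes $z = m^2 + n^2$ odd, and yields the resulting triples up to a bound on $z$.

```python
from math import gcd

def primitive_triples(limit):
    """Yield (m^2 - n^2, 2mn, m^2 + n^2) for coprime m > n > 0 of opposite parity, with z <= limit."""
    m = 2
    while m * m + 1 <= limit:
        for n in range(1, m):
            if gcd(m, n) == 1 and (m - n) % 2 == 1 and m * m + n * n <= limit:
                yield (m * m - n * n, 2 * m * n, m * m + n * n)
        m += 1

for x, y, z in primitive_triples(30):
    assert x * x + y * y == z * z
    assert gcd(gcd(x, y), z) == 1
```

Dropping the parity condition would admit pairs such as $(m, n) = (3, 1)$, which produces the non-primitive triple $(8, 6, 10)$.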

6.2 Sums of squares


What about finding integer points on circles? This is harder! We’ll start with a famous
question of Fermat: For integer $n > 0$, when does the equation $x^2 + y^2 = n$ have integer
solutions? Equivalently, when is $n$ a sum of two squares in $\mathbb{Z}$?

The first observation is the following:

Theorem 6.2.1 (Fermat). Let $p \in \mathbb{Z}$ be an odd prime. Then $x^2 + y^2 = p$ has integer
solutions if and only if $p \equiv 1 \bmod 4$.

Proof. Firstly, suppose that $x^2 + y^2 = p$ has solutions. Then working mod $p$, this means
that $x^2 + y^2 \equiv 0 \bmod p$. Since neither $x$ nor $y$ is $\equiv 0 \bmod p$, this means that $(x/y)^2 \equiv -1 \bmod p$,
so $(\frac{-1}{p}) = 1$, which means that $p \equiv 1 \bmod 4$. On the other hand, suppose that $p \equiv 1 \bmod 4$,
so $x^2 \equiv -1 \bmod p$ is solvable. This means there is some integer $x$ with $p \mid x^2 + 1$ in $\mathbb{Z}$,
or in $\mathbb{Z}[i]$, $p \mid (x + i)(x - i)$. If $p$ were prime in $\mathbb{Z}[i]$, this would then mean that $p \mid x \pm i$, which
is clearly impossible. Therefore, $p$ is not prime in $\mathbb{Z}[i]$, so we may write $p = \alpha\beta$ as a non-trivial
factorization in $\mathbb{Z}[i]$. Taking norms, this means $p^2 = N(\alpha)N(\beta)$ in $\mathbb{Z}$. Since the factorization
is non-trivial, this forces $N(\alpha) = N(\beta) = p$. Writing $\alpha = x + yi$, this says $x^2 + y^2 = p$ as
desired.
The proof above gives us a complete classification of how primes split in Z[i]:

Corollary 6.2.2. Let $p \in \mathbb{Z}$ be a prime. The factorization of $p$ in $\mathbb{Z}[i]$ is:

• $p = (1 + i)(1 - i) = -i(1 + i)^2$ if $p = 2$.

• $p = \pi\bar{\pi}$ for a Gaussian prime $\pi$ if $p \equiv 1 \bmod 4$.

• $p$ remains prime in $\mathbb{Z}[i]$ if $p \equiv 3 \bmod 4$.

Proof.

• Obvious.

• The above proof shows that if $p \equiv 1 \bmod 4$ then $p = N(\pi) = \pi\bar{\pi}$ for some Gaussian
integer $\pi$, which is then prime because its norm is prime.

• Once more, the above proof shows that if $p$ factored it would have to be a sum of two
squares, which is impossible for $p \equiv 3 \bmod 4$.

The splitting of integer primes p in Z[i] then tells us a complete classification of all primes
in Z[i], after we observe the following:

Proposition 47. Let π ∈ Z[i] be a prime. Then π | p in Z[i] for some prime p ∈ Z.

Proof. $N(\pi) = \pi\bar{\pi}$ is an integer, and therefore is a product of primes in $\mathbb{Z}$. Since $\pi$ is prime in
$\mathbb{Z}[i]$ and $\pi \mid N(\pi)$, this means $\pi$ divides one of these primes by Euclid’s lemma.

Corollary 6.2.3. Let $\pi \in \mathbb{Z}[i]$ be prime. Then up to unit multiple, $\pi$ is one of the following:

• $1 + i$.

• $\alpha$ or $\bar{\alpha}$, where $\alpha \in \mathbb{Z}[i]$ satisfies $N(\alpha) = p$ for some prime $p \in \mathbb{Z}$ with $p \equiv 1 \bmod 4$.

• $p$ for $p \in \mathbb{Z}$ prime with $p \equiv 3 \bmod 4$.

Proof. Follows immediately from the above proposition and corollary.


Returning back to the geometry, we’re ready to prove the following:

Theorem 6.2.4 (Fermat). Let $n > 1$ be an integer. Then $x^2 + y^2 = n$ has integer solutions
if and only if for every prime $p \mid n$ with $p \equiv 3 \bmod 4$, we have $v_p(n)$ is even.

Proof. Write $n = 2^e p_1^{e_1} \cdots p_k^{e_k} q_1^{f_1} \cdots q_\ell^{f_\ell}$, where $p_i \equiv 1 \bmod 4$ and $q_j \equiv 3 \bmod 4$. By the
previous corollary, we know that $p_i = \pi_i \bar{\pi}_i$ for some Gaussian prime $\pi_i$. Therefore, we see that
$2^e p_1^{e_1} \cdots p_k^{e_k} = N((1 + i)^e \pi_1^{e_1} \cdots \pi_k^{e_k})$, which is then a sum of two squares. So it’s sufficient to
only consider the part of the factorization coming from the $q_j$.
If all $f_j$ are even, then $q_j^{f_j} = N(q_j^{f_j/2})$, or equivalently, $q_j^{f_j} = (q_j^{f_j/2})^2 + 0^2$. Therefore,
$n = N((1 + i)^e \pi_1^{e_1} \cdots \pi_k^{e_k} q_1^{f_1/2} \cdots q_\ell^{f_\ell/2})$ so that $n$ is a sum of two squares. It remains to show
that if $f_j$ is odd for some $j$, then $n$ cannot be written as a sum of two squares.

To handle this, we proceed by induction, showing that if $n$ is a sum of two squares then every prime
factor of $n$ that is 3 mod 4 has even multiplicity. If $n = 2$, then $n$ has no prime factors that are
3 mod 4 and so there’s nothing to prove. Suppose we know for all integers $2 \le k < n$ that if
$k$ is the sum of two squares, the multiplicity of all prime factors of $k$ that are 3 mod 4 is even. If
$n$ has no prime factors that are 3 mod 4, there’s nothing to prove and we’re done. Otherwise,
assume that $n = x^2 + y^2$ has some prime factor $p$ with $p \equiv 3 \bmod 4$. As $n = x^2 + y^2$ and $p \mid n$, this
means $p \mid (x + iy)(x - iy)$ in $\mathbb{Z}[i]$. Since $p \equiv 3 \bmod 4$ we’ve seen that $p$ is prime in $\mathbb{Z}[i]$, so
this means $p \mid x \pm iy$ by Euclid’s lemma, which then says that $p \mid x, y$. If $n = p^2$ we’re done,
so assume that $n > p^2$. We may write $n = p^2((x/p)^2 + (y/p)^2)$, so $n/p^2 = (x/p)^2 + (y/p)^2$.
As $2 \le n/p^2 < n$, by the induction hypothesis, $v_q(n/p^2)$ is even for every prime $q \equiv 3 \bmod 4$, and therefore $v_q(n)$ is even as
well. By induction, we’re done.
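The theorem can be compared against a direct search for small $n$. The Python sketch below is illustrative, and both function names are hypothetical; it tests the valuation criterion against brute force for every $n$ up to 2000.

```python
from math import isqrt

def is_sum_of_two_squares(n):
    """Brute force: is n = x^2 + y^2 for some integers x, y >= 0?"""
    for x in range(isqrt(n) + 1):
        y2 = n - x * x
        y = isqrt(y2)
        if y * y == y2:
            return True
    return False

def fermat_criterion(n):
    """True iff every prime p ≡ 3 mod 4 divides n to an even power (trial division)."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            v = 0
            while n % p == 0:
                n //= p
                v += 1
            if p % 4 == 3 and v % 2 == 1:
                return False
        p += 1
    # Whatever remains is 1 or a single prime occurring to the first power.
    return n % 4 != 3

assert all(is_sum_of_two_squares(n) == fermat_criterion(n) for n in range(2, 2000))
```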

Example 6.2.5. Let $n = 45 = 3^2 \cdot 5$. By the above theorem, $n$ can be written as a sum of
two squares. The proof tells us exactly how to do this: in $\mathbb{Z}[i]$, we have $5 = N(1 + 2i)$ and
$3^2 = N(3)$, so we may write $45 = N(3 + 6i) = (3 + 6i)(3 - 6i) = 3^2 + 6^2$.

Example 6.2.6. What are all integer points on $x^2 + y^2 = 45$? From the above example,
each different prime $x + yi$ of norm 5 produces the point $(3x, 3y)$ on the circle. There are
precisely 8 different primes of norm 5: the unit multiples of the conjugate primes $1 + 2i$ and
$1 - 2i$. This results in the 8 integer points $(3, \pm 6)$, $(-3, \pm 6)$, $(6, \pm 3)$, $(-6, \pm 3)$.

Alternatively, we could have just brute forced our way to this solution: if $x^2 + y^2 = 45$ then
we clearly see that $|x| \le 6$. We can then set $y^2 = 45 - x^2$ and just plug in $x = 0, \pm 1, \dots, \pm 6$ to
find which resulting values of $y^2$ are perfect squares.
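The brute-force approach takes only a few lines of Python. This is a minimal sketch with a function name of our choosing; it tries every $x$ with $x^2 \le n$ and keeps those for which $n - x^2$ is a perfect square.

```python
from math import isqrt

def integer_points_on_circle(n):
    """All integer points (x, y) with x^2 + y^2 = n, found by trying every x with x^2 <= n."""
    points = []
    r = isqrt(n)
    for x in range(-r, r + 1):
        y2 = n - x * x
        y = isqrt(y2)
        if y * y == y2:
            points.append((x, y))
            if y != 0:
                points.append((x, -y))
    return sorted(points)

print(integer_points_on_circle(45))
```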

As before, once we’ve found one rational point, we then know all rational points on
$x^2 + y^2 = 45$.

Example 6.2.7. We’ve seen that $(3, 6)$ is an integer point. Consider $L_m : y - 6 = m(x - 3)$
of arbitrary rational slope $m$. Then as $m$ varies among the rational numbers, the intersection
of $L_m$ with $x^2 + y^2 = 45$ will hit all other rational points on the circle, with the exception of
$(3, -6)$. Solving for the intersection point, we find that the rational points on this circle are
$\{(\frac{3(m^2-4m-1)}{m^2+1}, \frac{-6(m^2+m-1)}{m^2+1}) : m \in \mathbb{Q}\} \cup \{(3, -6)\}$. Writing $m = \frac{u}{v}$ for $\gcd(u, v) = 1$, we get
$\{(\frac{3(u^2-4uv-v^2)}{u^2+v^2}, \frac{-6(u^2+uv-v^2)}{u^2+v^2}) : \gcd(u, v) = 1\} \cup \{(3, -6)\}$.
How can we find the integer points from the rational points? Firstly, we need a count.
There are certainly finitely many integer solutions to this equation. If $(x, y)$ is one solution to
$x^2 + y^2 = 45$, then so are $(\pm x, \pm y)$ and $(\pm y, \pm x)$. Since 45 is neither a square nor twice a square,
these 8 points are distinct, and therefore the number of solutions
is divisible by 8. Also, if $x^2 + y^2 = 45$ then $x^2 + y^2 \equiv 0 \bmod 3$. If one of $x, y \not\equiv 0 \bmod 3$, this
means either $(x/y)^2 \equiv -1 \bmod 3$ or $(y/x)^2 \equiv -1 \bmod 3$, which is impossible as $(\frac{-1}{3}) = -1$.
Therefore, both $x, y \equiv 0 \bmod 3$, and so $(x/3)^2 + (y/3)^2 = 5$. Therefore, it’s sufficient to
count the number of solutions to $x^2 + y^2 = 5$. Clearly, $1 \le |x|, |y| \le 2$, so there are at most 16
solutions, and it’s also obvious that there are $< 16$ solutions, so there are exactly 8 integer
points on $x^2 + y^2 = 45$. We know $(3, 6)$ is one of them, so therefore all of them are given by
$(3, \pm 6)$, $(-3, \pm 6)$, $(6, \pm 3)$, $(-6, \pm 3)$.

How could we find values of $(u, v)$ that produce these points? Taking $(u, v) = (0, \pm 1)$
results in $(-3, 6)$, so assume $u^2 + v^2 > 1$. Necessarily, we need that $u^2 + v^2 \mid 3(u^2 - 4uv - v^2)$
and $u^2 + v^2 \mid -6(u^2 + uv - v^2)$. Let $p$ be a prime factor of $u^2 + v^2$. We must have
either $p = 2$, or $p \equiv 1 \bmod 4$, and $u^2 + v^2 \equiv 0 \bmod p$. We’ll ignore the possibility of
$p = 2$ for the moment, and therefore, we require that $3(u^2 - 4uv - v^2) \equiv 0 \bmod p$, so that
$u^2 - v^2 \equiv 4uv \bmod p$. Adding with $u^2 + v^2 \equiv 0 \bmod p$ says $2u^2 \equiv 4uv \bmod p$, and subtracting
says $2v^2 \equiv -4uv \bmod p$. If both $u, v \not\equiv 0 \bmod p$, then we would have $u \equiv 2v \bmod p$ and
$v \equiv -2u \bmod p$, so that $5u \equiv 0 \bmod p$, which tells us that $p = 5$. If one of $u, v \equiv 0 \bmod p$,
then the other must be as well, which is not possible since $\gcd(u, v) = 1$, so we may just
assume that $u, v \not\equiv 0 \bmod p$. From $u^2 + v^2 \equiv 0 \bmod 5$, we have $(v/u)^2 \equiv -1 \equiv 4 \bmod 5$, so
this means $v \equiv 2u \bmod 5$ or $v \equiv 3u \bmod 5$. We also require that $u^2 + uv - v^2 \equiv 0 \bmod 5$,
but note $u^2 + uv - v^2 \equiv u^2 - 4uv - v^2 \bmod 5$ so this says nothing new.

There are 8 potential points in $(\mathbb{F}_5^{\times})^2$ that satisfy $v \equiv 2u \bmod 5$ or $v \equiv 3u \bmod 5$: these
are $(1, 3), (2, 1), (3, 4), (4, 2), (1, 2), (2, 4), (3, 1), (4, 3)$. Only four of them could actually yield
integer points: $(1, 3), (2, 1), (3, 4), (4, 2)$, because these are the ones with $v \equiv 3u \bmod 5$, which are
exactly the ones that satisfy $u^2 - 4uv - v^2 \equiv 0 \bmod 5$. Since $(u, v)$ and $(-u, -v)$ give the same slope,
it is enough to search through $(u, v)$ with $(u, v) \equiv (1, 3), (2, 1), (4, 2) \bmod 5$ to
find values that work. For example, the 7 integer points produced by the line procedure can
be found by taking $(u, v) = (0, 1), (1, -2), (1, 3), (2, 1), (-3, 1), (1, 1), (1, -1)$.

This might seem rather useless, however there is reason to investigate this solution more
closely: it reveals a connection between integer points on the circle $x^2 + y^2 = 45$ and lattice
points of certain lattices.

6.3 Sums of squares and lattices


We start with some definitions.

Definition 6.3.1. A lattice of $\mathbb{R}^n$ is $L = \mathrm{Span}_{\mathbb{Z}}\{v_1, \dots, v_n\}$ for $\mathbb{R}$-linearly independent
vectors $v_1, \dots, v_n \in \mathbb{R}^n$. The fundamental parallelogram $T$ of the lattice $L$ is $T =
\{c_1 v_1 + \dots + c_n v_n : 0 \le c_i \le 1\}$.

Definition 6.3.2. A set $S \subset \mathbb{R}^n$ is called convex if for any $x, y \in S$ and any real $0 \le t \le 1$,
the point $tx + (1 - t)y \in S$, i.e. $S$ contains the line segment between $x$ and $y$. We say that $S$ is
symmetric around the origin if for any $x \in S$, we also have $-x \in S$.

Our interest in lattices is the following theorem of Minkowski:

Theorem 6.3.3 (Minkowski). Let $S \subset \mathbb{R}^n$ be a convex set that is symmetric around the
origin, and let $L$ be a lattice in $\mathbb{R}^n$. If $\mathrm{Vol}(S) > 2^n \mathrm{Vol}(T)$, then $S$ contains a non-zero lattice
point of $L$.

Proof. First, we start with the case $L = \mathbb{Z}^n$. Consider the map $f : S \to \mathbb{R}^n / 2L$ by
$x \mapsto x \bmod 2L$. We claim that $f$ cannot be an injection. If it were, we would be able
to fit $S$ inside a hyper-cube of volume $2^n$, which is not possible because $\mathrm{Vol}(S) > 2^n$. Therefore,
there are two distinct points $x, y \in S$ such that $f(x) = f(y)$. This means that $x \equiv y \bmod 2L$,
so $x$ and $y$ differ by a non-zero element of $2L$, i.e. $x = y + 2p$ for some non-zero lattice point
$p$. Since $S$ is symmetric around the origin, $-y \in S$, and because $S$ is convex, the segment between
$x$ and $-y$ must be contained in $S$. Therefore, the midpoint $\frac{1}{2}(x - y) = p$ is contained in
$S$, which is what we wanted.

For the general case, let $L = \mathrm{Span}_{\mathbb{Z}}\{v_1, \dots, v_n\}$, and let $A = (v_1 \cdots v_n)$ be the matrix with columns $v_i$. Then $A$ maps
$\mathbb{Z}^n$ to $L$, and because $\mathrm{Vol}(S) > 2^n \mathrm{Vol}(T) = 2^n |\det(A)|$, this means that $\mathrm{Vol}(A^{-1}S) > 2^n$. As $A^{-1}S$ is
again convex and symmetric around the origin, by
the above case there is a non-zero lattice point $x$ of $\mathbb{Z}^n$ contained in $A^{-1}S$, and mapping
back says $Ax$ is a non-zero lattice point of $L$ contained in $S$.
Using Minkowski’s theorem, we can give another proof that primes congruent to 1 mod 4
are sums of squares.

Theorem 6.3.4 (Fermat). Let $p \equiv 1 \bmod 4$ be a prime. Then there are integers $x, y$ such
that $p = x^2 + y^2$.

Proof. Since $p \equiv 1 \bmod 4$, we know that $(\frac{-1}{p}) = 1$. Choose $1 < k < p - 1$ such that
$k^2 \equiv -1 \bmod p$, and consider the lattice $L = \mathrm{Span}_{\mathbb{Z}}\left\{\begin{pmatrix} 1 \\ k \end{pmatrix}, \begin{pmatrix} 0 \\ p \end{pmatrix}\right\}$. Note that any $\begin{pmatrix} x \\ y \end{pmatrix} \in L$
satisfies $y \equiv kx \bmod p$, which means that $x^2 + y^2 \equiv 0 \bmod p$. Therefore, every lattice point
of $L$ lies on the circle $x^2 + y^2 = jp$ for some integer $j \ge 0$. Let $S$ be the open disk $x^2 + y^2 < 2p$.
Note that $S$ is convex, symmetric around the origin, and has area $2\pi p > 4p = 2^2 \mathrm{Vol}(T)$. Therefore, by
Minkowski’s theorem, $S$ contains a non-zero lattice point of $L$. The only circle of the form
$x^2 + y^2 = jp$ with $j > 0$ contained in $S$ is $x^2 + y^2 = p$, so we’re done.
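Minkowski’s theorem only guarantees that a short lattice vector exists, but for small primes one can simply search for it. The Python sketch below is an illustration under the assumption that $p$ is a prime with $p \equiv 1 \bmod 4$; the function name is ours, and the square root of $-1$ is found by a naive search. It walks through lattice points $(x, y)$ with $y \equiv kx \bmod p$ and returns the first one inside the disk $x^2 + y^2 < 2p$.

```python
def two_squares_via_lattice(p):
    """For a prime p ≡ 1 mod 4, return (x, y) with x^2 + y^2 = p, found in the lattice spanned by (1, k), (0, p)."""
    k = next((k for k in range(2, p - 1) if (k * k + 1) % p == 0), None)
    if k is None:
        raise ValueError("no square root of -1 mod p; p must be a prime congruent to 1 mod 4")
    x = 1
    while x * x < 2 * p:
        y = (k * x) % p                  # lattice points have y ≡ k*x mod p
        for cand in (y, y - p):          # the two representatives with |y| < p
            if 0 < x * x + cand * cand < 2 * p:
                return (x, abs(cand))    # a multiple of p strictly between 0 and 2p equals p
        x += 1
    raise ValueError("no short lattice vector found; is p prime?")

for p in (5, 13, 17, 29, 37, 41, 101):
    x, y = two_squares_via_lattice(p)
    assert x * x + y * y == p
```

Replacing $y$ by $|y|$ may leave the lattice, but it does not change $x^2 + y^2$.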
In our previous proof using unique factorization, we saw that p = ππ̄ for some Gaus-
sian prime π = x + yi. This corresponds to the 8 distinct integer points on the circle
(x, ±y), (−x, ±y), (y, ±x), (−y, ±x), by writing p = N (α) for any of the unit multiples of
π, π̄.

In the spirit of the proof using Minkowski’s theorem, is it possible to give a proof of this
using a geometric argument? The answer is yes! To do so, we’ll need another theorem about
lattices:

Theorem 6.3.5 (Pick). Let $L$ be a lattice in $\mathbb{R}^2$ and let $P$ be a convex polygon with vertices
on $L$. Then $\mathrm{Area}(P) = (I + \frac{1}{2}B - 1)\,\mathrm{Area}(T)$, where $I$ is the number of lattice points of $L$
inside $P$, and $B$ is the number of lattice points of $L$ on the boundary of $P$.

We’ll take Pick’s theorem for granted for the moment, and return to our question.
Fix a choice of $k$ with $k^2 \equiv -1 \bmod p$. Take $L$ to be the lattice in the above proof, $L = \mathrm{Span}_{\mathbb{Z}}\left\{\begin{pmatrix} 1 \\ k \end{pmatrix}, \begin{pmatrix} 0 \\ p \end{pmatrix}\right\}$, and let $C : x^2 + y^2 = p$ be the circle of interest. Note that for any integer point $(x, y) \in C$, exactly one of $(x, y)$ or $(x, -y)$ must live in $C \cap L$: this is because $x^2 + y^2 \equiv 0 \bmod p$ means that $y^2 \equiv k^2x^2 \bmod p$, so $y \equiv \pm kx \bmod p$, and if both $(x, y)$ and $(x, -y)$ were in $L$ then this would mean $y \equiv -y \equiv kx \bmod p$, so $y \equiv 0 \bmod p$, and this is impossible. Therefore, showing that there are exactly 8 integer points on $C$ is equivalent to showing that $C \cap L$ contains exactly 4 points.

If $(x, y) \in C \cap L$, this tells us that $(-x, -y), (-y, x), (y, -x)$ are all in $C \cap L$ as well, which means that $|C \cap L|$ is divisible by 4. Now, construct the convex polygon $P$ whose vertices are the points of $C \cap L$. The polygon $P$ is contained entirely on and within the circle. By Pick’s theorem, $\mathrm{Area}(P) = (I + \frac{B}{2} - 1) \cdot p$. As all non-zero lattice points of $L$ lie on a circle of radius at least $\sqrt{p}$, there is a single interior point of $L$ contained in $P$, namely, the origin. Therefore, $I = 1$, and all other lattice points of $P$ are on the boundary, so $B = |C \cap L|$. Thus, $|C \cap L| = \frac{2\,\mathrm{Area}(P)}{p} < \frac{2 \cdot \pi p}{p} = 2\pi$. Since $|C \cap L|$ is a non-zero multiple of 4 and $2\pi < 8$, this forces $|C \cap L| = 4$, which is what we wanted!

Pick’s theorem has a surprising number of number theoretic applications. Here’s another
one, proving a remark we made earlier in chapter 3:

Proposition 48. Let α ∈ Z[i]. Then |Z[i]/(α)| = N (α).

Proof. Identify $\mathbb{C}$ with $\mathbb{R}^2$ in the usual way, which identifies $\mathbb{Z}[i]$ with the lattice $\mathbb{Z}^2$. Write $\alpha = x + yi$ and consider the lattice $L = \mathrm{Span}_{\mathbb{Z}}\{\alpha, i\alpha\}$. As we saw before, saying that $\beta \equiv \beta' \bmod \alpha$ is the same thing as saying that $\beta, \beta'$ differ by a lattice point of $L$, and therefore $\mathbb{Z}[i]/(\alpha) \cong \mathbb{Z}^2/L$. The different equivalence classes of $\mathbb{Z}[i]/(\alpha)$ come from the lattice points of $\mathbb{Z}^2$ inside the fundamental parallelogram $T$ of $L$, and half the number of points on the boundary, because the points “above” the diagonal of the parallelogram can be obtained by moving along the lattice from the lower half. However, this counts the origin twice, so we need to subtract 1. By Pick’s theorem, this count is precisely $\mathrm{Area}(T)$. This is given by $|(x, y, 0) \times (-y, x, 0)| = x^2 + y^2 = N(\alpha)$ as desired.
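As a sanity check on this count, here is a small Python sketch (the function name `count_residues` is our own) that counts the Gaussian integers $u + vi$ lying in the half-open parallelogram $\{s\alpha + t\,i\alpha : 0 \leq s, t < 1\}$, which contains exactly one representative of each class mod $\alpha$. Solving for $s, t$ gives $sN = ux + vy$ and $tN = -uy + vx$ with $N = x^2 + y^2$, so the test can be done in integers.

```python
def count_residues(x, y):
    """Count u + vi in the half-open parallelogram spanned by alpha = x + yi and i*alpha."""
    n = x * x + y * y
    bound = abs(x) + abs(y)  # the parallelogram fits in the box |u|, |v| <= |x| + |y|
    count = 0
    for u in range(-bound, bound + 1):
        for v in range(-bound, bound + 1):
            s_times_n = u * x + v * y
            t_times_n = -u * y + v * x
            if 0 <= s_times_n < n and 0 <= t_times_n < n:
                count += 1
    return count

print(count_residues(3, -1))  # number of classes in Z[i]/(3 - i)
```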

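Example 6.3.6. With the new results about $\mathbb{Z}[i]$ from this chapter, we can understand quotients of $\mathbb{Z}[i]$ better.

• Let $\alpha = 3 - i \in \mathbb{Z}[i]$. Note that $\alpha = (1 - i)(2 + i)$ is a prime factorization of $\alpha$. By the Chinese remainder theorem, $\mathbb{Z}[i]/(\alpha) \cong \mathbb{Z}[i]/(1 - i) \times \mathbb{Z}[i]/(2 + i)$. These rings are fields of size 2 and 5 respectively, and so are isomorphic to $\mathbb{Z}/2\mathbb{Z}$ and $\mathbb{Z}/5\mathbb{Z}$. Therefore, $\mathbb{Z}[i]/(\alpha) \cong \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/5\mathbb{Z} \cong \mathbb{Z}/10\mathbb{Z}$.
• With $\alpha = 5$, then $5 = (1 + 2i)(1 - 2i)$ is a prime factorization in $\mathbb{Z}[i]$. Therefore, $\mathbb{Z}[i]/(5) \cong (\mathbb{Z}/5\mathbb{Z})^2$ by the reasoning above.
• If $\alpha = 2$, then since $2 = -i(1 + i)^2$ we have $\mathbb{Z}[i]/(2) = \mathbb{Z}[i]/((1 + i)^2)$, which is a ring of size 4, which is not a field, because $1 + i$ is a non-zero element with $(1 + i)^2 \equiv 0 \bmod (1 + i)^2$. This ring is not isomorphic to $(\mathbb{Z}/2\mathbb{Z})^2$, which has no non-zero element with the property that $x^2 = 0$. It is not isomorphic to $\mathbb{Z}/4\mathbb{Z}$ either, because $1 + 1 = 0$ in $\mathbb{Z}[i]/(2)$. Instead, using $\mathbb{Z}[i] \cong \mathbb{Z}[x]/(x^2 + 1)$, we find $\mathbb{Z}[i]/(2) \cong \mathbb{F}_2[x]/(x^2 + 1) = \mathbb{F}_2[x]/((x + 1)^2)$.
• With $\alpha = 7$, we know that $\alpha$ is prime in $\mathbb{Z}[i]$. Therefore, $\mathbb{Z}[i]/(7)$ is a field of size 49. As $\mathbb{Z}[i] \cong \mathbb{Z}[x]/(x^2 + 1)$, this means $\mathbb{Z}[i]/(7) \cong (\mathbb{Z}[x]/(x^2 + 1))/(7) \cong \mathbb{Z}[x]/(7, x^2 + 1) \cong (\mathbb{Z}[x]/(7))/(x^2 + 1) \cong \mathbb{F}_7[x]/(x^2 + 1)$. This latter ring is a field extension of $\mathbb{F}_7$ where $x^2 + 1$ has a root, which we may as well call $i$, so we may write $\mathbb{F}_7[x]/(x^2 + 1) \cong \mathbb{F}_7(i)$. Therefore, $\mathbb{Z}[i]/(7) \cong \mathbb{F}_7(i)$.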

Now, we return to the proof of Pick’s theorem.

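Proof. (of Pick’s theorem) Let $L = \mathrm{Span}_{\mathbb{Z}}\{v_1, v_2\}$ and let $A = (v_1 \ v_2)$ be the matrix with columns $v_1, v_2$. Then $A$ maps $\mathbb{Z}^2$ to $L$ and scales areas by $|\det(A)| = \mathrm{Area}(T)$. As the image of a convex set under a linear transformation is convex, if we know the result for $L = \mathbb{Z}^2$, then we get the result for a general $L$ by linear algebra.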

The proof of Pick’s theorem goes as follows:

1. Triangulating the polygon P , we may reduce to the case of triangles.

2. Pick’s theorem holds for rectangles.

3. Cutting the rectangle in half diagonally, Pick’s theorem holds for right triangles with
sides parallel to the axes.

4. Any triangle can be embedded into a rectangle in such a way that it divides it into
four triangles, three of which are right with sides parallel to the axes.

We leave the proof of each step as an exercise for the interested reader.

6.4 Pell’s equation and quadratic rings


Now, we turn our attention to hyperbolas. The general approach of section 1 tells us how to find rational points on hyperbolas, so we would like to focus on how to find integer points on hyperbolas. Any hyperbola can be transformed into a standard equation $x^2 - Dy^2 = N$ for some integers $D, N$. When $D$ is a perfect square, $\sqrt{D}$ is an integer and this turns into the equation $(x - \sqrt{D}y)(x + \sqrt{D}y) = N$, from which we can determine solutions from the factors of $N$. Therefore, the first “interesting” case comes from studying equations of the form $x^2 - Dy^2 = N$, for integers $D, N$ with $D$ square-free.
Definition 6.4.1. Let $D$ be a square-free integer. The equation $x^2 - Dy^2 = 1$ is called a Pell equation.
Hyperbolas of this form are named after the English mathematician John Pell, although
their study long predates him, appearing in texts of the Indian mathematician Brahmagupta
in the 7th century.

This hyperbola always has the trivial points $(\pm 1, 0)$. A point $(x, y)$ is called positive if $x, y > 0$. Here are two reasons why Pell’s equation is interesting:
• Suppose that $(a, b)$ is a positive integer solution to $x^2 - Dy^2 = 1$. This means that $a^2 - Db^2 = 1$, so $(\frac{a}{b})^2 - \frac{1}{b^2} = D$, that is, $\frac{a}{b} = \sqrt{D + \frac{1}{b^2}}$. For large denominators $b$, this means that $\frac{a}{b}$ is a “good” rational approximation to $\sqrt{D}$.

• An integer solution $(a, b)$ to the Pell equation $x^2 - Dy^2 = 1$ gives a unit $a + b\sqrt{D}$ in the ring $\mathbb{Z}[\sqrt{D}]$. Therefore, if one is interested in understanding units of quadratic rings, Pell’s equation naturally arises.

Example 6.4.2. The point $(19601, 13860)$ lies on the hyperbola $x^2 - 2y^2 = 1$. Indeed, $\frac{19601}{13860} \approx 1.41421356$ agrees with $\sqrt{2}$ to 8 decimal places.
Both of these viewpoints are useful for understanding Pell equations. First, we’ll show
that any Pell equation x2 − Dy 2 = 1 has a non-trivial solution. To do so, we need the
following approximation result of Dirichlet:
Lemma 6.4.3 (Dirichlet). Let $\alpha$ be an irrational number. There are infinitely many integers $x, y$ with $\gcd(x, y) = 1$ such that $|\alpha - \frac{x}{y}| < \frac{1}{y^2}$.

Proof. Choose $n \geq 2$ and partition the half-open interval $[0, 1)$ as $[0, 1) = [0, \frac{1}{n}) \cup \ldots \cup [\frac{n-1}{n}, 1)$. Write $\{\alpha\} = \alpha - \lfloor\alpha\rfloor$ for the fractional part of $\alpha$, which lies in a unique one of these $n$ sets. Next, consider the $n + 1$ numbers $\{0\}, \{\alpha\}, \{2\alpha\}, \ldots, \{n\alpha\}$. By the pigeonhole principle, two of these must lie in the same set. That is, there exist $j, k$ with $0 \leq k < j \leq n$ such that $|j\alpha - \lfloor j\alpha\rfloor - (k\alpha - \lfloor k\alpha\rfloor)| < \frac{1}{n}$. Set $y = j - k$ and $x = \lfloor j\alpha\rfloor - \lfloor k\alpha\rfloor$, so we have $|x - y\alpha| < \frac{1}{n}$. We may assume that $\gcd(x, y) = 1$, as otherwise we just divide $x$ and $y$ by their gcd, which strengthens the inequality. Next, because $0 \leq k < j \leq n$ we see that $0 < y \leq n$, which means $|\frac{x}{y} - \alpha| < \frac{1}{ny} \leq \frac{1}{y^2}$. This shows the existence of such a rational number $\frac{x}{y}$. It remains to show we can find infinitely many with this property.

With $x, y$ as above, note that $|\frac{x}{y} - \alpha| \neq 0$ as $\alpha$ is irrational. Therefore, choose $m > \frac{1}{|\frac{x}{y} - \alpha|}$. Running this procedure with $m$ in place of $n$ produces $x_1, y_1$ with $0 < y_1 \leq m$ such that $|\frac{x_1}{y_1} - \alpha| < \frac{1}{my_1} \leq \frac{1}{m} < |\frac{x}{y} - \alpha|$ and $|\frac{x_1}{y_1} - \alpha| < \frac{1}{y_1^2}$. Repeating this process produces infinitely many such rational numbers.
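The pigeonhole step in this proof is constructive, and can be tried out numerically. The sketch below is a floating-point illustration only (the name `dirichlet_approx` is ours, and rounding error limits it to moderate $n$): it sorts the $n + 1$ fractional parts $\{j\alpha\}$, picks the two closest neighbours, and returns the resulting $x, y$ in lowest terms.

```python
from math import floor, gcd, sqrt

def dirichlet_approx(alpha, n):
    """Return coprime (x, y) with 0 < y <= n and |x - y*alpha| < 1/n."""
    # n + 1 fractional parts in [0, 1): two sorted neighbours differ by less than 1/n.
    parts = sorted((j * alpha - floor(j * alpha), j) for j in range(n + 1))
    first, second = min(zip(parts, parts[1:]), key=lambda pair: pair[1][0] - pair[0][0])
    j, k = max(first[1], second[1]), min(first[1], second[1])
    y = j - k
    x = floor(j * alpha) - floor(k * alpha)
    g = gcd(x, y)
    return x // g, y // g

x, y = dirichlet_approx(sqrt(2), 10)
print(x, y, abs(sqrt(2) - x / y) < 1 / y**2)
```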
Lemma 6.4.4. Let $D$ be a positive, square-free integer. Then there is a constant $M$ such that $|x^2 - Dy^2| < M$ has infinitely many integral solutions.

Proof. In $\mathbb{R}$, we write $x^2 - Dy^2 = (x + \sqrt{D}y)(x - \sqrt{D}y)$. By the previous lemma, there are infinitely many pairs $(x, y)$ with $\gcd(x, y) = 1$, $y > 0$ and $|x - \sqrt{D}y| < \frac{1}{y}$. By the triangle inequality, $|x + \sqrt{D}y| = |x - \sqrt{D}y + 2\sqrt{D}y| \leq |x - \sqrt{D}y| + 2\sqrt{D}|y| < \frac{1}{y} + 2\sqrt{D}y$. Therefore, $|x^2 - Dy^2| = |x - \sqrt{D}y||x + \sqrt{D}y| < \frac{1}{y}(\frac{1}{y} + 2\sqrt{D}y) \leq 2\sqrt{D} + 1$. Taking $M = 2\sqrt{D} + 1$ finishes the proof.
We’re now ready to prove that Pell equations always have solutions.
Theorem 6.4.5 (Lagrange). Let $D > 1$ be a square-free integer. Then $x^2 - Dy^2 = 1$ has a non-trivial solution.

Proof. By the previous lemma, there is a non-zero integer $M$ such that $x^2 - Dy^2 = M$ has infinitely many integral solutions $(x, y)$, and we may assume that these have $x, y > 0$. Why? There are finitely many integers $-\lfloor 2\sqrt{D} + 1\rfloor, \ldots, \lfloor 2\sqrt{D} + 1\rfloor$, and infinitely many solutions with $|x^2 - Dy^2| < 2\sqrt{D} + 1$, so by the pigeonhole principle, one choice of $M$ has infinitely many solutions, and $M \neq 0$ because $\sqrt{D}$ is irrational. Now, we work modulo $|M|$. Since there are finitely many residue classes mod $|M|$, applying the pigeonhole principle a second time, we can find two pairs $(x_1, y_1) \neq (x_2, y_2)$ such that $x_1 \equiv x_2 \bmod |M|$ and $y_1 \equiv y_2 \bmod |M|$. Therefore, there are integers $k, \ell$ such that $x_2 = x_1 + Mk$ and $y_2 = y_1 + M\ell$, so that $x_2 + y_2\sqrt{D} = (x_1 + y_1\sqrt{D}) + M(k + \ell\sqrt{D})$. Since $(x_1 - y_1\sqrt{D})(x_1 + y_1\sqrt{D}) = M$, we may then compute that $(x_1 - y_1\sqrt{D})(x_2 + y_2\sqrt{D}) = M(a + b\sqrt{D})$ for some integers $a, b$. Similarly, $(x_1 + y_1\sqrt{D})(x_2 - y_2\sqrt{D}) = M(a - b\sqrt{D})$, and so multiplying together yields $M^2 = M^2(a^2 - Db^2)$. This says that $a^2 - Db^2 = 1$. Finally, we note that $(a, b) \neq (\pm 1, 0)$ because if so, this would mean that $(x_1 - y_1\sqrt{D})(x_2 + y_2\sqrt{D}) = \pm M$, and multiplying through by $x_2 - y_2\sqrt{D}$ would say $M(x_1 - y_1\sqrt{D}) = \pm M(x_2 - y_2\sqrt{D})$. This would then imply that $(x_1, y_1) = \pm(x_2, y_2)$, which is not possible because the two pairs are distinct and have positive entries. Therefore, $(a, b)$ is a non-trivial solution, which is what we wanted.
The above theorem tells us that Pell equations always have a solution, but the proof
does not help us find a solution. In order to understand this, we have to look through an
algebraic lens.
Definition 6.4.6. Let $D \neq 1$ be a square-free integer. Set $\mathbb{Z}[\sqrt{D}] = \{a + b\sqrt{D} : a, b \in \mathbb{Z}\}$, which is a subring of either $\mathbb{R}$ or $\mathbb{C}$ depending on if $D > 0$ or $D < 0$. Addition is defined by $(a + b\sqrt{D}) + (c + d\sqrt{D}) = (a + c) + (b + d)\sqrt{D}$ and multiplication is defined by $(a + b\sqrt{D})(c + d\sqrt{D}) = (ac + bdD) + (ad + bc)\sqrt{D}$.

The norm of $\alpha = x + y\sqrt{D} \in \mathbb{Z}[\sqrt{D}]$ is defined by $N(\alpha) = x^2 - Dy^2$. We usually write $\bar{\alpha}$ to mean the conjugate expression $x - y\sqrt{D}$, but beware that this is not equal to the complex conjugate of $\alpha$, unless $D < 0$!

We’re already familiar with a special case of these quadratic rings: when $D = -1$ then we recover $\mathbb{Z}[i]$. In general, the arithmetic of these rings varies wildly depending on the value of $D$. For example, $\mathbb{Z}[\sqrt{3}]$ has unique factorization, but $\mathbb{Z}[\sqrt{-3}]$ does not. Neither $\mathbb{Z}[\sqrt{5}]$ nor $\mathbb{Z}[\sqrt{-5}]$ has unique factorization, either! In general, we won’t be able to say too much, but because all these rings have norms, they all share some similar properties.

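Proposition 49. For $\alpha, \beta \in \mathbb{Z}[\sqrt{D}]$, $N(\alpha\beta) = N(\alpha)N(\beta)$.

Proof. Same proof as for $\mathbb{Z}[i]$ but with $i$ replaced by $\sqrt{D}$.

Definition 6.4.7. We say that $\alpha \in \mathbb{Z}[\sqrt{D}]$ is a unit if there is some $\beta \in \mathbb{Z}[\sqrt{D}]$ such that $\alpha\beta = 1$. If $\alpha = x + y\sqrt{D}$ is a unit, we call $\alpha$ positive if $x, y > 0$.

Proposition 50. $\alpha \in \mathbb{Z}[\sqrt{D}]$ is a unit if and only if $N(\alpha) = \pm 1$.

Proof. If $\alpha\beta = 1$ then by multiplicativity, $N(\alpha)N(\beta) = 1$ in $\mathbb{Z}$, so $N(\alpha) = \pm 1$. Conversely, if $N(\alpha) = \alpha\bar{\alpha} = \pm 1$ then either $\bar{\alpha}$ or $-\bar{\alpha}$ is a multiplicative inverse of $\alpha$, so it’s a unit.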
The connection between quadratic rings and Pell’s equation comes from units: saying $\alpha = x + y\sqrt{D} \in \mathbb{Z}[\sqrt{D}]$ is a unit with $N(\alpha) = 1$ is equivalent to saying that the integers $x, y$ satisfy $x^2 - Dy^2 = 1$. In particular, the above theorem tells us there is always a non-trivial unit in $\mathbb{Z}[\sqrt{D}]$ when $D > 0$. From the multiplicativity of the norm, if $N(\alpha) = 1$ then $N(\alpha^k) = 1$ for any $k \geq 1$. Since $\alpha^k \in \mathbb{Z}[\sqrt{D}]$, we can find $x_k, y_k$ such that $\alpha^k = x_k + y_k\sqrt{D}$, so that $(x_k, y_k)$ is a solution to $x^2 - Dy^2 = 1$. Therefore, if we can find one non-trivial solution to a Pell equation, we know infinitely many solutions. Also note that $\alpha^{-k} = \frac{1}{\alpha^k} = \bar{\alpha}^k$ by multiplying numerator and denominator by $\bar{\alpha}^k$ and using that $N(\alpha) = 1$. Since the coefficients of $\bar{\alpha}^k$ are also solutions to $x^2 - Dy^2 = 1$, the same is therefore true for all $k \in \mathbb{Z}$.

Example 6.4.8. Observe that $3 + 2\sqrt{2}$ is a unit in $\mathbb{Z}[\sqrt{2}]$ because $N(3 + 2\sqrt{2}) = 9 - 8 = 1$. Therefore, $(3, 2)$ is a solution to $x^2 - 2y^2 = 1$. Any integer power of $3 + 2\sqrt{2}$ therefore produces another solution. For example, $(3 + 2\sqrt{2})^5 = 3363 + 2378\sqrt{2}$ produces the solution $(3363, 2378)$ and $(3 + 2\sqrt{2})^{-3} = 99 - 70\sqrt{2}$ produces the solution $(99, -70)$.
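These powers are easy to compute with integer arithmetic alone, using the multiplication rule $(a + b\sqrt{D})(x + y\sqrt{D}) = (ax + byD) + (ay + bx)\sqrt{D}$. A minimal Python sketch (the helper name `unit_power` is our own) that also handles negative exponents via the conjugate, assuming the input has norm 1:

```python
def unit_power(x, y, D, k):
    """Return (a, b) with a + b*sqrt(D) = (x + y*sqrt(D))**k, assuming x*x - D*y*y == 1."""
    if k < 0:
        # For a norm-1 unit, the inverse is the conjugate x - y*sqrt(D).
        y, k = -y, -k
    a, b = 1, 0
    for _ in range(k):
        a, b = a * x + b * y * D, a * y + b * x
    return a, b

print(unit_power(3, 2, 2, 5))   # coefficients of (3 + 2*sqrt(2))^5
print(unit_power(3, 2, 2, -3))  # coefficients of (3 + 2*sqrt(2))^(-3)
```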

Next, our goal is to describe all possible solutions to Pell’s equation.

Lemma 6.4.9. Let $(x, y)$ be a positive solution to $x^2 - Dy^2 = 1$, and let $(x', y')$ be any other solution. Then $x + y\sqrt{D} < x' + y'\sqrt{D}$ if and only if $x < x'$ and $y < y'$.

Proof. The backwards direction is obvious, so we need only to prove the forward direction. Suppose that $(x, y)$ is a positive solution and $(x', y')$ is any solution with $x + y\sqrt{D} < x' + y'\sqrt{D}$. Since $(x, y)$ is positive, we have $x, y \geq 1$ so $x' + y'\sqrt{D} > x + y\sqrt{D} > 1$. Inverting says $0 < x' - y'\sqrt{D} < 1$, so $x' + y'\sqrt{D} > x' - y'\sqrt{D}$. This says $2y'\sqrt{D} > 0$, so $y' > 0$. Since $x' - y'\sqrt{D} > 0$, this says $x' > y'\sqrt{D} \geq \sqrt{D} > 1$, so $(x', y')$ is a positive solution. Now from $x + y\sqrt{D} < x' + y'\sqrt{D}$, inverting says $x - y\sqrt{D} > x' - y'\sqrt{D}$, so $(x + x') + (y - y')\sqrt{D} < (x + x') + (y' - y)\sqrt{D}$, which yields $y' > y$. This then gives $x^2 = 1 + Dy^2 < 1 + Dy'^2 = x'^2$, and since $x, x' > 0$ taking a square root tells us $x < x'$.
If we find a positive solution $(x, y)$ with $y$ minimal, then in fact, $x$ is minimal as well. This is because if $(x', y')$ is any other positive solution, we have $y < y'$, so $x^2 = 1 + Dy^2 < 1 + Dy'^2 = x'^2$, and because both solutions are positive this means $x < x'$. We will call the positive solution to $x^2 - Dy^2 = 1$ with $x, y$ minimal the fundamental solution to $x^2 - Dy^2 = 1$.

Theorem 6.4.10. Let $(x_1, y_1)$ be the fundamental solution to $x^2 - Dy^2 = 1$. If $(x, y)$ is any other positive solution to $x^2 - Dy^2 = 1$, then $x = x_n$ and $y = y_n$ for some $n \geq 1$, where $(x_1 + y_1\sqrt{D})^n = x_n + y_n\sqrt{D}$.

Proof. Since $(x_1, y_1)$ is a positive solution, we have $x_1 + y_1\sqrt{D} > 1$, so $(x_1 + y_1\sqrt{D})^n \to \infty$ as $n \to \infty$. Therefore we can find $N$ such that $(x_1 + y_1\sqrt{D})^{N+1} > x + y\sqrt{D} \geq (x_1 + y_1\sqrt{D})^N$. Dividing through says $1 \leq (x + y\sqrt{D})(x_1 + y_1\sqrt{D})^{-N} < x_1 + y_1\sqrt{D}$. Write $(x + y\sqrt{D})(x_1 + y_1\sqrt{D})^{-N} = a + b\sqrt{D}$ for some integers $a, b$. The above results tell us that $(a, b)$ is a solution to Pell’s equation. Suppose that $a + b\sqrt{D} > 1$. Then $0 < a - b\sqrt{D} < 1$, which forces $a, b > 0$, so $(a, b)$ is a positive solution. Therefore, from $a + b\sqrt{D} < x_1 + y_1\sqrt{D}$, this means that $a < x_1$ and $b < y_1$, which contradicts that $(x_1, y_1)$ is the fundamental solution. Therefore, $a + b\sqrt{D} = 1$, that is, $x + y\sqrt{D} = (x_1 + y_1\sqrt{D})^N$.

Corollary 6.4.11. Let $(x, y)$ be the fundamental solution to $x^2 - Dy^2 = 1$. Then all solutions to $x^2 - Dy^2 = 1$ are of the form $\pm(x_n, y_n)$ for some $n \in \mathbb{Z}$, where $(x + y\sqrt{D})^n = x_n + y_n\sqrt{D}$.

Proof. In the above theorem, we proved that if $(x', y')$ is a positive solution, then $(x', y') = (x_n, y_n)$ for some $n \geq 1$, and the trivial solutions $(\pm 1, 0)$ correspond to $n = 0$, so it remains to handle the case that $(x', y')$ is non-trivial and not positive. The number $x' + y'\sqrt{D}$ lies in one of the intervals $(0, 1), (-1, 0), (-\infty, -1)$, so exactly one of the numbers $\frac{1}{x' + y'\sqrt{D}} = x' - y'\sqrt{D}$, $-\frac{1}{x' + y'\sqrt{D}} = -(x' - y'\sqrt{D})$, $-(x' + y'\sqrt{D})$ lives in $(1, \infty)$. Each of these numbers still has norm 1 in $\mathbb{Z}[\sqrt{D}]$, and so their coefficients form solutions to Pell’s equation. Therefore, $\pm(x' + y'\sqrt{D})^{\pm 1} = (x + y\sqrt{D})^n$ for some $n \geq 1$, which is what we wanted.

Example 6.4.12. To solve the Pell equation $x^2 - 2y^2 = 1$, we need to find a fundamental solution, which comes from a positive solution $(x, y)$ with $y$ minimal. Setting $x^2 = 1 + 2y^2$, and plugging in values of $y$, we see that $(3, 2)$ is the fundamental solution. Therefore, any other positive solution $(x_k, y_k)$ comes from $x_k + y_k\sqrt{2} = (3 + 2\sqrt{2})^k$. To find explicit formulas for $x_k$ and $y_k$, note that $x_k + y_k\sqrt{2} = (3 + 2\sqrt{2})^k$ and $x_k - y_k\sqrt{2} = (3 - 2\sqrt{2})^k$ by inverting. Therefore, $x_k = \frac{(3 + 2\sqrt{2})^k + (3 - 2\sqrt{2})^k}{2}$ and $y_k = \frac{(3 + 2\sqrt{2})^k - (3 - 2\sqrt{2})^k}{2\sqrt{2}}$.
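The “plug in values of $y$” search from this example can be written as a short brute-force routine. This is only a sketch (the name `fundamental_solution` and the cutoff `y_max` are our own choices), and it becomes slow when the fundamental solution is large; the continued fraction method discussed later in this section scales much better.

```python
from math import isqrt

def fundamental_solution(D, y_max=10**6):
    """Smallest positive (x, y) with x*x - D*y*y == 1, trying y = 1, 2, ... up to y_max."""
    for y in range(1, y_max + 1):
        x_squared = 1 + D * y * y
        x = isqrt(x_squared)
        if x * x == x_squared:
            return x, y
    return None  # nothing found below the (arbitrary) cutoff

print(fundamental_solution(2))
```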
Next, we turn our attention to the negative Pell equation $x^2 - Dy^2 = -1$. Since units in $\mathbb{Z}[\sqrt{D}]$ can have norm $\pm 1$, this is the other equation we need to understand if we would like to understand the structure of the units. If $x + y\sqrt{D}$ has norm $-1$, then $(x + y\sqrt{D})^2$ has norm 1, because the norm is multiplicative. This means that solutions to $x^2 - Dy^2 = -1$ generate solutions to $x^2 - Dy^2 = 1$! Similarly to before, the fundamental solution of $x^2 - Dy^2 = -1$ (the positive solution with $y$ minimal), when one exists, will generate all solutions to either Pell equation.

Lemma 6.4.13. Suppose that $x^2 - Dy^2 = -1$ and $x + y\sqrt{D} > 1$. Then $x, y \geq 1$.

Proof. We have $\frac{1}{x + y\sqrt{D}} = -(x - y\sqrt{D}) = -x + y\sqrt{D}$, so $x + y\sqrt{D} > 1 > -x + y\sqrt{D} > 0$. Subtracting says that $2x > 0$, so $x \geq 1$. Therefore, $y\sqrt{D} > x$, so $y \geq 1$.
Once more, we can order positive solutions to the negative Pell equation.
Lemma 6.4.14. Suppose that $(x, y)$ is a positive solution to $x^2 - Dy^2 = -1$ and $(x', y')$ is any other positive solution. Then $x + y\sqrt{D} < x' + y'\sqrt{D}$ if and only if $x < x'$ and $y < y'$.

Proof. Once again, the backwards direction is clear. Suppose that $x + y\sqrt{D} < x' + y'\sqrt{D}$. This means $x' + y'\sqrt{D} > x + y\sqrt{D} \geq 1 + \sqrt{D} > 1$, so inverting tells us that $-x + y\sqrt{D} > -x' + y'\sqrt{D}$. Adding yields $(x' - x) + (y + y')\sqrt{D} > (x - x') + (y + y')\sqrt{D}$. This means that $2x' > 2x$, so $x' > x$. From this, $Dy^2 = x^2 + 1 < (x')^2 + 1 = D(y')^2$, so $y < y'$ since $y, y'$ are both positive.

Theorem 6.4.15. Let $(x_1, y_1)$ be the fundamental solution to $x^2 - Dy^2 = -1$. Then all solutions to $x^2 - Dy^2 = \pm 1$ are of the form $x + y\sqrt{D} = \pm(x_1 + y_1\sqrt{D})^k$ for some $k \in \mathbb{Z}$. The solutions to $x^2 - Dy^2 = -1$ have $k$ odd and the solutions to $x^2 - Dy^2 = 1$ have $k$ even.

Proof. The idea is as follows: first, we show that $(x_1 + y_1\sqrt{D})^2$ yields the fundamental solution to the positive Pell equation $x^2 - Dy^2 = 1$. This will then tell us that positive solutions to $x^2 - Dy^2 = 1$ come from even powers of $x_1 + y_1\sqrt{D}$, and then we just need to show that odd powers produce all positive solutions to $x^2 - Dy^2 = -1$. Once we know the result for positive solutions, we’ll know it for all solutions using a similar argument to before.

Let $(X_1, Y_1)$ denote the fundamental solution to $x^2 - Dy^2 = 1$, with associated unit $X_1 + Y_1\sqrt{D}$. Since $(x_1 + y_1\sqrt{D})^2$ has norm 1, we can write $(x_1 + y_1\sqrt{D})^2 = (X_1 + Y_1\sqrt{D})^k$ for some $k \geq 1$. If $k$ is even, say, $k = 2\ell$, this would mean $x_1 + y_1\sqrt{D} = (X_1 + Y_1\sqrt{D})^\ell$. The right hand side has norm 1, while the left hand side has norm $-1$, which is a contradiction. Therefore, $k = 2\ell + 1$ must be odd. This means that $X_1 + Y_1\sqrt{D} = (a + b\sqrt{D})^2$, where $a + b\sqrt{D} = (x_1 + y_1\sqrt{D})(X_1 - Y_1\sqrt{D})^\ell$. Comparing coefficients, we see that $2ab = Y_1 > 0$, which means that $a, b$ have the same sign. Therefore, without loss of generality, we may assume that $a, b > 0$. If $\ell > 0$, this would mean that $(x_1 + y_1\sqrt{D})^2 = (X_1 + Y_1\sqrt{D})^{2\ell+1} > X_1 + Y_1\sqrt{D} = (a + b\sqrt{D})^2$, so taking square roots would say $x_1 + y_1\sqrt{D} > a + b\sqrt{D}$. Note that $a + b\sqrt{D}$ has norm $-1$, so by the previous lemma, this would mean $y_1 > b$. However, this would contradict the minimality of $y_1$. Thus, $\ell = 0$ and $X_1 + Y_1\sqrt{D} = (x_1 + y_1\sqrt{D})^2$ as desired.

Now, we show that positive solutions come from powers of $x_1 + y_1\sqrt{D}$. If $(x, y)$ is a positive solution to $x^2 - Dy^2 = 1$, then we may write $x + y\sqrt{D} = (X_1 + Y_1\sqrt{D})^k = (x_1 + y_1\sqrt{D})^{2k}$ for some $k \geq 1$. This shows that solutions to the positive Pell equation come from even powers of the fundamental solution. Now, if $(x, y)$ is a positive solution to $x^2 - Dy^2 = -1$, we can write $(x + y\sqrt{D})^2 = (X_1 + Y_1\sqrt{D})^k = (x_1 + y_1\sqrt{D})^{2k}$ for some $k \geq 1$. Taking square roots, $x + y\sqrt{D} = (x_1 + y_1\sqrt{D})^k$. Taking norms says $-1 = (-1)^k$, and therefore $k$ must be odd.

This shows that all positive solutions to $x^2 - Dy^2 = \pm 1$ come from powers of $x_1 + y_1\sqrt{D}$. Now, suppose that we have a solution that is not positive. This means that $\alpha = x + y\sqrt{D}$ is not contained in $(1, \infty)$. However, unless $\alpha = \pm 1$ (which corresponds to $k = 0$), one of $\frac{1}{\alpha}, -\frac{1}{\alpha}, -\alpha$ must be, and therefore we recover $x + y\sqrt{D} = \pm(x_1 + y_1\sqrt{D})^k$ for some $k \in \mathbb{Z}$, which is what we wanted.
The important implication of the above result is that it tells us the structure of the unit group of $\mathbb{Z}[\sqrt{D}]$!

Corollary 6.4.16. Let $D$ be a square-free integer. Then
$$\mathbb{Z}[\sqrt{D}]^\times \cong \begin{cases} \mathbb{Z}/2\mathbb{Z} & D < -1 \\ \mathbb{Z}/4\mathbb{Z} & D = -1 \\ \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z} & D > 0 \end{cases}$$

Proof. If $D < -1$ the only solutions to $x^2 - Dy^2 = \pm 1$ are $(\pm 1, 0)$ corresponding to the units $\pm 1$, and $\langle -1 \rangle \cong \mathbb{Z}/2\mathbb{Z}$. If $D = -1$ then $\mathbb{Z}[i]^\times = \langle i \rangle \cong \mathbb{Z}/4\mathbb{Z}$. If $D > 0$, we may write any unit in the form $x + y\sqrt{D} = \pm(x_1 + y_1\sqrt{D})^k$ for some $k \in \mathbb{Z}$, and fundamental unit $x_1 + y_1\sqrt{D}$ (which is a fundamental solution for the negative Pell equation if it’s solvable, and the positive one if it’s not). The isomorphism is then given by $x + y\sqrt{D} \mapsto (\varepsilon(x + y\sqrt{D}), k)$ where $\varepsilon(x + y\sqrt{D})$ denotes the sign in this decomposition. That this is a group isomorphism is quite clear.

When $D > 0$, the unit used to generate all other units is called the fundamental unit of $\mathbb{Z}[\sqrt{D}]$. For example, $1 + \sqrt{2}$ has norm $-1$ in $\mathbb{Z}[\sqrt{2}]$, and is clearly the fundamental unit because the coefficient on $\sqrt{2}$ is minimal among units larger than 1.
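The “minimal coefficient” criterion suggests a direct search for the fundamental unit: try $y = 1, 2, \ldots$ and test whether $Dy^2 - 1$ or $Dy^2 + 1$ is a perfect square. The following Python sketch (with our own names `fundamental_unit` and `y_max`) assumes $D > 1$ is square-free; for a fixed $y$ it tests norm $-1$ first, since that gives the smaller $x$.

```python
from math import isqrt

def fundamental_unit(D, y_max=10**6):
    """Return (x, y, norm) for the unit x + y*sqrt(D) > 1 with y >= 1 minimal."""
    for y in range(1, y_max + 1):
        for norm in (-1, 1):
            x_squared = D * y * y + norm
            x = isqrt(x_squared)
            if x >= 1 and x * x == x_squared:
                return x, y, norm
    return None  # nothing found below the (arbitrary) cutoff

print(fundamental_unit(2))   # the unit 1 + sqrt(2), of norm -1
print(fundamental_unit(13))
```

When the returned norm is $-1$, squaring the unit gives the fundamental solution of the positive Pell equation, in line with Theorem 6.4.15.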

To wrap up our discussion on Pell’s equation, we’ll briefly talk about the connection with
continued fractions.
Definition 6.4.17. Let $\alpha \in \mathbb{R}$. The continued fraction expansion of $\alpha$ is a sequence of integers $\{a_0, a_1, \ldots\}$ such that $\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \ldots}}$. Notationally, we denote this continued fraction as $[a_0; a_1, a_2, \ldots]$. The $k$-th convergent of $\alpha$ is given by $[a_0; a_1, \ldots, a_k]$.

Example 6.4.18. We have $2 = [1; 1]$. Let $\varphi = \frac{1 + \sqrt{5}}{2}$. Then $\varphi = 1 + \frac{1}{\varphi}$, from which we see that $\varphi = [1; 1, 1, \ldots]$.

That a continued fraction expansion of a real number exists and is unique is a fact we’ll take for granted. The idea is rather simple: any rational number has a finite continued fraction expansion. Any irrational number $\alpha$ can be approximated by rationals, and one shows that the continued fractions of these approximations stabilize term by term to an infinite continued fraction whose convergents converge to $\alpha$. However, the proof is rather technical and not enlightening.

The computation of a continued fraction expansion can be done using an analogous process to the Euclidean algorithm. Since it’s not terribly important to us, we won’t be very formal, and will describe it through an example.

Example 6.4.19. Suppose that we wished to compute the continued fraction expansion of $e$. We can do this as follows. Firstly, $e \approx 2.718$, so we can write $e = 2 + (e - 2) = 2 + \frac{1}{1/(e - 2)}$. Take $a_0 = 2$ and $\alpha_1 = \frac{1}{e - 2}$. Then $\alpha_1 \approx 1.39$, so we can write $\alpha_1 = 1 + \frac{1}{1/(\alpha_1 - 1)}$. We take $a_1 = 1$ and set $\alpha_2 = \frac{1}{\alpha_1 - 1}$. We have $\alpha_2 \approx 2.55$, so we can write $\alpha_2 = 2 + \frac{1}{1/(\alpha_2 - 2)}$, and we take $a_2 = 2$ and $\alpha_3 = \frac{1}{\alpha_2 - 2}$. The first few terms of the continued fraction expansion are $[2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, \ldots]$.
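The procedure in this example (take the floor, subtract it, invert, repeat) is easy to automate. To avoid floating-point drift, the sketch below works with exact rational arithmetic and feeds in a rational approximation of $e$ taken from the series $\sum 1/k!$; the names `continued_fraction` and `E_APPROX` are our own, and only the first several terms of the output should be trusted as terms of $e$ itself.

```python
from fractions import Fraction
from math import factorial, floor

def continued_fraction(alpha, terms):
    """First `terms` partial quotients of a rational number alpha."""
    quotients = []
    for _ in range(terms):
        a = floor(alpha)          # a_k is the integer part of alpha_k
        quotients.append(a)
        if alpha == a:            # a rational input terminates here
            break
        alpha = 1 / (alpha - a)   # alpha_{k+1} = 1 / (alpha_k - a_k)
    return quotients

# Rational approximation of e, accurate to far more digits than 12 terms need.
E_APPROX = sum(Fraction(1, factorial(k)) for k in range(30))

print(continued_fraction(E_APPROX, 12))
```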

The connection between Pell’s equation and continued fractions is as follows. If $(x, y)$ is a positive solution to $x^2 - Dy^2 = 1$, then $\frac{x}{y}$ is a convergent of $\sqrt{D}$. It’s a fact that $\sqrt{D}$ has a periodic continued fraction expansion. The period tells us how to find the fundamental solution to the Pell equation.

Theorem 6.4.20. Let $D > 0$ be square-free, and let $m$ be the period of the continued fraction expansion of $\sqrt{D}$. The fundamental solution to $x^2 - Dy^2 = 1$ is given by

• $(p_{m-1}, q_{m-1})$, where $\frac{p_{m-1}}{q_{m-1}}$ is the $(m-1)$-st convergent of $\sqrt{D}$, if $m$ is even.

• $(p_{2m-1}, q_{2m-1})$ if $m$ is odd.



Example 6.4.21. One may compute that $\sqrt{13} = [3; \overline{1, 1, 1, 1, 6}]$. The length of this period is 5, which is odd, so the theorem says the fundamental solution to $x^2 - 13y^2 = 1$ is given by $(x, y) = (p_9, q_9) = (649, 180)$. Note that $649 + 180\sqrt{13} = (18 + 5\sqrt{13})^2$, so that $(18, 5)$ is the fundamental solution to the negative Pell equation $x^2 - 13y^2 = -1$. Observe that $\frac{18}{5} = \frac{p_4}{q_4}$ is the convergent of index $m - 1 = 4$, the fifth in the list $\frac{p_0}{q_0}, \frac{p_1}{q_1}, \ldots$!
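The recipe of Theorem 6.4.20 can be sketched in Python using the standard integer recurrence for the continued fraction of $\sqrt{D}$: with $m_0 = 0$, $d_0 = 1$, $a_0 = \lfloor\sqrt{D}\rfloor$, set $m_{n+1} = d_na_n - m_n$, $d_{n+1} = (D - m_{n+1}^2)/d_n$, $a_{n+1} = \lfloor (a_0 + m_{n+1})/d_{n+1} \rfloor$, and use the fact that the period ends at the first term equal to $2a_0$. This recurrence and stopping rule are standard facts not proved in the text, and the function names are our own.

```python
from math import isqrt

def sqrt_cf_period(D):
    """Return (a0, block): sqrt(D) = [a0; block repeated], for a non-square D > 0."""
    a0 = isqrt(D)
    m, d, a = 0, 1, a0
    block = []
    while a != 2 * a0:            # the period closes at the term 2*a0
        m = d * a - m
        d = (D - m * m) // d
        a = (a0 + m) // d
        block.append(a)
    return a0, block

def convergent(a0, block, k):
    """Return (p_k, q_k), the k-th convergent of [a0; block repeated]."""
    quotients = [a0] + [block[i % len(block)] for i in range(k)]
    p_prev, p = 1, quotients[0]
    q_prev, q = 0, 1
    for a in quotients[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    return p, q

def pell_from_cf(D):
    """Fundamental solution of x^2 - D*y^2 = 1 via the period of sqrt(D)."""
    a0, block = sqrt_cf_period(D)
    m = len(block)
    k = m - 1 if m % 2 == 0 else 2 * m - 1
    return convergent(a0, block, k)

print(sqrt_cf_period(13))
print(pell_from_cf(13))
```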

6.5 Quadratic fields


We’ll begin with a general discussion about algebraic numbers. Let K ⊂ C be a sub-field.

Definition 6.5.1. A number $\alpha \in \mathbb{C}$ is called algebraic if there is a non-zero polynomial $p(x) \in \mathbb{Q}[x]$ such that $p(\alpha) = 0$. We call an algebraic number $\alpha$ an algebraic integer if there is a monic polynomial $p(x) \in \mathbb{Z}[x]$ with $p(\alpha) = 0$. Notationally, we use $\overline{\mathbb{Q}}$ to denote the set of algebraic numbers, $\overline{K} = \overline{\mathbb{Q}} \cap K$, and $O_K$ to denote the set of algebraic integers in $K$.

Definition 6.5.2. The minimal polynomial of $\alpha \in \overline{\mathbb{Q}}$, $m_\alpha(x)$, is the monic polynomial of smallest degree such that $m_\alpha(x) \in \mathbb{Q}[x]$ and $m_\alpha(\alpha) = 0$.

It makes sense to speak of a minimal polynomial, because by definition, any algebraic number $\alpha$ is killed by some non-zero polynomial with rational coefficients. By the well-ordering principle, there is then a polynomial that kills $\alpha$ of minimal degree, and rescaling to be monic forces uniqueness. We begin by proving some basic properties of the minimal polynomial.

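Proposition 51. Let $\alpha \in \overline{\mathbb{Q}}$.

1. For $f(x) \in \mathbb{Q}[x]$, $m_\alpha(x) \mid f(x)$ in $\mathbb{Q}[x]$ if and only if $f(\alpha) = 0$.

2. $m_\alpha(x)$ is irreducible in $\mathbb{Q}[x]$.

3. $\alpha$ is an algebraic integer if and only if $m_\alpha(x) \in \mathbb{Z}[x]$.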

Proof.

1. The forward direction is obvious. For the backwards direction, write $f(x) = m_\alpha(x)q(x) + r(x)$ with $r(x) = 0$ or $\deg(r(x)) < \deg(m_\alpha(x))$ by the division algorithm in $\mathbb{Q}[x]$. Plugging in $\alpha$, we see that $r(\alpha) = 0$. By minimality, this forces $r(x) = 0$, so $m_\alpha(x) \mid f(x)$.

2. For contradiction, suppose that $m_\alpha(x) = g(x)h(x)$ in $\mathbb{Q}[x]$ is a non-trivial factorization. Without loss of generality, we may assume that $g(x), h(x)$ are monic by rescaling. Plugging in $\alpha$ means that either $g(\alpha) = 0$ or $h(\alpha) = 0$. But this would mean that $\alpha$ is a root of a monic polynomial with rational coefficients of strictly smaller degree, which contradicts the definition of $m_\alpha(x)$. Therefore, $m_\alpha(x)$ is irreducible.

3. The backwards direction is true by definition, so we only prove the forward direction. Suppose that $\alpha$ is an algebraic integer. Then there is some monic $f(x) \in \mathbb{Z}[x]$ such that $f(\alpha) = 0$. By the first part, we can write $f(x) = m_\alpha(x)g(x)$ as a factorization in $\mathbb{Q}[x]$. Since $f(x)$ is monic, by Gauss’s lemma, this means the factorization actually happens in $\mathbb{Z}[x]$, so $m_\alpha(x) \in \mathbb{Z}[x]$.

Example 6.5.3.

• $\sqrt{2}$ is an algebraic integer, because its minimal polynomial is $x^2 - 2 \in \mathbb{Z}[x]$.

• $\frac{1}{2}$ is an algebraic number, as it’s a root of $x - \frac{1}{2} \in \mathbb{Q}[x]$. However, it’s not an algebraic integer, because its minimal polynomial is $x - \frac{1}{2} \notin \mathbb{Z}[x]$.

• $\alpha = \sqrt[3]{2} + 1$ is an algebraic integer. We see that $(\alpha - 1)^3 = 2$, and so $\alpha$ is a root of $(x - 1)^3 - 2 = x^3 - 3x^2 + 3x - 3 \in \mathbb{Z}[x]$. In fact, $m_\alpha(x) = x^3 - 3x^2 + 3x - 3$: by the rational root theorem, the only possible rational roots are $\pm 1, \pm 3$, none of which are roots. If a cubic polynomial were reducible, it would have to have a linear factor, and therefore a root, which means this polynomial is indeed irreducible.
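The rational root check in the last bullet is mechanical, so it can be confirmed with a few lines of Python. This is only an illustrative sketch (the function name `rational_roots` is ours): it lists the candidates $\pm r/s$ with $r$ dividing the constant term and $s$ dividing the leading coefficient, and evaluates the polynomial exactly at each one.

```python
from fractions import Fraction

def rational_roots(coeffs):
    """Rational roots of an integer polynomial a_n x^n + ... + a_0.

    `coeffs` lists a_n, ..., a_0 and is assumed to have a_n != 0 and a_0 != 0.
    """
    def divisors(n):
        n = abs(n)
        return [d for d in range(1, n + 1) if n % d == 0]

    def evaluate(x):
        value = Fraction(0)
        for c in coeffs:          # Horner's rule with exact arithmetic
            value = value * x + c
        return value

    candidates = {
        Fraction(sign * r, s)
        for r in divisors(coeffs[-1])
        for s in divisors(coeffs[0])
        for sign in (1, -1)
    }
    return sorted(x for x in candidates if evaluate(x) == 0)

print(rational_roots([1, -3, 3, -3]))  # x^3 - 3x^2 + 3x - 3
```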

Definition 6.5.4. Let $\alpha \in \overline{\mathbb{Q}}$. We define $\mathbb{Q}(\alpha)$ to be the smallest subfield of $\mathbb{C}$ containing both $\alpha$ and $\mathbb{Q}$.
Proposition 52. There is an isomorphism of fields $\mathbb{Q}(\alpha) \cong \mathbb{Q}[x]/(m_\alpha(x))$.
Proof. Define a map $\varphi : \mathbb{Q}[x]/(m_\alpha(x)) \to \mathbb{Q}(\alpha)$ by $p(x) \bmod m_\alpha(x) \mapsto p(\alpha)$. Firstly, we check that this map is well-defined, since it depends on a choice of representative for the equivalence class. If $p(x) \equiv q(x) \bmod m_\alpha(x)$, we may write $q(x) = p(x) + f(x)m_\alpha(x)$ for some $f(x) \in \mathbb{Q}[x]$. Therefore, $\varphi(q(x) \bmod m_\alpha(x)) = q(\alpha) = p(\alpha) + f(\alpha)m_\alpha(\alpha) = p(\alpha) = \varphi(p(x) \bmod m_\alpha(x))$. Next, we observe that $\varphi$ actually is a homomorphism. This is easy to see, and we leave the verification to the reader. Finally, we prove that $\varphi$ is an isomorphism. If $\varphi(p(x) \bmod m_\alpha(x)) = \varphi(q(x) \bmod m_\alpha(x))$, this would mean $p(\alpha) = q(\alpha)$, so $(p - q)(\alpha) = 0$. This says $m_\alpha(x) \mid (p(x) - q(x))$, which means that $p(x) \equiv q(x) \bmod m_\alpha(x)$, so $\varphi$ is injective. For surjectivity, we note that $x \bmod m_\alpha(x) \mapsto \alpha$ and $\frac{r}{s} \bmod m_\alpha(x) \mapsto \frac{r}{s}$. Since $\mathbb{Q}[x]/(m_\alpha(x))$ is a field because $m_\alpha(x)$ is irreducible in $\mathbb{Q}[x]$, this means its image under $\varphi$ is a subfield of $\mathbb{Q}(\alpha)$ containing $\mathbb{Q}$ and $\alpha$, and therefore equals $\mathbb{Q}(\alpha)$ by definition.
Corollary 6.5.5. As a set, $\mathbb{Q}(\alpha) = \{a_0 + \ldots + a_{d-1}\alpha^{d-1} : a_i \in \mathbb{Q}\}$, where $d = \deg(m_\alpha(x))$.
Proof. This immediately follows from the above isomorphism: any element of Q[x]/(mα (x))
has a representative that is a polynomial with rational coefficients of degree strictly smaller
than d.
For some concrete examples, any element of $\mathbb{Q}(i)$ is of the form $a + bi$ for some $a, b \in \mathbb{Q}$. Any element of $\mathbb{Q}(\sqrt[3]{2})$ is of the form $a + b\sqrt[3]{2} + c\sqrt[3]{4}$, for some $a, b, c \in \mathbb{Q}$. Another way of thinking about this result is that the powers $1, \alpha, \ldots, \alpha^{d-1}$ of $\alpha$ form a $\mathbb{Q}$-basis of $\mathbb{Q}(\alpha)$ as a $\mathbb{Q}$-vector space.

Fields of the form Q(α) for some α algebraic are called number fields. They’re the main
object of study in algebraic number theory. A common misconception is that this branch
of number theory is “using algebra to study number theory”. However, as not all number
fields are real subfields of C, one naturally needs to bring in tools from complex analysis to
understand the full picture. Although not yet clear, the algebraic integers of a number field
play a role analogous to the role that Z plays inside of Q.

Proposition 53. $O_{\mathbb{Q}} = \mathbb{Z}$.
Proof. Suppose that $\alpha \in \mathbb{Q}$ is an algebraic integer. Write $\alpha = \frac{r}{s}$ with $\gcd(r, s) = 1$. By definition, there is some monic $f(x) = x^n + a_{n-1}x^{n-1} + \ldots + a_0 \in \mathbb{Z}[x]$ with $f(\alpha) = 0$. Plugging in, this means $(\frac{r}{s})^n + \ldots + a_0 = 0$, so clearing denominators yields $r^n + a_{n-1}r^{n-1}s + \ldots + a_0s^n = 0$. This means $r^n = -s(a_{n-1}r^{n-1} + \ldots + a_0s^{n-1})$, so $s \mid r^n$. Since $\gcd(r, s) = 1$, this means $s = \pm 1$, so $\alpha \in \mathbb{Z}$. Obviously any integer is an algebraic integer, so we’re done.
This result may seem silly, but it’s actually very important. It allows us to adjust the way we think about the subject! So far, we’ve really been considering the integers to be our main object of study. Instead, we can now think of the rationals as what we’re really trying to understand. In particular, that means understanding the special subring $O_{\mathbb{Q}} = \mathbb{Z}$! The main power of this viewpoint is that by studying number fields, powerful tools from field theory and Galois theory can be used.

For our purposes though, we’re just going to focus on quadratic fields, fields of the form $K = \mathbb{Q}(\sqrt{D})$ for $D$ a square-free integer. For any $\alpha = a + b\sqrt{D} \in K$, we note that the minimal polynomial of $\alpha$ is $m_\alpha(x) = x^2 - 2ax + (a^2 - Db^2) \in \mathbb{Q}[x]$, because its roots $\alpha$ and $\bar{\alpha}$ are not rational (assuming $b \neq 0$). If $\alpha \in \mathbb{Z}[\sqrt{D}]$, then both coefficients are integers, and therefore the earlier proposition tells us that $\mathbb{Z}[\sqrt{D}] \subset O_K$. In general though, $O_K$ is larger than $\mathbb{Z}[\sqrt{D}]$. To see this, note that $\frac{1 + \sqrt{5}}{2} \in \mathbb{Q}(\sqrt{5})$ has minimal polynomial $x^2 - x - 1 \in \mathbb{Z}[x]$, and therefore $\mathbb{Z}[\sqrt{5}] \subsetneq O_K$.

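Theorem 6.5.6. Let $D$ be a square-free integer, and let $K = \mathbb{Q}(\sqrt{D})$. Then
$$O_K = \begin{cases} \mathbb{Z}\left[\frac{1 + \sqrt{D}}{2}\right] & D \equiv 1 \bmod 4 \\ \mathbb{Z}[\sqrt{D}] & D \equiv 2, 3 \bmod 4 \end{cases}$$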

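Proof. Let $\alpha \in O_K$. Write $\alpha = a + b\sqrt{D}$ for some $a, b \in \mathbb{Q}$. If $b = 0$ then $\alpha \in \mathbb{Z}$ by Proposition 53, so assume $b \neq 0$. As observed above, $m_\alpha(x) = x^2 - 2ax + (a^2 - Db^2)$. Therefore, we require $2a, a^2 - Db^2 \in \mathbb{Z}$. This means that $a \in \frac{1}{2}\mathbb{Z}$, so write $a = \frac{c}{2}$ for some integer $c$.

• If $c$ is even, then $a \in \mathbb{Z}$ so from $a^2 - Db^2 \in \mathbb{Z}$, this means that $Db^2 \in \mathbb{Z}$. Writing $b = \frac{r}{s}$ for $\gcd(r, s) = 1$, this means $Dr^2 = s^2k$ for some integer $k$. This means $s^2 \mid D$ because $\gcd(s^2, r^2) = 1$, and because $D$ is square-free, $s^2 = 1$, which means $s = \pm 1$. Therefore, $b \in \mathbb{Z}$, so $\alpha \in \mathbb{Z}[\sqrt{D}]$.

• If $c$ is odd, then $a^2 - Db^2 = \frac{c^2 - 4Db^2}{4} \in \mathbb{Z}$, so we can write $c^2 - 4Db^2 = 4k$ for some integer $k$. This means that $D(2b)^2 \in \mathbb{Z}$, so by the above argument this means $2b \in \mathbb{Z}$. Therefore, we can write $2b = \ell$ for some integer $\ell$. As $c^2 - D\ell^2 = 4k$, because $c$ is odd, working mod 4 means $1 \equiv D\ell^2 \bmod 4$. This tells us that $\ell$ is odd and $D \equiv 1 \bmod 4$. Returning to $\alpha$, we have $\alpha = a + b\sqrt{D} = \frac{c}{2} + \frac{\ell}{2}\sqrt{D} = \frac{c - \ell}{2} + \ell\left(\frac{1 + \sqrt{D}}{2}\right) \in \mathbb{Z}\left[\frac{1 + \sqrt{D}}{2}\right]$.

Therefore, we’ve shown that if $D \equiv 2, 3 \bmod 4$ then $c$ is even, which means $O_K \subset \mathbb{Z}[\sqrt{D}]$ and hence $O_K = \mathbb{Z}[\sqrt{D}]$. If $D \equiv 1 \bmod 4$, both cases give $\alpha \in \mathbb{Z}\left[\frac{1 + \sqrt{D}}{2}\right]$, since this ring contains $\mathbb{Z}[\sqrt{D}]$, so it remains to show that $\mathbb{Z}\left[\frac{1 + \sqrt{D}}{2}\right] \subset O_K$ to get the other inclusion. To see this, let $\alpha = a + b\left(\frac{1 + \sqrt{D}}{2}\right) \in \mathbb{Z}\left[\frac{1 + \sqrt{D}}{2}\right]$ with $a, b \in \mathbb{Z}$. Then writing $\alpha = (a + \frac{b}{2}) + \frac{b}{2}\sqrt{D}$, we may compute that $\alpha$ is a root of $x^2 - (2a + b)x + \left(a^2 + ab + b^2\left(\frac{1 - D}{4}\right)\right) \in \mathbb{Z}[x]$, so we’re done.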
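The criterion used in this proof (for $\alpha = a + b\sqrt{D}$, both $2a$ and $a^2 - Db^2$ must be integers) is simple to test with exact rational arithmetic. Here is a small Python sketch, with `is_algebraic_integer` being our own name for the check:

```python
from fractions import Fraction

def is_algebraic_integer(a, b, D):
    """Test whether a + b*sqrt(D) is an algebraic integer, for rational a, b."""
    a, b = Fraction(a), Fraction(b)
    trace = 2 * a                 # coefficient of -x in the minimal polynomial
    norm = a * a - D * b * b      # constant term of the minimal polynomial
    return trace.denominator == 1 and norm.denominator == 1

half = Fraction(1, 2)
print(is_algebraic_integer(half, half, 5))  # (1 + sqrt(5))/2, with D ≡ 1 mod 4
print(is_algebraic_integer(half, half, 3))  # (1 + sqrt(3))/2, with D ≡ 3 mod 4
```

The two calls contrast the cases of the theorem: half-integer coefficients are allowed exactly when $D \equiv 1 \bmod 4$.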

In particular, we see that $O_K$ for $K = \mathbb{Q}(\sqrt{D})$ is a ring. It turns out this is true for any number field, although the proof is more complicated. The ring $O_K$ is supposed to be an object inside of $K$ that plays a role similar to that of $\mathbb{Z}$ inside of $\mathbb{Q}$. For example, the subring $\mathbb{Z}[\sqrt{5}]$ of $\mathbb{Q}(\sqrt{5})$ does not have unique factorization, because $(3 + \sqrt{5})(3 - \sqrt{5}) = 4 = 2 \cdot 2$ are two genuinely different irreducible factorizations of 4. However, it turns out that the ring of integers $\mathbb{Z}\left[\frac{1 + \sqrt{5}}{2}\right]$ does have unique factorization! This example is a bit disingenuous, though, as unfortunately $O_K$ still does not have unique factorization in general. For example, $\mathbb{Z}[\sqrt{-5}]$ is the ring of integers in $\mathbb{Q}(\sqrt{-5})$, but $\mathbb{Z}[\sqrt{-5}]$ does not have unique factorization (see exercise 2.4.9).

For quadratic rings, we know the following: if $D < 0$, there are precisely 9 values of $D$ for
which $\mathcal{O}_K$ has unique factorization: these are $D = -1, -2, -3, -7, -11, -19, -43, -67, -163$.
This is known as the Heegner-Stark theorem. For $D > 0$, we know even less! It was
conjectured by Gauss that there are infinitely many values of $D > 0$ for which $\mathcal{O}_K$ has unique
factorization. Even today, this is still an open problem. In fact, we expect this to hold
$\approx 76\%$ of the time for prime values of $D$!

It is often easy to show that a quadratic ring does not have unique factorization.

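Proposition 54. Let $D$ be squarefree and suppose that $\mathbb{Z}[\sqrt{D}]$ has unique factorization.
Then for prime $p \in \mathbb{Z}$, the following are equivalent:

1. $p$ factors in $\mathbb{Z}[\sqrt{D}]$.

2. $\pm p = x^2 - Dy^2$ for some $x, y \in \mathbb{Z}$.

3. $D \equiv \square \bmod p$, that is, $D$ is a square mod $p$.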

Proof. If $p = \alpha\beta$ in $\mathbb{Z}[\sqrt{D}]$, then taking norms means $p^2 = N(\alpha)N(\beta)$. If the factorization
is non-trivial, this means either $p = N(\alpha)$ or $-p = N(\alpha)$, which means $\pm p = x^2 - Dy^2$. If
$\pm p = x^2 - Dy^2$, then working mod $p$ says $x^2 - Dy^2 \equiv 0 \bmod p$. Here $y \not\equiv 0 \bmod p$: otherwise
$p$ would divide $x$ as well, and then $p^2$ would divide $x^2 - Dy^2 = \pm p$. Therefore
$(x/y)^2 \equiv D \bmod p$. Finally, if $D \equiv \square \bmod p$, there is some $x$ such that $p \mid x^2 - D$ in
$\mathbb{Z}$. In $\mathbb{Z}[\sqrt{D}]$, this means $p \mid (x - \sqrt{D})(x + \sqrt{D})$. Note that $p$ is not prime in $\mathbb{Z}[\sqrt{D}]$, because
if it were, then this would mean $p \mid x \pm \sqrt{D}$ so that $p \mid \pm 1$, which is impossible. Therefore,
as $\mathbb{Z}[\sqrt{D}]$ has unique factorization, this means that $p$ is reducible, so there is a non-trivial
factorization $p = \alpha\beta$ in $\mathbb{Z}[\sqrt{D}]$.
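
As a sanity check on the equivalence of conditions 2 and 3, here is a small Python sketch for $D = 2$ (the ring $\mathbb{Z}[\sqrt{2}]$ is known to have unique factorization). The function names and the search bound of 30 are assumptions made for illustration; the bounded search happens to be large enough for the primes listed.

```python
def is_square_mod(D, p):
    """True when D is congruent to a square modulo p (0 counts as a square)."""
    return any((x * x - D) % p == 0 for x in range(p))

def represents_pm_p(D, p, bound=30):
    """Bounded search for integers x, y with x^2 - D*y^2 = p or -p."""
    return any(abs(x * x - D * y * y) == p
               for x in range(bound) for y in range(bound))

D = 2
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
for p in primes:
    print(p, is_square_mod(D, p), represents_pm_p(D, p))
```

For every prime in the list the two columns agree, as the proposition predicts.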

Example 6.5.7. The ring $\mathbb{Z}[\sqrt{10}]$ does not have unique factorization. Suppose it did. Note that $10 \equiv
0 \bmod 2$ is a square mod 2, so by the above result, this would mean that either $2 = x^2 - 10y^2$ or $-2 = x^2 - 10y^2$ is
solvable. However, this would mean that $x^2 \equiv \pm 2 \bmod 5$ is solvable, which is a contradiction.
We are able to show this abstractly, without exhibiting any different factorizations!
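
The final contradiction comes down to listing the squares mod 5, which can be done by hand or with a short Python check like the sketch below (the variable names are illustrative only).

```python
# All residues that occur as x^2 mod 5
squares_mod_5 = sorted({(x * x) % 5 for x in range(5)})
print(squares_mod_5)   # [0, 1, 4]

# The residues of +2 and -2 mod 5
targets = sorted({2 % 5, -2 % 5})
print(targets)         # [2, 3]

# Neither target is a square mod 5
print(any(t in squares_mod_5 for t in targets))   # False
```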

Although $\mathcal{O}_K$ does not generally have unique factorization of elements, it remarkably
turns out that it has unique factorization of ideals! In fact, historically the term ideal comes
from number theory, not algebra. Kummer observed that $\mathcal{O}_K$ does not always have unique
factorization. In an attempt to explain (and fix) this failure, Kummer’s idea was that there
were so-called “ideal numbers” for which unique factorization would hold. Dedekind later
fleshed out this idea and it led to the definition of ideals for rings.

Here are some facts about ideals in OK which we’ll use to understand how this works.

• Any ideal $I \subset \mathcal{O}_K$ is finitely generated, with at most two generators. Concretely,
$I = (\alpha)$ or $I = (\alpha, \beta)$ for some $\alpha, \beta \in \mathcal{O}_K$.

• For two ideals $I, J \subset \mathcal{O}_K$, the product $IJ$ is defined by $IJ = \{\sum_i a_i b_i : a_i \in I, b_i \in J\}$,
the set of finite sums of products of elements of $I$ and $J$. If we write $I = (\alpha_1, \alpha_2)$ and
$J = (\beta_1, \beta_2)$, then $IJ = (\alpha_1\beta_1, \alpha_1\beta_2, \alpha_2\beta_1, \alpha_2\beta_2)$, which by the first fact can be written as an
ideal generated by two elements.
Example 6.5.8. For $K = \mathbb{Q}(\sqrt{-14})$, we have $\mathcal{O}_K = \mathbb{Z}[\sqrt{-14}]$ because $-14 \equiv 2 \bmod 4$. Let
$I = (5 + \sqrt{-14}, 2 + \sqrt{-14})$ and $J = (4 + \sqrt{-14}, 2 - \sqrt{-14})$. Then $IJ = (6 + 9\sqrt{-14}, -6 +
6\sqrt{-14}, 24 - 3\sqrt{-14}, 18)$. Observe that $IJ = (6, 3\sqrt{-14})$: that $IJ \subset (6, 3\sqrt{-14})$ is clear,
and to see the other containment, note that adding the first two generators shows that
$15\sqrt{-14} \in IJ$, and multiplying the second by 3 and adding 18 shows that $18\sqrt{-14} \in IJ$, so
$3\sqrt{-14} \in IJ$. Finally, note adding the second generator to twice the third says $42 \in IJ$, so
$\gcd(18, 42) = 6 \in IJ$, which says $IJ = (6, 3\sqrt{-14})$.
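
The arithmetic in this example can be reproduced with a few lines of Python. This is only an illustrative sketch: elements $a + b\sqrt{-14}$ are stored as integer pairs `(a, b)`, and the helpers `mul`, `add`, `scale` are names introduced here, not notation from the notes.

```python
D = -14  # elements a + b*sqrt(-14) are stored as integer pairs (a, b)

def mul(u, v):
    """Multiply (a + b*sqrt(D)) * (c + d*sqrt(D))."""
    a, b = u
    c, d = v
    return (a * c + D * b * d, a * d + b * c)

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(n, u):
    return (n * u[0], n * u[1])

I_gens = [(5, 1), (2, 1)]    # 5 + sqrt(-14), 2 + sqrt(-14)
J_gens = [(4, 1), (2, -1)]   # 4 + sqrt(-14), 2 - sqrt(-14)

g1 = mul(I_gens[0], J_gens[0])   # 6 + 9*sqrt(-14)
g2 = mul(I_gens[1], J_gens[0])   # -6 + 6*sqrt(-14)
g3 = mul(I_gens[0], J_gens[1])   # 24 - 3*sqrt(-14)
g4 = mul(I_gens[1], J_gens[1])   # 18

print(add(g1, g2))               # (0, 15), i.e. 15*sqrt(-14)
print(add(scale(3, g2), g4))     # (0, 18), i.e. 18*sqrt(-14)
print(add(g2, scale(2, g3)))     # (42, 0), i.e. 42
```

Each printed element is an integer combination of the four products, so it lies in $IJ$.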
Definition 6.5.9. Let I, J be ideals of OK . We say that I divides J if there is another
ideal A such that J = IA. We say that an ideal p is prime if p | IJ means p | I or p | J.
To the reader familiar with abstract algebra, this might seem different from the “usual”
definition of a prime ideal. Most algebra textbooks would call $\mathfrak{p}$ prime if $ab \in \mathfrak{p}$ means either
$a \in \mathfrak{p}$ or $b \in \mathfrak{p}$. For the ring of integers $\mathcal{O}_K$, these definitions are equivalent. This relies on
two facts:

• For any commutative ring $R$, an ideal $P$ is prime if and only if for any ideals $I, J \subset R$
with $IJ \subset P$, either $I \subset P$ or $J \subset P$.
• In $\mathcal{O}_K$, $I \mid J$ if and only if $I \supset J$.

This latter property is rather special: this is not true for most rings! Now that we have a
definition of divisibility of ideals and a notion of what a prime ideal is, we can state what
we mean precisely by unique factorization.
Theorem 6.5.10. Let $I$ be a nonzero proper ideal of $\mathcal{O}_K$. Then there exist unique prime ideals $\mathfrak{p}_1, \ldots, \mathfrak{p}_k$
and integers $e_1, \ldots, e_k \geq 1$ such that $I = \mathfrak{p}_1^{e_1} \cdots \mathfrak{p}_k^{e_k}$.
We won’t prove this theorem, as we would need to develop some theory in order to do
so. The idea, though, is to define a notion of a norm of an ideal, as a way of measuring its
size. The proof then proceeds analogously to the proof in theorem 2.3.8 by induction on the
norm of an ideal.
Example 6.5.11. Let $K = \mathbb{Q}(\sqrt{-5})$, which has $\mathcal{O}_K = \mathbb{Z}[\sqrt{-5}]$. The ring $\mathcal{O}_K$ does not
have unique factorization: $6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$ are two genuinely different
factorizations of 6 that do not differ by a unit. As ideals, though, we have the unique
factorization $(6) = (2, 1+\sqrt{-5})^2 (3, 1+\sqrt{-5})(3, 1-\sqrt{-5})$. That these ideals are indeed prime
can be verified by using the result from abstract algebra that an ideal $P$ of a ring $R$ is prime
if and only if the quotient ring $R/P$ is a domain. For example, to compute $\mathbb{Z}[\sqrt{-5}]/(2, 1 + \sqrt{-5})$,
in this quotient we have $\sqrt{-5} = -1$ and $2 = 0$, so there’s an isomorphism of rings
$\mathbb{Z}[\sqrt{-5}]/(2, 1 + \sqrt{-5}) \cong \mathbb{Z}/2\mathbb{Z}$ given by $a + b\sqrt{-5} \bmod (2, 1 + \sqrt{-5}) \mapsto a - b \bmod 2$.
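
One way to see that the two factorizations of 6 are genuinely different is through norms: $N(a + b\sqrt{-5}) = a^2 + 5b^2$ never equals 2 or 3, so elements of norm 4, 6 or 9 cannot factor into non-units. The Python sketch below checks this; the pair representation and the helper names `norm` and `mul` are assumptions made for illustration.

```python
def norm(a, b):
    """Norm of a + b*sqrt(-5)."""
    return a * a + 5 * b * b

def mul(u, v):
    """Multiply (a + b*sqrt(-5)) * (c + d*sqrt(-5)), with elements stored as pairs."""
    a, b = u
    c, d = v
    return (a * c - 5 * b * d, a * d + b * c)

print(mul((1, 1), (1, -1)))      # (6, 0), i.e. (1 + sqrt(-5))(1 - sqrt(-5)) = 6

# a^2 + 5*b^2 <= 3 forces b = 0 and |a| <= 1, so a small box of values suffices
small = {norm(a, b) for a in range(-2, 3) for b in range(-2, 3)}
print(2 in small, 3 in small)    # False False

print(norm(2, 0), norm(3, 0), norm(1, 1), norm(1, -1))   # 4 9 6 6
```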
The obstruction to $\mathcal{O}_K$ having unique factorization is the existence of non-principal
ideals:

Theorem 6.5.12. OK has unique factorization if and only if every ideal in OK is principal.

We’ll use this theorem to give another proof that $\mathbb{Z}[\sqrt{-5}]$ does not have unique factorization.
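Example 6.5.13. The ideal $(2, 1 + \sqrt{-5})$ of $\mathbb{Z}[\sqrt{-5}]$ is not principal. Suppose it were, then we
could write $(2, 1 + \sqrt{-5}) = (\alpha)$ for some $\alpha \in \mathbb{Z}[\sqrt{-5}]$. Since $2 \in (2, 1 + \sqrt{-5})$, this means $2 \in (\alpha)$,
so $\alpha \mid 2$ in $\mathbb{Z}[\sqrt{-5}]$. Write $2 = \alpha\beta$ for some $\beta$. Taking norms would mean $4 = N(\alpha)N(\beta)$, so
$N(\alpha) \mid 4$. Similarly, $1 + \sqrt{-5} \in (\alpha)$ so $N(\alpha) \mid N(1 + \sqrt{-5}) = 6$. Therefore, $N(\alpha) \mid \gcd(4, 6) = 2$. Since
$N(\alpha) = a^2 + 5b^2$ is never equal to 2, we get $N(\alpha) = 1$
and therefore $\alpha$ is a unit. This would mean that $\mathbb{Z}[\sqrt{-5}] = (2, 1 + \sqrt{-5})$. If this were true, then
$\mathbb{Z}[\sqrt{-5}]/(2, 1 + \sqrt{-5})$ would be the 0 ring, but $\mathbb{Z}[\sqrt{-5}]/(2, 1 + \sqrt{-5}) \cong \mathbb{Z}/2\mathbb{Z}$ as we saw before,
which yields a contradiction.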

Finally, we close with the factorization of primes in $\mathcal{O}_K$ for $K = \mathbb{Q}(\sqrt{D})$.

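Theorem 6.5.14. Let $p \in \mathbb{Z}$ be a prime. Set $d = D$ if $D \equiv 1 \bmod 4$, or $d = 4D$ if
$D \equiv 2, 3 \bmod 4$. Then the way $(p)$ factors in $\mathcal{O}_K$ for $K = \mathbb{Q}(\sqrt{D})$ is:

• $(p) = \mathfrak{p}^2$ for some prime ideal $\mathfrak{p}$ of $\mathcal{O}_K$ if and only if $p \mid d$. In this case, we say that
$(p)$ is ramified.

• $(p) = \mathfrak{p}\overline{\mathfrak{p}}$ for some prime ideal $\mathfrak{p}$ of $\mathcal{O}_K$ if and only if $\left(\frac{d}{p}\right) = 1$. In this case, we say that
$(p)$ splits.

• $(p)$ is prime in $\mathcal{O}_K$ if and only if $\left(\frac{d}{p}\right) = -1$. In this case, we say that $(p)$ is inert.

Here, by $\overline{\mathfrak{p}}$ we mean the conjugate ideal of $\mathfrak{p}$. If $\mathfrak{p} = (\alpha, \beta)$ for $\alpha, \beta \in \mathcal{O}_K$, we define
$\overline{\mathfrak{p}} = (\overline{\alpha}, \overline{\beta})$. The number $d$ in the theorem is called the discriminant of $K$.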

Note that for odd $p$ we have $\left(\frac{d}{p}\right) = \left(\frac{D}{p}\right)$ because $\left(\frac{4}{p}\right) = 1$ is always true. One may use quadratic reciprocity
to aid in the computation of the Legendre symbol, and ultimately, will conclude that the
factorization type of $(p)$ is governed by congruence classes of $p \bmod d$.

Example 6.5.15. Let $K = \mathbb{Q}(\sqrt{3})$, so the discriminant is $d = 12$. The primes which ramify
are $p = 2, 3$. The primes which split are those for which $\left(\frac{12}{p}\right) = \left(\frac{3}{p}\right) = 1$, which is given by
$p \equiv \pm 1 \bmod 12$. The primes which are inert are those for which $\left(\frac{3}{p}\right) = -1$, which is given by
$p \equiv \pm 5 \bmod 12$.
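
The classification in this example can be checked directly. The sketch below computes the Legendre symbol with Euler's criterion (valid for odd primes $p$) and labels each prime as ramified, split or inert for $d = 12$. The function names are illustrative assumptions, and the list of primes is hard-coded.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def splitting_type(p, d=12):
    """Factorization type of (p) in the quadratic field of discriminant d."""
    if d % p == 0:
        return "ramified"          # covers p = 2 and p = 3 when d = 12
    return "split" if legendre(d, p) == 1 else "inert"

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
for p in primes:
    print(p, p % 12, splitting_type(p))
```

The primes labelled "split" are exactly those with $p \bmod 12 \in \{1, 11\}$, and the ones labelled "inert" have $p \bmod 12 \in \{5, 7\}$.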

In the above example, for a random prime $p$ it should be “equally likely” that $(p)$ splits in
$\mathbb{Z}[\sqrt{3}]$ and $(p)$ is inert in $\mathbb{Z}[\sqrt{3}]$, because the probability that $p$ lies in the congruence classes
$\pm 1 \bmod 12$ should be “the same” as $p$ lying in the congruence classes $\pm 5 \bmod 12$. Since
any prime $p \neq 2, 3$ must satisfy $p \equiv \pm 1, \pm 5 \bmod 12$, this means about half the primes split
in $\mathbb{Z}[\sqrt{3}]$ and half the primes stay inert in $\mathbb{Z}[\sqrt{3}]$.

This is a consequence of the prime number theorem for arithmetic progressions. For
any integer $n$, all but finitely many primes must lie in some congruence class $a \bmod n$ with
$\gcd(a, n) = 1$. Dirichlet’s theorem says that each of these congruence classes $a \bmod n$ contains
infinitely many primes, and the prime number theorem for arithmetic progressions says
that the primes are “equidistributed” among the $\varphi(n)$ congruence classes of interest.
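
Equidistribution can be observed empirically. The following Python sketch counts the primes up to an arbitrarily chosen bound of 100000 in each of the four classes mod 12 that are coprime to 12. The sieve helper `primes_up_to` and the bound are assumptions for illustration; the counts only suggest the asymptotic statement and do not prove it.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

counts = {1: 0, 5: 0, 7: 0, 11: 0}
for p in primes_up_to(100000):
    if p > 3:                      # every prime other than 2 and 3 is coprime to 12
        counts[p % 12] += 1

total = sum(counts.values())
for residue, count in counts.items():
    print(residue, count, round(count / total, 4))
```

Each class receives roughly a quarter of the primes, so about half land in $\pm 1 \bmod 12$ and half in $\pm 5 \bmod 12$.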

For a general quadratic field $K = \mathbb{Q}(\sqrt{D})$, the same story is going to be true. For a
general number field K, there are finitely many “splitting types” of the ideal (p) in OK .
The proportion of primes that fall into a certain splitting type is completely understood,
and described in the Chebotarev density theorem, which is, essentially, a generalization of
Dirichlet’s theorem. The Chebotarev density theorem is one of the groundbreaking results
of the early 1900s. The idea that reciprocity laws correspond to splitting of primes was one
of the major ideas that went into the development of class field theory.

6.6 Exercises
1. (a) Find all rational points on the ellipse $x^2 - xy + y^2 = 1$.
(b) Find all integer points on the ellipse $x^2 - xy + y^2 = 1$.

2. Can two (distinct) perfect squares average to be a perfect square? In other words, are
there integers $0 < x < y < z$ such that $\frac{x^2 + z^2}{2} = y^2$? If no, provide a proof. If so, find
all such solutions.

3. Let p be an odd prime.

(a) Prove that for any $a \in \mathbb{F}_p$, $x^2 + y^2 = a$ is solvable in $(\mathbb{F}_p)^2$. (One approach: show
the sets $\{x^2 : x \in \mathbb{F}_p\}$ and $\{a - x^2 : x \in \mathbb{F}_p\}$ overlap).
(b) How many solutions in $(\mathbb{F}_p)^2$ are there to $x^2 + y^2 = 1$? (Does the usual approach
still work?)

4. Let $p$ be an odd prime. Use Minkowski’s theorem to prove that $x^2 + 2y^2 = p$ has
solutions if and only if $p \equiv 1, 3 \bmod 8$.

5. For prime $p$, the solvability of $x^2 + y^2 = p$ in $\mathbb{Z}^2$ was related to the arithmetic of the ring
$\mathbb{Z}[i]$: integer points on this circle correspond to factorizations $p = \alpha\overline{\alpha}$ for $\alpha = x + yi$
in $\mathbb{Z}[i]$. In this problem, you’ll see how the solvability of $x^2 + 2y^2 = p$ is related to the
arithmetic of $\mathbb{Z}[\sqrt{-2}]$.

(a) Modifying the argument that was used to show $\mathbb{Z}[i]$ has a division algorithm, state
and prove a division algorithm for $\mathbb{Z}[\sqrt{-2}]$.

(b) Once you know that $\mathbb{Z}[\sqrt{-2}]$ has a division algorithm, the usual chain of reasoning
will show that it has unique factorization. Modify the proof of when $p$ is a sum
of two squares to show that $x^2 + 2y^2 = p$ has solutions if and only if $p = 2$ or
$p \equiv 1, 3 \bmod 8$.

(c) For prime $p \in \mathbb{Z}$, list (with proof!) the possible factorization types of $p \in \mathbb{Z}[\sqrt{-2}]$.

6. Find the fundamental unit of $\mathbb{Z}[\sqrt{6}]$, and then give a description of all units in $\mathbb{Z}[\sqrt{6}]$.

7. Find all integer solutions to $x^2 - 10y^2 = -1$.

8. Observe that $(6, 1)$ is a solution to $x^2 - 10y^2 = 26$. Find two other positive solutions.

9. (a) Suppose that $(x, y)$ is a positive solution to $x^2 - Dy^2 = 1$. Show that $\left|\frac{x}{y} - \sqrt{D}\right| < \frac{1}{y^2}$.
(b) Find the fundamental solution to $x^2 - 101y^2 = 1$.

(c) Use the previous part to find a rational number $\frac{x}{y}$ such that $\left|\sqrt{101} - \frac{x}{y}\right| < 10^{-12}$.
Use this to write down an approximation to $\sqrt{101}$.

10. (a) Observe that $x^2 - 80y^2 = 1$ has a positive solution by inspection. Hence, prove
that $x^2 - 80y^2 = -1$ has no integer solutions. Generalize this argument to show
that $x^2 - (n^2 - 1)y^2 = -1$ has no integer solutions.

(b) Let $n \geq 2$ be an integer. Prove that the fundamental unit of $\mathbb{Z}[\sqrt{n^2 - 1}]$ is
$n + \sqrt{n^2 - 1}$.

11. (a) Let $p \equiv 3 \bmod 4$ be a prime. Prove that $x^2 - py^2 = -1$ is not solvable.

(b) Let $p \equiv 1 \bmod 4$ be a prime, and let $a + b\sqrt{p} > 1$ correspond to the fundamental
solution to $x^2 - py^2 = 1$. Prove that $a$ is odd, $b$ is even, and $\gcd(a + 1, a - 1) = 2$.
(c) From $pb^2 = a^2 - 1 = (a - 1)(a + 1)$, deduce that either $a + 1 = 2pu^2$ and $a - 1 = 2v^2$
or $a + 1 = 2u^2$ and $a - 1 = 2pv^2$ for some integers $(u, v)$.
(d) Show the latter case cannot actually happen, so that $a + 1 = 2pu^2$ and $a - 1 = 2v^2$.
Deduce that $v^2 - pu^2 = -1$, so that the equation $x^2 - py^2 = -1$ does have a
solution.

12. The $n$-th triangular number $T_n$ is defined by $T_n = 1 + 2 + \ldots + n = \frac{n(n+1)}{2}$. The numbers
$T_n$ count the number of dots in the triangular arrangement with $n$ dots on each side.
Prove there are infinitely many pairs of integers $(n, k)$ such that $T_n = k^2$. (Hint: write
down an equation describing such pairs and transform it into a Pell equation).

13. Find (with proof) the minimal polynomial of $\sqrt{2} + \sqrt{3}$.

Appendix A

Induction

The most common statement of mathematical induction may be stated as follows:

Theorem A.0.1. (Principle of Mathematical Induction) For $n \in \mathbb{N}$, let $P(n)$ be a statement
such that

1. $P(n_0)$ is true for some $n_0$

2. $P(k)$ is true implies $P(k + 1)$ is true for all $k \geq n_0$.

Then $P(n)$ is true for all $n \geq n_0$.

There is also a “stronger” version of induction:

Theorem A.0.2. (Principle of Strong Induction) For $n \in \mathbb{N}$, let $P(n)$ be a statement such
that

1. $P(n_0)$ is true for some $n_0$

2. $P(n_0), \ldots, P(k)$ all true implies $P(k + 1)$ is true for all $k \geq n_0$.

Then $P(n)$ is true for all $n \geq n_0$.

It’s not terribly hard to show that these two forms of induction are equivalent to each
other. More interestingly, both forms of induction are equivalent to the well-ordering
principle!

Theorem A.0.3. The principle of strong induction is equivalent to the well-ordering prin-
ciple.

Proof. Suppose that the principle of strong induction holds. Let $S \subset \mathbb{Z}^+$ be a non-empty
subset. We wish to show that $S$ has a least element. For sake of contradiction,
suppose that $S$ does not have a least element. Then $1 \notin S$, because 1 is the smallest
positive integer. From this we see that $2 \notin S$, because $1 \notin S$ and 2 is the next positive
integer after 1. Continuing this train of thought, we see that if $1, 2, \ldots, k \notin S$ for some $k$,
then we must have $k + 1 \notin S$. By strong induction, we must then have $n \notin S$ for all $n \geq 1$, which
says that $S$ is empty, a contradiction. Therefore, if we assume strong induction holds, then
the well-ordering principle holds.

Now suppose that the well-ordering principle holds, and let $P$ be a statement about
integers such that $P(n_0)$ is true for some $n_0$ and $P(n_0), \ldots, P(k)$ true implies that $P(k + 1)$
is true for all $k \geq n_0$. We wish to show that $P(n)$ is true for all $n \geq n_0$. Suppose otherwise,
that there is some $m \geq n_0$ such that $P(m)$ is false. Let $S = \{n \in \mathbb{Z}^+ : n \geq n_0 \text{ and } P(n) \text{ is false}\}$.
By assumption $S$ is non-empty, so by the well-ordering principle, $S$ has a smallest
element, say $k$. Since $P(n_0)$ is true, we must have that $k > n_0$. Now by definition of $k$,
$P(k - 1)$ must be true. Similarly, $P(n_0), P(n_0 + 1), \ldots, P(k - 1)$ must all be true. By
the hypothesis on $P$, this then says that $P(k)$ is true, which is a contradiction. Therefore if the
well-ordering principle holds, then strong induction holds, so we are done.
We have shown that induction and the well-ordering principle are equivalent, but we
haven’t shown that either one of these statements is actually true. In fact, we can’t! Any
construction of the integers (e.g. the Peano construction) must take either the well-ordering
principle or mathematical induction as an axiom.

Appendix B

Algebraic Structures

B.1 Groups
Definition B.1.1. A group is a pair (G, ·) for a non-empty set G and binary operation ·
that satisfy the following properties:

1. For any a, b, c ∈ G, we have (a · b) · c = a · (b · c)

2. There exists e ∈ G such that for any a ∈ G, a · e = e · a = a.

3. For all $a \in G$, there exists $a^{-1} \in G$ such that $a \cdot a^{-1} = a^{-1} \cdot a = e$.

The element $e$ is called the identity element of the group $G$, and $a^{-1}$ is called the inverse
of the element $a$. Furthermore, if $G$ is a group and satisfies the additional property that
for all $a, b \in G$ we have $a \cdot b = b \cdot a$, we say that $G$ is abelian. When $G$ is abelian, it is
customary to write the operation on $G$ as “+” instead of “·” (unless working with a specific
example of a group).

In other words, a group is a set where you can “multiply” elements together in some way.
The axioms for a group are rather loose, and so the notion of “multiplication” in a group
can be rather abstract.

Example B.1.2. The trivial group {0} defined by the operation 0 · 0 = 0. This is an abelian
group.

Example B.1.3. The pair (Z, +) with the usual addition of integers form an abelian group.
The identity element is 0, and the inverse of a number a ∈ Z is the negative number −a
because a + (−a) = 0. However, (Z, ·) with the operation of multiplication does not form
a group. Although there is an identity element with respect to this operation (the number
1), not all integers have an inverse with respect to ·. For example, there is no integer a such
that 2a = 1, so 2 has no inverse with respect to ·.

Example B.1.4. The pairs $(\mathbb{Q}, +)$ and $(\mathbb{Q} \setminus \{0\}, \cdot)$, with the usual operations of addition
and multiplication respectively, are abelian groups.

Example B.1.5. The pair $(M_2(\mathbb{R}), +)$ of $2 \times 2$ matrices with real number entries is an
abelian group. The pair $(SL_2(\mathbb{R}), \cdot)$ of $2 \times 2$ matrices of determinant 1 is a group under
multiplication of matrices. The identity element is the identity matrix $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, and the
fact that this forms a group is because from linear algebra, $\det(AB) = \det(A)\det(B)$, and
therefore the product of two matrices of determinant 1 has determinant 1. However, this
group is not abelian, because matrix multiplication is generally not commutative. As an
explicit counterexample, $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$, while $\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$.
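
The two matrix products can be verified mechanically. The sketch below uses plain nested lists and two small helpers, `matmul2` and `det2`, which are names introduced here for illustration.

```python
def matmul2(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]

print(matmul2(A, B))   # [[2, 1], [1, 1]]
print(matmul2(B, A))   # [[1, 1], [1, 2]]

# Both products still have determinant 1, so they stay inside SL_2
print(det2(matmul2(A, B)), det2(matmul2(B, A)))   # 1 1
```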

Example B.1.6. Let X be a non-empty set, and let Sym(X) denote the set of bijections
on X. That is, an element of Sym(X) is a bijection f : X → X. Then (Sym(X), ◦) is a
group under composition of functions. The identity element of Sym(X) is the identity map
$\mathrm{id}_X$ defined by $\mathrm{id}_X(x) = x$ for all $x \in X$. The inverse of $f \in \mathrm{Sym}(X)$ is the inverse map
$f^{-1} : X \to X$, which necessarily exists as $f$ is a bijection. The group $\mathrm{Sym}(X)$ is called the
symmetric group on X.

Definition B.1.7. Let $(G, \cdot_G)$ and $(H, \cdot_H)$ be groups. The product of $G$ and $H$, $G \times H$, is
the set $G \times H = \{(g, h) : g \in G, h \in H\}$ with group operation $\cdot$ given by $(g, h) \cdot (g', h') =
(g \cdot_G g', h \cdot_H h')$ for $(g, h)$ and $(g', h') \in G \times H$. The identity element is given by $(e_G, e_H)$.

Definition B.1.8. A subgroup of a group (G, ·) is a subset H ⊂ G such that (H, ·) is also
a group, and the identity element of H is the same as the identity element of G.

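Example B.1.9. The set of even integers $2\mathbb{Z}$ is a subgroup of $(\mathbb{Z}, +)$. The integers $\mathbb{Z}$ are a
subgroup of $(\mathbb{Q}, +)$ and also of $(\mathbb{R}, +)$. The group $SL_2(\mathbb{R})$ is a subgroup of $(GL_2(\mathbb{R}), \cdot)$, the group of invertible $2 \times 2$ real matrices under multiplication.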

B.2 Rings and fields


Definition B.2.1. A ring is a triple $(R, +, \cdot)$ consisting of a non-empty set $R$ with binary
operations $+, \cdot$ called addition and multiplication that satisfy the following axioms for all
$a, b, c \in R$:

1. a + b = b + a.

2. a + (b + c) = (a + b) + c and (a · b) · c = a · (b · c).

3. a · (b + c) = a · b + a · c and (a + b) · c = a · c + b · c.

4. There is an element 0 ∈ R with the property a + 0 = a.

5. For any a, there is an element −a ∈ R such that a + (−a) = 0.

6. There is an element $1 \in R$ such that $a \cdot 1 = 1 \cdot a = a$.

Note that the definition of a ring does not require that multiplication be commutative. A
ring that satisfies ab = ba for all a, b ∈ R is called a commutative ring.

Some algebra textbooks do not require that a ring have a multiplicative identity, and
instead call our definition a “ring with identity”. This is very bad – for various reasons, it
ends up being better to think of not having an identity element as something missing from
a ring instead of something added to a ring. There are a few arguments for not including a
multiplicative identity as part of the definition of a ring, but at the end of the day, objects
that behave like rings without identity are much better labeled under different terms.

Example B.2.2. The zero ring is a set {0} with operations defined by 0+0 = 0 and 0·0 = 0.
This satisfies the axioms for a commutative ring, with additive and multiplicative identity
both given by 0.

Example B.2.3. The integers Z are the prototypical example of a ring. Other familiar
examples: Q, R, C. All these rings are commutative.

Example B.2.4. M2 (R) is a ring with operations of matrix addition and matrix multi-
plication. This ring is not commutative, as can be seen by the example in the previous
section.

Example B.2.5. Let $R$ be a commutative ring, and let $R[x]$ denote the set of polynomials with coefficients
in $R$ in the indeterminate variable $x$. An element $p(x) \in R[x]$ looks like $p(x) = a_0 + a_1 x +
\ldots + a_n x^n$ for some $n \geq 0$ and $a_i \in R$. Then $R[x]$ forms a commutative ring with operations
of polynomial addition and polynomial multiplication. These operations are the same as the
ones you learn in middle school. For example, in $\mathbb{Z}[x]$, one has $(2x+1) + (x^2+2) = x^2 + 2x + 3$
and $(2x+1)(x^2+2) = 2x^3 + x^2 + 4x + 2$. In $R[x]$ the additive identity is the constant polynomial
$p(x) = 0$ and the multiplicative identity is the constant polynomial $p(x) = 1$.
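
The ring operations on polynomials are simple to implement. As a minimal sketch (with the assumption that a polynomial is stored as a list of integer coefficients, lowest degree first, and with illustrative function names), the two computations above look like this in Python:

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient lists (lowest degree first)."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_mul(p, q):
    """Multiply two non-empty coefficient lists by collecting terms of equal degree."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

f = [1, 2]       # 2x + 1
g = [2, 0, 1]    # x^2 + 2

print(poly_add(f, g))   # [3, 2, 1]     i.e. x^2 + 2x + 3
print(poly_mul(f, g))   # [2, 4, 1, 2]  i.e. 2x^3 + x^2 + 4x + 2
```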

For any ring R, one may “forget” the multiplicative structure to produce a group (R, +)
with respect to the operation of addition. Can one similarly “forget” the additive structure
to be left with a group (R, ·)? Not quite, because the axioms of a ring do not require that
all elements have a multiplicative inverse. However, a special subset of R will have this
property.

Definition B.2.6. Let $R$ be a ring. An element $r \in R$ is called a unit of $R$ if it has a
multiplicative inverse. That is, there exists $s \in R$ such that $rs = sr = 1$. If $r$ is a unit, we
denote its inverse as $r^{-1}$. The set of units of $R$ is denoted $R^\times$, and $(R^\times, \cdot)$ forms a group
called the unit group of $R$.

Example B.2.7. Z× = {±1}, because the only integer solutions to the equation xy = 1 are
x = y = 1 or x = y = −1.

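Example B.2.8. $\mathbb{Q}^\times = \mathbb{Q} \setminus \{0\}$, because all non-zero rational numbers have a multiplicative
inverse: if $\frac{a}{b} \neq 0$, then $\left(\frac{a}{b}\right)^{-1} = \frac{b}{a}$ because $\frac{a}{b} \cdot \frac{b}{a} = \frac{ab}{ab} = 1$.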

Definition B.2.9. Let $(R, +_R, \cdot_R)$ and $(S, +_S, \cdot_S)$ be rings. The product ring $R \times S$ is the
set $R \times S = \{(r, s) : r \in R, s \in S\}$ equipped with a ring structure given by $(r, s) + (r', s') =
(r +_R r', s +_S s')$ and $(r, s) \cdot (r', s') = (r \cdot_R r', s \cdot_S s')$ for $(r, s)$ and $(r', s') \in R \times S$. The identity
elements for $+$ and $\cdot$ are $(0_R, 0_S)$ and $(1_R, 1_S)$ respectively.

Definition B.2.10. A subring of a ring (R, +, ·) is a subset S ⊂ R such that (S, +, ·) forms
a ring with the same identity elements 0 and 1.

Example B.2.11. Z is a subring of Q. It’s also a subring of R, and C.

Example B.2.12. The set of even integers $2\mathbb{Z}$ is not a subring of $\mathbb{Z}$. Although the sum and
product of two even integers is still even, 1 is not an even number, so 2Z does not contain
the multiplicative identity element of Z (and in fact, does not even have a multiplicative
identity element!)

Definition B.2.13. An ideal of a commutative ring $(R, +, \cdot)$ is a subgroup $(I, +)$ of $(R, +)$
such that for any $r \in R$ and $x \in I$, one has $rx \in I$. In other words, an ideal of $R$ is an
additive subgroup of $R$ that absorbs multiplication by elements of $R$.

Note: we restricted the definition of ideals to commutative rings. For a general ring
R, one has notions of left ideals and right ideals, where the multiplication by elements of
R happen on the left or right respectively. For a commutative ring, there is no difference
between these notions. An ideal of a general ring R is a two-sided ideal, meaning it is both a
left and a right ideal. However, all rings we care about will be commutative, and so we will
not bother with these distinctions. Note that being an ideal of R is a stronger property than
I being a subring of R. A subring must only be closed under multiplication by elements of
itself. An ideal is closed under multiplication by any element of R. This is part of why we
want to require that rings have multiplicative identities. If we didn’t then ideals would be
subrings. However, they morally aren’t subrings, because they have stronger properties.

Definition B.2.14. Let (R, +, ·) be a commutative ring and let r ∈ R. The principal
ideal generated by r is (r) = {x · r : x ∈ R}, the set of R-multiples of r. An ideal I is called
principal if I = (r) for some r ∈ R.

Note that $(r)$ really is an ideal: indeed, for any $a, b \in (r)$ we have $a = xr$ and $b = yr$ for
some $x, y \in R$, and therefore $a + b = (x + y)r \in (r)$. We have $0 = 0 \cdot r \in (r)$, and clearly
$-a = (-x)r \in (r)$ so $(r)$ is an additive subgroup of $(R, +)$. For any $s \in R$ and $a \in (r)$, we
have $sa = s(xr) = (sx)r \in (r)$, so $(r)$ absorbs multiplication by elements of $R$, which means
it’s an ideal.

Example B.2.15. I = (0) is an ideal of any commutative ring R.

Example B.2.16. The even integers 2Z are an ideal of Z. In fact, 2Z = (2) is the principal
ideal generated by 2 (hence, the notation!)

Example B.2.17. In $\mathbb{Q}$, one has $(1) = \mathbb{Q}$. This is because any element of $\mathbb{Q}$ is a $\mathbb{Q}$-multiple
of 1: in particular, $\frac{a}{b} = \frac{a}{b} \cdot 1 \in (1)$. In general, for any commutative ring $R$ one has $R = (1)$
because for any $r \in R$, one has $r = r \cdot 1 \in (1)$.

Example B.2.18. In $\mathbb{Z}[x]$, the ideal $I = (x^2 + 1)$ consists of all $\mathbb{Z}[x]$-multiples of $x^2 + 1$. For
example, $(x^2 + 1) \cdot (x^2 + 2) = x^4 + 3x^2 + 2 \in I$, but $2x \notin I$ because there is no polynomial
$p(x) \in \mathbb{Z}[x]$ with $(x^2 + 1)p(x) = 2x$.

Example B.2.19. Consider $S = \mathbb{R}[x^2]$, the set of polynomials with real coefficients in which
only even powers of $x$ occur. Then $S$ is a subring of $\mathbb{R}[x]$, but $S$ is not an ideal of $\mathbb{R}[x]$,
because $x \cdot (x^2) = x^3 \notin S$.

Definition B.2.20. A field is a triple $(F, +, \cdot)$ such that $(F, +, \cdot)$ forms a commutative ring
with $0 \neq 1$ and $F^\times = F \setminus \{0\}$. That is, $F$ is a commutative ring where all non-zero elements have a
multiplicative inverse, so one can divide by any non-zero element.

Note that we require $0 \neq 1$ in the definition of a field. This is because we do not want
the zero ring to be a field!

Example B.2.21. $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$ are all familiar examples of fields. $\mathbb{Z}$ is not a field, because one
cannot divide by 2 and stay in the integers: $\frac{1}{2} \notin \mathbb{Z}$.

Example B.2.22. Let $\mathbb{Q}(x)$ denote the set of rational functions in the indeterminate $x$
with coefficients in $\mathbb{Q}$. That is, a function $f(x) \in \mathbb{Q}(x)$ is of the form $f(x) = \frac{p(x)}{q(x)}$ for some
polynomials $p(x), q(x) \in \mathbb{Q}[x]$ with $q(x) \neq 0$. The operations of addition and multiplication are
once again the ones familiar from middle school. With respect to these operations, $\mathbb{Q}(x)$
forms a field.

Example B.2.23. Let $F$ be a field. Then the only ideals of $F$ are $(0)$ and $F$. Indeed, $(0)$
is always an ideal. Suppose that $I \subset F$ is a non-zero ideal. This means there is some $x \in I$
with $x \neq 0$. Since $F$ is a field, $x^{-1} \in F$ exists, and therefore $x \cdot x^{-1} = 1 \in I$. Therefore,
$I = (1) = F$.

Definition B.2.24. A subfield of a field $(F, +, \cdot)$ is a subset $K \subset F$ such that $(K, +, \cdot)$ is
also a field.

Example B.2.25. $\mathbb{Q}$ is a subfield of $\mathbb{R}$. It’s also a subfield of $\mathbb{C}$. $\mathbb{R}$ is also a subfield of $\mathbb{C}$.

B.3 Morphisms
A morphism is a way of talking between two objects of the same type.

Definition B.3.1. Let $R, S$ be two objects of the same type (groups, rings, or fields).
A homomorphism between $R$ and $S$ is a function $f : R \to S$ that preserves algebraic
structure.

• If $R, S$ are groups, a group homomorphism is a map $f : R \to S$ such that for any
$a, b \in R$ one has $f(a \cdot_R b) = f(a) \cdot_S f(b)$ and $f(e_R) = e_S$, where $\cdot_R$ and $\cdot_S$ are the
operations on $R, S$ respectively, and $e_R, e_S$ are the identity elements of $R, S$.

• If $R, S$ are rings, a ring homomorphism is a map $f : R \to S$ such that for any
$a, b \in R$ one has $f(a +_R b) = f(a) +_S f(b)$, $f(a \cdot_R b) = f(a) \cdot_S f(b)$, and $f(1_R) = 1_S$.

If a homomorphism $f$ is a bijection, we call $f$ an isomorphism. If $f$ is an isomorphism
and $R = S$, we call $f$ an automorphism. If two objects $R$ and $S$ are isomorphic we write
$R \cong S$.

Note that we did not define the notion of a homomorphism of fields. Since fields are
rings, a field homomorphism is just a ring homomorphism. In particular, one may check
that for any homomorphism of rings $f$, one must also have $f(0_R) = 0_S$, $f(-r) = -f(r)$, and
$f(r^{-1}) = f(r)^{-1}$ whenever $r$ is a unit.

Definition B.3.2. Let R, S be two objects of the same type, and suppose that R is a sub-
object of S (i.e. a subgroup, subring, subfield, etc.). There is a homomorphism i : R → S
called the inclusion map, simply defined by i(x) = x for all x ∈ R. For example, (Z, +) is
a subgroup of (Q, +), so the inclusion map i : Z → Q that sends x ∈ Z to x ∈ Q is a group
homomorphism. We may also view Z as a subring of Q, and so the inclusion map is also a
homomorphism of rings.

Example B.3.3. Consider the two groups $(\mathbb{R}, +)$ and $(\mathbb{R}_{>0}, \cdot)$ with respect to addition and
multiplication of real numbers. The exponential map $\exp : \mathbb{R} \to \mathbb{R}_{>0}$ given by $\exp(x) = e^x$
is a group homomorphism. This is because $\exp(0) = e^0 = 1$, and $\exp(x + y) = e^{x+y} =
e^x \cdot e^y = \exp(x) \cdot \exp(y)$ by properties of exponential functions. This map is actually a group
isomorphism, with the inverse map $\log : \mathbb{R}_{>0} \to \mathbb{R}$, and so $\mathbb{R} \cong \mathbb{R}_{>0}$ as groups.
Example B.3.4. Consider the complex numbers $\mathbb{C}$, and let $\sigma : \mathbb{C} \to \mathbb{C}$ denote the complex
conjugation map. That is, $\sigma(a + bi) = a - bi$. Then $\sigma$ is a field automorphism, and we usually
denote $\sigma(a + bi)$ as $\overline{a + bi}$. That $\sigma$ is an automorphism follows rather easily: for two complex
numbers $z = a + bi$ and $w = c + di$, one has $z + w = (a + bi) + (c + di) = (a + c) + (b + d)i$
and $zw = (a + bi)(c + di) = (ac - bd) + (ad + bc)i$ by definition. It’s then straightforward to
see that $\sigma(z + w) = \sigma(z) + \sigma(w)$, $\sigma(zw) = \sigma(z)\sigma(w)$, $\sigma(0) = 0$, $\sigma(1) = 1$. This means that
$\sigma$ is a field homomorphism, and clearly $\sigma(\sigma(z)) = z$, so the inverse map of $\sigma$ is itself, which
means it’s an automorphism.
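
The homomorphism identities for conjugation can be spot-checked on a sample pair of complex numbers. The sketch below uses Python's built-in `complex` type with small integer parts (so the floating-point arithmetic is exact); the function name `sigma` mirrors the notation above, and the particular values of `z` and `w` are arbitrary choices. A check on one pair is of course not a proof.

```python
def sigma(z):
    """Complex conjugation: a + bi -> a - bi."""
    return z.conjugate()

z = complex(3, 4)
w = complex(-2, 5)

print(sigma(z + w) == sigma(z) + sigma(w))   # True
print(sigma(z * w) == sigma(z) * sigma(w))   # True
print(sigma(sigma(z)) == z)                  # True
print(z * w, sigma(z * w))                   # (-26+7j) (-26-7j)
```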

Example B.3.5. Let $R$ be a commutative ring and let $r \in R$. The map $\mathrm{ev}_r : R[x] \to R$ defined by
$\mathrm{ev}_r(p(x)) = p(r)$ is a ring homomorphism called the evaluation at $r$ map. Let $0, 1$ denote
the constant polynomials $p(x) = 0$ and $p(x) = 1$ in $R[x]$. Note that $\mathrm{ev}_r(0) = 0(r) = 0$ and
$\mathrm{ev}_r(1) = 1(r) = 1$ since 0 and 1 are constant polynomials, and by definition of function
addition/multiplication, for two polynomials $p(x), q(x) \in R[x]$ one has $(p + q)(r) = p(r) +
q(r)$ and $(p \cdot q)(r) = p(r)q(r)$, which means $\mathrm{ev}_r(p(x) + q(x)) = \mathrm{ev}_r(p(x)) + \mathrm{ev}_r(q(x))$ and
$\mathrm{ev}_r(p(x)q(x)) = \mathrm{ev}_r(p(x))\,\mathrm{ev}_r(q(x))$. Thus, $\mathrm{ev}_r$ is a ring homomorphism.

Definition B.3.6. Let $f : R \to S$ be a homomorphism. The kernel of $f$, $\ker(f)$, is defined
by:

• $\ker(f) = \{x \in R : f(x) = e_S\}$ if $R, S$ are groups.

• $\ker(f) = \{x \in R : f(x) = 0_S\}$ if $R, S$ are rings.

The image of $f$, $\mathrm{Im}(f)$, is defined by $\mathrm{Im}(f) = \{f(r) : r \in R\}$.

The kernel and image of a morphism f are special subsets of R and S respectively. They
measure the failure of a morphism to be injective/surjective.

Proposition 55. Let f : R → S be a homomorphism.

• $\ker(f)$ is trivial if and only if $f$ is injective.
• $\mathrm{Im}(f) = S$ if and only if $f$ is surjective.
Proof. We will give the proof for homomorphisms of rings. The proof for groups is left as
an exercise.
• First, suppose that $\ker(f)$ is trivial, and that $f(x) = f(y)$ for $x, y \in R$. Then $0_S =
f(x) - f(y) = f(x - y)$ because $f$ is a homomorphism, and therefore $x - y \in \ker(f)$.
As $\ker(f)$ is trivial, this means $x - y = 0_R$, so $x = y$. This says $f$ is injective. For
the other direction, note that $f(0_R) = 0_S$ always, and so if $x \in \ker(f)$, this means
$f(x) = 0_S = f(0_R)$ and so by injectivity of $f$, we have $x = 0_R$, which means $\ker(f) = \{0_R\}$ is
trivial.
• By definition, $f$ is surjective if and only if for any $s \in S$ we may find $r \in R$ with
$f(r) = s$. This is precisely the same thing as saying that $\mathrm{Im}(f) = S$.

Proposition 56. Let $f : R \to S$ be a homomorphism.

• If $f$ is a homomorphism of groups, then $\ker(f)$ and $\mathrm{Im}(f)$ are subgroups of $R, S$ respectively.
• If $f$ is a homomorphism of rings, then $\ker(f)$ is an ideal of $R$ and $\mathrm{Im}(f)$ is a subring
of $S$.
Proof. Exercise.
Why is the image of a ring homomorphism not an ideal? Since $f(1_R) = 1_S$, if $\mathrm{Im}(f)$ was
an ideal, then necessarily we would have $\mathrm{Im}(f) \supset (1_S) = S$, so $\mathrm{Im}(f) = S$ would mean that
$f$ is surjective. However, obviously not all ring homomorphisms are surjective! Similarly, if
$1_R \in \ker(f)$ then $R = (1_R) \subset \ker(f)$ so $f$ would be the zero map, and so $\ker(f)$ is almost
never a subring of $R$.
Example B.3.7. The inclusion map $i : \mathbb{Z} \to \mathbb{Q}$ as a map of rings is injective, and so
$\ker(i) = (0)$ is trivial, and clearly $\mathrm{Im}(i) = \mathbb{Z}$.
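Example B.3.8. We showed before that if $F$ is a field, then the only ideals of $F$ are $(0)$
and $F$. This means that if $f : F \to K$ is a homomorphism of fields, either $\ker(f) = (0)$ so $f$
is injective, or $\ker(f) = F$ so $f$ is the zero map. The latter cannot happen, because $f(1_F) = 1_K \neq 0_K$, so every homomorphism of fields is injective.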
Example B.3.9. Let $GL_2(\mathbb{R})$ be the group of invertible $2 \times 2$ matrices with real entries
with respect to multiplication. Consider the determinant map $\det : GL_2(\mathbb{R}) \to \mathbb{R}^\times$ given
by $A \mapsto \det(A)$. One may check that $\det$ is indeed a group homomorphism. We note that
$\ker(\det) = \{A \in GL_2(\mathbb{R}) : \det(A) = 1\} = SL_2(\mathbb{R})$, and $\mathrm{Im}(\det) = \mathbb{R}^\times$. That this latter
equality is true can be easily seen from $\det\begin{pmatrix} a & 0 \\ 0 & 1 \end{pmatrix} = a$ for any $a \in \mathbb{R}^\times$.
Example B.3.10. Let $R, S$ be rings. The projection map $\pi_R : R \times S \to R$ given by
$\pi_R((r, s)) = r$ is a surjective ring homomorphism with $\mathrm{Im}(\pi_R) = R$ and $\ker(\pi_R) = \{0_R\} \times S \cong S$.
