1 Dna Sequencing

The document provides information about DNA, including its structure as a double helix, storage in cells, relationship to genes and proteins, and sequencing technologies. It discusses DNA sequencing technologies including Sanger sequencing, second generation sequencing, and third generation sequencing. It also provides some useful facts about the human genome and DNA sequencing process.

Uploaded by Abraham Lincoln


DNA and sequencing (mostly Illumina)

Many slides adapted from Ben Langmead. Thanks, Ben!

https://langmead-lab.org/teaching-materials/
What is DNA?

• There are many types of biomolecules: carbohydrates, lipids, proteins, nucleic acids

• In this class we focus almost exclusively on nucleic acids

• DNA is a type of nucleic acid (deoxyribonucleic acid)

• DNA stores all the genetic information that a particular organism needs to
survive
DNA is stored in nearly every human cell

Question: which cells don’t have DNA?

https://en.wikipedia.org/wiki/DNA
DNA, genes, RNA, and proteins

[Figure: DNA, RNA, protein]
DNA and the double helix
[Figure: double helix with paired bases; strand ends labeled 5’ and 3’]
DNA, chromatid, chromosome

Some useful facts about DNA
• A human “genome” is about 3.1 Gb (billion bases)

• Counting just one side (strand) of the double helix

• Humans are 99.9% genetically identical

• A generous overestimate of one person’s variability is 3M genetic variants

• If we take the union of all common single-nucleotide variants across people (> 5% allele frequency), it’s only ~8M

• …so why sequence DNA?
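The 99.9% and 3M figures above are consistent with each other. A minimal sketch of the arithmetic, assuming a 3.1 Gb genome and treating the remaining 0.1% as single-base differences (a simplification; real variants also include insertions, deletions, and larger changes):

```python
from fractions import Fraction

# Assumption: ~3.1 Gb, counting one side of the double helix
GENOME_BP = 3_100_000_000

# 100% - 99.9% identical = 0.1% of positions differ (Fraction keeps the arithmetic exact)
FRACTION_DIFFERENT = Fraction(1, 1000)

differences = GENOME_BP * FRACTION_DIFFERENT
print(differences)  # 3100000
```

So two people differ at roughly 3M positions, which is where the 3M variant figure comes from.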

Genomics technology

• Sanger DNA sequencing: 1977-1990s
• DNA microarrays: since mid-1990s
• 2nd-generation DNA sequencing: since ~2007
• 3rd-generation & single-molecule DNA sequencing: since ~2010

Fred Sanger (1918-2013)

“Chain termination” sequencing

Sanger sequencing (1977-1990s)
The first practical DNA sequencing method, invented by Fred Sanger in 1977. Initially used to sequence short genomes, e.g. viral genomes tens of thousands of bases long.
[Images: Fred Sanger in episode 3 of the PBS documentary “DNA”; not-so-high-throughput Sanger sequencing]
[Video: Sanger sequencing, from the “DNA” documentary, episode 3]

Sequencing

No sequencing technology yet invented can read much more than 10,000 nucleotides at a time with reasonable cost, throughput, and accuracy.

Instead, there’s a vigorous race to see whose sequencer can read “short” fragments of DNA (around 100s of nucleotides) with the best cost, throughput, and accuracy.
• “Decoding DNA With Semiconductors”, by Nicholas Wade, published July 20, 2011
• “Company Unveils DNA Sequencing Device Meant to Be Portable, Disposable and Cheap”, by Andrew Pollack, published February 17, 2012
• “Cost of Gene Sequencing Falls, Raising Hopes for Medical Advances”, by John Markoff, published March 7, 2012
Source: nytimes.com
Sequencing
Since 2005, many DNA sequencing instruments have been described and released. They are based on a few different principles:

• Synthesis / ligation
• SMRT cell
• Nanopore

Sequencing by synthesis (“massively parallel sequencing”) provides the greatest throughput and is the most prevalent today.
Pictures: http://www.illumina.com/systems/miseq/technology.ilmn, http://www.genengnews.com/gen-articles/third-generation-sequencing-debuts/3257/
DNA: double helix

A pairs with T; G pairs with C

http://ghr.nlm.nih.gov/handbook/basics/dna

Forward strand
TCACACTGAGCGTGCTG
Reverse strand
AGTGTGACTCGCACGAC
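The reverse strand above is the base-by-base complement of the forward strand, aligned underneath it. Read in its own 5’→3’ direction, it is the reverse complement. A small sketch, assuming the forward strand is written 5’→3’ and contains only A, C, G, T:

```python
# Base pairing: A<->T, C<->G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, aligned under the input as on the slide."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """The complement read in the opposite direction (the reverse strand's own 5'->3' order)."""
    return complement(seq)[::-1]


forward = "TCACACTGAGCGTGCTG"
print(complement(forward))          # AGTGTGACTCGCACGAC
print(reverse_complement(forward))  # CAGCACGCTCAGTGTGA
```

The reverse complement matters later: a read can come from either strand, so a read may match the forward strand directly or only after reverse-complementing.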
Your genome

CGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTG

Reads
GTATGCACGCGATAG TATGTCGCAGTATCT CACCCTATGTCGCAG GAGACGCTGGAGCCG
TAGCATTGCGAGACG GGTATGCACGCGATA TGGAGCCGGAGCACC CGCTGGAGCCGGAGC
TGTCTTTGATTCCTG CGCGATAGCATTGCG GCATTGCGAGACGCT CCTATGTCGCAGTAT
GACGCTGGAGCCGGA GCACCCTATGTCGCA GTATCTGTCTTTGAT CCTCATCCTATTATT
TATCGCACCTACGTT CAATATTCGATCATG GATCACAGGTCTATC ACCCTATTAACCACT
CACGGGAGCTCTCCA TGCATTTGGTATTTT CGTCTGGGGGGTATG CACGCGATAGCATTG
GTATGCACGCGATAG ACCTACGTTCAATAT TATTTATCGCACCTA CCACTCACGGGAGCT
GCGAGACGCTGGAGC CTATCACCCTATTAA CTGTCTTTGATTCCT ACTCACGGGAGCTCT
CCTACGTTCAATATT GCACCTACGTTCAAT GTCTGGGGGGTATGC AGCCGGAGCACCCTA
GACGCTGGAGCCGGA GCACCCTATGTCGCA GTATCTGTCTTTGAT CCTCATCCTATTATT
TATCGCACCTACGTT CAATATTCGATCATG GATCACAGGTCTATC ACCCTATTAACCACT
CACGGGAGCTCTCCA TGCATTTGGTATTTT CGTCTGGGGGGTATG CACGCGATAGCATTG

Reads: 100 nt
Your genome: 100,000,000 nt
?
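The slides above picture sequencing as drawing many short substrings (reads) from random places in a much longer genome. A toy simulation of that picture, under simplifying assumptions that real sequencers do not satisfy: uniformly random start positions, forward strand only, and no sequencing errors. The function names are hypothetical, and `mean_coverage` is the usual reads × read length / genome length estimate:

```python
import random


def simulate_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[str]:
    """Sample error-free reads from uniformly random start positions (forward strand only)."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, len(genome) - read_len)  # randint's upper bound is inclusive
        reads.append(genome[start:start + read_len])
    return reads


def mean_coverage(read_len: int, n_reads: int, genome_len: int) -> float:
    """Average number of reads covering each genome position."""
    return read_len * n_reads / genome_len


genome = "CGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTG"
for read in simulate_reads(genome, read_len=15, n_reads=5):
    print(read)

# 1M reads of 100 nt over a 100,000,000 nt genome cover each position once on average
print(mean_coverage(100, 1_000_000, 100_000_000))  # 1.0
```

The hard part, which the "?" points at, is the reverse direction: given only the reads, work out where each one came from.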
The sequencing Oracle

Your genome, by chromosome (labels partly illegible, e.g. chr2):
CGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTG
[Figure: double-stranded DNA (double helix) and a “lego version” of it, with A pairing with T and C pairing with G. The two strands are separated into single-stranded templates, and DNA polymerase builds a new complementary strand along a template one base at a time.]
More details: Accurate whole human genome sequencing using
reversible terminator chemistry. Nature. 2008 Nov 6;456(7218):53-9
Input DNA
CCATAGTATATCTCGGCTCTAGGCCCTCATTTTTT
CCATAGTATATCTCGGCTCTAGGCCCTCATTTTTT
CCATAGTATATCTCGGCTCTAGGCCCTCATTTTTT
CCATAGTATATCTCGGCTCTAGGCCCTCATTTTTT


Cut into snippets

CCATAGTA TATCTCGG CTCTAGGCCCTC ATTTTTT
CCA TAGTATAT CTCGGCTCTAGGCCCTCA TTTTTT
CCATAGTAT ATCTCGGCTCTAG GCCCTCA TTTTTT
CCATAG TATATCT CGGCTCTAGGCCCT CATTTTTT
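Because each copy of the input is cut at different places, the snippets from different copies overlap, as in the four rows above. A toy sketch, assuming cut points are uniformly random interior positions (real fragmentation is neither uniform nor exact); `fragment` and `n_cuts` are hypothetical names:

```python
import random


def fragment(seq: str, n_cuts: int, rng: random.Random) -> list[str]:
    """Cut one copy of seq at n_cuts distinct random interior positions."""
    cuts = sorted(rng.sample(range(1, len(seq)), n_cuts))
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


input_dna = "CCATAGTATATCTCGGCTCTAGGCCCTCATTTTTT"
rng = random.Random(1)
for _ in range(4):  # four copies, each cut differently
    print(" ".join(fragment(input_dna, 3, rng)))
```

Each copy's snippets concatenate back to the original, but snippets from different copies start and end at different positions.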


Deposit on slide
[Figure: single-stranded snippets (e.g. CCATAG) deposited on the slide]

Each DNA ‘cluster’ is about 1-2 microns across

A flow cell has several lanes and is capable of sequencing billions of reads
Prepped sequences “flow” on the flow cell and bind to wells
[Figure: single-stranded templates (e.g. CCATAG, CCACGG, CTTAAG) bound to the flow cell; billions of microwells, each of which contains “one” sequence]
tinyurl.com/cs121sp24
[Figure: templates (billions of them!) on the slide]
• It is exceptionally difficult to sequence one molecule
• Imagine that each template is actually many copies of the same sequence in one microwell
DNA polymerase adds one complementary base (A pairs with T, C pairs with G); the base carries a “terminator” that blocks further extension
(snap)
Remove terminators
Repeat! (snap) (snap) (snap) (snap) (snap)
Sequencing by synthesis

Cycle:  1           2           3           4           5           6
        complement  complement  complement  complement  complement  complement
        G           A           T           A           C           C

Template (top to bottom): CCATAG
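The cycle-by-cycle process above can be sketched as a loop. This is a toy model, not Illumina's software. It assumes the template is written 5’→3’ from top to bottom, so synthesis starts at its bottom (3’) end, which matches the G A T A C C order of the cycles, and that every cycle adds exactly one base:

```python
# Base pairing used by the polymerase
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequence_by_synthesis(template: str) -> str:
    """Toy model of the cycles: returns the bases imaged, in cycle order.

    Assumption: `template` is written 5'->3' (top to bottom on the slide), so
    synthesis starts at its 3' end (the last character) and moves upward.
    """
    read = []
    for template_base in reversed(template):  # cycle 1, cycle 2, ...
        added = PAIR[template_base]  # add the complement; its terminator blocks any further base
        read.append(added)           # "snap": the image records the added base's color
        # terminator removed here, so the next cycle can add exactly one more base
    return "".join(read)


print(sequence_by_synthesis("CCATAG"))  # CTATGG
```

Under these assumptions the read is the reverse complement of the template as written.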
Sequencing by synthesis

Actual Illumina HiSeq 3000 image

http://dnatech.genomecenter.ucdavis.edu/2015/05/07/first-hiseq-3000-data-download/
Sequencing by synthesis
Billions of templates on a slide
Massively parallel: each photograph captures all templates simultaneously
Terminators are “speed bumps,” keeping reactions in sync
Eh, I thought it was really hard to sequence
specific molecules?
Yes. Yes, it is.

https://youtu.be/oIJaA6h2bFM?t=613
Bridge amplification example
[Figure, annotations partly illegible: adapters with complementary sequences connect the strand to the microwell; bend strand; original DNA; denature; result: a cluster of clones]
[Figure: a strand left unterminated gets ahead of schedule]
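A strand that misses its terminator gets ahead of schedule, and its light no longer matches the rest of its cluster. A toy model of how this adds up over cycles, assuming each strand independently goes unterminated with a fixed per-cycle probability `eps` (a hypothetical parameter) and ignoring strands that fall behind:

```python
def fraction_in_sync(eps: float, cycles: int) -> float:
    """Fraction of a cluster's strands never ahead of schedule after `cycles` cycles.

    Toy assumption: each strand independently goes unterminated with probability
    `eps` per cycle, and a strand that skips once stays out of sync.
    """
    return (1.0 - eps) ** cycles


for n in (1, 50, 100, 150):
    print(n, round(fraction_in_sync(0.005, n), 3))
```

Because the in-sync fraction shrinks geometrically with the cycle number, the signal gets noisier in later cycles, which is one plausible contributor to quality degrading as a function of read length.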
Base quality

Q = -10 · log10 p, where p is the probability that the base call is incorrect

Q = 10 → 1 in 10 chance the call is incorrect
Q = 20 → 1 in 100
Q = 30 → 1 in 1,000
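The formula and its inverse, p = 10^(-Q/10), as two small Python functions (the names are illustrative, not from any library):

```python
import math


def q_from_p(p: float) -> float:
    """Base quality Q from the probability p that the call is incorrect."""
    return -10 * math.log10(p)


def p_from_q(q: float) -> float:
    """Inverse: probability of an incorrect call from quality Q."""
    return 10 ** (-q / 10)


for q in (10, 20, 30):
    print(q, p_from_q(q))
```

Each +10 in Q means a 10x smaller chance of error, matching the table above.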
Call: orange (C)

Estimate p, the probability that the call is incorrect:

p ≈ non-orange light / total light

p = 3 green / 9 total = 1/3

Q = -10 log10(1/3) ≈ 4.77
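The same estimate as a function: call the brightest color, then take p as the fraction of light that is not that color. This is a sketch of the slide's simple estimator, not a real base caller. The example assumes only orange and green light, so 9 total minus 3 green gives 6 units of orange (inferred from the slide's numbers):

```python
import math


def call_with_quality(light: dict[str, float]) -> tuple[str, float]:
    """Call the brightest color; estimate p as non-called light / total light."""
    called = max(light, key=light.get)
    total = sum(light.values())
    p = (total - light[called]) / total
    if p == 0:
        return called, math.inf  # no competing light: p = 0, so Q is unbounded
    return called, -10 * math.log10(p)


# Slide example: orange (C) plus 3 units of green, 9 units total
color, q = call_with_quality({"orange": 6, "green": 3})
print(color, round(q, 2))  # orange 4.77
```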
A read in FASTQ format

Name @ERR194146.1 HSQ1008:141:D0CC8ACXX:3:1308:20201:36071/1

Sequence ACATCTGGTTCCTACTTCAGGGCCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAAT
(ignore) +
Base qualities ?@@FFBFFDDHHBCEAFGEGIIDHGH@GDHHHGEHID@C?GGDG@FHIGGH@FHBEG:G
FASTQ

Read 1: Name, Sequence, (placeholder “+” line), Base qualities
Read 2: Name, Sequence, (placeholder “+” line), Base qualities
Read 3: Name, Sequence, (placeholder “+” line), Base qualities
Read 4: Name, Sequence, (placeholder “+” line), Base qualities
Read 5: Name, Sequence, (placeholder “+” line), Base qualities
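A FASTQ file is a series of these 4-line records. A minimal reader sketch, assuming every record is exactly four lines (no wrapped sequence lines) and that a missing name line means end of file; `read_fastq` is a hypothetical helper, and production tools handle more edge cases:

```python
import io
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (name, sequence, qualities) for each 4-line FASTQ record."""
    while True:
        name = handle.readline().rstrip("\n")
        if not name:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the "+" placeholder line, ignored
        qual = handle.readline().rstrip("\n")
        if not name.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {name!r}")
        yield name[1:], seq, qual


example = (
    "@ERR194146.1 HSQ1008:141:D0CC8ACXX:3:1308:20201:36071/1\n"
    "ACATCTGGTTCCTACTTCAGGGCCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAAT\n"
    "+\n"
    "?@@FFBFFDDHHBCEAFGEGIIDHGH@GDHHHGEHID@C?GGDG@FHIGGH@FHBEG:G\n"
)
for name, seq, qual in read_fastq(io.StringIO(example)):
    print(name, len(seq), len(qual))
```

The length check enforces that each base has exactly one quality character.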
Quality degrades as a function of length
How do we get errors?
• Inferring (“calling”) the correct base from the images is called “base calling”

• Lots of methods, my fav: “BayesCall”

https://genome.cshlp.org/content/19/10/1884.full
Base qualities

Bases and qualities line up:

AGCTCTGGTGACCCATGGGCAGCTGCTAGGGA
||||||||||||||||||||||||||||||||
HHHHHHHHHHHHHHHGCGC5FEFFFGHHHHHH

Base quality is an ASCII-encoded version of Q = -10 log10 p

[Figure: ASCII table]
Base qualities

Usual ASCII encoding is “Phred+33”:

take Q, rounded to integer, add 33, convert to character

def QtoPhred33(Q):
    """ Turn Q into Phred+33 ASCII-encoded quality """
    # chr converts integer to character according to ASCII table
    return chr(int(round(Q)) + 33)

def phred33ToQ(qual):
    """ Turn Phred+33 ASCII-encoded quality into Q """
    # ord converts character to integer according to ASCII table
    return ord(qual) - 33
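The same Phred+33 rules applied to a whole quality string at once, for example the `HHH…` line shown earlier. A sketch with hypothetical helper names that mirror `QtoPhred33` and `phred33ToQ`:

```python
def phred33_to_qs(quals: str) -> list[int]:
    """Decode a whole Phred+33 quality string into integer Q values."""
    return [ord(c) - 33 for c in quals]


def qs_to_phred33(qs: list[float]) -> str:
    """Encode Q values, rounding each to an integer first."""
    return "".join(chr(int(round(q)) + 33) for q in qs)


def q_to_p(q: float) -> float:
    """Probability that a call with quality Q is incorrect."""
    return 10 ** (-q / 10)


qual = "HHHHHHHHHHHHHHHGCGC5FEFFFGHHHHHH"
qs = phred33_to_qs(qual)
print(qs[:3], min(qs))        # [39, 39, 39] 20  ('5' is the lowest-quality base)
print(qs_to_phred33([4.77]))  # &  (the Q = 4.77 example rounds to 5, then 5 + 33 = 38)
```

Rounding means encoding loses precision: Q = 4.77 is stored as 5, i.e. p ≈ 0.32 rather than exactly 1/3.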
