
US10400267

The document discusses methods for preparing sequencing libraries directly from blood samples to detect chromosomal abnormalities in cell-free DNA (cfDNA) without isolating the cfDNA. The methods involve reducing the binding between the cfDNA and nucleosomal proteins, such as by treating with a detergent or heating. The test samples can be peripheral blood from pregnant women or cancer patients.


US010400267B2

(12) United States Patent
     Srinivasan et al.
(10) Patent No.: US 10,400,267 B2
(45) Date of Patent: *Sep. 3, 2019

(54) GENERATING CELL-FREE DNA LIBRARIES DIRECTLY FROM BLOOD

(71) Applicant: Verinata Health, Inc., San Diego, CA (US)

(72) Inventors: Anupama Srinivasan, Redwood City, CA (US); Richard P. Rava, Redwood City, CA (US)

(73) Assignee: Verinata Health, Inc., San Diego, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 16/005,502

(22) Filed: Jun. 11, 2018

(65) Prior Publication Data
     US 2018/0346967 A1    Dec. 6, 2018

Related U.S. Application Data

(63) Continuation of application No. 14/214,277, filed on Mar. 14, 2014, now Pat. No. 10,017,807.

(60) Provisional application No. 61/801,126, filed on Mar. 15, 2013.

(51) Int. Cl.
     C12Q 1/6806 (2018.01)
     C12Q 1/6855 (2018.01)
     C12Q 1/6809 (2018.01)

(52) U.S. Cl.
     CPC: C12Q 1/6809 (2013.01); C12Q 1/6806 (2013.01)

(58) Field of Classification Search
     CPC: C12Q 1/6806; C12Q 2521/501; C12Q 2527/101
     See application file for complete search history.

Primary Examiner: Jennifer Dunston

(74) Attorney, Agent, or Firm: Weaver Austin Villeneuve & Sampson LLP

(57) ABSTRACT

The disclosure provides methods and kits for preparing sequencing library to detect chromosomal abnormality using cell-free DNA (cfDNA) without the need of first isolating the cfDNA from a liquid fraction of a test sample. In some embodiments, the method involves reducing the binding between the cfDNA and nucleosomal proteins without unwinding the cfDNA from the nucleosomal proteins. In some embodiments, the reduction of binding may be achieved by treating with a detergent or heating. In some embodiments, the method further involves freezing and thawing the test sample before reducing the binding between the cfDNA and the nucleosomal proteins. In some embodiments, the test sample is a peripheral blood sample from a pregnant woman including cfDNA of both a mother and a fetus. In other embodiments, the test sample is a peripheral blood sample from a patient known or suspected to have cancer.

19 Claims, 36 Drawing Sheets
U.S. Patent    Sep. 3, 2019    Sheet 1 of 36    US 10,400,267 B2

FIG. 1A: flowchart 100
103: Receive Whole Blood Sample (Optionally fix WBCs)
105: Centrifuge to isolate Plasma Fraction
107: Extract cfDNA from Plasma Fraction
109: Prepare Library from Extracted cfDNA
111: Perform Massively Parallel Sequencing on Library
FIG. 1B (Sheet 2 of 36): flowchart
121: Degrade Proteins in Plasma Fraction - Treat Plasma with a Chaotropic Agent
123: Contact Plasma Fraction with a Support Matrix to absorb cfDNA
(numeral not legible): Wash Support Matrix
127: Release and Elute cfDNA from Support Matrix
FIG. 1C (Sheet 3 of 36): labeled diagram
Labels: octamer of core histones: H2A, H2B, H3, H4 (each one x2); core DNA; histone H1; linker DNA
FIG. 2A (Sheet 4 of 36): flowchart
203: Receive Whole Blood Sample (Optionally fix WBCs)
205: Centrifuge to isolate Plasma Fraction
207: Reduce Concentration of Serum Proteins (optional)
209: Make Library directly from cfDNA in Plasma Fraction
211: Perform Massively Parallel Sequencing on Library
FIG. 2B (Sheet 5 of 36): flowchart detailing operation 209
221: Provide Ligase, adaptor sequences and other reagents
223: Ligate adaptor sequences to cfDNA under conditions that partially release cfDNA from Nucleosome Proteins
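The conditions referred to in operation 223 can be collected into a small check. The following is a minimal sketch and not part of the patent. The temperature range (about 35° C. to 70° C.) and the TWEEN-20 range (about 0.1% to about 5%) are quoted from the disclosure, but the class name, the field names, and the choice to apply the detergent range (which the disclosure states for the kit) as a bound on the ligation reaction are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Ranges quoted from the disclosure; treating them as hard limits is an assumption.
TEMP_RANGE_C = (35.0, 70.0)        # heating while contacting adaptors and ligase
DETERGENT_RANGE_PCT = (0.1, 5.0)   # TWEEN-20 concentration range given for the kit


@dataclass
class LigationConditions:
    """Hypothetical record of the conditions used for operation 223."""
    temperature_c: Optional[float] = None  # None means no heating is applied
    detergent_pct: Optional[float] = None  # None means no detergent is added

    def problems(self) -> list:
        """Return reasons the conditions fall outside the quoted ranges."""
        issues = []
        if self.temperature_c is None and self.detergent_pct is None:
            issues.append("neither heating nor detergent is specified")
        if self.temperature_c is not None:
            low, high = TEMP_RANGE_C
            if not low <= self.temperature_c <= high:
                issues.append(f"temperature {self.temperature_c} C outside {low}-{high} C")
        if self.detergent_pct is not None:
            low, high = DETERGENT_RANGE_PCT
            if not low <= self.detergent_pct <= high:
                issues.append(f"detergent {self.detergent_pct}% outside {low}-{high}%")
        return issues
```

An empty list from `problems()` means the stated settings sit inside the quoted ranges; it says nothing about whether a given reaction would actually succeed.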
FIG. 3A (Sheet 6 of 36): flowchart
301: Fix White Blood Cells in the Sample
303: Freeze the Whole Blood Sample
305: Thaw the Whole Blood Sample
307: Centrifuge to isolate a Liquid Fraction
309: Reduce Concentration of Serum Proteins (optional)
311: Make Library directly from cfDNA in Plasma Fraction
313: Perform Massively Parallel Sequencing on Library
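Operation 307 centrifuges the thawed sample, and the disclosure elsewhere specifies, for the single-centrifugation embodiments, an acceleration of at least about 10,000 g. Centrifuges are usually set in rpm, so a conversion is needed. The sketch below uses the standard relation RCF = 1.118e-5 x r x rpm^2, with r in centimeters. The rotor radius used in the example is a hypothetical value, not one taken from the patent.

```python
import math

# Standard factor relating rotor speed (rpm) and radius (cm) to multiples of g.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed needed to reach a target relative centrifugal force."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius_cm must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 8.0  # hypothetical rotor radius, chosen only for illustration
    print(f"{rpm_for_rcf(10_000, radius_cm):.0f} rpm reaches 10,000 g at {radius_cm} cm")
```

Because the force scales with the square of the speed, a rotor with a smaller radius needs a noticeably higher rpm to reach the same acceleration.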
FIG. 3B (Sheet 7 of 36): flowchart
300: Receive Whole Blood Sample
301: Fix White Blood Cells in the Sample
303: Freeze the Whole Blood Sample
305: Thaw the Whole Blood Sample
307: Centrifuge to isolate a Liquid Fraction
321: Extract cfDNA from Liquid Fraction
323: Prepare Library from Extracted cfDNA
325: Perform Massively Parallel Sequencing on Library
FIG. 4 (Sheet 8 of 36): labeled diagram of a blood collection device
Labels: Receptacle for 2-4 drops of blood with locking lid; Treated medical sponge for absorbing & stabilizing plasma; "Membrane that separates" (label truncated in the source)
FIG. 5 (Sheet 9 of 36): flowchart 500
510: Obtain a biological source sample comprising genomic nucleic acids
520: Combine marker nucleic acids with biological source sample
530: Prepare sequencing library of sample genomic and marker nucleic acids
540: Perform massively parallel singleplex sequencing
550: Analyze sequencing information
560: Verify the integrity of sample
FIG. 6 (Sheet 10 of 36): flowchart 600
610: Obtain a plurality of biological samples comprising genomic nucleic acids
620: Combine unique marker nucleic acids with biological source
630: Prepare sequencing library of indexed genomic and marker nucleic acids
640: Perform massively parallel multiplex sequencing
650: Analyze sequencing information
660: Verify the integrity of each of the plurality of samples
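Operations 620 through 660 can be illustrated with a short sketch of marker-based integrity checking. It is a simplified illustration under stated assumptions, not the patent's analysis method. It assumes each sample was spiked with one unique marker sequence, that reads are already separated by sample, and that a marker is detected by exact substring match. The function name and the marker sequences in the usage note are hypothetical.

```python
from typing import Dict, List


def verify_sample_integrity(
    reads_by_sample: Dict[str, List[str]],
    expected_marker: Dict[str, str],
) -> Dict[str, bool]:
    """Pass a sample only if its own marker is seen and no other sample's marker is.

    Exact substring matching is an assumption; real reads would need
    tolerance for sequencing errors.
    """
    results = {}
    for sample, reads in reads_by_sample.items():
        own = expected_marker[sample]
        others = [m for s, m in expected_marker.items() if s != sample]
        has_own = any(own in read for read in reads)
        has_foreign = any(marker in read for marker in others for read in reads)
        results[sample] = has_own and not has_foreign
    return results
```

For example, with markers `{"S1": "ACGTTGCA", "S2": "TTGGCCAA"}`, a sample S2 whose reads contain both sequences would fail, which is the kind of mix-up or cross-contamination the unique markers are meant to expose.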
[FIG. 7 (Sheet 11 of 36): electropherogram. Only axis tick values (200, 300, 400, 500, 700) survive extraction.]
[FIG. 8 (Sheet 12 of 36): plot of % Chromosome by chromosome (chr11 through chr21 legible on the x-axis). Legend: 50 ul plasma-ME, 25 ul plasma-ME, 50 ul plasma-PC, 25 ul plasma-PC.]
[FIG. 9A (Sheet 13 of 36), FIG. 9B (Sheet 14 of 36), and FIG. 9C (Sheet 15 of 36): traces against a size axis in bp, with ticks from 15 to 1500. No other labels are recoverable.]
[FIG. 10 (Sheet 16 of 36): plot of % Chr by chromosome (chr11 through chr21 legible on the x-axis). Legend: 50 ul plasma-ME, TW-20 Pls.]
[FIGS. 11A and 11B (Sheet 17 of 36): axis label Conc. (pg/ul). Remaining labels are not recoverable.]
[FIG. 12 (Sheet 18 of 36): plot; y-axis: DNA conc from plasma (pg/ul); x-axis: DNA conc from FT Blood (pg/ul), 0 to 600.]
[FIG. 13A (Sheet 19 of 36), FIG. 13B (Sheet 20 of 36), and FIG. 13C (Sheet 21 of 36): each shows a panel labeled DNA and a panel labeled Library, with size-axis ticks. The traces are not recoverable.]
[FIG. 14 (Sheet 22 of 36): plot with legend Plasma and FT Blood; axis labels Plasma Library nM and FT Blood Library nM.]
[FIG. 15 (Sheet 22 of 36): plot by chromosome (Chr1 through Chr21). Legend: Avg-FT, Avg-Plasma.]
[FIG. 16 (Sheet 23 of 36): % Chr versus Chr size (Mb), x-axis 40 to 240; annotated R2 = 0.977 and a second value of 0.9731.]
[FIG. 17 (Sheet 23 of 36): three ratio panels; legible titles are Ratio_13 and Ratio 21 (the middle panel title is garbled).]
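The axis labels recovered for FIG. 16 (% Chr) and FIG. 17 (ratio panels) point to per-chromosome read proportions. The exact definitions are not recoverable from the figure text, so the following is only a sketch under assumptions: it takes "% Chr" to mean the percentage of mapped reads assigned to each chromosome, and a chromosome ratio to mean the reads on a chromosome of interest divided by the reads on a chosen set of normalizing chromosomes. Both function names and the choice of normalizers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def percent_reads_per_chromosome(mapped_chromosomes: Iterable[str]) -> Dict[str, float]:
    """Percentage of mapped reads assigned to each chromosome (assumed meaning of % Chr)."""
    counts = Counter(mapped_chromosomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {chrom: 100.0 * n / total for chrom, n in counts.items()}


def chromosome_ratio(counts: Dict[str, int], chrom: str, normalizers: Iterable[str]) -> float:
    """Reads on the chromosome of interest divided by reads on the normalizing chromosomes."""
    denominator = sum(counts[c] for c in normalizers)
    if denominator == 0:
        raise ValueError("normalizing chromosomes have no reads")
    return counts[chrom] / denominator
```

Under these assumptions, a ratio that departs from the values seen in unaffected samples would be the signal examined when assessing copy number variation. How the patent normalizes and thresholds that signal is not shown in the recovered figure text.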
[FIG. 18 (Sheet 24 of 36): two panels. FT Blood Ratio X versus Plasma Ratio X (x-axis 0.68 to 0.8; annotated 0.9276) and FT Blood Ratio Y versus Plasma Ratio Y (x-axis 0 to 0.008; annotated R2 = 0.9496).]
[FIG. 19 (Sheet 25 of 36): no labels recoverable.]
[FIG. 20 (Sheet 26 of 36): plot with axis label Chromosome 7: 1 Mb Bin.]
[FIG. 21 (Sheet 27 of 36): plot with axis label Chromosome 11: 1 Mb Bin.]
[FIG. 22 (Sheet 28 of 36): plot with a Chromosome, 1 Mb Bin axis label (chromosome number unclear).]
[FIG. 23A (Sheet 29 of 36): plot across chromosomes 1 to 22 and X; axis label Chromosome 6: 1 Mb Bin.]
[FIG. 23B (Sheet 30 of 36): plot with axis label Chromosome 6: 100 kb Bin, bins 990 to 1050.]
[FIG. 24A (Sheet 31 of 36): plot across chromosomes with 1 Mb bins; a second axis label reads Chromosome 7: 1 Mb Bin.]
[FIG. 24B (Sheet 32 of 36): two panels with axis labels Chromosome 8: 100 kb Bin (bins 400 to 550) and Chromosome 7: 100 kb Bin (bins 1425 to 1575).]
[FIG. 25 (Sheet 33 of 36): plot across chromosomes; axis label Chromosome 8: 1 Mb Bin.]
[FIG. 26 (Sheet 34 of 36): plot across chromosomes; axis label Chromosome 15: 1 Mb Bin.]
[FIG. 27 (Sheet 35 of 36): plot across chromosomes; axis label Chromosome 17: 1 Mb Bin, plus a second 1 Mb Bin axis whose chromosome number is not recoverable.]
[FIG. 28 (Sheet 36 of 36): plot across chromosomes; axis label Chromosome 17: 1 Mb Bin, plus a second 1 Mb Bin axis whose chromosome number is not recoverable.]
GENERATING CELL-FREE DNA LIBRARIES DIRECTLY FROM BLOOD

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. patent application Ser. No. 14/214,277, filed Mar. 14, 2014; which claims the benefit under 35 U.S.C. § 119(e)(1) of U.S. Provisional Patent Application No. 61/801,126, filed Mar. 15, 2013; all of the above prior applications are hereby incorporated by reference in their entireties.

BACKGROUND

One of the critical endeavors in human medical research is the discovery of genetic abnormalities that produce adverse health consequences. In many cases, specific genes and/or critical diagnostic markers have been identified for use in prenatal and cancer diagnosis, for example.

Conventional procedures for genetic screening and biological dosimetry have utilized invasive procedures, e.g., amniocentesis, to obtain cells for the analysis of karyotypes. The advent of technologies that allow for sequencing entire genomes in relatively short time, and the discovery of circulating cell-free DNA (cfDNA), have provided the opportunity to compare genetic material originating from one chromosome to that of another without the risks associated with invasive sampling methods. However, the limitations of the existing methods, which include insufficient sensitivity stemming from the limited levels of cfDNA and the special care required in extracting cfDNA, underlie the continuing need for improved methods that would provide inexpensive and reliable diagnosis protocols utilizing cfDNA in a variety of clinical settings.

Conventionally, when blood is collected in the commonly used blood collection tubes, such as EDTA tubes and ACD tubes, the plasma has to be separated from other blood fractions before purifying cfDNA. Plasma is generally separated from other blood components by centrifugation. The reason for the mandatory plasma isolation step is to avoid contaminating the cfDNA with cellular DNA from the white blood cells. In addition to separating the plasma, cfDNA must be purified by, e.g., releasing it from nucleosomes prior to sequencing. Unfortunately, the purification steps associated with conventional techniques for isolating cfDNA increase the cost and complexity of the cfDNA diagnostic procedures.

INCORPORATION BY REFERENCE

All patents, patent applications, and other publications, including all sequences disclosed within these references, referred to herein are expressly incorporated herein by reference, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated by reference. All documents cited are, in relevant part, incorporated herein by reference in their entireties for the purposes indicated by the context of their citation herein. However, the citation of any document is not to be construed as an admission that it is prior art with respect to the present disclosure.

SUMMARY

The disclosure provides methods and kits for preparing a sequencing library to detect chromosomal abnormality using cell-free DNA (cfDNA) without the need of first isolating the cfDNA from a liquid fraction of a test sample. In some embodiments, the method involves reducing the binding between the cfDNA and nucleosomal proteins without unwinding the cfDNA from the nucleosomal proteins. In a process by which a sequencing library is generated directly from a biological fluid without an intervening DNA isolation step, there is a minimum amount of the fluid required to successfully generate the library and still generate useable downstream data.

In some embodiments, the reduction of binding may be achieved by treating with a detergent or heating. In some embodiments, the method further involves freezing and thawing the test sample before reducing the binding between the cfDNA and the nucleosomal proteins. In some embodiments, the test sample is a peripheral blood sample from a pregnant woman including cfDNA of both a mother and a fetus, wherein the methods may be used to detect fetal chromosomal abnormality such as copy number variation. Kits for detection of copy number variation of the fetus using the disclosed methods are also provided.

In some embodiments, the disclosure provides a method for obtaining sequence information from a blood sample comprising cell-free DNA. The method involves the following: (a) obtaining the plasma fraction of a whole blood sample; (b) without first purifying the cell-free DNA from the plasma fraction, preparing a sequencing library from the cell-free DNA; and (c) sequencing said sequencing library to obtain sequence information. In some embodiments, the method further includes obtaining the whole blood sample containing cell-free DNA from a subject. In some embodiments, the whole blood sample is a peripheral blood sample.

In some embodiments, the operation of obtaining the plasma fraction involves centrifuging the whole blood sample and removing the resulting buffy coat and hematocrit fractions. In some embodiments, the operation of obtaining the plasma fraction further involves centrifuging the plasma fraction to remove solids from the plasma fraction. In some embodiments, the process further involves stabilizing white blood cells prior to centrifugation.

In some embodiments, the process further involves only a single centrifugation step performed on the whole blood sample prior to preparing the sequencing library, wherein the single centrifugation step is performed at an acceleration of at least about 10,000 g.

In some embodiments, the operation of preparing a sequencing library from the cell-free DNA involves contacting the plasma fraction with sequencing adaptors and a ligase.

In some embodiments, the process further involves exposing the plasma fraction to conditions that reduce the binding of cell-free DNA to nucleosomal proteins without fully detaching the cell-free DNA from the nucleosomal proteins. In some embodiments, the conditions that reduce the binding of cell-free DNA to nucleosomal proteins include exposing the plasma fraction to a detergent. In some embodiments, the detergent is a non-ionic detergent. In some embodiments, the conditions that reduce the binding of cell-free DNA to nucleosomal proteins include heating the plasma fraction to a temperature of between about 35° C. and 70° C. while contacting the plasma fraction with the sequencing adaptors and ligase.

In some embodiments, prior to preparing a sequencing library from the cell-free DNA, the cell-free DNA is not isolated from the whole blood sample or the plasma. In some embodiments, prior to preparing a sequencing library from
the cell-free DNA, the cell-free DNA is not removed from the whole blood sample or the plasma by contact with a support matrix.

In some embodiments, prior to and during preparing a sequencing library from the cell-free DNA, no protease is added to the plasma fraction. In some embodiments, the process also involves removing serum proteins from the plasma fraction prior to preparing a sequencing library from the cell-free DNA. In some embodiments, removing serum proteins from the plasma fraction involves passing the plasma fraction over a support matrix which adsorbs the serum proteins.

In some embodiments, massively parallel sequencing is performed on the sequencing libraries. In some embodiments, the sequence information comprises sequence reads. In some embodiments, the process further includes mapping the sequence reads to a reference sequence.

In some embodiments, the subject providing the blood sample is a pregnant mother. The cell-free DNA includes fetal cell-free DNA of a fetus carried by the pregnant mother. In some embodiments, the process further involves using the cell-free DNA to determine copy number variation (CNV) in the fetus.

In other embodiments, the subject providing the blood sample is a cancer patient. The cell-free DNA includes cell-free DNA of a cancer genome. In some embodiments, the process further involves using the cell-free DNA to determine copy number variation (CNV) in the cancer genome. In some embodiments, the CNV results from loss of homozygosity (LOH).

In some aspects, the disclosure pertains to methods for obtaining sequence information from a whole blood sample containing cell-free DNA (e.g., peripheral blood from a subject such as a pregnant mother). Such methods may be characterized by the following operations: (a) freezing the whole blood sample; (b) thawing the frozen whole blood sample; (c) separating solids from the thawed whole blood sample to obtain a liquid fraction; (d) preparing a sequencing library from cell-free DNA in the liquid fraction; and (e) sequencing said sequencing library to obtain sequence information. In some implementations, preparing the sequencing library from cell-free DNA is performed without first purifying the cell-free DNA from the liquid fraction.

Such a method may further include, prior to (a), fixing blood cells in the whole blood sample. The freezing may degrade the blood cells without releasing DNA from nuclei of the blood cells. Separating solids from the thawed whole blood sample may include centrifuging the thawed whole blood sample. As an example, only a single centrifugation step is performed on the thawed whole blood sample prior to preparing the sequencing library, and wherein the single centrifugation step is performed at an acceleration of at least about 10,000 g.

In certain embodiments, preparing a sequencing library from the cell-free DNA includes contacting the liquid fraction with sequencing adaptors and a ligase. This may be conducted in a process that includes exposing the liquid fraction to conditions that reduce the binding of cell-free DNA to nucleosomal proteins without fully detaching the cell-free DNA from the nucleosomal proteins. The conditions that reduce the binding of cell-free DNA to nucleosomal proteins may include exposing the liquid fraction to a detergent (e.g., a non-ionic detergent) and/or heating the plasma fraction to a temperature of between about 35° C. and 70° C. while contacting the liquid fraction with the sequencing adaptors and ligase.

In certain embodiments, prior to preparing a sequencing library from the cell-free DNA, the cell-free DNA is not isolated from the whole blood sample or the liquid fraction (e.g., not contacting the liquid fraction with a support matrix). In certain embodiments, during preparing a sequencing library from the cell-free DNA, no protease is added to the liquid fraction.

In certain embodiments, the method additionally includes removing serum proteins from the liquid fraction prior to preparing a sequencing library from the cell-free DNA. The removing may include passing the liquid fraction over a support matrix which adsorbs the serum proteins.

In certain embodiments, sequencing the library includes conducting massively parallel sequencing. The sequence information may include sequence reads, which may be mapped to a reference sequence.

In embodiments where the subject is a pregnant individual, the cell-free DNA is fetal cell-free DNA of a fetus carried by the pregnant mother. The methods may also include using the cell-free DNA to determine copy number variation (CNV) in the fetus. In some embodiments, the subject is a cancer patient. As an example, the cell-free DNA may be cell-free DNA of a cancer genome, which may be used to determine copy number variation (CNV) in such genome. As an example, the CNV results from loss of homozygosity (LOH).

Another aspect of the disclosure concerns kits for classifying a copy number variation in a fetal genome, which kits may be characterized by the following elements: (a) a sample collection device for holding a maternal test sample comprising fetal and maternal nucleic acids; (b) an in-process positive control (IPC) containing one or more nucleic acids comprising one or more chromosomal aneuploidies of interest, where the IPC provides a qualitative positive sequence dose value for said one or more chromosomal aneuploidies of interest; and (c) one or more fixatives for white blood cell nuclei, one or more nuclease inhibitors, one or more albumin depletion columns, one or more Ig depletion columns, one or more nonionic detergents or salts, or combinations thereof. As an example, the one or more nonionic detergents may include TWEEN®-20, at a concentration of between about 0.1% and about 5%.

In some implementations, the IPC includes markers to track sample(s) through the sequencing process. In certain embodiments, the one or more nucleic acids comprising one or more chromosomal aneuploidies of interest in the IPC comprise i) nucleic acids comprising one or more internal positive controls for calculating a first fetal fraction and detecting copy number variations at a first location on a reference genome; and ii) nucleic acids comprising one or more internal positive controls for calculating a second fetal fraction at a second location on the reference genome other than the first location on the reference genome for detecting the copy number variation in i). In certain embodiments, the IPC is configured to relate the sequence information obtained for the maternal test sample to the sequence information obtained from a set of qualified samples that were sequenced at a different time.

The kit may include one or more marker molecules such as nucleic acids and/or nucleic acid mimics that provide antigenomic marker sequence(s) suitable for tracking and verifying sample integrity. The marker molecules may include one or more mimetics selected from the group consisting of a morpholino derivative, a peptide nucleic acid (PNA), and a phosphorothioate DNA.

In certain embodiments, the sample collection device comprises a device for collecting blood and, optionally, a
receptacle for containing blood. Such device or receptacle may include an anticoagulant and/or cell fixative, and/or said antigenomic marker sequence(s) and/or said internal positive controls.
The kit may also include a reagent for sequencing library preparation such as a solution for end-repairing DNA, and/or a solution for dA-tailing DNA, and/or a solution for adaptor ligating DNA. In some embodiments, the kit additionally includes instructional materials teaching the use of said reagents to determine copy number variation in a biological sample. As an example, the instructional materials teach the use of said materials to detect a monosomy and/or a trisomy. As another example, the instructional materials teach the use of said materials to detect a cancer or a predisposition to a cancer. In some implementations, the kit does not include reagents for detecting any polymorphism used as a marker for the fetal fraction.
In certain embodiments, the kit includes a sequencer for sequencing the fetal and maternal nucleic acids. In certain embodiments, the kit includes a consumable portion of a sequencer. The consumable portion is configured to sequence fetal and maternal nucleic acids from one or more maternal test samples. Examples of consumable portions include a flow cell and a chip configured to detect ions.
In certain embodiments, the IPC contains a trisomy selected from the group consisting of trisomy 21, trisomy 18, trisomy 13, trisomy 16, trisomy 9, trisomy 8, trisomy 22, XXX, XXY, and XYY (e.g., trisomy 21 (T21), trisomy 18 (T18), and trisomy 13 (T13)). In certain embodiments, the IPC contains an amplification or a deletion of a p arm or a q arm of any one or more of chromosomes 1-22, X and Y. In certain embodiments, the IPC contains a partial deletion of one or more arms selected from the group of 1p, 1q, 3q, 4p, 5p, 5q, 7q, 9q, 10p, 11q, 13q, 18, 15q, 17p, 22p and 22q. In certain embodiments, the IPC contains a partial duplication of one or more arms selected from the group of 5q, 7q, 8p, 13q, 12p, 15q, and 17p. In certain embodiments, the IPC is configured to provide data for calculating a sequence dose value for said one or more chromosomal aneuploidies of interest.
Another aspect of the disclosure concerns kits for classifying a copy number variation in a cancer genome, which kits contain (a) a sample collection device for holding a cancer patient test sample comprising cancer and non-cancer nucleic acids; (b) an in-process positive control (IPC) comprising one or more nucleic acids comprising one or more chromosomal aneuploidies of interest, wherein the IPC provides a qualitative positive sequence dose value for said one or more chromosomal aneuploidies of interest; and (c) one or more fixatives for white blood cell nuclei, one or more nuclease inhibitors, one or more albumin depletion columns, one or more Ig depletion columns, one or more nonionic detergents or salts, or combinations thereof.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A shows a conventional process for processing cfDNA using next generation sequencing. FIG. 1B shows a process of isolating cfDNA using a support matrix. FIG. 1C illustrates the structure of a nucleosome complex including a stretch of DNA wrapped around an octamer of histones.
FIG. 2A shows a process for sample preparations for massively parallel sequencing using sequencing library prepared directly from cfDNA in plasma. FIG. 2B shows the operations involved in making the sequence library.
FIGS. 3A and 3B show processes for massively parallel sequencing using sequencing library prepared directly from cfDNA in plasma, the process involving freezing and thawing. The process of FIG. 3A does not require isolation of cfDNA from plasma, while the process of FIG. 3B does.
FIG. 4 below presents an example of another suitable device for collecting whole blood.
FIG. 5 shows a flow chart of a method whereby marker nucleic acids are combined with source sample nucleic acids of a single sample to assay for a genetic abnormality while determining the integrity of the biological source sample.
FIG. 6 shows a flowchart of an embodiment of the method for verifying the integrity of samples that are subjected to a multistep multiplex sequencing bioassay.
FIG. 7 shows an electropherogram showing identical library profiles on an Agilent BIOANALYZER® for sequencing libraries made starting with 50 µl plasma with the Qiagen MINELUTE® and the Phenol-Chloroform DNA isolation methods.
FIG. 8 shows that the % chromosome tags is invariant with lowering amounts of plasma input.
FIG. 9A shows a BIOANALYZER® profile of the library generated with a peak at the expected 300 bp size from the sample processed by protein depletion. FIG. 9B shows comparative BIOANALYZER® profiles of plasma samples treated with BRIJ®-35 (middle), NP40 (bottom) and TRITON®-X100 (top). FIG. 9C shows a BIOANALYZER® profile of a plasma sample in the presence of 0.05% TWEEN-20.
FIG. 10 shows the % Chr distribution from a control library made from purified DNA and that from a library generated directly from plasma.
FIGS. 11A and 11B show the range of cfDNA concentrations measured for the 31 samples from FT Blood and plasma. The figures visualize comparison between DNA yield from plasma and yield from FT Blood. FIG. 11A shows all 31 samples, and FIG. 11B shows the same data without the 6 samples that had high DNA concentration.
FIG. 12 shows the correlation between the two starting materials for DNA isolation, with the six outliers excluded (leaving 25 samples).
FIGS. 13A to 13C show DNA library profiles, demonstrating effect of HMW DNA contamination on library profile.
FIG. 14 shows comparative library yield range and correlation for 22 paired plasma and FT Blood cfDNAs.
FIG. 15 shows % Chr for FT Blood vs. plasma libraries as a function of Chromosomes.
FIG. 16 shows % Chr plot as a function of Chr size (Mb) for the FT Blood and plasma conditions.
FIG. 17 shows the ratios reported for chromosomes 13, 18 and 21. Condition 1 = FT Blood; condition 2 = plasma.
FIG. 18 shows correlation between FT Blood and Plasma for Ratio_X and Ratio_Y.
FIG. 19 shows the family 2139 z_21j 1 Mb bin results for Chr 21 with 0% (solid circles) and 10% (empty circles) mixtures of the affected son's DNA mixed with the mother's DNA.
FIG. 20 shows the family 1313 z_7j 1 Mb bin results for Chr 7 with 0% (solid circles) and 10% (empty circles) mixtures of the affected son's DNA mixed with the mother's DNA.
FIG. 21 shows the family 2877 z_ij 1 Mb bin results for Chr 11 and 15 with 0% (solid circles) and 10% (empty circles) mixtures of the affected son's DNA mixed with the mother's DNA.
FIG. 22 shows the clinical sample C1925 z_22j 1 Mb bin results for Chr 22 with 0% (solid circles) and 10% (empty circles) mixture of the affected son's DNA mixed with the
mother's DNA. The 2 Mb and the 8 Mb duplications from the son in the DNA mixture are shown.
FIG. 23 (A-B) shows clinical sample C65104 z_ij 1 Mb bin results with a karyotype with duplication in chromosome 6. Expanded regions show z_6j 1 Mb bin and 100 kb bin results.
FIG. 24 (A-B) shows the clinical sample C61154 z_ij 1 Mb bin results across the genome for clinical sample with a karyotype with a small deletion in chromosome 7 (circled). Another small deletion is detected in chromosome 8 (circled). Expanded regions show z_7j and z_8j 100 kb bin data.
FIG. 25 shows the clinical sample C61731 z_ij 1 Mb bin results across the genome for clinical sample with a karyotype with a small deletion in chromosome 8. Expanded region shows z_8j 1 Mb bin data.
FIG. 26 shows the clinical sample C62228 z_ij 1 Mb bin results across the genome for clinical sample with a karyotype with a deletion in chromosome 15. Expanded region shows z_15j 1 Mb bin data.
FIG. 27 shows the clinical sample C61093 z_ij 1 Mb bin results across the genome with a karyotype 46,XY,add(10)(q26). Expanded regions show z_10j and z_17j 1 Mb bin data.
FIG. 28 shows the clinical sample C61233 z_ij 1 Mb bin results across the genome with a karyotype 46,XX,add(X)(p22.1). Expanded regions show z_3j and z_Xj 1 Mb bin data. The figures show a 40 Mb-long duplication of the region from 158 Mb to 198 Mb on Chr 3 and a 9 Mb-long deletion on Chr X from 1 Mb to 10 Mb (although the signal from this deletion did not meet our criteria for classifying it as a CNV).
DETAILED DESCRIPTION
Definitions
“Whole Blood sample” herein refers to a whole blood sample that has not been fractionated or separated into its component parts. Whole blood is often combined with an anticoagulant such as EDTA or ACD during the collection process, but is generally otherwise unprocessed. In the US, the capitalized “Whole Blood” means a specific standardized product for transfusion or further processing, where “whole blood” is any unmodified collected blood.
“Blood fractionation” is the process of fractionating whole blood or separating it into its component parts. This is typically done by centrifuging the blood. The resulting components are:
  a clear solution of blood plasma in the upper phase (which can be separated into its own fractions),
  a buffy coat, which is a thin layer of leukocytes (white blood cells) mixed with platelets in the middle, and
  erythrocytes (red blood cells) at the bottom of the centrifuge tube in the hematocrit fraction.
Serum separation tubes (SSTs) are tubes used in phlebotomy containing a silicone gel; when centrifuged the silicone gel forms a layer on top of the buffy coat, allowing the blood plasma to be removed more effectively for testing and related purposes.
“Blood plasma” or “plasma” is the straw-colored/pale-yellow liquid component of blood that normally holds the blood cells in whole blood in suspension. It makes up about 55% of total blood by volume. It is the intravascular fluid part of extracellular fluid (all body fluid outside of cells). It is mostly water (93% by volume), and contains dissolved proteins including albumins, immunoglobulins, and fibrinogen, glucose, clotting factors, electrolytes (Na+, Ca2+, Mg2+, HCO3−, Cl−, etc.), hormones and carbon dioxide. Blood plasma is prepared by spinning a tube of whole blood containing an anticoagulant in a centrifuge until the blood cells fall to the bottom of the tube. The blood plasma is then poured or drawn off. Blood plasma has a density of approximately 1025 kg/m³, or 1.025 kg/l.
“Peripheral blood” is blood that is obtained from acral areas, or from the circulation remote from the heart; the blood in the systemic circulation.
“Fixing” refers to a technique that maintains the structure of cells and/or sub-cellular components such as cell organelles (e.g., nucleus). Fixing modifies the chemical or biological structure of cellular components by, e.g., cross-linking them. Fixing may cause whole cells and cellular organelles to resist lysis. Of interest, fixing may also cause cellular nucleic acids to resist release into a surrounding medium. For example, fixing may prevent nuclear DNA from white blood cells from being released into a plasma fraction during centrifugation of whole blood.
“Fixative” refers to an agent such as a chemical or biological reagent that fixes cellular nucleic acids and thereby causes cells to resist release of such nucleic acids into a surrounding medium. A fixative may disable cellular proteolytic enzymes and nucleases. Examples of fixatives include aldehydes (e.g., formaldehyde), alcohols, and oxidizing agents. Examples of suitable fixatives are presented in US Patent Application Publication No. 2010/0184069, filed Jan. 19, 2010, and in US Patent Application Publication No. 2010/209930, filed Feb. 11, 2010, each incorporated herein by reference in its entirety. A vendor of commercially available fixative compositions for fixing nuclei of white blood cells is Streck, Inc. of Omaha, Nebr. Streck blood collection tubes such as the Streck Cell-free DNA BCT contain a mild preservative, which fixes cellular nuclei and large cellular components, thereby inhibiting white blood cell lysis that can contaminate plasma DNA with cellular DNA.
“Freeze” means to turn a liquid sample into a solid sample by lowering the temperature and optionally increasing the pressure of the sample. In a sample containing biological materials such as cells, freezing typically forms ice crystals, which will break or otherwise disrupt the biological materials. This disruption may involve breaking apart cell membranes such that cellular components are no longer confined to their original cells.
“Thaw” means to convert a frozen sample back into liquid sample by increasing the temperature and optionally decreasing the pressure of the sample. A thawed sample containing biological materials may contain various cellular constituents unconfined by the cell membranes. In the case of thawed blood, such cellular constituents include, for example, cell nuclei, other cell organelles, hemoglobin, denatured proteins, etc.
The term “copy number variation” herein refers to variation in the number of copies of a nucleic acid sequence present in a test sample in comparison with the copy number of the nucleic acid sequence present in a qualified sample. In certain embodiments, the nucleic acid sequence is 1 kb or larger. In some cases, the nucleic acid sequence is a whole chromosome or significant portion thereof. A “copy number variant” refers to the sequence of nucleic acid in which copy-number differences are found by comparison of a sequence of interest in a test sample with an expected level of the sequence of interest. For example, the level of the sequence of interest in the test sample is compared to that present in a qualified sample. Copy number variants/variations include deletions, including microdeletions, insertions, including microinsertions, duplications, multiplications,
inversions, translocations and complex multi-site variants. CNVs encompass chromosomal aneuploidies and partial aneuploidies.
The term “aneuploidy” herein refers to an imbalance of genetic material caused by a loss or gain of a whole chromosome, or part of a chromosome.
The terms “chromosomal aneuploidy” and “complete chromosomal aneuploidy” herein refer to an imbalance of genetic material caused by a loss or gain of a whole chromosome, and include germline aneuploidy and mosaic aneuploidy.
The terms “partial aneuploidy” and “partial chromosomal aneuploidy” herein refer to an imbalance of genetic material caused by a loss or gain of part of a chromosome, e.g., partial monosomy and partial trisomy, and encompass imbalances resulting from translocations, deletions and insertions.
The term “aneuploid sample” herein refers to a sample indicative of a subject whose chromosomal content is not euploid, i.e., the sample is indicative of a subject with an abnormal copy number of chromosomes or portions of chromosomes.
The term “aneuploid chromosome” herein refers to a chromosome that is known or determined to be present in a sample in an abnormal copy number.
The term “plurality” refers to more than one element. For example, the term is used herein in reference to a number of nucleic acid molecules or sequence tags that is sufficient to identify significant differences in copy number variations (e.g., chromosome doses) in test samples and qualified samples using the methods disclosed herein. In some embodiments, at least about 3×10^6 sequence tags, at least about 5×10^6 sequence tags, at least about 8×10^6 sequence tags, at least about 10×10^6 sequence tags, at least about 15×10^6 sequence tags, at least about 20×10^6 sequence tags, at least about 30×10^6 sequence tags, at least about 40×10^6 sequence tags, or at least about 50×10^6 sequence tags comprising between about 20 and 40 bp reads are obtained for each test sample.
The terms “polynucleotide”, “nucleic acid” and “nucleic acid molecules” are used interchangeably and refer to a covalently linked sequence of nucleotides (i.e., ribonucleotides for RNA and deoxyribonucleotides for DNA) in which the 3' position of the pentose of one nucleotide is joined by a phosphodiester group to the 5' position of the pentose of the next, and include sequences of any form of nucleic acid, including, but not limited to RNA and DNA molecules such as cfDNA molecules. The term “polynucleotide” includes, without limitation, single- and double-stranded polynucleotide.
The term “portion” is used herein in reference to the amount of sequence information of fetal and maternal nucleic acid molecules in a biological sample that in sum amount to less than the sequence information of 1 human genome.
The term “test sample” herein refers to a sample, typically derived from a biological fluid, cell, tissue, organ, or organism, comprising a nucleic acid or a mixture of nucleic acids comprising at least one nucleic acid sequence that is to be screened for copy number variation. In certain embodiments the sample comprises at least one nucleic acid sequence whose copy number is suspected of having undergone variation. Such samples include, but are not limited to sputum/oral fluid, amniotic fluid, blood, a blood fraction, or fine needle biopsy samples (e.g., surgical biopsy, fine needle biopsy, etc.), urine, peritoneal fluid, pleural fluid, and the like. Although the sample is often taken from a human subject (e.g., patient), the assays can be used to detect copy number variations (CNVs) in samples from any mammal, including, but not limited to dogs, cats, horses, goats, sheep, cattle, pigs, etc. The sample may be used directly as obtained from the biological source or following a pretreatment to modify the character of the sample. For example, such pretreatment may include preparing plasma from blood, diluting viscous fluids and so forth. Methods of pretreatment may also involve, but are not limited to, filtration, precipitation, dilution, distillation, mixing, centrifugation, freezing, lyophilization, concentration, amplification, nucleic acid fragmentation, inactivation of interfering components, the addition of reagents, lysing, etc. If such methods of pretreatment are employed with respect to the sample, such pretreatment methods are typically such that the nucleic acid(s) of interest remain in the test sample, preferably at a concentration proportional to that in an untreated test sample (namely, a sample that is not subjected to any such pretreatment method(s)). Such “treated” or “processed” samples are still considered to be biological “test” samples with respect to the methods described herein.
The term “normalizing sequence” herein refers to a sequence that is used to normalize the number of sequence tags mapped to a sequence of interest associated with the normalizing sequence. In some embodiments, the normalizing sequence displays a variability in the number of sequence tags that are mapped to it among samples and sequencing runs that approximates the variability of the sequence of interest for which it is used as a normalizing parameter, and that can differentiate an affected sample from one or more unaffected samples. In some implementations, the normalizing sequence best or effectively differentiates, when compared to other potential normalizing sequences such as other chromosomes, an affected sample from one or more unaffected samples. A “normalizing chromosome” or “normalizing chromosome sequence” is an example of a “normalizing sequence”. A “normalizing chromosome sequence” or “normalizing chromosome” can be composed of a single chromosome or of a group of chromosomes. A “normalizing segment” is another example of a “normalizing sequence”. A “normalizing segment sequence” can be composed of a single segment of a chromosome or it can be composed of two or more segments of the same or of different chromosomes. In certain embodiments, a normalizing sequence is intended to normalize for variability such as process-related variability, which stems from interchromosomal (intra-run), inter-sequencing (inter-run) and/or platform-dependent variability.
The term “sequence dose” herein refers to a parameter that relates the number of sequence tags identified for a sequence of interest and the number of sequence tags identified for the normalizing sequence. In some cases, the sequence dose is the ratio of the number of sequence tags identified for a sequence of interest to the number of sequence tags identified for the normalizing sequence. In some cases, the sequence dose refers to a parameter that relates the sequence tag density of a sequence of interest to the tag density of a normalizing sequence. A “test sequence dose” is a parameter that relates the sequence tag density of a sequence of interest, e.g., chromosome 21, to that of a normalizing sequence, e.g., chromosome 9, determined in a test sample. Similarly, a “qualified sequence dose” is a parameter that relates the sequence tag density of a sequence of interest to that of a normalizing sequence determined in a qualified sample.
The term “sequence tag density” herein refers to the number of sequence reads that are mapped to a reference genome sequence.
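
As an illustration of the ratio form of the sequence dose defined above, the following minimal sketch counts mapped tags per chromosome and divides the count for a sequence of interest by the count for a normalizing sequence. The function names and the toy tag list are hypothetical and are not taken from the disclosure; chromosome 21 and chromosome 9 are used only because the definition above cites them as examples.

```python
from collections import Counter


def count_tags(mapped_chromosomes):
    """Count uniquely mapped sequence tags per chromosome label."""
    return Counter(mapped_chromosomes)


def sequence_dose(tag_counts, sequence_of_interest, normalizing_sequence):
    """Ratio of tags on the sequence of interest to tags on the normalizing sequence."""
    denominator = tag_counts[normalizing_sequence]
    if denominator == 0:
        raise ValueError("no tags mapped to the normalizing sequence")
    return tag_counts[sequence_of_interest] / denominator


# Hypothetical toy data: one chromosome label per uniquely mapped tag.
tags = ["chr21"] * 30 + ["chr9"] * 120 + ["chr1"] * 50
counts = count_tags(tags)
dose_chr21 = sequence_dose(counts, "chr21", "chr9")
print(dose_chr21)  # 0.25
```

A real analysis would derive the chromosome labels from aligned reads rather than from a hand-built list, and the choice of normalizing sequence would follow the criteria given in the definition of “normalizing sequence”.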
For example, the sequence tag density for chromosome 21 is the number of sequence reads generated by the sequencing method that are mapped to chromosome 21 of the reference genome. The term “sequence tag density ratio” herein refers to the ratio of the number of sequence tags that are mapped to a chromosome of the reference genome, e.g., chromosome 21, to the length of the reference genome chromosome.
The term “Next Generation Sequencing (NGS)” herein refers to sequencing methods that allow for massively parallel sequencing of clonally amplified molecules and of single nucleic acid molecules. NGS is synonymous with “massively parallel sequencing” for most purposes. Non-limiting examples of NGS include sequencing-by-synthesis using reversible dye terminators, and sequencing-by-ligation.
The terms “threshold value” and “qualified threshold value” herein refer to any number that is used as a cutoff to characterize a sample such as a test sample containing a nucleic acid from an organism suspected of having a medical condition. The threshold may be compared to a parameter value to determine whether a sample giving rise to such parameter value suggests that the organism has the medical condition. In certain embodiments, a qualified threshold value is calculated using a qualifying data set and serves as a limit of diagnosis of a copy number variation, e.g., an aneuploidy, in an organism. If a threshold is exceeded by results obtained from methods disclosed herein, a subject can be diagnosed with a copy number variation, e.g., trisomy 21. Appropriate threshold values for the methods described herein can be identified by analyzing normalizing values (e.g., chromosome doses, NCVs or NSVs) calculated for a training set of samples. Threshold values can be identified using qualified (i.e., unaffected) samples in a training set which comprises both qualified (i.e., unaffected) samples and affected samples. The samples in the training set known to have chromosomal aneuploidies (i.e., the affected samples) can be used to confirm that the chosen thresholds are useful in differentiating affected from unaffected samples in a test set (see the Examples herein). The choice of a threshold is dependent on the level of confidence that the user wishes to have to make the classification. In some embodiments, the training set used to identify appropriate threshold values comprises at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 2000, at least 3000, at least 4000, or more qualified samples. It may be advantageous to use larger sets of qualified samples to improve the diagnostic utility of the threshold values.
The term “normalizing value” herein refers to a numerical value that relates the number of sequence tags identified for the sequence (e.g., chromosome or chromosome segment) of interest to the number of sequence tags identified for the normalizing sequence (e.g., normalizing chromosome or normalizing chromosome segment). For example, a “normalizing value” can be a chromosome dose as described elsewhere herein, or it can be an NCV (Normalized Chromosome Value) as described elsewhere herein, or it can be an NSV (Normalized Segment Value) as described elsewhere herein.
The term “read” refers to a sequence read from a portion of a nucleic acid sample. Typically, though not necessarily, a read represents a short sequence of contiguous base pairs in the sample. The read may be represented symbolically by the base pair sequence (in ATCG) of the sample portion. It may be stored in a memory device and processed as appropriate to determine whether it matches a reference sequence or meets other criteria. A read may be obtained directly from a sequencing apparatus or indirectly from stored sequence information concerning the sample. In some cases, a read is a DNA sequence of sufficient length (e.g., at least about 30 bp) that can be used to identify a larger sequence or region, e.g., that can be aligned and specifically assigned to a chromosome or genomic region or gene.
The term “sequence tag” is herein used interchangeably with the term “mapped sequence tag” to refer to a sequence read that has been specifically assigned, i.e., mapped, to a larger sequence, e.g., a reference genome, by alignment. Mapped sequence tags are uniquely mapped to a reference genome, i.e., they are assigned to a single location on the reference genome. Tags may be provided as data structures or other assemblages of data. In certain embodiments, a tag contains a read sequence and associated information for that read such as the location of the sequence in the genome, e.g., the position on a chromosome. In certain embodiments, the location is specified for a positive strand orientation. A tag may be defined to provide a limited amount of mismatch in aligning to a reference genome. Tags that can be mapped to more than one location on a reference genome, i.e., tags that do not map uniquely, may not be included in the analysis.
As used herein, the terms “aligned”, “alignment”, or “aligning” refer to the process of comparing a read or tag to a reference sequence and thereby determining whether the reference sequence contains the read sequence. If the reference sequence contains the read, the read may be mapped to the reference sequence or, in certain embodiments, to a particular location in the reference sequence. In some cases, alignment simply tells whether or not a read is a member of a particular reference sequence (i.e., whether the read is present or absent in the reference sequence). For example, the alignment of a read to the reference sequence for human chromosome 13 will tell whether the read is present in the reference sequence for chromosome 13. A tool that provides this information may be called a set membership tester. In some cases, an alignment additionally indicates a location in the reference sequence where the read or tag maps to. For example, if the reference sequence is the whole human genome sequence, an alignment may indicate that a read is present on chromosome 13, and may further indicate that the read is on a particular strand and/or site of chromosome 13.
Aligned reads or tags are one or more sequences that are identified as a match in terms of the order of their nucleic acid molecules to a known sequence from a reference genome. Alignment can be done manually, although it is typically implemented by a computer algorithm, as it would be impossible to align reads in a reasonable time period for implementing the methods disclosed herein. One example of an algorithm for aligning sequences is the Efficient Local Alignment of Nucleotide Data (ELAND) computer program distributed as part of the Illumina Genomics Analysis pipeline. Alternatively, a Bloom filter or similar set membership tester may be employed to align reads to reference genomes. See U.S. Patent Application No. 61/552,374 filed Oct. 27, 2011, which is incorporated herein by reference in its entirety. The matching of a sequence read in aligning can be a 100% sequence match or less than 100% (non-perfect match).
As used herein, the term “reference genome” or “reference sequence” refers to any particular known genome sequence, whether partial or complete, of any organism or virus which may be used to reference identified sequences from a subject.
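
The role of a threshold value, as defined above, can be sketched in a few lines. The sketch assumes, purely for illustration, that the cutoff is placed a fixed number of standard deviations above the mean chromosome dose of the qualified (unaffected) samples; the definitions above do not prescribe this rule, and the function names, the factor of 3 and the dose values are hypothetical.

```python
from statistics import mean, stdev


def qualified_threshold(qualified_doses, n_sd=3.0):
    """Cutoff = mean + n_sd * sample SD of doses from qualified samples.

    The mean + n_sd * SD rule is an illustrative assumption.
    """
    return mean(qualified_doses) + n_sd * stdev(qualified_doses)


def exceeds_threshold(test_dose, threshold):
    """Return True when a test sample's dose is above the cutoff."""
    return test_dose > threshold


# Hypothetical chromosome doses from five qualified samples.
qualified = [0.250, 0.252, 0.248, 0.251, 0.249]
cutoff = qualified_threshold(qualified)

print(exceeds_threshold(0.262, cutoff))  # True
print(exceeds_threshold(0.251, cutoff))  # False
```

Raising or lowering `n_sd` corresponds to the statement above that the choice of threshold depends on the level of confidence the user wishes to have in the classification.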
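
The set membership view of alignment described above can be illustrated with a toy Bloom filter. This is a minimal sketch under stated assumptions, not the method of the cited application: the class, the hashing scheme (salted SHA-256), the filter size and the reference string are all hypothetical, reads are assumed to have a fixed length, and strand and mismatches are ignored.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def index_reference(reference, read_length):
    """Add every read_length-long substring of the reference to a filter."""
    bloom = BloomFilter()
    for start in range(len(reference) - read_length + 1):
        bloom.add(reference[start:start + read_length])
    return bloom


reference = "ACGTTGCAAGCTTACGGATCCA"  # hypothetical toy "chromosome"
bloom = index_reference(reference, read_length=8)
print("TTGCAAGC" in bloom)  # True: this read occurs in the reference
```

A filter of this kind only answers whether a read is present or absent; it does not report a location, which matches the distinction drawn above between set membership testing and alignments that also indicate a position.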
the National Center for Biotechnology Information at www.ncbi.nlm.nih.gov. A "genome" refers to the complete genetic information of an organism or virus, expressed in nucleic acid sequences.
In various embodiments, the reference sequence is significantly larger than the reads that are aligned to it. For example, it may be at least about 100 times larger, or at least about 1000 times larger, or at least about 10,000 times larger, or at least about 10^5 times larger, or at least about 10^6 times larger, or at least about 10^7 times larger.
In one example, the reference sequence is that of a full length human genome. Such sequences may be referred to as genomic reference sequences. In another example, the reference sequence is limited to a specific human chromosome such as chromosome 13. Such sequences may be referred to as chromosome reference sequences. Other examples of reference sequences include genomes of other species, as well as chromosomes, sub-chromosomal regions (such as strands), etc. of any species.
In various embodiments, the reference sequence is a consensus sequence or other combination derived from multiple individuals. However, in certain applications, the reference sequence may be taken from a particular individual.
The term "maternal sample" herein refers to a biological sample obtained from a pregnant subject, e.g., a woman.
The term "biological fluid" herein refers to a liquid taken from a biological source and includes, for example, blood, serum, plasma, sputum, lavage fluid, cerebrospinal fluid, urine, semen, sweat, tears, saliva, and the like. As used herein, the terms "blood," "plasma" and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the "sample" expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.
The terms "maternal nucleic acids" and "fetal nucleic acids" herein refer to the nucleic acids of a pregnant female subject and the nucleic acids of the fetus being carried by the pregnant female, respectively.
As used herein, the term "fetal fraction" refers to the fraction of fetal nucleic acids present in a sample comprising fetal and maternal nucleic acid. Fetal fraction is often used to characterize the cfDNA in a mother's blood.
As used herein the term "chromosome" refers to the heredity-bearing gene carrier of a living cell which is derived from chromatin and which comprises DNA and protein components (especially histones). The conventional internationally recognized individual human genome chromosome numbering system is employed herein.
The term "subject" herein refers to a human subject as well as a non-human subject such as a mammal, an invertebrate, a vertebrate, a fungus, a yeast, a bacterium, and a virus. Although the examples herein concern humans and the language is primarily directed to human concerns, the concepts disclosed herein are applicable to genomes from any plant or animal, and are useful in the fields of veterinary medicine, animal sciences, research laboratories and such.
The term "condition" herein refers to "medical condition" as a broad term that includes all diseases and disorders, but can include [injuries] and normal health situations, such as pregnancy, that might affect a person's health, benefit from medical assistance, or have implications for medical treatments.
The term "complete" is used herein in reference to a chromosomal aneuploidy to refer to a gain or loss of an entire chromosome.
The term "partial" when used in reference to a chromosomal aneuploidy herein refers to a gain or loss of a portion, i.e., a segment, of a chromosome.
The term "enrich" herein refers to the process of amplifying polymorphic target nucleic acids contained in a portion of a maternal sample, and combining the amplified product with the remainder of the maternal sample from which the portion was removed. For example, the remainder of the maternal sample can be the original maternal sample.
The term "original maternal sample" herein refers to a non-enriched biological sample obtained from a pregnant subject, e.g., a woman, who serves as the source from which a portion is removed to amplify polymorphic target nucleic acids. The "original sample" can be any sample obtained from a pregnant subject, and the processed fractions thereof, e.g., a purified cfDNA sample extracted from a maternal plasma sample.
The term "primer," as used herein, refers to an isolated oligonucleotide which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand, is induced (i.e., in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact lengths of the primers will depend on many factors, including temperature, source of primer, use of the method, and the parameters used for primer design.
Cell Free DNA
Cell-free fetal DNA and RNA circulating in maternal blood can be used for the early non-invasive prenatal diagnosis (NIPD) of an increasing number of genetic conditions, both for pregnancy management and to aid reproductive decision-making. The presence of cell-free DNA circulating in the bloodstream has been known for over 50 years. More recently, the presence of small amounts of circulating fetal DNA was discovered in the maternal bloodstream during pregnancy (Lo et al., Lancet 350:485-487 [1997]). Thought to originate from dying placental cells, cell-free fetal DNA (cfDNA) has been shown to consist of short fragments typically fewer than 200 bp in length (Chan et al., Clin Chem 50:88-92 [2004]), which can be discerned as early as 4 weeks gestation (Illanes et al., Early Human Dev 83:563-566 [2007]), and known to be cleared from the maternal circulation within hours of delivery (Lo et al., Am J Hum Genet 64:218-224 [1999]). In addition to cfDNA, fragments of cell-free fetal RNA (cfRNA) can also be discerned in the maternal bloodstream, originating from genes that are transcribed in the fetus or placenta. The extraction and subsequent analysis of these fetal genetic elements from a maternal blood sample offers novel opportunities for NIPD.
In addition to its application in NIPD, numerous reports in the literature have pointed out that cell-free DNA in plasma or serum can be applied as a more specific tumor marker than conventional biological samples for the diagnosis and prognosis, as well as the early detection, of cancer. For instance, one study indicates that the elevation of serum cell-free DNA was usually detected in specimens containing elevated tumor markers and is most likely associated with tumor metastases. The electrophoretic pattern of cell-free
DNA showed that cell-free DNA from cancer patients is fragmented, containing smaller DNA (100 bp) not found in normal cell-free DNA. Wu, et al. Cell-free DNA: measurement in various carcinomas and establishment of normal reference range. Clin Chim Acta. 2002, 321(1-2): 77-87.
Baseline Process for Obtaining and Using cfDNA in Sequencing
A conventional process for sequencing cfDNA is described here. It is represented in FIGS. 1A and 1B and in the bullet outline below. While the process is described for sequencing cfDNA from blood samples, many of the process steps apply in sequencing cfDNA found in other types of sample such as urine, sweat, saliva, etc.
The baseline process may have the following operations:
1. collect blood with EDTA, ACD, or Streck blood collection tubes
2. centrifugations to isolate plasma fraction
    a. low g (soft) spin to fractionate blood into plasma and other fractions (separate plasma from buffy coat and hematocrit to reduce contamination from DNA in the white blood cells)
    b. high g (hard) spin to separate additional particulates from plasma fraction
3. isolate/purify cfDNA from plasma (this is a low yield process)
    Denature and/or degrade proteins in plasma (contact with proteases) and make solution negative with guanidine hydrochloride or other chaotropic reagent (to facilitate driving cfDNA out of solution)
    Contact treated plasma with a support matrix such as beads in a column. cfDNA comes out of solution and binds to matrix.
    Wash the support matrix
    Release cfDNA from matrix and recover.
4. make a library from purified cfDNA
5. perform next generation sequencing
FIG. 1A shows a conventional process for processing cfDNA using next generation sequencing. Process 100 begins with collecting a sample containing cfDNA. See operation 103 in the flow chart of FIG. 1A. Collection can be performed by any one of many available techniques. Such techniques should collect a sufficient volume of sample to supply enough cfDNA to satisfy the requirements of the sequencing technology, and account for losses during the processing leading up to sequencing.
In certain embodiments, blood is collected in specially designed blood collection tubes or other container. Such tubes may include an anti-coagulant such as ethylenediaminetetraacetic acid (EDTA) or acid citrate dextrose (ACD). In some cases, the tube includes a fixative. In some embodiments, blood is collected in a tube that gently fixes cells and deactivates nucleases (e.g., Streck Cell-free DNA BCT tubes). See US Patent Application Publication No. 2010/0209930, filed Feb. 11, 2010, and US Patent Application Publication No. 2010/0184069, filed Jan. 19, 2010, each previously incorporated herein by reference.
Generally, it is desirable to collect and process cfDNA that is uncontaminated with DNA from other sources such as white blood cells. Therefore, white blood cells should be removed from the sample and/or treated in a manner that reduces the likelihood that they will release their DNA.
In the conventional process, the blood sample is centrifuged, sometimes twice. See operation 105 in FIG. 1A. The first centrifugation step produces three fractions: a plasma fraction on top, a buffy coat containing leukocytes, and a hematocrit fraction on the bottom. This first centrifugation process is performed at relatively low g-force in order to avoid disrupting the leukocytes to a point where their nuclei break apart and release DNA into the plasma fraction. Density gradient centrifugation is typically used. If this first centrifugation step is performed at too high of an acceleration, some DNA from the leukocytes would likely contaminate the plasma fraction. After this centrifugation step is completed, the plasma fraction is separated from the other fractions and further processed.
After the first centrifugation is performed at relatively low g-force, a second, optional, centrifugation of the plasma fraction is performed at a higher g-force. In this step, additional particulate matter from the plasma is spun out as a solid phase and removed. This additional solid material may include some additional cells that also contain DNA that could contaminate the cell free DNA that is to be analyzed. In some embodiments, the first centrifugation is performed at an acceleration of about 1600 G and the second centrifugation is performed at an acceleration of about 16,000 G.
While a single centrifugation process from normal blood is possible, such process has been found to sometimes produce plasma contaminated with white blood cells. Any DNA isolated from this plasma will include some cellular DNA. Therefore, for cfDNA isolation from normal blood, the plasma may be subjected to a second centrifugation at high speed to pellet out any contaminating cells as explained.
Cell free DNA, as it exists in the plasma of an organism, is typically DNA wrapped or coiled around histone proteins. See FIG. 1C for an illustration of the structure of a nucleosome complex including a stretch of DNA wrapped around an octamer of histones. Cell-free DNA in blood is apoptotic DNA that is still wrapped around nucleosomes. Nucleosomal proteins are mostly made up of positively charged histones around which the negatively charged DNA is wound. It takes approximately 147 nucleotides to wrap around a single nucleosomal protein complex, with additional bases as "linker" sequences between nucleosomal units. This explains why, upon purification, mono-nucleosomal cfDNA has a peak around 165-170 bp.
After a plasma fraction is collected as described, the cfDNA is extracted. See operation 107 of FIG. 1A and the entire flow chart of FIG. 1B. Extraction is actually a multistep process that involves separating DNA from the plasma in a column or other solid phase binding matrix.
The first part of this cfDNA isolation procedure involves denaturing or degrading the nucleosome proteins and otherwise taking steps to free the DNA from the nucleosome. See operation 121 in the flow chart of FIG. 1B. A typical reagent mixture used to accomplish this isolation includes a detergent, protease, and a chaotropic agent such as guanidine hydrochloride. The protease serves to degrade the nucleosome proteins, as well as background proteins in the plasma such as albumin and immunoglobulins. The chaotropic agent disrupts the structure of macromolecules by interfering with intramolecular interactions mediated by non-covalent forces such as hydrogen bonds. The chaotropic agent also renders components of the plasma such as proteins negative in charge. The negative charge makes the medium somewhat energetically incompatible with the negatively charged DNA. The use of a chaotropic agent to facilitate DNA purification is described in Boom et al., "Rapid and Simple Method for Purification of Nucleic Acids", J. Clin. Microbiology, v. 28, No. 3, 1990.
After this protein degradation treatment, which frees, at least partially, the DNA coils from the nucleosome proteins, the resulting solution is passed through a column or otherwise exposed to support matrix. See operation 123 of FIG. 1B. The cfDNA in the treated plasma selectively adheres to the support matrix. The remaining constituents of the plasma pass through the binding matrix and are removed. The negative charge imparted to medium components facilitates adsorption of DNA in the pores of a support matrix.
After passing the treated plasma through the support matrix, the support matrix with bound cfDNA is washed to remove additional proteins and other unwanted components of the sample. See operation 125 of FIG. 1B. After washing, the cfDNA is freed from the matrix and recovered. See operation 127 of FIG. 1B. Unfortunately, this process loses a significant fraction of the available DNA from the plasma. Generally, support matrixes have a high capacity for cfDNA, which limits the amount of cfDNA that can be easily separated from the matrix. As a consequence, the yield of the cfDNA extraction step is quite low. Typically, the efficiency is well below 50% (e.g., it has been found that the typical yield of cfDNA is 4-12 ng/ml of plasma from the available ~30 ng/ml plasma).
The purified cfDNA is used to prepare a library for sequencing. See operation 109 of FIG. 1A. To sequence a population of double-stranded DNA fragments using massively parallel sequencing systems, the DNA fragments must be flanked by known adapter sequences. A collection of such DNA fragments with adapters at either end is called a sequencing library. Two examples of suitable methods for generating sequencing libraries from purified DNA are (1) ligation-based attachment of known adapters to either end of fragmented DNA, and (2) transposase-mediated insertion of adapter sequences. There are many suitable massively parallel sequencing techniques. Some of these are described below. The sequencing operation is depicted in block 111 of FIG. 1A.
Efficiently Producing cfDNA Libraries
Unless indicated otherwise, details of the operations described above for a conventional process can be applied for comparable operations employed in the following embodiments.
Generating Library Directly, without Purifying cfDNA (Direct Generation of Library from Plasma or FT Supernatant)
The embodiments described in this section involve making cfDNA sequencing libraries from biological fluids without first purifying the DNA from such fluids. A typical cfDNA concentration in biological fluids is approximately 30 ng/ml of plasma. Between this low starting DNA concentration and the small size of cfDNA (~170 bp), the efficiency of DNA isolation is poor (significantly less than 50% yield). It has been found, for example, that the typical yield of cfDNA is 4-12 ng/ml of plasma from the available ~30 ng/ml plasma. The direct method described here can greatly increase the yield.
Examples of processes for generating a library directly from plasma, without first purifying DNA, are presented in the outline immediately below and in the flow charts of FIGS. 2A and 2B.
1. collect blood, optionally with a fixative (Any fixative that prevents release of cellular DNA would be useful; e.g., Streck.)
2. centrifugations to isolate plasma (in some implementations, only the hard centrifugation is needed if a fixative is used; the fixative binds white blood cell DNA to the nuclei, preventing it from contaminating the plasma fraction used for its cfDNA.)
    separate plasma from other components (e.g., buffy coat and hematocrit in a soft spin) to reduce contamination from maternal DNA
    option: use a "freeze-thaw" supernatant produced as described below.
3. make a library directly from cfDNA existing in plasma or freeze-thaw supernatant without first purifying the cfDNA from these sources.
    Condition 1: loosen cfDNA wrapped around histones to allow end of cfDNA strand to become available for ligating an adaptor (mild detergent and/or mild heat).
    Condition 2: do so under conditions that do not harm ligase or transposase (no aggressive proteases and no guanidine hydrochloride). Ligation requires four components: cfDNA, adaptor sequences, ligase, ATP.
    Condition 3: reduce concentration of background serum proteins (immunoglobulins and albumin).
    One embodiment: pass plasma over a column or other container of a support matrix. Simple conditions possibly remove only a fraction of the protein (50% or 75% or 80% or 90%).
4. perform next generation sequencing
One benefit of directly generating a library is a significantly higher cfDNA recovery rate than is attainable with a conventional process. A second benefit is a simplification of the process by replacing the multi-step DNA isolation procedure with a simple one or two-step process that provides a library of DNA for sequencing. In the conventional technique, the relevant steps are: degrading serum and nucleosome proteins, contacting the solution with a DNA-absorbing support matrix, washing the support matrix, eluting the DNA from the support matrix, and attaching adapters to the isolated DNA. In contrast, the direct library generation method includes the following steps: removing some fraction of the serum proteins, and attaching adapters to the ends of the cfDNA in the resulting solution.
Turning to FIG. 2A, the depicted process begins with receipt of a whole blood sample. This is indicated by block 203 of the Figure. This operation may be performed as described above for the conventional process. In some cases, the whole blood is treated with a fixing agent to stabilize the cells in the sample, and thereby reduce the likelihood that their DNA will contaminate the cfDNA used to make a library.
Additionally, the blood sample may be treated to deactivate nucleases. Most nucleases can be deactivated by heating the plasma (e.g., to about 65° C. for about 15-30 minutes) or by contacting the sample with a nuclease inhibitor. In one example, the sample is provided in a blood collection tube such as a tube sold for this purpose by Streck, Inc., which includes an additive that deactivates nucleases. Examples of compositions having nuclease inhibiting activity are disclosed in US Patent Application Publication No. 2010/0184069, filed Jan. 19, 2010, and in US Patent Application Publication No. 2010/0209930, filed Feb. 11, 2010, both previously incorporated herein by reference.
The sample collected in operation 203 is centrifuged to generate a plasma fraction containing the cfDNA that is carried forward in the process. See operation 205. In certain embodiments, only a single centrifugation step is performed, as compared to the conventional process where two centrifugation steps are performed. The second centrifugation step may be eliminated when the white blood cells in the sample are stabilized by fixative or other reagent, so that they do not release their nuclear DNA when exposed to high
g-forces. When this is done, a single, high g-force centrifugation step may be employed to remove all cells from the whole blood. The leukocytes that have been stabilized are better able to withstand the forces experienced during this step. A greater fraction of the cfDNA in the sample is recovered in the plasma fraction when a single centrifugation step is performed.
In the direct method described here, the native cfDNA coiled around nucleosome proteins may be used as such, without first isolating it as required in the conventional processes described above. As mentioned, cfDNA used in a library must have adapters attached to both ends of the DNA strands. In some cases, these adaptor sequences are about 30-100 bp in length, e.g., about 60 bp. In the conventional process, adaptor ligation is accomplished only after the cfDNA has been uncoiled and removed from the nucleosome proteins. In the direct process, in contrast, the adapters are attached while the cfDNA is still coiled around nucleosome proteins.
Two suitable methods for generating sequencing libraries from purified DNA are (1) ligation-based attachment of known adapters to either end of fragmented DNA and (2) transposase-mediated insertion of adapter sequences. Both of these processes may be performed directly on cfDNA that is wound around nucleosomes in biological fluids.
To attach adaptor sequences to cfDNA still bound to nucleosome proteins, it may be necessary to first reduce the concentration of serum proteins. Further, it may be necessary to conduct an attachment reaction under conditions that loosen the cfDNA from the nucleosome proteins.
The adaptor ligation reaction requires four interacting components: adapter sequences, cfDNA, a ligase, and ATP, the energy source required to drive the ligation reaction. The transposase reaction requires similar components. Plasma has a large amount of ambient protein, predominantly 35-50 mg/ml albumin and 10-15 mg/ml immunoglobulins (Igs). These proteins create steric hindrance for the library-making components to act on nucleosomal cfDNA. In other words, plasma from the sample will have perhaps too much background protein such as albumin and immunoglobulins to allow adaptor attachment to proceed efficiently. Therefore, methods for removing serum proteins or at least reducing their concentration may be employed. See optional step 207 of FIG. 2A. Such methods may involve passing the plasma over a support matrix that selectively binds proteins but has little or no affinity for the DNA. In some embodiments, serum protein can be depleted using a combination of albumin and immunoglobulin depletion columns.
A separation procedure for removing proteins can be relatively simple compared to the DNA isolation procedure which requires contact of the serum to a DNA absorbing support matrix followed by washing and eluting of the DNA. To remove proteins, the current procedure merely involves passing the plasma over a support matrix which absorbs serum proteins. No washing or elution is required.
An alternative method of reducing serum proteins employs a protease that can be removed, degraded and/or deactivated before performing the adaptor attachment reaction. For example, a heat labile protease may be used. This is one that will deactivate at a temperature well below the temperature that degrades the cfDNA. For example, a protease that deactivates at a temperature of about 95° C. or lower, or about 70° C. or lower, is used in some embodiments. After treating the plasma or freeze-thaw supernatant with such protease, the sample temperature is raised to a level that deactivates the protease. Thereafter, the sample is optionally centrifuged or otherwise processed to remove the degraded serum protein. Certain other embodiments employ a metalloprotease or other protease requiring a metal ion or cofactor to activate its proteolytic function. In such cases, the sample is contacted with the protease in its active form for a period sufficient to degrade some or all of the serum proteins. Then, the protease is deactivated by removing the metal ion or other cofactor. In the case of a metalloprotease, this may be accomplished by contacting the sample with a chelating agent such as EDTA. Thereafter, the degraded serum protein is optionally removed and the adaptor attachment reaction is performed.
As mentioned, the cfDNA from the sample is converted to a library without first separating the DNA from the sample. See operation 209 of FIG. 2A and both operations of FIG. 2B. In other words, the cfDNA is used in the sample or a portion of the sample in which the cfDNA naturally exists (e.g., the plasma or other liquid fraction of whole blood). In the process of attaching adaptors, the necessary reactants are contacted with the sample portion containing the cfDNA. In the case of ligation, these are a ligase, ATP, and adaptors. See operation 221 of FIG. 2B. Additionally, during the reaction, the cfDNA, specifically the "ends" of cfDNA, may be made more accessible to library preparation enzymes by certain techniques. See operation 223 of FIG. 2B.
Helically wrapped nucleosomal DNA spontaneously becomes accessible to cellular proteins such as RNA polymerase. See, Li et al., Rapid spontaneous accessibility of nucleosomal DNA, Nature Structural and Molecular Biology, 12, 1, January 2005. However, to make the cfDNA sufficiently accessible for adaptor ligation while still attached to nucleosome proteins, the process may expose the protein bound cfDNA to conditions that increase the entropy of the nucleosome-cfDNA complex and allow the ends of the coiled DNA to become free of the histones more frequently and/or for longer durations and therefore become available for ligation during a greater fraction of the time. This loosening of the cfDNA should be accomplished in a way that does not interfere with the ligation process. As such, the process should generally avoid using proteases or chaotropic agents such as are used in the conventional isolation process. Proteases which denature or otherwise degrade proteins in plasma would interfere with the action of ligase and could only be destroyed at high temperatures which would also destroy the cfDNA.
To promote loosening of the cfDNA, the process may employ a slightly elevated temperature and/or the use of mild detergents. For example, the process may be conducted at a temperature of between about 30 and 75° C., or between about 35 and 45° C., or between about 45 and 55° C., or between about 55 and 65° C., or between about 65 and 75° C.
In some embodiments, adaptor attachment is performed using mild detergents and salts (or combinations thereof). When chosen correctly, these will cause the cfDNA to unwrap from the histone complex, at least slightly, allowing access to the ends of the cfDNA for ligation of the sequencing adapters. If a detergent is used, it should be sufficiently mild that it does not interfere with the ligation process. Sodium dodecyl sulfate is likely too aggressive for most applications. In other words, it should not disrupt or denature the ligase. Examples of suitable types of detergents include various non-ionic detergents. One example of a detergent that has been found suitable is TWEEN®-20 (polysorbate-20).
After the library is prepared, it is sequenced by, e.g., a massively parallel sequencing technique. Additional proteins remaining in the sample after library generation (including histones) are degraded by the heating step in the first cycle of amplification (e.g., PCR), which is performed as an initial part of the sequencing process.
In some embodiments, adaptors are introduced into target DNA using transposase-mediated methods. See, Adey et al., Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition, Genome Biology 2010, 11:R119. As an example, a Tn5 transposase derivative may be used to integrate adaptor sequences into cfDNA. The derivative comprises wild-type Tn5 transposon DNA is flanked by two inverted IS50 elements, each containing two 19 bp sequences required for function (outside end and inside end). A 19 bp derivative allows transposition provided that the intervening DNA is long enough to allow the two ends to come in close proximity in order to form a complex with a Tn5 transposase homodimer.
In summary, in the direct processing of cell free DNA in plasma, the method eliminates the need to pass the plasma through a column or other vessel containing a support matrix. DNA is therefore not isolated on a support matrix. This greatly increases the amount of DNA that is recovered from the original blood sample. It also reduces the complexity of the process. In some embodiments, another significant difference from the conventional process is the lack of a step of degrading nucleosomal proteins with a protease or other agent. Typically, the adaptor attachment reaction is performed in a medium containing a significant fraction of the original sample (e.g., whole blood, urine, sweat, etc.). Examples of such fractions include plasma and freeze-thaw supernatant.
To realize these benefits, the direct process addresses the challenges introduced by salts, proteases, nucleases, albumin, and immunoglobulins, all present in plasma, which can interfere with the library biochemistry. Therefore, in working with plasma cfDNA directly, the process may (1) reduce the concentration of background albumins and Igs, (2) inhibit or remove proteases and nucleases, and/or (3) render the cfDNA ends more accessible.
Freeze Thaw Method (cfDNA Purification from Thawed Supernatant)
An alternative process for preparing sequencing libraries is depicted in FIGS. 3A and 3B and the outline that immediately follows.
1. Collect whole blood with a fixative (Any fixative that prevents release of cellular DNA from the nucleus may be used)
2. Freeze and later thaw the whole blood (the whole blood may be frozen in a tube lying on its side to prevent breakage during freezing). The freezing destroys the cell membranes and possibly modifies serum proteins so that they come out of blood more easily.
3. Centrifuge to remove solids
    A single high g (hard) spin is all that is needed so long as the WBC DNA is fixed to the nuclei.
    The supernatant is red (has hemoglobin) and of quite low viscosity compared to whole blood. The freeze-thaw may reduce the concentration of serum proteins and thereby reduce viscosity.
4. Optional A: isolate cfDNA from supernatant (conventional technique; see papers)
    Optional: Size selection to remove putative cell bound DNA originating, e.g., white blood cells. (As
    Optional: Size selection to remove putative maternal DNA originating in cells.
5. Perform next generation sequencing
This method can be used with either conventional cfDNA isolation procedure or with a procedure that produces a DNA library directly from blood or plasma. The second procedure is as described above for the direct method.
Typically, the process begins by receiving a whole blood sample (operation 300) followed by fixing the white blood cells in the sample (operation 301). Suitable fixing agents include those described above. Additionally, the whole blood sample may be treated with nuclease inhibitors. These are also described above. The fixing process should bind white blood cell DNA to the cells' nuclei, or at least inhibit DNA release from the nuclei during centrifugation.
As illustrated in FIGS. 3A and 3B, the whole blood sample is frozen. See operation 303. Freezing is believed to destroy the constituent cells by breaking their cell membranes and otherwise disrupting their cell structure. Certain of the cellular organelles may remain intact. These include the nuclei of the cells, particularly if an appropriate fixing agent is used. The freezing may also modify the structure of the serum proteins so that they more readily come out of the plasma.
Freezing may be performed directly on whole blood. No other processing is required aside from the previously mentioned fixing and nuclease inhibition. Freezing may be conducted in sample collection tubes or other collection vehicle. Preferably, the process is conducted in a manner that resists breaking of the collection vehicle as the sample expands. A large ratio of expansion surface area to volume is desired. In some embodiments, sample tubes are positioned on their sides during freezing. This provides significantly greater expansion surface area than is available when tubes sit upright.
Freezing may be accomplished by any suitable procedure, so long as it effectively disrupts the cells in the sample. Freezing in conventional freezing apparatus is suitable. As examples, the freezing temperature may be about -20° C. or lower, or about -70° C. or lower, or about -70° C. to -120° C.
After the sample has been frozen, it is thawed. See operation 305 of FIGS. 3A and 3B. The sample may remain frozen for any period of time before thawing. In some embodiments, the sample is thawed by immersing in a liquid bath such as a water bath at room temperature. In certain embodiments, the bath temperature is between about 10° C. and 37° C.
The thawed blood includes the remnants of the original blood which have been disrupted by the freezing. It is believed that the thawed blood contains liquid containing much of the cfDNA from the original whole blood sample, but without contamination from cellular DNA. In the processes of FIGS. 3A and 3B, the thawed blood is subjected to a single hard spin centrifugation to separate the sample into a solid phase and a supernatant. See operations 307. The supernatant may be a low viscosity red colored material. It is believed that it contains cfDNA, hemoglobin and some fraction of the original serum proteins. The solid fraction includes organelles and other materials from the freeze-disrupted red blood cells and white blood cells, including relatively intact nuclei of the white blood cells. The solids
an example , select DNA of size 800 bps and smaller ) are removed . Therefore, the supernatant includes much of
make a library from cfDNA ( conventional technique the cfDNA from the sample , typically without contaminat
described above ) 65 ing DNA from white blood cells . The DNA from the white
4 . Option B - directly make library from the supernatant blood cells is included in the solid fraction of has been
using the procedure in the direct method . removed .
It has been found that a rather high fraction of the whole blood is available in the supernatant. As mentioned, the supernatant contains cfDNA that is typically free of DNA from the nuclei of the white blood cells. cfDNA resides not only in the plasma fraction of a conventionally centrifuged blood sample but also in the hematocrit and buffy coat fractions. However, in the conventional process, the hematocrit and buffy coat are discarded because they are likely contaminated with DNA from other sources within the blood. As an example, for 8 mL of whole blood sample, roughly 7 mL of thawed supernatant is recovered. In a conventional, non-freeze-thaw process, only about 3 mL of plasma is recovered from 8 mL of whole blood sample. Therefore the current process employs a single operation, performed on the thawed blood, to produce a blood fraction having a relatively high retained fraction of the cfDNA from the original sample. The freeze-thaw method may greatly increase the recovery of cfDNA from a whole blood sample.

It has been observed that the viscosity of the supernatant is significantly lower than that of whole blood. It is believed that the freezing disrupts the proteins in the serum so that they are more easily removed from the serum fraction, possibly by simple centrifugation.

The supernatant can be processed to isolate cell free DNA according to the conventional protocol. This is depicted in FIG. 3B, operations 321, 323, and 325. Alternatively, the supernatant can be processed to directly ligate adapters onto cell free DNA in the manner described above. This is depicted in FIG. 3A, operations 309, 311, and 313.

In certain embodiments, the DNA in the supernatant is subjected to size selection to remove high molecular weight DNA that possibly originates from white blood cells. Size selection is performed after centrifugation but before adaptor attachment. In some embodiments, it is performed in conjunction with a serum protein removing step. In certain embodiments, DNA having a size of about 1000 bp or greater is excluded, or a size of about 800 bp or greater is excluded, or a size of about 500 bp or greater is excluded. Various size selection procedures may be employed. Some of these employ a volume excluding agent such as polyethylene glycol (PEG6000 or PEG8000) and a salt (e.g., NaCl). The concentrations of the agent and salt dictate the size of DNA that is selected. In some cases, the size selection process takes advantage of the fact that nucleosomes are relatively small compact structures, often nominally spherical, that pass through size selection media more easily than long strands of DNA and other biomolecules. An example of a suitable size selection procedure is described in Hawkins et al., "DNA purification and isolation using a solid-phase", Nucleic Acids Research, Vol. 22, No. 21, pp. 4543-44 (1994). A commercially available product for size selection is the SPRIselect Reagent Kit (Beckman Coulter).

Among the advantages of the freeze-thaw process that may be realized are the following:
(1) decreased handling of the blood
(2) larger numbers of aliquots of the FT (freeze-thaw) Blood will be available for downstream work
(3) the concentrations of cfDNA isolated from FT Blood are typically higher.

Samples Sources

While whole blood has been discussed as the sample source in most of the disclosed embodiments, the methods herein may be used with many different sample sources. In certain embodiments, the sample comprises a tissue sample, a biological fluid sample, a cell sample, and the like. Suitable biological fluid samples include, but are not limited to, whole blood, a blood fraction, plasma, serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, amniotic fluid, pleural fluid, pericardial fluid, peritoneal fluid, and leukophoresis samples. In some embodiments, the sample is a sample that is easily obtainable by non-invasive procedures, e.g., blood, plasma, serum, sweat, tears, sputum, urine, ear flow, saliva or feces. In certain embodiments the sample is a peripheral blood sample, or the plasma and/or serum fractions of a peripheral blood sample. In other embodiments, the biological sample is a swab or smear, a biopsy specimen, or a cell culture. In other embodiments, the biological sample is a stool (fecal) sample.

In some embodiments, the sample is a mixture of two or more biological samples, e.g., a biological sample can comprise two or more of a biological fluid sample, a tissue sample, and a cell culture sample. As used herein, the terms "blood," "plasma" and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the "sample" expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.

The sample comprising the nucleic acid(s) to which the methods described herein are applied typically comprises a biological sample ("test sample"), e.g., as described above. In conventional methods, the nucleic acid(s) to be screened for one or more CNVs is purified or isolated by any of a number of well-known methods. In some embodiments of the current disclosure, the processes can omit one or more steps involved in the purification or isolation of the nucleic acid(s).

In some embodiments it is advantageous to obtain cell-free nucleic acids, e.g., cell-free DNA (cfDNA). Cell-free nucleic acids, including cell-free DNA, can be obtained by various methods known in the art from biological samples including but not limited to plasma, serum, and urine (see, e.g., Fan et al., Proc Natl Acad Sci 105:16266-16271 [2008]; Koide et al., Prenatal Diagnosis 25:604-607 [2005]; Chen et al., Nature Med. 2:1033-1035 [1996]; Lo et al., Lancet 350:485-487 [1997]; Botezatu et al., Clin Chem. 46:1078-1084 [2000]; and Su et al., J Mol. Diagn. 6:101-107 [2004]). To separate cell-free DNA from cells in a sample, various methods including, but not limited to, fractionation, centrifugation (e.g., density gradient centrifugation), DNA-specific precipitation, or high-throughput cell sorting and/or other separation methods can be used. Commercially available kits for manual and automated separation of cfDNA are available (Roche Diagnostics, Indianapolis, Ind.; Qiagen, Valencia, Calif.; Macherey-Nagel, Düren, DE). Biological samples comprising cfDNA have been used in assays to determine the presence or absence of chromosomal abnormalities, e.g., trisomy 21, by sequencing assays that can detect chromosomal aneuploidies and/or various polymorphisms.

In certain embodiments, samples can be obtained from sources, including, but not limited to, samples from different individuals, samples from different developmental stages of the same or different individuals, samples from different diseased individuals (e.g., individuals with cancer or suspected of having a genetic disorder), normal individuals, samples obtained at different stages of a disease in an individual, samples obtained from an individual subjected to different treatments for a disease, samples from individuals subjected to different environmental factors, samples from
individuals with predisposition to a pathology, samples from individuals with exposure to an infectious disease agent (e.g., HIV), and the like.

In one illustrative, but non-limiting embodiment, the sample is a maternal sample that is obtained from a pregnant female, for example a pregnant woman. The maternal sample comprises a mixture of fetal and maternal DNA, e.g., cfDNA. In this instance, the sample can be analyzed using the methods described herein to provide a prenatal diagnosis of potential chromosomal abnormalities in the fetus. The maternal sample can be a tissue sample, a biological fluid sample, or a cell sample. In some embodiments, the maternal sample is a biological fluid sample, e.g., a blood sample, a plasma sample, a serum sample, a urine sample, a saliva sample. Other maternal samples include any of the biological fluid samples disclosed elsewhere herein.

In another illustrative, but non-limiting embodiment, the maternal sample is a mixture of two or more biological samples, e.g., the biological sample can comprise two or more of a biological fluid sample, a tissue sample, and a cell culture sample.

Collection of Samples for cfDNA Sequencing

Samples can be collected using any of a number of various different techniques. Techniques suitable for individual sample types will be readily apparent to those of skill in the art. For example, whole blood may be collected in tubes such as standard color-coded blood collection tubes containing anticoagulants (lithium heparin, etc.), chelating agents (EDTA, etc.), nuclease and/or protease inhibitors, etc. As mentioned above, Cell-Free DNA BCT™ tubes available from Streck, Inc. are suitable for some applications described herein.

FIG. 4 presents an example of another suitable device for collecting whole blood. As explained above, plasma constitutes roughly 50% v/v of whole blood. A small version of the depicted device collects 2-4 drops of patient/donor blood (100-200 µl) and then separates the plasma from the hematocrit using a specialized membrane. The device can be used to generate the required 50-100 µl of plasma for NGS library preparation. Once the plasma has been separated by the membrane, it can be absorbed into a pretreated medical sponge. In certain embodiments, the sponge is pretreated with a combination of preservatives, proteases and salts to (a) inhibit nucleases and/or (b) stabilize the plasma DNA until downstream processing. Products such as Vivid Plasma Separation Membrane (Pall Life Sciences, Ann Arbor, Mich.) and Medisponge 50PW (Filtrona Technologies, St. Charles, Mich.) can be used.

The plasma DNA in the medical sponge can be accessed for NGS library generation in a variety of ways:
(a) Reconstitute and extract the plasma from the sponge and isolate DNA for downstream processing. Of course, this approach may have limited DNA recovery efficiency.
(b) Utilize the DNA-binding properties of the medical sponge polymer to isolate the DNA.
(c) Conduct direct PCR-based library preparation using the DNA that is bound to the sponge. This may be conducted using any of the cfDNA library preparation techniques described above.

Sequencing Library Preparation

In one embodiment, the methods described herein can utilize next generation sequencing technologies that allow multiple samples to be sequenced individually as genomic molecules (i.e., singleplex sequencing) or as pooled samples comprising indexed genomic molecules (e.g., multiplex sequencing) on a single sequencing run. These methods can generate up to several hundred million reads of DNA sequences. In various embodiments the sequences of genomic nucleic acids, and/or of indexed genomic nucleic acids, can be determined using, for example, the Next Generation Sequencing Technologies (NGS) described herein. In various embodiments analysis of the massive amount of sequence data obtained using NGS can be performed using one or more processors as described herein.

As explained, a whole blood sample may be processed to provide a plasma fraction containing cfDNA that has reduced binding with, but is not fully uncoiled from, nucleosomal proteins. In some embodiments, a plasma fraction containing such cfDNA may then be provided to a droplet actuator as described below. The droplet actuator causes a droplet to coagulate. The coagulated portion including cfDNA may then be provided as an input to assays of next generation sequencing. In some embodiments, the assays use ligation or transposon-mediated insertion to attach adaptors or tags to the cfDNA, to prepare sequencing libraries.

In some embodiments, samples containing cfDNA may be processed as droplets using a droplet actuator, which allows processing of very small amounts of sample using microfluidic devices. PCT Patent Application Publication No. WO 2009/135205, which is incorporated by reference in its entirety, describes examples of such droplet actuators. In some embodiments, a droplet actuator has two substrates separated by a droplet operation gap, each substrate associated with operation electrodes. The droplet operation gap is occupied by a filler fluid typically comprising an organic oil. In some embodiments, a blood sample, either whole blood or a blood component such as plasma, can be provided in small quantity to form a source droplet in a filler fluid. Then the droplet actuator causes the source droplet to coagulate to form a coagulated portion and a supernatant. The coagulation may be effected by applying a procoagulant, heating, cooling, or an electric field, etc. Then the coagulated portion may be used as an input into assays for further downstream processing to obtain sequencing libraries.

An example of sequencing library preparation is described in U.S. Patent Application Publication No. US 2013/0203606, which is incorporated by reference in its entirety. In some embodiments, this preparation may take the coagulated portion of the sample from the droplet actuator as an assay input. The library preparation process is a ligation-based process, which includes four main operations: (a) blunt-ending, (b) phosphorylating, (c) A-tailing, and (d) ligating adaptors. DNA fragments in a droplet are provided to process the sequencing library. In the blunt-ending operation (a), nucleic acid fragments with 5'- and/or 3'-overhangs are blunt-ended using T4 DNA polymerase that has both a 3'-5' exonuclease activity and a 5'-3' polymerase activity, removing overhangs and yielding complementary bases at both ends on DNA fragments. In some embodiments, the T4 DNA polymerase may be provided as a droplet. In the phosphorylation operation (b), T4 polynucleotide kinase may be used to attach a phosphate to the 5'-hydroxyl terminus of the blunt-ended nucleic acid. In some embodiments, the T4 polynucleotide kinase may be provided as a droplet. In the A-tailing operation (c), the 3' hydroxyl end of a dATP is attached to the phosphate on the 5'-hydroxyl terminus of a blunt-ended fragment, catalyzed by exo-Klenow polymerase. In the ligating operation (d), sequencing adaptors are ligated to the A-tail. T4 DNA ligase is used to catalyze the formation of a phosphate bond between the A-tail and the adaptor sequence. In some embodiments involving cfDNA, end-repairing (including blunt-ending and phosphorylation) may be skipped because the cfDNA are naturally fragmented, but the overall process
upstream and downstream of end repair is otherwise comparable to processes involving longer strands of DNA.

In some embodiments, instead of using ligation to introduce tags for a sequencing library prepared from cfDNA, extension or insertion may be used instead of or in addition to ligation. U.S. Patent Application Publication No. 2010/0120098, incorporated by reference in its entirety, provides exemplary processes that may use transposon-mediated insertion to introduce tags to cfDNA. In some embodiments, the cfDNA are unpurified cfDNA obtained by processes described above. In the context of the publication, a transposon is a genetic element that changes location in a genome through a transposition reaction catalyzed by a transposase. A transposon end is a double-stranded DNA consisting of the minimum number of nucleotides required to couple with a transposase to form a transposome, which drives transposition. A transposon end containing composition is a double-stranded DNA containing a transposon end at the 3' end and other sequence elements or tags at the 5' end (e.g., sequencing adaptors or unique identifiers for assays). The transposon end and transposon end containing composition each have a transferred strand and a non-transferred strand complementary to the transferred strand, wherein the transferred strand is inserted into the target sequence by linking the 3' end of the transposon end sequence to the 5' end of the target sequence. The non-transferred strand is not directly transferred to the target sequence. The publication provides methods suitable for preparing a sequence library from nucleic acids, including cfDNA. One embodiment involves tagging both ends of a fragment of a target DNA (e.g., a cfDNA fragment), which constitutes a fragment in a sequencing library. The method involves incubating a fragment of a target DNA, a transposase (e.g., Tn5 transposase or Mu transposase), and a transposon end containing composition, thereby allowing a transposition reaction catalyzed by the transposase. The transposition reaction inserts a transferred strand into the target DNA fragment by ligating the transposon end of the transferred strand to the 5' end of the target sequence, thereby providing a 5' tagged target DNA fragment. The method further involves incubating the 5' tagged target DNA fragment with a nucleic acid modifying enzyme (e.g., a polymerase or a ligase), thereby joining a 3' tag to a 3' end of the 5' tagged target DNA fragment. The process yields a di-tagged target DNA, which may be further processed to produce sequencing libraries as described further below.

In various embodiments the use of such sequencing technologies does not involve the preparation of sequencing libraries.

However, in certain embodiments the sequencing methods contemplated herein involve the preparation of sequencing libraries. In one illustrative approach, sequencing library preparation involves the production of a random collection of adapter-modified DNA fragments (e.g., polynucleotides) that are ready to be sequenced. Sequencing libraries of polynucleotides can be prepared from DNA or RNA, including equivalents or analogs of either DNA or cDNA, for example, DNA or cDNA that is complementary or copy DNA produced from an RNA template by the action of reverse transcriptase. The polynucleotides may originate in double-stranded form (e.g., dsDNA such as genomic DNA fragments, cDNA, PCR amplification products, and the like) or, in certain embodiments, the polynucleotides may have originated in single-stranded form (e.g., ssDNA, RNA, etc.) and have been converted to dsDNA form. By way of illustration, in certain embodiments, single stranded mRNA molecules may be copied into double-stranded cDNAs suitable for use in preparing a sequencing library. The precise sequence of the primary polynucleotide molecules is generally not material to the method of library preparation, and may be known or unknown. In one embodiment, the polynucleotide molecules are DNA molecules. More particularly, in certain embodiments, the polynucleotide molecules represent the entire genetic complement of an organism or substantially the entire genetic complement of an organism, and are genomic DNA molecules (e.g., cellular DNA, cell free DNA (cfDNA), etc.) that typically include both intron sequence and exon sequence (coding sequence), as well as non-coding regulatory sequences such as promoter and enhancer sequences. In certain embodiments, the primary polynucleotide molecules comprise human genomic DNA molecules, e.g., cfDNA molecules present in peripheral blood of a pregnant subject.

Preparation of sequencing libraries for some NGS sequencing platforms is facilitated by the use of polynucleotides comprising a specific range of fragment sizes. Preparation of such libraries typically involves the fragmentation of large polynucleotides (e.g., cellular genomic DNA) to obtain polynucleotides in the desired size range.

Fragmentation can be achieved by any of a number of methods known to those of skill in the art. For example, fragmentation can be achieved by mechanical means including, but not limited to, nebulization, sonication and hydroshear. However, mechanical fragmentation typically cleaves the DNA backbone at C-O, P-O and C-C bonds, resulting in a heterogeneous mix of blunt and 3'- and 5'-overhanging ends with broken C-O, P-O and/or C-C bonds (see, e.g., Alnemri and Litwack, J Biol. Chem 265:17323-17333 [1990]; Richards and Boyer, J Mol Biol 11:327-340 [1965]), which may need to be repaired as they may lack the requisite 5'-phosphate for the subsequent enzymatic reactions, e.g., ligation of sequencing adaptors, that are required for preparing DNA for sequencing.

In contrast, cfDNA typically exists as fragments of less than about 300 base pairs and, consequently, fragmentation is not typically necessary for generating a sequencing library using cfDNA samples.

Typically, whether polynucleotides are forcibly fragmented (e.g., fragmented in vitro), or naturally exist as fragments, they are converted to blunt-ended DNA having 5'-phosphates and 3'-hydroxyl. Standard protocols, e.g., protocols for sequencing using, for example, the Illumina platform as described elsewhere herein, instruct users to end-repair sample DNA, to purify the end-repaired products prior to dA-tailing, and to purify the dA-tailing products prior to the adaptor-ligating steps of the library preparation.

Various embodiments of methods of sequence library preparation described herein obviate the need to perform one or more of the steps typically mandated by standard protocols to obtain a modified DNA product that can be sequenced by NGS. An abbreviated method (ABB method), a 1-step method, and a 2-step method are described below. Consecutive dA-tailing and adaptor ligation is herein referred to as the 2-step process. Consecutive dA-tailing, adaptor ligating, and amplifying is herein referred to as the 1-step method. In various embodiments the ABB and 2-step methods can be performed in solution or on a solid surface. In certain embodiments the 1-step method is performed on a solid surface. Further details on ABB, 2-step and 1-step preparation are disclosed in U.S. Patent Application No. US20130029852 A1, which is incorporated by reference for its description of sequencing library preparation.
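The relationship between the standard protocol and the 2-step and 1-step methods defined above can be made concrete by listing each as an ordered sequence of operations and computing which standard operations are left out. This is only an illustrative sketch: the step labels and the function name are hypothetical, the lists contain only the operations named in the text, and the ABB method is omitted because its steps are not spelled out in this passage.

```python
from collections import Counter

# Operations as named in the text; the string labels themselves are hypothetical.
STANDARD = ["end-repair", "purify", "dA-tailing", "purify", "adaptor ligation"]
TWO_STEP = ["dA-tailing", "adaptor ligation"]
ONE_STEP = ["dA-tailing", "adaptor ligation", "amplification"]


def omitted_steps(standard, abbreviated):
    """List the standard operations that the abbreviated method does not perform.

    Repeated operations (e.g. the two purifications) are counted separately,
    and the order of the standard protocol is preserved.
    """
    remaining = Counter(abbreviated)
    omitted = []
    for step in standard:
        if remaining[step] > 0:
            remaining[step] -= 1   # this occurrence is covered by the abbreviated method
        else:
            omitted.append(step)
    return omitted


if __name__ == "__main__":
    print("2-step omits:", omitted_steps(STANDARD, TWO_STEP))
    print("1-step omits:", omitted_steps(STANDARD, ONE_STEP))
```

Under these assumptions both abbreviated methods drop the end-repair operation and both intermediate purifications; the 1-step method additionally appends an amplification operation that is not part of the standard list used here.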
Marker Nucleic Acids for Tracking and Verifying Sample Integrity

In various embodiments verification of the integrity of the samples and sample tracking can be accomplished by sequencing mixtures of sample genomic nucleic acids, e.g., cfDNA, and accompanying marker nucleic acids that have been introduced into the samples, e.g., prior to processing.

Marker nucleic acids can be combined with the test sample (e.g., biological source sample) and subjected to processes that include, for example, one or more of the steps of fractionating the biological source sample, e.g., obtaining an essentially cell-free plasma fraction from a whole blood sample, and sequencing. In some embodiments, sequencing comprises preparing a sequencing library. The sequence or combination of sequences of the marker molecules that are combined with a source sample is chosen to be unique to the source sample. In some embodiments, the unique marker molecules in a sample all have the same sequence. In other embodiments, the unique marker molecules in a sample are a plurality of sequences, e.g., a combination of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, or more different sequences.

In one embodiment, the integrity of a sample can be verified using a plurality of marker nucleic acid molecules having identical sequences. Alternatively, the identity of a sample can be verified using a plurality of marker nucleic acid molecules that have at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 25, at least 30, at least 35, at least 40, at least 50, or more different sequences. Verification of the integrity of a plurality of biological samples, i.e., two or more biological samples, requires that each of the two or more samples be marked with marker nucleic acids that have sequences that are unique to each of the plurality of test samples being marked. For example, a first sample can be marked with a marker nucleic acid having sequence A, and a second sample can be marked with a marker nucleic acid having sequence B. Alternatively, a first sample can be marked with marker nucleic acid molecules all having sequence A, and a second sample can be marked with a mixture of sequences B and C, wherein sequences A, B and C are marker molecules having different sequences.

The marker nucleic acid(s) can be added to the sample at any stage of sample preparation that occurs prior to library preparation (if libraries are to be prepared) and sequencing. In one embodiment, marker molecules can be combined with an unprocessed source sample. For example, the marker nucleic acid can be provided in a collection tube that is used to collect a blood sample. Alternatively, the marker nucleic acids can be added to the blood sample following the blood draw. In one embodiment, the marker nucleic acid is added to the vessel that is used to collect a biological fluid sample, e.g., the marker nucleic acid(s) are added to a blood collection tube that is used to collect a blood sample. In another embodiment, the marker nucleic acid(s) are added to a fraction of the biological fluid sample. For example, the marker nucleic acid is added to the plasma and/or serum fraction of a blood sample, e.g., a maternal plasma sample. Similarly, the marker nucleic acids can be added to a biopsy specimen prior to processing the specimen. In some embodiments, the marker nucleic acids can be combined with a carrier that delivers the marker molecules into the cells of the biological sample. Cell-delivery carriers include pH-sensitive and cationic liposomes.

In various embodiments, the marker molecules have antigenomic sequences, that is, sequences that are absent from the genome of the biological source sample. In an exemplary embodiment, the marker molecules that are used to verify the integrity of a human biological source sample have sequences that are absent from the human genome. In an alternative embodiment, the marker molecules have sequences that are absent from the source sample and from any one or more other known genomes. For example, the marker molecules that are used to verify the integrity of a human biological source sample have sequences that are absent from the human genome and from the mouse genome. The alternative allows for verifying the integrity of a test sample that comprises two or more genomes. For example, the integrity of a human cell-free DNA sample obtained from a subject affected by a pathogen, e.g., a bacterium, can be verified using marker molecules having sequences that are absent from both the human genome and the genome of the affecting bacterium. Sequences of genomes of numerous pathogens, e.g., bacteria, viruses, yeasts, fungi, protozoa etc., are publicly available on the world wide web at ncbi.nlm.nih.gov/genomes. In another embodiment, marker molecules are nucleic acids that have sequences that are absent from any known genome. The sequences of marker molecules can be randomly generated algorithmically.

In various embodiments the marker molecules can be naturally-occurring deoxyribonucleic acids (DNA), ribonucleic acids or artificial nucleic acid analogs (nucleic acid mimics) including peptide nucleic acids (PNA), morpholino nucleic acid, locked nucleic acids, glycol nucleic acids, and threose nucleic acids, which are distinguished from naturally-occurring DNA or RNA by changes to the backbone of the molecule, or DNA mimics that do not have a phosphodiester backbone. The deoxyribonucleic acids can be from naturally-occurring genomes or can be generated in a laboratory through the use of enzymes or by solid phase chemical synthesis. Chemical methods can also be used to generate the DNA mimics that are not found in nature. Derivatives of DNA that are available in which the phosphodiester linkage has been replaced but in which the deoxyribose is retained include but are not limited to DNA mimics having backbones formed by thioformacetal or a carboxamide linkage, which have been shown to be good structural DNA mimics. Other DNA mimics include morpholino derivatives and the peptide nucleic acids (PNA), which contain an N-(2-aminoethyl)glycine-based pseudopeptide backbone (Ann Rev Biophys Biomol Struct 24:167-183 [1995]). PNA is an extremely good structural mimic of DNA (or of ribonucleic acid (RNA)), and PNA oligomers are able to form very stable duplex structures with Watson-Crick complementary DNA and RNA (or PNA) oligomers, and they can also bind to targets in duplex DNA by helix invasion (Mol Biotechnol 26:233-248 [2004]). Another good structural mimic/analog of DNA that can be used as a marker molecule is phosphorothioate DNA, in which one of the non-bridging oxygens is replaced by a sulfur. This modification reduces the action of endo- and exonucleases including 5' to 3' and 3' to 5' DNA POL 1 exonuclease, nucleases S1 and P1, RNases, serum nucleases and snake venom phosphodiesterase.

The length of the marker molecules can be distinct or indistinct from that of the sample nucleic acids, i.e., the length of the marker molecules can be similar to that of the sample genomic molecules, or it can be greater or smaller than that of the sample genomic molecules. The length of the marker molecules is measured by the number of nucleotide or
nucleotide analog bases that constitute the marker molecule. Marker molecules having lengths that
differ from those of the sample genomic molecules can be distinguished from source nucleic acids
using separation methods known in the art. For example, differences in the length of the marker and
sample nucleic acid molecules can be determined by electrophoretic separation, e.g. capillary
electrophoresis. Size differentiation can be advantageous for quantifying and assessing the quality
of the marker and sample nucleic acids. Preferably, the marker nucleic acids are shorter than the
genomic nucleic acids, and of sufficient length to exclude them from being mapped to the genome of
the sample. For example, a 30 base human sequence is needed to uniquely map it to a human genome.
Accordingly, in certain embodiments, marker molecules used in sequencing bioassays of human samples
should be at least 30 bp in length.

The choice of length of the marker molecule is determined primarily by the sequencing technology
that is used to verify the integrity of a source sample. The length of the sample genomic nucleic
acids being sequenced can also be considered. For example, some sequencing technologies employ
clonal amplification of polynucleotides, which can require that the genomic polynucleotides that are
to be clonally amplified be of a minimum length. For example, sequencing using the Illumina GAII
sequence analyzer includes an in vitro clonal amplification by bridge PCR (also known as cluster
amplification) of polynucleotides that have a minimum length of 110 bp, to which adaptors are
ligated to provide a nucleic acid of at least 200 bp and less than 600 bp that can be clonally
amplified and sequenced. In some embodiments, the length of the adaptor-ligated marker molecule is
between about 200 bp and about 600 bp, between about 250 bp and 550 bp, between about 300 bp and
500 bp, or between about 350 bp and 450 bp. In other embodiments, the length of the adaptor-ligated
marker molecule is about 200 bp. For example, when sequencing fetal cfDNA that is present in a
maternal sample, the length of the marker molecule can be chosen to be similar to that of fetal
cfDNA molecules. Thus, in one embodiment, the length of the marker molecule used in an assay that
comprises massively parallel sequencing of cfDNA in a maternal sample to determine the presence or
absence of a fetal chromosomal aneuploidy can be about 150 bp, about 160 bp, about 170 bp, about
180 bp, about 190 bp or about 200 bp; preferably, the marker molecule is about 170 bp. Other
sequencing approaches, e.g. SOLiD sequencing, Polony Sequencing and 454 sequencing, use emulsion PCR
to clonally amplify DNA molecules for sequencing, and each technology dictates the minimum and the
maximum length of the molecules that are to be amplified. The length of marker molecules to be
sequenced as clonally amplified nucleic acids can be up to about 600 bp. In some embodiments, the
length of marker molecules to be sequenced can be greater than 600 bp.

Single molecule sequencing technologies, which do not employ clonal amplification of molecules and
are capable of sequencing nucleic acids over a very broad range of template lengths, in most
situations do not require that the molecules to be sequenced be of any specific length. However, the
yield of sequences per unit mass is dependent on the number of 3' end hydroxyl groups, and thus
having relatively short templates for sequencing is more efficient than having long templates. If
starting with nucleic acids longer than 1000 nt, it is generally advisable to shear the nucleic
acids to an average length of 100 to 200 nt so that more sequence information can be generated from
the same mass of nucleic acids. Thus, the length of the marker molecule can range from tens of bases
to thousands of bases. The length of marker molecules used for single molecule sequencing can be up
to about 25 bp, up to about 50 bp, up to about 75 bp, up to about 100 bp, up to about 200 bp, up to
about 300 bp, up to about 400 bp, up to about 500 bp, up to about 600 bp, up to about 700 bp, up to
about 800 bp, up to about 900 bp, up to about 1000 bp, or more in length.

The length chosen for a marker molecule is also determined by the length of the genomic nucleic acid
that is being sequenced. For example, cfDNA circulates in the human bloodstream as genomic fragments
of cellular genomic DNA. Fetal cfDNA molecules found in the plasma of pregnant women are generally
shorter than maternal cfDNA molecules (Chan et al., Clin Chem 50:88-92 [2004]). Size fractionation
of circulating fetal DNA has confirmed that the average length of circulating fetal DNA fragments is
<300 bp, while maternal DNA has been estimated to be between about 0.5 and 1 kb (Li et al., Clin
Chem, 50:1002-1011 [2004]). These findings are consistent with those of Fan et al., who determined
using NGS that fetal cfDNA is rarely >340 bp (Fan et al., Clin Chem 56:1279-1286 [2010]). DNA
isolated from urine with a standard silica-based method consists of two fractions: high molecular
weight DNA, which originates from shed cells, and a low molecular weight (150-250 base pair)
fraction of transrenal DNA (Tr-DNA) (Botezatu et al., Clin Chem. 46:1078-1084, 2000; and Su et al.,
J Mol. Diagn. 6:101-107, 2004). The application of a newly developed technique for isolation of
cell-free nucleic acids from body fluids to the isolation of transrenal nucleic acids has revealed
the presence in urine of DNA and RNA fragments much shorter than 150 base pairs (U.S. Patent
Application Publication No. 20080139801). In embodiments wherein cfDNA is the genomic nucleic acid
that is sequenced, marker molecules that are chosen can be up to about the length of the cfDNA. For
example, the length of marker molecules used in maternal cfDNA samples to be sequenced as single
nucleic acid molecules or as clonally amplified nucleic acids can be between about 100 bp and
600 bp. In other embodiments, the sample genomic nucleic acids are fragments of larger molecules.
For example, a sample genomic nucleic acid that is sequenced is fragmented cellular DNA. In
embodiments in which fragmented cellular DNA is sequenced, the length of the marker molecules can be
up to the length of the DNA fragments. In some embodiments, the length of the marker molecules is at
least the minimum length required for mapping the sequence read uniquely to the appropriate
reference genome. In other embodiments, the length of the marker molecule is the minimum length that
is required to exclude the marker molecule from being mapped to the sample reference genome.

In addition, marker molecules can be used to verify samples that are not assayed by nucleic acid
sequencing, and that can be verified by common biotechniques other than sequencing, e.g. real-time
PCR.

Sample Controls (e.g., In-Process Positive Controls for Sequencing and/or Analysis).

In various embodiments, marker sequences introduced into the samples, e.g., as described above, can
function as positive controls to verify the accuracy and efficacy of sequencing and subsequent
processing and analysis.

Accordingly, compositions and methods for providing an in-process positive control (IPC) for
sequencing DNA in a sample are provided. In certain embodiments, positive controls are provided for
sequencing cfDNA in a sample comprising a mixture of genomes. An IPC can be used to relate baseline
shifts in sequence information obtained from different sets of samples, e.g. samples that are
sequenced at different times on different sequencing runs. Thus, for example, an IPC can relate the
sequence information obtained for a maternal test sample to the sequence information obtained from a
set of qualified samples that were sequenced at a different time.

Similarly, in the case of segment analysis, an IPC can relate the sequence information obtained from
a subject for particular segment(s) to the sequence information obtained from a set of qualified
samples (of similar sequences) that were sequenced at a different time. In certain embodiments, an
IPC can relate the sequence information obtained from a subject for particular cancer-related loci
to the sequence information obtained from a set of qualified samples (e.g., from a known
amplification/deletion, and the like).

In addition, IPCs can be used as markers to track sample(s) through the sequencing process. IPCs can
also provide a qualitative positive sequence dose value, e.g. NCV, for one or more aneuploidies of
chromosomes of interest, e.g. trisomy 21, trisomy 13, trisomy 18, to provide proper interpretation
and to ensure the dependability and accuracy of the data. In certain embodiments, IPCs can be created
to comprise nucleic acids from male and female genomes to provide doses for chromosomes X and Y in a
maternal sample to determine whether the fetus is male.

The type and the number of in-process controls depend on the type or nature of the test needed. For
example, for a test requiring the sequencing of DNA from a sample comprising a mixture of genomes to
determine whether a chromosomal aneuploidy exists, the in-process control can comprise DNA obtained
from a sample known to comprise the same chromosomal aneuploidy that is being tested. For example,
the IPC for a test to determine the presence or absence of a fetal trisomy, e.g. trisomy 21, in a
maternal sample comprises DNA obtained from an individual with trisomy 21. In some embodiments, the
IPC comprises a mixture of DNA obtained from two or more individuals with different aneuploidies.
For example, for a test to determine the presence or absence of trisomy 13, trisomy 18, trisomy 21,
and monosomy X, the IPC comprises a combination of DNA samples obtained from pregnant women each
carrying a fetus with one of the aneuploidies being tested. In addition to complete chromosomal
aneuploidies, IPCs can be created to provide positive controls for tests to determine the presence
or absence of partial aneuploidies.

An IPC that serves as the control for detecting a single aneuploidy can be created using a mixture
of cellular genomic DNA obtained from two subjects, one being the contributor of the aneuploid
genome. For example, an IPC that is created as a control for a test to determine a fetal trisomy,
e.g. trisomy 21, can be created by combining genomic DNA from a male or female subject carrying the
trisomic chromosome with genomic DNA from a female subject known not to carry the trisomic
chromosome. Genomic DNA can be extracted from cells of both subjects, and sheared to provide
fragments of between about 100-400 bp, between about 150-350 bp, or between about 200-300 bp to
simulate the circulating cfDNA fragments in maternal samples. The proportion of fragmented DNA from
the subject carrying the aneuploidy, e.g. trisomy 21, is chosen to simulate the proportion of
circulating fetal cfDNA found in maternal samples, to provide an IPC comprising a mixture of
fragmented DNA comprising about 5%, about 10%, about 15%, about 20%, about 25%, or about 30% of DNA
from the subject carrying the aneuploidy. The IPC can comprise DNA from different subjects each
carrying a different aneuploidy. For example, the IPC can comprise about 80% of the unaffected
female DNA, and the remaining 20% can be DNA from three different subjects each carrying a trisomic
chromosome 21, a trisomic chromosome 13, and a trisomic chromosome 18. The mixture of fragmented DNA
is prepared for sequencing. Processing of the mixture of fragmented DNA can comprise preparing a
sequencing library, which can be sequenced using any massively parallel methods in singleplex or
multiplex fashion. Stock solutions of the genomic IPC can be stored and used in multiple diagnostic
tests.

Alternatively, the IPC can be created using cfDNA obtained from a mother known to carry a fetus with
a known chromosomal aneuploidy. For example, cfDNA can be obtained from a pregnant woman carrying a
fetus with trisomy 21. The cfDNA is extracted from the maternal sample, cloned into a bacterial
vector, and grown in bacteria to provide an ongoing source of the IPC. The DNA can be extracted from
the bacterial vector using restriction enzymes. Alternatively, the cloned cfDNA can be amplified by,
e.g., PCR. The IPC DNA can be processed for sequencing in the same runs as the cfDNA from the test
samples that are to be analyzed for the presence or absence of chromosomal aneuploidies.

While the creation of IPCs is described above with respect to trisomies, it will be appreciated that
IPCs can be created to reflect other partial aneuploidies including, for example, various segment
amplifications and/or deletions. Thus, for example, where various cancers are known to be associated
with particular amplifications (e.g., breast cancer associated with 20q13), IPCs can be created that
incorporate those known amplifications.

Sequencing Methods

The prepared samples (e.g., Sequencing Libraries) may be sequenced for various purposes. For example,
sequencing may be used for identifying copy number variation(s). Any of a number of sequencing
technologies can be utilized. The above-described techniques for preparing or working with
cfDNA-containing samples can be used to provide a source of cfDNA for any of the methods described
herein. The above-described methods for applying adaptor sequences to the ends of cfDNA apply only
to those sequencing methods that employ adaptors.

Some sequencing technologies are available commercially, such as the sequencing-by-hybridization
platform from AFFYMETRIX® Inc. (Sunnyvale, Calif.) and the sequencing-by-synthesis platforms from
454 Life Sciences (Branford, Conn.), Illumina (Hayward, Calif.) and Helicos Biosciences (Cambridge,
Mass.), and the sequencing-by-ligation platform from Applied Biosystems (Foster City, Calif.), as
described below. In addition to the single molecule sequencing performed using
sequencing-by-synthesis of Helicos Biosciences, other single molecule sequencing technologies
include, but are not limited to, the SMRT™ technology of Pacific Biosciences, the ION TORRENT™
technology, and nanopore sequencing developed, for example, by Oxford Nanopore Technologies.

While the automated Sanger method is considered a “first generation” technology, Sanger sequencing,
including automated Sanger sequencing, can also be employed in the methods described herein.
Additional suitable sequencing methods include, but are not limited to, nucleic acid imaging
technologies, e.g. atomic force microscopy (AFM) or transmission electron microscopy (TEM). Such
techniques may be appropriate for sequencing cfDNA obtained using the freeze-thaw method described
above, for example. Illustrative sequencing technologies are described in greater detail below.
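As a rough arithmetic check on the in-process control (IPC) mixtures described above, the following Python sketch estimates how strongly a chromosome is expected to be over- or under-represented when DNA from an aneuploid subject makes up a given fraction of the mixture. It is an illustration only and not part of the disclosed methods: it assumes that sequence tag counts scale linearly with chromosome copy number, it ignores the small change in total DNA contributed by the extra or missing chromosome, and it assumes that the aneuploid portion is divided equally among the affected subjects. The function names `expected_dose_ratio` and `split_equally` are hypothetical.

```python
def expected_dose_ratio(aneuploid_fraction, copies_affected=3, copies_normal=2):
    """Expected representation of the affected chromosome in the mixture,
    relative to an entirely euploid sample (1.0 means no change)."""
    if not 0.0 <= aneuploid_fraction <= 1.0:
        raise ValueError("aneuploid_fraction must be between 0 and 1")
    # Weighted average copy number across the two DNA sources,
    # normalized to the euploid copy number.
    mean_copies = ((1.0 - aneuploid_fraction) * copies_normal
                   + aneuploid_fraction * copies_affected)
    return mean_copies / copies_normal


def split_equally(total_aneuploid_fraction, aneuploidies):
    """Hypothetical helper: divide the aneuploid portion of the IPC
    equally among the listed subjects (an assumption, not from the text)."""
    share = total_aneuploid_fraction / len(aneuploidies)
    return {name: share for name in aneuploidies}


if __name__ == "__main__":
    # Single-aneuploidy IPC at the proportions listed in the text.
    for fraction in (0.05, 0.10, 0.15, 0.20, 0.25, 0.30):
        ratio = expected_dose_ratio(fraction)
        print(f"{fraction:.0%} trisomic DNA -> dose ratio {ratio:.3f}")

    # 80% unaffected DNA plus 20% shared by three trisomic subjects.
    shares = split_equally(0.20, ["trisomy 21", "trisomy 13", "trisomy 18"])
    for name, share in shares.items():
        print(f"{name}: {share:.1%} of mixture, "
              f"dose ratio {expected_dose_ratio(share):.3f}")
```

Under these assumptions, a 10% contribution of trisomic DNA raises the expected dose of the affected chromosome by about 5%, and the same function with `copies_affected=1` gives the corresponding decrease for a monosomy.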
In one illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining
sequence information for the nucleic acids in a test sample, e.g. cfDNA in a maternal sample, cfDNA
or cellular DNA in a subject being screened for a cancer, and the like, using the single molecule
sequencing technology of Helicos True Single Molecule Sequencing (tSMS) (e.g. as described in Harris
T. D. et al., Science 320:106-109 [2008]). In the tSMS technique, a DNA sample is cleaved into
strands of approximately 100 to 200 nucleotides, and a polyA sequence is added to the 3' end of each
DNA strand. Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide.
The DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites
that are immobilized to the flow cell surface. In certain embodiments the templates can be at a
density of about 100 million templates/cm2. The flow cell is then loaded into an instrument, e.g.,
HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position
of each template. A CCD camera can map the position of the templates on the flow cell surface. The
template fluorescent label is then cleaved and washed away. The sequencing reaction begins by
introducing a DNA polymerase and a fluorescently labeled nucleotide. The oligo-T nucleic acid serves
as a primer. The polymerase incorporates the labeled nucleotides to the primer in a
template-directed manner. The polymerase and unincorporated nucleotides are removed. The templates
that have directed incorporation of the fluorescently labeled nucleotide are discerned by imaging
the flow cell surface. After imaging, a cleavage step removes the fluorescent label, and the process
is repeated with other fluorescently labeled nucleotides until the desired read length is achieved.
Sequence information is collected with each nucleotide addition step. Whole genome sequencing by
single molecule sequencing technologies excludes or typically obviates PCR-based amplification in
the preparation of the sequencing libraries, and the methods allow for direct measurement of the
sample, rather than measurement of copies of that sample.

In another illustrative, but non-limiting, embodiment, the methods described herein comprise
obtaining sequence information for the nucleic acids in the test sample, e.g. cfDNA in a maternal
test sample, cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using
454 sequencing (Roche) (e.g. as described in Margulies, M. et al. Nature 437:376-380 [2005]). 454
sequencing typically involves two steps. In the first step, DNA is sheared into fragments of
approximately 300-800 base pairs, and the fragments are blunt-ended. Oligonucleotide adaptors are
then ligated to the ends of the fragments. The adaptors serve as primers for amplification and
sequencing of the fragments. The fragments can be attached to DNA capture beads, e.g.,
streptavidin-coated beads, using, e.g., Adaptor B, which contains a 5'-biotin tag. The fragments
attached to the beads are PCR amplified within droplets of an oil-water emulsion. The result is
multiple copies of clonally amplified DNA fragments on each bead. In the second step, the beads are
captured in wells (e.g., picoliter-sized wells). Pyrosequencing is performed on each DNA fragment in
parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD
camera in a sequencing instrument. The signal strength is proportional to the number of nucleotides
incorporated. Pyrosequencing makes use of pyrophosphate (PPi), which is released upon nucleotide
addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5' phosphosulfate.
Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is
measured and analyzed.

In another illustrative, but non-limiting, embodiment, the methods described herein comprise
obtaining sequence information for the nucleic acids in the test sample, e.g. cfDNA in a maternal
test sample, cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using the
SOLiD™ technology (Applied Biosystems). In SOLiD™ sequencing-by-ligation, genomic DNA is sheared
into fragments, and adaptors are attached to the 5' and 3' ends of the fragments to generate a
fragment library. Alternatively, internal adaptors can be introduced by ligating adaptors to the 5'
and 3' ends of the fragments, circularizing the fragments, digesting the circularized fragment to
generate an internal adaptor, and attaching adaptors to the 5' and 3' ends of the resulting
fragments to generate a mate-paired library. Next, clonal bead populations are prepared in
microreactors containing beads, primers, template, and PCR components. Following PCR, the templates
are denatured and beads are enriched to separate the beads with extended templates. Templates on the
selected beads are subjected to a 3' modification that permits bonding to a glass slide. The
sequence can be determined by sequential hybridization and ligation of partially random
oligonucleotides with a central determined base (or pair of bases) that is identified by a specific
fluorophore. After a color is recorded, the ligated oligonucleotide is cleaved and removed and the
process is then repeated.

In another illustrative, but non-limiting, embodiment, the methods described herein comprise
obtaining sequence information for the nucleic acids in the test sample, e.g. cfDNA in a maternal
test sample, cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using the
single molecule, real-time (SMRT™) sequencing technology of Pacific Biosciences. In SMRT sequencing,
the continuous incorporation of dye-labeled nucleotides is imaged during DNA synthesis. Single DNA
polymerase molecules are attached to the bottom surface of individual zero-mode waveguide detectors
(ZMW detectors) that obtain sequence information while phospholinked nucleotides are being
incorporated into the growing primer strand. A ZMW detector comprises a confinement structure that
enables observation of incorporation of a single nucleotide by DNA polymerase against a background
of fluorescent nucleotides that rapidly diffuse in and out of the ZMW (e.g., in microseconds). It
typically takes several milliseconds to incorporate a nucleotide into a growing strand. During this
time, the fluorescent label is excited and produces a fluorescent signal, and the fluorescent tag is
cleaved off. Measurement of the corresponding fluorescence of the dye indicates which base was
incorporated. The process is repeated to provide a sequence.

In another illustrative, but non-limiting, embodiment, the methods described herein comprise
obtaining sequence information for the nucleic acids in the test sample, e.g. cfDNA in a maternal
test sample, cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using
nanopore sequencing (e.g. as described in Soni G V and Meller A. Clin Chem 53:1996-2001 [2007]).
Nanopore sequencing DNA analysis techniques are developed by a number of companies, including, for
example, Oxford Nanopore Technologies (Oxford, United Kingdom), Sequenom, NABsys, and the like.
Nanopore sequencing is a single-molecule sequencing technology whereby a single molecule of DNA is
sequenced directly as it passes through a nanopore. A nanopore is a small hole, typically of the
order of 1 nanometer in diameter. Immersion of a nanopore in a
conducting fluid and application of a potential (voltage) across it results in a slight electrical
current due to conduction of ions through the nanopore. The amount of current that flows is
sensitive to the size and shape of the nanopore. As a DNA molecule passes through a nanopore, each
nucleotide on the DNA molecule obstructs the nanopore to a different degree, changing the magnitude
of the current through the nanopore to different degrees. Thus, this change in the current as the
DNA molecule passes through the nanopore provides a read of the DNA sequence.

In another illustrative, but non-limiting, embodiment, the methods described herein comprise
obtaining sequence information for the nucleic acids in the test sample, e.g. cfDNA in a maternal
test sample, cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using the
chemical-sensitive field effect transistor (chemFET) array (e.g., as described in U.S. Patent
Application Publication No. 2009/0026082). In one example of this technique, DNA molecules can be
placed into reaction chambers, and the template molecules can be hybridized to a sequencing primer
bound to a polymerase. Incorporation of one or more triphosphates into a new nucleic acid strand at
the 3' end of the sequencing primer can be discerned as a change in current by a chemFET. An array
can have multiple chemFET sensors. In another example, single nucleic acids can be attached to
beads, and the nucleic acids can be amplified on the bead, and the individual beads can be
transferred to individual reaction chambers on a chemFET array, with each chamber having a chemFET
sensor, and the nucleic acids can be sequenced.

In another embodiment, the present method comprises obtaining sequence information for the nucleic
acids in the test sample, e.g. cfDNA in a maternal test sample, using the Halcyon Molecular
technology, which uses transmission electron microscopy (TEM). The method, termed Individual
Molecule Placement Rapid Nano Transfer (IMPRNT), comprises utilizing single atom resolution
transmission electron microscope imaging of high-molecular weight (150 kb or greater) DNA
selectively labeled with heavy atom markers and arranging these molecules on ultra-thin films in
ultra-dense (3 nm strand-to-strand) parallel arrays with consistent base-to-base spacing. The
electron microscope is used to image the molecules on the films to determine the position of the
heavy atom markers and to extract base sequence information from the DNA. The method is further
described in PCT patent publication WO 2009/046445. The method allows for sequencing complete human
genomes in less than ten minutes.

In another embodiment, the DNA sequencing technology is the Ion Torrent single molecule sequencing,
which pairs semiconductor technology with a simple sequencing chemistry to directly translate
chemically encoded information (A, C, G, T) into digital information (0, 1) on a semiconductor chip.
In nature, when a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is
released as a byproduct. Ion Torrent uses a high-density array of micro-machined wells to perform
this biochemical process in a massively parallel way. Each well holds a different DNA molecule.
Beneath the wells is an ion-sensitive layer and beneath that an ion sensor. When a nucleotide, for
example a C, is added to a DNA template and is then incorporated into a strand of DNA, a hydrogen
ion will be released. The charge from that ion will change the pH of the solution, which can be
detected by Ion Torrent's ion sensor. The sequencer (essentially the world's smallest solid-state pH
meter) calls the base, going directly from chemical information to digital information. The Ion
Personal Genome Machine (PGM™) sequencer then sequentially floods the chip with one nucleotide after
another. If the next nucleotide that floods the chip is not a match, no voltage change will be
recorded and no base will be called. If there are two identical bases on the DNA strand, the voltage
will be double, and the chip will record two identical bases called. Direct detection allows
recordation of nucleotide incorporation in seconds.

In another embodiment, the present method comprises obtaining sequence information for the nucleic
acids in the test sample, e.g. cfDNA in a maternal test sample, using sequencing by hybridization.
Sequencing-by-hybridization comprises contacting the plurality of polynucleotide sequences with a
plurality of polynucleotide probes, wherein each of the plurality of polynucleotide probes can be
optionally tethered to a substrate. The substrate might be a flat surface comprising an array of
known nucleotide sequences. The pattern of hybridization to the array can be used to determine the
polynucleotide sequences present in the sample. In other embodiments, each probe is tethered to a
bead, e.g., a magnetic bead or the like. Hybridization to the beads can be determined and used to
identify the plurality of polynucleotide sequences within the sample.

In another embodiment, the present method comprises obtaining sequence information for the nucleic
acids in the test sample, e.g. cfDNA in a maternal test sample, by massively parallel sequencing of
millions of DNA fragments using Illumina's sequencing-by-synthesis and reversible terminator-based
sequencing chemistry (e.g. as described in Bentley et al., Nature 456:53-59 [2008]). Template DNA
can be genomic DNA, e.g. cfDNA. In some embodiments, genomic DNA from isolated cells is used as the
template, and it is fragmented into lengths of several hundred base pairs. In other embodiments,
cfDNA is used as the template, and fragmentation is not required as cfDNA exists as short fragments.
For example, fetal cfDNA circulates in the bloodstream as fragments approximately 170 base pairs (bp)
in length (Fan et al., Clin Chem 56:1279-1286 [2010]), and no fragmentation of the DNA is required
prior to sequencing. Illumina's sequencing technology relies on the attachment of fragmented genomic
DNA to a planar, optically transparent surface on which oligonucleotide anchors are bound. Template
DNA is end-repaired to generate 5'-phosphorylated blunt ends, and the polymerase activity of Klenow
fragment is used to add a single A base to the 3' end of the blunt phosphorylated DNA fragments.
This addition prepares the DNA fragments for ligation to oligonucleotide adapters, which have an
overhang of a single T base at their 3' end to increase ligation efficiency. The adapter
oligonucleotides are complementary to the flow cell anchors. Under limiting-dilution conditions,
adapter-modified, single-stranded template DNA is added to the flow cell and immobilized by
hybridization to the anchors. Attached DNA fragments are extended and bridge amplified to create an
ultra-high density sequencing flow cell with hundreds of millions of clusters, each containing
~1,000 copies of the same template. In one embodiment, the randomly fragmented genomic DNA, e.g.
cfDNA, is amplified using PCR before it is subjected to cluster amplification. Alternatively, an
amplification-free genomic library preparation is used, and the randomly fragmented genomic DNA,
e.g. cfDNA, is enriched using the cluster amplification alone (Kozarewa et al., Nature Methods
6:291-295 [2009]). The templates are sequenced using a robust four-color DNA sequencing-by-synthesis
technology that employs reversible terminators with removable fluorescent dyes. High-sensitivity
fluorescence detection is achieved using laser excitation and total
internal reflection optics . Short sequence reads of about 36 bp , are obtained from mapping the reads to the reference
20 -40 bp e.g. 36 bp , are aligned against a repeat-masked genome per sample . In one embodiment, all the sequence
reference genome and unique mappings of the short sequence reads are mapped to all regions of the reference genome. In
reads to the reference genome are identified using specially one embodiment, the tags that have been mapped to all
developed data analysis pipeline software . Non -repeat - 5 regions e . g . all chromosomes , of the reference genome are
masked reference genomes can also be used . Whether counted , and the CNV i. e . the over - or under - representation
repeat-masked or non - repeat-masked reference genomes are of a sequence of interest e . g . a chromosome or portion
used , only reads that map uniquely to the reference genome thereof, in the mixed DNA sample is determined . The
are counted . After completion of the first read , the templates method does not require differentiation between the two
can be regenerated in situ to enable a second read from the 10 genomes .
opposite end of the fragments. Thus , either single - end or The accuracy required for correctly determining whether
paired end sequencing of the DNA fragments can be used a CNV e . g . aneuploidy, is present or absent in a sample , is
Partial sequencing of DNA fragments present in the sample predicated on the variation of the number of sequence tags
is performed , and sequence tags comprising reads of prede that map to the reference genome among samples within a
termined length e . g . 36 bp , are mapped to a known reference 15 sequencing run interchromosomal variability ) , and the
genome are counted . In one embodiment, the reference variation of the number of sequence tags that map to the
genome sequence is the NCBI36 /hg18 sequence , which is reference genome in different sequencing runs (inter- se
available on the world wide web at genome.ucsc . edu / cgi quencing variability ) . For example , the variations can be
bin / particularly pronounced for tags that map to GC -rich or
hgGateway ? org = Human & db = hg18 & hgsid = 166260105 ). 20 GC -poor reference sequences . Other variations can result
Alternatively, the reference genome sequence is the from using different protocols for the extraction and purifi
GRCh37/hg19 , which is available on the world wide web at cation of the nucleic acids , the preparation of the sequencing
genome.ucsc .edu /cgi-bin /hgGateway . Other sources of pub- libraries, and the use of different sequencing platforms. The
lic sequence information include GenBank , dbEST, dbSTS , present method may use sequence doses ( chromosome
EMBL (the European Molecular Biology Laboratory ), and 25 doses, or segment doses as described below ) based on the
the DDBJ ( the DNA Databank of Japan ). A number of knowledge of normalizing sequences (normalizing chromo
computer algorithms are available for aligning sequences, some sequences or normalizing segment sequences ), to
including without limitation BLAST ( Altschul et al., 1990 ), intrinsically account for the accrued variability stemming
BLITZ (MPsrch ) ( Sturrock & Collins, 1993), FASTA ( Per - from interchromosomal (intra -run ), and inter -sequencing
son & Lipman , 1988 ), BOWTIE (Langmead et al., Genome 30 ( inter -run ) and platform -dependent variability. Chromosome
Biology 10 :R25. 1-R25 . 10 [ 2009]), or ELAND (Illumina , doses are based on the knowledge of a normalizing chro
Inc ., San Diego , Calif., USA ). In one embodiment, one end mosome sequence, which can be composed of a single
of the clonally expanded copies of the plasma cfDNA chromosome, or of two or more chromosomes selected from
molecules is sequenced and processed by bioinformatic chromosomes 1 -22 , X , and Y . Alternatively, normalizing
alignment analysis for the Illumina Genome Analyzer , 35 chromosome sequences can be composed of a single chro
which uses the EfficientLarge- Scale Alignment of Nucleo mosome segment, or of two or more segments of one
tide Databases (ELAND ) software . chromosome or of two or more chromosomes . Segment
In some embodiments of the methods described herein , doses are based on the knowledge of a normalizing segment
the mapped sequence tags comprise sequence reads of about sequence , which can be composed of a single segment of
20 bp , about 25 bp , about 30 bp , about 35 bp , about 40 bp , 40 any one chromosome, or of two or more segments of any
about 45 bp , about 50 bp , about 55 bp , about 60 bp , about two or more of chromosomes 1 - 22 , X , and Y .
65 bp , about 70 bp, about 75 bp, about 80 bp , about 85 bp , Singleplex Sequencing
about 90 bp , about 95 bp , about 100 bp , about 110 bp, about FIG . 5 illustrates a flow chart of an embodiment of the
120 bp , about 130 , about 140 bp , about 150 bp , about 200 method 500 whereby marker nucleic acids are combined
bp , about 250 bp , about 300 bp , about 350 bp , about 400 bp , 45 with source sample nucleic acids of a single sample to assay
about 450 bp , or about 500 bp . It is expected that techno for a genetic abnormality while determining the integrity of
logical advances will enable single - end reads of greater than the biological source sample . In step 510 , a biological source
500 bp enabling for reads of greater than about 1000 bp sample comprising genomic nucleic acids is obtained . In
when paired end reads are generated . In one embodiment, step 520 , marker nucleic acids are combined with the
the mapped sequence tags comprise sequence reads that are 50 biological source sample to provide a marked sample . A
36 bp . Mapping of the sequence tags is achieved by com - sequencing library of a mixture of clonally amplified source
paring the sequence of the tag with the sequence of the sample genomic and marker nucleic acids is prepared in step
reference to determine the chromosomal origin of the 530 , and the library is sequenced in a massively parallel
sequenced nucleic acid (e .g. cfDNA ) molecule , and specific fashion in step 540 to provide sequencing information
genetic sequence information is not needed . A small degree 55 pertaining to the source genomic and marker nucleic acids of
of mismatch (0 -2 mismatches per sequence tag ) may be the sample .Massively parallel sequencingmethods provide
allowed to account for minor polymorphisms that may exist sequencing information as sequence reads, which are
between the reference genome and the genomes in the mixed mapped to one or more reference genomes to generate
sample . sequence tags that can be analyzed . In step 550 , all sequenc
A plurality of sequence tags are typically obtained per 60 ing information is analyzed , and based on the sequencing
sample . In some embodiments , at least about 3x10° information pertaining to the marker molecules , the integrity
sequence tags, at least about 5x10° sequence tags , at least of the source sample is verified in step 560 . Verification of
about 8x10° sequence tags, at least about 10x10° sequence source sample integrity is accomplished by determining a
tags, at least about 15x10° sequence tags , at least about correspondence between the sequencing information
20x10° sequence tags , at least about 30x10 sequence tags, 65 obtained for the maker molecule at step 550 and the known
at least about 40x100 sequence tags, or at least about 50x100 sequence of the marker molecule that was added to the
sequence tags comprising between 20 and 40 bp reads e. g. original source sample at step 520. The sameprocess can be
applied to multiple samples that are sequenced separately, with each sample comprising molecules having sequences unique to the sample, i.e. one sample is marked with a unique marker molecule and it is sequenced separately from other samples in a flow cell or slide of a sequencer. If the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids of the sample can be analyzed to provide information, e.g. about the status of the subject from which the source sample was obtained. For example, if the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids is analyzed to determine the presence or absence of a chromosomal abnormality. If the integrity of the sample is not verified, the sequencing information is disregarded.
The method depicted in FIG. 5 is also applicable to bioassays that comprise singleplex sequencing of single molecules, e.g. tSMS by Helicos, SMRT by Pacific Biosciences, BASE by Oxford Nanopore, and other technologies such as that suggested by IBM, which do not require preparation of libraries.
Multiplex Sequencing
The large number of sequence reads that can be obtained per sequencing run permits the analysis of pooled samples, i.e. multiplexing, which maximizes sequencing capacity and reduces workflow. For example, the massively parallel sequencing of eight libraries performed using the eight-lane flow cell of the Illumina Genome Analyzer, and Illumina's HiSeq Systems, can be multiplexed to sequence two or more samples in each lane such that 16, 24, 32 etc. or more samples can be sequenced in a single run. Parallelizing sequencing for multiple samples, i.e. multiplex sequencing, requires the incorporation of sample-specific index sequences, also known as barcodes, during the preparation of sequencing libraries. Sequencing indexes are distinct base sequences of about 5, about 10, about 15, about 20, about 25, or more bases that are added at the 3' end of the genomic and marker nucleic acid. The multiplexing system enables sequencing of hundreds of biological samples within a single sequencing run. The preparation of indexed sequencing libraries for sequencing of clonally amplified sequences can be performed by incorporating the index sequence into one of the PCR primers used for cluster amplification. Alternatively, the index sequence can be incorporated into the adaptor, which is ligated to the cfDNA prior to the PCR amplification. Indexed libraries for single molecule sequencing can be created by incorporating the index sequence at the 3' end of the marker and genomic molecule or 5' to the addition of a sequence needed for hybridization to the flow cell anchors, e.g. addition of the polyA tail for single molecule sequencing using the tSMS. Sequencing of the uniquely marked indexed nucleic acids provides index sequence information that identifies samples in the pooled sample libraries, and sequence information of marker molecules correlates sequencing information of the genomic nucleic acids to the sample source. In embodiments wherein the multiple samples are sequenced individually, i.e. singleplex sequencing, marker and genomic nucleic acid molecules of each sample need only be modified to contain the adaptor sequences as required by the sequencing platform and exclude the indexing sequences.
FIG. 6 provides a flowchart of an embodiment 600 of the method for verifying the integrity of samples that are subjected to a multistep multiplex sequencing bioassay, i.e. nucleic acids from individual samples are combined and sequenced as a complex mixture. In step 610, a plurality of biological source samples each comprising genomic nucleic acids is obtained. In step 620, unique marker nucleic acids are combined with each of the biological source samples to provide a plurality of uniquely marked samples. A sequencing library of sample genomic and marker nucleic acids is prepared in step 630 for each of the uniquely marked samples. Library preparation of samples that are destined to undergo multiplexed sequencing comprises the incorporation of distinct indexing tags into the sample and marker nucleic acids of each of the uniquely marked samples to provide samples whose source nucleic acid sequences can be correlated with the corresponding marker nucleic acid sequences and identified in complex solutions. In embodiments of the method comprising marker molecules that can be enzymatically modified, e.g. DNA, indexing molecules can be incorporated at the 3' end of the sample and marker molecules by ligating sequenceable adaptor sequences comprising the indexing sequences. In embodiments of the method comprising marker molecules that cannot be enzymatically modified, e.g. DNA analogs that do not have a phosphate backbone, indexing sequences are incorporated at the 3' end of the analog marker molecules during synthesis. Sequencing libraries of two or more samples are pooled and loaded on the flow cell of the sequencer where they are sequenced in a massively parallel fashion in step 640. In step 650, all sequencing information is analyzed, and based on the sequencing information pertaining to the marker molecules, the integrity of the source sample is verified in step 660. Verification of the integrity of each of the plurality of source samples is accomplished by first grouping sequence tags associated with identical index sequences to associate the genomic and marker sequences and distinguish sequences belonging to each of the libraries made from genomic molecules of a plurality of samples. Analysis of the grouped marker and genomic sequences is then performed to verify that the sequence obtained for the marker molecules corresponds to the known unique sequence added to the corresponding source sample. If the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids of the sample can be analyzed to provide genetic information about the subject from which the source sample was obtained. For example, if the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids is analyzed to determine the presence or absence of a chromosomal abnormality. The absence of a correspondence between the sequencing information and known sequence of the marker molecule is indicative of a sample mix-up, and the accompanying sequencing information pertaining to the genomic cfDNA molecules is disregarded.
Copy Number Variation Analysis Applications
Sequence information generated as described herein can be used for any number of applications. One application is in determining copy number variations (CNVs) in the cfDNA. CNVs that can be determined according to the present method include trisomies and monosomies of any one or more of chromosomes 1-22, X and Y, other chromosomal polysomies, and deletions and/or duplications of segments of any one or more of the chromosomes, which can be detected by sequencing only once the nucleic acids of a test sample. Any aneuploidy can be determined from sequencing information that is obtained by sequencing only once the nucleic acids of a test sample.
The methods and apparatus described herein may employ next generation sequencing technology (NGS) as described above. In certain embodiments, clonally amplified DNA templates or single DNA molecules are sequenced in a massively parallel fashion within a flow cell (e.g. as described in Volkerding et al. Clin Chem 55:641-658 [2009];
Metzker M Nature Rev 11:31-46 [2010]). In addition to high-throughput sequence information, NGS provides quantitative information, in that each sequence read is a countable "sequence tag" representing an individual clonal DNA template or a single DNA molecule.
In some embodiments, the methods and apparatus disclosed herein may employ some or all of the following operations: obtain a nucleic acid test sample from a patient (typically by a non-invasive procedure); process the test sample in preparation for sequencing; sequence nucleic acids from the test sample to produce numerous reads (e.g., at least 10,000); align the reads to portions of a reference sequence/genome and determine the amount of DNA (e.g., the number of reads) that map to defined portions of the reference sequence (e.g., to defined chromosomes or chromosome segments); calculate a dose of one or more of the defined portions by normalizing the amount of DNA mapping to the defined portions with an amount of DNA mapping to one or more normalizing chromosomes or chromosome segments selected for the defined portion; determine whether the dose indicates that the defined portion is "affected" (e.g., aneuploidy or mosaic); report the determination and optionally convert it to a diagnosis; use the diagnosis or determination to develop a plan of treatment, monitoring, or further testing for the patient.
In some embodiments, the biological sample is obtained from a subject and comprises a mixture of nucleic acids contributed by different genomes. The different genomes can be contributed to the sample by two individuals, e.g. the different genomes are contributed by the fetus and the mother carrying the fetus. Alternatively, the genomes are contributed to the sample by aneuploid cancerous cells and normal euploid cells from the same subject, e.g. a plasma sample from a cancer patient.
Apart from analyzing a patient's test sample, one or more normalizing chromosomes or one or more normalizing chromosome segments are selected for each possible chromosome of interest. The normalizing chromosomes or segments are identified asynchronously from the normal testing of patient samples, which may take place in a clinical setting. In other words, the normalizing chromosomes or segments are identified prior to testing patient samples. The associations between normalizing chromosomes or segments and chromosomes or segments of interest are stored for use during testing.
In some embodiments, a method is provided for determining the presence or absence of any one or more complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y and to identify a number of sequence tags for a normalizing segment sequence for each of any one or more chromosomes of interest; (c) using the number of sequence tags identified for each of any one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence to calculate a single chromosome dose for each of any one or more chromosomes of interest; and (d) comparing each of the single chromosome doses for each of any one or more chromosomes of interest to a threshold value for each of the one or more chromosomes of interest, and thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in the sample. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of a test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each of the chromosomes of interest as the ratio of the number of sequence tags identified for each of the chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence for each of the chromosomes of interest. In some other embodiments, step (c) comprises (i) calculating a sequence tag density ratio for each of the chromosomes of interest, by relating the number of sequence tags identified for each chromosome of interest in step (b) to the length of each of the chromosomes of interest; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for the normalizing segment sequence in step (b) to the length of each of the normalizing chromosomes; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each of said chromosomes of interest, wherein said chromosome dose is calculated as the ratio of the sequence tag density ratio for each of the chromosomes of interest and the sequence tag density ratio for the normalizing segment sequence for each of the chromosomes of interest.
Copy number variations in the human genome significantly influence human diversity and predisposition to disease (Redon et al., Nature 23:444-454 [2006], Shaikh et al. Genome Res 19:1682-1690 [2009]). CNVs have been known to contribute to genetic disease through different mechanisms, resulting in either imbalance of gene dosage or gene disruption in most cases. In addition to their direct correlation with genetic disorders, CNVs are known to mediate phenotypic changes that can be deleterious. Recently, several studies have reported an increased burden of rare or de novo CNVs in complex disorders such as cancers, autism, ADHD, and schizophrenia as compared to normal controls, highlighting the potential pathogenicity of rare or unique CNVs (Sebat et al., 316:445-449 [2007]; Walsh et al., Science 320:539-543 [2008]). CNVs arise from genomic rearrangements, primarily owing to deletion, duplication, insertion, and unbalanced translocation events.
Copy number variations determined by the methods and apparatus disclosed herein include gains or losses of entire chromosomes, alterations involving very large chromosomal segments that are microscopically visible, and an abundance of sub-microscopic copy number variation of DNA segments ranging from kilobases (kb) to megabases (Mb) in size. The method is applicable to determining CNV of any fetal aneuploidy, and CNVs known or suspected to be associated with a variety of medical conditions.
CNV for Prenatal Diagnoses
The present method is a polymorphism-independent method for use in NIPD that does not require that the fetal cfDNA be distinguished from the maternal cfDNA to enable the determination of a fetal aneuploidy. In some embodiments, the aneuploidy is a complete chromosomal trisomy or monosomy, or a partial trisomy or monosomy. Partial aneuploidies are caused by loss or gain of part of a chromosome, and encompass chromosomal imbalances resulting from unbalanced translocations, unbalanced inversions, deletions and insertions. By far, the most common known aneuploidy compatible with life is trisomy 21, i.e. Down Syndrome (DS), which is caused by the presence of an extra copy of part or all of chromosome 21. Rarely, DS can be caused by an inherited or sporadic defect whereby an extra copy of all
or part of chromosome 21 becomes attached to another chromosome (usually chromosome 14) to form a single aberrant chromosome. DS is associated with intellectual impairment, severe learning difficulties and excess mortality caused by long-term health problems such as heart disease. Other aneuploidies with well-known clinical significance include Edwards syndrome (trisomy 18) and Patau Syndrome (trisomy 13), which are frequently fatal within the first few months of life.
Abnormalities associated with the number of sex chromosomes are also known and include monosomy X, e.g. Turner syndrome (XO), and triple X syndrome (XXX) in female births and Klinefelter syndrome (XXY) and XYY syndrome in male births, which are all associated with various phenotypes including sterility and reduction in intellectual skills. Monosomy X [45,X] is a common cause of early pregnancy loss accounting for about 7% of spontaneous abortions. Based on the liveborn frequency of 45,X (also called Turner syndrome) of 1-2/10,000, it is estimated that less than 1% of 45,X conceptuses will survive to term. About 30% of Turner syndrome patients are mosaic with both a 45,X cell line and either a 46,XX cell line or one containing a rearranged X chromosome (Hook and Warburton 1983). The phenotype in a liveborn infant is relatively mild considering the high embryonic lethality and it has been hypothesized that possibly all liveborn females with Turner syndrome carry a cell line containing two sex chromosomes. Monosomy X can occur in females as 45,X or as 45,X/46XX, and in males as 45,X/46XY. Autosomal monosomies in human are generally suggested to be incompatible with life; however, there is quite a number of cytogenetic reports describing full monosomy of one chromosome 21 in live born children (Vosranova I et al., Molecular Cytogen. 1:13 [2008]; Joosten et al., Prenatal Diagn. 17:271-5 [1997]). The method described herein can be used to diagnose these and other chromosomal abnormalities prenatally.
According to some embodiments the methods disclosed herein can determine the presence or absence of chromosomal trisomies of any one of chromosomes 1-22, X and Y. Examples of chromosomal trisomies that can be detected according to the present method include without limitation trisomy 21 (T21; Down Syndrome), trisomy 18 (T18; Edwards Syndrome), trisomy 16 (T16), trisomy 20 (T20), trisomy 22 (T22; Cat Eye Syndrome), trisomy 15 (T15; Prader Willi Syndrome), trisomy 13 (T13; Patau Syndrome), trisomy 8 (T8; Warkany Syndrome), trisomy 9, and the XXY (Klinefelter Syndrome), XYY, or XXX trisomies. Complete trisomies of other autosomes existing in a non-mosaic state are lethal, but can be compatible with life when present in a mosaic state. It will be appreciated that various complete trisomies, whether existing in a mosaic or non-mosaic state, and partial trisomies can be determined in fetal cfDNA according to the teachings provided herein. Non-limiting examples of partial trisomies that can be determined by the present method include, but are not limited to, partial trisomy 1q32-44, trisomy 9p, trisomy 4 mosaicism, trisomy 17p, partial trisomy 4q26-qter, partial 2p trisomy, partial trisomy 1q, and/or partial trisomy 6p/monosomy 6q.
The methods disclosed herein can also be used to determine chromosomal monosomy X, chromosomal monosomy 21, and partial monosomies such as monosomy 13, monosomy 15, monosomy 16, monosomy 21, and monosomy 22, which are known to be involved in pregnancy miscarriage. Partial monosomy of chromosomes typically involved in complete aneuploidy can also be determined by the method described herein.
Non-limiting examples of deletion syndromes that can be determined according to the present method include syndromes caused by partial deletions of chromosomes. Examples of partial deletions that can be determined according to the methods described herein include without limitation partial deletions of chromosomes 1, 4, 5, 7, 11, 18, 15, 13, 17, 22 and 10, which are described in the following. Examples of deletion disorders include but are not limited to 1q21.1 deletion syndrome or 1q21.1 (recurrent) microdeletion, Wolf-Hirschhorn syndrome (WHS) (OMIM #194190), Williams-Beuren Syndrome also known as chromosome 7q11.23 deletion syndrome (OMIM 194050), Jacobsen Syndrome also known as 11q deletion disorder, partial monosomy of chromosome 18 also known as monosomy 18p, Angelman Syndrome and Prader-Willi Syndrome, partial monosomy 13q, Smith-Magenis syndrome (SMS-OMIM #182290), 22q11.2 deletion syndrome also known as DiGeorge syndrome, etc.
Several duplication syndromes caused by the duplication of part of chromosome arms have been identified (see OMIM [Online Mendelian Inheritance in Man viewed online at ncbi.nlm.nih.gov/omim]). In one embodiment, the present method can be used to determine the presence or absence of duplications and/or multiplications of segments of any one of chromosomes 1-22, X and Y. Non-limiting examples of duplication syndromes that can be determined according to the present method include duplications of part of chromosomes 8, 15, 12, and 17, which are described in the following.
Determination of CNV of Clinical Disorders
In addition to the early determination of birth defects, the methods described herein can be applied to the determination of any abnormality in the representation of genetic sequences within the genome. A number of abnormalities in the representation of genetic sequences within the genome have been associated with various pathologies. Such pathologies include, but are not limited to, cancer, infectious and autoimmune diseases, diseases of the nervous system, metabolic and/or cardiovascular diseases, and the like.
Accordingly, in various embodiments use of the methods described herein in the diagnosis, and/or monitoring, and/or treating of such pathologies is contemplated. For example, the methods can be applied to determining the presence or absence of a disease, to monitoring the progression of a disease and/or the efficacy of a treatment regimen, to determining the presence or absence of nucleic acids of a pathogen, e.g. virus; to determining chromosomal abnormalities associated with graft versus host disease (GVHD), and to determining the contribution of individuals in forensic analyses.
CNVs in Cancer
It has been shown that blood plasma and serum DNA from cancer patients contains measurable quantities of tumor DNA, that can be recovered and used as surrogate source of tumor DNA, and tumors are characterized by aneuploidy, or inappropriate numbers of gene sequences or even entire chromosomes. The determination of a difference in the amount of a given sequence, i.e. a sequence of interest, in a sample from an individual can thus be used in the prognosis or diagnosis of a medical condition. In some embodiments, the present method can be used to determine the presence or absence of a chromosomal aneuploidy in a patient suspected or known to be suffering from cancer.
In certain embodiments the aneuploidy is characteristic of the genome of the subject and results in a generally increased predisposition to a cancer. In certain embodiments the aneuploidy is characteristic of particular cells (e.g.,
tumor cells, proto-tumor neoplastic cells, etc.) that are or have an increased predisposition to neoplasia. Particular aneuploidies are associated with particular cancers or predispositions to particular cancers as described below.
Accordingly, various embodiments of the methods described herein provide a determination of copy number variation of sequence(s) of interest, e.g. clinically-relevant sequence(s), in a test sample from a subject where certain variations in copy number provide an indicator of the presence and/or a predisposition to a cancer. In certain embodiments the sample comprises a mixture of nucleic acids derived from two or more types of cells. In one embodiment, the mixture of nucleic acids is derived from normal and cancerous cells derived from a subject suffering from a medical condition, e.g. cancer.
The development of cancer is often accompanied by an alteration in number of whole chromosomes, i.e. complete chromosomal aneuploidy, and/or an alteration in the number of segments of chromosomes, i.e. partial aneuploidy, caused by a process known as chromosome instability (CIN) (Thoma et al., Swiss Med Weekly 2011:141:w13170). It is believed that many solid tumors, such as breast cancer, progress from initiation to metastasis through the accumulation of several genetic aberrations (Sato et al., Cancer Res., 50:7184-7189 [1990]; Jongsma et al., J Clin Pathol: Mol Path 55:305-309 [2002]). Such genetic aberrations, as they accumulate, may confer proliferative advantages, genetic instability and the attendant ability to evolve drug resistance rapidly, and enhanced angiogenesis, proteolysis and metastasis. The genetic aberrations may affect either recessive "tumor suppressor genes" or dominantly acting oncogenes. Deletions and recombination leading to loss of heterozygosity (LOH) are believed to play a major role in tumor progression by uncovering mutated tumor suppressor alleles.
cfDNA has been found in the circulation of patients
Examples of a sequence of interest include nucleic acid sequences, e.g. complete chromosomes and/or segments of chromosomes, that are amplified or deleted in cancerous cells. Cancers have been shown to correlate with full chromosome aneuploidy, arm level CNV, and/or focal CNV. Examples of cancers associated with CNV are discussed in further detail in U.S. Patent Application No. US20130029852 A1, which is incorporated by reference for its description of CNV's role in cancers.
CNVs in Infectious and Autoimmune Disease
To date a number of studies have reported association between CNV in genes involved in inflammation and the immune response and HIV, asthma, Crohn's disease and other autoimmune disorders (Fanciulli et al., Clin Genet 77:201-213 [2010]). For example, CNV in CCL3L1 has been implicated in HIV/AIDS susceptibility (CCL3L1, 17q11.2 deletion), rheumatoid arthritis (CCL3L1, 17q11.2 deletion), and Kawasaki disease (CCL3L1, 17q11.2 duplication); CNV in HBD-2 has been reported to predispose to colonic Crohn's disease (HBD-2, 8p23.1 deletion) and psoriasis (HBD-2, 8p23.1 deletion); CNV in FCGR3B was shown to predispose to glomerulonephritis in systemic lupus erythematosus (FCGR3B, 1q23 deletion, 1q23 duplication), anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (FCGR3B, 1q23 deletion), and increase the risk of developing rheumatoid arthritis. There are at least two inflammatory or autoimmune diseases that have been shown to be associated with CNV at different gene loci. For example, Crohn's disease is associated with low copy number at HBD-2, but also with a common deletion polymorphism upstream of the IRGM gene that encodes a member of the p47 immunity-related GTPase family. In addition to the association with FCGR3B copy number, SLE susceptibility has also been reported to be significantly increased among subjects with a lower number of copies of complement component C4.
diagnosed with malignancies including but not limited to Associations between genomic deletions at the GSTM1
lung cancer (Pathak et al. Clin Chem 52: 1833 - 1842 [2006 ]) , (GSTM1, 1q23deletion ) and GSTT1 (GSTT1, 22q11.2 dele
prostate cancer (Schwartzenbach et al. Clin Cancer Res tion ) loci and increased risk of atopic asthma have been
15 : 1032- 8 [2009]) , and breast cancer (Schwartzenbach et al. 40 reported in a number of independent studies . In some
available online at breast - cancer-research .com / content/ 11 / 5 / embodiments , the methods described herein can be used to
R71 [ 2009 ]) . Identification of genomic instabilities associ- determine the presence or absence of a CNV associated with
ated with cancers that can be determined in the circulating inflammation and /or autoimmune diseases . For example , the
cfDNA in cancer patients is a potential diagnostic and methods can be used to determine the presence of a CNV in
prognostic tool. In one embodiment, methods described 45 a patient suspected to be suffering from HIV , asthma, or
herein are used to determine CNV of one or more sequence Crohn ' s disease . Examples of CNV associated with such
(s) of interest in a sample , e.g., a sample comprising a diseases include without limitation deletions at 17q11.2 ,
mixture of nucleic acids derived from a subject that is 8p23 .1 , 1923 , and 22q11.2 , and duplications at 17q11 .2 , and
suspected or is known to have cancer, e . g., carcinoma, 1923 . In some embodiments, the presentmethod can be used
sarcoma, lymphoma, leukemia , germ cell tumors and blas - 50 to determine the presence of CNV in genes including butnot
toma. limited to CCL3L1, HBD - 2 , FCGR3B , GSTM , GSTT1 , C4 ,
In one embodiment, the sample is a plasma sample and IRGM .
derived (processed ) from peripheral blood that may com CNV Diseases of the Nervous System
prise a mixture of cfDNA derived from normal and cancer Associations between de novo and inherited CNV and
ous cells. In another embodiment, the biological sample that 55 several common neurological and psychiatric diseases have
is needed to determine whether a CNV is present is derived been reported in autism , schizophrenia and epilepsy, and
from a cells that, if a cancer is present, comprise a mixture some cases of neurodegenerative diseases such as Parkin
of cancerous and non - cancerous cells from other biological son 's disease , amyotrophic lateral sclerosis (ALS ) and auto
tissues including, but not limited to biological fluids or in somal dominant Alzheimer ' s disease (Fanciulli et al ., Clin
tissue biopsies , swabs , or smears. In other embodiments, the 60 Genet 77 : 201 - 213 [ 2010 ]) . Cytogenetic abnormalities have
biological sample is a stool ( fecal) sample . been observed in patients with autism and autism spectrum
The methods described herein are not limited to the disorders (ASDs) with duplications at 15q11 -q13 . Accord
analysis of cfDNA . It will be recognized that similar analy ing to the Autism Genome project Consortium , 154 CNV
ses can be performed on cellular DNA samples . including several recurrent CNVs, either on chromosome
In various embodiments the sequence (s) of interest com - 65 15q11 -q13 or at new genomic locations including chromo
prise nucleic acid sequence ( s ) known or is suspected to play some 2p16 , 1921 and at 17p12 in a region associated with
a role in the development and /or progression of the cancer. Smith -Magenis syndrome that overlaps with ASD . Recur
US 10 ,400 , 267 B2
49 50
rent microdeletions or microduplications on chromosome herein can be used to determine CNV of genes associated
16p11 . 2 have highlighted the observation that de novo with metabolic or cardiovascular disease e . g . hypercholes
CNVs are detected at loci for genes such as SHANK3 terolemia . Examples of CNV associated with such diseases
(22q13 .3 deletion ), neurexin 1 (NRXN1, 2p16 . 3 deletion include without limitation 19p13 .2 deletion /duplication of
and the neuroglins (NLGN4 , Xp22 . 33 deletion ) that are 5 the LDLR gene , and multiplications in the LPA gene.
known to regulate synaptic differentiation and regulate glu - Kits
taminergic neurotransmitter release. Schizophrenia has also In various embodiments, kits are provided for practice of
been associated with multiple de novo CNVs. Microdele - the methods described herein . In certain embodiments the
tions and microduplications associated with schizophrenia kits comprise one or more positive internal controls for a full
contain an overrepresentation of genes belonging to neu - 10 aneuploidy and/or for a partial aneuploidy. Typically,
rodevelopmental and glutaminergic pathways , suggesting although not necessarily , the controls comprise internal
that multiple CNVs affecting these genes may contribute positive controls comprising nucleic acid sequences of the
directly to the pathogenesis of schizophrenia e. g. ERBB4, type that are to be screened for. For example , a control for
2934 deletion , SLC1A3 , 5p13 . 3 deletion ; RAPEGF4 , a test to determine the presence or absence of a fetal trisomy
2q31. 1 deletion; CIT , 12 .24 deletion ; and multiple genes 15 e. g . trisomy 21, in a maternal sample can comprises DNA
with de novo CNV. CNVs have also been associated with characterized by trisomy 21 (e . g ., DNA obtained from an
other neurological disorders including epilepsy (CHRNA7, individual with trisomy 21 ). In some embodiments , the
15q13 . 3 deletion ), Parkinson ' s disease (SNCA 4q22 dupli- control comprises a mixture of DNA obtained from two or
cation ) and ALS (SMN1, 5q12 .2 .- q13 .3 deletion; and SMN2 more individuals with different aneuploidies. For example ,
deletion ). In some embodiments, the methods described 20 for a test to determine the presence or absence of trisomy 13 ,
herein can be used to determine the presence or absence of trisomy 18 , trisomy 21, and monosomy X , the control can
a CNV associated with diseases of the nervous system . For comprise a combination of DNA samples obtained from
example , the methods can be used to determine the presence pregnant women each carrying a fetus with one of the
of a CNV in a patient suspected to be suffering from autisim , trisomys being tested . In addition to complete chromosomal
schizophrenia , epilepsy , neurodegenerative diseases such as 25 aneuploidies, IPCs can be created to provide positive con
Parkinson 's disease , amyotrophic lateral sclerosis (ALS ) or trols for tests to determine the presence or absence of partial
autosomal dominant Alzheimer 's disease . The methods can aneuploidies .
be used to determine CNV of genes associated with diseases In certain embodiments the positive control(s ) comprise
of the nervous system including without limitation any of one or more nucleic acids comprising a trisomy 21 (T21 ),
the Autism Spectrum Disorders (ASD ), schizophrenia , and 30 and /or a trisomy 18 (T18 ), and/ or a trisomy 13 ( T13 ). In
epilepsy, and CNV of genes associated with neurodegenera certain embodiments the nucleic acid (s ) comprising each of
tive disorders such as Parkinson 's disease. Examples of the trisomys present are T21 are provided in separate con
CNV associated with such diseases include without limita - tainers . In certain embodiments the nucleic acids comprising
tion duplications at 15q11 -q13 , 2p16 , 1q21, 17p12 , 16p11. 2 , two or more trisomys are provided in a single container.
and 4922, and deletions at 22q13 . 3 , 2p16 .3 , Xp22 . 33 , 2q34 , 35 Thus , for example , in certain embodiments, a container may
5p13. 3 , 2q31.1 , 12 . 24 , 15q13 .3 , and 5q12 . 2 . In some contain T21 and T18 , T21 and T13, T18 and T13 . In certain
embodiments , the methods can be used to determine the embodiments , a container may contain T18 , T21 and T13 . In
presence of CNV in genes including but not limited to these various embodiments, the trisomys may be provided in
SHANKU , NLGN4, NRXN1, ERBB4 , SLC1A3, equal quantity /concentration . In other embodiments, the
RAPGEF4, CIT , CHRNA7, SNCA , SMN1, and SMN2. 40 trisomymay be provided in particular predetermined ratios.
CNV and Metabolic or Cardiovascular Diseases In various embodiments the controls can be provided as
The association between metabolic and cardiovascular “ stock ” solutions of known concentration .
traits, such as familial hypercholesterolemia (FH ), athero - In certain embodiments the control for detecting an aneu
sclerosis and coronary artery disease , and CNVs has been ploidy comprises a mixture of cellular genomic DNA
reported in a number of studies (Fanciulli et al., Clin Genet 45 obtained from a two subjects , one being the contributor of
77 : 201 - 213 [ 2010 ]) . For example , germline rearrangements, the aneuploid genome. For example, as explained above , an
mainly deletions, have been observed at the LDLR gene internal positive control (IPC ) that is created as a control for
(LDLR , 19p13 . 2 deletion /duplication ) in some FH patients a test to determine a fetal trisomy e. g . trisomy 21 , can
who carry no other LDLR mutations. Another example is the comprise a combination of genomic DNA from a male or
LPA gene that encodes apolipoprotein ( a ) ( apo ( a )) whose 50 female subject carrying the trisomic chromosome with
plasma concentration is associated with risk of coronary genomic DNA from a female subject known not to carry the
artery disease , myocardial infarction (MI) and stroke . trisomic chromosome. In certain embodiments the genomic
Plasma concentrations of the apo (a ) containing lipoprotein DNA is sheared to provide fragments of between about
Lp ( a ) vary over 1000 - fold between individuals and 90 % of 100 - 400 bp , between about 150 - 350 bp , or between about
this variability is genetically determined at the LPA locus, 55 200 -300 bp to simulate the circulating cDNA fragments in
with plasma concentration and Lp ( a ) isoform size being maternal samples.
proportional to a highly variable number of 'kringle 4 ' repeat In certain embodiments the proportion of fragmented
sequences ( range 5 -50 ). These data indicate that CNV in at DNA from the subject carrying the aneuploidy e.g . trisomy
least two genes can be associated with cardiovascular risk . 21 in the control, is chosen to simulate the proportion of
Themethods described herein can be used in large studies to 60 circulating fetal cfDNA found in maternal samples to pro
search specifically for CNV associationswith cardiovascular vide an IPC comprising a mixture of fragmented DNA
disorders. In some embodiments , the present method can be comprising about 5 % , about 10 % , about 15 % , about 20 % ,
used to determine the presence or absence of a CNV about 25 % , about 30 % , of DNA from the subject carrying
associated with metabolic or cardiovascular disease . For the aneuploidy. In certain embodiments the control comprise
example , the present method can be used to determine the 65 DNA from different subjects each carrying a different aneu
presence of a CNV in a patient suspected to be suffering ploidy . For example , the IPC can comprise about 80 % of the
from familial hypercholesterolemia . The methods described unaffected female DNA , and the remaining 20 % can be
US 10,400 ,267 B2
52
DNA from three different subjects each carrying a trisomic chromosome 21, a trisomic chromosome 13, and a trisomic chromosome 18.
In certain embodiments the control(s) comprise cfDNA obtained from a mother known to carry a fetus with a known chromosomal aneuploidy. For example, the controls can comprise cfDNA obtained from a pregnant woman carrying a fetus with trisomy 21 and/or trisomy 18, and/or trisomy 13. The cfDNA can be extracted from the maternal sample, and cloned into a bacterial vector and grown in bacteria to provide an ongoing source of the IPC. Alternatively, the cloned cfDNA can be amplified by, e.g., PCR.
While the controls present in the kits are described above with respect to trisomies, they need not be so limited. It will be appreciated that the positive controls present in the kit can be created to reflect other partial aneuploidies including, for example, various segment amplifications and/or deletions. Thus, for example, where various cancers are known to be associated with particular amplifications or deletions of substantially complete chromosomal arms, the positive control(s) can comprise a p arm or a q arm of any one or more of chromosomes 1-22, X and Y. In certain embodiments the control comprises an amplification of one or more arms selected from the group consisting of 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q, and/or 22q.
In certain embodiments, the controls comprise aneuploidies for any regions known to be associated with particular amplifications or deletions (e.g., breast cancer associated with an amplification at 20q13). Illustrative regions include, but are not limited to, 17q23 (associated with breast cancer), 19q12 (associated with ovarian cancer), 1q21-1q23 (associated with sarcomas and various solid tumors), 8p11-p12 (associated with breast cancer), the ErbB2 amplicon, and so forth. In certain embodiments the controls comprise an amplification or a deletion of a chromosomal region. In certain embodiments the controls comprise an amplification or a deletion of a chromosomal region comprising a gene. In certain embodiments the controls comprise nucleic acid sequences comprising an amplification of a nucleic acid comprising one or more oncogenes. In certain embodiments the controls comprise nucleic acid sequences comprising an amplification of a nucleic acid comprising one or more genes selected from the group consisting of MYC, ERBB2 (EFGR), CCND1 (Cyclin D1), FGFR1, FGFR2, HRAS, KRAS, MYB, MDM2, CCNE, KRAS, MET, ERBB1, CDK4, MYCB, ERBB2, AKT2, MDM2 and CDK4.
The foregoing controls are intended to be illustrative and not limiting. Using the teachings provided herein numerous other controls suitable for incorporation into a kit will be recognized by one of skill in the art.
In certain embodiments, the kits include one or more albumin and Ig depletion columns to deplete background proteins.
In some embodiments, the kits comprise sample holders that are configured to undergo heating, which deactivates many proteases and nucleases. In some embodiments, the sample holders are configured to be heated to at least about 65° C. for at least about 15 to 30 min.
In some embodiments, the kits include one or more fixatives for white blood cell nuclei. In some embodiments, the kits include one or more nuclease inhibitors. In other embodiments, the kits include a Cell Free DNA BCT™ tube available from Streck, Inc. of Omaha, Nebr. for blood collection, the BCT tube including at least one additive that deactivates nucleases.
In some embodiments, the kits include mild detergents and salts. In some embodiments, the detergents are nonionic detergents. In some embodiments, the detergents comprise TWEEN®-20. In some embodiments, the detergent is selected from one or more of TWEEN®-20, TRITON®-X100, BRIJ®-35, SDS, and NP40 prior to attempting a library preparation. The concentrations of the detergents tested varied depending on the ionic/non-ionic character of the detergent. E.g., TWEEN®-20, BRIJ®-35 and NP40 were added at 0.1% and 5%; SDS and TRITON®-X100 were added at 0.01% and 0.05%.
In various embodiments, in addition to the controls or instead of the controls, the kits comprise one or more nucleic acids and/or nucleic acid mimics that provide marker sequence(s) suitable for tracking and determining sample integrity. In certain embodiments the markers comprise an antigenomic sequence. In certain embodiments the marker sequences range in length from about 30 bp up to about 600 bp in length or about 100 bp to about 400 bp in length. In certain embodiments the marker sequence(s) are at least 30 bp (or nt) in length. In certain embodiments the marker is ligated to an adaptor and the length of the adaptor-ligated marker molecule is between about 200 bp (or nt) and about 600 bp (or nt), between about 250 bp (or nt) and 550 bp (or nt), between about 300 bp (or nt) and 500 bp (or nt), or between about 350 and 450. In certain embodiments, the length of the adaptor-ligated marker molecule is about 200 bp (or nt). In certain embodiments the length of a marker molecule can be about 150 bp (or nt), about 160 bp (or nt), 170 bp (or nt), about 180 bp (or nt), about 190 bp (or nt) or about 200 bp (or nt). In certain embodiments the length of the marker ranges up to about 600 bp (or nt).
In certain embodiments the kit provides at least two, or at least three, or at least four, or at least five, or at least six, or at least seven, or at least eight, or at least nine, or at least ten, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 18, or at least 19, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 50 different sequences.
In various embodiments, the markers comprise one or more DNAs or the markers comprise one or more DNA mimetics. Suitable mimetics include, but are not limited to, morpholino derivatives, peptide nucleic acids (PNA), and phosphorothioate DNA. In various embodiments the markers are incorporated into the controls. In certain embodiments the markers are incorporated into adaptor(s) and/or provided ligated to adaptors.
In certain embodiments the kit further includes one or more sequencing adaptors. Such adaptors include, but are not limited to, indexed sequencing adaptors. In certain embodiments the adaptors comprise a single-stranded arm that includes an index sequence and one or more PCR priming sites. For example, adaptor sequences of about 60 bp suitable for use with sequencers from Illumina may be employed.
In certain embodiments the kit further comprises a sample collection device for collection of a biological sample. In certain embodiments the sample collection device comprises a device for collecting blood and, optionally, a receptacle for containing blood. In certain embodiments the kit comprises a receptacle for containing blood and the receptacle comprises an anticoagulant and/or cell fixative, and/or one or more antigenomic marker sequence(s).
In certain embodiments the kit further comprises DNA extraction reagents (e.g., a separation matrix and/or an elution solution). The kits can also include reagents for sequencing library preparation. Such reagents include, but
are not limited to a solution for end-repairing DNA, and/or a solution for dA-tailing DNA, and/or a solution for adaptor ligating DNA.
In addition, the kits optionally include labeling and/or instructional materials providing directions (e.g., protocols) for the use of the reagents and/or devices provided in the kit. For example, the instructional materials can teach the use of the reagents to prepare samples and/or to determine copy number variation in a biological sample. In certain embodiments the instructional materials teach the use of the materials to detect a trisomy. In certain embodiments the instructional materials teach the use of the materials to detect a cancer or a predisposition to a cancer.
While the instructional materials in the various kits typically comprise written or printed materials they are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated herein. Such media include, but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD ROM), and the like. Such media may include addresses to internet sites that provide such instructional materials.
Optionally, the kit comprises a sequencer for sequencing the fetal and maternal nucleic acids. In embodiments wherein the kit comprises the sequencer, the kit further comprises a consumable portion of a sequencer, wherein the consumable portion is configured to sequence fetal and maternal nucleic acids from one or more maternal test samples. The consumable portion of the sequencer is related to the sequencing platform being used, and in some instances the consumable portion is a flow cell, while in other instances, the consumable portion of the sequencer is a chip configured to detect ions. In certain embodiments, the kit comprises the consumable portion of the sequencer when the sequencer itself is not included in the kit.
In some embodiments, another component of the kit is a computer program product as described elsewhere herein. For example, the kit can comprise a computer program product for classifying a copy number variation in a fetal genome, wherein the computer program product comprises (a) code for analyzing the tag information for the first bin of interest to determine whether (i) the first bin of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic. The analysis of the tag information for the first bin of interest comprises: (i) code for dividing the sequence for the first bin of interest into a plurality of sub-bins; (ii) code for determining whether any of said sub-bins contains significantly more or significantly less nucleic acid than one or more other sub-bins as determined by a defined threshold difference; and (iii) code for determining that the first bin of interest harbors a partial aneuploidy when any of said sub-bins contain significantly more or significantly less nucleic acid than one or more other sub-bins. In some embodiments, the computer program product comprises additional code for determining that a sub-bin of the first bin of interest containing significantly more or significantly less nucleic acid than one or more other portions harbors the partial aneuploidy.
In some embodiments, the kit comprises a computer program product for classifying a copy number variation in a sub-chromosomal region of a chromosome of interest in a fetal genome, wherein the computer program product comprises a non-transitory computer readable medium on which are provided program instructions for classifying a copy number variation in a sub-chromosomal region of a chromosome of interest in a fetal genome, the instructions comprising: (a) code for receiving sequence reads from fetal and maternal nucleic acids of a maternal test sample, wherein the sequence reads are provided in an electronic format; (b) code for aligning, using a computing apparatus, the sequence reads to a reference chromosome sequence for the chromosome of interest in the fetal genome and thereby providing sequence tags corresponding to the sequence reads; (c) code for computationally identifying a number of the sequence tags that are from the chromosome of interest by using the computing apparatus and determining that the chromosome of interest in the fetus harbors a copy number variation; (d) code for calculating a first fetal fraction value using the number of the sequence tags that are from the chromosome of interest and using the fetal fraction value to determine that the chromosome of interest may contain a partial aneuploidy; (e) code for computationally identifying a number of the sequence tags that are from each of two or more bins within the reference chromosome sequence by using the computing apparatus; and (f) code for determining that a first bin of the two or more bins has a number of sequence tags that is greater or lesser than an expected number of tags, and thereby concluding that the sub-chromosomal region corresponding to the first bin harbors at least a portion of the partial aneuploidy, and wherein the difference between the number of sequence tags for the first bin and the expected number of tags is greater than a defined threshold.
Alternatively, the kit comprises computer program products for classifying a copy number variation in a cancer genome and/or classifying a copy number variation in a sub-chromosomal region of a chromosome of interest in a cancer genome.
The kit may also comprise a sequencer for sequencing the fetal and maternal nucleic acids in maternal samples and/or the cancer and somatic nucleic acids in a cancer sample. The sequencer can be a high throughput sequencer that can process tens or hundreds of samples at the same time, e.g., the Illumina HiSeq™ systems, or the sequencer can be a personal sequencer, e.g., the Illumina MiSeq™ sequencer. In some embodiments, the kit includes a consumable portion of a sequencer such as a chip configured to immobilize nucleic acid, detect changes in pH, conduct fluid manipulations, etc.
The various methods, apparatus, systems and uses are described in further detail in the following Examples which are not in any way intended to limit the scope of the invention as claimed. The attached figures are meant to be considered as integral parts of the specification and description of the invention. The following examples are offered to illustrate, but not to limit the claimed invention.
EXAMPLES
The example discussed in method 2 below employs a freeze thaw (FT) technique and dispenses with the plasma isolation step of the conventional cfDNA isolation protocol. The example discussed in method 1 demonstrates a procedure for making a library directly from cfDNA that is in plasma or in a FT blood supernatant, without first isolating cfDNA from the plasma or supernatant.
Method 1: Generating Library Directly from Blood or Plasma without Purifying cfDNA
Introduction
As explained, in order to sequence a population of DNA fragments using the current massively parallel sequencing systems, adaptor sequences must be attached to either end of the fragments. The collection of DNA fragments with adapters is a sequencing library. The poor yield of conventional cfDNA isolation processes provided the inventors with some motivation for making a cfDNA sequencing library from biological fluids without first purifying the DNA from such fluids.
As explained, the DNA wound around nucleosomes normally wraps and unwraps around the nucleosomal proteins. This “breathing” of cfDNA can be utilized to generate a DNA library by attaching adaptors while the cfDNA remains associated with the nucleosomal proteins.
Minimum Amount of Biological Fluid Required
In a process by which a sequencing library is generated directly from a biological fluid without an intervening DNA isolation step, there is a minimum amount of the fluid required to successfully generate the library and still generate useable downstream data.
In the experiment described in this method, cfDNA was isolated from decreasing volumes of plasma (200 ul, 100 ul, 50 ul and 25 ul) using two different methods: the Qiagen MINELUTE® column method (referred to as ME method in figures) and the phenol-chloroform followed by EtOH precipitation method (referred to as PC method). The DNA was eluted in 35 ul of Elution buffer (0.1 M Tris, pH 8) and 30 ul of the DNA was used to generate sequencing libraries using the NEB library kit Number E6000B (New England BioLabs, Inc.). An end-repair step of library generation was not included in these preparations. End repair is typically used to produce blunt ends and phosphorylate the ends. Such end repair operations are believed to be unnecessary when working with most cfDNA.
The table below shows the library yield in nM as a function of plasma volume input for the two cfDNA isolation techniques (ME and PC). FIG. 7 is an electropherogram showing identical library profiles on an Agilent BIOANALYZER® for sequencing libraries made starting with 50 ul plasma with the Qiagen MINELUTE® (trace with higher magnitude tail and with peak shifted down and toward right)

TABLE 1
Library yield in nM as a function of plasma volume input

               Library yield in nM
Plasma (ul)    MINELUTE®    Phe/CHCl3
200            38.4         24.4
100            27.3         19.2
50             23.1         26.5
25             18.2         16.2

The sequencing libraries generated starting with 50 ul and 25 ul (microliters) plasma by both methods were sequenced on an Illumina GAII sequencer and various sequencing metrics were compared. The table below lists certain metrics.

TABLE 2
Metrics of sequencing libraries generated by ME and PC methods

Input              Reads       Tags        Tags:Reads    NonExcld Sites    NonExcld Sites/Tags
50 ul plasma-ME    31328834    13949959    0.4453        9547222           0.6844
25 ul plasma-ME    30367943    10686615    0.3519        6188932           0.5791
50 ul plasma-PC    30807636    11567337    0.3755        5886940           0.5089
25 ul plasma-PC    25533994    10786944    0.4225        3381205           0.3135

The reads are the short sequences output by the sequencer. The tags are reads that have been mapped to a non-excluded portion of the human genome. Non-excluded sites are sites on the genome that are not duplicated within the genome. As seen in the table above, cfDNA made from as little as 25 ul of plasma gave >5×10^6 non-excluded sites on the GAII (see 25 ul plasma-ME condition). This shows that there is adequate cfDNA in as little as 25 ul of plasma to generate the minimum necessary unique, non-redundant sequencing tags for downstream analysis. Using the higher cfDNA recovery processes described herein, the 25 ul should be a sufficient sample size. FIG. 8 shows that the % chromosome tags is invariant with lowering amounts of plasma input, where the different symbols for different methods (ME and PC) and plasma amounts (25 and 50 ul) tend to overlap for each chromosome.
Generating Library Directly from Nucleosome-Attached cfDNA Using Adapter Ligation Method
The data presented above show that there is adequate DNA in 25 ul or more of plasma to generate a workable sequencing library. The following description shows that a functioning library can be made directly from plasma.
As mentioned, untreated plasma contains a large amount of ambient protein, predominantly 35-50 mg/ml albumin and 10-15 mg/ml immunoglobulins. These proteins create steric hindrance for the library-making enzymes to act on nucleosomal cfDNA. Plasma also contains salts, proteases and nucleases that can interfere with the library biochemistry. Therefore, in working with plasma one may simplify its composition as follows: (1) deplete or reduce background albumins and Igs, (2) inhibit proteases and nucleases, and/or (3) make the cfDNA more accessible.
and the Phenol-Chloroform (other trace ) DNA isolation In certain embodiments , background protein can be
methods. The peak is associated with cfDNA having two 65 depleted using a combination of albumin and Ig depletion
adaptors appended thereto - each adaptor being about 60 bp columns. Many proteases and nucleases can be deactivated
in length . by heating the plasma to 65 deg for about 15 -30 min OR
using a blood collection tube such as a Streck tube (described above) to collect blood because Streck additive deactivates nucleases. Finally, the "ends" of cfDNA can be made more accessible to library preparation enzymes using mild detergents and salts (or a combination thereof). These will cause the cfDNA to unwrap from the histone complex, allowing access to the ends of the cfDNA for ligation of the sequencing adapters.

The data below describes implementation of such techniques to make a library directly from plasma. As seen below, the yields of the library are acceptable and encouraging.

1) Plasma Protein Depletion:
50 ul plasma was heated to 65 deg C. for 20 min. The resulting cloudy plasma was centrifuged at 15,000 g for 5 minutes and the supernatant was taken into an end-repair-free NEB library preparation (identified above) with indexed Illumina adapter. FIG. 9A shows a BIOANALYZER® profile of the library generated with a peak at the expected 300 bp size from the sample processed by protein depletion. The concentration of DNA in this library was relatively small at 1 nM but the results demonstrate that cfDNA around nucleosomes can be adapter ligated. Moreover, the peak at ~120 bp, which represents the adapter dimer, confirmed that ligase is active in plasma.

2) Detergent Treatment of Plasma:
50 ul plasma was treated with one of various detergents (TWEEN®-20, TRITON®-X100, BRIJ®-35, SDS, NP40 and combinations thereof) prior to attempting a library preparation. The concentrations of the detergents tested varied depending on the ionic/non-ionic character of the detergent. E.g., TWEEN®-20, BRIJ®-35 and NP40 were added at 0.1% and 0.5%; SDS and TRITON®-X100 were added at 0.01% and 0.05% (all percentages in wt/wt). The plasma used in these experiments was not depleted of excess protein. Untreated plasma and most detergents did not provide apparent library generation. FIG. 9B shows comparative BIOANALYZER® profiles. In the profiles, there is no discernible library peak at 300 bp in plasma treated with BRIJ®-35 (green), NP40 (blue) and TRITON®-X100 (red). However, in all three conditions, there is a peak at 120 bp, showing that the ligase works (albeit inefficiently) in the plasma to generate the adapter dimer.

In contrast, as shown in FIG. 9C, plasma in the presence of 0.05% TWEEN®-20 generated a non-trivial library peak (concentration ~2.3 nM) at the expected 300 bp size.

This library was sequenced on the Illumina GAII, along with a control library where DNA was isolated from 50 ul of plasma using the Qiagen MINELUTE® column. Sequencing metrics and % Chr representation were compared.

The table below compares certain sequencing metrics. As is apparent from the data, the metrics of non-excluded sites and the ratio of such sites to tags (NES/Tags) are not great in the plasma library sample. This shows that the number of unique, non-redundant sequencing tags generated by the plasma library was not suitable in this experiment. This is to be expected because the concentration of the input library was only 2.3 nM.

TABLE 3
Library metrics for positive control and plasma library

Condition               Reads     Tags      Tags/Reads  NonExcld Sites  NonExcld Sites/Tags
Positive control        49701951  35281787  0.710       31056544        0.880
Plasma lib (with Tw20)  55174583  31690216  0.574       455059          0.014

FIG. 10 overlays the % Chr distribution from a control library made from purified DNA on the % Chr distribution from the library generated directly from plasma. The differences seen in the plasma library, especially in the number of tags on the smaller chromosomes, may be a result of an insufficient number of total tags from the plasma library as input. This data shows that it is feasible to make a sequencing library directly from plasma.

Method 2 – Freezing and Thawing Whole Blood Samples

The example below describes a method for isolating cfDNA directly from blood without first isolating plasma. The example also details downstream experiments that demonstrate that cfDNA isolated from blood behaves similarly to cfDNA isolated from plasma.

Materials and Methods

Freeze-Thaw Blood SN Isolation:
Blood from 31 pregnant donors was collected in Streck BCTs, 4 tubes per donor. Upon arrival, three blood tubes were processed to plasma using conventional protocols. See Sehnert et al., Optimal Detection of Fetal Chromosomal Abnormalities by Massively Parallel DNA Sequencing of Cell-Free Fetal DNA from Maternal Blood, Clinical Chemistry 57:7 (2011); and Bianchi et al., Genome-Wide Fetal Aneuploidy Detection by Maternal Plasma DNA Sequencing, Obstetrics and Gynecology, vol. 119, no. 5 (2012). The fourth tube of blood was placed inside a 50 ml conical tube and left lying on its side at -20° C., typically for approximately 16 hrs. Blood tubes lying on their sides did not break upon freezing and the 50 ml conical tube was used as a precautionary secondary container in case the blood tube broke.

The following day, the frozen blood was thawed by leaving the blood tube in a room temperature water bath. 2.5 ml of each of the freeze-thawed blood was transferred to two Argos polypropylene tubes and centrifuged once at 16,000xg for 10 minutes. Two x 1 ml of freeze-thawed blood supernatant were transferred from each Argos tube into Sarstedt cryotubes, resulting in four 1 ml tubes of freeze-thawed blood per donor.

cfDNA Isolation, Library Preparation and Sequencing
DNA isolation, library preparation, dilution and multiplexed sequencing were done following the conventional procedure mentioned above and described in Sehnert et al. and Bianchi et al., supra. 24 plasma and paired 24 freeze-thaw blood libraries were sequenced on a single flowcell (FC ID =COUBVACXX).

Results

1) Comparison of cfDNA Yield:
DNA yield from freeze-thaw blood (FT) was substantially greater than the yield from plasma. However, encouragingly, only 6 of the 31 samples showed contamination from maternal cellular DNA.

FIGS. 11A and 11B show the range of cfDNA concentrations measured for the 31 samples from FT Blood and plasma. The figures visualize the comparison between DNA yield from plasma and yield from FT Blood. FIG. 11A shows
all 31 samples, and FIG. 11B shows the same data without the 6 samples that had high DNA concentration to better visualize the pattern of data.

FIG. 12 shows the correlation between the two starting materials for DNA isolation, with the six outliers excluded (leaving 25 samples). As expected, there is no correlation between the two sources. This is not surprising because previous data has shown that there is little correlation between DNA yields in the manual Qiagen Blood Mini kit process, even from the same target source.

In the approximately 20% of samples that show cellular DNA contamination, the contaminating DNA is typically very high molecular weight DNA. Therefore, sample DNA can be treated to exclude high molecular weight DNA. There are various commercially available products such as SPRIselect Reagent Kit (Beckman Coulter), which can be fine-tuned to selectively retain DNA between predetermined sizes in any DNA preparation. Therefore, the problem of some samples of FT Blood DNA being contaminated with high MW DNA can be solved in a straightforward manner.

2) Library Yield and Quality:
Indexed TruSeq (Illumina) libraries were generated from all 31 paired DNAs. However, when using cfDNA that had high cellular DNA contamination, the library profile looked different from the expected profile. High molecular weight cellular DNA shows up near and around the high marker (10,380 bp) in measurements made with High Sensitivity DNA chip (Agilent Technologies, Inc.). This is due to the interference of the high molecular weight DNA in the library process biochemistry.

FIGS. 13A to 13C show DNA library profiles, demonstrating the effect of HMW DNA contamination on library profile. FIGS. 13A and 13B compare three representative BIOANALYZER® profiles that detail the effect of the DNA quality on the library quality. Red traces represent DNA and libraries from FT blood and blue traces represent DNA and libraries from plasma. FIG. 13C shows one high DNA sample and the corresponding effect of the DNA concentration on the library yield and profile. DNA profiles on the BIOANALYZER® are from High Sensitivity chips; library profiles are from the DNA 1000 chips (Agilent Technologies, Inc.).

FIG. 14 shows comparative library yield range and correlation for 22 paired plasma and FT Blood cfDNAs. The yield of the libraries was in an acceptable range of 20-75 nM. From the 31 paired samples, the six outliers with very high cellular DNA contamination in the FT Blood condition were not sent for sequencing; finally 22 of 25 were queued for sequencing.

The lack of correlation between the library yields for DNA from the two processes is not surprising. Each library process does not start with the same amount of input DNA.

Comparison of Sequencing Data Between FT Blood and Plasma Libraries:

Chromosome Plots:
The chromosome plots for FT Blood and plasma are slightly different as shown in FIG. 12. FT Blood libraries have slightly lower GC bias compared to plasma libraries as shown in FIG. 13 (chromosome 4 is the most AT rich chromosome, and chromosomes 19 and 22 are the most GC rich chromosomes). When % Chr hits are plotted versus Chr size, FT Blood has an R^2 of 0.977 vs. an R^2 of 0.973 for plasma.

FIG. 15 shows % Chr for FT Blood vs. plasma libraries as a function of chromosome. FIG. 16 shows a % Chr plot as a function of Chr size (Mb) for the FT Blood and plasma conditions.

Chromosome Ratios:
FIG. 17 shows the ratios reported for chromosomes 13, 18 and 21. Condition 1 = FT Blood; condition 2 = plasma. The ratios reported differ between the two conditions. The difference in the ratio values is due to the fact that the ratios for the FT Blood condition have not been calculated using the ideal chromosome densities (NCDs). However, the spread of the data is comparable.

Fetal Fraction Representation:
Finally, the sequencing data showed that FT Blood did not compromise the calculation of fetal fraction in the DNA. FIG. 18 shows correlation plots between FT Blood and Plasma for Ratio_X and Ratio_Y. It shows that for the 9 pairs of putative male fetus samples among the 22 pairs sequenced, correlations for ChrX and for ChrY between the two conditions report high R^2 values of 0.9496 (ChrX) and 0.9296 (ChrY) respectively.

Freezing and then thawing blood is a viable technique for generating cfDNA libraries. Among the advantages it may offer are (1) decreased handling of the blood, (2) larger numbers of aliquots of the FT Blood will be available for downstream work, and (3) the concentrations of cfDNA isolated from FT Blood are typically higher. A potential disadvantage of using FT Blood is that in about 20% of the samples, there appears to be cellular DNA contamination. This can interfere with library biochemistry. However, the contaminating cellular DNA typically is very high molecular weight DNA. This can be removed by size selection, e.g. with a product such as SPRI Select. See Hawkins et al., supra. With the use of such products, the process can select for DNA within a prescribed size range.

Noninvasive Detection of Fetal Sub-Chromosome Abnormalities Using Deep Sequencing of Maternal Plasma

The following example illustrates the kind of aneuploidy determinations that can be made from cfDNA. Although this work was not done using cfDNA unisolated from plasma, the process may be applied to cfDNA unisolated from plasma.

Artificial Mixtures
To determine the depth of sequencing needed to detect fetal sub-chromosome abnormalities, i.e. partial aneuploidies, and to assess the effect of the relative fetal fraction of cfDNA present in a sample, artificial mixtures of 5% and 10% sheared genomic DNA were prepared using paired mother and child DNAs obtained from the Coriell Institute for Medical Research (Camden, N.J.). All children were males with karyotypes previously determined by metaphase cytogenetic analysis. The karyotypes of the four paired samples are shown in Table 4. The children's chromosome abnormalities were selected to represent different clinical scenarios, such as: a) whole chromosome aneuploidy (family 2139), b) sub-chromosomal deletion (family 1313), c) mosaic sub-chromosomal copy number change (family 2877, with an additional inherited deletion), and d) sub-chromosomal duplication (family 1925).
TABLE 4
Coriell samples used to generate artificial mixtures

Family ID  Coriell ID  Member               Karyotype
2139       NG09387     Mother               46,XX
           NG09394     Affected son         47,XY,+21
1313       NA10924     Mother               46,XX
           NA10925     Affected son         46,XY,del(7)(pter->p14::p12->qter)
2877       NA22629     Mother               46,XX,del(11)
           NA22628     Affected son         47,XY,del(11)(pter->p12::p11.2->qter),+15[12]
                                            46,XY,del(11)(pter->p12::p11.2->qter)[40]
1925       NA16268     Mother               46,XX
           NA16363     Unaffected twin son  46,XY
           NA16362     Affected twin son    47,XY,+der(22)
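In the mosaic karyotypes of Table 4 (and of Table 5), the bracketed numbers follow the usual cytogenetic convention of giving the number of cells counted for each cell line. The short Python sketch below turns those counts into cell-line fractions. It is an illustration only: the function name is hypothetical, and joining the two cell lines of sample NA22628 with "/" is an assumption made for the example, since the table prints them on separate lines.

```python
import re


def cell_line_fractions(karyotype: str) -> list[tuple[str, float]]:
    """Split a mosaic karyotype into (cell line, fraction of counted cells).

    Cell lines are assumed to be separated by "/" and to end with a
    bracketed cell count such as "[12]".
    """
    lines = []
    for part in karyotype.split("/"):
        part = part.strip()
        match = re.search(r"\[\s*(\d+)\s*\]$", part)
        if match is None:
            raise ValueError(f"no cell count found in {part!r}")
        lines.append((part[: match.start()].strip(), int(match.group(1))))
    total = sum(count for _, count in lines)
    return [(description, count / total) for description, count in lines]


if __name__ == "__main__":
    # Affected son of family 2877, as listed in Table 4.
    son_2877 = (
        "47,XY,del(11)(pter->p12::p11.2->qter),+15[12]/"
        "46,XY,del(11)(pter->p12::p11.2->qter)[40]"
    )
    for description, fraction in cell_line_fractions(son_2877):
        print(f"{fraction:.1%}  {description}")
```

For this sample, 12 of the 52 counted cells carry the additional chromosome 15 material, so the cell line with the abnormality is the minority.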

The genomic DNA samples were sheared to a size of ~200 bp using the Covaris S2 sonicator (Covaris, Woburn, Mass.) following the manufacturer's recommended protocols. DNA fragments smaller than 100 bp were removed using AmPure XP beads (Beckman Coulter Genomics, Danvers, Mass.). Sequencing libraries were generated with TruSeq v1 Sample Preparation kits (Illumina, San Diego, Calif.) from sheared DNA mixtures consisting of maternal DNA only and maternal + child DNA mixtures at 5% and 10% w/w. Samples were sequenced with single-ended 36 base pair (bp) reads on the Illumina HiSeq2000 instrument using TruSeq v3 chemistry. Each sample was sequenced on four lanes of a flow cell, resulting in 400x10^6 to 750x10^6 sequence tags per sample.

Maternal Plasma Samples
The MatErnal BLood IS Source to Accurately Diagnose Fetal Aneuploidy (MELISSA) trial was a registered clinical trial (NCT01122524) that recruited subjects and samples from 60 different centers in the United States and the corresponding metaphase karyotype results from an invasive prenatal diagnostic procedure. The study was designed to prospectively determine the accuracy of MPS (massively parallel sequencing) to detect whole chromosome fetal aneuploidy. During this trial, all samples with any abnormal karyotype were included to emulate the real clinical scenarios in which the fetal karyotype is not known at the time of sample acquisition. The results of this study have been previously published. Following completion of the MELISSA trial, the study database was assessed to identify ten samples that had complex karyotypes, including sub-chromosome abnormalities, material of unknown origin, or a marker chromosome (Table 5); also added was one MELISSA study sample with trisomy 20 as a control of performance in detection of whole chromosome aneuploidy. The karyotypes were performed for clinical indications and reflected local protocols. For example, some samples were analyzed with chromosome microarrays and some had metaphase analysis with or without FISH studies.

In the MELISSA study, libraries were sequenced using single-end reads of 36 bp with 6 samples in a lane on an Illumina HiSeq2000 using TruSeq v2.5 chemistry. In the present example, the previously generated MELISSA libraries were re-sequenced using TruSeq v3 chemistry on an Illumina HiSeq 2000 with single-end reads of 25 bp. In this example, each of the 11 maternal samples was sequenced utilizing an entire flow cell, resulting in 600x10^6 to 1.3x10^9 sequence tags per sample. All sequencing was performed in the Verinata Health research laboratory (Redwood City, Calif.) by research laboratory personnel who were blinded to the fetal karyotype.
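To put the quoted sequencing depths in perspective, the sketch below estimates how many tags would land, on average, in a 1 Mb or a 100 kb bin (the bin sizes used in the Normalization and Analysis section of this example). It is a back-of-the-envelope estimate under stated assumptions, not a figure taken from the study: a genome size of about 3.1x10^9 bp, tags spread uniformly over the genome, and pure Poisson counting noise, which gives only a lower bound on the bin-to-bin variation seen in practice.

```python
import math

# Assumption: approximate size of the human reference genome in base pairs.
GENOME_SIZE_BP = 3.1e9


def mean_tags_per_bin(total_tags: float, bin_size_bp: float,
                      genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Average tags per bin if tags were spread uniformly over the genome."""
    return total_tags * bin_size_bp / genome_size_bp


def poisson_cv(mean_count: float) -> float:
    """Relative counting noise (coefficient of variation) of a Poisson count."""
    return 1.0 / math.sqrt(mean_count)


if __name__ == "__main__":
    # Tag totals quoted for the artificial mixtures and the plasma samples.
    for total in (400e6, 750e6, 600e6, 1.3e9):
        for bin_size in (1_000_000, 100_000):
            n = mean_tags_per_bin(total, bin_size)
            print(f"{total:.1e} tags, {bin_size:>9,d} bp bin: "
                  f"{n:10.0f} tags/bin, Poisson CV {poisson_cv(n):.3%}")
```

Under these assumptions, a tenfold smaller bin receives a tenfold smaller count and its counting noise grows by a factor of about 3.2, which is one reason finer resolution calls for deeper sequencing.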
TABLE 5
Karyotypes of clinical samples analyzed by MPS. Samples in the last four rows are mosaic karyotypes

Patient ID  Specimen         Procedure                                    Karyotype
C60715      Chorionic villi  Metaphase and 20q12 FISH                     47,XX,+20
C65104      Cultured villi   Metaphase, 6q12, 6q16.3 FISH and microarray  arr 6q12q16.3(64,075,795-101,594,105)x3, 6q16.3(102,176,578-102,827,691)x3
C61154      Chorionic villi  Metaphase                                    46,XY,del(7)(q36.1)
C61731      Amniocytes       Metaphase and 22q FISH                       46,XX,del(8)(p23.1p23.2)
C62228      Chorionic villi  Metaphase and Chr 15 FISH                    45,XX,-15,der(21)t(15;21)(q15;p11.2)
C60193      Amniocytes       Metaphase                                    46,XY,add(10)(q26)
C61233      Amniocytes       Metaphase                                    46,XX,add(X)(p22.1)
C61183      Amniocytes       Metaphase and FISH                           46,XY or 46,XY,add(15)(p11.2)
C65664      Amniocytes       Metaphase                                    mos 46,XY,+i(20)(q10)[8]/46,XY[17]
C66515      Chorionic villi  Metaphase and FISH                           47,XY,+der(14 or 22)[10]/46,XY[10]
C60552      Chorionic villi  Metaphase                                    47,XX,+mar[12]/46,XX[8]
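The bin normalization, z-score and fetal fraction calculations given as equations (1) to (5) in the Normalization and Analysis section of this example can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation used in the study. The function names are hypothetical, and three details are assumptions: a bin is left out of its own nearest-GC neighbor set, the MAD is adjusted with the common normal-consistency factor 1.4826, and the 100 kb confirmation counts bins with |z| above the threshold without checking that they deviate in the same direction.

```python
from statistics import median

# Assumption: 1.4826 is the usual factor that makes a MAD comparable to a
# standard deviation for normally distributed data; the source only says the
# MADs were "adjusted assuming a normal distribution".
MAD_TO_SIGMA = 1.4826


def bin_ratio_values(tags, gc, n_neighbors=10):
    """Equation 1: tags in a bin over the summed tags of its nearest-GC bins.

    Assumption: the bin itself is left out of its own neighbor set.
    """
    brv = []
    for j in range(len(tags)):
        others = [k for k in range(len(tags)) if k != j]
        others.sort(key=lambda k: abs(gc[k] - gc[j]))
        brv.append(tags[j] / sum(tags[k] for k in others[:n_neighbors]))
    return brv


def reference_stats(brv_by_sample):
    """Per-bin median and adjusted MAD (aMAD) across reference samples."""
    medians, amads = [], []
    for column in zip(*brv_by_sample):
        m = median(column)
        medians.append(m)
        amads.append(MAD_TO_SIGMA * median([abs(v - m) for v in column]))
    return medians, amads


def z_score(brv, brv_median, amad):
    """Equation 2: deviation of a bin from its median in units of aMAD."""
    return (brv - brv_median) / amad


def fetal_fraction(z, brv_median, amad):
    """Equations 4 and 5: ff = abs(2 * z * CV), with CV = aMAD / median."""
    return abs(2.0 * z * (amad / brv_median))


def confirmed_at_100kb(z_values, threshold=4.0, window=4, min_hits=2):
    """True if any run of `window` contiguous bins has at least `min_hits`
    bins with |z| above `threshold` (sign agreement is not checked here)."""
    hits = [abs(z) > threshold for z in z_values]
    return any(sum(hits[i:i + window]) >= min_hits
               for i in range(max(1, len(hits) - window + 1)))
```

As a quick check of the relationship between the equations, a bin with z = 5 and a coefficient of variation of 1% corresponds to a fetal fraction of 10%, which is the same as a 5% increase of the bin ratio value over its median.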

Normalization and Analysis

Sequence reads were aligned to the human genome assembly hg19 obtained from the UCSC database (hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/). Alignments were carried out utilizing the Bowtie short read aligner (version 0.12.5), allowing for up to two base mismatches during alignment. Only reads that unambiguously mapped to a single genomic location were included. Genomic sites at which reads mapped were counted as tags. Regions on the Y chromosome at which sequence tags from male and female samples mapped without any discrimination were excluded from the analysis (specifically, from base 0 to base 2x10^6; base 10x10^6 to base 13x10^6; and base 23x10^6 to the end of chromosome Y).

The genome was then further divided into 1 Mb and 100 kb bins and, for each sample, tags from both the positive and negative strand were assigned to individual bins for further analysis. The GC percentage of each bin was determined and bins were ranked by GC percentage across the entire genome. Each bin was individually normalized by calculating the ratio of tags within a bin to the sum of the number of tags in the 10 bins with the nearest GC percentages by equation (1):

$$\mathrm{BRV}_{ij} = \frac{\mathrm{Tags}_{ij}}{\sum_{km} \mathrm{Tags}_{km}} \qquad \text{(Equation 1)}$$

where $\mathrm{BRV}_{ij}$ is the "Bin Ratio Value" for the jth bin of chromosome i, and $\mathrm{Tags}_{ij}$ is the number of tags in the jth bin of chromosome i. The sum runs over the 10 bins for the 1 Mb data and 40 bins for the 100 kb data for bins (km) with the nearest GC percentage to bin ij. In order to detect any sub-chromosomal differences, each of the BRVs was examined for deviations from the median values measured across multiple samples. The medians were determined from the four maternal only DNAs (Table 4) for the artificial samples and from the eleven maternal plasma samples (Table 5) for the clinical samples and were robust to individual sub-chromosome variants that might have been present in any one of the samples. Median absolute deviations (MAD) were calculated for each bin based on the medians and adjusted assuming a normal distribution for the number of tags in each bin. The adjusted MADs (aMADs) were utilized to calculate a z-score for each bin by equation (2):

$$z_{ij} = \frac{\mathrm{BRV}_{ij} - \mathrm{BRV}_{\mathrm{Median}\,ij}}{\mathrm{aMAD}_{ij}} \qquad \text{(Equation 2)}$$

It was expected that $z_{ij}$ would be approximately ±3 for regions without any copy number variations (CNVs) and significantly greater than ±3 when fetal CNVs were present.

The $z_{ij}$ values can be utilized to determine the relative fetal fraction (ff) present in the cfDNA. The value can then be compared to an independent measurement of ff to validate copy number detection, or suggest the presence of mosaicism. For a bin ratio containing a copy number change from normal, the $\mathrm{BRV}_{ij}$ will increase (in the case of a duplication) or decrease (in the case of a deletion) by equation (3):

$$\mathrm{BRV}_{ij} = \left(1 \pm \frac{ff_n}{2}\right) \mathrm{BRV}_{\mathrm{Median}\,ij} \qquad \text{(Equation 3)}$$

In this equation, $ff_n$ is the fetal fraction for sample n. If the coefficient of variation for each bin, $CV_{ij}$, is defined as equation (4):

$$CV_{ij} = \frac{\mathrm{aMAD}_{ij}}{\mathrm{BRV}_{\mathrm{Median}\,ij}} \qquad \text{(Equation 4)}$$

then equation (5)

$$ff_n = \mathrm{abs}\left(2\, z_{ij}\, CV_{ij}\right) \qquad \text{(Equation 5)}$$

can be used to calculate $ff_n$ for sample n from $z_{ij}$ values when a CNV is present.

Detection of a sub-chromosomal abnormality was a multi-step process for classifying specific regions as having a copy number variant. The $z_{ij}$ ±4 thresholds are indicated in each figure by a dashed horizontal line. In step 1, $z_{ij}$ values from the 1 Mb bins that exceeded ±4 were identified. The calculated ff was then utilized and bins that had a ff of less than 4% were eliminated. For the samples with male fetuses, the ff was also calculated using all of the bins in chromosome X. This value was compared to the result obtained for putative copy number changes to validate a copy number change or suggest a mosaic result. Finally, in cases of a single 1 Mb bin that met the above criteria, the 100 kb bins data were examined and it was required that at least 2 bins (within a contiguous group of 4) indicated a $z_{ij}$ value that exceeded +4 or -4 before classifying a sample as having a copy number variant. All three criteria had to be fulfilled to classify the copy number variant. For example, individual data points that only had a z-score of greater than or less than 4 but did not meet the additional criteria were not classified as copy number variants.

Results

Artificial Mixtures

Whole Chromosome Aneuploidy of Chromosome 21
FIG. 19 shows the chromosome 21 $z_{21j}$ values (1 Mb bins) for an artificial mixture of family 2139 with 10% of the son's DNA (T21) mixed with the mother's DNA. In chromosome 21, there are approximately 38 Mb (35 Mb in the q arm) that contain unique reference genome sequence in hg19. All of the chromosome 21 tags mapped to this region. With the exception of the first 4 Mb, FIG. 19 shows an over-
representation of most of chromosome 21 in the 10% mixture, as would be expected with a full chromosome aneuploidy. Using equation 5 to calculate the ff from the average $z_{21j}$ values of the amplified regions, ffs of 7.0% and 12.7%, for the 5% and 10% mixtures, respectively, were obtained. Calculating the ff average using $z_{Xj}$ values, ffs of 4.2% and 9.0%, for the 5% and 10% mixtures, respectively, were obtained.

Sub-Chromosomal Deletion of Chromosome 7
The method was next tested on Family 1313, in which the son has a sub-chromosomal deletion of chromosome 7. FIG. 20 shows the chromosome 7 $z_{7j}$ values (1 Mb bins) for the maternal sample mixed with 10% of her son's DNA. A deletion was observed beginning at bin 38 and continuing to bin 58. This reflects the approximately 20 Mb deletion documented in the metaphase karyotype. Fetal fraction values (ffs) of 6.1% and 10.5% were calculated for the 5% and 10% mixtures, respectively, for this sample. Calculating the ff average using $z_{Xj}$ values, ffs of 5.9% and 10.4% were obtained, respectively. Interestingly, in this sample there appeared to be a duplication in the maternal sample at bin 98 of chromosome 7 (circle in FIG. 20), which did not appear in the son, i.e. was not inherited. Had this duplication been maternally inherited, the $z_{7j}$ value would be expected to decrease also in the mixture. As shown in FIG. 20, the value of $z_{7j}$ is lower for the 10% mixture compared to the pure maternal sample. Bin 2, which had very high $z_{7j}$ values of 43.9 and 28.5 for the maternal sample and 10% mixture, respectively (data not shown), also appeared to reflect a maternal duplication.

Mosaic Duplication of Chromosome 15
In Family 2877, the maternal sample has a deletion in chromosome 11 that was inherited by the son. In addition, the son has a duplication in chromosome 15 that was not maternally inherited, and is part of a mosaic karyotype in which the majority of cells are normal (Table 4). FIG. 21 shows both the chromosome 11 and chromosome 15 $z_{ij}$ values for the 1 Mb bins in the mixture with 10% of the son's DNA. As expected, the inherited deletion in chromosome 11 from 41 Mb to 49 Mb had a consistent set of values that did not change with fetal fraction. However, the chromosome 15 duplication was clearly detected between bins 27 and 66, albeit with more noise than observed in the other artificial samples. The noise results from the reduced apparent ff for this duplication due to the mosaicism. The ffs calculated from the duplication using $z_{15j}$ values were 1.6% and 3.0% for the 5% and 10% mixtures, respectively. In contrast, the ffs calculated from chromosome X were 5.3% and 10.7%. The method was able to detect both the sub-chromosomal duplication with the low mosaic ff and to distinguish that the duplication was due to mosaicism by comparison of the ff result to an independent measurement of chromosome X.

Duplications of Chromosome 22
Family 1925 consisted of a mother and two male twins, one of which had two duplications of different sizes in chromosome 22. Ten percent mixtures of the affected twin's DNA and the mother were sequenced. The results indicated a 2 Mb and an 8 Mb duplication at bins 17 and 43, respectively. The ff for the 10% mixture was calculated to be 11.2% from the 2 Mb duplication, 11.6% from the 8 Mb duplication, and 9.8% from chromosome X (FIG. 22).

Maternal Plasma Samples

Whole Chromosome Aneuploidy
Sample C60715 was previously reported in the MELISSA study as detected for trisomy 20. The 1 Mb bin results for this sample contain ~960 million tags across the genome. The extra copy of chromosome 20 was clearly detected and the ff calculated from the 1 Mb bin data is 4.4%, in agreement with the whole chromosome results.

Duplications and Deletions
Sample C65104 (Table 6) had a complex fetal karyotype that involved the long arm of chromosome 6 (6q) and two duplications, one of which was 38 Mb in size. The second duplication was reported as approximately 650 kb from the chromosome microarray analysis of cultured villi. Using MPS it was previously reported that this sample showed an increased whole chromosome normalized chromosome value (NCV) in chromosome 6 (NCV = 3.6) (Bianchi, D. W., Platt, L. D., Goldberg, J. D., Abuhamad, A., Sehnert, A. J., Rava, R. P. (2012). Genome-wide fetal aneuploidy detection by maternal plasma DNA sequencing. Obstet. Gynecol. 119, 890-901). This value was insufficient to classify this sample as having a full chromosome aneuploidy, but it was consistent with the presence of a large duplication. FIG. 23A shows the 1 Mb bin results for this sample showing the z values as NCV for the chromosomes. All the chromosomes other than chromosome 6 showed z values that clustered around 0. By focusing only on chromosome 6 (FIG. 23A), the exact region of the 38 Mb duplication was identified. This 38 Mb corresponded to the large duplication seen in the microarray karyotype, and the ff calculated from this duplication was 11.9%. The second duplication in the microarray karyotype was not detected a priori by our criteria; however, it can be clearly seen in the 100 kb bin expansion of the region (FIG. 23A). Improved analytic methodology and/or deeper sequencing would clearly allow this duplication to be detected. Finally, a 300 kb gain in chromosome 7 at 7q22.1 was also identified by MPS in agreement with the microarray results (Table 31).

TABLE 6
MPS results on clinical samples that are congruent with the clinically reported karyotype

Patient ID  Affected Chr  Gain/Loss  Start bin  End bin  Size (Mbp)  Chromosome region
C65104      6             Gain       64         102      38          6q12-6q16.3
            7             Gain       98.1       98.3     0.3         7q22.1
C61154      7             Loss       150.3      150.6    0.3         7q36.1
C61731      8             Loss       2          12       10          8p23.2-8p23.2
C62228      15            Loss       23         39       16          15q11.2-15q14
C60193      17            Gain       62         81       19          17q23.3-17q25.3
            10            Loss       134        135      2           10q26.3
C61233      3             Gain       158        198      40          3q25.32-3q29
            X             Loss       1          10       9           Xp22.33-Xp22.31

Sample C61154 came from a pregnant woman carrying a fetus with a 7q36.1 deletion detected by metaphase karyotype analysis of chorionic villi. FIG. 24A shows the 1 Mb bin results for this sample. Only chromosomes 7 and 8 showed 1 Mb bins with z values that met the criteria for classification. Chromosome 7 showed a single 1 Mb bin with a significant decrease in the z value at 7q36.1 (denoted by circle in FIG. 24A). An examination of the data at higher resolution (100 kb bins) (FIG. 24B) showed a deletion of approximately 300 kb, which was consistent with the karyotype report (Table 6). In this sample an approximately 1 Mb deletion was also observed in both the 1 Mb and 100 kb bin data close to the centromere of chromosome 8 (as shown by the oval in FIG. 24A). The chromosome 8 deletion was not reported in the karyotype obtained from chorionic villi (Table 7). The ffs calculated from the chromosome 7 and 8 deletions were 18.4% and 68.5%, respectively. The ff calculated from chromosome X was 2.8%. In this case, the high
ff value for chromosome 8 indicated that this deletion, which was not reported in the fetal metaphase karyotype, was maternal in origin. In addition, the discordant value of the chromosome 7 compared to chromosome X ff values suggests that part of the signal could be due to the mother. The karyotype report indicated that the chromosome 7 "abnormality is most likely a derivative from a carrier parent," which is consistent with the MPS data.

Sample C61731 had a partial deletion of the short arm of chromosome 8. The 1 Mb bin results (FIG. 25) indicated an approximately 5 Mb deletion in the p-arm of chromosome 8 in agreement with the karyotype (Table 6). The fetal fraction calculated from this chromosome deletion was 8.4%.

Translocations
The fetal karyotype for sample C62228 showed an unbalanced translocation consisting of 45,XX,-15,der(21)t(15;21)(q15;p11.2). The 1 Mb bin results for this sample are shown in FIG. 26. There was a clear 17 Mb deletion in chromosome 15 in agreement with the karyotype (Table 6). The ff calculated from the chromosome 15 deletion was 11.3%. No sub-chromosomal abnormalities were detected in the chromosome 21 data to indicate the translocation breakpoint.

TABLE 7 - continued
Copy number variants detected by MPS that were not reported in the clinical karyotypes

Pat ID  Affected Chr  Gain/Loss  Start bin  End bin  Size (Mbp)  Chromosome region
C65664  7             Loss       39.3       40       0.8         7p14.1
        14            Loss       58         58.1     0.2         14q23.1
C66515  9             Gain       40.7       41       0.4         9p31.1
C60552  6             Loss       151.4      151.5    0.2         6q25.1
        22            Gain       25.6       25.9     0.4         22q11.23

Mosaic Karyotypes
Four of the samples listed in Table 5 (C61183, C65664, C66515, C60552) had mosaic karyotypes with sub-chromosomal abnormalities. Unfortunately, for three of the samples (C61183, C66515, C60552) the putative sub-chromosomal abnormality originates in regions of the genome for which information is either unavailable in the genome build or highly repetitive and not accessible for analysis. Thus, in this case, the process was unable to determine the sub-chromosomal abnormalities reported in these three samples. The $z_{ij}$ values were all close to and centered around zero.
Identification of Additional Material not Identified by 25 Sample
Karyotype C65664 had a mosaic karyotype with isochromo
Two maternal samples had fetal karyotypes with added some 20q, an abnormality that is associated with an event
material of unknown origin at specific chromosomes . The 1 secondary to post zygotic error (Chen , C .- P . (2003 ) Detec
Mb bin results for sample C60193 are shown in FIG . 27 . tion of mosaic isochromosome 20q in amniotic fluid in a
From the MPS data , the additional material of unknown pregnancy with fetal arthrogryposis multiplex congenita and
origin on the long arm of chromosome 10 appeared to be 30 normal karyotype in fetal blood and postnatal samples of
derived from an approximately 19 Mb duplication at the q placenta , skin , and liver. Prenat. Diagn . 23 , 85 - 87 ) . Since
terminus of chromosome 17 . There was also an approxi- cfDNA primarily originates from placental cytotrophoblasts ,
mately 2 Mb deletion at the q terminus of chromosome 10 it is not expected that this abnormality would be detected
that was confirmed by the 100 kb bin data . The ffs calculated using MPS . There were 1 - 2 small sub - chromosomal changes
from the chromosome 17 duplication and chromosome X 35 detected in these samples by MPS that were not reported in
(male fetus ) were 12 . 5 % and 9 . 4 % , respectively . The 2 Mb the karyotypes ( Table 7 ) .
deletion on chromosome 10 had a calculated ff of 19 .4 % . Further Discussion
Finally , the MPS results for this sample indicated a small This example demonstrates that in non -mosaic cases, it is
( 300 kb ) deletion in chromosome 7 that was not reported in possible to obtain a full fetal molecular karyotype using
the metaphase karyotype ( Table 7 ). 40 MPS ofmaternal plasma cfDNA that is equivalent to CMA
The 1 Mb bin results for sample C61233 are shown in
(chromosomalmicroarray ), and in some cases is better than
FIG . 28 . The karyotype for this sample indicated additional a metaphase karyotype obtained from chorionic villi or
chromosomal material on the short arm of one of the X amniocytes. Such a non -invasive test could have immediate
chromosomes . The additional material of unknown origin
appeared to originate from a 40 Mb duplication at the g 45 clinical utility , particularly in rural areas where invasive
terminus of chromosome 3 . There was also an approxi- procedures are not readily available .
mately 9 Mb deletion on the p arm of chromosome X ( Table Using 25 -mer tags at ~ 109 tags/sample , the results indi
6 ). The ffs calculated from the chromosome 3 duplication cate that sufficient precision can be obtained between
and chromosome X deletion were 9 .5 % and 6 .7 % , respec sequencing runs to reliably achieve 100 kb resolution across
tively. The MPS results for this sample also indicated three 50 the genome. Even greater resolution can be achieved with
small sub -chromosomal changes that were not reported in deeper sequencing. The improvements in the v3 sequencing
the metaphase karyotype ( Table 7 ). chemistry allowed for the use of 25 -mer tags, compared to
the 36 -mers used in previous work (Bianchi, D . W ., Platt, L .
TABLE 7 . D ., Goldberg , J. D ., Abuhamad, A ., Sehnert, A . J., Rava , R .
55 P . (2012 ). Genome-wide fetal aneuploidy detection by
Copy number variants detected by MPS that maternal plasma DNA sequencing. Obstet. Gynecol. 119 ,
were not reported in the clinical karyotypes 890 - 901). These short tags mapped with high efficiency
Affected Start End Size Chromosome across the genome, and the quantitative behavior demon
Pat ID Chr Gain /Loss bin bin (Mbp ) region strated with the artificial mixture analyses validates the
C60715 22 Gain 87. 3 87 . 9 0 .6 2p11. 2 60 methodology . At today ' s costs, this depth of sequencing is
22 Loss 89.8 90 . 2 0.5 2p11 .2 approximately $ 1 ,000 per sample . This is comparable to the
C61154 Loss 46 .9 47. 7 0 .9 8q11 . 1 cost of a chromosome microarray result, but employs a
C60193 77 Loss 158.7 158 . 9 0 .3 7936 .3 risk -free blood draw rather than an invasive procedure .
C61233 33 Loss 114 114.5 0 .6 3q13 .31 Deeper sequencing would allow for even finer resolution at
111 Loss 55.3 55.4 0 .2 11711
117 GGain 81 81. 1 0 .2 17q25. 3 65 an additional cost. Thus, this type of analysis could be
C61183 11 Loss 12 .8 13 0 .3 1p36 .21 implemented today as a reflex test when other clinical
factors are present (such as sonographically -detected
anomalies that are not typical of whole chromosome aneuploidy) when the patient declines an invasive procedure or prefers a blood test.
The lack of results on the mosaic samples (except for the artificial mixture) highlights the current limitations of both the microarray and MPS approaches. Sub-chromosomal abnormalities that originate in regions of the genome for which information is either unavailable in the genome build or highly repetitive will not be accessible for analysis. Such inaccessible genome regions are typically focused in the telomeres and centromeres of different chromosomes and in the short arms of acrocentric chromosomes. Also, the lower fetal fraction for the mosaic portion will be more challenging for detection and may require even deeper sequencing for effective classification.
Metaphase cytogenetic analysis from cell cultures, while considered “standard,” has some limitations that need to be considered. For example, the ability to detect sub-chromosomal abnormalities is typically limited to sizes of 5 Mb or greater. This constraint is what led to the recent recommendation of using CMAs as a first tier test in clinical practice. Cell culture is biased towards the detection of more stable chromosomal configurations over significant structural alterations. In the case of fluorescence in situ hybridization (FISH), only the regions of the genome that are addressed by design of the FISH probes can be analyzed. Finally, as shown here, in actual clinical practice metaphase karyotypes can be reported to contain “chromosomal material of unknown origin.” The MPS methodology of measuring copy number variation introduced in this work overcomes these limitations of karyotyping.
Importantly, our results showed that MPS was able to identify the potential source of the material of unknown origin for clinical samples C60193 and C61233. In addition, the MPS data showed small deletions in the termini of the chromosomes that the metaphase karyotype indicated were the breakpoints for the unknown chromosomal material in each of these samples. Such deletions at the breakpoints of translocations have been reported repeatedly in the literature (Howarth, K. D., Pole, J. C. M., Beavis, J. C., Batty, E. M., Newman, S., Bignell, G. R., and Edwards, P. A. W. (2011) Large duplications at reciprocal translocation breakpoints that might be the counterpart of large deletions and could arise from stalled replication bubbles. Genome Res. 21, 525-534). Based on these results, MPS may have the capabilities to identify both the presence of a sub-chromosomal duplication and suggest a translocation position based on small deletions (or duplications) elsewhere in the genome.
applications beyond the determination of fetal sub-chromosomal abnormalities from cfDNA in maternal plasma. Ultimately, MPS can be applied to any mixed biological sample in which one wishes to determine the sub-chromosomal abnormalities in the minor component, even when the minor component represents only a few percent of the total DNA in the specimen. In prenatal diagnostics, samples obtained from chorionic villi could be analyzed for mosaic karyotypes or maternal contamination. Outside of prenatal diagnosis, many different cancers have been associated with copy number changes that could potentially be detected from cfDNA in the blood of the patient or a solid tumor sample that contains both normal and cancer cells. As the cost of MPS continues to drop, it is expected that its application for detecting sub-chromosomal abnormalities in mixed samples will find broad clinical utility.
Determination of fetal sub-chromosome abnormalities using deep sequencing of maternal plasma allows for a full molecular karyotype of the fetus to be determined noninvasively.
In addition to the example above, which shows that partial aneuploidies can be determined using cfDNA, a similar procedure can be used to determine whole chromosome numbers (whole chromosome aneuploidies) from cfDNA. See, for example, example 16 in PCT application US2013/023887 (Publication No. WO2014/014497), filed Jan. 30, 2013 and incorporated herein by reference. Further, a similar procedure can use cfDNA to detect aneuploidies associated with cancer. See for example, example 29 of PCT application US2013/023887, which application is incorporated in its entirety by reference.
What is claimed is:
1. A method for obtaining sequence information from a whole blood sample comprising cell-free DNA, said method comprising:
(a) obtaining a plasma fraction of the whole blood sample, wherein the plasma fraction comprises the cell-free DNA;
(b) exposing the plasma fraction to conditions that reduce the binding of the cell-free DNA to nucleosomal proteins, wherein the conditions comprise exposing the plasma fraction to polysorbate-20, and/or heating the plasma fraction to a temperature from about 55° C. to about 75° C.;
(c) attaching sequencing adapters to ends of unpurified cell-free DNA fragments in the plasma fraction without first purifying the cell-free DNA from the plasma fraction, thereby preparing a sequencing library comprising library fragments having the sequencing adapters attached to either end of the unpurified cell-free DNA fragments; and
(d) sequencing said sequencing library to obtain sequence information.
2. The method of claim 1, wherein (c) comprises contacting the plasma fraction with a transposase and polynucleotides comprising a sequencing adapter sequence.
3. The method of claim 1, wherein obtaining the plasma fraction comprises centrifuging the whole blood sample and removing a resulting buffy coat and hematocrit fractions.
4. The method of claim 3, wherein obtaining the plasma fraction further comprises centrifuging the plasma fraction to remove solids from the plasma fraction.
5. The method of claim 1, wherein (b) comprises exposing the plasma fraction to polysorbate-20 while the plasma fraction is in contact with the sequencing adapters.
6. The method of claim 1, wherein only a single centrifugation step is performed on the whole blood sample prior to preparing the sequencing library, and wherein the single centrifugation step is performed at an acceleration of at least about 10,000 g.
7. The method of claim 1, further comprising removing serum proteins from the plasma fraction prior to preparing the sequencing library from the cell-free DNA.
8. The method of claim 1, wherein the whole blood sample is obtained from a pregnant mother, and the cell-free DNA comprises fetal cell-free DNA of a fetus carried by the pregnant mother.
9. The method of claim 8, further comprising using the cell-free DNA to determine copy number variation (CNV) in the fetus.
10. The method of claim 1, wherein the whole blood sample is obtained from a cancer patient.
11. The method of claim 10, wherein the cell-free DNA comprises cell-free DNA of a cancer genome, and wherein the method further comprises using the cell-free DNA to determine copy number variation (CNV) in the cancer genome.
12. The method of claim 1, wherein the conditions do not include the presence of a protease, sodium dodecyl sulfate, or heating to a temperature higher than 75° C.
13. The method of claim 1, wherein the conditions of (b) comprise heating the plasma fraction to a temperature from about 65° C. to about 75° C.
14. A method for obtaining sequence information from a whole blood sample comprising cell-free DNA, said method comprising:
(a) freezing the whole blood sample;
(b) thawing the frozen whole blood sample;
(c) separating solids from the thawed whole blood sample to obtain a liquid fraction, wherein the liquid fraction comprises the cell-free DNA;
(d) reducing concentration of plasma proteins in the liquid fraction;
(e) attaching sequencing adapters to ends of unpurified cell-free DNA fragments in the liquid fraction, thereby preparing a sequencing library comprising library fragments having the sequencing adapters attached to either end of the unpurified cell-free DNA fragments; and
(f) sequencing said sequencing library to obtain sequence information.
15. The method of claim 14, wherein (e) comprises contacting the liquid fraction with a ligase and polynucleotides comprising a sequencing adapter sequence.
16. The method of claim 14, wherein (e) comprises contacting the liquid fraction with a transposase and polynucleotides comprising a sequencing adapter sequence.
17. The method of claim 14, further comprising exposing the liquid fraction to polysorbate-20, and/or heating the liquid fraction to a temperature from about 55° C. to about 75° C.
18. The method of claim 14, wherein:
the whole blood sample is obtained from a pregnant mother,
the cell-free DNA comprises fetal cell-free DNA of a fetus carried by the pregnant mother, and
the method further comprises using the cell-free DNA to determine copy number variation (CNV) in the fetus.
19. The method of claim 14, wherein
the whole blood sample is obtained from a cancer patient,
the cell-free DNA comprises cell-free DNA of a cancer genome, and
the method further comprises using the cell-free DNA to determine copy number variation (CNV) in the cancer genome.
