Nihms 263986

The document discusses advancements in RNA sequencing (RNA-seq) technology, highlighting its impact on transcriptome characterization and quantification. Key developments include improved mapping of transcription start sites, strand-specific measurements, and detection of alternative splicing events, which enhance our understanding of RNA biology. The review also addresses current limitations and future opportunities in RNA-seq methodologies, particularly in clinical applications and the analysis of small RNA quantities.

NIH Public Access

Author Manuscript
Nat Rev Genet. Author manuscript; available in PMC 2011 May 1.
Published in final edited form as:
Nat Rev Genet. 2011 February ; 12(2): 87–98. doi:10.1038/nrg2934.

RNA sequencing: advances, challenges and opportunities

Fatih Ozsolak and Patrice M. Milos


Helicos BioSciences Corporation, One Kendall Square, Cambridge, Massachusetts 02139, USA

Abstract
In the few years since its initial application, massively parallel cDNA sequencing, or RNA-seq,
has allowed many advances in the characterization and quantification of transcriptomes. Recently,
several developments in RNA-seq methods have provided an even more complete characterization
of RNA transcripts. These developments include improvements in transcription start site mapping,
strand-specific measurements, gene fusion detection, small RNA characterization and detection of
alternative splicing events. Ongoing developments promise further advances in the application of
RNA-seq, particularly direct RNA sequencing and approaches that allow RNA quantification from
very small amounts of cellular materials.

Over the past 10 years we have come to appreciate the dynamic state of genomes, including both DNA modifications and quantitative and qualitative changes in RNA, which have been characterized in species ranging from simple model organisms to humans. This advance has occurred through the use of various genomic measurements, including comprehensive transcriptomics studies1. We now have a new appreciation for the complexity of the transcriptome, which encompasses a multitude of previously unknown coding and non-coding RNA species, particularly small RNAs (sRNAs), including microRNAs, promoter-associated RNAs and the newly discovered antisense 3′ termini-associated RNAs, to name a few2,3.

© 2010 Macmillan Publishers Limited. All rights reserved

fozsolak@helicosbio.com; pmilos@helicosbio.com.
Competing interests statement
The authors declare competing financial interests; see Web version for details.
FURTHER INFORMATION
Fatih Ozsolak and Patrice M. Milos’s homepage
(Helicos BioSciences website): www.helicosbio.com
Helicos Technology Center: http://open.helicosbio.com
The University of California Santa Cruz Genome Browser: http://genome.ucsc.edu
Next-generation DNA sequencing
(Often abbreviated to NGS.) Non-Sanger-based high-throughput DNA sequencing technologies. Compared with Sanger sequencing, NGS platforms sequence as many as billions of DNA strands in parallel, yielding substantially more throughput and minimizing the need for the fragment-cloning methods that are often used in Sanger sequencing of genomes.
Semisuppressive PCR
A PCR strategy that aims to reduce primer dimer accumulation by preferentially amplifying longer DNA fragments.
Spike pool
Internal controls added to RNA samples, consisting of RNA elements of known sequence and composition.
Paired-end reads
A strategy in which two separate regions of the same DNA fragment, located some distance apart, are each sequenced. This strategy provides elevated physical coverage and alleviates several limitations of NGS platforms that arise from their relatively short read length.
Laser capture microdissection
(Often abbreviated to LCM.) A method that allows cells of interest, chosen by the operator under a microscope, to be specifically captured from heterogeneous tissue samples. The isolated cells can be used for various analyses, including protein and nucleic acid analyses.
Quantitative real-time polymerase chain reaction
A PCR application that enables the measurement of nucleic acid quantities in samples. The nucleic acid of interest is amplified by PCR, and the accumulation of the amplified product is measured in real time during the PCR cycles. These data are used to infer starting nucleic acid quantities.
Circulating extracellular nucleic acid
Extracellular DNA or RNA molecules in plasma and serum.

Initial transcriptomics studies largely relied on hybridization-based microarray technologies and offered a limited ability to fully catalogue and quantify the diverse RNA molecules that are expressed from genomes over wide ranges of levels. The introduction of high-throughput next-generation DNA sequencing (NGS) technologies4 revolutionized transcriptomics by allowing RNA analysis through cDNA sequencing at massive scale (RNA-seq). This development eliminated several challenges posed by microarray technologies, including the limited dynamic range of detection5. NGS platforms used for RNA-seq are commercially available from four companies (Illumina, Roche 454, Helicos BioSciences and Life Technologies), and new technologies are in development by others4. Given the importance of sequencing capabilities such as throughput, read length, error rate and the ability to generate paired-end reads, for RNA-seq as well as for genomic studies, NGS companies are constantly improving their platforms to provide the best sequencing performance at the lowest cost4.

New methodologies for RNA-seq studies have been providing an increasingly complete picture of both the quantitative and qualitative aspects of transcript biology in both prokaryotes6 and eukaryotes5. Here we discuss these advances, which include approaches that allow a more comprehensive understanding of transcription initiation sites, the cataloguing of sense and antisense transcripts, improved detection of alternative splicing events and the detection of gene fusion transcripts, which has become increasingly important in cancer research. All of this is now possible at a data scale that was unimaginable just a few years ago. Recently developed approaches also allow the selection of specific RNA molecules before RNA-seq, enabling transcriptomics studies with more focused aims. In this Review, we provide an overview of these methods, focusing on the technologies themselves and touching only briefly on the types of biological insight that they allow. We compare the different approaches that are available for each application and discuss their current limitations and the potential for future improvements. We conclude by discussing two new developments in RNA-seq technologies: direct RNA sequencing (DRS)7 and methods for the reliable profiling of minute RNA quantities, which are important for translational research and clinical applications of RNA-seq.

Mapping transcription start sites


The mapping of transcription start sites (TSSs) at nucleotide resolution is necessary to fully define RNA products and to identify the adjacent promoter regions that regulate the expression of each transcript. One of the first high-throughput TSS mapping methods was the cap analysis of gene expression (CAGE) approach, which was initially developed for Sanger sequencing8,9. This involved sequencing of cloned cDNA products derived from RNAs with intact 5′ ends (for example, those containing a 5′ cap structure). Although useful, the technology required large quantities of input RNA and generated only short reads (~20 nucleotides) per TSS.

These limitations prompted the adaptation of the CAGE approach for NGS platforms, which has revealed an unexpected complexity of TSS distribution, both across genomes and in the regions surrounding individual promoters. Methods that combine RNA-seq with CAGE include deep CAGE10, PEAT11, nanoCAGE and CAGEscan12, which collectively resolve several technical challenges of the initial Sanger sequencing-based CAGE strategies (TABLE 1). First, nanoCAGE12 now allows TSS mapping from as little as 10 nanograms of total RNA through the use of various amplification strategies. Second, the compatibility of PEAT and CAGEscan with paired-end sequencing (a capability that is enabled by platforms such as Illumina, but is lacking in others such as Helicos) allows examination of the connectivity of TSSs with downstream regions and facilitates the assignment of identified TSSs to specific transcripts. In addition, paired-end sequencing partly alleviates the difficulty of aligning single short reads to repeat regions and thus allows a subset of repeat elements to be at least partially characterized by RNA-seq.
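The core counting step behind CAGE-type TSS maps can be sketched as follows. This is a minimal illustration, not any of the published pipelines: it assumes each read has already been aligned and assigned to the strand of the RNA it came from, and that the read's 5′ end marks the capped 5′ end of that RNA. The `Alignment` record, the function names and the 20-nucleotide clustering gap are hypothetical choices made for the example.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    """Hypothetical minimal record for one aligned CAGE-type read."""
    chrom: str
    start: int   # 0-based leftmost aligned base
    end: int     # 0-based, exclusive rightmost aligned base
    strand: str  # strand of the source RNA: '+' or '-'


def tss_counts(alignments: Iterable[Alignment]) -> Counter:
    """Count reads whose 5' end falls on each (chrom, strand, position)."""
    counts: Counter = Counter()
    for aln in alignments:
        # The RNA 5' end is the leftmost base on '+' and the rightmost on '-'.
        pos = aln.start if aln.strand == "+" else aln.end - 1
        counts[(aln.chrom, aln.strand, pos)] += 1
    return counts


def cluster_tss(counts: Counter, max_gap: int = 20) -> List[Tuple[str, str, int, int, int, int]]:
    """Merge nearby 5'-end positions on the same chromosome and strand.

    Returns (chrom, strand, first_pos, last_pos, total_reads, peak_pos) tuples,
    where peak_pos is the most frequently used position within the cluster.
    """
    clusters: list = []  # [chrom, strand, first, last, total, peak_pos, peak_n]
    for chrom, strand, pos in sorted(counts):
        n = counts[(chrom, strand, pos)]
        last = clusters[-1] if clusters else None
        if last and last[0] == chrom and last[1] == strand and pos - last[3] <= max_gap:
            last[3] = pos
            last[4] += n
            if n > last[6]:
                last[5], last[6] = pos, n
        else:
            clusters.append([chrom, strand, pos, pos, n, pos, n])
    return [tuple(c[:6]) for c in clusters]


reads = [
    Alignment("chr1", 100, 125, "+"),
    Alignment("chr1", 100, 130, "+"),
    Alignment("chr1", 104, 129, "+"),
    Alignment("chr1", 500, 525, "-"),
]
print(cluster_tss(tss_counts(reads)))
# [('chr1', '+', 100, 104, 3, 100), ('chr1', '-', 524, 524, 1, 524)]
```

Reporting both a peak position and a total count per cluster separates where transcription starts from how frequently each TSS is used; it is the second quantity that amplification steps could distort.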

However, there are several caveats to these NGS-based approaches. One is that no attempt
has been made to examine whether the amplification and other manipulation steps that are
carried out distort the resulting view of how frequently each TSS is used. Spike-in
experiments would be useful to address this issue. In addition, multiple difficulties were
encountered during the development of protocols involving cDNA synthesis and
amplification12. For example, researchers observed artefacts such as primer dimers that
dominated sequencing data sets and reduced effective coverage, prompting the use of
semisuppressive PCR to reduce primer dimer frequency12. Thus, although these methods
may be useful for qualitative applications, establishing and improving their quantitative
capabilities will probably require additional development.
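A spike-in check of the kind suggested above reduces to comparing how much of each synthetic RNA was added with how many reads it received. The sketch below uses hypothetical spike-in names, amounts and counts. A value of 0 means a spike-in is represented in proportion to its input; positive or negative values indicate over- or under-representation introduced somewhere between cDNA synthesis and sequencing.

```python
import math
from typing import Dict


def spike_in_distortion(observed_reads: Dict[str, int],
                        input_amounts: Dict[str, float]) -> Dict[str, float]:
    """log2(observed read fraction / expected input fraction) for each spike-in.

    Fractions are computed among spike-in reads only, so reads from
    endogenous RNAs do not affect the comparison.
    """
    total_reads = sum(observed_reads.get(name, 0) for name in input_amounts)
    total_input = sum(input_amounts.values())
    distortion = {}
    for name, amount in input_amounts.items():
        expected = amount / total_input
        observed = observed_reads.get(name, 0) / total_reads if total_reads else 0.0
        # A spike-in that received no reads at all is reported as -inf.
        distortion[name] = math.log2(observed / expected) if observed > 0 else float("-inf")
    return distortion


# Hypothetical spike-in pool: relative input amounts and the reads each received.
inputs = {"spike_A": 1000, "spike_B": 1000, "spike_C": 2000}
reads = {"spike_A": 500, "spike_B": 250, "spike_C": 1250}
for name, value in spike_in_distortion(reads, inputs).items():
    print(f"{name}: {value:+.2f}")
# spike_A: +0.00
# spike_B: -1.00
# spike_C: +0.32
```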

General limitations of RNA-based TSS mapping approaches include their dependence on cDNA synthesis or hybridization steps, the efficiency of which depends on RNA sequence and structure. In addition, RNA-based TSS mapping is challenging for short-lived transcripts such as primary microRNAs, which are generally transcribed at high levels but are scarce owing to their rapid degradation. These limitations may be partly alleviated by combining RNA-based mapping with other methods, such as chromatin-based TSS prediction, which relies on detecting histone modifications that are indicative of active transcription13,14. Such integration may also be useful in light of the recent suggestion that post-transcriptional processing results in 5′ cap-like structures in RNA fragments15. Thus, relying solely on CAGE data for TSS mapping may make it difficult to separate transcription initiation events from RNA processing events.

Strand-specific RNA-seq
Transcriptomic studies in a range of species have revealed a pervasive presence of antisense transcription events16. Although these events were once considered to reflect biological or technical noise, it is now clear that antisense transcripts are functional and have various roles in both normal physiological states and disease states16. There is therefore increasing interest in profiling transcriptomes at greater depths to fully characterize sense and antisense transcription products. Standard RNA-seq approaches generally require double-stranded cDNA synthesis, which erases RNA strand information. In addition, during first-strand cDNA synthesis, spurious second-strand cDNA artefacts can be introduced owing to the DNA-dependent DNA polymerase (DDDP) activities of reverse transcriptases17-19, which can confound the determination of sense versus antisense transcripts20. Actinomycin D has been suggested as a potential agent to reduce the DDDP activities of reverse transcriptases18, but the extent to which it is effective, and whether or not it introduces additional artefacts, has not been fully examined. To overcome these difficulties, several strategies for strand-specific analyses of transcriptomes have been developed.

The strategies that have been developed to generate strand-specific information generally rely on one of three approaches. The first involves the ligation of adaptors in a predetermined orientation to the ends of RNAs or to first-strand cDNA molecules21-23; the known orientations of these adaptors are used as reference points to obtain RNA strand information. The second is the direct sequencing of the first-strand cDNA products that are generated, either in solution24,25 or on surfaces26. The third is the selective chemical marking of the second-strand cDNA synthesis products or of the RNA27,28. These strategies have already begun to contribute to our understanding of transcriptomes, including the mapping of translation states of RNAs (for example, polysome profiling)29 and the identification of novel promoter-associated RNAs22.

A recent study that used the Saccharomyces cerevisiae genome as a reference compared the
performance of several of these strategies, and the authors observed differences in these
methods with respect to their level of strand specificity, evenness of coverage, agreement
with known annotations, library complexity (for example, number of unique read start
positions, which indicates the protocols’ abilities to avoid amplification artefacts such as
duplicate reads) and ability to generate quantitative expression profiles30. However, in-depth
comparative studies that characterize the biases and artefacts that are introduced by each of
these approaches are still lacking, and scientists working with these data sets should be
aware of several issues.

First, given the tendency of reverse transcriptase to generate spurious second-strand cDNA products during first-strand cDNA synthesis17-19, it is not clear whether the approaches that rely on sequencing first-strand cDNA products (either directly or after intra- or inter-molecular ligation) are absolutely strand specific. The strand specificity of such approaches has been assessed by quantifying the ratio of reads that map in the antisense orientation to known, well-annotated genes relative to the reads that map in the sense orientation. This analysis revealed that a small fraction of reads obtained with these approaches still align in the antisense orientation; thus, these approaches may not be entirely strand specific30. Furthermore, cDNA products that contain both first- and second-strand sequence may not align properly to reference sequences. Given the incomplete annotation of sense and antisense transcripts in genomes, even in those of well-studied species such as S. cerevisiae, the true extent of strand specificity of these approaches should be carefully assessed. Ideally, such assessment should be performed with spike pools of chemically synthesized RNAs of defined sequence.
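The antisense-read metric described above can be sketched in a few lines. The sketch assumes that each read overlapping a well-annotated gene has already been reduced to a pair of strands (the strand the read aligned to and the annotated gene strand). It also assumes the protocol is known to yield reads from first-strand cDNA, which is the reverse complement of the RNA. Function names are hypothetical. Genuine but unannotated antisense transcription is counted as error by this metric, which is one reason synthetic spike-in pools are preferable for the assessment.

```python
from typing import Iterable, Tuple


def infer_rna_strand(read_strand: str, reads_are_first_strand: bool) -> str:
    """Return the strand of the source RNA implied by a read's alignment strand.

    First-strand cDNA is the reverse complement of the RNA, so reads derived
    from it align to the strand opposite the RNA.
    """
    if not reads_are_first_strand:
        return read_strand
    return "-" if read_strand == "+" else "+"


def antisense_fraction(pairs: Iterable[Tuple[str, str]], reads_are_first_strand: bool) -> float:
    """Fraction of gene-overlapping reads whose inferred RNA strand opposes the gene.

    pairs: (read_alignment_strand, annotated_gene_strand) for each read.
    """
    sense = antisense = 0
    for read_strand, gene_strand in pairs:
        if infer_rna_strand(read_strand, reads_are_first_strand) == gene_strand:
            sense += 1
        else:
            antisense += 1
    total = sense + antisense
    return antisense / total if total else 0.0


# Toy first-strand protocol data: 97 reads align opposite their '+'-strand
# genes (sense RNA) and 3 align on the same strand (antisense RNA).
toy = [("-", "+")] * 97 + [("+", "+")] * 3
print(antisense_fraction(toy, reads_are_first_strand=True))  # 0.03
```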

Second, ligation tends to have sequence preferences31,32, so approaches that rely on ligation may suffer from various representational biases. Examples of such bias are found in transcriptome profiling23 and ribosome profiling experiments29, in which extremely uneven coverage was seen for libraries prepared using ligation compared with libraries prepared using enzymatic 3′ polyadenylation29. Third, the in-solution or on-surface amplification step included in some of these approaches may introduce additional artefacts, for example in the form of GC biases and duplicate reads33-35. Examination of such effects revealed duplicate read fractions ranging from 6.1% to 94.1% for standard and strand-specific Illumina RNA-seq strategies, as well as a GC bias favouring RNA templates with neutral GC content23. It is hoped that many of these limitations will be overcome by sequencing technologies that are in development or by modifications and improvements to existing sequencing technologies4.
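The two artefact measures mentioned above, duplicate reads and GC bias, are usually summarized with simple statistics such as the following. This sketch represents reads as hypothetical (chromosome, strand, start) tuples and treats every read beyond the first at an identical start as a duplicate. In RNA-seq this convention conflates amplification duplicates with reads that legitimately share a start because a transcript is highly expressed, so it tends to overestimate amplification artefacts.

```python
from collections import Counter
from typing import Iterable, Tuple


def duplicate_fraction(read_starts: Iterable[Tuple[str, str, int]]) -> float:
    """Fraction of reads that share a (chrom, strand, start) with another read."""
    counts = Counter(read_starts)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    unique_starts = len(counts)  # a simple proxy for library complexity
    return (total - unique_starts) / total


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


starts = [("chr1", "+", 100)] * 3 + [("chr1", "+", 150), ("chr2", "-", 40)]
print(duplicate_fraction(starts))       # 0.4
print(round(gc_fraction("ACGTGC"), 3))  # 0.667
```

Binning reads or transcripts by `gc_fraction` and comparing coverage across bins is one way to look for the kind of GC bias described above.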

Characterization of alternative splicing patterns


Given the importance of alternative splicing patterns in development and the fact that 15–60% of known disease-causing mutations affect splicing36,37, it will be crucial to catalogue the complete repertoire of splicing events and to understand how altered splicing patterns contribute to development, cell differentiation and human disease. Initial splice-site mapping studies using RNA sequencing-based approaches were limited by read length: a read spanning a splice junction consists of two exonic portions that must be aligned to separate genomic locations, and short reads did not allow these portions to be aligned reliably. Thus, initial RNA-seq-based studies of alternative splicing used computational strategies to compensate for this limitation. The reference sequence used for alignment was supplemented with ‘artificial’ sequences that span all possible splice junctions between the annotated exons of genes, allowing junction-spanning reads to be aligned38-41. These approaches changed our view of human splicing, as more than 95% of human multi-exon genes were found to be alternatively spliced, with ~110,000 novel splice sites per tissue42. By counting the number of reads mapping to each exon and spanning each splice junction, these approaches also allowed the splicing efficiency of each junction to be determined and the levels of distinct isoforms to be quantified43,44.
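The junction-library strategy and the read-counting step described above can be sketched as follows. The sketch makes simplifying assumptions of its own: a single gene on the plus strand, exons given as sorted 0-based half-open intervals, and a fixed read length. Each artificial junction joins the last `read_length - 1` bases of an upstream exon to the first `read_length - 1` bases of a downstream exon, so any read crossing the junction by at least one base fits entirely inside it. The percent-spliced-in (PSI) ratio is a commonly used way to turn junction counts into an isoform ratio, shown here as an illustration rather than as the method of the cited studies.

```python
from typing import Dict, List, Tuple


def junction_sequences(genome: str, exons: List[Tuple[int, int]],
                       read_length: int) -> Dict[Tuple[int, int], str]:
    """Build artificial junction sequences for every ordered exon pair (i < j).

    Non-adjacent pairs represent exon-skipping junctions. Exons shorter than
    read_length - 1 simply contribute their whole sequence.
    """
    flank = read_length - 1
    junctions = {}
    for i, (up_start, up_end) in enumerate(exons):
        for j in range(i + 1, len(exons)):
            down_start, down_end = exons[j]
            left = genome[max(up_start, up_end - flank):up_end]
            right = genome[down_start:min(down_end, down_start + flank)]
            junctions[(i, j)] = left + right
    return junctions


def percent_spliced_in(upstream_inclusion: int, downstream_inclusion: int, skipping: int) -> float:
    """Estimate the fraction of transcripts that include a cassette exon.

    Inclusion is supported by two junctions and skipping by one, so the two
    inclusion-junction counts are averaged before forming the ratio.
    """
    inclusion = (upstream_inclusion + downstream_inclusion) / 2
    total = inclusion + skipping
    return inclusion / total if total else float("nan")


# Toy gene: three exons separated by 'N' placeholder introns.
genome = "AAAAA" + "NNNNN" + "CCCCC" + "NNNNN" + "GGGGG"
exons = [(0, 5), (10, 15), (20, 25)]
print(junction_sequences(genome, exons, read_length=4))
# {(0, 1): 'AAACCC', (0, 2): 'AAAGGG', (1, 2): 'CCCGGG'}
print(percent_spliced_in(30, 10, 20))  # 0.5
```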

Improvements to current sequencing technologies now enable longer read lengths, allowing better mapping of reads to alternatively spliced exons. This improvement comes from being able to partition the reads into multiple pieces and to align each piece independently to the genome. In addition, paired-end approaches now provide sequence information from two points in a transcript that are separated by an estimated distance. As a result, it is now possible to search for splicing patterns without prior knowledge of transcript annotations45,46 (FIG. 1). Examination of splicing patterns and transcript connectivity in an unbiased and genome-wide manner requires full-length transcript sequences, which may be obtainable in the future with emerging technologies47,48.
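The read-partitioning idea can be illustrated with a deliberately naive sketch: split a read into pieces, place each piece by exact string search, and treat a gap between consecutive placed pieces as a candidate intron. Real spliced aligners index the genome, tolerate sequencing errors and resolve repeated matches. This toy version does none of that (it takes only the first exact match), and the function names are hypothetical.

```python
from typing import List, Tuple


def place_pieces(read: str, genome: str, n_pieces: int = 2) -> List[Tuple[str, int]]:
    """Split read into n_pieces and return (piece, first exact match position or -1)."""
    size = len(read) // n_pieces
    pieces = [read[k * size:(k + 1) * size] for k in range(n_pieces - 1)]
    pieces.append(read[(n_pieces - 1) * size:])  # last piece takes any remainder
    return [(piece, genome.find(piece)) for piece in pieces]


def candidate_introns(placements: List[Tuple[str, int]]) -> List[Tuple[int, int]]:
    """Gaps between consecutive placed pieces, as 0-based half-open intervals."""
    introns = []
    for (piece1, pos1), (_, pos2) in zip(placements, placements[1:]):
        if pos1 < 0 or pos2 < 0:
            continue  # a piece that could not be placed gives no evidence
        end1 = pos1 + len(piece1)
        if pos2 > end1:
            introns.append((end1, pos2))
    return introns


# Toy locus: exon 1, an intron with canonical GT...AG ends, exon 2.
genome = "GATTACA" + "GTAAG" + "T" * 7 + "AG" + "CCGGTT"
placements = place_pieces("TACACCGG", genome)
print(placements)                     # [('TACA', 3), ('CCGG', 21)]
introns = candidate_introns(placements)
print(introns)                        # [(7, 21)]
start, end = introns[0]
print(genome[start:start + 2], genome[end - 2:end])  # GT AG
```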

Gene fusion detection


RNA-seq combined with computational analyses analogous to the ones described above for
splice-site detection can also be used to identify gene fusion events in disease tissues, which
has particular importance for cancer research49. Genomic DNA can be analysed with single-
read and paired-end-read strategies for the detection of translocations and other genomic
rearrangements50. However, RNA-seq may be preferable for identifying events that produce
aberrant RNA species and therefore have a higher likelihood of being functional or causal in
biological or disease settings51,52 (FIG. 2). Furthermore, genomic DNA-based approaches
cannot identify fusion events that are due to non-genomic factors, such as trans-splicing53
and read-through events between adjacent transcripts51,54. Paired-end RNA-seq can be
particularly advantageous for fusion identification because of the increased physical
coverage it offers. This approach has led to important biological findings in oncology55,56,
offering potential targets for therapeutic modulation.
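The paired-end evidence for a fusion can be sketched as counting discordant pairs, that is, read pairs whose two mates are assigned to different genes. The sketch assumes gene assignment has already been done and uses hypothetical gene names. The `min_support` threshold is an illustrative guard against isolated chimeric artefacts, not a recommended value, and it would not by itself distinguish true fusions from read-through transcripts between adjacent genes.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def fusion_candidates(mate_genes: Iterable[Tuple[Optional[str], Optional[str]]],
                      min_support: int = 2) -> Dict[Tuple[str, str], int]:
    """Count discordant read pairs supporting each unordered gene pair."""
    support: Counter = Counter()
    for gene1, gene2 in mate_genes:
        if gene1 is None or gene2 is None or gene1 == gene2:
            continue  # unassigned mate, or a concordant pair within one gene
        support[tuple(sorted((gene1, gene2)))] += 1  # pool A-B with B-A
    return {pair: n for pair, n in support.items() if n >= min_support}


# Hypothetical gene assignments for the two mates of each read pair.
pairs = ([("GENE_A", "GENE_B")] * 3 + [("GENE_B", "GENE_A")]
         + [("GENE_C", "GENE_C")] * 10 + [("GENE_D", "GENE_E"), ("GENE_A", None)])
print(fusion_candidates(pairs))  # {('GENE_A', 'GENE_B'): 4}
```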

The challenges faced in fusion detection generally parallel those for alternative splicing detection. In addition, RNA-seq-based analyses cannot detect fusion events that involve the exchange of the promoter of one gene with the coding sequence of another gene, because the resulting transcript may contain little or no sequence from the first gene. Furthermore, RNA-seq data include chimeric cDNA artefacts that are generated by template switching during reverse transcription and amplification57 (discussed below), leading to false positives in gene fusion identification. These difficulties may be partly alleviated when long-read RNA sequencing technologies with sufficient throughput and sequencing performance become available4.

Targeted approaches using RNA-seq


Despite the increasing capabilities of NGS in terms of throughput and decreasing costs per
data point, the expenditure necessary to obtain sufficient sequencing coverage for several
research and potential clinical applications is still prohibitive. Such applications include the
characterization of low-abundance transcripts and genotyping to determine, for example,
which alleles of the transcripts might be differentially expressed. In these scenarios, it may
be preferable to enrich for the desired subset of transcripts, to minimize the overall cost of
sequencing and maximize the number of samples that can be analysed.

Target-enrichment strategies were originally developed for genomic DNA resequencing4,58. Many of these technologies have been used to capture the human exome from genomic DNA, given that a large fraction of disease-causing mutations are likely to be located in the protein-coding transcriptome. RNA-seq of poly(A)+ RNA species offers a natural route to sequencing the expressed portion of the exome without the use of enrichment strategies. The potential suitability of mRNA-seq data for the identification of nucleotide variations has been demonstrated recently by several studies59-61. However, these studies also underscored some challenges, for example the high sequencing depth required to sufficiently cover low-abundance transcripts.

Slight modifications of the genomic DNA-enrichment strategies for cDNA applications have
allowed the development of targeted RNA-seq (FIG. 3). Targeted RNA-seq approaches have
been used to detect fusion transcripts, allele-specific expression, mutations and RNA-editing
events in a subset of transcripts62-64. Targeted RNA-seq strategies currently require longer
sample preparation steps and higher input RNA and cDNA quantities than do other RNA-
seq approaches, owing to the additional probe or microarray preparation and target-selection
steps. Furthermore, capture efficiency usually differs between target regions depending on
hybridization efficiency and other factors. Simplification of this process and improvements
in capture efficiency are desirable for better experimental outcomes.
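Two quantities commonly used to judge a capture experiment are the on-target fraction of reads and the spread of read counts across targets; the latter reflects the uneven capture efficiency mentioned above. The sketch below computes both with a brute-force interval-overlap test. The data structures and names are hypothetical, and a real pipeline would use an interval index for speed.

```python
from typing import Dict, List, Tuple

Read = Tuple[str, int, int]                   # (chrom, start, end), 0-based half-open
Targets = Dict[str, List[Tuple[int, int]]]    # chrom -> list of (start, end)


def overlaps(start: int, end: int, t_start: int, t_end: int) -> bool:
    """True if two half-open intervals share at least one base."""
    return start < t_end and t_start < end


def on_target_fraction(reads: List[Read], targets: Targets) -> float:
    """Fraction of reads overlapping any target interval."""
    if not reads:
        return 0.0
    hits = sum(1 for chrom, s, e in reads
               if any(overlaps(s, e, ts, te) for ts, te in targets.get(chrom, [])))
    return hits / len(reads)


def reads_per_target(reads: List[Read], targets: Targets) -> Dict[Tuple[str, int, int], int]:
    """Number of reads overlapping each target interval."""
    return {(chrom, ts, te): sum(1 for c, s, e in reads if c == chrom and overlaps(s, e, ts, te))
            for chrom, intervals in targets.items() for ts, te in intervals}


targets = {"chr1": [(100, 200), (300, 400)]}
reads = [("chr1", 150, 175), ("chr1", 190, 215), ("chr1", 350, 375),
         ("chr1", 500, 525), ("chr2", 100, 125)]
print(on_target_fraction(reads, targets))  # 0.6
print(reads_per_target(reads, targets))    # {('chr1', 100, 200): 2, ('chr1', 300, 400): 1}
```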

Small RNA profiling


The impact of NGS technologies on sRNA discovery and characterization has been
particularly noteworthy. These studies have been reviewed extensively by others (for
example, see REF. 65), so we do not review this topic in depth here but provide a brief
summary for completeness.

Most initial sRNA-discovery studies used pyrosequencing66,67. Subsequently, the use of other NGS platforms with higher throughput has resulted in genome-wide surveys and the discovery of an ever-growing number of sRNA species15,68,69. Sample preparation strategies developed for ‘longer’ RNAs (>200 nucleotides) are not suitable for sRNAs: for example, reverse transcription of short RNA species with random priming yields even shorter cDNAs that are not long enough for efficient alignment. Modified preparation strategies were therefore developed for sRNAs70-72.

One important limitation of the current RNA-seq-based approaches for studying sRNAs is their inability to provide an absolutely quantitative view of these transcripts. It has recently become clear that, although NGS-based sRNA-profiling approaches can be used for differential expression analyses, the number of reads obtained per sRNA does not necessarily correlate with its actual abundance73,74. This discrepancy seems to be due to biases that are introduced during the sample preparation and sequencing steps. Whether emerging technologies can improve sRNA quantification remains to be seen.
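One way to see the discrepancy between read count and abundance is to sequence a pool in which every synthetic sRNA is present in equal amounts; without bias, every sRNA would receive roughly the same number of reads. The sketch below summarizes such an experiment as log2 deviations from the median count and as the fraction of sRNAs within two-fold of the median. The pool, counts and names are hypothetical, and the two-fold window is an arbitrary illustrative threshold.

```python
import math
import statistics
from typing import Dict, Tuple


def equimolar_bias(read_counts: Dict[str, int]) -> Tuple[Dict[str, float], float]:
    """Summarize representation bias in an equimolar synthetic sRNA pool.

    Returns (log2 deviation of each count from the median count,
             fraction of sRNAs within two-fold of the median).
    """
    median = statistics.median(read_counts.values())
    if median <= 0:
        raise ValueError("median read count must be positive")
    deviations = {name: (math.log2(n / median) if n > 0 else float("-inf"))
                  for name, n in read_counts.items()}
    within_two_fold = sum(1 for d in deviations.values() if abs(d) <= 1) / len(deviations)
    return deviations, within_two_fold


counts = {"srna_1": 100, "srna_2": 400, "srna_3": 25, "srna_4": 200, "srna_5": 100}
deviations, within = equimolar_bias(counts)
print(deviations)  # {'srna_1': 0.0, 'srna_2': 2.0, 'srna_3': -2.0, 'srna_4': 1.0, 'srna_5': 0.0}
print(within)      # 0.6
```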

Direct RNA sequencing


cDNA synthesis and other RNA manipulations limit some RNA-seq applications
As noted above, most current RNA-seq methods rely on cDNA synthesis and a range of subsequent manipulation steps, which limits their usefulness for some applications. For example, as we have discussed, the generation of spurious second-strand cDNAs can present difficulties for strand-specific RNA-seq. Strand-specific libraries can be prepared to avoid this problem (discussed above), but the approaches that use RNA–RNA ligation are laborious. Another limitation imposed by cDNA synthesis is template switching75-77. During reverse transcription, the nascent cDNA can sometimes dissociate from the template RNA and re-anneal to a different stretch of RNA with a sequence similar to that of the initial template, generating artefactual chimeric cDNAs. Template switching may cause problems in the identification of exon–intron boundaries and of true chimeric transcripts. Reverse transcriptases can also synthesize cDNA in a primer-independent manner, which is thought to be caused by self-priming arising from RNA secondary structure; this leads to cDNA synthesis from unintended, effectively random positions. Furthermore, reverse transcriptases have lower fidelity than other polymerases owing to their lack of proofreading mechanisms78,79, and their efficiency of RNA-to-cDNA conversion varies with the experimental conditions.

In addition to their requirement for cDNA synthesis, current RNA-seq approaches can present other difficulties. First, RNA-seq coverage across transcripts tends to be non-uniform, which may result from biases introduced during various steps, such as priming with random hexamers80,81, cDNA synthesis, ligation31,32, amplification35 and sequencing33-35,82. Second, commonly used RNA-seq strategies can result in transcript-length bias because of the multiple fragmentation and RNA or cDNA size-selection steps they use83. This bias may complicate downstream analyses84. Third, quantification of transcripts with RNA-seq requires consideration of read-mapping uncertainty (owing to sequencing error rates, repetitive elements, incomplete genome sequence and inaccuracies in transcript annotations)85 and normalization of the number of reads mapping to each transcript based on transcript length. Despite improvements in sequencing methods and bioinformatics advances allowing de novo construction of transcriptomes86,87, the existing approaches are often not sufficient to detect certain transcripts and/or to cover their entire length. Together with the uncertainty regarding transcript boundaries and length that arises from events such as alternative splicing, alternative polyadenylation and alternative promoter usage, this makes the required length-normalization step a potential source of errors for quantitative applications. Fourth, RNA-seq strategies often involve a poly(A)+ mRNA-enrichment step. Because polyadenylation of transcripts also takes place during transcript degradation, poly(A)+-enrichment steps may also enrich for degradation products of RNA polymerase I transcripts and other RNAs88.
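The length-normalization step referred to above is often expressed as reads per kilobase of transcript per million mapped reads (RPKM): RPKM = reads × 10^9 / (transcript length in bp × total mapped reads). The sketch below is a standard formulation included for illustration; transcript names, lengths and counts are hypothetical.

```python
from typing import Dict, Optional


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int],
         total_mapped: Optional[int] = None) -> Dict[str, float]:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = count * 1e9 / (length_bp * total_mapped). If total_mapped is not
    given, the sum of the supplied counts is used as a stand-in.
    """
    total = total_mapped if total_mapped is not None else sum(counts.values())
    return {tx: counts[tx] * 1e9 / (lengths_bp[tx] * total) for tx in counts}


counts = {"tx1": 100, "tx2": 100}
lengths = {"tx1": 1000, "tx2": 4000}
print(rpkm(counts, lengths, total_mapped=1_000_000))  # {'tx1': 100.0, 'tx2': 25.0}
# If tx2's expressed isoform were really 2,000 bp, its RPKM would be 50.0,
# illustrating how an error in the assumed length propagates directly.
print(rpkm(counts, {"tx1": 1000, "tx2": 2000}, total_mapped=1_000_000)["tx2"])  # 50.0
```

Because the assumed length enters the denominator directly, any uncertainty about which isoform, polyadenylation site or promoter is in use translates one-for-one into the estimate.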

Direct sequencing of RNAs


The limitations of current RNA-seq approaches discussed above might be at least partly
alleviated by emerging RNA analysis technologies, including DRS, that substantially alter
the method of RNA characterization. DRS currently requires single-molecule sequencing
capabilities, as the amplification of RNA molecules directly without cDNA conversion has
not been examined. Although RNA-dependent RNA polymerases do exist89, the extent to
which they can be adapted to the amplification-based next-generation sequencing
technologies is unknown at present.

The first massively parallel DRS approach was recently developed using the Helicos single-
molecule sequencing platform7,90,91 (FIG. 4). It relies on hybridization of several
femtomoles of 3′-polyadenylated RNA templates to single channels of poly(dT)-coated
sequencing surfaces, followed by sequencing by synthesis. This approach can select and
sequence poly(A)+ RNA from total RNA or cellular lysates, with sequence data being
derived from regions immediately upstream of the polyadenylation sites7. Thus, the
technology offers a path to obtain gene expression profiles and map polyadenylation sites in
a quantitative and genome-wide manner. RNA species that lack natural poly(A) tails can be
polyadenylated in vitro and analysed with DRS.
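Because DRS reads originate immediately upstream of the poly(A) tail, converting aligned reads into candidate polyadenylation sites is largely a matter of bookkeeping on strands and coordinates. The sketch below assumes, as a labelled simplification, that each read is the reverse complement of the RNA and that its first base pairs with the last templated base before the poly(A) tail. It does not filter out reads captured at genomically encoded A-rich stretches, which can mimic poly(A) tails. Names and alignments are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def polya_site(start: int, end: int, read_strand: str) -> Tuple[str, int]:
    """Return (rna_strand, genomic position of the RNA's last templated base).

    Assumes the read is the reverse complement of the RNA and begins at the base
    just upstream of the poly(A) tail. start/end are 0-based, end exclusive.
    """
    if read_strand == "-":
        # Read aligns to '-', so the RNA is on '+'; the read's first base is the
        # rightmost aligned genomic base, which is the RNA 3' end.
        return "+", end - 1
    # Read aligns to '+', so the RNA is on '-'; its 3' end is the leftmost base.
    return "-", start


def polya_site_counts(alignments: Iterable[Tuple[str, int, int, str]]) -> Counter:
    """Count reads per (chrom, rna_strand, site) from (chrom, start, end, read_strand)."""
    counts: Counter = Counter()
    for chrom, start, end, read_strand in alignments:
        rna_strand, site = polya_site(start, end, read_strand)
        counts[(chrom, rna_strand, site)] += 1
    return counts


alns = [("chr1", 970, 1000, "-"), ("chr1", 975, 1000, "-"), ("chr2", 200, 230, "+")]
print(polya_site_counts(alns))
# Counter({('chr1', '+', 999): 2, ('chr2', '-', 200): 1})
```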

Because DRS approaches are free from cDNA synthesis artefacts such as template switching and spurious second-strand synthesis, they offer potential improvements for applications such as the surveying of strand-specific transcription. Furthermore, DRS requires only femtomole or attomole levels of input RNA, depending on the application, and involves relatively simple sample preparation. DRS-type technologies may therefore be advantageous for applications that are challenging for current cDNA-based methodologies, such as experiments that yield subnanogram levels of RNA (discussed below), archival specimens or short RNA species, which cannot be easily converted to cDNA. Moreover, unlike cDNA-based approaches, which require different strategies for the analysis of short and longer RNA species, DRS sample preparation involving polyadenylation can be applied to any RNA species, thus allowing both short and long RNAs to be observed in a single experiment. DRS may in the future also simplify targeted RNA-seq by enabling the integration of target selection and sequencing steps (FIG. 3d). Such integration may reduce sample preparation to nucleic acid fragmentation alone, and may minimize costs as well as the quantity of input nucleic acid required.
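To put femtomole, attomole and picogram input amounts on a common scale, it can help to convert between mass, moles and molecule counts. The sketch below assumes an average single-stranded RNA molar mass of roughly 320.5 g/mol per nucleotide plus 159 g/mol, an approximation commonly used by nucleic acid mass calculators; the transcript length and sample mass are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def rna_molar_mass(length_nt: int) -> float:
    """Approximate molar mass (g/mol) of a single-stranded RNA.

    Assumes ~320.5 g/mol per nucleotide plus ~159 g/mol; actual values
    depend on base composition and end chemistry.
    """
    return length_nt * 320.5 + 159.0


def moles_from_mass(mass_g: float, length_nt: int) -> float:
    """Moles of RNA of the given length contained in mass_g grams."""
    return mass_g / rna_molar_mass(length_nt)


def molecules_from_moles(moles: float) -> float:
    """Number of molecules in the given amount of substance."""
    return moles * AVOGADRO


# 1 attomole (1e-18 mol) is about 6.0e5 molecules; 1 femtomole about 6.0e8.
print(f"{molecules_from_moles(1e-18):.2e}")  # 6.02e+05

# Hypothetical sample: 10 pg of a 2,000-nt transcript.
moles = moles_from_mass(10e-12, 2000)
print(f"{moles / 1e-18:.1f} amol, {molecules_from_moles(moles):.2e} molecules")
# 15.6 amol, 9.39e+06 molecules
```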

A key challenge for DRS is to generate the multimillion-level read quantities that are required for many RNA applications, particularly quantification, and to further reduce error rates and input RNA quantities through alterations to the sequencing chemistry and template-capture steps. DRS may also not solve all of the RNA-seq limitations listed above, including, for example, the capture of degradation products during poly(A)+ RNA selection. Furthermore, the combination of DRS with paired-end approaches and longer read lengths is needed for various applications discussed above, including studies focusing on the identification of the 5′ boundaries (for example, CAGE-type TSS mapping) and 3′ boundaries of RNA species.

Profiling low-quantity RNA samples


Biological specimens (such as tissue and body fluids) are generally heterogeneous, being
complex mixtures of multiple cell types. The need to select and study particular cells is
clear, but the implementation of this task is not straightforward. Several tools now allow
selection of specific cell types, such as fluorescence-activated cell sorting (FACS),
laser-capture microdissection (LCM)92, serial dilution, specialized microfluidic devices93
and micromanipulation. In addition, methods for high-quality RNA isolation from small
quantities of cells are available. The main limitation preventing reliable, global
profiling of minute RNA quantities has been the incompatibility of high-throughput RNA
profiling approaches with low-quantity RNA samples. The absence of suitable methods has
slowed progress in a range of areas, such as forensics, stem cell biology, metagenomics
and plant biology. The effects of this limitation are perhaps most acutely felt in research into
cancer and other diseases, as samples obtained from patients are generally limited in
quantity; the translation of findings from molecular profiling studies into technologies
for clinical research and molecular diagnostics is being held back, slowing progress
towards personalized medicine. Strategies that can provide a comprehensive and
bias-free view of transcriptomes from picogram quantities of input RNA would therefore
stimulate great advances in a range of areas.

Methods for small quantities of RNA


The analysis of low-quantity RNA samples with global microarray and sequencing
technologies has traditionally required one or more amplification steps to obtain sufficient
nucleic acid material for subsequent detection. Since the early 1990s, several nucleic acid
amplification strategies for low-quantity RNA applications have been developed, such as
ligation-mediated PCR94, multiple displacement amplification (MDA)95, single-primer
isothermal amplification96 and in vitro transcription (IVT)-based linear amplification97. The
ideal amplification method should provide accurate sequences with a low or zero error rate,
be reproducible, produce enough amplification to yield the quantities of nucleic acid
needed, be applicable to nucleic acids from a wide array of species, and preserve the
relative representation of the distinct RNA molecules in the original sample. To what extent
current methods meet these criteria is not clear. Studies performed with microarray-based
measurements suggest that amplification introduces variability and discrepancies, especially
for middle- and low-abundance transcripts, and that these problems worsen as the input RNA
quantity is lowered98.

Sequencing-based low-quantity RNA profiling is relatively new. A recently reported
mRNA-seq method relies on two PCR amplification steps and can be used to profile the
transcriptomes of single oocytes40. However, the reproducibility of such low-quantity
RNA-seq approaches may be reduced by stochastic amplification biases, which can cause
some RNA species to drop out and others to be preferentially amplified23. Such outcomes
can lead to, for instance, duplicate reads and reduced quantification power.
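The drop-out and duplicate-read effects described above can be illustrated with a toy simulation. The sketch below is a simplified model of our own, not the protocol of refs 23 or 40: each molecule is copied with a fixed probability in every PCR cycle (a simple branching process), and sequencing reads are then drawn from the amplified pool in proportion to each species' copy number. The species names, starting copy numbers, efficiency and read count are all hypothetical.

```python
import random
from collections import Counter


def amplify(copies: int, cycles: int, efficiency: float, rng: random.Random) -> int:
    """Toy PCR: in every cycle each existing molecule is copied with probability `efficiency`."""
    for _ in range(cycles):
        copies += sum(1 for _ in range(copies) if rng.random() < efficiency)
    return copies


def simulate(start_counts, cycles=12, efficiency=0.8, n_reads=200, seed=0):
    """Amplify each species independently, then sample reads from the pooled product.

    Returns (amplified copy numbers, read counts per species). A species that is
    never sampled is absent from the read Counter, i.e. it has dropped out.
    """
    rng = random.Random(seed)
    pool = {name: amplify(n, cycles, efficiency, rng) for name, n in start_counts.items()}
    names = list(pool)
    reads = rng.choices(names, weights=[pool[name] for name in names], k=n_reads)
    return pool, Counter(reads)


if __name__ == "__main__":
    start = {"rare": 1, "medium": 10, "abundant": 100}  # hypothetical starting molecules
    for seed in range(3):
        pool, reads = simulate(start, seed=seed)
        # Compare read fractions with the true input fractions (1/111, 10/111, 100/111).
        print(seed, dict(reads))
```

Because 200 reads are drawn from only 111 starting molecules, at least 89 of them must be duplicates of an already-sequenced original molecule. The single-copy species is expected to receive only about two reads, so its count can swing between runs and may fall to zero.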

Emerging technologies
A number of both hybridization- and sequencing-based technologies are now emerging that
may allow reliable transcriptome profiles to be obtained from minute cell quantities. On the
sequencing side, nanoCAGE12 now allows TSS mapping from 10 nanograms of total RNA
through the use of various amplification strategies. Amplification-free RNA-seq approaches
that minimize the quantity of input RNA required have also recently been developed. One
approach involves the sequencing of first-strand cDNA products from as little as ~500
picograms of RNA, with priming carried out in solution with oligo-dT or random
hexamers24,25. Another approach uses poly(dT) primers on sequencing surfaces to select
poly(A)+ mRNA from cellular lysates, followed by on-surface first-strand cDNA synthesis
and sequencing26. This approach allows reproducible gene expression profiles to be
obtained from ~1,000 cells and eliminates RNA loss during RNA isolation, which may be
particularly important as the input cell quantity is reduced. As described above, DRS
eliminates the cDNA synthesis stage and requires only a few femtomoles of RNA molecules
that carry natural poly(A) tails or have been polyadenylated in vitro. It is also conceivable
that microfluidic capabilities could be combined with DRS for single-cell applications
(FIG. 5a).
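Input amounts in this section are quoted both as moles (femtomoles, attomoles) and as masses (picograms, nanograms), and a rough conversion to molecule counts makes them easier to compare. The sketch below assumes the common rule of thumb for single-stranded RNA of ~320.5 g/mol per nucleotide plus ~159 g/mol for the 5′ triphosphate; the 1,500-nucleotide mean transcript length in the example is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)


def rna_molar_mass(length_nt: int) -> float:
    """Approximate molar mass (g/mol) of a single-stranded RNA of the given length.

    Rule of thumb: ~320.5 g/mol per nucleotide + ~159 g/mol for the 5' triphosphate;
    real values depend on base composition and end chemistry.
    """
    return length_nt * 320.5 + 159.0


def moles_to_molecules(moles: float) -> float:
    """Convert an amount in moles to a number of molecules."""
    return moles * AVOGADRO


def mass_to_molecules(mass_g: float, length_nt: int) -> float:
    """Convert a mass of RNA (grams) with a given mean length to a molecule count."""
    return mass_g / rna_molar_mass(length_nt) * AVOGADRO


if __name__ == "__main__":
    print(f"1 fmol ~ {moles_to_molecules(1e-15):.2e} molecules")
    print(f"1 amol ~ {moles_to_molecules(1e-18):.2e} molecules")
    # ~500 pg of RNA, assuming a hypothetical mean length of 1,500 nt
    print(f"500 pg ~ {mass_to_molecules(500e-12, 1500):.2e} molecules")
```

Under these assumptions, one femtomole is about 6 × 10^8 molecules, and ~500 pg of 1,500-nt RNA works out to roughly the same number (about one femtomole).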

Hybridization-based methodologies also show promise for working with very small
quantities of RNA. The NanoString nCounter System provides an alternative method for
RNA quantification that does not require cDNA synthesis; instead, it relies on
target-specific probes (FIG. 5b). The probe mixture is hybridized to RNA samples in
solution, after which the probe–RNA duplexes are immobilized on surfaces and
single-molecule imaging is used to identify and count individual transcripts99. In principle,
the system can detect up to 16,384 transcripts simultaneously. This approach requires ~100
nanograms of RNA or 2,000–5,000 cells100, but optimization of the probe hybridization and
surface immobilization steps may further reduce the input RNA quantity.
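Because each target carries a distinct reporter barcode, quantification in an nCounter-style assay reduces to decoding the barcode of each imaged molecule and tallying the decoded targets. The sketch below is hypothetical: the colour-string barcodes, the codebook and the gene names are invented for illustration, and it ignores the image processing and quality filtering that a real instrument performs.

```python
from collections import Counter

# Hypothetical codebook: ordered colour strings (B=blue, G=green, Y=yellow, R=red)
# mapped to target names. This is not an actual NanoString probe design.
CODEBOOK = {
    "BGYRBG": "GENE_A",
    "RYGBRY": "GENE_B",
    "GGBYRR": "GENE_C",
}


def count_transcripts(observed_barcodes, codebook):
    """Tally decoded barcodes per target.

    Barcodes absent from the codebook (e.g. misread or partially imaged spots)
    are counted separately as unassigned rather than attributed to any target.
    """
    counts = Counter()
    unassigned = 0
    for code in observed_barcodes:
        target = codebook.get(code)
        if target is None:
            unassigned += 1
        else:
            counts[target] += 1
    return counts, unassigned


if __name__ == "__main__":
    spots = ["BGYRBG", "RYGBRY", "BGYRBG", "BGYRXX", "GGBYRR", "BGYRBG"]
    counts, unassigned = count_transcripts(spots, CODEBOOK)
    print(counts, unassigned)  # GENE_A is seen 3 times; one spot is unassigned
```

Each decoded molecule contributes exactly one count, which is what makes this kind of readout digital rather than intensity-based.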

Fluidigm offers a microfluidics platform that can perform multiplexed quantitative
real-time polymerase chain reaction (qRT-PCR) experiments on gene panels and has been
used to profile single cells. Commercial kits that combine cDNA synthesis and amplification
in one step are used for cell lysis, cDNA synthesis and PCR amplification of the transcript
regions of interest. The pre-amplified cDNAs are then loaded onto the Fluidigm Dynamic Array
for qRT-PCR analysis. This approach may be useful for determining the expression levels
of a subset of transcripts across cells of interest101,102.
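The text does not specify how expression levels are derived from these qRT-PCR measurements. One widely used option for relative quantification is the comparative Ct (2^-ΔΔCt) method, sketched below as a swapped-in illustration rather than the analysis used in refs 101 or 102. It assumes near-100% and equal amplification efficiencies for the target and reference assays, and the Ct values in the example are invented.

```python
def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_cal: float, ct_ref_cal: float) -> float:
    """Comparative Ct (2^-ddCt) fold change of a target gene in a sample relative
    to a calibrator (e.g. a control cell), normalized to a reference gene.

    Assumes both assays amplify with ~100% efficiency, i.e. template doubles each cycle.
    """
    delta_sample = ct_target - ct_ref              # normalize sample to reference gene
    delta_calibrator = ct_target_cal - ct_ref_cal  # same normalization for calibrator
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values for one target in a single cell vs a control cell.
    fold = relative_expression(ct_target=24.0, ct_ref=18.0,
                               ct_target_cal=26.0, ct_ref_cal=18.0)
    print(fold)  # 4.0: the target is ~4-fold higher than in the calibrator
```

A lower Ct means more starting template, so a ΔΔCt of −2 corresponds to about four-fold higher expression under the doubling assumption.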

None of the approaches described above is mature, and none so far fully addresses our need
for reliable, genome-wide and in-depth transcriptome profiles from minute cell quantities.
For example, both the Fluidigm and NanoString technologies interrogate only a selected
subset of transcripts and do not provide comprehensive analyses. However, it is hoped that
future advances built on the foundation formed by these technologies will enable such
capabilities.

Future perspectives
Recent advances in RNA-seq have provided researchers with a powerful toolbox for the
characterization and quantification of the transcriptome. Emerging sequencing technologies
promise to at least partly alleviate the difficulties of current RNA-seq methods and to equip
scientists with better tools. Using these technological advances, we can build a complete
catalogue of transcripts derived from genomes ranging from those of simple unicellular
organisms to those of complex mammalian cells, and from tissues in normal and disease
states. Furthermore, with our increasing ability to work with minute RNA quantities from
fresh and formalin-fixed paraffin-embedded tissues and cells, and to quantify RNA species
from even single cells, we have the opportunity to define complex biological networks in a
wide range of biological specimens. With these networks in hand, we can use data-driven
RNA network models of cells and tissues in an attempt to fully understand the biological
pathways that are active in various physiological conditions. In addition, these technologies
are bringing us closer to the ability to use RNA measurements for clinical diagnostics. For
example, analysis of circulating extracellular nucleic acids103 and cells, such as fetal RNA
and circulating tumour cells, with these new technologies may allow earlier assessment of
health, disease recurrence or mutational status. Thus, these technologies will continue to
help us realize the full potential of genomic information, both for basic biological questions
of differentiation and diversity and for its growing impact on the personalization of
healthcare.

Acknowledgments
We apologize to authors whose work could not be cited owing to space constraints. We are grateful to the US
National Human Genome Research Institute for their support (grants R01 HG005230 and R44 HG005279).

References
1. Birney E, et al. Identification and analysis of functional elements in 1% of the human genome by the
ENCODE pilot project. Nature 2007;447:799–816. [PubMed: 17571346]
2. Berretta J, Morillon A. Pervasive transcription constitutes a new level of eukaryotic genome
regulation. EMBO Rep 2009;10:973–982. [PubMed: 19680288]
3. Kapranov P, Willingham AT, Gingeras TR. Genome-wide transcription and the implications for
genomic organization. Nature Rev. Genet 2007;8:413–423. [PubMed: 17486121]
4. Metzker ML. Sequencing technologies — the next generation. Nature Rev. Genet 2010;11:31–46.
[PubMed: 19997069] This Review provides a comprehensive overview of currently available and
in-development NGS technologies.
5. Wang Z, Gerstein M, Snyder M. RNA-seq: a revolutionary tool for transcriptomics. Nature Rev.
Genet 2009;10:57–63. [PubMed: 19015660]
6. van Vliet AH. Next generation sequencing of microbial transcriptomes: challenges and
opportunities. FEMS Microbiol. Lett 2010;302:1–7. [PubMed: 19735299]
7. Ozsolak F, et al. Direct RNA sequencing. Nature 2009;461:814–818. [PubMed: 19776739] The first
technology for high-throughput direct sequencing of RNA molecules without prior reverse
transcription.
8. Carninci P, et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nature
Genet 2006;38:626–635. [PubMed: 16645617]
9. Shiraki T, et al. Cap analysis gene expression for high-throughput analysis of transcriptional starting
point and identification of promoter usage. Proc. Natl Acad. Sci. USA 2003;100:15776–15781.
[PubMed: 14663149]
10. Valen E, et al. Genome-wide detection and analysis of hippocampus core promoters using
DeepCAGE. Genome Res 2009;19:255–265. [PubMed: 19074369]
11. Ni T, et al. A paired-end sequencing strategy to map the complex landscape of transcription
initiation. Nature Methods 2010;7:521–527. [PubMed: 20495556]
12. Plessy C, et al. Linking promoters to functional transcripts in small samples with nanoCAGE and
CAGEscan. Nature Methods 2010;7:528–534. [PubMed: 20543846]
13. Marson A, et al. Connecting microRNA genes to the core transcriptional regulatory circuitry of
embryonic stem cells. Cell 2008;134:521–533. [PubMed: 18692474]
14. Ozsolak F, et al. Chromatin structure analyses identify miRNA promoters. Genes Dev
2008;22:3172–3183. [PubMed: 19056895]
15. Affymetrix/Cold Spring Harbor Laboratory ENCODE Transcriptome Project. Post-transcriptional
processing generates a diversity of 5′-modified long and short RNAs. Nature 2009;457:1028–
1032. [PubMed: 19169241] This paper raises the possibility of 5′-cap addition during post-
transcriptional processing steps.
16. Faghihi MA, Wahlestedt C. Regulatory roles of natural antisense transcripts. Nature Rev. Mol. Cell
Biol 2009;10:637–643. [PubMed: 19638999] An excellent review of the literature on sense and
antisense transcription.
17. Gubler U. Second-strand cDNA synthesis: mRNA fragments as primers. Meth. Enzymol
1987;152:330–335. [PubMed: 3309563]
18. Perocchi F, Xu Z, Clauder-Munster S, Steinmetz LM. Antisense artifacts in transcriptome
microarray experiments are resolved by actinomycin D. Nucleic Acids Res 2007;35:e128.
[PubMed: 17897965]
19. Spiegelman S, et al. DNA-directed DNA polymerase activity in oncogenic RNA viruses. Nature
1970;227:1029–1031. [PubMed: 4317810]
20. Wu JQ, et al. Systematic analysis of transcribed loci in ENCODE regions using RACE sequencing
reveals extensive transcription in the human genome. Genome Biol 2008;9:R3. [PubMed:
18173853]
21. Cloonan N, et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nature
Methods 2008;5:613–619. [PubMed: 18516046]
22. Core LJ, Waterfall JJ, Lis JT. Nascent RNA sequencing reveals widespread pausing and divergent
initiation at human promoters. Science 2008;322:1845–1848. [PubMed: 19056941]
23. Mamanova L, et al. FRT-seq: amplification-free, strand-specific transcriptome sequencing. Nature
Methods 2010;7:130–132. [PubMed: 20081834]
24. Lipson D, et al. Quantification of the yeast transcriptome by single-molecule sequencing. Nature
Biotechnol 2009;27:652–658. [PubMed: 19581875]
25. Ozsolak F, et al. Digital transcriptome profiling from attomole-level RNA samples. Genome Res
2010;20:519–525. [PubMed: 20133332]
26. Ozsolak F, et al. Amplification-free digital gene expression profiling from minute cell quantities.
Nature Methods 2010;7:619–621. [PubMed: 20639869]
27. He Y, Vogelstein B, Velculescu VE, Papadopoulos N, Kinzler KW. The antisense transcriptomes
of human cells. Science 2008;322:1855–1857. [PubMed: 19056939]
28. Parkhomchuk D, et al. Transcriptome analysis by strand-specific sequencing of complementary
DNA. Nucleic Acids Res 2009;37:e123. [PubMed: 19620212]
29. Ingolia NT, Ghaemmaghami S, Newman JR, Weissman JS. Genome-wide analysis in vivo of
translation with nucleotide resolution using ribosome profiling. Science 2009;324:218–223.
[PubMed: 19213877]
30. Levin JZ, et al. Comprehensive comparative analysis of strand-specific RNA sequencing methods.
Nature Methods 2010;7:709–715. [PubMed: 20711195]
31. Faulhammer D, Lipton RJ, Landweber LF. Fidelity of enzymatic ligation for DNA computing. J.
Comput. Biol 2000;7:839–848. [PubMed: 11382365]
32. Housby JN, Southern EM. Fidelity of DNA ligation: a novel experimental approach based on the
polymerisation of libraries of oligonucleotides. Nucleic Acids Res 1998;26:4259–4266. [PubMed:
9722647]
33. Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in ultra-short read data sets
from high-throughput DNA sequencing. Nucleic Acids Res 2008;36:e105. [PubMed: 18660515]
34. Goren A, et al. Chromatin profiling by directly sequencing small quantities of immunoprecipitated
DNA. Nature Methods 2010;7:47–49. [PubMed: 19946276]
35. Kozarewa I, et al. Amplification-free Illumina sequencing-library preparation facilitates improved
mapping and assembly of (G+C)-biased genomes. Nature Methods 2009;6:291–295. [PubMed:
19287394]
36. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature
2010;463:457–463. [PubMed: 20110989]
37. Wang GS, Cooper TA. Splicing in disease: disruption of the splicing code and the decoding
machinery. Nature Rev. Genet 2007;8:749–761. [PubMed: 17726481]
38. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian
transcriptomes by RNA-seq. Nature Methods 2008;5:621–628. [PubMed: 18516045]
39. Sultan M, et al. A global view of gene activity and alternative splicing by deep sequencing of the
human transcriptome. Science 2008;321:956–960. [PubMed: 18599741]
40. Tang F, et al. mRNA-seq whole-transcriptome analysis of a single cell. Nature Methods
2009;6:377–382. [PubMed: 19349980]
41. Wang ET, et al. Alternative isoform regulation in human tissue transcriptomes. Nature
2008;456:470–476. [PubMed: 18978772]
42. Carninci P. Is sequencing enlightenment ending the dark age of the transcriptome? Nature Methods
2009;6:711–713. [PubMed: 19953680]
43. Jiang H, Wong WH. Statistical inferences for isoform expression in RNA-seq. Bioinformatics
2009;25:1026–1032. [PubMed: 19244387]
44. Richard H, et al. Prediction of alternative isoforms from exon expression levels in RNA-seq
experiments. Nucleic Acids Res 2010;38:e112. [PubMed: 20150413]
45. Ameur A, Wetterbom A, Feuk L, Gyllensten U. Global and unbiased detection of splice junctions
from RNA-seq data. Genome Biol 2010;11:R34. [PubMed: 20236510]
46. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-seq.
Bioinformatics 2009;25:1105–1111. [PubMed: 19289445]
47. Eid J, et al. Real-time DNA sequencing from single polymerase molecules. Science 2009;323:133–
138. [PubMed: 19023044]
48. Olasagasti F, et al. Replication of individual DNA molecules under electronic control using a
protein nanopore. Nature Nanotech 2010;5:798–806.
49. Mitelman F, Johansson B, Mertens F. The impact of translocations and gene fusions on cancer
causation. Nature Rev. Cancer 2007;7:233–245. [PubMed: 17361217]
50. Korbel JO, et al. Paired-end mapping reveals extensive structural variation in the human genome.
Science 2007;318:420–426. [PubMed: 17901297]
51. Maher CA, et al. Transcriptome sequencing to detect gene fusions in cancer. Nature 2009;458:97–
101. [PubMed: 19136943]
52. Zhao Q, et al. Transcriptome-guided characterization of genomic rearrangements in a breast cancer
cell line. Proc. Natl Acad. Sci. USA 2009;106:1886–1891. [PubMed: 19181860]
53. Li H, Wang J, Mor G, Sklar J. A neoplastic gene fusion mimics trans-splicing of RNAs in normal
human cells. Science 2008;321:1357–1361. [PubMed: 18772439]
54. Maher CA, et al. Chimeric transcript discovery by paired-end transcriptome sequencing. Proc. Natl
Acad. Sci. USA 2009;106:12353–12358. [PubMed: 19592507]
55. Berger MF, et al. Integrative analysis of the melanoma transcriptome. Genome Res 2010;20:413–
427. [PubMed: 20179022]
56. Palanisamy N, et al. Rearrangements of the RAF kinase pathway in prostate cancer, gastric cancer
and melanoma. Nature Med 2010;16:793–798. [PubMed: 20526349]
57. McManus CJ, Duff MO, Eipper-Mains J, Graveley BR. Global analysis of trans-splicing in
Drosophila. Proc. Natl Acad. Sci. USA 2010;107:12975–12979. [PubMed: 20615941]
58. Garber K. Fixing the front end. Nature Biotech 2008;26:1101–1104.
59. Chepelev I, Wei G, Tang Q, Zhao K. Detection of single nucleotide variations in expressed exons
of the human genome using RNA-seq. Nucleic Acids Res 2009;37:e106. [PubMed: 19528076]
60. Cirulli ET, et al. Screening the human exome: a comparison of whole genome and whole
transcriptome sequencing. Genome Biol 2010;11.
61. Shah SP, et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide
resolution. Nature 2009;461:809–813. [PubMed: 19812674]
62. Levin JZ, et al. Targeted next-generation sequencing of a cancer transcriptome enhances detection
of sequence variants and novel fusion transcripts. Genome Biol 2009;10:R115. [PubMed:
19835606]
63. Li JB, et al. Genome-wide identification of human RNA editing sites by parallel DNA capturing
and sequencing. Science 2009;324:1210–1213. [PubMed: 19478186]
64. Zhang K, et al. Digital RNA allelotyping reveals tissue-specific and allele-specific gene expression
in human. Nature Methods 2009;6:613–618. [PubMed: 19620972]
65. Ghildiyal M, Zamore PD. Small silencing RNAs: an expanding universe. Nature Rev. Genet
2009;10:94–108. [PubMed: 19148191]
66. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP. A diverse and evolutionarily fluid set of
microRNAs in Arabidopsis thaliana. Genes Dev 2006;20:3407–3425. [PubMed: 17182867]
67. Ruby JG, et al. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and
endogenous siRNAs in C. elegans. Cell 2006;127:1193–1207. [PubMed: 17174894]
68. Seila AC, et al. Divergent transcription from active promoters. Science 2008;322:1849–1851.
[PubMed: 19056940]
69. Taft RJ, et al. Tiny RNAs associated with transcription start sites in animals. Nature Genet
2009;41:572–578. [PubMed: 19377478]
70. Berezikov E, et al. Diversity of microRNAs in human and chimpanzee brain. Nature Genet
2006;38:1375–1377. [PubMed: 17072315]
71. Kapranov P, et al. New class of gene-termini-associated human RNAs suggests a novel RNA
copying mechanism. Nature 2010;466:642–646. [PubMed: 20671709]
72. Lau NC, Lim LP, Weinstein EG, Bartel DP. An abundant class of tiny RNAs with probable
regulatory roles in Caenorhabditis elegans. Science 2001;294:858–862. [PubMed: 11679671]
73. Kawaji H, Hayashizaki Y. Exploration of small RNAs. PLoS Genet 2008;4:e22. [PubMed:
18225959]
74. Linsen SE, et al. Limitations and possibilities of small RNA digital gene expression profiling.
Nature Methods 2009;6:474–476. [PubMed: 19564845] The authors describe the difficulties
associated with the analysis and quantification of short RNA species using current NGS platforms.
75. Cocquet J, Chong A, Zhang G, Veitia RA. Reverse transcriptase template switching and false
alternative transcripts. Genomics 2006;88:127–131. [PubMed: 16457984]
76. Mader RM, et al. Reverse transcriptase template switching during reverse transcriptase-polymerase
chain reaction: artificial generation of deletions in ribonucleotide reductase mRNA. J. Lab. Clin.
Med 2001;137:422–428. [PubMed: 11385363]
77. Roy SW, Irimia M. When good transcripts go bad: artifactual RT-PCR ‘splicing’ and genome
analysis. Bioessays 2008;30:601–605. [PubMed: 18478540]
78. Chen D, Patton JT. Reverse transcriptase adds nontemplated nucleotides to cDNAs during 5′-
RACE and primer extension. Biotechniques 2001;30:574–582. [PubMed: 11252793]
79. Roberts JD, et al. Fidelity of two retroviral reverse transcriptases during DNA-dependent DNA
synthesis in vitro. Mol. Cell. Biol 1989;9:469–476. [PubMed: 2469002]
80. Armour CD, et al. Digital transcriptome profiling using selective hexamer priming for cDNA
synthesis. Nature Methods 2009;6:647–649. [PubMed: 19668204]
81. Hansen KD, Brenner SE, Dudoit S. Biases in Illumina transcriptome sequencing caused by random
hexamer priming. Nucleic Acids Res 2010;38:e131. [PubMed: 20395217]
82. Rosenkranz R, Borodina T, Lehrach H, Himmelbauer H. Characterizing the mouse ES cell
transcriptome with Illumina sequencing. Genomics 2008;92:187–194. [PubMed: 18602984]
83. Oshlack A, Wakefield MJ. Transcript length bias in RNA-seq data confounds systems biology.
Biol. Direct 2009;4:14. [PubMed: 19371405]
84. Young MD, Wakefield MJ, Smyth GK, Oshlack A. Gene ontology analysis for RNA-seq:
accounting for selection bias. Genome Biol 2010;11:R14. [PubMed: 20132535]
85. Li B, Ruotti V, Stewart RM, Thomson JA, Dewey CN. RNA-seq gene expression estimation with
read mapping uncertainty. Bioinformatics 2010;26:493–500. [PubMed: 20022975]
86. Guttman M, et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the
conserved multi-exonic structure of lincRNAs. Nature Biotech 2010;28:503–510.
87. Trapnell C, et al. Transcript assembly and quantification by RNA-seq reveals unannotated
transcripts and isoform switching during cell differentiation. Nature Biotech 2010;28:511–515.
88. Shcherbik N, Wang M, Lapik YR, Srivastava L, Pestov DG. Polyadenylation and degradation of
incomplete RNA polymerase I transcripts in mammalian cells. EMBO Rep 2010;11:106–111.
[PubMed: 20062005]
89. Makeyev EV, Bamford DH. Replicase activity of purified recombinant protein P2 of double-
stranded RNA bacteriophage phi6. EMBO J 2000;19:124–133. [PubMed: 10619851]
90. Gurumurthy S, et al. The Lkb1 metabolic sensor maintains haematopoietic stem cell survival.
Nature 2010;468:659–663. [PubMed: 21124451]
91. Ozsolak F, et al. Comprehensive polyadenylation site maps in yeast and human reveal pervasive
alternative polyadenylation. Cell 2010;143:1018–1029. [PubMed: 21145465]
92. Simone NL, Bonner RF, Gillespie JW, Emmert-Buck MR, Liotta LA. Laser-capture
microdissection: opening the microscopic frontier to molecular analysis. Trends Genet
1998;14:272–276. [PubMed: 9676529]
93. Marcy Y, et al. Dissecting biological “dark matter” with single-cell genetic analysis of rare and
uncultivated TM7 microbes from the human mouth. Proc. Natl Acad. Sci. USA 2007;104:11889–
11894. [PubMed: 17620602]
94. Pfeifer GP, Steigerwald SD, Mueller PR, Wold B, Riggs AD. Genomic sequencing and
methylation analysis by ligation mediated PCR. Science 1989;246:810–813. [PubMed: 2814502]
95. Dean FB, et al. Comprehensive human genome amplification using multiple displacement
amplification. Proc. Natl Acad. Sci. USA 2002;99:5261–5266. [PubMed: 11959976]
96. Dafforn A, et al. Linear mRNA amplification from as little as 5 ng total RNA for global gene
expression analysis. Biotechniques 2004;37:854–857. [PubMed: 15560142]
97. Eberwine J, et al. Analysis of gene expression in single live neurons. Proc. Natl Acad. Sci. USA
1992;89:3010–3014. [PubMed: 1557406]
98. Nygaard V, Hovig E. Options available for profiling small samples: a review of sample
amplification technology when combined with microarray profiling. Nucleic Acids Res
2006;34:996–1014. [PubMed: 16473852] This review provides a good overview of the current
low-quantity RNA applications and the complications associated with them.
99. Geiss GK, et al. Direct multiplexed measurement of gene expression with color-coded probe pairs.
Nature Biotech 2008;26:317–325.
100. Amit I, et al. Unbiased reconstruction of a mammalian transcriptional network mediating
pathogen responses. Science 2009;326:257–263. [PubMed: 19729616]
101. Byrne JA, Nguyen HN, Reijo Pera RA. Enhanced generation of induced pluripotent stem cells
from a subpopulation of human fibroblasts. PLoS ONE 2009;4:e7118. [PubMed: 19774082]
102. Helzer KT, et al. Circulating tumor cells are transcriptionally similar to the primary tumor in a
murine prostate model. Cancer Res 2009;69:7860–7866. [PubMed: 19789350]
103. Lo YM, et al. Plasma placental RNA allelic ratio permits noninvasive prenatal chromosomal
aneuploidy detection. Nature Med 2007;13:218–223. [PubMed: 17206148] This paper describes
the quantification of extracellular circulating RNA in mother’s plasma during pregnancy to detect
fetal aneuploidy.
104. Bowers J, et al. Virtual terminator nucleotides for next-generation DNA sequencing. Nature
Methods 2009;6:593–595. [PubMed: 19620973]

Figure 1. RNA-seq for detection of alternative splicing events


a | Sequence reads are mapped to genomic DNA or to a transcriptome reference to detect
alternative isoforms of an RNA transcript. Isoform detection is based simply on the read
counts for each exon and on reads that span exon–exon boundaries. The absence of a genomic
exon from a transcript is inferred when no reads map to that exon's genomic location.
b | Paired sequence reads provide additional information about exonic splicing events: the
first read of a pair maps to one exon and the second read to a downstream exon, creating a
map of the transcript structure.
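The inference in panel a can be sketched as counting, for each annotated exon, the reads whose aligned blocks overlap it, and recording splice junctions from reads split across two or more blocks. The coordinates and reads below are hypothetical; a real pipeline would start from spliced alignments produced by a junction-aware aligner such as TopHat46.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def exon_and_junction_counts(exons, reads):
    """Count reads per exon and reads per splice junction.

    exons: list of (name, start, end) in half-open genomic coordinates.
    reads: list of reads, each a list of aligned (start, end) blocks; a read
           with two or more blocks spans one or more splice junctions.
    """
    exon_counts = {name: 0 for name, _, _ in exons}
    junction_counts = {}
    for blocks in reads:
        for name, start, end in exons:
            if any(overlaps(b_start, b_end, start, end) for b_start, b_end in blocks):
                exon_counts[name] += 1
        # Each gap between consecutive blocks is a junction: (donor end, acceptor start).
        for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
            key = (left_end, right_start)
            junction_counts[key] = junction_counts.get(key, 0) + 1
    return exon_counts, junction_counts


if __name__ == "__main__":
    exons = [("E1", 100, 200), ("E2", 300, 400), ("E3", 500, 600)]
    reads = [
        [(150, 200), (500, 550)],  # spans an E1-E3 junction
        [(120, 180)],              # within E1
        [(510, 590)],              # within E3
        [(160, 200), (500, 540)],  # another E1-E3 junction read
    ]
    exon_counts, junctions = exon_and_junction_counts(exons, reads)
    skipped = [name for name, n in exon_counts.items() if n == 0]
    print(exon_counts, junctions, skipped)  # E2 has no reads, so it is inferred to be skipped
```

Here the two reads joining position 200 directly to position 500 provide positive evidence for the E1-E3 isoform, complementing the absence of reads on E2.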

Figure 2. Use of RNA-seq for BCR–ABL fusion gene detection


a | Breakpoint cluster region (BCR) and ABL1 gene transcripts. b | BCR–ABL fusion gene
transcript. c | Sequence reads mapping across the BCR–ABL fusion gene site demonstrating
the ability to accurately identify the site of gene fusion. The data were derived from RNA-
seq analysis25 of the K562 transcriptome using the HeliScope (the raw data files are
available at the University of California Santa Cruz Genome Browser and the Helicos
Technology Center).
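The read-level evidence in panel c can be illustrated with a simple split-read search: a read supports a fusion if some prefix of it matches one gene's transcript and the remaining suffix matches the other's. The sketch below uses exact matching on short, invented sequences (not the real BCR or ABL1 sequences); practical fusion detection must tolerate sequencing errors and search against the whole transcriptome.

```python
def find_fusion_splits(read: str, seq_a: str, seq_b: str, min_anchor: int = 8):
    """Return candidate fusion breakpoints supported by `read`.

    For each split point i (leaving at least `min_anchor` bases on each side),
    the read supports an A-B fusion if read[:i] occurs in seq_a and read[i:]
    occurs in seq_b. Each hit is (split index in read, end of the A segment in
    seq_a, start of the B segment in seq_b), using the first exact match only.
    """
    hits = []
    for i in range(min_anchor, len(read) - min_anchor + 1):
        a_pos = seq_a.find(read[:i])
        b_pos = seq_b.find(read[i:])
        if a_pos != -1 and b_pos != -1:
            hits.append((i, a_pos + i, b_pos))
    return hits


if __name__ == "__main__":
    gene_a = "ACGTTGCAAGGCTTACCGATGACT"   # invented stand-in for the 5' partner gene
    gene_b = "TTGGCCAATCGGATCCATGCAGTA"   # invented stand-in for the 3' partner gene
    fused_read = gene_a[:16] + gene_b[8:]  # read spanning a simulated fusion junction
    print(find_fusion_splits(fused_read, gene_a, gene_b))  # [(16, 16, 8)]
```

The `min_anchor` parameter (a name introduced here) guards against spurious hits from very short prefixes or suffixes that would match many places by chance.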

Figure 3. Alternative methods for targeted RNA-seq


a | Using poly(A)+ RNA converted to double-stranded cDNA, the Agilent SureSelect method
uses RNA probes to enrich selected cDNA62. b | A custom NimbleGen array may allow
selection of cDNAs of interest. c | The generation of DNA molecules with sequence-specific
complementary targeting sites allows the targeting of cDNAs63,64. d | Helicos sequencing
surfaces containing target-specific oligonucleotides can be used to select desired RNA,
DNA and cDNA species and sequence regions of interest in a single step. nt, nucleotide.

Figure 4. Direct RNA sequencing using the Helicos approach


a | RNA that is polyadenylated and 3′ deoxy-blocked with poly(A) polymerase is captured
on poly(dT)-coated surfaces. A ‘fill-and-lock’ step is performed, in which the ‘fill’ step is
performed with natural thymidine and polymerase, and the ‘lock’ step is performed with
fluorescently labelled A, C and G Virtual Terminator (VT) nucleotides104 and polymerase.
This step corrects for any misalignments that may be present in poly(A) and poly(T)
duplexes, and ensures that the sequencing starts in the RNA template rather than the
polyadenylated tail. b | Imaging is performed to locate the positions of the templates. Then,
chemical cleavage of the dye–nucleotide linker is performed to release the dye and prepare
the templates for nucleotide incorporation. c | The surface is incubated with one labelled
nucleotide (C-VT is shown as an example) and a polymerase mixture. After this step,
imaging is performed to locate the templates that have incorporated the nucleotide.
Chemical cleavage of the dye then readies the surface and DNA templates for the next
nucleotide-addition cycle. Nucleotides are added in the order C, T, A, G for 120 total cycles
(30 additions of each nucleotide).
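How adding one nucleotide species per cycle, in a fixed C, T, A, G order, limits read length can be sketched with an idealized model. The code below assumes perfect Virtual Terminator behaviour (exactly one base incorporated whenever the next position matches the nucleotide offered, and none otherwise) with no errors or dark bases; these are our simplifying assumptions, not a description of the instrument. Because synthesis is primed from the poly(A) tail at the RNA 3′ end, the synthesized DNA is modelled as the reverse complement of the RNA template.

```python
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def synthesized_strand(rna_template: str) -> str:
    """DNA strand (5'->3') copied from an RNA template written 5'->3'.

    Synthesis starts at the RNA 3' end, so the product is the reverse complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(rna_template))


def simulate_cyclic_read(strand: str, order: str = "CTAG", cycles: int = 120) -> str:
    """Idealized single-base-per-cycle sequencing by synthesis.

    In cycle k the nucleotide order[k % len(order)] is offered; if it matches the
    next base to be synthesized, exactly one base is incorporated.
    """
    pos = 0
    read = []
    for cycle in range(cycles):
        nucleotide = order[cycle % len(order)]
        if pos < len(strand) and strand[pos] == nucleotide:
            read.append(nucleotide)
            pos += 1
    return "".join(read)


if __name__ == "__main__":
    strand = synthesized_strand("GGCAU")  # reverse complement: "ATGCC"
    print(strand, simulate_cyclic_read(strand, cycles=12), simulate_cyclic_read(strand, cycles=13))
    # A homopolymer gains only one base per full C, T, A, G round: 120 cycles give 30 bases.
    print(len(simulate_cyclic_read("C" * 100)))
```

Under this model a 120-cycle run yields between 30 bases (a homopolymer) and 120 bases (a sequence that happens to follow the C, T, A, G order exactly), so read length depends on sequence composition.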

Figure 5. Emerging technologies for single-cell or low-quantity-cell gene expression profiling


a | Single-molecule DNA and RNA sequencing technologies could be modified for single-cell
applications. Cells can be delivered to flow cells using fluidics systems, followed by cell
lysis and capture of mRNA species on the poly(dT)-coated sequencing surfaces by
hybridization. Standard sequencing runs could take place on channels with a 127.5 mm²
surface area, requiring 2,750 images to be taken per cycle to image the entire channel area.
The surface area needed to accommodate the ~350,000 mRNA molecules contained in a single
cell is ~0.4 mm²; thus, only eight images per cycle would be needed. Sequence analysis can
be done with direct RNA sequencing (DRS)7 or with on-surface cDNA synthesis followed by
single-molecule DNA sequencing26. b | nCounter system workflow. Two probes are used for
each target site: the capture probe (shown in red) contains a target-specific sequence and a
modification that allows the immobilization of the molecules on a surface; the reporter probe
contains a different target-specific sequence (shown in blue) and a fluorescent barcode
(shown by a green circle) that is unique to each target being examined. After hybridization
of the capture and reporter probe mixture to RNA samples in solution, excess probes are
removed. The hybridized RNA duplexes are then immobilized on a surface and imaged to
identify and count each transcript by the unique fluorescent barcode on its reporter probe.
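The imaging estimate in panel a follows from simple area arithmetic, reproduced below from the numbers in the caption. The assumption that the 2,750 images tile the 127.5 mm² channel evenly and without overlap is ours.

```python
CHANNEL_AREA_MM2 = 127.5     # sequencing channel surface area (from the caption)
IMAGES_PER_CHANNEL = 2_750   # images per cycle to cover the whole channel
CELL_AREA_MM2 = 0.4          # surface area needed for one cell's mRNA
MRNA_PER_CELL = 350_000      # approximate mRNA molecules in a single cell


def images_needed(target_area_mm2: float) -> float:
    """Fields of view needed to cover an area, assuming even, non-overlapping tiling."""
    area_per_image = CHANNEL_AREA_MM2 / IMAGES_PER_CHANNEL  # ~0.046 mm^2 per field
    return target_area_mm2 / area_per_image


if __name__ == "__main__":
    print(f"{images_needed(CELL_AREA_MM2):.1f} images per cycle")     # 8.6
    print(f"{MRNA_PER_CELL / CELL_AREA_MM2:,.0f} molecules per mm^2")  # 875,000
```

Straight division gives about 8.6 fields of view, in the same range as the caption's figure of eight; the exact count depends on how fields are positioned over the capture area. Either way, it is more than 300-fold fewer images than the 2,750 needed for a full channel.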

Table 1
Next-generation sequencing-based approaches for transcription start site mapping

TSS method | RNA sequence data | Starting RNA | Refs
CAGE | 5′ end of transcripts | 50 μg total RNA | 9
DeepCAGE | 5′ end of transcripts | 10 ng total RNA | 10
nanoCAGE | 5′ end of transcripts | 10 ng total RNA | 12
CAGEscan | 5′ end of transcripts and either 3′ end or internal RNA sequence | 10 ng total RNA | 12
PEAT | 5′ end of transcripts paired with random reads along the RNA | 150 μg total RNA | 11

CAGE, cap analysis of gene expression; CAGEscan, paired reads combining 5′ CAGE with downstream sequence; DeepCAGE, high-throughput CAGE sequencing; nanoCAGE, low-quantity CAGE; ng, nanograms; PEAT, paired-end analysis of transcription start sites; TSS, transcription start site.