Serial Analysis of Gene Expression (SAGE)
Lecture Notes for BIO702
Manjusha Verma
SAGE is a sequencing-based gene expression profiling technique.
It is an approach that allows the rapid and detailed analysis of thousands of transcripts (Science 270: 484-487, 1995; Cell 88: 243-251, 1997).
SAGE is a powerful tool that allows digital analysis of overall gene expression patterns.
For organisms with poorly characterized genomic and expressed sequences, SAGE can be used to obtain complete transcriptional profiles of expressed genes, even when those genes are unknown.
An adaptation of SAGE, called LongSAGE, allows the derived transcriptome to be used in annotating expressed genes in the genome.
In this sense, SAGE is a truly global and unbiased gene expression technique.
The SAGE technique uses multiple enzymatic, PCR amplification, purification, and cloning steps.
Two basic principles
I. A short oligonucleotide sequence (tag), defined by a specific restriction endonuclease (anchoring enzyme, AE) at a fixed distance from the poly(A) tail, can uniquely identify mRNA transcripts.
Theoretically, a 10-bp sequence tag can give 4^10 (1,048,576) different sequence combinations, which is more than sufficient to discriminate all the transcripts derived from the human genome.
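The arithmetic behind this figure can be checked with a short sketch. The function name tag_space is a hypothetical helper introduced only for illustration; the calculation assumes four possible bases at each tag position.

```python
def tag_space(tag_length: int) -> int:
    """Number of distinct sequences of the given length over the bases A, C, G, T."""
    return 4 ** tag_length


# A 10-bp tag gives 4^10 possible sequences.
print(tag_space(10))  # 1048576
```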
II. The end-to-end concatenation of these short oligonucleotides allows multiple transcripts to be detected per sequencing reaction.
Schematic SAGE protocol outline
Purification of mRNA bound to solid-phase oligo(dT) magnetic beads.
cDNA synthesis directly on the oligo(dT) beads.
Digestion with the anchoring enzyme (AE) NlaIII to reveal the 3'-most restriction site anchored to the oligo(dT) bead.
The sample is divided equally into two separate tubes, and each half is ligated to one of two different linkers, A or B.
The linkers contain the recognition site for BsmFI, a type IIS restriction enzyme that cuts 10 bp 3' of the anchoring enzyme recognition site.
The SAGE tags released from the oligo(dT) beads are then separated, blunted, and ligated to each other to give rise to ditags.
The ditags are PCR amplified, released from the linkers, gel purified, serially ligated, cloned, and sequenced using an automated sequencer.
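The tag-defining steps of the protocol can be mimicked in silico. The following is a minimal sketch, assuming the transcript is given as a 5'-to-3' sense-strand string ending in its poly(A) tail, that NlaIII recognizes CATG, and that the tag is the 10 bases immediately 3' of the 3'-most site. The function name and the example sequence are hypothetical.

```python
from typing import Optional

ANCHOR_SITE = "CATG"  # NlaIII recognition site
TAG_LENGTH = 10       # bases kept 3' of the anchoring site


def extract_sage_tag(transcript: str, tag_length: int = TAG_LENGTH) -> Optional[str]:
    """Return the tag adjacent to the 3'-most anchoring site, or None if there is none."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR_SITE)  # the last occurrence is the 3'-most site
    if pos == -1:
        return None               # transcript has no anchoring site
    start = pos + len(ANCHOR_SITE)
    tag = seq[start:start + tag_length]
    # Discard sites too close to the 3' end to yield a full-length tag.
    return tag if len(tag) == tag_length else None


# Made-up transcript with two CATG sites; only the 3'-most one defines the tag.
example = "GGCATGAAACCCGGGTTTCATGACGTACGTTAGCCAAAAAAAA"
print(extract_sage_tag(example))  # ACGTACGTTA
```

Transcripts that lack an anchoring site return None here, which mirrors the fact that such transcripts yield no tag in the protocol.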
SAGE Data Analysis
The sequence files generated by the automated sequencer are analyzed using the SAGE2000 software (www.sagenet.org).
The three steps involved in obtaining a differential gene expression list are:
(1) Deciphering the SAGE tags from the sequence data files, using the SAGE2000 software to extract ditags and check for duplicate ditags;
(2) Downloading a reference sequence database from the NCBI Web site (SAGEmap, www.ncbi.nlm.nih.gov);
(3) Associating the tags with the expressed gene database.
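The tag-deciphering step can be illustrated with a simplified sketch. This is not the SAGE2000 implementation. It assumes that each concatemer read consists of ditags separated by the anchoring site CATG, that valid ditags are 20 to 26 bases long, and that a repeated ditag is counted only once. All function names and length limits are illustrative assumptions.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # NlaIII site separating ditags in a concatemer
TAG_LENGTH = 10
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def extract_ditags(concatemer: str, min_len: int = 20, max_len: int = 26) -> list:
    """Split a read at anchoring sites and keep pieces of plausible ditag length."""
    parts = concatemer.upper().split(ANCHOR_SITE)
    # The first and last pieces are flanking sequence, not ditags.
    return [p for p in parts[1:-1] if min_len <= len(p) <= max_len]


def count_tags(concatemers) -> Counter:
    """Count tags across reads, counting each distinct ditag only once."""
    seen = set()
    counts = Counter()
    for read in concatemers:
        for ditag in extract_ditags(read):
            # Use an orientation-independent key to detect duplicate ditags.
            key = min(ditag, reverse_complement(ditag))
            if key in seen:
                continue
            seen.add(key)
            counts[ditag[:TAG_LENGTH]] += 1                       # left tag
            counts[reverse_complement(ditag[-TAG_LENGTH:])] += 1  # right tag
    return counts
```

For example, a read that contains the same ditag twice contributes each of its two tags only once to the counts.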
Modifications of SAGE
The LongSAGE modification uses a different type IIS restriction endonuclease, MmeI, as the tagging enzyme, which cuts 17 bp 3' of the anchoring site.
Improved efficiencies of library construction, requiring less mRNA, have been reported using SAGE-Lite and MicroSAGE.
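A rough calculation suggests why the longer tag helps when mapping tags to a genome. The sketch below assumes a genome of about 3 × 10^9 bp (an approximate figure for the human genome, not taken from these notes) and a uniform random base composition, which real genomes do not have. It should be read as an order-of-magnitude estimate only.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumed approximate genome size


def expected_chance_matches(tag_length: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Expected number of positions matching a given tag in a random sequence."""
    return genome_size / 4 ** tag_length


for length in (10, 17):
    print(length, expected_chance_matches(length))
```

Under these assumptions a 10-bp tag is expected to match thousands of genomic positions by chance, while a 17-bp tag is expected to match fewer than one.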
A challenging part of gene expression experiments is determining the biological significance of the candidate genes.
Approaches that help include studying direct targets of transcription factors and using highly selected populations of cells as starting material.
SAGE versus Microarray
Scope of the genetic screen
SAGE: No prior data required.
Microarray: Possible in well-studied species, model plants.
Number of samples
SAGE: Limited.
Microarray: Large numbers of samples may be analyzed more efficiently.
Amount of starting material
SAGE: The PCR amplification step of the SAGE technique decreases the amount of RNA needed: 50 to 500 ng of RNA or 5 to 50 µg of total RNA.
Microarray: Comparatively more.
Availability of resources such as an automated DNA sequencer
SAGE: 1.5 × 10^6 bases need to be sequenced for a simple two-library comparison.
Data quality
SAGE: More quantitatively reproducible.
Comparison between experiments
SAGE: Digital database facilitates direct comparisons between SAGE libraries.
Microarray: Comparing experiments is difficult due to a number of random and systematic errors.
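Because SAGE data are tag counts, two libraries can be compared directly once the counts are scaled by library size. The following is a minimal sketch of such a comparison, assuming each library is a dictionary of tag counts. The function names, the pseudocount of 0.5, and the example counts are illustrative assumptions, and no statistical test is applied.

```python
def tags_per_million(counts: dict) -> dict:
    """Scale raw tag counts to tags per million for one library."""
    total = sum(counts.values())
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


def fold_changes(lib_a: dict, lib_b: dict, pseudocount: float = 0.5) -> dict:
    """Ratio of relative tag abundance in library B over library A.

    The pseudocount (an arbitrary choice) avoids division by zero for
    tags that are absent from one library.
    """
    total_a = sum(lib_a.values())
    total_b = sum(lib_b.values())
    result = {}
    for tag in set(lib_a) | set(lib_b):
        freq_a = (lib_a.get(tag, 0) + pseudocount) / total_a
        freq_b = (lib_b.get(tag, 0) + pseudocount) / total_b
        result[tag] = freq_b / freq_a
    return result


# Made-up counts for two small libraries.
library_a = {"ACGTACGTTA": 10, "GGGTTTAAAC": 90}
library_b = {"ACGTACGTTA": 40, "GGGTTTAAAC": 160}
print(tags_per_million(library_a))
print(fold_changes(library_a, library_b))
```

In practice, a significance test suited to count data would be applied before calling a tag differentially expressed.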
Assignment: Enumerate SAGE applications by giving experimental examples