Published by Jewel Greene. Modified over 6 years ago.
1
Comparative Genomics and New Evolutionary Biology
09/26/2008
2
How does comparative genomics change our view of the evolution of life?
Traditionally, genomes were thought to be stable and to evolve gradually through vertical inheritance. Now we believe that genomes are in flux, and that gene loss and horizontal gene transfer (HGT) are major forces shaping genomes rather than isolated incidents of little consequence. Comparative genomics is revealing the true complexity of evolution and is shaking many traditional concepts of evolution, e.g., it uproots the Tree of Life, but it also provides data to build a better and more realistic tree. New theoretical and algorithmic developments lie ahead to integrate and interpret these data. The availability of genome sequences from many diverse phylogenetic groups will make it possible to reconstruct the genomes of ancestral life forms, including the Last Universal Common Ancestor (LUCA) of all extant life forms on this planet.
3
The three domains of life
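The theory of three domains of life was originally proposed by Carl Woese, based on the analysis of phylogenetic trees of the rRNA sequences then available. Archaea were first recognized as a separate lineage in 1977, and the three-domain system was formally proposed in 1990.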
4
The three domains of life
The theory is also supported by other evidence:
Archaea have membrane lipid compositions distinct from those of bacteria.
Archaea phenotypically look like bacteria and are clearly prokaryotes, but they are more similar to eukaryotes in some other respects:
---- their ribosomes share a number of proteins with eukaryotes, but not with bacteria;
---- their RNA polymerase is shared with eukaryotes;
---- histones are present in their DNA structure;
---- their DNA replication apparatus is organized similarly.
5
The three domains of life supported by comparative genomics
Unique core COGs of the three domains of life
6
The three domains of life supported by comparative genomics
Sequence similarity decreases in the following order: among archaea (A-A), between archaea and bacteria (A-B), and between archaea and eukaryotes (A-E).
[Figure: similarity plot with series A-A, A-B, and A-E; axis labels: Hit, Score.]
7
The three domains of life supported by comparative genomics
The distribution of the 310 COGs shared by 13 archaeal genomes reveals that archaea are bacterial in shell but eukaryotic in core.
[Figure labels: Archaea-specific; A+B-E (metabolism related); B+A+E; A+E-B (information processing related).]
8
The three domains of life supported by comparative genomics
The distribution of the 310 COGs shared by 13 archaeal genomes reveals that archaea are “bacterial in metabolism and much of cell biology and eukaryotic in basal information processing systems”.
[Figure: panels labelled A+E-B and A+B-E.]
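The labels A+B-E and A+E-B are phyletic patterns: they record which domains besides archaea contain members of a COG. The following is a minimal sketch of that labelling, assuming each COG is stored as the set of domains in which it occurs. The function name, the COG identifiers, and the toy data are hypothetical and are not taken from the slides.

```python
from collections import Counter


def phyletic_label(domains):
    """Label a COG found in archaea by which other domains also have it.

    `domains` is a set drawn from {"A", "B", "E"}
    (archaea, bacteria, eukaryotes).
    """
    if "A" not in domains:
        raise ValueError("this labelling only covers COGs present in archaea")
    has_b = "B" in domains
    has_e = "E" in domains
    if has_b and has_e:
        return "A+B+E"   # present in all three domains
    if has_b:
        return "A+B-E"   # shared with bacteria, absent from eukaryotes
    if has_e:
        return "A+E-B"   # shared with eukaryotes, absent from bacteria
    return "A only"      # archaea-specific


# Hypothetical toy input: COG id -> domains in which it has members.
toy_cogs = {
    "cog_x1": {"A", "B"},
    "cog_x2": {"A", "E"},
    "cog_x3": {"A", "B", "E"},
    "cog_x4": {"A"},
    "cog_x5": {"A", "B"},
}

counts = Counter(phyletic_label(d) for d in toy_cogs.values())
print(dict(counts))
```

Tallying the labels by functional category would give the kind of breakdown the slide describes, with metabolic COGs expected mostly under A+B-E and information processing COGs mostly under A+E-B.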
9
The three domains of life supported by comparative genomics
Two important conclusions follow from comparative genomic analyses of the three domains of life:
---- Archaea and bacteria share a substantial gene pool, part of which is the ancient heritage of the common ancestor of these two domains, and part of which is the result of HGT;
---- There is a small but critically important core of proteins in archaea, primarily involved in information processing, that reflects the shared history of archaea and eukaryotes.
10
Prevalence of lineage-specific gene loss and horizontal gene transfer in evolution
Lineage-specific gene loss and horizontal gene transfer are both common evolutionary phenomena, but the prevalence of HGT was fully recognized only through comparative genomic analyses. The apparent number of genes that organisms share correlates with the similarity of their lifestyles:
---- Bacterial hyperthermophiles have 15-20% of their genes of archaeal origin, while their close mesophilic relatives have only 1-5% archaeal genes;
---- Mesophilic methanogenic archaea have ~30% of their genes of bacterial origin, while their hyperthermophilic relatives have only ~3% mesophilic bacterial genes.
Gene sharing on this scale can only be explained by HGT.
11
Correlation between the similarity in organisms’ lifestyles and the number of genes they share
Bacterial hyperthermophile Thermoanaerobacter tengcongensis: 258 (10%) of 2588 genes (yellow dots) are more similar to archaeal genes than to their bacterial homologs.
Bacterial mesophile Bacillus subtilis: only 174 (4.2%) of 4112 genes are more similar to archaeal genes than to their bacterial homologs.
12
Correlation between the similarity in organisms’ lifestyles and the number of genes they share
Archaeal hyperthermophile Methanopyrus kandleri: only 98 (6%) of 1687 genes (blue dots) are more similar to bacterial genes than to their archaeal homologs.
Archaeal mesophile Methanosarcina acetivorans: 1453 (32%) of 4540 genes are more similar to bacterial genes than to their archaeal homologs.
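The percentages quoted for the four genomes can be rechecked directly from the gene counts. The short sketch below does only that arithmetic; the counts are the ones given on the slides, and the short lifestyle labels are used here purely as dictionary keys.

```python
# (genes with a closer match in the other domain, total genes), as quoted on the slides
gene_counts = {
    "bacterial hyperthermophile": (258, 2588),
    "bacterial mesophile": (174, 4112),
    "archaeal hyperthermophile": (98, 1687),
    "archaeal mesophile": (1453, 4540),
}


def percent(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


for label, (hits, total) in gene_counts.items():
    print(f"{label}: {hits}/{total} = {percent(hits, total):.1f}%")
```

The recomputed values (about 10.0%, 4.2%, 5.8%, and 32.0%) agree with the rounded figures on the slides.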
13
Indicators of HGT
Surrogate criteria:
---- sequence similarity
---- codon bias
---- unexpected conservation of gene order between distant species
Phylogenetic tree criteria:
---- disagreement between the species tree and the gene tree
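Of these criteria, codon bias is the easiest to illustrate: a recently acquired gene may use codons in proportions unlike the rest of the genome. The following is a minimal sketch, assuming in-frame coding sequences and using half the L1 distance between codon frequency profiles as the measure. That measure is a simple illustrative choice, not a published statistic, and the sequences are made-up toy strings.

```python
from collections import Counter


def codon_freqs(seq):
    """Relative frequency of each codon in an in-frame coding sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3          # drop a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence must contain at least one codon")
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def codon_usage_distance(gene, background):
    """Half the L1 distance between two codon profiles (0 = identical, 1 = disjoint)."""
    f = codon_freqs(gene)
    g = codon_freqs(background)
    return 0.5 * sum(abs(f.get(c, 0.0) - g.get(c, 0.0)) for c in set(f) | set(g))


# Hypothetical toy data: a background that prefers GCT and AAA.
background = "GCTGCTGCTAAA" * 10
typical_gene = "GCTGCTAAAGCT"     # same codon preferences as the background
atypical_gene = "GCGGCGGCGAAG"    # same amino acids, different codons

print(codon_usage_distance(typical_gene, background))
print(codon_usage_distance(atypical_gene, background))
```

A high distance is only a surrogate signal: highly expressed native genes can also have unusual codon usage, which is why the phylogenetic tree criterion is listed separately.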
14
HGT can occur in essential genes
HGT is more prevalent among nonessential genes, which is explained by the complexity hypothesis: genes coding for protein subunits of macromolecular complexes or, more generally, for proteins involved in a wide range of interactions are less subject to HGT. But HGT is also found for essential genes, e.g., the glutamate and glutamine aminoacyl-tRNA synthetases (E/Q-aaRS).
[Diagram: two routes to glutamine-charged tRNA. Direct route: Q + tRNAQ, charged by Q-aaRS, gives Q-tRNAQ. Indirect route: E + tRNAE, charged by E-aaRS, gives E-tRNAE, which a transamidase converts to Q-tRNAE.]
15
HGT for aminoacyl-tRNA synthetases
HGT for glutamate and glutamine aminoacyl-tRNA synthetases (E/Q-aaRS): γ-proteobacteria acquired Q-aaRS from eukaryotes.
16
HGT for aminoacyl-tRNA synthetases
HGT for tryptophan aminoacyl-tRNA synthetase (W-aaRS): the archaeon P. horikoshii acquired W-aaRS from eukaryotes.
17
HGT between prokaryotes and animals
Phylogenetic tree of monoamine oxidases (MAO).
18
HGT, gene loss, or vertical inheritance?
If the species tree is known, then the distribution of a COG on the tree suggests a possible scenario for the evolution of that COG. In this figure, COG 1 can be easily explained by vertical evolution. Explaining the evolution of COG 2, however, requires invoking HGT and/or gene loss. If COG 2 emerged at root 2, then two HGT events to archaea can explain the distribution. If COG 2 emerged at root 1, then 4 (in eubacteria) + 4 (in archaea) = 8 gene losses are needed to explain the distribution.
[Figure: species tree with eubacterial and archaeal branches, candidate roots labelled 1, 2, and 3, and the distributions of COG 1 and COG 2 marked on it.]
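The loss-only scenario can be counted mechanically. The following is a minimal sketch, assuming the gene arose once at the root of the species tree and was never transferred, so that every clade lacking the gene costs exactly one loss on the branch leading to it. The nested-tuple tree and the species names are hypothetical and do not reproduce the tree in the figure.

```python
def leaves(tree):
    """Return the leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def count_losses(tree, present):
    """Minimum number of losses if the gene arose at the root and was never transferred."""
    if not any(leaf in present for leaf in leaves(tree)):
        return 1  # the whole clade lacks the gene: one loss on the branch leading to it
    if isinstance(tree, str):
        return 0  # a leaf that still carries the gene
    return sum(count_losses(child, present) for child in tree)


# Hypothetical species tree: four bacteria and four archaea.
bacteria = (("B1", "B2"), ("B3", "B4"))
archaea = (("A1", "A2"), ("A3", "A4"))
species_tree = (bacteria, archaea)

# Hypothetical phyletic pattern of one COG.
present = {"B1", "B3", "A1"}

print(count_losses(species_tree, present))
```

In this toy case the pattern costs four losses (B2, B4, A2, and the A3/A4 clade). Scenarios involving HGT would explain the same pattern with fewer events, which is the trade-off the slide describes.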
19
A simple algorithm for computing the number of gene losses and HGT events
Mixed scenarios can also explain the observed distribution; e.g., if COG 2 emerged at root 3, then it would take one gene loss in archaea and one HGT to eubacteria. The Incompatibility Quotient for gene i measures the most parsimonious number of gene loss and HGT events needed to reconcile its gene tree with a given species tree, where l is the number of gene losses and h is the number of HGT events in the minimal (most parsimonious) evolutionary scenario for the given gene (e.g., a COG), and g is the “HGT penalty”.
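The formula itself does not appear in the text. One natural reading of the definitions, assumed here and not confirmed by the slides, is a weighted sum, score = l + g*h, minimized over candidate scenarios. The sketch below applies that assumed score to the three scenarios described for COG 2; the function names are hypothetical.

```python
def scenario_score(losses, hgts, g=1.0):
    """Assumed score of one scenario: losses plus g times the number of HGT events."""
    return losses + g * hgts


def best_scenarios(scenarios, g=1.0):
    """Return the names of the lowest-scoring scenarios and that score."""
    scores = {name: scenario_score(l, h, g) for name, (l, h) in scenarios.items()}
    best = min(scores.values())
    return sorted(name for name, s in scores.items() if s == best), best


# (gene losses, HGT events) for COG 2, as described on the slides.
cog2_scenarios = {
    "root 1": (8, 0),   # eight losses, no HGT
    "root 2": (0, 2),   # two HGT events to archaea
    "root 3": (1, 1),   # one loss in archaea, one HGT to eubacteria
}

for g in (1.0, 2.0, 5.0):
    print(g, best_scenarios(cog2_scenarios, g))
```

Under this assumption, roots 2 and 3 tie at g = 1, while raising the penalty favours the mixed scenario. Only a very high penalty (above 7 in this example) would make the loss-only scenario the most parsimonious.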
20
A simple algorithm for computing the number of gene losses and HGT events
Number of gene loss and HGT events in the most parsimonious evolutionary scenarios for COGs at g=1. These data suggest that most COGs have undergone HGT and gene loss events.
21
Tree of life: before and after comparative genomics
Phylogenetic trees in the pre-genomic era
Charles Darwin’s tree of life: a conceptual evolutionary tree of life.
22
Phylogenetic trees in the pre-genomic era
Ernst Haeckel’s tree of life: based on the similarity of morphological features.
Phylogenetic trees based on molecular sequences:
---- the first molecular trees were based on cytochromes c and globins;
---- the widely accepted Tree of Life was based on small subunit ribosomal RNA (rRNA) sequences.
These trees were built on the molecular clock assumption: genes evolve at a constant rate as long as the function of the gene product remains unchanged.
The molecular phylogenetic tree was equated with the species tree, on the assumption that an optimal molecular marker for deciphering the history of life, e.g., rRNA, could be found.
23
Tree of life: after comparative genomics
Comparative genomics threatens the species tree concept
With multiple completed genome sequences available, detailed analyses of protein families revealed that there was no reliable phylogenetic signal in the trees, even after probable HGTs were removed.
There is no consensus phylogeny even for the archaea, based on the conserved core of archaeal genomes.
The three main problems with using single genes to infer a species tree are:
---- an insufficient number of informative sites;
---- variability of evolutionary rates in different lineages;
---- the effect of HGT.
24
Tree of life: after comparative genomics
Comparative genomics threatens the species tree concept
Thus there are some concerns that comparative genomics might uproot the very concept of a Tree of Life, at least for the prokaryotes. However, this challenge to the Tree of Life might also offer a way to salvage the concept itself, by considering the entire body of information contained in the genomes or a rationally selected, substantial part of this information.
25
Methods for Construction of Consensus Genome Trees
Criterion/Approach: Gene content
Method(s): Parsimony, distance methods
Principal results: Trees reflect partly phylogeny and partly similar lifestyles. Phylogenetic signal enhanced when distances normalized by genome size, but resolution limited.
Ref.: [230, 358, 373, 519, 605, 786, 835, 915]

Criterion/Approach: Gene order
Principal results: Results similar to gene content; effect of HGT noticeable.
Ref.: [470, 915]

Criterion/Approach: Mean similarity between orthologs
Method(s): Distance methods
Principal results: Trees appear to reflect largely phylogenetic relationships; limited resolution but some putative new lineages detected.
Ref.: [148, 470, 915]

Criterion/Approach: Concatenated alignments of proteins less prone to HGT (e.g. ribosomal)
Method(s): Maximum likelihood, distance methods
Principal results: Results largely compatible with the mean similarity approach, but with better resolution; several potential new lineages detected.
Ref.: [119, 332, 552, 915]

Criterion/Approach: Consensus of phylogenetic analysis of multiple orthologous sets
Principal results: Used to verify the above approaches. Most of the new lineages strongly suggested by genome trees supported.
Ref.: [915]
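The gene content criterion can be illustrated with a small distance calculation. The following is a minimal sketch, assuming each genome is reduced to the set of gene families it contains and that the shared count is normalized by the size of the smaller genome. This is one simple normalization chosen for illustration, not necessarily the one used in the cited studies, and the family names are hypothetical.

```python
def gene_content_distance(genes_a, genes_b):
    """1 minus the fraction of the smaller genome's gene families found in the other."""
    smaller = min(len(genes_a), len(genes_b))
    if smaller == 0:
        raise ValueError("both genomes must contain at least one gene family")
    shared = len(genes_a & genes_b)
    return 1.0 - shared / smaller


# Hypothetical gene-family sets for three genomes of different sizes.
small_genome = {"f1", "f2", "f3"}
large_genome = {"f1", "f2", "f3", "f4", "f5", "f6"}
other_genome = {"f1", "f7", "f8", "f9"}

print(gene_content_distance(small_genome, large_genome))
print(gene_content_distance(small_genome, other_genome))
print(gene_content_distance(large_genome, other_genome))
```

Normalizing by the smaller genome keeps a reduced genome that is a subset of a larger relative at distance 0, rather than penalizing it for size alone. A matrix of such distances could then be passed to any standard distance-based tree-building method.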
26
A Consensus Genome Tree
Different groups have attempted to depict the apparent consensus, based on the results of various genome-tree analyses, in particular:
---- trees made using the median similarity between orthologs;
---- trees based on concatenated alignments.
27
Another consensus genome tree
Francesca D. Ciccarelli et al., Science 311, 1283 (2006).
The tree is based on a concatenation of 31 orthologs occurring in 191 species with sequenced genomes.
The tree suggests a thermophilic last universal common ancestor.
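Concatenation means joining the per-gene alignments end to end, species by species, so that the tree is inferred from one long alignment. The following is a minimal sketch of that step, assuming each ortholog alignment is a mapping from species name to an aligned sequence. Padding a missing species with gap characters is an assumption made here for generality; the species names and sequences are hypothetical.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one alignment per species.

    `alignments` is a list of dicts mapping species name -> aligned sequence.
    A species missing from one alignment is padded with gaps of that width.
    """
    species = sorted(set().union(*alignments))
    parts = {name: [] for name in species}
    for alignment in alignments:
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = widths.pop()
        for name in species:
            parts[name].append(alignment.get(name, "-" * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Hypothetical toy alignments for two ortholog sets.
ortholog_1 = {"sp1": "MKV", "sp2": "MRV", "sp3": "M-V"}
ortholog_2 = {"sp1": "LLA", "sp3": "LIA"}

supermatrix = concatenate_alignments([ortholog_1, ortholog_2])
for name, seq in supermatrix.items():
    print(name, seq)
```

The resulting alignment would then go to a tree-building program. Pooling many genes increases the number of informative sites per species, which addresses the first of the single-gene problems noted earlier.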
28
Post-genomic View of the Tree of Life
The simple notion of a single Tree of Life that would accurately and completely depict the evolution of all life forms is gone forever. However, there is a phylogenetic signal in the sequences of prokaryotic proteins, but it is weak because of massive gene loss and HGT. It seems that, to capture this faint signal, analysis of genome-wide protein sets or carefully selected subsets is required. The concept of the Tree of Life is bound to change in the post-genomic world. It cannot be thought of as a definitive “species tree” anymore, but only as a central trend in the rich patchwork of evolutionary history, replete with gene loss and HGT.