Statistical Alignment with a Sequence Evolution Model Allowing Rate Heterogeneity along the Sequence

Published: 01 April 2009

Abstract

We present a stochastic sequence evolution model for aligning two homologous sequences and estimating the mutation rates between them. The model allows two evolutionary behaviors along a DNA sequence, so that conserved regions can be identified and rate heterogeneity along the sequence taken into account. The sequence is divided into slowly and fast evolving regions whose boundaries are unknown, and our aim is to detect them. The evolution model combines a fragment insertion and deletion process that acts on fast regions only with a substitution process that acts on both fast and slow regions at different rates. This model induces a pair hidden Markov structure at the level of alignments, which makes efficient statistical alignment algorithms possible. We propose two complementary estimation methods: a Gibbs sampler for Bayesian estimation and a stochastic version of the EM algorithm for maximum likelihood estimation. Both algorithms involve sampling alignments. We propose a partial alignment sampler, which is computationally less expensive than the typical whole-alignment sampler, and we show that both estimation algorithms converge when used with it. On both simulated and real data, our algorithms provide consistent estimates of the mutation rates together with plausible alignments and sequence segmentations.
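The two-regime idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model: it assumes a Jukes-Cantor substitution process, a simple two-state Markov chain for the slow/fast labels, and it omits the fragment insertion and deletion process entirely. All function names and parameters (`p_switch`, `slow_rate`, `fast_rate`, `t`) are hypothetical choices made for illustration.

```python
import math
import random

BASES = "ACGT"

def jc69_change_prob(rate, t):
    """Probability that a site shows a different base after time t (Jukes-Cantor, assumed here)."""
    return 0.75 * (1.0 - math.exp(-4.0 * rate * t / 3.0))

def simulate_regions(length, p_switch, rng):
    """Label each site 'S' (slow) or 'F' (fast) with a two-state Markov chain (an assumption)."""
    state = rng.choice("SF")
    labels = []
    for _ in range(length):
        labels.append(state)
        # Switch regime with probability p_switch before the next site.
        if rng.random() < p_switch:
            state = "F" if state == "S" else "S"
    return "".join(labels)

def evolve(ancestor, labels, slow_rate, fast_rate, t, rng):
    """Substitute each site independently at its region's rate; indels are not modelled."""
    out = []
    for base, label in zip(ancestor, labels):
        rate = slow_rate if label == "S" else fast_rate
        if rng.random() < jc69_change_prob(rate, t):
            # Under Jukes-Cantor the new base is uniform over the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate_pair(length=60, p_switch=0.1, slow_rate=0.05, fast_rate=1.0, t=1.0, seed=0):
    """Return (ancestor, region labels, descendant) for one simulated sequence pair."""
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(BASES) for _ in range(length))
    labels = simulate_regions(length, p_switch, rng)
    descendant = evolve(ancestor, labels, slow_rate, fast_rate, t, rng)
    return ancestor, labels, descendant

if __name__ == "__main__":
    anc, lab, desc = simulate_pair()
    print(anc)
    print(lab)
    print(desc)
```

Printing the three strings one above the other shows mismatches clustering under the `F` labels. In the setting of the abstract the labels are hidden, and recovering them jointly with the alignment and the rates is the estimation problem that the Gibbs sampler and the stochastic EM algorithm address.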

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 6, Issue 2
April 2009
191 pages

Publisher

IEEE Computer Society Press, Washington, DC, United States

Author Tags

1. Markov processes
2. biology and genetics
3. mathematics and statistics
4. probabilistic algorithms
5. sequence evolution
