© 2007 Society of Systematic Biologists
A Nonparametric Method for Accommodating and Testing Across-Site Rate Variation
1 Department of Integrative Biology, University of California, Berkeley, CA 94720, USA. E-mail: johnh{at}berkeley.edu
2 Department of Biomathematics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
3 Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
4 Department of Biostatistics, UCLA School of Public Health, Los Angeles, CA 90095, USA
Edited by Thomas Buckley: Associate Editor
| Abstract |
|---|
Substitution rates are among the most fundamental parameters in a phylogenetic analysis and are represented in phylogenetic models as the branch lengths on a tree. Variation in substitution rates across an alignment of molecular sequences is well established and is likely caused by variation in functional constraint across the genes encoded in the sequences. Accommodating rate variation across alignment sites is important in a phylogenetic analysis; failure to account for it can bias estimates of the phylogeny or of other model parameters. Traditionally, across-site rate variation has been modeled either by treating the rate at a site as a random variable drawn from some probability distribution (such as the gamma distribution) or by partitioning sites into rate classes and estimating the rate for each class independently. We consider a different approach, related to site-specific models in which sites are partitioned into rate classes. However, instead of treating the scheme that assigns sites to rate classes as a fixed assumption of the analysis, we treat the rate partition as a random variable under a Dirichlet process prior. We find that the Dirichlet process prior model for across-site rate variation fits alignments of DNA sequences better than commonly used models of across-site rate variation. The method appears to identify the underlying codon structure of protein-coding genes: rate partitions sampled by the Markov chain Monte Carlo procedure were closer to a partition that assigns sites to rate classes by codon position than to randomly permuted partitions, while still allowing additional variability across sites.
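The central idea, treating the assignment of sites to rate classes as a random partition under a Dirichlet process prior, can be illustrated with a minimal sketch. It assumes the Chinese restaurant process (the sequential representation of the Dirichlet process) to draw a partition, a gamma base distribution with mean 1 for the per-class rates, and the Rand index as one possible measure of how close two partitions are. The function names, the concentration and shape values, and the choice of the Rand index are illustrative assumptions, not necessarily those used in the paper, which samples partitions from the posterior by MCMC rather than from the prior as done here.

```python
import random
from itertools import combinations


def sample_crp_partition(n_sites, alpha, rng):
    """Draw a partition of n_sites alignment sites from the Chinese restaurant
    process (sequential form of a Dirichlet process prior with concentration
    parameter alpha). Returns one block label per site; labels are
    0, 1, 2, ... in order of first appearance."""
    labels = []
    block_sizes = []
    for i in range(n_sites):
        # Site i joins existing block k with probability n_k / (i + alpha)
        # and opens a new block with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        chosen = len(block_sizes)  # default: open a new block
        cumulative = 0.0
        for k, size in enumerate(block_sizes):
            cumulative += size
            if u < cumulative:
                chosen = k
                break
        if chosen == len(block_sizes):
            block_sizes.append(0)
        block_sizes[chosen] += 1
        labels.append(chosen)
    return labels


def assign_rates(labels, shape, rng):
    """Give every block one rate drawn from an assumed gamma base distribution
    with mean 1 (shape `shape`, scale 1/shape); all sites in a block share it."""
    n_blocks = max(labels) + 1
    block_rates = [rng.gammavariate(shape, 1.0 / shape) for _ in range(n_blocks)]
    return [block_rates[b] for b in labels]


def expected_blocks(n_sites, alpha):
    """Prior expected number of rate classes: sum over i of alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n_sites))


def rand_index(a, b):
    """Fraction of site pairs on which two partitions agree (both place the
    pair in the same block, or both place it in different blocks).
    Requires at least two sites."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    rng = random.Random(1)
    n = 60  # 20 codons
    codon = [i % 3 for i in range(n)]  # rate classes by codon position
    permuted = codon[:]
    rng.shuffle(permuted)  # same block sizes, sites assigned at random
    draw = sample_crp_partition(n, alpha=2.0, rng=rng)
    rates = assign_rates(draw, shape=0.5, rng=rng)
    print("blocks drawn:", len(set(draw)), "expected:", round(expected_blocks(n, 2.0), 2))
    print("Rand index vs codon partition:", round(rand_index(draw, codon), 3))
    print("Rand index vs permuted partition:", round(rand_index(draw, permuted), 3))
```

Because these partitions are drawn from the prior, there is no reason to expect the draw to be closer to the codon-position partition than to the permuted one; the codon-structure result described above concerns partitions sampled from the posterior given the data. Larger values of `alpha` favor more rate classes, as `expected_blocks` shows.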
Keywords: Across-site rate variation; Bayesian estimation; Dirichlet process prior; Markov chain Monte Carlo
Received November 15, 2006; Revised January 28, 2007; Accepted June 7, 2007

