© 2005 Society of Systematic Biologists
An Empirical Examination of the Utility of Codon-Substitution Models in Phylogeny Reconstruction
1 Advanced Biomedical Information, Center for Information Medicine, Tokyo Medical and Dental University, Japan
2 Department of Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK; E-mail: z.yang{at}ucl.ac.uk
Edited by Tim Collins
Abstract
Models of codon substitution have been commonly used to compare protein-coding DNA sequences and are particularly effective in detecting signals of natural selection acting on the protein. Their utility in reconstructing molecular phylogenies and in dating species divergences, however, has not been explored. Codon models naturally accommodate synonymous and nonsynonymous substitutions, which occur at very different rates and may be informative about recent and ancient divergences, respectively. Codon models may thus be expected to make efficient use of the phylogenetic information in protein-coding DNA sequences. Here we applied codon models to 106 protein-coding genes from eight yeast species to reconstruct phylogenies by the maximum likelihood method, and compared the results with nucleotide- and amino acid–based analyses. The results appeared to confirm that expectation. Nucleotide-based analyses under simplistic substitution models were efficient at recovering recent divergences, whereas amino acid–based analyses performed better at recovering deep divergences. Codon models appeared to combine the advantages of amino acid and nucleotide data and performed well at recovering both recent and deep divergences. Estimation of relative species divergence times under amino acid and codon models suggested that translating gene sequences into proteins led to a loss of information ranging from 30% for deep nodes to 66% for recent nodes. Although their computational burden makes codon models infeasible for tree search in large data sets, we suggest that they may be useful for comparing candidate trees. Nucleotide models that accommodate the differences in evolutionary dynamics among the three codon positions also performed well, at much lower computational cost. We discuss the relationship between a model's fit to data and its utility in phylogeny reconstruction, and caution against the use of overly complex substitution models.
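To make the distinction between synonymous and nonsynonymous substitutions concrete, the Python sketch below builds a 61 × 61 codon rate matrix under a commonly used parameterization: changes at more than one codon position get rate zero, and a single-nucleotide change to codon j occurs at rate π_j, multiplied by κ (transition/transversion rate ratio) if it is a transition and by ω (nonsynonymous/synonymous rate ratio) if it changes the amino acid. This is an illustration under stated assumptions, not the exact model or software used in the study; the function name `codon_rate_matrix`, the default of equal codon frequencies, and the scaling to a mean rate of 1 are our own illustrative choices.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first base varies slowest);
# '*' marks the three stop codons, which are excluded from the state space.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]  # the 61 sense codons

PURINES = set("AG")


def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (a in PURINES) == (b in PURINES)


def codon_rate_matrix(kappa, omega, freqs=None):
    """Hypothetical helper: return (codons, Q) for a kappa/omega codon model.

    Assumption: equilibrium codon frequencies are equal unless given.
    """
    if freqs is None:
        freqs = {c: 1.0 / len(SENSE) for c in SENSE}
    n = len(SENSE)
    Q = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(SENSE):
        for j, cj in enumerate(SENSE):
            if i == j:
                continue
            diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
            if len(diffs) != 1:
                continue  # simultaneous changes at 2 or 3 positions: rate 0
            a, b = diffs[0]
            rate = freqs[cj]
            if is_transition(a, b):
                rate *= kappa
            if CODE[ci] != CODE[cj]:
                rate *= omega  # nonsynonymous change
            Q[i][j] = rate
        Q[i][i] = -sum(Q[i])  # diagonal makes each row sum to zero
    # Rescale so the expected number of substitutions per codon per unit time is 1.
    mean_rate = -sum(freqs[c] * Q[i][i] for i, c in enumerate(SENSE))
    Q = [[q / mean_rate for q in row] for row in Q]
    return SENSE, Q


codons, Q = codon_rate_matrix(kappa=2.0, omega=0.5)
idx = {c: k for k, c in enumerate(codons)}
# TTT->TTC is a synonymous transition (Phe->Phe); TTT->TTA is a
# nonsynonymous transversion (Phe->Leu), so their rate ratio is kappa/omega.
print(Q[idx["TTT"]][idx["TTC"]] / Q[idx["TTT"]][idx["TTA"]])
```

In a likelihood analysis, transition probabilities along a branch of length t would then be obtained as P(t) = exp(Qt). Working with 61 states rather than 4 nucleotides or 20 amino acids is one reason codon models are computationally expensive for tree search.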
Keywords: Codon models; divergence dates; maximum likelihood; phylogenetics; phylogenetic information
Received January 19, 2005; Revised March 25, 2005; Accepted May 24, 2005