Systematic Biology Advance Access originally published online on June 29, 2009
Systematic Biology 2009 58(2):199-210; doi:10.1093/sysbio/syp015
© Society of Systematic Biologists
Statistical Comparison of Nucleotide, Amino Acid, and Codon Substitution Models for Evolutionary Analysis of Protein-Coding Sequences
1 Professional Programme for Agricultural Bioinformatics
2 Laboratory of Biometrics and Bioinformatics, Graduate School of Agricultural and Life Sciences, University of Tokyo, 1-1-1 Yayoi Bunkyo-Ku, Tokyo 113-8657, Japan
* Correspondence to be sent to: Professional Programme for Agricultural Bioinformatics, Graduate School of Agricultural and Life Sciences, University of Tokyo, 1-1-1 Yayoi Bunkyo-Ku, Tokyo 113-8657, Japan; E-mail: seo{at}iu.a.u-tokyo.ac.jp.
Abstract
Statistical models for the evolution of molecular sequences play an important role in the study of evolutionary processes. For the evolutionary analysis of protein-coding sequences, 3 types of evolutionary models are available: 1) nucleotide, 2) amino acid, and 3) codon substitution models. Selecting appropriate models can greatly improve the estimation of phylogenies and divergence times and the detection of positive selection. Although much attention has been paid to comparisons among models of the same type, relatively little attention has been paid to comparisons among models of different types. Additionally, because models of different types have different data structures, comparing them using conventional model selection criteria such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) is not straightforward. Here, we suggest new procedures to convert models of the above-mentioned 3 types to 64-dimensional models with nucleotide triplet substitution. These conversion procedures make it possible to statistically compare the models of these 3 types by using AIC or BIC. By analyzing divergent and conserved interspecific mammalian sequences and intraspecific human population data, we show the superiority of the codon substitution models and discuss the advantages and disadvantages of the models of the 3 types.
Keywords: AIC; amino acid model; BIC; codon model; likelihood ratio test; model comparison; nucleotide model
Received June 7, 2008; Revised August 26, 2008; Accepted January 5, 2009
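The comparison described in the abstract rests on two ingredients: a common 64-state nucleotide-triplet space and the standard definitions of AIC and BIC. The Python sketch below illustrates both under simplifying assumptions and is not the conversion procedure of the paper itself. The function names (`triplet_frequencies`, `aic`, `bic`, `best_model`) are hypothetical, the expansion of nucleotide frequencies assumes that the three positions of a triplet are independent and identically distributed, and the log-likelihoods and parameter counts in the usage example are made-up placeholders, not results from the paper.

```python
import math
from itertools import product

NUCLEOTIDES = "ACGT"


def triplet_frequencies(nuc_freqs):
    """Expand nucleotide frequencies into frequencies over all 64 triplets.

    Assumption for illustration only: the three positions are independent
    and share the same nucleotide frequencies.
    """
    return {
        "".join(t): nuc_freqs[t[0]] * nuc_freqs[t[1]] * nuc_freqs[t[2]]
        for t in product(NUCLEOTIDES, repeat=3)
    }


def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 ln L + 2k (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def best_model(scores):
    """Return the name of the model with the smallest criterion value."""
    return min(scores, key=scores.get)


if __name__ == "__main__":
    freqs = triplet_frequencies({"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3})
    print(len(freqs), "triplet states")

    # Placeholder (log-likelihood, parameter count) pairs, all assumed to be
    # evaluated on the same triplet data so that the criteria are comparable.
    fits = {
        "nucleotide": (-10500.0, 10),
        "amino_acid": (-10400.0, 20),
        "codon": (-10250.0, 12),
    }
    n_sites = 400  # hypothetical number of triplet sites
    aic_scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    bic_scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in fits.items()}
    print("AIC choice:", best_model(aic_scores))
    print("BIC choice:", best_model(bic_scores))
```

The criteria are only comparable when every likelihood is computed for the same data representation, which is why the sketch assumes that all three fits have already been expressed on the triplet space.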