Systematic Biology 2008 57(1):76-85; doi:10.1080/10635150801898920

© 2008 Society of Systematic Biologists

Does Choice in Model Selection Affect Maximum Likelihood Analysis?

Jennifer Ripplinger1 and Jack Sullivan1,2

1 Bioinformatics and Computational Biology, University of Idaho, Moscow, Idaho 83844-3051, USA; E-mail: jripplinger{at}vandals.uidaho.edu (J.R.)
2 Department of Biological Sciences, University of Idaho, Moscow, Idaho 83844-3051, USA


Abstract

To have confidence in model-based phylogenetic analysis, the model of nucleotide substitution must be selected in a statistically rigorous manner. Several model-selection methods are applicable to maximum likelihood (ML) analysis, including the hierarchical likelihood-ratio test (hLRT), Akaike information criterion (AIC), Bayesian information criterion (BIC), and decision theory (DT), but their performance on empirical data has not been investigated thoroughly. In this study, we use 250 phylogenetic data sets obtained from TreeBASE to examine the effects that choice in model selection has on ML estimation of phylogeny, with an emphasis on optimal topology, bootstrap support, and hypothesis testing. We show that the use of different methods leads to the selection of two or more models for ~80% of the data sets and that the AIC typically selects more complex models than alternative approaches. Although ML estimation with different best-fit models results in incongruent tree topologies ~50% of the time, these differences are primarily attributable to alternative resolutions of poorly supported nodes. Furthermore, topologies and bootstrap values estimated with ML using alternative statistically supported models are more similar to each other than to topologies and bootstrap values estimated with ML under the Kimura two-parameter (K2P) model or with maximum parsimony (MP). In addition, Swofford-Olsen-Waddell-Hillis (SOWH) tests indicate that ML trees estimated with alternative best-fit models are usually not significantly different from each other when evaluated with the same model. However, ML trees estimated with statistically supported models are often significantly suboptimal relative to ML trees estimated with the K2P model when both are evaluated under K2P, indicating that not all models perform in an equivalent manner.
Nevertheless, the use of alternative statistically supported models generally does not affect tests of monophyletic relationships under either the Shimodaira-Hasegawa (S-H) or SOWH methods. Our results suggest that although choice in model selection has a strong impact on optimal tree topology, it rarely affects evolutionary inferences drawn from the data because differences are mainly confined to poorly supported nodes. Moreover, since ML with alternative best-fit models tends to produce more similar estimates of phylogeny than ML under the K2P model or MP, the use of any statistically based model-selection method is vastly preferable to forgoing the model-selection process altogether.
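
The abstract reports that AIC typically selects more complex models than the alternatives. One way to see why is to compare the standard penalties: AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, so whenever n ≥ 8, ln(n) exceeds 2 and BIC charges more per extra parameter than AIC. The minimal Python sketch below computes both criteria and a single likelihood-ratio step between two nested models (statistic 2(ln L_alt − ln L_null), referred to a chi-squared distribution with df equal to the difference in parameter counts). It is not the authors' pipeline and does not cover decision theory. Assumptions, all hypothetical: the names `ModelFit`, `aic`, `bic`, `lrt_statistic`, `chi2_sf`, and `best`; the log-likelihood values; n taken as the number of alignment sites (a common but debated choice); and k counting only substitution-model parameters (K2P = 1, HKY = 4, GTR = 8). On a fixed tree, branch lengths add the same constant to every model's k, so they do not change the rankings.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelFit:
    """Hypothetical record: maximized ln L and free-parameter count for one model."""
    name: str
    log_likelihood: float
    k: int


def aic(fit: ModelFit) -> float:
    # AIC = 2k - 2 ln L; smaller is better.
    return 2 * fit.k - 2 * fit.log_likelihood


def bic(fit: ModelFit, n: int) -> float:
    # BIC = k ln(n) - 2 ln L; n is assumed here to be the number of sites.
    return fit.k * math.log(n) - 2 * fit.log_likelihood


def lrt_statistic(null: ModelFit, alt: ModelFit) -> float:
    # Likelihood-ratio statistic for nested models: 2 (ln L_alt - ln L_null).
    return 2 * (alt.log_likelihood - null.log_likelihood)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-squared probability for integer df >= 1 (closed-form series)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def best(fits, score):
    # The selected model is the one with the smallest criterion value.
    return min(fits, key=score)


if __name__ == "__main__":
    # Hypothetical fits on one fixed tree (illustrative numbers, not from the study).
    n_sites = 1000
    fits = [
        ModelFit("K2P", -5230.0, 1),
        ModelFit("HKY", -5226.5, 4),
        ModelFit("GTR", -5219.0, 8),
    ]
    print("AIC picks", best(fits, aic).name)
    print("BIC picks", best(fits, lambda f: bic(f, n_sites)).name)
    stat = lrt_statistic(fits[0], fits[1])
    df = fits[1].k - fits[0].k
    print(f"LRT K2P vs HKY: stat={stat:.2f}, df={df}, p={chi2_sf(stat, df):.3f}")
```

With these made-up numbers AIC favors GTR while BIC favors K2P, and the K2P-versus-HKY likelihood-ratio step is not significant at the 0.05 level. This illustrates how different criteria can select different models for the same data set.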

Keywords: Akaike information criterion; Bayesian information criterion; decision theory; hypothesis tests; likelihood-ratio test; maximum likelihood; model selection; nonparametric bootstrap

Received June 26, 2007; Revised September 10, 2007; Accepted October 23, 2007

This article has been cited by other articles:

H. Huang and L. L. Knowles. What Is the Danger of the Anomaly Zone for Empirical Phylogenetics? Syst Biol, October 1, 2009; 58(5): 527-536.

A. Stamatakis, P. Hoover, and J. Rougemont. A Rapid Bootstrap Algorithm for the RAxML Web Servers. Syst Biol, October 1, 2008; 57(5): 758-771.