
Systematic Biology 2006 55(4):553-565; doi:10.1080/10635150600812544

© 2006 Society of Systematic Biologists

Searching for Convergence in Phylogenetic Markov Chain Monte Carlo

Robert G. Beiko1, Jonathan M. Keith2, Timothy J. Harlow1 and Mark A. Ragan1

1 ARC Centre in Bioinformatics and Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4072, Australia; E-mail: r.beiko{at}gmail.com (R.G.B.) and ARC Centre in Bioinformatics
2 Department of Mathematics, The University of Queensland, Brisbane, Australia

Edited by Jack Sullivan: Associate Editor


   Abstract

Markov chain Monte Carlo (MCMC) is a methodology that is gaining widespread use in the phylogenetics community and is central to phylogenetic software packages such as MrBayes. An important issue for users of MCMC methods is how to select appropriate values for adjustable parameters such as the length of the Markov chain or chains, the sampling density, the proposal mechanism, and, if Metropolis-coupled MCMC is being used, the number of heated chains and their temperatures. Although some parameter settings have been examined in detail in the literature, others are frequently chosen with more regard to computational time or personal experience with other data sets. Such choices may lead to inadequate sampling of tree space or an inefficient use of computational resources. We performed a detailed study of convergence and mixing for 70 randomly selected, putatively orthologous protein sets with different sizes and taxonomic compositions. Replicated runs from multiple random starting points permit a more rigorous assessment of convergence, and we developed two novel statistics, δ and ε, for this purpose. Although likelihood values invariably stabilized quickly, adequate sampling of the posterior distribution of tree topologies took considerably longer. Our results suggest that multimodality is common for data sets with 30 or more taxa and that this results in slow convergence and mixing. However, we also found that the pragmatic approach of combining data from several short, replicated runs into a "metachain" to estimate bipartition posterior probabilities provided good approximations, and that such estimates were no worse in approximating a reference posterior distribution than those obtained using a single long run of the same length as the metachain. Precision appears to be best when heated Markov chains have low temperatures, whereas chains with high temperatures appear to sample trees with high posterior probabilities only rarely.

Keywords: Bayesian phylogenetic inference; heating parameter; Markov chain Monte Carlo; replicated chains

Received April 25, 2005; Revised August 10, 2005; Accepted April 19, 2006
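
The "metachain" idea in the abstract, pooling post-burn-in samples from several short replicated runs and estimating bipartition posterior probabilities from the pooled sample, can be illustrated with a minimal sketch. This is not the authors' code: the function names (`metachain`, `bipartition_frequencies`), the `burn_in` parameter, and the representation of a sampled tree as a set of bipartitions (each bipartition being one side of a split, given as a set of taxon names) are all assumptions made for illustration. The statistics δ and ε are not defined in the abstract and are not implemented here.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List

# Assumed representation (hypothetical, not from the paper):
# a bipartition is one side of a split, stored as a frozenset of taxon names;
# a sampled tree is the frozenset of its non-trivial bipartitions.
Bipartition = FrozenSet[str]
Tree = FrozenSet[Bipartition]


def metachain(runs: List[List[Tree]], burn_in: int = 0) -> List[Tree]:
    """Pool samples from replicated runs, dropping the first `burn_in` of each."""
    pooled: List[Tree] = []
    for run in runs:
        pooled.extend(run[burn_in:])
    return pooled


def bipartition_frequencies(samples: Iterable[Tree]) -> Dict[Bipartition, float]:
    """Estimate each bipartition's posterior probability as its sample frequency."""
    counts: Counter = Counter()
    n = 0
    for tree in samples:
        counts.update(tree)  # each bipartition in the tree is counted once
        n += 1
    if n == 0:
        raise ValueError("no samples left after burn-in")
    return {split: count / n for split, count in counts.items()}


# Toy four-taxon example: each topology has a single non-trivial split.
AB: Bipartition = frozenset({"A", "B"})
AC: Bipartition = frozenset({"A", "C"})
tree_ab: Tree = frozenset({AB})
tree_ac: Tree = frozenset({AC})

run1 = [tree_ab, tree_ab, tree_ac]
run2 = [tree_ab, tree_ac, tree_ac]

pooled = metachain([run1, run2], burn_in=1)
freqs = bipartition_frequencies(pooled)
```

In this toy case the pooled sample holds four trees, so `freqs[AB]` is 0.25 and `freqs[AC]` is 0.75. A real analysis would also need a consistent canonical form for each split (for example, always storing the side that excludes a fixed reference taxon), which this sketch leaves to the caller.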


This article has been cited by other articles:


S. S. Renner, G. W. Grimm, G. M. Schneeweiss, T. F. Stuessy, and R. E. Ricklefs
Rooting and Dating Maples (Acer) with an Uncorrelated-Rates Molecular Clock: Implications for North American/Asian Disjunctions
Syst Biol, October 1, 2008; 57(5): 795 - 808.


M. Hohl and M. A. Ragan
Is Multiple-Sequence Alignment Required for Accurate Inference of Phylogeny?
Syst Biol, April 1, 2007; 56(2): 206 - 221.


