© 2007 Society of Systematic Biologists
New Approaches to Phylogenetic Tree Search and Their Application to Large Numbers of Protein Alignments
1 University of Manchester, Faculty of Life Sciences, Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK. E-mail: simon.whelan@manchester.ac.uk
2 EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Edited by Thomas Buckley
Abstract
Phylogenetic tree estimation plays a critical role in a wide variety of molecular studies, including molecular systematics, phylogenetics, and comparative genomics. Finding the optimal tree relating a set of sequences using score-based (optimality criterion) methods, such as maximum likelihood and maximum parsimony, may require all possible trees to be considered, which is not feasible even for modest numbers of sequences. In practice, trees are estimated using heuristics that represent a trade-off between topological accuracy and speed. I present a series of novel algorithms suitable for score-based phylogenetic tree reconstruction that demonstrably improve the accuracy of tree estimates while maintaining high computational speed. The heuristics work by efficiently exploring large numbers of trees using novel hill-climbing and resampling strategies. These heuristics, and other computational approximations, are implemented for maximum likelihood estimation of trees in the program Leaphy, and its performance is compared with that of other popular phylogenetic programs. Trees are estimated from 4059 different protein alignments using a selection of phylogenetic programs, and the likelihoods of the resulting tree estimates are compared. Trees estimated using Leaphy have likelihoods equal to or better than those of trees estimated using the other programs in 4004 (98.6%) families, and Leaphy finds a unique best tree, one that no other program found, in 1102 (27.1%) families. The improvement is particularly marked for larger families (80 to 100 sequences), where Leaphy finds a unique best tree in 81.7% of families.
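The claim that considering all possible trees is infeasible even for modest numbers of sequences can be made concrete with the standard count of unrooted, fully bifurcating trees on n labelled sequences, (2n-5)!! = 3 × 5 × ... × (2n-5). The sketch below is illustrative only and is not part of Leaphy; the function name `num_unrooted_trees` is hypothetical.

```python
def num_unrooted_trees(n: int) -> int:
    """Return the number of distinct unrooted, fully bifurcating trees
    on n labelled taxa, i.e. the double factorial (2n - 5)!!.

    Hypothetical helper for illustration; not taken from Leaphy.
    """
    if n < 3:
        raise ValueError("an unrooted binary tree needs at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5. For n = 3 the range is
    # empty and the single possible tree is counted by the initial 1.
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 10, 20, 50, 100):
        print(n, num_unrooted_trees(n))
```

For 10 sequences there are already 2,027,025 unrooted trees, and for the largest families in this study (100 sequences) the count has well over a hundred digits, which is why practical programs rely on heuristic search rather than exhaustive enumeration.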
Keywords: Algorithms; evolution; phylogenetic tree inference; tree estimation heuristics
Received July 12, 2006; Revised October 17, 2006; Accepted April 6, 2007


