© 2007 Society of Systematic Biologists
Is Multiple-Sequence Alignment Required for Accurate Inference of Phylogeny?
1 Australian Research Council Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia; E-mail: m.ragan{at}imb.uq.edu.au
Edited by Rod Page: Associate Editor
Abstract
The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from http://www.bioinformatics.org.au (as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from k-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length k of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. 
Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.
Keywords: Alignment-free methods; Bayesian; distance estimation; phylogenetics; tree reconstruction
Received May 3, 2006; Revised July 18, 2006; Accepted October 20, 2006
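To make the idea of a word-based, alignment-free distance concrete, the following is a minimal sketch and not the decaf+py implementation or any of the ten methods evaluated in the article. It assumes one simple classical choice: each sequence is summarized by the relative frequencies of its overlapping words of length k, and two sequences are compared by the squared Euclidean distance between those frequency vectors. The function names (`kmer_counts`, `kmer_distance`, `distance_matrix`) and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k):
    """Squared Euclidean distance between relative k-mer frequency vectors.

    Illustrative only: this is one simple word-based measure, assumed here
    for the example, not a method taken from the article.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    na, nb = sum(ca.values()), sum(cb.values())
    if na == 0 or nb == 0:
        raise ValueError("each sequence must contain at least k characters")
    # A Counter returns 0 for words it has not seen, so the union is safe.
    words = set(ca) | set(cb)
    return sum((ca[w] / na - cb[w] / nb) ** 2 for w in words)


def distance_matrix(seqs, k):
    """Pairwise distances for a dict mapping sequence name to sequence."""
    return {
        (x, y): kmer_distance(seqs[x], seqs[y], k)
        for x, y in combinations(sorted(seqs), 2)
    }


if __name__ == "__main__":
    # Made-up toy sequences; real inputs would be unaligned homologs.
    toy = {
        "seqA": "MKVLAAGIVGLLLAQ",
        "seqB": "MKVLAAGLVGLLMAQ",
        "seqC": "MSTNPKPQRKTKRNT",
    }
    for pair, d in distance_matrix(toy, k=2).items():
        print(pair, round(d, 4))
```

The resulting pairwise distances could then be passed to any distance-based tree-building method, such as neighbor joining. The word length k is the parameter whose optimal range the abstract reports as stable across data sets; the value used above is arbitrary.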
This article has been cited by other articles:
G. A. Wu, S.-R. Jun, G. E. Sims, and S.-H. Kim. Whole-proteome phylogeny of large dsDNA virus families by an alignment-free method. PNAS, August 4, 2009; 106(31): 12826-12831.
