Systematic Biology 2006 55(5):740-755; doi:10.1080/10635150600969872

© 2006 Society of Systematic Biologists

SDM: A Fast Distance-Based Approach for (Super)Tree Building in Phylogenomics

Alexis Criscuolo1,2, Vincent Berry2, Emmanuel J. P. Douzery1 and Olivier Gascuel2

1 Groupe Phylogénie Moléculaire, ISEM, Université Montpellier 2, CC 064 34095, Montpellier Cedex 05, France
2 Equipe Méthodes et Algorithmes pour la Bioinformatique, LIRMM (CNRS, Université Montpellier 2) 161 rue Ada, 34392, Montpellier Cedex 05, France E-mail: gascuel{at}lirmm.fr (O.G.)

Edited by Olaf Bininda-Emonds: Associate Editor


Abstract

Phylogenomic studies aim to build phylogenies from large sets of homologous genes. Such "genome-sized" data require fast methods, because of the typically large numbers of taxa examined. In this framework, distance-based methods are useful for exploratory studies and for building a starting tree to be refined by a more powerful maximum likelihood (ML) approach. However, estimating evolutionary distances directly from concatenated genes gives poor topological signal, as genes evolve at different rates. We propose a novel method, named super distance matrix (SDM), which follows the same line as average consensus supertree (ACS; Lapointe and Cucumel, 1997) and combines the evolutionary distances obtained from each gene into a single distance supermatrix to be analyzed using a standard distance-based algorithm. SDM deforms the source matrices, without modifying their topological message, to bring them as close as possible to each other; these deformed matrices are then averaged to obtain the distance supermatrix. We show that this problem is equivalent to the minimization of a least-squares criterion subject to linear constraints. This problem has a unique solution, which is obtained by solving a linear system. As this system is sparse, solving it in practice requires O(n^a k^a) time, where n is the number of taxa, k the number of matrices, and a < 2, which allows the distance supermatrix to be quickly obtained. Several uses of SDM are proposed, from fast exploratory studies to more accurate approaches requiring heavier computing time. Using simulations, we show that SDM is a relevant alternative to the standard matrix representation with parsimony (MRP) method, notably when the taxa sets of the different genes have low overlap. We also show that SDM can be used to build an excellent starting tree for an ML approach, which both reduces the computing time and increases the topological accuracy. We use SDM to analyze the data set of Gatesy et al. (2002, Syst. Biol. 51: 652–664) that involves 48 genes of 75 placental mammals. The results indicate that these genes have strong rate heterogeneity and confirm the simulation conclusions.
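The general idea of deforming the source matrices, averaging them, and obtaining the solution from a linear system can be illustrated with a deliberately simplified sketch. It is not the SDM criterion itself, which the abstract does not spell out. The sketch assumes that the only deformation allowed is one positive scale factor per gene matrix, that the factors are constrained to sum to the number of matrices, and that closeness is measured only on taxon pairs shared by two matrices. The function names (`solve_linear`, `scale_factors`, `supermatrix`) and the toy distances are hypothetical.

```python
from itertools import combinations

def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system (do the matrices overlap?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def scale_factors(matrices):
    """Assumed simplification: one scale factor alpha_p per matrix.

    Minimizes sum over matrix pairs (p, q) and shared taxon pairs (i, j) of
    (alpha_p * d_p[i, j] - alpha_q * d_q[i, j])**2, subject to sum(alpha) = k.
    The criterion is quadratic, alpha^T A alpha, so the constrained minimum
    comes from the linear system 2*A*alpha + lam*1 = 0, sum(alpha) = k.
    """
    k = len(matrices)
    A = [[0.0] * k for _ in range(k)]
    for p, q in combinations(range(k), 2):
        for pair in matrices[p].keys() & matrices[q].keys():
            dp, dq = matrices[p][pair], matrices[q][pair]
            A[p][p] += dp * dp
            A[q][q] += dq * dq
            A[p][q] -= dp * dq
            A[q][p] -= dp * dq
    system = [[2.0 * A[p][q] for q in range(k)] + [1.0] for p in range(k)]
    system.append([1.0] * k + [0.0])   # constraint row: sum(alpha) = k
    rhs = [0.0] * k + [float(k)]
    return solve_linear(system, rhs)[:k]  # drop the Lagrange multiplier

def supermatrix(matrices):
    """Rescale each matrix, then average each taxon pair over the
    matrices that actually contain it."""
    alphas = scale_factors(matrices)
    sums, counts = {}, {}
    for alpha, m in zip(alphas, matrices):
        for pair, d in m.items():
            sums[pair] = sums.get(pair, 0.0) + alpha * d
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: sums[pair] / counts[pair] for pair in sums}, alphas

# Toy data (hypothetical): gene 2 evolves twice as fast as gene 1,
# and the two genes share only taxa B and C.
g1 = {("A", "B"): 0.3, ("A", "C"): 0.5, ("B", "C"): 0.2}
g2 = {("B", "C"): 0.4, ("B", "D"): 0.6, ("C", "D"): 0.8}
sm, alphas = supermatrix([g1, g2])
```

In this toy case the slower gene is stretched and the faster one is shrunk until they agree on the shared pair (B, C). The pair (A, D), which no gene covers, is simply absent from the result; how such missing entries should be handled is outside the scope of this sketch.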

Keywords: Distance method; evolutionary distances; MRP; phylogenomics; supermatrix; supertree; total evidence

Received September 30, 2005; Revised December 10, 2005; Accepted April 12, 2006


