Systematic Biology 2008 57(4):519-539; doi:10.1080/10635150802206883

© 2008 Society of Systematic Biologists

Optimal Data Partitioning and a Test Case for Ray-Finned Fishes (Actinopterygii) Based on Ten Nuclear Loci

Chenhong Li1, Guoqing Lu2 and Guillermo Ortí1

1 School of Biological Sciences, University of Nebraska Lincoln, NE 68588, USA; E-mail: cli{at}unlserve.unl.edu (C.L.); gorti{at}unlserve.unl.edu (G.O.)
2 Department of Biology, University of Nebraska Omaha, NE 68182, USA; E-mail: glu3{at}mail.unomaha.edu

Edited by Thomas Buckley


Abstract

Data partitioning, the combined phylogenetic analysis of homogeneous blocks of data, is a common strategy used to accommodate heterogeneities in complex multilocus data sets. Variation in evolutionary rates and substitution patterns among sites is typically addressed by partitioning data by gene, codon position, or both. Excessive partitioning of the data, however, could lead to overparameterization; therefore, it seems critical to define the minimum number of partitions necessary to improve the overall fit of the model. We propose a new method, based on cluster analysis, to find an optimal partitioning strategy for multilocus protein-coding data sets. A heuristic exploration of alternative partitioning schemes, based on Bayesian and maximum likelihood (ML) criteria, is shown here to produce an optimal number of partitions. We tested this method using sequence data of 10 nuclear genes collected from 52 ray-finned fish (Actinopterygii) and four tetrapods. The concatenated sequences included 7995 nucleotide sites maximally split into 30 partitions defined a priori based on gene and codon position. Our results show that a model based on only 10 partitions defined by cluster analysis performed better than partitioning by both gene and codon position. Alternative data partitioning schemes are also shown to affect the topologies resulting from phylogenetic analysis, especially when Bayesian methods are used, suggesting that overpartitioning may be of major concern. The phylogenetic relationships among the major clades of ray-finned fish were assessed using the best data-partitioning schemes under ML and Bayesian methods. Some significant results include the monophyly of "Holostei" (Amia and Lepisosteus), the sister-group relationships between (1) esociforms and salmoniforms and (2) osmeriforms and stomiiforms, the polyphyly of Perciformes, and a close relationship of cichlids and atherinomorphs.

Keywords: Cluster analysis; data partitioning; Holostei; nuclear loci; phylogenetics; ray-finned fish; Actinopterygii

Received November 2, 2007; Revised January 14, 2008; Accepted April 7, 2008
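
The partitioning idea summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which quantities were clustered or which linkage rule was used, so the placeholder gene names, the single invented "rate" value per block, and the average-linkage rule below are all assumptions made for illustration. The sketch enumerates the 10 x 3 = 30 a priori blocks and merges them step by step, which yields one candidate scheme for every number of partitions from 30 down to 1.

```python
import math
from itertools import product

# Placeholder names; the real loci are not listed in the abstract.
GENES = [f"gene{i}" for i in range(1, 11)]
CODON_POSITIONS = (1, 2, 3)


def a_priori_blocks(genes=GENES, positions=CODON_POSITIONS):
    """Return every (gene, codon position) block, e.g. ('gene1', 1)."""
    return list(product(genes, positions))


def average_linkage(features, cluster_a, cluster_b):
    """Mean Euclidean distance between all cross-cluster pairs of blocks."""
    total = sum(math.dist(features[a], features[b])
                for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def nested_schemes(features):
    """Agglomerative clustering of blocks described by feature tuples.

    Yields one partitioning scheme per merge level, starting with every
    block on its own and ending with a single partition.
    """
    clusters = [[block] for block in features]
    yield [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(features, clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
        yield [list(c) for c in clusters]


# Invented toy features (NOT data from the study): one "rate" per block,
# dominated by codon position with a small gene-specific offset.
BASE_RATE = {1: 0.5, 2: 0.2, 3: 3.0}
features = {
    (gene, pos): (BASE_RATE[pos] + 0.01 * idx,)
    for idx, gene in enumerate(GENES)
    for pos in CODON_POSITIONS
}

schemes = list(nested_schemes(features))
three_partitions = schemes[27]  # schemes[k] holds 30 - k partitions
print(sorted(len(p) for p in three_partitions))
```

With these toy values the three-partition scheme simply recovers the codon positions, because the invented rates were built that way. Choosing among the 30 nested schemes would still require scoring each one with a model-fit criterion computed by a phylogenetic program, which this sketch does not attempt.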
