
Systematic Biology 2008 57(3):335-346; doi:10.1080/10635150802158688

© 2008 Society of Systematic Biologists

The PhyLoTA Browser: Processing GenBank for Molecular Phylogenetics Research

Michael J. Sanderson¹, Darren Boss¹, Duhong Chen², Karen A. Cranston¹ and Andre Wehe²

¹ Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA; E-mail: sanderm{at}email.arizona.edu (M.J.S.); dboss{at}email.arizona.edu (D.B.); cranston{at}email.arizona.edu (K.C.)
² Department of Computer Science, Iowa State University, Ames, IA, USA; E-mail: duhong{at}iastate.edu (D.C.); andre{at}wehe.us (A.W.)

Edited by Olaf Bininda-Emonds


Abstract

As an archive of sequence data for over 165,000 species, GenBank is an indispensable resource for phylogenetic inference. Here we describe an informatics processing pipeline and online database, the PhyLoTA Browser (http://loco.biosci.arizona.edu/pb), which offers a view of GenBank tailored for molecular phylogenetics. The first release of the Browser is computed from 2.6 million sequences representing the taxonomically enriched subset of GenBank sequences for eukaryotes (excluding most genome survey sequences, ESTs, and other high-throughput data). In addition to summarizing sequence diversity and species diversity across nodes in the NCBI taxonomy, it reports 87,000 potentially phylogenetically informative clusters of homologous sequences, which can be viewed or downloaded along with provisional alignments and coarse phylogenetic trees. At each node in the NCBI hierarchy, the user can display a "data availability matrix": a subtaxa-by-clusters matrix whose entries list all available sequences. This matrix provides a guidepost for subsequent assembly of multigene data sets or supertrees. The database also allows comparison with results from previous GenBank releases, highlighting recent additions of sequences or taxa to GenBank and letting investigators track worldwide progress in data availability. Although the reported alignments and trees are only rough approximations, the database reports several statistics correlated with alignment quality to help users choose among alternative data sources.
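The abstract describes the data availability matrix and the release-to-release comparison only at a high level. The following is a minimal Python sketch of both ideas, assuming a hypothetical flat list of (accession, subtaxon, cluster) records; this input format, the function names, and the example records are illustrative assumptions, not PhyLoTA's actual schema or code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical flat record: (accession, subtaxon, cluster_id).
# This is an illustrative input format, not the PhyLoTA database schema.
Record = Tuple[str, str, int]
Matrix = Dict[str, Dict[int, List[str]]]


def availability_matrix(records: Iterable[Record]) -> Matrix:
    """Group accessions into a subtaxa-by-clusters matrix of available sequences."""
    matrix: Matrix = defaultdict(lambda: defaultdict(list))
    for accession, subtaxon, cluster in records:
        matrix[subtaxon][cluster].append(accession)
    return matrix


def filled_cells(records: Iterable[Record]) -> Set[Tuple[str, int]]:
    """Return every (subtaxon, cluster) cell holding at least one sequence."""
    return {(subtaxon, cluster) for _, subtaxon, cluster in records}


def new_cells(old: Iterable[Record], new: Iterable[Record]) -> Set[Tuple[str, int]]:
    """Cells filled in the newer release that were empty in the older one."""
    return filled_cells(new) - filled_cells(old)


def render(matrix: Matrix) -> str:
    """Tab-separated view: sequence counts per cell, '.' where no data exist."""
    clusters = sorted({c for row in matrix.values() for c in row})
    lines = ["\t".join(["subtaxon"] + [f"cl{c}" for c in clusters])]
    for subtaxon in sorted(matrix):
        row = matrix[subtaxon]
        # Check membership first so the defaultdict does not create empty cells.
        cells = [str(len(row[c])) if c in row else "." for c in clusters]
        lines.append("\t".join([subtaxon] + cells))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up records for two hypothetical releases.
    old_release = [("seq1", "A", 1), ("seq2", "B", 1)]
    new_release = old_release + [("seq3", "B", 2), ("seq4", "A", 1)]
    print(render(availability_matrix(new_release)))
    print(new_cells(old_release, new_release))
```

Running the example prints a matrix in which subtaxon A has two sequences in cluster 1 and none in cluster 2, while subtaxon B has one sequence in each cluster. It then reports `{('B', 2)}` as the only cell newly filled between the two releases. Storing accessions rather than bare counts keeps each cell traceable to the underlying sequences, which a multigene or supertree assembly step would need.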

Keywords: GenBank; phyloinformatics; phylogenetic database; phylogenomics

Received October 29, 2007; Revised January 8, 2008; Accepted March 7, 2008