Skip Navigation

Systematic Biology 2008 57(2):216-230; doi:10.1080/10635150802032990
This Article
Right arrow Full Text Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (3)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Ross, H. A.
Right arrow Articles by Sibon Li, W. L.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Ross, H. A.
Right arrow Articles by Sibon Li, W. L.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 2008 Society of Systematic Biologists

Testing the Reliability of Genetic Methods of Species Identification via Simulation

Howard A. Ross, Sumathi Murugan and Wai Lok Sibon Li

Bioinformatics Institute, University of Auckland Private Bag 92019, Auckland Mail Centre, Auckland 1142, New Zealand; E-mail: h.ross{at}auckland.ac.nz (H.A.R.)

Edited by Marshal Hedin


   Abstract

Although genetic methods of species identification, especially DNA barcoding, are strongly debated, tests of these methods have been restricted to a few empirical cases for pragmatic reasons. Here we use simulation to test the performance of methods based on sequence comparison (BLAST and genetic distance) and tree topology over a wide range of evolutionary scenarios. Sequences were simulated on a range of gene trees spanning almost three orders of magnitude in tree depth and in coalescent depth; that is, deep or shallow trees with deep or shallow coalescences. When the query's conspecific sequences were included in the reference alignment, the rate of positive identification was related to the degree to which different species were genetically differentiated. The BLAST, distance, and liberal tree-based methods returned higher rates of correct identification than did the strict tree-based requirement that the query was within, but not sister to, a single-species clade. Under this more conservative approach, ambiguous outcomes occurred in inverse proportion to the number of reference sequences per species. When the query's conspecific sequences were not in the reference alignment, only the strict tree-based approach was relatively immune to making false-positive identifications. Thresholds affected the rates at which false-positive identifications were made when the query's species was unrepresented in the reference alignment but did not otherwise influence outcomes. A conservative approach using the strict tree-based method should be used initially in large-scale identification systems, with effort made to maximize sequence sampling within species. Once the genetic variation within a taxonomic group is well characterized and the taxonomy resolved, then the choice of method used should be dictated by considerations of computational efficiency. The requirement for extensive genetic sampling may render these techniques inappropriate in some circumstances.

Keywords: BLAST; DNA barcoding; phylogenetic; species identification

Received May 3, 2007; Revised July 13, 2007; Accepted January 11, 2008
Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?




Disclaimer: Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.