© 2008 Society of Systematic Biologists
Inferring Species Membership Using DNA Sequences with Back-Propagation Neural Networks
1 Institute of Zoology, Chinese Academy of Sciences, Beijing 100080, P. R. China; E-mail: zhangab2008{at}yahoo.com.cn; zhangab{at}ioz.ac.cn
2 University of Alaska Museum, 907 Yukon Drive, Fairbanks, Alaska 99775-6960, USA
3 Molecular Evolution and Animal Systematics, University of Leipzig, Talstrasse 33, D-04103 Leipzig, Germany
4 Current Address: Albanova University Center, Royal Institute of Biotechnology, SE-106 91 Stockholm, Sweden; E-mail: abzh{at}kth.se
Edited by Marshal Hedin
* To whom correspondence should be sent.
Abstract
DNA barcoding as a method for species identification is rapidly increasing in popularity. However, there are still relatively few rigorous methodological tests of DNA barcoding. Current distance-based methods are frequently criticized for treating the nearest neighbor as the closest relative via a raw similarity score, for lacking an objective set of criteria to delineate taxa, or for being incongruent with classical character-based taxonomy. Here, we propose an artificial intelligence–based approach, BP-based species identification, which infers species membership from DNA barcodes with back-propagation neural networks, as an addition to the spectrum of available methods. We demonstrate the value of this approach with simulated data sets representing different levels of sequence variation under coalescent simulations with various evolutionary models, as well as with two empirical data sets of COI sequences from East Asian ground beetles (Carabidae) and Costa Rican skipper butterflies. With a 630- to 690-bp fragment of the COI gene, we correctly identified 97.50% of 80 unknown ground beetle sequences and 95.63%, 96.10%, and 100% of 275, 205, and 9 unknown neotropical skipper butterfly sequences, respectively. Our simulation studies indicate that the success rate of species identification depends on the divergence of the sequences, the length of the sequences, and the number of reference sequences. Particularly in cases involving incomplete lineage sorting, this new BP-based method appears to be superior to commonly used methods for DNA-based species identification.
Keywords: Back-propagation; DNA barcoding; incomplete lineage sorting; neural networks; species identification
Received February 4, 2007; Revised May 3, 2007; Accepted January 11, 2008
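The general idea named in the abstract, assigning an unknown sequence to a species with a network trained by back-propagation on reference sequences, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-hot nucleotide encoding, the single tanh hidden layer with softmax output, the hyperparameters, and the toy 10-bp sequences (real barcodes are 630 to 690 bp of COI).

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat 4*len(seq) vector; unknown bases stay all-zero."""
    vec = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            vec[i, j] = 1.0
    return vec.ravel()

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_bp(X, y, n_species, hidden=8, epochs=500, lr=0.1, seed=0):
    """Train a one-hidden-layer network with full-batch back-propagation."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, n_species))
    b2 = np.zeros(n_species)
    T = np.eye(n_species)[y]              # one-hot species targets
    for _ in range(epochs):
        # forward pass
        H = np.tanh(X @ W1 + b1)
        P = softmax(H @ W2 + b2)
        # backward pass: gradients of the mean cross-entropy loss
        dZ2 = (P - T) / len(X)
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * (1.0 - H ** 2)
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
        # gradient-descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2

def predict(params, X):
    """Return the index of the most probable species for each encoded sequence."""
    W1, b1, W2, b2 = params
    return softmax(np.tanh(X @ W1 + b1) @ W2 + b2).argmax(axis=1)

# Hypothetical reference sequences for two made-up species (labels 0 and 1).
references = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC",
              "TTGCATGCAT", "TTGCATGCAA", "TTGCTTGCAT"]
labels = [0, 0, 0, 1, 1, 1]
# Hypothetical "unknown" query sequences, each one substitution away from a reference.
queries = ["ACGTACGAAC", "TTGCATGGAT"]

X_ref = np.array([one_hot(s) for s in references])
X_query = np.array([one_hot(s) for s in queries])
params = train_bp(X_ref, labels, n_species=2)
assigned = predict(params, X_query)
```

In this sketch `assigned` holds one species index per query. An approach of this kind relies on aligned sequences of equal length, and, in line with the abstract, its accuracy would be expected to depend on sequence divergence, sequence length, and the number of reference sequences.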
