© 2006 Society of Systematic Biologists
DNA Barcoding Will Often Fail to Discover New Animal Species over Broad Parameter Space
Edited by Marshal Hedin: Associate Editor
1 Museum of Vertebrate Zoology, University of California Berkeley, California, 94720-3160, USA E-mail: mhick{at}berkeley.edu (M.J.H.)
2 Florida Museum of Natural History, University of Florida Gainesville, Florida, 32611-7800, USA
Abstract
With increasing force, genetic divergence of mitochondrial DNA (mtDNA) is being argued as the primary tool for discovery of animal species. Two thresholds of single-gene divergence have been proposed: reciprocal monophyly, and 10 times greater genetic divergence between than within species (the "10x rule"). To explore quantitatively the utility of each approach, we couple neutral coalescent theory and the classical Bateson-Dobzhansky-Muller (BDM) model of speciation. The joint stochastic dynamics of these two processes demonstrate that both thresholds fail to "discover" many reproductively isolated lineages under a single incompatibility BDM model, especially when BDM loci have been subject to divergent selection. Only when populations have been isolated for > 4 million generations did these thresholds achieve error rates of <10% under our model that incorporates variable population sizes. The high error rate evident in simulations is corroborated with six empirical data sets. These properties suggest that single-gene, high-throughput approaches to discovering new animal species will bias large-scale biodiversity surveys, particularly toward missing reproductively isolated lineages that have emerged by divergent selection or other mechanisms that accelerate reproductive isolation. Because single-gene thresholds for species discovery can result in substantial error at recent divergence times, they will misrepresent the correspondence between recently isolated populations and reproductively isolated lineages (= species).
Keywords: Allopatric; Bateson-Dobzhansky-Muller; DNA barcode; peripatric; reciprocal monophyly
Received December 22, 2005; Revised March 14, 2006; Accepted May 10, 2006
Uncertainty in species identification and discovery is inherent in biodiversity surveys and ecological studies of species-rich communities, and it has been argued forcefully that high-throughput screening of sequence variation at a single mitochondrial gene (sometimes referred to as "DNA-barcoding") will greatly improve such endeavors (Janzen, 2004). For ecologists, DNA-based identification of individuals at any life history stage would be a boon, and when done in the context of a rigorous taxonomic and phylogenetic framework will enhance the impact of systematics on biodiversity science (Wheeler et al., 2004). The controversy (Hebert and Gregory, 2005; Lee, 2004; Meyer and Paulay, 2005; Moritz and Cicero, 2004; Will et al., 2005) centers on the proposed use of single-gene thresholds as a primary step in species discovery (Hebert et al., 2004; Wiens and Penkrot, 2002). Based on well-established population genetic theory (Tajima, 1983), the major concerns are the substantial noise in the rate at which any such threshold will be met (Hudson and Turelli, 2003; Takahata and Nei, 1985) and that a single-locus threshold can be confounded by introgression or selection (Machado and Hey, 2003). In addition, several studies suggest that there will be a bias against the discovery of recently evolved species, such as those arising via divergent selection (Mendelson and Shaw, 2005; Turner, 1999; Zigler et al., 2005). However, advocates of single-gene thresholds argue that the practical benefits of this approach outweigh the potential for inaccuracy when faced with the current biodiversity crisis (Janzen, 2004). Although falsely discovering new species (false-positive error) is less of an issue because these highly diverged historic entities can still be regarded as important components of diversity (Moritz, 2002), it is crucial to better understand when single-gene thresholds will miss (false-negative error) newly discovered species.
Herein, we combine predictions of speciation dynamics from a simple yet representative model with the probabilistic assessment of the rates at which two alternative single-gene thresholds for species discovery are met. Further, we use published data on mtDNA divergence among species for which reproductive isolation has been assessed to determine whether the error rates predicted by theory occur in nature.
The general approach to species discovery with a single-gene threshold is to calculate a metric from DNA sequence data that is collected from n1 individuals of a newly discovered candidate species and n2 individuals of a reference species that is its closest relative. These n1 individuals are flagged as being from a candidate new species if this metric exceeds a particular threshold. The two proposed thresholds for mtDNA that we consider are reciprocal monophyly (Wiens and Penkrot, 2002) and an average pairwise genetic difference between the n1 individuals of the candidate species and the reference species that is 10 times greater than the average within-species pairwise difference found in the particular taxonomic group (the "10x rule"; Hebert et al., 2004). We explored these two proposed methods of species discovery throughout a range of conditions by employing simulations that are based on speciation theory (Gavrilets, 2003) and population genetic theory (Tajima, 1983). To this end, we report probabilities of false-positive and false-negative errors given these two proposed single-gene species-discovery thresholds (Hebert et al., 2003; Wiens and Penkrot, 2002) and the classical two-locus/single-incompatibility Bateson-Dobzhansky-Muller (BDM) model of biological species formation by geographic isolation. Although many models of reproductive isolation are likely to be operative in nature, the BDM model guides our intuition about the general utility of single-gene thresholds. Furthermore, it is well characterized and tractable, and its dynamics capture a range of speciation times implicit across many pre- and postzygotic isolation models (Gavrilets, 2003; Turelli et al., 2001).
We explored three modes of single-incompatibility BDM speciation—two "allopatric" models, in which a population gives rise to two equally sized populations that can become reproductively isolated (Fig. 1A), and the "peripatric" model of speciation in which a small "island" population is colonized from a much larger "continental" population with subsequent isolation (Fig. 1B). For each set of fixed conditions, error rates were estimated from 10,000 joint simulations of the BDM loci and the species discovery locus (mtDNA; see Materials and Methods). Results are presented for typical mutation rates (Brown et al., 1982; DeSalle et al., 1987; Futuyma, 1998), base frequencies, and number of base pairs at the mtDNA species discovery locus (Hebert et al., 2004).
To place the results of our joint simulations into an empirical context, we apply the two thresholds to six data sets that have both mtDNA data (cytochrome oxidase 1 or cytochrome b) and data on reproductive isolation between pairs of nominal species (Bolnick and Near, 2005; Coyne and Orr, 1997; Mendelson, 2003; Presgraves, 2002; Price and Bouvier, 2002; Sasa et al., 1998; Zigler et al., 2005). Specifically, we explore error rates given both the condition of reciprocal monophyly, where possible, and the utility of a 10x threshold and two species delineation criteria between these pairs of named lineages (current taxonomic status or reproductive isolation). Whereas the temporal and genealogical dynamics of single and multiple loci have been examined and discussed extensively (Hudson and Coyne, 2002; Hudson and Turelli, 2003; Neigel and Avise, 1986; Rosenberg, 2003; Wollenberg and Avise, 1997), we are not aware of any prior study examining the joint dynamics of neutral "marker loci" and models of reproductive isolation. Our purpose in using theory is to train the intuition of empiricists, and to this end we explore conditions under which single-gene thresholds could be applied.
Materials and Methods
To calculate the probability of species discovery error (false-positive and false-negative error rates) under different fixed parameters (divergence time and selection strength), we jointly simulated the dynamic process of speciation and the evolution of the DNA-barcode locus. All populations were assumed to be panmictic, and 10 individuals were sampled from both the reference species and the sister population that might be a new species, such that each simulated data set consists of 20 scored individuals. We did not allow for postdivergence gene flow (i.e., hybridization).
BDM Models
Speciation was assumed to result from a two-locus, two-allele, single-incompatibility Bateson-Dobzhansky-Muller (BDM) model (Dobzhansky, 1937) under two general scenarios: (1) an "allopatric" model (Fig. 1A), in which one ancestral population diverges into two species (the reference species and a newly discovered putative sister species); and (2) a "peripatric" model (Fig. 1B), where the newly discovered "island" population arose by colonization from the much larger "continental" species (the reference species). Both models involve two loci that have two alleles. According to this classical model of reproductive isolation, allele a at one BDM locus is incompatible with allele B at the other BDM locus such that an individual with genotype aabb is reproductively incompatible with an individual with genotype AABB. This incompatibility can arise for a number of reasons including various epistatic interactions (Wu, 2001) or loss of function in duplicated genes (Lynch, 2002). In this latter case, gametes from hybrids can be lacking in functional genes for a duplicate pair of genes (Lynch and Force, 2000).
In the allopatric BDM model, both isolated populations are initially fixed for single alleles at the two BDM loci (A and b) such that mutual reproductive isolation arises after two opposing substitutions in the two populations (Fig. 1A(i)). Reverse mutations do not occur, and reproductive isolation is not certain under the allopatric model if the same substitution occurs in the two isolated populations (Fig. 1A(ii)). In this allopatric model, positive selection occurs at one of the BDM loci in one of the populations and at the other BDM locus in the other population. In addition to this scenario, we also report results after conditioning that reproductive isolation is certain under the allopatric model (see Fig. 4).
In the peripatric BDM model, both descendent populations are initially fixed for single alleles a and b at the two BDM loci. Because of relaxed selection in the new island environment, substitutions only occur in the island population such that two substitutions in the island population result in a fixation of the AB genotype. Because a is incompatible with B, reproductive isolation arises (Fig. 1B) and the two populations can be considered distinct species under most definitions (Dobzhansky, 1937; Gavrilets, 2003). We explored neutrality at the BDM loci and also explored varying levels of divergent selection operating at the BDM loci (Dobzhansky, 1937; Gavrilets, 2003; Nei, 1976). In the peripatric model, this selection only occurs in the new island population.
Following theory, we assume that the waiting time to BDM speciation (T) is determined by the waiting time between the substitutions (mutations followed by fixations) at the two BDM loci. Because fixation is a stochastic process involving genetic drift and variation within populations, this waiting time can be considered a Poisson process (Kimura, 1968), and we therefore allowed T to follow an exponential distribution (DeGroot, 1986), with the mean determined by a specific BDM model. Under both neutral and divergent selection BDM models, parameters affecting T include a mutation rate (µ) of the BDM loci and probability of speciation (u), where the substitution rates of the BDM loci equal µ under neutrality (Gavrilets, 2003; Kimura, 1968; Nei, 1976). Under divergent selection, two additional parameters affecting T include diploid effective population size (N) and the coefficient of selection (s). We assume that both BDM loci are autosomal with µ = 2.46 x 106 per My, which is 1/10th of our assumed mtDNA-barcode locus rate.
In the neutral allopatric model, the two substitutions at the two BDM loci occur in the opposing populations such that T = 3/(4µ) (Fig. 1A). In the neutral peripatric model, the two substitutions at the two BDM loci occur in the smaller island population such that T = 2/µ (Fig. 1B). Although speciation is faster in the neutral allopatric model, speciation is not certain (u = 0.5), whereas it is certain in the neutral peripatric model (u = 1.0; Gavrilets, 2003). However, we additionally present results for the allopatric model after conditioning that speciation is certain. Given spatially heterogeneous selection in the allopatric model (Fig. 1A; the two BDM loci experience opposing positive selection in the two divergent populations), T = 3/(2µS), where S = 4Ns and S >> 1 (Gavrilets, 2003; Nei, 1976). Given divergent selection in the peripatric model, mutant alleles are advantageous in the newly colonized island environment (Fig. 1B) and $T = 2(1 - e^{-S})/(\mu S)$. Selection decreases T in both models, and although speciation is still not certain in the allopatric model, selection does increase its probability (u = 1 − 2/S; Gavrilets, 2003).
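The four expressions for the mean waiting time T and the speciation probability u can be collected into one small function. The following is a minimal sketch rather than the authors' code: the function name and the example rate are hypothetical, the allopatric selection formula is applied only as stated in the text (S >> 1), and u = 1 for the peripatric model under selection is an assumption carried over from the neutral case.

```python
import math


def bdm_waiting_time(model, mu, S=0.0):
    """Return (mean waiting time T, speciation probability u).

    model -- "allopatric" or "peripatric"
    mu    -- neutral substitution rate per BDM locus; T is in units of 1/mu
    S     -- scaled selection 4*N*s; S == 0 means neutral BDM loci
    """
    if mu <= 0 or S < 0:
        raise ValueError("mu must be positive and S non-negative")
    if model == "allopatric":
        if S == 0:
            return 3.0 / (4.0 * mu), 0.5
        # The text gives this form for S >> 1 only; small S is out of range.
        return 3.0 / (2.0 * mu * S), 1.0 - 2.0 / S
    if model == "peripatric":
        if S == 0:
            return 2.0 / mu, 1.0
        # -expm1(-S) equals 1 - exp(-S) but stays accurate for small S.
        # u = 1 under selection is an assumption (see lead-in).
        return 2.0 * -math.expm1(-S) / (mu * S), 1.0
    raise ValueError(f"unknown model: {model!r}")


if __name__ == "__main__":
    mu = 1.0e-6  # hypothetical rate, for illustration only
    for model in ("allopatric", "peripatric"):
        for S in (0.0, 10.0, 100.0):
            T, u = bdm_waiting_time(model, mu, S)
            print(f"{model:11s} S={S:6.1f}  T={T:12.1f}  u={u:.2f}")
```

As S approaches zero, the peripatric expression converges on the neutral value 2/µ, which is a quick check that the selection formula is read correctly.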
mtDNA Model Parameters
The DNA-barcode locus was simulated under the neutral coalescent (Tajima, 1983) using sequence evolution parameters commonly found for mitochondrial cytochrome oxidase I, the proposed marker for DNA barcoding in animals. We assume equal sex ratios, a generation time of 2 years, and simulate a haploid mitochondrial barcode locus that is 614 base pairs long, has a mutation rate of 1.95% per million generations, evolves with a transition/transversion ratio of 7.2, has gamma-distributed among-site rate heterogeneity ($\alpha$ = 0.25), and has nucleotide frequencies of A: 0.31, G: 0.33, C: 0.11, T: 0.25.
mtDNA Species Discovery Thresholds
Following the Hebert 10x divergence criterion (Hebert et al., 2004), a new species is discovered if the average divergence between the reference species and the newly discovered putative sister species ($\pi_b$) is > 10 times the average divergence found within species ($\pi_w$). We considered $\pi_w$ to be the average divergence found within reference species across all simulations for the allopatric model and the average divergence found within descendent continental species across all simulations for the peripatric model.
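A rough expectation helps show why the 10x rule takes a long time to be met. The sketch below is back-of-the-envelope arithmetic, not part of the simulations. It makes three assumptions: a constant population size equal in ancestor and descendants, a female effective size of half the diploid size (equal sex ratio, maternal inheritance), and no correction for multiple hits or rate heterogeneity. The function name is hypothetical. The mutation rate and the mean population size come from the values given in this section.

```python
def expected_10x_time(n_diploid, mu_site):
    """Expected within-species divergence and time to meet the 10x rule.

    n_diploid -- diploid effective population size (assumed constant)
    mu_site   -- mtDNA mutation rate per site per generation
    """
    n_female = n_diploid / 2.0          # assumption: equal sex ratio
    pi_w = 2.0 * mu_site * n_female     # expected within-species divergence
    # Expected between-species divergence after t generations of isolation
    # is 2*mu*t plus ancestral polymorphism (taken here as pi_w).
    # The 10x rule holds in expectation once 2*mu*t + pi_w > 10*pi_w.
    t_generations = 9.0 * pi_w / (2.0 * mu_site)
    return pi_w, t_generations


if __name__ == "__main__":
    mu_site = 0.0195 / 1.0e6            # 1.95% per million generations
    pi_w, t = expected_10x_time(3.0e5, mu_site)
    print(f"expected within-species divergence: {pi_w:.5f}")
    print(f"expected generations to reach 10x:  {t:.0f}")
```

Under these assumptions the threshold is reached in expectation only after 4.5N generations, about 1.35 million generations for N = 3.0 x 10^5, so isolated populations younger than that would usually fall below it.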
Reciprocal monophyly (Fig. 1C) at the DNA-barcode locus is the condition in which both sets of sampled alleles in a pair of populations share a common ancestor more recently than any pair of alleles from different populations. Because reciprocal monophyly can never be observed and only inferred by the mutational pattern at the locus, we employ a summary statistic that acts as a surrogate for reciprocal monophyly to facilitate calculation over many simulations. Given that reciprocal monophyly is ensured when the allelic common ancestry within a population is less than the population's age (Hudson and Turelli, 2003), we consider a pair of population samples to be reciprocally monophyletic when $\pi_b > \pi_1 + \pi_2$, where $\pi_1$ and $\pi_2$ are the average divergences within the reference species and newly discovered putative sister species, respectively. To verify the validity of this reciprocal monophyly surrogate, we compare it to the exact probabilities of reciprocal monophyly derived by Rosenberg (2003) given identical sample sizes and identical relative effective population sizes (Fig. 2).
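Both thresholds reduce to comparisons of average pairwise divergences, which can be sketched directly from aligned sequences. This is a minimal illustration: the function names and the toy sequences are hypothetical, and uncorrected p-distance stands in for whatever distance correction is used in practice.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(seqs):
    """Average pairwise divergence among sequences of one population."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(seqs1, seqs2):
    """Average pairwise divergence between two populations."""
    pairs = list(product(seqs1, seqs2))
    if not pairs:
        raise ValueError("need at least one sequence per population")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def passes_10x(reference, candidate, pi_w):
    """10x rule: between-species divergence exceeds 10 times pi_w."""
    return mean_between(reference, candidate) > 10.0 * pi_w


def passes_rm_surrogate(reference, candidate):
    """Surrogate for reciprocal monophyly: pi_b > pi_1 + pi_2."""
    return mean_between(reference, candidate) > (
        mean_within(reference) + mean_within(candidate)
    )


if __name__ == "__main__":
    # Toy 10-bp alignments, for illustration only.
    reference = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAAAA"]
    candidate = ["CCCCAAAAAA", "CCCCAAAAAA", "CCCCAAAAAT"]
    pi_w = mean_within(reference)
    print(passes_rm_surrogate(reference, candidate))
    print(passes_10x(reference, candidate, pi_w))
```

In this toy case the candidate passes the reciprocal monophyly surrogate but not the 10x rule, which matches the text's observation that the former is attained sooner than the latter.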
Simulations
For both speciation scenarios, we simulated a data set 10,000 times under different fixed values for (i) the number of generations in the past at which the populations diverged (divergence time) and (ii) s, the coefficient of positive selection for the two BDM loci. Fixed values for times after divergence incrementally ranged from $5.0 \times 10^{4}$ to $4.5 \times 10^{6}$ generations and, in addition to neutrality, we let s incrementally range from $1.0 \times 10^{-6}$ to $1.0 \times 10^{-1}$. Rather than keeping N (effective population size) fixed, we allowed N to vary across each set of 10,000 simulations that were fixed for divergence time and s. To this end, N was drawn from a gamma distribution having a mean and variance of $3.0 \times 10^{5}$ and $3.5 \times 10^{10}$, respectively. In the case of the peripatric model, each island population is 1/100th the size of its sister continental population. Each iteration entailed: (1) simulating the effective population sizes and the exponential waiting time to BDM speciation (T); (2) simulating the haploid DNA-barcode locus (Hudson, 2002); (3) determining if the newly discovered putative sister species is scored as a "new species" given either of the two DNA-barcode thresholds; and (4) determining if (i) the species discovery was incorrect (false positive), or conversely (ii) if a new species was not discovered despite there being BDM reproductive isolation (false negative). Following 10,000 iterations at each set of fixed parameters (divergence time and selection coefficient), we report the proportion of false positives and false negatives. These steps were accomplished with three C programs operated within a Perl script available upon request from M. Hickerson. The second of these C programs is a version of Hudson's coalescent simulator (Hudson, 2002) (ms) that produces finite sites DNA sequence data.
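Step (1) of each iteration can be sketched on its own to show how the gamma-distributed population size and the exponential waiting time combine into a probability of reproductive isolation at a given divergence time. This is an illustrative sketch for the peripatric model only, not the authors' C and Perl pipeline. It omits the coalescent simulation of the barcode locus (steps 2 and 3), the per-generation BDM rate `mu` is a hypothetical value, and using the island population size in S = 4Ns is an assumption.

```python
import math
import random

MEAN_N, VAR_N = 3.0e5, 3.5e10      # gamma mean and variance from the text
SHAPE = MEAN_N ** 2 / VAR_N        # shape = mean^2 / variance
SCALE = VAR_N / MEAN_N             # scale = variance / mean


def peripatric_mean_wait(mu, S):
    """Mean waiting time to BDM speciation in the peripatric model."""
    if S == 0:
        return 2.0 / mu
    return 2.0 * -math.expm1(-S) / (mu * S)


def prob_isolated(div_time, s, mu, iterations=10_000, seed=1):
    """Fraction of iterations in which isolation arose by div_time."""
    rng = random.Random(seed)
    isolated = 0
    for _ in range(iterations):
        n_continent = rng.gammavariate(SHAPE, SCALE)
        n_island = n_continent / 100.0       # island is 1/100th the size
        S = 4.0 * n_island * s               # assumption: island N sets S
        mean_wait = peripatric_mean_wait(mu, S)
        wait = rng.expovariate(1.0 / mean_wait)
        if wait <= div_time:
            isolated += 1
    return isolated / iterations


if __name__ == "__main__":
    mu = 1.0e-6  # hypothetical BDM substitution rate per generation
    for s in (0.0, 1.0e-4, 1.0e-2):
        p = prob_isolated(div_time=2.0e5, s=s, mu=mu)
        print(f"s={s:.0e}  P(isolated by 2e5 generations)={p:.3f}")
```

A false negative in step (4) is then an iteration in which the waiting time has elapsed but the barcode threshold is not met, and a false positive is the reverse.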
Empirical Data Sets
Six data sets with information available for both reproductive isolation and mtDNA divergence were examined to compare with our simulation results. Mitochondrial COI or cytochrome b data were drawn from GenBank for calculating interspecific divergences (K2P). To minimize sampling bias (Felsenstein, 2006; Tajima, 1983), the 10x threshold was calculated using intraspecific values (K2P) generated from taxa with greater than four individuals reported within GenBank or taken directly from the literature (Hebert et al., 2003).
Evaluation of reciprocal monophyly in the empirical data sets was determined via neighbor-joining topologies derived from GenBank sequences or determined from previous investigations of species complexes (Blum et al., 2003; Dean and Ballard, 2005; Gleason et al., 1998; Kopp and Barmina, 2005; Machado and Hey, 2003; Wahlberg et al., 2003). We employ three categories for scoring reciprocal monophyly: reciprocally monophyletic (RM), not reciprocally monophyletic (nonRM), and data not available (NA) when only a single sequence was available per taxon. We considered cases in which only one of the sister pairs has multiple individuals as reciprocally monophyletic if the topology was consistent with nomenclature, albeit tentatively.
Degree of reproductive isolation for each species pair comparison was taken directly from the literature: fruit flies (Coyne and Orr, 1997); butterflies (Presgraves, 2002); urchins (Zigler et al., 2005); frogs (Sasa et al., 1998); darters (Mendelson, 2003); and sunfish (Bolnick and Near, 2005). In the case of butterflies, frogs, and sunfish, this only took into account postzygotic isolation. In the case of the urchins, this took into account only premating isolation (gamete compatibility). And in the case of fruit flies and darters, this took into account both pre- and postzygotic isolation. Similar to Coyne and Orr (1997), we calculated composite pre-postmating isolation in the darters as SI + HI(1–SI), where SI and HI are sexual isolation and hybrid incompatibility, respectively. Because estimates of reproductive isolation are likely to generally be an underestimate given unmeasured zygotic barriers (unless it already is at 1.0), we chose to use 0.75 as a criterion for delimiting reproductively isolated populations.
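The composite index and the 0.75 criterion amount to two lines of arithmetic. The sketch below uses hypothetical function names and treats the criterion as "greater than or equal to 0.75", which is an assumption because the text does not say whether the boundary is inclusive.

```python
def composite_isolation(si, hi):
    """Composite isolation: SI + HI * (1 - SI).

    si -- sexual isolation index
    hi -- hybrid incompatibility index
    """
    return si + hi * (1.0 - si)


def is_reproductively_isolated(isolation, cutoff=0.75):
    """Apply the 0.75 criterion (inclusive boundary is an assumption)."""
    return isolation >= cutoff


if __name__ == "__main__":
    # Hypothetical values: moderate sexual isolation, moderate inviability.
    total = composite_isolation(si=0.5, hi=0.6)
    print(total, is_reproductively_isolated(total))
```

With SI = 0.5 and HI = 0.6, hybrid incompatibility acts only on the half of matings that sexual isolation does not already prevent, giving a composite of 0.8.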
Results
Consistent with theory and evidence (Gavrilets, 2003), the simulations demonstrate that the probability of BDM speciation increases with longer times since isolation and also increases with greater strengths in selection for local adaptation (Figs. 3A and 3E). Also, as expected (Hudson and Turelli, 2003; Rosenberg, 2003; Takahata and Nei, 1985), the probability of reaching both reciprocal monophyly and the 10x threshold increases with longer times after isolation, with the former being attained much more rapidly than the latter (Figs. 3B and 3F). However, because of differences in temporal dynamics between the BDM loci and the mtDNA locus, the simulations showed strong discordance between species discovery using either of the two single-gene thresholds and the attainment of reproductive isolation, especially when populations have been isolated for less than 4 million generations (Fig. 3). Although we report results for sample sizes of 20 individuals (10 from the reference species and 10 from the newly discovered putative sister species), we found that our results were robust to variation in sample size (2 to 100 individuals).
Allopatric Model
Under moderate to strong selection at the BDM loci, using reciprocal monophyly as a threshold results in high false-negative error rates (> 10%) for recently divergent lineages (< 1 million generations), regardless of whether we assumed speciation is certain in the allopatric model (Fig. 3D(iv) and Fig. 4). Considering both allopatric models, using reciprocal monophyly as a threshold results in high false-positive error rates (> 40%) under neutral or weakly divergent selection at the BDM loci (Figs. 3C(ii) and 4). In the absence of a priori knowledge of selection strength, reciprocal monophyly only proved reliable (< 10% total error) at > 1 million generations after allopatric isolation, assuming that BDM speciation is certain (Fig. 4).
The 10x threshold is also error prone for species discovery under the allopatric model (Fig. 3). There is a high probability of failing to discover reproductively isolated lineages at divergence times < 2.5 million generations, and there was up to 100% error given moderate to strong divergent selection at the BDM loci (Figs. 3D(iii) and 4). In contrast to reciprocal monophyly, the 10x threshold only becomes prone to false-positive error if we assume that speciation is not certain (Fig. 3C(i)).
Peripatric Model
Our simulations of peripatric speciation accorded with theoretical predictions that very strong selection is required to accelerate speciation in small peripheral populations (Gavrilets, 2003; Nei, 1976) (Fig. 3E). Divergent selection and consequent rapid speciation are thought to be likely because island populations will be subjected to strong novel selection pressures, leading to rapid speciation (Losos and De Queiroz, 1997; Schluter, 2000). In this peripatric model, false-negative error rates of 10% to 35% occurred under strong selection at the BDM loci if isolation was less than 150,000 generations and reciprocal monophyly was used as a threshold (Fig. 3H(iv)). When 10x was used as a threshold, false negatives were even higher under the peripatric model with strong selection (Fig. 3H(iii)). Using reciprocal monophyly as a threshold yielded even higher false-positive error rates than when used in either allopatric model (Fig. 3F(ii)). Without prior knowledge of selection strength, reciprocal monophyly achieved low (< 10%) false-positive error rates only at divergence times > 4 million generations (Fig. 3G(ii)).
As in the allopatric model, the 10x threshold proved to be highly prone to false-negative errors in the peripatric model, with substantial (10% to 100%) probability of false negatives across all strengths of selection at the BDM loci if time since isolation was < 2.5 million generations (Fig. 3H(iii)). Similar to the allopatric model with speciation uncertain (Fig. 2), using the 10x threshold under the peripatric model results in a moderate (10% to 20%) probability of falsely discovering new reproductively isolated species when the time since isolation is 2 to 4 million generations and selection at the BDM loci is weak or nonexistent (Fig. 3G(i)).
Empirical Comparisons of Single-Gene Thresholds and Reproductive Isolation
Do our model-based error rates accord with patterns seen in nature? To address this, we apply the 10x threshold and where possible, reciprocal monophyly, to six empirical data sets that have data for both reproductive isolation (pre- and/or postzygotic isolation) between nominal pairs of species and mtDNA sequence data (COI or cytochrome b). To evaluate the 10x threshold for species discovery, we use two different species delineation criteria—nominal-taxonomic or reproductive isolation threshold of 0.75. Using either of the criteria, we subdivide the pairwise comparisons between nominal-taxonomic species into four categories: (I) correct negatives; (II) false negatives; (III) false positives; and (IV) correct species discoveries. Although each data set has its own particular set of caveats, biases, and methods to measure pre- and/or postzygotic isolation, our objective is to explore the general utility of single-gene thresholds for species discovery when faced with some of the idiosyncratic dynamics of reproductive isolation and/or species delineation.
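The four-way scoring of species pairs under the 10x threshold can be sketched as a simple tally. The function names and the example values below are hypothetical, and the reproductive isolation criterion is taken as "greater than or equal to 0.75", which is an assumption about the boundary.

```python
from collections import Counter


def categorize(discovered, is_species):
    """Map a (threshold outcome, species status) pair to categories I-IV."""
    if not discovered and not is_species:
        return "I: correct negative"
    if not discovered and is_species:
        return "II: false negative"
    if discovered and not is_species:
        return "III: false positive"
    return "IV: correct discovery"


def tally_10x(pairs, pi_w, ri_cutoff=0.75):
    """Count categories over (divergence, isolation index) pairs.

    pairs -- iterable of (between-species divergence, isolation index)
    pi_w  -- average within-species divergence for the taxonomic group
    """
    counts = Counter()
    for divergence, isolation in pairs:
        discovered = divergence > 10.0 * pi_w
        is_species = isolation >= ri_cutoff
        counts[categorize(discovered, is_species)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical species pairs, for illustration only.
    pairs = [(0.02, 0.1), (0.05, 0.9), (0.15, 0.3), (0.20, 1.0), (0.08, 0.8)]
    for category, count in sorted(tally_10x(pairs, pi_w=0.01).items()):
        print(category, count)
```

Swapping the isolation index for a flag that records current taxonomic status gives the same tally under the nominal-taxonomic criterion.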
Degrees of accuracy and direction of error vary widely when implementing the 10x threshold for species discovery (Fig. 5). This is the case both when using named species as the criterion for species delineation and when using a reproductive isolation threshold of 0.75. The butterfly data set was the only one in which the 10x threshold worked well for correctly "discovering" reproductively isolated species (Fig. 5B). In all other taxonomic groups, the 10x threshold missed a substantial proportion of named species. This was most pronounced in frogs and darters (Figs. 5D and 5E), where most named or reproductively isolated species are missed. Although a lower threshold would help eliminate false-negative errors, false positives will inevitably increase, especially when reproductive isolation informs animal species delineation. This would particularly be the case in animal groups similar to the urchins (Fig. 5C), where genetic divergence appears to be correlated inversely with reproductive compatibility, possibly due to reinforcement (Zigler et al., 2005).
In all groups except for butterflies, several examples of reciprocal monophyly with weak reproductive isolation occurred (false positives; Fig. 5). Only in Drosophila were there numerous cases of nonreciprocal monophyly between reproductively isolated populations (false negatives; Fig. 5A). Consistent with the simulations under neutrality at the BDM loci (Fig. 3), false positives were a more common problem than false negatives when using reciprocal monophyly across the six empirical data sets (Fig. 5).
Discussion
Our joint simulations of these two decoupled processes demonstrate that single-gene thresholds can only consistently discover new species with error rates of < 10% if isolation was > 4 million generations ago. Furthermore, the results of our simulations are consistent with available data in that time to reproductive isolation varies widely within and between taxonomic groups, consequently making prospects for deployment of a single-gene threshold unpredictable. That single-locus thresholds will often discover historically isolated lineages that are not reproductively isolated is perhaps not such a problem as such lineages (if confirmed by other phenotypic or nuclear gene evidence) do represent an important component of diversity (Moritz, 2002) and would be regarded as species under some concepts (Cracraft, 1983). Much more serious is the expected bias against discovering young species (false negatives), as this will systematically underestimate species diversity in regions and taxa undergoing adaptive radiation. Although this concern has been explored and discussed previously (Lee, 2004) and is acknowledged within the DNA barcoding community (Hebert et al., 2004), our results show that both reciprocal monophyly and a divergence threshold like 10x will fail to discover species over broad parameter space. Even though it is not known how representative the simple BDM models we simulate here are in nature, our comparison between the empirical studies and our joint simulations accords with the BDM model being a flexible and reasonable guide to the general temporal dynamics of reproductive isolation (Gavrilets, 2003; Turelli et al., 2001).
In the context of rapid loss of biodiversity, high-throughput molecular approaches to discovering species are seen as utilitarian (Hebert et al., 2004). Population genetic and speciation theory can guide our intuition about potential bias arising from deployment of these single-gene thresholds in large-scale biodiversity surveys. As a first step, we combine coalescent theory with the classical BDM model involving two loci and a single incompatibility. Although the BDM model for postzygotic isolation is one of many proposed mechanisms for reproductive isolation and speciation (Coyne and Orr, 2004; Mayr, 1982), the range of speciation times resulting from our implementations of simple BDM models (Fig. 3) are well within the range of times to speciation and/or reproductive isolation found in nature or the fossil record (Bolnick and Near, 2005; Edwards et al., 2005; Hoskin et al., 2005; Mendelson and Shaw, 2005; Near and Benard, 2004; Palumbi and Lessios, 2005; Presgraves, 2002; Sasa et al., 1998; Stanley, 1998; Turner, 1999; Zigler et al., 2005). For example, prezygotic isolation, a common feature in many animal groups (Edwards et al., 2005; Mendelson and Shaw, 2005; Zigler et al., 2005), could be approximated by our BDM model under divergent selection, and thereby also result in many false negatives in real applications of single-gene thresholds.
Our model-based estimation of error rates should be regarded as conservative, especially in relation to false negatives, which we consider the more serious form of error. BDM speciation can be more complex, often involving more loci, incompatibilities (Orr and Turelli, 2001; Turelli et al., 2001; Wu and Ting, 2004), and/or species that are more spatially structured (Avise, 2000). Thus, the false-negative error rates we demonstrate can be considered conservative given the range of speciation times that are expected under more complex versions of the BDM model. For example, time to reproductive isolation under multi-incompatibility BDM models can be extremely rapid when geographic divergent selection acts on the BDM loci (Gavrilets, 2003, 2004; Orr and Turelli, 2001). Conversely, times to reproductive isolation can be longer and less variable under neutral multi-incompatibility BDM models as the number of BDM incompatibilities increases (Gavrilets, 2003; Orr and Turelli, 2001), thereby inflating false positives (but reducing false-negative errors). Regarding spatially structured species and populations, our model indirectly incorporates the effects of subdivision because we draw from a wide gamma distribution for effective population sizes. With respect to its effect on the coalescent, subdivided populations and the larger populations from our gamma distribution both result in longer times to reciprocal monophyly, mitochondrial coancestry, and times to fixation of BDM loci under selection (Wakeley, 2000). Secondly, taxonomic groups that normally consist of very subdivided species will have a higher 10x threshold, and the mtDNA divergence between putative species within such heavily subdivided taxonomic groups will be proportionally larger on average if ancestral species are also subdivided (Arbogast et al., 2002).
When applying single-gene thresholds to taxon pairs that have information on both mtDNA divergence and reproductive isolation, the probability of either missing new reproductively isolated species or incorrectly "discovering" them depends on both the particular threshold that systematists use to delineate species and the barcode threshold one employs (vertical bar; Fig. 5). Unsurprisingly, these empirical comparisons demonstrate that different taxonomic groups will require their own particular thresholds that are optimized to minimize error rates (Meyer and Paulay, 2005). For example, the 10x rule is too conservative for Etheostoma (darters) because a "new" species would rarely be discovered (Fig. 5). Yet it appears that any particular single-gene discovery threshold will result in substantial error rates when the time to reproductive isolation is highly variable within a taxonomic group. This is most obvious in the four deuterostome groups, and the echinoids in particular (Figs. 5C, 5D, 5E and 5F) illustrate the need to use integrative criteria for verifying new species discoveries, regardless of whether one would use reproductive isolation as a criterion for species recognition. Data from this marine group are especially illuminating given that the use of barcodes for marine taxa has generated a lot of interest (Hanner et al., 2005; O'Dor, 2005). The clear lack of correlation between mtDNA divergence and reproductive isolation in echinoids indicates that thresholds like 10x will be too conservative for such marine taxa where selection can eventually accelerate speciation in some pairs.
| Conclusions |
|---|
|
|
|---|
Our study highlights the biases associated with using single-gene/single-threshold criteria and, given the simple speciation scenarios considered here, argues that the high probability of error cannot be ignored if such methods are implemented for species discovery in large-scale systematic or ecological studies. Furthermore, given that error rates are sensitive to details of the BDM process (genetic architecture, strength of selection), and that we rarely have direct estimates of these details, it will be difficult to estimate the true rate of error for any given taxonomic group or community. Because single-gene thresholds for species discovery result in egregious error at recent divergence times, they will misrepresent the correspondence between recently isolated populations and reproductively isolated lineages (= species). This will impede the growing nexus between ecological and evolutionary perspectives on patterns and processes of species diversity in communities (Hubbell, 2001). Although our conclusions could be perceived as alarmist, evolutionary biologists are increasingly finding high levels of divergent selection that could accelerate speciation (Eberhard, 1996; Nice et al., 2005; Nosil et al., 2005), including selection directly acting on BDM loci (Barbash et al., 2003, 2004; Macnair and Christie, 1983; Presgraves et al., 2003; Ting et al., 1998; Wu and Ting, 2004).
Analyses of DNA sequence variation have proved enormously valuable for describing biodiversity patterns and helping to understand the underlying processes (Avise, 2000). Although single-gene thresholds can be a pragmatic first step in discovering divergent lineages, they are neither sufficient nor necessary. Species discovery will often require an integrated approach that draws on phenotypic and molecular data as appropriate (Dayrat, 2004; Will et al., 2005).
| Acknowledgments |
|---|
|
|
|---|
We thank N. Rosenberg, D. Wake, J. Degnan, T. Mendelson, K. Zigler, G. Roderick, M. Slatkin, J. Novembre, W. Zhai, and P. Palsboll for useful discussions; R. Page, M. Hedin, and two anonymous reviewers for valuable suggestions to improve the manuscript; and E. Stahl for the finite sites version of Hudson's coalescent simulator. This work was supported by the National Science Foundation postdoctoral fellowship in interdisciplinary informatics awarded to M. Hickerson (DBI-0305966) and NSF Grant No. (036338) awarded to C. Meyer.
| References |
|---|
|
|
|---|
Arbogast B. S., Edwards S. V., Wakeley J., Beerli P., Slowinski J. B. Estimating divergence times from molecular data on phylogenetic and population genetic timescales. Annu. Rev. Ecol. Syst. (2002) 33:707–740.
Avise J. C. Phylogeography: The history and formation of species (2000) Cambridge, Massachusetts: Harvard University Press.
Barbash D. A., Awadalla P., Tarone A. M. Functional divergence caused by ancient positive selection of a Drosophila hybrid incompatibility locus. PLoS Biol. (2004) 2:839–848.
Barbash D. A., Siino D. F., Tarone A. M. A rapidly evolving MYB-related protein causes species isolation in Drosophila. Proc. Natl. Acad. Sci. USA (2003) 100:5302–5307.
Blum M. J., Bermingham E., Dasmahapatra K. A molecular phylogeny of the neotropical butterfly genus Anartia (Lepidoptera: Nymphalidae). Mol. Biol. Evol. (2003) 26:46–55.
Bolnick D. I., Near T. J. Tempo of hybrid inviability in centrarchid fishes (Teleostei: Centrarchidae). Evolution (2005) 59:1754–1767.
Brown W. M., Prager E. M., Wang A., Wilson A. C. Mitochondrial DNA sequences of primates: Tempo and mode of evolution. J. Mol. Evol. (1982) 18:225–239.
Coyne J. A., Orr H. A. "Patterns of speciation in Drosophila" revisited. Evolution (1997) 51:295–303.
Coyne J. A., Orr H. A. Speciation (2004) Sunderland, Massachusetts: Sinauer Associates.
Cracraft J. Species concepts and speciation analysis. Curr. Ornithol. (1983) 1:159–187.
Dayrat B. Towards integrative taxonomy. Biol. J. Linn. Soc. (2004) 85:407–415.
Dean M. D., Ballard J. W. O. High divergence among Drosophila simulans mitochondrial haplogroups arose in midst of long term purifying selection. Evolution (2005) 36:328–337.
DeGroot M. H. Probability and statistics (1986) Reading, Massachusetts: Addison-Wesley Publishing Company.
DeSalle R., Freedman T., Prager E. M., Wilson A. C. Tempo and mode of sequence evolution in mitochondrial DNA of Hawaiian Drosophila. J. Mol. Evol. (1987) 26:157–164.
Dobzhansky T. G. Genetics and the origin of species (1937) New York: Columbia University.
Eberhard W. G. Female Control: Sexual Selection by Cryptic Female Choice (1996) Princeton, New Jersey: Princeton University Press.
Futuyma D. J. Evolutionary biology (1998) Sunderland, Massachusetts: Sinauer Associates.
Gavrilets S. Models of speciation: what have we learned in 40 years. Evolution (2003) 57:2197–2215.
Gavrilets S. Fitness landscapes and the origin of species (2004) Princeton, New Jersey: Princeton University Press.
Gleason J. M., Griffith E. C., Powell J. R. A molecular phylogeny of the Drosophila willistoni group: Conflicts between species concepts? Evolution (1998) 52:1093–1103.
Hanner R., Schindel D., Ward B., Hebert P. Fish barcode of life (FISH-BOL) (2005) 1–19.
Hebert P. D. N., Cywinska A., Ball S. L., deWaard J. R. Biological identification through DNA barcodes. Proc. R. Soc. Lond. B Biol. (2003) 270:313–321.
Hebert P., Gregory T. The promise of DNA barcoding for taxonomy. Syst. Biol. (2005) 54:852–859.
Hebert P. D. N., Stoeckle M. Y., Zemlak T. S., Francis C. M. Identification of birds through DNA barcodes. PLoS Biol. (2004) 2:1657–1663.
Hoskin C. J., Higgie M., McDonald K. R., Moritz C. Reinforcement drives rapid allopatric speciation. Nature (2005) 437:1353–1357.
Hubbell S. P. The unified neutral theory of biodiversity and biogeography (2001) Princeton, New Jersey: Princeton University Press.
Hudson R. R. ms—A program for generating samples under neutral models. Bioinformatics (2002) 18:337–338.
Hudson R. R., Coyne J. A. Mathematical consequences of the genealogical species concept. Evolution (2002) 56:1557–1565.
Hudson R. R., Turelli M. Stochasticity overrules the "three-times rule": Genetic drift, genetic draft, and coalescence times for nuclear loci versus mitochondrial DNA. Evolution (2003) 57:182–190.
Janzen D. H. Now is the time. Philos. Trans. R. Soc. B (2004) 359:731–732.
Kimura M. Evolutionary rate at the molecular level. Nature (1968) 217:625–626.
Kopp A., Barmina O. Evolutionary history of the Drosophila bipectinata species complex. Genet. Res. (2005) 85:23–46.
Lee M. S. Y. The molecularisation of taxonomy. Invertebr. Syst. (2004) 18:1–6.
Losos J. B., De Queiroz K. Evolutionary consequences of ecological release in Caribbean Anolis lizards. Biol. J. Linn. Soc. (1997) 61:459–483.
Lynch M. Gene duplication and evolution. Science (2002) 297:945–947.
Lynch M., Force A. G. The origin of interspecific genomic incompatibility via gene duplication. Am. Nat. (2000) 156:590–605.
Machado C. A., Hey J. The causes of phylogenetic conflict in a classic Drosophila species group. Proc. R. Soc. Lond. B Biol. (2003) 270:1193–1202.
Macnair M. R., Christie P. Reproductive isolation as a pleiotropic effect of copper tolerance in Mimulus guttatus. Heredity (1983) 50:295–302.
Mayr E. Processes of speciation. In: Mechanisms of speciation—Barigozzi C., ed. (1982) New York: Liss. 1–19.
Mendelson T. C. Sexual isolation evolves faster than hybrid inviability in a diverse and sexually dimorphic genus of fish (Percidae: Etheostoma). Evolution (2003) 57:317–327.
Mendelson T. C., Shaw K. L. Rapid speciation in an arthropod: The likely force behind an explosion of new Hawaiian cricket species revealed. Nature (2005) 433:375–376.
Meyer C. P., Paulay G. DNA barcoding: Error rates based on comprehensive sampling. PLoS Biol. (2005) 3:e422.
Moritz C. Strategies to protect biological diversity and the evolutionary processes that sustain it. Syst. Biol. (2002) 51:238–254.
Moritz C., Cicero C. DNA Barcoding: Promise and pitfalls. PLoS Biol. (2004) 2:1529–1531.
Near T. J., Benard M. F. Rapid allopatric speciation in logperch darters (Percidae: Percina). Evolution (2004) 58:2798–2808.
Nei M. Mathematical models of speciation and genetic distance. In: Population genetics and ecology—Karlin S., Nevo E., eds. (1976) New York: Academic Press. 95–107. Chapter 5.3.
Neigel J. E., Avise J. C. Phylogenetic relationships of mitochondrial DNA under various demographic models of speciation. In: Evolutionary processes and theory—Nevo E., Karlin S., eds. (1986) New York: Academic Press. 515–534.
Nice C. C., Anthony N., Gelembiuk G., Raterman D., French-Constant R. The history and geography of diversification within the butterfly genus Lycaeides in North America. Mol. Ecol. (2005) 14:1741–1754.
Nosil P., Vines T. H., Funk D. J. Reproductive isolation caused by natural selection against immigrants from divergent habitats. Evolution (2005) 59:705–719.
O'Dor R. K. Census of marine life research plan (2005) 1–57.
Orr H. A., Turelli M. The evolution of postzygotic isolation: Accumulating Dobzhansky-Muller incompatibilities. Evolution (2001) 55:1085–1094.
Palumbi S. R., Lessios H. A. Evolutionary animation: How do molecular phylogenies compare to Mayr's reconstruction of speciation patterns in the sea? Proc. Natl. Acad. Sci. USA (2005) 102:6566–6572.
Presgraves D. C. Patterns of postzygotic isolation in Lepidoptera. Evolution (2002) 56:1168–1183.
Presgraves D. C., Balagopalan L., Abmayr S. M., Orr H. A. Adaptive evolution drives divergence of a hybrid incompatibility gene between two species of Drosophila. Nature (2003) 423:715–719.
Price T. D., Bouvier M. M. The evolution of F-1 postzygotic incompatibilities in birds. Evolution (2002) 56:2083–2089.
Rosenberg N. A. The shapes of neutral gene genealogies in two species: Probabilities of monophyly, paraphyly, and polyphyly in a coalescent model. Evolution (2003) 57:1465–1477.
Sasa M. M., Chippindale P. T., Johnson N. A. Patterns of postzygotic isolation in frogs. Evolution (1998) 52:1811–1820.
Schluter D. The ecology of adaptive radiation (2000) Oxford, UK: Oxford University Press.
Stanley S. M. Macroevolution: Pattern and process (1998) Baltimore, Maryland: Johns Hopkins University Press.
Tajima F. Evolutionary relationship of DNA sequences in finite populations. Genetics (1983) 105:437–460.
Takahata N., Nei M. Gene genealogy and variance of intrapopulational nucleotide differences. Genetics (1985) 110:325–344.
Ting C.-T., Tsaur S.-C., Wu M.-L., Wu C.-I. A rapidly evolving homeobox at the site of a hybrid sterility gene. Science (1998) 282:1501–1504.
Turelli M., Barton N. H., Coyne J. A. Theory and speciation. Trends Ecol. Evol. (2001) 16:330–343.
Turner G. F. Explosive speciation of African cichlid fishes. In: Evolution of biological diversity—Magurran A. E., May R. M., eds. (1999) Oxford, UK: Oxford University Press. 113–129.
Wahlberg N., Oliveira R., Scott J. A. Phylogenetic relationships of Phyciodes butterfly species (Lepidoptera: Nymphalidae): Complex mtDNA variation and species delimitations. Syst. Entomol. (2003) 28:257–273.
Wakeley J. The effects of subdivision on the genetic divergence of populations and species. Evolution (2000) 54:1092–1101.
Wheeler Q. D., Raven P. H., Wilson E. O. Taxonomy: Impediment or expedient? Science (2004) 303:285.
Wiens J. J., Penkrot T. A. Delimiting species limits in spiny lizards (Sceloporus). Syst. Biol. (2002) 51:69–91.
Will K. W., Mishler B., Wheeler Q. D. The perils of DNA barcoding and the need for integrative taxonomy. Syst. Biol. (2005) 54:844–851.
Wollenberg K., Avise J. C. Phylogenetics and the origin of species. Proc. Natl. Acad. Sci. USA (1997) 94:7748–7755.
Wu C.-I. The genic view of speciation. J. Evol. Biol. (2001) 14:851–865.
Wu C.-I., Ting C.-T. Genes and speciation. Nat. Rev. Genet. (2004) 5:114–122.
Zigler K. S., McCartney M. A., Levitan D. R., Lessios H. A. Sea urchin bindin divergence predicts gamete compatibility. Evolution (2005) 59:2399–2404.





), not reciprocally monophyletic (nonRM;
), and data not available (NA;
) in cases where only a single sequence was available per taxon. With respect to assessment of performance of reciprocal monophyly, each nonRM at reproductive isolation > 0.75 is regarded as a false negative and each RM at reproductive isolation < 0.75 is considered to be a false positive.


