© 2005 Society of Systematic Biologists
Under-parameterized Model of Sequence Evolution Leads to Bias in the Estimation of Diversification Rates from Molecular Phylogenies
Edited by Peter Linder
1 Department of Biology, Campus Box 1229, Washington University, St. Louis, Missouri, 63130, USA; E-mail: ljrevell{at}wustl.edu (L.J.R.)
2 Current address: Biodiversity Centre, University of British Columbia, 6270 University Blvd., Vancouver, British Columbia, V6T 1Z4, Canada; E-mail: harmon{at}zoology.ubc.ca
3 Current address: Center for Population Biology, University of California, One Shields Avenue, Davis, California, 95616, USA; E-mail: reglor{at}ucdavis.edu
Received November 5, 2004; Revised March 4, 2005; Accepted May 24, 2005
Macroevolutionary inferences from molecular phylogenies are becoming increasingly common (see Harvey et al., 1996; Mooers and Heard, 1997; Pagel, 1999; Barraclough and Nee, 2001). Many methods in which phylogenies are invoked for historical inference assume that a molecular phylogeny is an errorless representation of the underlying phylogenetic history of the included taxa (but see Lutzoni et al., 2001; Huelsenbeck et al., 2000; Huelsenbeck and Rannala, 2003). However, molecular phylogenies are estimates of this history based on a particular model of evolution; thus, there is some error associated with their estimation (Huelsenbeck and Kirkpatrick, 1996). Here we explore the effects of a particular type of error in phylogenetic branch-length estimation, that caused by assuming an underparameterized model of molecular evolution, on the
γ-statistic of Pybus and Harvey (2000), a statistic that tests for changes in the rate of diversification through time. Although we restrict our attention to the estimation of diversification rates, our findings are germane to any macroevolutionary inferences relying on the accurate estimation of phylogenetic branch lengths, such as molecular dating (e.g., Welch and Bromham, 2005) and probabilistic methods for ancestral state reconstruction (e.g., Ronquist, 2004).
In phylogenetic analyses, models of molecular evolution are often used to estimate branch lengths so that distances separating species on the reconstructed phylogeny are proportional to the time since the species shared a common ancestor multiplied by the substitution rate on the branches separating the taxa (Felsenstein, 2004). When recovering these branch lengths, models of nucleotide substitution are necessary to correct for the effect of superimposed substitutions as genetic differences between taxa accrue. Such models vary from simple to complex. The simplest model, the Jukes and Cantor (1969) model, assumes that all types of nucleotide substitutions are equiprobable, that all sites share a common substitution rate, and that base frequencies are equal. If these assumptions are violated, distances calculated using a Jukes-Cantor correction will likely be underestimates of the actual extent of evolutionary divergence, with misestimation particularly severe for branches connecting more divergent sequences (Yang et al., 1994; Lemmon and Moriarty, 2004). More sophisticated models allow rate heterogeneity across sites (Uzzell and Corbin, 1971; Jin and Nei, 1990), incorporate invariant sites (Hasegawa, 1987), or allow specific types of substitutions to occur at different rates (Kimura, 1980; Lanave et al., 1984; Felsenstein, 2004). The most heavily parameterized model commonly used in phylogenetic studies (the general time reversible model with invariant sites and
Γ-distributed rate heterogeneity, GTR+I+Γ: Uzzell and Corbin, 1971; Jin and Nei, 1990; Rodríguez et al., 1990) requires the specification of 10 parameters (Felsenstein, 2004).
Numerous studies have characterized the sensitivity of phylogenetic reconstruction to model misspecification (e.g., Fukami-Kobayashi and Tateno, 1991; Olsen, 1991; Ruvolo et al., 1993; Yang et al., 1994, 1995; Gaut and Lewis, 1995; Adachi and Hasegawa, 1995). Often, topological inference has been found to be robust to model misspecification even when branch lengths are severely misestimated (e.g., Fukami-Kobayashi and Tateno, 1991; Gaut and Lewis, 1995; Håstad and Björklund, 1998; Lemmon and Moriarty, 2004). Ignoring heterogeneity in substitution rate among sites has the most severe effect on branch lengths, causing them to be consistently underestimated (Yang et al., 1994; Gaut and Lewis, 1995).
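The direction of this underestimation can be illustrated with a short calculation. The sketch below is not from the original study: it assumes the standard gamma-distance relation for the Jukes-Cantor model, in which the expected proportion of differing sites is p = (3/4)[1 − (1 + 4d/(3α))^(−α)], and then applies the plain Jukes-Cantor correction to that p. The function names are illustrative only.

```python
import math

def expected_p_jc_gamma(d, alpha):
    """Expected proportion of differing sites for two sequences separated by
    d substitutions per site under JC with gamma-distributed rates
    (shape alpha, mean rate 1). Assumed closed form, see lead-in."""
    return 0.75 * (1.0 - (1.0 + 4.0 * d / (3.0 * alpha)) ** (-alpha))

def jc_distance(p):
    """Plain Jukes-Cantor corrected distance for a proportion p < 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

if __name__ == "__main__":
    alpha = 0.5  # strong rate heterogeneity (hypothetical choice)
    for d in (0.01, 0.05, 0.2, 0.5):
        d_hat = jc_distance(expected_p_jc_gamma(d, alpha))
        # The ratio d_hat / d falls below 1 and shrinks as d grows.
        print(f"true d = {d:.2f}  JC estimate = {d_hat:.4f}  ratio = {d_hat / d:.3f}")
```

Under these assumptions the estimated-to-true ratio is always below one and declines with divergence, so deeper distances are compressed proportionally more than shallow ones.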
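The γ-statistic of Pybus and Harvey (2000) measures the relative positions of internal nodes in a tree and is defined as:

$$\gamma = \frac{\left(\dfrac{1}{n-2}\displaystyle\sum_{i=2}^{n-1}\sum_{k=2}^{i} k\,g_k\right) - \dfrac{T}{2}}{T\sqrt{\dfrac{1}{12(n-2)}}}, \qquad T = \sum_{j=2}^{n} j\,g_j \tag{1}$$

where n is the number of taxa and g_2, ..., g_n are the internode distances, g_k being the interval during which the reconstructed tree contains k lineages. Under a constant-rate pure-birth model, the γ-statistic has a standard normal distribution (Pybus and Harvey, 2000). If γ is significantly less than zero, internal nodes are concentrated closer to the base of the tree than expected under a constant-rate model of diversification, suggesting that the net diversification rate of the group has slowed over time; conversely, positive γ indicates that more internal nodes are concentrated near the tips of the tree and suggests that the net rate of diversification has increased through time (Pybus and Harvey, 2000; Pybus et al., 2002).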
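As a concrete illustration, the γ-statistic of Pybus and Harvey (2000) can be computed directly from a list of internode distances. The following is a minimal sketch, not the authors' ltt.c program; the function name and the convention that the list runs from g_2 to g_n are assumptions made here for illustration.

```python
import math

def gamma_statistic(g):
    """Gamma statistic from internode distances.

    g[0] is g_2 (interval with 2 lineages), ..., g[-1] is g_n.
    gamma = (mean_{i=2..n-1} sum_{k=2..i} k*g_k - T/2) / (T * sqrt(1/(12(n-2)))),
    with T = sum_{j=2..n} j*g_j.
    """
    n = len(g) + 1  # number of taxa
    if n < 3:
        raise ValueError("need at least 3 taxa")
    weighted = [k * gk for k, gk in zip(range(2, n + 1), g)]  # k * g_k
    total = sum(weighted)  # T
    running = 0.0  # inner sum over k = 2..i
    outer = 0.0    # outer sum over i = 2..n-1
    for w in weighted[:-1]:
        running += w
        outer += running
    mean_inner = outer / (n - 2)
    return (mean_inner - total / 2.0) / (total * math.sqrt(1.0 / (12.0 * (n - 2))))

if __name__ == "__main__":
    print(gamma_statistic([1.5, 1.0]))  # balanced case for three taxa
    print(gamma_statistic([1.0, 3.0]))  # node near the root: negative
    print(gamma_statistic([3.0, 1.0]))  # node near the tips: positive
```

With three taxa, a short basal interval followed by a long terminal interval places the internal node near the root and gives a negative value, matching the interpretation above.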
In several empirical studies, the γ-statistic shows highly significant departures from constant-rate pure-birth cladogenesis (Harmon et al., 2003; Linder et al., 2003; Shaw et al., 2003; Kadereit et al., 2004; Machordom and Macpherson, 2004; Williams and Reid, 2004; Zhang et al., 2004). Several of these studies reveal slowdowns in the net rate of diversification through time (Harmon et al., 2003; Kadereit et al., 2004; Machordom and Macpherson, 2004; Williams and Reid, 2004; Zhang et al., 2004). For example, Harmon et al. (2003) found highly significant negative γ in three of four lizard groups, with the fourth also negative but nonsignificant, and Kadereit et al. (2004) found similar results among several genera of alpine plants. In each case the authors hypothesized that geographical or ecological processes contributed to the observed patterns. In contrast, Linder et al. (2003) found significantly positive γ among African Restionaceae.
Considerable attention has been paid to the effect of incomplete taxonomic sampling (Pybus and Harvey, 2000; Pybus et al., 2002) and the method of phylogenetic ultrametricization (Barraclough and Vogler, 2002; Martin et al., 2004; Rüber and Zardoya, 2005) on the estimation of γ. However, the effect of the assumptions of phylogenetic inference, and in particular the model of sequence evolution, has been given relatively short shrift. Many phenomena of interest to evolutionary biologists, such as the estimation of diversification rates, rely on phylogenetic branch lengths as estimates of the temporal extent of evolutionary divergence, and branch-length estimation is highly sensitive to proper model specification (e.g., Fukami-Kobayashi and Tateno, 1991; Ruvolo et al., 1993; Gaut and Lewis, 1995; Håstad and Björklund, 1998; Lemmon and Moriarty, 2004). As such, assessment of substitution model adequacy, a criterion that may be rarely satisfied by empirical data if the process of molecular evolution is considerably more complicated than our models describe (see Goldman, 1993), is crucial when evolutionary parameters are to be estimated from a molecular phylogeny with branch lengths.
Because the γ-statistic depends on branch-length estimates, it may be adversely affected by estimating branches in the molecular phylogeny using an underspecified or inadequate model. In fact, Pybus and Harvey (2000) note that the statistic is liable to be misestimated if factors affecting error in branch-length estimation act unevenly in the tree. Here we analyze the extent of the bias in γ resulting from nucleotide substitution model underparameterization and focus on characterizing its magnitude and direction, as well as its interaction with other factors such as tree balance, tree size, total tree depth, and sequence length. We also describe the particular circumstances under which these potential biases are likely to be a concern for empirical studies.
Simulation and Analyses
As we are primarily interested in testing the effect of model underparameterization on the estimation of γ, most of our analyses focus on relatively simple models of molecular evolution that vary with respect to a limited number of critical parameters (e.g., number of substitution parameters, rate heterogeneity among sites, invariant sites), while ignoring variation in tree balance, number of taxa, tree length, and size of the nucleotide data set. However, because these factors invariably differ among empirical studies and likely influence the estimation and hypothesis testing of γ, we also explore the impact of these additional factors using a somewhat more restricted set of simulations. Finally, because empirical studies usually use more heavily parameterized models of nucleotide substitution than those explored in the aforementioned analyses, we investigate the effect of more complicated models of sequence evolution on the estimation of γ.
Simulation Test for Effect of Underparameterization
We used simulations of phylogenies and associated nucleotide data sets to test the effect of model parameterization on the estimation of γ under a range of conditions. We first used the program Phyl-O-Gen (Rambaut, 2002) to simulate 100 phylogenies containing 100 taxa under a constant-rate pure-birth model, the null model assumed by Pybus and Harvey (2000) for the γ-statistic. This set of 100 trees was used in the majority of the analyses described below.
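For readers who want to reproduce the null model without Phyl-O-Gen, the internode distances of a constant-rate pure-birth tree can be drawn directly. This is a sketch under stated assumptions rather than a description of how Phyl-O-Gen works: with k lineages each splitting at rate λ, the waiting time to the next split is exponential with rate kλ, and the simulation is assumed to stop just before the (n+1)th lineage would arise. The function name, the `birth_rate` default, and the `seed` argument are hypothetical.

```python
import random

def simulate_pure_birth_intervals(n_taxa, birth_rate=1.0, seed=None):
    """Draw internode distances g_2..g_n for a pure-birth (Yule) tree.

    While k lineages exist, the time to the next speciation event is
    exponentially distributed with rate k * birth_rate.
    """
    rng = random.Random(seed)
    return [rng.expovariate(k * birth_rate) for k in range(2, n_taxa + 1)]

if __name__ == "__main__":
    intervals = simulate_pure_birth_intervals(100, seed=1)
    # 99 intervals for 100 taxa; their sum is the root-to-tip depth.
    print(len(intervals), sum(intervals))
```

Only the branching times are generated here; assigning them to a topology is a separate step that does not affect γ.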
We then used the program Seq-Gen (Rambaut and Grassly, 1997) to simulate 1000-base-pair data sets on each phylogeny under three different models of molecular evolution and a range of parameter values for each model: (1) the Jukes-Cantor model (JC; Jukes and Cantor, 1969) with heterogeneous rates among sites, under a four-category discrete approximation of Γ-distributed rate heterogeneity (JC+Γ; Uzzell and Corbin, 1971; shape parameter of the Γ distribution α = 0.1, 0.5, 1.0, 5.0, and 10.0; Jin and Nei, 1990); (2) the Jukes-Cantor model with invariant sites (JC+I; proportion of invariant sites, p-inv = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9; Hasegawa et al., 1987); and (3) the K80 model (also known as the Kimura 2-parameter model; Kimura, 1980), in which transitions and transversions are allowed different rates (transition : transversion ratio κ = 0.5, 1, 2, 4, 8, and 16). For each simulated data set, we scaled total tree length to 0.5 substitutions per site.
For each simulated data set, we estimated maximum likelihood branch lengths using PAUP*4b10 (Swofford, 1998) while constraining the correct topology and assuming a molecular clock under two models: (1) the generating model (JC+Γ, JC+I, or K80) with its known parameter value, and (2) the simpler JC model. We then calculated γ for each of the trees with estimated branch lengths. All γ calculations were performed using a C program, ltt.c, available from the authors upon request.
Because all phylogenies were simulated under a pure-birth model with a constant speciation rate over time, we expect that γ will have a standard normal distribution. Thus, we calculated one-tailed type I error rates as the proportion of data sets for which the value of γ for the estimated phylogeny was significant (i.e., fell in the lower 5% tail of the standard normal distribution, γ < −1.645). We focus on the left tail of the distribution because a one-tailed rejection threshold is often applied in empirical studies (e.g., Pybus and Harvey, 2000; Harmon et al., 2003; Kadereit et al., 2004). We also calculated bias as the average difference between the estimated value of γ and its true value. We assessed the significance of the type I error rate by comparing it to the one-tailed 95% binomial confidence limit assuming that the true type I error rate of the statistic was 0.05. We considered the elevation in type I error significant if it exhibited a value in excess of this limit.
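The error-rate calculation and its binomial threshold can be sketched as follows. This is one plausible reading of the procedure, not the authors' code: the critical value −1.645 is the lower 5% point of the standard normal distribution, and the limit is taken as the smallest count k for which the cumulative binomial probability reaches 0.95. Function names are illustrative.

```python
from math import comb

def type_one_error_rate(gammas, critical=-1.645):
    """Proportion of gamma values falling in the lower 5% normal tail."""
    return sum(1 for g in gammas if g < critical) / len(gammas)

def binomial_upper_limit(n_trials, p=0.05, level=0.95):
    """Smallest k with P(X <= k) >= level for X ~ Binomial(n_trials, p)."""
    cdf = 0.0
    for k in range(n_trials + 1):
        cdf += comb(n_trials, k) * p**k * (1.0 - p) ** (n_trials - k)
        if cdf >= level:
            return k
    return n_trials

if __name__ == "__main__":
    limit = binomial_upper_limit(100)
    # More than `limit` rejections out of 100 would count as elevated.
    print(limit, limit / 100)
```

For 100 replicate trees and a nominal rate of 0.05 this limit is 9 rejections, so under this reading an observed rate above 0.09 would be treated as significantly elevated.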
Tree Balance
Because tree balance can influence phylogenetic reconstruction (Heard, 1992; Huelsenbeck and Kirkpatrick, 1996; Mooers et al., 1995; Mooers and Heard, 1997), we calculated mean corrected I' (a metric for assessing tree balance: Fusco and Cronk, 1995; Purvis et al., 2002; Agapow and Purvis, 2002) for each of the 100 simulated trees using a C program (balance.c) available from the authors. For a subset of our simulated data sets, we tested for a significant relationship between the deviation of γ from its true value and the mean corrected imbalance of the simulated phylogeny. For this analysis, we focused on data sets simulated under JC+Γ with α = 0.5, JC+I with p-inv = 0.5, and K80 with κ = 16, with associated branch-length estimates made assuming an underparameterized substitution model (JC), because each of these cases showed substantial bias in γ (see Results).
Number of Taxa, Total Tree Length, and Sequence Length
To investigate the effect of number of taxa, total tree length, and sequence length on the estimation and hypothesis testing of γ, we restricted our attention to a single substitution model, JC+Γ with α = 0.5. We simulated sequences on pure-birth phylogenies containing various numbers of taxa, while varying total tree length. To do this, we simulated 100 stochastic phylogenies that included 10 to 100 taxa at intervals of 10 taxa. We then simulated sequences with 1000 characters on each of these topologies while scaling total tree length to between 0.1 and 1 at intervals of 0.1 substitutions per site. We also simulated sequence data sets with varying numbers of characters, again while varying total tree length. We simulated data sets on 100-taxon phylogenies with nucleotide sequence lengths ranging from 100 to 1000 at 100-nucleotide intervals and used the same range of total tree lengths as in the previous analysis. We then estimated branch lengths, again constraining to the correct topology, for each resulting data set assuming both the generating model (JC+Γ) and the underparameterized model (JC). For each phylogeny with estimated branch lengths we calculated γ, and for each set of phylogenies (corresponding to an estimation model and a set of tree length and number of taxa or sequence length) we calculated mean deviation from the true γ and type I error rate in γ (with its associated binomial probability assuming no bias) for both the full and underparameterized models.
More Complex Models
One limitation of the above simulations is that empirical studies usually use more complex models to estimate branch lengths than the JC and K80 models discussed above. Furthermore, parameters are often estimated from the data rather than known a priori. To investigate the behavior of γ under more realistic conditions, we simulated sequence data under the most complex model that is commonly employed in molecular phylogenetic studies, the general time reversible model with invariant sites and Γ-distributed rate heterogeneity (GTR+I+Γ: Uzzell and Corbin, 1971; Jin and Nei, 1990; Rodríguez et al., 1990). Because this model has 10 parameters, we could not reasonably explore the entire parameter space in our simulations; instead, we used three sets of substitution rates, base frequencies, invariant sites, and rate heterogeneity drawn from empirical studies of mitochondrial DNA and a nuclear intron (Table 1; Townsend et al., 2004; Kozak et al., 2005). For each set of parameters, we simulated a single 1000-nucleotide data set on each of five pure-birth phylogenies generated using Phyl-O-Gen, with 100 taxa per phylogeny and total tree length scaled to 0.5 substitutions per site. We then used maximum likelihood in PAUP*4b10 to estimate parameter values and branch lengths on these topologies under the full model (GTR+I+Γ) and seven models representing special cases of the generating model: JC, F81 (Felsenstein, 1981), HKY (Hasegawa et al., 1985), TrN (Tamura and Nei, 1993), TIM (Rodríguez et al., 1990), GTR (Rodríguez et al., 1990), and GTR+I (Hasegawa et al., 1987; Rodríguez et al., 1990). We then calculated γ for these estimated phylogenies and compared these estimates to their known values.
Results
Simulation Test for Effect of Underparameterization
For data generated under a range of rate heterogeneities, γ is unbiased and exhibits appropriate type I error rates when estimated under the correct model (JC+Γ) (Fig. 1a, b, open dots). In contrast, γ is strongly negatively biased, with elevated type I error rates, when branch lengths for molecular sequences simulated with moderate to high degrees of rate heterogeneity (Γ shape parameter, α = 0.1–1) are estimated with the underparameterized model (JC), which does not incorporate rate heterogeneity (Fig. 1a, b, filled dots). For extreme rate heterogeneity, bias resulting from model underparameterization corresponds with conspicuously short early branches and relatively long late branches in the reconstructed phylogeny (compare Fig. 2a to Fig. 2c), an effect absent from the phylogeny in which branch lengths are estimated using the generating model (Fig. 2b). Not surprisingly, therefore, type I error rates with the underparameterized model are highest (up to 0.82) for the data sets simulated under the highest degree of rate heterogeneity (Fig. 1b). This bias persists, but becomes slight as rates become more uniform across sites (α ≥ 5; Fig. 1a, b). At α ≥ 5 the underparameterized model of sequence evolution no longer exhibits significantly elevated type I error (Fig. 1b).
When sequences are simulated under a model that incorporates invariant sites (JC+I) and branch lengths are estimated using the fully parameterized model, γ is unbiased with appropriate type I error rates (Fig. 1c, d). However, γ is negatively biased when estimated using the underparameterized model that lacks the invariant sites parameter (JC) for even moderate proportions of invariant sites (p-inv = 0.2), with the magnitude of the bias increasing with the proportion of invariant sites (Fig. 1c). Despite this bias, underparameterization results in only insignificantly or marginally significantly elevated type I error when the proportion of invariant sites is relatively low (Fig. 1d). For proportions of invariant sites between 0.1 and 0.4, one-tailed type I error rates are ≤ 0.11. However, as invariant sites are increased above p-inv = 0.5, type I error increases (Fig. 1d) such that all γ values are significant at p-inv = 0.9.
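A back-of-the-envelope calculation suggests why ignoring invariant sites compresses long distances. The sketch below is illustrative and rests on an assumption not stated in the text: the true distance d is averaged over all sites, so the variable fraction (1 − p-inv) evolves at d / (1 − p-inv) substitutions per site. The function name is hypothetical.

```python
import math

def jc_estimate_ignoring_invariants(d, p_inv):
    """JC distance recovered when the data follow JC+I but the invariant
    sites parameter is ignored.

    d     : true mean substitutions per site over all sites (assumption)
    p_inv : proportion of invariant sites, 0 <= p_inv < 1
    """
    d_var = d / (1.0 - p_inv)  # substitutions per variable site
    # Expected proportion of differing sites across the whole alignment.
    p = (1.0 - p_inv) * 0.75 * (1.0 - math.exp(-4.0 * d_var / 3.0))
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

if __name__ == "__main__":
    for d in (0.05, 0.2, 0.5):
        est = jc_estimate_ignoring_invariants(d, 0.5)
        print(f"true d = {d:.2f}  JC estimate = {est:.4f}  ratio = {est / d:.3f}")
```

With no invariant sites the estimate equals the true distance; with half the sites invariant the shortfall grows with d, which is the pattern that pulls deep nodes toward the root.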
When simulated sequence data are evolved under the K80 model, which incorporates different rates of transitions and transversions, and branch lengths are estimated with the fully parameterized model of sequence evolution, γ is unbiased (Fig. 1e) and type I error is low (Fig. 1f). Although γ is slightly biased when estimated under JC (Fig. 1e), the observed bias is of lower magnitude over all tested values of the ratio of transitions to transversions (0.25 ≤ κ ≤ 16) than that created by rate heterogeneity or invariant sites (Fig. 1) and does not lead to significantly elevated type I error rates except at the most extreme value of κ tested in this study, at which point type I error is only marginally elevated (κ = 16: type I error = 0.12; Fig. 1f).
Tree Balance
There was no significant correlation between tree imbalance and the deviation from true γ when an underspecified model was used in branch-length estimation for any of the models tested (JC+Γ, α = 0.5: r² = 0.02, P = 0.17; JC+I, p-inv = 0.5: r² = 0.005, P = 0.49; K80, κ = 16: r² = 0.006, P = 0.46).
Number of Taxa, Total Tree Length, and Sequence Length
When sequences were evolved and branch lengths estimated under the generating model (JC+Γ), mean deviation from true γ was low across all tree depths and numbers of taxa (Fig. 3a). Type I error rates were also low (Fig. 3c), and in no instance was type I error significantly inflated (Fig. 3e). In contrast, when branch lengths were estimated using the underparameterized model (JC), deviation from true γ was negative under all combinations of tree size and length, but most severe for long trees including many taxa (Fig. 3b). One-tailed type I error rates were significantly elevated for trees longer than 0.5 substitutions per site, regardless of the number of taxa (Fig. 3f).
When branches were estimated using the full model (JC+Γ), deviation from true γ was low except in data sets with very short trees and small sequence length, for which substantial positive bias in γ was observed (Fig. 4a). One-tailed type I error rates were not significantly different from 0.05 for all combinations of sequence length and tree depth (Fig. 4c, e). When branch lengths were estimated ignoring rate heterogeneity, deviation from true γ was severe and negative (Fig. 4b), leading to significantly elevated type I error rates for all but the shortest tree depths regardless of sequence length (Fig. 4d, f). For very short sequence length and tree length, there was a slight positive bias in γ (Fig. 4b).
Complex Models
In all cases when sequences were simulated under empirically derived parameter values for the GTR+I+Γ model, γ is significantly negatively biased when the simplest models (JC, F81) are used to estimate branch lengths. Minor improvement in γ is observed when more substitution types are included in the model (models HKY and GTR), but γ is still significantly negatively biased. There is also some improvement when invariant sites are included (model GTR+I), but γ values are not unbiased until the correct model (GTR+I+Γ) is used (Fig. 5). Although the magnitude of the bias depends on the particular parameter values used, the general patterns are the same in each case.
Discussion
Model underparameterization can lead to significant negative bias and inflated type I error in the γ-statistic of Pybus and Harvey (2000) calculated from molecular phylogenies. In other words, model underparameterization may lead to a spurious pattern of rapid cladogenesis early in a group's history. This problem is particularly severe when rate heterogeneity (Γ) is ignored (see also Lemmon and Moriarty, 2004) but is also significant if the invariant sites parameter (I) is excluded and when highly unequal transition and transversion rates (κ) are ignored (Fig. 1). The finding that underparameterization leads to negatively biased values of γ is concordant with the observation that branch lengths are more severely underestimated early rather than late in the tree when they are estimated using an underspecified substitution model (Gojobori et al., 1982; Kuhner and Felsenstein, 1994). However, the large magnitude of the effect (type I error = 1.0 in some cases) highlights the importance of assessing model adequacy before carrying out analyses of diversification, such as the γ test, on molecular phylogenies with branch lengths.
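The link between compressed deep branches and negative γ can be made concrete with a heuristic sketch. It is not the authors' analysis: it takes the expected pure-birth internode distances (g_k = 1/k for unit birth rate), converts them to node heights, shrinks each height with the Jukes-Cantor estimate expected under Γ-distributed rates, and recomputes γ. The substitution rate, the shape value, and the treatment of clock-constrained likelihood estimates as transformed pairwise distances are all simplifying assumptions, and the function names are illustrative.

```python
import math

def gamma_statistic(g):
    """Gamma of Pybus and Harvey (2000); g[0] is g_2, ..., g[-1] is g_n."""
    n = len(g) + 1
    weighted = [k * gk for k, gk in zip(range(2, n + 1), g)]
    total = sum(weighted)
    running, outer = 0.0, 0.0
    for w in weighted[:-1]:  # i = 2 .. n-1
        running += w
        outer += running
    return (outer / (n - 2) - total / 2.0) / (total * math.sqrt(1.0 / (12.0 * (n - 2))))

def jc_estimate_under_gamma(d, alpha):
    """JC distance expected when the true distance is d under JC+Gamma."""
    return 0.75 * alpha * math.log(1.0 + 4.0 * d / (3.0 * alpha))

def distort_intervals(g, rate, alpha):
    """Compress node heights as an underparameterized model would (heuristic)."""
    heights, h = [], 0.0
    for gk in reversed(g):           # accumulate heights from tips to root
        h += gk
        heights.append(h)
    heights.reverse()                # heights[0] is the root
    est = [jc_estimate_under_gamma(2.0 * rate * x, alpha) for x in heights]
    est.append(0.0)                  # the present
    return [est[i] - est[i + 1] for i in range(len(g))]

if __name__ == "__main__":
    true_g = [1.0 / k for k in range(2, 101)]       # expected Yule intervals
    print(gamma_statistic(true_g))                   # approximately 0
    print(gamma_statistic(distort_intervals(true_g, rate=0.05, alpha=0.5)))
```

Because the transformation is concave, intervals near the root shrink proportionally more than those near the tips, and the recomputed γ falls below zero even though the underlying branching times carry no slowdown.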
In considering the consequences of this finding, a distinction must be drawn between model selection, such as the likelihood ratio–based method implemented in ModelTest (Posada and Crandall, 1998), and tests of model adequacy (see Goldman, 1993; Bollback, 2002). Tests of model adequacy consider the absolute, rather than the relative, fit of the data to the prescribed model: such tests may indicate that even the most heavily parameterized model available for phylogenetic analyses (e.g., GTR+I+Γ) is inadequate despite the fact that it has been selected by a relative criterion such as the likelihood ratio test. Although absolute model adequacy can now be assessed using several methods (e.g., Goldman, 1993; Bollback, 2002; also described in Lemmon and Moriarty, 2004), such tests are stringent and may reject the adequacy of all available models when applied to empirical data sets (see Goldman, 1993).
For the time being, then, we recommend that significantly negative γ be interpreted cautiously in empirical studies in which model adequacy has not been assessed or in which model adequacy has been rejected. It may be possible to further increase model adequacy, and with it the accuracy of γ estimation, for empirical data sets by modeling additional aspects of rate heterogeneity that can now be incorporated into modern phylogenetic analyses. For example, Bayesian analyses as implemented in MrBayes (Ronquist and Huelsenbeck, 2003) may be conducted with data sets that are divided into several partitions for which parameters are independently estimated.
It may also be useful to consider the length of the tree, the number of taxa, and the number of characters used in the analysis when considering the potential for bias in the γ-statistic as a consequence of model inadequacy. Trees of very short length were not particularly susceptible to type I error in the γ-test, nor were trees containing few taxa. For long trees, in which branches contain many superimposed substitutions, the consequences of model misspecification for γ are much more severe. For such trees, adding more taxa actually increases the power of the γ-statistic to detect spurious results such that even mild apparent deviations from constant-rate speciation, whether real or an artifact of model underparameterization, are statistically significant. Thus, our results suggest that long trees, trees containing many taxa, or trees featuring both of these properties are particularly susceptible to bias in the estimation and hypothesis testing of diversification rates using the γ-statistic.
Short sequence length has the opposite effect on the γ-statistic to that of model underparameterization. Short sequence length results in some internodes lacking substitutions entirely. Their length is estimated to be very short regardless of the model of nucleotide substitution used in the analysis. Because there are more branches towards the tips of any phylogenetic tree, this phenomenon results in positively inflated γ, particularly when sequence and tree lengths are very short. However, for the range of sequence lengths used in empirical studies, the magnitude of this positive bias is small relative to the effect of model underparameterization.
Although we restrict our attention to the γ-statistic of Pybus and Harvey (2000), the observation that model underparameterization can lead to severe bias in tree shape extends to other evolutionary inferences that rely on unbiased estimates of phylogenetic branch lengths. For example, comparative methods often assume that the probability of character change on a given branch is proportional to the branch length (e.g., Harvey and Pagel, 1991; Pagel, 1994). Under conditions such as those described above, where oversimplified models are used, such methods will tend to concentrate more change on less severely underestimated, later branches, and less on earlier branches, than if their true lengths were known without error.
Conclusions
Our analyses suggest that model underparameterization leads to strong negative bias in the estimation of Pybus and Harvey's γ statistic, which may result in the incorrect inference that the rate of cladogenesis has slowed in the course of a group's history. The observation of a decreasing rate of cladogenesis over time is not unexpected, having both theoretical justification (Walker and Valentine, 1984; Hubbell, 2001) and empirical support from paleontological studies (e.g., Sepkoski, 1978, 1979). However, in some cases, molecular phylogenetic studies that recover this pattern via estimation of the γ-statistic must be viewed with caution. Tests of absolute model adequacy, such as that described in Bollback (2002), have the potential to alleviate this problem, but are often likely to reject all available models (e.g., Goldman, 1993). Failure in a test of absolute model adequacy is of particular concern if the tree and the sequences are long and if the tree contains many taxa. If the preferred model is found to be inadequate by absolute criteria, any observation of negative γ should be interpreted with caution.
Acknowledgements
This work was supported in part by a grant from the National Science Foundation (DEB-9982736). We thank J. Losos, K. Kozak, R. B. Langerhans, and S. Heard for comments on the manuscript and members of the Losos lab for much useful discussion. H. P. Linder, R. D. M. Page, D. Posada, and an anonymous reviewer provided insightful criticisms of a previous version of this manuscript.
References
Adachi J., Hasegawa M. Improved dating of the human/ chimpanzee separation in the mitochondrial DNA tree: Heterogeneity among amino acid sites. J. Mol. Evol. (1995) 40:622–628.[CrossRef][Web of Science][Medline]
Agapow P.-M., Purvis A. Power of eight tree shape statistics to detect nonrandom diversification: A comparison by simulation of two models of cladogenesis. Syst. Biol. (2002) 51:866–872.
Barraclough T. G., Nee S. Phylogenetics and speciation. Trends Ecol. Evol. (2001) 16:391–399.[CrossRef][Medline]
Barraclough T. G., Vogler A. P. Recent diversification rates in North American tiger beetles estimated from a dated mtDNA phylogenetic tree. Mol. Biol. Evol. (2002) 19:1706–1716.
Bollback J. P. Bayesian model adequacy and choice in phylogenetics. Mol. Biol. Evol. (2002) 19:1171–1180.
Felsenstein J. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. (1981) 17:368–376.[CrossRef][Web of Science][Medline]
Felsenstein J. Inferring phylogenies (2004) Sunderland, Massachusetts: Sinauer Associates.
Fukami-Kobayashi K., Tateno Y. Robustness of maximum likelihood tree estimation against different patterns of base substitution. J. Mol. Evol. (1991) 32:79–91.[CrossRef][Web of Science][Medline]
Fusco G., Cronk Q. C. B. A new method for evaluating the shape of large phylogenies. J. Theor. Biol. (1995) 175:235–243.[CrossRef][Web of Science]
Gaut B. S., Lewis P. O. Success of maximum likelihood phylogeny inference in the four-taxon case. Mol. Biol. Evol. (1995) 12:152–162.[Abstract]
Gojobori T., Ishii K., Nei M. Estimation of average number of nucleotide substitutions when the rate of substitution varies with nucleotide. J. Mol. Evol. (1982) 18:414–423.[CrossRef][Web of Science][Medline]
Goldman N. Statistical tests of models of DNA substitution. J. Mol. Evol. (1993) 36:182–198.[CrossRef][Web of Science][Medline]
Harmon L. J., Schulte J. A. II, Larson A., Losos J. B. Tempo and mode of evolutionary radiation in Iguanian lizards. Science (2003) 301:961–964.
Harvey P. H., Leigh Brown A. J., Maynard Smith J., Nee S., eds. New uses for new phylogenies (1996) Oxford, UK: Oxford University Press.
Hasegawa M., Kishino H., Yano T. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. (1985) 22:160–174.
Hasegawa M., Kishino H., Yano T. Man's place in Hominoidea as inferred from molecular clocks of DNA. J. Mol. Evol. (1987) 26:132–147.
Håstad O., Björklund M. Nucleotide substitution models and estimation of phylogeny. Mol. Biol. Evol. (1998) 15:1381–1389.
Heard S. B. Patterns in tree balance among cladistic, phenetic, and randomly generated phylogenetic trees. Evolution (1992) 46:1818–1826.
Hubbell S. P. The unified neutral theory of biodiversity and biogeography (2001) Princeton, New Jersey: Princeton University Press.
Huelsenbeck J. P., Hillis D. M. Success of phylogenetic methods in the four taxon case. Syst. Biol. (1993) 42:247–264.
Huelsenbeck J. P., Kirkpatrick M. Do phylogenetic methods produce trees with biased shapes? Evolution (1996) 50:1418–1424.
Huelsenbeck J. P., Rannala B. Detecting correlation between characters in a comparative analysis with uncertain phylogeny. Evolution (2003) 57:1237–1247.
Huelsenbeck J. P., Rannala B., Masly J. P. Accommodating phylogenetic uncertainty in evolutionary studies. Science (2000) 288:2349–2350.
Jin L., Nei M. Limitations of the evolutionary parsimony method of phylogenetic analysis. Mol. Biol. Evol. (1990) 7:82–102.
Jukes T. H., Cantor C. R. Evolution of protein molecules. In: Munro H. N., ed. Mammalian protein metabolism, Volume III (1969) New York: Academic Press. Pages 21–132.
Kadereit J. W., Griebler E. M., Comes H. P. Quaternary diversification in European alpine plants: Pattern and process. Philos. T. Roy. Soc. B (2004) 359:265–274.
Kimura M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. (1980) 16:111–120.
Kozak K. H., Larson A., Bonett R. M., Harmon L. J. Phylogenetic analysis of ecomorphological divergence, community structure, and diversification rates in dusky salamanders (Plethodontidae: Desmognathus). Evolution (2005) 59:2000–2016.
Kuhner M. K., Felsenstein J. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol. (1994) 11:459–468.
Lemmon A. R., Moriarty E. C. The importance of proper model assumption in Bayesian phylogenetics. Syst. Biol. (2004) 53:265–277.
Linder H. P., Eldenäs P., Briggs B. G. Contrasting patterns of radiation in African and Australian Restionaceae. Evolution (2003) 57:2688–2702.
Lutzoni F., Pagel M., Reeb V. Major fungal lineages are derived from lichen symbiotic ancestors. Nature (2001) 411:937–940.
Machordom A., Macpherson E. Rapid radiation and cryptic speciation in squat lobsters of the genus Munida (Crustacea, Decapoda) and related genera in the South West Pacific: molecular and morphological evidence. Mol. Phylogenet. Evol. (2004) 33:259–279.
Martin A. P., Costello E. K., Meyer A. F., Nemergut D. R., Schmidt S. K. The rate and pattern of cladogenesis in microbes. Evolution (2004) 58:946–955.
Mooers A. Ø., Heard S. B. Inferring evolutionary process from phylogenetic tree shape. Q. Rev. Biol. (1997) 72:31–54.
Mooers A. Ø., Page R. D. M., Purvis A., Harvey P. H. Phylogenetic noise leads to unbalanced cladistic tree reconstructions. Syst. Biol. (1995) 44:332–342.
Olsen G. J. Systematic underestimation of tree branch lengths by Lake's operator metrics: An effect of position dependent substitution rates. Mol. Biol. Evol. (1991) 8:592–608.
Pagel M. Inferring the historical patterns of biological evolution. Nature (1999) 401:877–884.
Posada D., Crandall K. A. Modeltest: Testing the model of DNA substitution. Bioinformatics (1998) 14:817–818.
Purvis A., Katzourakis A., Agapow P.-M. Evaluating phylogenetic tree shape: Two modifications to Fusco & Cronk's method. J. Theor. Biol. (2002) 214:99–103.
Pybus O. G., Harvey P. H. Testing macro-evolutionary models using incomplete molecular phylogenies. Proc. R. Soc. Lond. B (2000) 267:2267–2272.
Pybus O. G., Rambaut A., Holmes E. C., Harvey P. H. New inferences from tree shape: Numbers of missing taxa and population growth rates. Syst. Biol. (2002) 51:881–888.
Rambaut A. Phyl-O-Gen: phylogenetic tree simulator package, v1.1 (2002) See http://evolve.zoo.ox.ac.uk/software.html?id=phylogen.
Rambaut A., Grassly N. C. Seq-gen: An application for Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. (1997) 13:235–238.
Rodríguez F., Oliver J. L., Marín A., Medina J. R. The general stochastic model of nucleotide substitution. J. Theor. Biol. (1990) 142:485–501.
Ronquist F. Bayesian inference of character evolution. Trends Ecol. Evol. (2004) 19:475–481.
Ronquist F., Huelsenbeck J. P. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics (2003) 19:1572–1574.
Rüber L., Zardoya R. Rapid cladogenesis in marine fishes revisited. Evolution (2005) 59:1119–1127.
Ruvolo M., Zehr S., von Dornum M., Pan D., Chang B., Lin J. Mitochondrial COII sequences and modern human origins. Mol. Biol. Evol. (1993) 10:1115–1135.
Sepkoski J. J. Jr. A kinetic model of Phanerozoic taxonomic diversity I. Analysis of marine orders. Paleobiology (1978) 4:223–251.
Sepkoski J. J. Jr. A kinetic model of Phanerozoic taxonomic diversity II. Early Phanerozoic families and multiple equilibria. Paleobiology (1979) 5:222–251.
Shaw A. J., Cox C. J., Goffinet B., Buck W. R., Boles S. B. Phylogenetic evidence of a rapid radiation of pleurocarpous mosses (Bryophyta). Evolution (2003) 57:2226–2241.
Swofford D. L. PAUP*: Phylogenetic analysis using parsimony (*and other methods) (1998) Sunderland, Massachusetts: Sinauer Associates.
Tamura K., Nei M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. (1993) 10:512–526.
Townsend T. M., Larson A., Louis E., Macey J. R. Molecular phylogenetics of Squamata: The position of snakes, amphisbaenians, and dibamids, and the root of the squamate tree. Syst. Biol. (2004) 53:735–757.
Uzzell T., Corbin K. W. Fitting discrete probability distributions to evolutionary events. Science (1971) 172:1089–1096.
Walker T. D., Valentine J. W. Equilibrium models of the evolutionary species diversity and the number of empty niches. Am. Nat. (1984) 124:887–899.
Welch J. J., Bromham L. Molecular dating when rates vary. Trends Ecol. Evol. (2005) 20:320–327.
Williams S. T., Reid D. G. Speciation and diversity on tropical rocky shores: A global phylogeny of snails of the genus Echinolittorina. Evolution (2004) 58:2227–2251.
Yang Z., Goldman N., Friday A. Comparison of models for nucleotide substitution used in maximum-likelihood phylogenetic estimation. Mol. Biol. Evol. (1994) 11:316–324.
Yang Z., Goldman N., Friday A. Maximum likelihood trees from DNA sequences: a peculiar statistical estimation problem. Syst. Biol. (1995) 44:384–399.
Zhang L. B., Comes H. P., Kadereit J. W. The temporal course of Quaternary diversification in the European high mountain endemic Primula sect. Auricula (Primulaceae). Int. J. Plant Sci. (2004) 165:191–207.