Skip Navigation

Systematic Biology 2006 55(6):993-1003; doi:10.1080/10635150601087641
This Article
Right arrow Extract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (15)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Marshall, D. C.
Right arrow Articles by Buckley, T. R.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Marshall, D. C.
Right arrow Articles by Buckley, T. R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 2006 Society of Systematic Biologists

Accurate Branch Length Estimation in Partitioned Bayesian Analyses Requires Accommodation of Among-Partition Rate Variation and Attention to Branch Length Priors

Edited by Allan Baker: Associate Editor

David C. Marshall1, Chris Simon1,2 and Thomas R. Buckley3

1 Department of Ecology and Evolutionary Biology, University of Connecticut 75 N. Eagleville Road, U-3043, Storrs, Connecticut 06269, USA E-mail: david.marshall{at}uconn.edu (D.C.M.) E-mail: chris.simon{at}uconn.edu (C.S.)
2 School of Biological Sciences, Victoria University of Wellington Wellington, New Zealand
3 Landcare Research Private Bag 92170, Auckland, New Zealand E-mail: BuckleyT{at}landcareresearch.co.nz

Received December 14, 2005; Revised March 1, 2006; Accepted September 5, 2006 Molecular phylogenetic studies are making increasing use of partitioned Bayesian analyses via software tools like MrBayes, version 3 (Ronquist and Huelsenbeck, 2003). Data partitioning is important because, as long as the same topology/history underlies all of the partitions, it addresses some of the problems associated with the combination of data sets with heterogeneous rates (Bull et al., 1993) and eliminates the need to argue the validity of tests that have been used to judge data combinability (e.g., Huelsenbeck et al., 1994; Huelsenbeck and Bull, 1996; Yang, 1996; Cunningham, 1997; Barker and Lutzoni, 2002; Buckley et al., 2002; Dowton and Austin, 2002). In addition, new studies indicate that data partitioning and the use of mixed models often dramatically improve the fit of model to data without the cost of overparameterization (Yang, 1996; Nylander et al., 2004; Brandley et al., 2005). While applying partitioned models to studies of protein-coding mitochondrial data, we have found that analyses using MrBayes may infer overly "long" trees if among-partition rate variation (APRV) is not explicitly accommodated and if data from different partitions evolve at different average rates. We have subsequently confirmed this bias in analyses of simulated datasets, as explained below. Because the erroneous tree lengths (TLs) can be nearly twice as high as those found when APRV is explicitly modeled, this finding should be of interest to those intending to apply previously calculated molecular clock rates to branch lengths estimated under partitioned models, as well as to those inferring evolutionary rates from fossil-calibrated evolutionary trees. Here we discuss the importance of among-partition rate variation and potential pitfalls in implementation of mixed models that accommodate APRV in Bayesian analyses. Note that, for this study, "APRV" refers to rate variation across a priori defined partitions, and a "no-APRV" analysis is one that defines partitions but does not explicitly accommodate among-partition rate variation in the model.


    Effect of Among-Partition Rate Variation on Estimates of Tree Length: An Example and a Simulation Study
 Top
 Effect of Among-Partition Rate...
 Effect of Among-Partition Rate...
 Instability of APRV Analyses...
 Nonidentifiability and...
 Examples from Published...
 Importance of Accurate Branch...
 How to Partition?
 Appendix 1. Methods
 Acknowledgments
 References
 
MrBayes 3 offers three ways for the user to model among-partition rate variation, all of which assume the same topology across partitions. The first option, prset ratepr = fixed, assumes that the average substitution rate (per site) of each partition is the same. The second option, prset ratepr = variable, adds "rate multiplier" parameters (m1, m2, mn) to the overall model that represent the relative average rates of the n partitions, one for each partition. The rates are allowed to vary subject to the constraint that the average rate of substitution across partitions is 1, and the rate multipliers are assigned a dirichlet(<number>, ..., <number>) prior. These multipliers are similar to the cg parameter of Yang (1996), except that in his method cg indicates the relative rate compared to that of the first partition, which is set to 1. The third MrBayes option, unlink brlens, allows relative branch lengths within the tree to vary across partitions—a form of heterotachy (e.g., Philippe et al., 2005). Note that adding gamma-distributed among-site rate variation (Yang, 1994b) to the model within each individual partition does not by itself accommodate rate variation across partitions: APRV is modeled by allowing the average rates of different partitions to vary.

Using data from a study of mostly protein-coding mitochondrial DNA (mtDNA) data, derived from a recent (Plio-Pleistocene age) radiation of New Zealand cicadas of the genus Kikihia (unpublished data), we found that estimates of total tree length are sensitive to underlying assumptions about APRV. Partitioned analyses that unlink branch lengths or assume partition-specific rate multipliers consistently return shorter trees than partitioned analyses that do not accommodate APRV. Table 1 (analyses A to D) compares the mean log-likelihood and model parameter estimates obtained for a set of representative analyses, one unpartitioned and several partitioned. Detailed methods are given in the Appendix; note that the model parameters were estimated separately for each partition in every analysis. The estimated TL for the no-APRV analysis (A) was nearly twice as long as the values obtained by the partitioned analyses that accommodated APRV (analyses B and C). Interestingly, the unpartitioned analysis (D) estimated about the same TL as the partitioned analyses with APRV. The changes in tree length across the three partitioned analyses were accompanied by substantial changes in other model parameters, most notably {alpha}, which determines the shape of the gamma distribution. Although the actual degree of rate variation across codon positions cannot be known with certainty, in this situation there is excellent reason to suspect substantial among-partition rate variation: The data are protein-coding, and third position sites tend to evolve much more rapidly than first or second position sites in such genes. This expectation is strengthened by the much larger number of variable third position sites in the Kikihia data set (Table 2).


View this table:
[in this window]
[in a new window]

 
Table 1 Effect of accommodation of among-partition rate variation (APRV) on parameter estimates, especially tree length, in Bayesian phylogenetic analyses of empirical data. The Partitions column indicates how the mtDNA data were partitioned (by codon position, randomly, or not at all). The APRV column indicates how rate variation among partitions was handled in the analysis model ("none" means APRV not accommodated). "ln L" refers to the mean posterior log-likelihood of the model.

 


View this table:
[in this window]
[in a new window]

 
Table 2 Numbers of variable and parsimony-informative sites by data partition in the Kikihia mtDNA data set.

 
Suspecting that failure to accommodate APRV was causing the longer TL values in runs such as analysis A, we reanalyzed the same mtDNA data set with first, second, and third codon position sites distributed approximately evenly among the three partitions, again without explicitly accommodating APRV in the model (Table 1, analysis E) and again while estimating model parameters separately for each partition. The mean estimate of tree length was, as expected, comparable to the TL found when APRV was accommodated in the codon-partitioned analyses. This result further indicates that higher TL estimates in no-APRV analyses are related to rate variation across partitions.

Although the above test with codon positions randomly assigned to data partitions strongly implicates failure to accommodate partition rate variation as the cause of the higher TL estimates, this inference is limited by the fact that we do not know the true evolutionary properties of the mtDNA data nor the true length of the Kikihia evolutionary tree. To address these limitations, we examined the relationship between tree length and degree of among-partition rate variation using three-partition datasets simulated on a known tree (Fig. 1) under three different levels of APRV (relative partition rates 1:1:1, 3:1:9, and 5:1:25 for treatments A, B, and C, respectively). Within each treatment, the 20 simulated data sets were each analyzed under three partitioned models differing only in the method of accommodation of APRV (no-APRV, rate-multiplier m, and unlinked branches), and the mean log-likelihood and model parameter estimates from the 20 analyses were averaged for each of these three models (Table 3). (Again, "no-APRV" here means that the model is partitioned but does not explicitly accommodate partition rate variation.) Details of the simulations and phylogenetic analyses are found in the Appendix.


Figure 1
View larger version (6K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 1 Phylogram used for data simulations.

 


View this table:
[in this window]
[in a new window]

 
Table 3 Distribution of character variation in representative simulated datasets used in this study.

 
The simulated data analyses displayed the same patterns as those observed with the empirical data sets. Tree length was accurately estimated by all partitioned analyses (with and without APRV accommodated) when the data sets were simulated under equal partition rates (Table 4, analyses F, G, H). TL was also accurately estimated when APRV was simulated in the data and accommodated in the analysis model (Analyses B2, B3, C2, C3). In contrast, TL was biased upward in no-APRV analyses (analyses J and M) by approximately 20% with relative partition rates of 3:1:9 and by 50% with simulated partition rates of 5:1:25 (Fig. 2). Values of {kappa} and {alpha} were poorly estimated only when APRV was simulated in the data but not accommodated in the model. When the data were simulated under varying partition rates, mean log-likelihoods were dramatically and equally improved by including the partition rate multiplier or unlinked branches in the model. The two methods of accommodating APRV performed about equally well in estimating tree length and other model parameter estimates.


Figure 2
View larger version (22K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 2 Average tree length found in each of the nine combinations of APRV treatment and analysis model (analyses A1–3, B1–3, and C1–3; see Table 4), showing that no-APRV analyses yield upwardly biased estimates when the data are simulated under varying partition rates (analyses B1 and C1). The variation across models was significant within the 3:1:9 (Kruskal-Wallis H = 36.322, P < 0.000) and 5:1:25 (H = 39.437, P < 0.000) APRV treatments and not significant within the 1:1:1 treatment (Kruskal-Wallis H = 3.976, P < 0.137).

 


View this table:
[in this window]
[in a new window]

 
Table 4 Effect of accommodation of among-partition rate variation (APRV) on parameter estimates (± 1 standard deviation) in Bayesian phylogenetic analyses of simulated data. The "Simul. APRV" column indicates the degree of among-partition rate variation simulated, and the Analysis Model column indicates how APRV was accommodated in the analysis of that data set (analyses A1, B1, and C1 did not accommodate APRV). Boldface indicates average estimates that vary more than one standard deviation from the value used in simulating the data. MrBayes analyses employing rate-multipliers do not return partition-specific tree lengths; instead, the product of the estimated rate mutiplier (mn) for each partition (not shown) and TL is given here to facilitate comparison between results from different models. Similarly, TLavg (the average of the three partition-specific per-site substitution rates obtained from an unlinked branch length analysis) is included here for comparison with the TL values returned by the rate multiplier analyses. "ln L" refers to the mean posterior log-likelihood of the model.

 
Why should failure to accommodate among-partition rate variation lead to dramatically higher TL estimates? In a no-APRV analysis, MrBayes estimates a single mean per-site substitution rate (tree length, TL) for the entire data set (Ronquist and Huelsenbeck, 2003-1573), and therefore the same average rate for each partition. If the data in different partitions have evolved under different partition rates, the posterior estimate of TL will represent a compromise between these conflicting processes. But why do no-APRV analyses not converge upon the overall mean TL value as the best compromise? Higher-than-average TL estimates worsen the fit to the data in the slowly evolving partitions; lower-than-average values worsen the fit to the fast third partition. In our analyses slower-than-average sites (first partition, second partition) predominate, yet the inferred TL value in no-APRV analyses is always biased toward the average rate of the sites in the third (most rapidly evolving) partition.

We suggest that upwardly biased TL values in no-APRV analyses are caused by the asymmetrical distribution of variable sites across partitions and constraints on the ability of partition-specific parameters to accommodate the poor fit of TL to the data. When all partitions are assigned the same mean tree length, the log-likelihood estimate is more greatly penalized by the poor fit of TL to the faster partitions. This is because there is no way for many variable sites to be easily explained by a slow evolutionary rate. In contrast, the mismatch between mean TL and the sites in the slower-than-average partitions can be reduced by shifts in parameters that model the degree of among-site rate variation. This is possible because a partition with a comparatively small fraction of variable sites can be explained by both (1) a model combining a slow rate and a low level of ASRV and (2) a model combining a faster rate with a higher level of ASRV. This hypothesis fits the biases observed in the analyses of our simulated data sets. When APRV was simulated in the data but not accommodated in the phylogenetic analysis (Table 4, analyses J and M), the first and (especially) second partition alphas were strongly biased toward smaller than expected values, indicating a shift toward greater estimated among-site rate variation within those partitions. The changes in {alpha} were accompanied by upwardly biased estimates of the transition/transversion rate ratio {kappa} in the first and second partitions. (Presumably {kappa} was biased upwards because the datasets were initially simulated with a greater number of transitions.) Together, these changes indicate a model in which a small number of first and (especially) second partition sites are evolving extremely rapidly while the rest remain unchanged.

Partitioning by codon position without the prset ratepr = variable or unlink brlens options seems to be an especially poor model for estimating branch lengths in data sets with significant among-partition rate variation, such as protein-coding genes. This problem is perhaps most important in studies of closely related taxa, which can be expected to have comparatively few variable first and second codon-position sites, although more simulation work would be needed to determine what effect overall sequence divergence has on the phenomenon. Interestingly, it appears that even a single-partition analysis can outperform a no-APRV partitioned analysis in estimating TL as long as among-site rate variation is included: As noted earlier, the single-partition analysis of the empirical data set (Table 1, analysis D) found a tree length comparable to that observed in the partitioned analyses that accommodated APRV. Reducing the discrete gamma approximation to four rate categories did not change this result (unpublished data).


    Effect of Among-Partition Rate Variation on Model Selection and Nodal Support
 Top
 Effect of Among-Partition Rate...
 Effect of Among-Partition Rate...
 Instability of APRV Analyses...
 Nonidentifiability and...
 Examples from Published...
 Importance of Accurate Branch...
 How to Partition?
 Appendix 1. Methods
 Acknowledgments
 References
 
Accommodating among-partition rate variation dramatically improves the fit of partitioned analyses to datasets exhibiting APRV, by tens to hundreds of log-likelihood units (Tables 1 and 4). This may be an important consideration for those using the likelihood estimate to evaluate alternative candidate models as is done with the Bayes factor (see, e.g., Nylander et al., 2004; Brandley et al., 2005), although in our particular example the no-APRV model (Table 1, analysis A) still outperformed the unpartitioned model (analysis D) by this criterion ("2 ln Bayes Factor" = 12.3, or "very strong" support for the more complex model by the criteria of Kass and Raftery, 1995). The possibility of underestimating the potential gain of partitioned models is important because model selection can have a variety of effects on downstream data analyses. Model misspecification can result in the passage of clock tests even when data are non-clocklike (Huelsenbeck et al., 1997), can cause the mis-estimation of topology (Kuhner and Felsenstein, 1994; Sullivan et al., 1995), can decrease the accuracy and precision of model parameter estimation (Cunningham et al., 1998) and bootstrap support (Buckley et al., 2001; Buckley and Cunningham, 2002), and can strongly bias posterior probabilities (Lemmon and Moriarty, 2004). Furthermore, the effect of model misspecification on posterior probabilities is also influenced by branch lengths mis-estimation (Lemmon and Moriarty, 2004). This is especially important in deeper level phylogenies because error in branch length estimation increases when branches are longer (Buckley et al., 2001; Lemmon and Moriarty, 2004) or when APRV is not accommodated (this work).

In contrast to its effects on tree length estimates and log-likelihoods, inclusion of among-partition rate variation only minimally influences nodal support or topology in our analyses. In the analyses of simulated data, tree topology and relative branch lengths within the tree were unaffected by the analysis method (detailed results are obtainable from the authors upon request). Nodal support values were only minimally affected. Two of the three internal branches (Fig. 1, nodes 2 and 3) were supported by 100% posterior probabilities in every analysis. The remaining, shortest branch was recovered 100% of the time when APRV was included in the model, but was supported by slightly lower posterior probabilities in the two no-APRV analyses (Fig. 3). Treatment of APRV had only minor effects on topology and nodal support in our analyses of the more complex mtDNA data set (e.g., Fig. 4). It is striking that multiple solutions exist that are very similar in topology, relative branch lengths, and nodal support yet dramatically different in mean log-likelihood and in the total amount of evolution inferred.


Figure 3
View larger version (21K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 3 Average posterior probability of node 1 (see Fig. 1) found in each of the nine combinations of APRV treatment and analysis model (analyses A1–3, B1–3, and C1–3; see Table 4), showing reduced support for this short internal branch in no-APRV analyses of data sets simulated under varying partition rates (analyses B1 and C1). The variation across models was significant within the 3:1:9 (Kruskal-Wallis H = 10.590, P < 0.005) and 5:1:25 (H = 16.627, P < 0.000) APRV treatments and not significant within the 1:1:1 treatment (H = 1.816, p < 0.403). Probability values were arcsine-transformed prior to statistical analysis to increase homogeneity of variances.

 


Figure 4
View larger version (134K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 4 Consensus phylograms from partitioned analyses of the Kikihia mtDNA data set. (a) APRV accommodated with the rate multiplier m (Table 1, analysis B). (b) APRV not included in model (Table 1, analysis A). Although the tree in (b) is nearly twice as long the one in (a) (TL of 2.00 versus 1.15), topology and relative branch lengths within the trees are extremely similar. Taxon names in quotation marks refer to undescribed taxa and are intended as nicknames only.

 

    Instability of APRV Analyses Under Diffuse Branch Length Priors
 Top
 Effect of Among-Partition Rate...
 Effect of Among-Partition Rate...
 Instability of APRV Analyses...
 Nonidentifiability and...
 Examples from Published...
 Importance of Accurate Branch...
 How to Partition?
 Appendix 1. Methods
 Acknowledgments
 References
 
Although accommodating among-partition rate variation appears to be necessary for accurate branch length estimation, effective use of the rate multiplier or unlinked branches options appears to require close attention to the branch length prior. Specifically, as the branch length prior mean is increased, the partition rate multipliers and/or partition-specific TL values may shift to estimates that appear unrealistic for the data. Table 5 shows the posterior parameter estimates from a set of seven three-partition analyses of the Kikihia mtDNA data set that accommodated APRV by unlinking branch lengths. The general methods used were the same as those in the first set of empirical analyses (see Appendix) except that invariant sites were used only for the first partition and the few tRNA sites were distributed across the three partitions. Across the seven analyses the branch length prior mean was varied from 1.0 to 0.01 by changing the lambda ({lambda}) argument of the exponential distribution from 1 to 100 (prset brlenspr = unconstrained:exponential[{lambda} ]). Note that the prior mean in MrBayes equals 1/{lambda}. Across this range of priors the partition-specific TL estimates (TL1, TL2, TL3) show an interesting pattern: At larger values of {lambda} (smaller branch length prior means), the relative partition-specific tree lengths appear consistent with typical protein-gene evolution—very slow second-position sites, very fast third position sites. As the branch length prior increases, overall TL increases, but largely due to changes in the first and second partition rates, so that by the time the prior mean reaches 1 (analysis Q, {lambda} = 1) the posterior TL estimates for these partitions greatly exceed the third-partition tree length. The high first-and second-partition TL estimates are accompanied by unusually small estimates of {alpha}, the same pattern observed for the partitioned analyses without APRV in Tables 1 and 4. Note that the branch length prior in analysis R is the default value for MrBayes 3.0b4. In our experience these shifts are sometimes, but not always, accompanied by signs of poor convergence such as unstable parameter estimates and low effective sample sizes when analyzed with Tracer v. 1.3 (Rambaut and Drummond, 2003). We observed similar sensitivity to the branch length prior in analyses that accommodated APRV by use of the partition rate multiplier (data not shown). These results indicate that incorporation of APRV into partitioned Bayesian analyses must be done with caution. Ideally, several MCMC analyses should be conducted to explore sensitivity of the partition rate multipliers or partition-specific tree lengths to the branch length prior.


View this table:
[in this window]
[in a new window]

 
Table 5 Effect of branch length prior (mean = 1/{lambda}) on tree length (TLAVG) and other posterior parameter estimates for the Kikihia mtDNA data set when APRV is accommodated by unlinking branch lengths. "ln L" refers to the mean posterior log-likelihood of the model.

 

    Nonidentifiability and Sensitivity of Tree Length to the Branch Length Prior
 Top
 Effect of Among-Partition Rate...
 Effect of Among-Partition Rate...
 Instability of APRV Analyses...
 Nonidentifiability and...
 Examples from Published...
 Importance of Accurate Branch...
 How to Partition?
 Appendix 1. Methods
 Acknowledgments
 References
 
Branch lengths were sensitive to the prior distribution in all analyses. Lower values of lambda led to longer trees, and vice versa (Table 5). Although this could mean that the branch lengths are nonidentifiable (Rannala, 2002), other factors indicate a more complex problem. Branch lengths vary considerably within our trees, yet relative branch lengths within the consensus phylograms appear insensitive to the choice of model or branch length prior across all analyses (e.g., Fig. 4). Only the total tree length changes in concert with the prior mean. Again, we find it striking that, depending on the partitioning scheme and accommodation of APRV, trees with nearly identical topology, relative branch lengths, and nodal support can be found with very different total tree lengths. (Note that Yang, 2005-82, has argued that consensus phylogram branch lengths from MrBayes 3 should be discarded in favor of new ones obtained by re-running the MCMC with the consensus topology fixed. In this paper we have chosen to use the MrBayes consensus TL values because these have been widely used in published Bayesian analyses and because we are not focusing on individual branch lengths.)

Our analyses did suggest lack of branch length "signal" in one particular situation: In the three-partition analyses with branch lengths unlinked (e.g., Table 1, analysis C), individual branches within the second-partition consensus tree were unusually similar in length (Fig. 5) compared to the branches in the overall consensus phylogram (Fig. 4a), creating a "ladderlike" appearance of the tree. Many of the branches in the second-partition phylogram have lengths that are similar to or slightly less than the mean of the branch length prior, suggesting that the prior is biasing these parameter estimates. Nonidentifiability of individual branch lengths in the slower-evolving partitions may help to explain why the analyses with unlinked branch lengths did not outperform the rate multiplier analyses in mean log-likelihood.


Figure 5
View larger version (65K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 5 Second-partition consensus phylogram for analysis C (unlinked branches) in Table 1, illustrating reduced variance of branch lengths compared to the overall consensus phylogram (not shown, but nearly the same as those in Fig. 4), possibly caused by dominance of the branch length prior. The mean of the branch length prior for this analysis was 0.013. Taxon names in quotation marks refer to undescribed taxa and are intended as nicknames only.

 
Although applying rate multipliers and unlinking branch lengths both accommodate APRV about equally well, unlinking branch lengths will be a better strategy when different data partitions do not maintain the same relative evolutionary rates across the tree. This unusual pattern of among-lineage rate variation has been termed "mosaic evolution" by Simon et al. (1996) where it was demonstrated to occur in 5' versus 3' partitions of 12S rRNA of primates, cicadas and Drosophila. A similar phenomenon focusing on heterogeneity in rate among lineages at individual nucleotide sites has been described by Tuffley and Steel (1998) and Lockhart and colleagues (e.g., Lockhart et al., 1996, 2000), who called it covarion-like evolution, and Philippe and colleagues (e.g., Lopez et al., 2002; Gribaldo et al., 2003), who called it heterotachy. All of these phenomena are forms of covarion-like evolution (Lockhart and Steel, 2005) modified from the early ideas of Fitch and colleagues (e.g., Fitch and Markowitz, 1970; Fitch, 1986; Miyamoto and Fitch, 1995).


    Examples from Published Partitioned Analyses
 Top
 Effect of Among-Partition Rate...
 Effect of Among-Partition Rate...
 Instability of APRV Analyses...
 Nonidentifiability and...
 Examples from Published...
 Importance of Accurate Branch...
 How to Partition?
 Appendix 1. Methods
 Acknowledgments
 References
 
Some studies using MrBayes version 3 for partitioned Bayesian analyses have explicitly mentioned accommodation of APRV generally or use of the prset ratepr = variable option specifically (e.g., Nylander et al., 2004; Bergsten, 2005; Danforth et al., 2005; Ishiguro et al., 2005; Miya et al., 2005). However, many partitioned Bayesian studies do not mention if and/or how among-partition rate variation was modeled, and some may illustrate the tree-lengthening effect of failure to accommodate variable partition rates. For example, Brandley et al. (2005) demonstrated the superiority of partitioned models for resolving phylogenetic relationships of scincid lizards; their most-partitioned analysis yielded significantly better log-likelihoods and resolved deeper nodes in the tree more effectively than an unpartitioned analysis. The study employed a large amount of protein-coding data partitioned by codon position, and the default MrBayes option of prset ratepr = fixed was used (M. Brandley, personal communication). Judging from the scale bars in the figures, the tree presented for their most-partitioned analysis (fig. 4 in Brandley et al. 2005) is approximately twice as long as the tree from the unpartitioned analysis (their fig. 3). The Brandley et al. (2005) study is unusual in that the authors presented trees from both partitioned and unpartitioned analyses, without which it would be difficult to determine if APRV had influenced the branch lengths. (Note that the conclusions of Brandley et al., 2005, about the importance of data partitioning would not have been affected.) A tree from a partitioned Bayesian analysis of Malagasy scincid lizards (Schmitz et al., 2005) appears surprisingly long for a study encompassing a group of closely related genera, with a total branch length of approximately 1.3 expected substitutions per site from the base of the tree to the average tip. No mention was made of the MrBayes ratepr = variable option or unlinking branch lengths.

Older Bayesian analyses using MrBayes version 2 (e.g., Leys et al., 2002; Castoe et al., 2004) are unlikely to have been influenced by the specific tree-lengthening problem discussed in this paper, because data partitioning was allowed only by way of the site-specific rates option. The MrBayes 3.0 prset ratepr = variable option appears to be very similar to the site-specific rates feature of MrBayes 2.0.


    Importance of Accurate Branch Length Estimation to Studies of Divergence Times
 Top
 Effect of Among-Partition Rate...
 Effect of Among-Partition Rate...
 Instability of APRV Analyses...
 Nonidentifiability and...
 Examples from Published...
 Importance of Accurate Branch...
 How to Partition?
 Appendix 1. Methods
 Acknowledgments
 References
 
Divergence time estimation is a frequent goal of molecular phylogenetic studies. Earlier studies produced biased estimates of time (e.g., Doolittle et al., 1996) by failing to account for among-site rate variation (Gogarten et al., 1996; Hasegawa and Fitch, 1996), by failing to correct for multiple hits (e.g., any dating study that uses evenly weighted parsimony branch lengths), or by failing to relax the assumption that different lineages evolve at different rates. For these reasons, it is now generally accepted that best-fitting models that accommodate these biases are necessary for molecular dating (Arbogast et al., 2002; Welch and Bromham, 2005). Yang and Yoder (2003) examined the effects of partitioning protein-coding genes by codon position and estimating dates from these multiple partitions. They found that date estimates varied among codon positions when dates were estimated separately for each but that a combined analysis using a realistic model produced the best results.

Our study has demonstrated that estimates of tree length (the product of rate and time) in analyses with significant APRV are dramatically affected by model assumptions (see Welch et al., 2005, for a related study involving evolutionary rate priors), raising the possibility that divergence time estimates may also be affected if an external rate calibration is used. Fortunately, the most commonly used methods of divergence time estimation are not likely to encounter the problems discussed above. Many dating studies use the Bayesian program MULTIDIVTIME (Thorne et al., 1998; Kishino et al., 2001; Thorne and Kishino, 2002), for which branch lengths are separately estimated for each data partition, eliminating the need for concern about APRV. Penalized likelihood analyses (e.g., R8S; Sanderson, 2002) begin with a user-provided consensus phylogram, but in our empirical examples the relative branch lengths did not usually vary across partitions (but see Fig. 5) so failure to accommodate APRV would not necessarily affect the performance of the algorithm. Tree length errors generated by failure to explicitly model APRV are more likely to influence studies that apply externally estimated "standard" molecular-clock rates (e.g., Brower, 1994) to branch lengths estimated under partitioned models as well as studies that measure evolutionary rates using fossil-calibrated trees in which there are few calibrations and calibrations are located toward the tips of the trees.


    How to Partition?
 Top
 Effect of Among-Partition Rate...
 Effect of Among-Partition Rate...
 Instability of APRV Analyses...
 Nonidentifiability and...
 Examples from Published...
 Importance of Accurate Branch...
 How to Partition?
 Appendix 1. Methods
 Acknowledgments
 References
 
The biggest problem with data partitioning is deciding how to partition. The decision is easier in protein coding genes where the triplet code imposes a codon structure. Yang and Yoder (2003) found that partitioning according to codon position yielded different estimates of molecular dates. Brandley et al. (2005) similarly found that the most important factor in their protein coding partitioning strategy was to separate the rapidly evolving third positions from the slower first and second positions.

Partitioning of ribosomal genes is less intuitive. Although some authors have partitioned rRNA into stems (helices) versus loops (unpaired regions) (e.g., Wheeler and Honeycutt, 1988; Vawter and Brown, 1993; Springer and Douzery, 1996), this does not make sense in terms of evolutionary rate because there is a continuum of variability in stems and loops with some stems and loops evolving quickly on average while others on average remain highly conserved (Peer et al., 1993; Simon et al., 1994; Hickson et al., 1996; Pagel and Meade, 2004). Even more problematic is the presence of rapidly evolving sites in otherwise slowly evolving stems and vice versa (Simon et al., 1996). Models that accommodate correlation among sites (base pairing in stems) (Kjer, 2004) and other components of secondary structure (Jow et al., 2002; Hudelot et al., 2003; Telford et al., 2005) may help address some of these difficulties. Perhaps the best approach to modeling rRNA is to use mixture models that do not require a priori specification of partitions to retrieve signals of pattern heterogeneity (Pagel and Meade, 2004). In mixture models, characters are viewed as having been generated by one of a number of distributions and each of these distributions contributes to the likelihood during an analysis. The parameters of each model distribution and the weights assigned to them are estimated from the data. In the end of the analysis each character can be assigned a probability of membership in each model. Mixture models have also been applied to protein coding genes (Pagel and Meade, 2004) and amino acid sequence evolution (Lartillot and Philippe, 2004), and they may offer an alternative means of avoiding the tree length issues encountered in partitioned Bayesian studies. Preliminary analyses of the Kikihia mtDNA data set (unpublished data) using the program BayesPhylogenies (Pagel and Meade, 2004) yielded tree length estimates comparable to those obtained in Bayesian analyses that accommodated APRV, using mixed models with just two or three "patterns" (or matrices).


    Appendix 1. Methods
 Top
 Effect of Among-Partition Rate...
 Effect of Among-Partition Rate...
 Instability of APRV Analyses...
 Nonidentifiability and...
 Examples from Published...
 Importance of Accurate Branch...
 How to Partition?
 Appendix 1. Methods
 Acknowledgments
 References
 
Effect of Accommodation of APRV on Bayesian Analyses: Empirical Data
The effect of accommodation of among-partition rate variation (APRV) on tree length and other Bayesian parameter estimates was examined using 2152 bp of mtDNA COI, COII, ATPase 6, and ATPase 8 from 32 New Zealand cicada taxa of the genus Kikihia (TreeBase submission SN3085-12924). The sequences have been deposited in GenBank (accession numbers: COI: EF051355 [GenBank] –EF051385 [GenBank] ; COII: EF051386 [GenBank] –EF051416 [GenBank] , A6A8: EF051417 [GenBank] –EF051446 [GenBank] ). Using MrBayes 3.0b4 (Huelsenbeck and Ronquist, 2001), single-and three-partition analyses were conducted with or without APRV accommodated by a partition rate multiplier (each partition assigned a separate rate multiplier, mn) or by unlinking branch lengths (each partitioned assigned a separate tree length, TLn). These were designated analyses A to D. In the three-partition analyses each codon-position was modeled separately (first position sites in partition 1, etc.). An additional three-partition analyses was then conducted (analysis E) in which "random" partitions were used, each containing an equal number of sites from each codon position. Each data partition was separately modeled (parameters unlinked) with a general time-reversible (GTR: Yang, 1994a) sequence evolution model with invariant sites (I; Hasegawa et al., 1985) and with among-site rate variation (ASRV) modeled by a discrete gamma approximation; ({Gamma}; Yang, 1994b) with eight rate categories. All analyses used a branch length prior of unconstrained:exponential(75) and a shape parameter ({alpha}) prior of exp(1). All other parameters were set to the MrBayes default values. For the unlinked branch length analysis (analysis C), the consensus tree length (TLALL, which is not found in the MrBayes output) was calculated by averaging the partition-specific tree lengths (TL1,... TLn), weighted by the number of sites in each partition. Each MrBayes analysis used four chains, one "cold" and three "heated", run for 3 x 106 to 8 x 106 generations (depending on number of parameters in the model) and sampled every 1000 generations; a burnin of approximately 25% was used for most analyses. Stationarity was checked by monitoring post-burnin stability of mean log-likelihood and individual parameter estimates with Tracer v1.3 (Rambaut and Drummond, 2003) and by ensuring that duplicate analyses converged on similar outcomes.

Effect of Accommodation of APRV on Bayesian Analyses: Simulated Data
We also examined the relationship between tree length and degree of among-partition rate variation using three-partition data sets of 2400 bp each simulated on a known tree (Fig. 1) under three different levels of APRV (relative partition rates 1:1:1, 3:1:9, and 5:1:25 for treatments A, B, and C, respectively). Data were simulated using the program Seq-Gen v1.3.2 (Rambaut and Grassly, 1997). Twenty datasets were simulated for each of the three APRV treatments for a total of 60 simulated data sets (representative data sets are summarized in Table 3). Within each treatment, the twenty simulated data sets were each analyzed under three partitioned models differing only in the method of accommodation of APRV (no-APRV, rate-multiplier m, and unlinked branches), and the mean log-likelihood and model parameter estimates from the 20 analyses were averaged for each of these three models. The HKY+{Gamma} model was used in data simulation (and Bayesian phylogenetic analysis) to limit the number of parameters involved, and a small number of taxa (6) were used to accelerate the analyses. The following model parameters were chosen for the simulations: {kappa} = 2.0, {alpha} = 1.0, four rate categories, equal base frequencies ({pi}), 800 bp per partition (2400 bp total), overall TL = 0.64. Partition-specific per-site substitution rates (TL1, TL2, TL3) were simulated as follows: Treatment A (1:1:1) – 0.64, 0.64, 0.64; treatment B (3:1:9) – 0.44, 0.15, 1.32; treatment C (5:1:25)—0.31, 0.06, 1.54. Base frequency ({pi}) estimates of different models did not vary and were omitted from the results for brevity.

Phylogenetic analyses were conducted with MrBayes v3.1 (Ronquist and Huelsenbeck, 2003) to allow use of the chain-convergence diagnostics offered by this version of the program. The parameters {kappa} and {alpha} were estimated separately for each partition (but the base frequencies were "linked" to reduce the number of parameters involved in the test) and the prior means for {alpha} (exp[1]) and {kappa} (beta[1,1]) were set to match the mean values used in the data simulations. A branch length prior of exp(5.247) was used in all analyses. All other parameters were set to the MrBayes default values. MCMC was run for 1.5 million generations in each analysis and 0.5 million generations were discarded as Burn-in. Stationarity was confirmed for all analyses by monitoring the stability of post-burn-in mean log-likelihood and parameter estimates and (for a limited number of analyses of each treatment) ensuring that the uncorrected potential scale reduction factors (PSRF; Gelman and Rubin, 1992) remained close to 1.0 in analyses with the MrBayes "nruns" parameter set to 2; values of PSRF usually did not exceed 1.01. (Yang, 2005-80, suggests that values less than 1.1 or 1.2 indicate convergence "in real data problems".) MrBayes analyses employing rate-multipliers do not return partition-specific tree lengths; instead, the product of the estimated rate mutiplier (mn) for each partition (not shown) and TL is given here to facilitate comparison between results from different models. Similarly, TLavg (the average of the three partition-specific per-site substitution rates obtained from an unlinked branch length analysis) is included here for comparison with the TL values returned by the rate multiplier analyses. Base frequency ({pi}) estimates of different models did not vary and were omitted from the results presented here for brevity.


    Acknowledgments
 Top
 Effect of Among-Partition Rate...
 Effect of Among-Partition Rate...
 Instability of APRV Analyses...
 Nonidentifiability and...
 Examples from Published...
 Importance of Accurate Branch...
 How to Partition?
 Appendix 1. Methods
 Acknowledgments
 References
 
We thank two anonymous reviewers, Allan Baker, and Rod Page for helpful comments on the manuscript. Paul O. Lewis contributed guidance on likelihood-based models and Bayesian analysis. Mark Pagel provided helpful discussions on mixture models and made the latest version of his program available to us. Matt Brandley kindly read a draft of the manuscript and provided useful information. Computational resources for the phylogenetic analyses were provided by the Bioinformatics Facility of the Biotechnology/Bioservices Center, University of Connecticut, Storrs, CT. This material is based upon work partially supported by the National Science Foundation under grant No. DEB-0422386 to CS. TB was supported by funding from the Foundation for Research, Science and Technology (contract number C09X0501).


    References
 Top
 Effect of Among-Partition Rate...
 Effect of Among-Partition Rate...
 Instability of APRV Analyses...
 Nonidentifiability and...
 Examples from Published...
 Importance of Accurate Branch...
 How to Partition?
 Appendix 1. Methods
 Acknowledgments
 References
 

    Arbogast B. S., Edwards S. V., Wakeley J., Beerli P., Slowinski J. B. Estimating divergence times from molecular data on phylogenetic and population genetic timescales. Annu. Rev. Ecol. Syst. (2002) 33:707–740.[CrossRef][Web of Science]

    Barker F. K., Lutzoni F. M. The utility of the incongruence length difference test. Syst. Biol. (2002) 51:625–637.[Free Full Text]

    Bergsten J. A review of long-branch attraction. Cladistics (2005) 21:163–193.[CrossRef][Web of Science]

    Brandley M. C., Schmitz A., Reeder T. W. Partitioned Bayesian analyses, partition choice, and the phylogenetic relationships of scincid lizards. Syst. Biol. (2005) 54:373–390.[Abstract/Free Full Text]

    Brower A. V. Z. Rapid morphological radiation and convergence among races of the butterfly Heliconius erato inferred from patterns of mitochondrial DNA evolution. Proc. Natl. Acad. Sci. USA (1994) 91:6491–6495.[Abstract/Free Full Text]

    Buckley T. R., Arensburger P., Simon C., Chambers G. K. Combined data, Bayesian phylogenetics, and the origin of the New Zealand Cicada genera. Syst. Biol. (2002) 51:4–18.[Abstract/Free Full Text]

    Buckley T. R., Cunningham C. W. The effects of nucleotide substitution model assumptions on estimates of nonparametric bootstrap support. Mol. Biol. Evol. (2002) 19:394–405.[Abstract/Free Full Text]

    Buckley T. R., Simon C., Chambers G. K. Exploring among-site rate variation models in a maximum likelihood framework using empirical data: Effects of model assumptions on estimates of topology, branch lengths, and bootstrap support. Syst. Biol. (2001) 50:67–86.[Abstract/Free Full Text]

    Bull J., Huelsenbeck J. P., Cunningham C. W., Swofford D. L., Waddell P. J. Partitioning and combining data in phylogenetic analysis. Syst. Biol. (1993) 42:384–397.[Free Full Text]

    Castoe T. A., Doan T. M., Parkinson C. L. Data partitions and complex models in Bayesian analysis: The phylogeny of Gymnophthalmid lizards. Syst. Biol. (2004) 53:448–469.[Abstract/Free Full Text]

    Cunningham C. W. Can three incongruence tests predict when data should be combined? Mol. Biol. Evol. (1997) 14:733–740.[Abstract]

    Cunningham C. W., Zhu H., Hillis D. M. Best-fit maximum likelihood models for phylogenetic inference: Empirical tests with known phylogenies. Evolution (1998) 52:978–987.[CrossRef][Web of Science]

    Danforth B. N., Lin C. P., Fang J. How do insect nuclear ribosomal genes compare to protein-coding genes in phylogenetic utility and nucleotide substitution patterns? Syst. Entomol. (2005) 30:549–562.

    Doolittle R. F., Feng D. F., Tsang S., Cho G., Little E. Determining divergence times of the major kingdoms of living organisms with a protein clock. Science (1996) 271:470–477.[Abstract]

    Dowton M., Austin A. Increased congruence does not necessarily indicate increased phylogenetic accuracy—the behavior of the incongruence length difference test in mixed-model analysis. Syst. Biol. (2002) 51:19–31.[Abstract/Free Full Text]

    Fitch W. M. The estimate of total nucleotide substitutions from pairwise differences is biased. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. (1986) 312:317–324.[Abstract/Free Full Text]

    Fitch W. M., Markowitz E. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem. Genet. (1970) 4:579–593.[CrossRef][Web of Science][Medline]

    Gelman A., Rubin D. B. Inference from iterative simulation using multiple sequences (with discussion). Stat. Sci. (1992) 7:457–511.[CrossRef]

    Gogarten J. P., Olendzenski L., Hilario E., Simon C., Holsinger K. E. Dating the cenancestor of organisms. Science (1996) 274:1750–1751.[CrossRef][Web of Science][Medline]

    Gribaldo S., Casane D., Lopez P., Philippe H. Functional divergence prediction from evolutionary analysis: A case study of vertebrate hemoglobin. Mol. Biol. Evol. (2003) 20:1754–1759.[Abstract/Free Full Text]

    Hasegawa M., Fitch W. M. Dating the cenancester of organisms. Science (1996) 274:1750.[CrossRef][Web of Science][Medline]

    Hasegawa M., Kishino H., Yano T. Dating the human-ape split by a molecular clock by mitochondrial DNA. J. Mol. Evol. (1985) 22:160–174.[CrossRef][Web of Science][Medline]

    Hickson R., Simon C., Cooper A., Spicer G., Sullivan J., Penny D. Conserved sequence motifs, alignment, and secondary structure for the third domain of animal 12S rRNA. Mol. Biol. Evol. (1996) 13:150–169.[Abstract]

    Hudelot C., Gowri-Shankar V., Jow H., Rattray M., Higgs P. G. RNA-based phylogenetic methods: Application to mammalian mitochondrial RNA sequences. Mol. Phylogenet. Evol. (2003) 28:241–252.[CrossRef][Web of Science][Medline]

    Huelsenbeck J. P., Bull J. J. A likelihood ratio test to detect conflicting phylogenetic signal. Syst. Biol. (1996) 45:92–98.[Abstract/Free Full Text]

    Huelsenbeck J. P., Rannala B., Yang Z. Statistical tests of host-parasite cospeciation. Evolution (1997) 51:410–419.[CrossRef][Web of Science]

    Huelsenbeck J. P., Ronquist F. MrBayes: Bayesian inference of phylogeny. Bioinformatics (2001) 17:754–755.[Abstract/Free Full Text]

    Huelsenbeck J. P., Swofford D. L., Cunningham C. W., Bull J. J., Waddell P. J. Is character weighting a panacea for the problem of data heterogeneity in phylogenetic analysis? Syst. Biol. (1994) 43:288–291.

    Ishiguro N. B., Miya M., Inoue J. G., Nishida M. Sundasalanx (Sundasalangidae) is a progenetic clupeiform, not a closely-related group of salangids (Osmeriformes): Mitogenomic evidence. J. Fish Biol. (2005) 67:561–569.[CrossRef][Web of Science]

    Jow H., Hudelot C., Rattray M., Higgs P. G. Bayesian phylogenetics using an RNA substitution model applied to early mammalian evolution. Mol. Biol. Evol. (2002) 19:1591–1601.[Abstract/Free Full Text]

    Kass R. E., Raftery A. E. Bayes factors. J. Am. Stat. Assoc. (1995) 90:773–795.[CrossRef][Web of Science]

    Kishino H., Thorne J. L., Bruno W. J. Performance of a divergence time estimation method under a probabilistic model of rate evolution. Mol. Biol. Evol. (2001) 18:352–361.[Abstract/Free Full Text]

    Kjer K. Aligned 18S and insect phylogeny. Syst. Biol. (2004) 53:506–514.[Abstract/Free Full Text]

    Kuhner M. J., Felsenstein J. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol. (1994) 11:459–468.[Abstract]

    Lartillot N., Philippe H. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. (2004) 21:1095–1109.[Abstract/Free Full Text]

    Lemmon A. R., Moriarty E. C. The importance of proper model assumption in Bayesian phylogenetics. Syst. Biol. (2004) 53:265–277.[Abstract/Free Full Text]

    Leys R., Cooper S. J. B., Schwarz M. P. Molecular phylogeny and historical biogeography of the large carpenter bees, genus Xylocopa (Hymenoptera: Apidae). Biol. J. Linn. Soc. (2002) 77:249–266.[CrossRef][Web of Science]

    Lockhart P. J., Huson D., Maier U., Fraunholz M. J., Peer Y. V. D., Barbrook A. C., Howe C. J., Steel M. A. How molecules evolve in Eubacteria. Mol. Biol. Evol. (2000) 17:835–838.[Free Full Text]

    Lockhart P. J., Larkum A. W. D., Steel M. A., Waddell P. J., Penny D. Evolution of chlorophyll and bacteriochlorophyll: The problem of invariant sites in sequence analysis. Proc. Natl. Acad. Sci. USA (1996) 93:1930–1934.[Abstract/Free Full Text]

    Lockhart P. J., Steel M. A. A tale of two processes. Syst. Biol. (2005) 54:948–951.[Free Full Text]

    Lopez P., Casane D., Philippe H. Heterotachy, an important process of protein evolution. Mol. Biol. Evol. (2002) 19:1–7.[Abstract/Free Full Text]

    Miya M., Satoh T. P., Nishida M. The phylogenetic position of toadfishes (order Batrachoidiformes) in the higher ray-finned fish as inferred from partitioned Bayesian analysis of 102 whole mitochondrial genome sequences. Biol. J. Linn. Soc. (2005) 85:289–306.[CrossRef][Web of Science]

    Miyamoto M., Fitch W. M. Testing species phylogenies and phylogenetic methods with congruence. Syst. Biol. (1995) 44:64–76.[Abstract/Free Full Text]

    Nylander J. A. A., Ronquist F., Huelsenbeck J. P., Nieves-Aldrey J. L. Bayesian phylogenetic analysis of combined data. Syst. Biol. (2004) 53:47–67.[Abstract/Free Full Text]

    Pagel M., Meade A. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst. Biol. (2004) 53:571–581.[Abstract/Free Full Text]

    Peer Y. v. d., Neefs J. M., Rijk P. d., Wachter R. d. Reconstructing evolution from eukaryotic small-ribosomal-subunit RNA sequences: Calibration of the molecular clock. J. Mol. Evol. (1993) 37:221–232.[CrossRef][Web of Science][Medline]

    Philippe H., Zhou T., Brinkmann H., Rodrique N., Delsuc F. Heterotachy and long-branch attraction in phylogenetics. BMC Evol. Biol. (2005) 5:50.[CrossRef][Medline]

    Rambaut A., Drummond A. J. Tracer v1.3 (2003) http://evolve.zoo.ox.ac.uk/Available from.

    Rambaut A., Grassly N. C. Seq-Gen: An application for the Monte-Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. (1997) 13:235–238.[Abstract/Free Full Text]

    Rannala B. Identifiability of parameters in MCMC Bayesian inference of phylogeny. Syst. Biol. (2002) 51:754–760.[Abstract/Free Full Text]

    Ronquist F., Huelsenbeck J. P. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics (2003) 19:1572–1574.[Abstract/Free Full Text]

    Sanderson M. J. Estimating absolute rates of molecular evolution and divergence times: A penalized likelihood approach. Mol. Biol. Evol. (2002) 19:101–109.[Abstract/Free Full Text]

    Schmitz A., Brandley M. C., Maufield P., Vences M., Glaw F., Nussbaum R. A., Reeder T. W. Opening the black box: Phylogenetics and morphological evolution of the Malagasy fossorial lizards of the subfamily "Scincidae". Mol. Phylogenet. Evol. (2005) 34:118–133.[CrossRef][Web of Science][Medline]

    Simon C., Frati F., Beckenbach A., Crespi B., Liu H., Flook P. Evolution, weighting, and phylogenetic utility of mitochondrial gene sequences and a compilation of conserved PCR primers. Ann. Entomol. Soc. Am. (1994) 87:651–701.[Web of Science]

    Simon C., Nigro L., Sullivan J., Holsinger K., Martin A., Grapputo A., Franke A., McIntosh C. Large differences in substitutional pattern and evolutionary rate of 12S ribosomal RNA genes. Mol. Biol. Evol. (1996) 13:923–932.[Abstract]

    Springer M. S., Douzery E. Secondary structure and patterns of evolution among mammalian mitochondrial 12S rRNA molecules. J. Mol. Evol. (1996) 43:357–373.[Web of Science][Medline]

    Sullivan J., Holsinger K. E., Simon C. Among-site rate variation and phylogenetic analysis of 12S rRNA in Sigmodontine rodents. Mol. Biol. Evol. (1995) 12:988–1001.[Abstract]

    Telford M. J., Wise M. J., Gowri-Shankar V. Consideration of RNA secondary structure significantly improves likelihood-based estimates of phylogeny: Examples from the Bilateria. Mol. Biol. Evol. (2005) 22:1129–1136.[Abstract/Free Full Text]

    Thorne J. L., Kishino H. Divergence time and evolutionary rate estimation with multilocus data. Syst. Biol. (2002) 51:689–702.[Abstract/Free Full Text]

    Thorne J. L., Kishino H., Painter I. S. Estimating the rate of evolution of the rate of molecular evolution. Mol. Biol. Evol. (1998) 15:1647–1657.[Abstract]

    Tuffley C., Steel M. A. Modeling the covarion hypothesis of nucleotide substitution. Math. Biosci. (1998) 147:63–91.[CrossRef][Web of Science][Medline]

    Vawter L., Brown W. M. Rates and patterns of base change in the SSU ribosomal RNA gene. Genetics (1993) 134:597–608.[Abstract]

    Welch J. J., Bromham L. Molecular dating when rates vary. Trends Ecol. Evol. (2005) 20:320–327.[CrossRef][Medline]

    Welch J. J., Fontanillas E., Bromham L. Molecular dates for the "Cambrian Explosion": The influence of prior assumptions. Syst. Biol. (2005) 54:672–678.[Free Full Text]

    Wheeler W. C., Honeycutt R. L. Paired sequence difference in ribosomal RNAs: Evolutionary and phylogenetic implications. Mol. Biol. Evol. (1988) 5:90–96.[Abstract]

    Yang Z. Estimating the pattern of nucleotide substitution. J. Mol. Evol. (1994a) 39:105–111.[Web of Science][Medline]

    Yang Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods. J. Mol. Evol. (1994b) 39:306–314.[CrossRef][Web of Science][Medline]

    Yang Z. Bayesian inference in molecular phylogenetics. In: Mathematics of evolution and phylogeny—Gascuel O., ed. (2005) Oxford: Oxford University Press. 63–90.

    Yang Z. Maximum likelihood models for combined analysis of multiple sequence data. J. Mol. Evol. (1996) 42:587–596.[CrossRef][Web of Science][Medline]

    Yang Z., Yoder A. D. Comparison of likelihood and Bayesian methods for estimating divergence times using multiple gene loci and calibration points, with application to a radiation of cute-looking mouse lemur species. Syst. Biol. (2003) 52:705–716.[Abstract/Free Full Text]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Syst BiolHome page
D. C. Marshall
Cryptic Failure of Partitioned Bayesian Phylogenetic Analyses: Lost in the Land of Long Trees
Syst Biol, November 17, 2009; (2009) syp080v1.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
T. E. Roberts, E. J. Sargis, and L. E. Olson
Networks, Trees, and Treeshrews: Assessing Support and Identifying Conflict with Multiple Loci and a Problematic Root
Syst Biol, June 16, 2009; (2009) syp025v3.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
P. Q. Spinks and H. B. Shaffer
Conflicting Mitochondrial and Nuclear Phylogenies for the Widely Disjunct Emys (Testudines: Emydidae) Species Complex, and What They Tell Us about Biogeography and Hybridization
Syst Biol, May 28, 2009; (2009) syp005v1.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
C. C. Steiner, H. Rompler, L. M. Boettger, T. Schoneberg, and H. E. Hoekstra
The Genetic Basis of Phenotypic Convergence in Beach Mice: Similar Pigment Patterns but Different Genes
Mol. Biol. Evol., January 1, 2009; 26(1): 35 - 45.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
C. R. Linnen and B. D. Farrell
Comparison of Methods for Species-Tree Inference in the Sawfly Genus Neodiprion (Hymenoptera: Diprionidae)
Syst Biol, December 1, 2008; 57(6): 876 - 890.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
M. Dohrmann, D. Janussen, J. Reitner, A. G. Collins, and G. Worheide
Phylogeny and Evolution of Glass Sponges (Porifera, Hexactinellida)
Syst Biol, June 1, 2008; 57(3): 388 - 405.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
J. A. Clarke and K. M. Middleton
Mosaicism, Modules, and the Evolution of Birds: Results from a Bayesian Approach to the Study of Morphological Evolution Using Discrete Character Data
Syst Biol, April 1, 2008; 57(2): 185 - 201.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
J. A. McGuire, C. C. Witt, D. L. Altshuler, and J. V. Remsen
Phylogenetic Systematics and Biogeography of Hummingbirds: Bayesian and Maximum Likelihood Analyses of Partitioned Data and Selection of an Appropriate Partitioning Strategy
Syst Biol, October 1, 2007; 56(5): 837 - 856.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
J. M. Brown and A. R. Lemmon
The Importance of Data Partitioning and the Utility of Bayes Factors in Bayesian Phylogenetics
Syst Biol, August 1, 2007; 56(4): 643 - 655.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Extract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (15)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Marshall, D. C.
Right arrow Articles by Buckley, T. R.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Marshall, D. C.
Right arrow Articles by Buckley, T. R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?