© 2008 Society of Systematic Biologists
A Bayesian Perspective on a Non-parsimonious Parsimony Model
Edited by Marc Suchard
1 Department of Integrative Biology, University of California Berkeley 3060 VLSB No. 3140, Berkeley, CA 94720-3140, USA; E-mail: johnh{at}berkeley.edu
2 Department of Botany, University of Wisconsin 430 Lincoln Drive, Madison, WI 53706, USA
3 Department of Statistics, University of Wisconsin 1300 University Avenue, Madison, WI 53706, USA
4 Swedish Museum of Natural History Box 50007, SE-104 05 Stockholm, Sweden
| Abstract |
|---|
|
|
|---|
Several stochastic models of character change, when implemented in a maximum likelihood framework, are known to give a correspondence between the maximum parsimony method and the method of maximum likelihood. One such model has an independently estimated branch-length parameter for each site and each branch of the phylogenetic tree. This model—the no-common-mechanism model—has many parameters, and, in fact, the number of parameters increases as fast as the alignment is extended. We take a Bayesian approach to the no-common-mechanism model and place independent gamma prior probability distributions on the branch-length parameters. We are able to analytically integrate over the branch lengths, and this allowed us to implement an efficient Markov chain Monte Carlo method for exploring the space of phylogenetic trees. We were able to reliably estimate the posterior probabilities of clades for phylogenetic trees of up to 500 sequences. However, the Bayesian approach to the problem, at least as implemented here with an independent prior on the length of each branch, does not tame the behavior of the branch-length parameters. The integrated likelihood appears to be a simple rescaling of the parsimony score for a tree, and the marginal posterior probability distribution of the length of a branch is dependent upon how the maximum parsimony method reconstructs the characters at the interior nodes of the tree. The method we describe, however, is of potential importance in the analysis of morphological character data and also for improving the behavior of Markov chain Monte Carlo methods implemented for models in which sites share a common branch-length parameter.
Keywords: Bayesian phylogenetic inference; Markov chain Monte Carlo; maximum likelihood; parsimony model
Received May 11, 2007; Revised August 8, 2007; Accepted January 14, 2008
Two different approaches have been taken to justify the parsimony method for inferring phylogenetic trees. The first strategy relies on philosophical arguments. Over the past three decades, various authors have argued that the parsimony method fits the hypothetico-deductive framework of scientific reasoning (Wiley, 1975; Gaffney, 1979; Eldredge and Cracraft, 1980; Wiley, 1981; Farris, 1983), Popper's theory of corroboration (Popper, 1959; Siddall and Kluge, 1997; Kluge, 1997, 1998), or is justified is justified on the basis of the parsimony principle alone (i.e., the logical parsimony school for the justification of parsimony; Beatty and Fink, 1979; Kluge and Wolf, 1993). The second strategy for justifying the parsimony method relies on the idea that good methods of phylogenetic inference are statistical ones. The idea, then, is to find situations in which the parsimony method corresponds to an existing and well-justified method of statistical inference. Farris (1973) was the first to find a stochastic model of evolution that, when implemented in a maximum likelihood framework, corresponded to the parsimony method of phylogenetic inference. In his result, Farris treats the ancestral configuration of states on the phylogenetic tree as parameters and jointly estimates the tree and the ancestral states using maximum likelihood. Importantly, the number of parameters to be estimated increases as fast as new data are added to the problem; the extension of an alignment by one site, for example, adds n–2 additional parameters to be estimated (where n is the number of taxa in the analysis). Goldman (1990) derived a similar result, once again relying on the idea that the ancestral states are jointly estimated with the tree. He points out that "... parsimony analyses rest on a maximum likelihoood justification, but lay themselves open to the possibility of statistical inconsistency by estimating random variables as though they were (incidental) parameters" (Goldman, 1990–356).
Tuffley and Steel (1997) described another case in which the methods of maximum likelihood and parsimony correspond. Importantly, their result does not depend upon estimating ancestral states. Instead, the ancestral states on a tree are considered random variables, and the likelihood involves a sum over all possible assignments of states to the ancestral nodes on the tree. Accounting for uncertainty in the ancestral states of a phylogenetic tree by summing over all possibilities is the standard procedure in the field and has the advantage that inferences are not conditioned on any particular configuration of character histories. Although Tuffley and Steel (1997) integrate out the ancestral states on the tree, their model has the property that the number of parameters increases with the number of observations, just as is the case with the Farris (1973) and Goldman (1990) models. Tuffley and Steel (1997) assume separate independently estimated branch lengths for each site and each branch. Hence, the extension of the alignment by one site adds 2n–3 parameters to estimate. The Tuffley and Steel (1997) model has been referred to as the "no-common-mechanism" model (Tuffley and Steel, 1997; Felsenstein, 2004). This is probably a better terminology than the alternative, calling it "the parsimony model." For one, the Tuffley and Steel (1997) model is just one of several that gives a correspondence between the parsimony and maximum likelihood methods. Moreover, the details of the model reveal that its assumptions are anything but parsimonious. In fact, a more parsimonious model has a common branch-length parameter for all of the sites and is the default option for many programs that implement the maximum likelihood method of phylogenetic inference.
As formulated by Tuffley and Steel (1997), the no-common-mechanism model assumes a common phylogenetic tree for all of the observations (i.e., the tree relating the species is treated as a structural parameter) but introduces 2n–3 incidental parameters for each observation. Here, an observation is a single site (column) in an alignment. Structural parameters are parameters that appear in the probability distribution of all of the observations, whereas an incidental parameter appears in the probability of only a subset of the observations (Neyman and Scott, 1948; Goldman, 1990). Typically, in a phylogenetic analysis only the topology of the tree is of interest to the biologist and the other parameters, whether structural or incidental, are of only passing interest and are considered nuisance parameters. The approach widely used in maximum likelihood inference of phylogeny is to assume that the nuisance parameters take their maximum likelihood values. Of course, if one takes the approach of maximizing the likelihood with respect to the nuisance parameters—producing what is called the "profile likelihood"—one obtains the correspondence between maximum likelihood implemented with the no-common-mechanism model and the parsimony method. An alternative approach is to integrate the nuisance parameters over a suitable prior probability distribution for the parameters. This is the integrated likelihood approach and has a number of advantages over the profile likelihood, at least when it can be implemented (Berger et al., 1999). For example, the profile likelihood can be misleading when the likelihood surface has a sharp ridge and inferences based on the integrated likelihood account for uncertainty in the nuisance parameters. The integrated likelihood approach is often used in maximum likelihood inference of phylogeny to model rate variation across sites. The rate of substitution at a site—an incidental nuisance parameter of the model—is sometimes assumed to be a random variable drawn from a mean-one gamma distribution (Yang, 1993). The rate of substitution, then, is integrated over a gamma prior and inferences do not depend upon the rate at a site taking any particular value. The integrated likelihood approach is also the standard one in a Bayesian analysis, where all of the parameters of the statistical model are treated as random variables with a prior probability distribution.
In this paper, we consider a Bayesian treatment of the no-common-mechanism model of Tuffley and Steel (1997). Bayesian analysis is often useful in parameter-rich models where the introduction of a well-chosen prior can sometimes tame the behavior of the parameter estimates. The no-common-mechanism model of Tuffley and Steel (1997) illustrates the problem that parameter rich models can have when applied to small data sets; the maximum likelihood estimates of the branch lengths are either zero or infinity. A Bayesian approach to the problem places a prior probability distribution on the branch-length parameters and may result in more reliable inferences. We explore this possibility for the parameter-rich model of Tuffley and Steel (1997). We also examine the behavior of the no-common-mechanism model for large trees and ask how much information is contained in the data on the length of a single branch at one site by comparing the posterior probability of the branch lengths to the prior probability distribution.
| Methods |
|---|
|
|
|---|
Likelihood
We assume that an alignment of DNA sequences is available. The alignment contains n sequences, each of which is c nucleotides in length. For example, the following is an alignment of n = 5 sequences that are c = 10 sites in length:
- Species 1 CACAGTTACC
- Species 2 CGCAGTTACC
- Species 3 CGCAGTTATC
- Species 4 CGTGCTTATC
- Species 5 CGCACTTATC
- Species 2 CGCAGTTACC
The nucleotide for the ith species and jth site is denoted xij, where i
(1, ..., n) and j
(1, ..., c). The information at the jth column (site) in the alignment is denoted xj. For example, the information at the third site in the example alignment above is x3 = (C,C,C,T,C)T. The entire alignment is denoted X = (x1, x2, ..., xc).
We assume that the sequences are related to one another through an unrooted tree. The ith tree is denoted
i, and trees are labeled
1,
2, ...,
B(n), where B(n) is the number of possible trees with n tips (B(n) = (2n–5)!! for unrooted trees). We label the tip nodes of the tree 1, 2, ..., n and the interior nodes of the tree are labeled n+1, n+2, ..., 2n–2. Node k has branch k and ancestor
(k). Under the no-common-mechanism model of Tuffley and Steel (1997), each site and each branch has its own length parameter. The length of the branch is in terms of expected number of substitutions per site. Because there are 2n–3 branches, there are a total of (2n–3) x c branch-length parameters in the no-common-mechanism model. Branch k and site j of the tree has length vkj. Figure 1 shows an example tree for n = 5 species. Note that the tree is drawn such that it is rooted at node 2n–2.
|
Tuffley and Steel (1997), assume that nucleotide substitutions occur under the continuous-time Markov model first described by Jukes and Cantor (1969). The Jukes and Cantor (1969) model assumes that all substitution types have equal rates of change. The instantaneous rates of change for the model are contained in the rate matrix Q:
|
|
The probability of observing the data at the jth site is a sum over all combinations of states at the interior nodes of the tree:
|
|
In this paper, we wish to integrate out branch lengths by considering all possible combinations of branch lengths, with each such combination weighted by its probability under some prior model. We assume that the branch lengths are independent gamma-distributed random variables. The gamma distribution has probability density
|
|
and
are the shape and scale parameters, respectively. The gamma probability distribution has mean E(x) =
/
and variance Var(x) =
/
2. The probability of observing the data at the jth site is then a multidimensional integral over all possible combinations of branch lengths, as well as a summation over all possible combinations of nucleotide assignments to the interior nodes of the tree: |
|
There are a number of simplifications that make calculating the likelihood for a site a practical endeavor. First, we use the Felsenstein (1981) pruning algorithm to perform the summation over ancestral states. Second, we are able to take advantage of the fact that there are independent branch-length parameters for each site and branch under the Tuffley and Steel (1997) model and integrate over the branch lengths analytically. If the branch length, v, has a gamma-distribution prior, then the probability transition matrix satisfies the integral equation
|
|
For a diagonalizable matrix A = U D U–1, where D is a diagonal matrix, we define matrix exponentiation as Aβ = U DβU–1. Powers of the diagonal matrix are defined by raising the diagonal elements to the power β. With this definition in hand, the integrated transition probabilities have analytical solution
|
|
|
|
Note that Bayesian inference under the Jukes and Cantor (1969) model is equivalent to calculating the likelihood under a model in which a common branch length is applied to all of the sites. Note also that integrating out branch lengths is not original to this paper: Suchard et al. (2002, 2003) and Sinsheimer et al. (2003) first developed the idea in the course of investigating models that allow the phylogeny to change along the sequence according to a multiple change-point model. Similarly, Goloboff (2003) explored the use of maximum likelihood estimation of phylogeny while integrating branch lengths over a uniform prior probability distribution.
We assume that substitutions are independent across sites. The probability of observing the entire alignment, then, is a product of the site likelihoods:
|
|
Bayesian Analysis of Phylogeny
In a Bayesian statistical analysis of phylogeny, inferences are based upon the posterior probability distribution of trees. For our implementation of the no-common-mechanism model of Tuffley and Steel (1997), the posterior probability of the ith tree is
|
|
We assume that the trees have equal prior probability; because there are B(n) possible unrooted trees, the prior probability of any tree is 1/B(n).
Markov Chain Monte Carlo
Most phylogenetic models used in likelihood-based phylogenetic analysis involve many fewer free parameters than the Tuffley and Steel (1997) parsimony model. However, analysis under these simpler models is difficult. For example, typically the branch-length parameters are shared among sites in the alignment. In a Bayesian analysis of phylogeny, this means that an implementation of the Markov chain Monte Carlo (MCMC) algorithm for approximating posterior probabilities of trees must change branch lengths as well as topology when exploring the space of phylogenetic trees; one cannot integrate over branch lengths when the branch lengths are shared among sites, and one must instead resort to a numerical method, such as MCMC, to perform the integration. Interestingly, even though the parsimony model involves many more parameters, the fact that this model has independent branch-length parameters for each site and branch means that the branch lengths can be analytically integrated (see above). This means that any MCMC implementation that approximates the posterior probability of trees does not need to change branch lengths. The MCMC implementation can work directly on the topology of the phylogenetic tree, significantly simplifying the MCMC.
We took advantage of this simplification when implementing MCMC proposal mechanisms to explore the space of phylogenetic trees. MCMC is a method for approximating high-dimensional integrals and/or summations. One constructs a Markov chain that has as its state space the parameters of interest and a stationary distribution that, for a Bayesian statistical analysis, is the posterior probability distribution of the parameters. Samples of the states of the Markov chain while at stationarity are valid (though dependent) samples from the posterior probability distribution of the parameters (Tierney, 1996). We implemented the Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970) using a formalism described by Green (2003) to construct the proposal mechanisms. In this study, the states of the Markov chain are tree topologies, and the current state is designated
. (Note that
and
are considered fixed for any particular MCMC analysis. The Markov chain was initialized with a randomly chosen tree.) The MCMC algorithm works by repeatedly proposing a new state and then either accepting or rejecting that state as the next state of the Markov chain. The proposal mechanism involves the generation of random numbers u drawn from the probability distribution g(u). The proposed state is a deterministic function of the random numbers and the original state:
' = h(
, u). The reverse move from
' to
is imagined through another set of random numbers u' drawn from the probability distribution g'(u'). The tree proposed in the reverse move is determined as
= h(
', u'). Note that the reverse move is never made in computer memory, but the probabilities calculated for the imagined reverse move are required to calculate the acceptance probability. The probability of accepting the proposed tree
' as the next state of the Markov chain is
|
|
The last factor is called the Jacobian of the transform to
' and u' with respect to
and u.
We developed three proposal mechanisms for changing trees: stochastic NNI, a Gibbs-like TBR move, and a Gibbs-like move involving erasure of part of the tree.
NNI
We developed a simple nearest neighbor interchange (NNI) proposal mechanism. An internal branch is chosen at random. This internal branch defines a four-taxon tree ((S1,S2),S3,S4), with Si denoting the subtree that extends from the ith branch incident to the randomly chosen internal branch. With equal probability, either tree ((S1,S3),S2,S4) or ((S1,S4),S2,S3) will be chosen as the proposed tree.
Gibbs TBR
Taxon-bisection and reconnection (TBR) is a tree perturbation that is traditionally used to explore the space of trees for heuristic searches, where the goal is to find the best tree(s) under some optimality criterion. Here, we implemented a stochastic version of TBR that works as follows: First, a branch is randomly chosen and erased from the tree, dividing the tree into two unconnected subtrees. One subtree has N1 branches and the other subtree has N2 branches. Second, the subtrees are reconnected in all possible ways by drawing a branch from one of the N1 branches in the first subtree to one of the N2 branches of the second subtree. There are a total of N1 x N2 ways to reconnect the two subtrees to form a tree that contains all n of the species. The reconnection possibilities can be restricted to all of those reconnection placements that are within some distance of the branch that was originally erased, an idea that has been implemented in PAUP* for heuristic searches (Swofford, 1998). The reconnection limit allows the search to progress faster, by restricting changes to more local parts of the tree. The likelihood is calculated for each of the possible reconnection patterns. The likelihood for the ith possible way to reconnect the subtrees is denoted
i. Third, the likelihoods are normalized, such that the sum of the N1 x N2 likelihoods is one. For example, imagine that the tree was bisected in such a way that there were 2 species in one of the subtrees (with N1 = 1 branch) and 5 species in the second subtree (with N2 = 7 branches). There are a total of 7 possible ways to reconnect the two subtrees. If the log likelihoods for each of the possible reconnection possibilities are –101.6, –96.0, –111.9, –105.9, –102.3, –102.1, and –108.8, then the normalized probabilities are
where pi is the normalized likelihood for reconnection possibility i. Fourth, one of the possible reconnection possibilities is chosen at random using the normalized probabilities calculated in the third step. In the example above, the second reconnection possibility is the most likely to be chosen, because it has a probability of 0.992. However, there is a small chance (in this example, at least) that one of the other six reconnection possibilities will be chosen.
|
Gibbs eraser
Our "eraser" move proposes a new tree by erasing a portion of the tree, leaving m subtrees, and is identical to the Symmetric Neighborhood Alteration to Phylogenies (SNAP) tree perturbation described by Whelan (2007). Figure 2 provides an example of a tree in which a rather large portion of the tree is erased, leaving a total of m = 8 subtrees. All B(m) possible resolutions of the subtrees into fully resolved trees containing n species are tried, with the likelihood (
) calculated for each possibility. As with the Gibbs TBR move, a resolution of the erased tree into a fully resolved tree is chosen in proportion to the normalized likelihoods. For the tree depicted in Figure 2, there are a total of B(8) = 10,395 possible resolutions of the erased portion of the tree. The likelihood is calculated for each of the 10,395 resolutions, and one is chosen in proportion to its likelihood.
|
Data Analysis
We analyzed five data sets: (1) an alignment of the β-globin gene for 17 vertebrates (s = 17, c = 432; Yang et al., 2000); (2) an alignment of the ITS gene for 140 species of Astragalus (s = 140, c = 686; Sanderson and Wojciechowski, 2000); (3) an alignment of the plastid rbcL gene for 357 angiosperm species (s = 357, c = 1497; Savolainen et al., 2000); (4) an alignment of the plastid atpB gene for 357 angiosperm species (s = 357, c = 1428; Savolainen et al., 2000); and (5) an alignment of rbcL genes for 500 plant species (s = 500, c = 759; Chase et al., 1993; Stamatakis et al., 2005). The large alignment of 500 rbcL gene sequences was obtained from http://icwww.epfl.ch/~stamatak/index-Dateien/Page443.htm, which includes several large alignments. The file we analyzed here is labeled 500_ZILLA in that set of alignments. The alignment we analyzed differs from the original Chase et al. (1993) paper in having only c = 759 sites instead of c = 1428 sites Stamatakis et al., 2005).
We approximated the posterior probability of trees using MCMC implemented with eight different proposal strategies: stochastic NNI; Gibbs eraser, erasing a portion of the tree and leaving four, five, or six subtrees to reconnect; and Gibbs TBR implemented with a reconnection limit of 5, 10, 20, or
(i.e., no reconnection limit). All Markov chains were run for a total of one million cycles. Chains were sampled every 100 cycles, and inferences were based on samples taken after generation 200,000. Each analysis was repeated, and the samples from the two chains compared to determine if they had converged on the same probability distribution of trees. We assumed that the (2n–3)x c branch length parameters followed independent exponential prior probability distributions, with parameter 10 (i.e.,
= 1,
= 10). The prior mean of the branch length was 0.1.
| Results and Discussion |
|---|
|
|
|---|
The inferences of phylogeny made in independent MCMC analyses were consistent for some, but not all, of the proposal mechanisms we investigated. Figure 3 shows the correlation of the posterior probabilities of the same clades approximated from two different MCMC analyses, each of which started with a different random starting tree. The analyses of the vertebrate β-globin alignment were consistent regardless of the proposal mechanism used; the posterior probability of a particular clade found in the two independent MCMC analyses were very similar. However, this was not the case for the larger alignments we investigated. For the stochastic NNI and Gibbs-like eraser moves, we often obtained inconsistent results in the MCMC analyses of the large data sets. Often, a clade would be found with high probability in one MCMC analysis but with low probability in the other. This is unambiguous evidence that the MCMC analysis has failed. Importantly, however, we were able to achieve consistent results for the Gibbs-like TBR proposal mechanism. When the reconnection limit was set to 20 or greater, the MCMC analyses were consistent for the larger data sets we examined. For some of the data sets (e.g., the Angiosperm rbcL and atpB alignments and the large green plant alignment), TBR failed when the area of rearrangement was small, as was the case when the reconnection limit was set to 5 or 10.
|
For the larger alignments, it is important that a proposal mechanism makes potentially large changes in the tree to ensure adequate mixing of the MCMC algorithm. There is a clear relationship between how locally the proposal mechanism acts to change the tree and the ability of the MCMC algorithm to adequately explore the space of trees. This is true regardless of the efficiency of the proposal mechanism. For example, both the stochastic NNI and Gibbs-like eraser (leaving four subtrees) moves are equivalent in the size of changes they make to the tree. Both act on a local area of the tree, defining four subtrees from a particular interior branch, and both move to one of the resolutions of the four subtrees into a fully resolved tree containing all of the species. However, the eraser move (leaving four subtrees) is a much more efficient move, because it proposes trees in proportion to their probability, and not blindly, as is the case for the stochastic NNI move we implemented. Yet, neither the stochastic NNI nor the Gibbs-like eraser move (leaving four subtrees) were particularly effective in exploring the space of phylogenetic trees for the larger alignments. Moreover, for the eraser move, increasing the area that was erased did not appear to help the MCMC algorithm to reliably converge on the probability distribution of trees.
We constructed majority-rule consensus trees that summarize the samples taken during the MCMC analysis for each of the alignments. Figure 4 shows the majority-rule consensus tree for the β-globin alignment. (The interested reader can see the majority-rule consensus trees for the analyses we performed of the other alignments in the supplemental material to this article, http://www.systematicbiology.org. These trees included many taxa, and it was not sensible to show them in their entirety here.) The majority rule consensus tree of the β-globin analysis under the no-common-mechanism model was similar to the results of the maximum parsimony analysis for that data set, which resulted in two equally parsimonious trees each of which required a minimum of 757 character-state transformations (Fig. 4). These analyses were also similar to the majority-rule consensus tree that results from a Bayesian analysis of the β-globin alignment under the Jukes and Cantor (1969) model (assuming a common branch-length parameter for all sites, resulting in 2n–3 free parameters; Figure 4c). The maximum parsimony analysis and the MCMC analyses under the no-common-mechanism model and under the Jukes and Cantor (1969) model with a common set of branch lengths for all sites result in a nonmonophyletic Mammalia. However, the Bayesian analysis under the GTR+
model of DNA substitution Tavaré, 1986; Yang, 1993) results in a monophyletic Mammalia (this model assumes a common branch-length parameter for all sites; Figure 4d). The Bayesian tree under the GTR+
model is also the only one that unites artiodactyles.
|
There is a linear relationship between the parsimony score and the log likelihood. Figure 5 shows the relationship between the parsimony score and the log likelihood under the no-common-mechanism model for the trees sampled using the stochastic NNI proposal mechanism. Each plot shows the relationship for the 20,000 trees sampled during the course of the two MCMC analyses performed on each data set. (See Appendix for an explanation of the near-linear relationship between the parsimony score and the integrated likelihood.) Given the linear relationship between the integrated likelihood for a tree and the parsimony score, it is hardly surprising that the trees sampled by the MCMC algorithm are similar to the maximum parsimony trees. However, one should not consider the MCMC sampling methods we describe in this paper as a substitute for maximum parsimony. For large data sets, one may never sample the most-parsimonious tree using MCMC. The strategy we outline here is more similar to that described by Farris et al. (1996) in which a resampling method is coupled with quick (stepwise addition) tree searches to generate a sample of trees, none of which may be the most parsimonious. The main difference between the MCMC implementation of the no-common-mechanism model and the strategy outlined by Farris et al. (1996) is that the MCMC method generates samples from the posterior probability distribution of phylogenetic trees.
|
We examined the marginal posterior probability density for branch lengths for three sites in the β-globin alignment: 1, 5, and 66. We calculated the branch-length probability distributions on the tree shown in Figure 6, which is one of the two most-parsimonious trees. The sites differed in the pattern of nucleotides assigned to the tips:
|
|
|
Note that site 1 has two states (C and T) and requires two changes on the tree shown in Figure 6. Site 5 requires no changes, and site 66 requires a minimum of four changes. Moreover, the assignment of nucleotides to ancestral nodes of the tree is ambiguous for site 66. Figure 7 shows the marginal posterior probability density of the branch lengths (f(vkj | xj,
,
,
), where j = (1,5,66)) for the three sites. The main points to note are (1) the probability density of the branch lengths closely follows the prior when no change is reconstructed along the branch by the parsimony method; (2) that the probability density of the branch length has a well-defined mode, with little probability density for small branch lengths, when a change is unambiguously reconstructed along a branch; and (3) that the probability density is intermediate in shape when the changes are ambiguously reconstructed along the branch.
|
The Bayesian implementation of the no-common-mechanism model performs well when the assumptions of the method are satisfied (i.e., the process generating the observations matches the assumptions of the method). Figure 8 shows the relationship between the posterior probability of a clade and the probability that the clade is correct for simulated data. The simulations were performed using the protocol described in Huelsenbeck and Rannala (2004); parameters were picked from the prior probability distribution, and then sequences were simulated on the tree under the Jukes and Cantor (1969) model of DNA substitution. In this case, a four-taxon tree was first picked from the prior probability distribution of trees (i.e., a tree was picked at random) and a length was picked from the branch-length prior for each branch and site. Here, branch lengths were assumed to be exponentially distributed with parameter 10. Once the tree and branch lengths were chosen, sequences 25 sites in length were simulated along the tree under the Jukes and Cantor (1969) model of substitution. The simulated alignment was then analyzed under the no-common-mechanism model. The procedure was repeated 10,000 times to produce the results shown in Figure 8. Bayes' theorem ensures that the relationship between the posterior probability of a tree and the probability that the tree is correct is linear. In this sense, the results shown in Figure 8 are reassuring in that they suggest the our implementation of the MCMC algorithm for the no-common-mechanism model is correct.
|
The results depicted in Figure 8 should not be taken as evidence that a Bayesian implementation of the no-common-mechanism model ensures that the estimated tree is accurate. The Bayesian implementation of the no-common-mechanism model is susceptible to long-branch attraction, just as is the maximum parsimony method (Felsenstein, 1978). Figure 9 shows the probability of a correct estimate of phylogeny for the four-taxon case. Sequences were simulated assuming a common branch-length parameter for all of the sites in the alignment. Two of the opposing branches (those marked y in Figure 9) were potentially longer than the remaining three branches (marked x). Data were simulated such that the tree length (sum of the five branch lengths) was 1/2. The tree with the maximum posterior probability (the MAP tree) was taken as the best estimate of phylogeny. Note that when the branches marked y were 10 or 20 times longer than the other branches on the phylogeny, the probability of correctly inferring phylogeny decreased toward zero. We did not formally check that the Bayesian implementation of the no-common-mechanism model was inconsistent for these two cases by substituting expected site pattern frequencies of all 256 site patterns for the data, but the results suggest that, at least for the case in which y = 20x, the method is statistically inconsistent.
|
We performed an additional analysis to compare the region of statistical consistency/inconsistency for the maximum likelihood and Bayesian implementations of the no-common-mechanism model. The parameter space is the same as that explored by Felsenstein (1978) and later by Huelsenbeck and Hillis (1993). Branches on the four-species tree were constrained such that the interior branch and two opposing peripheral branches were the same taking one length, whereas the remaining two opposing peripheral branches of the tree took a potentially different length (i.e., the same constraint on branch lengths imposed in Fig. 9 was used). The pattern probabilities were calculated under the Jukes and Cantor (1969) model, and all of the sites were assumed to share a common set of branch lengths. Figure 10 shows the results of the analysis. The zone of consistency was largest for the maximum likelihood implementation of the no-common-mechanism model. The four Bayesian implementations of the no-common-mechanism model were slightly smaller; the zone of consistency was largest when the prior mean of the branch length was small.
|
The no-common-mechanism model is very peculiar, and the authors have mixed feelings about having implemented the method in a Bayesian framework. (That said, the authors are willing to act as enablers to biologists interested in performing Bayesian analysis under the no-common-mechanism model by providing the computer code used in this study. The interested reader should contact the lead author to obtain the code.) The introduction of a prior probability distribution on the many branch-length parameters contained in the model clearly did not tame the behavior of the method. The method inherits the disturbing statistical behavior of the maximum parsimony method Felsenstein, 1978), and, in fact, the integrated likelihood for a tree appears to be a simple rescaling of the parsimony score. Other probability distributions might act as better priors for the branch-length parameters. A probability model allowing some degree of covariation in the lengths of branches might better capture the fact that different sites in an alignment of DNA sequences share a common history, with a common set of branching times, and at least similar rates of substitution; this results in different sites having correlated branch lengths. Many of the models commonly implemented in maximum likelihood, Bayesian, and distance-based phylogenetic analyses already allow for quite a bit of heterogeneity in the substitution process across sites and yet add only a moderate number of additional parameters to the phylogenetic model.
The parameters of the gamma prior play an inordinately strong role in determining the probabilities of trees. Note that the slope of the relationship between the parsimony score and the integrated likelihood becomes more nearly zero as
increases (see Appendix). Essentially, when
is large, a lot of prior probability is placed on long branches. When branch lengths are long (say when the branch length v > 1), the transition probabilities are near to the stationary values, and all trees have similar likelihoods.
There are two areas where we feel that the no-common-mechanism model may be useful. The first is to model character change for morphological data. Lewis (2001) described how one could infer phylogeny using morphological characters using a k-state continuous-time Markov chain. Morphological characters are treated just like molecular characters in the analysis, except the probability of a morphological character is conditioned on being variant. (Characters that are invariant are rarely sampled by morphological systematists, and this sampling scheme needs to be accounted for when calculating likelihoods.) The no-common-mechanism model provides another way to include morphological characters in a likelihood-based phylogenetic analysis.
The no-common-mechanism model might also be used to improve the exploration of tree space by MCMC for models in which the branch-length parameters are shared for many sites. One could argue that the Gibbs-like eraser and TBR moves were implemented very efficiently in terms of how they choose proposed trees. Because topology moves are accepted in proportion to the probability of the data, a proposal mechanism, such as the Gibbs-like TBR move, is able to quickly explore areas of the tree space with high likelihood, effectively ignoring topology moves that result in trees with low likelihoods. This makes MCMC analysis under the no-common-mechanism model very efficient, compared to the proposal mechanisms implemented for models in which branch-length parameters are shared across sites in the alignment. It should be possible, however, to implement the proposal mechanisms described here for a model with a common set of branch length parameters. For example, one could propose topology moves under the no-common-mechanism model but accept or reject the proposed topology under the (simpler) model of interest. Hence, one may be able to achieve the benefits of enhanced exploration of tree space afforded under the no-common-mechanism model for a model with a more reasonable number of free parameters.
| APPENDIX |
|---|
|
|
|---|
Figure 5 shows an apparent near-linear relationship between the integrated log likelihood and the parsimony score for trees sampled from the posterior distribution for each of the five data sets. This is not surprising as there is an exact linear relationship between the maximum likelihood value and parsimony score for trees under the no-common-mechanism model (Tuffley and Steel, 1997). We can explain the near linear relationship for the Bayesian no-common-mechanism model. We define a0 and a1 to be the natural logarithms of the integrated transition probabilities, namely
|
|
|
|
is a sum over all possible internal node reconstructions yj where each term of the sum has the form |
|
/
is small, a1 will be much smaller than a0 and the integrated likelihood sum will be dominated by the term with the most parsimonious reconstructions yj. The integrated log likelihood for the jth site will then be equal to |
|
) is the parsimony score for the jth site for tree
and
j(
) > 0 is the error from approximating the logarithm of a sum with the logarithm of its largest single term. Note that
j(
) will be largest for sites with multiple most parsimonious reconstructions yj. Summing this expression over the c sites shows that the integrated log likelihood of tree
equals |
|
) is the parsimony score for tree
. Provided that the sum in this expression varies little with
, there will be a nearly linear relationship between the integrated likelihood and the parsimony score. In fact, the slope of the graph relating the parsimony score and the integrated likelihood is a1 – a0, or about –log e (3
/
). In each example in this paper where
= 1 and
= 10, the slope is approximately –3.434. | Acknowledgment |
|---|
|
|
|---|
We thank Brent Mishler for background information on plant phylogeny that was useful for interpreting our results. This manuscript was improved by comments by Mike Steel, Mark Holder, and two anonymous reviewers. This research was supported by grants from the NSF (DEB-0445453) and NIH (GM-069801) awarded to J.P.H.
| References |
|---|
|
|
|---|
-
Beatty J., Fink W. L. Review of "Simplicity" by E. Sober. Syst. Zool (1979) 28:643–651.[CrossRef]
Berger J. O., Liseo B., Wolpert R. L. Integrated likelihood methods for eliminating nuisance parameters. Stat. Sci (1999) 14:1–28.[Web of Science]
Chase M. W., Soltis D. E., Olmstead R. G., Morgan D., Les D. H., Mishler B. D., Duvall M. R., Price R. A., Hills H. G., Qiu Y., Kron K. A., Rettig J. H., Conti E., Palmer J. D., Manhart J. R., Sytsma K. J., Michaels H. J., Kress W. J., Karol K. G., Clark W. D., Hedren M., Gaut B. S., Jansen R. K., Kim K., Wimpee C. F., Smith J. F., Furnier G. R., Strauss S. H., Xiang Q., Plunkett G. M., Soltis P. S., Swensen S. M., Williams S. E., Gadek P. A., Quinn C. J., Eguiarte L. E., Golenberg E., Learn G. H., Graham S. W., Barrett S. C. H., Dayanandan S., Albert V. A. Phylogenetics of seed plants: An analysis of nucleotide sequences from the plastid gene rbcl. Ann. Miss. Bot. G (1993) 80:528–580.[CrossRef]
Eldredge N., Cracraft J. Phylogenetic patterns and the evolutionary process (1980) New York: Columbia University Press.
Farris J. S. A probability model for inferring evolutionary trees. Syst. Zool (1973) 22:250–256.[Abstract]
Farris J. S. The logical basis of phylogenetic analysis. In: Advances in cladistics, volume 2—Platnick N. I., Funk V. A., eds. (1983) New York: Columbia University Press. 7–36.
Farris J. S., Albert V. A., Källersjö M., Lipscomb D., Kluge A. G. Parsimony jackknifing outperforms neighbor-joining. Cladistics (1996) 12:99–124.[CrossRef][Web of Science]
Felsenstein J. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool (1978) 27:401–411.
Felsenstein J. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol (1981) 17:368–376.[CrossRef][Web of Science][Medline]
Felsenstein J. Inferring phylogenies (2004) Sunderland, Massachusetts: Sinauer Associates.
Gaffney E. S. An introduction to the logic of phylogeny reconstruction. In: Phylogenetic analysis and paleontology—Cracraft J., Eldredge N., eds. (1979) New York: Columbia University Press. 79–111.
Goldman N. Maximum likelihood inference of phylogenetic trees with special reference to a Poisson process model of DNA substitution and to parsimony analyses. Syst. Zool (1990) 39:345–361.
Goloboff P. A. Parsimony, likelihood, and simplicity. Cladistics (2003) 19:91–103.[CrossRef][Web of Science]
Green P. J. Trans-dimensional Markov chain Monte Carlo. In: Highly structured stochastic systems—Green P. J., Hjort N. L., Richardson S., eds. (2003) Oxford, UK: Oxford University Press. 179–198.
Hastings W. K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika (1970) 57:97–109.
Huelsenbeck J. P., Hillis D. M. Success of phylogenetic methods in the four-taxon case. Syst. Biol (1993) 42:247–264.
Huelsenbeck J. P., Rannala B. Frequentist properties of bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models. Syst. Biol (2004) 53:904–913.
Jukes T. H., Cantor C. R. Evolution of protein molecules. In: Mammalian protein metabolism—Munro H. N., ed. (1969) San Diego, California: Academic Press. 21–123.
Kluge A. G. Testability and the refutation and corroboration of cladistic hypotheses. Cladistics (1997) 13:81–96.[CrossRef][Web of Science]
Kluge A. G. Total evidence or taxonomic congruence: Cladistics or consensus classification. Cladistics (1998) 14:151–158.[CrossRef][Web of Science]
Kluge A. G., Wolf A. J. Cladistics: What's in a word? Cladistics (1993) 9:183–199.[CrossRef][Web of Science]
Lewis P. O. A likelihood approach to estimating phylogeny from discrete morphological character data. Syst. Biol (2001) 50:913–925.
Metropolis N., Rosenbluth A. W., Rosenbluth M. N., Teller A. H., Teller E. Equation of state calculations by fast computing machines. J. Chem. Phys (1953) 21:1087–1092.[CrossRef]
Neyman J., Scott E. L. Consistent estimates based on partially consistent observations. Econometrica (1948) 16:1–32.[CrossRef][Web of Science]
Popper K. R. The logic of scientific discovery (1959) New York: Basic Books.
Sanderson M. J., Wojciechowski M. F. Improved bootstrap confidence limits in large-scale phylogenies with an example from Neo-Astragalus (Leguminosae). Syst. Biol (2000) 49:671–685.
Savolainen V., Chase M. W., Hoot S. B., Morton C. M., Soltis D. E., Bayer C., Fay M. F., Bruijn A. D., Sullivan S., Qiu Q. L. Phylogenetics of flowering plants based upon a combined analysis of plastid atpb and rbcl gene sequences. Syst. Biol (2000) 49:306–362.
Siddall M. E., Kluge A. G. Probabilism and phylogenetic inference. Cladistics (1997) 13:313–336.[CrossRef][Web of Science]
Sinsheimer J. S., Suchard M. A., Dorman K. S., Fang F., Weiss R. E. Are you my mother? Bayesian phylogenetic inference of recombination among putative parental strains. App. Bioinf (2003) 2:131–144.
Stamatakis A., Ludwig T., Meier H. RAxML-III: A fast program for maximum likelihood based inference of large phylogenetic tress. Bioinformatics (2005) 21:456–463.
Suchard M. A., Weiss R. E., Dorman K. S., Sinsheimer J. S. Oh brother, where art thou? A Bayes factor test for recombination with uncertain heritage. Syst. Biol (2002) 51:715–728.
Suchard M. A., Weiss R. E., Dorman K. S., Sinsheimer J. S. Inferring spatial phylogenetic variation along nucleotide sequences: A multiple changepoint model. J. Am. Stat. Assoc (2003) 98:427–437.[CrossRef][Web of Science]
Swofford D. L. PAUP*: Phylogenetic analysis using parsimony (* and other methods) (1998) Sunderland, Massachusetts: Sinauer Associates.
Tavaré S. Some probabilistic and statistical problems on the analysis of DNA sequences. Lectures Math Life Sci (1986) 17:57–86.
Tierney L. Introduction to general state-space Markov chain theory. In: Markov chain Monte Carlo in practice—Gilks W. R., Richardson S., Spiegelhalter D. J., eds. (1996) London: Chapman and Hall. 59–74.
Tuffley C., Steel M. Links between maximum likelihood and maximum parsimony under a simple model of site substitution. Bull. Math. Biol (1997) 59:581–607.[Web of Science][Medline]
Whelan S. New approaches to phylogenetic tree search and their application to large numbers of protein alignments. Syst. Biol (2007) 56:727–740.
Wiley E. O. Karl R. Popper, systematics, and classification: A reply to Walter Bock and other evolutionary taxonomists. Syst. Biol (1975) 24:233–243.
Wiley E. O. Phylogenetics: The theory and practice of phylogenetic systematics (1981) New York: John Wiley and Sons.
Yang Z. Maximum likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol (1993) 10:1396–1401.[Abstract]
Yang Z., Nielsen R., Goldman N., Pedersen A. M. K. Codon-substitution models for heterogeneous selection pressure. Genetics (2000) 155:431–449.
This article has been cited by other articles:
![]() |
J. Wu and E. Susko General Heterotachy and Distance Method Adjustments Mol. Biol. Evol., December 1, 2009; 26(12): 2689 - 2697. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Kim and M. J. Sanderson Penalized Likelihood Phylogenetic Inference: Bridging the Parsimony-Likelihood Gap Syst Biol, October 1, 2008; 57(5): 665 - 674. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||


















