© 2005 Society of Systematic Biologists
Analysis and Visualization of Tree Space
Edited by Frank Anderson: Associate Editor
1 Section of Integrative Biology and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas 78712, USA; E-mail: dhillis{at}mail.utexas.edu (D.M.H.)
2 Department of Mathematics and Computer Science, Lehman College–City University of New York, Bronx, New York 10468, USA; E-mail: stjohn{at}lehman.cuny.edu
| Abstract |
|---|
We explored the use of multidimensional scaling (MDS) of tree-to-tree pairwise distances to visualize the relationships among sets of phylogenetic trees. We found the technique to be useful for exploring "tree islands" (sets of topologically related trees among larger sets of near-optimal trees), for comparing sets of trees obtained from bootstrapping and Bayesian sampling, for comparing trees obtained from the analysis of several different genes, and for comparing multiple Bayesian analyses. The technique was also useful as a teaching aid for illustrating the progress of a Bayesian analysis and as an exploratory tool for examining large sets of phylogenetic trees. We also identified some limitations to the method, including distortions of the multidimensional tree space into two dimensions through the MDS technique, and the definition of the MDS-defined space based on a limited sample of trees. Nonetheless, the technique is a useful approach for the analysis of large sets of phylogenetic trees.
Keywords: Bayesian analysis; multidimensional scaling; phylogenetic analysis; tree space; visualization
Received May 23, 2004; Revised August 3, 2004; Accepted August 26, 2004
Systematists are often faced with the need to analyze a large collection of phylogenetic trees. These trees may represent a collection of equally parsimonious solutions to a phylogenetic problem, or a set of trees of similar likelihood, or a sampled set of trees from a Markov chain Monte Carlo (MCMC) Bayesian analysis. In any of these cases, a common approach for expressing the results is to make a consensus tree from the large collection of potential solutions (see Swofford, 1991, for a discussion of consensus methods). Consensus trees are produced to distill a large amount of information into a single summary tree, because it is often impractical to examine or display all of the individual solutions. In the case of MCMC Bayesian analysis, a consensus tree is usually used to summarize information about the posterior probabilities of the individual inferred branches. Although these uses of consensus trees may be appropriate for many purposes, a great deal of information about the individual solutions is usually lost. It is possible that two or more distinct biological explanations are represented among different "islands" of solutions (e.g., see Maddison, 1991), but that a consensus of these solutions produces little or no resolution. Although many other solutions among the universe of possible trees may be excluded by the available data, this information can be lost in a consensus tree.
This article describes an alternative method of exploring a set of phylogenetic trees. Geographical topology is often used as an analogy to describe the solution space of phylogenetic trees, and to describe or discuss the behavior of various tree-search strategies (e.g., see Swofford et al., 1996). However, systematists have not usually moved beyond analogy to attempt to visualize this solution space. Here we suggest one approach to this problem and describe a program that allows the rapid and efficient visualization of commonalities and relationships among a large set of phylogenetic trees.
| Multidimensional Scaling of Tree Space |
|---|
Biologists usually think of tree space in terms of the topological distance among different trees, which may be defined in terms of common measures such as weighted or unweighted Robinson-Foulds (RF) distance (Robinson and Foulds, 1979, 1981; see also Buneman, 1971). The unweighted RF distance merely sums the number of internal edges (branches) that must be collapsed or expanded to move from one tree topology to another, without any effect on the measure from the length of those edges. The weighted version of RF distance weights the edges by their length, so that two trees are considered to move further apart in tree space as the differences in their branch lengths increase. Thus, weighted RF distances can be useful for distinguishing among trees of identical topology, but with different branch lengths. Given a collection of trees, it is straightforward to calculate pairwise RF-distance matrices (weighted and unweighted) among all the pairs of trees, and this can be done efficiently (linear time with respect to the number of taxa) using an algorithm developed by Day (1985). However, an exact representation of the relationships among all possible trees typically would require a large multidimensional space. A common method of visualizing and analyzing a large matrix of distances among points is multidimensional scaling (MDS; see Lingoes et al., 1979; Young and Hamer, 1987; Borg and Groenen, 1997). In MDS, a new space of only a few (typically two) dimensions is created in a manner to minimize the distortions of the observed distance matrix. An optimal solution to MDS involves minimizing a stress function, such as the Kruskal-1 function (Borg and Groenen, 1997), defined as
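$$
\text{Stress-1} = \sqrt{\frac{\sum_{i<j}\left(d_{ij}-\delta_{ij}\right)^{2}}{\sum_{i<j} d_{ij}^{2}}}
$$

where, in the standard formulation of this function, $\delta_{ij}$ is the observed distance between trees $i$ and $j$ (here, an RF distance) and $d_{ij}$ is the distance between the corresponding points in the low-dimensional space.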
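As a concrete illustration of the stress function, the following is a minimal sketch of Kruskal's stress-1 in its standard form. The function name, the list-of-lists distance matrix, and the example coordinates are assumptions made for illustration; they are not taken from the Tree Set Visualization module.

```python
import math

def kruskal_stress1(delta, coords):
    """Kruskal's stress-1 for a low-dimensional configuration.

    delta  -- symmetric matrix (list of lists) of observed tree-to-tree distances
    coords -- list of point coordinates, one per tree (hypothetical layout)
    """
    num = 0.0  # squared mismatch between plotted and observed distances
    den = 0.0  # squared plotted distances (normalizing term)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            num += (d - delta[i][j]) ** 2
            den += d ** 2
    return math.sqrt(num / den)

# Hypothetical distances that happen to fit the plane exactly (3-4-5 triangle).
delta = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
exact = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
squashed = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0)]

print(kruskal_stress1(delta, exact))     # no distortion
print(kruskal_stress1(delta, squashed))  # positive stress
```

A configuration that reproduces every observed distance has zero stress; any layout that moves points away from those distances has positive stress, and an MDS procedure searches for the layout that makes this value as small as possible.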
In this article, we explore the use of multidimensional scaling to compare and examine large sets of phylogenetic trees and present several example applications. We also briefly describe software (distributed without cost to the user) that can be used to visualize sets of phylogenetic trees in MDS space.
| Methods |
|---|
Tree Set Visualization Module for Mesquite
To visualize and examine large sets of phylogenetic trees, we used the Tree Set Visualization module for Mesquite (http://comet.lehman.cuny.edu/treeviz/). Maddison and Maddison (2004) developed the software package Mesquite as a modular system for evolutionary analysis. This project complements and extends their popular software tool, MacClade (Maddison and Maddison, 1992), but has the extra flexibility to allow new methods and visualization techniques to be easily added to the underlying phylogenetic tools. The package is written in Java, and, as such, runs under Macintosh, Windows, and Unix operating systems. The Tree Set Visualization module (Amenta and Klingner, 2002) takes sets of phylogenetic trees and uses MDS to display the trees as points in two-dimensional space, such that the distortion between the true distance between pairs of trees and the screen distance is minimized (based on the stress function described above). Sets of trees can be colored according to their score on an optimality criterion (e.g., a parsimony or maximum likelihood score) or color can represent subsets of trees (e.g., different colors for trees from different Bayesian analyses). Points on the screen can be selected to show the corresponding tree, or consensus of a set of trees (see Fig. 1).
We used this software package to study several potential applications of the visualization of phylogenetic tree space. These examples are meant to stimulate additional work on the visualization of tree space, and are not meant to be an exhaustive description of the potential uses of the software.
Bayesian Samples versus Bootstrap Samples
One of our analyses consisted of a comparison of a sample of trees from a Bayesian MCMC analysis (Larget and Simon, 1999) with a sample of trees obtained through nonparametric bootstrapping (Felsenstein, 1985). These two sampling methods have been widely used to obtain credible sets of trees and for estimating confidence limits on phylogenies. Comparisons of these methods have largely concentrated on comparing the posterior probabilities for particular internal branches calculated from the Bayesian analyses with the bootstrap proportions for these same branches (e.g., Wilcox et al., 2002; Suzuki et al., 2002; Alfaro et al., 2003; Cummings et al., 2003; Erixon et al., 2003). We explored an alternative approach that involved the comparison of the sets of sampled trees to the true (model) tree in tree-space.
The data used to generate the trees for comparing the results of Bayesian and bootstrapping analyses were obtained by simulating sequences of genes on the topology and branch lengths of a 44-taxon mammalian tree (Murphy et al., 2001; Fig. 2). This tree was the result of an analysis of 22 different genes and was chosen as a model for our simulations because of the large number of genes and the well-supported resulting phylogenetic estimate (this tree has been discussed in several recent theoretical studies of phylogeny; e.g., Suzuki et al., 2002; Alfaro et al., 2003; Douady et al., 2003). For each gene, we conducted a likelihood-ratio test to select an appropriate model of evolution (Posada and Crandall, 2001) and used PAUP* 4.0b10 (Swofford, 2000) to estimate the optimal model parameters. Then, we computed pairwise Euclidean distances between each set of model parameters by taking the square root of the sum of the squared deviations of each parameter in the model. These distances were then analyzed using multidimensional scaling (Fig. 3). From this MDS analysis, we chose the model parameters of the gene closest to the centroid (IRBP) as simulation conditions for the single gene analysis (other genes were selected for multigene analyses, as described below). The preferred model for this gene was a general time-reversible model with discrete gamma-distributed rate heterogeneity and a proportion of invariant sites (Appendix 2). Seq-Gen version 1.2.5 (Rambaut and Grassly, 1997) was used to simulate 1000 nucleotides per taxon under this model of evolution on the tree shown in Figure 2.
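The distance calculation and the choice of a central gene can be sketched as follows. This is a simplified illustration: the gene names and parameter values are hypothetical, and the centroid is taken directly in parameter space rather than in the two-dimensional MDS space used for Figure 3.

```python
import math

def parameter_distances(params):
    """Pairwise Euclidean distances between model-parameter vectors."""
    names = sorted(params)
    return {(a, b): math.dist(params[a], params[b])
            for a in names for b in names if a < b}

def closest_to_centroid(params):
    """Name of the gene whose parameter vector lies nearest the centroid."""
    names = sorted(params)
    dim = len(params[names[0]])
    centroid = [sum(params[g][k] for g in names) / len(names)
                for k in range(dim)]
    return min(names, key=lambda g: math.dist(params[g], centroid))

# Hypothetical parameter vectors (e.g., gamma shape, proportion invariant).
params = {
    "geneA": [0.40, 0.10],
    "geneB": [0.90, 0.30],
    "geneC": [2.00, 0.55],
}

print(parameter_distances(params))
print(closest_to_centroid(params))
```

Because the parameters of a substitution model are on different scales, a real analysis might standardize them first; that choice is not specified here and is left out of the sketch.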
To obtain the bootstrap sample, we analyzed a simulated data set for 1000 nonparametric bootstrap replicates in PAUP* 4.0b10 (Swofford, 2000) with nearest-neighbor-interchange swapping on a stepwise-addition starting tree. Prior to bootstrapping, optimal model parameters were estimated and set for the bootstrapping analysis. Additionally, we analyzed the same data set using MrBayes 3.04b (Huelsenbeck and Ronquist, 2003). The four Markov chains (see Geyer, 1991; Gilks and Roberts, 1996; Huelsenbeck et al., 2001) were run for 5 million generations under the simulation model (GTR+I+Γ), sampling every 1000 generations. The 1000 trees from the bootstrapping analysis were combined with the sample of every 1000th tree from the last 1,000,000 generations of the Bayesian analysis, together with the single true (model) tree, into a single file for visualization using multidimensional scaling.
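The sampling arithmetic used in these analyses can be checked with a short sketch. The function below is hypothetical and assumes that the retained window excludes its starting generation and includes the final one, which is one reading that yields the stated number of trees.

```python
def retained_generations(total, sample_every, keep_last):
    """Generation numbers of the samples kept from the end of a chain.

    Assumes samples fall on multiples of `sample_every` and that the
    retained window is (total - keep_last, total].
    """
    first = total - keep_last + sample_every
    return list(range(first, total + 1, sample_every))

# Bayesian sample compared with the bootstrap sample:
# 5 million generations, sampled every 1000, last 1,000,000 kept.
kept = retained_generations(5_000_000, 1000, 1_000_000)
print(len(kept), kept[0], kept[-1])
```

The same function applied to the other designs described in this section (one million generations sampled every 100 with the last 100,000 kept, and 10 million generations sampled every 1000 with the last 3 million kept) gives the tree counts reported for those analyses.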
Gene Length and Concatenated Datasets
A second problem that we addressed concerned the performance of phylogenetic inference as a function of increasing sequence length and increasing numbers of genes that evolve under different models of evolution. For this analysis, we selected five genes from the MDS analysis of genes from the Murphy et al. (2001) study (Fig. 3). In this case, we included the gene closest to the centroid (IRBP), as well as four genes that were the most divergent from IRBP in the MDS analysis (Fig. 3). For each gene, we simulated independent datasets from 200 to 1000 nucleotides long on the tree shown in Figure 2.
We analyzed each data set under its individual simulation model in MrBayes 3.04b (Huelsenbeck and Ronquist, 2003) for one million generations. Additionally, we concatenated and analyzed all of the genes in a combined analysis. We conducted a hierarchical likelihood ratio test to determine a suitable model of analysis for each concatenated data set (Posada and Crandall, 2001); these tests on the concatenated data sets always selected the GTR+I+Γ model. The multigene data sets were then analyzed in MrBayes under a single model (GTR+I+Γ) for one million generations. The concatenated data sets were also analyzed in MrBayes under a "composite model" consisting of the five models used to simulate each gene (Nylander et al., 2004). The true tree and one thousand trees from the analyses of each single gene data set and the single model analysis of the concatenated data set, sampled every 100 generations for the last 100,000 generations, were combined for the MDS analysis.
Comparing Two Bayesian Analyses
A third problem that we considered was the use of multidimensional scaling to compare the samples of trees from multiple independent Bayesian analyses, as a means of determining the degree of convergence in the analyses. For this example, we used the same data set described above (for comparing Bayesian and bootstrap samples) to compare the trees generated by two Bayesian analyses. Two runs were conducted using MrBayes; each run consisted of four Markov chains run for 10 million generations sampled every 1000 generations. The model used for analysis was GTR+I+Γ (the simulation model).
Three thousand trees from the last 3 million generations from each analysis were combined into a single file for visualization using MDS. These trees were plotted using both weighted and unweighted RF distances (Robinson and Foulds, 1979, 1981).
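The two distance measures can be sketched as follows, assuming each tree has already been reduced to a dictionary that maps every split (one side of a bipartition, stored as a `frozenset` of taxon names) to its branch length. That representation, the function names, and the branch lengths are illustrative assumptions, not the format used by PAUP* or Mesquite.

```python
def unweighted_rf(a, b):
    """Number of splits found in exactly one of the two trees."""
    return len(set(a) ^ set(b))

def weighted_rf(a, b):
    """Sum of absolute branch-length differences over all splits.

    A split that is absent from a tree is treated as having length 0.
    """
    return sum(abs(a.get(s, 0.0) - b.get(s, 0.0)) for s in set(a) | set(b))

def s(taxa):
    """Shorthand: a split keyed by the single-letter taxa on one side."""
    return frozenset(taxa)

# Hypothetical five-taxon trees; terminal branches are keyed by their taxon.
terminals = {s("A"): 0.1, s("B"): 0.1, s("C"): 0.1, s("D"): 0.1, s("E"): 0.1}
tree1 = {**terminals, s("CDE"): 0.05, s("DE"): 0.2}   # ((A,B),C,(D,E))
tree2 = {**terminals, s("BDE"): 0.05, s("DE"): 0.2}   # ((A,C),B,(D,E))
tree3 = {**tree1, s("DE"): 0.5}                        # tree1 topology, longer branch

print(unweighted_rf(tree1, tree2), weighted_rf(tree1, tree2))
print(unweighted_rf(tree1, tree3), weighted_rf(tree1, tree3))
```

Terminal splits are shared by all trees on the same taxa, so they never contribute to the unweighted count. The third tree shows the property that motivates the weighted measure: its topology is identical to the first (unweighted distance of zero), yet the weighted distance is positive because one branch length differs.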
| Results and Discussion |
|---|
Comparison of Bayesian and Bootstrap Samples of Phylogenetic Trees
Interpretation of bootstrap resampling results from phylogenetic studies has been the focus of considerable discussion. Felsenstein (1985) suggested that bootstrapping could be used to assess confidence in particular branches of a phylogenetic tree. Felsenstein's method relies on using the proportion of the bootstrap pseudosamples that support each branch in the tree (the bootstrap proportions). However, several authors have noted that these proportions present highly biased (and usually conservative) estimates of phylogenetic accuracy (e.g., Zharkikh and Li, 1992a, 1992b, 1995; Hillis and Bull, 1993; Rodrigo, 1993; Li and Zharkikh, 1994; Efron et al., 1996). More recently, Bayesian posterior probabilities (derived from a sample of trees drawn from an MCMC search of tree space after the search has reached equilibrium) have also been used to assess the support or confidence of individual branches in a phylogenetic tree. Many authors have suggested that these Bayesian posterior probabilities are more reasonable estimates of phylogenetic accuracy compared to bootstrap proportions (given the assumptions of the models used in the estimation procedure), although other authors disagree (e.g., see Wilcox et al., 2002; Suzuki et al., 2002; Alfaro et al., 2003; Cummings et al., 2003; and Huelsenbeck and Rannala, 2004, for a diversity of opinions). However, both of these methods (nonparametric bootstrapping and Bayesian MCMC analysis) can be used in a different manner to assess phylogenetic results; namely, they can be used to produce a "credible set" of trees that are considered consistent with the observed data. Bootstrap proportions and Bayesian posterior probabilities are both summary statistics that are extracted from these respective credible sets of trees, and both statistics result in potential loss of information.
An alternative approach might be to visualize the tree space that is defined by these credible sets and then use these credible sets to envision actual "confidence limits" in this space around the best estimate. The space within these confidence limits could be used to test a particular hypothesis that might depend upon many different branches in a phylogenetic tree, rather than on the support of a single branch.
An example of this approach is illustrated in Figure 4. In this figure, the true (model) tree is represented by the yellow dot, the trees that are estimated from bootstrap pseudosamples are shown in red, and the trees that are sampled from the MCMC analysis are shown in blue. One might be tempted to draw confidence ellipses around these samples to include 95% of the results, which could indicate the "confidence limits" of the result in tree space. However, there are problems with this approach. First, MDS space is defined by a limited sample of trees (rather than the universe of possible trees), so the "confidence limits" would be dependent on the sample used to define the space. Second, there is distortion of multidimensional space in two dimensions (see section on potential limitations, below). Therefore, we do not think that this approach to defining confidence limits of trees is likely to be practical. Nonetheless, we see the visualization shown in Figure 4 as a useful way to illustrate the relative distributions of trees drawn from bootstrap resampling and trees sampled from an MCMC Bayesian analysis. The tighter clustering of the Bayesian sample compared to the bootstrap pseudosample corresponds to the broader confidence limits typically obtained from nonparametric bootstrap analyses (e.g., Wilcox et al., 2002; Huelsenbeck and Rannala, 2004).
Comparing Results from Multiple Genes and Concatenated Data Sets
Another potential application for the visualization of tree space is in studies of multiple data sets and the combination of data sets. The MDS analysis of the results from our simulation of five of the genes studied by Murphy et al. (2001), as well as the analysis of the concatenation of those five genes, is shown in Figure 5. When we compared the RF distances of the trees in Figure 5a to the true tree by computing the distance matrix in PAUP* 4.0b10 (Swofford, 2000), we found that the true tree was an RF distance of 18 to 60 from the estimated trees for the separate genes, although the single model analysis of the concatenated dataset provided much closer estimates of the true tree (RF = {4, 6, 8, ..., 22}). With longer sequences (1000 nucleotides for each gene), three of the single gene analyses included the true tree, whereas the space defined by trees from the other two genes did not include the true tree (Fig. 6). However, both the single model and composite model analyses of the concatenated data set produced similar results and found the true tree or trees that were only a few rearrangements from the true tree (maximum RF = 6; Fig. 6). In this example, all of the genes were simulated on the same tree, and the simulations differed only in the underlying model of evolution of the gene. The analysis suggests that a combined approach to data analysis was highly successful in this case, using either a homogeneous or heterogeneous model for analysis. We do not wish to generalize this result to suggest that all genes should be combined in a joint analysis, as we suspect that the specific results are likely to differ depending on the degrees of (and reasons for) differences among the genes (or other data partitions). Rather, we use this example to show how the visualization approach might be used to investigate the utility of combined versus separate analysis of datasets.
Comparing Results from Multiple Bayesian Analyses
In conducting a Bayesian phylogenetic analysis, it is important to assess whether or not a sample from a Markov chain has converged on the joint posterior distribution of the various parameters (Huelsenbeck and Ronquist, 2003). However, the techniques for assessing convergence have not been well developed, and most approaches do not examine convergence on the joint posterior distribution of the parameters, but instead examine convergence on the posterior distributions of individual parameters (the marginal posterior distributions). For instance, a common approach is to examine the marginal posterior probabilities of individual bipartitions of the tree across several Bayesian MCMC runs, and then to assume that the samples from the Markov chains have converged if these posterior probabilities are consistent. Many phylogenetic hypotheses, however, do not depend on particular bipartitions in the tree, but rather on larger features of tree topology or combinations of branch lengths, and convergence on marginal posterior distributions does not guarantee convergence on the joint posterior distribution. Here we suggest a method for detecting non-convergence in Bayesian analyses on the joint posterior distribution of topologies and branch lengths.
Figure 7a shows the visualization of tree space sampled from two different Bayesian analyses of the same data set, with the trees compared using unweighted Robinson-Foulds distance. In this case, the sampled space is nearly indistinguishable in the two separate runs (even though the same trees are rarely sampled in the two analyses), indicating that the sampled space of tree topologies (without consideration of branch lengths) was very similar in the two replicates. Figure 7b shows the results of a similar analysis, but using weighted Robinson-Foulds distances to compare the trees, and in this case the sampled space of the two runs is clearly distinguishable. We interpret this result to mean that there is a greater degree of branch-length similarity within independent runs than between runs, suggesting that the chains have either not reached stationarity with respect to branch lengths, or more likely they have not been sampled for a long enough time (after stationarity was reached) to ensure convergence to the joint posterior distribution. In contrast, Figure 7a suggests that the space of tree topologies (independent of branch lengths) has been consistently sampled in the two analyses, although one would probably want to examine additional runs to ensure this result. Thus, this approach to visualizing the results of Bayesian analyses may prove to be a fruitful heuristic for assessing appropriate chain lengths and sampling strategies in MCMC Bayesian analyses of phylogeny. We have also used this approach with several empirical data sets (results not shown) and have found it to be a useful approach for assessing convergence among independent analyses.
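The visual comparison of two runs can be complemented by a simple numerical summary: the mean tree-to-tree distance within runs versus between runs. The sketch below is a hypothetical illustration of that idea, not a feature of the Tree Set Visualization module; the distance matrix is a toy example.

```python
from itertools import combinations

def mean_within_between(dist, run_of):
    """Mean pairwise distance within runs and between runs.

    dist   -- symmetric matrix of tree-to-tree distances
    run_of -- run label for each tree, in the same order as `dist`
    """
    within, between = [], []
    for i, j in combinations(range(len(run_of)), 2):
        bucket = within if run_of[i] == run_of[j] else between
        bucket.append(dist[i][j])
    return sum(within) / len(within), sum(between) / len(between)

# Toy weighted distances: two trees from each of two runs.
dist = [
    [0, 1, 3, 3],
    [1, 0, 3, 3],
    [3, 3, 0, 1],
    [3, 3, 1, 0],
]
run_of = [0, 0, 1, 1]

print(mean_within_between(dist, run_of))
```

Similar within-run and between-run means are what one would expect when the runs overlap in tree space, as in Figure 7a. A between-run mean that is clearly larger corresponds to the separation seen in Figure 7b.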
Some Other Potential Uses of MDS for Visualizing Tree-Space
Another potential use of multidimensional scaling as a means of visualizing tree-space is for examining the progress of a Bayesian MCMC analysis. This visualization may be useful mostly as a tool for describing the method (as in a class or workshop on phylogenetic methods). As an example, we selected a sample of every 10th tree from the first 100,000 generations of an analysis of frog phylogeny (Hillis and Wilcox, 2005), and analyzed the trees under MDS (Fig. 8). These trees represent the early samples from the Markov chain, as the chain begins to sample trees across tree-space, moving to regions of progressively higher optimality scores. This progress can be animated using the Tree Set Visualization module. We have found this example of tree-set visualization to be a useful means for describing the Bayesian MCMC approach in classes and workshops on phylogenetics (see http://lewis.eeb.uconn.edu/lewishome/software.html for another useful MCMC instruction tool).
Another common problem in phylogenetics is the discovery of several distinct "tree islands" of equally optimal or near-optimal phylogenetic solutions for a given dataset (Maddison, 1991). In analyzing a particular data set, one might discover that there are a large number of solutions that fit the data equally well. A consensus of these trees may show little or no resolution. However, an unresolved consensus tree does not necessarily indicate that all potential solutions fit the data equally well. Separate summaries of each of the tree islands are likely to show a much higher degree of resolution, and the separate tree islands may represent alternative phylogenetic solutions for the data set. Tree Set Visualization can be used to identify and analyze these tree islands, as shown in Figure 9.
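One simple way to propose candidate islands from a distance matrix is to link any two trees whose distance falls at or below a threshold and take the connected groups. The sketch below is an assumption-laden simplification: it works only on the sampled trees, and the threshold of 2 is chosen because, for fully resolved trees, an unweighted RF distance of 2 corresponds to a single nearest-neighbor interchange. It is not the island definition of Maddison (1991), which depends on the rearrangement type used in the search.

```python
def tree_islands(dist, threshold):
    """Group trees into connected sets linked by distances <= threshold."""
    n = len(dist)
    assigned = [False] * n
    islands = []
    for start in range(n):
        if assigned[start]:
            continue
        assigned[start] = True
        stack, island = [start], []
        while stack:                      # depth-first walk over near neighbors
            i = stack.pop()
            island.append(i)
            for j in range(n):
                if not assigned[j] and dist[i][j] <= threshold:
                    assigned[j] = True
                    stack.append(j)
        islands.append(sorted(island))
    return islands

# Toy unweighted RF distances among five trees.
dist = [
    [0, 2, 4, 10, 10],
    [2, 0, 2, 10, 10],
    [4, 2, 0, 10, 10],
    [10, 10, 10, 0, 2],
    [10, 10, 10, 2, 0],
]

print(tree_islands(dist, threshold=2))
```

Each resulting group could then be summarized by its own consensus tree, which is the kind of separate summary described above.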
Potential Limitations of MDS for Visualizing Tree-Space
Multidimensional scaling based on RF distances is clearly not the only way (and is not necessarily even the best way) to visualize and represent tree-space. We have found this approach to the problem to be useful for exploring large sets of phylogenetic trees, but we also recognize that the approach has some limitations. For instance, any reduction of high-dimensional space into two dimensions necessarily will result in some distortions. As an example of distortion, consider the MDS visualization of trees shown in Figure 10. In this case, a reference tree is shown in blue, and a series of trees that differ from the reference tree by one bipartition each (RF = 2) are shown in red. All of the trees are equally distant from the reference tree in tree-space, and in multiple dimensions would form a "multidimensional sphere" around the reference tree. In addition, pairs of trees in close proximity in Figure 10 differ from each other by one bipartition, and all other trees represented by red dots differ from one another by two bipartitions (RF = 4). The stress function in the MDS analysis is minimized in two dimensions with the representation shown in Figure 10. It would be easy to misinterpret this diagram, and one might think that the trees closer to the center of the figure were closer in RF distance to the reference tree than those that are represented on the outside of the circle. Therefore, it is important to understand the potential for such distortions, and to check the primary distance matrix before interpreting too much about the spatial relationships of trees shown in the MDS analyses.
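Checking the primary distance matrix against the plot can be done pair by pair. The following sketch lists, for a chosen reference tree, the true distance to every other tree next to the distance between the plotted points. The coordinates are hypothetical and merely mimic the situation in Figure 10, where equidistant trees appear at different radii.

```python
import math

def screen_vs_true(delta, coords, ref):
    """Rows of (tree index, true distance to ref, plotted distance to ref)."""
    return [(j, delta[ref][j], math.dist(coords[ref], coords[j]))
            for j in range(len(coords)) if j != ref]

# Tree 0 is the reference; trees 1-3 are all RF = 2 away from it.
delta = [
    [0, 2, 2, 2],
    [2, 0, 4, 4],
    [2, 4, 0, 4],
    [2, 4, 4, 0],
]
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 0.0)]  # hypothetical layout

for row in screen_vs_true(delta, coords, ref=0):
    print(row)
```

When the second column is constant but the third is not, the apparent differences in proximity to the reference tree are an artifact of the projection rather than a property of the trees.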
A second caveat that must be considered is that the tree space in these examples is entirely dependent on the sample of trees examined. In other words, the space is redefined and redistorted with the addition of any new tree, and the space visualized does not exist in the absence of the sample at hand. Therefore, care should be taken to avoid over-interpretation of results. Ideally, one might want to define the tree space by considering all possible solutions, and then plot the results of a particular analysis within the universe of possible solutions. This may be possible for smaller data sets, although the large number of possible tree topologies for even a modest number of taxa presents a serious challenge to this approach.
A third limitation concerns the use of RF distances to define the tree-space. This is appropriate when the related trees are connected to one another by nearest-neighbor interchanges (Penny and Hendy, 1985). However, two trees may differ only in the placement of a single taxon, and yet exhibit the maximum possible RF distance from one another. It would be useful to explore other methods for defining the pairwise distances between trees and using these distances to define and analyze tree space.
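The behavior described here is easy to reproduce. The sketch below represents trees as nested tuples of taxon names (an assumed representation chosen for brevity), extracts the nontrivial bipartitions, and compares a tree with a nearest-neighbor-interchange neighbor and with a tree in which a single taxon has been moved to the opposite end.

```python
def splits(tree):
    """Nontrivial bipartitions of an unrooted tree given as nested tuples."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves(child) for child in node))

    taxa = leaves(tree)
    ref = min(taxa)  # each split is stored as the side that lacks this taxon

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found

def rf_distance(t1, t2):
    """Unweighted Robinson-Foulds distance (size of the symmetric difference)."""
    return len(splits(t1) ^ splits(t2))

ladder = ("A", ("B", ("C", ("D", ("E", ("F", ("G", "H")))))))
nni = ("A", ("B", ("C", ("D", ("E", ("G", ("F", "H")))))))
a_moved = ("B", ("C", ("D", ("E", ("F", ("G", ("H", "A")))))))

print(rf_distance(ladder, nni))      # 2
print(rf_distance(ladder, a_moved))  # 10
```

For eight taxa the largest possible unweighted RF distance between two fully resolved trees is 2(8 - 3) = 10, so moving the single taxon A from one end of the ladder to the other yields the maximum distance even though every other relationship is unchanged.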
A fourth limitation involves the use of multidimensional scaling. In some cases, it is possible for the MDS analysis to become trapped in local optima. This can be avoided, however, by altering the initial states, conducting multiple restarts, and changing the step size used by the MDS algorithm (Borg and Groenen, 1997). It is also important to consider that visualization by multidimensional scaling relies on human pattern recognition skills to identify clusters of trees. Stockham et al. (2002) have developed a more formal and quantifiable method for clustering sets of phylogenetic trees.
| Conclusions |
|---|
Our intent in this article was to present an approach for multidimensional scaling of tree-space and to suggest some of its potential applications. We acknowledge that we have not thoroughly developed any of the suggested possible uses of this approach in this article. However, we hope that this article will stimulate additional research on the applications that we have suggested, and that the availability of a program for conducting these analyses will aid in these investigations. At the minimum, it appears that the approach presents a useful means for visualizing the results of large sets of phylogenetic trees. We expect that it will lead to new ways of thinking about samples of trees, beyond the usual consensus summaries.
| Appendix 1. Tree Description of the Tree from Murphy et al. (2001) |
|---|
((Opossum: 0.072454, Diprotodontian: 0.061694):0, ((((Sloth: 0.056950, Anteater: 0.061637):0.009169, Armadillo: 0.056660):0.032179, ((((Hedgehog: 0.137379, Shrew: 0.124147):0.011789, Mole: 0.086828): 0.011954, (((Phyllostomid: 0.093178, Free tailed bat: 0.046665):0.011564, (False vampire bat: 0.062583, (Flying Fox: 0.018553, Rousette Fruitbat: 0.018931):0.036729):0.004788):0.016400, ((((((Whale: 0.013788, Dolphin: 0.021978):0.019568, Hippo: 0.039894):0.004885, Ruminant: 0.073210): 0.008450, Pig: 0.067448):0.005893, Llama: 0.061851):0.027757, ((Horse: 0.043682, (Rhino: 0.028867, Tapir: 0.028638):0.005116):0.020583, ((Cat: 0.046372, Caniform: 0.055840):0.023068, Pangolin: 0.075956):0.003871): 0.001685):0.001155):0.002432):0.011058, (((Sciurid: 0.083962, ((Mouse: 0.042059, Rat: 0.045451):0.122018, (Hystricid: 0.074622, Caviomorph: 0.086677):0.062121):0.005432):0.011864, (Rabbit: 0.057873, Pika: 0.108683):0.043771):0.005743, ((Flying Lemur: 0.061380, Tree Shrew: 0.101818):0.003958, (Strepsirrhine: 0.076186, Human: 0.065099): 0.009553):0.001707):0.007711):0.009175):0.005977, ((((Tenrecid: 0.142758, Golden Mole: 0.067180):0.009411, (Short Eared Elephant Shrew: 0.039055, Long Eared Elephant Shrew: 0.036033):0.088816): 0.002240, Aardvark: 0.068518):0.003248, ((Sirenian: 0.038154, Hyrax: 0.089482):0.002916, Elephant: 0.050883):0.014801):0.025967):0.284326)
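The tree description above is in parenthetical (Newick-style) notation, with each taxon name followed by a colon and its branch length. A minimal sketch for pulling the taxon names out of such a string is given below; the regular expression is an assumption that suits this particular description (names without parentheses, commas, or colons) and is not a general Newick parser.

```python
import re

def leaf_names(description):
    """Taxon names in a parenthetical tree description, in order of appearance.

    A name is taken to be the text between an opening parenthesis or comma
    and the colon that introduces its branch length.
    """
    return re.findall(r"[(,]\s*([^():,]+?)\s*:", description)

# A clade copied from the description above.
xenarthra = ("((Sloth: 0.056950, Anteater: 0.061637):0.009169, "
             "Armadillo: 0.056660)")
bats = "(Phyllostomid: 0.093178, Free tailed bat: 0.046665):0.011564"

print(leaf_names(xenarthra))
print(leaf_names(bats))
```

Applied to the full description, the number of names returned can be checked against the 44 taxa stated in the Methods.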
| Acknowledgements |
|---|
We thank Derrick Zwickl for comments and assistance, and for suggesting Figure 10. Nina Amenta and Wayne Maddison provided considerable assistance and advice on the use of Tree Set Visualization and Mesquite. We also thank the other individuals who helped develop the Tree Set Visualization module, including Jeff Klingner, Fred Clarke, Denise Edwards, Silvio Neris, and Ruchi Mahindru. Andy Anderson, Peter Foster, Tamara Munzner, Rod Page, and Alisha Holloway provided helpful suggestions on the manuscript and software. Funding for this project was provided by the National Science Foundation (NSF ITR 0121682/0121651 to Nina Amenta and David Hillis at University of Texas and Katherine St. John at City University of New York), and Tracy Heath was supported on an NSF IGERT fellowship in Computational Phylogenetics and Applications to Biology at the University of Texas.
| References |
|---|
Alfaro M. E., Zoller S., Lutzoni F. Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence. Mol. Biol. Evol. (2003) 20:255–266.
Amenta N., Klingner J. Case study: Visualizing sets of evolutionary trees. (2002) 8th IEEE Symposium on Information Visualization. 71–74.
Borg I., Groenen P. Modern multidimensional scaling (1997) Heidelberg: Springer-Verlag.
Buckley T. R., Arensburger P., Simon C., Chambers G. K. Combined data, Bayesian phylogenetics, and the origin of the New Zealand cicada genera. Syst. Biol. (2002) 51:4–18.
Buneman P. The recovery of trees from measures of dissimilarity. In: Mathematics in archaeological and historical sciences—Hodson F. R., Kendall D. G., Tautu P., eds. (1971) Edinburgh: Edinburgh University Press. Pages 387–395.
Cummings M. P., Handley S. A., Myers D. S., Reed D. L., Rokas A., Winka K. Comparing bootstrap and posterior probability values in the four-taxon case. Syst. Biol. (2003) 52:477–487.
Day W. H. E. Optimal algorithms for comparing trees with labeled leaves. J. Classification (1985) 2:7–28.[CrossRef]
Efron B., Halloran E., Holmes S. Bootstrap confidence levels for phylogenetic trees. Proc. Natl. Acad. Sci. USA (1996) 93:7085–7090.
Erixon P., Svennblad B., Britton T., Oxelman B. Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Syst. Biol. (2003) 52:665–673.
Felsenstein J. Confidence limits on phylogenies: An approach using the bootstrap. Evolution (1985) 39:783–791.[CrossRef][Web of Science]
Gilks W. R., Roberts G. O. Strategies for improving MCMC. In: Markov chain Monte Carlo in practice—Gilks W. R., Richardson S., Spiegelhalter D. J., eds. (1996) London: Chapman and Hall. Pages 89–114.
Geyer C. J. Markov chain Monte Carlo maximum likelihood. In: Computing science and statistics: Proceedings of the 23rd Symposium on the Interface—Keramidas E. M., ed. (1991) Fairfax Station: Interface Foundation. Pages 156–163.
Hillis D. M., Bull J. J. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. (1993) 42:182–192.
Huelsenbeck J. P., Rannala B. Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models. Syst. Biol. (2004) in press.
Huelsenbeck J. P., Ronquist F. MRBAYES: Bayesian inference of phylogeny. Bioinformatics. (2001) 17:754–755.
Huelsenbeck J. P., Ronquist F., Nielsen R., Bollback J. P. Bayesian inference of phylogeny and its impact on evolutionary biology. Science. (2001) 294:2310–2314.
Larget B., Simon D. L. Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol. (1999) 16:750–759.
Li W.-H., Zharkikh A. What is the bootstrap technique? Syst. Biol. (1994) 43:424–430.
Lingoes J. C., Roskam E. E., Borg I. Geometric representations of relational data (1979) 2nd edition. Ann Arbor, Michigan: Mathesis Press.
Maddison D. R. The discovery and importance of multiple islands of most-parsimonious trees. Syst. Zool. (1991) 40:315–328.
Maddison W. P., Maddison D. R. MacClade: Analysis of phylogeny and character evolution (1992) Sunderland, Massachusetts: Sinauer. See also http://www.macclade.org.
Maddison W. P., Maddison D. R. Mesquite: A modular system for evolutionary analysis (2004) Version 1.02. http://mesquiteproject.org.
Murphy W. J., Eizirik E., O'Brien S. J., Madsen O., Scally M., Douady C. J., Teeling E., Ryder O. A., Stanhope M. J., de Jong W. W., Springer M. S. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. (2001) 294:2348–2351.
Nylander J. A. A., Ronquist F., Huelsenbeck J. P., Nieves-Aldrey J. L. Bayesian phylogenetic analysis of combined data. Syst. Biol. (2004) 53:47–67.
Rodrigo A. G. Calibrating the bootstrap test of monophyly. Int. J. Parasitol. (1993) 23:507–514.
Penny D., Hendy M. D. The use of tree comparison metrics. Syst. Zool. (1985) 34:75–82.
Posada D., Crandall K. A. Selecting the best-fit model of nucleotide substitution. Syst. Biol. (2001) 50:580–601.
Rambaut A., Grassly N. C. Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. (1997) 13:235–238.
Robinson D. F., Foulds L. R. Comparison of weighted labeled trees. Lect. Notes Math. (1979) 748:119–126.
Robinson D. F., Foulds L. R. Comparison of phylogenetic trees. Math. Biosci. (1981) 53:131–147.
Stockham C., Wang L.-S., Warnow T. Statistically based postprocessing of phylogenetic analysis by clustering. Bioinformatics. (2002) 18:S285–S293.
Suzuki Y., Glazko G. V., Nei M. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc. Natl. Acad. Sci. USA (2002) 99:16138–16143.
Swofford D. L. When are phylogeny estimates from molecular and morphological data incongruent? In: Phylogenetic analysis of DNA sequences—Miyamoto M. M., Cracraft J., eds. (1991) New York: Oxford University Press. Pages 295–333.
Swofford D. L. PAUP*: Phylogenetic analysis using parsimony (*and other methods) (2000) Sunderland, Massachusetts: Sinauer Associates.
Swofford D. L., Olsen G. J., Waddell P. J., Hillis D. M. Phylogenetic inference. In: Molecular systematics—Hillis D. M., Moritz C., Mable B. K., eds. (1996) 2nd edition. Sunderland, Massachusetts: Sinauer Associates. Pages 407–514.
Young F. W., Hamer R. M. Multidimensional scaling: History, theory and applications (1987) New York: Erlbaum.
Whittingham L. A., Slikas B., Winkler D. W., Sheldon F. H. Phylogeny of the tree swallow genus Tachycineta (Aves: Hirundinidae), by Bayesian analysis of mitochondrial DNA sequences. Mol. Phylogenet. Evol. (2002) 22:430–441.
Wilcox T. P., Zwickl D. J., Heath T. A., Hillis D. M. Phylogenetic relationships of the dwarf boas and a comparison of Bayesian and bootstrap measures of phylogenetic support. Mol. Phylogenet. Evol. (2002) 25:361–371.
Zharkikh A., Li W.-H. Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. I. Four taxa with a molecular clock. Mol. Biol. Evol. (1992a) 9:1119–1147.
Zharkikh A., Li W.-H. Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. II. Four taxa without a molecular clock. J. Mol. Evol. (1992b) 35:356–366.
Zharkikh A., Li W.-H. Estimation of confidence in phylogeny: The complete-and-partial bootstrap technique. Mol. Phylogenet. Evol. (1995) 4:44–63.