© 2004 Society of Systematic Biologists
Sampling Properties of the Bootstrap Support in Molecular Phylogeny: Influence of Nonindependence Among Sites
Edited by Nick Goldman: Associate Editor
CNRS UMR 5000, Génome, Populations, Interactions, Université Montpellier 2 CC 63, Place E. Bataillon, 34095 Montpellier, France; E-mail: galtier{at}univ-montp2.fr
| Abstract |
|---|
The influence of nonindependence among sites on phylogenetic reconstructions and bootstrap scores was investigated both analytically and empirically. First, the sampling properties of the bootstrap support in the four-species case was derived for the maximum-parsimony method, assuming either independently or nonindependently evolving sites. The influence of various models of departure from the independence assumption was quantified. Second, trees and bootstrap scores estimated from subsets of consecutive (potentially coevolving) versus dispersed (presumably independent) sites of a ribosomal RNA data set were contrasted. The two approaches consistently suggest that a departure from the assumption of independent sites tends to reduce the amount of phylogenetic information contained in the data, but to increase the apparent statistical support for reconstructed trees, as measured by the bootstrap. In particular, nonindependence can lead to strongly supported wrong internal branches.
Keywords: Bootstrap; coevolution; molecular phylogeny; ribosomal RNA; site independence
Received March 18, 2003; Revised June 11, 2003; Accepted September 14, 2003
Independence among sites is a fundamental assumption of virtually all methods used to assess molecular phylogeny, including model-based methods for reconstructing trees and methods for assessing the level of confidence in reconstructions, among which the bootstrap (Felsenstein, 1985) is by far the most widely used. However, biologists agree that nucleic acid and protein sites probably do not all evolve independently, mostly for functional reasons. The set of allowed (or probable) amino acid or nucleotide states at a certain site, for instance, might be constrained by the state at some other sites, e.g., neighboring sites. This is especially true for structural RNAs (e.g., ribosomal and transfer RNAs) whose secondary structure is stabilized by Watson–Crick interactions between pairs of ribonucleotides. Such paired nucleotides evidently coevolve through compensatory substitutions; only C:G, A:U, and possibly G:U pairs are allowed (Vawter and Brown, 1993; see Hickson et al., 1996, for exceptions). Specific Markov models of sequence evolution have been built to represent this peculiarity of ribosomal RNA (rRNA) data (Muse, 1995; Rzhetsky, 1995; Tillier and Collins, 1995, 1998). Compensatory substitutions also have been detected in protein-coding genes (Zhang and Rosenberg, 2002).
How much a departure from the assumption of independent evolution among sites will affect phylogenetic reconstructions (when it is neglected) was empirically addressed by Tillier and Collins (1995) when they simulated the evolution of nucleotide sequences under nonindependence and assessed the performance of the neighbor-joining (NJ; Saitou and Nei, 1987) and maximum-likelihood (ML; Felsenstein, 1981) methods. ML was more robust to nonindependence than was NJ. The performance of NJ was improved when a model accounting for nonindependent changes was used. The authors commented that a departure from the assumption of independence among sites is essentially equivalent to a reduction of the number of sites (Tillier and Collins, 1995, 1998), i.e., the phylogenetic signal is reduced because several sites carry the same (or correlated) information.
Another and perhaps more worrying consequence of nonindependent sites concerns the evaluation of the reliability of the reconstructed trees. Tillier and Collins (1995) noted that incorporation of nonindependence leads to an overestimation of the statistical confidence in the trees, as measured by Tajima's (1992) distance-based test. They argued that a similar overestimation of the level of confidence should apply for other measures, including the bootstrap, an argument previously made by Dixon and Hillis (1993). Nonindependence among sites, although decreasing the phylogenetic content of the data set, tends to increase its self-consistence, i.e., coevolving sites tend to "agree." Consider the extreme case of a 100-site data set that includes a subset of 90 fully linked sites (i.e., sites undergoing only simultaneous changes). These 90 sites are just as reliable as any of the remaining 10 sites, but an internal branch supported by the 90 linked sites would be given a high bootstrap support value (typically > 90%), whether the branch is correct or not.
This upward bias of the bootstrap support value, although strongly suggested by this example, has never been quantified theoretically or empirically, which might be the reason this bias has hardly ever been acknowledged when discussing the statistical support of trees reconstructed from, say, ribosomal versus protein data (although the former presumably violate the independence assumption to a larger extent than do the latter). Here, I examine the influence of departure from independence on bootstrap scores, both analytically (in the simple case of a four-species maximum parsimony reconstruction) and empirically (using rRNA data).
| Theory |
|---|
Definitions
Let A, B, C, and D be four taxa, and let T = ((A,B)(C,D)) be the true tree connecting these taxa. Let D be a four-sequence data set with n sites. The unweighted maximum-parsimony (MP) method will unambiguously reconstruct T if and only if the number n1 of sites of the form (A:X, B:X, C:Y, D:Y) is higher than n2 and n3, the number of sites of the form (A:X, B:Y, C:X, D:Y) and (A:X, B:Y, C:Y, D:X), respectively, where X and Y are any distinct character states. Sites not matching these three patterns are not informative. Their total number will be called n4, where the sum of ni for 1 ≤ i ≤ 4 is n. From the point of view of MP, D is fully described by the vector (n1, n2, n3, n4).
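The classification of sites into the four categories can be illustrated with a short Python sketch. The function names (`site_category`, `pattern_counts`, `mp_supports_true_tree`) are hypothetical and chosen only for this illustration; sequences are assumed to be equal-length, gap-free strings.

```python
from collections import Counter

def site_category(a, b, c, d):
    """Return the category (1 to 4) of one aligned site for taxa A, B, C, D."""
    if a == b and c == d and a != c:
        return 1  # (X, X, Y, Y): supports ((A,B),(C,D)), the true tree T
    if a == c and b == d and a != b:
        return 2  # (X, Y, X, Y): supports ((A,C),(B,D))
    if a == d and b == c and a != b:
        return 3  # (X, Y, Y, X): supports ((A,D),(B,C))
    return 4      # any other pattern is uninformative under parsimony

def pattern_counts(seq_a, seq_b, seq_c, seq_d):
    """Summarise a four-sequence alignment as the vector (n1, n2, n3, n4)."""
    counts = Counter(site_category(*column)
                     for column in zip(seq_a, seq_b, seq_c, seq_d))
    return tuple(counts.get(k, 0) for k in (1, 2, 3, 4))

def mp_supports_true_tree(counts):
    """True when n1 strictly exceeds both n2 and n3."""
    n1, n2, n3, _ = counts
    return n1 > n2 and n1 > n3
```

For example, `pattern_counts("AAAAC", "AGGAC", "GAGAT", "GGAAT")` classifies five sites, two of which fall in category 1.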
Consider the process of sequence evolution that gives rise to a data set D. Usually, such a process is modeled by defining branch lengths and a Markov transition matrix. The probability of occurrence of a certain data set D is a function of these parameters (Felsenstein, 1981). In the four-species MP case, what matters is just the relative proportions of the four categories of sites defined above. Let p1, p2, p3, and p4 be the expected proportions of the four categories of sites. The pi values sum to 1. They are functions (not given here) of the branch lengths and of the rate matrix. D = (n1, n2, n3, n4) follows a multinomial distribution M(n, p1, p2, p3, p4). With these notations, the probability that MP unambiguously supports T is

$$P(n_1 > n_2,\ n_1 > n_3) = \sum_{\substack{n_1+n_2+n_3+n_4 = n \\ n_1 > n_2,\ n_1 > n_3}} \frac{n!}{n_1!\,n_2!\,n_3!\,n_4!}\; p_1^{n_1} p_2^{n_2} p_3^{n_3} p_4^{n_4} \qquad (1)$$
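The probability that n1 exceeds both n2 and n3 under the multinomial model can be evaluated by direct enumeration when n is small. The sketch below is one way to do this, not the author's program; the function name `mp_success_probability` is hypothetical, all four proportions are assumed to be strictly positive, and the cubic cost makes it practical only for modest n.

```python
from math import exp, lgamma, log

def mp_success_probability(n, p):
    """Exact P(n1 > n2 and n1 > n3) for (n1..n4) ~ Multinomial(n, p).

    p holds four strictly positive proportions that sum to 1.
    """
    log_p = [log(x) for x in p]
    total = 0.0
    for n1 in range(1, n + 1):
        # n2 and n3 must stay strictly below n1 and within the remaining sites.
        for n2 in range(0, min(n1, n - n1 + 1)):
            for n3 in range(0, min(n1, n - n1 - n2 + 1)):
                n4 = n - n1 - n2 - n3
                log_term = (lgamma(n + 1)
                            - lgamma(n1 + 1) - lgamma(n2 + 1)
                            - lgamma(n3 + 1) - lgamma(n4 + 1)
                            + n1 * log_p[0] + n2 * log_p[1]
                            + n3 * log_p[2] + n4 * log_p[3])
                total += exp(log_term)
    return total
```

Working on the log scale avoids overflow in the factorials; for the 500-site cases discussed later, a Monte Carlo estimate is likely to be more convenient than this enumeration.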
The next step is to bootstrap data set D. Bootstrapping D means generating m pseudo–data sets, each one being obtained by resampling n sites with replacement out of the n sites in D. Let D*i = (b1i, b2i, b3i, b4i) be the ith pseudo–data set, where bji is the number of sites of category j in pseudo–data set i. Let Xi be the boolean variable defined by Xi = 1 if (b1i > b2i, b1i > b3i), Xi = 0 otherwise. Given D, Xi's are independent and identically distributed Bernoulli variables. The bootstrap support for true tree T is a random variable defined as

$$B = \frac{1}{m}\sum_{i=1}^{m} X_i \qquad (2)$$
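Because only the category of each site matters, the bootstrap can be run directly on the count vector. The following sketch assumes this shortcut; `bootstrap_support` is a hypothetical name, and the default of 1,000 replicates is illustrative.

```python
import random

def bootstrap_support(counts, m=1000, rng=None):
    """Fraction of m bootstrap pseudo-data sets in which category 1 strictly wins.

    counts is the vector (n1, n2, n3, n4) describing the observed data set.
    """
    rng = rng or random.Random()
    n = sum(counts)
    categories = (0, 1, 2, 3)
    hits = 0
    for _ in range(m):
        # Drawing n sites with replacement is equivalent to drawing n
        # categories with probabilities proportional to the observed counts.
        draw = rng.choices(categories, weights=counts, k=n)
        b1, b2, b3 = draw.count(0), draw.count(1), draw.count(2)
        hits += (b1 > b2 and b1 > b3)
    return hits / m
```

Passing a seeded `random.Random` instance makes a run reproducible.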
Sampling Properties of Bootstrap Support (Independent Sites)
In this section, I investigate the sampling properties of B as a function of n and the pi values under the assumption of independent sites. First consider the distribution of B conditional on certain data set D = (ni):
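[Equations 3–7 are not preserved in this copy. According to the next section, Equation 5 gives the expectation of B and Equation 7 its cumulative distribution.]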
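As a hedged sketch of what the definitions above imply (the symbol π and the layout are assumptions made here, not the article's own equations): given a data set (n1, n2, n3, n4), each Xi is a Bernoulli variable whose success probability is the multinomial probability that category 1 strictly wins in a resample. It follows that mB is binomial conditional on the data, and that the unconditional expectation averages this probability over all possible data sets.

```latex
% Assumed notation: \pi is the per-replicate success probability given the data,
% with the convention 0^0 = 1 for empty categories.
\begin{align*}
\pi(n_1,\dots,n_4) &= \Pr(X_i = 1 \mid n_1,\dots,n_4)
  = \sum_{\substack{b_1+b_2+b_3+b_4 = n \\ b_1 > b_2,\ b_1 > b_3}}
    \frac{n!}{b_1!\,b_2!\,b_3!\,b_4!} \prod_{j=1}^{4} \left(\frac{n_j}{n}\right)^{b_j}, \\
mB \mid (n_1,\dots,n_4) &\sim \mathrm{Binomial}\bigl(m,\ \pi(n_1,\dots,n_4)\bigr), \\
E[B] &= \sum_{n_1+n_2+n_3+n_4 = n} \frac{n!}{n_1!\,n_2!\,n_3!\,n_4!}
        \left(\prod_{j=1}^{4} p_j^{n_j}\right) \pi(n_1,\dots,n_4).
\end{align*}
```

The double sum in the last line, over data sets and over resamples, is what makes the exact calculation expensive for large n.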
Monte Carlo Approximations
Equations 5 and 7 give the expectation and cumulative distribution of the bootstrap support for the true tree in the four-species MP case under the assumption of independence among sites. Unfortunately, the complexity of the calculations (O(n^6) for Eq. 5) makes them intractable for large n; thus, a Monte Carlo approximation is used. Equation 5 can be approximated by

$$E[B] \approx \frac{1}{a}\sum_{k=1}^{a} E\left[B \mid \left(n_i^{(k)}\right)\right] \qquad (8)$$

where [ni(k)] (1 ≤ i ≤ 4) is a random multinomial deviate M(n, pi) and where a is any (preferably big) integer. In Equation 8, the conditional expectation E[B| (ni)] is averaged over a set of pseudo–data sets (ni) randomly sampled from their distribution. A further level of approximation can be reached:
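$$E[B] \approx \frac{1}{ab}\sum_{k=1}^{a}\sum_{l=1}^{b} X\left(b_i^{(k,l)}\right) \qquad (9)$$

where X equals 1 if b1 > b2 and b1 > b3 and 0 otherwise, and where [bi(k,l)] (1 ≤ i ≤ 4) is a compound multinomial deviate, i.e., a multinomial deviate M[n, ni(k)/n], where [ni(k)] (1 ≤ i ≤ 4) is a multinomial deviate M(n, pi). Similarly, Equation 7 will be approximated by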
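$$P(B \le x) \approx \frac{1}{a}\sum_{k=1}^{a} P\left(B \le x \mid \left(n_i^{(k)}\right)\right) \qquad (10)$$

where [ni(k)] (1 ≤ i ≤ 4) is a random multinomial deviate M(n, pi). In this study, a and b were fixed to 5,000.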
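The two-level sampling described above (draw a data set, then draw bootstrap replicates from it) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the program mentioned in the article: the names `multinomial_counts` and `expected_bootstrap_support` are hypothetical, and the defaults for a and b are far smaller than the 5,000 used in the study so that the pure-Python loop stays fast.

```python
import random

def multinomial_counts(rng, n, weights):
    """Draw category counts for n sites with the given (unnormalised) weights."""
    draw = rng.choices(range(len(weights)), weights=weights, k=n)
    return [draw.count(j) for j in range(len(weights))]

def expected_bootstrap_support(n, p, a=200, b=200, seed=0):
    """Nested Monte Carlo estimate of E[B] for the true tree.

    a data sets are drawn from Multinomial(n, p); for each one, b bootstrap
    replicates are drawn from Multinomial(n, observed counts / n).
    """
    rng = random.Random(seed)
    hits = 0
    for _k in range(a):
        data = multinomial_counts(rng, n, p)
        for _l in range(b):
            b1, b2, b3, _b4 = multinomial_counts(rng, n, data)
            hits += (b1 > b2 and b1 > b3)
    return hits / (a * b)
```

A call such as `expected_bootstrap_support(500, (0.075, 0.05, 0.05, 0.825))` corresponds to one of the settings discussed in the Results, although with these small defaults the estimate is noisier than one based on 5,000 draws at each level.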
Bootstrap Support for the MP Tree
The above equations approach the distribution of the bootstrap support for true tree T, usually unknown. The distribution of the bootstrap support for the MP tree might be more relevant in practice and is obtained by slight modifications of Equations 5 and 9. The condition of success for X moves from (b1 > b2 and b1 > b3) to [imax (b1, b2, b3) = imax (n1, n2, n3)], where imax returns the rank (1, 2, or 3) of the highest number within parentheses. The expectation becomes
[Equations 11 and 12, the correspondingly modified forms of Equations 5 and 9, are not preserved in this copy.]
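The modified success condition can be written as a small helper. In this sketch the treatment of ties is an assumption: the text does not say what imax returns when the maximum is shared, so the hypothetical `imax` below returns `None` in that case and a tied data set is counted as giving no unambiguous MP tree.

```python
def imax(x1, x2, x3):
    """Rank (1, 2 or 3) of the largest value, or None when the maximum is tied."""
    values = (x1, x2, x3)
    top = max(values)
    if values.count(top) > 1:
        return None  # assumption: a tie means no unambiguous winner
    return values.index(top) + 1

def supports_mp_tree(data_counts, boot_counts):
    """True when a bootstrap replicate favours the same topology as the data."""
    winner = imax(*data_counts[:3])
    return winner is not None and imax(*boot_counts[:3]) == winner
```

Replacing the test `b1 > b2 and b1 > b3` with `supports_mp_tree(data, replicate)` in a bootstrap loop then yields the support for the MP tree rather than for the true tree.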
Nonindependence Among Sites
I now introduce departures from the assumption of independence among sites. First consider an extreme case in which the data set D includes n/2 independent pairs of fully linked sites (i.e., the two sites of any given pair must belong to the same category). Such a data set includes the same amount of information as a data set with n/2 independent sites. The probability of correct unambiguous MP reconstruction is obtained by replacing n with n/2 in Equation 1. Similarly, the expected bootstrap support (Eq. 5) becomes
$$E[B] \approx \frac{1}{ab}\sum_{k=1}^{a}\sum_{l=1}^{b} X\left(b_i^{(k,l)}\right) \qquad (13)$$

where X equals 1 if b1 > b2 and b1 > b3 and 0 otherwise, where [bi(k,l)] (1 ≤ i ≤ 4) is a random multinomial deviate M(n, 2ni(k)/n), and where [ni(k)] (1 ≤ i ≤ 4) is a random multinomial deviate M(n/2, pi). The cumulative distribution probability of B under the hypothesis of n/2 pairs of fully linked sites can be obtained similarly. The modifications introduced in Equations 11 and 12 still apply here, allowing calculation of the distribution of the bootstrap support for the MP tree.

Other departures from independence among sites can be accommodated using equations similar to Equation 13 or their Monte Carlo approximations, as soon as a procedure for randomly generating D = (ni) under the desired model is available. Models investigated here include (1) n/2 pairs of fully linked sites, (2) q pairs of fully linked sites plus n–2q independent sites (q < n/2), and (3) a set of r fully linked sites plus n–r independent sites (r < n). A program implementing the above formulae is available on request.
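A procedure for generating count vectors under the three models of nonindependence could look like the sketch below. It is not the author's program; `sample_counts` and its parameters `pairs` (q) and `block` (r) are hypothetical names, and "fully linked" is taken to mean that all linked sites share a single randomly drawn category.

```python
import random

def sample_counts(rng, n, p, pairs=0, block=0):
    """Draw (n1, n2, n3, n4) for n sites, some of which are fully linked.

    pairs: number q of fully linked pairs (model 1 when q = n/2, model 2 otherwise).
    block: size r of a single group of fully linked sites (model 3).
    The remaining sites are independent draws from the proportions p.
    """
    if 2 * pairs + block > n:
        raise ValueError("linked sites exceed the total number of sites")
    categories = range(4)
    counts = [0, 0, 0, 0]
    for c in rng.choices(categories, weights=p, k=pairs):
        counts[c] += 2  # both sites of a pair fall in the same category
    if block:
        counts[rng.choices(categories, weights=p)[0]] += block  # one draw for the whole group
    for c in rng.choices(categories, weights=p, k=n - 2 * pairs - block):
        counts[c] += 1
    return counts
```

The bootstrap itself is unchanged: it still resamples n sites from the returned counts as if they were independent, which is exactly why linked sites inflate the apparent support.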
| Results |
|---|
The above equations make it possible to evaluate the statistical properties of the four-species MP method and of the bootstrap support under any model, where a model is determined by p1, the proportion of "good" sites, p2 and p3, the proportions of misleading sites, and n, the total number of sites.
Bootstrap Support and Probability of Success
The relationship between bootstrap support and MP probability of success was first investigated using Equations 1 and 11. The proportion p1 of "good" sites was varied between 0.05 and 0.12, and the proportions p2 and p3 of misleading sites were fixed to 0.05 (500 independent sites and 1,000 bootstrap replicates were used). When p1 = 0.05, the three possible topologies are (on average) equally supported by the data, and the support for true tree T becomes high for high values of p1. The extreme situations simulated here correspond to data sets expected under the Jukes–Cantor (1969) model of evolution along (1) a starlike tree, with an internal branch length of zero and terminal branch lengths of 0.21 substitutions/site (p1 = p2 = p3 = 0.05), and (2) a symmetric tree with an internal branch length of 0.16 and terminal branch lengths of 0.2 (p1 = 0.12, p2 = p3 = 0.05).
The expected bootstrap support for true tree T was plotted against the MP probability of success under the hypothesis of independent sites (Fig. 1a, open squares). The probability of success varies from 0.30 to nearly 1 when p1 varies from 0.05 to 0.12, as expected. The bootstrap support appears to be a good estimator of the probability of success for low and high values. For intermediate probabilities of success, the bootstrap support gives an underestimate of the actual value. If, for example, p1 were equal to 0.075, then 89% of the data sets would unambiguously support the true tree T, but the average bootstrap support for T (averaged over every possible data set) would be 77.5%.
When the expected bootstrap support for MP tree
Influence of Nonindependence Among Sites
Nonindependence was incorporated in the above analysis by switching from a data set of 500 independent sites to a data set of 1,000 sites, composed of 500 pairs of fully linked sites (where the two sites of a given pair must belong to the same category, either correct or misleading). Data sets generated this way contain the same amount of information as data sets of 500 independent sites and yielded an equal MP probability of success for a given p1 value. The bootstrap support (both of the true tree and the MP tree) was increased when redundancy was incorporated, as anticipated by Tillier and Collins (1995), but the increase was low: typically 0–5% for the expected bootstrap support of the true tree, E(B), and 0–10% for the expected bootstrap support of the MP tree (Fig. 1, solid squares).
Additional calculations were conducted to assess the potential influence of this upward bias on phylogenetic studies. The total number of sites was fixed to 500, p1 was set to 0.075, and p2 = p3 were set to 0.05. The level of departure from independence was varied by assuming that a certain fraction of the sites are fully linked pairs of sites. When this fraction was zero, the data sets met the independence assumption (500 independent sites). When the fraction was 1, the data set included 250 pairs of fully linked sites. Intermediate levels of nonindependence were also investigated, e.g., 100 pairs of fully linked sites plus 300 independent sites (when the proportion of paired sites is 0.4). Figure 2a displays the behavior of three variables as a function of the proportion of paired sites. As expected, the MP probability of success (open diamonds) decreases when the proportion of paired sites increases because the phylogenetic signal is reduced when sites are duplicated. The average bootstrap support for the MP tree, however, remains stable (solid circles). Although they are less informative, nonindependent data sets appear as self-congruent as independent data sets. The bootstrap support does not reflect the loss of phylogenetic information induced by nonindependence among sites. The third variable examined is the probability of MP success given that the bootstrap support is higher than some arbitrary threshold, namely 80%. Here, the question is how correct are the trees firmly supported by the bootstrap? This probability is close to 1 (0.99) for independent sites but drops to 0.92 when nonindependence is assumed: for data sets including 250 pairs of fully linked sites, an MP tree supported by > 80% bootstrap will be wrong with a probability of 0.08. Nonindependence increases the chances of obtaining high bootstrap supports for wrong trees.
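The third quantity, the probability that a strongly supported MP tree is correct, can be estimated by simulation. The sketch below is a rough illustration and not the calculation used for Figure 2a: all function names are hypothetical, ties in the data are treated as ambiguous and skipped, and the defaults for `n_data` and `m` are kept small for speed.

```python
import random

def draw_data(rng, n, p, pairs=0):
    """Counts (n1..n4) for `pairs` fully linked pairs plus n - 2*pairs independent sites."""
    counts = [0, 0, 0, 0]
    for c in rng.choices(range(4), weights=p, k=pairs):
        counts[c] += 2
    for c in rng.choices(range(4), weights=p, k=n - 2 * pairs):
        counts[c] += 1
    return counts

def resample(rng, counts):
    """One bootstrap pseudo-data set: n sites drawn with replacement from the data."""
    draw = rng.choices(range(4), weights=counts, k=sum(counts))
    return [draw.count(j) for j in range(4)]

def winner(counts):
    """Index (0, 1, 2) of the strictly best-supported topology, or None on a tie."""
    top = max(counts[:3])
    return counts[:3].index(top) if counts[:3].count(top) == 1 else None

def prob_correct_given_support(n, p, pairs=0, threshold=0.8,
                               n_data=200, m=100, seed=0):
    """Estimate P(MP tree is the true tree | its bootstrap support > threshold)."""
    rng = random.Random(seed)
    strong = correct = 0
    for _ in range(n_data):
        data = draw_data(rng, n, p, pairs)
        w = winner(data)
        if w is None:
            continue  # MP is ambiguous for this data set
        support = sum(winner(resample(rng, data)) == w for _ in range(m)) / m
        if support > threshold:
            strong += 1
            correct += (w == 0)  # index 0 is the true tree T
    return correct / strong if strong else float("nan")
```

Comparing `pairs=0` with `pairs=250` for n = 500 and p = (0.075, 0.05, 0.05, 0.825) mirrors the two ends of the range described above, though many more simulated data sets would be needed to resolve differences of a few percent.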
Another model of departure from the independence assumption was investigated (Fig. 2b). Here, a 250-site data set included a single group of fully linked sites. The size of this group was varied from zero (independent sites) to 25 (in which case each site of the group depends on the 24 others). The remaining sites were independent of each other. The influence of this kind of departure from independence appears higher than in the previous case. The bootstrap support increased when nonindependence was introduced, although the phylogenetic signal (MP probability of success) decreased. The probability that a well-supported MP tree is wrong increased from 4% (independent sites) to 24% (25 fully linked sites).
Similar calculations were performed assuming equal probabilities for the three categories of informative sites (p1 = p2 = p3 = 0.075), i.e., no phylogenetic signal (Fig. 3). The bootstrap support of the MP tree increased with nonindependence, although the probability of success of MP was, as expected, invariably equal to 1/3. The probability that the MP tree is supported by a ≥ 80% bootstrap value was approximately 0.15 in case of independent sites but reached 0.34 for data sets including highly correlated sites (Fig. 3, open squares). The apparent phylogenetic signal, as measured from bootstrap scores, was artificially increased by nonindependence.
Data Analysis
The above analysis is restricted to the MP method and to four-species data sets. An empirical analysis of rRNA data was conducted to examine the influence of nonindependence on bootstrap support in a more general case. The rationale of this study was to contrast phylogenetic reconstructions and bootstrap scores obtained from data subsets of consecutive (potentially departing the independence assumption) versus dispersed (presumably independent) sites of a large data set.
The aligned large subunit rRNA sequences of 40 eukaryote species were obtained from the European rRNA database (http://oberon.rug.ac.be:8080/rRNA). Species were chosen according to two criteria: sequences had to be as complete as possible, and the sampling of eukaryotic phyla had to be balanced (Fig. 4). Alignments were slightly modified by eye and then searched for conserved sequence tracts of 100 nucleotides. Ten such segments of 100 unambiguously aligned sites were found (after sites including gaps or undetermined nucleotides were removed). The remaining sites were discarded. A data set of 1,000 sites remained and was designated the full data set (available on request). An MP tree constructed from this data set was designated the full data tree.
Site subsets of the full data set were then considered. The first subsets were the 10 stretches of 100 consecutive sites, designated the WINDOW data subsets. The second group of subsets was obtained by sampling nonconsecutive sites from the full data set. Ten such data subsets were constructed, the ith of which included sites of ranks i, i+10, i+20, ..., i+990 in the full data set; these data subsets were designated MODULO. Therefore, the 10 WINDOW and 10 MODULO data subsets each consisted of 100 sites. For each WINDOW and MODULO data subset, an MP tree was built and the bootstrap supports of reconstructed internal branches were assessed from 500 replicates. The pooled WINDOW results were compared with the pooled MODULO results.
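The two sampling schemes reduce to simple index arithmetic. In the sketch below, the function names are hypothetical and indices are 0-based, so MODULO subset i holds positions i, i+10, ..., i+990 counted from zero rather than from one.

```python
def window_subsets(n_sites=1000, width=100):
    """Consecutive blocks of `width` sites (the WINDOW subsets)."""
    return [list(range(start, start + width))
            for start in range(0, n_sites, width)]

def modulo_subsets(n_sites=1000, step=10):
    """Every `step`-th site starting at offset i (the MODULO subsets)."""
    return [list(range(i, n_sites, step)) for i in range(step)]
```

Both functions partition the same 1,000 positions into 10 subsets of 100 sites, so any difference between the pooled results reflects only how the sites are arranged along the molecule.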
Under the assumption of independent sites, the location of sites in the molecule should not matter, and the two categories of data subsets should behave similarly. However, this was not the case. The accuracy of tree reconstructions was first measured by comparing the trees obtained from data subsets with the full data tree. This comparison is not an absolute measure of accuracy because the full data tree is probably not the same as the true tree. However, this comparison gives information about the potential spatial structure of the phylogenetic information of the data set. The average proportion of "correct" internal branches (i.e., internal branches shared by the full data tree) was higher for the MODULO (0.292) than for the WINDOW (0.189) data subsets (NJ analysis). The phylogenetic signal appears lower in subsets of consecutive sites than in subsets of dispersed sites.
Is this difference reflected by lower average bootstrap support for WINDOW than for MODULO data sets? Results are summarized in Table 1, separating the bootstrap supports of "correct" (i.e., shared by the full data tree) versus "incorrect" internal branches. Internal branches reconstructed from WINDOW data subsets have higher average bootstrap support than do those reconstructed from MODULO data subsets. Nine "incorrect" internal branches were supported by bootstrap scores > 60% in WINDOW reconstructions, whereas three such strongly supported "incorrect" branches arose from MODULO data subsets. Consistent results were obtained when the NJ method (Saitou and Nei, 1987; Kimura's, 1980 distances) rather than MP was used to build trees (Table 2), indicating that this problem is not restricted to MP. The average bootstrap support was substantially higher in the NJ than in the MP analysis; therefore, the threshold for "strongly supported" internal branches was set to 80%, not 60%, in the NJ analysis. Similar results were obtained when the NJ tree was constructed using distances calculated under the assumption of gamma-distributed rates across sites (Jin and Nei, 1990; assumed gamma shape parameter = 0.5; results not shown).
Subsets of consecutive sites appear to contain a lower phylogenetic signal than subsets of dispersed sites, yet they yield a higher average bootstrap score and a higher number of strongly supported dubious internal branches. These results are fully consistent with the predictions of the above theoretical study and suggest that neighboring rRNA sites do not evolve independently and that the departure from independence has some impact on tree reconstructions and bootstrap scores.
This conclusion was confirmed by an additional analysis involving a third kind of data subset called DOUBLED. These data subsets were constructed by introducing nonindependence in the MODULO subsets. A number (k) of sites (0 < k ≤ 50) were randomly sampled (without replacement) out of the 100 sites of each MODULO subset. Each of these k sites was then duplicated. Then, 100–2k sites (randomly sampled without replacement from the remaining 100–k sites of the MODULO data set) were added, resulting in 10 DOUBLED data subsets. This procedure was followed twice, using k = 25 (third column of Tables 1, 2) and k = 50 (fourth column of Tables 1, 2), respectively. The DOUBLED data subsets were analyzed as described above. When 25 sites had been duplicated so that half of the sequence length corresponds to paired sites, doubled dispersed sites appeared to mimic the behavior of consecutive sites (Tables 1, 2). The proportion of "correct" internal branches, the average bootstrap support for "correct" and "incorrect" branches, and the number of strongly supported "incorrect" branches in the DOUBLED (k = 25) data subsets were essentially similar to those obtained from the WINDOW data subsets. Doubling 50 sites (so that all sites of the data sets are paired) resulted in an excess of "incorrect" branches and an elevated average bootstrap support (Table 2) compared with the WINDOW data subsets. The data, therefore, appear compatible with roughly half of the sites being paired, which is typically the approximate proportion of stems in an rRNA molecule.
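The DOUBLED construction can be sketched as follows. The function name `doubled_subset` is hypothetical, and the input is assumed to be a list of distinct site indices such as one MODULO subset.

```python
import random

def doubled_subset(sites, k, rng):
    """Build a DOUBLED subset with the same length as `sites`.

    k sites are sampled without replacement and each is included twice;
    len(sites) - 2k further sites are sampled without replacement from the rest.
    """
    duplicated = rng.sample(sites, k)
    chosen = set(duplicated)
    remaining = [s for s in sites if s not in chosen]
    singles = rng.sample(remaining, len(sites) - 2 * k)
    return duplicated + duplicated + singles
```

With 100 input sites, k = 25 gives 50 duplicated positions plus 50 single ones, and k = 50 gives a subset made entirely of duplicated pairs.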
| Discussion |
|---|
The statistical properties of the bootstrap support in the simple four-species MP case were derived analytically (following Zharkikh and Li, 1992a, 1992b) and extended to the nonindependent case. A Monte Carlo numerical approximation was proposed to overcome the high complexity of the actual formula, which allows quick examination of the behavior of the bootstrap score under various conditions without simulating sequence data. Each data point of Figures 1, 2, and 3 typically required 10–15 minutes of computing time on a standard PC.
Whether these equations can be extended to more than four species or to other tree-building methods is a challenging issue. This extension appears feasible in theory but probably not in practice. In any case, the bootstrap score for certain internal branches is a function of a compound multinomial distribution. In the four-species MP case, the multinomial has four classes, and the function simply involves calculating the maximum of the number of outcomes in three of the four classes. Adding species would increase the number of relevant site categories, i.e., the number of multinomial classes. Using distance-based or likelihood methods with complex evolutionary models would also increase this number (because sites AAGG and AACC, for example, would have to be distinguished) and would involve a function much more complex than the maximum over certain categories. Given the already high complexity of the formulae in the four-species MP case, it seems unlikely that an analytical approach could be useful in practice in more complex cases. Simulations (Hillis and Bull, 1993) or empirical analyses (this study) appear more appropriate.
When various forms of departure from independence among sites were incorporated, the probability of MP success decreased, but this drop of phylogenetic signal was not reflected by bootstrap scores; the bootstrap support of the MP tree stagnated, or even increased, under nonindependence in agreement with the prediction of Tillier and Collins (1995). Nonindependence tends to increase the probability of strongly supported wrong reconstructions and tends to generate apparent signal when there is none. Overall, departure from independence results in an overestimation of the confidence to be put in reconstructions, as measured by the bootstrap.
An analysis of rRNA data was fully consistent with these theoretical results. Subsets of consecutive sites (potentially not independent) yielded less accurate phylogenetic reconstructions (as measured from the full data set) but higher bootstrap scores than did subsets of dispersed sites. Incorporating nonindependence by duplicating a certain proportion of dispersed sites mimicked the behavior of consecutive sites.
Should we, therefore, worry about past and future interpretations of bootstrap scores? Probably not too much because the influence of nonindependence on the bootstrap score appears slight. The overestimation was always < 10% in Figure 1. The number of strongly supported dubious internal branches in WINDOW data subsets also was low. Given the highly empirical use of bootstrap scores in current practice (i.e., what should be considered a significant bootstrap score?), accounting for nonindependence effects probably would not have changed much of the molecular phylogeny literature, although nonindependence occasionally might have encouraged misleading interpretations. This additional methodological bias should be kept in mind when conducting a molecular phylogenetic analysis.
However, the WINDOW versus MODULO contrast is conservative with respect to detecting coevolutionary effects. Some rRNA stems are formed by paired RNA segments more distant from each other (in the primary structure) than the arbitrary 100-nucleotide threshold used in this study. Such long-distance interacting sites fall in distinct WINDOW data sets but might co-occur in MODULO data sets, lowering the power of the analysis.
The consecutive versus dispersed sites approach appears to be a suitable empirical way to roughly quantify the amount of nonindependence in actual data sets. Application to various protein data sets, for example, might yield valuable insights about actual processes of sequence evolution. However, the methodology must be improved. In particular, it is unclear when the behavior of two kinds of data sets (e.g., WINDOW vs. MODULO) is significantly different because the statistics used in this study (proportion of "correct" internal branches, average bootstrap support) involve averaging over nonindependent measures; distinct branches of a reconstructed tree and their bootstrap scores are not independent from each other.
| Acknowledgments |
|---|
This work was supported by the Genopole Montpellier Languedoc-Roussillon and the French Appel d'offre Bioinformatique Inter-EPST.
| References |
|---|
Berry V., Gascuel O. Interpretation of bootstrap trees: Threshold of clade selection and induced gain. Mol. Biol. Evol. (1996) 13:999–1011.
Dixon M. T., Hillis D. M. Ribosomal RNA secondary structure: Compensatory mutations and implications for phylogenetic analysis. Mol. Biol. Evol. (1993) 10:256–267.
Efron B., Halloran E., Holmes S. Bootstrap confidence levels for phylogenetic trees. Proc. Natl. Acad. Sci. USA (1996) 93:7085–7090.
Felsenstein J. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. (1981) 17:368–376.
Felsenstein J. Confidence limits on phylogenies: An approach using the bootstrap. Evolution (1985) 39:783–791.
Felsenstein J., Kishino H. Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Syst. Biol. (1993) 42:193–200.
Hickson R. E., Simon C., Cooper A., Spicer G. S., Sullivan J., Penny D. Conserved sequence motifs, alignment, and secondary structure for the third domain of animal 12S rRNA. Mol. Biol. Evol. (1996) 13:150–169.
Hillis D. M., Bull J. J. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analyses. Syst. Biol. (1993) 42:182–192.
Jin L., Nei M. Limitations of the evolutionary parsimony method of phylogenetic analysis. Mol. Biol. Evol. (1990) 7:82–102.
Jukes T. H., Cantor C. R. Evolution of protein molecules. In: Mammalian protein metabolism (Munro H. N., ed.). (1969) New York: Academic Press. pp. 21–132.
Kimura M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. (1980) 16:111–120.
Muse S. V. Evolutionary analyses of DNA sequences subject to constraints on secondary structure. Genetics (1995) 139:1429–1439.
Rzhetsky A. Estimating substitution rates in ribosomal RNA genes. Genetics (1995) 141:771–783.
Saitou N., Nei M. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. (1987) 4:406–425.
Tajima F. Statistical method for estimating the standard errors of branch lengths in a phylogenetic tree reconstructed without assuming equal rates of nucleotide substitution among different lineages. Mol. Biol. Evol. (1992) 9:168–181.
Tillier E. R. M., Collins R. A. Neighbor joining and maximum likelihood with RNA sequences: Addressing the inter-dependence of sites. Mol. Biol. Evol. (1995) 12:7–15.
Tillier E. R. M., Collins R. A. High apparent rate of simultaneous compensatory base-pair substitutions in ribosomal RNA. Genetics (1998) 148:1993–2002.
Vawter L., Brown W. M. Rates and patterns of base change in the small subunit ribosomal RNA gene. Genetics (1993) 134:597–608.
Zhang J. Z., Rosenberg H. F. Complementary advantageous substitutions in the evolution of an anti-viral RNase of higher primates. Proc. Natl. Acad. Sci. USA (2002) 99:5486–5491.
Zharkikh A., Li W.-H. Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. I. Four taxa with a molecular clock. Mol. Biol. Evol. (1992a) 9:1119–1147.
Zharkikh A., Li W.-H. Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. II. Four taxa without a molecular clock. J. Mol. Evol. (1992b) 35:356–366.