© 2007 Society of Systematic Biologists
Opinions on Multiple Sequence Alignment, and an Empirical Comparison of Repeatability and Accuracy between POY and Structural Alignment
Edited by Thomas Buckley, Associate Editor
1 Department of Ecology, Evolution and Natural Resources, Rutgers University, New Brunswick, New Jersey 08901, USA; E-mail: kjer@aesop.rutgers.edu
2 Department of Entomology, Texas A&M University, College Station, Texas 77843, USA; E-mail: pvittata@hotmail.com
3 Department of Biology, College of the Holy Cross, Worcester, Massachusetts 01610, USA; E-mail: kober@holycross.edu
Received September 15, 2005; Revised December 10, 2005; Accepted July 11, 2006
The concept of homology is pivotal to Darwin's paradigm of descent with modification. However, in molecular phylogenetics, the process of alignment is often overlooked as a critical step. The data in molecular phylogenetic studies are not individual sequences but rather columns of putatively homologous nucleotides or, arguably, the reconstructed, presumed homology pathways of direct optimization (see Wheeler, 1996). Put simply, alignment is the assignment of homology.
Although it would be unthinkable for morphologists to ignore issues of homology, many investigators do not carefully consider the alignment of molecular data. Some simply run their data through Clustal (Thompson et al., 1994) or another automated alignment program with default parameters, perhaps manually deleting "unalignable" regions. This is a mistake, because there are many examples in which alignments, and the assumptions that go into them, produce different trees (e.g., Mindell, 1991; Wheeler, 1995; Morrison and Ellis, 1997; Kjer, 1995; Hickson et al., 2000; Wheeler et al., 2001; Xia et al., 2003; Ogden and Whiting, 2003; Kjer, 2004).
Approaches to alignment for phylogenetic studies can be divided into two broad categories: manual alignment and computer-based alignment. A diagram of the types of alignment and their relationships to one another is provided in Figure 1. In a survey of phylogenetic papers published in Systematic Biology, Molecular Biology and Evolution, and Cladistics over the last 3 years, we found that 76% of the papers that used rRNA relied on manual alignment (Table 1). Automated methods can be performed in a variety of programs (e.g., Clustal [Thompson et al., 1994], Divide and Conquer [Stoye, 1998], and T-Coffee [Notredame et al., 2000]), and these and other programs have been discussed by others (e.g., Hickson et al., 2000; Gardner et al., 2005; Katoh et al., 2005). Automated alignments that are adjusted by hand should ultimately be considered manual, because the final homology decisions are in the eye of the investigator. Manual alignment can be further categorized as either "by eye" or structural. "By eye" alignments are typically performed without a consistent criterion and hence are considered by us to be not only subjective but also phenetic (with nucleotide similarity being the criterion, albeit one without easily demonstrated consistency). Ribosomal RNA structure-based alignments involve the use of published structural models (available from rRNA databases) combined with a search for compensatory base changes (e.g., Kjer, 1995), which provide evidence for conserved helical interactions that are to be aligned together (for a recent historical review see Noller, 2005). In short, computer alignments minimize change among nucleotides, whereas rRNA structural alignments minimize change among rRNA secondary structures.
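As a rough illustration of what a search for compensatory base changes looks for, the sketch below flags helix positions at which two aligned sequences both form a canonical or G-U pair while both partners have changed (for example, G-C in one taxon and A-U in the other). The helper name `compensatory_changes` and the pairing list are hypothetical; in practice the paired columns would come from a structural model such as those at the CRW site, and real structural alignment involves far more judgment than this check.

```python
# Canonical Watson-Crick pairs plus G-U wobble pairs, written in RNA letters.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def compensatory_changes(seq1, seq2, helix_pairs):
    """Return the (i, j) helix positions at which both aligned sequences can pair
    but both partners differ between them (e.g. G-C in one, A-U in the other).

    seq1 and seq2 are aligned sequences of equal length; helix_pairs lists
    0-based column indices assumed (hypothetically) to pair in a structural model.
    """
    s1 = seq1.upper().replace("T", "U")  # accept DNA or RNA letters
    s2 = seq2.upper().replace("T", "U")
    hits = []
    for i, j in helix_pairs:
        p1, p2 = (s1[i], s1[j]), (s2[i], s2[j])
        both_pair = p1 in PAIRS and p2 in PAIRS
        both_changed = p1[0] != p2[0] and p1[1] != p2[1]
        if both_pair and both_changed:
            hits.append((i, j))
    return hits


# A hypothetical 5-column hairpin in which columns 0 and 4 pair.
print(compensatory_changes("GAAAC", "AAAAU", [(0, 4)]))  # [(0, 4)]
print(compensatory_changes("GAAAC", "GAAAU", [(0, 4)]))  # [] (only one partner changed)
```

A change on only one side of a helix (G-C to G-U above) maintains pairing but is not counted, because it is weaker evidence that the two columns interact.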
All organisms have the same basic rRNA structure, yet the nucleotides involved in base pairing can vary a great deal among lineages. Thus, maximizing nucleotide identity as a means of establishing homology ignores a higher order of conservation. The nucleotide states themselves can make very little difference to ribosomal function, but the conservation of higher order structure is imperative. Computer programs like Clustal (Thompson et al., 1994) and POY (Gladstein and Wheeler, 1997) assume that the lowest evolutionary costs involve the retention of nucleotides. In other words, they assume that aligning nucleotides according to their states is both parsimonious and algorithmically less costly, and that shorter trees are better trees. However, we argue that this assumption is not justified in structurally conserved molecules, in which the states of the nucleotides are less important than the conserved structures with which they are intimately associated. Previous studies have shown that elucidating a secondary structure model can lead to more accurate phylogenetic inferences (e.g., Kjer, 1995; Hickson et al., 1996, 2000; Titus and Frost, 1996; Morrison and Ellis, 1997; Xia et al., 2001; Gowri-Shankar and Rattray, 2006). Another reason to consider secondary structure is that models that incorporate it are improving. MrBayes 3.0 (Ronquist and Huelsenbeck, 2003) includes a doublet model (Schöniger and von Haeseler, 1994), and the program PHASE (Jow et al., 2002; Hudelot et al., 2003; Jow et al., 2005) contains multiple models for pairing regions, including a seven-state model for RNA base pairs that offers a promising alternative to six- and sixteen-state models (e.g., Savill et al., 2001; Gibson et al., 2005; Gillespie, 2005). Similar reasoning for aligning by structure may be applied to protein sequences, but the codon organization of protein-coding genes makes their alignment a very different process from the alignment of rRNA, and it is not discussed in this paper.
When using default parameters in a phylogenetic study, it is important to consider whether these defaults were designed for protein-coding data or for RNAs, because the optima for these two kinds of data should be very different (Katoh et al., 2005).
All computer alignments require input parameters, and the objective selection of these parameters has been an active area of investigation. Among the most important input parameters is the cost of inserting a gap in the shorter of two sequences relative to the cost of a substitution (the gap cost-to-substitution cost ratio, or simply the gap cost). Under Needleman-Wunsch algorithms (Needleman and Wunsch, 1970), the computer assigns a numerical score to each candidate alignment, adding points for nucleotides that are the same and subtracting points for inserted gaps (two extremes are shown in Fig. 2). In both panels, the two sequences are identical except for the addition of a single nucleotide at the arrow. In Figure 2A, the gap cost is too low, and the program produces a trivial alignment in which each nucleotide in the bottom sequence is lined up with an identical nucleotide in the reference sequence. In Figure 2B, the gap cost is too high, and the program does not insert a gap. This exaggerated example is meant to introduce the idea that there may be some ideal gap cost-to-substitution ratio between these two obviously inappropriate extremes. Parameter selection will determine phylogenetic conclusions, and parameters may be selected by examining the sensitivity of phylogenetic conclusions to a variety of alignment parameters (Wheeler, 1995).
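The effect of the gap cost can be reproduced with a minimal Needleman-Wunsch sketch using a linear gap score. The scoring values (+1 for a match, -1 for a mismatch) are illustrative assumptions, not the defaults of Clustal or POY; POY phrases the problem as minimizing costs rather than maximizing a score, but the trade-off between substitutions and gaps is the same.

```python
def needleman_wunsch(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Global alignment with a linear gap score; returns (top, bottom, score)."""

    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align two bases
                score[i - 1][j] + gap,  # a[i-1] against a gap
                score[i][j - 1] + gap,  # b[j-1] against a gap
            )
    # Trace back one optimal path from the bottom-right cell.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


for g in (-0.1, -10.0):
    print(g, *needleman_wunsch("ACGT", "AGGT", gap=g))
```

With a gap score of -0.1, the optimal alignment replaces the single C/G mismatch with a pair of gaps, because two cheap gaps cost less than one mismatch; with -10, it keeps the mismatch and inserts no gaps.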
An appealing modification of computer-based alignment would be to simultaneously evaluate alignment and tree building, bypassing fixed hypotheses of homology by reconstructing ancestral sequences on multiple trees in a process now called "direct optimization" (Sankoff et al., 1973; Sankoff, 1975; Sankoff and Cedergren, 1983; Kruskal, 1983). Wheeler (1996) discusses direct optimization (or optimization alignment) and provides an implementation in the program POY (Gladstein and Wheeler, 1997). The optimal tree in POY is the one that requires the minimal number of character transformations (substitutions, insertions, and deletions) among all taxa, based on reconstructed ancestral sequences at each node of the cladogram. Hence, the program is rooted in the parsimony optimality criterion and offers a more precise implementation of the Sankoff-Morel-Cedergren method (Felsenstein, 2004). Similar methods of simultaneous alignment and tree estimation exist under Bayesian sampling (Mitchison, 1999) and distance methods (Hein, 1989, 1990). Redelings and Suchard (2005) provide a Bayesian method to simultaneously explore the joint posterior distribution of alignment and phylogeny, with software available at http://www.biomath.ucla.edu/msuchard/. Holmes (2004) discusses a probabilistic model of structural RNA evolution, following the model of Thorne et al. (1991; the TKF91 model). Lunter et al. (2005a) also used the TKF91 model to provide a joint analysis of alignment and phylogeny under a Bayesian statistical framework, and Lunter et al. (2005b) discuss likelihood approaches to direct optimization. POY also offers a likelihood implementation of direct optimization (not considered here) that treats indels as independent 5th states (Wheeler, 2006). Gorodkin et al. (1997), Mathews et al. (2002), and Perriquet et al. (2003) provide programs for predicting and aligning structural motifs.
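The quantity that direct optimization minimizes can be illustrated with a deliberately simplified sketch: if the ancestral sequences at the internal nodes were already known, tree length would simply be the sum of edit costs (substitutions plus single-nucleotide insertions and deletions) along every branch. The hard part that POY actually addresses, searching over ancestral sequences and trees, is omitted here; the tree, node labels, and ancestral sequences below are invented for illustration.

```python
def edit_cost(x, y, sub=1, indel=1):
    """Minimum cost of turning sequence x into y using substitutions and
    single-nucleotide insertions/deletions (linear indel cost)."""
    prev = [j * indel for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * indel] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = min(
                prev[j - 1] + (0 if x[i - 1] == y[j - 1] else sub),  # match or substitution
                prev[j] + indel,  # delete x[i-1]
                cur[j - 1] + indel,  # insert y[j-1]
            )
        prev = cur
    return prev[-1]


def tree_length(edges, seqs, sub=1, indel=1):
    """Sum of edit costs over all (parent, child) branches of a tree whose
    internal nodes already carry (hypothetical) ancestral sequences."""
    return sum(edit_cost(seqs[p], seqs[c], sub, indel) for p, c in edges)


# Tree ((A, B)X, C)R with invented ancestral sequences at R and X.
seqs = {"R": "ACGT", "X": "ACGGT", "A": "ACGGT", "B": "ACGGA", "C": "ACT"}
edges = [("R", "X"), ("X", "A"), ("X", "B"), ("R", "C")]
print(tree_length(edges, seqs))  # 3: one insertion, one substitution, one deletion
print(tree_length(edges, seqs, indel=2))  # 5: the two indels now cost 2 each
```

Changing `indel` changes the tree length, which is why, in direct optimization, the choice of gap cost feeds straight into which tree is judged optimal.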
Although these approaches may hold enormous promise, we limit our discussion to the parsimony implementation of the POY program because it is, by far, the most commonly used method of direct optimization, and because many of the other methods are limited by a simplistic model of insertions and deletions, by the number of taxa that can be aligned, or by both. Work will continue on the improvement of indel models (e.g., Miklós et al., 2004; Fleissner et al., 2005).
There has been intense disagreement over the relative merits of structural alignments and automated alignments. Papers such as Kjer (1995) and Wheeler (1995, 1996) provide ideas for discussion but do not test the most divergent approaches (structure versus POY) against one another. Moreover, two recent studies comparing POY to static methods (structure, by-eye, and automated) do not directly address the relative merits and drawbacks of both approaches (Shull et al., 2001; Belshaw and Quicke, 2002). A recent reanalysis of Belshaw and Quicke (2002) by Gillespie et al. (2005b) illustrated some pitfalls of POY when secondary structure is not used to guide the alignment; however, a rigorous comparison of both methods was not performed. The challenges in evaluating alignment approaches stem from both practical and philosophical sources. Carefully done alignments, both manual and computer-guided, require significant expertise and effort. Computer alignment requires extensive analysis on high-powered computing facilities just to select among a nearly infinite pool of required input options (parameters). Manual alignment can take weeks of labor at the computer screen. An ideal test would be to send a large number of data sets, each with taxa encrypted, to dozens of independent investigators for alignment by both manual and automated approaches. Comparing the results would help to answer which approach is more repeatable in terms of recovering similar phylogenetic trees. The presence of at least some highly corroborated nodes among the taxa in these "test data sets" would provide an added benefit, allowing accuracy, as well as repeatability, to be evaluated. It is difficult to evaluate alignment approaches objectively. Here we make an attempt.
Experimental Procedures
Complete mitochondrial large subunit ribosomal RNA (16S rRNA) sequences for 18 mammals were retrieved from GenBank. These taxa were selected because we consider the relationships among them to be highly corroborated at every node (for review see Hudelot et al., 2003), and the nodes of the expected tree (Fig. 3) are recovered in a combined analysis of complete mitochondrial genomes. The taxa were encrypted and their order was shuffled, with only the Bos taurus (cow) sequence identified. Each of us then aligned the taxa following the structural criteria of Kjer et al. (1994), Kjer (1995, 1997), and Hickson et al. (1996), using published structural models from the Comparative RNA Web (CRW) site (Cannone et al., 2002) as a guide. Ambiguously aligned nucleotides were coded as multistate characters and analyzed with the program INAASE (Lutzoni et al., 2000). We also each produced a phylogeny with POY 3.0.11, using the following command lines: "./poy 16S -exact -gap 1 -change 1 -extensiongap 1 -intermediate -noleading -replicates 10 -trailinggap 0 -stats -tbr -time" (Kjer), "./poy 16S -exact -gap 1 -extensiongap 1 -intermediate -noleading -replicates 10 -trailinggap 0 -stats -tbr -time" (Ober), and "./poy 16S -gap 1 -seed 1 -slop 5 -checkslop 10 -buildsperreplicate 10 -replicates 100 -impliedalignment > 16S.out" (Gillespie). Ober sent an example file to Kjer, which accounts for the similarities between their command lines. The specific instructions sent from Kjer to Gillespie and Ober were intentionally vague and are available on the Systematic Biology web site (http://systematicbiology.org) and on Kjer's web site (http://www.rci.rutgers.edu/~insects/kjer.htm). The web document also contains recommendations for making the alignment process easier.
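The blinding step (encrypting taxon names and shuffling their order while leaving one reference taxon identified) can be sketched as follows. The function name and the `taxon_01`-style coding scheme are hypothetical choices for illustration, not the procedure actually used to prepare the data sent to the authors.

```python
import random


def encrypt_taxa(sequences, keep=("Bos taurus",), seed=None):
    """Blind a data set: shuffle taxon order and replace every name not in
    `keep` with a neutral code. Returns (coded, key), where `coded` maps
    code -> sequence in shuffled order and `key` maps code -> real name."""
    rng = random.Random(seed)
    names = list(sequences)
    rng.shuffle(names)
    coded, key = {}, {}
    for index, name in enumerate(names, start=1):
        # Codes are numbered in shuffled order, so the numbers reveal nothing.
        code = name if name in keep else f"taxon_{index:02d}"
        coded[code] = sequences[name]
        key[code] = name
    return coded, key


data = {"Bos taurus": "AAAA", "Homo sapiens": "CCCC", "Mus musculus": "GGGG"}
coded, key = encrypt_taxa(data, seed=1)
print(list(coded))  # shuffled codes, with "Bos taurus" left visible
```

The `key` dictionary would be kept by whoever distributes the data so that the aligned results can be decoded afterwards.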
Results
The phylogenies produced by parsimony analysis of the structural alignments are shown in Figure 4, and the results of the POY analyses are shown in Figure 5. Our three structural alignments differed at some positions (see web resources: http://systematicbiology.org, or Kjer's web site), as reflected in their different branch lengths, bootstrap values, and tree lengths. However, all three structural alignments produced nearly the same tree topology, differing only at a single node with bootstrap support near 50% and a near-zero branch length (Fig. 3). Support for most nodes was similar.
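Differences among investigators' topologies can also be put on a numerical scale. One simple option, which is our suggestion and was not used in the study, is the Robinson-Foulds distance: the number of clades found in one tree but not in the other. The sketch below handles rooted trees written as nested Python tuples; the four-taxon trees are invented.

```python
def clades(tree):
    """Set of clades (frozensets of leaf names) in a rooted tree written as
    nested tuples, e.g. ((("A", "B"), "C"), "D")."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def rf_distance(tree1, tree2):
    """Robinson-Foulds distance: clades present in one tree but not the other."""
    return len(clades(tree1) ^ clades(tree2))


t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")
print(rf_distance(t1, t2))  # 2: {A, B} is only in t1, {A, C} is only in t2
print(rf_distance(t1, t1))  # 0
```

Under this rooted version, rearranging a single node contributes 2 (one clade missing from each tree), so trees that differ at one weakly supported node score low, whereas the POY trees discussed next could be compared on the same scale.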
The POY results were different. None of us converged on either the same parameters or the same tree (Fig. 5). Gillespie explored 10 different gap cost-to-change ratios, choosing not to raise the gap extension cost along with the gap cost. Ober produced seven trees, varying both gap costs and gap extension costs. Ober and Gillespie chose to present the variety of trees obtained from the different input parameters, opting not to favor one tree over another. Kjer felt it was important to present one tree and followed a strategy of minimizing ILD scores between the 16S and 12S rRNAs. The first region of parameter space that Kjer investigated was the gap cost-to-change ratio, with the extend cost (the cost of extending the length of an existing gap) set at half the gap cost. The ILD scores became successively worse as the gap cost was increased. The next step was to explore extend costs between 0.5 and 1, fixing the gap cost-to-change ratio at 1:1, which was the optimum from the first set of tests. None of us explored different transversion costs or any of the other input options available in POY. Would different investigators obtain the same tree using POY? Answer: No, at least not in this case.
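The ILD-minimizing strategy can be sketched as follows, assuming the incongruence index in the form we believe is used in Wheeler-style sensitivity analyses (Wheeler, 1995): the extra steps needed when the partitions are analyzed together, divided by the length of the combined tree. The tree lengths would have to come from separate POY runs for each parameter set; the numbers below are invented and are not results from this study.

```python
def ild(combined_length, partition_lengths):
    """Incongruence index: extra steps needed when partitions are analysed
    together, scaled by the combined tree length (0 means fully congruent)."""
    return (combined_length - sum(partition_lengths)) / combined_length


def best_by_ild(results):
    """results maps a (gap, change, extend) tuple to
    (combined_length, [length of each partition]); return the lowest-ILD tuple."""
    return min(results, key=lambda params: ild(*results[params]))


# Invented tree lengths for two parameter sets (not values from this study).
results = {
    (1, 1, 1): (100, [45, 50]),  # ILD = 5 / 100 = 0.05
    (2, 1, 1): (120, [50, 55]),  # ILD = 15 / 120 = 0.125
}
print(best_by_ild(results))  # (1, 1, 1)
```

Note that `best_by_ild` can only choose among the parameter sets that were actually run, which is the limitation discussed below for sensitivity analysis in general.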
Discussion
Parameter Selection and Repeatability
If parameter selection is a major determinant of phylogenetic conclusions derived from POY, and if the selection of these parameters is arbitrary, one would predict that those conclusions are less repeatable among investigators unless the arbitrary parameters are specified in advance. Sharing an alignment is similar to specifying the input parameters of a computer alignment. The ability to repeat a computer alignment from raw data when given the parameters is no more valuable than the ability to repeat an analysis from a publicly available NEXUS file of a manual alignment. We suspect that this is not what is meant when it is claimed that manual alignments are not repeatable. If the alignment is clearly presented, the analysis is repeatable. Claims that manual alignments are not repeatable probably stem from the likelihood that different investigators will produce alignments that are not identical. Our manual alignments were not identical, but neither were the POY results.
Because parameters like the gap cost influence results, objectivity in their assignment is desirable. One proposal for achieving this is to perform a sensitivity analysis over a variety of parameters and measure each tree against some external criterion: for example, a tree based on morphological characters, or minimal ILD scores between partitions. Other congruence measures have been proposed (Farris et al., 1995; Wheeler and Hayashi, 1998; Wheeler, 1999a, 1999b; Wheeler et al., 2006), but Aagesen et al. (2005) found that none of the congruence measures they explored was always preferable. Sensitivity analysis (Farris, 1969; Wheeler, 1995; Whiting et al., 1997) was designed to objectively optimize a variety of parameters for phylogenetic analysis, including gap cost. There are problems with sensitivity analysis as it is most commonly practiced. For example, Wheeler (1995) and Whiting et al. (1997) explored only the gap cost against the transition-to-transversion ratio. Many more parameters define a tree, such as the cost to extend a gap (extend cost), the penalty for gaps at the beginning and end of a sequence fragment, or how an RNA fragment is subdivided into constrained pieces. All these parameters must be set by the investigator, and a thorough search requires the simultaneous optimization of all parameters that determine phylogenetic conclusions. Although sensitivity analysis was proposed as a route to an objective, repeatable set of parameters, whether separate investigators would arrive at similar parameters has never been tested, and the answer depends on placing arbitrary bounds on an infinite parameter space. Here we provide an example of three investigators reaching different conclusions based on different parameters. Aagesen et al. (2005) reach similar conclusions regarding sensitivity analyses, but with different recommendations. Simmons (2004) discusses the problems with using congruence among characters to select among alignment parameters.
Grant and Kluge (2003) argue for the application of equal weights. Philosophically we agree with them, and our work shows the failure of sensitivity analyses to arrive at any predictable set of parameters. However, even when the same parameters were input, POY did not always produce the same tree. Kjer performed the 1:1:1 analysis three times with exactly the same parameters, twice recovering a tree of 2657 steps and once a different tree of 2646 steps (Fig. 5A). For these data, the TBR search with 10 replicates was clearly not sufficient. However, Kjer performed more than 27 such analyses in the search for optimal parameters, and even with only 18 taxa, this took a considerable amount of time. Gillespie performed 100 replicates and recovered the longest trees. The number of search replicates performed in POY is another important variable to consider.
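Why 10 replicates can be too few is easy to see under a simple probability model, assuming that each replicate independently finds the optimal tree with some probability `p_single`. This is an idealization: `p_single` is unknown in practice and differs among data sets and search settings, so the sketch only shows how quickly the required number of replicates grows as single-replicate success becomes rare.

```python
def prob_found(p_single, replicates):
    """Probability that at least one of `replicates` independent searches finds
    the optimal tree, if each search finds it with probability p_single."""
    return 1.0 - (1.0 - p_single) ** replicates


def replicates_needed(p_single, target=0.95):
    """Smallest number of independent replicates with prob_found >= target."""
    if not 0.0 < p_single <= 1.0:
        raise ValueError("p_single must be in (0, 1]")
    n = 1
    while prob_found(p_single, n) < target:
        n += 1
    return n


print(replicates_needed(0.5))  # 5
print(replicates_needed(0.1))  # 29
```

If a single replicate succeeds only one time in ten, 10 replicates give roughly a 65% chance of finding the best tree, which is consistent in spirit with Kjer's repeated 1:1:1 runs disagreeing.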
Figure 6 shows the results of a formal sensitivity analysis exploring gap costs and extension costs. Rather than a gradually sloped landscape with an unambiguous peak indicating the optimum for these parameters, we see a flat plain with a single sharp spike at 1:1:1 (gap cost:change cost:extend cost). Interestingly, the ILD value for 1:1:0.95 lay on the flat plain with the other values. This shows that, with this data set, it may not be sufficient to be close to an optimum; one needs to hit it exactly, which is difficult when there are infinitely many possibilities. Of course, we still do not know whether the peak found at 1:1:1 is optimal, only that it was the best among the parameter sets we examined. Other sensitivity analyses (e.g., Wheeler, 1995; Whiting et al., 1997; Terry and Whiting, 2005) have also revealed ambiguous optima in parameter/ILD landscapes. Terry and Whiting (2005) found that a sensitivity analysis with POY revealed less ambiguous optima than one with Clustal, although overall variance across the parameter/ILD landscape was greater with POY.
Wheeler (1995) and Giribet and Wheeler (1999) state that manual alignment violates a priori assumptions about gap costs. The idea that there is a "true cost" for gap insertions in a particular biological system (gene) is at the very heart of the sensitivity analysis issue. Kjer (1995) states that if regions within a gene vary in their permissiveness for gaps, then any fixed "ideal average gap penalty," even if it were objectively defined, would be inappropriate for some regions. Varying this "ideal average gap penalty" to improve regions that were inappropriately aligned in one iteration must produce a worse estimate in other regions. Length heterogeneity is most commonly encountered in unpaired regions of rRNA or at the tips of hypervariable helices (see van de Peer et al., 1993; Hickson et al., 1996; Gillespie, 2004), but usually not in conserved regions. Figure 7 provides a hypothetical example of how varying a series of fixed gap penalties merely moves the set of appropriately parameterized sites from one pool to another without necessarily increasing the size of that pool. By converting all of our nucleotide characters into As and all of our gaps into Cs, we used a two-state, unequal-state-frequency model to measure among-site rate variation on the expected tree in our aligned NEXUS file with PAUP version 4.0b10 (Swofford, 2001). The recovered gamma shape parameter (alpha) was 0.45, indicating that the concave curve in our cartoon in Figure 7 does approximate the shape of a curve describing variation among sites in indels, and therefore variation in gap costs. This points to a fundamental flaw in "sensitivity analysis" as it is most commonly applied in the search for optimal parameters, one that cannot be overcome by exploring a series of fixed (and therefore inappropriate) gap costs.
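The two-state recoding step can be sketched in a few lines. The helper names are hypothetical, and the choice to leave missing data (`?`) as missing is our assumption; the gamma-shape estimate itself was made in PAUP, and we do not reproduce PAUP commands here. The sketch only builds a recoded matrix and a minimal NEXUS DATA block that such a program could read.

```python
def recode_indels(alignment, gap="-"):
    """Two-state recoding: gaps become 'C', missing data ('?') stay missing,
    and every other character (a nucleotide) becomes 'A'."""
    return {
        taxon: "".join("C" if ch == gap else ("?" if ch == "?" else "A") for ch in seq)
        for taxon, seq in alignment.items()
    }


def to_nexus(matrix):
    """Minimal NEXUS DATA block for a recoded matrix (spaces in names -> underscores)."""
    ntax = len(matrix)
    nchar = len(next(iter(matrix.values())))
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={ntax} NCHAR={nchar};",
        "  FORMAT DATATYPE=DNA MISSING=?;",
        "  MATRIX",
    ]
    lines += [f"    {name.replace(' ', '_')}  {seq}" for name, seq in matrix.items()]
    lines += ["  ;", "END;"]
    return "\n".join(lines)


recoded = recode_indels({"Bos taurus": "AC-G", "Homo sapiens": "A--?"})
print(recoded)  # {'Bos taurus': 'AACA', 'Homo sapiens': 'ACC?'}
print(to_nexus(recoded))
```

After recoding, a low gamma shape estimate means that a few columns account for most of the indel changes, which is the pattern a single fixed gap cost cannot capture.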
In using an approach that varies parameters around some ideal, it is assumed that at least some (or one) of the analyses are appropriate. This is not the case with analyses that use a variety of fixed gap penalties. If all are inappropriate, finding an optimum of many inappropriate methods does not necessarily lead to a meaningful optimum. This can be examined in Figure 8. The top seven taxa are murine rodents, whereas the bottom five taxa represent whales, apes, birds, fish, and mollusks. Within region V7 of the 12S rRNA (van de Peer et al., 1999), variation can be observed even among closely related taxa. Helical strand 38' (synonymous with strand H1047 of E. coli) is length invariant across virtually all eukaryotes (Cannone et al., 2002). Obviously, the "cost" (in selective terms) of length heterogeneity is nowhere near evenly distributed across sites. Ideally, gap costs across this short stretch of rRNA should vary from one region to another. Manual alignment permits flexible and appropriate mental gap costs, although these gap costs are regrettably undefined. This region of mitochondrial 12S rRNA can be downloaded from GenBank for any set of taxa to see the same patterns of regional heterogeneity, which is characteristic of rRNA.
We accept that the exploration of competing analyses is useful; however, all of these explorations are flawed if gap insertion penalties are optimized as uniform parameters while the "biological" penalties vary across sites. Gap cost-to-change ratios are not as unambiguously defined as they seem. An improvement would be the implementation of a sliding window, in which the gap cost at each site was correlated with the costs at neighboring sites. However, in our experience (and by reasoning that is circular, in that we define them as such), the boundaries between conserved regions and length-heterogeneous regions are abrupt and structurally determined.
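A sliding-window scheme of the kind suggested above might look like the following sketch. Everything here is hypothetical: the function, the linear mapping from local gap frequency to cost, and the `base_cost` and `min_cost` values. Two caveats carry over from the text. First, estimating costs from an existing alignment is itself circular. Second, a window smooths costs across neighbors and would therefore blur the abrupt, structurally determined boundaries we observe.

```python
def windowed_gap_costs(alignment, window=3, base_cost=4.0, min_cost=0.5, gap="-"):
    """Hypothetical position-specific gap costs for an aligned data set.

    Each column's cost falls as the mean gap frequency in a window (assumed
    odd-sized) centred on it rises, so gaps are cheap in length-variable
    neighbourhoods and expensive in length-conserved ones."""
    seqs = list(alignment.values())
    ncol = len(seqs[0])
    # Fraction of taxa carrying a gap in each column.
    freq = [sum(s[c] == gap for s in seqs) / len(seqs) for c in range(ncol)]
    half = window // 2
    costs = []
    for c in range(ncol):
        lo, hi = max(0, c - half), min(ncol, c + half + 1)
        local = sum(freq[lo:hi]) / (hi - lo)  # mean gap frequency near column c
        costs.append(max(min_cost, base_cost * (1.0 - local)))
    return costs


aln = {"a": "ACGTAC--", "b": "ACGTA---", "c": "ACGTACGT"}
print([round(c, 2) for c in windowed_gap_costs(aln)])
```

In this toy alignment, the first columns keep the full cost of 4.0, while the gap-rich right-hand end drops to about 1.3.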
Because gap cost to change ratios are arbitrary (Vingron and Waterman, 1994; Kjer, 1995; Wheeler, 1996; Doyle and Davis, 1998; Phillips et al., 2000; Hickson et al., 2000), one would expect that two investigators analyzing an identical data set with POY will come up with different parameters (Morrison and Ellis, 1997; Hickson et al., 2000). Therefore, in reality, aligning with an a priori decision to use POY results in analyses that may be less consistent among investigators and no more repeatable than published manual alignments. In contrast, if rRNA is organized into a conserved secondary structure, then different investigators are more likely to come to similar phylogenetic conclusions because they are all using a homology criterion that is not arbitrary.
Nucleotide Bias, Phenetics, and Ancestral State Reconstruction
One of the critical flaws of POY is that it relies on reconstructing ancestral states using parsimony. Collins et al. (1994) provide convincing evidence from both simulations and empirical data that parsimony does a poor job of reconstructing ancestral states in the presence of nucleotide compositional bias and/or accelerated substitution rates, and Eyre-Walker (1998) provided a mathematical proof of this. Because reconstructing character states at ancestral nodes is precisely what POY does, nucleotide compositional bias must be examined extensively before the program is used. Concern about compositional bias and rate heterogeneity applies more generally to the calculation of tree lengths in any parsimony search; however, we are not arguing against computer searches in general. The problems of compositional bias and ancestral state reconstruction are specifically problematic for POY because of how it handles (and retains) the data within rRNA hypervariable regions, which frequently have rate and compositional properties that differ from those of the rest of the gene. Hypervariable regions in rRNA are commonly subject to compositional bias compounded by accelerated substitution rates (van de Peer et al., 1993). This issue is further confounded in the analysis of rRNAs from arthropods by an extreme AT-rich nucleotide bias in the mitochondrial genome (e.g., Crozier and Crozier, 1993) and in the variable regions of the nuclear rRNAs of some taxa (e.g., Chalwatzis et al., 1995, 1996; Gillespie et al., 2005a).
Regional nucleotide compositional bias provides another severe challenge to automated alignments. If homology cannot be reasonably asserted, then automated analyses of potentially randomized data will favor grouping organisms according to nucleotide compositional similarity, which is a phenetic approach. Note that in Figure 2, nucleotide compositional bias reduces the complexity of the sequences, favoring the misalignment of nonhomologous, A-rich regions. This illustrates why tree length should not be used as a measure of alignment accuracy. To examine whether automated searches are grouping organisms according to nucleotide composition, the recommendations of Lockhart et al. (1994) should be followed by checking whether there is a correlation between nucleotide compositional similarity (particularly within length-variable regions) and relationship. Both concerns about compositional bias can be minimized with structural alignments, in that unalignable regions can be objectively defined and either eliminated or treated with fragment-level analysis (Wheeler, 1999a; Lutzoni et al., 2000).
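A simple version of this check, which is our own sketch and not the exact procedure of Lockhart et al. (1994), computes base frequencies for each taxon, optionally restricted to length-variable columns. It then correlates pairwise compositional distances with a supplied tree-based distance between the same pairs. The tree distances in the usage lines are invented. A strong positive correlation, especially when concentrated in variable columns, would be a warning sign that composition may be driving the grouping, not proof that it is.

```python
from itertools import combinations
from math import sqrt

BASES = "ACGT"


def base_frequencies(seq, columns=None):
    """A, C, G, T frequencies over the chosen columns, ignoring gaps and ambiguity codes."""
    sites = seq if columns is None else [seq[c] for c in columns]
    counts = {b: 0 for b in BASES}
    for ch in sites:
        ch = ch.upper().replace("U", "T")
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    return [counts[b] / total if total else 0.0 for b in BASES]


def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither list is constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def composition_vs_tree(alignment, tree_distance, columns=None):
    """Correlate pairwise compositional distance (Euclidean distance between
    base-frequency vectors) with a supplied tree distance for each pair.
    tree_distance maps frozenset({taxon1, taxon2}) -> distance."""
    freqs = {t: base_frequencies(s, columns) for t, s in alignment.items()}
    pairs = list(combinations(sorted(alignment), 2))
    comp = [sqrt(sum((x - y) ** 2 for x, y in zip(freqs[a], freqs[b]))) for a, b in pairs]
    tree = [tree_distance[frozenset(p)] for p in pairs]
    return pearson(comp, tree)


aln = {"t1": "AAAA", "t2": "AAAT", "t3": "CGCG"}
dist = {frozenset({"t1", "t2"}): 1, frozenset({"t1", "t3"}): 3, frozenset({"t2", "t3"}): 3}
print(composition_vs_tree(aln, dist))
```

Passing the indices of length-variable columns through `columns` restricts the comparison to the regions where the text expects compositional bias to matter most.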
Gaps and Data Exclusion Criteria
There are two common alternatives to the treatment of gaps in phylogenetic analyses. Gaps are often scored as "missing data" or as a "5th state." Gaps are not missing data. Even though they are not strictly character states, but rather constructions of the alignment, they are the result of heritable events that can be homologous and often informative. There is no justification for excluding them (Cerchio and Tucker, 1998; Giribet and Wheeler, 1999). However, treating them as either 5th state characters or as independent events in a POY analysis is a very poor alternative. Figure 9 offers an example of why this is so. Wheeler (1995) recommends that we treat gaps as individual independent characters and weight them in a parsimony analysis exactly the same as the ratio between gap costs and change costs. Assume that the alignment in Figure 9 was recovered by using a gap cost–change ratio of 3:1. According to Wheeler's (1995) recommendation, we should consider five independent synapomorphies (numbered 1 to 5 above the sequence) between taxon B and taxon C with gaps as a 5th state. Yet there is no way to estimate how many indel events were responsible for the length difference between taxon B and taxon C. It is more parsimonious to assume a minimum of events, not a maximum. To make matters worse, by weighting these characters by the gap cost to change ratio, we find the equivalent of 15 independent synapomorphies linking B and C, all from what could have been the result of a single event (followed by additional indels), or worse, two convergent losses! Intuitively, the weight these characters should receive is inversely proportional to the length heterogeneity of the region. 
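The arithmetic behind the Figure 9 example can be made explicit. The sketch below uses a hypothetical sequence, not the one in Figure 9, with a single five-column gap. It contrasts scoring each gap column as an independent 5th state weighted by a 3:1 gap cost-to-change ratio with counting each contiguous run of gaps as one event. The per-sequence run count is only the minimum-event reading argued for in the text; it does not decide whether runs in different taxa are homologous, which more careful indel-coding schemes address.

```python
import re


def gap_runs(seq, gap="-"):
    """(start, end) spans of each contiguous run of gap characters (end exclusive)."""
    return [m.span() for m in re.finditer(re.escape(gap) + "+", seq)]


def gap_weight_summary(seq, gap_cost=3, gap="-"):
    """Contrast scoring each gap column as an independent, weighted 5th state
    with counting each contiguous gap run as a single indel event."""
    runs = gap_runs(seq, gap)
    columns = sum(end - start for start, end in runs)
    return {
        "gap_columns": columns,  # characters if every gap column is a 5th state
        "weighted_5th_state": columns * gap_cost,  # weighted by the gap:change ratio
        "minimum_indel_events": len(runs),  # one event per contiguous run
    }


print(gap_weight_summary("ACG-----TA"))
# {'gap_columns': 5, 'weighted_5th_state': 15, 'minimum_indel_events': 1}
```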
The problem is reduced in POY by the option of lowering the cost of additional gaps that are contiguous with preexisting gaps (the "extendcost" option), but unless the extend cost is zero, the nonindependence of contiguous deletions of multiple nucleotides will inflate the tree length and differentially overweight characters from hypervariable regions. Aagesen et al. (2005) show that accommodating the potential nonindependence of contiguous gaps with affine gap costs in POY improved topological congruence in all cases. Many other authors have dealt with this problem (e.g., Swofford, 1993; Baldwin et al., 1995; Hibbett et al., 1995; Kjer, 1995; Crandall and Fitzpatrick, 1996; Kretzer et al., 1996; Manos, 1997). Simmons and Ochoterena (2000) discuss gap coding at length and provide several options for treating gaps. A search-based extension of fixed-state optimization, not discussed here, is implemented in POY (Wheeler, 2003). Two papers (Wheeler, 1999a; Lutzoni et al., 2000) independently addressed the inclusion of ambiguously aligned regions through a procedure called fragment-level alignment (Lee, 2001; see Fig. 1). Conserved regions are used to delimit unalignable regions, which are then treated as "fixed states" in POY (FSO; Wheeler, 1999a) or as recoded states in static alignments with INAASE (Lutzoni et al., 2000). Transformations across these unalignable regions are then calculated via step matrices after an optimal alignment is performed. Precise use of structural information can "carve" these unalignable regions further into discrete blocks, which should improve these fragment-level methods (Gillespie, 2004). The implementation of affine gap costs in POY may be a promising compromise (Aagesen et al., 2005), but it would still be subject to the arbitrary nature of parameter selection.
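The effect of the extend cost can be seen by comparing a linear gap cost with an affine one. The sketch assumes one common affine convention: an opening cost, then a separate cost for each additional contiguous position. POY's internal definition of its gap and extension parameters may differ in detail, so treat the formula as an assumption. Only when the extension cost is zero does a long gap cost the same as a single-position gap, which is the condition under which contiguous deletions stop inflating tree length.

```python
def linear_gap_cost(length, gap=3.0):
    """Every gapped position is charged the full gap cost."""
    return gap * length


def affine_gap_cost(length, gap=3.0, extend=1.0):
    """Assumed convention: opening a gap costs `gap`, and each additional
    contiguous position costs `extend`."""
    return 0.0 if length == 0 else gap + extend * (length - 1)


for k in range(1, 6):
    print(k, linear_gap_cost(k), affine_gap_cost(k), affine_gap_cost(k, extend=0.0))
```

For a five-position gap, these settings give 15 under the linear cost, 7 under the affine cost with an extension cost of 1, and 3 when the extension cost is zero.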
Conclusions and Future Directions
In a reasoned methodology, a certain amount of subjectivity may be inevitable, but it is important both to minimize subjectivity and to call attention to it where it exists. For example, we find it inconsistent to reject structural alignment because it is "unrepeatable" while incorporating morphological data into combined POY analyses; although we support morphological analyses, we understand that no two morphologists are likely to produce identical data matrices. We object to the implementation of arbitrary methods that are justified primarily on grounds of objectivity and repeatability. That the assignment of parameters is arbitrary follows logically from the fact that gap costs are poorly understood and capricious, and therefore discretionary parameters will vary among investigators. The often-stated but never-tested claim that "...eyeball alignments may create inaccurate alignments..." is contrary to a wide variety of published confirmations of structural alignment, including Kjer (1995, 2004), Titus and Frost (1996), Schnare et al. (1996), Hickson et al. (1996, 2000), Lutzoni et al. (2000), Mugridge et al. (2000), Ellis and Morrison (1995), Morrison and Ellis (1997), Xia et al. (2001), and Gillespie et al. (2005b).
Previous comparisons of alignment methods have tested a single automated method, such as Clustal (Shull et al., 2001; Terry and Whiting, 2005) or MALIGN (Wheeler, 2000), against POY. Whiting et al. (2006) compared manually adjusted Clustal alignments with POY (both methods without data exclusion) and found the POY trees to be superior, as judged by likelihood scores or tree lengths. Structural alignments need to be a part of these tests. In their comparison of the performance of several computer alignment methods, Hickson et al. (2000) demonstrated that different programs and alignment parameters generate different alignments, which can in turn yield different phylogenetic trees. The problem stems from determining, a priori, parameter values that are appropriate for a particular data set and for disparate regions within the same data set. Hickson et al. (2000) concluded that the use of secondary structure reduces alignment ambiguity and is a valuable aid in producing biologically relevant rRNA alignments. Structural alignment of rRNA follows a long tradition in morphological cladistics, much as a morphologist counts vertebrae back from the skull, or examines connections, to infer homology. Direct optimization will work best when we have a reasonable model of how insertions and deletions occur, including their frequencies and possible interdependence (Lunter et al., 2005b). POY does not currently offer a realistic model of indel evolution.
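The parameter sensitivity reported by Hickson et al. (2000) can be reproduced in miniature. Below is a sketch of a cost-minimizing global alignment in the style of Needleman and Wunsch (1970); the two sequences and the costs (mismatch 2, gap 1 or 5) are hypothetical values chosen only for illustration, not parameters from any study discussed here.

```python
def align(s: str, t: str, gap: int, mismatch: int) -> tuple[int, str, str]:
    """Global alignment of s and t that minimizes total cost (linear gap cost)."""
    n, m = len(s), len(t)
    # D[i][j] = minimum cost of aligning s[:i] with t[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = min(D[i - 1][j - 1] + sub,   # align s[i-1] with t[j-1]
                          D[i - 1][j] + gap,       # gap in t
                          D[i][j - 1] + gap)       # gap in s
    # Trace back one optimal path from the bottom-right cell.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (
                0 if s[i - 1] == t[j - 1] else mismatch):
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))


for gap in (1, 5):
    cost, top, bottom = align("ACGT", "CGTA", gap=gap, mismatch=2)
    print(f"gap={gap} cost={cost}\n  {top}\n  {bottom}")
```

With a gap cost of 1 the optimal alignment is ACGT- over -CGTA (cost 2), which asserts that the C, G, and T positions are homologous; with a gap cost of 5 it is ACGT over CGTA (cost 8), which aligns every position with a different nucleotide. Nothing in the two sequences themselves indicates which gap cost to prefer.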
Other programs can assist with structural alignment. Free-energy minimization folding algorithms (e.g., Mathews et al., 1999; Zuker, 2003) can facilitate the search for putative structures, which must then be confirmed with comparative evidence. However, these folding algorithms must be interpreted carefully (see Doshi et al., 2004), because their predictions often conflict with comparative structure models (Hickson et al., 1996; Gutell et al., 2002), whose accuracy has been validated by recent crystal structures of the ribosome (e.g., Ban et al., 2000; Schluenzen et al., 2000; Wimberly et al., 2000). Some alignment programs incorporate structural information into the alignment procedure (e.g., Notredame et al., 1997; Gorodkin et al., 2001; Hofacker et al., 2004); however, most programs cannot accurately predict pseudoknots, conserved noncanonical base pairs, or tertiary interactions (Eddy, 2004). We prefer to treat these programs as useful tools whose output is still subject to manual confirmation, particularly in expansion segments and variable regions of rRNA alignments. We find the approaches of Misof and Fleck (2003), Holmes (2004), Niehuis et al. (2006), and Redelings and Suchard (2005) to be extremely promising. We do not support manual alignments for their own sake, but because they currently give us satisfactory results that are, in our opinion, both philosophically and operationally superior to POY. We have not yet thoroughly explored either the strengths or the limitations of other approaches. We predict that manual alignments will someday be replaced by computer methods that outperform them, although that day may be a long way off.
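The comparative evidence mentioned above comes largely from compensatory base changes. The following is a simplified sketch, assuming a hypothetical toy alignment and a pair of columns proposed by the investigator. It accepts only Watson-Crick and G-U pairs and counts changes relative to the first sequence rather than mapping them onto a tree, so it is a screening aid, not a substitute for the comparative methods cited here.

```python
# Watson-Crick pairs plus G-U wobble; noncanonical pairs are deliberately ignored.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(x: str, y: str) -> bool:
    return (x, y) in CANONICAL


def pair_evidence(alignment: list[str], i: int, j: int) -> tuple[int, int, int]:
    """Tally comparative support for pairing alignment columns i and j.

    Returns (sequences that can pair, compensatory changes, sequences that cannot pair).
    A change counts as compensatory when both nucleotides differ from the first
    (reference) sequence and the pair is still canonical.
    """
    ref_i, ref_j = alignment[0][i], alignment[0][j]
    paired = compensatory = unpaired = 0
    for seq in alignment:
        x, y = seq[i], seq[j]
        if can_pair(x, y):
            paired += 1
            if x != ref_i and y != ref_j:
                compensatory += 1
        else:
            unpaired += 1
    return paired, compensatory, unpaired


# Hypothetical aligned RNA fragments; columns 0 and 4 are the putative pair.
toy_alignment = ["GAAAC", "CAAAG", "AAAAU", "GAAAU", "CAAAA"]
print(pair_evidence(toy_alignment, 0, 4))
```

For these toy columns it reports four sequences that can pair, two compensatory changes (C-G and A-U replacing G-C), and one sequence (C-A) that cannot pair, which might reflect a noncanonical pair, a sequencing error, or a misaligned column.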
Because data sets are preserved in GenBank and on the Web, and results are always upgradable hypotheses, we suggest that data always be analyzed with the best methods available at the time, since they can be reanalyzed with superior methods if and when such methods become available.
Structural information and POY are not mutually exclusive. By constructing a structural alignment for input into POY, with spaces delimiting each stem, conserved unpaired region, and ambiguously aligned region, one could enforce structural constraints from the manual alignment while permitting POY to evaluate the ambiguously aligned regions (see Fig. 1). In effect, this is what some studies have suggested (Giribet and Ribera, 2000; Giribet, 2001) by subdividing rRNA into smaller and smaller pieces. Here we suggest that those pieces be explicitly identified by manual alignment according to structural criteria, as recently applied by Gillespie et al. (2005b).
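The block-delimiting step described above can be sketched as follows, assuming a hypothetical per-column annotation string written by the investigator from the structural alignment (S for stem, C for conserved unpaired, A for ambiguously aligned). The sketch only cuts the alignment into blocks; it does not produce any specific POY input format.

```python
from itertools import groupby


def split_blocks(annotation: str) -> list[tuple[str, int, int]]:
    """Return (label, start, end) for each run of identical labels; end is exclusive."""
    blocks, pos = [], 0
    for label, run in groupby(annotation):
        length = len(list(run))
        blocks.append((label, pos, pos + length))
        pos += length
    return blocks


def fragments(alignment: dict[str, str], annotation: str) -> dict[str, list[str]]:
    """Cut each aligned sequence into structural blocks.

    Stem ('S') and conserved unpaired ('C') blocks keep their aligned columns, so
    those homology statements stay fixed. Ambiguous ('A') blocks lose their gaps,
    leaving raw fragments for a dynamic method to evaluate.
    """
    blocks = split_blocks(annotation)
    out = {}
    for taxon, seq in alignment.items():
        if len(seq) != len(annotation):
            raise ValueError(f"{taxon}: annotation must cover every alignment column")
        out[taxon] = [
            seq[start:end].replace("-", "") if label == "A" else seq[start:end]
            for label, start, end in blocks
        ]
    return out


# Hypothetical hairpin: stem, conserved unpaired block, ambiguous loop,
# conserved unpaired block, stem.
annotation = "SSSCCAAAAACCSSS"
toy = {
    "taxon1": "GCAUGAAU--GAUGC",
    "taxon2": "GCGUGA-UUCGACGC",
}
for taxon, pieces in fragments(toy, annotation).items():
    print(taxon, pieces)
```

Stem and conserved blocks keep their gap characters, so their homology statements carry over unchanged, whereas the ambiguous loop is reduced to the unaligned fragments AAU and AUUC.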
It may be that some or even all of us did not use POY properly. One could imagine a parallel study that compared a masterful analysis with POY to a sloppy manual alignment. This is part of the problem with any comparison of approaches, especially with sample sizes too small to account for variance. However, we are not aware of any specific published recommendations on how a POY analysis should be objectively performed. Although we cannot definitively prove the superiority of one approach over another, we can show that the "all or none" argument, that computer alignment is objective and manual alignment is subjective, is false. Both require subjective decisions. Manual alignments are difficult and may be poorly done, especially if structure is not used or the sequences are extraordinarily hypervariable in length and substitution rate. But if the alignment is presented, then hypotheses of homology can be challenged and upgraded. Both methods would benefit from clear instructions on how best to perform analyses. Instructions for conducting structural alignment can be found in Kjer (1995) and Gillespie (2004), and a web page now provides a tutorial (jRNA web site: http://hymenoptera.tamu.edu/rna/). We believe that skill and experience will always play a role in systematics, and that people who are meticulous or experienced with alignment will probably produce better results.
Acknowledgments
We are grateful for the critical and helpful comments by Chris Simon and her phylogenetics class (UCONN), Rodney Honeycutt and Matt Yoder (TAMU), Usman Roshan (NJIT), and reviews by Rod Page, Thomas Buckley, David Morrison, and an anonymous reviewer. Their contributions immensely improved this manuscript. KMK acknowledges financial support from NSF DEB 0316504 and DEB 0430910.
Notes
4 Current Address: Bioinformatics Facility, Virginia Bioinformatics Institute, Washington Street, Virginia Institute of Technology, Blacksburg, Virginia, 24061, USA
References
Aagesen L., Petersen G., Seberg O. Sequence length variation, indel costs, and congruence in sensitivity analysis. Cladistics (2005) 21:15–30.
Baldwin B. G., Sanderson M. J., Porter J. M., Wojciechowski M. F., Campbell C. S., Donoghue M. J. The ITS region of nuclear ribosomal DNA: A valuable source of evidence on angiosperm phylogeny. Ann. Missouri Bot. Gard. (1995) 82:247–277.
Ban N., Nissen P., Hansen J., Moore P. B., Steitz T. A. The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science (2000) 289:905–920.
Belshaw R., Quicke D. L. J. Robustness of ancestral state estimates: Evolution of life history strategy in ichneumonoid parasitoids. Syst. Biol. (2002) 51:450–477.
Cannone J. J., Subramanian S., Schnare M. N., Collett J. R., D'Souza L. M., Du Y., Feng B., Lin N., Madabusi L. V., Muller K. M., Pande N., Shang Z., Yu N., Gutell R. R. The Comparative RNA Web (CRW) Site: An online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics (2002) 3:2. [Correction: BMC Bioinformatics 3:15.]
Cerchio S., Tucker P. Influence of alignment on the mtDNA phylogeny of Cetacea: Questionable support for a Mysticeti/Physeteroidea clade. Syst. Biol. (1998) 47:336–344.
Chalwatzis N., Baur A., Stetzer E., Kinzelbach R., Zimmermann R. K. Strongly expanded 18S ribosomal-RNA genes correlated with a peculiar morphology in the insect order of Strepsiptera. Zool. Anal. Complex Systems (1995) 98:115–126.
Chalwatzis N., Hauf J., Van de Peer Y., Kinzelbach R., Zimmermann R. K. 18S ribosomal-RNA genes of insects: Primary structure of the genes and molecular phylogeny of the Holometabola. Ann. Entomol. Soc. Am. (1996) 89:788–803.
Collins T. M., Wimberger P. H., Naylor G. Compositional bias, character-state bias, and character-state reconstruction using parsimony. Syst. Biol. (1994) 43:482–496.
Crandall K. A., Fitzpatrick J. F. Jr. Crayfish molecular systematics: Using a combination of procedures to estimate phylogeny. Syst. Biol. (1996) 45:1–26.
Crozier R. H., Crozier Y. C. The mitochondrial genome of the honeybee Apis mellifera: Complete sequence and genome organization. Genetics (1993) 133:97–117.
Doshi K. J., Cannone J. J., Cobaugh C. W., Gutell R. R. Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinformatics (2004) 5:105.
Doyle J. J., Davis J. I. Homology in molecular phylogenetics: A parsimony perspective. Pages 101–131. In: Soltis D. E., Soltis P. S., Doyle J. J., eds. Molecular systematics of plants II: DNA sequencing. (1998) Boston, Massachusetts: Kluwer Academic.
Eddy S. R. How do RNA folding algorithms work? Nat. Biotechnol. (2004) 22:1457–1458.
Ellis J., Morrison D. Effects of sequence alignment on the phylogeny of Sarcocystis deduced from 18S rDNA sequences. Parasitol. Res. (1995) 81:696–699.
Eyre-Walker A. Problems with parsimony in sequences of biased base composition. J. Mol. Evol. (1998) 47:686–690.
Farris J. S. A successive approximates approach to character weighting. Syst. Zool. (1969) 18:374–385.
Farris J. S., Källersjö M., Kluge A. G., Bult C. Testing significance of incongruence. Cladistics (1995) 10:315–319.
Felsenstein J. Inferring phylogenies (2004) Sunderland, Massachusetts: Sinauer Associates.
Fleissner R., Metzler D., von Haeseler A. Simultaneous statistical multiple alignment and phylogeny reconstruction. Syst. Biol. (2005) 54:548–561.
Gardner P. P., Wilm A., Washietl S. A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res (2005) 33:2433–2439.
Gibson A., Gowri-Shankar V., Higgs P. G., Rattray M. A comprehensive analysis of mammalian mitochondrial genome base composition and improved phylogenetic methods. Mol. Biol. Evol. (2005) 22:251–264.
Gillespie J. J. Structure-based methods for the phylogenetic analysis of ribosomal RNA molecules. Ph.D. Dissertation. (2005) College Station: Texas A&M University.
Gillespie J. J. Characterizing regions of ambiguous alignment caused by the expansion and contraction of hairpin-stem loops in ribosomal RNA molecules. Mol. Phylogenet. Evol. (2004) 33:936–943.
Gillespie J. J., McKenna C. H., Yoder M. J., Gutell R. R., Johnston J. S., Kathirithamby J., Cognato A. I. Assessing the odd secondary structural properties of nuclear small subunit ribosomal RNA sequences (18S) of the twisted-wing parasites (Insecta: Strepsiptera). Insect Mol. Biol. (2005a) 14:625–643.
Gillespie J. J., Yoder M. J., Wharton R. A. Predicted secondary structures for 28S and 18S rRNA from Ichneumonoidea (Insecta: Hymenoptera: Apocrita): Impact on sequence alignment and phylogeny estimation. J. Mol. Evol. (2005b) 61:114–137.
Giribet G. Exploring the behavior of POY, a program for direct optimization of molecular data. Cladistics (2001) 17:S60–S70.
Giribet G., Ribera C. A review of arthropod phylogeny: New data based on ribosomal DNA sequences and direct character optimization. Cladistics (2000) 16:204–231.
Giribet G., Wheeler W. C. On gaps. Mol. Phylogenet. Evol. (1999) 13:132–143.
Gladstein D. S., Wheeler W. C. POY: The optimization of alignment characters. In: Program and documentation. (1997) Available at ftp.amnh.org/pub/molecular.
Goldman N., Anderson J. P., Rodrigo A. G. Likelihood-based tests of topologies in phylogenetics. Syst. Biol. (2000) 49:652–670.
Gorodkin J., Heyer L. J., Stormo G. D. Finding the most significant common sequence and structure motifs in a set of RNA sequences. Nucleic Acids Res. (1997) 25:3724–3732.
Gorodkin J., Lyngsø R. B., Stormo G. D. A mini-greedy algorithm for faster structural RNA stem-loop search. Genome Informatics (2001) 12:184–193.
Gowri-Shankar V., Rattray M. On the correlation between composition and site-specific evolutionary rate: Implications for phylogenetic inference. Mol. Biol. Evol. (2006) 23:352–364.
Grant T., Kluge A. G. Data exploration in phylogenetic inference; scientific, heuristic, or neither. Cladistics (2003) 19:379–418.
Gutell R. R., Lee J. C., Cannone J. J. The accuracy of ribosomal RNA comparative structure models. Curr. Opin. Struct. Biol. (2002) 12:301–310.
Hein J. A method that simultaneously aligns, finds the phylogeny and reconstructs ancestral sequences for any number of ancestral sequences. Mol. Biol. Evol. (1989) 6:649–668.
Hein J. A unified approach to phylogenies and alignments. Methods Enzymol. (1990) 183:625–644.
Hibbett D. S., Fukumasa-Nakai Y., Tsuneda A., Donoghue M. J. Phylogenetic diversity in shiitake inferred from nuclear ribosomal DNA sequences. Mycologia (1995) 87:618.
Hickson R. E., Simon C., Cooper A., Spicer G. S., Sullivan J., Penny D. Conserved sequence motifs, alignment, and secondary structure for the third domain of animal 12S rRNA. Mol. Biol. Evol. (1996) 13:150–169.
Hickson R. E., Simon C., Perrey S. W. The performance of several multiple-sequence alignment programs in relation to secondary-structure features for an rRNA sequence. Mol. Biol. Evol. (2000) 17:530–539.
Hofacker I. L., Bernhart S. H. F., Stadler P. F. Alignment of RNA base pairing probability matrices. Bioinformatics (2004) 20:2222–2227.
Holmes I. A probabilistic model for the evolution of RNA structure. BMC Bioinformatics (2004) 5:166.
Hudelot C., Gowri-Shankar V., Jow H., Rattray M., Higgs P. RNA-based phylogenetic methods: Application to mammalian mitochondrial RNA sequences. Mol. Phylogenet. Evol. (2003) 28:241–252.
Jow H., Gowri-Shankar V., Guillard B. PHASE: A software package for phylogenetics and sequence evolution. In: Program and documentation. (2005) Available at http://www.cs.man.ac.uk/~gowrishv/beta-release/.
Jow H., Hudelot C., Rattray M., Higgs P. Bayesian phylogenetics using an RNA substitution model applied to early mammalian evolution. Mol. Biol. Evol. (2002) 19:1591–1601.
Katoh K., Kuma K., Miyata T., Toh H. Improvement in the accuracy of multiple sequence alignment program MAFFT. Genome Informatics (2005) 16:22–33.
Kjer K. M. Use of rRNA secondary structure in phylogenetic studies to identify homologous positions: An example of alignment and data presentation from the frogs. Mol. Phylogenet. Evol. (1995) 4:314–330.
Kjer K. M. An alignment template for amphibian 12S rRNA, domain III: Conserved primary and secondary structural motifs. J. Herpetol. (1997) 31:599–604.
Kjer K. M. Aligned 18S and insect phylogeny. Syst. Biol. (2004) 53:506–514.
Kjer K. M., Baldridge G. D., Fallon A. M. Mosquito large subunit ribosomal RNA: Simultaneous alignment of primary and secondary structure. Biochim. Biophys. Acta (1994) 1217:147–155.
Kretzer A., Li Y., Szaro T., Bruns T. D. Internal transcribed spacer sequences from 38 recognized species of Suillus sensu lato: Phylogenetic and taxonomic implications. Mycologia (1996) 88:776–785.
Kruskal J. B. An overview of sequence comparison. Pages 1–45. In: Sankoff D., Kruskal J. B., eds. Time warps, string edits, and macromolecules. (1983) Reading, Massachusetts: Addison-Wesley.
Lee M. S. Y. Unalignable sequences and molecular evolution. Trends Ecol. Evol. (2001) 16:681–685.
Lockhart P. J., Steel M., Hendy M. D., Penny D. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. (1994) 11:605–612.
Lunter G., Drummond A. J., Miklós I., Hein J. Statistical alignment: Recent progress, new applications, and challenges. Pages 375–405. In: Nielsen R., ed. Statistical methods in molecular evolution. (2005b) New York: Springer.
Lunter G., Miklós I., Drummond A., Jensen J. L., Hein J. Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinformatics (2005a) 6:83.
Lutzoni F., Wagner P., Reeb V., Zoller S. Integrating ambiguously aligned regions of DNA sequences in phylogenetic analyses without violating positional homology. Syst. Biol. (2000) 49:628–651.
Manos P. S. Systematics of Nothofagus (Nothofagaceae) based on rDNA spacer sequences (ITS): Taxonomic congruence with morphology and plastid sequences. Am. J. Bot. (1997) 84:1137–1155.
Mathews D. H., Sabina J., Zuker M., Turner D. H. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. (1999) 288:911–940.
Mathews D. H., Turner D. H. Dynalign: An algorithm for finding the secondary structure common to two RNA sequences. J. Mol. Biol. (2002) 317:191–203.
Miklós I., Lunter G. A., Holmes I. A "long indel" model for evolutionary sequence alignment. Mol. Biol. Evol. (2004) 21:529–540.
Mindell D. P. Aligning DNA sequences. Pages 73–89. In: Miyamoto M., Cracraft J., eds. Phylogenetic analysis of DNA sequences. (1991) New York: Oxford University Press.
Misof B., Fleck G. Comparative analysis of mt LSU secondary structure of Odonates: Structural variability and phylogenetic signal. Insect Mol. Biol. (2003) 12:535–547.
Mitchison G. J. A probabilistic treatment of phylogeny and sequence alignment. J. Mol. Evol. (1999) 49:11–22.
Morrison D. A., Ellis J. T. Effects of nucleotide sequence alignment on phylogeny estimation: A case study of 18S rDNAs of Apicomplexa. Mol. Biol. Evol. (1997) 14:428–441.
Mugridge N. B., Morrison D. A., Johnson A. M., Luton K., Dubey J., Votypka J., Tenter A. M. Phylogenetic relationships of the genus Frenkelia: A review of its history and new knowledge gained from comparison of large subunit ribosomal ribonucleic acid gene sequences. Int. J. Parasitol. (1999) 29:957–972.
Needleman S. B., Wunsch C. D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. (1970) 48:443–453.
Niehuis O., Naumann C. M., Misof B. Identification of evolutionary conserved structural elements in the mt SSU rRNA of Zygaenoidea (Lepidoptera): A comparative sequence analysis. Organ. Divers. Evol. (2006) 6:17–32.
Noller H. F. RNA structure: Reading the ribosome. Science (2005) 309:1508–1514.
Notredame C., Higgins D., Heringa J. T-Coffee: A novel method for multiple sequence alignments. J. Mol. Biol. (2000) 302:205–217.
Notredame C., O'Brien E. A., Higgins D. G. RAGA: RNA sequence alignment by genetic algorithm. Nucleic Acids Res. (1997) 25:4570–4580.
Ogden T. H., Whiting M. F. The problem with "the Paleoptera Problem": Sense and sensitivity. Cladistics (2003) 19:432–442.
Perriquet O., Touzet H., Dauchet M. Finding the common structure shared by two homologous RNAs. Bioinformatics (2003) 19:108–116.
Phillips A., Janies D., Wheeler W. Multiple sequence alignment in phylogenetic analysis. Mol. Phylogenet. Evol. (2000) 16:317–330.
Redelings B. D., Suchard M. A. Joint Bayesian estimation of alignment and phylogeny. Syst. Biol. (2005) 54:401–418.
Ronquist F., Huelsenbeck J. P. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics (2003) 19:1572–1574.
Sankoff D. Minimal mutation trees of sequences. SIAM J. Appl. Math. (1975) 28:35–42.
Sankoff D., Cedergren R. J. Simultaneous comparison of three or more sequences related by a tree. Pages 253–263. In: Sankoff D., Kruskal J. B., eds. Time warps, string edits, and macromolecules. (1983) Reading, Massachusetts: Addison-Wesley.
Sankoff D., Morel C., Cedergren R. J. Evolution of 5S RNA and the non-randomness of base replacement. Nat. New Biol. (1973) 245:232–234.
Savill N., Hoyle D., Higgs P. RNA sequence evolution with secondary structure constraints: Comparison of substitution rate models using maximum likelihood methods. Genetics (2001) 157:399–411.
Schluenzen F., Tocilj A., Zarivach R., Harms J., Gluehmann M., Janell D. Structure of functionally activated small ribosomal subunit at 3.3 Å resolution. Cell (2000) 102:615–623.
Schnare M. N., Damberger S. H., Gray M. W., Gutell R. R. Comprehensive comparison of structural characteristics in eukaryotic cytoplasmic large subunit (23S-like) ribosomal RNA. J. Mol. Biol. (1996) 256:701–719.
Schöniger M., von Haeseler A. A stochastic model and the evolution of autocorrelated DNA sequences. Mol. Phylogenet. Evol. (1994) 3:240–247.
Shull V. L., Vogler A. P., Baker M. D., Maddison D. R., Hammond P. M. Sequence alignment of 18S ribosomal RNA and the basal relationships of adephagan beetles: Evidence for monophyly of aquatic families and the placement of Trachypachidae. Syst. Biol. (2001) 50:945–969.
Simmons M. P. Independence of alignment and tree search. Mol. Phylogenet. Evol. (2004) 31:874–879.
Simmons M. P., Ochoterena H. Gaps as characters in sequence-based analyses. Syst. Biol. (2000) 49:369–381.
Stoye J. Multiple sequence alignment with the divide-and-conquer method. Gene (1998) 211:GC45–GC56.
Swofford D. L. PAUP 3: Phylogenetic analysis using parsimony. In: User manual. (1993) Urbana: University of Illinois.
Swofford D. L. PAUP*: Phylogenetic analysis using parsimony (*and other methods). Version 4.0b10. (2001) Sunderland, Massachusetts: Sinauer Associates.
Terry M. D., Whiting M. F. Comparison of two alignment techniques within a single complex data set: POY versus Clustal. Cladistics (2005) 21:272–281.
Thompson J. D., Higgins D. G., Gibson T. J. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res. (1994) 22:4673–4680.
Thorne J. L., Kishino H., Felsenstein J. An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol. (1991) 33:114–124.
Titus T., Frost D. R. Molecular homology assessment and phylogeny in the lizard family Opluridae (Squamata: Iguania). Mol. Phylogenet. Evol. (1996) 6:49–62.
Van de Peer Y., Neefs J.-M., De Rijk P., De Wachter R. Reconstructing evolution from eukaryotic small-ribosomal-subunit RNA sequences: Calibration of the molecular clock. J. Mol. Evol. (1993) 37:221–232.
Van de Peer Y., Robbrecht E., De Hoog S., Caers A., De Rijk P., De Wachter R. Database on the structure of small subunit ribosomal RNA. Nucleic Acids Res. (1999) 27:179–183.
Vingron M., Waterman M. S. Sequence alignment and penalty choice: Review of concepts, case studies and implications. J. Mol. Biol. (1994) 235:1–12.
Wheeler W. C. Sequence alignment, parameter sensitivity, and the phylogenetic analysis of molecular data. Syst. Biol. (1995) 44:321–331.
Wheeler W. C. Optimization alignment: The end of multiple sequence alignment in phylogenetics? Cladistics (1996) 12:1–9.
Wheeler W. C. Fixed character states and the optimization of molecular sequence data. Cladistics (1999a) 15:379–385.
Wheeler W. C. Measuring topological congruence by extending character techniques. Cladistics (1999b) 15:131–135.
Wheeler W. C. Heuristic reconstruction of hypothetical-ancestral DNA sequences: Sequence alignment versus direct optimization. Pages 106–113. In: Scotland R., Pennington R. T., eds. Homology and systematics. (2000) London: Systematics Society.
Wheeler W. C. Search-based optimization. Cladistics (2003) 19:348–355.
Wheeler W. C. Dynamic homology and the likelihood criterion. Cladistics (2006) 22:157–170.
Wheeler W. C., Hayashi C. The phylogeny of extant chelicerate orders. Cladistics (1998) 14:173–192.
Wheeler W. C., Ramirez M. J., Aagesen L., Schulmeister S. Partition-free congruence analysis: Implications for sensitivity analysis. Cladistics (2006) 22:256–263.
Wheeler W. C., Whiting M. F., Wheeler Q. D., Carpenter J. M. The phylogeny of the extant hexapod orders. Cladistics (2001) 17:113–169.
Whiting A. S., Sites J. W., Pellegrino K. C. M., Rodrigues M. T. Comparing alignment methods for inferring the history of the new world lizard genus Mabuya (Squamata: Scincidae). Mol. Phylogenet. Evol. (2006) 38:719–730.
Whiting M. F., Carpenter J. M., Wheeler Q. D., Wheeler W. C. The Strepsiptera problem: Phylogeny of the holometabolous insect orders inferred from 18S and 28S ribosomal DNA sequences and morphology. Syst. Biol. (1997) 46:1–68.
Wimberly B. T., Brodersen D. E., Clemons W. M. Jr., Morgan-Warren R. J., Carter A. P. Structure of the 30S ribosomal subunit. Nature (2000) 407:327–339.
Xia X., Xie Z., Kjer K. M. 18S ribosomal RNA and tetrapod phylogeny. Syst. Biol. (2003) 52:283–295.
Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. (2003) 31:3406–3415.