© 2006 Society of Systematic Biologists
Nematode Small Subunit Phylogeny Correlates with Alignment Parameters
Edited by Karl Kjer: Associate Editor
1 Department of Nematology, University of California, One Shields Avenue, Davis, California 95616, USA; E-mail: smythea{at}si.edu (A.B.S.)
2 Section of Ecology and Evolution, University of California, One Shields Avenue, Davis, California 95616, USA
Abstract
The number of nuclear small subunit (SSU) ribosomal RNA (rRNA) sequences for Nematoda has increased dramatically in recent years, and although their use in constructing phylogenies has also increased, relatively little attention has been given to their alignment. Here we examined the sensitivity of the nematode SSU data set to different alignment parameters and to the removal of alignment ambiguous regions. Ten alignments were created with CLUSTAL W using different sets of alignment parameters (10 full alignments), and each alignment was examined by eye and alignment ambiguous regions were removed (creating 10 reduced alignments). These alignment ambiguous regions were analyzed as a third type of data set, culled alignments. Maximum parsimony, neighbor-joining, and parsimony bootstrap analyses were performed. The resulting phylogenies were compared to each other by the symmetric difference distance tree comparison metric (SymD). The correlation of the phylogenies with the alignment parameters was tested by comparing matrices from SymD with corresponding matrices of Manhattan distances representing the alignment parameters. Differences among individual parsimony trees from the full alignments were frequently correlated with the differences among alignment parameters (580/1000 tests), as were trees from the culled alignments (403/1000 tests). Differences among individual parsimony trees from the reduced alignments were less frequently correlated with the differences among alignment parameters (230/1000 tests). Differences among majority-rule consensus trees (50%) from the parsimony analysis of the full alignments were significantly correlated with the differences among alignment parameters, whereas consensus trees from the reduced and culled analyses were not correlated with the alignment parameters. 
These patterns of correlation confirm that choice of alignment parameters has the potential to bias the resultant phylogenies for the nematode SSU data set, and suggest that the removal of alignment ambiguous regions reduces this effect. Finally, we discuss the implications of conservative phylogenetic hypotheses for Nematoda produced by exploring alignment space and removing alignment ambiguous regions for SSU rDNA.
Keywords: Multiple alignment; Nematoda; phylogeny reconstruction; ribosomal RNA
Received April 8, 2005; Revised June 28, 2005; Accepted August 3, 2006
The first attempt at constructing a comprehensive phylogeny of Nematoda using molecular phylogenetic methods utilized the nuclear small-subunit (SSU) ribosomal RNA (rRNA) gene to infer the relationships of 53 species (Blaxter et al., 1998). This study produced a phylogenetic hypothesis that has been widely cited and used to examine issues such as the evolution of parasitism (Dorris et al., 1999), morphological evolution (Zhang et al., 2001), and the systematics and taxonomy of nematodes (De Ley and Blaxter, 2002). Since the Blaxter et al. (1998) phylogeny was published, the number of near-complete or complete nematode SSU sequences in GenBank (Benson et al., 2004) has increased dramatically. This larger data set provides an opportunity to more broadly explore the phylogeny of nematodes and examine a critical issue that previous studies have given little attention: the effect of multiple sequence alignment on phylogenetic trees for Nematoda.
Blaxter et al. (1998) created their alignment of 53 species by hand with respect to a secondary structure model (Ellis et al., 1986). Ribosomal RNA secondary structure is increasingly and successfully used to inform alignments and phylogenies (e.g., Kjer, 1995; Hickson et al., 2000; Goertzen et al., 2003; Xia et al., 2003; Kjer, 2004; Telford et al., 2005), but for multiple alignments of numerous sequences, secondary structure methods cannot yet be fully implemented by computer programs. Wheeler (1999) and Lutzoni et al. (2000) independently suggested a method to recode alignment ambiguous characters (whether delimited by secondary structure or not) for inclusion in parsimony analyses. Gillespie (2004) combined this recoding approach with secondary structure information, suggesting that rRNA molecules be dissected into pairing and nonpairing regions that are assigned to different classes upon recoding. Recent software that incorporates structural information, such as PHASE (Hudelot et al., 2003) and the doublet model of MrBayes (Huelsenbeck and Ronquist, 2001; Ronquist and Huelsenbeck, 2003), is also expanding opportunities to use structural information in phylogenetic analyses. Despite these advances, structural alignment of RNA remains challenging and time consuming for many data sets (Gardner et al., 2005).
Wheeler (1996) implemented an alternative approach, termed "direct optimization" or "optimization alignment," which optimizes an alignment and tree simultaneously (Sankoff et al., 1973, 1975; Kruskal et al., 1983; Sankoff and Cedergren, 1983), in contrast to the more common practice of aligning first and building a tree afterward. This method, implemented in the program POY (Wheeler et al., 2003), uses a parsimony-based optimization procedure to produce a phylogenetic tree directly from nucleotide sequence data without constructing a traditional multiple sequence alignment. Direct optimization has been used to construct phylogenies for a variety of taxa (e.g., Damgaard et al., 2005; Bertelli and Giannini, 2005; Nishiguchi and Nair, 2003), but it has been criticized for not producing an alignment that would allow, for example, detection of problematic sequences or examination of the contribution of ambiguously aligned sequences (Lutzoni et al., 2000). These criticisms have been somewhat mitigated by the "implied alignment" procedure, by which the homologies implied by direct optimization can be extracted and represented as a traditional multiple alignment (Wheeler, 2003; Giribet, 2005). However, the value of implied alignments has also been questioned because they do not represent the data from which the phylogeny was constructed (Kjer, 2004).
A common alternative to secondary structure alignment or direct optimization is a computer program that creates an alignment by optimizing a pattern-matching criterion with a mathematical algorithm. For example, the widely used alignment program CLUSTAL W (Thompson et al., 1994) uses a guide tree built from pairwise distances to direct the progressive alignment of sequences. A recent comparison of multiple sequence alignment programs found that CLUSTAL W performed well when RNA sequence identity exceeded 50% to 60%, but that below this identity structural information was required to improve accuracy (Gardner et al., 2005). CLUSTAL W seeks the optimal alignment according to user-specified penalties for opening a gap (gap opening penalty, GOP) and for extending the length of a gap (gap extension penalty, GEP). The choice of gap penalties is essentially arbitrary, because there is no analytical way to determine what values these penalties should take (Rinsma et al., 1993). DeSalle et al. (1994) suggested exploring multiple alignment space with an incremental series of alignment parameters to find appropriate penalties for particular data sets. There is, however, no consensus on how to choose the optimal alignment from a series of different alignments. In this study we explore in detail the effect of different CLUSTAL W alignment methodologies, including varied alignment parameters, on phylogenetic inference for Nematoda.
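To make the role of the two penalties concrete, the following minimal Python sketch scores a fixed pairwise alignment under a simple affine gap model. It is an illustration under stated assumptions, not CLUSTAL W's scoring scheme (which also adjusts penalties by position, residue, and sequence divergence): we assume a gap of length L costs GOP + GEP × (L − 1), a match scores +1, and a mismatch −1. The function name `affine_score` and the toy sequences are hypothetical.

```python
def affine_score(row_a, row_b, gop, gep, match=1, mismatch=-1):
    """Score two aligned rows; assumes a gap of length L costs gop + gep * (L - 1)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    gap_in_a = gap_in_b = False  # currently extending a gap in row_a / row_b?
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            continue  # all-gap column contributes nothing
        if x == "-":
            score -= gep if gap_in_a else gop
            gap_in_a, gap_in_b = True, False
        elif y == "-":
            score -= gep if gap_in_b else gop
            gap_in_a, gap_in_b = False, True
        else:
            score += match if x == y else mismatch
            gap_in_a = gap_in_b = False
    return score


# Two candidate alignments of the same hypothetical pair of sequences (ACGTA vs AGA)
two_short_gaps = ("ACGTA", "A-G-A")  # 3 matches, two 1-base gaps
one_long_gap = ("ACGTA", "AG--A")    # 2 matches, 1 mismatch, one 2-base gap

for gop, gep in [(2, 1), (6, 1)]:
    print(gop, gep, affine_score(*two_short_gaps, gop, gep), affine_score(*one_long_gap, gop, gep))
# 2 1 -1 -2   (the alignment with two short gaps scores higher)
# 6 1 -9 -6   (the alignment with one longer gap scores higher)
```

Raising GOP relative to GEP shifts the optimum toward fewer, longer gaps, which is one way to see why different parameter settings can yield different homology statements for the same sequences.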
Regardless of how an alignment is constructed, a common practice is to exclude ambiguously aligned regions from subsequent phylogenetic analysis (Gatesy et al., 1993; Castresana, 2000). The justification for excluding characters is that certain alignment regions contain questionable statements of positional homology that may increase phylogenetic noise and provide support for spurious relationships (Gatesy et al., 1993; Wheeler, 1995). Löytynoja and Milinkovitch (2001) excluded such regions from their analysis of deep eukaryote phylogeny, noting that the more conserved regions remaining were less likely to produce artifactual results caused by fast-evolving sites that were potentially misaligned. In their study of Nematoda, Blaxter et al. (1998) examined an alternative alignment in which every position containing a gap was removed and reported "nearly identical results" that did not change relationships among the deepest nodes in the nematode tree. Our study scrutinizes more rigorously the effect of removing alignment ambiguous regions on tree inference for the nematode SSU rRNA data set. We aim to investigate whether tree topologies for Nematoda are influenced by the choice of alignment parameters, and to assess which major groups of nematodes are reliably supported by the SSU data set.
We approach this investigation of alignment sensitivity by creating 10 multiple alignments ("full" alignments) of 191 nematode SSU sequences (plus outgroups), each with a different set of alignment parameters. Alignment ambiguous regions are removed from these alignments, creating 10 additional data sets ("reduced" alignments). The alignment ambiguous regions removed from each alignment are also analyzed on their own, creating another 10 data sets ("culled" alignments). Phylogenetic analyses are performed on all 30 data sets, and the resulting trees are compared to each other with a tree distance metric and tests of matrix correlation. Through these comparisons, we examine the influence that different alignment parameters and the exclusion of ambiguous regions have on the resulting phylogenetic hypotheses.
Materials and Methods
Alignment Construction
One hundred eighty complete or nearly complete (1640 base pairs) nematode SSU rDNA sequences were downloaded from GenBank, in addition to four outgroup taxa: a priapulid worm, Priapulus caudatus (Priapulida); a rotifer, Brachionus plicatilis (Rotifera); and two gordian worms, Chordodes morgani and Gordius aquaticus (Nematomorpha). The gordian worms represent the sister group to Nematoda according to molecular phylogenies (Aguinaldo et al., 1997; Peterson and Eernisse, 2001). An additional 11 unpublished nematode sequences were included (all GenBank accession numbers are listed in Appendix 1). These 191 nematode taxa represent all known major lineages of nematodes, but taxon sampling is much more thorough for terrestrial and parasitic nematodes than for the more speciose marine and freshwater taxa. Alignments for these 195 taxa (191 nematodes plus four outgroups) were created with CLUSTAL W 1.8 (Thompson et al., 1994) on a dual-processor Linux computer. Ten alignments were created with different GOP and GEP values. These values were chosen to provide a range of absolute values and to vary the ratio between GOP and GEP; GOP values ranged from 2 to 6 times the GEP values (Table 1). These 10 alignments were termed "full" alignments, and analyses based on them used all sites without regard to potential alignment ambiguity.
The 10 full alignments were independently examined by eye with the alignment editor Se-Al v.2.0a11 (Rambaut, 1996) to assess which regions appeared to include ambiguously aligned sites. A sequence region was initially selected as ambiguously aligned if at least 30% of the taxa did not share the same nucleotide or gap state (Hillis and Dixon, 1991). A block of sequence to be excluded was defined by expanding outward in both directions from the ambiguously aligned region until reaching conserved flanking sequence, defined as an invariant site or a site containing only transition substitutions. We examined the only published secondary structure model for a nematode SSU rRNA (Ellis et al., 1986) and a secondary structure model for Drosophila melanogaster (Cannone et al., 2002) to assess which structural regions were removed by our method. This method removed sites from both stems and loops, primarily from the variable regions V1–V7 (Ellis et al., 1986), except for V3, from which no sites were removed. Data sets excluding alignment ambiguous regions were termed "reduced" alignments; one such alignment was produced for each of the 10 "full" alignments. For each full alignment, the alignment ambiguous regions (those excluded to create the reduced alignment) were used to create a third type of data set, the "culled" data set. Alignments are available from TreeBASE (http://www.treebase.org).
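The exclusion rule above was applied by eye in Se-Al. The Python sketch below automates one reading of it, purely for illustration: a column is flagged when at least 30% of taxa differ from the column's most common state (gaps counted as a state), and each flagged column is expanded in both directions until an anchor column is reached (invariant, or differing only by A/G or C/T transitions). Interpreting "did not share the same state" relative to the modal state is our assumption, as are the function names and the toy alignment; the sketch also assumes uppercase A, C, G, T, and "-" only.

```python
from collections import Counter

PURINES, PYRIMIDINES = {"A", "G"}, {"C", "T"}


def is_ambiguous(column, threshold=0.30):
    """True if at least `threshold` of taxa differ from the most common state (gap = a state)."""
    modal_count = Counter(column).most_common(1)[0][1]
    return (len(column) - modal_count) / len(column) >= threshold


def is_anchor(column):
    """Conserved flanking site: invariant, or varying only by transitions (A<->G or C<->T)."""
    states = set(column)
    return len(states) == 1 or states <= PURINES or states <= PYRIMIDINES


def excluded_columns(alignment, threshold=0.30):
    """Expand each ambiguous column outward until an anchor site is reached on each side."""
    columns = ["".join(row[i] for row in alignment) for i in range(len(alignment[0]))]
    excluded = set()
    for i, col in enumerate(columns):
        if not is_ambiguous(col, threshold):
            continue
        start, end = i, i + 1
        while start > 0 and not is_anchor(columns[start - 1]):
            start -= 1
        while end < len(columns) and not is_anchor(columns[end]):
            end += 1
        excluded.update(range(start, end))
    return excluded


def split_alignment(alignment, excluded):
    """Return (reduced, culled): the columns kept versus the columns removed."""
    keep = [i for i in range(len(alignment[0])) if i not in excluded]
    cull = sorted(excluded)
    reduced = ["".join(row[i] for i in keep) for row in alignment]
    culled = ["".join(row[i] for i in cull) for row in alignment]
    return reduced, culled


toy = ["AAAG", "ACAG", "AGAG", "ATCG"]  # hypothetical 4-taxon, 4-column alignment
cut = excluded_columns(toy)
print(sorted(cut), split_alignment(toy, cut))
# [1, 2] (['AG', 'AG', 'AG', 'AG'], ['AA', 'CA', 'GA', 'TC'])
```

In the toy case, column 1 (ACGT) is ambiguous; column 2 (AAAC) is not ambiguous but also not an anchor, so it is swept into the same block, which stops at the invariant columns on either side.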
Phylogenetic Analyses
Phylogenetic analyses were performed with PAUP* 4.0b10 (Swofford, 2003) on a multinode Linux AMD computer cluster, running a single analysis per processor. Heuristic maximum parsimony searches used 1000 replicates of random taxon addition and tree bisection-reconnection (TBR) branch swapping, with a time limit of 30 s imposed on each addition-sequence replicate. Fifty percent and 70% majority-rule (MR) consensus trees were made from the parsimony analyses of each of the 10 full, reduced, and culled alignments. Parsimony bootstrap analyses used 1000 replicates, each with 10 replicates of random taxon addition, TBR branch swapping, a time limit of 30 s on each bootstrap replicate, and the "Multrees" option not in effect (saving only one tree per bootstrap replicate). Each bootstrap analysis was repeated 10 times, and the resulting 10,000 bootstrap trees were pooled. For each of the 10 full, reduced, and culled alignments, bootstrap MR consensus trees (50% and 70%) were constructed from the pool of 10,000 bootstrap trees. To test whether differences in bootstrap results and consensus tree resolution between full and reduced alignments were due simply to the smaller number of characters sampled, we performed "limited" bootstrap analyses, which restrict the number of characters resampled in each replicate with the "Nchar = number of characters" command in PAUP*. Limited bootstrap analyses were performed on all 10 full alignments, each resampling only as many characters as the corresponding reduced alignment contained. Gaps were treated as missing data in all analyses. Neighbor-joining (NJ) analyses were also performed on all full, reduced, and culled alignments to obtain fully dichotomous phylogenetic trees.
These NJ analyses used the HKY85 DNA distance (Hasegawa et al., 1985) because it provides a relatively simple model that nonetheless allows for unequal base frequencies and different transition and transversion substitution rates.
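The limited bootstrap can be pictured as ordinary column resampling in which each pseudoreplicate is shorter than the alignment it is drawn from. The Python sketch below is a conceptual stand-in for what PAUP*'s Nchar option accomplishes, not PAUP*'s code; `bootstrap_replicate` and the toy alignment are hypothetical. Setting `nchar` to the length of the corresponding reduced alignment gives a full-alignment replicate of the same size as a reduced-alignment replicate.

```python
import random


def bootstrap_replicate(alignment, nchar=None, rng=random):
    """Draw columns with replacement; `nchar` caps the replicate length (a 'limited' bootstrap)."""
    ncol = len(alignment[0])
    k = ncol if nchar is None else nchar
    picks = [rng.randrange(ncol) for _ in range(k)]
    return ["".join(row[i] for i in picks) for row in alignment]


full = ["ACGTACGTAC", "ACGTTCGTAC", "AGGTACGAAC"]  # hypothetical 3-taxon, 10-column alignment
rep = bootstrap_replicate(full, nchar=6, rng=random.Random(42))
print([len(r) for r in rep])  # [6, 6, 6]
```

Comparing resolution from such shortened replicates with the reduced-alignment bootstrap separates the effect of having fewer characters from the effect of which characters were removed.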
Evaluating Tree Resolution
To quantitatively compare levels of resolution for parsimony trees and bootstrap parsimony trees, the mean consensus fork index (CFI) (Colless, 1980) was calculated for each analysis. CFI values range from 1 (completely resolved) to 0 (completely unresolved). The CFI (normalized for the number of taxa) is calculated in PAUP* with the "ConTree [tree list]/indices = yes" command. To obtain the CFI of each tree in a collection of trees (e.g., all most parsimonious trees), a script in the programming language Python 2.2 (van Rossum and Drake, 2001) generated a series of ConTree commands for PAUP*, one for each individual tree, and another Python 2.2 script parsed the resulting consensus indices. These scripts, and all other Python 2.2 scripts used in this study, are available as supplemental material on the Systematic Biology website (http://systematicbiology.org). The CFI of each tree was determined, and the mean and standard deviation over all trees were calculated. We also evaluated the mean CFI for each bootstrap analysis and the relationship between the mean CFI of trees from an alignment and the number of characters included in the analysis.
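The CFI here was obtained through PAUP*; as an alternative illustration, the Python sketch below computes it directly from a tree represented as a set of clades (frozensets of taxon names). Normalizing by n − 2, the number of nontrivial clades in a fully resolved rooted tree of n taxa, is our assumption about the convention; an unrooted convention would divide by n − 3 instead, so values may not match PAUP* output exactly. All names and trees are hypothetical.

```python
from statistics import mean, stdev


def cfi(tree, taxa):
    """Normalized consensus fork index of a rooted tree given as a set of clades.

    Assumes normalization by n - 2, the number of nontrivial clades
    in a fully resolved rooted tree of n taxa.
    """
    n = len(taxa)
    nontrivial = {c for c in tree if 1 < len(c) < n}  # drop terminals and the root clade
    return len(nontrivial) / (n - 2)


taxa = frozenset("ABCD")
resolved = {frozenset("AB"), frozenset("CD")}  # ((A,B),(C,D))
partial = {frozenset("AB")}                    # ((A,B),C,D)
star = set()                                   # (A,B,C,D)

values = [cfi(t, taxa) for t in (resolved, partial, star)]
print(values, mean(values), stdev(values))  # [1.0, 0.5, 0.0] 0.5 0.5
```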
Evaluating Tree Topology Disparity
Comparisons Among Alignment Parameters.—To obtain numerical values for evaluating the topological similarity of trees from different alignments, we used the symmetric difference distance (SymD) tree comparison metric (Robinson and Foulds, 1979, 1981). In comparing two trees, the SymD counts the clades that are present on one tree but not the other. More similar trees therefore have lower SymD values, and more disparate trees have higher SymD values. The 10 majority-rule (MR) consensus (50%) trees from the heuristic parsimony searches of the 10 full alignments were compared to each other by the SymD in PAUP*, creating a 10 × 10 matrix of 45 unique SymD values. The 10 MR consensus trees (50%) from the heuristic searches of the 10 reduced alignments were also compared to each other by the SymD, as were the 10 MR consensus trees (50%) from the culled alignments. The 10 bootstrap MR consensus trees (50%) from the full alignments were compared to each other by the SymD, as were those from the reduced and culled alignments. The preceding MR tree comparisons by SymD were repeated for 70% MR consensus trees from the parsimony searches and from the bootstrap analyses of the full, reduced, and culled alignments. The 10 NJ trees from the full alignments were also used to calculate a SymD matrix, as were those from the reduced and culled alignments. For each of these 10 × 10 SymD matrices, the mean and standard deviation of the SymD values were calculated.
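The SymD computation and the summary of a 10 × 10 matrix can be sketched in Python with each tree reduced to its set of clades (frozensets of taxon names). This is a hypothetical illustration, not PAUP*'s implementation: it treats trees as rooted clade sets, whereas programs differ in how they handle unrooted bipartitions and whether they normalize the count, so absolute values may differ from PAUP* output.

```python
from itertools import combinations
from statistics import mean, stdev


def symd(tree_a, tree_b):
    """Symmetric difference distance: number of clades found in exactly one of the two trees."""
    return len(set(tree_a) ^ set(tree_b))


def symd_matrix(trees):
    """Square matrix of pairwise SymD values (zero diagonal)."""
    n = len(trees)
    m = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = symd(trees[i], trees[j])
    return m


def unique_values(m):
    """Upper-triangle entries: n(n - 1)/2 values, i.e. 45 for 10 trees."""
    return [m[i][j] for i, j in combinations(range(len(m)), 2)]


ab, cd, ac, bd = (frozenset(s) for s in ("AB", "CD", "AC", "BD"))
trees = [{ab, cd}, {ab}, {ac, bd}]  # hypothetical consensus trees as clade sets
vals = unique_values(symd_matrix(trees))
print(vals)  # [1, 4, 3]
print(mean(vals), stdev(vals))
```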
To evaluate whether MR consensus trees were mitigating differences among individual trees from different alignments, single most parsimonious trees from the parsimony analyses were also compared by the SymD in PAUP*. A script in Python 2.2 made a random choice of one tree from each of the 10 alignments. Those 10 trees were compared to each other by the SymD, creating a SymD matrix. This procedure was repeated 1000 times, randomly sampling trees, with replacement, from among the respective 10 full, reduced, and culled alignments. For each SymD matrix, the mean and standard deviation of the SymD values were calculated, and a grand mean (mean of means) and mean standard deviation were determined for all 1000 SymD matrices. Single bootstrap trees from the bootstrap analyses of the full, reduced, and culled alignments were compared to each other by the SymD in the same way.
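The resampling of single most parsimonious trees can be sketched as follows. In this hypothetical Python version, `tree_sets` holds one list of trees (as clade sets) per alignment; each replicate draws one tree per alignment, computes all pairwise SymD values, and records their mean and standard deviation, which are then averaged across replicates. It is an illustration of the procedure described above, not the authors' script; the per-replicate value lists are also returned because each corresponds to a SymD matrix that can be passed to a Mantel test.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def symd(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(set(tree_a) ^ set(tree_b))


def sampled_symd(tree_sets, reps=1000, rng=None):
    """Pick one tree per alignment, compute pairwise SymD values, repeat `reps` times.

    Needs at least three alignments so each replicate has >= 2 values for stdev.
    Returns (grand mean, mean SD, list of per-replicate SymD value lists).
    """
    rng = rng or random.Random()
    means, sds, replicates = [], [], []
    for _ in range(reps):
        picks = [rng.choice(trees) for trees in tree_sets]  # independent draws, with replacement
        values = [symd(a, b) for a, b in combinations(picks, 2)]
        means.append(mean(values))
        sds.append(stdev(values))
        replicates.append(values)
    return mean(means), mean(sds), replicates


ab, cd, ac, bd = (frozenset(s) for s in ("AB", "CD", "AC", "BD"))
tree_sets = [[{ab, cd}, {ab}], [{ab}], [{ac, bd}, {ac}]]  # hypothetical MP trees, 3 alignments
grand_mean, mean_sd, matrices = sampled_symd(tree_sets, reps=1000, rng=random.Random(7))
print(round(grand_mean, 2), round(mean_sd, 2), len(matrices))
```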
To visualize and compare the tree space occupied by trees from the full, reduced, and culled alignments, we used the Tree Set Visualization (TSV) module (version 2.1; Amenta and Klingner, 2002) of the evolutionary analysis software package Mesquite (Maddison and Maddison, 2004). Tree Set Visualization compares sets of trees by the SymD and then uses multidimensional scaling (MDS; see Lingoes et al. [1979]; Young and Hamer [1987]; Borg and Groenen [1997]) to create a two-dimensional plot of the resulting distance matrix. Each tree is represented by a point, and the discrepancy between the SymD for each pair of trees and the distance between their points on screen is minimized. Sets of trees can be colored with an associated color score file. Hillis et al. (2005) demonstrated several uses of TSV, including the comparison of trees from Bayesian versus bootstrap samples and of trees from single-gene versus concatenated-gene data sets. We used TSV to plot single most parsimonious trees from all 10 alignments in three separate analyses for the full, reduced, and culled alignments. Because these sets contained too many most parsimonious trees (e.g., 16,115 trees from all 10 full alignments) for TSV to compare (Nina Amenta, personal communication), we sampled 5000 trees at random for TSV. A Python 2.2 script randomly chose 500 trees, with replacement, from each of the 10 sets of alignment parameters. Those 5000 trees were visualized with TSV, and the sampling and visualization process was repeated 10 times to confirm that sampling produced similar visualizations.
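TSV's own MDS routine is not described here; as a stand-in, the sketch below implements classical (Torgerson) MDS in Python with NumPy, which turns a distance matrix such as a SymD matrix into two-dimensional coordinates. Because SymD matrices need not be Euclidean, negative eigenvalues are clipped to zero and the embedding is only an approximation; TSV may use a different MDS variant and produce different layouts. The function name and the example matrix are hypothetical.

```python
import numpy as np


def classical_mds(d, k=2):
    """Classical (Torgerson) MDS: embed a symmetric distance matrix in k dimensions."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]           # indices of the k largest
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0, None))


# Hypothetical 4 x 4 SymD matrix (four trees) -> 2-D coordinates for plotting
symd_matrix = [[0, 2, 4, 6], [2, 0, 4, 6], [4, 4, 0, 4], [6, 6, 4, 0]]
coords = classical_mds(symd_matrix)
print(coords.shape)  # (4, 2)
```

When the input distances are truly Euclidean in two dimensions, classical MDS recovers them exactly (up to rotation and reflection), which makes a convenient sanity check.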
The SymD tree distance matrices, from the comparison of MR consensus trees and from the comparison of individual parsimony trees, were also tested for correlation with the alignment parameters. A significant correlation would indicate a relationship between the distances among alignment parameters and the symmetric difference distances among the resulting trees. To test for this correlation, we first constructed a Manhattan (rectilinear) distance matrix representing the distances between the 10 sets of alignment parameters, where the distance between two sets (x1, y1) and (x2, y2), with x the GOP and y the GEP, is d = |x1 − x2| + |y1 − y2| (Table 2). For example, alignment 1 (GOP = 4, GEP = 2) is most similar to alignment 2 (GOP = 6, GEP = 2), because the two are separated by a Manhattan distance of |4 − 6| + |2 − 2| = 2. In contrast, alignments 1 (GOP = 4, GEP = 2) and 10 (GOP = 60, GEP = 10) are the most disparate alignments in our study, separated by a Manhattan distance of |4 − 60| + |2 − 10| = 64. The matrix of Manhattan distances among alignment parameters was then tested for correlation with the SymD matrices by the Mantel test (Mantel, 1967; Mantel and Valand, 1970), performed with the program zt version 1.0 (Bonnet and Van de Peer, 2002) using the "-e" (exact) option to evaluate all possible permutations of the matrix. The SymD matrices produced by comparison of MR consensus trees (parsimony and bootstrap) and NJ trees were each tested for correlation with the matrix of alignment parameter distances. For the matrices produced by the comparison of individual parsimony and bootstrap trees, a Python 2.2 script automated the input of matrices into zt. For example, each of the 1000 SymD matrices produced by the comparison of parsimony trees from all 10 full alignments was tested for correlation with the matrix of alignment parameters by the Mantel test. A mean r and P-value over all 1000 tests were calculated, and the number of times (out of 1000) the test was significant (P < 0.05) was recorded.
All 1000 SymD matrices from the comparison of parsimony trees from the reduced alignments were tested for significant correlation with the alignment parameters by the same procedure, as were the SymD matrices from the culled alignments. Symmetric difference distance matrices from the bootstrap analysis of the full, reduced, and culled alignments were similarly tested.
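The Manhattan parameter matrix and an exact Mantel test can be sketched together in Python. This is an illustrative stand-in for zt, not its code: we assume a one-tailed test of positive association, with P equal to the fraction of all n! simultaneous row and column permutations of one matrix that yield a Pearson r at least as large as the observed value. Helper names are hypothetical, and only the three parameter settings quoted in the text are used; the 4 × 4 example matrices are invented for illustration.

```python
from itertools import combinations, permutations
from math import sqrt


def manhattan(p, q):
    """Rectilinear distance between (GOP, GEP) settings: |x1 - x2| + |y1 - y2|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def parameter_distances(settings):
    return [[manhattan(p, q) for q in settings] for p in settings]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_exact(a, b):
    """One-tailed Mantel test over all n! simultaneous row/column permutations of b.

    Returns (r, P). Pure Python is practical only for small n
    (n = 10 already means 3,628,800 permutations).
    """
    pairs = list(combinations(range(len(a)), 2))
    x = [a[i][j] for i, j in pairs]
    r_obs = pearson(x, [b[i][j] for i, j in pairs])
    hits = total = 0
    for perm in permutations(range(len(a))):
        r = pearson(x, [b[perm[i]][perm[j]] for i, j in pairs])
        if r >= r_obs - 1e-12:  # small tolerance so exact ties count as "as large"
            hits += 1
        total += 1
    return r_obs, hits / total


def count_significant(p_values, alpha=0.05):
    """Number of tests (e.g., out of 1000 sampled SymD matrices) with P < alpha."""
    return sum(p < alpha for p in p_values)


# Alignments 1, 2, and 10 as quoted in the text (Table 2 holds all 10)
print(parameter_distances([(4, 2), (6, 2), (60, 10)]))
# [[0, 2, 64], [2, 0, 62], [64, 62, 0]]

# Hypothetical 4 x 4 example: SymD exactly proportional to parameter distance
params = [[abs(i - j) for j in range(4)] for i in range(4)]
symd = [[2 * v for v in row] for row in params]
r, p = mantel_exact(params, symd)
print(round(r, 6), p)  # 1.0 0.08333333333333333
```

In the toy case only the identity and the reversed ordering reproduce the observed matrix, so P = 2/24; with 10 alignments the smallest attainable exact P is 1/10!.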
Comparisons Within Alignment Parameters
To examine the topological differences among trees inferred using the same set of alignment parameters, trees were randomly chosen for comparison by the SymD tree distance metric. To compare trees from the full alignment for a particular set of alignment parameters with trees from the corresponding reduced alignment, a tree was randomly chosen from the full alignment and compared to a randomly chosen tree from the reduced alignment by the SymD. This process was repeated 1000 times, and an average and standard deviation of the SymD values were calculated for all 1000 comparisons. Similarly, two trees were randomly chosen from each full alignment for comparison by the SymD, and two trees were chosen from each reduced alignment and each culled alignment, with the process repeated 1000 times. These comparisons of trees (full versus full, full versus reduced, full versus culled, and all combinations thereof) were repeated for all 10 sets of alignment parameters.
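One reading of the within-parameter procedure, sketched in Python under stated assumptions: a tree is drawn at random from each of two tree sets (for example, the full and reduced alignments for one parameter setting), their SymD is recorded, and the draw is repeated. The function name and trees are hypothetical. Draws are independent, so a full-versus-full comparison can occasionally pick the same tree twice; whether the original script excluded such identical picks is not stated, and `rng.sample` would be the choice if distinct trees were required.

```python
import random
from statistics import mean, stdev


def random_pair_symd(trees_a, trees_b, reps=1000, rng=None):
    """Mean and SD of SymD between randomly drawn tree pairs (one tree from each set)."""
    rng = rng or random.Random()
    values = [len(set(rng.choice(trees_a)) ^ set(rng.choice(trees_b))) for _ in range(reps)]
    return mean(values), stdev(values)


ab, cd, ac = (frozenset(s) for s in ("AB", "CD", "AC"))
full_trees = [{ab, cd}, {ab}]     # hypothetical MP trees, full alignment
reduced_trees = [{ac}, {ab, ac}]  # hypothetical MP trees, reduced alignment
print(random_pair_symd(full_trees, reduced_trees, rng=random.Random(3)))
```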
Results
Alignment Construction
Both the total number of characters and the number of parsimony informative characters varied among the 10 sets of alignment parameters (Fig. 1). The full alignments showed the greatest range in total number of characters, from 1915 to 2311. The number of characters in the full alignments decreased with increasing GOP (e.g., from alignment 1 to alignment 5), but less so at the higher GEP of 10 (alignments 6 to 10). The total number of characters in the culled alignments also decreased with increasing GOP, but the number of parsimony informative characters did not. The number of characters in the reduced alignments ranged from 1143 to 1311 and did not show the strong relationship with GOP seen in the full alignments. For the full and reduced alignments, parsimony informative characters made up roughly half of all characters, whereas for the culled alignments the proportion of parsimony informative characters was higher, ranging from 70% to 92%.
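The proportions above rest on the usual definition of a parsimony informative site: at least two character states, each present in at least two taxa. The Python sketch below applies that definition with gaps treated as missing data, as in the analyses; also treating "?" and "N" as missing is our assumption, and PAUP*'s handling of other ambiguity codes may differ. The names and the toy alignment are hypothetical.

```python
from collections import Counter

MISSING = set("-?N")  # gaps treated as missing; '?' and 'N' assumed missing as well


def is_informative(column):
    """Parsimony informative: at least two states, each present in at least two taxa."""
    counts = Counter(s for s in column if s not in MISSING)
    return sum(1 for c in counts.values() if c >= 2) >= 2


def informative_fraction(alignment):
    columns = ["".join(row[i] for row in alignment) for i in range(len(alignment[0]))]
    informative = sum(is_informative(c) for c in columns)
    return informative, len(columns), informative / len(columns)


toy = ["AAGT", "AAGC", "ACTT", "ACTA"]  # hypothetical 4-taxon alignment
print(informative_fraction(toy))  # (2, 4, 0.5)
```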
Tree Resolution
The mean CFI of individual parsimony bootstrap trees and parsimony trees for each of the 10 sets of alignment parameters is shown in Figure 2. Mean CFI measures the average resolution of a set of trees. The CFI of the bootstrap trees and parsimony trees from the full alignments, and of those from the limited bootstrap analysis, were nearly identical across alignment parameters (averaging 0.97, 0.97, and 0.95, respectively). The CFI of parsimony trees from the culled alignments was slightly higher than that of trees from the full alignments, with a peak at alignment 6 because only a single, well-resolved most parsimonious tree was found (CFI = 0.98). The mean CFI of the individual bootstrap trees and parsimony trees from the reduced alignments was considerably lower than for the full and culled alignments (averaging 0.80 and 0.84, respectively). For the reduced data sets, CFI values for bootstrap and parsimony trees also varied in parallel with the alignment parameters, tending to be lower in the middle ranges of the parameters, such as GOP = 12, GEP = 2. CFI values were always lower for bootstrap trees than for parsimony trees from the same alignment parameters. For the parsimony trees, the variation in mean CFI between full and reduced data sets can be at least partly explained by variation in the number of parsimony informative characters; mean CFI was correlated with the number of parsimony informative characters (r = 0.76, P = 0.01). This trend is also suggested for the variation in mean CFI among trees from the reduced data sets (compare Figs. 1 and 2).
Tree Topology Disparity
Comparisons Among Alignment Parameters
The mean symmetric difference distance (SymD) values for MR consensus trees (parsimony and bootstrap) and NJ trees varied substantially and differed markedly among full, reduced, and culled alignments (Table 3). Majority-rule consensus trees, both 50% and 70%, from the culled alignments had more nonshared clades than MR trees from the full or reduced alignments. Overall, bootstrap MR consensus trees were more similar to each other than were parsimony MR consensus trees. For example, 50% bootstrap MR trees had fewer nonshared clades (SymD = 23.89 for full, 18.96 for reduced, and 28.93 for culled) than did parsimony MR trees (SymD = 71.33 for full, 57.22 for reduced, and 106.11 for culled) inferred from these 10 alignments. However, standard deviations for some of these parsimony MR SymD results were relatively high, and the mean SymD values for parsimony MR trees (full, reduced, culled) nearly overlap when their standard deviations are considered. Trees from the culled alignments consistently showed the greatest SymD values within any comparison. The NJ trees from the culled alignments showed the greatest disparity of all comparisons, with a mean SymD of 114.31. The NJ trees from the reduced alignments showed an intermediate mean SymD of 92.00, and NJ trees from the full alignments were most similar to each other, with a mean SymD of 51.00. Neighbor-joining trees from the reduced alignments also showed the greatest standard deviation in SymD (53.74 for reduced versus 7.07 for full). For the full alignments, NJ trees were more similar to each other than were parsimony MR trees.
When individual most parsimonious trees from the 10 alignments were compared, mean SymD values were slightly higher than those from the comparison of MR consensus trees (Table 4). As in the MR consensus results, trees from the culled alignments were the most dissimilar to each other, with a mean SymD of 116.23. In contrast to the results for bootstrap MR consensus trees, individual bootstrap trees were dramatically different from each other for the full, reduced, and culled alignments (Table 4). When individual most parsimonious trees were plotted in tree space with TSV, trees from the full alignments clustered fairly tightly by alignment, with almost no overlap among trees from different alignments (Fig. 3A). Trees from the reduced alignments were somewhat more scattered in tree space and showed considerable overlap among trees from some alignments (e.g., alignments 3, 4, and 6; Fig. 3B). Most culled alignments formed small, tight clusters according to alignment parameters, with the exception of alignment 5, which showed a bimodal distribution of trees, and alignment 6, which yielded only a single most parsimonious tree (Fig. 3C).
Mantel's test of matrix correlation showed that differences among individual parsimony trees from the full alignments were frequently correlated with the differences among alignment parameters (Table 4). For trees from the 10 full alignments, 580 of 1000 SymD matrices were significantly correlated with the Manhattan distance matrix of the alignment parameters. Differences among trees from the 10 culled alignments were correlated with the differences among alignment parameters nearly as frequently, with 403 of 1000 SymD matrices significantly correlated. In contrast, only 230 of the 1000 SymD matrices from the reduced alignments were significantly correlated with the differences among alignment parameters. Differences among individual bootstrap trees were much less frequently correlated with the differences among alignment parameters than were differences among trees from the maximum parsimony analyses. Differences among bootstrap trees from the full alignments were more frequently correlated with the differences among alignment parameters than those from the reduced alignments, but not dramatically so (significant correlations for 144 of 1000 matrices for the full alignments versus 88 of 1000 for the reduced alignments). Differences among bootstrap trees from the culled alignments were most frequently correlated with the differences among alignment parameters, showing significant correlations in 277 of 1000 tests.
Mantel's test indicated that differences among the 50% MR consensus trees from the parsimony analysis of the full alignments were significantly correlated with differences among alignment parameters: tests of correlation between the SymD matrix and the matrix of alignment parameter distances (Table 2) showed a significant correlation (Table 3) for the full alignments (P = 0.002) but not for the reduced alignments (P = 0.063). For 70% MR consensus trees from the parsimony analysis, differences among trees from both the full and reduced alignments were significantly correlated with the differences among alignment parameters (P = 0.035 and P = 0.041, respectively; Table 3). Differences among MR consensus trees (50% and 70%) from the culled alignments were not correlated with the differences among alignment parameters. The SymD between bootstrap MR consensus trees showed a strongly significant correlation (Mantel's test) with distances between alignment parameters for the full and culled alignments, as well as for the limited bootstrap analysis (P < 0.001, r > 0.5; results of limited bootstrap not shown). The SymD matrix for 50% bootstrap MR consensus trees from the reduced alignments was also significantly correlated with the distances between alignment parameters (P = 0.043), but the correlation was considerably weaker than for the full, culled, and limited analyses (r = 0.282). The SymD matrices for 70% bootstrap MR consensus trees from the full, reduced, and culled alignments were all strongly correlated with the differences among alignment parameters. Mantel's test for the NJ trees revealed that SymD matrices from the full and culled alignments were significantly correlated with differences among alignment parameters (P = 0.014 and P < 0.001, respectively), but those from the reduced alignments were not (P = 0.247) (Table 3).
Comparisons Within Alignment Parameters
Within each set of alignment parameters, individual most parsimonious trees were compared to each other by the SymD (Fig. 4). Trees from each full alignment were always quite different from those of their corresponding reduced alignment; mean SymD values ranged from 137 to 162 and standard deviations were low (results not shown). The greatest SymD disparity was seen in trees from the reduced alignments compared to trees from the corresponding culled alignments (Fig. 4, reduced versus culled). These SymD values are twice those seen in comparing trees across alignment parameters (Tables 3 and 4) within a data set type (full, reduced, or culled). For a given set of alignment parameters, trees from the full alignment showed a low level of disparity overall (mean SymD ranging from 15 to 39) but with considerable variation (standard deviations ranging from 7 to 21). Trees from the reduced and culled alignments showed a similarly low level of disparity and high variation.
Summarizing Phylogenetic Results
To summarize phylogenetic hypotheses for full data sets, we pooled all equally parsimonious trees from all 10 full alignments into a single MR (50%) consensus tree. To produce an assessment of relative clade reliability for this MR consensus tree, the same procedure was performed for bootstrap trees; 100,000 bootstrap trees from all 10 full alignments were combined into a single bootstrap MR (50%) consensus tree. Bootstrap support values from this tree were placed on clades of interest on the 50% majority rule parsimony consensus tree. Majority-rule consensus parsimony trees and bootstrap MR consensus trees were constructed in the same way for the reduced and culled analyses.
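The pooling step described above amounts to counting how often each clade occurs across all trees, regardless of which alignment produced them, and keeping clades above the threshold; bootstrap values are then looked up for those clades in a second, pooled set of bootstrap trees. The sketch below assumes rooted trees written as nested Python tuples (a hypothetical representation chosen for brevity, not a file format used in the study), and the function names and toy trees are illustrative only.

```python
from collections import Counter


def clades(tree):
    """Non-trivial clades of a rooted tree given as nested tuples of taxon names."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))  # the root clade is shared by every tree
    return found


def clade_frequencies(trees):
    """Fraction of trees in which each clade occurs."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    return {clade: k / len(trees) for clade, k in counts.items()}


def majority_rule(trees, threshold=0.5):
    """Clades occurring in more than `threshold` of the pooled trees (threshold >= 0.5)."""
    return {c: f for c, f in clade_frequencies(trees).items() if f > threshold}


def label_with_support(consensus, bootstrap_trees):
    """Bootstrap frequency of each consensus clade in a pooled set of bootstrap trees."""
    support = clade_frequencies(bootstrap_trees)
    return {clade: support.get(clade, 0.0) for clade in consensus}


# Hypothetical pools of parsimony and bootstrap trees from several alignments.
pooled_mp = [(("A", "B"), ("C", "D")), (("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))]
pooled_boot = [(("A", "B"), ("C", "D"))] * 3 + [(("A", "B"), "C", "D")]

consensus = majority_rule(pooled_mp)
support = label_with_support(consensus, pooled_boot)
```

Because every clade kept at a threshold of 50% or higher occurs in more than half of the trees, any two such clades co-occur in at least one tree and are therefore compatible, so the retained clades always form a tree.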
The 50% MR consensus tree of all 16,115 parsimony trees from the 10 full alignments (Figs. 5 to 7) had a CFI of 0.854 and 102 clades with at least 70% bootstrap support (bootstrap trees not shown). The 50% MR consensus tree of all 140,352 parsimony trees from the 10 reduced alignments (online figures, Appendices A to C; available at http://systematicbiology.org) had a CFI of 0.719 and 69 clades with at least 70% bootstrap support. The 50% MR consensus tree of all 9,522 parsimony trees from the 10 culled alignments (online figures, Appendices D to F) had a CFI of 0.849 and 81 clades with at least 70% bootstrap support. The culled consensus tree showed good resolution among tip taxa and within many clades, but the relationships among major clades were considerably less well resolved than in trees from the full and reduced analyses.
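The consensus fork index (CFI) values above measure how resolved each consensus tree is. A common definition for rooted trees, assumed here, divides the number of nontrivial clades in the consensus by n - 2, the number present in a fully resolved rooted tree of n taxa; the study's exact convention may differ (for unrooted trees the denominator is often n - 3). A minimal sketch with hypothetical six-taxon trees written as nested tuples:

```python
def leaf_set(tree):
    """All taxon names in a rooted tree written as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clade_count(tree):
    """Number of nontrivial clades, i.e. internal nodes other than the root."""
    if isinstance(tree, str):
        return 0
    return sum(0 if isinstance(child, str) else 1 + clade_count(child) for child in tree)


def consensus_fork_index(tree):
    """Nontrivial clades divided by n - 2, the count in a fully resolved rooted tree."""
    n_taxa = len(leaf_set(tree))
    if n_taxa < 3:
        raise ValueError("the CFI needs at least three taxa")
    return clade_count(tree) / (n_taxa - 2)


# Hypothetical six-taxon consensus trees.
resolved = ("A", ("B", ("C", ("D", ("E", "F")))))
star = ("A", "B", "C", "D", "E", "F")
partial = ("A", ("B", "C", "D", ("E", "F")))
for tree in (resolved, star, partial):
    print(consensus_fork_index(tree))  # 1.0, then 0.0, then 0.5
```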
Discussion
The problem of how to treat regions of multiple sequence alignments with questionable homologies is one that has plagued molecular systematists for years. Lee (2001) reviewed many approaches to this problem and suggested that the most common approach is to discard those regions before analysis. Gatesy et al. (1993) suggested that the discussion of character removal from alignments should be part of the concept of "total evidence" (Kluge et al., 1989). Lecointre and Deleporte (2005) recently considered the philosophy of excluding misleading data in a total evidence context but made no mention of potentially misleading alignment regions. Recent studies claiming to take a "total evidence" approach have both included (Nadler and Hudspeth, 2000; Mattern and McLennan, 2004) and excluded (Tekle et al., 2005) alignment ambiguous sites. The quality of the excluded data has been an important factor in determining whether those data are misleading or informative. Some have considered alignment ambiguous regions to be essentially devoid of phylogenetic information (e.g., Castresana [2000]) whereas others have considered gapped regions to be useful at lower taxonomic levels (Kawakita et al., 2003). Our study suggests that alignment ambiguous regions contain considerable phylogenetic information, especially for resolving lower-level relationships, but that these data also produce the most topologically diverse trees among comparisons of different alignment parameters.
There are several alternatives to the options of including all aligned sites in the analysis versus removing all alignment ambiguous regions. For example, newly developed alignment tools that estimate the posterior probability of alignment sites (Loytynoja et al., 2003) offer the possibility of using posterior probabilities of sites as weights for phylogenetic analyses. Another approach is to recode ambiguous regions as additional character information for analyses (Wheeler, 1999; Lutzoni et al., 2000; Gillespie, 2004), and although the program POY (Wheeler et al., 2003) was designed for these types of analyses, this approach has been criticized (Lee et al., 2001) as creating an impractical number of characters. A more substantial problem with recoded alignments is that these data cannot currently be used for nucleotide-based maximum likelihood analyses. Another approach involves the analysis of many different alignments created by different alignment parameters. These multiple data sets can be concatenated into what Wheeler (1995) termed an "elision" alignment, effectively giving greater weight to regions that align identically across alignments and less weight to ambiguous regions that differ across alignments (Thornton et al., 2000). Lutzoni et al. (2000) criticized the elision approach for several reasons, the most fundamental of which, assigning multiple homologies to the same site, was also noted by Wheeler (1995). Multiple data sets produced by different sets of alignment parameters can also be analyzed separately, in what Lee (2001) termed the "multiple analysis method." In this approach, only those relationships found in a consensus of trees from the different alignments are accepted. Difficulties with this approach include the increased time required for data analysis, choosing the range of alignment parameters to explore, and a likely loss of resolution in producing a consensus of possibly quite disparate trees.
This multiple analysis method has been considered "the most feasible approach" to the problem of ambiguously aligned regions (Lee et al., 2001), and we consider it a conservative way to examine relationships supported by a data set (see below, "Implications for Nematode Phylogeny"). Herein we have further explored this approach by generating 10 different alignments from different parameters and examining the effect of including, excluding, and separately analyzing alignment ambiguous SSU data on both sensitivity to alignment parameter variation and tree resolution.
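To make the weighting effect of Wheeler's (1995) elision concrete, the hypothetical sketch below concatenates alignments of the same taxa end to end. A column that aligns identically in every alignment is repeated once per alignment, whereas an ambiguous region contributes a different set of columns in each copy. The function name, taxa, and sequences are invented for illustration.

```python
def elide(alignments):
    """Concatenate alignments of the same taxa, taxon by taxon (an 'elision' alignment).

    `alignments` is a list of dicts mapping taxon name -> aligned sequence.
    """
    taxa = sorted(alignments[0])
    for aln in alignments:
        if sorted(aln) != taxa:
            raise ValueError("every alignment must contain the same taxa")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences within an alignment must have equal length")
    return {taxon: "".join(aln[taxon] for aln in alignments) for taxon in taxa}


# Two hypothetical alignments of the same sequences that differ only in gap placement.
aln1 = {"t1": "ACG-T", "t2": "ACGGT", "t3": "AC--T"}
aln2 = {"t1": "AC-GT", "t2": "ACGGT", "t3": "A--CT"}
elided = elide([aln1, aln2])
print(elided["t1"])  # ACG-TAC-GT
```

The conserved first and last columns appear twice in the elided matrix, so they carry twice the weight, while the gapped middle region is represented by two different sets of homology statements, which is exactly the multiple-homology issue raised by Lutzoni et al. (2000).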
Effects of Alignment Parameters on Tree Topology
Alignment parameter choice for the nematode SSU rDNA data set has several important effects on the resulting alignment. Lower, more permissive gap penalties allow the introduction of more (and longer) gaps, creating alignments with more characters. Hickson et al. (2000) suggested that lower gap costs are better because they result in alignments that are more consistent with secondary structure. For this SSU data set, a gap extension penalty (GEP) of 2 was relatively permissive and allowed each gap, once opened, to expand freely. This effect is apparent in the dramatic influence of the gap opening penalty (GOP) values on alignment length for alignments 1 to 5, which had a GEP of 2. As expected, a GEP of 10 restricted the expansion of gaps, and this is reflected in a more gradual decrease in the length of these alignments (6 to 10) with increasing GOP values. Morrison and Ellis (1997) showed a similar pattern of shorter alignments with increased gap penalties for SSU sequences of apicomplexan protozoa. The choice of alignment parameters did not, however, noticeably change the number of parsimony-informative characters, which remained approximately constant across alignments.
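The contrast between GEP values of 2 and 10 follows from an affine gap cost, in which opening a gap costs the GOP and each additional position costs the GEP. The sketch below uses this simple textbook form with a hypothetical GOP of 15; CLUSTAL W further adjusts its penalties (for example, according to sequence lengths and the positions of existing gaps), and conventions differ on whether the first gap position also incurs the GEP, so the numbers are illustrative only.

```python
def affine_gap_cost(length, gop, gep):
    """Cost of a single gap: pay GOP once to open it, then GEP for each extra position."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gop + gep * (length - 1)


def one_long_vs_many_short(total, gop, gep):
    """Compare one gap of `total` positions with `total` separate one-position gaps."""
    return affine_gap_cost(total, gop, gep), total * affine_gap_cost(1, gop, gep)


# Hypothetical GOP of 15; GEP values of 2 and 10 as in the text.
for gep in (2, 10):
    long_gap, short_gaps = one_long_vs_many_short(10, gop=15, gep=gep)
    print(f"GEP={gep:>2}: one 10-position gap costs {long_gap}, "
          f"ten 1-position gaps cost {short_gaps}")
```

With a GEP of 2, lengthening an already open gap is cheap relative to opening one, so the GOP largely controls how many gaps appear; with a GEP of 10, long gaps are costly as well. This is one reading consistent with the more gradual length changes described for alignments 6 to 10.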
Tree resolution was affected by parameter choice only for the reduced alignments. Both parsimony bootstrap and parsimony MR consensus trees from the reduced alignments were considerably less resolved than trees from the full alignments, which remained essentially equally well resolved across alignment parameters. Bootstrap values (and therefore resolution in bootstrap MR consensus trees) have been shown to be correlated with the number of characters included in an analysis (Felsenstein et al., 1985). Our limited bootstrap analysis addressed this issue by resampling fewer characters and comparing this result to the corresponding reduced alignment. Because trees from our limited bootstrap analyses were only slightly less resolved than those from the full alignments (Fig. 2), we conclude that the more dramatic drop in resolution of the trees from the reduced alignments is not due to the simple reduction in number of characters. Instead, the decreased resolution is due to the selective removal of alignment ambiguous characters, which are in highly variable regions containing many parsimony-informative sites. This decrease in resolution has been reported for other data sets (Gatesy et al., 1993; Castresana, 2000), and Castresana (2000) primarily attributed it to the loss of many informative characters. Our study showed that the number of parsimony-informative characters explained a significant portion of the variation in resolution of parsimony trees (r = 0.76), providing some statistical support for this explanation.
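Parsimony-informative characters, whose counts are discussed above, are sites with at least two character states that each occur in at least two taxa. A minimal sketch, assuming gaps, '?' and 'N' are treated as missing data (a common default; coding gaps as a fifth state would change the counts) and an alignment given as a list of equal-length strings; the function names and toy alignment are hypothetical.

```python
from collections import Counter


def is_parsimony_informative(column, missing="-?N"):
    """True if at least two states each occur in at least two taxa (missing data ignored)."""
    counts = Counter(ch for ch in column.upper() if ch not in missing)
    return sum(1 for k in counts.values() if k >= 2) >= 2


def count_informative(alignment):
    """Number of parsimony-informative columns in equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must all have the same aligned length")
    return sum(
        is_parsimony_informative("".join(seq[i] for seq in alignment))
        for i in range(length)
    )


# Hypothetical four-taxon alignment: columns 2 and 4 are informative; column 3 has
# a singleton change, and column 5 has only one repeated state once the gap is ignored.
toy = ["ACTTA", "ACGTT", "AGGCA", "AGGC-"]
print(count_informative(toy))  # 2
```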
More noteworthy than the effect of alignment parameters on tree resolution is their influence on tree topology in the nematode SSU rRNA data set. Trees from this study were too large for comparison by visual inspection alone, so a tree comparison metric had to be used. The symmetric difference distance (SymD) is the most commonly used tree distance metric. It is easy to calculate and may be used for trees that are not fully resolved. Its primary drawback is its sensitivity: trees that differ only in the placement of a single terminal taxon can have a very high SymD value when compared (Swofford et al., 1991). SymD is also sensitive to differences in resolution between the trees being compared, as the distance can be inflated by better-resolved trees containing clades not present in compatible but less-resolved trees. This disparity in resolution could explain the high SymD values resulting from the within-alignment comparisons of trees from the reduced versus culled and full versus reduced analyses (Fig. 4), as trees from the reduced analysis had considerably lower resolution than those from the culled or full analyses (Fig. 2). However, inflation of the SymD due to differences in resolution may not account for all SymD disparity, as trees from the full versus culled within-alignment comparison showed intermediate levels of tree topology disparity despite similar levels of resolution (Figs. 2 and 4). In addition, SymD results based on comparison of majority-rule consensus trees are not as simple to interpret as SymD comparisons of individual trees, because topological differences among trees can be masked by the consensus building process. Majority-rule consensus parsimony trees were more similar to each other than individual parsimony trees, especially for the reduced data sets (e.g., SymD of 57.22 for reduced 50% MR consensus versus 85.79 for individual reduced).
This difference between the similarity of MR consensus trees and individual most parsimonious trees reflects the topological differences lost in the consensus building process. Individual bootstrap trees showed the greatest disparity (highest SymD values) due to topological variation introduced by the resampling process. Bootstrap MR consensus trees, in contrast, were very similar to each other, with SymD values as low as 6.0 for the 70% consensus in the full analysis. These trees likely are quite similar due to their low level of resolution (trees not shown), with relatively few clades to differ among trees.
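SymD as described above can be computed by treating each rooted tree as its set of clades and counting clades found in only one of the two trees. The sketch below assumes rooted trees written as nested Python tuples (a hypothetical representation; for unrooted trees one would compare bipartitions instead) and reproduces both caveats discussed above: relocating a single taxon can push SymD to its maximum, and a compatible but less-resolved tree still receives a nonzero distance.

```python
def clades(tree):
    """Non-trivial clades of a rooted tree given as nested tuples of taxon names.

    Each clade is a frozenset of leaf names; single leaves and the root clade are
    left out because every tree on the same taxa contains them.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))  # drop the root clade
    return found


def symd(tree_a, tree_b):
    """Symmetric difference distance: clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical six-taxon trees.
ladder = ("A", ("B", ("C", ("D", ("E", "F")))))
a_moved = ("B", ("C", ("D", ("E", ("F", "A")))))  # same tree with only taxon A relocated
unresolved = ("A", ("B", "C", "D", "E", "F"))      # compatible with ladder, less resolved

print(symd(ladder, a_moved))     # 8: no clade is shared, the maximum 2 * (6 - 2)
print(symd(ladder, unresolved))  # 3: clades of ladder absent from unresolved
```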
The exploration of tree space with TSV provides visual evidence of the disparity among trees from different alignment parameters. Single most parsimonious trees from the full alignments showed clustering by alignment parameters with almost no overlap. The reduced alignments, from which alignment ambiguous regions had been removed, produced a very different TSV visualization, with trees from different alignments overlapping considerably. This overlap in tree space suggests that the alignment ambiguous regions removed by the culling process were indeed strongly influencing tree topology, as their absence in the reduced alignments produced more similar trees. Culled alignments, consisting entirely of regions judged to be alignment ambiguous, produced trees that were most tightly clustered by alignment parameters in tree space and therefore the most disparate among alignments. The visualization of the trees from the culled alignments dramatically demonstrates the influence of the alignment parameters on resultant tree topology.
Although the actual values of the symmetric difference distance can be hard to interpret, their correlation with the Manhattan distances between alignment parameters in this study is telling. Differences among individual most parsimonious trees from the full alignments were frequently (580/1000) correlated with the differences among alignment parameters, whereas differences among individual most parsimonious trees from the reduced alignments were significantly correlated with the differences among alignment parameters only 230 out of 1000 times. Differences among individual trees from the culled alignments were significantly correlated with the differences among the alignment parameters nearly as often as those from the full alignments, 403 out of 1000 times. This result provides a clear indication that the topology of the inferred trees is shaped by the alignment parameters chosen. That individual trees from the reduced alignments were less frequently correlated with the alignment parameters suggests that the culling process was selectively removing characters more subject to the influence of different alignment parameters. This suggests in turn that regions deemed "alignment ambiguous" contained a substantial proportion of informative sites that were "plastic," with homology statements shaped by parameter choice, and the comparatively high SymD values for individual culled parsimony and bootstrap trees are also consistent with this view. These SymD values are not likely to be subject to inflation due to differences in resolution, as these comparisons are across alignment parameters and within full, reduced, or culled data sets, where little disparity in resolution was seen (Fig. 2). Trees from these ambiguous regions alone (culled alignments) were also frequently correlated with the alignment parameters. However, majority-rule consensus trees revealed that culling did not entirely eliminate sites correlated with alignment parameters.
Differences among 50% MR consensus trees from the full alignment MP analyses were strongly correlated with the differences among alignment parameters (r = 0.516), trees from the reduced analyses showed a level of correlation (r = 0.281) that fell just short of significance (P = 0.063), and consensus trees from the culled alignments were not correlated with the alignment parameters (r = 0.129). For 70% MR consensus trees from the MP analysis, differences among trees from both the full and reduced analyses were significantly correlated with the differences among alignment parameters, but differences among trees from the culled analyses were not. As fewer individual most parsimonious trees from the reduced alignments were correlated with the alignment parameters (e.g., 230 out of 1000 tested), it seems likely that the clades remaining in consensus tree comparisons amplify repeated "plastic" homology statements. Differences among neighbor-joining trees (fully resolved) from the full and culled alignments were significantly correlated with the differences among alignment parameters, whereas differences among NJ trees from the reduced alignments were not, lending additional support to the idea that majority-rule consensus tree construction and lack of resolution are masking correlations with alignment parameters. Consensus trees have been criticized for several reasons, including their potential contradiction of the most parsimonious tree (Barrett et al., 1991) and their possible loss of relationships found in all input trees (Steel et al., 2000). Although the majority-rule consensus tree is the median tree in a collection of trees (Barthélemy and McMorris, 1986), these empirical results suggest that the use of median trees for estimation of SymD in a collection of trees is problematic.
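Because SymD counts each clade independently, the clade set that minimizes the summed SymD to a collection of trees keeps exactly the clades found in more than half of them (clades found in exactly half can go either way); this is the sense in which the majority-rule consensus is a median tree (Barthélemy and McMorris, 1986). The brute-force check below illustrates this on a hypothetical four-taxon example in which each tree is given directly as its set of clades; the function names are illustrative only.

```python
from collections import Counter
from itertools import combinations


def total_symd(candidate, clade_sets):
    """Summed symmetric difference distance from a candidate clade set to every tree."""
    return sum(len(candidate ^ other) for other in clade_sets)


def majority_clades(clade_sets):
    """Clades found in more than half of the trees."""
    counts = Counter(clade for s in clade_sets for clade in s)
    return frozenset(c for c, k in counts.items() if k > len(clade_sets) / 2)


def brute_force_minimum(clade_sets):
    """Smallest summed SymD over every subset of the observed clades (tiny inputs only)."""
    universe = list(set().union(*clade_sets))
    return min(
        total_symd(frozenset(combo), clade_sets)
        for r in range(len(universe) + 1)
        for combo in combinations(universe, r)
    )


# Hypothetical four-taxon example: each tree is represented by its set of clades.
AB, CD, AC, BD = (frozenset(pair) for pair in ("AB", "CD", "AC", "BD"))
trees = [frozenset({AB, CD}), frozenset({AB, CD}), frozenset({AC, BD})]

majority = majority_clades(trees)
print(sorted("".join(sorted(c)) for c in majority))             # ['AB', 'CD']
print(total_symd(majority, trees), brute_force_minimum(trees))  # 4 4
```

Clades never observed in any tree can only add to the summed distance, which is why the search is restricted to observed clades. The summed distance of 4 records the disagreement contributed by the third tree, but a SymD computed between two consensus trees sees only the medians themselves, which is one way consensus building can hide differences among the trees it summarizes.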
Although the sites that we judged alignment ambiguous were potentially misleading due to their being particularly influenced by the alignment parameters, these sites did contain phylogenetic information. Trees produced from these data (culled alignments) generally showed the same relationships among terminal taxa as did trees from analyses using all sites (full alignments) and analyses excluding alignment ambiguous regions (reduced analyses). Although the data that we considered alignment ambiguous were phylogenetically informative and generally congruent with other SSU characters, these data did not adequately support relationships among the major clades. The major clades representing the "backbone" of the tree were more poorly resolved by the culled data set than by the full data set.
Our study has shown that the choice of alignment parameters influences inferred tree topology for the nematode SSU data set. Symmetric difference distance comparisons showed that trees from full data sets, inferred using parsimony and distance methods, are often correlated with alignment parameters. A tree inferred from any particular full alignment is therefore likely to be biased by alignment parameter choice. When no single alignment can be rationally chosen over others, pooling trees from multiple sets of alignment parameters is one approach that has been used commonly (Morrison and Ellis, 1997; Sanchis et al., 2001). Although nothing from the statistical analyses in this study directly supports the pooling of trees, it seems a justifiable empirical response to the knowledge that any single alignment is likely to be biased by the alignment parameters. Pooling trees from several different alignments serves to sample alignment space and can produce well-resolved trees. One difficulty with this approach, however, is that it is unclear how much of alignment space should be explored. Kjer (1995, 2004) has argued that exploring alignment space with fixed gap costs (e.g., "sensitivity analysis" [Wheeler, 1995]) is flawed because different regions of the same molecule have varying probabilities of insertions and deletions, requiring different gap costs. Given the complex secondary structures of rRNA molecules, this criticism is likely to be warranted, and future exploration of alignment parameters and gap costs should consider the structural diversity among regions of rRNA molecules.
Regions of multiple alignments that include empirically questionable statements of positional homology have often been excluded from phylogenetic analyses. In our study, most individual trees from reduced data sets were no longer correlated with the alignment parameters, and the TSV visualization showed them to be less clustered by alignment parameters. However, even the removal of characters judged "alignment ambiguous" by eye (approximately 50% of SSU characters) did not entirely eliminate the pervasive influence of the alignment parameters. Comparison of individual most parsimonious trees by SymD reflects this result: the full alignments had the greatest number of significantly correlated comparisons, followed by the culled alignments, with the reduced data sets having the smallest number of correlated comparisons. The use of a reduced data set does result in a loss of resolution, relative to both individual full alignments and a pool of several full alignments. When the highest levels of resolution are critical for resolving a particular phylogenetic question, such studies may benefit from pooling trees from multiple sets of alignment parameters rather than removing alignment ambiguous characters. The generality of these findings beyond nematode 18S rDNA should be explored with data sets using different loci and taxa.
Implications for Nematode Phylogeny
Nematodes have traditionally been divided into two classes: the primarily terrestrial Secernentea and the primarily aquatic Adenophorea. This distinction arose from evaluation of morphological characters, such as the presence of a lateral canal excretory system in Secernentea, which is absent in Adenophorea (Chitwood et al., 1958). Our analyses, like that of Blaxter et al. (1998), did not support the traditional split between the two classes. Instead, Secernentea was shown to be nested within adenophorean taxa in all our analyses. Table 5 summarizes some of the key differences between the results of Blaxter et al. (1998) and our analyses. In our phylogenetic hypotheses the Secernentea included part of a traditionally adenophorean taxon, the primarily aquatic Chromadorida (e.g., Plectus) (Fig. 6). The Monhysterida, represented by Diplolaimelloides and Daptonema, are strongly supported as the sister to the Secernentea (Fig. 7). This result was also found by Aleshin et al. (1998) in their SSU analysis of 19 nematode species. The rest of the Chromadorida, including many marine taxa, were placed as sister to the Monhysterida plus Secernentea in our analyses. These trees suggest that it was the common ancestor of the Monhysterids plus Secernentea that gave rise to the secernentean lineage.
The origin of animal parasitism is an important question that has been addressed with previous analyses of nematode SSU (Blaxter et al., 1998; Dorris et al., 1999; Blaxter et al., 2000). Our conservative phylogenetic hypotheses revealed that animal parasitism has evolved independently at least four times, supporting the conclusion of Blaxter et al. (1998). The evolution of parasitism (animal and plant) is summarized on the condensed MR consensus tree of Figure 8. The Strongylida, including such vertebrate parasites as lungworms (e.g., Angiostrongylus) and hookworms (e.g., Necator), were well supported as a clade in all of our analyses (e.g., Fig. 5). A second, large and heterogeneous clade of animal parasites, including Ascaridida (e.g., Ascaris) and Spirurida (e.g., Dirofilaria), was also strongly supported by both full (Fig. 6) and reduced analyses and was moderately supported by the culled analysis. Vertebrate parasites of the genus Strongyloides (Rhabditida) were placed by all analyses with panagrolaimid rhabditids, but as sister to the insect-associated fungivore Bursaphelenchus (Fig. 6), and likely represent a third origin of animal parasitism. Finally, the only adenophorean vertebrate parasites, the Trichocephalida (represented by Trichuris and Trichinella), were placed at the very base of the ingroup taxa (Fig. 7), representing a fourth origin of animal parasitism. Trichuris and Trichinella, however, have been noted for their particularly low rate of substitution for the SSU molecule: 0.141 and 0.110 substitutions per site, respectively, compared to 0.187 for Caenorhabditis (Aguinaldo et al., 1997). Although it is possible that Trichocephalida is sister to the remaining nematodes, the placement of this taxon might also be influenced by lineage-specific rate variation. Further taxon sampling among traditionally adenophorean nematodes is necessary to more fully characterize phylogenetic diversity among these groups.
Blaxter et al. (1998, 2000) and Dorris et al. (1999) suggested an arthropod-parasitic origin for the vertebrate parasites. As evidence they noted some examples of topological associations between arthropod parasites and vertebrate parasites. Because vertebrate parasites were not nested within groups of arthropod parasites, we consider the mere sister-group relationship between arthropod and vertebrate parasites to be inconclusive and find no support for the hypothesis that vertebrate parasites had arthropod parasites as their ancestors. For example, the entomopathogenic Heterorhabditis species were well supported as sister to the vertebrate parasitic Strongylida in all analyses (Fig. 5), but this sister-taxon relationship neither establishes that their common ancestor was an entomopathogen nor precludes a free-living ancestor giving rise to both arthropod and vertebrate parasites. Additionally, the insect parasite Brumptaemilius was nested within a large clade (Fig. 6) that includes many vertebrate parasites (e.g., ascaridids and spirurids) and has as sister other taxa (e.g., oxyurids) that parasitize both vertebrates and invertebrates, providing no clear evidence for the evolution of the host associations among these parasites. The trees of Blaxter et al. (1998) placed the entomopathogenic Steinernema as sister to Strongyloides and Panagrolaimidae, but our analyses did not recover this relationship (Fig. 6). Mermis nigrescens was the only arthropod-parasitic taxon at the base of our tree, among traditionally adenophorean nematodes. It was not placed with the adenophorean vertebrate parasites in our study, the Trichocephalida, but was instead strongly supported as sister to Mylonchulus, a predator. Thus, neither our phylogenies nor those of Blaxter et al. (1998, 2000) and Dorris et al. (1999) have recovered a vertebrate parasite nested within a clade of arthropod-parasitic taxa, a topology that would provide strong evidence for an arthropod-parasitic origin.
That both arthropod and vertebrate parasites have evolved multiple times from free-living common ancestors is consistent with the hypothesis that certain preadaptations in free-living nematodes may facilitate adaptations to parasitism. These preadaptations include abilities of certain nematodes to thrive in saprophytic environments, such as higher temperatures resulting from bacterial activity and anaerobic conditions (Osche, 1956, 1963). The addition of more arthropod parasites to the SSU data set would permit more robust testing of these hypotheses.
Our results also suggest multiple origins of plant parasites from free-living nematodes (Fig. 8). Among traditionally adenophorean nematodes, the free-living enoplid Prismatolaimus was strongly supported as sister to the plant-parasitic Triplonchida (Trichodorus and Paratrichodorus) in our full (Fig. 7) and reduced analyses, but only weakly supported by our culled analyses. A second clade of adenophorean plant parasites, the Dorylaimida (Xiphinema and Longidorus), was weakly supported (65% bootstrap value) as sister to Mermis and Mylonchulus by the reduced analysis, but was supported more strongly (90% bootstrap value) by the full (Fig. 7) and culled analyses. Sampled secernentean plant parasites (the Tylenchida, e.g., Meloidogyne) were strongly supported as sister to the free-living, bacterial feeders in Cephalobidae (e.g., Cephalobus). Ultrastructural examination of the stoma of certain Cephalobina had previously suggested this relationship (Dolinski et al., 1998).
Although many relationships among major clades remained unresolved by the SSU data set, these results provide an opportunity to examine the monophyly of many important groups. The vertebrate parasites in Strongylida formed a strongly supported (100% bootstrap value) clade in all analyses (Fig. 5) and were nested within the widely paraphyletic Rhabditida. The Strongylida was well supported as sister to the entomopathogenic Heterorhabditis and, as suggested by Fitch and Thomas (1997), is part of a clade of rhabditids that Sudhaus and Fitch (2001) termed "Eurhabditis" (e.g., Pellioditis and Rhabditella). Members of Rhabditida were the most numerous taxa in our data set and seemed to show greater topological differences between the full and reduced phylogenetic analyses than did members of other higher taxa. This may have been a result of their relatively high sequence divergence (e.g., compared to the Strongylida, which showed almost no variation even among families) or their greater breadth of taxon sampling. There were conflicting placements for several clades, depending on whether the full or reduced data set was used. Caenorhabditis, for example, consistently formed a well-supported clade in full and reduced analyses and was supported as sister to Diploscapter and Protorhabditis by the full analysis (74% bootstrap value; Fig. 5) and the culled analysis (77% bootstrap value). In the reduced analysis, Caenorhabditis was instead placed as sister to the "Eurhabditis"/Strongylida clade, and Diploscapter and Protorhabditis were strongly supported (100% bootstrap value) as sister to the Caenorhabditis/"Eurhabditis"/Strongylida clade. This suggests that the sites supporting a sister-group relationship between Caenorhabditis and the Diploscapter/Protorhabditis clade were removed in the culling process.
In an analysis of Rhabditidae using SSU (Fitch et al., 2000) the relationships among these clades were unresolved, but more recent analyses using SSU, large subunit, and portions of the RNA polymerase II gene placed the Caenorhabditis clade and Diploscapter/Protorhabditis clades as sister taxa (Kiontke et al., 2005). Although removing ambiguous sites seemed to increase the resolution among secernentean clades, most of the additional resolution did not show bootstrap support and may therefore be unreliable. Some of the members of Rhabditida did form strongly supported monophyletic groups, such as the Cephalobidae (e.g., Cephalobus), with 100% bootstrap support for full (Fig. 6) and reduced analyses and 92% for culled analyses.
The vertebrate parasitic Ascaridida was paraphyletic in the full analysis as it included the insect parasitic Rhigonematida (Fig. 6); however, this result was not reliably supported by bootstrap resampling. In the reduced analysis Ascaridida was part of a large and poorly resolved clade that included Rhigonematida and vertebrate parasites representing Spirurida and Oxyurida. Spirurida, including important human parasites such as Dracunculus and Wuchereria, was also shown to be paraphyletic, as the spirurid Gnathostoma was placed apart from the rest of the spirurids (Fig. 6). Several traditionally adenophorean taxa were also shown to be paraphyletic, including the predominantly aquatic Chromadorida, which had members (Plectus) placed among the generally terrestrial secernentean taxa for all analyses (Fig. 6). None of these placements of Plectus was supported by bootstrap results, however, and its affinities remain unclear. Enoplida also appeared to be paraphyletic, as one of its members, Prismatolaimus, was strongly supported as sister to the plant parasitic Triplonchida (Fig. 7). Many higher adenophorean taxa were represented by only one species in the SSU data set, such as the predatory Mylonchulus in Mononchida, requiring additional taxa in order to test monophyly.
Utility of the Nematode SSU Data Set
We have shown the nematode SSU data set to be sensitive to the choice of alignment parameters. Both resolution and topological differences were seen in trees produced from different alignments. The removal of alignment ambiguous data serves to reduce the influence of alignment parameters, making the choice of parameters less critical. We advocate a conservative approach to this data set that includes an examination of regions that are sensitive to alignment parameters. We suggest that due to the large amount of SSU sequence divergence between major clades of nematodes, this data set may never resolve deep relationships. The greater utility of the nematode SSU data set may lie in its use within major clades, where alignments are likely to be more robust.
Acknowledgments
We thank B. J. Adams, D. K. Berwald, J. G. Burleigh, A. C. Driskell, J. R. Garey, D. M. Gusfield, R. H. Ree, and M. E. Siddall for helpful suggestions and assistance during the course of the project. We also thank two anonymous reviewers as well as J. G. Baldwin, D. Ciszek, K. M. Kjer, R. D. M. Page, and M. Yoder for critical comments that helped improve the manuscript. This research was supported by NSF PEET grant DEB-9712355 and NSF Tree of Life grant DEB-0228692.
Notes
3 Department of Invertebrate Zoology, NMNH, P.O. Box 37012, MRC 163, Smithsonian Institution, Washington, D.C., 20013-7012, USA
References
Aguinaldo A. M. A., Turbeville J. M., Linford L. S., Rivera M. C., Garey J. R., Raff R. A., Lake J. A. Evidence for a clade of nematodes, arthropods and other moulting animals. Nature (1997) 387:489–493.
Aleshin V. V., Kedrova O. S., Milyutina I. A., Vladychenskaya N. S., Petrov N. B. Relationships among nematodes based on the analysis of 18S rRNA gene sequences: Molecular evidence for monophyly of chromadorian and secernentian nematodes. Russ. J. Nematol. (1998) 6:175–184.
Amenta N., Klingner J. Case study: Visualizing sets of evolutionary trees. In: 8th IEEE Symposium on Information Visualization (2002) 71–74.
Barrett M., Donoghue M. J., Sober E. Against consensus. Syst. Zool. (1991) 40:486–493.
Barthélemy J. P., McMorris F. R. The median procedure for n-trees. J. Classif. (1986) 3:329–334.
Benson D. A., Karsch-Mizrachi I., Lipman D. J., Ostell J., Wheeler D. L. GenBank: Update. Nucleic Acids Res. (2004) 32:D23–D26.
Bertelli S., Giannini N. P. A phylogeny of extant penguins (Aves: Sphenisciformes) combining morphology and mitochondrial sequences. Cladistics (2005) 21:209–239.
Blaxter M. L., De Ley P., Garey J. R., Liu L. X., Scheldeman P., Vierstraete A., Vanfleteren J. R., Mackey L. Y., Dorris M., Frisse L. M., Vida J. T., Thomas W. K. A molecular evolutionary framework for the phylum Nematoda. Nature (1998) 392:71–75.
Blaxter M. L., Dorris M., De Ley P. Patterns and processes in the evolution of animal parasitic nematodes. Nematology (2000) 2:43–55.
Bonnet E., Van de Peer Y. zt: a software tool for simple and partial Mantel tests. J. Stat. Soft. (2002) 7:1–12.
Borg I., Groenen P. Modern multidimensional scaling (1997) Heidelberg: Springer-Verlag.
Cannone J. J., Subramanian S., Schnare M. N., Collett J. R., D'Souza L. M., Du Y., Feng B., Lin N., Madabusi L. V., Muller K. M., Pande N., Shang Z., Yu N., Gutell R. R. The Comparative RNA Web (CRW) Site: An online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics (2002) 3:15. Available at http://www.rna.icmb.utexas.edu/.
Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. (2000) 17:540–552.
Chitwood B. G. The designation of official names for higher taxa of invertebrates. Bull. Zool. Nomen. (1958) 15:860–895.
Colless D. H. Congruence between morphometric and allozyme data for Menidia species: A reappraisal. Syst. Zool. (1980) 29:288–299.
Damgaard J., Andersen N. M., Meier R. Combining molecular and morphological analyses of water strider phylogeny (Hemiptera-Heteroptera, Gerromorpha): Effects of alignment and taxon sampling. Syst. Entomol. (2005) 30:289–309.
De Ley P., Blaxter M. L. Systematic position and phylogeny. In: Lee D., ed. The Biology of Nematodes (2002) London: Taylor and Francis. 1–30.
DeSalle R., Wray C., Absher R. Computational problems in molecular systematics. In: Schierwater B., Streit B., Wagner G., DeSalle R., eds. Molecular ecology and evolution: Approaches and applications (1994) Basel, Switzerland: Birkhaeuser Verlag. 353–370.
Dolinski C., Borgonie G., Schnabel R., Baldwin J. G. Buccal capsule development as a consideration for phylogenetic analysis of Rhabditida (Nemata). Dev. Genes Evol. (1998) 208:495–503.
Dorris M., De Ley P., Blaxter M. L. Molecular analysis of nematode diversity and the evolution of parasitism. Parasitol. Today (1999) 15:188–193.
Ellis R. E., Sulston J. E., Coulson A. R. The rDNA of C. elegans: Sequence and structure. Nucleic Acids Res. (1986) 14:2345–2364.
Felsenstein J. Confidence limits on phylogenies: An approach using the bootstrap. Evolution (1985) 39:783–791.
Fitch D. H., Thomas W. K. Evolution. In: Riddle D., Blumenthal T., Meyer B., Priess J., eds. C. elegans II (1997) Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press. 815–850.
Fitch D. H. A. Evolution of "Rhabditidae" and the male tail. J. Nematol. (2000) 32:235–244.
Gardner P. P., Wilm A., Washietl S. A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res. (2005) 33:2433–2439.
Gatesy J., DeSalle R., Wheeler W. Alignment ambiguous nucleotide sites and the exclusion of systematic data. Mol. Phylogenet. Evol. (1993) 2:152–157.
Gillespie J. J. Characterizing regions of ambiguous alignment caused by the expansion and contraction of hairpin-stem loops in ribosomal RNA molecules. Mol. Phylogenet. Evol. (2004) 33:936–943.
Giribet G. Generating implied alignments under direct optimization using POY. Cladistics (2005) 21:396–402.
Goertzen L. R., Cannone J. J., Gutell R. R., Jansen R. K. ITS secondary structure derived from comparative analysis: Implications for sequence alignment and phylogeny of the Asteraceae. Mol. Phylogenet. Evol. (2003) 29:216–234.
Hasegawa M., Kishino H., Yano T. Dating the human-ape split by a molecular clock of mitochondrial DNA. J. Mol. Evol. (1985) 22:160–174.
Hickson R. E., Simon C., Perrey S. W. The performance of several multiple-sequence alignment programs in relation to secondary-structure features for an rRNA sequence. Mol. Biol. Evol. (2000) 17:530–539.
Hillis D., Heath T., St. John K. Analysis and visualization of tree space. Syst. Biol. (2005) 54:471–482.
Hillis D. M., Dixon M. T. Ribosomal DNA: Molecular evolution and phylogenetic inference. Q. Rev. Biol. (1991) 66:411–453.
Hudelot C., Gowri-Shankar V., Jow H., Rattray M., Higgs P. G. RNA-based phylogenetic methods: Application to mammalian mitochondrial RNA sequences. Mol. Phylogenet. Evol. (2003) 28:241–252.
Huelsenbeck J. P., Ronquist F. MrBayes: Bayesian inference of phylogeny. Bioinformatics (2001) 17:754–755.
Kawakita A., Sota T., Ascher J. S., Ito M., Tanaka H., Kato M. Evolution and phylogenetic utility of alignment gaps within intron sequences of three nuclear genes in bumble bees (Bombus). Mol. Biol. Evol. (2003) 20:87–92.
Kiontke K., Fitch D. H. A. The phylogenetic relationships of Caenorhabditis and other rhabditids. In: C. elegans Research Community, ed. WormBook (2005) Available at http://www.wormbook.org.
Kjer K. M. Use of rRNA secondary structure in phylogenetic studies to identify homologous positions: An example of alignment and data presentation from the frogs. Mol. Phylogenet. Evol. (1995) 4:314–330.
Kjer K. M. Aligned 18S and insect phylogeny. Syst. Biol. (2004) 53:506–514.
Kluge A. G. A concern for evidence and a phylogenetic hypothesis of relationships among Epicrates (Boidae, Serpentes). Syst. Zool. (1989) 38:7–25.
Kruskal J. B. An overview of sequence comparison. In: Sankoff D., Kruskal J. B., eds. Time warps, string edits, and macromolecules: The theory and practice of sequence comparison (1983) Reading, Massachusetts: Addison-Wesley. 1–45.
Lecointre G., Deleporte P. Total evidence requires exclusion of phylogenetically misleading data. Zool. Scripta (2005) 34:101–117.
Lee M. S. Y. Unalignable sequences and molecular evolution. Trends Ecol. Evol. (2001) 16:681–685.
Lingoes J. C., Roskam E. E., Borg I. Geometric representations of relational data, 2nd edition (1979) Ann Arbor, Michigan: Mathesis Press.
Löytynoja A., Milinkovitch M. C. Molecular phylogenetic analyses of the mitochondrial ADP-ATP carriers: The Plantae/Fungi/Metazoa trichotomy revisited. Proc. Nat. Acad. Sci. USA (2001) 98:10202–10207.
Löytynoja A., Milinkovitch M. C. A hidden Markov model for progressive multiple alignment. Bioinformatics (2003) 19:1505–1513.
Lutzoni F., Wagner P., Reeb V., Zoller S. Integrating ambiguously aligned regions of DNA sequences in phylogenetic analyses without violating positional homology. Syst. Biol. (2000) 49:628–651.
Maddison W. P., Maddison D. R. Mesquite: A modular system for evolutionary analysis, version 1.02. (2004) Available at http://mesquiteproject.org.
Mantel N. The detection of disease clustering and a generalized regression approach. Cancer Res. (1967) 27:209–220.
Mantel N., Valand R. S. A technique of nonparametric multivariate analysis. Biometrics (1970) 26:547–558.
Mattern M. Y., McLennan D. A. Total evidence phylogeny of Gasterosteidae: Combining molecular, morphological and behavioural data. Cladistics (2004) 20:14–22.
Morrison D. A., Ellis J. T. Effects of nucleotide sequence alignment on phylogeny estimation: A case study of 18S rDNAs of Apicomplexa. Mol. Biol. Evol. (1997) 14:428–441.
Nadler S. A., Hudspeth D. S. S. Phylogeny of the Ascaridoidea (Nematoda: Ascaridida) based on three genes and morphology: Hypotheses of structural and sequence evolution. J. Parasitol. (2000) 86:380–393.
Nishiguchi M. K., Nair V. S. Evolution of symbiosis in the Vibrionaceae: A combined approach using molecules and physiology. Int. J. Syst. Evol. Micr. (2003) 53:2019–2026.
Osche G. Die Praeadaptation freilebender Nematoden an den Parasitismus. Verhandlungen der Deutschen Zoologischen Gesellschaft, Erlangen (1956) 19:391–396. (Zoologischer Anzeiger Supplement).
Osche G. Morphological, biological, and ecological considerations in the phylogeny of parasitic nematodes. In: Dougherty E., ed. The Lower Metazoa, Comparative Biology and Phylogeny (1963) Berkeley, California: University of California Press. 283–302.
Peterson K. J., Eernisse D. J. Animal phylogeny and the ancestry of bilaterians: Inferences from morphology and 18S rDNA gene sequences. Evol. Dev. (2001) 3:170–205.
Rambaut A. Se-al: Sequence alignment editor (1996) Available at http://evolve.zoo.ox.ac.uk/.
Rinsma-Melchert I. The expected number of matches in optimal global sequence alignments. N. Z. J. Bot. (1993) 31:219–230.
Robinson D. F., Foulds L. R. Comparisons on weighted labelled trees. In: Lecture notes in mathematics Volume 748 (1979) Berlin: Springer-Verlag. 119–126.
Robinson D. F., Foulds L. R. Comparison of phylogenetic trees. Math. Biosci. (1981) 53:131–147.
Ronquist F., Huelsenbeck J. P. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics (2003) 19:1572–1574.
Sanchis A., Michelena J. M., Latorre A., Quicke D. L. J., Gardenfors U., Belshaw R. The phylogenetic analysis of variable-length sequence data: Elongation factor-1α introns in European populations of the parasitoid wasp genus Pauesia (Hymenoptera: Braconidae: Aphidiinae). Mol. Biol. Evol. (2001) 18:1117–1131.
Sankoff D. Minimal mutation trees of sequences. SIAM J. Appl. Math. (1975) 28:35–42.
Sankoff D., Cedergren R. J. Simultaneous comparison of three or more sequences related by a tree. In: Sankoff D., Kruskal J. B., eds. Time warps, string edits, and macromolecules: The theory and practice of sequence comparison (1983) Reading, Massachusetts: Addison-Wesley. 253–263.
Sankoff D., Morel C., Cedergren R. J. Evolution of 5S RNA and the non-randomness of base replacement. Nat. New Biol. (1973) 245:232–234.
Steel M., Dress A. W. M., Böcker S. Simple but fundamental limitations on supertree and consensus tree methods. Syst. Biol. (2000) 49:363–368.
Sudhaus W., Fitch D. H. A. Comparative studies on the phylogeny and systematics of the Rhabditidae (Nematoda). J. Nematol. (2001) 33:1–72.
Swofford D. L. When are phylogenetic estimates from molecular and morphological data incongruent? In: Miyamoto M. M., Cracraft J., eds. Phylogenetic Analysis of DNA Sequences (1991) New York: Oxford University Press. 295–333.
Swofford D. L. PAUP*: Phylogenetic analysis using parsimony (*and other methods). Version 4 (2003) Sunderland, Massachusetts: Sinauer Associates.
Tekle Y. I., Raikova O. I., Ahmadzadeh A., Jondelius U. Revision of the Childiidae (Acoela), a total evidence approach in reconstructing the phylogeny of acoels with reversed muscle layers. J. Zool. Sys. Evol. Res. (2005) 43:72–90.
Telford M. J., Wise M. J., Gowri-Shankar V. Consideration of RNA secondary structure significantly improves likelihood-based estimates of phylogeny: Examples from the bilateria. Mol. Biol. Evol. (2005) 22:1129–1136.
Thompson J. D., Higgins D. G., Gibson T. J. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. (1994) 22:4673–4680.
Thornton J. W., DeSalle R. A new method to localize and test the significance of incongruence: Detecting domain shuffling in the nuclear receptor superfamily. Syst. Biol. (2000) 49:183–201.
van Rossum G., Drake F. L. Python Reference Manual (2001) Virginia, USA: PythonLabs. Available at http://www.python.org.
Wheeler W. C. Sequence alignment, parameter sensitivity and the phylogenetic analysis of molecular data. Syst. Biol. (1995) 44:321–331.
Wheeler W. C. Optimization alignment: The end of multiple sequence alignment in phylogenetics? Cladistics (1996) 12:1–9.
Wheeler W. C. Fixed character states and the optimization of molecular sequence data. Cladistics (1999) 15:379–385.
Wheeler W. C. Implied alignment: A synapomorphy-based multiple-sequence alignment method and its use in cladogram search. Cladistics (2003) 19:261–268.
Wheeler W. C., Gladstein D., De Laet J. POY, version 3.0.11. Program and documentation. American Museum of Natural History (2003) Available at ftp.amnh.org/pub/molecular.
Xia X., Xie Z., Kjer K. M. 18S ribosomal RNA and tetrapod phylogeny. Syst. Biol. (2003) 52:283–295.
Young F. W., Hamer R. M. Multidimensional scaling: History, theory and applications (1987) New York: Erlbaum.
Zhang Y. C., Baldwin J. G. Ultrastructure of the postcorpus of the esophagus of Teratocephalus lirellus (Teratocephalida) and its use for interpreting character evolution in Secernentea (Nematoda). Can. J. Zool. (2001) 79:16–25.