© 2006 Society of Systematic Biologists
Incorporating Allelic Variation for Reconstructing the Evolutionary History of Organisms from Multiple Genes: An Example from Rosa in North America
Edited by Allen Baker: Associate Editor
1 Institut de recherche en biologie végétale, Université de Montréal 4101 Sherbrooke Est, Montréal (Québec), H1X 2B2, Canada E-mail: simon.joly{at}umontreal.ca (S.J.) anne.bruneau{at}umontreal.ca (A.B.)
Abstract
Allelic variation within individuals holds information regarding the relationships of organisms, which is expected to be particularly important for reconstructing the evolutionary history of closely related taxa. However, little effort has been devoted to incorporating such information when reconstructing the phylogeny of organisms. Haplotype trees represent a solution when one nonrecombinant marker is considered, but there is no satisfying method when multiple genes are to be combined. In this paper, we propose an algorithm that converts a distance matrix of alleles to a distance matrix among organisms. This algorithm allows the incorporation of allelic variation for reconstructing the phylogeny of organisms from one or more genes. The method is applied to reconstruct the phylogeny of the seven native diploid species of Rosa sect. Cinnamomeae in North America. The glyceraldehyde 3-phosphate dehydrogenase (GAPDH), the triose phosphate isomerase (TPI), and the malate synthase (MS) genes were sequenced for 40 individuals from these species. The three genes had little genetic variation, and most species showed incomplete lineage sorting, suggesting these species have a recent origin. Despite these difficulties, the networks (NeighborNet) of organisms reconstructed from the matrix obtained with the algorithm recovered groups that more closely match taxonomic boundaries than did the haplotype trees. The combined network of individuals shows that species west of the Rocky Mountains, Rosa gymnocarpa and R. pisocarpa, form exclusive groups and that together they are distinct from eastern species. In the east, three groups were found to be exclusive: R. nitida–R. palustris, R. foliolosa, and R. blanda–R. woodsii. These groups are congruent with the morphology and the ecology of species. The method is also useful for representing hybrid individuals when the relationships are reconstructed using a phylogenetic network.
Keywords: Allelic variation; gene tree–species tree; haplotype trees; hybridization; incomplete lineage sorting; phylogenetic networks; Rosa; total evidence
Received September 15, 2005; Revised December 10, 2005; Accepted January 6, 2006
Allelic variation at autosomal loci holds information regarding the relationships of organisms. Indeed, using two alleles instead of one can give better estimations of phylogenetic relationships because twice the amount of information is provided. This is especially true of closely related taxa for which incomplete lineage sorting is more likely (Rosenberg, 2002, 2003; Degnan and Salter, 2005). In addition, allelic variation allows the detection of hybrid individuals with a single marker, whereas at least two are required when only one allele per locus is sampled. But in spite of the amount of data contained in allelic variation, little effort has been directed to date at incorporating such information for reconstructing the phylogenetic relationships of organisms.
One solution when a single nonrecombinant marker is considered is to use haplotype trees, which frequently are used in evolutionary studies of closely related species (Schaal and Olsen, 2000). At present, however, there is no phylogenetic method that can easily incorporate allelic variation for more than one gene for reconstructing the evolutionary history of individuals. Yet the importance of investigating several markers for reconstructing the phylogeny of species is widely recognized as any single gene can be incongruent with the evolutionary history of species (Pamilo and Nei, 1988; Takahata, 1989; Wu, 1991; Doyle, 1992; Maddison, 1997; Nichols, 2001; Rosenberg, 2002, 2003; Degnan and Salter, 2005).
Most current approaches used for reconstructing phylogenies from multiple markers, either using a total evidence (e.g., Kluge, 1989; Yang, 1996; Seo et al., 2005) or a consensus approach (e.g., de Queiroz, 1993), cannot incorporate allelic variation for multiple genes because they use haplotypes as terminal units of the analysis. Because it makes no sense to concatenate alleles from different loci that segregate in natural populations, such methods are limited to using a single haplotype per individual. If the individuals, rather than the alleles, were the terminals of the analysis, it would be possible to combine information from different genes.
In this paper, we propose an algorithm that incorporates allelic variation for reconstructing the phylogeny of organisms. The proposed algorithm converts a distance matrix of alleles into a distance matrix of organisms so that individuals become the terminals of the analysis. The matrix of organisms for one marker can either be used alone or in combination with other matrices obtained from independently evolving markers to reconstruct a phylogeny of organisms.
The algorithm is applied to reconstruct the evolutionary history of the seven native diploid species of Rosa sect. Cinnamomeae in North America using allelic variation at three nuclear loci for 40 individuals. Very little is known of the phylogenetic relationships of these rose species, mostly because of the poor species sampling of previous phylogenetic studies (e.g., Millan et al., 1996; Matsumoto et al., 1998). Moreover, the little molecular variation found among North American species (Wissemann and Ritz, 2005; Joly et al., 2006) limits our understanding of their relationships and suggests that these species are of recent origin. Consequently, incomplete lineage sorting (or deep coalescence) could be an important issue in this group as it is expected to be most severe among recently diverged species (Rosenberg, 2002, 2003; Degnan and Salter, 2005). Hybridization also could be a confounding evolutionary process because of the propensity of these roses to hybridize (Erlanson, 1934; Ratsek et al., 1939, 1940; Lewis and Basye, 1961). Therefore, this group represents a good case study to test the proposed algorithm because of the potentially important additional information that allelic variation can provide.
The POFAD Algorithm
The POFAD (for Phylogeny of Organisms from Allelic Data) algorithm starts with a distance matrix of alleles for a given marker. The algorithm described below assumes that the organisms are diploids. The algorithm will be illustrated using a hypothetical example with five individuals (A to E) from which we have a haplotype distance matrix (Fig. 1A) that can be represented by a haplotype tree (Fig. 1B). In the example, letters are used to distinguish individuals: capital and lowercase letters represent individuals and alleles, respectively. Alleles within an individual are set apart by a number (1 or 2).
Calculating the Distance between Organisms
Let d(A, B) be the distance between individuals A and B and d(a, b) be the distance between alleles a and b. Moreover, let min[x; y] be the minimum of values of x and y. When evaluating the distance between two diploid individuals at a locus, three situations can be encountered:
(1) Both Individuals Have a Single Allele
In this situation, the distance between individuals is equal to the allelic distance. If A and B are two individuals that both have 1 allele,
$$d(A, B) = d(a, b)$$
(2) One Individual Has One Allele and the Other Has Two Alleles
If A is an individual with one allele (a) and C is an individual with two alleles (c1, c2), then
$$d(A, C) = \frac{d(a, c_1) + d(a, c_2)}{2}$$
(3) Both Individuals Have Two Alleles
Two individuals, D and E, both have two alleles (d1, d2 and e1, e2). There are two pairs of allelic distances possible among these individuals: d(d1, e1) and d(d2, e2) or d(d1, e2) and d(d2, e1). The distance between such organisms is the mean of the shortest pair of distances:
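$$d(D, E) = \min\left[\frac{d(d_1, e_1) + d(d_2, e_2)}{2};\ \frac{d(d_1, e_2) + d(d_2, e_1)}{2}\right]$$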
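The three situations described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the POFAD program itself: the function names (`allele_distance`, `organism_distance`), the dictionary layout, and the toy allelic distances are assumptions made for the example, and the one-allele versus two-allele case is assumed to be the mean of the two allelic distances.

```python
def allele_distance(dist, a, b):
    """Look up a symmetric allelic distance; an allele is at distance 0 from itself."""
    if a == b:
        return 0.0
    return dist[frozenset((a, b))]


def organism_distance(dist, alleles_x, alleles_y):
    """Distance between two diploid individuals at one locus.

    alleles_x and alleles_y are tuples holding one allele label
    (a single allele) or two allele labels (two alleles).
    """
    if len(alleles_x) == 1 and len(alleles_y) == 1:
        # Situation 1: the organism distance is the allelic distance.
        return allele_distance(dist, alleles_x[0], alleles_y[0])

    if len(alleles_x) == 1 or len(alleles_y) == 1:
        # Situation 2 (assumed here): mean of the two allelic distances.
        single, pair = (alleles_x, alleles_y) if len(alleles_x) == 1 else (alleles_y, alleles_x)
        return (allele_distance(dist, single[0], pair[0])
                + allele_distance(dist, single[0], pair[1])) / 2

    # Situation 3: mean of the shortest of the two possible pairings.
    x1, x2 = alleles_x
    y1, y2 = alleles_y
    first_pairing = (allele_distance(dist, x1, y1) + allele_distance(dist, x2, y2)) / 2
    second_pairing = (allele_distance(dist, x1, y2) + allele_distance(dist, x2, y1)) / 2
    return min(first_pairing, second_pairing)


# Hypothetical toy data: alleles placed on a line, distance = absolute difference.
positions = {"a": 0, "b": 2, "c1": 1, "c2": 3, "d1": 0, "d2": 4, "e1": 4, "e2": 1}
toy_dist = {
    frozenset((p, q)): float(abs(positions[p] - positions[q]))
    for p in positions for q in positions if p != q
}

d_AB = organism_distance(toy_dist, ("a",), ("b",))
d_AC = organism_distance(toy_dist, ("a",), ("c1", "c2"))
d_DE = organism_distance(toy_dist, ("d1", "d2"), ("e1", "e2"))
```

With these toy values, the second pairing in situation 3 (d1 with e2, d2 with e1) is the shorter one, so it alone determines the distance between D and E.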
Combining Information from Different Genes
The matrix of organisms obtained from one marker can either be used alone or be combined with matrices obtained from other markers. For the present paper, each gene matrix is reweighted so that each gene makes an equal contribution to the combined phylogeny. This is done by dividing each distance in a gene matrix by the largest distance of that matrix. By attributing the same weight to each gene, every gene is considered to represent an independent estimation of the phylogeny. To fulfill this requirement, there needs to be no recombination within markers. In the presence of recombination, more than one evolutionary history is present in one marker and consequently the nonrecombining portions of a recombinant gene will be down-weighted. It is therefore recommended to test for recombination before combining different genes.
When combining multiple gene matrices, the final distance between two individuals is the mean of distances between these individuals in the individual matrices. If M and N are two individuals, then the mean distance between them will be:
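$$d(M, N) = \frac{1}{n}\sum_{i=1}^{n} d_i(M, N)$$

where $d_i(M, N)$ is the distance between M and N in the reweighted matrix of gene $i$ and $n$ is the number of genes combined.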
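The reweighting and averaging steps can be sketched as follows. This is a minimal sketch under stated assumptions: the function names (`rescale`, `combine`), the nested-list matrix layout, and the two toy gene matrices are hypothetical, and every matrix is assumed to list the same individuals in the same order.

```python
def rescale(matrix):
    """Divide every distance by the largest distance of the matrix."""
    largest = max(max(row) for row in matrix)
    if largest == 0:
        raise ValueError("matrix has no variation; it cannot be rescaled")
    return [[value / largest for value in row] for row in matrix]


def combine(matrices):
    """Mean of the rescaled organism matrices, one matrix per gene."""
    scaled = [rescale(m) for m in matrices]
    n_genes = len(scaled)
    size = len(scaled[0])
    return [
        [sum(m[i][j] for m in scaled) / n_genes for j in range(size)]
        for i in range(size)
    ]


# Hypothetical organism matrices for two genes and three individuals.
gene_1 = [[0.0, 2.0, 4.0],
          [2.0, 0.0, 2.0],
          [4.0, 2.0, 0.0]]
gene_2 = [[0.0, 1.0, 1.0],
          [1.0, 0.0, 0.5],
          [1.0, 0.5, 0.0]]

combined = combine([gene_1, gene_2])
```

Rescaling first means that a gene with larger raw distances (here `gene_1`) does not dominate the combined matrix.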
In our imaginary example, the relationship of individuals was reconstructed from the matrix of organisms (Fig. 1C) using the NeighborNet method (Bryant and Moulton, 2004; Fig. 1D).
Material and Methods
Plant Material
Forty individuals from all seven North American diploid species of Rosa sect. Cinnamomeae were investigated (Table 1). Rosa gymnocarpa Nutt. and R. pisocarpa Gray are found exclusively west of the Rocky Mountains; R. blanda Ait., R. foliolosa Nutt. ex Torr. & A. Gray, R. nitida Willd., and R. palustris Marsh. occur strictly east of the Rockies, and R. woodsii Lindl. can be found on both sides of these mountains. Two diploid species of section Synstylae found in North America, R. setigera Michx. (native) and R. multiflora Thunb. (introduced and now a noxious invasive [Meiners et al., 2001; Hunter and Mattice, 2002]), were included as outgroup taxa. DNA was extracted using the CTAB method of Doyle and Doyle (1987) modified as in Joly (2006).
Gene Sequencing and Allele Sampling
Three nuclear genes were used in this study: glyceraldehyde 3-phosphate dehydrogenase (GAPDH), triose phosphate isomerase (TPI), and malate synthase (MS). The GAPDH sequences are from Joly et al. (2006; GenBank DQ091014–027, 030–035, 038–057, 061–069, 072–086, 172–174). TPI was amplified and sequenced using forward primer TPI5F (5'-AAGGTGATCGCCTGTGTTGG-3') and reverse primer TPI7R (Strand et al., 1997) located in the fifth and seventh exons of the gene, respectively (Fig. 2). The MS gene was amplified and sequenced using primers ms400f and ms943r (Lewis and Doyle, 2001); the amplified region covers the first two introns of the gene (Fig. 2). The PCR conditions were as in Joly et al. (2006) except that the annealing temperature was 52°C and 48°C for TPI and MS, respectively, and that a manual hotstart was used for TPI (i.e., the Taq was included after the sample reached 95°C). PCR purification and sequencing followed Joly et al. (2006). Allele recovery was achieved using the procedure described in Joly et al. (2006). In short, individuals with no polymorphic peaks in direct sequencing were considered to be homozygous. Alleles of individuals that showed a single polymorphic site were easily extrapolated, but individuals that showed more than one polymorphic site or that had indels among their alleles needed to be cloned. Three to four clones were sequenced per individual to allow the detection of PCR-induced mutations and of in vitro recombinants. The cloning procedure is described in Joly et al. (2006).
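The allele-recovery rule summarized above can be illustrated with a short Python sketch. It is only an approximation of the procedure: it assumes that polymorphic peaks have been recorded as two-base IUPAC ambiguity codes in an uppercase direct sequence, the function name `infer_alleles` is hypothetical, and heterozygous indels (which also require cloning) are not handled.

```python
# Two-base IUPAC ambiguity codes and the bases they stand for.
IUPAC_TWO_BASE = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def infer_alleles(direct_sequence):
    """Return the alleles implied by a direct sequence, or None if cloning is needed.

    - no polymorphic site: one allele (individual treated as homozygous)
    - one polymorphic site: the two alleles can be written down directly
    - more than one polymorphic site: phase is unknown, so return None
    """
    sites = [i for i, base in enumerate(direct_sequence) if base in IUPAC_TWO_BASE]
    if not sites:
        return (direct_sequence,)
    if len(sites) == 1:
        i = sites[0]
        first_base, second_base = IUPAC_TWO_BASE[direct_sequence[i]]
        return (
            direct_sequence[:i] + first_base + direct_sequence[i + 1:],
            direct_sequence[:i] + second_base + direct_sequence[i + 1:],
        )
    return None


homozygote = infer_alleles("ACGT")
one_site = infer_alleles("ACRT")
needs_cloning = infer_alleles("RCRT")
```

With two or more polymorphic sites the direct sequence is compatible with several allele pairs, which is why cloning is needed to establish the phase.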
Analyses
Recombination
For each gene, recombination was tested using the homoplasy test (Maynard Smith and Smith, 1998), the neighbor similarity score (Jakobsen and Easteal, 1996), the Max chi-squared (χ²; Maynard Smith, 1992), and the pairwise homoplasy index statistic (Φw; Bruen et al., 2006). These methods were selected because they were demonstrated to perform well in datasets of low divergence (Posada and Crandall, 2001; Posada, 2002; Bruen et al., 2006). The homoplasy test was performed without an outgroup using Maynard Smith's program (1998) under conservative (SE = 0.6S) and liberal (SE = S) conditions, where SE is the effective number of sites and S is the total number of sites in the dataset. The three other methods were implemented in a program written by T. Bruen (2005), testing the significance of the statistics using 1000 permutations. The χ² test used a sliding window of size corresponding to the number of polymorphic sites divided by 1.5 and the Φw test used a relative window size (w) of 100.
Phylogenetic Analyses
For each gene, the gaps were recoded using the simple gap coding method (Simmons and Ochoterena, 2000) implemented in GapCoder (Young and Healy, 2003). Haplotype trees were obtained with PAUP* (ver. 4.10b; Swofford, 2002) by heuristic parsimony analysis with 10 random addition sequence replicates, each retaining a maximum of 1000 trees, TBR branch swapping, and saving all minimal trees during branch swapping.
Two methods were used for obtaining allelic distance matrices from sequences. The first used allelic distances corrected using the appropriate evolutionary model, according to the Akaike information criterion (AIC; Akaike, 1974) calculated in ModelTest (ver. 3.7, Posada and Crandall, 1998) from a neighbor-joining tree using the matrices without the gaps recoded and treating gaps as missing data. The second used the uncorrected distance of PAUP* to recover allelic distances from the matrices with gaps coded as presence/absence characters.
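For readers unfamiliar with the uncorrected distance, the idea can be sketched in Python as the proportion of compared characters that differ. This is not PAUP*'s code: the function name `p_distance` is hypothetical, and skipping any character that is missing in either sequence (pairwise deletion) is an assumption of the sketch. Recoded gap characters are represented here as `0`/`1` appended to the nucleotide sequence.

```python
def p_distance(seq_1, seq_2, missing="?N-"):
    """Proportion of differing characters among those scored in both sequences."""
    if len(seq_1) != len(seq_2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_1, seq_2):
        if x in missing or y in missing:
            continue  # character not scored in at least one sequence
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no character is scored in both sequences")
    return differences / compared


# Four nucleotide sites followed by two presence/absence gap characters.
with_gap_characters = p_distance("ACGT01", "ACGA00")
# A gap left as '-' is treated as missing and does not count as a difference.
gap_as_missing = p_distance("AC-T", "ACGT")
```

The two calls show why recoding matters: a gap coded as a presence/absence character contributes to the distance, whereas a gap treated as missing data does not.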
The matrices of organisms were obtained from POFAD for each gene individually and for the three genes in combination. The phylogeny of organisms was reconstructed using the NeighborNet algorithm (Bryant and Moulton, 2004) implemented in SplitsTree (Huson and Bryant, 2006).
Results
Sequences for the genes TPI and MS were deposited in GenBank (DQ200986 to DQ201120) and matrices used for the analyses are available from TreeBase (study accession number S1444). All gene regions have a greater proportion of intron than exon positions in the aligned matrix, with TPI having a greater proportion of intron positions than the other genes for the regions under study (Table 2). Of the three genes, MS is the most variable, particularly in the exons where it has a higher number of both synonymous and non-synonymous mutations (Table 2). Indeed, GAPDH, TPI, and MS have 1, 1, and 8 variable amino acid mutations, respectively. All data sets have several indels, which are all located in the introns except one that resulted in the removal of two amino acids in the MS gene.
Recombination
Of the four methods used for detecting recombination, only the homoplasy test showed evidence of recombination, returning a positive result for all three datasets (Table 3). This discrepancy between methods could be the consequence of the presence of rate variation among sites in the datasets (see Table 2) because the homoplasy test has been shown to give false evidence of recombination in presence of rate heterogeneity (Posada and Crandall, 2001; Posada, 2002). Therefore, it is more likely that there has been no recombination in the three datasets. Visual inspection of homoplasies on haplotype trees (Templeton et al., 1992) also did not reveal evidence of recombination, further supporting an absence of recombination in each of the three datasets.
Haplotype Trees
Because no recombination was detected in the datasets, it is appropriate to use haplotype trees to represent the genealogy of the haplotypes for each gene. The haplotype trees differ with respect to which taxa form a clade for the different genes (Figs. 3A, 4A, 5A). Haplotypes of R. gymnocarpa form a clade with GAPDH and MS, but not with TPI. Haplotypes of R. pisocarpa only group together with GAPDH and none of the other species have their alleles in a single clade, yet this is sometimes the consequence of one or a few incongruent haplotypes. Although haplotypes are more often closer to haplotypes of their own species than to those of other species, the overall pattern is a lack of differentiation of species for any single gene. Despite the little information available regarding species relationships, some species are found in different positions in the haplotype trees. For instance, R. gymnocarpa is sister to all remaining North American species of sect. Cinnamomeae for GAPDH but not according to the other genes.
Organism Trees
The two ways of recovering allelic distances—the uncorrected distance using gap information and the corrected distance according to the appropriate evolutionary model—gave similar results although including gaps gave a slightly better resolution (data not shown). For this reason, only the results obtained with the uncorrected distance are shown. This choice is further motivated by the presence of several indels in the datasets. Indels are frequent among closely related species or individuals (Britten et al., 2003) and provide phylogenetic information (Kelchner, 2000) that should not be overlooked in phylogenetic studies. Moreover, because of the low divergence among species, it is less important to correct for multiple hits when calculating the distances.
The gene networks of organisms were more often congruent with the taxonomic boundaries than the haplotype trees (Figs. 3B, 4B, 5B). The haplotype trees for the genes GAPDH, TPI, and MS resolved one, zero, and one species as monophyletic, respectively, whereas the networks of organisms for the same genes had three, one, and three species resolved by splits. For example, R. foliolosa individuals are resolved by a split in all three genes and R. pisocarpa individuals group together with GAPDH and MS. Similarly, individuals of R. nitida and R. palustris together are resolved by splits with GAPDH and MS, with few exceptions. Finally, R. blanda and R. woodsii individuals together are resolved by a split with GAPDH, although this group also includes individual palustris386.
The networks of organisms appear to appropriately represent intermediate individuals. For example, many individuals (blanda[160, 421, 1214, 1219], woodsii[4, 700, 741], nitida675) have MS haplotypes that occur in each of the two major clades on the haplotype tree (α and β; Fig. 5A). Their intermediate status is clearly represented in the network of organisms as these individuals are positioned between the clusters corresponding to the two clades in the haplotype trees (α and β; Fig. 5B). Similar examples are found with the other genes.
The phylogenetic network obtained when the three nuclear genes are combined (Fig. 6) is more resolved and relationships are clearer than when genes are analyzed individually. The network clearly shows that individuals of R. gymnocarpa are supported by a split as are individuals of R. pisocarpa. However, the relationship of these western species with the eastern ones is not clear. For example, one split suggests that R. gymnocarpa is sister to all remaining North American species, whereas another suggests that it is closer to R. pisocarpa and some individuals of R. blanda and R. woodsii. Neither R. blanda nor R. woodsii is exclusive in the combined analysis, but these two species together are resolved by a weak split (i.e., there is another contradictory split or bipartition of similar or greater length on the network), which groups all individuals except woodsii700. The species R. nitida, R. foliolosa, and R. palustris are resolved as a group on the network, being supported by a weakly contradicted split. Of these three species, R. foliolosa individuals are clearly distinct and are strongly resolved by a split. Rosa nitida and R. palustris are not distinguished from one another but they are grouped together by a weak split on the network (Fig. 5).
Discussion
The POFAD Algorithm
The relationships obtained with the networks of organisms more closely match taxonomic boundaries than those obtained from the haplotype trees. This is probably because the proposed method increases the amount of information included per terminal by incorporating allelic variation for reconstructing the evolutionary history of organisms. For example, if an individual has an allele that is closer to alleles of another species because of deep coalescence, the individual could still group with its species depending on the other allele. This is indeed what happens with R. foliolosa, which is resolved by a split in all networks of organisms but is not monophyletic in any of the haplotype trees.
The incorporation of allelic data using the POFAD algorithm also potentially allows the detection and the representation of hybrid individuals if the phylogeny is reconstructed using a reticulate phylogenetic method. For instance, some individuals have malate synthase alleles that fall in two distinct clades in the haplotype tree and these individuals were represented as being intermediate between individuals belonging to these two clades in the network of organisms (see Results and Fig. 5). Using both alleles instead of one for autosomal loci allows the detection of hybrid individuals with a single marker, whereas a minimum of two markers is required when only one allele per individual is sampled. The power of detecting and representing hybrid individuals in phylogenies increases as more genes are investigated (Linder and Rieseberg, 2004), and the increased information contained in allelic variation should similarly improve our ability to reconstruct the evolution of hybrid individuals.
These examples demonstrate the importance of incorporating allelic variation whenever possible in phylogenetic analyses. Using allelic variation effectively doubles the number of lineages sampled. This increases the probability of sampling ancestral lineages within species that provide independent tests of the relationships among species (Rosenberg, 2002). With more ancestral lineages, there is an increased probability of sampling at least one lineage that will have a most recent interspecific coalescent event with its sister species, thereby improving chances of recovering the species phylogeny. This is particularly important for recently diverged species where haplotypes have had less time to coalesce within the species (Rosenberg, 2002).
Combining Multiple Genes
The greatest interest of the POFAD algorithm certainly is its ability to incorporate allelic variation when reconstructing the phylogenetic history of organisms from multiple datasets. Because any single gene can be incongruent with the species tree, it is important to sample multiple independently evolving markers to be confident in the resulting phylogeny. When analyzing multiple markers, one approach is to combine the datasets first and then to proceed to an analysis of the concatenated dataset (Kluge, 1989; Yang, 1996; Seo et al., 2005). This approach suffers from the fact that alleles are the terminal units of the analysis, which hinders the concatenation of alleles from different loci because they segregate in natural populations. One solution would be to use a consensus sequence of alleles for each individual (see Howarth, 2005), therefore making the organisms the terminal units of the analysis. However, this would result in a loss of information because ambiguities are optimized so as to minimize the number of evolutionary changes in phylogenetic analyses. To illustrate this, consider a sequence that differs at a single site between two diploid individuals. Then suppose that an individual is coded as R (A or G) at the site (which means that it has one allele with an A and one with a G) and that the other individual has an A. These two individuals would then be treated as if they were identical even if the first individual has two alleles including one that is different from the alleles of the second individual.
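The contrast can be made concrete with a small worked calculation. It is only a sketch: it assumes that the allelic distance is the number of differing sites and that the distance between an individual with one allele and an individual with two alleles is the mean of the two allelic distances; the labels X, Y, $x_1$, $x_2$, and $y$ are introduced here for illustration. Let X be the heterozygous individual ($x_1$ carries A, $x_2$ carries G) and Y the homozygous individual ($y$ carries A). Then

$$d(X, Y) = \frac{d(x_1, y) + d(x_2, y)}{2} = \frac{0 + 1}{2} = 0.5$$

whereas the consensus coding (R versus A) can be optimized with zero changes. Under these assumptions the allele-based distance retains half a difference between the two individuals, and the consensus sequence retains none.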
The alternative to the total evidence approach is the "gene as character" approach that consists of combining the trees from each marker analysed independently, either by using consensus tree (e.g., de Queiroz, 1993), reconciled tree (Page and Charleston, 1997; Slowinski et al., 1997), or supertree (e.g., Doyle, 1992; Sanderson et al., 1998; Bininda-Emonds, 2004; Wilkinson et al., 2005) methods. As for the total evidence approach, these methods use haplotypes as terminal units and cannot incorporate allelic variation in phylogenetic analyses of multiple genes, with the exception of reconciled trees. Reconciled trees, however, differ from the POFAD method in that species, rather than individuals, are the terminal units of the analysis. Indeed, one assumption of this method is that gene transmission is strictly vertical among the terminal units of the analysis (Page and Charleston, 1997).
Because of these problems with existing methods, studies that have used allelic variation from multiple markers have either compared the topologies of the different haplotype trees (Hare and Avise, 1998), used allelic consensus sequences for individuals in a concatenated matrix (Howarth, 2005), found concordant signals among gene trees to identify nonrecombining groups of individuals (Koufopanou et al., 1997), or compared the demographic events that were found to have affected each genealogy (Templeton, 2002). The method proposed in this paper gives an alternative to these options by reconstructing a single phylogeny of organisms from multiple datasets that contain allelic information.
Applicability
The POFAD method should be useful whenever haplotype trees are used, such as at the intraspecific level or at the species interface among closely related species. At the intraspecific level, it could be useful for phylogeographic studies that wish to draw conclusions from more than one nuclear gene. The use of nuclear genes for phylogeographic studies is becoming frequent (e.g., Olsen and Schaal, 1999; Hare, 2001; Antunes et al., 2002; Joly and Bruneau, 2004) and some studies have already used multiple nuclear gene trees (Hare and Avise, 1998; Templeton, 2002). The proposed method could also be useful for studies at the species interface where it can help delimit species. Because alleles at nuclear loci segregate in natural populations due to sexual reproduction (gene segregation and recombination), relationships within species should be reticulate (tokogenetic), whereas they should be hierarchic (phylogenetic) among species. Tokogenetic relationships result in the sharing of alleles among individuals, which in turn tends to make individuals within species more similar to each other than to individuals of other species. This also implies that there should be no shared phylogenetic patterns among genes within species. In contrast, strong phylogenetic signals shared by a majority of genes should correspond to the speciation event (Koufopanou et al., 1997). These speciation events should therefore result in strong splits in the combined network of organisms if interspecific hybridization does not occur.
Phylogeny of North American Diploid Roses
Little is known of phylogenetic relationships among rose species in North America. Previous studies have provided little information because of the low resolution of molecular markers and poor species sampling (Millan et al., 1996; Matsumoto et al., 1998; Wissemann and Ritz, 2005). In contrast, the three nuclear genes sequenced for several individuals per species in this study allow an assessment of phylogenetic relationships among North American species but also provide information regarding species boundaries.
First of all, the diploid species of Rosa in North America appear to be of recent origin according to the low levels of genetic variation found in haplotype trees. Yet, it is also possible that the long generation time, which is typical for shrubs, could accentuate this trend. A rapid radiation is also supported by the lack of monophyly observed for most species. Indeed, recently diverged species are not expected to be reciprocally monophyletic and incomplete lineage sorting is expected to be frequent among such species (Rosenberg, 2002, 2003; Degnan and Salter, 2005). Nevertheless, polyphyletic species could also be the consequence of interspecific gene flow that is indicative of poorly defined species boundaries. Of course, the phenomenon responsible for nonmonophyletic species is likely to be different from one species to the other. But in spite of the low levels of genetic variation and of the absence of monophyly for most species for one or more of the genes studied, the combined analysis of individuals remains informative regarding the phylogenetic relationships of North American species.
Botanists generally have treated the western and eastern North American rose species as distinct entities (Lewis, 1957b; Erlanson MacFarlane, 1966). Yet, the hypothesis that western and eastern species form distinct phylogenetic groups has never been tested. The combined network suggests that a distinction between the west and the east may exist, although it is only supported by a weak split. Relative to the outgroup species of section Synstylae, one strong split suggests that R. gymnocarpa is sister to all remaining North American species, a signal mostly contributed by the GAPDH gene. The alternative solution, which is supported by a split of similar strength contributed mostly by the MS gene, groups R. gymnocarpa with R. pisocarpa and some individuals of R. blanda and R. woodsii. Congruent with this latter solution, a split on the network supports the monophyly of western species, but this split is rather weak. Because of the incongruence regarding the exact position of the western species among the genes studied, more genes will have to be investigated to determine the exact branching pattern and to confirm the distinction between western and eastern diploid species. Individually, however, both western species R. gymnocarpa and R. pisocarpa form exclusive groups of individuals, suggesting there is little or no genetic exchange between them. Thus, even if the sampling is limited for these species, the results suggest that these species are distinct.
In the east, the combined network shows that species are divided into two clear groups: one consists of R. blanda and R. woodsii and the other of R. foliolosa, R. nitida, and R. palustris. In the former group, individuals of R. blanda and R. woodsii cannot be distinguished from one another. However, both species together form a genetically variable group that is supported by a split in the combined analysis, with the exception of the woodsii700 individual. The high genetic diversity observed in this group may be explained in part by the widespread distribution of these species that could reduce the homogenizing effect of gene flow. Rosa woodsii ranges from California and British Columbia to the eastern Great Plains, whereas R. blanda is distributed from Manitoba and Minnesota in the west to New Brunswick and Maine in the east.
Several clues suggest that the lack of differentiation between R. blanda and R. woodsii is caused by ongoing gene flow. These species are indeed ecologically (they grow in mesic soils along woods and rivers) and morphologically similar and are difficult to tell apart (Lewis, 1962). Moreover, hybrids between these species have been shown to be highly fertile (Erlanson, 1934; Ratsek et al., 1939), and in the area where the two species overlap, Lewis (1962) described a hybrid zone. Clearly, the species status of these species needs to be reassessed.
The second eastern group revealed by the combined network consists of R. foliolosa, R. nitida and R. palustris. This group is congruent with morphological data because these species share many characters that distinguish them from other North American species. In fact, these species represent all the diploid species that were once placed in sect. Carolinae (Crépin, 1889).
Within this group, R. foliolosa distinguishes itself from the other species by having its two individuals clearly resolved as a group on the network. Although only two individuals were investigated for R. foliolosa, the network suggests that it is genetically distinct from the other species. Rosa foliolosa is also distinct from the other species morphologically, being characterized by narrow leaflets and small pedicels (Lewis, 1957a, 1958). This species is also peculiar for having the smallest geographic distribution of all species of sect. Cinnamomeae in North America, as it occurs only in Oklahoma, Texas, and western Arkansas (Lewis, 1958).
Individuals of the last two species, R. nitida and R. palustris, cannot be distinguished from one another on the network but together are supported as a group, albeit by a weak split. If we consider that R. foliolosa individuals are clearly distinct from individuals of these species, then R. nitida and R. palustris together form a rather cohesive group. A close relationship between these species is not surprising as both have narrow stipules, hypanthium glands, and a preference for bogs and poorly drained soils. In contrast with R. blanda and R. woodsii, however, R. nitida and R. palustris are clearly morphologically distinct (Lewis, 1957a, 1957b). This suggests that the lack of genetic distinction between these species is the consequence of a recent origin rather than of poorly defined species boundaries. Although the prevalence of incomplete lineage sorting among species suggests that little time has elapsed since the formation of species, the often small populations of these roses and the patchiness of populations over wide geographic areas can also contribute to the retention of ancient polymorphisms. For example, the palustris386 individual is from the western extremity of the distribution of R. palustris, where few populations are found. This could explain why this individual has retained alleles that are more closely related to R. blanda and R. woodsii haplotypes for the GAPDH and TPI genes.
Gene Trees and Species Tree and Individual Sampling within Species
In agreement with most phylogenetic studies investigating multiple markers, incongruence was observed among gene trees obtained from the three loci investigated (Chen and Li, 2001; Cronn and Wendel, 2003; Doyle et al., 2003; Rokas et al., 2003; Jennings and Edwards, 2005). Although some of the incongruence results from the relative position of species among gene phylogenies (i.e., R. gymnocarpa), most of the incongruence observed in this study was caused by the lack of monophyly of the species. Such incongruence could be the result of paralogy, incomplete lineage sorting, or hybridization. No signs of gene duplication were noted in this study, so paralogy does not seem to be the cause of the lack of species monophyly. Incomplete lineage sorting is more likely to be the cause of incongruence when an incongruent allele is distant from alleles of other species and when their divergence is basal (Holder et al., 2001; Funk and Omland, 2003; Joly et al., 2006). This appears to be the case for the allele palustris386 that falls in the group of R. blanda and R. woodsii individuals in the GAPDH haplotype tree. In contrast, hybridization should cause an incongruent haplotype to have diverged recently and to be similar to alleles of another species (Holder et al., 2001; Funk and Omland, 2003; Joly et al., 2006). For example, hybridization could explain the position of allele A of nitida604 in the GAPDH haplotype tree, which is located in an otherwise exclusively R. blanda and R. woodsii clade. It is not always obvious how to distinguish the two processes, however, and it may often be impossible to be confident of the process that caused incongruence (Holder et al., 2001; Joly et al., 2006).
Incongruence caused by nonmonophyletic species demonstrates the importance not only of sampling many genes but also of sampling many individuals per species when reconstructing the phylogenetic history of closely related species. Rosenberg (2002) indeed showed that enhanced haplotype sampling increases the probability that the gene tree is topologically concordant with the species tree, in particular for recent radiations as in North American diploid roses. Maddison and Knowles (2006) arrive at the same conclusion in a simulation study demonstrating that given limited resources, it is more advantageous to sample more individuals per species for a single gene than to sequence few individuals for more genes if the species have diverged recently. As discussed above in the context of allelic variation, sampling more individuals increases the probability of sampling ancestral lineages and gives a better chance of accurately reconstructing the phylogenetic history of species, particularly for recently diverged species (Rosenberg, 2002).
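The effect of a recent origin on concordance can be illustrated with a standard coalescent result that is not part of the present study: for three species with a single sampled lineage each, the probability that a gene tree matches the species tree is 1 − (2/3)e^(−T), where T is the internode length in units of 2Ne generations (Pamilo and Nei, 1988). The following Python sketch evaluates this expression. The function name, the effective size of 50,000, and the internode lengths are hypothetical values chosen for illustration and are not estimates for Rosa.

```python
import math


def concordance_probability(internode_generations, effective_size):
    """P(gene tree matches species tree) for three species, one lineage each.

    Standard three-taxon coalescent result; assumes a diploid nuclear locus,
    constant effective size, and no gene flow after speciation.
    """
    # Internode length expressed in coalescent units (2 * Ne generations).
    t = internode_generations / (2.0 * effective_size)
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


# Hypothetical internode lengths (in generations) for a fixed, assumed Ne.
for generations in (1_000, 10_000, 100_000, 1_000_000):
    p = concordance_probability(generations, effective_size=50_000)
    print(f"{generations:>9} generations: P(concordant) = {p:.3f}")
```

As the internode shortens, the probability approaches 1/3, meaning that the three possible topologies become equally likely. This is the regime in which sampling additional individuals and genes is expected to matter most.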
Studies that assess the gene tree vs. species tree problem often sample a single individual per species and highlight incompatibilities among the phylogenies obtained from different genes. In these studies, a gene can only be congruent or incongruent with the species tree. Yet, it is probably more frequent that for any particular gene there will be some haplotypes that agree with the species tree and some others that will be incongruent with it. As noted by Rosenberg (2003), without an appropriate sampling of individuals within species, one could conclude that a gene has coalesced within the species when it has not. Such incorrect inferences could result in biased conclusions concerning the evolutionary processes involved in speciation (Funk and Omland, 2003).
| Conclusion |
|---|

The algorithm described in this paper allows the incorporation of allelic variation in reconstructing the phylogenetic history of organisms from one or more genes. Allelic variation should provide important additional phylogenetic information when working with closely related species. It also gives the opportunity to reconstruct the phylogenetic history of hybrid individuals even with a single marker when a reticulate phylogenetic method is used. We hope that such a method will stimulate the incorporation of allelic data into phylogenetic analysis, as these data represent an important source of information that is too often neglected.
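To make the idea of an allele-to-organism conversion concrete, the following Python sketch shows one simple way such a conversion could be written for a single gene. It is an illustrative stand-in and should not be read as the algorithm described in this paper: it assumes equal ploidy, pairs the alleles of two individuals one-to-one so as to minimize the summed allele distance, and returns the mean of the paired distances. The function names, haplotype labels (h1 to h3), individual names, and distance values are all hypothetical.

```python
from itertools import permutations


def make_lookup(pairwise):
    """Build a symmetric allele-distance function from {(x, y): distance}."""
    table = {}
    for (x, y), d in pairwise.items():
        table[(x, y)] = d
        table[(y, x)] = d

    def dist(x, y):
        # Identical haplotypes are at distance zero by definition.
        return 0.0 if x == y else table[(x, y)]

    return dist


def organism_distance(alleles_a, alleles_b, dist):
    """Mean allele distance under the cheapest one-to-one allele pairing.

    Assumed rule for illustration only (not taken from the paper).
    """
    if len(alleles_a) != len(alleles_b):
        raise ValueError("this sketch assumes equal ploidy")
    best = min(
        sum(dist(x, y) for x, y in zip(alleles_a, perm))
        for perm in permutations(alleles_b)
    )
    return best / len(alleles_a)


# Hypothetical haplotype distances and diploid genotypes for one gene.
dist = make_lookup({("h1", "h2"): 0.01, ("h1", "h3"): 0.04, ("h2", "h3"): 0.03})
individuals = {"ind1": ("h1", "h1"), "ind2": ("h1", "h2"), "hyb": ("h1", "h3")}

names = list(individuals)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        d = organism_distance(individuals[a], individuals[b], dist)
        print(a, b, round(d, 4))
```

Under this assumed rule, an individual carrying one allele from each of two divergent groups lies at an intermediate distance from members of both, which is the behaviour that makes a reticulate method such as NeighborNet useful for displaying hybrids.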
| Acknowledgment |
|---|

The authors gratefully acknowledge the help of Julian Starr, Walter Lewis, Luc Brouillet, Elisabeth Dickson, Barbara Ertter, Alain Meilleur, Jeff Saarela, and Richard Spellenberg for providing plant material. The authors also thank François-Joseph Lapointe and Bernard Angers for their comments and Trevor Bruen for help and suggestions with recombination analyses. Rod Page, Allan Baker, David Bryant, and an anonymous reviewer gave helpful comments on a previous version of the manuscript. Financial help for this study came from research grants (AB) and fellowships (SJ) from the Natural Sciences and Engineering Research Council of Canada and from the Fonds québécois de la recherche sur la nature et les technologies.
| References |
|---|

Akaike H. A new look at the statistical model identification. IEEE Trans. Autom. Contr. (1974) 19:716–723.
Antunes A., Templeton A. R., Guyomard R., Alexandrino P. The role of nuclear genes in intraspecific evolutionary inference: Genealogy of the transferrin gene in the brown trout. Mol. Biol. Evol. (2002) 19:1272–1287.
Bininda-Emonds O. R. The evolution of supertrees. Trends Ecol. Evol. (2004) 19:315–322.
Britten R. J., Rowen L., Williams J., Cameron R. A. Majority of divergence between closely related DNA samples is due to indels. Proc. Nat. Acad. Sci. USA (2003) 100:4661–4665.
Bruen T. C. PhiPack: PHI test and other tests of recombination (2005) Montréal: McGill University. Québec, Canada. www.mcb.mcgill.ca/~trevor/.
Bruen T. C., Philippe H., Bryant D. A simple robust statistical test for detecting the presence of recombination. Genetics (2006) 172:2665–2681.
Bryant D., Moulton V. Neighbor-Net: An agglomerative method for the construction of phylogenetic networks. Mol. Biol. Evol. (2004) 21:255–265.
Chen F.-C., Li W.-H. Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am. J. Hum. Genet. (2001) 68:444–456.
Crépin F. Sketch of a new classification of roses. J. R. Hort. Soc. (1889) 11:217–228.
Cronn R., Wendel J. F. Cryptic tryst, genomic mergers, and plant speciation. New Phytologist (2003) 161:133–142.
de Queiroz A. For consensus (sometimes). Syst. Biol. (1993) 42:368–372.
Degnan J. H., Salter L. A. Gene tree distributions under the coalescent process. Evolution (2005) 59:24–37.
Doyle J. J. Gene trees and species trees: Molecular systematics as one-character taxonomy. Syst. Bot. (1992) 17:144–163.
Doyle J. J., Doyle J. L. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. (1987) 19:11–15.
Doyle J. J., Doyle J. L., Rauscher J. T., Brown A. H. D. Diploid and polyploid reticulate evolution throughout the history of the perennial soybeans (Glycine subgenus Glycine). New Phytologist (2003) 161:121–132.
Erlanson E. W. Experimental data for a revision of the North American wild roses. Bot. Gazette (1934) 96:197–259.
Erlanson MacFarlane E. W. The old problem of species in Rosa with special reference to North America. Am. Rose Annu. (1966) 51:150–160.
Flora of North America Editorial Committee. Flora of North America (1993) vol. 1. New York: Oxford University Press.
Funk D. J., Omland K. E. Species-level paraphyly and polyphyly: Frequency, causes, and consequences, with insights from animal mitochondrial DNA. Annu. Rev. Ecol. Evol. Syst. (2003) 34:397–423.
Hare M. P. Prospects for nuclear gene phylogeography. Trends Ecol. Evol. (2001) 16:700–706.
Hare M. P., Avise J. C. Population structure in the american oyster as inferred by nuclear gene genealogies. Mol. Biol. Evol. (1998) 15:119–128.
Holder M. T., Anderson J. A., Holloway A. K. Difficulties in detecting hybridization. Syst. Biol. (2001) 50:978–982.
Howarth D. G. Genealogical evidence of homoploid hybrid speciation in an adaptive radiation of Scaevola (Goodeniaceae) in the Hawaiian islands. Evolution (2005) 59:948–961.
Hunter J. C., Mattice J. A. The spread of woody exotics into the forests of a northeastern landscape, 1938–1999. J. Torrey Bot. Soc. (2002) 129:220–227.
Huson D. H., Bryant D. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol. (2006) 23:254–267.
Jakobsen I. B., Easteal S. A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. CABIOS (1996) 12:291–295.
Jennings W. B., Edwards S. V. Speciational history of Australian grass finches (Poephila) inferred from thirty gene trees. Evolution (2005) 59:2033–2047.
Joly S., Bruneau A. Evolution of triploidy in Apios americana (Leguminosae) revealed by the genealogical analysis of the histone H3-D gene. Evolution (2004) 58:284–295.
Joly S., Starr J. R., Lewis W. H., Bruneau A. Polyploid and hybrid evolution in roses east of the Rocky Mountains. Am. J. Bot. (2006) 93:412–425.
Kelchner S. A. The evolution of non-coding chloroplast DNA and its application in plant systematics. Ann. Miss. Bot. Garden (2000) 87:482–498.
Kluge A. G. A concern for evidence and a phylogenetic hypothesis of relationships among Epicrates (Boidae, Serpentes). Syst. Zool. (1989) 38:7–25.
Koufopanou V., Burt A., Taylor J. W. Concordance of gene genealogies reveals reproductive isolation in the pathogenic fungus Coccidioides immitis. Proc. Nat. Acad. Sci. USA (1997) 94:5478–5482.
Lewis C. E., Doyle J. J. Phylogenetic utility of the nuclear gene malate synthase in the palm family (Arecaceae). Mol. Phylogenet. Evol. (2001) 19:409–420.
Lewis W. H. A monograph of the genus Rosa in North America east of the Rocky Mountains (1957a) University of Virginia. Ph.D. thesis.
Lewis W. H. Revision of the genus Rosa in eastern North America: A review. Am. Rose Annu. (1957b) 42:116–126.
Lewis W. H. A monograph of the genus Rosa in North America. II. R. foliolosa. Southwestern Naturalist (1958) 3:145–153.
Lewis W. H. Monograph of the genus Rosa in North America. IV. R. x dulcissima. Brittonia (1962) 14:65–71.
Lewis W. H., Basye R. E. Analysis of nine crosses between diploid Rosa species. Proc. Am. Soc. Hort. Sci. (1961) 78:573–579.
Linder C. R., Rieseberg L. H. Reconstructing patterns of reticulate evolution in plants. Am. J. Bot. (2004) 91:1700–1708.
Maddison W. P. Gene trees in species trees. Syst. Biol. (1997) 46:523–536.
Maddison W. P., Knowles L. L. Inferring phylogeny despite incomplete lineage sorting. Syst. Biol. (2006) 55:21–30.
Matsumoto S., Kouchi M., Yabuki J., Kusunoki M., Ueda Y., Fukui H. Phylogenetic analyses of the genus Rosa using the matK sequence: Molecular evidence for the narrow genetic background of modern roses. Sci. Hort. (1998) 77:73–82.
Maynard Smith J. Analysing the mosaic structure of genes. J. Mol. Evol. (1992) 34:126–129.
Maynard Smith J. Homoplasy test: Datain and exph programs (1998) Brighton, UK: University of Sussex. www.lifesci.sussex.ac.uk/home/John_Maynard_Smith/. version 3.
Maynard Smith J., Smith N. H. Detecting recombination from gene trees. Mol. Biol. Evol. (1998) 15:590–599.
Meiners S. J., Pickett S. T. A., Cadenasso M. L. Effect of plant invasions on the species richness of abandoned agricultural land. Ecography (2001) 24:633–644.
Millan T., Osuna F., Bobos S., Torres A. M., Cubero J. I. Using RAPDs to study phylogenetic relationships in Rosa. Theor. Appl. Genet. (1996) 92:273–277.
Nichols R. Gene trees and species trees are not the same. Trends Ecol. Evol. (2001) 16:358–364.
Olsen K. M., Schaal B. A. Evidence on the origin of cassava: phylogeography of Manihot esculenta. Proc. Nat. Acad. Sci. USA (1999) 96:5586–5591.
Page R. D. M., Charleston M. A. From gene to organismal phylogeny: Reconciled trees and the gene tree/species tree problem. Mol. Phylogenet. Evol. (1997) 7:231–240.
Pamilo P., Nei M. Relationships between gene trees and species trees. Mol. Biol. Evol. (1988) 5:568–583.
Posada D. Evaluating methods for detecting recombination from DNA sequences: Empirical data. Mol. Biol. Evol. (2002) 19:708–717.
Posada D., Crandall K. A. ModelTest: Testing the model of DNA substitution. Bioinformatics (1998) 14:817–818.
Posada D., Crandall K. A. Evaluation of methods for detecting recombination from DNA sequences: Computer simulations. Proc. Nat. Acad. Sci. USA (2001) 98:13757–13762.
Ratsek J. C., Flory W. S. Jr., Yarnell S. H. Crossing relations of some diploid and polyploid species of roses. Proc. Am. Soc. Hort. Sci. (1940) 38:637–654.
Ratsek J. C., Yarnell S. H., Flory W. S. Jr. Crossing relations of some diploid species of roses. Proc. Am. Soc. Hort. Sci. (1939) 37:983–992.
Rokas A., Williams B. L., King N., Carroll S. B. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature (2003) 425:798–803.
Rosenberg N. A. The probability of topological concordance of gene trees and species trees. Theor. Popul. Biol. (2002) 61:225–247.
Rosenberg N. A. The shape of neutral gene genealogies in two species: Probabilities of monophyly, paraphyly, and polyphyly in a coalescent model. Evolution (2003) 57:1465–1477.
Sanderson M. J., Purvis A., Henze C. Phylogenetic supertrees: Assembling the trees of life. Trends Ecol. Evol. (1998) 13:105–109.
Schaal B. A., Olsen K. M. Gene genealogies and population variation in plants. Proc. Nat. Acad. Sci. USA (2000) 97:7024–7029.
Seo T.-K., Kishino H., Thorne J. L. Incorporating gene-specific variation when inferring and evaluating optimal evolutionary tree topologies from multilocus sequence data. Proc. Nat. Acad. Sci. USA (2005) 102:4436–4441.
Simmons M. P., Ochoterena H. Gaps as characters in sequence-based phylogenetic analyses. Syst. Biol. (2000) 49:369–381.
Slowinski J. B., Knight A., Rooney A. P. Inferring species trees from gene trees: A phylogenetic analysis of the Elapidae (Serpentes) based on the amino acid sequences of venom proteins. Mol. Phylogenet. Evol. (1997) 8:349–362.
Strand A. E., Leebens-Mack J., Milligan B. G. Nuclear DNA-based markers for plant evolutionary biology. Mol. Ecol. (1997) 6:113–118.
Swofford D. L. Phylogenetic analysis using parsimony (2002) Sunderland, Massachusetts: Sinauer Associates. PAUP* (*and other methods), version 4.0b10.
Takahata N. Gene genealogy in three related populations: Consistency probability between gene and population trees. Genetics (1989) 122:957–966.
Templeton A. R. Out of Africa again and again. Nature (2002) 416:45–51.
Templeton A. R., Crandall K. A., Sing C. F. A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping and DNA sequence data. III. Cladogram estimation. Genetics (1992) 132:619–633.
Wilkinson M., Cotton J. A., Creevey C., Eulenstein O., Harris S. R., Lapointe F.-J., Levasseur C., Mcinerney J. O., Pisani D., Thorley J. L. The shape of supertrees to come: Tree shape related properties of fourteen supertree methods. Syst. Biol. (2005) 54:419–431.
Wissemann V., Ritz C. M. The genus Rosa (Rosoideae, Rosaceae) revisited: molecular analysis of nrITS-1 and atpB-rbcL intergenic spacer (IGS) versus conventional taxonomy. Bot. J. Linn. Soc. (2005) 147:275–290.
Wu C.-I. Inferences of species phylogeny in relation to segregation of ancient polymorphisms. Genetics (1991) 127:429–435.
Yang Z. Maximum-likelihood models for combined analyses of multiple sequence data. J. Mol. Evol. (1996) 42:587–596.
Young N. D., Healy J. GapCoder automates the use of indel characters in phylogenetic analysis. BMC Bioinformatics (2003) 4:6.