© 2005 Society of Systematic Biologists
Placing Paleopolyploidy in Relation to Taxon Divergence: A Phylogenetic Analysis in Legumes Using 39 Gene Families
Edited by Rod Page: Associate Editor
1 Department of Plant Biology, 228 Plant Sciences Building, Cornell University, Ithaca, New York, 14853, U.S.A. E-mail: bep27{at}cornell.edu (B.E.P.)
2 Department of Genetics, Development, and Cell Biology, Iowa State University Ames, Iowa, 50011, U.S.A.
3 USDA-ARS-CICGR Ames, Iowa, 50011, U.S.A.
Abstract
Young polyploid events are easily diagnosed by various methods, but older polyploid events become increasingly difficult to identify as chromosomal rearrangements, tandem gene or partial chromosome duplications, changes in substitution rates among duplicated genes, pseudogenization or locus loss, and interlocus interactions complicate the means of inferring past genetic events. Genomic data have provided valuable information about the polyploid history of numerous species, but on their own fail to show whether related species, each with a polyploid past, share a particular polyploid event. A phylogenetic approach provides a powerful method to determine this but many processes may mislead investigators. These processes can affect individual gene trees, but most likely will not affect all genes, and almost certainly will not affect all genes in the same way. Thus, a multigene approach, which combines the large-scale aspect of genomics with the resolution of phylogenetics, has the power to overcome these difficulties and allow us to infer genomic events further into the past than would otherwise be possible. Previous work using synonymous distances among gene pairs within species has shown evidence for large-scale duplications in the legumes Glycine max and Medicago truncatula. We present a case study using 39 gene families, each with three or four members in G. max and the putative orthologues in M. truncatula, rooted using Arabidopsis thaliana. We tested whether the gene duplications in these legumes occurred separately in each lineage after their divergence (Hypothesis 1), or whether they share a round of gene duplications (Hypothesis 2). Many more gene family topologies supported Hypothesis 2 over Hypothesis 1 (11 and 2, respectively), even after synonymous distance analysis revealed that some topologies were providing misleading results. Only ca. 33% of genes examined support either hypothesis, which strongly suggests that single gene family approaches may be insufficient when studying ancient events with nuclear DNA. Our results suggest that G. max and M. truncatula, along with approximately 7000 other legume species from the same clade, share an ancient round of gene duplications, either due to polyploidy or to some other process.
Keywords: Fabaceae; gene duplication; Glycine max; legumes; Medicago truncatula; multigene phylogenetic analysis; nuclear DNA; polyploidy
Received May 5, 2004; Revised August 16, 2004; Accepted October 1, 2004
Polyploidy has long been recognized as a genomic feature of some organisms, but the large number of species with a polyploid background has only recently come into focus. Although a polyploid origin of all vertebrates is controversial, some fungi, insects, molluscs, fish, amphibians, reptiles, and many plants show strong evidence of polyploidy (Wolfe, 2001; Mable, 2003). Genomic data have been critical to the inference of polyploidy in organisms long thought to be simple diploids, such as Arabidopsis thaliana (L.) Heynh. (Vision et al., 2000), and will almost certainly increase the number of organisms found to harbor cryptic genome-doubling events in their past. In A. thaliana, these data suggest not only that this taxon has a polyploid past perhaps shared by most of the family Brassicaceae, but that its genome also contains the signature of additional, more ancient polyploid events, including one that might be shared by all eudicots (encompassing around 75% of flowering plants), and another that might be shared by all flowering plants (Vision et al., 2000; Bowers et al., 2003). Thus, although research on many organisms clearly needs to include the understanding of whether a species has a polyploid history or not, the focus is increasingly shifting to questions of when polyploid events occurred and which species share each event (Blanc and Wolfe, 2004).
Most examples of studies of polyploid plants using phylogenetic methods that show the origins, timing relative to species divergences, and products of polyploid events examine relatively recent events (e.g., cotton, Wendel, 1989; Glycine, Doyle et al., 2000; Silene, Popp and Oxelman, 2001), whereas few examine older events (e.g., maize, Gaut and Doebley, 1997; Arabidopsis, Bowers et al., 2003). In animal studies, the several fully sequenced model organisms have been a great advantage to studies that test ancient polyploidy using phylogenetic methods, such as Friedman and Hughes (2001), by providing a very large number of gene families to work with, allowing events over 400 Mya to be tested. The relatively small number of studies examining older events may be explained by the increasing difficulties that arise with greater time since the polyploid event, due in part to the general challenge faced when reconstructing older divergences, but also the greater possibility of extinction of specific gene lineages. Most importantly, as gene duplications (independent of polyploid events) and losses accumulate with time, orthology assessment becomes increasingly difficult (Wendel, 2000).
Although younger and older polyploid events are conveniently referred to as "neo" and "paleo" respectively, this is an arbitrary categorization of a continuum of events, with neopolyploidy occurring within the last few million years and paleopolyploidy being older. Understanding the origins of young polyploids may be facilitated by the short time since polyploid formation. For example, the alleles at histone H3D paralogous loci found in some Glycine neopolyploids differ by at most a few changes from those found in each of the extant representatives of their putative diploid parental lineages, which are clearly differentiated from one another (Doyle et al., 2000), thereby making inference of the hybrid polyploid (allopolyploid) origin of these plants straightforward. In contrast, whether the polyploid origin of the genus Glycine is closer to auto- or allopolyploidy has not been unambiguously ascertained; the extinction of the contributing lineages to an allopolyploid Glycine common ancestor can produce the same pattern as an autopolyploid event (Doyle et al., 2003).
Types of evidence for paleopolyploidy are varied, but it is rare to diagnose such polyploid taxa via phylogenetic means. For example, paleopolyploidy in A. thaliana was suggested based on the observation from genome-wide sequences that the majority of the genome is located in duplicated (but not triplicated) blocks (AGI, 2000). Since that study, phylogenetic approaches coupled with gene order information have suggested a whole genome duplication due to a polyploid event that occurred between 20 and 80 Mya, after Brassicales diverged from Malvales, with 89% of known genes still present in duplicate (Bowers et al., 2003, and references therein).
As more and more taxa are hypothesized to be ancient polyploids, the likelihood of shared polyploid events increases. Shared polyploidy has implications for understanding genome, chromosome, and gene evolution. However, trying to show that taxa share a polyploid past using absolute dates based on duplicated and orthologous gene comparisons or single gene phylogenies is prone to failure, because of the uncertainties surrounding any molecular date estimate, even when fossil calibrated. That most studies do not include all known sources of error when using molecular dates only compounds these problems (see Graur and Martin, 2004). In contrast, phylogenetic methods offer mutually exclusive and testable hypotheses that can determine the position of polyploidy relative to taxon divergence. Here we present an example of a phylogenetic-based analysis that tests when polyploidy occurred in the legume family and demonstrate how some potential pitfalls can be overcome by the use of synonymous distances as a filter that allows some misleading gene family topologies to be identified.
Legume Phylogeny and Genome Evolution
The legume family (Fabaceae) is the third largest family of flowering plants, and among its nearly 20,000 species are the soybean, Glycine max (L.) Merr., and the genomic model species Medicago truncatula Gaertner. These two members of the papilionoid subfamily are part of a lineage that is thought to have shared a common ancestor over 50 Mya (Wojciechowski, 2003). Glycine max, with a chromosome number of 2n = 40, has long been considered to have a polyploid genome (Hadley and Hymowitz, 1976; Shoemaker et al., 1996). In contrast, M. truncatula was chosen as a target for genomic studies because of its small genome and diploid (2n = 16) chromosome number relative to its economically important tetraploid congener, alfalfa (M. sativa L.; Cook, 1999).
Recent analyses have been conducted on duplicated gene pairs of both species (Schlueter et al., 2003, 2004). In G. max, the distribution of coalescence times of 275 gene pairs showed a statistically significant peak at Ks = 0.19 ± 0.03, approximately 15.41 ± 2.13 Mya, which was presumed to be the signature of the polyploid event that increased its chromosome number from the 2n = 20 or 22 that is typical of its close relatives (Goldblatt, 1981). This study also revealed a second, older peak, comprising gene pairs with an average coalescence at Ks = 0.54 ± 0.03, approximately 44.26 ± 2.32 Mya (Schlueter et al., 2003, 2004). Such a peak is consistent with linkage mapping studies (Shoemaker et al., 1996; Lee et al., 2001) and microsynteny analysis (Yan et al., 2003), which suggested two rounds of polyploidy or segmental duplication in the soybean genome. The studies of Schlueter et al. (2003, 2004) have been confirmed by Blanc and Wolfe (2004), who also found two peaks in soybean paralogues at Ks = 0.10–0.15 and 0.45–0.50. Lee et al. (2001) found that recent soybean homoeologous chromosomal regions appeared comparable to entire chromosomes in other legumes, which, coupled with chromosome number data, strongly supports a whole genome (polyploid) explanation. They suggest that the earlier duplication "at a minimum ... involved a whole chromosome" (Lee et al., 2001); however, this is a conservative estimate given the limited scope of their interspecific comparisons. Given these lines of evidence, it seems reasonable to assume that the duplicated gene pairs found in Schlueter et al. (2003, 2004) and Blanc and Wolfe (2004) are the results of either a whole or partial genome duplication rather than correlated but otherwise independent tandem duplications.
Coalescence times of gene pairs in M. truncatula show a similar pattern to those in G. max, with two peaks, although the youngest peak in M. truncatula is broader than the corresponding G. max peak (Schlueter et al., 2004). The older peak is well defined and is at Ks = 0.71 ± 0.02, approximately 58.19 ± 1.94 Mya (Ks = 0.65–0.70 in Blanc and Wolfe, 2004). These results are surprising, given the assumed diploid nature of this species.
The estimated date of the most recent common ancestor of G. max and M. truncatula, based on cpDNA matK and rbcL using a penalized likelihood approach and calibrated with several fossils, lies between 54.3 ± 0.6 (matK with standard deviation) and 54.8 ± 1.1 (rbcL) Mya (Lavin et al., in press). If the divergence time estimate and the gene duplication time estimates of Schlueter et al. (2003, 2004) are both correct, we have a difficulty. The older M. truncatula gene duplications appear to be older than the species divergence, in which case G. max should share that round of duplications. But the older round of duplications in G. max is estimated to be younger than the species divergence, and therefore should not be shared with M. truncatula. Clearly, the estimated times of these events do not identify which event took place first in these lineages: species divergence or polyploidy. However, Blanc and Wolfe (2004) find a radically different estimate of the age of species divergence of 13.3 to 15.0 Mya, derived from an "orthologue" peak with Ks = 0.40–0.45. The age estimate alone would suggest that species divergence was far more recent than the duplication events estimated by Schlueter et al. (2003, 2004), but the story is a little more complex. The older soybean and Medicago paralogue peaks in Blanc and Wolfe (2004) both have larger Ks values than the "orthologue" peak, indicating that the duplication of these genes probably occurred before taxon divergence. However, the "orthologue" peak cannot be relied upon, as they did not use phylogenies to test orthology.
Phylogenetic Tests of Competing Duplication Hypotheses and Methodological Complications
The phylogenetic signal in a majority of genes should reveal whether the two rounds of duplication in G. max are independent of those in M. truncatula: that is, whether taxon divergence predates both duplication events (Hypothesis 1 [H1], Fig. 1A), or whether M. truncatula and G. max are linked by a shared round of duplication, with one additional round after taxon divergence in G. max and possibly another in M. truncatula, also following taxon divergence (Hypothesis 2 [H2], Fig. 1B).
In an idealized situation, where no secondary losses of paralogues occur, two rounds of polyploidy (or other large-scale gene duplications) occurring independently in the G. max lineage should produce four copies (A1, A2, B1, and B2; Fig. 1A) of a gene that was single copy when the two genera diverged. M. truncatula may have two copies (as shown) or more, depending on how many rounds of gene duplication it has experienced; however, we are not attempting to test any events in this species after divergence from the G. max lineage.
Under H2 (Fig. 1B), each pair of G. max paralogues produced by a more recent paleopolyploid event (e.g., A1 and A2) will be most closely related to a different copy or copies in M. truncatula (e.g., A1 and A2 from G. max form a clade that is sister to M. truncatula copy A), with each clade of three or more copies (e.g., all A copies) sister to the other (e.g., the clade of all B copies).
However, the topology of any single gene tree can be misleading due to complicating processes such as independent gene duplications in addition to polyploidy, paralogue losses, and failure to recover the correct topology. Only a comparison among many gene trees will reveal whether the majority share the relative timing of the older polyploid event under H2 (indicating that the older duplications were due to polyploidy), or not (the older duplications were independent). It should be noted that alternative mechanisms that produce a pattern of many duplications within a narrow time period (e.g., strong selection pressure to retain tandem duplications [Brown et al., 1998]) would be indistinguishable from polyploidy in our phylogenies. These alternatives are not being tested here—only whether the older peak of duplicated genes is shared among the taxa or not. We expand upon the study of Schlueter et al. (2004) by testing these hypotheses with 39 small gene families not used in that study, each of which contains three or four members in G. max.
Methods
Data Collection and Alignment
We identified three- and four-member gene families in soybean using expressed sequence information in The Institute for Genomic Research (TIGR) database (http://www.tigr.org/tigr-scripts/tgi/T_reports.cgi?species=soybean) on May 13, 2002 (the same data set as Schlueter et al., 2004). Multiple expressed sequence tags (ESTs) derived from mRNAs were assembled into tentative consensus sequences (TCs), each of which likely represents a single gene. Using TIGR's original parameters (Quackenbush et al., 2000), we assembled ESTs into TCs using the program CAP3, but only for ESTs derived from G. max cultivars Williams and Williams82. These near-isogenic lines comprise the majority of the G. max EST resources. This method excluded the majority of allelic differences and very recent gene duplications, which were not relevant to our study.
Open reading frames (ORFs) were found for each TC consensus sequence with the program getorf (Rice et al., 2000). The ORF collection was then searched for matches among members that represent gene families. The ORF similarity search criteria used were (1) query and subject nucleotide sequences must match each other over their entire length with a similarity of at least 80% using uncorrected pairwise comparisons; (2) the subject identified must not be the query itself (i.e., a identifying a); (3) the subject and query were required to be reciprocal (if a identified b and c, then b must identify a and c, and c must identify a and b); (4) only three- and four-member gene families (triples and quads, respectively) were considered, as four paralogues are expected for G. max under either hypothesis and three paralogues are the minimum that can recover evidence of both rounds of polyploidy in G. max. Using these criteria, the average accepted BLAST (Altschul et al., 1990) E-value was 6 × 10⁻⁹. Finally, any TC containing fewer than three ESTs, along with its paralogous TCs, was eliminated from further analysis, to avoid TCs that might contain errors in the sequence due to the limited depth of EST coverage.
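Criteria (2) to (4) can be illustrated with a minimal sketch. It assumes the similarity search has already been run and summarized as a mapping from each query ORF to the set of subjects it matched at 80% similarity or more; the function name, the input structure, and the example identifiers are hypothetical and are not taken from the original pipeline.

```python
def reciprocal_families(hits, sizes=(3, 4)):
    """Return gene families whose members all identify one another.

    hits maps each query ORF to the set of subject ORFs it matched
    (hypothetical input; producing it from a similarity search is not shown).
    """
    families = set()
    for query, subjects in hits.items():
        # Criterion 2: a query matching itself does not count as a hit.
        members = frozenset(subjects - {query}) | {query}
        # Criterion 4: keep only three- and four-member families.
        if len(members) not in sizes:
            continue
        # Criterion 3: every member must identify every other member.
        if all(hits.get(m, set()) - {m} == members - {m} for m in members):
            families.add(members)
    return sorted(sorted(f) for f in families)


example_hits = {
    "a": {"a", "b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"e"},          # a pair only: too small to be retained
    "e": {"d"},
    "f": {"g", "h"},     # not reciprocal: h does not identify f
    "g": {"f", "h"},
    "h": {"g"},
}
print(reciprocal_families(example_hits))
```

In this toy input only the family a, b, c satisfies all three criteria; the pair d, e is too small and the group f, g, h fails the reciprocity requirement.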
Triplicated (triple) and quadruplicated (quad) G. max sequences were aligned in BioEdit (Hall, 1999), initially using Clustal W (Thompson et al., 1994), then adjusted manually to minimize putative substitution and indel events and maintain the codon reading frame. In final alignments, some short stretches of indel-rich regions were excluded from analysis where several equivalent alignments were possible. Otherwise, gap character states were treated as missing and no separate indel characters were coded. Start and stop codons were identified and more variable up- and downstream nucleotides were removed.
We included sequences of M. truncatula to test our hypotheses and sequences of A. thaliana to provide rooting information for our phylogenies. WU BLAST 2.0 searches (W. Gish, 1996–2003, http://blast.wustl.edu) were carried out by taking each G. max copy and searching the M. truncatula and A. thaliana TIGR EST databases simultaneously. The most similar sequences in these latter taxa to G. max were obtained using a cutoff of 70% similarity in the BLAST alignment or the sequence with the nearest match in the BLAST search. Additionally, the target sequences were required to match the majority (> 50%) of the query sequence. Medicago truncatula sequences that were near matches (68% to 70%) were also included to maximize the recovery of potential orthologues and paralogues in this taxon. Partial sequences from M. truncatula that scored over 70% similarity in their entirety, but only covered a minority of the G. max query sequence, were also included to maximize the recovery of potential orthologues and paralogues, but only if they matched for at least 200 bp. M. truncatula and A. thaliana sequences obtained were aligned with the G. max triplicates or quadruplicates as previously described.
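The acceptance rules in this paragraph can be read as a small decision function. The sketch below is one such reading, not the procedure actually used: the function name and arguments are hypothetical, the "nearest match" fallback is omitted, and treating the 70% cutoff as inclusive is an assumption.

```python
def accept_hit(taxon, similarity, aligned_bp, query_bp):
    """Classify one BLAST hit against a G. max query (illustrative reading).

    similarity is percent similarity in the BLAST alignment; aligned_bp and
    query_bp are the aligned length and the full query length in base pairs.
    """
    coverage = aligned_bp / query_bp
    # Standard rule: at least 70% similar and covering most of the query.
    if similarity >= 70 and coverage > 0.5:
        return "match"
    if taxon == "M. truncatula":
        # Near matches (68% to 70%) were retained for M. truncatula only.
        if 68 <= similarity < 70 and coverage > 0.5:
            return "near match"
        # Partial sequences were retained if they matched for >= 200 bp.
        if similarity >= 70 and aligned_bp >= 200:
            return "partial"
    return None


print(accept_hit("M. truncatula", 80, 250, 900))
```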
No attempt was made to parse the M. truncatula or A. thaliana EST collections by genotype, because we wished to maximize the chance of recovering the most closely related sequences to G. max to increase the likelihood of making appropriate phylogenetic comparisons. Not including some genotypes may exclude unique ESTs and correspondingly reduce the chance of finding orthologues, which we wished to avoid. Very similar EST and genomic sequences sometimes may represent the same locus or even the same allele. No attempt to remove these was made until after the phylogenetic analyses, nor was the approximate copy number of any gene inferred for these two taxa.
Phylogenetic Analysis
Single ESTs often appeared to contain some sequencing errors, because alignment to well-sampled TCs from the same or other taxa indicated that lone ESTs contained single nucleotide indels that disrupted the putative reading frame. In initial analyses, we found some M. truncatula and A. thaliana ESTs that were most closely related to TCs from the same taxon. These redundant partial sequences may represent the same or alternative alleles as their most closely related TCs, but did not contribute information regarding the hypotheses being tested, and were therefore removed from further analyses.
Phylogenetic bootstrap analysis was done using maximum parsimony (MP) in PAUP* (Swofford, 1998) to find bootstrap proportions for individual clades. Bootstrap proportions were determined for matrices with more than 12 terminals by heuristic searching (500 bootstrap replicates, with 10 random addition sequence [RAS] replicates per bootstrap replicate, saving 100 trees per RAS replicate) and tree bisection reconnection (TBR) branch swapping. Branch-and-bound searches were conducted for matrices with 12 or fewer terminals, with 500 bootstrap replicates used to find bootstrap proportions.
It has been shown that the parsimony algorithm and overly simple models in a maximum likelihood (ML) framework may be affected by long branch attraction, particularly when rates of change are high (Felsenstein, 1978; Bruno and Halpern, 1999). Therefore, we explored whether alternative models of evolution changed the topologies found using MP. ML bootstrap analyses (using five RAS replicates for each of 500 bootstrap replicates for matrices with 12 or fewer terminals, or three RAS replicates and 200 bootstrap replicates otherwise) were done using fixed parameter estimates and optimum models provided by ModelTest (Posada and Crandall, 1998) and executed in PAUP*. The ModelTest block used in PAUP* was modified to use a parsimony-derived tree (using 100 RAS replicates, saving 100 trees per replicate), rather than the neighbour-joining default starting tree, to determine likelihood scores under each model. We did not search for either most parsimonious or most likely trees, because usually many optimal or near optimal trees with alternative resolutions exist. We used the criterion of moderate to high bootstrap values as the measure of whether a topology offered a critical test of the hypotheses, because this criterion is more conservative than optimal trees.
We compared the MP and ML trees and considered relevant nodes present in either tree with good support (bootstrap ≥ 70%), not contradicted by the other tree (i.e., with an alternative resolution with a bootstrap ≥ 70%), to be reasonable hypotheses of relationship. Some trees contained poorly supported nodes in both MP and ML analyses. A single poorly supported node was considered unresolved, but each alternative resolution of that node was considered in the next step. Trees with more than one poorly supported node were not considered further as they did not discriminate between the hypotheses.
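The node and tree screening rule can be sketched as follows. The 70% threshold comes from the text; the function names and the way each node is summarized (its bootstrap percentage in each analysis plus the strongest conflicting resolution) are assumptions made for illustration.

```python
def node_supported(mp, ml, conflict, threshold=70):
    """True if a node is well supported and not contradicted.

    mp, ml: bootstrap percentage for the node in the MP and ML analyses
            (None when the node is absent from that analysis).
    conflict: highest bootstrap percentage of any conflicting resolution.
    """
    best = max(v or 0 for v in (mp, ml))
    return best >= threshold and conflict < threshold


def classify_tree(nodes):
    """Apply the screening rule to the relevant nodes of one gene tree."""
    poor = sum(not node_supported(*node) for node in nodes)
    if poor == 0:
        return "resolved"
    if poor == 1:
        return "one unresolved node: evaluate each alternative resolution"
    return "excluded"


print(classify_tree([(85, 90, 20), (60, 65, 40)]))
```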
Additional gene losses and duplications not associated with polyploidy are common events in the histories of nuclear gene families (Lynch and Conery, 2000) and are thus expected in analyses such as this. Because such events can be invoked to force topologies to conform to either hypothesis, we tabulated the number of such ad hoc events required to fit each individual gene tree to each hypothesis. Each gene family topology was evaluated under the assumption that firstly Hypothesis 1, and secondly Hypothesis 2, was correct. We required each topology to accommodate the two rounds of large-scale gene duplications, and then fitted as few ad hoc events as possible to explain missing or extra gene lineages beyond those expected by the large-scale duplications. Each ad hoc loss or duplication was scored as one additional event. The hypothesis that required fewest additional events given a particular topology was considered to be supported by that topology. Topologies equivocal in their support and unresolved topologies, while contributing to neither hypothesis, did not suggest any alternative. Topologies that did not contain any sister grouping of G. max sequences lack primary evidence of any kind of duplication and were considered to be evidence against either hypothesis.
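The scoring rule amounts to comparing two event counts per gene family and tallying the outcomes. A minimal sketch follows; the family names and counts are invented for illustration and are not results from this study, and counting the ad hoc events themselves (done by inspecting each topology) is not shown.

```python
from collections import Counter


def supported_hypothesis(events_h1, events_h2):
    """Return the hypothesis needing fewer ad hoc losses or duplications."""
    if events_h1 < events_h2:
        return "H1"
    if events_h2 < events_h1:
        return "H2"
    return "equivocal"


# Hypothetical (events under H1, events under H2) for three gene families.
example_counts = {"family_1": (3, 1), "family_2": (0, 2), "family_3": (2, 2)}
tally = Counter(supported_hypothesis(h1, h2) for h1, h2 in example_counts.values())
print(tally)
```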
Synonymous Distance Analysis
The number of synonymous substitutions per synonymous site (Ks) between sequences can be used to estimate their time of divergence if rates of synonymous change do not change over time and the rate of change can be successfully calibrated. However, it is widely accepted that different copies within taxa can show rate differences (Zhang et al., 2002), that different genes can show rate differences among the same taxon lineages (Small et al., 1998), and that different taxon lineages can show rate differences for the same gene (Jobson and Albert, 2002). Calibration also has challenges, including estimation of fossil ages, placement of fossil calibrators on a phylogeny, stem versus crown constraints, branch length estimation, topology estimation, and alternative methods of accommodating nonclock behavior among sequences. However, we have circumvented some of these difficulties and used Ks to identify gene family topologies that may be misleading because of orthology-paralogy-homoeology conflation. Probable cases of incorrect comparisons can be identified based on Ks values that are very different from those expected and can then be reevaluated with respect to the support that they provide for our hypotheses.
For each topology we examined G. max–G. max pairs and G. max–M. truncatula nearest pairs and estimated their Ks by the weighted method of Yang and Nielsen (2000) implemented in PAML 3.0 (Yang, 2000). The former compares putative homoeologues, whereas the latter compares putative orthologues, although PAML conducts only topology-independent pairwise comparisons. The distribution of Ks from G. max–G. max pairs was compared to that in Schlueter et al. (2004) to check that our sample of 39 gene families was representative of the larger sample of 275 duplicate genes used in that study.
If we make the reasonable assumption that there were in fact two duplications in the G. max lineage of either the whole or a part of the genome, then the issue of differing gene rates and calibration may be simplified. In Arabidopsis, Zhang et al. (2002) examined the largest blocks of duplicated syntenic gene pairs confined to two chromosomes, almost certainly the products of a single large-scale duplication event. These 242 gene pairs had a maximum of 13.8-fold difference in synonymous rate, but 90% of gene pairs fell within a 2.6-fold range. If this relationship holds in legumes, we have a basis for identifying gene pair comparisons that are unlikely to have been duplicated by the same event, and thus filter some spurious comparisons suggested by gene family topologies alone. The event in Arabidopsis was estimated to have occurred ca. 100 Mya, and although the dating method is arguable, this event is nevertheless probably older than the events in G. max. Ideally, we would use rate differences that encompass 95% of the rate variation among genes, centered on the estimated median of an event, to correspond to a confidence interval (CI). This was not available to us, so we used a ca. 3.0-fold range to provide an approximate CI of likely Ks rate variation among genes duplicated contemporaneously by a large-scale mechanism. Glycine-Glycine comparisons were thus expected to fall within a Ks range of 0.29 to 0.87. Putative homoeologues with greater pairwise Ks values than these were treated as spurious, and the topologies were reevaluated accordingly. Although we have not dealt with the possibility of different rates among paralogues, the other rate differences and calibration have been accommodated with these assumptions without the inherent problems of attempting to infer absolute dates or relying on clocklike behavior of genes. Different rates among paralogues may be expected to affect the rate of nonsynonymous change more than synonymous change (Zhang et al., 2002).
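The Ks filter described here can be sketched as a simple classifier. The bounds 0.29 and 0.87 are the values stated above (a 3.0-fold range); the function name and category labels are hypothetical. Only values above the upper bound are labeled spurious, following the text, which does not describe values below the range as spurious.

```python
KS_LOW, KS_HIGH = 0.29, 0.87  # expected Glycine-Glycine range from the text


def classify_pair(ks, low=KS_LOW, high=KS_HIGH):
    """Classify a G. max-G. max pairwise Ks against the expected range."""
    if ks > high:
        return "spurious"            # too divergent for the older event
    if ks >= low:
        return "within expected range"
    return "below expected range"    # not treated as spurious in the text


print(round(KS_HIGH / KS_LOW, 1))    # fold range spanned by the window
print(classify_pair(1.42))
```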
Although gene trees were unrooted, the inclusion of A. thaliana sequences allowed the examination of legume-only subtrees, each sister to an A. thaliana sequence(s), within which events that postdate the Brassicaceae-Fabaceae divergence can be inferred. Glycine max–G. max pairwise comparisons were limited to sister copies and any pair within a legume-only subtree, thus avoiding some comparisons that are not relevant to the gene duplications being examined here. Glycine max comparisons to M. truncatula were also limited to the most closely related (and therefore putatively orthologous) M. truncatula sequence for each G. max copy. Where comparisons between a single sequence and a clade of two or more sequences were made, the average Ks was used.
Because the standard deviation (SD) of each Ks was not constant (larger values generally having larger SDs), a natural log (Ln) transformation of Ks values was used to normalize the standard deviations. In order to determine the number of peaks of multiple gene duplications in the Ks histogram, Ks values less than 0.05 were omitted to exclude an expected spike of recent duplication events before the gradual accumulation of paralogue losses (Blanc and Wolfe, 2004). Ks values greater than 1.0 were also excluded from the estimation of distributions because the divergences that these rates track are much older than any of the events that were relevant to the testing of the polyploid hypotheses, given that the largest relevant Ks of duplication peaks in Glycine or Medicago is ca. 0.70 (Schlueter et al., 2003, 2004) or less (Blanc and Wolfe, 2004).
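The preprocessing step (dropping Ks values below 0.05 and above 1.0, then taking natural logs) can be written in a few lines. This is a minimal sketch with a hypothetical function name and made-up input values; the bounds are those given in the text and are treated as inclusive.

```python
import math


def ln_ks_for_mixture(ks_values, lower=0.05, upper=1.0):
    """Keep Ks values in [lower, upper] and return their natural logs."""
    return [math.log(ks) for ks in ks_values if lower <= ks <= upper]


example_ks = [0.01, 0.05, 0.19, 0.54, 1.0, 1.8]   # illustrative values only
print(ln_ks_for_mixture(example_ks))
```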
In order to assess how many peaks, and therefore discrete events, might be present in the Ks distribution data, we modeled the Ln-transformed Ks values as a mixture of k normal distributions (McLachlan and Peel, 2000), considering models from k = 1 through k = 5 components (following Schlueter et al., 2004). Each component was allowed a different mean, variance, and probability of membership in that component, with parameters estimated by ML. The variance of each component was constrained to be greater than 0.0001 to avoid false optima in the likelihood function. The number of components was chosen by a series of log likelihood ratio tests comparing models for k components to models for k + 1 components. For all numbers of components, P-values for these tests were computed using the 5 d.f. chi-square distribution, with statistical significance accepted for P ≤ 0.05.
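The model selection step can be sketched given the maximized log likelihoods for k = 1 to 5; fitting the mixtures themselves is not shown. The closed-form upper tail for a chi-square variate with 5 d.f. is standard. Stopping at the first nonsignificant comparison is an assumption about how the series of tests was applied, and the log likelihood values are invented.

```python
import math


def chi2_sf_5df(x):
    """Upper-tail probability of a chi-square variate with 5 d.f."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


def choose_components(loglik, alpha=0.05):
    """Pick k by sequential likelihood ratio tests of k versus k + 1.

    loglik[i] is the maximized log likelihood for k = i + 1 components.
    """
    k = 1
    for ll_k, ll_next in zip(loglik, loglik[1:]):
        statistic = 2 * (ll_next - ll_k)
        if chi2_sf_5df(statistic) > alpha:
            break                      # adding a component is not supported
        k += 1
    return k


example_loglik = [-120.0, -105.0, -103.5, -103.0, -102.9]  # hypothetical
print(choose_components(example_loglik))
```

With these invented values the second component is strongly supported (test statistic 30) but the third is not (test statistic 3), so two components are selected.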
For graphical representation of the data, the normal distributions fit to the Ln-transformed data were back-transformed, and the normal distribution density functions became Ln-normal density functions. As an initial approximation of coalescence estimates, the median of the back-transformed data was used, with an assumed rate of 6.1 synonymous substitutions per synonymous site per billion years (Lynch and Conery, 2000). Using the median minimizes the impact of skewness in the synonymous distance distributions. Additional details can be found in Schlueter et al. (2004).
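As a rough check on the coalescence estimates, Ks can be converted to time with the stated rate. The factor of two (both copies accumulate substitutions since the duplication) is an assumption, although it is consistent with the reported pairing of Ks = 0.54 with about 44.26 Mya; other reported dates may differ slightly from this sketch because the published Ks medians are rounded.

```python
RATE_PER_YEAR = 6.1e-9   # 6.1 synonymous substitutions per site per billion years


def ks_to_mya(ks, rate=RATE_PER_YEAR):
    """Approximate time since duplication in millions of years.

    Assumes both copies diverge at the same constant rate, so T = Ks / (2 * rate).
    """
    return ks / (2 * rate) / 1e6


for ks in (0.54, 0.71):
    print(ks, round(ks_to_mya(ks), 2))
```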
Ks information was used to evaluate further whether gene trees reasonably fit either of the hypotheses. If the gene required an overall rate change only in order to fit either hypothesis, this was treated as an additional ad hoc event. However, if a gene required an overall rate change and lineage-specific change(s) in order to fit the preferred hypothesis based on the topology, it was deemed not to support either hypothesis (the complexity of ad hoc hypotheses starts to approach the case where any topology could fit any hypothesis). An overall rate change was accepted as a reasonable ad hoc event if, after scaling, each relevant pairwise comparison was within the standard error around the median Ks values of its corresponding event. Scaling was done by dividing the G. max–M. truncatula Ks (or the mean of multiple comparisons) by a scaling factor to make the Ks lie within the standard error of the median (0.57 ± 0.02) and dividing other pairwise comparisons by this factor. We feel that this is a conservative approach to applying Ks data to testing the hypotheses, and found that this method identified very distantly related G. max–containing lineages that appear to have been inappropriately compared (see Results). Exclusion of these distant comparisons usually simplified the evaluation of previously applied ad hoc duplication and loss events.
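The rescaling step can be sketched as follows. For simplicity the sketch scales the mean G. max–M. truncatula Ks exactly to the median of 0.57, whereas the text only requires the scaled value to fall within the standard error (0.57 ± 0.02); the function name and input values are hypothetical.

```python
MEDIAN_GM_MT_KS = 0.57   # median G. max-M. truncatula Ks given in the text


def rescale(gm_mt_ks, other_ks, target=MEDIAN_GM_MT_KS):
    """Scale a gene family's Ks values so its orthologue Ks matches target.

    gm_mt_ks: one or more G. max-M. truncatula Ks values for the family.
    other_ks: remaining pairwise Ks values to be divided by the same factor.
    """
    mean_gm_mt = sum(gm_mt_ks) / len(gm_mt_ks)
    factor = mean_gm_mt / target
    return factor, [ks / factor for ks in other_ks]


factor, scaled = rescale([1.10, 1.18], [1.00, 0.40])   # illustrative values
print(round(factor, 2), [round(ks, 2) for ks in scaled])
```

In this invented example the family evolves about twice as fast as the median, so its other pairwise Ks values are halved before being compared with the expected ranges.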
Null Hypothesis under the Assumption That All Gene Duplications Are Independent and Occur Randomly Through the History of the Lineages
We considered as an alternative to either of the hypotheses being tested the assumption that no large-scale duplications have occurred in the history of these lineages. If we also assume a simple situation where all gene duplications have equivalent probability of occurring on any lineage in a given time period, we would expect that the proportion of genes in Glycine duplicated before and after the divergence of the Glycine and Medicago lineages would be proportional to the time before and after taxon divergence. Given that we only considered Glycine gene duplications up to a Ks of ca. 0.9 as being reasonable, and that the taxon divergence is around Ks = 0.57, we expect a ratio of around 1:1.7 genes duplicated before taxon divergence to after divergence.
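The arithmetic behind the 1:1.7 expectation can be written out explicitly. The function name is hypothetical; the sketch simply treats Ks as a proxy for elapsed time, as the paragraph above does.

```python
def null_before_after(max_ks, divergence_ks):
    """Ks intervals before and after taxon divergence, and the after:before ratio."""
    before = max_ks - divergence_ks  # window from the oldest accepted Ks to divergence
    after = divergence_ks            # window from divergence to the present
    return before, after, after / before


before, after, ratio = null_before_after(max_ks=0.9, divergence_ks=0.57)
# before is about 0.33 and after is 0.57, so duplications before:after is about 1:1.7
```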
Of course, older duplications produce additional lineages for subsequent duplications. The closer to the taxon divergence a duplication before this event occurs, the greater the chance that additional duplications on the daughter lineages will occur after taxon divergence. This will decrease the 1:1.7 ratio (the second number increases). Paralogues from, and therefore evidence for, older duplications are less likely to be preserved than are those from younger duplications (Lynch and Conery [2000] suggest a 4 million year half life of paralogues; also note Blanc and Wolfe, 2004: fig. 1), thereby also decreasing the observable expected ratio of before to after taxon divergence duplications below 1:1.7. If the younger polyploid event (for which there is better evidence than the older one) is included in background knowledge, we can expect even more gene duplications after taxon divergence than before as part of a null hypothesis.
| Results |
|---|
Triple and Quad Genes Found
Glycine max to G. max similarity searches identified 34 triplicated genes and eight quadruplicated genes, using TCs composed only of ESTs from Williams and/or Williams82 genotypes. Of the triplicated and quadruplicated genes, 31 and eight, respectively, could be confidently aligned to each other, giving 39 genes examined (Table 1). These searches also identified 6972 single genes and 275 duplicated genes with confident alignments (Schlueter et al., 2004).
Phylogenetic Analysis
Thirty-two of the 39 genes produced topologies with enough resolution and support to provide information regarding the relationships of sequences in each gene. The remaining seven gene trees (18%) lacked critical resolution (Table 2). The MP and ML analyses produced only a single instance of a well-supported (≥ 70% BS) conflicting resolution between topologies inferred by these methods. This involved the placement of an M. truncatula EST, which almost certainly contains errors and may mimic a long branch problem. The alternative placement under MP or ML did not, however, affect the hypothesis supported by this gene (triple gene 11).
Rooting of trees using A. thaliana sequences showed that products of several duplications of different ages were recovered in many cases. In some instances, inferred trees required the assumption of gene duplications earlier than the legume–A. thaliana divergence (i.e., distant paralogues not relevant to questions posed here seemed to have been recovered). Therefore, it appears that our method of selecting sequences for comparison was broad enough to have captured all available M. truncatula sequences that postdate the legume–A. thaliana divergence.
The number of ad hoc events required to fit each gene tree to either of the hypotheses is listed in Table 2 and an example of the method of tallying ad hoc events is given in Figure 2. Only one gene tree (3% of the 39 total genes) supported H1 over H2, whereas 22 gene trees (56%) required fewer ad hoc events to fit H2 than they did to fit H1 (Table 2). Four gene trees (10%) failed to provide support for either hypothesis, in that no G. max paralogues were found to be most closely related. Five trees (13%) were equivalent in the number of ad hoc events required to fit either hypothesis. All gene trees required at least one ad hoc event in order to fit either hypothesis.
Synonymous Distance Analysis
The distribution of pairwise comparisons of G. max–G. max sequences was found to be best explained by two peaks for Ks values between 0.05 and 1.0 (P < 0.05; Fig. 3A). The younger peak back transformed median Ks is 0.17 ± 0.05 (standard error), whereas the older peak back transformed median Ks is 0.57 ± 0.05. The standard errors of these overlap with those found in an analysis of 275 duplicated genes (Schlueter et al., 2004; 0.19 ± 0.03 and 0.54 ± 0.03, respectively). Thus, the triple and quad gene data are displaying the same underlying pattern as that seen in the duplicated G. max gene data.
The Ks distribution of closest G. max–M. truncatula pairwise comparisons for Ks values between 0.05 and 1.0 (Fig. 3B) is fit best by only a single peak (P < 0.05). The back transformed median Ks of this peak is 0.57 ± 0.02.
Triple gene 11, the only gene found to fit H1 over H2, had larger Ks values for both the younger and older pairwise comparisons than the majority of genes (Fig. 4). The Ks between the closest G. max copies, which should match the younger gene duplication event, was 0.51. However, this value is larger than expected for the younger event and lies far outside the Ks values of 0.085 to 0.255 that allow for a 3.0-fold rate variation around the Ks value of 0.17 estimated from the 39 genes for the younger event. The Ks of the comparison in triple gene 11 between each of the closest pair of G. max copies and their sister, which should match the older event, averages 1.84, and also falls outside Ks values allowing for 3.0-fold rate variation (0.285 to 0.855). The average Ks between the closest M. truncatula sequences to the clade of G. max sequences is 3.04, also larger than expected, which suggests that the closest M. truncatula sequences have not been recovered, either because they have not been sequenced yet or because the loci have been pseudogenized or lost.
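The rate-variation windows quoted above can be reproduced with a short sketch. The function names are hypothetical, and the bounds of 0.5 and 1.5 times the median are inferred from the stated intervals (0.085 to 0.255 and 0.285 to 0.855) rather than given explicitly in the text.

```python
def rate_window(median_ks, lower=0.5, upper=1.5):
    """Ks interval spanning a 3-fold range (upper / lower = 3) around a median."""
    return median_ks * lower, median_ks * upper


def in_window(ks, median_ks):
    low, high = rate_window(median_ks)
    return low <= ks <= high


younger_window = rate_window(0.17)  # approximately (0.085, 0.255)
older_window = rate_window(0.57)    # approximately (0.285, 0.855)

# Triple gene 11: Ks of 0.51 for the closest pair and 1.84 for the older comparison.
younger_ok = in_window(0.51, 0.17)
older_ok = in_window(1.84, 0.57)
```

Both comparisons for triple gene 11 fall outside their windows, which is the basis for treating its timing as inconsistent with the majority of genes.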
Some gene trees that support H2 by their topologies were also found to have larger Ks values for relevant comparisons than the majority of genes. An example is shown in quad gene 2 (Fig. 5). Clade A in this gene tree requires fewer ad hoc events to match H2, but this assumes that G. max 102435 and the ancestor of G. max 101156/111957 are the products of the older polyploid event. The Ks of that comparison suggests otherwise (mean Ks = 4.80), whereas the Ks of G. max 101156 to 111957 matches the younger polyploid event (Ks = 0.17). Therefore, barring extreme rate fluctuations, the two parts of clade A in quad gene 2 containing G. max sequences are unlikely to be the products of the older polyploid event, but instead are likely to be due to an earlier gene duplication.
Nine gene trees that were thought to support H2 were deemed to be equivocal after Ks values were examined (Table 3). The tree thought to support H1 became equivocal, but another two considered to support H2 were reinterpreted to offer support for H1 (Table 3). After synonymous distances were taken into account, more gene trees still supported H2 than H1 (11 and 2, respectively). Our results indicate that the proportion of gene families showing duplications that correspond to the older round of large-scale duplication before, rather than after, taxon divergence (11:2) is much higher than we might expect under the null hypothesis alone.
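To give a rough sense of how far 11:2 departs from the null expectation, the sketch below computes a one-sided binomial tail probability. This is an illustration only and not an analysis reported in the study: it assumes that the 13 informative gene families are independent and that each has the simple null probability of a pre-divergence duplication implied by the 1:1.7 ratio, which the Methods argue is, if anything, an overestimate.

```python
from math import comb


def binomial_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, i) * p ** i * (1 - p) ** (trials - i)
               for i in range(successes, trials + 1))


# Null probability of duplication before divergence: 0.33 of the 0.9 Ks window.
p_before = (0.9 - 0.57) / 0.9

# Probability of 11 or more of 13 families duplicating before divergence under the null.
tail = binomial_upper_tail(11, 13, p_before)
```

Under these assumptions the tail probability is small, consistent with the qualitative argument that the observed split is not expected from independent, randomly timed duplications.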
| Discussion |
|---|
Multiple Gene Family Approach Supports a Shared Round of Large-Scale Duplication in the Common Ancestor of Glycine and Medicago (H2)
Of the 39 genes we examined, only 13 (33%) supported either of the hypotheses being tested here. The implications for phylogeny reconstruction using a single or only a few genes are profound. Additional gene duplications not part of polyploidy, loss of paralogues, gene silencing (because we used EST-derived sequences), and a failure to sample all paralogues are some of the factors that can affect individual gene trees and potentially cause misleading conclusions to be drawn from any one tree. Despite this, our multiple gene family approach recovered a strong signal that clearly favors the hypothesis of a shared round of large-scale gene duplication for G. max and M. truncatula over independent large-scale duplications in each genome that occurred after the divergence of these taxa from one another. Among the informative genes, the split of 11 genes supporting this outcome to only 2 against is more decisive than it first appears, given that independent gene duplications that could offer spurious support to either hypothesis may be expected to occur more frequently after taxon divergence than before.
Paralogue Loss Required in All 39 Genes
The inference of paralogue loss is critical to interpreting the results of the phylogenetic analyses. The numbers of single genes and two-, three-, and four-member gene families found here and previously (6972, 275, 34, and 8, respectively; Schlueter et al., 2003, 2004) suggest that the majority of gene families produced by large-scale duplication have lost members in the time since duplication. High rates of paralogue loss are to be expected based on theoretical predictions (Walsh, 1995) and empirical studies (Wagner, 1998; Lynch and Conery, 2000; Rodin and Riggs, 2003). Although the stringency of the gene family search has almost certainly inflated the number of single genes, at least relative to multigene families, the rapidly declining number of members in each successive multigene family class is also consistent with high rates of paralogue loss.
Every gene examined required the inference of loss of at least one G. max paralogue in order to fit either hypothesis (e.g., Fig. 2), including quadruplicated genes, whereas some genes required more losses. Gene duplications independent of the two rounds seen in G. max were also required in some cases (e.g., Fig. 5).
Ks Data Refute Some Topological Conclusions
On the basis of topology alone, only one gene supported H1 (triple gene 11). Analysis of the Ks between putative homoeologues and of orthologues in the two genera revealed that the timing of events that this gene purports to show is also inconsistent with the majority of genes. Scaling of the Ks values by the median value of the G. max–M. truncatula divergence (0.57) shows that the G. max–G. max comparisons produce Ks values (0.35 and 0.10) that are too small for the two rounds of large-scale duplication in G. max (Fig. 4). Several explanations are possible, broadly summarized as multiple rate shifts, which could be caused by codon bias (Zhang et al., 2002), gene conversion (Wendel et al., 1995), multisomic inheritance (Stebbins, 1971) or exon shuffling (Long et al., 2003), and multiple independent duplications and losses of paralogues. However, a higher than average rate of change and a failure to sample the M. truncatula orthologue require the fewest assumptions (data not shown). If this is the case, this gene tree does not provide a test of the hypotheses, and therefore it is reasonable to treat this gene tree as providing equivocal support for either hypothesis. Ks data show that a few gene trees whose topologies support H2 are not consistent in their relative timing of events.
Several gene trees that had a topology consistent with H2 were not consistent with this hypothesis once Ks data were considered (Table 3). Once these genes were excluded, 11 gene trees were more consistent with H2 than H1, whereas only two trees supported H1. However, Ks data alone are not always reliable, as the duplicated gene data show. G. max–G. max and M. truncatula–M. truncatula duplicated gene pair Ks values are consistent with paleopolyploidy in these lineages, but misleadingly suggest that these taxa do not share a duplication event, as they appear not to be contemporary. However, the combination of topology and Ks information clearly supports the hypothesis that these taxa do in fact share a large-scale round of gene duplication. This illustrates the power of coupling Ks data with phylogenetic results. Genes where both topology and Ks information are in agreement provide a more robust test of hypothesis than either can alone and thus can make a valuable contribution to any multigene study of polyploidy.
Rates of Synonymous Site Divergence
We found that the median Ks for the Glycine-Medicago divergence is 0.57 ± 0.02. Given that the fossil-calibrated data of Lavin et al. (in press) suggest an age of ca. 54 Mya for this divergence, an average rate of 5.2 × 10⁻⁹ Ks per year may be appropriate for the 39 genes used here. This is close to the rate of 6.1 × 10⁻⁹ Ks per year suggested by Lynch and Conery (2000), but much less than the 15 × 10⁻⁹ used by Blanc and Wolfe (2004) for dicots, which was based on two gene families in Arabidopsis and allies. The conversion of Ks to a time estimate relies on the assumptions that the distribution we found is a reasonable reflection of the divergence, that no systematic biases exist, and that our use of the median (rather than the mean) to correct for a slight right skew is appropriate.
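The calibration above can be checked with a one-line calculation. The function name is hypothetical, and the factor of 2 assumes that both lineages have accumulated synonymous substitutions since their divergence.

```python
def synonymous_rate_per_year(ks, divergence_age_years):
    """Per-lineage rate implied by a pairwise Ks and a divergence age.

    Assumption: substitutions accumulate along both lineages, hence the factor of 2.
    """
    return ks / (2 * divergence_age_years)


rate = synonymous_rate_per_year(ks=0.57, divergence_age_years=54e6)
```

With the rounded inputs quoted here the result is about 5.3 × 10⁻⁹ per year, close to the 5.2 × 10⁻⁹ given in the text; the small difference presumably reflects rounding of the inputs.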
The results of Blanc and Wolfe (2004) warrant further discussion. These authors also used Ks distributions in G. max and M. truncatula to infer the number and timing of large-scale duplications and found a similar pattern of peaks to Schlueter et al. (2004) and this study. They also ponder why these taxa do not share the older peak of Ks values, given that by their estimates both peaks are older than their estimate of taxon divergence. Their explanation centres around the Medicago lineage possessing an allopolyploid past, after the Glycine lineage diverged from both parents. A bimodal distribution of paralogue coalescence in Medicago can be explained by an allopolyploid event only if the allopolyploidy was segmental, such that some paralogues coalesce at the end of tetrasomic inheritance and others coalesce much earlier, at the divergence of the parents (Gaut and Doebley, 1997). A bimodal distribution could also be produced if the Medicago lineage underwent a second round of duplication independent of Glycine. Either of these explanations alone is problematic because they fail to account for the older peak in Glycine. Given our findings that these taxa share a round of large-scale gene duplication, an alternative scenario is that the taxa have differential rates after their divergence and after their shared large-scale duplication event. This explains both older peaks in these taxa with one duplication event and a rate shift across at least one of the genomes. The older duplication is probably not a segmental allopolyploid event, because there is no sign of bimodal peaks for the older events.
In addition, Glycine clearly has a more recent round of polyploidy, whereas Medicago has a more diffuse younger peak that may be explained by a correlated series of tandem duplications, but perhaps not by polyploidy. If our hypothesis of a genome-wide rate shift in one or both of these taxa is correct, the Ks found in orthologue pairwise comparisons will be skewed, either higher or lower depending on whether one of these lineages has increased or decreased its rate compared to other angiosperms. Therefore, the rate of 5.2 × 10⁻⁹ Ks per year should, like any other calibration, be taken with a healthy dose of scepticism if applied to other genes and/or other taxa.
Implications for Mapping and Related Studies
Mapping and related synteny-based studies had initially indicated a polyploid origin of G. max, and later the possibility of a second round of polyploidy in the genus (Shoemaker et al., 1996; Lee et al., 2001; Yan et al., 2003). Our findings have further informed this line of inquiry by placing the divergence of the lineage containing M. truncatula between these events, which shows that two homoeologous chromosomal regions in G. max can be expected to be co-orthologous to regions in M. truncatula. Clearly this has major implications for the sampling design of comparative genomics studies using these two model legume taxa.
At Least 7000 Species, Including Many Crop Plants, Share What Is Most Likely an Ancient Polyploid Event
The clade of legumes to which G. max and M. truncatula belong, and whose members must therefore share the common large-scale duplication event in their history, includes around 7000 species, a third of all legumes. This clade, which encompasses tribe Indigofereae, the Millettieae + Phaseoleae + allies clade and the Hologalegina clade (Wojciechowski, 2003), includes nearly all of the agriculturally important legumes, such as Vicia (vetch), Pisum (pea), Lens (lentil), Phaseolus (bean), and Vigna (mung bean).
The event that these legumes share is probably a polyploid event for the following reasons. First, the detailed analysis of one group of homoeologous soybean linkage groups revealed that some duplicated syntenic regions are represented twice (Lee et al., 2001). Second, linkage (Shoemaker et al., 1996) and microsynteny (Yan et al., 2003) studies using probes from across the soybean genome indicate that regions showing evidence of ancient homoeology are not confined to one part of the genome. These lines of evidence make the alternative scenarios, that of either segmental duplication or correlated tandem duplications, difficult to reconcile with the available evidence.
| Acknowledgements |
|---|
We thank the molecular systematics group in the Department of Plant Biology, Cornell University, for helpful comments on a draft. We also thank Barbara Mable and James Cotton for their thorough and insightful comments that greatly improved the paper. BEP and JJD acknowledge awards DBI Plant Genome 0321664 and DEB-0089483 from the US National Science Foundation.
| References |
|---|
AGI. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature (2000) 408:796–815.
Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D. J. Basic local alignment search tool. J. Mol. Biol. (1990) 215:403–410.
Blanc G., Wolfe K. H. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell. (2004) 16:1667–1678.
Bowers J. E., Chapman B. A., Rong J., Paterson A. H. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. (2003) 422:433–438.
Brown C. J., Todd K. M., Rosenzweig R. F. Multiple duplications of yeast hexose transport genes in response to selection in a glucose-limited environment. Mol. Biol. Evol. (1998) 15:931–942.
Bruno W. J., Halpern A. L. Topological bias and inconsistency of maximum likelihood using wrong models. Mol. Biol. Evol. (1999) 16:564–566.
Cook D. R. Medicago truncatula—a model in the making. Curr. Opin. Plant Biol. (1999) 2:301–304.
Doyle J. J., Doyle J. L., Brown A. H. D., Pfeil B. E. Confirmation of shared and divergent genomes in the Glycine tabacina polyploid complex (Leguminosae) using histone H3-D sequences. Syst. Bot. (2000) 25:437–448.
Doyle J. J., Doyle J. L., Harbison C. Chloroplast-expressed glutamine synthetase in Glycine and related Leguminosae: Phylogeny, gene duplication, and ancient polyploidy. Syst. Bot. (2003) 28:567–577.
Felsenstein J. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. (1978) 27:401–410.
Friedman R., Hughes A. L. Pattern and timing of gene duplication in animal genomes. Genome Res. (2001) 11:1842–1847.
Gaut B. S., Doebley J. F. DNA sequence evidence for the segmental allotetraploid origin of maize. Proc. Nat. Acad. Sci. USA (1997) 94:6809–6814.
Goldblatt P. Cytology and the phylogeny of Leguminosae. In: Advances in legume systematics—Polhill R. M., Raven P. M., eds. (1981) Kew, United Kingdom: Royal Botanic Gardens. Pages 427–463. Part 2.
Hadley H. H., Hymowitz T. Speciation and cytogenetics. In: Soybeans: Improvement, production, and uses—Caldwell B. E., ed. (1976) Madison, Wisconsin: ASA. Pages 97–116.
Hall T. A. BioEdit: A user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. (1999) 41:95–98.
Jobson R. W., Albert V. A. Molecular rates parallel diversification contrasts between carnivorous plant sister lineages. Cladistics. (2002) 18:127–136.
Lee J. M., Grant D., Vallejos C. E., Shoemaker R. C. Genome organisation in dicots. II. Arabidopsis as a bridging species to resolve genome evolution events among legumes. Theor. Appl. Genet. (2001) 103:765–773.
Long M., Deutsch M., Wang W., Betrán E., Brunet F. G., Zhang J. Origin of new genes: Evidence from experimental and computational analyses. Genetica. (2003) 118:171–182.
Lynch M., Conery J. S. The evolutionary fate and consequences of duplicate genes. Science (2000) 290:1151–1155.
Mable B. K. Breaking down taxonomic barriers in polyploidy research. Trends Plant Sci. (2003) 8:582–590.
Popp M., Oxelman B. Inferring the history of the polyploid Silene aegaea (Caryophyllaceae) using plastid and homoeologous nuclear DNA sequences. Mol. Phylogenet. Evol. (2001) 20:474–481.
Posada D., Crandall K. A. Modeltest: Testing the model of DNA substitution. Bioinformatics. (1998) 14:817–818.
Quackenbush J., Liang F., Holt I., Pertea G., Upton J. The TIGR gene indices: Reconstruction and representation of expressed gene sequences. Nucleic Acids Res. (2000) 28:141–145.
Rice P., Longden I., Bleasby A. EMBOSS: The European molecular biology open software suite. Trends Genet. (2000) 16:276–277.
Rodin S. N., Riggs A. D. Epigenetic silencing may aid evolution by gene duplication. J. Mol. Evol. (2003) 56:718–729.
Schlueter J. A., Dixon P., Granger C., Grant D., Clark L., Doyle J. J., Shoemaker R. C. Mining EST databases to resolve evolutionary events in major crop species. Genome (2004) 47:868–876.
Schlueter J. A., Dixon P., Granger C., Shoemaker R. C. Mining the EST databases to determine evolutionary events in the legumes and grasses. Stadler Symp. Proc. (2003).
Shoemaker R. C., Polzin K., Labate J., Specht J., Brummer E. C., Olson T., Young N., Concibido V., Wilcox J., Tamulonis J. P., Kochert G., Boerma H. R. Genome duplication in Soybean (Glycine subgenus Soja). Genetics (1996) 144:329–338.
Small R. L., Ryburn J. A., Cronn R. C., Seelanan T., Wendel J. F. The tortoise and the hare: Choosing between noncoding plastome and nuclear ADH sequences for phylogeny reconstruction in a recently diverged plant group. Am. J. Bot. (1998) 85:1301–1315.
Stebbins G. L. Chromosomal evolution in higher plants (1971) London: Edward Arnold.
Swofford D. L. PAUP*. Phylogenetic analysis using parsimony (*and other methods) (1998) Sunderland, Massachusetts: Sinauer Associates. version 4.0b10.
Thompson J. D., Higgins D. G., Gibson T. J. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res. (1994) 22:4673–4680.
Vision T. J., Brown D. G., Tanksley S. D. The origins of genomic duplication in Arabidopsis. Science (2000) 290:2114–2117.
Wagner A. The fate of duplicated genes: Loss or new function? BioEssays (1998) 20:785–788.
Walsh J. B. How often do duplicated genes evolve new functions? Genetics (1995) 139:421–428.
Wendel J. F. New World tetraploid cottons contain Old World cytoplasm. Proc. Nat. Acad. Sci. USA. (1989) 86:4132–4136.
Wendel J. F. Genome evolution in polyploids. Plant Mol. Biol. (2000) 42:225–249.
Wendel J. F., Schnabel A., Seelanan T. Bidirectional interlocus concerted evolution following allopolyploid speciation in cotton (Gossypium). Proc. Nat. Acad. Sci. USA (1995) 92:280–284.
Wojciechowski M. F. Reconstructing the phylogeny of legumes (Leguminosae): An early 21st century perspective. In: Advances in legume systematics—Klitgaard B., Bruneau A., eds. (2003) Kew, United Kingdom: Royal Botanic Garden. Pages 5–35.
Wolfe K. H. Yesterday's polyploids and the mystery of diploidization. Nat. Rev. Genet. (2001) 2:333–341.
Yan H. H., Mudge J., Kim D.-J., Larsen D., Shoemaker R. C., Cook D. R., Young N. D. Estimates of conserved microsynteny among the genomes of Glycine max, Medicago truncatula and Arabidopsis thaliana. Theor. Appl. Genet. (2003) 106:1256–1265.
Yang Z. Phylogenetic analysis by maximum likelihood (PAML) (2000) London: University College. version 3.
Yang Z., Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. (2000) 17:32–43.
Zhang L., Vision T. J., Gaut B. S. Patterns of nucleotide substitution among simultaneously duplicated gene pairs in Arabidopsis thaliana. Mol. Biol. Evol. (2002) 19:1464–1473.