© 2006 Society of Systematic Biologists
Exploring Frontiers in the DNA Landscape: An Introduction to the Symposium "Genome Analysis and the Molecular Systematics of Retroelements"
Edited by Rod Page: Associate Editor
Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, Massachusetts 02138, USA; E-mail: shedlock@oeb.harvard.edu
Abstract
The emerging field of phylogenomics is influencing both the amount and the type of characters being brought to bear on long-standing problems in systematic biology. Moreover, the proliferation of sequence information from genome projects, together with the development of new informatics tools, is widening access to comparative data on retroelements for a broad cross section of investigators. Motivated by this, the Society of Systematic Biologists sponsored a symposium entitled "Genome Analysis and the Molecular Systematics of Retroelements," and the resulting papers illustrate this theme of new discoveries and cover three basic areas of research: (i) the taxonomic distribution and phylogenetic structure of families of retroelements; (ii) the use of SINE and LINE insertions for phylogenetic inference; and (iii) the informatics and classification of repetitive elements. The contributions of each article are briefly discussed in this context, and particularly fruitful directions for future research suggested by the results of this symposium are reviewed.
Keywords: Interspersed repeat; LINE; phylogenomics; retroelements; SINE; transposable element
Received October 1, 2006; Revised October 15, 2006; Accepted October 19, 2006
Summary and Prospectus
Phylogenomics and the Democratization of Retroelements
Closer to the Arctic Circle than anyone might have expected, a group of leading international experts gathered for a symposium entitled "Genome Analysis and the Molecular Systematics of Retroelements," held at the annual meeting of the Society of Systematic Biologists (SSB) in June 2005 at the University of Alaska in Fairbanks. For a number of the participants this was their first SSB annual meeting, in part because the past 20 years of literature on the molecular evolution of retroelements has been rooted largely in biochemistry, cell biology, and medicine. Retroelements are broadly defined here as mobile repeats, interspersed throughout eukaryotic genomes, that rely on an RNA intermediate to amplify and relocate from a parent locus to a target locus in the host genome. As we attempt to manage the biodiversity of organisms as a critical natural resource and to resolve the tree of life in the era of genomics, and in particular as the medical industry continues to drive down the cost of whole-genome sequencing, investigators can now begin to integrate advances in genome research more thoroughly with the principles of systematic biology. The emergence of the field of "phylogenomics" gives comparative biologists a wealth of opportunities to bring both more and new types of character data to bear on long-standing problems in systematic biology (Edwards et al., 2005; Shedlock et al., 2006). It was in this spirit of exploration and integrative biology that the SSB symposium on retroelements was conducted, and it produced the series of featured articles in the present issue of Systematic Biology.
Whether we like it or not, well over half of our genome is filled with selfish molecular parasites and their dead, fossilized remains (Fig. 1; Lander et al., 2001). As the chief architects of genomic diversity, retroelements have become a target for understanding the evolutionary dynamics of chromosomal DNA (Batzer and Deininger, 2001; Brosius, 1991; Kazazian, 2004; Weiner, 2002) and have been highlighted as prime examples of genetic conflict in action (Burt and Trivers, 2006). On the other hand, recent studies indicate that retroelements may be routinely co-opted as functional units and subsequently incorporated into highly conserved novel gene regulatory networks (Bejerano et al., 2006; Nishihara et al., 2006; Xie et al., 2006), thereby blurring the traditional cut-and-dried boundaries between functional and nonfunctional compartments of the genome. The relative abundance of retroelements is uneven among eukaryotes owing to a variety of historical factors, such as rates of amplification, removal by large-scale deletion, and variation in levels of point mutation (Fig. 2). This diversity and uneven taxonomic distribution facilitate their use as genetic markers for some species but also limit their practical application to a variety of problems in systematic biology (see reviews by Shedlock and Okada, 2000; Shedlock et al., 2004). The effective application of retroelements as phylogenetic tools depends on understanding both their diversity among species and how they have evolved within species. In this respect, the systematics of the elements themselves reciprocally illuminates their use as characters for inferring common ancestry among host species or the genetic structure of populations.
In this vein, the symposium papers featured in the present issue of Systematic Biology present results on a variety of taxonomic scales that are united by testing evolutionary hypotheses within a phylogenetic framework and by genomics-enabled approaches to investigation.
Featured Symposium Articles
The scope of the present symposium series of articles can be broken down into three major, complementary categories: (1) the molecular evolution and phylogenetic structure of retroelement families; (2) the use of SINE and LINE insertion patterns to infer common ancestry among host lineages; and (3) the informatics and classification of interspersed repeats in plant and animal genomes.
The first category is represented by three articles. Irina Arkhipova presents the first phylogenetic treatment of Penelope-like elements, which are notably ancient in origin and possess a suite of unique features, such as the ability of some groups to retain introns. The LINE-1 family, the largest single component of mammalian genomes, is surveyed comprehensively across deuterostomes by Dusan Kordis and colleagues, who describe macroevolutionary trends in this family of retroelements and contrast patterns of diversity before and after the origin of tetrapods. Finally, I investigate the ancient chicken repeat-1 (CR1) elements, which have a phylum-wide distribution, in the sister group of birds and mammals, revealing substantial but largely unexplored subfamily diversity in nonavian reptilian clades.
Two papers highlight different aspects of using retroelement insertion patterns to infer the common ancestry of species. A large empirical study by Sasaki and colleagues exemplifies the standards for applying the SINE method of phylogenetic inference, developed extensively in nonmodel species by Norihiro Okada's laboratory. The study not only resolves the relationships of Old World freshwater turtles using a large number of independent loci but also reveals patterns of morphological convergence and rapid speciation in some clades that have confounded a clear understanding of the evolutionary history of these highly modified reptiles. David Ray and colleagues from Mark Batzer's group review the extensive body of data available for more than 11,000 primate-specific Alu SINEs. Their evaluation of homoplasy arising from parallel insertions, precise excisions, and lineage-sorting artifacts informs the debate over the importance of these misleading events, which collectively violate critical assumptions inherent in constructing cladograms from LINE and SINE insertions.
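Because retroposon insertions are scored as polarized presence/absence characters, with absence assumed ancestral, the misleading events listed above show up as insertions whose taxon sets cannot both mark clades of a single rooted tree. The Python sketch below is a generic pairwise compatibility check, not necessarily the approach of Ray and colleagues: two insertions are compatible only if the sets of taxa carrying them are nested or disjoint. The taxon names and insertion data are hypothetical.

```python
from itertools import combinations


def conflicting_pairs(insertions):
    """Return index pairs of insertion characters that cannot both be clades.

    Each character is the set of taxa scored as 'present' for one insertion,
    with absence assumed to be the ancestral state. Two such sets fit on the
    same rooted tree only if one contains the other or they share no taxa.
    """
    conflicts = []
    for (i, a), (j, b) in combinations(enumerate(insertions), 2):
        nested = a <= b or b <= a
        disjoint = a.isdisjoint(b)
        if not (nested or disjoint):
            conflicts.append((i, j))
    return conflicts


# Hypothetical presence/absence data for four ingroup taxa.
loci = [
    {"taxonA", "taxonB"},            # insertion 0
    {"taxonA", "taxonB", "taxonC"},  # insertion 1
    {"taxonC", "taxonD"},            # insertion 2
]
print(conflicting_pairs(loci))  # prints [(1, 2)]
```

A flagged pair does not identify which insertion is misleading; parallel insertion, precise excision, and incomplete lineage sorting can all produce the same pattern.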
Lastly, two articles highlight the rapidly expanding role of bioinformatics in helping systematists find new informative retroposon loci, and they underscore the challenge of classifying the many new elements being discovered in the wake of genomics. Colleagues led by Jürgen Schmitz in the Brosius laboratory showcase the performance of their newly developed computer program, CPAL, for isolating informative SINEs from the wealth of genomic information now available for rodents. These authors integrate in silico results with experimental confirmation by PCR amplification of target loci in species representing major rodent clades. Deragon and Zhang review the status of SINE analysis in plants, emphasizing the complexities unique to understanding the evolution of plant retroelements. Plant SINE diversity remains largely underused for studying the systematics of crop species, although groundbreaking examples have been published for rice and members of the Brassicaceae. The authors use a phylogenetic analysis to compare the evolutionary histories of SINE families in Brassica and Arabidopsis genomes and propose a new classification of 15 plant SINE families in an effort to clarify existing taxonomic confusion.
Future Research
Successful symposia not only generate insights from new results but also help illuminate directions for future research. At present, the glut of empirical results on retroelements far outpaces the theoretical frameworks available for modeling SINE and LINE evolution. The wealth of genome-scale information on primate-specific Alu SINEs has supported such modeling efforts and has led to advances in our understanding of the multiple-source-gene model operating for SINEs in primates (Cordaux et al., 2004; Han et al., 2005). Katzourakis et al. (2005) recently investigated a model of the dynamics of human endogenous retroviruses (HERVs) within a genome that allows the number of active elements to change over time, extending a fixed source-gene model proposed by Walsh (1985). It would be useful to adapt and extend the HERV model to the study of LINE or SINE elements across multiple species. A multiple-species coalescent algorithm that accommodates departures from the standard Kingman model (Eldon and Wakeley, 2006; Kingman, 1982), applied to specific families of elements, could support simulation studies under different evolutionary conditions and help optimize sampling strategies for gathering enough informative SINE and LINE loci to complete large systematic projects. This would be especially advantageous for tackling difficult phylogenetic problems such as those associated with small effective populations or short divergence times between the lineages under investigation.
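The cost of short internodes can be illustrated with the standard three-taxon result of the Kingman coalescent. For a species tree ((A,B),C) whose internal branch lasts T coalescent units (T = t/2N for t generations and diploid effective size N), the probability that the genealogy at a locus conflicts with the species tree is (2/3)e^(-T), and an insertion polymorphism segregating at that locus can be sorted in the same misleading way. The sketch below checks that formula by simulation; it assumes neutral, unlinked loci and constant population size, and it is neither the HERV model of Katzourakis et al. (2005) nor the skewed-offspring coalescent of Eldon and Wakeley (2006).

```python
import math
import random


def p_discordant(T):
    """Analytic probability that a gene tree disagrees with species tree
    ((A,B),C) when the internal branch lasts T coalescent units
    (T = t / 2N for t generations and diploid effective size N)."""
    return (2.0 / 3.0) * math.exp(-T)


def simulate_discordance(T, reps=100_000, seed=1):
    """Estimate the same probability by simulating the Kingman coalescent."""
    rng = random.Random(seed)
    discordant = 0
    for _ in range(reps):
        # The A and B lineages coalesce at rate 1 while in the AB ancestor.
        if rng.expovariate(1.0) < T:
            continue  # coalesced before reaching the ABC ancestor: concordant
        # Otherwise three lineages meet in the ABC ancestor, where each of
        # the three pairs is equally likely to coalesce first.
        first_pair = rng.randrange(3)  # 0 = (A,B), 1 = (A,C), 2 = (B,C)
        if first_pair != 0:
            discordant += 1
    return discordant / reps


for T in (0.1, 0.5, 2.0):
    print(T, round(p_discordant(T), 3), round(simulate_discordance(T), 3))
```

Because T = t/2N, the expected proportion of misleading loci depends on both the length of the internode and the ancestral effective population size.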
Statistical evaluation of support for clades inferred from retroelement insertion data is another area open for further development, given that the bootstrap is not ideally suited to the small number of polarized characters in SINE and LINE data matrices (Sanderson, 1995; Shedlock and Okada, 2000). Important progress has been made recently by Waddell et al. (2001), who used likelihood-ratio tests to evaluate whether a particular topology based on indels can be considered statistically significant. This has been especially valuable for determining the minimum number of loci required to support a given node of a SINE cladogram at the 95% confidence level. A related area of research is the integration of SINE and LINE insertions with the DNA sequences flanking each informative retroposon locus. Lum et al. (2000) demonstrated that the relationship between SINE trees and trees inferred from the associated flanking sequences is robust to assumptions of orthology for artiodactyl mammals. Hasegawa and colleagues have elaborated this approach to estimate divergence times of nodes established by retroelement insertion patterns, using a relaxed clock with the Bayesian machinery of Thorne et al. (1998) and fossil calibrations (e.g., Nikaido et al., 2001). In general, the integration of retroposon insertion patterns, DNA sequences, and fossils offers an attractive program for helping assemble difficult branches in the Tree of Life initiative, such as those exemplified by basal therian mammals (Murphy et al., 2001) and avian interordinal relationships (Cracraft et al., 2004).
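The question of how many loci are needed can be sketched with a simplified calculation. Assume, as a null hypothesis, that three taxa form an unresolved triplet, so that each informative insertion independently supports each of the three possible resolutions with probability 1/3. The one-sided probability that at least k of n insertions favor one particular resolution is then a binomial tail. This is a textbook approximation chosen for illustration, not the exact likelihood-ratio test of Waddell et al. (2001).

```python
from math import comb


def triplet_p_value(k, n, p_null=1 / 3):
    """One-sided binomial tail P(X >= k) for X ~ Binomial(n, p_null).

    k: insertions supporting the resolution of interest
    n: all informative insertions for the triplet (supporting + conflicting)
    p_null: assumed chance that one insertion supports a given resolution
    """
    return sum(comb(n, x) * p_null**x * (1 - p_null) ** (n - x)
               for x in range(k, n + 1))


def min_conflict_free_loci(alpha=0.05):
    """Smallest number of conflict-free insertions giving a p-value below alpha."""
    n = 1
    while triplet_p_value(n, n) >= alpha:
        n += 1
    return n


print(round(triplet_p_value(3, 3), 4))  # three insertions, no conflict: 0.037
print(round(triplet_p_value(5, 6), 4))  # five supporting, one conflicting: 0.0178
print(min_conflict_free_loci())         # 3
```

Under these assumptions, three conflict-free insertions are the minimum for p < 0.05, a threshold often cited for retroposon data; real analyses must also consider homoplasy and loci that are uninformative for the triplet in question.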
Developing a population-genetic analytical framework for retroelements will likely help expand their use by molecular ecologists studying the geographic structure of populations. For example, population-specific Alu polymorphisms remain largely underused in human forensics relative to other genotyping technologies, and they should also greatly facilitate primate conservation efforts (Ray, 2006). Although much attention has been placed on using fixed loci to reconstruct cladograms of host species, unfixed polymorphic SINEs and LINEs in eukaryotic genomes offer a wealth of biallelic genetic markers that are identical by descent and can be used to reconstruct the demographic histories of subpopulations below the species level (Batzer and Deininger, 2001; Shedlock et al., 2004).
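Unfixed insertions behave as biallelic markers (insertion present or absent), so standard population-genetic summaries apply to them directly. As a minimal sketch, assuming diploid genotype counts per population and using Wright's F_ST computed from expected heterozygosities, F_ST = (H_T - H_S)/H_T with populations weighted equally, one might write the following. The population labels and counts are hypothetical, and no sample-size correction is applied.

```python
def insertion_frequency(n_absent_absent, n_present_absent, n_present_present):
    """Frequency of the 'insertion present' allele from diploid genotype counts."""
    n = n_absent_absent + n_present_absent + n_present_present
    return (n_present_absent + 2 * n_present_present) / (2 * n)


def wright_fst(freqs):
    """Wright's F_ST for one biallelic locus from per-population allele
    frequencies, weighting populations equally (no sample-size correction)."""
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    # Expected heterozygosity if all populations were pooled.
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical genotype counts (absent/absent, present/absent, present/present).
pop_1 = insertion_frequency(32, 16, 2)  # p = 0.2
pop_2 = insertion_frequency(2, 16, 32)  # p = 0.8
print(round(wright_fst([pop_1, pop_2]), 3))  # 0.36
```

Averaging such per-locus values over many polymorphic insertions, or using an estimator that corrects for sample size, would be the natural next step in a real study.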
Finally, it will be useful for systematists to refine existing informatics tools to help detect new elements de novo and to reduce the ascertainment bias inherent in aligning relatively distant subject and query sequences in an incomplete genome database. One approach to this problem has been to classify elements based on alignments of the protein sequences that transposable elements encode, which are less prone to false positives from BLASTn- and BLASTx-like searches (Altschul et al., 1997; Jurka et al., 2005; Smit, 2006). One drawback is that this approach cannot classify short target elements of systematic interest that lack such coding sequence, most notably SINEs. However, one can take advantage of the fact that most SINEs are derived from tRNA to detect new SINEs in the genome (Churakov et al., 2004; Okada et al., 2004). Tools such as tRNAscan-SE (Lowe and Eddy, 1997) may also facilitate in silico detection of novel SINEs that do not produce significant BLAST hits to tRNA genes annotated in GenBank. A logical extension of this effort is to provide a phylogenetic context for subject retroelement sequences. Preliminary attempts to do this based on parsed BLAST searches are being developed by Malcolm (2006). More restrictive phylogenetic comparisons of repeat types based on conserved domains of amino acid sequence, such as the endonuclease and reverse transcriptase regions of retrotransposons (Malik et al., 1999; Xiong and Eickbush, 1990), may prove particularly helpful for diagnosing novel repeats from poorly explored genomes. However, developing such services poses considerable curatorial challenges for database managers, who must ensure that reliable phylogenetic results can be obtained consistently.
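The parsed-BLAST idea can be sketched as follows, assuming searches were run with NCBI BLAST+ using tabular output (`-outfmt 6`, equivalent to the legacy `-m 8` format, whose twelve default columns end with the E-value and bit score) against a repeat library whose subject identifiers are assumed to follow a RepeatMasker-style `name#class/family` convention. The naming rule, cutoff, and example hits are hypothetical, and this is not the Retroposon Base pipeline of Malcolm (2006). Each query is assigned the family of its highest-scoring hit that passes an E-value cutoff.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6 / legacy -m 8).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def family_of(sseqid):
    """Hypothetical rule for identifiers like 'name#class/family'."""
    return sseqid.split("#", 1)[1] if "#" in sseqid else sseqid


def best_family_per_query(tabular_text, max_evalue=1e-5):
    """Map each query to the family of its best-scoring hit below max_evalue."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for fields in reader:
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, fields))
        evalue, score = float(hit["evalue"]), float(hit["bitscore"])
        if evalue > max_evalue:
            continue  # hit too weak to classify the query
        query = hit["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (family_of(hit["sseqid"]), score)
    return {query: family for query, (family, _) in best.items()}


# Hypothetical hits for two genomic contigs.
example = (
    "contig1\tAluY#SINE/Alu\t92.1\t280\t20\t2\t1\t280\t1\t281\t1e-90\t330\n"
    "contig1\tL1HS#LINE/L1\t70.0\t60\t18\t0\t5\t64\t900\t959\t0.5\t30\n"
    "contig2\tCR1-1#LINE/CR1\t81.3\t512\t90\t6\t10\t520\t100\t609\t3e-60\t230\n"
)
print(best_family_per_query(example))
# {'contig1': 'SINE/Alu', 'contig2': 'LINE/CR1'}
```

A best-hit label only places a query near its closest library relative; the phylogenetic context discussed above would require aligning the query with representatives of each candidate family and building a tree.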
Acknowledgments
I would like to extend my gratitude to all the participants of the SSB 2005 symposium "Genome Analysis and the Molecular Systematics of Retroelements" for their valuable contributions and to the SSB council for their support of this event. Mark Batzer deserves special recognition for his exceptional enthusiasm and help with coordinating this symposium. Rod Page and Debbie Ciszek provided critical guidance and editorial input throughout the publication process. I would also like to thank Scott Edwards, Nori Okada, and Masami Hasegawa for advice and encouragement, John Wakeley for sharing ideas about evolutionary models, and Harvard University for funding.
References
Altschul S., Madden T., Schaffer A., Zhang J., Zhang Z., Miller W., Lipman D. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. (1997) 25:3389–3402.
Batzer M., Deininger P. L. Alu repeats and human genomic diversity. Nat. Rev. Genet. (2001) 3:370–379.
Bejerano G., Lowe C. B., Ahituv N., King B., Siepel A., Salama S. R., Rubin E. M., Kent W. J., Haussler D. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature (2006) 441:87–90.
Brosius J. Retroposons—Seeds of evolution. Science (1991) 251:753.
Burt A., Trivers R. Genes in conflict: The biology of selfish genetic elements (2006) Cambridge: Harvard University Press.
Churakov G., Smit A. F. A., Brosius J., Schmitz J. A novel abundant family of retroposed elements (DAS-SINEs) in the nine-banded armadillo (Dasypus novemcinctus). Mol. Biol. Evol. (2004) 22:886–893.
Cordaux R., Hedges D. J., Batzer M. A. Retrotransposition of Alu elements: How many sources? Trends Genet. (2004) 20:464–467.
Cracraft J., Barker F. K., Braun M., Harshman J., Dyke G. J., Feinstein J., Stanley S., Cibois A., Schickler P., Beresford P., Garcia-Moreno J., Sorenson M. D., Tamaki Y., Mindell D. P. Phylogenetic relationships among modern birds (Neornithes): Toward an avian tree of life. In: Assembling the Tree of Life—Cracraft J., Donoghue M. J., eds. (2004) New York: Oxford University Press. 468–489.
Edwards S. V., Jennings W. B., Shedlock A. M. Phylogenetics of modern birds in the era of genomics. Proc. R. Soc. Lond. B (2005) 272:979–992.
Eldon B., Wakeley J. Coalescent processes when the distribution of offspring number among individuals is highly skewed. Genetics (2006) 172:2621–2633.
Han K., Xing J., Wang H., Hedges D. J., Garber R. K., Cordaux R., Batzer M. A. Under the genomic radar: The stealth model of Alu amplification. Genome Res. (2005) 15:655–664.
Hillier L. W., Miller W., Birney E., Warren W., Hardison R. C., Ponting C. P., Bork P., Burt D. W., Groenen M. A., Delaney M. E., Dodgson J. B. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature (2004) 432:695–716.
Holt R. A., Subramanian G. M., Halpern A., Sutton G. G., Charlab R., Nusskern D. R., Wincker P., Clark A. G., Ribeiro J. M. C., Wides R., et al. The genome sequence of the malaria mosquito Anopheles gambiae. Science (2002) 298:129–149.
International Rice Genome Sequencing Consortium. The map-based sequence of the rice genome. Nature (2005) 436:793–800.
Jaillon O., Aury J.-M., Brunet F., Petit J.-L., Stange-Thomann N., Mauceli E., Bouneau L., Fischer C., Ozouf-Costaz C., Bernot A., et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature (2004) 431:946–957.
Jurka J., Kapitonov V. V., Pavlicek A., Klonowski P., Kohany O., Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. (2005) 110:462–467. www.girinst.org/repbase/index.html.
Katzourakis A., Rambaut A., Pybus O. G. The evolutionary dynamics of endogenous retroviruses. Trends Microbiol. (2005) 13:463–468.
Kazazian H. H. Jr. Mobile elements: Drivers of genome evolution. Science (2004) 303:1626–1632.
Kingman J. On the genealogy of large populations. In: Essays in statistical science—Gani J., Hannan E., eds. (1982) London: Applied Probability Trust. 27–43.
Kirkness E. F., Bafna V., Halpern A. L., Levy S., Remington K., Rusch D. B., Delcher A. L., Pop M., Wang W., Fraser C. M., Venter J. C. The dog genome: Survey sequencing and comparative analysis. Science (2003) 301:1898–1903.
Lander E. S., Linton L. M., Birren B., Nusbaum C., Zody M. C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W., et al. Initial sequencing and analysis of the human genome. Nature (2001) 409:860–921.
Lowe T. M., Eddy S. R. tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. (1997) 25:955–964.
Lum J. K., Nikaido M., Shimamura M., Shimodaira H., Shedlock A. M., Okada N., Hasegawa M. Consistency of SINE insertion topology and flanking sequence tree: Quantifying relationships among Cetartiodactyls. Mol. Biol. Evol. (2000) 17:1417–1424.
Malcolm C. Retroposon Base is maintained by the author at the School of Biological Sciences (2006) Queen Mary University of London. www.retroposonbase.com.
Malik H. S., Burke W. D., Eickbush T. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. (1999) 16:793–805.
Murphy W. J., Eizirik E., Johnson W. E., Zhang Y. P., Ryder O. A., O'Brien S. J. Molecular phylogenetics and the origins of placental mammals. Nature (2001) 409:614–618.
Nikaido M., Matsuno F., Hamilton H., Brownell R. L. Jr., Cao Y., Ding W., Zuoyani Z., Shedlock A. M., Fordyce R. E., Fordyce H. M., Okada N. Retroposon analysis of major cetacean lineages: The monophyly of toothed whales and the paraphyly of river dolphins. Proc. Natl. Acad. Sci. USA (2001) 98:7384–7389.
Nishihara H., Smit A. F. A., Okada N. Functional noncoding sequences derived from SINEs in the mammalian genome. Genome Res. (2006) 16:864–874.
Okada N., Shedlock A. M., Nikaido M. Retroposon mapping in molecular systematics. In: Mobile genetic elements—Miller W. J., Capy P., eds. (2004) Totowa, New Jersey: Humana Press. 189–226.
Ray D. A. SINEs of progress: Mobile element applications to molecular ecology. Mol. Ecol. (2006) In press.
Sanderson M. J. Objections to bootstrapping phylogenies: A critique. Syst. Biol. (1995) 44:299–320.
Shedlock A. M., Janes D., Edwards S. V. Amniote phylogenomics: Testing evolutionary hypotheses with BAC library scanning and targeted clone analysis of large-scale DNA sequences from reptiles. In: Phylogenomics—Murphy W. J., ed. (2006) Totowa, New Jersey: Humana Press. xx–xx.
Shedlock A. M., Okada N. SINE insertions: Powerful tools for molecular systematics. Bioessays (2000) 22:148–160.
Shedlock A. M., Takahashi K., Okada N. SINEs of speciation: Tracking lineages with retroposons. Trends Ecol. Evol. (2004) 19:545–553.
Smit A. F. A. (2006) RepeatMasker version 3.1.5 http://www.repeatmasker.org.
Thorne J. L., Kishino H., Painter I. S. Estimating the rate of evolution of the rate of molecular evolution. Mol. Biol. Evol. (1998) 15:1647–1657.
Waddell P. J., Kishino H., Ota R. A phylogenetic foundation for comparative mammalian genomics. Genome Informat. (2001) 12:141–154.
Waterston R. H., Lindblad-Toh K., Birney E., Rogers J., Abril J. F., Agarwal P., Agarwala R., Ainscough R., Alexandersson M., An P., et al. Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 420:520–562.
Weiner A. M. SINEs and LINEs: The art of biting the hand that feeds you. Curr. Opin. Cell Biol. (2002) 14:343–350.
Xie X., Kamal M., Lander E. S. A family of conserved noncoding elements derived from an ancient transposable element. Proc. Natl. Acad. Sci. USA (2006) 103:11659–11664.
Xiong Y., Eickbush T. H. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. (1990) 9:3353–3362.