© 2005 Society of Systematic Biologists
Getting to the Roots of Matrix Representation
Edited by Rod Page: Associate Editor
1 Lehrstuhl für Tierzucht, Technical University of Munich Hochfeldweg 1, 85354 Freising-Weihenstephan, Germany E-mail: Olaf.Bininda{at}tierzucht.tum.de (O.R.P.B.-E.)
2 Department of Biological Sciences, Imperial College London Silwood Park Campus, Ascot SL5 7PY, United Kingdom E-mail: robin.beck{at}student.unsw.edu.au (R.M.D.B.) a.purvis{at}imperial.ac.uk (A.P.)
3 The Natural History Museum Cromwell Road, London SW7 5BD, United Kingdom
Received May 27, 2004; Revised September 29, 2004; Accepted November 19, 2004
Many supertree methods rely on the matrix representation (MR) of relationships in a set of source trees. The most common coding method is based on additive binary coding (Farris et al., 1970): for each informative node in a source tree (i.e., one that corresponds to a parsimony-informative character), taxa that are descended from that node are scored as 1; those that are not, but are present on the tree, are scored as 0; and those that are absent from that source tree, but present on other trees in the set, are scored as ?. MR supertree construction has usually been performed using source trees that are rooted. This rooting can be accomplished either by re-rooting all trees using a single taxon common to all source trees (Baum, 1992) or by the inclusion of an additional, hypothetical outgroup encoded entirely with zeros (Ragan, 1992). Because the existence of a single universal taxon is rare in a supertree setting (where source trees need only have overlapping, but not identical taxon sets), most supertree studies have employed the hypothetical outgroup (herein, the "MR-outgroup"), which acts essentially as the universal taxon.
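The coding rule just described can be sketched in a few lines of Python. This is a minimal illustration only, not the SuperMRP.pl implementation: the nested-tuple tree format, the function names, and the label `MR_outgroup` are assumptions made for the example.

```python
def collect_clades(tree, clades):
    """Return the leaf set of `tree`, appending the leaf set of every
    internal node (including the root) to the list `clades`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def mr_encode(source_trees, outgroup="MR_outgroup"):
    """Additive binary MR coding of rooted source trees with an all-zero outgroup.

    Each source tree is a nested tuple of taxon names (hypothetical input format).
    Returns a dict mapping each taxon to its string of character states.
    """
    all_taxa = sorted(set().union(*(collect_clades(t, []) for t in source_trees)))
    rows = {taxon: [] for taxon in [outgroup] + all_taxa}
    for tree in source_trees:
        clades = []
        present = collect_clades(tree, clades)
        for clade in clades:
            if clade == present:
                continue  # the root node groups every taxon on the tree: uninformative
            rows[outgroup].append("0")
            for taxon in all_taxa:
                # 1 = descended from the node, 0 = on the tree but outside the node,
                # ? = absent from this source tree
                state = "1" if taxon in clade else "0" if taxon in present else "?"
                rows[taxon].append(state)
    return {taxon: "".join(states) for taxon, states in rows.items()}


# Two small rooted source trees with overlapping (not identical) taxon sets.
matrix = mr_encode([("A", ("B", ("C", "D"))), ("C", ("D", "E"))])
for taxon, states in matrix.items():
    print(f"{taxon:12s}{states}")
```

The first two columns come from the first tree and the third from the second; taxa A and B are scored ? in the third column because they are absent from the second source tree.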
The use of the MR-outgroup is not problematic in and of itself. Most source trees are rooted, and the use of the MR-outgroup preserves this rooting information. By contrast, it is the position of the root in the source tree that may be problematic in a supertree context. Often, the root will have been determined based on received phylogenetic opinion at the time, a type of "appeal to authority" (sensu Gatesy et al., 2002) in the form of an assumption of monophyly. Changes in phylogenetic opinion might mean that, among other changes, the ingroup in the source tree is no longer regarded as being monophyletic with respect to the outgroup. The use of the MR-outgroup will maintain this relationship, but, fortunately, a slight modification to the MR coding procedure can easily address any errors that would arise from the use of an outdated root.
The inclusion of unrooted source trees in a supertree analysis is uncommon, but not new. The family-level MRP supertree of eutherian mammals of Liu et al. (2001) includes both rooted and unrooted source trees and, in theory, many other supertree methods can similarly accommodate such a mixture (Wilkinson et al., 2004). Some methods, notably quartet-based ones (e.g., Piaggio-Talice et al., 2004), can only handle unrooted source trees. Wilkinson et al. (2004) recently discussed both the general desirability of including unrooted source trees and the feasibility of this for specific supertree methods. In this paper, we expand on this theme by examining the desirability of purposely encoding at least some rooted source trees as unrooted, particularly in the context of MR with parsimony supertree analysis (MRP; Baum, 1992; Ragan, 1992). We first provide a simple case study illustrating the problems that appeals to authority can raise in a purely rooted-MR context. From this example, we advocate a semirooted form of analysis in which only source trees for which the placement of the root is held to be robust by the investigator are encoded in a rooted fashion; the remaining source trees are instead purposely unrooted. Liu et al. (2001) apparently employed a similar, albeit undocumented, strategy in their supertree analysis (F.-G. R. Liu, personal communication) and we conclude by examining the need for encoding at least some rooted source trees as unrooted based on our experiences in updating the Liu et al. (2001) supertree.
Rooted Source Trees and Supertree Analysis
Most evolutionary trees in biology are rooted. However, it is perhaps underappreciated that rooting is an a posteriori procedure with many inherent assumptions (which are typically a priori). Typically, phylogenetic analyses produce one or more unrooted trees that are subsequently rooted using methods such as outgroup rooting (Maddison et al., 1984), directed character distributions, a molecular clock, midpoint rooting, or rare genomic changes (Rokas and Holland, 2000) in particular (for a general review of most methods, see Swofford et al., 1996). The comparative performance of the first three of these methods was recently examined by Huelsenbeck et al. (2002).
Yet, all rooting methods make inherent assumptions and the robustness of the position of a root can vary based on the specific validity of those assumptions and the characteristics of the analysis itself. For instance, using evidence from gene duplication data, Donoghue and Mathews (1998) inferred the root of the angiosperms to lie along the branch leading to Sorghum. However, with the subsequent analysis of additional species, it was found that the root lay in a very different location, at or close to the branch leading to Amborella (Mathews and Donoghue, 1999, 2000). It has been long appreciated that phylogenetic estimates for a given group can differ merely because of disagreement as to the position of the root. This point was amply demonstrated for apparently conflicting phylogenies of cetaceans by Messenger and McGuire (1998). Another example is for eutherian mammals. Although the rooted trees derived from nuclear versus mitochondrial sequences conflict, the unrooted networks are congruent with respect to the major eutherian clades (Lin et al., 2002).
Thus, incongruence between source trees, which can lead to poor or incorrect resolution in a supertree, might in some cases be because of the source trees being rooted differently, when in reality the unrooted relationships of the ingroup taxa are congruent. However, in assessing these differences, it is important to distinguish between those that arise because of questionable decisions (e.g., the choice of what in hindsight is an inappropriate outgroup) versus those that arise because of the error inherent to any phylogenetic analysis. For example, because the choice of outgroup in each of the Messenger and McGuire (1998) and Lin et al. (2002) examples is apparently robust, the incongruences found probably represent real incongruence and not an artifact of a questionable rooting decision. We would argue that only the latter is sufficient cause to actively deroot trees.
The impact of rooting decisions on a supertree analysis can be demonstrated clearly using a simple example. Consider the unrooted network of four taxa shown in Figure 1. This network can be rooted along any of the five branches, giving rise to five conflicting rooted trees. Each of these trees is equally reasonable in the absence of other information. A standard MRP analysis (i.e., with the MR-outgroup) of the set of five rooted trees yields a single most parsimonious solution of 14 steps that is identical to source tree 2 in Figure 1. The incongruence among the source trees is indicated by all standard goodness-of-fit indices being less than 1 (CI = 0.7143; RI = 0.6000; RC = 0.4737). If, however, we analyze the same five trees, but exclude the MR-outgroup, a single most parsimonious solution of six steps is found that is equivalent to the original unrooted network. More importantly, all three goodness-of-fit indices have values of 1, indicating that, as expected, the unrooted versions of the source trees are congruent.
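The step counts in this example can be checked with a short exhaustive search. The sketch below is written under stated assumptions: Figure 1 is not reproduced in this text, so the five source trees are taken to be the five rootings of the network AB|CD on hypothetical taxa A to D; Fitch (unordered) parsimony is used; only parsimony-informative columns are kept; and all function names are illustrative.

```python
def insert(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one branch of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for t in insert(left, leaf):
            yield (t, right)
        for t in insert(right, leaf):
            yield (left, t)


def all_trees(taxa):
    """All unrooted binary trees on `taxa`, drawn rooted at taxa[0] for convenience."""
    trees = [taxa[1]]
    for leaf in taxa[2:]:
        trees = [t for tree in trees for t in insert(tree, leaf)]
    return [(taxa[0], t) for t in trees]


def fitch(tree, char):
    """Return (state set, steps) for one binary character; '?' may be either state."""
    if not isinstance(tree, tuple):
        return ({0, 1} if char[tree] == "?" else {int(char[tree])}), 0
    (ls, lc), (rs, rc) = fitch(tree[0], char), fitch(tree[1], char)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)


def informative(char):
    """Parsimony-informative: both states occur in at least two taxa."""
    states = list(char.values())
    return min(states.count("0"), states.count("1")) >= 2


# Clades (below the root) of the five assumed rootings of the network AB|CD.
SOURCE_CLADES = [
    [{"C", "D"}, {"B", "C", "D"}],  # (A,(B,(C,D)))
    [{"C", "D"}, {"A", "C", "D"}],  # (B,(A,(C,D)))
    [{"A", "B"}, {"A", "B", "D"}],  # (C,(D,(A,B)))
    [{"A", "B"}, {"A", "B", "C"}],  # (D,(C,(A,B)))
    [{"A", "B"}, {"C", "D"}],       # ((A,B),(C,D))
]


def encode(with_outgroup):
    """MR columns for the five trees, with or without the all-zero MR-outgroup."""
    chars = []
    for clades in SOURCE_CLADES:
        for clade in clades:
            char = {t: "1" if t in clade else "0" for t in "ABCD"}
            if with_outgroup:
                char["OG"] = "0"
            if informative(char):
                chars.append(char)
    return chars


def analyse(chars):
    """Exhaustive parsimony search; returns (length, optimal trees, CI, RI)."""
    scored = [(sum(fitch(t, c)[1] for c in chars), t)
              for t in all_trees(sorted(chars[0]))]
    best = min(length for length, _ in scored)
    optimal = [t for length, t in scored if length == best]
    m = len(chars)  # minimum possible steps: one per binary character
    g = sum(min(list(c.values()).count("0"), list(c.values()).count("1"))
            for c in chars)  # maximum possible steps
    return best, optimal, round(m / best, 4), round((g - best) / (g - m), 4)


rooted = analyse(encode(with_outgroup=True))
unrooted = analyse(encode(with_outgroup=False))
print(rooted[0], rooted[2], rooted[3])        # length, CI, RI with the MR-outgroup
print(unrooted[0], unrooted[2], unrooted[3])  # length, CI, RI without it
```

Under these assumptions the search recovers a single tree of 14 steps with the MR-outgroup (CI = 0.7143, RI = 0.6) and a single tree of six steps without it (CI = RI = 1). The RC value is not computed in this sketch.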
If, however, we have robust rooting information for one or more (but not all) source trees, it is possible to conduct a semirooted analysis in which some source trees are encoded as rooted (using the MR-outgroup) and the remainder as unrooted. In this way, the placement of the root for the unrooted trees is determined not a priori by the investigator, but analytically according to its optimal fit with any rooted source trees on the final supertree. In essence, semirooted analyses are analogous to a conventional phylogenetic analysis in which the outgroup taxon contains missing data for some characters. The polarity of these characters is then established based on the inferred ancestral state that maximizes their fit on the overall tree. Again, take the example of the five trees in Figure 1. If one source tree is input as being rooted (i.e., with the MR-outgroup coded as 0 for it and ? for all remaining source trees), standard MRP analysis yields a single tree of length 10 that is identical to the rooted source tree. All goodness-of-fit indices again have a value of 1, indicating the congruence of the remaining unrooted trees with the one rooted tree. The preceding is true regardless of which of the five source trees is input singly in a rooted fashion.
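The semirooted coding can be illustrated on the same four-taxon example. As before, this is a hedged sketch: the five source trees are assumed to be the five rootings of the network AB|CD (Figure 1 is not reproduced here), the names are illustrative, and all ten MR columns are retained, including those that become parsimony-uninformative once the MR-outgroup is scored as ?.

```python
def fitch_steps(tree, char):
    """Fitch parsimony steps for one binary character; '?' may be either state."""
    def down(node):
        if not isinstance(node, tuple):
            return ({0, 1} if char[node] == "?" else {int(char[node])}), 0
        (ls, lc), (rs, rc) = down(node[0]), down(node[1])
        common = ls & rs
        return (common, lc + rc) if common else (ls | rs, lc + rc + 1)
    return down(tree)[1]


# The five assumed rooted source trees and the clades below each root.
SOURCE_TREES = [
    ("A", ("B", ("C", "D"))),
    ("B", ("A", ("C", "D"))),
    ("C", ("D", ("A", "B"))),
    ("D", ("C", ("A", "B"))),
    (("A", "B"), ("C", "D")),
]
SOURCE_CLADES = [
    [{"C", "D"}, {"B", "C", "D"}],
    [{"C", "D"}, {"A", "C", "D"}],
    [{"A", "B"}, {"A", "B", "D"}],
    [{"A", "B"}, {"A", "B", "C"}],
    [{"A", "B"}, {"C", "D"}],
]


def semirooted_matrix(rooted_index):
    """MR columns with the outgroup scored 0 for one tree and ? for the rest."""
    chars = []
    for i, clades in enumerate(SOURCE_CLADES):
        for clade in clades:
            char = {t: "1" if t in clade else "0" for t in "ABCD"}
            char["OG"] = "0" if i == rooted_index else "?"
            chars.append(char)
    return chars


def length_on_rooted_tree(rooted_index):
    """Length of the semirooted matrix on the one source tree kept as rooted."""
    tree = ("OG", SOURCE_TREES[rooted_index])
    return sum(fitch_steps(tree, c) for c in semirooted_matrix(rooted_index))


for i in range(5):
    print(i + 1, length_on_rooted_tree(i))
```

Each of the ten columns contains both a 0 and a 1, so no tree can be shorter than 10 steps; the rooted source tree reaches exactly that length whichever tree is chosen, which matches the length of 10 and the fit indices of 1 reported above. The sketch does not test whether that tree is the only one of minimal length.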
Mechanically, unrooted or semirooted MR supertree analyses are identical to conventional phylogenetic analyses. In all cases, an unrooted network is returned by the analysis that is then rooted subsequently using some desired method. As such, the difference between an unrooted and semirooted supertree analysis is largely semantic, with the latter containing an explicit outgroup taxon with which to root the supertree (i.e., the fictitious MR-outgroup used for the rooted source trees). Unrooted analyses do not contain such a taxon, although any taxon could be used in a similar manner to root the supertree.
Simultaneous MR coding of unrooted and rooted source trees is implemented in SuperMRP.pl (available from http://www.tierzucht.tum.de/Bininda-Emonds/). Unrooted and semirooted analyses can also proceed from a matrix of rooted codings by either deleting the MR-outgroup entirely (unrooted) or changing its states to ? for only those trees that will be unrooted (semirooted), and then, for MRP analyses at least, deleting any resulting parsimony-uninformative matrix elements.
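The conversion from a rooted matrix can be sketched as follows. This is an illustration under assumptions, not the behavior of SuperMRP.pl: the matrix is held as a dictionary of state strings, `column_tree` is a hypothetical list recording which source tree each column came from, and the row label `MR_outgroup` is invented for the example.

```python
def is_informative(column):
    """Parsimony-informative binary column: 0 and 1 each occur at least twice."""
    return min(column.count("0"), column.count("1")) >= 2


def deroot(matrix, column_tree, unroot=None, outgroup="MR_outgroup"):
    """Derive an unrooted or semirooted matrix from a rooted MR matrix.

    matrix      : dict taxon -> string of states (rooted coding, outgroup all 0)
    column_tree : source-tree identifier for each column
    unroot      : None deletes the outgroup row (unrooted analysis); otherwise a
                  set of tree identifiers whose outgroup states become '?'
    Returns the reduced matrix and the tree identifiers of the retained columns.
    """
    rows = {taxon: list(states) for taxon, states in matrix.items()}
    if unroot is None:
        del rows[outgroup]
    else:
        rows[outgroup] = ["?" if tree in unroot else state
                          for state, tree in zip(rows[outgroup], column_tree)]
    # Drop columns that are no longer parsimony-informative.
    keep = [j for j in range(len(column_tree))
            if is_informative([states[j] for states in rows.values()])]
    reduced = {taxon: "".join(states[j] for j in keep)
               for taxon, states in rows.items()}
    return reduced, [column_tree[j] for j in keep]


# Rooted coding of two trees: tree 1 = (A,(B,(C,D))), tree 2 = ((A,B),(C,D)).
rooted = {
    "MR_outgroup": "0000",
    "A": "0010",
    "B": "0110",
    "C": "1101",
    "D": "1101",
}
column_tree = [1, 1, 2, 2]

print(deroot(rooted, column_tree))               # fully unrooted
print(deroot(rooted, column_tree, unroot={1}))   # tree 1 unrooted, tree 2 rooted
```

In both calls the second column (the clade B, C, D of tree 1) is dropped, because without a 0 for the outgroup only taxon A is scored 0 in that column.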
When to Use Unrooted Source Trees
Although supertree analyses with unrooted source trees have been shown to have several highly undesirable properties (Steel et al., 2000; Böcker, 2004), these specific properties manifest themselves only with the restriction that the supertree method outputs a single tree only (Steel et al., 2000). As such, unrooted coding can be used with optimization supertree methods such as MRP that can return more than one tree. The use of semirooted coding, which represents a variant of a rooted analysis, should also avoid many of the problems affecting analyses of unrooted trees only.
The encoding of source trees as unrooted is applicable to most MR methods and optimization criteria and allows for the inclusion of the few unrooted source trees from the literature. However, as pointed out by Wilkinson et al. (2004), unrooted encodings obviously cannot be used for any methods that require rooted source trees, such as Purvis MR-coding (Purvis, 1995) and triplet or three-taxon statement supertree methods (e.g., Wilkinson et al., 2001), among others. Additionally, despite being applicable in theory, current implementations of both MinFlip supertrees (MRF; Chen et al., 2004) and irreversible MRP (Bininda-Emonds and Bryant, 1998) preclude the use of unrooted source trees. In the former case, MRF assumes that 1s encode membership in a given clade (O. Eulenstein, personal communication), although this is not a necessary characteristic of the method. In the latter case, limitations exist in many existing parsimony programs regarding irreversible (Camin-Sokal) parsimony. For example, MacClade (Maddison and Maddison, 2000) imposes the limitation that irreversible characters are in one direction only ("up": 0 → 1 only), whereas PAUP* (Swofford, 2002) requires either of the two permitted directions to be specified a priori for each character. In irreversible MRP of unrooted trees, both directions should be possible a priori, with the proper direction for each pseudocharacter determined during the analysis based on the fit to rooted source trees.
In certain cases, such as a time-series analysis to investigate changes in phylogenetic opinion over time, it might be desirable to maintain the rooting information in the source trees, even if this information is now known to be erroneous. It might also be felt that the rooting of a source tree is an important auxiliary factor contributing to the source tree being a phylogenetic hypothesis rather than purely a summary of the data matrix (see Bininda-Emonds, 2004) and thus should be retained. However, we advocate that (at least) semirooted coding should be used whenever the rooting of the source tree is held to be sufficiently questionable that it might affect the specific analysis. This is probably most relevant when all the taxa in the source tree, and the outgroup taxa in particular, comprise part of the ingroup in the supertree analysis (see below). In such a case, the assumptions made in rooting the source tree, although perhaps valid in the source study, could affect the supertree analysis. Instead, it is better to code the source tree as unrooted and let the position of the root be determined by other source trees that are rooted with a taxon that is not in the ingroup of the supertree.
We note, however, that some source trees could probably be retained as rooted even when their taxon sets are restricted to the ingroup of the overall analysis. These include source trees that have been rooted using any of a number of rare genomic changes such as gene duplications or the presence or absence of large indels (see Rokas and Holland, 2000). Such changes are held to be less susceptible to homoplasy and therefore might provide more robust evidence for the placement of the root of the tree, even in the absence of outgroup information. However, as the gene duplication examples mentioned above (Donoghue and Mathews, 1998; Mathews and Donoghue, 1999, 2000) illustrate, even the placement of roots determined using rare genomic changes can be subject to revision. Moreover, such assessments are also dependent upon implicit assumptions. For example, the use of the phytochrome gene duplication evidence to root the angiosperms rests on the assumption that the relevant genes (PHYA and PHYC) diverged just before the root of the angiosperms (Donoghue and Mathews, 1998), something that has recently been called into question (Sharrock and Mathews, in press).
The Utility of Unrooting Source Trees: An Example from Eutherian Mammals
The potential problems in using all source trees as rooted are clearly illustrated by our efforts to update the analysis of Liu et al. (2001) to build a higher-level MRP supertree of eutherian (placental) mammals (Beck et al., in preparation). In most source trees, the taxon sampling was limited to mammals, and usually to eutherians. However, the position of the root of the eutherian tree (and, to a lesser extent, of the mammal tree as a whole) remains controversial. Received opinion on this topic differs greatly and has changed with time, with any of erinaceids (e.g., Arnason et al., 2002), xenarthrans (e.g., Shoshani and McKenna, 1998), murid rodents (e.g., Asher et al., 2003; Misawa and Janke, 2003), and afrotherians (e.g., Murphy et al., 2001; Amrine-Madsen et al., 2003) having been championed recently as the basal placental lineage (for a review, see Springer et al., 2004). However, the uncertainty surrounding placental phylogeny (including the number and composition of the orders) means that many existing placental-only trees might be rooted inappropriately according to current phylogenetic opinion.
It should be noted that many of the a priori rooting decisions made in studies of phylogenetic relationships within Eutheria are perfectly valid for the context of those studies. Even if the assumed root taxon is not the true root taxon of the placental mammals, it could still be a valid outgroup for the phylogenetic analysis that was conducted. (However, changes in phylogenetic opinion can render some of these a priori decisions as questionable. For instance, the use of Cetacea as the outgroup for a phylogenetic study of Artiodactyla, as might have been the case in the early 1990s, is now invalid with the current recognition that Cetacea nests within Artiodactyla.) Yet, such a priori rooting decisions are potentially problematic in the context of building a eutherian supertree that is more comprehensive than the particular source tree. Although the unrooted networks produced in some of these studies might reflect the real phylogeny of placental mammals (i.e., the data analyzed accurately reflect the true tree), the procedure of rooting them with what is another ingroup taxon in the context of the supertree analysis can lead to incorrect topologies if the wrong taxon is chosen as the outgroup. As such, these source trees should be coded as being unrooted, whereas source trees that are rooted with a nonplacental outgroup (which is also the outgroup in the supertree analysis) would retain this rooting information. Thus, the use of semirooted MRP allows us to retain older source trees (for which the data remain valid) without necessarily having to incorporate their rooting assignments (the assumptions underlying which might be out of date).
Acknowledgment
The first author thanks Tandy Warnow for planting the seed many years ago about the possibility of unrooted MR analyses. Rich Grenyer provided invaluable help and advice regarding various aspects of this study. Sarah Mathews kindly provided a preprint of her work and clarified several points regarding the evolution of the phytochrome gene family. We also thank Gordon Burleigh, Mike Charleston, Rod Page, Mike Steel, and Mark Wilkinson for helpful comments. Funding support was provided by the BMBF (Germany) through the project "Bioinformatics for the Functional Analysis of Mammalian Genomes" (OBE) and the NERC (NER/A/S/2001/00581; RB and AP).
Olaf R. P. Bininda-Emonds and Robin M. D. Beck contributed equally to this work.
References
Amrine-Madsen H., Koepfli K. P., Wayne R. K., Springer M. S. A new phylogenetic marker, apolipoprotein B, provides compelling evidence for eutherian relationships. Mol. Phylogenet. Evol. (2003) 28:225–240.
Arnason U., Adegoke J. A., Bodin K., Born E. W., Esa Y. B., Gullberg A., Nilsson M., Short R. V., Xu X., Janke A. Mammalian mitogenomic relationships and the root of the eutherian tree. Proc. Natl. Acad. Sci. USA (2002) 99:8151–8156.
Asher R. J., Novacek M. J., Geisler J. H. Relationships of endemic African mammals and their fossil relatives based on morphological and molecular evidence. J. Mamm. Evol. (2003) 10:131–194.
Baum B. R. Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees. Taxon (1992) 41:3–10.
Bininda-Emonds O. R. P. Trees versus characters and the supertree/supermatrix "paradox". Syst. Biol. (2004) 53:356–359.
Bininda-Emonds O. R. P., Bryant H. N. Properties of matrix representation with parsimony analyses. Syst. Biol. (1998) 47:497–508.
Böcker S. Unrooted supertrees: Limitations, traps, and phylogenetic patchworks. In: Phylogenetic supertrees: Combining information to reveal the Tree of Life—Bininda-Emonds O. R. P., ed. (2004) Dordrecht, The Netherlands: Kluwer Academic. Pages 331–351.
Chen D., Eulenstein O., Fernández-Baca D. Rainbow: A toolbox for phylogenetic supertree construction and analysis. Bioinformatics (2004) 20:2872–2873.
Donoghue M. J., Mathews S. Duplicate genes and the root of angiosperms, with an example using phytochrome sequences. Mol. Phylogenet. Evol. (1998) 9:489–500.
Farris J. S., Kluge A. G., Eckhardt M. J. A numerical approach to phylogenetic systematics. Syst. Zool. (1970) 19:172–191.
Gatesy J., Matthee C., DeSalle R., Hayashi C. Resolution of a supertree/supermatrix paradox. Syst. Biol. (2002) 51:652–664.
Huelsenbeck J. P., Bollback J. P., Levine A. M. Inferring the root of a phylogenetic tree. Syst. Biol. (2002) 51:32–43.
Lin Y.-H., McLenachan P. A., Gore A. R., Phillips M. J., Ota R., Hendy M. D., Penny D. Four new mitochondrial genomes and the increased stability of evolutionary trees of mammals from improved taxon sampling. Mol. Biol. Evol. (2002) 19:2060–2070.
Liu F.-G. R., Miyamoto M. M., Freire N. P., Ong P. Q., Tennant M. R., Young T. S., Gugel K. F. Molecular and morphological supertrees for eutherian (placental) mammals. Science (2001) 291:1786–1789.
Maddison D. R., Maddison W. P. MacClade 4: Analysis of phylogeny and character evolution (2000) Sunderland, Massachusetts: Sinauer Associates. Version 4.0.
Maddison W. P., Donoghue M. J., Maddison D. R. Outgroup analysis and parsimony. Syst. Zool. (1984) 33:83–103.
Mathews S., Donoghue M. J. The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science (1999) 286:947–950.
Mathews S., Donoghue M. J. Basal angiosperm phylogeny inferred from duplicate phytochromes A and C. Int. J. Pl. Sci. (2000) 161:S41–S55.
Messenger S. L., McGuire J. A. Morphology, molecules, and the phylogenetics of cetaceans. Syst. Biol. (1998) 47:90–124.
Misawa K., Janke A. Revisiting the Glires concept—phylogenetic analysis of nuclear sequences. Mol. Phylogenet. Evol. (2003) 28:320–327.
Murphy W. J., Eizirik E., O'Brien S. J., Madsen O., Scally M., Douady C. J., Teeling E., Ryder O. A., Stanhope M. J., de Jong W. W., Springer M. S. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science (2001) 294:2348–2351.
Piaggio-Talice R., Burleigh J. G., Eulenstein O. Quartet supertrees. In: Phylogenetic supertrees: Combining information to reveal the Tree of Life—Bininda-Emonds O. R. P., ed. (2004) Dordrecht, The Netherlands: Kluwer Academic. Pages 173–191.
Purvis A. A modification to Baum and Ragan's method for combining phylogenetic trees. Syst. Biol. (1995) 44:251–255.
Ragan M. A. Phylogenetic inference based on matrix representation of trees. Mol. Phylogenet. Evol. (1992) 1:53–58.
Rokas A., Holland P. W. Rare genomic changes as a tool for phylogenetics. Trends Ecol. Evol. (2000) 15:454–459.
Sharrock R. A., Mathews S. Phytochrome genes in higher plants: Structure, expression, and evolution. In: Photomorphogenesis in plants, 3rd ed. Dordrecht, The Netherlands: Kluwer Academic. In press.
Shoshani J., McKenna M. C. Higher taxonomic relationships among extant mammals based on morphology, with selected comparisons of results from molecular data. Mol. Phylogenet. Evol. (1998) 9:572–584.
Springer M. S., Stanhope M. J., Madsen O., de Jong W. W. Molecules consolidate the placental mammal tree. Trends Ecol. Evol. (2004) 19:430–438.
Steel M., Dress A. W. M., Böcker S. Simple but fundamental limitations on supertree and consensus tree methods. Syst. Biol. (2000) 49:363–368.
Swofford D. L. PAUP*. Phylogenetic analysis using parsimony (*and other methods) (2002) Sunderland, Massachusetts: Sinauer Associates. Version 4.
Swofford D. L., Olsen G. J., Waddell P. J., Hillis D. M. Phylogenetic inference. In: Molecular systematics—Hillis D. M., Moritz C., Mable B. K., eds. (1996) Sunderland, Massachusetts: Sinauer Associates. Pages 407–514.
Wilkinson M., Thorley J. L., Littlewood D. T. J., Bray R. A. Towards a phylogenetic supertree of Platyhelminthes? In: Interrelationships of the Platyhelminthes—Littlewood D. T. J., Bray R. A., eds. (2001) London: Taylor and Francis. Pages 292–301.
Wilkinson M., Thorley J. L., Pisani D., Lapointe F.-J., McInerney J. O. Some desiderata for liberal supertrees. In: Phylogenetic supertrees: Combining information to reveal the Tree of Life—Bininda-Emonds O. R. P., ed. (2004) Dordrecht, The Netherlands: Kluwer Academic. Pages 227–246.