© 2005 Society of Systematic Biologists
Gene Family Content-Based Phylogeny of Prokaryotes: The Effect of Criteria for Inferring Homology
Edited by Peter Lockhart (Associate Editor) and Chris Simon (Editor)
1 Department of Biological Sciences, University of South Carolina, Columbia, South Carolina 29205, USA; E-mail: austin{at}biol.sc.edu (A.L.H.)
2 Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29205, USA
| Abstract |
|---|
A number of recent papers have suggested that gene family content can be used to resolve phylogenies, particularly in the case of prokaryotes, in which extensive horizontal gene transfer means that individual gene phylogenies may not mirror the organismal phylogeny. However, no study has yet examined how sensitive such analyses are to the criterion of homology assessment used to assemble multigene families. Using data from 99 completely sequenced prokaryotic genomes, we examined the effect of homology criteria in phylogenetic analyses wherein presence or absence of each family in the genome was used as a cladistic character. Different criteria resulted in evidence for contradictory tree topologies, sometimes with high bootstrap support. A moderately strict criterion seemed best for assembling multigene families in a biologically meaningful way, but it was not necessarily preferable for phylogenetic analysis. Instead, a very strict criterion, which broke up gene families into smaller subfamilies, seemed to have advantages for phylogenetic purposes. The poor performance of gene family content-based phylogenetic analysis in the case of prokaryotes appears to reflect high levels of homoplasy resulting not only from horizontal gene transfer but also, more importantly, from extensive parallel loss of gene families in certain bacterial genomes.
Keywords: Gene content; gene families; gene loss; horizontal gene transfer; phylogenetic methods
Received December 22, 2003; Revised August 6, 2004; Accepted October 31, 2004
The availability of a large number of complete sequences of prokaryotic genomes holds promise for further resolving the phylogenetic relationships among major prokaryotic groups. However, there is evidence that horizontal gene transfer (HGT) may have been a frequent occurrence in prokaryotic evolution, which would imply that the phylogeny of individual genes may not reflect the organismal phylogeny (Daubin et al., 2003; Kunin and Ouzounis, 2003; Lerat et al., 2003; Mirkin et al., 2003; Wolf et al., 2002). For this reason, a number of investigators have advocated approaches to prokaryotic phylogeny based on so-called gene content (Snel et al., 1999), that is, the presence or absence of gene families in genomes, which might be more properly called gene family content. Often, gene family content analyses have made use of various distances based on the proportion of shared gene families (Snel et al., 1999). Because of the ad hoc character of such distances, some authors (e.g., Gu, 2000; Huson and Steel, 2004) have proposed a maximum likelihood approach to this question. However, because developing a biologically accurate model of gene family gain and loss is problematic, a number of authors have applied parsimony to gene family content analyses (Wolf et al., 2001, 2004; Hughes and Friedman, 2004).
Whatever method of phylogenetic reconstruction is used, any analysis based on gene family content faces a problem in defining gene families. When families are defined in automated fashion, some search criterion based on the extent of sequence similarity must be used; but the effect of the choice of search criterion on the results of phylogenetic analyses has so far not been studied. In the present paper, we apply a range of different homology criteria to establish gene families in order to examine the sensitivity of such analyses to the criteria used in assigning family membership.
| METHODS |
|---|
We analyzed 99 complete genomes of prokaryotes, 16 from Archaea and 83 from Bacteria (see Appendix 1 for accession numbers). References to currently accepted taxonomy of these species followed the Bergey's Manual Trust Web site (http://www.cme.msu.edu/bergeys/outline.prn.pdf). We assembled gene families by inferred homology from searches applied to predicted protein translations, using the BLASTCLUST software available in the BLAST software package (Altschul et al., 1997). This program identifies families by a single-linkage method, which assembles larger families by merging families that share genes, thus ensuring that a given gene will be assigned to only one family. Sequence homology was established by identifying matches using a conservative E-value of 10⁻⁶. We used six different criteria for scoring a match between two sequences: (1) a minimum of 10% sequence identity across at least 30% of the two sequences; (2) a minimum of 20% sequence identity across at least 40% of the two sequences; (3) a minimum of 30% sequence identity across at least 50% of the two sequences; (4) a minimum of 40% sequence identity across at least 60% of the two sequences; (5) a minimum of 50% sequence identity across at least 70% of the two sequences; and (6) a minimum of 60% sequence identity across at least 80% of the two sequences. We refer to these criteria, respectively, as 10/30, 20/40, 30/50, 40/60, 50/70, and 60/80.
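The single-linkage assembly described above can be illustrated with a minimal sketch. This is not BLASTCLUST itself: the function names, the toy gene identifiers, and the tuple layout of the pairwise hits are assumptions made for illustration, and the E-value filter is assumed to have been applied before the hits reach this code.

```python
from collections import defaultdict


def passes_criterion(identity, coverage, min_identity, min_coverage):
    """True if a pairwise hit meets an identity/coverage threshold (both in percent)."""
    return identity >= min_identity and coverage >= min_coverage


def single_linkage_families(genes, hits, min_identity, min_coverage):
    """Group genes into families by single linkage, using a union-find structure.

    genes: iterable of gene identifiers (hypothetical names).
    hits:  iterable of (gene_a, gene_b, percent_identity, percent_coverage).
    Every gene ends up in exactly one family; unmatched genes are singletons.
    """
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent pointers to the family representative (with path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b, identity, coverage in hits:
        if passes_criterion(identity, coverage, min_identity, min_coverage):
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # one qualifying hit is enough to merge

    families = defaultdict(set)
    for g in parent:
        families[find(g)].add(g)
    return sorted(sorted(members) for members in families.values())


# Toy data (invented): three pairwise hits among five genes.
GENES = ["g1", "g2", "g3", "g4", "g5"]
HITS = [
    ("g1", "g2", 65.0, 90.0),
    ("g2", "g3", 35.0, 55.0),
    ("g4", "g5", 12.0, 35.0),
]

if __name__ == "__main__":
    for label, (ident, cov) in {"10/30": (10, 30), "30/50": (30, 50), "60/80": (60, 80)}.items():
        print(label, single_linkage_families(GENES, HITS, ident, cov))
```

In this toy case g1 and g3 share a family under 30/50 only because each matches g2, which is the chaining behavior of single linkage; under 60/80 the same family splits and the number of singletons rises.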
Using these specified homology criteria, all predicted proteins in the 99 genomes were assigned to families. Families having only a single member were excluded from the analyses. For each remaining family, each genome was scored for presence (1) or absence (0). Maximum parsimony (MP) analysis, using heuristic search by simple stepwise addition (Swofford, 2002), was applied to the resulting matrix, in which protein families corresponded to characters. MP trees were rooted on the assumption that Archaea constitute an outgroup to Bacteria. Bootstrapping (1000 replicates) (Felsenstein, 1985) was used to assess the extent to which clustering patterns in the MP tree received support from the data set as a whole.
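As a rough illustration of the character matrix and of bootstrap resampling of characters, the following sketch builds a presence/absence matrix from a list of families and draws one bootstrap replicate. The function names, the genome labels, and the gene-to-genome mapping are hypothetical; the tree search itself (performed in PAUP* in the study) is not reproduced here.

```python
import random


def presence_absence_matrix(families, gene_to_genome, genomes):
    """Return (kept_families, matrix): one row per genome, one column per family.

    Families with a single member gene are excluded, as described in the text.
    """
    kept = [fam for fam in families if len(fam) > 1]
    matrix = []
    for genome in genomes:
        row = [int(any(gene_to_genome[g] == genome for g in fam)) for fam in kept]
        matrix.append(row)
    return kept, matrix


def bootstrap_replicate(matrix, rng):
    """Resample characters (columns) with replacement; taxa (rows) are unchanged."""
    n_chars = len(matrix[0])
    picks = [rng.randrange(n_chars) for _ in range(n_chars)]
    return [[row[i] for i in picks] for row in matrix]


# Toy data (invented): three genomes, three families, one of them a singleton.
GENOMES = ["A", "B", "C"]
GENE_TO_GENOME = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "c1": "C"}
FAMILIES = [["a1", "b1", "c1"], ["a2", "a3"], ["b2"]]

KEPT, MATRIX = presence_absence_matrix(FAMILIES, GENE_TO_GENOME, GENOMES)

if __name__ == "__main__":
    print(MATRIX)
    print(bootstrap_replicate(MATRIX, random.Random(0)))
```

Each bootstrap replicate has the same dimensions as the original matrix, so the same parsimony search can be applied to it.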
In order to assess the nature of the phylogenetic signal in the data sets assembled under different homology criteria, we computed the "amount of possible synapomorphy" (APS) (Farris, 1989; Simmons et al., 2004). For each parsimony-informative character, APS is defined as the difference between the maximum and minimum number of possible steps for that character. Characters with high APS can potentially be used to resolve deep internal branches of the phylogenetic tree, whereas those with low APS can only resolve outer branches. Thus, the average APS across all informative characters provides information regarding the potential for resolution of deep branches.
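For a binary presence/absence character, the definition above reduces to simple counting, as the sketch below shows. It assumes the standard results that the maximum number of steps for a two-state character on any tree is the count of the rarer state and that the minimum is one; the function names and the toy matrix are illustrative only.

```python
def is_parsimony_informative(column):
    """A binary character is informative if each state occurs in at least two taxa."""
    ones = sum(column)
    zeros = len(column) - ones
    return ones >= 2 and zeros >= 2


def aps(column):
    """Amount of possible synapomorphy: maximum minus minimum possible steps."""
    ones = sum(column)
    zeros = len(column) - ones
    max_steps = min(ones, zeros)  # worst case: every taxon with the rarer state changes separately
    min_steps = 1                 # two states require at least one change
    return max_steps - min_steps


def mean_aps(matrix):
    """Mean APS over the parsimony-informative columns of a taxa-by-characters matrix."""
    values = [aps(col) for col in zip(*matrix) if is_parsimony_informative(col)]
    return sum(values) / len(values) if values else 0.0


# Toy matrix (invented): six genomes (rows) by three families (columns).
TOY = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]

if __name__ == "__main__":
    print(mean_aps(TOY))
```

A family present in half of the genomes has the largest APS, whereas a family present in only two genomes has an APS of 1, which is why finely subdivided families tend to lower the mean.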
In order to examine the "tree-like" nature of the signal in each data set, we calculated NeighborNet splits graphs using SplitsTree 4.0 (Bryant and Moulton, 2004; Huson, 1998) from a matrix of p-distances (proportion of difference) among genomes, derived from the matrix of 1s and 0s. This approach allowed a heuristic visualization of the extent of conflicting signals in the data, as homology criteria were changed.
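The p-distance used as input to NeighborNet is the proportion of characters at which two genomes differ. A minimal sketch follows, with hypothetical function names and toy data; the splits-graph construction itself is left to SplitsTree.

```python
def p_distance(row_a, row_b):
    """Proportion of characters at which two presence/absence rows differ."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same number of characters")
    diffs = sum(1 for x, y in zip(row_a, row_b) if x != y)
    return diffs / len(row_a)


def p_distance_matrix(matrix):
    """Symmetric matrix of pairwise p-distances among genomes (rows)."""
    n = len(matrix)
    return [[p_distance(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


# Toy data (invented): three genomes scored for four families.
ROWS = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]

if __name__ == "__main__":
    for line in p_distance_matrix(ROWS):
        print(line)
```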
| RESULTS |
|---|
The different search criteria led to differences in definition and membership of families (Table 1). As the strictness of the criterion increased, the mean number of genes per genome assigned to families decreased (Table 1). This evidently occurred because increasingly strict homology criteria led to an increase in the number of "singletons," i.e., single genes not assigned to membership in any family. The mean number of families per genome was lowest with the least strict criterion (10/30), then increased as the criterion became stricter, reaching a maximum at 40/60, and then declined as the criterion became still stricter (Table 1). The mean number of genes per family decreased as a function of increasing strictness of the homology criterion (Table 1).
Under most criteria, the mean number of genes per family in a genome was correlated with genome size (in bp). This correlation was strongest with the 30/50 criterion (Table 1), in which case a close linear relationship was observed (Fig. 1A). However, under the strictest criterion (60/80), there was not a significant relationship between the mean number of genes per family and genome size (Table 1 and Fig. 1B). This evidently occurred because, under the strictest criterion, families were broken up to the point that relatively few families had more than a single member in any given genome.
Table 2 summarizes results of phylogenetic analyses conducted using the data sets assembled under the different homology criteria. The number of informative characters (i.e., families) available for analyses increased as the strictness of the criterion increased (Table 2). The consistency index (CI) decreased, reaching a minimum at 30/50, then increased sharply as the criteria increased in strictness (Table 2). This pattern evidently occurred because the proportion of hypothesized changes involving loss of a family (character changes from 1 to 0) was highest with the 30/50 criterion. Under the 30/50 criterion, large families were broken up but not excessively so. Thus there were fewer gains of families (character change from 0 to 1) relative to losses under this criterion, and families including both gains and losses contributed to the reduction in CI. With more liberal criteria, fewer distinct families were identified; thus, both gains and losses were reduced. In contrast, with stricter criteria, an increasingly large number of families were identified, leading to very few losses of families and a large number of gains (Table 2).
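To make the consistency index concrete, the sketch below counts the minimum number of changes for each binary character on a fixed rooted tree with Fitch's algorithm and then forms the ensemble CI as the summed minimum possible steps divided by the summed observed steps. The nested-tuple tree encoding, the function names, and the five-taxon example are assumptions for illustration; PAUP* may treat uninformative characters differently, and this sketch does not attempt to separate gains from losses.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree:   nested 2-tuples whose leaves are taxon names (str).
    states: dict mapping taxon name -> 0 or 1.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        (left_set, left_cost), (right_set, right_cost) = visit(node[0]), visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        # Disjoint state sets: one extra change is required at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def consistency_index(tree, taxa, matrix):
    """Ensemble CI over variable characters: sum(min steps) / sum(observed steps)."""
    min_total = 0
    observed_total = 0
    for column in zip(*matrix):
        n_states = len(set(column))
        if n_states < 2:
            continue  # constant characters carry no steps
        min_total += n_states - 1
        observed_total += fitch_steps(tree, dict(zip(taxa, column)))
    return min_total / observed_total


# Toy example (invented): five taxa, two families.
TREE = ((("A", "B"), "C"), ("D", "E"))
TAXA = ["A", "B", "C", "D", "E"]
CHARS = [
    [1, 1],  # A
    [1, 0],  # B
    [0, 0],  # C
    [0, 1],  # D
    [0, 0],  # E
]

if __name__ == "__main__":
    print(consistency_index(TREE, TAXA, CHARS))
```

The first family fits the tree with a single change, while the second requires two changes (homoplasy), which pulls the CI below 1.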
Regarding bootstrap support for branches within the trees, the number of significant branches (95% support or better) did not change in a consistent way as a function of the strictness of the homology criterion (Table 2). Both the number of significant branches and the number of significant internal branches (i.e., those deeper than the branch leading to a terminal pair) were highest with the 40/60 criterion.
The mean APS (per informative character) differed significantly among the six criteria (one-way analysis of variance [ANOVA]; F = 844.24; df = 5, 161,903; P < 0.001) (Table 2). Mean APS increased slightly with increasing strictness of the homology criterion from 10/30 to 30/50, then decreased as the criterion became increasingly strict (Table 2). As a result, the mean APS for 60/80 was less than half that for 30/50 (Table 2). These results imply that, using a criterion of intermediate strictness, there was maximal potential information for resolving deep internal branches, whereas with an extremely strict criterion a greater proportion of information was available for resolving terminal branches.
Figure 2 illustrates the single MP tree based on the moderate 30/50 criterion. As in all MP trees found under all search criteria, Archaea clustered apart from Bacteria (Fig. 2). In addition, as in all MP trees found under all criteria, closely related species (such as congeners) clustered together, usually with strong bootstrap support (Fig. 2).
In the Bacteria, certain members of recognized higher level taxonomic groups clustered together, although monophyly of previously recognized higher level groupings was generally not supported. For example, the order Bacillales (including Bacillus and related genera) formed a well-supported monophyletic group (Fig. 2). However, the phylum Firmicutes, in which Bacillales is included, did not form a monophyletic cluster. Mycoplasma and Ureaplasma, traditionally included in Firmicutes, clustered apart from the cluster including most Firmicutes. In addition, the cluster including most Firmicutes also included Fusobacterium (Fig. 2), which is assigned to a separate phylum (Fusobacteria).
Similarly, there was a well-supported cluster that included many genera assigned to the phylum Proteobacteria, such as Escherichia, Agrobacterium, and Ralstonia (Fig. 2). However, the groupings within this cluster did not correspond to the currently accepted classes Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria (Fig. 2). In addition, Rickettsia and Buchnera, traditionally assigned to Proteobacteria, fell outside this cluster (Fig. 2). In the six MP trees based on the strict 60/80 criterion, Rickettsia and Buchnera clustered strongly with Proteobacteria (Fig. 3). On the other hand, Firmicutes were not recovered as a monophyletic group, because Mycoplasma and Ureaplasma fell outside the cluster with other genera traditionally assigned to Firmicutes (Fig. 3).
Figure 4 shows the strict consensus of all MP trees found with the different criteria used. In this consensus tree, most deep-branching patterns were unresolved (Fig. 4). Only 46 branches received significant bootstrap support—a much lower figure than in any of the individual trees constructed on the basis of individual homology criteria (Table 2). Of these, only 19 (41%) represented deep branches (i.e., not branches subtending terminal pairs), again a much lower figure than in any individual tree (Table 2). The fact that certain deep branches were not resolved in the consensus tree but received significant bootstrap support in individual trees implies that the trees constructed on the basis of the different homology criteria frequently resolved the higher-level relationships of prokaryotes in mutually contradictory ways.
Equally illustrative of the conflicts among trees were the high mean topological distances (dT) among the MP trees found under each criterion (Table 2). The 20/40 and 30/50 criteria were closest on average to the other criteria, while 60/80 was farthest from the other criteria (Table 2). The large average dT values to 60/80 reflected in part the placement of both Rickettsia and Buchnera with other Proteobacteria under the latter criterion, which was not observed under any other criterion (Fig. 2, Fig. 3 and data not shown).
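The text does not state how dT was computed. As one plausible reading, the sketch below treats it as a symmetric-difference measure, that is, the number of clades found in one rooted tree but not the other, and also shows how a strict consensus keeps only the clades shared by every tree. The nested-tuple tree encoding, the function names, and the two toy trees are assumptions for illustration, not the authors' procedure.

```python
def clades(tree):
    """Set of non-trivial clades (frozensets of leaf names) in a rooted tree.

    tree: nested tuples whose leaves are taxon names (str).
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = visit(tree)
    found.discard(all_leaves)  # the root clade is shared by all trees on the same taxa
    return found


def symmetric_difference_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


def strict_consensus_clades(trees):
    """Clades present in every tree; all others collapse in a strict consensus."""
    return set.intersection(*(clades(t) for t in trees))


# Toy trees (invented) that disagree only on the placement of B and C.
T1 = ((("A", "B"), "C"), ("D", "E"))
T2 = ((("A", "C"), "B"), ("D", "E"))

if __name__ == "__main__":
    print(symmetric_difference_distance(T1, T2))
    print(sorted(sorted(c) for c in strict_consensus_clades([T1, T2])))
```

Under this reading, two trees that each resolve a deep branch with strong support but in conflicting ways contribute to a large distance and leave that branch unresolved in the consensus.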
NeighborNet analyses produced splits graphs that corroborated the findings from the APS analyses. These show that, as the homology criterion became stricter, support decreased for the internal branches that separate major clusters. Most noticeable was the loss of phylogenetic signal separating Archaea and Bacteria; for example, compare the graph for 30/50 with that for 60/80. (For splits graphs for all criteria, see Fig. A1, available online at the Society of Systematic Biologists web site, http://systematicbiology.org.) Interestingly, at stricter levels used to infer homology, there appeared to be a higher level of bifurcation amongst terminal taxa (Fig. 4). These findings support other observations that we report and indicate that no one criterion well represents all relevant phylogenetic information.
| DISCUSSION |
|---|
The results presented here demonstrate that, at least in the case of prokaryotic genomes, phylogenetic analyses based on gene family content are highly sensitive to the homology criteria used to define families. The true phylogeny of these organisms is so far poorly resolved. Thus, it is not in general possible to say which of the homology criteria used produced a tree closer to the true tree. However, the fact that the trees obtained with different homology criteria were mutually contradictory did not increase confidence in the applicability of gene content analyses to the resolution of prokaryotic phylogenies. Although parsimony was used for phylogenetic reconstruction in the present analyses, there is no reason to believe that the problems revealed here are unique to parsimony. Because all methods of analysis that have been applied to gene family content take family assignment of genes as a given, at least some of the same problems are likely to arise with distance or likelihood methods as well.
The absence of Rickettsia and Buchnera from the cluster with other Proteobacteria in the phylogeny based on the moderate 30/50 criterion (Fig. 2) suggested that parallel loss of gene families is the likely explanation for some of the observed problems. Both Rickettsia and Buchnera have reduced genome sizes due to massive loss of gene families as an adaptation to life as obligate intracellular parasites (Andersson et al., 1998; van Ham et al., 2003). The extensive loss of gene families apparently caused these taxa to cluster nearer to other genera that have lost numerous gene families in adaptation to intracellular life, such as Mycoplasma (Himmelreich et al., 1996). Parallel gene family loss in adaptation to similar lifestyles appears to have created a sufficient degree of homoplasy that the true relationships of these organisms cannot be recovered by the method used. Previous studies have noted the problems that large-scale loss of gene families can pose for analyses based on gene family content (House and Fitz-Gibbon, 2002; Dutilh et al., 2004; Lake and Rivera, 2004). Dutilh et al. (2004) have developed a method of reducing phylogenetically discordant signals in gene family content data that appears to ameliorate the problem.
On the other hand, in our analysis based on the 60/80 homology criterion, Rickettsia and Buchnera clustered among the Proteobacteria, although Buchnera did not cluster with Gammaproteobacteria, as expected from traditional classification (Fig. 3). The strict 60/80 criterion evidently had the effect of breaking up gene families so that only proteins showing a close phylogenetic relationship were grouped in a common family (Table 2). Because of the problems of extensive parallel gene loss, these extremely subdivided families may better reconstruct relatively close evolutionary relationships than do less subdivided families, at least in the case of prokaryotes.
The greatly reduced amount of possible synapomorphy (APS) per character in the case of the 60/80 criterion in comparison to more liberal criteria (Table 2) suggests that a stricter criterion provides more information suitable for resolving close relationships than do more liberal criteria. Conversely, moderate criteria (such as 30/50) showed the highest mean APS per character (Table 2) and thus the most potential information for resolving deep branches. However, the higher APS for moderate criteria did not in practice lead to a strikingly better resolution of deep branches (compare Fig. 2 and Fig. 3). Even at this level of homology criteria NeighborNet analysis showed many contradictory internal splits. This may at least in part reflect ancient horizontal gene transfers (HGT) among major lineages. Using a stricter criterion eliminates some contradictory splits; however, accompanying this is the loss of information as the stricter criterion breaks up ancient gene families whose phylogenetic relationships may document HGT events.
Families assembled with a moderate criterion may provide a better representation of what is usually meant by a multigene family than do the highly subdivided families assembled by a very strict criterion. In completely sequenced eukaryotic genomes, there is a correlation between genome size and the number of genes per family (Friedman and Hughes, 2001). We found that in prokaryotic genomes also, except when the strictest criterion was used, the number of genes per family was positively correlated with genome size (Table 1 and Fig. 1A). This suggests that less strict criteria better capture the concept of a gene family as a product of within-genome gene duplications (and, in the case of prokaryotes, occasional between-genome horizontal transfers). This correlation was strongest with the 30/50 criterion, suggesting that a criterion of intermediate strictness may be optimal when the goal is to assemble gene families for purposes of reconstructing the pattern of gene duplication within a genome. On the other hand, a very liberal criterion (such as 10/30) may approximate the results of an analysis based on families of domains or protein folds (Lin and Gerstein, 2000), since a very liberal criterion is likely to group proteins that share even one domain.
With all homology criteria used, the hypothesized gains of families substantially exceeded the hypothesized losses (Table 2). Hypothesized gains of families include both the first appearance of the gene in the phylogeny and its appearance in a new part of the phylogeny as a result of an HGT event. Furthermore, as stricter homology criteria are used, an increasing number of hypothesized gains of gene families are artifacts of the break-up of large families. When subfamilies of a large family are characterized as separate families, each such family is hypothesized to make a separate first appearance in the phylogeny. Thus, although a very strict homology criterion might be preferable for reconstructing some relationships of prokaryotic phylogeny, it would be very misleading if it were used to reconstruct the true pattern of HGT within a phylogeny.
APPENDIX 1
Genome sequences and accession numbers used in analyses:
- Halobacterium sp. NC_002607
- Thermoplasma acidophilum NC_002578
- Thermoplasma volcanium NC_002689
- Aeropyrum pernix NC_000854
- Pyrobaculum aerophilum NC_003364
- Sulfolobus solfataricus NC_002754
- Sulfolobus tokodaii NC_003106
- Pyrococcus furiosus NC_003413
- Pyrococcus abyssi NC_000868
- Pyrococcus horikoshii NC_000961
- Archaeoglobus fulgidus NC_000917
- Methanosarcina acetivorans NC_003552
- Methanosarcina mazei NC_003901
- Methanococcus jannaschii NC_000909
- Methanobacterium thermoautotrophicum NC_000916
- Methanopyrus kandleri NC_003551
- Tropheryma whipplei NC_004551
- Buchnera aphidicola Bp NC_004545
- Buchnera aphidicola Sg NC_004061
- Buchnera sp. APS NC_002528
- Chlamydia trachomatis NC_000117
- Chlamydia pneumoniae NC_002620
- Chlamydophila pneumoniae CWL029 NC_000922
- Chlamydophila pneumoniae J138 NC_002491
- Borrelia burgdorferi NC_001318
- Treponema pallidum NC_000919
- Mycoplasma pulmonis NC_002771
- Mycoplasma genitalium NC_000908
- Mycoplasma pneumoniae NC_000912
- Mycoplasma penetrans NC_004432
- Ureaplasma urealyticum NC_002162
- Rickettsia conorii NC_003103
- Rickettsia prowazekii NC_000963
- Campylobacter jejuni NC_002163
- Helicobacter pylori 26695 NC_000915
- Helicobacter pylori J99 NC_000921
- Aquifex aeolicus NC_000918
- Chlorobium tepidum NC_002932
- Thermosynechococcus elongatus NC_004113
- Nostoc sp. NC_003272
- Synechocystis sp. BA000022
- Nitrosomonas europaea NC_004757
- Xylella fastidiosa NC_002488
- Xanthomonas campestris NC_003902
- Xanthomonas axonopodis NC_003919
- Pseudomonas aeruginosa NC_002516
- Ralstonia solanacearum NC_003295
- Caulobacter crescentus NC_002696
- Brucella melitensis NC_003317
- Brucella suis NC_004310
- Bradyrhizobium japonicum NC_004463
- Mesorhizobium loti NC_002678
- Sinorhizobium meliloti NC_003047
- Agrobacterium tumefaciens C58 NC_003062
- Agrobacterium tumefaciens C58 UW NC_003304
- Neisseria meningitidis MC58 NC_003112
- Neisseria meningitidis Z2491 NC_003116
- Haemophilus influenzae NC_000907
- Pasteurella multocida NC_002663
- Shewanella oneidensis NC_004347
- Vibrio cholerae NC_002505
- Vibrio parahaemolyticus NC_004603
- Yersinia pestis CO92 NC_003143
- Yersinia pestis KIM NC_004088
- Salmonella enterica NC_003198
- Salmonella typhimurium NC_003197
- Escherichia coli K12 NC_000913
- Escherichia coli O157H7 NC_002695
- Escherichia coli O157H7 EDL933 NC_002655
- Deinococcus radiodurans NC_001263
- Streptomyces avermitilis NC_003155
- Streptomyces coelicolor NC_003888
- Corynebacterium efficiens NC_004369
- Mycobacterium leprae NC_002677
- Mycobacterium tuberculosis CDC1551 NC_002755
- Mycobacterium tuberculosis H37Rv NC_000962
- Thermotoga maritima NC_000853
- Thermoanaerobacter tengcongensis NC_003869
- Clostridium acetobutylicum NC_003030
- Clostridium perfringens NC_003366
- Fusobacterium nucleatum NC_003454
- Staphylococcus aureus MW2 NC_003923
- Staphylococcus aureus Mu50 NC_002758
- Staphylococcus aureus N315 NC_002745
- Listeria innocua NC_003212
- Listeria monocytogenes NC_003210
- Oceanobacillus iheyensis NC_004193
- Bacillus halodurans NC_002570
- Bacillus subtilis NC_000964
- Lactobacillus plantarum NC_004567
- Lactococcus lactis NC_002662
- Streptococcus pneumoniae R6 NC_003098
- Streptococcus pneumoniae NC_003028
- Streptococcus agalactiae 2603VR NC_004116
- Streptococcus agalactiae NEM316 NC_004368
- Streptococcus pyogenes NC_002737
- Streptococcus pyogenes MGAS8232 NC_003485
- Streptococcus pyogenes MGAS315 NC_004070
- Streptococcus pyogenes SSI1 NC_004606
| ACKNOWLEDGMENTS |
|---|
This research was supported by grant GM066710 to A.L.H. from the National Institutes of Health.
| REFERENCES |
|---|
Altschul S. F., Madden T. L., Schäffer A. A., Zhang J., Zhang Z., Miller W., Lipman D. J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. (1997) 25:3389–3402.
Andersson S. G., Zomorodipour A., Andersson J. O., Sicheritz-Ponten T., Alsmark U. C., Podowski R. M., Naslund A. K., Eriksson A. S., Winkler H. H., Kurland C. G. The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature (1998) 396:133–140.
Bryant D., Moulton V. NeighborNet: An agglomerative method for the construction of planar phylogenetic networks. Mol. Biol. Evol. (2004) 21:255–265.
Daubin V., Moran N. A., Ochman H. Phylogenetics and the cohesion of bacterial genomes. Science (2003) 301:829–832.
Dutilh B. E., Huynen M. A., Bruno W. J., Snel B. The consistent phylogenetic signal in genome trees revealed by reducing the impact of noise. J. Mol. Evol. (2004) 58:527–538.
Farris J. S. The retention index and the rescaled consistency index. Cladistics (1989) 5:417–419.
Felsenstein J. Confidence limits on phylogenies: An approach using the bootstrap. Evolution (1985) 39:783–791.
Friedman R., Hughes A. L. Pattern and timing of gene duplication in animal genomes. Genome Res. (2001) 11:1842–1847.
Gu X. A simple evolutionary model for genome phylogeny based on gene content. In: Sankoff D., Nadeau J. H., eds. Comparative genomics. (2000) Dordrecht: Kluwer Academic. Pages 515–523.
Himmelreich R., Hilbert H., Plagens H., Pirkl E., Li B. C., Herrmann R. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. (1996) 24:4420–4449.
House C. H., Fitz-Gibbon S. T. Using homolog groups to create a whole-genomic tree of free-living organisms: An update. J. Mol. Evol. (2002) 54:539–547.
Hughes A. L., Friedman R. Differential loss of ancestral gene families as a source of genomic divergence in animals. Proc. R. Soc. Lond. B Suppl. (2004) 271:S107–S109.
Huson D. SplitsTree: Analyzing and visualizing evolutionary data. Bioinformatics (1998) 14:68–73.
Huson D. H., Steel M. Phylogenetic trees based on gene content. Bioinformatics (2004) 20:2044–2049.
Kunin V., Ouzounis C. A. The balance of driving forces during genome evolution in prokaryotes. Genome Res. (2003) 13:1589–1594.
Lake J. A., Rivera M. C. Deriving the genomic tree of life in the presence of horizontal gene transfer: Conditioned reconstruction. Mol. Biol. Evol. (2004) 21:681–690.
Lerat E., Daubin V., Moran N. A. From gene trees to organismal phylogeny in prokaryotes: The case of the γ-Proteobacteria. PLoS Biol. (2003) 1:E19.
Lin J., Gerstein M. Whole-genome trees based on the occurrence of folds and orthologs: Implications for comparing genomes on different levels. Genome Res. (2000) 10:808–818.
Mirkin B. G., Fenner T. I., Galperin M. Y., Koonin E. V. Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol. Biol. (2003) 3:2.
Simmons M. P., Carr T. G., O'Neill K. Relative character-state space, amount of potential phylogenetic information, and heterogeneity of nucleotide and amino acid characters. Mol. Phyl. Evol. (2004) 32:913–926.
Snel B., Bork P., Huynen M. A. Genome phylogeny based on gene content. Nat. Genet. (1999) 21:108–110.
Swofford D. L. PAUP*: Phylogenetic analysis using parsimony (*and other methods). (2002) Sunderland, Massachusetts: Sinauer.
Van Ham R. C. J., Kamerbeek J., Palacios C., Rausell C., Abascal F., Bastolla U., Fernández J. M., Jiménez L., Postigo M., Silva F. J., Tamames J., Viguera E., Latorre A., Valencia A., Morán F., Moya A. Reductive genome evolution in Buchnera aphidicola. Proc. Natl. Acad. Sci. USA (2003) 100:581–586.
Wolf Y. I., Rogozin I. B., Grishin N. V., Koonin E. V. Genome trees and the tree of life. Trends Genet. (2002) 18:472–479.
Wolf Y. I., Rogozin I. B., Grishin N. V., Tatusov R. L., Koonin E. V. Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol. Biol. (2001) 1:8.
Wolf Y. I., Rogozin I. B., Koonin E. V. Coelomata and not Ecdysozoa: Evidence from genome-wide phylogenetic analysis. Genome Res. (2004) 14:29–36.