Skip Navigation

Systematic Biology 2004 53(3):448-469; doi:10.1080/10635150490445797
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (65)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Castoe, T. A.
Right arrow Articles by Parkinson, C. L.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Castoe, T. A.
Right arrow Articles by Parkinson, C. L.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 2004 Society of Systematic Biologists

Data Partitions and Complex Models in Bayesian Analysis: The Phylogeny of Gymnophthalmid Lizards

Edited by Jack Sites: Associate Editor

Todd A. Castoe1, Tiffany M. Doan2 and Christopher L. Parkinson1

1 Department of Biology, University of Central Florida 4000 Central Florida Boulevard, Orlando, FL 32816–2368, USA; E-mail: tcastoe{at}mail.ucf.edu (T.A.C.), cparkins{at}pegasus.cc.ucf.edu (C.L.P.)
2 Biology Department Vassar College Box 555, 124 Raymond Avenue, Poughkeepsie, NY 12604–0555, USA; E-mail: tiffperu{at}yahoo.com


    Abstract
 Top
 Abstract
 Methods
 Results
 Discussion
 Appendix 1
 Appendix 2
 Acknowledgments
 References
 
Phylogenetic studies incorporating multiple loci, and multiple genomes, are becoming increasingly common. Coincident with this trend in genetic sampling, model-based likelihood techniques including Bayesian phylogenetic methods continue to gain popularity. Few studies, however, have examined model fit and sensitivity to such potentially heterogeneous data partitions within combined data analyses using empirical data. Here we investigate the relative model fit and sensitivity of Bayesian phylogenetic methods when alternative site-specific partitions of among-site rate variation (with and without autocorrelated rates) are considered. Our primary goal in choosing a best-fit model was to employ the simplest model that was a good fit to the data while optimizing topology and/or Bayesian posterior probabilities. Thus, we were not interested in complex models that did not practically affect our interpretation of the topology under study. We applied these alternative models to a four-gene data set including one protein-coding nuclear gene (c-mos), one protein-coding mitochondrial gene (ND4), and two mitochondrial rRNA genes (12S and 16S) for the diverse yet poorly known lizard family Gymnophthalmidae. Our results suggest that the best-fit model partitioned among-site rate variation separately among the c-mos, ND4, and 12S + 16S gene regions. We found this model yielded identical topologies to those from analyses based on the GTR+I+G model, but significantly changed posterior probability estimates of clade support. This partitioned model also produced more precise (less variable) estimates of posterior probabilities across generations of long Bayesian runs, compared to runs employing a GTR+I+G model estimated for the combined data. We use this three-way gamma partitioning in Bayesian analyses to reconstruct a robust phylogenetic hypothesis for the relationships of genera within the lizard family Gymnophthalmidae. We then reevaluate the higher-level taxonomic arrangement of the Gymnophthalmidae. Based on our findings, we discuss the utility of nontraditional parameters for modeling among-site rate variation and the implications and future directions for complex model building and testing.

Keywords: Autocorrelated gamma; Bayesian analysis; combining data; Gymnophthalmidae; likelihood models; partitioning data; Reptilia; site-specific gamma

Received May 18, 2003; Revised March 15, 2003; Accepted December 14, 2003


Incorporating genetic data from multiple genes, often from multiple genomes, is becoming standard in molecular phylogenetics, as is the use of complex model-based likelihood techniques to estimate phylogenetic relationships based on these data. Despite numerous authors advocating the superiority of using multiple loci (especially from multiple genomes) to reconstruct phylogenies (e.g., Pamilo and Nei, 1988; Wu, 1991), few have addressed theoretical and practical effects of modeling sequence evolution simultaneously for different genes (but see examples: Yang, 1996a; Caterino et al., 2001; Pupko et al., 2002; Nylander et al., 2004). Using a single model with a single set of parameters to account for evolution over multiple loci in a combined analysis may fail to accurately portray locus-specific evolutionary patterns. For instance, protein-coding versus rRNA genes may evolve under drastically different constraints because protein-coding genes commonly experience particularly elevated rates of substitution at the third positions of codons. Ribosomal RNA genes, on the other hand, may experience relatively slow rates of compensatory change over regions corresponding to stem-forming secondary structures in the core of the molecule, yet generally more rapid rates in regions corresponding to functionally unconstrained loops and short-range stems (e.g., Dixon and Hillis, 1993; Simon et al., 1994; Muse, 1995; Hickson et al., 1996; Savill et al., 2001). Even among protein-coding genes or rRNA genes, different patterns of substitution rate heterogeneity may result from overall differential rates of evolution or differential functional constraints on particular regions within a gene (Hickson et al., 1996; Yang, 1996b; Moncalvo et al., 2000). Considering these potential variations in evolutionary rates across sites, it seems logical that models of molecular evolution that account for heterogeneity with regard to among-site rate variation should be employed to reconstruct phylogenetic trees.

The recent shift in phylogenetic methodology towards Bayesian inference of phylogeny has heightened the importance of the use of more realistic evolutionary models. This is important for topological accuracy (e.g., Huelsenbeck, 1995; Huelsenbeck, 1997; Sullivan and Swofford, 2001) as well as accurate estimation of posterior probabilities (e.g., Buckley, 2002; Suzuki et al., 2002; Erixon et al., 2003). In general, it has been shown that likelihood methods are fairly robust to model choice in their estimation of topology (Yang et al., 1994; Posada and Crandall, 2001; Sullivan and Swofford, 2001). A major strength of Bayesian analyses is that posterior probability distributions of trees allow direct interpretation of the likelihood of a particular relationship recovered being true, given the data, the model, and the priors (although the robustness of posterior probabilities has not been thoroughly investigated). However, because the accuracy of posterior probabilities in Bayesian phylogenetic methods relies inherently on the model, models that do not affect the consensus topology may have notable effects on the posterior probability distribution of parameters, and thus on confidence regarding phylogenetic conclusions. Therefore, employing complex models that more accurately portray DNA evolution should produce less biased posterior probability estimates as long as parameters can be accurately estimated from the data.

The accuracy of posterior probability estimates in Bayesian phylogenetic reconstruction and the factors that may affect this accuracy remain unclear. Many studies suggest Bayesian posterior probabilities appear to be inflated compared to conventional bootstrap support (e.g., Leaché and Reeder, 2002; Cummings et al., 2003; Douady et al., 2003). However, the accuracy of posterior probability support values in terms of both type I and type II error remains unresolved (e.g., Buckley, 2002; Suzuki et al., 2002; Wilcox et al., 2002; Alfaro et al., 2003). Despite this, evidence is accumulating that suggests a direct relationship between accuracy of posterior probabilities and model complexity whereby Bayesian analyses conducted with underparameterized models appear to experience higher error rates compared with parameter rich models (Suzuki et al., 2002; Wilcox et al., 2002; Erixon et al., 2003). However, benefits of constructing and employing more realistic evolutionary models of DNA substitution are challenged by the potential for imprecise and inaccurate parameter estimation (including topology) resulting from overparameterization. Given ever-increasing computational power, in addition to the speed afforded by Bayesian Markov Chain Monte Carlo phylogenetic methods, the need for accurate models and model testing is apparent.

Our primary goal in this paper is to concentrate on evaluation of alternative models that practically affect phylogenetic inference. Specifically, we designed our approaches to make final decisions about best-fitting models based on the effects they had on topology and posterior probability estimates. This is important because although some alternative, relatively parameter-rich models may provide a better fit to the data, they may not result in alternative topologies or significantly different posterior probability estimates. In such cases, our strategy would instead favor a model with fewer parameters that produced essentially the same topology and posterior probability support estimates.

In this study, the genes used to reconstruct phylogenies are diverse and include one protein-coding nuclear gene (c-mos), one protein-coding mitochondrial gene (ND4), and two rRNA mitochondrial gene fragments (12S and 16S). We focused on the construction and evaluation of models that utilize alternatively partitioned patterns of among-site rate variation to account for heterogeneous evolution of multiple loci in a combined phylogenetic analysis. Particularly, MrBayes v2.01 allows among-site rate variation (gamma; Yang, 1993) to be partitioned among defined sites (site-specific gamma) as well as allowing the use of an autocorrelated gamma parameter to account for local autocorrelation of among-site rates (Kimura, 1985; Schöniger and von Haeseler, 1994; Yang, 1995; Nielsen, 1997). These models allow gamma parameter for among-site rate variation to be rescaled across partitions while using a single rate nucleotide substitution rate matrix for the entire data set. Along with conventional models of sequence evolution (e.g., GTR+I+G), we explore more complex models that partition the among-site rate variation in various ways among loci, in addition to those that employ an additional parameter for autocorrelation of site rate variation. We examine the phylogenetic hypotheses resulting from several alternative partitions of among-site rate variation and discuss their relevance to Bayesian support for clades and support for alternative topological placements of clades.

The taxonomic group examined in this study, lizards of the family Gymnophthalmidae, comprises a large radiation consisting of approximately 34 genera and 180 species occurring throughout South America with relatively few species in Middle America (Pellegrino et al., 2001; Doan, 2003a). The family is composed of small to medium lizards that occur in a variety of habitats and occupy a wide range of niches. This lizard group has been poorly studied with many species unknown beyond their original descriptions.

Relationships of genera within the family Gymnophthalmidae are poorly understood. The most comprehensive and contemporary revision of the supergeneric classification of the family Gymnophthalmidae was made by Pellegrino et al. (2001). They reconstructed a phylogeny of 50 species in 24 genera (recently reduced from 26 by Doan, 2003a) using five genes (two nuclear and three mitochondrial). Based on their reconstruction they erected four subfamilies and four tribes. The subfamily Alopoglossinae, consists solely of Alopoglossus. The subfamily Gymnophthalminae contains 13 genera, divided into two tribes, the Heterodactylini (5 genera) and the Gymnophthalmini (8 genera). The subfamily Rhachisaurinae is monotypic, consisting of Rhachisaurus brachylepis, a new genus separated from Anotosaura. The final subfamily, the Cercosaurinae, consists of 20 genera divided into tribe Cercosaurini (14 genera) and tribe Ecpleopini (6 genera).

Harris (2003) used c-mos sequences to reconstruct a phylogeny of the Squamata with concentrated taxon sampling in the Gymnophthalmidae. He primarily used Pellegrino et al.'s (2001) sequences but added a new sequence of Proctoporus bolivianus. Harris's (2003) reconstruction differed from Pellegrino et al.'s in the placement of Ptychoglossus, Bachia, Arthrosaura, and several smaller scale relationships. In addition, a teiid genus, Tupinambis, was nested within the Gymnophthalmidae as the sister to Ptychoglossus and Alopoglossus (although this relationship received bootstrap and posterior probability support below 50%).

Missing from Pellegrino et al.'s (2001) study were 10 genera. Whereas Pellegrino et al. sampled all genera of Alopoglossinae, Rhachisaurinae, and Gymnophthalmini, they lacked Stenolepis from the Heterodactylini, Amapasaurus from the Ecpleopini, and eight genera from the Cercosaurini. The limited taxon sampling of the Cercosaurini renders conclusions about that tribe problematic (see Hillis, 1998) because only half of the genera and 18 of the approximately 121 species were sampled (14.9%).

In addition to limited taxon sampling, the separate gene partitions were often in conflict with regards to the positions and relationships of many key taxa (Pellegrino et al., 2001). Such conflicts put into doubt the subfamilial and/or tribal placement of genera such as Ptychoglossus, Rhachisaurus, Bachia, and Neusticurus. Our study addresses some of the problems suggested in the combined analysis of Pellegrino et al. (2001) and fills in some significant gaps in taxon sampling. Concentrating on the Cercosaurini, we add 19 new individuals, including 12 new species and one new genus, as well as an additional individual of Ptychoglossus brevifrontalis (a total of 73 additional sequences). With this greater taxon sampling we clarify the relationships and classification within family Gymnophthalmidae. In addition to adding more taxa, we utilize a Bayesian approach to the phylogeny to complement the standard parsimony and likelihood methods used by Pellegrino et al. (2001). We synthesize this information, emphasizing overall phylogenetic evidence, and propose an alternative hypothesis for the intergeneric relationships and taxonomy within the family.

The objectives of our study included (1) evaluating the effects of partitioning among-site rate variation and among-site rate autocorrelation parameters on phylogenetic topology and posterior probabilities for relationships, (2) developing a robust strategy for choosing the best-fit model for among-site rate variation considering practical effects on topology and posterior probability support, (3) identifying the most likely and robust phylogenetic hypothesis for relationships among gymnophthalmid lizards, and (4) reevaluating the supergeneric classification of the family based on our best estimate of gymnophthalmid phylogeny.


    Methods
 Top
 Abstract
 Methods
 Results
 Discussion
 Appendix 1
 Appendix 2
 Acknowledgments
 References
 
DNA Sequences Used
A significant subset of the sequences used in this study is from Pellegrino et al. (2001) and Doan and Castoe (2003). Additional sequences of gymnophthalmid lizards were added to this dataset. Of the five genes used by Pellegrino et al. (2001), we chose to use and expand upon four: mitochondrial NADH dehydrogenase subunit 4 (ND4), mitochondrial small subunit rRNA gene (12S), mitochondrial large subunit rRNA gene (16S), and the nuclear oocyte maturation factor gene (c-mos). The nuclear small subunit rRNA gene (18S) used by Pellegrino et al. (2001) was omitted from our study for two reasons: (1) low phylogenetic signal apparent from the Pellegrino et al. (2001) study, and (2) the nuclear gene for 18S occurs in hundreds or thousands of copies per nuclear genome (e.g., humans: International Human Genome Sequencing Consortium, 2001; Xenopus: Pardue, 1974; Long and Dawid, 1980). Using sequences of this multicopy gene to resolve relationships principally among species and genera within a family may increase the potential for recovery of misleading phylogenetic estimates based on incomplete gene conversion among alleles at different loci or differential fixation of alleles among loci (Gasser et al., 1998; Gonzalez and Sylvester, 2001).

Laboratory methods for obtaining novel sequences used in this study are as follows. Where possible, two individuals of each taxon from distant sampling localities were added to the data set. Genomic DNA was isolated from tissue samples (liver or skin preserved in ethanol) using the Qiagen DNeasy extraction kit and protocol (Qiagen Inc., Hilden, Germany). The mitochondrial ND4 gene was amplified via polymerase chain reaction (PCR) using the primers ND4 and LEU as described in Arévalo et al. (1994). Mitochondrial ribosomal small and large subunit genes (12S and 16S) were amplified as described in Parkinson et al. (1997, 1999). The nuclear c-mos gene was amplified with primers G73 and G74 as described in Saint et al. (1998) and Pellegrino et al. (2001). Positive PCR products were excised from agarose electrophoretic gels and purified using the GeneCleanIII kit (BIO101). Purified PCR products were sequenced in both directions with the amplification primers (and for ND4, an additional internal primer HIS, Arévalo et al., 1994). Samples that could not be sufficiently sequenced directly were cloned using the Topo TA cloning kit (Invitrogen) according to the manufacturer's protocols. Plasmids were isolated from multiple clones per individual using the Qiaquick spin miniprep kit (Qiagen). Plasmids were sequenced using M13 primers (provided by Topo TA kit, Invitrogen) and, in some cases, the internal HIS primer for ND4. Purified PCR products and plasmids were sequenced using the CEQ D Dye Terminator Cycle Sequencing (DTCS) Quick Start Kit (Beckman-Coulter) and run on a Beckman CEQ2000 automated sequencer according to the manufacturers' protocols. Raw sequence chromatographs for sequences generated in this study were edited using Sequencher 3.1 (Gene Codes Corp.). In cases where gene fragments were cloned, all sequences from a single individual were edited together. Novel sequences were deposited in GenBank. The GenBank accession number for each gene sequence used in this study (including novel sequences) are given in Appendix 1.

Sequence Homology and Alignment
Multiple sequence alignment was performed using ClustalW (Thompson et al., 1994). Initial alignments were conducted with a gap opening penalty of 10, a gap extension penalty of 1, and a transition weight of 0.5. For rRNA genes (12S and 16S), alternative multiple alignments were examined with gap opening and gap extension penalties ranging from 10 and 10 (respectively) to 1 and 1, including varying ratios in this range. Initial alignments for protein-coding genes (ND4 and c-mos) were rechecked based on the homology of their translated amino acid sequence using GeneDoc (Nicholas and Nicholas, 1997). The ND4 alignment was unambiguous and not edited manually and the c-mos alignment was slightly manually modified to maximize amino acid similarity over a short indel region within the alignment. Alternative automated alignments (from ClustalW) for rRNA genes (12S and 16S) were compared, along with estimates of secondary structures (Gutell, 1994; Gutell et al., 1994; Titus and Frost, 1996), to evaluate evidence for positional homology. Positions in rRNA genes where alignment was ambiguous were excluded. To minimize the effects of missing data resulting from incomplete sequences, all gene alignments were truncated at the 5' and 3' ends. The final alignment of all concatenated genes, including positions excluded from phylogenetic analyses, is available as supplemental data at http://biology.ucf.edu/~clp/Lab/Lab.htm.

Phylogeny Estimation Using Maximum Parsimony
We inferred phylogenies based on the maximum parsimony (MP) criterion in PAUP* v4.0b10 (Swofford, 2002) and Bayesian (Markov Chain Monte Carlo, MCMC) analysis in MrBayes v2.01 (Huelsenbeck and Ronquist, 2001). Phylogenetic inference was conducted, hierarchically, in three steps: (1) all genes individually, (2) intermediate partitions including both rRNA genes (12S+16S) and all mitochondrial genes (mtgenes), and (3) the combined concatenated data set including all four genes. For MP analyses of independent genes and intermediate partitions, we conducted equally weighted parsimony searches using the heuristic strategy with 200 random taxon addition sequence replicates. Settings for MP analyses were tree bisection-reconnection branch swapping, steepest descent off, and MULTREES option on (Swofford, 2002). For all individual genes and intermediate partitions (rRNA and mtgenes), we assessed support for clades using 200 nonparametric bootstrap pseudoreplicates (Felsenstein, 1985) with 20 random taxon addition sequence replicates implemented with PAUP*. For the combined MP analysis of all genes, we searched for trees using equally weighted parsimony heuristic searches with 1000 random taxon addition sequence replicates and assessed clade support with 200 bootstrap pseudoreplicates with 200 random addition sequence replicates per bootstrap pseudoreplicate. We consider relationships that are supported by at least 70% bootstrap to be significantly resolved (Hillis and Bull, 1993).

Bayesian Phylogeny Estimation
ModelTest version 3.0 (Posada and Crandall, 1998) was used to infer the best-fit model of evolution for each gene data set (individual genes, intermediate partitions, and total combined data) based on hierarchical log-likelihood ratio tests comparing successively complex models (Huelsenbeck and Crandall, 1997; Posada and Crandall, 2001).

All MCMC phylogenetic reconstructions were conducted in MrBayes v2.01 (Huelsenbeck and Ronquist, 2001) with vague priors (as per the program's defaults) and model parameters estimated as part of the analyses. Three heated chains and a single cold chain were used in all MCMC analyses and runs were initiated with random trees, as per the program's defaults. Trees were sampled every 100 generations and majority rule consensus phylograms and posterior probabilities for nodes were assembled from all post burn-in sampled trees. Phylogenetic reconstructions for all data partitions were estimated using three independent runs to confirm that stationarity (or global optimality) was reached and that independent runs converged on similar stationary parameter estimates. Each of these data partition runs was conducted with a total of 1.4 million generations, 400,000 of which were discarded as burn-in, yielding 1 million post burn-in generations.

Each MCMC run for all individual gene and intermediate data partitions employed the model selected by ModelTest for that partition, or the nearest model to that model that could be implemented in MrBayes. The total combined data set was subjected to MCMC analyses under multiple alternative evolutionary models which differed in the way they parameterized among-site rate variation.

The most complex (parameter rich) model that ModelTest v3.0 can evaluate is a general time reversible (GTR; Tavaré, 1986) model with an estimated proportion of invariant sites (I) and gamma distributed among-site rate variation (G; Yang, 1993). MrBayes v2.01 is capable of employing more complex models than this GTR+I+G model. MrBayes allows among-site rate variation to be partitioned among user defined sites (site-specific gamma; SSG) as well as allowing the use of an autocorrelated gamma (A; e.g., Yang, 1995; Penny et al., 2001; Huelsenbeck, 2002) to account for autocorrelation of among-site rates. These two modifications of among-site rate variation may be used independently as well as simultaneously in a given model in MrBayes. In addition to the GTR+I+G model, we conducted combined data MCMC analyses with alternative models that partitioned gamma with (SAG) and without (SSG) accounting for autocorrelation (A) of rates. These two alternative ways to estimate among-site rate variation were invoked by the commands "rates = ssadgamma" and "rates = ssgamma" (respectively) in the MrBayes 2.01 command block. All models applied a single common GTR substitution rate matrix across all data and differed only in the way they modeled and partitioned among-site rates according to the following a priori partitions: GTR + autocorrelated gamma (GTR+AG); protein-coding genes versus rRNA genes (PR-SSG and PR-SAG); nuclear versus mitochondrial genes (NM-SSG and NM-SAG); c-mos versus ND4 versus rRNA genes (CNR-SSG and CNR-SAG); all genes partitioned independently (4gene-SSG and 4gene-SAG). Table 1 provides a summary of the parametric content of each of these models.


View this table:
[in this window]
[in a new window]

 
Table 1 Parametric composition of models tested in Bayesian MCMC analyses of the combined data.

 
Choosing among these models to identify the best model of evolution on which to base phylogenetic and taxonomic decisions was approached in several ways. Our goal was to find the model of evolution which best fit the data yet contained the fewest total parameters (the best-fit model). Specifically, our major criteria for identification of the simplest best-fit model included the demonstration of clear improvements of likelihood estimates under that model, along with a practical effect on topology and/or posterior probability support for clades. Therefore, we were not interested in more complex models that did not estimate a different topology or have significant effects on posterior probability estimates. Once a tentative model was chosen, this model was rigorously tested for overparameterization and unreliability (which would suggest it was not a candidate for the best-fit model).

We examined the burn-in plots of likelihoods for MCMC chains for each model to determine the rate of ascent to an apparent stationary plateau, in addition to the degree of overlap between models and superiority (based on chain likelihood values) of models, relative to the number of parameters they employed. To examine the relative improvement in likelihood scores with respect to model complexity, we compared the 95% credibility interval (CI) of MCMC chain likelihood scores between models. To calculate the 95% CI, we ranked all post burn-in tree estimates by ln likelihood and included the most likely 95% (Felsenstein, 1968; Huelsenbeck et al., 2002). Once a tentative best-fit was identified, we evaluated parameter burn-in plots of these models for evidence of identifiability of parameters by checking for commonality in parameter estimates among runs. We also examined the sensitivity of posterior probability values to model complexity using Wilcoxon signed rank tests implemented with Statistica (StatSoft, 1993) to test for significant changes in posterior probability estimates between the chosen model and those which were proximal alternative best-fit models. For interpretation of phylogenetic inferences, we consider posterior probability values over 95% to be well resolved.

In addition to the three independent MCMC runs (1.4 million generations each) conducted for each model, we conducted a single MCMC run for an extended number of generations (33 million generations) for the two main alternative best-fit models (GTR+I+G and CNR-SSG) for the combined data set. For each long MCMC run, the posterior probabilities for clades were monitored in intervals of two million generations to examine any trends and the overall precision associated with posterior probability estimates through generations of extended MCMC runs. Additionally, posterior probabilities of clades estimated from these long MCMC runs were compared to those estimated from the initial three MCMC runs per model (run for 1.4 million generations) to examine the effect that MCMC analysis strategy (multiple short runs versus single long run) has on estimates of posterior probabilities. Posterior probabilities estimated from these long runs were also used to re-test for significant changes in posterior probability estimates derived from analyses under alternative models (as described above).


    Results
 Top
 Abstract
 Methods
 Results
 Discussion
 Appendix 1
 Appendix 2
 Acknowledgments
 References
 
A total of 1810 characters were included in the analysis (c-mos 408 bp; ND4 623 bp; 12S 331 bp; 16S 448 bp). Details of optimal trees selected by maximum parsimony and best-fit models of evolution selected by ModelTest (Posada and Crandall, 1998) are presented in Table 2. After preliminary phylogenetic reconstructions, we identified several apparent problems with the Pellegrino et al. (2001) data set, including switching of taxon names and apparent contamination, which we rectified prior to final analyses. These are discussed in detail in Appendix 2.


View this table:
[in this window]
[in a new window]

 
Table 2 Statistics for datasets used, including results from MP searches and suggested model from hierarchical ln likelihood ratio test (hLRT) criterion from ModelTest.

 
Parsimony Phylogenetic Reconstruction
The total evidence (all four genes) equally weighted parsimony reconstruction resulted in two most parsimonious trees of 6600 steps with 769 parsimony-informative characters and CI = 0.228, RI = 0.543, RC = 0.124, HI = 0.772 (Fig. 1). Six major clades were recovered, each with high bootstrap support (70% to 100%) and differing from the reconstruction of Pellegrino et al. (2001). Whereas the earliest split within the Gymnophthalmidae in Pellegrino et al. (2001) was the divergence of a clade composed of the three Alopoglossus species from all others, we recovered a clade of Alopoglossus spp. and Ptychoglossus brevifrontalis. As explained in Appendix 2, an apparent taxon name error in the 12S and 16S data sets presumably resulted in the erroneous placement of Ptychoglossus in the Cercosaurinae. In addition to the Alopoglossus + Ptychoglossus clade, five other major clades were present, each of which had bootstrap support of 70% or greater. Relationships among these clades did not receive high bootstrap support. The Heterodactylini was sister to the Gymnophthalmini, supporting a monophyletic Gymnophthalminae with high bootstrap support (99%). The other three clades included a clade of Bachia + Rhachisaurus, the Ecpleopini, and the Cercosaurini. Monophyly of the Cercosaurinae did not receive strong bootstrap support.


Figure 1
View larger version (63K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 1 Strict consensus phylogram of two most parsimonious trees based on the equally weighted maximum parsimony search including all four genes (c-mos, ND4, 12S, and 16S). Labels (a) and (b) indicate individuals of a species (see Appendix 1). For reference, labels on the right side represent the taxonomy presented by Pellegrino et al. (2001). Taxa that are not labeled have relationships that do not agree with that former taxonomy.

 
Mitochondrial Gene MCMC Analyses
Based on hierarchical log likelihood ratio tests (hLRTs) of successively complex models of sequence evolution, ModelTest indicated the best-fit model for the combined mitochondrial dataset was the TVM+I+G (Table 2). This model of evolution, characterized by a five-parameter nucleotide substitution rate matrix, is not currently available in MrBayes. Instead, the next best-fitting parameter rich model, which employs a GTR six-parameter nucleotide substitution rate matrix, was employed with proportion of invariant sites (I) and gamma distributed among-site rate variation (G). Parameter estimates derived from the combination of all post burn-in estimates from the three independent MCMC runs are summarized in Table 3. All three runs reached apparent stationarity (in estimates of substitution model parameters, as well as chain likelihood scores) prior to 50,000 generations, well before the conservative burn-in period of 400,000 generations.


View this table:
[in this window]
[in a new window]

 
Table 3 Parameter estimates for all mitochondrial gene and c-mos.

 
The all mitochondrial data partition MCMC reconstruction (Fig. 2a) contrasts with the Pellegrino et al. (2001) reconstruction in the relative phylogenetic placement of major clades deep in the phylogeny, but posterior probability support for some the relationships was not high. As in the parsimony reconstruction described above, Alopoglossus and Ptychoglossus form a clade sister to the remaining gymnophthalmids. The next node splits the Ecpleopini from the remainder of the taxa (but with low posterior probability support). Of the three MCMC runs, one differed slightly with regard to the structure of the remainder of the tree. Examination of this difference among runs revealed that the difference between the majority rule topologies from post burn-in MCMC trees resulted from an approximately 1% difference between runs in the posterior probability density supporting one relationship over another (both of which received posterior probabilities below 50%). Figure 2a depicts the consensus of those three runs that collapses nodes in conflict, creating a polytomy of Rhachisaurus, Bachia, the Gymnophthalminae, and the Cercosaurini minus Bachia. Even with the differences among the reconstructions, the lack of monophyly of the Cercosaurinae differs from Pellegrino et al. (2001; their Fig. 4) and our parsimony reconstruction (Fig. 1).


Figure 2
View larger version (95K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 2 Bayesian phylogenetic trees for the independent nuclear and mitochondrial data partitions. Labels (a) and (b) indicate individuals of a species (see Appendix 1). (a) Majority rule phylogram and posterior probabilities resulting from Bayesian analysis of all three mitochondrial genes combined (ND4, 12S, and 16S) based on a combined 3 million post burn-in generations under the GTR+I+G model of evolution. (b) Majority rule phylogram and posterior probabilities resulting from Bayesian analysis of nuclear c-mos gene data based on a combined 3 million post burn-in generations under the K80+G model of evolution.

 
C-mos (Nuclear Gene) MCMC Analyses
Based on hLRTs of successively complex models of sequence evolution, ModelTest indicated the best-fit model for the combined mitochondrial dataset was the K80+G model (Table 2). Parameter estimates derived from the combination of all post burn-in estimates from the three independent MCMC runs, using a K80 + G model, are summarized in Table 3. All three runs reached apparent stationarity (in estimates of substitution model parameters, as well as chain likelihood scores) prior to 50,000 generations.

The nuclear c-mos reconstruction (Fig. 2b) differs from that of Pellegrino et al. (2001), Harris (2003), in our parsimony reconstruction (Fig. 1), and our mitochondrial reconstruction (Fig. 2a). As with Harris (2003), in our parsimony reconstruction, and our mitochondrial DNA reconstruction, Alopoglossus and Ptychoglossus form a basal clade. Similar to our parsimony reconstruction, four additional major clades are formed, each with high posterior probability support for clade monophyly, but low support of the relationships among the clades. The Gymnophthalminae forms a monophyletic group with strong posterior probability support. The Cercosaurinae is not monophyletic because there is strong support for the Ecpleopini being only distantly related to the Cercosaurini. Additionally, as in the parsimony reconstruction, Bachia and Rhachisaurus form a clade. As in Pellegrino et al.'s (2001) maximum likelihood reconstruction, but differing from our parsimony reconstruction, tribe Heterodactylini is not monophyletic but is paraphyletic with respect to the Gymnophthalmini.

Combined MCMC Analyses
Based on the hLRT criterion for model selection, ModelTest chose the GTR+I+G model of nucleotide substitution for the combined data set (Table 2). Burn-in plots of likelihood scores of MCMC chains conducted with this model and alternative models are shown in Figure 3 and mean stationary values (with 95% CI) across models are compared in Figure 4. A more detailed plot of the ascent of likelihood scores of chains toward stationarity for each model is shown in Figure 5. Although we only show burn-in plots for chains from one of three individual MCMC analyses for each model, no single run (under a particular model) was noticeably different with regard to burn-in time, parameter estimate mean, or CI at stationarity.


Figure 3
View larger version (44K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 3 The ln likelihood scores of MCMC chains based on alternative models of evolution, sampled in 10,000 generation intervals for clarity of presentation. See text for descriptions of alternative models.

 

Figure 4
View larger version (9K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 4 The mean and 95% credibility interval for post burn-in ln likelihood scores of MCMC chains based on alternative models of evolution. See text for descriptions of alternative models.

 

Figure 5
View larger version (39K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 5 The ln likelihood scores of MCMC chains based on alternative models of evolution, focusing on the period below 50,000 generations, sampled every 100 generations. See text for descriptions of alternative models.

 
Based on the post-burn-in plateau of chain likelihood values observed in Figures 3 and 4Go, the GTR+I+G model appears to out-perform models that partition among-site rate variation between either nuclear versus mitochondrial genes (NM-SSG and NM-SAG) or between protein- coding versus ribosomal RNA genes (PR-SSG and PR-SAG). The GTR+AG model resulted in chain likelihood scores that were markedly lower than those estimated under the GTR+I+G model. Two groups of models that partition among-site rate variation into either three or four classes appeared to result in clear improvements in the likelihood scores of stationary chains when compared to the GTR+I+G model: models that partitioned among-site rate variation among c-mos (nuclear protein-coding) versus ND4 (mitochondrial protein-coding) versus ribosomal RNA genes (CNR-SSG and CNR-SAG; three site partitions) and those that partitioned rate variation among all individual genes (4gene-SSG and 4gene-SAG; four site partitions). Within this group of models with either three or four partitions of among-site rate variation (with and without autocorrelated gamma), no single model clearly outperformed any other based on estimates of stationary chain likelihood scores (Fig. 3). From Figure 5 we observe that all models, including those with three or four partitions for among-site rate variation, achieve stationarity rapidly by approximately 30,000 generations (although we conservatively discarded trees prior to 400,000 generations as burn-in).

Consensus topologies estimated from post-burn-in generations were identical among multiple independent runs under a particular model (Fig. 6). We found a general correlation between topology and model fit (inferred based on relative values of stationary chain likelihood scores), whereby the two models with the lowest range of ln likelihood scores (mean ln likelihood less than –30,550) for chains produced slightly different topologies compared with all models resulting in chains with higher ln likelihoods (mean ln likelihood less than –30,550). Analyses of the combined data employing all models except NM-SSG and GTR+AG recovered the identical topology. The analyses under the NM-SSG and GTR+AG models recovered a topology identical to the others except for a swap in the relative branching order with respect to two clades (Rhachisaurus + Gymnophthalminae and Ecpleopini; neither rearrangement received high posterior probability support), in addition to a modification affecting the phylogenetic position of Proctoporus ventrimaculatus + P. cf. ventrimaculatus.


Figure 6
View larger version (60K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 6 Bayesian phylogenetic tree and posterior probabilities for clades based on the combined, four-gene data set analyzed under the CNR-SSG model. Tree is based on the combination of all post burn-in generations resulting from three independent runs of the model, for a combined total of 3 million post-burn-in generations. Labels (a) and (b) indicate individuals of a species (see Appendix 1). This work's new proposed phylogenetic classification depicted by labeled bars on the right.

 
Based on our a priori criteria for initial identification of the preferred evolutionary model as that which contained the fewest number of parameters while demonstrating a clear optimization of overall chain likelihood, we chose the CNR-SSG model. To examine the effect of model choice on the cumulative posterior probabilities for clades, we tested for significant changes in posterior probabilities between the CNR-SSG model and all other models that were found to be as good or better than the GTR+I+G model (including GTR+I+G, CNR-SAG, 4gene-SSG, and 4gene-SAG) with Wilcoxon signed rank tests. For these, posterior probabilities for matched nodes were pairs for comparison. Tests comparing the overall change in posterior probabilities between the CNR-SSG model and other models with three or four gamma partitions (with and without autocorrelated gamma) were not significant. However, the GTR+I+G model was found to produce, overall, significantly lower estimates for posterior probabilities of clades than the preferred model (CNR-SSG; z = 2.173, P = 0.029). This trend is demonstrated by the relationship between posterior probabilities from the GTR+I+G model and the CNR-SSG model, plotted against one another in Figure 7. Overall, a majority of nodes plotted in this figure fall above the 1:1 line, indicating higher nodal support resulting from the CNR-SSG model. It is also important to note, however, that several nodes did decrease in posterior probability support under the CNR-SSG model.


Figure 7
View larger version (7K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 7 Plot of the posterior probabilities derived from the GTR+I+G MCMC analyses (all three runs combined) versus the posterior probabilities derived from the CNR-SSG model (all three runs combined). For comparison, a 1:1 line is plotted on the same axis.

 
As described above, burn-in plots of ln likelihood of MCMC chain scores from all independent MCMC runs under the CNR-SSG model are essentially identical with a rapid and direct approach to a common stationary plateau (not shown). To investigate burn-in and common estimates at stationarity for the parameters of the independent (1.4 million generation) CNR-SSG model runs, burn-in plots of the reversible rate of A-G and A-T substitutions as well as the gamma parameter and site-specific partition rate parameters are shown in Figure 8. Similar to burn-in plots of likelihood tree scores (Fig. 5), all parameters appear to approach stationarity rapidly (in less than 50,000 generations) and oscillate around a common stationary value (across independent runs). Because all three (1.4 million generation) independent runs of our preferred model (CNR-SSG) appear to reach common stationary estimates of parameters, produce identical topologies, and nearly identical posterior probability estimates, hereafter we report only results based on the combination of all 3 million post burn-in generations pooled from the three independent runs of MCMC analyses using the CNR-SSG model. Parameter values, with 95% credibility intervals, resulting from MCMC CNR-SSG model analyses are given in Table 4.


Figure 8
View larger version (32K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 8 Plots of selected parameters of the CNR-SSG model through generations. All three independent runs are plotted per graph to show common burn-in rates and similar parameter estimates. (a) Plot of parametric estimates of r(A-G) from the GTR rate matrix. (b) Plot of the parametric estimates of r(A-T) from the GTR rate matrix. (c) Plot of the gamma parameter estimates. (d) Plot of the site-specific rate multiplier for the ND4, rRNA (12S+16S), and c-mos site specific partitions of the gamma parameter.

 


View this table:
[in this window]
[in a new window]

 
Table 4 Parameter estimates for CNR-SSG model MCMC runs summarized as means with 95% credibility interval in parentheses.

 
Posterior probability estimates derived from post burn-in generations from the single long MCMC run (33 million generations) of the GTR+I+G and CNR-SSG models were very similar to estimates based on the combination of the three shorter (1.4 million generation) runs. Considering only clades supported with less than 100% posterior probability support, the long MCMC run of the GTR+I+G model produced estimates that were, on average, 1.05% different from the short run estimates, compared with 0.40% for the CNR-SSG model. Given the almost identical posterior probability estimates (within 1%) derived from the single long MCMC run under the CNR-SSG model as compared with those previously estimated from the combination of the three short MCMC runs of this model, we retain the use of posterior probabilities derived from the three short runs for further discussions of the phylogeny. Estimates of model parameters derived from this long CNR-SSG MCMC run are given in Table 4.

Relative to the overall estimates of posterior probabilities (all 33 million generations minus 1 million burn-in), the deviation of posterior probability estimates at intervals of generations showed greater variance for the GTR+I+G model than did the CNR-SSG estimates (Fig. 9). The GTR+I+G model produced less precise point (intermediate interval) estimates than did the CNR-SSG model. In other words, as MCMC chains progressed through generations, the posterior probability estimates tended to vary more for the GTR+I+G than the CNR-SSG model. Although the GTR+I+G model included 20 nodes supported below 100%, whereas the CNR-SSG model included only 17, this bias was factored out by reporting deviations per interval after dividing by the number of nodes considered. This average nodal support deviation (from overall long run estimates) calculated from intervals of generations for the GTR+I+G model was more than twice that for the CNR-SSG model (Fig. 9). In comparisons of variance for nodes receiving similar levels of support (e.g., around 80% posterior probability), greater degrees of variation were evident in the GTR+I+G than the CNR-SSG model run (see Appendix 3 for detail, available at the Society of Systematic Biologists website, http://systematicbiology. org), suggesting that elevated deviation observed in GTR+I+G estimates were not particularly biased by overall higher posterior probability estimates from the CNR-SSG model. Despite variance in posterior probability estimates for intervals of generations, however, no latent trends were observed in posterior probabilities that may indicate that new tree islands were sampled only late in runs (after many generations) or that chains were not completely burned-in after the inferred burn-in period (based on likelihood plateau). Instead, fluctuations in nodal support through generations appear to represent oscillating patterns (see Appendix 3 for detail).


Figure 9
View larger version (17K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 9 Comparison of deviation of posterior probability estimates at intervals of generations compared to overall means from long MCMC runs (33 million generations) for the GTR+I+G and CNR-SSG models. Values represent the absolute deviation of posterior probability estimates (relative to overall mean for long MCMC run) averaged across all nodes receiving less than 100% posterior probability.

 
We reexamine the effect of model choice on the cumulative posterior probabilities for clades based on these two extended MCMC runs (for the GTR+I+G and CNR-SSG models). Results from a Wilcoxon signed rank test returned very similar results as previous estimates based on the three short MCMC runs, suggesting the GTR+I+G model produced overall significantly lower estimates for posterior probabilities of clades than the preferred model (CNR-SSG; z = 2.334, P = 0.019).

The topology of the MCMC CNR-SSG tree (Fig. 6) has similarities with each of the other reconstructions but also differs from all aforementioned reconstructions in several ways. As with all of our phylogenetic reconstructions, Alopoglossus and Ptychoglossus form a well-supported clade sister to the rest of the Gymnophthalmidae. The Cercosaurinae is polyphyletic. As in the c-mos reconstruction, the Heterodactylini is paraphyletic with respect to the Gymnophthalmini. Although the placement of Rhachisaurus as the sister taxon to the Gymnophthalminae is unique among the parsimony and data partitions (mitochondrial and nuclear) in this study, it is in the same position as that was recovered by Pellegrino et al. (2001; their Fig. 4). Additionally, Bachia is recovered (with weak support) as the sister taxon to the Cercosaurini, as was found by Pellegrino et al. (2001).

Comparison Among Phylogenetic Reconstructions
Not all individual gene data sets (Fig. 2, and not shown) were in agreement with the combined tree (Fig. 6). The c-mos data partition agreed with the combined tree on higher-level relationships except for the placement of Rhachisaurus and Bachia. Similar to the parsimony reconstruction (Fig. 1), Rhachisaurus and Bachia formed a clade instead of Rhachisaurus being related to the Gymnophthalminae and Bachia to the Cercosaurini. The ND4 data partition supported a monophyletic Heterodactylini. The 16S data partition resolved the same general relationship as the combined data CNR-SSG tree except that one of the outgroups, Tupinambis quadrilineatus, was nested within the ingroup. The 12S data partition produced topologies most divergent from other genes, but nearly all of those relationships received poor posterior probability support.


    Discussion
 Top
 Abstract
 Methods
 Results
 Discussion
 Appendix 1
 Appendix 2
 Acknowledgments
 References
 
Model Selection and Evaluation
Bayesian methods have greatly improved our ability to estimate phylogenies using larger datasets and complex models of evolution. However, this creates a seemingly paradoxical dilemma with regard to model complexity and overparameterization. In general, it is assumed that more realistic models of evolution will yield more accurate trees and clade credibility (posterior probability) values, thus perhaps favoring parameter-rich models, because interpretations of posterior probabilities are contingent on model specifications (Huelsenbeck et al., 2002). However, a key assumption of Wald's (1949) proof of the consistency of maximum likelihood estimates is that all of the parameters of the likelihood function are identifiable from the true probability distribution of the data (Rogers, 2001). Even if a particular parameter may be intrinsic in the evolution of DNA sequences, we need to consider whether this parameter can be accurately estimated based on the data. This dilemma is manifested when attempting to construct and implement models that realistically describe DNA evolution, while avoiding overparameterization, or using more parameters than can be meaningfully estimated from the data.

In a Bayesian analysis, the problem of identifying the best model may be condensed to two intertwined issues: evaluating model performance and fit and examining the sensitivity of posterior probability distributions to model specifications (Gelman et al., 1995; Huelsenbeck et al., 2002). Detecting overparameterized models, however, is not readily accomplished, especially in a Bayesian phylogenetic framework (Huelsenbeck et al., 2002; Rannala, 2002). Several authors have suggested features of MCMC analyses that may be monitored to identify overparameterization, including poor convergence of MCMC chains (Carlin and Louis, 1996), a strong correlation among parameters in the posterior density despite independence under the prior density (Rannala, 2002), delayed convergence of a MCMC chain to a stationary plateau relative to less parameterized models (Rannala, 2002), and failure of multiple independent runs (chains) of the same model to converge on similar estimates of parameters and posterior probabilities (Huelsenbeck et al., 2002). We used these criteria, with the exception of testing among-parameter correlation, to guide the evaluation of what we tentatively identified as the best-fit model (CNR-SSG). Testing for among-parameter correlation, in our case, was not possible because the nature of the model causes inherent correlation of the parameters of interest (those which partition the among-site rate variation across partitions).

Different evolutionary rates and among-site rate patterns may be intrinsic evolutionary characteristics of different genes owing to their genomic origin (organellar versus nuclear) or function (e.g., protein-coding versus non-protein-coding). Sufficient evidence exists to suggest that drastically different evolutionary rates and distinct, gene-specific among-site rates can be observed among different genes categorized within a particular class (e.g., among protein-coding mitochondrial genes; Miyata, 1982; Kelly and Rice, 1996). We used these observations to identify plausible alternative partitions across which to estimate specific rates of among-site rate variation. Comparison of post burn-in MCMC chain likelihood scores across alternative models for among-site rate variation showed that only the three partition (CNR-SSG and CNR-SAG) and four partition (4gene-SSG and 4gene-SAG) models fit the data better than the GTR+I+G model chosen by ModelTest. We found no evidence for substantial differences in burn-in time, chain likelihood score at stationarity, or overall clade posterior probability estimates across these models, yet we did detect a significant overall improvement in clade posterior probability estimates between one of these four models (CNR-SSG) and the GTR+I+G model. We found no evidence that this best-fit model (CNR-SSG) was parametrically over-fitted (excessively parameter rich). In fact, based on the analysis of the extended MCMC runs, we found this model to produce significantly more consistent posterior probabilities through generations than did the GTR+I+G model. Given available evidence, we concluded that the best-fit model of evolution, in keeping with our goal of practical improvement for the sake of phylogenetic inference, was the CNR-SSG model, upon which we base our preferred hypothesis for the phylogeny of Gymnophthalmidae.

Here, we summarize our approach to model construction and evaluation as an explicit hierarchical process:

  1. Use hLRTs (e.g., ModelTest) to first identify best-fit conventional parameters (although other model choice criteria such as AIC [Akaike, 1974; also available in ModelTest], BIC [Schwarz, 1974], or DT [Minin et al., 2003] may be substituted)
  2. Construct alternative models with data set partitions defined based on a priori expectations of potentially biologically relevant subsets of the data (e.g., protein-coding versus non-protein-coding genes or mitochondrial versus nuclear genes)
  3. Examine model fit based on 95% CI of post burn-in MCMC chain likelihood values
  4. Tentatively choose best-fit model
  5. Examine this model for evidence of parameter identifiability or over-fitting
    1. Compare relative burn-in period across alternative models
    2. Check for topological consistency across multiple runs of tentative model
    3. Examine consistency of parameter estimates across multiple independent runs of the tentatively optimal model
    4. Check for consistency of clade posterior probabilities across independent runs
    5. Check for consistency of posterior probability estimates across generations for extended MCMC runs (with large number of generations)

  6. Test for significant differences in posterior probabilities between tentative model and those models of similar fit to the data
  7. Given evidence for parameter identifiability and significant changes to posterior probabilities, accept model. If identifiability is questionable or no significant changes to posterior probability are observed, reduce model parameterization and repeat model evaluation.

Effects of Partitioning Gamma and Using Autocorrelated Rate Variation
Several studies based on simulated data have strongly supported the view that maximum likelihood estimates of phylogeny remain accurate and robust even when the model used to estimate phylogeny differs markedly from that used to generate simulated data (Fukami-Kobayashi and Tateno, 1991; Yang et al., 1994; Sullivan and Swofford, 2001). Our results support this conclusion using empirical data, in that several different models for among-site rate variation support the same or very similar topologies. Many authors have underscored the importance of including estimates of among-site rate variation (e.g., Yang, 1993; Sullivan and Swofford, 2001; Buckley and Cunningham, 2002; Nylander et al., 2004; see review in Yang, 1996b) in models of sequence evolution for increasing the consistency and accuracy of phylogenetic inference. Our results demonstrate that apparent inappropriate partitioning of gamma among loci (e.g., NM-SAG model) may lead to inconsistent and presumably inaccurate phylogenetic inferences. The fact that our preferred model (CNR-SSG) provided a significant increase in overall posterior probability estimates for clades over the GTR+I+G model suggests that well fitted partitioning of among-site rate variation appears to significantly affect the posterior probability distributions of MCMC analyses. These results parallel previous studies that have demonstrated the significant effects of substitution model on maximum likelihood bootstrap support (Yang et al., 1995; Sullivan et al., 1997; Buckley et al, 2001; Buckley and Cunningham, 2002).

An interesting, yet difficult to interpret, result was observed in comparisons of posterior probability monitored through intervals during extended MCMC runs. We found the CNR-SSG model to produce more consistent posterior probabilities through generations than did the GTR+I+G model. If we assume that the complexity of tree space remained relatively constant under the two models, this may suggest that the MCMC chains of the CNR-SSG model were more consistent over intervals of generations with respect to the regular visitation of tree islands. Alternatively, it seems possible that implementing the partitioned gamma model (CNR-SSG) may have reduced the complexity of tree space by decreasing the number of optimal or near optimal peaks (reducing the number of major islands visited by MCMC chains over time), thereby reducing the variance through generations in trees sampled in the posterior distribution. This may have been accomplished by reducing the likelihood of certain peaks within tree space due to the different parameterization of among-site rates, thereby decreasing the number of near optimal tree islands. In general, the properties associated with the behavior of MCMC chains in tree space through generations has been essentially untouched in the literature, yet represents a significant gap in our understanding of Bayesian MCMC analyses. Future research is clearly necessary to answer questions about the number of generations and independent runs required for robust conclusions from MCMC and also how this may relate to model complexity and changes to the general topology of tree space.

Although the results of this study favor use of models that partition among-site rate variation, they also highlight a potential pitfall of such parameter-rich models. Not all alternative models improved model fit relative to GTR+I+G. The GTR+AG, PR-SSG, PR-SAG, NM-SSG, and NM-SAG models all decreased the fit of the model to the data, relative to the GTR+I+G model (chosen initially by hLRT criteria). These results re-emphasize the need to test the fit of alternative models instead of choosing a particular model a priori (e.g., Huelsenbeck and Crandall, 1997; Posada and Crandall, 2001; Minin et al., 2003). The GTR+I+G model not only fit the data better than some partitioned models (e.g., NM-SSG, PR-SAG; Fig. 4), but also recovered the identical topology while (on average) underestimating posterior probability support for clades (Fig. 7). These findings support the utility of this conventionally employed model and suggest that previous analyses using this model are likely to be as robust (with regard to topology) as more complex models, but provide more conservative estimates of posterior probability support for clades. However, our analysis of extended MCMC runs suggests that, for a reason that is not immediately clear, the GTR+I+G model appeared to take a large number of generations to undergo oscillation cycles with respect to estimates of posterior probabilities. This suggests that, for some models, at least one extended MCMC run (with a large number of generations) is desirable to precisely and accurately estimate posterior probabilities so that trees are sampled in the posterior distribution according to their posterior probability (Swofford, Warren, and Wilgenbusch, unpublished data). It is encouraging, from the standpoint of computational feasibility, that the estimates of model parameters, chain likelihood scores, and, particularly, posterior probabilities derived from the combination of three short (1.4 million generation) independent MCMC runs provided what appears to be, at least, a sufficient approximation of posterior probabilities derived from much longer MCMC runs.

Several authors have demonstrated the utility of employing a parameter to account for autocorrelated among-site rate variation in phylogenetic analyses (e.g., Yang, 1995; Penny et al., 2001; Huelsenbeck, 2002). Although evidence for the occurrence of autocorrelated rates has been well documented (Yang, 1995; Nielsen, 1997; Penny et al., 2001), we found the addition of this parameter to alternative models to be of limited value for improving the fit of models to our data. As can be seen in Figure 4, models which fit the data poorly (PR-SSG and NM-SSG) did appear to be notably improved with the addition of a parameter for autocorrelation of among-site rates, although this increase in fit did not exceed that of the GTR+I+G model (or the CNR or 4gene models). Among models which showed the best-fit to the data (CNR and 4gene), the addition of a parameter to account for among-site rate autocorrelation only slightly increased the likelihood scores for MCMC chains such that there was still broad overlap in the 95% CI of likelihood scores (Fig. 4). Similarly, Wilcoxon signed rank tests comparing overall posterior probabilities for clades between SSG and SAG variants of the CNR and 4gene models found no significant differences attributable to the addition of the auto-correlation parameter.

In this study we have concentrated on accounting for one particular type of heterogeneity in among-site rate patterns in combined DNA sequence analysis, that which exists at or above the level of a gene or locus, ignoring potential partitions that may be prescribed within genes. Models that do examine and attempt to account for within gene heterogeneity by constructing partitions based on codon position (e.g., Yang, 1996a; Krajewski et al., 1999; Buckley et al., 2001), protein domain (Herron et al., in press), or secondary structure for rRNA or tRNA genes (e.g., Schoniger and von Haeseler, 1994; Savill et al., 2001) have also been implemented. These intralocus partitions have yet to be thoroughly evaluated in a Bayesian framework and may potentially add additional realistic parameters to models of sequence evolution, especially in cases where very distant relationships are inferred (Penny et al., 2001) or where extreme accuracy of branch length estimates or model parameters are particularly critical to conclusions (Yang et al., 1994). Understanding and testing of parametric identifiability in complex models have been poorly studied and clearly requires additional attention. This issue, in addition to topology and posterior probability sensitivity to model choice, would benefit from future investigations using both simulated data and known phylogenies where more definitive conclusions about the effects of model choice may be drawn.

Taxonomic Considerations and Alterations
Much of our phylogeny reconstruction is consistent with that recovered by Pellegrino et al. (2001). However, our preferred phylogenetic hypothesis (combined data MCMC CNR-SSG reconstruction; Fig. 6) suggests four higher level taxonomic changes to the current classification (Pellegrino et al., 2001). The first change is that Ptychoglossus appears to be most closely related to Alopoglossus and not to the Cercosaurini. The placement of Ptychoglossus in the Cercosaurini by Pellegrino et al. (2001) was presumably the result of the swapping of taxon names between Ptychoglossus and Neusticurus juruazensis, as discussed in Appendix 2. This relationship was also inferred from the nuclear partition trees of Pellegrino et al. (2001) (and the c-mos reconstruction of Harris, 2003) in which Ptychoglossus was sister to the three Alopoglossus species. After making the correction to the Pellegrino et al. (2001) data set, and adding our own sequences for this taxon, it seems clear that Ptychoglossus brevifrontalis is sister to Alopoglossus; therefore, we remove Ptychoglossus from the Cercosaurinae and place it in the Alopoglossinae. This relationship is also supported by the morphological synapomorphy (present in both Ptychoglossus and Alopoglossus) of infralingual plicae, unique in the family Gymnophthalmidae.

The second taxonomic alteration involves the tribe Heterodactylini. This tribe is paraphyletic with respect to the Gymnophthalmini in our combined tree (Fig. 6) and in the c-mos (Fig. 2) and 16S (not shown) reconstructions. The paraphyly of the tribe was also apparent in Pellegrino et al.'s (2001) maximum likelihood tree. Because there does not appear to be sufficient support for recognizing a separate tribe Heterodactylini, we remove both of the Gymnophthalmini and Heterodactylini tribal names and refer all of the pertaining genera to subfamily Gymnophthalminae with no tribes.

The third taxonomic alteration involves species belonging to the cercosaurine tribe Ecpleopini. The CNR-SSG tree (Fig. 6) suggests that the ecpleopiines and the cercosauriines do not comprise a monophyletic Cercosaurinae. Although posterior probability support for intervening clades is low, monophyly of both groups is well supported. The Ecpleopini appears to be distantly related to the Cercosaurini and we hereby raise the status of the former members of tribe Ecpleopini (Amapasaurus, Anotosaura, Arthrosaura, Colobosauroides, Ecpleopus, and Leposoma; Pellegrino et al., 2001) to subfamily status, the Ecpleopinae Fitzinger.

The fourth taxonomic alteration involves the placement of Bachia. Pellegrino et al. (2001) recovered its placement as basal within the Cercosaurini. The node joining Bachia to the rest of the Cercosaurini was supported by bootstrap values less than 50% on their parsimony tree and by 81% on their maximum likelihood tree. We found conflict between our parsimony reconstructions and Bayesian reconstructions. In the parsimony trees (and 16S and c-mos individual Bayesian gene trees; 16S not shown) we found Bachia to be closely related to Rhachisaurus brachylepis, either joined with the Ecpleopini or distantly related to the Cercosaurinae. In our CNR-SSG reconstruction Bachia appears to be the sister lineage to the rest of the Cercosaurini with low posterior probabilities supporting Bachia in that position. In addition, a large genetic distance separated Bachia from the other cercosauriines. Based on these data we are still unsure of the phylogenetic placement of Bachia within the family. However, we are confident that Bachia appears to be distantly related to all other sampled taxa. We believe the best course of action at the present time is to leave Bachia in the Cercosaurinae but elevate the genus to tribe status, the Bachini. In this way the relationships of this genus with other genera of the Cercosaurinae are not confused.

A new phylogenetic classification for the family is presented in Table 5. Based on the additions we made to the Pellegrino et al. (2001) data set, several other taxonomic issues warrant mention. The addition of Pholidobolus macbrydei provides preliminary support for a monophyletic Pholidobolus. The newly redesignated genus Cercosaura, which now includes all taxa formerly placed in Pantodactylus and Prionodactylus (Doan, 2003a), is supported in this study. The addition of Neusticurus strangulatus shows that this species forms a clade with two other members of its genus, N. ecpleopus and N. juruazensis, but overall the genus is polyphyletic. Additionally, Anotosaura is paraphyletic with respect to Colobosauroides and Colobosaura is paraphyletic with respect to Iphisa.


View this table:
[in this window]
[in a new window]

 
Table 5 Current phylogenetic classification of family Gymnophthalmidae.

 
Proctoporus is the genus that was not included by Pellegrino et al. (2001). Contrary to the conclusions made by Doan (2003b) using morphological data, Proctoporus appears to be a polyphyletic member of the Cercosaurini. In the CNR-SSG reconstruction two separate Proctoporus clades are apparent, separated from each other by Pholidobolus, Cercosaura, and one clade of Neusticurus. One Proctoporus clade is composed of members from Ecuador, whereas the other includes members from Peru and Bolivia. In the parsimony reconstruction Proctoporus ventrimaculatus additionally forms a third lineage that appears to be most closely related to Cercosaura. This species (including an unidentified specimen designated as P. cf. ventrimaculatus) is the sole species from northern Peru, separated by a vast distance from the Ecuadorian clade to its north and the southern Peruvian and Bolivian clade to its south. It is clear that taxonomic rearrangement is necessary to rectify the taxonomy of this genus. This work is underway by Doan and Castoe (in preparation).


    Appendix 1
 Top
 Abstract
 Methods
 Results
 Discussion
 Appendix 1
 Appendix 2
 Acknowledgments
 References
 


View this table:
[in this window]
[in a new window]

 
List of operational taxonomic units (OTUs) used in this study with GenBank accession numbers. Cells with an X indicate that gene sequence was not used in this study. (a) and (b) refer to individuals indicated in the figures. Museum accession numbers for specimens sequenced in this study are given. Acronyms for museums are KU (University of Kansas), MHNSM (Museo de Historia Natural, Universidad Nacional Mayor de San Marcos, Lima, Peru), QCAZ (Museo de Zoología, Pontifica Universidad Católica del Ecuador, Quito, Ecuador), and UTA (University of Texas at Arlington).

 

    Appendix 2
 Top
 Abstract
 Methods
 Results
 Discussion
 Appendix 1
 Appendix 2
 Acknowledgments
 References
 
Problems Discovered with the Pellegrino et al. (2001) Data Set
After reconstructing phylogenies of each data partition separately and combined, we identified four taxa from the Pellegrino et al. (2001) study that demonstrated remarkable incongruence with regard to topological placement among individual gene phylogenies. After careful examination of sequences, phylogenetic trees for independent data partitions, and verification based on our own data collection, we conclude that the names Neusticurus juruazensis and Ptychoglossus brevifrontalis were mistakenly switched by Pellegrino et al. (2001) in the 12S and 16S data sets. With the addition of our specimen of P. brevifrontalis, this was obvious when examining the individual gene trees for 12S and 16S; P. brevifrontalis of Pellegrino et al. (2001) (GenBank nos. AF420757 [GenBank] , AF420697 [GenBank] ) formed a clade with N. ecpleopus and N. strangulatus, whereas N. juruazensis (GenBank nos. AF420758 [GenBank] , AF420704 [GenBank] ) was virtually identical to our P. brevifrontalis. After examining the ND4 and c-mos trees, it was evident that these taxa were inadvertently switched in the 12S and 16S data sets. We swapped the names for these taxa for all of our reconstructions.

An additional problem stems from the inferred phylogenetic placement of Bachia dorbignyi (GenBank no. AF420861 [GenBank] ) and Arthrosaura reticulata (GenBank no. AF420841 [GenBank] ) in the c-mos gene partition. These two species had identical sequences and formed a close sister relationship with Cercosaura eigenmanni. This relationship can been seen in the squamate phylogeny reconstruction of Harris (2003), which used these same sequences. They differed from C. eigenmanni by one base substitution and appeared to be distantly related to other species of Bachia (47 and 51 base substitutions) and from the other three members of the Ecpleopini, for which c-mos sequence was available by 52 to 89 base differences (Colobosauroides cearensis, 77; Ecpleopus gaudichaudii, 52; Leposoma oswaldoi, 89). In all other gene partitions Bachia dorbignyi is closely related to its congeners and Arthrosaura reticulata is closely related to other ecpleopiines. These phylogenetic positions are further supported by morphological features such as leg reduction in all species of Bachia. We cannot say with certainty what this sequence represents but it appears most similar to Cercosaura eigenmanni. Because we could not determine the true origin of the sequences, we excluded these two c-mos sequences from all final analyses.


    Acknowledgments
 Top
 Abstract
 Methods
 Results
 Discussion
 Appendix 1
 Appendix 2
 Acknowledgments
 References
 
We would like to thank L. Ford and C. Raxworthy of AMNH and L. Trueb of KU for the donation of tissues. We thank D. Calloway, P. Chippindale, M. Herron, B. Noonan, C. O'Reilly, A. Rambout, R. Ruggierio, C. Simon, J. Sites, E. Smith, J. Sullivan and two anonymous reviewers for helpful methodological suggestions and comments on the manuscript. We thank the Biology Department at the University of Central Florida, J. Fauth, and J. Weishample for providing computer access.


    References
 Top
 Abstract
 Methods
 Results
 Discussion
 Appendix 1
 Appendix 2
 Acknowledgments
 References
 

    Akaike H. A new look at statistical model identification. IEE Trans. Autom. Contr. (1974) 19:716–723.[CrossRef]

    Alfaro M. E., Zoller S., Lutzoni F. Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov chain Monte Carlo sampling and Bootstrapping in assessing phylogenetic confidence. Mol. Biol. Evol. (2003) 20:255–266.[Abstract/Free Full Text]

    Arévalo E. S., Davis S. K., Sites J. W. Jr. Mitochondrial DNA sequence divergence and phylogenetic relationships among eight chromosome races of the Sceloporus grammicus complex (Phrynosomatidae) in central Mexico. Syst. Biol. (1994) 43:387–418.[Abstract/Free Full Text]

    Buckley T. R. Model misspecification and probabilistic tests of topology: Evidence from empirical data sets. Syst. Biol. (2002) 51:509–523.[Abstract/Free Full Text]

    Buckley T. R., Cunningham C. W. The effects of nucleotide substitution model assumptions on estimates of nonparametric bootstrap support. Mol. Biol. Evol. (2002) 19:394–405.[Abstract/Free Full Text]

    Buckley T. R., Simon C., Chambers G. K. Exploring among-site rate variation models in a maximum likelihood framework using empirical data: Effects of model assumptions on estimates of topology, branch lengths, and bootstrap support. Syst. Biol. (2001) 50:67–86.[Abstract/Free Full Text]

    Carlin B. P., Louis T. A. Bayes and empirical Bayes methods for data analysis. (1996) New York: Chapman and Hall.

    Caterino M. S., Reed R. D., Kuo M. M., Sperling F. A. H. A partitioned likelihood analysis of swallowtail butterfly phylogeny (Lepidoptera: Papilionidae). Syst. Biol. (2001) 50:106–127.[Abstract/Free Full Text]

    Cummings M. P., Handley S. A., Meyers D. S., Reed D. L., Rokas A., Winka K. Comparing bootstrap and posterior probability values in the four-taxon case. Syst. Biol. (2003) 52:477–487.[Abstract/Free Full Text]

    Dixon M. T., Hillis D. M. Ribosomal RNA secondary structure: Compensatory mutations and implications for phylogenetic analysis. Mol. Biol. Evol. (1993) 10:256–267.[Abstract]

    Doan T. M. A new phylogenetic classification for the gymnophthalmid genera Cercosaura, Pantodactylus, and Prionodactylus (Reptilia: Squamata). Zool. J. Linn. Soc. (2003a) 137:101–115.[CrossRef]

    Doan T. M. A south-to-north biogeographic hypothesis for Andean speciation: Evidence from the lizard genus Proctoporus (Reptilia, Gymnophthalmidae). J. Biogeography (2003b) 30:361–374.

    Doan T. M., Castoe T. A. Using morphological and molecular evidence to infer species boundaries within Proctoporus bolivianus Werner (Squamata: Gymnophthalmidae). Herpetologica (2003) 59:433–450.

    Douady C. J., Delsuc F., Boucher Y., Doolittle W. F., P. Douzery E. J. Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Mol. Biol. Evol. (2003) 20:248–254.[Abstract/Free Full Text]

    Erixon S. P., Britton B., Oxelman B. Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Syst. Biol. (2003) 52:665–673.[Abstract/Free Full Text]

    Felsenstein J. Statistical inference and the estimation of phylogenies (1968) Chicago: University of Chicago.

    Felsenstein J. Confidence limits on phylogenies: An approach using the bootstrap. Evolution (1985) 39:783–791.[CrossRef][Web of Science]

    Fukami-Kobayashi K., Tateno Y. Robustness of maximum likelihood tree estimation against different patterns of base substitutions. J. Mol. Evol. (1991) 32:79–91.[CrossRef][Web of Science][Medline]

    Gasser R. B., Zhu X., Chilton N. B., Newton L. A., Nedergaard T., Guldberg P. Analysis of sequence homogenization in rDNA arrays of Haemonchus contortus by denaturing gradient gel electrophoresis. Electrophoresis (1998) 19:2391–2395.[CrossRef][Web of Science][Medline]

    Sequencher, version 3.1—Gene Codes Corp, ed. (1996) Ann Arbor: Gene Codes.

    Gonzales I. L., Sylvester J. E. Human rRNA: Evolutionary patterns within genes and tandem arrays derived from multiple chromosomes. Genomics (2001) 73:255–263.[CrossRef][Web of Science][Medline]

    Gutell R. R. Collection of small subunit (16S and 16S-like) ribosomal RNA structures. Nucleic Acid Res. (1994) 22:3502–3507.[Abstract/Free Full Text]

    Gutell R. R., Larsen N., Woese C. R. Lessons from an evolving rRNA: 16S and 23S rRNA structures from a comparative perspective. Microbiol. Rev. (1994) 58:10–26.[Abstract/Free Full Text]

    Harris D. J. Codon bias variation in C-mos between squamate families might distort phylogenetic inferences. Mol. Phylogenet. Evol. (2003) 27:540–544.[CrossRef][Web of Science][Medline]

    Herron M. D., Castoe T. A., Parkinson C. L. Sciurid phylogeny and the paraphyly of Holarctic ground squirrels (Spermophilus). In: Mol. Phylogenet. Evol. in press.

    Hickson R., Simon C., Cooper A., Spicer G., Sullivan J., Penny D. Conserved sequence motifs, alignment, and secondary structure for the third domain of animal 12s rRNA. Mol. Biol. Evol. (1996) 13:150–169.[Abstract]

    Hillis D. M. Taxonomic sampling, phylogenetic accuracy, and investigator bias. Syst. Biol. (1998) 47:3–8.[Free Full Text]

    Hillis D. M., Bull J. J. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. (1993) 42:182–192.[Abstract/Free Full Text]

    Huelsenbeck J. P. Performance of phylogenetic methods in simulation. Syst. Biol. (1995) 44:17–48.[Abstract/Free Full Text]

    Huelsenbeck J. P. Is the Felsenstein zone a fly trap? Syst. Biol. (1997) 46:69–74.[Abstract/Free Full Text]

    Huelsenbeck J. P. Testing a covariotide model of DNA substitution. Mol. Biol. Evol. (2002) 19:698–707.[Abstract/Free Full Text]

    Huelsenbeck J. P., Crandall K. A. Phylogeny estimation and hypothesis testing using maximum likelihood. Ann. Rev. Ecol. Syst. (1997) 28:437–466.[CrossRef][Web of Science]

    Huelsenbeck J. P., Larget B., Miller R., Ronquist F. Potential applications and pitfalls of bayesian inference of phylogeny. Syst. Biol. (2002) 51:673–688.[Abstract/Free Full Text]

    Huelsenbeck J. P., Ronquist F. MrBayes: Bayesian inference of phylogeny. Bioinformatics (2001) 17:754–755.[Abstract/Free Full Text]

    Huelsenbeck J. P., Ronquist R., Nielsen R., Bollback J. Bayesian inference of phylogeny and its impacts on evolutionary biology. Science (2001) 294:2310–2314.[Abstract/Free Full Text]

    Initial sequencing and analysis of the human genome. Nature—International Human Genome Sequencing Consortium, ed. (2001) 409:860–921.[CrossRef][Medline]

    Kelly C., Rice J. Modeling nucleotide evolution: A heterogeneous rate analysis. Math. Biosci. (1996) 133:85–109.[CrossRef][Web of Science][Medline]

    Kimura M. The role of compensatory neutral mutations in molecular evolution. J. Genetics (1986) 64:7–19.

    Krajewski C., Fain M. G., Buckley L., King D. G. Dynamically heterogeneous partitions and phylogenetic inference: An evaluation of analytical strategies with cytochrome b and ND6 gene sequences in cranes. Mol. Phylogenet. Evol. (1999) 13:302–313.[CrossRef][Web of Science][Medline]

    Leaché A. D., Reeder T. W. Molecular systematics of the eastern fence lizard (Sceloporus undulatus): A comparison of parsimony, likelihood, and Bayesian approaches. Syst. Biol. (2002) 51:44–68.[Abstract/Free Full Text]

    Long E. O., Dawid I. B. Repeated genes in eukaryotes. Ann. Rev. Biochem. (1980) 49:727–764.[CrossRef][Web of Science][Medline]

    Minin V., Abdo Z., Joyce P., Sullivan J. Performance-based selection of likelihood models for phylogeny estimation. Syst. Biol. (2003) 52:674–683.[Abstract/Free Full Text]

    Miyata T. Evolutionary changes and functional constraints in DNA sequences. In: Molecular evolution, protein polymorphism and the neutral allele theory—Kimura M., ed. (1982) Tokyo: Japan Scientific Press. 233–266. Pages.

    Moncalvo J. M., Drehmel D., Vilgalys R. Variation in modes and rates of evolution in nuclear and mitochondrial ribosomal DNA in the mushroom genus Aminita (Agaricales, Basidiomycota): Phylogenetic implications. Mol. Phylogenet. Evol. (2000) 16:48–63.[CrossRef][Web of Science][Medline]

    Muse S. V. Evolutionary analyses when nucleotides do not evolve independently. In: Current topics on molecular evolution—Nei M., Takahata N., eds. (1995) PA: Institute on Molecular Evolution and Genetics, University Park. 115–124. Pages.

    Myers C. W., Donnelly M. A. Herpetofauna of the Yutajé-Corocoro Massif, Venezuela: Second report from the Robert G. Goelet American Museum-Terramar expedition to the northwestern tepuis. Bull. Am. Mus. Nat. Hist. (2001) 261:1–85.[CrossRef]

    Nielsen R. Site-by-site estimation of the rate of substitution and the correlation of rates in mitochondrial DNA. Syst. Biol. (1997) 46:346–353.[Abstract/Free Full Text]

    Nicholas K. B., Nicholas H. B. Jr. GeneDoc: A tool for editing and annotating multiple sequence alignments. (1997) Distributed by the author at www.cris.com/~Ketchup/genedoc.shtml.

    Nylander J. A. A., Ronquist F., Huelsenbeck J. P., Nieves-Aldrey J. L. Bayesian phylogenetic analysis of combined data. Syst. Biol. (2004) 53:47–67.[Abstract/Free Full Text]

    Pamilo P., Nei M. Relationships between gene trees and species trees. Mol. Biol. Evol. (1988) 5:568–583.[Abstract]

    Pardue M. L. Localization of repeated DNA sequences in Xenopus chromosomes. Cold Spring Harbor Symp. Quant. Biol. (1974) 38:475–482.[Abstract/Free Full Text]

    Parkinson C. L. Molecular systematics and biogeographical history of pitvipers as determined by mitochondrial ribosomal DNA sequences. Copeia (1999) 1999:576–586.[CrossRef]

    Parkinson C. L., Moody S. M., Alquist J. E. Phylogenetic relationships of the ‘Agkistrodon Complex’ based on mitochondrial DNA sequence data. In: Venomous snakes: Ecology, evolution, and snakebite—Thorpe R. S., Wüster W., Malhotra A., eds. (1997) Oxford: Symposia of the Zoological Society of London. Clarendon Press. 63–78. Pages.

    Pellegrino K. C. M., Rodrigues M. T., Yonenaga-Yassuda Y., Sites J. W. Jr. A molecular perspective on the evolution of microteiid lizards (Squamata: Gymnophthalmidae), and a new classification for the family. Biol. J. Linnean Soc. (2001) 74:315–338.[CrossRef]

    Penny D., McComish B. J., Carleston M. A., Hendy M. D. Mathematical elegance with biochemical realism: The covarion model of molecular evolution. J. Mol. Evol. (2001) 53:711–723.[CrossRef][Web of Science][Medline]

    Posada D., Crandall K. A. Model-Test: Testing the model of DNA substitution. Bioinformatics (1998) 14:817–818.[Abstract/Free Full Text]

    Posada D., Crandall K. A. Selecting the best-fit model of nucleotide substitution. Syst. Biol. (2001) 50:580–601.[Abstract/Free Full Text]

    Pupko T., Huchon D., Cao Y., Okada N., Hasegawa M. Combining multiple data sets in a likelihood analysis: Which models are the best? Mol. Biol. Evol. (2002) 19:2294–2307.[Abstract/Free Full Text]

    Rannala B. Identifiability of parameters in MCMC Bayesian inference of phylogeny. Syst. Biol. (2002) 51:754–760.[Abstract/Free Full Text]

    Rogers J. S. Maximum likelihood estimation of phylogenetic trees in consistent when substitution rates vary according to the invariable sites plus gamma distribution. Syst. Biol. (2001) 50:713–722.[Abstract/Free Full Text]

    Saint K. M., Austin C. C., Donnellan S. C., Hutchinson M. N. C-mos, a nuclear marker useful for squamate phylogenetic analysis. Mol. Phylogenet. Evol. (1998) 10:259–263.

    Savill N. J., Hoyle D. C., Higgs P. G. RNA sequence evolution with secondary structure constraints: Comparison of substitution rate models using maximum-likelihood methods. Genetics (2001) 157:399–411.[Abstract/Free Full Text]

    Schöniger M., von Haeseler A. A stochastic model and the evolution of autocorrelated DNA sequences. Mol. Phylogenet. Evol. (1994) 3:240–247.[CrossRef][Medline]

    Schwarz G. Estimating the dimension of a model. Ann. Stat. (1974) 6:461–464.[CrossRef]

    Simon C., Frati F., Beckenbach A., Brespi B., Liu H., Flook P. Evolution, weighting, and phylogenetic utility of mitochondrial gene sequences and a compilation of conserved PCR primers. Ann. Entomol. Soc. Am. (1994) 87:651–701.[Web of Science]

    Statistica for Windows, release 4.5.—StatSoft, Inc, ed. (1993) Tulsa: StatSoft.

    Sullivan J., Markert J. A., William Kilpatrick C. Phylogeography and molecular systematics of the Peromyscus aztecus species group (Rodentia: Muridae) inferred using parsimony and likelihood. Syst. Biol. (1997) 46:426–440.[Abstract/Free Full Text]

    Sullivan J., Swofford D. L. Should we use model-based methods for phylogenetic inference when we know that assumptions about among-site rate variation and nucleotide substitution pattern are violated? Syst. Biol. (2001) 50:723–729.[Free Full Text]

    Suzuki Y., Glazko G. V., Nei M. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc. Natl. Acad. Sci. U.S.A. (2002) 99:16138–16143.[Abstract/Free Full Text]

    Swofford D. L. PAUP*. Phylogenetic Analysis Using Parsimony (* and Other Methods), version 4.0b10 (2002) Sunderland, MA: Sinauer Associates.

    Tavaré S. Some probabilistic and statistical problems on the analysis of DNA sequences. In: Some mathematical questions in biology—DNA sequence analysis—Miura R. M., ed. (1986) Providence, RI: American Math Society. 57–86. Pages.

    Titus T. A., Frost D. R. Molecular homology assessment and phylogeny in the lizard family Opluridae (Squamata: Iguania). Mol. Phylogenet. Evol. (1996) 6:49–62.[CrossRef][Web of Science][Medline]

    Wald A. Note on the consistency of the maximum likelihood estimate. Ann. Math. Stat. (1949) 20:595–601.[CrossRef]

    Wilcox T. P., Zwickl D. J., Heath T. A., Hillis D. M. Phylogenetic relationships of the dwarf boas and a comparison of Bayesian and bootstrap measures of phylogenetic support. Mol. Phylogenet. Evol. (2002) 25:361–371.[CrossRef][Web of Science][Medline]

    Wu C. I. Inferences of species phylogeny in relation to segregation of ancient polymorphisms. Genetics (1991) 127:429–435.[Abstract]

    Yang Z. Maximum likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites: Approximate methods. Mol. Biol. Evol. (1993) 10:1396–1401.[Abstract]

    Yang Z. A space-time process model for the evolution of DNA sequences. Genetics (1995) 139:993–1005.[Abstract]

    Yang Z. Maximum likelihood models for combined analyses of multiple sequence data. J. Mol. Evol. (1996a) 42:587–596.[CrossRef][Web of Science][Medline]

    Yang Z. Among-site rate variation and its impacts on phylogenetic analyses. Trends Ecol. Evol. (1996b) 11:367–372.[CrossRef]

    Yang Z., Goldman N., Friday A. Comparison of models for nucleotide substitution used in maximum-likelihood phylogenetic estimation. Mol. Biol. Evol. (1994) 11:316–324.[Abstract]

    Yang Z., Goldman N., Friday A. Maximum likelihood trees from DNA sequences: A peculiar statistical estimation problem. Syst. Biol. (1995) 44:384–399.[Abstract]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Syst BiolHome page
M. C. Brandley, D. L. Warren, A. D. Leache, and J. A. McGuire
Homoplasy and Clade Support
Syst Biol, June 29, 2009; (2009) syp019v1.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
C. Li, G. Lu, and G. Orti
Optimal Data Partitioning and a Test Case for Ray-Finned Fishes (Actinopterygii) Based on Ten Nuclear Loci
Syst Biol, August 1, 2008; 57(4): 519 - 539.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
E. Benavides, R. Baum, D. McClellan, and J. W. Sites
Molecular Phylogenetics of the Lizard Genus Microlophus (Squamata:Tropiduridae): Aligning and Retrieving Indel Signal from Nuclear Introns
Syst Biol, October 1, 2007; 56(5): 776 - 797.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
J. M. Brown and A. R. Lemmon
The Importance of Data Partitioning and the Utility of Bayes Factors in Bayesian Phylogenetics
Syst Biol, August 1, 2007; 56(4): 643 - 655.
[Abstract] [Full Text] [PDF]


Home page
Int. J. Syst. Evol. Microbiol.Home page
I. M. Chelo, L. Ze-Ze, and R. Tenreiro
Congruence of evolutionary relationships inside the Leuconostoc-Oenococcus-Weissella clade assessed by phylogenetic analysis of the 16S rRNA gene, dnaA, gyrB, rpoC and dnaK
Int J Syst Evol Microbiol, February 1, 2007; 57(2): 276 - 286.
[Abstract] [Full Text] [PDF]


Home page
MicrobiologyHome page
L. Vitorino, I. M. Chelo, F. Bacellar, and L. Ze-Ze
Rickettsiae phylogeny: a multigenic approach
Microbiology, January 1, 2007; 153(1): 160 - 168.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
D. C. Marshall, C. Simon, and T. R. Buckley
Accurate Branch Length Estimation in Partitioned Bayesian Analyses Requires Accommodation of Among-Partition Rate Variation and Attention to Branch Length Priors
Syst Biol, December 1, 2006; 55(6): 993 - 1003.
[Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
L. J. Vitt and E. R. Pianka
From the Cover: Deep history impacts present-day ecology and biodiversity
PNAS, May 31, 2005; 102(22): 7877 - 7881.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
J. P. Huelsenbeck and B. Rannala
Frequentist Properties of Bayesian Posterior Probabilities of Phylogenetic Trees Under Simple and Complex Substitution Models
Syst Biol, December 1, 2004; 53(6): 904 - 913.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (65)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Castoe, T. A.
Right arrow Articles by Parkinson, C. L.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Castoe, T. A.
Right arrow Articles by Parkinson, C. L.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?