© 2007 Society of Systematic Biologists
Pitfalls of Heterogeneous Processes for Phylogenetic Reconstruction
Daniel Štefankovič 1 and Eric Vigoda 2
1 Department of Computer Science, University of Rochester, Rochester, New York 14627, USA; Comenius University, Bratislava. E-mail: stefanko{at}cs.rochester.edu
2 College of Computing, Georgia Institute of Technology, Atlanta, Georgia 30332, USA. E-mail: vigoda{at}cc.gatech.edu
Edited by Jack Sullivan: Associate Editor
Abstract
Different genes often have different phylogenetic histories. Even within regions that share the same phylogenetic history, mutation rates often vary. We investigate the prospects of phylogenetic reconstruction when all characters are generated from the same tree topology but the branch lengths vary (possibly with different tree shapes). Building on the work of Kolaczkowski and Thornton (2004, Nature 431: 980–984) and Chang (1996, Math. Biosci. 134: 189–216), we show examples where maximum likelihood (under a homogeneous model) is an inconsistent estimator of the tree. We then explore the prospects of phylogenetic inference under a heterogeneous model. In some models, there are examples where phylogenetic inference under any method is impossible, despite the fact that there is a common tree topology. In particular, there are nonidentifiable mixture distributions, i.e., multiple topologies generate identical mixture distributions. We address which evolutionary models have nonidentifiable mixture distributions and prove that the following duality theorem holds for most DNA substitution models. The model has either (i) nonidentifiability: two different tree topologies can produce identical mixture distributions, and hence distinguishing between the two topologies is impossible; or (ii) linear tests: there exist linear tests that identify the common tree topology for character data generated by a mixture distribution. The theorem holds for models whose transition matrices can be parameterized by open sets, which include most of the popular models, such as Tamura-Nei and Kimura's 2-parameter model. The duality theorem relies on our notion of linear tests, which are related to Lake's linear invariants.
Keywords: Inconsistency of likelihood; linear invariants; Markov chain; mixture models; Monte Carlo; non-identifiability; phylogenetic invariants; phylogenetics; rate variation; tree identifiability
Received February 6, 2006; Revised May 8, 2006; Accepted September 12, 2006
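To make the notion of a mixture distribution in the abstract concrete, the following is a minimal sketch and not the authors' code. It assumes the two-state symmetric (CFN) model on a four-taxon tree with split 12|34, a choice made here only for brevity; the abstract does not single out this model. All function names (`cfn_edge`, `quartet_pattern_dist`, `mixture`, `apply_linear`) are hypothetical. The sketch computes the site-pattern distribution for one set of branch parameters, forms a mixture of two such sets on the same topology, and applies a linear functional to the result.

```python
from itertools import product


def cfn_edge(p):
    """2x2 transition matrix for a binary symmetric edge (assumed CFN model).

    p is the probability that the state flips along the edge.
    """
    return [[1 - p, p], [p, 1 - p]]


def quartet_pattern_dist(flips):
    """Distribution over the 16 leaf patterns for the quartet tree 12|34.

    flips = (p1, p2, p3, p4, p_int): flip probabilities on the four pendant
    edges and on the internal edge. The root state is assumed uniform.
    """
    e1, e2, e3, e4, e_int = (cfn_edge(p) for p in flips)
    dist = {}
    for leaves in product((0, 1), repeat=4):
        total = 0.0
        # Sum over the states (u, v) of the two hidden internal nodes.
        for u, v in product((0, 1), repeat=2):
            total += (0.5 * e_int[u][v]
                      * e1[u][leaves[0]] * e2[u][leaves[1]]
                      * e3[v][leaves[2]] * e4[v][leaves[3]])
        dist[leaves] = total
    return dist


def mixture(weights, components):
    """Convex combination of pattern distributions on a common topology."""
    return {pat: sum(w * c[pat] for w, c in zip(weights, components))
            for pat in components[0]}


def apply_linear(h, dist):
    """Evaluate the linear functional h (one coefficient per pattern)."""
    return sum(h[pat] * dist[pat] for pat in dist)


if __name__ == "__main__":
    # Two illustrative branch-parameter sets on the same topology.
    a = quartet_pattern_dist((0.1, 0.2, 0.1, 0.2, 0.05))
    b = quartet_pattern_dist((0.3, 0.05, 0.3, 0.05, 0.1))
    m = mixture([0.4, 0.6], [a, b])
    # An arbitrary linear functional: +1 if leaves 1 and 2 agree, else -1.
    h = {pat: (1.0 if pat[0] == pat[1] else -1.0) for pat in a}
    print(apply_linear(h, m))
    print(0.4 * apply_linear(h, a) + 0.6 * apply_linear(h, b))
```

The two printed values agree up to rounding because a linear functional of a mixture equals the weighted average of its values on the components. This is why a linear test, unlike a nonlinear invariant, carries over from single-tree distributions to mixtures: if a functional has the same sign on every unmixed distribution of a topology, it keeps that sign on any mixture of them.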