© 2008 Society of Systematic Biologists
A Model-Based Approach to Study Nearest-Neighbor Influences Reveals Complex Substitution Patterns in Non-coding Sequences
1 Department of Applied Mathematics and Computer Science, Ghent University, Krijgslaan 281 S9, B-9000 Ghent, Belgium
2 Department of Plant Systems Biology, VIB, Technologiepark 927, B-9052 Ghent, Belgium; E-mail: yves.vandepeer{at}psb.ugent.be (Y.V.d.P.)
3 Bioinformatics and Evolutionary Genomics, Department of Molecular Genetics, Ghent University, B-9052 Ghent, Belgium
Edited by Olivier Gascuel
Abstract
In this article, we present a likelihood-based framework for modeling site dependencies. Our approach builds upon standard evolutionary models but incorporates site dependencies across the entire tree by letting the evolutionary parameters in these models depend upon the ancestral states at the neighboring sites. It thus avoids the need for introducing new and high-dimensional evolutionary models for site-dependent evolution. We propose a Markov chain Monte Carlo approach with data augmentation to infer the evolutionary parameters under our model. Although our approach allows for wide-ranging site dependencies, we illustrate its use, in two non-coding datasets, in the case of nearest-neighbor dependencies (i.e., evolution directly depending only upon the immediate flanking sites). The results reveal that the general time-reversible model with nearest-neighbor dependencies substantially improves the fit to the data as compared to the corresponding model with site independence. Using the parameter estimates from our model, we elaborate on the importance of the 5-methylcytosine deamination process (i.e., the CpG effect) and show that this process also depends upon the 5' neighboring base identity. We hint at the possibility of a so-called TpA effect and show that the observed substitution behavior is very complex in the light of dinucleotide estimates. We also discuss the presence of CpG effects in a nuclear small subunit dataset and find significant evidence that evolutionary models incorporating context-dependent effects perform substantially better than independent-site models and in some cases even outperform models that incorporate varying rates across sites.
Keywords: Bayes factor; context effect; context-dependent evolution; CpG effect; likelihood function; Markov chain Monte Carlo; nearest-neighbor influences; thermodynamic integration
Received January 4, 2008; Revised March 31, 2008; Accepted June 17, 2008
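The nearest-neighbor dependence described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes one general time-reversible (GTR) rate matrix per (left neighbor, right neighbor) context, giving 16 contexts in total. The function names `gtr_rate_matrix` and `context_models`, the uniform base frequencies, and the elevated C/T exchangeability used to mimic a CpG-like effect are all hypothetical choices made for illustration only.

```python
from itertools import product

BASES = "ACGT"
PAIRS = ("AC", "AG", "AT", "CG", "CT", "GT")


def gtr_rate_matrix(exch, freqs):
    """Build a 4x4 GTR rate matrix.

    exch  : dict mapping each unordered base pair (e.g. "CT") to a
            symmetric exchangeability parameter.
    freqs : list of four equilibrium base frequencies in A, C, G, T order.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                key = "".join(sorted(x + y))
                q[i][j] = exch[key] * freqs[j]
        # The diagonal is still zero here, so this makes the row sum to zero.
        q[i][i] = -sum(q[i])
    return q


def context_models(default_exch, freqs, overrides):
    """Return one GTR rate matrix per (left, right) neighbor context.

    overrides maps a context tuple to the exchangeabilities that differ
    from the defaults in that context (illustrative assumption).
    """
    models = {}
    for left, right in product(BASES, repeat=2):
        exch = dict(default_exch)
        exch.update(overrides.get((left, right), {}))
        models[(left, right)] = gtr_rate_matrix(exch, freqs)
    return models


# Hypothetical values: equal exchangeabilities everywhere, except that the
# C/T exchangeability is raised whenever the 3' neighbor is G.
default = {pair: 1.0 for pair in PAIRS}
freqs = [0.25, 0.25, 0.25, 0.25]
overrides = {(left, "G"): {"CT": 5.0} for left in BASES}
models = context_models(default, freqs, overrides)
```

In this sketch, the rate matrix governing a site would be looked up from the ancestral states at its two flanking sites, for example `models[("A", "G")]`. Inferring the context-specific parameters, which the article does with Markov chain Monte Carlo and data augmentation, is outside the scope of this illustration.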
