Identifiability and Inference of Non-Parametric Rates-Across-Sites Models on Large-Scale Phylogenies

Penn collection
Statistics Papers
Subject
phylogenetic reconstruction
rates-across-sites models
concentration of measure
Biostatistics
Statistics and Probability
Author
Mossel, Elchanan
Roch, Sebastien
Abstract

Mutation rate variation across loci is well known to cause difficulties, notably identifiability issues, in the reconstruction of evolutionary trees from molecular sequences. Here we introduce a new approach for estimating general rates-across-sites models. Our results imply, in particular, that large phylogenies are typically identifiable under rate variation. We also derive sequence-length requirements for high-probability reconstruction. Our main contribution is a novel algorithm that clusters sites according to their mutation rate. Following this site clustering step, standard reconstruction techniques can be used to recover the phylogeny. Our results rely on a basic insight: that, for large trees, certain site statistics experience concentration-of-measure phenomena.
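
The concentration idea in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' algorithm. It assumes a two-state symmetric (CFN) model, a tree with many cherries of equal branch length, and two hypothetical rate classes. The function names, the parameter values, and the "largest gap" clustering rule are illustrative choices that do not come from the paper. For each site, the statistic is the fraction of cherries whose two leaves disagree. With many cherries this fraction concentrates near a value that increases with the site's rate, so sites separate into groups.

```python
import math
import random


def disagreement_fraction(rate, n_cherries, branch_length, rng):
    """Fraction of cherries whose two leaves disagree at a single site."""
    # Assumed CFN convention: two leaves joined by a path of length `path`
    # disagree with probability (1 - exp(-2 * rate * path)) / 2.
    path = 2.0 * branch_length
    p_disagree = 0.5 * (1.0 - math.exp(-2.0 * rate * path))
    # Disjoint cherries use disjoint edges, so their indicators are
    # independent once the site's rate is fixed.
    hits = sum(1 for _ in range(n_cherries) if rng.random() < p_disagree)
    return hits / n_cherries


def split_at_largest_gap(stats):
    """Assign label 0 or 1 by cutting the sorted statistics at their widest gap."""
    ordered = sorted(stats)
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    _, i = max(gaps)
    threshold = 0.5 * (ordered[i] + ordered[i + 1])
    return [0 if s < threshold else 1 for s in stats]


def clustering_accuracy(n_cherries, n_sites=200, rates=(0.5, 2.0),
                        branch_length=0.1, seed=0):
    """Simulate sites from two hypothetical rate classes and score the clustering."""
    rng = random.Random(seed)
    truth = [rng.randrange(2) for _ in range(n_sites)]  # 0 = slow, 1 = fast
    stats = [disagreement_fraction(rates[z], n_cherries, branch_length, rng)
             for z in truth]
    predicted = split_at_largest_gap(stats)
    correct = sum(1 for t, p in zip(truth, predicted) if t == p)
    return correct / n_sites
```

Calling `clustering_accuracy(2000)` and then `clustering_accuracy(20)` gives a rough feel for why tree size matters: with few cherries the per-site statistics of the two classes overlap, and a simple cut no longer separates them reliably.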

Publication date
2013-10-01
Journal title
Journal of Mathematical Biology