Statistics Papers

Document Type

Journal Article

Date of this Version

10-2013

Publication Source

Journal of Mathematical Biology

Volume

67

Issue

4

Start Page

767

Last Page

797

DOI

10.1007/s00285-012-0571-4

Abstract

Mutation rate variation across loci is well known to cause difficulties, notably identifiability issues, in the reconstruction of evolutionary trees from molecular sequences. Here we introduce a new approach for estimating general rates-across-sites models. Our results imply, in particular, that large phylogenies are typically identifiable under rate variation. We also derive sequence-length requirements for high-probability reconstruction. Our main contribution is a novel algorithm that clusters sites according to their mutation rate. Following this site clustering step, standard reconstruction techniques can be used to recover the phylogeny. Our results rely on a basic insight: that, for large trees, certain site statistics experience concentration-of-measure phenomena.

Copyright/Permission Statement

The final publication is available at Springer via http://dx.doi.org/10.1007/s00285-012-0571-4.

Keywords

phylogenetic reconstruction, rates-across-sites models, concentration of measure

Included in

Biostatistics Commons

Date Posted: 27 November 2017

This document has been peer reviewed.