Distorted Metrics on Trees and Phylogenetic Forests

Penn collection
Statistics Papers
Discipline
Subject
Markov processes
biology computing
evolution (biological)
polynomials
trees (mathematics)
distorted metrics
general Markov model
pairwise distances
phylogenetic forests
phylogenetic reconstruction
phylogeny
polynomial time
trees
binary trees
biological system modeling
character generation
computational complexity
helium
phylogeny
polynomials
reconstruction algorithms
sequences
upper bound
CFN
jukes-cantor
phylogenetics
distortion
forest
metric
tree
algorithms
animals
computational biology
genome
humans
models, genetic
mutation
phylogeny
Bioinformatics
Computational Biology
Statistics and Probability
Author
Mossel, Elchanan
Abstract

We study distorted metrics on binary trees in the context of phylogenetic reconstruction. Given a binary tree $T$ on $n$ leaves with a path metric $d$, consider the pairwise distances $\{d(u, v)\}$ between leaves. It is well known that these determine the tree and the $d$-length of all edges. Here we consider distortions $\hat{d}$ of $d$ such that for all leaves $u$ and $v$ it holds that $|d(u, v) - \hat{d}(u, v)| < f/2$ if either $d(u, v) < M$ or $\hat{d}(u, v) < M$, where $d$ satisfies $f \le d(e) \le g$ for all edges $e$. Given such distortions, we show how to reconstruct in polynomial time a forest $T_1, \ldots, T_\alpha$ such that the true tree $T$ may be obtained from that forest by adding $\alpha - 1$ edges and $\alpha - 1 \le 2^{-\Omega(M/g)} n$. Metric distortions arise naturally in phylogeny, where $d(u, v)$ is defined by the log-det of a covariance matrix associated with $u$ and $v$. When $u$ and $v$ are “far”, the entries of the covariance matrix are small, and therefore $\hat{d}(u, v)$, which is defined by the log-det of an associated empirical correlation matrix, may be a bad estimate of $d(u, v)$ even if the correlation matrix is “close” to the covariance matrix. Our metric results are used to show how to reconstruct phylogenetic forests with a small number of trees from sequences of length logarithmic in the size of the tree. Our method also yields an independent proof that phylogenetic trees can be reconstructed in polynomial time from sequences of polynomial length under the standard assumptions in phylogeny. Both the metric result and its applications to phylogeny are almost tight.
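The distortion condition stated in the abstract can be illustrated with a minimal Python sketch. It is not the paper's reconstruction algorithm; it only checks whether an estimated metric qualifies as a distortion of a true path metric for given M and f. The function name `is_distortion`, the dictionary representation of the metrics, and the four-leaf example tree are all assumptions made for illustration.

```python
from itertools import combinations


def is_distortion(d, d_hat, leaves, M, f):
    """Check the abstract's condition: |d - d_hat| < f/2 for every leaf pair
    whose true OR estimated distance is below M. Pairs that are far under
    both metrics are unconstrained. (Hypothetical helper, not from the paper.)
    """
    for u, v in combinations(leaves, 2):
        key = frozenset((u, v))
        true_dist, est_dist = d[key], d_hat[key]
        if (true_dist < M or est_dist < M) and abs(true_dist - est_dist) >= f / 2:
            return False
    return True


def metric(pairs):
    """Build a symmetric lookup keyed by unordered leaf pairs."""
    return {frozenset(p): dist for p, dist in pairs.items()}


# Toy quartet ((a, b), (c, e)) with every edge of length 1, so f = g = 1.
leaves = ["a", "b", "c", "e"]
d = metric({("a", "b"): 2.0, ("c", "e"): 2.0,
            ("a", "c"): 3.0, ("a", "e"): 3.0,
            ("b", "c"): 3.0, ("b", "e"): 3.0})

# Short pairs are estimated accurately; far pairs are badly overestimated,
# which the condition allows because both distances are at least M.
d_hat_ok = metric({("a", "b"): 2.2, ("c", "e"): 1.9,
                   ("a", "c"): 10.0, ("a", "e"): 7.5,
                   ("b", "c"): 9.0, ("b", "e"): 8.0})

# Same estimates, except one far pair is reported as close, which violates
# the condition because the estimated distance falls below M.
d_hat_bad = dict(d_hat_ok)
d_hat_bad[frozenset(("a", "c"))] = 1.0

M, f = 2.5, 1.0
ok = is_distortion(d, d_hat_ok, leaves, M, f)
bad = is_distortion(d, d_hat_bad, leaves, M, f)
```

In this sketch the condition is one-sided in the useful direction: far pairs may be estimated arbitrarily poorly as long as they are never reported as close, which matches the abstract's remark that log-det estimates degrade for distant leaves.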

Publication date
2007-01-01
Journal title
IEEE/ACM Transactions on Computational Biology and Bioinformatics