Statistics Papers

Document Type

Journal Article

Publication Source

IEEE/ACM Transactions on Computational Biology and Bioinformatics






We study distorted metrics on binary trees in the context of phylogenetic reconstruction. Given a binary tree T on n leaves with a path metric d, consider the pairwise distances {d(u, v)} between leaves. It is well known that these distances determine the tree and the d-length of every edge. Here we consider distortions d̂ of d such that, for all leaves u and v, |d(u, v) − d̂(u, v)| < f/2 whenever d(u, v) < M or d̂(u, v) < M, where d satisfies f ≤ d(e) ≤ g for all edges e. Given such a distortion, we show how to reconstruct in polynomial time a forest T1, . . . , Tα such that the true tree T may be obtained from that forest by adding α − 1 edges, where α − 1 ≤ 2^(−Ω(M/g)) n.
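The distortion condition can be made concrete on a toy instance. The following is a minimal sketch, not the reconstruction algorithm of the paper: it computes the leaf path metric of a small weighted tree and tests whether a candidate distorted metric stays within f/2 of it on every pair that either metric reports as shorter than M. The function names `path_metric` and `is_valid_distortion`, the four-leaf tree, and the numeric values are all illustrative assumptions.

```python
def path_metric(edges, leaves):
    """All-pairs leaf distances in a weighted tree given as (a, b, length) edges."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    dist = {}
    for src in leaves:
        # Depth-first traversal; in a tree every node is reached by a unique path.
        seen = {src: 0.0}
        stack = [src]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + w
                    stack.append(nxt)
        for dst in leaves:
            if src < dst:
                dist[(src, dst)] = seen[dst]
    return dist


def is_valid_distortion(d, d_hat, f, M):
    """Check |d - d_hat| < f/2 on every pair with d < M or d_hat < M."""
    for pair, true_dist in d.items():
        est = d_hat[pair]
        if (true_dist < M or est < M) and abs(true_dist - est) >= f / 2:
            return False
    return True


# Hypothetical quartet tree: leaves a, b hang off x; leaves c, d hang off y.
# Every edge has length 1, so f = g = 1 in the notation of the abstract.
edges = [("a", "x", 1.0), ("b", "x", 1.0), ("x", "y", 1.0),
         ("c", "y", 1.0), ("d", "y", 1.0)]
d = path_metric(edges, ["a", "b", "c", "d"])

# Short pairs are estimated well; long pairs may be arbitrarily wrong.
d_hat_ok = {("a", "b"): 2.2, ("c", "d"): 1.9, ("a", "c"): 10.0,
            ("a", "d"): 10.0, ("b", "c"): 10.0, ("b", "d"): 10.0}
d_hat_bad = dict(d_hat_ok)
d_hat_bad[("a", "b")] = 2.6  # off by more than f/2 on a short pair

ok = is_valid_distortion(d, d_hat_ok, f=1.0, M=2.5)
bad = is_valid_distortion(d, d_hat_bad, f=1.0, M=2.5)
```

With M = 2.5 only the two cherries (a, b) and (c, d) count as short, so the wildly wrong estimates for the cross pairs do not violate the condition. This is the sense in which the distorted metric is trusted locally only.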

Metric distortions arise naturally in phylogeny, where d(u, v) is defined by the log-det of a covariance matrix associated with u and v. When u and v are “far”, the entries of the covariance matrix are small, and therefore d̂(u, v), which is defined by the log-det of an associated empirical correlation matrix, may be a bad estimate of d(u, v) even if the correlation matrix is “close” to the covariance matrix.
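To illustrate why far pairs are hard to estimate, here is a minimal sketch under a simplifying assumption: the two-state symmetric (CFN) model with states encoded as +1 and −1, where the correlation E[x_u x_v] is the product of (1 − 2p_e) over the edges on the path and the distance is taken as its negative logarithm. This is a scalar stand-in for the log-det distance described above, not the general definition. The function name `empirical_distance` and the sample sequences are hypothetical.

```python
import math


def empirical_distance(seq_u, seq_v):
    """Estimate d(u, v) = -log E[x_u * x_v] from aligned +/-1 sequences."""
    if len(seq_u) != len(seq_v) or not seq_u:
        raise ValueError("sequences must be non-empty and of equal length")
    # Empirical correlation: average of the site-wise products.
    corr = sum(a * b for a, b in zip(seq_u, seq_v)) / len(seq_u)
    if corr <= 0:
        # The logarithm is undefined; the pair looks "infinitely far".
        return math.inf
    return -math.log(corr)


seq_u = [1, 1, 1, 1, -1, -1, -1, -1]
seq_v = [1, 1, 1, -1, -1, -1, -1, 1]   # agrees with seq_u at 6 of 8 sites
seq_w = [1, -1, 1, -1, 1, -1, 1, -1]   # agrees with seq_u at 4 of 8 sites

near = empirical_distance(seq_u, seq_v)  # correlation 0.5
far = empirical_distance(seq_u, seq_w)   # correlation 0.0
```

When the true correlation is close to zero, a small sampling error in the empirical correlation produces a large (possibly unbounded) error in the estimated distance. This is why the distortion bound is only assumed for pairs below the threshold M.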

We use our metric results to show how to reconstruct phylogenetic forests with a small number of trees from sequences whose length is logarithmic in the size of the tree. Our method also yields an independent proof that phylogenetic trees can be reconstructed in polynomial time from sequences of polynomial length under the standard assumptions in phylogeny. Both the metric result and its applications to phylogeny are almost tight.


Markov processes, biology computing, evolution (biological), polynomials, trees (mathematics), distorted metrics, general Markov model, pairwise distances, phylogenetic forests, phylogenetic reconstruction, phylogeny, polynomial time, trees, binary trees, biological system modeling, character generation, computational complexity, helium, reconstruction algorithms, sequences, upper bound, CFN, Jukes-Cantor, phylogenetics, distortion, forest, metric, tree, algorithms, animals, computational biology, genome, humans, genetic models, mutation



Date Posted: 27 November 2017

This document has been peer reviewed.