## Statistics Papers

#### Document Type

Journal Article

#### Date of this Version

2007

#### Publication Source

IEEE/ACM Transactions on Computational Biology and Bioinformatics

#### Volume

4

#### Issue

1

#### Start Page

108

#### Last Page

116

#### DOI

10.1109/TCBB.2007.1010

#### Abstract

We study distorted metrics on binary trees in the context of phylogenetic reconstruction. Given a binary tree *T* on *n* leaves with a path metric *d*, consider the pairwise distances {*d*(*u*, *v*)} between leaves. It is well known that these determine the tree and the *d*-length of all edges. Here we consider distortions *d̂* of *d* such that for all leaves *u* and *v* it holds that |*d*(*u*, *v*) − *d̂*(*u*, *v*)| < *f*/2 if either *d*(*u*, *v*) < *M* or *d̂*(*u*, *v*) < *M*, where *d* satisfies *f* ≤ *d*(*e*) ≤ *g* for all edges *e*. Given such distortions, we show how to reconstruct in polynomial time a forest *T*_{1}, …, *T*_{α} such that the true tree *T* may be obtained from that forest by adding *α* − 1 edges and *α* − 1 ≤ 2^{−Ω(*M*/*g*)}*n*.
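The distortion condition stated in the abstract can be checked mechanically on a small example. The following is a minimal sketch, not the paper's reconstruction algorithm: the function name `satisfies_distortion`, the dictionary representation of the metrics, and the four-leaf example with unit edge lengths are all assumptions made for illustration.

```python
from itertools import combinations


def satisfies_distortion(d, d_hat, leaves, f, M):
    """Check the abstract's condition: |d - d_hat| < f/2 on every leaf pair
    whose distance is below M under either d or d_hat.

    d and d_hat map leaf pairs (u, v), in the order produced by
    itertools.combinations(leaves, 2), to distances. This representation
    is an illustrative assumption.
    """
    for pair in combinations(leaves, 2):
        true_dist = d[pair]
        est_dist = d_hat[pair]
        is_short = true_dist < M or est_dist < M
        if is_short and abs(true_dist - est_dist) >= f / 2:
            return False
    return True


# Hypothetical quartet tree ab|cd with every edge of length 1 (so f = g = 1).
leaves = ["a", "b", "c", "d"]
d = {
    ("a", "b"): 2.0, ("c", "d"): 2.0,
    ("a", "c"): 3.0, ("a", "d"): 3.0,
    ("b", "c"): 3.0, ("b", "d"): 3.0,
}

# Short pairs are estimated accurately; far pairs are arbitrarily wrong,
# which the condition allows because both d and d_hat are at least M there.
d_hat_ok = dict(d)
d_hat_ok.update({("a", "b"): 2.3, ("c", "d"): 1.8, ("a", "c"): 10.0})

# A far pair whose estimate falls below M violates the condition.
d_hat_bad = dict(d)
d_hat_bad[("a", "c")] = 2.0

print(satisfies_distortion(d, d_hat_ok, leaves, f=1.0, M=2.5))
print(satisfies_distortion(d, d_hat_bad, leaves, f=1.0, M=2.5))
```

The point of the example is that the condition constrains only pairs that look short under at least one of the two metrics; estimates for genuinely distant pairs may be arbitrarily poor as long as they do not appear short.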

Metric distortions arise naturally in phylogeny, where *d*(*u*, *v*) is defined by the log-det of a covariance matrix associated with *u* and *v*. When *u* and *v* are “far”, the entries of the covariance matrix are small, and therefore *d̂*(*u*, *v*), which is defined by the log-det of an associated empirical correlation matrix, may be a bad estimate of *d*(*u*, *v*) even if the correlation matrix is “close” to the covariance matrix.
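To illustrate why far pairs are estimated badly, consider the two-state symmetric (CFN) model listed in the keywords. With states encoded as ±1, the correlation between two leaves equals the determinant of the 2×2 transition matrix between them, so the distance is −log of that correlation. The sketch below is an assumption-laden illustration rather than the paper's estimator: the function names `cfn_distance_estimate` and `simulate_pair`, the ±1 encoding, and the choice of sequence length and distances are hypothetical.

```python
import math
import random


def cfn_distance_estimate(seq_u, seq_v):
    """Estimate d(u, v) = -log E[x_u * x_v] from two aligned +/-1 sequences.

    Returns math.inf when the empirical correlation is not positive,
    since the logarithm is then undefined.
    """
    k = len(seq_u)
    corr = sum(a * b for a, b in zip(seq_u, seq_v)) / k
    return -math.log(corr) if corr > 0 else math.inf


def simulate_pair(true_distance, k, rng):
    """Draw k independent sites for two leaves with correlation exp(-true_distance)."""
    # Under the CFN model, correlation = 1 - 2 * P(states differ).
    p_differ = (1 - math.exp(-true_distance)) / 2
    seq_u = [rng.choice((-1, 1)) for _ in range(k)]
    seq_v = [-x if rng.random() < p_differ else x for x in seq_u]
    return seq_u, seq_v


rng = random.Random(0)
k = 200  # hypothetical sequence length
for true_distance in (0.5, 6.0):
    seq_u, seq_v = simulate_pair(true_distance, k, rng)
    print(true_distance, cfn_distance_estimate(seq_u, seq_v))
```

At distance 6.0 the true correlation is about 0.0025, far smaller than the sampling noise of roughly 1/√*k* at this sequence length, so the estimate carries almost no information. At distance 0.5 the correlation is about 0.61 and the estimate is typically close to the true value.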

Our metric results are used to show how to reconstruct phylogenetic forests with a small number of trees from sequences of length logarithmic in the size of the tree. Our method also yields an independent proof that phylogenetic trees can be reconstructed in polynomial time from sequences of polynomial length under the standard assumptions in phylogeny. Both the metric result and its applications to phylogeny are almost tight.

#### Keywords

Markov processes, biology computing, evolution (biological), polynomials, trees (mathematics), distorted metrics, general Markov model, pairwise distances, phylogenetic forests, phylogenetic reconstruction, phylogeny, polynomial time, trees, binary trees, biological system modeling, character generation, computational complexity, helium, reconstruction algorithms, sequences, upper bound, CFN, Jukes-Cantor, phylogenetics, distortion, forest, metric, tree, algorithms, animals, computational biology, genome, humans, models, genetic, mutation

#### Recommended Citation

Mossel, E. (2007). Distorted Metrics on Trees and Phylogenetic Forests. *IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4*(1), 108-116. http://dx.doi.org/10.1109/TCBB.2007.1010

#### Included in

Bioinformatics Commons, Computational Biology Commons, Statistics and Probability Commons

**Date Posted:** 27 November 2017

This document has been peer reviewed.