Statistics Papers

Document Type

Journal Article

Date of this Version

5-7-2009

Publication Source

Journal of Theoretical Biology

Volume

258

Issue

1

Start Page

95

Last Page

102

DOI

10.1016/j.jtbi.2009.01.007

Abstract

Phylogenetic trees describe the evolutionary history of a group of present-day species from a common ancestor. These trees are typically reconstructed from aligned DNA sequence data. In this paper we analytically address the following question: is the amount of sequence data required to accurately reconstruct a tree significantly more than the amount required to test whether or not a candidate tree is the ‘true’ tree? By ‘significantly’, we mean that the two quantities do not behave the same way as a function of the number of species being considered. We prove that, for a certain type of model, the amounts of information required for the two tasks are not significantly different. For another type of model, however, the information required to test a tree is independent of the number of leaves, whereas the information required to reconstruct it grows with this number. Our results combine probabilistic and combinatorial arguments.

Copyright/Permission Statement

© 2009. This manuscript version is made available under the CC-BY-NC-ND 4.0 license.

Keywords

phylogenetic tree, information content, sequence length, reconstruction


Date Posted: 27 November 2017

This document has been peer reviewed.