Departmental Papers (CIS)

Date of this Version

September 2006

Document Type

Conference Paper


Postprint version. Published in 32nd International Conference on Very Large Data Bases, ACM 2006, September 12-15, 2006, pages 1231-1234. Publisher URL:


Evolutionary and systems biology increasingly rely on the construction of large phylogenetic trees that represent the relationships between species of interest. As the number and size of such trees increase, so does the need for efficient data storage and query capabilities. Although much attention has been focused on XML as a tree data model, phylogenetic trees differ from document-oriented applications in their size and depth, and in their need for structure-based rather than path-based queries.

This paper focuses on Crimson, a tree storage system for phylogenetic trees used to evaluate phylogenetic tree reconstruction algorithms within the context of the NSF CIPRes project. A goal of the modeling component of the CIPRes project is to construct a huge simulation tree representing a "gold standard" of evolutionary history against which phylogenetic tree reconstruction algorithms can be tested.

In this demonstration, we highlight our storage and indexing strategies and show how Crimson is used for benchmarking phylogenetic tree reconstruction algorithms. We also show how our design can be used to support more general queries over phylogenetic trees.


Keywords

databases, phylogenetics



Date Posted: 09 February 2007

This document has been peer reviewed.