Efficient scientific data management over trees

Yifeng Zheng, University of Pennsylvania


Fueled by novel technologies capable of producing massive amounts of data, scientists face an explosion of information that must be rapidly analyzed and integrated with other data to form hypotheses and create knowledge. Success in science now hinges critically on the availability of computational and data management tools that meet these challenges. Michael Stonebraker recently argued that the traditional database concept of “one size fits all”, which provides a single strategy for managing data across all applications, is no longer applicable in the database market. Nowhere is this truer than with scientific data, which differs significantly from the business data for which current database technology was developed. My research focuses on the management of tree-structured scientific data, that is, scientific data that models an inherently hierarchical process or object. Because XML is itself hierarchical, it has become a common scientific data format (http://xml.gsfc.nasa.gov). However, XML's standard query languages, XPath and XQuery, are not well suited to many scientific applications, in particular those in computational linguistics and those that work with phylogenetic trees. I have devoted a significant portion of my research effort to supporting these two types of scientific applications efficiently. Specifically, I have studied and summarized the operations (queries) commonly performed on the data, analyzed why XML techniques cannot be easily applied, and designed and implemented data management systems for both types of applications.

Subject Area

Computer science

Recommended Citation

Zheng, Yifeng, "Efficient scientific data management over trees" (2007). Dissertations available from ProQuest. AAI3261013.