Date of Award

Winter 2009

Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Genomics & Computational Biology

First Advisor

Kim A. Sharp


The surface of a macromolecule, such as a protein, is the contact point for any interaction that molecule has with solvent, ions, small molecules, or other macromolecules. Analyzing the surfaces of macromolecules has a rich history, but analyzing the distances from a surface to other surfaces or volumes has not been extensively explored. Many important questions can be answered quantitatively through such analyses. These include: What is the depth of a pocket or groove on the surface? What is the overall depth of the protein? How deeply are atoms buried from the surface? Where are the tunnels in a protein? Where are the pockets, and what are their shapes? A single algorithm for one graph problem, Dijkstra’s shortest-paths algorithm, forms the basis for the algorithms that answer these questions. Many distances can be measured. For instance, the distance from the convex hull to the molecular surface, measured along paths that avoid the interior of the surface, is defined as Travel Depth. Alternatively, the distance from the surface to every atom can be measured, giving the Burial Depth of given residues. Measuring the minimum distance to the protein surface for all points in solvent, combined with topological guidance, allows tunnels to be located. Analyzing the surface from the deepest Travel Depth upwards allows pockets to be catalogued over the entire protein surface for additional shape analysis. Ligand binding sites in proteins are significantly deep, though this depth does not affect binding affinity. Hyperthermostable proteins have a less deep surface but bury atoms more deeply, forming more spherical shapes than their mesophilic counterparts. Tunnels through proteins can be identified, and for the first time tunnels that are winding or bifurcated can be analyzed. Pockets can be found all over the protein surface, and these pockets can be tracked through time series, mutational series, or across protein families.
All of these results are new and for the first time provide quantitative and statistical verification of some previous hypotheses about protein shape.
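
To illustrate how a shortest-paths computation can yield a depth measure of the kind the abstract calls Travel Depth, the following is a minimal sketch and not the implementation used in the dissertation. It assumes a toy 2D grid rather than a 3D molecular surface, 8-connected moves with Euclidean step costs, and a hypothetical function name `travel_depth`. Cells marked `#` stand in for the molecular interior, and the source cells stand in for points on the convex hull.

```python
import heapq
import math


def travel_depth(grid, sources):
    """Multi-source Dijkstra over the open cells of a 2D grid.

    grid:    list of equal-length strings; '#' marks a blocked cell
             (a stand-in for the molecular interior), anything else is open.
    sources: iterable of (row, col) open cells that start at depth 0
             (a stand-in for points on the convex hull).
    Returns a dict mapping each reachable open cell to the length of the
    shortest obstacle-avoiding path from its nearest source.
    """
    rows, cols = len(grid), len(grid[0])
    # 8-connected moves: cost 1 for axis steps, sqrt(2) for diagonal steps.
    # Simplifying assumption: diagonal steps may cut past blocked corners.
    steps = [(dr, dc, math.hypot(dr, dc))
             for dr in (-1, 0, 1) for dc in (-1, 0, 1)
             if (dr, dc) != (0, 0)]

    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in dist]
    heapq.heapify(heap)

    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist.get((r, c), math.inf):
            continue  # stale queue entry; a shorter path was already found
        for dr, dc, w in steps:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if grid[nr][nc] == '#':
                continue  # paths may not enter the interior
            nd = d + w
            if nd < dist.get((nr, nc), math.inf):
                dist[(nr, nc)] = nd
                heapq.heappush(heap, (nd, (nr, nc)))
    return dist


# Toy example: a narrow pocket that opens onto the top row.
grid = [
    ".....",
    "#.###",
    "#.###",
    "#...#",
    "#####",
]
hull = [(0, c) for c in range(5)]
depth = travel_depth(grid, hull)
deepest = max(depth, key=depth.get)  # cell at the bottom of the pocket
```

Seeding the queue with every hull cell at distance zero means one pass gives each open cell its distance to the nearest hull point. Choosing different sources and a different set of allowed cells in the same routine would give the other distance measures the abstract describes.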
