Date of Award: 2013
Doctor of Philosophy (PhD)
Genomics & Computational Biology
William F. DeGrado
Jeff G. Saven
Structural biologists currently confront serious challenges in the effective interpretation of experimental data due to two contradictory situations: a severe lack of structural data for certain classes of proteins, and an incredible abundance of data for other classes. The challenge with small data sets is how to extract sufficient information to draw meaningful conclusions, while the challenge with large data sets is how to curate, categorize, and search the data to allow for its meaningful interpretation and application to scientific problems. Here, we develop computational strategies to address both sparse and abundant data sets. In the category of sparse data sets, we focus our attention on the problem of transmembrane (TM) protein structure determination. As X-ray crystallography and NMR data are notoriously difficult to obtain for TM proteins, we develop a novel algorithm that uses low-resolution data from protein cross-linking or scanning mutagenesis studies to produce models of TM helix oligomers. We show that, for a test set of known TM proteins, our method produces models with an accuracy on par with X-ray crystallography or NMR. Turning to instances of data abundance, we examine how to mine the vast stores of protein structural data in the Protein Data Bank (PDB) to aid in the design of proteins with novel binding properties. We show how the identification of an anion binding motif in an antibody structure allowed us to develop a phosphate binding module that can be used to produce novel antibodies to phosphorylated peptides, and we illustrate the utility of our approach by creating antibodies to 7 novel phospho-peptides. We then describe a general strategy for designing binders to a target protein epitope based upon recapitulating protein interaction geometries that are over-represented in the PDB.
We follow this by using data describing the transition probabilities of amino acids to develop a novel set of degenerate codons that yield more efficient gene libraries. We conclude by describing a novel, real-time, all-atom structural search engine, giving researchers the ability to quickly search known protein structures for a motif of interest and providing a new interactive paradigm of protein design.
Hannigan, Brett Thomas, "Navigating the Extremes of Biological Datasets for Reliable Structural Inference and Design" (2013). Publicly Accessible Penn Dissertations. 871.