Navigating the Extremes of Biological Datasets for Reliable Structural Inference and Design

Hannigan, Brett Thomas

Files

Hannigan_upenngdas_0175C_10984.pdf (7.22 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Genomics & Computational Biology

Subject

computational biology
degenerate codons
gene libraries
protein design
protein engineering
structural search
Bioinformatics
Biophysics

Copyright date

2014-08-22T00:00:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/32654

Author

Hannigan, Brett Thomas

Abstract

Structural biologists currently confront serious challenges in interpreting experimental data because of two contrasting situations: a severe lack of structural data for certain classes of proteins, and an overwhelming abundance of data for other classes. The challenge with small data sets is how to extract sufficient information to draw meaningful conclusions, while the challenge with large data sets is how to curate, categorize, and search the data so that they can be meaningfully interpreted and applied to scientific problems. Here, we develop computational strategies to address both sparse and abundant data sets. In the category of sparse data sets, we focus on the problem of transmembrane (TM) protein structure determination. Because X-ray crystallography and NMR data are notoriously difficult to obtain for TM proteins, we develop a novel algorithm that uses low-resolution data from protein cross-linking or scanning mutagenesis studies to produce models of TM helix oligomers, and we show that our method produces models with an accuracy on par with X-ray crystallography or NMR for a test set of known TM proteins. Turning to instances of data abundance, we examine how to mine the vast stores of protein structural data in the Protein Data Bank (PDB) to aid in the design of proteins with novel binding properties. We show how the identification of an anion binding motif in an antibody structure allowed us to develop a phosphate binding module that can be used to produce novel antibodies to phosphorylated peptides, and we illustrate the utility of this approach by creating antibodies to 7 novel phospho-peptides. We then describe a general strategy for designing binders to a target protein epitope by recapitulating protein interaction geometries that are over-represented in the PDB. Next, we use data describing the transition probabilities of amino acids to develop a novel set of degenerate codons for creating more efficient gene libraries. We conclude by describing a novel, real-time, all-atom structural search engine that lets researchers quickly search known protein structures for a motif of interest and provides a new interactive paradigm for protein design.

Advisor

William F. DeGrado
Jeff G. Saven

Date of degree

2013-01-01

Collection

Dissertations and Theses