A New Definition And Classification Of Antibody Complementarity Determining Regions: Unsupervised Learning Of Protein Backbone Conformations Informs Antibody Structural Bioinformatics And Design

Simon Paul Kelow, University of Pennsylvania


One of the main challenges in modern molecular biology is to establish general, robust, and precise descriptions of the relationship between the structural features of molecules (DNA, RNA, proteins, and glycans) and the sequence of their constituent chemical building blocks (nucleotides, amino acids, monosaccharides). In his 1954 Nobel lecture, Linus Pauling predicted that the chemistry of the future would rely upon these descriptions to solve problems in biology and medicine relevant to human health. As of July 8, 2021, X-ray crystallography, NMR, and cryo-EM have solved 179,842 molecular structures, which have been deposited in the Protein Data Bank (PDB) along with their associated sequences. Antibodies are the largest family of protein structures deposited in the PDB, and their importance to human health and to research in molecular biology is widely acknowledged. In this work, I first describe the development and validation of unsupervised learning software to cluster protein backbone conformations (clustering of backbones for Ramachandran analysis, or COBRA). I then describe the application of this software to the wealth of antibody data in the PDB to provide a novel, electron-density-validated classification of the antibody complementarity determining regions (CDRs). I compare this new classification to previous classifications of the CDRs and show three improvements: a stronger association between the sequences and structures of the CDRs, robust separation of the various CDR families, and the ability to assess confidence in the quality of CDR families using electron density as support.
In addition to providing a new classification of the antibody CDRs by clustering their backbone conformations, I provide an expanded definition of the antibody binding region by defining, naming, and classifying an antibody V-region segment named the “DE loop”. The DE loop resembles the other six CDRs in its sequence and structural variability, its ability to bind antigen, and its ability to stabilize antibodies, but it is not currently recognized as a canonical member of the CDRs. Finally, I show examples of implementing these analyses in the RosettaAntibodyDesign (RAbD) software to design antibodies against the SARS-CoV-2 Spike Protein Type 1 (S1) Receptor Binding Domain (RBD), and I present the experimental data for the resulting antibody designs.