Human Mutation/substitution Rate: Variability, Modeling And Applications

dc.contributor.advisorBenjamin F. Voight
dc.contributor.authorAggarwala, Varun
dc.date2023-05-17T19:21:50.000
dc.date.accessioned2023-05-22T16:58:47Z
dc.date.available2020-01-13T00:00:00Z
dc.date.copyright2018-02-23T20:16:00-08:00
dc.date.issued2016-01-01
dc.date.submitted2018-02-23T12:42:06-08:00
dc.description.abstractMutation generates genetic variation, and selection in turn purges deleterious variants from the population. Understanding both is critical for discovering the causal genes and variants behind diseases and for making inferences about evolutionary processes. The human mutation rate varies significantly across the genome, yet most studies have considered only the nucleotides immediately flanking the polymorphic site when modeling patterns of variability. The impact of the larger sequence context has not been fully clarified, even though it substantially influences rates of mutation. In the first part of this thesis, I develop a novel statistical framework and, using data from the 1000 Genomes Project, demonstrate that a larger heptanucleotide sequence context explains >81% of the variability in substitution probabilities, discovering novel mutation-promoting motifs at ApT dinucleotides, CAAT, and TACG sequences. My approach also reveals previously undocumented variability in C-to-T substitutions at CpG sites that is not immediately explained by differential methylation intensity. Building on this framework, I model the selective forces acting on the coding genome and develop statistical scores that measure intolerance to functional variants at the gene or amino-acid level. I demonstrate the clinical utility of such intolerance scores in identifying genes associated with multiple human diseases, including autism. Next, I apply these lessons of mutation rate variability to develop an algorithm that detects sub-genic enrichment of de novo germline mutations in the RB1 gene of bilateral retinoblastoma (RB) probands, to further elucidate disease biology. I demonstrate that previously noted ‘hotspots’ of nonsense mutations in RB1 are compatible with the elevated mutation rates expected at CpG sites, refuting a specific mechanism in RB pathogenesis.
I also find enrichment of splice-site donor mutations at exons 6 and 12 but depletion at exon 5, indicative of previously unappreciated heterogeneity in penetrance within this class of substitution. Finally, I generate more accurate and informative estimates of the de novo germline mutation rate in humans and develop a toolkit to simulate, distribute, and interpret mutations in human diseases. Overall, my research uncovers novel variability in the human mutation rate and provides a systematic framework for analyzing mutational data, with applications ranging from causal gene discovery to the elucidation of specific disease mechanisms.
dc.description.degreeDoctor of Philosophy (PhD)
dc.format.extent173 p.
dc.format.mimetypeapplication/pdf
dc.identifier.urihttps://repository.upenn.edu/handle/20.500.14332/29046
dc.languageen
dc.legacy.articleid3944
dc.legacy.fulltexturlhttps://repository.upenn.edu/cgi/viewcontent.cgi?article=3944&context=edissertations&unstamped=1
dc.provenanceReceived from ProQuest
dc.relation.urlhttps://repository.upenn.edu/cgi/viewcontent.cgi?filename=0&article=3944&context=edissertations&type=additional
dc.rightsVarun Aggarwala
dc.source.issue2158
dc.source.journalPublicly Accessible Penn Dissertations
dc.source.statuspublished
dc.subject.otherComputational Biology
dc.subject.otherEvolutionary Biology
dc.subject.otherHuman Genetics
dc.subject.otherMutation Rate
dc.subject.otherStatistical Genetics
dc.subject.otherBioinformatics
dc.subject.otherGenetics
dc.titleHuman Mutation/substitution Rate: Variability, Modeling And Applications
dc.typeDissertation/Thesis
digcom.contributor.authorisAuthorOfPublication|email:varunaggarwala01@gmail.com|institution:University of Pennsylvania|Aggarwala, Varun
digcom.date.embargo2020-01-13T00:00:00-08:00
digcom.identifieredissertations/2158
digcom.identifier.contextkey11636617
digcom.identifier.submissionpathedissertations/2158
digcom.typedissertation
dspace.entity.typePublication
relation.isAuthorOfPublicationf446b072-8d7d-416d-aca9-7fb6f23f57dc
relation.isAuthorOfPublication.latestForDiscoveryf446b072-8d7d-416d-aca9-7fb6f23f57dc
upenn.graduate.groupGenomics & Computational Biology
Files
Original bundle
Name: Aggarwala_upenngdas_0175C_12452.pdf
Size: 5.11 MB
Format: Adobe Portable Document Format
Name: 0-Supplementary_Files.xlsx
Size: 16.46 MB
Format: Microsoft Excel XML