Discovering Pathway And Cell Type Signatures In Transcriptomic Compendia With Machine Learning

Degree type
Doctor of Philosophy (PhD)
Graduate group
Genomics & Computational Biology
Discipline
Subject
Cancer
Gene Expression
Machine Learning
Biology
Computer Sciences
Genetics
Copyright date
2019-08-27T20:19:00-07:00
Abstract

Gene expression measurements capture downstream biological responses to molecular perturbations. This systems biology perspective can be investigated with both supervised and unsupervised machine learning approaches that rapidly derive insight, including cell type and pathway signatures, from transcriptomic compendia. Machine learning applied to transcriptomic compendia can aid in biological discovery, hypothesis generation, and precision medicine. We introduce these topics and discuss their impact in Chapter 1. In Chapters 2-4, we describe and extend a supervised learning approach to detect aberrant gene and pathway activity in cancer. We apply this approach to identify patient tumors, cell lines, and patient-derived xenograft models with TP53 loss of function, Ras signaling activation, and NF1 loss. This approach facilitates the discovery of phenocopying variants and of potential hidden responders to specific therapies. In Chapters 5-6, we focus on deriving transcriptomic signatures using unsupervised learning. We show that unsupervised learning can identify disease subtypes and can be used to develop gene expression signatures without the need to specify labels a priori. In Chapter 5, we assess the reproducibility of high-grade serous ovarian cancer (HGSC) gene expression subtypes across populations and clustering algorithms. In Chapter 6, we train a variational autoencoder on patient tumors and use latent space arithmetic to identify the gene signatures that most distinguish HGSC subtypes. Lastly, in Chapter 7, we develop an approach to rapidly interpret compressed features engineered by unsupervised learning algorithms. We train a series of unsupervised models across a wide range of latent space dimensions and develop a network-based method for interpreting these compressed gene expression features.
Using this approach, we observe that modifying the hidden layer dimensionality impacts the identification of specific gene set and cell-type activation patterns in cancer and normal tissue. Machine learning models scale to large genomic datasets and have provided state-of-the-art results in a variety of biomedical domains. However, model interpretation is critical for building knowledge and generating hypotheses.
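To make the idea of latent space arithmetic from Chapter 6 more concrete, here is a minimal sketch under stated assumptions; it is not the dissertation's implementation. It assumes tumors have already been encoded by a trained model into latent vectors, and that a linear decoder weight matrix (one row per latent feature, one column per gene) is available. It subtracts the mean latent vector of one subtype from another, takes the latent feature with the largest absolute difference, and ranks genes by the magnitude of that feature's decoder weights. The function names, subtype labels, gene names, and numbers in the example are all hypothetical.

```python
# Hypothetical sketch of latent space arithmetic between two subtypes.
# Assumes latent encodings and linear decoder weights already exist.
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of equal-length latent vectors."""
    return [fmean(column) for column in zip(*vectors)]


def subtype_difference(latent, labels, subtype_a, subtype_b):
    """Mean latent vector of subtype_b minus mean latent vector of subtype_a."""
    a = [z for z, lab in zip(latent, labels) if lab == subtype_a]
    b = [z for z, lab in zip(latent, labels) if lab == subtype_b]
    if not a or not b:
        raise ValueError("both subtypes need at least one sample")
    mean_a, mean_b = mean_vector(a), mean_vector(b)
    return [mb - ma for ma, mb in zip(mean_a, mean_b)]


def top_genes_for_difference(diff, decoder_weights, genes, k=3):
    """Pick the latent feature with the largest absolute difference, then
    rank genes by |weight| from that feature. decoder_weights[j][g] is the
    (hypothetical) weight from latent feature j to gene g."""
    feature = max(range(len(diff)), key=lambda j: abs(diff[j]))
    weights = decoder_weights[feature]
    ranked = sorted(range(len(genes)), key=lambda g: abs(weights[g]), reverse=True)
    return feature, [genes[g] for g in ranked[:k]]


# Toy usage with made-up encodings for two samples per subtype.
latent = [[0.1, 2.0], [0.2, 1.8], [0.0, -1.5], [0.1, -1.7]]
labels = ["immunoreactive", "immunoreactive", "mesenchymal", "mesenchymal"]
genes = ["GENE_A", "GENE_B", "GENE_C"]
decoder_weights = [[0.9, 0.1, -0.2], [0.05, -1.2, 0.7]]

diff = subtype_difference(latent, labels, "mesenchymal", "immunoreactive")
feature, top = top_genes_for_difference(diff, decoder_weights, genes, k=2)
print(feature, top)
```

Here the second latent feature separates the toy subtypes, so the genes with the largest absolute weights from that feature are reported. Ranking by absolute weight keeps genes that load strongly in either direction; a real analysis might instead inspect positive and negative tails separately.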

Advisor
Casey S. Greene
Date of degree
2019-01-01