Using computationally-derived metadata to unlock the tumor microenvironment in high grade serous ovarian cancer
Discipline
Genetics and Genomics
Subject
deconvolution
gene expression
ovarian cancer
scRNAseq
tumor microenvironment
Abstract
One takeaway from decades of genomic data sharing is that data are most valuable when accompanied by detailed metadata. To date, metadata have often been thought of only as human-annotated descriptions of samples and their handling. However, machine learning methods can also generate metadata that, when added to analyses, allow researchers to gain more insight from -omics data. In this work, we offer both novel and improved software tools for generating metadata, and demonstrate how computationally-derived metadata can refine our knowledge of the subtypes of high-grade serous ovarian cancer (HGSOC). In chapter 1, we discuss the significance of computationally-derived metadata as a whole and review the methods that have been used to characterize heterogeneity in the tumor microenvironment. Chapters 2 and 3 focus on new or adapted methods for generating computational metadata. In chapter 2, we describe a re-implemented transfer learning method for generating new metadata from -omics data, and apply it to mutational inference in cancer samples. In chapter 3, we present a method for generating a new type of metadata for single-cell RNA-seq data that allows for better quality control and preprocessing in both cancer and non-cancer contexts. Chapters 4 and 5 focus on how one type of computational metadata, namely deconvolution of cell type proportions in bulk transcriptomic data, can elucidate the tumor microenvironment. In chapter 4, we consider the experimental and technical factors that can bias deconvolution, with the goal of identifying best practices for generating data and methods that return robust, accurate estimates of tumor composition. In chapter 5, we apply the lessons from chapter 4 and perform deconvolution on several large HGSOC datasets. We identify differences in tumor composition that align with and adjust the existing model of HGSOC subtypes, with implications for differences in survival.
Overall, this work demonstrates the power of computationally-derived metadata to maximize the utility of existing data and spark new hypotheses that deepen our understanding of the molecular underpinnings of health and disease.
Advisor
Wherry, John