Machine Learning Methods For The Analysis Of Single-Cell And Spatially Resolved Transcriptomics Data
Date of Award: 2022
Doctor of Philosophy (PhD)
Epidemiology & Biostatistics
The advent of high-throughput next-generation sequencing technologies has transformed our understanding of cell biology and human disease. It is now common for investigators to study human cell populations by profiling the transcriptomes of thousands of single cells using single-cell RNA sequencing (scRNA-seq) technologies. In addition, recent advances in spatially resolved transcriptomics (SRT) technologies have enabled gene expression profiling with spatial information in tissues. Knowledge of the relative locations of different cells in a tissue is critical for understanding disease pathology, because spatial information helps explain how the gene expression of a cell is influenced by its surrounding environment and how neighboring regions interact at the gene expression level. To take full advantage of this multimodal information when analyzing scRNA-seq and SRT data, new methods are needed to address the following challenges: (1) how to identify cell types in scRNA-seq data with closely related cell types or low sequencing depth; (2) how to jointly model gene expression, spatial location, and histology in SRT data analysis; and (3) how to increase gene expression resolution in SRT to study detailed tissue structure. In this dissertation, I address these challenges in scRNA-seq and SRT data analysis. To address challenge (1), I developed ItClust, a supervised machine learning method that uses cell-type-specific gene expression information learned from a well-labeled source dataset to help cluster and classify cell types in newly generated target data. To address challenge (2), I developed SpaGCN, a graph convolutional network approach that integrates gene expression, spatial location, and histology to identify spatial domains and spatially variable genes in SRT data. Lastly, to address challenge (3), I developed TESLA, a machine learning framework that enhances gene expression resolution in SRT and performs multi-level tissue annotation at pixel-level resolution. I validated the utility of each approach using experimentally validated cell type labels and independent pathologists' annotations. I also demonstrated real use cases for these methods in deciphering the tumor microenvironment in various cancer types.
Hu, Jian, "Machine Learning Methods For The Analysis Of Single-Cell And Spatially Resolved Transcriptomics Data" (2022). Publicly Accessible Penn Dissertations. 5613.