Machine Learning Methods For The Analysis Of Single-Cell And Spatially Resolved Transcriptomics Data

Degree type
Doctor of Philosophy (PhD)
Graduate group
Epidemiology & Biostatistics
Subject
machine learning
single cell genomics
spatially resolved transcriptomics
statistical genomics
Biostatistics
Copyright date
2022-10-05T20:22:00-07:00
Author
Hu, Jian
Abstract

The advent of high-throughput next-generation sequencing technologies has transformed our understanding of cell biology and human disease. Investigators now routinely study human cell populations by profiling the transcriptomes of thousands of single cells with single-cell RNA sequencing (scRNA-seq) technologies. In addition, recent advances in spatially resolved transcriptomics (SRT) technologies have enabled gene expression profiling that retains spatial information within tissues. Knowing the relative locations of different cells in a tissue is critical for understanding disease pathology, because spatial information reveals how a cell's gene expression is influenced by its surrounding environment and how neighboring regions interact at the gene expression level. To take full advantage of the multimodal information in scRNA-seq and SRT data, new methods are needed to address the following challenges: (1) how to identify cell types in scRNA-seq data when cell types are closely related or sequencing depth is low; (2) how to jointly model gene expression, spatial location, and histology in SRT data; and (3) how to increase the resolution of gene expression in SRT to study detailed tissue structure. In this dissertation, I address these challenges in scRNA-seq and SRT data analysis. To address challenge (1), I developed ItClust, a supervised machine learning method that leverages cell-type-specific gene expression information learned from a well-labeled source dataset to help cluster and classify cell types in newly generated target data. To address challenge (2), I developed SpaGCN, a graph convolutional network approach that integrates gene expression, spatial location, and histology to identify spatial domains and spatially variable genes in SRT data. Lastly, to address challenge (3), I developed TESLA, a machine learning framework that enhances gene expression resolution in SRT and performs multi-level tissue annotation at pixel-level resolution. I validated each of these approaches using experimentally validated cell type labels and independent pathologists' annotations, and I demonstrated real use cases for these methods in deciphering the tumor microenvironment in various cancer types.

Advisor
Mingyao Li
Date of degree
2022-01-01