Date of Award
Doctor of Philosophy (PhD)
Nancy R. Zhang
Single-cell RNA sequencing (scRNA-seq) is a powerful technique for quantifying gene expression in individual cells. The output of scRNA-seq is a gene expression matrix in which each entry is the number of RNA molecules counted for a given gene in a given cell. However, the observed counts are noisy representations of true expression. Technical noise in scRNA-seq experiments produces an observed expression matrix with low counts and an abundance of zeros, resulting in a low signal-to-noise ratio. The motivation of this thesis is to develop methods that remove the technical noise while preserving real biological signal. First, we present SAVER, a statistical framework for modeling and denoising scRNA-seq data. SAVER is able to recover the true expression without introducing artificial signal. Then, we consider the problem of removing the effects of sequencing batch and other confounding variables in dimension reduction and denoising of scRNA-seq data. By examining the linear factor model with interactions, we show that a conditional variational autoencoder (CVAE) with a weighted objective function can disentangle latent factors from observed covariates by modeling the interaction effects through its nonlinear activation function. We develop a method called SAVER-CVAE, which incorporates the weighted CVAE into the SAVER framework, and demonstrate its ability to simultaneously perform dimension reduction and denoising while adjusting for observed covariates in scRNA-seq data.
Huang, Mo, "A Statistical Framework For Denoising Single-Cell RNA Sequencing Data" (2020). Publicly Accessible Penn Dissertations. 3862.