A Statistical Framework for Denoising Single-Cell RNA Sequencing Data

Degree type
Doctor of Philosophy (PhD)
Graduate group
Statistics
Discipline
Subject
Batch effects
Deep learning
Denoising
Empirical Bayes
RNA-seq
Single cell
Bioinformatics
Genetics
Statistics and Probability
Copyright date
2021-08-31T20:20:00-07:00
Author
Huang, Mo
Abstract

Single-cell RNA sequencing (scRNA-seq) is a powerful technique for quantifying gene expression in individual cells. The output of scRNA-seq is a gene expression matrix in which each entry is a count of the number of RNA molecules for a given gene in a cell. However, the observed counts are noisy representations of true expression. Technical noise in scRNA-seq experiments produces an observed expression matrix with low counts and an abundance of zeros, resulting in a low signal-to-noise ratio. The motivation of this thesis is to develop methods that remove the technical noise while preserving real biological signal. First, we present SAVER, a statistical framework for modeling and denoising scRNA-seq data. SAVER is able to recover the true expression without introducing artificial signal. Then, we consider the problem of removing the effects of sequencing batch and other confounding variables in dimension reduction and denoising of scRNA-seq data. By examining the linear factor model with interactions, we show that a conditional variational autoencoder (CVAE) with a weighted objective function can disentangle latent factors from observed covariates by modeling the interaction effects through its nonlinear activation function. We develop a method called SAVER-CVAE, which incorporates the weighted CVAE into the SAVER framework, and demonstrate its ability to simultaneously perform dimension reduction and denoising while adjusting for observed covariates in scRNA-seq data.

Advisor
Nancy R. Zhang
Date of degree
2020-01-01