Multivariate Statistical Analysis In Single Cell Transcriptomics
Single Cell RNA Sequencing
Statistics and Probability
With technological advances in the last decade, single cell RNA sequencing (scRNAseq) has emerged as an exciting weapon in the modern scientist’s arsenal to unravel cellular heterogeneity. The ability to measure the transcriptome of each individual cell in any given tissue poses numerous challenges, ranging from the purely taxonomical to the methodological and computational. This thesis addresses the methodological and computational challenges. Coping with the hyper-diversity of cells requires fundamental conceptual advances in computational biology. Chapter 1 sets the stage for the subsequent chapters and provides a basic overview of the single cell transcriptomics field, focusing specifically on the questions motivated by a molecular snapshot of individual cells. In Chapter 2, we tackle the philosophical notion of similarity and introduce a rank-based function on probability spaces that can be used to define cell-to-cell distance, and subsequently to cluster cells and identify niche cell groups. We demonstrate that this function is a valid kernel, and can thus find broad utility in kernel-based machine learning algorithms. In Chapter 3, we summarize a framework for comparing a multivariate distribution across k groups, and then describe scenarios where this graph-based test allows us to compare the distribution of gene sets, each corresponding to a biological pathway or function, across a set of closely related cell types. We describe how, when paired with scRNAseq data obtained from different T cell subtypes, this method allowed us to gain new biological insights into the T cell metabolic machinery. Chapters 4 and 5 dive further into a specific pathway, the complement system. It is well known that the complement system shapes homeostasis in immune-privileged organs such as the brain and the retina, and that intracellular complement also regulates cellular metabolism.
In Chapter 4, we describe an analysis of scRNAseq data from the murine retina, which hints at the presence of a local retinal complement system. Lastly, in Chapter 5, we describe a novel statistical method, based on canonical correlations, for detecting pathway-pathway synergy using scRNAseq data. We used this method to identify candidate metabolic pathways that might interact with the complement system and, in turn, affect metabolic reprogramming in T cells.