Big Data For Microorganisms: Computational Approaches Leveraging Large-Scale Microbial Transcriptomic Compendia

Degree type
Doctor of Philosophy (PhD)
Graduate group
Genomics & Computational Biology
Discipline
Subject
Genomics
Machine learning
Microbiology
Computer Sciences
Genetics
Copyright date
2022-10-05T20:22:00-07:00
Author
Lee, Alexandra
Abstract

Genome-wide transcriptomic data capture the molecular state of microorganisms: the expression patterns of genes in response to a condition or stimulus. With advances in high-throughput sequencing technologies, thousands of microbial transcription profiles are now publicly available. These data have been collected and integrated to form transcriptomic compendia, which are collections of diverse gene expression experiments. Such compendia have been found to be a valuable resource for studying systems-level biology and for generating hypotheses. In Chapter 1, we describe the construction, benefits, and challenges of microbial transcriptomic compendia. One challenge for compendia, which integrate data across many different experiments, is batch effects: technical sources of variability that can disrupt the detection of underlying biological signals of interest. In Chapter 2, we use a generative neural network to simulate gene expression compendia with varying amounts of technical variability, and we assess how well the underlying biological structure can be detected after noise is added and again after batch correction is applied. We define a set of principles for how batch correction should be used in the context of these large-scale compendia. In Chapters 3 and 4, we introduce computational approaches that use compendia to improve the analysis of individual experiments and the analysis of genomic patterns, respectively. In Chapter 3, we develop a portable framework that distinguishes common from context-specific transcriptional signals by using a compendium to automatically generate a null set of expression changes. This approach allows researchers to place gene expression changes from their own experiment of interest into the context of existing compendia of experiments. In Chapter 4, we develop an approach to examine the effect of different Pseudomonas aeruginosa genomes, using two dominant strain types, on transcriptional profiles in order to understand how traits manifest. This genome-wide approach reveals a more complete picture of how different genomes affect expression, which in turn mediates the traits present. Overall, these compendia provide a valuable resource that computational tools can leverage to extract patterns and inform research directions.

Advisor
Casey S. Greene
Date of degree
2022-01-01