Detecting Mutations in Mixed Sample Sequencing Data Using Empirical Bayes

Penn collection
Statistics Papers
Subject
empirical Bayes
false discovery rates
discrete data
DNA sequencing
genome variation
Applied Statistics
Biostatistics
Genetics and Genomics
Other Statistics and Probability
Author
Muralidharan, Omkar
Natsoulis, Georges
Bell, John
Ji, Hanlee
Zhang, Nancy R
Abstract

We develop statistically based methods to detect single nucleotide DNA mutations in next generation sequencing data. Sequencing generates counts of the number of times each base was observed at hundreds of thousands to billions of genome positions in each sample. Using these counts to detect mutations is challenging because mutations may have very low prevalence and sequencing error rates vary dramatically by genome position. The discreteness of sequencing data also creates a difficult multiple testing problem: current false discovery rate methods are designed for continuous data, and work poorly, if at all, on discrete data. We show that a simple randomization technique lets us use continuous false discovery rate methods on discrete data. Our approach is a useful way to estimate false discovery rates for any collection of discrete test statistics, and is hence not limited to sequencing data. We then use an empirical Bayes model to capture different sources of variation in sequencing error rates. The resulting method outperforms existing detection approaches on example data sets.
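The abstract does not spell out the randomization technique, so the following is only a minimal sketch of one standard construction, the randomized p-value, which may differ from what the paper actually does. It assumes a hypothetical null model in which the error count at a position is Binomial(n, p0); the names `binom_pmf`, `randomized_pvalue`, and `simulate_null_pvalues` are illustrative and not taken from the paper. Adding a uniform fraction of the probability mass at the observed count makes the p-value exactly Uniform(0, 1) under the null, which is the property continuous false discovery rate methods rely on.

```python
import math
import random


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def randomized_pvalue(x, n, p0, rng):
    """Randomized upper-tail p-value: P(X > x) + U * P(X = x), U ~ Uniform(0, 1).

    Assumption: the null is Binomial(n, p0); this is an illustrative model,
    not necessarily the one used in the paper.
    """
    upper = sum(binom_pmf(k, n, p0) for k in range(x + 1, n + 1))
    return upper + rng.random() * binom_pmf(x, n, p0)


def simulate_null_pvalues(n, p0, reps, seed=0):
    """Draw counts from the null and return their randomized p-values."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(reps):
        # One null draw: number of "errors" among n reads at error rate p0.
        x = sum(rng.random() < p0 for _ in range(n))
        pvals.append(randomized_pvalue(x, n, p0, rng))
    return pvals


if __name__ == "__main__":
    pvals = simulate_null_pvalues(n=30, p0=0.01, reps=5000)
    below = sum(p < 0.1 for p in pvals) / len(pvals)
    print(f"fraction of null p-values below 0.1: {below:.3f}")
```

With low depth and a low error rate, most null counts are zero, so the ordinary (non-randomized) p-value would equal 1 for most positions and would be far from uniform. The randomized version spreads that mass evenly over the unit interval, so the fraction below any threshold t should be close to t.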

Publication date
2012-01-01
Journal title
The Annals of Applied Statistics