Bayes and Big Data: The Consensus Monte Carlo Algorithm

Penn collection
Statistics Papers
Discipline
Subject
Bayesian inference
Markov chain Monte Carlo
distributed computing
big data
embarrassingly parallel
Business
Business Administration, Management, and Operations
Business Analytics
Management Sciences and Quantitative Methods
Operations Research, Systems Engineering and Industrial Engineering
Statistics and Probability
Technology and Innovation
Author
Scott, Steven L
Blocker, Alexander W
Bonassi, Fernando V
Chipman, Hugh A
George, Edward I
McCulloch, Robert E
Abstract

A useful definition of ‘big data’ is data that is too big to process comfortably on a single machine because of processor, memory, or disk bottlenecks. Graphics processing units can alleviate the processor bottleneck, but memory or disk bottlenecks can only be eliminated by splitting data across multiple machines. Communication between large numbers of machines is expensive (regardless of the amount of data being communicated), so there is a need for algorithms that perform distributed approximate Bayesian analyses with minimal communication. Consensus Monte Carlo operates by running a separate Monte Carlo algorithm on each machine, and then averaging individual Monte Carlo draws across machines. Depending on the model, the resulting draws can be nearly indistinguishable from the draws that would have been obtained by running a single-machine algorithm for a very long time. Examples of consensus Monte Carlo are shown for simple models where single-machine solutions are available, for large single-layer hierarchical models, and for Bayesian additive regression trees (BART).
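
The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from this record: the model is a Gaussian mean with known standard deviation and a flat prior, each "machine" is simulated in-process, direct draws from each shard's posterior stand in for a per-machine Monte Carlo run, and the average is weighted by each shard's posterior precision. The names `consensus_draws` and `run_demo` are hypothetical.

```python
import random
import statistics


def consensus_draws(shard_draws, weights):
    """Combine per-shard draws by a weighted average, draw by draw."""
    total = sum(weights)
    n_draws = len(shard_draws[0])
    return [
        sum(w * draws[g] for w, draws in zip(weights, shard_draws)) / total
        for g in range(n_draws)
    ]


def run_demo(seed=0, shard_sizes=(100, 200, 300, 400), n_draws=2000,
             sigma=1.0, true_mean=3.0):
    """Simulate sharded Gaussian data and return consensus draws plus
    the mean and standard deviation of the full-data posterior."""
    rng = random.Random(seed)
    # Assumption: data are N(true_mean, sigma^2), split into unequal shards.
    shards = [[rng.gauss(true_mean, sigma) for _ in range(n)]
              for n in shard_sizes]

    shard_draws, weights = [], []
    for y in shards:
        # Under a flat prior the shard posterior is N(mean(y), sigma^2 / n).
        post_mean = statistics.fmean(y)
        post_sd = sigma / len(y) ** 0.5
        # Stand-in for a per-machine Monte Carlo run (assumption).
        shard_draws.append([rng.gauss(post_mean, post_sd)
                            for _ in range(n_draws)])
        # Assumed weighting: posterior precision of the shard.
        weights.append(1.0 / post_sd ** 2)

    combined = consensus_draws(shard_draws, weights)

    # Full-data posterior, available in closed form for this toy model.
    all_y = [v for y in shards for v in y]
    return combined, statistics.fmean(all_y), sigma / len(all_y) ** 0.5


if __name__ == "__main__":
    combined, full_mean, full_sd = run_demo()
    print("consensus mean:", statistics.fmean(combined), "full-data mean:", full_mean)
    print("consensus sd:  ", statistics.stdev(combined), "full-data sd:  ", full_sd)
```

In this Gaussian toy case the precision-weighted average of shard draws has the same distribution as the full-data posterior, so it is one of the "simple models where single-machine solutions are available". For non-Gaussian models the combined draws are only approximate, as the abstract notes.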

Publication date
2016-02-16