International Journal of Management Science and Engineering Management
A useful definition of ‘big data’ is data that is too big to process comfortably on a single machine because of processor, memory, or disk bottlenecks. Graphics processing units can alleviate the processor bottleneck, but memory or disk bottlenecks can only be eliminated by splitting data across multiple machines. Communication between large numbers of machines is expensive (regardless of the amount of data being communicated), so there is a need for algorithms that perform distributed approximate Bayesian analyses with minimal communication. Consensus Monte Carlo operates by running a separate Monte Carlo algorithm on each machine, and then averaging individual Monte Carlo draws across machines. Depending on the model, the resulting draws can be nearly indistinguishable from the draws that would have been obtained by running a single-machine algorithm for a very long time. Examples of consensus Monte Carlo are shown for simple models where single-machine solutions are available, for large single-layer hierarchical models, and for Bayesian additive regression trees (BART).
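The averaging step described in the abstract can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the model is a Gaussian mean with known unit variance and a flat prior, each "machine" is simulated in-process, and each shard's posterior is sampled directly in place of a real Monte Carlo algorithm. The function name `consensus_average` and the choice of inverse-variance weights are hypothetical and are not taken from the abstract, which only says that draws are averaged across machines.

```python
import random
import statistics


def consensus_average(shard_draws, weights=None):
    """Combine per-shard draws of a scalar parameter.

    shard_draws: list of S lists, each holding G draws from one shard.
    weights: optional list of S non-negative weights (assumption: the
             abstract does not specify a weighting; equal by default).
    Returns G draws; the g-th is the weighted average of the g-th draw
    from every shard.
    """
    if weights is None:
        weights = [1.0] * len(shard_draws)
    total = sum(weights)
    return [
        sum(w * d for w, d in zip(weights, column)) / total
        for column in zip(*shard_draws)
    ]


rng = random.Random(0)

# Toy data: 10,000 observations from N(2, 1), split across 10 "machines".
y = [rng.gauss(2.0, 1.0) for _ in range(10_000)]
shards = [y[i::10] for i in range(10)]

# With a flat prior and known unit variance, the posterior of the mean on
# shard s is N(mean(y_s), 1 / n_s). It is sampled directly here as a
# stand-in for whatever Monte Carlo algorithm each machine would run.
G = 2000
shard_draws = []
for s in shards:
    center = statistics.fmean(s)
    spread = 1.0 / len(s) ** 0.5
    shard_draws.append([rng.gauss(center, spread) for _ in range(G)])

# Assumed weighting: inverse posterior variance, which is n_s in this model.
weights = [float(len(s)) for s in shards]
consensus = consensus_average(shard_draws, weights)

print(statistics.fmean(consensus), statistics.stdev(consensus))
```

In this Gaussian toy case the weighted average of the shard draws has the same mean and variance as the full-data posterior, N(mean(y), 1/10000), so the consensus draws should be centered near the sample mean with a standard deviation near 0.01. For non-Gaussian posteriors the match is only approximate, which is consistent with the abstract's "depending on the model" caveat.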
This is an Accepted Manuscript of an article published by Taylor & Francis in the International Journal of Management Science and Engineering Management on 16 February 2016, available online: http://dx.doi.org/10.1080/17509653.2016.1142191
Bayesian inference, Markov chain Monte Carlo, distributed computing, big data, embarrassingly parallel
Scott, S. L., Blocker, A. W., Bonassi, F. V., Chipman, H. A., George, E. I., & McCulloch, R. E. (2016). Bayes and Big Data: The Consensus Monte Carlo Algorithm. International Journal of Management Science and Engineering Management, 11 (2), 78-88. http://dx.doi.org/10.1080/17509653.2016.1142191
Business Administration, Management, and Operations Commons, Business Analytics Commons, Management Sciences and Quantitative Methods Commons, Operations Research, Systems Engineering and Industrial Engineering Commons, Statistics and Probability Commons, Technology and Innovation Commons
Date Posted: 25 October 2018
This document has been peer reviewed.