Spark for Social Science

Author
MacDonald, Graham
Abstract

The Urban Institute has developed an approach to analyzing massive datasets using Amazon Web Services’ (AWS) Elastic MapReduce (EMR) and the Spark framework for distributed in-memory processing. The goal of the project is to deliver powerful and elastic Spark clusters to researchers and data analysts with as little setup time and effort as possible, and at low cost. To do this, the Urban Institute uses two critical components: (1) an AWS CloudFormation script that launches EMR clusters, and (2) a bootstrap script that runs on the master node of the new cluster to install statistical programs and development environments (RStudio and Jupyter Notebooks). The Urban Institute’s Spark for Social Science GitHub page holds the code used to set up the cluster, along with tutorials for learning how to program in R and Python.
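As a rough illustration of the first component, the sketch below builds a minimal CloudFormation template for a Spark EMR cluster with a bootstrap action attached. It is not the Urban Institute's actual script: the function name, resource names, release label, instance type, node count, and S3 path are all hypothetical placeholders. The template uses the standard `AWS::EMR::Cluster` resource type, and it assumes the default EMR IAM roles (`EMR_EC2_DefaultRole`, `EMR_DefaultRole`) already exist in the account.

```python
import json


def emr_spark_template(bootstrap_uri: str,
                       core_nodes: int = 2,
                       instance_type: str = "m5.xlarge",
                       release_label: str = "emr-5.20.0") -> dict:
    """Build a minimal CloudFormation template for a Spark EMR cluster.

    All defaults and resource names are illustrative assumptions,
    not the Urban Institute's actual configuration.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "SparkCluster": {
                "Type": "AWS::EMR::Cluster",
                "Properties": {
                    "Name": "spark-social-science",  # hypothetical name
                    "ReleaseLabel": release_label,
                    "Applications": [{"Name": "Spark"}],
                    # Default EMR roles; assumed to exist already.
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "ServiceRole": "EMR_DefaultRole",
                    "Instances": {
                        "MasterInstanceGroup": {
                            "InstanceCount": 1,
                            "InstanceType": instance_type,
                        },
                        # Core nodes do the distributed work; resize for elasticity.
                        "CoreInstanceGroup": {
                            "InstanceCount": core_nodes,
                            "InstanceType": instance_type,
                        },
                    },
                    # EMR runs bootstrap actions on every node at startup, so a
                    # script meant only for the master node must check which
                    # node it is running on before installing anything.
                    "BootstrapActions": [{
                        "Name": "install-rstudio-jupyter",  # hypothetical
                        "ScriptBootstrapAction": {"Path": bootstrap_uri},
                    }],
                },
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical S3 location of the bootstrap script.
    template = emr_spark_template("s3://example-bucket/bootstrap.sh")
    print(json.dumps(template, indent=2))
```

The printed JSON could be saved to a file and submitted with `aws cloudformation create-stack --template-body file://template.json`, although a production template would likely add parameters for key pairs, networking, and pricing options.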

Date of presentation
2018-11-01
Conference name
2018 ADRF Network Research Conference Presentations
Conference dates
2023-05-17T21:29:12.000
Comments
DOI: https://doi.org/10.23889/ijpds.v3i5.1044