Predicting And Characterizing The Health Of Individuals And Communities Through Language Analysis Of Social Media

Degree type
Doctor of Philosophy (PhD)
Graduate group
Psychology
Discipline
Subject
Big data
facebook
health
prediction
social media
twitter
Psychology
Quantitative Psychology
Social Psychology
Copyright date
2018-02-23T20:17:00-08:00
Abstract

A large and growing fraction of the global population uses social media, through which users share their thoughts, feelings, and behaviors, predominantly through text. To quantify the expression of psychological constructs in language, psychology has developed a set of “closed-vocabulary” methods using pre-determined dictionaries. Advances in natural language processing have made possible the development of “open-vocabulary” methods to analyze text in data-driven ways, and machine learning algorithms have substantially improved predictive performance. The first chapter introduces these methods, comparing traditional methods of text analysis with newer methods from natural language processing in terms of their relative ability to predict and elucidate the language correlates of the age, gender, and personality of Facebook users (N = 65,896). The second and third chapters discuss the use of social media to predict depression (the most prevalent mental illness) in individuals. The second chapter reviews the literature on detection of depression through social media and concludes that no study to date has demonstrated the efficacy of this approach to screen for clinician-reported depression. In the third chapter, Facebook data was collected and connected to patients’ medical records (N = 683), and prediction models based on Facebook data were able to forecast the occurrence of depression with fair accuracy, about as well as self-report screening surveys. The fourth chapter applies both sets of methods to geotagged Tweets to predict county-level mortality rates from atherosclerotic heart disease (the leading cause of death in the U.S.) across 1,347 counties, covering 88% of the U.S. population. In this study, a Twitter model outperformed a model combining ten other leading demographic, socioeconomic, and health risk factors.
Across both depression and heart disease, associated language profiles identified fine-grained psychological determinants (e.g., loneliness emerged as a risk factor for depression, and optimism showed a protective association with heart disease). In sum, these studies demonstrate that large-scale text analysis is a valuable tool for psychology with implications for public health, as it allows for the unobtrusive and cost-effective monitoring of disease risk and psychological states of individuals and large populations.

Advisor
Martin E. Seligman
Date of degree
2017-01-01