Predicting Consumers’ Brand Sentiment Using Text Analysis on Reddit

Cen, Puti

Files

Puti_Cen.pdf (1.34 MB)

Subject

brand sentiment; Reddit; natural language processing; consumer insight mining
Business

Permalink

https://repository.upenn.edu/handle/20.500.14332/37915

Author

Cen, Puti

Abstract

With the emergence of data privacy regulations around the world (e.g., GDPR, CCPA), practitioners of Internet marketing, the largest digital marketing channel, face a trade-off between protecting user data and targeting advertisements accurately, because they currently rely on social media analytics built on personally identifiable information (PII). To address this challenge, this research proposes a predictive model for consumers’ brand sentiment based entirely on textual data from Reddit, and therefore fully compliant with current data privacy regulations. The author uses natural language processing techniques to process all post and comment data from the r/gadgets subreddit community in 2018: named entity recognition extracts frequently discussed brands and products, and sentiment analysis generates brand sentiment labels for active users in r/gadgets. This research then uses four supervised learning classifiers to predict brand sentiment for four brand clusters (Apple, Samsung, Microsoft, and Google) based on the self-identified characteristics of Reddit users. Across all four brand clusters, the proposed predictive model achieved an ROC AUC score above 0.7, and above 0.8 for three of the four. This research thus demonstrates the predictive power of self-identified user characteristics for brand sentiment and offers digital marketing practitioners a consumer targeting model that does not require PII.
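
For readers unfamiliar with the ROC AUC metric reported in the abstract, the following is a minimal sketch of how such a score can be computed for one brand cluster. It is not the author's code: the function name `roc_auc`, the toy labels, and the toy scores are all hypothetical, and the thesis itself may have used a library implementation. The sketch uses the pairwise interpretation of ROC AUC, i.e. the probability that a randomly chosen positive-sentiment user receives a higher predicted score than a randomly chosen negative-sentiment user.

```python
def roc_auc(labels, scores):
    """Return ROC AUC via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 ground-truth sentiment labels (1 = positive).
    scores: iterable of classifier scores, higher meaning "more positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for a single brand cluster (not from the thesis).
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]
auc = roc_auc(labels, scores)
print(f"ROC AUC: {auc:.3f}")
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the abstract's thresholds of 0.7 and 0.8 should be read.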

Advisor

Chris Callison-Burch

Date of degree

2020-05-01

Collection

Dissertations and Theses