Human-Centered AI in Computational Social Science: Evaluating Automated Annotation with Large Language Models

Degree type
PhD
Graduate group
Political Science
Discipline
Political Science
Computer Sciences
Data Science
Subject
Artificial Intelligence
Automated Annotation
Computational Social Science
Large Language Models
Funder
Grant number
License
Copyright date
01/01/2025
Distributor
Related resources
Author
Pangakis, Nicholas James
Contributor
Abstract

Computational social scientists are increasingly incorporating text as data into their research. A typical framework for working with large text corpora involves hiring human annotators to read a subset of the text samples and then building a statistical model to annotate the remainder of the corpus. Because of their effectiveness at quantifying natural language, their ease of application, and their relatively low cost, artificial intelligence tools such as generative large language models (LLMs) can be used to automate these manual annotation procedures. This process, which I call "automated annotation," can dramatically improve research designs that involve text as data. For example, I demonstrate that automated annotation procedures can cost 11.6% of what standard annotation approaches cost and take 18.8% of the time. Although automated annotation has remarkable potential in social science, there are serious concerns about misuse and uncritical application. If practitioners use automated annotation without validation, for instance, they risk unknown bias and other inaccuracies in downstream applications. My dissertation therefore tests strategies for developing effective and responsible automated annotation procedures. Specifically, I argue for a human-centered automated annotation framework, which places human annotations at the center of each stage of the workflow. Across three studies, I develop and implement automated annotation techniques that all remain grounded in human reasoning. My empirical investigations cover a wide range of topics, from testing automated annotation strategies with generative LLMs to developing a multi-stage, human-in-the-loop annotation pipeline. As a whole, my findings underscore the potential of AI tools to enhance text-as-data methodologies and to help researchers explore important substantive questions. With proper validation techniques, generative LLMs can approximate human reasoning at a rapid pace and low cost.
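The validation step described above, checking model-generated labels against a human-annotated subset before applying the model to the full corpus, can be sketched as follows. This is a minimal illustrative sketch, not the dissertation's actual pipeline; the function name, the binary labeling task, and all data values are hypothetical.

```python
# Minimal sketch of the validation step in a human-centered automated
# annotation workflow: compare model-generated labels against a
# human-annotated subset before trusting the model on the full corpus.
# All names and data below are hypothetical placeholders.

def validate_annotations(human_labels, model_labels, positive=1):
    """Return (accuracy, F1) of model labels measured against human labels."""
    assert len(human_labels) == len(model_labels), "subsets must align"
    tp = fp = fn = correct = 0
    for h, m in zip(human_labels, model_labels):
        if h == m:
            correct += 1
        if m == positive and h == positive:
            tp += 1
        elif m == positive and h != positive:
            fp += 1
        elif m != positive and h == positive:
            fn += 1
    accuracy = correct / len(human_labels)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1

# Hypothetical validation subset: 1 = concept present, 0 = absent.
human = [1, 0, 1, 1, 0, 0, 1, 0]   # gold labels from human annotators
model = [1, 0, 1, 0, 0, 1, 1, 0]   # labels produced by the LLM
acc, f1 = validate_annotations(human, model)
```

Only if agreement on the human-labeled subset clears a pre-registered threshold would the model's labels be used on the remaining, unannotated corpus; otherwise the prompt or model is revised and re-validated.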

Advisor
Hopkins, Daniel J.
Date of degree
2025
Date Range for Data Collection (Start Date)
Date Range for Data Collection (End Date)
Digital Object Identifier
Series name and number
Volume number
Issue number
Publisher
Publisher DOI
Journal Issue
Comments
Recommended citation