Sociotechnical Automation Science: A Case Study in Developing and Augmenting an Ensemble Neural Network with Multiple LLMs for Subject Cataloging at the Penn Libraries
Subject
subject indexing algorithms
Annif
LLM
Abstract
The sociotechnical aspects of automation play a crucial role in the development of machine learning systems. Through close collaboration with cataloging professionals at the Penn Libraries, we created a set of subject indexing algorithms that are combined in a neural network ensemble. Librarians evaluated the algorithm outputs over multiple rounds. After identifying the failure points in the neural network-based subject assignment process, we added LLM tasks such as evaluating search result relevance, summarizing search results, and assessing the topical assignments of synthetic summaries. These tasks draw on the linguistic strengths of LLMs rather than their world knowledge. The data processing is integrated into an Apache Airflow pipeline: librarians submit an Excel file, which starts the workflow that generates candidate subject descriptions. These machine learning outputs are poised for a pilot test in production systems this summer.