Sociotechnical Automation Science: A Case Study in Developing and Augmenting an Ensemble Neural Network with Multiple LLMs for Subject Cataloging at the Penn Libraries

Penn collection
Interdisciplinary Centers, Units and Projects::Penn Libraries::Scholarship at Penn Libraries
Discipline
Library and Information Science
Subject
machine learning
subject indexing algorithms
Annif
LLM
Copyright date
2024-06-28
Abstract

The sociotechnical aspects of automation play a crucial role in the development of machine learning systems. Through close collaboration with cataloging professionals at the Penn Libraries, we created a set of subject indexing algorithms and combined them in a neural network ensemble. Librarians evaluated multiple rounds of the algorithms' outputs. After identifying the failure points in the neural network-based subject assignment process, we incorporated LLM tasks such as evaluating search result relevance, summarizing search results, and assessing topical assignments of synthetic summaries. These tasks draw on the linguistic strengths of LLMs rather than their world knowledge. The data processing is integrated into an Apache Airflow pipeline: librarians input an Excel file, which starts the workflow for generating candidate subject descriptions. These machine learning outputs are poised for a pilot test in production systems this summer.

Date of presentation
2024-06-26
Conference name
University of Illinois School of Information Sciences, Summer Colloquium
Conference dates
2024-06-26
Conference location
Online