BIBFRAME instance mining: Toward authoritative publisher entities using association rules

Penn collection
Interdisciplinary Centers, Units and Projects::Penn Libraries::Scholarship at Penn Libraries
Discipline
Library and Information Science
Subject
BIBFRAME
FP-growth
data mining
Copyright date
2021
Abstract

With the transition of a shared catalog to BIBFRAME linked data, there is a pressing need to identify the canonical Instance for clustering in BIBFRAME. Authoritative publisher entities are a fundamental component of Instance identification. Previous work in this area by OCLC Research (Connaway & Dickey, 2011) proposed a data mining approach for developing an experimental Publisher Name Authority File (PNAF). By data mining and clustering publisher data, the OCLC researchers were able to create profiles for "high-incidence" publishers and, as a component of PNAF, to provide detailed subject analysis of publishers. This presentation details a case study of machine learning methods applied to a corpus of subjects, main entries, and added entries, which serve as antecedents in association rules whose consequents are publisher entities. The departure point for the present research is the clustering, reconciliation, and re-use of ISBNs and subfield b of MARC 260, together with subjects (650 - Subject Added Entry), main entries (1XX - Main Entries), and added entries (710 - Added Entry-Corporate Name), as signals to inform a training corpus for association rule mining, among other machine learning algorithms, libraries, and methods.
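
The rule form described in the abstract (catalog signals as antecedents, a publisher entity as the consequent) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy records, the publisher names, the `TAG:value` item encoding, the function name `mine_publisher_rules`, and the thresholds are all hypothetical. The sketch uses brute-force itemset counting rather than FP-growth, so it shows only what support and confidence mean for such rules, not how the presented case study was implemented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each record pairs a set of MARC-derived signals
# (encoded as "TAG:value" strings) with a publisher label.
RECORDS = [
    ({"650:Semantic Web", "710:W3C"}, "Example Press"),
    ({"650:Semantic Web", "100:Doe, Jane"}, "Example Press"),
    ({"650:Semantic Web", "710:W3C"}, "Example Press"),
    ({"650:Cooking"}, "Sample House"),
    ({"650:Cooking", "650:Semantic Web"}, "Sample House"),
]


def mine_publisher_rules(records, min_support=0.3, min_confidence=0.6, max_len=2):
    """Return rules (antecedent, publisher, support, confidence).

    support    = share of all records containing antecedent and publisher
    confidence = share of records containing the antecedent that also
                 carry that publisher
    """
    total = len(records)
    antecedent_counts = Counter()
    joint_counts = Counter()

    for signals, publisher in records:
        items = sorted(signals)
        # Enumerate every signal combination up to max_len items.
        for size in range(1, max_len + 1):
            for combo in combinations(items, size):
                antecedent_counts[combo] += 1
                joint_counts[(combo, publisher)] += 1

    rules = []
    for (combo, publisher), joint in joint_counts.items():
        support = joint / total
        confidence = joint / antecedent_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((combo, publisher, support, confidence))

    # Strongest rules first: by confidence, then support, then antecedent.
    rules.sort(key=lambda rule: (-rule[3], -rule[2], rule[0]))
    return rules


if __name__ == "__main__":
    for antecedent, publisher, support, confidence in mine_publisher_rules(RECORDS):
        print(f"{set(antecedent)} => {publisher} "
              f"(support={support:.2f}, confidence={confidence:.2f})")
```

Brute-force enumeration grows quickly with the number of signals per record, which is one reason frequent-pattern algorithms such as FP-growth are preferred on a full catalog.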

Date of presentation
2020-11-25
Conference name
SWIB20 / Semantic Web in Libraries
Conference dates
2020
Conference location
online
Recommended citation
Jim Hahn. "BIBFRAME instance mining: Toward authoritative publisher entities using association rules" SWIB20 / Semantic Web in Libraries