Recent Submissions

  • Publication
    Audiovisual Data Curation Primer Presentation
    (2023-12-14) Phegley, Lauren
    This presentation was given as part of the Data Curation Network's Primer Webinar held on 2023-12-14. The authors presented the highlights of our Audiovisual Data Curation Primer, which is a concise, peer-reviewed resource designed to support data curators in learning about audiovisual files. The full primer is openly available at
  • Publication
    BIBFRAME instance mining: Toward authoritative publisher entities using association rules
    (2020-11-25) Hahn, Jim
    With the transition of a shared catalog to BIBFRAME linked data, there is now a pressing need to identify the canonical Instance for clustering in BIBFRAME. Authoritative publisher entities are a fundamental component of Instance identification. Previous work in this area by OCLC Research (Connaway & Dickey, 2011) proposed a data mining approach for developing an experimental Publisher Name Authority File (PNAF). The OCLC research created profiles for "high-incidence" publishers after data mining and clustering of publishers. As a component of PNAF, Connaway & Dickey also provided detailed subject analysis of publishers. This presentation will detail a case study of machine learning methods over a corpus of subjects, main entries, and added entries, used as antecedents in association rules to derive consequent publisher entities. The departure point for the present research into identification of authoritative publisher entities is to focus on clustering, reconciliation, and re-use of ISBN and subfield b of MARC 260, along with the subjects (650 - Subject Added Entry), main entries (1XX - Main Entries), and added entries (710 - Added Entry-Corporate Name), as signals to inform a training corpus for association rule mining, among other machine learning algorithms, libraries, and methods.
  • Publication
    BF Interlingua: Interoperability among BIBFRAME linked data vocabularies
    (2023-01-19) Hahn, Jim
    Presentation exploring an interchange process among BIBFRAME linked data vocabularies.
  • Publication
    SVDE model interoperability: SVDE and the BIBFRAME interchange structure
    (2022-11-08) Hahn, Jim
    Provides an overview of a possible interchange structure for BIBFRAME, using the Library of Congress RDF/XML as that structure. The presentation details selected normalization steps of an SVDE instance into the RDF/XML Library of Congress structure. The presentation concludes with an example of loading SVDE normalized data into the Alma Sandbox at Penn by way of a locally hosted linked data editor, Marva.
  • Publication
    Bibliographic Entities are Described by Sets
    (2021-07-26) Hahn, Jim
    A set-theoretical frame based on Svenonius's theory of bibliographic entities is the departure point for this short talk on entity description. This talk will briefly show how properties of bibliographic entity descriptions may be identified using a frequent pattern data mining algorithm over targeted sets of existing metadata descriptions. The MARC21 corpus used in this case consisted of clustered sets of publishers and publisher locations from the library MARC21 records found in the Platform for Open Data (POD). POD is a data aggregation project involving member institutions of the IvyPlus Library Confederation and contains seventy million MARC21 records, forty million of which are unique.
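
The frequent pattern mining mentioned in the last abstract, and the antecedent-to-publisher association rules mentioned in the BIBFRAME instance mining abstract, can be illustrated with a minimal sketch. This is not the method used in either presentation: the records, the `subject:`/`place:`/`publisher:` labels, and the function names `frequent_itemsets` and `publisher_rules` are all hypothetical, and the counting is a brute-force stand-in for a real frequent pattern algorithm.

```python
from collections import Counter
from itertools import combinations

# Toy, made-up records: each record is a set of descriptive properties.
# All labels and values here are hypothetical illustrations.
records = [
    {"subject:Botany", "place:Chicago", "publisher:Example Press"},
    {"subject:Botany", "place:Chicago", "publisher:Example Press"},
    {"subject:Botany", "place:Boston", "publisher:Sample House"},
    {"subject:History", "place:Chicago", "publisher:Example Press"},
]


def frequent_itemsets(records, min_support, max_size=2):
    """Count every itemset up to max_size and keep those meeting min_support."""
    counts = Counter()
    for record in records:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(record), size):
                counts[frozenset(itemset)] += 1
    threshold = min_support * len(records)
    return {s: c for s, c in counts.items() if c >= threshold}


def publisher_rules(itemsets, min_confidence):
    """Derive rules of the form antecedent -> publisher.

    Confidence is support(antecedent and publisher) / support(antecedent).
    """
    rules = []
    for itemset, count in itemsets.items():
        publishers = [i for i in itemset if i.startswith("publisher:")]
        if len(itemset) < 2 or len(publishers) != 1:
            continue
        antecedent = itemset - set(publishers)
        if antecedent not in itemsets:
            continue
        confidence = count / itemsets[antecedent]
        if confidence >= min_confidence:
            rules.append((tuple(sorted(antecedent)), publishers[0], confidence))
    return sorted(rules)


itemsets = frequent_itemsets(records, min_support=0.5)
rules = publisher_rules(itemsets, min_confidence=0.9)
for antecedent, publisher, confidence in rules:
    print(antecedent, "->", publisher, f"(confidence {confidence:.2f})")
```

On a corpus of realistic size, the exhaustive enumeration above would be replaced by an algorithm such as Apriori or FP-Growth; the sketch only shows how sets of properties can yield a rule whose consequent is a publisher.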