Ockerbloom, John Mark

Computer Sciences
Library and Information Science
Digital library planner
I'm a computer scientist by training, and a digital library architect by profession. This site contains writings and presentations that I or others have found useful to refer to at one time or another. Full text articles written during my time at Penn have also been deposited in ScholarlyCommons@Penn.

  • Publication
    Copyright and Provenance: Some Practical Problems
    (2007-12-01) Mark Ockerbloom, John
    Copyright clearance is an increasingly complex and expensive impediment to the digitization and reuse of information. Clearing copyright issues in a reliable and cost-effective manner for works created in the last 100 years can involve establishing complex provenance chains for the works, their copyrights, and their licenses. This paper gives an overview of some of the practical provenance-related issues and challenges in clearing copyrights at large scale, and discusses efforts to more efficiently gather and share information and its copyright provenance.
  • Publication
    Architectural Mismatch: Why Reuse is Still So Hard
    (2009-07-01) Garlan, David; Allen, Robert; Mark Ockerbloom, John
    In this article, David Garlan, Robert Allen, and John Ockerbloom reflect on the state of architectural mismatch, a term they coined in their 1995 IEEE Software article, "Architectural Mismatch: Why Reuse Is So Hard." Although the nature of software systems has changed dramatically since the earlier article was published, the challenge of architectural mismatch remains an important concern for the software engineering field.
  • Publication
    Researching the Public Domain & Open Access
    (2015-10-22) Mark Ockerbloom, John; Terrell, Robert R
    A discussion of finding and making use of public domain material, the legal concept of fair use & the opportunities it provides, the wealth of post-1922 material available (1922 being the usual “safe” cutoff for public domain reuse) & how scholars can further contribute to that wealth by reclaiming open access for their work.
  • Publication
    New Maps of the Library: Building Better Subject Discovery Tools Using Library of Congress Subject Headings
    (2006-12-05) Mark Ockerbloom, John
    We describe tools in development at the University of Pennsylvania to generate and display interactive "subject maps" for exploring library collections. Based on the Library of Congress Subject Headings (LCSH), these maps are automatically built from existing authority records, a collection's bibliographic records, and optional local "tweaks" for local interests and search patterns. Users can explore these maps via ordinary text-based web browsing, and browse clusters of related research resources. We now provide these maps for small collections like The Online Books Page, and are experimenting with maps for the entire Penn Library catalog. We hope to enable users to take full advantage of the rich conceptual relationships in LCSH-based library collections, and effectively browse increasingly diverse and dispersed library collections.
  • Publication
    High Quality Discovery in a Web 2.0 World: Architectures for Next Generation Catalogs
    (2008-05-29) Mark Ockerbloom, John
    Issues of information and systems architecture underlie many of the current debates over the future of cataloging. This talk discusses some ways in which the architecture of the catalog is being redesigned to combine the rich information architecture of library metadata with the robust systems architecture of many Web-based discovery systems. I will show "subject map" discovery systems that better exploit the relationships in complex ontologies like LCSH, and discuss a Digital Library Federation initiative to promote standards supporting interoperability between discovery systems and ILS data and services. I will also touch on the role of networked architectures in improving the quality and efficiency of library cataloging.
  • Publication
    Archiving and Preserving PDF Files
    (2001-02-15) Mark Ockerbloom, John
    Since its release in mid-1993, Adobe Portable Document Format (PDF) has become a widely used standard for electronic document distribution worldwide in many institutional settings. Much of its popularity comes from its ability to faithfully encode both the text and the visual appearance of source documents, preserving their fonts, formatting, colors, and graphics. PDF files can be viewed, navigated, and printed with the free Adobe Acrobat Reader, available on all major computing platforms. PDF has many applications and is commonly used to publish government, public, and academic documents. Many of the electronic journals and other digital resources acquired by libraries are published in PDF format. As libraries grow more dependent on electronic resources, they need to consider how they can preserve these resources for the long term. Many libraries retain back runs of print journals that are over 100 years old, and which are still consulted by researchers. No digital technology has lasted nearly that long, and many data formats have already become obsolete and difficult to read in a much shorter time period. This document discusses ways that libraries can plan for the preservation of electronic journals and other digital resources in PDF format. After a brief discussion of the file specifications and the future plans for PDF, the article focuses on issues related to preservation of PDF files.
  • Publication
    The Next Mother Lode for Large-scale Digitization? Historic Serials, Copyrights, and Shared Knowledge
    (2006-04-11) Mark Ockerbloom, John
    Much of the publicity around recent mass-digitization projects focuses on the millions of books they promise to make freely readable online. Because of copyright, though, most of the books provided in full will be of mainly historical interest. But much of the richest historical text content is not in books at all, but in the newspapers, magazines, newsletters, and scholarly journals where events are reported firsthand, stories and essays make their debut, research findings are announced and critiqued, and issues of the day debated. Back runs of many of these serials are available in major research institutions but often in few other places. But they have the potential for much more intensive use, by a much wider community, if they are digitized and made generally accessible. In this talk, we will discuss an inventory we have conducted at Penn of periodicals copyright renewals. We found that copyrights of the vast majority of mid-20th-century American serials of historical interest were not renewed to their fullest possible extent. The inventory reveals a rich trove of copyright-free digitizable serial content from major periodicals as late as the 1960s. Drawing on our experience with this inventory's production and previous registry development, we will also show how low-cost, scalable knowledge bases could be built from this inventory to help libraries more easily identify freely digitizable serial content, and collaborate in making it digitally available to the world. Our initial raw inventory can be found at http://onlinebooks.library.upenn.edu/cce/firstperiod.html
  • Publication
    Mapping the library future: Subject navigation for today's and tomorrow's library catalogs
    (2008-01-12) Mark Ockerbloom, John
    My ALA Midwinter 2008 presentation slides on subject maps. For more details on how subject maps are created, see the New Maps of the Library white paper from 2006.