August 2008 Archives

The Importance of Identifiers


Here is another entry from my colleague Janifer Gatenby, who works at the OCLC Leiden office. In her OCLC role, Janifer--a frequent speaker and standards specialist--identifies trends and opportunities for Web data services and system interoperation. One of her current projects involves leveraging the power of standard library identifiers in the Web 2.0 environment. Identifiers are key elements in moving data from different systems to the network level and achieving interoperability. Janifer deals with the larger topic of data sharing and interoperability in her just-published open access article, "The Networked Service Layer: Sharing Data for More Effective Management and Cooperation."

-------------------------------------------------------

My passport number is my identifier.  The passport also carries metadata that identifies me, but not necessarily uniquely, because someone else could have the same name, birthplace and birth date.  In that case, more elements, such as my photograph, are needed to distinguish me from another person.  Because the number of elements required for uniqueness can vary, an identifier is assigned once identity has been established, for ease of future reference.  Thus it is hard to imagine a passport without a passport number, even if, strictly speaking, it is usually not simply numeric but an alphanumeric string.  In a database, an identifier is used for identity in preference to an ensemble of descriptive elements.  Unique identifiers provide direct access to records and are of fundamental importance in eliminating duplicates both from a database and from incoming records.
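The role of identifiers in eliminating duplicates can be sketched in a few lines of Python. This is a minimal illustration, not a description of any real system: the record layout, the "id" key, and the sample passport-style numbers are all assumptions made up for the example.

```python
def merge_incoming(database, incoming):
    """Add incoming records to the database, keyed by identifier.

    `database` maps identifier -> record; each record is a hypothetical
    dict with an "id" key. Records whose identifier is already present
    are not added; they are returned as the list of duplicates.
    """
    duplicates = []
    for record in incoming:
        key = record["id"]
        if key in database:
            duplicates.append(record)   # same identifier: same entity
        else:
            database[key] = record      # new identifier: new entity
    return duplicates


# Illustrative data only: two people share a name but not an identifier.
database = {"P1234567": {"id": "P1234567", "name": "A. Reader"}}
incoming = [
    {"id": "P1234567", "name": "A. Reader"},  # already in the database
    {"id": "P7654321", "name": "A. Reader"},  # same name, different person
]
rejected = merge_incoming(database, incoming)
print(len(database), len(rejected))
```

Matching on the identifier alone avoids comparing names, dates and other descriptive elements, which is exactly the situation in which two different people could be mistaken for one.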

 

Identifiers are important in the commercial world, having a key role in distribution, promotion, rights management and copyright protection.  In England in 1967, the SBN (Standard Book Number) was established by J. Whitaker & Sons (now part of Nielsen BookData), publishers of British Books in Print.  The following year, this identifier became the ISBN (International Standard Book Number), now arguably the best recognized identifier in the bibliographic world.  This identifier, produced by the book trade, is used for collecting sales statistics and remunerating publishers and authors alike.  ISO, the International Organization for Standardization, has since approved and published a whole gamut of international identifiers as complements to the ISBN.  The list of identifiers consists of:

 

    • ISBN (ISO 2108).  Monographs - manifestation level
    • ISSN (ISO 3297[1]).  Serials - manifestation level (but also used at the work level)
    • ISMN (ISO 10957[2]).  Music - manifestation level
    • ISWC (ISO 15707[3]). Music - work level
    • ISTC (ISO 21047[4]).  Text - work level
    • ISRC (ISO 3901[5]).  Sound recordings - manifestation level
    • ISAN (ISO 15706[6]). Audio-visual - work level
    • V-ISAN (ISO 15706-2).  Audio-visual - manifestation level
    • ISIL (ISO 15511[7]) Libraries
    • ISNI (DIS 27729) Name identifier - currently in progress
    • ISCI (CD  27730) Collections - currently in progress
    • DOI (Digital object identifier) - currently in progress
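Some of the identifiers listed above end in a check character, which lets a system reject most mistyped numbers before looking anything up. The sketch below checks the ISBN (10- and 13-digit forms) and the ISSN using their standard weighted-sum rules. The function names are my own, and the code checks only the check character, not whether a number has actually been assigned.

```python
DIGITS = "0123456789"


def _weighted_mod11_ok(chars, length):
    """Shared rule for ISBN-10 and ISSN: weights run from `length` down to 1,
    a final 'X' stands for 10, and the weighted sum must be divisible by 11."""
    if len(chars) != length:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in DIGITS:
            value = int(char)
        elif char == "X" and position == length - 1:
            value = 10
        else:
            return False
        total += (length - position) * value
    return total % 11 == 0


def isbn10_is_valid(isbn):
    return _weighted_mod11_ok(isbn.replace("-", "").replace(" ", "").upper(), 10)


def issn_is_valid(issn):
    return _weighted_mod11_ok(issn.replace("-", "").replace(" ", "").upper(), 8)


def isbn13_is_valid(isbn):
    """ISBN-13: weights alternate 1, 3, 1, 3, ...; the sum must be divisible by 10."""
    chars = isbn.replace("-", "").replace(" ", "")
    if len(chars) != 13 or any(c not in DIGITS for c in chars):
        return False
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(chars))
    return total % 10 == 0


print(isbn10_is_valid("0-306-40615-2"))      # True
print(isbn10_is_valid("0-306-40615-3"))      # False
print(isbn13_is_valid("978-0-306-40615-7"))  # True
print(issn_is_valid("0378-5955"))            # True
```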

NISO, the North American standards body, is also involved in identifier standards, in particular a work item in progress for an institution identifier for all organisations involved in the supply chain of serial publications.

 

All the ISO standards, with the exception of the ISIL and the ISCI, are identifiers created for the purpose of underpinning commercial trade. These ISO identifier standards only cover materials for which somebody has applied for an identifier, and the application process can be expensive.  Thus within the WorldCat database it is estimated that only 30% of the resources represented have international identifiers.  For the so-called "long tail" of resources of little or no commercial value, only quasi-official identifiers exist, such as the Library of Congress Control Number (LCCN) or the OCLC number. However, identifiers are becoming increasingly important in the Internet environment as a means to access identical resources at multiple sites, and hence the identifiers need to be unique on a global scale.  They also need to be capable of being embedded within a URL (Uniform Resource Locator). URLs themselves are addresses and very poor identifiers, because they are both location-specific and volatile, changing frequently.  This has led to the emergence of resolution systems that link from identifiers of resources to locations of information about the resources or to the actual resources themselves.  The DOI is one such resolution system.
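The idea behind a resolution system can be shown with a toy example. The sketch below is not how the DOI system or any other real resolver is implemented; the class, the identifier "example:0001" and the example.org addresses are all hypothetical. It shows only the principle: the identifier stays fixed while the location behind it is free to change, and the identifier can be embedded safely in a URL.

```python
from urllib.parse import quote


class Resolver:
    """Toy resolution table: stable identifier -> current location."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        # Registering again for the same identifier replaces the old location.
        self._locations[identifier] = url

    def resolve(self, identifier):
        # Returns None when the identifier is unknown.
        return self._locations.get(identifier)


def identifier_to_url(resolver_base, identifier):
    """Embed an identifier in a URL path, percent-encoding unsafe characters."""
    return resolver_base.rstrip("/") + "/" + quote(identifier, safe="/")


resolver = Resolver()
resolver.register("example:0001", "http://old-host.example.org/item/1")
# The resource moves; only the resolver's table has to be updated.
resolver.register("example:0001", "http://new-host.example.org/archive/1")

print(resolver.resolve("example:0001"))
print(identifier_to_url("http://resolver.example.org/", "example:0001"))
```

Anyone who stored the identifier, rather than the first address, still reaches the resource after the move.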

 

There are several models for registering identifiers.  In some cases (e.g. ISBN) the international agency releases blocks of numbers to national agencies who then assign them for use by publishers.  The registration of the metadata associated with the identifier is the responsibility of the publisher and the national agency and not the international agency.  So the definitive metadata for books is found in "in print" lists and national bibliographies.  In other cases, a central database of identifiers and their associated metadata is maintained, as for the ISSN and as is planned for the ISTC.  WorldCat can potentially become a reference database for unique identification of resources of all types, commercial and non-commercial alike.  

 

Within WorldCat, as in other databases, identifiers are the key to linking and navigating from resources and their holdings in libraries to related resources, such as different editions of the same work or different works by the same author.  Identifiers also link between base data and enriched data, both within the same database and in external databases.  Further, identifiers are used to link from resources to services relating to the resources, for example to link from a metadata record within WorldCat.org to online delivery services provided by a local library.  The standard protocols that underlie interoperability use identifiers for processing transactions.  OCLC is already providing identifier services so that its identifier infrastructure can be used by other systems. The first two of these services are now in production: xISBN and xISSN, which allow retrieval of related resources by ISBN and ISSN respectively.  The ISSN service includes a graphic display of the history of a serial, as in the figure below.

 

 

[Figure: graphic display of the history of a serial from the xISSN service]  For a clearer view of this example, please visit the xISSN registry here.
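Retrieving related resources from a single identifier can be pictured as a two-step lookup: from a manifestation identifier to the work it belongs to, then from the work to its other manifestations. The sketch below illustrates that idea only. It does not reproduce the xISBN or xISSN services or their interfaces, and the class name and placeholder identifiers are invented for the example.

```python
from collections import defaultdict


class WorkIndex:
    """Toy index linking manifestation identifiers through a shared work identifier."""

    def __init__(self):
        self._work_of = {}                 # manifestation id -> work id
        self._members = defaultdict(set)   # work id -> manifestation ids

    def add(self, work_id, manifestation_id):
        self._work_of[manifestation_id] = work_id
        self._members[work_id].add(manifestation_id)

    def related(self, manifestation_id):
        """Return the other manifestations of the same work, sorted."""
        work_id = self._work_of.get(manifestation_id)
        if work_id is None:
            return []
        return sorted(self._members[work_id] - {manifestation_id})


index = WorkIndex()
# Placeholder identifiers, not real ISBNs.
index.add("work-1", "isbn-hardback")
index.add("work-1", "isbn-paperback")
index.add("work-1", "isbn-ebook")
index.add("work-2", "isbn-other-title")

print(index.related("isbn-paperback"))
```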

 

OCLC is progressively releasing further identifier services.  On the horizon are a service that groups resources at the work level, currently in pilot with the Dutch union catalogue, and services based on manifestation identifiers (project GLiMIr - Global Library Manifestation Identifier).

OUR Space: The New World of Metadata


As requested by conference attendees, here is the presentation given at the Industry Symposium at IFLA on 14 August 2008. The presentation describes a new environment of global information services and exciting new roles for metadata and a variety of knowledge organization methods. It argues that the changes in the environment will permanently affect what it means "to catalog" materials for the purpose of connecting citizens, students and scholars to the information they need, when and where they need it.

 

I participated in a panel sponsored by the Libraries and Web 2.0 Discussion Group at the recent IFLA annual meeting in Quebec.  Here is some background and the means to access my presentation and speaker notes, for those who asked. The session attracted about 80 or 85 people. First, the description of the panel from the IFLA programme:

"The Open Knowledge Foundation (ONF) has criticized the draft report of the Working Group for Bibliographic Control of the Library of Congress because there is no provision for the access, re-use and re-distribution of bibliographic data without restriction. The ONF published a petition that all bibliographic data should be free which is supported by users and Web 2.0 services like Library Thing and the Open Library Project. What does that mean for our practice?

We would like to discuss this with representatives of the projects, national libraries and other major data providers. We think that we need to start the discussion as soon as possible and therefore invite all interested delegates to this first meeting of the Library and Web 2.0 Discussion Group."

Patrick Danowski (State Library of Berlin), chair of the Discussion Group, convened the panel and introduced the session. Panelists included Stephen Abram (SirsiDynix), me, Sally McCallum (Library of Congress), and Patrick Peiffer (National Library of Luxembourg and project lead for Creative Commons in Luxembourg). Karen Coyle (consultant--of late to Open Library) submitted a brief video called "free the data" to start off the panel. My presentation followed. 

For those who asked, I've put my presentation plus my speaker notes up on SlideShare.  During the upload to SlideShare, something strange happened, and the speaker notes for slide 2 are actually at the very end of the notes, so watch out for that. (In SlideShare, speaker notes show up as comments.) 

Here's a brief summary of what I presented:

--Information seekers expect seamless connections between metadata and content, regardless of source
--The information industry is being driven to a data sharing model based upon the value in the exchange and linking of data
--Nearly all organizations have terms and conditions for data sharing (documented or not)
--There is no such thing as "free" content or metadata
--There is no such thing as "free" content or metadata services
--"Where the money comes from" directly impacts data sharing policy
--This is a painful transition, esp. for those organizations directly dependent on revenue from content, metadata, or content/metadata-based services
--The present landscape is rich in contradictions

Please refer to the full presentation and notes to place the summary in context. Following the two presentations, a rich and thoughtful discussion of the issues ensued among panelists and session participants. I benefited from the program and was very pleased to be part of it.
 
I've seen two quite brief blog posts on the session--mentions, really. I am wondering if someone will blog the discussion more fully or else took notes they are willing to share here.

About this blog

Metalogue is a forum for sharing thoughts on all things related to knowledge organization by and for libraries, hosted by Karen Calhoun, Vice President, WorldCat and Metadata Services for OCLC. Karen is joined often by friends and colleagues from all over the globe, who contribute perspectives and experiences about the current and future state of cataloguing and metadata.
