I have written and spoken often of the pressures on library technical services departments, which are being asked to do more work with the same or fewer resources at a time when they must find ways to become involved in new library initiatives. To achieve the results they need, technical services departments require breakthrough, double-digit improvements in cost, time, and effectiveness.
Some process redesign pioneers like Stanford and Cornell--braving the scorn of others--began over a decade ago to blaze a trail, and today some very large players indeed are embracing the concepts of process redesign and continuous improvement (systematic and continual rethinking of an entire process, not just bits and pieces of it). In 2007, for example, I became aware of the achievements of the Collection Acquisition and Description (CA&D) division at the British Library at Boston Spa, under the leadership of Caroline Brazier (then Head of CA&D, now Head of Resource Discovery) and Alasdair Ball (Head of CA&D Operations).
As is the case in so many places, the British Library's Boston Spa processing operations needed to keep up with the traditional work of selection, acquisitions and cataloging while simultaneously shifting focus to digital developments--all at a time of flat or shrinking resources. Through changes informed by workflow analysis and process mapping, Alasdair guided CA&D staff to 15% staff savings a year and faster turnaround of materials while also freeing up staff time for a digital processing team and other projects.
In late 2007 I invited Alasdair to visit OCLC's
The lower left corner of the floor plan shows where materials arrive. As many materials as possible are "fast tracked" and returned to "finishing" in the shortest possible path (shown in red) through the room. Teams located in the bottom half of the floor plan--a few more dozen feet into the room--complete the processing of still more materials and return them for finishing. Only those materials requiring the attention of original catalogers or other specialists make their way the full length of the red path through the room.
The following photos illustrate how our Contract Services in Dublin, Ohio, applied what they learned from Alasdair to our workflow redesign and space renovation. The
This photo illustrates the efforts of the "order entry" group. Our "air traffic controllers," these staff members organize the flow of materials by checking them in, searching, and using a set of automated tools to process as many materials as possible. What cannot be completed at this early stage is routed to the next appropriate team.
Three objectives of the workspace redesign were to provide equally for privacy, teamwork, and a logical flow of materials, while also taking advantage of the bright and airy nature of this large room.
The co-located original cataloging teams are organized by type and/or language of material.
Preparation for finishing--for example, custom editing according to a particular library's contract--is completed at the end of the process in a spacious work area that allows for multiple bins of different libraries' materials.
Space for physical processing and preparation for shipping is located on the way out of the building.
The workflow redesign and space renovation were completed at the end of June 2008. Later in the summer, we invited our OCLC colleagues to a big opening, complete with tours and a picnic lunch, to celebrate what everyone had accomplished and the new beginning. At this point, what is most evident to me about the change is the pride of the staff in their new space and in what they have accomplished together. We are grateful to Caroline Brazier, Alasdair Ball, and the British Library for their generosity and good counsel.
Worthy of note in this context is LC's movement to new workflows and an organizational structure that combines acquisitions and cataloging. As Beacher Wiggins, director of LC's Acquisitions and Bibliographic Access Directorate, put it in a talk at a June 2007 ALCTS preconference
Your comments on the concepts of continuous improvement and how they have been or might be applied in library technical services, or your accounts of experiences with technical services space renovation and process redesign are welcome here.
Here is another entry from my colleague
-------------------------------------------------------
My passport number is my identifier. The passport also carries metadata that identifies me, but not necessarily uniquely, because someone else could have the same name, birthplace and birth date. In this case, more elements are needed to distinguish me from another person, such as my photograph. As the number of elements required for uniqueness can vary, once identity is established, an identifier is applied for future ease. Thus it is hard to imagine a passport without a passport number, even if, strictly speaking, it is usually not simply numeric but an alphanumeric string. In a database, an identifier is used for identity in preference to an ensemble of descriptive elements. Unique identifiers provide direct access to records and are of fundamental importance in eliminating duplicates both from a database and from incoming records.
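The role of an identifier in eliminating duplicates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` fields, the sample passport numbers and the `merge_incoming` helper are hypothetical and do not describe any particular database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    identifier: str   # e.g. a passport number (hypothetical sample values below)
    name: str
    birth_date: str


def merge_incoming(database, incoming):
    """Add incoming records keyed by identifier; return those rejected as duplicates."""
    duplicates = []
    for record in incoming:
        if record.identifier in database:
            # Same identifier: treat as the same entity, whatever the description says.
            duplicates.append(record)
        else:
            database[record.identifier] = record
    return duplicates


database = {}
incoming = [
    Record("P1234567", "Ann Lee", "1970-01-01"),
    Record("P7654321", "Ann Lee", "1970-01-01"),  # same description, different person
    Record("P1234567", "Ann Lee", "1970-01-01"),  # true duplicate of the first record
]
duplicates = merge_incoming(database, incoming)
```

Matching on name and birth date alone would have merged the first two records wrongly; matching on the identifier keeps them apart and still catches the third record as a duplicate.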
Identifiers are important in the commercial world, having a key role in distribution, promotion, rights management and copyright protection. In
- ISBN (ISO 2108). Monographs - manifestation level
- ISSN (ISO 3297). Serials - manifestation level (but also used at the work level)
- ISMN (ISO 10957). Music - manifestation level
- ISWC (ISO 15707). Music - work level
- ISTC (ISO 21047). Text - work level
- ISRC (ISO 3901). Sound recordings - manifestation level
- ISAN (ISO 15706). Audio-visual - work level
- V-ISAN (ISO 15706-2). Audio-visual - manifestation level
- ISIL (ISO 15511). Libraries
- ISNI (DIS 27729). Name identifier - currently in progress
- ISCI (CD 27730). Collections - currently in progress
- DOI (Digital object identifier) - currently in progress
NISO, the North American standards body, is also involved in identifier standards, in particular a work item in progress for an institution identifier for all organisations involved in the supply chain of serial publications.
All the ISO standards with the exception of the ISIL and the ISCI are identifiers created for the purpose of underpinning commercial trade. These ISO identifier standards only cover materials where somebody has applied for an identifier, and the application process can be expensive. Thus within the WorldCat database it is estimated that only 30% of the resources represented have international identifiers. For the so-called "long tail" of resources of little or no commercial value, only quasi-official identifiers exist, such as the Library of Congress Control Number (LCCN) or the OCLC number. However, identifiers are becoming increasingly important in the Internet environment as a means to access identical resources in multiple sites, and hence the identifiers need to be unique on a global scale. They also need to be capable of being embedded within a URL (Uniform Resource Locator). URLs themselves are addresses and very poor identifiers, because they are both location-specific and volatile, changing frequently. This has led to the emergence of resolution systems that link from identifiers of resources to locations of information about the resources or to the actual resources themselves. The DOI is one such resolution system.
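To make the idea of a resolution system concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the resolver host `resolver.example`, the sample identifier and the lookup table are hypothetical, and a real resolver such as the DOI system is far more elaborate.

```python
from urllib.parse import quote

# Hypothetical resolver table: stable identifier -> current locations.
# Only this table changes when a resource moves; the identifier does not.
RESOLVER_TABLE = {
    "10.1000/example": [
        "https://publisher.example/articles/42",
        "https://mirror.example/a/42",
    ],
}


def identifier_url(resolver_base, identifier):
    """Embed an identifier in a URL, percent-encoding unsafe characters."""
    return resolver_base.rstrip("/") + "/" + quote(identifier, safe="/")


def resolve(identifier):
    """Return the known locations for an identifier (empty list if unknown)."""
    return RESOLVER_TABLE.get(identifier, [])


link = identifier_url("https://resolver.example/", "10.1000/example")
locations = resolve("10.1000/example")
```

The design point is the indirection: citations and catalog records carry the identifier-based link, while the volatile addresses live only in the resolver's table.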
There are several models for registering identifiers. In some cases (e.g. ISBN) the international agency releases blocks of numbers to national agencies who then assign them for use by publishers. The registration of the metadata associated with the identifier is the responsibility of the publisher and the national agency and not the international agency. So the definitive metadata for books is found in "in print" lists and national bibliographies. In other cases, a central database of identifiers and their associated metadata is maintained, as for the ISSN and as is planned for the ISTC. WorldCat can potentially become a reference database for unique identification of resources of all types, commercial and non-commercial alike.
Within WorldCat, as in other databases, identifiers are the key to linking and navigating from resources and their holdings in libraries to related resources, such as different editions of the same work or different works by the same author. Identifiers also link between base data and enriched data, both within the same database and in external databases. Further, identifiers are used to link from resources to services relating to the resources, for example to link from a metadata record within WorldCat.org to online delivery services provided by a local library. The standard protocols that underlie interoperability use identifiers for processing transactions. OCLC is already providing identifier services so that its identifier infrastructure can be used by other systems. The first two of these services are now in production, namely xISBN and xISSN, which allow retrieval of related resources by ISBN and ISSN respectively. The ISSN service includes a graphic display of the history of a serial, as in the figure below.
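Retrieving related resources by identifier can be pictured as a lookup through a shared work identifier. The Python sketch below is an assumption-laden illustration, not the xISBN or xISSN interface: the identifier strings are placeholders rather than real ISBNs, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: manifestation identifier -> work identifier.
# The identifier strings are placeholders, not real ISBNs.
MANIFESTATION_TO_WORK = {
    "isbn:0000000001": "work:A",  # hardback edition
    "isbn:0000000002": "work:A",  # paperback edition
    "isbn:0000000003": "work:A",  # audiobook
    "isbn:0000000004": "work:B",
}


def build_work_index(manifestation_to_work):
    """Invert the mapping: work identifier -> set of manifestation identifiers."""
    index = defaultdict(set)
    for manifestation, work in manifestation_to_work.items():
        index[work].add(manifestation)
    return index


def related_manifestations(identifier, manifestation_to_work, work_index):
    """Return the other manifestations of the same work, sorted for stable output."""
    work = manifestation_to_work.get(identifier)
    if work is None:
        return []
    return sorted(work_index[work] - {identifier})


work_index = build_work_index(MANIFESTATION_TO_WORK)
related = related_manifestations("isbn:0000000001", MANIFESTATION_TO_WORK, work_index)
```

Given one edition's identifier, the lookup returns the other editions of the same work without comparing any descriptive metadata.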
For a clearer view of this example, please visit the xISSN registry here.
Further identifier services are being progressively released by OCLC, including on the horizon a service allowing grouping of resources at work level that is currently in pilot with the Dutch union catalogue and services based on manifestation identifiers (project GLiMIr - Global Library Manifestation Identifier).
As requested by conference attendees, here is the presentation at the Industry Symposium at IFLA, 14 August 2008. The presentation describes a new environment of global information services and exciting new roles for metadata and a variety of knowledge organization methods. It argues that the changes in the environment will permanently affect what it means "to catalog" materials for the purpose of connecting citizens, students and scholars to the information they need, when and where they need it.
I participated in a panel sponsored by the Libraries and Web 2.0 Discussion Group at the recent IFLA annual meeting in Quebec. Here is some background and the means to access my presentation and speaker notes, for those who asked. The session attracted about 80 or 85 people. First, the description of the panel from the IFLA programme:
"The Open Knowledge Foundation (ONF) has criticized the draft report of the Working Group for Bibliographic Control of the Library of Congress because there is no provision for the access, re-use and re-distribution of bibliographic data without restriction. The ONF published a petition that all bibliographic data should be free which is supported by users and Web 2.0 services like Library Thing and the Open Library Project. What does that mean for our practice?
We would like to discuss this with representatives of the projects, national libraries and other major data providers. We think that we need to start the discussion as soon as possible and therefore invite all interested delegates to this first meeting of the Library and Web 2.0 Discussion Group."
Patrick Danowski (State Library of Berlin), chair of the Discussion Group, convened the panel and introduced the session. Panelists included Stephen Abram (SirsiDynix), me, Sally McCallum (Library of Congress), and Patrick Peiffer (National Library of Luxembourg and project lead for Creative Commons in Luxembourg). Karen Coyle (consultant--of late to Open Library) submitted a brief video called "free the data" to start off the panel. My presentation followed.
For those who asked, I've put my presentation plus my speaker notes up on SlideShare. During the upload to SlideShare, something strange happened, and the speaker notes for slide 2 are actually at the very end of the notes, so watch out for that. (In SlideShare, speaker notes show up as comments.)
Here's a brief summary of what I presented:
--Information seekers expect seamless connections between metadata and content, regardless of source
--The information industry is being driven to a data sharing model based upon the value in the exchange and linking of data
--Nearly all organizations have terms and conditions for data sharing (documented or not)
--There is no such thing as "free" content or metadata
--There is no such thing as "free" content or metadata services
--"Where the money comes from" directly impacts data sharing policy
--This is a painful transition, esp. for those organizations directly dependent on revenue from content, metadata, or content/metadata-based services
--The present landscape is rich in contradictions
I was struck by the article in today's online New York Times--the first in a series of pieces on reading and the Internet. While some pundits worry about declining scores on standard reading tests (that test skills evolved for an offline world), other experts note the emergence of a new kind of reading--online, nonlinear, participatory and active, and multilayered. Traditional reading of books among 17-year-olds is down compared to 2004, while reading on the Internet is up. One survey indicates that by 2004, nearly half of 8 to 18 year-olds were using the Internet every day, and thus engaged at some level in reading text. One child's mother, interviewed for today's NY Times story, keeps bringing home books from the local library to try to 'fix' her obviously bright, not-in-need-of-repair young daughter, who prefers the interactive, multifaceted style of reading she does on the 'net.
The article reminded me of Carole Palmer's talk at OCLC a few weeks back. Lorcan blogged about Carole's visit and the paper she presented. Kids are not the only ones doing a lot of reading on the 'net--so are our top scholars. Carole noted in her talk that scholars read 30% more articles in 2006 than in the mid-1990s, but that reading time per article fell. Scholars are reading in short bursts or as Carole puts it, 'bouncing' from one online source to another. They are rapidly navigating through more material in less time, attempting to evaluate and use content while minimizing actual reading. Indexing, abstracts, literature reviews, online discussions and interactions with colleagues, tables of contents and so on, all help researchers decide whether articles are relevant--without reading them from beginning to end.
I am guessing that--if given the means to 'bounce'--the general public's reading behavior might be characterized by bouncing as well. In the WorldCat quality research I've been involved with over the past several months, I've observed that for materials that aren't one or two clicks away already--in other words, for most of what libraries own--searchers from all the groups we studied expressed a strong desire for instant, easy access to summaries, abstracts, reviews, TOCs--anything that would help them decide whether to pursue or avoid reading a title in a search result set. Readers everywhere seem to feel pressed for time. What are they reading from cover to cover, and how do they decide? How can libraries participate and help in this process?
Libraries cannot win if they do not play in the idiom of the Web. More deeply understanding end users' reading styles and preferences could help libraries know what practices to hold onto and what to give up or cut back. Like the standard tests of reading skill, library cataloging practices evolved in keeping with a reading style that is offline, linear, solitary and passive, extended in time, often text-only (few visuals), and single layered (that is, offers one point of view). How much of library cataloging practice reflects the predominant reading style of the last few hundred years? What still works about traditional cataloging practice, and what no longer suits the end user behavior and preferences associated with online reading, or with selecting what to read offline?
It occurs to me that what we tend to chatter about on our lists and blogs -- how much bibliographic data here, encoded in what way in our systems, with how many fields and what type of controlled subject access, and whether the whole qualifies as full, minimal, abbreviated or whatever -- could stand to be assessed in light of the changing world of reading. I am wondering how library cataloging and metadata professionals might begin to identify those practices that are likely to attract and sustain more attention to library collections, given the new age of reading that appears to be dawning, at least among certain segments of the end user communities that libraries serve.
By Janifer Gatenby, OCLC Leiden Office
In February 2008, the Nederlandse Centrale Catalogus (NCC: the Dutch Union Catalogue)* started updating WorldCat in real time. As Dutch librarians create new bibliographic records and add new holdings to new and existing records in the NCC, WorldCat is simultaneously and automatically updated behind the scenes. Thus, all new data is visible in WorldCat within seconds, even though it is entered via a local interface. Once in WorldCat, the data is exposed to a global audience via the freely available site WorldCat.org. The data is also formatted and made available to the major search engines, in particular Google and Yahoo, and to various other major sites such as Facebook. Many sites, including an increasing number of blog sites, have downloaded the WorldCat search box, allowing direct searching from their pages. Other sites are accessing WorldCat via their own programs using the WorldCat API (Application Programming Interface).
The importance of up-to-date and comprehensive information in all potential discovery environments cannot be overstated; users may search wherever they choose and librarians can be confident that these users will be able to retrieve the same resources, no matter their point of entry.
Without the
In 5 months of operation of SRU update, the NCC has added to WorldCat more than 1.5 million holdings and 230,000 bibliographic records with a time between appearance in one catalogue and appearance in the other averaging less than 5 seconds. For the first two months, there were two streams of data, ongoing regular work and a gap load of records processed between the initial batch load and the live date for
Following the NCC success, a second release of the software was implemented on 14th July 2008, extending the service to include enhanced security, a mechanism for the deletion of holdings, and a validation-only routine. The validation-only routine serves as a valuable testing tool for libraries and OCLC developers alike. The service is now ready for further implementations. The second installation will be the Libraries Australia*** service later this year.
The new service has great potential for WorldCat because a growing percentage of contribution to WorldCat is not via WorldCat's online Connexion interface. All systems, of libraries of all types, currently contributing via batch can potentially switch to interactive update. Sending systems need to have
*The NCC contains bibliographic references and the locations of approximately 12 million books and almost 500,000 periodicals in more than 400 libraries in the Netherlands. The NCC was initially loaded to WorldCat in fiscal year 2006.
**
***Libraries Australia, coordinated by the National Library of Australia, is used for reference, collection development, cataloguing and interlibrary lending. Libraries Australia contains the Australian National Bibliographic Database (ANBD), which records the location details of over 42 million items held in most Australian academic, research, national, state, public and special libraries. The holdings of Libraries Australia became part of WorldCat in fiscal year 2008.
By David Whitehair
David Whitehair is the Global Product Manager for Cataloging & Metadata Services. He manages the group that is responsible for many of our services, including Connexion, Contract Cataloging, WorldCat Selection, and Language Sets.
Several of us from the Cataloging & Metadata Services group attended the American Library Association meeting in
Although like most attendees we are exhausted by the end of the conference, we couldn't imagine June/July without it. (OK... I will admit... some of us really question why the OCLC breakfast has to be at 7:00 AM on Sunday. And why do we have to be there at 6:30? We love the breakfast, but how about a brunch instead?! Ha!) Also like most attendees, at each ALA we sample the local cuisine, and we try to have one dinner together. This time we had a fab-u-lous group dinner at Mr Stox, where we welcomed our team members who traveled the farthest (
Here is just a peek into some of our activities. Friday included the Enhance Sharing Session which was hosted by Jay Weitz and Glenn Patton. Glenn was then joined by Robert Bremer to host a meeting to discuss the implementation of the new linking ISSN. Saturday started early for the Dewey team members Libbie Crawford, Michael Panzer, Joan Mitchell, Giles Martin, and Juli Beall, who hosted a record-breaking turnout at the Dewey Update Breakfast (wow... I guess I shouldn't complain about getting up early for the Sunday breakfast... the Dewey team does this two days in a row).
Sunday afternoon included the New Directions in Cataloging at OCLC presentation given by Renee Register, supported by team members Maureen Huss and Robin Buser. Sunday also included lots of time talking with OCLC members at the OCLC booth for Robin Buser. Of course Rich Greene was undertaking his usual Mr. MARBI marathon at this conference; Doug Perkins conferred with many OCLC members about batchloading, and Linda Gabel and Cynthia Whitacre carried out their many duties as SAC and ALCTS committee members/leaders.
For me... along with Lisa Elliott, the trip ended with a couple of library visits on Tuesday. It is always a joy to be able to visit libraries when we are on the road--and since we were out and about, Lisa and I were able to end the trip with a terrific dinner in
And yes, the planning for next
This post is from my colleague Michael Panzer, who is OCLC's Global Product Manager for Taxonomy Services. Michael builds technical and business cases for the use of Dewey and other OCLC terminology resources in a wide variety of web applications. That means analyzing the traditional use cases and user base for knowledge organization systems while rethinking their role in a rapidly changing information landscape.
---------------------------------------
By Michael Panzer
When dealing with a large-scale and widely-used knowledge organization system like the Dewey Decimal Classification, we often tend to focus solely on the organization aspect, which is closely intertwined with editorial work. This is perfectly understandable, since developing and updating the DDC, keeping up with current scientific developments, spotting new trends in both scholarly communication and popular publishing, and figuring out how to fit those patterns into the structure of the scheme are as intriguing as they are challenging.
From the organization perspective, the intended user of the scheme is mainly the classifier. Dewey acts very much as a number-building engine, providing richly documented concepts to help with classification decisions.
Since the Middle Ages, quasi-religious battles have been fought over the "valid" arrangement of places according to specific views of the world, as parodied by Jorge Luis Borges and others. Organizing knowledge has always been primarily an ontological activity; it is about putting the world into the classification.
However, there is another side to this coin--the discovery side. While the hierarchical organization of the DDC establishes a default set of places and neighborhoods that is also visible in the physical manifestation of library shelves, this is just one set of relationships in the DDC. A
What are those "other" relationships that Dewey possesses and that seem so important to surface? Firstly, there is the relationship of concepts to resources. Dewey has been used for a long time, and over 200,000 numbers are assigned to information resources each year and added to WorldCat by the Library of Congress and the German National Library alone.
Secondly, we have relationships between concepts in the scheme itself. Dewey provides a rich set of non-hierarchical relations, indicating other relevant and related subjects across disciplinary boundaries.
Thirdly, and perhaps most importantly, there is the relationship between the same concepts across different languages. Dewey has been translated extensively, and current versions are available in French, German, Hebrew, Italian, Spanish, and Vietnamese. Briefer representations of the top three levels (the DDC Summaries) are available in several languages in the DeweyBrowser. This multilingual nature of the scheme allows searchers to access a broader range of resources or to switch the language of--and thus localize--subject metadata seamlessly. MelvilClass, a Dewey front-end developed by the German National Library for the German translation, could be used as a common interface to the DDC in any language, as it is built upon the standard DDC data format.
It is not hard to give an example of the basic terminology of a class pulled together in a multilingual way:
<class/794.8> a skos:Concept ;
    skos:notation "794.8"^^ddc:notation ;
    skos:prefLabel "Computer games"@en ;
    skos:prefLabel "Computerspiele"@de ;
    skos:prefLabel "Jeux sur ordinateur"@fr ;
    skos:prefLabel "Juegos por computador"@es .
Expressed in this manner, the Dewey number provides a language-independent representation of a Dewey concept, accompanied by language-dependent assertions about the concept. This information, identified by a URI, can be easily consumed by semantic web agents and used in various metadata scenarios.
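How an agent might use those language-tagged labels can be sketched in Python. This is a rough illustration only: the regular expression is a stand-in for a real RDF parser and handles nothing beyond simple `skos:prefLabel` lines like those in the example, and the helper names are hypothetical.

```python
import re

# The class description from the example, held as plain text.
TURTLE_SNIPPET = '''
<class/794.8> a skos:Concept ;
    skos:notation "794.8"^^ddc:notation ;
    skos:prefLabel "Computer games"@en ;
    skos:prefLabel "Computerspiele"@de ;
    skos:prefLabel "Jeux sur ordinateur"@fr ;
    skos:prefLabel "Juegos por computador"@es .
'''

# Matches: skos:prefLabel "some label"@lang
LABEL_PATTERN = re.compile(r'skos:prefLabel\s+"([^"]+)"@([A-Za-z-]+)')


def preferred_labels(turtle_text):
    """Collect language tag -> preferred label from simple prefLabel statements."""
    return {lang: label for label, lang in LABEL_PATTERN.findall(turtle_text)}


def label_for(labels, language, fallback="en"):
    """Pick the label in the requested language, else the fallback language."""
    if language in labels:
        return labels[language]
    return labels.get(fallback)


labels = preferred_labels(TURTLE_SNIPPET)
german = label_for(labels, "de")
italian = label_for(labels, "it")  # no Italian label here, so falls back to English
```

The notation 794.8 stays the same for every user; only the label shown changes with the requested language.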
Fourthly, as we have seen, it is important to play well with others, i.e., establishing and maintaining relationships to other
Pulling those relationships together under a common surface will be the next challenge going forward. In the semantic web community the concept of Linked Data currently receives some attention, with its emphasis on exposing and connecting data using technologies like URIs, HTTP and RDF to improve information discovery on the web. With its focus on relationships and discovery, it seems that Dewey will be well prepared to become part of this big linked data set. Now it is about putting the classification back into the world!
A good thing about traveling and speaking a lot is that people ask you questions. Sometimes they ask intriguing ones, the kinds that stick in your head. One of those questions compared libraries to pizza shops--"if I want to find a pizza shop in my neighborhood I just google 'pizza' and my zip code, and they all come up. Why isn't it enough to just google 'library' and my zip code and see what's near me--why do I need WorldCat to aggregate library collections for me?"
I tried googling "pizza" and "library" appended with various zip codes, and it's true, Google yields a nice list for both kinds of establishments--with maps, addresses, URLs and phone numbers. From the library list one can usually connect to an online catalog to search and browse library collections, albeit one at a time. At the same time it must be admitted that the method works better for pizza than it does for libraries; academic libraries in particular get left out of the Google search results. But for the sake of argument, let's say that the Google technique for identifying nearby libraries and what's in their collections is effective and comprehensive. Would a "collective collection" of many library collections--that is, WorldCat--still be useful to libraries and the communities they serve?
Beyond Pepperoni, Veggie and Supreme
The most obvious difference between pizza shops and libraries is that with pizza shops, it's pretty easy to predict the inventory. A few also sell lasagna and salads, but most pizza shops are pretty much alike. Libraries and library collections are not alike; and in fact library collections tend to be made up of both popular and harder-to-find items, suggesting a long tail* strategy for attracting the attention of mainstream and niche readers. If the people in the neighborhood don't know those rich library collections are nearby, they may as well not exist. It is large scale digital visibility that creates large scale use, even local use. A database like WorldCat provides a large distribution channel to raise awareness and aggregate dispersed audiences for library collections. WorldCat helps people in the neighborhood more conveniently find the treasures they didn't know they had--in their local libraries, right around the corner.
More Is Better
As library data sets go, WorldCat is a big database. Fiscal year 2008 was a record breaking year, featuring over 26% growth in WorldCat, from nearly 86 million bibliographic records on July 1, 2007, to over 108 million on June 30, 2008. Growth is coming largely from libraries outside the U.S. Major loads in fiscal year 2008 included the holdings of the National Library of Australia and its 800+-member Libraries Australia consortium; the National Library of Sweden; the Bayerische Staatsbibliothek; HeBIS (a network of libraries in a region of Germany); the National Library of New Zealand; and the Swiss National Library.
More Is Better in More than One Way
"To do anything useful with tags, you need numbers. With only a few tags, you can't conclude much. The tags could just be 'noise.'"
This quote from a well known post on Thingology compares LibraryThing's tagging system with Amazon's. With a critical mass of tags, the aggregated whole becomes greater than the sum of the parts. A large number of tags is the starting point, but it isn't simply the large number that makes LT tags useful; it's the patterns that emerge when a large amount of metadata is combined and mined for relationships. In this same way, WorldCat.org, for example, has been "FRBRized" with a set of algorithms developed in the OCLC Office of Research. When the FRBR model brings together bibliographic records in work sets, not only can the original metadata be enriched: end users are enabled to sift through myriad resources more effectively, irrespective of the specific "container" or item the content is carried in. Thus, WorldCat is more than the sum of all the bibliographic records it contains.
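The idea of bringing records together in work sets can be illustrated with a deliberately naive Python sketch. It is not the algorithm developed in the OCLC Office of Research; the matching key (normalized author plus title), the sample records and the function names are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse punctuation and spacing so trivial variants match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def work_key(author, title):
    """A naive work key: normalized author plus normalized title."""
    return (normalize(author), normalize(title))


def group_into_work_sets(records):
    """Group bibliographic records that share the same work key."""
    work_sets = defaultdict(list)
    for record in records:
        work_sets[work_key(record["author"], record["title"])].append(record)
    return dict(work_sets)


# Hypothetical sample records describing different "containers" of content.
records = [
    {"id": "rec1", "author": "Melville, Herman", "title": "Moby Dick", "format": "print"},
    {"id": "rec2", "author": "Melville, Herman", "title": "Moby-Dick", "format": "audiobook"},
    {"id": "rec3", "author": "Austen, Jane", "title": "Emma", "format": "print"},
]
work_sets = group_into_work_sets(records)
moby = work_sets[("melville herman", "moby dick")]
```

Once the print and audiobook records sit in one work set, a search result can present the work once and list its formats underneath.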
A Switch More Than a Destination
The first step to get pizza is to call or visit a pizza shop. The first step to get library materials is to type or click the URL of the library's Web pages or visit the library. WorldCat's collective collection can be visited directly too, by typing in the URL, typing in a WorldCat search box on another site, or clicking on a bookmark. When visited directly, WorldCat.org functions as a kind of "destination restaurant," which has a strong enough appeal to draw customers from outside its particular community.
Yet, while many do visit WorldCat.org as a known destination in and of itself, in fact WorldCat.org functions more as a "switch" to lead information seekers from a variety of places on the Web to more than 10,000 local library collections. In effect, WorldCat enables information seekers to start an information search elsewhere and end up at their local libraries. In this way, working with its Open WorldCat partners, WorldCat exposes the collective collections of libraries globally, to raise awareness and use of particular library collections locally, offering an entirely new set of paths to the pizza shop--er, library.
What follows is a statistical analysis of how visitors reach WorldCat.org. The results suggest that WorldCat.org is now beginning to function well as a switch, with the potential to drive substantial traffic to local libraries from search engines and other Web sites.
Referrals to WorldCat.org, January 1 - May 1, 2008
Search Engines 47.45%
Other Web Sites 39.91%
Typed/Bookmarked URLs 12.64%
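As a quick check on the figures above, the following Python sketch adds up the shares. The grouping of search engines and other Web sites into a single "switch" share is my own reading of the table, and the variable names are hypothetical.

```python
from decimal import Decimal

# Referral shares (percent) as given in the table above.
referrals = {
    "Search Engines": Decimal("47.45"),
    "Other Web Sites": Decimal("39.91"),
    "Typed/Bookmarked URLs": Decimal("12.64"),
}

# All three categories together should account for every visit.
total = sum(referrals.values())

# Visits that arrived from somewhere else on the Web (the "switch" role),
# as opposed to visitors who went to WorldCat.org directly.
switch_share = referrals["Search Engines"] + referrals["Other Web Sites"]
direct_share = referrals["Typed/Bookmarked URLs"]
```

On these figures, roughly seven of every eight visits began somewhere other than WorldCat.org itself.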
-------------------------
*The concept of the "long tail," a phrase coined by Chris Anderson, is that given a large enough distribution channel, items in low demand ("non-hits") can collectively generate a volume of usage that meets or exceeds that of bestsellers ("hits").
