October 2008 Archives
In today's networked environment, more and more library services are delivered remotely, at a distance from the library that provides them. Conveniences such as link resolvers that take you to your library's journal subscriptions, the "Find in a Library" links in Google Book Search, and deep links from WorldCat.org into your local catalog all depend on a catalog of a different sort.
OCLC's WorldCat Registry [http://www.oclc.org/us/en/registry/default.htm] is a Web-enabled registry listing libraries, the services they offer, and the addresses, both physical and digital, at which they are offered. It is more than a simple list, however: built-in Web services distribute Registry data across the Web, enhancing Web discovery.
The folks at WorldCat Registry have been busy recently creating teaching materials, including an informative YouTube video [http://www.youtube.com/watch?v=OEEQdoaHpxc]. These materials show the value that a comprehensive registry promises. The quality of the "catalog" or "registry" is dependent on its data, so the team is working hard to improve the efficiency of entering and managing the data, as well as improving its global coverage. We urge you to look at the WorldCat Registry project and make sure that your library is represented accurately.
In the most recent issue of LRTS, Janet Swan Hill compares the benefits of cooperative cataloging systems to "how chain letters would work if everyone followed the instructions."  She continues with the analogy of a chain letter that instructs the recipient to put his/her name on the bottom of a list of names, forward the letter to five people, and send a pair of socks to the person at the top of the list. A few weeks later, if everybody follows the instructions, dozens of pairs of socks begin to arrive.
Conventional wisdom suggests that WorldCat is an enormous store of metadata contributed by or gathered from individual libraries and other sources (a big mountain of metadata), and of course that is true. But WorldCat is much more than the sum of its collectively contributed metadata, thanks in part to the continuous improvement of its records by the cataloging community, and in part to the curatorial role that OCLC plays. This post is about that curatorial role. A quick review of the activities associated with definitions of the word "curator" yields this list: cares for, oversees, guards, develops, manages, preserves, organizes, sets the agenda for, packages, and more.
Let's take a look at WorldCat record #777 (just to pick a random record), which describes Labor Relations in the Netherlands by John P. Windmuller. This title was published in 1969 by Cornell University Press. Its record was created by catalogers at the Library of Congress in March 1969 and has not been reissued by LC since that time, so it was one of the original records available to OCLC member libraries when the Online Union Catalog went online in 1971. As of a few days ago it had 394 holdings symbols attached. It has probably had more than that at times during its life, but no doubt some libraries have since lost or weeded their copies. These holdings range around the world, with 25 in Canada, 5 in South Africa, 12 in Australia, 6 in New Zealand, 7 in Great Britain, 5 in Germany, 17 in the Netherlands, 1 in Ireland, 1 in Italy, 1 in Lebanon, 5 in Japan, and 1 in Singapore. The rest are scattered across 44 of the
OCLC has developed its batchloading and data exchange algorithms to capture and add enrichments from incoming records to existing records. And so, the original LC record has been enriched with a Dutch-language subject heading (from GOO-trefwoorden thesaurus) and a classification number (from Nederlandse Basisclassificatie) as a result of the addition of the Dutch Central Catalogue to WorldCat, as well as a French-language subject heading (from the Répertoire de vedettes-matière) from a record loaded from the Université Laval in Québec. At some point over the years, OCLC WorldCat Quality staff manually added a Geographic Area Code (field 043), which is indexed in WorldCat and improves retrieval.
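The enrichment step described above can be pictured as a field-level merge. The Python sketch below is a simplified illustration, not OCLC's batchloading code: records are modeled as hypothetical dictionaries mapping MARC tags to lists of field strings, and the set of tags treated as enrichments (650 topical subject headings and 084 classification numbers) is an assumption chosen for the example. The field values are placeholders, not the real headings on record #777.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical, simplified bibliographic record: MARC tag -> list of field values.
@dataclass
class Record:
    fields: Dict[str, List[str]] = field(default_factory=dict)

# Tags treated as enrichments in this sketch (an assumption, not OCLC's rule set):
# 650 = topical subject heading, 084 = other classification number.
ENRICHABLE_TAGS = ("650", "084")

def merge_enrichments(existing: Record, incoming: Record) -> List[Tuple[str, str]]:
    """Copy enrichable fields from `incoming` into `existing` when not already present.

    Returns the (tag, value) pairs that were added, in the order they were added.
    """
    added = []
    for tag in ENRICHABLE_TAGS:
        for value in incoming.fields.get(tag, []):
            current = existing.fields.setdefault(tag, [])
            if value not in current:
                current.append(value)
                added.append((tag, value))
    return added

# Usage with placeholder values.
lc_record = Record({"650": ["Industrial relations Netherlands"]})
dutch_record = Record({
    "650": ["Industrial relations Netherlands", "<Dutch GOO heading>"],
    "084": ["<Basisclassificatie number>"],
})
print(merge_enrichments(lc_record, dutch_record))
# [('650', '<Dutch GOO heading>'), ('084', '<Basisclassificatie number>')]
```

Because a value is copied only when an identical one is absent, loading the same incoming record twice adds nothing the second time. A real merge would also have to normalize headings and respect their source codes, which this sketch leaves out.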
If we look at record #777 in WorldCat.org, we can see that, thanks to OCLC's work in developing a FRBR algorithm, three editions of Dutch translations of Labor Relations in the Netherlands have been clustered with the original English edition. If you search for the Dutch title, however, you find many more Dutch editions that could not be linked to the FRBR cluster because nothing in their bibliographic records links them to the original English title.
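Why some translations join the cluster and others do not can be illustrated with a simple work key. The sketch below is not OCLC's FRBR algorithm, which is considerably more elaborate; it assumes a hypothetical record layout with `author`, `title`, and optional `uniform_title` keys. A translation whose record names the original title as a uniform title shares the original's key, while one with no such link falls into a cluster of its own. The record ids and the "[Dutch title]" value are placeholders.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List, Tuple

def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^\w\s]", " ", stripped.lower()).split())

def work_key(record: dict) -> Tuple[str, str]:
    """Hypothetical work key: author plus uniform title if present, else title proper."""
    title = record.get("uniform_title") or record["title"]
    return (normalize(record["author"]), normalize(title))

def cluster_works(records: List[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Group record ids by work key; each group stands in for one work cluster."""
    clusters: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for record in records:
        clusters[work_key(record)].append(record["id"])
    return dict(clusters)

records = [
    {"id": "777", "author": "Windmuller, John P.",
     "title": "Labor relations in the Netherlands"},
    # Translation whose record names the original title as a uniform title.
    {"id": "nl-1", "author": "Windmuller, John P.", "title": "[Dutch title]",
     "uniform_title": "Labor relations in the Netherlands"},
    # Translation with nothing linking it to the English original.
    {"id": "nl-2", "author": "Windmuller, John P.", "title": "[Dutch title]"},
]
for key, ids in cluster_works(records).items():
    print(key, ids)
# ('windmuller john p', 'labor relations in the netherlands') ['777', 'nl-1']
# ('windmuller john p', 'dutch title') ['nl-2']
```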
Finally, if you click on the link "John P. Windmuller" on the "Details" tab of the WorldCat.org display, you see the WorldCat Identities page for this author, where you can see other information about his writings, about him, and about people and organizations to which he can be linked. WorldCat Identities is a recent accomplishment out of the OCLC Office of Research, based on extensive data mining of WorldCat-contributed metadata to create a brand new, end-user facing authority service (Thom Hickey's April 2007 article describes the first release).
All of these things have been possible because OCLC has played a proactive editorial/curatorial/data management role caring for, managing, developing, and packaging not only the metadata in record #777, but the WorldCat metadata surrounding and related to it.
This post wouldn't be complete without a few words describing the roles OCLC has played as maven (well, maybe manager) of WorldCat duplicate detection and resolution as well as automated authority control of name and subject headings. Duplicate "resolution" generally results in the merging of multiple OCLC records into one. For a look at a merged record, check out OCLC record #221147330 in Connexion (Corporations and Citizenship in WorldCat.org).
Duplicates can be identified and merged manually--with the merging done by OCLC's WorldCat Quality group--or algorithmically. During the 1990s, sophisticated algorithms worked their way through WorldCat every six months or so, resolving thousands of duplicates. Those algorithms are now being re-created and expanded for OCLC's new Oracle platform.
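Algorithmic duplicate detection is commonly built around a match key. The following sketch is a hypothetical illustration, not the algorithm OCLC ran in the 1990s or is rebuilding now: it groups records on normalized title, date, and publisher, keeps the record with the most holdings in each group (an assumed tie-breaking rule), and transfers the other records' holdings to it, mirroring the merge of multiple records into one described above. All titles, publishers, and holdings symbols are placeholders.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())

def match_key(record: dict) -> Tuple[str, str, str]:
    """Hypothetical match key built from title, date, and publisher."""
    return (normalize(record["title"]), record["date"], normalize(record["publisher"]))

def merge_duplicates(records: List[dict]) -> List[dict]:
    """Merge records sharing a match key into one retained record per group."""
    groups: Dict[Tuple[str, str, str], List[dict]] = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    merged = []
    for group in groups.values():
        # Assumed rule: keep the record with the most holdings (first one wins ties).
        keep = max(group, key=lambda r: len(r["holdings"]))
        holdings = set().union(*(r["holdings"] for r in group))
        merged.append({
            **keep,
            "holdings": holdings,
            "merged_ids": [r["id"] for r in group if r is not keep],
        })
    return merged

records = [
    {"id": 1, "title": "Example title", "date": "2008",
     "publisher": "Univ. Press", "holdings": {"AAA", "BBB"}},
    {"id": 2, "title": "Example Title.", "date": "2008",
     "publisher": "Univ Press", "holdings": {"CCC"}},
    {"id": 3, "title": "Another title", "date": "2008",
     "publisher": "Univ Press", "holdings": {"AAA"}},
]
for rec in merge_duplicates(records):
    print(rec["id"], sorted(rec["holdings"]), rec["merged_ids"])
# 1 ['AAA', 'BBB', 'CCC'] [2]
# 3 ['AAA'] []
```

A key this coarse could wrongly merge distinct editions that happen to share a title, date, and publisher, so a real matcher would need to compare many more data elements before merging.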
From 1992 to 1994, researchers and database specialists from the OCLC Office of Research and the WorldCat Quality unit of that time collaborated to develop and run algorithms that automatically corrected topical and geographic subject headings, as well as personal and corporate names, to their authorized or preferred forms in WorldCat. By going beyond what could be done using the LC name and subject authority files alone, and using the collective power of WorldCat metadata itself, the team was able to automatically correct over 5.5 million headings in WorldCat, with a very low error rate. OCLC staff learned a good deal about data mining and manipulation in that process, and they have carried those lessons forward to new initiatives.
More recently, during the development of WorldCat Identities, FRBR work clusters, and other services based on WorldCat data mining, Office of Research staff have uncovered many more opportunities for OCLC to carry out its curatorial role in WorldCat. Most recently, we used information generated from the building of WorldCat Identities to automatically control headings in WorldCat. As of last month, 25.5 million personal name headings in bibliographic records had been newly controlled. Controlling headings and linking headings to appropriate authority records allows headings to be automatically updated when an authorized heading changes.
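The benefit of controlled headings can be shown with a small data structure. In this hypothetical sketch, a bibliographic heading becomes controlled when its text matches an authorized or variant form in an authority file, and it then stores the authority record's identifier. The displayed form is looked up through that link, so changing the authorized form once updates every linked heading. The class and method names, the identifier "a1", and the "(revised form)" heading are illustrative only and do not correspond to any OCLC interface or real authority record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class AuthorityRecord:
    auth_id: str
    authorized: str
    variants: Set[str] = field(default_factory=set)

@dataclass
class Heading:
    text: str
    auth_id: Optional[str] = None  # filled in once the heading is controlled

class AuthorityFile:
    """Hypothetical authority file with lookup by identifier and by heading form."""

    def __init__(self, records: List[AuthorityRecord]):
        self.by_id: Dict[str, AuthorityRecord] = {r.auth_id: r for r in records}
        self.by_form: Dict[str, str] = {}
        for r in records:
            for form in {r.authorized, *r.variants}:
                self.by_form[form.casefold()] = r.auth_id

    def control(self, heading: Heading) -> bool:
        """Link the heading to an authority record if its text matches a known form."""
        auth_id = self.by_form.get(heading.text.casefold())
        if auth_id is None:
            return False
        heading.auth_id = auth_id
        return True

    def display(self, heading: Heading) -> str:
        """Controlled headings show the current authorized form; others show their text."""
        if heading.auth_id is None:
            return heading.text
        return self.by_id[heading.auth_id].authorized

    def change_authorized_form(self, auth_id: str, new_form: str) -> None:
        """Replace the authorized form, keeping the old one as a variant."""
        record = self.by_id[auth_id]
        record.variants.add(record.authorized)
        record.authorized = new_form
        self.by_form[new_form.casefold()] = auth_id

authorities = AuthorityFile([
    AuthorityRecord("a1", "Windmuller, John P.", {"Windmuller, J. P."}),
])
heading = Heading("Windmuller, J. P.")
authorities.control(heading)
print(authorities.display(heading))  # Windmuller, John P.
authorities.change_authorized_form("a1", "Windmuller, John P. (revised form)")
print(authorities.display(heading))  # Windmuller, John P. (revised form)
```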
My thanks to Glenn Patton and his team for their help with this post.
 Hill, Janet Swan. 2008. Entering an alternate universe: some consequences of implementing recommendations of the Library of Congress Working Group on the Future of Bibliographic Control. Library Resources and Technical Services 52 (4): p. 224.
By John MacColl
This post was contributed by John MacColl, an OCLC colleague who joined RLG Programs in November 2007 as European Director, having previously worked in a number of UK academic libraries since the mid-1980s. Based in the University of St Andrews in Scotland, his role is to work with the RLG European Partner libraries, and to lead programs and projects in areas related to his expertise in scholarly communication and digital library technologies. -Karen
I'm John MacColl, and last month Karen Calhoun and I were at the Dutch National Archives, where we were both speaking at the Dutch Customer Contact Day, which OCLC EMEA runs each year for its many Dutch customers. Following Karen's talk, I gave a presentation whose tongue-in-cheek title was inspired by the organisation hosting the meeting: Are archives the new libraries? The title also reflects our growing interest in helping research libraries digitize their unique and rare materials (archives as well as rare books and manuscripts) and put these materials onto the web as a priority. Are archives the new libraries? was therefore a teasing title, suggesting that much of the business of archives is now coming to the fore for research library managers.
There are significant problems associated with exposing this material to the interested minds that, the evidence suggests, are out there and hungry for it. Much of it is fragile, and traditional library practice has been to prioritise conservation and preservation ahead of access. This has effectively turned large quantities of rich research material into hidden collections. We have focused on this problem many times in different ways over the years; see, for example, Merrilee Proffitt's blog post on a talk earlier this year by Richard Ovenden from the Bodleian Library. Archivists, however, received a wake-up call in 2005 from the paper by Mark Greene and Dennis Meissner, which John Chapman mentioned in his earlier post to this blog. It made a huge impact in the archives world by advocating a minimalist, demand-led approach to cataloguing in order to address the hidden collections problem.
Libraries are learning, partly from archivists, that in the digital age satisfying demand is a different proposition from the one they were familiar with in the print age, when they largely controlled the scale of public document management, and usage, impact and demand were only crudely measurable. Now we have webscale, and we can see what users are looking at and downloading. Libraries own content that has the potential to be hugely popular and useful in this webscale world, and one of the most interesting tests of this in recent times was the Library of Congress's use of flickr at the beginning of this year, when it put 3,100 digital photographs of news images from the first half of the 20th century into the newly launched flickr commons.
LC's experiment with images on flickr commons illustrated well the hidden demand for hidden collections. My colleagues and I frequently use it as an example in presentations (I stole the slides from one of my Program Officer colleagues). Slides 29-33 of my presentation tell the story as it unfolded and was told, with some astonishment, in the project blog. Twenty-four hours after the images went onto flickr, they had attracted over a million views; 420 images had received comments, and every image in the collection had been viewed. The impact could not be overstated. While the images could of course be viewed on LC's own website, the difference is that the LC site (currently the 3,236th most visited site on the web) attracts nowhere near the same amount of web traffic as flickr (currently the 31st most visited site on the web). As the slides go on to show, LC cataloguers found that many of the user comments on the images were extremely helpful in improving the image metadata, and 89 images had their records updated as a result.
The LC's use of flickr was picked up in later postings by Lorcan Dempsey and by Günter Waibel, who provides an interesting update on the experiment, reflecting on the question of balancing webscale provision (should huge numbers of images be provided to flickr all at once?) with human-scale appreciation, as a community of interest developed in the growing collection. The Library of Congress is still evaluating its pilot, and one of its issues will surely be the scale of cataloguer effort which can be dedicated to sifting through user-contributed comments. That's a challenge. But the scale of the demand for these rich examples of research materials is clearly in evidence, and the Library of Congress has now been joined by several other museums and libraries in the flickr commons, including - nicely - the Dutch National Archives (no images of Karen and me on the podium in their photostream however).
OCLC has been increasingly successful at establishing partnerships with national libraries around the world, and we recently introduced a new set of Web pages devoted to national libraries.
The national libraries Web pages utilize only part of the functionality developed in the Office of Research's WorldMap project, however. OCLC's researchers developed their prototype for comparing a variety of library information on a global level. Their project is interesting not just for the particular application that the researchers built, but because their effort represents an innovative way of repurposing and blending metadata from a variety of sources in a visual representation. Lynn Connaway and Larry Olszewski presented some of the WorldMap prototype research findings at the 2006 Charleston Conference.
But back to our topic. In addition to its work with national libraries, OCLC supports a number of national or regional union catalogs through its CBS (Central Bibliographic System) partners in the Netherlands, UK, Germany, France, and Australia. CBS provides a framework for strong and independent consortia of libraries to collaborate and share resources. Over the last year and a half or so, the CBS-based union catalogs of the GGC (Netherlands), HeBIS (Germany), Libraries Australia, and the GBV (Germany) have been loaded into WorldCat to give the library collections that these union catalogs describe broader exposure, from more places on the Web.