July 2008 Archives
I was struck by the article in today's online New York Times--the first in a series of pieces on reading and the Internet. While some pundits worry about declining scores on standard reading tests (that test skills evolved for an offline world), other experts note the emergence of a new kind of reading--online, nonlinear, participatory and active, and multilayered. Traditional reading of books among 17-year-olds is down compared to 2004, while reading on the Internet is up. One survey indicates that by 2004, nearly half of 8 to 18 year-olds were using the Internet every day, and thus engaged at some level in reading text. One child's mother, interviewed for today's NY Times story, keeps bringing home books from the local library to try to 'fix' her obviously bright, not-in-need-of-repair young daughter, who prefers the interactive, multifaceted style of reading she does on the 'net.
The article reminded me of Carole Palmer's talk at OCLC a few weeks back. Lorcan blogged about Carole's visit and the paper she presented. Kids are not the only ones doing a lot of reading on the 'net--so are our top scholars. Carole noted in her talk that scholars read 30% more articles in 2006 than in the mid-1990s, but that reading time per article fell. Scholars are reading in short bursts or as Carole puts it, 'bouncing' from one online source to another. They are rapidly navigating through more material in less time, attempting to evaluate and use content while minimizing actual reading. Indexing, abstracts, literature reviews, online discussions and interactions with colleagues, tables of contents and so on, all help researchers decide whether articles are relevant--without reading them from beginning to end.
I am guessing that--if given the means to 'bounce'--the general public's reading behavior might be characterized by bouncing as well. In the WorldCat quality research I've been involved with over the past several months, I've observed that for materials that aren't one or two clicks away already--in other words, for most of what libraries own--searchers from all the groups we studied expressed a strong desire for instant, easy access to summaries, abstracts, reviews, TOCs--anything that would help them decide whether to pursue or avoid reading a title in a search result set. Readers everywhere seem to feel pressed for time. What are they reading from cover to cover, and how do they decide? How can libraries participate and help in this process?
Libraries cannot win if they do not play in the idiom of the Web. More deeply understanding end users' reading styles and preferences could help libraries know what practices to hold onto and what to give up or cut back. Like the standard tests of reading skill, library cataloging practices evolved in keeping with a reading style that is offline, linear, solitary and passive, extended in time, often text-only (few visuals), and single layered (that is, offers one point of view). How much of library cataloging practice reflects the predominant reading style of the last few hundred years? What still works about traditional cataloging practice, and what no longer suits the end user behavior and preferences associated with online reading, or with selecting what to read offline?
It occurs to me that what we tend to chatter about on our lists and blogs -- how much bibliographic data here, encoded in what way in our systems, with how many fields and what type of controlled subject access, and whether the whole qualifies as full, minimal, abbreviated or whatever -- could stand to be assessed in light of the changing world of reading. I am wondering how library cataloging and metadata professionals might begin to identify those practices that are likely to attract and sustain more attention to library collections, given the new age of reading that appears to be dawning, at least among certain segments of the end user communities that libraries serve.
By Janifer Gatenby, OCLC Leiden Office
In February 2008, the Nederlandse Centrale Catalogus (NCC: the Dutch Union Catalogue)* started updating WorldCat in real time. As Dutch librarians create new bibliographic records and add new holdings to new and existing records in the NCC, WorldCat is simultaneously and automatically updated behind the scenes. Thus, all new data is visible in WorldCat within seconds, even though it is entered via a local interface. Once in WorldCat, the data is exposed to a global audience via the freely available site WorldCat.org. The data is also formatted and made available to the major search engines, in particular Google and Yahoo, and to various other major sites such as Facebook. Many sites, including an increasing number of blogs, have downloaded the WorldCat search box, allowing direct searching from their pages. Other sites are accessing WorldCat via their own programs using the WorldCat API (Application Programming Interface).
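For readers curious about the mechanics, a request in the SRU Record Update style is simply an XML envelope POSTed over HTTP. The Python sketch below builds a minimal "create" request; note that the namespace URI and element names here are illustrative approximations drawn from the public SRU Update drafts, not necessarily what WorldCat itself expects.

```python
import xml.etree.ElementTree as ET

# Illustrative namespace for SRU Record Update elements (an assumption,
# not a documented WorldCat endpoint detail).
UPDATE_NS = "http://www.loc.gov/zing/srw/update/"


def build_create_request(record_xml, record_schema="info:srw/schema/1/marcxml-v1.1"):
    """Build a minimal SRU Record Update 'create' request envelope.

    record_xml is the serialized bibliographic record (e.g. MARCXML).
    """
    req = ET.Element("{%s}updateRequest" % UPDATE_NS)
    ET.SubElement(req, "{%s}version" % UPDATE_NS).text = "1.0"
    ET.SubElement(req, "{%s}action" % UPDATE_NS).text = "info:srw/action/1/create"
    record = ET.SubElement(req, "{%s}record" % UPDATE_NS)
    ET.SubElement(record, "{%s}recordSchema" % UPDATE_NS).text = record_schema
    data = ET.SubElement(record, "{%s}recordData" % UPDATE_NS)
    data.append(ET.fromstring(record_xml))  # embed the payload record
    return ET.tostring(req, encoding="unicode")


# A toy record standing in for a real MARCXML document:
envelope = build_create_request("<title>De avonden</title>")
print("info:srw/action/1/create" in envelope)  # prints True
```

In a live installation, this envelope would be POSTed to the union catalogue's update endpoint, with the response indicating success or returning validation diagnostics.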
The importance of up-to-date and comprehensive information in all potential discovery environments cannot be overstated; users may search wherever they choose, and librarians can be confident that these users will be able to retrieve the same resources, no matter their point of entry.
In five months of operation of SRU update, the NCC has added more than 1.5 million holdings and 230,000 bibliographic records to WorldCat, with an average lag of less than 5 seconds between appearance in one catalogue and appearance in the other. For the first two months, there were two streams of data: ongoing regular work, and a gap load of records processed between the initial batch load and the go-live date.
Following the NCC success, a second release of the software was implemented on 14 July 2008, extending the service to include enhanced security, a mechanism for the deletion of holdings, and a validation-only routine. The validation-only routine serves as a valuable testing tool for libraries and OCLC developers alike. The service is now ready for further implementations; the second installation will be the Libraries Australia*** service later this year.
The new service has great potential for WorldCat because a growing percentage of contributions to WorldCat does not come via WorldCat's online Connexion interface. Systems of libraries of all types that currently contribute via batch can potentially switch to interactive updating. Sending systems need to support the SRU update protocol.
*The NCC contains bibliographic references and the locations of approximately 12 million books and almost 500,000 periodicals in more than 400 libraries in the Netherlands. The NCC was initially loaded to WorldCat in fiscal year 2006.
***Libraries Australia, coordinated by the National Library of Australia, is used for reference, collection development, cataloguing and interlibrary lending. Libraries Australia contains the Australian National Bibliographic Database (ANBD), which records the location details of over 42 million items held in most Australian academic, research, national, state, public and special libraries. The holdings of Libraries Australia became part of WorldCat in fiscal year 2008.
By David Whitehair
David Whitehair is the Global Product Manager for Cataloging & Metadata Services. He manages the group responsible for many of our services, including Connexion, Contract Cataloging, WorldCat Selection, and Language Sets.
Several of us from the Cataloging & Metadata Services group attended the American Library Association meeting in
Although like most attendees we are exhausted by the end of the conference, we couldn't imagine June/July without it. (OK... I will admit... some of us really question why does the OCLC breakfast have to be at 7:00 AM on Sunday? And why do we have to be there at 6:30? We love the breakfast, but how about a brunch instead?! Ha!) Also like most attendees, at each ALA we sample the local cuisine--and one dinner we try to have together. This time we had a fab-u-lous group dinner at Mr Stox, where we welcomed our team members who traveled the farthest (
Here is just a peek into some of our activities. Friday included the Enhance Sharing Session which was hosted by Jay Weitz and Glenn Patton. Glenn was then joined by Robert Bremer to host a meeting to discuss the implementation of the new linking ISSN. Saturday started early for the Dewey team members Libbie Crawford, Michael Panzer, Joan Mitchell, Giles Martin, and Juli Beall, who hosted a record-breaking turnout at the Dewey Update Breakfast (wow... I guess I shouldn't complain about getting up early for the Sunday breakfast... the Dewey team does this two days in a row).
Sunday afternoon included the New Directions in Cataloging at OCLC presentation given by Renee Register, supported by team members Maureen Huss and Robin Buser. Sunday also included lots of time talking with OCLC members at the OCLC booth for Robin Buser. Of course Rich Greene was undertaking his usual Mr. MARBI marathon at this conference; Doug Perkins conferred with many OCLC members about batchloading, and Linda Gabel and Cynthia Whitacre carried out their many duties as SAC and ALCTS committee members/leaders.
For me... along with Lisa Elliott, the trip ended with a couple of library visits on Tuesday. It is always a joy to be able to visit libraries when we are on the road--and since we were out and about, Lisa and I were able to end the trip with a terrific dinner in
And yes, the planning for next
This post is from my colleague Michael Panzer, who is OCLC's Global Product Manager for Taxonomy Services. Michael builds technical and business cases for the use of Dewey and other OCLC terminology resources in a wide variety of web applications. That means analyzing the traditional use cases and user base for knowledge organization systems while rethinking their role in a rapidly changing information landscape.
By Michael Panzer
When dealing with a large-scale and widely-used knowledge organization system like the Dewey Decimal Classification, we often tend to focus solely on the organization aspect, which is closely intertwined with editorial work. This is perfectly understandable, since developing and updating the DDC, keeping up with current scientific developments, spotting new trends in both scholarly communication and popular publishing, and figuring out how to fit those patterns into the structure of the scheme are as intriguing as they are challenging.
From the organization perspective, the intended user of the scheme is mainly the classifier. Dewey acts very much as a number-building engine, providing richly documented concepts to help with classification decisions.
Since the Middle Ages, quasi-religious battles have been fought over the "valid" arrangement of places according to specific views of the world, as parodied by Jorge Luis Borges and others. Organizing knowledge has always been primarily an ontological activity; it is about putting the world into the classification.
However, there is another side to this coin--the discovery side. While the hierarchical organization of the DDC establishes a default set of places and neighborhoods that is also visible in the physical manifestation of library shelves, this is just one set of relationships in the DDC.
What are those "other" relationships that Dewey possesses and that seem so important to surface? Firstly, there is the relationship of concepts to resources. Dewey has been used for a long time, and over 200,000 numbers are assigned to information resources each year and added to WorldCat by the Library of Congress and the German National Library alone.
Secondly, we have relationships between concepts in the scheme itself. Dewey provides a rich set of non-hierarchical relations, indicating other relevant and related subjects across disciplinary boundaries.
Thirdly, and perhaps most importantly, there is the relationship between the same concepts across different languages. Dewey has been translated extensively, and current versions are available in French, German, Hebrew, Italian, Spanish, and Vietnamese. Briefer representations of the top three levels (the DDC Summaries) are available in several languages in the DeweyBrowser. This multilingual nature of the scheme allows searchers to access a broader range of resources or to switch the language of--and thus localize--subject metadata seamlessly. MelvilClass, a Dewey front-end developed by the German National Library for the German translation, could be used as a common interface to the DDC in any language, as it is built upon the standard DDC data format.
It is not hard to give an example of the basic terminology of a class pulled together in a multilingual way:
<class/794.8> a skos:Concept ;
skos:notation "794.8"^^ddc:notation ;
skos:prefLabel "Computer games"@en ;
skos:prefLabel "Computerspiele"@de ;
skos:prefLabel "Jeux sur ordinateur"@fr ;
skos:prefLabel "Juegos por computador"@es .
Expressed in this manner, the Dewey number provides a language-independent representation of a Dewey concept, accompanied by language-dependent assertions about the concept. This information, identified by a URI, can be easily consumed by semantic web agents and used in various metadata scenarios.
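To make that concrete, here is a small Python sketch of how an agent that has consumed these assertions could switch the display language of subject metadata by notation. An inline dictionary stands in for a real RDF store--an assumption made purely for brevity.

```python
# Language-dependent labels for the language-independent notation "794.8",
# mirroring the skos:prefLabel assertions in the Turtle example above.
concept = {
    "notation": "794.8",
    "prefLabel": {
        "en": "Computer games",
        "de": "Computerspiele",
        "fr": "Jeux sur ordinateur",
        "es": "Juegos por computador",
    },
}


def label_for(concept, lang, fallback="en"):
    """Return the preferred label in the requested language,
    falling back to English when no translation exists."""
    labels = concept["prefLabel"]
    return labels.get(lang, labels[fallback])


print(label_for(concept, "de"))  # Computerspiele
print(label_for(concept, "nl"))  # Computer games (fallback)
```

Because the notation itself never changes, a catalogue record carrying "794.8" can be displayed with German, French, or Spanish subject terms without touching the record--which is exactly the localization scenario described above.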
Fourthly, as we have seen, it is important to play well with others, i.e., to establish and maintain relationships to other knowledge organization systems.
Pulling those relationships together under a common surface will be the next challenge going forward. In the semantic web community, the concept of Linked Data is currently receiving attention, with its emphasis on exposing and connecting data using technologies like URIs, HTTP and RDF to improve information discovery on the web. With its focus on relationships and discovery, Dewey seems well prepared to become part of this big linked data set. Now it is about putting the classification back into the world!
A good thing about traveling and speaking a lot is that people ask you questions. Sometimes they ask intriguing ones, the kinds that stick in your head. One of those questions compared libraries to pizza shops--"if I want to find a pizza shop in my neighborhood I just google 'pizza' and my zip code, and they all come up. Why isn't it enough to just google 'library' and my zip code and see what's near me--why do I need WorldCat to aggregate library collections for me?"
I tried googling "pizza" and "library" appended with various zip codes, and it's true, Google yields a nice list for both kinds of establishments--with maps, addresses, URLs and phone numbers. From the library list one can usually connect to an online catalog to search and browse library collections, albeit one at a time. At the same time it must be admitted that the method works better for pizza than it does for libraries; academic libraries in particular get left out of the Google search results. But for the sake of argument, let's say that the Google technique for identifying nearby libraries and what's in their collections is effective and comprehensive. Would a "collective collection" of many library collections--that is, WorldCat--still be useful to libraries and the communities they serve?
Beyond Pepperoni, Veggie and Supreme
The most obvious difference between pizza shops and libraries is that with pizza shops, it's pretty easy to predict the inventory. A few also sell lasagna and salads, but most pizza shops are pretty much alike. Libraries and library collections are not alike; and in fact library collections tend to be made up of both popular and harder-to-find items, suggesting a long tail* strategy for attracting the attention of mainstream and niche readers. If the people in the neighborhood don't know those rich library collections are nearby, they may as well not exist. It is large scale digital visibility that creates large scale use, even local use. A database like WorldCat provides a large distribution channel to raise awareness and aggregate dispersed audiences for library collections. WorldCat helps people in the neighborhood more conveniently find the treasures they didn't know they had--in their local libraries, right around the corner.
More Is Better
As library data sets go, WorldCat is a big database. Fiscal year 2008 was a record-breaking year, featuring over 26% growth in WorldCat, from nearly 86 million bibliographic records on July 1, 2007, to over 108 million on June 30, 2008. Growth is coming largely from libraries outside the U.S. Major loads in fiscal year 2008 included the holdings of the National Library of Australia and its 800+-member Libraries Australia consortium; the National Library of Sweden; the Bayerische Staatsbibliothek; HeBIS (a network of libraries in a region of Germany); the National Library of New Zealand; and the Swiss National Library.
More Is Better in More than One Way
"To do anything useful with tags, you need numbers. With only a few tags, you can't conclude much. The tags could just be 'noise.'"
This quote, from a well-known post on Thingology, compares LibraryThing's tagging system with Amazon's. With a critical mass of tags, the aggregated whole becomes greater than the sum of the parts. A large number of tags is the starting point, but it isn't simply the large number that makes LT tags useful; it's the patterns that emerge when a large amount of metadata is combined and mined for relationships. In the same way, WorldCat.org has been "FRBRized" with a set of algorithms developed in the OCLC Office of Research. When the FRBR model brings together bibliographic records in work sets, not only can the original metadata be enriched; end users can also sift through myriad resources more effectively, irrespective of the specific "container" or item in which the content is carried. Thus, WorldCat is more than the sum of all the bibliographic records it contains.
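The clustering idea can be illustrated with a toy version of a work-set key. OCLC's actual FRBR algorithms are considerably more sophisticated (consulting authority-controlled names and uniform titles, among other things); the sketch below simply normalizes author and title to group records describing the same work.

```python
from collections import defaultdict


def _norm(s):
    """Crude normalization: lowercase and strip punctuation and whitespace.
    Real FRBRization also consults authority files, uniform titles, etc."""
    return "".join(ch for ch in s.lower() if ch.isalnum())


def work_key(record):
    """A toy work-set key: normalized author plus normalized title."""
    return (_norm(record["author"]), _norm(record["title"]))


def frbrize(records):
    """Group bibliographic records (manifestations) into work sets."""
    works = defaultdict(list)
    for rec in records:
        works[work_key(rec)].append(rec)
    return works


# Three records; two describe the same work in different containers.
records = [
    {"author": "Tolkien, J. R. R.", "title": "The Hobbit", "format": "print"},
    {"author": "TOLKIEN, J.R.R.", "title": "The hobbit", "format": "audiobook"},
    {"author": "Tolkien, J. R. R.", "title": "The Silmarillion", "format": "print"},
]

works = frbrize(records)
print(len(works))  # prints 2: three records collapse into two work sets
```

Even this crude key shows why scale matters: the more records the database holds, the more manifestations each work set can bring together for the end user.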
A Switch More Than a Destination
The first step to get pizza is to call or visit a pizza shop. The first step to get library materials is to type or click the URL of the library's Web pages or visit the library. WorldCat's collective collection can be visited directly too, by typing in the URL, typing in a WorldCat search box on another site, or clicking on a bookmark. When visited directly, WorldCat.org functions as a kind of "destination restaurant," which has a strong enough appeal to draw customers from outside its particular community.
Yet, while many do visit WorldCat.org as a known destination in and of itself, in fact WorldCat.org functions more as a "switch" to lead information seekers from a variety of places on the Web to more than 10,000 local library collections. In effect, WorldCat enables information seekers to start an information search elsewhere and end up at their local libraries. In this way, working with its Open WorldCat partners, WorldCat exposes the collective collections of libraries globally, to raise awareness and use of particular library collections locally, offering an entirely new set of paths to the pizza shop--er, library.
What follows is a statistical analysis of how visitors reach WorldCat.org. The results suggest that WorldCat.org is now beginning to function well as a switch, with the potential to drive substantial traffic to local libraries from search engines and other Web sites.
Referrals to WorldCat.org, January 1 - May 1, 2008
Search Engines 47.45%
Other Web Sites 39.91%
Typed/Bookmarked URLs 12.64%
*The concept of the "long tail," a phrase coined by Chris Anderson, is that given a large enough distribution channel, items in low demand ("non-hits") can collectively generate a volume of usage that meets or exceeds that of bestsellers ("hits").
I have the pleasure of posting an entry from fellow librarian Janifer Gatenby.
Janifer's post below summarizes her 30 June 2008 presentation to the ACRL Western European Studies Section, entitled "A Library Preservation Challenge: Managing the Collective Collection over Time." --Karen
Libraries face old and new challenges in managing and preserving the collective collection as it evolves today. In addition to physical loss via deterioration, natural disaster and war, that which cannot be found is also lost, and that which is not accessed may become lost. A resource can fail to be found because it does not rank in search results (beyond page 5 lies obscurity) or because its relative merit is never highlighted. Failure to be accessed could result in loss because of broken links or outdated formats--or, for example, a PDF document that cannot be opened, not because its format is out of date but because it contains outdated fonts. In the words of Werner Schwartz, writing in Liber in 2008: "Visibility in a way can be seen to be indispensable in the survival of the item."
Libraries face space limitations at the same time that physical publishing continues to grow. The British Library reports a growth in shelving of 12 kilometres a year, and Robert Darnton reported in the New York Review of Books (55, no. 10, June 12, 2008) that "in 2006 291,920 new titles were published in the U.S., and the number of new books has increased nearly every year for the last decade, despite the spread of electronic publishing."
The response has been to seek offsite storage, often in stores serving several libraries. As resources are increasingly out of view, there is a need to create richer metadata that will emulate the browsing experience to optimally display resources. Evaluative information includes enrichment provided by data mining, covers, reviews, lists, circulation statistics and tables of contents.
Web scale is needed to attract user contributions, which are themselves a valuable complement to contextual and evaluative information from more traditional sources. As well as evaluative information, digital materials require control metadata that will, we hope, give the resources a chance of being readable in the future. To be scalable, emphasis needs to be given to data mining for the creation of enriched metadata as an alternative to data crafting. Data mining is more successful the larger the database to which it is applied.
Digitisation is increasingly being regarded as a way of preserving. If physical copies are lost, at least something remains, and digital copies serve to spare the wear on fragile materials. Here too, metadata plays a significant role. The WorldCat copyright evidence registry could allow libraries to share the burden of copyright investigation and the OCLC DLF registry of digital masters aims to prevent duplication of effort.
A two-front approach is required: on the one hand, library resources should be given maximum exposure to give them a chance to be found and used; on the other, richer metadata is needed for both discovery and maintenance. The need to work collectively on these two fronts is evident. Union catalogues play an essential role as the window to physical and digital stores, and WorldCat serves as the union catalogue of union catalogues. Maximum exposure in local, regional and global catalogues as well as in web search engines is now the goal.
There are problems associated with each of the four fronts of preservation: centralised stores (physical and digital), digitisation and exposure. Physical stores need better metadata, better holdings metadata, better delivery architecture and more copyright evidence. Digitisation needs more volume, better quality and also copyright evidence. Looking at exposure, perhaps this is where libraries are learning the fastest. With these limitations, does the ensemble of approaches make coherent sense? Hopefully time will judge our efforts favourably.
Hello, I'm Janet Lees, Community Liaison for OCLC based in
I recently spent an interesting and enjoyable week with the 2008 IFLA OCLC Fellows in the Netherlands and Germany.
The development of the Leiden University Digital Library provided a state-of-the-art case study in how libraries can cooperatively agree to support a range of metadata standards to develop a local service, which in turn becomes part of the Dutch network of Digital Academic Repositories (DARE). This led one Fellow to comment, "I had expected that the US libraries would have been more advanced than the European ones, but I find that they are very much on par."
Underwhelmed by the availability of 110 OPAC terminals and chip-and-PIN lending stations, the Fellows were more impressed with the 600 Internet PCs, the 10 am to 10 pm, seven-days-a-week opening hours, the walk-up-and-play piano in the entrance foyer, the 2,000 secure bike racks and the rooftop restaurant provided for patrons. Does this represent the priorities for future generations of librarians?
We visited two national libraries - the Koninklijke Bibliotheek and the Deutsche Nationalbibliothek. In both cases the major issue was digital access to the country's cultural heritage, both onsite and virtually, and both libraries demonstrated exciting projects showing how they are tackling the challenge of providing access to their valuable and rich collections via the web. However, the most ambitious project we saw was the prototype of Europeana, which is planned to launch in November 2008 and aims to provide multilingual access to 2 million digital objects already digitised in Europe's museums, libraries, archives and audio-visual collections.
Our final visit took us to the University Library Frankfurt am Main, where the special collections staff proudly shared some of their treasures from the outstanding Africana Collections, which serve as a national area and subject specialist collection. The Fellows were able to observe the detailed metadata creation process that supports the sub-Saharan Africa collections.
One essential outcome of the IFLA Fellows programme is the networking opportunity it creates and its value to all involved. Whilst the Fellows are now back home, they are still in contact with us and plan to stay in touch with colleagues on both sides of the pond.