Remixing Data at ELAG


ELAG  2009 - Metadata highlights, by Janifer Gatenby

 

This year's ELAG (European Library Automation Group) was held at the culturally and historically important University Library in Bratislava, Slovak Republic, located in the heart of old Bratislava.  The title of ELAG's 33rd seminar was "New Tools of the Trade" and the conference was full of stimulating and relevant content, with a focus on re-mixing data.

Map metadata is going to get much easier to create and much richer.  Petr Zabicka and Petr Pridal from the Moravian Library in Brno, Czech Republic, introduced us to their web site oldmapsonline, which offers open source tools for scanning maps, creating metadata, and adding geo references.  Their open source MapTiler provides an easy interface for assigning a geo bounding box that makes a map compatible with Google Maps, so that maps can be overlaid, e.g. for "then and now" comparisons. Geo coordinates are more important than traditional access points for searching maps.  They also recommend Zoomify, which zooms any image.
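To illustrate why a bounding box makes such a useful access point, here is a minimal sketch of coordinate-based map searching in Python. It is not the oldmapsonline code: the `BoundingBox` class, the `find_maps` function, and the sample coordinates are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Geographic extent of a map in decimal degrees (hypothetical structure)."""
    west: float
    south: float
    east: float
    north: float

    def overlaps(self, other: "BoundingBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        # This simple test ignores boxes that cross the 180th meridian.
        return not (
            self.east < other.west
            or other.east < self.west
            or self.north < other.south
            or other.north < self.south
        )


def find_maps(catalog: dict[str, BoundingBox], query: BoundingBox) -> list[str]:
    """Return the titles of maps whose extent overlaps the query area."""
    return sorted(title for title, box in catalog.items() if box.overlaps(query))


# Illustrative records only; the coordinates are rough approximations.
catalog = {
    "Old map of Bratislava": BoundingBox(17.0, 48.1, 17.2, 48.2),
    "Old map of Brno": BoundingBox(16.5, 49.1, 16.7, 49.3),
}

# A search area roughly around present-day Bratislava.
query = BoundingBox(17.05, 48.12, 17.15, 48.18)
print(find_maps(catalog, query))
```

A search like this needs no title, author, or subject heading at all, which is the sense in which coordinates can outrank traditional access points for maps.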

To open the main theme of the conference, Karen Coombs gave a rich keynote address and led the mashup workshop. Check her presentation and her workshop notes, both of which are full of useful examples and tips.

Table-of-contents metadata is being harvested and made available for reuse.  This was reported by Lisa Rogers from Heriot-Watt University in the UK with her overview of TicTocs and Golddust. TicTocs aggregates RSS feeds from more than 12,000 journals and then makes a data set available for mashups.  Peter Van Bohemen from Wageningen University has made very rapid use of this service to display the contents of the latest issue of a journal when a full record display of a serial is requested in the Wageningen union catalogue.  Golddust is an SDI (selective dissemination of information) service using TicTocs and user profiles.
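The kind of reuse described here can be sketched in a few lines of Python. This is an assumption-laden illustration, not the TicTocs or Wageningen implementation: the feed content, the journal name, and the `latest_contents` function are hypothetical, and the sample is plain RSS 2.0 (real journal feeds may use other RSS flavours with XML namespaces, which would need extra handling).

```python
import xml.etree.ElementTree as ET

# A made-up table-of-contents feed standing in for a real journal feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Journal of Hypothetical Studies</title>
    <item>
      <title>First article</title>
      <link>http://example.org/articles/1</link>
    </item>
    <item>
      <title>Second article</title>
      <link>http://example.org/articles/2</link>
    </item>
  </channel>
</rss>"""


def latest_contents(feed_xml: str) -> list[tuple[str, str]]:
    """Extract (title, link) pairs for every article listed in the feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        entries.append((title, link))
    return entries


# A catalogue could show these lines beneath the full record of the serial.
for title, link in latest_contents(SAMPLE_FEED):
    print(f"{title} <{link}>")
```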

There were two reports on systems with a new approach to the generation of recommender data.  Marcus Spiering of the University of Karlsruhe reported on Bibtip, a recommender system based on evidence from an anonymous session-based cookie that looks for "co-inspections" (full record views within the same session).  This metadata is harvested from the usage information collected from a library's online catalogue, and thus it works for all material represented in the catalogue.  This contrasts with recommender systems based on circulation usage, which only look at the physical collection, and systems based on resolver usage, which only look at electronic material.  Tamar Sadeh from Ex Libris announced bX, a journal article recommender system based on traffic harvested from the logs of SFX resolvers. It also looks for "co-inspections" within a session and is based on research from Herbert van de Sompel's Los Alamos lab.  Ex Libris will be running this as a chargeable web service.
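The idea of a "co-inspection" can be made concrete with a small Python sketch. It simply counts how often two records are viewed in the same session; it should not be read as the actual Bibtip or bX algorithm, and the function names, record identifiers, and session data are all invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_inspections(sessions):
    """Count how often each pair of records is viewed in the same session."""
    co_counts = defaultdict(Counter)
    for viewed in sessions:
        # Count each record once per session, however often it was opened.
        for a, b in combinations(sorted(set(viewed)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def recommend(co_counts, record_id, limit=3):
    """Return the records most often co-inspected with record_id."""
    neighbours = co_counts.get(record_id, Counter())
    # Highest count first; ties are broken alphabetically for stable output.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:limit]]


# Each inner list is one anonymous session's full record views (made-up data).
sessions = [
    ["rec1", "rec2", "rec3"],
    ["rec1", "rec2"],
    ["rec2", "rec4"],
    ["rec1", "rec3", "rec1"],
]
co_counts = build_co_inspections(sessions)
print(recommend(co_counts, "rec2"))
```

Because the input is only a list of viewed records per session, the same counting would apply whether the log comes from a catalogue or from a link resolver; what differs between the two systems described above is the source of the log.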

Thom Hickey and I gave a presentation entitled "Opening Library Data for Web Scale and Re-mixing". Thom talked about our data resources and how OCLC is both growing and enriching them, with examples from WorldCat Identities and VIAF.  I stressed the importance of identifiers in re-mixing data, alluding to GLIMIR (Global Library Manifestation Identifier:  see my post in January 2009 on the importance of identifiers) and presented an outline of OCLC's identifier services and data APIs.  From the discussion that ensued, we gathered that work identifier services are in demand. Increasingly, metadata specialists are recognizing the importance of manifestation level identifiers as well.  See, for example, the post this week by Jonathan Rochkind.

I've given here an overview of just some of the presentations that are particularly relevant to metadata.  There were other excellent contributions which can be found on the seminar web site.

Special Delivery

The catalog-centric model of library use is pretty straightforward. A user consults a shelf list (most likely through a topic, author, or title index), takes note of a shelf location, and then goes physically to that location to find the resource. The networked world has changed certain dynamics of this model, most notably the depth and accessibility of the indexes, but the model of delivery - the last step described above - has really fundamentally changed only very recently. One of OCLC's strategic moves has been to collect and provide holdings data, so that libraries can share in the strength of a unified catalog while still providing local utility to users seeking a physical resource.

However, in research, or where known-item searching is not the norm, there is one step remaining. Once the user has access to the contents of the resource, there is an evaluation process: "Will this suit my needs, or should I look for something else?" In full-text environments, the time required for the delivery and evaluation phases is substantially compressed. Accordingly, the delays to the evaluation phase in the traditional library delivery model are increasingly unacceptable to our users.

A newly available study on WorldCat data quality, OCLC's "Online Catalogs: What Users and Librarians Want" [http://www.oclc.org/us/en/reports/onlinecatalogs/default.htm] suggests that the user seeks above all not rich bibliographic information but rich availability data and evaluative information. Libraries have not traditionally provided evaluative materials to their users in systematic ways; however, they have maintained such aids (book review indexes, etc.) for expert users and for collection development purposes.

The newest mode of providing evaluative content is a game-changer: the provision of full text.  Aggressive moves by for-profit companies in the digitized full-text market are no secret. They bear none of the costs or scarcities of delivering physical books, instead delivering texts. From the user side, the entire process of determining suitability of use is drastically shortened.

As the research and evaluation process is further influenced by the availability of full text, libraries will need to pay attention to the most user-friendly and popular methods of accessing these texts and provide helpful links to them from their discovery tools. (Libraries have some relevant experience with this in the area of referring users to licensed content through link resolution.) The successful integration and synthesis of multiple types of evaluative information is a central challenge.

Popular alternative discovery platforms for information resources (Amazon, tagged personal collections, etc.), in addition to using simple holdings and/or sales data, tables of contents, and reviews, have approached the evaluation problem in new ways.  One is subcollections, either curated actively or built casually through tagging. Another is leveraging user-behavior data such as browsing behavior or "fulfillment" (circulation or purchase). OCLC's new record display in worldcat.org (and WorldCat Local) uses a variety of tools, including user reviews and behavior data, to provide evaluative information. For an example, see http://www.worldcat.org/oclc/61479616.

OCLC will continue to develop and leverage internal systems and to seek out external providers of licensed content to enhance the evaluative richness of WorldCat. We invite you to share your thoughts on the new record display.

In my November 24, 2008 post to this blog, I alluded to a research project examining what is most important about WorldCat metadata to a range of audiences, both end users and librarians. My research team and I have now published the findings that are likely to be relevant to library catalogs in general. Our report, Online Catalogs: What Users and Librarians Want, is available as a freely downloadable PDF from http://www.oclc.org/us/en/reports/default.htm.  There is also an executive summary. Your comments are welcome.

More on the Expert Community Experiment


In a Feb. 12 entry on this blog, we announced OCLC's Expert Community Experiment, which creates a wiki-like environment around WorldCat cataloging records so that anyone with an OCLC full cataloging authorization can participate in making records better. The experiment began the week of February 16; it will continue for 6 months. To participate, you need nothing besides your OCLC full level cataloging authorization. More information is available on the Expert Community Web pages.

A month into the experiment, I thought there might be interest in an update on participation.  Registration for the Expert Community Webinars is breaking OCLC records for participation in our webinars--more than 900 sites participated in the four sessions offered in February. If you missed these, there is another Webinar on March 24 for which you can register. Alternatively, you can visit the Expert Community Web pages and click on the Webinar recording available in the right frame.

Here are some statistics that suggest that the OCLC cataloging community is becoming more engaged in collectively improving WorldCat. The statistics compare master record improvements during the first four weeks of Expert Community Experiment activity with improvements made one year ago (March 2008).

  [Figure: Expert Community Stats, March 2009]

Besides the brand new Expert Community updates to master records, it seems possible that the experiment is yielding an uplift in database enrichments and minimal-level upgrades as well. 

 

The Macroscopic View

Macros are an important part of the toolkit for those who work on large datasets. The size and scope of the WorldCat database have encouraged the development of a number of scripts designed to normalize, translate, and transform.  Some of these tools are used internally by OCLC staff to clean up errors; others are shared among the cooperative and the cataloging world.

Robert Bremer is one of the OCLC staff members who make extensive use of macros. He showed me one of the more complex ones, which cleans up common errors in the bibliographic data that gets uploaded to WorldCat. One of the most common types of errors that he must deal with is text that the cataloger types in "freehand" in fields like the 504 ‡a (bibliography note, e.g. "Includes bibliographic references.") The logical thing here would be for the cataloger to have a macro to insert this text (and to use it) every time the situation arises, but the use and utility of macros among the cataloging population are uneven.
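To give a flavor of what such a cleanup does, here is a minimal sketch in Python. It is not Robert's macro and it is not written in the Connexion macro language; the `normalize_504` function, the similarity threshold, and the sample typos are assumptions made for illustration, and the canonical text is simply the example quoted above.

```python
import difflib
import re

# Canonical note text, taken from the example quoted above.
CANONICAL_504 = "Includes bibliographic references."


def normalize_504(note: str, threshold: float = 0.85) -> str:
    """Replace a near-miss of the canonical note with the canonical text."""
    # Collapse whitespace and ignore case so only real spelling differences count.
    simplified = re.sub(r"\s+", " ", note).strip().lower()
    target = CANONICAL_504.lower()
    similarity = difflib.SequenceMatcher(None, simplified, target).ratio()
    # Notes that are not close to the canonical wording are left untouched.
    return CANONICAL_504 if similarity >= threshold else note


# A freehand note with two typos and a doubled space is corrected...
print(normalize_504("Inlcudes  bibliographic refrences"))
# ...while an unrelated note passes through unchanged.
print(normalize_504("Includes index."))
```

The threshold (0.85 here, an arbitrary choice) is the design decision that matters: set it too low and legitimate notes get overwritten, set it too high and common typos slip through. A macro that inserts the text at cataloging time avoids the guesswork altogether.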

Although OCLC can and will continue to correct a wide range of errors arising from multiple sources, some macros are most beneficial at the local level. The OCLC cooperative has long benefited from the work of individuals who have sought to share their macro insights. Joel Hahn's "Better Living through Macros" should be of interest to those looking for an introduction: http://www.hahnlibrary.net/libraries/oml/index.html.  A more advanced perspective is at Harvey E. Hahn's "OML Macros" page (http://www.ahml.info/oml/).
Walt Nickerson maintains a collection of macros at http://docushare.lib.rochester.edu/docushare/dsweb/View/Collection-2556.

The main Connexion macros page (http://www.oclc.org/connexion/support/macros.htm) has links to these resources and a few more, including lessons for learning how to create and use macros.

I urge you to look at what is available, see if it makes sense for your organization, and let us know how we can help the cataloging community in creating new macro resources.


Yesterday, Mary Ann Laun, chair of OCLC's Members Council Cataloging & Metadata Service Group, led their session as part of OCLC's February virtual meeting of Members Council [http://www.oclc.org/us/en/memberscouncil/meetings/2009/february/2009februaryagenda.pdf].  The session covered two topics: OCLC's Expert Community Experiment, and the results of a survey of technical services staff.  I thought both topics would be of keen interest to Metalogue readers.

OCLC's Expert Community Experiment creates a wiki-like environment around WorldCat cataloging records so that anyone with a full cataloging authorization can participate in making records better. The experiment will begin the week of February 16, and will continue for 6 months. No change or signup is required at the institution end - if you have a full cataloging authorization, you will see these changes automatically. We encourage you to discuss and ask questions on the OCLC-CAT listserv [https://www3.oclc.org/app/listserv/] and OCLC staff will be monitoring ASKQC@oclc.org if you want to ask a private question. More information, including a schedule for upcoming webinars, can be found at the Expert Community page: http://www.oclc.org/us/en/worldcat/catalog/quality/expert/default.htm.

Last fall, the OCLC Members Council Cataloging and Metadata Service Group conducted a survey of Members Council delegates and OCLC-CAT listserv readers about issues facing technical services departments. Here are the top three issues selected by respondents to the survey:

  1. training in Next Generation concepts
  2. creating new skill sets (what are they and how do we build them)
  3. transitioning from traditional duties to new ones

Yesterday, OCLC staff were asked how they might assist the technical services community in addressing these issues.  We would like to begin our exploration of ways in which OCLC might partner with other appropriate individuals and groups by asking readers of Metalogue to answer a couple of questions.  Please make your comments on this blog post; they will be most appreciated. Thanks in advance for your thoughts.

  • What do "Next Generation concepts" mean to you? Which of these are particularly relevant to technical services staff?
  • What groups or individuals do you know of in the technical services community who are supplying this sort of training, assessment, and transition assistance?

Just a quick note to let readers know how to track the progress of and submit feedback to the newly announced Review Board on the Principles of Shared Data Creation and Stewardship.

 
Online feedback forum (blog): http://community.oclc.org/reviewboard/
 
 
The group is chaired by Jennifer Younger, university librarian at the University of Notre Dame and an OCLC Members Council delegate.
 

 

The Guardian, a prominent UK newspaper, has published corrections to its article of January 22 in response to objections that OCLC communicated to the Guardian last week. The Guardian's corrections were published today and may be viewed in its "Corrections and Clarifications" page or at the beginning of the original article.

The January 22 article in the Guardian is built around what OCLC regards as a false premise (that OCLC reduces libraries' visibility on the Web). It states that OCLC shares "only 3 million" records with Google Books. This is not the case. OCLC shares nearly all of the database with Google Books and Google Scholar, with the exception being a relatively small amount of data that OCLC is contractually prohibited from sharing. This allows the "find in a library" links to be placed in those services, which drives traffic back to thousands of OCLC member libraries through WorldCat.org. 

It is true that Google indexes only a subset of WorldCat records in the main index. This arrangement is based on advice from Google and is constantly reviewed. Based on recent exchanges we will move to a much more extensive crawl of the WorldCat database soon (while honoring the restrictions on some data sets that OCLC licenses from third parties).

What does not come across clearly in the article is that OCLC has for some time made WorldCat.org, the largest database in the world that represents library collections, freely available for searching on the Web, and that this allows people everywhere to do research and be connected to libraries. 

As has been discussed on this blog and elsewhere, we know that WorldCat.org can be substantially improved, and OCLC is working hard to improve the links and make more libraries' collections visible.

The statement issued by the Guardian today addresses some other ways in which the article misrepresents OCLC and its revised record use policy.

As you may know, OCLC Members Council and the OCLC Board of Trustees have jointly convened a Review Board of Shared Data Creation and Stewardship to represent the membership and inform OCLC on the principles and best practices for sharing library data. The group will discuss the revised record use policy with OCLC members and other key stakeholders. Please watch for announcements of how to provide your input.  In the meantime, comments and questions are invited at recorduse@oclc.org.  

 

 

On Monday I had the opportunity to speak in Denver at the ALCTS Forum "Creating and Sustaining Communities around Shared Library Data."  LJ's Norman Oder provides a substantive, fair summary of the 2-hour session.  For those with an interest I've made my slides available on SlideShare.  

I found the panel presentations and discussion with attendees constructive and helpful. I took quite a few notes so there is much to ponder. Besides sharing the URL of the slides, I thought I would also offer some thoughts about one of the topics that occupied the speakers and audience briefly, the role and significance of WorldCat.org.

One speaker at the Forum wondered about the need for WorldCat.org as an aggregation of information about library collections.  From a different source this past week, I have heard OCLC's commitment to comprehensiveness in WorldCat.org misrepresented as an aspiration to monopoly standing in the library world. While the OCLC database is the largest of its type and in some way serves around 69,000 libraries in 112 countries, considering the number of libraries in the world and the number of cooperative services/catalogs they use, such a notion about OCLC's purpose for WorldCat ranges from misinformed (in North America) to laughable (in most other places in the world).* Instead, the purpose of growing WorldCat.org is to begin attracting more attention to the world's library collections on the Web by providing a "point of concentration" to collect and drive traffic to local libraries or consortia.

I would argue that WorldCat.org is a good thing for OCLC member libraries already, and it has the potential to become a great thing. It brings eyeballs to library collections both collectively and individually--attention that otherwise will remain monopolized by the most successful Websites. For an interesting perspective on the landscape of the Web, see the map that Information Architects Japan has created.  IA's map overlays influential Web sites on the Tokyo-area train map. 

What IA's map tells us about libraries on the Web is not new. The loss of information seekers' attention to traditional libraries became painfully obvious four years ago when the Perceptions of Libraries survey report was released, revealing how much more likely respondents were to begin a search for information with a search engine (84%) than on a local library Web site (1%) (see page 17 of 34).

WorldCat.org, introduced in 2006, is a response to brick-and-mortar libraries' loss of attention in the Web landscape. The strategy is to make library collections everywhere much more visible in the main stations on the Web. Today, WorldCat.org is a destination on the Web, yes. More importantly it's a "switch," driving traffic from popular Web sites like Google Book Search to 10,000 OCLC member libraries' collections.** Very recently, a switching mechanism from the Web to many OCLC member libraries has begun to work from mobile phones, as described in the announcement of the WorldCat Mobile pilot. 

While there is plenty of work remaining to be done to consistently and reliably connect searchers from popular sites to library collections via WorldCat.org, the first hard steps have been taken, and OCLC is committed to making WorldCat.org work better for more libraries. The switching mechanism does work: a few months ago, a Hitwise commentator, reporting on downstream websites from Google Book Search, noted that 22% of visits from Google Book Search go to an Education website, with WorldCat.org the #1 Education website. There is reason for optimism that the connections to library catalogs from the Google search engine via WorldCat.org will improve, based on recent exchanges between Google and OCLC.

Going forward, for WorldCat.org to be an effective switch to libraries, it needs to be more comprehensive and connected.  To achieve its potential to help libraries, it needs to be a "point of concentration"--a large store of information about the content and whereabouts of library collections around the world. Its links need to be embedded and more visible on more of the Web's busiest sites. In these ways, WorldCat.org can help online travelers pass through the main stations of the Web and disembark at their local libraries, wherever those libraries are, from Ohio to Oslo to Okinawa.

 

---------------------
*Outsell's 2008 report estimates there are 484,990 libraries worldwide--109,795 in North America.

**To try it out, go to Google Books and search "everything is miscellaneous," then click "find this book in a library" on the book description page.

 

XC Report: New Directions

Jennifer Bowen and her team at the University of Rochester have published  a white paper describing their metadata work for the eXtensible Catalog (XC) project. It is an exciting project that will reward your attention.

Some of the future directions of the project are very intriguing, and I am particularly interested to see how the XC Authority Control and Aggregation services develop.

What is encouraging about the project is that it shows a middle way between demonization and deification of the MARC format. The project seeks to use the richness of the MARC records and put it in a context which will ease reuse and repurposing. It comes across as a measured and thoughtful reaction to the possibilities and challenges of RDA, linked data, and the new world of bibliographic metadata. 

About this blog

Metalogue is a forum for sharing thoughts on all things related to knowledge organization by and for libraries, hosted by Karen Calhoun, Vice President, WorldCat and Metadata Services for OCLC. Karen is joined often by friends and colleagues from all over the globe, who contribute perspectives and experiences about the current and future state of cataloguing and metadata.
