We'll be posting future items of interest on the main OCLC Cooperative blog, which can be found at http://community.oclc.org/cooperative/. We hope to see you there!
- Karen Calhoun and John Chapman
OCLC was recently asked to provide an estimate of the number of books held by US libraries that were published outside of the United States. Our answer? Approximately 200 million. We thought readers would be interested in learning the details of how the estimate was obtained.
In MARC records, the 260 field is the most obvious 'place of publication' field. Specifically, the 'a' and 'e' subfields are designed to record "place of publication, distribution, etc." and "place of manufacture," respectively. The problem is that these subfields contain names - strings of characters - rather than codes, which are far more reliable and easier to parse.*
Such codes are found in the 008 fixed field. Using those codes as the focus, we made the following assumptions to answer the question at hand (a rough sketch of the resulting filter appears after the list):
- Books are defined as monographic (bib level = 'm') language material (record type = 'a'). This has the effect of including some materials that are not, strictly speaking, "books": pamphlets, broadsides, etc.
- English-language books lacking a known place of publication were assumed to be published in the US; non-English-language books lacking one were assumed to be published outside the US.
- Pre-1923 publications were excluded for purposes of copyright analysis.**
- The sample size was 1,700,000 records (roughly 1% of WorldCat).
- We used WorldCat holdings data to determine how many copies of each title were held; both worldwide holdings and the holdings of US libraries alone were tallied.
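For readers curious how such a filter looks in practice, here is a minimal sketch in Python, assuming the pymarc library and a hypothetical sample file; the US-place heuristic and the simple 100x scaling are illustrative simplifications, not the exact procedure behind the estimate.

```python
# Sketch only: classify sampled MARC bibliographic records as US or foreign books.
# Assumes pymarc and a hypothetical file of sample records ("sample.mrc").
from pymarc import MARCReader

def classify(record):
    """Return 'us', 'foreign', or None for records outside the scope of the estimate."""
    if record.leader[6] != 'a' or record.leader[7] != 'm':   # language material, monograph
        return None
    fields_008 = record.get_fields('008')
    if not fields_008:
        return None
    data = fields_008[0].data
    date1 = data[7:11]                                        # first date in the 008
    if date1.isdigit() and int(date1) < 1923:                 # pre-1923 excluded
        return None
    country = data[15:18].strip()                             # place-of-publication code (008/15-17)
    language = data[35:38].strip()                            # language code (008/35-37)
    if country in ('', 'xx'):                                 # place of publication unknown
        return 'us' if language == 'eng' else 'foreign'
    # Simplifying assumption: 'xxu' and the US state codes all end in 'u'.
    return 'us' if country.endswith('u') else 'foreign'

us_titles = foreign_titles = 0
with open('sample.mrc', 'rb') as fh:
    for record in MARCReader(fh):
        kind = classify(record) if record else None
        if kind == 'us':
            us_titles += 1
        elif kind == 'foreign':
            foreign_titles += 1

# Scale the roughly 1% sample up to the full database.
print("US book titles (rough estimate):", us_titles * 100)
print("Foreign book titles (rough estimate):", foreign_titles * 100)
```

Holdings figures, such as the roughly 204 million US holdings of foreign titles, come from WorldCat holdings data rather than from anything inside the bibliographic record, so they are not shown in this sketch.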
Given these assumptions, we found the following:
Book titles published in the US: 26,710,400
Book titles published outside the US ("Foreign"): 78,017,300
Foreign book titles, held by US libraries: 22,801,900
Copies of foreign book titles ("holdings") worldwide: 461,596,000
US libraries' holdings of foreign book titles: 203,953,200
* In addition, while 260 ‡a (Place of publication) is common, 260 ‡e (Place of manufacture) appears in less than 3% of records, making any combined analysis of these subfields statistically shaky.
**If the pre-1923 cutoff is ignored, the number of foreign book titles increases by roughly 25%.
Jenn Riley of the Indiana University Libraries has released an intriguing graphic entitled "Seeing Standards: a Visualization of the Metadata Universe."
The image not only identifies and classifies 105 standards - it also evaluates their "strength of application" along multiple axes, a judgment based on level of adoption, design intent, and overall appropriateness. Beyond the massive labor the research and analysis must have taken, the visual presentation is stunning; Devin Becker's work on the graphic design deserves particular commendation.
The graphic is a useful addition to the literature and an excellent way to brush up on some standards in unfamiliar domains. The timing is excellent as well, as the American Library Association Annual Conference this week kicks off a season of acronym-filled meetings.
Late last week, OCLC hosted Laura Dawson, publishing industry consultant, on a visit to Dublin, Ohio. While she was here, Laura kicked off a series of publisher-focused webinars with her talk "Metadata Is the Message." OCLC staff members Bruce Miller and Renee Register also participated in the program, providing an introduction and joining Ms. Dawson during the Q&A.
Speaking to a webinar audience of publishers and book industry organizations, IT specialists, and librarians, Ms. Dawson emphasized the growing value of metadata for marketing. She made the point that "metadata is your marketing," explaining that as more purchasing is done online, a searcher's first and possibly only encounter with a publication is through its metadata. Underinvesting in metadata can not only impede or prevent an item's discovery and sale--it can also negatively impact the perceived quality of the resource and its publisher. Ms. Dawson suggested that successful discovery and sales rest particularly on accurate titles and author names, BISAC codes and keywords, and "descriptive descriptions"--an interesting set of priorities to compare to recent library community studies of what makes good metadata.*
Ms. Dawson described the e-commerce world's often ONIX-based "metadata trails," which proceed from basic metadata produced by the publisher, to a metadata aggregator (e.g., Bowker, Ingram, Baker & Taylor), then to online retail giants such as Amazon and Barnes & Noble, and finally to Google and other popular sites.
It's a useful thought exercise to compare publishers' metadata trails to libraries'. In the U.S., with the exception of the Library of Congress' CIP program, the library community's metadata trail has traditionally begun with the creation of metadata based on a cataloging expert's examination of a publication in hand. The metadata is produced according to library-specific, generally MARC-based practices. This library-specific metadata is then shared via a national library or bibliographic utility (like OCLC), then re-aggregated for end-user discovery in a variety of local, group, or global catalogs (like WorldCat.org). The final step is (increasingly) syndication to other sites, including search engines.
It is interesting to consider whether, for both publishers and libraries, maximum discoverability is achieved where the publisher and library metadata trails end--with search engines. This possibility is consistent with the findings of Discoverability, a 2009 University of Minnesota Libraries study, whose examination of the origins of search requests for library resources led its authors to conclude that "users are successfully discovering relevant resources through non-library environments (e.g., general web searches, e-commerce sites, and social networking applications). We need to ensure that items in our collections and licensed resources are discoverable in non-library environments." (p. 3).
Readers may be aware of a variety of recent efforts to assess the feasibility of systematizing the library and publisher communities' metadata trails to create metadata cost savings and improve discoverability of both library and publisher materials. In a post last week, we reported on one such effort by Carol Jean Godby, an OCLC research scientist working with a team on mapping ONIX to MARC.
--Karen Calhoun and John Chapman
* Several studies from the library metadata community come readily to mind: Data Driven Evidence for Core MARC Records (see pp. 12-15); Online Catalogs: What Users and Librarians Want; and Implications of MARC Tag Usage on Library Metadata Practices.
The Web is a great equalizer of metadata. On the one hand, the lines between professional and amateur creators of metadata are blurring (the subject for another blog post at some point). On the other, the lines between well-organized but historically insular communities of metadata practice--like publishers and libraries--are also beginning to blur, and for good reason. Better aligning the library and publishing metadata traditions to enable large-scale metadata re-use and exchange has the potential to lower internal costs and improve discoverability of published works for both publishers and libraries.
Before either of those outcomes is attainable for the publishing and library communities, both need to learn more about the other's metadata practices: what standards are in place, how those standards are structurally and semantically alike and different, and how the metadata produced using them supports desired outcomes (such as connecting information seekers with published works at the point of need, showing what is available in a given collection, or assisting in the choice or delivery of a particular item).
The recently released report "Mapping ONIX to MARC," from OCLC research scientist Carol Jean Godby, makes impressive progress in answering these questions. In it, Dr. Godby shares what she and a team at OCLC learned while implementing Metadata Services for Publishers, introduced in 2009. The service's main activity consists of OCLC receiving ONIX records for new items directly from publishers, enriching the metadata with WorldCat data elements, returning the enriched metadata to publishers, and then adding the blended metadata to WorldCat.
Dr. Godby provides the metadata specialist with an informative, detailed look at the work of converting and using ONIX data in the context of library bibliographic databases. Particularly helpful is the considerable detail about crosswalking ONIX to MARC. Dr. Godby is skillful in describing some of the structural and syntactical peculiarities of MARC. These issues directly inform current discussion in the library data world about a wholesale transitioning from MARC to other formats.
The report also represents an advance in conceptualizing the crosswalking process at a useful level of abstraction. Godby describes a "crosswalk" as a set of self-contained "maps," each of which describes "a source, a target, and, optionally, some conditional logic." Several related maps (e.g., those handling publication identifiers) can be treated together as a "mapping." A crosswalk, in this conception, is a human-readable document which must be made into a machine-readable set of instructions for the task at hand.
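To picture that abstraction, a reader might model maps and mappings along the following lines; this is a sketch of the concept only, not the OCLC implementation, and the ONIX element names, code values, and MARC targets shown are merely illustrative.

```python
# Sketch of the report's map / mapping / crosswalk abstraction; not the OCLC
# implementation. Element names and code values are illustrative only.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Map:
    """One self-contained rule: a source, a target, and optional conditional logic."""
    source: str                                     # e.g., an ONIX element path
    target: str                                     # e.g., a MARC field/subfield
    condition: Optional[Callable[[dict], bool]] = None

@dataclass
class Mapping:
    """Several related maps treated as a unit, e.g., all identifier rules."""
    name: str
    maps: List[Map]

# The crosswalk is the full collection of mappings; the "machine-readable set
# of instructions" is whatever program applies these rules to a record.
identifiers = Mapping(
    name="publication identifiers",
    maps=[
        Map("ProductIdentifier/IDValue", "020 $a",
            condition=lambda r: r.get("ProductIDType") == "15"),   # ISBN-13
        Map("ProductIdentifier/IDValue", "024 $a",
            condition=lambda r: r.get("ProductIDType") == "03"),   # GTIN-13 / EAN
    ],
)
crosswalk = [identifiers]
```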
A further, in-depth description of some of the mappings will be of particular value, and perhaps comfort, to anyone who has wrestled with MARC metadata. Many readers will also be interested to know that the crosswalk developed for the program is publicly available from the OCLC Web site and EDItEUR for comment and further development.
--Karen Calhoun and John Chapman
This issue has become more prominent recently due to a rapid uptake of the CONTENTdm "Quickstart" version, which is included with a FirstSearch Base Package subscription. OCLC's adoption of the OAIster database, which contains a variety of metadata formats including DC, also encouraged us to think about ways to publish best practices for representing collections in DC.
Since August 2009, the CONTENTdm Metadata Working Group, facilitated by OCLC but comprising an open membership of CONTENTdm users, has been developing a formal best practices document. The current version, "'Best Practices' for CONTENTdm users creating shareable metadata: Draft 1.8," can be found at http://www.contentdm.org/USC/BestPracticesGuide.pdf. The Guide is authored by Geri Ingram of OCLC Digital Collection Services, Myung-Ja "MJ" Han of University of Illinois Urbana-Champaign, and Sheila Bair of Western Michigan University. They have received crucial support from Jason Lee, OCLC Fellow, and the members of the Metadata Working Group.
Not content to rest, the Metadata Working Group is acting on a variety of fronts to extend the utility of their guidelines. They have been testing their schemas in the Digital Collections Gateway, a new self-mapping tool. They are also extending the Guide with addenda regarding compound objects in CONTENTdm, crosswalking, and consortial metadata harvesting. In all of their work, they are keeping in mind the important balance between representing collections for local users and making them usable by the global community.
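As a rough illustration of the kind of crosswalking the Guide and its addenda address (the field names below are hypothetical, and the sketch is not drawn from the Guide itself), a collection administrator might maintain a simple local-field-to-Dublin-Core mapping so that harvested records remain meaningful outside their home collection:

```python
# Hypothetical example: relabel local CONTENTdm field names with Dublin Core
# elements before sharing, dropping purely internal fields with no mapping.
LOCAL_TO_DC = {
    "Photograph Title": "dc:title",
    "Photographer":     "dc:creator",
    "Date Taken":       "dc:date",
    "Neighborhood":     "dc:coverage",
    "Digital Format":   "dc:format",
}

def to_dublin_core(local_record: dict) -> dict:
    return {LOCAL_TO_DC[name]: value
            for name, value in local_record.items()
            if name in LOCAL_TO_DC}

print(to_dublin_core({
    "Photograph Title": "Main Street, looking east",
    "Photographer": "Unknown",
    "Scanning Resolution": "600 dpi",   # internal detail, dropped from the shared record
}))
```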
ITHAKA is an organization that supports the digital preservation of research and literature through JSTOR and Portico. It also pursues important consultative activity through its Ithaka S+R operation, one major project of which is the triennial Faculty Survey. The 2009 edition is now available at http://www.ithaka.org/ithaka-s-r/research/faculty-surveys-2000-2009/Faculty%20Study%202009.pdf.
The findings presented in Chapter 1 of the 2009 Survey, "Discovery and the Evolving Role of the Library," promise to be of keen interest to those building and maintaining library online catalogs. The headline of this blog entry is taken from page 7 of the report. Throughout the report, the reader will also find interesting evidence about faculty preferences and behaviors, particularly those related to scholarly communications.
Building on a great deal of community interest, the Ithaka S+R team has announced a series of webinars to flesh out and further explore specific sections of the report. OCLC staff will be attending, and we recommend you do so as well.
Further information about the webinars is available at http://www.ithaka.org/about-ithaka/announcements/ithaka-s-r-upcoming-webinars-2009-faculty-survey-findings.
The members of the Record Use Policy Council (RUPC) created this draft and are now seeking input on it. Speaking as one of the twelve members of the RUPC, I would say that the draft owes a good deal to OCLC members and others who were willing to speak up and make their perspectives known. Last year's extensive community discussion, followed by the RUPC's hard work from September 2009 to the present, has produced a new draft policy that (I believe) balances members' needs to share their metadata with the need to sustain the shared resource that is WorldCat.
As the RUPC began its work, we relied on many sources, including the scholarly literature, to inform our thinking. A key source was Understanding Knowledge as a Commons: From Theory to Practice, a collection of papers edited by Charlotte Hess and Elinor Ostrom of Indiana University. One of the most important influences on the approach and tone of the book is the Nobel Prize-winning work of Elinor Ostrom.
Elinor Ostrom earned her prize, said the Nobel committee, "for her analysis of economic governance, especially the commons." In their opening chapter of Understanding Knowledge as a Commons, she and her colleague Charlotte Hess join the many scholars who have criticized Hardin's "tragedy of the commons" contention as mistaken; at the same time, they argue that knowledge and information commons, as shared resources, are vulnerable to "social dilemmas."
Hess and Ostrom's work recognizes that successful, durable knowledge or information commons require strong collective action, self-governing mechanisms, agreed norms of reciprocity, the means to resolve disputes, and more. Their ideas provided a useful framework for thinking about and articulating the objectives of the RUPC's draft policy.
I join the RUPC members in encouraging you to take advantage of this time for community review; the period of review is scheduled for April and most of May. We welcome your thoughts about the draft policy, and several methods for providing input to the RUPC are available.
Of specific interest is the point made in the Executive Summary that, unlike other analyses of catalog and record use, this group focused on the use of MARC by machine applications. While the focus on "machine applications" may sound limiting, it is crucial to understanding how MARC is indexed and processed. That understanding can in turn lead to a more informed analysis of how to get more out of our search, discovery, and delivery systems. (For example, individuals participating in the OCLC Developer Network have created a number of innovative applications using machine-to-machine access to WorldCat data.)
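To make "machine applications" slightly more concrete, the toy sketch below (assuming the pymarc library and a hypothetical file of records; it is not drawn from the study) shows one of the simplest things such an application does: pulling a few MARC fields into an inverted index so titles and subjects become searchable. Real discovery systems, of course, add normalization, faceting, and relevance ranking on top of this.

```python
# Toy sketch: build a tiny inverted index over MARC titles and subjects.
# Assumes pymarc and a hypothetical file of records ("records.mrc").
from collections import defaultdict
from pymarc import MARCReader

index = defaultdict(set)   # search term -> set of record control numbers

with open('records.mrc', 'rb') as fh:
    for record in MARCReader(fh):
        if record is None:                              # skip unreadable records
            continue
        control = record.get_fields('001')
        if not control:
            continue
        control_no = control[0].data
        for field in record.get_fields('245', '650'):  # title, topical subject
            for value in field.get_subfields('a'):
                for term in value.lower().split():
                    index[term.strip('.,:;/')].add(control_no)

# Control numbers of records whose title or subject mentions "metadata".
print(sorted(index.get('metadata', set())))
```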
Looking at the end of the Executive Summary, there is a list of assertions about "MARC's Future." Taken together, these form a call to transition quickly from MARC, and to do so in a way that allows us to "meet the demands ... from the rest of the information universe," using linked data and other solutions deployed both inside and outside of libraries. There is at once a push for reducing redundancy, enhancing flexibility, more quickly resolving the technical and social dilemmas around wide data sharing across systems, and being open to new approaches.
Many of the points made hearken back to "On The Record", the final report from the Library of Congress' Working Group on the Future of Bibliographic Control. That report recommended casting a wide net for data that could be used to help with organization and access. It identified eliminating redundancies as a key step in increasing record production efficiency, pushed for development of a new metadata carrier, and recommended a new focus on user needs over administrative requirements. And both reports show a frustration with the information locked up in MARC fields.
The final assertion in the MARC usage Executive Summary: "Rather than enhancing MARC and MARC-based systems, let's give priority to interoperability...."
Does the library world have the collective will to step away from MARC as recommended in these reports? What do you think?
We'll have a follow-up post examining some of the implications of specific parts of the study.
As an update on this post, we have made available recordings of the ALA session on the sustainability and economics of the collaborative national bibliographic framework. The recordings are linked from this page: http://www.oclc.org/us/en/multimedia/2010/alamw_techservices.htm
A short intro by Karen Calhoun highlights the Library of Congress Working Group on the Future of Bibliographic Control report as an impetus for a new look at the role of cooperatives and national libraries in the descriptive environment.
Alisdair Ball's presentation provides useful information on the scale and profit/non-profit service mix of the British Library. The description of the overall national framework provides a useful contrast to the US model, while retaining crucial environmental similarities. Ball also points to the SCONUL Shared Services Survey [summary here: http://sconulss.blogspot.com/2009/06/shared-services-survey-headlines.html], which may be unfamiliar to some readers and which gauges the appetite among UK libraries for shared services.
Ruth Fischer of R2 Consulting provides an overview of the report R2 prepared on behalf of LC on the MARC record marketplace [Original report, our commentary]. While stressing the limited scope of R2's assignment from LC, she highlights the report's most important points: the high cataloging capacity that remains underused due to insufficient incentives, the distorting market effect of LC's record supply subsidy, and the disjunction between community and commercial values in the information market.
Brian Schottlaender of UC San Diego begins with a useful set of references to seminal reports and studies in the area. He asserts that environmental conditions have moved to a point where changes in cataloging practice are desirable and feasible. His presentation describes the steps that UC libraries have taken to determine the best and most efficient ways to take collaborative cataloging to a new level.
In the question and answer session, an attendee from Lyrasis offers personal anecdotes about the difficulties in shifting cataloging priorities. Jay Schaefer of University of Massachusetts Amherst, reacting to Schottlaender, discusses frankly the difficulties in large organizations with multiple employee classifications, leading to a valuable discussion of training. Diane Hillmann of Information Institute of Syracuse and Metadata Management Associates asks about the interplay between the trend toward making government information more open and possible moves toward cost-recovery for its production. Bob Wolven of Columbia University points out some areas in which catchphrases are emerging, leading to a discussion about unpacking the concept of "uniquely adding value." Kevin Randall cautions that the difference between "metadata work" and "cataloging" is overplayed, and that separating the two is a false dichotomy. Robin Wendler of Harvard brings up points relating to the distribution and re-distribution of MARC records and the costs and restrictions they engender.