December 2010 Archives

A Web presence for every library

In April of 2010 OCLC started the Innovation Lab, a small team focused on the exploration of new technologies in uncharted spaces to enhance the products and services offered by the OCLC cooperative. Examples include the beta WorldCat mobile offering at http://worldcat.org/m and the social network integration Ask4Stuff.

The Innovation Lab also explores new services focused on the needs of specific segments of the library community. We are ready to share our work to date on a very early, experimental service aimed at providing a low-cost, easy-to-use Web site service for small and rural public libraries, and we are looking for feedback and direction from the library community. Investigations into this type of service have started before at OCLC and elsewhere, but have often stumbled due to the challenges inherent in starting with an existing service and adapting it. Understanding this, we took a different, non-traditional path to exploring this opportunity.

In our first rounds of analysis, we started by visiting the Web sites of the types of libraries we wished to partner with--public libraries with 10 or fewer staff, which represent roughly 25% of the U.S. public library community. It was a surprise to me that most of these 2,000-or-so libraries have no working or discoverable Web presence. A full 20 years after the invention of the Web, every library deserves to have a presence on the Internet. This became our inspiration.

In parallel with the initial analysis, we started a set of staff prototyping efforts. At the beginning of each week, for four consecutive weeks, various staff brought prototypes, not only of software but also of service models and even new service announcements. At the end of the four weeks, we selected some of the most compelling ideas and set out to develop an experimental release by January.

As an experimental service, this is intended to be suggestive of future possibilities. It exists, at this stage, to attract feedback, advice and contributions from the community for the community (YouTube introduction). Anticipated initial features could include:

  1. A pre-populated, template driven Web site for every public library in the US.
  2. Every library can claim their site and modify its contents.
  3. A mobile presentation for every library.
  4. Access to a selection of open access, electronic full text content for users with no staff effort.
  5. Simple inventory management in the cloud based on the power of WorldCat.
  6. Simple patron management service (for those without a current solution).
  7. Simple lending functionality (check out, check in, hold, cancel, renew).

Our hope is that this will be an affordable way for small libraries to participate in the OCLC cooperative and leverage the value of our members' combined resources.

To see the experimental service, come see us at ALA Midwinter on Sunday, January 9, from 5 to 7 pm in the OCLC Blue Suite in the Hilton San Diego Bayfront Hotel. We will release a publicly accessible (still experimental!) version at that time. Remember, this is really, really early. Some things will work, some will not. To provide feedback, please email the OCLC Innovation Lab directly at innovation@oclc.org. We hope to find some early adopters who are willing to push the limits, represent their small library in this environment and define a sustainable future.

It is now up to you to help us answer "Where should we go from here?"
John Wilkin's excellent post, "Open Bibliographic Data: How Should the Ecosystem Work?", makes thoughtful observations on the bibliographic environment and the developments needed in this rapidly changing space.

John makes some general directional statements, which I can only agree with. He also makes comments and challenges on how OCLC is, and should be, acting to address the requirements of this emerging ecosystem. Being deeply involved in OCLC's efforts in this space, I have a naturally different vantage point from which to view OCLC's activities. I would like to take the opportunity to respond to John's challenges and lay out some of OCLC's activities within this context below.

1. Bibliographic Data is not lifeless - it is in a constant state of flux...

John makes this point very well, and it is one that I could not agree with more. The counterview of bibliographic data being static and "lifeless" has been the source of much ill-informed debate.

This picture is very clear when viewing the transactions in the WorldCat database itself. In the past year we have processed around 360M records into the database, but these transactions have resulted in 'only' around 30M records being added. The vast majority of transactions result in changes to the existing data that add value by adding holdings, adding, merging and updating headings etc. They bring life to the data - they make it evolve.
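
The proportions implied by these figures can be worked out with a short sketch. It uses only the two approximate numbers quoted above and assumes, for illustration, that every processed record either adds a new record or updates an existing one; the variable and function names are our own.

```python
# Approximate figures quoted in the post, in millions of records.
processed_m = 360  # records processed into WorldCat in the past year
added_m = 30       # records that resulted in new additions

# Assumption for illustration: anything that was not an addition
# changed (enriched) an existing record.
updated_m = processed_m - added_m


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(f"New records:     {added_m}M ({share(added_m, processed_m)}%)")
print(f"Updates to data: {updated_m}M ({share(updated_m, processed_m)}%)")
```

Under that assumption, roughly 8% of the year's transactions added records and roughly 92% enriched data that was already there.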

WorldCat is regularly viewed as a large pool of records with more records being poured in constantly - those records just sit there "lifelessly" waiting to be used. This is absolutely not the case. The data in WorldCat is in an ongoing evolution in which records are constantly enriched. We shouldn't think of "records being added to the database" but rather of "records being used to further evolve the database after the many previous generations of evolution."

Managing this evolution and enrichment takes significant resources - both machine and human. We are aware that as the coverage of WorldCat grows, there is a risk to its coherence through record duplication. A significant investment has been made in the past year in enhanced duplicate detection and resolution algorithms - this has resulted in the merging of 5.1 million previously duplicated records. Of course, we have to move this to a more open, more syndicated, more life-full existence for the data. We have early examples of this with evaluative content such as reviews, ratings etc. - but we would not claim this to be anything other than scratching the surface. Linked data exposure provides one mechanism through which further life can be brought to the data.

2. ... and Bibliographic Data is not "flat."

The value in a large bibliographic aggregation can be leveraged in many more ways than just the core bibliographic records. By mining the data, high value, comprehensive data services can be exposed around key entities such as names, subjects, places, works etc. All these entities are also "alive" in a constant state of flux and are subject to the same dynamics described above.

WorldCat Identities is a good exemplar of such a service, as is the Classify service. Again, these are just a start; there is far more to be done. These concepts need to be extended to other entities, and need open interfaces for community access.

3. We need greater access to all this data...

Our intention is to expose all these entities through a standardized and consistent set of data services and interface mechanisms. These mechanisms include Syndication, Web-Service / API access and Linked Data. The purpose of this 'data service exposure' is threefold:

a) To allow the value of WorldCat and WorldCat-derived services to be leveraged into the flow of work throughout the ecosystem.

b) To bring more "life to the data" - allowing more distributed and varied mechanisms to be used for updating and enriching the data.

c) To benefit from the "collective innovation" of the community - providing enablers for the community to create value from the data in ways we have not yet thought of.
Early examples of this include:

  • Data syndication, particularly to the internet search engines. The primary goal has been to put libraries into the 'internet search flow'. Embedding links to WorldCat in Google Scholar, Google Books and the main search indexes results in over 2 million users performing over 700,000 click-throughs to library services per month.

  • The WorldCat API, which provides open search access to WorldCat. This has seen a 20-fold increase in traffic in the past year and now generates around 5 million transactions per month. Over 50 community-generated applications have been developed using the WorldCat API, and this mechanism is widely used to embed WorldCat access into affinity Web sites.

  • The WorldCat Knowledge Base API. The release of the WorldCat Knowledge Base as an integral part of the cataloguing service includes both a management user interface and API access to provide programmatic access to the knowledge base, allowing users to create their own unique solutions and benefit from network-level data management.

  • The XID services, which provide identifier mapping between related ISBNs and ISSNs, generate around 5 million transactions per month from external services.

  • The Dewey linked data service, which presents portions of the DDC (in multiple languages) as linked data.

  • The Virtual International Authority File (VIAF), which matches and links many national authority files, is made available as linked data.
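
As a small, self-contained illustration of what "identifier mapping" can mean in practice, the sketch below converts an ISBN-10 into its ISBN-13 form using the standard check-digit rule. This is not the XID service and makes no calls to any OCLC API; the function name is our own, and mapping between different editions of a work (which the service is described as doing) needs bibliographic data that a local calculation like this cannot supply.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13 (978 prefix, recomputed check digit)."""
    chars = isbn10.replace("-", "").replace(" ", "")
    if len(chars) != 10:
        raise ValueError("expected 10 characters after removing hyphens")
    # Drop the old ISBN-10 check digit and prepend the 978 prefix.
    core = "978" + chars[:9]
    if not core.isdigit():
        raise ValueError("the first nine characters must be digits")
    # ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


print(isbn10_to_isbn13("0-306-40615-2"))  # prints 9780306406157
```

A service that works at network level can go further than this, because it can draw on the aggregated records themselves to relate identifiers that have no arithmetic relationship to one another.
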
Even though these services are quite limited in relation to our intent, their heavy and dramatically increasing use illustrates the demand for, and value of, open data services based on WorldCat.

At the core of our strategy moving forward is an "Open, Web-scale Platform built on WorldCat"; these standardized interfaces and exposure mechanisms form the Data Services layer of that platform.

4. ... And Data isn't passive or inert - it forms the basis of processes to support library operations.

The interfaces mentioned above represent the Data Services layer of the Platform. But to truly leverage the value of WorldCat, far more can be done to support library operations and processes. To address this, a wide range of library business and workflow services are being exposed through standardized interfaces for access by the community and application developers. These library business and workflow services add further value on top of the data services.

There has been a great deal of interest in OCLC's Web-Scale Management Service (WMS) initiative, but a key point of this initiative may have been overlooked. While the WMS initiative is delivering a range of application-layer functionality, equally importantly it is exposing a set of Open Data and Business/Workflow Services for use by the community. All these services can be viewed as value-added layers on top of WorldCat.

This Platform is intended to support and enhance the notion of "collective innovation" mentioned above. Layered on top of an ecosystem of Open Data, we envision an ecosystem of value-added apps developed from within the library community at large and beyond.

John makes the statement that "OCLC should define its pre-eminence not by how big or how strong the walls are, but by how good and how well-integrated the data are. If WorldCat were in the flow of work, with others building services and activities around it, no one would care whether copies of the records existed elsewhere, and most of the legitimate requests for copies of the records would morph into linked data projects."

The phrase I use to describe OCLC's product and service strategy is "An Open and Extensible Platform built on an extended view of WorldCat." The goal of this platform is exactly to put "WorldCat in the flow of work, with others building services and activities around it".

5. Economics, Value, Rights and Responsibilities

Again, John makes insightful comments and observations on the paradox of openness and its related perils. There are significant complexities to be overcome to deliver the required ecosystem. These challenges are both technical and economic. A response to the technical challenges is briefly summarized in the text above. The economic challenges are equally great.

The economics and long-term sustainability of a club good such as WorldCat are non-trivial; Elinor Ostrom's Nobel Prize in economics for her analysis of economic governance of the commons is a testament to this. Her book "Governing the Commons: The Evolution of Institutions for Collective Action" was essential guidance for the Record Use Policy Council, the writers of the WorldCat Rights and Responsibilities for the OCLC Co-operative. Of course, the policy developed does not make the data as open as it could, but it delivers a reasonable balance between the need for openness and the need for sustainability.

I am aware that the main focus of John's post was on Linked Data, and I have broadened this response well beyond that point. This may at times seem tangential to the main issue. However, I see this broader view as entirely interconnected with the Open Data issue raised by John. The approach being taken is to broaden and extend the value proposition of WorldCat services through the Platform described (putting WorldCat in the flow of work). In doing so, the aim is to shift the sustainability equation away from the data itself onto a far broader base and, hence, increasingly free up the data.

The current WorldCat Rights and Responsibilities provide a good framework for where we are today, but I am sure as we move forward, executing this strategy, they will evolve. That evolution will be toward greater openness, but will always need to be balanced against the needs of sustainability.

6. In Summary

OCLC's core strategic goal is to help libraries achieve Web-scale; an objective that we believe to be critical to the future success of libraries as a whole. This is, of course, a huge collaborative and cooperative effort. While OCLC is uniquely positioned to help in this endeavor, OCLC can at best provide a focal point for this activity. It is the collective engagement, and collective innovation of libraries that will achieve Web-scale.

The infrastructure being delivered to provide this focal point is the "Open Web-Scale Platform" as described briefly above. This platform builds on WorldCat and exposes a set of open data services, and layers a set of "business logic" services on top of those to support library operations. The primary goals of this infrastructure are to:

  • Leverage the value of WorldCat and WorldCat-derived services into the flow of work wherever possible.

  • Foster collective innovation from within the community and to allow that innovation, which for so long has been a cottage industry, to be leveraged and scaled for all. 
Linked data is one of the many important facets of this platform.

I welcome comments, feedback and collaboration.
If you're going to ALA Midwinter this year in San Diego, you owe it to yourself to check out the OCLC Symposium that we regularly host in conjunction with that event. Twice a year, at Annual and Midwinter, we invite experts from outside the library community to focus on 'big picture' topics that help us look broadly at trends and issues affecting the profession, and especially their impact on the community. Previous participants have included David Weinberger, author of "Everything is Miscellaneous," social media researcher danah boyd, cultural historian Siva Vaidhyanathan and others. The theme of this year's symposium is, "Transformational literacy: life stages and libraries:"

Every day, libraries help people transform their lives. But nowhere is that role more apparent--and important--as when people move from one life stage to another. Whether the transition is from high-school to college, student to worker, adult to parent or any other important life change, the need for information and preparation during these "between" states is more acute than at any other time in our users' lives.
I'm very excited to announce that Mimi Ito will be our keynote speaker for the event. Mimi is a cultural anthropologist who studies the use of new media, particularly among young people. She'll speak about the role technology plays in young people's information seeking behavior, and how early education and training can set students up to be lifelong learners.

After Mimi's talk, Michael Stephens from the Graduate School of Library and Information Science at Dominican University (and author of "Web 2.0 & Libraries") will moderate a discussion with Mimi; James LaRue, Director of Douglas County Libraries in Castle Rock, Colorado; and Joanie Chavis, Dean of the Learning Resource Center for the Baton Rouge Community College Magnolia Library. They'll discuss the ways in which public, academic and community college librarians can help users better navigate informational challenges during the transitional times in their lives.

You can register for the event here, as well as for other OCLC activities, including the Americas Regional Council Member Meeting, which is open to all, regardless of your affiliation with OCLC.

I hope to see you there!