How Many "Foreign" Books Are in US Libraries?
OCLC was recently asked to provide an estimate of the number of books held by US libraries that were published outside of the United States. Our answer? Approximately 200 million. We thought readers would be interested in learning the details of how the estimate was obtained.
In MARC records, the 260 field is the most obvious 'place of publication' field. Specifically, the 'a' and 'e' subfields are designed to record "place of publication, distribution, etc." and "place of manufacture", respectively. The problem is that these fields are filled with names - strings of characters - rather than codes, which are much more reliable and easy to parse.*
Such codes are found in the 008 field (fixed-length data elements). Using those as the focus, we made the following assumptions to answer the question at hand:
- Books are defined as monographic (bib level = 'm') language material (record type = 'a'). This has the effect of including some materials that are not, strictly speaking, "books" - pamphlets, broadsides, etc.
- English-language books lacking a known place of publication were assumed to be published in the US; non-English-language books lacking a known place were assumed to be published outside the US.
- Pre-1923 publications were excluded for purposes of copyright analysis.**
- The sample size was 1,700,000 records (roughly 1% of WorldCat).
- We used holdings data to determine how many copies of each title were owned by US libraries; holdings of non-US libraries were excluded from this count.
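The classification rules in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code actually used for the estimate: the function name `classify` and the helper `make_008` are hypothetical, records with a non-numeric Date 1 are simply skipped (the post does not say how they were handled), and a place code is treated as "US" when it is three letters ending in "u" (the MARC country code pattern for US states, plus "xxu").

```python
from typing import Optional

def classify(leader: str, field_008: str, cutoff_year: int = 1923) -> Optional[str]:
    """Return 'US', 'Foreign', or None when the record is out of scope."""
    # Leader/06 is the record type, Leader/07 the bibliographic level.
    if len(leader) < 8 or leader[6] != "a" or leader[7] != "m":
        return None
    if len(field_008) < 38:
        return None

    date1 = field_008[7:11]    # 008/07-10: Date 1
    place = field_008[15:18]   # 008/15-17: place of publication code
    lang = field_008[35:38]    # 008/35-37: language code

    # Assumption: records without a fully numeric Date 1 are skipped.
    if not date1.isdigit() or int(date1) < cutoff_year:
        return None

    # "xx " (or blank / fill characters) means the place is unknown:
    # fall back on the language rule from the list above.
    if place in ("xx ", "   ", "|||"):
        return "US" if lang == "eng" else "Foreign"

    # Assumption: three-letter codes ending in "u" denote the United States.
    if place[2] == "u" and place[:2].isalpha():
        return "US"
    return "Foreign"


def make_008(date1: str, place: str, lang: str) -> str:
    """Hypothetical helper: build a 40-character 008 for testing."""
    return "000000" + "s" + date1 + "    " + place + " " * 17 + lang + " " + "d"


LEADER_BOOK = "00000nam a2200000 a 4500"

print(classify(LEADER_BOOK, make_008("1985", "enk", "eng")))  # UK imprint
print(classify(LEADER_BOOK, make_008("1985", "xx ", "fre")))  # unknown place, French
```

Each in-scope title would then be counted once, and its US holdings count added to the running total for its category.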
Given these assumptions, we found the following:
Book titles published in the US: 26,710,400
Book titles published outside the US ("Foreign"): 78,017,300
Foreign book titles, held by US libraries: 22,801,900
Copies of foreign book titles ("holdings") worldwide: 461,596,000
US libraries' holdings of foreign book titles: 203,953,200
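For readers who want to relate these figures to one another, the following short Python sketch derives a few ratios from the numbers listed above. The variable names are ours, and the ratios are simple arithmetic on the published totals rather than additional findings.

```python
# Figures as reported above.
us_titles = 26_710_400
foreign_titles = 78_017_300
foreign_titles_held_us = 22_801_900
foreign_holdings_world = 461_596_000
foreign_holdings_us = 203_953_200

# Share of all in-scope book titles that were published outside the US.
foreign_title_share = foreign_titles / (us_titles + foreign_titles)

# Share of foreign titles held by at least one US library.
held_in_us_share = foreign_titles_held_us / foreign_titles

# Share of worldwide holdings of foreign titles that sit in US libraries.
us_holdings_share = foreign_holdings_us / foreign_holdings_world

# Average number of US copies per foreign title held in the US.
copies_per_title = foreign_holdings_us / foreign_titles_held_us

print(f"Foreign share of titles:          {foreign_title_share:.1%}")
print(f"Foreign titles held in the US:    {held_in_us_share:.1%}")
print(f"US share of foreign holdings:     {us_holdings_share:.1%}")
print(f"US copies per held foreign title: {copies_per_title:.1f}")
```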
Notes:
* In addition, while 260 ‡a (Place of publication) is common, the ‡e (Place of manufacture) appears in fewer than 3% of records, making any combined analysis of these fields statistically shaky.
** If the pre-1923 cutoff is ignored, the number of foreign book titles increases by roughly 25%.


You mention "[t]he problem [with 260] is that these fields are filled with names - strings of characters - rather than codes, which are much more reliable and easy to parse."
Assuming we retain this area into the future and assuming the use of MARC for the next 10 yrs, do you think the trend will be eventually to switch to codes for places and publisher names in 260 rather than strings of characters? Surely codes for places and publishers already exist and could be adapted for our use in bibliographic data. Do the publishers have the place and publisher names in coded form already?
In this area, if not others, perhaps the cataloging rule could be "Take/transcribe what you see only in the absence of coded values."
I'm assuming from the methodology that these figures would include e-books, if the original was published outside the US. (E.g., netLibrary copies of UK publications) Any way to know how the numbers would change if limited to print?