By SIVA VAIDHYANATHAN
Wouldn’t it be cool if we didn’t have to tell students that a Web search is insufficient for serious scholarly research? Wouldn’t it be great if we could use a single, simple portal to find the most-significant Web pages, images, scholarly articles, and books dealing with a particular subject or keyword? Wouldn’t it be wonderful if we could do full-text searches of millions of books?
The dream of a perfect research machine seems almost within our reach. Google, the Mountain View, Calif., company flying high off a huge initial public offering of stock and astounding quarterly revenues, announced late last year that it would digitize millions of bound books from five major English-language libraries. It plans to make available online the full text of public-domain books (generally those published before 1923, plus government works and others never under copyright) and excerpts from works still in copyright.
Harvard University, whose library has more than 15 million volumes, will allow Google to scan 40,000 books during the pilot phase of the project, and the number may grow. The University of Michigan at Ann Arbor has agreed to let Google scan its entire collection (some 7.8 million works), and Stanford University says it is keeping open the possibility of including “potentially millions” of its more than eight million volumes. The Bodleian Library at the University of Oxford, whose main library alone holds 6.5 million books, will allow Google to scan public-domain books, which it says are principally those published before 1920. And the New York Public Library, which holds 20 million volumes, will contribute 10,000 to 100,000 public-domain volumes. Even if the project included only Michigan’s collection, it would be astounding.
Google is doing all the scanning and optical-character recognition with a secret proprietary machine and promises not to damage the pages or bindings. According to Google’s contract with Michigan (the only contract released to the public), the university will be offered a digital copy as well.
I have to confess, I am thrilled and dazzled by the potential of such a machine and the research and distribution opportunities it presents. I sincerely wish every Internet user had access to a full-text search of every book in the Google libraries.
But, as we all know, we should be careful what we wish for. This particular project, I fear, opens up more problems than it solves. It will certainly fail to live up to its utopian promise. And it dangerously elevates Google’s role and responsibility as the steward — with no accountability — of our information ecosystem. That’s why I, an avowed open-source, open-access advocate, have serious reservations about it.
It pains me to declare this: Google’s Library Project is a risky deal for libraries, researchers, academics, and the public in general. However, it’s actually not a bad deal for publishers and authors, despite their protestations.