Digitizing a Million Books

If you are curious about just what it takes to digitize over a million books, TechReview.com has an article by Kate Greene that you may want to read. It mentions the two major scanning projects going on – Google’s cooperative efforts with Harvard, Stanford, the University of Michigan, the University of Oxford, and the New York Public Library, and the Million Book Project at Carnegie Mellon University in Pittsburgh. While Google is secretive about how it is doing, the article discusses the Million Book Project’s efforts and resources.


Needed: scanning software for 430 languages and a system to organize the next big leap in the information age.

Fifteen months after Google announced a book-scanning project of biblical proportions – an effort to digitize the entire book collections of the New York Public Library and Harvard University libraries, among others – the company is still secretive about how they are solving key technical problems and won’t say how much they’ve accomplished so far.

However, a similar if smaller project — the Million Book Project at Carnegie Mellon University in Pittsburgh — has been underway for about seven years. It could provide some clues. The project’s director, computer scientist Raj Reddy, says he and his colleagues have no more knowledge about Google’s methods or progress than anyone else, but they are tackling many of the same challenges.

I’ve got half a dozen books that I’ve been working on digitizing for specific family information.
