Encyclopedia Galactica

Via Kurzweil AI, check out this modest proposal made at the Web 2.0 conference in San Francisco:

Universal access to all human knowledge could be had for around $260m, a conference about the web's future has been told.

The idea of access for all was put forward by visionary Brewster Kahle, who suggested starting by digitally scanning all 26 million books in the US Library of Congress.

In his speech, Mr Kahle pointed out that most books are out of print most of the time and only a tiny proportion are available on bookshop shelves.

He estimated that the scanned images would take up about a terabyte of space and cost about $60,000 (£33,000) to store. Instead of needing a huge building to hold them, the entire library could fit on a single shelf.

This is a tremendous idea, and the cost of doing it is only going to go down. The initial scanning work is the only part of the plan likely to involve much expense. According to Moore's Law, that $60,000 price tag for storage should be somewhere around $2,000 eight years from now. If the $260 million estimate for the robot scanning is accurate, and it follows a less robust drop in price (say, halving once every four years), we would be looking at a price tag of around $65 million in the same period of time. Pretty doable, I'd say.
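For anyone who wants to check the arithmetic, here is a rough sketch of the projection. The function name and the storage halving periods are my own assumptions: the post only says "Moore's Law," which is commonly read as a halving every 18 to 24 months, so both are shown. The four-year halving for scanning is the figure used above.

```python
def projected_cost(cost_now, years, halving_period_years):
    """Cost after `years` if the price halves once every `halving_period_years`.

    Hypothetical helper for illustration; a plain exponential decay model.
    """
    return cost_now / 2 ** (years / halving_period_years)


# Storage: $60,000 today, eight years out.
# Assumption: "Moore's Law" taken as a halving every 18 or every 24 months.
storage_18mo = projected_cost(60_000, 8, 1.5)
storage_24mo = projected_cost(60_000, 8, 2.0)

# Scanning: $260 million today, halving once every four years.
scanning = projected_cost(260_000_000, 8, 4.0)

print(f"Storage, 18-month halving: ${storage_18mo:,.0f}")
print(f"Storage, 24-month halving: ${storage_24mo:,.0f}")
print(f"Scanning, 4-year halving:  ${scanning:,.0f}")
```

Under these assumptions the storage figure lands somewhere between roughly $1,500 and $3,750, which brackets the "around $2,000" guess, and the scanning figure comes out at $65 million.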

Unfortunately, the legal concept of public domain is rapidly diminishing, while copyright terms are lengthened and controls are made more expansive. As John Bloom observed a while back in The New Republic:

In the name of Mickey Mouse and other American icons, we have gradually lengthened that 14-year limit on copyrights. At one time it was as much as 99 years, then scaled back to 75 years, then — in one of the most anti-American acts of the last century — suspended entirely in 1998. The Sonny Bono Copyright Term Extension Act of that year says simply that there will be no copyright expirations for 20 years, meaning that everything published between 1923 and 1943 will not be released into the public domain. Presumably they'll take up the matter again in 2018 and decide whether any of these books, movies, or songs are ever set free. There are 400,000 of them.

So Kahle's observation that few of these books are still on the shelf will be beside the point. A scanned-in Library of Congress could conceivably serve as a backup to the print archive, providing an excellent disaster-recovery resource, but it would probably not be possible to distribute the whole archive, only those parts created before 1923.

Of course, there's hope that when the copyright issue is reviewed again by Congress (presumably in 2018), the public will be more aware of what's going on and will not stand for any more expansions of copyright controls. Failing that, maybe we could get an exception to copyright law into place. Perhaps we could make this backup of the Library of Congress exempt from all copyright restrictions as long as it's used by schools and public libraries.

By 2018, the storage for a copy of the entire Library of Congress online should cost less than $1,000; even the cost of creating the archive would be $15 million or less. We could put the entire Library of Congress in every school in America.

Comments

A good compromise would be a mandated pay-for-content system. Whether they want to or not, content providers would be compelled to offer their content at a mandated price per unit of data. Governments could argue that this is a form of "eminent domain."

Once you've paid a price you would have the right to view that data at any place, any time.

There would be a little running meter at the side of your screen and little popups that would remind you that you are about to purchase the right to view content.

we're getting closer to "The Library" that the aliens had in David Brin's Uplift Trilogy. i thought that the library was one of the coolest things in the books.

Hmmm, the thing that was impressive here is that a terabyte is very affordable. I.e., it might cost $260 million to convert to electronic, but it costs a few thousand to store all of that data on common electronic media. E.g., we're talking a few of the current largest hard disks or maybe 2,000 CDs. Very affordable either way.
