SIMS: How big is a terabyte? All the text in the Library of Congress is about 20 terabytes.
KAHLE: This tape robot, about the size of two Coke machines, has got this arm that goes and pulls tapes from these shelves and sticks them into a bank of tape drives, and then it copies them onto hard disks.
A paper on the technical aspects of archiving the Internet is online at Internet Archive's site.
SIMS: And where else might you want to go from here? And that information is difficult to compute. That's the real crux of the Alexa service.
KAHLE: The Holy Grail in all of this is usage paths. Where have other people who have been on this Website gone? Where did they go that they had a good time? Where's the good stuff? After somebody has sorted through the search engines and directories. Other people have found the good stuff; why can't I leverage that?
SIMS: You're tracking people's paths like a scout tracking paths through the forest?
KAHLE: Yes, but we don't care who they were. We just want to know where are the high-traffic sites that might lead to the good views or the water hole or the good things in the forest.
We're starting the system with other suggestions that come from link analysis, content analysis, and some editorial judgment.
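The usage-path idea Kahle describes can be illustrated with a minimal sketch. It is not Alexa's actual method, which the interview does not detail. It assumes each visit is recorded only as an anonymous, ordered list of sites, and the names `next_page_counts` and `suggest` are hypothetical.

```python
from collections import Counter, defaultdict

def next_page_counts(sessions):
    """Count site-to-site transitions across anonymous sessions.

    Each session is an ordered list of sites; no user identity is kept.
    """
    transitions = defaultdict(Counter)
    for path in sessions:
        # Pair every site with the one visited immediately after it.
        for current, following in zip(path, path[1:]):
            if current != following:
                transitions[current][following] += 1
    return transitions

def suggest(transitions, page, limit=3):
    """Return the most-travelled next stops from a given site."""
    counts = transitions.get(page, Counter())
    return [url for url, _ in counts.most_common(limit)]

# Hypothetical sample paths, for illustration only.
sessions = [
    ["a.example", "b.example", "c.example"],
    ["a.example", "b.example"],
    ["a.example", "c.example"],
]
transitions = next_page_counts(sessions)
print(suggest(transitions, "a.example"))
```

Only aggregate counts are kept, which matches the point that the service does not care who the visitors were.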
SIMS: There's something in it called the 404 Killer?
KAHLE: Well, one thing that you get for free if you've got an archive is you've got an ability to make out-of-print Web pages come back. There are some valuable resources that are just now out of print. You don't expect all books to be continuously printed. In the same way, we shouldn't expect all great Web pages to always be on some site that will always be there. The Web changes -- something like 1 percent each week.
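A rough sketch of the fallback behind a "404 Killer" might look like the following. It assumes two hypothetical lookup functions supplied by the caller, `fetch_live` and `fetch_archived`; neither is a real Alexa or Internet Archive API, and the interview does not say how the actual feature is built.

```python
def fetch_with_archive_fallback(url, fetch_live, fetch_archived):
    """Try the live Web first; on a 404, fall back to an archived copy.

    fetch_live(url) is assumed to return (status_code, body).
    fetch_archived(url) is assumed to return a body, or None if no copy exists.
    """
    status, body = fetch_live(url)
    if status != 404:
        return status, body, "live"
    archived = fetch_archived(url)
    if archived is None:
        # Nothing in the archive either, so pass the 404 through unchanged.
        return 404, body, "live"
    return 200, archived, "archive"

# Stand-in data sources, for illustration only.
live_pages = {"http://a.example/": "current page"}
archive = {"http://gone.example/": "out-of-print page"}

def fake_live(url):
    if url in live_pages:
        return 200, live_pages[url]
    return 404, "Not Found"

def fake_archived(url):
    return archive.get(url)

print(fetch_with_archive_fallback("http://gone.example/", fake_live, fake_archived))
```

Passing the two lookups in as functions keeps the sketch independent of any particular network library or archive service.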
SIMS: How did your path lead here?
KAHLE: At MIT, we always wanted to make things that made an impact on large numbers of people.
Danny Hillis was one of the founders of Thinking Machines, where Brewster Kahle designed supercomputers in the 1980s. Hillis is now a Disney Fellow.
From there, this Internet stuff started coming around, and we said, "Well, that's not that different; it's just another network of computers. How do we make that searchable?" And that was the genesis of the WAIS project.
We have these computers that are very fast, and then the Internet came along and added the content. Now we've got something to play with. Now we have critical mass to go and try to make something so that these computers can augment and provide advice that's maybe useful to people.