Internet Archive
-
0:05 - 0:49[A documentary by Jonathan Minard featuring Brewster Kahle, Robert Miller, and Alexis Rossi]
[background music] -
0:49 - 0:54So when we started collecting the World Wide Web we just did the same sorts of things that the search engines did.
-
0:54 - 1:01It's basically a computer that clicks on every link on every web page and then records its result.
-
1:01 - 1:10And it goes and gets that web page and then adds it in, then it goes and clicks on every one of those links, and it keeps going and going and going until you get to the end.
-
1:10 - 1:22Turns out there is no end; the web is in fact infinite. So we prioritise it, to try to go and make sure we have got some pages from every website every two months, and we have since 1996.
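The crawl described above can be sketched as a breadth-first traversal with a page budget, since there is no "end" to reach. This is a minimal illustration and not the Internet Archive's actual crawler: the `crawl` function, the `TOY_WEB` link table, and the `max_pages` budget are all assumptions made for the example, and no network access is involved.

```python
from collections import deque

def crawl(start, get_links, max_pages):
    """Breadth-first crawl: record a page, then queue every link found on it."""
    seen = {start}
    queue = deque([start])
    archived = []
    # The budget stands in for prioritisation: the web never runs out of links.
    while queue and len(archived) < max_pages:
        url = queue.popleft()
        archived.append(url)             # "records its result"
        for link in get_links(url):      # "clicks on every link"
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return archived

# Hypothetical toy web: each URL maps to the links found on that page.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "a.example/"],
    "c.example/": [],
}

def toy_links(url):
    return TOY_WEB.get(url, [])

pages = crawl("a.example/", toy_links, max_pages=3)
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other.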
-
1:22 - 1:30Then we made it into the Wayback Machine, by making it so that if you type in a URL it shows you all of the past versions, and you can surf the web as it was.
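One way to picture "type in a URL and see all of the past versions" is an index from each URL to its captures, sorted by capture time. The sketch below is an assumption for illustration only; the `SnapshotIndex` class, its method names, and the `YYYYMMDD` timestamp strings are hypothetical and do not describe how the Wayback Machine is actually built.

```python
import bisect
from collections import defaultdict

class SnapshotIndex:
    """Maps a URL to its captures, kept sorted by timestamp string."""

    def __init__(self):
        self._times = defaultdict(list)   # url -> sorted capture timestamps
        self._bodies = {}                 # (url, timestamp) -> page content

    def add(self, url, timestamp, body):
        # Insert the timestamp in sorted position unless it is a re-capture.
        if (url, timestamp) not in self._bodies:
            bisect.insort(self._times[url], timestamp)
        self._bodies[(url, timestamp)] = body

    def versions(self, url):
        """All capture timestamps for a URL, oldest first."""
        return list(self._times.get(url, []))

    def as_of(self, url, timestamp):
        """The latest capture at or before the given time, or None."""
        times = self._times.get(url, [])
        i = bisect.bisect_right(times, timestamp)
        if i == 0:
            return None
        return self._bodies[(url, times[i - 1])]

index = SnapshotIndex()
index.add("example.org/", "20010315", "second version")
index.add("example.org/", "19961201", "first version")
```

Fixed-width `YYYYMMDD` strings sort in date order, so a binary search finds the capture that was current on any given day.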
-
1:30 - 1:45We thought it would be kind of a quirky, fun thing for people who have kind of lost their websites or something like that, but it turns out that lots and lots of people use it. It is used by about 500,000 people every day.
-
1:45 - 1:58Well, so, welcome. So this is the Internet Archive. We bought this building about three years ago. It was a Christian Science church, and the idea was to flatten the floor and make it into a library.
-
1:58 - 2:07But exactly what a library looks like now is a little unclear, so we're sort of adapting to the building and the building is adapting to us.
-
2:07 - 2:20So these are servers. This is about two and a half petabytes of the Internet Archive, and they are actually the primary copy of a lot of our books, music and video.
-
2:20 - 2:29We're about to hit 10 petabytes. On Thursday we're having our party to celebrate the tenth petabyte, the lucky byte that finishes our ten petabytes.
-
2:29 - 2:43But these are actually ongoing and up and running. The lights are attached to the hard drives, and they indicate when somebody is uploading or downloading something from the Internet Archive,
-
2:43 - 2:50and the red one means that it actually needs to be replaced, so it's probably running out of the mirror copy that is in another place.
-
2:50 - 2:57There are a couple of problems with trying to do data preservation. Most people really focus on hard drives going wrong or something like that,
-
2:57 - 3:05and frankly that just takes diligence. You have to copy things forward every three to five years, and if you ever miss a beat you'll lose it. So that's a problem.
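"Copying things forward" only helps if each copy is verified, so a common practice is to compare checksums before and after the move. The following is a minimal sketch under that assumption; `sha256_of` and `copy_forward` are hypothetical helper names, and nothing here describes the Internet Archive's own tooling.

```python
import hashlib
import shutil

def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_forward(src, dst):
    """Copy src to new storage at dst and confirm the bytes survived the trip."""
    before = sha256_of(src)
    shutil.copyfile(src, dst)
    after = sha256_of(dst)
    if before != after:
        raise IOError(f"checksum mismatch copying {src} to {dst}")
    return after  # keep this digest to check the copy again in later years
```

Storing the returned digest alongside the file lets the next migration detect silent corruption that happened while the copy sat on disk.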
-
3:05 - 3:15But there's also trying to contextualise it: can you read the old formats? So we've had to go and translate our movies, that maybe have come in in MPEG-2 or some other formats,
-
3:15 - 3:27five times. We've had to go back over all of them and redo them five times, and keep them so they are in use. So that is an ongoing diligence, but that isn't even really the problem.
-
3:27 - 3:37The problem is in institutional failure. What happens to libraries is that they burn; they get burned by governments. That's not a political statement, it's just historically what happens.
-
3:37 - 3:54The Library of Congress has already burned once. So if that's what happens to libraries, let's design for it. If the Library of Alexandria had made a copy and put it in either India or China, we'd have the other works of Aristotle, the other plays of Euripides. It would be great.
-
3:54 - 4:10But they didn't, and why not? A large part is ego. So to say, well, it's expensive: yeah, not really, it's mostly ego. And if you take your crown jewels and move it some place else, well, are you less for it? If there's multiple copies, are you less for it?
-
4:10 - 4:15There's starting to be a generation of people that are living in this multiple-copy world.
-
4:15 - 4:26Wikipedia basically says: go ahead, make a Wikipedia II, take everything we've done and make a Wikipedia II. That's bold, that's a really interesting thing.
-
4:26 - 4:39We've gone and made partial copies of ourselves and put it in the new Library of Alexandria, which is a beautiful place, and also a partial copy in Amsterdam, in the XS4ALL data centers there. They've donated those spaces.
-
4:39 - 4:52So the idea of having large-scale archives in other political areas, such that they go through their different ups and downs based on different forces,
-
4:52 - 5:04That I think is really the way to make a Library of Alexandria version II that will last for at least as many centuries and hopefully more centuries than the version I did.
-
5:04 - 5:22More servers! OK, so these are more of our servers. These are the front-end machines, search engines, databases and the like. These are storage machines.
-
5:22 - 5:34But everything is made out of the same stuff over and over again, which is now becoming more common. This is how the Googles and the Hotmails and the Yahoos work.
-
5:34 - 5:47But we also use this. You notice there is no machine room here, right? This is an office for the organist that was here, and you feel sort of the breeze coming in.
-
5:47 - 5:57The cool air comes in, goes through the machines, comes out hot, goes into our furnace, which we keep off, and it pumps it back through the building. We heat the building.
-
5:57 - 6:09We use the energy twice. Because we use it twice, we think this may be the first time a data centre has been over 100% efficient.
-
6:09 - 6:28So there's no air conditioning; we use mother nature for that. If it's a hot day we open the window, otherwise we open the door, and we circulate the air in the Internet Archive.
-
6:28 - 6:43Then we started collecting these books and scanning them, and we didn't want to throw them out. We love books and it's important to hold on to them.
-
6:43 - 6:55The book that was scanned, that becomes the digital version that the next generation may get, is in some sense special
-
6:55 - 7:08Welcome, my name is Robert Miller. I am global director of books for the Internet Archive, and I'm standing in our physical archive in Richmond, California, home for up to three million books, which will be stored here for 50 to 100 years.
-
7:08 - 7:25We have high-density, long-term deep storage devices. These units that we have are hooked up with thermocouples to measure temperature and humidity. Each one holds approximately 40,000 books.
-
7:25 - 7:35So what we have here is space for approximately 3 million items. There are estimates that there are 100 million books that have been published in the world, and we'd like to get one copy of each.
-
7:35 - 7:45Our initial target is ten million volumes. Why ten million? Well, that's about the size of a Princeton University, a Yale University, or the Boston Public Library.
-
7:45 - 7:57So when I open one of these storage devices up here, you see the boxes and books and pallets inside. Each one, again, holds about 40,000 items.
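The figures quoted above imply how many storage units each goal would take. The arithmetic below is a back-of-the-envelope sketch using only the numbers from the talk (about 40,000 items per unit; 3 million, 10 million, and 100 million books); the function name `units_needed` is a made-up helper, and the real facility's layout may differ.

```python
import math

BOOKS_PER_UNIT = 40_000   # "each one holds approximately 40,000 books"

def units_needed(books):
    """Storage units required, rounding up to whole units."""
    return math.ceil(books / BOOKS_PER_UNIT)

current_space = units_needed(3_000_000)     # space at Richmond today
initial_target = units_needed(10_000_000)   # the ten-million-volume target
every_book = units_needed(100_000_000)      # one copy of every book published
```

By these figures the current space corresponds to 75 units, the initial target to 250, and one copy of every published book to 2,500.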
-
7:57 - 8:13The one I have in my hand: the British Patent Office was deaccessioning a collection, actually throwing them out, and we were fortunate enough to get a copy of these.
-
8:13 - 8:52[sound of pages turning and scanning machine and background music]
-
8:52 - 9:05The idea of all the books of all time being able to be available to anybody, no matter where you are in the world, because the storage, computing and internet technologies that can make it
-
9:05 - 9:19so that a poor kid in Kenya or a poor kid in Kansas can go and have access to the great works, no matter where they are or when they were done.
-
9:19 - 9:29One of the things that we say here all the time is 'bits in and bits out'. That is basically just an even shorter way of saying universal access to all knowledge
-
9:29 - 9:40Well, do you go and put it into 'a cloud', which surely means putting it into corporate hands, somebody else that might turn it off at any moment? Like a Yahoo Video, that's already gone; Google Video, that's already gone; GeoCities, it's already gone.
-
9:40 - 9:52YouTube? Is that going to last forever? I don't think so. Flickr? Not really. So how do you go and try to give things away in a perpetual way? Access drives preservation.
-
9:52 - 10:04So if you think of, well, why don't we just go and encrypt it and put it in a vault and we'll be able to look at it in 70 years or something like that? I think that kind of dark archive is the worst possible idea.
-
10:04 - 10:13I think it is keeping things in use, active, that keeps it part of the mind share that keeps people knowing about it, liking it, caring for it.
-
10:13 - 10:25So I think that the best way to preserve things is to make things accessible. It may sound a little unobvious, but especially in this digital age where it's so easy to forget.
-
10:25 - 10:34If you take things away for a generation, it's as if it doesn't exist. So everything we do is open source, and all the things that we do we try to give away.
-
10:34 - 10:42Can you make it work to give everything away? This is a real experiment, and it's turning out to work.
-
10:42 - 10:54We know how to get all of the information out of every book ever written. We could do it given enough money; we absolutely have the ability to do that right now, today.
-
10:54 - 11:08Recording every television programme that is broadcast anywhere in the entire world, we know how to do that. We record 100 channels 24 hours a day right now. It's just a matter of scale and money; we can do it right now.
-
11:08 - 11:17We know how to get the music off of LPs, we know how to get the videos off of DVDs, we know how to do all of these things. It's all possible right now.
-
11:17 - 11:20Thank you very much for coming tonight. Woo!
-
11:20 - 11:39The question, I think, is whether we have the will to do it. Do we as countries have the will to finance creating a library that actually does give universal access to all knowledge? We can do it right now.
-
11:39 - 12:16It is starting to get big, but it's on the order of a total of about 10 petabytes of data. So the Library of Congress, all the words in the Library of Congress, are about 25 terabytes, so it's not a thousand Library of Congresses, but we are starting to get there.
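The comparison above can be checked with a short calculation. This sketch assumes decimal units (1 petabyte = 1,000 terabytes) and takes the two figures from the talk at face value; `libraries_of_congress` is a hypothetical helper name.

```python
TB_PER_PB = 1_000   # decimal units, an assumption for this estimate

def libraries_of_congress(archive_pb, loc_text_tb=25):
    """How many times the Library of Congress text would fit in the archive."""
    return archive_pb * TB_PER_PB / loc_text_tb

ratio = libraries_of_congress(10)   # the 10 petabytes cited in the talk
pb_for_a_thousand = 1_000 * 25 / TB_PER_PB   # archive size at a thousand copies
```

Under these assumptions 10 petabytes is about 400 times the Library of Congress text, and a thousand would take 25 petabytes.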
-
12:16 - 12:50[background music and credits]
- Title:
- Internet Archive
- Description:
-
Archive is a documentary focused on the future of long-term digital storage, the history of the Internet and attempts to preserve its contents on a massive scale.
Part one features Brewster Kahle, founder of the Internet Archive, and his colleagues Robert Miller, director of books, and Alexis Rossi, director of web collections. On a mission to create universal access to all knowledge, the Internet Archive’s staff have built the world's largest online library, offering 10 petabytes of archived websites, books, movies, music, and television broadcasts.
The video includes a tour of the Internet Archive’s headquarters in San Francisco, the book scanning center, and the book storage facilities in Richmond, California.
Directed by Jonathan Minard
Cinematography by John Behrens, Alexander Porter, and Fearghal O'dea
Produced at the Internet Archive on October 22-26, during the Books in Browsers Conference and 10 Petabyte Celebration. Project supported by Eyebeam
- Video Language:
- English