[Script Info] Title: [Events] Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text Dialogue: 0,0:00:00.00,0:00:02.00,Default,,0000,0000,0000,,Erez Lieberman Aiden: Everyone knows Dialogue: 0,0:00:02.00,0:00:05.00,Default,,0000,0000,0000,,that a picture is worth a thousand words. Dialogue: 0,0:00:07.00,0:00:09.00,Default,,0000,0000,0000,,But we at Harvard Dialogue: 0,0:00:09.00,0:00:12.00,Default,,0000,0000,0000,,were wondering if this was really true. Dialogue: 0,0:00:12.00,0:00:14.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:00:14.00,0:00:18.00,Default,,0000,0000,0000,,So we assembled a team of experts, Dialogue: 0,0:00:18.00,0:00:20.00,Default,,0000,0000,0000,,spanning Harvard, MIT, Dialogue: 0,0:00:20.00,0:00:23.00,Default,,0000,0000,0000,,The American Heritage Dictionary, The Encyclopedia Britannica Dialogue: 0,0:00:23.00,0:00:25.00,Default,,0000,0000,0000,,and even our proud sponsors, Dialogue: 0,0:00:25.00,0:00:28.00,Default,,0000,0000,0000,,the Google. Dialogue: 0,0:00:28.00,0:00:30.00,Default,,0000,0000,0000,,And we cogitated about this Dialogue: 0,0:00:30.00,0:00:32.00,Default,,0000,0000,0000,,for about four years. Dialogue: 0,0:00:32.00,0:00:37.00,Default,,0000,0000,0000,,And we came to a startling conclusion. Dialogue: 0,0:00:37.00,0:00:40.00,Default,,0000,0000,0000,,Ladies and gentlemen, a picture is not worth a thousand words. Dialogue: 0,0:00:40.00,0:00:42.00,Default,,0000,0000,0000,,In fact, we found some pictures Dialogue: 0,0:00:42.00,0:00:47.00,Default,,0000,0000,0000,,that are worth 500 billion words. Dialogue: 0,0:00:47.00,0:00:49.00,Default,,0000,0000,0000,,Jean-Baptiste Michel: So how did we get to this conclusion? Dialogue: 0,0:00:49.00,0:00:51.00,Default,,0000,0000,0000,,So Erez and I were thinking about ways Dialogue: 0,0:00:51.00,0:00:53.00,Default,,0000,0000,0000,,to get a big picture of human culture Dialogue: 0,0:00:53.00,0:00:56.00,Default,,0000,0000,0000,,and human history: change over time. 
Dialogue: 0,0:00:56.00,0:00:58.00,Default,,0000,0000,0000,,So many books actually have been written over the years. Dialogue: 0,0:00:58.00,0:01:00.00,Default,,0000,0000,0000,,So we were thinking, well the best way to learn from them Dialogue: 0,0:01:00.00,0:01:02.00,Default,,0000,0000,0000,,is to read all of these millions of books. Dialogue: 0,0:01:02.00,0:01:05.00,Default,,0000,0000,0000,,Now of course, if there's a scale for how awesome that is, Dialogue: 0,0:01:05.00,0:01:08.00,Default,,0000,0000,0000,,that has to rank extremely, extremely high. Dialogue: 0,0:01:08.00,0:01:10.00,Default,,0000,0000,0000,,Now the problem is there's an X-axis for that, Dialogue: 0,0:01:10.00,0:01:12.00,Default,,0000,0000,0000,,which is the practical axis. Dialogue: 0,0:01:12.00,0:01:14.00,Default,,0000,0000,0000,,This is very, very low. Dialogue: 0,0:01:14.00,0:01:17.00,Default,,0000,0000,0000,,(Applause) Dialogue: 0,0:01:17.00,0:01:20.00,Default,,0000,0000,0000,,Now people tend to use an alternative approach, Dialogue: 0,0:01:20.00,0:01:22.00,Default,,0000,0000,0000,,which is to take a few sources and read them very carefully. Dialogue: 0,0:01:22.00,0:01:24.00,Default,,0000,0000,0000,,This is extremely practical, but not so awesome. Dialogue: 0,0:01:24.00,0:01:27.00,Default,,0000,0000,0000,,What you really want to do Dialogue: 0,0:01:27.00,0:01:30.00,Default,,0000,0000,0000,,is to get to the awesome yet practical part of this space. Dialogue: 0,0:01:30.00,0:01:33.00,Default,,0000,0000,0000,,So it turns out there was a company across the river called Google Dialogue: 0,0:01:33.00,0:01:35.00,Default,,0000,0000,0000,,who had started a digitization project a few years back Dialogue: 0,0:01:35.00,0:01:37.00,Default,,0000,0000,0000,,that might just enable this approach. Dialogue: 0,0:01:37.00,0:01:39.00,Default,,0000,0000,0000,,They have digitized millions of books. 
Dialogue: 0,0:01:39.00,0:01:42.00,Default,,0000,0000,0000,,So what that means is, one could use computational methods Dialogue: 0,0:01:42.00,0:01:44.00,Default,,0000,0000,0000,,to read all of the books in a click of a button. Dialogue: 0,0:01:44.00,0:01:47.00,Default,,0000,0000,0000,,That's very practical and extremely awesome. Dialogue: 0,0:01:48.00,0:01:50.00,Default,,0000,0000,0000,,ELA: Let me tell you a little bit about where books come from. Dialogue: 0,0:01:50.00,0:01:53.00,Default,,0000,0000,0000,,Since time immemorial, there have been authors. Dialogue: 0,0:01:53.00,0:01:56.00,Default,,0000,0000,0000,,These authors have been striving to write books. Dialogue: 0,0:01:56.00,0:01:58.00,Default,,0000,0000,0000,,And this became considerably easier Dialogue: 0,0:01:58.00,0:02:00.00,Default,,0000,0000,0000,,with the development of the printing press some centuries ago. Dialogue: 0,0:02:00.00,0:02:03.00,Default,,0000,0000,0000,,Since then, the authors have won Dialogue: 0,0:02:03.00,0:02:05.00,Default,,0000,0000,0000,,on 129 million distinct occasions, Dialogue: 0,0:02:05.00,0:02:07.00,Default,,0000,0000,0000,,publishing books. Dialogue: 0,0:02:07.00,0:02:09.00,Default,,0000,0000,0000,,Now if those books are not lost to history, Dialogue: 0,0:02:09.00,0:02:11.00,Default,,0000,0000,0000,,then they are somewhere in a library, Dialogue: 0,0:02:11.00,0:02:14.00,Default,,0000,0000,0000,,and many of those books have been getting retrieved from the libraries Dialogue: 0,0:02:14.00,0:02:16.00,Default,,0000,0000,0000,,and digitized by Google, Dialogue: 0,0:02:16.00,0:02:18.00,Default,,0000,0000,0000,,which has scanned 15 million books to date. Dialogue: 0,0:02:18.00,0:02:21.00,Default,,0000,0000,0000,,Now when Google digitizes a book, they put it into a really nice format. Dialogue: 0,0:02:21.00,0:02:23.00,Default,,0000,0000,0000,,Now we've got the data, plus we have metadata. 
Dialogue: 0,0:02:23.00,0:02:26.00,Default,,0000,0000,0000,,We have information about things like where was it published, Dialogue: 0,0:02:26.00,0:02:28.00,Default,,0000,0000,0000,,who was the author, when was it published. Dialogue: 0,0:02:28.00,0:02:31.00,Default,,0000,0000,0000,,And what we do is go through all of those records Dialogue: 0,0:02:31.00,0:02:35.00,Default,,0000,0000,0000,,and exclude everything that's not the highest quality data. Dialogue: 0,0:02:35.00,0:02:37.00,Default,,0000,0000,0000,,What we're left with Dialogue: 0,0:02:37.00,0:02:40.00,Default,,0000,0000,0000,,is a collection of five million books, Dialogue: 0,0:02:40.00,0:02:43.00,Default,,0000,0000,0000,,500 billion words, Dialogue: 0,0:02:43.00,0:02:45.00,Default,,0000,0000,0000,,a string of characters a thousand times longer Dialogue: 0,0:02:45.00,0:02:48.00,Default,,0000,0000,0000,,than the human genome -- Dialogue: 0,0:02:48.00,0:02:50.00,Default,,0000,0000,0000,,a text which, when written out, Dialogue: 0,0:02:50.00,0:02:52.00,Default,,0000,0000,0000,,would stretch from here to the Moon and back Dialogue: 0,0:02:52.00,0:02:54.00,Default,,0000,0000,0000,,10 times over -- Dialogue: 0,0:02:54.00,0:02:58.00,Default,,0000,0000,0000,,a veritable shard of our cultural genome. Dialogue: 0,0:02:58.00,0:03:00.00,Default,,0000,0000,0000,,Of course what we did Dialogue: 0,0:03:00.00,0:03:03.00,Default,,0000,0000,0000,,when faced with such outrageous hyperbole ... Dialogue: 0,0:03:03.00,0:03:05.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:03:05.00,0:03:08.00,Default,,0000,0000,0000,,was what any self-respecting researchers Dialogue: 0,0:03:08.00,0:03:11.00,Default,,0000,0000,0000,,would have done. Dialogue: 0,0:03:11.00,0:03:13.00,Default,,0000,0000,0000,,We took a page out of XKCD, Dialogue: 0,0:03:13.00,0:03:15.00,Default,,0000,0000,0000,,and we said, "Stand back. Dialogue: 0,0:03:15.00,0:03:17.00,Default,,0000,0000,0000,,We're going to try science." 
Dialogue: 0,0:03:17.00,0:03:19.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:03:19.00,0:03:21.00,Default,,0000,0000,0000,,JM: Now of course, we were thinking, Dialogue: 0,0:03:21.00,0:03:23.00,Default,,0000,0000,0000,,well let's just first put the data out there Dialogue: 0,0:03:23.00,0:03:25.00,Default,,0000,0000,0000,,for people to do science to it. Dialogue: 0,0:03:25.00,0:03:27.00,Default,,0000,0000,0000,,Now we're thinking, what data can we release? Dialogue: 0,0:03:27.00,0:03:29.00,Default,,0000,0000,0000,,Well of course, you want to take the books Dialogue: 0,0:03:29.00,0:03:31.00,Default,,0000,0000,0000,,and release the full text of these five million books. Dialogue: 0,0:03:31.00,0:03:33.00,Default,,0000,0000,0000,,Now Google, and Jon Orwant in particular, Dialogue: 0,0:03:33.00,0:03:35.00,Default,,0000,0000,0000,,told us a little equation that we should learn. Dialogue: 0,0:03:35.00,0:03:38.00,Default,,0000,0000,0000,,So you have five million, that is, five million authors Dialogue: 0,0:03:38.00,0:03:41.00,Default,,0000,0000,0000,,and five million plaintiffs is a massive lawsuit. Dialogue: 0,0:03:41.00,0:03:43.00,Default,,0000,0000,0000,,So, although that would be really, really awesome, Dialogue: 0,0:03:43.00,0:03:46.00,Default,,0000,0000,0000,,again, that's extremely, extremely impractical. Dialogue: 0,0:03:46.00,0:03:48.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:03:48.00,0:03:50.00,Default,,0000,0000,0000,,Now again, we kind of caved in, Dialogue: 0,0:03:50.00,0:03:53.00,Default,,0000,0000,0000,,and we did the very practical approach, which was a bit less awesome. Dialogue: 0,0:03:53.00,0:03:55.00,Default,,0000,0000,0000,,We said, well instead of releasing the full text, Dialogue: 0,0:03:55.00,0:03:57.00,Default,,0000,0000,0000,,we're going to release statistics about the books. Dialogue: 0,0:03:57.00,0:03:59.00,Default,,0000,0000,0000,,So take for instance "A gleam of happiness." 
Dialogue: 0,0:03:59.00,0:04:01.00,Default,,0000,0000,0000,,It's four words; we call that a four-gram. Dialogue: 0,0:04:01.00,0:04:03.00,Default,,0000,0000,0000,,We're going to tell you how many times a particular four-gram Dialogue: 0,0:04:03.00,0:04:05.00,Default,,0000,0000,0000,,appeared in books in 1801, 1802, 1803, Dialogue: 0,0:04:05.00,0:04:07.00,Default,,0000,0000,0000,,all the way up to 2008. Dialogue: 0,0:04:07.00,0:04:09.00,Default,,0000,0000,0000,,That gives us a time series Dialogue: 0,0:04:09.00,0:04:11.00,Default,,0000,0000,0000,,of how frequently this particular sentence was used over time. Dialogue: 0,0:04:11.00,0:04:14.00,Default,,0000,0000,0000,,We do that for all the words and phrases that appear in those books, Dialogue: 0,0:04:14.00,0:04:17.00,Default,,0000,0000,0000,,and that gives us a big table of two billion lines Dialogue: 0,0:04:17.00,0:04:19.00,Default,,0000,0000,0000,,that tell us about the way culture has been changing. Dialogue: 0,0:04:19.00,0:04:21.00,Default,,0000,0000,0000,,ELA: So those two billion lines, Dialogue: 0,0:04:21.00,0:04:23.00,Default,,0000,0000,0000,,we call them two billion n-grams. Dialogue: 0,0:04:23.00,0:04:25.00,Default,,0000,0000,0000,,What do they tell us? Dialogue: 0,0:04:25.00,0:04:27.00,Default,,0000,0000,0000,,Well the individual n-grams measure cultural trends. Dialogue: 0,0:04:27.00,0:04:29.00,Default,,0000,0000,0000,,Let me give you an example. Dialogue: 0,0:04:29.00,0:04:31.00,Default,,0000,0000,0000,,Let's suppose that I am thriving, Dialogue: 0,0:04:31.00,0:04:33.00,Default,,0000,0000,0000,,then tomorrow I want to tell you about how well I did. Dialogue: 0,0:04:33.00,0:04:36.00,Default,,0000,0000,0000,,And so I might say, "Yesterday, I throve." Dialogue: 0,0:04:36.00,0:04:39.00,Default,,0000,0000,0000,,Alternatively, I could say, "Yesterday, I thrived." Dialogue: 0,0:04:39.00,0:04:42.00,Default,,0000,0000,0000,,Well which one should I use? 
Dialogue: 0,0:04:42.00,0:04:44.00,Default,,0000,0000,0000,,How to know? Dialogue: 0,0:04:44.00,0:04:46.00,Default,,0000,0000,0000,,As of about six months ago, Dialogue: 0,0:04:46.00,0:04:48.00,Default,,0000,0000,0000,,the state of the art in this field Dialogue: 0,0:04:48.00,0:04:50.00,Default,,0000,0000,0000,,is that you would, for instance, Dialogue: 0,0:04:50.00,0:04:52.00,Default,,0000,0000,0000,,go up to the following psychologist with fabulous hair, Dialogue: 0,0:04:52.00,0:04:54.00,Default,,0000,0000,0000,,and you'd say, Dialogue: 0,0:04:54.00,0:04:57.00,Default,,0000,0000,0000,,"Steve, you're an expert on the irregular verbs. Dialogue: 0,0:04:57.00,0:04:59.00,Default,,0000,0000,0000,,What should I do?" Dialogue: 0,0:04:59.00,0:05:01.00,Default,,0000,0000,0000,,And he'd tell you, "Well most people say thrived, Dialogue: 0,0:05:01.00,0:05:04.00,Default,,0000,0000,0000,,but some people say throve." Dialogue: 0,0:05:04.00,0:05:06.00,Default,,0000,0000,0000,,And you also knew, more or less, Dialogue: 0,0:05:06.00,0:05:09.00,Default,,0000,0000,0000,,that if you were to go back in time 200 years Dialogue: 0,0:05:09.00,0:05:12.00,Default,,0000,0000,0000,,and ask the following statesman with equally fabulous hair, Dialogue: 0,0:05:12.00,0:05:15.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:05:15.00,0:05:17.00,Default,,0000,0000,0000,,"Tom, what should I say?" Dialogue: 0,0:05:17.00,0:05:19.00,Default,,0000,0000,0000,,He'd say, "Well, in my day, most people throve, Dialogue: 0,0:05:19.00,0:05:22.00,Default,,0000,0000,0000,,but some thrived." Dialogue: 0,0:05:22.00,0:05:24.00,Default,,0000,0000,0000,,So now what I'm just going to show you is raw data. Dialogue: 0,0:05:24.00,0:05:28.00,Default,,0000,0000,0000,,Two rows from this table of two billion entries. Dialogue: 0,0:05:28.00,0:05:30.00,Default,,0000,0000,0000,,What you're seeing is year by year frequency Dialogue: 0,0:05:30.00,0:05:33.00,Default,,0000,0000,0000,,of "thrived" and "throve" over time. 
Dialogue: 0,0:05:34.00,0:05:36.00,Default,,0000,0000,0000,,Now this is just two Dialogue: 0,0:05:36.00,0:05:39.00,Default,,0000,0000,0000,,out of two billion rows. Dialogue: 0,0:05:39.00,0:05:41.00,Default,,0000,0000,0000,,So the entire data set Dialogue: 0,0:05:41.00,0:05:44.00,Default,,0000,0000,0000,,is a billion times more awesome than this slide. Dialogue: 0,0:05:44.00,0:05:46.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:05:46.00,0:05:50.00,Default,,0000,0000,0000,,(Applause) Dialogue: 0,0:05:50.00,0:05:52.00,Default,,0000,0000,0000,,JM: Now there are many other pictures that are worth 500 billion words. Dialogue: 0,0:05:52.00,0:05:54.00,Default,,0000,0000,0000,,For instance, this one. Dialogue: 0,0:05:54.00,0:05:56.00,Default,,0000,0000,0000,,If you just take influenza, Dialogue: 0,0:05:56.00,0:05:58.00,Default,,0000,0000,0000,,you will see peaks at the time where you knew Dialogue: 0,0:05:58.00,0:06:01.00,Default,,0000,0000,0000,,big flu epidemics were killing people around the globe. Dialogue: 0,0:06:01.00,0:06:04.00,Default,,0000,0000,0000,,ELA: If you were not yet convinced, Dialogue: 0,0:06:04.00,0:06:06.00,Default,,0000,0000,0000,,sea levels are rising, Dialogue: 0,0:06:06.00,0:06:09.00,Default,,0000,0000,0000,,so is atmospheric CO2 and global temperature. Dialogue: 0,0:06:09.00,0:06:12.00,Default,,0000,0000,0000,,JM: You might also want to have a look at this particular n-gram, Dialogue: 0,0:06:12.00,0:06:15.00,Default,,0000,0000,0000,,and that's to tell Nietzsche that God is not dead, Dialogue: 0,0:06:15.00,0:06:18.00,Default,,0000,0000,0000,,although you might agree that he might need a better publicist. Dialogue: 0,0:06:18.00,0:06:20.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:06:20.00,0:06:23.00,Default,,0000,0000,0000,,ELA: You can get at some pretty abstract concepts with this sort of thing. 
Dialogue: 0,0:06:23.00,0:06:25.00,Default,,0000,0000,0000,,For instance, let me tell you the history Dialogue: 0,0:06:25.00,0:06:27.00,Default,,0000,0000,0000,,of the year 1950. Dialogue: 0,0:06:27.00,0:06:29.00,Default,,0000,0000,0000,,Pretty much for the vast majority of history, Dialogue: 0,0:06:29.00,0:06:31.00,Default,,0000,0000,0000,,no one gave a damn about 1950. Dialogue: 0,0:06:31.00,0:06:33.00,Default,,0000,0000,0000,,In 1700, in 1800, in 1900, Dialogue: 0,0:06:33.00,0:06:36.00,Default,,0000,0000,0000,,no one cared. Dialogue: 0,0:06:37.00,0:06:39.00,Default,,0000,0000,0000,,Through the 30s and 40s, Dialogue: 0,0:06:39.00,0:06:41.00,Default,,0000,0000,0000,,no one cared. Dialogue: 0,0:06:41.00,0:06:43.00,Default,,0000,0000,0000,,Suddenly, in the mid-40s, Dialogue: 0,0:06:43.00,0:06:45.00,Default,,0000,0000,0000,,there started to be a buzz. Dialogue: 0,0:06:45.00,0:06:47.00,Default,,0000,0000,0000,,People realized that 1950 was going to happen, Dialogue: 0,0:06:47.00,0:06:49.00,Default,,0000,0000,0000,,and it could be big. Dialogue: 0,0:06:49.00,0:06:52.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:06:52.00,0:06:55.00,Default,,0000,0000,0000,,But nothing got people interested in 1950 Dialogue: 0,0:06:55.00,0:06:58.00,Default,,0000,0000,0000,,like the year 1950. Dialogue: 0,0:06:58.00,0:07:01.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:07:01.00,0:07:03.00,Default,,0000,0000,0000,,People were walking around obsessed. Dialogue: 0,0:07:03.00,0:07:05.00,Default,,0000,0000,0000,,They couldn't stop talking Dialogue: 0,0:07:05.00,0:07:08.00,Default,,0000,0000,0000,,about all the things they did in 1950, Dialogue: 0,0:07:08.00,0:07:11.00,Default,,0000,0000,0000,,all the things they were planning to do in 1950, Dialogue: 0,0:07:11.00,0:07:16.00,Default,,0000,0000,0000,,all the dreams of what they wanted to accomplish in 1950. 
Dialogue: 0,0:07:16.00,0:07:18.00,Default,,0000,0000,0000,,In fact, 1950 was so fascinating Dialogue: 0,0:07:18.00,0:07:20.00,Default,,0000,0000,0000,,that for years thereafter, Dialogue: 0,0:07:20.00,0:07:23.00,Default,,0000,0000,0000,,people just kept talking about all the amazing things that happened, Dialogue: 0,0:07:23.00,0:07:25.00,Default,,0000,0000,0000,,in '51, '52, '53. Dialogue: 0,0:07:25.00,0:07:27.00,Default,,0000,0000,0000,,Finally in 1954, Dialogue: 0,0:07:27.00,0:07:29.00,Default,,0000,0000,0000,,someone woke up and realized Dialogue: 0,0:07:29.00,0:07:33.00,Default,,0000,0000,0000,,that 1950 had gotten somewhat passé. Dialogue: 0,0:07:33.00,0:07:35.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:07:35.00,0:07:37.00,Default,,0000,0000,0000,,And just like that, the bubble burst. Dialogue: 0,0:07:37.00,0:07:39.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:07:39.00,0:07:41.00,Default,,0000,0000,0000,,And the story of 1950 Dialogue: 0,0:07:41.00,0:07:43.00,Default,,0000,0000,0000,,is the story of every year that we have on record, Dialogue: 0,0:07:43.00,0:07:46.00,Default,,0000,0000,0000,,with a little twist, because now we've got these nice charts. Dialogue: 0,0:07:46.00,0:07:49.00,Default,,0000,0000,0000,,And because we have these nice charts, we can measure things. Dialogue: 0,0:07:49.00,0:07:51.00,Default,,0000,0000,0000,,We can say, "Well how fast does the bubble burst?" Dialogue: 0,0:07:51.00,0:07:54.00,Default,,0000,0000,0000,,And it turns out that we can measure that very precisely. Dialogue: 0,0:07:54.00,0:07:57.00,Default,,0000,0000,0000,,Equations were derived, graphs were produced, Dialogue: 0,0:07:57.00,0:07:59.00,Default,,0000,0000,0000,,and the net result Dialogue: 0,0:07:59.00,0:08:02.00,Default,,0000,0000,0000,,is that we find that the bubble bursts faster and faster Dialogue: 0,0:08:02.00,0:08:04.00,Default,,0000,0000,0000,,with each passing year. 
Dialogue: 0,0:08:04.00,0:08:09.00,Default,,0000,0000,0000,,We are losing interest in the past more rapidly. Dialogue: 0,0:08:09.00,0:08:11.00,Default,,0000,0000,0000,,JM: Now a little piece of career advice. Dialogue: 0,0:08:11.00,0:08:13.00,Default,,0000,0000,0000,,So for those of you who seek to be famous, Dialogue: 0,0:08:13.00,0:08:15.00,Default,,0000,0000,0000,,we can learn from the 25 most famous political figures, Dialogue: 0,0:08:15.00,0:08:17.00,Default,,0000,0000,0000,,authors, actors and so on. Dialogue: 0,0:08:17.00,0:08:20.00,Default,,0000,0000,0000,,So if you want to become famous early on, you should be an actor, Dialogue: 0,0:08:20.00,0:08:22.00,Default,,0000,0000,0000,,because then fame starts rising by the end of your 20s -- Dialogue: 0,0:08:22.00,0:08:24.00,Default,,0000,0000,0000,,you're still young, it's really great. Dialogue: 0,0:08:24.00,0:08:26.00,Default,,0000,0000,0000,,Now if you can wait a little bit, you should be an author, Dialogue: 0,0:08:26.00,0:08:28.00,Default,,0000,0000,0000,,because then you rise to very great heights, Dialogue: 0,0:08:28.00,0:08:30.00,Default,,0000,0000,0000,,like Mark Twain, for instance: extremely famous. Dialogue: 0,0:08:30.00,0:08:32.00,Default,,0000,0000,0000,,But if you want to reach the very top, Dialogue: 0,0:08:32.00,0:08:34.00,Default,,0000,0000,0000,,you should delay gratification Dialogue: 0,0:08:34.00,0:08:36.00,Default,,0000,0000,0000,,and, of course, become a politician. Dialogue: 0,0:08:36.00,0:08:38.00,Default,,0000,0000,0000,,So here you will become famous by the end of your 50s, Dialogue: 0,0:08:38.00,0:08:40.00,Default,,0000,0000,0000,,and become very, very famous afterward. Dialogue: 0,0:08:40.00,0:08:43.00,Default,,0000,0000,0000,,So scientists also tend to get famous when they're much older. Dialogue: 0,0:08:43.00,0:08:45.00,Default,,0000,0000,0000,,Like for instance, biologists and physicists Dialogue: 0,0:08:45.00,0:08:47.00,Default,,0000,0000,0000,,tend to be almost as famous as actors.
Dialogue: 0,0:08:47.00,0:08:50.00,Default,,0000,0000,0000,,One mistake you should not make is to become a mathematician. Dialogue: 0,0:08:50.00,0:08:52.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:08:52.00,0:08:54.00,Default,,0000,0000,0000,,If you do that, Dialogue: 0,0:08:54.00,0:08:57.00,Default,,0000,0000,0000,,you might think, "Oh great. I'm going to do my best work when I'm in my 20s." Dialogue: 0,0:08:57.00,0:08:59.00,Default,,0000,0000,0000,,But guess what, nobody will really care. Dialogue: 0,0:08:59.00,0:09:02.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:09:02.00,0:09:04.00,Default,,0000,0000,0000,,ELA: There are more sobering notes Dialogue: 0,0:09:04.00,0:09:06.00,Default,,0000,0000,0000,,among the n-grams. Dialogue: 0,0:09:06.00,0:09:08.00,Default,,0000,0000,0000,,For instance, here's the trajectory of Marc Chagall, Dialogue: 0,0:09:08.00,0:09:10.00,Default,,0000,0000,0000,,an artist born in 1887. Dialogue: 0,0:09:10.00,0:09:13.00,Default,,0000,0000,0000,,And this looks like the normal trajectory of a famous person. Dialogue: 0,0:09:13.00,0:09:17.00,Default,,0000,0000,0000,,He gets more and more and more famous, Dialogue: 0,0:09:17.00,0:09:19.00,Default,,0000,0000,0000,,except if you look in German. Dialogue: 0,0:09:19.00,0:09:21.00,Default,,0000,0000,0000,,If you look in German, you see something completely bizarre, Dialogue: 0,0:09:21.00,0:09:23.00,Default,,0000,0000,0000,,something you pretty much never see, Dialogue: 0,0:09:23.00,0:09:25.00,Default,,0000,0000,0000,,which is he becomes extremely famous Dialogue: 0,0:09:25.00,0:09:27.00,Default,,0000,0000,0000,,and then all of a sudden plummets, Dialogue: 0,0:09:27.00,0:09:30.00,Default,,0000,0000,0000,,going through a nadir between 1933 and 1945, Dialogue: 0,0:09:30.00,0:09:33.00,Default,,0000,0000,0000,,before rebounding afterward.
Dialogue: 0,0:09:33.00,0:09:35.00,Default,,0000,0000,0000,,And of course, what we're seeing Dialogue: 0,0:09:35.00,0:09:38.00,Default,,0000,0000,0000,,is the fact Marc Chagall was a Jewish artist Dialogue: 0,0:09:38.00,0:09:40.00,Default,,0000,0000,0000,,in Nazi Germany. Dialogue: 0,0:09:40.00,0:09:42.00,Default,,0000,0000,0000,,Now these signals Dialogue: 0,0:09:42.00,0:09:44.00,Default,,0000,0000,0000,,are actually so strong Dialogue: 0,0:09:44.00,0:09:47.00,Default,,0000,0000,0000,,that we don't need to know that someone was censored. Dialogue: 0,0:09:47.00,0:09:49.00,Default,,0000,0000,0000,,We can actually figure it out Dialogue: 0,0:09:49.00,0:09:51.00,Default,,0000,0000,0000,,using really basic signal processing. Dialogue: 0,0:09:51.00,0:09:53.00,Default,,0000,0000,0000,,Here's a simple way to do it. Dialogue: 0,0:09:53.00,0:09:55.00,Default,,0000,0000,0000,,Well, a reasonable expectation Dialogue: 0,0:09:55.00,0:09:57.00,Default,,0000,0000,0000,,is that somebody's fame in a given period of time Dialogue: 0,0:09:57.00,0:09:59.00,Default,,0000,0000,0000,,should be roughly the average of their fame before Dialogue: 0,0:09:59.00,0:10:01.00,Default,,0000,0000,0000,,and their fame after. Dialogue: 0,0:10:01.00,0:10:03.00,Default,,0000,0000,0000,,So that's sort of what we expect. Dialogue: 0,0:10:03.00,0:10:06.00,Default,,0000,0000,0000,,And we compare that to the fame that we observe. Dialogue: 0,0:10:06.00,0:10:08.00,Default,,0000,0000,0000,,And we just divide one by the other Dialogue: 0,0:10:08.00,0:10:10.00,Default,,0000,0000,0000,,to produce something we call a suppression index. Dialogue: 0,0:10:10.00,0:10:13.00,Default,,0000,0000,0000,,If the suppression index is very, very, very small, Dialogue: 0,0:10:13.00,0:10:15.00,Default,,0000,0000,0000,,then you very well might be being suppressed. Dialogue: 0,0:10:15.00,0:10:18.00,Default,,0000,0000,0000,,If it's very large, maybe you're benefiting from propaganda. 
Dialogue: 0,0:10:19.00,0:10:21.00,Default,,0000,0000,0000,,JM: Now you can actually look at Dialogue: 0,0:10:21.00,0:10:24.00,Default,,0000,0000,0000,,the distribution of suppression indexes over whole populations. Dialogue: 0,0:10:24.00,0:10:26.00,Default,,0000,0000,0000,,So for instance, here -- Dialogue: 0,0:10:26.00,0:10:28.00,Default,,0000,0000,0000,,this suppression index is for 5,000 people Dialogue: 0,0:10:28.00,0:10:30.00,Default,,0000,0000,0000,,picked in English books where there's no known suppression -- Dialogue: 0,0:10:30.00,0:10:32.00,Default,,0000,0000,0000,,it would be like this, basically tightly centered on one. Dialogue: 0,0:10:32.00,0:10:34.00,Default,,0000,0000,0000,,What you expect is basically what you observe. Dialogue: 0,0:10:34.00,0:10:36.00,Default,,0000,0000,0000,,This is the distribution as seen in Germany -- Dialogue: 0,0:10:36.00,0:10:38.00,Default,,0000,0000,0000,,very different, it's shifted to the left. Dialogue: 0,0:10:38.00,0:10:41.00,Default,,0000,0000,0000,,People were talked about half as much as they should have been. Dialogue: 0,0:10:41.00,0:10:43.00,Default,,0000,0000,0000,,But much more importantly, the distribution is much wider. Dialogue: 0,0:10:43.00,0:10:46.00,Default,,0000,0000,0000,,There are many people who end up on the far left on this distribution Dialogue: 0,0:10:46.00,0:10:49.00,Default,,0000,0000,0000,,who are talked about 10 times less than they should have been. Dialogue: 0,0:10:49.00,0:10:51.00,Default,,0000,0000,0000,,But then also many people on the far right Dialogue: 0,0:10:51.00,0:10:53.00,Default,,0000,0000,0000,,who seem to benefit from propaganda. Dialogue: 0,0:10:53.00,0:10:56.00,Default,,0000,0000,0000,,This picture is the hallmark of censorship in the book record. Dialogue: 0,0:10:56.00,0:10:58.00,Default,,0000,0000,0000,,ELA: So culturomics Dialogue: 0,0:10:58.00,0:11:00.00,Default,,0000,0000,0000,,is what we call this method.
Dialogue: 0,0:11:00.00,0:11:02.00,Default,,0000,0000,0000,,It's kind of like genomics. Dialogue: 0,0:11:02.00,0:11:04.00,Default,,0000,0000,0000,,Except genomics is a lens on biology Dialogue: 0,0:11:04.00,0:11:07.00,Default,,0000,0000,0000,,through the window of the sequence of bases in the human genome. Dialogue: 0,0:11:07.00,0:11:09.00,Default,,0000,0000,0000,,Culturomics is similar. Dialogue: 0,0:11:09.00,0:11:12.00,Default,,0000,0000,0000,,It's the application of massive-scale data collection analysis Dialogue: 0,0:11:12.00,0:11:14.00,Default,,0000,0000,0000,,to the study of human culture. Dialogue: 0,0:11:14.00,0:11:16.00,Default,,0000,0000,0000,,Here, instead of through the lens of a genome, Dialogue: 0,0:11:16.00,0:11:19.00,Default,,0000,0000,0000,,through the lens of digitized pieces of the historical record. Dialogue: 0,0:11:19.00,0:11:21.00,Default,,0000,0000,0000,,The great thing about culturomics Dialogue: 0,0:11:21.00,0:11:23.00,Default,,0000,0000,0000,,is that everyone can do it. Dialogue: 0,0:11:23.00,0:11:25.00,Default,,0000,0000,0000,,Why can everyone do it? Dialogue: 0,0:11:25.00,0:11:27.00,Default,,0000,0000,0000,,Everyone can do it because three guys, Dialogue: 0,0:11:27.00,0:11:30.00,Default,,0000,0000,0000,,Jon Orwant, Matt Gray and Will Brockman over at Google, Dialogue: 0,0:11:30.00,0:11:32.00,Default,,0000,0000,0000,,saw the prototype of the Ngram Viewer, Dialogue: 0,0:11:32.00,0:11:34.00,Default,,0000,0000,0000,,and they said, "This is so fun. Dialogue: 0,0:11:34.00,0:11:37.00,Default,,0000,0000,0000,,We have to make this available for people." Dialogue: 0,0:11:37.00,0:11:39.00,Default,,0000,0000,0000,,So in two weeks flat -- the two weeks before our paper came out -- Dialogue: 0,0:11:39.00,0:11:42.00,Default,,0000,0000,0000,,they coded up a version of the Ngram Viewer for the general public. 
Dialogue: 0,0:11:42.00,0:11:45.00,Default,,0000,0000,0000,,And so you too can type in any word or phrase that you're interested in Dialogue: 0,0:11:45.00,0:11:47.00,Default,,0000,0000,0000,,and see its n-gram immediately -- Dialogue: 0,0:11:47.00,0:11:49.00,Default,,0000,0000,0000,,also browse examples of all the various books Dialogue: 0,0:11:49.00,0:11:51.00,Default,,0000,0000,0000,,in which your n-gram appears. Dialogue: 0,0:11:51.00,0:11:53.00,Default,,0000,0000,0000,,JM: Now this was used over a million times on the first day, Dialogue: 0,0:11:53.00,0:11:55.00,Default,,0000,0000,0000,,and this is really the best of all the queries. Dialogue: 0,0:11:55.00,0:11:58.00,Default,,0000,0000,0000,,So people want to be their best, put their best foot forward. Dialogue: 0,0:11:58.00,0:12:01.00,Default,,0000,0000,0000,,But it turns out in the 18th century, people didn't really care about that at all. Dialogue: 0,0:12:01.00,0:12:04.00,Default,,0000,0000,0000,,They didn't want to be their best, they wanted to be their beft. Dialogue: 0,0:12:04.00,0:12:07.00,Default,,0000,0000,0000,,So what happened is, of course, this is just a mistake. Dialogue: 0,0:12:07.00,0:12:09.00,Default,,0000,0000,0000,,It's not that they strove for mediocrity, Dialogue: 0,0:12:09.00,0:12:12.00,Default,,0000,0000,0000,,it's just that the S used to be written differently, kind of like an F. Dialogue: 0,0:12:12.00,0:12:15.00,Default,,0000,0000,0000,,Now of course, Google didn't pick this up at the time, Dialogue: 0,0:12:15.00,0:12:18.00,Default,,0000,0000,0000,,so we reported this in the Science article that we wrote.
Dialogue: 0,0:12:18.00,0:12:20.00,Default,,0000,0000,0000,,But it turns out this is just a reminder Dialogue: 0,0:12:20.00,0:12:22.00,Default,,0000,0000,0000,,that, although this is a lot of fun, Dialogue: 0,0:12:22.00,0:12:24.00,Default,,0000,0000,0000,,when you interpret these graphs, you have to be very careful, Dialogue: 0,0:12:24.00,0:12:27.00,Default,,0000,0000,0000,,and you have to adopt the base standards in the sciences. Dialogue: 0,0:12:27.00,0:12:30.00,Default,,0000,0000,0000,,ELA: People have been using this for all kinds of fun purposes. Dialogue: 0,0:12:30.00,0:12:37.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:12:37.00,0:12:39.00,Default,,0000,0000,0000,,Actually, we're not going to have to talk, Dialogue: 0,0:12:39.00,0:12:42.00,Default,,0000,0000,0000,,we're just going to show you all the slides and remain silent. Dialogue: 0,0:12:42.00,0:12:45.00,Default,,0000,0000,0000,,This person was interested in the history of frustration. Dialogue: 0,0:12:45.00,0:12:48.00,Default,,0000,0000,0000,,There's various types of frustration. Dialogue: 0,0:12:48.00,0:12:51.00,Default,,0000,0000,0000,,If you stub your toe, that's a one A "argh." Dialogue: 0,0:12:51.00,0:12:53.00,Default,,0000,0000,0000,,If the planet Earth is annihilated by the Vogons Dialogue: 0,0:12:53.00,0:12:55.00,Default,,0000,0000,0000,,to make room for an interstellar bypass, Dialogue: 0,0:12:55.00,0:12:57.00,Default,,0000,0000,0000,,that's an eight A "aaaaaaaargh." Dialogue: 0,0:12:57.00,0:12:59.00,Default,,0000,0000,0000,,This person studies all the "arghs," Dialogue: 0,0:12:59.00,0:13:01.00,Default,,0000,0000,0000,,from one through eight A's. 
Dialogue: 0,0:13:01.00,0:13:03.00,Default,,0000,0000,0000,,And it turns out Dialogue: 0,0:13:03.00,0:13:05.00,Default,,0000,0000,0000,,that the less-frequent "arghs" Dialogue: 0,0:13:05.00,0:13:08.00,Default,,0000,0000,0000,,are, of course, the ones that correspond to things that are more frustrating -- Dialogue: 0,0:13:08.00,0:13:11.00,Default,,0000,0000,0000,,except, oddly, in the early 80s. Dialogue: 0,0:13:11.00,0:13:13.00,Default,,0000,0000,0000,,We think that might have something to do with Reagan. Dialogue: 0,0:13:13.00,0:13:15.00,Default,,0000,0000,0000,,(Laughter) Dialogue: 0,0:13:15.00,0:13:18.00,Default,,0000,0000,0000,,JM: There are many usages of this data, Dialogue: 0,0:13:18.00,0:13:21.00,Default,,0000,0000,0000,,but the bottom line is that the historical record is being digitized. Dialogue: 0,0:13:21.00,0:13:23.00,Default,,0000,0000,0000,,Google has started to digitize 15 million books. Dialogue: 0,0:13:23.00,0:13:25.00,Default,,0000,0000,0000,,That's 12 percent of all the books that have ever been published. Dialogue: 0,0:13:25.00,0:13:28.00,Default,,0000,0000,0000,,It's a sizable chunk of human culture. Dialogue: 0,0:13:28.00,0:13:31.00,Default,,0000,0000,0000,,There's much more in culture: there's manuscripts, there's newspapers, Dialogue: 0,0:13:31.00,0:13:33.00,Default,,0000,0000,0000,,there's things that are not text, like art and paintings. Dialogue: 0,0:13:33.00,0:13:35.00,Default,,0000,0000,0000,,These all happen to be on our computers, Dialogue: 0,0:13:35.00,0:13:37.00,Default,,0000,0000,0000,,on computers across the world. Dialogue: 0,0:13:37.00,0:13:40.00,Default,,0000,0000,0000,,And when that happens, that will transform the way we have Dialogue: 0,0:13:40.00,0:13:42.00,Default,,0000,0000,0000,,to understand our past, our present and human culture. Dialogue: 0,0:13:42.00,0:13:44.00,Default,,0000,0000,0000,,Thank you very much. Dialogue: 0,0:13:44.00,0:13:47.00,Default,,0000,0000,0000,,(Applause)