1 00:00:00,156 --> 00:00:03,683 This is The Rundown, I'm Hari Sreenivasan, we're talking about words today. 2 00:00:03,683 --> 00:00:09,732 Joining me now is lexicographer Erin McKean, she's the CEO and founder of Wordnik.com. 3 00:00:09,732 --> 00:00:10,885 Thanks for joining us. 4 00:00:10,885 --> 00:00:12,535 You're very welcome. Thank you. 5 00:00:12,535 --> 00:00:18,356 Google recently launched a kind of a website or a database if you will, along with some folks at Harvard, 6 00:00:18,356 --> 00:00:23,531 the NGRAM, which allows people to search for words through hundreds and hundreds and thousands 7 00:00:23,531 --> 00:00:28,427 of books and periodicals and so forth that have gone back for decades. 8 00:00:28,427 --> 00:00:30,250 What did you do when you first heard about it? 9 00:00:30,250 --> 00:00:36,033 We were very excited when we realized that Google was releasing the NGRAM data under a very open license 10 00:00:36,033 --> 00:00:41,181 because it means that lots of people can take that data and try and do cool things with it. 11 00:00:41,181 --> 00:00:44,957 And of course at Wordnik, we're all about trying to do cool things with words. 12 00:00:44,957 --> 00:00:50,320 And so the data is based on something like 5 percent of the Google Books corpus, 13 00:00:50,320 --> 00:00:53,878 which is not a lot, but it's a lot of words. 14 00:00:53,878 --> 00:00:59,966 What does it teach you about the English language to have access to the occurrence of words over time? 15 00:00:59,966 --> 00:01:07,814 Right now, you can think of the kind of science behind the NGRAM viewer as like what, 16 00:01:07,814 --> 00:01:10,696 say, early antibiotics were like. 17 00:01:10,696 --> 00:01:16,173 They aren't very targeted, so you can't really tell the difference between, say, 18 00:01:16,173 --> 00:01:19,151 the word "pretty" when it means "good looking" 19 00:01:19,151 --> 00:01:23,670 versus the word "pretty" when it's in a construction, like, "That was a pretty neat thing."
20 00:01:23,670 --> 00:01:29,233 Are we using more new words now? Is the rate of the English language's growth increasing? 21 00:01:29,233 --> 00:01:36,768 Right now we can measure it better than we ever have been able to do before, so in the paper that the 22 00:01:36,768 --> 00:01:40,232 researchers from Google and from Harvard published in Science, 23 00:01:40,232 --> 00:01:45,124 they were talking about that they notice more new words appearing over time. 24 00:01:45,124 --> 00:01:49,897 And also something that I was very happy to have people from Google and Harvard backing me up on 25 00:01:49,897 --> 00:01:54,082 that they estimated that 52 percent of the words that they looked at 26 00:01:54,082 --> 00:01:56,752 were not in the dictionaries that they checked. 27 00:01:56,752 --> 00:01:58,259 How is that even possible? 28 00:01:58,259 --> 00:02:02,972 Well, there are lots and lots of words that happen just once, nonce words, 29 00:02:02,972 --> 00:02:06,746 that if you are making a print dictionary you just don't have room to put them in. 30 00:02:06,746 --> 00:02:09,724 And for someone who hasn't been to Wordnik, what's the difference between 31 00:02:09,724 --> 00:02:12,963 Wordnik and going to one of the other online dictionaries? 32 00:02:12,963 --> 00:02:17,691 So Wordnik has about six times as many words as most of the other online dictionaries. 33 00:02:17,691 --> 00:02:22,994 So we show you as much information as we can about as many words as we can. 34 00:02:22,994 --> 00:02:26,260 So if there's a traditional dictionary definition, we'll show you that. 35 00:02:26,260 --> 00:02:29,264 But if we only have three really good sentences from say 36 00:02:29,264 --> 00:02:32,842 the Wall Street Journal, or Forbes, or the Huffington Post, we'll show you that 37 00:02:32,842 --> 00:02:38,824 and say, "Hey, real journalists are using this word. You can take their sentences as a model." 
38 00:02:38,824 --> 00:02:40,888 Since it is getting kind of close to the new year, 39 00:02:40,888 --> 00:02:46,139 what are some of the top words of 2010 or 2011 that you're seeing? 40 00:02:46,139 --> 00:02:50,371 It's interesting, people always want to have the top words of the year, but usually 41 00:02:50,371 --> 00:02:58,024 words kind of incubate underground like seeds for a while until they pop up into popular consciousness. 42 00:02:58,024 --> 00:03:02,805 A couple of words that I've been really interested in lately are all kinda 43 00:03:02,805 --> 00:03:08,343 negative technology consequences words, like geoslavery. 44 00:03:08,343 --> 00:03:10,798 And what does geoslavery mean? 45 00:03:10,798 --> 00:03:17,773 So geoslavery is the idea that with all the GPS functionality and tracking on people's cellphones 46 00:03:17,773 --> 00:03:25,949 that abusive partners and spouses can use that data to keep tighter tabs on their partners, 47 00:03:25,949 --> 00:03:29,189 with the idea that they're really trying to enforce behavior limits. 48 00:03:29,189 --> 00:03:31,331 What else is popping up like a seed? 49 00:03:31,331 --> 00:03:39,167 I really like the word aftercrimes, which is made by analogy to aftershocks. 50 00:03:39,167 --> 00:03:43,869 So it's little crimes that pop up in an area after a major crime has occurred there. 51 00:03:43,869 --> 00:03:49,381 So what's the end goal for Wordnik? Does it become the dictionary of choice for everyone? 52 00:03:49,381 --> 00:03:52,803 We're trying to map the whole English language. 53 00:03:52,803 --> 00:03:55,285 What we'd really like to be is GPS for words 54 00:03:55,285 --> 00:03:58,707 and show you as much information about as many words as possible. 55 00:03:58,707 --> 00:04:02,364 All right, Erin McKean CEO and founder of Wordnik, lexicographer. 56 00:04:02,364 --> 00:04:04,480 Thanks for joining us and happy wording. 57 00:04:04,480 --> 00:04:06,100 Thanks so much.
58 00:04:06,100 --> 00:04:09,123 I'm Hari Sreenivasan, this is The Rundown. Stay with us.