WEBVTT

00:00:00.156 --> 00:00:03.683
This is The Rundown. I'm Hari Sreenivasan, and we're talking about words today.

00:00:03.683 --> 00:00:09.732
Joining me now is lexicographer Erin McKean. She's the CEO and founder of Wordnik.com.

00:00:09.732 --> 00:00:10.885
Thanks for joining us.

00:00:10.885 --> 00:00:12.535
You're very welcome. Thank you.

00:00:12.535 --> 00:00:18.356
Google recently launched a kind of website, or a database if you will, along with some folks at Harvard:

00:00:18.356 --> 00:00:23.531
the NGRAM, which allows people to search for words through hundreds and hundreds and thousands

00:00:23.531 --> 00:00:28.427
of books and periodicals and so forth going back decades.

00:00:28.427 --> 00:00:30.250
What did you do when you first heard about it?

00:00:30.250 --> 00:00:36.033
We were very excited when we realized that Google was releasing the NGRAM data under a very open license,

00:00:36.033 --> 00:00:41.181
because it means that lots of people can take that data and try to do cool things with it.

00:00:41.181 --> 00:00:44.957
And of course at Wordnik, we're all about trying to do cool things with words.

00:00:44.957 --> 00:00:50.320
The data is based on something like 5 percent of the Google Books corpus,

00:00:50.320 --> 00:00:53.878
which is not a lot, but it's a lot of words.

00:00:53.878 --> 00:00:59.966
What does having access to the occurrence of words over time teach you about the English language?

00:00:59.966 --> 00:01:07.814
Right now, you can think of the science behind the NGRAM Viewer as being like,

00:01:07.814 --> 00:01:10.696
say, early antibiotics.

00:01:10.696 --> 00:01:16.173
They aren't very targeted, so you can't really tell the difference between, say,

00:01:16.173 --> 00:01:19.151
the word "pretty" when it means "good-looking"

00:01:19.151 --> 00:01:23.670
versus the word "pretty" when it's in a construction like "That was a pretty neat thing."

00:01:23.670 --> 00:01:29.233
Are we using more new words now? Is the rate of the English language's growth increasing?

00:01:29.233 --> 00:01:36.768
Right now we can measure it better than we ever have before, and in the paper that the

00:01:36.768 --> 00:01:40.232
researchers from Google and from Harvard published in Science,

00:01:40.232 --> 00:01:45.124
they reported noticing more new words appearing over time.

00:01:45.124 --> 00:01:49.897
And something I was very happy to have people from Google and Harvard back me up on:

00:01:49.897 --> 00:01:54.082
they estimated that 52 percent of the words they looked at

00:01:54.082 --> 00:01:56.752
were not in the dictionaries they checked.

00:01:56.752 --> 00:01:58.259
How is that even possible?

00:01:58.259 --> 00:02:02.972
Well, there are lots and lots of words that are used just once, nonce words,

00:02:02.972 --> 00:02:06.746
and if you are making a print dictionary, you just don't have room to put them in.

00:02:06.746 --> 00:02:09.724
And for someone who hasn't been to Wordnik, what's the difference between

00:02:09.724 --> 00:02:12.963
Wordnik and going to one of the other online dictionaries?

00:02:12.963 --> 00:02:17.691
Wordnik has about six times as many words as most of the other online dictionaries.

00:02:17.691 --> 00:02:22.994
We show you as much information as we can about as many words as we can.

00:02:22.994 --> 00:02:26.260
So if there's a traditional dictionary definition, we'll show you that.

00:02:26.260 --> 00:02:29.264
But if we only have three really good sentences from, say,

00:02:29.264 --> 00:02:32.842
The Wall Street Journal, Forbes, or The Huffington Post, we'll show you those

00:02:32.842 --> 00:02:38.824
and say, "Hey, real journalists are using this word. You can take their sentences as a model."

00:02:38.824 --> 00:02:40.888
Since it's getting close to the new year,

00:02:40.888 --> 00:02:46.139
what are some of the top words of 2010 or 2011 that you're seeing?

00:02:46.139 --> 00:02:50.371
It's interesting: people always want the top words of the year, but usually

00:02:50.371 --> 00:02:58.024
words incubate underground like seeds for a while until they pop up into popular consciousness.

00:02:58.024 --> 00:03:02.805
A couple of words I've been really interested in lately are all words about

00:03:02.805 --> 00:03:08.343
the negative consequences of technology, like geoslavery.

00:03:08.343 --> 00:03:10.798
And what does geoslavery mean?

00:03:10.798 --> 00:03:17.773
Geoslavery is the idea that with all the GPS functionality and tracking on people's cellphones,

00:03:17.773 --> 00:03:25.949
abusive partners and spouses can use that data to keep tighter tabs on their partners,

00:03:25.949 --> 00:03:29.189
with the idea that they're really trying to enforce limits on behavior.

00:03:29.189 --> 00:03:31.331
What else is popping up like a seed?

00:03:31.331 --> 00:03:39.167
I really like the word aftercrimes, which is formed by analogy to aftershocks.

00:03:39.167 --> 00:03:43.869
So these are little crimes that pop up in an area after a major crime has occurred there.

00:03:43.869 --> 00:03:49.381
So what's the end goal for Wordnik? Does it become the dictionary of choice for everyone?

00:03:49.381 --> 00:03:52.803
We're trying to map the whole English language.

00:03:52.803 --> 00:03:55.285
What we'd really like is to be a GPS for words

00:03:55.285 --> 00:03:58.707
and to show you as much information about as many words as possible.

00:03:58.707 --> 00:04:02.364
All right, Erin McKean, CEO and founder of Wordnik, and lexicographer.

00:04:02.364 --> 00:04:04.480
Thanks for joining us, and happy wording.

00:04:04.480 --> 00:04:06.100
Thanks so much.

00:04:06.100 --> 00:04:09.123
I'm Hari Sreenivasan. This is The Rundown. Stay with us.