[Script Info]
Title:

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:04.11,0:00:06.77,Default,,0000,0000,0000,,♪ (Music fades in) ♪
Dialogue: 0,0:00:15.80,0:00:19.39,Default,,0000,0000,0000,,(Chirping)
Dialogue: 0,0:00:24.56,0:00:30.05,Default,,0000,0000,0000,,(Vocalizations, different languages)
Dialogue: 0,0:00:32.92,0:00:36.46,Default,,0000,0000,0000,,(Talking overlaps in background)
Dialogue: 0,0:00:38.12,0:00:41.49,Default,,0000,0000,0000,,(Computerized beeping)
Dialogue: 0,0:00:49.96,0:00:54.43,Default,,0000,0000,0000,,(Man) We come into this world with the\Ninnate ability to learn to interact
Dialogue: 0,0:00:54.43,0:00:57.32,Default,,0000,0000,0000,,with other sentient beings.
Dialogue: 0,0:00:59.14,0:01:00.20,Default,,0000,0000,0000,,(Child vocalizing)
Dialogue: 0,0:01:00.58,0:01:03.64,Default,,0000,0000,0000,,(Man) Suppose you are to interact with\Nother people by writing little messages.
Dialogue: 0,0:01:05.58,0:01:07.40,Default,,0000,0000,0000,,(Man) It'd be a real pain.
Dialogue: 0,0:01:07.40,0:01:09.49,Default,,0000,0000,0000,,(Man) And that's how we interact\Nwith computers.
Dialogue: 0,0:01:09.49,0:01:13.18,Default,,0000,0000,0000,,It's much easier just to talk to them...\Njust so much easier...
Dialogue: 0,0:01:13.18,0:01:15.96,Default,,0000,0000,0000,,(Man) If the computers could understand\Nwhat we're saying.
Dialogue: 0,0:01:18.32,0:01:20.83,Default,,0000,0000,0000,,For that, you need really\Ngood speech recognition.
Dialogue: 0,0:01:21.25,0:01:24.13,Default,,0000,0000,0000,,(Narrator) The first speech recognition\Nsystem was developed by Bell Laboratories
Dialogue: 0,0:01:24.13,0:01:28.57,Default,,0000,0000,0000,,in 1952. It could only recognize\Nnumbers spoken by one person.
Dialogue: 0,0:01:28.57,0:01:31.77,Default,,0000,0000,0000,,In the 1970s, Carnegie-Mellon\Ncame out with the Harpy System.
Dialogue: 0,0:01:31.77,0:01:36.82,Default,,0000,0000,0000,,This was able to recognize over\N1,000 words and different pronunciations
Dialogue: 0,0:01:36.82,0:01:39.93,Default,,0000,0000,0000,,(Narrator) of the same word.\N- (Man) Tomato - (Woman) Tomato
Dialogue: 0,0:01:39.93,0:01:42.71,Default,,0000,0000,0000,,(Narrator) Speech recognition continued\Nin the 80s with the introduction of the
Dialogue: 0,0:01:42.71,0:01:45.58,Default,,0000,0000,0000,,Hidden Markov Model, which\Nused a more mathematical approach
Dialogue: 0,0:01:45.58,0:01:50.19,Default,,0000,0000,0000,,to analyzing sound waves that led to\Nmany breakthroughs we have today.
Dialogue: 0,0:01:50.32,0:01:52.90,Default,,0000,0000,0000,,You're taking in very raw audio waveforms
Dialogue: 0,0:01:52.90,0:01:54.62,Default,,0000,0000,0000,,like you get through a microphone
Dialogue: 0,0:01:54.62,0:01:55.86,Default,,0000,0000,0000,,on your phone
Dialogue: 0,0:01:55.86,0:01:56.83,Default,,0000,0000,0000,,or whatever...
Dialogue: 0,0:01:56.97,0:02:02.45,Default,,0000,0000,0000,,(Woman) We chop it into small pieces\Nand it tries to identify which phoneme
Dialogue: 0,0:02:02.45,0:02:05.72,Default,,0000,0000,0000,,was spoken in that piece of speech.
Dialogue: 0,0:02:05.72,0:02:09.45,Default,,0000,0000,0000,,- Phoneme is a primitive unit for\Nexpressing words.
Dialogue: 0,0:02:09.99,0:02:15.06,Default,,0000,0000,0000,,(voicing phonemes shown above)
Dialogue: 0,0:02:15.50,0:02:20.10,Default,,0000,0000,0000,,And then you stitch those together\Ninto likely words like Palo Alto.
Dialogue: 0,0:02:20.10,0:02:23.81,Default,,0000,0000,0000,,- Speech recognition today is good at\Ntranscribing what you've said...
Dialogue: 0,0:02:23.81,0:02:25.68,Default,,0000,0000,0000,,(Man, to phone) What's the weather\Nlike in Topeka?
Dialogue: 0,0:02:25.68,0:02:30.45,Default,,0000,0000,0000,,(Man) You can talk about travels, your\Ncontacts, like, "Where can I get pizza?"
Dialogue: 0,0:02:30.45,0:02:32.07,Default,,0000,0000,0000,,(Phone) Here are the listings for Pizza.
Dialogue: 0,0:02:32.07,0:02:34.28,Default,,0000,0000,0000,,(Man) "How tall is the Eiffel Tower?"\N(Phone) The Eiffel Tower is ...
Dialogue: 0,0:02:34.33,0:02:37.00,Default,,0000,0000,0000,,(Woman) We've made tremendous\Nimprovements very quickly.
Dialogue: 0,0:02:37.06,0:02:39.21,Default,,0000,0000,0000,,(Man, to phone) Who is the 21st\NPresident of the United States?
Dialogue: 0,0:02:39.56,0:02:42.81,Default,,0000,0000,0000,,(Phone beeps)\N(Phone) Chester A. Arthur was the 21st...
Dialogue: 0,0:02:42.81,0:02:44.40,Default,,0000,0000,0000,,(Man, to phone) Okay, Google,\Nwhere is he from?
Dialogue: 0,0:02:44.40,0:02:47.30,Default,,0000,0000,0000,,(Man) Years ago, you had to be an engineer\Nto interact with computers.
Dialogue: 0,0:02:48.18,0:02:50.02,Default,,0000,0000,0000,,Today, everybody can interact.
Dialogue: 0,0:02:50.35,0:02:53.77,Default,,0000,0000,0000,,- One thing still in its\Ninfancy is understanding.
Dialogue: 0,0:02:53.77,0:02:56.41,Default,,0000,0000,0000,,- We need a far more sophisticated\Nlanguage understanding model
Dialogue: 0,0:02:56.41,0:02:58.84,Default,,0000,0000,0000,,that understands what the sentence means.
Dialogue: 0,0:02:58.84,0:03:00.90,Default,,0000,0000,0000,,We're still a very long way from that.
Dialogue: 0,0:03:01.49,0:03:02.51,Default,,0000,0000,0000,,(Beeping)
Dialogue: 0,0:03:03.89,0:03:06.91,Default,,0000,0000,0000,,♪ (Soft background music) ♪
Dialogue: 0,0:03:07.74,0:03:12.26,Default,,0000,0000,0000,,(Woman) Our ability to use language is one\Nof the things that helps us have culture.
Dialogue: 0,0:03:13.87,0:03:18.65,Default,,0000,0000,0000,,It's one of the things that helps\Nus pass on traditions across generations.
Dialogue: 0,0:03:19.52,0:03:25.88,Default,,0000,0000,0000,,Figuring out how the system of language\Nworks, even though it seems easy,
Dialogue: 0,0:03:25.98,0:03:32.67,Default,,0000,0000,0000,,turns out to be very hard, but is one that\Nevery baby understands by 2 years old.
Dialogue: 0,0:03:32.67,0:03:35.77,Default,,0000,0000,0000,,(Girl) There's two of them.\N(Woman) There's two Ls, yeah (spells word)
Dialogue: 0,0:03:38.47,0:03:41.04,Default,,0000,0000,0000,,- Language is extremely complex\Nand sophisticated...
Dialogue: 0,0:03:41.28,0:03:42.44,Default,,0000,0000,0000,,- From the semantics
Dialogue: 0,0:03:42.44,0:03:43.82,Default,,0000,0000,0000,,- (Man in chair) Ironies...\N- (Woman) Strong accents...
Dialogue: 0,0:03:43.82,0:03:45.18,Default,,0000,0000,0000,,- (Man) Facial expressions...
Dialogue: 0,0:03:45.18,0:03:47.62,Default,,0000,0000,0000,,- Human emotions, because that's\Npart of how we communicate.
Dialogue: 0,0:03:47.62,0:03:48.85,Default,,0000,0000,0000,,- Humor...
Dialogue: 0,0:03:48.85,0:03:51.57,Default,,0000,0000,0000,,(Aside) Do I have to be careful\Nnot to offend the dinosaur?
Dialogue: 0,0:03:51.57,0:03:54.59,Default,,0000,0000,0000,,- Language has so many different\Nlayers and that's why it's
Dialogue: 0,0:03:54.59,0:03:56.55,Default,,0000,0000,0000,,such a difficult problem.
Dialogue: 0,0:03:56.55,0:03:59.21,Default,,0000,0000,0000,,(Man) The present human brain\Nand the learning algorithms in it
Dialogue: 0,0:03:59.21,0:04:02.00,Default,,0000,0000,0000,,are far, far better at things like\Nlanguage understanding
Dialogue: 0,0:04:02.18,0:04:05.07,Default,,0000,0000,0000,,and they're still a lot better\Nat pun recognition.
Dialogue: 0,0:04:05.47,0:04:09.38,Default,,0000,0000,0000,,- Whether or not we replicate exactly\Nwhat the brain does, to understand
Dialogue: 0,0:04:09.38,0:04:12.62,Default,,0000,0000,0000,,language and speech, is still a question.
Dialogue: 0,0:04:15.76,0:04:17.43,Default,,0000,0000,0000,,(Beeping)
Dialogue: 0,0:04:17.76,0:04:23.61,Default,,0000,0000,0000,,(Man) For many years, we believed that\Nneural networks should work better than
Dialogue: 0,0:04:23.61,0:04:26.91,Default,,0000,0000,0000,,the dumb existing technology that's\Nbasically just "table look-up"
Dialogue: 0,0:04:27.71,0:04:33.48,Default,,0000,0000,0000,,and then, in 2009, two of my students\N(with some help from me) got it
Dialogue: 0,0:04:33.48,0:04:36.85,Default,,0000,0000,0000,,working better. The first time it was\Njust a little better.
Dialogue: 0,0:04:36.85,0:04:40.16,Default,,0000,0000,0000,,But it was obvious that this could be\Nimproved to work much better.
Dialogue: 0,0:04:40.16,0:04:44.44,Default,,0000,0000,0000,,(Man) The brain has this system of neurons\Nall computing in parallel.
Dialogue: 0,0:04:45.06,0:04:48.88,Default,,0000,0000,0000,,All knowledge in the brain is in the\Nstrength of connection between neurons.
Dialogue: 0,0:04:49.61,0:04:53.22,Default,,0000,0000,0000,,What I mean by "neural net" is something\Nthat is simulated on a conventional
Dialogue: 0,0:04:53.22,0:04:58.62,Default,,0000,0000,0000,,computer, but is designed to work in\Nroughly the same ways as the brain.
Dialogue: 0,0:04:59.80,0:05:03.95,Default,,0000,0000,0000,,Until quite recently, people got features\Nby hand engineering them.
Dialogue: 0,0:05:04.68,0:05:08.60,Default,,0000,0000,0000,,They looked at sine waves and did Fourier\Nanalysis and tried to figure out
Dialogue: 0,0:05:08.60,0:05:11.92,Default,,0000,0000,0000,,what features they should feed to the\Npattern recognition system.
Dialogue: 0,0:05:12.26,0:05:14.73,Default,,0000,0000,0000,,The thing about neural networks is that\Nthey learn their own features.
Dialogue: 0,0:05:14.95,0:05:20.00,Default,,0000,0000,0000,,In particular, they can learn features\Nand features of features, etc,
Dialogue: 0,0:05:21.06,0:05:23.77,Default,,0000,0000,0000,,and that's led to huge improvement\Nin speech recognition.
Dialogue: 0,0:05:24.06,0:05:26.89,Default,,0000,0000,0000,,- But you can also use them for language\Nunderstanding tasks.
Dialogue: 0,0:05:27.16,0:05:32.69,Default,,0000,0000,0000,,How you do this is to represent words\Nin very high-dimensional spaces.
Dialogue: 0,0:05:32.99,0:05:36.39,Default,,0000,0000,0000,,- (Man) We can now deal with analogies\Nwhere a word is represented as a list
Dialogue: 0,0:05:36.39,0:05:37.57,Default,,0000,0000,0000,,of numbers.
Dialogue: 0,0:05:37.91,0:05:44.29,Default,,0000,0000,0000,,For example, if I take 100 numbers that\Nrepresent "Paris," and I subtract from it
Dialogue: 0,0:05:44.29,0:05:50.31,Default,,0000,0000,0000,,"France" and add "Italy," if I look\Nat the numbers I have, the closest
Dialogue: 0,0:05:50.31,0:05:53.00,Default,,0000,0000,0000,,thing is a list of numbers that\Nrepresents "Rome."
Dialogue: 0,0:05:53.62,0:05:58.33,Default,,0000,0000,0000,,By first converting words into numbers,\Nusing a neural net, you can actually
Dialogue: 0,0:05:58.33,0:06:00.77,Default,,0000,0000,0000,,do this analogical reasoning.
Dialogue: 0,0:06:01.55,0:06:06.89,Default,,0000,0000,0000,,I predict that, in the next five years, it\Nwill be clear that these neural networks
Dialogue: 0,0:06:06.89,0:06:10.83,Default,,0000,0000,0000,,with new learning algorithms will give us\Nmuch better language understanding.
Dialogue: 0,0:06:13.88,0:06:19.08,Default,,0000,0000,0000,,(Woman) When we started out, we thought\Nthings like chess or mathematics or logic
Dialogue: 0,0:06:19.08,0:06:21.88,Default,,0000,0000,0000,,would be things that were really hard.
Dialogue: 0,0:06:21.88,0:06:26.18,Default,,0000,0000,0000,,They're not that hard. We ended up with\Na machine that played as well as
Dialogue: 0,0:06:26.18,0:06:28.49,Default,,0000,0000,0000,,a Grand Master at chess.
Dialogue: 0,0:06:28.49,0:06:33.09,Default,,0000,0000,0000,,What we thought would be easy for\Na computer system, like language,
Dialogue: 0,0:06:33.09,0:06:37.25,Default,,0000,0000,0000,,has turned out incredibly hard.
Dialogue: 0,0:06:37.25,0:06:42.10,Default,,0000,0000,0000,,(Man) I can't even imagine the moment of\Nsuccess quite yet because there are so many
Dialogue: 0,0:06:42.10,0:06:46.53,Default,,0000,0000,0000,,pieces of this puzzle that are unsolved,\Nboth from a science point of view
Dialogue: 0,0:06:46.53,0:06:51.02,Default,,0000,0000,0000,,as well as a technical implementation\Npoint of view.
Dialogue: 0,0:06:51.02,0:06:52.65,Default,,0000,0000,0000,,There are a lot of unknowns.
Dialogue: 0,0:06:52.88,0:06:56.59,Default,,0000,0000,0000,,(Woman) Those are the great revolutions.\NNot just when we fiddle with what
Dialogue: 0,0:06:56.59,0:07:00.50,Default,,0000,0000,0000,,we already know, but when we discover\Nsomething completely new and unexpected.
Dialogue: 0,0:07:00.50,0:07:03.59,Default,,0000,0000,0000,,(Man) Once you are in the area of
Dialogue: 0,0:07:03.59,0:07:07.88,Default,,0000,0000,0000,,human-level performance,\Nthat will be pretty remarkable.
Dialogue: 0,0:07:12.76,0:07:14.36,Default,,0000,0000,0000,,(Beep)