Behind the Mic: The Science of Talking with Computers


0:04 - 0:07

♪ (Music fades in) ♪
0:16 - 0:19

(Chirping)
0:25 - 0:30

(Vocalizations, different languages)
0:33 - 0:36

(Talking overlaps in background)
0:38 - 0:41

(Computerized beeping)
0:50 - 0:54

(Man) We come into this world with the
innate ability to learn to interact
0:54 - 0:57

with other sentient beings.
0:59 - 1:00

(Child vocalizing)
1:01 - 1:04

(Man) Suppose you are to interact with
other people by writing little messages.
1:06 - 1:07

(Man) It'd be a real pain.
1:07 - 1:09

(Man) And that's how we interact
with computers.
1:09 - 1:13

It's much easier just to talk to them...
just so much easier...
1:13 - 1:16

(Man) If the computers could understand
what we're saying.
1:18 - 1:21

For that, you need really
good speech recognition.
1:21 - 1:24

(Narrator) The first speech recognition
system was developed by Bell Laboratories
1:24 - 1:29

in 1952. It could only recognize
numbers spoken by one person.
1:29 - 1:32

In the 1970s, Carnegie-Mellon
came out with the Harpy System.
1:32 - 1:37

This was able to recognize over
1,000 words and different pronunciations
1:37 - 1:40

(Narrator) of the same word.
- (Man) Tomato - (Woman) Tomato
1:40 - 1:43

(Narrator) Speech recognition continued
in the 80s with the introduction of the
1:43 - 1:46

Hidden Markov Model, which
used a more mathematical approach
1:46 - 1:50

to analyzing sound waves that led to
many breakthroughs we have today.
1:50 - 1:53

You're taking in very raw audio waveforms
1:53 - 1:55

like you get through a microphone
1:55 - 1:56

on your phone
1:56 - 1:57

or whatever...
1:57 - 2:02

(Woman) We chop it into small pieces
and it tries to identify which phoneme
2:02 - 2:06

was spoken in that piece of speech.
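
As a rough illustration of the "chop it into small pieces" step described above, here is a minimal sketch in Python. The function name `split_into_frames` and the frame and hop sizes are assumptions made for this example; the video does not say what sizes the speakers' systems use.

```python
def split_into_frames(samples, frame_len, hop):
    """Chop a sequence of audio samples into overlapping fixed-length frames."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    # Slide a window across the signal; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


# Toy "signal" of ten samples, cut into frames of 4 samples that advance by 2.
toy_signal = list(range(10))
frames = split_into_frames(toy_signal, frame_len=4, hop=2)
print(frames)
```

For audio sampled at 16 kHz, a 25 ms frame taken every 10 ms would correspond to frame_len=400 and hop=160; those particular values are illustrative assumptions, not figures from the video. Each frame would then be passed to whatever model guesses the phoneme.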
2:06 - 2:09

- Phoneme is a primitive unit for
expressing words.
2:10 - 2:15

(voicing phonemes shown above)
2:15 - 2:20

And then you stitch those together
into likely words like Palo Alto.
2:20 - 2:24

- Speech recognition today is good at
transcribing what you've said...
2:24 - 2:26

(Man, to phone) What's the weather
like in Topeka?
2:26 - 2:30

(Man) You can talk about travels, your
contacts, like, "Where can I get pizza?"
2:30 - 2:32

(Phone) Here are the listings for Pizza.
2:32 - 2:34

(Man) "How tall is the Eiffel Tower?"
(Phone) The Eiffel Tower is ...
2:34 - 2:37

(Woman) We've made tremendous
improvements very quickly.
2:37 - 2:39

(Man, to phone) Who is the 21st
President of the United States?
2:40 - 2:43

(Phone beeps)
(Phone) Chester A. Arthur was the 21st...
2:43 - 2:44

(Man, to phone) Okay, Google,
where is he from?
2:44 - 2:47

(Man) Years ago, you had to be an engineer
to interact with computers.
2:48 - 2:50

Today, everybody can interact.
2:50 - 2:54

- One thing still in its
infancy is understanding.
2:54 - 2:56

- We need a far more sophisticated
language understanding model
2:56 - 2:59

that understands what the sentence means.
2:59 - 3:01

We're still a very long way from that.
3:01 - 3:03

(Beeping)
3:04 - 3:07

♪ (Soft background music) ♪
3:08 - 3:12

(Woman) Our ability to use language is one
of the things that helps us have culture.
3:14 - 3:19

It's one of the things that helps
us pass on traditions across generations.
3:20 - 3:26

Figuring out how the system of language
works, even though it seems easy,
3:26 - 3:33

turns out to be very hard, but is one that
every baby understands by 2 years old.
3:33 - 3:36

(Girl) There's two of them.
(Woman) There's two Ls, yeah (spells word)
3:38 - 3:41

- Language is extremely complex
and sophisticated...
3:41 - 3:42

- From the semantics
3:42 - 3:44

- (Man in chair) Ironies...
- (Woman) Strong accents...
3:44 - 3:45

- (Man) Facial expressions...
3:45 - 3:48

- Human emotions, because that's
part of how we communicate.
3:48 - 3:49

- Humor...
3:49 - 3:52

(Aside) Do I have to be careful
not to offend the dinosaur?
3:52 - 3:55

- Language has so many different
layers and that's why it's
3:55 - 3:57

such a difficult problem.
3:57 - 3:59

(Man) The present human brain
and the learning algorithms in it
3:59 - 4:02

are far, far better at things like
language understanding
4:02 - 4:05

and they're still a lot better
at pun recognition.
4:05 - 4:09

- Whether or not we replicate exactly
what the brain does, to understand
4:09 - 4:13

language and speech, is still a question.
4:16 - 4:17

(Beeping)
4:18 - 4:24

(Man) For many years, we believed that
neural networks should work better than
4:24 - 4:27

the dumb existing technology that's
basically just "table look-up"
4:28 - 4:33

and then, in 2009, two of my students
(with some help from me) got it
4:33 - 4:37

working better. The first time it was
just a little better.
4:37 - 4:40

But it was obvious that this could be
improved to work much better.
4:40 - 4:44

(Man) The brain has this system of neurons
all computing in parallel.
4:45 - 4:49

All knowledge in the brain is in the
strength of connection between neurons.
4:50 - 4:53

What I mean by "neural net" is something
that is simulated on a conventional
4:53 - 4:59

computer, but is designed to work in
roughly the same ways as the brain.
5:00 - 5:04

Until quite recently, people got features
by hand engineering them.
5:05 - 5:09

They looked at sine waves and did Fourier
analysis and tried to figure out
5:09 - 5:12

what features they should feed to the
pattern recognition system.
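
To make the hand-engineered approach a little more concrete, here is a small sketch of Fourier analysis on one frame of audio, written as a plain discrete Fourier transform in Python. The function name `dft_magnitudes` and the toy sine-wave input are assumptions for illustration; real systems typically use a fast Fourier transform and further processing that the speaker does not describe.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Return the magnitude of each frequency bin from 0 up to n/2 for one frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        # Correlate the frame with a complex sinusoid that completes k cycles.
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(total))
    return mags


# A toy frame: a sine wave that completes exactly 4 cycles in 32 samples.
n = 32
frame = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
mags = dft_magnitudes(frame)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print(peak_bin)
```

The list of magnitudes is the kind of hand-picked feature that could be fed to a pattern recognition system: it says how much energy the frame has at each frequency.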
5:12 - 5:15

The thing about neural networks is that
they learn their own features.
5:15 - 5:20

In particular, they can learn features
and features of features, etc.,
5:21 - 5:24

and that's led to huge improvement
in speech recognition.
5:24 - 5:27

- But you can also use them for language
understanding tasks.
5:27 - 5:33

How you do this is to represent words
in very high-dimensional spaces.
5:33 - 5:36

- (Man) We can now deal with analogies
where a word is represented as a list
5:36 - 5:38

of numbers.
5:38 - 5:44

For example, if I take 100 numbers that
represent "Paris," and I subtract from it
5:44 - 5:50

"France" and add "Italy," if I look
at the numbers I have, the closest
5:50 - 5:53

thing is a list of numbers that
represents "Rome."
5:54 - 5:58

By first converting words into numbers,
using a neural net, you can actually
5:58 - 6:01

do this analogical reasoning.
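
The "Paris" minus "France" plus "Italy" example can be sketched with a few lines of Python. The three-number vectors below are invented by hand purely for illustration (the speaker mentions about 100 numbers per word, learned by a neural net), and the helper names `cosine` and `analogy` are likewise assumptions.

```python
import math

# Hand-made toy vectors: (French-ness, Italian-ness, capital-city-ness).
# Real word vectors are learned from text, not written by hand.
vectors = {
    "Paris": [1.0, 0.0, 1.0],
    "France": [1.0, 0.0, 0.0],
    "Rome": [0.0, 1.0, 1.0],
    "Italy": [0.0, 1.0, 0.0],
    "pizza": [0.0, 0.9, -0.5],
}


def cosine(a, b):
    """Cosine similarity between two equal-length lists of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Skip the three query words so the answer is a different word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


result = analogy("Paris", "France", "Italy")
print(result)
```

With these toy vectors the subtraction removes the "French" part of "Paris", the addition puts in the "Italian" part, and the nearest remaining word is "Rome".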
6:02 - 6:07

I predict that, in the next five years, it
will be clear that these neural networks
6:07 - 6:11

with new learning algorithms will give us
much better language understanding.
6:14 - 6:19

(Woman) When we started out, we thought
things like chess or mathematics or logic
6:19 - 6:22

would be things that were really hard.
6:22 - 6:26

They're not that hard. We ended up with
a machine that played as well as
6:26 - 6:28

a grandmaster at chess.
6:28 - 6:33

What we thought would be easy for
a computer system, like language,
6:33 - 6:37

has turned out incredibly hard.
6:37 - 6:42

(Man) I can't even imagine the moment of
success quite yet because there are so many
6:42 - 6:47

pieces of this puzzle that are unsolved,
both from a science point of view
6:47 - 6:51

as well as a technical implementation
point of view.
6:51 - 6:53

There are a lot of unknowns.
6:53 - 6:57

(Woman) Those are the great revolutions.
Not just when we fiddle with what
6:57 - 7:00

we already know, but when we discover
something completely new and unexpected.
7:00 - 7:04

(Man) Once you are in the area of
7:04 - 7:08

human-level performance,
that will be pretty remarkable.
7:13 - 7:14

(Beep)

Title:: Behind the Mic: The Science of Talking with Computers
Video Language:: English
Team:: Captions Requested
Duration:: 07:19


Claude Almansi

Thank you so much, Michael. Your descriptions of non-verbal sounds are great!

Claude
michael.j.shepard

You're welcome - thank you very much. :)
bsrcube

Thank you, Michael and Claude.

English subtitles


	Revision Number	Author	Created
	5	michael.j.shepard
	4	Michael Shepard
	3	Michael Shepard
	2	Michael Shepard
	1	Michael Shepard
