-
♪ (Music fades in) ♪
-
(Chirping)
-
(Vocalizations, different languages)
-
(Talking overlaps in background)
-
(Computerized beeping)
-
(Man) We come into this world with the
innate ability to learn to interact
-
with other sentient beings.
-
(Child vocalizing)
-
(Man) Suppose you are to interact with
other people by writing little messages.
-
(Man) It'd be a real pain.
-
(Man) And that's how we interact
with computers.
-
It's much easier just to talk to them...
just so much easier...
-
(Man) If the computers could understand
what we're saying.
-
For that, you need really
good speech recognition.
-
(Narrator) The first speech recognition
system was developed by Bell Laboratories
-
in 1952. It could only recognize
numbers spoken by one person.
-
In the 1970s, Carnegie Mellon
came out with the Harpy System.
-
This was able to recognize over
1,000 words and different pronunciations
-
(Narrator) of the same word.
- (Man) Tomato - (Woman) Tomato
-
(Narrator) Speech recognition continued
in the 80s with the introduction of the
-
Hidden Markov Model, which
used a more mathematical approach
-
to analyzing sound waves that led to
many breakthroughs we have today.
-
You're taking in very raw audio waveforms
-
like you get through a microphone
-
on your phone
-
or whatever...
-
(Woman) We chop it into small pieces
and it tries to identify which phoneme
-
was spoken in that piece of speech.
-
- Phoneme is a primitive unit for
expressing words.
-
(voicing phonemes shown above)
-
And then you stitch those together
into likely words like Palo Alto.
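
[Editor's sketch] The "chop it into small pieces" step described above can be illustrated roughly as follows. This is a minimal sketch, not the system the speakers are describing. The 16 kHz sample rate, the 25 ms window, the 10 ms step, and the name `frame_signal` are all assumptions chosen for illustration. Identifying the phoneme in each piece would need a trained model, which is not shown.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a sequence of audio samples into overlapping fixed-size frames.

    frame_len and hop are measured in samples. A trailing partial frame
    is dropped.
    """
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


# Assumed setup: one second of a 440 Hz tone sampled at 16 kHz,
# standing in for raw microphone audio.
SAMPLE_RATE = 16000
samples = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
           for n in range(SAMPLE_RATE)]

# 25 ms windows (400 samples) taken every 10 ms (160 samples).
frames = frame_signal(samples, frame_len=400, hop=160)
print(len(frames), "frames of", len(frames[0]), "samples each")
```

Each frame would then be handed to a classifier that guesses which phoneme it contains, and the per-frame guesses would be stitched into likely words.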
-
- Speech recognition today is good at
transcribing what you've said...
-
(Man, to phone) What's the weather
like in Topeka?
-
(Man) You can talk about travels, your
contacts, like, "Where can I get pizza?"
-
(Phone) Here are the listings for Pizza.
-
(Man) "How tall is the Eiffel Tower?"
(Phone) The Eiffel Tower is ...
-
(Woman) We've made tremendous
improvements very quickly.
-
(Man, to phone) Who is the 21st
President of the United States?
-
(Phone beeps)
(Phone) Chester A. Arthur was the 21st...
-
(Man, to phone) Okay, Google,
where is he from?
-
(Man) Years ago, you had to be an engineer
to interact with computers.
-
Today, everybody can interact.
-
- One thing still in its
infancy is understanding.
-
- We need a far more sophisticated
language understanding model
-
that understands what the sentence means.
-
We're still a very long way from that.
-
(Beeping)
-
♪ (Soft background music) ♪
-
(Woman) Our ability to use language is one
of the things that helps us have culture.
-
It's one of the things that helps
us pass on traditions across generations.
-
Figuring out how the system of language
works, even though it seems easy,
-
turns out to be very hard, but is one that
every baby understands by 2 years old.
-
(Girl) There's two of them.
(Woman) There's two Ls, yeah (spells word)
-
- Language is extremely complex
and sophisticated...
-
- From the semantics
-
- (Man in chair) Ironies...
- (Woman) Strong accents...
-
- (Man) Facial expressions...
-
- Human emotions, because that's
part of how we communicate.
-
- Humor...
-
(Aside) Do I have to be careful
not to offend the dinosaur?
-
- Language has so many different
layers and that's why it's
-
such a difficult problem.
-
(Man) The present human brain
and the learning algorithms in it
-
are far, far better at things like
language understanding
-
and they're still a lot better
at pun recognition.
-
- Whether or not we replicate exactly
what the brain does, to understand
-
language and speech, is still a question.
-
(Beeping)
-
(Man) For many years, we believed that
neural networks should work better than
-
the dumb existing technology that's
basically just "table look-up"
-
and then, in 2009, two of my students
(with some help from me) got it
-
working better. The first time it was
just a little better.
-
But it was obvious that this could be
improved to work much better.
-
(Man) The brain has this system of neurons
all computing in parallel.
-
All knowledge in the brain is in the
strength of connection between neurons.
-
What I mean by "neural net" is something
that is simulated on a conventional
-
computer, but is designed to work in
roughly the same ways as the brain.
-
Until quite recently, people got features
by hand engineering them.
-
They looked at sine waves and did Fourier
analysis and tried to figure out
-
what features they should feed to the
pattern recognition system.
-
The thing about neural networks is that
they learn their own features.
-
In particular, they can learn features
and features of features, etc,
-
and that's led to huge improvement
in speech recognition.
-
- But you can also use them for language
understanding tasks.
-
How you do this is to represent words
in very high-dimensional spaces.
-
- (Man) We can now deal with analogies
where a word is represented as a list
-
of numbers.
-
For example, if I take 100 numbers that
represent "Paris," and I subtract from it
-
"France" and add "Italy," if I look
at the numbers I have, the closest
-
thing is a list of numbers that
represents "Rome."
-
By first converting words into numbers,
using a neural net, you can actually
-
do this analogical reasoning.
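
[Editor's sketch] The "Paris" minus "France" plus "Italy" arithmetic described above can be tried on a toy example. The vectors below are hand-made, four-number stand-ins invented for illustration. They are not learned by a neural net and are far smaller than the 100 numbers mentioned by the speaker. The names `VECTORS`, `cosine`, and `analogy` are likewise hypothetical.

```python
import math

# Made-up vectors: the first three numbers mark a country,
# the fourth marks "is a capital city". Real word vectors are
# learned from text and have no such readable meaning.
VECTORS = {
    "France":  [1.0, 0.0, 0.0, 0.0],
    "Italy":   [0.0, 1.0, 0.0, 0.0],
    "Germany": [0.0, 0.0, 1.0, 0.0],
    "Paris":   [1.0, 0.0, 0.0, 1.0],
    "Rome":    [0.0, 1.0, 0.0, 1.0],
    "Berlin":  [0.0, 0.0, 1.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length lists of numbers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c).

    The three input words are excluded from the search.
    """
    query = [x - y + z
             for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(query, VECTORS[w]))


print(analogy("Paris", "France", "Italy"))
```

With these toy vectors, the closest remaining word to the result is "Rome", matching the example in the talk.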
-
I predict that, in the next five years, it
will be clear that these neural networks
-
with new learning algorithms will give us
much better language understanding.
-
(Woman) When we started out, we thought
things like chess or mathematics or logic
-
would be things that were really hard.
-
They're not that hard. We ended up with
a machine that played as well as
-
a Grand Master at chess.
-
What we thought would be easy for
a computer system, like language,
-
has turned out incredibly hard.
-
(Man) I can't even imagine the moment of
success quite yet because there are so many
-
pieces of this puzzle that are unsolved,
both from a science point of view
-
as well as a technical implementation
point of view.
-
There are a lot of unknowns.
-
(Woman) Those are the great revolutions.
Not just when we fiddle with what
-
we already know, but when we discover
something completely new and unexpected.
-
(Man) Once you are in the area of
-
human-level performance,
that will be pretty remarkable.
-
(Beep)