Behind the Mic: The Science of Talking with Computers

    ♪ (Music fades in) ♪
    (Vocalizations, different languages)
    (Talking overlaps in background)
    (Computerized beeping)
    (Man) We come into this world with the
    innate ability to learn to interact
    with other sentient beings.
    (Child vocalizing)
    (Man) Suppose you had to interact with
    other people by writing little messages.
    (Man) It'd be a real pain.
    (Man) And that's how we interact
    with computers.
    It's much easier just to talk to them...
    just so much easier...
    (Man) If the computers could understand
    what we're saying.
    For that, you need really
    good speech recognition.
    (Narrator) The first speech recognition
    system was developed by Bell Laboratories
    in 1952. It could only recognize
    numbers spoken by one person.
    In the 1970s, Carnegie-Mellon
    came out with the Harpy System.
    This was able to recognize over
    1,000 words and different pronunciations
    of the same word.
    - (Man) Tomato - (Woman) Tomato
    (Narrator) Speech recognition advanced
    in the '80s with the adoption of the
    Hidden Markov Model, which
    used a more mathematical approach
    to analyzing sound waves and led to
    many breakthroughs we have today.
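The narration only names the Hidden Markov Model. As a rough, non-authoritative illustration of the idea (not the code of any system mentioned here), the sketch below runs the standard Viterbi algorithm over a toy model whose hidden states are two phonemes, "S" and "IY", and whose observations are coarse acoustic labels, "noisy" and "voiced". All state names, labels, and probabilities are made up.

```python
import math

# Toy HMM: every name and number here is invented for illustration.
STATES = ["S", "IY"]
START_P = {"S": 0.6, "IY": 0.4}
TRANS_P = {"S": {"S": 0.7, "IY": 0.3}, "IY": {"S": 0.2, "IY": 0.8}}
EMIT_P = {"S": {"noisy": 0.9, "voiced": 0.1}, "IY": {"noisy": 0.2, "voiced": 0.8}}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations.

    Scores are kept as log-probabilities so long inputs do not underflow.
    """
    # best[t][s] = (score of the best path ending in state s at time t, previous state)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for t in range(1, len(observations)):
        column = {}
        for s in states:
            # Choose the predecessor that gives the highest-scoring path into s.
            prev, score = max(
                ((p, best[t - 1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][observations[t]]), prev)
        best.append(column)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]


OBS = ["noisy", "noisy", "voiced", "voiced"]
print(viterbi(OBS, STATES, START_P, TRANS_P, EMIT_P))  # ['S', 'S', 'IY', 'IY']
```

Real HMM recognizers use several states per phoneme and continuous acoustic scores rather than two discrete labels, but the search follows the same dynamic-programming pattern.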
    You're taking in very raw audio waveforms,
    like you get through a microphone
    on your phone
    or whatever...
    (Woman) We chop it into small pieces
    and it tries to identify which phoneme
    was spoken in that piece of speech.
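Chopping the waveform into small pieces is usually done with short, overlapping windows. The sketch below assumes 25 ms windows taken every 10 ms, a common convention rather than anything stated in the interview; the step that scores each piece against phonemes is not shown.

```python
def frame_signal(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Chop a 1-D list of audio samples into overlapping fixed-length frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    frames = []
    # Slide a window of frame_len samples forward by hop_len samples each step;
    # a trailing partial window is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


# One second of silence at 16 kHz: 400-sample frames starting every 160 samples.
frames = frame_signal([0.0] * 16000, sample_rate=16000)
print(len(frames), len(frames[0]))  # 98 400
```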
    - A phoneme is a primitive unit for
    expressing words.
    (voicing the phonemes shown on screen)
    And then you stitch those together
    into likely words like Palo Alto.
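Stitching phonemes into likely words can be sketched as a search for the most probable way to split a phoneme string into dictionary words. The lexicon below, its ARPAbet-style pronunciations, and its word probabilities are all hypothetical; production systems combine acoustic scores, a pronunciation dictionary, and a language model in one search rather than starting from a fixed phoneme string.

```python
import math

# Hypothetical lexicon: word -> (pronunciation, made-up probability).
LEXICON = {
    "palo": (("P", "AE", "L", "OW"), 0.01),
    "pal": (("P", "AE", "L"), 0.02),
    "oh": (("OW",), 0.005),
    "alto": (("AE", "L", "T", "OW"), 0.01),
    "al": (("AE", "L"), 0.001),
    "toe": (("T", "OW"), 0.02),
}


def segment(phonemes, lexicon):
    """Return the most probable split of a phoneme list into lexicon words, or None."""
    n = len(phonemes)
    # best[i] = (log-probability, words) of the best parse of phonemes[:i]
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        score, words = best[i]
        if score == -math.inf:
            continue  # no parse reaches position i
        for word, (pron, prob) in lexicon.items():
            j = i + len(pron)
            if tuple(phonemes[i:j]) == pron:
                candidate = score + math.log(prob)
                if candidate > best[j][0]:
                    best[j] = (candidate, words + [word])
    return best[n][1] if best[n][0] > -math.inf else None


print(segment("P AE L OW AE L T OW".split(), LEXICON))  # ['palo', 'alto']
```

"pal oh alto" and "palo al toe" also fit the phonemes, but their combined probabilities are lower, so the search keeps "palo alto".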
    - Speech recognition today is good at
    transcribing what you've said...
    (Man, to phone) What's the weather
    like in Topeka?
    (Man) You can talk about travels, your
    contacts, like, "Where can I get pizza?"
    (Phone) Here are the listings for pizza.
    (Man) "How tall is the Eiffel Tower?"
    (Phone) The Eiffel Tower is ...
    (Woman) We've made tremendous
    improvements very quickly.
    (Man, to phone) Who is the 21st
    President of the United States?
    (Phone beeps)
    (Phone) Chester A. Arthur was the 21st...
    (Man, to phone) Okay, Google,
    where is he from?
    (Man) Years ago, you had to be an engineer
    to interact with computers.
    Today, everybody can interact.
    - One thing still in its
    infancy is understanding.
    - We need a far more sophisticated
    language understanding model
    that understands what the sentence means.
    We're still a very long way from that.
    ♪ (Soft background music) ♪
    (Woman) Our ability to use language is one
    of the things that helps us have culture.
    It's one of the things that helps
    us pass on traditions across generations.
    Figuring out how the system of language
    works, even though it seems easy,
    turns out to be very hard, yet it's something
    every baby understands by age two.
    (Girl) There's two of them.
    (Woman) There's two Ls, yeah (spells word)
    - Language is extremely complex
    and sophisticated...
    - From the semantics
    - (Man in chair) Ironies...
    - (Woman) Strong accents...
    - (Man) Facial expressions...
    - Human emotions, because that's
    part of how we communicate.
    - Humor...
    (Aside) Do I have to be careful
    not to offend the dinosaur?
    - Language has so many different
    layers and that's why it's
    such a difficult problem.
    (Man) The present human brain
    and the learning algorithms in it
    are far, far better at things like
    language understanding
    and they're still a lot better
    at pun recognition.
    - Whether or not we replicate exactly
    what the brain does, to understand ...
    (Man) For many years, we believed that
    neural networks should work better than
    the dumb existing technology that's
    basically just "table look-up."
    Then, in 2009, two of my students
    (with some help from me) got it
    working better. The first time it was
    just a little better.
    But it was obvious that this could be
    improved to work much better.
    (Man) The brain has this system of neurons
    all computing in parallel.
    All knowledge in the brain is in the
    strengths of the connections between neurons.
    What I mean by "neural net" is something
    that is simulated on a conventional
    computer, but is designed to work in
    roughly the same ways as the brain.
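A minimal sketch of one layer of simulated neurons, assuming a sigmoid activation (an illustrative choice, not something the speaker specifies). Each neuron's output depends only on the inputs and its own connection strengths, so all of them could be computed in parallel; on a conventional computer this version simply loops over them.

```python
import math


def layer(inputs, weights, biases):
    """One layer of simulated neurons.

    Each neuron takes a weighted sum of all inputs plus a bias and squashes it
    with a sigmoid. What the layer "knows" lives entirely in `weights` and
    `biases`, i.e. the connection strengths.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs


# Two neurons reading the same two inputs; the first one's weighted sum is 0,
# so it outputs 0.5.
print(layer([1.0, 0.0], [[2.0, -1.0], [0.5, 0.5]], [-2.0, 0.0]))
```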
    Until quite recently, people got features
    by hand-engineering them.
    They looked at sine waves and did Fourier
    analysis and tried to figure out
    what features they should feed to the
    pattern recognition system.
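The Fourier-analysis step of a hand-engineered front end can be sketched as a discrete Fourier transform of one frame, which measures how much of the signal sits at each frequency. This naive O(n²) version is for illustration only; practical pipelines use an FFT and then derive further hand-designed features (mel-frequency cepstral coefficients are a well-known example), which are not shown here.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT: magnitude of each frequency bin from 0 up to n/2."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


# A pure tone that completes exactly 3 cycles in a 32-sample frame.
n = 32
tone = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
spectrum = magnitude_spectrum(tone)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
print(peak_bin)  # 3
```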
    The thing about neural networks is that
    they learn their own features.
    In particular, they can learn features
    and features of features, etc.,
    and that's led to huge improvements
    in speech recognition.
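To make "they learn their own features" concrete, here is a minimal sketch, not the method described in the interview, of a two-layer network trained by gradient descent on XOR. No single weighted sum of the two raw inputs can separate XOR, so the hidden layer has to learn intermediate features; stacking more hidden layers is what would give features of features. The layer size, learning rate, epoch count, and seed are arbitrary choices.

```python
import math
import random


def train_xor(hidden=4, epochs=5000, lr=0.5, seed=0):
    """Train a 2-input, `hidden`-unit, 1-output network on XOR.

    Returns (predict, losses): a function mapping an input pair to a value in
    [0, 1], and the mean cross-entropy loss recorded at every epoch.
    """
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    # Connection strengths start small and random; training adjusts them.
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # Hidden units: features of the raw inputs, learned rather than hand-made.
        h = [math.tanh(row[0] * x[0] + row[1] * x[1] + b) for row, b in zip(w1, b1)]
        z = sum(w * hj for w, hj in zip(w2, h)) + b2
        return h, 1.0 / (1.0 + math.exp(-z))

    losses = []
    for _ in range(epochs):
        gw1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for x, target in data:
            h, y = forward(x)
            y_safe = min(max(y, 1e-12), 1 - 1e-12)  # keep log() finite
            loss -= target * math.log(y_safe) + (1 - target) * math.log(1 - y_safe)
            dz = y - target  # d(cross-entropy)/d(output pre-activation)
            gb2 += dz
            for j in range(hidden):
                gw2[j] += dz * h[j]
                dh = dz * w2[j] * (1 - h[j] ** 2)  # back-propagate through tanh
                gb1[j] += dh
                gw1[j][0] += dh * x[0]
                gw1[j][1] += dh * x[1]
        losses.append(loss / len(data))
        # One gradient-descent step on every weight and bias.
        b2 -= lr * gb2 / len(data)
        for j in range(hidden):
            w2[j] -= lr * gw2[j] / len(data)
            b1[j] -= lr * gb1[j] / len(data)
            w1[j][0] -= lr * gw1[j][0] / len(data)
            w1[j][1] -= lr * gw1[j][1] / len(data)

    return (lambda x: forward(x)[1]), losses


predict, losses = train_xor()
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
for x in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    print(x, round(predict(x), 2))
```

The outputs should move toward 0, 1, 1, 0; how closely they get there depends on the random starting weights, so the printed values are worth checking rather than assuming.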
    - But you can also use them for language
    understanding tasks.
    The way you do this is to represent words
    as points in very high-dimensional spaces.
    - (Man) We can now deal with analogies
    where a word is represented as a list
    of numbers.
    For example, if I take 100 numbers that
    represent "Paris," and I subtract from it ...
    ... at the numbers I have, the closest ...
    represents "Rome."
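The recording cuts out in the middle of this example, so which word vectors are subtracted and added is an assumption here: the sketch uses the common textbook form "Paris" minus "France" plus "Italy", and tiny made-up 3-number vectors instead of the 100 learned numbers the speaker describes. The answer is the remaining word whose vector points in the most similar direction (cosine similarity).

```python
import math

# Made-up 3-number "embeddings" (assumption); the interview describes
# 100 numbers per word, learned by a neural net.
VECTORS = {
    "paris": [1.0, 1.0, 0.0],
    "france": [0.0, 1.0, 0.0],
    "italy": [0.0, 0.0, 1.0],
    "rome": [1.0, 0.0, 1.0],
    "milan": [0.3, 0.0, 0.9],
    "berlin": [1.0, 0.0, 0.0],
}


def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, vectors):
    """Solve 'a is to b as ? is to c' by computing a - b + c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # The query words themselves are excluded from the answers.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("paris", "france", "italy", VECTORS))  # rome
```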
    ... using a neural net, you can actually ...
    I predict that, in the next five years, it
    will be clear that these neural networks ...
    ... much better language understanding.
    (Woman) When we started out, we thought
    things like chess or mathematics or logic ...
    ... They're not that hard. We ended up with ...
    ... a grandmaster at chess.
    What we thought would be easy for
    a computer system, like language,
    has turned out to be incredibly hard.
    ... success quite yet because there are so many ...
    ... both from a science point of view ...
    ... point of view. ...
    (Woman) Those are the great revolutions. ...
    ... we already know, but when we discover ...
    (Man) Once you are in the area of
    human-level performance,
    that will be pretty remarkable.
