Google Wave: Natural Language Processing

  • 0:06 - 0:08
    Whitelaw: Hi. My name
    is Casey Whitelaw.
  • 0:08 - 0:09
    I'm the Tech Lead
  • 0:09 - 0:11
    for the Natural Language
    Processing Group
  • 0:11 - 0:13
    here in Sydney,
    and today I'm gonna talk to you
  • 0:13 - 0:14
    a little bit about
  • 0:14 - 0:17
    some of the cool things
    that we've added to Google Wave.
  • 0:17 - 0:19
    So one of the main things
  • 0:19 - 0:22
    that we want to stay focused on
    in Google Wave is productivity.
  • 0:22 - 0:24
    We want users to be able
    to stay productive,
  • 0:24 - 0:26
    whether they're reading
    or whether they're writing.
  • 0:26 - 0:28
    One of the ways
    that we've done that
  • 0:28 - 0:29
    is with our
    spell correction system.
  • 0:29 - 0:32
    What we'd like is for users
    just to be able to
  • 0:32 - 0:35
    focus on what they're typing
    and not worry about
  • 0:35 - 0:37
    whether there's any mistakes
    they've made.
  • 0:37 - 0:39
    We think that if people could
    just loosen up a little bit
  • 0:39 - 0:41
    and, you know,
    or maybe type 5% faster,
  • 0:41 - 0:43
    then that's 5% less time
    that they spend typing.
  • 0:43 - 0:46
    So I'll start with an example.
  • 0:46 - 0:48
    It's probably the easiest way
    to explain.
  • 0:48 - 0:51
    Let's say you want to meet up
    with one of your friends.
  • 0:51 - 0:52
    You're having a chat.
  • 0:52 - 0:55
    So you write...
  • 0:55 - 0:56
    Let's...
  • 0:56 - 0:58
    met...
  • 0:58 - 1:00
    whoops...
  • 1:00 - 1:04
    tomorrow.
  • 1:04 - 1:06
    So here you see
    I've made a mistake.
  • 1:06 - 1:07
    I've written "met"
    instead of "meet" here.
  • 1:07 - 1:10
    My finger slipped on the "e."
  • 1:10 - 1:14
    So now, the way that we
    implemented spelling
  • 1:14 - 1:17
    is we introduced an automatic
    participant called Spelly
  • 1:17 - 1:20
    who works just like
    another user
  • 1:20 - 1:22
    that's participating
    on the wave with you.
  • 1:22 - 1:24
    So Spelly's on your wave
    with you,
  • 1:24 - 1:29
    and it can see that you've
    typed "Let's met tomorrow,"
  • 1:29 - 1:31
    and it's now gonna try
    and spell-check it.
  • 1:31 - 1:33
    For each word...
  • 1:33 - 1:36
    it doesn't have any kind
    of dictionary,
  • 1:36 - 1:40
    so it doesn't know whether
    "met" is a well-spelled word
  • 1:40 - 1:41
    or a misspelling.
  • 1:41 - 1:43
    So to start with,
    it comes up with a list
  • 1:43 - 1:47
    of possible candidate
    corrections for this word.
  • 1:47 - 1:50
    So some examples of that
    might be...
  • 1:50 - 1:53
    meat, the food...
  • 1:53 - 1:57
    or meet, the correctly
    spelled version of this.
  • 1:57 - 1:59
    And you can imagine
    lots of others.
  • 1:59 - 2:02
    So set or net or me--
  • 2:02 - 2:05
    all kinds of different words
    that we would evaluate
  • 2:05 - 2:09
    to see whether they're what
    you actually meant to type.
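
Editor's sketch: the candidate step described above could look roughly like the following. This is an assumption, not the Wave implementation: it generates every string one edit away from the typed word and keeps those found in `OBSERVED_TERMS`, a tiny hypothetical stand-in for terms learned from the web. The names `edits1` and `candidates` are illustrative only.

```python
import string

# Hypothetical stand-in for terms observed on the web; not from the talk.
OBSERVED_TERMS = {"met", "meat", "meet", "set", "net", "me", "tomorrow"}


def edits1(word):
    """Return all strings one delete, transpose, substitute, or insert away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    substitutes = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + substitutes + inserts)


def candidates(word):
    """The typed word itself plus any observed term within one edit of it."""
    return {word} | (edits1(word) & OBSERVED_TERMS)


print(sorted(candidates("met")))
```

The typed word stays in the candidate set because, without a dictionary, the system cannot rule out that "met" was intended.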
  • 2:09 - 2:13
    We've learned from the web
  • 2:13 - 2:15
    the kind of misspellings
    that people make
  • 2:15 - 2:17
    and which things
    are more and less likely.
  • 2:17 - 2:19
    So we know that,
    for instance,
  • 2:19 - 2:21
    maybe slipping
    and inserting an "A"
  • 2:21 - 2:22
    is relatively likely,
  • 2:22 - 2:25
    but misspelling
    the very first letter
  • 2:25 - 2:28
    might be less likely
    in this case.
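
Editor's sketch: an error model of this kind assigns a probability to each way of mistyping a word. The numbers below are invented for illustration (the talk gives none), and `error_probability` is a hypothetical helper. It only captures the one distinction mentioned above: an edit later in the word is treated as more likely than an edit to the first letter.

```python
# Invented probabilities for illustration; a real model would learn
# these from observed misspellings.
P_NO_ERROR = 0.95
P_INTERIOR_EDIT = 0.01
P_FIRST_LETTER_EDIT = 0.001


def error_probability(typed, intended):
    """Rough P(typed | intended) for words at most one edit apart."""
    if typed == intended:
        return P_NO_ERROR
    # Length of the shared prefix tells us where the first difference is.
    i = 0
    while i < min(len(typed), len(intended)) and typed[i] == intended[i]:
        i += 1
    return P_FIRST_LETTER_EDIT if i == 0 else P_INTERIOR_EDIT


print(error_probability("met", "meat"), error_probability("met", "set"))
```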
  • 2:28 - 2:33
    So we've got some suggestions,
    and the next thing that we do
  • 2:33 - 2:35
    is evaluate these suggestions
    in context.
  • 2:35 - 2:38
    So there are other systems
    at Google that already use
  • 2:38 - 2:40
    the same kind of statistical
    language models as this,
  • 2:40 - 2:42
    such as the Google
    translation system,
  • 2:42 - 2:44
    that essentially
    encode information
  • 2:44 - 2:46
    about how language is used.
  • 2:46 - 2:48
    These are learned from the web
  • 2:48 - 2:50
    from looking at billions
    of web pages,
  • 2:50 - 2:51
    so we get a really good idea
  • 2:51 - 2:54
    about the way that people
    really use language in practice.
  • 2:54 - 2:55
    So what we would do
  • 2:55 - 2:59
    is look at the likelihood
    of "Let's met tomorrow"
  • 2:59 - 3:02
    and "Let's meat tomorrow,"
    less likely,
  • 3:02 - 3:04
    and "Let's meet tomorrow,"
  • 3:04 - 3:06
    which is gonna be more likely
    than either of these.
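
Editor's sketch: scoring a sentence in context can be illustrated with a bigram model. This is an assumed, heavily simplified stand-in for the large web-trained models mentioned above: `CORPUS` is a four-line toy corpus, and add-one smoothing is a choice made here so that unseen word pairs still get a nonzero probability.

```python
import math
from collections import Counter

# Toy corpus, hypothetical; the real models are trained on web-scale text.
CORPUS = [
    "let's meet tomorrow",
    "let's meet today",
    "we met yesterday",
    "we eat meat",
]

unigrams = Counter()
bigrams = Counter()
for line in CORPUS:
    tokens = ["<s>"] + line.split()  # <s> marks the sentence start
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

VOCAB_SIZE = len(unigrams)


def sentence_logprob(sentence):
    """Sum of add-one-smoothed bigram log probabilities."""
    tokens = ["<s>"] + sentence.split()
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + VOCAB_SIZE))
        for a, b in zip(tokens, tokens[1:])
    )


for s in ("let's met tomorrow", "let's meat tomorrow", "let's meet tomorrow"):
    print(s, round(sentence_logprob(s), 3))
```

On this toy corpus "let's meet tomorrow" scores highest simply because "let's meet" and "meet tomorrow" were both seen in the corpus.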
  • 3:06 - 3:08
    And we combine that
    with our error model
  • 3:08 - 3:10
    which tells us how likely
    the misspellings are,
  • 3:10 - 3:14
    you know, without any context,
    to get a final determination
  • 3:14 - 3:16
    as to what are
    the most likely words--
  • 3:16 - 3:19
    most likely word
    that you meant right here.
  • 3:19 - 3:22
    So in this case,
    we would suggest "meet."
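
Editor's sketch: combining the two models can be shown as adding log probabilities and picking the best candidate. All values in `ERROR_LOGPROB` and `SENTENCE_LOGPROB` are invented for illustration, and `best_correction` is a hypothetical name. The talk does not say exactly how the scores are combined, so the simple sum used here is an assumption.

```python
import math

# Invented log P(typed "met" | intended word); not from the talk.
ERROR_LOGPROB = {
    "met": math.log(0.95),
    "meat": math.log(0.02),
    "meet": math.log(0.02),
}

# Invented language-model log probabilities for each full sentence.
SENTENCE_LOGPROB = {
    "let's met tomorrow": math.log(1e-9),
    "let's meat tomorrow": math.log(1e-10),
    "let's meet tomorrow": math.log(1e-6),
}


def best_correction(template, candidate_words):
    """Pick the word maximizing error-model plus language-model score."""
    def score(word):
        return ERROR_LOGPROB[word] + SENTENCE_LOGPROB[template.format(word)]
    return max(candidate_words, key=score)


print(best_correction("let's {} tomorrow", ["met", "meat", "meet"]))
```

With these numbers the context outweighs the error model's preference for leaving "met" unchanged, so "meet" wins.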
  • 3:22 - 3:25
    Once we think
    that a word is misspelled,
  • 3:25 - 3:29
    we need to get that back
    to the Google Wave client
  • 3:29 - 3:32
    so that the user
    can actually see it
  • 3:32 - 3:35
    and either correct it
    automatically or manually.
  • 3:35 - 3:36
    Two kinds of ways
  • 3:36 - 3:39
    that this differs
    from existing spelling systems.
  • 3:39 - 3:42
    One of them is just that
    it's hosted.
  • 3:42 - 3:44
    And this means that we can do
  • 3:44 - 3:46
    this same kind of spelling
    for you,
  • 3:46 - 3:49
    regardless of which device
    you're connecting from.
  • 3:49 - 3:53
    So whether you're on your laptop
    or your mobile or your desktop,
  • 3:53 - 3:56
    we can give the same
    quality spelling, regardless.
  • 3:56 - 3:58
    And that applies
    across languages too,
  • 3:58 - 3:59
    so, you know, we're doing this
  • 3:59 - 4:01
    for other alphabetic
    languages also.
  • 4:01 - 4:07
    So like I said, we use large
    statistical language models.
  • 4:07 - 4:08
    When I said large, you know,
  • 4:08 - 4:10
    we train them
    from billions of words.
  • 4:10 - 4:12
    They end up being
    many, many gigabytes.
  • 4:12 - 4:16
    It's pretty infeasible to run
    these on a single machine,
  • 4:16 - 4:18
    which isn't such a problem
    in a data center
  • 4:18 - 4:19
    where you can have
    a set of machines
  • 4:19 - 4:22
    running a language model
    and a spelling model together.
  • 4:22 - 4:27
    And then we can share
    that spelling model
  • 4:27 - 4:29
    between many users
  • 4:29 - 4:31
    so that the cost per user
    is very low.
  • 4:31 - 4:34
    So it's very efficient
    for us to do this.
  • 4:34 - 4:36
    Once you realize
    that you've got a system
  • 4:36 - 4:38
    that supports
    collaborative editing,
  • 4:38 - 4:40
    that has structured data,
  • 4:40 - 4:43
    and that you can change
    the user interface
  • 4:43 - 4:45
    by having remote participants,
  • 4:45 - 4:47
    then, really,
    the sky's the limit.
  • 4:47 - 4:49
    I mean, there's all kinds
    of existing
  • 4:49 - 4:51
    natural language technologies
    like spell checking
  • 4:51 - 4:53
    or translation
    that we can apply,
  • 4:53 - 4:56
    and we're seeing
    a lot of new applications
  • 4:56 - 4:58
    as the way that we communicate
    changes as well.
  • 4:58 - 5:01
    So, you know, really,
    it's gonna be exciting times.
Video Language:
English
Duration:
05:05