-
Whitelaw: Hi. My name
is Casey Whitelaw.
-
I'm the Tech Lead
-
for the Natural Language
Processing Group
-
here in Sydney,
and today I'm gonna talk to you
-
a little bit about
-
some of the cool things
that we've added to Google Wave.
-
So one of the main things
-
that we want to stay focused on
in Google Wave is productivity.
-
We want users to be able
to stay productive,
-
whether they're reading
or whether they're writing.
-
One of the ways
that we've done that
-
is with our
spell correction system.
-
What we'd like is for users
just to be able to
-
focus on what they're typing
and not worry about
-
whether there are any mistakes
they've made.
-
We think that if people could
just loosen up a little bit
-
and, you know,
maybe type 5% faster,
-
then that's 5% less time
that they spend typing.
-
So I'll start with an example.
-
It's probably the easiest way
to explain.
-
Let's say you want to meet up
with one of your friends.
-
You're having a chat.
-
So you write...
-
Let's...
-
met...
-
whoops...
-
tomorrow.
-
So here you see
I've made a mistake.
-
I've written "met"
instead of "meet" here.
-
My finger slipped on the "e."
-
So now, the way that we
implemented spelling
-
is we introduced an automatic
participant called Spelly
-
who works just like
another user
-
that's participating
on the wave with you.
-
So Spelly's on your wave
with you,
-
and it can see that you've
typed "Let's met tomorrow,"
-
and it's now gonna try
and spell-check it.
-
For each word...
-
it doesn't have any kind
of dictionary,
-
so it doesn't know whether
"met" is a correctly spelled word
-
or a misspelling.
-
So to start with,
it comes up with a list
-
of possible candidate
corrections for this word.
-
So some examples of that
might be...
-
"meat," the food...
-
or "meet," the correctly
spelled version of this.
-
And you can imagine
lots of others.
-
So "set" or "net" or "me"--
-
all kinds of different words
that we would evaluate
-
to see whether they're what
you actually meant to type.
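
[Editor's sketch] The talk does not say how Spelly builds its candidate list. As a rough illustration only, assuming candidates are all strings within one edit of the typed word, the idea could look like this in Python. The function name and the one-edit rule are assumptions, not the Wave implementation.

```python
import string


def single_edit_candidates(word, alphabet=string.ascii_lowercase):
    """Return every string exactly one edit away from `word`.

    Hypothetical helper: the one-edit rule is an assumption made for
    illustration, not something stated in the talk.
    """
    # Every way of cutting the word into a left and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    inserts = {left + ch + right for left, right in splits for ch in alphabet}
    replaces = {left + ch + right[1:]
                for left, right in splits if right for ch in alphabet}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    # Drop the typed word itself (produced by replacing a letter with itself).
    return (deletes | inserts | replaces | transposes) - {word}


candidates = single_edit_candidates("met")
```

With no dictionary, most of these strings are not real words. They are only filtered out later, when each one is scored.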
-
We've learned from the web
-
the kind of misspellings
that people make
-
and which things
are more and less likely.
-
So we know that,
for instance,
-
maybe slipping
and inserting an "A"
-
is relatively likely,
-
but misspelling
the very first letter
-
might be less likely
in this case.
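
[Editor's sketch] An error model of this kind can be pictured as a function that takes the word you meant and the word you typed and returns how likely that slip is. The version below is a toy under stated assumptions: every number is invented, the names are hypothetical, and it only handles a single insertion, deletion, or replacement. The model described in the talk is learned from web data.

```python
# All probabilities are invented for illustration only.
EDIT_PROB = {"insert": 0.004, "delete": 0.006, "replace": 0.003}
FIRST_LETTER_PENALTY = 0.2   # slips on the first letter are assumed rarer
NO_ERROR_PROB = 0.95         # chance the word was typed as intended


def classify_single_edit(intended, typed):
    """Return (kind, position) of the one edit turning `intended` into
    `typed`, ("none", -1) if they match, or None if more than one edit."""
    if intended == typed:
        return ("none", -1)
    # Length of the common prefix.
    i = 0
    while i < min(len(intended), len(typed)) and intended[i] == typed[i]:
        i += 1
    if len(typed) == len(intended) - 1 and intended[i + 1:] == typed[i:]:
        return ("delete", i)    # the typist dropped a letter
    if len(typed) == len(intended) + 1 and intended[i:] == typed[i + 1:]:
        return ("insert", i)    # the typist added a letter
    if len(typed) == len(intended) and intended[i + 1:] == typed[i + 1:]:
        return ("replace", i)   # the typist hit the wrong letter
    return None


def error_prob(intended, typed):
    """Toy estimate of P(typed | intended), with no context."""
    edit = classify_single_edit(intended, typed)
    if edit is None:
        return 0.0
    kind, position = edit
    if kind == "none":
        return NO_ERROR_PROB
    prob = EDIT_PROB[kind]
    return prob * FIRST_LETTER_PENALTY if position == 0 else prob
```

Under these made-up numbers, typing "met" for "meet" (a dropped letter in the middle) scores higher than typing "met" for "set" (a wrong first letter).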
-
So we've got some suggestions,
and the next thing that we do
-
is evaluate these suggestions
in context.
-
So there are other systems
at Google that already use
-
the same kind of statistical
language models as this,
-
such as the Google
translation system,
-
that essentially
encode information
-
about how language is used.
-
These are learned from the web
-
from looking at billions
of web pages,
-
so we get a really good idea
-
about the way that people
really use language in practice.
-
So what we would do
-
is look at the likelihood
of "Let's met tomorrow"
-
and "Let's meat tomorrow,"
less likely,
-
and "Let's meet tomorrow,"
-
which is gonna be more likely
than either of these.
-
And we combine that
with our error model
-
which tells us how likely
the misspellings are,
-
you know, without any context,
to get a final determination
-
as to what is
the most likely word
-
that you meant
right here.
-
So in this case,
we would suggest "meet."
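
[Editor's sketch] Combining the two signals can be illustrated by adding log-probabilities: one for how likely the sentence is with a candidate word in it, and one for how likely it is that the candidate would be mistyped as what the user wrote. The tables below are invented stand-ins for the web-scale language model and the learned error model, and all names are hypothetical.

```python
import math

# Invented log-probabilities standing in for a large language model.
SENTENCE_LOGPROB = {
    "let's met tomorrow": -14.0,
    "let's meat tomorrow": -16.0,
    "let's meet tomorrow": -8.0,
}

# Invented error-model probabilities, keyed by (intended, typed).
ERROR_PROB = {
    ("met", "met"): 0.95,    # no mistake at all
    ("meat", "met"): 0.004,
    ("meet", "met"): 0.006,
}


def best_correction(left, typed, right, candidates):
    """Pick the candidate with the highest combined score:
    log P(sentence with candidate) + log P(typed | candidate)."""
    def score(word):
        sentence = f"{left} {word} {right}"
        return SENTENCE_LOGPROB[sentence] + math.log(ERROR_PROB[(word, typed)])
    return max(candidates, key=score)


suggestion = best_correction("let's", "met", "tomorrow", ["met", "meat", "meet"])
```

With these numbers, "met" is by far the most likely thing to have been typed on purpose, but "Let's meet tomorrow" is so much more plausible as a sentence that "meet" still wins overall.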
-
Once we think
that a word is misspelled,
-
we need to get that back
to the Google Wave client
-
so that the user
can actually see it
-
and either correct it
automatically or manually.
-
There are two ways
-
that this differs
from existing spelling systems.
-
One of them is just that
it's hosted.
-
And this means that we can do
-
this same kind of spelling
for you,
-
regardless of which device
you're connecting from.
-
So whether you're on your laptop
or your mobile or your desktop,
-
we can give the same
quality spelling, regardless.
-
And that applies
across languages too,
-
so, you know, we're doing this
-
for other alphabetic
languages also.
-
So like I said, we use large
statistical language models.
-
When I said large, you know,
-
we train them
from billions of words.
-
They end up being
many, many gigabytes.
-
It's pretty infeasible to run
these on a single machine,
-
which isn't such a problem
in a data center
-
where you can have
a set of machines
-
running a language model
and a spelling model together.
-
And then we can share
that spelling model
-
between many users
-
so that the cost per user
is very low.
-
So it's very efficient
for us to do this.
-
Once you realize
that you've got a system
-
that supports
collaborative editing,
-
that has structured data,
-
and that you can change
the user interface
-
by having remote participants,
-
then, really,
the sky's the limit.
-
I mean, there's all kinds
of existing
-
natural language technologies
like spell checking
-
or translation
that we can apply,
-
and we're seeing
a lot of new applications
-
as the way that we communicate
changes as well.
-
So, you know, really,
it's gonna be exciting times.