Whitelaw: Hi. My name
is Casey Whitelaw.
I'm the Tech Lead
for the Natural Language
Processing Group
here in Sydney,
and today I'm gonna talk to you
a little bit about
some of the cool things
that we've added to Google Wave.
So one of the main things
that we want to stay focused on
in Google Wave is productivity.
We want users to be able
to stay productive,
whether they're reading
or whether they're writing.
One of the ways
that we've done that
is with our
spell correction system.
What we'd like is for users
just to be able to
focus on what they're typing
and not worry about
whether there's any mistakes
they've made.
We think that if people could
just loosen up a little bit
and, you know,
or maybe type 5% faster,
then that's 5% less time
that they spend typing.
So I'll start with an example.
It's probably the easiest way
to explain.
Let's say you want to meet up
with one of your friends.
You're having a chat.
So you write...
Let's...
met...
whoops...
tomorrow.
So here you see
I've made a mistake.
I've written "met"
instead of "meet" here.
My finger slipped on the "e."
So now, the way that we
implemented spelling
is we introduced an automatic
participant called Spelly
who works just like
another user
that's participating
on the wave with you.
So Spelly's on your wave
with you,
and it can see that you've
typed "Let's met tomorrow,"
and it's now gonna try
and spell-check it.
For each word...
it doesn't have any kind
of dictionary,
so it doesn't know whether
"met" is a well-spelled word
or a misspelling.
So to start with,
it comes up with a list
of possible candidate
corrections for this word.
So some examples of that
might be...
"meat," the food...
or "meet," the correctly
spelled version of this.
And you can imagine
lots of others.
So "set" or "net" or "me"--
all kinds of different words
that we would evaluate
to see whether they're what
you actually meant to type.
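
[Illustration: a minimal Python sketch of the candidate step described above, not the actual Spelly implementation. It assumes candidates are words one keystroke edit away from what was typed, kept only if they appear in a table of terms seen on the web. The names `WEB_TERMS`, `single_edits`, and `candidates`, and all counts, are hypothetical.]

```python
import string

# Toy stand-in for terms observed on the web (hypothetical counts).
WEB_TERMS = {"met": 120, "meet": 300, "meat": 150, "set": 400,
             "net": 200, "me": 900, "tomorrow": 250}


def single_edits(word):
    """Return every string one delete, insert, substitute, or swap away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    inserts = {a + c + b for a, b in splits for c in letters}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in letters}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    return deletes | inserts | substitutes | swaps


def candidates(word, terms=WEB_TERMS):
    """Keep only edits that are known terms; the typed word stays a candidate."""
    found = {w for w in single_edits(word) if w in terms}
    found.add(word)
    return sorted(found)


print(candidates("met"))
```

[With this toy table, "met" yields "me," "meat," "meet," "met," "net," and "set," the same words used as examples in the talk.]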
We've learned from the web
the kind of misspellings
that people make
and which things
are more and less likely.
So we know that,
for instance,
maybe slipping
and inserting an "A"
is relatively likely,
but misspelling
the very first letter
might be less likely
in this case.
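
[Illustration: a minimal sketch of an error model in the spirit of what is described above, where some slips are treated as more likely than others. The probabilities in `EDIT_PROB` and the `FIRST_LETTER_PENALTY` are invented for the example; the real values would be learned from web data. Only single insertions, deletions, and substitutions are handled.]

```python
# Hypothetical probabilities, invented for illustration only.
EDIT_PROB = {"same": 0.90, "insert": 0.02, "delete": 0.02, "substitute": 0.01}
FIRST_LETTER_PENALTY = 0.1  # assumption: first-letter slips are rarer


def classify_edit(typed, candidate):
    """Return (kind, position) of the single edit turning typed into candidate."""
    if typed == candidate:
        return "same", None
    # Find the first index at which the two strings differ.
    i = 0
    while i < min(len(typed), len(candidate)) and typed[i] == candidate[i]:
        i += 1
    if len(candidate) == len(typed) + 1 and typed[i:] == candidate[i + 1:]:
        return "insert", i
    if len(candidate) + 1 == len(typed) and typed[i + 1:] == candidate[i:]:
        return "delete", i
    if len(candidate) == len(typed) and typed[i + 1:] == candidate[i + 1:]:
        return "substitute", i
    raise ValueError("words are more than one edit apart")


def error_prob(typed, candidate):
    """Context-free likelihood that candidate was meant when typed was written."""
    kind, position = classify_edit(typed, candidate)
    prob = EDIT_PROB[kind]
    if position == 0:
        prob *= FIRST_LETTER_PENALTY
    return prob


print(error_prob("met", "meat") > error_prob("met", "set"))
```

[Under these made-up numbers, "meat" (a missing "a") scores higher than "set" (a wrong first letter).]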
So we've got some suggestions,
and the next thing that we do
is evaluate these suggestions
in context.
So there are other systems
at Google that already use
the same kind of statistical
language models as this,
such as the Google
translation system,
that essentially
encode information
about how language is used.
These are learned from the web
from looking at billions
of web pages,
so we get a really good idea
about the way that people
really use language in practice.
So what we would do
is look at the likelihood
of "Let's met tomorrow"
and "Let's meat tomorrow,"
less likely,
and "Let's meet tomorrow,"
which is gonna be more likely
than either of these.
And we combine that
with our error model
which tells us how likely
the misspellings are,
you know, without any context,
to get a final determination
as to what are
the most likely words--
most likely word
that you meant right here.
So in this case,
we would suggest "meet."
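
[Illustration: a minimal sketch of combining the two signals just described, the language model's score for the whole sentence and the error model's score for the misspelling. Adding log-probabilities is one common way to combine them and is an assumption here, as are the tables `SENTENCE_LOGPROB` and `ERROR_PROB` and every number in them.]

```python
import math

# Hypothetical language-model log-probabilities for each full sentence.
SENTENCE_LOGPROB = {
    "let's met tomorrow": -14.0,
    "let's meat tomorrow": -16.0,
    "let's meet tomorrow": -8.0,
}

# Hypothetical error-model probabilities of typing "met" given each intent.
ERROR_PROB = {"met": 0.90, "meat": 0.02, "meet": 0.02}


def best_correction(template, candidates):
    """Pick the candidate with the highest combined log score."""
    def score(word):
        sentence = template.format(word)
        return SENTENCE_LOGPROB[sentence] + math.log(ERROR_PROB[word])
    return max(candidates, key=score)


print(best_correction("let's {} tomorrow", ["met", "meat", "meet"]))
```

[Leaving "met" alone is the cheapest choice for the error model, but the sentence score for "Let's meet tomorrow" is high enough in this toy setup to outweigh that.]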
Once we think
that a word is misspelled,
we need to get that back
to the Google Wave client
so that the user
can actually see it
and either correct it
automatically or manually.
Two kinds of ways
that this differs
from existing spelling systems.
One of them is just that
it's hosted.
And this means that we can do
this same kind of spelling
for you,
regardless of which device
you're connecting from.
So whether you're on your laptop
or your mobile or your desktop,
we can give the same
quality spelling, regardless.
And that applies
across languages too,
so, you know, we're doing this
for other alphabetic
languages also.
So like I said, we use large
statistical language models.
When I said large, you know,
we train them
from billions of words.
They end up being
many, many gigabytes.
It's pretty infeasible to run
these on a single machine,
which isn't such a problem
in a data center
where you can have
a set of machines
running a language model
and a spelling model together.
And then we can share
that spelling model
between many users
so that the cost per user
is very low.
So it's very efficient
for us to do this.
Once you realize
that you've got a system
that supports
collaborative editing,
that has structured data,
and that you can change
the user interface
by having remote participants,
then, really,
the sky's the limit.
I mean, there's all kinds
of existing
natural language technologies
like spell checking
or translation
that we can apply,
and we're seeing
a lot of new applications
as the way that we communicate
changes as well.
So, you know, really,
it's gonna be exciting times.