Google Wave: Natural Language Processing

Author: Casey Whitelaw. Link to original: http://www.youtube.com/watch?v=Sx3Fpw0XCXk (English).



Whitelaw: Hi. My name
is Casey Whitelaw.

I'm the Tech Lead

for the Natural Language
Processing Group

here in Sydney,
and today I'm gonna talk to you

a little bit about

some of the cool things
that we've added to Google Wave.

So one of the main things

that we want to stay focused on
in Google Wave is productivity.

We want users to be able
to stay productive,

whether they're reading
or whether they're writing.

One of the ways
that we've done that

is with our
spell correction system.

What we'd like is for users
just to be able to

focus on what they're typing
and not worry about

whether there are any mistakes
they've made.

We think that if people could
just loosen up a little bit

and, you know,
maybe type 5% faster,

then that's 5% less time
that they spend typing.

So I'll start with an example.

It's probably the easiest way
to explain.

Let's say you want to meet up
with one of your friends.

You're having a chat.

So you write...

Let's...

met...

whoops...

tomorrow.

So here you see
I've made a mistake.

I've written "met"
instead of "meet" here.

My finger slipped on the "e."

So now, the way that we
implemented spelling

is we introduced an automatic
participant called Spelly

who works just like
another user

that's participating
on the wave with you.

So Spelly's on your wave
with you,

and it can see that you've
typed "Let's met tomorrow,"

and it's now gonna try
and spell-check it.

For each word...

it doesn't have any kind
of dictionary,

so it doesn't know whether
"met" is a well-spelled word

or a misspelling.

So to start with,
it comes up with a list

of possible candidate
corrections for this word.

So some examples of that
might be...

"meat," the food...

or "meet," the correctly
spelled version of this.

And you can imagine
lots of others.

So set or net or me--

all kinds of different words
that we would evaluate

to see whether they're what
you actually meant to type.
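
A minimal sketch of this candidate step, assuming candidates are the known words within one edit of what was typed. The talk does not say how Spelly generates candidates; KNOWN_WORDS, edits1, and candidates are hypothetical names, and the tiny word set stands in for whatever was learned from the web.

```python
import string

# Hypothetical stand-in for words observed on the web (an assumption;
# the talk does not describe the real data source).
KNOWN_WORDS = {"meet", "meat", "met", "set", "net", "me", "let's", "tomorrow"}


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def candidates(word):
    """Known words within one edit of word, plus the typed word itself."""
    return sorted((edits1(word) & KNOWN_WORDS) | {word})


print(candidates("met"))  # ['me', 'meat', 'meet', 'met', 'net', 'set']
```

The typed word stays in the list because, without a dictionary, "met" itself may be exactly what was meant.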

We've learned from the web

the kind of misspellings
that people make

and which things
are more and less likely.

So we know that,
for instance,

maybe slipping
and inserting an "A"

is relatively likely,

but misspelling
the very first letter

might be less likely
in this case.
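
A toy error model along these lines might look like the sketch below. The three probabilities are made up for illustration; the talk only says that such likelihoods are learned from the web, and error_model is a hypothetical name.

```python
# Hypothetical probabilities, chosen only for illustration. A real model
# would estimate them from observed misspellings.
P_NO_ERROR = 0.95            # the user typed exactly what they meant
P_EDIT_INSIDE = 0.01         # one slip somewhere after the first letter
P_EDIT_FIRST_LETTER = 0.001  # getting the very first letter wrong


def error_model(typed, intended):
    """Toy estimate of P(typed | intended) for words at most one edit apart."""
    if typed == intended:
        return P_NO_ERROR
    if typed[:1] != intended[:1]:
        return P_EDIT_FIRST_LETTER
    return P_EDIT_INSIDE


for intended in ["met", "meet", "meat", "set", "net"]:
    print(intended, error_model("met", intended))
```

Under these assumed numbers, "meet" and "meat" are ten times more plausible sources of "met" than "set" or "net," which differ in the first letter.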

So we've got some suggestions,
and the next thing that we do

is evaluate these suggestions
in context.

So there are other systems
at Google that already use

the same kind of statistical
language models as this,

such as the Google
translation system,

that essentially
encode information

about how language is used.

These are learned from the web

from looking at billions
of web pages,

so we get a really good idea

about the way that people
really use language in practice.

So what we would do

is look at the likelihood
of "Let's met tomorrow"

and "Let's meat tomorrow,"
less likely,

and "Let's meet tomorrow,"

which is gonna be more likely
than either of these.
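
To illustrate scoring in context, here is a small sketch that swaps in an add-one smoothed bigram model trained on a four-sentence made-up corpus. The real models are far larger and their form is not described in the talk; CORPUS, tokens, and log_prob are hypothetical.

```python
import math
from collections import Counter

# Tiny made-up corpus standing in for billions of web pages (an assumption).
CORPUS = [
    "let's meet tomorrow",
    "let's meet for lunch",
    "we met yesterday",
    "i like meat",
]


def tokens(sentence):
    """Lowercase, split on whitespace, and prepend a start-of-sentence marker."""
    return ["<s>"] + sentence.lower().split()


bigrams = Counter()   # counts of (previous word, word) pairs
contexts = Counter()  # how often each word appears as the previous word
vocab = set()
for line in CORPUS:
    toks = tokens(line)
    vocab.update(toks)
    for prev, word in zip(toks, toks[1:]):
        bigrams[(prev, word)] += 1
        contexts[prev] += 1


def log_prob(sentence):
    """Add-one smoothed bigram log-probability of the sentence."""
    toks = tokens(sentence)
    v = len(vocab)
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + v))
        for prev, word in zip(toks, toks[1:])
    )


for s in ["Let's met tomorrow", "Let's meat tomorrow", "Let's meet tomorrow"]:
    print(s, round(log_prob(s), 3))
```

With this corpus, "Let's meet tomorrow" gets the highest score of the three; the ordering of the other two depends entirely on the toy data.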

And we combine that
with our error model

which tells us how likely
the misspellings are,

you know, without any context,
to get a final determination

as to what are
the most likely words--

most likely word
that you meant right here.

So in this case,
we would suggest meet.
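
One common way to combine the two models is to multiply the error-model probability by the context probability and pick the highest product. The talk does not state the exact combination rule, so treat this as an assumption. All numbers and names below are hypothetical.

```python
import math

# Hypothetical P(typed = "met" | intended word) from an error model.
ERROR_MODEL = {"met": 0.95, "meet": 0.01, "meat": 0.01, "set": 0.001, "net": 0.001}

# Hypothetical P("Let's <word> tomorrow") from a language model.
CONTEXT_MODEL = {"met": 1e-9, "meet": 1e-6, "meat": 1e-10, "set": 1e-9, "net": 1e-10}


def score(word):
    """Log of the product of the two probabilities (sum of the logs)."""
    return math.log(ERROR_MODEL[word]) + math.log(CONTEXT_MODEL[word])


def best_correction():
    """Candidate with the highest combined score."""
    return max(ERROR_MODEL, key=score)


print(best_correction())  # meet
```

Here "met" is by far the most likely thing to type when you mean "met," but the context makes "meet" about a thousand times more likely, so the combined score still favors "meet."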

Once we think
that a word is misspelled,

we need to get that back
to the Google Wave client

so that the user
can actually see it

and either correct it
automatically or manually.
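
The talk does not describe what Spelly sends back or how the client chooses between automatic and manual correction. As a sketch only, assume a suggestion carries a character range, a replacement, and a confidence, and that a threshold decides. Suggestion, apply_suggestion, and AUTO_CORRECT_THRESHOLD are all hypothetical.

```python
from dataclasses import dataclass

AUTO_CORRECT_THRESHOLD = 0.9  # hypothetical cut-off, not from the talk


@dataclass
class Suggestion:
    start: int          # index of the first character of the flagged word
    end: int            # index one past its last character
    replacement: str    # proposed correction
    confidence: float   # how sure the spell checker is, between 0 and 1


def apply_suggestion(text, s):
    """Return (text, flagged): auto-correct if confident, otherwise just flag."""
    if s.confidence >= AUTO_CORRECT_THRESHOLD:
        return text[:s.start] + s.replacement + text[s.end:], False
    return text, True


text = "Let's met tomorrow"
fixed, flagged = apply_suggestion(text, Suggestion(6, 9, "meet", 0.97))
print(fixed)  # Let's meet tomorrow
```

A lower-confidence suggestion leaves the text unchanged and returns flagged as True, so the client could underline the word and let the user decide.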

There are two ways

that this differs
from existing spelling systems.

One of them is just that
it's hosted.

And this means that we can do

this same kind of spelling
for you,

regardless of which device
you're connecting from.

So whether you're on your laptop
or your mobile or your desktop,

we can give the same
quality spelling, regardless.

And that applies
across languages too,

so, you know, we're doing this

for other alphabetic
languages also.

So like I said, we use large
statistical language models.

When I said large, you know,

we train them
from billions of words.

They end up being
many, many gigabytes.


© Google.