## 21-38 Spelling Data

• 0:00 - 0:06
Now, here I show some data that I've gathered from sites that deal with spelling correction,
• 0:06 - 0:12
and these are all examples of the correct spelling followed by one or more
• 0:12 - 0:15
misspellings of that word.
• 0:15 - 0:22
And from that we want to calculate the probability of a misspelled word given the correct word.
• 0:22 - 0:29
So for example, we would like to know what's the probability of P-L-U-S-E
• 0:29 - 0:33
being the word that's spelled when the correct word was "pulse."
• 0:33 - 0:38
And we do have examples of that here. We have a single example.
• 0:38 - 0:42
But it's clear that we're just not going to have enough to cover all
• 0:42 - 0:46
the possible words we want to deal with and all the possible misspellings for those words.
• 0:46 - 0:49
With only tens of thousands of examples,
• 0:49 - 0:53
there are so many words in English that we're not going to have them all.
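
The word-to-word estimate described above can be sketched as a simple count over (correct word, misspelling) pairs. This is only an illustration: the `SAMPLE` text, its "correct: misspelling ..." layout, and the helper names `parse_pairs` and `word_level_probs` are assumptions, not the lecture's actual data or code. The estimate is also taken only over observed misspellings, so it ignores how often the word is typed correctly.

```python
from collections import Counter, defaultdict

# Hypothetical sample, one "correct: misspelling misspelling ..." entry per line.
SAMPLE = """\
pulse: pluse
elegant: elagant elegent
"""

def parse_pairs(text):
    """Turn the sample text into (correct, misspelling) pairs."""
    pairs = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        correct, wrongs = line.split(":", 1)
        for wrong in wrongs.split():
            pairs.append((correct.strip(), wrong))
    return pairs

def word_level_probs(pairs):
    """Estimate P(misspelling | correct) by relative frequency."""
    counts = defaultdict(Counter)
    for correct, wrong in pairs:
        counts[correct][wrong] += 1
    return {
        correct: {w: n / sum(ws.values()) for w, n in ws.items()}
        for correct, ws in counts.items()
    }

probs = word_level_probs(parse_pairs(SAMPLE))
print(probs["pulse"]["pluse"])   # the single observed example
print(probs.get("rhythm"))       # None: no data at all for unseen words
```

The second lookup shows the sparsity problem: any correct word missing from the data has no estimate at all.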
• 0:53 - 0:57
Instead of trying to deal with word-to-word spelling errors,
• 0:57 - 1:00
let's deal with letter-to-letter errors.
• 1:00 - 1:06
And so let's not say that this is "pulse" misspelled as "pluse,"
• 1:06 - 1:12
but rather let's say this is U-L misspelled as L-U.
• 1:12 - 1:19
Here, let's say this is the E in "elegant" misspelled as an A.
• 1:19 - 1:24
And we'll look at these types of edits from one word to another,
• 1:24 - 1:32
a transposition of two adjacent letters, a replacement, or an insertion or deletion of a single letter.
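
As a rough sketch of how a word pair could be reduced to one of these four edit types, the function below strips the common prefix and then checks each case in turn. The function name `classify_edit` and its return format are assumptions made for illustration; it only handles pairs that differ by exactly one edit and returns `None` otherwise.

```python
def classify_edit(correct, typed):
    """Return (kind, correct_fragment, typed_fragment) if `typed` is exactly
    one edit away from `correct`, otherwise None."""
    # Skip the prefix the two words share.
    i = 0
    while i < len(correct) and i < len(typed) and correct[i] == typed[i]:
        i += 1
    c_rest, t_rest = correct[i:], typed[i:]

    if len(c_rest) == len(t_rest):
        if not c_rest:
            return None  # identical words, no edit
        if c_rest[1:] == t_rest[1:]:
            return ("replacement", c_rest[0], t_rest[0])
        if (len(c_rest) >= 2
                and c_rest[0] == t_rest[1]
                and c_rest[1] == t_rest[0]
                and c_rest[2:] == t_rest[2:]):
            return ("transposition", c_rest[:2], t_rest[:2])
    elif len(t_rest) == len(c_rest) + 1 and t_rest[1:] == c_rest:
        return ("insertion", "", t_rest[0])
    elif len(c_rest) == len(t_rest) + 1 and c_rest[1:] == t_rest:
        return ("deletion", c_rest[0], "")
    return None  # more than one edit apart

print(classify_edit("pulse", "pluse"))
print(classify_edit("elegant", "elagant"))
```

The misspelling "elagant" here is a made-up example of an E typed as an A, not an entry taken from the lecture's data.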
• 1:32 - 1:37
We'll build up probability tables for those rather than probability tables for all the words.
• 1:37 -
That's much easier to do with a smaller amount of data.
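
One way to picture such a letter-level table is as relative frequencies over observed edits. The sketch below assumes the edits have already been extracted as (correct fragment, typed fragment) pairs; the `observed_edits` list is invented for illustration, and `edit_probability_table` is a hypothetical helper. It normalizes only over observed errors, so a fuller model would also need counts of how often each fragment was typed correctly.

```python
from collections import Counter, defaultdict

# Hypothetical edit observations: (correct_fragment, typed_fragment).
observed_edits = [
    ("ul", "lu"),  # e.g. "pulse" typed as "pluse"
    ("e", "a"),    # e.g. an E typed as an A
    ("e", "a"),
    ("e", "i"),
    ("ul", "lu"),
]

def edit_probability_table(edits):
    """Estimate P(typed_fragment | correct_fragment) by relative frequency."""
    counts = defaultdict(Counter)
    for correct_frag, typed_frag in edits:
        counts[correct_frag][typed_frag] += 1
    table = {}
    for correct_frag, typed_counts in counts.items():
        total = sum(typed_counts.values())
        table[correct_frag] = {t: n / total for t, n in typed_counts.items()}
    return table

table = edit_probability_table(observed_edits)
print(table["e"])
print(table["ul"])
```

Because the table is keyed by letters rather than words, an edit seen in one word contributes evidence for every other word containing the same letters, which is why far less data is needed.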
Title: 21-38 Spelling Data
Team: Udacity
Project: CS271 - Intro to Artificial Intelligence
Duration: 01:42