## 21-38 Spelling Data

Now, here I show some data that I've gathered from sites that deal with spelling correction.
These are all examples of a correct spelling followed by one or more misspellings of it.
And from that we want to calculate the probability of a typed word given the correct word.
So for example, we would like to know what's the probability of P-L-U-S-E
being the word that's spelled when the correct word was "pulse."
And we do have examples of that here. We have a single example.
But it's clear that we're just not going to have enough data to cover all
the possible words we want to deal with and all the possible misspellings of those words.
There are so many words in English that, with only tens of thousands of examples,
we're not going to have them all.
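
To make the sparsity problem concrete, here is a minimal sketch of the word-level estimate. The `correct: misspelling, misspelling` line format, the sample entries other than "pulse"/"pluse", and the function names are all assumptions for illustration, not the actual data file. Note also that this only estimates the probability of a misspelling among the observed misspellings of a word, since the data says nothing about how often the word is spelled correctly.

```python
from collections import Counter, defaultdict

# Hypothetical sample; the "correct: wrong, wrong" format is an assumption.
SAMPLE = """\
pulse: pluse
elegant: elagant, elagent
"""

def parse_pairs(text):
    """Return a list of (correct, misspelling) pairs, one per misspelling."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        correct, _, wrongs = line.partition(":")
        for wrong in wrongs.split(","):
            wrong = wrong.strip()
            if wrong:
                pairs.append((correct.strip(), wrong))
    return pairs

def word_level_probs(pairs):
    """Relative frequency of each misspelling among those seen for a word."""
    counts = defaultdict(Counter)
    for correct, wrong in pairs:
        counts[correct][wrong] += 1
    return {
        correct: {w: n / sum(ws.values()) for w, n in ws.items()}
        for correct, ws in counts.items()
    }

probs = word_level_probs(parse_pairs(SAMPLE))
```

With a single example, `probs["pulse"]["pluse"]` comes out as 1.0, and any word that is not in the data has no entry at all, which is exactly the coverage problem.
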
Instead of trying to deal with word-to-word spelling errors,
let's deal with letter-to-letter errors.
And so let's not say that this is "pulse" misspelled as "pluse,"
but rather let's say this is U-L misspelled as L-U.
Here, let's say this is the E in "elegant" misspelled as an A.
And we'll look at these types of edits from one word to another:
a transposition of two adjacent letters, a replacement, or an insertion or deletion of a single letter.
We'll build up probability tables for those rather than probability tables for all the words.
That's much easier to do with a smaller amount of data.
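
One way to sketch the letter-level idea: strip the prefix and suffix that the correct word and the typed word share, then classify what is left as one of the four edit types. The function names and the tuple layout are assumptions for illustration, and the sketch simply skips pairs that differ by more than one edit. It collects raw counts only; turning those counts into probabilities needs a normalization choice that is not specified here.

```python
from collections import Counter

def single_edit(correct, typed):
    """Classify the one edit turning `correct` into `typed`, or return None.

    Returns (kind, correct_letters, typed_letters).
    """
    # Drop the common prefix.
    i = 0
    while i < len(correct) and i < len(typed) and correct[i] == typed[i]:
        i += 1
    c, t = correct[i:], typed[i:]
    # Drop the common suffix of what remains.
    j = 0
    while j < len(c) and j < len(t) and c[-1 - j] == t[-1 - j]:
        j += 1
    c, t = c[:len(c) - j], t[:len(t) - j]
    if len(c) == 2 and len(t) == 2 and c == t[::-1]:
        return ("transpose", c, t)
    if len(c) == 1 and len(t) == 1:
        return ("replace", c, t)
    if len(c) == 1 and len(t) == 0:
        return ("delete", c, t)
    if len(c) == 0 and len(t) == 1:
        return ("insert", c, t)
    return None  # identical words, or more than one edit

def edit_counts(pairs):
    """Count single-letter edits over (correct, typed) pairs."""
    counts = Counter()
    for correct, typed in pairs:
        edit = single_edit(correct, typed)
        if edit is not None:
            counts[edit] += 1
    return counts

# "pulse"/"pluse" is from the lecture; "elagant" is a made-up misspelling.
counts = edit_counts([("pulse", "pluse"), ("elegant", "elagant")])
```

Here "pulse" typed as "pluse" is recorded as U-L transposed to L-U, so the same count also informs every other word that contains "ul".
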
