
21-38 Spelling Data

  • 0:00 - 0:06
    Now, here I show some data that I've gathered from sites that deal with spelling correction,
  • 0:06 - 0:12
    and these are all examples of a correct spelling followed by its misspellings,
  • 0:12 - 0:15
    sometimes several of them.
  • 0:15 - 0:22
    And from that we want to calculate the probability of a misspelled word given the correct word.
  • 0:22 - 0:29
    So for example, we would like to know what's the probability of P-L-U-S-E
  • 0:29 - 0:33
    being the word that's spelled when the correct word was "pulse."
  • 0:33 - 0:38
    And we do have an example of that here, just a single one.
  • 0:38 - 0:42
    But it's clear that we're just not going to have enough to cover all
  • 0:42 - 0:46
    the possible words we want to deal with and all the possible misspellings for those words.
  • 0:46 - 0:49
    With only tens of thousands of examples,
  • 0:49 - 0:53
    and so many words in English, we're not going to have them all.
  • 0:53 - 0:57
    Instead of trying to deal with word-to-word spelling errors,
  • 0:57 - 1:00
    let's deal with letter-to-letter errors.
  • 1:00 - 1:06
    And so let's not say that this is "pulse" misspelled as "pluse,"
  • 1:06 - 1:12
    but rather let's say this is U-L misspelled as L-U.
  • 1:12 - 1:19
    Here, let's say this is the E in "elegant" misspelled as an A.
  • 1:19 - 1:24
    And we'll look at these types of edits from one word to another,
  • 1:24 - 1:32
    a transposition of two adjacent letters, a replacement of one letter with another, or an insertion or deletion of a single letter.
  • 1:32 - 1:37
    We'll build up probability tables for those rather than probability tables for all the words.
  • 1:37 -
    That's much easier to do with a smaller amount of data.
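The move from word-level to letter-level estimates can be written as count ratios. This is a sketch assuming simple relative-frequency (maximum-likelihood) estimates, which the lecture does not spell out. Here count(pulse → pluse) is the number of recorded examples where "pulse" was typed as "pluse", count(pulse) the number of examples with "pulse" as the intended word, and likewise count(ul → lu) and count(ul) for the intended letter pair "ul".

```latex
% Word level: needs examples of this exact word pair
P(\text{pluse} \mid \text{pulse}) \approx
  \frac{\operatorname{count}(\text{pulse} \to \text{pluse})}{\operatorname{count}(\text{pulse})}

% Letter level: pools every word in which "ul" was typed as "lu"
P(\text{pluse} \mid \text{pulse}) \approx P(\text{lu} \mid \text{ul}) =
  \frac{\operatorname{count}(\text{ul} \to \text{lu})}{\operatorname{count}(\text{ul})}
```

The letter-level ratio draws on every misspelling that contains the same swap, so it can still be estimated for word pairs that never appear in the data.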
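The letter-level tables can be sketched in Python. This is a minimal illustration, not the course's code: the function names `classify_edit` and `build_edit_table`, every example pair other than pulse/pluse, and the choice to normalize each table over observed errors only (giving the probability of what was typed, given the intended letters and that an edit of that kind occurred) are assumptions. Each pair is assumed to differ by exactly one of the four edits; anything else is skipped.

```python
from collections import Counter, defaultdict


def classify_edit(correct, typed):
    """Return (kind, intended, typed_letters) for a single edit, or None.

    kind is "transposition", "replacement", "deletion", or "insertion".
    """
    # Find the first position where the two strings differ.
    i = 0
    while i < min(len(correct), len(typed)) and correct[i] == typed[i]:
        i += 1
    if len(correct) == len(typed):
        if i == len(correct):
            return None  # identical words: no edit
        # Two adjacent letters swapped, e.g. "ul" typed as "lu".
        if (i + 1 < len(correct)
                and correct[i] == typed[i + 1]
                and correct[i + 1] == typed[i]
                and correct[i + 2:] == typed[i + 2:]):
            return ("transposition", correct[i:i + 2], typed[i:i + 2])
        # One letter replaced by another, e.g. "e" typed as "a".
        if correct[i + 1:] == typed[i + 1:]:
            return ("replacement", correct[i], typed[i])
    elif len(correct) == len(typed) + 1:
        # One letter of the correct word was left out.
        if correct[i + 1:] == typed[i:]:
            return ("deletion", correct[i], "")
    elif len(correct) + 1 == len(typed):
        # One extra letter was typed.
        if correct[i:] == typed[i + 1:]:
            return ("insertion", "", typed[i])
    return None  # not a single-letter edit


def build_edit_table(pairs):
    """Estimate P(typed letters | edit kind, intended letters) from pairs."""
    counts = defaultdict(Counter)
    for correct, typed in pairs:
        edit = classify_edit(correct, typed)
        if edit is not None:
            kind, intended, typed_letters = edit
            counts[(kind, intended)][typed_letters] += 1
    # Normalize each row of counts into relative frequencies.
    return {
        key: {t: n / sum(row.values()) for t, n in row.items()}
        for key, row in counts.items()
    }


# Hypothetical example pairs; only pulse/pluse comes from the lecture.
PAIRS = [
    ("pulse", "pluse"),
    ("elegant", "elagant"),
    ("independent", "independant"),
    ("despair", "dispair"),
    ("their", "thier"),
    ("receive", "recieve"),
    ("accommodate", "acommodate"),
    ("until", "untill"),
]

if __name__ == "__main__":
    table = build_edit_table(PAIRS)
    print(table[("transposition", "ul")])  # {'lu': 1.0}
    print(table[("replacement", "e")])     # "a" at 2/3, "i" at 1/3
```

Keying each table on the intended letters is what lets "their"/"thier" and "receive"/"recieve" pool into one estimate for "ei" typed as "ie", which is the data-sharing the lecture is after.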
Title: 21-38 Spelling Data
Team: Udacity
Project: CS271 - Intro to Artificial Intelligence
Duration: 01:42