21-30 Best Segmentation 1

  • 0:00 - 0:02
    Now, 2^n is a lot.
  • 0:02 - 0:09
    For example, if we have 30 characters in our string, then there'd be about a billion possible segmentations to deal with.
  • 0:09 - 0:12
    We clearly don't want to have to enumerate them all.
  • 0:12 - 0:15
    We'd like some way of searching through them efficiently
  • 0:15 - 0:19
    without having to consider the probability of every possible segmentation.
  • 0:19 - 0:25
    That's one of the reasons why making this naive Bayes assumption is so helpful.
  • 0:25 - 0:29
    It means that there are no interrelations between the various words,
  • 0:29 - 0:31
    so we can consider them one at a time.
  • 0:31 - 0:33
    That is, here's one thing we can say.
  • 0:33 - 0:39
    We can say that the best segmentation is equal to the argmax
  • 0:39 - 0:45
    over all possible segmentations of the string into a first word and the rest of the words
  • 0:45 - 0:53
    of the probability of that first word times the probability of the best segmentation of the rest of the words.
  • 0:53 - 0:55
    And notice that this is independent.
  • 0:55 - 1:00
    The best segmentation of the rest of the words doesn't depend on the first word.
  • 1:00 - 1:03
    And so that means we don't have to consider all interactions,
  • 1:03 - 1:06
    and we don't need to consider all 2^n possibilities.
  • 1:06 - 1:10
    So now we have two reasons why the naive Bayes assumption is a good thing.
  • 1:10 - 1:13
    One is that it makes this computation much more efficient,
  • 1:13 - 1:16
    and secondly, it makes learning easier,
  • 1:16 - 1:19
    because it's easy to come up with a unigram probability.
  • 1:19 - 1:23
    What's the probability of an individual word from our corpus of text?
  • 1:23 - 1:27
    It's much harder to get combinations of multiple word sequences.
  • 1:27 - 1:32
    We're going to have to do more smoothing, more guessing what those probabilities are,
  • 1:32 -
    because we just won't have the counts for them.
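
The recurrence described in the transcript (best segmentation = argmax over splits into a first word and the rest of P(first) times the probability of the best segmentation of the rest) can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's own code: the `COUNTS` table, the unseen-word penalty in `word_prob`, and the `MAX_WORD_LEN` cap are all made-up assumptions. Memoizing on the remaining text is what avoids enumerating all 2^n segmentations, since the best segmentation of a given suffix is computed once and reused.

```python
from functools import lru_cache
from math import prod

# Hypothetical toy unigram counts; a real model would be estimated from a corpus.
COUNTS = {"the": 50, "best": 10, "segment": 5, "a": 30, "be": 8, "st": 1}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = 20  # assumed cap on candidate word length


def word_prob(word):
    """Unigram probability, with a crude penalty for unseen words (an assumption)."""
    if word in COUNTS:
        return COUNTS[word] / TOTAL
    # Longer unknown strings are treated as less likely to be a single word.
    return 1.0 / (TOTAL * 10 ** len(word))


def seq_prob(words):
    """Naive Bayes assumption: the words are independent, so multiply."""
    return prod(word_prob(w) for w in words)


@lru_cache(maxsize=None)
def best_segmentation(text):
    """Most probable split of text into words, returned as a tuple."""
    if not text:
        return ()
    candidates = []
    for i in range(1, min(len(text), MAX_WORD_LEN) + 1):
        first, rest = text[:i], text[i:]
        # The best split of `rest` does not depend on `first`, so it is
        # computed once per distinct suffix and then served from the cache.
        candidates.append((first,) + best_segmentation(rest))
    return max(candidates, key=seq_prob)


print(best_segmentation("thebest"))  # ('the', 'best')
```

With the cache, each suffix of the string is solved once and each call tries at most `MAX_WORD_LEN` first words, so the number of subproblems grows linearly with the string length rather than exponentially.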
Title:
21-30 Best Segmentation 1
Team:
Udacity
Project:
CS271 - Intro to Artificial Intelligence
Duration:
01:36