Now, 2^n is a lot. For example, if we have 30 characters in our string, then there would be on the order of a billion possible segmentations to deal with. We clearly don't want to have to enumerate them all; we'd like some way of searching through them efficiently, without having to consider the probability of every possible segmentation.

That's one of the reasons why making this naive Bayes assumption is so helpful. It means there are no interdependencies between the various words, so we can consider them one at a time. That is, here's one thing we can say: the best segmentation is equal to the argmax, over all possible ways of splitting the string into a first word and the rest of the string, of the probability of that first word times the probability of the best segmentation of the rest. And notice that this is independent: the best segmentation of the rest doesn't depend on the first word. So that means we don't have to consider all the interactions, and we don't need to consider all 2^n possibilities.

So now we have two reasons why the naive Bayes assumption is a good thing. One is that it makes this computation much more efficient. And secondly, it makes learning easier, because it's easy to come up with a unigram probability: what's the probability of an individual word in our corpus of text? It's much harder to get probabilities for combinations of multiple words in sequence. We're going to have to do more smoothing, more guessing at what those probabilities are, because we just won't have the counts for them.
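As a rough sketch of that recurrence (not code from the lecture), here is one way it could look in Python. The `P_WORD` dictionary, the `p_word` fallback for unseen words, and the example string are made-up assumptions purely for illustration; the memoization via `lru_cache` is what keeps the search far below the 2^n possibilities, since each suffix of the string is only segmented once.

```python
from functools import lru_cache

# Toy unigram model: made-up probabilities for illustration only.
# In practice these would be estimated from a large corpus.
P_WORD = {"now": 0.01, "is": 0.02, "the": 0.05, "time": 0.005}

def p_word(word):
    # Assumed fallback for unseen words: a tiny probability that shrinks
    # with word length, so long implausible "words" are heavily penalized.
    return P_WORD.get(word, 1e-6 / 10 ** len(word))

@lru_cache(maxsize=None)
def segment(text):
    """Best segmentation of text under the unigram (naive Bayes) model.

    Implements the recurrence: best(text) = max over splits (first, rest)
    of P(first) * best(rest).  Returns (probability, tuple_of_words).
    """
    if not text:
        return 1.0, ()
    candidates = []
    for i in range(1, len(text) + 1):
        first, rest = text[:i], text[i:]
        p_rest, words = segment(rest)
        candidates.append((p_word(first) * p_rest, (first,) + words))
    return max(candidates)

prob, words = segment("nowisthetime")
print(words)  # with this toy model, should print ('now', 'is', 'the', 'time')
```

Because the probability of the best segmentation of the rest doesn't depend on the first word, each suffix has a single best answer that the cache can reuse, so the work is roughly quadratic in the length of the string rather than exponential.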
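On the learning side, here is roughly why the unigram case is the easy one: a single counting pass over the corpus gives an estimate for every individual word, whereas counts for multi-word sequences are far sparser and need more smoothing. The tiny corpus below is a hypothetical example, just to show the shape of the computation.

```python
from collections import Counter

def unigram_model(tokens):
    """Relative-frequency estimate of P(word) from a list of corpus tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Hypothetical toy corpus, purely for illustration.
model = unigram_model("now is the time now is the hour".split())
print(model["now"])  # 2 / 8 = 0.25
```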