[Script Info]
Title:

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:02.00,Default,,0000,0000,0000,,Now, 2^n is a lot.
Dialogue: 0,0:00:02.00,0:00:09.00,Default,,0000,0000,0000,,For example, if we have 30 characters in our string, then there'd be a billion possible segmentations to deal with.
Dialogue: 0,0:00:09.00,0:00:12.00,Default,,0000,0000,0000,,We clearly don't want to have to enumerate them all.
Dialogue: 0,0:00:12.00,0:00:15.00,Default,,0000,0000,0000,,We'd like some way of searching through them efficiently
Dialogue: 0,0:00:15.00,0:00:19.00,Default,,0000,0000,0000,,without having to consider the probability of every possible segmentation.
Dialogue: 0,0:00:19.00,0:00:25.00,Default,,0000,0000,0000,,That's one of the reasons why making this naive Bayes assumption is so helpful.
Dialogue: 0,0:00:25.00,0:00:29.00,Default,,0000,0000,0000,,It means that there are no interrelations between the various words,
Dialogue: 0,0:00:29.00,0:00:31.00,Default,,0000,0000,0000,,so we can consider them one at a time.
Dialogue: 0,0:00:31.00,0:00:33.00,Default,,0000,0000,0000,,That is, here's one thing we can say.
Dialogue: 0,0:00:33.00,0:00:39.00,Default,,0000,0000,0000,,We can say that the best segmentation is equal to the argmax
Dialogue: 0,0:00:39.00,0:00:45.00,Default,,0000,0000,0000,,over all possible segmentations of the string into a first word and the rest of the words
Dialogue: 0,0:00:45.00,0:00:53.00,Default,,0000,0000,0000,,of the probability of that first word times the probability of the best segmentation of the rest of the words.
Dialogue: 0,0:00:53.00,0:00:55.00,Default,,0000,0000,0000,,And notice that this is independent.
Dialogue: 0,0:00:55.00,0:01:00.00,Default,,0000,0000,0000,,The best segmentation of the rest of the words doesn't depend on the first word.
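The recurrence described here — best segmentation = argmax over every (first word, rest) split of P(first) times the probability of the best segmentation of the rest — can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's actual code: the `UNIGRAM` probabilities are made-up toy values, and memoization (`lru_cache`) exploits exactly the independence the transcript points out, since the best segmentation of the rest doesn't depend on the first word.

```python
from functools import lru_cache

# Toy unigram model: word -> probability. These words and numbers are
# invented for illustration; a real model is estimated from a corpus.
UNIGRAM = {
    "now": 0.04, "is": 0.06, "the": 0.08, "time": 0.02,
}

def p_word(word):
    # Unseen words get a tiny nonzero probability (a crude stand-in
    # for smoothing) so no segmentation has probability exactly zero.
    return UNIGRAM.get(word, 1e-10)

@lru_cache(maxsize=None)
def segment(text):
    """Return (probability, words) for the best segmentation of `text`.

    Tries every split into a first word and the rest; because the best
    segmentation of the rest is independent of the first word, results
    for each suffix can be cached and reused.
    """
    if not text:
        return (1.0, [])
    best_prob, best_words = 0.0, [text]
    for i in range(1, len(text) + 1):
        first, rest = text[:i], text[i:]
        rest_prob, rest_words = segment(rest)
        prob = p_word(first) * rest_prob
        if prob > best_prob:
            best_prob, best_words = prob, [first] + rest_words
    return (best_prob, best_words)
```

With memoization, each of the n suffixes is solved once, so the work is roughly O(n^2) splits instead of the 2^(n-1) full enumerations.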
Dialogue: 0,0:01:00.00,0:01:03.00,Default,,0000,0000,0000,,And so that means we don't have to consider all interactions,
Dialogue: 0,0:01:03.00,0:01:06.00,Default,,0000,0000,0000,,and we don't need to consider all 2^n possibilities.
Dialogue: 0,0:01:06.00,0:01:10.00,Default,,0000,0000,0000,,So now we have two reasons why the naive Bayes assumption is a good thing.
Dialogue: 0,0:01:10.00,0:01:13.00,Default,,0000,0000,0000,,One is it makes this computation much more efficient,
Dialogue: 0,0:01:13.00,0:01:16.00,Default,,0000,0000,0000,,and secondly, it makes learning easier,
Dialogue: 0,0:01:16.00,0:01:19.00,Default,,0000,0000,0000,,because it's easy to come up with a unigram probability.
Dialogue: 0,0:01:19.00,0:01:23.00,Default,,0000,0000,0000,,What's the probability of an individual word from our corpus of text?
Dialogue: 0,0:01:23.00,0:01:27.00,Default,,0000,0000,0000,,It's much harder to estimate probabilities for sequences of multiple words.
Dialogue: 0,0:01:27.00,0:01:32.00,Default,,0000,0000,0000,,We're going to have to do more smoothing, more guessing at what those probabilities are,
Dialogue: 0,0:01:32.00,9:59:59.99,Default,,0000,0000,0000,,because we just won't have the counts for them.
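The closing point — unigram probabilities are easy to estimate from corpus counts, and smoothing covers the words we never counted — can be sketched as follows. The tiny corpus and the choice of add-one (Laplace) smoothing are illustrative assumptions; the transcript only says "more smoothing" without naming a method.

```python
from collections import Counter

# A tiny stand-in corpus; a real unigram model would be estimated
# from a much larger text collection.
corpus = "the cat sat on the mat the cat slept".split()

counts = Counter(corpus)
total = sum(counts.values())
vocab_size = len(counts)

def unigram_prob(word, alpha=1.0):
    """Add-alpha (Laplace) smoothed unigram probability.

    Seen words get roughly count/total; unseen words get a small
    nonzero probability, since we won't have counts for everything.
    """
    return (counts[word] + alpha) / (total + alpha * (vocab_size + 1))
```

Estimating probabilities for multi-word sequences this way is much harder: the number of possible n-word combinations explodes, so most of their counts are zero and nearly everything rests on smoothing.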