WEBVTT

00:00:00.280 --> 00:00:03.950
Now I want to show you a visualization tool that I found online that I

00:00:03.950 --> 00:00:09.350
think does a really great job of helping you see what k-means clustering does.

00:00:09.350 --> 00:00:12.160
And that should give you a good intuition for how it works.

00:00:12.160 --> 00:00:15.800
So I'd like to give a special shout out to Naftali Harris,

00:00:15.800 --> 00:00:19.890
who wrote this visualization and very kindly agreed to let us use it.

00:00:19.890 --> 00:00:22.990
I'll put a link to this website in the instructor notes that you can go and

00:00:22.990 --> 00:00:24.640
play around with it on your own.

00:00:24.640 --> 00:00:29.720
So it starts out by asking me how to pick the initial centroids of my clusters.

00:00:29.720 --> 00:00:32.530
I'll start out with Randomly right now.

00:00:32.530 --> 00:00:34.490
What kind of data would I like to use?

00:00:34.490 --> 00:00:35.800
There are a number of different things here, and

00:00:35.800 --> 00:00:37.500
I encourage you to play around with them.

00:00:37.500 --> 00:00:40.630
A Gaussian Mixture has been really similar to one of the simple examples we've

00:00:40.630 --> 00:00:41.670
done so far.

00:00:41.670 --> 00:00:43.490
So Gaussian mixture data looks like this.

00:00:43.490 --> 00:00:45.460
These are all the points that we have to classify.

00:00:45.460 --> 00:00:46.540
The first question for you is,

00:00:46.540 --> 00:00:49.470
how many centroids do you think is the correct number of centroids on this data?