1
00:00:00,280 --> 00:00:03,950
Now I want to show you a visualization tool that I found online that I

2
00:00:03,950 --> 00:00:09,350
think does a really great job of helping you see what k-means clustering does.

3
00:00:09,350 --> 00:00:12,160
And that should give you a good intuition for how it works.

4
00:00:12,160 --> 00:00:15,800
So I'd like to give a special shout-out to Naftali Harris,

5
00:00:15,800 --> 00:00:19,890
who wrote this visualization and very kindly agreed to let us use it.

6
00:00:19,890 --> 00:00:22,990
I'll put a link to this website in the instructor notes so that you can go and

7
00:00:22,990 --> 00:00:24,640
play around with it on your own.

8
00:00:24,640 --> 00:00:29,720
So it starts out by asking me how to pick the initial centroids of my clusters.

9
00:00:29,720 --> 00:00:32,530
I'll start out with Randomly right now.

10
00:00:32,530 --> 00:00:34,490
What kind of data would I like to use?

11
00:00:34,490 --> 00:00:35,800
There are a number of different things here, and

12
00:00:35,800 --> 00:00:37,500
I encourage you to play around with them.

13
00:00:37,500 --> 00:00:40,630
A Gaussian mixture is really similar to one of the simple examples we've

14
00:00:40,630 --> 00:00:41,670
done so far.

15
00:00:41,670 --> 00:00:43,490
So Gaussian mixture data looks like this.

16
00:00:43,490 --> 00:00:45,460
These are all the points that we have to cluster.

17
00:00:45,460 --> 00:00:46,540
The first question for you is,

18
00:00:46,540 --> 00:00:49,470
how many centroids do you think is the correct number for this data?
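
For anyone following along in code rather than in the visualization, here is a minimal sketch of the two choices made so far: generating Gaussian mixture data and picking the initial centroids at random. This is not the code behind Naftali Harris's tool. The function names, cluster centers, spread, and seed are all made up for illustration, and the sketch assumes that "Randomly" means choosing k of the data points as starting centroids, which the tool may do differently.

```python
import random


def make_gaussian_mixture(centers, points_per_center, spread, rng):
    """Draw 2-D points around each center with isotropic Gaussian noise."""
    points = []
    for cx, cy in centers:
        for _ in range(points_per_center):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
    return points


def pick_random_centroids(points, k, rng):
    """Random initialization (assumed): use k distinct data points as the starting centroids."""
    return rng.sample(points, k)


# Hypothetical values chosen only for this illustration.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
data = make_gaussian_mixture(true_centers, points_per_center=50, spread=0.8, rng=rng)

initial_centroids = pick_random_centroids(data, k=3, rng=rng)
print(initial_centroids)
```

Changing `k` changes how many starting centroids are drawn, which is exactly the choice the question above asks about.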