Now I want to show you a visualization tool I found online that I think does
a really great job of helping you see what k-means clustering does.
It should give you a good intuition for how the algorithm works.
So I'd like to give a special shout-out to Naftali Harris,
who wrote this visualization and very kindly agreed to let us use it.
I'll put a link to this website in the instructor notes so that you can go
and play around with it on your own.
It starts out by asking me how to pick the initial centroids of my clusters.
I'll go with "Randomly" for now.
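To make "pick the initial centroids randomly" concrete, here is a minimal sketch of one common approach: choose k distinct data points at random and use them as the starting centroids. The function name `init_centroids_randomly` and its parameters are hypothetical, and the visualization's "Randomly" option may work somewhat differently; this only illustrates the general idea.

```python
import random

def init_centroids_randomly(points, k, seed=None):
    """Pick k distinct data points at random to serve as initial centroids.

    Hypothetical helper: this is one common random-initialization scheme,
    not necessarily what the visualization's "Randomly" option does.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    return rng.sample(points, k)  # k points chosen without replacement

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9)]
centroids = init_centroids_randomly(points, 2, seed=0)
print(centroids)
```

Because the starting centroids are random, running k-means twice can give different final clusters, which is one reason it is worth trying the other initialization options in the tool.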
What kind of data would I like to use?
There are a number of different datasets here, and
I encourage you to play around with them.
I'll pick Gaussian Mixture, since it's really similar to one of the simple
examples we've done so far.
So Gaussian mixture data looks like this.
These are all the points that we have to cluster.
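If you want data like this outside the tool, here is a minimal sketch that samples 2-D points from a few Gaussian blobs with equal weights. The function name `sample_gaussian_mixture`, the centers, and the spread are all hypothetical values chosen for illustration; they are not the parameters the visualization uses.

```python
import random

def sample_gaussian_mixture(centers, n_per_center, spread=0.5, seed=None):
    """Draw 2-D points from equally weighted, isotropic Gaussian blobs.

    centers: list of (x, y) means, one per mixture component (hypothetical).
    n_per_center: number of points drawn around each center.
    spread: standard deviation used for both coordinates.
    """
    rng = random.Random(seed)
    points = []
    for cx, cy in centers:
        for _ in range(n_per_center):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
    # Shuffle and drop the blob labels: k-means only ever sees raw points.
    rng.shuffle(points)
    return points

data = sample_gaussian_mixture([(0.0, 0.0), (4.0, 4.0)], n_per_center=50, seed=1)
print(len(data))
```

Note that the returned points carry no labels, which is what makes this a clustering problem rather than a classification one.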
The first question for you is:
how many centroids do you think is the right number for this data?