
Title:
PCA in sklearn - Intro to Machine Learning

Description:

The principal component analysis that I'm doing happens to live in this function called doPCA. And it looks very similar to a lot of the stuff we've done before in scikit-learn.

You have an import statement, where you actually get the module that has the code you want. You create, in this case, the principal component analysis. You fit it. And then you can return that as an object.
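That import-create-fit-return pattern might be sketched like this. The function name doPCA comes from the lecture; the data argument is a placeholder for whatever dataset the quiz supplies:

```python
from sklearn.decomposition import PCA

def doPCA(data):
    # create the PCA object, asking for two principal components
    pca = PCA(n_components=2)
    # fit it to the data, an array of shape (n_samples, n_features)
    pca.fit(data)
    # return the fitted object so we can inspect its attributes
    return pca
```
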

And so that's the way I get my principal component analysis. And I can ask some very important questions of it by accessing its attributes.

So let's explain these three lines. This is how I actually get the information out of my PCA object. The first one, the explained variance ratio, is actually where the eigenvalues live. So by printing out this line, I know that the first principal component has about 90 or 91 percent of the variation in the data, and the second one has about 9 or 10 percent. Those numbers come from this statement.
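As a sketch, with made-up two-dimensional data standing in for the quiz dataset (the exact percentages you see will depend on the data you fit):

```python
import numpy as np
from sklearn.decomposition import PCA

# made-up correlated 2-D data; the quiz uses its own dataset
rng = np.random.RandomState(42)
x = rng.normal(size=200)
data = np.column_stack([x, 0.3 * x + 0.05 * rng.normal(size=200)])

pca = PCA(n_components=2)
pca.fit(data)

# one entry per component; the entries sum to 1 and come out
# in decreasing order, so the first component dominates
print(pca.explained_variance_ratio_)
```
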

And then the second thing that I do is I look at the first and second principal components. I get these out of the components attribute of my PCA object. So components is going to be an array that has as many principal components in it as I ask for via the n_components parameter. In that case, I have two principal components that I'm getting, so I name them the first and second pc.

So in previous quizzes, where we were talking about what's the direction of, say, x prime in the original x-y feature space, we came up with two numbers that were sort of packaged together into a vector. You can access that directional information through these components.
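A sketch of pulling those direction vectors out of a fitted PCA object; the variable names first_pc and second_pc follow the lecture, and the data here is made up:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
x = rng.normal(size=300)
data = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=300)])

pca = PCA(n_components=2).fit(data)

# each row of components_ is one principal component's direction,
# expressed as a pair of numbers in the original x-y feature space
first_pc = pca.components_[0]
second_pc = pca.components_[1]

print(first_pc)
print(second_pc)
```

Each row comes back as a unit vector, which is why the two numbers can be read directly as a direction.
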

Once I've fit my principal component analysis, I have to, in order to do anything, perform something like a transformation of the data. And this code I will just give you in the starter code for the quiz.
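The transformation step might look like this, as a sketch with made-up data (the actual starter code for the quiz may differ in detail):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(1)
x = rng.normal(size=100)
data = np.column_stack([x, 2.0 * x + 0.2 * rng.normal(size=100)])

pca = PCA(n_components=2).fit(data)

# project each original point onto the principal components;
# row i gives point i's coordinate along the first and second PCs
transformed_data = pca.transform(data)
print(transformed_data.shape)
```
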

What I'm doing here is visualizing it. The first part is in red: I'll be plotting the first principal component, the locations of all the points along that principal component, as well as the direction of the principal component. I'm accessing that information by using the elements of the first pc vector. Then in cyan, kind of a teal color, I'll be accessing the second principal component, and in blue I have the original data.
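That visualization might be sketched as follows, assuming a fitted PCA and made-up data; scaling each component direction by a point's transformed coordinate places the point along that component:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(2)
x = rng.normal(size=50)
data = np.column_stack([x, 0.4 * x + 0.1 * rng.normal(size=50)])

pca = PCA(n_components=2).fit(data)
first_pc = pca.components_[0]
second_pc = pca.components_[1]
transformed_data = pca.transform(data)

for ii, jj in zip(transformed_data, data):
    # red: the point's location along the first principal component
    plt.scatter(first_pc[0] * ii[0], first_pc[1] * ii[0], color="r")
    # cyan: the point's location along the second principal component
    plt.scatter(second_pc[0] * ii[1], second_pc[1] * ii[1], color="c")
    # blue: the original data point
    plt.scatter(jj[0], jj[1], color="b")

plt.savefig("pca_scatter.png")
```
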

So let me show you what this looks like, and then you can give it a try yourself in the quiz. The first thing that you get is the printout of the eigenvalues; remember, that's the explained variance ratio information. And then the second thing is that you'll get a scatter plot, and it should look something like this.

So you remember the red was the direction of our first principal component, and that's hopefully exactly where you guessed it was; it certainly seems, intuitively, like it's in the right place. The cyan is perpendicular to that, and the blue is the original data points.

One thing that I'll add is that, to the eye, it looks like the red and the cyan are not perfectly orthogonal; this doesn't quite look like a 90 degree angle. But remember that our axes have different scales: the x axis goes all the way out to ten million, while the y axis only goes out to less than half of that, about four million. So in reality, if we were to plot everything proportionally, this graph should be twice as long as it is tall. And if we were to visualize it in exactly that way, they would be orthogonal.
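One way to convince yourself of that numerically, rather than by eye, is to check the dot product of the two component directions; for orthogonal unit vectors it is zero no matter how the plot's axis scales distort the angle. A sketch with made-up data on deliberately mismatched scales:

```python
import numpy as np
from sklearn.decomposition import PCA

# made-up data where the two features live on very different scales
rng = np.random.RandomState(3)
x = rng.normal(size=200)
data = np.column_stack([10.0 * x, 4.0 * x + 0.5 * rng.normal(size=200)])

pca = PCA(n_components=2).fit(data)
first_pc, second_pc = pca.components_

# zero (up to floating-point error) means the components really
# are orthogonal, even when the plot makes the angle look off
print(np.dot(first_pc, second_pc))
```

On the plotting side, calling matplotlib's `plt.gca().set_aspect("equal")` before showing the figure forces proportional axes, which makes the right angle visible.
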