Welcome to the mini project for feature selection.
In one of the earlier videos in this lesson, I told you that when I was working
with the e-mail data, there was a word that was effectively serving as
a signature on the e-mails, and I didn't initially realize it.
Now, the mark of a good machine learner isn't that they never make any
mistakes or that their features are always perfect.
It's that they're on the lookout for ways to check their features and
to figure out if there's a bug in there that they need to go in and fix.
So in this case, it would mean that there's a type of signature word
that we would need to go in and remove in order for
us to feel like we were being fair in our supervised classification.
This was a really big learning experience for me,
so I want to share it with you in this mini project.
I'm going to sort of take you into my head as I was trying to
figure out why I couldn't overfit this decision tree,
and how I figured out that there was one feature, or
a couple of features, that were responsible for that.
And then, specifically,
how I figured out what words they were and how I removed them.
So that's what you'll be doing in this mini project.
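
To give a rough feel for what hunting down a signature word can look like, here is a minimal sketch. It is not the decision tree from the mini project; it is a simpler stand-in that asks, for each word, how well that word's presence alone predicts the author. The toy e-mails, the labels, the 0.95 threshold, and the function names are all made up for illustration.

```python
from collections import Counter

def single_word_accuracy(emails, labels, word):
    """Accuracy of the best rule that predicts the label from the presence of `word` alone."""
    with_word = Counter()
    without_word = Counter()
    for text, label in zip(emails, labels):
        if word in text.split():
            with_word[label] += 1
        else:
            without_word[label] += 1
    # Best single-word rule: predict the majority label inside each group.
    correct = max(with_word.values(), default=0) + max(without_word.values(), default=0)
    return correct / len(labels)

def find_signature_words(emails, labels, threshold=0.95):
    """Return words whose presence alone classifies at least `threshold` of the e-mails."""
    vocabulary = {w for text in emails for w in text.split()}
    return sorted(w for w in vocabulary
                  if single_word_accuracy(emails, labels, w) >= threshold)

def remove_words(emails, words):
    """Strip the given words from every e-mail."""
    words = set(words)
    return [" ".join(w for w in text.split() if w not in words) for text in emails]

# Hypothetical toy data: each author's name acts as a signature.
emails = [
    "please review the contract sara",
    "meeting moved to friday sara",
    "see attached forecast chris",
    "call me about the contract chris",
]
labels = ["sara", "sara", "chris", "chris"]

suspects = find_signature_words(emails, labels)
cleaned = remove_words(emails, suspects)
print(suspects)
print(cleaned)
```

On a data set this tiny, almost any threshold is arbitrary, so treat a flagged word as something to go look at, not as proof that it's a signature. After removing the suspects, you'd retrain and check again, since taking out one signature word can reveal another.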