New Enron Feature Solution - Intro to Machine Learning

  • 0:00 - 0:03
    Here's a visualization of the new feature.
  • 0:03 - 0:06
    Along the x axis here, I have the number of emails from a person of
  • 0:06 - 0:09
    interest to a given person in the data set.
  • 0:09 - 0:10
    Along the y axis,
  • 0:10 - 0:14
    I have something else that I think might give me some discrimination as well.
  • 0:14 - 0:18
    Which is the number of emails that this person sends to persons of interest.
  • 0:18 - 0:23
    What I've also done is colored my persons of interest red in the scatter plot,
  • 0:23 - 0:26
    so I can easily identify if there's some sort of pattern in this feature, if I
  • 0:26 - 0:30
    start to see clumps of red dots all together, for example.
  • 0:30 - 0:32
    That would be an indication of something that a supervised learning
  • 0:32 - 0:37
    algorithm could exploit in trying to predict persons of interest.
  • 0:37 - 0:41
    And what I see is that there doesn't seem to be a very strong trend here.
  • 0:41 - 0:45
    The red points seem to be mixed in rather equally with the blue points.
  • 0:45 - 0:48
    Another thing that I notice is that there are a few outliers.
  • 0:48 - 0:52
    For most people, we only have maybe fewer than 100 emails to or
  • 0:52 - 0:55
    from them, but for some people we have many, many more than that.
  • 0:55 - 1:00
    So this visualization leads me into the next step of repeating this process.
  • 1:00 - 1:05
    Using my human intuition to think about what features might be valuable here.
  • 1:05 - 1:08
    The thing that I'm thinking of at this point is maybe the feature that I
  • 1:08 - 1:13
    need is not the absolute number of emails from a person of interest to
  • 1:13 - 1:14
    a given person,
  • 1:14 - 1:18
    but the fraction of emails that a person receives that come
  • 1:18 - 1:19
    from a person of interest.
  • 1:19 - 1:24
    In other words, if you get 80% of your emails from persons of interest,
  • 1:24 - 1:27
    my intuition might be that you yourself are one.
  • 1:27 - 1:30
    But of course, I have to actually code up the feature to test this hypothesis.
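
The fraction feature described at the end of the video could be sketched roughly as follows. This is a minimal illustration, not the course's own solution: the function name `compute_fraction`, the sample records and their keys, and the use of the string "NaN" for missing counts are all assumptions made for the example.

```python
def compute_fraction(poi_messages, all_messages):
    """Return poi_messages / all_messages, or 0.0 when the counts are unusable."""
    # Assumption: a missing count is stored as the string "NaN".
    if poi_messages == "NaN" or all_messages == "NaN":
        return 0.0
    # Avoid dividing by zero for people with no recorded emails.
    if all_messages == 0:
        return 0.0
    return poi_messages / all_messages


# Hypothetical records; the keys are illustrative, not taken from the video.
people = {
    "person_a": {"from_poi": 80, "received": 100, "poi": True},
    "person_b": {"from_poi": 5, "received": 200, "poi": False},
    "person_c": {"from_poi": "NaN", "received": "NaN", "poi": False},
}

# Attach the new feature to every record and show it next to the label.
for name, record in people.items():
    record["fraction_from_poi"] = compute_fraction(
        record["from_poi"], record["received"]
    )
    print(name, record["fraction_from_poi"], record["poi"])
```

Here person_a receives 80 of 100 emails from persons of interest, which is the 80% case mentioned in the video. The same function could be applied to the y-axis quantity, dividing the emails a person sends to persons of interest by all the emails that person sends.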
Video Language:
English
Team:
Udacity
Project:
ud120 - Intro to Machine Learning
Duration:
01:31
