
Order of Operations in Text Processing - Intro to Machine Learning

  • 0:00 - 0:02
    And this is another quiz where of course we
  • 0:02 - 0:04
    haven't already given you the answer.
  • 0:04 - 0:06
    Hopefully you had to think about it a little bit.
  • 0:06 - 0:09
    The answer is that you want to do stemming before you do the bag of
  • 0:09 - 0:12
    words representation and that's for two reasons.
  • 0:12 - 0:16
    The first one is if you put it in the bag of words representation before you
  • 0:16 - 0:19
    stem, then there's kind of no point in stemming, because you could get
  • 0:19 - 0:23
    the same word repeated many times within your bag of words representation.
  • 0:23 - 0:27
    You're not really condensing the information in any useful way.
  • 0:27 - 0:29
    In fact, you're probably making it noisier and worse,
  • 0:29 - 0:33
    because you'll just have the word "respons" in there six times.
  • 0:33 - 0:36
    Also, it's more technically feasible to apply stemming first and
  • 0:36 - 0:39
    then put it in the bag of words representation,
  • 0:39 - 0:41
    because stemming expects a string as its input,
  • 0:41 - 0:45
    and the bag of words representation is going to look like some kind of
  • 0:45 - 0:49
    matrix that has many different documents and the words within those documents.
  • 0:49 - 0:51
    So you almost always want to do stemming as one of
  • 0:51 - 0:53
    the first steps in your text processing.
  • 0:53 - 0:56
    You go through and you stem each word and then put it into
  • 0:56 - 0:59
    the representation that you'll use in your machine learning algorithm.
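The two reasons and the recommended order above can be sketched in Python. This is a minimal sketch under stated assumptions: `toy_stem`, `tokenize`, and `stem_then_vectorize` are hypothetical helpers, and `toy_stem` is a crude suffix stripper standing in for a real stemmer (in practice you would typically use a library stemmer such as NLTK's `SnowballStemmer`). The first part shows the wrong order, where stemming the column labels of an existing bag of words leaves duplicate "respons" columns; the second stems each word first and only then builds the documents-by-words count matrix.

```python
import re
from collections import Counter

# Hypothetical toy stemmer for illustration only (NOT Porter or Snowball):
# strip the first matching suffix, longest first, keeping >= 3 characters.
SUFFIXES = ("iveness", "ivity", "ive", "es", "e", "s")


def toy_stem(word: str) -> str:
    """Stemming works on a single string and returns a string."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem_then_vectorize(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Stem every word first, then build a documents x vocabulary count matrix."""
    stemmed_docs = [[toy_stem(tok) for tok in tokenize(doc)] for doc in docs]
    vocabulary = sorted({stem for doc in stemmed_docs for stem in doc})
    matrix = []
    for doc in stemmed_docs:
        counts = Counter(doc)
        matrix.append([counts[stem] for stem in vocabulary])
    return vocabulary, matrix


# Wrong order: count raw words first, then stem the column labels.
# The five "respons..." words become five duplicate "respons" columns.
raw_bag = Counter(
    tokenize("Responses, responsive, responsivity, responsiveness, response")
)
stemmed_labels = [toy_stem(word) for word in raw_bag]
print(stemmed_labels)  # ['respons', 'respons', 'respons', 'respons', 'respons']

# Right order: stem first, so both surface forms share one "respons" column.
vocab, matrix = stem_then_vectorize(["Responses took long", "A responsive system"])
print(vocab)   # ['a', 'long', 'respons', 'system', 'took']
print(matrix)  # [[0, 1, 1, 0, 1], [1, 0, 1, 1, 0]]
```

Because each stemmed string lands in a single vocabulary column, "Responses" and "responsive" are counted together. Had the matrix been built first, its cells would hold counts rather than strings, so there would be nothing left for a string-based stemmer to work on.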
Video Language:
English
Team:
Udacity
Project:
ud120 - Intro to Machine Learning

English subtitles
