
Stopwords - Intro to Machine Learning

  • 0:00 - 0:03
    Some words just usually don't contain a lot of information.
  • 0:03 - 0:06
    And it can be really valuable to keep an eye out for these words
  • 0:06 - 0:09
    and to be able to remove them from your corpus, so
  • 0:09 - 0:10
    you don't have to consider them.
  • 0:10 - 0:13
    That way, they don't become noise in your dataset.
  • 0:14 - 0:17
    In general, words like these are called stop words.
  • 0:17 - 0:20
    And the exact definition of what a stop word is can vary.
  • 0:20 - 0:24
    But in general, it's a low information word that occurs very frequently.
  • 0:24 - 0:29
    Some examples might include words like "and", "the", "I", "you", and "have".
  • 0:30 - 0:34
    And a very common pre-processing step in text analysis
  • 0:34 - 0:38
    is to remove the stop words before you do anything else with the data.
  • 0:38 - 0:44
    Suppose that our list of stop words is "the", "in", "for", "you", "will", "have", and "be".
  • 0:44 - 0:48
    Say I just give these to you and say, by fiat, these are the stop words.
  • 0:49 - 0:51
    My question for you in a quiz
  • 0:51 - 0:55
    is: how many words will be removed when we remove the stop words from the message
  • 0:55 - 0:58
    "hi Katie, the machine learning class will be great.
  • 0:58 - 0:59
    Best, Sebastian"?
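The pre-processing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: it assumes a simple regex tokenizer that lowercases the text and drops punctuation, and the helper names `tokenize` and `remove_stop_words` are hypothetical. The stop-word set is the one given in the quiz.

```python
import re

# Stop-word list given "by fiat" in the quiz.
STOP_WORDS = {"the", "in", "for", "you", "will", "have", "be"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens.

    Assumption: a plain regex tokenizer is good enough here; punctuation
    such as commas and periods is simply dropped.
    """
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return only the tokens that are not in the stop-word set."""
    return [token for token in tokens if token not in stop_words]


message = "hi Katie, the machine learning class will be great. Best, Sebastian"
tokens = tokenize(message)
kept = remove_stop_words(tokens)
removed_count = len(tokens) - len(kept)
```

Printing `removed_count` and `kept` lets you check your quiz answer. In practice, a larger predefined list (for example, the English list in NLTK's `stopwords` corpus) would usually replace a hand-picked set like this one.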
Video Language:
English
Team:
Udacity
Project:
ud120 - Intro to Machine Learning

English (United States) subtitles
