AI: Training Data & Bias

  • 0:07 - 0:12
    Machine learning is only as good as the training data you put into it.
  • 0:12 - 0:16
    So, it's super important to use high quality data, and lots of it.
  • 0:17 - 0:22
    But if data is important, it's worth asking: where does training data come from?
  • 0:22 - 0:26
    Often, computers are collecting training data from people like you and me,
  • 0:26 - 0:28
    without any effort on our part.
  • 0:28 - 0:31
    A video streaming service might keep track of what you watch, and then it can recognize patterns
  • 0:32 - 0:36
    in that data to recommend what you might want to watch next.
  • 0:37 - 0:43
    Other times, you're directly asked to help, like when a website asks you to spot street signs in photos.
  • 0:44 - 0:49
    You're providing training data to help a machine learn to see, and maybe even one day drive.
  • 0:52 - 0:56
    Medical researchers can use medical images as training data to teach
  • 0:57 - 1:00
    computers how to recognize and diagnose diseases.
  • 1:00 - 1:06
    Machine learning needs hundreds and thousands of images, and training direction from a doctor
  • 1:06 - 1:10
    who knows what to look for, before it can correctly identify disease.
  • 1:11 - 1:16
    Even with thousands of examples, there can be problems with the computer's predictions.
  • 1:16 - 1:21
    If X-ray data is only collected from men, then the computer's predictions may only work for men.
  • 1:22 - 1:26
    It may not recognize diseases when asked to diagnose the X-ray of a woman.
  • 1:27 - 1:31
    This blind spot in the training data creates something called bias.
  • 1:31 - 1:36
    Biased data favors some things, and de-prioritizes or excludes others.
  • 1:37 - 1:42
    Depending on how training data is collected, who is doing the collecting, and how the data is fed,
  • 1:42 - 1:45
    there is a chance that human bias is included in the data.
  • 1:46 - 1:51
    By learning from biased data, the computer may make biased predictions,
  • 1:51 - 1:54
    whether the people training the computer are aware of it or not.
  • 1:55 - 1:58
    When you are looking at training data, ask yourself two questions:
  • 1:59 - 2:02
    Is this enough data to accurately train a computer?
  • 2:02 - 2:07
    And, does this data represent all possible scenarios and users without bias?
  • 2:07 - 2:11
    This is where you, as the human trainer, play a crucial role.
  • 2:11 - 2:14
    It's up to you to give your machine unbiased data.
  • 2:14 - 2:18
    That means collecting tons of examples, from lots of sources.
  • 2:19 - 2:23
    Remember, when you pick and choose data for machine learning,
  • 2:23 - 2:27
    you're actually programming the algorithm, using training data instead of code.
  • 2:27 - 2:30
    The data IS the code.
  • 2:30 - 2:35
    The better the data you provide, the better the computer will learn.
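
The two questions the video asks about training data (is there enough of it, and does it represent everyone?) could be roughly sketched as a quick check in Python. This is only an illustration and is not from the video: the `audit_groups` function, the `sex` field, the record format, and both thresholds are hypothetical assumptions chosen for the example.

```python
from collections import Counter

def audit_groups(records, field, expected_groups, min_total=1000, min_share=0.2):
    """Return a list of warnings about dataset size and group balance.

    records: list of dicts, one per training example (hypothetical format).
    field: the key whose values define the groups to compare.
    expected_groups: groups that should appear in the data, so that a
        group with zero examples is still reported.
    min_total, min_share: illustrative thresholds, not established standards.
    """
    warnings = []
    total = len(records)

    # Question 1: is this enough data?
    if total < min_total:
        warnings.append(f"only {total} examples (wanted at least {min_total})")

    # Question 2: is every expected group represented?
    counts = Counter(r.get(field) for r in records)
    for group in expected_groups:
        share = counts[group] / total if total else 0.0
        if share < min_share:
            warnings.append(f"group '{group}' is only {share:.0%} of the data")
    return warnings

# Toy data echoing the video's example: X-rays collected only from men.
xrays = [{"sex": "male"}] * 1200
for warning in audit_groups(xrays, "sex", ["male", "female"]):
    print(warning)
```

A check like this only catches blind spots for groups someone thought to list in `expected_groups`, so it supports the human trainer's judgment but does not replace it.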
Title:
AI: Training Data & Bias
Video Language:
English
Team:
Code.org
Project:
How AI Works
Duration:
02:41
