0:00:07.360,0:00:11.760 Machine learning is only as good as the training data you put into it.
0:00:11.800,0:00:15.820 So, it's super important to use high-quality data, and lots of it.
0:00:16.760,0:00:21.960 But if data is important, it's worth asking: where does training data come from?
0:00:22.280,0:00:26.260 Often, computers are collecting training data from people like you and me,
0:00:26.260,0:00:27.860 without any effort on our part.
0:00:28.440,0:00:31.480 A video streaming service might keep track of what you watch. Then it can recognize patterns
0:00:31.660,0:00:36.000 in that data to recommend what you might want to watch next.
0:00:37.420,0:00:43.200 Other times, you're directly asked to help, like when a website asks you to spot street signs in photos.
0:00:43.780,0:00:49.280 You're providing training data to help a machine learn to see, and maybe even one day drive.
0:00:52.320,0:00:56.440 Medical researchers can use medical images as training data to teach
0:00:56.520,0:00:59.900 computers how to recognize and diagnose diseases.
0:01:00.300,0:01:05.560 Machine learning needs hundreds and thousands of images, and training direction from a doctor
0:01:05.640,0:01:09.920 who knows what to look for, before it can correctly identify disease.
0:01:10.520,0:01:15.540 Even with thousands of examples, there can be problems with the computer's predictions.
0:01:15.880,0:01:20.660 If X-ray data is only collected from men, then the computer's predictions may only work for men.
0:01:21.880,0:01:26.300 It may not recognize diseases when asked to diagnose the X-ray of a woman.
0:01:26.620,0:01:30.820 This blind spot in the training data creates something called bias.
0:01:31.260,0:01:36.420 Biased data favors some things, and de-prioritizes or excludes others.
0:01:36.780,0:01:41.800 Depending on how training data is collected, who is doing the collecting, and how the data is fed,
0:01:41.800,0:01:45.340 there is a chance that human bias is included in the data.
0:01:45.880,0:01:50.700 By learning from biased data, the computer may make biased predictions,
0:01:50.780,0:01:54.320 whether the people training the computer are aware of it or not.
0:01:54.760,0:01:58.400 When you are looking at training data, ask yourself two questions:
0:01:58.640,0:02:01.600 Is this enough data to accurately train a computer?
0:02:02.320,0:02:06.860 And, does this data represent all possible scenarios and users without bias?
0:02:07.460,0:02:11.040 This is where you, as the human trainer, play a crucial role.
0:02:11.160,0:02:14.500 It's up to you to give your machine unbiased data.
0:02:14.500,0:02:18.160 That means collecting tons of examples, from lots of sources.
0:02:19.300,0:02:22.580 Remember, when you pick and choose data for machine learning,
0:02:22.580,0:02:26.660 you're actually programming the algorithm, using training data instead of code.
0:02:27.100,0:02:29.780 The data IS the code.
0:02:30.180,0:02:34.680 The better the data you provide, the better the computer will learn.
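
The two questions from the video (is there enough data, and does it represent everyone?) can be roughly sketched as a quick check in Python. This is only an illustration, not something shown in the video: the function name `audit_training_data`, the group labels, and the thresholds `min_examples` and `min_share` are all hypothetical choices, and a real project would need its own definition of "enough" and "representative".

```python
from collections import Counter


def audit_training_data(groups, expected_groups, min_examples=1000, min_share=0.2):
    """Return warnings about dataset size and group balance.

    groups: one group label per training example (for example "men" or "women").
    expected_groups: every group the model is supposed to work for.
    min_examples, min_share: illustrative thresholds, not established standards.
    """
    warnings = []
    total = len(groups)

    # Question 1: is this enough data?
    if total < min_examples:
        warnings.append(f"only {total} examples; wanted at least {min_examples}")

    # Question 2: does the data represent every expected group?
    counts = Counter(groups)
    for group in expected_groups:
        share = counts[group] / total if total else 0.0
        if share < min_share:
            warnings.append(f"group '{group}' is only {share:.0%} of the data")

    return warnings


# The X-ray example from the video: every image comes from men.
xray_groups = ["men"] * 1200
print(audit_training_data(xray_groups, expected_groups=["men", "women"]))
```

With the all-male X-ray data, the size check passes but the balance check flags that women are missing, which is the blind spot described above. A check like this only counts examples; it cannot tell whether the labels themselves carry human bias.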