1 00:00:00,107 --> 00:00:03,926 ♪ [music] ♪ 2 00:00:20,880 --> 00:00:22,077 - [Thomas Stratmann] Hi! 3 00:00:22,077 --> 00:00:24,268 In the upcoming series of videos, 4 00:00:24,268 --> 00:00:26,858 we're going to give you a shiny new tool 5 00:00:26,858 --> 00:00:30,414 to put into your Understanding Data toolbox: 6 00:00:30,414 --> 00:00:31,981 linear regression. 7 00:00:32,885 --> 00:00:34,668 Say you've got this theory. 8 00:00:34,668 --> 00:00:37,249 You've witnessed how good-looking people 9 00:00:37,249 --> 00:00:39,067 seem to get special perks. 10 00:00:39,642 --> 00:00:40,878 You're wondering, 11 00:00:40,878 --> 00:00:43,798 "Where else might we see this phenomenon?" 12 00:00:44,132 --> 00:00:45,637 What about for professors? 13 00:00:45,637 --> 00:00:48,259 Is it possible good-looking professors 14 00:00:48,259 --> 00:00:50,010 might get special perks too? 15 00:00:50,350 --> 00:00:53,899 Is it possible students treat them better 16 00:00:53,899 --> 00:00:57,209 by showering them with better student evaluations? 17 00:00:57,866 --> 00:01:00,467 If so, is the effect of looks 18 00:01:00,467 --> 00:01:03,573 on evaluations really big or really small? 19 00:01:04,159 --> 00:01:07,519 And say there is a new professor starting at a university. 20 00:01:07,519 --> 00:01:08,759 - [man] G'day, mate. 21 00:01:08,759 --> 00:01:11,810 - What can we predict about his evaluation 22 00:01:11,810 --> 00:01:13,371 simply by his looks? 23 00:01:13,940 --> 00:01:17,216 Given that these evaluations can determine pay raises, 24 00:01:17,671 --> 00:01:21,709 if this theory were true, we might see professors resort 25 00:01:21,709 --> 00:01:24,519 to some surprising tactics to boost their scores. 26 00:01:24,519 --> 00:01:25,731 - [Lloyd Christmas] Yeah! 27 00:01:25,731 --> 00:01:27,461 - Suppose you wanted to find out 28 00:01:27,461 --> 00:01:30,801 if evaluations really improve with better looks. 
29 00:01:31,441 --> 00:01:34,450 How would you go about testing this hypothesis? 30 00:01:34,956 --> 00:01:36,552 You could collect data. 31 00:01:36,761 --> 00:01:40,025 First you would have students rate on a scale from 1 to 10 32 00:01:40,025 --> 00:01:42,076 how good-looking a professor was, 33 00:01:42,076 --> 00:01:44,807 which gives you an average beauty score. 34 00:01:45,229 --> 00:01:48,552 Then you could retrieve the teacher's teaching evaluations 35 00:01:48,552 --> 00:01:50,421 from twenty-five students. 36 00:01:50,421 --> 00:01:53,273 Let's look at these two variables at the same time 37 00:01:53,273 --> 00:01:54,738 by using a scatterplot. 38 00:01:54,981 --> 00:01:57,419 We'll put beauty on the horizontal axis, 39 00:01:57,852 --> 00:02:00,589 and teacher evaluations on the vertical axis. 40 00:02:01,223 --> 00:02:04,903 For example, this dot represents Professor Peate, 41 00:02:04,903 --> 00:02:06,423 - [Bib Fortuna] De wana wanga. 42 00:02:06,423 --> 00:02:08,811 - who received a beauty score of 3 43 00:02:08,811 --> 00:02:11,866 and an evaluation of 8.425. 44 00:02:12,084 --> 00:02:14,958 This one way out here is Professor Helmchen, 45 00:02:14,958 --> 00:02:16,797 - [Ben Stiller, "Zoolander"] Ridiculously good-looking! 46 00:02:16,797 --> 00:02:18,721 - who got a very high beauty score, 47 00:02:18,721 --> 00:02:20,872 but not such a good evaluation. 48 00:02:21,101 --> 00:02:22,283 Can you see a trend? 49 00:02:22,283 --> 00:02:25,533 As we move from left to right on the horizontal axis, 50 00:02:25,533 --> 00:02:27,963 from the ugly to the gorgeous, 51 00:02:27,963 --> 00:02:31,186 we see a trend upwards in evaluation scores. 52 00:02:31,870 --> 00:02:35,174 By the way, the data we're exploring in this series 53 00:02:35,174 --> 00:02:38,923 is not made up -- it comes from a real study 54 00:02:38,923 --> 00:02:40,897 done at the University of Texas. 
55 00:02:41,337 --> 00:02:46,023 If you're wondering, "pulchritude" is just the fancy academic way 56 00:02:46,023 --> 00:02:47,880 of saying beauty. 57 00:02:48,405 --> 00:02:51,474 With scatterplots, it can sometimes be hard 58 00:02:51,474 --> 00:02:55,594 to make out the exact relationship between two variables -- 59 00:02:55,594 --> 00:02:59,104 especially when the values bounce around quite a bit 60 00:02:59,104 --> 00:03:01,318 as we go from left to right. 61 00:03:02,000 --> 00:03:04,908 One way to cut through this bounciness 62 00:03:04,908 --> 00:03:08,144 is to draw a straight line through the data cloud 63 00:03:08,144 --> 00:03:10,775 in such a way that this line summarizes the data 64 00:03:10,775 --> 00:03:12,613 as closely as possible. 65 00:03:13,295 --> 00:03:17,181 The technical term for this is "linear regression." 66 00:03:17,669 --> 00:03:20,888 Later on we'll talk about how this line is created, 67 00:03:20,888 --> 00:03:24,278 but for now we can assume that the line fits the data 68 00:03:24,278 --> 00:03:26,456 as closely as possible. 69 00:03:27,087 --> 00:03:29,536 So, what can this line tell us? 70 00:03:30,067 --> 00:03:32,596 First, we immediately see 71 00:03:32,596 --> 00:03:35,358 if the line is sloping upward or downward. 72 00:03:36,107 --> 00:03:39,827 In our data set we see the fitted line slopes upward. 73 00:03:40,794 --> 00:03:43,807 It thus confirms what we have conjectured earlier 74 00:03:43,807 --> 00:03:45,587 by just looking at the scatterplot. 75 00:03:46,070 --> 00:03:50,237 The upward slope means that there is a positive association 76 00:03:50,237 --> 00:03:53,026 between looks and evaluation scores. 77 00:03:53,544 --> 00:03:55,907 In other words, on average, 78 00:03:55,907 --> 00:03:59,469 better-looking professors are getting better evaluations. 79 00:03:59,768 --> 00:04:03,939 For other data sets, we might see a stronger positive association. 
80 00:04:04,377 --> 00:04:07,420 Or, you might see a negative association. 81 00:04:07,857 --> 00:04:10,764 Or perhaps no association at all. 82 00:04:11,158 --> 00:04:13,903 And our lines don't have to be straight. 83 00:04:14,389 --> 00:04:17,304 They can curve to fit the data when necessary. 84 00:04:17,770 --> 00:04:21,262 This line also gives us a way to predict outcomes. 85 00:04:21,579 --> 00:04:25,569 We can simply take a beauty score and read off the line 86 00:04:25,569 --> 00:04:28,429 what the predicted evaluation score would be. 87 00:04:28,429 --> 00:04:30,229 So, back to our new professor. 88 00:04:30,229 --> 00:04:31,297 - [Lloyd] Look familiar? 89 00:04:31,297 --> 00:04:34,109 - We can precisely predict his evaluation score. 90 00:04:34,683 --> 00:04:36,749 "But wait! Wait!" you might say. 91 00:04:37,019 --> 00:04:38,749 "Can we trust this prediction?" 92 00:04:39,233 --> 00:04:41,665 How well does this one beauty variable 93 00:04:41,665 --> 00:04:43,515 really predict evaluations? 94 00:04:44,844 --> 00:04:47,890 Linear regression gives us some useful measures 95 00:04:47,890 --> 00:04:49,770 to answer those questions 96 00:04:49,770 --> 00:04:52,039 which we'll cover in a future video. 97 00:04:52,838 --> 00:04:55,439 We also have to be aware of other pitfalls 98 00:04:55,439 --> 00:04:58,340 before we draw any definite conclusions. 99 00:04:58,833 --> 00:05:00,430 You could imagine a scenario 100 00:05:00,430 --> 00:05:03,639 where what is driving the association we see 101 00:05:03,639 --> 00:05:06,900 is really a third variable that we have left out. 102 00:05:07,344 --> 00:05:09,965 For example, the difficulty of the course 103 00:05:09,965 --> 00:05:12,456 might be behind the positive association 104 00:05:12,456 --> 00:05:15,645 between beauty ratings and evaluation scores. 105 00:05:16,052 --> 00:05:18,956 Easy intro courses get good evaluations. 106 00:05:19,228 --> 00:05:22,972 Harder, more advanced courses get bad evaluations. 
107 00:05:23,660 --> 00:05:27,668 And younger professors might get assigned to intro courses. 108 00:05:28,080 --> 00:05:32,095 Then, if students judge younger professors more attractive, 109 00:05:32,095 --> 00:05:34,335 you will find a positive association 110 00:05:34,335 --> 00:05:37,383 between beauty ratings and evaluation scores. 111 00:05:37,861 --> 00:05:40,388 But it's really the difficulty of the course, 112 00:05:40,388 --> 00:05:43,537 the variable that we've left out, not beauty, 113 00:05:43,537 --> 00:05:45,848 that is driving evaluation scores. 114 00:05:46,346 --> 00:05:49,807 In that case, all the primping would be for naught -- 115 00:05:50,289 --> 00:05:53,620 a case of mistaking correlation for causation -- 116 00:05:53,620 --> 00:05:54,900 - [Lloyd] Wait a minute. 117 00:05:54,900 --> 00:05:58,166 - something we'll talk about further in a later video. 118 00:05:58,922 --> 00:06:02,069 And what if there were other important variables 119 00:06:02,069 --> 00:06:05,781 that affect both beauty ratings and evaluation scores? 120 00:06:06,626 --> 00:06:09,575 You might want to add considerations like skill, 121 00:06:09,846 --> 00:06:14,577 race, sex, and whether English is the teacher's native language 122 00:06:14,577 --> 00:06:18,994 to isolate more cleanly the effect of beauty on evaluations. 123 00:06:19,408 --> 00:06:21,758 When we get into multiple regression, 124 00:06:21,758 --> 00:06:24,477 we will be able to measure the impact of beauty 125 00:06:24,477 --> 00:06:26,219 on teacher evaluations 126 00:06:26,219 --> 00:06:28,368 while accounting for other variables 127 00:06:28,368 --> 00:06:30,737 that might confound this association. 128 00:06:31,762 --> 00:06:35,509 Next up, we'll get our hands dirty by playing with this data 129 00:06:35,509 --> 00:06:39,070 to gain a better understanding of what this line can tell us. 130 00:06:41,169 --> 00:06:42,445 - [Narrator] Congratulations! 
131 00:06:42,445 --> 00:06:45,247 You're one step closer to being a data ninja! 132 00:06:45,568 --> 00:06:47,139 However, to master this 133 00:06:47,139 --> 00:06:48,700 you'll need to strengthen your skills 134 00:06:48,700 --> 00:06:50,404 with some practice questions. 135 00:06:50,865 --> 00:06:53,976 Ready for your next mission? Click "Next Video." 136 00:06:54,313 --> 00:06:55,364 Still here? 137 00:06:55,598 --> 00:06:58,325 Move from understanding data to understanding your world 138 00:06:58,325 --> 00:07:01,642 by checking out MRU's other popular economics videos. 139 00:07:01,892 --> 00:07:04,406 ♪ [music] ♪
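[Editor's note] The two ideas in this video, fitting a straight line through a scatterplot and reading a prediction off that line, can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the numbers below are made up (they are not the University of Texas data), `fit_line` and `predict` are hypothetical helper names, and the line is fit by ordinary least squares, which is one common choice; the video itself defers how the line is created to a later episode.

```python
# Illustrative only: these (beauty, evaluation) pairs are invented for the sketch.
# They are NOT the University of Texas data discussed in the video.
beauty = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
evaluation = [6.9, 7.4, 7.1, 7.8, 7.6, 8.2, 8.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line.

    Assumption: least squares is used as the notion of "fits as closely
    as possible"; the video does not specify the fitting method.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Read the predicted evaluation off the fitted line at beauty score x."""
    return intercept + slope * x


slope, intercept = fit_line(beauty, evaluation)

# The sign of the slope tells us the direction of the association.
if slope > 0:
    direction = "positive"
elif slope < 0:
    direction = "negative"
else:
    direction = "no"

print(f"slope = {slope:.3f}, intercept = {intercept:.3f} ({direction} association)")
print(f"predicted evaluation at beauty 5.5: {predict(slope, intercept, 5.5):.2f}")
```

An upward slope here corresponds to the positive association described in the video. The prediction is just the height of the line at a given beauty score, so it says nothing yet about how far actual evaluations scatter around the line or whether an omitted variable is behind the pattern.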