1
00:00:00,107 --> 00:00:03,926
♪ [music] ♪

2
00:00:21,040 --> 00:00:22,077
- [Thomas Stratmann] Hi!

3
00:00:22,077 --> 00:00:24,268
In the upcoming series of videos

4
00:00:24,268 --> 00:00:26,858
we're going to give you a shiny new tool

5
00:00:26,858 --> 00:00:30,414
to put into your Understanding Data toolbox:

6
00:00:30,414 --> 00:00:31,981
linear regression.

7
00:00:32,885 --> 00:00:34,668
Say you've got this theory.

8
00:00:34,668 --> 00:00:37,249
You've witnessed how good-looking people

9
00:00:37,249 --> 00:00:39,067
seem to get special perks.

10
00:00:39,642 --> 00:00:40,878
You're wondering,

11
00:00:40,878 --> 00:00:43,798
"Where else might we see this phenomenon?"

12
00:00:44,132 --> 00:00:45,637
What about for professors?

13
00:00:45,637 --> 00:00:48,259
Is it possible good-looking professors

14
00:00:48,259 --> 00:00:50,010
might get special perks too?

15
00:00:50,350 --> 00:00:53,899
Is it possible students treat them better

16
00:00:53,899 --> 00:00:57,209
by showering them with better student evaluations?

17
00:00:57,866 --> 00:01:00,467
If so, is the effect of looks

18
00:01:00,467 --> 00:01:03,573
on evaluation score big or [inaudible]?

19
00:01:04,349 --> 00:01:08,143
And say there is a new professor starting at a university.

20
00:01:08,619 --> 00:01:11,810
What can we predict about his evaluation

21
00:01:11,810 --> 00:01:13,371
simply by his looks?

22
00:01:13,940 --> 00:01:17,216
Given that these evaluations can determine pay raises,

23
00:01:17,671 --> 00:01:21,709
if this theory were true we might see professors resort

24
00:01:21,709 --> 00:01:24,980
to some surprising tactics to boost their scores.

25
00:01:25,471 --> 00:01:27,461
Suppose you wanted to find out

26
00:01:27,461 --> 00:01:30,801
if evaluations really improve with better looks.

27
00:01:31,441 --> 00:01:34,450
How would you go about testing this hypothesis?

28
00:01:34,956 --> 00:01:36,552
You could collect data.

29
00:01:36,761 --> 00:01:40,025
First you would have students rate on a scale from 1 to 10

30
00:01:40,025 --> 00:01:42,076
how good-looking a professor was,

31
00:01:42,076 --> 00:01:44,807
which gives you an average beauty score.

32
00:01:45,229 --> 00:01:48,552
Then you could retrieve the teacher's teaching evaluations

33
00:01:48,552 --> 00:01:50,421
from twenty-five students.

34
00:01:50,421 --> 00:01:53,273
Let's look at these two variables at the same time

35
00:01:53,273 --> 00:01:54,738
by using a scatterplot.

36
00:01:54,981 --> 00:01:57,419
We'll put beauty on the horizontal axis,

37
00:01:57,852 --> 00:02:00,589
and teacher evaluations on the vertical axis.

38
00:02:01,463 --> 00:02:05,514
For example, this dot represents Professor Peate,

39
00:02:06,173 --> 00:02:08,811
who received a beauty score of 3

40
00:02:08,811 --> 00:02:11,866
and an evaluation of 8.425.

41
00:02:12,084 --> 00:02:14,958
This one way out here is Professor Helmchen.

42
00:02:14,958 --> 00:02:16,797
- [Ben Stiller, "Zoolander"] Ridiculously good-looking!

43
00:02:16,797 --> 00:02:18,721
- [Thomas] Who got a very high beauty score,

44
00:02:18,721 --> 00:02:20,872
but not such a good evaluation.

45
00:02:21,101 --> 00:02:22,283
Can you see a trend?

46
00:02:22,283 --> 00:02:25,533
As we move from left to right on the horizontal axis,

47
00:02:25,533 --> 00:02:27,963
from the ugly to the gorgeous,

48
00:02:27,963 --> 00:02:31,186
we see a trend upwards in evaluation scores.

49
00:02:31,870 --> 00:02:35,174
By the way, the data we're exploring in this series

50
00:02:35,174 --> 00:02:38,923
is not made up -- it comes from a real study

51
00:02:38,923 --> 00:02:40,897
done at the University of Texas.

52
00:02:41,337 --> 00:02:46,023
If you're wondering, "pulchritude" is just the fancy academic way

53
00:02:46,023 --> 00:02:47,880
of saying beauty.

54
00:02:48,405 --> 00:02:51,474
With scatterplots it can sometimes be hard

55
00:02:51,474 --> 00:02:55,594
to make out the exact relationship between two variables --

56
00:02:55,594 --> 00:02:59,104
especially when the values bounce around quite a bit

57
00:02:59,104 --> 00:03:01,318
as we go from left to right.

58
00:03:02,000 --> 00:03:04,908
One way to cut through this bounciness

59
00:03:04,908 --> 00:03:08,144
is to draw a straight line through the data cloud

60
00:03:08,144 --> 00:03:10,775
in such a way that this line summarizes the data

61
00:03:10,775 --> 00:03:12,613
as closely as possible.

62
00:03:13,295 --> 00:03:17,181
The technical term for this is "linear regression."

63
00:03:17,669 --> 00:03:20,888
Later on we'll talk about how this line is created,

64
00:03:20,888 --> 00:03:24,278
but for now we can assume that the line fits the data

65
00:03:24,278 --> 00:03:26,456
as closely as possible.

66
00:03:27,087 --> 00:03:29,536
So, what can this line tell us?

67
00:03:30,067 --> 00:03:32,596
First, we immediately see

68
00:03:32,596 --> 00:03:35,358
if the line is sloping upward or downward.

69
00:03:36,107 --> 00:03:39,827
In our data set we see the [fitted] line slopes upward.

70
00:03:40,794 --> 00:03:43,807
It thus confirms what we have conjectured earlier

71
00:03:43,807 --> 00:03:45,587
by just looking at the scatterplot.

72
00:03:46,070 --> 00:03:50,237
The upward slope means that there is a positive association

73
00:03:50,237 --> 00:03:53,026
between looks and evaluation scores.

74
00:03:53,544 --> 00:03:55,907
In other words, on average,

75
00:03:55,907 --> 00:03:59,469
better-looking professors are getting better evaluations.

76
00:03:59,768 --> 00:04:03,939
For other data sets we might see a stronger positive association.

77
00:04:04,377 --> 00:04:07,420
Or, you might see a negative association.

78
00:04:07,857 --> 00:04:10,764
Or perhaps no association at all.

79
00:04:11,158 --> 00:04:13,903
And our lines don't have to be straight.

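[Editor's sketch] The idea of drawing a straight line that summarizes the data cloud, then reading the direction of the association from its slope, could be illustrated roughly as follows. This is only a sketch under stated assumptions: the video has not yet said how its line is created, so ordinary least squares (via NumPy's `polyfit`) is used here as one common choice. The data are synthetic and made up for illustration; they are not the University of Texas data, and the `association` helper is a hypothetical name.

```python
import numpy as np

# Synthetic, made-up data for illustration only (NOT the real study data).
rng = np.random.default_rng(0)
beauty = rng.uniform(1, 10, size=50)                      # horizontal axis
evaluation = 6.0 + 0.2 * beauty + rng.normal(0, 0.5, 50)  # vertical axis

# Assumption: fit the straight line by ordinary least squares.
# polyfit with deg=1 returns the coefficients as [slope, intercept].
slope, intercept = np.polyfit(beauty, evaluation, deg=1)


def association(slope_value, tol=1e-9):
    """Hypothetical helper: describe the direction of the fitted line."""
    if slope_value > tol:
        return "positive"
    if slope_value < -tol:
        return "negative"
    return "none"


print(f"fitted line: evaluation = {intercept:.3f} + {slope:.3f} * beauty")
print("association:", association(slope))
```

An upward-sloping fitted line corresponds to the "positive association" case described above; a negative slope or a slope of zero would correspond to the other two cases.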
80
00:04:14,389 --> 00:04:17,304
They can curve to fit the data when necessary.

81
00:04:17,770 --> 00:04:21,262
This line also gives us a way to predict outcomes.

82
00:04:21,579 --> 00:04:25,569
We can simply take a beauty score and read off the line

83
00:04:25,569 --> 00:04:28,429
what the predicted evaluation score would be.

84
00:04:28,609 --> 00:04:30,546
So, back to our new professor.

85
00:04:31,097 --> 00:04:34,109
We can precisely predict his evaluation score.

86
00:04:34,683 --> 00:04:36,749
"But wait! Wait!" you might say.

87
00:04:37,019 --> 00:04:38,749
"Can we trust this prediction?"

88
00:04:39,233 --> 00:04:41,665
How well does this one beauty variable

89
00:04:41,665 --> 00:04:43,515
really predict evaluations?

90
00:04:44,844 --> 00:04:47,890
Linear regression gives us some useful measures

91
00:04:47,890 --> 00:04:49,770
to answer those questions

92
00:04:49,770 --> 00:04:52,039
which we'll cover in a future video.

93
00:04:52,838 --> 00:04:55,439
We also have to be aware of other pitfalls

94
00:04:55,439 --> 00:04:58,340
before we draw any definite conclusions.

95
00:04:58,833 --> 00:05:00,430
You could imagine a scenario

96
00:05:00,430 --> 00:05:03,639
where what is driving the association we see

97
00:05:03,639 --> 00:05:06,900
is really a third variable that we have left out.

98
00:05:07,344 --> 00:05:09,965
For example, the difficulty of the course

99
00:05:09,965 --> 00:05:12,456
might be behind the positive association

100
00:05:12,456 --> 00:05:15,645
between beauty ratings and evaluation scores.

101
00:05:16,052 --> 00:05:18,956
Easy intro. courses get good evaluations.

102
00:05:19,228 --> 00:05:22,972
Harder, more advanced courses get bad evaluations.

103
00:05:23,660 --> 00:05:27,668
And younger professors might get assigned to intro. courses.

104
00:05:28,080 --> 00:05:32,095
Then, if students judge younger professors more attractive,

105
00:05:32,095 --> 00:05:34,335
you will find a positive association

106
00:05:34,335 --> 00:05:37,383
between beauty ratings and evaluation scores.

107
00:05:37,861 --> 00:05:40,388
But it's really the difficulty of the course,

108
00:05:40,388 --> 00:05:43,537
the variable that we've left out, not beauty,

109
00:05:43,537 --> 00:05:45,848
that is driving evaluation scores.

110
00:05:46,346 --> 00:05:49,807
In that case, all the primping would be for naught --

111
00:05:50,289 --> 00:05:54,441
a case of mistaken correlation for causation,

112
00:05:54,900 --> 00:05:58,166
something we'll talk about further in a later video.

113
00:05:58,922 --> 00:06:02,069
And what if there were other important variables

114
00:06:02,069 --> 00:06:05,781
that affect both beauty ratings and evaluation scores?

115
00:06:06,626 --> 00:06:09,575
You might want to add considerations like skill,

116
00:06:09,846 --> 00:06:14,577
race, sex, and whether English is the teacher's native language

117
00:06:14,577 --> 00:06:18,994
to isolate more cleanly the effect of beauty on evaluations.

118
00:06:19,408 --> 00:06:21,758
When we get into multiple regression

119
00:06:21,758 --> 00:06:24,477
we will be able to measure the impact of beauty

120
00:06:24,477 --> 00:06:26,219
on teacher evaluations

121
00:06:26,219 --> 00:06:28,368
while accounting for other variables

122
00:06:28,368 --> 00:06:30,737
that might confound this association.

123
00:06:31,762 --> 00:06:35,509
Next up, we'll get our hands dirty by playing with this data

124
00:06:35,509 --> 00:06:39,070
to gain a better understanding of what this line can tell us.

125
00:06:41,169 --> 00:06:42,445
- [Narrator] Congratulations!

126
00:06:42,445 --> 00:06:45,247
You're one step closer to being a data ninja!

127
00:06:45,568 --> 00:06:47,139
However, to master this

128
00:06:47,139 --> 00:06:48,700
you'll need to strengthen your skills

129
00:06:48,700 --> 00:06:50,404
with some practice questions.

130
00:06:50,865 --> 00:06:53,976
Ready for your next mission? Click "Next Video."

131
00:06:54,313 --> 00:06:55,364
Still here?

132
00:06:55,598 --> 00:06:58,325
Move from understanding data to understanding your world

133
00:06:58,325 --> 00:07:01,642
by checking out MRU's other popular economics videos.

134
00:07:01,892 --> 00:07:04,406
♪ [music] ♪
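
[Editor's sketch] The left-out-variable scenario from the video (course difficulty driving evaluations, with beauty merely correlated with difficulty) can be simulated to see why accounting for the extra variable matters. Everything here is an assumption made for illustration: the data are synthetic, the coefficients are invented, beauty is given no effect on evaluations by construction, and least squares via NumPy stands in for whatever method the series later introduces for multiple regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical data-generating process (invented numbers, not real data):
# difficulty runs from 0 (easy intro course) to 1 (hard advanced course).
difficulty = rng.uniform(0, 1, size=n)
# Easier courses tend to have professors rated as better looking.
beauty = 7.0 - 4.0 * difficulty + rng.normal(0, 1.0, size=n)
# Evaluations depend ONLY on difficulty; beauty has no effect here.
evaluation = 9.0 - 3.0 * difficulty + rng.normal(0, 0.5, size=n)

# Simple regression: evaluation on beauty alone (difficulty left out).
simple_slope, _ = np.polyfit(beauty, evaluation, deg=1)

# Multiple regression: evaluation on beauty AND difficulty together.
X = np.column_stack([np.ones(n), beauty, difficulty])
coef, *_ = np.linalg.lstsq(X, evaluation, rcond=None)
beauty_coef, difficulty_coef = coef[1], coef[2]

print(f"beauty slope, difficulty left out: {simple_slope:.3f}")
print(f"beauty coefficient, difficulty included: {beauty_coef:.3f}")
print(f"difficulty coefficient: {difficulty_coef:.3f}")
```

In this made-up setup the simple regression shows a clearly positive beauty slope even though beauty does nothing, while the regression that includes difficulty puts the beauty coefficient near zero. That is the correlation-versus-causation pitfall the video describes.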