WEBVTT 99:59:59.999 --> 99:59:59.999 ♪ [music] ♪ 99:59:59.999 --> 99:59:59.999 - [Thomas Stratmann] Hi! 99:59:59.999 --> 99:59:59.999 In the upcoming series of videos 99:59:59.999 --> 99:59:59.999 we're going to give you a shiny new tool 99:59:59.999 --> 99:59:59.999 to put into your Understanding Data toolbox: 99:59:59.999 --> 99:59:59.999 linear regression. 99:59:59.999 --> 99:59:59.999 Say you've got this theory. 99:59:59.999 --> 99:59:59.999 You've witnessed how good-looking people 99:59:59.999 --> 99:59:59.999 seem to get special perks. 99:59:59.999 --> 99:59:59.999 You're wondering -- "Where else might we see this phenomenon?" 99:59:59.999 --> 99:59:59.999 What about full professors? 99:59:59.999 --> 99:59:59.999 Is it possible good-looking professors 99:59:59.999 --> 99:59:59.999 might get special perks too? 99:59:59.999 --> 99:59:59.999 Is it possible students treat them better 99:59:59.999 --> 99:59:59.999 by showering them with better student evaluations? 99:59:59.999 --> 99:59:59.999 If so, is the effect of looks 99:59:59.999 --> 99:59:59.999 on evaluation score big or [inaudible]? 99:59:59.999 --> 99:59:59.999 And say there is a new professor starting at the university. 99:59:59.999 --> 99:59:59.999 What can we predict about his evaluation 99:59:59.999 --> 99:59:59.999 simply by his looks? 99:59:59.999 --> 99:59:59.999 Given that these evaluations can determine pay raises, 99:59:59.999 --> 99:59:59.999 if this theory were true we might see professors resort 99:59:59.999 --> 99:59:59.999 to some surprising tactics to boost their scores. 99:59:59.999 --> 99:59:59.999 Suppose you wanted to find out 99:59:59.999 --> 99:59:59.999 if evaluations really improve with better looks. 99:59:59.999 --> 99:59:59.999 How would you go about testing this hypothesis? 99:59:59.999 --> 99:59:59.999 You could collect data. 
99:59:59.999 --> 99:59:59.999 First you would have students rate on a scale from 1 to 10 99:59:59.999 --> 99:59:59.999 how good looking a professor was, 99:59:59.999 --> 99:59:59.999 which gives you an average beauty score. 99:59:59.999 --> 99:59:59.999 Then you could retrieve the teacher's teaching evaluations 99:59:59.999 --> 99:59:59.999 from 25 students. 99:59:59.999 --> 99:59:59.999 Let's look at these two variables at the same time 99:59:59.999 --> 99:59:59.999 by using a scatterplot. 99:59:59.999 --> 99:59:59.999 We'll put beauty on the horizontal axis, 99:59:59.999 --> 99:59:59.999 and teacher evaluations on the vertical axis. 99:59:59.999 --> 99:59:59.999 For example, this dot represents Professor Peate, 99:59:59.999 --> 99:59:59.999 who received a beauty score of 3 99:59:59.999 --> 99:59:59.999 and an evaluation of 8.425. 99:59:59.999 --> 99:59:59.999 This one way out here is Professor Helmchen. 99:59:59.999 --> 99:59:59.999 - [Professor Helmchen] Ridiculously good-looking! 99:59:59.999 --> 99:59:59.999 - [Thomas] Who got a very high beauty score, 99:59:59.999 --> 99:59:59.999 but not such a good evaluation. 99:59:59.999 --> 99:59:59.999 Can you see a trend? 99:59:59.999 --> 99:59:59.999 As we move from left to right on the horizontal axis, 99:59:59.999 --> 99:59:59.999 from the ugly to the gorgeous, 99:59:59.999 --> 99:59:59.999 we see a trend upwards in evaluation scores. 99:59:59.999 --> 99:59:59.999 By the way, the data we're exploring in this series 99:59:59.999 --> 99:59:59.999 is not made up -- it comes from a real study 99:59:59.999 --> 99:59:59.999 done at the University of Texas. 99:59:59.999 --> 99:59:59.999 If you're wondering, "pulchritude" is just the fancy academic way 99:59:59.999 --> 99:59:59.999 of saying beauty. 
99:59:59.999 --> 99:59:59.999 With scatterplots it can sometimes be hard 99:59:59.999 --> 99:59:59.999 to make out the exact relationship between two variables -- 99:59:59.999 --> 99:59:59.999 especially when the variables bounce around quite a bit 99:59:59.999 --> 99:59:59.999 as we go from left to right. 99:59:59.999 --> 99:59:59.999 One way to cut through this bounciness 99:59:59.999 --> 99:59:59.999 is to draw a straight line through the data cloud 99:59:59.999 --> 99:59:59.999 in such a way that this line summarizes the data 99:59:59.999 --> 99:59:59.999 as closely as possible. 99:59:59.999 --> 99:59:59.999 The technical term for this is "linear regression." 99:59:59.999 --> 99:59:59.999 Later on we'll talk about how this line is created, 99:59:59.999 --> 99:59:59.999 but for now we can assume that the line fits the data 99:59:59.999 --> 99:59:59.999 as closely as possible. 99:59:59.999 --> 99:59:59.999 So, what can this line tell us? 99:59:59.999 --> 99:59:59.999 First, we immediately see 99:59:59.999 --> 99:59:59.999 if the line is sloping upward or downward. 99:59:59.999 --> 99:59:59.999 In our data set we see the [fitted] line slopes upward. 99:59:59.999 --> 99:59:59.999 It thus confirms what we have conjectured earlier 99:59:59.999 --> 99:59:59.999 by just looking at the scatterplot. 99:59:59.999 --> 99:59:59.999 The upward slope means that there is a positive association 99:59:59.999 --> 99:59:59.999 between looks and evaluation scores. 99:59:59.999 --> 99:59:59.999 In other words, on average, 99:59:59.999 --> 99:59:59.999 better-looking professors are getting better evaluations. 99:59:59.999 --> 99:59:59.999 For other data sets we might see a stronger positive association. 99:59:59.999 --> 99:59:59.999 Or, you might see a negative association. 99:59:59.999 --> 99:59:59.999 Or perhaps no association at all. 99:59:59.999 --> 99:59:59.999 And our lines don't have to be straight. 99:59:59.999 --> 99:59:59.999 They can curve to fit the data when necessary. 
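Editorial aside: the idea of drawing a straight line through the data cloud and checking which way it slopes can be sketched in a few lines of Python. This is a minimal sketch, not the method used in the video or the study. It assumes the line is fitted by ordinary least squares, which the transcript has not yet specified ("later on we'll talk about how this line is created"). The data points and the names `fit_line` and `describe_association` are hypothetical and invented for illustration; they are not the University of Texas data.

```python
# Hypothetical (beauty score, evaluation score) pairs, invented for illustration.
# These are NOT the data from the study mentioned in the video.
beauty = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
evaluation = [6.1, 6.8, 6.5, 7.2, 7.0, 7.9, 7.6, 8.3]


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (an assumed method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


def describe_association(slope, tol=1e-9):
    """Translate the sign of the slope into the wording used in the video."""
    if slope > tol:
        return "positive association"
    if slope < -tol:
        return "negative association"
    return "no association"


slope, intercept = fit_line(beauty, evaluation)
print(describe_association(slope))
```

With these made-up numbers the fitted line slopes upward, which corresponds to the "positive association" case described above. A steeper slope would mean evaluations rise faster with beauty, and a downward slope would correspond to the negative case.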
99:59:59.999 --> 99:59:59.999 This line also gives us a way to predict outcomes. 99:59:59.999 --> 99:59:59.999 We can simply take a beauty score and read off the line 99:59:59.999 --> 99:59:59.999 what the predicted evaluation score would be. 99:59:59.999 --> 99:59:59.999 So, back to our new professor. 99:59:59.999 --> 99:59:59.999 - [Professor Lloyd] Look familiar? 99:59:59.999 --> 99:59:59.999 - [Thomas] We can precisely predict his evaluation score. 99:59:59.999 --> 99:59:59.999 "But wait! Wait!" you might say. 99:59:59.999 --> 99:59:59.999 "Can we trust this prediction?" 99:59:59.999 --> 99:59:59.999 How well does this one beauty variable 99:59:59.999 --> 99:59:59.999 really predict evaluations? 99:59:59.999 --> 99:59:59.999 Linear regression gives us some useful measures 99:59:59.999 --> 99:59:59.999 to answer those questions 99:59:59.999 --> 99:59:59.999 which we'll cover in a future video. 99:59:59.999 --> 99:59:59.999 We also have to be aware of other pitfalls 99:59:59.999 --> 99:59:59.999 before we draw any definite conclusions. 99:59:59.999 --> 99:59:59.999 You could imagine a scenario 99:59:59.999 --> 99:59:59.999 where what is driving the association 99:59:59.999 --> 99:59:59.999 we see is really a third variable that we have left out. 99:59:59.999 --> 99:59:59.999 For example, the difficulty of the course 99:59:59.999 --> 99:59:59.999 might be behind the positive association 99:59:59.999 --> 99:59:59.999 between beauty ratings and evaluation scores. 99:59:59.999 --> 99:59:59.999 Easy intro. courses get good evaluations. 99:59:59.999 --> 99:59:59.999 Harder, more advanced courses get bad evaluations. 99:59:59.999 --> 99:59:59.999 And younger professors might get assigned to intro. courses. 99:59:59.999 --> 99:59:59.999 Then, if students judge younger professors more attractive, 99:59:59.999 --> 99:59:59.999 you will find a positive association 99:59:59.999 --> 99:59:59.999 between beauty ratings and evaluation scores. 
99:59:59.999 --> 99:59:59.999 But it's really the difficulty of the course. 99:59:59.999 --> 99:59:59.999 The variable that we've left out, not beauty, 99:59:59.999 --> 99:59:59.999 that is driving evaluation scores. 99:59:59.999 --> 99:59:59.999 In that case, all the primping would be for naught -- 99:59:59.999 --> 99:59:59.999 a case of mistaken correlation for causation, 99:59:59.999 --> 99:59:59.999 something we'll talk about further in a later video. 99:59:59.999 --> 99:59:59.999 And what if there were other important variables 99:59:59.999 --> 99:59:59.999 that affect both beauty ratings and evaluation scores? 99:59:59.999 --> 99:59:59.999 You might want to add considerations like skill, 99:59:59.999 --> 99:59:59.999 race, sex, and whether English is the teacher's native language 99:59:59.999 --> 99:59:59.999 to isolate more cleanly the effect of beauty on evaluations. 99:59:59.999 --> 99:59:59.999 When we get into multiple regression 99:59:59.999 --> 99:59:59.999 we will be able to measure the impact of beauty 99:59:59.999 --> 99:59:59.999 on teacher evaluations 99:59:59.999 --> 99:59:59.999 while accounting for other variables 99:59:59.999 --> 99:59:59.999 that might confound this association. 99:59:59.999 --> 99:59:59.999 Next up, we'll get our hands dirty by playing with this data 99:59:59.999 --> 99:59:59.999 to gain a better understanding of what this line can tell us. 99:59:59.999 --> 99:59:59.999 - [Narrator] Congratulations! 99:59:59.999 --> 99:59:59.999 You're one step closer to being a data ninja! 99:59:59.999 --> 99:59:59.999 However, to master this you'll need to strengthen your skills 99:59:59.999 --> 99:59:59.999 with some practice questions. 99:59:59.999 --> 99:59:59.999 Ready for your next mission? Click "Next Video." 99:59:59.999 --> 99:59:59.999 Still here? 99:59:59.999 --> 99:59:59.999 Move from understanding data to understanding your world 99:59:59.999 --> 99:59:59.999 by checking out MRU's other popular economics videos. 
99:59:59.999 --> 99:59:59.999 ♪ [music] ♪