♪ [music] ♪

- [Thomas Stratmann] Hi! In the upcoming series of videos we're going to give you a shiny new tool to put into your Understanding Data toolbox: linear regression.

Say you've got this theory. You've witnessed how good-looking people seem to get special perks. You're wondering -- "Where else might we see this phenomenon?" What about full professors? Is it possible good-looking professors might get special perks too? Is it possible students treat them better by showering them with better student evaluations? If so, is the effect of looks on evaluation score big or [inaudible]? And say there is a new professor starting at the university. What can we predict about his evaluation simply by his looks?

Given that these evaluations can determine pay raises, if this theory were true we might see professors resort to some surprising tactics to boost their scores.

Suppose you wanted to find out if evaluations really improve with better looks. How would you go about testing this hypothesis? You could collect data. First you would have students rate on a scale from 1 to 10 how good looking a professor was, which gives you an average beauty score.
Then you could retrieve the teacher's teaching evaluations from 25 students.

Let's look at these two variables at the same time by using a scatterplot. We'll put beauty on the horizontal axis, and teacher evaluations on the vertical axis. For example, this dot represents Professor Peate, who received a beauty score of 3 and an evaluation of 8.425. This one way out here is Professor Helmchen.

- [Professor Helmchen] Ridiculously good-looking!

- [Thomas] Who got a very high beauty score, but not such a good evaluation.

Can you see a trend? As we move from left to right on the horizontal axis, from the ugly to the gorgeous, we see a trend upwards in evaluation scores.

By the way, the data we're exploring in this series is not made up -- it comes from a real study done at the University of Texas. If you're wondering, "pulchritude" is just the fancy academic way of saying beauty.

With scatterplots it can sometimes be hard to make out the exact relationship between two variables -- especially when the variables bounce around quite a bit as we go from left to right.
One way to cut through this bounciness is to draw a straight line through the data cloud in such a way that this line summarizes the data as closely as possible. The technical term for this is "linear regression." Later on we'll talk about how this line is created, but for now we can assume that the line fits the data as closely as possible.

So, what can this line tell us? First, we immediately see if the line is sloping upward or downward. In our data set we see the [fitted] line slopes upward. It thus confirms what we have conjectured earlier by just looking at the scatterplot. The upward slope means that there is a positive association between looks and evaluation scores. In other words, on average, better-looking professors are getting better evaluations.

For other data sets we might see a stronger positive association. Or, you might see a negative association. Or perhaps no association at all. And our lines don't have to be straight. They can curve to fit the data when necessary.

This line also gives us a way to predict outcomes. We can simply take a beauty score and read off the line what the predicted evaluation score would be. So, back to our new professor.

- [Professor Lloyd] Look familiar?
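Editor's sketch: the idea of drawing a line through the data cloud and then "reading off" a prediction can be illustrated in a few lines of Python. The video has not yet said how the line is created, so this example assumes ordinary least squares, the most common choice. The data points, the names `fit_line` and `predict`, and the new professor's beauty score of 7.5 are all made up for illustration; they are not the University of Texas data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (an assumption)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x):
    """Read the predicted value off the fitted line at x."""
    return intercept + slope * x


# Hypothetical (beauty score, evaluation) pairs, NOT the study's data.
beauty = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
evaluation = [6.1, 6.9, 6.4, 7.2, 7.0, 7.9, 7.5]

intercept, slope = fit_line(beauty, evaluation)

# The sign of the slope gives the direction of the association.
direction = "positive" if slope > 0 else "negative" if slope < 0 else "no"
print(f"slope = {slope:.3f} ({direction} association)")

# Prediction for a hypothetical new professor with a beauty score of 7.5.
print(f"predicted evaluation = {predict(intercept, slope, 7.5):.2f}")
```

The prediction is only a point on the line. How much to trust it is a separate question that this sketch does not address.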
- [Thomas] We can precisely predict his evaluation score.

"But wait! Wait!" you might say. "Can we trust this prediction?" How well does this one beauty variable really predict evaluations? Linear regression gives us some useful measures to answer those questions which we'll cover in a future video.

We also have to be aware of other pitfalls before we draw any definite conclusions. You could imagine a scenario where what is driving the association we see is really a third variable that we have left out. For example, the difficulty of the course might be behind the positive association between beauty ratings and evaluation scores. Easy intro. courses get good evaluations. Harder, more advanced courses get bad evaluations. And younger professors might get assigned to intro. courses. Then, if students judge younger professors more attractive, you will find a positive association between beauty ratings and evaluation scores. But it's really the difficulty of the course, the variable that we've left out, not beauty, that is driving evaluation scores.

In that case, all the primping would be for naught -- a case of mistaking correlation for causation, something we'll talk about further in a later video.
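Editor's sketch: the left-out-variable scenario above can be simulated. In this made-up world, evaluations depend only on course difficulty and never on beauty, while professors teaching easy courses happen to score higher on beauty (standing in for the "younger professors get intro courses" story). All numbers, the seed, and the names `slope` and `simulate_course` are assumptions chosen for illustration, and the line is fitted by ordinary least squares.

```python
import random
from statistics import mean


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is repeatable


def simulate_course(n, beauty_mean, eval_mean):
    """Draw n professors; evaluation is generated WITHOUT using beauty."""
    beauty = [rng.gauss(beauty_mean, 1.0) for _ in range(n)]
    evaluation = [rng.gauss(eval_mean, 0.5) for _ in range(n)]
    return beauty, evaluation


# Easy intro courses: higher beauty ratings and higher evaluations.
easy_beauty, easy_eval = simulate_course(2000, beauty_mean=6.0, eval_mean=8.0)
# Hard advanced courses: lower beauty ratings and lower evaluations.
hard_beauty, hard_eval = simulate_course(2000, beauty_mean=4.0, eval_mean=6.0)

# Ignoring difficulty: pool everyone and regress evaluation on beauty.
naive_slope = slope(easy_beauty + hard_beauty, easy_eval + hard_eval)

# Holding difficulty fixed: fit a separate line within each course type.
easy_slope = slope(easy_beauty, easy_eval)
hard_slope = slope(hard_beauty, hard_eval)

print(f"pooled slope:      {naive_slope:.3f}")
print(f"easy-course slope: {easy_slope:.3f}")
print(f"hard-course slope: {hard_slope:.3f}")
```

The pooled line slopes clearly upward even though beauty has no effect by construction, while the lines fitted within each course type are close to flat. Comparing like with like is one simple way to see that the pooled association comes from the omitted variable.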
And what if there were other important variables that affect both beauty ratings and evaluation scores? You might want to add considerations like skill, race, sex, and whether English is the teacher's native language to isolate more cleanly the effect of beauty on evaluations. When we get into multiple regression we will be able to measure the impact of beauty on teacher evaluations while accounting for other variables that might confound this association.

Next up, we'll get our hands dirty by playing with this data to gain a better understanding of what this line can tell us.

- [Narrator] Congratulations! You're one step closer to being a data ninja! However, to master this you'll need to strengthen your skills with some practice questions. Ready for your next mission? Click "Next Video." Still here? Move from understanding data to understanding your world by checking out MRU's other popular economics videos.

♪ [music] ♪