0:00:00.107,0:00:03.926 ♪ [music] ♪ 0:00:20.880,0:00:22.077 - [Thomas Stratmann] Hi! 0:00:22.077,0:00:24.268 In the upcoming series of videos, 0:00:24.268,0:00:26.858 we're going to give you[br]a shiny new tool 0:00:26.858,0:00:30.414 to put into your[br]Understanding Data toolbox: 0:00:30.414,0:00:31.981 linear regression. 0:00:32.885,0:00:34.668 Say you've got this theory. 0:00:34.668,0:00:37.249 You've witnessed[br]how good-looking people 0:00:37.249,0:00:39.067 seem to get special perks. 0:00:39.642,0:00:40.878 You're wondering, 0:00:40.878,0:00:43.798 "Where else might we see[br]this phenomenon?" 0:00:44.132,0:00:45.637 What about for professors? 0:00:45.637,0:00:48.259 Is it possible[br]good-looking professors 0:00:48.259,0:00:50.010 might get special perks too? 0:00:50.350,0:00:53.899 Is it possible[br]students treat them better 0:00:53.899,0:00:57.209 by showering them[br]with better student evaluations? 0:00:57.866,0:01:00.467 If so, is the effect of looks 0:01:00.467,0:01:03.573 on evaluations really big [br]or really small? 0:01:04.349,0:01:08.263 And say there is a new professor[br]starting at a university. 0:01:08.619,0:01:11.810 What can we predict[br]about his evaluation 0:01:11.810,0:01:13.371 simply by his looks? 0:01:13.940,0:01:17.216 Given that these evaluations[br]can determine pay raises, 0:01:17.671,0:01:21.709 if this theory were true,[br]we might see professors resort 0:01:21.709,0:01:24.980 to some surprising tactics[br]to boost their scores. 0:01:25.471,0:01:27.461 Suppose you wanted to find out 0:01:27.461,0:01:30.801 if evaluations really improve[br]with better looks. 0:01:31.441,0:01:34.450 How would you go about[br]testing this hypothesis? 0:01:34.956,0:01:36.552 You could collect data. 0:01:36.761,0:01:40.025 First you would have students rate[br]on a scale from 1 to 10 0:01:40.025,0:01:42.076 how good-looking a professor was, 0:01:42.076,0:01:44.807 which gives you[br]an average beauty score. 
0:01:45.229,0:01:48.552 Then you could retrieve[br]the teacher's teaching evaluations 0:01:48.552,0:01:50.421 from twenty-five students. 0:01:50.421,0:01:53.273 Let's look at these two variables[br]at the same time 0:01:53.273,0:01:54.738 by using a scatterplot. 0:01:54.981,0:01:57.419 We'll put beauty[br]on the horizontal axis, 0:01:57.852,0:02:00.589 and teacher evaluations[br]on the vertical axis. 0:02:01.463,0:02:03.173 For example, this dot[br]represents Professor Peate, 0:02:03.173,0:02:06.173 - ["Star Wars"] 0:02:06.173,0:02:08.811 who received a beauty score of 3 0:02:08.811,0:02:11.866 and an evaluation of 8.425. 0:02:12.084,0:02:14.958 This one way out here[br]is Professor Helmchen. 0:02:14.958,0:02:16.797 - [Ben Stiller, "Zoolander"][br]Ridiculously good-looking! 0:02:16.797,0:02:18.721 - [Thomas] Who got[br]a very high beauty score, 0:02:18.721,0:02:20.872 but not such a good evaluation. 0:02:21.101,0:02:22.283 Can you see a trend? 0:02:22.283,0:02:25.533 As we move from left to right[br]on the horizontal axis, 0:02:25.533,0:02:27.963 from the ugly to the gorgeous, 0:02:27.963,0:02:31.186 we see a trend upwards[br]in evaluation scores. 0:02:31.870,0:02:35.174 By the way, the data[br]we're exploring in this series 0:02:35.174,0:02:38.923 is not made up --[br]it comes from a real study 0:02:38.923,0:02:40.897 done at the University of Texas. 0:02:41.337,0:02:46.023 If you're wondering, "pulchritude"[br]is just the fancy academic way 0:02:46.023,0:02:47.880 of saying beauty. 0:02:48.405,0:02:51.474 With scatterplots[br]it can sometimes be hard 0:02:51.474,0:02:55.594 to make out the exact relationship[br]between two variables -- 0:02:55.594,0:02:59.104 especially when the values[br]bounce around quite a bit 0:02:59.104,0:03:01.318 as we go from left to right. 
0:03:02.000,0:03:04.908 One way to cut through[br]this bounciness 0:03:04.908,0:03:08.144 is to draw a straight line[br]through the data cloud 0:03:08.144,0:03:10.775 in such a way that this line[br]summarizes the data 0:03:10.775,0:03:12.613 as closely as possible. 0:03:13.295,0:03:17.181 The technical term for this[br]is "linear regression." 0:03:17.669,0:03:20.888 Later on we'll talk about[br]how this line is created, 0:03:20.888,0:03:24.278 but for now we can assume[br]that the line fits the data 0:03:24.278,0:03:26.456 as closely as possible. 0:03:27.087,0:03:29.536 So, what can this line tell us? 0:03:30.067,0:03:32.596 First, we immediately see 0:03:32.596,0:03:35.358 if the line is sloping[br]upward or downward. 0:03:36.107,0:03:39.827 In our data set we see[br]the [fitted] line slopes upward. 0:03:40.794,0:03:43.807 It thus confirms what[br]we have conjectured earlier 0:03:43.807,0:03:45.587 by just looking at the scatterplot. 0:03:46.070,0:03:50.237 The upward slope means[br]that there is a positive association 0:03:50.237,0:03:53.026 between looks[br]and evaluation scores. 0:03:53.544,0:03:55.907 In other words, on average, 0:03:55.907,0:03:59.469 better-looking professors[br]are getting better evaluations. 0:03:59.768,0:04:03.939 For other data sets we might see[br]a stronger positive association. 0:04:04.377,0:04:07.420 Or, you might see[br]a negative association. 0:04:07.857,0:04:10.764 Or perhaps no association at all. 0:04:11.158,0:04:13.903 And our lines[br]don't have to be straight. 0:04:14.389,0:04:17.304 They can curve to fit the data[br]when necessary. 0:04:17.770,0:04:21.262 This line also gives us[br]a way to predict outcomes. 0:04:21.579,0:04:25.569 We can simply take a beauty score[br]and read off the line 0:04:25.569,0:04:28.429 what the predicted[br]evaluation score would be. 0:04:28.609,0:04:30.546 So, back to our new professor. 0:04:31.097,0:04:34.109 We can precisely predict[br]his evaluation score. 0:04:34.683,0:04:36.749 "But wait! 
Wait!" you might say. 0:04:37.019,0:04:38.749 "Can we trust this prediction?" 0:04:39.233,0:04:41.665 How well does[br]this one beauty variable 0:04:41.665,0:04:43.515 really predict evaluations? 0:04:44.844,0:04:47.890 Linear regression gives us[br]some useful measures 0:04:47.890,0:04:49.770 to answer those questions 0:04:49.770,0:04:52.039 which we'll cover[br]in a future video. 0:04:52.838,0:04:55.439 We also have to be aware[br]of other pitfalls 0:04:55.439,0:04:58.340 before we draw[br]any definite conclusions. 0:04:58.833,0:05:00.430 You could imagine a scenario 0:05:00.430,0:05:03.639 where what is driving[br]the association we see 0:05:03.639,0:05:06.900 is really a third variable[br]that we have left out. 0:05:07.344,0:05:09.965 For example,[br]the difficulty of the course 0:05:09.965,0:05:12.456 might be behind[br]the positive association 0:05:12.456,0:05:15.645 between beauty ratings[br]and evaluation scores. 0:05:16.052,0:05:18.956 Easy intro. courses[br]get good evaluations. 0:05:19.228,0:05:22.972 Harder, more advanced courses[br]get bad evaluations. 0:05:23.660,0:05:27.668 And younger professors might[br]get assigned to intro. courses. 0:05:28.080,0:05:32.095 Then, if students judge[br]younger professors more attractive, 0:05:32.095,0:05:34.335 you will find[br]a positive association 0:05:34.335,0:05:37.383 between beauty ratings[br]and evaluation scores. 0:05:37.861,0:05:40.388 But it's really[br]the difficulty of the course, 0:05:40.388,0:05:43.537 the variable that we've left out,[br]not beauty, 0:05:43.537,0:05:45.848 that is driving evaluation scores. 0:05:46.346,0:05:49.807 In that case, all the primping[br]would be for naught -- 0:05:50.289,0:05:54.441 a case of mistaken correlation[br]for causation, 0:05:54.900,0:05:58.166 something we'll talk about further[br]in a later video. 0:05:58.922,0:06:02.069 And what if there were[br]other important variables 0:06:02.069,0:06:05.781 that affect both beauty ratings[br]and evaluation scores? 
0:06:06.626,0:06:09.575 You might want to add[br]considerations like skill, 0:06:09.846,0:06:14.577 race, sex, and whether English[br]is the teacher's native language 0:06:14.577,0:06:18.994 to isolate more cleanly the effect[br]of beauty on evaluations. 0:06:19.408,0:06:21.758 When we get[br]into multiple regression 0:06:21.758,0:06:24.477 we will be able to measure[br]the impact of beauty 0:06:24.477,0:06:26.219 on teacher evaluations 0:06:26.219,0:06:28.368 while accounting[br]for other variables 0:06:28.368,0:06:30.737 that might confound[br]this association. 0:06:31.762,0:06:35.509 Next up, we'll get our hands dirty[br]by playing with this data 0:06:35.509,0:06:39.070 to gain a better understanding[br]of what this line can tell us. 0:06:41.169,0:06:42.445 - [Narrator] Congratulations! 0:06:42.445,0:06:45.247 You're one step closer[br]to being a data ninja! 0:06:45.568,0:06:47.139 However, to master this 0:06:47.139,0:06:48.700 you'll need[br]to strengthen your skills 0:06:48.700,0:06:50.404 with some practice questions. 0:06:50.865,0:06:53.976 Ready for your next mission?[br]Click "Next Video." 0:06:54.313,0:06:55.364 Still here? 0:06:55.598,0:06:58.325 Move from understanding data[br]to understanding your world 0:06:58.325,0:07:01.642 by checking out MRU's[br]other popular economics videos. 0:07:01.892,0:07:04.406 ♪ [music] ♪
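[Supplementary sketch, not part of the spoken video] For viewers who want to try the idea right away, here is a rough Python sketch of what "drawing a line through the data cloud" and "reading a prediction off the line" could look like. It assumes the line is fitted by ordinary least squares, which the video has not yet specified (it only promises to cover how the line is created later on). The numbers are made up for illustration and are not the University of Texas data, and the names `fit_line` and `predict` are hypothetical helpers.

```python
# Hypothetical (beauty score, evaluation) pairs, invented for illustration only.
# These are NOT the values from the University of Texas study.
beauty = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
evaluation = [6.1, 6.9, 6.6, 7.4, 7.2, 8.0, 7.9]


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (an assumption).

    Returns the pair (intercept, slope).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y around their means, and variation of x alone.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Read the predicted evaluation off the fitted line at beauty score x."""
    return intercept + slope * x


intercept, slope = fit_line(beauty, evaluation)

# The sign of the slope tells us the direction of the association.
if slope > 0:
    direction = "positive"
elif slope < 0:
    direction = "negative"
else:
    direction = "no"

print(f"slope = {slope:.3f} ({direction} association)")
print(f"predicted evaluation at beauty 5.5: {predict(intercept, slope, 5.5):.2f}")
```

As the video cautions, a prediction read off the line this way is only as trustworthy as the fit itself, and a positive slope alone does not show that looks cause better evaluations.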