1
00:00:00,107 --> 00:00:03,926
♪ [music] ♪

2
00:00:21,040 --> 00:00:22,077
- [Thomas Stratmann] Hi!

3
00:00:22,077 --> 00:00:24,268
In the upcoming series of videos

4
00:00:24,268 --> 00:00:26,858
we're going to give you
a shiny new tool

5
00:00:26,858 --> 00:00:30,414
to put into your
Understanding Data toolbox:

6
00:00:30,414 --> 00:00:31,981
linear regression.

7
00:00:32,885 --> 00:00:34,668
Say you've got this theory.

8
00:00:34,668 --> 00:00:37,249
You've witnessed
how good-looking people

9
00:00:37,249 --> 00:00:39,067
seem to get special perks.

10
00:00:39,642 --> 00:00:40,878
You're wondering,

11
00:00:40,878 --> 00:00:43,798
"Where else might we see
this phenomenon?"

12
00:00:44,132 --> 00:00:45,637
What about for professors?

13
00:00:45,637 --> 00:00:48,259
Is it possible
good-looking professors

14
00:00:48,259 --> 00:00:50,010
might get special perks too?

15
00:00:50,350 --> 00:00:53,899
Is it possible
students treat them better

16
00:00:53,899 --> 00:00:57,209
by showering them
with better student evaluations?

17
00:00:57,866 --> 00:01:00,467
If so, is the effect of looks

18
00:01:00,467 --> 00:01:03,573
on evaluation score
big or small?

19
00:01:04,349 --> 00:01:08,143
And say there is a new professor
starting at a university.

20
00:01:08,619 --> 00:01:11,810
What can we predict
about his evaluation

21
00:01:11,810 --> 00:01:13,371
simply by his looks?

22
00:01:13,940 --> 00:01:17,216
Given that these evaluations
can determine pay raises,

23
00:01:17,671 --> 00:01:21,709
if this theory were true
we might see professors resort

24
00:01:21,709 --> 00:01:24,980
to some surprising tactics
to boost their scores.

25
00:01:25,471 --> 00:01:27,461
Suppose you wanted to find out

26
00:01:27,461 --> 00:01:30,801
if evaluations really improve
with better looks.

27
00:01:31,441 --> 00:01:34,450
How would you go about
testing this hypothesis?

28
00:01:34,956 --> 00:01:36,552
You could collect data.

29
00:01:36,761 --> 00:01:40,025
First you would have students rate
on a scale from 1 to 10

30
00:01:40,025 --> 00:01:42,076
how good-looking a professor was,

31
00:01:42,076 --> 00:01:44,807
which gives you
an average beauty score.

32
00:01:45,229 --> 00:01:48,552
Then you could retrieve
the teacher's teaching evaluations

33
00:01:48,552 --> 00:01:50,421
from twenty-five students.

34
00:01:50,421 --> 00:01:53,273
Let's look at these two variables
at the same time

35
00:01:53,273 --> 00:01:54,738
by using a scatterplot.

36
00:01:54,981 --> 00:01:57,419
We'll put beauty
on the horizontal axis,

37
00:01:57,852 --> 00:02:00,589
and teacher evaluations
on the vertical axis.

38
00:02:01,463 --> 00:02:05,514
For example, this dot
represents Professor Peate,

39
00:02:06,173 --> 00:02:08,811
who received a beauty score of 3

40
00:02:08,811 --> 00:02:11,866
and an evaluation of 8.425.

41
00:02:12,084 --> 00:02:14,958
This one way out here
is Professor Helmchen.

42
00:02:14,958 --> 00:02:16,797
- [Ben Stiller, "Zoolander"]
Ridiculously good-looking!

43
00:02:16,797 --> 00:02:18,721
- [Thomas] Who got
a very high beauty score,

44
00:02:18,721 --> 00:02:20,872
but not such a good evaluation.

45
00:02:21,101 --> 00:02:22,283
Can you see a trend?

46
00:02:22,283 --> 00:02:25,533
As we move from left to right
on the horizontal axis,

47
00:02:25,533 --> 00:02:27,963
from the ugly to the gorgeous,

48
00:02:27,963 --> 00:02:31,186
we see a trend upwards
in evaluation scores.

49
00:02:31,870 --> 00:02:35,174
By the way, the data
we're exploring in this series

50
00:02:35,174 --> 00:02:38,923
is not made up --
it comes from a real study

51
00:02:38,923 --> 00:02:40,897
done at the University of Texas.

52
00:02:41,337 --> 00:02:46,023
If you're wondering, "pulchritude"
is just the fancy academic way

53
00:02:46,023 --> 00:02:47,880
of saying beauty.

54
00:02:48,405 --> 00:02:51,474
With scatterplots
it can sometimes be hard

55
00:02:51,474 --> 00:02:55,594
to make out the exact relationship
between two variables --

56
00:02:55,594 --> 00:02:59,104
especially when the values
bounce around quite a bit

57
00:02:59,104 --> 00:03:01,318
as we go from left to right.

58
00:03:02,000 --> 00:03:04,908
One way to cut through
this bounciness

59
00:03:04,908 --> 00:03:08,144
is to draw a straight line
through the data cloud

60
00:03:08,144 --> 00:03:10,775
in such a way that this line
summarizes the data

61
00:03:10,775 --> 00:03:12,613
as closely as possible.

62
00:03:13,295 --> 00:03:17,181
The technical term for this
is "linear regression."

63
00:03:17,669 --> 00:03:20,888
Later on we'll talk about
how this line is created,

64
00:03:20,888 --> 00:03:24,278
but for now we can assume
that the line fits the data

65
00:03:24,278 --> 00:03:26,456
as closely as possible.

66
00:03:27,087 --> 00:03:29,536
So, what can this line tell us?

67
00:03:30,067 --> 00:03:32,596
First, we immediately see

68
00:03:32,596 --> 00:03:35,358
if the line is sloping
upward or downward.

69
00:03:36,107 --> 00:03:39,827
In our data set we see
the fitted line slopes upward.

70
00:03:40,794 --> 00:03:43,807
It thus confirms what
we have conjectured earlier

71
00:03:43,807 --> 00:03:45,587
by just looking at the scatterplot.

72
00:03:46,070 --> 00:03:50,237
The upward slope means
that there is a positive association

73
00:03:50,237 --> 00:03:53,026
between looks
and evaluation scores.

74
00:03:53,544 --> 00:03:55,907
In other words, on average,

75
00:03:55,907 --> 00:03:59,469
better-looking professors
are getting better evaluations.

76
00:03:59,768 --> 00:04:03,939
For other data sets we might see
a stronger positive association.

77
00:04:04,377 --> 00:04:07,420
Or, you might see
a negative association.

78
00:04:07,857 --> 00:04:10,764
Or perhaps no association at all.

79
00:04:11,158 --> 00:04:13,903
And our lines
don't have to be straight.

80
00:04:14,389 --> 00:04:17,304
They can curve to fit the data
when necessary.

81
00:04:17,770 --> 00:04:21,262
This line also gives us
a way to predict outcomes.

82
00:04:21,579 --> 00:04:25,569
We can simply take a beauty score
and read off the line

83
00:04:25,569 --> 00:04:28,429
what the predicted
evaluation score would be.

84
00:04:28,609 --> 00:04:30,546
So, back to our new professor.

85
00:04:31,097 --> 00:04:34,109
We can precisely predict
his evaluation score.

86
00:04:34,683 --> 00:04:36,749
"But wait! Wait!" you might say.

87
00:04:37,019 --> 00:04:38,749
"Can we trust this prediction?"

88
00:04:39,233 --> 00:04:41,665
How well does
this one beauty variable

89
00:04:41,665 --> 00:04:43,515
really predict evaluations?

90
00:04:44,844 --> 00:04:47,890
Linear regression gives us
some useful measures

91
00:04:47,890 --> 00:04:49,770
to answer those questions

92
00:04:49,770 --> 00:04:52,039
which we'll cover
in a future video.

93
00:04:52,838 --> 00:04:55,439
We also have to be aware
of other pitfalls

94
00:04:55,439 --> 00:04:58,340
before we draw
any definite conclusions.

95
00:04:58,833 --> 00:05:00,430
You could imagine a scenario

96
00:05:00,430 --> 00:05:03,639
where what is driving
the association we see

97
00:05:03,639 --> 00:05:06,900
is really a third variable
that we have left out.

98
00:05:07,344 --> 00:05:09,965
For example,
the difficulty of the course

99
00:05:09,965 --> 00:05:12,456
might be behind
the positive association

100
00:05:12,456 --> 00:05:15,645
between beauty ratings
and evaluation scores.

101
00:05:16,052 --> 00:05:18,956
Easy intro courses
get good evaluations.

102
00:05:19,228 --> 00:05:22,972
Harder, more advanced courses
get bad evaluations.

103
00:05:23,660 --> 00:05:27,668
And younger professors might
get assigned to intro courses.

104
00:05:28,080 --> 00:05:32,095
Then, if students judge
younger professors more attractive,

105
00:05:32,095 --> 00:05:34,335
you will find
a positive association

106
00:05:34,335 --> 00:05:37,383
between beauty ratings
and evaluation scores.

107
00:05:37,861 --> 00:05:40,388
But it's really
the difficulty of the course,

108
00:05:40,388 --> 00:05:43,537
the variable that we've left out,
not beauty,

109
00:05:43,537 --> 00:05:45,848
that is driving evaluation scores.

110
00:05:46,346 --> 00:05:49,807
In that case, all the primping
would be for naught --

111
00:05:50,289 --> 00:05:54,441
a case of mistaking correlation
for causation,

112
00:05:54,900 --> 00:05:58,166
something we'll talk about further
in a later video.

113
00:05:58,922 --> 00:06:02,069
And what if there were
other important variables

114
00:06:02,069 --> 00:06:05,781
that affect both beauty ratings
and evaluation scores?

115
00:06:06,626 --> 00:06:09,575
You might want to add
considerations like skill,

116
00:06:09,846 --> 00:06:14,577
race, sex, and whether English
is the teacher's native language

117
00:06:14,577 --> 00:06:18,994
to isolate more cleanly the effect
of beauty on evaluations.

118
00:06:19,408 --> 00:06:21,758
When we get
into multiple regression

119
00:06:21,758 --> 00:06:24,477
we will be able to measure
the impact of beauty

120
00:06:24,477 --> 00:06:26,219
on teacher evaluations

121
00:06:26,219 --> 00:06:28,368
while accounting
for other variables

122
00:06:28,368 --> 00:06:30,737
that might confound
this association.

123
00:06:31,762 --> 00:06:35,509
Next up, we'll get our hands dirty
by playing with this data

124
00:06:35,509 --> 00:06:39,070
to gain a better understanding
of what this line can tell us.

125
00:06:41,169 --> 00:06:42,445
- [Narrator] Congratulations!

126
00:06:42,445 --> 00:06:45,247
You're one step closer
to being a data ninja!

127
00:06:45,568 --> 00:06:47,139
However, to master this

128
00:06:47,139 --> 00:06:48,700
you'll need
to strengthen your skills

129
00:06:48,700 --> 00:06:50,404
with some practice questions.

130
00:06:50,865 --> 00:06:53,976
Ready for your next mission?
Click "Next Video."

131
00:06:54,313 --> 00:06:55,364
Still here?

132
00:06:55,598 --> 00:06:58,325
Move from understanding data
to understanding your world

133
00:06:58,325 --> 00:07:01,642
by checking out MRU's
other popular economics videos.

134
00:07:01,892 --> 00:07:04,406
♪ [music] ♪