♪ [music] ♪

- [Narrator] Welcome to Nobel Conversations. In this episode, Josh Angrist and Guido Imbens sit down with Isaiah Andrews to discuss, and disagree over, the role of machine learning in applied econometrics.

- [Isaiah] So, of course, there are a lot of topics where you guys largely agree, but I'd like to turn to one where maybe you have some differences of opinion. I'd love to hear some of your thoughts about machine learning and the role that it's playing and is going to play in economics.

- [Guido] I've looked at some data that were proprietary, so there's no published paper there. There was an experiment that was done on some search algorithm, and the question was about ranking things and changing the ranking. And it was sort of clear that there was going to be a lot of heterogeneity there. If you look for, say, a picture of Britney Spears, it doesn't really matter where you rank it, because you're going to figure out what you're looking for whether you put it in the first or second or third position of the ranking. But if you're looking for the best econometrics book, whether you put your book first or your book tenth is going to make a big difference in how often people are going to click on it. And so there you --

- [Josh] Why do I need machine learning to discover that? It seems like I can discover it simply.

- [Guido] So in general --

- [Josh] There were lots of possible...

- [Guido] You want to think about there being lots of characteristics of the items, and you want to understand what drives the heterogeneity in the effect of --

- [Josh] But you're just predicting. In some sense, you're solving a marketing problem.

- [Guido] No, it's a causal effect.

- [Josh] It's causal, but it has no scientific content. Think about...

- [Guido] No, but there are similar things in medical settings. If you do an experiment, you may actually be very interested in whether the treatment works for some groups or not.
And you have a lot of individual characteristics, and you want to systematically search --

- [Josh] Yeah, I'm skeptical about that -- that sort of idea that there's this personal causal effect that I should care about, and that machine learning can discover it in some way that's useful. So think about -- I've done a lot of work on schools: going to, say, a charter school, a publicly funded private school that's effectively free to structure its own curriculum, for context there. Some types of charter schools generate spectacular achievement gains, and in the data set that produces that result, I have a lot of covariates. So I have baseline scores, and I have family background, the education of the parents, the sex of the child, the race of the child. And as soon as I put half a dozen of those together, I have a very high-dimensional space. I'm definitely interested in coarse features of that treatment effect, like whether it's better for people who come from lower-income families. I have a hard time believing that there's an application for the very high-dimensional version of that, where I discover that the effect is there for non-white children who have high family incomes but baseline scores in the third quartile, and who went to public school in the third grade but not the sixth grade. That's what that high-dimensional analysis produces: a very elaborate conditional statement. There are two things that are wrong with that in my view. First, I just can't imagine why it's actionable -- I don't know why you'd want to act on it. And I know also that there's some alternative model that fits almost as well that flips everything, because machine learning doesn't tell me that this is really the predictor that matters -- it just tells me that this is a good predictor. And so I think there is something different about the social science context.
- [Guido] I think the social science applications you're talking about are ones where, I think, there's not a huge amount of heterogeneity in the effects.

- [Josh] Well, there might be, if you allow me to fill that space.

- [Guido] No... not even then. I think for a lot of those interventions, you would expect that the effect is the same sign for everybody. There may be small differences in the magnitude, but it's not... For a lot of these educational interventions -- they're good for everybody. It's not that they're bad for some people and good for other people, or that there are very small pockets where they're bad. There may be some variation in the magnitude, but you would need very, very big data sets to find it. I agree that in those cases it probably wouldn't be very actionable anyway. But I think there are a lot of other settings where there is much more heterogeneity.

- [Josh] Well, I'm open to that possibility, and I think the example you gave is essentially a marketing example.

- [Guido] No, those have implications for how you organize things, whether you need to worry about the...

- [Josh] Well, I need to see that paper.

- [Isaiah] So the sense I'm getting is that --

- We still disagree on something. - Yes. - We haven't converged on everything.

- [Isaiah] I'm getting that sense. [laughter]

- Actually, we've diverged on this, because this wasn't around to argue about. [laughter]

- Is it getting a little warm here?

- [Isaiah] Warmed up. Warmed up is good. The sense I'm getting is, Josh, you're not saying that you're confident that there is no application where this stuff is useful. You're saying you're unconvinced by the existing applications to date.

- Fair enough. - I'm very confident. [laughter] - In this case.
- [Guido] I think Josh does have a point that even the prediction cases where a lot of the machine learning methods really shine are ones where there's just a lot of heterogeneity.

- [Josh] You don't really care much about the details there, right?

- [Guido] Yes.

- [Josh] It doesn't have a policy angle or something.

- [Guido] Recognizing handwritten digits and things like that -- machine learning does much better there than building some complicated model. But in a lot of the social science, a lot of the economic applications, we actually know a huge amount about the relationships between the variables. A lot of the relationships are strictly monotone. Education is going to increase people's earnings, irrespective of the demographic, irrespective of the level of education you already have.

- [Josh] Until they get to a Ph.D.

- Is that true for graduate school? [laughter]

- [Guido] Over a reasonable range, it's not going to go down very much. In a lot of the settings where these machine learning methods shine, there's a lot of non-monotonicity, a kind of multimodality in these relationships, and there they're going to be very powerful. But I still stand by this: these methods just have a huge amount to offer for economists, and they're going to be a big part of the future.

♪ [music] ♪

- [Isaiah] It feels like there's something interesting to be said about machine learning here. So, Guido, I was wondering, could you give maybe some examples of the sorts of applications you're thinking about, with applications coming out at the moment?

- [Guido] So one area is where, instead of looking for average causal effects, we're looking for individualized estimates -- predictions of causal effects -- and there, the machine learning algorithms have been very effective. Traditionally, we would have done these things using kernel methods, and theoretically they work great, and there are some arguments that, formally, you can't do any better.
But in practice, they don't work very well. The causal-forest-type methods that Stefan Wager and Susan Athey have been working on are used very widely. They've been very effective in these settings at actually getting causal effects that vary by covariates. I think this is still just the beginning of these methods. But in many cases, these algorithms are very effective at searching over big spaces and finding the functions that fit very well, in ways that we couldn't really do before.

- [Josh] I don't know of an example where machine learning has generated insights about a causal effect that I'm interested in. And I do know of examples where it's potentially very misleading. So I've done some work with Brigham Frandsen, using, for example, random forests to model covariate effects in an instrumental variables problem where you need to condition on covariates. You don't particularly have strong feelings about the functional form for that, so maybe you should be open to flexible curve fitting. That leads you down a path where there are a lot of nonlinearities in the model, and that's very dangerous with IV, because any sort of excluded nonlinearity potentially generates a spurious causal effect. Brigham and I showed that very powerfully, I think, in the case of two instruments that come from a paper of mine with Bill Evans, where if you replace a traditional two-stage least squares estimator with some kind of random forest, you get very precisely estimated nonsense. I think that's a big caution. In view of those findings, in an example I care about, where the instruments are very simple and I believe that they're valid, I would be skeptical of that. Nonlinearity and IV don't mix very comfortably.

- [Guido] No, it sounds like that's already a more complicated...

- [Josh] Well, it's IV... - Yeah.

- ...but then we work on that. [laughter]

- Fair enough.
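[To make the caution concrete, here is a minimal simulation sketch. The data-generating process and numbers are invented for illustration, and this is not the Angrist-Frandsen analysis itself, just one mechanism behind it: with a true treatment effect of zero, naively substituting overfit, in-sample random-forest fitted values for the linear first stage of two-stage least squares manufactures a confidently wrong estimate, because the fitted values memorize part of the treatment and carry the confounder into the second stage.]

```python
# Sketch only: invented data, illustrating why a random-forest "first stage"
# plugged into 2SLS can produce precisely estimated nonsense.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 5000
z = rng.normal(size=n)                      # a simple, valid instrument
u = rng.normal(size=n)                      # unobserved confounder
d = z + u + rng.normal(size=n)              # endogenous treatment
y = 0.0 * d + 2.0 * u + rng.normal(size=n)  # true effect of d is ZERO

def second_stage(d_hat):
    """OLS of y on [1, d_hat]; returns the coefficient on d_hat."""
    W = np.column_stack([np.ones(n), d_hat])
    return np.linalg.lstsq(W, y, rcond=None)[0][1]

# Standard 2SLS: linear first stage, so fitted values depend only on z.
W1 = np.column_stack([np.ones(n), z])
d_hat_linear = W1 @ np.linalg.lstsq(W1, d, rcond=None)[0]

# Naive plug-in: in-sample random-forest fitted values partially memorize d,
# smuggling the confounder u into the second stage.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
d_hat_forest = forest.fit(z.reshape(-1, 1), d).predict(z.reshape(-1, 1))

print(f"2SLS estimate:           {second_stage(d_hat_linear):+.3f} (truth: 0)")
print(f"Forest plug-in estimate: {second_stage(d_hat_forest):+.3f}")
```

[With these settings the linear 2SLS estimate lands near zero while the forest plug-in does not; the problem is not the forest itself but treating its in-sample fit as if it were a valid first-stage projection.]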
♪ [music] ♪

- [Guido] As editor of Econometrica, a lot of these papers cross my desk, but the motivation is often not clear and, in fact, really lacking. They're not the old type of semiparametric foundational papers. So that's a big problem. A related problem is that we have this tradition in econometrics of being very focused on formal asymptotic results. We just have a lot of papers where people propose a method and then establish its asymptotic properties in a very standardized way.

- [Josh] Is that bad?

- [Guido] Well, I think it's sort of closed the door to a lot of work that doesn't fit into that, whereas in the machine learning literature, a lot of things are more algorithmic. People had algorithms for coming up with predictions that turn out to actually work much better than, say, nonparametric kernel regression. For a long time, we were doing all the nonparametrics in econometrics using kernel regression, and that was great for proving theorems. You could get confidence intervals, and consistency, and asymptotic normality, and it was all great, but it wasn't very useful. And the things they did in machine learning are just way, way better. But they didn't have the --

- [Josh] That's not my beef with machine learning -- that the theory is weak. [laughter]

- [Guido] No, but I'm saying that for the prediction part, it does much better.

- [Josh] Yeah, it's a better curve-fitting tool.

- [Guido] But it did so in a way that would not have made it easy to get those papers into the econometrics journals initially, because they weren't proving the type of things... When Breiman was doing his regression trees, they just didn't fit in. I think he would have had a very hard time publishing those things in econometrics journals. I think we've limited ourselves too much, and that's closed us off from a lot of these machine learning methods that are actually very useful.
I mean, in general, that literature -- the computer scientists -- has proposed a huge number of these algorithms that are actually very useful and that are affecting the way we're going to be doing empirical work. But we've not fully internalized that, because we're still very focused on getting point estimates, and getting standard errors, and getting p-values, in a way that we need to move beyond to fully harness the benefits of the machine learning literature.

- [Isaiah] On the one hand, I very much take your point that the traditional econometrics framework -- propose a method, prove a limit theorem under some asymptotic story, publish the paper -- is constraining, and that by thinking more broadly about what a methods paper could look like, we may do better. Certainly the machine learning literature has found a bunch of things that seem to work quite well for a number of problems and are now having substantial influence in economics. I guess a question I'm interested in is how you think about the role of the theory part of it -- do you think there's no value in it? Because a question I often have when seeing the output from a machine learning tool -- and actually, a number of the methods you talked about do have inferential results developed for them -- is about uncertainty quantification. I have my prior, I come into the world with my view, I see the result of this thing. How should I update based on it? In some sense, if I'm in a world where things are normally distributed, I know how to do that; here I don't. And so I'm interested to hear what you think about that.
- [Guido] I don't see this as saying those results are not interesting. But there are going to be a lot of cases where it's incredibly hard to get those results, and we may not be able to get there, and we may need to do it in stages: first someone says, "Hey, I have this interesting algorithm for doing something, and it works well by some criterion on this particular data set," and we should put it out there. Maybe someone will later figure out a way that you can actually still do inference under some conditions, and maybe those are not particularly realistic conditions, and then we go further. But I think we've been constraining things too much, where we said, "This is the type of thing that we need to do." In some sense, that goes back to the way Josh and I thought about things for the local average treatment effect. That wasn't quite the way people were thinking about these problems before. There was a sense among some people that the way you need to do these things is to first say what you're interested in estimating and then do the best job you can in estimating that -- and that what you guys are doing is doing it backwards. You say, "Here, I have an estimator, and now I'm going to figure out what it's estimating," and then presumably why you think that's interesting, or maybe why it's not interesting -- and that's not okay; you're not allowed to do it that way. I think we should just be a little more flexible in thinking about how to look at problems, because I think we've missed some things by not doing that.

♪ [music] ♪

- [Josh] So you've heard our views, Isaiah, and you've seen that we have some points of disagreement. Why don't you referee this dispute for us? [laughter]

- [Isaiah] Oh, it's so nice of you to ask me a small question.
[laughter]

So I guess, for one, I very much agree with something that Guido said earlier... [laughter] One place where the case for machine learning seems relatively clear is in settings where we're interested in some version of a nonparametric prediction problem. So I'm interested in estimating a conditional expectation or a conditional probability, and in the past, maybe I would have run a kernel regression, or a series regression, or something along those lines. It seems like, at this point, we have a fairly good sense that in a fairly wide range of applications, machine learning methods seem to do better at estimating conditional mean functions, or conditional probabilities, or various other nonparametric objects, than the more traditional nonparametric methods that were studied in econometrics and statistics, especially in high-dimensional settings.

- [Guido] So you're thinking of maybe the propensity score or something like that?

- [Isaiah] Yeah, exactly.

- Nuisance functions.

- [Isaiah] Yeah, things like propensity scores. Even objects of more direct interest, like conditional average treatment effects, which are the difference of two conditional expectation functions -- potentially things like that. Of course, even there, the theory for inference -- for how to interpret these things, how to make large-sample statements about them -- is less well developed, depending on the machine learning estimator used. And so I think something that is tricky is that we can have these methods, which seem to work a lot better for some purposes, but which we need to be a bit careful in how we plug in, and in how we interpret the resulting statements. But, of course, that's a very, very active area right now, where people are doing tons of great work, so I fully expect, and hope, to see much more going forward there. So one issue with machine learning that always seems a danger --
or that is sometimes a danger, and has sometimes led to applications that have made less sense -- is when folks start with a method they're very excited about rather than with a question. Starting with a question -- here's the object I'm interested in, here's the parameter of interest, let me think about how I would identify that thing, how I would recover it if I had a ton of data; oh, here's a conditional expectation function, let me plug in a machine learning estimator for that -- seems very, very sensible. Whereas if I regress quantity on price and say that I used a machine learning method, maybe I'm satisfied that that solves the endogeneity problems we're usually worried about there... maybe I'm not. But, again, that's something where the way to address it seems relatively clear: find your object of interest and think about --

- [Josh] Just bring in the economics.

- [Isaiah] Exactly.

- [Guido] And think about the heterogeneity, but harness the power of the machine learning methods for some of the components.

- [Isaiah] Precisely. Exactly. So the question of interest is the same as it has always been, but we now have better methods for estimating some pieces of it. The place that seems harder to forecast is -- obviously there's a huge amount going on in the machine learning literature, and the ways of plugging it in that I've referenced so far are a limited piece of that. So I think there are all sorts of other interesting questions about where this interaction goes, and what else we can learn. That's something where I think there's a ton going on which seems very promising, and I have no idea what the answer is.

- [Guido] No, I totally agree with that, but that makes it very exciting. And I think there's just a lot of work to be done there.

- [Josh] All right. So I'd say he agrees with me there. [laughter]

- [Isaiah] I didn't say that per se.
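[As a concrete version of the recipe Isaiah sketches -- fix the object of interest first, then plug machine learning in only for the nuisance pieces -- here is a minimal sketch of cross-fitted AIPW (doubly robust) estimation of an average treatment effect under unconfoundedness. The data-generating process and model choices are invented for illustration; this illustrates the general idea rather than any specific estimator discussed in the conversation.]

```python
# Sketch only: invented data. The estimand (the ATE) is fixed first; random
# forests enter only as plug-ins for the nuisance functions (the propensity
# score and the two outcome regressions), with cross-fitting so that each
# observation's nuisances are predicted by models fit on other folds.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=(n, 5))                   # covariates
p = 1.0 / (1.0 + np.exp(-x[:, 0]))            # true propensity score
d = rng.binomial(1, p)                        # treatment indicator
y = 1.0 * d + x[:, 0] + rng.normal(size=n)    # true ATE = 1.0

scores = np.empty(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(x):
    e = (RandomForestClassifier(n_estimators=200, random_state=0)
         .fit(x[train], d[train]).predict_proba(x[test])[:, 1])
    e = np.clip(e, 0.01, 0.99)                # trim extreme propensities
    treated, control = train[d[train] == 1], train[d[train] == 0]
    m1 = (RandomForestRegressor(n_estimators=200, random_state=0)
          .fit(x[treated], y[treated]).predict(x[test]))
    m0 = (RandomForestRegressor(n_estimators=200, random_state=0)
          .fit(x[control], y[control]).predict(x[test]))
    # AIPW score: outcome-model contrast plus propensity-weighted residuals.
    scores[test] = (m1 - m0
                    + d[test] * (y[test] - m1) / e
                    - (1 - d[test]) * (y[test] - m0) / (1 - e))

ate, se = scores.mean(), scores.std(ddof=1) / np.sqrt(n)
print(f"ATE estimate: {ate:.3f} (SE {se:.3f}); truth is 1.0")
```

[The division of labor is the point: the question and the identifying assumptions come from the economics, and the machine learning is confined to the conditional-expectation pieces, where, as discussed above, it tends to outperform kernel and series methods.]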
♪ [music] ♪

- [Narrator] If you'd like to watch more Nobel Conversations, click here. Or if you'd like to learn more about econometrics, check out Josh's Mastering Econometrics series. If you'd like to learn more about Guido, Josh, and Isaiah, check out the links in the description.

♪ [music] ♪