♪ [music] ♪

- [Narrator] Welcome to Nobel Conversations. In this episode, Josh Angrist and Guido Imbens sit down with Isaiah Andrews to discuss, and disagree over, the role of machine learning in applied econometrics.

- [Isaiah] So, of course, there are a lot of topics where you guys largely agree, but I'd like to turn to one where maybe you have some differences of opinion. I'd love to hear some of your thoughts about machine learning and the role that it's playing and is going to play in economics.

- [Guido] I've looked at some data like this. It's proprietary, so there's no published paper. There was an experiment that was done on some search algorithm, and the question was about ranking things and changing the ranking. It was sort of clear that there was going to be a lot of heterogeneity there. You know, if you look for, say, a picture of Britney Spears, it doesn't really matter where you rank it, because you're going to figure out what you're looking for whether you put it in the first or second or third position of the ranking. But if you're looking for the best econometrics book, whether you put your book first or your book tenth is going to make a big difference in how often people are going to click on it. And so there you go --

- [Josh] Why do I need machine learning to discover that? It seems like I could discover that simply.

- [Guido] So in general --

- [Josh] There were lots of possible...

- [Guido] What you want is to think about there being lots of characteristics of the items, where you want to understand what drives the heterogeneity in the effect of --

- [Josh] But you're just predicting. In some sense, you're solving a marketing problem.

- [Guido] [inaudible] It's a causal effect --

- [Josh] It's causal, but it has no scientific content. Think about...

- [Guido] No, but there are similar things in medical settings. If you do an experiment, you may actually be very interested in whether the treatment works for some groups or not.
And you have a lot of individual characteristics, and you want to systematically search.

- [Josh] Yeah, I'm skeptical about that -- about the idea that there's this personal causal effect that I should care about, and that machine learning can discover it in some way that's useful. So think about it -- I've done a lot of work on schools: going to, say, a charter school, a publicly funded private school, effectively, that's free to structure its own curriculum, for context. Some types of charter schools generate spectacular achievement gains, and in the data set that produces that result, I have a lot of covariates: baseline scores, family background, the education of the parents, the sex of the child, the race of the child. As soon as I put half a dozen of those together, I have a very high-dimensional space. I'm definitely interested in coarse features of that treatment effect, like whether it's better for people who come from lower-income families. But I have a hard time believing that there's an application for the very high-dimensional version of that, where I discover that it works for non-white children who have high family incomes but baseline scores in the third quartile, and who only went to public school in the third grade but not the sixth grade. That's what that high-dimensional analysis produces: this very elaborate conditional statement. There are two things wrong with that, in my view. First, I just can't imagine why it's actionable -- I don't know why you'd want to act on it. And I also know that there's some alternative model that fits almost as well and flips everything, because machine learning doesn't tell me that this is really the predictor that matters. It just tells me that this is a good predictor. So I think there is something different about the social science context.
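(A quick back-of-the-envelope makes Josh's dimensionality point concrete. This is a minimal sketch; the category counts and sample size are illustrative assumptions, not figures from his charter school data.)

```python
# How fast fully interacted subgroups proliferate -- illustrative numbers only.
from math import prod

covariate_levels = {
    "baseline score quartile": 4,  # assumed coarsening, not from the study
    "family income quartile": 4,
    "parental education": 4,
    "sex": 2,
    "race/ethnicity": 4,
}
cells = prod(covariate_levels.values())  # 4 * 4 * 4 * 2 * 4 = 512 subgroups
n_students = 5_000                       # a generously sized school study (assumed)
print(f"{cells} cells, about {n_students / cells:.0f} students per cell")
# 512 cells, about 10 students per cell: far too few observations
# to pin down a separate treatment effect in each one.
```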
- [Guido] In the social science applications you're talking about, I think there's not a huge amount of heterogeneity in the effects.

- [Josh] There might be, if you allow me to fill that space.

- [Guido] No... not even then. I think for a lot of those interventions, you would expect that the effect is the same sign for everybody. There may be small differences in the magnitude, but for a lot of these education interventions, they're good for everybody. It's not that they're bad for some people and good for other people, aside from maybe some very small pockets where they're bad. There may be some variation in the magnitude, but you would need very, very big data sets to find it, and I agree that in those cases it probably wouldn't be very actionable anyway. But I think there are a lot of other settings where there is much more heterogeneity.

- [Josh] Well, I'm open to that possibility, but I think the example you gave is essentially a marketing example.

- [Guido] No, those have implications for how you organize things, for whether you need to worry about the...

- [Josh] Well, I need to see that paper.

- [Isaiah] So the sense I'm getting...

- We still disagree on something.

- Yes.

[laughter]

- We haven't converged on everything.

- I'm getting that sense.

[laughter]

- Actually, we've diverged on this, because this wasn't around to argue about before.

[laughter]

- Is it getting a little warm here?

- Warmed up. Warmed up is good.

- [Isaiah] The sense I'm getting is, Josh, you're not saying you're confident that there's no application where this stuff is useful. You're saying you're unconvinced by the existing applications to date. Fair enough?

- [Josh] I'm very confident.

[laughter]

- [Josh] In this case.
- [Guido] I think Josh does have a point. Even in the prediction cases, where a lot of the machine learning methods really shine is where there's just a lot of heterogeneity.

- [Josh] You don't really care much about the details there, right? It doesn't have a policy angle or something.

- [Guido] Right -- things like recognizing handwritten digits and such. Machine learning does much better there than building some complicated model. But in a lot of the social science applications, a lot of the economic applications, we actually know a huge amount about the relationships between the variables. A lot of the relationships are strictly monotone. Education is going to increase people's earnings, irrespective of the demographic, irrespective of the level of education you already have.

- [Josh] Until they get to a Ph.D.

- [Guido] Yeah, there is graduate school...

[laughter]

But over a reasonable range, it's not going to go down very much. In a lot of the settings where these machine learning methods shine, there's a lot of [ ] kind of multimodality in these relationships, and there they're going to be very powerful. But I still stand by this: these methods have a huge amount to offer for economists, and they're going to be a big part of the future.

- [Isaiah] It feels like there's something interesting to be said about machine learning here. So, Guido, I was wondering, could you give some more... maybe some examples of the sorts of applications you're thinking about [ ] at the moment?

- [Guido] So, in areas where, instead of looking for average causal effects, we're looking for individualized estimates -- predictions of causal effects -- the machine learning algorithms have been very effective. Traditionally, we would have done these things using kernel methods, and theoretically they work great; there are even arguments that, formally, you can't do any better. But in practice, they don't work very well.
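(A minimal sketch of the kind of comparison Guido gestures at, assuming a simulated prediction problem with a bumpy, interaction-heavy conditional mean and several irrelevant inputs. The data-generating process and tuning values are invented for illustration; on draws like this, the forest typically comes out ahead because it ignores the irrelevant dimensions, while an untuned kernel method is hurt by them.)

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n, p = 4_000, 10
X = rng.uniform(-2, 2, size=(n, p))
# Only the first three inputs matter; the conditional mean is nonmonotone.
f = np.sin(3 * X[:, 0]) * (X[:, 1] > 0) + X[:, 2] ** 2
y = f + rng.normal(scale=0.5, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

kernel = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1).fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

print("kernel ridge, held-out R^2:", r2_score(y_te, kernel.predict(X_te)))
print("random forest, held-out R^2:", r2_score(y_te, forest.predict(X_te)))
```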
- [Guido] The causal forest-type methods that Stefan Wager and Susan Athey have been working on have been used very widely. They've been very effective in these settings for actually getting causal effects that vary by [ ]. I think this is still just the beginning for these methods, but in many cases these algorithms are very effective at searching over big spaces and finding the functions that fit very well, in ways that we couldn't really do before.

- [Josh] I don't know of an example where machine learning has generated insights about a causal effect that I'm interested in, and I do know of examples where it's potentially very misleading. I've done some work with Brigham Frandsen, using, for example, random forests to model covariate effects in an instrumental variables problem, where you need to condition on covariates and you don't particularly have strong feelings about the functional form for that, so maybe you should be open to flexible curve fitting. That leads you down a path where there are a lot of nonlinearities in the model, and that's very dangerous with IV, because any sort of excluded nonlinearity potentially generates a spurious causal effect. Brigham and I showed that very powerfully, I think, in the case of two instruments that come from a paper of mine with Bill Evans: if you replace a traditional two-stage least squares estimator with some kind of random forest, you get very precisely estimated nonsense estimates. I think that's a big caution. In view of those findings, in an example I care about, where the instruments are very simple and I believe that they're valid, I would be skeptical of that. So nonlinearity and IV don't mix very comfortably.

- [Guido] No, it sounds like that's already a more complicated...

- [Josh] Well, it's IV...

- [Guido] Yeah.

- [Josh] ...and we work on that.

[laughter]

- Fair enough.
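(Josh's warning can be reproduced in a few lines. The sketch below is not the Angrist-Frandsen design or the Angrist-Evans data; it is a hypothetical simulation in which the instrument is valid and the true effect of the treatment is exactly zero. Plugging random-forest first-stage fitted values straight into the second stage smuggles an excluded nonlinearity in the covariate back in and manufactures a precisely estimated effect out of nothing.)

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)                     # observed covariate
z = rng.binomial(1, 0.5, n).astype(float)  # valid, randomly assigned instrument
d = z + x**2 + rng.normal(size=n)          # first stage is nonlinear in x
y = 0.0 * d + x**2 + rng.normal(size=n)    # true effect of d is exactly zero

def second_stage(d_hat):
    """OLS of y on (d_hat, x, constant); x is controlled for linearly only."""
    design = np.column_stack([d_hat, x, np.ones(n)])
    return np.linalg.lstsq(design, y, rcond=None)[0][0]

# Conventional 2SLS: first stage linear in (z, x).
linear_fs = LinearRegression().fit(np.column_stack([z, x]), d)
d_hat_2sls = linear_fs.predict(np.column_stack([z, x]))

# "Machine learning 2SLS": random-forest fitted values plugged in directly.
rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=50, random_state=0)
d_hat_rf = rf.fit(np.column_stack([z, x]), d).predict(np.column_stack([z, x]))

print("linear first stage:", second_stage(d_hat_2sls))  # close to 0, as it should be
print("forest first stage:", second_stage(d_hat_rf))    # far from 0: the forest's
# fitted values load on x**2, which also sits in y's error term, so the
# excluded nonlinearity masquerades as a causal effect.
```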
- [Guido] As editor of Econometrica, a lot of these papers cross my desk, and often the motivation is not clear -- in fact, it's really lacking. They're not [ ]-type semiparametric foundational papers. So that's a big problem. A related problem is that we have this tradition in econometrics of being very focused on formal asymptotic results. We just have a lot of papers where people propose a method and then establish its asymptotic properties in a very standardized way.

- [Josh] Is that bad?

- [Guido] Well, I think it has sort of closed the door on a lot of work that doesn't fit into that mold, whereas in the machine learning literature, a lot of things are more algorithmic. People had algorithms for coming up with predictions that turn out to actually work much better than, say, nonparametric kernel regression. For a long time, when we were doing nonparametrics in econometrics, we were using kernel regression, and it was great for proving theorems. You could get confidence intervals, consistency, and asymptotic normality, and it was all great. But it wasn't very useful, and the things they did in machine learning are just way, way better. But they didn't have the problem --

- [Josh] That's not my beef with machine learning theory.

[laughter]

- [Guido] No, but I'm saying that for the prediction part, it does much better.

- [Josh] Yeah, it's better curve fitting.

- [Guido] But it did so in a way that would not have made those papers easy to get into the econometrics journals initially, because it wasn't proving the standard type of results. When Breiman was doing his regression trees, that just didn't fit in. I think he would have had a very hard time publishing those things in econometrics journals. I think we've limited ourselves too much, in a way that closed things off to a lot of these machine learning methods that are actually very useful.
- [Guido] I mean, in general, that literature -- the computer scientists -- has proposed a huge number of these algorithms that are actually very useful and that are affecting the way we're going to be doing empirical work. But we've not fully internalized that, because we're still very focused on getting point estimates, getting standard errors, and getting p-values, in a way that we need to move beyond to fully harness the benefits of the machine learning literature.

- [Isaiah] On the one hand, I very much take your point that the traditional econometrics framework -- propose a method, prove a limit theorem under some asymptotic story, publish a paper -- is constraining, and that, in some sense, by thinking more broadly about what a methods paper could look like, we may gain something. Certainly the machine learning literature has found a bunch of things which seem to work quite well for a number of problems and which are now having substantial influence in economics. I guess a question I'm interested in is: do you think there is no value in the theory part of it? Because a question that I often have on seeing the output from a machine learning tool -- and, actually, a number of the methods you talked about do have inferential results developed for them -- is about uncertainty quantification. I have my prior, I come into the world with my view, I see the result of this thing: how should I update based on it? In some sense, if I'm in a world where things are normally distributed, I know how to do that. Here, I don't. So I'm interested to hear what you think about that.
- [Guido] I don't see this as saying those results are not interesting. But there are going to be a lot of cases where it's going to be incredibly hard to get those results, and we may not be able to get there. We may need to do it in stages, where first someone says, "Hey, I have this interesting algorithm for doing something, and it works well by some criterion on this particular data set, and I'm going to put it out there" -- and maybe someone will later figure out a way that you can actually still do inference, under some conditions, and maybe those are not particularly realistic conditions, and then we go further. But I think we've been constraining things too much, where we said, "This is the type of thing that we need to do." In some sense, that goes back to the way Josh and I thought about things for the local average treatment effect. That wasn't quite the way people were thinking about these problems before. There was a sense among some people that the way you need to do these things is to first say what you're interested in estimating and then do the best job you can in estimating that, and that what you guys were doing was backwards: you say, "Here, I have an estimator, and now I'm going to figure out what it's estimating, and then I suppose I'll say why I think that's interesting, or maybe why it's not interesting" -- and that that's not okay, that you're not allowed to do it that way. I think we should just be a little bit more flexible in thinking about how to look at problems, because I think we've missed some things by not doing that.

- [Josh] So you've heard our views, Isaiah. You've seen that we have some points of disagreement. Why don't you referee this dispute for us?

[laughter]

- [Isaiah] Oh, it's so nice of you to ask me a small question. I guess, for one, I very much agree with something that Guido said earlier...
[laughter]

- [Isaiah] So one place where the case for machine learning seems relatively clear is in settings where we're interested in some version of a nonparametric prediction problem. I'm interested in estimating a conditional expectation or a conditional probability, and in the past maybe I would have run a kernel regression or a series regression, or something along those lines. It seems like, at this point, we have a fairly good sense that in a fairly wide range of applications, machine learning methods seem to do better than the more traditional nonparametric methods that were studied in econometrics and statistics for estimating conditional mean functions, conditional probabilities, or various other nonparametric objects, especially in high-dimensional settings.

- [Guido] So you're thinking of, maybe, the propensity score or something like that?

- [Isaiah] Yeah, exactly.

- Nuisance functions.

- [Isaiah] Yeah, so things like propensity scores, but also objects of more direct interest, like conditional average treatment effects, which are the difference of two conditional expectation functions; potentially things like that. Of course, even there, the theory for inference -- for how to interpret these things and make large-sample statements about them -- is less well developed, depending on the machine learning estimator used. So something that I think is tricky is that we have these methods which seem to work a lot better for some purposes, but we need to be a bit careful about how we plug them in and how we interpret the resulting statements. But of course, that's a very, very active area right now, where people are doing tons of great work, so I fully expect, and hope, to see much more going forward there.
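(One concrete version of the plug-in idea Isaiah describes: estimate a conditional average treatment effect as the difference of two fitted conditional expectations, sometimes called a T-learner. This is a minimal sketch on invented data from a randomized design, not the causal-forest machinery mentioned earlier; any off-the-shelf regressor could stand in for the gradient boosting used here.)

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=(n, 5))            # covariates
t = rng.binomial(1, 0.5, n)            # randomized treatment
tau = np.where(X[:, 0] > 0, 1.0, 0.0)  # true effect varies with the first covariate
y = X[:, 1] + tau * t + rng.normal(size=n)

# Fit E[y | X] separately in each arm, then difference the predictions.
m1 = GradientBoostingRegressor(random_state=0).fit(X[t == 1], y[t == 1])
m0 = GradientBoostingRegressor(random_state=0).fit(X[t == 0], y[t == 0])
cate_hat = m1.predict(X) - m0.predict(X)

# The estimated CATEs track the true heterogeneity.
print("corr(cate_hat, tau):", np.corrcoef(cate_hat, tau)[0, 1])
```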
- [Isaiah] One issue with machine learning that always seems a danger, or that is sometimes a danger and has sometimes led to applications that made less sense, is when folks start with a method that they're very excited about rather than with a question. Starting with a question -- here's the object I'm interested in, here's the parameter of interest; let me think about how I would identify that thing, how I would recover it if I had a ton of data; oh, here's a conditional expectation function, let me plug in a machine learning estimator for that -- seems very, very sensible. Whereas if I regress quantity on price and say that I used a machine learning method, maybe I'm satisfied that that solves the endogeneity problem we're usually worried about there... maybe I'm not. But again, that's something where the way to address it seems relatively clear: find your object of interest and think about --

- Just bring in the economics.

- Exactly.

- And then can I think about heterogeneity, but harness the power of the machine learning methods for some of the components?

- Precisely. Exactly. The question of interest is the same as it has always been; we now just have better methods for estimating some of the pieces. The place that seems harder to forecast is... obviously, there's a huge amount going on in the machine learning literature, and the limited ways of plugging it in that I've referenced so far are a small piece of that. So I think there are all sorts of other interesting questions about where this interaction goes -- what else can we learn? That's something where I think there's a ton going on that seems very promising, and I have no idea what the answer is.

- No, I totally agree with that, but that makes it very exciting. And I think there's just a lot of work to be done there.

- Alright. So I say he agrees with me there.
[laughter]

- I didn't say that, per se.

- [Narrator] If you'd like to watch more Nobel Conversations, click here. Or if you'd like to learn more about econometrics, check out Josh's Mastering Econometrics series. If you'd like to learn more about Guido, Josh, and Isaiah, check out the links in the description.

♪ [music] ♪