1
00:00:00,100 --> 00:00:02,050
♪ [music] ♪

2
00:00:03,620 --> 00:00:05,700
- [Narrator] Welcome
to Nobel Conversations.

3
00:00:07,000 --> 00:00:10,043
In this episode, Josh Angrist
and Guido Imbens

4
00:00:10,043 --> 00:00:13,675
sit down with Isaiah Andrews
to discuss and disagree

5
00:00:13,675 --> 00:00:16,580
over the role of machine learning
in applied econometrics.

6
00:00:18,237 --> 00:00:19,769
- [Isaiah] So, of course,
there are a lot of topics

7
00:00:19,769 --> 00:00:21,087
where you guys largely agree,

8
00:00:21,087 --> 00:00:22,313
but I'd like to turn to one

9
00:00:22,313 --> 00:00:24,240
where maybe you have
some differences of opinion.

10
00:00:24,240 --> 00:00:25,728
I'd love to hear
some of your thoughts

11
00:00:25,728 --> 00:00:26,883
about machine learning

12
00:00:26,883 --> 00:00:29,900
and the role that it's playing
and is going to play in economics.

13
00:00:30,200 --> 00:00:33,352
- [Guido] I've looked at some data
like the proprietary.

14
00:00:33,352 --> 00:00:35,150
We see that there's
no published paper there.

15
00:00:36,122 --> 00:00:38,159
There was an experiment
that was done

16
00:00:38,159 --> 00:00:39,500
on some search algorithm,

17
00:00:39,700 --> 00:00:41,497
and the question was...

18
00:00:42,901 --> 00:00:45,600
it was about ranking things
and changing the ranking.

19
00:00:45,900 --> 00:00:47,290
And it was sort of clear...

20
00:00:48,400 --> 00:00:50,810
that there was going to be
a lot of heterogeneity there.

21
00:00:52,161 --> 00:00:57,580
If you look for, say,

22
00:00:57,831 --> 00:01:00,617
a picture of Britney Spears

23
00:01:00,617 --> 00:01:02,493
it doesn't really matter
where you rank it

24
00:01:02,493 --> 00:01:05,500
because you're going to figure out
what you're looking for,

25
00:01:06,200 --> 00:01:07,867
whether you put it
in the first or second

26
00:01:07,867 --> 00:01:09,800
or third position of the ranking.

27
00:01:10,100 --> 00:01:12,500
But if you're looking
for the best econometrics book,

28
00:01:13,300 --> 00:01:16,430
if you put your book first
or your book tenth --

29
00:01:16,430 --> 00:01:18,100
that's going to make
a big difference

30
00:01:18,600 --> 00:01:20,979
in how often people
are going to click on it.

31
00:01:21,829 --> 00:01:23,417
And so there you --

32
00:01:23,417 --> 00:01:27,218
- [Josh] Why do I need
machine learning to discover that?

33
00:01:27,218 --> 00:01:29,195
It seems like
I can discover it simply.

34
00:01:29,195 --> 00:01:30,435
- [Guido] So in general--

35
00:01:30,435 --> 00:01:32,100
- [Josh] There were lots
of possible...

36
00:01:32,100 --> 00:01:35,490
- What you want to think about is
there being lots of characteristics

37
00:01:35,490 --> 00:01:37,610
of the items

38
00:01:37,610 --> 00:01:41,682
that you want to understand
what drives the heterogeneity

39
00:01:42,300 --> 00:01:43,427
in the effect of--

40
00:01:43,427 --> 00:01:45,600
- But you're just predicting

41
00:01:45,600 --> 00:01:47,700
In some sense, you're solving
a marketing problem.

42
00:01:48,400 --> 00:01:49,580
- [inaudible] it's causal effect,

43
00:01:49,580 --> 00:01:51,800
- It's causal, but it has
no scientific content.

44
00:01:51,800 --> 00:01:53,300
Think about...

45
00:01:54,100 --> 00:01:57,300
- No, but it's similar things
in medical settings.

46
00:01:58,000 --> 00:02:01,300
If you do an experiment, 
you may actually be very interested

47
00:02:01,300 --> 00:02:03,900
in whether the treatment
works for some groups or not.

48
00:02:03,900 --> 00:02:06,500
And you have a lot of individual
characteristics,

49
00:02:06,500 --> 00:02:08,000
and you want
to systematically search.

50
00:02:08,000 --> 00:02:09,500
- Yeah. I'm skeptical about that --

51
00:02:09,500 --> 00:02:12,603
that sort of idea that there's
this personal causal effect

52
00:02:12,603 --> 00:02:13,900
that I should care about,

53
00:02:14,000 --> 00:02:16,063
and that machine learning
can discover it

54
00:02:16,063 --> 00:02:17,596
in some way that's useful.

55
00:02:17,596 --> 00:02:21,400
So think about -- I've done
a lot of work on schools,

56
00:02:21,400 --> 00:02:23,950
going to, say, a charter school,

57
00:02:23,950 --> 00:02:25,225
a publicly funded private school,

58
00:02:25,225 --> 00:02:26,500
effectively, you know,
that's free to structure

59
00:02:26,500 --> 00:02:29,300
its own curriculum
for context there.

60
00:02:29,300 --> 00:02:31,000
Some types of charter schools

61
00:02:31,000 --> 00:02:32,700
generate spectacular
achievement gains,

62
00:02:32,700 --> 00:02:36,400
and in the data set
that produces that result,

63
00:02:36,400 --> 00:02:37,800
I have a lot of covariates.

64
00:02:37,800 --> 00:02:41,353
So I have baseline scores,
and I have family background,

65
00:02:41,353 --> 00:02:43,576
the education of the parents,

66
00:02:43,576 --> 00:02:45,800
the sex of the child, 
the race of the child.

67
00:02:45,800 --> 00:02:48,300
And, well, as soon as I put
half a dozen of those together,

68
00:02:48,400 --> 00:02:51,900
I have a very high dimensional space.

69
00:02:52,300 --> 00:02:53,600
I'm definitely interested
in sort of coarse features

70
00:02:53,600 --> 00:02:54,900
of that treatment effect,

71
00:02:54,900 --> 00:02:57,150
like whether it's better for people

72
00:02:57,150 --> 00:02:59,400
who come from
lower income families.

73
00:03:02,600 --> 00:03:06,000
I have a hard time believing
that there's an application,

74
00:03:06,400 --> 00:03:10,300
for the very high dimensional
version of that,

75
00:03:10,500 --> 00:03:11,850
where I discovered
that for non-white children

76
00:03:11,850 --> 00:03:13,200
who have high family incomes

77
00:03:13,800 --> 00:03:17,800
but baseline scores
in the third quartile

78
00:03:18,300 --> 00:03:20,650
and only went to public school
in the third grade

79
00:03:20,650 --> 00:03:23,000
but not the sixth grade.

80
00:03:23,000 --> 00:03:25,500
So that's what that high
dimensional analysis produces.

81
00:03:25,800 --> 00:03:28,100
This very elaborate
conditional statement.

82
00:03:28,300 --> 00:03:31,000
There's two things that are wrong
with that in my view.

83
00:03:31,000 --> 00:03:32,500
First, I don't see it as...

84
00:03:32,500 --> 00:03:34,000
I just can't imagine
why it's actionable.

85
00:03:34,600 --> 00:03:36,600
I don't know why
you'd want to act on it.

86
00:03:36,600 --> 00:03:38,900
And I know also
that there's some alternative model

87
00:03:38,900 --> 00:03:41,200
that fits almost as well,

88
00:03:41,800 --> 00:03:43,000
that flips everything,

89
00:03:43,200 --> 00:03:45,350
Because machine learning
doesn't tell me

90
00:03:45,350 --> 00:03:47,500
that this is really
the predictor that matters.

91
00:03:48,400 --> 00:03:52,300
It just tells me that
this is a good predictor.

92
00:03:52,800 --> 00:03:54,350
And so, I think
there is something different

93
00:03:54,350 --> 00:03:55,900
about the social science context.

94
00:03:57,940 --> 00:03:59,545
- [Guido] I think
the social science applications

95
00:03:59,545 --> 00:04:01,150
you're talking about,

96
00:04:01,150 --> 00:04:02,600
once were...

97
00:04:03,400 --> 00:04:08,100
I think there's not a huge amount
of heterogeneity in the effects.

98
00:04:08,400 --> 00:04:11,200
- [Josh] There might be

99
00:04:11,200 --> 00:04:14,000
if you allow me
to fill that space.

100
00:04:14,600 --> 00:04:16,350
- No... not even then.

101
00:04:16,350 --> 00:04:18,100
I think for a lot
of those interventions,

102
00:04:18,300 --> 00:04:22,000
you would expect that the effect
is the same sign for everybody.

103
00:04:23,400 --> 00:04:27,600
There may be small differences
in the magnitude, but it's not...

104
00:04:28,200 --> 00:04:31,700
For a lot of these education
interventions -- they're good for everybody.

105
00:04:32,900 --> 00:04:35,250
It's not that they're bad
for some people

106
00:04:35,250 --> 00:04:37,600
and good for other people,

107
00:04:37,600 --> 00:04:39,200
and that there are kind
of very small pockets

108
00:04:39,200 --> 00:04:40,800
where they're bad there.

109
00:04:40,900 --> 00:04:43,900
But there may be some variation
in the magnitude,

110
00:04:44,000 --> 00:04:48,200
but you would need very, 
very big data sets to find those.

111
00:04:48,400 --> 00:04:49,900
I agree that in those cases,

112
00:04:49,900 --> 00:04:51,400
they probably wouldn't be
very actionable anyway.

113
00:04:51,700 --> 00:04:53,800
But I think there's a lot
of other settings

114
00:04:54,100 --> 00:04:56,600
where there is
much more heterogeneity.

115
00:04:57,400 --> 00:04:59,500
- Well, I'm open
to that possibility,

116
00:04:59,500 --> 00:05:05,550
and I think the example you gave
is essentially a marketing example.

117
00:05:06,430 --> 00:05:10,700
- No, those have implications for it
and that's the organization,

118
00:05:10,700 --> 00:05:13,900
whether you need
to worry about the...

119
00:05:14,000 --> 00:05:17,900
- Well, I need to see that paper.

120
00:05:18,400 --> 00:05:21,200
- So the sense I'm getting...

121
00:05:21,500 --> 00:05:23,100
- We still disagree on something.
- Yes.

122
00:05:23,100 --> 00:05:24,100
[laughter]

123
00:05:24,100 --> 00:05:25,400
- We haven't converged
on everything.

124
00:05:25,400 --> 00:05:26,050
- I'm getting that sense.

125
00:05:26,050 --> 00:05:26,700
[laughter]

126
00:05:27,200 --> 00:05:29,100
- Actually, we've diverged on this

127
00:05:29,100 --> 00:05:30,050
because this wasn't around
to argue about.

128
00:05:30,050 --> 00:05:31,000
[laughter]

129
00:05:33,200 --> 00:05:35,600
- Is it getting a little warm here?

130
00:05:35,600 --> 00:05:38,000
- Warmed up. Warmed up is good.

131
00:05:38,100 --> 00:05:40,800
The sense I'm getting is, Josh,
you're not saying

132
00:05:40,900 --> 00:05:43,400
that you're confident
that there is no way

133
00:05:43,400 --> 00:05:45,400
that there is an application
where this stuff

134
00:05:45,400 --> 00:05:46,800
is useful. You are saying

135
00:05:46,800 --> 00:05:48,200
you are unconvinced by
the existing applications to date.

136
00:05:48,300 --> 00:05:51,280
Fair enough.

137
00:05:51,280 --> 00:05:53,120
- I'm very confident.

138
00:05:53,120 --> 00:05:54,300
[laughter]

139
00:05:54,300 --> 00:05:55,300
- In this case.

140
00:05:55,300 --> 00:05:57,500
- I think Josh does have a point

141
00:05:58,000 --> 00:06:02,100
that even in the prediction cases

142
00:06:02,300 --> 00:06:05,000
where a lot of the machine learning
methods really shine

143
00:06:05,000 --> 00:06:06,600
is where there's just a lot
of heterogeneity.

144
00:06:07,300 --> 00:06:10,600
- You don't really care much
about the details there, right?

145
00:06:10,900 --> 00:06:15,000
It doesn't have
a policy angle or something.

146
00:06:15,200 --> 00:06:18,100
- Things like recognizing
handwritten digits and stuff.

147
00:06:18,300 --> 00:06:21,150
It does much better there

148
00:06:21,150 --> 00:06:24,000
than building
some complicated model.

149
00:06:24,400 --> 00:06:28,100
But a lot of the social science,
a lot of the economic applications,

150
00:06:28,300 --> 00:06:30,200
we actually know a huge amount
about the relationship

151
00:06:30,200 --> 00:06:32,100
between the variables.

152
00:06:32,100 --> 00:06:34,600
A lot of the relationships
are strictly monotone.

153
00:06:35,400 --> 00:06:39,400
Education is going to increase
people's earnings,

154
00:06:39,800 --> 00:06:41,950
irrespective of the demographic,

155
00:06:41,950 --> 00:06:44,100
irrespective of the level
of education you already have.

156
00:06:44,100 --> 00:06:45,950
- Until they get to a Ph.D.

157
00:06:45,950 --> 00:06:47,800
- Yeah, there is a graduate school...

158
00:06:48,150 --> 00:06:49,150
[laughter]

159
00:06:49,500 --> 00:06:50,700
but over a reasonable range,

160
00:06:51,600 --> 00:06:55,900
It's not going
to go down very much.

161
00:06:56,100 --> 00:06:57,900
In a lot of the settings

162
00:06:57,900 --> 00:06:59,700
where these machine learning
methods shine,

163
00:06:59,700 --> 00:07:01,900
there's a lot of [ ]

164
00:07:02,100 --> 00:07:04,900
kind of multimodality
in these relationships,

165
00:07:05,300 --> 00:07:08,400
and they're going to be
very powerful.

166
00:07:08,400 --> 00:07:11,500
But I still stand by that.

167
00:07:11,700 --> 00:07:16,100
These methods just have
a huge amount to offer

168
00:07:16,400 --> 00:07:18,100
for economists,

169
00:07:18,200 --> 00:07:21,700
and they're going to be
a big part of the future.

170
00:07:23,400 --> 00:07:24,600
- [Isaiah] Feels like
there's something interesting

171
00:07:24,600 --> 00:07:25,800
to be said about
machine learning here.

172
00:07:25,800 --> 00:07:27,700
So, Guido, I was wondering,
could you give some more...

173
00:07:28,000 --> 00:07:29,000
maybe some examples
of the sorts of examples

174
00:07:29,000 --> 00:07:32,500
you're thinking about
with applications [ ] at the moment?

175
00:07:32,500 --> 00:07:34,100
- So on areas where

176
00:07:34,700 --> 00:07:36,400
instead of looking
for average causal effects

177
00:07:36,500 --> 00:07:39,350
we're looking for
individualized estimates,

178
00:07:39,350 --> 00:07:42,200
predictions of causal effects

179
00:07:42,400 --> 00:07:44,950
and the machine learning algorithms
have been very effective.

180
00:07:48,300 --> 00:07:51,500
Traditionally, we would have done
these things using kernel methods.

181
00:07:51,600 --> 00:07:54,500
And theoretically they work great,

182
00:07:54,600 --> 00:07:56,000
and there's some arguments

183
00:07:56,000 --> 00:07:57,400
that, formally, 
you can't do any better.

184
00:07:57,600 --> 00:08:00,500
But in practice, 
they don't work very well.

185
00:08:00,900 --> 00:08:03,150
Random causal forest-type things

186
00:08:03,150 --> 00:08:05,400
that Stefan Wager and Susan Athey
have been working on

187
00:08:05,400 --> 00:08:09,500
have been used very widely.

188
00:08:09,600 --> 00:08:12,200
They've been very effective
in these settings

189
00:08:12,400 --> 00:08:18,100
to actually get causal effects
that vary by [ ].

190
00:08:20,700 --> 00:08:23,200
I think this is still just the beginning
of these methods.

191
00:08:23,200 --> 00:08:25,700
But in many cases,

192
00:08:26,400 --> 00:08:31,600
these algorithms are very effective
at searching over big spaces

193
00:08:31,800 --> 00:08:35,600
and finding the functions that fit very well

194
00:08:35,900 --> 00:08:41,100
in ways that we couldn't
really do beforehand.

195
00:08:41,500 --> 00:08:43,400
- I don't know of an example

196
00:08:43,400 --> 00:08:45,300
where machine learning
has generated insights

197
00:08:45,300 --> 00:08:48,100
about a causal effect
that I'm interested in.

198
00:08:48,300 --> 00:08:49,800
And I do know of examples

199
00:08:49,800 --> 00:08:51,300
where it's potentially
very misleading.

200
00:08:51,300 --> 00:08:53,700
So I've done some work
with Brigham Frandsen,

201
00:08:54,100 --> 00:08:55,100
using, for example, random forest
to model covariate effects

202
00:08:55,100 --> 00:08:59,900
in an instrumental
variables problem

203
00:09:00,200 --> 00:09:01,200
where you need
to condition on covariates.

204
00:09:04,400 --> 00:09:06,300
And you don't particularly
have strong feelings

205
00:09:06,300 --> 00:09:08,200
about the functional form for that,

206
00:09:08,200 --> 00:09:10,000
so maybe you should curve...

207
00:09:10,900 --> 00:09:12,700
be open to flexible curve fitting,

208
00:09:12,700 --> 00:09:14,500
and that leads you down a path

209
00:09:14,500 --> 00:09:18,000
where there's a lot
of nonlinearities in the model,

210
00:09:18,200 --> 00:09:20,600
and that's very dangerous with IV

211
00:09:20,600 --> 00:09:23,000
because any sort
of excluded non-linearity

212
00:09:23,300 --> 00:09:25,450
potentially generates
a spurious causal effect

213
00:09:25,450 --> 00:09:27,600
and Brigham and I
showed that very powerfully.

214
00:09:27,900 --> 00:09:32,200
I think in the case
of two instruments

215
00:09:32,700 --> 00:09:36,000
that come from a paper of mine
with Bill Evans,

216
00:09:36,500 --> 00:09:37,600
where if you replace

217
00:09:38,100 --> 00:09:40,350
a traditional two-stage
least squares estimator

218
00:09:40,350 --> 00:09:42,600
with some kind of random forest,

219
00:09:42,900 --> 00:09:48,000
you get very precisely
estimated nonsense estimates.

220
00:09:49,000 --> 00:09:51,100
I think that's a big caution.

221
00:09:51,100 --> 00:09:53,400
In view of those findings
in an example I care about

222
00:09:53,700 --> 00:09:57,100
where the instruments
are very simple

223
00:09:57,400 --> 00:09:59,100
and I believe that they're valid,

224
00:09:59,300 --> 00:10:01,600
I would be skeptical of that.

225
00:10:02,900 --> 00:10:06,800
So non-linearity and IV
don't mix very comfortably.

226
00:10:07,200 --> 00:10:10,450
- No, it sounds like that's already
a more complicated...

227
00:10:10,450 --> 00:10:11,400
- Well, it's IV....
- Yeah.

228
00:10:12,500 --> 00:10:16,700
- ...and we work on that.

229
00:10:17,150 --> 00:10:17,875
[laughter]

230
00:10:17,875 --> 00:10:18,600
- Fair enough.

231
00:10:18,600 --> 00:10:20,450
- As Editor of Econometrica,

232
00:10:20,450 --> 00:10:22,300
a lot of these papers
cross by my desk,

233
00:10:22,700 --> 00:10:26,100
but the motivation is not clear

234
00:10:26,100 --> 00:10:29,500
and, in fact, really lacking.

235
00:10:29,800 --> 00:10:35,100
They're not... [we call] type
semi-parametric foundational papers.

236
00:10:35,400 --> 00:10:37,100
So that's a big problem.

237
00:10:38,000 --> 00:10:42,400
A related problem is that we have
this tradition in econometrics

238
00:10:42,600 --> 00:10:47,500
of being very focused
on these formal [ ] results.

239
00:10:48,800 --> 00:10:52,600
We just have a lot of papers
where people propose a method

240
00:10:52,800 --> 00:10:55,700
and then establish
the asymptotic properties

241
00:10:56,300 --> 00:10:59,100
in a very kind of standardized way.

242
00:10:59,100 --> 00:11:01,900
- Is that bad?

243
00:11:02,900 --> 00:11:07,200
- Well, I think it's sort
of closed the door

244
00:11:07,200 --> 00:11:09,400
for a lot of work
that doesn't fit into that,

245
00:11:09,400 --> 00:11:11,600
where in the machine
learning literature,

246
00:11:11,900 --> 00:11:14,300
a lot of things
are more algorithmic.

247
00:11:14,431 --> 00:11:18,500
People had algorithms
for coming up with predictions

248
00:11:18,800 --> 00:11:21,200
that turn out
to actually work much better

249
00:11:21,200 --> 00:11:23,600
than, say, nonparametric
kernel regression.

250
00:11:24,000 --> 00:11:26,800
For a long time, we were doing all
the nonparametrics in econometrics,

251
00:11:26,800 --> 00:11:28,950
we were using kernel regression,

252
00:11:28,950 --> 00:11:31,100
and it was great for proving theorems.

253
00:11:31,300 --> 00:11:33,050
You could get confidence intervals

254
00:11:33,050 --> 00:11:34,800
and consistency, 
and asymptotic normality,

255
00:11:34,800 --> 00:11:35,900
and it was all great,

256
00:11:35,900 --> 00:11:37,000
But it wasn't very useful.

257
00:11:37,300 --> 00:11:39,100
And the things they did
in machine learning

258
00:11:39,100 --> 00:11:40,900
are just way, way better.

259
00:11:41,000 --> 00:11:43,050
But they didn't have the problem--

260
00:11:43,050 --> 00:11:44,300
- That's not my beef
with machine learning theory.

261
00:11:44,300 --> 00:11:45,300
[laughter]

262
00:11:45,300 --> 00:11:51,200
No, but I'm saying there,
for the prediction part,

263
00:11:51,400 --> 00:11:52,950
it does much better.

264
00:11:52,950 --> 00:11:54,500
- Yeah, it's a better
curve fitting tool.

265
00:11:54,900 --> 00:11:56,500
- But it did so in a way

266
00:11:57,100 --> 00:11:58,500
that would not have made
those papers

267
00:11:58,500 --> 00:11:59,900
initially easy to get into
the econometrics journals,

268
00:12:04,650 --> 00:12:06,300
because it wasn't proving
the type of things.

269
00:12:06,400 --> 00:12:08,800
When Breiman was doing
his regression trees

270
00:12:08,800 --> 00:12:11,200
that just didn't fit in.

271
00:12:11,800 --> 00:12:15,100
I think he would have had
a very hard time

272
00:12:15,200 --> 00:12:18,400
publishing these things
in econometric journals.

273
00:12:18,900 --> 00:12:24,400
I think we've limited
ourselves too much

274
00:12:24,700 --> 00:12:27,900
and that has closed things off

275
00:12:28,000 --> 00:12:29,400
for a lot of these
machine learning methods

276
00:12:29,400 --> 00:12:30,800
that are actually very useful.

277
00:12:30,900 --> 00:12:34,000
I mean, I think, in general,

278
00:12:34,900 --> 00:12:36,200
that literature, 
the computer scientists,

279
00:12:36,200 --> 00:12:37,750
have proposed a huge number
of these algorithms

280
00:12:37,750 --> 00:12:39,300
that actually are very useful.

281
00:12:45,500 --> 00:12:47,300
and that are affecting

282
00:12:47,300 --> 00:12:49,100
the way we're going
to be doing empirical work.

283
00:12:49,800 --> 00:12:52,450
But we've not fully internalized that

284
00:12:52,450 --> 00:12:55,100
because we're still very focused

285
00:12:55,300 --> 00:12:57,500
on getting point estimates
and getting standard errors

286
00:12:58,600 --> 00:13:01,200
and getting P values

287
00:13:01,700 --> 00:13:03,100
in a way that we need to move beyond

288
00:13:03,300 --> 00:13:04,300
to fully harness the force,

289
00:13:04,300 --> 00:13:10,700
the benefits
from the machine learning literature.

290
00:13:10,900 --> 00:13:13,000
- On the one hand, I guess I very
much take your point

291
00:13:13,000 --> 00:13:15,100
that sort of the traditional
econometrics framework

292
00:13:15,200 --> 00:13:18,600
of sort of propose a method,
prove a limit theorem

293
00:13:18,600 --> 00:13:22,600
under some asymptotic story,
story story, story story...

294
00:13:22,600 --> 00:13:26,900
publish a paper, is constraining.

295
00:13:26,900 --> 00:13:29,700
And that, in some sense,

296
00:13:29,700 --> 00:13:30,575
by thinking more broadly

297
00:13:30,575 --> 00:13:31,450
about what a methods paper
could look like,

298
00:13:31,450 --> 00:13:33,200
we may [write] in some sense.

299
00:13:33,200 --> 00:13:35,900
Certainly the machine learning
literature has found a bunch of things,

300
00:13:35,900 --> 00:13:38,300
which seem to work quite well
for a number of problems

301
00:13:38,300 --> 00:13:40,350
and are now having
substantial influence in economics.

302
00:13:40,350 --> 00:13:42,400
I guess a question I'm interested in

303
00:13:42,400 --> 00:13:44,800
is how do you think
about the role of...

304
00:13:47,900 --> 00:13:51,200
sort of -- do you think there is
no value in the theory part of it?

305
00:13:51,600 --> 00:13:54,800
Because I guess a question
that I often have

306
00:13:54,800 --> 00:13:56,900
to sort of seeing that output
from a machine learning tool,

307
00:13:56,900 --> 00:13:59,400
that actually a number of the
methods that you talked about

308
00:13:59,400 --> 00:14:01,800
actually do have inferential results
developed for them,

309
00:14:02,600 --> 00:14:04,500
something that
I always wonder about

310
00:14:04,500 --> 00:14:06,400
of uncertainty quantification
and just...

311
00:14:06,500 --> 00:14:08,000
I have my prior,

312
00:14:08,000 --> 00:14:11,000
I come into the world with my view.
I see the result of this thing.

313
00:14:11,000 --> 00:14:12,750
How should I update based on it?

314
00:14:12,750 --> 00:14:14,500
And in some sense, 
if I'm in a world

315
00:14:14,600 --> 00:14:15,100
where things are normally distributed,

316
00:14:15,200 --> 00:14:16,700
I know how to do it here --

317
00:14:16,700 --> 00:14:18,200
here I don't.

318
00:14:18,200 --> 00:14:21,400
And so I'm interested to hear
what you think about that.

319
00:14:21,500 --> 00:14:24,300
- I don't see this as sort
of saying, well,

320
00:14:24,400 --> 00:14:26,500
these results are not interesting,

321
00:14:26,600 --> 00:14:27,700
but there's going to be a lot of cases

322
00:14:28,000 --> 00:14:29,600
where it's going
to be incredibly hard

323
00:14:29,600 --> 00:14:31,200
to get those results

324
00:14:31,200 --> 00:14:33,200
and we may not be able to get there

325
00:14:33,400 --> 00:14:35,550
and we may need to do it in stages

326
00:14:35,550 --> 00:14:37,700
where first someone says,

327
00:14:39,600 --> 00:14:40,900
"Hey, I have
this interesting algorithm

328
00:14:40,900 --> 00:14:42,200
for doing something

329
00:14:42,200 --> 00:14:44,800
and it works well by some criterion

330
00:14:45,600 --> 00:14:49,900
on this particular data set,

331
00:14:51,000 --> 00:14:53,400
and I'm going to put it out there,

332
00:14:53,700 --> 00:14:55,850
and maybe someone will figure out a way

333
00:14:55,850 --> 00:14:58,000
that you can later actually
still do inference

334
00:14:58,000 --> 00:14:59,100
under some conditions,

335
00:14:59,100 --> 00:15:02,100
and maybe those are not
particularly realistic conditions,

336
00:15:02,100 --> 00:15:03,800
then we kind of go further.

337
00:15:03,800 --> 00:15:05,500
But I think we've been
constraining things too much

338
00:15:06,700 --> 00:15:09,050
where we said,

339
00:15:09,050 --> 00:15:11,400
"This is the type of things
that we need to do."

340
00:15:12,100 --> 00:15:14,400
And in some sense,

341
00:15:15,700 --> 00:15:18,200
that goes back
to the way Josh and I

342
00:15:19,700 --> 00:15:21,900
thought about things for the
local average treatment effect.

343
00:15:21,900 --> 00:15:23,250
That wasn't quite the way

344
00:15:23,250 --> 00:15:24,600
people were thinking
about these problems before.

345
00:15:24,600 --> 00:15:29,200
There was a sense
that some of the people said

346
00:15:29,500 --> 00:15:31,900
the way you need to do
these things is you first say,

347
00:15:32,200 --> 00:15:34,250
what you're interested
in estimating

348
00:15:34,250 --> 00:15:36,300
and then you do the best job
you can in estimating that.

349
00:15:38,100 --> 00:15:44,200
and what you guys are doing
is you're doing it backwards.

350
00:15:44,300 --> 00:15:46,700
You kind of say,
"Here, I have an estimator,

351
00:15:47,300 --> 00:15:49,600
and now I'm going to figure out
what it's estimating,

352
00:15:51,400 --> 00:15:53,900
and I suppose you're going to say
why you think that's interesting

353
00:15:53,900 --> 00:15:56,600
or maybe why it's not interesting,
and that's not okay.

354
00:15:56,600 --> 00:15:58,600
You're not allowed
to do that that way.

355
00:15:59,000 --> 00:16:04,100
And I think we should
just be a little bit more flexible

356
00:16:04,300 --> 00:16:06,300
in thinking about
how to look at problems

357
00:16:06,400 --> 00:16:08,850
because I think
we've missed some things

358
00:16:08,850 --> 00:16:11,300
by not doing that.

359
00:16:13,000 --> 00:16:14,800
- [Josh] So you've heard
our views, Isaiah.

360
00:16:14,800 --> 00:16:16,600
You've seen that we have
some points of disagreement.

361
00:16:17,000 --> 00:16:20,400
Why don't you referee
this dispute for us?

362
00:16:20,950 --> 00:16:21,950
[laughter]

363
00:16:22,500 --> 00:16:25,300
- Oh, it's so nice of you
to ask me a small question.

364
00:16:25,300 --> 00:16:28,100
So I guess for one,

365
00:16:28,200 --> 00:16:33,200
I very much agree with something
that Guido said earlier of...

366
00:16:34,100 --> 00:16:35,100
[laughter]

367
00:16:36,500 --> 00:16:37,900
- So one thing where it seems

368
00:16:37,900 --> 00:16:39,650
where the case for machine learning
seems relatively clear

369
00:16:39,650 --> 00:16:41,400
is in settings where
we're interested in some version

370
00:16:41,500 --> 00:16:45,100
of a nonparametric
prediction problem.

371
00:16:45,100 --> 00:16:47,400
So I'm interested in estimating

372
00:16:47,400 --> 00:16:49,700
a conditional expectation
or conditional probability,

373
00:16:50,000 --> 00:16:52,100
and in the past, maybe
I would have run a kernel...

374
00:16:52,100 --> 00:16:53,950
I would have run
a kernel regression

375
00:16:53,950 --> 00:16:55,800
or I would have run
a series regression,

376
00:16:56,100 --> 00:16:57,400
or something along those lines.

377
00:16:58,700 --> 00:17:00,350
It seems like, at this point, 
we've a fairly good sense

378
00:17:00,350 --> 00:17:02,000
that in a fairly wide range
of applications,

379
00:17:02,000 --> 00:17:06,300
machine learning methods
seem to do better

380
00:17:06,800 --> 00:17:08,800
for estimating conditional
mean functions

381
00:17:08,800 --> 00:17:10,400
or conditional probabilities

382
00:17:10,400 --> 00:17:12,000
or various other
nonparametric objects

383
00:17:12,400 --> 00:17:14,500
than more traditional
nonparametric methods

384
00:17:14,500 --> 00:17:16,600
that were studied
in econometrics and statistics,

385
00:17:16,600 --> 00:17:19,100
especially
in high dimensional settings.

386
00:17:19,500 --> 00:17:21,300
- So you're thinking of maybe
the propensity score

387
00:17:21,300 --> 00:17:23,100
or something like that?

388
00:17:23,100 --> 00:17:24,200
- Yeah, exactly,

389
00:17:24,200 --> 00:17:25,300
- Nuisance functions.

390
00:17:25,300 --> 00:17:27,100
Yeah, so things
like propensity scores,

391
00:17:27,530 --> 00:17:29,965
even objects of more direct

392
00:17:29,965 --> 00:17:32,400
interest-like conditional
average treatment effects,

393
00:17:32,400 --> 00:17:35,100
which are the difference of two
conditional expectation functions,

394
00:17:35,100 --> 00:17:36,300
potentially things like that.

395
00:17:36,500 --> 00:17:40,400
Of course, even there, the theory...

396
00:17:40,500 --> 00:17:43,700
inference of the theory
for how to interpret,

397
00:17:43,700 --> 00:17:45,900
how to make large-sample statements
about some of these things

398
00:17:46,000 --> 00:17:48,050
are less well-developed
depending on

399
00:17:48,050 --> 00:17:50,100
the machine learning
estimator used.

400
00:17:50,100 --> 00:17:53,800
And so I think there's
something that is tricky

401
00:17:53,900 --> 00:17:55,700
is that we can have these methods,
which work a lot,

402
00:17:55,700 --> 00:17:58,000
which seem to work
a lot better for some purposes,

403
00:17:58,000 --> 00:18:01,600
but which we need to be a bit
careful in how we plug them in

404
00:18:01,600 --> 00:18:03,300
or how we interpret
the resulting statements.

405
00:18:03,600 --> 00:18:06,200
But of course, that's a very,
very active area right now

406
00:18:06,400 --> 00:18:08,400
where people are doing
tons of great work.

407
00:18:08,400 --> 00:18:10,400
And so I fully expect
and hope to see

408
00:18:10,400 --> 00:18:12,800
much more going forward there.

409
00:18:13,000 --> 00:18:17,300
So one issue with machine learning
that always seems a danger

410
00:18:17,400 --> 00:18:20,300
or that is sometimes a danger

411
00:18:20,500 --> 00:18:21,550
and has sometimes
led to applications

412
00:18:21,550 --> 00:18:22,600
that have made less sense

413
00:18:22,800 --> 00:18:25,100
is when folks start with a method
that they're very excited about

414
00:18:25,300 --> 00:18:28,500
rather than a question.

415
00:18:28,900 --> 00:18:32,100
So sort of starting with a question

416
00:18:32,500 --> 00:18:34,350
where here's the object I'm interested in,

417
00:18:34,350 --> 00:18:36,200
here is the parameter of interest,

418
00:18:37,300 --> 00:18:39,500
let me think about how I would
identify that thing,

419
00:18:39,500 --> 00:18:41,800
how I would recover that thing
if I had a ton of data.

420
00:18:41,900 --> 00:18:44,000
Oh, here's a conditional
expectation function.

421
00:18:44,000 --> 00:18:47,100
Let me plug in the machine
learning estimator for that.

422
00:18:47,200 --> 00:18:48,800
That seems very, very sensible.

423
00:18:49,000 --> 00:18:53,100
Whereas, you know, 
if I regress quantity on price

424
00:18:53,700 --> 00:18:56,000
and say that I used
a machine learning method,

425
00:18:56,300 --> 00:18:58,900
maybe I'm satisfied that 
that solves the [ ] problem

426
00:18:58,900 --> 00:19:01,200
we're usually worried
about there... maybe I'm not.

427
00:19:01,500 --> 00:19:03,200
But again, that's something

428
00:19:03,400 --> 00:19:06,300
where the way to address it
seems relatively clear.

429
00:19:06,500 --> 00:19:09,000
It's to define your object of interest

430
00:19:09,200 --> 00:19:10,400
and think about--

431
00:19:10,400 --> 00:19:11,600
- Just bring in the economics.

432
00:19:11,700 --> 00:19:12,200
- Exactly.

433
00:19:12,200 --> 00:19:15,400
- And can I think about heterogeneity,

434
00:19:15,400 --> 00:19:18,300
but harness the power
of the machine learning methods

435
00:19:18,500 --> 00:19:20,650
for some of the components.

436
00:19:20,650 --> 00:19:22,800
- Precisely. Exactly.

437
00:19:22,900 --> 00:19:24,250
So the question of interest

438
00:19:24,250 --> 00:19:25,600
is the same as the question
of interest has always been,

439
00:19:25,600 --> 00:19:29,500
but we now have better methods
for estimating some pieces of this.

440
00:19:29,900 --> 00:19:31,600
The place that seems
harder to forecast

441
00:19:33,400 --> 00:19:34,850
is obviously, there's
a huge amount going on

442
00:19:34,850 --> 00:19:36,300
in the machine learning literature

443
00:19:37,500 --> 00:19:38,600
and the limited ways
of plugging it in

444
00:19:38,600 --> 00:19:39,700
that I've referenced so far

445
00:19:39,700 --> 00:19:42,900
are a limited piece of that.

446
00:19:43,000 --> 00:19:44,550
And so I think there are all sorts
of other interesting questions

447
00:19:44,550 --> 00:19:46,100
about where...

448
00:19:47,100 --> 00:19:49,300
where does this interaction go? 
What else can we learn?

449
00:19:49,300 --> 00:19:52,000
And that's something where
I think there's a ton going on

450
00:19:52,200 --> 00:19:54,300
which seems very promising,

451
00:19:54,300 --> 00:19:56,400
and I have no idea
what the answer is.

452
00:19:57,000 --> 00:19:59,100
- No, I totally agree with that,

453
00:19:59,100 --> 00:20:01,200
but that makes it very exciting.

454
00:20:03,800 --> 00:20:06,100
And I think there's just
a lot of work to be done there.

455
00:20:06,600 --> 00:20:09,000
Alright. So Isaiah agrees
with me there.

456
00:20:09,000 --> 00:20:11,400
[laughter]

457
00:20:12,450 --> 00:20:13,450
- I didn't say that per se.

458
00:20:14,500 --> 00:20:16,100
- [Narrator] If you'd like to watch
more Nobel Conversations,

459
00:20:16,100 --> 00:20:17,700
click here.

460
00:20:18,000 --> 00:20:20,400
Or if you'd like to learn
more about econometrics,

461
00:20:20,500 --> 00:20:23,100
check out Josh's
Mastering Econometrics series.

462
00:20:23,600 --> 00:20:26,500
If you'd like to learn more
about Guido, Josh, and Isaiah,

463
00:20:26,700 --> 00:20:28,200
check out the links
in the description.

464
00:20:28,550 --> 00:20:30,535
♪ [music] ♪