WEBVTT

00:00:00.100 --> 00:00:02.050
♪ [music] ♪

00:00:03.620 --> 00:00:05.700
- [Narrator] Welcome
to Nobel Conversations.

00:00:07.000 --> 00:00:10.043
In this episode, Josh Angrist
and Guido Imbens

00:00:10.043 --> 00:00:13.675
sit down with Isaiah Andrews
to discuss and disagree

00:00:13.675 --> 00:00:16.580
over the role of machine learning
in applied econometrics.

00:00:18.237 --> 00:00:19.769
- [Isaiah] So, of course,
there are a lot of topics

00:00:19.769 --> 00:00:21.087
where you guys largely agree,

00:00:21.087 --> 00:00:22.313
but I'd like to turn to one

00:00:22.313 --> 00:00:24.240
where maybe you have
some differences of opinion.

00:00:24.240 --> 00:00:25.728
I'd love to hear
some of your thoughts

00:00:25.728 --> 00:00:26.883
about machine learning

00:00:26.883 --> 00:00:29.900
and the role that it's playing
and is going to play in economics.

00:00:30.200 --> 00:00:33.352
- [Guido] I've looked at some data
like the proprietary.

00:00:33.352 --> 00:00:35.150
We see that there's
no published paper there.

00:00:36.122 --> 00:00:38.159
There was an experiment
that was done

00:00:38.159 --> 00:00:39.500
on some search algorithm,

00:00:39.700 --> 00:00:41.327
and the question was...

00:00:42.901 --> 00:00:45.600
it was about ranking things
and changing the ranking.

00:00:45.900 --> 00:00:47.290
And it was sort of clear

00:00:48.400 --> 00:00:50.610
that there was going to be
a lot of heterogeneity there.

00:00:52.161 --> 00:00:56.282
If you look for, say,

00:00:57.831 --> 00:01:00.617
a picture of Britney Spears,

00:01:00.617 --> 00:01:02.493
it doesn't really matter
where you rank it

00:01:02.493 --> 00:01:05.500
because you're going to figure out
what you're looking for,

00:01:06.200 --> 00:01:07.867
whether you put it
in the first or second

00:01:07.867 --> 00:01:09.920
or third position of the ranking.

00:01:10.100 --> 00:01:12.500
But if you're looking
for the best econometrics book,

00:01:13.300 --> 00:01:16.430
if you put your book first
or your book tenth --

00:01:16.430 --> 00:01:18.100
that's going to make
a big difference

00:01:18.600 --> 00:01:20.979
how often people
are going to click on it.

00:01:21.829 --> 00:01:23.417
And so there you --

00:01:23.417 --> 00:01:27.119
- [Josh] Why do I need
machine learning to discover that?

00:01:27.119 --> 00:01:29.195
It seems like-- because
I can discover it simply.

00:01:29.195 --> 00:01:30.435
- [Guido] So in [general]--

00:01:30.435 --> 00:01:32.100
- [Josh] There were lots
of possible...

00:01:32.100 --> 00:01:35.045
- You want to think about
there being lots of characteristics

00:01:35.490 --> 00:01:37.280
of the items,

00:01:37.610 --> 00:01:41.682
that you want to understand
what drives the heterogeneity

00:01:42.177 --> 00:01:43.427
in the effect of--

00:01:43.427 --> 00:01:45.008
- But you're just predicting.

00:01:45.008 --> 00:01:47.665
In some sense, you're solving
a marketing problem.

00:01:47.665 --> 00:01:49.381
- No, it's a causal effect.

00:01:49.381 --> 00:01:51.911
- It's causal, but it has
no scientific content.

00:01:51.911 --> 00:01:53.141
Think about...

00:01:53.657 --> 00:01:57.300
- No, but there's similar things
in medical settings.

00:01:58.000 --> 00:02:01.300
If you do an experiment, 
you may actually be very interested

00:02:01.300 --> 00:02:03.900
in whether the treatment works
for some groups or not.

00:02:03.900 --> 00:02:06.143
And you have a lot
of individual characteristics,

00:02:06.143 --> 00:02:08.000
and you want
to systematically search--

00:02:08.000 --> 00:02:09.500
- Yeah. I'm skeptical about that --

00:02:09.500 --> 00:02:12.603
that sort of idea that there's
this personal causal effect

00:02:12.603 --> 00:02:14.000
that I should care about,

00:02:14.000 --> 00:02:15.740
and that machine learning
can discover it

00:02:15.740 --> 00:02:17.259
in some way that's useful.

00:02:17.259 --> 00:02:20.045
So think about -- I've done
a lot of work on schools,

00:02:20.045 --> 00:02:22.336
going to, say, a charter school,

00:02:22.336 --> 00:02:24.428
a publicly funded private school,

00:02:25.225 --> 00:02:27.392
effectively, 
that's free to structure

00:02:27.392 --> 00:02:29.399
its own curriculum
for context there.

00:02:29.399 --> 00:02:31.369
Some types of charter schools

00:02:31.369 --> 00:02:33.703
generate spectacular
achievement gains,

00:02:33.703 --> 00:02:36.400
and in the data set
that produces that result,

00:02:36.400 --> 00:02:37.800
I have a lot of covariates.

00:02:37.800 --> 00:02:41.353
So I have baseline scores,
and I have family background,

00:02:41.353 --> 00:02:43.207
the education of the parents,

00:02:43.576 --> 00:02:45.800
the sex of the child, 
the race of the child.

00:02:45.800 --> 00:02:49.795
And, well, soon as I put
half a dozen of those together,

00:02:49.795 --> 00:02:51.900
I have a very high
dimensional space.

00:02:52.300 --> 00:02:55.199
I'm definitely interested
in sort of coarse features

00:02:55.199 --> 00:02:56.457
of that treatment effect,

00:02:56.457 --> 00:02:58.741
like whether it's better for people

00:02:58.741 --> 00:03:02.046
who come from
lower income families.

00:03:02.600 --> 00:03:06.000
I have a hard time believing
that there's an application

00:03:07.273 --> 00:03:09.872
for the very high dimensional
version of that,

00:03:09.872 --> 00:03:12.406
where I discovered
that for non-white children

00:03:12.406 --> 00:03:14.971
who have high family incomes

00:03:14.971 --> 00:03:17.800
but baseline scores
in the third quartile

00:03:18.166 --> 00:03:21.785
and only went to public school
in the third grade

00:03:21.785 --> 00:03:23.000
but not the sixth grade.

00:03:23.000 --> 00:03:25.796
So that's what that high
dimensional analysis produces.

00:03:25.800 --> 00:03:28.100
It's a very elaborate
conditional statement.

00:03:28.300 --> 00:03:30.605
There's two things that are wrong
with that in my view.

00:03:30.605 --> 00:03:31.797
First, I don't see it as...

00:03:31.797 --> 00:03:34.000
I just can't imagine
why it's actionable.

00:03:34.600 --> 00:03:36.600
I don't know why
you'd want to act on it.

00:03:36.600 --> 00:03:39.455
And I know also that
there's some alternative model

00:03:39.455 --> 00:03:41.200
that fits almost as well,

00:03:41.800 --> 00:03:43.398
that flips everything.

00:03:43.398 --> 00:03:45.350
Because machine learning
doesn't tell me

00:03:45.350 --> 00:03:48.582
that this is really
the predictor that matters.

00:03:48.582 --> 00:03:50.965
It just tells me that
this is a good predictor.

00:03:51.486 --> 00:03:54.870
And so, I think
there is something different

00:03:54.870 --> 00:03:57.586
about the social science context.

00:03:57.940 --> 00:04:00.186
- [Guido] I think
the social science applications

00:04:00.186 --> 00:04:01.633
you're talking about,

00:04:01.633 --> 00:04:02.735
once were...

00:04:03.400 --> 00:04:08.100
I think there's not a huge amount
of heterogeneity in the effects.

00:04:09.777 --> 00:04:11.544
- [Josh] Well, there might be
if you allow me

00:04:11.544 --> 00:04:13.466
to fill that space.

00:04:13.466 --> 00:04:15.740
- No... not even then.

00:04:15.740 --> 00:04:18.614
I think for a lot
of those interventions,

00:04:18.614 --> 00:04:22.765
you would expect that the effect
is the same sign for everybody.

00:04:24.367 --> 00:04:27.600
There may be small differences
in the magnitude, but it's not...

00:04:28.200 --> 00:04:31.700
For a lot of these education [ ],
they're good for everybody.

00:04:34.169 --> 00:04:36.034
It's not that they're bad
for some people

00:04:36.034 --> 00:04:37.600
and good for other people,

00:04:37.600 --> 00:04:39.200
and that is kind
of very small pockets

00:04:39.200 --> 00:04:40.900
where they're bad there.

00:04:40.900 --> 00:04:44.011
But there may be some variation
in the magnitude,

00:04:44.011 --> 00:04:46.955
but you would need very, 
very big data sets to find those.

00:04:47.906 --> 00:04:49.078
I agree that in those cases,

00:04:49.078 --> 00:04:51.400
they probably wouldn't be
very actionable anyway.

00:04:51.700 --> 00:04:53.800
But I think there's a lot
of other settings

00:04:54.100 --> 00:04:56.600
where there is
much more heterogeneity.

00:04:57.400 --> 00:04:59.102
- Well, I'm open
to that possibility,

00:04:59.102 --> 00:05:04.918
and I think the example you gave
is essentially a marketing example.

00:05:06.430 --> 00:05:09.656
- No, those have
implications for it

00:05:09.656 --> 00:05:11.069
and that's the organization,

00:05:12.252 --> 00:05:14.330
whether you need
to worry about the...

00:05:15.469 --> 00:05:17.900
- Well, I need to see that paper.

00:05:18.400 --> 00:05:21.072
- So the sense
I'm getting is that...

00:05:21.467 --> 00:05:23.996
- We still disagree on something.
- Yes. [laughter]

00:05:23.996 --> 00:05:25.440
- We haven't converged
on everything.

00:05:25.440 --> 00:05:27.200
- I'm getting that sense.
[laughter]

00:05:27.200 --> 00:05:28.679
- Actually, we've diverged on this

00:05:28.679 --> 00:05:30.833
because this wasn't around
to argue about.

00:05:30.833 --> 00:05:32.334
[laughter]

00:05:33.057 --> 00:05:34.771
- Is it getting a little warm here?

00:05:35.820 --> 00:05:38.147
- Warmed up. Warmed up is good.

00:05:38.147 --> 00:05:41.187
The sense I'm getting is, Josh,
you're not saying

00:05:41.187 --> 00:05:43.289
that you're confident
that there is no way

00:05:43.289 --> 00:05:45.017
that there is an application
where this stuff is useful.

00:05:45.017 --> 00:05:47.028
You are saying
you are unconvinced

00:05:47.028 --> 00:05:49.487
by the existing
applications to date.

00:05:49.917 --> 00:05:52.022
- Fair enough.
- I'm very confident.

00:05:52.022 --> 00:05:53.704
[laughter]

00:05:54.156 --> 00:05:55.189
- In this case.

00:05:55.189 --> 00:05:56.555
- I think Josh does have a point

00:05:56.555 --> 00:06:00.452
that even in the prediction cases

00:06:01.639 --> 00:06:04.519
where a lot of the machine learning
methods really shine

00:06:04.519 --> 00:06:06.738
is where there's just a lot
of heterogeneity.

00:06:07.300 --> 00:06:10.769
- You don't really care much
about the details there, right?

00:06:10.769 --> 00:06:11.836
- [Guido] Yes.

00:06:11.836 --> 00:06:15.000
It doesn't have
a policy angle or something.

00:06:15.200 --> 00:06:18.795
- The kind of recognizing
handwritten digits and stuff.

00:06:18.795 --> 00:06:20.090
It does much better there

00:06:20.090 --> 00:06:24.000
than building
some complicated model.

00:06:24.400 --> 00:06:28.183
But a lot of the social science,
a lot of the economic applications,

00:06:28.183 --> 00:06:30.383
we actually know a huge amount
about the relationship

00:06:30.383 --> 00:06:32.100
between the variables.

00:06:32.100 --> 00:06:34.700
A lot of the relationships
are strictly monotone.

00:06:37.166 --> 00:06:39.416
Education is going to increase
people's earnings,

00:06:39.697 --> 00:06:41.950
irrespective of the demographic,

00:06:41.950 --> 00:06:44.930
irrespective of the level
of education you already have.

00:06:44.930 --> 00:06:46.180
- Until they get to a Ph.D.

00:06:46.180 --> 00:06:47.846
- They have proof
of graduate school...

00:06:47.846 --> 00:06:49.227
[laughter]

00:06:49.227 --> 00:06:50.700
- Over a reasonable range.

00:06:51.600 --> 00:06:55.488
It's not going
to go down very much.

00:06:56.100 --> 00:06:58.121
In a lot of the settings

00:06:58.121 --> 00:07:00.100
where these machine learning
methods shine,

00:07:00.100 --> 00:07:01.900
there's a lot of [ ]

00:07:02.100 --> 00:07:04.900
kind of multimodality
in these relationships,

00:07:05.300 --> 00:07:08.921
and they're going to be
very powerful.

00:07:08.921 --> 00:07:11.787
But I still stand by that.

00:07:12.410 --> 00:07:14.975
These methods just have
a huge amount to offer

00:07:15.925 --> 00:07:17.561
for economists,

00:07:17.561 --> 00:07:21.700
and they're going to be
a big part of the future.

00:07:21.930 --> 00:07:23.050
♪ [music] ♪

00:07:23.380 --> 00:07:24.600
- [Isaiah] Feels like
there's something interesting

00:07:24.600 --> 00:07:25.800
to be said about
machine learning here.

00:07:25.800 --> 00:07:28.000
So, Guido, I was wondering,
could you give some more...

00:07:28.000 --> 00:07:29.897
maybe some examples
of the sorts of examples

00:07:29.897 --> 00:07:32.500
you're thinking about
with applications [ ] at the moment?

00:07:32.500 --> 00:07:34.182
- So one area is where

00:07:34.700 --> 00:07:36.947
instead of looking
for average causal effects

00:07:36.947 --> 00:07:39.350
we're looking for
individualized estimates,

00:07:41.354 --> 00:07:43.288
predictions of causal effects,

00:07:43.288 --> 00:07:47.337
and the machine learning algorithms
have been very effective.

00:07:48.031 --> 00:07:51.415
Traditionally, we would have done
these things using kernel methods,

00:07:51.415 --> 00:07:54.003
and theoretically they work great,

00:07:54.003 --> 00:07:55.636
and there's some arguments

00:07:55.636 --> 00:07:57.612
that, formally, 
you can't do any better.

00:07:57.612 --> 00:07:59.579
But in practice, 
they don't work very well.

00:08:00.900 --> 00:08:03.527
Random causal forest-type things

00:08:03.527 --> 00:08:06.916
that Stefan Wager and Susan Athey
have been working on

00:08:06.916 --> 00:08:09.453
are used very widely.

00:08:09.453 --> 00:08:12.200
They've been very effective
in these settings

00:08:12.400 --> 00:08:18.819
to actually get causal effects
that vary by [ ].

00:08:20.700 --> 00:08:23.734
I think this is still just
the beginning of these methods.

00:08:23.734 --> 00:08:25.700
But in many cases,

00:08:27.351 --> 00:08:31.600
these algorithms are very effective
at searching over big spaces

00:08:31.800 --> 00:08:37.133
and finding the functions
that fit very well

00:08:37.133 --> 00:08:40.948
in ways that we couldn't
really do beforehand.

00:08:41.500 --> 00:08:42.697
- I don't know of an example

00:08:42.697 --> 00:08:45.300
where machine learning
has generated insights

00:08:45.300 --> 00:08:47.547
about a causal effect
that I'm interested in.

00:08:47.664 --> 00:08:49.610
And I do know of examples

00:08:49.610 --> 00:08:51.300
where it's potentially
very misleading.

00:08:51.300 --> 00:08:53.700
So I've done some work
with Brigham Frandsen,

00:08:54.100 --> 00:08:57.782
using, for example, random forest
to model covariate effects

00:08:57.782 --> 00:09:00.269
in an instrumental
variables problem

00:09:00.269 --> 00:09:03.375
where you need
to condition on covariates.

00:09:04.400 --> 00:09:06.531
And you don't particularly
have strong feelings

00:09:06.531 --> 00:09:08.200
about the functional form for that,

00:09:08.200 --> 00:09:10.000
so maybe you should curve...

00:09:10.900 --> 00:09:12.804
be open to flexible curve fitting,

00:09:12.804 --> 00:09:14.501
and that leads you down a path

00:09:14.501 --> 00:09:16.853
where there's a lot
of nonlinearities in the model,

00:09:17.384 --> 00:09:19.933
and that's very dangerous with IV

00:09:19.933 --> 00:09:23.000
because any sort
of excluded non-linearity

00:09:23.300 --> 00:09:25.839
potentially generates
a spurious causal effect,

00:09:25.839 --> 00:09:29.292
and Brigham and I showed that
very powerfully, I think,

00:09:29.292 --> 00:09:32.200
in the case of two instruments

00:09:32.944 --> 00:09:35.113
that come from a paper of mine
with Bill Evans,

00:09:35.113 --> 00:09:37.600
where if you replace

00:09:38.708 --> 00:09:40.825
a traditional two-stage
least squares estimator

00:09:40.825 --> 00:09:42.600
with some kind of random forest,

00:09:42.900 --> 00:09:46.807
you get very precisely estimated
nonsense estimates.

00:09:49.173 --> 00:09:51.100
I think that's a big caution.

00:09:51.944 --> 00:09:55.096
In view of those findings,
in an example I care about

00:09:55.096 --> 00:09:57.100
where the instruments
are very simple

00:09:57.400 --> 00:09:59.100
and I believe that they're valid,

00:09:59.300 --> 00:10:01.096
I would be skeptical of that.

00:10:02.900 --> 00:10:06.435
Non-linearity and IV
don't mix very comfortably.

00:10:06.435 --> 00:10:09.424
- No, it sounds like that's already
a more complicated...

00:10:10.206 --> 00:10:11.842
- Well, it's IV....
- Yeah.

00:10:12.591 --> 00:10:14.033
- ...but then we work on that.

00:10:14.403 --> 00:10:15.907
[laughter]

00:10:15.907 --> 00:10:17.289
- Fair enough.

00:10:17.289 --> 00:10:18.520
♪ [music] ♪

00:10:18.520 --> 00:10:19.931
- [Guido] As an editor
of Econometrica,

00:10:19.931 --> 00:10:22.054
a lot of these papers
cross by my desk,

00:10:22.700 --> 00:10:26.823
but the motivation is not clear

00:10:27.659 --> 00:10:29.500
and, in fact, really lacking.

00:10:29.800 --> 00:10:31.028
They're not...

00:10:31.591 --> 00:10:34.926
[we call] type semi-parametric
foundational papers.

00:10:34.926 --> 00:10:37.031
So that's a big problem.

00:10:38.761 --> 00:10:42.664
A related problem is that we have
this tradition in econometrics

00:10:42.664 --> 00:10:46.560
of being very focused
on these formal asymptotic results.

00:10:48.800 --> 00:10:53.289
We just have a lot of papers
where people propose a method

00:10:53.289 --> 00:10:55.700
and then they establish
the asymptotic properties

00:10:56.300 --> 00:10:59.420
in a very kind of standardized way.

00:11:00.873 --> 00:11:02.055
- Is that bad?

00:11:02.900 --> 00:11:06.420
- Well, I think it's sort
of closed the door

00:11:06.420 --> 00:11:09.040
for a lot of work
that doesn't fit into that

00:11:09.040 --> 00:11:11.600
where in the machine
learning literature,

00:11:11.900 --> 00:11:13.453
a lot of things
are more algorithmic.

00:11:13.808 --> 00:11:18.500
People had algorithms
for coming up with predictions

00:11:18.800 --> 00:11:20.885
that turn out
to actually work much better

00:11:20.885 --> 00:11:23.600
than, say, nonparametric
kernel regression.

00:11:24.000 --> 00:11:26.800
For a long time, we were doing all
the nonparametrics in econometrics,

00:11:26.800 --> 00:11:28.950
and we were using
kernel regression,

00:11:28.950 --> 00:11:31.100
and that was great
for proving theorems.

00:11:31.210 --> 00:11:32.580
You could get confidence intervals

00:11:32.580 --> 00:11:34.684
and consistency, 
and asymptotic normality,

00:11:34.684 --> 00:11:35.736
and it was all great,

00:11:35.736 --> 00:11:37.000
but it wasn't very useful.

00:11:37.300 --> 00:11:39.100
And the things they did
in machine learning

00:11:39.100 --> 00:11:41.051
are just way, way better.

00:11:41.051 --> 00:11:42.557
But they didn't have the problem--

00:11:42.557 --> 00:11:44.449
- That's not my beef
with machine learning,

00:11:44.449 --> 00:11:45.871
that the theory is weak.

00:11:45.871 --> 00:11:47.141
[laughter]

00:11:47.141 --> 00:11:51.320
- No, but I'm saying there,
for the prediction part,

00:11:51.320 --> 00:11:52.394
it does much better.

00:11:52.394 --> 00:11:54.500
- Yeah, it's a better
curve fitting tool.

00:11:54.900 --> 00:11:57.608
- But it did so in a way

00:11:57.608 --> 00:11:59.782
that would not have made
those papers

00:11:59.782 --> 00:12:04.234
initially easy to get into
the econometrics journals,

00:12:04.234 --> 00:12:06.270
because it wasn't proving
the type of things.

00:12:06.786 --> 00:12:09.864
When Breiman was doing
his regression trees --

00:12:09.864 --> 00:12:11.200
they just didn't fit in.

00:12:12.944 --> 00:12:14.934
I think he would have had
a very hard time

00:12:14.934 --> 00:12:18.400
publishing these things
in econometrics journals.

00:12:20.189 --> 00:12:23.656
I think we've limited
ourselves too much

00:12:24.700 --> 00:12:27.830
and that has led us to close things off

00:12:27.830 --> 00:12:29.622
for a lot of these
machine learning methods

00:12:29.622 --> 00:12:31.163
that are actually very useful.

00:12:31.163 --> 00:12:34.000
I mean, I think, in general,

00:12:34.900 --> 00:12:36.529
that literature, 
the computer scientists,

00:12:36.529 --> 00:12:40.013
have brought a huge number
of these algorithms there...

00:12:40.582 --> 00:12:42.632
have proposed a huge number
of these algorithms

00:12:42.632 --> 00:12:43.875
that actually are very useful,

00:12:43.887 --> 00:12:46.153
and that are affecting

00:12:46.153 --> 00:12:49.100
the way we're going
to be doing empirical work.

00:12:49.800 --> 00:12:52.105
But we've not fully
internalized that

00:12:52.105 --> 00:12:53.573
because we're still very focused

00:12:53.573 --> 00:12:57.500
on getting point estimates
and getting standard errors

00:12:58.600 --> 00:13:00.144
and getting p-values

00:13:00.159 --> 00:13:03.209
in a way that we need
to move beyond

00:13:03.209 --> 00:13:06.090
to fully harness the force,

00:13:06.549 --> 00:13:08.351
the benefits

00:13:08.351 --> 00:13:10.700
from the machine
learning literature.

00:13:11.198 --> 00:13:13.548
- On the one hand, I guess I very
much take your point

00:13:13.548 --> 00:13:16.850
that sort of the traditional
econometrics framework

00:13:16.850 --> 00:13:19.821
of sort of propose a method,
prove a limit theorem

00:13:19.821 --> 00:13:23.870
under some asymptotic story,
story story, story story...

00:13:24.424 --> 00:13:27.057
publish a paper is constraining,

00:13:27.218 --> 00:13:30.132
and that, in some sense,
by thinking more broadly

00:13:30.132 --> 00:13:31.699
about what a methods paper
could look like,

00:13:31.699 --> 00:13:33.316
we may [write] in some sense.

00:13:33.316 --> 00:13:34.929
Certainly the machine
learning literature

00:13:34.929 --> 00:13:37.189
has found a bunch of things
which seem to work quite well

00:13:37.189 --> 00:13:38.300
for a number of problems

00:13:38.300 --> 00:13:41.267
and are now having
substantial influence in economics.

00:13:41.267 --> 00:13:43.261
I guess a question
I'm interested in

00:13:43.261 --> 00:13:46.465
is how do you think
about the role of...

00:13:48.657 --> 00:13:51.200
Do you think there is no value
in the theory part of it?

00:13:51.600 --> 00:13:54.187
Because I guess a question
that I often have

00:13:54.187 --> 00:13:56.804
in sort of seeing the output
from a machine learning tool,

00:13:56.804 --> 00:13:58.207
and actually a number
of the methods

00:13:58.207 --> 00:13:59.220
that you talked about

00:13:59.220 --> 00:14:00.679
actually do have
inferential results

00:14:00.679 --> 00:14:01.944
developed for them,

00:14:02.520 --> 00:14:03.963
something that
I always wonder about,

00:14:03.963 --> 00:14:06.659
is sort of uncertainty
quantification and just...

00:14:06.659 --> 00:14:08.000
I have my prior,

00:14:08.000 --> 00:14:11.000
I come into the world with my view.
I see the result of this thing.

00:14:11.000 --> 00:14:12.395
How should I update based on it?

00:14:12.395 --> 00:14:13.867
And in some sense, 
if I'm in a world

00:14:13.867 --> 00:14:15.914
where things
are normally distributed,

00:14:15.914 --> 00:14:17.280
I know how to do it --

00:14:17.280 --> 00:14:18.305
here I don't.

00:14:18.305 --> 00:14:21.028
And so I'm interested to hear
what you think about that.

00:14:21.500 --> 00:14:24.698
- I don't see this as sort
of saying, well,

00:14:24.698 --> 00:14:26.556
these results are not interesting,

00:14:26.556 --> 00:14:27.968
but there's going to be a lot of cases

00:14:27.968 --> 00:14:30.153
where it's going to be incredibly
hard to get those results,

00:14:30.153 --> 00:14:32.489
and we may not
be able to get there,

00:14:32.489 --> 00:14:34.942
and we may need to do it in stages

00:14:34.942 --> 00:14:36.440
where first someone says,

00:14:36.440 --> 00:14:40.900
"Hey, I have
this interesting algorithm

00:14:40.900 --> 00:14:42.200
for doing something

00:14:42.200 --> 00:14:47.769
and it works well
by some of the criteria there

00:14:47.769 --> 00:14:49.900
on this particular data set,"

00:14:51.000 --> 00:14:52.602
and we should put it out there,

00:14:52.602 --> 00:14:55.410
and maybe someone
will figure out a way

00:14:55.410 --> 00:14:57.828
that you can later actually
still do inference

00:14:57.828 --> 00:14:59.463
under some condition,

00:14:59.463 --> 00:15:02.100
and maybe those are not
particularly realistic conditions,

00:15:02.100 --> 00:15:03.800
then we kind of go further.

00:15:03.800 --> 00:15:08.418
But I think we've been
constraining things too much

00:15:08.418 --> 00:15:09.519
where we said,

00:15:09.519 --> 00:15:13.185
"This is the type of things
that we need to do."

00:15:13.185 --> 00:15:14.502
And in some sense,

00:15:15.700 --> 00:15:18.200
that goes back
to the way Josh and I

00:15:19.700 --> 00:15:21.984
thought about things for the
[local average treatment] effect.

00:15:21.984 --> 00:15:23.137
That wasn't quite the way

00:15:23.137 --> 00:15:25.135
people were thinking
about these problems before.

00:15:25.805 --> 00:15:28.860
There was a sense
that some of the people said

00:15:29.500 --> 00:15:31.900
the way you need to do
these things is you first say,

00:15:32.200 --> 00:15:34.250
what you're interested in
in estimating

00:15:34.250 --> 00:15:37.040
and then you do the best job
you can in estimating that.

00:15:38.100 --> 00:15:43.874
And what you guys are doing
is you're doing it backwards.

00:15:44.300 --> 00:15:46.700
You kind of say,
"Here, I have an estimator,

00:15:47.300 --> 00:15:50.642
and now I'm going to figure out
what it's estimating,

00:15:50.642 --> 00:15:53.900
and I suppose you're going to say
why you think that's interesting

00:15:53.900 --> 00:15:56.600
or maybe why it's not interesting,"
and that's not okay.

00:15:56.600 --> 00:15:58.600
You're not allowed
to do that that way.

00:15:59.000 --> 00:16:02.026
And I think we should
just be a little bit more flexible

00:16:02.026 --> 00:16:06.648
in thinking about
how to look at problems

00:16:06.648 --> 00:16:08.328
because I think
we've missed some things

00:16:08.328 --> 00:16:11.300
by not doing that.

00:16:11.300 --> 00:16:12.819
♪ [music] ♪

00:16:12.819 --> 00:16:14.753
- [Josh] So you've heard
our views, Isaiah,

00:16:14.753 --> 00:16:18.191
and you've seen that we have
some points of disagreement.

00:16:18.191 --> 00:16:20.400
Why don't you referee
this dispute for us?

00:16:20.950 --> 00:16:22.394
[laughter]

00:16:22.500 --> 00:16:25.300
- Oh, it's so nice of you
to ask me a small question.

00:16:26.425 --> 00:16:27.993
So I guess for one,

00:16:27.993 --> 00:16:33.200
I very much agree with something
that Guido said earlier of...

00:16:34.100 --> 00:16:35.710
[laughter]

00:16:35.920 --> 00:16:37.148
- So one thing where it seems

00:16:37.148 --> 00:16:40.066
where the case for machine learning
seems relatively clear

00:16:40.066 --> 00:16:43.316
is in settings where
we're interested in some version

00:16:43.316 --> 00:16:45.100
of a nonparametric
prediction problem.

00:16:45.100 --> 00:16:46.392
So I'm interested in estimating

00:16:46.392 --> 00:16:49.700
a conditional expectation
or conditional probability,

00:16:50.000 --> 00:16:52.100
and in the past, maybe
I would have run a kernel...

00:16:52.100 --> 00:16:53.526
I would have run
a kernel regression

00:16:53.526 --> 00:16:55.184
or I would have run
a series regression,

00:16:55.184 --> 00:16:57.400
or something along those lines.

00:16:57.976 --> 00:17:00.350
It seems like, at this point, 
we've a fairly good sense

00:17:00.350 --> 00:17:03.102
that in a fairly wide range
of applications,

00:17:03.102 --> 00:17:05.671
machine learning methods
seem to do better

00:17:05.671 --> 00:17:08.610
for estimating conditional
mean functions

00:17:08.610 --> 00:17:09.811
or conditional probabilities

00:17:09.811 --> 00:17:12.000
or various other
nonparametric objects

00:17:12.400 --> 00:17:15.309
than more traditional
nonparametric methods

00:17:15.309 --> 00:17:17.292
that were studied
in econometrics and statistics,

00:17:17.292 --> 00:17:19.100
especially
in high dimensional settings.

00:17:19.500 --> 00:17:21.849
- So you're thinking of maybe
the propensity score

00:17:21.849 --> 00:17:23.155
or something like that?

00:17:23.155 --> 00:17:25.063
- Yeah, exactly,
- Nuisance functions.

00:17:25.063 --> 00:17:27.100
- Yeah, so things
like propensity scores,

00:17:27.872 --> 00:17:29.965
even objects of more direct

00:17:29.965 --> 00:17:32.400
interest, like conditional
average treatment effects,

00:17:32.400 --> 00:17:35.100
which are the difference of two
conditional expectation functions,

00:17:35.100 --> 00:17:36.625
potentially things like that.

00:17:36.625 --> 00:17:40.573
Of course, even there, 
the theory...

00:17:40.573 --> 00:17:43.620
for inference, or the theory
for how to interpret,

00:17:43.620 --> 00:17:45.797
how to make large-sample statements
about some of these things

00:17:45.797 --> 00:17:47.733
are less well-developed
depending on

00:17:47.733 --> 00:17:50.100
the machine learning
estimator used.

00:17:50.100 --> 00:17:52.983
And so I think
something that is tricky

00:17:52.983 --> 00:17:55.700
is that we can have these methods,
which work a lot,

00:17:55.700 --> 00:17:58.000
which seem to work
a lot better for some purposes

00:17:58.000 --> 00:18:01.229
but which we need to be a bit
careful in how we plug them in

00:18:01.229 --> 00:18:03.300
or how we interpret
the resulting statements.

00:18:03.600 --> 00:18:05.985
But of course, that's a very,
very active area right now

00:18:05.985 --> 00:18:07.568
where people are doing
tons of great work.

00:18:07.568 --> 00:18:10.694
And so I fully expect
and hope to see

00:18:10.694 --> 00:18:12.800
much more going forward there.

00:18:13.000 --> 00:18:16.780
So one issue with machine learning
that always seems a danger is...

00:18:16.780 --> 00:18:18.517
or that is sometimes a danger

00:18:18.517 --> 00:18:20.938
and has sometimes
led to applications

00:18:20.938 --> 00:18:22.139
that have made less sense

00:18:22.139 --> 00:18:27.309
is when folks start with a method
that they're very excited about

00:18:27.309 --> 00:18:28.676
rather than a question.

00:18:28.900 --> 00:18:30.492
So sort of starting with a question

00:18:30.492 --> 00:18:33.782
where here's the object
I'm interested in,

00:18:33.782 --> 00:18:35.228
here is the parameter
of interest --

00:18:35.602 --> 00:18:39.500
let me think about how I would
identify that thing,

00:18:39.500 --> 00:18:41.824
how I would recover that thing
if I had a ton of data.

00:18:41.824 --> 00:18:44.000
Oh, here's a conditional
expectation function,

00:18:44.000 --> 00:18:47.065
let me plug in a machine
learning estimator for that --

00:18:47.065 --> 00:18:48.800
that seems very, very sensible.

00:18:49.000 --> 00:18:52.964
Whereas, you know, 
if I regress quantity on price

00:18:53.504 --> 00:18:56.000
and say that I used
a machine learning method,

00:18:56.300 --> 00:18:58.791
maybe I'm satisfied that 
that solves the endogeneity problem

00:18:58.791 --> 00:19:01.200
we're usually worried
about there... maybe I'm not.

00:19:01.500 --> 00:19:02.649
But again, that's something

00:19:02.649 --> 00:19:06.300
where the way to address it
seems relatively clear.

00:19:06.500 --> 00:19:08.181
It's to define
your object of interest

00:19:08.181 --> 00:19:09.779
and think about--

00:19:09.779 --> 00:19:11.489
- Just bring in the economics.

00:19:11.489 --> 00:19:12.741
- Exactly.

00:19:12.741 --> 00:19:14.274
- And think about
the heterogeneity,

00:19:14.274 --> 00:19:17.067
but harness the power
of the machine learning methods

00:19:17.067 --> 00:19:20.148
for some of the components.

00:19:20.349 --> 00:19:21.388
- Precisely. Exactly.

00:19:21.388 --> 00:19:23.753
So the question of interest

00:19:23.753 --> 00:19:25.767
is the same as the question
of interest has always been,

00:19:25.767 --> 00:19:28.493
but we now have better methods
for estimating some pieces of this.

00:19:29.900 --> 00:19:32.704
The place that seems
harder to forecast

00:19:32.704 --> 00:19:35.816
is obviously, there's
a huge amount going on

00:19:35.816 --> 00:19:37.500
in the machine learning literature

00:19:37.500 --> 00:19:40.223
and the limited ways
of plugging it in

00:19:40.223 --> 00:19:41.388
that I've referenced so far

00:19:41.388 --> 00:19:43.090
are a limited piece of that.

00:19:43.090 --> 00:19:45.324
So I think there are all sorts
of other interesting questions

00:19:45.324 --> 00:19:46.520
about where...

00:19:47.100 --> 00:19:49.300
where does this interaction go? 
What else can we learn?

00:19:49.300 --> 00:19:52.932
And that's something where
I think there's a ton going on

00:19:52.932 --> 00:19:54.414
which seems very promising,

00:19:54.414 --> 00:19:56.400
and I have no idea
what the answer is.

00:19:57.000 --> 00:20:00.043
- No, I totally agree with that,

00:20:00.043 --> 00:20:03.539
but that makes it very exciting.

00:20:03.539 --> 00:20:06.100
And I think there's just
a lot of work to be done there.

00:20:06.600 --> 00:20:08.720
Alright. So Isaiah
agrees with me there.

00:20:08.720 --> 00:20:10.174
[laughter]

00:20:10.174 --> 00:20:11.633
- I didn't say that per se.

00:20:12.926 --> 00:20:14.419
♪ [music] ♪

00:20:14.419 --> 00:20:16.833
- [Narrator] If you'd like to watch
more Nobel Conversations,

00:20:16.833 --> 00:20:18.012
click here.

00:20:18.012 --> 00:20:20.492
Or if you'd like to learn
more about econometrics,

00:20:20.500 --> 00:20:23.100
check out Josh's
Mastering Econometrics series.

00:20:23.600 --> 00:20:26.569
If you'd like to learn more
about Guido, Josh, and Isaiah,

00:20:26.569 --> 00:20:28.550
check out the links
in the description.

00:20:28.550 --> 00:20:30.535
♪ [music] ♪