♪ [music] ♪

- [narrator] Welcome to Nobel Conversations. In this episode, Josh Angrist and Guido Imbens sit down with Isaiah Andrews to discuss, and disagree over, the role of machine learning in applied econometrics.

- [Isaiah] So, of course, there are a lot of topics where you guys largely agree, but I'd like to turn to one where maybe you have some differences of opinion. I'd love to hear some of your thoughts about machine learning and the role that it's playing and is going to play in economics.

- [Guido] I've looked at some data -- proprietary data, so there's no published paper there. There was an experiment that was done on some search algorithm, and the question was about ranking things and changing the ranking. It was sort of clear that there was going to be a lot of heterogeneity there. You know, if you look for, say, a picture of Britney Spears, it doesn't really matter where you rank it, because you're going to figure out what you're looking for whether you put it in the first or second or third position of the ranking. But if you're looking for the best econometrics book, whether you put your book first or your book tenth is going to make a big difference in how often people are going to click on it. And so there you go--

- [Josh] Why do I need machine learning to discover that? It seems like I could discover it simply.

- [Guido] So in general--

- [Josh] There were lots of possible...

- [Guido] You want to think about there being lots of characteristics of the items, and you want to understand what drives the heterogeneity in the effect of--

- [Josh] But you're just predicting. In some sense, you're solving a marketing problem.

- [inaudible] ...it's a causal effect.

- It's causal, but it has no scientific content. Think about...

- No, but there are similar things in medical settings. If you do an experiment, you may actually be very interested in whether the treatment works for some groups or not.
And you have a lot of individual characteristics, and you want to systematically search--

- Yeah, I'm skeptical about that -- that sort of idea that there's this personal causal effect that I should care about, and that machine learning can discover it in some way that's useful. So think about -- I've done a lot of work on schools, going to, say, a charter school: a publicly funded private school, effectively, that's free to structure its own curriculum, for context there. Some types of charter schools generate spectacular achievement gains, and in the data set that produces that result, I have a lot of covariates. So I have baseline scores, and I have family background -- the education of the parents -- the sex of the child, the race of the child. And as soon as I put half a dozen of those together, I have a very high dimensional space. I'm definitely interested in sort of coarse features of that treatment effect, like whether it's better for people who come from lower income families. But I have a hard time believing that there's an application for the very high dimensional version of that, where I discover that for non-white children who have high family incomes but baseline scores in the third quartile, and who only went to public school in the third grade but not the sixth grade... That's what that high dimensional analysis produces: a very elaborate conditional statement. There are two things wrong with that, in my view. First, I just can't imagine why it's actionable -- I don't know why you'd want to act on it. And I also know that there's some alternative model that fits almost as well that flips everything, because machine learning doesn't tell me that this is really the predictor that matters; it just tells me that this is a good predictor. And so I think there is something different about the social science context.

- [Guido] I think in the social science applications you're talking about... I think there's not a huge amount of heterogeneity in the effects.
- [Josh] There might be, if you allow me to fill that space.

- No... not even then. I think for a lot of those interventions, you would expect that the effect is the same sign for everybody. There may be small differences in the magnitude, but it's not... For a lot of these education interventions, they're good for everybody. It's not that they're bad for some people and good for other people, with some kind of very small pockets where they're bad. There may be some variation in the magnitude, but you would need very, very big data sets to find it, and I agree that in those cases it probably wouldn't be very actionable anyway. But I think there are a lot of other settings where there is much more heterogeneity.

- Well, I'm open to that possibility, and I think the example you gave is essentially a marketing example.

- No, those have implications for the organization -- whether you need to worry about the...

- Well, I need to see that paper.

- So the sense I'm getting...

- We still disagree on something.

- Yes.

[laughter]

- We haven't converged on everything.

- I'm getting that sense.

[laughter]

- Actually, we've diverged on this, because this wasn't around to argue about.

[laughter]

- Is it getting a little warm here?

- Warmed up. Warmed up is good.

- The sense I'm getting is, Josh, you're not saying that you're confident there's no way there's an application where this stuff is useful; you're saying you're unconvinced by the existing applications to date. Fair enough?

- I'm very confident.

[laughter]

- In this case.

- I think Josh does have a point that even in the prediction cases, where a lot of the machine learning methods really shine is where there's just a lot of heterogeneity.

- You don't really care much about the details there, right? It doesn't have a policy angle or something.

- Like recognizing handwritten digits and stuff -- it does much better there than building some complicated model.
But in a lot of the social science, a lot of the economic applications, we actually know a huge amount about the relationship between these variables. A lot of the relationships are strictly monotone. Education is going to increase people's earnings, irrespective of the demographic, irrespective of the level of education you already have.

- Until they get to a Ph.D.

- Yeah, there is graduate school...

[laughter]

...but over a reasonable range, it's not going to go down very much. In a lot of the settings where these machine learning methods shine, there's a lot of [inaudible] -- kind of multimodality in these relationships -- and they're going to be very powerful. But I still stand by this: these methods just have a huge amount to offer for economists, and they're going to be a big part of the future.

- [Isaiah] It feels like there's something interesting to be said about machine learning here. So, Guido, I was wondering, could you give some more... maybe some examples of the sorts of applications you're thinking about [inaudible] at the moment?

- So, in areas where, instead of looking for average causal effects, we're looking for individualized estimates -- predictions of causal effects -- the machine learning algorithms have been very effective. Traditionally, we would have done these things using kernel methods, and theoretically they work great; there are even arguments that, formally, you can't do any better. But in practice, they don't work very well. Random causal forest-type things that Stefan Wager and Susan Athey have been working on have been used very widely. They've been very effective in these settings for actually getting causal effects that vary by [inaudible]. I think this is still just the beginning of these methods, but in many cases these algorithms are very effective at searching over big spaces and finding the functions that fit very well, in ways that we couldn't really do beforehand.
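[Editor's note: as a concrete sketch of the individualized-effect estimation Guido describes, the following simulation uses plain random forests in a simple "T-learner" as a stand-in for the Wager-Athey causal forests he mentions; their actual estimator differs in important ways, and every variable and parameter below is an assumption invented for illustration.]

```python
# [Hypothetical illustration -- simulated data, simplified method.]
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, p = 5000, 10
X = rng.normal(size=(n, p))             # unit characteristics (covariates)
T = rng.integers(0, 2, size=n)          # randomized binary treatment
tau = np.where(X[:, 0] > 0, 2.0, 0.2)   # true effect varies with one covariate
Y = X[:, 1] + tau * T + rng.normal(size=n)

# Fit separate outcome models for treated and control units ("T-learner"),
# then difference the predictions to get an individualized effect estimate.
m1 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[T == 1], Y[T == 1])
m0 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[T == 0], Y[T == 0])
cate_hat = m1.predict(X) - m0.predict(X)

print("mean estimated effect, X0 > 0 :", cate_hat[X[:, 0] > 0].mean())   # near 2.0
print("mean estimated effect, X0 <= 0:", cate_hat[X[:, 0] <= 0].mean())  # near 0.2
```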
- I don't know of an example where machine learning has generated insights about a causal effect that I'm interested in, and I do know of examples where it's potentially very misleading. So I've done some work with Brigham Frandsen, using, for example, random forests to model covariate effects in an instrumental variables problem where you need to condition on covariates. You don't particularly have strong feelings about the functional form for that, so maybe you should be open to flexible curve fitting. That leads you down a path where there are a lot of nonlinearities in the model, and that's very dangerous with IV, because any sort of excluded nonlinearity potentially generates a spurious causal effect. Brigham and I showed that very powerfully, I think, in the case of two instruments that come from a paper of mine with Bill Evans: if you replace a traditional two-stage least squares estimator with some kind of random forest, you get very precisely estimated nonsense estimates. I think that's a big caution. In view of those findings, in an example I care about, where the instruments are very simple and I believe that they're valid, I would be skeptical of that. So nonlinearity and IV don't mix very comfortably.

- No, it sounds like that's already a more complicated...

- Well, it's IV...

- Yeah.

- ...and we work on that.

[laughter]

- Fair enough.
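[Editor's note: the following is a stylized simulation of the danger Josh describes, not the Angrist-Frandsen analysis itself. The instrument is valid and the true effect is zero, yet replacing the linear first stage of two-stage least squares with a random forest lets an excluded nonlinearity (w squared, which the analyst controls for only linearly) leak into the fitted values and yield a precisely estimated nonsense effect. All data are simulated for illustration.]

```python
# [Hypothetical illustration -- simulated data; the true effect of x is zero.]
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 10000
z = rng.normal(size=n)                        # valid instrument
w = rng.normal(size=n)                        # exogenous covariate
u = rng.normal(size=n)                        # unobserved confounder
x = z + w**2 + u                              # first stage, nonlinear in w
y = 0.0 * x + w**2 + u + rng.normal(size=n)   # outcome; w^2 is "excluded" below

def coef_on_xhat(xhat):
    # Second stage: OLS of y on [1, xhat, w] -- w controlled for only linearly.
    D = np.column_stack([np.ones(n), xhat, w])
    return np.linalg.lstsq(D, y, rcond=None)[0][1]

# Traditional 2SLS: linear first stage of x on (1, z, w).
F = np.column_stack([np.ones(n), z, w])
xhat_lin = F @ np.linalg.lstsq(F, x, rcond=None)[0]
print("2SLS estimate:", coef_on_xhat(xhat_lin))           # close to the true 0

# Random-forest first stage: the fitted values pick up w^2, which also sits
# in the outcome error, so the "IV" estimate is spurious and far from zero.
xhat_rf = cross_val_predict(RandomForestRegressor(n_estimators=100, random_state=0),
                            np.column_stack([z, w]), x, cv=2)
print("RF-first-stage estimate:", coef_on_xhat(xhat_rf))  # clearly nonzero
```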
- As editor of Econometrica, a lot of these papers cross my desk, but the motivation is not clear and, in fact, really lacking. They're not... [inaudible]-type semi-parametric foundational papers. So that's a big problem. A related problem is that we have this tradition in econometrics of being very focused on these formal [asymptotic] results. We just have a lot of papers where people propose a method and then establish the asymptotic properties in a very kind of standardized way.

- Is that bad?

- Well, I think it's sort of closed the door on a lot of work that doesn't fit into that, whereas in the machine learning literature, a lot of things are more algorithmic. People had algorithms for coming up with predictions that turn out to actually work much better than, say, nonparametric kernel regression. For a long time, when we were doing nonparametrics in econometrics, we were using kernel regression, and it was great for proving theorems. You could get confidence intervals and consistency and asymptotic normality, and it was all great, but it wasn't very useful. And the things they did in machine learning are just way, way better.

- But they didn't have the problem--

- That's not my beef with machine learning theory.

[laughter]

- No, but I'm saying there, for the prediction part, it does much better.

- Yeah, it's better curve fitting.

- But it did so in a way that would not have made those papers initially easy to get into the econometrics journals, because it wasn't proving the type of things we expected. When Breiman was doing his regression trees, that just didn't fit in; I think he would have had a very hard time publishing those things in econometrics journals. I think we've limited ourselves too much, and that closed things off for a lot of these machine learning methods that are actually very useful. I mean, in general, that literature -- the computer scientists -- has proposed a huge number of these algorithms that are actually very useful and that are affecting the way we're going to be doing empirical work. But we've not fully internalized that, because we're still very focused on getting point estimates and getting standard errors and getting p-values, in a way that we need to move beyond to fully harness the benefits from the machine learning literature.
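[Editor's note: a quick sketch, on made-up data, of the prediction comparison Guido draws above. scikit-learn's KernelRidge stands in for classical kernel regression, and all parameters are assumptions chosen for the example; on data like these, the boosted trees typically deliver much lower test error.]

```python
# [Hypothetical illustration -- simulated data, stand-in methods.]
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n, p = 3000, 20
X = rng.uniform(-1, 1, size=(n, p))
# The conditional mean depends on a few coordinates, with an interaction
# and a jump -- easy for trees, hard for a global kernel smoother in 20 dims.
f = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] + (X[:, 3] > 0)
y = f + 0.5 * rng.normal(size=n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("kernel ridge (rbf)", KernelRidge(kernel="rbf", alpha=1.0)),
                    ("gradient boosting ", GradientBoostingRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)
    print(name, "test MSE:", round(mean_squared_error(y_te, model.predict(X_te)), 3))
```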
- [Isaiah] On the one hand, I guess I very much take your point that the traditional econometrics framework -- propose a method, prove a limit theorem under some asymptotic story, publish the paper -- is constraining, and that, in some sense, by thinking more broadly about what a methods paper could look like, we may [gain] something. Certainly the machine learning literature has found a bunch of things which seem to work quite well for a number of problems and are now having substantial influence in economics. I guess a question I'm interested in is how you think about the role of... do you think there is no value in the theory part of it? Because a question that I often have, seeing the output from a machine learning tool -- and actually a number of the methods you talked about do have inferential results developed for them -- is about uncertainty quantification. I have my prior; I come into the world with my view; I see the result of this thing. How should I update based on it? In some sense, if I'm in a world where things are normally distributed, I know how to do it; here I don't. And so I'm interested to hear what you think about that.

- [Guido] I don't see this as saying, well, these results are not interesting. But there are going to be a lot of cases where it's going to be incredibly hard to get those results, and we may not be able to get there. We may need to do it in stages, where first someone says, "Hey, I have this interesting algorithm for doing something, and it works well by some criterion on this particular data set, and I'm going to put it out there, and maybe someone will figure out a way that you can later actually still do inference under some conditions -- and maybe those are not particularly realistic conditions -- and then we kind of go further." But I think we've been constraining things too much, where we said, "This is the type of thing that we need to do." And in some sense, that goes back to the way Josh and I thought about things for the local average treatment effect. That wasn't quite the way people were thinking about these problems before.
There was a sense that some people said the way you need to do these things is: you first say what you're interested in estimating, and then you do the best job you can in estimating that. And what you guys are doing is doing it backwards. You say, "Here, I have an estimator, and now I'm going to figure out what it's estimating, and I suppose I'm going to say why I think that's interesting, or maybe why it's not interesting." And that was not okay -- you're not allowed to do it that way. I think we should just be a little bit more flexible in thinking about how to look at problems, because I think we've missed some things by not doing that.

- [Josh] So you've heard our views, Isaiah. You've seen that we have some points of disagreement. Why don't you referee this dispute for us?

[laughter]

- [Isaiah] Oh, it's so nice of you to ask me a small question. So I guess, for one, I very much agree with something that Guido said earlier...

[laughter]

One place where the case for machine learning seems relatively clear is in settings where we're interested in some version of a nonparametric prediction problem. So I'm interested in estimating a conditional expectation or conditional probability, and in the past, maybe I would have run a kernel regression or a series regression, or something along those lines. It seems like, at this point, we have a fairly good sense that, in a fairly wide range of applications, machine learning methods seem to do better for estimating conditional mean functions or conditional probabilities or various other nonparametric objects than the more traditional nonparametric methods that were studied in econometrics and statistics, especially in high dimensional settings.

- So you're thinking of maybe the propensity score or something like that?

- Exactly -- nuisance functions. So things like propensity scores, or even objects of more direct inferential interest, like conditional average treatment effects,
which are the difference of two conditional expectation functions, potentially -- things like that. Of course, even there, the theory for inference, for how to interpret and make large-sample statements about some of these things, is less well developed, depending on the machine learning estimator used. And so I think something that's tricky is that we can have these methods, which seem to work a lot better for some purposes, but which we need to be a bit careful in how we plug in and how we interpret the resulting statements. But of course, that's a very, very active area right now, where people are doing tons of great work, and so I fully expect, and hope, to see much more going forward there.

One issue with machine learning that always seems a danger -- or that is sometimes a danger, and has sometimes led to applications that made less sense -- is when folks start with a method that they're very excited about, rather than with a question. Starting with a question -- here's the object I'm interested in, here's the parameter of interest, let me think about how I would identify that thing, how I would recover it if I had a ton of data; oh, here's a conditional expectation function, let me plug in a machine learning estimator for that -- seems very, very sensible. Whereas if I regress quantity on price and say that, because I used a machine learning method, maybe I'm satisfied that that solves the endogeneity problem we're usually worried about -- maybe I'm not. But again, that's something where the way to address it seems relatively clear: define your object of interest and think about--

- Is that just bringing in the economics?

- Exactly. Can I think about identification, but harness the power of the machine learning methods for some of the components?

- Precisely.

- Exactly. So the question of interest is the same as the question of interest has always been, but we now have better methods for estimating some pieces of it. The place that seems harder to forecast is... Obviously, there's a huge amount going on in the machine learning literature, and the limited ways of plugging it in that I've referenced so far are a limited piece of that. And so I think there are all sorts of other interesting questions about where this interaction goes, what else we can learn. And that's something where I think there's a ton going on that seems very promising, and I have no idea what the answer is.
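[Editor's note: a minimal sketch of the "keep the question, plug machine learning into the nuisance pieces" recipe Isaiah describes, in the spirit of the partialling-out / double machine learning idea. This is not a full implementation, and the data-generating process below is an invented example.]

```python
# [Hypothetical illustration -- simulated data; the true effect is 1.5.]
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n, p = 5000, 10
X = rng.normal(size=(n, p))
g = np.sin(X[:, 0]) + X[:, 1] ** 2        # confounding through X
T = g + rng.normal(size=n)                # treatment depends on X
Y = 1.5 * T + g + rng.normal(size=n)      # parameter of interest: 1.5

# Nuisance pieces E[Y|X] and E[T|X], estimated by cross-fitted random forests.
y_res = Y - cross_val_predict(RandomForestRegressor(n_estimators=100, random_state=0), X, Y, cv=5)
t_res = T - cross_val_predict(RandomForestRegressor(n_estimators=100, random_state=0), X, T, cv=5)

# Final stage: OLS of residualized outcome on residualized treatment.
theta = (t_res @ y_res) / (t_res @ t_res)
print("estimated effect:", theta)  # close to 1.5
```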
- No, I totally agree with that, and that makes it very exciting. I think there's just a lot of work to be done there.

- All right. So Isaiah agrees with me there...

[laughter]

- [narrator] If you'd like to watch more Nobel Conversations, click here. Or, if you'd like to learn more about econometrics, check out Josh's Mastering Econometrics series. And if you'd like to learn more about Guido, Josh, and Isaiah, check out the links in the description.