0:00:00.100,0:00:02.350 ♪ [music] ♪ 0:00:03.700,0:00:05.700 - [narrator] Welcome[br]to Nobel Conversations. 0:00:07.000,0:00:10.128 In this episode, Josh Angrist[br]and Guido Imbens 0:00:10.128,0:00:13.700 sit down with Isaiah Andrews[br]to discuss and disagree 0:00:13.700,0:00:16.580 over the role of machine learning[br]in applied econometrics. 0:00:18.300,0:00:19.769 - [Isaiah] So, of course,[br]there are a lot of topics 0:00:19.769,0:00:21.087 where you guys largely agree, 0:00:21.087,0:00:22.313 but I'd like to turn to one 0:00:22.313,0:00:24.240 where maybe you have[br]some differences of opinion. 0:00:24.240,0:00:25.728 So I'd love to hear[br]some of your thoughts 0:00:25.728,0:00:26.883 about machine learning 0:00:26.883,0:00:29.900 and the role that it's playing[br]and is going to play in economics. 0:00:30.200,0:00:33.352 - [Guido] I've looked at some data[br]that were proprietary, 0:00:33.352,0:00:35.100 so there's[br]no published paper there. 0:00:36.719,0:00:38.159 There was an experiment[br]that was done 0:00:38.159,0:00:39.500 on some search algorithm. 0:00:39.700,0:00:41.497 And the question was... 0:00:42.901,0:00:45.600 it was about ranking things[br]and changing the ranking. 0:00:45.900,0:00:47.500 It was sort of clear... 0:00:48.400,0:00:50.600 that there was going to be[br]a lot of heterogeneity there. 0:00:50.600,0:00:51.700 Mmm, 0:00:51.700,0:00:58.120 You know, if you look for, say, 0:00:58.300,0:01:00.350 a picture of Britney Spears, 0:01:00.350,0:01:02.400 it doesn't really matter[br]where you rank it 0:01:02.400,0:01:05.500 because you're going to figure out[br]what you're looking for, 0:01:06.200,0:01:07.867 whether you put it[br]in the first or second 0:01:07.867,0:01:09.800 or third position of the ranking. 
0:01:10.100,0:01:12.500 But if you're looking[br]for the best econometrics book, 0:01:13.300,0:01:16.500 if you put your book[br]first or your book tenth, 0:01:16.500,0:01:18.100 that's going to make[br]a big difference 0:01:18.600,0:01:21.829 in how often people[br]are going to click on it. 0:01:21.829,0:01:23.417 And so there you go -- 0:01:23.417,0:01:27.218 - [Josh] Why do I need[br]machine learning to discover that? 0:01:27.218,0:01:29.195 It seems like[br]I could discover that simply. 0:01:29.195,0:01:30.435 - [Guido] So in general-- 0:01:30.435,0:01:32.100 - [Josh] There were lots[br]of possible... 0:01:32.100,0:01:35.490 - You want to think about[br]there being lots of characteristics 0:01:35.490,0:01:37.610 of the items, 0:01:37.610,0:01:41.682 and you want to understand[br]what drives the heterogeneity 0:01:42.300,0:01:43.427 in the effect of-- 0:01:43.427,0:01:45.600 - But you're just predicting. 0:01:45.600,0:01:47.700 In some sense, you're solving[br]a marketing problem. 0:01:48.400,0:01:49.580 - [inaudible] it's a causal effect, 0:01:49.580,0:01:51.800 - It's causal, but it has[br]no scientific content. 0:01:51.800,0:01:53.300 Think about... 0:01:54.100,0:01:57.300 - No, but there are similar things[br]in medical settings. 0:01:58.000,0:02:01.300 If you do an experiment,[br]you may actually be very interested 0:02:01.300,0:02:03.900 in whether the treatment[br]works for some groups or not. 0:02:03.900,0:02:06.500 And you have a lot of individual[br]characteristics, 0:02:06.500,0:02:08.000 and you want[br]to systematically search. 0:02:08.000,0:02:09.500 - Yeah. I'm skeptical about that -- 0:02:09.500,0:02:12.603 that sort of idea that there's[br]this personal causal effect 0:02:12.603,0:02:13.900 that I should care about, 0:02:14.000,0:02:16.063 and that machine learning[br]can discover it 0:02:16.063,0:02:17.596 in some way that's useful. 
0:02:17.596,0:02:21.400 So think about -- I've done[br]a lot of work on schools, 0:02:21.400,0:02:23.950 going to, say, a charter school, 0:02:23.950,0:02:25.225 a publicly funded private school, 0:02:25.225,0:02:26.500 effectively, you know,[br]that's free to structure 0:02:26.500,0:02:29.300 its own curriculum,[br]for context there. 0:02:29.300,0:02:31.000 Some types of charter schools 0:02:31.000,0:02:32.700 generate spectacular[br]achievement gains, 0:02:32.700,0:02:36.400 and in the data set[br]that produces that result, 0:02:36.400,0:02:37.800 I have a lot of covariates. 0:02:37.800,0:02:41.353 So I have baseline scores,[br]and I have family background, 0:02:41.353,0:02:43.576 the education of the parents, 0:02:43.576,0:02:45.800 the sex of the child,[br]the race of the child. 0:02:45.800,0:02:48.300 And, well, as soon as I put[br]half a dozen of those together, 0:02:48.400,0:02:51.900 I have a very high dimensional space. 0:02:52.300,0:02:53.600 I'm definitely interested[br]in sort of coarse features 0:02:53.600,0:02:54.900 of that treatment effect, 0:02:54.900,0:02:57.150 like whether it's better for people 0:02:57.150,0:02:59.400 who come from[br]lower income families. 0:03:02.600,0:03:06.000 I have a hard time believing[br]that there's an application 0:03:06.400,0:03:10.300 for the very high dimensional[br]version of that, 0:03:10.500,0:03:11.850 where I discover[br]that it's for non-white children 0:03:11.850,0:03:13.200 who have high family incomes 0:03:13.800,0:03:17.800 but baseline scores[br]in the third quartile 0:03:18.300,0:03:20.650 and only went to public school[br]in the third grade 0:03:20.650,0:03:23.000 but not the sixth grade. 0:03:23.000,0:03:25.500 So that's what that high[br]dimensional analysis produces: 0:03:25.800,0:03:28.100 this very elaborate[br]conditional statement. 0:03:28.300,0:03:31.000 There are two things that are wrong[br]with that in my view. 0:03:31.000,0:03:32.500 First, I don't see it as... 
0:03:32.500,0:03:34.000 I just can't imagine[br]why it's actionable. 0:03:34.600,0:03:36.600 I don't know why[br]you'd want to act on it. 0:03:36.600,0:03:38.900 And I know also[br]that there's some alternative model 0:03:38.900,0:03:41.200 that fits almost as well 0:03:41.800,0:03:43.000 that flips everything, 0:03:43.200,0:03:45.350 because machine learning[br]doesn't tell me 0:03:45.350,0:03:47.500 that this is really[br]the predictor that matters. 0:03:48.400,0:03:52.300 It just tells me that[br]this is a good predictor. 0:03:52.800,0:03:54.350 And so, I think[br]there is something different 0:03:54.350,0:03:55.900 about the social science context. 0:03:57.940,0:03:59.545 - [Guido] I think[br]the social science applications 0:03:59.545,0:04:01.150 you're talking about, 0:04:01.150,0:04:02.600 once were... 0:04:03.400,0:04:08.100 I think there's not a huge amount[br]of heterogeneity in the effects. 0:04:08.400,0:04:11.200 - [Josh] There might be 0:04:11.200,0:04:14.000 if you allow me[br]to fill that space. 0:04:14.600,0:04:16.350 - No... not even then. 0:04:16.350,0:04:18.100 I think for a lot[br]of those interventions, 0:04:18.300,0:04:22.000 you would expect that the effect[br]is the same sign for everybody. 0:04:23.400,0:04:27.600 There may be small differences[br]in the magnitude, but it's not... 0:04:28.200,0:04:31.700 For a lot of these education[br]interventions -- they're good for everybody. 0:04:32.900,0:04:35.250 It's not that they're bad[br]for some people 0:04:35.250,0:04:37.600 and good for other people, 0:04:37.600,0:04:39.200 with only kind[br]of very small pockets 0:04:39.200,0:04:40.800 where they're bad. 0:04:40.900,0:04:43.900 There may be some variation[br]in the magnitude, 0:04:44.000,0:04:48.200 but you would need very,[br]very big data sets to find those. 0:04:48.400,0:04:49.900 I agree that in those cases, 0:04:49.900,0:04:51.400 they probably wouldn't be[br]very actionable anyway. 
0:04:51.700,0:04:53.800 But I think there's a lot[br]of other settings 0:04:54.100,0:04:56.600 where there is[br]much more heterogeneity. 0:04:57.400,0:04:59.500 - Well, I'm open[br]to that possibility, 0:04:59.500,0:05:05.550 and I think the example you gave[br]is essentially a marketing example. 0:05:06.430,0:05:10.700 - No, those have implications there[br]for the organization, 0:05:10.700,0:05:13.900 whether you need[br]to worry about the... 0:05:14.000,0:05:17.900 - Well, I need to see that paper. 0:05:18.400,0:05:21.200 - So the sense I'm getting... 0:05:21.500,0:05:23.100 - We still disagree on something.[br]- Yes. 0:05:23.100,0:05:24.100 [laughter] 0:05:24.100,0:05:25.400 - We haven't converged[br]on everything. 0:05:25.400,0:05:26.050 - I'm getting that sense. 0:05:26.050,0:05:26.700 [laughter] 0:05:27.200,0:05:29.100 - Actually, we've diverged on this, 0:05:29.100,0:05:30.050 because this wasn't around[br]to argue about. 0:05:30.050,0:05:31.000 [laughter] 0:05:33.200,0:05:35.600 - Is it getting a little warm here? 0:05:35.600,0:05:38.000 - Warmed up. Warmed up is good. 0:05:38.100,0:05:40.800 The sense I'm getting is, Josh,[br]you're not saying 0:05:40.900,0:05:43.400 that you're confident[br]that there is no way 0:05:43.400,0:05:45.400 that there is an application[br]where this stuff is useful; 0:05:45.400,0:05:46.800 you're saying 0:05:46.800,0:05:48.200 you're unconvinced by[br]the existing applications to date. 0:05:48.300,0:05:51.280 Fair enough. 0:05:51.280,0:05:53.120 - I'm very confident. 0:05:53.120,0:05:54.300 [laughter] 0:05:54.300,0:05:55.300 - In this case. 0:05:55.300,0:05:57.500 - I think Josh does have a point 0:05:58.000,0:06:02.100 that the prediction cases 0:06:02.300,0:06:05.000 where a lot of the machine learning[br]methods really shine 0:06:05.000,0:06:06.600 are where there's just a lot[br]of heterogeneity. 0:06:07.300,0:06:10.600 - You don't really care much[br]about the details there, right? 
0:06:10.900,0:06:15.000 It doesn't have[br]a policy angle or something. 0:06:15.200,0:06:18.100 - Like recognizing[br]handwritten digits and stuff. 0:06:18.300,0:06:21.150 It does much better there 0:06:21.150,0:06:24.000 than building[br]some complicated model. 0:06:24.400,0:06:28.100 But in a lot of the social science,[br]a lot of the economic applications, 0:06:28.300,0:06:30.200 we actually know a huge amount[br]about the relationships 0:06:30.200,0:06:32.100 between the variables. 0:06:32.100,0:06:34.600 A lot of the relationships[br]are strictly monotone. 0:06:35.400,0:06:39.400 Education is going to increase[br]people's earnings, 0:06:39.800,0:06:41.950 irrespective of the demographic, 0:06:41.950,0:06:44.100 irrespective of the level[br]of education you already have. 0:06:44.100,0:06:45.950 - Until they get to a Ph.D. 0:06:45.950,0:06:47.800 - Yeah, there is graduate school... 0:06:48.150,0:06:49.150 [laughter] 0:06:49.500,0:06:50.700 but over a reasonable range, 0:06:51.600,0:06:55.900 it's not going[br]to go down very much. 0:06:56.100,0:06:57.900 In a lot of the settings 0:06:57.900,0:06:59.700 where these machine learning[br]methods shine, 0:06:59.700,0:07:01.900 there's a lot of [inaudible] 0:07:02.100,0:07:04.900 kind of multimodality[br]in these relationships, 0:07:05.300,0:07:08.400 and they're going to be[br]very powerful. 0:07:08.400,0:07:11.500 But I still stand by this: 0:07:11.700,0:07:16.100 these methods just have[br]a huge amount to offer 0:07:16.400,0:07:18.100 for economists, 0:07:18.200,0:07:21.700 and they're going to be[br]a big part of the future. 0:07:23.400,0:07:24.600 - [Isaiah] Feels like[br]there's something interesting 0:07:24.600,0:07:25.800 to be said about[br]machine learning here. 0:07:25.800,0:07:27.700 So, Guido, I was wondering,[br]could you give some more... 0:07:28.000,0:07:29.000 maybe some examples[br]of the sorts of things 0:07:29.000,0:07:32.500 you're thinking about[br]with applications [inaudible] at the moment? 
0:07:32.500,0:07:34.100 - So, in areas where, 0:07:34.700,0:07:36.400 instead of looking[br]for average causal effects, 0:07:36.500,0:07:39.350 we're looking for[br]individualized estimates, 0:07:39.350,0:07:42.200 predictions of causal effects, 0:07:42.400,0:07:44.950 the machine learning algorithms[br]have been very effective. 0:07:48.300,0:07:51.500 Traditionally, we would have done[br]these things using kernel methods. 0:07:51.600,0:07:54.500 And theoretically they work great, 0:07:54.600,0:07:56.000 and there's some arguments 0:07:56.000,0:07:57.400 that, formally,[br]you can't do any better. 0:07:57.600,0:08:00.500 But in practice,[br]they don't work very well. 0:08:00.900,0:08:03.150 Random causal forest-type things 0:08:03.150,0:08:05.400 that Stefan Wager and Susan Athey[br]have been working on 0:08:05.400,0:08:09.500 have been used very widely. 0:08:09.600,0:08:12.200 They've been very effective[br]in these settings 0:08:12.400,0:08:18.100 to actually get causal effects[br]that vary by [inaudible]. 0:08:20.700,0:08:23.200 I think this is still just the beginning[br]of these methods. 0:08:23.200,0:08:25.700 But in many cases, 0:08:26.400,0:08:31.600 these algorithms are very effective[br]at searching over big spaces 0:08:31.800,0:08:35.600 and finding the functions that fit very well, 0:08:35.900,0:08:41.100 in ways that we couldn't[br]really do beforehand. 0:08:41.500,0:08:43.400 - I don't know of an example 0:08:43.400,0:08:45.300 where machine learning[br]has generated insights 0:08:45.300,0:08:48.100 about a causal effect[br]that I'm interested in. 0:08:48.300,0:08:49.800 And I do know of examples 0:08:49.800,0:08:51.300 where it's potentially[br]very misleading. 
0:08:51.300,0:08:53.700 So I've done some work[br]with Brigham Frandsen, 0:08:54.100,0:08:55.100 using, for example, random forests[br]to model covariate effects 0:08:55.100,0:08:59.900 in an instrumental[br]variables problem 0:09:00.200,0:09:01.200 where you need[br]to condition on covariates. 0:09:04.400,0:09:06.300 And you don't particularly[br]have strong feelings 0:09:06.300,0:09:08.200 about the functional form for that, 0:09:08.200,0:09:10.000 so maybe you should curve... 0:09:10.900,0:09:12.700 be open to flexible curve fitting, 0:09:12.700,0:09:14.500 and that leads you down a path 0:09:14.500,0:09:18.000 where there's a lot[br]of nonlinearities in the model, 0:09:18.200,0:09:20.600 and that's very dangerous with IV, 0:09:20.600,0:09:23.000 because any sort[br]of excluded nonlinearity 0:09:23.300,0:09:25.450 potentially generates[br]a spurious causal effect, 0:09:25.450,0:09:27.600 and Brigham and I[br]showed that very powerfully, 0:09:27.900,0:09:32.200 I think, in the case[br]of two instruments 0:09:32.700,0:09:36.000 that come from a paper of mine[br]with Bill Evans, 0:09:36.500,0:09:37.600 where if you replace 0:09:38.100,0:09:40.350 a traditional two-stage[br]least squares estimator 0:09:40.350,0:09:42.600 with some kind of random forest, 0:09:42.900,0:09:48.000 you get very precisely[br]estimated nonsense estimates. 0:09:49.000,0:09:51.100 I think that's a big caution. 0:09:51.100,0:09:53.400 In view of those findings[br]in an example I care about, 0:09:53.700,0:09:57.100 where the instruments[br]are very simple 0:09:57.400,0:09:59.100 and I believe that they're valid, 0:09:59.300,0:10:01.600 I would be skeptical of that. 0:10:02.900,0:10:06.800 So nonlinearity and IV[br]don't mix very comfortably. 0:10:07.200,0:10:10.450 - No, it sounds like that's already[br]a more complicated... 0:10:10.450,0:10:11.400 - Well, it's IV...[br]- Yeah. 0:10:12.500,0:10:16.700 - ...and we work on that. 
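The danger Josh describes can be illustrated with a toy simulation. This is a hypothetical sketch, not the Angrist-Frandsen or Angrist-Evans exercise: a deliberately overfit nonparametric first stage (tiny local bins standing in for an overfit random forest) drags the second-stage estimate back toward biased OLS, while textbook two-stage least squares with the raw instrument recovers the truth.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: true causal effect of x on y is 1.0, but x is
# endogenous because the errors u and v are correlated.
z = rng.normal(size=n)            # valid, excluded instrument
v = rng.normal(size=n)            # first-stage error
u = 0.8 * v + rng.normal(size=n)  # outcome error, correlated with v
x = z + v                         # first stage: x depends on z
y = 1.0 * x + u                   # outcome

def slope(a, b):
    """OLS slope from regressing b on a (with intercept)."""
    a = a - a.mean()
    b = b - b.mean()
    return (a @ b) / (a @ a)

b_ols = slope(x, y)                 # biased upward, roughly 1.4
b_2sls = slope(z, y) / slope(z, x)  # classic linear IV, close to 1.0

# "Flexible" first stage: predict x from z with tiny local bins.
# The fitted values soak up part of v, so regressing y on them
# reintroduces the endogeneity the instrument was supposed to remove.
order = np.argsort(z)
m = 2                               # observations per bin: heavy overfitting
bin_means = x[order].reshape(n // m, m).mean(axis=1)
xhat = np.empty(n)
xhat[order] = np.repeat(bin_means, m)
b_plugin = slope(xhat, y)           # pulled back toward the biased OLS value
```

The point of the sketch is only directional: the more the first stage interpolates the data, the closer the plug-in estimate gets to plain OLS, which is exactly the bias IV was meant to fix.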
0:10:17.150,0:10:17.875 [laughter] 0:10:17.875,0:10:18.600 - Fair enough. 0:10:18.600,0:10:20.450 - As editor of Econometrica, 0:10:20.450,0:10:22.300 a lot of these papers[br]cross my desk, 0:10:22.700,0:10:26.100 but the motivation is not clear 0:10:26.100,0:10:29.500 and, in fact, really lacking. 0:10:29.800,0:10:35.100 They're [what we call] the type of[br]semi-parametric foundational papers. 0:10:35.400,0:10:37.100 So that's a big problem. 0:10:38.000,0:10:42.400 A related problem is that we have[br]this tradition in econometrics 0:10:42.600,0:10:47.500 of being very focused[br]on these formal [asymptotic] results. 0:10:48.800,0:10:52.600 We just have a lot of papers[br]where people propose a method 0:10:52.800,0:10:55.700 and then establish[br]the asymptotic properties 0:10:56.300,0:10:59.100 in a very kind of standardized way. 0:10:59.100,0:11:01.900 - Is that bad? 0:11:02.900,0:11:07.200 - Well, I think it's sort[br]of closed the door 0:11:07.200,0:11:09.400 for a lot of work[br]that doesn't fit into that, 0:11:09.400,0:11:11.600 whereas in the machine[br]learning literature, 0:11:11.900,0:11:14.300 a lot of things[br]are more algorithmic. 0:11:14.431,0:11:18.500 People had algorithms[br]for coming up with predictions 0:11:18.800,0:11:21.200 that turn out[br]to actually work much better 0:11:21.200,0:11:23.600 than, say, nonparametric[br]kernel regression. 0:11:24.000,0:11:26.800 For a long time, when we were doing[br]nonparametrics in econometrics, 0:11:26.800,0:11:28.950 we were using kernel regression, 0:11:28.950,0:11:31.100 and it was great for proving theorems. 0:11:31.300,0:11:33.050 You could get confidence intervals 0:11:33.050,0:11:34.800 and consistency[br]and asymptotic normality, 0:11:34.800,0:11:35.900 and it was all great, 0:11:35.900,0:11:37.000 but it wasn't very useful. 0:11:37.300,0:11:39.100 And the things they did[br]in machine learning 0:11:39.100,0:11:40.900 are just way, way better. 
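The kernel regression Guido refers to is, in its simplest Nadaraya-Watson form, just a locally weighted average. A minimal sketch on simulated data (the sine target, the sample size, and the hand-picked bandwidth are all illustrative assumptions, and the bandwidth sensitivity is exactly the practical weakness he alludes to):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, h):
    """Nadaraya-Watson estimate of E[y|x]: a Gaussian-kernel-weighted
    local average. Everything hinges on the bandwidth h: too small
    overfits the noise, too large smooths away the signal."""
    # (n_eval, n_train) matrix of kernel weights
    w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=400)
y = np.sin(2.0 * x) + 0.3 * rng.normal(size=400)  # noisy nonlinear signal

grid = np.linspace(-1.5, 1.5, 7)
fit = nadaraya_watson(x, y, grid, h=0.2)  # tracks sin(2x) reasonably well
```

In one dimension with a well-chosen bandwidth this works fine; the trouble Guido points to shows up when the covariate space is high dimensional and no single bandwidth choice performs well in practice.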
0:11:41.000,0:11:43.050 - But they didn't have the problem-- 0:11:43.050,0:11:44.300 - That's not my beef[br]with machine learning theory. 0:11:44.300,0:11:45.300 [laughter] 0:11:45.300,0:11:51.200 No, but I'm saying there,[br]for the prediction part, 0:11:51.400,0:11:52.950 it does much better. 0:11:52.950,0:11:54.500 - Yeah, it's better[br]curve fitting. 0:11:54.900,0:11:56.500 - But it did so in a way 0:11:57.100,0:11:58.500 that would not have made[br]those papers 0:11:58.500,0:11:59.900 initially easy to get into[br]the econometrics journals, 0:12:04.650,0:12:06.300 because they weren't proving[br]the type of things we were used to. 0:12:06.400,0:12:08.800 When Breiman was doing[br]his regression trees, 0:12:08.800,0:12:11.200 that just didn't fit in. 0:12:11.800,0:12:15.100 I think he would have had[br]a very hard time 0:12:15.200,0:12:18.400 publishing these things[br]in econometrics journals. 0:12:18.900,0:12:24.400 I think we've limited[br]ourselves too much, 0:12:24.700,0:12:27.900 and that's led us to close things off 0:12:28.000,0:12:29.400 to a lot of these[br]machine learning methods 0:12:29.400,0:12:30.800 that are actually very useful. 0:12:30.900,0:12:34.000 I mean, I think, in general, 0:12:34.900,0:12:36.200 in that literature,[br]the computer scientists 0:12:36.200,0:12:37.750 have proposed a huge number[br]of these algorithms 0:12:37.750,0:12:39.300 that actually are very useful 0:12:45.500,0:12:47.300 and that are affecting 0:12:47.300,0:12:49.100 the way we're going[br]to be doing empirical work. 0:12:49.800,0:12:52.450 But we've not fully internalized that, 0:12:52.450,0:12:55.100 because we're still very focused 0:12:55.300,0:12:57.500 on getting point estimates[br]and getting standard errors 0:12:58.600,0:13:01.200 and getting p-values, 0:13:01.700,0:13:03.100 in a way that we need to move beyond 0:13:03.300,0:13:04.300 to fully harness 0:13:04.300,0:13:10.700 the benefits[br]from the machine learning literature. 
0:13:10.900,0:13:13.000 - On the one hand, I guess I very[br]much take your point 0:13:13.000,0:13:15.100 that sort of the traditional[br]econometrics framework 0:13:15.200,0:13:18.600 of sort of propose a method,[br]prove a limit theorem 0:13:18.600,0:13:22.600 under some asymptotic story, 0:13:22.600,0:13:26.900 publish the paper -- is constraining. 0:13:26.900,0:13:29.700 And that, in some sense, 0:13:29.700,0:13:30.575 by thinking more broadly 0:13:30.575,0:13:31.450 about what a methods paper[br]could look like, 0:13:31.450,0:13:33.200 we may [gain] in some sense. 0:13:33.200,0:13:35.900 Certainly the machine learning[br]literature has found a bunch of things 0:13:35.900,0:13:38.300 which seem to work quite well[br]for a number of problems 0:13:38.300,0:13:40.350 and are now having[br]substantial influence in economics. 0:13:40.350,0:13:42.400 I guess a question I'm interested in 0:13:42.400,0:13:44.800 is how do you think[br]about the role of... 0:13:47.900,0:13:51.200 sort of -- do you think there is[br]no value in the theory part of it? 0:13:51.600,0:13:54.800 Because I guess a question[br]that I often have, 0:13:54.800,0:13:56.900 sort of seeing the output[br]from a machine learning tool -- 0:13:56.900,0:13:59.400 and actually a number of the[br]methods that you talked about 0:13:59.400,0:14:01.800 do have inferential results[br]developed for them -- 0:14:02.600,0:14:04.500 is something that[br]I always wonder about: 0:14:04.500,0:14:06.400 uncertainty quantification, and just... 0:14:06.500,0:14:08.000 I have my prior, 0:14:08.000,0:14:11.000 I come into the world with my view.[br]I see the result of this thing. 0:14:11.000,0:14:12.750 How should I update based on it? 0:14:12.750,0:14:14.500 And in some sense,[br]if I'm in a world 0:14:14.600,0:14:15.100 where things are normally distributed, 0:14:15.200,0:14:16.700 I know how to do that -- 0:14:16.700,0:14:18.200 here, I don't. 
0:14:18.200,0:14:21.400 And so I'm interested to hear[br]what you think about that. 0:14:21.500,0:14:24.300 - I don't see this as sort[br]of saying, well, 0:14:24.400,0:14:26.500 these results are not interesting, 0:14:26.600,0:14:27.700 but there are going to be a lot of cases 0:14:28.000,0:14:29.600 where it's going[br]to be incredibly hard 0:14:29.600,0:14:31.200 to get those results, 0:14:31.200,0:14:33.200 and we may not be able to get there, 0:14:33.400,0:14:35.550 and we may need to do it in stages, 0:14:35.550,0:14:37.700 where first someone says, 0:14:39.600,0:14:40.900 "Hey, I have[br]this interesting algorithm 0:14:40.900,0:14:42.200 for doing something, 0:14:42.200,0:14:44.800 and it works well by some criterion 0:14:45.600,0:14:49.900 on this particular data set, 0:14:51.000,0:14:53.400 and I'm going to put it out there," 0:14:53.700,0:14:55.850 and maybe someone will figure out a way 0:14:55.850,0:14:58.000 that you can later actually[br]still do inference 0:14:58.000,0:14:59.100 under [some] conditions, 0:14:59.100,0:15:02.100 and maybe those are not[br]particularly realistic conditions; 0:15:02.100,0:15:03.800 then we kind of go further. 0:15:03.800,0:15:05.500 But I think we've been[br]constraining things too much, 0:15:06.700,0:15:09.050 where we said, 0:15:09.050,0:15:11.400 "This is the type of thing[br]that we need to do." 0:15:12.100,0:15:14.400 And in some sense, 0:15:15.700,0:15:18.200 that goes back[br]to the way Josh and I 0:15:19.700,0:15:21.900 thought about things for the[br]local average treatment effect. 0:15:21.900,0:15:23.250 That wasn't quite the way 0:15:23.250,0:15:24.600 people were thinking[br]about these problems before. 0:15:24.600,0:15:29.200 There was a sense[br]that some of the people said 0:15:29.500,0:15:31.900 the way you need to do[br]these things is you first say 0:15:32.200,0:15:34.250 what you're interested[br]in estimating, 0:15:34.250,0:15:36.300 and then you do the best job[br]you can in estimating that, 
0:15:38.100,0:15:44.200 and what you guys are doing[br]is you're doing it backwards. 0:15:44.300,0:15:46.700 You kind of say,[br]'Here, I have an estimator, 0:15:47.300,0:15:49.600 and now I'm going to figure out[br]what it's estimating,' 0:15:51.400,0:15:53.900 and I suppose you're going to say[br]why you think that's interesting 0:15:53.900,0:15:56.600 or maybe why it's not interesting,[br]and that's not okay. 0:15:56.600,0:15:58.600 You're not allowed[br]to do it that way. 0:15:59.000,0:16:04.100 And I think we should[br]just be a little bit more flexible 0:16:04.300,0:16:06.300 in thinking about[br]how to look at problems, 0:16:06.400,0:16:08.850 because I think[br]we've missed some things 0:16:08.850,0:16:11.300 by not doing that. 0:16:13.000,0:16:14.800 - [Josh] So you've heard[br]our views, Isaiah. 0:16:14.800,0:16:16.600 You've seen that we have[br]some points of disagreement. 0:16:17.000,0:16:20.400 Why don't you referee[br]this dispute for us? 0:16:20.950,0:16:21.950 [laughter] 0:16:22.500,0:16:25.300 - Oh, it's so nice of you[br]to ask me a small question. 0:16:25.300,0:16:28.100 So I guess, for one, 0:16:28.200,0:16:33.200 I very much agree with something[br]that Guido said earlier of... 0:16:34.100,0:16:35.100 [laughter] 0:16:36.500,0:16:37.900 - So one thing where it seems 0:16:37.900,0:16:39.650 the case for machine learning[br]is relatively clear 0:16:39.650,0:16:41.400 is in settings where[br]we're interested in some version 0:16:41.500,0:16:45.100 of a nonparametric[br]prediction problem. 0:16:45.100,0:16:47.400 So I'm interested in estimating 0:16:47.400,0:16:49.700 a conditional expectation[br]or conditional probability, 0:16:50.000,0:16:52.100 and in the past, maybe[br]I would have run a kernel... 0:16:52.100,0:16:53.950 I would have run[br]a kernel regression 0:16:53.950,0:16:55.800 or I would have run[br]a series regression, 0:16:56.100,0:16:57.400 or something along those lines. 
0:16:58.700,0:17:00.350 It seems like, at this point,[br]we have a fairly good sense 0:17:00.350,0:17:02.000 that in a fairly wide range[br]of applications, 0:17:02.000,0:17:06.300 machine learning methods[br]seem to do better 0:17:06.800,0:17:08.800 for estimating conditional[br]mean functions 0:17:08.800,0:17:10.400 or conditional probabilities 0:17:10.400,0:17:12.000 or various other[br]nonparametric objects 0:17:12.400,0:17:14.500 than more traditional[br]nonparametric methods 0:17:14.500,0:17:16.600 that were studied[br]in econometrics and statistics, 0:17:16.600,0:17:19.100 especially[br]in high dimensional settings. 0:17:19.500,0:17:21.300 - So you're thinking of maybe[br]the propensity score 0:17:21.300,0:17:23.100 or something like that? 0:17:23.100,0:17:24.200 - Yeah, exactly. 0:17:24.200,0:17:25.300 - Nuisance functions. 0:17:25.300,0:17:27.100 Yeah, so things[br]like propensity scores, 0:17:27.530,0:17:29.965 even objects of more direct 0:17:29.965,0:17:32.400 interest, like conditional[br]average treatment effects, 0:17:32.400,0:17:35.100 which is the difference of two[br]conditional expectation functions, 0:17:35.100,0:17:36.300 potentially things like that. 0:17:36.500,0:17:40.400 Of course, even there, the theory... 0:17:40.500,0:17:43.700 the inferential theory for how to interpret, 0:17:43.700,0:17:45.900 how to make large-sample statements[br]about some of these things, 0:17:46.000,0:17:48.050 is less well-developed, depending on 0:17:48.050,0:17:50.100 the machine learning[br]estimator used. 0:17:50.100,0:17:53.800 And so I think something[br]that is tricky 0:17:53.900,0:17:55.700 is that we can have these methods 0:17:55.700,0:17:58.000 which seem to work[br]a lot better for some purposes, 0:17:58.000,0:18:01.600 but which we need to be a bit[br]careful in how we plug them in 0:18:01.600,0:18:03.300 or how we interpret[br]the resulting statements. 
0:18:03.600,0:18:06.200 But of course, that's a very,[br]very active area right now, 0:18:06.400,0:18:08.400 where people are doing[br]tons of great work. 0:18:08.400,0:18:10.400 And so I fully expect[br]and hope to see 0:18:10.400,0:18:12.800 much more going forward there. 0:18:13.000,0:18:17.300 So one issue with machine learning[br]that always seems a danger, 0:18:17.400,0:18:20.300 or that is sometimes a danger 0:18:20.500,0:18:21.550 and has sometimes[br]led to applications 0:18:21.550,0:18:22.600 that have made less sense, 0:18:22.800,0:18:25.100 is when folks start with a method[br]that they're very excited about 0:18:25.300,0:18:28.500 rather than a question. 0:18:28.900,0:18:32.100 So sort of starting with a question, 0:18:32.500,0:18:34.350 where here's the object I'm interested in, 0:18:34.350,0:18:36.200 here is the parameter of interest, 0:18:37.300,0:18:39.500 let me think about how I would[br]identify that thing, 0:18:39.500,0:18:41.800 how I would recover that thing[br]if I had a ton of data. 0:18:41.900,0:18:44.000 "Oh, here's a conditional[br]expectation function. 0:18:44.000,0:18:47.100 Let me plug in the machine[br]learning estimator for that." 0:18:47.200,0:18:48.800 That seems very, very sensible. 0:18:49.000,0:18:53.100 Whereas, you know,[br]if I regress quantity on price 0:18:53.700,0:18:56.000 and say that I used[br]a machine learning method, 0:18:56.300,0:18:58.900 maybe I'm satisfied that[br]that solves the [endogeneity] problem 0:18:58.900,0:19:01.200 we're usually worried[br]about there... maybe I'm not. 0:19:01.500,0:19:03.200 But again, that's something 0:19:03.400,0:19:06.300 where the way to address it[br]seems relatively clear: 0:19:06.500,0:19:09.000 it's to find your object of interest 0:19:09.200,0:19:10.400 and think about-- 0:19:10.400,0:19:11.600 - Just bring in the economics. 0:19:11.700,0:19:12.200 - Exactly. 
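Isaiah's recipe, start from the parameter of interest and then plug a flexible estimator into the nuisance pieces, can be sketched as follows. The data-generating process is hypothetical, and coarse binned means stand in for whatever ML regressor one would actually use: regression adjustment with flexibly estimated conditional means recovers the average treatment effect, while the naive comparison of means is confounded.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000

# Hypothetical data: treatment probability and outcomes both depend on x,
# and the treatment effect tau(x) = 1 + x is heterogeneous (true ATE = 1.5).
x = rng.uniform(size=n)
t = (rng.uniform(size=n) < 0.3 + 0.4 * x).astype(float)
y = 2.0 * x**2 + (1.0 + x) * t + 0.5 * rng.normal(size=n)

naive = y[t == 1].mean() - y[t == 0].mean()  # confounded: overstates the ATE

# Regression adjustment: estimate E[y | x, t] flexibly within each arm
# (binned means as a stand-in for an ML regressor), then average the
# fitted difference over the covariate distribution.
bins = np.minimum((x * 20).astype(int), 19)
effects, weights = [], []
for b in range(20):
    in_b = bins == b
    mu1 = y[in_b & (t == 1)].mean()  # fitted E[y | x in bin b, t = 1]
    mu0 = y[in_b & (t == 0)].mean()  # fitted E[y | x in bin b, t = 0]
    effects.append(mu1 - mu0)
    weights.append(in_b.mean())
ate_hat = np.average(effects, weights=weights)  # close to the true 1.5
```

The question of interest is the classic one; only the estimator of the conditional expectations changes, which is exactly the "plug it into the nuisance function" pattern discussed above.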
0:19:12.200,0:19:15.400 - And can I still think about heterogeneity 0:19:15.400,0:19:18.300 but harness the power[br]of the machine learning methods 0:19:18.500,0:19:20.650 for some of the components? 0:19:20.650,0:19:22.800 - Precisely. Exactly. 0:19:22.900,0:19:24.250 So the question of interest 0:19:24.250,0:19:25.600 is the same as the question[br]of interest has always been, 0:19:25.600,0:19:29.500 but we now have better methods[br]for estimating some pieces of it. 0:19:29.900,0:19:31.600 The place that seems[br]harder to forecast is... 0:19:33.400,0:19:34.850 obviously, there's[br]a huge amount going on 0:19:34.850,0:19:36.300 in the machine learning literature, 0:19:37.500,0:19:38.600 and the ways[br]of plugging it in 0:19:38.600,0:19:39.700 that I've referenced so far 0:19:39.700,0:19:42.900 are a limited piece of that. 0:19:43.000,0:19:44.550 And so I think there are all sorts[br]of other interesting questions 0:19:44.550,0:19:46.100 about where... 0:19:47.100,0:19:49.300 where does this interaction go?[br]What else can we learn? 0:19:49.300,0:19:52.000 And that's something where[br]I think there's a ton going on 0:19:52.200,0:19:54.300 which seems very promising, 0:19:54.300,0:19:56.400 and I have no idea[br]what the answer is. 0:19:57.000,0:19:59.100 - No, I totally agree with that, 0:19:59.100,0:20:01.200 but that makes it very exciting. 0:20:03.800,0:20:06.100 And I think there's just[br]a lot of work to be done there. 0:20:06.600,0:20:09.000 Alright. So I'd say he agrees[br]with me there. 0:20:09.000,0:20:11.400 [laughter] 0:20:12.450,0:20:13.450 - I didn't say that per se. 0:20:14.500,0:20:16.100 - [Narrator] If you'd like to watch[br]more Nobel Conversations, 0:20:16.100,0:20:17.700 click here. 0:20:18.000,0:20:20.400 Or if you'd like to learn[br]more about econometrics, 0:20:20.500,0:20:23.100 check out Josh's[br]Mastering Econometrics series. 
0:20:23.600,0:20:26.500 If you'd like to learn more[br]about Guido, Josh, and Isaiah, 0:20:26.700,0:20:28.200 check out the links[br]in the description. 0:20:28.550,0:20:30.535 ♪ [music] ♪