-
♪ [music] ♪
-
- [Narrator] Welcome
to Nobel Conversations.
-
In this episode, Josh Angrist
and Guido Imbens
-
sit down with Isaiah Andrews
to discuss and disagree
-
over the role of machine learning
in applied econometrics.
-
- [Isaiah] So, of course,
there are a lot of topics
-
where you guys largely agree,
-
but I'd like to turn to one
-
where maybe you have
some differences of opinion.
-
I'd love to hear
some of your thoughts
-
about machine learning
-
and the role that it's playing
and is going to play in economics.
-
- [Guido] I've looked at some data
that's proprietary,
-
so there's
no published paper there.
-
There was an experiment
that was done
-
on some search algorithm,
-
and the question was --
-
it was about ranking things
and changing the ranking.
-
And it was sort of clear
-
that there was going to be
a lot of heterogeneity there.
-
If you look for, say,
-
a picture of Britney Spears --
-
it doesn't really matter
where you rank it
-
because you're going to figure out
what you're looking for,
-
whether you put it
in the first or second
-
or third position of the ranking.
-
But if you're looking
for the best econometrics book,
-
if you put your book first
or your book tenth --
-
that's going to make
a big difference
-
in how often people
are going to click on it.
-
And so there you --
-
- [Josh] Why do I need
machine learning to discover that?
-
It seems like -- because
I can discover it simply.
-
- [Guido] So in general --
-
- [Josh] There were lots
of possible...
-
- You want to think about
there being lots of characteristics
-
of the items,
-
and you want to understand
what drives the heterogeneity
-
in the effect of --
-
- But you're just predicting.
-
In some sense, you're solving
a marketing problem.
-
- No, it's a causal effect --
-
- It's causal, but it has
no scientific content.
-
Think about...
-
- No, but there are similar things
in medical settings.
-
If you do an experiment,
you may actually be very interested
-
in whether the treatment works
for some groups or not.
-
And you have a lot
of individual characteristics,
-
and you want
to systematically search --
-
- Yeah. I'm skeptical about that --
-
that sort of idea that there's
this personal causal effect
-
that I should care about,
-
and that machine learning
can discover it
-
in some way that's useful.
-
So think about -- I've done
a lot of work on schools,
-
going to, say, a charter school,
-
a publicly funded private school,
-
effectively,
that's free to structure
-
its own curriculum
for context there.
-
Some types of charter schools
-
generate spectacular
achievement gains,
-
and in the data set
that produces that result,
-
I have a lot of covariates.
-
So I have baseline scores,
and I have family background,
-
the education of the parents,
-
the sex of the child,
the race of the child.
-
- And, well, as soon as I put
half a dozen of those together,
-
I have a very
high-dimensional space.
-
I'm definitely interested
in coarse features
-
of that treatment effect,
-
like whether it's better for people
-
who come from
lower-income families.
-
I have a hard time believing
that there's an application
-
for the very high-dimensional
version of that,
-
where I discovered
that for non-white children
-
who have high family incomes
-
but baseline scores
in the third quartile
-
and only went to public school
in the third grade
-
but not the sixth grade.
-
So that's what that
high-dimensional analysis produces.
-
It's a very elaborate
conditional statement.
-
There are two things that are wrong
with that in my view.
-
First, I don't see it as --
-
I just can't imagine
why it's actionable.
-
I don't know why
you'd want to act on it.
-
And I know also that
there's some alternative model
-
that fits almost as well,
-
that flips everything.
-
Because machine learning
doesn't tell me
-
that this is really
the predictor that matters --
-
it just tells me
that this is a good predictor.
-
And so, I think
there is something different
-
about the social science context.
-
- [Guido] I think
the social science applications
-
you're talking about,
-
ones where...
-
I think there's not a huge amount
of heterogeneity in the effects.
-
- [Josh] Well, there might be
if you allow me
-
to fill that space.
-
- No... not even then.
-
I think for a lot
of those interventions,
-
you would expect that the effect
has the same sign for everybody.
-
There may be small differences
in the magnitude, but it's not...
-
For a lot of these
educational interventions --
-
they're good for everybody.
-
It's not that they're bad
for some people
-
and good for other people,
-
or that there are
very small pockets
-
where they're bad.
-
There may be some variation
in the magnitude,
-
but you would need very,
very big data sets to find those.
-
I agree that in those cases,
-
they probably wouldn't be
very actionable anyway.
-
But I think there's a lot
of other settings
-
where there is
much more heterogeneity.
-
- Well, I'm open
to that possibility,
-
and I think the example you gave
is essentially a marketing example.
-
- No, those have
implications for
-
the organization,
-
whether you need
to worry about the...
-
- Well, I need to see that paper.
-
- So the sense
I'm getting is that --
-
- We still disagree on something.
- Yes. [laughter]
-
- We haven't converged
on everything.
-
- I'm getting that sense.
[laughter]
-
- Actually, we've diverged on this
-
because this wasn't around
to argue about.
-
[laughter]
-
- Is it getting a little warm here?
-
- Warmed up. Warmed up is good.
-
The sense I'm getting is,
Josh, you're not saying
-
that you're confident
that there is no way
-
that there is an application
where this stuff is useful.
-
You are saying
you are unconvinced
-
by the existing
applications to date.
-
- Fair enough.
- I'm very confident.
-
[laughter]
-
- In this case.
-
- I think Josh does have a point
-
that even in the prediction cases
-
where a lot of the machine learning
methods really shine
-
is where there's just a lot
of heterogeneity.
-
- You don't really care much
about the details there, right?
-
- [Guido] Yes.
-
- It doesn't have
a policy angle or something.
-
- The kind of recognizing
handwritten digits and stuff --
-
it does much better there
-
than building
some complicated model.
-
But a lot of the social science,
a lot of the economic applications,
-
we actually know a huge amount
about the relationship
-
between the variables.
-
A lot of the relationships
are strictly monotone.
-
Education is going to increase
people's earnings,
-
irrespective of the demographic,
-
irrespective of the level
of education you already have.
-
- Until they get to a Ph.D.
-
- They don't have proof
of graduate school...
-
[laughter]
-
- Over a reasonable range.
-
It's not going
to go down very much.
-
In a lot of the settings
-
where these machine learning
methods shine,
-
there's a lot of non-monotonicity,
-
kind of multimodality
in these relationships,
-
and they're going to be
very powerful.
-
But I still stand by that.
-
These methods just have
a huge amount to offer
-
for economists,
-
and they're going to be
a big part of the future.
-
♪ [music] ♪
-
- [Isaiah] It feels like
there's something interesting
-
to be said about
machine learning here.
-
So, Guido, I was wondering,
could you give some more...
-
maybe some examples
of the sorts of examples
-
you're thinking about
-
with applications [inaudible]
at the moment?
-
- So one area is where
-
instead of looking
for average causal effects,
-
we're looking for
individualized estimates,
-
predictions of causal effects,
-
and the machine learning algorithms
have been very effective.
-
Traditionally, we would have done
these things using kernel methods,
-
and theoretically, they work great,
-
and there's some arguments
-
that, formally,
you can't do any better.
-
But in practice,
they don't work very well.
-
The causal forest-type things
-
that Stefan Wager and Susan Athey
have been working on
-
are used very widely.
-
They've been very effective
in these settings
-
to actually get causal effects
that vary by covariate.
-
I think this is still just
the beginning of these methods.
-
But in many cases,
-
these algorithms are very effective
at searching over big spaces
-
and finding the functions
that fit very well
-
in ways that we couldn't
really do beforehand.
-
- I don't know of an example
-
where machine learning
has generated insights
-
about a causal effect
that I'm interested in.
-
And I do know of examples
-
where it's potentially
very misleading.
-
So I've done some work
with Brigham Frandsen,
-
using, for example, random forest
to model covariate effects
-
in an instrumental
variables problem
-
where you need
to condition on covariates.
-
And you don't particularly
have strong feelings
-
about the functional form for that,
-
so maybe you should...
-
be open to flexible curve fitting.
-
And that leads you down a path
-
where there's a lot
of nonlinearities in the model,
-
and that's very dangerous with IV
-
because any sort
of excluded non-linearity
-
potentially generates
a spurious causal effect,
-
and Brigham and I showed that
very powerfully, I think,
-
in the case of two instruments
-
that come from a paper of mine
with Bill Evans,
-
where if you replace
-
a traditional two-stage
least squares estimator
-
with some kind of random forest,
-
you get very precisely estimated
nonsense estimates.
-
I think that's a big caution.
-
In view of those findings,
in an example I care about
-
where the instruments
are very simple
-
and I believe that they're valid,
-
I would be skeptical of that.
-
Non-linearity and IV
don't mix very comfortably.
-
- No, it sounds like that's already
a more complicated...
-
- Well, it's IV...
- Yeah.
-
- ...but then we work on that.
-
[laughter]
-
- Fair enough.
-
♪ [music] ♪
-
- [Guido] As an editor
of Econometrica,
-
a lot of these papers
cross my desk,
-
but the motivation is not clear
-
and, in fact, really lacking.
-
They're not...
-
[vehicle]-type semiparametric
foundational papers.
-
So that's a big problem.
-
A related problem is that we have
this tradition in econometrics
-
of being very focused
on these formal asymptotic results.
-
We just have a lot of papers
where people propose a method,
-
and then they establish
the asymptotic properties
-
in a very kind of standardized way.
-
- Is that bad?
-
- Well, I think it's sort
of closed the door
-
for a lot of work
that doesn't fit into that
-
whereas in the machine
learning literature,
-
a lot of things
are more algorithmic.
-
People had algorithms
for coming up with predictions
-
that turn out
to actually work much better
-
than, say, nonparametric
kernel regression.
-
For a long time, we were doing all
the nonparametrics in econometrics,
-
and we were using
kernel regression,
-
and that was great
for proving theorems.
-
You could get confidence intervals
-
and consistency,
and asymptotic normality,
-
and it was all great,
-
but it wasn't very useful.
-
And the things they did
in machine learning
-
are just way, way better.
-
But they didn't have the problem --
-
- That's not my beef
with machine learning,
-
that the theory is weak.
-
[laughter]
-
- No, but I'm saying there,
for the prediction part,
-
it does much better.
-
- Yeah, it's better
curve fitting.
-
- But it did so in a way
-
that would not have made
those papers
-
initially easy to get into
the econometrics journals,
-
because it wasn't proving
the type of things...
-
When Breiman was doing
his regression trees --
-
they just didn't fit in.
-
I think he would have had
a very hard time
-
publishing these things
in econometrics journals.
-
I think we've limited
ourselves too much
-
and that led us to close things off
-
for a lot of these
machine-learning methods
-
that are actually very useful.
-
I mean, I think, in general,
-
that literature,
the computer scientists,
-
have brought a huge number
of these algorithms there --
-
have proposed a huge number
of these algorithms
-
that actually are very useful
-
and that are affecting
-
the way we're going
to be doing empirical work.
-
But we've not fully
internalized that
-
because we're still very focused
-
on getting point estimates
and getting standard errors
-
and getting P values
-
in a way that we need
to move beyond
-
to fully harness the force,
-
the benefits
-
from the machine learning literature.
-
- On the one hand, I guess I very
much take your point
-
that sort of the traditional
econometrics framework
-
of propose a method,
prove a limit theorem
-
under some asymptotic story,
story, story, story, story...
-
publish a paper, is constraining,
-
and that, in some sense,
by thinking more broadly
-
about what a methods paper
could look like,
-
we may, in some sense...
-
Certainly, the machine
learning literature
-
has found a bunch of things
which seem to work quite well
-
for a number of problems
-
and are now having
substantial influence in economics.
-
I guess a question
I'm interested in
-
is how do you think
about the role of...
-
Do you think there is no value
in the theory part of it?
-
Because I guess a question
that I often have
-
on seeing the output
from a machine learning tool,
-
and actually a number
of the methods
-
that you talked about
-
actually do have
inferential results
-
developed for them,
-
something that
I always wonder about,
-
a sort of uncertainty
quantification and just...
-
I have my prior,
-
I come into the world with my view,
I see the result of this thing.
-
How should I update based on it?
-
And in some sense,
if I'm in a world
-
where things
are normally distributed,
-
I know how to do it --
-
here I don't.
-
And so I'm interested to hear
what you think about that.
-
- I don't see this
as sort of saying, well,
-
these results are not interesting,
-
but there are going to be a lot of cases
-
where it's going to be incredibly
hard to get those results,
-
and we may not
be able to get there,
-
and we may need to do it in stages
-
where first someone says,
-
"Hey, I have
this interesting algorithm
-
for doing something,
-
and it works well
by some criterion there
-
on this particular data set,
-
and we should put it out there."
-
and maybe someone
will figure out a way
-
that you can later actually
still do inference
-
under some condition,
-
and maybe those are not
particularly realistic conditions,
-
then we kind of go further.
-
But I think we've been
constraining things too much
-
where we said,
-
"This is the type of things
that we need to do."
-
And in some sense,
-
that goes back
to the way Josh and I
-
thought about things for the local
average treatment effect.
-
That wasn't quite the way
-
people were thinking
about these problems before.
-
There was a sense
that some of the people said
-
the way you need to do
these things is you first say
-
what you're interested
in estimating,
-
and then you do the best job
you can in estimating that.
-
And what you guys are doing
is you're doing it backwards.
-
You kind of say,
"Here, I have an estimator,
-
and now I'm going to figure out
what it's estimating."
-
And I suppose you're going to say
why you think that's interesting
-
or maybe why it's not interesting,
and that's not okay.
-
You're not allowed
to do that in that way.
-
And I think we should
just be a little bit more flexible
-
in thinking about
how to look at problems
-
because I think
we've missed some things
-
by not doing that.
-
♪ [music] ♪
-
- [Josh] So you've heard
our views, Isaiah,
-
and you've seen that we have
some points of disagreement.
-
Why don't you referee
this dispute for us?
-
[laughter]
-
- Oh, it's so nice of you
to ask me a small question.
-
[laughter]
-
So I guess, for one,
-
I very much agree with something
that Guido said earlier of...
-
[laughter]
-
So one place
-
where the case for machine learning
seems relatively clear
-
is in settings where
we're interested in some version
-
of a nonparametric
prediction problem.
-
So I'm interested in estimating
-
a conditional expectation
or conditional probability,
-
and in the past, maybe
I would have run a kernel...
-
I would have run
a kernel regression
-
or I would have run
a series regression,
-
or something along those lines.
-
It seems like, at this point,
we have a fairly good sense
-
that in a fairly wide range
of applications,
-
machine learning methods
seem to do better
-
for estimating conditional
mean functions,
-
or conditional probabilities,
-
or various other
nonparametric objects
-
than more traditional
nonparametric methods
-
that were studied
in econometrics and statistics,
-
especially in
high-dimensional settings.
-
- So you're thinking of maybe
the propensity score
-
or something like that?
-
- Yeah, exactly.
- Nuisance functions.
-
- Yeah, so things
like propensity scores.
-
Even objects of more direct
-
interest, like conditional
average treatment effects,
-
which are the difference of two
conditional expectation functions,
-
potentially things like that.
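To make the point about "the difference of two conditional expectation functions" concrete, here is the standard potential-outcomes version of it. The notation is an assumption added for illustration (outcome Y_i, binary treatment D_i, covariates X_i, potential outcomes Y_i(1) and Y_i(0)); the conversation itself does not define it. The second equality holds only under assumptions such as random assignment of D_i given X_i (as in a randomized experiment) and overlap, 0 < e(x) < 1. Each conditional expectation on the right, and the propensity score e(x), is the kind of nonparametric object a machine learning regression might be plugged in for.

```latex
\begin{aligned}
\tau(x) &= \mathbb{E}[\,Y_i(1) - Y_i(0) \mid X_i = x\,] \\
        &= \mathbb{E}[\,Y_i \mid D_i = 1, X_i = x\,] - \mathbb{E}[\,Y_i \mid D_i = 0, X_i = x\,], \\
e(x)    &= \Pr(D_i = 1 \mid X_i = x).
\end{aligned}
```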
-
Of course, even there,
the theory...
-
for inference, the theory
for how to interpret,
-
how to make large sample statements
about some of these things
-
are less well-developed
depending on
-
the machine learning
estimator used.
-
And so I think
something that is tricky
-
is that we can have these methods,
which work a lot,
-
which seem to work
a lot better for some purposes
-
but which we need to be a bit
careful in how we plug them in
-
or how we interpret
the resulting statements.
-
But, of course, that's a very,
very active area right now
-
where people are doing
tons of great work.
-
And so I fully expect
and hope to see
-
much more going forward there.
-
So one issue with machine learning
that always seems a danger is...
-
or that is sometimes a danger
-
and has sometimes
led to applications
-
that have made less sense
-
is when folks start with a method
that they're very excited about
-
rather than a question.
-
So sort of starting with a question
-
where here's the object
I'm interested in,
-
here is the parameter
of interest --
-
let me think about how I would
identify that thing,
-
how I would recover that thing
if I had a ton of data.
-
Oh, here's a conditional
expectation function,
-
let me plug in a machine
learning estimator for that --
-
that seems very, very sensible.
-
Whereas, you know,
if I regress quantity on price
-
and say that I used
a machine learning method,
-
maybe I'm satisfied that
that solves the endogeneity problem
-
we're usually worried
about there... maybe I'm not.
-
But, again, that's something
-
where the way to address it
seems relatively clear.
-
It's to define
your object of interest
-
and think about --
-
- Just bring in the economics.
-
- Exactly.
-
- And think about
the heterogeneity,
-
but harness the power
of the machine learning methods
-
for some of the components.
-
- Precisely. Exactly.
-
So the question of interest
-
is the same as the question
of interest has always been,
-
but we now have better methods
for estimating some pieces of this.
-
The place that seems
harder to forecast
-
is obviously there's
a huge amount going on
-
in the machine learning literature,
-
and the limited ways
of plugging it in
-
that I've referenced so far
-
are a limited piece of that.
-
So I think there are all sorts
of other interesting questions
-
about where...
-
where does this interaction go?
What else can we learn?
-
And that's something where
I think there's a ton going on,
-
which seems very promising,
-
and I have no idea
what the answer is.
-
- No, I totally agree with that,
-
but that makes it very exciting.
-
And I think there's just
a lot of work to be done there.
-
Alright. So Isaiah
agrees with me there.
-
[laughter]
-
- I didn't say that per se.
-
♪ [music] ♪
-
- [Narrator] If you'd like to watch
more Nobel Conversations,
-
click here.
-
Or if you'd like to learn
more about econometrics,
-
check out Josh's
Mastering Econometrics series.
-
If you'd like to learn more
about Guido, Josh, and Isaiah,
-
check out the links
in the description.
-
♪ [music] ♪