-
♪ [music] ♪
-
- [narrator] Welcome
to Nobel Conversations.
-
In this episode, Josh Angrist
and Guido Imbens
-
sit down with Isaiah Andrews
to discuss and disagree
-
over the role of machine learning
in applied econometrics.
-
- [Isaiah] So, of course,
there are a lot of topics
-
where you guys largely agree,
-
but I'd like to turn to one
-
where maybe you have
some differences of opinion.
-
So I'd love to hear
some of your thoughts
-
about machine learning
-
and the role that it's playing
and is going to play in economics.
-
- [Guido] I've looked at some data
that's proprietary,
-
so there's
no published paper there.
-
There was an experiment
that was done
-
on some search algorithm.
-
And the question was...
-
it was about ranking things
and changing the ranking.
-
It was sort of clear...
-
that there was going to be
a lot of heterogeneity there.
-
Mmm,
-
You know, if you look for say,
-
a picture of Britney Spears
-
it doesn't really matter
where you rank it
-
because you're going to figure out
what you're looking for,
-
whether you put it
in the first or second
-
or third position of the ranking.
-
But if you're looking
for the best econometrics book,
-
if you put your book
first or your book tenth,
-
that's going to make
a big difference
-
in how often people
are going to click on it.
-
And so there you go --
-
- [Josh] Why do I need
machine learning to discover that?
-
It seems like
I could discover that simply.
-
- [Guido] So in general--
-
- [Josh] There were lots
of possible...
-
- You want to think about
there being lots of characteristics
-
of the items
-
that you want to understand
what drives the heterogeneity
-
in the effect of--
-
- But you're just predicting.
-
In some sense, you're solving
a marketing problem.
-
- [inaudible] It's a causal effect.
-
- It's causal, but it has
no scientific content.
-
Think about...
-
- No, but there are similar things
in medical settings.
-
If you do an experiment,
you may actually be very interested
-
in whether the treatment
works for some groups or not.
-
And you have a lot of individual
characteristics,
-
and you want
to systematically search.
-
- Yeah. I'm skeptical about that --
-
that sort of idea that there's
this personal causal effect
-
that I should care about,
-
and that machine learning
can discover it
-
in some way that's useful.
-
So think about -- I've done
a lot of work on schools,
-
going to, say, a charter school,
-
a publicly funded private school,
-
effectively, you know,
that's free to structure
-
its own curriculum
for context there.
-
Some types of charter schools
-
generate spectacular
achievement gains,
-
and in the data set
that produces that result,
-
I have a lot of covariates.
-
So I have baseline scores,
and I have family background,
-
the education of the parents,
-
the sex of the child,
the race of the child.
-
And, well, as soon as I put
half a dozen of those together,
-
I have a very high dimensional space.
-
I'm definitely interested
in sort of coarse features
-
of that treatment effect,
-
like whether it's better for people
-
who come from
lower income families.
-
I have a hard time believing
that there's an application
-
for the very high dimensional
version of that,
-
where I discovered
that for non-white children
-
who have high family incomes
-
but baseline scores
in the third quartile
-
and only went to public school
in the third grade
-
but not the sixth grade.
-
So that's what that high
dimensional analysis produces.
-
This very elaborate
conditional statement.
-
There are two things that are wrong
with that, in my view.
-
First, I don't see it as...
-
I just can't imagine
why it's actionable.
-
I don't know why
you'd want to act on it.
-
And I know also
that there's some alternative model
-
that fits almost as well,
-
that flips everything,
-
Because machine learning
doesn't tell me
-
that this is really
the predictor that matters.
-
It just tells me that
this is a good predictor.
-
And so, I think
there is something different
-
about the social science context.
-
- [Guido] I think
the social science applications
-
you're talking about,
-
once were...
-
I think there's not a huge amount
of heterogeneity in the effects.
-
- [Josh] There might be
-
if you allow me
to fill that space.
-
- No... not even then.
-
I think for a lot
of those interventions,
-
you would expect that the effect
is the same sign for everybody.
-
There may be small differences
in the magnitude, but it's not...
-
For a lot of these education
interventions -- they're good for everybody.
-
It's not that they're bad
for some people
-
and good for other people,
-
and that there are
some very small pockets
-
where they're bad.
-
But there may be some variation
in the magnitude,
-
but you would need very,
very big data sets to find those.
-
I agree that in those cases,
-
they probably wouldn't be
very actionable anyway.
-
But I think there's a lot
of other settings
-
where there is
much more heterogeneity.
-
- Well, I'm open
to that possibility,
-
and I think the example you gave
is essentially a marketing example.
-
- No, those have implications
for the organization,
-
whether you need
to worry about the...
-
- Well, I need to see that paper.
-
- So the sense I'm getting...
-
- We still disagree on something.
- Yes.
-
[laughter]
-
- We haven't converged
on everything.
-
- I'm getting that sense.
-
[laughter]
-
- Actually, we've diverged on this
-
because this wasn't around
to argue about.
-
[laughter]
-
- Is it getting a little warm here?
-
- Warmed up. Warmed up is good.
-
The sense I'm getting is, Josh,
you're not saying
-
that you're confident
that there's no way
-
there's an application
where this stuff is useful.
-
You're saying
-
you're unconvinced by
the existing applications to date.
-
Fair enough.
-
- I'm very confident.
-
[laughter]
-
- In this case.
-
- I think Josh does have a point
-
that even in the prediction cases
-
where a lot of the machine learning
methods really shine
-
is where there's just a lot
of heterogeneity.
-
- You don't really care much
about the details there, right?
-
It doesn't have
a policy angle or something.
-
- Like recognizing
handwritten digits and stuff.
-
It does much better there
-
than building
some complicated model.
-
But a lot of the social science,
a lot of the economic applications,
-
we actually know a huge amount
about the relationship
-
between the variables.
-
A lot of the relationships
are strictly monotone.
-
Education is going to increase
people's earnings,
-
irrespective of the demographic,
-
irrespective of the level
of education you already have.
-
- Until they get to a Ph.D.
-
- Yeah, there is a graduate school...
-
[laughter]
-
but over a reasonable range,
-
it's not going
to go down very much.
-
In a lot of the settings
-
where these machine learning
methods shine,
-
there's a lot of
-
kind of multimodality
in these relationships,
-
and they're going to be
very powerful.
-
But I still stand by that.
-
These methods just have
a huge amount to offer
-
for economists,
-
and they're going to be
a big part of the future.
-
- [Isaiah] Feels like
there's something interesting
-
to be said about
machine learning here.
-
So, Guido, I was wondering,
could you give some more...
-
maybe some examples
of the sorts of applications
-
you're thinking about
at the moment?
-
- So, in areas where,
-
instead of looking
for average causal effects,
-
we're looking for
individualized estimates --
-
predictions of causal effects --
-
the machine learning algorithms
have been very effective.
-
Traditionally, we would have done
these things using kernel methods.
-
And theoretically they work great,
-
and there's some arguments
-
that, formally,
you can't do any better.
-
But in practice,
they don't work very well.
-
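As a reader's aside: the kernel methods Guido refers to can be sketched in a few lines. This is a hypothetical illustration on simulated data (not from any study discussed here) of Nadaraya-Watson kernel regression, the traditional nonparametric estimator of a conditional mean.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.3):
    """Nadaraya-Watson estimator: a kernel-weighted average of outcomes."""
    diffs = x_query[:, None] - x_train[None, :]
    weights = np.exp(-0.5 * (diffs / bandwidth) ** 2)  # Gaussian kernel
    return (weights * y_train[None, :]).sum(axis=1) / weights.sum(axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 500)
y = np.sin(x) + rng.normal(0, 0.2, 500)   # true conditional mean: sin(x)
x_grid = np.array([-1.5, 0.0, 1.5])
print(nadaraya_watson(x, y, x_grid))      # roughly sin(x_grid)
```

In one dimension with plenty of data this works fine; Guido's point is that in high-dimensional settings the machine learning alternatives tend to fit much better in practice.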
Causal forest-type things
-
that Stefan Wager and Susan Athey
have been working on
-
have been used very widely.
-
They've been very effective
in these settings
-
to actually get causal effects
that vary by [ ].
-
I think this is still just the beginning
of these methods.
-
But in many cases,
-
these algorithms are very effective
at searching over big spaces
-
and finding the functions that fit very well
-
in ways that we couldn't
really do beforehand.
-
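A reader's sketch of the individualized-effect idea: the example below is not the Wager-Athey causal forest itself, but a simpler "T-learner" stand-in that fits separate random forests to treated and control outcomes on simulated data and differences the predictions to estimate conditional average treatment effects.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 3))
T = rng.integers(0, 2, n)                # randomized treatment
tau = np.where(X[:, 0] > 0, 2.0, 0.5)    # heterogeneous true effect
Y = X[:, 1] + tau * T + rng.normal(0, 1, n)

# T-learner: separate outcome models for treated and control units;
# the difference in predictions estimates the conditional effect.
m1 = RandomForestRegressor(n_estimators=100, min_samples_leaf=20,
                           random_state=0).fit(X[T == 1], Y[T == 1])
m0 = RandomForestRegressor(n_estimators=100, min_samples_leaf=20,
                           random_state=0).fit(X[T == 0], Y[T == 0])

X_new = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
cate = m1.predict(X_new) - m0.predict(X_new)
print(cate)  # roughly [2.0, 0.5]
```

The causal forest refines this by building honest trees that split directly on effect heterogeneity, which also supports the inference results that a naive two-model approach lacks.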
- I don't know of an example
-
where machine learning
has generated insights
-
about a causal effect
that I'm interested in.
-
And I do know of examples
-
where it's potentially
very misleading.
-
So I've done some work
with Brigham Frandsen,
-
using, for example, random forests
to model covariate effects
-
in an instrumental
variables problem,
-
where you need
to condition on covariates.
-
And you don't particularly
have strong feelings
-
about the functional form for that,
-
so maybe you should curve...
-
be open to flexible curve fitting,
-
and that leads you down a path
-
where there's a lot
of nonlinearities in the model,
-
and that's very dangerous with IV
-
because any sort
of excluded non-linearity
-
potentially generates
a spurious causal effect
-
and Brigham and I
showed that very powerfully,
-
I think, in the case
of two instruments
-
that come from a paper of mine
with Bill Evans,
-
where if you replace
-
a traditional two-stage
least squares estimator
-
with some kind of random forest,
-
you get very precisely
estimated nonsense estimates.
-
I think that's a big caution.
-
In view of those findings
in an example I care about
-
where the instruments
are very simple
-
and I believe that they're valid,
-
I would be skeptical of that.
-
So non-linearity and IV
don't mix very comfortably.
-
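For readers, a minimal simulated baseline for Josh's point: linear two-stage least squares (here in its simple Wald/IV form for one binary instrument) recovers the causal effect when the instrument is valid, while OLS is confounded. The data are hypothetical, not the Angrist-Evans application; the danger Josh describes arises when a flexible nonlinear fit replaces this linear first stage and introduces excluded nonlinearities.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
z = rng.integers(0, 2, n).astype(float)      # binary instrument
u = rng.normal(size=n)                       # unobserved confounder
x = 0.8 * z + 0.5 * u + rng.normal(size=n)   # endogenous regressor
y = 1.5 * x + u + rng.normal(size=n)         # true causal effect: 1.5

# OLS is biased because x is correlated with the confounder u.
ols = np.cov(x, y)[0, 1] / np.var(x)

# Linear IV (Wald form of two-stage least squares with one instrument):
# Cov(z, y) / Cov(z, x) isolates variation in x driven only by z.
tsls = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(ols, tsls)  # OLS drifts above 1.5; IV lands near 1.5
```

The linear projection of x on z uses only the instrument's variation; a random-forest first stage instead injects nonlinear functions of the covariates into the second stage, which is where the spurious precision comes from.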
- It sounds like that's already
a more complicated...
-
- Well, it's IV....
- Yeah.
-
- ...and we work on that.
-
[laughter]
-
- Fair enough.
-
- As editor of Econometrica,
-
a lot of these papers
cross my desk,
-
but the motivation is not clear
-
and, in fact, really lacking.
-
They're not... what we'd call
semi-parametric foundational papers.
-
So that's a big problem.
-
A related problem is that we have
this tradition in econometrics
-
of being very focused
on these formal asymptotic results.
-
We just have a lot of papers
where people propose a method
-
and then establish
the asymptotic properties
-
in a very kind of standardized way.
-
- Is that bad?
-
- Well, I think it's sort
of closed the door
-
for a lot of work
that doesn't fit into that,
-
whereas in the machine
learning literature,
-
a lot of things
are more algorithmic.
-
People had algorithms
for coming up with predictions
-
that turn out
to actually work much better
-
than, say, nonparametric
kernel regression.
-
For a long time, when we were doing
nonparametrics in econometrics,
-
we were using kernel regression,
-
and it was great for proving theorems.
-
You could get confidence intervals
-
and consistency,
and asymptotic normality,
-
and it was all great,
-
but it wasn't very useful.
-
And the things they did
in machine learning
-
are just way, way better.
-
But they didn't have the problem--
-
- That's not my beef
with machine learning theory.
-
[laughter]
-
No, but I'm saying there,
for the prediction part,
-
it does much better.
-
- Yeah, it's better
curve fitting.
-
- But it did so in a way
-
that would not have made
those papers
-
initially easy to get into
the econometrics journals,
-
because it wasn't proving
the types of things we expected.
-
When Breiman was doing
his regression trees,
-
that just didn't fit in.
-
I think he would have had
a very hard time
-
publishing these things
in econometric journals.
-
I think we've limited
ourselves too much,
-
and that has led us to close things off
-
for a lot of these
machine learning methods
-
that are actually very useful.
-
I mean, I think, in general,
-
that literature --
the computer scientists --
-
have proposed a huge number
of these algorithms
-
that are actually very useful
-
and that are affecting
-
the way we're going
to be doing empirical work.
-
But we've not fully internalized that
-
because we're still very focused
-
on getting point estimates
and getting standard errors
-
and getting P values
-
in a way that we need to move beyond
-
to fully harness the benefits
-
from the machine
learning literature.
-
- [Isaiah] Hmm. On the one hand, I guess I very
much take your point that sort of the
-
traditional econometrics framework
of sort of propose a method,
-
prove a limit theorem under some
asymptotic story,
-
publish a paper,
is constraining.
-
And that, in some sense, by thinking more
-
broadly about what a methods paper could
look like, we may...
-
Certainly the machine learning
literature has found a bunch of things,
-
which seem to work quite
well for a number of problems
-
and are now having substantial influence
in economics. I guess a question
-
I'm interested in is, how do you think about
-
the role of theory here?
-
Sort of, do you think there's
no value in the theory part of it?
-
Because I guess it's sort of a question
that I often have when seeing
-
the output from a machine learning tool --
-
though actually a number of the
methods that you talked about
-
do have inferential
results developed for them --
-
something that I always wonder about is sort
of uncertainty quantification, and just,
-
you know, I have my prior,
-
I come into the world with my view.
I see the result of this thing.
-
How should I update based on it? And,
in some sense, if I'm in a world where
-
things are
-
normally distributed, I know
how to do it. Here, I don't.
-
And so I'm interested to hear
how you think about that.
-
- [Guido] So, I don't see this as sort
of closing it off, saying, well,
-
these results
are not interesting.
-
But there are going to be a lot of cases
-
where it's going to be incredibly hard to
get those results, and we may not be able
-
to get there, and
-
we may need to do it in stages, where
first someone says, hey, I have this
-
interesting algorithm for doing
something, and it works well by some
-
criterion on this
particular data set,
-
and I'm going to put it
out there, and
-
maybe someone will figure out a way that
you can later actually still do inference
-
under some conditions.
-
And maybe those are not
particularly realistic conditions,
-
and then we kind of go further.
But I think we've been
-
constraining things too much, where we
said, you know, this is the type of thing
-
that we need to do. And I have some sense
-
that that goes back to kind of
the way Josh and I
-
thought about things for the
local average treatment effect.
-
That wasn't quite the way people
were thinking about these problems
-
before. There was a sense
that some of the people said, you know,
-
the way you need to do these
things is, you first say
-
what you're interested in estimating,
and then you do the best job you can
-
in estimating that.
-
And what you guys are doing is
doing it backwards.
-
You're going to say,
here, I have an estimator,
-
and now I'm going to figure out
-
what it's estimating, and then, ex post,
-
you're going to say why you
think that's interesting,
-
or maybe why it's not interesting.
And that's not okay;
-
you're not allowed to do it that way.
-
And I think we should just be a little
bit more flexible in thinking about
-
how to look at
-
problems, because I think we've missed
some things by not doing that.
-
- [Josh] So, you've heard our views,
Isaiah. You've seen that we have
-
some points of disagreement. Why
don't you referee this dispute for us?
-
- [Isaiah] Oh, so nice of you to ask me
a small question. So I guess, for one,
-
I very much agree with something
that Guido said earlier.
-
So,
-
where the case for machine learning seems
relatively clear is in settings where,
-
you know, we're interested in some version
of a nonparametric prediction problem.
-
So I'm interested in estimating a conditional
expectation or conditional probability,
-
and in the past, maybe
-
I would have run a kernel regression, or
I would have run a series regression, or
-
something along those lines.
-
Sort of,
-
it seems like,
-
at this point, we have a fairly good
sense that, in a fairly wide range
-
of applications, machine learning
methods seem to do better for,
-
you know,
-
estimating conditional mean functions,
-
or conditional probabilities, or
various other nonparametric objects,
-
than more traditional nonparametric
methods that were studied in econometrics
-
and statistics, especially
in high-dimensional settings.
-
- [Guido] So you're thinking of maybe the
propensity score or something like that?
-
- [Isaiah] Exactly -- so, nuisance functions, yeah.
-
So things like propensity scores,
or, I mean, even objects
-
of more direct inferential
-
interest, like conditional
average treatment effects, right?
-
Which are the difference of two
conditional expectation functions --
-
potentially things like that.
-
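As a reader's illustration of the nuisance-function idea: estimate the propensity score with an off-the-shelf classifier and plug it into inverse propensity weighting. Everything here is simulated and hypothetical; logistic regression stands in for whatever machine learning method one might plug in.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 20_000
X = rng.normal(size=(n, 2))
# Treatment is more likely for some covariate values: confounding.
p_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
T = rng.binomial(1, p_true)
Y = 2.0 * T + X[:, 0] + rng.normal(size=n)   # true average effect: 2.0

# Naive difference in means is biased through X[:, 0].
naive = Y[T == 1].mean() - Y[T == 0].mean()

# Plug an estimated propensity score (the nuisance function)
# into the inverse-propensity-weighting estimator of the ATE.
e_hat = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]
ipw = np.mean(T * Y / e_hat) - np.mean((1 - T) * Y / (1 - e_hat))

print(naive, ipw)  # naive is biased upward; IPW lands near 2.0
```

Isaiah's caveat applies exactly here: swapping in a more flexible, regularized estimator for `e_hat` can change how the resulting confidence statements should be interpreted.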
Of course, even there,
right? The theory
-
for inference, or the theory for,
sort of, how to interpret,
-
how to make large-sample statements
about some of these things, is
-
less well developed, depending on the
machine learning estimator used.
-
And so, I think something
that is tricky is that we
-
can have these methods,
-
which seem to work a lot
better for some purposes,
-
but which we need to be a bit
careful in how we plug them in, or how
-
we interpret the resulting statements.
-
But, of course, that's a very,
very active area right now, where
-
people are doing tons of great work.
And so I fully expect, and hope,
-
to see much more going forward there.
-
So, one issue with machine learning
that always seems a danger, or
-
that is sometimes a danger
and has sometimes led to
-
applications that have
made less sense, is
-
when folks start with a method
-
that they're very
excited about, rather than a question.
-
Right? So, sort of, starting with
a question, where, here's the
-
object I'm interested in, here's
the parameter of interest, let me,
-
you know,
-
think about how I would
identify that thing,
-
how I would recover that
thing if I had a ton of data --
-
oh, here's a conditional
expectation function,
-
let me plug in a machine
learning estimator for that --
-
that seems very, very sensible.
-
Whereas, you know, if I
regress quantity on price
-
and say that I used a
machine learning method,
-
maybe I'm satisfied that that
solves the endogeneity problem
-
we're usually worried
about there -- maybe I'm not --
-
but, again, that's something where
-
the way to address it seems
relatively clear, right?
-
It's, find your object of interest, and
-
think about...
-
- Is that just
bringing in the economics?
-
Exactly.
-
- And can I think about identification,
but harness
-
the power of the machine
learning methods
-
for some of the components?
-
- Precisely.
-
- Exactly. So, sort of, you know,
-
the question of interest is the same as
the question of interest has always been,
-
but we now have better methods for
estimating some pieces of this, right?
-
The place that seems harder to, uh,
-
harder to forecast is...
-
Obviously, there's a huge amount
going on in the machine
-
learning literature,
-
and the sort of limited ways
-
of plugging it in that I've referenced
so far are a limited piece of that.
-
And so I think there are all sorts of
other interesting questions about,
-
sort of,
-
where does this interaction
go? What else can we learn?
-
And that's something where,
you know, I think there's
-
a ton going on which seems very promising
and I have no idea what the answer is.
-
- [Guido] No, no -- I totally
agree with that.
-
That's what makes it very exciting.
-
And I think there's just a
lot of work to be done there.
-
- [Josh] All right. So Isaiah
agrees with me there.
-
- [narrator] If you'd like to watch more
Nobel Conversations, click here.
-
Or if you'd like to learn
more about econometrics,
-
check out Josh's Mastering
Econometrics series.
-
If you'd like to learn more
about Guido, Josh, and Isaiah,
-
check out the links in the description.