♪ [music] ♪
- [narrator] Welcome
to Nobel Conversations.
In this episode, Josh Angrist
and Guido Imbens
sit down with Isaiah Andrews
to discuss and disagree
over the role of machine learning
in applied econometrics.
- [Isaiah] So, of course,
there are a lot of topics
where you guys largely agree,
but I'd like to turn to one
where maybe you have
some differences of opinion.
So I'd love to hear
some of your thoughts
about machine learning
and the role that it's playing
and is going to play in economics.
- [Guido] I've looked at some data, proprietary data, so there's no published paper there. There was an experiment that was done on some search algorithm, and the question was about ranking things and changing the ranking. It was clear there was going to be a lot of heterogeneity there.
You know, if you look for, say, a picture of Britney Spears, it doesn't really matter where you rank it, because you're going to figure out what you're looking for whether you put it in the first or second or third position of the ranking.
But if you're looking for the best econometrics book, whether you put your book first or your book tenth is going to make a big difference in how often people are going to click on it. And so there you go --
- [Josh] Why do I need machine learning to discover that? It seems like I can discover it simply.
- [Guido] So in general, you want to think about there being lots of characteristics of the items, and you want to understand what drives the heterogeneity in the effect of the ranking.
- [Josh] In some sense, you're solving a marketing problem there. It's causal, but it has no scientific content.
- [Guido] Think about similar things in medical settings. If you do an experiment, you may actually be very interested in whether the treatment works for some groups or not. You have a lot of individual characteristics, and you want to search systematically.
- [Josh] Yeah, I'm skeptical about that, the idea that there's this personalized causal effect that I should care about and that machine learning can discover it in some way that's useful. So, think about it: I've done a lot of work on schools, on going to, say, a charter school, a publicly funded private school, effectively, that's free to structure its own curriculum, for context there. Some types of charter schools generate spectacular achievement gains, and in the data set that produces that result I have a lot of covariates: baseline scores, family background, the education of the parents, the sex of the child, the race of the child. As soon as I put half a dozen of those together, I have a very high dimensional space.
I'm definitely interested in coarse features of that treatment effect, like whether it's better for people who come from lower income families. But I have a hard time believing there's an application for the very high dimensional version of that, where I discover that for non-white children who have high family incomes but baseline scores in the third quartile, and who went to public school only in the third grade but not the sixth grade... That's what the high dimensional analysis produces: a very elaborate conditional statement.
There are two things wrong with that, in my view. First, I just can't imagine why it's actionable; I don't know why you'd want to act on it. And second, I know there's some alternative model that fits almost as well and that flips everything, because machine learning doesn't tell me that this is really the predictor that matters. It just tells me that this is a good predictor. So I think there is something different about the social science context.
I think the social science applications you're talking about are ones where there's not a huge amount of heterogeneity in the effects.
- [Guido] There might be a few that fill that space.
- [Josh] No, not even then. For a lot of those interventions you would expect the effect to be the same sign for everybody. There may be small differences in magnitude, but a lot of these education interventions are good for everybody. It's not that they're bad for some people and good for others, with maybe some very small pockets where they're bad. There may be some variation in the magnitude, but you would need very, very big data sets to find it, and even then it probably wouldn't be very actionable anyway.
- [Guido] But I think there are a lot of other settings where there is much more heterogeneity.
- [Josh] Well, I'm open to that possibility, though the example you gave is essentially a marketing example. Maybe it has implications for how the market is organized, for whether you need to worry about market power. I'd want to see that paper.
- [Isaiah] So the sense I'm getting is that we still disagree on something.
- [Josh] Yes. We haven't converged on everything. Actually, we've diverged on this, because this wasn't around to argue about before.
- [Isaiah] Is it getting a little warm in here?
- [Josh] Yeah, warmed up. Warmed up is good.
- [Isaiah] The sense I'm getting, Josh, is that you're not saying you're confident there is no application where this stuff is useful; you're saying you're unconvinced by the existing applications to date.
- [Josh] That's fair, and in this case I'm very confident.
- [Guido] I think Josh does have a point that, even in the prediction cases, where a lot of the machine learning methods really shine is where there's just a lot of heterogeneity and you don't really care much about the details.
- [Josh] Right, it doesn't have a policy angle or something.
- [Guido] Yes, things like recognizing handwritten digits and such. There machine learning does much better than building some complicated model. But in a lot of the social science applications, a lot of the economic applications, we actually know a huge amount about the relationships between the variables. A lot of the relationships are strictly monotone. Education is going to increase people's earnings, irrespective of the demographic, irrespective of the level of education you already have.
- [Josh] Until you get to a PhD.
- [Guido] Yeah, there is graduate school. But over a reasonable range it's not going to go down very much, whereas in a lot of the settings where these machine learning methods shine there's a lot of non-monotonicity, a kind of multi-modality, in the relationships, and there they're going to be very powerful. I still stand by that: these methods have a huge amount to offer economists, and they're going to be a big part of the future.
- [Isaiah] It feels like there's something interesting to be said about machine learning here. Could you give some more examples of the sorts of applications you're thinking about at the moment?
- [Guido] One area is where, instead of looking for average causal effects, we're looking for individualized estimates and predictions of causal effects, and there the machine learning algorithms have been very effective. Previously we would have done these things using kernel methods, and theoretically they work great; there are even formal arguments that you can't do any better. But in practice they don't work very well. Random causal forests, the kind of thing Stefan Wager and Susan Athey have been working on, are now used very widely, and they've been very effective in these settings at actually getting causal effects that vary by covariates. I think this is still just the beginning of these methods, but in many cases these algorithms are very effective at searching over big spaces and finding the functions that fit well, in ways that we couldn't really do beforehand.
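To make the idea concrete, here is a minimal sketch, assuming scikit-learn and simulated data, of recovering effect heterogeneity by fitting separate outcome models for treated and control units and differencing the fits (a simple T-learner). It is only an illustration of the idea Guido describes, not the causal forest of Wager and Athey, which modifies the tree construction itself to support valid inference.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, p = 5000, 10
X = rng.normal(size=(n, p))                 # covariates
w = rng.integers(0, 2, size=n)              # randomized treatment
tau = np.where(X[:, 0] > 0, 2.0, 0.5)       # true effect varies with X[:, 0]
y = X[:, 1] + tau * w + rng.normal(size=n)  # outcome

# Fit E[y | X] separately by treatment arm, then difference the fits.
m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[w == 1], y[w == 1])
m0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[w == 0], y[w == 0])
cate_hat = m1.predict(X) - m0.predict(X)

# The forests recover the split in the effect without being told about it.
print(cate_hat[X[:, 0] > 0].mean(), cate_hat[X[:, 0] <= 0].mean())
```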
- [Josh] I don't know of an example where machine learning has generated insights about a causal effect that I'm interested in, and I do know of examples where it's potentially very misleading. I've done some work with Brigham Frandsen using, for example, random forests to model covariate effects in an instrumental variables problem, where you need to condition on covariates and you don't particularly have strong feelings about the functional form, so maybe you should be open to flexible curve fitting. That leads you down a path where there's a lot of nonlinearity in the model, and that's very dangerous with IV, because any sort of excluded nonlinearity potentially generates a spurious causal effect. Brigham and I showed that very powerfully, I think, in the case of two instruments that come from a paper of mine with Bill Evans: if you replace the traditional two-stage least squares estimator with some kind of random forest, you get very precisely estimated nonsense estimates. That's a big caution, and in view of those findings, in an example I care about, where the instruments are very simple and I believe they're valid, I would be skeptical of that. Nonlinearity and IV don't mix very comfortably. Now, in some sense that's already a more complicated setting. Well, it's IV.
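For reference, here is a minimal two-stage least squares sketch on simulated data; everything in it (the data-generating process, the single instrument) is illustrative and is not the Angrist-Frandsen or Angrist-Evans design. The hazard Josh describes arises when the linear first stage below is replaced by a flexible nonlinear fit, so that excluded nonlinearities can masquerade as identifying variation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)                # instrument, independent of the confounder
u = rng.normal(size=n)                # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)  # endogenous regressor
y = 1.0 * x + u + rng.normal(size=n)  # true causal effect = 1.0

# First stage: project x on z. Second stage: regress y on the fitted values.
x_hat = z * (z @ x) / (z @ z)
beta_2sls = (x_hat @ y) / (x_hat @ x)
beta_ols = (x @ y) / (x @ x)          # biased upward by the confounder
print(f"OLS: {beta_ols:.3f}  2SLS: {beta_2sls:.3f}")
```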
- [Guido] Yeah, but we've worked on that front too. A lot of these papers cross my desk, and the motivation is often not clear, in fact really lacking; they're not the old-style semi-parametric foundational papers. So that's a big problem, and a related problem is that we have this tradition in econometrics of being very focused on formal asymptotic results. We just have a lot of papers where people propose a method and then establish its asymptotic properties in a very standardized way.
- [Josh] Is that bad?
- [Guido] Well, I think it has sort of closed the door on a lot of work that doesn't fit that mold, whereas in the machine learning literature a lot of things are more algorithmic. People had algorithms for coming up with predictions that turned out to work much better than, say, nonparametric kernel regression. For a long time, when we did nonparametrics in econometrics, we did it using kernel regression, and it was great for proving theorems. You could get confidence intervals, consistency, asymptotic normality, and it was all great, but it wasn't very useful.
And the things they did in machine learning just worked way, way better, but they didn't have the proofs.
- [Josh] That's not my beef with machine learning theory, though.
- [Guido] No, but I'm saying that for the prediction part, it does much better.
- [Josh] Yeah, it's better curve fitting.
- [Guido] But it did so in a way that would not have made those papers easy to get into the econometrics journals at first, because they weren't proving the type of things we expected. When Breiman was doing his regression trees, that just didn't fit in, and I think he would have had a very hard time publishing those things in econometrics journals. So I think we limited ourselves too much, and that closed us off from a lot of these machine learning methods that are actually very useful.
I mean, in general, the computer scientists have proposed a huge number of these algorithms that are actually very useful and that are affecting the way we're going to be doing empirical work. But we've not fully internalized that, because we're still very focused on getting point estimates, standard errors, and p-values, in a way that we need to move beyond to fully harness the benefits of the machine learning literature.
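Guido's contrast between kernel methods and the newer algorithmic methods is easy to try for yourself. Below is a small sketch, with a made-up data-generating process and scikit-learn models standing in for the two traditions, comparing test error for a kernel method and a random forest on a non-monotone, multi-modal target of the kind he mentions; which method wins will depend on the setting.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 3000, 15
X = rng.uniform(-1, 1, size=(n, p))
# A non-monotone, multi-modal signal plus noise.
y = np.sin(4 * X[:, 0]) * (X[:, 1] > 0) + 0.5 * np.sign(X[:, 2]) + rng.normal(scale=0.3, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "kernel ridge": KernelRidge(kernel="rbf", alpha=0.1),
    "random forest": RandomForestRegressor(n_estimators=300, random_state=0),
}
for name, model in models.items():
    mse = mean_squared_error(y_te, model.fit(X_tr, y_tr).predict(X_te))
    print(f"{name}: test MSE {mse:.3f}")
```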
- [Isaiah] On the one hand, I very much take your point that the traditional econometrics framework, propose a method, prove a limit theorem under some asymptotic story, publish a paper, is constraining, and that by thinking more broadly about what a methods paper can look like, we may learn more. Certainly the machine learning literature has found a bunch of things that seem to work quite well for a number of problems and are now having substantial influence in economics. But a question I'm interested in is: what do you think is the role of theory? Do you think there's no value in the theory part of it? It's a question I often have when seeing the output from a machine learning tool, and in fact a number of the methods you mentioned do have inferential results developed for them. Something I always wonder about is uncertainty quantification. I have my prior, I come into the world with my view, I see the result of this thing: how should I update based on it? In a world where things are normally distributed, I know how to do that; here I don't. So I'm interested to hear how you think about it.
- [Guido] So, I don't see this as saying those results are not interesting. But there are going to be a lot of cases where those results are incredibly hard to get, where we may not be able to get there, and where we may need to proceed in stages. First someone says, "Hey, I have this interesting algorithm for doing something, it works well by some criterion on this particular data set, and I'm going to put it out there." Then maybe someone figures out a way to do inference on it under some conditions, and maybe those are not particularly realistic conditions, and then we go further. I think we've been constraining things too much by saying, "This is the type of thing we need to do." In some sense, that goes back to the way Josh and I thought about the local average treatment effect. That wasn't quite the way people were thinking about these problems before. There was a sense among some people that the way you need to do these things is to first say what you're interested in estimating, and then do the best job you can at estimating it; that what you guys are doing is backwards, because you start with an estimator, then figure out what it's estimating, and then ex post you argue why that's interesting, or maybe why it's not, and that's not okay, you're not allowed to do it that way. I think we should just be a little more flexible in thinking about how to look at problems, because we've missed some things by not doing that.
- [Josh] So, you've heard our views, Isaiah. You've seen that we have some points of disagreement. Why don't you referee this dispute for us?
- [Isaiah] Oh, it's so nice of you to ask me a small question. For one, I very much agree with something Guido said earlier: the case for machine learning seems relatively clear in settings where we're interested in some version of a nonparametric prediction problem. I'm interested in estimating a conditional expectation or a conditional probability, and in the past maybe I would have run a kernel regression or a series regression or something along those lines. At this point we have a fairly good sense that, in a fairly wide range of applications, machine learning methods seem to do better than the more traditional nonparametric methods studied in econometrics and statistics at estimating conditional mean functions, conditional probabilities, and various other nonparametric objects, especially in high dimensional settings.
- [Guido] So you're thinking of, maybe, the propensity score or something like that?
- [Isaiah] Exactly: nuisance functions, things like propensity scores, but also objects of more direct interest, like conditional average treatment effects, which are potentially the difference of two conditional expectation functions, things like that. Of course, even there, the theory for inference, for how to interpret these quantities and make large-sample statements about them, is less well developed, depending on the machine learning estimator used. So one thing that is tricky is that we have these methods that seem to work a lot better for some purposes, but we need to be a bit careful in how we plug them in and how we interpret the resulting statements. That said, this is a very active area right now where people are doing tons of great work, so I fully expect, and hope, to see much more going forward.
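As a concrete instance of the nuisance-function use Isaiah describes, here is a minimal sketch, assuming scikit-learn and simulated data, of plugging a machine-learned propensity score into an inverse-propensity-weighted estimate of an average treatment effect. The naive plug-in below is only illustrative; in practice one would prefer cross-fitting and a doubly robust estimator, and this version does not by itself deliver valid standard errors.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n, p = 5000, 10
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-X[:, 0]))               # true propensity score
w = rng.binomial(1, e)                       # treatment depends on X
y = X[:, 0] + 2.0 * w + rng.normal(size=n)   # true ATE = 2.0

# Estimate the propensity score with an off-the-shelf classifier.
e_hat = GradientBoostingClassifier(random_state=0).fit(X, w).predict_proba(X)[:, 1]
e_hat = np.clip(e_hat, 0.01, 0.99)           # trim extreme weights

# Inverse-propensity-weighted ATE.
ate_ipw = np.mean(w * y / e_hat - (1 - w) * y / (1 - e_hat))
print(f"IPW ATE estimate: {ate_ipw:.2f}")
```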
- [Isaiah] One issue with machine learning that is sometimes a danger, and has sometimes led to applications that made less sense, is when folks start with a method they're very excited about rather than with a question. Starting with the question, "Here's the object I'm interested in, here's the parameter of interest; let me think about how I would identify it, how I would recover it if I had a ton of data. Oh, there's a conditional expectation function in there; let me plug in a machine learning estimator for that," seems very, very sensible. Whereas if I regress quantity on price and say that I used a machine learning method, maybe I'm satisfied that that solves the endogeneity problem we're usually worried about there, and maybe I'm not. But again, that's something where the way to address it seems relatively clear: find your object of interest and think about it.
- [Josh] Is that just bringing in the economics?
- [Isaiah] Exactly, and then harnessing the power of the machine learning methods for some of the components.
- [Guido] Precisely.
- [Isaiah] The question of interest is the same as it has always been; we just have better methods now for estimating some of the pieces. The place that seems harder to forecast is this: obviously there's a huge amount going on in the machine learning literature, and the limited ways of plugging it in that I've referenced so far are a small piece of that. So there are all sorts of other interesting questions about where this interaction goes and what else we can learn, and that's something where I think there's a ton going on that seems very promising, and I have no idea what the answer is.
- [Guido] No, I totally agree with that, and that's what makes it very exciting. I think there's just a lot of work to be done there.
- [Josh] All right. So Isaiah agrees with me there.
- [narrator] If you'd like to watch more Nobel Conversations, click here. Or if you'd like to learn more about econometrics, check out Josh's Mastering Econometrics series. If you'd like to learn more about Guido, Josh, and Isaiah, check out the links in the description.