♪ [music] ♪

- [Narrator] Welcome
to Nobel Conversations.

In this episode, Josh Angrist
and Guido Imbens

sit down with Isaiah Andrews
to discuss and disagree

over the role of machine learning
in applied econometrics.

- [Isaiah] So, of course,
there are a lot of topics

where you guys largely agree,

but I'd like to turn to one

where maybe you have
some differences of opinion.

I'd love to hear
some of your thoughts

about machine learning

and the role that it's playing
and is going to play in economics.

- [Guido] I've looked at some data
that's proprietary,

so there's
no published paper there.

There was an experiment
that was done

on some search algorithm,

and the question was...

it was about ranking things
and changing the ranking.

And it was sort of clear...

that there was going to be
a lot of heterogeneity there.

If you look for, say,

a picture of Britney Spears,

it doesn't really matter
where you rank it

because you're going to figure out
what you're looking for,

whether you put it
in the first or second

or third position of the ranking.

But if you're looking
for the best econometrics book,

if you put your book first
or your book tenth --

that's going to make
a big difference

how often people
are going to click on it.

And so there you --

- [Josh] Why do I need
machine learning to discover that?

It seems like
I can discover it simply.

- [Guido] So in [general]--

- [Josh] There were lots
of possible...

- You want to think about
there being lots of characteristics

of the items,

that you want to understand
what drives the heterogeneity

in the effect of--

- But you're just predicting.

In some sense, you're solving
a marketing problem.

- [inaudible] it's causal effect,

- It's causal, but it has
no scientific content.

Think about...

- No, but there are similar things
in medical settings.

If you do an experiment, 
you may actually be very interested

in whether the treatment
works for some groups or not.

And you have a lot
of individual characteristics,

and you want
to systematically search--

- Yeah. I'm skeptical about that --

that sort of idea that there's
this personal causal effect

that I should care about,

and that machine learning
can discover it

in some way that's useful.

So think about -- I've done
a lot of work on schools,

going to, say, a charter school,

a publicly funded private school,

effectively, you know,
that's free to structure

its own curriculum
for context there.

Some types of charter schools

generate spectacular
achievement gains,

and in the data set
that produces that result,

I have a lot of covariates.

So I have baseline scores,
and I have family background,

the education of the parents,

the sex of the child, 
the race of the child.

And, well, as soon as I put
half a dozen of those together,

I have a very high
dimensional space.

I'm definitely interested
in sort of coarse features

of that treatment effect,

like whether it's better for people

who come from
lower income families.

I have a hard time believing
that there's an application

for the very high dimensional
version of that,

where I discovered
that for non-white children

who have high family incomes

but baseline scores
in the third quartile

and only went to public school
in the third grade

but not the sixth grade.

So that's what that high
dimensional analysis produces.

It's a very elaborate
conditional statement.

There's two things that are wrong
with that in my view.

First, I don't see it as...

I just can't imagine
why it's actionable.

I don't know why
you'd want to act on it.

And I know also that
there's some alternative model

that fits almost as well,

that flips everything.

Because machine learning
doesn't tell me

that this is really
the predictor that matters.

It just tells me that
this is a good predictor.

And so, I think
there is something different

about the social science context.

- [Guido] I think
the social science applications

you're talking about,

once were...

I think there's not a huge amount
of heterogeneity in the effects.

- [Josh] Well, there might be
if you allow me

to fill that space.

- No... not even then.

I think for a lot
of those interventions,

you would expect that the effect
is the same sign for everybody.

There may be small differences
in the magnitude, but it's not...

For a lot of these education [ ],
they're good for everybody.

It's not that they're bad
for some people

and good for other people,

and that there are kind
of very small pockets

where they're bad.

But there may be some variation
in the magnitude,

but you would need very, 
very big data sets to find those.

I agree that in those cases,

they probably wouldn't be
very actionable anyway.

But I think there's a lot
of other settings

where there is
much more heterogeneity.

- Well, I'm open
to that possibility,

and I think the example you gave
is essentially a marketing example.

- No, those have
implications for

the organization,

whether you need
to worry about the...

- Well, I need to see that paper.

- So the sense
I'm getting is that...

- We still disagree on something.
- Yes. [laughter]

- We haven't converged
on everything.

- I'm getting that sense.
[laughter]

- Actually, we've diverged on this

because this wasn't around
to argue about.

[laughter]

- Is it getting a little warm here?

- Warmed up. Warmed up is good.

The sense I'm getting is, Josh,
you're not saying

that you're confident
that there is no way

that there is an application
where this stuff

is useful. You are saying
you are unconvinced

by the existing
applications to date.

- Fair enough.
- I'm very confident.

[laughter]

- In this case.

- I think Josh does have a point

that even in the prediction cases

where a lot of the machine learning
methods really shine

is where there's just a lot
of heterogeneity.

- You don't really care much
about the details there, right?

- [Guido] Yes.

It doesn't have
a policy angle or something.

- The kind of recognizing
handwritten digits and stuff.

It does much better there

than building
some complicated model.

But a lot of the social science,
a lot of the economic applications,

we actually know a huge amount
about the relationship

between the variables.

A lot of the relationships
are strictly monotone.

Education is going to increase
people's earnings,

irrespective of the demographic,

irrespective of the level
of education you already have.

- Until they get to a Ph.D.

- They have proof
of graduate school...

[laughter]

- Over a reasonable range.

It's not going
to go down very much.

In a lot of the settings

where these machine learning
methods shine,

there's a lot of [ ]

kind of multimodality
in these relationships,

and they're going to be
very powerful.

But I still stand by that.

These methods just have
a huge amount to offer

for economists,

and they're going to be
a big part of the future.

♪ [music] ♪

- [Isaiah] Feels like
there's something interesting

to be said about
machine learning here.

So, Guido, I was wondering,
could you give some more...

maybe some examples
of the sorts of examples

you're thinking about
with applications [ ] at the moment?

- So one area is where

instead of looking
for average causal effects

we're looking for
individualized estimates,

predictions of causal effects,

and the machine learning algorithms
have been very effective.

Traditionally, we would have done
these things using kernel methods,

and theoretically they work great,

and there's some arguments

that, formally, 
you can't do any better.

But in practice, 
they don't work very well.

Random causal forest-type things

that Stefan Wager and Susan Athey
have been working on

are used very widely.

They've been very effective
in these settings

to actually get causal effects
that vary by [ ].

I think this is still just the beginning
of these methods.

But in many cases,

these algorithms are very effective
at searching over big spaces

and finding the functions that fit very well

in ways that we couldn't
really do beforehand.

- I don't know of an example

where machine learning
has generated insights

about a causal effect
that I'm interested in.

And I do know of examples

where it's potentially
very misleading.

So I've done some work
with Brigham Frandsen,

using, for example, random forest
to model covariate effects

in an instrumental
variables problem

where you need
to condition on covariates.

And you don't particularly
have strong feelings

about the functional form for that,

so maybe you should curve...

be open to flexible curve fitting,

and that leads you down a path

where there's a lot
of nonlinearities in the model,

and that's very dangerous with IV

because any sort
of excluded non-linearity

potentially generates
a spurious causal effect

and Brigham and I
showed that very powerfully.

I think in the case
of two instruments

that come from a paper of mine
with Bill Evans,

where if you replace

a traditional two-stage
least squares estimator

with some kind of random forest,

you get very precisely
estimated nonsense estimates.

I think that's a big caution.

In view of those findings
in an example I care about

where the instruments
are very simple

and I believe that they're valid,

I would be skeptical of that.

So non-linearity and IV
don't mix very comfortably.

- No, it sounds like that's already
a more complicated...

- Well, it's IV....
- Yeah.

- ...and we work on that.

[laughter]

- Fair enough.

- As Editor of Econometrica,

a lot of these papers
cross by my desk,

but the motivation is not clear

and, in fact, really lacking.

They're not... [we call] type
semi-parametric foundational papers.

So that's a big problem.

A related problem is that we have
this tradition in econometrics

of being very focused
on these formal [ ] results.

We just have a lot of papers
where people propose a method

and then establish
the asymptotic properties

in a very kind of standardized way.

- Is that bad?

- Well, I think it's sort
of closed the door

for a lot of work
that doesn't fit into that,

whereas in the machine
learning literature,

a lot of things
are more algorithmic.

People had algorithms
for coming up with predictions

that turn out
to actually work much better

than, say, nonparametric
kernel regression.

For a long time, we were doing all
the nonparametrics in econometrics,

we were using kernel regression,

and it was great for proving theorems.

You could get confidence intervals

and consistency, 
and asymptotic normality,

and it was all great,

but it wasn't very useful.

And the things they did
in machine learning

are just way, way better.

But they didn't have the problem--

- That's not my beef
with machine learning theory.

[laughter]

No, but I'm saying there,
for the prediction part,

it does much better.

- Yeah, it's a better
curve fitting tool.

- But it did so in a way

that would not have made
those papers

initially easy to get into
the econometrics journals,

because it wasn't proving
the type of things.

When Breiman was doing
his regression trees

that just didn't fit in.

I think he would have had
a very hard time

publishing these things
in econometric journals.

I think we've limited
ourselves too much

that led us to close things off

for a lot of these
machine learning methods

that are actually very useful.

I mean, I think, in general,

that literature,
the computer scientists,

have proposed a huge number
of these algorithms

that actually are very useful

and that are affecting

the way we're going
to be doing empirical work.

But we've not fully internalized that

because we're still very focused

on getting point estimates
and getting standard errors

and getting p-values

in a way that we need to move beyond

to fully harness the force,

the benefits
from the machine learning literature.

- On the one hand, I guess I very
much take your point

that sort of the traditional
econometrics framework

of sort of propose a method,
prove a limit theorem

under some asymptotic story,
story story, story story...

publish a paper is constraining.

And that, in some sense,

by thinking more broadly

about what a methods paper
could look like,

we may [write] in some sense.

Certainly the machine learning
literature has found a bunch of things,

which seem to work quite well
for a number of problems

and are now having
substantial influence in economics.

I guess a question I'm interested in

is how do you think
about the role of...

sort of -- do you think there is
no value in the theory part of it?

Because I guess a question
that I often have

on sort of seeing the output
from a machine learning tool,

and actually a number of the
methods that you talked about

actually do have inferential results
developed for them,

something that
I always wonder about,

is uncertainty quantification
and just...

I have my prior,

I come into the world with my view.
I see the result of this thing.

How should I update based on it?

And in some sense, 
if I'm in a world

where things are normally distributed,

I know how to do it --

here I don't.

And so I'm interested to hear
what you think about that.

- I don't see this as sort
of saying, well,

these results are not interesting,

but there are going to be a lot of cases

where it's going
to be incredibly hard

to get those results

and we may not be able to get there

and we may need to do it in stages

where first someone says,

"Hey, I have
this interesting algorithm

for doing something

and it works well by some criterion

on this particular data set,

and I'm going to put it out there,"

and maybe someone will figure out a way

that you can later actually
still do inference

under some conditions,

and maybe those are not
particularly realistic conditions,

then we kind of go further.

But I think we've been
constraining things too much

where we said,

"This is the type of thing
that we need to do."

And in some sense,

that goes back
to the way Josh and I

thought about things for the
local average treatment effect.

That wasn't quite the way

people were thinking
about these problems before.

There was a sense
that some of the people said

the way you need to do
these things is you first say

what you're interested
in estimating

and then you do the best job
you can in estimating that,

and what you guys are doing
is you're doing it backwards.

You kind of say,
"Here, I have an estimator,

and now I'm going to figure out
what it's estimating,"

and I suppose you're going to say
why you think that's interesting

or maybe why it's not interesting,
and that's not okay.

You're not allowed
to do it that way.

And I think we should
just be a little bit more flexible

in thinking about
how to look at problems

because I think
we've missed some things

by not doing that.

- [Josh] So you've heard
our views, Isaiah.

You've seen that we have
some points of disagreement.

Why don't you referee
this dispute for us?

[laughter]

- Oh, it's so nice of you
to ask me a small question.

So I guess for one,

I very much agree with something
that Guido said earlier of...

[laughter]

- So one place

where the case for machine learning
seems relatively clear

is in settings where
we're interested in some version

of a nonparametric
prediction problem.

So I'm interested in estimating

a conditional expectation
or conditional probability,

and in the past, maybe
I would have run a kernel...

I would have run
a kernel regression

or I would have run
a series regression,

or something along those lines.

It seems like, at this point, 
we have a fairly good sense

that in a fairly wide range
of applications,

machine learning methods
seem to do better

for estimating conditional
mean functions

or conditional probabilities

or various other
nonparametric objects

than more traditional
nonparametric methods

that were studied
in econometrics and statistics,

especially
in high dimensional settings.

- So you're thinking of maybe
the propensity score

or something like that?

- Yeah, exactly,

- Nuisance functions.

Yeah, so things
like propensity scores,

even objects of more direct

interest, like conditional
average treatment effects,

which are the difference of two
conditional expectation functions,

potentially things like that.

Of course, even there, the theory...

inference, the theory
for how to interpret,

how to make large-sample statements
about some of these things

is less well-developed
depending on

the machine learning
estimator used.

And so I think
something that is tricky

is that we can have these methods,
which work a lot,

which seem to work
a lot better for some purposes,

but which we need to be a bit
careful in how we plug them in

or how we interpret
the resulting statements.

But of course, that's a very,
very active area right now

where people are doing
tons of great work.

And so I fully expect
and hope to see

much more going forward there.

So one issue with machine learning
that always seems a danger

or that is sometimes a danger

and has sometimes
led to applications

that have made less sense

is when folks start with a method
that they're very excited about

rather than a question.

So sort of starting with a question

where here's the object I'm interested in,

here is the parameter of interest.

Let me think about how I would
identify that thing,

how I would recover that thing
if I had a ton of data.

Oh, here's a conditional
expectation function.

Let me plug in the machine
learning estimator for that.

That seems very, very sensible.

Whereas, you know, 
if I regress quantity on price

and say that I used
a machine learning method,

maybe I'm satisfied that 
that solves the [ ] problem

we're usually worried
about there... maybe I'm not.

But again, that's something

where the way to address it
seems relatively clear.

It's to define your object of interest

and think about--

- Just bring in the economics.

- Exactly.

- And then I can think about heterogeneity,

but harness the power
of the machine learning methods

for some of the components.

- Precisely. Exactly.

So the question of interest

is the same as the question
of interest has always been,

but we now have better methods
for estimating some pieces of this.

The place that seems
harder to forecast

is obviously, there's
a huge amount going on

in the machine learning literature

and the limited ways
of plugging it in

that I've referenced so far

are a limited piece of that.

And so I think there are all sorts
of other interesting questions

about where...

where does this interaction go? 
What else can we learn?

And that's something where
I think there's a ton going on

which seems very promising,

and I have no idea
what the answer is.

- No, I totally agree with that,

but that makes it very exciting.

And I think there's just
a lot of work to be done there.

Alright. So Isaiah agrees
with me there.

[laughter]

- I didn't say that per se.

- [Narrator] If you'd like to watch
more Nobel Conversations,

click here.

Or if you'd like to learn
more about econometrics,

check out Josh's
Mastering Econometrics series.

If you'd like to learn more
about Guido, Josh, and Isaiah,

check out the links
in the description.

♪ [music] ♪