♪ [music] ♪ Welcome to Nobel Conversations. In this episode, Josh Angrist and Guido Imbens sit down with Isaiah Andrews to discuss, and disagree over, the role of machine learning in applied econometrics.

So, of course, there are a lot of topics where you guys largely agree, but I'd like to turn to one where maybe you have some differences of opinion. I'd love to hear some of your thoughts about machine learning and the role it's playing, and is going to play, in economics.

I've looked at some data — it's proprietary, so there's no published paper — where there was an experiment done on a search algorithm. The question was about ranking things and changing the ranking, and it was clear that there was going to be a lot of heterogeneity there. If you look for, say, a picture of Britney Spears, it doesn't really matter where you rank it, because you're going to figure out what you're looking for whether you put it in the first, second, or third position of the ranking. But if you're looking for the best econometrics book, whether you put your book first or your book tenth is going to make a big difference to how often people click on it.

And so there you go. Why do I need machine learning to discover that? It seems like I could discover it easily.

In general, you want to think about there being lots of characteristics of the items, and you want to understand what drives the heterogeneity in the effect of the ranking.

But you're solving a marketing problem there. It's causal, but it has no scientific content.

Think about similar things in medical settings. If you do an experiment, you may actually be very interested in whether the treatment works for some groups or not.
And you have a lot of individual characteristics, and you want to systematically search over those.

Yeah, I'm skeptical about that — the idea that there's this personal causal effect that I should care about, and that machine learning can discover it in some way that's useful. So think about it this way. I've done a lot of work on schools — say, a charter school: a publicly funded private school, effectively, that's free to structure its own curriculum, for context there. Some types of charter schools generate spectacular achievement gains, and in the data set that produces that result, I have a lot of covariates: baseline scores, family background, the education of the parents, the sex of the child, the race of the child. As soon as I put half a dozen of those together, I have a very high-dimensional space.

I'm definitely interested in coarse features of that treatment effect, like whether it's better for people who come from lower-income families. But I have a hard time believing that there's an application for the very high-dimensional version of that, where I discover that the effect holds for non-white children who have high family incomes but baseline scores in the third quartile, and who only went to public school in the third grade but not the sixth grade. That's what the high-dimensional analysis produces: this very elaborate conditional statement. There are two things wrong with that, in my view. First, I just can't imagine why it's actionable — I don't know why you'd want to act on it. And second, I also know that there's some alternative model that fits almost as well and flips everything, because machine learning doesn't tell me that this is really the predictor that matters; it just tells me that this is a good predictor. So I think there is something different about the social science context.
So I think the social science applications you're talking about are ones where there's not a huge amount of heterogeneity in the effects.

Well, there might be, if you allow me to fill that space.

No, not even then. For a lot of those interventions, you would expect that the effect is the same sign for everybody. There may be small differences in the magnitude, but a lot of these education interventions are good for everybody. It's not that they're bad for some people and good for other people — there may be some very small pockets where they're bad, and there may be some variation in the magnitude, but you would need very, very big data sets to find those, and in those cases they probably wouldn't be very actionable anyway. But I think there are a lot of other settings where there is much more heterogeneity.

Well, I'm open to that possibility, and the example you gave is essentially a marketing example. Maybe it has implications for how a market is organized, whether you need to worry about market power. I'd like to see that paper.

So the sense I'm getting is that you two still disagree on something.

Yes, we haven't converged on everything. Actually, we've diverged on this, because this wasn't around to argue about before.

Is it getting a little warm here?

Yeah — warmed up. Warmed up is good.

The sense I'm getting, Josh, is that you're not saying you're confident there is no application where this stuff is useful. You're saying you're unconvinced by the existing applications to date.

Fair — about that, I'm very confident.

Yeah. In this case, I think Josh does have a point. Even in the prediction cases, where a lot of the machine learning methods really shine is where there's just a lot of heterogeneity.
You don't really care much about the details there — it doesn't have a policy angle or anything. Things like recognizing handwritten digits: machine learning does much better there than building some complicated statistical model. But in a lot of the social science and economic applications, we actually know a huge amount about the relationships between variables. A lot of the relationships are strictly monotone. Education is going to increase people's earnings, irrespective of the demographic, irrespective of the level of education you already have.

Until they get to a PhD. Graduate school...

Over a reasonable range, it's not going to go down very much. Whereas in a lot of the settings where these machine learning methods shine, there's a lot of non-monotonicity, a kind of multimodality, in these relationships, and there they're going to be very powerful. But I still stand by the claim that these methods have a huge amount to offer for economists, and they're going to be a big part of the future.

It feels like there's something interesting to be said about machine learning here. So I was wondering, could you give some more examples of the sorts of applications you're thinking about?

At the moment I'm working on areas where, instead of looking for average causal effects, we're looking for individualized estimates and predictions of causal effects, and there the machine learning algorithms have been very effective. Previously we would have done these things using kernel methods. Theoretically they work great, and there are even formal arguments that you can't do any better. But in practice they don't work very well, and random forests — the random causal forest type methods that Stefan Wager and Susan Athey have been working on — are used very widely.
They've been very effective in those settings for actually getting causal effects that vary by covariates, and I think this is still just the beginning of these methods. In many cases these algorithms are very effective at searching over big spaces and finding the functions that fit very well, in ways that we couldn't really do beforehand.

I don't know of an example where machine learning has generated insights about a causal effect that I'm interested in, and I do know of examples where it's potentially very misleading. I've done some work with Brigham Frandsen using, for example, random forests to model covariate effects in an instrumental variables problem, where you need to condition on covariates and you don't particularly have strong feelings about the functional form — so maybe you should be open to flexible curve fitting. That leads you down a path where there's a lot of nonlinearity in the model, and that's very dangerous with IV, because any sort of excluded nonlinearity potentially generates a spurious causal effect. Brigham and I showed that very powerfully, I think, in the case of two instruments that come from a paper of mine with Bill Evans. If you replace a traditional two-stage least squares estimator with some kind of random forest, you get very precisely estimated nonsense estimates. I think that's a big caution, and in view of those findings, in an example I care about where the instruments are very simple and I believe that they're valid, I would be skeptical of that. Nonlinearity and IV don't mix very comfortably.

Now, in some sense that's already a more complicated setting.

Well, it's IV.

Yeah, but there's more to work out there, and we'll find out.
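The IV logic at stake here can be made concrete with a small simulation. This is a hypothetical sketch, not the Angrist–Frandsen exercise: the data-generating process and all numbers are invented. It shows OLS biased by an unobserved confounder while the simple linear IV estimator recovers the truth; the comment notes why substituting a nonlinear or ML first stage (the so-called forbidden regression) is where the danger enters.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical DGP: u is an unobserved confounder, z a valid instrument,
# and the true causal effect of x on y is 1.0.
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + u + 0.5 * rng.normal(size=n)        # first stage: x driven by z and u
y = 1.0 * x + u + 0.5 * rng.normal(size=n)  # outcome: confounded by u

# OLS is biased upward because u moves both x and y.
beta_ols = (x @ y) / (x @ x)

# Simple IV (2SLS with one instrument, no covariates): cov(z, y) / cov(z, x).
# The linearity of the first-stage projection is what keeps the excluded
# instrument from leaking into the structural equation; replacing the fitted
# values with a nonlinear/ML prediction of x can reintroduce bias.
beta_iv = (z @ y) / (z @ x)

print(f"OLS: {beta_ols:.2f}, IV: {beta_iv:.2f}")  # OLS near 1.44, IV near 1.0
```

At this sample size the IV estimate sits very close to the true coefficient, while OLS converges to a precisely estimated wrong answer.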
As an editor at Econometrica, I actually have a lot of these papers cross my desk, and often the motivation is not clear and the substance is really lacking — the semiparametric foundational type of papers. So that's a big problem, and a related problem is that we have this tradition in econometrics of being very focused on formal asymptotic results. We just have a lot of papers where people propose a method and then establish its asymptotic properties in a very standardized way.

Is that bad?

Well, I think it has closed the door on a lot of work that doesn't fit into that mold, whereas in the machine learning literature a lot of things are more algorithmic. People had algorithms for coming up with predictions that turn out to actually work much better than, say, nonparametric kernel regression. For a long time, when we did nonparametrics in econometrics, we did it using kernel regression, and it was great for proving theorems — you could get confidence intervals and consistency and asymptotic normality, and it was all great — but it wasn't very useful. And the things they did in machine learning are just way, way better. But they didn't have the properties.

That's not my beef with machine learning theory, as you know.

No — I'm saying that for the prediction part, it does much better.

Yeah, it's better curve fitting.

But it did so in a way that would not have made those papers easy to get into the econometrics journals initially, because it wasn't proving the type of things we expected. When Breiman was doing his regression trees, that just didn't fit in, and I think he would have had a very hard time publishing those things in econometrics journals. So I think we limited ourselves too much, and that left us closed off from a lot of these machine learning methods that are actually very useful.
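The contrast between kernel regression and the more algorithmic prediction methods can be seen in a tiny simulation. Everything here is a hypothetical sketch — the sparse step-function signal, sample sizes, and bandwidth are invented for illustration — but it captures the typical pattern: in higher dimensions, a Nadaraya–Watson kernel fit spreads weight over distant points and attenuates toward the mean, while a forest finds the one coordinate that matters.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Sparse signal in 10 dimensions: only the first coordinate matters.
n, d = 2000, 10
X = rng.uniform(-1, 1, size=(n, d))
y = np.where(X[:, 0] > 0, 1.0, -1.0) + rng.normal(0, 0.1, n)
X_test = rng.uniform(-1, 1, size=(500, d))
y_test = np.where(X_test[:, 0] > 0, 1.0, -1.0)

# Nadaraya-Watson kernel regression with a Gaussian product kernel.
# In 10 dimensions the weights spread over many distant points, so the
# fit is dragged toward the global mean (curse of dimensionality).
def kernel_regression(X_tr, y_tr, X_te, h=0.5):
    d2 = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * h ** 2))
    return (w @ y_tr) / w.sum(axis=1)

kernel_mse = np.mean((kernel_regression(X, y, X_test) - y_test) ** 2)

# A random forest finds the split on X[:, 0] and ignores the noise dims.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
forest_mse = np.mean((forest.predict(X_test) - y_test) ** 2)

print(f"kernel MSE: {kernel_mse:.3f}, forest MSE: {forest_mse:.3f}")
```

The kernel estimator is exactly the object the asymptotic theory handled so cleanly; the forest, for which the classical theory came much later, nonetheless fits this kind of function far better.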
In general, the computer scientists have proposed a huge number of these algorithms that are actually very useful and that are affecting the way we're going to be doing empirical work. But we've not fully internalized that, because we're still very focused on getting point estimates, standard errors, and p-values, in a way that we need to move beyond to fully harness the benefits of the machine learning literature.

On the one hand, I very much take your point that the traditional econometrics framework — propose a method, prove a limit theorem under some asymptotic story, publish a paper — is constraining, and that in some sense, by thinking more broadly about what a methods paper could look like, we may learn more. Certainly the machine learning literature has found a bunch of things which seem to work quite well for a number of problems and are now having substantial influence in economics. I guess the question I'm interested in is: what do you think the role of theory is? Do you think there's no value in the theory part? I ask because of a question I often have when seeing the output from a machine learning tool — and actually a number of the methods you've talked about do have inferential results developed for them — which is uncertainty quantification. I have my prior, I come into the world with my view, I see the result of this thing: how should I update based on it? In a world where things are normally distributed, I know how to do that. Here, I don't. So I'm interested to hear what you think.
So I don't see this as saying those results are not interesting. But there are going to be a lot of cases where it's going to be incredibly hard to get those results, and we may not be able to get there, and we may need to do it in stages. First someone says, "Hey, I have this interesting algorithm for doing something, and it works well by some criterion on this particular data set," and puts it out there. Then maybe later someone will figure out a way that you can actually do inference under some conditions — and maybe those are not particularly realistic conditions, and then we go further. But I think we've been constraining things too much, saying, "This is the type of thing we need to do."

In some sense, that goes back to the way Josh and I thought about the local average treatment effect. That wasn't quite the way people were thinking about these problems before. There was a sense among some people that the way you need to do these things is to first say what you're interested in estimating, and then do the best job you can at estimating it — and what you guys are doing is doing it backwards: you say, "Here I have an estimator, and now I'm going to figure out what it's estimating," and then ex post you say why you think that's interesting, or maybe why it's not interesting, and that's not okay; you're not allowed to do it that way. I think we should just be a little bit more flexible in thinking about how to look at problems, because I think we've missed some things by not doing that.

So, you've heard our views, Isaiah. You've seen that we have some points of disagreement. Why don't you referee this dispute for us?

Oh, so nice of you to ask me a small question. So I guess, for one,
I very much agree with something Guido said earlier: where the case for machine learning seems relatively clear is in settings where we're interested in some version of a nonparametric prediction problem. I'm interested in estimating a conditional expectation or a conditional probability, and in the past maybe I would have run a kernel regression or a series regression or something along those lines. It seems like at this point we have a fairly good sense that, in a fairly wide range of applications, machine learning methods seem to do better than the more traditional nonparametric methods studied in econometrics and statistics for estimating conditional mean functions, conditional probabilities, or various other nonparametric objects, especially in high-dimensional settings.

So you're thinking of maybe the propensity score, something like that?

Exactly — nuisance functions like propensity scores, or even objects of more direct interest, like conditional average treatment effects, which are the difference of two conditional expectation functions, potentially; things like that. Of course, even there, the theory for inference — for how to interpret these objects, how to make large-sample statements about them — is less well developed, depending on the machine learning estimator used. So I think one thing that is tricky is that we have these methods which seem to work a lot better for some purposes, but which we need to be a bit careful in how we plug in and how we interpret the resulting statements. But of course, that's a very, very active area right now, where people are doing tons of great work, and so I fully expect, and hope, to see much more going forward there.
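A minimal sketch of the nuisance-function idea: estimate the propensity score with an off-the-shelf classifier and plug it into an inverse-propensity-weighted average treatment effect. The data-generating process and all numbers below are hypothetical, and logistic regression stands in for whatever flexible learner one might prefer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20_000

# Hypothetical DGP: x confounds treatment and outcome; true ATE is 2.0.
x = rng.normal(size=(n, 1))
p_true = 1 / (1 + np.exp(-x[:, 0]))   # treatment more likely when x is high
t = rng.binomial(1, p_true)
y = 2.0 * t + x[:, 0] + rng.normal(0, 0.5, n)

# Naive difference in means is biased: treated units have higher x.
naive = y[t == 1].mean() - y[t == 0].mean()

# Estimate the propensity score (the nuisance function); any flexible
# classifier could be substituted for logistic regression here.
p_hat = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]

# Hajek-style inverse-propensity-weighted ATE.
ate_ipw = (np.average(y[t == 1], weights=1 / p_hat[t == 1])
           - np.average(y[t == 0], weights=1 / (1 - p_hat[t == 0])))

print(f"naive: {naive:.2f}, IPW: {ate_ipw:.2f}")  # naive too big; IPW near 2.0
```

The estimand and the weighting identity are pure econometrics; the only place the learner enters is the one line that fits the nuisance function — which is exactly where a better predictor helps.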
So one issue with machine learning that always seems a danger — or that is sometimes a danger, and has sometimes led to applications that make less sense — is when folks start with a method they're very excited about, rather than with a question. Starting with a question — here's the object I'm interested in, here's the parameter of interest, let me think about how I would identify that thing, how I would recover it if I had a ton of data; oh, here's a conditional expectation function, let me plug in a machine learning estimator for it — seems very, very sensible. Whereas if I regress quantity on price and say that because I used a machine learning method, I'm satisfied that that solves the endogeneity problem we're usually worried about there — well, maybe I'm not. But again, that's something where the way to address it seems relatively clear: define your object of interest first, and think about it.

Is that just bringing in the economics?

Exactly.

And then harness the power of the machine learning methods for some of the components?

Precisely. Exactly. The question of interest is the same as the question of interest has always been; we just now have better methods for estimating some pieces of it. The place that seems harder to forecast is this: obviously there's a huge amount going on in the machine learning literature, and the limited ways of plugging it in that I've referenced so far are a limited piece of that. So I think there are all sorts of other interesting questions about where this interaction goes, and what else we can learn. That's something where I think there's a ton going on which seems very promising, and I have no idea what the answer is.
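The "question first, then plug in the estimator" workflow can be sketched for a conditional average treatment effect: the estimand is a difference of two conditional expectation functions, and each is fitted by an off-the-shelf random forest — a simple "T-learner" sketch, which causal forests refine. The simulated experiment and all parameter choices below are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 20_000

# Hypothetical randomized experiment with heterogeneous effects:
# tau(x) = 2 where x0 > 0 and 1 elsewhere.
X = rng.uniform(-1, 1, size=(n, 2))
tau = np.where(X[:, 0] > 0, 2.0, 1.0)
t = rng.binomial(1, 0.5, n)
y = X[:, 1] + tau * t + rng.normal(0, 0.5, n)

# Object of interest first: CATE(x) = E[Y | X=x, T=1] - E[Y | X=x, T=0].
# Each conditional expectation is then estimated with a forest.
f1 = RandomForestRegressor(n_estimators=100, min_samples_leaf=50,
                           random_state=0).fit(X[t == 1], y[t == 1])
f0 = RandomForestRegressor(n_estimators=100, min_samples_leaf=50,
                           random_state=0).fit(X[t == 0], y[t == 0])
cate_hat = f1.predict(X) - f0.predict(X)

print(f"CATE for x0 > 0: {cate_hat[X[:, 0] > 0].mean():.2f}")   # near 2.0
print(f"CATE for x0 <= 0: {cate_hat[X[:, 0] <= 0].mean():.2f}")  # near 1.0
```

Because treatment is randomized, the identification argument is trivial here; in observational settings the same plug-in step would sit inside a larger identification strategy, which is exactly the point about keeping the question first.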
No, I totally agree with that, but that's what makes it very exciting, and I think there's just a lot of work to be done there.

All right — so Isaiah agrees with me there, is the sense I'm getting.

If you'd like to watch more Nobel Conversations, click here. Or, if you'd like to learn more about econometrics, check out Josh's Mastering Econometrics series. If you'd like to learn more about Guido, Josh, and Isaiah, check out the links in the description.