
How Will Machine Learning Impact Economics?

  • 0:00 - 0:02
    ♪ [music] ♪
  • 0:04 - 0:06
    - [Narrator] Welcome
    to Nobel Conversations.
  • 0:07 - 0:10
    In this episode, Josh Angrist
    and Guido Imbens
  • 0:10 - 0:14
    sit down with Isaiah Andrews
    to discuss and disagree
  • 0:14 - 0:17
    over the role of machine learning
    in applied econometrics.
  • 0:18 - 0:20
    - [Isaiah] So, of course,
    there are a lot of topics
  • 0:20 - 0:21
    where you guys largely agree,
  • 0:21 - 0:22
    but I'd like to turn to one
  • 0:22 - 0:24
    where maybe you have
    some differences of opinion.
  • 0:24 - 0:26
    I'd love to hear
    some of your thoughts
  • 0:26 - 0:27
    about machine learning
  • 0:27 - 0:30
    and the role that it's playing
    and is going to play in economics.
  • 0:30 - 0:33
    - [Guido] I've looked at some data --
    proprietary data,
  • 0:33 - 0:35
    so there's no published paper there.
  • 0:36 - 0:38
    There was an experiment
    that was done
  • 0:38 - 0:40
    on some search algorithm,
  • 0:40 - 0:41
    and the question was --
  • 0:43 - 0:46
    it was about ranking things
    and changing the ranking.
  • 0:46 - 0:47
    And it was sort of clear
  • 0:48 - 0:51
    that there was going to be
    a lot of heterogeneity there.
  • 0:52 - 0:56
    If you look for, say,
  • 0:58 - 1:01
    a picture of Britney Spears --
  • 1:01 - 1:02
    it doesn't really matter
    where you rank it
  • 1:02 - 1:06
    because you're going to figure out
    what you're looking for,
  • 1:06 - 1:08
    whether you put it
    in the first or second
  • 1:08 - 1:10
    or third position of the ranking.
  • 1:10 - 1:12
    But if you're looking
    for the best econometrics book,
  • 1:13 - 1:16
    if you put your book first
    or your book tenth --
  • 1:16 - 1:18
    that's going to make
    a big difference in
  • 1:19 - 1:21
    how often people
    are going to click on it.
  • 1:22 - 1:23
    And so there you --
  • 1:23 - 1:27
    - [Josh] Why do I need
    machine learning to discover that?
  • 1:27 - 1:29
    It seems like -- because
    I can discover it simply.
  • 1:29 - 1:30
    - [Guido] So in general --
  • 1:30 - 1:32
    - [Josh] There were lots
    of possible...
  • 1:32 - 1:35
    - You want to think about
    there being lots of characteristics
  • 1:35 - 1:37
    of the items,
  • 1:38 - 1:42
    and you want to understand
    what drives the heterogeneity
  • 1:42 - 1:43
    in the effect of --
  • 1:43 - 1:45
    - But you're just predicting.
  • 1:45 - 1:48
    In some sense, you're solving
    a marketing problem.
  • 1:48 - 1:49
    - No, it's a causal effect.
  • 1:49 - 1:52
    - It's causal, but it has
    no scientific content.
  • 1:52 - 1:53
    Think about...
  • 1:54 - 1:57
    - No, but there's similar things
    in medical settings.
  • 1:58 - 2:01
    If you do an experiment,
    you may actually be very interested
  • 2:01 - 2:04
    in whether the treatment works
    for some groups or not.
  • 2:04 - 2:06
    And you have a lot
    of individual characteristics,
  • 2:06 - 2:08
    and you want
    to systematically search --
  • 2:08 - 2:10
    - Yeah. I'm skeptical about that --
  • 2:10 - 2:13
    that sort of idea that there's
    this personal causal effect
  • 2:13 - 2:14
    that I should care about,
  • 2:14 - 2:16
    and that machine learning
    can discover it
  • 2:16 - 2:17
    in some way that's useful.
  • 2:17 - 2:20
    So think about -- I've done
    a lot of work on schools,
  • 2:20 - 2:22
    going to, say, a charter school,
  • 2:22 - 2:24
    a publicly funded private school,
  • 2:25 - 2:27
    effectively,
    that's free to structure
  • 2:27 - 2:29
    its own curriculum
    for context there.
  • 2:29 - 2:31
    Some types of charter schools
  • 2:31 - 2:34
    generate spectacular
    achievement gains,
  • 2:34 - 2:36
    and in the data set
    that produces that result,
  • 2:36 - 2:38
    I have a lot of covariates.
  • 2:38 - 2:41
    So I have baseline scores,
    and I have family background,
  • 2:41 - 2:43
    the education of the parents,
  • 2:44 - 2:46
    the sex of the child,
    the race of the child.
  • 2:46 - 2:50
    And, well, as soon as I put
    half a dozen of those together,
  • 2:50 - 2:52
    I have a very
    high-dimensional space.
  • 2:52 - 2:55
    I'm definitely interested
    in coarse features
  • 2:55 - 2:56
    of that treatment effect,
  • 2:56 - 2:59
    like whether it's better for people
  • 2:59 - 3:02
    who come from
    lower-income families.
  • 3:03 - 3:06
    I have a hard time believing
    that there's an application
  • 3:07 - 3:10
    for the very high-dimensional
    version of that,
  • 3:10 - 3:12
    where I discovered
    that for non-white children
  • 3:12 - 3:15
    who have high family incomes
  • 3:15 - 3:18
    but baseline scores
    in the third quartile
  • 3:18 - 3:22
    and only went to public school
    in the third grade
  • 3:22 - 3:23
    but not the sixth grade.
  • 3:23 - 3:26
    So that's what that
    high-dimensional analysis produces.
  • 3:26 - 3:28
    It's a very elaborate
    conditional statement.
  • 3:28 - 3:31
    There's two things that are wrong
    with that in my view.
  • 3:31 - 3:32
    First, I don't see it as --
  • 3:32 - 3:34
    I just can't imagine
    why it's actionable.
  • 3:35 - 3:37
    I don't know why
    you'd want to act on it.
  • 3:37 - 3:39
    And I know also that
    there's some alternative model
  • 3:39 - 3:41
    that fits almost as well,
  • 3:42 - 3:43
    that flips everything.
  • 3:43 - 3:45
    Because machine learning
    doesn't tell me
  • 3:45 - 3:49
    that this is really
    the predictor that matters --
  • 3:49 - 3:51
    it just tells me
    that this is a good predictor.
  • 3:51 - 3:55
    And so, I think
    there is something different
  • 3:55 - 3:58
    about the social science context.
  • 3:58 - 4:00
    - [Guido] I think
    the social science applications
  • 4:00 - 4:02
    you're talking about
  • 4:02 - 4:03
    are ones where,
  • 4:03 - 4:08
    I think, there's not a huge amount
    of heterogeneity in the effects.
  • 4:10 - 4:12
    - [Josh] Well, there might be
    if you allow me
  • 4:12 - 4:13
    to fill that space.
  • 4:13 - 4:16
    - No... not even then.
  • 4:16 - 4:19
    I think for a lot
    of those interventions,
  • 4:19 - 4:23
    you would expect that the effect
    is the same sign for everybody.
  • 4:24 - 4:28
    There may be small differences
    in the magnitude, but it's not...
  • 4:28 - 4:30
    For a lot of these
    educational interventions --
  • 4:30 - 4:32
    they're good for everybody.
  • 4:34 - 4:36
    It's not that they're bad
    for some people
  • 4:36 - 4:38
    and good for other people,
  • 4:38 - 4:39
    and that there are kind
    of very small pockets
  • 4:39 - 4:41
    where they're bad.
  • 4:41 - 4:44
    There may be some variation
    in the magnitude,
  • 4:44 - 4:47
    but you would need very,
    very big data sets to find those.
  • 4:48 - 4:49
    I agree that in those cases,
  • 4:49 - 4:51
    they probably wouldn't be
    very actionable anyway.
  • 4:52 - 4:54
    But I think there's a lot
    of other settings
  • 4:54 - 4:57
    where there is
    much more heterogeneity.
  • 4:57 - 4:59
    - Well, I'm open
    to that possibility,
  • 4:59 - 5:05
    and I think the example you gave
    is essentially a marketing example.
  • 5:06 - 5:10
    - No, those have
    implications for
  • 5:10 - 5:11
    the organization,
  • 5:12 - 5:14
    whether you need
    to worry about the...
  • 5:15 - 5:18
    - Well, I need to see that paper.
  • 5:18 - 5:21
    - So the sense
    I'm getting is that --
  • 5:21 - 5:24
    - We still disagree on something.
    - Yes.
  • 5:24 - 5:25
    - We haven't converged
    on everything.
  • 5:25 - 5:27
    - I'm getting that sense.
    [laughter]
  • 5:27 - 5:29
    - Actually, we've diverged on this
  • 5:29 - 5:31
    because this wasn't around
    to argue about.
  • 5:31 - 5:32
    [laughter]
  • 5:33 - 5:35
    - Is it getting a little warm here?
  • 5:36 - 5:38
    - Warmed up. Warmed up is good.
  • 5:38 - 5:41
    The sense I'm getting is,
    Josh, you're not saying
  • 5:41 - 5:43
    that you're confident
    that there is no way
  • 5:43 - 5:45
    that there is an application
    where this stuff is useful.
  • 5:45 - 5:47
    You are saying
    you are unconvinced
  • 5:47 - 5:49
    by the existing
    applications to date.
  • 5:50 - 5:52
    - Fair enough.
    - I'm very confident.
  • 5:52 - 5:54
    [laughter]
  • 5:54 - 5:55
    - In this case.
  • 5:55 - 5:57
    - I think Josh does have a point
  • 5:57 - 6:00
    that even the prediction cases
  • 6:02 - 6:05
    where a lot of the machine learning
    methods really shine
  • 6:05 - 6:07
    are where there's just a lot
    of heterogeneity.
  • 6:07 - 6:11
    - You don't really care much
    about the details there, right?
  • 6:11 - 6:12
    - [Guido] Yes.
  • 6:12 - 6:15
    - It doesn't have
    a policy angle or something.
  • 6:15 - 6:18
    - Things like recognizing
    handwritten digits and stuff --
  • 6:19 - 6:20
    it does much better there
  • 6:20 - 6:24
    than building
    some complicated model.
  • 6:24 - 6:28
    But a lot of the social science,
    a lot of the economic applications,
  • 6:28 - 6:30
    we actually know a huge amount
    about the relationship
  • 6:30 - 6:32
    between the variables.
  • 6:32 - 6:35
    A lot of the relationships
    are strictly monotone.
  • 6:37 - 6:39
    Education is going to increase
    people's earnings,
  • 6:40 - 6:42
    irrespective of the demographic,
  • 6:42 - 6:45
    irrespective of the level
    of education you already have.
  • 6:45 - 6:46
    - Until they get to a Ph.D.
  • 6:46 - 6:48
    - Is that true for graduate school?
  • 6:48 - 6:49
    [laughter]
  • 6:49 - 6:51
    - Over a reasonable range.
  • 6:52 - 6:55
    It's not going
    to go down very much.
  • 6:56 - 6:58
    In a lot of the settings
  • 6:58 - 7:00
    where these machine learning
    methods shine,
  • 7:00 - 7:02
    there's a lot of non-monotonicity,
  • 7:02 - 7:05
    kind of multimodality
    in these relationships,
  • 7:05 - 7:09
    and they're going to be
    very powerful.
  • 7:09 - 7:12
    But I still stand by that.
  • 7:12 - 7:15
    These methods just have
    a huge amount to offer
  • 7:16 - 7:18
    for economists,
  • 7:18 - 7:22
    and they're going to be
    a big part of the future.
  • 7:22 - 7:23
    ♪ [music] ♪
  • 7:23 - 7:25
    - [Isaiah] It feels like
    there's something interesting
  • 7:25 - 7:26
    to be said about
    machine learning here.
  • 7:26 - 7:28
    So, Guido, I was wondering,
    could you give some more...
  • 7:28 - 7:30
    maybe some examples
    of the sorts of applications
  • 7:30 - 7:31
    you're thinking about
  • 7:31 - 7:33
    with applications coming out
    at the moment?
  • 7:33 - 7:34
    - So one area is where
  • 7:35 - 7:37
    instead of looking
    for average causal effects,
  • 7:37 - 7:39
    we're looking for
    individualized estimates,
  • 7:41 - 7:43
    predictions of causal effects,
  • 7:43 - 7:46
    and there,
    the machine learning algorithms
  • 7:46 - 7:48
    have been very effective.
  • 7:48 - 7:51
    Traditionally, we would have done
    these things using kernel methods,
  • 7:51 - 7:54
    and theoretically, they work great,
  • 7:54 - 7:56
    and there's some arguments
  • 7:56 - 7:58
    that, formally,
    you can't do any better.
  • 7:58 - 8:00
    But in practice,
    they don't work very well.
  • 8:01 - 8:04
    Causal forest-type things
  • 8:04 - 8:07
    that Stefan Wager and Susan Athey
    have been working on
  • 8:07 - 8:09
    are used very widely.
  • 8:09 - 8:12
    They've been very effective
    in these settings
  • 8:12 - 8:19
    to actually get causal effects
    that vary by covariates.
  • 8:21 - 8:24
    I think this is still just
    the beginning of these methods.
  • 8:24 - 8:26
    But in many cases,
  • 8:27 - 8:32
    these algorithms are very effective
    at searching over big spaces
  • 8:32 - 8:37
    and finding the functions
    that fit very well
  • 8:37 - 8:41
    in ways that we couldn't
    really do beforehand.
  • 8:42 - 8:43
    - I don't know of an example
  • 8:43 - 8:45
    where machine learning
    has generated insights
  • 8:45 - 8:48
    about a causal effect
    that I'm interested in.
  • 8:48 - 8:50
    And I do know of examples
  • 8:50 - 8:51
    where it's potentially
    very misleading.
  • 8:51 - 8:54
    So I've done some work
    with Brigham Frandsen,
  • 8:54 - 8:58
    using, for example, random forests
    to model covariate effects
  • 8:58 - 9:00
    in an instrumental
    variables problem
  • 9:00 - 9:03
    where you need
    to condition on covariates.
  • 9:04 - 9:07
    And you don't particularly
    have strong feelings
  • 9:07 - 9:08
    about the functional form for that,
  • 9:08 - 9:10
    so maybe you should curve...
  • 9:11 - 9:13
    be open to flexible curve fitting.
  • 9:13 - 9:15
    And that leads you down a path
  • 9:15 - 9:17
    where there's a lot
    of nonlinearities in the model,
  • 9:17 - 9:20
    and that's very dangerous with IV
  • 9:20 - 9:23
    because any sort
    of excluded non-linearity
  • 9:23 - 9:26
    potentially generates
    a spurious causal effect,
  • 9:26 - 9:29
    and Brigham and I showed that
    very powerfully, I think,
  • 9:29 - 9:32
    in the case of two instruments
  • 9:33 - 9:35
    that come from a paper of mine
    with Bill Evans,
  • 9:35 - 9:38
    where if you replace it...
  • 9:39 - 9:41
    a traditional two-stage
    least squares estimator
  • 9:41 - 9:43
    with some kind of random forest,
  • 9:43 - 9:47
    you get very precisely estimated
    nonsense estimates.
  • 9:49 - 9:51
    I think that's a big caution.
  • 9:52 - 9:55
    In view of those findings,
    in an example I care about
  • 9:55 - 9:57
    where the instruments
    are very simple
  • 9:57 - 9:59
    and I believe that they're valid,
  • 9:59 - 10:01
    I would be skeptical of that.
  • 10:03 - 10:06
    Non-linearity and IV
    don't mix very comfortably.
  • 10:06 - 10:09
    - No, it sounds like that's already
    a more complicated...
  • 10:10 - 10:12
    - Well, it's IV...
    - Yeah.
  • 10:13 - 10:14
    - ...but then we work on that.
  • 10:14 - 10:16
    [laughter]
  • 10:16 - 10:17
    - Fair enough.
  • 10:17 - 10:18
    ♪ [music] ♪
  • 10:18 - 10:20
    - [Guido] As editor
    of Econometrica,
  • 10:20 - 10:22
    a lot of these papers
    cross my desk,
  • 10:23 - 10:27
    but the motivation is not clear
  • 10:28 - 10:30
    and, in fact, really lacking.
  • 10:30 - 10:31
    They're not...
  • 10:32 - 10:35
    old-style semiparametric
    foundational papers.
  • 10:35 - 10:37
    So that's a big problem.
  • 10:39 - 10:43
    A related problem is that we have
    this tradition in econometrics
  • 10:43 - 10:47
    of being very focused
    on these formal asymptotic results.
  • 10:49 - 10:53
    We just have a lot of papers
    where people propose a method,
  • 10:53 - 10:56
    and then they establish
    the asymptotic properties
  • 10:56 - 10:59
    in a very kind of standardized way.
  • 11:01 - 11:02
    - Is that bad?
  • 11:03 - 11:06
    - Well, I think it's sort
    of closed the door
  • 11:06 - 11:09
    for a lot of work
    that doesn't fit into that
  • 11:09 - 11:12
    where in the machine
    learning literature,
  • 11:12 - 11:13
    a lot of things
    are more algorithmic.
  • 11:14 - 11:18
    People had algorithms
    for coming up with predictions
  • 11:19 - 11:21
    that turn out
    to actually work much better
  • 11:21 - 11:24
    than, say, nonparametric
    kernel regression.
  • 11:24 - 11:27
    For a long time, we were doing all
    the nonparametrics in econometrics,
  • 11:27 - 11:29
    and we were using
    kernel regression,
  • 11:29 - 11:31
    and that was great
    for proving theorems.
  • 11:31 - 11:33
    You could get confidence intervals
  • 11:33 - 11:35
    and consistency,
    and asymptotic normality,
  • 11:35 - 11:36
    and it was all great.
  • 11:36 - 11:37
    But it wasn't very useful.
  • 11:37 - 11:39
    And the things they did
    in machine learning
  • 11:39 - 11:41
    are just way, way better.
  • 11:41 - 11:43
    But they didn't have the problem --
  • 11:43 - 11:44
    - That's not my beef
    with machine learning,
  • 11:44 - 11:46
    that the theory is weak.
  • 11:46 - 11:47
    [laughter]
  • 11:47 - 11:51
    - No, but I'm saying there,
    for the prediction part,
  • 11:51 - 11:52
    it does much better.
  • 11:52 - 11:54
    - Yeah, it's a better
    curve fitting tool.
  • 11:55 - 11:58
    - But it did so in a way
  • 11:58 - 12:00
    that would not have made
    those papers
  • 12:00 - 12:04
    initially easy to get into
    the econometrics journals,
  • 12:04 - 12:06
    because it wasn't proving
    the type of things...
  • 12:07 - 12:10
    When Breiman was doing
    his regression trees --
  • 12:10 - 12:11
    they just didn't fit in.
  • 12:13 - 12:15
    I think he would have had
    a very hard time
  • 12:15 - 12:18
    publishing these things
    in econometrics journals.
  • 12:20 - 12:24
    I think we've limited
    ourselves too much
  • 12:25 - 12:28
    in a way that closed things off
  • 12:28 - 12:30
    for a lot of these
    machine-learning methods
  • 12:30 - 12:31
    that are actually very useful.
  • 12:31 - 12:34
    I mean, I think, in general,
  • 12:35 - 12:37
    that literature,
    the computer scientists,
  • 12:37 - 12:40
    have brought a huge number
    of these algorithms there --
  • 12:41 - 12:43
    have proposed a huge number
    of these algorithms
  • 12:43 - 12:44
    that actually are very useful,
  • 12:44 - 12:46
    and that are affecting
  • 12:46 - 12:49
    the way we're going
    to be doing empirical work.
  • 12:50 - 12:52
    But we've not fully
    internalized that
  • 12:52 - 12:54
    because we're still very focused
  • 12:54 - 12:58
    on getting point estimates
    and getting standard errors
  • 12:59 - 13:00
    and getting P values
  • 13:00 - 13:03
    in a way that we need
    to move beyond
  • 13:03 - 13:06
    to fully harness the force,
  • 13:07 - 13:08
    the benefits
  • 13:08 - 13:11
    from the machine
    learning literature.
  • 13:11 - 13:14
    - On the one hand, I guess I very
    much take your point
  • 13:14 - 13:17
    that sort of the traditional
    econometrics framework
  • 13:17 - 13:20
    of propose a method,
    prove a limit theorem
  • 13:20 - 13:24
    under some asymptotic story,
    story, story, story, story...
  • 13:24 - 13:27
    publish your paper -- is constraining,
  • 13:27 - 13:30
    and that, in some sense,
    by thinking more broadly
  • 13:30 - 13:32
    about what a methods paper
    could look like,
  • 13:32 - 13:33
    we may benefit, in some sense...
  • 13:33 - 13:35
    certainly the machine
    learning literature
  • 13:35 - 13:37
    has found a bunch of things
    which seem to work quite well
  • 13:37 - 13:38
    for a number of problems
  • 13:38 - 13:41
    and are now having
    substantial influence in economics.
  • 13:41 - 13:43
    I guess a question
    I'm interested in
  • 13:43 - 13:46
    is how do you think
    about the role of...
  • 13:49 - 13:51
    Do you think there is no value
    in the theory part of it?
  • 13:52 - 13:54
    Because I guess a question
    that I often have
  • 13:54 - 13:57
    when seeing the output
    from a machine learning tool,
  • 13:57 - 13:58
    and actually a number
    of the methods
  • 13:58 - 13:59
    that you talked about
  • 13:59 - 14:01
    actually do have
    inferential results
  • 14:01 - 14:02
    developed for them,
  • 14:03 - 14:04
    something that
    I always wonder about,
  • 14:04 - 14:07
    a sort of uncertainty
    quantification and just...
  • 14:07 - 14:08
    I have my prior,
  • 14:08 - 14:11
    I come into the world with my view,
    I see the result of this thing.
  • 14:11 - 14:12
    How should I update based on it?
  • 14:12 - 14:14
    And in some sense,
    if I'm in a world
  • 14:14 - 14:16
    where things
    are normally distributed,
  • 14:16 - 14:17
    I know how to do it --
  • 14:17 - 14:18
    here I don't.
  • 14:18 - 14:21
    And so I'm interested to hear
    what you think about that.
  • 14:22 - 14:24
    - I don't see this
    as sort of saying, well,
  • 14:25 - 14:27
    these results are not interesting,
  • 14:27 - 14:28
    but there are going to be a lot of cases
  • 14:28 - 14:30
    where it's going to be incredibly
    hard to get those results,
  • 14:30 - 14:32
    and we may not
    be able to get there,
  • 14:32 - 14:35
    and we may need to do it in stages
  • 14:35 - 14:36
    where first someone says,
  • 14:36 - 14:41
    "Hey, I have
    this interesting algorithm
  • 14:41 - 14:42
    for doing something,"
  • 14:42 - 14:47
    and it works well
    by some criterion
  • 14:47 - 14:50
    on this particular data set,
  • 14:51 - 14:53
    and we should put it out there,
  • 14:53 - 14:55
    and maybe someone
    will figure out a way
  • 14:55 - 14:58
    that you can later actually
    still do inference
  • 14:58 - 14:59
    under some conditions,
  • 14:59 - 15:02
    and maybe those are not
    particularly realistic conditions.
  • 15:02 - 15:04
    Then we kind of go further.
  • 15:04 - 15:08
    But I think we've been
    constraining things too much
  • 15:08 - 15:10
    where we said,
  • 15:10 - 15:13
    "This is the type of things
    that we need to do."
  • 15:13 - 15:15
    And in some sense,
  • 15:16 - 15:18
    that goes back
    to the way Josh and I
  • 15:20 - 15:22
    thought about things for the local
    average treatment effect.
  • 15:22 - 15:23
    That wasn't quite the way
  • 15:23 - 15:25
    people were thinking
    about these problems before.
  • 15:26 - 15:29
    There was a sense
    that some of the people said
  • 15:30 - 15:32
    the way you need to do
    these things is you first say
  • 15:32 - 15:34
    what you're interested
    in estimating,
  • 15:34 - 15:38
    and then you do the best job
    you can in estimating that.
  • 15:38 - 15:44
    And what you guys are doing
    is you're doing it backwards.
  • 15:44 - 15:47
    You kind of say,
    "Here, I have an estimator,
  • 15:47 - 15:51
    and now I'm going to figure out
    what it's estimating."
  • 15:51 - 15:54
    And I suppose you're going to say
    why you think that's interesting
  • 15:54 - 15:57
    or maybe why it's not interesting,
    and that's not okay.
  • 15:57 - 15:59
    You're not allowed
    to do that in that way.
  • 15:59 - 16:02
    And I think we should
    just be a little bit more flexible
  • 16:02 - 16:07
    in thinking about
    how to look at problems
  • 16:07 - 16:08
    because I think
    we've missed some things
  • 16:08 - 16:11
    by not doing that.
  • 16:11 - 16:13
    ♪ [music] ♪
  • 16:13 - 16:15
    - [Josh] So you've heard
    our views, Isaiah,
  • 16:15 - 16:18
    and you've seen that we have
    some points of disagreement.
  • 16:18 - 16:20
    Why don't you referee
    this dispute for us?
  • 16:21 - 16:22
    [laughter]
  • 16:22 - 16:25
    - Oh, it's so nice of you
    to ask me a small question.
  • 16:25 - 16:26
    [laughter]
  • 16:26 - 16:28
    So I guess, for one,
  • 16:28 - 16:33
    I very much agree with something
    that Guido said earlier of...
  • 16:34 - 16:36
    [laughter]
  • 16:36 - 16:37
    So one thing
  • 16:37 - 16:40
    where the case for machine learning
    seems relatively clear
  • 16:40 - 16:43
    is in settings where
    we're interested in some version
  • 16:43 - 16:45
    of a nonparametric
    prediction problem.
  • 16:45 - 16:46
    So I'm interested in estimating
  • 16:46 - 16:50
    a conditional expectation
    or conditional probability,
  • 16:50 - 16:52
    and in the past, maybe
    I would have run a kernel...
  • 16:52 - 16:54
    I would have run
    a kernel regression,
  • 16:54 - 16:55
    or I would have run
    a series regression,
  • 16:55 - 16:57
    or something along those lines.
  • 16:58 - 17:00
    It seems like, at this point,
    we have a fairly good sense
  • 17:00 - 17:03
    that in a fairly wide range
    of applications,
  • 17:03 - 17:06
    machine learning methods
    seem to do better
  • 17:06 - 17:09
    for estimating conditional
    mean functions,
  • 17:09 - 17:10
    or conditional probabilities,
  • 17:10 - 17:12
    or various other
    nonparametric objects
  • 17:12 - 17:15
    than more traditional
    nonparametric methods
  • 17:15 - 17:17
    that were studied
    in econometrics and statistics,
  • 17:17 - 17:19
    especially in
    high-dimensional settings.
  • 17:20 - 17:22
    - So you're thinking of maybe
    the propensity score
  • 17:22 - 17:23
    or something like that?
  • 17:23 - 17:25
    - Yeah, exactly.
    - Nuisance functions.
  • 17:25 - 17:27
    - Yeah, so things
    like propensity scores.
  • 17:28 - 17:30
    Even objects of more direct
  • 17:30 - 17:32
    interest, like conditional
    average treatment effects,
  • 17:32 - 17:35
    which are the difference of two
    conditional expectation functions,
  • 17:35 - 17:37
    potentially things like that.
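The object Isaiah describes -- a conditional average treatment effect as the difference of two conditional expectation functions -- can be sketched with the simplest possible plug-in. Below, linear fits on simulated, randomized data stand in for the far more flexible learners (forests, neural networks) that the machine learning literature supplies; all numbers are illustrative:

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

random.seed(2)
n = 20_000
x = [random.uniform(0, 1) for _ in range(n)]
t = [random.random() < 0.5 for _ in range(n)]   # randomized treatment
# Heterogeneous effect tau(x) = 1 + 2x; baseline outcome m0(x) = x.
y = [xi + (1 + 2 * xi) * ti + random.gauss(0, 1)
     for xi, ti in zip(x, t)]

treated = [(xi, yi) for xi, ti, yi in zip(x, t, y) if ti]
control = [(xi, yi) for xi, ti, yi in zip(x, t, y) if not ti]
a1, b1 = fit_line(*zip(*treated))   # E[Y | X, T = 1]
a0, b0 = fit_line(*zip(*control))   # E[Y | X, T = 0]

def cate(x0):
    """Estimated conditional average treatment effect at x0."""
    return (a1 + b1 * x0) - (a0 + b0 * x0)

print(cate(0.0), cate(1.0))  # roughly 1 and 3
```

Causal forest-type methods automate exactly this differencing while searching over many covariates at once, which is where the inference questions Isaiah raises become delicate.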
  • 17:37 - 17:41
    Of course, even there,
    the theory...
  • 17:41 - 17:44
    for inference, the theory
    for how to interpret,
  • 17:44 - 17:46
    how to make large sample statements
    about some of these things
  • 17:46 - 17:48
    are less well-developed
    depending on
  • 17:48 - 17:50
    the machine learning
    estimator used.
  • 17:50 - 17:53
    And so I think
    something that is tricky
  • 17:53 - 17:56
    is that we can have these methods
  • 17:56 - 17:58
    which seem to work
    a lot better for some purposes
  • 17:58 - 18:01
    but which we need to be a bit
    careful in how we plug them in
  • 18:01 - 18:03
    or how we interpret
    the resulting statements.
  • 18:04 - 18:06
    But, of course, that's a very,
    very active area right now
  • 18:06 - 18:08
    where people are doing
    tons of great work.
  • 18:08 - 18:11
    So I fully expect
    and hope to see
  • 18:11 - 18:13
    much more going forward there.
  • 18:13 - 18:17
    So one issue with machine learning
    that always seems a danger is...
  • 18:17 - 18:19
    or that is sometimes a danger
  • 18:19 - 18:21
    and has sometimes
    led to applications
  • 18:21 - 18:22
    that have made less sense
  • 18:22 - 18:27
    is when folks start with a method
    that they're very excited about
  • 18:27 - 18:29
    rather than a question.
  • 18:29 - 18:30
    So sort of starting with a question
  • 18:30 - 18:34
    where here's the object
    I'm interested in,
  • 18:34 - 18:35
    here is the parameter
    of interest --
  • 18:36 - 18:40
    let me think about how I would
    identify that thing,
  • 18:40 - 18:42
    how I would recover that thing
    if I had a ton of data.
  • 18:42 - 18:44
    Oh, here's a conditional
    expectation function,
  • 18:44 - 18:47
    let me plug in a machine
    learning estimator for that --
  • 18:47 - 18:49
    that seems very, very sensible.
  • 18:49 - 18:53
    Whereas, you know,
    if I regress quantity on price
  • 18:54 - 18:56
    and say that I used
    a machine learning method,
  • 18:56 - 18:59
    maybe I'm satisfied that
    that solves the endogeneity problem
  • 18:59 - 19:01
    we're usually worried
    about there -- maybe I'm not.
  • 19:02 - 19:03
    But, again, that's something
  • 19:03 - 19:06
    where the way to address it
    seems relatively clear.
  • 19:06 - 19:08
    It's to find
    your object of interest
  • 19:08 - 19:10
    and think about --
  • 19:10 - 19:11
    - Just bring in the economics.
  • 19:11 - 19:13
    - Exactly.
  • 19:13 - 19:14
    - And think about
    the heterogeneity,
  • 19:14 - 19:17
    but harness the power
    of the machine learning methods
  • 19:17 - 19:20
    for some of the components.
  • 19:20 - 19:21
    - Precisely. Exactly.
  • 19:21 - 19:24
    So the question of interest
  • 19:24 - 19:26
    is the same as the question
    of interest has always been,
  • 19:26 - 19:29
    but we now have better methods
    for estimating some pieces of this.
  • 19:30 - 19:33
    The place that seems
    harder to forecast
  • 19:33 - 19:36
    is obviously there's
    a huge amount going on
  • 19:36 - 19:38
    in the machine learning literature,
  • 19:38 - 19:40
    and the limited ways
    of plugging it in
  • 19:40 - 19:41
    that I've referenced so far
  • 19:41 - 19:43
    are a limited piece of that.
  • 19:43 - 19:45
    So I think there are all sorts
    of other interesting questions
  • 19:45 - 19:47
    about where...
  • 19:47 - 19:49
    where does this interaction go?
    What else can we learn?
  • 19:49 - 19:53
    And that's something where
    I think there's a ton going on,
  • 19:53 - 19:54
    which seems very promising,
  • 19:54 - 19:56
    and I have no idea
    what the answer is.
  • 19:57 - 20:00
    - No, I totally agree with that,
  • 20:00 - 20:04
    but that makes it very exciting.
  • 20:04 - 20:06
    And I think there's just
    a lot of work to be done there.
  • 20:07 - 20:09
    Alright. So Isaiah agrees
    with me there.
  • 20:09 - 20:10
    [laughter]
  • 20:10 - 20:12
    - I didn't say that per se.
  • 20:13 - 20:14
    ♪ [music] ♪
  • 20:14 - 20:17
    - [Narrator] If you'd like to watch
    more Nobel Conversations,
  • 20:17 - 20:18
    click here.
  • 20:18 - 20:20
    Or if you'd like to learn
    more about econometrics,
  • 20:20 - 20:23
    check out Josh's
    Mastering Econometrics series.
  • 20:24 - 20:27
    If you'd like to learn more
    about Guido, Josh, and Isaiah,
  • 20:27 - 20:29
    check out the links
    in the description.
  • 20:29 - 20:31
    ♪ [music] ♪
Title:
How Will Machine Learning Impact Economics?
ASR Confidence:
0.83
Description:

This episode is the most heated of the series! While Nobel laureates Josh Angrist and Guido Imbens agree on most topics, they sharply diverge on the potential of machine learning to impact economics. Host Isaiah Andrews steps in to referee the dispute, adding his own take on how machine learning might change econometrics.

Guido Imbens is optimistic about the potential of using machine learning to estimate “personalized causal effects” in large data sets. He laments that econometrics journals have been too rigid in their expectations, turning away many useful insights from machine learning.

Josh Angrist has a less rosy view. He has yet to see machine learning make an impact on the work he’s doing. Instead, he’s seen cases where it can be very misleading.

More about Guido Imbens: https://www.gsb.stanford.edu/faculty-research/faculty/guido-w-imbens
More about Joshua Angrist: https://economics.mit.edu/faculty/angrist
More about Isaiah Andrews: https://scholar.harvard.edu/iandrews/home

00:00 - Intro
00:18 - Potential for "personalized" causal effects
07:22 - Applications of machine learning
10:17 - Opportunities for publishing in journals
16:12 - Isaiah Andrews referees!

Video Language:
English
Team:
Marginal Revolution University
Duration:
20:33
