How Will Machine Learning Impact Economics?

  • 0:00 - 0:02
    ♪ [music] ♪
  • 0:04 - 0:06
    - [Narrator] Welcome
    to Nobel Conversations.
  • 0:07 - 0:10
    In this episode, Josh Angrist
    and Guido Imbens
  • 0:10 - 0:14
    sit down with Isaiah Andrews
    to discuss and disagree
  • 0:14 - 0:17
    over the role of machine learning
    in applied econometrics.
  • 0:18 - 0:20
    - [Isaiah] So, of course,
    there are a lot of topics
  • 0:20 - 0:21
    where you guys largely agree,
  • 0:21 - 0:22
    but I'd like to turn to one
  • 0:22 - 0:24
    where maybe you have
    some differences of opinion.
  • 0:24 - 0:26
    I'd love to hear
    some of your thoughts
  • 0:26 - 0:27
    about machine learning
  • 0:27 - 0:30
    and the role that it's playing
    and is going to play in economics.
  • 0:30 - 0:33
    - [Guido] I've looked at some data
    that's proprietary,
  • 0:33 - 0:35
    so there's
    no published paper there.
  • 0:36 - 0:38
    There was an experiment
    that was done
  • 0:38 - 0:40
    on some search algorithm,
  • 0:40 - 0:41
    and the question was --
  • 0:43 - 0:46
    it was about ranking things
    and changing the ranking.
  • 0:46 - 0:47
    And it was sort of clear
  • 0:48 - 0:51
    that there was going to be
    a lot of heterogeneity there.
  • 0:52 - 0:56
    If you look for, say,
  • 0:58 - 1:01
    a picture of Britney Spears --
  • 1:01 - 1:02
    it doesn't really matter
    where you rank it
  • 1:02 - 1:06
    because you're going to figure out
    what you're looking for,
  • 1:06 - 1:08
    whether you put it
    in the first or second
  • 1:08 - 1:10
    or third position of the ranking.
  • 1:10 - 1:12
    But if you're looking
    for the best econometrics book,
  • 1:13 - 1:16
    if you put your book first
    or your book tenth --
  • 1:16 - 1:18
    that's going to make
    a big difference
  • 1:19 - 1:21
    how often people
    are going to click on it.
  • 1:22 - 1:23
    And so there you --
  • 1:23 - 1:27
    - [Josh] Why do I need
    machine learning to discover that?
  • 1:27 - 1:29
    It seems like -- because
    I can discover it simply.
  • 1:29 - 1:30
    - [Guido] So in general --
  • 1:30 - 1:32
    - [Josh] There were lots
    of possible...
  • 1:32 - 1:35
    - You want to think about
    there being lots of characteristics
  • 1:35 - 1:37
    of the items,
  • 1:38 - 1:42
    that you want to understand
    what drives the heterogeneity
  • 1:42 - 1:43
    in the effect of --
  • 1:43 - 1:45
    - But you're just predicting
  • 1:45 - 1:48
    In some sense, you're solving
    a marketing problem.
  • 1:48 - 1:49
    - No, it's a causal effect,
  • 1:49 - 1:52
    - It's causal, but it has
    no scientific content.
  • 1:52 - 1:53
    Think about...
  • 1:54 - 1:57
    - No, but there's similar things
    in medical settings.
  • 1:58 - 2:01
    If you do an experiment,
    you may actually be very interested
  • 2:01 - 2:04
    in whether the treatment works
    for some groups or not.
  • 2:04 - 2:06
    And you have a lot
    of individual characteristics,
  • 2:06 - 2:08
    and you want
    to systematically search --
  • 2:08 - 2:10
    - Yeah. I'm skeptical about that --
  • 2:10 - 2:13
    that sort of idea that there's
    this personal causal effect
  • 2:13 - 2:14
    that I should care about,
  • 2:14 - 2:16
    and that machine learning
    can discover it
  • 2:16 - 2:17
    in some way that's useful.
  • 2:17 - 2:20
    So think about -- I've done
    a lot of work on schools,
  • 2:20 - 2:22
    going to, say, a charter school,
  • 2:22 - 2:24
    a publicly funded private school,
  • 2:25 - 2:27
    effectively,
    that's free to structure
  • 2:27 - 2:29
    its own curriculum
    for context there.
  • 2:29 - 2:31
    Some types of charter schools
  • 2:31 - 2:34
    generate spectacular
    achievement gains,
  • 2:34 - 2:36
    and in the data set
    that produces that result,
  • 2:36 - 2:38
    I have a lot of covariates.
  • 2:38 - 2:41
    So I have baseline scores,
    and I have family background,
  • 2:41 - 2:43
    the education of the parents,
  • 2:44 - 2:46
    the sex of the child,
    the race of the child.
  • 2:46 - 2:50
    And, well, as soon as I put
    half a dozen of those together,
  • 2:50 - 2:52
    I have a very
    high-dimensional space.
  • 2:52 - 2:55
    I'm definitely interested
    in coarse features
  • 2:55 - 2:56
    of that treatment effect,
  • 2:56 - 2:59
    like whether it's better for people
  • 2:59 - 3:02
    who come from
    lower-income families.
  • 3:03 - 3:06
    I have a hard time believing
    that there's an application
  • 3:07 - 3:10
    for the very high-dimensional
    version of that,
  • 3:10 - 3:12
    where I discovered
    that for non-white children
  • 3:12 - 3:15
    who have high family incomes
  • 3:15 - 3:18
    but baseline scores
    in the third quartile
  • 3:18 - 3:22
    and only went to public school
    in the third grade
  • 3:22 - 3:23
    but not the sixth grade.
  • 3:23 - 3:26
    So that's what that
    high-dimensional analysis produces.
  • 3:26 - 3:28
    It's a very elaborate
    conditional statement.
  • 3:28 - 3:31
    There's two things that are wrong
    with that in my view.
  • 3:31 - 3:32
    First, I don't see it as --
  • 3:32 - 3:34
    I just can't imagine
    why it's actionable.
  • 3:35 - 3:37
    I don't know why
    you'd want to act on it.
  • 3:37 - 3:39
    And I know also that
    there's some alternative model
  • 3:39 - 3:41
    that fits almost as well,
  • 3:42 - 3:43
    that flips everything.
  • 3:43 - 3:45
    Because machine learning
    doesn't tell me
  • 3:45 - 3:49
    that this is really
    the predictor that matters --
  • 3:49 - 3:51
    it just tells me
    that this is a good predictor.
  • 3:51 - 3:55
    And so, I think
    there is something different
  • 3:55 - 3:58
    about the social science context.
  • 3:58 - 4:00
    - [Guido] I think
    the social science applications
  • 4:00 - 4:02
    you're talking about,
  • 4:02 - 4:03
    ones where...
  • 4:03 - 4:08
    I think there's not a huge amount
    of heterogeneity in the effects.
  • 4:10 - 4:12
    - [Josh] Well, there might be
    if you allow me
  • 4:12 - 4:13
    to fill that space.
  • 4:13 - 4:16
    - No... not even then.
  • 4:16 - 4:19
    I think for a lot
    of those interventions,
  • 4:19 - 4:23
    you would expect that the effect
    has the same sign for everybody.
  • 4:24 - 4:28
    There may be small differences
    in the magnitude, but it's not...
  • 4:28 - 4:30
    For a lot of these
    educational interventions --
  • 4:30 - 4:32
    they're good for everybody.
  • 4:34 - 4:36
    It's not that they're bad
    for some people
  • 4:36 - 4:38
    and good for other people,
  • 4:38 - 4:39
    with only very small pockets
  • 4:39 - 4:41
    where they're bad there.
  • 4:41 - 4:44
    But there may be some variation
    in the magnitude,
  • 4:44 - 4:47
    but you would need very,
    very big data sets to find those.
  • 4:48 - 4:49
    I agree that in those cases,
  • 4:49 - 4:51
    they probably wouldn't be
    very actionable anyway.
  • 4:52 - 4:54
    But I think there's a lot
    of other settings
  • 4:54 - 4:57
    where there is
    much more heterogeneity.
  • 4:57 - 4:59
    - Well, I'm open
    to that possibility,
  • 4:59 - 5:05
    and I think the example you gave
    is essentially a marketing example.
  • 5:06 - 5:10
    - No, those have
    implications for it
  • 5:10 - 5:11
    and for the organization,
  • 5:12 - 5:14
    whether you need
    to worry about the...
  • 5:15 - 5:18
    - Well, I need to see that paper.
  • 5:18 - 5:21
    - So the sense
    I'm getting is that --
  • 5:21 - 5:24
    - We still disagree on something.
    - Yes. [laughter]
  • 5:24 - 5:25
    - We haven't converged
    on everything.
  • 5:25 - 5:27
    - I'm getting that sense.
    [laughter]
  • 5:27 - 5:29
    - Actually, we've diverged on this
  • 5:29 - 5:31
    because this wasn't around
    to argue about.
  • 5:31 - 5:32
    [laughter]
  • 5:33 - 5:35
    - Is it getting a little warm here?
  • 5:36 - 5:38
    - Warmed up. Warmed up is good.
  • 5:38 - 5:41
    The sense I'm getting is,
    Josh, you're not saying
  • 5:41 - 5:43
    that you're confident
    that there is no way
  • 5:43 - 5:45
    that there is an application
    where this stuff is useful;
  • 5:45 - 5:47
    you're saying
    you're unconvinced
  • 5:47 - 5:49
    by the existing
    applications to date.
  • 5:50 - 5:52
    - Fair enough.
    - I'm very confident.
  • 5:52 - 5:54
    [laughter]
  • 5:54 - 5:55
    - In this case.
  • 5:55 - 5:57
    - I think Josh does have a point
  • 5:57 - 6:00
    that even in the prediction cases
  • 6:02 - 6:05
    where a lot of the machine learning
    methods really shine
  • 6:05 - 6:07
    is where there's just a lot
    of heterogeneity.
  • 6:07 - 6:11
    - You don't really care much
    about the details there, right?
  • 6:11 - 6:12
    - [Guido] Yes.
  • 6:12 - 6:15
    - It doesn't have
    a policy angle or something.
  • 6:15 - 6:18
    - Things like recognizing
    handwritten digits and stuff --
  • 6:19 - 6:20
    it does much better there
  • 6:20 - 6:24
    than building
    some complicated model.
  • 6:24 - 6:28
    But a lot of the social science,
    a lot of the economic applications,
  • 6:28 - 6:30
    we actually know a huge amount
    about the relationship
  • 6:30 - 6:32
    between the variables.
  • 6:32 - 6:35
    A lot of the relationships
    are strictly monotone.
  • 6:37 - 6:39
    Education is going to increase
    people's earnings,
  • 6:40 - 6:42
    irrespective of the demographic,
  • 6:42 - 6:45
    irrespective of the level
    of education you already have.
  • 6:45 - 6:46
    - Until they get to a Ph.D.
  • 6:46 - 6:48
    - We don't have proof
    for graduate school...
  • 6:48 - 6:49
    [laughter]
  • 6:49 - 6:51
    - Over a reasonable range.
  • 6:52 - 6:55
    It's not going
    to go down very much.
  • 6:56 - 6:58
    In a lot of the settings
  • 6:58 - 7:00
    where these machine learning
    methods shine,
  • 7:00 - 7:02
    there's a lot of non-monotonicity
  • 7:02 - 7:05
    and multimodality
    in these relationships,
  • 7:05 - 7:09
    and they're going to be
    very powerful.
  • 7:09 - 7:12
    But I still stand by that.
  • 7:12 - 7:15
    These methods just have
    a huge amount to offer
  • 7:16 - 7:18
    for economists,
  • 7:18 - 7:22
    and they're going to be
    a big part of the future.
  • 7:22 - 7:23
    ♪ [music] ♪
  • 7:23 - 7:25
    - [Isaiah] It feels like
    there's something interesting
  • 7:25 - 7:26
    to be said about
    machine learning here.
  • 7:26 - 7:28
    So, Guido, I was wondering,
    could you give some more...
  • 7:28 - 7:30
    maybe some examples
    of the sorts of examples
  • 7:30 - 7:31
    you're thinking about
  • 7:31 - 7:32
    with applications [inaudible]
    at the moment?
  • 7:32 - 7:34
    - So one area is where
  • 7:35 - 7:37
    instead of looking
    for average causal effects,
  • 7:37 - 7:39
    we're looking for
    individualized estimates,
  • 7:41 - 7:43
    predictions of causal effects,
  • 7:43 - 7:47
    and the machine learning algorithms
    have been very effective.
  • 7:48 - 7:51
    Traditionally, we would have done
    these things using kernel methods,
  • 7:51 - 7:54
    and theoretically, they work great,
  • 7:54 - 7:56
    and there's some arguments
  • 7:56 - 7:58
    that, formally,
    you can't do any better.
  • 7:58 - 8:00
    But in practice,
    they don't work very well.
  • 8:01 - 8:04
    The causal forest-type things
  • 8:04 - 8:07
    that Stefan Wager and Susan Athey
    have been working on
  • 8:07 - 8:09
    are used very widely.
  • 8:09 - 8:12
    They've been very effective
    in these settings
  • 8:12 - 8:19
    to actually get causal effects
    that vary by covariate.
  • 8:21 - 8:24
    I think this is still just
    the beginning of these methods.
  • 8:24 - 8:26
    But in many cases,
  • 8:27 - 8:32
    these algorithms are very effective
    at searching over big spaces
  • 8:32 - 8:37
    and finding the functions
    that fit very well
  • 8:37 - 8:41
    in ways that we couldn't
    really do beforehand.
  • 8:42 - 8:43
    - I don't know of an example
  • 8:43 - 8:45
    where machine learning
    has generated insights
  • 8:45 - 8:48
    about a causal effect
    that I'm interested in.
  • 8:48 - 8:50
    And I do know of examples
  • 8:50 - 8:51
    where it's potentially
    very misleading.
  • 8:51 - 8:54
    So I've done some work
    with Brigham Frandsen,
  • 8:54 - 8:58
    using, for example, random forest
    to model covariate effects
  • 8:58 - 9:00
    in an instrumental
    variables problem
  • 9:00 - 9:03
    where you need
    to condition on covariates.
  • 9:04 - 9:07
    And you don't particularly
    have strong feelings
  • 9:07 - 9:08
    about the functional form for that,
  • 9:08 - 9:10
    so maybe you should curve...
  • 9:11 - 9:13
    be open to flexible curve fitting,
  • 9:13 - 9:15
    And that leads you down a path
  • 9:15 - 9:17
    where there's a lot
    of nonlinearities in the model,
  • 9:17 - 9:20
    and that's very dangerous with IV
  • 9:20 - 9:23
    because any sort
    of excluded non-linearity
  • 9:23 - 9:26
    potentially generates
    a spurious causal effect,
  • 9:26 - 9:29
    and Brigham and I showed that
    very powerfully, I think,
  • 9:29 - 9:32
    in the case of two instruments
  • 9:33 - 9:35
    that come from a paper of mine
    with Bill Evans,
  • 9:35 - 9:38
    where if you replace it...
  • 9:39 - 9:41
    a traditional two-stage
    least squares estimator
  • 9:41 - 9:43
    with some kind of random forest,
  • 9:43 - 9:47
    you get very precisely estimated
    nonsense estimates.
  • 9:49 - 9:51
    I think that's a big caution.
  • 9:52 - 9:55
    In view of those findings,
    in an example I care about
  • 9:55 - 9:57
    where the instruments
    are very simple
  • 9:57 - 9:59
    and I believe that they're valid,
  • 9:59 - 10:01
    I would be skeptical of that.
  • 10:03 - 10:06
    Non-linearity and IV
    don't mix very comfortably.
  • 10:06 - 10:09
    - No, it sounds like that's already
    a more complicated...
  • 10:10 - 10:12
    - Well, it's IV...
    - Yeah.
  • 10:13 - 10:14
    - ...but then we work on that.
  • 10:14 - 10:16
    [laughter]
  • 10:16 - 10:17
    - Fair enough.
  • 10:17 - 10:19
    ♪ [music] ♪
  • 10:19 - 10:20
    - [Guido] As an editor
    of Econometrica,
  • 10:20 - 10:22
    a lot of these papers
    cross my desk,
  • 10:23 - 10:27
    but the motivation is not clear
  • 10:28 - 10:30
    and, in fact, really lacking.
  • 10:30 - 10:31
    They're not...
  • 10:32 - 10:35
    Bickel-type semiparametric
    foundational papers.
  • 10:35 - 10:37
    So that's a big problem.
  • 10:39 - 10:43
    A related problem is that we have
    this tradition in econometrics
  • 10:43 - 10:47
    of being very focused
    on these formal asymptotic results.
  • 10:49 - 10:53
    We just have a lot of papers
    where people propose a method,
  • 10:53 - 10:56
    and then they establish
    the asymptotic properties
  • 10:56 - 10:59
    in a very kind of standardized way.
  • 11:01 - 11:02
    - Is that bad?
  • 11:03 - 11:06
    - Well, I think it's sort
    of closed the door
  • 11:06 - 11:09
    for a lot of work
    that doesn't fit into that
  • 11:09 - 11:12
    where in the machine
    learning literature,
  • 11:12 - 11:13
    a lot of things
    are more algorithmic.
  • 11:14 - 11:18
    People had algorithms
    for coming up with predictions
  • 11:19 - 11:21
    that turn out
    to actually work much better
  • 11:21 - 11:24
    than, say, nonparametric
    kernel regression.
  • 11:24 - 11:27
    For a long time, we were doing all
    the nonparametrics in econometrics,
  • 11:27 - 11:29
    and we were using
    kernel regression,
  • 11:29 - 11:31
    and that was great
    for proving theorems.
  • 11:31 - 11:33
    You could get confidence intervals
  • 11:33 - 11:35
    and consistency,
    and asymptotic normality,
  • 11:35 - 11:36
    and it was all great.
  • 11:36 - 11:37
    But it wasn't very useful.
  • 11:37 - 11:39
    And the things they did
    in machine learning
  • 11:39 - 11:41
    are just way, way better.
  • 11:41 - 11:43
    But they didn't have the proofs --
  • 11:43 - 11:44
    - That's not my beef
    with machine learning,
  • 11:44 - 11:46
    that the theory is weak.
  • 11:46 - 11:47
    [laughter]
  • 11:47 - 11:51
    - No, but I'm saying there,
    for the prediction part,
  • 11:51 - 11:52
    it does much better.
  • 11:52 - 11:54
    - Yeah, it's better
    curve fitting.
  • 11:55 - 11:58
    - But it did so in a way
  • 11:58 - 12:00
    that would not have made
    those papers
  • 12:00 - 12:04
    initially easy to get into
    the econometrics journals,
  • 12:04 - 12:06
    because it wasn't proving
    the type of things...
  • 12:07 - 12:10
    When Breiman was doing
    his regression trees --
  • 12:10 - 12:11
    they just didn't fit in.
  • 12:13 - 12:15
    I think he would have had
    a very hard time
  • 12:15 - 12:18
    publishing these things
    in econometrics journals.
  • 12:20 - 12:24
    I think we've limited
    ourselves too much
  • 12:25 - 12:28
    and that led us to close things off
  • 12:28 - 12:30
    for a lot of these
    machine-learning methods
  • 12:30 - 12:31
    that are actually very useful.
  • 12:31 - 12:34
    I mean, I think, in general,
  • 12:35 - 12:37
    that literature,
    the computer scientists,
  • 12:37 - 12:40
    have brought a huge number
    of these algorithms there --
  • 12:41 - 12:43
    have proposed a huge number
    of these algorithms
  • 12:43 - 12:44
    that actually are very useful,
  • 12:44 - 12:46
    and that are affecting
  • 12:46 - 12:49
    the way we're going
    to be doing empirical work.
  • 12:50 - 12:52
    But we've not fully
    internalized that
  • 12:52 - 12:54
    because we're still very focused
  • 12:54 - 12:58
    on getting point estimates
    and getting standard errors
  • 12:59 - 13:00
    and getting P values
  • 13:00 - 13:03
    in a way that we need
    to move beyond
  • 13:03 - 13:06
    to fully harness the force,
  • 13:07 - 13:08
    the benefits
  • 13:08 - 13:11
    from the machine learning literature.
  • 13:11 - 13:14
    - On the one hand, I guess I very
    much take your point
  • 13:14 - 13:17
    that sort of the traditional
    econometrics framework
  • 13:17 - 13:20
    of propose a method,
    prove a limit theorem
  • 13:20 - 13:24
    under some asymptotic story,
  • 13:24 - 13:27
    publish the paper -- is constraining,
  • 13:27 - 13:30
    and that, in some sense,
    by thinking more broadly
  • 13:30 - 13:32
    about what a methods paper
    could look like,
  • 13:32 - 13:33
    we may, in some sense...
  • 13:33 - 13:35
    Certainly, the machine
    learning literature
  • 13:35 - 13:37
    has found a bunch of things
    which seem to work quite well
  • 13:37 - 13:38
    for a number of problems
  • 13:38 - 13:41
    and are now having
    substantial influence in economics.
  • 13:41 - 13:43
    I guess a question
    I'm interested in
  • 13:43 - 13:46
    is how do you think
    about the role of...
  • 13:49 - 13:51
    Do you think there is no value
    in the theory part of it?
  • 13:52 - 13:54
    Because I guess a question
    that I often have
  • 13:54 - 13:57
    when seeing the output
    from a machine learning tool,
  • 13:57 - 13:58
    and actually a number
    of the methods
  • 13:58 - 13:59
    that you talked about
  • 13:59 - 14:01
    actually do have
    inferential results
  • 14:01 - 14:02
    developed for them,
  • 14:03 - 14:04
    something that
    I always wonder about,
  • 14:04 - 14:07
    a sort of uncertainty
    quantification and just...
  • 14:07 - 14:08
    I have my prior,
  • 14:08 - 14:11
    I come into the world with my view,
    I see the result of this thing.
  • 14:11 - 14:12
    How should I update based on it?
  • 14:12 - 14:14
    And in some sense,
    if I'm in a world
  • 14:14 - 14:16
    where things
    are normally distributed,
  • 14:16 - 14:17
    I know how to do it --
  • 14:17 - 14:18
    here I don't.
  • 14:18 - 14:21
    And so I'm interested to hear
    what you think about that.
  • 14:22 - 14:24
    - I don't see this
    as sort of saying, well,
  • 14:25 - 14:27
    these results are not interesting,
  • 14:27 - 14:28
    but there are going to be a lot of cases
  • 14:28 - 14:30
    where it's going to be incredibly
    hard to get those results,
  • 14:30 - 14:32
    and we may not
    be able to get there,
  • 14:32 - 14:35
    and we may need to do it in stages
  • 14:35 - 14:36
    where first someone says,
  • 14:36 - 14:41
    "Hey, I have
    this interesting algorithm
  • 14:41 - 14:42
    for doing something,
  • 14:42 - 14:48
    and it works well
    by some criterion there
  • 14:48 - 14:50
    on this particular data set,
  • 14:51 - 14:53
    and we should put it out there,"
  • 14:53 - 14:55
    and maybe someone
    will figure out a way
  • 14:55 - 14:58
    that you can later actually
    still do inference
  • 14:58 - 14:59
    under some condition,
  • 14:59 - 15:02
    and maybe those are not
    particularly realistic conditions,
  • 15:02 - 15:04
    then we kind of go further.
  • 15:04 - 15:08
    But I think we've been
    constraining things too much
  • 15:08 - 15:10
    where we said,
  • 15:10 - 15:13
    "This is the type of things
    that we need to do."
  • 15:13 - 15:15
    And in some sense,
  • 15:16 - 15:18
    that goes back
    to the way Josh and I
  • 15:20 - 15:22
    thought about things for the local
    average treatment effect.
  • 15:22 - 15:23
    That wasn't quite the way
  • 15:23 - 15:25
    people were thinking
    about these problems before.
  • 15:26 - 15:29
    There was a sense
    that some of the people said
  • 15:30 - 15:32
    the way you need to do
    these things is you first say
  • 15:32 - 15:34
    what you're interested
    in estimating,
  • 15:34 - 15:38
    and then you do the best job
    you can in estimating that.
  • 15:38 - 15:44
    And what you guys are doing
    is you're doing it backwards.
  • 15:44 - 15:47
    You kind of say,
    "Here, I have an estimator,
  • 15:47 - 15:51
    and now I'm going to figure out
    what it's estimating."
  • 15:51 - 15:54
    And I suppose you're going to say
    why you think that's interesting
  • 15:54 - 15:57
    or maybe why it's not interesting,
    and that's not okay.
  • 15:57 - 15:59
    You're not allowed
    to do that in that way.
  • 15:59 - 16:02
    And I think we should
    just be a little bit more flexible
  • 16:02 - 16:07
    in thinking about
    how to look at problems
  • 16:07 - 16:08
    because I think
    we've missed some things
  • 16:08 - 16:11
    by not doing that.
  • 16:11 - 16:13
    ♪ [music] ♪
  • 16:13 - 16:15
    - [Josh] So you've heard
    our views, Isaiah,
  • 16:15 - 16:18
    and you've seen that we have
    some points of disagreement.
  • 16:18 - 16:20
    Why don't you referee
    this dispute for us?
  • 16:21 - 16:22
    [laughter]
  • 16:22 - 16:25
    - Oh, it's so nice of you
    to ask me a small question.
  • 16:25 - 16:26
    [laughter]
  • 16:26 - 16:28
    So I guess, for one,
  • 16:28 - 16:33
    I very much agree with something
    that Guido said earlier of...
  • 16:34 - 16:36
    [laughter]
  • 16:36 - 16:37
    So one thing where it seems
  • 16:37 - 16:40
    the case for machine learning
    is relatively clear
  • 16:40 - 16:43
    is in settings where
    we're interested in some version
  • 16:43 - 16:45
    of a nonparametric
    prediction problem.
  • 16:45 - 16:46
    So I'm interested in estimating
  • 16:46 - 16:50
    a conditional expectation
    or conditional probability,
  • 16:50 - 16:52
    and in the past, maybe
    I would have run a kernel...
  • 16:52 - 16:54
    I would have run
    a kernel regression
  • 16:54 - 16:55
    or I would have run
    a series regression,
  • 16:55 - 16:57
    or something along those lines.
  • 16:58 - 17:00
    It seems like, at this point,
    we have a fairly good sense
  • 17:00 - 17:03
    that in a fairly wide range
    of applications,
  • 17:03 - 17:06
    machine learning methods
    seem to do better
  • 17:06 - 17:09
    for estimating conditional
    mean functions,
  • 17:09 - 17:10
    or conditional probabilities,
  • 17:10 - 17:12
    or various other
    nonparametric objects
  • 17:12 - 17:15
    than more traditional
    nonparametric methods
  • 17:15 - 17:17
    that were studied
    in econometrics and statistics,
  • 17:17 - 17:19
    especially in
    high-dimensional settings.
  • 17:20 - 17:22
    - So you're thinking of maybe
    the propensity score
  • 17:22 - 17:23
    or something like that?
  • 17:23 - 17:25
    - Yeah, exactly,
    - Nuisance functions.
  • 17:25 - 17:27
    - Yeah, so things
    like propensity scores.
  • 17:28 - 17:30
    Even objects of more direct
  • 17:30 - 17:32
    interest, like conditional
    average treatment effects,
  • 17:32 - 17:35
    which are the difference of two
    conditional expectation functions,
  • 17:35 - 17:37
    potentially things like that.
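Isaiah describes conditional average treatment effects as the difference of two conditional expectation functions. As a simplified stand-in for the causal-forest methods mentioned earlier (not the Wager-Athey algorithm itself), here is a sketch that estimates that difference with plain k-nearest-neighbor averages on simulated experimental data; the data-generating process and all parameters are illustrative assumptions.

```python
import random

random.seed(2)
n = 6000

# Illustrative randomized experiment with heterogeneous effects:
# the true effect is tau(x) = 2x, growing with the covariate.
data = []
for _ in range(n):
    x = random.uniform(0, 1)
    t = random.random() < 0.5
    y = x + (2 * x if t else 0) + random.gauss(0, 0.5)
    data.append((x, t, y))

treated = [(x, y) for x, t, y in data if t]
control = [(x, y) for x, t, y in data if not t]

def knn_mean(points, x0, k):
    """Average y over the k observations whose x is closest to x0 --
    a crude nonparametric estimate of E[Y | X = x0]."""
    nearest = sorted(points, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

def cate(x0, k=200):
    # Difference of the two estimated conditional expectation functions.
    return knn_mean(treated, x0, k) - knn_mean(control, x0, k)

print(round(cate(0.25), 2), round(cate(0.75), 2))
```

Swapping `knn_mean` for a forest-style regression is where the modern methods come in; the logic of differencing two estimated conditional means stays the same.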
  • 17:37 - 17:41
    Of course, even there,
    the theory...
  • 17:41 - 17:44
    for inference, or the theory
    for how to interpret,
  • 17:44 - 17:46
    how to make large sample statements
    about some of these things
  • 17:46 - 17:48
    is less well-developed
    depending on
  • 17:48 - 17:50
    the machine learning
    estimator used.
  • 17:50 - 17:53
    And so I think
    something that is tricky
  • 17:53 - 17:56
    is that we can have these methods
  • 17:56 - 17:58
    which seem to work
    a lot better for some purposes
  • 17:58 - 18:01
    but which we need to be a bit
    careful in how we plug them in
  • 18:01 - 18:03
    or how we interpret
    the resulting statements.
  • 18:04 - 18:06
    But, of course, that's a very,
    very active area right now
  • 18:06 - 18:08
    where people are doing
    tons of great work.
  • 18:08 - 18:11
    And so I fully expect
    and hope to see
  • 18:11 - 18:13
    much more going forward there.
  • 18:13 - 18:17
    So one issue with machine learning
    that always seems a danger is...
  • 18:17 - 18:19
    or that is sometimes a danger
  • 18:19 - 18:21
    and has sometimes
    led to applications
  • 18:21 - 18:22
    that have made less sense
  • 18:22 - 18:27
    is when folks start with a method
    that they're very excited about
  • 18:27 - 18:29
    rather than a question.
  • 18:29 - 18:30
    So sort of starting with a question
  • 18:30 - 18:34
    where here's the object
    I'm interested in,
  • 18:34 - 18:35
    here is the parameter
    of interest --
  • 18:36 - 18:40
    let me think about how I would
    identify that thing,
  • 18:40 - 18:42
    how I would recover that thing
    if I had a ton of data.
  • 18:42 - 18:44
    Oh, here's a conditional
    expectation function,
  • 18:44 - 18:47
    let me plug in a machine
    learning estimator for that --
  • 18:47 - 18:49
    that seems very, very sensible.
  • 18:49 - 18:53
    Whereas, you know,
    if I regress quantity on price
  • 18:54 - 18:56
    and say that I used
    a machine learning method,
  • 18:56 - 18:59
    maybe I'm satisfied that
    that solves the endogeneity problem
  • 18:59 - 19:01
    we're usually worried
    about there... maybe I'm not.
  • 19:02 - 19:03
    But, again, that's something
  • 19:03 - 19:06
    where the way to address it
    seems relatively clear.
  • 19:06 - 19:08
    It's to find
    your object of interest
  • 19:08 - 19:10
    and think about --
  • 19:10 - 19:11
    - Just bring in the economics.
  • 19:11 - 19:13
    - Exactly.
  • 19:13 - 19:14
    - And think about
    the heterogeneity,
  • 19:14 - 19:17
    but harness the power
    of the machine learning methods
  • 19:17 - 19:20
    for some of the components.
  • 19:20 - 19:21
    - Precisely. Exactly.
  • 19:21 - 19:24
    So the question of interest
  • 19:24 - 19:26
    is the same as the question
    of interest has always been,
  • 19:26 - 19:28
    but we now have better methods
    for estimating some pieces of this.
  • 19:30 - 19:33
    The place that seems
    harder to forecast
  • 19:33 - 19:36
    is obviously there's
    a huge amount going on
  • 19:36 - 19:38
    in the machine learning literature,
  • 19:38 - 19:40
    and the limited ways
    of plugging it in
  • 19:40 - 19:41
    that I've referenced so far
  • 19:41 - 19:43
    are a limited piece of that.
  • 19:43 - 19:45
    So I think there are all sorts
    of other interesting questions
  • 19:45 - 19:47
    about where...
  • 19:47 - 19:49
    where does this interaction go?
    What else can we learn?
  • 19:49 - 19:53
    And that's something where
    I think there's a ton going on,
  • 19:53 - 19:54
    which seems very promising,
  • 19:54 - 19:56
    and I have no idea
    what the answer is.
  • 19:57 - 20:00
    - No, I totally agree with that,
  • 20:00 - 20:04
    but that makes it very exciting.
  • 20:04 - 20:06
    And I think there's just
    a lot of work to be done there.
  • 20:07 - 20:09
    Alright. So I say,
    he agrees with me there.
  • 20:09 - 20:10
    [laughter]
  • 20:10 - 20:12
    - I didn't say that per se.
  • 20:13 - 20:14
    ♪ [music] ♪
  • 20:14 - 20:17
    - [Narrator] If you'd like to watch
    more Nobel Conversations,
  • 20:17 - 20:18
    click here.
  • 20:18 - 20:20
    Or if you'd like to learn
    more about econometrics,
  • 20:20 - 20:23
    check out Josh's
    Mastering Econometrics series.
  • 20:24 - 20:27
    If you'd like to learn more
    about Guido, Josh, and Isaiah,
  • 20:27 - 20:29
    check out the links
    in the description.
  • 20:29 - 20:31
    ♪ [music] ♪
Title:
How Will Machine Learning Impact Economics?
ASR Confidence:
0.83
Video Language:
English
Team:
Marginal Revolution University
Duration:
20:33