♪ [music] ♪ - [Narrator] Welcome to Nobel Conversations. In this episode, Josh Angrist and Guido Imbens sit down with Isaiah Andrews to discuss and disagree over the role of machine learning in applied econometrics. - [Isaiah] So, of course, there are a lot of topics where you guys largely agree, but I'd like to turn to one where maybe you have some differences of opinion. I'd love to hear some of your thoughts about machine learning and the role that it's playing and is going to play in economics. - [Guido] I've looked at some data -- proprietary data, so there's no published paper there. There was an experiment done on a search algorithm, and the question was about ranking things and changing the ranking. And it was sort of clear that there was going to be a lot of heterogeneity there. If you look for, say, a picture of Britney Spears, it doesn't really matter where you rank it, because you're going to figure out what you're looking for whether it's in the first or second or third position of the ranking. But if you're looking for the best econometrics book, whether you put your book first or your book tenth is going to make a big difference in how often people click on it. And so there you -- - [Josh] Why do I need machine learning to discover that? It seems like I can discover it simply. - [Guido] So in general -- - [Josh] There were lots of possible... - You want to think about there being lots of characteristics of the items, and you want to understand what drives the heterogeneity in the effect of -- - But you're just predicting. In some sense, you're solving a marketing problem. - No, it's a causal effect. - It's causal, but it has no scientific content. Think about... - No, but there are similar things in medical settings. If you do an experiment, you may actually be very interested in whether the treatment works for some groups or not. And you have a lot of individual characteristics, and you want to systematically search -- - Yeah. I'm skeptical about that -- that sort of idea that there's this personal causal effect that I should care about, and that machine learning can discover it in some way that's useful. So think about -- I've done a lot of work on schools, going to, say, a charter school -- for context, that's effectively a publicly funded private school that's free to structure its own curriculum. Some types of charter schools generate spectacular achievement gains, and in the data set that produces that result, I have a lot of covariates. So I have baseline scores, and I have family background, the education of the parents, the sex of the child, the race of the child. And as soon as I put half a dozen of those together, I have a very high-dimensional space. I'm definitely interested in coarse features of that treatment effect, like whether it's better for people who come from lower-income families. I have a hard time believing that there's an application for the very high-dimensional version of that, where I discover that the effect differs for non-white children who have high family incomes but baseline scores in the third quartile, and who went to public school in the third grade but not the sixth grade. That's what the high-dimensional analysis produces: a very elaborate conditional statement. There are two things wrong with that, in my view. First, I just can't imagine why it's actionable. I don't know why you'd want to act on it.
And I know also that there's some alternative model that fits almost as well, that flips everything. Because machine learning doesn't tell me that this is really the predictor that matters -- it just tells me that this is a good predictor. And so, I think there is something different about the social science context. - [Guido] I think the social science applications you're talking about are ones where there's not a huge amount of heterogeneity in the effects. - [Josh] Well, there might be if you allow me to fill that space. - No... not even then. I think for a lot of those interventions, you would expect that the effect is the same sign for everybody. There may be small differences in the magnitude, but it's not... A lot of these educational interventions are good for everybody. It's not that they're bad for some people and good for other people, with only very small pockets where they're bad. There may be some variation in the magnitude, but you would need very, very big data sets to find it. I agree that in those cases, it probably wouldn't be very actionable anyway. But I think there are a lot of other settings where there is much more heterogeneity. - Well, I'm open to that possibility, and I think the example you gave is essentially a marketing example. - No, those have implications for the organization -- whether you need to worry about the... - Well, I need to see that paper. - So the sense I'm getting is that -- - We still disagree on something. - Yes. - We haven't converged on everything. - I'm getting that sense. [laughter] - Actually, we've diverged on this, because this wasn't around to argue about before. [laughter] - Is it getting a little warm here? - Warmed up. Warmed up is good. The sense I'm getting is, Josh, you're not saying that you're confident there's no application where this stuff is useful. You're saying you're unconvinced by the existing applications to date. - Fair enough. - I'm very confident. [laughter] - In this case. - I think Josh does have a point: even in the prediction cases, where a lot of the machine learning methods really shine is where there's just a lot of heterogeneity. - You don't really care much about the details there, right? - [Guido] Yes. - It doesn't have a policy angle or something. - Recognizing handwritten digits and things like that -- machine learning does much better there than building some complicated model. But in a lot of the social science, a lot of the economic applications, we actually know a huge amount about the relationships between the variables. A lot of the relationships are strictly monotone. Education is going to increase people's earnings, irrespective of the demographic, irrespective of the level of education you already have. - Until they get to a Ph.D. - Is that true for graduate school? [laughter] - Over a reasonable range. It's not going to go down very much. In a lot of the settings where these machine learning methods shine, there's a lot of non-monotonicity, a kind of multimodality, in these relationships, and there they're going to be very powerful. But I still stand by this: these methods just have a huge amount to offer for economists, and they're going to be a big part of the future. ♪ [music] ♪ - [Isaiah] It feels like there's something interesting to be said about machine learning here. So, Guido, I was wondering, could you give some examples of the sorts of applications you're thinking about, coming out at the moment?
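[Editor's note: Imbens's point just above -- that machine learning methods shine when relationships are non-monotone and multimodal, and matter less when the truth is nearly monotone -- can be illustrated with a small simulation. Everything below (the data-generating process, the model choices, the parameter values) is invented for illustration and is not from the conversation.]

```python
# Illustrative simulation: when the true conditional mean is non-monotone
# and multimodal, a flexible learner such as a random forest fits it far
# better than a linear model can.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(-3, 3, size=(n, 1))
# A multimodal conditional mean: a sine wave plus a localized bump.
mu = np.sin(2 * X[:, 0]) + 2 * np.exp(-4 * (X[:, 0] - 1.5) ** 2)
y = mu + rng.normal(scale=0.5, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = LinearRegression().fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=20,
                               random_state=0).fit(X_tr, y_tr)

print("linear test MSE:", mean_squared_error(y_te, linear.predict(X_te)))
print("forest test MSE:", mean_squared_error(y_te, forest.predict(X_te)))
# The forest's error is typically several times smaller here; with a
# monotone, nearly linear truth the gap shrinks or disappears.
```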
- So one area is where, instead of looking for average causal effects, we're looking for individualized estimates -- predictions of causal effects -- and there, the machine learning algorithms have been very effective. Traditionally, we would have done these things using kernel methods, and theoretically, they work great; there are even arguments that, formally, you can't do any better. But in practice, they don't work very well. The causal-forest-type methods that Stefan Wager and Susan Athey have been working on are used very widely. They've been very effective in these settings at actually getting causal effects that vary by covariates. I think this is still just the beginning of these methods. But in many cases, these algorithms are very effective at searching over big spaces and finding the functions that fit very well, in ways that we couldn't really do beforehand. - I don't know of an example where machine learning has generated insights about a causal effect that I'm interested in. And I do know of examples where it's potentially very misleading. So I've done some work with Brigham Frandsen, using, for example, random forests to model covariate effects in an instrumental variables problem where you need to condition on covariates. You don't particularly have strong feelings about the functional form for that, so maybe you should be open to flexible curve fitting. And that leads you down a path where there are a lot of nonlinearities in the model, and that's very dangerous with IV, because any sort of excluded nonlinearity potentially generates a spurious causal effect. Brigham and I showed that very powerfully, I think, in the case of two instruments that come from a paper of mine with Bill Evans: if you replace a traditional two-stage least squares estimator with some kind of random forest, you get very precisely estimated nonsense. I think that's a big caution. In view of those findings -- in an example I care about, where the instruments are very simple and I believe they're valid -- I would be skeptical of that. Nonlinearity and IV don't mix very comfortably. - No, it sounds like that's already a more complicated... - Well, it's IV... - Yeah. - ...but then we work on that. [laughter] - Fair enough. ♪ [music] ♪ - [Guido] As editor of Econometrica, I see a lot of these papers cross my desk, but the motivation is often not clear and, in fact, really lacking. They're not the big, old-style semiparametric foundational papers. So that's a big problem. A related problem is that we have this tradition in econometrics of being very focused on formal asymptotic results. We just have a lot of papers where people propose a method and then establish its asymptotic properties in a very standardized way. - Is that bad? - Well, I think it has sort of closed the door on a lot of work that doesn't fit into that mold, whereas in the machine learning literature, a lot of things are more algorithmic. People had algorithms for coming up with predictions that turn out to actually work much better than, say, nonparametric kernel regression. For a long time, we were doing all the nonparametrics in econometrics using kernel regression, and that was great for proving theorems. You could get confidence intervals, consistency, and asymptotic normality, and it was all great. But it wasn't very useful. And the things they did in machine learning are just way, way better.
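[Editor's note: a small illustration of the prediction comparison Imbens describes just above. Kernel ridge regression stands in here for the classical kernel-based nonparametrics; the example, the data-generating process, and all tuning choices are invented, and the point is only qualitative.]

```python
# Illustrative comparison: a classical kernel method versus gradient
# boosting on a nonlinear, moderately high-dimensional regression where
# only a few covariates matter.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n, p = 3000, 10
X = rng.normal(size=(n, p))
# Sparse, interactive truth: only three of the ten covariates matter.
mu = np.sin(X[:, 0]) * X[:, 1] + np.maximum(X[:, 2], 0)
y = mu + rng.normal(scale=0.5, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

kernel = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1).fit(X_tr, y_tr)
boost = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

print("kernel ridge test MSE:", mean_squared_error(y_te, kernel.predict(X_te)))
print("boosting test MSE:    ", mean_squared_error(y_te, boost.predict(X_te)))
# With sparse, interactive structure in ten dimensions, the tree ensemble
# usually wins; tuning the kernel bandwidth narrows but rarely closes the gap.
```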
But they didn't have the problem -- - That's not my beef with machine learning, that the theory is weak. [laughter] - No, but I'm saying that for the prediction part, it does much better. - Yeah, it's a better curve-fitting tool. - But it did so in a way that would not have made those papers easy to get into the econometrics journals initially, because they weren't proving the type of things we expected. When Breiman was doing his regression trees, they just didn't fit in. I think he would have had a very hard time publishing those things in econometrics journals. I think we've limited ourselves too much, and that has closed things off for a lot of these machine learning methods that are actually very useful. In general, the computer scientists in that literature have proposed a huge number of these algorithms that are actually very useful and that are affecting the way we're going to be doing empirical work. But we've not fully internalized that, because we're still very focused on getting point estimates, standard errors, and p-values, in a way that we need to move beyond to fully harness the benefits of the machine learning literature. - On the one hand, I very much take your point that the traditional econometrics framework of "propose a method, prove a limit theorem under some asymptotic story, publish the paper" is constraining, and that by thinking more broadly about what a methods paper could look like, we may do better -- certainly the machine learning literature has found a bunch of things which seem to work quite well for a number of problems and are now having substantial influence in economics. I guess a question I'm interested in is how you think about the role of the theory part. Do you think there is no value in it? Because a question that I often have in seeing the output from a machine learning tool -- and a number of the methods that you talked about actually do have inferential results developed for them -- is about uncertainty quantification. I have my prior, I come into the world with my view, I see the result of this thing. How should I update based on it? In some sense, if I'm in a world where things are normally distributed, I know how to do that; here I don't. And so I'm interested to hear what you think about that. - I don't see this as saying those results are not interesting. But there are going to be a lot of cases where it's going to be incredibly hard to get those results, and we may not be able to get there. We may need to do it in stages, where first someone says, "Hey, I have this interesting algorithm for doing something, and it works well by some criterion on this particular data set," and we should put it out there. Maybe someone will later figure out a way that you can actually still do inference under some conditions -- and maybe those are not particularly realistic conditions. Then we go further. But I think we've been constraining things too much by saying, "This is the type of thing we need to do." And in some sense, that goes back to the way Josh and I thought about things for the local average treatment effect. That wasn't quite the way people were thinking about these problems before.
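[Editor's note: returning to the caution Angrist raised earlier -- that excluded nonlinearities from a machine-learned first stage can generate spurious IV estimates -- here is a stylized simulation of that mechanism. It is not the Angrist-Frandsen analysis; the data-generating process and all settings below are invented for illustration.]

```python
# Stylized illustration: plugging a random-forest first stage into IV lets
# nonlinear functions of the covariate act as excluded instruments, which
# biases the second-stage estimate even though the instrument is valid.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 20000
beta = 1.0                                       # true causal effect of D on Y
x = rng.normal(size=n)                           # observed covariate
z = rng.binomial(1, 0.5, size=n).astype(float)   # valid binary instrument
u = rng.normal(size=n)                           # unobserved confounder

d = z + x**2 + u + rng.normal(size=n)            # treatment: nonlinear in x
y = beta * d + x**2 + u + rng.normal(size=n)     # outcome: x**2 omitted below

def second_stage(d_hat):
    """OLS of y on the fitted treatment and a *linear* control for x."""
    W = np.column_stack([np.ones(n), d_hat, x])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    return coef[1]

ZX = np.column_stack([z, x])

# Linear first stage: this is numerically two-stage least squares.
d_hat_ols = LinearRegression().fit(ZX, d).predict(ZX)

# Random-forest first stage: it picks up the x**2 pattern, which then leaks
# the omitted nonlinearity into the second stage as a spurious instrument.
rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=50,
                           random_state=0)
d_hat_rf = rf.fit(ZX, d).predict(ZX)

print("true beta:          ", beta)
print("2SLS estimate:      ", second_stage(d_hat_ols))  # close to 1.0
print("RF-plug-in estimate:", second_stage(d_hat_rf))   # badly biased upward
```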
There was a sense among some people that the way you need to do these things is to first say what you're interested in estimating, and then do the best job you can in estimating it -- and that what you guys were doing was backwards: you say, "Here, I have an estimator, and now I'm going to figure out what it's estimating," and then, I suppose, you say why you think that's interesting, or maybe why it's not interesting; and that's not okay, you're not allowed to do it that way. I think we should just be a little bit more flexible in thinking about how to look at problems, because I think we've missed some things by not doing that. ♪ [music] ♪ - [Josh] So you've heard our views, Isaiah, and you've seen that we have some points of disagreement. Why don't you referee this dispute for us? [laughter] - Oh, it's so nice of you to ask me a small question. [laughter] So I guess, for one, I very much agree with something that Guido said earlier... [laughter] One place where the case for machine learning seems relatively clear is in settings where we're interested in some version of a nonparametric prediction problem. So I'm interested in estimating a conditional expectation or a conditional probability, and in the past, maybe I would have run a kernel regression or a series regression, or something along those lines. It seems like, at this point, we have a fairly good sense that in a fairly wide range of applications, machine learning methods seem to do better than the more traditional nonparametric methods studied in econometrics and statistics for estimating conditional mean functions, conditional probabilities, or various other nonparametric objects, especially in high-dimensional settings. - So you're thinking of maybe the propensity score or something like that? - Yeah, exactly. - Nuisance functions. - Yeah, so things like propensity scores. Even objects of more direct interest, like conditional average treatment effects -- which are the difference of two conditional expectation functions -- potentially things like that. Of course, even there, the theory for inference -- the theory for how to interpret these things, how to make large-sample statements about them -- is less well developed, depending on the machine learning estimator used. And so something that is tricky is that we can have these methods, which seem to work a lot better for some purposes, but where we need to be a bit careful in how we plug them in and how we interpret the resulting statements. But, of course, that's a very, very active area right now, where people are doing tons of great work, so I fully expect and hope to see much more going forward there. One issue with machine learning that is sometimes a danger -- and has sometimes led to applications that made less sense -- is when folks start with a method they're very excited about rather than with a question. Starting with a question -- here's the object I'm interested in, here's the parameter of interest; let me think about how I would identify that thing, how I would recover it if I had a ton of data; oh, here's a conditional expectation function, let me plug in a machine learning estimator for that -- seems very, very sensible.
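[Editor's note: a minimal sketch of the recipe Andrews describes just above -- identify the parameter first, then plug cross-fitted machine learning estimates of the nuisance functions (the propensity score and the outcome regressions) into a known formula for it, in the spirit of double/debiased machine learning. The data and every setting below are invented for illustration.]

```python
# Minimal sketch: estimate an average treatment effect by plugging
# cross-fitted ML estimates of the nuisance functions into the AIPW
# (doubly robust) formula.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(3)
n, p = 4000, 5
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-X[:, 0]))             # true propensity score
T = rng.binomial(1, e)
tau = 1.0                                   # true (constant) treatment effect
y = tau * T + np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=n)

scores = np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Nuisance functions are fit on the training folds only (cross-fitting).
    ps = RandomForestClassifier(min_samples_leaf=50, random_state=0)
    ps.fit(X[train], T[train])
    e_hat = np.clip(ps.predict_proba(X[test])[:, 1], 0.05, 0.95)

    mu1 = RandomForestRegressor(min_samples_leaf=20, random_state=0)
    mu0 = RandomForestRegressor(min_samples_leaf=20, random_state=0)
    mu1.fit(X[train][T[train] == 1], y[train][T[train] == 1])
    mu0.fit(X[train][T[train] == 0], y[train][T[train] == 0])
    m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])

    # AIPW score for each held-out observation.
    t, yy = T[test], y[test]
    scores[test] = (m1 - m0
                    + t * (yy - m1) / e_hat
                    - (1 - t) * (yy - m0) / (1 - e_hat))

ate = scores.mean()
se = scores.std(ddof=1) / np.sqrt(n)        # standard error from the scores
print(f"ATE estimate: {ate:.3f} (true 1.0), se: {se:.3f}")
```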
Whereas, you know, if I regress quantity on price and say that I used a machine learning method, maybe I'm satisfied that that solves the endogeneity problem we're usually worried about there -- maybe I'm not. But, again, that's something where the way to address it seems relatively clear: find your object of interest and think about -- - Just bring in the economics. - Exactly. - And think about the heterogeneity, but harness the power of the machine learning methods for some of the components. - Precisely. Exactly. So the question of interest is the same as it has always been; we just now have better methods for estimating some of the pieces. The place that seems harder to forecast is... obviously, there's a huge amount going on in the machine learning literature, and the ways of plugging it in that I've referenced so far are a limited piece of that. So I think there are all sorts of other interesting questions about where this interaction goes. What else can we learn? And that's something where I think there's a ton going on that seems very promising, and I have no idea what the answer is. - No, I totally agree with that, and that makes it very exciting. And I think there's just a lot of work to be done there. Alright. So Isaiah agrees with me there. [laughter] - I didn't say that per se. ♪ [music] ♪ - [Narrator] If you'd like to watch more Nobel Conversations, click here. Or if you'd like to learn more about econometrics, check out Josh's Mastering Econometrics series. If you'd like to learn more about Guido, Josh, and Isaiah, check out the links in the description. ♪ [music] ♪