♪ [music] ♪

- [narrator] Welcome to Nobel Conversations. In this episode, Josh Angrist and Guido Imbens sit down with Isaiah Andrews to discuss and disagree over the role of machine learning in applied econometrics.

- [Isaiah] So, of course, there are a lot of topics where you guys largely agree, but I'd like to turn to one where maybe you have some differences of opinion. I'd love to hear some of your thoughts about machine learning and the role that it's playing, and is going to play, in economics.

- [Guido] I've looked at some data, proprietary data, so there's no published paper there. There was an experiment that was done on a search algorithm, and the question was about ranking things and changing the ranking. It was clear there was going to be a lot of heterogeneity there. If you look for, say, a picture of Britney Spears, it doesn't really matter where it's ranked, because you're going to figure out what you're looking for whether it's in the first or second or third position of the ranking. But if you're looking for the best econometrics book, whether your book comes up first or tenth is going to make a big difference in how often people click on it. And so there you go --

- [Josh] Why do I need machine learning to discover that? It seems like I can discover it simply.

- [Guido] In general, there are lots of characteristics of the items, and you want to understand what drives the heterogeneity in the effect of the ranking.

- [Josh] In some sense, you're solving a marketing problem there. It's a causal effect, but it has no scientific content.

- [Guido] No, but think about similar things in medical settings. If you do an experiment, you may actually be very interested in whether the treatment works for some groups or not. You have a lot of individual characteristics, and you want to search over them systematically.

- [Josh] Yeah, I'm skeptical about that, about the idea that there's this personal causal effect that I should care about, and that machine learning can discover it in some way that's useful. Think about it this way. I've done a lot of work on schools, say, the effect of going to a charter school, which for context is effectively a publicly funded private school that's free to structure its own curriculum. Some types of charter schools generate spectacular achievement gains, and in the data set that produces that result, I have a lot of covariates: baseline scores, family background, the education of the parents, the sex of the child, the race of the child. As soon as I put half a dozen of those together, I have a very high-dimensional space. I'm definitely interested in coarse features of that treatment effect, like whether it's better for people who come from lower-income families. But I have a hard time believing that there's an application for the very high-dimensional version of that, where I discover that for non-white children who have high family incomes, but baseline scores in the third quartile, and who went to public school in the third grade but not the sixth grade... That's what that high-dimensional analysis produces: this very elaborate conditional statement. There are two things wrong with that, in my view. First, I just can't imagine why it's actionable; I don't know why you'd want to act on it.
And I know also that there's some alternative model that fits almost as well that flips everything, right? Because machine learning doesn't tell me that this is really the predictor; it just tells me that this is a good predictor. And so I think there is something different about the social science context.

- [Guido] I think the social science applications you're talking about are ones where there's not a huge amount of heterogeneity in the effects.

- [Josh] Well, there might be, if you allow me to fill that space.

- [Guido] No, not even then. For a lot of those interventions, you would expect that the effect has the same sign for everybody. There may be small differences in the magnitude, but a lot of these education interventions are good for everybody. It's not that they're bad for some people and good for other people, with only very small pockets where they're bad. There may be some variation in the magnitude, but you would need very, very big data sets to find it, and in those cases it probably wouldn't be very actionable anyway. But I think there are a lot of other settings where there is much more heterogeneity.

- [Josh] Well, I'm open to that possibility, but the example you gave is essentially a marketing example.

- [Guido] It may also have implications for the organization, for whether you need to worry about market power. But sure, I'd want to see that paper.

- [Isaiah] So the sense I'm getting is that you still disagree on something. You haven't converged on everything.

- [Josh] Yes. Actually, we've diverged on this, because this wasn't around to argue about before.

- [Isaiah] Is it getting a little warm here?

- [Josh] Yeah, warmed up. Warmed up is good.

- [Isaiah] The sense I'm getting is, Josh, you're not saying that you're confident there is no application where this stuff is useful. You're saying you're unconvinced by the existing applications.

- [Josh] To date. And fair enough: of that I'm very confident.

- [Guido] Yeah, in this case I think Josh does have a point that, today, even in the prediction cases, where a lot of the machine learning methods really shine is where there's just a lot of heterogeneity and you don't really care much about the details. It doesn't have a policy angle or something; recognizing handwritten digits and things like that. Machine learning does much better there than building some complicated model. But in a lot of the social science applications, a lot of the economic applications, we actually know a huge amount about the relationships between the variables. A lot of the relationships are strictly monotone. Education is going to increase people's earnings, irrespective of the demographic group, irrespective of the level of education you already have.

- [Josh] Until they get to a PhD.

- [Guido] Yeah, there is graduate school. But over a reasonable range, it's not going to go down very much. Whereas in a lot of the settings where these machine learning methods shine, there's a lot of non-monotonicity, a kind of multimodality, in the relationships, and there they're going to be very powerful. But I still stand by the view that these methods have a huge amount to offer economists, and they're going to be a big part of the future.
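To make the disagreement concrete, here is a minimal sketch (entirely simulated data; the learner choices are illustrative, not anything the speakers describe) of the kind of machine-led subgroup search Josh is skeptical of: estimate individual treatment effects with a simple T-learner, then summarize them with a shallow tree, whose printout is exactly the sort of elaborate conditional statement he has in mind.

```python
# A minimal sketch of machine-led subgroup search (all data simulated,
# model choices illustrative): a "T-learner" fits separate outcome
# models by treatment arm, and a shallow tree then summarizes the
# estimated individual effects as an explicit conditional statement.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
n = 5000
# Six covariates standing in for baseline score, family income, etc.
X = rng.normal(size=(n, 6))
d = rng.integers(0, 2, size=n)           # randomized treatment
tau = 0.5 + 0.3 * (X[:, 1] < 0)          # effect truly varies only with income
y = X[:, 0] + tau * d + rng.normal(size=n)

# T-learner: model each arm's outcomes, then difference the predictions.
m1 = RandomForestRegressor(random_state=0).fit(X[d == 1], y[d == 1])
m0 = RandomForestRegressor(random_state=0).fit(X[d == 0], y[d == 0])
cate_hat = m1.predict(X) - m0.predict(X)

# The tree printout is the "elaborate conditional statement":
# subgroups defined by chains of covariate splits.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, cate_hat)
print(export_text(tree, feature_names=[
    "baseline", "income", "parent_ed", "sex", "race", "grade"]))
```

Although only income matters in this simulation, the printed tree will generally also split on noise covariates, and rerunning with another seed can reshuffle those splits: a good predictor, not necessarily the true one.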
- [Isaiah] It feels like there's something interesting to be said about machine learning here. So I was wondering, could you give maybe some examples of the sorts of applications you're thinking about?

- [Guido] At the moment, areas where, instead of looking for average causal effects, we're looking for individualized estimates and predictions of causal effects. There, machine learning algorithms have been very effective. Previously we would have done these things using kernel methods, and theoretically they work great; there are even arguments that formally you can't do any better. But in practice they don't work very well. Random forests, the causal forest type methods that Stefan Wager and Susan Athey have been working on, are used very widely, and they've been very effective in these settings at actually getting causal effects that vary by covariates. I think this is still just the beginning of these methods, but in many cases these algorithms are very effective at searching over big spaces and finding the functions that fit well, in ways that we couldn't really do beforehand.

- [Josh] I don't know of an example where machine learning has generated insights about a causal effect that I'm interested in, and I do know of examples where it's potentially very misleading. I've done some work with Brigham Frandsen using, for example, random forests to model covariate effects in an instrumental variables problem, where you need to condition on covariates and you don't particularly have strong feelings about the functional form, so maybe you should be open to flexible curve fitting. That leads you down a path where there are a lot of nonlinearities in the model, and that's very dangerous with IV, because any sort of excluded nonlinearity potentially generates a spurious causal effect. Brigham and I showed that very powerfully, I think, in the case of two instruments that come from a paper of mine with Bill Evans: if you replace a traditional two-stage least squares estimator with some kind of random forest, you get very precisely estimated nonsense. I think that's a big caution. In view of those findings, in an example I care about, where the instruments are very simple and I believe they're valid, I would be skeptical of machine learning there. Nonlinearity and IV don't mix very comfortably.

- [Guido] In some sense IV is already a more complicated setting. A lot of these papers cross my desk, and often the motivation is not clear, or really lacking; they're pitched as a kind of foundational semiparametric paper. That's a big problem, and a related problem is that we have this tradition in econometrics of being very focused on formal asymptotic results. We just have a lot of papers where people propose a method and then establish its asymptotic properties in a very standardized way.

- [Josh] Is that bad?

- [Guido] Well, I think it has sort of closed the door on a lot of work that doesn't fit into that mold. In the machine learning literature, a lot of things are more algorithmic: people had algorithms for coming up with predictions that turn out to actually work much better than, say, nonparametric kernel regression.
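Backing up to Josh's IV warning for a moment: a stylized simulation can illustrate the mechanism (this is an invented data-generating process, not the Angrist-Frandsen design). With a valid instrument, conventional 2SLS with a linear control is fine even though a covariate enters nonlinearly, but swapping in-sample random forest fitted values in as the instrument lets the excluded nonlinearity back in and yields a sharply nonzero estimate of an effect that is truly zero.

```python
# Stylized simulation (invented DGP) of the excluded-nonlinearity
# problem: a machine-learned first stage can manufacture a precisely
# estimated "causal effect" that is pure artifact.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                    # observed covariate
z = rng.normal(size=n)                    # valid instrument, independent of x
u = rng.normal(size=n)                    # outcome error
v = 0.8 * u + rng.normal(size=n)          # first-stage error, endogenous
d = z + x**2 + v                          # treatment: nonlinear in x
y = 0.0 * d + x**2 + u                    # TRUE causal effect of d is zero

def tsls(y, d, instrument, controls):
    """Manual 2SLS with an intercept; returns the coefficient on d."""
    W = np.column_stack([np.ones(len(d)), instrument, controls])
    d_hat = W @ np.linalg.lstsq(W, d, rcond=None)[0]
    X2 = np.column_stack([np.ones(len(d)), d_hat, controls])
    return np.linalg.lstsq(X2, y, rcond=None)[0][1]

# Traditional 2SLS with the raw instrument and a linear control in x:
# consistent for 0, because z is unrelated to the omitted x**2 term.
print("linear 2SLS:", tsls(y, d, z, x))

# Replace the first stage with in-sample random forest fitted values:
# the fitted values absorb x**2 (and overfit v), so the estimate is
# sharply nonzero even though the true effect is zero.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
d_hat_rf = rf.fit(np.column_stack([z, x]), d).predict(np.column_stack([z, x]))
print("RF-first-stage 2SLS:", tsls(y, d, d_hat_rf, x))
```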
For a long time, when we did nonparametrics in econometrics, we did it using kernel regression, and it was great for proving theorems. You could get confidence intervals and consistency and asymptotic normality, and it was all great, but it wasn't very useful. The things they did in machine learning are just way, way better. But they didn't have the proofs.

- [Josh] That's not my beef with machine learning, though. I'm saying that for the prediction part, it does much better.

- [Guido] Yeah, it's better curve fitting. But it did so in a way that would not have made those papers easy to get into the econometrics journals at first, because they weren't proving the type of things we expected. When Breiman was doing his regression trees, that just didn't fit in, and I think he would have had a very hard time publishing those things in econometrics journals. So I think we limited ourselves too much, and that closed us off from a lot of these machine learning methods that are actually very useful. In general, the computer scientists have proposed a huge number of these algorithms that are very useful and that are affecting the way we're going to be doing empirical work. But we've not fully internalized that, because we're still very focused on getting point estimates and getting standard errors and getting p-values, in a way that we need to move beyond if we want to fully harness the benefits of the machine learning literature.
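A toy comparison in the spirit of Guido's point (simulated data; kernel ridge regression standing in for the classic kernel smoothers he mentions): with ten covariates, a threshold, and an interaction, an off-the-shelf random forest will typically attain noticeably lower test error than the kernel-based fit, with no asymptotic theory required to run it.

```python
# Toy comparison (simulated data; kernel ridge as a stand-in for
# classic kernel smoothers) of algorithmic ML vs. traditional
# nonparametrics as the dimension grows.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n, p = 2000, 10
X = rng.normal(size=(n, p))
# Nonlinear truth with a threshold and an interaction, the regime
# where fixed-bandwidth kernel methods tend to struggle.
f = np.sin(X[:, 0] * X[:, 1]) + (X[:, 2] > 0) * X[:, 3]
y = f + rng.normal(scale=0.5, size=n)
tr, te = slice(0, 1000), slice(1000, None)

kernel = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1).fit(X[tr], y[tr])
forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X[tr], y[tr])

for name, model in [("kernel ridge", kernel), ("random forest", forest)]:
    print(name, mean_squared_error(y[te], model.predict(X[te])))
```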
- [Isaiah] On the one hand, I very much take your point that the traditional econometrics framework of propose a method, prove a limit theorem under some asymptotic story, publish a paper, is constraining, and that by thinking more broadly about what a methods paper could look like, we might do better. Certainly the machine learning literature has found a bunch of things that seem to work quite well for a number of problems and are now having substantial influence in economics. A question I'm interested in is: what do you think the role of theory is? Do you think there's no value in the theory part of it? Because a question that I often have, seeing the output of a machine learning tool (and actually, a number of the methods you've talked about do have inferential results developed for them), is about uncertainty quantification. I have my prior, I come into the world with my view, I see the result of this thing: how should I update based on it? In a world where things are normally distributed, I know how to do that. Here, I don't. So I'm interested to hear your thoughts.

- [Guido] I don't see this as saying those results are not interesting. But there are going to be a lot of cases where it's incredibly hard to get those results, and we may not be able to get there; we may need to do it in stages. First someone says, "Hey, I have this interesting algorithm for doing something, and it works well by some criterion on this particular data set," and puts it out there, and maybe later someone will figure out a way that you can actually still do inference under some conditions. And maybe those are not particularly realistic conditions, and then we go further. But I think we've been constraining things too much, where we said, "This is the type of thing that you need to do." In some sense that goes back to the way Josh and I thought about things for the local average treatment effect. That wasn't quite the way people were thinking about these problems before. There was a sense among some people that the way you need to do these things is to first say what you're interested in estimating, and then do the best job you can in estimating it, and that what you guys were doing was backwards: "Here, I have an estimator, and now I'm going to figure out what it's estimating, and then, ex post, I'll say why I think that's interesting, or maybe why it's not interesting. And that's not okay; you're not allowed to do it that way." I think we should just be a little more flexible in thinking about how to look at problems, because I think we've missed some things by not doing that.

- [Josh] So you've heard our views, Isaiah. You've seen that we have some points of disagreement. Why don't you referee this dispute for us?

- [Isaiah] Oh, so nice of you to ask me such a small question. So, for one, I very much agree with something that Guido said earlier: where the case for machine learning seems relatively clear is in settings where we're interested in some version of a nonparametric prediction problem. I'm interested in estimating a conditional expectation or a conditional probability, and in the past maybe I would have run a kernel regression or a series regression or something along those lines. It seems like at this point we have a fairly good sense that, in a fairly wide range of applications, machine learning methods do better than the more traditional nonparametric methods studied in econometrics and statistics at estimating conditional mean functions, conditional probabilities, and various other nonparametric objects, especially in high-dimensional settings.

- [Guido] So you're thinking of, say, the propensity score, or something like that?

- [Isaiah] Exactly. So, nuisance functions: things like propensity scores, or even objects of more direct interest, like conditional average treatment effects, which are the difference of two conditional expectation functions, potentially things like that. Of course, even there, the theory for inference, the theory for how to interpret these things and how to make large-sample statements about them, is less well developed, depending on the machine learning estimator used. So something that is tricky is that we have these methods, which seem to work a lot better for some purposes, but we need to be a bit careful in how we plug them in and in how we interpret the resulting statements. But of course, that's a very active area right now, where people are doing tons of great work, and so I fully expect, and hope, to see much more going forward.
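One concrete version of the recipe Isaiah sketches (machine learning for the nuisance functions, with care in how the pieces are plugged in) is the cross-fitted doubly robust, or AIPW, estimator of an average treatment effect. The sketch below is illustrative rather than canonical: the random forests and the five folds are arbitrary choices, and cross-fitting is what keeps each unit's nuisance predictions out-of-sample so that the usual standard errors remain meaningful.

```python
# Sketch of a cross-fitted doubly robust (AIPW) ATE estimator:
# ML handles the nuisance functions, cross-fitting keeps each unit's
# nuisance predictions out-of-sample. Model choices are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def aipw_ate(y, d, X, n_splits=5, seed=0):
    """Cross-fitted AIPW estimate of the ATE; returns (estimate, std. error)."""
    psi = np.zeros(len(y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        # Propensity score and the two outcome regressions,
        # fit on the training folds only.
        e = np.clip(
            RandomForestClassifier(random_state=seed)
            .fit(X[train], d[train]).predict_proba(X[test])[:, 1],
            0.01, 0.99)
        mu1 = (RandomForestRegressor(random_state=seed)
               .fit(X[train][d[train] == 1], y[train][d[train] == 1])
               .predict(X[test]))
        mu0 = (RandomForestRegressor(random_state=seed)
               .fit(X[train][d[train] == 0], y[train][d[train] == 0])
               .predict(X[test]))
        # Doubly robust score, evaluated out-of-sample.
        psi[test] = (mu1 - mu0
                     + d[test] * (y[test] - mu1) / e
                     - (1 - d[test]) * (y[test] - mu0) / (1 - e))
    return psi.mean(), psi.std() / np.sqrt(len(y))

# Tiny simulated check: the true ATE is 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 5))
d = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
y = X[:, 0] + d * 1.0 + rng.normal(size=4000)
print(aipw_ate(y, d, X))
```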
- [Isaiah] One issue with machine learning that always seems a danger, or that is sometimes a danger and has sometimes led to applications that have made less sense, is when folks start with a method that they're very excited about, rather than with a question. So, starting with the question: here's the object I'm interested in, here's the parameter of interest, let me think about how I would identify that thing, how I would recover it if I had a ton of data; oh, here's a conditional expectation function, let me plug in a machine learning estimator for it. That seems very, very sensible. Whereas if I regress quantity on price and say that I used a machine learning method, maybe I'm satisfied that that solves the endogeneity problem we're usually worried about there, and maybe I'm not. But again, that's something where the way to address it seems relatively clear: define your object of interest first.

- [Guido] Is that just bringing in the economics?

- [Isaiah] Exactly.

- [Guido] And then harnessing the power of the machine learning methods precisely for some of the components?

- [Isaiah] Precisely. Exactly. So the question of interest is the same as the question of interest has always been, but we now have better methods for estimating some pieces of it. The place that seems harder to forecast is... obviously there's a huge amount going on in the machine learning literature, and the limited ways of plugging it in that I've referenced so far are a limited piece of that. So I think there are all sorts of other interesting questions about where this interaction goes and what else we can learn, and that's something where I think there's a ton going on that seems very promising, and I have no idea what the answer is.

- [Guido] No, I totally agree with that, but that's what makes it very exciting. I think there's just a lot of work still to be done there.

- [Josh] All right. So Isaiah agrees with me there, I'd say.

- [narrator] If you'd like to watch more Nobel Conversations, click here. Or, if you'd like to learn more about econometrics, check out Josh's Mastering Econometrics series. If you'd like to learn more about Guido, Josh, and Isaiah, check out the links in the description.