WEBVTT 00:00:00.100 --> 00:00:02.050 ♪ [music] ♪ 00:00:03.620 --> 00:00:05.700 - [Narrator] Welcome to Nobel Conversations. 00:00:07.000 --> 00:00:10.043 In this episode, Josh Angrist and Guido Imbens 00:00:10.043 --> 00:00:13.675 sit down with Isaiah Andrews to discuss and disagree 00:00:13.675 --> 00:00:16.580 over the role of machine learning in applied econometrics. 00:00:17.897 --> 00:00:19.769 - [Isaiah] So, of course, there are a lot of topics 00:00:19.769 --> 00:00:21.087 where you guys largely agree, 00:00:21.087 --> 00:00:22.313 but I'd like to turn to one 00:00:22.313 --> 00:00:24.240 where maybe you have some differences of opinion. 00:00:24.240 --> 00:00:25.728 I'd love to hear some of your thoughts 00:00:25.728 --> 00:00:26.883 about machine learning 00:00:26.883 --> 00:00:29.900 and the role that it's playing and is going to play in economics. 00:00:30.200 --> 00:00:33.352 - [Guido] I've looked at some data like the proprietary. 00:00:33.352 --> 00:00:35.150 We see that there's no published paper there. 00:00:36.122 --> 00:00:38.159 There was an experiment that was done 00:00:38.159 --> 00:00:39.500 on some search algorithm, 00:00:39.700 --> 00:00:41.327 and the question was -- 00:00:42.901 --> 00:00:45.600 it was about ranking things and changing the ranking. 00:00:45.900 --> 00:00:47.290 And it was sort of clear 00:00:48.400 --> 00:00:50.610 that there was going to be a lot of heterogeneity there. 00:00:52.161 --> 00:00:56.282 If you look for, say, 00:00:57.831 --> 00:01:00.617 a picture of Britney Spears -- 00:01:00.617 --> 00:01:02.493 that it doesn't really matter where you rank it 00:01:02.493 --> 00:01:05.500 because you're going to figure out what you're looking for, 00:01:06.200 --> 00:01:07.867 whether you put it in the first or second 00:01:07.867 --> 00:01:09.920 or third position of the ranking. 
00:01:10.100 --> 00:01:12.500 But if you're looking for the best econometrics book, 00:01:13.300 --> 00:01:16.430 if you put your book first or your book tenth -- 00:01:16.430 --> 00:01:18.100 that's going to make a big difference 00:01:18.600 --> 00:01:20.979 how often people are going to click on it. 00:01:21.829 --> 00:01:23.417 And so there you -- 00:01:23.417 --> 00:01:27.119 - [Josh] Why do I need machine learning to discover that? 00:01:27.119 --> 00:01:29.195 It seems like -- because I can discover it simply. 00:01:29.195 --> 00:01:30.435 - [Guido] So in general -- 00:01:30.435 --> 00:01:32.100 - [Josh] There were lots of possible... 00:01:32.100 --> 00:01:35.045 - You want to think about there being lots of characteristics 00:01:35.490 --> 00:01:37.280 of the items, 00:01:37.610 --> 00:01:41.682 that you want to understand what drives the heterogeneity 00:01:42.177 --> 00:01:43.427 in the effect of -- 00:01:43.427 --> 00:01:45.008 - But you're just predicting 00:01:45.008 --> 00:01:47.665 In some sense, you're solving a marketing problem. 00:01:47.665 --> 00:01:49.381 - No, it's a causal effect, 00:01:49.381 --> 00:01:51.911 - It's causal, but it has no scientific content. 00:01:51.911 --> 00:01:53.141 Think about... 00:01:53.657 --> 00:01:57.300 - No, but there's similar things in medical settings. 00:01:58.000 --> 00:02:01.300 If you do an experiment, you may actually be very interested 00:02:01.300 --> 00:02:03.900 in whether the treatment works for some groups or not. 00:02:03.900 --> 00:02:06.143 And you have a lot of individual characteristics, 00:02:06.143 --> 00:02:08.000 and you want to systematically search -- 00:02:08.000 --> 00:02:09.500 - Yeah. I'm skeptical about that -- 00:02:09.500 --> 00:02:12.603 that sort of idea that there's this personal causal effect 00:02:12.603 --> 00:02:14.000 that I should care about, 00:02:14.000 --> 00:02:15.740 and that machine learning can discover it 00:02:15.740 --> 00:02:17.259 in some way that's useful. 
00:02:17.259 --> 00:02:20.045 So think about -- I've done a lot of work on schools, 00:02:20.045 --> 00:02:22.336 going to, say, a charter school, 00:02:22.336 --> 00:02:24.428 a publicly funded private school, 00:02:25.225 --> 00:02:27.392 effectively, that's free to structure 00:02:27.392 --> 00:02:29.399 its own curriculum for context there. 00:02:29.399 --> 00:02:31.369 Some types of charter schools 00:02:31.369 --> 00:02:33.703 generate spectacular achievement gains, 00:02:33.703 --> 00:02:36.400 and in the data set that produces that result, 00:02:36.400 --> 00:02:37.800 I have a lot of covariates. 00:02:37.800 --> 00:02:41.353 So I have baseline scores, and I have family background, 00:02:41.353 --> 00:02:43.207 the education of the parents, 00:02:43.576 --> 00:02:45.800 the sex of the child, the race of the child. 00:02:45.800 --> 00:02:49.795 And, well, soon as I put half a dozen of those together, 00:02:49.795 --> 00:02:51.900 I have a very high-dimensional space. 00:02:52.300 --> 00:02:55.199 I'm definitely interested in coarse features 00:02:55.199 --> 00:02:56.457 of that treatment effect, 00:02:56.457 --> 00:02:58.741 like whether it's better for people 00:02:58.741 --> 00:03:02.046 who come from lower-income families. 00:03:02.600 --> 00:03:05.760 I have a hard time believing that there's an application 00:03:07.273 --> 00:03:09.872 for the very high-dimensional version of that, 00:03:09.872 --> 00:03:12.406 where I discovered that for non-white children 00:03:12.406 --> 00:03:14.971 who have high family incomes 00:03:14.971 --> 00:03:17.800 but baseline scores in the third quartile 00:03:18.166 --> 00:03:21.785 and only went to public school in the third grade 00:03:21.785 --> 00:03:23.000 but not the sixth grade. 00:03:23.000 --> 00:03:25.796 So that's what that high-dimensional analysis produces. 00:03:25.800 --> 00:03:28.100 It's a very elaborate conditional statement. 
00:03:28.300 --> 00:03:30.605 There's two things that are wrong with that in my view. 00:03:30.605 --> 00:03:31.797 First, I don't see it as -- 00:03:31.797 --> 00:03:34.000 I just can't imagine why it's actionable. 00:03:34.600 --> 00:03:36.600 I don't know why you'd want to act on it. 00:03:36.600 --> 00:03:39.455 And I know also that there's some alternative model 00:03:39.455 --> 00:03:41.200 that fits almost as well, 00:03:41.800 --> 00:03:43.398 that flips everything. 00:03:43.398 --> 00:03:45.350 Because machine learning doesn't tell me 00:03:45.350 --> 00:03:48.582 that this is really the predictor that matters -- 00:03:48.582 --> 00:03:50.965 it just tells me that this is a good predictor. 00:03:51.486 --> 00:03:54.870 And so, I think there is something different 00:03:54.870 --> 00:03:57.586 about the social science context. 00:03:57.940 --> 00:04:00.186 - [Guido] I think the social science applications 00:04:00.186 --> 00:04:01.633 you're talking about 00:04:01.633 --> 00:04:02.735 are ones where, 00:04:03.400 --> 00:04:08.100 I think, there's not a huge amount of heterogeneity in the effects. 00:04:09.777 --> 00:04:11.544 - [Josh] Well, there might be if you allow me 00:04:11.544 --> 00:04:13.466 to fill that space. 00:04:13.466 --> 00:04:15.740 - No... not even then. 00:04:15.740 --> 00:04:18.614 I think for a lot of those interventions, 00:04:18.614 --> 00:04:22.765 you would expect that the effect is the same sign for everybody. 00:04:24.367 --> 00:04:27.600 There may be small differences in the magnitude, but it's not... 00:04:28.200 --> 00:04:30.232 For a lot of these educational interventions -- 00:04:30.232 --> 00:04:31.700 they're good for everybody. 00:04:34.169 --> 00:04:36.034 It's not that they're bad for some people 00:04:36.034 --> 00:04:37.600 and good for other people, 00:04:37.600 --> 00:04:39.100 and that there's kind of very small pockets 00:04:39.100 --> 00:04:40.900 where they're bad there. 
00:04:40.900 --> 00:04:44.011 But there may be some variation in the magnitude, 00:04:44.011 --> 00:04:46.955 but you would need very, very big data sets to find those. 00:04:47.906 --> 00:04:49.078 I agree that in those cases, 00:04:49.078 --> 00:04:51.400 they probably wouldn't be very actionable anyway. 00:04:51.700 --> 00:04:53.800 But I think there's a lot of other settings 00:04:54.100 --> 00:04:56.600 where there is much more heterogeneity. 00:04:57.250 --> 00:04:59.102 - Well, I'm open to that possibility, 00:04:59.102 --> 00:05:04.918 and I think the example you gave is essentially a marketing example. 00:05:06.315 --> 00:05:09.656 - No, those have implications for it 00:05:09.656 --> 00:05:11.069 and that's the organization, 00:05:12.252 --> 00:05:14.330 whether you need to worry about the... 00:05:15.469 --> 00:05:17.900 - Well, I need to see that paper. 00:05:18.400 --> 00:05:21.072 - So the sense I'm getting is that -- 00:05:21.467 --> 00:05:23.996 - We still disagree on something. - Yes. [laughter] 00:05:23.996 --> 00:05:25.440 - We haven't converged on everything. 00:05:25.440 --> 00:05:27.200 - I'm getting that sense. [laughter] 00:05:27.200 --> 00:05:28.679 - Actually, we've diverged on this 00:05:28.679 --> 00:05:30.833 because this wasn't around to argue about. 00:05:30.833 --> 00:05:32.334 [laughter] 00:05:33.057 --> 00:05:34.771 - Is it getting a little warm here? 00:05:35.820 --> 00:05:38.147 - Warmed up. Warmed up is good. 00:05:38.147 --> 00:05:41.187 The sense I'm getting is, Josh, you're not saying 00:05:41.187 --> 00:05:43.289 that you're confident that there is no way 00:05:43.289 --> 00:05:45.017 that there is an application where this stuff 00:05:45.017 --> 00:05:47.028 is useful. You are saying you are unconvinced 00:05:47.028 --> 00:05:49.487 by the existing applications to date. 00:05:49.917 --> 00:05:52.022 - Fair enough. - I'm very confident. 00:05:52.022 --> 00:05:53.704 [laughter] 00:05:54.156 --> 00:05:55.189 - In this case. 
00:05:55.189 --> 00:05:56.555 - I think Josh does have a point 00:05:56.555 --> 00:06:00.452 that even in the prediction cases 00:06:01.639 --> 00:06:04.519 where a lot of the machine learning methods really shine 00:06:04.519 --> 00:06:06.738 is where there's just a lot of heterogeneity. 00:06:07.300 --> 00:06:10.769 - You don't really care much about the details there, right? 00:06:10.769 --> 00:06:11.836 - [Guido] Yes. 00:06:11.836 --> 00:06:15.000 - It doesn't have a policy angle or something. 00:06:15.200 --> 00:06:18.232 - The kind of recognizing handwritten digits and stuff -- 00:06:18.795 --> 00:06:20.090 it does much better there 00:06:20.090 --> 00:06:24.000 than building some complicated model. 00:06:24.400 --> 00:06:28.183 But a lot of the social science, a lot of the economic applications, 00:06:28.183 --> 00:06:30.383 we actually know a huge amount about the relationship 00:06:30.383 --> 00:06:32.100 between the variables. 00:06:32.100 --> 00:06:34.700 A lot of the relationships are strictly monotone. 00:06:37.166 --> 00:06:39.416 Education is going to increase people's earnings, 00:06:39.697 --> 00:06:41.950 irrespective of the demographic, 00:06:41.950 --> 00:06:44.930 irrespective of the level of education you already have. 00:06:44.930 --> 00:06:46.180 - Until they get to a Ph.D. 00:06:46.180 --> 00:06:47.956 - They don't have proof of graduate school... 00:06:47.956 --> 00:06:49.227 [laughter] 00:06:49.227 --> 00:06:50.700 - Over a reasonable range. 00:06:51.600 --> 00:06:55.488 It's not going to go down very much. 00:06:56.100 --> 00:06:58.121 In a lot of the settings 00:06:58.121 --> 00:07:00.100 where these machine learning methods shine, 00:07:00.100 --> 00:07:01.900 there's a lot of non-monotonicity, 00:07:02.100 --> 00:07:04.900 kind of multimodality in these relationships, 00:07:05.300 --> 00:07:08.921 and they're going to be very powerful. 00:07:08.921 --> 00:07:11.787 But I still stand by that. 
00:07:12.410 --> 00:07:14.975 These methods just have a huge amount to offer 00:07:15.925 --> 00:07:17.561 for economists, 00:07:17.561 --> 00:07:21.700 and they're going to be a big part of the future. 00:07:21.930 --> 00:07:23.240 ♪ [music] ♪ 00:07:23.240 --> 00:07:24.600 - [Isaiah] It feels like there's something interesting 00:07:24.600 --> 00:07:25.800 to be said about machine learning here. 00:07:25.800 --> 00:07:28.000 So, Guido, I was wondering, could you give some more... 00:07:28.000 --> 00:07:29.717 maybe some examples of the sorts of examples 00:07:29.717 --> 00:07:30.758 you're thinking about 00:07:30.758 --> 00:07:32.500 with applications [inaudible] at the moment? 00:07:32.500 --> 00:07:34.182 - So one area is where 00:07:34.700 --> 00:07:36.947 instead of looking for average causal effects, 00:07:36.947 --> 00:07:39.350 we're looking for individualized estimates, 00:07:41.354 --> 00:07:43.288 predictions of causal effects, 00:07:43.288 --> 00:07:47.337 and the machine learning algorithms have been very effective. 00:07:48.031 --> 00:07:51.415 Traditionally, we would have done these things using kernel methods, 00:07:51.415 --> 00:07:54.003 and theoretically, they work great, 00:07:54.003 --> 00:07:55.636 and there's some arguments 00:07:55.636 --> 00:07:57.612 that, formally, you can't do any better. 00:07:57.612 --> 00:07:59.579 But in practice, they don't work very well. 00:08:00.900 --> 00:08:03.527 Random causal forest-type things 00:08:03.527 --> 00:08:06.916 that Stefan Wager and Susan Athey have been working on 00:08:06.916 --> 00:08:09.453 are used very widely. 00:08:09.453 --> 00:08:12.200 They've been very effective in these settings 00:08:12.400 --> 00:08:18.898 to actually get causal effects that vary by covariate. 00:08:20.700 --> 00:08:23.734 I think this is still just the beginning of these methods. 
00:08:23.734 --> 00:08:25.700 But in many cases, 00:08:27.351 --> 00:08:31.600 these algorithms are very effective at searching over big spaces 00:08:31.800 --> 00:08:37.133 and finding the functions that fit very well 00:08:37.133 --> 00:08:40.948 in ways that we couldn't really do beforehand. 00:08:41.500 --> 00:08:42.697 - I don't know of an example 00:08:42.697 --> 00:08:45.300 where machine learning has generated insights 00:08:45.300 --> 00:08:47.664 about a causal effect that I'm interested in. 00:08:47.664 --> 00:08:49.610 And I do know of examples 00:08:49.610 --> 00:08:51.300 where it's potentially very misleading. 00:08:51.300 --> 00:08:53.700 So I've done some work with Brigham Frandsen, 00:08:54.100 --> 00:08:57.782 using, for example, random forest to model covariate effects 00:08:57.782 --> 00:09:00.269 in an instrumental variables problem 00:09:00.269 --> 00:09:03.375 where you need to condition on covariates. 00:09:04.400 --> 00:09:06.531 And you don't particularly have strong feelings 00:09:06.531 --> 00:09:08.200 about the functional form for that, 00:09:08.200 --> 00:09:10.000 so maybe you should curve... 00:09:10.900 --> 00:09:12.804 be open to flexible curve fitting. 00:09:12.804 --> 00:09:14.501 And that leads you down a path 00:09:14.501 --> 00:09:16.853 where there's a lot of nonlinearities in the model, 00:09:17.384 --> 00:09:19.933 and that's very dangerous with IV 00:09:19.933 --> 00:09:23.000 because any sort of excluded non-linearity 00:09:23.300 --> 00:09:25.839 potentially generates a spurious causal effect, 00:09:25.839 --> 00:09:29.292 and Brigham and I showed that very powerfully, I think, 00:09:29.292 --> 00:09:32.200 in the case of two instruments 00:09:32.944 --> 00:09:35.113 that come from a paper of mine with Bill Evans, 00:09:35.113 --> 00:09:37.600 where if you replace it... 
00:09:38.708 --> 00:09:40.825 a traditional two-stage least squares estimator 00:09:40.825 --> 00:09:42.600 with some kind of random forest, 00:09:42.900 --> 00:09:46.807 you get very precisely estimated nonsense estimates. 00:09:49.173 --> 00:09:51.100 I think that's a big caution. 00:09:51.944 --> 00:09:55.096 In view of those findings, in an example I care about 00:09:55.096 --> 00:09:57.100 where the instruments are very simple 00:09:57.400 --> 00:09:59.100 and I believe that they're valid, 00:09:59.300 --> 00:10:01.096 I would be skeptical of that. 00:10:02.900 --> 00:10:06.435 Non-linearity and IV don't mix very comfortably. 00:10:06.435 --> 00:10:09.424 - No, it sounds like that's already a more complicated... 00:10:10.206 --> 00:10:11.842 - Well, it's IV... - Yeah. 00:10:12.591 --> 00:10:14.033 - ...but then we work on that. 00:10:14.403 --> 00:10:15.907 [laughter] 00:10:15.907 --> 00:10:17.289 - Fair enough. 00:10:17.289 --> 00:10:18.520 ♪ [music] ♪ 00:10:18.520 --> 00:10:19.931 - [Guido] As an editor of Econometrica, 00:10:19.931 --> 00:10:22.054 a lot of these papers cross my desk, 00:10:22.700 --> 00:10:26.823 but the motivation is not clear 00:10:27.555 --> 00:10:29.500 and, in fact, really lacking. 00:10:29.800 --> 00:10:31.028 They're not... 00:10:31.591 --> 00:10:34.926 [vehicle]-type semiparametric foundational papers. 00:10:35.315 --> 00:10:37.151 So that's a big problem. 00:10:38.761 --> 00:10:42.664 A related problem is that we have this tradition in econometrics 00:10:42.664 --> 00:10:46.560 of being very focused on these formal asymptotic results. 00:10:48.800 --> 00:10:53.289 We just have a lot of papers where people propose a method, 00:10:53.289 --> 00:10:55.700 and then they establish the asymptotic properties 00:10:56.300 --> 00:10:59.420 in a very kind of standardized way. 00:11:00.873 --> 00:11:02.055 - Is that bad? 
00:11:02.900 --> 00:11:06.420 - Well, I think it's sort of closed the door 00:11:06.420 --> 00:11:09.040 for a lot of work that doesn't fit into that 00:11:09.040 --> 00:11:11.600 where in the machine learning literature, 00:11:11.900 --> 00:11:13.453 a lot of things are more algorithmic. 00:11:13.808 --> 00:11:18.500 People had algorithms for coming up with predictions 00:11:18.800 --> 00:11:20.885 that turn out to actually work much better 00:11:20.885 --> 00:11:23.600 than, say, nonparametric kernel regression. 00:11:24.000 --> 00:11:26.800 For a long time, we were doing all the nonparametrics in econometrics, 00:11:26.800 --> 00:11:28.950 and we were using kernel regression, 00:11:28.950 --> 00:11:31.210 and that was great for proving theorems. 00:11:31.210 --> 00:11:32.580 You could get confidence intervals 00:11:32.580 --> 00:11:34.684 and consistency, and asymptotic normality, 00:11:34.684 --> 00:11:35.736 and it was all great. 00:11:35.736 --> 00:11:37.000 But it wasn't very useful. 00:11:37.300 --> 00:11:39.100 And the things they did in machine learning 00:11:39.100 --> 00:11:41.051 are just way, way better. 00:11:41.051 --> 00:11:42.557 But they didn't have the problem -- 00:11:42.557 --> 00:11:44.449 - That's not my beef with machine learning, 00:11:44.449 --> 00:11:45.871 that the theory is weak. 00:11:45.871 --> 00:11:47.141 [laughter] 00:11:47.141 --> 00:11:51.320 - No, but I'm saying there, for the prediction part, 00:11:51.320 --> 00:11:52.394 it does much better. 00:11:52.394 --> 00:11:54.500 - Yeah, it's a better curve-fitting tool. 00:11:54.900 --> 00:11:57.608 - But it did so in a way 00:11:57.608 --> 00:11:59.782 that would not have made those papers 00:11:59.782 --> 00:12:04.234 initially easy to get into the econometrics journals, 00:12:04.234 --> 00:12:06.270 because it wasn't proving the type of things... 00:12:06.786 --> 00:12:09.864 When Breiman was doing his regression trees -- 00:12:09.864 --> 00:12:11.200 they just didn't fit in. 
00:12:12.944 --> 00:12:14.934 I think he would have had a very hard time 00:12:14.934 --> 00:12:18.400 publishing these things in econometrics journals. 00:12:20.189 --> 00:12:23.656 I think we've limited ourselves too much 00:12:24.700 --> 00:12:27.830 that led us to close things off 00:12:27.830 --> 00:12:29.622 for a lot of these machine-learning methods 00:12:29.622 --> 00:12:31.163 that are actually very useful. 00:12:31.163 --> 00:12:34.000 I mean, I think, in general, 00:12:34.900 --> 00:12:36.529 that literature, the computer scientists, 00:12:36.529 --> 00:12:40.013 have brought a huge number of these algorithms there -- 00:12:40.582 --> 00:12:42.632 have proposed a huge number of these algorithms 00:12:42.632 --> 00:12:43.875 that actually are very useful, 00:12:43.887 --> 00:12:46.153 and that are affecting 00:12:46.153 --> 00:12:49.100 the way we're going to be doing empirical work. 00:12:49.800 --> 00:12:52.105 But we've not fully internalized that 00:12:52.105 --> 00:12:53.573 because we're still very focused 00:12:53.573 --> 00:12:57.500 on getting point estimates and getting standard errors 00:12:58.600 --> 00:13:00.144 and getting p-values 00:13:00.159 --> 00:13:03.209 in a way that we need to move beyond 00:13:03.209 --> 00:13:06.090 to fully harness the force, 00:13:06.549 --> 00:13:08.351 the benefits 00:13:08.351 --> 00:13:10.700 from the machine learning literature. 00:13:11.198 --> 00:13:13.548 - On the one hand, I guess I very much take your point 00:13:13.548 --> 00:13:16.850 that sort of the traditional econometrics framework 00:13:16.850 --> 00:13:19.821 of propose a method, prove a limit theorem 00:13:19.821 --> 00:13:23.870 under some asymptotic story, story, story, story, story... 
00:13:24.424 --> 00:13:27.057 publish a paper, is constraining, 00:13:27.218 --> 00:13:30.132 and that, in some sense, by thinking more broadly 00:13:30.132 --> 00:13:31.699 about what a methods paper could look like, 00:13:31.699 --> 00:13:33.316 we may in some sense. 00:13:33.316 --> 00:13:34.959 Certainly, the machine learning literature 00:13:34.959 --> 00:13:37.189 has found a bunch of things which seem to work quite well 00:13:37.189 --> 00:13:38.300 for a number of problems 00:13:38.300 --> 00:13:41.267 and are now having substantial influence in economics. 00:13:41.267 --> 00:13:43.261 I guess a question I'm interested in 00:13:43.261 --> 00:13:46.465 is how do you think about the role of... 00:13:48.657 --> 00:13:51.200 Do you think there is no value in the theory part of it? 00:13:51.600 --> 00:13:54.187 Because I guess a question that I often have 00:13:54.187 --> 00:13:56.804 on seeing the output from a machine learning tool, 00:13:56.804 --> 00:13:58.207 and actually a number of the methods 00:13:58.207 --> 00:13:59.220 that you talked about 00:13:59.220 --> 00:14:00.679 actually do have inferential results 00:14:00.679 --> 00:14:01.944 developed for them, 00:14:02.520 --> 00:14:03.963 something that I always wonder about, 00:14:03.963 --> 00:14:06.659 is sort of uncertainty quantification and just... 00:14:06.659 --> 00:14:08.000 I have my prior, 00:14:08.000 --> 00:14:11.000 I come into the world with my view, I see the result of this thing. 00:14:11.000 --> 00:14:12.395 How should I update based on it? 00:14:12.395 --> 00:14:13.867 And in some sense, if I'm in a world 00:14:13.867 --> 00:14:15.914 where things are normally distributed, 00:14:15.914 --> 00:14:17.280 I know how to do it -- 00:14:17.280 --> 00:14:18.305 here I don't. 00:14:18.305 --> 00:14:21.028 And so I'm interested to hear what you think about that. 
00:14:21.500 --> 00:14:24.425 - I don't see this as sort of saying, well, 00:14:24.698 --> 00:14:26.556 these results are not interesting, 00:14:26.556 --> 00:14:27.968 but it's going to be a lot of cases 00:14:27.968 --> 00:14:30.153 where it's going to be incredibly hard to get those results, 00:14:30.153 --> 00:14:32.489 and we may not be able to get there, 00:14:32.489 --> 00:14:34.942 and we may need to do it in stages 00:14:34.942 --> 00:14:36.440 where first someone says, 00:14:36.440 --> 00:14:40.900 "Hey, I have this interesting algorithm 00:14:40.900 --> 00:14:42.200 for doing something, 00:14:42.200 --> 00:14:47.769 and it works well by some criterion there 00:14:47.769 --> 00:14:49.900 on this particular data set, 00:14:51.000 --> 00:14:52.602 and we should put it out there." 00:14:52.602 --> 00:14:55.410 and maybe someone will figure out a way 00:14:55.410 --> 00:14:57.828 that you can later actually still do inference 00:14:57.828 --> 00:14:59.463 under some condition, 00:14:59.463 --> 00:15:02.100 and maybe those are not particularly realistic conditions, 00:15:02.100 --> 00:15:03.800 then we kind of go further. 00:15:03.800 --> 00:15:08.418 But I think we've been constraining things too much 00:15:08.418 --> 00:15:09.519 where we said, 00:15:09.519 --> 00:15:13.185 "This is the type of things that we need to do." 00:15:13.185 --> 00:15:14.502 And in some sense, 00:15:15.700 --> 00:15:18.200 that goes back to the way Josh and I 00:15:19.700 --> 00:15:21.984 thought about things for the local average treatment effect. 00:15:21.984 --> 00:15:23.137 That wasn't quite the way 00:15:23.137 --> 00:15:25.135 people were thinking about these problems before. 
00:15:25.805 --> 00:15:28.860 There was a sense that some of the people said 00:15:29.500 --> 00:15:31.900 the way you need to do these things is you first say 00:15:32.200 --> 00:15:34.250 what you're interested in estimating, 00:15:34.250 --> 00:15:37.507 and then you do the best job you can in estimating that. 00:15:38.100 --> 00:15:43.874 And what you guys are doing is you're doing it backwards. 00:15:44.300 --> 00:15:46.700 You kind of say, "Here, I have an estimator, 00:15:47.300 --> 00:15:50.642 and now I'm going to figure out what it's estimating." 00:15:50.642 --> 00:15:53.900 And I suppose you're going to say why you think that's interesting 00:15:53.900 --> 00:15:56.600 or maybe why it's not interesting, and that's not okay. 00:15:56.600 --> 00:15:58.600 You're not allowed to do that in that way. 00:15:59.000 --> 00:16:02.026 And I think we should just be a little bit more flexible 00:16:02.026 --> 00:16:06.648 in thinking about how to look at problems 00:16:06.648 --> 00:16:08.328 because I think we've missed some things 00:16:08.328 --> 00:16:11.300 by not doing that. 00:16:11.300 --> 00:16:12.819 ♪ [music] ♪ 00:16:12.819 --> 00:16:14.753 - [Josh] So you've heard our views, Isaiah, 00:16:14.753 --> 00:16:18.191 and you've seen that we have some points of disagreement. 00:16:18.191 --> 00:16:20.400 Why don't you referee this dispute for us? 00:16:20.950 --> 00:16:22.394 [laughter] 00:16:22.500 --> 00:16:24.999 - Oh, it's so nice of you to ask me a small question. 00:16:24.999 --> 00:16:26.212 [laughter] 00:16:26.425 --> 00:16:27.993 So I guess, for one, 00:16:27.993 --> 00:16:33.200 I very much agree with something that Guido said earlier of... 
00:16:34.100 --> 00:16:35.710 [laughter] 00:16:35.920 --> 00:16:37.148 So one thing where it seems 00:16:37.148 --> 00:16:40.066 where the case for machine learning seems relatively clear 00:16:40.066 --> 00:16:43.316 is in settings where we're interested in some version 00:16:43.316 --> 00:16:45.100 of a nonparametric prediction problem. 00:16:45.100 --> 00:16:46.392 So I'm interested in estimating 00:16:46.392 --> 00:16:49.700 a conditional expectation or conditional probability, 00:16:50.000 --> 00:16:52.100 and in the past, maybe I would have run a kernel... 00:16:52.100 --> 00:16:53.526 I would have run a kernel regression 00:16:53.526 --> 00:16:55.184 or I would have run a series regression, 00:16:55.184 --> 00:16:57.400 or something along those lines. 00:16:57.976 --> 00:17:00.350 It seems like, at this point, we've a fairly good sense 00:17:00.350 --> 00:17:03.102 that in a fairly wide range of applications, 00:17:03.102 --> 00:17:05.671 machine learning methods seem to do better 00:17:05.671 --> 00:17:08.610 for estimating conditional mean functions, 00:17:08.610 --> 00:17:09.811 or conditional probabilities, 00:17:09.811 --> 00:17:12.000 or various other nonparametric objects 00:17:12.400 --> 00:17:15.309 than more traditional nonparametric methods 00:17:15.309 --> 00:17:17.292 that were studied in econometrics and statistics, 00:17:17.292 --> 00:17:19.100 especially in high-dimensional settings. 00:17:19.500 --> 00:17:21.849 - So you're thinking of maybe the propensity score 00:17:21.849 --> 00:17:23.155 or something like that? 00:17:23.155 --> 00:17:25.063 - Yeah, exactly, - Nuisance functions. 00:17:25.063 --> 00:17:27.100 - Yeah, so things like propensity scores. 
00:17:27.872 --> 00:17:29.965 Even objects of more direct 00:17:29.965 --> 00:17:32.400 interest, like conditional average treatment effects, 00:17:32.400 --> 00:17:35.100 which are the difference of two conditional expectation functions, 00:17:35.100 --> 00:17:36.625 potentially things like that. 00:17:36.625 --> 00:17:40.573 Of course, even there, the theory... 00:17:40.573 --> 00:17:43.620 for inference, or the theory for how to interpret, 00:17:43.620 --> 00:17:45.797 how to make large sample statements about some of these things 00:17:45.797 --> 00:17:47.733 are less well-developed depending on 00:17:47.733 --> 00:17:50.100 the machine learning estimator used. 00:17:50.100 --> 00:17:52.983 And so I think something that is tricky 00:17:52.983 --> 00:17:55.700 is that we can have these methods, which work a lot, 00:17:55.700 --> 00:17:58.000 which seem to work a lot better for some purposes 00:17:58.000 --> 00:18:01.229 but which we need to be a bit careful in how we plug them in 00:18:01.229 --> 00:18:03.300 or how we interpret the resulting statements. 00:18:03.600 --> 00:18:05.985 But, of course, that's a very, very active area right now 00:18:05.985 --> 00:18:07.568 where people are doing tons of great work. 00:18:07.568 --> 00:18:10.694 And so I fully expect and hope to see 00:18:10.694 --> 00:18:12.800 much more going forward there. 00:18:13.000 --> 00:18:16.780 So one issue with machine learning that always seems a danger is... 00:18:16.780 --> 00:18:18.517 or that is sometimes a danger 00:18:18.517 --> 00:18:20.938 and has sometimes led to applications 00:18:20.938 --> 00:18:22.139 that have made less sense 00:18:22.139 --> 00:18:27.309 is when folks start with a method that they're very excited about 00:18:27.309 --> 00:18:28.676 rather than a question. 
00:18:28.900 --> 00:18:30.492 So sort of starting with a question 00:18:30.492 --> 00:18:33.782 where here's the object I'm interested in, 00:18:33.782 --> 00:18:35.228 here is the parameter of interest -- 00:18:35.529 --> 00:18:39.500 let me think about how I would identify that thing, 00:18:39.500 --> 00:18:41.824 how I would recover that thing if I had a ton of data. 00:18:41.824 --> 00:18:44.000 Oh, here's a conditional expectation function, 00:18:44.000 --> 00:18:47.065 let me plug in a machine learning estimator for that -- 00:18:47.065 --> 00:18:48.800 that seems very, very sensible. 00:18:49.000 --> 00:18:52.964 Whereas, you know, if I regress quantity on price 00:18:53.504 --> 00:18:56.000 and say that I used a machine learning method, 00:18:56.300 --> 00:18:58.791 maybe I'm satisfied that that solves the endogeneity problem 00:18:58.791 --> 00:19:01.200 we're usually worried about there... maybe I'm not. 00:19:01.500 --> 00:19:02.649 But, again, that's something 00:19:02.649 --> 00:19:06.300 where the way to address it seems relatively clear. 00:19:06.500 --> 00:19:08.181 It's to find your object of interest 00:19:08.181 --> 00:19:09.779 and think about -- 00:19:09.779 --> 00:19:11.489 - Just bring in the economics. 00:19:11.489 --> 00:19:12.741 - Exactly. 00:19:12.741 --> 00:19:14.274 - And think about the heterogeneity, 00:19:14.274 --> 00:19:17.067 but harness the power of the machine learning methods 00:19:17.067 --> 00:19:20.148 for some of the components. 00:19:20.349 --> 00:19:21.388 - Precisely. Exactly. 00:19:21.388 --> 00:19:23.753 So the question of interest 00:19:23.753 --> 00:19:25.767 is the same as the question of interest has always been, 00:19:25.767 --> 00:19:28.493 but we now have better methods for estimating some pieces of this. 
00:19:29.900 --> 00:19:32.704 The place that seems harder to forecast 00:19:32.704 --> 00:19:35.816 is obviously there's a huge amount going on 00:19:35.816 --> 00:19:37.500 in the machine learning literature, 00:19:37.500 --> 00:19:40.223 and the limited ways of plugging it in 00:19:40.223 --> 00:19:41.388 that I've referenced so far 00:19:41.388 --> 00:19:43.090 are a limited piece of that. 00:19:43.090 --> 00:19:45.324 So I think there are all sorts of other interesting questions 00:19:45.324 --> 00:19:46.520 about where... 00:19:47.100 --> 00:19:49.300 where does this interaction go? What else can we learn? 00:19:49.300 --> 00:19:52.932 And that's something where I think there's a ton going on, 00:19:52.932 --> 00:19:54.414 which seems very promising, 00:19:54.414 --> 00:19:56.400 and I have no idea what the answer is. 00:19:57.000 --> 00:20:00.297 - No, I totally agree with that, 00:20:00.297 --> 00:20:03.539 but that makes it very exciting. 00:20:03.539 --> 00:20:06.100 And I think there's just a lot of work to be done there. 00:20:06.600 --> 00:20:08.720 Alright. So Isaiah agrees with me there. 00:20:08.720 --> 00:20:10.174 [laughter] 00:20:10.174 --> 00:20:11.633 - I didn't say that per se. 00:20:12.926 --> 00:20:14.419 ♪ [music] ♪ 00:20:14.419 --> 00:20:16.833 - [Narrator] If you'd like to watch more Nobel Conversations, 00:20:16.833 --> 00:20:18.012 click here. 00:20:18.012 --> 00:20:20.492 Or if you'd like to learn more about econometrics, 00:20:20.500 --> 00:20:23.100 check out Josh's Mastering Econometrics series. 00:20:23.600 --> 00:20:26.569 If you'd like to learn more about Guido, Josh, and Isaiah, 00:20:26.569 --> 00:20:28.550 check out the links in the description. 00:20:28.550 --> 00:20:30.535 ♪ [music] ♪