0:00:00.100,0:00:02.050
♪ [music] ♪

0:00:03.620,0:00:05.700
- [Narrator] Welcome[br]to Nobel Conversations.

0:00:07.000,0:00:10.043
In this episode, Josh Angrist[br]and Guido Imbens

0:00:10.043,0:00:13.675
sit down with Isaiah Andrews[br]to discuss and disagree

0:00:13.675,0:00:16.580
over the role of machine learning[br]in applied econometrics.

0:00:17.897,0:00:19.769
- [Isaiah] So, of course,[br]there are a lot of topics

0:00:19.769,0:00:21.087
where you guys largely agree,

0:00:21.087,0:00:22.313
but I'd like to turn to one

0:00:22.313,0:00:24.240
where maybe you have[br]some differences of opinion.

0:00:24.240,0:00:25.728
I'd love to hear[br]some of your thoughts

0:00:25.728,0:00:26.883
about machine learning

0:00:26.883,0:00:29.900
and the role that it's playing[br]and is going to play in economics.

0:00:30.200,0:00:33.352
- [Guido] I've looked at some data[br]that's proprietary,

0:00:33.352,0:00:35.150
so there's[br]no published paper there.

0:00:36.122,0:00:38.159
There was an experiment[br]that was done

0:00:38.159,0:00:39.500
on some search algorithm,

0:00:39.700,0:00:41.327
and the question was --

0:00:42.901,0:00:45.600
it was about ranking things[br]and changing the ranking.

0:00:45.900,0:00:47.290
And it was sort of clear

0:00:48.400,0:00:50.610
that there was going to be[br]a lot of heterogeneity there.

0:00:52.161,0:00:56.282
If you look for, say,

0:00:57.831,0:01:00.617
a picture of Britney Spears,

0:01:00.617,0:01:02.493
it doesn't really matter[br]where you rank it

0:01:02.493,0:01:05.500
because you're going to figure out[br]what you're looking for,

0:01:06.200,0:01:07.867
whether you put it[br]in the first or second

0:01:07.867,0:01:09.920
or third position of the ranking.

0:01:10.100,0:01:12.500
But if you're looking[br]for the best econometrics book,

0:01:13.300,0:01:16.430
if you put your book first[br]or your book tenth,

0:01:16.430,0:01:18.100
that's going to make[br]a big difference

0:01:18.600,0:01:20.979
in how often people[br]are going to click on it.

0:01:21.829,0:01:23.417
And so there you --

0:01:23.417,0:01:27.119
- [Josh] Why do I need[br]machine learning to discover that?

0:01:27.119,0:01:29.195
It seems like[br]I can discover it simply.

0:01:29.195,0:01:30.435
- [Guido] So in general --

0:01:30.435,0:01:32.100
- [Josh] There were lots[br]of possible...

0:01:32.100,0:01:35.045
- You want to think about[br]there being lots of characteristics

0:01:35.490,0:01:37.280
of the items,

0:01:37.610,0:01:41.682
that you want to understand[br]what drives the heterogeneity

0:01:42.177,0:01:43.427
in the effect of --

0:01:43.427,0:01:45.008
- But you're just predicting.

0:01:45.008,0:01:47.665
In some sense, you're solving[br]a marketing problem.

0:01:47.665,0:01:49.381
- No, it's a causal effect.

0:01:49.381,0:01:51.911
- It's causal, but it has[br]no scientific content.

0:01:51.911,0:01:53.141
Think about...

0:01:53.657,0:01:57.300
- No, but there's similar things[br]in medical settings.

0:01:58.000,0:02:01.300
If you do an experiment,[br]you may actually be very interested

0:02:01.300,0:02:03.900
in whether the treatment works[br]for some groups or not.

0:02:03.900,0:02:06.143
And you have a lot[br]of individual characteristics,

0:02:06.143,0:02:08.000
and you want[br]to systematically search --

0:02:08.000,0:02:09.500
- Yeah. I'm skeptical about that --

0:02:09.500,0:02:12.603
that sort of idea that there's[br]this personal causal effect

0:02:12.603,0:02:14.000
that I should care about,

0:02:14.000,0:02:15.740
and that machine learning[br]can discover it

0:02:15.740,0:02:17.259
in some way that's useful.

0:02:17.259,0:02:20.045
So think about -- I've done[br]a lot of work on schools,

0:02:20.045,0:02:22.336
going to, say, a charter school,

0:02:22.336,0:02:24.428
a publicly funded private school,

0:02:25.225,0:02:27.392
effectively,[br]that's free to structure

0:02:27.392,0:02:29.399
its own curriculum,[br]for context there.

0:02:29.399,0:02:31.369
Some types of charter schools

0:02:31.369,0:02:33.703
generate spectacular[br]achievement gains,

0:02:33.703,0:02:36.400
and in the data set[br]that produces that result,

0:02:36.400,0:02:37.800
I have a lot of covariates.

0:02:37.800,0:02:41.353
So I have baseline scores,[br]and I have family background,

0:02:41.353,0:02:43.207
the education of the parents,

0:02:43.576,0:02:45.800
the sex of the child,[br]the race of the child.

0:02:45.800,0:02:49.795
And, well, as soon as I put[br]half a dozen of those together,

0:02:49.795,0:02:51.900
I have a very[br]high-dimensional space.

0:02:52.300,0:02:55.199
I'm definitely interested[br]in coarse features

0:02:55.199,0:02:56.457
of that treatment effect,

0:02:56.457,0:02:58.741
like whether it's better for people

0:02:58.741,0:03:02.046
who come from[br]lower-income families.

0:03:02.600,0:03:05.760
I have a hard time believing[br]that there's an application

0:03:07.273,0:03:09.872
for the very high-dimensional[br]version of that,

0:03:09.872,0:03:12.406
where I discovered[br]that for non-white children

0:03:12.406,0:03:14.971
who have high family incomes

0:03:14.971,0:03:17.800
but baseline scores[br]in the third quartile

0:03:18.166,0:03:21.785
and only went to public school[br]in the third grade

0:03:21.785,0:03:23.000
but not the sixth grade.

0:03:23.000,0:03:25.796
So that's what that[br]high-dimensional analysis produces.

0:03:25.800,0:03:28.100
It's a very elaborate[br]conditional statement.

0:03:28.300,0:03:30.605
There's two things that are wrong[br]with that in my view.

0:03:30.605,0:03:31.797
First, I don't see it as --

0:03:31.797,0:03:34.000
I just can't imagine[br]why it's actionable.

0:03:34.600,0:03:36.600
I don't know why[br]you'd want to act on it.

0:03:36.600,0:03:39.455
And I know also that[br]there's some alternative model

0:03:39.455,0:03:41.200
that fits almost as well,

0:03:41.800,0:03:43.398
that flips everything.

0:03:43.398,0:03:45.350
Because machine learning[br]doesn't tell me

0:03:45.350,0:03:48.582
that this is really[br]the predictor that matters --

0:03:48.582,0:03:50.965
it just tells me[br]that this is a good predictor.

0:03:51.486,0:03:54.870
And so, I think[br]there is something different

0:03:54.870,0:03:57.586
about the social science context.

0:03:57.940,0:04:00.186
- [Guido] I think[br]the social science applications

0:04:00.186,0:04:01.633
you're talking about

0:04:01.633,0:04:02.735
are ones where,

0:04:03.400,0:04:08.100
I think, there's not a huge amount[br]of heterogeneity in the effects.

0:04:09.777,0:04:11.544
- [Josh] Well, there might be[br]if you allow me

0:04:11.544,0:04:13.466
to fill that space.

0:04:13.466,0:04:15.740
- No... not even then.

0:04:15.740,0:04:18.614
I think for a lot[br]of those interventions,

0:04:18.614,0:04:22.765
you would expect that the effect[br]is the same sign for everybody.

0:04:24.367,0:04:27.600
There may be small differences[br]in the magnitude, but it's not...

0:04:28.200,0:04:30.232
For a lot of these[br]educational interventions,

0:04:30.232,0:04:31.700
they're good for everybody.

0:04:34.169,0:04:36.034
It's not that they're bad[br]for some people

0:04:36.034,0:04:37.600
and good for other people,

0:04:37.600,0:04:39.100
and that there are kind[br]of very small pockets

0:04:39.100,0:04:40.900
where they're bad.

0:04:40.900,0:04:44.011
There may be some variation[br]in the magnitude,

0:04:44.011,0:04:46.955
but you would need very,[br]very big data sets to find those.

0:04:47.906,0:04:49.078
I agree that in those cases,

0:04:49.078,0:04:51.400
they probably wouldn't be[br]very actionable anyway.

0:04:51.700,0:04:53.800
But I think there's a lot[br]of other settings

0:04:54.100,0:04:56.600
where there is[br]much more heterogeneity.

0:04:57.250,0:04:59.102
- Well, I'm open[br]to that possibility,

0:04:59.102,0:05:04.918
and I think the example you gave[br]is essentially a marketing example.

0:05:06.315,0:05:09.656
- No, those have[br]implications for it

0:05:09.656,0:05:11.069
and that's the organization,

0:05:12.252,0:05:14.330
whether you need[br]to worry about the...

0:05:15.469,0:05:17.900
- Well, I need to see that paper.

0:05:18.400,0:05:21.072
- So the sense[br]I'm getting is that --

0:05:21.467,0:05:23.996
- We still disagree on something.[br]- Yes.

0:05:23.996,0:05:25.440
- We haven't converged[br]on everything.

0:05:25.440,0:05:27.200
- I'm getting that sense.[br][laughter]

0:05:27.200,0:05:28.679
- Actually, we've diverged on this

0:05:28.679,0:05:30.833
because this wasn't around[br]to argue about.

0:05:30.833,0:05:32.334
[laughter]

0:05:33.057,0:05:34.771
- Is it getting a little warm here?

0:05:35.820,0:05:38.147
- Warmed up. Warmed up is good.

0:05:38.147,0:05:41.187
The sense I'm getting is,[br]Josh, you're not saying

0:05:41.187,0:05:43.119
that you're confident[br]that there is no way

0:05:43.119,0:05:45.347
that there is an application[br]where this stuff is useful.

0:05:45.347,0:05:47.028
You are saying[br]you are unconvinced

0:05:47.028,0:05:49.487
by the existing[br]applications to date.

0:05:49.917,0:05:52.022
- Fair enough.[br]- I'm very confident.

0:05:52.022,0:05:53.704
[laughter]

0:05:54.156,0:05:55.189
- In this case.

0:05:55.189,0:05:56.555
- I think Josh does have a point

0:05:56.555,0:06:00.452
that even in the prediction cases

0:06:01.639,0:06:04.519
where a lot of the machine learning[br]methods really shine

0:06:04.519,0:06:06.738
is where there's just a lot[br]of heterogeneity.

0:06:07.300,0:06:10.769
- You don't really care much[br]about the details there, right?

0:06:10.769,0:06:11.836
- [Guido] Yes.

0:06:11.836,0:06:15.000
- It doesn't have[br]a policy angle or something.

0:06:15.200,0:06:18.232
- The kind of recognizing[br]handwritten digits and stuff --

0:06:18.795,0:06:20.090
it does much better there

0:06:20.090,0:06:24.000
than building[br]some complicated model.

0:06:24.400,0:06:28.183
But a lot of the social science,[br]a lot of the economic applications,

0:06:28.183,0:06:30.383
we actually know a huge amount[br]about the relationship

0:06:30.383,0:06:32.100
between these variables.

0:06:32.100,0:06:34.700
A lot of the relationships[br]are strictly monotone.

0:06:37.166,0:06:39.416
Education is going to increase[br]people's earnings,

0:06:39.697,0:06:41.950
irrespective of the demographic,

0:06:41.950,0:06:44.930
irrespective of the level[br]of education you already have.

0:06:44.930,0:06:46.180
- Until they get to a Ph.D.

0:06:46.180,0:06:47.956
- Is that true for graduate school?

0:06:47.956,0:06:49.227
[laughter]

0:06:49.227,0:06:50.700
- Over a reasonable range.

0:06:51.600,0:06:55.488
It's not going[br]to go down very much.

0:06:56.100,0:06:58.121
In a lot of the settings

0:06:58.121,0:07:00.100
where these machine learning[br]methods shine,

0:07:00.100,0:07:01.900
there's a lot of non-monotonicity,

0:07:02.100,0:07:04.900
kind of multimodality[br]in these relationships,

0:07:05.300,0:07:08.921
and they're going to be[br]very powerful.

0:07:08.921,0:07:11.787
But I still stand by that.

0:07:12.410,0:07:14.975
These methods just have[br]a huge amount to offer

0:07:15.925,0:07:17.561
for economists,

0:07:17.561,0:07:21.700
and they're going to be[br]a big part of the future.

0:07:21.930,0:07:23.020
♪ [music] ♪

0:07:23.020,0:07:24.600
- [Isaiah] It feels like[br]there's something interesting

0:07:24.600,0:07:25.912
to be said about[br]machine learning here.

0:07:25.912,0:07:28.100
So, Guido, I was wondering,[br]could you give some more...

0:07:28.100,0:07:29.807
maybe some examples[br]of the sorts of applications

0:07:29.807,0:07:30.908
you're thinking about

0:07:30.908,0:07:32.660
that are coming out[br]at the moment?

0:07:32.660,0:07:34.182
- So one area is where

0:07:34.700,0:07:36.947
instead of looking[br]for average causal effects,

0:07:36.947,0:07:39.350
we're looking for[br]individualized estimates,

0:07:41.354,0:07:43.288
predictions of causal effects,

0:07:43.288,0:07:46.112
and there,[br]the machine learning algorithms

0:07:46.112,0:07:47.941
have been very effective.

0:07:47.941,0:07:51.415
Traditionally, we would have done[br]these things using kernel methods,

0:07:51.415,0:07:54.003
and theoretically, they work great,

0:07:54.003,0:07:55.636
and there's some arguments

0:07:55.636,0:07:57.612
that, formally,[br]you can't do any better.

0:07:57.612,0:07:59.579
But in practice,[br]they don't work very well.

0:08:00.900,0:08:03.527
Random forest,[br]causal forest-type things

0:08:03.527,0:08:06.916
that Stefan Wager and Susan Athey[br]have been working on

0:08:06.916,0:08:09.453
are used very widely.

0:08:09.453,0:08:12.200
They've been very effective[br]in these settings

0:08:12.400,0:08:19.208
to actually get causal effects[br]that vary by covariates.

0:08:20.700,0:08:23.734
I think this is still just[br]the beginning of these methods.

0:08:23.734,0:08:25.700
But in many cases,

0:08:27.351,0:08:31.600
these algorithms are very effective[br]at searching over big spaces

0:08:31.800,0:08:37.133
and finding the functions[br]that fit very well

0:08:37.133,0:08:40.948
in ways that we couldn't[br]really do beforehand.

0:08:41.500,0:08:42.697
- I don't know of an example

0:08:42.697,0:08:45.300
where machine learning[br]has generated insights

0:08:45.300,0:08:47.664
about a causal effect[br]that I'm interested in.

0:08:47.664,0:08:49.610
And I do know of examples

0:08:49.610,0:08:51.300
where it's potentially[br]very misleading.

0:08:51.300,0:08:53.700
So I've done some work[br]with Brigham Frandsen,

0:08:54.100,0:08:57.782
using, for example, random forests[br]to model covariate effects

0:08:57.782,0:09:00.269
in an instrumental[br]variables problem

0:09:00.269,0:09:03.375
where you need[br]to condition on covariates.

0:09:04.400,0:09:06.531
And you don't particularly[br]have strong feelings

0:09:06.531,0:09:08.200
about the functional form for that,

0:09:08.200,0:09:10.000
so maybe you should...

0:09:10.900,0:09:12.804
be open to flexible curve fitting,

0:09:12.804,0:09:14.501
and that leads you down a path

0:09:14.501,0:09:16.853
where there's a lot[br]of nonlinearities in the model,

0:09:17.384,0:09:19.933
and that's very dangerous with IV

0:09:19.933,0:09:23.000
because any sort[br]of excluded nonlinearity

0:09:23.300,0:09:25.839
potentially generates[br]a spurious causal effect,

0:09:25.839,0:09:29.292
and Brigham and I showed that[br]very powerfully, I think,

0:09:29.292,0:09:32.200
in the case of two instruments

0:09:32.944,0:09:35.113
that come from a paper of mine[br]with Bill Evans,

0:09:35.113,0:09:37.600
where if you replace...

0:09:38.708,0:09:40.825
a traditional two-stage[br]least squares estimator

0:09:40.825,0:09:42.600
with some kind of random forest,

0:09:42.900,0:09:46.807
you get very precisely estimated[br]nonsense estimates.

0:09:49.173,0:09:51.100
I think that's a big caution.

0:09:51.944,0:09:55.096
In view of those findings,[br]in an example I care about

0:09:55.096,0:09:57.100
where the instruments[br]are very simple

0:09:57.400,0:09:59.100
and I believe that they're valid,

0:09:59.300,0:10:01.096
I would be skeptical of that.

0:10:02.900,0:10:06.435
Nonlinearity and IV[br]don't mix very comfortably.

0:10:06.435,0:10:09.424
- No, it sounds like that's already[br]a more complicated...

0:10:10.206,0:10:11.842
- Well, it's IV...[br]- Yeah.

0:10:12.591,0:10:14.033
- ...but then we work on that.

0:10:14.403,0:10:15.907
[laughter]

0:10:15.907,0:10:17.289
- Fair enough.

0:10:17.289,0:10:18.410
♪ [music] ♪

0:10:18.410,0:10:20.001
- [Guido] As editor[br]of Econometrica,

0:10:20.001,0:10:22.054
a lot of these papers[br]cross my desk,

0:10:22.700,0:10:26.823
but the motivation is not clear

0:10:27.555,0:10:29.500
and, in fact, really lacking.

0:10:29.800,0:10:31.028
They're not...

0:10:31.591,0:10:34.926
big old type semiparametric[br]foundational papers.

0:10:35.315,0:10:37.151
So that's a big problem.

0:10:38.761,0:10:42.664
A related problem is that we have[br]this tradition in econometrics

0:10:42.664,0:10:46.560
of being very focused[br]on these formal asymptotic results.

0:10:48.800,0:10:53.289
We just have a lot of papers[br]where people propose a method,

0:10:53.289,0:10:55.700
and then they establish[br]the asymptotic properties

0:10:56.300,0:10:59.420
in a very kind of standardized way.

0:11:00.873,0:11:02.055
- Is that bad?

0:11:02.900,0:11:06.420
- Well, I think it's sort[br]of closed the door

0:11:06.420,0:11:09.040
for a lot of work[br]that doesn't fit into that,

0:11:09.040,0:11:11.600
whereas in the machine[br]learning literature,

0:11:11.900,0:11:13.453
a lot of things[br]are more algorithmic.

0:11:13.808,0:11:18.500
People had algorithms[br]for coming up with predictions

0:11:18.800,0:11:20.885
that turn out[br]to actually work much better

0:11:20.885,0:11:23.600
than, say, nonparametric[br]kernel regression.

0:11:24.000,0:11:26.800
For a long time, we were doing all[br]the nonparametrics in econometrics,

0:11:26.800,0:11:28.950
and we were using[br]kernel regression,

0:11:28.950,0:11:31.210
and that was great[br]for proving theorems.

0:11:31.210,0:11:32.580
You could get confidence intervals

0:11:32.580,0:11:34.684
and consistency,[br]and asymptotic normality,

0:11:34.684,0:11:35.736
and it was all great,

0:11:35.736,0:11:37.000
but it wasn't very useful.

0:11:37.300,0:11:39.100
And the things they did[br]in machine learning

0:11:39.100,0:11:41.051
are just way, way better.

0:11:41.051,0:11:42.557
But they didn't have the problem --

0:11:42.557,0:11:44.449
- That's not my beef[br]with machine learning,

0:11:44.449,0:11:45.871
that the theory is weak.

0:11:45.871,0:11:47.141
[laughter]

0:11:47.141,0:11:51.250
- No, but I'm saying there,[br]for the prediction part,

0:11:51.250,0:11:52.394
it does much better.

0:11:52.394,0:11:54.500
- Yeah, it's a better[br]curve-fitting tool.

0:11:54.900,0:11:57.608
- But it did so in a way

0:11:57.608,0:11:59.782
that would not have made[br]those papers

0:11:59.782,0:12:04.234
initially easy to get into[br]the econometrics journals,

0:12:04.234,0:12:06.270
because it wasn't proving[br]the type of things...

0:12:06.786,0:12:09.864
When Breiman was doing[br]his regression trees,

0:12:09.864,0:12:11.200
they just didn't fit in.

0:12:12.944,0:12:14.934
I think he would have had[br]a very hard time

0:12:14.934,0:12:18.400
publishing these things[br]in econometrics journals.

0:12:20.189,0:12:23.656
I think we've limited[br]ourselves too much,

0:12:24.700,0:12:27.830
and that led us to close things off

0:12:27.830,0:12:29.622
for a lot of these[br]machine-learning methods

0:12:29.622,0:12:31.163
that are actually very useful.

0:12:31.163,0:12:34.000
I mean, I think, in general,

0:12:34.900,0:12:36.529
that literature,[br]the computer scientists,

0:12:36.529,0:12:40.013
have brought a huge number[br]of these algorithms there --

0:12:40.582,0:12:42.632
have proposed a huge number[br]of these algorithms

0:12:42.632,0:12:43.887
that actually are very useful

0:12:43.887,0:12:46.073
and that are affecting

0:12:46.073,0:12:49.100
the way we're going[br]to be doing empirical work.

0:12:49.800,0:12:52.105
But we've not fully[br]internalized that

0:12:52.105,0:12:53.573
because we're still very focused

0:12:53.573,0:12:57.500
on getting point estimates[br]and getting standard errors

0:12:58.600,0:13:00.159
and getting p-values

0:13:00.159,0:13:03.209
in a way that we need[br]to move beyond

0:13:03.209,0:13:06.090
to fully harness the force,

0:13:06.549,0:13:08.351
the benefits

0:13:08.351,0:13:10.979
from the machine[br]learning literature.

0:13:11.198,0:13:13.548
- On the one hand, I guess I very[br]much take your point

0:13:13.548,0:13:16.850
that sort of the traditional[br]econometrics framework

0:13:16.850,0:13:19.821
of propose a method,[br]prove a limit theorem

0:13:19.821,0:13:23.870
under some asymptotic story,[br]story, story, story, story...

0:13:24.424,0:13:27.057
publish a paper is constraining,

0:13:27.218,0:13:30.132
and that, in some sense,[br]by thinking more broadly

0:13:30.132,0:13:31.829
about what a methods paper[br]could look like,

0:13:31.829,0:13:33.486
we may write, in some sense,

0:13:33.486,0:13:35.229
certainly that the machine[br]learning literature

0:13:35.229,0:13:37.189
has found a bunch of things[br]which seem to work quite well

0:13:37.189,0:13:38.300
for a number of problems

0:13:38.300,0:13:41.267
and are now having[br]substantial influence in economics.

0:13:41.267,0:13:43.261
I guess a question[br]I'm interested in

0:13:43.261,0:13:46.465
is how do you think[br]about the role of...

0:13:48.657,0:13:51.200
Do you think there is no value[br]in the theory part of it?

0:13:51.600,0:13:54.187
Because I guess a question[br]that I often have

0:13:54.187,0:13:56.804
on seeing the output[br]from a machine learning tool,

0:13:56.804,0:13:58.207
and actually a number[br]of the methods

0:13:58.207,0:13:59.220
that you talked about

0:13:59.220,0:14:00.679
actually do have[br]inferential results

0:14:00.679,0:14:01.944
developed for them,

0:14:02.520,0:14:03.963
something that[br]I always wonder about,

0:14:03.963,0:14:06.659
a sort of uncertainty[br]quantification and just...

0:14:06.659,0:14:08.000
I have my prior,

0:14:08.000,0:14:11.000
I come into the world with my view,[br]I see the result of this thing.

0:14:11.000,0:14:12.395
How should I update based on it?

0:14:12.395,0:14:13.867
And in some sense,[br]if I'm in a world

0:14:13.867,0:14:15.914
where things[br]are normally distributed,

0:14:15.914,0:14:17.280
I know how to do it --

0:14:17.280,0:14:18.305
here I don't.

0:14:18.305,0:14:21.028
And so I'm interested to hear[br]what you think about that.

0:14:21.500,0:14:24.425
- I don't see this[br]as sort of saying, well,

0:14:24.698,0:14:26.556
these results are not interesting,

0:14:26.556,0:14:27.968
but there are going to be a lot of cases

0:14:27.968,0:14:30.233
where it's going to be incredibly[br]hard to get those results,

0:14:30.233,0:14:32.489
and we may not[br]be able to get there,

0:14:32.489,0:14:34.942
and we may need to do it in stages

0:14:34.942,0:14:36.440
where first someone says,

0:14:36.440,0:14:40.900
"Hey, I have[br]this interesting algorithm

0:14:40.900,0:14:42.200
for doing something,"

0:14:42.200,0:14:47.209
and it works well[br]by some criterion

0:14:47.209,0:14:49.900
on this particular data set,

0:14:51.000,0:14:52.602
and we should put it out there.

0:14:52.602,0:14:55.410
And maybe someone[br]will figure out a way

0:14:55.410,0:14:57.828
that you can later actually[br]still do inference

0:14:57.828,0:14:59.463
under some conditions,

0:14:59.463,0:15:02.100
and maybe those are not[br]particularly realistic conditions.

0:15:02.100,0:15:03.800
Then we kind of go further.

0:15:03.800,0:15:08.418
But I think we've been[br]constraining things too much

0:15:08.418,0:15:09.519
where we said,

0:15:09.519,0:15:13.185
"This is the type of things[br]that we need to do."

0:15:13.185,0:15:14.502
And in some sense,

0:15:15.700,0:15:18.200
that goes back[br]to the way Josh and I

0:15:19.700,0:15:21.984
thought about things for the local[br]average treatment effect.

0:15:21.984,0:15:23.137
That wasn't quite the way

0:15:23.137,0:15:25.135
people were thinking[br]about these problems before.

0:15:25.805,0:15:28.860
There was a sense[br]that some of the people said

0:15:29.500,0:15:31.900
the way you need to do[br]these things is you first say

0:15:32.200,0:15:34.140
what you're interested[br]in estimating,

0:15:34.140,0:15:37.507
and then you do the best job[br]you can in estimating that.

0:15:38.100,0:15:43.874
And what you guys are doing[br]is you're doing it backwards.

0:15:44.300,0:15:46.700
You kind of say,[br]"Here, I have an estimator,

0:15:47.300,0:15:50.642
and now I'm going to figure out[br]what it's estimating."

0:15:50.642,0:15:53.900
And I suppose you're going to say[br]why you think that's interesting

0:15:53.900,0:15:56.600
or maybe why it's not interesting,[br]and that's not okay.

0:15:56.600,0:15:58.600
You're not allowed[br]to do that in that way.

0:15:59.000,0:16:02.026
And I think we should[br]just be a little bit more flexible

0:16:02.026,0:16:06.648
in thinking about[br]how to look at problems

0:16:06.648,0:16:08.328
because I think[br]we've missed some things

0:16:08.328,0:16:11.300
by not doing that.

0:16:11.300,0:16:12.819
♪ [music] ♪

0:16:12.819,0:16:14.753
- [Josh] So you've heard[br]our views, Isaiah,

0:16:14.753,0:16:18.191
and you've seen that we have[br]some points of disagreement.

0:16:18.191,0:16:20.400
Why don't you referee[br]this dispute for us?

0:16:20.950,0:16:22.394
[laughter]

0:16:22.500,0:16:24.999
- Oh, it's so nice of you[br]to ask me a small question.

0:16:24.999,0:16:26.212
[laughter]

0:16:26.425,0:16:27.993
So I guess, for one,

0:16:27.993,0:16:33.200
I very much agree with something[br]that Guido said earlier of...

0:16:34.100,0:16:35.710
[laughter]

0:16:35.920,0:16:37.148
So one thing where it seems

0:16:37.148,0:16:40.066
where the case for machine learning[br]seems relatively clear

0:16:40.066,0:16:43.316
is in settings where[br]we're interested in some version

0:16:43.316,0:16:45.100
of a nonparametric[br]prediction problem.

0:16:45.100,0:16:46.392
So I'm interested in estimating

0:16:46.392,0:16:49.700
a conditional expectation[br]or conditional probability,

0:16:50.000,0:16:52.020
and in the past, maybe[br]I would have run a kernel...

0:16:52.020,0:16:53.526
I would have run[br]a kernel regression,

0:16:53.526,0:16:55.184
or I would have run[br]a series regression,

0:16:55.184,0:16:57.400
or something along those lines.

0:16:57.976,0:17:00.350
It seems like, at this point,[br]we've a fairly good sense

0:17:00.350,0:17:03.102
that in a fairly wide range[br]of applications,

0:17:03.102,0:17:05.671
machine learning methods[br]seem to do better

0:17:05.671,0:17:08.610
for estimating conditional[br]mean functions,

0:17:08.610,0:17:09.811
or conditional probabilities,

0:17:09.811,0:17:12.000
or various other[br]nonparametric objects

0:17:12.400,0:17:15.309
than more traditional[br]nonparametric methods

0:17:15.309,0:17:17.292
that were studied[br]in econometrics and statistics,

0:17:17.292,0:17:19.100
especially in[br]high-dimensional settings.

0:17:19.500,0:17:21.849
- So you're thinking of maybe[br]the propensity score

0:17:21.849,0:17:23.155
or something like that?

0:17:23.155,0:17:25.063
- Yeah, exactly.[br]- Nuisance functions.

0:17:25.063,0:17:27.100
- Yeah, so things[br]like propensity scores.

0:17:27.872,0:17:29.965
Even objects of more direct

0:17:29.965,0:17:32.400
interest, like conditional[br]average treatment effects,

0:17:32.400,0:17:35.100
which are the difference of two[br]conditional expectation functions,

0:17:35.100,0:17:36.625
potentially things like that.

0:17:36.625,0:17:40.573
Of course, even there,[br]the theory...

0:17:40.573,0:17:43.620
for inference, or the theory[br]for how to interpret,

0:17:43.620,0:17:45.797
how to make large-sample statements[br]about some of these things

0:17:45.797,0:17:47.733
are less well-developed[br]depending on

0:17:47.733,0:17:50.100
the machine learning[br]estimator used.

0:17:50.100,0:17:52.983
And so I think[br]something that is tricky

0:17:52.983,0:17:55.700
is that we can have these methods,[br]which work a lot,

0:17:55.700,0:17:58.000
which seem to work[br]a lot better for some purposes

0:17:58.000,0:18:01.229
but which we need to be a bit[br]careful in how we plug them in

0:18:01.229,0:18:03.300
or how we interpret[br]the resulting statements.

0:18:03.600,0:18:05.985
But, of course, that's a very,[br]very active area right now

0:18:05.985,0:18:07.668
where people are doing[br]tons of great work.

0:18:07.668,0:18:10.694
So I fully expect[br]and hope to see

0:18:10.694,0:18:12.800
much more going forward there.

0:18:13.000,0:18:16.780
So one issue with machine learning[br]that always seems a danger is...

0:18:16.780,0:18:18.517
or that is sometimes a danger

0:18:18.517,0:18:20.938
and has sometimes[br]led to applications

0:18:20.938,0:18:22.139
that have made less sense

0:18:22.139,0:18:27.309
is when folks start with a method[br]that they're very excited about

0:18:27.309,0:18:28.676
rather than a question.

0:18:28.900,0:18:30.492
So sort of starting with a question

0:18:30.492,0:18:33.782
where here's the object[br]I'm interested in,

0:18:33.782,0:18:35.228
here is the parameter[br]of interest --

0:18:35.529,0:18:39.500
let me think about how I would[br]identify that thing,

0:18:39.500,0:18:41.824
how I would recover that thing[br]if I had a ton of data.

0:18:41.824,0:18:44.000
Oh, here's a conditional[br]expectation function,

0:18:44.000,0:18:47.065
let me plug in a machine[br]learning estimator for that --

0:18:47.065,0:18:48.800
that seems very, very sensible.

0:18:49.000,0:18:52.964
Whereas, you know,[br]if I regress quantity on price

0:18:53.504,0:18:56.000
and say that I used[br]a machine learning method,

0:18:56.300,0:18:58.791
maybe I'm satisfied that[br]that solves the endogeneity problem

0:18:58.791,0:19:01.200
we're usually worried[br]about there -- maybe I'm not.

0:19:01.500,0:19:02.649
But, again, that's something

0:19:02.649,0:19:06.300
where the way to address it[br]seems relatively clear.

0:19:06.500,0:19:08.181
It's to find[br]your object of interest

0:19:08.181,0:19:09.779
and think about --

0:19:09.779,0:19:11.489
- Just bring in the economics.

0:19:11.489,0:19:12.741
- Exactly.

0:19:12.741,0:19:14.274
- And think about[br]the heterogeneity,

0:19:14.274,0:19:17.067
but harness the power[br]of the machine learning methods

0:19:17.067,0:19:20.148
for some of the components.

0:19:20.349,0:19:21.388
- Precisely. Exactly.

0:19:21.388,0:19:23.673
So the question of interest

0:19:23.673,0:19:25.801
is the same as the question[br]of interest has always been,

0:19:25.801,0:19:28.603
but we now have better methods[br]for estimating some pieces of this.

0:19:29.900,0:19:32.704
The place that seems[br]harder to forecast

0:19:32.704,0:19:35.816
is obviously there's[br]a huge amount going on

0:19:35.816,0:19:37.500
in the machine learning literature,

0:19:37.500,0:19:40.223
and the limited ways[br]of plugging it in

0:19:40.223,0:19:41.388
that I've referenced so far

0:19:41.388,0:19:43.090
are a limited piece of that.

0:19:43.090,0:19:45.394
So I think there are all sorts[br]of other interesting questions

0:19:45.394,0:19:46.630
about where...

0:19:47.100,0:19:49.300
where does this interaction go?[br]What else can we learn?

0:19:49.300,0:19:52.932
And that's something where[br]I think there's a ton going on,

0:19:52.932,0:19:54.414
which seems very promising,

0:19:54.414,0:19:56.400
and I have no idea[br]what the answer is.

0:19:57.000,0:20:00.297
- No, I totally agree with that,

0:20:00.297,0:20:03.539
but that makes it very exciting.

0:20:03.539,0:20:06.100
And I think there's just[br]a little work to be done there.

0:20:06.600,0:20:08.720
Alright. So Isaiah agrees[br]with me there.

0:20:08.720,0:20:10.174
[laughter]

0:20:10.174,0:20:11.633
- I didn't say that per se.

0:20:12.926,0:20:14.419
♪ [music] ♪

0:20:14.419,0:20:16.833
- [Narrator] If you'd like to watch[br]more Nobel Conversations,

0:20:16.833,0:20:18.012
click here.

0:20:18.012,0:20:20.500
Or if you'd like to learn[br]more about econometrics,

0:20:20.500,0:20:23.100
check out Josh's[br]Mastering Econometrics series.

0:20:23.600,0:20:26.569
If you'd like to learn more[br]about Guido, Josh, and Isaiah,

0:20:26.569,0:20:28.550
check out the links[br]in the description.

0:20:28.550,0:20:30.535
♪ [music] ♪