♪ [music] ♪

- [Narrator] Welcome to Nobel Conversations. In this episode, Josh Angrist and Guido Imbens sit down with Isaiah Andrews to discuss, and disagree over, the role of machine learning in applied econometrics.

- [Isaiah] So, of course, there are a lot of topics where you guys largely agree, but I'd like to turn to one where maybe you have some differences of opinion. I'd love to hear some of your thoughts about machine learning and the role that it's playing and is going to play in economics.

- [Guido] I've looked at some data that were proprietary, so there's no published paper there. There was an experiment that was done on some search algorithm, and the question was about ranking things and changing the ranking. And it was sort of clear that there was going to be a lot of heterogeneity there. If you look for, say, a picture of Britney Spears, it doesn't really matter where you rank it, because you're going to figure out what you're looking for whether you put it in the first or second or third position of the ranking. But if you're looking for the best econometrics book, whether you put your book first or your book tenth is going to make a big difference in how often people are going to click on it. And so there you --

- [Josh] Why do I need machine learning to discover that? It seems like I can discover it simply.

- [Guido] So in general --

- [Josh] There were lots of possible...

- [Guido] You want to think about there being lots of characteristics of the items, and you want to understand what drives the heterogeneity in the effect of --

- [Josh] But you're just predicting. In some sense, you're solving a marketing problem.

- [Guido] No, it's a causal effect.

- [Josh] It's causal, but it has no scientific content. Think about...

- [Guido] No, but there are similar things in medical settings. If you do an experiment, you may actually be very interested in whether the treatment works for some groups or not.
And you have a lot of individual characteristics, and you want to systematically search --

- [Josh] Yeah, I'm skeptical about that -- that sort of idea that there's this personal causal effect that I should care about, and that machine learning can discover it in some way that's useful. So think about -- I've done a lot of work on schools: going to, say, a charter school, a publicly funded private school that's effectively free to structure its own curriculum, for context there. Some types of charter schools generate spectacular achievement gains, and in the data set that produces that result, I have a lot of covariates. So I have baseline scores, and I have family background, the education of the parents, the sex of the child, the race of the child. And as soon as I put half a dozen of those together, I have a very high-dimensional space. I'm definitely interested in coarse features of that treatment effect, like whether it's better for people who come from lower-income families. I have a hard time believing that there's an application for the very high-dimensional version of that, where I discover that the effect is there for non-white children who have high family incomes but baseline scores in the third quartile, and who went to public school in the third grade but not the sixth grade. That's what that high-dimensional analysis produces: a very elaborate conditional statement. There are two things that are wrong with that in my view. First, I just can't imagine why it's actionable -- I don't know why you'd want to act on it. And I know also that there's some alternative model that fits almost as well that flips everything, because machine learning doesn't tell me that this is really the predictor that matters -- it just tells me that this is a good predictor. And so I think there is something different about the social science context.
- [Guido] I think the social science applications you're talking about are ones where, I think, there's not a huge amount of heterogeneity in the effects.

- [Josh] Well, there might be, if you allow me to fill that space.

- [Guido] No... not even then. I think for a lot of those interventions, you would expect that the effect is the same sign for everybody. There may be small differences in the magnitude, but it's not... For a lot of these educational interventions -- they're good for everybody. It's not that they're bad for some people and good for other people, or that there are very small pockets where they're bad. There may be some variation in the magnitude, but you would need very, very big data sets to find it. I agree that in those cases it probably wouldn't be very actionable anyway. But I think there are a lot of other settings where there is much more heterogeneity.

- [Josh] Well, I'm open to that possibility, and I think the example you gave is essentially a marketing example.

- [Guido] No, those have implications for how you organize things, whether you need to worry about the...

- [Josh] Well, I need to see that paper.

- [Isaiah] So the sense I'm getting is that --

- We still disagree on something. - Yes. - We haven't converged on everything.

- [Isaiah] I'm getting that sense. [laughter]

- Actually, we've diverged on this, because this wasn't around to argue about. [laughter]

- Is it getting a little warm here?

- [Isaiah] Warmed up. Warmed up is good. The sense I'm getting is, Josh, you're not saying that you're confident that there is no application where this stuff is useful. You're saying you're unconvinced by the existing applications to date.

- Fair enough. - I'm very confident. [laughter] - In this case.
- [Guido] I think Josh does have a point that even the prediction cases where a lot of the machine learning methods really shine are ones where there's just a lot of heterogeneity.

- [Josh] You don't really care much about the details there, right?

- [Guido] Yes.

- [Josh] It doesn't have a policy angle or something.

- [Guido] Recognizing handwritten digits and things like that -- machine learning does much better there than building some complicated model. But in a lot of the social science, a lot of the economic applications, we actually know a huge amount about the relationships between the variables. A lot of the relationships are strictly monotone. Education is going to increase people's earnings, irrespective of the demographic, irrespective of the level of education you already have.

- [Josh] Until they get to a Ph.D.

- Is that true for graduate school? [laughter]

- [Guido] Over a reasonable range, it's not going to go down very much. In a lot of the settings where these machine learning methods shine, there's a lot of non-monotonicity, a kind of multimodality in these relationships, and there they're going to be very powerful. But I still stand by this: these methods just have a huge amount to offer for economists, and they're going to be a big part of the future.

♪ [music] ♪

- [Isaiah] It feels like there's something interesting to be said about machine learning here. So, Guido, I was wondering, could you give maybe some examples of the sorts of applications you're thinking about, with applications coming out at the moment?

- [Guido] So one area is where, instead of looking for average causal effects, we're looking for individualized estimates -- predictions of causal effects -- and there, the machine learning algorithms have been very effective. Traditionally, we would have done these things using kernel methods, and theoretically they work great, and there are some arguments that, formally, you can't do any better.
But in practice, they don't work very well. The causal-forest-type methods that Stefan Wager and Susan Athey have been working on are used very widely. They've been very effective in these settings at actually getting causal effects that vary by covariates. I think this is still just the beginning of these methods. But in many cases, these algorithms are very effective at searching over big spaces and finding the functions that fit very well, in ways that we couldn't really do before.

- [Josh] I don't know of an example where machine learning has generated insights about a causal effect that I'm interested in. And I do know of examples where it's potentially very misleading. So I've done some work with Brigham Frandsen, using, for example, random forests to model covariate effects in an instrumental variables problem where you need to condition on covariates. You don't particularly have strong feelings about the functional form for that, so maybe you should be open to flexible curve fitting. That leads you down a path where there are a lot of nonlinearities in the model, and that's very dangerous with IV, because any sort of excluded nonlinearity potentially generates a spurious causal effect. Brigham and I showed that very powerfully, I think, in the case of two instruments that come from a paper of mine with Bill Evans, where if you replace a traditional two-stage least squares estimator with some kind of random forest, you get very precisely estimated nonsense. I think that's a big caution. In view of those findings, in an example I care about, where the instruments are very simple and I believe that they're valid, I would be skeptical of that. Nonlinearity and IV don't mix very comfortably.

- [Guido] No, it sounds like that's already a more complicated...

- [Josh] Well, it's IV... - Yeah.

- ...but then we work on that. [laughter]

- Fair enough.
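[To make the caution concrete, here is a minimal simulation sketch. The data-generating process and numbers are invented for illustration, and this is not the Angrist-Frandsen analysis itself, just one mechanism behind it: with a true treatment effect of zero, naively substituting overfit, in-sample random-forest fitted values for the linear first stage of two-stage least squares manufactures a confidently wrong estimate, because the fitted values memorize part of the treatment and carry the confounder into the second stage.]

```python
# Sketch only: invented data, illustrating why a random-forest "first stage"
# plugged into 2SLS can produce precisely estimated nonsense.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 5000
z = rng.normal(size=n)                      # a simple, valid instrument
u = rng.normal(size=n)                      # unobserved confounder
d = z + u + rng.normal(size=n)              # endogenous treatment
y = 0.0 * d + 2.0 * u + rng.normal(size=n)  # true effect of d is ZERO

def second_stage(d_hat):
    """OLS of y on [1, d_hat]; returns the coefficient on d_hat."""
    W = np.column_stack([np.ones(n), d_hat])
    return np.linalg.lstsq(W, y, rcond=None)[0][1]

# Standard 2SLS: linear first stage, so fitted values depend only on z.
W1 = np.column_stack([np.ones(n), z])
d_hat_linear = W1 @ np.linalg.lstsq(W1, d, rcond=None)[0]

# Naive plug-in: in-sample random-forest fitted values partially memorize d,
# smuggling the confounder u into the second stage.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
d_hat_forest = forest.fit(z.reshape(-1, 1), d).predict(z.reshape(-1, 1))

print(f"2SLS estimate:           {second_stage(d_hat_linear):+.3f} (truth: 0)")
print(f"Forest plug-in estimate: {second_stage(d_hat_forest):+.3f}")
```

[With these settings the linear 2SLS estimate lands near zero while the forest plug-in does not; the problem is not the forest itself but treating its in-sample fit as if it were a valid first-stage projection.]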
♪ [music] ♪

- [Guido] As editor of Econometrica, a lot of these papers cross my desk, but the motivation is often not clear and, in fact, really lacking. They're not the old type of semiparametric foundational papers. So that's a big problem. A related problem is that we have this tradition in econometrics of being very focused on formal asymptotic results. We just have a lot of papers where people propose a method and then establish its asymptotic properties in a very standardized way.

- [Josh] Is that bad?

- [Guido] Well, I think it's sort of closed the door to a lot of work that doesn't fit into that, whereas in the machine learning literature, a lot of things are more algorithmic. People had algorithms for coming up with predictions that turn out to actually work much better than, say, nonparametric kernel regression. For a long time, we were doing all the nonparametrics in econometrics using kernel regression, and that was great for proving theorems. You could get confidence intervals, and consistency, and asymptotic normality, and it was all great, but it wasn't very useful. And the things they did in machine learning are just way, way better. But they didn't have the --

- [Josh] That's not my beef with machine learning -- that the theory is weak. [laughter]

- [Guido] No, but I'm saying that for the prediction part, it does much better.

- [Josh] Yeah, it's a better curve-fitting tool.

- [Guido] But it did so in a way that would not have made it easy to get those papers into the econometrics journals initially, because they weren't proving the type of things... When Breiman was doing his regression trees, they just didn't fit in. I think he would have had a very hard time publishing those things in econometrics journals. I think we've limited ourselves too much, and that's closed us off from a lot of these machine learning methods that are actually very useful.
I mean, in general, that literature -- the computer scientists -- has proposed a huge number of these algorithms that are actually very useful and that are affecting the way we're going to be doing empirical work. But we've not fully internalized that, because we're still very focused on getting point estimates, and getting standard errors, and getting p-values, in a way that we need to move beyond to fully harness the benefits of the machine learning literature.

- [Isaiah] On the one hand, I very much take your point that the traditional econometrics framework -- propose a method, prove a limit theorem under some asymptotic story, publish the paper -- is constraining, and that by thinking more broadly about what a methods paper could look like, we may do better. Certainly the machine learning literature has found a bunch of things that seem to work quite well for a number of problems and are now having substantial influence in economics. I guess a question I'm interested in is how you think about the role of the theory part of it -- do you think there's no value in it? Because a question I often have when seeing the output from a machine learning tool -- and actually, a number of the methods you talked about do have inferential results developed for them -- is about uncertainty quantification. I have my prior, I come into the world with my view, I see the result of this thing. How should I update based on it? In some sense, if I'm in a world where things are normally distributed, I know how to do that; here I don't. And so I'm interested to hear what you think about that.
- [Guido] I don't see this as saying those results are not interesting. But there are going to be a lot of cases where it's incredibly hard to get those results, and we may not be able to get there, and we may need to do it in stages: first someone says, "Hey, I have this interesting algorithm for doing something, and it works well by some criterion on this particular data set," and we should put it out there. Maybe someone will later figure out a way that you can actually still do inference under some conditions, and maybe those are not particularly realistic conditions, and then we go further. But I think we've been constraining things too much, where we said, "This is the type of thing that we need to do." In some sense, that goes back to the way Josh and I thought about things for the local average treatment effect. That wasn't quite the way people were thinking about these problems before. There was a sense among some people that the way you need to do these things is to first say what you're interested in estimating and then do the best job you can in estimating that -- and that what you guys are doing is doing it backwards. You say, "Here, I have an estimator, and now I'm going to figure out what it's estimating," and then presumably why you think that's interesting, or maybe why it's not interesting -- and that's not okay; you're not allowed to do it that way. I think we should just be a little more flexible in thinking about how to look at problems, because I think we've missed some things by not doing that.

♪ [music] ♪

- [Josh] So you've heard our views, Isaiah, and you've seen that we have some points of disagreement. Why don't you referee this dispute for us? [laughter]

- [Isaiah] Oh, it's so nice of you to ask me a small question.
[laughter]

So I guess, for one, I very much agree with something that Guido said earlier... [laughter] One place where the case for machine learning seems relatively clear is in settings where we're interested in some version of a nonparametric prediction problem. So I'm interested in estimating a conditional expectation or a conditional probability, and in the past, maybe I would have run a kernel regression, or a series regression, or something along those lines. It seems like, at this point, we have a fairly good sense that in a fairly wide range of applications, machine learning methods seem to do better at estimating conditional mean functions, or conditional probabilities, or various other nonparametric objects, than the more traditional nonparametric methods that were studied in econometrics and statistics, especially in high-dimensional settings.

- [Guido] So you're thinking of maybe the propensity score or something like that?

- [Isaiah] Yeah, exactly.

- Nuisance functions.

- [Isaiah] Yeah, things like propensity scores. Even objects of more direct interest, like conditional average treatment effects, which are the difference of two conditional expectation functions -- potentially things like that. Of course, even there, the theory for inference -- for how to interpret these things, how to make large-sample statements about them -- is less well developed, depending on the machine learning estimator used. And so I think something that is tricky is that we can have these methods, which seem to work a lot better for some purposes, but which we need to be a bit careful in how we plug in, and in how we interpret the resulting statements. But, of course, that's a very, very active area right now, where people are doing tons of great work, so I fully expect, and hope, to see much more going forward there. So one issue with machine learning that always seems a danger --
or that is sometimes a danger, and has sometimes led to applications that have made less sense -- is when folks start with a method they're very excited about rather than with a question. Starting with a question -- here's the object I'm interested in, here's the parameter of interest, let me think about how I would identify that thing, how I would recover it if I had a ton of data; oh, here's a conditional expectation function, let me plug in a machine learning estimator for that -- seems very, very sensible. Whereas if I regress quantity on price and say that I used a machine learning method, maybe I'm satisfied that that solves the endogeneity problems we're usually worried about there... maybe I'm not. But, again, that's something where the way to address it seems relatively clear: find your object of interest and think about --

- [Josh] Just bring in the economics.

- [Isaiah] Exactly.

- [Guido] And think about the heterogeneity, but harness the power of the machine learning methods for some of the components.

- [Isaiah] Precisely. Exactly. So the question of interest is the same as it has always been, but we now have better methods for estimating some pieces of it. The place that seems harder to forecast is -- obviously there's a huge amount going on in the machine learning literature, and the ways of plugging it in that I've referenced so far are a limited piece of that. So I think there are all sorts of other interesting questions about where this interaction goes, and what else we can learn. That's something where I think there's a ton going on which seems very promising, and I have no idea what the answer is.

- [Guido] No, I totally agree with that, but that makes it very exciting. And I think there's just a lot of work to be done there.

- [Josh] All right. So I'd say he agrees with me there. [laughter]

- [Isaiah] I didn't say that per se.
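[As a concrete version of the recipe Isaiah sketches -- fix the object of interest first, then plug machine learning in only for the nuisance pieces -- here is a minimal sketch of cross-fitted AIPW (doubly robust) estimation of an average treatment effect under unconfoundedness. The data-generating process and model choices are invented for illustration; this illustrates the general idea rather than any specific estimator discussed in the conversation.]

```python
# Sketch only: invented data. The estimand (the ATE) is fixed first; random
# forests enter only as plug-ins for the nuisance functions (the propensity
# score and the two outcome regressions), with cross-fitting so that each
# observation's nuisances are predicted by models fit on other folds.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=(n, 5))                   # covariates
p = 1.0 / (1.0 + np.exp(-x[:, 0]))            # true propensity score
d = rng.binomial(1, p)                        # treatment indicator
y = 1.0 * d + x[:, 0] + rng.normal(size=n)    # true ATE = 1.0

scores = np.empty(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(x):
    e = (RandomForestClassifier(n_estimators=200, random_state=0)
         .fit(x[train], d[train]).predict_proba(x[test])[:, 1])
    e = np.clip(e, 0.01, 0.99)                # trim extreme propensities
    treated, control = train[d[train] == 1], train[d[train] == 0]
    m1 = (RandomForestRegressor(n_estimators=200, random_state=0)
          .fit(x[treated], y[treated]).predict(x[test]))
    m0 = (RandomForestRegressor(n_estimators=200, random_state=0)
          .fit(x[control], y[control]).predict(x[test]))
    # AIPW score: outcome-model contrast plus propensity-weighted residuals.
    scores[test] = (m1 - m0
                    + d[test] * (y[test] - m1) / e
                    - (1 - d[test]) * (y[test] - m0) / (1 - e))

ate, se = scores.mean(), scores.std(ddof=1) / np.sqrt(n)
print(f"ATE estimate: {ate:.3f} (SE {se:.3f}); truth is 1.0")
```

[The division of labor is the point: the question and the identifying assumptions come from the economics, and the machine learning is confined to the conditional-expectation pieces, where, as discussed above, it tends to outperform kernel and series methods.]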
♪ [music] ♪

- [Narrator] If you'd like to watch more Nobel Conversations, click here. Or if you'd like to learn more about econometrics, check out Josh's Mastering Econometrics series. If you'd like to learn more about Guido, Josh, and Isaiah, check out the links in the description.

♪ [music] ♪