♪ [music] ♪

- [Narrator] Welcome to Nobel Conversations. In this episode, Josh Angrist and Guido Imbens sit down with Isaiah Andrews to discuss, and disagree over, the role of machine learning in applied econometrics.

- [Isaiah] So, of course, there are a lot of topics where you guys largely agree, but I'd like to turn to one where maybe you have some differences of opinion. I'd love to hear some of your thoughts about machine learning and the role that it's playing and is going to play in economics.

- [Guido] I've looked at some data like this. It's proprietary, so there's no published paper. There was an experiment that was done on some search algorithm, and the question was about ranking things and changing the ranking. It was sort of clear that there was going to be a lot of heterogeneity there. You know, if you look for, say, a picture of Britney Spears, it doesn't really matter where you rank it, because you're going to figure out what you're looking for whether you put it in the first or second or third position of the ranking. But if you're looking for the best econometrics book, whether you put your book first or your book tenth is going to make a big difference in how often people are going to click on it. And so there you go --

- [Josh] Why do I need machine learning to discover that? It seems like I could discover that simply.

- [Guido] So in general --

- [Josh] There were lots of possible...

- [Guido] What you want is to think about there being lots of characteristics of the items, where you want to understand what drives the heterogeneity in the effect of --

- [Josh] But you're just predicting. In some sense, you're solving a marketing problem.

- [Guido] [inaudible] It's a causal effect --

- [Josh] It's causal, but it has no scientific content. Think about...

- [Guido] No, but there are similar things in medical settings. If you do an experiment, you may actually be very interested in whether the treatment works for some groups or not.
And you have a lot of individual characteristics, and you want to systematically search.

- [Josh] Yeah, I'm skeptical about that -- about the idea that there's this personal causal effect that I should care about, and that machine learning can discover it in some way that's useful. So think about it -- I've done a lot of work on schools: going to, say, a charter school, a publicly funded private school, effectively, that's free to structure its own curriculum, for context. Some types of charter schools generate spectacular achievement gains, and in the data set that produces that result, I have a lot of covariates: baseline scores, family background, the education of the parents, the sex of the child, the race of the child. As soon as I put half a dozen of those together, I have a very high-dimensional space. I'm definitely interested in coarse features of that treatment effect, like whether it's better for people who come from lower-income families. But I have a hard time believing that there's an application for the very high-dimensional version of that, where I discover that it works for non-white children who have high family incomes but baseline scores in the third quartile, and who only went to public school in the third grade but not the sixth grade. That's what that high-dimensional analysis produces: this very elaborate conditional statement. There are two things wrong with that, in my view. First, I just can't imagine why it's actionable -- I don't know why you'd want to act on it. And I also know that there's some alternative model that fits almost as well and flips everything, because machine learning doesn't tell me that this is really the predictor that matters. It just tells me that this is a good predictor. So I think there is something different about the social science context.
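(A quick back-of-the-envelope makes Josh's dimensionality point concrete. This is a minimal sketch; the category counts and sample size are illustrative assumptions, not figures from his charter school data.)

```python
# How fast fully interacted subgroups proliferate -- illustrative numbers only.
from math import prod

covariate_levels = {
    "baseline score quartile": 4,  # assumed coarsening, not from the study
    "family income quartile": 4,
    "parental education": 4,
    "sex": 2,
    "race/ethnicity": 4,
}
cells = prod(covariate_levels.values())  # 4 * 4 * 4 * 2 * 4 = 512 subgroups
n_students = 5_000                       # a generously sized school study (assumed)
print(f"{cells} cells, about {n_students / cells:.0f} students per cell")
# 512 cells, about 10 students per cell: far too few observations
# to pin down a separate treatment effect in each one.
```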
- [Guido] In the social science applications you're talking about, I think there's not a huge amount of heterogeneity in the effects.

- [Josh] There might be, if you allow me to fill that space.

- [Guido] No... not even then. I think for a lot of those interventions, you would expect that the effect is the same sign for everybody. There may be small differences in the magnitude, but for a lot of these education interventions, they're good for everybody. It's not that they're bad for some people and good for other people, aside from maybe some very small pockets where they're bad. There may be some variation in the magnitude, but you would need very, very big data sets to find it, and I agree that in those cases it probably wouldn't be very actionable anyway. But I think there are a lot of other settings where there is much more heterogeneity.

- [Josh] Well, I'm open to that possibility, but I think the example you gave is essentially a marketing example.

- [Guido] No, those have implications for how you organize things, for whether you need to worry about the...

- [Josh] Well, I need to see that paper.

- [Isaiah] So the sense I'm getting...

- We still disagree on something.

- Yes.

[laughter]

- We haven't converged on everything.

- I'm getting that sense.

[laughter]

- Actually, we've diverged on this, because this wasn't around to argue about before.

[laughter]

- Is it getting a little warm here?

- Warmed up. Warmed up is good.

- [Isaiah] The sense I'm getting is, Josh, you're not saying you're confident that there's no application where this stuff is useful. You're saying you're unconvinced by the existing applications to date. Fair enough?

- [Josh] I'm very confident.

[laughter]

- [Josh] In this case.
- [Guido] I think Josh does have a point. Even in the prediction cases, where a lot of the machine learning methods really shine is where there's just a lot of heterogeneity.

- [Josh] You don't really care much about the details there, right? It doesn't have a policy angle or something.

- [Guido] Right -- things like recognizing handwritten digits and such. Machine learning does much better there than building some complicated model. But in a lot of the social science applications, a lot of the economic applications, we actually know a huge amount about the relationships between the variables. A lot of the relationships are strictly monotone. Education is going to increase people's earnings, irrespective of the demographic, irrespective of the level of education you already have.

- [Josh] Until they get to a Ph.D.

- [Guido] Yeah, there is graduate school...

[laughter]

But over a reasonable range, it's not going to go down very much. In a lot of the settings where these machine learning methods shine, there's a lot of [ ] kind of multimodality in these relationships, and there they're going to be very powerful. But I still stand by this: these methods have a huge amount to offer for economists, and they're going to be a big part of the future.

- [Isaiah] It feels like there's something interesting to be said about machine learning here. So, Guido, I was wondering, could you give some more... maybe some examples of the sorts of applications you're thinking about [ ] at the moment?

- [Guido] So, in areas where, instead of looking for average causal effects, we're looking for individualized estimates -- predictions of causal effects -- the machine learning algorithms have been very effective. Traditionally, we would have done these things using kernel methods, and theoretically they work great; there are even arguments that, formally, you can't do any better. But in practice, they don't work very well.
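(A minimal sketch of the kind of comparison Guido gestures at, assuming a simulated prediction problem with a bumpy, interaction-heavy conditional mean and several irrelevant inputs. The data-generating process and tuning values are invented for illustration; on draws like this, the forest typically comes out ahead because it ignores the irrelevant dimensions, while an untuned kernel method is hurt by them.)

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n, p = 4_000, 10
X = rng.uniform(-2, 2, size=(n, p))
# Only the first three inputs matter; the conditional mean is nonmonotone.
f = np.sin(3 * X[:, 0]) * (X[:, 1] > 0) + X[:, 2] ** 2
y = f + rng.normal(scale=0.5, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

kernel = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1).fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

print("kernel ridge, held-out R^2:", r2_score(y_te, kernel.predict(X_te)))
print("random forest, held-out R^2:", r2_score(y_te, forest.predict(X_te)))
```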
- [Guido] The causal forest-type methods that Stefan Wager and Susan Athey have been working on have been used very widely. They've been very effective in these settings for actually getting causal effects that vary by [ ]. I think this is still just the beginning for these methods, but in many cases these algorithms are very effective at searching over big spaces and finding the functions that fit very well, in ways that we couldn't really do before.

- [Josh] I don't know of an example where machine learning has generated insights about a causal effect that I'm interested in, and I do know of examples where it's potentially very misleading. I've done some work with Brigham Frandsen, using, for example, random forests to model covariate effects in an instrumental variables problem, where you need to condition on covariates and you don't particularly have strong feelings about the functional form for that, so maybe you should be open to flexible curve fitting. That leads you down a path where there are a lot of nonlinearities in the model, and that's very dangerous with IV, because any sort of excluded nonlinearity potentially generates a spurious causal effect. Brigham and I showed that very powerfully, I think, in the case of two instruments that come from a paper of mine with Bill Evans: if you replace a traditional two-stage least squares estimator with some kind of random forest, you get very precisely estimated nonsense estimates. I think that's a big caution. In view of those findings, in an example I care about, where the instruments are very simple and I believe that they're valid, I would be skeptical of that. So nonlinearity and IV don't mix very comfortably.

- [Guido] No, it sounds like that's already a more complicated...

- [Josh] Well, it's IV...

- [Guido] Yeah.

- [Josh] ...and we work on that.

[laughter]

- Fair enough.
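(Josh's warning can be reproduced in a few lines. The sketch below is not the Angrist-Frandsen design or the Angrist-Evans data; it is a hypothetical simulation in which the instrument is valid and the true effect of the treatment is exactly zero. Plugging random-forest first-stage fitted values straight into the second stage smuggles an excluded nonlinearity in the covariate back in and manufactures a precisely estimated effect out of nothing.)

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)                     # observed covariate
z = rng.binomial(1, 0.5, n).astype(float)  # valid, randomly assigned instrument
d = z + x**2 + rng.normal(size=n)          # first stage is nonlinear in x
y = 0.0 * d + x**2 + rng.normal(size=n)    # true effect of d is exactly zero

def second_stage(d_hat):
    """OLS of y on (d_hat, x, constant); x is controlled for linearly only."""
    design = np.column_stack([d_hat, x, np.ones(n)])
    return np.linalg.lstsq(design, y, rcond=None)[0][0]

# Conventional 2SLS: first stage linear in (z, x).
linear_fs = LinearRegression().fit(np.column_stack([z, x]), d)
d_hat_2sls = linear_fs.predict(np.column_stack([z, x]))

# "Machine learning 2SLS": random-forest fitted values plugged in directly.
rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=50, random_state=0)
d_hat_rf = rf.fit(np.column_stack([z, x]), d).predict(np.column_stack([z, x]))

print("linear first stage:", second_stage(d_hat_2sls))  # close to 0, as it should be
print("forest first stage:", second_stage(d_hat_rf))    # far from 0: the forest's
# fitted values load on x**2, which also sits in y's error term, so the
# excluded nonlinearity masquerades as a causal effect.
```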
- [Guido] As editor of Econometrica, a lot of these papers cross my desk, and often the motivation is not clear -- in fact, it's really lacking. They're not [ ]-type semiparametric foundational papers. So that's a big problem. A related problem is that we have this tradition in econometrics of being very focused on formal asymptotic results. We just have a lot of papers where people propose a method and then establish its asymptotic properties in a very standardized way.

- [Josh] Is that bad?

- [Guido] Well, I think it has sort of closed the door on a lot of work that doesn't fit into that mold, whereas in the machine learning literature, a lot of things are more algorithmic. People had algorithms for coming up with predictions that turn out to actually work much better than, say, nonparametric kernel regression. For a long time, when we were doing nonparametrics in econometrics, we were using kernel regression, and it was great for proving theorems. You could get confidence intervals, consistency, and asymptotic normality, and it was all great. But it wasn't very useful, and the things they did in machine learning are just way, way better. But they didn't have the problem --

- [Josh] That's not my beef with machine learning theory.

[laughter]

- [Guido] No, but I'm saying that for the prediction part, it does much better.

- [Josh] Yeah, it's better curve fitting.

- [Guido] But it did so in a way that would not have made those papers easy to get into the econometrics journals initially, because it wasn't proving the standard type of results. When Breiman was doing his regression trees, that just didn't fit in. I think he would have had a very hard time publishing those things in econometrics journals. I think we've limited ourselves too much, in a way that closed things off to a lot of these machine learning methods that are actually very useful.
- [Guido] I mean, in general, that literature -- the computer scientists -- has proposed a huge number of these algorithms that are actually very useful and that are affecting the way we're going to be doing empirical work. But we've not fully internalized that, because we're still very focused on getting point estimates, getting standard errors, and getting p-values, in a way that we need to move beyond to fully harness the benefits of the machine learning literature.

- [Isaiah] On the one hand, I very much take your point that the traditional econometrics framework -- propose a method, prove a limit theorem under some asymptotic story, publish a paper -- is constraining, and that, in some sense, by thinking more broadly about what a methods paper could look like, we may gain something. Certainly the machine learning literature has found a bunch of things which seem to work quite well for a number of problems and which are now having substantial influence in economics. I guess a question I'm interested in is: do you think there is no value in the theory part of it? Because a question that I often have on seeing the output from a machine learning tool -- and, actually, a number of the methods you talked about do have inferential results developed for them -- is about uncertainty quantification. I have my prior, I come into the world with my view, I see the result of this thing: how should I update based on it? In some sense, if I'm in a world where things are normally distributed, I know how to do that. Here, I don't. So I'm interested to hear what you think about that.
- [Guido] I don't see this as saying those results are not interesting. But there are going to be a lot of cases where it's going to be incredibly hard to get those results, and we may not be able to get there. We may need to do it in stages, where first someone says, "Hey, I have this interesting algorithm for doing something, and it works well by some criterion on this particular data set, and I'm going to put it out there" -- and maybe someone will later figure out a way that you can actually still do inference, under some conditions, and maybe those are not particularly realistic conditions, and then we go further. But I think we've been constraining things too much, where we said, "This is the type of thing that we need to do." In some sense, that goes back to the way Josh and I thought about things for the local average treatment effect. That wasn't quite the way people were thinking about these problems before. There was a sense among some people that the way you need to do these things is to first say what you're interested in estimating and then do the best job you can in estimating that, and that what you guys were doing was backwards: you say, "Here, I have an estimator, and now I'm going to figure out what it's estimating, and then I suppose I'll say why I think that's interesting, or maybe why it's not interesting" -- and that that's not okay, that you're not allowed to do it that way. I think we should just be a little bit more flexible in thinking about how to look at problems, because I think we've missed some things by not doing that.

- [Josh] So you've heard our views, Isaiah. You've seen that we have some points of disagreement. Why don't you referee this dispute for us?

[laughter]

- [Isaiah] Oh, it's so nice of you to ask me a small question. I guess, for one, I very much agree with something that Guido said earlier...
[laughter]

- [Isaiah] So one place where the case for machine learning seems relatively clear is in settings where we're interested in some version of a nonparametric prediction problem. I'm interested in estimating a conditional expectation or a conditional probability, and in the past maybe I would have run a kernel regression or a series regression, or something along those lines. It seems like, at this point, we have a fairly good sense that in a fairly wide range of applications, machine learning methods seem to do better than the more traditional nonparametric methods that were studied in econometrics and statistics for estimating conditional mean functions, conditional probabilities, or various other nonparametric objects, especially in high-dimensional settings.

- [Guido] So you're thinking of, maybe, the propensity score or something like that?

- [Isaiah] Yeah, exactly.

- Nuisance functions.

- [Isaiah] Yeah, so things like propensity scores, but also objects of more direct interest, like conditional average treatment effects, which are the difference of two conditional expectation functions; potentially things like that. Of course, even there, the theory for inference -- for how to interpret these things and make large-sample statements about them -- is less well developed, depending on the machine learning estimator used. So something that I think is tricky is that we have these methods which seem to work a lot better for some purposes, but we need to be a bit careful about how we plug them in and how we interpret the resulting statements. But of course, that's a very, very active area right now, where people are doing tons of great work, so I fully expect, and hope, to see much more going forward there.
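(One concrete version of the plug-in idea Isaiah describes: estimate a conditional average treatment effect as the difference of two fitted conditional expectations, sometimes called a T-learner. This is a minimal sketch on invented data from a randomized design, not the causal-forest machinery mentioned earlier; any off-the-shelf regressor could stand in for the gradient boosting used here.)

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=(n, 5))            # covariates
t = rng.binomial(1, 0.5, n)            # randomized treatment
tau = np.where(X[:, 0] > 0, 1.0, 0.0)  # true effect varies with the first covariate
y = X[:, 1] + tau * t + rng.normal(size=n)

# Fit E[y | X] separately in each arm, then difference the predictions.
m1 = GradientBoostingRegressor(random_state=0).fit(X[t == 1], y[t == 1])
m0 = GradientBoostingRegressor(random_state=0).fit(X[t == 0], y[t == 0])
cate_hat = m1.predict(X) - m0.predict(X)

# The estimated CATEs track the true heterogeneity.
print("corr(cate_hat, tau):", np.corrcoef(cate_hat, tau)[0, 1])
```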
- [Isaiah] One issue with machine learning that always seems a danger, or that is sometimes a danger and has sometimes led to applications that made less sense, is when folks start with a method that they're very excited about rather than with a question. Starting with a question -- here's the object I'm interested in, here's the parameter of interest; let me think about how I would identify that thing, how I would recover it if I had a ton of data; oh, here's a conditional expectation function, let me plug in a machine learning estimator for that -- seems very, very sensible. Whereas if I regress quantity on price and say that I used a machine learning method, maybe I'm satisfied that that solves the endogeneity problem we're usually worried about there... maybe I'm not. But again, that's something where the way to address it seems relatively clear: find your object of interest and think about --

- Just bring in the economics.

- Exactly.

- And then can I think about heterogeneity, but harness the power of the machine learning methods for some of the components?

- Precisely. Exactly. The question of interest is the same as it has always been; we now just have better methods for estimating some of the pieces. The place that seems harder to forecast is... obviously, there's a huge amount going on in the machine learning literature, and the limited ways of plugging it in that I've referenced so far are a small piece of that. So I think there are all sorts of other interesting questions about where this interaction goes -- what else can we learn? That's something where I think there's a ton going on that seems very promising, and I have no idea what the answer is.

- No, I totally agree with that, but that makes it very exciting. And I think there's just a lot of work to be done there.

- Alright. So I say he agrees with me there.
[laughter]

- I didn't say that, per se.

- [Narrator] If you'd like to watch more Nobel Conversations, click here. Or if you'd like to learn more about econometrics, check out Josh's Mastering Econometrics series. If you'd like to learn more about Guido, Josh, and Isaiah, check out the links in the description.

♪ [music] ♪