♪ [music] ♪

- [narrator] Welcome to Nobel Conversations. In this episode, Josh Angrist and Guido Imbens sit down with Isaiah Andrews to discuss, and disagree over, the role of machine learning in applied econometrics.

- [Isaiah] So, of course, there are a lot of topics where you guys largely agree, but I'd like to turn to one where maybe you have some differences of opinion. I'd love to hear some of your thoughts about machine learning and the role that it's playing and is going to play in economics.

- [Guido] I've looked at some data -- proprietary data, so there's no published paper there. There was an experiment that was done on some search algorithm, and the question was about ranking things and changing the ranking. It was sort of clear that there was going to be a lot of heterogeneity there. You know, if you look for, say, a picture of Britney Spears, it doesn't really matter where you rank it, because you're going to figure out what you're looking for whether you put it in the first or second or third position of the ranking. But if you're looking for the best econometrics book, whether you put your book first or your book tenth is going to make a big difference in how often people are going to click on it. And so there you go--

- [Josh] Why do I need machine learning to discover that? It seems like I could discover it simply.

- [Guido] So in general--

- [Josh] There were lots of possible...

- [Guido] You want to think about there being lots of characteristics of the items, and you want to understand what drives the heterogeneity in the effect of--

- [Josh] But you're just predicting. In some sense, you're solving a marketing problem.

- [inaudible] ...it's a causal effect.

- It's causal, but it has no scientific content. Think about...

- No, but there are similar things in medical settings. If you do an experiment, you may actually be very interested in whether the treatment works for some groups or not.
And you have a lot of individual characteristics, and you want to systematically search--

- Yeah, I'm skeptical about that -- that sort of idea that there's this personal causal effect that I should care about, and that machine learning can discover it in some way that's useful. So think about -- I've done a lot of work on schools, going to, say, a charter school: a publicly funded private school, effectively, that's free to structure its own curriculum, for context there. Some types of charter schools generate spectacular achievement gains, and in the data set that produces that result, I have a lot of covariates. So I have baseline scores, and I have family background -- the education of the parents -- the sex of the child, the race of the child. And as soon as I put half a dozen of those together, I have a very high dimensional space. I'm definitely interested in sort of coarse features of that treatment effect, like whether it's better for people who come from lower income families. But I have a hard time believing that there's an application for the very high dimensional version of that, where I discover that for non-white children who have high family incomes but baseline scores in the third quartile, and who only went to public school in the third grade but not the sixth grade... That's what that high dimensional analysis produces: a very elaborate conditional statement. There are two things wrong with that, in my view. First, I just can't imagine why it's actionable -- I don't know why you'd want to act on it. And I also know that there's some alternative model that fits almost as well that flips everything, because machine learning doesn't tell me that this is really the predictor that matters; it just tells me that this is a good predictor. And so I think there is something different about the social science context.

- [Guido] I think in the social science applications you're talking about... I think there's not a huge amount of heterogeneity in the effects.
- [Josh] There might be, if you allow me to fill that space.

- No... not even then. I think for a lot of those interventions, you would expect that the effect is the same sign for everybody. There may be small differences in the magnitude, but it's not... For a lot of these education interventions, they're good for everybody. It's not that they're bad for some people and good for other people, with some kind of very small pockets where they're bad. There may be some variation in the magnitude, but you would need very, very big data sets to find it, and I agree that in those cases it probably wouldn't be very actionable anyway. But I think there are a lot of other settings where there is much more heterogeneity.

- Well, I'm open to that possibility, and I think the example you gave is essentially a marketing example.

- No, those have implications for the organization -- whether you need to worry about the...

- Well, I need to see that paper.

- So the sense I'm getting...

- We still disagree on something.

- Yes.

[laughter]

- We haven't converged on everything.

- I'm getting that sense.

[laughter]

- Actually, we've diverged on this, because this wasn't around to argue about.

[laughter]

- Is it getting a little warm here?

- Warmed up. Warmed up is good.

- The sense I'm getting is, Josh, you're not saying that you're confident there's no way there's an application where this stuff is useful; you're saying you're unconvinced by the existing applications to date. Fair enough?

- I'm very confident.

[laughter]

- In this case.

- I think Josh does have a point that even in the prediction cases, where a lot of the machine learning methods really shine is where there's just a lot of heterogeneity.

- You don't really care much about the details there, right? It doesn't have a policy angle or something.

- Like recognizing handwritten digits and stuff -- it does much better there than building some complicated model.
But in a lot of the social science, a lot of the economic applications, we actually know a huge amount about the relationship between these variables. A lot of the relationships are strictly monotone. Education is going to increase people's earnings, irrespective of the demographic, irrespective of the level of education you already have.

- Until they get to a Ph.D.

- Yeah, there is graduate school...

[laughter]

...but over a reasonable range, it's not going to go down very much. In a lot of the settings where these machine learning methods shine, there's a lot of [inaudible] -- kind of multimodality in these relationships -- and they're going to be very powerful. But I still stand by this: these methods just have a huge amount to offer for economists, and they're going to be a big part of the future.

- [Isaiah] It feels like there's something interesting to be said about machine learning here. So, Guido, I was wondering, could you give some more... maybe some examples of the sorts of applications you're thinking about [inaudible] at the moment?

- So, in areas where, instead of looking for average causal effects, we're looking for individualized estimates -- predictions of causal effects -- the machine learning algorithms have been very effective. Traditionally, we would have done these things using kernel methods, and theoretically they work great; there are even arguments that, formally, you can't do any better. But in practice, they don't work very well. Random causal forest-type things that Stefan Wager and Susan Athey have been working on have been used very widely. They've been very effective in these settings for actually getting causal effects that vary by [inaudible]. I think this is still just the beginning of these methods, but in many cases these algorithms are very effective at searching over big spaces and finding the functions that fit very well, in ways that we couldn't really do beforehand.
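[Editor's note: as a concrete sketch of the individualized-effect estimation Guido describes, the following simulation uses plain random forests in a simple "T-learner" as a stand-in for the Wager-Athey causal forests he mentions; their actual estimator differs in important ways, and every variable and parameter below is an assumption invented for illustration.]

```python
# [Hypothetical illustration -- simulated data, simplified method.]
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, p = 5000, 10
X = rng.normal(size=(n, p))             # unit characteristics (covariates)
T = rng.integers(0, 2, size=n)          # randomized binary treatment
tau = np.where(X[:, 0] > 0, 2.0, 0.2)   # true effect varies with one covariate
Y = X[:, 1] + tau * T + rng.normal(size=n)

# Fit separate outcome models for treated and control units ("T-learner"),
# then difference the predictions to get an individualized effect estimate.
m1 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[T == 1], Y[T == 1])
m0 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[T == 0], Y[T == 0])
cate_hat = m1.predict(X) - m0.predict(X)

print("mean estimated effect, X0 > 0 :", cate_hat[X[:, 0] > 0].mean())   # near 2.0
print("mean estimated effect, X0 <= 0:", cate_hat[X[:, 0] <= 0].mean())  # near 0.2
```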
- I don't know of an example where machine learning has generated insights about a causal effect that I'm interested in, and I do know of examples where it's potentially very misleading. So I've done some work with Brigham Frandsen, using, for example, random forests to model covariate effects in an instrumental variables problem where you need to condition on covariates. You don't particularly have strong feelings about the functional form for that, so maybe you should be open to flexible curve fitting. That leads you down a path where there are a lot of nonlinearities in the model, and that's very dangerous with IV, because any sort of excluded nonlinearity potentially generates a spurious causal effect. Brigham and I showed that very powerfully, I think, in the case of two instruments that come from a paper of mine with Bill Evans: if you replace a traditional two-stage least squares estimator with some kind of random forest, you get very precisely estimated nonsense estimates. I think that's a big caution. In view of those findings, in an example I care about, where the instruments are very simple and I believe that they're valid, I would be skeptical of that. So nonlinearity and IV don't mix very comfortably.

- No, it sounds like that's already a more complicated...

- Well, it's IV...

- Yeah.

- ...and we work on that.

[laughter]

- Fair enough.
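[Editor's note: the following is a stylized simulation of the danger Josh describes, not the Angrist-Frandsen analysis itself. The instrument is valid and the true effect is zero, yet replacing the linear first stage of two-stage least squares with a random forest lets an excluded nonlinearity (w squared, which the analyst controls for only linearly) leak into the fitted values and yield a precisely estimated nonsense effect. All data are simulated for illustration.]

```python
# [Hypothetical illustration -- simulated data; the true effect of x is zero.]
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 10000
z = rng.normal(size=n)                        # valid instrument
w = rng.normal(size=n)                        # exogenous covariate
u = rng.normal(size=n)                        # unobserved confounder
x = z + w**2 + u                              # first stage, nonlinear in w
y = 0.0 * x + w**2 + u + rng.normal(size=n)   # outcome; w^2 is "excluded" below

def coef_on_xhat(xhat):
    # Second stage: OLS of y on [1, xhat, w] -- w controlled for only linearly.
    D = np.column_stack([np.ones(n), xhat, w])
    return np.linalg.lstsq(D, y, rcond=None)[0][1]

# Traditional 2SLS: linear first stage of x on (1, z, w).
F = np.column_stack([np.ones(n), z, w])
xhat_lin = F @ np.linalg.lstsq(F, x, rcond=None)[0]
print("2SLS estimate:", coef_on_xhat(xhat_lin))           # close to the true 0

# Random-forest first stage: the fitted values pick up w^2, which also sits
# in the outcome error, so the "IV" estimate is spurious and far from zero.
xhat_rf = cross_val_predict(RandomForestRegressor(n_estimators=100, random_state=0),
                            np.column_stack([z, w]), x, cv=2)
print("RF-first-stage estimate:", coef_on_xhat(xhat_rf))  # clearly nonzero
```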
- As editor of Econometrica, a lot of these papers cross my desk, but the motivation is not clear and, in fact, really lacking. They're not... [inaudible]-type semi-parametric foundational papers. So that's a big problem. A related problem is that we have this tradition in econometrics of being very focused on these formal [asymptotic] results. We just have a lot of papers where people propose a method and then establish the asymptotic properties in a very kind of standardized way.

- Is that bad?

- Well, I think it's sort of closed the door on a lot of work that doesn't fit into that, whereas in the machine learning literature, a lot of things are more algorithmic. People had algorithms for coming up with predictions that turn out to actually work much better than, say, nonparametric kernel regression. For a long time, when we were doing nonparametrics in econometrics, we were using kernel regression, and it was great for proving theorems. You could get confidence intervals and consistency and asymptotic normality, and it was all great, but it wasn't very useful. And the things they did in machine learning are just way, way better.

- But they didn't have the problem--

- That's not my beef with machine learning theory.

[laughter]

- No, but I'm saying there, for the prediction part, it does much better.

- Yeah, it's better curve fitting.

- But it did so in a way that would not have made those papers initially easy to get into the econometrics journals, because it wasn't proving the type of things we expected. When Breiman was doing his regression trees, that just didn't fit in; I think he would have had a very hard time publishing those things in econometrics journals. I think we've limited ourselves too much, and that closed things off for a lot of these machine learning methods that are actually very useful. I mean, in general, that literature -- the computer scientists -- has proposed a huge number of these algorithms that are actually very useful and that are affecting the way we're going to be doing empirical work. But we've not fully internalized that, because we're still very focused on getting point estimates and getting standard errors and getting p-values, in a way that we need to move beyond to fully harness the benefits from the machine learning literature.
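[Editor's note: a quick sketch, on made-up data, of the prediction comparison Guido draws above. scikit-learn's KernelRidge stands in for classical kernel regression, and all parameters are assumptions chosen for the example; on data like these, the boosted trees typically deliver much lower test error.]

```python
# [Hypothetical illustration -- simulated data, stand-in methods.]
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n, p = 3000, 20
X = rng.uniform(-1, 1, size=(n, p))
# The conditional mean depends on a few coordinates, with an interaction
# and a jump -- easy for trees, hard for a global kernel smoother in 20 dims.
f = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] + (X[:, 3] > 0)
y = f + 0.5 * rng.normal(size=n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("kernel ridge (rbf)", KernelRidge(kernel="rbf", alpha=1.0)),
                    ("gradient boosting ", GradientBoostingRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)
    print(name, "test MSE:", round(mean_squared_error(y_te, model.predict(X_te)), 3))
```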
- [Isaiah] On the one hand, I guess I very much take your point that the traditional econometrics framework -- propose a method, prove a limit theorem under some asymptotic story, publish the paper -- is constraining, and that, in some sense, by thinking more broadly about what a methods paper could look like, we may [gain] something. Certainly the machine learning literature has found a bunch of things which seem to work quite well for a number of problems and are now having substantial influence in economics. I guess a question I'm interested in is how you think about the role of... do you think there is no value in the theory part of it? Because a question that I often have, seeing the output from a machine learning tool -- and actually a number of the methods you talked about do have inferential results developed for them -- is about uncertainty quantification. I have my prior; I come into the world with my view; I see the result of this thing. How should I update based on it? In some sense, if I'm in a world where things are normally distributed, I know how to do it; here I don't. And so I'm interested to hear what you think about that.

- [Guido] I don't see this as saying, well, these results are not interesting. But there are going to be a lot of cases where it's going to be incredibly hard to get those results, and we may not be able to get there. We may need to do it in stages, where first someone says, "Hey, I have this interesting algorithm for doing something, and it works well by some criterion on this particular data set, and I'm going to put it out there, and maybe someone will figure out a way that you can later actually still do inference under some conditions -- and maybe those are not particularly realistic conditions -- and then we kind of go further." But I think we've been constraining things too much, where we said, "This is the type of thing that we need to do." And in some sense, that goes back to the way Josh and I thought about things for the local average treatment effect. That wasn't quite the way people were thinking about these problems before.
There was a sense that some people said the way you need to do these things is: you first say what you're interested in estimating, and then you do the best job you can in estimating that. And what you guys are doing is doing it backwards. You say, "Here, I have an estimator, and now I'm going to figure out what it's estimating, and I suppose I'm going to say why I think that's interesting, or maybe why it's not interesting." And that was not okay -- you're not allowed to do it that way. I think we should just be a little bit more flexible in thinking about how to look at problems, because I think we've missed some things by not doing that.

- [Josh] So you've heard our views, Isaiah. You've seen that we have some points of disagreement. Why don't you referee this dispute for us?

[laughter]

- [Isaiah] Oh, it's so nice of you to ask me a small question. So I guess, for one, I very much agree with something that Guido said earlier...

[laughter]

One place where the case for machine learning seems relatively clear is in settings where we're interested in some version of a nonparametric prediction problem. So I'm interested in estimating a conditional expectation or conditional probability, and in the past, maybe I would have run a kernel regression or a series regression, or something along those lines. It seems like, at this point, we have a fairly good sense that, in a fairly wide range of applications, machine learning methods seem to do better for estimating conditional mean functions or conditional probabilities or various other nonparametric objects than the more traditional nonparametric methods that were studied in econometrics and statistics, especially in high dimensional settings.

- So you're thinking of maybe the propensity score or something like that?

- Exactly -- nuisance functions. So things like propensity scores, or even objects of more direct inferential interest, like conditional average treatment effects,
which are the difference of two conditional expectation functions, potentially -- things like that. Of course, even there, the theory for inference, for how to interpret and make large-sample statements about some of these things, is less well developed, depending on the machine learning estimator used. And so I think something that's tricky is that we can have these methods, which seem to work a lot better for some purposes, but which we need to be a bit careful in how we plug in and how we interpret the resulting statements. But of course, that's a very, very active area right now, where people are doing tons of great work, and so I fully expect, and hope, to see much more going forward there.

One issue with machine learning that always seems a danger -- or that is sometimes a danger, and has sometimes led to applications that made less sense -- is when folks start with a method that they're very excited about, rather than with a question. Starting with a question -- here's the object I'm interested in, here's the parameter of interest, let me think about how I would identify that thing, how I would recover it if I had a ton of data; oh, here's a conditional expectation function, let me plug in a machine learning estimator for that -- seems very, very sensible. Whereas if I regress quantity on price and say that, because I used a machine learning method, maybe I'm satisfied that that solves the endogeneity problem we're usually worried about -- maybe I'm not. But again, that's something where the way to address it seems relatively clear: define your object of interest and think about--

- Is that just bringing in the economics?

- Exactly. Can I think about identification, but harness the power of the machine learning methods for some of the components?

- Precisely.

- Exactly. So the question of interest is the same as the question of interest has always been, but we now have better methods for estimating some pieces of it. The place that seems harder to forecast is... Obviously, there's a huge amount going on in the machine learning literature, and the limited ways of plugging it in that I've referenced so far are a limited piece of that. And so I think there are all sorts of other interesting questions about where this interaction goes, what else we can learn. And that's something where I think there's a ton going on that seems very promising, and I have no idea what the answer is.
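[Editor's note: a minimal sketch of the "keep the question, plug machine learning into the nuisance pieces" recipe Isaiah describes, in the spirit of the partialling-out / double machine learning idea. This is not a full implementation, and the data-generating process below is an invented example.]

```python
# [Hypothetical illustration -- simulated data; the true effect is 1.5.]
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n, p = 5000, 10
X = rng.normal(size=(n, p))
g = np.sin(X[:, 0]) + X[:, 1] ** 2        # confounding through X
T = g + rng.normal(size=n)                # treatment depends on X
Y = 1.5 * T + g + rng.normal(size=n)      # parameter of interest: 1.5

# Nuisance pieces E[Y|X] and E[T|X], estimated by cross-fitted random forests.
y_res = Y - cross_val_predict(RandomForestRegressor(n_estimators=100, random_state=0), X, Y, cv=5)
t_res = T - cross_val_predict(RandomForestRegressor(n_estimators=100, random_state=0), X, T, cv=5)

# Final stage: OLS of residualized outcome on residualized treatment.
theta = (t_res @ y_res) / (t_res @ t_res)
print("estimated effect:", theta)  # close to 1.5
```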
- No, I totally agree with that, and that makes it very exciting. I think there's just a lot of work to be done there.

- All right. So Isaiah agrees with me there...

[laughter]

- [narrator] If you'd like to watch more Nobel Conversations, click here. Or, if you'd like to learn more about econometrics, check out Josh's Mastering Econometrics series. And if you'd like to learn more about Guido, Josh, and Isaiah, check out the links in the description.