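WEBVTT 00:00:00.299 --> 00:00:04.644 The path from cause to effect is dark and dangerous, 00:00:05.139 --> 00:00:08.015 but the weapons of econometrics are strong. 00:00:08.480 --> 00:00:11.734 When nature blesses you with fortuitous random assignment, 00:00:11.734 --> 00:00:15.803 attack with fierce and flexible instrumental variables. 00:00:19.393 --> 00:00:21.094 [] 00:00:23.653 --> 00:00:26.362 Randomized trials are the surest path 00:00:26.362 --> 00:00:28.704 to ceteris paribus comparisons. 00:00:28.704 --> 00:00:32.640 But often we can't use this powerful tool. 00:00:33.224 --> 00:00:36.940 Sometimes, though, randomization happens by accident. 00:00:36.940 --> 00:00:40.592 That's when we turn to instrumental variables, 00:00:40.592 --> 00:00:41.938 IV for short. 00:00:41.938 --> 00:00:44.508 Instrumental variables. 00:00:44.508 --> 00:00:48.186 Today's lesson is the first of two on IV. 00:00:48.958 --> 00:00:52.801 Our first IV lesson begins with a story about schools. 00:00:52.801 --> 00:00:54.348 [] 00:00:54.348 --> 00:00:56.138 Charter schools are public schools 00:00:56.138 --> 00:01:00.112 freed from day-to-day district oversight and teachers' union contracts. 00:01:00.895 --> 00:01:03.511 Whether charter schools boost achievement 00:01:03.511 --> 00:01:05.161 is one of the most important questions 00:01:05.161 --> 00:01:07.761 in the history of American education reform. 00:01:08.145 --> 00:01:12.562 The most popular charter schools have far more applicants than seats, 00:01:12.562 --> 00:01:16.462 so the luck of a lottery draw decides who is offered a seat. 00:01:16.870 --> 00:01:20.695 A lot is at stake for the students vying for a place, 00:01:20.695 --> 00:01:25.003 as the award-winning documentary "Waiting for Superman" 00:01:25.003 --> 00:01:27.832 portrays. 00:01:27.832 --> 00:01:29.699 Waiting for the results brings out a lot of emotion. 00:01:30.258 --> 00:01:32.916 Don't cry, you're going to make Mommy cry. Okay? 00:01:37.498 --> 00:01:40.618 Do charter schools really provide a better education?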
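00:01:40.948 --> 00:01:43.183 Critics most definitely say "no," 00:01:43.413 --> 00:01:46.586 arguing that charter schools enroll better students, 00:01:46.586 --> 00:01:50.164 smarter or more motivated, so differences in later outcomes 00:01:50.164 --> 00:01:52.061 reflect selection bias. 00:01:52.595 --> 00:01:54.729 Wait, this one seems easy. 00:01:55.139 --> 00:01:57.639 In a lottery, winners are chosen at random, 00:01:57.639 --> 00:02:00.083 so just compare winners and losers. Obviously. 00:02:00.083 --> 00:02:01.784 You're on the right track, Kamal, 00:02:01.784 --> 00:02:04.375 but charter school lotteries 00:02:04.375 --> 00:02:07.560 don't force kids into or out of a particular school. 00:02:07.749 --> 00:02:10.667 They randomize the offer of a charter seat. 00:02:11.650 --> 00:02:13.449 Some kids are lucky, 00:02:13.449 --> 00:02:14.966 some aren't. 00:02:14.966 --> 00:02:17.235 If we just wanted to know the effect 00:02:17.235 --> 00:02:19.202 of charter school offers, 00:02:19.202 --> 00:02:22.417 we could treat this as a randomized trial. 00:02:22.717 --> 00:02:24.684 But we're interested in the effects of charter school 00:02:24.684 --> 00:02:27.042 attendance, 00:02:27.042 --> 00:02:28.283 not offers. 00:02:28.568 --> 00:02:32.039 And not all kids who are offered a seat take it. 00:02:32.039 --> 00:02:37.234 IV turns the effect of being offered a charter seat 00:02:37.234 --> 00:02:40.367 into the effect of actually attending a charter school. 00:02:40.367 --> 00:02:42.344 - Cool. - Oh, nice. 00:02:45.925 --> 00:02:48.871 Let's look at an example, 00:02:48.871 --> 00:02:52.353 a charter school from the Knowledge Is Power Program, or KIPP for short. 00:02:52.736 --> 00:02:54.937 This KIPP school is in Lynn, 00:02:54.937 --> 00:02:58.837 a faded industrial town on the coast of Massachusetts. 00:02:59.104 --> 00:03:01.886 The school has more applicants than seats, 00:03:01.886 --> 00:03:05.620 so it picks its students by lottery. 00:03:05.834 --> 00:03:11.854 From 2005 to 2008, 371 fourth and fifth graders 00:03:11.854 --> 00:03:15.350 entered the KIPP Lynn lottery. 00:03:15.350 --> 00:03:18.805 253 of them were offered a seat at KIPP, 00:03:18.805 --> 00:03:21.651 and 118 were not. 00:03:21.967 --> 00:03:26.001 A year later, lottery winners had higher math scores 00:03:26.001 --> 00:03:27.852 than lottery losers. 00:03:27.852 --> 00:03:30.466 But we're not trying to figure out 00:03:30.466 --> 00:03:33.803 whether winning the lottery makes you better at math. 00:03:34.070 --> 00:03:38.471 We want to know whether attending KIPP improves your math scores. 00:03:39.041 --> 00:03:45.750 Of the 253 lottery winners, only 199 actually attended KIPP. 00:03:46.139 --> 00:03:48.804 The others chose a traditional public school. 00:03:49.563 --> 00:03:55.536 Likewise, of the 118 lottery losers, a few did end up attending KIPP. 00:03:55.536 --> 00:03:57.452 They got an offer later. 00:03:57.452 --> 00:04:00.044 So what is the effect on test scores 00:04:00.044 --> 00:04:02.377 of actually attending KIPP?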
00:04:03.109 --> 00:04:05.426 Why can't we just measure their math scores? 00:04:05.894 --> 00:04:07.235 Great question. 00:04:07.235 --> 00:04:09.302 Who would you compare them to? 00:04:09.302 --> 00:04:11.111 Those who didn't attend. 00:04:11.111 --> 00:04:12.944 Is attendance random? 00:04:14.161 --> 00:04:16.177 - No. - Selection bias. 00:04:16.177 --> 00:04:17.909 - Correct. - What? 00:04:17.909 --> 00:04:21.826 KIPP offers are random, so we can be confident of ceteris paribus, 00:04:21.826 --> 00:04:26.409 but attendance is not random. 00:04:26.635 --> 00:04:30.626 The choice to accept an offer 00:04:30.626 --> 00:04:32.984 may reflect characteristics related to math achievement. 00:04:33.251 --> 00:04:36.157 For example, dedicated parents 00:04:36.157 --> 00:04:38.957 are more likely to accept an offer. 00:04:38.957 --> 00:04:42.646 Their kids are also more likely to do better in math, 00:04:42.646 --> 00:04:44.090 regardless of school. 00:04:44.090 --> 00:04:45.114 Right. 00:04:45.114 --> 00:04:47.725 IV converts the offer effect 00:04:47.725 --> 00:04:50.567 into the effect of KIPP attendance, 00:04:50.573 --> 00:04:53.371 adjusting for the fact that some winners go elsewhere 00:04:53.371 --> 00:04:56.573 and some losers manage to attend KIPP anyway. 00:04:56.950 --> 00:05:00.517 Essentially, IV takes an incomplete randomization 00:05:00.517 --> 00:05:03.007 and makes the appropriate adjustments. 00:05:03.684 --> 00:05:07.107 How? IV describes a chain reaction. 00:05:07.641 --> 00:05:10.343 Why do offers affect achievement?
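00:05:10.343 --> 00:05:13.256 Probably because they affect charter attendance, 00:05:13.256 --> 00:05:16.643 and charter attendance improves math scores. 00:05:16.643 --> 00:05:20.645 The first link in the chain, called the first stage, 00:05:20.645 --> 00:05:24.478 is the effect of the lottery on charter attendance. 00:05:24.478 --> 00:05:28.452 The second stage is the link between attending a charter 00:05:28.452 --> 00:05:30.153 and an outcome variable, 00:05:30.153 --> 00:05:32.261 in this case, math scores. 00:05:32.732 --> 00:05:36.441 The instrumental variable, or "instrument" for short, 00:05:36.441 --> 00:05:40.246 is the variable that initiates the chain reaction. 00:05:40.979 --> 00:05:43.993 The effect of the instrument on the outcome 00:05:43.993 --> 00:05:46.631 is called the reduced form. 00:05:48.143 --> 00:05:51.869 This chain reaction can be represented mathematically. 00:05:51.869 --> 00:05:54.241 We multiply the first stage, 00:05:54.241 --> 00:05:56.349 the effect of winning on attendance, 00:05:56.349 --> 00:05:57.960 by the second stage, 00:05:57.960 --> 00:06:00.538 the effect of attendance on scores, 00:06:00.538 --> 00:06:02.713 and we get the reduced form, 00:06:02.713 --> 00:06:05.680 the effect of winning the lottery on scores. 00:06:06.780 --> 00:06:11.566 The reduced form and first stage are observable and easy to compute. 00:06:11.752 --> 00:06:14.876 However, the effect of attendance on achievement 00:06:14.876 --> 00:06:17.093 is not directly observed. 00:06:17.093 --> 00:06:20.360 This is the causal effect we're trying to determine. 00:06:21.043 --> 00:06:23.827 Given some important assumptions we'll discuss shortly, 00:06:23.827 --> 00:06:25.977 we can find the effect of KIPP attendance 00:06:25.977 --> 00:06:29.265 by dividing the reduced form by the first stage. 00:06:29.265 --> 00:06:32.910 This will become clearer as we work through an example. 00:06:32.910 --> 00:06:34.207 Let's do it. 00:06:37.161 --> 00:06:38.728 A quick note on measurement. 00:06:38.728 --> 00:06:41.745 We measure achievement using standard deviations, 00:06:41.745 --> 00:06:44.728 often denoted by the Greek letter sigma (σ). 00:06:44.728 --> 00:06:48.862 One σ is a huge move, from around the bottom 15% 00:06:48.862 --> 00:06:51.634 to the middle of most achievement distributions. 00:06:51.634 --> 00:06:55.412 Even a ¼ or ½ σ difference is big. 00:06:56.262 --> 00:06:58.389 Now we're ready to plug some numbers 00:06:58.389 --> 00:07:01.655 into the equation we introduced earlier. 00:07:01.655 --> 00:07:03.231 First up, what's the effect of winning the lottery 00:07:03.231 --> 00:07:06.076 on math scores?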
00:07:06.354 --> 00:07:10.421 KIPP applicants' math scores 00:07:10.421 --> 00:07:11.835 were a third of a standard deviation below the state average 00:07:11.835 --> 00:07:14.386 in the year before they applied to KIPP. 00:07:14.386 --> 00:07:18.320 But a year later, lottery winners scored right at the state average, 00:07:18.320 --> 00:07:21.482 while lottery losers 00:07:21.482 --> 00:07:25.499 still lagged the state average, at -0.36σ. 00:07:25.834 --> 00:07:29.619 The effect of winning the lottery on scores is the difference between winners' scores 00:07:29.619 --> 00:07:32.819 and losers' scores. 00:07:33.403 --> 00:07:35.784 Take the winners' average math score, 00:07:35.784 --> 00:07:38.269 subtract the losers' average math score, 00:07:38.269 --> 00:07:41.502 and you get 0.36σ. 00:07:41.908 --> 00:07:46.880 Next up: what's the effect of winning the lottery on attendance? 00:07:46.880 --> 00:07:49.193 In other words, if you win the lottery, 00:07:49.193 --> 00:07:52.257 how much more likely are you to attend KIPP 00:07:52.257 --> 00:07:53.456 than if you lose? 00:07:53.671 --> 00:07:57.798 First, what percentage of lottery winners attend KIPP? 00:07:57.798 --> 00:08:00.774 Divide the number of winners who attended KIPP 00:08:00.774 --> 00:08:05.490 by the total number of lottery winners -- that's 78%. 00:08:05.810 --> 00:08:09.331 To find the percentage of lottery losers who attended KIPP, 00:08:09.331 --> 00:08:12.333 we divide the number of losers who attended KIPP 00:08:12.333 --> 00:08:16.865 by the total number of lottery losers -- that's 4%. 00:08:17.377 --> 00:08:21.597 Subtract 4 from 78, and we find that winning the lottery 00:08:21.597 --> 00:08:25.600 makes you 74 percentage points more likely to attend KIPP. 00:08:25.946 --> 00:08:28.532 Now we can find what we're really after -- 00:08:28.532 --> 00:08:34.551 the effect of attendance on scores, by dividing 0.36 by 0.74. 00:08:34.789 --> 00:08:37.585 Attending KIPP raises math scores 00:08:37.585 --> 00:08:41.606 by 0.48 standard deviations on average. 00:08:42.126 --> 00:08:44.503 That's an awesome achievement gain, 00:08:44.503 --> 00:08:47.380 equal to moving from about the bottom third 00:08:47.380 --> 00:08:49.925 to the middle of the achievement distribution.
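NOTE
The arithmetic in this worked example can be reproduced in a few lines of Python. This is a minimal sketch and not part of the video: the variable names are illustrative assumptions, and the inputs are the rounded figures quoted in the lesson (a 0.36σ score gap, 78% and 4% attendance rates). The unrounded ratio comes out near 0.49, which the video quotes as 0.48.
```python
# Ratio-form IV estimate for the KIPP Lynn lottery (rounded figures from the lesson).
# Reduced form: winners' mean math score minus losers' mean math score, in sigma units.
winners_score = 0.0      # winners scored right at the state average
losers_score = -0.36     # losers stayed 0.36 sigma below it
reduced_form = winners_score - losers_score
# First stage: share of winners attending KIPP minus share of losers attending KIPP.
attend_rate_winners = 0.78
attend_rate_losers = 0.04
first_stage = attend_rate_winners - attend_rate_losers
# IV estimate: effect of attendance on scores = reduced form / first stage.
iv_estimate = reduced_form / first_stage
print(f"reduced form: {reduced_form:.2f} sigma")
print(f"first stage:  {first_stage:.2f}")
print(f"IV estimate:  {iv_estimate:.3f} sigma")
```
Dividing by the first stage scales the winner-loser score gap up to account for the fact that winning moves attendance by only 74 percentage points rather than 100.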
00:08:49.925 --> 00:08:51.085 - [Student] Whoa, half a sig. 00:08:51.085 --> 00:08:53.507 - [Instructor] These estimates are for kids opting in 00:08:53.507 --> 00:08:54.781 to the KIPP lottery, 00:08:54.781 --> 00:08:57.762 whose enrollment status is changed by winning. 00:08:57.985 --> 00:09:00.617 That's not necessarily a random sample 00:09:00.617 --> 00:09:02.283 of all children in Lynn. 00:09:02.536 --> 00:09:05.035 So we can't assume we'd see the same effect 00:09:05.035 --> 00:09:07.327 for other types of students. - [Student] Huh. 00:09:07.327 --> 00:09:10.218 - But this effect on keen-for-KIPP kids 00:09:10.218 --> 00:09:13.367 is likely to be a good indicator of the consequences 00:09:13.367 --> 00:09:15.767 of adding additional charter seats. 00:09:15.767 --> 00:09:17.216 - [Student] Cool. - [Student] Got it. 00:09:19.628 --> 00:09:23.352 - IV eliminates selection bias, but like all of our tools, 00:09:23.352 --> 00:09:25.624 the solution builds on a set of assumptions 00:09:25.624 --> 00:09:27.540 not to be taken for granted. 00:09:28.098 --> 00:09:31.463 First, there must be a substantial first stage -- 00:09:31.463 --> 00:09:35.565 that is, the instrumental variable, winning or losing the lottery, 00:09:35.565 --> 00:09:39.065 must really change the variable whose effect we're interested in -- 00:09:39.065 --> 00:09:41.031 here, KIPP attendance. 00:09:41.298 --> 00:09:44.594 In this case, the first stage is not really in doubt. 00:09:44.594 --> 00:09:47.894 Winning the lottery makes KIPP attendance much more likely. 00:09:48.386 --> 00:09:50.631 Not all IV stories are like that. 00:09:51.321 --> 00:09:53.698 Second, the instrument must be as good 00:09:53.698 --> 00:09:54.931 as randomly assigned, 00:09:54.931 --> 00:09:58.716 meaning lottery winners and losers have similar characteristics. 00:09:58.893 --> 00:10:01.559 This is the independence assumption. 00:10:01.977 --> 00:10:05.716 Of course, KIPP lottery wins really are randomly assigned.
00:10:05.716 --> 00:10:09.656 Still, we should check for balance and confirm that winners and losers 00:10:09.656 --> 00:10:11.493 have similar family backgrounds, 00:10:11.493 --> 00:10:13.590 similar aptitudes and so on. 00:10:13.590 --> 00:10:16.969 In essence, we're checking to ensure KIPP lotteries are fair 00:10:16.969 --> 00:10:20.055 with no group of applicants suspiciously likely to win. 00:10:21.373 --> 00:10:24.373 Finally, we require the instrument change outcomes 00:10:24.373 --> 00:10:26.092 solely through the variable of interest -- 00:10:26.092 --> 00:10:28.100 in this case, attending KIPP. 00:10:28.299 --> 00:10:31.367 This assumption is called the exclusion restriction. 00:10:32.951 --> 00:10:37.500 - IV only works if you can satisfy these three assumptions. 00:10:38.033 --> 00:10:40.418 - I don't understand the exclusion restriction. 00:10:40.917 --> 00:10:43.599 How could winning the lottery affect math scores 00:10:43.599 --> 00:10:45.244 other than by attending KIPP? 00:10:45.244 --> 00:10:47.230 - [Student] Yeah. - [Instructor] Great question. 00:10:47.230 --> 00:10:50.536 Suppose lottery winners are just thrilled to win, 00:10:50.536 --> 00:10:55.045 and this happiness motivates them to study more and learn more math, 00:10:55.045 --> 00:10:57.285 regardless of where they go to school. 00:10:57.285 --> 00:10:59.901 This would violate the exclusion restriction 00:10:59.901 --> 00:11:03.787 because the motivational effect of winning is a second channel 00:11:03.787 --> 00:11:06.569 whereby lotteries might affect test scores. 00:11:06.865 --> 00:11:09.546 While it's hard to rule this out entirely, 00:11:09.546 --> 00:11:12.650 there's no evidence of any alternative channels 00:11:12.650 --> 00:11:14.108 in the KIPP study. 
00:11:17.817 --> 00:11:20.700 - IV solves the problem of selection bias 00:11:20.700 --> 00:11:25.051 in scenarios like the KIPP lottery where treatment offers are random 00:11:25.051 --> 00:11:27.083 but some of those offered opt out. 00:11:28.451 --> 00:11:31.700 This sort of intentional yet incomplete random assignment 00:11:31.700 --> 00:11:33.367 is surprisingly common. 00:11:33.367 --> 00:11:36.318 Even randomized clinical trials have this feature. 00:11:37.134 --> 00:11:40.053 IV solves the problem of non-random take-up 00:11:40.053 --> 00:11:42.534 in lotteries or clinical research. 00:11:43.054 --> 00:11:46.725 But lotteries are not the only source of compelling instruments. 00:11:46.915 --> 00:11:49.124 Many causal questions can be addressed 00:11:49.124 --> 00:11:50.758 by naturally occurring, 00:11:50.758 --> 00:11:53.831 as-good-as-randomly-assigned variation. 00:11:54.731 --> 00:11:56.915 Here's a causal question for you: 00:11:56.915 --> 00:11:59.450 Do women who have children early in their careers 00:11:59.450 --> 00:12:01.647 suffer a substantial earnings penalty 00:12:01.647 --> 00:12:02.648 as a result? 00:12:02.648 --> 00:12:04.970 After all, women earn less than men. 00:12:05.573 --> 00:12:08.506 We could, of course, simply compare the earnings of women 00:12:08.506 --> 00:12:10.891 with more and fewer children. 00:12:10.891 --> 00:12:14.190 But such comparisons are fraught with selection bias. 00:12:14.806 --> 00:12:17.401 If only we could randomly assign babies 00:12:17.401 --> 00:12:19.089 to different households. 00:12:19.089 --> 00:12:22.131 Yeah, right, sounds pretty fanciful. 00:12:22.470 --> 00:12:26.714 Our next IV story -- fantastic and not fanciful -- 00:12:26.714 --> 00:12:30.234 illustrates an amazing, naturally occurring instrument 00:12:30.234 --> 00:12:31.918 for family size. 00:12:33.317 --> 00:12:34.551 ♪ [] ♪ 00:12:34.551 --> 00:12:38.202 - [Instructor] You're on your way to mastering econometrics.