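1 00:00:00,299 --> 00:00:04,644 The path of causal inference is dark and dangerous, 2 00:00:05,139 --> 00:00:08,015 but econometrics is a powerful weapon. 3 00:00:08,480 --> 00:00:11,734 When nature blesses you with fortuitous random assignment, 4 00:00:11,734 --> 00:00:15,803 attack with fierce and flexible instrumental variables. 5 00:00:19,393 --> 00:00:21,094 [] 6 00:00:23,653 --> 00:00:26,362 Randomized trials are the surest path 7 00:00:26,362 --> 00:00:28,704 to ceteris paribus comparisons. 8 00:00:28,704 --> 00:00:32,640 But we often cannot use this powerful tool. 9 00:00:33,224 --> 00:00:36,940 Sometimes, though, randomization happens by accident. 10 00:00:36,940 --> 00:00:40,592 That's when we turn to instrumental variables, 11 00:00:40,592 --> 00:00:41,938 IV for short. 12 00:00:41,938 --> 00:00:44,508 Instrumental variables. 13 00:00:44,508 --> 00:00:48,186 Today's lesson is the first of two on IV. 14 00:00:48,958 --> 00:00:52,801 Our first IV lesson begins with a story about schools. 15 00:00:52,801 --> 00:00:54,348 [] 16 00:00:54,348 --> 00:00:56,138 Charter schools are public schools 17 00:00:56,138 --> 00:01:00,112 freed from daily district oversight and teacher union contracts. 18 00:01:00,895 --> 00:01:03,511 Whether charter schools boost achievement 19 00:01:03,511 --> 00:01:05,161 is one of the most important questions 20 00:01:05,161 --> 00:01:07,761 in the history of American education reform. 21 00:01:08,145 --> 00:01:12,562 The most popular charter schools have far more applicants than seats, 22 00:01:12,562 --> 00:01:16,462 so the luck of the lottery draw decides who is offered a seat. 23 00:01:16,870 --> 00:01:20,695 A lot is at stake for the students vying for their chance, 24 00:01:20,695 --> 00:01:25,003 as the award-winning documentary "Waiting for Superman" 25 00:01:25,003 --> 00:01:27,832 portrays. 26 00:01:27,832 --> 00:01:29,699 Waiting for the results brings up a lot of emotions. 27 00:01:30,258 --> 00:01:32,916 Don't cry, you're going to make Mommy cry. Okay? 28 00:01:37,498 --> 00:01:40,618 Do charter schools really provide a better education? 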
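29 00:01:40,948 --> 00:01:43,183 Critics most definitely say no. 30 00:01:43,413 --> 00:01:46,586 They argue that charter schools enroll better students, 31 00:01:46,586 --> 00:01:50,164 smarter or more motivated, so differences in later outcomes 32 00:01:50,164 --> 00:01:52,061 reflect selection bias. 33 00:01:52,595 --> 00:01:54,729 Wait, this one seems easy. 34 00:01:55,139 --> 00:01:57,639 In a lottery, winners are chosen at random, 35 00:01:57,639 --> 00:02:00,083 so just compare winners and losers. Obviously. 36 00:02:00,083 --> 00:02:01,784 You're on the right track, Kamal. 37 00:02:01,784 --> 00:02:04,375 But charter school lotteries 38 00:02:04,375 --> 00:02:07,560 don't force kids into or out of a particular school. 39 00:02:07,749 --> 00:02:10,667 They randomly assign offers of charter seats. 40 00:02:11,650 --> 00:02:13,449 Some kids are lucky, 41 00:02:13,449 --> 00:02:14,966 some kids aren't. 42 00:02:14,966 --> 00:02:17,235 If we only wanted to know the effect 43 00:02:17,235 --> 00:02:19,202 of a charter school offer, 44 00:02:19,202 --> 00:02:22,417 we could treat this as a randomized trial. 45 00:02:22,717 --> 00:02:24,684 But we're interested in the effect of charter school 46 00:02:24,684 --> 00:02:27,042 attendance, 47 00:02:27,042 --> 00:02:28,283 not of offers. 48 00:02:28,568 --> 00:02:32,039 And not every student who is offered a seat takes it. 49 00:02:32,039 --> 00:02:37,234 IV turns the effect of being offered a charter seat 50 00:02:37,234 --> 00:02:40,367 into the effect of actually attending a charter school. 51 00:02:40,367 --> 00:02:42,344 - Cool. - Oh, nice. 52 00:02:45,925 --> 00:02:48,871 Let's look at an example: 53 00:02:48,871 --> 00:02:52,353 a charter school from the Knowledge Is Power Program, or KIPP for short. 54 00:02:52,736 --> 00:02:54,937 This KIPP charter school is in Lynn, 55 00:02:54,937 --> 00:02:58,837 a faded industrial town on the coast of Massachusetts. 56 00:02:59,104 --> 00:03:01,886 The school has more applicants than seats, 57 00:03:01,886 --> 00:03:05,620 so it picks its students by lottery. 58 00:03:05,834 --> 00:03:11,854 From 2005 to 2008, 371 fourth and fifth graders 59 00:03:11,854 --> 00:03:15,350 entered the KIPP Lynn lotteries. 60 00:03:15,350 --> 00:03:18,805 Of these, 253 students won a seat at KIPP 61 00:03:18,805 --> 00:03:21,651 and 118 students lost. 62 00:03:21,967 --> 00:03:26,001 A year later, lottery winners had higher math scores 63 00:03:26,001 --> 00:03:27,852 than lottery losers. 64 00:03:27,852 --> 00:03:30,466 But we're not trying to figure out 65 00:03:30,466 --> 00:03:33,803 whether winning the lottery raises your math scores. 66 00:03:34,070 --> 00:03:38,471 We want to know whether attending KIPP raises your math scores. 67 00:03:39,041 --> 00:03:45,750 Of the 253 lottery winners, only 199 actually attended KIPP. 68 00:03:46,139 --> 00:03:48,804 The others chose a traditional public school. 69 00:03:49,563 --> 00:03:55,536 Likewise, of the 118 lottery losers, a few did end up attending KIPP. 70 00:03:55,536 --> 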
00:03:57,452 They received offers later. 71 00:03:57,452 --> 00:04:00,044 So what effect does actually attending KIPP 72 00:04:00,044 --> 00:04:02,377 have on test scores? 73 00:04:03,109 --> 00:04:05,426 Why can't we just measure their math scores? 74 00:04:05,894 --> 00:04:07,235 That's a good question. 75 00:04:07,235 --> 00:04:09,302 Who would you compare them with? 76 00:04:09,302 --> 00:04:11,111 Those who didn't attend. 77 00:04:11,111 --> 00:04:12,944 Is attendance random? 78 00:04:14,161 --> 00:04:16,177 - No. - Selection bias. 79 00:04:16,177 --> 00:04:17,909 - Right. - What? 80 00:04:17,909 --> 00:04:21,826 KIPP offers are random, so we can be confident of ceteris paribus, 81 00:04:21,826 --> 00:04:26,409 but attendance is not random. 82 00:04:26,635 --> 00:04:30,626 The choice to accept an offer 83 00:04:30,626 --> 00:04:32,984 may reflect characteristics that are related to math scores. 84 00:04:33,251 --> 00:04:36,157 For example, dedicated parents 85 00:04:36,157 --> 00:04:38,957 are more likely to accept an offer. 86 00:04:38,957 --> 00:04:42,646 Whichever school they attend, 87 00:04:42,646 --> 00:04:44,090 their kids are also likely to do better in math. 88 00:04:44,090 --> 00:04:45,114 Right. 89 00:04:45,114 --> 00:04:47,725 IV converts the effect of an offer 90 00:04:47,725 --> 00:04:50,567 into the effect of KIPP attendance, 91 00:04:50,573 --> 00:04:53,371 adjusting for the fact that some winners go elsewhere 92 00:04:53,371 --> 00:04:56,573 and some losers manage to attend KIPP anyway. 93 00:04:56,950 --> 00:05:00,517 Essentially, IV takes an incomplete randomization 94 00:05:00,517 --> 00:05:03,007 and makes the appropriate adjustments. 95 00:05:03,684 --> 00:05:07,107 How? IV describes a chain reaction. 96 00:05:07,641 --> 00:05:10,343 Why do school offers affect achievement? 
97 00:05:10,343 --> 00:05:13,256 Probably because they affect charter school attendance, 98 00:05:13,256 --> 00:05:16,643 and charter school attendance improves math scores. 99 00:05:16,643 --> 00:05:20,645 The first link in the chain, called the "first stage," 100 00:05:20,645 --> 00:05:24,478 is the effect of the lottery on charter school attendance. 101 00:05:24,478 --> 00:05:28,452 The second stage is the link between attending a charter school 102 00:05:28,452 --> 00:05:30,153 and an outcome variable, 103 00:05:30,153 --> 00:05:32,261 in this case, math scores. 104 00:05:32,732 --> 00:05:36,441 The instrumental variable, or "instrument" for short, 105 00:05:36,441 --> 00:05:40,246 is the variable that initiates the chain reaction. 106 00:05:40,979 --> 00:05:43,993 The effect of the instrument on the outcome 107 00:05:43,993 --> 00:05:46,631 is called the reduced form. 108 00:05:48,143 --> 00:05:51,869 This chain reaction can be represented mathematically. 109 00:05:51,869 --> 00:05:54,241 We multiply the first stage, 110 00:05:54,241 --> 00:05:56,349 the effect of winning on attendance, 111 00:05:56,349 --> 00:05:57,960 by the second stage, 112 00:05:57,960 --> 00:06:00,538 the effect of attendance on scores, 113 00:06:00,538 --> 00:06:02,713 and we get the reduced form, 114 00:06:02,713 --> 00:06:05,680 the effect of winning the lottery on scores. 115 00:06:06,780 --> 00:06:11,566 The reduced form and the first stage are observable and easy to compute. 116 00:06:11,752 --> 00:06:14,876 However, the effect of attendance on achievement 117 00:06:14,876 --> 00:06:17,093 is not directly observed. 118 00:06:17,093 --> 00:06:20,360 This is the causal effect we're trying to determine. 119 00:06:21,043 --> 00:06:23,827 Given some important assumptions we'll discuss shortly, 120 00:06:23,827 --> 00:06:25,977 we can find the effect of KIPP attendance 121 00:06:25,977 --> 00:06:29,265 by dividing the reduced form by the first stage. 122 00:06:29,265 --> 00:06:32,910 This will become more clear as we work through an example. 123 00:06:32,910 --> 00:06:34,207 - [Student] Let's do this. 124 00:06:37,161 --> 00:06:38,728 - A quick note on measurement. 125 00:06:38,728 --> 00:06:41,745 We measure achievement using standard deviations, 126 00:06:41,745 --> 00:06:44,728 often denoted by the Greek letter sigma (σ). 127 00:06:44,728 --> 00:06:48,862 One σ is a huge move from around the bottom 15% 128 00:06:48,862 --> 00:06:51,634 to the middle of most achievement distributions. 129 00:06:51,634 --> 00:06:55,412 Even a ¼ or ½ σ difference is big. 
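[Editor's note] The chain reaction described above (cues 108-122) can be written compactly. The symbols below are labels introduced here for illustration and are not notation used in the video: Z is the lottery offer (1 for winners, 0 for losers), D is KIPP attendance, Y is the math score, ρ is the reduced form, φ is the first stage, and λ is the second stage, the effect of attendance on scores. The division is only meaningful when the first stage φ is nonzero.

```latex
\[
\rho = E[Y \mid Z = 1] - E[Y \mid Z = 0], \qquad
\phi = E[D \mid Z = 1] - E[D \mid Z = 0]
\]
\[
\underbrace{\rho}_{\text{reduced form}}
  = \underbrace{\phi}_{\text{first stage}} \times
    \underbrace{\lambda}_{\text{second stage}}
\quad\Longrightarrow\quad
\lambda = \frac{\rho}{\phi}, \qquad \phi \neq 0
\]
```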
130 00:06:56,262 --> 00:06:58,389 - [Instructor] Now we're ready to plug some numbers 131 00:06:58,389 --> 00:07:01,655 into the equation we introduced earlier. 132 00:07:01,655 --> 00:07:03,231 First up, what's the effect 133 00:07:03,231 --> 00:07:06,076 of winning the lottery on math scores? 134 00:07:06,354 --> 00:07:10,421 KIPP applicants' math scores are a third of a standard deviation 135 00:07:10,421 --> 00:07:11,835 below the state average 136 00:07:11,835 --> 00:07:14,386 in the year before they apply to KIPP. 137 00:07:14,386 --> 00:07:18,320 But a year later, lottery winners score right at the state average, 138 00:07:18,320 --> 00:07:21,482 while the lottery losers are still well behind 139 00:07:21,482 --> 00:07:25,499 with an average score around -0.36 σ. 140 00:07:25,834 --> 00:07:29,619 The effect of winning the lottery on scores is the difference 141 00:07:29,619 --> 00:07:32,819 between the winners' scores and the losers' scores. 142 00:07:33,403 --> 00:07:35,784 Take the winners' average math scores, 143 00:07:35,784 --> 00:07:38,269 subtract the losers' average math scores, 144 00:07:38,269 --> 00:07:41,502 and you will have 0.36 σ. 145 00:07:41,908 --> 00:07:46,880 Next up: what's the effect of winning the lottery on attendance? 146 00:07:46,880 --> 00:07:49,193 In other words, if you win the lottery, 147 00:07:49,193 --> 00:07:52,257 how much more likely are you to attend KIPP 148 00:07:52,257 --> 00:07:53,456 than if you lose? 149 00:07:53,671 --> 00:07:57,798 First, what percentage of lottery winners attend KIPP? 150 00:07:57,798 --> 00:08:00,774 Divide the number of winners who attended KIPP 151 00:08:00,774 --> 00:08:05,490 by the total number of lottery winners -- that's 78%. 152 00:08:05,810 --> 00:08:09,331 To find the percentage of lottery losers who attended KIPP, 153 00:08:09,331 --> 00:08:12,333 we divide the number of losers who attended KIPP 154 00:08:12,333 --> 00:08:16,865 by the total number of lottery losers -- that's 4%. 
155 00:08:17,377 --> 00:08:21,597 Subtract 4 from 78, and we find that winning the lottery 156 00:08:21,597 --> 00:08:25,600 makes you 74 percentage points more likely to attend KIPP. 157 00:08:25,946 --> 00:08:28,532 Now we can find what we're really after -- 158 00:08:28,532 --> 00:08:34,551 the effect of attendance on scores, by dividing 0.36 by 0.74. 159 00:08:34,789 --> 00:08:37,585 Attending KIPP raises math scores 160 00:08:37,585 --> 00:08:41,606 by 0.48 standard deviations on average. 161 00:08:42,126 --> 00:08:44,503 That's an awesome achievement gain, 162 00:08:44,503 --> 00:08:47,380 equal to moving from about the bottom third 163 00:08:47,380 --> 00:08:49,925 to the middle of the achievement distribution. 164 00:08:49,925 --> 00:08:51,085 - [Student] Whoa, half a sig. 165 00:08:51,085 --> 00:08:53,507 - [Instructor] These estimates are for kids opting in 166 00:08:53,507 --> 00:08:54,781 to the KIPP lottery, 167 00:08:54,781 --> 00:08:57,762 whose enrollment status is changed by winning. 168 00:08:57,985 --> 00:09:00,617 That's not necessarily a random sample 169 00:09:00,617 --> 00:09:02,283 of all children in Lynn. 170 00:09:02,536 --> 00:09:05,035 So we can't assume we'd see the same effect 171 00:09:05,035 --> 00:09:07,327 for other types of students. - [Student] Huh. 172 00:09:07,327 --> 00:09:10,218 - But this effect on keen-for-KIPP kids 173 00:09:10,218 --> 00:09:13,367 is likely to be a good indicator of the consequences 174 00:09:13,367 --> 00:09:15,767 of adding additional charter seats. 175 00:09:15,767 --> 00:09:17,216 - [Student] Cool. - [Student] Got it. 176 00:09:19,628 --> 00:09:23,352 - IV eliminates selection bias, but like all of our tools, 177 00:09:23,352 --> 00:09:25,624 the solution builds on a set of assumptions 178 00:09:25,624 --> 00:09:27,540 not to be taken for granted. 
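[Editor's note] The KIPP arithmetic walked through above can be reproduced in a few lines. This is a minimal sketch, not the study's actual estimation code: it uses only the rounded figures quoted in the video, and the function name `wald_estimate` and the variable names are made up here for illustration.

```python
def wald_estimate(reduced_form: float, first_stage: float) -> float:
    """IV estimate for a binary instrument: reduced form / first stage."""
    if first_stage == 0:
        # With no first stage, the instrument does not move attendance
        # and the ratio is undefined.
        raise ValueError("first stage is zero; cannot divide")
    return reduced_form / first_stage


# Rounded figures quoted in the video (scores in standard deviations).
winners_score = 0.0    # winners score "right at the state average"
losers_score = -0.36   # losers' average score
winners_attend = 0.78  # share of lottery winners who attended KIPP
losers_attend = 0.04   # share of lottery losers who attended KIPP

# Reduced form: effect of winning the lottery on math scores.
reduced_form = winners_score - losers_score
# First stage: effect of winning the lottery on KIPP attendance.
first_stage = winners_attend - losers_attend
# Effect of attending KIPP on math scores.
effect = wald_estimate(reduced_form, first_stage)

print(f"reduced form = {reduced_form:.2f}")
print(f"first stage  = {first_stage:.2f}")
print(f"IV estimate  = {effect:.3f}")
```

With these rounded inputs the ratio comes out at about 0.486, in line with the roughly 0.48 quoted in the video; the small gap presumably reflects rounding of the inputs.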
179 00:09:28,098 --> 00:09:31,463 First, there must be a substantial first stage -- 180 00:09:31,463 --> 00:09:35,565 that is the instrumental variable, winning or losing the lottery, 181 00:09:35,565 --> 00:09:39,065 must really change the variable whose effect we're interested in -- 182 00:09:39,065 --> 00:09:41,031 here, KIPP attendance. 183 00:09:41,298 --> 00:09:44,594 In this case, the first stage is not really in doubt. 184 00:09:44,594 --> 00:09:47,894 Winning the lottery makes KIPP attendance much more likely. 185 00:09:48,386 --> 00:09:50,631 Not all IV stories are like that. 186 00:09:51,321 --> 00:09:53,698 Second, the instrument must be as good 187 00:09:53,698 --> 00:09:54,931 as randomly assigned, 188 00:09:54,931 --> 00:09:58,716 meaning lottery winners and losers have similar characteristics. 189 00:09:58,893 --> 00:10:01,559 This is the independence assumption. 190 00:10:01,977 --> 00:10:05,716 Of course, KIPP lottery wins really are randomly assigned. 191 00:10:05,716 --> 00:10:09,656 Still, we should check for balance and confirm that winners and losers 192 00:10:09,656 --> 00:10:11,493 have similar family backgrounds, 193 00:10:11,493 --> 00:10:13,590 similar aptitudes and so on. 194 00:10:13,590 --> 00:10:16,969 In essence, we're checking to ensure KIPP lotteries are fair 195 00:10:16,969 --> 00:10:20,055 with no group of applicants suspiciously likely to win. 196 00:10:21,373 --> 00:10:24,373 Finally, we require the instrument change outcomes 197 00:10:24,373 --> 00:10:26,092 solely through the variable of interest -- 198 00:10:26,092 --> 00:10:28,100 in this case, attending KIPP. 199 00:10:28,299 --> 00:10:31,367 This assumption is called the exclusion restriction. 200 00:10:32,951 --> 00:10:37,500 - IV only works if you can satisfy these three assumptions. 201 00:10:38,033 --> 00:10:40,418 - I don't understand the exclusion restriction. 
202 00:10:40,917 --> 00:10:43,599 How could winning the lottery affect math scores 203 00:10:43,599 --> 00:10:45,244 other than by attending KIPP? 204 00:10:45,244 --> 00:10:47,230 - [Student] Yeah. - [Instructor] Great question. 205 00:10:47,230 --> 00:10:50,536 Suppose lottery winners are just thrilled to win, 206 00:10:50,536 --> 00:10:55,045 and this happiness motivates them to study more and learn more math, 207 00:10:55,045 --> 00:10:57,285 regardless of where they go to school. 208 00:10:57,285 --> 00:10:59,901 This would violate the exclusion restriction 209 00:10:59,901 --> 00:11:03,787 because the motivational effect of winning is a second channel 210 00:11:03,787 --> 00:11:06,569 whereby lotteries might affect test scores. 211 00:11:06,865 --> 00:11:09,546 While it's hard to rule this out entirely, 212 00:11:09,546 --> 00:11:12,650 there's no evidence of any alternative channels 213 00:11:12,650 --> 00:11:14,108 in the KIPP study. 214 00:11:17,817 --> 00:11:20,700 - IV solves the problem of selection bias 215 00:11:20,700 --> 00:11:25,051 in scenarios like the KIPP lottery where treatment offers are random 216 00:11:25,051 --> 00:11:27,083 but some of those offered opt out. 217 00:11:28,451 --> 00:11:31,700 This sort of intentional yet incomplete random assignment 218 00:11:31,700 --> 00:11:33,367 is surprisingly common. 219 00:11:33,367 --> 00:11:36,318 Even randomized clinical trials have this feature. 220 00:11:37,134 --> 00:11:40,053 IV solves the problem of non-random take-up 221 00:11:40,053 --> 00:11:42,534 in lotteries or clinical research. 222 00:11:43,054 --> 00:11:46,725 But lotteries are not the only source of compelling instruments. 223 00:11:46,915 --> 00:11:49,124 Many causal questions can be addressed 224 00:11:49,124 --> 00:11:50,758 by naturally occurring 225 00:11:50,758 --> 00:11:53,831 as good as randomly assigned variation. 
226 00:11:54,731 --> 00:11:56,915 Here's a causal question for you: 227 00:11:56,915 --> 00:11:59,450 Do women who have children early in their careers 228 00:11:59,450 --> 00:12:01,647 suffer a substantial earnings penalty 229 00:12:01,647 --> 00:12:02,648 as a result? 230 00:12:02,648 --> 00:12:04,970 After all, women earn less than men. 231 00:12:05,573 --> 00:12:08,506 We could, of course, simply compare the earnings of women 232 00:12:08,506 --> 00:12:10,891 with more and fewer children. 233 00:12:10,891 --> 00:12:14,190 But such comparisons are fraught with selection bias. 234 00:12:14,806 --> 00:12:17,401 If only we could randomly assign babies 235 00:12:17,401 --> 00:12:19,089 to different households. 236 00:12:19,089 --> 00:12:22,131 Yeah, right, sounds pretty fanciful. 237 00:12:22,470 --> 00:12:26,714 Our next IV story -- fantastic and not fanciful -- 238 00:12:26,714 --> 00:12:30,234 illustrates an amazing, naturally occurring instrument 239 00:12:30,234 --> 00:12:31,918 for family size. 240 00:12:33,317 --> 00:12:34,551 ♪ [] ♪ 241 00:12:34,551 --> 00:12:38,202 - [Instructor] You're on your way to mastering econometrics. 242 00:12:38,202 --> 00:12:40,170 Make sure this video sticks 243 00:12:40,170 --> 00:12:42,636 by taking a few quick practice questions. 244 00:12:42,886 --> 00:12:46,336 Or, if you're ready, click for the next video. 245 00:12:46,529 --> 00:12:50,204 You can also check out MRU's website for more courses, 246 00:12:50,204 --> 00:12:52,027 teacher resources, and more. 247 00:12:52,289 --> 00:12:53,772 ♪ [] ♪