0:00:00.540,0:00:02.540 Where we left off in the[br]last video I kind 0:00:02.540,0:00:03.400 of gave you a question. 0:00:03.400,0:00:06.240 Find an interval so that we're[br]reasonably confident-- we'll 0:00:06.240,0:00:08.220 talk a little bit more about why[br]I have to give this kind 0:00:08.220,0:00:12.570 of vague wording right here--[br]reasonably confident that 0:00:12.570,0:00:19.030 there's a 95% chance that the[br]true population mean, which is 0:00:19.030,0:00:22.360 p, which is the same thing as[br]the mean of the sampling 0:00:22.360,0:00:23.850 distribution of the[br]sampling mean. 0:00:23.850,0:00:26.570 So there's a 95% chance that[br]the true mean-- and 0:00:26.570,0:00:27.360 let me put this here. 0:00:27.360,0:00:29.910 This is also the same thing as[br]the mean of the sampling 0:00:29.910,0:00:32.790 distribution of the sampling[br]mean is in that interval. 0:00:32.790,0:00:36.740 And to do that let me just[br]throw out a few ideas. 0:00:36.740,0:00:46.295 What is the probability that if[br]I take a sample and I were 0:00:46.295,0:00:48.870 to take a mean of that sample,[br]so the probability that a 0:00:48.870,0:00:59.260 random sample mean is within two[br]standard deviations of the 0:00:59.260,0:01:04.500 sampling mean, of[br]our sample mean? 0:01:04.500,0:01:07.610 So what is this probability[br]right over here? 0:01:07.610,0:01:10.100 Let's just look at our[br]actual distribution. 0:01:10.100,0:01:12.950 So this is our distribution,[br]this right here is our 0:01:12.950,0:01:14.020 sampling mean. 0:01:14.020,0:01:15.580 Maybe I should do it in[br]blue because that's 0:01:15.580,0:01:18.730 the color up here. 0:01:18.730,0:01:20.180 This is our sampling mean. 0:01:20.180,0:01:23.330 And so what is the probability[br]that a random sampling mean is 0:01:23.330,0:01:24.970 going to be two standard[br]deviations? 0:01:24.970,0:01:28.440 Well a random sampling is a[br]sample from this distribution. 
0:01:28.440,0:01:30.980 It is a sample from the sampling[br]distribution of the 0:01:30.980,0:01:32.040 sample mean. 0:01:32.040,0:01:35.400 So it's literally what is the[br]probability of finding a 0:01:35.400,0:01:37.360 sample within two standard[br]deviations of the mean? 0:01:37.360,0:01:40.150 That's one standard deviation,[br]that's another standard 0:01:40.150,0:01:42.980 deviation right over there. 0:01:42.980,0:01:45.140 In general, if you haven't[br]committed this to memory 0:01:45.140,0:01:48.340 already, it's not a bad thing[br]to commit to memory, is that 0:01:48.340,0:01:50.710 if you have a normal[br]distribution the probability 0:01:50.710,0:01:55.170 of taking a sample within two[br]standard deviations is 95-- 0:01:55.170,0:01:57.130 and if you want to get[br]a little bit more 0:01:57.130,0:01:59.600 accurate it's 95.4%. 0:01:59.600,0:02:03.860 But you could say it's roughly--[br]or maybe I could 0:02:03.860,0:02:06.210 write it like this--[br]it's roughly 95%. 0:02:06.210,0:02:08.400 And really that's all that[br]matters because we have this 0:02:08.400,0:02:10.910 little funny language here[br]called reasonably confident, 0:02:10.910,0:02:13.800 and we have to estimate the[br]standard deviation anyway. 0:02:13.800,0:02:16.130 In fact, we could say if we[br]want, I could say that it's 0:02:16.130,0:02:20.820 going to be exactly[br]equal to 95.4%. 0:02:20.820,0:02:24.240 But in general, two standard[br]deviations, 95%, that's what 0:02:24.240,0:02:25.740 people equate with each other. 
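As a quick check of the two-standard-deviation figure quoted above, here is a minimal Python sketch. It assumes an exactly normal distribution and uses only the standard library's `math.erf`; the function name `prob_within` is made up for illustration and does not come from the video.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normal draw lands within k standard deviations of its mean."""
    # For a normal distribution, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

two_sd = prob_within(2)
print(f"{two_sd:.1%}")  # roughly 95.4%, the figure quoted in the transcript
```

Rounding this to 95% is the usual shorthand, which is all the "reasonably confident" wording needs.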
0:02:25.740,0:02:28.680 Now this statement is the[br]exact same thing as the 0:02:28.680,0:02:36.170 probability that the sample[br]mean, that the sampling mean-- 0:02:36.170,0:02:38.440 not the sample mean, the[br]probability of the mean of the 0:02:38.440,0:02:46.700 sampling distribution is within[br]two standard deviations 0:02:46.700,0:02:51.350 of the sampling distribution of[br]x is also going to be the 0:02:51.350,0:02:54.750 same number, is also going[br]to be equal to 95.4%. 0:02:54.750,0:02:56.220 These are the exact[br]same statements. 0:02:56.220,0:03:00.170 If x is within two standard[br]deviations of this, then this, 0:03:00.170,0:03:02.580 then the mean, is within two[br]standard deviations of x. 0:03:02.580,0:03:05.400 These are just two ways of[br]phrasing the same thing. 0:03:05.400,0:03:09.090 Now we know that the mean of the[br]sampling distribution, the 0:03:09.090,0:03:11.760 same thing as a mean of the[br]population distribution, which 0:03:11.760,0:03:14.940 is the same thing as the[br]parameter p-- the proportion 0:03:14.940,0:03:19.670 of people or the proportion of[br]the population that is a 1. 0:03:19.670,0:03:22.780 So this right here is the same[br]thing as the population mean. 0:03:22.780,0:03:26.900 So this statement right here[br]we can switch this with p. 0:03:26.900,0:03:32.290 So the probability that p is[br]within two standard deviations 0:03:32.290,0:03:37.260 of the sampling distribution[br]of x is 95.4%. 0:03:37.260,0:03:41.830 Now we don't know what this[br]number right here is. 0:03:41.830,0:03:43.890 But we have estimated it. 0:03:43.890,0:03:48.870 Remember, our best estimate of[br]this is the true standard, or 0:03:48.870,0:03:51.060 it is the true standard[br]deviation of the population 0:03:51.060,0:03:52.110 divided by 10. 
0:03:52.110,0:03:54.350 We can estimate the true[br]standard deviation of the 0:03:54.350,0:03:57.300 population with our sample[br]standard deviation, which was 0:03:57.300,0:04:00.000 0.5, 0.5 divided by 10. 0:04:00.000,0:04:03.980 Our best estimate of the[br]standard deviation of the 0:04:03.980,0:04:08.190 sampling distribution of the[br]sample mean is 0.05. 0:04:08.190,0:04:11.470 So now we can say-- and I'll[br]switch colors-- the 0:04:11.470,0:04:14.670 probability that the parameter[br]p, the proportion of the 0:04:14.670,0:04:21.720 population saying 1, is within[br]two times-- remember, our best 0:04:21.720,0:04:28.600 estimate of this right here is[br]0.05 of a sample mean that we 0:04:28.600,0:04:33.500 take is equal to 95.4%. 0:04:33.500,0:04:40.770 And so we could say the[br]probability that p is within 2 0:04:40.770,0:04:46.650 times 0.05 is going to be equal[br]to-- 2 times 0.05 is going to be 0:04:46.650,0:04:53.290 0.10 of our mean is equal to[br]95-- and actually let me be a 0:04:53.290,0:04:54.230 little careful here. 0:04:54.230,0:04:58.420 I can't say the equal now,[br]because over here if we knew 0:04:58.420,0:05:01.110 this, if we knew this parameter[br]of the sampling 0:05:01.110,0:05:03.120 distribution of the sample[br]mean, we could 0:05:03.120,0:05:05.250 say that it is 95.4%. 0:05:05.250,0:05:06.280 We don't know it. 0:05:06.280,0:05:09.050 We are just trying to find our[br]best estimator for it. 0:05:09.050,0:05:11.450 So actually what I'm going to[br]do here is actually just say 0:05:11.450,0:05:14.260 is roughly-- and just to show[br]that we don't even have that 0:05:14.260,0:05:17.510 level of accuracy, I'm going[br]to say roughly 95%. 0:05:17.510,0:05:20.680 We're reasonably confident that[br]it's about 95% because 0:05:20.680,0:05:23.880 we're using this estimator that[br]came out of our sample, 0:05:23.880,0:05:26.070 and if the sample is really[br]skewed this is going to be a 0:05:26.070,0:05:26.950 really weird number.
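The arithmetic in this stretch can be sketched in a few lines of Python. This is only an illustration: the helper name `estimated_standard_error` is hypothetical, and n = 100 is the sample size the transcript mentions later, which is where the "divided by 10" comes from.

```python
import math

def estimated_standard_error(sample_sd: float, n: int) -> float:
    """Estimate the standard deviation of the sampling distribution of the sample mean."""
    # The sample standard deviation stands in for the unknown population value,
    # which is why the result is only an estimate.
    return sample_sd / math.sqrt(n)

se = estimated_standard_error(0.5, 100)  # 0.5 divided by 10
two_se = 2 * se                          # the "two standard deviations" distance
print(round(se, 2), round(two_se, 2))    # 0.05 0.1
```

Because the 0.5 came out of the sample itself, everything built on it is "roughly 95%" rather than exactly 95.4%.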
0:05:26.950,0:05:29.810 So this is why we just have to[br]be a little bit more exact 0:05:29.810,0:05:30.590 about what we're doing. 0:05:30.590,0:05:31.890 But this is the tool[br]for at least saying 0:05:31.890,0:05:33.910 how good is our result. 0:05:33.910,0:05:37.840 So this is going to[br]be about 95%. 0:05:37.840,0:05:46.500 Or we could say that the[br]probability that p is within 0:05:46.500,0:05:49.870 0.10 of our sample mean[br]that we actually got. 0:05:49.870,0:05:51.840 So what was the sample mean[br]that we actually got? 0:05:51.840,0:05:53.460 It was 0.43. 0:05:53.460,0:05:59.550 So if we're within 0.1 of 0.43,[br]that means we are within 0:05:59.550,0:06:07.730 0.43 plus or minus 0.1 is[br]also, roughly, we're 0:06:07.730,0:06:11.850 reasonably confident[br]it's about 95%. 0:06:11.850,0:06:12.870 And I want to be very clear. 0:06:12.870,0:06:15.060 Everything that I started all[br]the way from up here in brown 0:06:15.060,0:06:17.610 to yellow and all this magenta,[br]I'm just restating 0:06:17.610,0:06:19.340 the same thing inside of this. 0:06:19.340,0:06:22.490 It became a little bit more[br]loosey-goosey once I went from 0:06:22.490,0:06:25.950 the exact standard deviation of[br]the sampling distribution 0:06:25.950,0:06:27.340 to an estimator for it. 0:06:27.340,0:06:29.960 And that's why this is just[br]becoming-- I kind of put the 0:06:29.960,0:06:32.540 squiggly equal signs there[br]to say we're reasonably 0:06:32.540,0:06:35.110 confident-- and I even got rid[br]of some of the precision. 0:06:35.110,0:06:36.950 But we just found[br]our interval. 0:06:36.950,0:06:39.240 An interval that we can be[br]reasonably confident that 0:06:39.240,0:06:42.460 there's a 95% probability that[br]p is within that, is going to 0:06:42.460,0:06:45.190 be 0.43 plus or minus 0.1. 0:06:45.190,0:06:48.460 Or an interval of-- we have[br]a confidence interval. 
0:06:48.460,0:06:59.620 We have a 95% confidence[br]interval of, and we could say, 0:06:59.620,0:07:04.240 0.43 minus 0.1 is 0.33. 0:07:04.240,0:07:08.710 If we write that as a percent[br]we could say 33% to-- and if 0:07:08.710,0:07:16.630 we add the 0.1, 0.43 plus[br]0.1 we get 53%-- to 53%. 0:07:16.630,0:07:20.890 So we are 95% confident. 0:07:20.890,0:07:24.340 So we're not saying kind of[br]precisely that the probability 0:07:24.340,0:07:28.550 of the actual proportion is 95%,[br]but we're 95% confident 0:07:28.550,0:07:35.870 that the true proportion[br]is between 33% and 53%. 0:07:35.870,0:07:37.830 That p is in this[br]range over here. 0:07:37.830,0:07:40.980 Or another way, and you'll see[br]this in a lot of surveys that 0:07:40.980,0:07:45.070 have been done, people will say[br]we did a survey and we got 0:07:45.070,0:07:55.180 43% will vote for number one,[br]and number one in this case is 0:07:55.180,0:07:56.430 candidate B. 0:08:02.420,0:08:04.450 And then the other side, since[br]everyone else voted for 0:08:04.450,0:08:13.290 candidate A, 57% will[br]vote for A. 0:08:13.290,0:08:15.460 And then they're going to[br]put on a margin of error. 0:08:15.460,0:08:17.750 And you'll see this in any[br]survey that you see on TV. 0:08:17.750,0:08:22.350 They'll put a margin of error. 0:08:22.350,0:08:24.920 And the margin of error is just[br]another way of describing 0:08:24.920,0:08:26.480 this confidence interval. 0:08:26.480,0:08:29.330 And they'll say that the margin[br]of error in this case 0:08:29.330,0:08:37.200 is 10%, which means that there's[br]a 95% confidence 0:08:37.200,0:08:41.919 interval, if you go plus or[br]minus 10% from that value 0:08:41.919,0:08:42.510 right over there.
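To make the interval concrete, here is a small Python sketch using the transcript's own numbers (sample mean 0.43, estimated standard error 0.05). The function name `two_sd_interval` is an assumption for illustration, and the interval is the rough two-standard-deviation version used in the video, not a more refined method.

```python
def two_sd_interval(sample_mean: float, standard_error: float) -> tuple[float, float]:
    """Rough 95% interval: sample mean plus or minus two estimated standard errors."""
    margin = 2 * standard_error  # the margin of error, 0.10 with these numbers
    return sample_mean - margin, sample_mean + margin

low, high = two_sd_interval(0.43, 0.05)
print(f"{low:.0%} to {high:.0%}")  # 33% to 53%
```

The margin of error reported with a survey is just the half-width of this interval, here 10 percentage points.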
0:08:42.510,0:08:44.850 And I really want to emphasize,[br]you can't say with 0:08:44.850,0:08:48.620 certainty that there is a 95%[br]chance that the true result 0:08:48.620,0:08:52.180 will be within 10% of this,[br]because we had to estimate the 0:08:52.180,0:08:55.030 standard deviation of[br]the sampling mean. 0:08:55.030,0:08:58.140 But this is the best measure[br]we can get with the information 0:08:58.140,0:09:00.500 you have. If you're going to[br]do a survey of 100 people, 0:09:00.500,0:09:03.570 this is the best kind of[br]confidence that we can get. 0:09:03.570,0:09:05.600 And this number is actually[br]fairly big. 0:09:05.600,0:09:08.560 So if you were to look at this[br]you would say, roughly there's 0:09:08.560,0:09:12.360 a 95% chance that the true[br]value of this number is 0:09:12.360,0:09:15.100 between 33% and 53%. 0:09:15.100,0:09:18.240 So there's actually still a[br]chance that candidate B can 0:09:18.240,0:09:21.170 win, even though only[br]43% of your 100 are 0:09:21.170,0:09:21.920 going to vote for him. 0:09:21.920,0:09:25.250 If you wanted to make it a[br]little bit more precise you 0:09:25.250,0:09:26.770 would want to take[br]more samples. 0:09:26.770,0:09:28.390 You can imagine. 0:09:28.390,0:09:31.710 Instead of taking 100 samples,[br]instead of n being 100, if you 0:09:31.710,0:09:35.310 made n equal 1,000, then you[br]would take this number over 0:09:35.310,0:09:37.800 here, you would take this number[br]here and divide by the 0:09:37.800,0:09:40.570 square root of 1,000 instead[br]of the square root of 100. 0:09:40.570,0:09:43.250 So you'd be dividing[br]by about 32 or whatever. 0:09:43.250,0:09:48.550 And so then the size of the[br]standard deviation of your 0:09:48.550,0:09:50.760 sampling distribution[br]will go down. 0:09:50.760,0:09:53.350 And so the distance of two[br]standard deviations will be a 0:09:53.350,0:09:55.480 smaller number, and so[br]then you will have a 0:09:55.480,0:09:57.210 smaller margin of error.
0:09:57.210,0:10:00.290 And maybe you want to get the[br]margin of error small enough 0:10:00.290,0:10:02.920 so that you can figure out[br]decisively who's going to win 0:10:02.920,0:10:04.340 the election.
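The effect of a bigger sample on the margin of error can be sketched like this in Python. One loud assumption: the sample standard deviation is held at 0.5 for both sample sizes, which a real survey of 1,000 people would not guarantee. The name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(sample_sd: float, n: int) -> float:
    """Two estimated standard errors of the sample mean for a sample of size n."""
    return 2 * sample_sd / math.sqrt(n)

# Assumption: the sample standard deviation stays at 0.5 as n grows.
for n in (100, 1000):
    print(n, f"{margin_of_error(0.5, n):.1%}")
```

With n = 100 the margin is 10 percentage points; with n = 1,000 it falls to a little over 3, because the divisor grows from 10 to the square root of 1,000, which is about 31.6.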