WEBVTT 00:00:00.000 --> 00:00:00.540 00:00:00.540 --> 00:00:02.540 Where we left off in the last video I kind 00:00:02.540 --> 00:00:03.400 of gave you a question. 00:00:03.400 --> 00:00:06.240 Find an interval so that we're reasonably confident-- we'll 00:00:06.240 --> 00:00:08.220 talk a little bit more about why I have to give this kind 00:00:08.220 --> 00:00:12.570 of vague wording right here-- reasonably confident that 00:00:12.570 --> 00:00:19.030 there's a 95% chance that the true population mean, which is 00:00:19.030 --> 00:00:22.360 p, which is the same thing as the mean of the sampling 00:00:22.360 --> 00:00:23.850 distribution of the sampling mean. 00:00:23.850 --> 00:00:26.570 So there's a 95% chance that the true mean-- and 00:00:26.570 --> 00:00:27.360 let me put this here. 00:00:27.360 --> 00:00:29.910 This is also the same thing as the mean of the sampling 00:00:29.910 --> 00:00:32.790 distribution of the sampling mean is in that interval. 00:00:32.790 --> 00:00:36.740 And to do that let me just throw out a few ideas. 00:00:36.740 --> 00:00:46.295 What is the probability that if I take a sample and I were 00:00:46.295 --> 00:00:48.870 to take a mean of that sample, so the probability that a 00:00:48.870 --> 00:00:59.260 random sample mean is within two standard deviations of the 00:00:59.260 --> 00:01:04.500 sampling mean, of our sample mean? 00:01:04.500 --> 00:01:07.610 So what is this probability right over here? 00:01:07.610 --> 00:01:10.100 Let's just look at our actual distribution. 00:01:10.100 --> 00:01:12.950 So this is our distribution, this right here is our 00:01:12.950 --> 00:01:14.020 sampling mean. 00:01:14.020 --> 00:01:15.580 Maybe I should do it in blue because that's 00:01:15.580 --> 00:01:18.730 the color up here. 00:01:18.730 --> 00:01:20.180 This is our sampling mean. 00:01:20.180 --> 00:01:23.330 And so what is the probability that a random sampling mean is 00:01:23.330 --> 00:01:24.970 going to be two standard deviations? 
00:01:24.970 --> 00:01:28.440 Well a random sampling is a sample from this distribution. 00:01:28.440 --> 00:01:30.980 It is a sample from the sampling distribution of the 00:01:30.980 --> 00:01:32.040 sample mean. 00:01:32.040 --> 00:01:35.400 So it's literally what is the probability of finding a 00:01:35.400 --> 00:01:37.360 sample within two standard deviations of the mean? 00:01:37.360 --> 00:01:40.150 That's one standard deviation, that's another standard 00:01:40.150 --> 00:01:42.980 deviation right over there. 00:01:42.980 --> 00:01:45.140 In general, if you haven't committed this to memory 00:01:45.140 --> 00:01:48.340 already, it's not a bad thing to commit to memory, is that 00:01:48.340 --> 00:01:50.710 if you have a normal distribution the probability 00:01:50.710 --> 00:01:55.170 of taking a sample within two standard deviations is 95-- 00:01:55.170 --> 00:01:57.130 and if you want to get a little bit more 00:01:57.130 --> 00:01:59.600 accurate it's 95.4%. 00:01:59.600 --> 00:02:03.860 But you could say it's roughly-- or maybe I could 00:02:03.860 --> 00:02:06.210 write it like this-- it's roughly 95%. 00:02:06.210 --> 00:02:08.400 And really that's all that matters because we have this 00:02:08.400 --> 00:02:10.910 little funny language here called reasonably confident, 00:02:10.910 --> 00:02:13.800 and we have to estimate the standard deviation anyway. 00:02:13.800 --> 00:02:16.130 In fact, we could say if we want, I could say that it's 00:02:16.130 --> 00:02:20.820 going to be exactly equal to 95.4%. 00:02:20.820 --> 00:02:24.240 But in general, two standard deviations, 95%, that's what 00:02:24.240 --> 00:02:25.740 people equate with each other. 
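As a quick check on the two-standard-deviation rule quoted above, here is a minimal Python sketch. It assumes an exactly normal distribution; the function name `prob_within` is made up for illustration and is not from the video.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normal draw lands within k standard deviations of its mean."""
    # For a standard normal Z, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

two_sigma = prob_within(2)
print(f"{two_sigma:.3f}")  # prints 0.954
```

This is where the 95.4% figure comes from; rounding it to 95% is the usual shorthand.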
00:02:25.740 --> 00:02:28.680 Now this statement is the exact same thing as the 00:02:28.680 --> 00:02:36.170 probability that the sample mean, that the sampling mean-- 00:02:36.170 --> 00:02:38.440 not the sample mean, the probability of the mean of the 00:02:38.440 --> 00:02:46.700 sampling distribution is within two standard deviations 00:02:46.700 --> 00:02:51.350 of the sampling distribution of x is also going to be the 00:02:51.350 --> 00:02:54.750 same number, is also going to be equal to 95.4%. 00:02:54.750 --> 00:02:56.220 These are the exact same statements. 00:02:56.220 --> 00:03:00.170 If x is within two standard deviations of this, then this, 00:03:00.170 --> 00:03:02.580 then the mean, is within two standard deviations of x. 00:03:02.580 --> 00:03:05.400 These are just two ways of phrasing the same thing. 00:03:05.400 --> 00:03:09.090 Now we know that the mean of the sampling distribution, the 00:03:09.090 --> 00:03:11.760 same thing as a mean of the population distribution, which 00:03:11.760 --> 00:03:14.940 is the same thing as the parameter p-- the proportion 00:03:14.940 --> 00:03:19.670 of people or the proportion of the population that is a 1. 00:03:19.670 --> 00:03:22.780 So this right here is the same thing as the population mean. 00:03:22.780 --> 00:03:26.900 So this statement right here we can switch this with p. 00:03:26.900 --> 00:03:32.290 So the probability that p is within two standard deviations 00:03:32.290 --> 00:03:37.260 of the sampling distribution of x is 95.4%. 00:03:37.260 --> 00:03:41.830 Now we don't know what this number right here is. 00:03:41.830 --> 00:03:43.890 But we have estimated it. 00:03:43.890 --> 00:03:48.870 Remember, our best estimate of this is the true standard, or 00:03:48.870 --> 00:03:51.060 it is the true standard deviation of the population 00:03:51.060 --> 00:03:52.110 divided by 10. 
00:03:52.110 --> 00:03:54.350 We can estimate the true standard deviation of the 00:03:54.350 --> 00:03:57.300 population with our sampling standard deviation, which was 00:03:57.300 --> 00:04:00.000 0.5, 0.5 divided by 10. 00:04:00.000 --> 00:04:03.980 Our best estimate of the standard deviation of the 00:04:03.980 --> 00:04:08.190 sampling distribution of the sample mean is 0.05. 00:04:08.190 --> 00:04:11.470 So now we can say-- and I'll switch colors-- the 00:04:11.470 --> 00:04:14.670 probability that the parameter p, the proportion of the 00:04:14.670 --> 00:04:21.720 population saying 1, is within two times-- remember, our best 00:04:21.720 --> 00:04:28.600 estimate of this right here is 0.05 of a sample mean that we 00:04:28.600 --> 00:04:33.500 take is equal to 95.4%. 00:04:33.500 --> 00:04:40.770 And so we could say the probability that p is within 2 00:04:40.770 --> 00:04:46.650 times 0.05-- 2 times 0.05 is going to be 00:04:46.650 --> 00:04:53.290 0.10-- of our mean is equal to 95-- and actually let me be a 00:04:53.290 --> 00:04:54.230 little careful here. 00:04:54.230 --> 00:04:58.420 I can't say equal now, because over here if we knew 00:04:58.420 --> 00:05:01.110 this, if we knew this parameter of the sampling 00:05:01.110 --> 00:05:03.120 distribution of the sample mean, we could 00:05:03.120 --> 00:05:05.250 say that it is 95.4%. 00:05:05.250 --> 00:05:06.280 We don't know it. 00:05:06.280 --> 00:05:09.050 We are just trying to find our best estimator for it. 00:05:09.050 --> 00:05:11.450 So actually what I'm going to do here is actually just say 00:05:11.450 --> 00:05:14.260 is roughly-- and just to show that we don't even have that 00:05:14.260 --> 00:05:17.510 level of accuracy, I'm going to say roughly 95%. 
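The arithmetic in this passage can be sketched in a few lines of Python. The inputs are the ones quoted in the video (sample standard deviation 0.5, survey of 100 people, so the "divided by 10" is the square root of 100); the function and variable names are illustrative assumptions.

```python
import math

def standard_error(sample_std: float, n: int) -> float:
    """Estimated standard deviation of the sampling distribution of the sample mean."""
    return sample_std / math.sqrt(n)

sample_std = 0.5  # sample standard deviation quoted in the video
n = 100           # number of people surveyed

se = standard_error(sample_std, n)  # 0.5 / 10
two_se = 2 * se                     # distance of two standard deviations

print(f"standard error: {se:.2f}")
print(f"two standard errors: {two_se:.2f}")
```

Because `se` is built from the sample standard deviation rather than the true one, the 95.4% attached to it is only approximate.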
00:05:17.510 --> 00:05:20.680 We're reasonably confident that it's about 95% because 00:05:20.680 --> 00:05:23.880 we're using this estimator that came out of our sample, 00:05:23.880 --> 00:05:26.070 and if the sample is really skewed this is going to be a 00:05:26.070 --> 00:05:26.950 really weird number. 00:05:26.950 --> 00:05:29.810 So this is why we just have to be a little bit more exact 00:05:29.810 --> 00:05:30.590 about what we're doing. 00:05:30.590 --> 00:05:31.890 But this is the tool for at least saying 00:05:31.890 --> 00:05:33.910 how good is our result. 00:05:33.910 --> 00:05:37.840 So this is going to be about 95%. 00:05:37.840 --> 00:05:46.500 Or we could say that the probability that p is within 00:05:46.500 --> 00:05:49.870 0.10 of our sample mean that we actually got. 00:05:49.870 --> 00:05:51.840 So what was the sample mean that we actually got? 00:05:51.840 --> 00:05:53.460 It was 0.43. 00:05:53.460 --> 00:05:59.550 So if we're within 0.1 of 0.43, that means we are within 00:05:59.550 --> 00:06:07.730 0.43 plus or minus 0.1 is also, roughly, we're 00:06:07.730 --> 00:06:11.850 reasonably confident it's about 95%. 00:06:11.850 --> 00:06:12.870 And I want to be very clear. 00:06:12.870 --> 00:06:15.060 Everything that I started all the way from up here in brown 00:06:15.060 --> 00:06:17.610 to yellow and all this magenta, I'm just restating 00:06:17.610 --> 00:06:19.340 the same thing inside of this. 00:06:19.340 --> 00:06:22.490 It became a little bit more loosey-goosey once I went from 00:06:22.490 --> 00:06:25.950 the exact standard deviation of the sampling distribution 00:06:25.950 --> 00:06:27.340 to an estimator for it. 00:06:27.340 --> 00:06:29.960 And that's why this is just becoming-- I kind of put the 00:06:29.960 --> 00:06:32.540 squiggly equal signs there to say we're reasonably 00:06:32.540 --> 00:06:35.110 confident-- and I even got rid of some of the precision. 00:06:35.110 --> 00:06:36.950 But we just found our interval. 
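One way to see why the wording stays at "reasonably confident" is to simulate many surveys and count how often the interval built from each sample's own standard deviation actually contains the true proportion. This is only a rough sketch: the true proportion of 0.43, the number of trials, and the seed are assumptions chosen for the simulation, not values the video establishes.

```python
import math
import random

def coverage(p_true: float, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated surveys whose mean +/- 2 standard errors contains p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each respondent answers 1 with probability p_true, otherwise 0.
        sample = [1 if rng.random() < p_true else 0 for _ in range(n)]
        mean = sum(sample) / n
        # Sample variance with n - 1 in the denominator.
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        half_width = 2 * math.sqrt(var) / math.sqrt(n)
        if abs(mean - p_true) <= half_width:
            hits += 1
    return hits / trials

print(coverage(p_true=0.43, n=100, trials=2000))
```

The printed fraction should land near 0.95 but not exactly on 95.4%, which is the looseness the squiggly equal signs are meant to convey.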
00:06:36.950 --> 00:06:39.240 An interval that we can be reasonably confident that 00:06:39.240 --> 00:06:42.460 there's a 95% probability that p is within that, is going to 00:06:42.460 --> 00:06:45.190 be 0.43 plus or minus 0.1. 00:06:45.190 --> 00:06:48.460 Or an interval of-- we have a confidence interval. 00:06:48.460 --> 00:06:59.620 We have a 95% confidence interval of, and we could say, 00:06:59.620 --> 00:07:04.240 0.43 minus 0.1 is 0.33. 00:07:04.240 --> 00:07:08.710 If we write that as a percent we could say 33% to-- and if 00:07:08.710 --> 00:07:16.630 we add the 0.1, 0.43 plus 0.1 we get 53%-- to 53%. 00:07:16.630 --> 00:07:20.890 So we are 95% confident. 00:07:20.890 --> 00:07:24.340 So we're not saying kind of precisely that the probability 00:07:24.340 --> 00:07:28.550 of the actual proportion is 95%, but we're 95% confident 00:07:28.550 --> 00:07:35.870 that the true proportion is between 33% and 53%. 00:07:35.870 --> 00:07:37.830 That p is in this range over here. 00:07:37.830 --> 00:07:40.980 Or another way, and you'll see this in a lot of surveys that 00:07:40.980 --> 00:07:45.070 have been done, people will say we did a survey and we got 00:07:45.070 --> 00:07:55.180 43% will vote for number one, and number one in this case is 00:07:55.180 --> 00:07:56.430 candidate B. 00:07:56.430 --> 00:08:02.420 00:08:02.420 --> 00:08:04.450 And then the other side, since everyone else voted for 00:08:04.450 --> 00:08:13.290 candidate A, 57% will vote for A. 00:08:13.290 --> 00:08:15.460 And then they're going to put on a margin of error. 00:08:15.460 --> 00:08:17.750 And you'll see this in any survey that you see on TV. 00:08:17.750 --> 00:08:22.350 They'll put a margin of error. 00:08:22.350 --> 00:08:24.920 And the margin of error is just another way of describing 00:08:24.920 --> 00:08:26.480 this confidence interval. 
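The interval and the margin of error are the same information written two ways, which a short Python sketch makes concrete. It uses the sample mean of 0.43 and the margin of 0.10 from the video; the helper name `confidence_interval` is an assumption for illustration.

```python
def confidence_interval(sample_mean: float, margin: float) -> tuple[float, float]:
    """Turn 'sample mean plus or minus margin' into (low, high) endpoints."""
    return (sample_mean - margin, sample_mean + margin)

low, high = confidence_interval(0.43, 0.10)
print(f"{low:.0%} to {high:.0%}")  # prints 33% to 53%
```

Reporting "43%, margin of error 10%" and reporting "33% to 53%" describe the same roughly-95% confidence interval.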
00:08:26.480 --> 00:08:29.330 And they'll say that the margin of error in this case 00:08:29.330 --> 00:08:37.200 is 10%, which means that there's a 95% confidence 00:08:37.200 --> 00:08:41.919 interval, if you go plus or minus 10% from that value 00:08:41.919 --> 00:08:42.510 right over there. 00:08:42.510 --> 00:08:44.850 And I really want to emphasize, you can't say with 00:08:44.850 --> 00:08:48.620 certainty that there is a 95% chance that the true result 00:08:48.620 --> 00:08:52.180 will be within 10% of this, because we had to estimate the 00:08:52.180 --> 00:08:55.030 standard deviation of the sampling mean. 00:08:55.030 --> 00:08:58.140 But this is the best measure we can with the information 00:08:58.140 --> 00:09:00.500 you have. If you're going to do a survey of 100 people, 00:09:00.500 --> 00:09:03.570 this is the best kind of confidence that we can get. 00:09:03.570 --> 00:09:05.600 And this number is actually fairly big. 00:09:05.600 --> 00:09:08.560 So if you were to look at this you would say, roughly there's 00:09:08.560 --> 00:09:12.360 a 95% chance that the true value of this number is 00:09:12.360 --> 00:09:15.100 between 33% and 53%. 00:09:15.100 --> 00:09:18.240 So there's actually still a chance that candidate B can 00:09:18.240 --> 00:09:21.170 win, even though only 43% of your 100 are 00:09:21.170 --> 00:09:21.920 going to vote for him. 00:09:21.920 --> 00:09:25.250 If you wanted to make it a little bit more precise you 00:09:25.250 --> 00:09:26.770 would want to take more samples. 00:09:26.770 --> 00:09:28.390 You can imagine. 00:09:28.390 --> 00:09:31.710 Instead of taking 100 samples, instead of n being 100, if you 00:09:31.710 --> 00:09:35.310 made n equal 1,000, then you would take this number over 00:09:35.310 --> 00:09:37.800 here, you would take this number here and divide by the 00:09:37.800 --> 00:09:40.570 square root of 1,000 instead of the square root of 100. 
00:09:40.570 --> 00:09:43.250 So you'd be dividing by about 31.6. 00:09:43.250 --> 00:09:48.550 And so then the size of the standard deviation of your 00:09:48.550 --> 00:09:50.760 sampling distribution will go down. 00:09:50.760 --> 00:09:53.350 And so the distance of two standard deviations will be a 00:09:53.350 --> 00:09:55.480 smaller number, and so then you will have a 00:09:55.480 --> 00:09:57.210 smaller margin of error. 00:09:57.210 --> 00:10:00.290 And maybe you want to get the margin of error small enough 00:10:00.290 --> 00:10:02.920 so that you can figure out decisively who's going to win 00:10:02.920 --> 00:10:04.340 the election. 00:10:04.340 --> 00:10:04.734
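To see how much a bigger survey tightens things, here is a small Python sketch comparing n = 100 with n = 1,000. It assumes the sample standard deviation would still come out at about 0.5 in the larger survey, which the video does not claim; the function name `margin_of_error` is also just illustrative.

```python
import math

def margin_of_error(sample_std: float, n: int, k: float = 2.0) -> float:
    """k estimated standard errors of the sample mean (k = 2 for roughly 95%)."""
    return k * sample_std / math.sqrt(n)

for n in (100, 1000):
    print(f"n = {n}: margin of error about {margin_of_error(0.5, n):.3f}")
```

Under that assumption the margin drops from 0.10 to a little over 0.03, so a tenfold increase in sample size shrinks the margin by a factor of about 3.2, the square root of 10.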