Hello. In this video, I would like to introduce you to the concept of sampling distributions. Sampling distributions are fundamental to understanding how we can arrive at conclusions about a population from the samples we collect. First of all, let's recap what a sample is and what a population is.

We've seen this diagram before. A population is all the individuals we could possibly ever get data on. Let's pick a very, very boring example: height. Maybe the population is all UT students. If we were able to get the height of every single student enrolled at UT, there would be some true average height. That average would be the population mean, and it would have some particular value. We'd also be able to get other truths, such as the population standard deviation, if we measured every one of the however many thousand UT students. These population values are, as you may remember, called parameters.

Now, in practice, that's pretty much impossible. We would never actually be able to measure the height of every UT student. So if, for some strange reason, this was something we were interested in, we would collect a sample instead. Maybe we'd get a sample of five people, or 50, or 500. In future videos, we'll talk about what an appropriate sample size is when we want to find something out. For now, let's say we pick ten people. We measure all their heights, calculate their average height, and call that the sample mean, "x-bar." We can also calculate their sample standard deviation.
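To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The ten heights are made-up numbers standing in for a population (real data on all UT students would be far larger), so the printed values mean nothing beyond the illustration. `statistics.pstdev` computes the population standard deviation and `statistics.stdev` the sample standard deviation with the n - 1 divisor.

```python
import random
import statistics

# Hypothetical population: made-up heights in cm, standing in for "all UT students".
# These numbers are illustrative only, not real data.
population = [158, 162, 165, 167, 170, 171, 173, 176, 180, 185]

# Parameters: computed from every member of the population.
mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # population standard deviation

# Statistics: computed from a random sample drawn from the population.
rng = random.Random(1)                  # fixed seed so the run is repeatable
sample = rng.sample(population, 5)      # 5 distinct individuals, no repeats
x_bar = statistics.mean(sample)         # sample mean, "x-bar"
s = statistics.stdev(sample)            # sample standard deviation, "s"

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"x-bar = {x_bar:.2f}, s = {s:.2f}")
```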
We'd call the sample standard deviation "s." These quantities, calculated from a sample, are called statistics. This is all just recap. The purpose of calculating statistics is to estimate the parameters: we want to estimate what the true value is.

Now here's the issue. Let's say we collect a sample of five, and then another day we collect a different sample of five, and then on another day, because we're addicted to measuring people's heights, we collect yet another sample of five. For now, let's just consider the sample mean and not worry about the standard deviation. We're going to get a different average every time we collect a sample of five. If we randomly sampled five individuals from all UT students, it would be very unusual to get the same five people twice, so the sample mean is going to be different almost every time. Another day, another five people, another sample mean. So we're trying to estimate the population mean, but our values will generally differ from one another, and they'll rarely be exactly equal to the population mean. That's possible, but unlikely.

What we've just done, by the way, is build a sampling distribution. I've kind of introduced it to you by accident. A sampling distribution is what you get when you collect lots and lots of samples, calculate the same statistic for each one, and look at all the values you got. We usually plot those values as a histogram. In the next couple of slides, I'll go through this more formally, because it gets slightly more interesting than this, and the distribution actually looks quite nice once you collect the data.
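The repeated-sampling idea can be sketched as follows, again assuming a small made-up list of heights in place of real UT data. `repeated_sample_means` is a hypothetical helper name; each individual sample is drawn without replacement (no person twice in one sample), and the fixed seed only makes the run repeatable.

```python
import random
import statistics

# Hypothetical population of heights (cm); made-up values for illustration only.
population = [158, 162, 165, 167, 170, 171, 173, 176, 180, 185]

def repeated_sample_means(pop, n, repeats, seed=0):
    """Draw `repeats` independent random samples of size n (no person twice
    within one sample) and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(pop, n)) for _ in range(repeats)]

# One sample of five per "day", ten days in a row.
means = repeated_sample_means(population, n=5, repeats=10)
for day, m in enumerate(means, start=1):
    print(f"sample {day}: x-bar = {m:.1f}")
```

Each printed x-bar is one value of the statistic; collecting many of them and plotting a histogram is what the transcript calls a sampling distribution.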
Okay, so in this example, my population is not all UT students. My population is just four people. Maybe they're four roommates, and I ask each of them, "How many people do you phone in one week?" One person says four, one says seven, one says five, and one says eight. The population mean is six: that's 4 + 5 + 7 + 8 = 24, divided by the number of individuals, which is four. So I was interested in the average number of people the members of this one house phone in a week. But I thought collecting data on four people was too much work, so I decided to collect samples of three people instead. The first three people I sampled gave me a five, a seven, and an eight, so my sample mean is 6.67, which is 5 + 7 + 8 = 20 divided by my sample size of three.

That gives me one sample mean, but one sample is not a sampling distribution. For a sampling distribution, I need to do this over and over again. So I take one sample, two samples, three samples, four samples, five samples, six samples, and so on. Here are some of my values: 5, 7, 8; 4, 5, 8; 4, 7, 8; 4, 8, 8; and for each of these, I calculate the sample mean. Now I've got many samples, and I've recorded the sample mean for each of them. That is a sampling distribution.

There are two things you may have noticed about this. The first is that I've sampled with replacement. In the UT example, I said it would be really unlikely to get the same five people twice. More than that, if you randomly selected five people, it's unlikely you'd get the same individual twice within one sample. But for sampling distributions, we tend to say that we sample with replacement. It's not something we need to worry too much about right now.
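The arithmetic for the population mean and the first sample mean can be checked with a short sketch. It uses `Fraction` so the means stay exact, and it also shows one way to draw a new sample of three with replacement: `random.Random.choices` picks independently on every draw, so the same roommate can appear more than once. The seed is arbitrary.

```python
import random
from fractions import Fraction

# Calls per week reported by the four roommates: the whole population.
population = [4, 7, 5, 8]

# Population mean (a parameter): (4 + 7 + 5 + 8) / 4 = 24 / 4
mu = Fraction(sum(population), len(population))

# The first sample from the example: three people who said 5, 7 and 8.
sample = [5, 7, 8]
x_bar = Fraction(sum(sample), len(sample))   # 20 / 3

print(mu)                      # 6
print(round(float(x_bar), 2))  # 6.67

# Draw a fresh sample of three WITH replacement: each draw is independent,
# so the same person can be picked more than once.
rng = random.Random(42)
new_sample = rng.choices(population, k=3)
print(new_sample, round(float(Fraction(sum(new_sample), 3)), 2))
```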
So, in this particular example, I sampled with replacement, which means you could pick the same individual twice, or even three times. You'll notice that the 64th sample has the same individual three times: I got 8, 8, 8, and the sample mean is 8. The second thing you may have noticed is that the sample numbers only go up to 64. I stopped at 64 because when you draw three values with replacement from four, and the order of the draws matters, there are only 4 × 4 × 4 = 64 possible samples. The dots just mean I didn't show all the rows between the 6th and the 64th sample. What I want you to realize is that I have collected 64 samples, so I have 64 sample means.

So what should I do with them? Well, I could plot them on a histogram. Here's my histogram of the 64 sample means, and I wonder what you think about it. For one thing, I think I've used a nice color for the bars. But you may also see that there's a shape in this data. If you allow me to draw a curve over it, you can see that it looks normal-ish. I'm saying "ish" because it's obviously not exactly normal, but it's getting there. So this is a histogram of all the possible sample means. Here is the one where we got 8, 8, 8, down here is the one where we got 4, 4, 4, and there are all the ones in between as well.
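The full set of 64 samples and the tally behind the histogram can be reproduced with a minimal sketch. `itertools.product(..., repeat=3)` lists every ordered triple of population values, which is exactly sampling three times with replacement; the `#` bars are a crude stand-in for the plotted histogram.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [4, 5, 7, 8]   # calls per week for the four roommates
n = 3                       # sample size

# Sampling WITH replacement, order mattering: every ordered triple of
# population values is a possible sample, giving 4 ** 3 = 64 samples.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Count how many of the 64 samples produce each sample mean, then print a
# text histogram with one '#' per sample.
counts = Counter(means)
for m in sorted(counts):
    print(f"{float(m):5.2f} | {'#' * counts[m]}")
```

Run as written, the counts from 4.00 up to 8.00 come out as 1, 3, 3, 4, 9, 9, 6, 9, 9, 4, 3, 3, 1: symmetric around 6 and bunched toward the middle, with a small dip exactly at 6, which is one reason to call the shape only normal-ish. `product` lists (4, 4, 4) first and (8, 8, 8) last, so its ordering of the other samples need not match the slide's.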
This is the sampling distribution. Formally, it's the sampling distribution of the sample mean, because the sample mean is the statistic we collected, for n = 3. Our samples could have been of size 2, in which case we would have had a different distribution. They could even have been of size 1: we could have been really lazy, asked just one person each time, and calculated the sample mean for n = 1. But we sampled three people, so technically this is the sampling distribution of the sample mean for n = 3.

That's probably enough for now on introducing sampling distributions. I hope you understand what one is: you collect many samples, calculate some statistic for each of them (in this case, the sample mean), plot those values as a histogram, and look at the shape of the distribution. That's what we call a sampling distribution, and we're going to extend this idea in future videos.
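To see that the sample size changes the sampling distribution, here is a sketch that builds the exact distribution of the sample mean for n = 1, 2 and 3 from the same four-person population. `sampling_distribution_of_mean` is a hypothetical helper name; it assumes sampling with replacement, as in the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [4, 5, 7, 8]   # calls per week for the four roommates

def sampling_distribution_of_mean(pop, n):
    """Exact sampling distribution of the sample mean for samples of size n
    drawn with replacement: maps each possible mean to how many of the
    len(pop) ** n ordered samples produce it."""
    return Counter(Fraction(sum(s), n) for s in product(pop, repeat=n))

for n in (1, 2, 3):
    dist = sampling_distribution_of_mean(population, n)
    total = sum(dist.values())
    print(f"n={n}: {total} samples, {len(dist)} distinct means")
```

This prints 4, 16 and 64 samples with 4, 9 and 13 distinct means respectively: the larger the sample, the more possible means there are.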