-
- [Instructor] Let's
say that we run a school
-
and in that school there is a population
-
of students right over here.
-
And that is our population.
-
And we want to get a sense of how
-
these students feel about the
quality of math instruction
-
at the school, so we construct a survey,
-
and we just need to decide
who are we going to get
-
to actually answer this survey.
-
One option is to just go to
every member of the population,
-
but let's just say it's
a really large school.
-
Let's say we're a college
-
and there are 10,000 people in the college.
-
We say, well, we can't
just talk to everyone.
-
So instead, we say, let's
sample this population
-
to get an indication of how
the entire school feels.
-
So we are going to sample it.
-
We are going to sample that population.
-
Now in order to avoid having bias
-
in our response, in order for it to
-
have the best chance of
being indicative of the
-
entire population, we want
our sample to be random.
-
So our sample could either be random
-
or not random.
-
And it might seem, at first,
pretty straightforward
-
to do a random sample, but when
you actually get down to it,
-
it's not always as straightforward
as you would think.
-
So one type of random sample
is just a simple random sample.
-
So, simple random sample,
-
and this is saying, alright, let me
-
maybe assign a number to
every person in the school,
-
maybe they already have
a student ID number,
-
and I'm just going to get a computer,
-
a random number generator, to pick the
-
100 people, the 100 students,
-
so let's say it's a
sample of 100 students,
-
that I'm going to give the survey to,
-
so that would be a simple random sample.
-
We are just going into this
whole population and randomly,
-
let me just draw this.
-
So this is the population,
we are just randomly
-
picking people out, and we
know it's random because
-
we have a random number generator, or
a string of random numbers
-
or something like that,
that is allowing us
-
to pick the students.
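Here's a minimal sketch of that idea in Python. It assumes the students are numbered 1 through 10,000; the function name `simple_random_sample` and the `seed` parameter are made up for illustration, and the seed is only there so the draw can be repeated.

```python
import random

def simple_random_sample(student_ids, sample_size, seed=None):
    """Pick sample_size distinct students; every subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(student_ids, sample_size)

# Hypothetical ID numbers standing in for the college's 10,000 students.
student_ids = list(range(1, 10_001))

sample = simple_random_sample(student_ids, 100, seed=42)
print(len(sample), len(set(sample)))  # sample size and number of distinct IDs
```

Because `random.sample` draws without replacement, no student can be picked twice.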
-
Now that's pretty good, it's
unlikely that you're going
-
to have bias from this
sample, but there is some
-
probability that, just by chance,
-
your random number generator
just happened to select
-
maybe a disproportionate
number of boys over girls,
-
or a disproportionate number of freshmen,
-
or a disproportionate
number of engineering majors
-
versus English majors,
and that's a possibility.
-
So even though you are
taking a simple random sample
-
that is truly random, once
again, there's some probability
-
that it's not indicative
of the entire population.
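To get a feel for how big that probability is, here's a rough sketch that computes it exactly for one made-up case. It assumes a school of 10,000 that is exactly half girls, and it calls a sample "disproportionate" if one group makes up 60 or more of the 100 students picked; both of those choices are assumptions for illustration, not numbers from the survey.

```python
from math import comb

def hypergeom_pmf(k, pop_size, group_size, sample_size):
    """Probability that a sample drawn without replacement has exactly k group members."""
    return (comb(group_size, k) * comb(pop_size - group_size, sample_size - k)
            / comb(pop_size, sample_size))

# Assumed for illustration: 10,000 students, exactly half of them girls.
POP_SIZE, GIRLS, SAMPLE_SIZE = 10_000, 5_000, 100

# Chance that either group makes up 60 or more of the 100 sampled students.
p_lopsided = sum(
    hypergeom_pmf(k, POP_SIZE, GIRLS, SAMPLE_SIZE)
    for k in range(SAMPLE_SIZE + 1)
    if k >= 60 or k <= 40
)
print(f"P(60/40 split or worse) = {p_lopsided:.3f}")
```

Changing the 60 cutoff or the sample size shows how quickly that chance grows or shrinks.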
-
And so to mitigate that,
there are other techniques
-
at our disposal.
-
One technique is a stratified sample.
-
Stratified.
-
And so this is the idea of
taking our entire population
-
and essentially stratifying it.
-
So let's say we
take that same population,
-
I'll draw it as a
-
square here just for convenience,
-
and we're gonna stratify it.
-
Let's say we're concerned that
we get an appropriate sample
-
of freshmen, sophomores,
juniors, and seniors.
-
So we'll stratify it by
freshmen, sophomores,
-
juniors, and seniors, and then we sample
-
25 from each of these groups.
-
So these are the stratifications.
-
These are freshmen, sophomores,
juniors, and seniors,
-
and instead of just sampling
100 out of the entire pool,
-
we sample 25 from each of these.
-
So just like that.
-
And so that makes sure that you are
-
getting indicative responses from
-
at least all of the different class years
-
or levels within your university.
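A stratified sample like this could be sketched as follows. The roster is hypothetical: it assumes 2,500 students in each class year, and the names `stratified_sample`, `strata`, and `per_stratum` are invented for this example.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of per_stratum members from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical roster: 2,500 students in each class year.
years = ["freshman", "sophomore", "junior", "senior"]
strata = {year: [f"{year}-{i}" for i in range(2_500)] for year in years}

sample = stratified_sample(strata, per_stratum=25, seed=0)
for year, picked in sample.items():
    print(year, len(picked))
```

Taking 25 from every year matches the example here. If the class years were very different sizes, you might instead choose to sample each year in proportion to its size.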
-
Now there might be another
issue where you say,
-
well, I'm actually more
concerned that we have
-
accurate representation of
males and females in the school,
-
and there is some probability,
-
you know, if I do 100
random people, it's very
-
likely that it's close to
50/50, but there's some chance,
-
just due to randomness,
that it's disproportionately male
-
or disproportionately female.
-
And that's even possible
in the stratified case.
-
And so what you might say is,
-
well, you know what I'm gonna do?
-
There's a technique
called a clustered sample.
-
Let me write this right
over here, clustered,
-
a clustered sample, and what
we do is we sample groups.
-
Each of those groups we feel confident has
-
a good balance of males and females.
-
So, for example, we might,
-
instead of sampling individuals
from the entire population,
-
we might say, look,
-
and as you can tell this is not a
-
trivial thing to do, let's
just say that we can split
-
our population
-
into groups, maybe these are classrooms,
-
and each of these classrooms
has an even distribution
-
of males and females, or pretty
close to an even distribution.
-
And so what we do is we
sample the actual classrooms,
-
so that's why it's called
cluster, or cluster technique,
-
or clustered random
sample, because we're going
-
to randomly sample our
classrooms, each of which has a
-
close or maybe an exact
balance of males and females
-
so we know that we're gonna
get good representation,
-
but we are still sampling;
we are sampling
-
the clusters, and then we're gonna survey
-
every single person in
each of the chosen clusters,
-
every single person in
each of those classrooms.
-
So, once again, these are
all forms of random surveys,
-
or random samples, you have
the simple random sample,
-
you can stratify, or
you can cluster and then
-
randomly pick the clusters and then survey
-
everyone in that cluster.
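The cluster approach, where you randomly pick the groups and then survey everyone inside them, could look something like this sketch. It assumes a made-up school of 400 classrooms with 25 students each, and `cluster_sample` is a hypothetical helper name.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical setup: 400 classrooms of 25 students each.
classrooms = {
    f"room-{r}": [f"room-{r}-student-{s}" for s in range(25)]
    for r in range(400)
}

surveyed = cluster_sample(classrooms, n_clusters=4, seed=1)
total = sum(len(members) for members in surveyed.values())
print(len(surveyed), total)  # number of classrooms chosen, students surveyed
```

Notice that the randomness is only in which classrooms get chosen; nobody inside a chosen classroom is left out.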
-
Now if these are all random samples,
-
what are the non-random things like?
-
Well, one case of
non-random, you could have a
-
voluntary survey,
-
or voluntary sample,
and this might just be
-
you tell every student at the school,
-
"Hey, here's a web address.
-
"If you're interested, come
and fill out this survey."
-
And that's likely to
introduce bias because
-
you might have maybe the
students who really like
-
the math instruction at their school
-
more likely to fill it out,
maybe the students who really
-
don't like it are more
likely to fill it out,
-
maybe it's just the
kids who have more time
-
who are more likely to fill it out.
-
So this has a good chance
of introducing bias.
-
The students who fill out the survey
-
might be just more skewed
one way or the other because,
-
you know, they volunteered for it.
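Here's a small simulation of how that skew can show up. Everything in it is assumed for illustration: opinions are on a 1 to 5 scale and equally common, and students with a strong opinion (1 or 5) volunteer half the time while everyone else volunteers 5 percent of the time.

```python
import random

rng = random.Random(7)

# Assumed population: 10,000 opinions on a 1-5 scale, all equally common.
population = [rng.randint(1, 5) for _ in range(10_000)]

def responds(opinion):
    """Assumed behavior: strong opinions (1 or 5) volunteer far more often."""
    chance = 0.5 if opinion in (1, 5) else 0.05
    return rng.random() < chance

volunteers = [op for op in population if responds(op)]

def strong_share(opinions):
    """Fraction of the given opinions that are a 1 or a 5."""
    return sum(op in (1, 5) for op in opinions) / len(opinions)

print(f"strong opinions in population: {strong_share(population):.2f}")
print(f"strong opinions among volunteers: {strong_share(volunteers):.2f}")
```

Under these assumptions the volunteers overrepresent the strong opinions, even though nothing about the population itself changed.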
-
Another non-random sample is what's
-
often called a convenience sample,
-
where you introduce bias because of convenience,
-
and this might say, well,
let's just sample the first
-
100 students who show up at school.
-
And that's just convenient for me because
-
I didn't have to use random numbers,
-
or do the stratification, or
do any of this clustering,
-
but you can understand how
this also would introduce bias,
-
because the first 100 students
who show up at school,
-
maybe those are the
most diligent students,
-
maybe they all take an
early math class that has
-
a very good instructor where
they're all happy about it.
-
Or it might go the other way,
-
the instructor there
isn't the best one, and so
-
it might introduce bias the other way.
-
So if you let people
volunteer or you just say,
-
"Oh, let me do the first N students."
-
Or you say, "Hey, let me just
talk to all of the students
-
"who happen to be in
front of me right now."
-
They might be in front of
you out of convenience,
-
but they might not be
a true random sample.
-
Now there are other reasons
why you might introduce bias,
-
and it might not be
because of the sampling.
-
You might introduce bias because of the
-
wording of your survey.
-
You could imagine a survey that says,
-
do you consider yourself
lucky to get a math education
-
that very few other people
in the world have access to?
-
Well, that might bias you to say,
-
"Well, yeah, I guess I feel lucky."
-
Well, if the wording was,
-
do you like the fact
that disproportionately
-
more students at your school tend to fail
-
algebra than at surrounding schools?
-
Well, that might bias you negatively.
-
So the wording really, really,
really matters in surveys,
-
and there is a lot that
would go into this.
-
And the other one
-
is called response bias.
-
And, once again, this isn't about the sampling.
-
And this is just people not
wanting to tell the truth
-
or maybe not wanting to respond at all,
which is often called nonresponse bias.
-
Maybe they're afraid that somehow
-
their response is gonna show up
-
in front of their math
teacher or the administrators,
-
or if they're too negative,
-
it might be taken out on them in some way.
-
And because of that, they
might not be truthful,
-
and so they might be overly positive
-
or not fill it out at all.
-
So anyway, this is a
very high level overview
-
of how you could think about sampling.
-
You want to go random
because it lowers the
-
probability of
introducing some bias into it.
-
And then these are some techniques.
-
And also think about
whether you're falling
-
into some of these pitfalls
that have a good chance
-
of introducing bias.