

So, in the previous section,
you had an example of coarse-graining

from image compression.

I showed you two different examples

of what you might call
real-space compression.

These were just things that
I made up,

that have some parallel
to what people have done

when they come to coarse-grain
a physical system.

What I did in particular
was take things

that were nearby each other in space:
I took a group

of objects, or a group of properties
of the system,

that were nearby each other in space,
and I summarized them in some way.

So in the majority-rule case, for example,
all I did was, I took all the pixels

in a certain grid square, and I said, OK,
if the black pixels

outnumber the white pixels,
then the coarse-grained pixel will be black,

and otherwise, it will be white.

And what I've done there, of course,
is take a whole set of grid squares,

each of which looks different,

and map them all onto a single
kind of square

that has only one property, zero or one,
whereas here, in the ten-by-ten case,

there are in fact a hundred bits
of information, you might say.
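As a concrete sketch of that rule (my own illustration, not code from the lecture, assuming 1 stands for a black pixel), here is a minimal majority-rule coarse-graining of a binary image in Python:

    import numpy as np

    def majority_rule(image, block=10):
        """Coarse-grain a binary (0/1) image by majority vote in each block."""
        h, w = image.shape
        # Group the pixels into block-by-block squares.
        squares = image.reshape(h // block, block, w // block, block)
        # Each 10x10 square holds 100 bits; the majority vote keeps one.
        counts = squares.sum(axis=(1, 3))
        return (counts > (block * block) // 2).astype(int)

    # Example: a 100x100 random binary image becomes a 10x10 one.
    rng = np.random.default_rng(0)
    fine = rng.integers(0, 2, size=(100, 100))
    coarse = majority_rule(fine)
    print(fine.size, "->", coarse.size)  # 10000 -> 100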

So this is the general property
of coarse-graining; this is

a simple example of how we build
a function, f, that maps from some space S

to some space S'.
S here is the set of properties

that describe the fine-grained system,

for example, it could be the entire
productivity list,

the entire employment history
of every single person in the economy,

and S' is a much smaller,
usually much smaller, set of quantities

that we're going to use to describe
the coarse-grained system.
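To put that on one line (my notation, not the lecturer's), the majority-rule example is an instance of

    f : S \to S', \qquad
    S = \{0,1\}^{100}, \quad S' = \{0,1\}, \qquad
    f(x) = \begin{cases}
        1 & \text{if } \sum_{i=1}^{100} x_i > 50,\\
        0 & \text{otherwise,}
    \end{cases}

again taking 1 to stand for a black pixel.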

Coarse-graining is something we do
all the time.

Sometimes, we make it up ahead of time.
So for example,

when we came to do the coarse-graining
of the image from Alice in Wonderland,

we just took a couple of different ways
in which we thought

that simplifying the image
might work, and of course,

what it means for it to work
is a problem that we often

don't think about until
right at the end, but here,

we took the majority vote.
We're saying, OK, look, you know,
if this collection

of pixels here is mostly black,
let's just call it black.

And the idea is that your eye
might not really be

sensitive to the difference. In the end,

what we're doing when we choose
that rule is probably having a theory

about how the eye works,
or perhaps about what we care about

when we look at an image.
Of course, you'd be crazy to

do this on a bar code or QR code,
all right,

because in that case, the device
that's scanning the image

is much more sensitive
than your eye is,

and so if you took a photograph
of a QR code,

and then coarse-grained it
in the way we did,

the QR code would probably
no longer work.

In that case, the information that
we'd be throwing out,

the distinctions between the different
elements of this set here

that the function maps all
to the same point here,

those distinctions would have been lost.
So sometimes,
we think about the problem we want

to solve,
and invent a coarse-graining algorithm.

Other times,
and in fact this is becoming

increasingly common,
we don't know what

the coarse-graining is when we begin,
and in fact, we'll do something

really simple, like use
a clustering algorithm.
And so a classic example of clustering
might be k-means.

So what does the k-means algorithm do?
If you haven't heard this term,

don't worry; I'm going to tell you
what it means and you can go

look it up and try it
to your heart's content.

k-means takes a set of high-dimensional
data, and because we can't afford

a high-dimensional whiteboard,
we only have two dimensions,

so what I'm going to do here is,
I'm going to put a lot of

data points here, and maybe, for example,
this axis here is height,

and this axis here is weight, OK,
and these points here refer

to individuals of a species
that you've captured in the wild,

maybe these are birds, right.
And so what k-means will do is,
it will take that rich description,

OK, where every individual is
described by a real number, OK,

on both the x and y axes here,
each bird you've caught, let's say,

has a certain height and a certain weight,

and you go out, and you sort of
laboriously measure them all, you know,

maybe you're in some tropical country,
like Costa Rica or something,

which seems to be where
all the bird people go, OK,

and then you say, well, I don't know,
something's going on there,

and you hand the data to k-means,
which says, oh, you know what, actually,
there are two clusters here, right?
There's this cluster, A,

and this cluster, B. And so
what k-means has done is

mapped from a two-dimensional
vector space, maybe not R^2,

because you can't have, like, you know,
negative weights, so maybe it's just

the positive quadrant of R^2, right;
it's mapped from that two-dimensional

vector space into a binary set, {A, B}.
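Here is a minimal sketch of that pipeline in Python (my own example; the bird heights and weights are made-up numbers), using scikit-learn's KMeans:

    import numpy as np
    from sklearn.cluster import KMeans

    # Each row is one captured bird: (height, weight), both positive reals.
    rng = np.random.default_rng(0)
    birds = np.vstack([
        rng.normal(loc=[10.0, 30.0], scale=1.0, size=(50, 2)),
        rng.normal(loc=[14.0, 45.0], scale=1.0, size=(50, 2)),
    ])

    # k-means coarse-grains the two-dimensional data into a binary set
    # of labels {0, 1}, playing the role of clusters A and B.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(birds)
    print(labels[:3], labels[-3:])  # e.g. [0 0 0] [1 1 1]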
So in general, when you call up
a clustering algorithm,

really what you're asking it to do
is "coarse-grain my data," OK,

and then if you're a biologist,
you might be like, you know,

there's something funny, like
all the A birds, they're all female,

let's say, and all the B birds, OK,
they're all male, OK,
and now you feel really good

because in fact what you've done is,
you've discovered, using this rich

high-dimensional data,
a simple low-dimensional explanation

for what's happening.

Why do birds have different weights?
Well, you know, the main thing is

whether they're male or female.

So, clustering is sort of an automated way
to have the data suggest to you

what a good coarse-graining might be.
You may have noticed, of course, that

in fact, k-means operates with a notion
of closeness just as the

Alice in Wonderland coarse-graining did.
In both cases, we were saying that

things that were near each other, OK,
should probably get

a single description, right?
All these points here get

the description A; all these points here
get the description B.

k-means is in fact an example of
what we call a hard clustering, OK,

meaning that
each point is uniquely assigned

to a single cluster. This kind of
hard clustering is the form of

coarse-graining that is easiest to
understand, and we'll give an example,

in particular, of how hard clustering plays
a central role in information theory. OK.

There are other clustering algorithms
that give what are called soft clusterings,

and the soft clustering algorithm
most closely related to

k-means is called the GMM,
or the Gaussian mixture model.

And in fact the GMM is very similar
to k-means. What it does is, it takes

high-dimensional data, a set of vectors
of high-dimensional data,

right, each point referring to
a particular observation, and the GMM

says, oh, you know, it looks like
this could be described as, like,

you're kind of drawing from two
Gaussians, right; the Gaussians

have a particular center, right,
they have a particular covariance matrix,

so they look kind of like
little sausages that lie here,

right, OK, and so what the GMM does is,
it produces a Bayesian model

of the way your data might have been
created, OK, and then for each point here,

it tells you the probability that that point
was drawn either from this Gaussian,

we'll call that G_1, or this
Gaussian, call it G_2.

So in fact, in this case, it's giving you
not a unique assignment, but

a sort of probabilistic
assignment. In particular, it's mapping

this two-dimensional vector space,
and let's say it finds two clusters,

into the interval [0, 1], which you can
think of here as the probability

that the point was drawn from cluster A.
So this point here, right at the center

of the first cluster, would have a
very high probability of being a member

of G_1 and a much lower probability of
being a member of G_2.
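A minimal sketch of that soft assignment (again my own example, reusing the same made-up bird data) with scikit-learn's GaussianMixture:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # The same made-up bird data as in the k-means sketch.
    rng = np.random.default_rng(0)
    birds = np.vstack([
        rng.normal(loc=[10.0, 30.0], scale=1.0, size=(50, 2)),
        rng.normal(loc=[14.0, 45.0], scale=1.0, size=(50, 2)),
    ])

    # Fit a two-component Gaussian mixture: each component G_1, G_2
    # gets its own center and covariance matrix.
    gmm = GaussianMixture(n_components=2, random_state=0).fit(birds)

    # Soft clustering: instead of a hard label, each point gets the
    # probability that it was drawn from each Gaussian; rows sum to 1.
    probs = gmm.predict_proba(birds)
    print(probs[0].round(3))  # e.g. [1. 0.] for a point deep inside G_1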

So these are subtly different concepts
here: there's the hard clustering case,

where the algorithm takes each point and says,
OK, here is the coarse-grained category

you belong to. Just as in the case of the
Alice in Wonderland algorithm, what

we did was, we took that little
grid square, that ten-by-ten grid,

and we said, OK, you're majority zero,
therefore you belong to

category zero; you're majority one,
you belong to category one.

The soft clustering is a little weaker,
right; it sort of avoids

making a decision. You'll encounter,
probably in your own research,

different kinds of clustering algorithms,
different kinds of clustering choices.

People tend to prefer hard clustering,
because you don't have to think as hard

when you work with it.