Coarse Graining I: Clustering Algorithms (Renorm 1.3)


  1. So, in the previous section,
    you had an example of coarse-graining
  2. from image compression.
  3. I showed you two different examples
  4. of what you might call
    real-space compression.
  5. These were just things that
    I made up,
  6. that have some parallel
    to what people have done
  7. when they come to coarse-grain
    a physical system.
  8. What I did in particular
    was take things
  9. that were nearby each other in space
    --- I took a group
  10. of objects, or a group of properties
    of the system
  11. that were nearby each other in space ---
    and I summarized them in some way.
  12. So in the majority-rule case, for example,
    all I did was, I took all the pixels
  13. in a certain grid square, and I said, OK,
    if the black pixels
  14. outnumber the white pixels,
    then the coarse-grained square will be black,
  15. and otherwise, it will be white.
  16. And what I've done there, of course,
    is taken a whole set of
  17. grid squares, each of which
    looks different,
  18. and mapped them all onto a single
    kind of square
  19. that only has one property, zero or one,
    whereas here, in the ten-by-ten case,
  20. there are in fact a hundred bits
    of information, you might say.
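
To make the majority rule concrete, here is a minimal sketch in Python, assuming the fine-grained image is a binary NumPy array (1 = black, 0 = white) and the block size divides the image evenly:

    import numpy as np

    def majority_coarse_grain(image, block=10):
        """Coarse-grain a binary image: each block-by-block grid square
        becomes a single pixel, set by majority vote (ties go to white)."""
        rows, cols = image.shape
        out = np.zeros((rows // block, cols // block), dtype=int)
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = image[i * block:(i + 1) * block,
                              j * block:(j + 1) * block]
                # black (1) wins only if black pixels outnumber white pixels
                out[i, j] = 1 if 2 * patch.sum() > patch.size else 0
        return out

    # toy example: a random 20 x 20 image coarse-grains down to 2 x 2
    rng = np.random.default_rng(0)
    fine = rng.integers(0, 2, size=(20, 20))
    print(majority_coarse_grain(fine, block=10))

Each ten-by-ten grid square of a hundred bits collapses to a single bit.
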
  21. So this is the general property
    of coarse-graining; this is
  22. a simple example of how we build
    a function, f, that maps from some space S
  23. to some space S-prime.
    S here is the set of properties
  24. that describe the fine-grained system ---
  25. for example, it could be the entire
    productivity list,
  26. the entire employment history
    of every single person in the economy ---
  27. and over here is a usually
    much smaller set of quantities
  28. that we're going to use to describe
    the coarse-grained system.
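
To write the earlier example in this notation: for the majority-rule image, the function is f : S → S-prime with S = {0, 1}^100, the hundred bits of a ten-by-ten grid square, and S-prime = {0, 1}, a single black-or-white bit.
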
  29. Coarse-graining is something we do
    all the time.
  30. Sometimes, we make it up ahead of time.
    So for example,
  31. when we came to do the coarse-graining
    of the image from Alice in Wonderland,
  32. we just took a couple different ways
    in which we thought
  33. that simplifying the image
    might work --- and of course,
  34. what it means for it to work
    is a problem that we often
  35. don't think about 'til
    right at the end, but here,
  36. we took the majority vote.
    We're saying, OK, look, you know,
  37. if this collection
    of pixels here is mostly black,
  38. let's just call it black.
    And the idea is that your eye
  39. might not really be
    sensitive to it. In the end,
  40. what we're doing when we choose
    that rule is probably having a theory
  41. about how the eye works,
    or perhaps about what we care about
  42. when we look at an image.
    Of course, you'd be crazy to
  43. do this on a bar code or QR code,
    all right,
  44. because in that case, the device
    that's scanning the image
  45. is much more sensitive
    than your eye is,
  46. and so if you took a photograph
    of a QR code,
  47. and then coarse-grained it
    in the way we did,
  48. the QR code would probably
    no longer work.
  49. In that case, the information that
    we'd be throwing out ---
  50. the distinctions between the different
    elements of this set here
  51. that the function maps all onto
    the same point here ---
  52. those distinctions would have been lost.
    So sometimes,
  53. we think about the problem we want
    to solve,
  54. and invent a coarse-graining algorithm.
    Other times ---
  55. and in fact this is becoming
    increasingly common ---
  56. other times, we don't know what
    the coarse-graining is when we begin,
  57. and in fact, we'll do something
    really simple, like use
  58. a clustering algorithm.
    And so a classic example of clustering
  59. might be k-means.
    So what does k-means do,
  60. what does a k-means algorithm do?
    If you haven't heard this term,
  61. don't worry; I'm going to tell you
    what it means and you can go
  62. look it up and do it
    to your heart's content.
  63. k-means takes a set of high-dimensional
    data, and because we can't afford
  64. a high-dimensional whiteboard,
    we only have two,
  65. so what I'm going to do here is,
    I'm going to put a lot of
  66. data points here, and maybe, for example,
    this axis here is height,
  67. and this axis here is weight, OK,
    and these points here refer
  68. to individuals of a species
    that you've captured in the wild,
  69. maybe these are birds, right.
    And so what k-means will do is,
  70. it will take that rich description,
    OK, where every individual is
  71. described by a real number, OK,
    on both the x and y axes here,
  72. each bird you've caught, let's say,
    has a certain height and a certain weight,
  73. and you go out and, you know,
    sort of laboriously measure each one ---
  74. maybe you're in some tropical country,
    like you're in Costa Rica or something,
  75. which seems to be where
    all the bird people go, OK,
  76. and then you say, well, I don't know,
    something's going on there,
  77. and you hand it to k-means, which says,
    oh, you know what, actually,
  78. there's two clusters here, right?
    There's this cluster, A,
  79. and this cluster, B. And so
    what k-means has done is
  80. mapped from a two-dimensional
    vector space, maybe not R^2
  81. because you can't have, like, you know,
    negative weights, so maybe it's just
  82. the positive side of R^2, right,
    it's mapped from that two-dimensional
  83. vector space into a binary set, A and B.
    So in general, when you call up
  84. a clustering algorithm,
    really what you're asking it to do
  85. is coarse-grain my data, OK,
    and then if you're a biologist,
  86. you might be like, you know,
    there's something funny, like
  87. all the A birds, they're all female,
    let's say, and all the B birds, OK,
  88. they're all male, OK,
    and now you feel really good
  89. because in fact what you've done is,
    you've discovered, using this rich
  90. high-dimensional data, a simple
  91. low-dimensional explanation
    for what's happening.
  92. Why do birds have different weights?
    Well, you know, the main thing is
  93. whether they're male or female.
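
As a minimal sketch of that workflow, using scikit-learn's KMeans on made-up numbers (the two groups below are synthetic stand-ins for the field data):

    import numpy as np
    from sklearn.cluster import KMeans

    # synthetic stand-in for the captured birds: columns are (height, weight)
    rng = np.random.default_rng(1)
    group_a = rng.normal(loc=[12.0, 30.0], scale=[0.8, 2.0], size=(20, 2))
    group_b = rng.normal(loc=[15.0, 45.0], scale=[0.8, 2.0], size=(20, 2))
    X = np.vstack([group_a, group_b])

    # k-means coarse-grains each two-dimensional point into one of k labels
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)           # hard assignment: cluster 0 ("A") or 1 ("B")
    print(kmeans.cluster_centers_)  # the two cluster centers in (height, weight)
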
  94. So, clustering is sort of an automated way
    to have the data suggest to you
  95. what a good coarse-graining might be.
    You may have noticed, of course, that
  96. in fact, k-means operates with a notion
    of closeness just as the
  97. Alice in Wonderland coarse-graining did.
    In both cases, we were saying that
  98. things that were near each other, OK,
    should probably be described by
  99. a single description, right?
    All these points here get
  100. the description A; all these points here
    get the description B.
  101. k-means is in fact an example of
    what we call a hard clustering, OK,
  102. and in this case here it says that
    each point is uniquely assigned
  103. to a single cluster, and this kind of
    hard clustering is the form of
  104. coarse-graining that's easiest to
    understand; we'll give an example,
  105. in particular, of how hard clustering plays
    a central role in information theory. OK.
  106. There are other clustering algorithms
    that give what are called soft clusterings,
  107. and the most closely related
    soft clustering algorithm to the
  108. k-means one is called the GMM,
    or the Gaussian mixture model.
  109. And in fact the GMM is very similar
    to k-means. What it does is, it takes
  110. high-dimensional data, a vector of
    high-dimensional data, or rather a set
  111. of vectors of high-dimensional data,
    right, each point referring to
  112. a particular observation, and what the GMM
    does is say, oh, you know, it looks like
  113. this could be described as, like,
    you're kind of drawing from two
  114. Gaussians, right, the Gaussians
    have a particular center, right,
  115. they have a particular variance
    and covariance matrix,
  116. so they look kind of like
    these little sausages that lie here,
  117. right, OK, and so what the GMM does is,
    it produces a Bayesian model
  118. of the way your data might have been
    created, OK, and then for each point here,
  119. it tells you the probability that that point
    was drawn either from this Gaussian ---
  120. we'll call that G_1 --- or this
    Gaussian --- call it G_2.
  121. So in fact, in this case, it's giving you
    not a unique assignment, but it's
  122. giving you a sort of probabilistic
    assignment. In particular, it's mapping
  123. this two-dimensional vector space ---
    and let's say it finds two clusters ---
  124. it's mapping that two-dimensional
    vector space
  125. into the interval [0, 1], which you can
    think of here as the probability
  126. that the point was drawn from cluster A.
    So this point here, right, close to the center
  127. of the first cluster, that would have a
    very high probability of being a member
  128. of G_1 and a much lower probability of
    being a member of G_2.
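
Sketching that with scikit-learn's GaussianMixture, on the same synthetic height-and-weight array X from the k-means sketch above:

    from sklearn.mixture import GaussianMixture

    # fit a two-component Gaussian mixture to the same data X as before
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

    # soft clustering: a probability for each component (G_1 and G_2),
    # rather than a single hard label per point
    probs = gmm.predict_proba(X)
    print(probs[:5])         # each row sums to 1; points near a center are ~[1, 0]
    print(gmm.means_)        # the fitted Gaussian centers
    print(gmm.covariances_)  # the fitted covariance matrices (the "sausage" shapes)
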
  129. So these are subtly different concepts
    here --- there's the hard clustering case
  130. when it takes each point and says, OK,
    here is the coarse-grained category
  131. you belong to. Just as in the case of the
    Alice in Wonderland algorithm, what
  132. we did was, we took that little
    grid square, that ten-by-ten grid,
  133. and we said, OK, if you're majority zero,
    you belong to
  134. the category zero; if you're majority one,
    you belong to the category one.
  135. The soft clustering is a little weaker,
    right; it sort of avoids
  136. making a decision. You'll encounter,
    probably in your own research,
  137. different kinds of clustering algorithms,
    different kinds of clustering choices.
  138. People tend to prefer hard clustering,
    because you don't have to think as hard
  139. when you work with it.
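
Continuing the sketch above, a soft clustering can always be collapsed into a hard one, though not the other way around:

    import numpy as np

    # take the most probable Gaussian for each point; this throws away
    # the graded membership information the GMM provided
    hard_labels = np.argmax(probs, axis=1)
    print(hard_labels[:5])
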