Heat Maps - Data Analysis with R

  • 0:00 - 0:03
    The last plot that we'll make for this course is called
  • 0:03 - 0:05
    a heat map. For our data set, we want to display, for
  • 0:05 - 0:09
    each combination of gene and sample case, the difference in gene
  • 0:09 - 0:13
    expression in the sample from the baseline. We want to display combinations
  • 0:13 - 0:17
    where a gene is overexpressed in red, and combinations where
  • 0:17 - 0:20
    it is underexpressed in blue. Here's the code to make that
  • 0:20 - 0:23
    heat map. First, we'll run all of this in order to
  • 0:23 - 0:26
    melt our data to a long format. And then we just run
  • 0:26 - 0:29
    our ggplot code using the geom geom_tile. Now,
  • 0:29 - 0:32
    this last line is going to give us a scale gradient.
  • 0:32 - 0:34
    And we're going to use the colors from blue to red.
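A minimal sketch of the melt-then-tile steps described above. The matrix `expr` is synthetic stand-in data (20 genes by 8 sample cases), not the course data set, and the names `expr`, `expr_long`, and `heat` are assumptions for illustration. It assumes the reshape2 and ggplot2 packages are installed.

```r
library(reshape2)  # provides melt()
library(ggplot2)   # provides ggplot(), geom_tile(), scale_fill_gradientn()

set.seed(1)

# Hypothetical stand-in for the expression data: 20 genes x 8 sample cases,
# each value playing the role of a difference from the baseline.
expr <- matrix(
  rnorm(20 * 8), nrow = 20, ncol = 8,
  dimnames = list(gene = as.character(1:20), case = as.character(1:8))
)

# Melt the wide matrix to long format: one row per (gene, case) combination.
expr_long <- melt(expr, varnames = c("gene", "case"), value.name = "value")

# One tile per combination, filled along a gradient from blue to red.
heat <- ggplot(expr_long, aes(x = case, y = gene, fill = value)) +
  geom_tile() +
  scale_fill_gradientn(colours = colorRampPalette(c("blue", "red"))(100))

print(heat)
```

With real data, subsetting rows before melting (for example `expr[1:200, ]`) would be one way to limit the plot to the first 200 genes.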
  • 0:34 - 0:36
    So, let's see what the output looks like. And there's
  • 0:36 - 0:40
    our heat map. Even with such a dense display, we
  • 0:40 - 0:43
    aren't looking at all the data. In particular, we're just
  • 0:43 - 0:46
    showing the first 200 genes. That's 200 genes of over
  • 0:46 - 0:50
    6,000 of them. And since this data set was produced,
  • 0:50 - 0:53
    genomic data sets of this kind, sometimes called
  • 0:53 - 0:57
    microarray data, have only gotten larger and more complex.
  • 0:57 - 1:00
    What's most interesting is that other data sets also
  • 1:00 - 1:03
    look like this. For example, internet companies run lots
  • 1:03 - 1:07
    of randomized experiments, where in the simplest versions, users
  • 1:07 - 1:10
    are randomly assigned to a treatment, like a new
  • 1:10 - 1:12
    version of a website or some sort of new
  • 1:12 - 1:15
    feature or product, or to a control condition. Then the
  • 1:15 - 1:18
    difference in outcome between the treatment and control can
  • 1:18 - 1:21
    be computed for a number of metrics of interest.
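As a rough illustration of that computation, here is a sketch in base R for a single experiment. The `users` data frame, the `arm` column, and the metric names `clicks` and `minutes` are all hypothetical and simulated, not taken from any real experiment.

```r
set.seed(42)
n_users <- 1000

# Hypothetical experiment log: each user is randomly assigned to one arm,
# and two made-up outcome metrics are recorded per user.
users <- data.frame(
  arm     = sample(c("control", "treatment"), n_users, replace = TRUE),
  clicks  = rpois(n_users, lambda = 3),
  minutes = rexp(n_users, rate = 1 / 10)
)

metrics <- c("clicks", "minutes")
treated <- users$arm == "treatment"

# Difference in mean outcome (treatment minus control) for each metric.
diffs <- colMeans(users[treated, metrics]) - colMeans(users[!treated, metrics])
print(diffs)
```

Repeating this for many experiments would give one row per experiment and one column per metric, which is the same shape as a gene-by-sample matrix.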
  • 1:21 - 1:23
    In many situations, there might have been hundreds or
  • 1:23 - 1:26
    thousands of experiments and hundreds of metrics. This data
  • 1:26 - 1:28
    looks very similar to the genomic data in some
  • 1:28 - 1:31
    ways. And this is why the useful maxim "plot
  • 1:31 - 1:34
    all the data" might not always apply to a
  • 1:34 - 1:36
    data set as it did for most of this course.
Title:
Heat Maps - Data Analysis with R
Video Language:
English
Team:
Udacity
Project:
UD651: Exploratory Data Analysis
Duration:
01:37