So let's do it. Let's calculate mean
and standard deviation. And to do that, let's
think back to our example with stores and
sales. And let's say the question we want to
answer is, is there any correlation between the
day of the week and how much money
people spend on various items? And what's interesting
about this design pattern is that all the
mapper has to do is output the day of the week as a key, so maybe Monday, and
the value of a sale, maybe $5.20, as a value.
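
To make that concrete, here is a minimal sketch of such a mapper in Python. The record layout (date, store, item, cost, separated by tabs), the date format, and the function name `map_sale` are all assumptions made for illustration; the real input may look different.

```python
from datetime import datetime

# Index matches datetime.weekday(), where Monday is 0.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def map_sale(line):
    """Turn one input record into a (weekday, sale amount) pair.

    Assumed (hypothetical) record layout: date<TAB>store<TAB>item<TAB>cost
    with the date written as YYYY-MM-DD.
    """
    date_str, _store, _item, cost = line.rstrip("\n").split("\t")
    weekday = WEEKDAYS[datetime.strptime(date_str, "%Y-%m-%d").weekday()]
    # No math here: the mapper only emits the key and the raw value.
    return weekday, float(cost)


# Made-up sample record, for illustration only.
sample = "2012-01-02\tSan Jose\tBooks\t5.20"
print(map_sale(sample))
```

Notice that the mapper does no arithmetic at all; it only picks out the key and the value.
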
That's it. What does that leave for the reducer? Well,
it leaves all the math for the reducer. And the
general reason for this rule of thumb, for what the
mapper and reducer are doing, comes from the fact that
oftentimes with these summary statistics, you sort of need
to know all of
the relevant data before you can make any calculations. So
we don't want to jump the gun and have the mapper
do calculations before it's ready. So why don't you go
ahead and calculate the mean and standard deviation for
sales for each day of the week, to help us
try to answer this question: is there any correlation between
the day of the week and how much people spend?
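
One possible shape for the reducer side, sketched in Python under a few assumptions: the (day, amount) pairs are collected in memory, the name `reduce_sales` is made up for this example, and the standard deviation is the population version (`statistics.pstdev`), since the lecture does not say which variant to use. The sample pairs are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def reduce_sales(pairs):
    """Group sale amounts by weekday, then compute (mean, std dev) per day.

    All values for a key are gathered first, because both statistics
    need the full set of values before they can be calculated.
    """
    by_day = defaultdict(list)
    for day, amount in pairs:
        by_day[day].append(amount)
    # pstdev is the population standard deviation (an assumption here).
    return {day: (mean(amounts), pstdev(amounts))
            for day, amounts in by_day.items()}


# Made-up mapper output, for illustration only.
pairs = [("Monday", 2.0), ("Monday", 4.0), ("Tuesday", 10.0), ("Monday", 6.0)]
stats = reduce_sales(pairs)
for day, (avg, std) in stats.items():
    print(day, avg, std)
```

If the sample standard deviation is what you want instead, `statistics.stdev` is the drop-in alternative, though it needs at least two values per day.
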