
Title:
Overplotting and Domain Knowledge - Data Analysis with R

Description:

In the last exercise, we used alpha and jitter to reduce

overplotting, but it turns out that there's more that we can

do. Let's hear from Moira about how she used her domain

knowledge and a transformation to make an adjustment to her scatter plot.

>> The next thing that I did was

to take, again, the perceived audience size and their

actual audience size, but this time I transformed

the axes. So this time, it's as a percentage

of their friend count. Some people in this study had

50 friends, some had 100, some had 2,000, and so,

it actually makes more sense to think about your audience

size as a percentage of the possible audience. All of

the people in the study had shared their post with

friends-only privacy, so you'd expect that the audience would be

bounded by their friend count. So, what we found when

we plotted it this way was that all of the points

are below this line of perfect accuracy, this diagonal line, really

well below. And one other thing I should note about this

plot, we actually ran two different surveys. We ran one survey

where we asked people, about a single post, how many people

do you think saw your post? But we also asked

a different set of people, in general, how many people do

you think see the content that you share on Facebook? So

that's what this plot is showing. This is the in general

question, and their guesses are a little bit

higher. But still, people typically think that maybe

10% of their friends see their content when in

reality it's more like 40% or 50%, even 60%

of their friends will see their content in

a given month. So what this plot is

showing is the percentage of friends who actually saw

their content in the last month. Again, they're underestimating.
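
The transformation described in this clip can be sketched in R roughly as follows. This is only an illustration, assuming ggplot2 is installed. The data frame `audience`, its column names, and its values are made up here, since the study's data is not part of this transcript. Putting perceived audience on the y-axis is also an assumption, chosen so that underestimates fall below the diagonal line of perfect accuracy.

```r
library(ggplot2)

# Hypothetical toy data: NOT the study's data, just values in a similar spirit.
audience <- data.frame(
  friend_count       = c(50, 100, 2000, 400, 250),
  perceived_audience = c(5, 12, 150, 40, 30),
  actual_audience    = c(24, 55, 900, 210, 140)
)

# Rescale both audience sizes to a percentage of the possible audience
# (the friend count), so users with 50 and 2,000 friends are comparable.
audience$perceived_pct <- 100 * audience$perceived_audience / audience$friend_count
audience$actual_pct    <- 100 * audience$actual_audience / audience$friend_count

# Scatter plot with a dashed diagonal marking perfect accuracy (y = x).
# Points below the line are people who underestimate their audience.
p <- ggplot(audience, aes(x = actual_pct, y = perceived_pct)) +
  geom_point(alpha = 1/2) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  coord_cartesian(xlim = c(0, 100), ylim = c(0, 100)) +
  labs(x = "Actual audience (% of friends)",
       y = "Perceived audience (% of friends)")
print(p)
```

Because posts were shared with friends only, both percentages stay between 0 and 100, which is why fixing both axes to that range is reasonable in this sketch.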