-
Not Synced
1
00:00:04,470 --> 00:00:10,000
Hello, in this video, I'm going to introduce our week three module on presenting data.
-
Not Synced
2
00:00:10,000 --> 00:00:14,110
So our learning outcomes for this week are for you to be able to create plots from data,
-
Not Synced
3
00:00:14,110 --> 00:00:18,910
identify the appropriate type of plot for the data in question.
-
Not Synced
4
00:00:18,910 --> 00:00:24,130
I want you to be able to read and interpret a plot, refine a plot to more clearly show its data,
-
Not Synced
5
00:00:24,130 --> 00:00:34,030
but then also put these plots and our other presentations and discussions of data into a well-organized notebook to present your data analysis.
-
Not Synced
6
00:00:34,030 --> 00:00:41,260
Before we dive into how to actually present data, I want to start with talking about some of the purposes of data presentation,
-
Not Synced
7
00:00:41,260 --> 00:00:45,190
because these purposes should guide your presentation design decisions.
-
Not Synced
8
00:00:45,190 --> 00:00:49,750
They should guide your evaluation of your own presentations and those of others.
-
Not Synced
9
00:00:49,750 --> 00:00:56,050
They'll also guide my evaluation of your presentation when you are submitting assignments.
-
Not Synced
10
00:00:56,050 --> 00:01:02,320
And one of the first things we need to do is guide the reader's attention to important results.
-
Not Synced
11
00:01:02,320 --> 00:01:07,900
An effective presentation is not going to have every single thing in it.
-
Not Synced
12
00:01:07,900 --> 00:01:12,820
It's going to draw focus and attention to the important results and make it easy for
-
Not Synced
13
00:01:12,820 --> 00:01:17,710
your reader to ask the key questions around the data analysis you're presenting.
-
Not Synced
14
00:01:17,710 --> 00:01:22,420
But then it also needs to substantiate the results and conclusions.
-
Not Synced
15
00:01:22,420 --> 00:01:30,010
So when we're presenting the data, we want to guide the reader to focus on what it is that we want them to learn,
-
Not Synced
16
00:01:30,010 --> 00:01:39,430
but also in a context that gives them the information needed to assess the validity of the conclusions that we're presenting.
-
Not Synced
17
00:01:39,430 --> 00:01:44,710
And we want to do so with integrity. It's easy to make charts that highlight the result
-
Not Synced
18
00:01:44,710 --> 00:01:50,440
you want the user to see, whether or not that result is rigorously defensible from the data.
-
Not Synced
19
00:01:50,440 --> 00:01:57,730
And we want to avoid making those kinds of misleading data visualizations and presentations.
-
Not Synced
20
00:01:57,730 --> 00:02:03,910
So in doing this, we want to be able to think about the audience and you're gonna be presenting data to several different audiences.
-
Not Synced
21
00:02:03,910 --> 00:02:05,710
The first audience is yourself.
-
Not Synced
22
00:02:05,710 --> 00:02:12,770
When you're working with a data set, when you're trying to understand what you're learning from it, the results of your inferences,
-
Not Synced
23
00:02:12,770 --> 00:02:21,070
you're presenting data to yourself, and you need the presentation to be clear so that you understand what it is that you're learning from the data.
-
Not Synced
24
00:02:21,070 --> 00:02:26,980
So you see the next question to ask, and you're not misleading yourself in the data analysis process.
-
Not Synced
25
00:02:26,980 --> 00:02:34,000
But those kinds of charts don't necessarily need the same level of polish that a chart for an external audience would.
-
Not Synced
26
00:02:34,000 --> 00:02:39,880
Your collaborators, supervisors, et cetera, need to be able to see the data, see what you're learning.
-
Not Synced
27
00:02:39,880 --> 00:02:43,690
Maybe it's in the weekly meeting you have with your research advisor.
-
Not Synced
28
00:02:43,690 --> 00:02:47,130
You're presenting them with the results that you found.
-
Not Synced
29
00:02:47,130 --> 00:02:53,340
They are people who have a lot of knowledge of the project you're working on and the problem space you're working in.
-
Not Synced
30
00:02:53,340 --> 00:02:58,230
They're gonna help you guide and refine your questions.
-
Not Synced
31
00:02:58,230 --> 00:03:06,660
Again, these charts perhaps don't need as much polish as a final published result, but they're not just the ones that you're creating internally for yourself.
-
Not Synced
32
00:03:06,660 --> 00:03:08,550
You're going to be presenting to expert readers.
-
Not Synced
33
00:03:08,550 --> 00:03:15,030
If you write a scientific paper, the readers are usually going to have some level of expertise in the topic that you're talking about.
-
Not Synced
34
00:03:15,030 --> 00:03:20,550
They'll probably know the subject in general, but they may not know your specific work.
-
Not Synced
35
00:03:20,550 --> 00:03:28,650
You may be presenting this to decision makers, especially if you're doing a data science project in an industrial or corporate environment.
-
Not Synced
36
00:03:28,650 --> 00:03:29,820
You're providing data.
-
Not Synced
37
00:03:29,820 --> 00:03:39,240
That's going to inform the decisions of your boss, who may not have significant statistical expertise or data expertise, or who may,
-
Not Synced
38
00:03:39,240 --> 00:03:45,840
but they're going to be using the data that you present in order to make decisions.
-
Not Synced
39
00:03:45,840 --> 00:03:52,170
And then finally, you may occasionally be producing or presenting data for the general public at large.
-
Not Synced
40
00:03:52,170 --> 00:03:57,120
Each of these audiences is going to require different things from your data presentation.
-
Not Synced
41
00:03:57,120 --> 00:04:04,830
So you need to understand who it is that you're presenting the data to in order to make appropriate data presentation decisions.
-
Not Synced
42
00:04:04,830 --> 00:04:14,940
When we're presenting the data, here are some questions that are going to help us understand what it is that we need to guide the reader towards.
-
Not Synced
43
00:04:14,940 --> 00:04:22,110
So we need to be clear: the reader needs to come away knowing what we sought to find out.
-
Not Synced
44
00:04:22,110 --> 00:04:30,360
This might be just explicitly stating our research questions, but they need to know the purpose that the data we're presenting is supposed to serve.
-
Not Synced
45
00:04:30,360 --> 00:04:36,630
What are they supposed to learn from it? They then need to see what we did learn.
-
Not Synced
46
00:04:36,630 --> 00:04:43,110
And then they need to see the supporting evidence, the context to trust the conclusions.
-
Not Synced
47
00:04:43,110 --> 00:04:48,330
It's not just enough to say here's the results, but in many cases you need to provide enough data,
-
Not Synced
48
00:04:48,330 --> 00:04:56,190
enough context that they not only see what you learned, but they see why you believe it is true.
-
Not Synced
49
00:04:56,190 --> 00:05:04,350
A presentation with integrity shows the reader, and really makes it clear to the reader, what we learned,
-
Not Synced
50
00:05:04,350 --> 00:05:08,370
the evidentiary support behind it, and why it flows from the data.
-
Not Synced
51
00:05:08,370 --> 00:05:17,280
Whereas dishonest presentation manipulates them into the conclusion without having the rigorous foundation underneath it.
-
Not Synced
52
00:05:17,280 --> 00:05:26,790
So I want to show you an example of a very bad graphic that came up out of the state of Georgia earlier this year.
-
Not Synced
53
00:05:26,790 --> 00:05:40,980
They presented campaign ad hundred network television, a graph that's purporting to show COVID cases in various high-population counties over time.
-
Not Synced
54
00:05:40,980 --> 00:05:49,170
But if you look closely at the x-axis of this graph, and you'll see this more clearly when you go and look at the slides,
-
Not Synced
55
00:05:49,170 --> 00:05:57,660
the axis is not sorted. It starts with the twenty-eighth of April and then it goes to the twenty-seventh, followed by the twenty-ninth.
-
Not Synced
56
00:05:57,660 --> 00:06:01,470
Then May 1st. Then April 30th. At the end we have May 2nd.
-
Not Synced
57
00:06:01,470 --> 00:06:06,420
May 7th. April 26. May 3rd. It violates the expected convention.
-
Not Synced
58
00:06:06,420 --> 00:06:12,030
And it violates what you'd need to do to show a trend over time, which is that time goes from left to right.
-
Not Synced
59
00:06:12,030 --> 00:06:18,960
They're putting things out of order to show the trend
-
Not Synced
60
00:06:18,960 --> 00:06:26,880
they want, in a way that's not substantiated by the data. And it takes a lot of work to make a chart
-
Not Synced
61
00:06:26,880 --> 00:06:35,480
this bad. I'm not sure how to do it in any of the statistical software I actually use. This is an egregious example,
-
Not Synced
62
00:06:35,480 --> 00:06:42,220
but it's an example of one of the things that can happen when we focus on
-
Not Synced
63
00:06:42,220 --> 00:06:47,220
the effect we want to demonstrate over the evidentiary support for it.
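As a minimal sketch of the left-to-right time convention discussed for the Georgia chart, the Python below checks whether a list of axis dates is in chronological order and sorts it if not. The dates are only the ones read aloud in the lecture, not the chart's full axis. The year is an assumption (the lecture only says "earlier this year"), and the helper name is_chronological is hypothetical.

```python
from datetime import date

# Assumed year; the lecture does not state it.
YEAR = 2020

# Axis dates in the order they are read off the chart in the lecture.
labels_as_shown = [
    date(YEAR, 4, 28), date(YEAR, 4, 27), date(YEAR, 4, 29),
    date(YEAR, 5, 1), date(YEAR, 4, 30), date(YEAR, 5, 2),
    date(YEAR, 5, 7), date(YEAR, 4, 26), date(YEAR, 5, 3),
]


def is_chronological(days):
    """Return True when every date is on or after the one before it."""
    return all(earlier <= later for earlier, later in zip(days, days[1:]))


# Sorting restores the convention that time runs from left to right.
labels_sorted = sorted(labels_as_shown)

print(is_chronological(labels_as_shown))  # False
print(is_chronological(labels_sorted))    # True
```

A check like this could run before plotting any time series, so that an out-of-order axis is caught before the chart is drawn.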
-
Not Synced
64
00:06:47,220 --> 00:07:04,960
I want to contrast that with a chart from W. E. B. Du Bois's charts created for the 1900 Paris Exposition.
-
Not Synced
65
00:07:04,960 --> 00:07:11,250
And these were a series of charts for an exhibition to show the economic, educational,
-
Not Synced
66
00:07:11,250 --> 00:07:19,660
etc. progress of black Americans from emancipation to the turn of the century.
-
Not Synced
67
00:07:19,660 --> 00:07:23,500
And he made a series of charts showing economic situations and things.
-
Not Synced
68
00:07:23,500 --> 00:07:28,090
And here's a bar chart that clearly shows the result.
-
Not Synced
69
00:07:28,090 --> 00:07:35,350
The distribution of economic statuses for farmers after a year of farm labor.
-
Not Synced
70
00:07:35,350 --> 00:07:39,250
And it shows we have the first categories of bankrupt and in debt.
-
Not Synced
71
00:07:39,250 --> 00:07:46,570
And then we have four or five different levels of non-negative return, up to clearing
-
Not Synced
72
00:07:46,570 --> 00:07:53,160
fifty dollars or more. And the lengths of the bars are proportional to the data.
-
Not Synced
73
00:07:53,160 --> 00:07:55,290
It very clearly highlights these things.
-
Not Synced
74
00:07:55,290 --> 00:08:03,720
It then also does a creative thing: it pulls out a separate bar that it indicates is the composite of all of the non-negative bars.
-
Not Synced
75
00:08:03,720 --> 00:08:08,760
And we can see that even if you add all of the non-negative bars together,
-
Not Synced
76
00:08:08,760 --> 00:08:13,860
it's not as many farmers as the in-debt category.
-
Not Synced
77
00:08:13,860 --> 00:08:17,490
That's not a very standard thing. These charts were hand drawn.
-
Not Synced
78
00:08:17,490 --> 00:08:25,260
But it's a creative use of the visualization to highlight in a way that's supported by the data,
-
Not Synced
79
00:08:25,260 --> 00:08:35,910
the relative distribution of the different return levels for black American farmers at the time.
-
Not Synced
80
00:08:35,910 --> 00:08:42,990
Another one that's creative here is this spiral bar chart.
-
Not Synced
81
00:08:42,990 --> 00:08:49,080
Again, it's an unusual thing, but we can see the lines getting progressively longer.
-
Not Synced
82
00:08:49,080 --> 00:08:55,680
And if this were just a normal horizontal bar chart without the spiral, the smallest line,
-
Not Synced
83
00:08:55,680 --> 00:09:00,570
the first line, 1875, would be so small you couldn't even see it.
-
Not Synced
84
00:09:00,570 --> 00:09:07,050
This gives more space. And it shows that good visualization doesn't just mean following a checklist of rules.
-
Not Synced
85
00:09:07,050 --> 00:09:16,920
It means presenting the data in a way that the conclusions and the takeaways are clear and they're rigorously supported by the underlying data.
-
Not Synced
86
00:09:16,920 --> 00:09:20,850
There are no visual tricks to make things look larger or smaller than they are.
-
Not Synced
87
00:09:20,850 --> 00:09:27,990
It transparently shows the connection between the conclusion and the underlying data that support it.
-
Not Synced
88
00:09:27,990 --> 00:09:33,850
So to wrap up, the goal of good presentation is to guide the reader to what we learned and how we know it.
-
Not Synced
89
00:09:33,850 --> 00:09:41,460
Effective presentation is going to highlight the important things for the reader to understand without distraction or deception.
-
Not Synced
90
00:09:41,460 --> 00:09:55,533
And we're going to see throughout more of this week and throughout more of the semester how to practically go about doing that.
-
Not Synced