[Script Info] Title: [Events] Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,1\N00:00:04,610 --> 00:00:10,490\NHello. In this video, I want to talk to you about how to build up a chart from the ground up as we think Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,2\N00:00:10,490 --> 00:00:15,260\Nabout the question it's going to try to answer and the pieces that need to go into it. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,3\N00:00:15,260 --> 00:00:21,350\NSo the learning outcome for this video is for us to be able to design a chart by thinking first of the questions, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,4\N00:00:21,350 --> 00:00:28,070\Nthe goals and the data that are going to be in and from the chart. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,5\N00:00:28,070 --> 00:00:34,580\NSo a good chart answers a question, and the guiding principle for how we design and Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,6\N00:00:34,580 --> 00:00:40,940\Nhow we lay out our chart is to illuminate the question that we want to answer. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,7\N00:00:40,940 --> 00:00:45,230\NAnd for this, we need to know what question we want to answer in the first place. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,8\N00:00:45,230 --> 00:00:52,550\NWe also need to know precisely how we operationalize that question so we can use that to inform how we're going to lay out the chart. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,9\N00:00:52,550 --> 00:00:59,370\NAnd we need to know what data we're using, specifically what variables we're using as a part of this chart.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,10\N00:00:59,370 --> 00:01:05,250\NFor example, there's a data set, you'll see it in the notebook that goes with this video, of passengers on the Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,11\N00:01:05,250 --> 00:01:09,800\NTitanic, and suppose we wanted to examine whether passengers in a higher fare class, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,12\N00:01:09,800 --> 00:01:15,450\Nsay, first class, are more likely to survive than passengers in lower fare classes. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,13\N00:01:15,450 --> 00:01:19,860\NIn this analysis, we have an outcome variable, zero or one, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,14\N00:01:19,860 --> 00:01:29,590\Nwhether or not the passenger survived the Titanic sinking, and a lot of charts are going to have an outcome variable. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,15\N00:01:29,590 --> 00:01:37,140\NWe have some outcome variable and we want to see how it responds to or how it differs with some other variable, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,16\N00:01:37,140 --> 00:01:43,170\Nwhich we call the explanatory variable, in this case, the passage class. Our outcome is survival. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,17\N00:01:43,170 --> 00:01:52,500\NAnd we want to see how it changes as the passenger's passage class, the explanatory variable, changes. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,18\N00:01:52,500 --> 00:01:57,510\NThe outcome variable is also called the response variable or the dependent variable, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,19\N00:01:57,510 --> 00:02:01,770\Nbecause it's what we're trying to measure that's responding to the condition we're trying to analyze.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,20\N00:02:01,770 --> 00:02:06,990\NAnd the explanatory variable is sometimes called the independent variable because it's changing, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,21\N00:02:06,990 --> 00:02:11,820\Nbut it's not changing as a function of the other variables, in theory. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,22\N00:02:11,820 --> 00:02:20,160\NSo we can do this with a bar chart, and in this bar chart the x axis is our passage class: first, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,23\N00:02:20,160 --> 00:02:29,520\Nsecond and third, and the Y axis is the fraction of passengers in that class who survived. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,24\N00:02:29,520 --> 00:02:35,460\NWe also see some error bars. We're going to see later what those mean and how to compute them. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,25\N00:02:35,460 --> 00:02:47,550\NBut this lets us see how the outcome, survival, changes with the different passage classes of the passengers. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,26\N00:02:47,550 --> 00:02:56,490\NAnd one of the things to note here is that we have our explanatory variable on the X axis and the outcome variable on the Y axis. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,27\N00:02:56,490 --> 00:03:02,100\NThat is the general convention. There are some cases where we might want to flip it. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,28\N00:03:02,100 --> 00:03:09,510\NSo we've got a horizontal bar chart where the explanatory is on the Y and the outcome is on the X, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,29\N00:03:09,510 --> 00:03:13,920\Nparticularly if it makes the labels more readable.
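The bar heights described above (the fraction of passengers in each class who survived) come from a small aggregation over the zero/one outcome. The following is a minimal sketch using only the Python standard library; the records are made-up illustrative values, not the real Titanic data, and the function name `survival_rate_by_class` is hypothetical.

```python
from collections import defaultdict

# Hypothetical (passage_class, survived) records; NOT the real Titanic data.
passengers = [
    (1, 1), (1, 1), (1, 0), (1, 1),
    (2, 1), (2, 0), (2, 0), (2, 1),
    (3, 0), (3, 0), (3, 1), (3, 0),
]


def survival_rate_by_class(records):
    """Return {class: fraction of passengers in that class with outcome 1}."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for passage_class, survived in records:
        totals[passage_class] += 1
        survivors[passage_class] += survived  # outcome is 0 or 1
    # One value per class: explanatory variable -> response statistic
    return {c: survivors[c] / totals[c] for c in sorted(totals)}


rates = survival_rate_by_class(passengers)
for passage_class, rate in rates.items():
    print(f"class {passage_class}: {rate:.2f}")
```

Each key is a value of the explanatory variable (X axis) and each value is the statistic of the outcome (Y axis), which is the shape of data a bar chart needs.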
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,30\N00:03:13,920 --> 00:03:25,380\NBut the standard convention for most types of charts is to put the explanatory variable on the x axis, the horizontal axis, and the outcome variable on the Y axis. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,31\N00:03:25,380 --> 00:03:29,370\NAnd this chart shows a relationship. Many charts show a relationship; Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,32\N00:03:29,370 --> 00:03:39,010\Nmost of the plots that we're going to be drawing in this class show how some kind of numeric variable, either continuous or Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,33\N00:03:39,010 --> 00:03:51,610\Ninteger, changes between different values of one or more other variables. And in this case, even though our response was a zero/one logical, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,34\N00:03:51,610 --> 00:03:59,480\Nwhen we converted it into a rate per passage class, it became a continuous variable. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,35\N00:03:59,480 --> 00:04:05,620\NAnd so when we do this, we need to identify a few key things to design our plots. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,36\N00:04:05,620 --> 00:04:12,490\NWe need to identify what variable we want to show. That's going to guide a lot. In many plots it'll be on our y axis. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,37\N00:04:12,490 --> 00:04:18,520\NWhen it's not, it'll usually be on X. And identifying that variable is, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,38\N00:04:18,520 --> 00:04:22,660\Nif anything, probably the most important thing in designing a plot. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,39\N00:04:22,660 --> 00:04:27,160\NWe then need to identify what we want to show about this variable.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,40\N00:04:27,160 --> 00:04:32,380\NDo we want to show its value for different data points? Do we want to show a statistic? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,41\N00:04:32,380 --> 00:04:36,370\NDo we want to show, for example, the mean or the rate? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,42\N00:04:36,370 --> 00:04:44,220\NIn the previous example, the Titanic example, we showed a statistic, the rate of survival. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,43\N00:04:44,220 --> 00:04:47,660\NDo we want to show its distribution? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,44\N00:04:47,660 --> 00:04:57,410\NAnd then how do we want to compare that between values of the explanatory variable? Particularly, do we want to look at absolute differences? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,45\N00:04:57,410 --> 00:05:08,410\NDo we want to look at relative or proportional differences? And even histograms follow this kind of a design because they have an outcome, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,46\N00:05:08,410 --> 00:05:15,640\Nwhich is the frequency or the count of the observations or the density, depending on precisely what kind of histogram we're showing. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,47\N00:05:15,640 --> 00:05:20,500\NAnd then they have the explanatory variable, which is the value or the bin. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,48\N00:05:20,500 --> 00:05:33,850\NSo we've got a histogram and we've got some bins. And the response variable is how many values are in that bin and the explanatory is the bin. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,49\N00:05:33,850 --> 00:05:40,900\NSo identifying these then informs the entire pipeline of producing our chart. The data processing at the beginning:
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,50\N00:05:40,900 --> 00:05:47,110\NWe're going to do grouping, aggregation, transformation that gets us to the final values we can actually plot. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,51\N00:05:47,110 --> 00:05:55,210\NIt's going to affect our choice of plot type and it's going to affect our choice of axis labels, colors, facets, the other aspects of the plot. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,52\N00:05:55,210 --> 00:06:04,500\NSo the type of the variable has a significant impact. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,53\N00:06:04,500 --> 00:06:07,560\NThe response is often numeric or can be transformed to be. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,54\N00:06:07,560 --> 00:06:15,840\NIf we're talking about a categorical variable, we usually want the relative frequency of different values of that. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,55\N00:06:15,840 --> 00:06:22,320\NSo either we're doing it like a histogram, and we're going to transform it so that we're showing just the distribution: Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,56\N00:06:22,320 --> 00:06:29,010\Nwe're going to transform it so that the explanatory becomes the value of the categorical Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,57\N00:06:29,010 --> 00:06:33,870\Nand the response is how many or what fraction have it. Or, with a logical, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,58\N00:06:33,870 --> 00:06:40,870\Nif it's a two-level categorical, we might turn it into a fraction, just the fraction that have one of the levels versus the other. The explanatory: Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,59\N00:06:40,870 --> 00:06:46,920\NIt can be anything.
We're going to see how to use numeric explanatory variables and categorical explanatory variables. Ordinal variables Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,60\N00:06:46,920 --> 00:06:48,420\Nwork just like categorical, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,61\N00:06:48,420 --> 00:06:57,180\Nexcept that it's a discrete axis that preserves order, and we need to make sure that the order in ordinal data is being preserved. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,62\N00:06:57,180 --> 00:07:05,270\NIf you're using the pandas ordered categorical type, it automatically preserves order for you when you're doing the plot. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,63\N00:07:05,270 --> 00:07:12,050\NSo if we just have one explanatory variable, this is the easiest case. If our explanatory variable is continuous, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,64\N00:07:12,050 --> 00:07:16,820\Nwe usually want a scatterplot or a line plot for showing individual values. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,65\N00:07:16,820 --> 00:07:22,890\NSometimes we'll flip the response and explanatory on a scatterplot, or both might be explanatory: Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,66\N00:07:22,890 --> 00:07:25,790\Nwe want to show where points lie in a two dimensional space. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,67\N00:07:25,790 --> 00:07:35,120\NBut generally, if we've got a continuous explanatory variable and we're trying to show values, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,68\N00:07:35,120 --> 00:07:39,530\Nwe're going to use a scatterplot or a line. Excuse me, that's if we're going to try to show values.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,69\N00:07:39,530 --> 00:07:45,380\NIf we're going to try to show statistics like a mean at each value of the explanatory variable, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,70\N00:07:45,380 --> 00:07:56,490\Nwe're going to use a scatterplot or a line plot. If the explanatory variable is discrete, then we're going to use a bar chart to show a statistic Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,71\N00:07:56,490 --> 00:08:05,400\Nif we want to estimate the relative difference. We want to be able to compare values relative to one another, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,72\N00:08:05,400 --> 00:08:08,640\Nbecause one bar will be twice as high as another. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,73\N00:08:08,640 --> 00:08:15,390\NAnd a point plot shows a statistic or an individual value, and it emphasizes absolute difference. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,74\N00:08:15,390 --> 00:08:19,740\NYou don't have a whole bar in order to compare heights. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,75\N00:08:19,740 --> 00:08:28,140\NYou just have the point. And then if we want to show a distribution, we usually use a box or a violin plot with this discrete explanatory variable. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,76\N00:08:28,140 --> 00:08:32,670\NWe don't have great ways to show distributions with continuous explanatory variables. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,77\N00:08:32,670 --> 00:08:36,480\NYou can show a variance with an error bar or a ribbon, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,78\N00:08:36,480 --> 00:08:44,220\Nbut that's about it. For two explanatory variables, we get into,
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,79\N00:08:44,220 --> 00:08:53,160\NWith two explanatory variables, we have a couple of options. One is we can do a pseudo-3D display where we do a contour plot or a heat map. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,80\N00:08:53,160 --> 00:08:57,000\NAnd I'm going to show both of these here. So this is a contour plot. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,81\N00:08:57,000 --> 00:09:01,860\NThe left one is a contour plot and it reads like a topographical map. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,82\N00:09:01,860 --> 00:09:09,180\NIf you envision your two explanatory variables, in this case, we're showing a two dimensional distribution. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,83\N00:09:09,180 --> 00:09:18,990\NSo one explanatory variable is the score given to a movie by its critics, and another explanatory variable is the score given by its audience. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,84\N00:09:18,990 --> 00:09:23,930\NAnd then the response variable is how many movies have that combination. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,85\N00:09:23,930 --> 00:09:30,320\NAnd so we can see here, this is the peak. A contour plot is really good for showing us the peak. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,86\N00:09:30,320 --> 00:09:43,510\NIt's going to be that innermost circle, and it also shows us the shape because each of these rings is a level of decreasing
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,87\N00:09:43,510 --> 00:09:52,090\Nheight in this map. So if we envision that the response variable is this height and we're looking at a two dimensional map, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,88\N00:09:52,090 --> 00:09:56,110\Nthe rings show us the contours around the mountains of that height. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,89\N00:09:56,110 --> 00:10:01,150\NGood for showing shape. The other one is the heat map, which uses color. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,90\N00:10:01,150 --> 00:10:07,240\NAnd so it's usually going to be from a cool color like, say, black here to a hot orange, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,91\N00:10:07,240 --> 00:10:18,230\Nor sometimes you have a bidirectional one, which goes blue to red. And it lets us see the highest density is here. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,92\N00:10:18,230 --> 00:10:23,390\NAnd as you go out from there, you get lower and lower densities. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,93\N00:10:23,390 --> 00:10:27,740\NEither one can work. For a continuous-variable heat map, you often have to bin it in order to do so. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,94\N00:10:27,740 --> 00:10:36,890\NThis is a discretized heat map where we have binned everything in bins of a half Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,95\N00:10:36,890 --> 00:10:41,510\Na star on the audience score and a four star on the critic score because they're on different scales. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,96\N00:10:41,510 --> 00:10:46,850\NBut heat maps also work well for categorical and ordinal data. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,97\N00:10:46,850 --> 00:10:55,180\NSo.
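The binning step behind a discretized heat map can be sketched as follows: each pair of scores is assigned to a fixed-width bin on each axis, and the response is the count per cell. This is an illustration only; the score pairs are invented, the helper names `bin_floor` and `heatmap_counts` are hypothetical, and the bin widths (1.0 for critics, 0.5 for audience) are assumptions chosen for the example rather than the values used in the lecture's chart.

```python
import math
from collections import Counter

# Hypothetical (critic_score, audience_score) pairs; values are made up.
movies = [(3.2, 3.4), (3.4, 3.1), (4.6, 4.2), (4.9, 4.4), (1.1, 2.3), (3.0, 3.0)]


def bin_floor(value, width):
    """Lower edge of the fixed-width bin that contains value."""
    return math.floor(value / width) * width


def heatmap_counts(pairs, x_width, y_width):
    """Count how many pairs fall into each (x_bin, y_bin) cell."""
    return Counter(
        (bin_floor(x, x_width), bin_floor(y, y_width)) for x, y in pairs
    )


# Assumed bin widths: 1.0 on the critic axis, 0.5 on the audience axis.
cells = heatmap_counts(movies, x_width=1.0, y_width=0.5)
for (critic_bin, audience_bin), n in sorted(cells.items()):
    print(critic_bin, audience_bin, n)
```

The two bin edges are the explanatory variables and the count is the response, which a heat map then encodes as color.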
Another way we can do it is we can use other aesthetics for secondary variables, such as color or shape or size. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,98\N00:10:55,180 --> 00:10:58,300\NSometimes we'll use that to indicate a second response variable, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,99\N00:10:58,300 --> 00:11:06,820\Nlike you might have a scatterplot where the size of the point is a second response variable, but often it's for multiple explanatory variables. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,100\N00:11:06,820 --> 00:11:15,640\NSo this shows us how we can do that. So if we wanted to break down Titanic survival rates by both class and sex, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,101\N00:11:15,640 --> 00:11:22,240\Nwe keep our class on the X axis like we did before, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,102\N00:11:22,240 --> 00:11:32,290\Nand then we use color for the passenger sex so we can see significantly higher survival rates for women across all three classes. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,103\N00:11:32,290 --> 00:11:36,550\NI'm also showing you here the difference between a bar chart and a point plot. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,104\N00:11:36,550 --> 00:11:41,590\NSo the left is the bar chart. The right is the point plot. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,105\N00:11:41,590 --> 00:11:46,030\NAnd the bar chart lets us compare the heights of the bars. Note that it starts at zero. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,106\N00:11:46,030 --> 00:11:55,590\NBar charts always start at zero. And so it lets us compare the height of the bars and we can see that
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,107\N00:11:55,590 --> 00:12:08,290\NIt's easy to see, just using our vision, that the female passenger first class bar is more than twice as tall as Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,108\N00:12:08,290 --> 00:12:12,370\Nthe male passenger first class bar, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,109\N00:12:12,370 --> 00:12:18,370\Nthe male passenger first class bar is twice as tall as the male passenger second class bar. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,110\N00:12:18,370 --> 00:12:23,410\NSo it lets us make relative comparisons between the different values. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,111\N00:12:23,410 --> 00:12:29,230\NThis is why it always starts at zero, because the natural thing to do with the bar is compare its height. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,112\N00:12:29,230 --> 00:12:34,990\NIf your bar chart does not start at zero, suppose our bar chart started at point one, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,113\N00:12:34,990 --> 00:12:40,300\Nthen the comparison of height would exaggerate the difference relative to the value. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,114\N00:12:40,300 --> 00:12:45,850\NAnd what looks twice as tall isn't actually twice as tall because we cut off a bunch of the bottom. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,115\N00:12:45,850 --> 00:12:52,010\NSo always start at zero. The point plot does not. It makes it hard to compare relative difference. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,116\N00:12:52,010 --> 00:12:55,790\NIt's difficult for us to visually tell.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,117\N00:12:55,790 --> 00:12:57,500\NWe can tell if we look at the numbers, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,118\N00:12:57,500 --> 00:13:05,090\Nbut it's difficult to visually tell that the survival rate of women in first class is twice as high as the survival rate of men. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,119\N00:13:05,090 --> 00:13:11,390\NBut what it does let us see is the absolute difference between these values, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,120\N00:13:11,390 --> 00:13:16,550\Nand it makes it easy to compare the difference in the gaps across the three classes. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,121\N00:13:16,550 --> 00:13:31,670\NWe can see that the survival rate by sex is much closer in the third class than it is in the first or in the second class. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,122\N00:13:31,670 --> 00:13:39,020\NSo your choice of plot really guides the user to see different things. Your choice of plot Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,123\N00:13:39,020 --> 00:13:42,650\Nallows you to emphasize different things and you need to decide. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,124\N00:13:42,650 --> 00:13:51,440\NYou need to choose and design your plot in such a way that's going to tell the story that you need to tell from the data. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,125\N00:13:51,440 --> 00:13:59,570\NWe can also have more than two explanatory variables. It's difficult to have more than one that's numeric, or two if you're doing a contour plot. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,126\N00:13:59,570 --> 00:14:05,270\NWe can bin variables, and that's then going to let us use some more techniques, such as faceting.
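The earlier point about bar charts starting at zero, and about bars showing relative differences while points show absolute ones, can be checked with a little arithmetic. This is a sketch with made-up rates expressed in whole percentage points (so the arithmetic is exact); the helper name `apparent_ratio` is hypothetical.

```python
def apparent_ratio(a, b, baseline=0):
    """Ratio of the drawn bar heights when the axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)


# Hypothetical survival rates in percent; not taken from the Titanic data.
first, second = 40, 20

true_ratio = apparent_ratio(first, second)                    # axis starts at 0
truncated_ratio = apparent_ratio(first, second, baseline=10)  # axis starts at 10% (0.1)
absolute_gap = first - second                                 # unaffected by the baseline

print(true_ratio, truncated_ratio, absolute_gap)
```

With the axis starting at zero the taller bar is drawn twice as tall, matching the data; starting the axis at 10 percent makes it look three times as tall. The absolute gap of 20 points is the same either way, which is the quantity a point plot emphasizes.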
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,127\N00:14:05,270 --> 00:14:08,270\NSo if we want to break down by more categorical variables, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,128\N00:14:08,270 --> 00:14:12,320\Nor we want to break down by many more variables. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,129\N00:14:12,320 --> 00:14:18,140\NLet's say we also want to look at age. And so we're going to keep sex on the color. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,130\N00:14:18,140 --> 00:14:24,050\NWe're going to now use age as the x axis. Since it's numeric, it really works better on an axis. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,131\N00:14:24,050 --> 00:14:28,970\NI have binned it into bins of ten so that you only have one point for every decade. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,132\N00:14:28,970 --> 00:14:36,530\NBut then we use a facet, and the facet means we draw a different chart for each of the three classes. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,133\N00:14:36,530 --> 00:14:43,400\NThe charts all share a y axis so we can directly compare across the row of charts, and it lets us see Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,134\N00:14:43,400 --> 00:14:55,400\Nparticularly how survival as a function of age changes between different passenger classes, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,135\N00:14:55,400 --> 00:15:01,460\Nfor example. And so it lets us start to build up. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,136\N00:15:01,460 --> 00:15:05,720\NAnd if we had a fourth, we could use rows and columns in the faceted plot. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,137\N00:15:05,720 --> 00:15:10,310\NSo we have these mechanisms of building up: we have our x axis and our y axis.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,138\N00:15:10,310 --> 00:15:18,380\NWe can use aesthetics of the lines or the points, particularly color, size, shape, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,139\N00:15:18,380 --> 00:15:26,730\Nand then we can use facets to build up even more variables into our plot. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,140\N00:15:26,730 --> 00:15:32,370\NTo do faceting, there's a couple of things you can do. It's built into some of the seaborn plotting functions. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,141\N00:15:32,370 --> 00:15:37,230\NThe relplot and catplot functions can both do faceting on their own. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,142\N00:15:37,230 --> 00:15:42,240\NThey let you control the statistic. They're very, very flexible functions for a wide range of plots. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,143\N00:15:42,240 --> 00:15:48,810\NThe general purpose FacetGrid allows you to facet any kind of plot by writing some more Python code on your own. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,144\N00:15:48,810 --> 00:15:53,190\NVery useful if you want to facet something that doesn't support faceting built in. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,145\N00:15:53,190 --> 00:15:59,940\NAnd if you're using plotnine or the R
ggplot2 package, facet_grid and facet_wrap control faceting. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,146\N00:15:59,940 --> 00:16:07,680\NWhen you build that faceted plot, you need to pay attention to what variables go where. Your choice of which variables are going to be on color, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,147\N00:16:07,680 --> 00:16:16,530\Nwhat variables are going to be facets, which variables are going to be on your axes really affects how the reader is going to interpret and understand Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,148\N00:16:16,530 --> 00:16:22,680\Nyour plot and you need to choose them strategically to tell the story that addresses your question. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,149\N00:16:22,680 --> 00:16:28,950\NYou also need to do it, though, in a way that is honest and does not mislead your readers. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,150\N00:16:28,950 --> 00:16:38,810\NThe chart needs to honestly show the readers what it is that you learned from the data and show that clearly. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,151\N00:16:38,810 --> 00:16:43,610\NAnother thing we can do to build up a chart, especially if we have more categorical variables, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,152\N00:16:43,610 --> 00:16:47,180\Nif we've got a categorical response variable with more than two levels, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,153\N00:16:47,180 --> 00:16:56,390\Nand we want to show particularly how the proportion in different categories changes in response to another variable, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,154\N00:16:56,390 --> 00:17:04,250\Na stacked chart can be very good. It lets us see the differences in composition, to see how the parts of a whole change.
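The data behind a stacked chart of composition can be sketched like this: counts per category are turned into fractions of the whole, then laid end to end in a chosen order so each segment starts where the previous one stops. The data set names, categories, and counts below are invented for illustration, and `to_fractions` and `stack_segments` are hypothetical helper names.

```python
# Hypothetical category counts per data set; names and numbers are made up.
counts = {
    "dataset_a": {"female": 30, "male": 50, "unknown": 20},
    "dataset_b": {"female": 10, "male": 10, "unknown": 20},
}


def to_fractions(category_counts):
    """Convert raw counts into fractions of the whole (they sum to 1)."""
    total = sum(category_counts.values())
    return {name: n / total for name, n in category_counts.items()}


def stack_segments(fractions, order):
    """Return (category, start, width) for each stacked segment, in order."""
    segments = []
    start = 0.0
    for name in order:
        width = fractions[name]
        segments.append((name, start, width))
        start += width  # the next segment begins where this one ends
    return segments


order = ["female", "male", "unknown"]  # the ordering is a design decision
for dataset, category_counts in counts.items():
    print(dataset, stack_segments(to_fractions(category_counts), order))
```

Each `(start, width)` pair is what a horizontal stacked bar needs for one segment; plotting raw counts instead of fractions would use the same stacking logic without the normalization step.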
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,155\N00:17:04,250 --> 00:17:05,390\NAnd so this chart, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,156\N00:17:05,390 --> 00:17:16,060\Nthis is a stacked bar chart and it's a horizontal bar chart where I put the explanatory variable on the x axis, excuse me, on the Y axis, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,157\N00:17:16,060 --> 00:17:22,150\Njust in part to make the labels easier to read. And so our explanatory variable is what data set Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,158\N00:17:22,150 --> 00:17:28,210\Nsomething came from: Locke, M.D. Gry. What those are doesn't matter for our purposes right now. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,159\N00:17:28,210 --> 00:17:34,540\NThe response variable is the distribution of genders in this case. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,160\N00:17:34,540 --> 00:17:39,970\NThese are data sets of books, the genders of the authors of those books in the data set. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,161\N00:17:39,970 --> 00:17:47,260\NAnd so we have female, we've got male and we also have codes for where it's ambiguous or unknown or we didn't have data. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,162\N00:17:47,260 --> 00:17:59,470\NAnd so we can see, for example, the GYŐRI data set has a higher fraction of women and a significantly lower fraction of men. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,163\N00:17:59,470 --> 00:18:12,900\NAnd we can see quite a few more books that we don't know the gender on. And so the order on this chart is very strategic. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,164\N00:18:12,900 --> 00:18:22,230\NHow I ordered these levels is very strategic.
I batched all of the various kinds of "we don't know" together so that Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,165\N00:18:22,230 --> 00:18:27,180\Nyou can look at that whole block and see the various types of Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,166\N00:18:27,180 --> 00:18:33,720\N"we don't know the gender of the book's author" together, but you can also see how they're broken down into individual things. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,167\N00:18:33,720 --> 00:18:41,790\NYou can see that unlinked is a very, very large fraction of that increase in books where we don't know the author's gender. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,168\N00:18:41,790 --> 00:18:46,560\NSo you need to think about all of these different things in order Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,169\N00:18:46,560 --> 00:18:51,630\Nto be able to generate a chart that's going to clearly and unambiguously communicate. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,170\N00:18:51,630 --> 00:18:54,930\NYou can show raw values in a stacked bar chart, and the bars Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,171\N00:18:54,930 --> 00:18:58,690\Ndon't all have to be the same height. You can show fractions, in which case they will be. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,172\N00:18:58,690 --> 00:19:07,080\NI chose to show fractions in this chart. The code that generates this using raw matplotlib is linked in the notes for the video. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,173\N00:19:07,080 --> 00:19:09,330\NSometimes we're also going to transform our charts. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,174\N00:19:09,330 --> 00:19:15,330\NWe might transform the axis, such as doing a log ten scale. In that case we just transform the axis.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,175\N00:19:15,330 --> 00:19:20,070\NThe labels are still in their original values; it's just that they're spaced out logarithmically. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,176\N00:19:20,070 --> 00:19:24,780\NWe generally won't do this for bars. A bar on a log scale, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,177\N00:19:24,780 --> 00:19:30,840\Nyou can draw it, but you have to be really, really careful in order to make sure that your readers are going to accurately interpret it. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,178\N00:19:30,840 --> 00:19:35,580\NBut for line and scatter plots, log transforms are a lot more common. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,179\N00:19:35,580 --> 00:19:41,580\NSometimes, though, we're actually going to transform the data itself, and we're going to plot a log or a square root or some other rescaling. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,180\N00:19:41,580 --> 00:19:49,160\NAnd another common transformation is to bin the data, somehow discretize it into fixed bins Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,181\N00:19:49,160 --> 00:19:54,530\Nby some mechanism or another. So the key decisions that you need to make when you're making one of these charts Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,182\N00:19:54,530 --> 00:20:00,920\Nare: you need to pick the variables and how you're doing their transformations. You need to pick what are called the aesthetics, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,183\N00:20:00,920 --> 00:20:06,680\Nhow you're going to map the different variables you're looking at to chart features: your x and y axes, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,184\N00:20:06,680 --> 00:20:10,700\Nyour facets (row and column), your color, your point marker style.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,185\N00:20:10,700 --> 00:20:14,690\NIf you're doing a joint plot, often it's useful to put Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,186\N00:20:14,690 --> 00:20:25,370\Nthe same variable on both the color and style aesthetics, and that way, if you have a reader who's colorblind, they still get different point styles, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,187\N00:20:25,370 --> 00:20:30,050\Neven if they can't tell the colors apart or if someone's putting it on a black and white printer. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,188\N00:20:30,050 --> 00:20:34,340\NAnd then you need the type of the chart: line chart, bar, point, box, et cetera. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,189\N00:20:34,340 --> 00:20:41,750\NSo you have to make all of these decisions when you're drawing this chart, and they're driven by what variables and data you have and what Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,190\N00:20:41,750 --> 00:20:50,210\Nquestion you're trying to answer and what story you're trying to tell about that. You do need to be careful to avoid excessive complexity. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,191\N00:20:50,210 --> 00:20:58,550\NWe can put a different variable on every conceivable aesthetic, and it's often going to result in a chart that's very difficult to read. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,192\N00:20:58,550 --> 00:21:03,410\NWe also have to be careful with color because it's easy to make a chart that has differences Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,193\N00:21:03,410 --> 00:21:07,580\Nthat are difficult for the human eye to distinguish or get obscured by printers, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,194\N00:21:07,580 --> 00:21:17,990\Nlow quality displays, etc.
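A minimal sketch of the redundant color-and-marker encoding described above, combined with a log-scaled axis on a scatter plot. This is not the lecture's code: the group names, colors, markers, and randomly generated values are all hypothetical and exist only to make the example runnable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical groups: each one gets BOTH a distinct color and a distinct
# marker, so the groups stay distinguishable in grayscale or for a
# colorblind reader.
groups = {
    "group A": ("tab:blue", "o"),
    "group B": ("tab:orange", "s"),
    "group C": ("tab:green", "^"),
}

fig, ax = plt.subplots()
for name, (color, marker) in groups.items():
    x = rng.lognormal(mean=3.0, sigma=1.0, size=40)  # made-up, strictly positive values
    y = rng.normal(size=40)                          # made-up values
    ax.scatter(x, y, color=color, marker=marker, label=name)

# Transform the axis, not the data: tick labels stay in the original
# units but are spaced logarithmically.
ax.set_xscale("log")
ax.set_xlabel("x (log scale)")
ax.set_ylabel("y")
ax.legend(title="group")
```

Calling `fig.savefig(...)` with a file name of your choice would write the figure to disk. Plotting `np.log10(x)` on a linear axis would instead transform the data itself, and the tick labels would then show the logged values.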
It's also important to note that a good graphic reveals the data and does not distort or obscure the data. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,195\N00:21:17,990 --> 00:21:24,530\NIt's easy to create a graphic that manipulates the data to tell a story that's not very well supported. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,196\N00:21:24,530 --> 00:21:30,080\NAnd we want to avoid that when we're doing data science with honesty and integrity. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,197\N00:21:30,080 --> 00:21:35,120\NSo to wrap up: you need to identify the variables and relationships that you want to highlight in your chart. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,198\N00:21:35,120 --> 00:21:38,030\NYou want to design a plot that illustrates them, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,199\N00:21:38,030 --> 00:21:43,760\Nand you're going to need to spend some time studying your plotting library's APIs and the plotting library's gallery. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,200\N00:21:43,760 --> 00:21:50,180\NAny plotting library usually has a gallery of a bunch of different plots and the code that was used to generate them. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,201\N00:21:50,180 --> 00:21:52,880\NSeaborn has this, matplotlib has this. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,202\N00:21:52,880 --> 00:22:00,650\NAnd so spend some time with that, looking: oh, this looks like the kind of plot that might display my data well. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,203\N00:22:00,650 --> 00:22:14,167\NAnd then look and click on it and see what code they use to generate it and borrow it. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,
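As a rough illustration of the stacked horizontal bar chart of fractions discussed earlier in the video, here is a minimal matplotlib sketch. It is not the code linked in the video notes: the data set labels and counts are invented placeholders, and the level names simply follow the categories mentioned in the talk.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data sets (explanatory variable) and made-up author counts.
datasets = ["data set A", "data set B"]
# The order of levels is deliberate: the "we don't know" levels sit next
# to each other so they can be read as one block or individually.
levels = ["female", "male", "ambiguous", "unknown", "unlinked"]
counts = np.array([
    [30, 50, 5, 10, 5],    # data set A (made-up counts)
    [45, 25, 5, 10, 15],   # data set B (made-up counts)
], dtype=float)

# Convert raw counts to per-data-set fractions so every bar has total length 1.
fractions = counts / counts.sum(axis=1, keepdims=True)

fig, ax = plt.subplots()
left = np.zeros(len(datasets))
for j, level in enumerate(levels):
    # Each segment starts where the previous level's segment ended.
    ax.barh(datasets, fractions[:, j], left=left, label=level)
    left = left + fractions[:, j]

ax.set_xlabel("fraction of books")
ax.set_ylabel("data set")
ax.legend(title="author gender")
```

Passing `counts` instead of `fractions` to `barh` would show raw values, in which case the bars would generally differ in total length.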