WEBVTT 00:00:00.000 --> 00:00:05.893 So up to now, we've looped over data, and written if statements to count one thing 00:00:05.893 --> 00:00:09.887 or another. But really what you want to do is be able to count multiple things, so 00:00:09.887 --> 00:00:13.931 that you can compare them, like are there more boys or girls or whatever that have 00:00:13.931 --> 00:00:18.024 some quality. So that's what we're going to do in this section. So in order to do 00:00:18.024 --> 00:00:21.772 this basically what I want to do is just have multiple counter variables. So 00:00:21.772 --> 00:00:26.216 instead of just having count, have a few of them. So let me just show you in this 00:00:26.216 --> 00:00:30.961 code section. So let's say I wanna go through and I wanna count; my goal is: do 00:00:30.961 --> 00:00:35.410 more boy or girl names end in Y? Like, who knows the answer? And we'll use the 00:00:35.410 --> 00:00:39.610 computer. So what I'm gonna do is introduce two count variables. So outside 00:00:39.610 --> 00:00:44.397 the loop, whereas before I just said count equals zero, I'm just gonna use the simple 00:00:44.397 --> 00:00:48.634 form of calling my variables count1, count2, and so on. Pretty unimaginative, 00:00:48.634 --> 00:00:53.256 but it's simple. So in this case I'm gonna have two variables. So I'll 00:00:53.256 --> 00:00:57.602 say count1 equals zero and count2 equals zero. And my intention is that, well, in 00:00:57.602 --> 00:01:02.059 count1 I'll keep track of the boy case, and in count2 I'll keep track of the girl 00:01:02.059 --> 00:01:06.546 case. So, inside the loop, this looks very much like what we did before. So, I have 00:01:06.546 --> 00:01:11.077 an if test where I'm looking for rows where the name ends with Y, and the gender 00:01:11.077 --> 00:01:15.499 is boy. And so when that's true, I'll bump up count one. So, count one being sort of 00:01:15.499 --> 00:01:19.921 a boy counter. 
And then I have, following it, a very similar if statement, 00:01:19.921 --> 00:01:24.398 that looks for any name ending in Y, but I'm looking for a gender that equals girl. 00:01:24.562 --> 00:01:29.415 And in that case, I'll bump up count two. So the loop runs, and it's just 00:01:29.415 --> 00:01:34.068 going to be counting up, you know, at the same time, it's counting 00:01:34.068 --> 00:01:38.900 up the boy and girl cases. And then when it gets to the end here, I'll just run it. 00:01:39.070 --> 00:01:43.741 Then it just says boy count colon count one and girl count, count two. So we see 00:01:43.741 --> 00:01:48.412 it turns out more girl names end in Y than boy names or 00:01:48.412 --> 00:01:52.513 whatever it is. Obviously with this formula you could test any number of 00:01:52.513 --> 00:01:57.013 things. One thing I should point out is that this is gonna be sort of our 00:01:57.013 --> 00:02:01.855 official class format for how complicated I wanna make things. So, I've got the one 00:02:01.855 --> 00:02:05.534 loop. And then I could have, you know, generally just two or three variables. But 00:02:05.534 --> 00:02:09.161 any number of variables, I set them to zero right here. And then for each one, I 00:02:09.161 --> 00:02:12.648 have an if statement. And I wanna point out, the if statements are one after 00:02:12.648 --> 00:02:16.322 another. And, in fact, the order doesn't really matter. What I would point out 00:02:16.322 --> 00:02:20.019 is, the if statements are not inside of each other. I'm not gonna do that. That's 00:02:20.019 --> 00:02:24.066 more complicated. We can do perfectly interesting work just sticking 00:02:24.066 --> 00:02:28.212 with this form. Alright. So let me just try a natural extension of this, so 00:02:28.212 --> 00:02:32.114 we'll just go to three variables, I'll just show you how that works. 
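The two-counter loop described above can be sketched roughly as follows. This is a minimal Python sketch, not the course's own code environment, and the `rows` list and its field names are made-up sample data rather than the baby-name table used in the lecture.

```python
# Hypothetical sample rows; the lecture's real data is a baby-name table.
rows = [
    {"name": "Emily", "gender": "girl"},
    {"name": "Anthony", "gender": "boy"},
    {"name": "Lily", "gender": "girl"},
    {"name": "Jacob", "gender": "boy"},
    {"name": "Sophia", "gender": "girl"},
]

# One counter per thing being counted, set to zero outside the loop.
count1 = 0  # boy names ending in "y"
count2 = 0  # girl names ending in "y"

for row in rows:
    # The if statements sit one after another, not inside each other.
    if row["name"].endswith("y") and row["gender"] == "boy":
        count1 = count1 + 1
    if row["name"].endswith("y") and row["gender"] == "girl":
        count2 = count2 + 1

# The prints come after the loop, once the counters are final.
print("boy count:", count1)
print("girl count:", count2)
```

Because the two if statements are independent, swapping their order leaves both counts unchanged.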
So, for 00:02:32.114 --> 00:02:36.909 three variables, I'm just gonna stick with my trivial naming convention. Count one, 00:02:36.909 --> 00:02:41.415 count two, count three. So I set these three variables to zero outside the loop. 00:02:41.588 --> 00:02:46.152 And in this case, the question I wanna ask, or answer, is, do more names end in A 00:02:46.152 --> 00:02:50.368 or I or O? Who knows. So I've got the three counters and I'll use count one to 00:02:50.368 --> 00:02:54.475 count the A case, and count two for the I case, and count three for the O case. So 00:02:54.475 --> 00:02:58.685 here's the sort of obvious if statement. If name ends with A, and then [inaudible] 00:02:58.685 --> 00:03:03.000 count one equals count one plus one. And then likewise there is an if statement for 00:03:03.000 --> 00:03:07.262 the I case that bumps up count two, and an if statement for the O case that bumps 00:03:07.262 --> 00:03:11.369 up count three. And here I've got these three print statements, outside the loop. 00:03:11.369 --> 00:03:15.475 So these run right after the loop has completed, so it's bumped up all the counters to 00:03:15.475 --> 00:03:19.686 whatever they're going to be, and then we just print them out. So it's trapped. Huh. 00:03:19.686 --> 00:03:25.358 [inaudible]. So A totally dominates. 377 in or just like, yeah, whatever. Thanks 00:03:25.358 --> 00:03:31.971 for playing. Nice try. Just a little stylistic thing I'll point out here. My 00:03:31.971 --> 00:03:36.738 naming convention here is, I mean it's kind of lame, you know, just one, two, 00:03:36.738 --> 00:03:41.032 three. Another way we could have done this is, in this case we could have called this 00:03:41.032 --> 00:03:45.111 one count A and count I. Like this, count A and count I. So, it would be more 00:03:45.111 --> 00:03:49.405 mnemonic of what it was counting. 
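The three-counter version described above might look something like this. It is a Python sketch under the same assumptions as before: the names in `rows` are invented sample data, so the counts it prints say nothing about the real table.

```python
# Hypothetical sample rows, not the lecture's baby-name data.
rows = [
    {"name": "Emma"},
    {"name": "Ava"},
    {"name": "Levi"},
    {"name": "Leo"},
    {"name": "Naomi"},
    {"name": "Mia"},
]

# Three counters, all set to zero outside the loop.
count1 = 0  # names ending in "a"
count2 = 0  # names ending in "i"
count3 = 0  # names ending in "o"

for row in rows:
    name = row["name"]
    # One if statement per counter, one after another.
    if name.endswith("a"):
        count1 = count1 + 1
    if name.endswith("i"):
        count2 = count2 + 1
    if name.endswith("o"):
        count3 = count3 + 1

# The print statements run once, after the loop has completed.
print("a count:", count1)
print("i count:", count2)
print("o count:", count3)
```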
But then it has the disadvantage that whenever 00:03:49.405 --> 00:03:53.646 you copy pasted or switched from one example to another you would have to, like, 00:03:53.646 --> 00:03:57.779 remember to rename. So, I decided to go with this very trivial simple just one, 00:03:57.779 --> 00:04:01.890 two, three scheme but we could have done something more complicated there. The 00:04:01.890 --> 00:04:05.286 other thing I'll point out is that it's natural to use copy paste 00:04:05.286 --> 00:04:08.638 for these so you kind of get your first case working and then you [inaudible]. 00:04:08.638 --> 00:04:11.904 However, when you do that, there's this very natural error where you have to be very 00:04:11.904 --> 00:04:15.299 careful that you're manipulating the right variable. So that in this "if" statement 00:04:15.299 --> 00:04:18.652 I'm manipulating count one and then in this "if" statement count two and so 00:04:18.652 --> 00:04:21.832 on. That's the sort of thing that [inaudible] might help with, 00:04:21.832 --> 00:04:25.141 but no matter what you're doing it's a common error. So you just 00:04:25.141 --> 00:04:28.914 have to be a little bit careful about that. Right. So now that we've got the 00:04:28.914 --> 00:04:33.476 ability to count multiple things, I wanna kinda expand our data set a little bit. So 00:04:33.641 --> 00:04:37.615 I did this survey, which actually I didn't bring up here. Um, so in Google 00:04:37.615 --> 00:04:41.668 spreadsheets, since everything is just intrinsically online, it works 00:04:41.668 --> 00:04:45.722 really well for sharing data and doing easy stuff. Um, so it has this feature, 00:04:45.722 --> 00:04:49.881 where you can put up a little, sort of, form survey in front of people. So I made 00:04:49.881 --> 00:04:54.251 this trivial survey where I ask gender, and favorite color, and favorite TV show 00:04:54.251 --> 00:04:58.515 that's on, and I just sent this out to my class. 
Um, and the way it works is every 00:04:58.515 --> 00:05:03.258 time someone submitted a set of answers, and this is anonymous, it would go into the 00:05:03.258 --> 00:05:08.309 spreadsheet. And so this is a Google Docs spreadsheet and what you see is, 00:05:08.309 --> 00:05:12.694 it's a table. So here's a column for gender, and here's a column for color. And 00:05:12.694 --> 00:05:17.456 these are just the answers. And what you see is that every time someone types in an answer 00:05:17.456 --> 00:05:22.049 to the survey that just goes in as one row. And so we have data to sort of play 00:05:22.049 --> 00:05:26.418 around with. We have favorite color, favorite TV show, favorite book, and what 00:05:26.418 --> 00:05:30.955 not. What I found is that it's easiest to do stuff with color, and sport, and 00:05:31.123 --> 00:05:35.660 favorite soda drink, 'cause there's enough repetition there. If you look at book, 00:05:35.660 --> 00:05:39.917 there's just so many books published that, you know, most books 00:05:39.917 --> 00:05:44.124 just appear once. Anyway this is interesting data to look at just to see 00:05:44.124 --> 00:05:48.837 what's going on with people who are, I guess, about twenty years old in 2012. So, 00:05:49.005 --> 00:05:53.157 from Google spreadsheets you can export that data in CSV format; I think I 00:05:53.157 --> 00:05:57.253 mentioned that before. It's a really common interchange format, and I just 00:05:57.253 --> 00:06:01.573 cleaned up the data a little bit so I removed dots. There was a problem where 00:06:01.573 --> 00:06:06.118 people would type in Dr. Pepper either with a dot after the first R or not, so I 00:06:06.118 --> 00:06:10.831 just removed all dots, but other than that the data just looks like whatever people typed 00:06:10.831 --> 00:06:16.458 in. So with that data we can do all sorts of interesting problems, so here 00:06:16.458 --> 00:06:20.868 I've set some up. 
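The dot-removal cleanup mentioned above could be done in many ways; the lecture does not say how it was actually performed. As one hedged illustration in Python, with an invented `raw_answers` list standing in for typed survey answers:

```python
# Hypothetical raw answers, typed inconsistently as the lecture describes.
raw_answers = ["Dr. Pepper", "Dr Pepper", "Sprite", "dr. pepper"]

# Remove every dot so both spellings of the same drink collapse into one.
cleaned = [answer.replace(".", "") for answer in raw_answers]

print(cleaned)
```

After this step "Dr. Pepper" and "Dr Pepper" compare equal, though differences in capitalization still remain.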
So this data is available as survey/2012.csv so we could 00:06:20.868 --> 00:06:25.348 load that into a table. There's a function I haven't talked about before that the table 00:06:25.348 --> 00:06:29.882 has, called convert to lowercase. What that does is it goes through the table and it 00:06:29.882 --> 00:06:34.253 modifies all the text to just be lowercase letters. So, for example if we want to 00:06:34.253 --> 00:06:38.635 count, oh, well how many people have blue as their favorite color. Well there's this 00:06:38.635 --> 00:06:42.324 problem that did they type, upper case B blue, or lower case, or you know, all 00:06:42.324 --> 00:06:46.406 lowercase or whatever. So by calling this function, all the data is now 00:06:46.406 --> 00:06:50.292 gonna be lowercase. So we just don't have to think about that variation of what 00:06:50.292 --> 00:06:54.226 people typed. So, I'm gonna do that as a simplification here. Alright. So let me 00:06:54.226 --> 00:06:57.915 look at, so there's some sample problems here, and as usual we've got the 00:06:58.063 --> 00:07:01.850 solutions available, so I'll just try this out. So it says write code to 00:07:01.850 --> 00:07:05.915 print the soda field of each row. So, what I'm gonna do here. I could 00:07:05.915 --> 00:07:10.223 just print the whole row but it's so much data it doesn't make a lot of sense. 00:07:10.223 --> 00:07:14.691 But what could be interesting is, suppose you were curious about what people put for 00:07:14.691 --> 00:07:19.213 the soda data. What you could say is, get field, and you need to know what the names 00:07:19.213 --> 00:07:23.415 of the fields are. They're printed [inaudible] somewhere up here. Anyway the 00:07:23.415 --> 00:07:27.510 name of the field that has the soda drink answer is soda. So I'm just gonna print 00:07:27.510 --> 00:07:34.113 those. And I'm not gonna count anything. Now we'll comment that out. 
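To make the lowercase step and the field printing concrete, here is a small Python sketch. The `convert_to_lowercase` helper is a stand-in written for this example, not the course's table function, and the rows are invented rather than taken from the survey file.

```python
# Hypothetical sample rows standing in for the survey table.
rows = [
    {"gender": "Female", "color": "Blue", "soda": "Sprite"},
    {"gender": "male", "color": "blue", "soda": "Dr Pepper"},
    {"gender": "Male", "color": "BLUE", "soda": "coke"},
]

def convert_to_lowercase(table):
    """Stand-in for the table function named in the lecture: lowercase all text in place."""
    for row in table:
        for field in row:
            row[field] = row[field].lower()

convert_to_lowercase(rows)

sodas = []
for row in rows:
    soda = row["soda"]  # plays the role of getting the soda field of the row
    sodas.append(soda)
    print(soda)
```

With everything lowercased, a single comparison such as `row["color"] == "blue"` matches all three ways of typing the color.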
So if I just 00:07:34.113 --> 00:07:38.222 print that, what we get is it just goes through all the rows. And remember, 00:07:38.222 --> 00:07:42.438 now, it's all lower case, and we can just kinda see what's there. This is maybe a 00:07:42.438 --> 00:07:46.494 good first step if you sorta wanna see, oh, well, what things seem to come up 00:07:46.494 --> 00:07:50.870 [inaudible]? Or just, kind of, if you're just curious about TV shows or movies or 00:07:50.870 --> 00:07:55.352 whatever. [inaudible]. This also shows, I guess, ultimately that row.getfield really 00:07:55.352 --> 00:07:59.568 does just return a string. And so, print understands strings, so if we just put it 00:07:59.568 --> 00:08:03.998 there, [inaudible]. Alright, so what I'd like to do is count the favorite sodas. 00:08:03.998 --> 00:08:08.661 So I wanna say, I'm just gonna say Sprite, Dr. Pepper and Coke. Let's count those 00:08:08.661 --> 00:08:14.433 three. So what I'm going to do is, following my previous strategy, I'll say count 00:08:14.433 --> 00:08:19.694 one equals zero, count two equals zero, and count three equals zero. So my 00:08:19.694 --> 00:08:25.321 intention is that I'll, you know, I'll follow whatever the order is that I said: 00:08:25.540 --> 00:08:31.552 Sprite, Dr. Pepper and Coke. So what do I wanna say? If row dot get field soda is 00:08:31.552 --> 00:08:37.325 equal to Sprite. And then, right, I can just write it in lower case letters, 00:08:37.325 --> 00:08:43.731 'cause I know that it has been changed. So I wanna say, count one is equal to count 00:08:43.731 --> 00:08:52.321 one plus one. So that's counting one drink. We'll just try it. So this is 00:08:52.321 --> 00:08:58.699 count Sprite. Print count one, and I'm gonna get rid of this line. 00:08:58.699 --> 00:09:05.891 I'm not gonna print all the sodas as we go. Okay, let's try that one. Okay. 00:09:05.891 --> 00:09:12.153 Sprite eight, that seems to work. So I wanna check it, because then. 
I'm gonna do 00:09:12.153 --> 00:09:19.300 some merciless copy-paste. So then I'm gonna count Dr. Pepper. And I'm gonna 00:09:19.300 --> 00:09:24.640 manipulate count two in that case. And I'm gonna count Coke. And I'm gonna play count 00:09:24.640 --> 00:09:29.789 three. So this is the case I was saying, where you have to be careful. Copy-paste is 00:09:29.789 --> 00:09:34.997 great, but you gotta make sure you're doing the right thing. >> And here I'll 00:09:34.997 --> 00:09:41.686 make two copies of this line. So then this is gonna be Dr. Pepper, count two. And 00:09:41.686 --> 00:09:48.547 Coke is count three. I have to say, there was one other cleanup I did in the data. 00:09:48.547 --> 00:09:53.371 [inaudible], ... >> Okay. That looks right. One other is how people spelled Coke. 00:09:53.371 --> 00:09:57.037 Sometimes they write Coca Cola, with a dash, or not, or whatever, so I just 00:09:57.037 --> 00:10:01.263 changed those all to Coke. So. This is a different data set, but this is following 00:10:01.263 --> 00:10:05.513 kind of my earlier example of counting three things. So it should do well. So the 00:10:05.513 --> 00:10:09.516 case I'd like to, the complexity I'd like to work out is, like, well, you know, 00:10:09.516 --> 00:10:13.892 really, if you look at the data, sometimes people would say Coke, and sometimes they 00:10:13.892 --> 00:10:18.055 would say Diet Coke. And sometimes they would say Dr. Pepper, and sometimes they 00:10:18.055 --> 00:10:22.324 would say Diet Dr. Pepper. So how could I get count two, let's say, to include both? 00:10:22.324 --> 00:10:27.569 I want to include Dr. Pepper and also Diet Dr. Pepper. And the way to make it more 00:10:27.569 --> 00:10:38.047 inclusive there is to use the or, oops. So I'm going to say, or. Grab this. Just 00:10:38.047 --> 00:10:44.721 as we've done before. I'll say well, so this is Dr. Pepper, or Diet Dr. 00:10:44.721 --> 00:10:55.095 Pepper. And do the same thing for this. 
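Widening each test with or, as described above, can be sketched like this in Python. The rows are invented, already lowercased and stripped of dots to mirror the cleanup the lecture describes; they are not the class survey, so the counts are illustrative only.

```python
# Hypothetical sample rows, already lowercased and with dots removed.
rows = [
    {"soda": "sprite"},
    {"soda": "dr pepper"},
    {"soda": "diet dr pepper"},
    {"soda": "coke"},
    {"soda": "diet coke"},
    {"soda": "coke"},
    {"soda": "water"},
]

count1 = 0  # sprite, regular or diet
count2 = 0  # dr pepper, regular or diet
count3 = 0  # coke, regular or diet

for row in rows:
    soda = row["soda"]
    # "or" makes each test more inclusive: either spelling bumps the counter.
    if soda == "sprite" or soda == "diet sprite":
        count1 = count1 + 1
    if soda == "dr pepper" or soda == "diet dr pepper":
        count2 = count2 + 1
    if soda == "coke" or soda == "diet coke":
        count3 = count3 + 1

print("sprite:", count1)
print("dr pepper:", count2)
print("coke:", count3)
```

This is also the spot where the copy-paste slip the lecture warns about tends to happen: each pasted if statement has to update its own counter, not the one it was copied from.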
[inaudible] Be on this. So I'm gonna say 00:10:55.095 --> 00:11:04.512 Diet Sprite. Oops, [inaudible]. Diet Dr. Pepper and here we'll say or Diet Coke. 00:11:04.512 --> 00:11:13.296 Alright, so without diet it was eight four eight. Assuming this code is correct, 00:11:13.296 --> 00:11:22.236 let's run it. So we see it was eight four eight. So, then Dr. Pepper [inaudible]. No 00:11:22.236 --> 00:11:26.450 one drinks Diet Sprite in my class, apparently. Or likes it best. So Dr. 00:11:26.450 --> 00:11:30.560 Pepper went up from four to seven. So it about doubled. And actually, Coke 00:11:30.560 --> 00:11:34.619 also about doubled. So I guess that we've learned something a little bit 00:11:34.619 --> 00:11:38.833 there. That, for those drinks, diet represents about half. So I also obviously 00:11:38.833 --> 00:11:42.839 like this example, 'cause now we're sort of combining multiple techniques. That 00:11:42.839 --> 00:11:46.689 we're doing the counting, and then we're doing things like using or, and, or 00:11:46.689 --> 00:11:51.761 whatever in the if test that's controlling the count inside the loop. Okay, so 00:11:51.761 --> 00:11:56.781 these are kind of, you know, non-trivial examples. Let me try another 00:11:56.781 --> 00:12:02.548 one. Let's try another one of the fields. Let's try the sport field. So 00:12:02.548 --> 00:12:08.518 there were a bunch of different sports identified, but I'm going 00:12:08.518 --> 00:12:14.081 to look at soccer and football, which were common ones identified. So 00:12:14.081 --> 00:12:19.373 I'll just use count one and count two. So I'll say if sport is equal to soccer, 00:12:19.373 --> 00:12:24.919 let's say we'll do that on count one, and football is the 00:12:24.919 --> 00:12:30.064 other one, so that one we'll do in count two. So here we'll say soccer and 00:12:30.064 --> 00:12:36.197 football. 
And this count three, I'm just gonna stop doing. Okay, so 00:12:36.197 --> 00:12:42.259 this should just go through, and count how many times soccer was the sport that was 00:12:42.259 --> 00:12:47.898 named. And how many times football, and then we'll see if the assumptions work 00:12:47.898 --> 00:12:53.598 out. So you see, we've got soccer seven, football twelve. So football's pretty 00:12:53.598 --> 00:12:59.791 significantly ahead there. So the last thing we want to try here is, well, we also 00:12:59.791 --> 00:13:05.897 have this gender data. So what I wanna do is let's just take this seven number for 00:13:05.897 --> 00:13:10.838 soccer and let's break it down by gender. And so when I say break it 00:13:10.838 --> 00:13:15.903 down, what I am doing is like let's have one counter for women playing soccer and 00:13:15.903 --> 00:13:20.906 one counter for men playing soccer. So let's say count one will be women playing 00:13:20.906 --> 00:13:26.190 soccer. So what do I do here? And this is gonna be an and. So, I 00:13:26.190 --> 00:13:32.048 say, well if it's row dot, get field gender. And then you just 00:13:32.048 --> 00:13:38.566 have to know how the data is coded. In this case, the data is coded as female, is 00:13:38.566 --> 00:13:46.656 the word used for that. So, count one is gonna be answers where the row is 00:13:46.656 --> 00:13:55.211 also where they said [inaudible] and here I'll say soccer and male. And so that will 00:13:55.211 --> 00:14:03.664 go into count two. So here I'll say soccer and female. And then here I'll say soccer, 00:14:03.664 --> 00:14:11.098 oops, M. Okay. So, let's try that. So, soccer without looking at gender, it was 00:14:11.098 --> 00:14:15.626 seven. So now if you look at it you'll see there's actually this huge gender gap. So I 00:14:15.626 --> 00:14:19.969 don't know if maybe there's just a lot of women's soccer team. 
People in my class 00:14:19.969 --> 00:14:24.330 this quarter, or who knows. But anyway, yeah. So of those seven, six were 00:14:24.330 --> 00:14:28.746 female, and there was one male who identified soccer as their favorite sport 00:14:28.746 --> 00:14:33.498 [inaudible]. So, this is just another example of, you know, combining the 00:14:33.498 --> 00:14:37.948 counting with and, and or. My previous one was or. This one, I used and. So, 00:14:38.097 --> 00:14:42.235 that's sort of as complicated as I wanna get with this table data. I think it has a 00:14:42.235 --> 00:14:46.323 nice very kind of realistic feeling that you have a data set. And then 00:14:46.323 --> 00:14:50.410 the computer rips over it with a little bit of this logic that you write. And then 00:14:50.410 --> 00:14:54.349 eventually just comes up with a couple numbers that, you know, are gonna help 00:14:54.349 --> 00:14:58.237 you analyze what's going on. So that's a very realistic way that computers are 00:14:58.237 --> 00:15:00.780 used, and an excellent format for writing exercises.
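The breakdown by gender described above combines two conditions with and. A rough Python sketch follows, again using invented rows rather than the class survey. The field values "female" and "male" follow the coding mentioned in the lecture, while the field names themselves are assumed for this example.

```python
# Hypothetical sample rows; field names are assumed, values are lowercased.
rows = [
    {"gender": "female", "sport": "soccer"},
    {"gender": "male", "sport": "soccer"},
    {"gender": "male", "sport": "football"},
    {"gender": "female", "sport": "soccer"},
    {"gender": "female", "sport": "football"},
]

count1 = 0  # soccer and female
count2 = 0  # soccer and male

for row in rows:
    # "and" narrows each test: both conditions must hold for the row to count.
    if row["sport"] == "soccer" and row["gender"] == "female":
        count1 = count1 + 1
    if row["sport"] == "soccer" and row["gender"] == "male":
        count2 = count2 + 1

print("soccer female:", count1)
print("soccer male:", count2)
```

The two counters add up to the plain soccer count, which gives a quick sanity check on the breakdown whenever every soccer row is coded as one of the two values.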