1 00:00:00,110 --> 00:00:01,760 Okay, so let's do a little exploring and think about the 2 00:00:01,760 --> 00:00:04,620 mechanics of this problem. The first thing I'd like to know, is 3 00:00:04,620 --> 00:00:07,590 how do I go about requesting the data I want from this 4 00:00:07,590 --> 00:00:11,170 website? Again, I'm thinking about how I get to a point where 5 00:00:11,170 --> 00:00:14,820 I can actually do this programmatically. So most web browsers give 6 00:00:14,820 --> 00:00:19,250 you the ability to actually inspect individual elements of a webpage, so 7 00:00:19,250 --> 00:00:22,060 in this case, let's just take a look at that selector. And 8 00:00:22,060 --> 00:00:25,330 if we do that, we can see all of the options right 9 00:00:25,330 --> 00:00:28,000 here in the HTML for the page. And based on 10 00:00:28,000 --> 00:00:31,600 our understanding of HTML, we know that it's these values here 11 00:00:31,600 --> 00:00:33,940 that we would need to submit as part of our post 12 00:00:33,940 --> 00:00:39,130 request. Alright, the options for airports look very similar. So instead 13 00:00:39,130 --> 00:00:42,750 of looking at that, let's take a look at The data 14 00:00:42,750 --> 00:00:44,970 that's here. Now, a couple things that I want to point out 15 00:00:44,970 --> 00:00:48,190 here, before we actually look at the HTML again. One is 16 00:00:48,190 --> 00:00:51,230 that for any given airport, this is reporting both domestic and 17 00:00:51,230 --> 00:00:54,920 international flights for that airport. Now in my case, I'm not actually 18 00:00:54,920 --> 00:00:59,090 interested in that distinction. I simply want to know arrivals and departures. 19 00:00:59,090 --> 00:01:02,070 So where does value in both columns, for a given month? I'm 20 00:01:02,070 --> 00:01:04,410 simply going to add them together. So this is one place that 21 00:01:04,410 --> 00:01:07,220 I'm doing a little bit of reshaping of the data. It's also 22 00:01:07,220 --> 00:01:09,820 the case that I've got these totals here in this columns, and 23 00:01:09,820 --> 00:01:13,250 then at the end of the rows for any given year. I'm 24 00:01:13,250 --> 00:01:16,670 simply going to ignore those. Again, a little bit of reshaping. Okay. 25 00:01:16,670 --> 00:01:20,198 So let's take a look at these elements. Now as you 26 00:01:20,198 --> 00:01:23,036 might expect, these are laid out in a table, and if 27 00:01:23,036 --> 00:01:25,742 I scroll to the top, I can actually see there's a 28 00:01:25,742 --> 00:01:29,648 class attribute for this particular table here. And this is going to help 29 00:01:29,648 --> 00:01:32,252 me when it comes time to actually parse this HTML in 30 00:01:32,252 --> 00:01:36,130 order to extract the data. So we've looked at, both how to 31 00:01:36,130 --> 00:01:38,940 go about getting the values we're going to need to submit 32 00:01:38,940 --> 00:01:42,300 in a post request in order to get the data we need. 33 00:01:42,300 --> 00:01:45,410 And then we've looked at once that data is presented to us, or 34 00:01:45,410 --> 00:01:49,950 in this case, to our program that's going to be accessing the site. 35 00:01:49,950 --> 00:01:52,100 How do we go about finding that data and pulling that data out 36 00:01:52,100 --> 00:01:55,850 of the HTML? Or at least where is it located in the HTML file?