
Familiarize Yourself with the Dataset - Data Wrangling with MongoDB

  • 0:00 - 0:03
    Okay, so let's familiarize ourselves with this dataset a little bit
  • 0:03 - 0:06
    more, and what I really mean here is let's get a little
  • 0:06 - 0:09
    bit better understanding of the OpenStreetMap project itself and
  • 0:09 - 0:12
    get started learning what we need to know in order to do
  • 0:12 - 0:15
    this particular case study. Okay, so I'm actually going to go
  • 0:15 - 0:19
    ahead and submit a search for Chicago to OpenStreetMap and
  • 0:19 - 0:22
    you'll see that I get a number of different results here.
  • 0:22 - 0:25
    The one I'm interested in is this one, the city boundary of
  • 0:25 - 0:28
    Chicago, Cook County, Illinois, United States of America. Doing this is
  • 0:28 - 0:32
    essentially going to select from the OpenStreetMap data set,
  • 0:32 - 0:34
    just that data that has to do with the city of
  • 0:34 - 0:37
    Chicago. So I'm going to click through to this and you can
  • 0:37 - 0:41
    see the outline here that identifies the city boundary. And so,
  • 0:41 - 0:43
    the data that I would be working with is anything that falls
  • 0:43 - 0:46
    in here. So now if I click export, what's going to happen
  • 0:46 - 0:50
    is that I'll see the latitude and longitude and it's going to tell me,
  • 0:50 - 0:53
    this is too large to be exported. Okay but
  • 0:53 - 0:56
    then, if I scroll down, I can see that there
  • 0:56 - 1:01
    are actually already prepared extracts from this data set. Okay,
  • 1:01 - 1:02
    so let me make this a little bit bigger. Now
  • 1:02 - 1:06
    what this is, is pre-prepared extracts from this particular
  • 1:06 - 1:11
    data set and these are extracts for major metropolitan areas.
  • 1:11 - 1:14
    So, you can see I've actually clicked on the Chicago link
  • 1:14 - 1:16
    before. I'm going to go ahead and click on that and
  • 1:16 - 1:21
    then here I have an opportunity to download a zipped up version of this OSM data
  • 1:21 - 1:23
    as XML. So I click on that and
  • 1:23 - 1:26
    it begins downloading. And once it's done downloading, then
  • 1:26 - 1:31
    we can go ahead and take a look at it. Alright, this data is downloaded, I
  • 1:31 - 1:33
    am going to take a look at it.
  • 1:33 - 1:36
    It's in my downloads directory. I've already unzipped it,
  • 1:38 - 1:40
    and here it is. Lots and lots and lots of
  • 1:40 - 1:43
    XML data. This should look somewhat familiar to you. We've seen
  • 1:43 - 1:47
    these node tags before and we've actually extracted a little data
  • 1:47 - 1:51
    from this data set previously. Okay, I'm going to use the
  • 1:51 - 1:54
    shell command ls to see what the size of this dataset is.
  • 1:54 - 1:57
    You can see here that it's about 1.8 gigabytes. So, this
  • 1:57 - 2:00
    is a huge dataset. What this means is that in order
  • 2:00 - 2:03
    to process this data, we can't really read it into memory.
  • 2:03 - 2:05
    And so as you'll see a little bit later on, we're going to
  • 2:05 - 2:09
    use an approach to parsing this that uses a SAX parser, which we did
  • 2:09 - 2:13
    look at a little bit in a previous lesson. Okay so now, what I
  • 2:13 - 2:15
    usually do in a situation like this is I explore the data itself a
  • 2:15 - 2:18
    little bit. I might even write a little code to parse out a
  • 2:18 - 2:21
    little of what's here just so I can get a feel for it. The
  • 2:21 - 2:24
    next thing then that we'll want to do is to read enough documentation to
  • 2:24 - 2:28
    answer any questions that we have or at least enough to get us started.
  • 2:28 - 2:31
    Okay so what I'm going to do here, then, is simply query for
  • 2:31 - 2:35
    OpenStreetMap documentation and I can see that there's actually a
  • 2:35 - 2:37
    Wiki for OpenStreetMap. So if I click through to that,
  • 2:37 - 2:40
    I land on the Wiki page which gives me quite a bit
  • 2:40 - 2:42
    of information about OpenStreetMap. Now I'm going to make this ridiculously
  • 2:42 - 2:45
    big so that you can see it on your screen. And if
  • 2:45 - 2:48
    we scroll down, you can see there's a lot to read here.
  • 2:48 - 2:53
    So you can see there's a beginners guide, information for developers, and other
  • 2:53 - 2:56
    pieces of information - map features, that sort of thing.
  • 2:56 - 2:59
    I know from having looked at this page before that there's
  • 2:59 - 3:03
    actually documentation on the XML format which is going to
  • 3:03 - 3:06
    be useful to us as we move forward. This provides an
  • 3:06 - 3:08
    example of the different sorts of tags that you're going to
  • 3:08 - 3:10
    see in this data set and a little bit of an
  • 3:10 - 3:14
    explanation about it. For example, we can see from this documentation
  • 3:14 - 3:19
    that this data is essentially instances of 3 different data primitives.
  • 3:19 - 3:22
    Data primitives being nodes, ways, and relations and
  • 3:22 - 3:23
    if we click through to any one of
  • 3:23 - 3:25
    these, we'll get a little bit more information
  • 3:25 - 3:28
    about them, okay? So I encourage you to take
  • 3:28 - 3:30
    a look at this documentation for yourself. Make
  • 3:30 - 3:33
    sure you understand nodes, ways and relations and
  • 3:33 - 3:36
    you'll be in good shape to move on. Just do a little bit of reading for now.
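
The transcript mentions writing a little code to get a feel for the data, and using a SAX parser because the 1.8 gigabyte file can't really be read into memory. A minimal sketch of that idea follows, using Python's standard `xml.sax` module to count the three data primitives (nodes, ways, relations) as the file streams past. The names `PrimitiveCounter`, `count_primitives`, and `SAMPLE` are hypothetical and chosen for illustration only; the tiny sample document is invented, and this is not necessarily the code used later in the course.

```python
import io
import xml.sax
from collections import Counter

# The three OpenStreetMap data primitives named in the documentation.
PRIMITIVES = {"node", "way", "relation"}


class PrimitiveCounter(xml.sax.ContentHandler):
    """Hypothetical handler that tallies primitive elements as they stream by."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        # Called once per opening tag; nothing is kept except the tallies,
        # so memory use stays small no matter how large the file is.
        if name in PRIMITIVES:
            self.counts[name] += 1


def count_primitives(source):
    """Count primitives in `source`, a file path or a binary file-like object."""
    handler = PrimitiveCounter()
    xml.sax.parse(source, handler)
    return handler.counts


# Invented sample in the general shape of OSM XML, for demonstration only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="41.88" lon="-87.63"/>
  <node id="2" lat="41.89" lon="-87.62">
    <tag k="name" v="Example"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
  </way>
  <relation id="100">
    <member type="way" ref="10" role="outer"/>
  </relation>
</osm>
"""

if __name__ == "__main__":
    print(count_primitives(io.BytesIO(SAMPLE)))
```

To try this on a real extract, pass the path of the unzipped file to `count_primitives` instead of the in-memory sample. Because the handler reacts to each tag as it arrives rather than building a full tree, the same approach should scale to an extract of this size.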
Title:
Familiarize Yourself with the Dataset - Data Wrangling with MongoDB
Video Language:
English
Team:
Udacity
Project:
UD032: Data Wrangling with MongoDB
Duration:
03:36
