
Using Indexes - Data Wrangling with MongoDB

  • 0:00 - 0:01
    All right. So now I would like to show you a
  • 0:01 - 0:07
    large collection, and the effect that indexing can have on performance. Now,
  • 0:07 - 0:10
    our collection of tweets is relatively small. So instead, I'm going
  • 0:10 - 0:13
    to revisit the Open Street Map dataset, which you should remember from
  • 0:13 - 0:16
    the previous lesson. This will also serve as a nice transition
  • 0:16 - 0:19
    to the next lesson, which is a project using the Open Street
  • 0:19 - 0:22
    Map data. So for this example, I'm going to work in
  • 0:22 - 0:25
    the MongoDB shell, and the database in which this data is stored
  • 0:25 - 0:28
    is the OSM database. So I'll switch to using
  • 0:28 - 0:33
    that, and then let's take a look at what documents
  • 0:33 - 0:34
    look like in this database, and to do that,
  • 0:34 - 0:37
    we'll just do a find everything. Okay, so then if
  • 0:37 - 0:40
    we scroll up, you see what these documents look
  • 0:40 - 0:44
    like. There's location information, latitude/longitude. So all data in this
  • 0:44 - 0:48
    collection is tied to a specific location. And something I'd
  • 0:48 - 0:50
    like to point out is that some of the documents
  • 0:50 - 0:53
    in this collection actually have a tg field. And this
  • 0:53 - 0:55
    is shorthand for tag. So if you remember, we looked
  • 0:55 - 1:00
    before at how specific locations will be tagged from time
  • 1:00 - 1:03
    to time within this particular data set. And the way
  • 1:03 - 1:06
    these tags work in this collection is, there is a
  • 1:06 - 1:10
    tag field that is array valued and each of the
  • 1:10 - 1:13
    individual values in this array is a sub-document with a
  • 1:13 - 1:16
    "k" and "v" field, much like we saw in the XML
  • 1:16 - 1:18
    version of this data set. Now, the reason why the data
  • 1:18 - 1:22
    is represented this way is because there could be multiple taggings
  • 1:22 - 1:25
    all with the same type or key. So storing them in
  • 1:25 - 1:28
    an array this way gives us the ability to do that without
  • 1:28 - 1:32
    one tag writing over another as we put them into MongoDB.
  • 1:32 - 1:35
    One alternative would be to have this be a field name out
  • 1:35 - 1:38
    here, and this be the value for that field. Okay, now
  • 1:38 - 1:41
    something I'd like to point out about this data set is there
  • 1:41 - 1:46
    are more than seven million documents in this particular collection. Now, we've
  • 1:46 - 1:50
    just talked about indexing, and how indexes can improve performance. So let's
  • 1:50 - 1:53
    take a look at a query and the amount of time it
  • 1:53 - 1:54
    takes for that query to come back. Right now, I don't have
  • 1:54 - 1:58
    an index on this tag field. So let's do a query and
  • 1:58 - 2:00
    see what our performance looks like. Now this particular collection that I've
  • 2:00 - 2:03
    loaded here has actually all of the Open Street Map data from
  • 2:03 - 2:06
    the city of Chicago again. And what I'm doing here is querying
  • 2:06 - 2:12
    for any nodes or any geographic locations that have been tagged with the
  • 2:12 - 2:17
    name Giordano's, which is a famous Chicago pizza chain. So if you do this query
  • 2:20 - 2:22
    You can see that it takes a little while to
  • 2:22 - 2:24
    come back. Let's do a pretty version of this query.
  • 2:26 - 2:28
    Okay, so a couple of seconds, right, for the query
  • 2:28 - 2:31
    to come back. So, if we're doing a single query,
  • 2:31 - 2:34
    not that big of a deal. The fact is,
  • 2:34 - 2:39
    in most applications, we're doing many queries. In some applications
  • 2:39 - 2:41
    hundreds, or maybe even thousands, or tens of thousands of
  • 2:41 - 2:45
    queries in a very short period of time. So, waiting
  • 2:45 - 2:48
    two seconds for a query to come back, given the
  • 2:48 - 2:52
    load that places on the database server and all those simultaneous queries
  • 2:52 - 2:54
    going on at the same time simply just doesn't work
  • 2:54 - 2:57
    for our applications. It's death to our application, as I mentioned
  • 2:57 - 3:01
    previously. So if we built an index on this collection, then
  • 3:01 - 3:04
    our performance improves significantly because instead of having to do a
  • 3:04 - 3:07
    table scan, as we saw before, I can simply go right
  • 3:07 - 3:10
    to the place on disk where these particular documents matching my
  • 3:10 - 3:14
    query are located. Now, going too far beyond where we're already
  • 3:14 - 3:17
    at with indexing is beyond the scope of this class. So
  • 3:17 - 3:20
    I'm just going to essentially get you started with indexes
  • 3:20 - 3:23
    here. And encourage you to go look at the MongoDB
  • 3:23 - 3:26
    documentation or take a look at the free online courses
  • 3:26 - 3:30
    we offer on university.mongodb.com (see the instructor notes) for
  • 3:30 - 3:33
    a comprehensive treatment of all of these topics that we've
  • 3:33 - 3:36
    been discussing with regard to MongoDB, including indexing. Okay,
  • 3:36 - 3:38
    so let's build my index, and the way I do
  • 3:38 - 3:42
    this is specifying the field on which I would like the
  • 3:42 - 3:45
    index created. And I do this simply by saying, make
  • 3:45 - 3:48
    it the case that there is an index on the
  • 3:48 - 3:54
    field tg in the collection nodes. Okay? Now, again, more
  • 3:54 - 3:56
    than seven million documents in this collection, so this is going to
  • 3:56 - 3:58
    take a little while to come back. It's going to take
  • 3:58 - 4:01
    a little while to build the index. Had we created the
  • 4:01 - 4:03
    index at first then as we were loading data into
  • 4:03 - 4:07
    the collection it would have been updated with each write to
  • 4:07 - 4:10
    the database. But in this case we're building the index after
  • 4:10 - 4:11
    we've loaded all the data in so that I could work through
  • 4:11 - 4:15
    this example. So, we'll just skip ahead in the video to the
  • 4:15 - 4:17
    point where the index is actually created. This is going to take a
  • 4:17 - 4:21
    couple of minutes. Okay. So now with the index created, let's run
  • 4:21 - 4:24
    that query again and look at the performance difference. It was immediate,
  • 4:24 - 4:29
    really. We got back our documents right away. And so that illustrates
  • 4:29 - 4:33
    the performance differences in using an index versus not using an index.
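
To make the difference concrete, here is a minimal sketch in plain JavaScript (the language of the MongoDB shell) that imitates what the video describes. It does not talk to a MongoDB server: the sample documents, ids, coordinates, and tag values are made up, and the function names scanForTag, buildTagIndex, and lookupTag are hypothetical. The documents follow the shape described above, with a tg array of sub-documents that each have a "k" and a "v" field, and a Map stands in for the index. It is meant only to show why a lookup structure avoids examining every document, not how MongoDB implements its indexes.

```javascript
// Illustrative model only: plain JavaScript, no MongoDB server involved.
// All ids, coordinates, and tag values below are made up for the example.
const nodes = [
  { id: 1, pos: [41.88, -87.63],
    tg: [{ k: "name", v: "Giordano's" }, { k: "cuisine", v: "pizza" }] },
  { id: 2, pos: [41.89, -87.62] }, // some documents have no tg field at all
  // Two taggings with the same key: the array keeps both, whereas using
  // "name" as a field name would let one value overwrite the other.
  { id: 3, pos: [41.90, -87.65],
    tg: [{ k: "name", v: "Giordano's" }, { k: "name", v: "Giordano's Pizza" }] },
  { id: 4, pos: [41.85, -87.60], tg: [{ k: "amenity", v: "cafe" }] },
];

// Without an index: examine every document (a full scan of the collection).
function scanForTag(docs, k, v) {
  let examined = 0;
  const hits = [];
  for (const doc of docs) {
    examined += 1;
    if ((doc.tg || []).some((t) => t.k === k && t.v === v)) {
      hits.push(doc.id);
    }
  }
  return { hits, examined };
}

// With an index: map each (k, v) pair to the ids of the documents that
// contain it. The structure is built once and reused by every later query.
function buildTagIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const t of doc.tg || []) {
      const key = JSON.stringify([t.k, t.v]);
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc.id);
    }
  }
  return index;
}

// A query against the index is a single lookup instead of a scan.
function lookupTag(index, k, v) {
  return index.get(JSON.stringify([k, v])) || [];
}

const scanned = scanForTag(nodes, "name", "Giordano's");
const tagIndex = buildTagIndex(nodes);
const indexed = lookupTag(tagIndex, "name", "Giordano's");

console.log(scanned.hits, "found after examining", scanned.examined, "documents");
console.log(indexed, "found with one index lookup");
```

Both approaches return the same documents; the scan has to touch every document on every query, while the index pays its cost once, when it is built. That mirrors the trade-off in the video: building the index on seven million documents takes a couple of minutes, and queries afterwards come back right away. In the shell itself, an index on the tg field of the nodes collection would typically be created with the collection's createIndex method; the exact command typed in the video is not part of this transcript.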
Title:
Using Indexes - Data Wrangling with MongoDB
Video Language:
English
Team:
Udacity
Project:
UD032: Data Wrangling with MongoDB
Duration:
04:34
