Return to Video

Making Lookup Faster - Intro to Computer Science

  • 0:00 - 0:02
    So what do we need to do to make lookup faster?
  • 0:02 - 0:06
    Well let's think about why it was so slow, right? The reason
  • 0:06 - 0:09
    it was so slow is that we were doing this for loop,
  • 0:09 - 0:12
    we were going through all the elements in order, and we're checking
  • 0:12 - 0:14
    if they match the keyword, right? And we had to do this
  • 0:14 - 0:19
    for the entire index, for an entry, for a keyword that's not
  • 0:19 - 0:21
    in the index, To determine that it's not there, we had to
  • 0:21 - 0:26
    go through the whole index. This is not how we use indexes
  • 0:26 - 0:29
    in real life, right? If you're looking for a word in the
  • 0:29 - 0:32
    index of a book, you don't have to look through every single entry
  • 0:32 - 0:35
    to see if that word exists. You can jump around, and the
  • 0:35 - 0:38
    reason that you can jump around is because the entries in the index,
  • 0:38 - 0:41
    are sorted, they're sorted in alphabetical order, so you know where that
  • 0:41 - 0:44
    entry would belong. You just need to find the right place and see
  • 0:44 - 0:47
    if it's there. So we can do that with our index instead
  • 0:47 - 0:51
    of having our index kept in arbitrary order. If we kept our index
  • 0:51 - 0:54
    in assorted order, then we could find the place where
  • 0:54 - 0:58
    that entry belongs and look for it. Sorting is a very
  • 0:58 - 1:01
    interesting problem. It's something we're not actually going to talk about more
  • 1:01 - 1:04
    in this class. We're going to do a different way of doing
  • 1:04 - 1:06
    that. What we're going to do is find a way to
  • 1:06 - 1:10
    find where the entry should be that doesn't require actually keeping
  • 1:10 - 1:12
    all the entries sorted. What we want is something that will
  • 1:12 - 1:16
    allow us, given a keyword, we're going to have some function that
  • 1:16 - 1:19
    tells us where it belongs. We're going to call that
  • 1:19 - 1:23
    a hash function. That tells us where in the entry
  • 1:23 - 1:26
    to look. And so instead of having to look through
  • 1:26 - 1:29
    the whole index, the hash function will tell us where
  • 1:29 - 1:33
    that entry belongs. So what we need for this is
  • 1:33 - 1:35
    some function that's going to take a key word, map
  • 1:35 - 1:38
    it to a number And that number is the position
  • 1:38 - 1:41
    in the index where that number belongs. We could do this
  • 1:41 - 1:44
    lots of different ways. One simple thing would be to
  • 1:44 - 1:48
    think, well we know alphabet. This is more like the way
  • 1:48 - 1:51
    an index for a book would work, and we're going to
  • 1:51 - 1:55
    have for each entry and index, we'll have based on the
  • 1:55 - 1:58
    first letter, we'll put all the entries that start with
  • 1:58 - 2:01
    tha tfirst letter in the same place. So, if we're looking
  • 2:01 - 2:05
    for a key word that starts with u, that prefer hash
  • 2:05 - 2:06
    would tell us to look in the place where all the
  • 2:06 - 2:09
    words that start with u are. And then we'd only have to
  • 2:09 - 2:11
    look through the words that start with u. So this would allow
  • 2:11 - 2:14
    us to do a look up much more quickly than looking through
  • 2:14 - 2:17
    the whole index. This isn't quite the best way to do things. If
  • 2:17 - 2:21
    we made our places based on the letter, well, then we have
  • 2:21 - 2:24
    a problem if we have two words with the same first letter. You
  • 2:24 - 2:27
    generally expect to have more than one word that starts with the
  • 2:27 - 2:32
    same letter. So instead of having just an element here for each position,
  • 2:32 - 2:35
    we're going to have a list of elements that would be all
  • 2:35 - 2:39
    the words that start with u So when we look up the
  • 2:39 - 2:43
    word udacity, we look in the entry for u, and if the
  • 2:43 - 2:46
    word that's there doesn't match then we know udacity isn't in the
  • 2:46 - 2:49
    index. There are lots of problems with this. The first problem is
  • 2:49 - 2:51
    well, there might be more than one word that starts with u.
  • 2:51 - 2:54
    So we can't just have one entry here. What we need to
  • 2:54 - 2:57
    have is a list of entries We often call this a bucket.
  • 2:57 - 2:59
    So we need a bucket of all the entries that
  • 2:59 - 3:03
    start with u that would be in this position. So, instead
  • 3:03 - 3:06
    of just having one entry like the old structure of our
  • 3:06 - 3:08
    index, now we're going to have a list of entries, and
  • 3:08 - 3:11
    each element of the index will now be a bucket, which
  • 3:11 - 3:15
    is a list of entries that are in the right position.
  • 3:15 - 3:18
    This is going to be our bucket of our, all the entries
  • 3:18 - 3:22
    that start with u, and that would have all the different
  • 3:22 - 3:26
    entries that start with the letter u in that bucket. So this
  • 3:26 - 3:29
    is getting better. Now, for each look up, instead of having to
  • 3:29 - 3:31
    look through all of the words in index, we just need to
  • 3:31 - 3:34
    find the position that starts with the right letter. That's got a
  • 3:34 - 3:37
    bucket of all the words that start with that letter, and then
  • 3:37 - 3:40
    we just need to look through that bucket. This works okay, but
  • 3:40 - 3:43
    this doesn't really scale very well. At best, if we have you
  • 3:43 - 3:48
    know, ten million words well now instead of having ten million entries
  • 3:48 - 3:51
    to go through, we need to go through ten million divided
  • 3:51 - 3:55
    by say 26, if we have 26 letters. It's not making
  • 3:55 - 3:58
    things much faster. It's making things maybe at best, 26 times
  • 3:58 - 4:02
    letter. That assumes that all of the buckets are the same size.
  • 4:02 - 4:04
    Certainly if we make the buckets based on the first letter,
  • 4:04 - 4:06
    that's not going to be the same size. If the words
  • 4:06 - 4:09
    are a typical English words. We're going to have many more
  • 4:09 - 4:12
    words that start with s or t, say than start with u.
  • 4:12 - 4:16
    So, we want to fix those two problems. We want to
  • 4:16 - 4:18
    be able to have more buckets. So we're not going to just
  • 4:18 - 4:20
    use the first letter, we're going to use some function on the
  • 4:20 - 4:24
    whole word that tells us where it belongs. And we're going to try
  • 4:24 - 4:28
    to make that function distribute the words fairly well. So the
  • 4:28 - 4:30
    structure that I've described is what's called a hash table. This
  • 4:30 - 4:33
    is a very useful data structure. It's so useful that it's
  • 4:33 - 4:37
    built into Python. There's the Python type called a dictionary. Which provides
  • 4:37 - 4:40
    this functionality. At the end of today's unit, I'll explain
  • 4:40 - 4:42
    how the Python dictionary works, and how to use it,
  • 4:42 - 4:45
    and we'll modify the search engine code to use dictionary
  • 4:45 - 4:48
    instead of the lookup table that we built, but before
  • 4:48 - 4:51
    we do that, we are going to implement it ourselves. We
  • 4:51 - 4:53
    are going to make sure that we understand how the hash
  • 4:53 - 4:55
    table works by writing all the code to do it
  • 4:55 - 4:59
    ourselves and then we'll switch to using the built-in Python [INAUDIBLE]
Title:
Making Lookup Faster - Intro to Computer Science
Description:

more » « less
Video Language:
English
Team:
Udacity
Project:
CS101 - Intro to Computer Science
Duration:
04:59

English subtitles

Revisions Compare revisions