## Making Lookup Faster - Intro to Computer Science

• 0:00 - 0:02
So what do we need to do to make lookup faster?
• 0:02 - 0:06
Well let's think about why it was so slow, right? The reason
• 0:06 - 0:09
it was so slow is that we were doing this for loop,
• 0:09 - 0:12
we were going through all the elements in order, and we're checking
• 0:12 - 0:14
if they match the keyword, right? And we had to do this
• 0:14 - 0:19
for the entire index, for an entry, for a keyword that's not
• 0:19 - 0:21
in the index, To determine that it's not there, we had to
• 0:21 - 0:26
go through the whole index. This is not how we use indexes
• 0:26 - 0:29
in real life, right? If you're looking for a word in the
• 0:29 - 0:32
index of a book, you don't have to look through every single entry
• 0:32 - 0:35
to see if that word exists. You can jump around, and the
• 0:35 - 0:38
reason that you can jump around is because the entries in the index,
• 0:38 - 0:41
are sorted, they're sorted in alphabetical order, so you know where that
• 0:41 - 0:44
entry would belong. You just need to find the right place and see
• 0:44 - 0:47
if it's there. So we can do that with our index instead
• 0:47 - 0:51
of having our index kept in arbitrary order. If we kept our index
• 0:51 - 0:54
in assorted order, then we could find the place where
• 0:54 - 0:58
that entry belongs and look for it. Sorting is a very
• 0:58 - 1:01
interesting problem. It's something we're not actually going to talk about more
• 1:01 - 1:04
in this class. We're going to do a different way of doing
• 1:04 - 1:06
that. What we're going to do is find a way to
• 1:06 - 1:10
find where the entry should be that doesn't require actually keeping
• 1:10 - 1:12
all the entries sorted. What we want is something that will
• 1:12 - 1:16
allow us, given a keyword, we're going to have some function that
• 1:16 - 1:19
tells us where it belongs. We're going to call that
• 1:19 - 1:23
a hash function. That tells us where in the entry
• 1:23 - 1:26
to look. And so instead of having to look through
• 1:26 - 1:29
the whole index, the hash function will tell us where
• 1:29 - 1:33
that entry belongs. So what we need for this is
• 1:33 - 1:35
some function that's going to take a key word, map
• 1:35 - 1:38
it to a number And that number is the position
• 1:38 - 1:41
in the index where that number belongs. We could do this
• 1:41 - 1:44
lots of different ways. One simple thing would be to
• 1:44 - 1:48
think, well we know alphabet. This is more like the way
• 1:48 - 1:51
an index for a book would work, and we're going to
• 1:51 - 1:55
have for each entry and index, we'll have based on the
• 1:55 - 1:58
• 1:58 - 2:01
tha tfirst letter in the same place. So, if we're looking
• 2:01 - 2:05
for a key word that starts with u, that prefer hash
• 2:05 - 2:06
would tell us to look in the place where all the
• 2:06 - 2:09
words that start with u are. And then we'd only have to
• 2:09 - 2:11
look through the words that start with u. So this would allow
• 2:11 - 2:14
us to do a look up much more quickly than looking through
• 2:14 - 2:17
the whole index. This isn't quite the best way to do things. If
• 2:17 - 2:21
we made our places based on the letter, well, then we have
• 2:21 - 2:24
a problem if we have two words with the same first letter. You
• 2:24 - 2:27
generally expect to have more than one word that starts with the
• 2:27 - 2:32
same letter. So instead of having just an element here for each position,
• 2:32 - 2:35
we're going to have a list of elements that would be all
• 2:35 - 2:39
the words that start with u So when we look up the
• 2:39 - 2:43
word udacity, we look in the entry for u, and if the
• 2:43 - 2:46
word that's there doesn't match then we know udacity isn't in the
• 2:46 - 2:49
index. There are lots of problems with this. The first problem is
• 2:49 - 2:51
well, there might be more than one word that starts with u.
• 2:51 - 2:54
So we can't just have one entry here. What we need to
• 2:54 - 2:57
have is a list of entries We often call this a bucket.
• 2:57 - 2:59
So we need a bucket of all the entries that
• 2:59 - 3:03
• 3:03 - 3:06
of just having one entry like the old structure of our
• 3:06 - 3:08
index, now we're going to have a list of entries, and
• 3:08 - 3:11
each element of the index will now be a bucket, which
• 3:11 - 3:15
is a list of entries that are in the right position.
• 3:15 - 3:18
This is going to be our bucket of our, all the entries
• 3:18 - 3:22
• 3:22 - 3:26
entries that start with the letter u in that bucket. So this
• 3:26 - 3:29
is getting better. Now, for each look up, instead of having to
• 3:29 - 3:31
look through all of the words in index, we just need to
• 3:31 - 3:34
find the position that starts with the right letter. That's got a
• 3:34 - 3:37
bucket of all the words that start with that letter, and then
• 3:37 - 3:40
we just need to look through that bucket. This works okay, but
• 3:40 - 3:43
this doesn't really scale very well. At best, if we have you
• 3:43 - 3:48
know, ten million words well now instead of having ten million entries
• 3:48 - 3:51
to go through, we need to go through ten million divided
• 3:51 - 3:55
by say 26, if we have 26 letters. It's not making
• 3:55 - 3:58
things much faster. It's making things maybe at best, 26 times
• 3:58 - 4:02
letter. That assumes that all of the buckets are the same size.
• 4:02 - 4:04
Certainly if we make the buckets based on the first letter,
• 4:04 - 4:06
that's not going to be the same size. If the words
• 4:06 - 4:09
are a typical English words. We're going to have many more
• 4:09 - 4:12
• 4:12 - 4:16
So, we want to fix those two problems. We want to
• 4:16 - 4:18
be able to have more buckets. So we're not going to just
• 4:18 - 4:20
use the first letter, we're going to use some function on the
• 4:20 - 4:24
whole word that tells us where it belongs. And we're going to try
• 4:24 - 4:28
to make that function distribute the words fairly well. So the
• 4:28 - 4:30
structure that I've described is what's called a hash table. This
• 4:30 - 4:33
is a very useful data structure. It's so useful that it's
• 4:33 - 4:37
built into Python. There's the Python type called a dictionary. Which provides
• 4:37 - 4:40
this functionality. At the end of today's unit, I'll explain
• 4:40 - 4:42
how the Python dictionary works, and how to use it,
• 4:42 - 4:45
and we'll modify the search engine code to use dictionary
• 4:45 - 4:48
instead of the lookup table that we built, but before
• 4:48 - 4:51
we do that, we are going to implement it ourselves. We
• 4:51 - 4:53
are going to make sure that we understand how the hash
• 4:53 - 4:55
table works by writing all the code to do it
• 4:55 - 4:59
ourselves and then we'll switch to using the built-in Python [INAUDIBLE]
Title:
Making Lookup Faster - Intro to Computer Science
Description:

more » « less
Video Language:
English
Team:
Udacity
Project:
CS101 - Intro to Computer Science
Duration:
04:59
 Udacity Robot edited English subtitles for 18-18 Making Lookup Faster Udacity Robot edited English subtitles for 18-18 Making Lookup Faster Udacity Robot edited English subtitles for 18-18 Making Lookup Faster Udacity Robot edited English subtitles for 18-18 Making Lookup Faster Cogi-Admin edited English subtitles for 18-18 Making Lookup Faster

# English subtitles

## Revisions Compare revisions

• API
Udacity Robot
• API
Udacity Robot
• API
Udacity Robot
• API
Udacity Robot
• API