## ← Making Lookup Faster - Intro to Computer Science

• 2 Followers
• 96 Lines

### Get Embed Code x Embed video Use the following code to embed this video. See our usage guide for more details on embedding. Paste this in your document somewhere (closest to the closing body tag is preferable): ```<script type="text/javascript" src='https://amara.org/embedder-iframe'></script> ``` Paste this inside your HTML body, where you want to include the widget: ```<div class="amara-embed" data-url="http://www.youtube.com/watch?v=AArXvYMTCOM" data-team="udacity"></div> ``` 2 Languages

Showing Revision 5 created 05/24/2016 by Udacity Robot.

1. So what do we need to do to make lookup faster?

2. Well let's think about why it was so slow, right? The reason
3. it was so slow is that we were doing this for loop,
4. we were going through all the elements in order, and we're checking
5. if they match the keyword, right? And we had to do this
6. for the entire index, for an entry, for a keyword that's not
7. in the index, To determine that it's not there, we had to
8. go through the whole index. This is not how we use indexes
9. in real life, right? If you're looking for a word in the
10. index of a book, you don't have to look through every single entry
11. to see if that word exists. You can jump around, and the
12. reason that you can jump around is because the entries in the index,
13. are sorted, they're sorted in alphabetical order, so you know where that
14. entry would belong. You just need to find the right place and see
15. if it's there. So we can do that with our index instead
16. of having our index kept in arbitrary order. If we kept our index
17. in assorted order, then we could find the place where
18. that entry belongs and look for it. Sorting is a very
19. interesting problem. It's something we're not actually going to talk about more
20. in this class. We're going to do a different way of doing
21. that. What we're going to do is find a way to
22. find where the entry should be that doesn't require actually keeping
23. all the entries sorted. What we want is something that will
24. allow us, given a keyword, we're going to have some function that
25. tells us where it belongs. We're going to call that
26. a hash function. That tells us where in the entry
27. to look. And so instead of having to look through
28. the whole index, the hash function will tell us where
29. that entry belongs. So what we need for this is
30. some function that's going to take a key word, map
31. it to a number And that number is the position
32. in the index where that number belongs. We could do this
33. lots of different ways. One simple thing would be to
34. think, well we know alphabet. This is more like the way
35. an index for a book would work, and we're going to
36. have for each entry and index, we'll have based on the
38. tha tfirst letter in the same place. So, if we're looking
39. for a key word that starts with u, that prefer hash
40. would tell us to look in the place where all the
41. words that start with u are. And then we'd only have to
42. look through the words that start with u. So this would allow
43. us to do a look up much more quickly than looking through
44. the whole index. This isn't quite the best way to do things. If
45. we made our places based on the letter, well, then we have
46. a problem if we have two words with the same first letter. You
47. generally expect to have more than one word that starts with the
48. same letter. So instead of having just an element here for each position,
49. we're going to have a list of elements that would be all
50. the words that start with u So when we look up the
51. word udacity, we look in the entry for u, and if the
52. word that's there doesn't match then we know udacity isn't in the
53. index. There are lots of problems with this. The first problem is
54. well, there might be more than one word that starts with u.
55. So we can't just have one entry here. What we need to
56. have is a list of entries We often call this a bucket.
57. So we need a bucket of all the entries that
59. of just having one entry like the old structure of our
60. index, now we're going to have a list of entries, and
61. each element of the index will now be a bucket, which
62. is a list of entries that are in the right position.
63. This is going to be our bucket of our, all the entries
64. that start with u, and that would have all the different
65. entries that start with the letter u in that bucket. So this
66. is getting better. Now, for each look up, instead of having to
67. look through all of the words in index, we just need to
68. find the position that starts with the right letter. That's got a
69. bucket of all the words that start with that letter, and then
70. we just need to look through that bucket. This works okay, but
71. this doesn't really scale very well. At best, if we have you
72. know, ten million words well now instead of having ten million entries
73. to go through, we need to go through ten million divided
74. by say 26, if we have 26 letters. It's not making
75. things much faster. It's making things maybe at best, 26 times
76. letter. That assumes that all of the buckets are the same size.
77. Certainly if we make the buckets based on the first letter,
78. that's not going to be the same size. If the words
79. are a typical English words. We're going to have many more
81. So, we want to fix those two problems. We want to
82. be able to have more buckets. So we're not going to just
83. use the first letter, we're going to use some function on the
84. whole word that tells us where it belongs. And we're going to try
85. to make that function distribute the words fairly well. So the
86. structure that I've described is what's called a hash table. This
87. is a very useful data structure. It's so useful that it's
88. built into Python. There's the Python type called a dictionary. Which provides
89. this functionality. At the end of today's unit, I'll explain
90. how the Python dictionary works, and how to use it,
91. and we'll modify the search engine code to use dictionary
92. instead of the lookup table that we built, but before
93. we do that, we are going to implement it ourselves. We
94. are going to make sure that we understand how the hash
95. table works by writing all the code to do it
96. ourselves and then we'll switch to using the built-in Python [INAUDIBLE]