## ← Testing Hash Functions - Intro to Computer Science

• 3 Followers
• 53 Lines

### Get Embed Code x Embed video Use the following code to embed this video. See our usage guide for more details on embedding. Paste this in your document somewhere (closest to the closing body tag is preferable): ```<script type="text/javascript" src='https://amara.org/embedder-iframe'></script> ``` Paste this inside your HTML body, where you want to include the widget: ```<div class="amara-embed" data-url="http://www.youtube.com/watch?v=TjWwI-MvEhI" data-team="udacity"></div> ``` 2 Languages

Showing Revision 6 created 07/06/2017 by Dejan Trajkovic.

1. So let's test our new hash function. See if it does
2. better than the bad string hash function we defined earlier. We're going
3. to use the same test hash function we defined before, that
4. takes the function as input. So we can pass in either the
5. original bad string hash function, or the new hash string function
6. that we hope will work better. A list of keys, and the
7. size, computing the hash size value for each key. And, records
8. how many end up in each position. So let's try that again,
9. we'll use the same example we had before using
10. the words from Sherlock Holmes. And so we'll initialize
11. words, as all with words in the page that
13. from the Web. So, first we'll try this again
14. with the old hash function, bad string hash, and
15. obtain the counts, and let's print those out, remember
16. what those look like. So it's a pretty bad distribution,
17. as we saw before, ranging from 725 all the way up to over 2,000, in one
18. bucket. Now, let's try it with the new
20. plain hash string. We're using the same words,
21. the same number of buckets. And we'll see
22. the distribution now. It looks a lot better,
23. right. There's no values under a thousand now,
24. there's now values over the highest one would be 1,363
25. in the first bucket. So, lets look at that graphically.
27. function we can see the size of the buckets varies a
28. great deal. And we have the red and blue bars
29. showing some that are too popular, some that are not popular
30. enough. With the new hash function we have much less
31. variance. Still not perfect. We'd like to have all the bars
32. really be as close to the same as possible, but it's really
33. close. So this is working pretty well. The other thing we can
34. try is having more buckets. So lets try this one, we are
35. doing the same thing but this time with a 100 buckets instead of
36. 12 buckets. And we can print that out, and so we see
37. the results when we have 100 buckets, are pretty good, but certainly
38. not perfect. We have buckets as small as this one, that has
39. a 104, and as larger, this one that has a 197. So almost
40. twice the size of the smallest bucket. It's certainly a hard
41. problem to build a better hash function. People put a lot of
42. effort into building good hash functions. As your tables get larger,
43. it's very important to both have the hash function be efficient. Our
44. hash string function is not that great, because it does take
45. a long time to execute. If the string gets longer we have
46. to go through that loop once for each character. And so there
47. are better hash functions available. We're not going to look at those
48. in more detail now. But there'll be some links on the website to documents about
49. more interesting hash functions. This is going to
50. work well enough for us though. So before
51. we go on to actually implementing in a hash table. We're going to have one quiz
52. to make sure that you understand, why this
53. is so much better than the original index.