YouTube

Got a YouTube account?

New: enable viewer-created translations and captions on your YouTube channel!

English subtitles

← Testing Hash Functions - Intro to Computer Science

Get Embed Code
2 Languages

Showing Revision 6 created 07/06/2017 by Dejan Trajkovic.

  1. So let's test our new hash function. See if it does
  2. better than the bad string hash function we defined earlier. We're going
  3. to use the same test hash function we defined before, that
  4. takes the function as input. So we can pass in either the
  5. original bad string hash function, or the new hash string function
  6. that we hope will work better. A list of keys, and the
  7. size, computing the hash size value for each key. And, records
  8. how many end up in each position. So let's try that again,
  9. we'll use the same example we had before using
  10. the words from Sherlock Holmes. And so we'll initialize
  11. words, as all with words in the page that
  12. is the Adventures of Sherlock Holmes, that we load
  13. from the Web. So, first we'll try this again
  14. with the old hash function, bad string hash, and
  15. obtain the counts, and let's print those out, remember
  16. what those look like. So it's a pretty bad distribution,
  17. as we saw before, ranging from 725 all the way up to over 2,000, in one
  18. bucket. Now, let's try it with the new
  19. hash function, instead of bad_hash_string. We'll pass in
  20. plain hash string. We're using the same words,
  21. the same number of buckets. And we'll see
  22. the distribution now. It looks a lot better,
  23. right. There's no values under a thousand now,
  24. there's now values over the highest one would be 1,363
  25. in the first bucket. So, lets look at that graphically.
  26. Here's what we had before. With a bad hash string
  27. function we can see the size of the buckets varies a
  28. great deal. And we have the red and blue bars
  29. showing some that are too popular, some that are not popular
  30. enough. With the new hash function we have much less
  31. variance. Still not perfect. We'd like to have all the bars
  32. really be as close to the same as possible, but it's really
  33. close. So this is working pretty well. The other thing we can
  34. try is having more buckets. So lets try this one, we are
  35. doing the same thing but this time with a 100 buckets instead of
  36. 12 buckets. And we can print that out, and so we see
  37. the results when we have 100 buckets, are pretty good, but certainly
  38. not perfect. We have buckets as small as this one, that has
  39. a 104, and as larger, this one that has a 197. So almost
  40. twice the size of the smallest bucket. It's certainly a hard
  41. problem to build a better hash function. People put a lot of
  42. effort into building good hash functions. As your tables get larger,
  43. it's very important to both have the hash function be efficient. Our
  44. hash string function is not that great, because it does take
  45. a long time to execute. If the string gets longer we have
  46. to go through that loop once for each character. And so there
  47. are better hash functions available. We're not going to look at those
  48. in more detail now. But there'll be some links on the website to documents about
  49. more interesting hash functions. This is going to
  50. work well enough for us though. So before
  51. we go on to actually implementing in a hash table. We're going to have one quiz
  52. to make sure that you understand, why this
  53. is so much better than the original index.