YouTube

Got a YouTube account?

New: enable viewer-created translations and captions on your YouTube channel!

English subtitles

← Better Hash Functions Solution - Intro to Computer Science

Get Embed Code
2 Languages

Showing Revision 5 created 05/24/2016 by Udacity Robot.

  1. So, this is a pretty tough question. If you weren't able to
  2. get it yourself, that's okay. What I'd encourage you to do is stop
  3. in the middle of the explanation, once the part where you got
  4. stuck on makes sense, and see if you can finish it yourself. So,
  5. what we need to do is define a function, so we're going to
  6. define a function hash-string, and it takes two inputs, the keyword and number
  7. of buckets. And we're going to keep track of where we are in the
  8. circle. We need to introduce a new variable to do that, and we
  9. should start at position zero. So, we'll initialize that variable.
  10. We'll use h to represent the hash, and we'll initialize that
  11. to zero. Now, we want to go through the characters in
  12. the key word. So, we'll have a for loop that goes
  13. through each character in keyword. And for each character, we
  14. want to add to the hash. So, we're going to add to
  15. the hash, the value of that character. We could do the
  16. modulo here, so we could, at this point, use modulo buckets.
  17. We have to be careful to have the parentheses here. If we
  18. just have the modulo buckets here, we wouldn't get the right result.
  19. because it would do ord c modulo buckets, but what we really
  20. need to do is modulo buckets, the sum that we get from h
  21. and ord c. And then at the end, we're going to return the
  22. hash value. We could, instead of doing the modulo each time here, we
  23. could do the modulo just once at the end. That would be
  24. computing a big number if we have a really big string, and then,
  25. at the end, compute our modulo buckets. Either way should work.
  26. This way's a little better, in the sense that if our
  27. string is very long, we would have to compute a really,
  28. really big number, which gets to be more expensive. And we
  29. might even run out if it's a super long keyword. So,
  30. it's better to do the modulo here. We get the same
  31. result either way though. Let's try this in the Python interpreter.
  32. So, here's the code that we wrote out on the sketchpad.
  33. We have our variable, h, which is going to keep track
  34. of the hash value. We're going to go through all the
  35. characters in the keyword, adding each one into the hash
  36. value, modulo the number of buckets. So let's try the examples.
  37. So a, with 12 buckets, hashes the bucket one as
  38. we expect, and if we look at b. And hash just
  39. the bucket two, also as we expect. So, one other thing
  40. we should try, if you remember, when we had the really
  41. bad hash string function. One of the many problems it had
  42. was it didn't work on all strings, in particular, did not
  43. work on empty strings. Do you think our hash string function
  44. here will work on the empty string? Try to guess what
  45. the result should be before I run it. And then I
  46. will run it. And you see the result is zero. So, no
  47. error. And it makes sense that the result is zero. We
  48. start with h is zero. When there are no characters in the
  49. string, we don't go through this loop at all. So, h
  50. is still zero when we return. And let's also try the longer
  51. example. When we hash the string udacity with 12 buckets, we get
  52. bucket 11. We should be able to increase the number of buckets.
  53. So, let's increase the number of buckets. Let's suppose we had 1,000 buckets,
  54. and we get bucket 755. This isn't enough to convince us that our hash string
  55. function is distributing all strings well between
  56. all buckets. But at least we're getting
  57. a, a fairly large number that says we might be using all the buckets.