YouTube

Got a YouTube account?

New: enable viewer-created translations and captions on your YouTube channel!

English subtitles

← Cuckoo Hashing Part3 - Intro to Parallel Programming

Get Embed Code
2 Languages

Showing Revision 4 created 05/24/2016 by Udacity Robot.

  1. So a few notes on implementation.
  2. The real benefits of this particular formulation of hashing
  3. are that our look-ups are again constant time,
  4. and that's terrific for parallelism for a machine like the GPU,
  5. because we keep all of our threads busy.
  6. There's no thread divergence at all, and the construction algorithm is actually pretty simple and pretty fast.
  7. It's actually a very efficient hash table construction when it's done in shared memory with a small of item to hash,
  8. but it turns out that it actually works fairly well with large data sets
  9. when the hash table needs to be constructed in main memory.
  10. The really important thing to keep in mind there
  11. is that the operation of write my item into the hash table and kick the other item out needs to be an atomic operation--
  12. in our implementation we used atomic exchange
  13. to make sure the 2 competing threads don't have any possibility of stomping on top of each other.
  14. And a final note, there's more than one way we might solve this general problem of check if an element is in a set.
  15. Instead of using hash tables, we could choose a fairly brute force way to do this.
  16. We could choose to sort all the elements in the set as a construction step,
  17. and then, for the look-up step, we could do binary into the set
  18. to see if that particular element is present.
  19. So for instance, we might have 5 keys here in a set.
  20. To construct this particular data structure, we sort these in order of the key,
  21. and to do look-up we just do binary search within this set.
  22. So for instance, if we're looking up value k6 with one thread,
  23. we'd go see k5--my keys bigger than that.
  24. I'd then pop this way and so on until I get to k6.
  25. I might have another thread that's looking up k3.
  26. It would start in the middle at k5 and then do binary search until it finds k3.
  27. Now sorting is quite fast on GPUs, and even though hash tables are faster for this particular computation,
  28. they're not so much faster that should never consider a sort to be a good idea.
  29. Often on the GPU, sometimes a brute force approach,
  30. like sort and binary search, might be your best option.