WEBVTT 00:00:00.150 --> 00:00:02.220 So, this is a pretty tough question. If you weren't able to 00:00:02.220 --> 00:00:05.350 get it yourself, that's okay. What I'd encourage you to do is stop 00:00:05.350 --> 00:00:08.090 in the middle of the explanation, once the part where you got 00:00:08.090 --> 00:00:11.850 stuck on makes sense, and see if you can finish it yourself. So, 00:00:11.850 --> 00:00:14.330 what we need to do is define a function, so we're going to 00:00:14.330 --> 00:00:18.980 define a function hash-string, and it takes two inputs, the keyword and number 00:00:18.980 --> 00:00:22.330 of buckets. And we're going to keep track of where we are in the 00:00:22.330 --> 00:00:25.640 circle. We need to introduce a new variable to do that, and we 00:00:25.640 --> 00:00:28.780 should start at position zero. So, we'll initialize that variable. 00:00:28.780 --> 00:00:31.790 We'll use h to represent the hash, and we'll initialize that 00:00:31.790 --> 00:00:34.260 to zero. Now, we want to go through the characters in 00:00:34.260 --> 00:00:37.300 the key word. So, we'll have a for loop that goes 00:00:37.300 --> 00:00:40.650 through each character in keyword. And for each character, we 00:00:40.650 --> 00:00:42.960 want to add to the hash. So, we're going to add to 00:00:42.960 --> 00:00:46.928 the hash, the value of that character. We could do the 00:00:46.928 --> 00:00:51.730 modulo here, so we could, at this point, use modulo buckets. 00:00:51.730 --> 00:00:53.660 We have to be careful to have the parentheses here. If we 00:00:53.660 --> 00:00:57.365 just have the modulo buckets here, we wouldn't get the right result. 00:00:57.365 --> 00:00:59.690 because it would do ord c modulo buckets, but what we really 00:00:59.690 --> 00:01:03.460 need to do is modulo buckets, the sum that we get from h 00:01:03.460 --> 00:01:06.380 and ord c. And then at the end, we're going to return the 00:01:06.380 --> 00:01:10.280 hash value. We could, instead of doing the modulo each time here, we 00:01:10.280 --> 00:01:12.970 could do the modulo just once at the end. That would be 00:01:12.970 --> 00:01:16.640 computing a big number if we have a really big string, and then, 00:01:16.640 --> 00:01:20.610 at the end, compute our modulo buckets. Either way should work. 00:01:20.610 --> 00:01:22.820 This way's a little better, in the sense that if our 00:01:22.820 --> 00:01:25.520 string is very long, we would have to compute a really, 00:01:25.520 --> 00:01:28.230 really big number, which gets to be more expensive. And we 00:01:28.230 --> 00:01:31.790 might even run out if it's a super long keyword. So, 00:01:31.790 --> 00:01:34.490 it's better to do the modulo here. We get the same 00:01:34.490 --> 00:01:38.300 result either way though. Let's try this in the Python interpreter. 00:01:38.300 --> 00:01:41.990 So, here's the code that we wrote out on the sketchpad. 00:01:41.990 --> 00:01:44.980 We have our variable, h, which is going to keep track 00:01:44.980 --> 00:01:46.650 of the hash value. We're going to go through all the 00:01:46.650 --> 00:01:49.550 characters in the keyword, adding each one into the hash 00:01:49.550 --> 00:01:53.800 value, modulo the number of buckets. So let's try the examples. 00:01:56.820 --> 00:01:59.890 So a, with 12 buckets, hashes the bucket one as 00:01:59.890 --> 00:02:04.590 we expect, and if we look at b. And hash just 00:02:04.590 --> 00:02:08.008 the bucket two, also as we expect. So, one other thing 00:02:08.008 --> 00:02:10.320 we should try, if you remember, when we had the really 00:02:10.320 --> 00:02:13.530 bad hash string function. One of the many problems it had 00:02:13.530 --> 00:02:15.890 was it didn't work on all strings, in particular, did not 00:02:15.890 --> 00:02:18.690 work on empty strings. Do you think our hash string function 00:02:18.690 --> 00:02:21.920 here will work on the empty string? Try to guess what 00:02:21.920 --> 00:02:24.440 the result should be before I run it. And then I 00:02:24.440 --> 00:02:27.730 will run it. And you see the result is zero. So, no 00:02:27.730 --> 00:02:30.230 error. And it makes sense that the result is zero. We 00:02:30.230 --> 00:02:32.700 start with h is zero. When there are no characters in the 00:02:32.700 --> 00:02:35.130 string, we don't go through this loop at all. So, h 00:02:35.130 --> 00:02:38.360 is still zero when we return. And let's also try the longer 00:02:38.360 --> 00:02:43.800 example. When we hash the string udacity with 12 buckets, we get 00:02:43.800 --> 00:02:47.440 bucket 11. We should be able to increase the number of buckets. 00:02:47.440 --> 00:02:51.500 So, let's increase the number of buckets. Let's suppose we had 1,000 buckets, 00:02:52.580 --> 00:02:57.670 and we get bucket 755. This isn't enough to convince us that our hash string 00:02:57.670 --> 00:03:00.650 function is distributing all strings well between 00:03:00.650 --> 00:03:02.390 all buckets. But at least we're getting 00:03:02.390 --> 00:03:06.070 a, a fairly large number that says we might be using all the buckets.