
Title:
Better Hash Functions Solution  Intro to Computer Science

Description:

So, this is a pretty tough question. If you weren't able to
¶

get it yourself, that's okay. What I'd encourage you to do is stop

in the middle of the explanation, once the part where you got

stuck on makes sense, and see if you can finish it yourself. So,

what we need to do is define a function, so we're going to

define a function hashstring, and it takes two inputs, the keyword and number

of buckets. And we're going to keep track of where we are in the

circle. We need to introduce a new variable to do that, and we

should start at position zero. So, we'll initialize that variable.

We'll use h to represent the hash, and we'll initialize that

to zero. Now, we want to go through the characters in

the key word. So, we'll have a for loop that goes

through each character in keyword. And for each character, we

want to add to the hash. So, we're going to add to

the hash, the value of that character. We could do the

modulo here, so we could, at this point, use modulo buckets.

We have to be careful to have the parentheses here. If we

just have the modulo buckets here, we wouldn't get the right result.

because it would do ord c modulo buckets, but what we really

need to do is modulo buckets, the sum that we get from h

and ord c. And then at the end, we're going to return the

hash value. We could, instead of doing the modulo each time here, we

could do the modulo just once at the end. That would be

computing a big number if we have a really big string, and then,

at the end, compute our modulo buckets. Either way should work.

This way's a little better, in the sense that if our

string is very long, we would have to compute a really,

really big number, which gets to be more expensive. And we

might even run out if it's a super long keyword. So,

it's better to do the modulo here. We get the same

result either way though. Let's try this in the Python interpreter.

So, here's the code that we wrote out on the sketchpad.

We have our variable, h, which is going to keep track

of the hash value. We're going to go through all the

characters in the keyword, adding each one into the hash

value, modulo the number of buckets. So let's try the examples.

So a, with 12 buckets, hashes the bucket one as

we expect, and if we look at b. And hash just

the bucket two, also as we expect. So, one other thing

we should try, if you remember, when we had the really

bad hash string function. One of the many problems it had

was it didn't work on all strings, in particular, did not

work on empty strings. Do you think our hash string function

here will work on the empty string? Try to guess what

the result should be before I run it. And then I

will run it. And you see the result is zero. So, no

error. And it makes sense that the result is zero. We

start with h is zero. When there are no characters in the

string, we don't go through this loop at all. So, h

is still zero when we return. And let's also try the longer

example. When we hash the string udacity with 12 buckets, we get

bucket 11. We should be able to increase the number of buckets.

So, let's increase the number of buckets. Let's suppose we had 1,000 buckets,

and we get bucket 755. This isn't enough to convince us that our hash string

function is distributing all strings well between

all buckets. But at least we're getting

a, a fairly large number that says we might be using all the buckets.