[Script Info] Title: [Events] Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text Dialogue: 0,0:00:00.00,0:00:12.00,Default,,0000,0000,0000,, Dialogue: 0,0:00:12.00,0:00:15.00,Default,,0000,0000,0000,,This presentation is delivered by the Stanford Center for Professional Dialogue: 0,0:00:15.00,0:00:28.00,Default,,0000,0000,0000,,Development. Dialogue: 0,0:00:28.00,0:00:29.00,Default,,0000,0000,0000,, Dialogue: 0,0:00:29.00,0:00:33.00,Default,,0000,0000,0000,,What are we doing today? We're gonna talk about hashing. Hashing's one of the coolest things you're ever Dialogue: 0,0:00:33.00,0:00:36.00,Default,,0000,0000,0000,,gonna learn in 106b, so it's a good day to be here and learn something really neat, kinda Dialogue: 0,0:00:36.00,0:00:39.00,Default,,0000,0000,0000,,clever, inventive idea for how you do search and retrieval in a very efficient Dialogue: 0,0:00:39.00,0:00:40.00,Default,,0000,0000,0000,,way. Dialogue: 0,0:00:40.00,0:00:44.00,Default,,0000,0000,0000,,So I'm gonna put us back to where we were a lecture ago, two lectures back to Monday, Dialogue: 0,0:00:44.00,0:00:48.00,Default,,0000,0000,0000,,where we had talked about ways we might implement the map abstraction, right, the key Dialogue: 0,0:00:48.00,0:00:49.00,Default,,0000,0000,0000,,value pairing Dialogue: 0,0:00:49.00,0:00:52.00,Default,,0000,0000,0000,,to allow for quick search and retrieval Dialogue: 0,0:00:52.00,0:00:56.00,Default,,0000,0000,0000,,and we had looked at vector and had just done a little thought experiment about Dialogue: 0,0:00:56.00,0:00:59.00,Default,,0000,0000,0000,,what sorting the vector would buy us, right, and that would give us the Dialogue: 0,0:00:59.00,0:01:02.00,Default,,0000,0000,0000,,ability to do searches faster because we could use binary search, Dialogue: 0,0:01:02.00,0:01:07.00,Default,,0000,0000,0000,,right. But in the case of add, right, we'd still be slow because the Dialogue: 0,0:01:07.00,0:01:10.00,Default,,0000,0000,0000,,need be kinda shuffled or rearranged in the continuous memory Dialogue: 0,0:01:10.00,0:01:13.00,Default,,0000,0000,0000,,and then that's what motivated our introduction to trees there, the binary search tree, Dialogue: 0,0:01:13.00,0:01:17.00,Default,,0000,0000,0000,,which is designed exactly to enable that kind of searching, right, that logarithmic Dialogue: 0,0:01:17.00,0:01:20.00,Default,,0000,0000,0000,,divide and conquer, divide in half and work your way down Dialogue: 0,0:01:20.00,0:01:22.00,Default,,0000,0000,0000,,and then because of the flexibility of the pointer base structure there, right, we Dialogue: 0,0:01:22.00,0:01:24.00,Default,,0000,0000,0000,,can also do the insert Dialogue: 0,0:01:24.00,0:01:26.00,Default,,0000,0000,0000,,in the same logarithmic Dialogue: 0,0:01:26.00,0:01:30.00,Default,,0000,0000,0000,,time because we're no longer constrained by that contiguous memory that the array Dialogue: 0,0:01:30.00,0:01:30.00,Default,,0000,0000,0000,,fact Dialogue: 0,0:01:30.00,0:01:32.00,Default,,0000,0000,0000,,data structures force us to. Dialogue: 0,0:01:32.00,0:01:36.00,Default,,0000,0000,0000,,And so, at that point what we have seen this logarithmic and both get valued at and so Dialogue: 0,0:01:36.00,0:01:38.00,Default,,0000,0000,0000,,on the PQ, right, which you were just finishing, Dialogue: 0,0:01:38.00,0:01:42.00,Default,,0000,0000,0000,,you'll notice that logarithmic is pretty fast. In fact logarithmic for most values of n that you Dialogue: 0,0:01:42.00,0:01:44.00,Default,,0000,0000,0000,,would normally run into in an Dialogue: 0,0:01:44.00,0:01:45.00,Default,,0000,0000,0000,,ordinary use, right, Dialogue: 0,0:01:45.00,0:01:49.00,Default,,0000,0000,0000,,it's un-measurable, right. So those of you who got your heap up and running and Dialogue: 0,0:01:49.00,0:01:51.00,Default,,0000,0000,0000,,correctly working logarithmic and time was discovering that you can put 10,000, 20,000, Dialogue: 0,0:01:51.00,0:01:54.00,Default,,0000,0000,0000,,50,000, 100,000 things in it and just never be able to measure Dialogue: 0,0:01:54.00,0:01:58.00,Default,,0000,0000,0000,,what it took to NQ or DQ from those structures because logarithmic is very, Dialogue: 0,0:01:58.00,0:02:00.00,Default,,0000,0000,0000,,very fast. Dialogue: 0,0:02:00.00,0:02:01.00,Default,,0000,0000,0000,,So that said, Dialogue: 0,0:02:01.00,0:02:04.00,Default,,0000,0000,0000,,that's actually a fine point to say, we have a great math notation, we don't have to look further, Dialogue: 0,0:02:04.00,0:02:07.00,Default,,0000,0000,0000,,but in fact I'm gonna push it a little bit today and we're gonna actually getting it down to where Dialogue: 0,0:02:07.00,0:02:10.00,Default,,0000,0000,0000,,we can do both those operations in constant time, which is to say that Dialogue: 0,0:02:10.00,0:02:13.00,Default,,0000,0000,0000,,no matter how large the table gets, right, we expect Dialogue: 0,0:02:13.00,0:02:16.00,Default,,0000,0000,0000,,we'll be able to insert and retrieve with Dialogue: 0,0:02:16.00,0:02:21.00,Default,,0000,0000,0000,,the exact same time required kind of for a 10,000 table as a million entry Dialogue: 0,0:02:21.00,0:02:22.00,Default,,0000,0000,0000,,table which is pretty neat. Dialogue: 0,0:02:22.00,0:02:26.00,Default,,0000,0000,0000,,To note that as we start to get to these more complicated data structures, Dialogue: 0,0:02:26.00,0:02:29.00,Default,,0000,0000,0000,,one thing we also need to keep in mind, right, is that there are other things Dialogue: 0,0:02:29.00,0:02:33.00,Default,,0000,0000,0000,,of trade offs, right, in adding that complexity of the code that Dialogue: 0,0:02:33.00,0:02:36.00,Default,,0000,0000,0000,,starting with something that's simple and Dialogue: 0,0:02:36.00,0:02:39.00,Default,,0000,0000,0000,,a little slower performer but that's easy to write might actually solve the problem Dialogue: 0,0:02:39.00,0:02:42.00,Default,,0000,0000,0000,,we have at hand. So maybe that we don't really wanna go to one of these fancier Dialogue: 0,0:02:42.00,0:02:45.00,Default,,0000,0000,0000,,things because we don't really need it to be that much faster Dialogue: 0,0:02:45.00,0:02:48.00,Default,,0000,0000,0000,,because the BST, right, adds a bunch of memory, right, so left and right pointers Dialogue: 0,0:02:48.00,0:02:53.00,Default,,0000,0000,0000,,for every cell, which is an 8 byte overhead on top of the entry itself. Dialogue: 0,0:02:53.00,0:02:56.00,Default,,0000,0000,0000,,Now we have to deal with pointers, dynamic memories, all this allocation, Dialogue: 0,0:02:56.00,0:02:59.00,Default,,0000,0000,0000,,deallocation, right, we've got a lot of recursion going on in there Dialogue: 0,0:02:59.00,0:03:01.00,Default,,0000,0000,0000,,and so just opportunities for error Dialogue: 0,0:03:01.00,0:03:03.00,Default,,0000,0000,0000,,and stuff to creep into Dialogue: 0,0:03:03.00,0:03:07.00,Default,,0000,0000,0000,,and if we take the steps to provide tree balancing, which we only kind of grazed Dialogue: 0,0:03:07.00,0:03:11.00,Default,,0000,0000,0000,,the surface of. We just mentioned the need for it. It actually adds quite a lot of Dialogue: 0,0:03:11.00,0:03:14.00,Default,,0000,0000,0000,,complication to the code to do the rotations Dialogue: 0,0:03:14.00,0:03:16.00,Default,,0000,0000,0000,,or the rebalancing Dialogue: 0,0:03:16.00,0:03:21.00,Default,,0000,0000,0000,,that from a simple binary search tree. So in fact the whole package, right, of Dialogue: 0,0:03:21.00,0:03:24.00,Default,,0000,0000,0000,,building a really guaranteed log n performing Dialogue: 0,0:03:24.00,0:03:27.00,Default,,0000,0000,0000,,binary search tree is actually a pretty big investment of human capital Dialogue: 0,0:03:27.00,0:03:29.00,Default,,0000,0000,0000,,that you Dialogue: 0,0:03:29.00,0:03:30.00,Default,,0000,0000,0000,,have to kinda Dialogue: 0,0:03:30.00,0:03:33.00,Default,,0000,0000,0000,,balance against the future hit off of it running faster. I'm Dialogue: 0,0:03:33.00,0:03:36.00,Default,,0000,0000,0000,,gonna show this strategy today that actually is Dialogue: 0,0:03:36.00,0:03:37.00,Default,,0000,0000,0000,,kind of simple to write. Dialogue: 0,0:03:37.00,0:03:40.00,Default,,0000,0000,0000,,It gets us solve one case and doesn't have the degenerate Dialogue: 0,0:03:40.00,0:03:43.00,Default,,0000,0000,0000,,looming that the binary search tree did. Dialogue: 0,0:03:43.00,0:03:46.00,Default,,0000,0000,0000,,So, let me kinda just give you a theoretical concept to think about how Dialogue: 0,0:03:46.00,0:03:49.00,Default,,0000,0000,0000,,you do stuff. Dialogue: 0,0:03:49.00,0:03:51.00,Default,,0000,0000,0000,,You have a big dictionary. You've got, you know, like a Dialogue: 0,0:03:51.00,0:03:54.00,Default,,0000,0000,0000,,many thousand page dictionary, right, and you want to look up a word. You certainly Dialogue: 0,0:03:54.00,0:03:57.00,Default,,0000,0000,0000,,don't do linear search. You're about to look up the word xylophone and you're actually not Dialogue: 0,0:03:57.00,0:04:00.00,Default,,0000,0000,0000,,going to start from the A's and kind of page your way through, you Dialogue: 0,0:04:00.00,0:04:03.00,Default,,0000,0000,0000,,know, 2,000 pages to get to the X's. You Dialogue: 0,0:04:03.00,0:04:07.00,Default,,0000,0000,0000,,don't even ? like we do binary search. You can do binary searches pretty fast, right? It Dialogue: 0,0:04:07.00,0:04:10.00,Default,,0000,0000,0000,,turns out like starting at the M's to get something that's in X, right? You Dialogue: 0,0:04:10.00,0:04:11.00,Default,,0000,0000,0000,,tend to know about where X Dialogue: 0,0:04:11.00,0:04:14.00,Default,,0000,0000,0000,,is and often dictionaries have little tabs Dialogue: 0,0:04:14.00,0:04:17.00,Default,,0000,0000,0000,,along the pages that give you an idea of kinda where the breaks for Dialogue: 0,0:04:17.00,0:04:22.00,Default,,0000,0000,0000,,certain words are. Maybe it says Xa is here and Xe is there or something, Dialogue: 0,0:04:22.00,0:04:25.00,Default,,0000,0000,0000,,that lets you kinda more quickly just illuminate all the surrounding noise Dialogue: 0,0:04:25.00,0:04:29.00,Default,,0000,0000,0000,,in the dictionary, kinda get straight to the right region where the X words are Dialogue: 0,0:04:29.00,0:04:30.00,Default,,0000,0000,0000,,and then you can do Dialogue: 0,0:04:30.00,0:04:33.00,Default,,0000,0000,0000,,a simple search from there, maybe a little bit of a linear or binary search to kind of zero Dialogue: 0,0:04:33.00,0:04:37.00,Default,,0000,0000,0000,,in on the page where the word is you wanna find is. Dialogue: 0,0:04:37.00,0:04:38.00,Default,,0000,0000,0000,,This is the idea Dialogue: 0,0:04:38.00,0:04:45.00,Default,,0000,0000,0000,,behind this concept called a hash table or based on the algorithmic technique of hashing Dialogue: 0,0:04:45.00,0:04:46.00,Default,,0000,0000,0000,,is that Dialogue: 0,0:04:46.00,0:04:49.00,Default,,0000,0000,0000,,rather than try to figure out how to kinda keep this very large collection sorted or organized Dialogue: 0,0:04:49.00,0:04:51.00,Default,,0000,0000,0000,,in a way that you can kind of Dialogue: 0,0:04:51.00,0:04:52.00,Default,,0000,0000,0000,,jump around within it, it says, Dialogue: 0,0:04:52.00,0:04:56.00,Default,,0000,0000,0000,,''Well how about we maintain a bunch of different collections.'' We break up the Dialogue: 0,0:04:56.00,0:04:59.00,Default,,0000,0000,0000,,data set into little tiny pieces Dialogue: 0,0:04:59.00,0:05:00.00,Default,,0000,0000,0000,,and if we can Dialogue: 0,0:05:00.00,0:05:02.00,Default,,0000,0000,0000,,tell, given a key, Dialogue: 0,0:05:02.00,0:05:05.00,Default,,0000,0000,0000,,like which piece of the data set it's supposed to be in and we'll call those pieces Dialogue: 0,0:05:05.00,0:05:08.00,Default,,0000,0000,0000,,buckets. I'm gonna put these in different buckets. Dialogue: 0,0:05:08.00,0:05:11.00,Default,,0000,0000,0000,,So if I had a whole bunch of students I might put all the freshmen in this bucket Dialogue: 0,0:05:11.00,0:05:13.00,Default,,0000,0000,0000,,and the sophomores in this bucket or something like that and then when I look Dialogue: 0,0:05:13.00,0:05:16.00,Default,,0000,0000,0000,,for a student I would say, ''Well which class are you?'' You say, ''Freshmen.'' I'd look in the Dialogue: 0,0:05:16.00,0:05:18.00,Default,,0000,0000,0000,,freshmen bucket. Dialogue: 0,0:05:18.00,0:05:19.00,Default,,0000,0000,0000,,Okay. You know, the idea Dialogue: 0,0:05:19.00,0:05:22.00,Default,,0000,0000,0000,,in a simple case would be like, ''Okay, well I ? it has some Dialogue: 0,0:05:22.00,0:05:26.00,Default,,0000,0000,0000,,criteria by which I can divide you into these buckets that don't overlap and everyone Dialogue: 0,0:05:26.00,0:05:28.00,Default,,0000,0000,0000,,has a unique bucket to go in Dialogue: 0,0:05:28.00,0:05:31.00,Default,,0000,0000,0000,,and if I can make up enough of those categorizations, Dialogue: 0,0:05:31.00,0:05:33.00,Default,,0000,0000,0000,,have as many of those buckets around Dialogue: 0,0:05:33.00,0:05:35.00,Default,,0000,0000,0000,,to divide you up into, Dialogue: 0,0:05:35.00,0:05:38.00,Default,,0000,0000,0000,,then I can hopefully keep the set that lands in that bucket pretty small, Dialogue: 0,0:05:38.00,0:05:42.00,Default,,0000,0000,0000,,so that I don't have a smaller of students. So maybe I did it by the Dialogue: 0,0:05:42.00,0:05:45.00,Default,,0000,0000,0000,,first letter of your last name. I could say, ''Well everybody whose last name begins with W Dialogue: 0,0:05:45.00,0:05:47.00,Default,,0000,0000,0000,,goes here and everybody who's B goes here and so on.'' Well, Dialogue: 0,0:05:47.00,0:05:50.00,Default,,0000,0000,0000,,when somebody comes to me then I don't have to look through the B's. Dialogue: 0,0:05:50.00,0:05:51.00,Default,,0000,0000,0000,,You know, and if you have Dialogue: 0,0:05:51.00,0:05:53.00,Default,,0000,0000,0000,,a class of 200 students Dialogue: 0,0:05:53.00,0:05:56.00,Default,,0000,0000,0000,,and you have 26 letters of the alphabet then you'd think that if it were Dialogue: 0,0:05:56.00,0:05:59.00,Default,,0000,0000,0000,,evenly divided across those letters, it's not going to be to be, it turns out in practice, Dialogue: 0,0:05:59.00,0:06:02.00,Default,,0000,0000,0000,,but as a thought experiment there'd be about 4 people in a bucket and Dialogue: 0,0:06:02.00,0:06:06.00,Default,,0000,0000,0000,,it'd be pretty fast to just look through the bucket and find the one who I was Dialogue: 0,0:06:06.00,0:06:08.00,Default,,0000,0000,0000,,looking for. Okay, Dialogue: 0,0:06:08.00,0:06:10.00,Default,,0000,0000,0000,,so let's take this idea. This idea of, like, Dialogue: 0,0:06:10.00,0:06:14.00,Default,,0000,0000,0000,,there's going to be some strategy for how we divide into buckets and we call Dialogue: 0,0:06:14.00,0:06:16.00,Default,,0000,0000,0000,,that strategy the hash function. Dialogue: 0,0:06:16.00,0:06:18.00,Default,,0000,0000,0000,,That, given a key, Dialogue: 0,0:06:18.00,0:06:23.00,Default,,0000,0000,0000,,and a buckets and we'll just label them with numbers. Zero to the number of buckets Dialogue: 0,0:06:23.00,0:06:26.00,Default,,0000,0000,0000,,minus one. So let's say I have 99 buckets, I've got zero to Dialogue: 0,0:06:26.00,0:06:27.00,Default,,0000,0000,0000,,98 Dialogue: 0,0:06:27.00,0:06:28.00,Default,,0000,0000,0000,,that I will take the key Dialogue: 0,0:06:28.00,0:06:32.00,Default,,0000,0000,0000,,and the key in this case is the string key that we're using in the map, Dialogue: 0,0:06:32.00,0:06:33.00,Default,,0000,0000,0000,,and I have a hash function Dialogue: 0,0:06:33.00,0:06:37.00,Default,,0000,0000,0000,,which given a key is input produces a number as output Dialogue: 0,0:06:37.00,0:06:39.00,Default,,0000,0000,0000,,in the range zero to B minus one. Dialogue: 0,0:06:39.00,0:06:42.00,Default,,0000,0000,0000,,And that is what's called the hash value for that key Dialogue: 0,0:06:42.00,0:06:45.00,Default,,0000,0000,0000,,and that tells me which bucket to look at, Dialogue: 0,0:06:45.00,0:06:48.00,Default,,0000,0000,0000,,either find a match for that if I'm trying to do a get value or where to place a Dialogue: 0,0:06:48.00,0:06:52.00,Default,,0000,0000,0000,,new entry if I'm trying to add into the table. Dialogue: 0,0:06:52.00,0:06:57.00,Default,,0000,0000,0000,,So that's the basic idea. I'm Dialogue: 0,0:06:57.00,0:07:00.00,Default,,0000,0000,0000,,gonna talk a little bit about hash functions. Dialogue: 0,0:07:00.00,0:07:03.00,Default,,0000,0000,0000,,In terms of writing a really good hash function is actually pretty complicated Dialogue: 0,0:07:03.00,0:07:04.00,Default,,0000,0000,0000,,and so Dialogue: 0,0:07:04.00,0:07:07.00,Default,,0000,0000,0000,,we're gonna think about it in sort of a ? kind Dialogue: 0,0:07:07.00,0:07:10.00,Default,,0000,0000,0000,,of a big picture perspective and think about what qualities a hash Dialogue: 0,0:07:10.00,0:07:13.00,Default,,0000,0000,0000,,function has to have and then I'm actually not gonna go through Dialogue: 0,0:07:13.00,0:07:16.00,Default,,0000,0000,0000,,proving to you about what mathematically constitutes a good hash function. I'm just gonna give you Dialogue: 0,0:07:16.00,0:07:21.00,Default,,0000,0000,0000,,some intuitive sense of what kinda things matter when designing a hash function. Dialogue: 0,0:07:21.00,0:07:23.00,Default,,0000,0000,0000,,So it's given a key that maps it to a number. Dialogue: 0,0:07:23.00,0:07:25.00,Default,,0000,0000,0000,,Typically it's gonna say, you know, Dialogue: 0,0:07:25.00,0:07:27.00,Default,,0000,0000,0000,,zero to the number of buckets minus one. Dialogue: 0,0:07:27.00,0:07:30.00,Default,,0000,0000,0000,,Often that means that somehow it's gonna compute something and then use Dialogue: 0,0:07:30.00,0:07:33.00,Default,,0000,0000,0000,,the mod operation to bring it back in range. Dialogue: 0,0:07:33.00,0:07:35.00,Default,,0000,0000,0000,,So it's gonna take your number Dialogue: 0,0:07:35.00,0:07:37.00,Default,,0000,0000,0000,,and then say, ''Okay. Well, if you have five buckets to divide it in, I will give you Dialogue: 0,0:07:37.00,0:07:39.00,Default,,0000,0000,0000,,a number zero to four.'' Dialogue: 0,0:07:39.00,0:07:41.00,Default,,0000,0000,0000,,It's very important that it be Dialogue: 0,0:07:41.00,0:07:44.00,Default,,0000,0000,0000,,stable, that you put the key in, you get a number out. If you put that same key in Dialogue: 0,0:07:44.00,0:07:48.00,Default,,0000,0000,0000,,later you should get the same number out. So it has to be sort of reliable guaranteed Dialogue: 0,0:07:48.00,0:07:52.00,Default,,0000,0000,0000,,mapping. It can't be random. That Dialogue: 0,0:07:52.00,0:07:55.00,Default,,0000,0000,0000,,said, you want it to almost look as though it's random in terms of Dialogue: 0,0:07:55.00,0:07:59.00,Default,,0000,0000,0000,,how it maps things to buckets, that you don't want it to have a real systematic plan Dialogue: 0,0:07:59.00,0:08:02.00,Default,,0000,0000,0000,,where it puts a whole bunch of things in bucket one and very few things in bucket two Dialogue: 0,0:08:02.00,0:08:06.00,Default,,0000,0000,0000,,or somehow always uses the even buckets and not the odds ones or something like that. So Dialogue: 0,0:08:06.00,0:08:08.00,Default,,0000,0000,0000,,you wanna have a strategy that's gonna take those keys Dialogue: 0,0:08:08.00,0:08:11.00,Default,,0000,0000,0000,,and use your whole bucket range Dialogue: 0,0:08:11.00,0:08:12.00,Default,,0000,0000,0000,,in an equal manner. Dialogue: 0,0:08:12.00,0:08:15.00,Default,,0000,0000,0000,,So that if you put 100 keys in and you have 100 buckets you're hoping Dialogue: 0,0:08:15.00,0:08:16.00,Default,,0000,0000,0000,,that, Dialogue: 0,0:08:16.00,0:08:19.00,Default,,0000,0000,0000,,on average, each bucket will have one thing. That you won't have these clusters Dialogue: 0,0:08:19.00,0:08:22.00,Default,,0000,0000,0000,,where 25 things are in one bucket, ten here and then a whole bunch of buckets Dialogue: 0,0:08:22.00,0:08:24.00,Default,,0000,0000,0000,,are empty. So Dialogue: 0,0:08:24.00,0:08:27.00,Default,,0000,0000,0000,,you want this kinda divided across the range. Dialogue: 0,0:08:27.00,0:08:29.00,Default,,0000,0000,0000,,So given that we have string input, Dialogue: 0,0:08:29.00,0:08:32.00,Default,,0000,0000,0000,,likely what we're gonna be using is the characters in the string as part of Dialogue: 0,0:08:32.00,0:08:34.00,Default,,0000,0000,0000,,the ? how can I take these characters Dialogue: 0,0:08:34.00,0:08:40.00,Default,,0000,0000,0000,,and somehow kinda jungle them up in a way to sort of move them around in these Dialogue: 0,0:08:40.00,0:08:43.00,Default,,0000,0000,0000,,buckets in a way that's reliable so that I get that same string back, I get the Dialogue: 0,0:08:43.00,0:08:44.00,Default,,0000,0000,0000,,same one. Dialogue: 0,0:08:44.00,0:08:47.00,Default,,0000,0000,0000,,But that doesn't actually produce a lot of artifacts of Dialogue: 0,0:08:47.00,0:08:50.00,Default,,0000,0000,0000,,similar strings, let's say mapping to the same place, hopefully spreading them Dialogue: 0,0:08:50.00,0:08:52.00,Default,,0000,0000,0000,,around. Dialogue: 0,0:08:52.00,0:08:55.00,Default,,0000,0000,0000,,So for example, some really simple things you could use, you could just take the first Dialogue: 0,0:08:55.00,0:08:56.00,Default,,0000,0000,0000,,letter. So I Dialogue: 0,0:08:56.00,0:08:59.00,Default,,0000,0000,0000,,talked about this sorta with last names. If I was trying to divide up my students, right, I could divide Dialogue: 0,0:08:59.00,0:09:01.00,Default,,0000,0000,0000,,them by the first letter of their last name. Dialogue: 0,0:09:01.00,0:09:05.00,Default,,0000,0000,0000,,That would use exactly 26 buckets, no more, no less, Dialogue: 0,0:09:05.00,0:09:08.00,Default,,0000,0000,0000,,and if I happen to have 26 buckets Dialogue: 0,0:09:08.00,0:09:11.00,Default,,0000,0000,0000,,then at least there's some chance it would get some coverage across the range. Dialogue: 0,0:09:11.00,0:09:14.00,Default,,0000,0000,0000,,There is likely to be clustering, right, some Dialogue: 0,0:09:14.00,0:09:15.00,Default,,0000,0000,0000,,names are a lot more common Dialogue: 0,0:09:15.00,0:09:18.00,Default,,0000,0000,0000,,or letters are a lot more commonly used than others. Dialogue: 0,0:09:18.00,0:09:21.00,Default,,0000,0000,0000,,So it would tend to produce a little bit of an artifact, but it would be reliable and it would Dialogue: 0,0:09:21.00,0:09:24.00,Default,,0000,0000,0000,,use 26 buckets. If I had 100 buckets, Dialogue: 0,0:09:24.00,0:09:26.00,Default,,0000,0000,0000,,then this would be kind of a crummy function Dialogue: 0,0:09:26.00,0:09:28.00,Default,,0000,0000,0000,,because in fact it would use the first 26 buckets and none of the other ones, Dialogue: 0,0:09:28.00,0:09:30.00,Default,,0000,0000,0000,,right, if it was just taking Dialogue: 0,0:09:30.00,0:09:32.00,Default,,0000,0000,0000,,that letter and deciding. So, Dialogue: 0,0:09:32.00,0:09:36.00,Default,,0000,0000,0000,,you know, in some situations this would get me a little bit of the thing I wanted. Dialogue: 0,0:09:36.00,0:09:40.00,Default,,0000,0000,0000,,Things like using the length of the word. A Dialogue: 0,0:09:40.00,0:09:42.00,Default,,0000,0000,0000,,very, very bad clustering function here, Dialogue: 0,0:09:42.00,0:09:44.00,Default,,0000,0000,0000,,right because there'd be a bunch of strings that are, like, six characters, seven Dialogue: 0,0:09:44.00,0:09:46.00,Default,,0000,0000,0000,,characters, ten characters, Dialogue: 0,0:09:46.00,0:09:50.00,Default,,0000,0000,0000,,but very few that are one or two, and very few that are more than ten, or something. Dialogue: 0,0:09:50.00,0:09:53.00,Default,,0000,0000,0000,,You'd have this very tight little cluster in the middle, right, and would not use a Dialogue: 0,0:09:53.00,0:09:57.00,Default,,0000,0000,0000,,large bucket range effectively at all. Dialogue: 0,0:09:57.00,0:10:01.00,Default,,0000,0000,0000,,More likely what you're gonna try to use is kind of more of the information about the word, Dialogue: 0,0:10:01.00,0:10:04.00,Default,,0000,0000,0000,,the key that you have. Instead of just a single letter or the count of letters, it's Dialogue: 0,0:10:04.00,0:10:06.00,Default,,0000,0000,0000,,trying to use kinda all the data you have at hand, Dialogue: 0,0:10:06.00,0:10:08.00,Default,,0000,0000,0000,,which is to say all the letters, Dialogue: 0,0:10:08.00,0:10:12.00,Default,,0000,0000,0000,,and so one way to do it would be to take the ASCII values for the letters, so A is Dialogue: 0,0:10:12.00,0:10:13.00,Default,,0000,0000,0000,,97, you Dialogue: 0,0:10:13.00,0:10:15.00,Default,,0000,0000,0000,,know, B is 98, Dialogue: 0,0:10:15.00,0:10:17.00,Default,,0000,0000,0000,,is you would take a letter, a word like atoms and you'd add Dialogue: 0,0:10:17.00,0:10:18.00,Default,,0000,0000,0000,,96 Dialogue: 0,0:10:18.00,0:10:22.00,Default,,0000,0000,0000,,plus 99 plus 96, you know, add Dialogue: 0,0:10:22.00,0:10:23.00,Default,,0000,0000,0000,,them together Dialogue: 0,0:10:23.00,0:10:26.00,Default,,0000,0000,0000,,and then you'd get some sum of them, 500, 600, whatever it is, Dialogue: 0,0:10:26.00,0:10:28.00,Default,,0000,0000,0000,,depending on how many letters there are Dialogue: 0,0:10:28.00,0:10:32.00,Default,,0000,0000,0000,,and then use mod to back it back into ? let's say your range was, you had 100 Dialogue: 0,0:10:32.00,0:10:34.00,Default,,0000,0000,0000,,buckets, right, so you'd want it to be zero to 99, so Dialogue: 0,0:10:34.00,0:10:37.00,Default,,0000,0000,0000,,you'd take that thing and then mod it by 100, then take the last two digits of it. So it'd Dialogue: 0,0:10:37.00,0:10:42.00,Default,,0000,0000,0000,,be, okay, the sum to 524 then it goes into bucket 24. Dialogue: 0,0:10:42.00,0:10:47.00,Default,,0000,0000,0000,,So if you did that, right, and so if act Dialogue: 0,0:10:47.00,0:10:50.00,Default,,0000,0000,0000,,is the word that you have, right, you'd add these codes together and let's say this is Dialogue: 0,0:10:50.00,0:10:53.00,Default,,0000,0000,0000,,97 and then this is 99 and then Dialogue: 0,0:10:53.00,0:10:57.00,Default,,0000,0000,0000,,that's, you know, a hundred and something and so we end up with ? I'm making this Dialogue: 0,0:10:57.00,0:10:59.00,Default,,0000,0000,0000,,number up, right ? 283, then we might Dialogue: 0,0:10:59.00,0:11:03.00,Default,,0000,0000,0000,,mod to the last two digit and we'll say this one goes into bucket 83. Dialogue: 0,0:11:03.00,0:11:07.00,Default,,0000,0000,0000,,And if we have, you know, dog, Dialogue: 0,0:11:07.00,0:11:11.00,Default,,0000,0000,0000,,maybe that adds to 267 and we would Dialogue: 0,0:11:11.00,0:11:13.00,Default,,0000,0000,0000,,mod that down to Dialogue: 0,0:11:13.00,0:11:17.00,Default,,0000,0000,0000,,67. So this actually tries to use, kinda, all the information at hand, all the letters Dialogue: 0,0:11:17.00,0:11:18.00,Default,,0000,0000,0000,,that we have Dialogue: 0,0:11:18.00,0:11:19.00,Default,,0000,0000,0000,,and bring them down, Dialogue: 0,0:11:19.00,0:11:22.00,Default,,0000,0000,0000,,and so if you figure that across the space of all the numbers that are being added here Dialogue: 0,0:11:22.00,0:11:25.00,Default,,0000,0000,0000,,you are just as likely to get 210 as you are to get 299, you know Dialogue: 0,0:11:25.00,0:11:28.00,Default,,0000,0000,0000,,that any of the subsequent numbers in between, Dialogue: 0,0:11:28.00,0:11:32.00,Default,,0000,0000,0000,,then hopefully you'd get kind of a spread across your buckets. Dialogue: 0,0:11:32.00,0:11:36.00,Default,,0000,0000,0000,,There is one thing this clusters on. Does anybody see what words cluster in this Dialogue: 0,0:11:36.00,0:11:40.00,Default,,0000,0000,0000,,system? Dialogue: 0,0:11:40.00,0:11:41.00,Default,,0000,0000,0000,,That you'd think of as Dialogue: 0,0:11:41.00,0:11:47.00,Default,,0000,0000,0000,,maybe not being related, but would land in the same bucket? To Dialogue: 0,0:11:47.00,0:11:49.00,Default,,0000,0000,0000,,rearrange some numbers. Yes. Any Dialogue: 0,0:11:49.00,0:11:51.00,Default,,0000,0000,0000,,anagram, right, so Dialogue: 0,0:11:51.00,0:11:52.00,Default,,0000,0000,0000,,you Dialogue: 0,0:11:52.00,0:11:54.00,Default,,0000,0000,0000,,take cat which is just a ranagram ? Dialogue: 0,0:11:54.00,0:11:57.00,Default,,0000,0000,0000,,anagram of act, right, it's gonna add to the same thing, right, Dialogue: 0,0:11:57.00,0:12:00.00,Default,,0000,0000,0000,,and so it would be a little bit of clustering in that respect. Dialogue: 0,0:12:00.00,0:12:02.00,Default,,0000,0000,0000,,So anagrams come down to the same place. Dialogue: 0,0:12:02.00,0:12:05.00,Default,,0000,0000,0000,,And so the words that you expect to, kinda like, act and cat to Dialogue: 0,0:12:05.00,0:12:09.00,Default,,0000,0000,0000,,you seem like different words, they are gonna collide in this system and that might be a little Dialogue: 0,0:12:09.00,0:12:11.00,Default,,0000,0000,0000,,bit of something we worry about. So one way you can actually kind of avoid that in the Dialogue: 0,0:12:11.00,0:12:12.00,Default,,0000,0000,0000,,way that Dialogue: 0,0:12:12.00,0:12:15.00,Default,,0000,0000,0000,,the hash function we're going to eventually use will do, is Dialogue: 0,0:12:15.00,0:12:16.00,Default,,0000,0000,0000,,it'll do this thing where Dialogue: 0,0:12:16.00,0:12:21.00,Default,,0000,0000,0000,,not only does it add the letters together, but it also multiplies the letter by a Dialogue: 0,0:12:21.00,0:12:22.00,Default,,0000,0000,0000,,very large prime number Dialogue: 0,0:12:22.00,0:12:25.00,Default,,0000,0000,0000,,that helps to kinda say, ''Well the large prime number Dialogue: 0,0:12:25.00,0:12:26.00,Default,,0000,0000,0000,,times C plus Dialogue: 0,0:12:26.00,0:12:30.00,Default,,0000,0000,0000,,the large prime number squared times A plus the large prime number cubed times C and Dialogue: 0,0:12:30.00,0:12:33.00,Default,,0000,0000,0000,,then it just allows the, each position of the letter actually is also then Dialogue: 0,0:12:33.00,0:12:35.00,Default,,0000,0000,0000,,encoded in the number that comes out Dialogue: 0,0:12:35.00,0:12:37.00,Default,,0000,0000,0000,,and so something that's positionally different Dialogue: 0,0:12:37.00,0:12:40.00,Default,,0000,0000,0000,,won't necessarily land in the same bucket Dialogue: 0,0:12:40.00,0:12:43.00,Default,,0000,0000,0000,,without some other factors kinda coming into play. So a little Dialogue: 0,0:12:43.00,0:12:49.00,Default,,0000,0000,0000,,bit of trying to just take the information and jumble it up a little bit. Dialogue: 0,0:12:49.00,0:12:51.00,Default,,0000,0000,0000,,So let me show you a little bit about how this looks. Actually I'm gonna tell you about Dialogue: 0,0:12:51.00,0:12:54.00,Default,,0000,0000,0000,,the collisions and then I'll show you the data structure. Dialogue: 0,0:12:54.00,0:12:58.00,Default,,0000,0000,0000,,So certainly I said there's gonna be this problem where things collide, Dialogue: 0,0:12:58.00,0:13:01.00,Default,,0000,0000,0000,,that 83 and 83 in this simple system of adding the codes will Dialogue: 0,0:13:01.00,0:13:03.00,Default,,0000,0000,0000,,come out the same. Dialogue: 0,0:13:03.00,0:13:06.00,Default,,0000,0000,0000,,So it can't be that each of these buckets holds exactly one thing. Dialogue: 0,0:13:06.00,0:13:07.00,Default,,0000,0000,0000,,The hashing, Dialogue: 0,0:13:07.00,0:13:08.00,Default,,0000,0000,0000,,right, although we may Dialogue: 0,0:13:08.00,0:13:11.00,Default,,0000,0000,0000,,try to get those buckets as small as possible we have to account for the Dialogue: 0,0:13:11.00,0:13:14.00,Default,,0000,0000,0000,,possibility that two things will collide, that their hash function, even though Dialogue: 0,0:13:14.00,0:13:16.00,Default,,0000,0000,0000,,they were different keys, Dialogue: 0,0:13:16.00,0:13:18.00,Default,,0000,0000,0000,,will land in the same bucket Dialogue: 0,0:13:18.00,0:13:20.00,Default,,0000,0000,0000,,based on how I've jumbled it up Dialogue: 0,0:13:20.00,0:13:23.00,Default,,0000,0000,0000,,and so we're trying to avoid that, but when it does happen we also have to have a Dialogue: 0,0:13:23.00,0:13:25.00,Default,,0000,0000,0000,,strategy for it. So we have to have a way of saying, ''Well Dialogue: 0,0:13:25.00,0:13:28.00,Default,,0000,0000,0000,,there can be more than one thing in a bucket.'' Or some other way of when there's Dialogue: 0,0:13:28.00,0:13:33.00,Default,,0000,0000,0000,,a collision, deciding where to put the one that was placed in the table second. Dialogue: 0,0:13:33.00,0:13:36.00,Default,,0000,0000,0000,,And in the strategy, we're gonna use the one called chaning, Dialogue: 0,0:13:36.00,0:13:39.00,Default,,0000,0000,0000,,and the chaning basically says, ''Well if once we have something in a bucket like ACT, Dialogue: 0,0:13:39.00,0:13:43.00,Default,,0000,0000,0000,,if we hash something and it comes to the same bucket, we'll just tack it onto that bucket, Dialogue: 0,0:13:43.00,0:13:45.00,Default,,0000,0000,0000,,in this case using a link list.'' Dialogue: 0,0:13:45.00,0:13:48.00,Default,,0000,0000,0000,,So each of these buckets is actually gonna be a link list of Dialogue: 0,0:13:48.00,0:13:49.00,Default,,0000,0000,0000,,the entries Dialogue: 0,0:13:49.00,0:13:52.00,Default,,0000,0000,0000,,and so the entire table will be a Dialogue: 0,0:13:52.00,0:13:54.00,Default,,0000,0000,0000,,vector or array of those. Dialogue: 0,0:13:54.00,0:13:56.00,Default,,0000,0000,0000,,So let me show you, Dialogue: 0,0:13:56.00,0:13:59.00,Default,,0000,0000,0000,,a little Dialogue: 0,0:13:59.00,0:14:05.00,Default,,0000,0000,0000,,live action. Dialogue: 0,0:14:05.00,0:14:08.00,Default,,0000,0000,0000,,So this is a hash table that has seven buckets. Dialogue: 0,0:14:08.00,0:14:08.00,Default,,0000,0000,0000,,In Dialogue: 0,0:14:08.00,0:14:11.00,Default,,0000,0000,0000,,this case number 0 through 6 are assigned to each one. Dialogue: 0,0:14:11.00,0:14:14.00,Default,,0000,0000,0000,,They all start empty, and so these are all empty link lists illustrated by the Dialogue: 0,0:14:14.00,0:14:15.00,Default,,0000,0000,0000,,null, Dialogue: 0,0:14:15.00,0:14:17.00,Default,,0000,0000,0000,,and then the hash function Dialogue: 0,0:14:17.00,0:14:20.00,Default,,0000,0000,0000,,is shown here like a black box. You don't actually know how the workings of hash Dialogue: 0,0:14:20.00,0:14:23.00,Default,,0000,0000,0000,,function is, but you know what it's job is to do, which is you give it a key and it gives you Dialogue: 0,0:14:23.00,0:14:27.00,Default,,0000,0000,0000,,a number in the range 0 to 6, and that same key when passed, and again gives you Dialogue: 0,0:14:27.00,0:14:28.00,Default,,0000,0000,0000,,the same number. Dialogue: 0,0:14:28.00,0:14:31.00,Default,,0000,0000,0000,,So if I say 106B Dialogue: 0,0:14:31.00,0:14:34.00,Default,,0000,0000,0000,,equals Julie, it pushes 106B through the hash function and says, Dialogue: 0,0:14:34.00,0:14:35.00,Default,,0000,0000,0000,,''Okay, well 106B Dialogue: 0,0:14:35.00,0:14:37.00,Default,,0000,0000,0000,,comes out as one.'' Dialogue: 0,0:14:37.00,0:14:40.00,Default,,0000,0000,0000,,You add those numbers together, jumble them up, the letters, and you get one. Dialogue: 0,0:14:40.00,0:14:43.00,Default,,0000,0000,0000,,And so it went into the table, and it said, ''Okay, well that's bucket number one, make a new Dialogue: 0,0:14:43.00,0:14:46.00,Default,,0000,0000,0000,,cell for 106B Julie and stick it in the table.'' Dialogue: 0,0:14:46.00,0:14:48.00,Default,,0000,0000,0000,,So now there's a Dialogue: 0,0:14:48.00,0:14:51.00,Default,,0000,0000,0000,,non empty chain in one place, the rest are empty. Dialogue: 0,0:14:51.00,0:14:53.00,Default,,0000,0000,0000,,If I say 107 Dialogue: 0,0:14:53.00,0:14:55.00,Default,,0000,0000,0000,,is being taught by Dialogue: 0,0:14:55.00,0:14:57.00,Default,,0000,0000,0000,,Jerry, 107 happens to get the same hash function. Dialogue: 0,0:14:57.00,0:15:00.00,Default,,0000,0000,0000,,107, 106B, different letters but it just happen to come out that way, Dialogue: 0,0:15:00.00,0:15:01.00,Default,,0000,0000,0000,,hash to one Dialogue: 0,0:15:01.00,0:15:04.00,Default,,0000,0000,0000,,and so it went ahead in this case just tacked it on to the front of the list. Dialogue: 0,0:15:04.00,0:15:05.00,Default,,0000,0000,0000,,If I do Dialogue: 0,0:15:05.00,0:15:08.00,Default,,0000,0000,0000,,106A Dialogue: 0,0:15:08.00,0:15:10.00,Default,,0000,0000,0000,,being taught by Patrick, Dialogue: 0,0:15:10.00,0:15:11.00,Default,,0000,0000,0000,,106A Dialogue: 0,0:15:11.00,0:15:14.00,Default,,0000,0000,0000,,happens to sum to zero in the hash function. Dialogue: 0,0:15:14.00,0:15:17.00,Default,,0000,0000,0000,,That's all we know about the black box that ends up loading it up into the zero's Dialogue: 0,0:15:17.00,0:15:17.00,Default,,0000,0000,0000,,slot. Dialogue: 0,0:15:17.00,0:15:21.00,Default,,0000,0000,0000,,I say, well wait, it's being taught by Dialogue: 0,0:15:21.00,0:15:25.00,Default,,0000,0000,0000,,Nick, it also come out to be zero and so Dialogue: 0,0:15:25.00,0:15:27.00,Default,,0000,0000,0000,,it ends up kinda pushing on the front of that. So I go 103 Dialogue: 0,0:15:27.00,0:15:30.00,Default,,0000,0000,0000,,is being taught by Maron and it's in Dialogue: 0,0:15:30.00,0:15:33.00,Default,,0000,0000,0000,,bucket five and so as I kinda type things in I'm starting to see that things are just kinda Dialogue: 0,0:15:33.00,0:15:35.00,Default,,0000,0000,0000,,going around in difference place in the table. Dialogue: 0,0:15:35.00,0:15:38.00,Default,,0000,0000,0000,,A little bit of clustering in the top, a little bit of empty at the bottom, but as I continue Dialogue: 0,0:15:38.00,0:15:42.00,Default,,0000,0000,0000,,to type in strings for other course numbers, I'd hopefully kinda fill out the table Dialogue: 0,0:15:42.00,0:15:43.00,Default,,0000,0000,0000,,to where there was kind of an even, Dialogue: 0,0:15:43.00,0:15:46.00,Default,,0000,0000,0000,,roughly even spread across that. Dialogue: 0,0:15:46.00,0:15:47.00,Default,,0000,0000,0000,,When I want to look something up Dialogue: 0,0:15:47.00,0:15:51.00,Default,,0000,0000,0000,,because I wanna see who's teaching 106B, then Dialogue: 0,0:15:51.00,0:15:54.00,Default,,0000,0000,0000,,it will use the same process in reverse. It asks the hash function, Dialogue: 0,0:15:54.00,0:15:56.00,Default,,0000,0000,0000,,''Well which bucket should I look in?'' And it Dialogue: 0,0:15:56.00,0:15:59.00,Default,,0000,0000,0000,,says, ''Oh, you should look in bucket one because 106B's hash code is one.'' So Dialogue: 0,0:15:59.00,0:16:01.00,Default,,0000,0000,0000,,all the other buckets are eliminated Dialogue: 0,0:16:01.00,0:16:04.00,Default,,0000,0000,0000,,from consideration, so very quick kind of Dialogue: 0,0:16:04.00,0:16:07.00,Default,,0000,0000,0000,,focused down to the right place and then it actually just does a link list reversal to Dialogue: 0,0:16:07.00,0:16:10.00,Default,,0000,0000,0000,,find 106B in the bucket Dialogue: 0,0:16:10.00,0:16:11.00,Default,,0000,0000,0000,,where it has to be, Dialogue: 0,0:16:11.00,0:16:13.00,Default,,0000,0000,0000,,and so Dialogue: 0,0:16:13.00,0:16:16.00,Default,,0000,0000,0000,,the key to realizing what makes this so magical, right, Dialogue: 0,0:16:16.00,0:16:19.00,Default,,0000,0000,0000,,is that the choice of the number of buckets Dialogue: 0,0:16:19.00,0:16:21.00,Default,,0000,0000,0000,,is up to me. Dialogue: 0,0:16:21.00,0:16:24.00,Default,,0000,0000,0000,,I can pick the number of buckets to be whatever I want. Dialogue: 0,0:16:24.00,0:16:27.00,Default,,0000,0000,0000,,I can pick it to be five, I can pick it to be 10, I can be to be Dialogue: 0,0:16:27.00,0:16:29.00,Default,,0000,0000,0000,,1,000, I can pick it to be a million. If Dialogue: 0,0:16:29.00,0:16:34.00,Default,,0000,0000,0000,,I choose the number of buckets to be kind of roughly in the same order of magnitude Dialogue: 0,0:16:34.00,0:16:35.00,Default,,0000,0000,0000,,as the number of entries. So if I Dialogue: 0,0:16:35.00,0:16:38.00,Default,,0000,0000,0000,,wanna put a thousand things in my table, Dialogue: 0,0:16:38.00,0:16:40.00,Default,,0000,0000,0000,,if I make a thousand buckets to hold them in Dialogue: 0,0:16:40.00,0:16:43.00,Default,,0000,0000,0000,,and my hash function can divide stuff Dialogue: 0,0:16:43.00,0:16:45.00,Default,,0000,0000,0000,,pretty equally across those thousand buckets, Dialogue: 0,0:16:45.00,0:16:47.00,Default,,0000,0000,0000,,then I will expect that each of those link lists is about length one. Dialogue: 0,0:16:47.00,0:16:52.00,Default,,0000,0000,0000,,Some will be two, some will be zero, but if it's doing its job right then I have a Dialogue: 0,0:16:52.00,0:16:56.00,Default,,0000,0000,0000,,bunch of little tiny buckets, each of which holds a tiny fraction of Dialogue: 0,0:16:56.00,0:16:58.00,Default,,0000,0000,0000,,the data set, in this case one or two, Dialogue: 0,0:16:58.00,0:17:00.00,Default,,0000,0000,0000,,and it's very easy to search through them. Dialogue: 0,0:17:00.00,0:17:05.00,Default,,0000,0000,0000,,If I know I'm gonna put a million things in, then I make a million buckets, and then Dialogue: 0,0:17:05.00,0:17:08.00,Default,,0000,0000,0000,,I divide one out of a million things, I run it through the hash function, it gives me Dialogue: 0,0:17:08.00,0:17:09.00,Default,,0000,0000,0000,,this number Dialogue: 0,0:17:09.00,0:17:12.00,Default,,0000,0000,0000,,that says, ''Oh, it's 862,444.'' I go to Dialogue: 0,0:17:12.00,0:17:16.00,Default,,0000,0000,0000,,that bucket and the item is either there in that bucket or I have to look a little Dialogue: 0,0:17:16.00,0:17:18.00,Default,,0000,0000,0000,,bit to find it, Dialogue: 0,0:17:18.00,0:17:20.00,Default,,0000,0000,0000,,but there's no other place in the table I have to look. So the fact that there's Dialogue: 0,0:17:20.00,0:17:21.00,Default,,0000,0000,0000,,a Dialogue: 0,0:17:21.00,0:17:24.00,Default,,0000,0000,0000,,million other entries kinda near by in these other buckets is completely not Dialogue: 0,0:17:24.00,0:17:26.00,Default,,0000,0000,0000,,important to us, right, and so Dialogue: 0,0:17:26.00,0:17:28.00,Default,,0000,0000,0000,,the searching and Dialogue: 0,0:17:28.00,0:17:32.00,Default,,0000,0000,0000,,updating, adding, and replacing, and removing, and stuff all happens at the bucket level Dialogue: 0,0:17:32.00,0:17:35.00,Default,,0000,0000,0000,,and by choosing the buckets to be Dialogue: 0,0:17:35.00,0:17:36.00,Default,,0000,0000,0000,, Dialogue: 0,0:17:36.00,0:17:39.00,Default,,0000,0000,0000,,the number, total number are gonna be roughly equal to the size of the things, then Dialogue: 0,0:17:39.00,0:17:40.00,Default,,0000,0000,0000,,I have this Dialogue: 0,0:17:40.00,0:17:43.00,Default,,0000,0000,0000,,constant time access to Dialogue: 0,0:17:43.00,0:17:50.00,Default,,0000,0000,0000,,the tiny part of the subset of the set that matters. That's pretty cool. Let's write some Dialogue: 0,0:17:50.00,0:17:51.00,Default,,0000,0000,0000,,code. It's kinda Dialogue: 0,0:17:51.00,0:17:53.00,Default,,0000,0000,0000,,good to see the whole thing Dialogue: 0,0:17:53.00,0:18:00.00,Default,,0000,0000,0000,,doing what it needs to do. So I'm gonna go ahead and go back over here. Dialogue: 0,0:18:00.00,0:18:02.00,Default,,0000,0000,0000,,All right, so I've got my Dialogue: 0,0:18:02.00,0:18:05.00,Default,,0000,0000,0000,,map which has add and get value and Dialogue: 0,0:18:05.00,0:18:07.00,Default,,0000,0000,0000,,not much else there, right, Dialogue: 0,0:18:07.00,0:18:12.00,Default,,0000,0000,0000,,but kinda just set up and ready for us to go. What I'm gonna build here Dialogue: 0,0:18:12.00,0:18:13.00,Default,,0000,0000,0000,,is a Dialogue: 0,0:18:13.00,0:18:15.00,Default,,0000,0000,0000,,link list cell Dialogue: 0,0:18:15.00,0:18:20.00,Default,,0000,0000,0000,,that's gonna hold a string key, a val type value, and Dialogue: 0,0:18:20.00,0:18:24.00,Default,,0000,0000,0000,,a pointer to the next, and Dialogue: 0,0:18:24.00,0:18:27.00,Default,,0000,0000,0000,,so I plan on having each one of these cells, right, for each entry that goes in the Dialogue: 0,0:18:27.00,0:18:27.00,Default,,0000,0000,0000,,table Dialogue: 0,0:18:27.00,0:18:31.00,Default,,0000,0000,0000,,and then I'm going to have an array of Dialogue: 0,0:18:31.00,0:18:32.00,Default,,0000,0000,0000,,these, and Dialogue: 0,0:18:32.00,0:18:35.00,Default,,0000,0000,0000,,I'm gonna make a constant for it, and then I'll talk a little bit about the Dialogue: 0,0:18:35.00,0:18:42.00,Default,,0000,0000,0000,,consequences of that. So I'm gonna make a constant that is ? so Dialogue: 0,0:18:42.00,0:18:45.00,Default,,0000,0000,0000,,I make it 99. So then I have 99 buckets to start with. That's not Dialogue: 0,0:18:45.00,0:18:49.00,Default,,0000,0000,0000,,gonna solve all my problems, it's gonna show that it's kinda tuned for maps that Dialogue: 0,0:18:49.00,0:18:51.00,Default,,0000,0000,0000,,expect to hold about a hundred things, but Dialogue: 0,0:18:51.00,0:18:56.00,Default,,0000,0000,0000,,? and so I have an array in this case, so if you look at deciphering this Dialogue: 0,0:18:56.00,0:18:58.00,Default,,0000,0000,0000,,declaration is a little bit complicated here. Dialogue: 0,0:18:58.00,0:19:00.00,Default,,0000,0000,0000,,It says that buckets is the name of the variable, Dialogue: 0,0:19:00.00,0:19:06.00,Default,,0000,0000,0000,,it is an array of num buckets length, so a fixed size, 99 Dialogue: 0,0:19:06.00,0:19:07.00,Default,,0000,0000,0000,,entry array here. Dialogue: 0,0:19:07.00,0:19:11.00,Default,,0000,0000,0000,,Each of those entries is a pointer to a cell, so ready to be the Dialogue: 0,0:19:11.00,0:19:16.00,Default,,0000,0000,0000,,head pointer of a link list for us. Dialogue: 0,0:19:16.00,0:19:19.00,Default,,0000,0000,0000,,I'm gonna add a ? Dialogue: 0,0:19:19.00,0:19:23.00,Default,,0000,0000,0000,,well, we'll get there in a second. I'm gonna go over here and work on the constructor first. Dialogue: 0,0:19:23.00,0:19:25.00,Default,,0000,0000,0000,,So Dialogue: 0,0:19:25.00,0:19:30.00,Default,,0000,0000,0000,,the first thing I'm gonna do is make sure that those pointers have good values. Dialogue: 0,0:19:30.00,0:19:32.00,Default,,0000,0000,0000,,If I do not do this, right, they will be junk Dialogue: 0,0:19:32.00,0:19:36.00,Default,,0000,0000,0000,,and there will be badness that will happen, so I can make sure that each of them starts off as an Dialogue: 0,0:19:36.00,0:19:37.00,Default,,0000,0000,0000,,empty list, Dialogue: 0,0:19:37.00,0:19:42.00,Default,,0000,0000,0000,,and then correspondently, right, I would need to be doing a delete all list Dialogue: 0,0:19:42.00,0:19:43.00,Default,,0000,0000,0000,,cells here, but Dialogue: 0,0:19:43.00,0:19:48.00,Default,,0000,0000,0000,,I've been lazy about doing that and I'm gonna continue to be lazy just knowing that I'd have to iterate through and then Dialogue: 0,0:19:48.00,0:19:52.00,Default,,0000,0000,0000,,delete all the cells in each of those lists that are there. Dialogue: 0,0:19:52.00,0:19:54.00,Default,,0000,0000,0000,,And then for add and get value Dialogue: 0,0:19:54.00,0:19:56.00,Default,,0000,0000,0000,,what I'm gonna ? else we need to do Dialogue: 0,0:19:56.00,0:19:57.00,Default,,0000,0000,0000,,is figure out Dialogue: 0,0:19:57.00,0:19:58.00,Default,,0000,0000,0000,,in both cases, right, which Dialogue: 0,0:19:58.00,0:20:00.00,Default,,0000,0000,0000,,is the appropriate list to operate on. Dialogue: 0,0:20:00.00,0:20:03.00,Default,,0000,0000,0000,,So running the hash function on my key, Dialogue: 0,0:20:03.00,0:20:08.00,Default,,0000,0000,0000,,seeing where it tells me to look, and then either searching it to find the match Dialogue: 0,0:20:08.00,0:20:09.00,Default,,0000,0000,0000,,for get value Dialogue: 0,0:20:09.00,0:20:13.00,Default,,0000,0000,0000,,to return the matching value or in the add case, to search for the matching Dialogue: 0,0:20:13.00,0:20:16.00,Default,,0000,0000,0000,,entry to overwrite if it exists and if not, then to create a new cell and tack it on the Dialogue: 0,0:20:16.00,0:20:19.00,Default,,0000,0000,0000,,front of the list. So Dialogue: 0,0:20:19.00,0:20:21.00,Default,,0000,0000,0000,,let's write a Dialogue: 0,0:20:21.00,0:20:25.00,Default,,0000,0000,0000,,helper function because they're both gonna do the same thing. I'm gonna need Dialogue: 0,0:20:25.00,0:20:28.00,Default,,0000,0000,0000,,the same thing twice. So I'm gonna put a hash function in here Dialogue: 0,0:20:28.00,0:20:31.00,Default,,0000,0000,0000,,that, given a key and a number of buckets, is gonna Dialogue: 0,0:20:31.00,0:20:35.00,Default,,0000,0000,0000,,return to me the right bucket to look in, Dialogue: 0,0:20:35.00,0:20:39.00,Default,,0000,0000,0000,,and then I'm gonna write a find cell function that, given a key and Dialogue: 0,0:20:39.00,0:20:39.00,Default,,0000,0000,0000,,the pointer to a list, Dialogue: 0,0:20:39.00,0:20:43.00,Default,,0000,0000,0000,,will find the cell that matches that key or return a null if it didn't find Dialogue: 0,0:20:43.00,0:20:52.00,Default,,0000,0000,0000,,it. Okay, let me put my hash Dialogue: 0,0:20:52.00,0:21:03.00,Default,,0000,0000,0000,,function in. I have a good hash function down there in a minute that I'm gonna pull out, but I just ? Dialogue: 0,0:21:03.00,0:21:05.00,Default,,0000,0000,0000,,so what I'm gonna do, I'm Dialogue: 0,0:21:05.00,0:21:07.00,Default,,0000,0000,0000,,gonna make an interesting choice here Dialogue: 0,0:21:07.00,0:21:12.00,Default,,0000,0000,0000,,and I'm gonna have my hash function just return zero, which Dialogue: 0,0:21:12.00,0:21:13.00,Default,,0000,0000,0000,,seems pretty surprising. Dialogue: 0,0:21:13.00,0:21:16.00,Default,,0000,0000,0000,,But I'm gonna talk about why that is. It turns out, when I'm first building it that Dialogue: 0,0:21:16.00,0:21:19.00,Default,,0000,0000,0000,,actually is an easy way to test that my code is working Dialogue: 0,0:21:19.00,0:21:22.00,Default,,0000,0000,0000,,and to not be dependent on the behavior of the hash function. At this point it says, ''We'll Dialogue: 0,0:21:22.00,0:21:23.00,Default,,0000,0000,0000,,just put Dialogue: 0,0:21:23.00,0:21:25.00,Default,,0000,0000,0000,,everything in the zero's bucket.'' Dialogue: 0,0:21:25.00,0:21:27.00,Default,,0000,0000,0000,,Now that's not a good strategy for my long term efficiency, Dialogue: 0,0:21:27.00,0:21:29.00,Default,,0000,0000,0000,,but it is a good way to test that Dialogue: 0,0:21:29.00,0:21:33.00,Default,,0000,0000,0000,,my handling of the bucket, and the searching, and the finding, and Dialogue: 0,0:21:33.00,0:21:36.00,Default,,0000,0000,0000,,inserting is correct, and the performance I should expect to be abysmal on this, right, it should be Dialogue: 0,0:21:36.00,0:21:37.00,Default,,0000,0000,0000,,totally linear Dialogue: 0,0:21:37.00,0:21:39.00,Default,,0000,0000,0000,,in terms of the number of elements to add or to get them because they'll all Dialogue: 0,0:21:39.00,0:21:41.00,Default,,0000,0000,0000,,be in one link list Dialogue: 0,0:21:41.00,0:21:45.00,Default,,0000,0000,0000,,with no easy access to them. And then eventually I can change this to divide across Dialogue: 0,0:21:45.00,0:21:48.00,Default,,0000,0000,0000,,the buckets and get better spread, but the correctness of the hash table Dialogue: 0,0:21:48.00,0:21:52.00,Default,,0000,0000,0000,,isn't affected by this choice. It's kind of an interesting way to think about designing Dialogue: 0,0:21:52.00,0:21:54.00,Default,,0000,0000,0000,,the code. Dialogue: 0,0:21:54.00,0:22:01.00,Default,,0000,0000,0000,,The other helper I have down here is gonna be my find cell that, given the Dialogue: 0,0:22:01.00,0:22:04.00,Default,,0000,0000,0000,,head pointer to a list and the key, Dialogue: 0,0:22:04.00,0:22:14.00,Default,,0000,0000,0000,,is gonna go find it, and it's basically just Dialogue: 0,0:22:14.00,0:22:17.00,Default,,0000,0000,0000,,if this ker cells key equals the key, Dialogue: 0,0:22:17.00,0:22:20.00,Default,,0000,0000,0000,,then we return it and then if we've gone through the whole list and haven't found it, we return Dialogue: 0,0:22:20.00,0:22:22.00,Default,,0000,0000,0000,,a null. Dialogue: 0,0:22:22.00,0:22:25.00,Default,,0000,0000,0000,,So this guy is returning a cell Dialogue: 0,0:22:25.00,0:22:27.00,Default,,0000,0000,0000,,T star, right, Dialogue: 0,0:22:27.00,0:22:30.00,Default,,0000,0000,0000,,as the matching cell within the thing, and this one is gonna provoke Dialogue: 0,0:22:30.00,0:22:31.00,Default,,0000,0000,0000,,that Dialogue: 0,0:22:31.00,0:22:35.00,Default,,0000,0000,0000,,little C++ compilation snafu that we encountered on the binary search Dialogue: 0,0:22:35.00,0:22:36.00,Default,,0000,0000,0000,,tree, as well, Dialogue: 0,0:22:36.00,0:22:40.00,Default,,0000,0000,0000,,which is that cell T, right, is a type that's declared in the private Dialogue: 0,0:22:40.00,0:22:43.00,Default,,0000,0000,0000,,section within the scope of my map. So Dialogue: 0,0:22:43.00,0:22:46.00,Default,,0000,0000,0000,,outside of the scope, which to say, before I've crossed the my map Dialogue: 0,0:22:46.00,0:22:51.00,Default,,0000,0000,0000,,angle bracket val type colon, colon, it doesn't actually believe we're in my map scope Dialogue: 0,0:22:51.00,0:22:54.00,Default,,0000,0000,0000,,yet. It hasn't seen that and it can't anticipate it, so I have to actually give Dialogue: 0,0:22:54.00,0:22:56.00,Default,,0000,0000,0000,,the full name Dialogue: 0,0:22:56.00,0:22:58.00,Default,,0000,0000,0000,,with the scope specifier on it, Dialogue: 0,0:22:58.00,0:23:03.00,Default,,0000,0000,0000,,and then because of this use of the template name outside of its class Dialogue: 0,0:23:03.00,0:23:09.00,Default,,0000,0000,0000,,is a special place for the C++ compiler, the type name keyword also Dialogue: 0,0:23:09.00,0:23:12.00,Default,,0000,0000,0000,,needs to go there. Okay, so there's the heavy lifting of these two things, right, getting it to hash to the Dialogue: 0,0:23:12.00,0:23:14.00,Default,,0000,0000,0000,,right place, which I kinda Dialogue: 0,0:23:14.00,0:23:15.00,Default,,0000,0000,0000,,short changed, Dialogue: 0,0:23:15.00,0:23:18.00,Default,,0000,0000,0000,,and then searching to find a cell. If Dialogue: 0,0:23:18.00,0:23:23.00,Default,,0000,0000,0000,,I go back up here to get value, Dialogue: 0,0:23:23.00,0:23:25.00,Default,,0000,0000,0000,,and I call the hash function, Dialogue: 0,0:23:25.00,0:23:31.00,Default,,0000,0000,0000,,given my key and the number of buckets to find out which bucket to work on, Dialogue: 0,0:23:31.00,0:23:35.00,Default,,0000,0000,0000,,I go looking for a match by calling find cell Dialogue: 0,0:23:35.00,0:23:40.00,Default,,0000,0000,0000,,of the key in buckets of which bucket, Dialogue: 0,0:23:40.00,0:23:43.00,Default,,0000,0000,0000,,and then if match does not equal null then Dialogue: 0,0:23:43.00,0:23:45.00,Default,,0000,0000,0000,,I can return match as val and Dialogue: 0,0:23:45.00,0:23:46.00,Default,,0000,0000,0000,,then Dialogue: 0,0:23:46.00,0:23:47.00,Default,,0000,0000,0000,,otherwise Dialogue: 0,0:23:47.00,0:23:52.00,Default,,0000,0000,0000,,we say, ''No such key found.'' So that's Dialogue: 0,0:23:52.00,0:23:58.00,Default,,0000,0000,0000,,the error case, right, for get value was that if it didn't find one, its behavior is to Dialogue: 0,0:23:58.00,0:23:59.00,Default,,0000,0000,0000,,raise an error Dialogue: 0,0:23:59.00,0:24:01.00,Default,,0000,0000,0000,,so that someone knows that they've Dialogue: 0,0:24:01.00,0:24:05.00,Default,,0000,0000,0000,,asked something it can't deal Dialogue: 0,0:24:05.00,0:24:09.00,Default,,0000,0000,0000,,with. Okay, so the key thing here usually is that the hash is some sort of constant operation that Dialogue: 0,0:24:09.00,0:24:12.00,Default,,0000,0000,0000,,just jumbles the key up, in this case it just returned zero, but could actually look at Dialogue: 0,0:24:12.00,0:24:14.00,Default,,0000,0000,0000,,the elements of the key, Dialogue: 0,0:24:14.00,0:24:16.00,Default,,0000,0000,0000,,and then it does its reversal down that link list, Dialogue: 0,0:24:16.00,0:24:20.00,Default,,0000,0000,0000,,which given a correct working hash function should have divided them up in Dialogue: 0,0:24:20.00,0:24:20.00,Default,,0000,0000,0000,,these Dialogue: 0,0:24:20.00,0:24:23.00,Default,,0000,0000,0000,,little tiny chains, then we can just find the one Dialogue: 0,0:24:23.00,0:24:26.00,Default,,0000,0000,0000,,quickly by looking through that link list. If we find it, right, we've got it, Dialogue: 0,0:24:26.00,0:24:28.00,Default,,0000,0000,0000,,otherwise, right, we're in Dialogue: 0,0:24:28.00,0:24:31.00,Default,,0000,0000,0000,,the error case. Dialogue: 0,0:24:31.00,0:24:37.00,Default,,0000,0000,0000,,Add looks almost exactly the same. I Dialogue: 0,0:24:37.00,0:24:40.00,Default,,0000,0000,0000,,start by getting the bucket, doing the find cell, Dialogue: 0,0:24:40.00,0:24:41.00,Default,,0000,0000,0000,,if it's not ? Dialogue: 0,0:24:41.00,0:24:43.00,Default,,0000,0000,0000,,if it did find it, right, Dialogue: 0,0:24:43.00,0:24:46.00,Default,,0000,0000,0000,,then we overwrite, Dialogue: 0,0:24:46.00,0:24:49.00,Default,,0000,0000,0000,,and then in the second case, right, where we didn't find it, right, then we need to Dialogue: 0,0:24:49.00,0:24:55.00,Default,,0000,0000,0000,,make ourselves a new cell, and Dialogue: 0,0:24:55.00,0:24:59.00,Default,,0000,0000,0000,,copy in the key, copy in the Dialogue: 0,0:24:59.00,0:25:01.00,Default,,0000,0000,0000,,value, Dialogue: 0,0:25:01.00,0:25:05.00,Default,,0000,0000,0000,,set it ? in this case, the easiest place, right, to add something to a link list is I should just do append Dialogue: 0,0:25:05.00,0:25:08.00,Default,,0000,0000,0000,,it to the front, right, no need to make it hard on ourselves, right, we're not Dialogue: 0,0:25:08.00,0:25:11.00,Default,,0000,0000,0000,,keeping them in any particular order, whatsoever, right, they're base on kind of order Dialogue: 0,0:25:11.00,0:25:12.00,Default,,0000,0000,0000,,of entry. Dialogue: 0,0:25:12.00,0:25:17.00,Default,,0000,0000,0000,,So if somebody gives us a new one, we might as well just stick it in the front, that'd make our job easy. So Dialogue: 0,0:25:17.00,0:25:18.00,Default,,0000,0000,0000,,it will Dialogue: 0,0:25:18.00,0:25:20.00,Default,,0000,0000,0000,,point to the Dialogue: 0,0:25:20.00,0:25:23.00,Default,,0000,0000,0000,,current front pointer of the cell, Dialogue: 0,0:25:23.00,0:25:26.00,Default,,0000,0000,0000,,and then we will update the Dialogue: 0,0:25:26.00,0:25:34.00,Default,,0000,0000,0000,,bucket pointer to point to this guy. So I think Dialogue: 0,0:25:34.00,0:25:35.00,Default,,0000,0000,0000,,we are Dialogue: 0,0:25:35.00,0:25:37.00,Default,,0000,0000,0000,,in okay shape here. Let me Dialogue: 0,0:25:37.00,0:25:40.00,Default,,0000,0000,0000,,take a look and make sure I like what I did, right, Dialogue: 0,0:25:40.00,0:25:44.00,Default,,0000,0000,0000,,so hashing, finding the match, if it found a non empty, so ? this is the Dialogue: 0,0:25:44.00,0:25:46.00,Default,,0000,0000,0000,,overwrite case of finding an existing cell, we just overwrite with that. Otherwise, Dialogue: 0,0:25:46.00,0:25:50.00,Default,,0000,0000,0000,,making a new cell and then attaching it to be the Dialogue: 0,0:25:50.00,0:25:50.00,Default,,0000,0000,0000,,front Dialogue: 0,0:25:50.00,0:25:52.00,Default,,0000,0000,0000,,of the bucket there. Dialogue: 0,0:25:52.00,0:25:55.00,Default,,0000,0000,0000,,So for example, in the case where the bucket's totally empty, it's good to kinda trace Dialogue: 0,0:25:55.00,0:25:59.00,Default,,0000,0000,0000,,that. That's the most common case when we start, right, if we hash to bucket zero Dialogue: 0,0:25:59.00,0:26:01.00,Default,,0000,0000,0000,,and we do a search of a null link list, we should get a null out, Dialogue: 0,0:26:01.00,0:26:03.00,Default,,0000,0000,0000,,so then we create a new cell, right, Dialogue: 0,0:26:03.00,0:26:06.00,Default,,0000,0000,0000,,and we set the next field to be what the current bucket's head pointer was, which Dialogue: 0,0:26:06.00,0:26:10.00,Default,,0000,0000,0000,,was null. So there should be a single to list and then we update the bucket Dialogue: 0,0:26:10.00,0:26:12.00,Default,,0000,0000,0000,,to point to this guy. So then we should have a Dialogue: 0,0:26:12.00,0:26:14.00,Default,,0000,0000,0000,,single list cell with null trailing it Dialogue: 0,0:26:14.00,0:26:18.00,Default,,0000,0000,0000,,or inserting into the empty bucket case. Dialogue: 0,0:26:18.00,0:26:20.00,Default,,0000,0000,0000,,I like it. Let's see if I have a little code over Dialogue: 0,0:26:20.00,0:26:24.00,Default,,0000,0000,0000,,here that adds Dialogue: 0,0:26:24.00,0:26:27.00,Default,,0000,0000,0000,,some things, car, car, cat, with some numbers. Okay, and that tries to Dialogue: 0,0:26:27.00,0:26:31.00,Default,,0000,0000,0000,,retrieve the value of car, which it wrote a three and then it wrote a four on top Dialogue: 0,0:26:31.00,0:26:39.00,Default,,0000,0000,0000,,of it, so hopefully the answer we should get out of this is four. Okay, if I ? be Dialogue: 0,0:26:39.00,0:26:41.00,Default,,0000,0000,0000,,interesting to do something, like ask for something that we know is not in the Dialogue: 0,0:26:41.00,0:26:45.00,Default,,0000,0000,0000,,table, right, to see that our error case gets handled correctly, Dialogue: 0,0:26:45.00,0:26:46.00,Default,,0000,0000,0000,,the switch Dialogue: 0,0:26:46.00,0:26:47.00,Default,,0000,0000,0000,,key's Dialogue: 0,0:26:47.00,0:26:50.00,Default,,0000,0000,0000,,found. And if I were to continue to do this, I could do a little bit of stress testing to see how Dialogue: 0,0:26:50.00,0:26:55.00,Default,,0000,0000,0000,,that zero is causing us some performance implications, but if I sat there and just put tons and Dialogue: 0,0:26:55.00,0:26:56.00,Default,,0000,0000,0000,,tons of things in there, right, Dialogue: 0,0:26:56.00,0:27:00.00,Default,,0000,0000,0000,,made up strings like crazy and stuffed them in that with numbers, right, I would just be creating one Dialogue: 0,0:27:00.00,0:27:04.00,Default,,0000,0000,0000,,long chain off the zero bucket, ignoring all the other ones, right, Dialogue: 0,0:27:04.00,0:27:08.00,Default,,0000,0000,0000,,but then it's kinda stress testing my link list handling about the adding and Dialogue: 0,0:27:08.00,0:27:09.00,Default,,0000,0000,0000,,searching that list, but Dialogue: 0,0:27:09.00,0:27:12.00,Default,,0000,0000,0000,,would also show us that we expect that as we got to be, have 100 entries, 200 Dialogue: 0,0:27:12.00,0:27:16.00,Default,,0000,0000,0000,,entries, 300 entries that subsequent adds and get values, right, would Dialogue: 0,0:27:16.00,0:27:21.00,Default,,0000,0000,0000,,show a linear increase in performance because of the Dialogue: 0,0:27:21.00,0:27:27.00,Default,,0000,0000,0000,,cluster that we got going there. Dialogue: 0,0:27:27.00,0:27:30.00,Default,,0000,0000,0000,,So let me give you a real hash function. I'll talk Dialogue: 0,0:27:30.00,0:27:37.00,Default,,0000,0000,0000,,you through what it's doing and then I'll Dialogue: 0,0:27:37.00,0:27:39.00,Default,,0000,0000,0000,,leave you with the thought of, Dialogue: 0,0:27:39.00,0:27:41.00,Default,,0000,0000,0000,,that writing one of these actually correctly and reliably is Dialogue: 0,0:27:41.00,0:27:44.00,Default,,0000,0000,0000,,actually Dialogue: 0,0:27:44.00,0:27:47.00,Default,,0000,0000,0000,,more of an advance exercise than we're gonna look at, but this is actually a hash Dialogue: 0,0:27:47.00,0:27:52.00,Default,,0000,0000,0000,,function that's taken from Don Knuth's Art of Computer Science, Dialogue: 0,0:27:52.00,0:27:55.00,Default,,0000,0000,0000,,some of the work that guides a lot of computer scientists still to this day, Dialogue: 0,0:27:55.00,0:27:59.00,Default,,0000,0000,0000,,and it has the strategy I basically told you about of, Dialogue: 0,0:27:59.00,0:28:02.00,Default,,0000,0000,0000,,for all the characters that are in the string, Dialogue: 0,0:28:02.00,0:28:06.00,Default,,0000,0000,0000,,right, adding it into the thing, but having multiplied, Dialogue: 0,0:28:06.00,0:28:10.00,Default,,0000,0000,0000,,right, the previous hash code that's being built up by this large multiplier Dialogue: 0,0:28:10.00,0:28:12.00,Default,,0000,0000,0000,,which is in effect kinda taking Dialogue: 0,0:28:12.00,0:28:14.00,Default,,0000,0000,0000,,advantage of the positional information. Dialogue: 0,0:28:14.00,0:28:17.00,Default,,0000,0000,0000,,And so the multiplier happens to be a very large negative number Dialogue: 0,0:28:17.00,0:28:19.00,Default,,0000,0000,0000,,that's a prime that says, okay, Dialogue: 0,0:28:19.00,0:28:20.00,Default,,0000,0000,0000,,the first time through, right, Dialogue: 0,0:28:20.00,0:28:23.00,Default,,0000,0000,0000,,it'll take the hash code of zero and multiply it by and add the first character. So the first Dialogue: 0,0:28:23.00,0:28:26.00,Default,,0000,0000,0000,,character gets added in without any modification. Dialogue: 0,0:28:26.00,0:28:29.00,Default,,0000,0000,0000,,The next character, though, will take the previous one and multiply it by Dialogue: 0,0:28:29.00,0:28:31.00,Default,,0000,0000,0000,,one power of the multiplier Dialogue: 0,0:28:31.00,0:28:33.00,Default,,0000,0000,0000,,and then add in the next character, Dialogue: 0,0:28:33.00,0:28:36.00,Default,,0000,0000,0000,,and then that sum, right, on the sub [inaudible] gets multiplied again by the multiplier, so Dialogue: 0,0:28:36.00,0:28:39.00,Default,,0000,0000,0000,,raising it to the squared, and the third, and to the fourth powers, it works down the Dialogue: 0,0:28:39.00,0:28:40.00,Default,,0000,0000,0000,,string. The Dialogue: 0,0:28:40.00,0:28:43.00,Default,,0000,0000,0000,,effectiveness actually is that it's gonna do a lot of overflow. Dialogue: 0,0:28:43.00,0:28:45.00,Default,,0000,0000,0000,,That's a very large number Dialogue: 0,0:28:45.00,0:28:47.00,Default,,0000,0000,0000,,and it's adding Dialogue: 0,0:28:47.00,0:28:51.00,Default,,0000,0000,0000,,and multiplying in this very large space, so in effect we're actually counting on the Dialogue: 0,0:28:51.00,0:28:54.00,Default,,0000,0000,0000,,fact that the addition and multiplication Dialogue: 0,0:28:54.00,0:28:56.00,Default,,0000,0000,0000,,that is built into C++ Dialogue: 0,0:28:56.00,0:28:58.00,Default,,0000,0000,0000,,doesn't raise any errors when the numbers get so large that they can't be represented. It Dialogue: 0,0:28:58.00,0:29:03.00,Default,,0000,0000,0000,,just kind of truncates back down to what fits, and so it kind of wraps around using a modular Dialogue: 0,0:29:03.00,0:29:06.00,Default,,0000,0000,0000,,strategy here, like a ring. And so we'll actually end up kind of making a very big number out Dialogue: 0,0:29:06.00,0:29:10.00,Default,,0000,0000,0000,,of this. The longer the string is, the more those multiplies, the more those powers raise. We've got Dialogue: 0,0:29:10.00,0:29:13.00,Default,,0000,0000,0000,,this huge number, what we do is kinda truncating it back to what fits in to a long Dialogue: 0,0:29:13.00,0:29:16.00,Default,,0000,0000,0000,,and then when we're done, we take whatever Dialogue: 0,0:29:16.00,0:29:18.00,Default,,0000,0000,0000,,number fell out of this process Dialogue: 0,0:29:18.00,0:29:21.00,Default,,0000,0000,0000,,and we mod it by the number of buckets to give us a number from zero to buckets minus Dialogue: 0,0:29:21.00,0:29:22.00,Default,,0000,0000,0000,,one. Dialogue: 0,0:29:22.00,0:29:26.00,Default,,0000,0000,0000,,And so whatever gets stuffed into here, right, it's reliable in the sense that it's, Dialogue: 0,0:29:26.00,0:29:28.00,Default,,0000,0000,0000,,every time you put in the number you'll get the same thing back. Dialogue: 0,0:29:28.00,0:29:33.00,Default,,0000,0000,0000,,We know it will be in the range end buckets because of that last mod Dialogue: 0,0:29:33.00,0:29:34.00,Default,,0000,0000,0000,,call Dialogue: 0,0:29:34.00,0:29:36.00,Default,,0000,0000,0000,,that worked it out for us Dialogue: 0,0:29:36.00,0:29:37.00,Default,,0000,0000,0000,,and we Dialogue: 0,0:29:37.00,0:29:40.00,Default,,0000,0000,0000,,then use that to say which bucket to go look in when we're ready Dialogue: 0,0:29:40.00,0:29:41.00,Default,,0000,0000,0000,,to do our work. Dialogue: 0,0:29:41.00,0:29:44.00,Default,,0000,0000,0000,,So if I switch that in for my old hash Dialogue: 0,0:29:44.00,0:29:47.00,Default,,0000,0000,0000,,function, I shouldn't actually see any Dialogue: 0,0:29:47.00,0:29:47.00,Default,,0000,0000,0000,,change Dialogue: 0,0:29:47.00,0:29:48.00,Default,,0000,0000,0000,,from the outside, Dialogue: 0,0:29:48.00,0:29:51.00,Default,,0000,0000,0000,,other than the performance, right, should suddenly actually be a lot faster, which is Dialogue: 0,0:29:51.00,0:29:55.00,Default,,0000,0000,0000,,hard to tell when I have three things in there, but ? I'm gonna Dialogue: 0,0:29:55.00,0:30:01.00,Default,,0000,0000,0000,,take out that error case. In this case car Dialogue: 0,0:30:01.00,0:30:03.00,Default,,0000,0000,0000,,and cat now Dialogue: 0,0:30:03.00,0:30:06.00,Default,,0000,0000,0000,,are probably not very likely to be going to the same bucket. In fact I can Dialogue: 0,0:30:06.00,0:30:09.00,Default,,0000,0000,0000,,guarantee they don't go in the same bucket, whereas Dialogue: 0,0:30:09.00,0:30:16.00,Default,,0000,0000,0000,,before they were kind of overlapping. Dialogue: 0,0:30:16.00,0:30:18.00,Default,,0000,0000,0000,,I'll leave that code there for a second, although Dialogue: 0,0:30:18.00,0:30:19.00,Default,,0000,0000,0000,,really I would say that point of Dialogue: 0,0:30:19.00,0:30:22.00,Default,,0000,0000,0000,,hashing really is not to get so worried about how it is that the hash code does it's Dialogue: 0,0:30:22.00,0:30:25.00,Default,,0000,0000,0000,,work, but to understand the general idea of what a hashing function has to do, Dialogue: 0,0:30:25.00,0:30:29.00,Default,,0000,0000,0000,,right, what its role is and how it relates to getting this constant time Dialogue: 0,0:30:29.00,0:30:30.00,Default,,0000,0000,0000,,performance. Dialogue: 0,0:30:30.00,0:30:33.00,Default,,0000,0000,0000,,So I'll give you an example of hashing in the real world, I think is really interesting Dialogue: 0,0:30:33.00,0:30:37.00,Default,,0000,0000,0000,,to see it actually happen and people not even realizing how hashing is so useful. Dialogue: 0,0:30:37.00,0:30:39.00,Default,,0000,0000,0000,,So I ordered something from REI, which is Dialogue: 0,0:30:39.00,0:30:41.00,Default,,0000,0000,0000,,this camping store and Dialogue: 0,0:30:41.00,0:30:42.00,Default,,0000,0000,0000,,it ? Dialogue: 0,0:30:42.00,0:30:43.00,Default,,0000,0000,0000,,they didn't have it in stock, so they had to Dialogue: 0,0:30:43.00,0:30:46.00,Default,,0000,0000,0000,,place an order for it. This is actually kinda awhile ago, but I ? Dialogue: 0,0:30:46.00,0:30:50.00,Default,,0000,0000,0000,,so then when I called to see if it's in, I call them on the phone, I say, ''Oh, is my order in?'' And they don't Dialogue: 0,0:30:50.00,0:30:52.00,Default,,0000,0000,0000,,ask me my name. I thought this was very funny. Like, Dialogue: 0,0:30:52.00,0:30:55.00,Default,,0000,0000,0000,,''Oh, it's ? I'm Julie. I wanna know about my order.'' They're like, Dialogue: 0,0:30:55.00,0:30:59.00,Default,,0000,0000,0000,,''Okay, what are the last two digits of your phone number?'' And I'm Dialogue: 0,0:30:59.00,0:31:02.00,Default,,0000,0000,0000,,like ? you know, it's not one of those things you can answer right off the top of your head. Okay, 21. Turns out last Dialogue: 0,0:31:02.00,0:31:04.00,Default,,0000,0000,0000,,two digits of my phone number are 21. Dialogue: 0,0:31:04.00,0:31:06.00,Default,,0000,0000,0000,,And they say ? and they go and they look in the 21 basket and then they say, ''What's your Dialogue: 0,0:31:06.00,0:31:09.00,Default,,0000,0000,0000,,name?'' Now they want to know my name. I'm like, ''Okay, my name's Julie.'' Okay, they look it up and they're Dialogue: 0,0:31:09.00,0:31:11.00,Default,,0000,0000,0000,,like, ''Okay, it's here.'' I'm like, Dialogue: 0,0:31:11.00,0:31:12.00,Default,,0000,0000,0000,,''Okay.'' So then I go to get it, like Dialogue: 0,0:31:12.00,0:31:13.00,Default,,0000,0000,0000,,a couple days later, and I go to the store Dialogue: 0,0:31:13.00,0:31:15.00,Default,,0000,0000,0000,,and I'm standing in the line waiting for them, Dialogue: 0,0:31:15.00,0:31:17.00,Default,,0000,0000,0000,,and I look up on the wall Dialogue: 0,0:31:17.00,0:31:20.00,Default,,0000,0000,0000,,and they have Dialogue: 0,0:31:20.00,0:31:22.00,Default,,0000,0000,0000,,100 baskets, little Dialogue: 0,0:31:22.00,0:31:24.00,Default,,0000,0000,0000,,wire baskets up on the wall Dialogue: 0,0:31:24.00,0:31:27.00,Default,,0000,0000,0000,,and they're labeled 0 through 9 over here Dialogue: 0,0:31:27.00,0:31:29.00,Default,,0000,0000,0000,,and 0 through 9 over there, Dialogue: 0,0:31:29.00,0:31:33.00,Default,,0000,0000,0000,,and when I come in they ask me the same question. ''What's the last two digits of your phone number?'' Dialogue: 0,0:31:33.00,0:31:36.00,Default,,0000,0000,0000,,Now I'm all prepared because I've already been primed on this question. I say, ''21.'' Then Dialogue: 0,0:31:36.00,0:31:38.00,Default,,0000,0000,0000,,they walk over here, right, Dialogue: 0,0:31:38.00,0:31:40.00,Default,,0000,0000,0000,,to the 21 basket here. Dialogue: 0,0:31:40.00,0:31:42.00,Default,,0000,0000,0000,,It's like the top digit in this digit, Dialogue: 0,0:31:42.00,0:31:45.00,Default,,0000,0000,0000,,and there's only one piece of paper in there, and they pull out my order thing, Dialogue: 0,0:31:45.00,0:31:48.00,Default,,0000,0000,0000,,and then they get me my tent and everybody's happy. Dialogue: 0,0:31:48.00,0:31:52.00,Default,,0000,0000,0000,,So while I'm there, I try to engage the cashier on how this is an example of a real world Dialogue: 0,0:31:52.00,0:31:55.00,Default,,0000,0000,0000,,hashing system Dialogue: 0,0:31:55.00,0:31:58.00,Default,,0000,0000,0000,,and I still got my tent, I'll have you know, but it was close, Dialogue: 0,0:31:58.00,0:32:02.00,Default,,0000,0000,0000,,right. They're like, ''Okay, yeah, you crazy crackpot. You can now leave now.'' Dialogue: 0,0:32:02.00,0:32:06.00,Default,,0000,0000,0000,,I was like looking at it and them and I tried to talk to them about the number of digits because in some sets, right, this is a Dialogue: 0,0:32:06.00,0:32:10.00,Default,,0000,0000,0000,,very good example of what the investment in the number of buckets is versus what Dialogue: 0,0:32:10.00,0:32:12.00,Default,,0000,0000,0000,,tradeoff it gives you, right. Because there's this very physical Dialogue: 0,0:32:12.00,0:32:13.00,Default,,0000,0000,0000,,sort of set up of this, it's like Dialogue: 0,0:32:13.00,0:32:17.00,Default,,0000,0000,0000,,by choosing a larger number of buckets, right, you can more fine grain your Dialogue: 0,0:32:17.00,0:32:18.00,Default,,0000,0000,0000,,access, Dialogue: 0,0:32:18.00,0:32:22.00,Default,,0000,0000,0000,,but that investment in buckets means you're kind of permanently installing that story. So Dialogue: 0,0:32:22.00,0:32:25.00,Default,,0000,0000,0000,,in this case, right, they didn't use just the last digit of my phone number. They could've done that and Dialogue: 0,0:32:25.00,0:32:28.00,Default,,0000,0000,0000,,they would have only had ten buckets on the wall, but then they would expect, Dialogue: 0,0:32:28.00,0:32:30.00,Default,,0000,0000,0000,,you know, ten times as many things in each bucket. Dialogue: 0,0:32:30.00,0:32:33.00,Default,,0000,0000,0000,,And apparently their Dialogue: 0,0:32:33.00,0:32:37.00,Default,,0000,0000,0000,,estimate was the number of active orders in this backorder category was enough Dialogue: 0,0:32:37.00,0:32:38.00,Default,,0000,0000,0000,,to Dialogue: 0,0:32:38.00,0:32:41.00,Default,,0000,0000,0000,,warrant being more than ten, you know, significantly more than ten, Dialogue: 0,0:32:41.00,0:32:45.00,Default,,0000,0000,0000,,but not so much more than 100, then in fact 100 buckets was the right Dialogue: 0,0:32:45.00,0:32:46.00,Default,,0000,0000,0000,,investment to make in their bucket strategy. Dialogue: 0,0:32:46.00,0:32:49.00,Default,,0000,0000,0000,,They didn't put three buckets on the wall because, you know, like what are they Dialogue: 0,0:32:49.00,0:32:52.00,Default,,0000,0000,0000,,gonna do, have this big 3D sort of thing. They didn't enjoy this Dialogue: 0,0:32:52.00,0:32:56.00,Default,,0000,0000,0000,,discussion, I went on for hours with them and they were, like, really not amused. But it Dialogue: 0,0:32:56.00,0:32:59.00,Default,,0000,0000,0000,,had, you know, the three digits, then you'd get this even more likelihood that each Dialogue: 0,0:32:59.00,0:33:03.00,Default,,0000,0000,0000,,bucket would be empty, but ? or have a very small number of things, but given Dialogue: 0,0:33:03.00,0:33:07.00,Default,,0000,0000,0000,,their set up, that seemed to be the right strategy was to say 100 buckets and now ? they didn't ask me Dialogue: 0,0:33:07.00,0:33:09.00,Default,,0000,0000,0000,,the Dialogue: 0,0:33:09.00,0:33:11.00,Default,,0000,0000,0000,,first two digits of my phone number. Dialogue: 0,0:33:11.00,0:33:14.00,Default,,0000,0000,0000,,They asked me the last two. Why does that Dialogue: 0,0:33:14.00,0:33:18.00,Default,,0000,0000,0000,,matter? Because you're using all [inaudible]. Yeah. It's hated a Dialogue: 0,0:33:18.00,0:33:21.00,Default,,0000,0000,0000,,lot. If you ask, like Stanford students, right, like especially Dialogue: 0,0:33:21.00,0:33:25.00,Default,,0000,0000,0000,,when, you know, like campus numbers used to all be 49's or 72's or something, so like, if Dialogue: 0,0:33:25.00,0:33:27.00,Default,,0000,0000,0000,,you use the first two digits, right, you'd have everybody's in the 49 bucket or Dialogue: 0,0:33:27.00,0:33:29.00,Default,,0000,0000,0000,,the 72 bucket or Dialogue: 0,0:33:29.00,0:33:32.00,Default,,0000,0000,0000,,something, and a whole bunch of other ones not used. An example, even the first two digits are never Dialogue: 0,0:33:32.00,0:33:35.00,Default,,0000,0000,0000,,00, like there's a whole bunch of buckets that don't even get used as the first two digits of a Dialogue: 0,0:33:35.00,0:33:38.00,Default,,0000,0000,0000,,phone number. Phone numbers never start with zeros. Dialogue: 0,0:33:38.00,0:33:42.00,Default,,0000,0000,0000,,That they rarely actually have zeros or ones in them at all. They try to use those for the Dialogue: 0,0:33:42.00,0:33:44.00,Default,,0000,0000,0000,,area code and stuff in the leading digits. So I Dialogue: 0,0:33:44.00,0:33:45.00,Default,,0000,0000,0000,,thought it was just a really Dialogue: 0,0:33:45.00,0:33:48.00,Default,,0000,0000,0000,,interesting exercise and like, oh yeah, they exactly had kinda thought through hashing. Of Dialogue: 0,0:33:48.00,0:33:52.00,Default,,0000,0000,0000,,course, they did not think it was hashing, and they thought I was a crackpot, and they didn't want to give me Dialogue: 0,0:33:52.00,0:33:56.00,Default,,0000,0000,0000,,my stuff, but I paid my money and they [inaudible], Dialogue: 0,0:33:56.00,0:34:00.00,Default,,0000,0000,0000,,but it kinda shows you, okay what you're trying to do here is try to make that number of Dialogue: 0,0:34:00.00,0:34:00.00,Default,,0000,0000,0000,,buckets, Dialogue: 0,0:34:00.00,0:34:03.00,Default,,0000,0000,0000,,right, roughly equal to the number of elements Dialogue: 0,0:34:03.00,0:34:04.00,Default,,0000,0000,0000,,so that Dialogue: 0,0:34:04.00,0:34:09.00,Default,,0000,0000,0000,,if your hash function's dividing up across your whole world, Dialogue: 0,0:34:09.00,0:34:11.00,Default,,0000,0000,0000,,you've got Dialogue: 0,0:34:11.00,0:34:14.00,Default,,0000,0000,0000,,constant access to what Dialogue: 0,0:34:14.00,0:34:17.00,Default,,0000,0000,0000,,you have. Why is there an L at the end of one item? You know, that's actually just a ? C++ isn't for this. This is a long value Dialogue: 0,0:34:17.00,0:34:21.00,Default,,0000,0000,0000,,that without it, it assumes it's an int and that number is to big to fit in an int and it Dialogue: 0,0:34:21.00,0:34:25.00,Default,,0000,0000,0000,,is twice as big as a long in terms of space and so the L needs to be there, otherwise it Dialogue: 0,0:34:25.00,0:34:26.00,Default,,0000,0000,0000,,will Dialogue: 0,0:34:26.00,0:34:30.00,Default,,0000,0000,0000,,try to read it as an int and through away part of the information I was trying to give it. So it's the Dialogue: 0,0:34:30.00,0:34:31.00,Default,,0000,0000,0000,,way Dialogue: 0,0:34:31.00,0:34:35.00,Default,,0000,0000,0000,,of identifying it along constant. Dialogue: 0,0:34:35.00,0:34:37.00,Default,,0000,0000,0000,,So let me talk about this just a little bit more Dialogue: 0,0:34:37.00,0:34:39.00,Default,,0000,0000,0000,,in terms of Dialogue: 0,0:34:39.00,0:34:41.00,Default,,0000,0000,0000,, Dialogue: 0,0:34:41.00,0:34:43.00,Default,,0000,0000,0000,,actual performance, right, we've got N Dialogue: 0,0:34:43.00,0:34:46.00,Default,,0000,0000,0000,,over B, right, so if the division is complete, right, the number of entries over Dialogue: 0,0:34:46.00,0:34:48.00,Default,,0000,0000,0000,,the number of buckets, Dialogue: 0,0:34:48.00,0:34:48.00,Default,,0000,0000,0000,, Dialogue: 0,0:34:48.00,0:34:51.00,Default,,0000,0000,0000,,if we make them kind of on the same order of Dialogue: 0,0:34:51.00,0:34:52.00,Default,,0000,0000,0000,,magnitude, kind of in the same range, Dialogue: 0,0:34:52.00,0:34:55.00,Default,,0000,0000,0000,,so having to sum this could be to make a little bit of an estimated guess about Dialogue: 0,0:34:55.00,0:34:58.00,Default,,0000,0000,0000,,where we're going, Dialogue: 0,0:34:58.00,0:35:01.00,Default,,0000,0000,0000,,and then there are some choices here in how we store each bucket, right. We could use Dialogue: 0,0:35:01.00,0:35:05.00,Default,,0000,0000,0000,,a little array, we could use a link list, right, we could use a vector. Dialogue: 0,0:35:05.00,0:35:07.00,Default,,0000,0000,0000,,We expect this to be very small, is the truth. Dialogue: 0,0:35:07.00,0:35:09.00,Default,,0000,0000,0000,,We're hoping that it's gonna be one or two, Dialogue: 0,0:35:09.00,0:35:13.00,Default,,0000,0000,0000,,and so there's not a lot of reason to buy any real expensive structure or even bother Dialogue: 0,0:35:13.00,0:35:17.00,Default,,0000,0000,0000,,doing things like sorting. Like you could sort them, but like if you expect there to be two Dialogue: 0,0:35:17.00,0:35:19.00,Default,,0000,0000,0000,,things, what's the point? You're gonna spend more time Dialogue: 0,0:35:19.00,0:35:21.00,Default,,0000,0000,0000,,figuring out whether to put it in the front or the back Dialogue: 0,0:35:21.00,0:35:25.00,Default,,0000,0000,0000,,then you would gain at all by being able to Dialogue: 0,0:35:25.00,0:35:27.00,Default,,0000,0000,0000,,find it a little faster. Dialogue: 0,0:35:27.00,0:35:28.00,Default,,0000,0000,0000,,So the link list Dialogue: 0,0:35:28.00,0:35:32.00,Default,,0000,0000,0000,,is just gonna be easiest way to get kind of allocated without over capacity, Dialogue: 0,0:35:32.00,0:35:35.00,Default,,0000,0000,0000,,excess capacity that's unused in the bucket. You know it's gonna be small, use the Dialogue: 0,0:35:35.00,0:35:37.00,Default,,0000,0000,0000,,simple strategy. Dialogue: 0,0:35:37.00,0:35:41.00,Default,,0000,0000,0000,,It's also, though, an important thing that the hash type is likely to do in kind of a Dialogue: 0,0:35:41.00,0:35:44.00,Default,,0000,0000,0000,,industrial strength situation is it does have to deal with this idea, well what if that number of Dialogue: 0,0:35:44.00,0:35:47.00,Default,,0000,0000,0000,,buckets that you predicted was wrong? Dialogue: 0,0:35:47.00,0:35:51.00,Default,,0000,0000,0000,,And so the map as we have given it to you, right, has to take this sponge because Dialogue: 0,0:35:51.00,0:35:54.00,Default,,0000,0000,0000,,it doesn't know in advance how many things are you gonna put in. Dialogue: 0,0:35:54.00,0:35:57.00,Default,,0000,0000,0000,,The only way to know that is to wait and see as things get added, Dialogue: 0,0:35:57.00,0:35:59.00,Default,,0000,0000,0000,,and then realize your buckets are getting full. Dialogue: 0,0:35:59.00,0:36:02.00,Default,,0000,0000,0000,,So typically the industrial strength version of the hash table is gonna Dialogue: 0,0:36:02.00,0:36:06.00,Default,,0000,0000,0000,,track what's called the load factor. And the load factor is just the number of total Dialogue: 0,0:36:06.00,0:36:08.00,Default,,0000,0000,0000,,entries divided by the number of buckets. Dialogue: 0,0:36:08.00,0:36:09.00,Default,,0000,0000,0000,,When that Dialogue: 0,0:36:09.00,0:36:13.00,Default,,0000,0000,0000,,factor gets high, and high actually is actually quite small, in fact. If Dialogue: 0,0:36:13.00,0:36:15.00,Default,,0000,0000,0000,,it's two, is often the trigger for Dialogue: 0,0:36:15.00,0:36:19.00,Default,,0000,0000,0000,,causing a rehash situation, so Dialogue: 0,0:36:19.00,0:36:22.00,Default,,0000,0000,0000,,not even letting it get to be three or four. Just saying as soon as you realize that Dialogue: 0,0:36:22.00,0:36:22.00,Default,,0000,0000,0000,,you have Dialogue: 0,0:36:22.00,0:36:26.00,Default,,0000,0000,0000,,exceeded by double the capacity you intended, you planned for 100 things, Dialogue: 0,0:36:26.00,0:36:27.00,Default,,0000,0000,0000,,you got 200 things, Dialogue: 0,0:36:27.00,0:36:28.00,Default,,0000,0000,0000,,go ahead Dialogue: 0,0:36:28.00,0:36:31.00,Default,,0000,0000,0000,,and redo your whole bucket array. Dialogue: 0,0:36:31.00,0:36:39.00,Default,,0000,0000,0000,,So in the case, for example, of our simple like 0 through 7 case, right, maybe Dialogue: 0,0:36:39.00,0:36:43.00,Default,,0000,0000,0000,,it's 0 through 6. One, two, three, Dialogue: 0,0:36:43.00,0:36:45.00,Default,,0000,0000,0000,,four, so I have one that had six buckets, right, Dialogue: 0,0:36:45.00,0:36:48.00,Default,,0000,0000,0000,,and then I've gotten to where Dialogue: 0,0:36:48.00,0:36:50.00,Default,,0000,0000,0000,,each of them has Dialogue: 0,0:36:50.00,0:36:54.00,Default,,0000,0000,0000,,two or maybe three in some and one in another. Dialogue: 0,0:36:54.00,0:36:57.00,Default,,0000,0000,0000,,The plan is to just take this whole bucket array and Dialogue: 0,0:36:57.00,0:37:01.00,Default,,0000,0000,0000,,enlarge it by a factor of two, so grow it, and in this case, from 6 to 12, Dialogue: 0,0:37:01.00,0:37:02.00,Default,,0000,0000,0000,,and then rehash Dialogue: 0,0:37:02.00,0:37:04.00,Default,,0000,0000,0000,,to move everybody. The Dialogue: 0,0:37:04.00,0:37:08.00,Default,,0000,0000,0000,,idea is that the earlier decision about where to place them was based on knowing that Dialogue: 0,0:37:08.00,0:37:11.00,Default,,0000,0000,0000,,there was exactly six buckets and so where it was placed before Dialogue: 0,0:37:11.00,0:37:15.00,Default,,0000,0000,0000,,is not likely to be the place it will place if you have, you know, Dialogue: 0,0:37:15.00,0:37:18.00,Default,,0000,0000,0000,,12 choices to lay them down into. Dialogue: 0,0:37:18.00,0:37:21.00,Default,,0000,0000,0000,,And ideal ? in fact you don't even want it to be that case. Like, if they all land into the Dialogue: 0,0:37:21.00,0:37:23.00,Default,,0000,0000,0000,,same bucket, right, you would have Dialogue: 0,0:37:23.00,0:37:25.00,Default,,0000,0000,0000,,the same clustering and then a big empty clump, Dialogue: 0,0:37:25.00,0:37:28.00,Default,,0000,0000,0000,,and so you would rehash everything, run it through the new hash function having Dialogue: 0,0:37:28.00,0:37:31.00,Default,,0000,0000,0000,,changed the number of buckets, and say, ''Well, now where does it go in the hashing scheme?'' Dialogue: 0,0:37:31.00,0:37:34.00,Default,,0000,0000,0000,,And hopefully you'd end up getting a clean table, right, where you had 12 Dialogue: 0,0:37:34.00,0:37:37.00,Default,,0000,0000,0000,,items now with one in each bucket, Dialogue: 0,0:37:37.00,0:37:40.00,Default,,0000,0000,0000,,ready to kinda give constant performance through Dialogue: 0,0:37:40.00,0:37:42.00,Default,,0000,0000,0000,,that and then potentially, Dialogue: 0,0:37:42.00,0:37:43.00,Default,,0000,0000,0000,,again if you overload Dialogue: 0,0:37:43.00,0:37:47.00,Default,,0000,0000,0000,,your load factor, go back in and rearrange stuff again. Dialogue: 0,0:37:47.00,0:37:50.00,Default,,0000,0000,0000,,So using this strategy a little bit like the one we talked about with the binary search tree of just Dialogue: 0,0:37:50.00,0:37:51.00,Default,,0000,0000,0000,,wait and see. Rather Dialogue: 0,0:37:51.00,0:37:55.00,Default,,0000,0000,0000,,than if you got, you know, just because you get two items in a bucket it doesn't immediately cause any Dialogue: 0,0:37:55.00,0:37:57.00,Default,,0000,0000,0000,,real alarm or crisis, Dialogue: 0,0:37:57.00,0:38:00.00,Default,,0000,0000,0000,,it waits until there's kinda sufficiently clear demand Dialogue: 0,0:38:00.00,0:38:03.00,Default,,0000,0000,0000,,that exceeds the capacity plan for, Dialogue: 0,0:38:03.00,0:38:05.00,Default,,0000,0000,0000,,to do a big rearrangement. Dialogue: 0,0:38:05.00,0:38:08.00,Default,,0000,0000,0000,,So you'd end up saying that adds, adds, adds would be fast, fast, fast, fast, fast. Dialogue: 0,0:38:08.00,0:38:12.00,Default,,0000,0000,0000,,Starting to get just a hair slower, right, having to do a search through a few things as opposed Dialogue: 0,0:38:12.00,0:38:13.00,Default,,0000,0000,0000,,to one, Dialogue: 0,0:38:13.00,0:38:16.00,Default,,0000,0000,0000,,and then every now and then, right, you'd see an add which caused a big reshuffle of Dialogue: 0,0:38:16.00,0:38:19.00,Default,,0000,0000,0000,,the table, which would be a totally linear operation in the number of Dialogue: 0,0:38:19.00,0:38:20.00,Default,,0000,0000,0000,,elements Dialogue: 0,0:38:20.00,0:38:22.00,Default,,0000,0000,0000,,and then they'd be fast again Dialogue: 0,0:38:22.00,0:38:24.00,Default,,0000,0000,0000,,for another big clump of inserts until that Dialogue: 0,0:38:24.00,0:38:27.00,Default,,0000,0000,0000,,same expansion might be complete or triggered by Dialogue: 0,0:38:27.00,0:38:28.00,Default,,0000,0000,0000,,it growing even more. Dialogue: 0,0:38:28.00,0:38:31.00,Default,,0000,0000,0000,,But dividing it sorta by the total number of inserts, so if you had 1,000 inserts Dialogue: 0,0:38:31.00,0:38:35.00,Default,,0000,0000,0000,,before you overloaded and then you had to copy all 1,000 things to get ready for the Dialogue: 0,0:38:35.00,0:38:36.00,Default,,0000,0000,0000,,next thousand, Dialogue: 0,0:38:36.00,0:38:39.00,Default,,0000,0000,0000,,if you average that out, it still ends up Dialogue: 0,0:38:39.00,0:38:43.00,Default,,0000,0000,0000,,being called a constant operation. So Dialogue: 0,0:38:43.00,0:38:46.00,Default,,0000,0000,0000,,that's pretty neat. Dialogue: 0,0:38:46.00,0:38:47.00,Default,,0000,0000,0000,,It's a ? Dialogue: 0,0:38:47.00,0:38:50.00,Default,,0000,0000,0000,,really one of the, sort of, more, I think, Dialogue: 0,0:38:50.00,0:38:53.00,Default,,0000,0000,0000,,easily understood and beautiful algorithmic techniques, right, for solving, right, Dialogue: 0,0:38:53.00,0:38:55.00,Default,,0000,0000,0000,,this hard problem of Dialogue: 0,0:38:55.00,0:38:57.00,Default,,0000,0000,0000,,search and retrieval Dialogue: 0,0:38:57.00,0:38:58.00,Default,,0000,0000,0000,,that Dialogue: 0,0:38:58.00,0:39:01.00,Default,,0000,0000,0000,,the vector and the sorting and the BST are all trying to get at, but ? and sometimes the Dialogue: 0,0:39:01.00,0:39:04.00,Default,,0000,0000,0000,,sorter, vector and the BST are solving actually a much harder problem, right, Dialogue: 0,0:39:04.00,0:39:09.00,Default,,0000,0000,0000,,which is keeping a total ordering of all the data and being able to flip it, Dialogue: 0,0:39:09.00,0:39:12.00,Default,,0000,0000,0000,,traverse it and sorted order is one of the advantages that the sorted vector Dialogue: 0,0:39:12.00,0:39:12.00,Default,,0000,0000,0000,,and Dialogue: 0,0:39:12.00,0:39:14.00,Default,,0000,0000,0000,,the BST have that hash table doesn't have at all. Dialogue: 0,0:39:14.00,0:39:17.00,Default,,0000,0000,0000,,In fact, it's jumbled it all up, and so if you think about how the iterator for Dialogue: 0,0:39:17.00,0:39:21.00,Default,,0000,0000,0000,,the map works, our map is actually implemented using a hash table, Dialogue: 0,0:39:21.00,0:39:24.00,Default,,0000,0000,0000,,it actually just iterates over the link list, and that explains why it almost appears Dialogue: 0,0:39:24.00,0:39:27.00,Default,,0000,0000,0000,,completely random. If you put a bunch of things in the table and you go to iterate Dialogue: 0,0:39:27.00,0:39:28.00,Default,,0000,0000,0000,,and Dialogue: 0,0:39:28.00,0:39:29.00,Default,,0000,0000,0000,,visit them again, Dialogue: 0,0:39:29.00,0:39:33.00,Default,,0000,0000,0000,,that the order you visit the keys seems to make no sense, and that's because it's based on Dialogue: 0,0:39:33.00,0:39:35.00,Default,,0000,0000,0000,,their hash codes, right, which Dialogue: 0,0:39:35.00,0:39:36.00,Default,,0000,0000,0000,,aren't designed Dialogue: 0,0:39:36.00,0:39:37.00,Default,,0000,0000,0000,,to be Dialogue: 0,0:39:37.00,0:39:39.00,Default,,0000,0000,0000,,any real sensible ordering Dialogue: 0,0:39:39.00,0:39:42.00,Default,,0000,0000,0000,,to anybody not in the know of how the hash function works, Dialogue: 0,0:39:42.00,0:39:46.00,Default,,0000,0000,0000,,whereas iterating over a vector that's sorted or iterating over a BST Dialogue: 0,0:39:46.00,0:39:47.00,Default,,0000,0000,0000,,or a, Dialogue: 0,0:39:47.00,0:39:51.00,Default,,0000,0000,0000,,in this case, our set, for example, is backed by a binary research tree Dialogue: 0,0:39:51.00,0:39:54.00,Default,,0000,0000,0000,,will give you that sorted order. So it solves this harder problem, right, which cause there to Dialogue: 0,0:39:54.00,0:39:57.00,Default,,0000,0000,0000,,be more kinda more investment in that problem, whereas hashing solves just exactly the Dialogue: 0,0:39:57.00,0:39:58.00,Default,,0000,0000,0000,,one problem, which is Dialogue: 0,0:39:58.00,0:40:02.00,Default,,0000,0000,0000,,I want to be able to find exactly this value again Dialogue: 0,0:40:02.00,0:40:03.00,Default,,0000,0000,0000,,and update this value, Dialogue: 0,0:40:03.00,0:40:08.00,Default,,0000,0000,0000,,and nothing more is needed than just identifying this, Dialogue: 0,0:40:08.00,0:40:10.00,Default,,0000,0000,0000,,not finding the things that are less than this. For example, if you needed to find all Dialogue: 0,0:40:10.00,0:40:13.00,Default,,0000,0000,0000,,the values that were less than this key alphabetically, Dialogue: 0,0:40:13.00,0:40:15.00,Default,,0000,0000,0000,,right, the hash table makes that Dialogue: 0,0:40:15.00,0:40:16.00,Default,,0000,0000,0000,,no easier Dialogue: 0,0:40:16.00,0:40:18.00,Default,,0000,0000,0000,,than an unsorted vector, Dialogue: 0,0:40:18.00,0:40:21.00,Default,,0000,0000,0000,,whereas things like assorted vector and binary search tree actually enable that kinda search, Dialogue: 0,0:40:21.00,0:40:24.00,Default,,0000,0000,0000,,you could find the place in the tree and kind of work your way down the left side Dialogue: 0,0:40:24.00,0:40:27.00,Default,,0000,0000,0000,,to find the things that were less than that. Dialogue: 0,0:40:27.00,0:40:31.00,Default,,0000,0000,0000,,That's not actually being solved by the hash at all. Its use of Dialogue: 0,0:40:31.00,0:40:34.00,Default,,0000,0000,0000,,memory is actually comparable to a Dialogue: 0,0:40:34.00,0:40:35.00,Default,,0000,0000,0000,,binary search tree Dialogue: 0,0:40:35.00,0:40:38.00,Default,,0000,0000,0000,,in that it has a four byte pointer per entry, which is the next field on the Dialogue: 0,0:40:38.00,0:40:40.00,Default,,0000,0000,0000,,link list chain, and Dialogue: 0,0:40:40.00,0:40:44.00,Default,,0000,0000,0000,,then there's a four byte pointer for each bucket, the head of the link list Dialogue: 0,0:40:44.00,0:40:47.00,Default,,0000,0000,0000,,cell. Given that we expect that the buckets to be roughly equal to entry, Dialogue: 0,0:40:47.00,0:40:51.00,Default,,0000,0000,0000,,than we can kinda just summarize that as well. Each entry Dialogue: 0,0:40:51.00,0:40:53.00,Default,,0000,0000,0000,,represented an eight byte overhead, Dialogue: 0,0:40:53.00,0:40:56.00,Default,,0000,0000,0000,,which is the same eight byte overhead, right, that the left and right pointers of Dialogue: 0,0:40:56.00,0:40:58.00,Default,,0000,0000,0000,,the binary search Dialogue: 0,0:40:58.00,0:40:59.00,Default,,0000,0000,0000,,tree add. Dialogue: 0,0:40:59.00,0:41:01.00,Default,,0000,0000,0000,,Does it have any degenerate cases? That's a good Dialogue: 0,0:41:01.00,0:41:06.00,Default,,0000,0000,0000,,question. So one of the things that made the binary search tree a little bit of a Dialogue: 0,0:41:06.00,0:41:08.00,Default,,0000,0000,0000,,heavy investment was that to get that Dialogue: 0,0:41:08.00,0:41:11.00,Default,,0000,0000,0000,,balancing behavior, even when we're getting data coming in in a way that's Dialogue: 0,0:41:11.00,0:41:16.00,Default,,0000,0000,0000,,causing a lopsided construction, we have to go out of our way to do something special. Dialogue: 0,0:41:16.00,0:41:19.00,Default,,0000,0000,0000,,What is the degenerate case for hash? Is there something that caused it to behave really, Dialogue: 0,0:41:19.00,0:41:24.00,Default,,0000,0000,0000,,really badly? Anyone wanna help me out with Dialogue: 0,0:41:24.00,0:41:28.00,Default,,0000,0000,0000,,that? Is it always good? Does it always give a good spread? Can we Dialogue: 0,0:41:28.00,0:41:32.00,Default,,0000,0000,0000,,end up with everything in the same bucket somehow? [Inaudible] repeated it that way? Dialogue: 0,0:41:32.00,0:41:35.00,Default,,0000,0000,0000,,So if you have repeated elements the way that ? Dialogue: 0,0:41:35.00,0:41:37.00,Default,,0000,0000,0000,,both versions of the map work is they overwrite. Dialogue: 0,0:41:37.00,0:41:41.00,Default,,0000,0000,0000,,So actually the nice thing is, yeah, if you did it again you just take that same cell and Dialogue: 0,0:41:41.00,0:41:43.00,Default,,0000,0000,0000,,overwrite it, so in fact we wouldn't get a clustering based on the same entry Dialogue: 0,0:41:43.00,0:41:45.00,Default,,0000,0000,0000,,going in and out. Dialogue: 0,0:41:45.00,0:41:51.00,Default,,0000,0000,0000,,But how'd we end up with everything being in the same bucket? Dialogue: 0,0:41:51.00,0:41:54.00,Default,,0000,0000,0000,,Mostly comes out, if you look at your hash function, when would a hash function collide? Dialogue: 0,0:41:54.00,0:41:58.00,Default,,0000,0000,0000,,Right, and if you have a dumb hash function, right, you can definitely have some degenerate Dialogue: 0,0:41:58.00,0:42:01.00,Default,,0000,0000,0000,,cases. My dumb hash function of return zero, right, being the Dialogue: 0,0:42:01.00,0:42:03.00,Default,,0000,0000,0000,,worst example of that, Dialogue: 0,0:42:03.00,0:42:07.00,Default,,0000,0000,0000,,but any hash function, right, that wasn't being particularly clever about using all of its Dialogue: 0,0:42:07.00,0:42:09.00,Default,,0000,0000,0000,,information might actually have some clustering in Dialogue: 0,0:42:09.00,0:42:12.00,Default,,0000,0000,0000,,it. It is possible, for example, even with a particularly good hash function that Dialogue: 0,0:42:12.00,0:42:15.00,Default,,0000,0000,0000,,there are strings that will hash to the same thing, and it's like, if you somehow got Dialogue: 0,0:42:15.00,0:42:18.00,Default,,0000,0000,0000,,really, really unlucky Dialogue: 0,0:42:18.00,0:42:19.00,Default,,0000,0000,0000,,that you ? Dialogue: 0,0:42:19.00,0:42:23.00,Default,,0000,0000,0000,,and had a hash function that wasn't doing the best job possible that you could get some Dialogue: 0,0:42:23.00,0:42:25.00,Default,,0000,0000,0000,,clustering, but in general it doesn't require Dialogue: 0,0:42:25.00,0:42:29.00,Default,,0000,0000,0000,,? the sole responsibility, right, for the generate case comes down to your hash function. Dialogue: 0,0:42:29.00,0:42:31.00,Default,,0000,0000,0000,,So as long as your hash function Dialogue: 0,0:42:31.00,0:42:34.00,Default,,0000,0000,0000,,and your inputs match your expectation, Dialogue: 0,0:42:34.00,0:42:35.00,Default,,0000,0000,0000,,you don't have any surprises about Dialogue: 0,0:42:35.00,0:42:38.00,Default,,0000,0000,0000,,how they got inserted the way it was in a binary search tree, which is sometimes hard Dialogue: 0,0:42:38.00,0:42:40.00,Default,,0000,0000,0000,,to control. Dialogue: 0,0:42:40.00,0:42:43.00,Default,,0000,0000,0000,,Just as you could underestimate the number of buckets you need ? Yeah. ? could Dialogue: 0,0:42:43.00,0:42:46.00,Default,,0000,0000,0000,,you over estimate the number of buckets you Dialogue: 0,0:42:46.00,0:42:51.00,Default,,0000,0000,0000,,need by a large amount ? Um hm. ? that's wasting a lot of it ? Exactly. ? memory, and Dialogue: 0,0:42:51.00,0:42:52.00,Default,,0000,0000,0000,,do you then go through and take Dialogue: 0,0:42:52.00,0:42:54.00,Default,,0000,0000,0000,,that memory back? Yeah. Dialogue: 0,0:42:54.00,0:42:57.00,Default,,0000,0000,0000,,Yeah, so that's a great question. So a lot of data structures Dialogue: 0,0:42:57.00,0:42:58.00,Default,,0000,0000,0000,,don't shrink, Dialogue: 0,0:42:58.00,0:43:02.00,Default,,0000,0000,0000,,right, and for example, the vector often will only grow on demand and then even if you deleted a Dialogue: 0,0:43:02.00,0:43:04.00,Default,,0000,0000,0000,,bunch of, or removed a bunch of elements, it doesn't Dialogue: 0,0:43:04.00,0:43:06.00,Default,,0000,0000,0000,,necessarily consider shrinking a big deal, and so Dialogue: 0,0:43:06.00,0:43:10.00,Default,,0000,0000,0000,,that's often a ? maybe it's a little bit lazy, but it turns out, right, that having a bunch of memory around Dialogue: 0,0:43:10.00,0:43:12.00,Default,,0000,0000,0000,,you're not using Dialogue: 0,0:43:12.00,0:43:15.00,Default,,0000,0000,0000,,doesn't have nearly the same penalty that not Dialogue: 0,0:43:15.00,0:43:17.00,Default,,0000,0000,0000,,having the memory when you needed it and having to do a lot of extra work to find Dialogue: 0,0:43:17.00,0:43:18.00,Default,,0000,0000,0000,,things. Dialogue: 0,0:43:18.00,0:43:21.00,Default,,0000,0000,0000,,So typically, right, you would try to pick a size that you are Dialogue: 0,0:43:21.00,0:43:26.00,Default,,0000,0000,0000,,willing to commit to, and say, ''Well, no matter what, I'll stay at this size. I won't get any smaller.'' Dialogue: 0,0:43:26.00,0:43:28.00,Default,,0000,0000,0000,, Dialogue: 0,0:43:28.00,0:43:30.00,Default,,0000,0000,0000,,But that, as you grow, you might not shrink back to that size. You might just Dialogue: 0,0:43:30.00,0:43:34.00,Default,,0000,0000,0000,,let the capacity just kinda lay in waste. There are good Dialogue: 0,0:43:34.00,0:43:37.00,Default,,0000,0000,0000,,reasons to actually be tidy about it, but again, some of this is hard to predict. Dialogue: 0,0:43:37.00,0:43:40.00,Default,,0000,0000,0000,,It might be that your table kinda grows big and then a bunch of things get taken out, and then it grows Dialogue: 0,0:43:40.00,0:43:41.00,Default,,0000,0000,0000,,big again. Like, maybe you Dialogue: 0,0:43:41.00,0:43:43.00,Default,,0000,0000,0000,,have a new freshman class come in, Dialogue: 0,0:43:43.00,0:43:45.00,Default,,0000,0000,0000,,and then you graduate a whole bunch of people. Then Dialogue: 0,0:43:45.00,0:43:47.00,Default,,0000,0000,0000,,at the point where you graduate them all, you're about ready to take in a new freshman Dialogue: 0,0:43:47.00,0:43:51.00,Default,,0000,0000,0000,,class and so it might be that in fact, right, the capacity you just got rid of, you're gonna plan on Dialogue: 0,0:43:51.00,0:43:53.00,Default,,0000,0000,0000,,reusing in a minute anyway, and so maybe Dialogue: 0,0:43:53.00,0:43:56.00,Default,,0000,0000,0000,,that clearing it out and totally releasing Dialogue: 0,0:43:56.00,0:43:57.00,Default,,0000,0000,0000,,may actually be an exercise Dialogue: 0,0:43:57.00,0:44:01.00,Default,,0000,0000,0000,,that was unimportant. You're planning on getting back to that size eventually, anyway. Dialogue: 0,0:44:01.00,0:44:04.00,Default,,0000,0000,0000,,Most hash tables tend to grow is actually the truth. Right, you tend to Dialogue: 0,0:44:04.00,0:44:07.00,Default,,0000,0000,0000,,be collecting data in there and Dialogue: 0,0:44:07.00,0:44:11.00,Default,,0000,0000,0000,,they tend to only enlarge. It's kind of unusual that you take out a bunch of Dialogue: 0,0:44:11.00,0:44:15.00,Default,,0000,0000,0000,,entries you already put in. Dialogue: 0,0:44:15.00,0:44:18.00,Default,,0000,0000,0000,,And so I just put a little last note, which is like, okay, well I had said earlier that when we Dialogue: 0,0:44:18.00,0:44:19.00,Default,,0000,0000,0000,,have the map, Dialogue: 0,0:44:19.00,0:44:22.00,Default,,0000,0000,0000,,the key had to be string type and I was gonna at Dialogue: 0,0:44:22.00,0:44:25.00,Default,,0000,0000,0000,,some point come back to this and talk about Dialogue: 0,0:44:25.00,0:44:28.00,Default,,0000,0000,0000,,why that was the one restriction in all our template classes, right, you can put anything you Dialogue: 0,0:44:28.00,0:44:31.00,Default,,0000,0000,0000,,want in statter or vector or stacker queue. Dialogue: 0,0:44:31.00,0:44:34.00,Default,,0000,0000,0000,,That map lets you store any kind of value associated with that key, but that key was Dialogue: 0,0:44:34.00,0:44:36.00,Default,,0000,0000,0000,,string type, and Dialogue: 0,0:44:36.00,0:44:40.00,Default,,0000,0000,0000,,so given how our map is implemented as a hash table, that actually starts to make sense, that if Dialogue: 0,0:44:40.00,0:44:42.00,Default,,0000,0000,0000,,you're gonna take the key and kind of jumble it up and map it to Dialogue: 0,0:44:42.00,0:44:44.00,Default,,0000,0000,0000,,a bucket, Dialogue: 0,0:44:44.00,0:44:47.00,Default,,0000,0000,0000,,you need to know something about how to extract information from the key, and so Dialogue: 0,0:44:47.00,0:44:51.00,Default,,0000,0000,0000,,this case, knowing it's a string means you know it has characters, you know its length, and you can do Dialogue: 0,0:44:51.00,0:44:52.00,Default,,0000,0000,0000,,those things. Dialogue: 0,0:44:52.00,0:44:53.00,Default,,0000,0000,0000,,If you wanted to make a map Dialogue: 0,0:44:53.00,0:44:57.00,Default,,0000,0000,0000,,that could take any kinda key, maybe integers is key or student structures is Dialogue: 0,0:44:57.00,0:44:59.00,Default,,0000,0000,0000,,key or, you Dialogue: 0,0:44:59.00,0:45:00.00,Default,,0000,0000,0000,,know, Dialogue: 0,0:45:00.00,0:45:01.00,Default,,0000,0000,0000,,doubles is key, Dialogue: 0,0:45:01.00,0:45:05.00,Default,,0000,0000,0000,,any of these things, it's like, well, how can you write a generic form Dialogue: 0,0:45:05.00,0:45:06.00,Default,,0000,0000,0000,,that could use Dialogue: 0,0:45:06.00,0:45:08.00,Default,,0000,0000,0000,,a hashing operation on that Dialogue: 0,0:45:08.00,0:45:11.00,Default,,0000,0000,0000,,kind of key and map it to bucket numbers. Dialogue: 0,0:45:11.00,0:45:14.00,Default,,0000,0000,0000,,You would build a two type template to kinda get this working in C++, so Dialogue: 0,0:45:14.00,0:45:18.00,Default,,0000,0000,0000,,you can actually have a template that has a type name for the key type, a type name for Dialogue: 0,0:45:18.00,0:45:19.00,Default,,0000,0000,0000,,the value type, Dialogue: 0,0:45:19.00,0:45:20.00,Default,,0000,0000,0000,,and then Dialogue: 0,0:45:20.00,0:45:25.00,Default,,0000,0000,0000,,in the member functions you'd refer to both key type and value type as these two Dialogue: 0,0:45:25.00,0:45:26.00,Default,,0000,0000,0000,,distinct place holders. Dialogue: 0,0:45:26.00,0:45:29.00,Default,,0000,0000,0000,,And then when the client set it up, right, they'd be saying, ''Okay, I want a map Dialogue: 0,0:45:29.00,0:45:32.00,Default,,0000,0000,0000,,that has strings that map to integers.'' So maybe this is Dialogue: 0,0:45:32.00,0:45:36.00,Default,,0000,0000,0000,,words and the page they appear in a document. I have a map of integers and a vector Dialogue: 0,0:45:36.00,0:45:40.00,Default,,0000,0000,0000,,string. Maybe this is score on an exam and the names of the students who got that Dialogue: 0,0:45:40.00,0:45:42.00,Default,,0000,0000,0000,,score, doing it that way, Dialogue: 0,0:45:42.00,0:45:44.00,Default,,0000,0000,0000,,and to make this work, right, Dialogue: 0,0:45:44.00,0:45:48.00,Default,,0000,0000,0000,,there has to be some kind of universal hashing that, given some generic type, Dialogue: 0,0:45:48.00,0:45:53.00,Default,,0000,0000,0000,,can turn it into an integer in the range of buckets. The Dialogue: 0,0:45:53.00,0:45:53.00,Default,,0000,0000,0000,,? Dialogue: 0,0:45:53.00,0:45:56.00,Default,,0000,0000,0000,,what it's gonna require here is that the client get involved. Dialogue: 0,0:45:56.00,0:45:58.00,Default,,0000,0000,0000,,That you can not write this Dialogue: 0,0:45:58.00,0:46:00.00,Default,,0000,0000,0000,,generic Dialogue: 0,0:46:00.00,0:46:01.00,Default,,0000,0000,0000,,hash function Dialogue: 0,0:46:01.00,0:46:04.00,Default,,0000,0000,0000,,that will work for all types, right, if it's a student structure, what are you gonna look at? The Dialogue: 0,0:46:04.00,0:46:08.00,Default,,0000,0000,0000,,ID field, the names field, under like ? there's just no sensible way that you can Dialogue: 0,0:46:08.00,0:46:10.00,Default,,0000,0000,0000,,talk about a generic type and talk about how to hash Dialogue: 0,0:46:10.00,0:46:14.00,Default,,0000,0000,0000,,it, so what you would have to do is have some sort of client call back, so probably you would Dialogue: 0,0:46:14.00,0:46:18.00,Default,,0000,0000,0000,,create this by passing out a call back that's given a key type thing, and a number of Dialogue: 0,0:46:18.00,0:46:20.00,Default,,0000,0000,0000,,buckets would Dialogue: 0,0:46:20.00,0:46:24.00,Default,,0000,0000,0000,,do the necessary machinations, right, to hash Dialogue: 0,0:46:24.00,0:46:26.00,Default,,0000,0000,0000,,it into a right number. Dialogue: 0,0:46:26.00,0:46:30.00,Default,,0000,0000,0000,,And so that would be kinda one of those coordination things between the client and the implementer Dialogue: 0,0:46:30.00,0:46:34.00,Default,,0000,0000,0000,,about the client knows what the data is what the data is it's trying to store, and kinda how to Dialogue: 0,0:46:34.00,0:46:35.00,Default,,0000,0000,0000,, Dialogue: 0,0:46:35.00,0:46:41.00,Default,,0000,0000,0000,,mash it up and then the map would know, okay, given that number, where to stick it. So Dialogue: 0,0:46:41.00,0:46:42.00,Default,,0000,0000,0000,,that's what it would take, and then Dialogue: 0,0:46:42.00,0:46:45.00,Default,,0000,0000,0000,,rather sorta load that onto our Dialogue: 0,0:46:45.00,0:46:48.00,Default,,0000,0000,0000,,novice heads, we went ahead just said, ''Okay, it'll always be a string, so that we can Dialogue: 0,0:46:48.00,0:46:53.00,Default,,0000,0000,0000,,do a hash internally without having to get involved.'' All right, Dialogue: 0,0:46:53.00,0:46:55.00,Default,,0000,0000,0000,,any questions about hashing? I'm gonna Dialogue: 0,0:46:55.00,0:46:59.00,Default,,0000,0000,0000,,give you like, a two second talk about set Dialogue: 0,0:46:59.00,0:47:01.00,Default,,0000,0000,0000,,because it's the one Dialogue: 0,0:47:01.00,0:47:04.00,Default,,0000,0000,0000,,ADT I never said, ''Well how does it work? How does it work? What do we do?'' We're talking Dialogue: 0,0:47:04.00,0:47:07.00,Default,,0000,0000,0000,,about, it doesn't add any new things that we haven't already done, so in fact I'm just Dialogue: 0,0:47:07.00,0:47:10.00,Default,,0000,0000,0000,,gonna tell you what it does, and Dialogue: 0,0:47:10.00,0:47:13.00,Default,,0000,0000,0000,,then your picture will be complete, right. Dialogue: 0,0:47:13.00,0:47:16.00,Default,,0000,0000,0000,,It wants to do fast searching, fast updating, add and remove, and it also has all Dialogue: 0,0:47:16.00,0:47:21.00,Default,,0000,0000,0000,,these high level operations, such as intersect, union and difference, Dialogue: 0,0:47:21.00,0:47:24.00,Default,,0000,0000,0000,,that are kind of its primary set of things to do. Dialogue: 0,0:47:24.00,0:47:25.00,Default,,0000,0000,0000,, Dialogue: 0,0:47:25.00,0:47:27.00,Default,,0000,0000,0000,,A bunch of the things we've already seen, right, Dialogue: 0,0:47:27.00,0:47:30.00,Default,,0000,0000,0000,,would actually be a reasonable strategy for upholding stat, right, using some kind Dialogue: 0,0:47:30.00,0:47:31.00,Default,,0000,0000,0000,,of vector or array. Dialogue: 0,0:47:31.00,0:47:34.00,Default,,0000,0000,0000,,Most likely you'd want to keep it in sorted order because that would buy you Dialogue: 0,0:47:34.00,0:47:37.00,Default,,0000,0000,0000,,the fast lookup for contains, Dialogue: 0,0:47:37.00,0:47:41.00,Default,,0000,0000,0000,,but the add and remove, right, would necessarily be slow in those cases because of the Dialogue: 0,0:47:41.00,0:47:43.00,Default,,0000,0000,0000,,continuous memory. Dialogue: 0,0:47:43.00,0:47:44.00,Default,,0000,0000,0000,, Dialogue: 0,0:47:44.00,0:47:47.00,Default,,0000,0000,0000,,The link list, not really probably too good of an option here Dialogue: 0,0:47:47.00,0:47:50.00,Default,,0000,0000,0000,,because it contains then becomes Dialogue: 0,0:47:50.00,0:47:51.00,Default,,0000,0000,0000,,linear and Dialogue: 0,0:47:51.00,0:47:54.00,Default,,0000,0000,0000,,add and remove similarly, right, liner, so it doesn't actually get you anywhere, in Dialogue: 0,0:47:54.00,0:47:57.00,Default,,0000,0000,0000,,fact it just adds the memory in pointers and gunk like that. Dialogue: 0,0:47:57.00,0:47:59.00,Default,,0000,0000,0000,,The binary search tree, probably a Dialogue: 0,0:47:59.00,0:48:00.00,Default,,0000,0000,0000,,pretty good call, Dialogue: 0,0:48:00.00,0:48:03.00,Default,,0000,0000,0000,,right, that's the ? Dialogue: 0,0:48:03.00,0:48:06.00,Default,,0000,0000,0000,,kinda meshes the advantage of Adamic structure with the sorted property to Dialogue: 0,0:48:06.00,0:48:11.00,Default,,0000,0000,0000,,be able to give it fast update adding and removing and searching all can be done Dialogue: 0,0:48:11.00,0:48:14.00,Default,,0000,0000,0000,,in logarithmic time if you have balancing. Dialogue: 0,0:48:14.00,0:48:17.00,Default,,0000,0000,0000,,And then it actually also enables the high level ops, so the idea of being able to do a union Dialogue: 0,0:48:17.00,0:48:19.00,Default,,0000,0000,0000,,intersection difference, and actually being able to Dialogue: 0,0:48:19.00,0:48:23.00,Default,,0000,0000,0000,,walk over both of the trees in sorted order, which is very easily done Dialogue: 0,0:48:23.00,0:48:24.00,Default,,0000,0000,0000,,with a [inaudible] reversal, Dialogue: 0,0:48:24.00,0:48:28.00,Default,,0000,0000,0000,,makes those operations actually more efficient to implement than actually kinda having to deal with Dialogue: 0,0:48:28.00,0:48:32.00,Default,,0000,0000,0000,,data coming at you all different ways. It Dialogue: 0,0:48:32.00,0:48:33.00,Default,,0000,0000,0000,,turns out hashing really Dialogue: 0,0:48:33.00,0:48:35.00,Default,,0000,0000,0000,,won't quite do it Dialogue: 0,0:48:35.00,0:48:36.00,Default,,0000,0000,0000,, Dialogue: 0,0:48:36.00,0:48:37.00,Default,,0000,0000,0000,,very easily for us Dialogue: 0,0:48:37.00,0:48:40.00,Default,,0000,0000,0000,,because of the same need about the template situations, like, well you have to have a hash function that Dialogue: 0,0:48:40.00,0:48:43.00,Default,,0000,0000,0000,,can hash these things and do the right thing, and so the tree happens to be a Dialogue: 0,0:48:43.00,0:48:46.00,Default,,0000,0000,0000,,good of just ? if the client can just give you ordering information, you can Dialogue: 0,0:48:46.00,0:48:48.00,Default,,0000,0000,0000,,do all the hard work without Dialogue: 0,0:48:48.00,0:48:50.00,Default,,0000,0000,0000,,pushing hashing onto them. So in fact Dialogue: 0,0:48:50.00,0:48:53.00,Default,,0000,0000,0000,,the way our set really does work is it does use a binary search tree. It happens to Dialogue: 0,0:48:53.00,0:48:56.00,Default,,0000,0000,0000,,use it through another layered abstraction Dialogue: 0,0:48:56.00,0:48:59.00,Default,,0000,0000,0000,,that there actually is a BST class that we've never really Dialogue: 0,0:48:59.00,0:49:02.00,Default,,0000,0000,0000,,encountered directly, but the BST class just takes the idea of a binary search Dialogue: 0,0:49:02.00,0:49:06.00,Default,,0000,0000,0000,,tree, adds balancing, and Dialogue: 0,0:49:06.00,0:49:08.00,Default,,0000,0000,0000,,all the properties about finding and inserting and adding, Dialogue: 0,0:49:08.00,0:49:12.00,Default,,0000,0000,0000,,and packages it up, and then set actually just turns out to be one liners Dialogue: 0,0:49:12.00,0:49:14.00,Default,,0000,0000,0000,,making calls into the binary search tree. Dialogue: 0,0:49:14.00,0:49:17.00,Default,,0000,0000,0000,,This is where all the heavy lifting Dialogue: 0,0:49:17.00,0:49:18.00,Default,,0000,0000,0000,,goes, is down there. Dialogue: 0,0:49:18.00,0:49:23.00,Default,,0000,0000,0000,,That kinda completes the big picture, so some of stuff, right, so our map uses our hash, right, our Dialogue: 0,0:49:23.00,0:49:26.00,Default,,0000,0000,0000,,BST uses ? or set uses the BST, which is the binary search tree, Dialogue: 0,0:49:26.00,0:49:28.00,Default,,0000,0000,0000,,right, our vector, right, using an array, Dialogue: 0,0:49:28.00,0:49:33.00,Default,,0000,0000,0000,,and then that stacks and queues having both viable strategies both for array Dialogue: 0,0:49:33.00,0:49:37.00,Default,,0000,0000,0000,,vector based backing as well as link list both getting good time performance. And then Dialogue: 0,0:49:37.00,0:49:38.00,Default,,0000,0000,0000,,you saw with the PQ, right, Dialogue: 0,0:49:38.00,0:49:40.00,Default,,0000,0000,0000,,even Dialogue: 0,0:49:40.00,0:49:44.00,Default,,0000,0000,0000,,yet another structure, the heap, right, which is kinda a variant of the tree, and so those tend to be the Dialogue: 0,0:49:44.00,0:49:46.00,Default,,0000,0000,0000,,kind of really classic things in which a lot of things get built, Dialogue: 0,0:49:46.00,0:49:51.00,Default,,0000,0000,0000,,and so having a chance to use them both as a client and kinda dig Dialogue: 0,0:49:51.00,0:49:53.00,Default,,0000,0000,0000,,around as an implementer kinda gives you this big picture of Dialogue: 0,0:49:53.00,0:49:55.00,Default,,0000,0000,0000,,some of the more common ways that Dialogue: 0,0:49:55.00,0:49:57.00,Default,,0000,0000,0000,,cool data structures get built. Dialogue: 0,0:49:57.00,0:50:00.00,Default,,0000,0000,0000,,So that's all for Friday. We're gonna do some more good stuff next week. I'm gonna go to the Terman Dialogue: 0,0:50:00.00,0:50:03.00,Default,,0000,0000,0000,,cafe in a little bit, if you've got some time, please come, and have Dialogue: 0,0:50:03.00,0:50:04.00,Default,,0000,0000,0000,,a good weekend. Dialogue: 0,0:50:04.00,0:50:07.00,Default,,0000,0000,0000,,Work hard on pathfinder.