0:00:00.000,0:00:07.000 0:00:07.000,0:00:12.000 0:00:12.000,0:00:15.000 This presentation is delivered by the Stanford Center for Professional 0:00:15.000,0:00:25.000 Development. 0:00:25.000,0:00:27.000 How are we doing? 0:00:27.000,0:00:28.000 Spring apparently is here; 0:00:28.000,0:00:33.000 just like California, one day it's winter, one day it's spring. Hopefully it's here to say. 0:00:33.000,0:00:38.000 So we had talked about doing it, we did this last time, we did an unsorted vector 0:00:38.000,0:00:40.000 implementation of the map, we just did a little struct of 0:00:40.000,0:00:42.000 the key value pair. 0:00:42.000,0:00:44.000 And then we went through 0:00:44.000,0:00:48.000 that both add and get value are running at linear time in that form because of the need 0:00:48.000,0:00:52.000 to search; search to find a value to override it, and search to find a value to 0:00:52.000,0:00:54.000 pull out the existing 0:00:54.000,0:00:55.000 match to it. 0:00:55.000,0:00:58.000 And if we went to the sorted form of that, we could get get value to be able to 0:00:58.000,0:01:02.000 capitalize on that binary search property and be able to quickly find the 0:01:02.000,0:01:06.000 key if it exists in there or to determine it doesn't exist. 0:01:06.000,0:01:09.000 But the add still suffered from the shuffling problem. We can quickly 0:01:09.000,0:01:12.000 find, using the binary search to find where the 0:01:12.000,0:01:15.000 new element would need to go. But if it doesn't already go there, it doesn't 0:01:15.000,0:01:17.000 already exist. We're going to have to shuffle to make that space. 0:01:17.000,0:01:21.000 So we have still a linear factor in there that we're working on getting rid of. 0:01:21.000,0:01:24.000 And then I know at the bottom that none of them have any 0:01:24.000,0:01:27.000 built in overhead per element. There may be some capacity, excess capacity in the 0:01:27.000,0:01:30.000 vector that we're absorbing, but there's no 0:01:30.000,0:01:33.000 four, ten, eight-byte chunk that every 0:01:33.000,0:01:36.000 element is taking as a cost. 0:01:36.000,0:01:39.000 So we started to talk about this at the end on Friday, and we're going to go with this and 0:01:39.000,0:01:41.000 see where this takes 0:01:41.000,0:01:45.000 us. We had the idea of the vector, and we know what vector constraints are, right? This 0:01:45.000,0:01:48.000 idea of the contiguous memory, and we were going to think about the link list for a minute to 0:01:48.000,0:01:51.000 see if it was going to buy us anything. 0:01:51.000,0:01:54.000 It tends to have an inverse relationship, that once you know where you 0:01:54.000,0:01:57.000 want to insert something it's easy to do that rearrangement because it's just a 0:01:57.000,0:02:01.000 little bit of pointer rewiring that needs to happen in that situation. 0:02:01.000,0:02:03.000 But it's hard to find a position insert, 0:02:03.000,0:02:06.000 even if the link list is in sorted order, 0:02:06.000,0:02:10.000 knowing where the S's are, or the M's are, or the B's are is still a 0:02:10.000,0:02:13.000 matter of starting from the beginning; that your only access in the 0:02:13.000,0:02:16.000 singly linked list is to start at the beginning and work your way down. 0:02:16.000,0:02:20.000 And so I'm going to take you through a little bit of a thought exercise about well, what if 0:02:20.000,0:02:23.000 we try to do something about that? Right now having a pointer coming into 0:02:23.000,0:02:26.000 Bashful and then subsequent ones going in sorted order 0:02:26.000,0:02:27.000 isn't really very helpful. 0:02:27.000,0:02:30.000 But if we were willing to try to rearrange our pointers 0:02:30.000,0:02:33.000 to give us the access that we have in the 0:02:33.000,0:02:36.000 sorted array form; for example, if I had a pointer 0:02:36.000,0:02:39.000 to the middlemost element, in 0:02:39.000,0:02:42.000 this case the Grumpy that's 0:02:42.000,0:02:43.000 midway down the list, if I had 0:02:43.000,0:02:45.000 that pointer 0:02:45.000,0:02:49.000 and from there I could say, is the element I'm looking for 0:02:49.000,0:02:53.000 ahead of Grumpy or behind Grumpy. So having Grumpy split my input in 0:02:53.000,0:02:54.000 half. 0:02:54.000,0:02:56.000 If I had that, right, then that would actually get me 0:02:56.000,0:02:59.000 sort of moving towards that direction that I wanted to get, which is using the binary 0:02:59.000,0:03:03.000 search property to facilitate throwing away half the data 0:03:03.000,0:03:06.000 and look into just the half remaining that's interesting. 0:03:06.000,0:03:09.000 Well, if I had a pointer to Grumpy, that doesn't quite solve my problem. A pointer to Grumpy 0:03:09.000,0:03:12.000 could get me halfway down the list, but then the pointer's coming in this way. If I 0:03:12.000,0:03:16.000 inverted the pointers leading away from it, 0:03:16.000,0:03:17.000 that's also kind of helping 0:03:17.000,0:03:20.000 to facilitate what I'm trying to do. So if I could point toward the middle 0:03:20.000,0:03:24.000 and then kind of move away, and then if I actually kind of applied the same strategy not just at the topmost 0:03:24.000,0:03:28.000 level, but at every subsequent division down, from the half, to the 0:03:28.000,0:03:29.000 quarter to the eight, 0:03:29.000,0:03:32.000 to the sixteenth. And instead of having a pointer 0:03:32.000,0:03:34.000 to the front of the list, what I really maintained was a pointer to the 0:03:34.000,0:03:36.000 middlemost element, and then 0:03:36.000,0:03:38.000 that element maintained pointers 0:03:38.000,0:03:40.000 to the middle of the 0:03:40.000,0:03:44.000 upper quarter and the middle of the lower quarter, and all the way down. 0:03:44.000,0:03:46.000 Then I could pull up this list 0:03:46.000,0:03:50.000 into something that we call a binary search tree. 0:03:50.000,0:03:54.000 So the way I would be storing this is having a pointer to a node, which 0:03:54.000,0:03:58.000 is expected to the median or the middle, or somewhere right along 0:03:58.000,0:03:59.000 the 0:03:59.000,0:04:00.000 values being stored. 0:04:00.000,0:04:04.000 And that it has two pointers, not just a next or previous or something like that, that are related to all 0:04:04.000,0:04:08.000 the elements that precede this one in the ordering, and all the elements that 0:04:08.000,0:04:09.000 follow it are in 0:04:09.000,0:04:11.000 these two pointers. 0:04:11.000,0:04:14.000 They call these subtrees coming off the smaller trees over here with three 0:04:14.000,0:04:17.000 elements on this side, and three elements on that side. 0:04:17.000,0:04:19.000 And that each of those nodes 0:04:19.000,0:04:22.000 has recursively the same formulation; that it has two 0:04:22.000,0:04:25.000 subtree hanging off of it 0:04:25.000,0:04:27.000 that contain, 0:04:27.000,0:04:32.000 given any particular node, that there's this binary search ordering here that says 0:04:32.000,0:04:36.000 everything that's in that left subtree precedes this node in ordering. 0:04:36.000,0:04:38.000 And everything that's in that right subtree 0:04:38.000,0:04:39.000 follows it. 0:04:39.000,0:04:44.000 And that's true for every node throughout the entire arrangement here. 0:04:44.000,0:04:47.000 So this is called a tree, and 0:04:47.000,0:04:50.000 this particular form is actually called the binary search tree. I'm 0:04:50.000,0:04:53.000 going to throw a little bit of vocabulary at you, because I'm going to start using these words, and I just want to get 0:04:53.000,0:04:54.000 them out there 0:04:54.000,0:04:58.000 early so that you understand what those words mean when I start using them. 0:04:58.000,0:05:01.000 Is that each of the cells in here, 0:05:01.000,0:05:02.000 like the cells that are 0:05:02.000,0:05:04.000 allocated for a link list, 0:05:04.000,0:05:07.000 are individually heap allocated, we're going to have pointers that allow us to get back 0:05:07.000,0:05:08.000 to 0:05:08.000,0:05:12.000 them. We typically call those nodes; in terms of tree I'll call it a cell or node in the list kind 0:05:12.000,0:05:16.000 of interchangeably. I'm more likely to use node when I'm talking about tree. 0:05:16.000,0:05:19.000 And a tree is really just a pointer to a node. 0:05:19.000,0:05:21.000 The empty tree 0:05:21.000,0:05:24.000 would be a pointer whose value is null, so pointing at nothing. 0:05:24.000,0:05:27.000 A singleton tree would be a pointer to a single node 0:05:27.000,0:05:29.000 that has left and right subtrees 0:05:29.000,0:05:31.000 that are both empty or both null. 0:05:31.000,0:05:35.000 So Bashful by itself is a singleton, a pointer to 0:05:35.000,0:05:38.000 Bashful. When we talk about subtrees, it just kind of means looking at the top; Grumpy has two 0:05:38.000,0:05:40.000 subtrees, a left and a right, 0:05:40.000,0:05:41.000 0:05:41.000,0:05:42.000 that 0:05:42.000,0:05:45.000 themselves are just smaller forms of the same recursive structure. 0:05:45.000,0:05:49.000 And then we would say that Grumpy is the parent of Doc and Sleepy. 0:05:49.000,0:05:52.000 So those are the two children nodes, and we'll call them the left child and the right 0:05:52.000,0:05:53.000 child. 0:05:53.000,0:05:56.000 And the node that's at the very top, 0:05:56.000,0:06:00.000 the one that our external pointer is coming into and then kind of leads the way, is called 0:06:00.000,0:06:02.000 the root node. The 0:06:02.000,0:06:02.000 first node 0:06:02.000,0:06:05.000 of the tree where we first start our work is going to be called the root. 0:06:05.000,0:06:09.000 And these nodes down here at the bottom that have empty subtrees are called leaf nodes. 0:06:09.000,0:06:14.000 So a node that has no further subtrees, so by itself has null 0:06:14.000,0:06:16.000 left and right trees, is called a leaf. 0:06:16.000,0:06:18.000 So that actually means the tree is kind of upside down 0:06:18.000,0:06:21.000 if you look at it. That Grumpy's the root and these things are the 0:06:21.000,0:06:25.000 leaves shows you that apparently most computer scientists don't get out very much, 0:06:25.000,0:06:28.000 because they haven't noticed that trees actually grow the other direction. 0:06:28.000,0:06:29.000 0:06:29.000,0:06:32.000 But just kind of go with it, and 0:06:32.000,0:06:33.000 you'll be 0:06:33.000,0:06:36.000 speaking tree-speak in no time. 0:06:36.000,0:06:43.000 Anybody have any questions about what we've got basically set up here? 0:06:43.000,0:06:45.000 So there is a 0:06:45.000,0:06:48.000 whole family of things that come under the heading of tree in computer 0:06:48.000,0:06:52.000 science. We're going to look in particular at this one embodiment, the binary search 0:06:52.000,0:06:53.000 tree. 0:06:53.000,0:06:56.000 But before I get deep down in the details of that, I just want to back 0:06:56.000,0:06:59.000 up for a second and just talk about what trees in general mean, and what kind of 0:06:59.000,0:07:01.000 things in general are represented in manipulative trees. 0:07:01.000,0:07:06.000 So that you have a sense of what is the broad notion of tree as a 0:07:06.000,0:07:08.000 computer scientist would see it, 0:07:08.000,0:07:10.000 is just that a tree is a recursive branching structure 0:07:10.000,0:07:14.000 where there are nodes, which have pointers to subtrees. 0:07:14.000,0:07:17.000 There may be one of those subtrees. There may be ten of those subtrees. There 0:07:17.000,0:07:20.000 may be two; depending on the nature of the tree you're working on there may be a 0:07:20.000,0:07:22.000 constraint on exactly how many children they have 0:07:22.000,0:07:26.000 or some flexibility in the number of children they have 0:07:26.000,0:07:27.000 0:07:27.000,0:07:29.000 depending on what's being modeled here. 0:07:29.000,0:07:31.000 There's always a single root node. 0:07:31.000,0:07:35.000 There's a node that kind of sits at the top of this hierarchy that leads down to 0:07:35.000,0:07:36.000 these things, 0:07:36.000,0:07:41.000 and that to form a tree there has to be a reachable path from the root 0:07:41.000,0:07:43.000 to every single node. 0:07:43.000,0:07:46.000 So there aren't some nodes that exist way out here, and there aren't two different ways to 0:07:46.000,0:07:48.000 get to the same 0:07:48.000,0:07:50.000 node in a tree; that if the 0:07:50.000,0:07:53.000 midterm is kept in the folder called exams, which is in the folder 0:07:53.000,0:07:56.000 called cs106, which is in my home folder. There's not some other way that 0:07:56.000,0:07:59.000 you can get to the midterm; the midterm lives exactly that part in the tree, 0:07:59.000,0:08:01.000 and there's no 0:08:01.000,0:08:04.000 circularity or loops in the structure. 0:08:04.000,0:08:07.000 And so a lot of things do get modeled tree, the one I've drawn here is just the file system. 0:08:07.000,0:08:10.000 Think about how your files are maintained on a computer, there tends to be a 0:08:10.000,0:08:11.000 structure of an 0:08:11.000,0:08:14.000 outermost folder, which has inner folders, 0:08:14.000,0:08:17.000 which have inner folders. And along the way there are these files, which you can think of as leaf 0:08:17.000,0:08:21.000 nodes that have no further subtrees underneath them. 0:08:21.000,0:08:24.000 And you can talk about, say, what is the size of 0:08:24.000,0:08:27.000 all the folders in my home directory? Well it would kind 0:08:27.000,0:08:28.000 of be recursively defined as summing over 0:08:28.000,0:08:30.000 all the folders within it, 0:08:30.000,0:08:33.000 how much storage is being used by those; which when it gets down to a leaf node can say, how 0:08:33.000,0:08:38.000 much space does this file take up and kind of add those together. 0:08:38.000,0:08:42.000 Things like your family trees, right, sort of modeling genealogy. 0:08:42.000,0:08:44.000 Things like the decomposition tree in your program; 0:08:44.000,0:08:47.000 main calling, 0:08:47.000,0:08:50.000 read file calling, set up boggle board calling, 0:08:50.000,0:08:52.000 give instructions. 0:08:52.000,0:08:56.000 There is a tree structure to that, about what pieces use what pieces and 0:08:56.000,0:08:57.000 descendants from there. 0:08:57.000,0:09:01.000 The ones that we're going to look at in particularly are the ones that are in the family of the binary trees. So a binary 0:09:01.000,0:09:05.000 tree has two children, exactly two children. There's a left and a right 0:09:05.000,0:09:07.000 pointer coming out of each node. 0:09:07.000,0:09:11.000 One or both of those can be null, but there can never be more than two 0:09:11.000,0:09:11.000 children. 0:09:11.000,0:09:15.000 And there is sort of space for recording two children, as opposed to recording things which 0:09:15.000,0:09:17.000 are trinary, which have three 0:09:17.000,0:09:20.000 possibilities for children, or where there may be some four or eight or 0:09:20.000,0:09:22.000 some other number there. 0:09:22.000,0:09:24.000 And in particular it's the search tree 0:09:24.000,0:09:26.000 that's going to help us get 0:09:26.000,0:09:28.000 the work done that we want, which is 0:09:28.000,0:09:32.000 there's not just any old arrangement of what's on the left, and what's on the right side. 0:09:32.000,0:09:35.000 That the nodes are going to be organized in here to facilitate binary search by 0:09:35.000,0:09:36.000 keeping 0:09:36.000,0:09:37.000 one half 0:09:37.000,0:09:40.000 over on the left and one half over on the right, and that we'll know which 0:09:40.000,0:09:44.000 half something goes in by virtue of looking at the key. So 0:09:44.000,0:09:47.000 let's take a look at a simple example of a binary search tree with 0:09:47.000,0:09:49.000 numbers. So 0:09:49.000,0:09:53.000 if I have a struct node that keeps track of an integer value and has two pointers, the 0:09:53.000,0:09:55.000 left and the right, 0:09:55.000,0:09:59.000 that point away from it. And in this case my root node is 57, 0:09:59.000,0:10:03.000 and looking at 57 I can tell you that every value in this tree 0:10:03.000,0:10:07.000 that is less than 57 will be in the left subtree. There will be no values 0:10:07.000,0:10:09.000 that are 56 or lower that are on the 0:10:09.000,0:10:10.000 right side. 0:10:10.000,0:10:14.000 And that will be true at every step down the tree. So if I'm looking for 0:10:14.000,0:10:15.000 something 0:10:15.000,0:10:18.000 that actually gives me that immediate access we were using the vector's 0:10:18.000,0:10:19.000 sorting property for. Which is if 0:10:19.000,0:10:24.000 I'm looking for the number 70, and I start at the root node of 57, then I know 0:10:24.000,0:10:25.000 it's not in the left subtree, 0:10:25.000,0:10:28.000 so assuming that about half the nodes are 0:10:28.000,0:10:30.000 pointed to over on that side, 0:10:30.000,0:10:34.000 I've now discarded half of the input even from consideration. If 0:10:34.000,0:10:36.000 I start at the 70, 0:10:36.000,0:10:39.000 and I say okay well if 57 must be over here, then again recursively apply the 0:10:39.000,0:10:44.000 same search property here. It's like if I'm looking for 70, is it going to be 0:10:44.000,0:10:45.000 before or after 79? 0:10:45.000,0:10:49.000 It's going to be before, so in fact I throw away anything that leads away on 0:10:49.000,0:10:52.000 the right subtree of 79, and work my way one level down. 0:10:52.000,0:10:53.000 So each step 0:10:53.000,0:10:58.000 in the recursive search is represented by sort of taking a step down 0:10:58.000,0:11:02.000 in that search tree; working my way down from the root to the leaf nodes. And 0:11:02.000,0:11:05.000 I either will find the value I'm looking for along the way, 0:11:05.000,0:11:08.000 or I will fall off the bottom of that tree 0:11:08.000,0:11:10.000 when I haven't been able to find it. 0:11:10.000,0:11:13.000 And that will tell me; oh it couldn't have been in the tree, because the only place 0:11:13.000,0:11:16.000 it could have been is the one that followed the binary search property. 0:11:16.000,0:11:19.000 And so if I look for 70, I say is it greater or less than 0:11:19.000,0:11:21.000 62? It's greater. Is it 0:11:21.000,0:11:24.000 greater or less than 71? It's less then. And then I work off there I get to the empty 0:11:24.000,0:11:26.000 tree. Then I say well if there was a 60, that's where it was 0:11:26.000,0:11:31.000 going to be, and it wasn't there so there much not be a 70 in this tree. 0:11:31.000,0:11:32.000 So it is exactly designed 0:11:32.000,0:11:36.000 to support one kind of search, and one kind of search only, which is this 0:11:36.000,0:11:40.000 efficient divide and conquer binary search, work its way down. 0:11:40.000,0:11:40.000 0:11:40.000,0:11:42.000 The other kinds of things, let's say, that 0:11:42.000,0:11:44.000 trees 0:11:44.000,0:11:45.000 are used for 0:11:45.000,0:11:48.000 may not need this kind of support, so they may not go out of their way to do this. But the one 0:11:48.000,0:11:52.000 we're actually very interested in is because of maps doing a lot of 0:11:52.000,0:11:55.000 additions into a sorted facility and 0:11:55.000,0:11:57.000 searches in that sort of facility. If we can 0:11:57.000,0:12:00.000 bottle it as a binary search tree, we're able to get the best of both worlds. To 0:12:00.000,0:12:03.000 have that super fast logarithmic search, 0:12:03.000,0:12:07.000 as well as have the flexibility of the pointer based structure 0:12:07.000,0:12:11.000 to help us do an insert. So if I'm ready to put a 70 into this tree, 0:12:11.000,0:12:14.000 the same path that led to me to where it should be 0:12:14.000,0:12:16.000 tells us where to put that new node 0:12:16.000,0:12:20.000 in place. And if all I need to do is to wire that in place; and if all I need to do to wire that in place is to kind of create a new node in the 0:12:20.000,0:12:21.000 heap and 0:12:21.000,0:12:24.000 put the pointers coming in and out of it the way that I would in a link list, 0:12:24.000,0:12:27.000 then I should be able to do both searches and inserts 0:12:27.000,0:12:30.000 in time proportional to the height of the tree, 0:12:30.000,0:12:35.000 which for a relatively balanced tree would be logarithmic. So 0:12:35.000,0:12:37.000 it sounds like we can get everything we wanted here 0:12:37.000,0:12:40.000 with a little bit of work. So 0:12:40.000,0:12:43.000 let me tell you - just do a little bit of playing with trees, just to kind of get used to thinking 0:12:43.000,0:12:46.000 about how we do stuff on them, and then I'm going to go forward with 0:12:46.000,0:12:49.000 implementing map using a binary search tree. 0:12:49.000,0:12:52.000 Trees are very, very recursive. So for those of you who still feel like 0:12:52.000,0:12:54.000 recursion is a little bit hazy, 0:12:54.000,0:12:58.000 this is another great area to strengthen your thinking about 0:12:58.000,0:13:01.000 recursive. In fact, I think a lot of people find operating on trees very, very 0:13:01.000,0:13:03.000 natural. Because trees are so recursive, the 0:13:03.000,0:13:07.000 idea that you write recursive algorithms to them is almost like you do it without even thinking, you don't 0:13:07.000,0:13:08.000 have to think hard. You're like, 0:13:08.000,0:13:11.000 you typically need to do something to your current node, 0:13:11.000,0:13:14.000 and then recursively operate on your left and your right. 0:13:14.000,0:13:17.000 And it's so natural that you don't even have stop and think hard about 0:13:17.000,0:13:18.000 what that means. So 0:13:18.000,0:13:21.000 your base case tends to be getting to an empty tree and having a 0:13:21.000,0:13:23.000 pointer that points to null, 0:13:23.000,0:13:28.000 and in the other cases have some node operate on your children. 0:13:28.000,0:13:31.000 Some of the simplest things are just traversals. 0:13:31.000,0:13:33.000 If I tree structure and I'd like to 0:13:33.000,0:13:35.000 do something to every node, 0:13:35.000,0:13:37.000 go through it and print them all, 0:13:37.000,0:13:39.000 go through it and write them to a file, 0:13:39.000,0:13:42.000 go through and add them all up to determine the average score, I 0:13:42.000,0:13:46.000 need these things. I'm going to need to do some processing that visits every node of the 0:13:46.000,0:13:48.000 tree once and exactly once, 0:13:48.000,0:13:50.000 and recursion is exactly the way to do that. 0:13:50.000,0:13:53.000 And there's a little bit of some choices, and when you're handling a 0:13:53.000,0:13:55.000 particular part of a tree, are you handling the current node and then 0:13:55.000,0:13:59.000 recurring on the left and right subtrees? Are you recurring first and then handling 0:13:59.000,0:14:02.000 the nodes. So some of those things that we saw in terms of the link list 0:14:02.000,0:14:03.000 that 0:14:03.000,0:14:06.000 change the order of the processing, but they don't change 0:14:06.000,0:14:08.000 whether you visit everybody. It's just a matter of which one you want to do 0:14:08.000,0:14:09.000 before. 0:14:09.000,0:14:12.000 I'm going to look at a couple of the simple traversals here, 0:14:12.000,0:14:14.000 and I'll point out 0:14:14.000,0:14:16.000 why that mattered. 0:14:16.000,0:14:18.000 If we look at in order, 0:14:18.000,0:14:22.000 so we'll consider that an in order traversal is for you to operate on your 0:14:22.000,0:14:24.000 left subtree recursively, 0:14:24.000,0:14:27.000 handle the current node right here, and then operate on your right subtree 0:14:27.000,0:14:29.000 recursively. 0:14:29.000,0:14:31.000 The base case, which is 0:14:31.000,0:14:34.000 kind of hidden in here, is if T is null, we get to an empty subtree we 0:14:34.000,0:14:37.000 don't do anything. So only in the nonempty case are we actually 0:14:37.000,0:14:39.000 processing and making recursive calls. 0:14:39.000,0:14:43.000 So if I start with my node 57 being the root of my tree, and I say 0:14:43.000,0:14:45.000 print that tree starting from the root. 0:14:45.000,0:14:48.000 It will print everything out of the left subtree, 0:14:48.000,0:14:52.000 then print the 57, and then print everything out of the right subtree. 0:14:52.000,0:14:55.000 And given that gets applied everywhere down, 0:14:55.000,0:14:58.000 the effect will be 57 says first print my left subtree. 0:14:58.000,0:15:01.000 And 37 says, well print my left subtree. And 15 says, well first 0:15:01.000,0:15:03.000 print my left subtree. 0:15:03.000,0:15:06.000 And then seven says, first print my left subtree. Well, that ones empty, 0:15:06.000,0:15:09.000 and so nothing gets printed. 0:15:09.000,0:15:13.000 As the recursion unwinds, it comes back to the last most call. The last most call was 0:15:13.000,0:15:17.000 seven. It says, okay, print seven and now print seven's right subtree. Again that's an empty, 0:15:17.000,0:15:19.000 so nothing gets printed there. It 0:15:19.000,0:15:22.000 will come back and then do the same thing with 15. I've done everything 0:15:22.000,0:15:23.000 to the left of 15; 0:15:23.000,0:15:26.000 let's print 15 and its right. 0:15:26.000,0:15:29.000 And doing that kind of all the way throughout the tree means that we will get the numbers out 0:15:29.000,0:15:31.000 in sorted order; 0:15:31.000,0:15:34.000 all right, five, 17, seven, 0:15:34.000,0:15:37.000 15, 32, 57, 59, 62, 71, 0:15:37.000,0:15:40.000 79, 93. An in order traversal 0:15:40.000,0:15:42.000 in a binary search tree 0:15:42.000,0:15:50.000 rediscovers, visits all the nodes from smallest to largest. [Inaudible] 0:15:50.000,0:15:54.000 So duplicates, you just have to have a strategy. It turns out in the map, the duplicates 0:15:54.000,0:15:54.000 overwrite. 0:15:54.000,0:15:58.000 So any time you get a new value you print there. Typically you just need to decide, 0:15:58.000,0:16:00.000 do they go to the left or do they go to the right? 0:16:00.000,0:16:03.000 So that you know when you're equal to if you wanted to find another one 0:16:03.000,0:16:04.000 of these, where would it go? 0:16:04.000,0:16:07.000 In a lot of situations you just don't have duplicates, so it's not actually a big deal. 0:16:07.000,0:16:10.000 You can't throw them arbitrarily on either side because you would end up having to search 0:16:10.000,0:16:14.000 both sides to find them. So you want to just know that's it's less than or equal to on the 0:16:14.000,0:16:18.000 left and greater than on the right. So that's 0:16:18.000,0:16:22.000 pretty neat, right, that that means that this 0:16:22.000,0:16:22.000 0:16:22.000,0:16:25.000 for purposes of being able to iterate for something, if 0:16:25.000,0:16:26.000 you were to walk 0:16:26.000,0:16:30.000 the structure in order, then you can produce them back out in sorted order, which tends to be 0:16:30.000,0:16:31.000 a convenient way 0:16:31.000,0:16:33.000 0:16:33.000,0:16:36.000 to print the output. There are a lot of situations where you do want that browsable 0:16:36.000,0:16:43.000 list in alphanumerical order. And the tree makes that easy to do. 0:16:43.000,0:16:45.000 When it comes time to delete the tree, I'm 0:16:45.000,0:16:48.000 going to need to go through and delete each of those nodes individually, the 0:16:48.000,0:16:51.000 same way I would on a link list to delete the whole structure. 0:16:51.000,0:16:54.000 The 0:16:54.000,0:16:57.000 order of traversal that's going to be most convenient for doing that is going to be to do it 0:16:57.000,0:16:58.000 post order. 0:16:58.000,0:17:00.000 Which is to say, when I'm looking at the root node 57, 0:17:00.000,0:17:05.000 I want to delete my entire left subtree recursively, delete my entire right 0:17:05.000,0:17:10.000 subtree recursively, and only after both of those pieces have been cleaned up, 0:17:10.000,0:17:12.000 delete the node we're on; 0:17:12.000,0:17:14.000 so deleting down below to the left, down below to the right 0:17:14.000,0:17:17.000 and then delete this one. If I did it in the other order, if 0:17:17.000,0:17:21.000 I tried to delete in between in the in order or in the pre order case where I did 0:17:21.000,0:17:23.000 my node and then to my subtrees, 0:17:23.000,0:17:26.000 I'd be reaching into parts of my memory I've deleted. 0:17:26.000,0:17:31.000 So we do want to do it in that order, so 0:17:31.000,0:17:32.000 recursion 0:17:32.000,0:17:35.000 coming back to visit you again. 0:17:35.000,0:17:42.000 Let's talk about how to make map work as a tree. 0:17:42.000,0:17:46.000 So if you think about what maps hold, maps hold a key, which is string type; a 0:17:46.000,0:17:48.000 value, which is client 0:17:48.000,0:17:51.000 some sort of template parameter there. 0:17:51.000,0:17:55.000 Now that I've built a structure that raps those two things with a pair and 0:17:55.000,0:17:58.000 adds to it, right these two additional pointers to the left and to the right 0:17:58.000,0:18:01.000 subtrees that point back to a recursive struct, 0:18:01.000,0:18:05.000 the one data member that the map will have at the top is just the root 0:18:05.000,0:18:09.000 pointer. The pointer to the node at the top of the tree, 0:18:09.000,0:18:12.000 and then from there we'll traverse pointers to find our way to other ones. 0:18:12.000,0:18:15.000 And we will organize that tree for the binary search property. 0:18:15.000,0:18:16.000 So using the 0:18:16.000,0:18:20.000 key as the discriminate of what goes left and right, 0:18:20.000,0:18:25.000 we'll compare each new key to the key at the root and decide is it going to go in the left subtree 0:18:25.000,0:18:27.000 root or the right subtree root? And so on down to the 0:18:27.000,0:18:31.000 matching position when we're doing a find, or to the place to insert if we're ready to 0:18:31.000,0:18:33.000 put a new node in. 0:18:33.000,0:18:36.000 So get value and add will have very similar structures, in terms of 0:18:36.000,0:18:38.000 searching that tree 0:18:38.000,0:18:39.000 0:18:39.000,0:18:40.000 from the root 0:18:40.000,0:18:45.000 down to the leaf to find that match along the way, or to 0:18:45.000,0:18:49.000 report where to put a new one in. So 0:18:49.000,0:18:52.000 let's look at the actual code; it has 0:18:52.000,0:18:54.000 my struct, it 0:18:54.000,0:18:57.000 has the strike key, it has the value type node, 0:18:57.000,0:19:00.000 two pointers there, and then the one data member for the map 0:19:00.000,0:19:03.000 itself here is the pointer to the root node. I 0:19:03.000,0:19:09.000 also put a couple of helper number functions that I'm going to need. 0:19:09.000,0:19:10.000 0:19:10.000,0:19:13.000 Searching the tree and entering a new node in the tree are going to require a little 0:19:13.000,0:19:18.000 helper, and we're going to see why that is. That basically that the add and get value public 0:19:18.000,0:19:20.000 number functions are going to be just wrappers 0:19:20.000,0:19:23.000 that turn and pass the code off to these recursive helpers. 0:19:23.000,0:19:26.000 And if these cases, the additional information that each of these is 0:19:26.000,0:19:28.000 tracking is what node we're at 0:19:28.000,0:19:32.000 as we work our way down the tree. But the initial call to add is just going to say, well 0:19:32.000,0:19:34.000 here's a key to value go put it in the right place. 0:19:34.000,0:19:37.000 And that while we're doing the recursion we need to know where we are in the tree. So 0:19:37.000,0:19:39.000 we're adding that extra parameter. 0:19:39.000,0:19:43.000 What a lot of recursive code needs is just a little more housekeeping, and the 0:19:43.000,0:19:44.000 housekeeping is 0:19:44.000,0:19:51.000 what part of the tree we're currently searching or trying to enter into. 0:19:51.000,0:19:54.000 So this is the outer call for get value. 0:19:54.000,0:19:57.000 So get value is really just a wrapper, 0:19:57.000,0:19:59.000 it's going to call tree search. It's going to 0:19:59.000,0:20:03.000 ask the starting value for root. It's the 0:20:03.000,0:20:04.000 beginning point for the search, 0:20:04.000,0:20:07.000 looking for a particular key. And what tree searches are going to return is a 0:20:07.000,0:20:10.000 pointer to a node. If 0:20:10.000,0:20:14.000 that node was not found, if there was no node that has a string 0:20:14.000,0:20:17.000 key that matches the one that we're looking for, then we'll return null. Otherwise, we'll return 0:20:17.000,0:20:18.000 the node. 0:20:18.000,0:20:20.000 And so it either will error 0:20:20.000,0:20:23.000 when it didn't find it or 0:20:23.000,0:20:25.000 pull out the value that was out of 0:20:25.000,0:20:25.000 that 0:20:25.000,0:20:27.000 node structure. 0:20:27.000,0:20:30.000 Well, this part looks pretty okay. The template, right, there's always just a 0:20:30.000,0:20:33.000 little bit of goo up there, right, seeing kind of 0:20:33.000,0:20:36.000 the template header on the top and the use of the val type, and all this other stuff. 0:20:36.000,0:20:39.000 It's going to get a little bit worse here. So just kind of 0:20:39.000,0:20:42.000 hopefully take a deep breath and say, okay that part actually 0:20:42.000,0:20:46.000 doesn't scare me too much. We're going to look at what happens when I write the tree search code, where 0:20:46.000,0:20:48.000 it really actually goes off and does the search down the tree. And there's going to be 0:20:48.000,0:20:49.000 0:20:49.000,0:20:52.000 a little bit more machinations required on this. So let's just talk about it first, 0:20:52.000,0:20:55.000 just what the code does. And then I'm going to come back and talk about some of the syntax that's 0:20:55.000,0:20:58.000 required to make this work correctly. 0:20:58.000,0:21:02.000 So the idea of tree search is, given a pointer to a particular 0:21:02.000,0:21:02.000 subtree, it 0:21:02.000,0:21:05.000 could be the root of the original tree or some 0:21:05.000,0:21:07.000 smaller subtree further down, and a key that we're looking for, we're 0:21:07.000,0:21:10.000 going to return a match when we find that node. 0:21:10.000,0:21:15.000 So looking at the step here, where if the key that we have matches the key of the node 0:21:15.000,0:21:17.000 we're looking at, then this is the node. 0:21:17.000,0:21:19.000 This is the one we wanted. 0:21:19.000,0:21:23.000 Otherwise, right, we look at two cases. If the key is less than this node's key, 0:21:23.000,0:21:26.000 then we're just going to return the result from searching in the left subtree. 0:21:26.000,0:21:31.000 Otherwise it must be greater than, and we're going to search in the right subtree. 0:21:31.000,0:21:34.000 So this case, right, duplicates aren't allowed, so there will be exactly one match, if 0:21:34.000,0:21:37.000 at all. 0:21:37.000,0:21:39.000 And when we find it we return. 0:21:39.000,0:21:43.000 The other case that needs to be handled as part of base case is if what if we ever get to an 0:21:43.000,0:21:44.000 empty tree. 0:21:44.000,0:21:47.000 So we're looking for 32 and the last node we just passed was 0:21:47.000,0:21:50.000 35. We needed to be the left of it, and it turns out there was nothing down there, we 0:21:50.000,0:21:52.000 have an empty tree. 0:21:52.000,0:21:54.000 We'll hit this case where T is null, and we'll return null. That means there was never 0:21:54.000,0:21:58.000 a match to that. There's no other place in the tree 0:21:58.000,0:22:01.000 where that 32 could have been. The binary search property completely dictates the 0:22:01.000,0:22:04.000 series of left and right turns. There's nowhere else to look. 0:22:04.000,0:22:06.000 So that's actually, really solving 0:22:06.000,0:22:07.000 0:22:07.000,0:22:11.000 a lot of our - cleaning up our work for us, streamlining where we need to go; 0:22:11.000,0:22:14.000 because the tree is organized through the 0:22:14.000,0:22:16.000 mid point and the quarter point and then the eighth point 0:22:16.000,0:22:19.000 to narrow down to where we could go. So as 0:22:19.000,0:22:21.000 it is, this code works correctly. 0:22:21.000,0:22:25.000 So do you have any questions about how it's thinking about doing its work? 0:22:25.000,0:22:30.000 And then I'm going to show you why the syntax is a little bit wrong though. 0:22:30.000,0:22:32.000 Okay, so let's talk about this 0:22:32.000,0:22:34.000 return type coming out of 0:22:34.000,0:22:36.000 the function. 0:22:36.000,0:22:38.000 So let's go back to this for a second. 0:22:38.000,0:22:41.000 That node is defined 0:22:41.000,0:22:43.000 as a private member 0:22:43.000,0:22:46.000 of the map class. 0:22:46.000,0:22:48.000 So node is actually not a global type. 0:22:48.000,0:22:53.000 It does not exist outside the map class. Its real name is 0:22:53.000,0:22:56.000 0:22:56.000,0:22:58.000 map 0:22:58.000,0:23:00.000 val type:: 0:23:00.000,0:23:03.000 node. 0:23:03.000,0:23:05.000 That's its full name, 0:23:05.000,0:23:08.000 is that. Now, 0:23:08.000,0:23:08.000 inside 0:23:08.000,0:23:10.000 the implementation of map, 0:23:10.000,0:23:14.000 which is to say inside the class map, or inside member functions that we're 0:23:14.000,0:23:17.000 defining later, you'll not that we can just kind of drop 0:23:17.000,0:23:21.000 the big full name. The kind of my name is mister such and such, and such and 0:23:21.000,0:23:26.000 such; it's like okay, you can just call it node, the short name internal to the class 0:23:26.000,0:23:28.000 because it's inside that scope. And 0:23:28.000,0:23:31.000 it's able to say, okay in this scope is there something called node? Okay there is. Okay, that's the 0:23:31.000,0:23:34.000 one. Outside of that scope, though, if you were trying to use that outside of the 0:23:34.000,0:23:37.000 class or some other places, it's going to 0:23:37.000,0:23:40.000 have a little bit more trouble with this idea of node by itself. It's going to need to see 0:23:40.000,0:23:42.000 that full name. Let 0:23:42.000,0:23:44.000 me show you on this piece of code, 0:23:44.000,0:23:46.000 that node here 0:23:46.000,0:23:51.000 happens to be in the slight quirk of the way sequel plus defines things, that that 0:23:51.000,0:23:56.000 use of node is not considered in class scope yet. 0:23:56.000,0:23:57.000 That 0:23:57.000,0:24:01.000 the compiler leads from left to write, so we'll look at the first one. It says, template 0:24:01.000,0:24:04.000 type name val type, and so what the compiler sees so far is, I'm about to 0:24:04.000,0:24:07.000 see something that's templated on val type. 0:24:07.000,0:24:10.000 So it's a pattern for which val type is a placeholder in. 0:24:10.000,0:24:12.000 And then it sees val type 0:24:12.000,0:24:16.000 as the return value, and it says, okay, that's the placeholder. And then it sees this map :: and 0:24:16.000,0:24:18.000 0:24:18.000,0:24:21.000 it says, okay, this is within the - the thing 0:24:21.000,0:24:26.000 that I'm defining here is within scope of the map class. And so it sees the map 0:24:26.000,0:24:26.000 :: 0:24:26.000,0:24:29.000 and that as soon as it crosses those double colons, 0:24:29.000,0:24:31.000 from that point forward 0:24:31.000,0:24:33.000 it's inside the map scope. 0:24:33.000,0:24:34.000 And so, 0:24:34.000,0:24:37.000 when I use things like tree search, or I use the name node, it knows what I'm 0:24:37.000,0:24:40.000 talking about. It says, oh there's a tree search in map class. There's a node struct in map 0:24:40.000,0:24:43.000 class, it's all good. 0:24:43.000,0:24:47.000 In the case of this one I've seen template type and val type, and 0:24:47.000,0:24:50.000 at this point it doesn't know anything yet. 0:24:50.000,0:24:53.000 And so it sees me using node, and it say, 0:24:53.000,0:24:54.000 node I've 0:24:54.000,0:24:55.000 never heard of node. 0:24:55.000,0:24:58.000 I'm in the middle of making some template based on val type, but I haven't heard 0:24:58.000,0:25:01.000 anything about node. Node isn't something that exists in the global name space. 0:25:01.000,0:25:03.000 You must have made some 0:25:03.000,0:25:04.000 tragic error. 0:25:04.000,0:25:08.000 What it wants you to say is 0:25:08.000,0:25:09.000 to give it the full name 0:25:09.000,0:25:12.000 of map :: node. 0:25:12.000,0:25:14.000 That the real full 0:25:14.000,0:25:16.000 this is my name, 0:25:16.000,0:25:21.000 and otherwise it will be quite confused. 0:25:21.000,0:25:24.000 If that weren't enough, 0:25:24.000,0:25:26.000 we'd like to think that we were done 0:25:26.000,0:25:27.000 with 0:25:27.000,0:25:30.000 placating the C compiler. But it turns out there's one other little 0:25:30.000,0:25:32.000 thing that has to happen here, 0:25:32.000,0:25:34.000 which is there is an obscure 0:25:34.000,0:25:38.000 reason why, even though I've given it the full name, and I feel like I've done my 0:25:38.000,0:25:40.000 duty in describing this thing, 0:25:40.000,0:25:43.000 that the compiler still is a little bit confused 0:25:43.000,0:25:45.000 by this usage. 0:25:45.000,0:25:48.000 And because the C language is just immensely complicated, 0:25:48.000,0:25:50.000 there are a lot of funny interactions, 0:25:50.000,0:25:52.000 there are a couple of situations it can get in to, 0:25:52.000,0:25:53.000 where 0:25:53.000,0:25:56.000 because it's trying hard to read left to right and see the things. 0:25:56.000,0:25:59.000 Sometimes it needs to know some things in advance that it doesn't have all the 0:25:59.000,0:26:00.000 information for. 0:26:00.000,0:26:03.000 And it turns out in this case there's actually some obscure situation where 0:26:03.000,0:26:06.000 it might not realize this is really a type. 0:26:06.000,0:26:08.000 It might think it was 0:26:08.000,0:26:10.000 more of a global constant 0:26:10.000,0:26:14.000 that we actually have to use the type name key word again here 0:26:14.000,0:26:16.000 on this one 0:26:16.000,0:26:17.000 construction. 0:26:17.000,0:26:20.000 It is basically a hint to the compiler 0:26:20.000,0:26:22.000 that says, the next thing coming up 0:26:22.000,0:26:25.000 really is a type name. 0:26:25.000,0:26:28.000 So given that you can't tell that, you'd think it would be obvious; in 0:26:28.000,0:26:31.000 this case it doesn't seem like there's a lot for it to be confused at, but there's 0:26:31.000,0:26:33.000 actually some other constructions that we can't rule out. 0:26:33.000,0:26:37.000 So the compiler is a little bit confused in this situation, and it wants you to 0:26:37.000,0:26:38.000 tell it in advance, 0:26:38.000,0:26:41.000 and so I'll tell you that the basic rules for when you're going to need to use this is 0:26:41.000,0:26:45.000 exactly in this and only this situation. 0:26:45.000,0:26:48.000 And I'll tell you what the things are that are required for type name to 0:26:48.000,0:26:50.000 become part of your 0:26:50.000,0:26:51.000 repertoire. 0:26:51.000,0:26:53.000 You have a return type 0:26:53.000,0:26:54.000 on a member function 0:26:54.000,0:26:58.000 that is defined as part of a class that is a template, 0:26:58.000,0:27:00.000 and that type that you are using 0:27:00.000,0:27:03.000 is defined within the scope of that template. 0:27:03.000,0:27:07.000 So it's trying to use something that's within a template class scope 0:27:07.000,0:27:08.000 outside of that scope 0:27:08.000,0:27:09.000 0:27:09.000,0:27:11.000 that requires this type name. 0:27:11.000,0:27:14.000 And the only place where we see things used out of their scope is on the 0:27:14.000,0:27:15.000 return value. 0:27:15.000,0:27:19.000 So this is kind of the one 0:27:19.000,0:27:21.000 configuration that causes you to have to do this. 0:27:21.000,0:27:22.000 0:27:22.000,0:27:25.000 It's really just, you know, a little bit of a C quirk. 0:27:25.000,0:27:28.000 It doesn't change anything about what the code's doing or anything interesting about binary 0:27:28.000,0:27:31.000 search trees, but it will come up in any situation where you have this helper member 0:27:31.000,0:27:34.000 function that wants to return something that's templated. You 0:27:34.000,0:27:35.000 have 0:27:35.000,0:27:37.000 to placate the compiler there. 0:27:37.000,0:27:39.000 The error message when you don't do it, actually turns out to be very good in 0:27:39.000,0:27:45.000 GCC and very terrible in visual studio. GCC says type name required here, 0:27:45.000,0:27:47.000 and visual studio says something completely 0:27:47.000,0:27:48.000 mystical. 0:27:48.000,0:27:49.000 So it 0:27:49.000,0:27:52.000 is something to be just a little bit careful about when you're trying to write 0:27:52.000,0:27:56.000 that kind of 0:27:56.000,0:27:58.000 code. Question? The asterisk? The asterisk is just because it returns a pointer. 0:27:58.000,0:28:00.000 0:28:00.000,0:28:03.000 In all the situations, I'm always returning a pointer to the node back 0:28:03.000,0:28:04.000 there. 0:28:04.000,0:28:16.000 So here's the pointer to the match I found or node when I didn't find one. All right, that's a little bit of grimy code. Let's keep going. 0:28:16.000,0:28:17.000 So how is it 0:28:17.000,0:28:18.000 that add 0:28:18.000,0:28:22.000 works? Well, it starts like get value, that 0:28:22.000,0:28:23.000 the 0:28:23.000,0:28:27.000 strategy we're going to use for inserting into the tree is the really simple one that say, 0:28:27.000,0:28:30.000 well don't rearrange anything that's there. For example, if I were trying to put 0:28:30.000,0:28:32.000 the number 54 into this tree, 0:28:32.000,0:28:33.000 you could say - well 0:28:33.000,0:28:36.000 you could imagine putting it at the root and moving things around. Imagine 0:28:36.000,0:28:40.000 that your goal is to put it in with a minimal amount of rearrangement. 0:28:40.000,0:28:44.000 So leaving all the existing nodes connected the way they are, 0:28:44.000,0:28:47.000 we're going to follow the binary search path of 0:28:47.000,0:28:49.000 the get value code 0:28:49.000,0:28:51.000 to find the place where 54 would have been 0:28:51.000,0:28:54.000 if we left everything else in tact. 0:28:54.000,0:28:57.000 Looking at 57 you could say, well it has to be to the left of 0:28:57.000,0:29:01.000 57. Looking to the 32 you say, it has to be to the right of 0:29:01.000,0:29:01.000 32. 0:29:01.000,0:29:05.000 Here's where we get some empty tree, so we say, well that's the place where we'll put it. 0:29:05.000,0:29:06.000 We're going to tack it off, kind 0:29:06.000,0:29:12.000 of hang it off one of the empty trees 0:29:12.000,0:29:15.000 along the front tier, the bottom of that tree. 0:29:15.000,0:29:18.000 And so, we'll just walk our way down to where we fall off the tree, and put the 0:29:18.000,0:29:19.000 new node in place there. 0:29:19.000,0:29:22.000 So if I want the node 58 in, I'll say it 0:29:22.000,0:29:25.000 has to go to the right; it has to go to the left; it has to go to the left; it has to go to the left; it 0:29:25.000,0:29:27.000 will go right down there. If 0:29:27.000,0:29:30.000 I want the node 100 in, it has to go right, right, right, right, right. 0:29:30.000,0:29:35.000 That just kind of following the path, leaving all the current pointers and 0:29:35.000,0:29:40.000 nodes in place, kind of finding a new place. And there's exactly one 0:29:40.000,0:29:43.000 that given any number in a current configuration 0:29:43.000,0:29:46.000 and the goal of not rearranging anything, there's exactly one place to put 0:29:46.000,0:29:48.000 that new node in there 0:29:48.000,0:29:49.000 that will 0:29:49.000,0:29:51.000 keep the binary search property in tact. 0:29:51.000,0:29:55.000 So it actually does a lot of decision making for us. 0:29:55.000,0:29:58.000 We don't have to make any hard decisions about where to go. It just all this 0:29:58.000,0:30:05.000 left, right, left, right based on less than and greater than. What if the values are equal? If 0:30:05.000,0:30:09.000 the values are equal you have to have a plan, and it could be 0:30:09.000,0:30:11.000 that you're using all the less than or equal to or greater than 0:30:11.000,0:30:14.000 or equal to. You just have to have a plan. In terms of our map it turns out to 0:30:14.000,0:30:17.000 equal to all the rights. 0:30:17.000,0:30:20.000 So we handle it differently, which we don't every put a duplicate into the 0:30:20.000,0:30:23.000 tree. 0:30:23.000,0:30:26.000 So the add call is going to basically just be a wrapper 0:30:26.000,0:30:29.000 that calls tree enter, starting from the root 0:30:29.000,0:30:31.000 with the key and the value 0:30:31.000,0:30:34.000 that then goes and does that work, walks its way down to the bottom to 0:30:34.000,0:30:40.000 find the right place to stick the new one in. 0:30:40.000,0:30:43.000 Here is 0:30:43.000,0:30:44.000 a 0:30:44.000,0:30:49.000 very dense little piece of code that will put a node into a tree 0:30:49.000,0:30:52.000 using the strategy that we've just talked about of, 0:30:52.000,0:30:55.000 don't rearrange any of the pointers, just walk your way down to where you find an empty 0:30:55.000,0:30:59.000 tree. And once you find that empty tree, replace the empty tree with a singleton node 0:30:59.000,0:31:02.000 with the new key and value that you have. 0:31:02.000,0:31:06.000 So let's not look at the base case first. Let's actually look at the 0:31:06.000,0:31:09.000 recursive cases, because I think that they're a little bit easier to think about. 0:31:09.000,0:31:13.000 Is if we every find a match, so we have a key coming in, and it matches 0:31:13.000,0:31:14.000 the current node, then we just overwrite. 0:31:14.000,0:31:17.000 So that's the way the map handles 0:31:17.000,0:31:18.000 insertion of a duplicate value. 0:31:18.000,0:31:21.000 Now, in some other situation you might actually insert one. But in our case it turns out we 0:31:21.000,0:31:24.000 never want a duplicate. We always want to take the existing value and replace 0:31:24.000,0:31:26.000 it with a new one. 0:31:26.000,0:31:28.000 If it matches, we're done. 0:31:28.000,0:31:29.000 Otherwise, 0:31:29.000,0:31:32.000 we need to decide whether we go left or right. 0:31:32.000,0:31:36.000 Based on the key we're attempting to insert or the current node value, it either 0:31:36.000,0:31:39.000 will go in the left of the subtree, left of the right, in which case we make 0:31:39.000,0:31:44.000 exactly one recursive call either to the left or to the right depending on the 0:31:44.000,0:31:46.000 relationship of key to the current 0:31:46.000,0:31:48.000 key's value. 0:31:48.000,0:31:49.000 And then 0:31:49.000,0:31:52.000 the one base case that everything eventually gets down to, that's going to create a 0:31:52.000,0:31:53.000 new node, 0:31:53.000,0:31:57.000 is when you have an empty tree, 0:31:57.000,0:31:59.000 then that says that you have followed the path that 0:31:59.000,0:32:04.000 has lead off the tree and fallen off the bottom. It says replace that with 0:32:04.000,0:32:05.000 this new singleton node. 0:32:05.000,0:32:07.000 So we make a new node out in the heap. 0:32:07.000,0:32:09.000 We're assigning the key 0:32:09.000,0:32:12.000 based on this. We're assigning the value based on the parameter coming in. And then we set 0:32:12.000,0:32:14.000 its left and right null to do 0:32:14.000,0:32:16.000 0:32:16.000,0:32:20.000 a new leaf node, right, it has left and right null subtrees. 0:32:20.000,0:32:22.000 And the way this got wired in the tree, 0:32:22.000,0:32:26.000 you don't see who is it that's now pointing to this tree, this new 0:32:26.000,0:32:28.000 singleton node that I placed into here? 0:32:28.000,0:32:31.000 You don't typically see that wire that's coming into it. Who 0:32:31.000,0:32:33.000 is pointing to it 0:32:33.000,0:32:34.000 is actually being done 0:32:34.000,0:32:39.000 by T being passed by reference. 0:32:39.000,0:32:42.000 The node right above it, 0:32:42.000,0:32:44.000 the left or the right 0:32:44.000,0:32:48.000 of what's going to become the parent of the new node being entered, 0:32:48.000,0:32:51.000 was passed by reference. It previously was null, 0:32:51.000,0:32:55.000 and it got reassigned to point to this new node. So the kind of wiring in of 0:32:55.000,0:32:58.000 that node is happening by the pass by reference of the pointer, which is a 0:32:58.000,0:33:04.000 tricky thing to kind of watch and get your head around. Let's 0:33:04.000,0:33:06.000 look at the code operating. Do you have a 0:33:06.000,0:33:13.000 question? Yes. [Inaudible]. Can you 0:33:13.000,0:33:14.000 explain 0:33:14.000,0:33:16.000 0:33:16.000,0:33:17.000 that little step? 0:33:17.000,0:33:22.000 Instructor: Well this step is just something in the case of the map, if we every find an existing entity that has that key, and 0:33:22.000,0:33:23.000 we overwrite. 0:33:23.000,0:33:27.000 So in the map you said previously, Bob's phone number is this, and then you install a new 0:33:27.000,0:33:30.000 phone number on Bob. If it finds a map to Bob saying Bob was previously entered, it just 0:33:30.000,0:33:34.000 replaces his value. So no nodes are created, no pointers are rearranged; 0:33:34.000,0:33:38.000 it just commandeers that node and changes its value. 0:33:38.000,0:33:43.000 That's the behaviors specified by map. 0:33:43.000,0:33:45.000 So if we have this 0:33:45.000,0:33:46.000 right now, we've got 0:33:46.000,0:33:49.000 some handful of fruits that are installed 0:33:49.000,0:33:53.000 in our tree with some value. It's doesn't matter what the value is, so I didn't even bother to fill 0:33:53.000,0:33:57.000 it in to confuse you. So this is what I have, papaya is my root node, and then 0:33:57.000,0:34:01.000 the next level down has banana and peach, and the pear and what not. So 0:34:01.000,0:34:03.000 root is pointing to papaya right now. 0:34:03.000,0:34:06.000 When I make my call to insert a new value, 0:34:06.000,0:34:09.000 let's say that the one that I want to put in the tree now is orange, 0:34:09.000,0:34:13.000 the very first call to tree enter looks like this, 0:34:13.000,0:34:16.000 T is the reference parameter that's coming in, and 0:34:16.000,0:34:20.000 it refers to root. 0:34:20.000,0:34:24.000 So the very first call to tree enter says please insert 0:34:24.000,0:34:27.000 orange and its value 0:34:27.000,0:34:28.000 into the tree 0:34:28.000,0:34:31.000 who is pointed to by 0:34:31.000,0:34:34.000 root. So it actually has that by reference. It's going to need that by 0:34:34.000,0:34:37.000 reference because it's actually planning on updating that 0:34:37.000,0:34:39.000 when it finally installs the new node. 0:34:39.000,0:34:43.000 So the first call to tree enter says, okay well is orange less than or greater than 0:34:43.000,0:34:45.000 papaya. 0:34:45.000,0:34:47.000 Orange precedes papaya 0:34:47.000,0:34:48.000 in the alphabet, 0:34:48.000,0:34:50.000 so okay, well call tree enter 0:34:50.000,0:34:52.000 the second time 0:34:52.000,0:34:53.000 passing 0:34:53.000,0:34:58.000 the left subtree of the current value of tee. And so T 0:34:58.000,0:35:00.000 was a reference to root right then, 0:35:00.000,0:35:04.000 now T becomes a reference to the left subtree coming out of papaya. So it says, now 0:35:04.000,0:35:06.000 okay, for the tree 0:35:06.000,0:35:10.000 that we're talking about here, which is the one that points to the banana and 0:35:10.000,0:35:14.000 below. That's the new root of that subtree is banana, it says, does orange go 0:35:14.000,0:35:15.000 to the left or right 0:35:15.000,0:35:19.000 of the root node 0:35:19.000,0:35:21.000 banana? And it goes to the right. So the third stack frame 0:35:21.000,0:35:24.000 of tree enter now says, well the reference pointer 0:35:24.000,0:35:28.000 is now looking at the right subtree coming out of banana. And it 0:35:28.000,0:35:29.000 says, well does melon go to 0:35:29.000,0:35:31.000 the left or the right of 0:35:31.000,0:35:32.000 that? 0:35:32.000,0:35:34.000 It goes to the right, 0:35:34.000,0:35:36.000 so now on this call, right, 0:35:36.000,0:35:40.000 the current value of T is going to be null. It is a reference, so a synonym for 0:35:40.000,0:35:44.000 what was the right field coming out of the melon node, 0:35:44.000,0:35:47.000 and in this is the case where it says, if orange was in the tree, 0:35:47.000,0:35:50.000 then this is where it would be hanging off of. It would be coming off of the 0:35:50.000,0:35:51.000 right 0:35:51.000,0:35:52.000 0:35:52.000,0:35:53.000 subtree of melon. 0:35:53.000,0:35:55.000 That part is null, 0:35:55.000,0:35:58.000 and so that's our place. That's the place where we're going to take out the 0:35:58.000,0:35:59.000 existing null, 0:35:59.000,0:36:00.000 wipe over it, 0:36:00.000,0:36:04.000 and put in a new pointer to the node orange 0:36:04.000,0:36:06.000 that's out here. 0:36:06.000,0:36:09.000 And so that's how the thing got wired in. It's actually kind of 0:36:09.000,0:36:12.000 subtle to see how that happened, because you don?t ever see a 0:36:12.000,0:36:16.000 direct descendant, like T is left equals this, equals that new node; or T is 0:36:16.000,0:36:17.000 right equals that new 0:36:17.000,0:36:20.000 node. It happened by virtue of passing T right 0:36:20.000,0:36:22.000 by reference into the 0:36:22.000,0:36:26.000 calls further down the stack. That eventually, right when it hit the base 0:36:26.000,0:36:26.000 case, 0:36:26.000,0:36:30.000 it took what was previously a pointer to null and overwrote it with a pointer to this 0:36:30.000,0:36:31.000 new cell, 0:36:31.000,0:36:37.000 that then got attached into the tree. Let's take a 0:36:37.000,0:36:38.000 look at this piece 0:36:38.000,0:36:40.000 of code. 0:36:40.000,0:36:44.000 That by reference there turns out to be really, really important. 0:36:44.000,0:36:46.000 If that 0:36:46.000,0:36:49.000 T was just being passed as an ordinary node star, 0:36:49.000,0:36:52.000 nodes would just get lost, dropped on the floor like crazy. 0:36:52.000,0:36:56.000 For example, consider the very first time when you're passing in root and root 0:36:56.000,0:36:58.000 is empty. If root is null, 0:36:58.000,0:36:59.000 it would say, well if root is null 0:36:59.000,0:37:02.000 and then it would change this local parameter T; T is some new node of these 0:37:02.000,0:37:06.000 things. But if it really wasn't making permanent changes to the root, 0:37:06.000,0:37:08.000 really wasn't making root pointer of that new node, root 0:37:08.000,0:37:12.000 would continue pointing to null, and that node would just get orphaned. And 0:37:12.000,0:37:14.000 every subsequent one, the same thing would happen. 0:37:14.000,0:37:17.000 So in order to have this be a persistent change, 0:37:17.000,0:37:19.000 the pointer that's coming in, 0:37:19.000,0:37:22.000 either the root in the first case or T left or T 0:37:22.000,0:37:23.000 right 0:37:23.000,0:37:25.000 in subsequent calls, 0:37:25.000,0:37:30.000 needed to have the ability to be permanently modified when we attach that new 0:37:30.000,0:37:33.000 subtree. 0:37:33.000,0:37:37.000 That's really very tricky code, so I would be happy to 0:37:37.000,0:37:39.000 try to answer 0:37:39.000,0:37:41.000 questions or go through something to try to get your head around what 0:37:41.000,0:37:44.000 it's supposed to be doing. If you could ask a question I 0:37:44.000,0:37:47.000 could 0:37:47.000,0:37:49.000 help you. Or you could just nod your head and say, I'm just totally fine. 0:37:49.000,0:37:52.000 Could you use the tree search algorithm that you programmed before? 0:37:52.000,0:37:53.000 They're 0:37:53.000,0:37:58.000 very close, right, and 0:37:58.000,0:37:59.000 where did I put it? 0:37:59.000,0:38:02.000 It's on this slide. 0:38:02.000,0:38:05.000 The problem with this one right now is it does really stop and return the 0:38:05.000,0:38:07.000 null. So I could actually unify them, but it would take a little more 0:38:07.000,0:38:11.000 work, and at some point you say, wow I could unify this 0:38:11.000,0:38:15.000 at the expense of making this other piece of code a little more complicated. So in this case I chose 0:38:15.000,0:38:18.000 to keep them separate. It's not impossible to unify them, but you end up 0:38:18.000,0:38:19.000 having to point to 0:38:19.000,0:38:22.000 where the node is or a null where you would insert a new node. Tree 0:38:22.000,0:38:26.000 search doesn't care about, but tree enter does. It sort of would 0:38:26.000,0:38:29.000 modify it to fit the needs of both at the expense of making it a little more awkward 0:38:29.000,0:38:38.000 on both sides. Question here. 0:38:38.000,0:38:43.000 [Inaudible] the number example. 0:38:43.000,0:38:47.000 If you were inserting 54 where you said, if you called print, 0:38:47.000,0:38:49.000 wouldn't it print 0:38:49.000,0:38:51.000 out of order? So 0:38:51.000,0:38:56.000 54 would go left of 57, right of 32, so it'd go over here. 0:38:56.000,0:38:59.000 But the way in order traversals work is says, 0:38:59.000,0:39:03.000 print everything in my left subtree, print everything in my left subtree, left subtree. So it would kind of print seven 0:39:03.000,0:39:07.000 as it came back five, 32, and then it would print 54 before it got to the 0:39:07.000,0:39:08.000 57. So 0:39:08.000,0:39:12.000 an in order traversal of a binary search tree, as long as the binary search tree 0:39:12.000,0:39:16.000 is properly maintained, will print them in sorted order. 0:39:16.000,0:39:19.000 It's almost like, don't even think too hard about it, if 0:39:19.000,0:39:22.000 everything in my left subtree is smaller, and everything in my right subtree is greater than me, 0:39:22.000,0:39:26.000 if I print all of those I have to come out in sorted order. 0:39:26.000,0:39:30.000 If the property is correctly maintained in the tree, then that 0:39:30.000,0:39:31.000 by definition 0:39:31.000,0:39:34.000 is a sorted way to pass through all the numbers. So if 0:39:34.000,0:39:38.000 at any point an in order traversal is hitting numbers out of order, it would mean that the tree wasn't 0:39:38.000,0:39:40.000 properly 0:39:40.000,0:39:51.000 organized. 0:39:51.000,0:39:55.000 So let's talk about how that works, there are a 0:39:55.000,0:39:59.000 couple of other operations on the tree, things like doing the size, 0:39:59.000,0:40:01.000 which would count the number of members. Which you could do by just doing a traversal, or you 0:40:01.000,0:40:04.000 might do by just caching a number 0:40:04.000,0:40:07.000 that you updated as you added and moved nodes. And 0:40:07.000,0:40:10.000 the contains keys, that's basically just a tree search 0:40:10.000,0:40:14.000 that determines whether it found a node or not. 0:40:14.000,0:40:17.000 The rest of the implementation is not interest. There's also deleting, but I 0:40:17.000,0:40:18.000 didn't show you 0:40:18.000,0:40:23.000 that. The space use, how does this relate to the other known 0:40:23.000,0:40:24.000 implementations that we have for math? 0:40:24.000,0:40:27.000 It adds certainly some space overhead. We're 0:40:27.000,0:40:31.000 going to have two pointers, a left and a right off every entry. So every entry is 0:40:31.000,0:40:35.000 guaranteed to have an excess 8-byte capacity 0:40:35.000,0:40:37.000 relative to the data being stored. 0:40:37.000,0:40:40.000 It does actually add those tightly to the allocation. So like a link 0:40:40.000,0:40:43.000 list, it allocates nodes on a per 0:40:43.000,0:40:46.000 entry basis, so there's actually not a lot of 0:40:46.000,0:40:49.000 reserve capacity that's waiting around unused the way it might been in a 0:40:49.000,0:40:51.000 vector situation. 0:40:51.000,0:40:53.000 And the performance of it is that it's based on the 0:40:53.000,0:40:55.000 height of the tree, 0:40:55.000,0:41:00.000 that both the add and the get value are doing a traversal starting from the root and 0:41:00.000,0:41:03.000 working their way down to the bottom of the tree, either finding what it wanted 0:41:03.000,0:41:05.000 along the way 0:41:05.000,0:41:07.000 or falling off the bottom of the tree 0:41:07.000,0:41:11.000 when it wasn't found or when it needs to be inserting a new node. 0:41:11.000,0:41:15.000 That height of the tree, you're expecting it to be logarithmic. And so 0:41:15.000,0:41:17.000 if I look at, for example, the values, I've got 0:41:17.000,0:41:22.000 seven values here, two, eight, 14, 15, 20, and 21. 0:41:22.000,0:41:23.000 Those seven, 0:41:23.000,0:41:27.000 eight values can be inserted in a tree to get very 0:41:27.000,0:41:28.000 different results. 0:41:28.000,0:41:30.000 There's not one binary search tree 0:41:30.000,0:41:33.000 that represents those values 0:41:33.000,0:41:37.000 uniquely. Depending on what order you manage to insert them you'll have 0:41:37.000,0:41:38.000 different things. 0:41:38.000,0:41:39.000 For example, 0:41:39.000,0:41:41.000 given the way we're doing it, 0:41:41.000,0:41:44.000 whatever node is inserted first will be the root, and we never move the 0:41:44.000,0:41:48.000 root. We leave the root along. So if you look at this as a historical 0:41:48.000,0:41:50.000 artifact and say, well they entered a bunch of numbers, 0:41:50.000,0:41:52.000 and this is the tree I got. 0:41:52.000,0:41:55.000 I can tell you for sure that 15 was the very first number that they entered. 0:41:55.000,0:41:57.000 There's no other possibility, 15 got entered first. 0:41:57.000,0:42:01.000 The next number that got inserted was one of eight or 20, I don't know 0:42:01.000,0:42:03.000 which. 0:42:03.000,0:42:06.000 I can tell you, for example, that eight got inserted before 14. In terms 0:42:06.000,0:42:09.000 of a level-by-level arrangement, the ones on level one 0:42:09.000,0:42:13.000 were always inserted before the ones on level two. 0:42:13.000,0:42:16.000 But one possibility I said here was well maybe 15 went in first, 0:42:16.000,0:42:18.000 then eight, and then two, 0:42:18.000,0:42:21.000 and then 20, and then 21, and then 14, and then 18. There 0:42:21.000,0:42:25.000 are some other rearrangements that are also possible that would create the exact same 0:42:25.000,0:42:26.000 tree. 0:42:26.000,0:42:30.000 And there are other variations that would produce a different, but still valid, 0:42:30.000,0:42:32.000 binary search tree. 0:42:32.000,0:42:33.000 So with these seven nodes, right, 0:42:33.000,0:42:36.000 this tree is height three, 0:42:36.000,0:42:39.000 and is what's considered perfectly balanced. Half the nodes are on the 0:42:39.000,0:42:42.000 left and the right of each tree all the way through it, 0:42:42.000,0:42:45.000 leading to that just beautifully balanced 0:42:45.000,0:42:46.000 0:42:46.000,0:42:47.000 result we're hoping for 0:42:47.000,0:42:52.000 where both our operations, both add and get value, will be logarithmic. So 0:42:52.000,0:42:54.000 if I have 0:42:54.000,0:42:56.000 1,000 0:42:56.000,0:42:59.000 nodes inserted, that height tree should be about ten. And if everything kind of went 0:42:59.000,0:43:03.000 well and we got kind of went well, and we got an even split all the way around, then it would take ten 0:43:03.000,0:43:06.000 comparisons to find something in that tree or determine it wasn't there' and 0:43:06.000,0:43:07.000 also to insert, 0:43:07.000,0:43:10.000 which was the thing we were having trouble with at the vector case, that 0:43:10.000,0:43:13.000 because that search leads us right to the place where it needs to go, 0:43:13.000,0:43:16.000 and the inserting because of the flexibility of the pointer base structure is 0:43:16.000,0:43:17.000 very easy, 0:43:17.000,0:43:21.000 we can make both that insert and add operate logarithmically. 0:43:21.000,0:43:24.000 Let's think about some cases that aren't so good. 0:43:24.000,0:43:28.000 I take those same seven numbers and I insert them 0:43:28.000,0:43:32.000 not quite so perfectly, and I get a not quite so even result. 0:43:32.000,0:43:36.000 So on this side I can tell you that 20 was the first one inserted. 0:43:36.000,0:43:38.000 It was the root and the root doesn't move, 0:43:38.000,0:43:41.000 but then subsequent ones was inserted an eight, and then a 21, and then an 18, 0:43:41.000,0:43:46.000 and then a 14, and then a 15, and then it went back and got that two, 0:43:46.000,0:43:47.000 the one over there. 0:43:47.000,0:43:50.000 Another possibility for how it got inserted 0:43:50.000,0:43:51.000 0:43:51.000,0:43:53.000 18, 14, 15, and so on. 0:43:53.000,0:43:54.000 These are 0:43:54.000,0:43:58.000 not quite as balanced as the last one, but they're not terribly unbalanced. 0:43:58.000,0:44:00.000 They're like one or two 0:44:00.000,0:44:03.000 more than the perfectly balanced case. 0:44:03.000,0:44:09.000 So at logarithmic plus a little constant still doesn't look 0:44:09.000,0:44:11.000 that bad. Okay, how bad can it really get? 0:44:11.000,0:44:15.000 It can get really bad. 0:44:15.000,0:44:18.000 I can take those seven numbers and insert them in the worst 0:44:18.000,0:44:21.000 case, and produce a completely degenerate tree 0:44:21.000,0:44:24.000 where, for example, if I insert them in sorted order 0:44:24.000,0:44:25.000 or in reverse sorted order, 0:44:25.000,0:44:26.000 0:44:26.000,0:44:28.000 I really have a link list. 0:44:28.000,0:44:32.000 I don't have a tree at all. It goes two, 18, 14, 15, 20, 21, 0:44:32.000,0:44:34.000 that's still a valid binary search tree. 0:44:34.000,0:44:35.000 If you 0:44:35.000,0:44:38.000 print it in order, it still comes out in sorted order. 0:44:38.000,0:44:41.000 It still can be searched using a binary search strategy. 0:44:41.000,0:44:45.000 It's just because the way it's been divided up here, it's not buying 0:44:45.000,0:44:48.000 us a lot. So if I'm looking for the number 22, I'll say, well 0:44:48.000,0:44:52.000 is it to the left or the right of two? It's to the right. Is it to the left or right of 0:44:52.000,0:44:55.000 eight? Oh, it's to the right. Is it to the left or right, you know like all the way down, so finding 0:44:55.000,0:44:57.000 22 looking at every single one 0:44:57.000,0:45:01.000 of the current entries in the tree, before I could conclude, as I fell off the 0:45:01.000,0:45:04.000 bottom, that it wasn't there. 0:45:04.000,0:45:06.000 Similarly kind of over there; 0:45:06.000,0:45:09.000 it didn't buy us a lot. It has a lot to do with 0:45:09.000,0:45:13.000 how we inserted it as to how good a balance we're going to get. 0:45:13.000,0:45:16.000 And there are some cases that will produce just drastically unbalanced 0:45:16.000,0:45:17.000 0:45:17.000,0:45:21.000 things that require linear searches, totally linear to walk that way 0:45:21.000,0:45:22.000 down 0:45:22.000,0:45:25.000 to find something. 0:45:25.000,0:45:26.000 There's a couple of other 0:45:26.000,0:45:29.000 degenerate trees, just to mention it's not only the sorted or reverse sorted. 0:45:29.000,0:45:33.000 There's also these other kind of saw tooth inputs, they're called, 0:45:33.000,0:45:35.000 where you just happen to at any given point to 0:45:35.000,0:45:39.000 have inserted the max or the min of the elements that remain, so it 0:45:39.000,0:45:42.000 keeps kind of dividing the input into one and 0:45:42.000,0:45:44.000 N-1. So having the 0:45:44.000,0:45:47.000 largest and the smallest, and the largest and the smallest, and then 0:45:47.000,0:45:48.000 subsequent smallest down there 0:45:48.000,0:45:52.000 can also produce a linear line. It's not quite all just down the left or 0:45:52.000,0:45:55.000 down the right, but it does mean that every single node has an empty 0:45:55.000,0:45:59.000 left or right subtree is what's considered the degenerate case here. And 0:45:59.000,0:46:01.000 both have the same property where 0:46:01.000,0:46:04.000 half of the pointers off 0:46:04.000,0:46:08.000 of each cell are going nowhere. 0:46:08.000,0:46:10.000 There is an interesting relationship here 0:46:10.000,0:46:12.000 between these worst-case inputs for transversion and quick sort, and they're 0:46:12.000,0:46:14.000 actually kind of exactly one to one. 0:46:14.000,0:46:18.000 That what made quick sort deteriorate was the element of the input, 0:46:18.000,0:46:21.000 the max or the min of what remain. And that's exactly the same case 0:46:21.000,0:46:25.000 for transversion, max or min of what remains means that you are not dividing your 0:46:25.000,0:46:29.000 input into two partitions of roughly equal size. You're dividing them into the 0:46:29.000,0:46:33.000 one, N-1, which is getting you nowhere. What do you 0:46:33.000,0:46:37.000 get to do about it? 0:46:37.000,0:46:40.000 You have to decide first of all if it's a problem worth solving. 0:46:40.000,0:46:41.000 0:46:41.000,0:46:44.000 So that's what the first one say. Sometimes you just say that 0:46:44.000,0:46:46.000 it will sometimes happen in some 0:46:46.000,0:46:47.000 inputs 0:46:47.000,0:46:50.000 that I consider rare enough that it will degenerate. I don't think they're 0:46:50.000,0:46:52.000 important enough or common enough to make a big deal out of. 0:46:52.000,0:46:56.000 I don't think that applies here, because it turns out that getting your data in sorted order is 0:46:56.000,0:46:59.000 probably not unlikely. That somebody might be refilling a map 0:46:59.000,0:47:02.000 with data they read from a file that they wrote out in sorted order last time. 0:47:02.000,0:47:05.000 So saying that if it is coming in sorted, 0:47:05.000,0:47:09.000 then I'm probably going to tolerate that, is probably not going to work. 0:47:09.000,0:47:12.000 And there's two main strategies then, if you've agreed to take action about it, 0:47:12.000,0:47:14.000 about what you're going to do. 0:47:14.000,0:47:15.000 0:47:15.000,0:47:19.000 The first one is to wait until 0:47:19.000,0:47:21.000 the problem gets bad enough 0:47:21.000,0:47:23.000 that you decide to do something about it. 0:47:23.000,0:47:28.000 So you don't do any rebalancing as you go. You just let stuff go, left, right, left, 0:47:28.000,0:47:30.000 right, whatever's happening. And 0:47:30.000,0:47:31.000 0:47:31.000,0:47:34.000 you try to keep track of the height of your tree, 0:47:34.000,0:47:37.000 and you realize when the height of your tree seems sufficiently out of whack with 0:47:37.000,0:47:39.000 your expected logarithmic height, 0:47:39.000,0:47:42.000 to do something about it. Then you just fix the whole tree. 0:47:42.000,0:47:45.000 You can clean the whole thing up. Pull them all out into a vector, rearrange them, 0:47:45.000,0:47:49.000 put them back in, so the middlemost node is the root all the way down. And kind of 0:47:49.000,0:47:50.000 create the perfectly balanced tree, 0:47:50.000,0:47:52.000 and then keep going 0:47:52.000,0:47:54.000 and wait for it to get out of whack, and fix it again; 0:47:54.000,0:47:56.000 kind 0:47:56.000,0:47:58.000 of wait and see. 0:47:58.000,0:48:01.000 The other strategy, and this happens to be a more common one in 0:48:01.000,0:48:06.000 practice, is you just constantly do a little bit of fixing up all the time. 0:48:06.000,0:48:09.000 Whenever you notice that the tree is looking just a little bit lopsided, you let 0:48:09.000,0:48:12.000 it get a little bit; usually you'll let it be out of balance by one. 0:48:12.000,0:48:15.000 So you might have height four on that side and height three on that side. 0:48:15.000,0:48:17.000 But as soon as it starts to get to be three and five, 0:48:17.000,0:48:20.000 you do a little shuffle to bring them back to both four. And it 0:48:20.000,0:48:21.000 involves just a little bit of work 0:48:21.000,0:48:23.000 all the time 0:48:23.000,0:48:25.000 to where you just don't let 0:48:25.000,0:48:28.000 the tree turn into a real mess. 0:48:28.000,0:48:31.000 But there's a little bit of extra fixing up being done all the time. 0:48:31.000,0:48:35.000 And so the particular kind of tree that's discussed in the text, which you can read about, 0:48:35.000,0:48:37.000 but we won't cover 0:48:37.000,0:48:40.000 as part of our core material, is called the ADL tree. It's interesting to 0:48:40.000,0:48:43.000 read about it to see what does it take, what kind of 0:48:43.000,0:48:47.000 process is required to monitor this and do something about it. And 0:48:47.000,0:48:50.000 so if we have that in place, 0:48:50.000,0:48:51.000 0:48:51.000,0:48:53.000 we can get it to where 0:48:53.000,0:48:57.000 the binary search tree part of this is logarithmic on both 0:48:57.000,0:48:59.000 inserting new values and searching for values, 0:48:59.000,0:49:03.000 which is that Holy Grail of efficiency boundary. You can say, yeah logarithmic is 0:49:03.000,0:49:04.000 something we can tolerate 0:49:04.000,0:49:08.000 by virtue of adding 8 bytes of pointer, potentially some balance factor, 0:49:08.000,0:49:12.000 and some fanciness to guarantee that performance all over, 0:49:12.000,0:49:15.000 but gets us somewhere that will scale very, very well for thousands 0:49:15.000,0:49:18.000 and millions of entries, right, logarithmic is still a very, very fast 0:49:18.000,0:49:20.000 function. So it 0:49:20.000,0:49:24.000 creates a map that could really actually go the distance in scale. So, that's 0:49:24.000,0:49:28.000 your talk about binary search trees. We'll come back and talk about 0:49:28.000,0:49:38.000 hashing in graphs and stuff. I will see you on Wednesday.