Lecture 22 | Programming Abstractions (Stanford)
-
0:00 - 0:07
-
0:07 - 0:12
-
0:12 - 0:15This presentation is delivered by the Stanford Center for Professional
-
0:15 - 0:25Development.
-
0:25 - 0:27How are we doing?
-
0:27 - 0:28Spring apparently is here;
-
0:28 - 0:33just like California, one day it's winter, one day it's spring. Hopefully it's here to say.
-
0:33 - 0:38So we had talked about doing it, we did this last time, we did an unsorted vector
-
0:38 - 0:40implementation of the map, we just did a little struct of
-
0:40 - 0:42the key value pair.
-
0:42 - 0:44And then we went through
-
0:44 - 0:48that both add and get value are running at linear time in that form because of the need
-
0:48 - 0:52to search; search to find a value to override it, and search to find a value to
-
0:52 - 0:54pull out the existing
-
0:54 - 0:55match to it.
-
0:55 - 0:58And if we went to the sorted form of that, we could get get value to be able to
-
0:58 - 1:02capitalize on that binary search property and be able to quickly find the
-
1:02 - 1:06key if it exists in there or to determine it doesn't exist.
-
1:06 - 1:09But the add still suffered from the shuffling problem. We can quickly
-
1:09 - 1:12find, using the binary search to find where the
-
1:12 - 1:15new element would need to go. But if it doesn't already go there, it doesn't
-
1:15 - 1:17already exist. We're going to have to shuffle to make that space.
-
1:17 - 1:21So we have still a linear factor in there that we're working on getting rid of.
-
1:21 - 1:24And then I know at the bottom that none of them have any
-
1:24 - 1:27built in overhead per element. There may be some capacity, excess capacity in the
-
1:27 - 1:30vector that we're absorbing, but there's no
-
1:30 - 1:33four, ten, eight-byte chunk that every
-
1:33 - 1:36element is taking as a cost.
-
1:36 - 1:39So we started to talk about this at the end on Friday, and we're going to go with this and
-
1:39 - 1:41see where this takes
-
1:41 - 1:45us. We had the idea of the vector, and we know what vector constraints are, right? This
-
1:45 - 1:48idea of the contiguous memory, and we were going to think about the link list for a minute to
-
1:48 - 1:51see if it was going to buy us anything.
-
1:51 - 1:54It tends to have an inverse relationship, that once you know where you
-
1:54 - 1:57want to insert something it's easy to do that rearrangement because it's just a
-
1:57 - 2:01little bit of pointer rewiring that needs to happen in that situation.
-
2:01 - 2:03But it's hard to find a position insert,
-
2:03 - 2:06even if the link list is in sorted order,
-
2:06 - 2:10knowing where the S's are, or the M's are, or the B's are is still a
-
2:10 - 2:13matter of starting from the beginning; that your only access in the
-
2:13 - 2:16singly linked list is to start at the beginning and work your way down.
-
2:16 - 2:20And so I'm going to take you through a little bit of a thought exercise about well, what if
-
2:20 - 2:23we try to do something about that? Right now having a pointer coming into
-
2:23 - 2:26Bashful and then subsequent ones going in sorted order
-
2:26 - 2:27isn't really very helpful.
-
2:27 - 2:30But if we were willing to try to rearrange our pointers
-
2:30 - 2:33to give us the access that we have in the
-
2:33 - 2:36sorted array form; for example, if I had a pointer
-
2:36 - 2:39to the middlemost element, in
-
2:39 - 2:42this case the Grumpy that's
-
2:42 - 2:43midway down the list, if I had
-
2:43 - 2:45that pointer
-
2:45 - 2:49and from there I could say, is the element I'm looking for
-
2:49 - 2:53ahead of Grumpy or behind Grumpy. So having Grumpy split my input in
-
2:53 - 2:54half.
-
2:54 - 2:56If I had that, right, then that would actually get me
-
2:56 - 2:59sort of moving towards that direction that I wanted to get, which is using the binary
-
2:59 - 3:03search property to facilitate throwing away half the data
-
3:03 - 3:06and look into just the half remaining that's interesting.
-
3:06 - 3:09Well, if I had a pointer to Grumpy, that doesn't quite solve my problem. A pointer to Grumpy
-
3:09 - 3:12could get me halfway down the list, but then the pointer's coming in this way. If I
-
3:12 - 3:16inverted the pointers leading away from it,
-
3:16 - 3:17that's also kind of helping
-
3:17 - 3:20to facilitate what I'm trying to do. So if I could point toward the middle
-
3:20 - 3:24and then kind of move away, and then if I actually kind of applied the same strategy not just at the topmost
-
3:24 - 3:28level, but at every subsequent division down, from the half, to the
-
3:28 - 3:29quarter to the eight,
-
3:29 - 3:32to the sixteenth. And instead of having a pointer
-
3:32 - 3:34to the front of the list, what I really maintained was a pointer to the
-
3:34 - 3:36middlemost element, and then
-
3:36 - 3:38that element maintained pointers
-
3:38 - 3:40to the middle of the
-
3:40 - 3:44upper quarter and the middle of the lower quarter, and all the way down.
-
3:44 - 3:46Then I could pull up this list
-
3:46 - 3:50into something that we call a binary search tree.
-
3:50 - 3:54So the way I would be storing this is having a pointer to a node, which
-
3:54 - 3:58is expected to the median or the middle, or somewhere right along
-
3:58 - 3:59the
-
3:59 - 4:00values being stored.
-
4:00 - 4:04And that it has two pointers, not just a next or previous or something like that, that are related to all
-
4:04 - 4:08the elements that precede this one in the ordering, and all the elements that
-
4:08 - 4:09follow it are in
-
4:09 - 4:11these two pointers.
-
4:11 - 4:14They call these subtrees coming off the smaller trees over here with three
-
4:14 - 4:17elements on this side, and three elements on that side.
-
4:17 - 4:19And that each of those nodes
-
4:19 - 4:22has recursively the same formulation; that it has two
-
4:22 - 4:25subtree hanging off of it
-
4:25 - 4:27that contain,
-
4:27 - 4:32given any particular node, that there's this binary search ordering here that says
-
4:32 - 4:36everything that's in that left subtree precedes this node in ordering.
-
4:36 - 4:38And everything that's in that right subtree
-
4:38 - 4:39follows it.
-
4:39 - 4:44And that's true for every node throughout the entire arrangement here.
-
4:44 - 4:47So this is called a tree, and
-
4:47 - 4:50this particular form is actually called the binary search tree. I'm
-
4:50 - 4:53going to throw a little bit of vocabulary at you, because I'm going to start using these words, and I just want to get
-
4:53 - 4:54them out there
-
4:54 - 4:58early so that you understand what those words mean when I start using them.
-
4:58 - 5:01Is that each of the cells in here,
-
5:01 - 5:02like the cells that are
-
5:02 - 5:04allocated for a link list,
-
5:04 - 5:07are individually heap allocated, we're going to have pointers that allow us to get back
-
5:07 - 5:08to
-
5:08 - 5:12them. We typically call those nodes; in terms of tree I'll call it a cell or node in the list kind
-
5:12 - 5:16of interchangeably. I'm more likely to use node when I'm talking about tree.
-
5:16 - 5:19And a tree is really just a pointer to a node.
-
5:19 - 5:21The empty tree
-
5:21 - 5:24would be a pointer whose value is null, so pointing at nothing.
-
5:24 - 5:27A singleton tree would be a pointer to a single node
-
5:27 - 5:29that has left and right subtrees
-
5:29 - 5:31that are both empty or both null.
-
5:31 - 5:35So Bashful by itself is a singleton, a pointer to
-
5:35 - 5:38Bashful. When we talk about subtrees, it just kind of means looking at the top; Grumpy has two
-
5:38 - 5:40subtrees, a left and a right,
-
5:40 - 5:41
-
5:41 - 5:42that
-
5:42 - 5:45themselves are just smaller forms of the same recursive structure.
-
5:45 - 5:49And then we would say that Grumpy is the parent of Doc and Sleepy.
-
5:49 - 5:52So those are the two children nodes, and we'll call them the left child and the right
-
5:52 - 5:53child.
-
5:53 - 5:56And the node that's at the very top,
-
5:56 - 6:00the one that our external pointer is coming into and then kind of leads the way, is called
-
6:00 - 6:02the root node. The
-
6:02 - 6:02first node
-
6:02 - 6:05of the tree where we first start our work is going to be called the root.
-
6:05 - 6:09And these nodes down here at the bottom that have empty subtrees are called leaf nodes.
-
6:09 - 6:14So a node that has no further subtrees, so by itself has null
-
6:14 - 6:16left and right trees, is called a leaf.
-
6:16 - 6:18So that actually means the tree is kind of upside down
-
6:18 - 6:21if you look at it. That Grumpy's the root and these things are the
-
6:21 - 6:25leaves shows you that apparently most computer scientists don't get out very much,
-
6:25 - 6:28because they haven't noticed that trees actually grow the other direction.
-
6:28 - 6:29
-
6:29 - 6:32But just kind of go with it, and
-
6:32 - 6:33you'll be
-
6:33 - 6:36speaking tree-speak in no time.
-
6:36 - 6:43Anybody have any questions about what we've got basically set up here?
-
6:43 - 6:45So there is a
-
6:45 - 6:48whole family of things that come under the heading of tree in computer
-
6:48 - 6:52science. We're going to look in particular at this one embodiment, the binary search
-
6:52 - 6:53tree.
-
6:53 - 6:56But before I get deep down in the details of that, I just want to back
-
6:56 - 6:59up for a second and just talk about what trees in general mean, and what kind of
-
6:59 - 7:01things in general are represented in manipulative trees.
-
7:01 - 7:06So that you have a sense of what is the broad notion of tree as a
-
7:06 - 7:08computer scientist would see it,
-
7:08 - 7:10is just that a tree is a recursive branching structure
-
7:10 - 7:14where there are nodes, which have pointers to subtrees.
-
7:14 - 7:17There may be one of those subtrees. There may be ten of those subtrees. There
-
7:17 - 7:20may be two; depending on the nature of the tree you're working on there may be a
-
7:20 - 7:22constraint on exactly how many children they have
-
7:22 - 7:26or some flexibility in the number of children they have
-
7:26 - 7:27
-
7:27 - 7:29depending on what's being modeled here.
-
7:29 - 7:31There's always a single root node.
-
7:31 - 7:35There's a node that kind of sits at the top of this hierarchy that leads down to
-
7:35 - 7:36these things,
-
7:36 - 7:41and that to form a tree there has to be a reachable path from the root
-
7:41 - 7:43to every single node.
-
7:43 - 7:46So there aren't some nodes that exist way out here, and there aren't two different ways to
-
7:46 - 7:48get to the same
-
7:48 - 7:50node in a tree; that if the
-
7:50 - 7:53midterm is kept in the folder called exams, which is in the folder
-
7:53 - 7:56called cs106, which is in my home folder. There's not some other way that
-
7:56 - 7:59you can get to the midterm; the midterm lives exactly that part in the tree,
-
7:59 - 8:01and there's no
-
8:01 - 8:04circularity or loops in the structure.
-
8:04 - 8:07And so a lot of things do get modeled tree, the one I've drawn here is just the file system.
-
8:07 - 8:10Think about how your files are maintained on a computer, there tends to be a
-
8:10 - 8:11structure of an
-
8:11 - 8:14outermost folder, which has inner folders,
-
8:14 - 8:17which have inner folders. And along the way there are these files, which you can think of as leaf
-
8:17 - 8:21nodes that have no further subtrees underneath them.
-
8:21 - 8:24And you can talk about, say, what is the size of
-
8:24 - 8:27all the folders in my home directory? Well it would kind
-
8:27 - 8:28of be recursively defined as summing over
-
8:28 - 8:30all the folders within it,
-
8:30 - 8:33how much storage is being used by those; which when it gets down to a leaf node can say, how
-
8:33 - 8:38much space does this file take up and kind of add those together.
-
8:38 - 8:42Things like your family trees, right, sort of modeling genealogy.
-
8:42 - 8:44Things like the decomposition tree in your program;
-
8:44 - 8:47main calling,
-
8:47 - 8:50read file calling, set up boggle board calling,
-
8:50 - 8:52give instructions.
-
8:52 - 8:56There is a tree structure to that, about what pieces use what pieces and
-
8:56 - 8:57descendants from there.
-
8:57 - 9:01The ones that we're going to look at in particularly are the ones that are in the family of the binary trees. So a binary
-
9:01 - 9:05tree has two children, exactly two children. There's a left and a right
-
9:05 - 9:07pointer coming out of each node.
-
9:07 - 9:11One or both of those can be null, but there can never be more than two
-
9:11 - 9:11children.
-
9:11 - 9:15And there is sort of space for recording two children, as opposed to recording things which
-
9:15 - 9:17are trinary, which have three
-
9:17 - 9:20possibilities for children, or where there may be some four or eight or
-
9:20 - 9:22some other number there.
-
9:22 - 9:24And in particular it's the search tree
-
9:24 - 9:26that's going to help us get
-
9:26 - 9:28the work done that we want, which is
-
9:28 - 9:32there's not just any old arrangement of what's on the left, and what's on the right side.
-
9:32 - 9:35That the nodes are going to be organized in here to facilitate binary search by
-
9:35 - 9:36keeping
-
9:36 - 9:37one half
-
9:37 - 9:40over on the left and one half over on the right, and that we'll know which
-
9:40 - 9:44half something goes in by virtue of looking at the key. So
-
9:44 - 9:47let's take a look at a simple example of a binary search tree with
-
9:47 - 9:49numbers. So
-
9:49 - 9:53if I have a struct node that keeps track of an integer value and has two pointers, the
-
9:53 - 9:55left and the right,
-
9:55 - 9:59that point away from it. And in this case my root node is 57,
-
9:59 - 10:03and looking at 57 I can tell you that every value in this tree
-
10:03 - 10:07that is less than 57 will be in the left subtree. There will be no values
-
10:07 - 10:09that are 56 or lower that are on the
-
10:09 - 10:10right side.
-
10:10 - 10:14And that will be true at every step down the tree. So if I'm looking for
-
10:14 - 10:15something
-
10:15 - 10:18that actually gives me that immediate access we were using the vector's
-
10:18 - 10:19sorting property for. Which is if
-
10:19 - 10:24I'm looking for the number 70, and I start at the root node of 57, then I know
-
10:24 - 10:25it's not in the left subtree,
-
10:25 - 10:28so assuming that about half the nodes are
-
10:28 - 10:30pointed to over on that side,
-
10:30 - 10:34I've now discarded half of the input even from consideration. If
-
10:34 - 10:36I start at the 70,
-
10:36 - 10:39and I say okay well if 57 must be over here, then again recursively apply the
-
10:39 - 10:44same search property here. It's like if I'm looking for 70, is it going to be
-
10:44 - 10:45before or after 79?
-
10:45 - 10:49It's going to be before, so in fact I throw away anything that leads away on
-
10:49 - 10:52the right subtree of 79, and work my way one level down.
-
10:52 - 10:53So each step
-
10:53 - 10:58in the recursive search is represented by sort of taking a step down
-
10:58 - 11:02in that search tree; working my way down from the root to the leaf nodes. And
-
11:02 - 11:05I either will find the value I'm looking for along the way,
-
11:05 - 11:08or I will fall off the bottom of that tree
-
11:08 - 11:10when I haven't been able to find it.
-
11:10 - 11:13And that will tell me; oh it couldn't have been in the tree, because the only place
-
11:13 - 11:16it could have been is the one that followed the binary search property.
-
11:16 - 11:19And so if I look for 70, I say is it greater or less than
-
11:19 - 11:2162? It's greater. Is it
-
11:21 - 11:24greater or less than 71? It's less then. And then I work off there I get to the empty
-
11:24 - 11:26tree. Then I say well if there was a 60, that's where it was
-
11:26 - 11:31going to be, and it wasn't there so there much not be a 70 in this tree.
-
11:31 - 11:32So it is exactly designed
-
11:32 - 11:36to support one kind of search, and one kind of search only, which is this
-
11:36 - 11:40efficient divide and conquer binary search, work its way down.
-
11:40 - 11:40
-
11:40 - 11:42The other kinds of things, let's say, that
-
11:42 - 11:44trees
-
11:44 - 11:45are used for
-
11:45 - 11:48may not need this kind of support, so they may not go out of their way to do this. But the one
-
11:48 - 11:52we're actually very interested in is because of maps doing a lot of
-
11:52 - 11:55additions into a sorted facility and
-
11:55 - 11:57searches in that sort of facility. If we can
-
11:57 - 12:00bottle it as a binary search tree, we're able to get the best of both worlds. To
-
12:00 - 12:03have that super fast logarithmic search,
-
12:03 - 12:07as well as have the flexibility of the pointer based structure
-
12:07 - 12:11to help us do an insert. So if I'm ready to put a 70 into this tree,
-
12:11 - 12:14the same path that led to me to where it should be
-
12:14 - 12:16tells us where to put that new node
-
12:16 - 12:20in place. And if all I need to do is to wire that in place; and if all I need to do to wire that in place is to kind of create a new node in the
-
12:20 - 12:21heap and
-
12:21 - 12:24put the pointers coming in and out of it the way that I would in a link list,
-
12:24 - 12:27then I should be able to do both searches and inserts
-
12:27 - 12:30in time proportional to the height of the tree,
-
12:30 - 12:35which for a relatively balanced tree would be logarithmic. So
-
12:35 - 12:37it sounds like we can get everything we wanted here
-
12:37 - 12:40with a little bit of work. So
-
12:40 - 12:43let me tell you - just do a little bit of playing with trees, just to kind of get used to thinking
-
12:43 - 12:46about how we do stuff on them, and then I'm going to go forward with
-
12:46 - 12:49implementing map using a binary search tree.
-
12:49 - 12:52Trees are very, very recursive. So for those of you who still feel like
-
12:52 - 12:54recursion is a little bit hazy,
-
12:54 - 12:58this is another great area to strengthen your thinking about
-
12:58 - 13:01recursive. In fact, I think a lot of people find operating on trees very, very
-
13:01 - 13:03natural. Because trees are so recursive, the
-
13:03 - 13:07idea that you write recursive algorithms to them is almost like you do it without even thinking, you don't
-
13:07 - 13:08have to think hard. You're like,
-
13:08 - 13:11you typically need to do something to your current node,
-
13:11 - 13:14and then recursively operate on your left and your right.
-
13:14 - 13:17And it's so natural that you don't even have stop and think hard about
-
13:17 - 13:18what that means. So
-
13:18 - 13:21your base case tends to be getting to an empty tree and having a
-
13:21 - 13:23pointer that points to null,
-
13:23 - 13:28and in the other cases have some node operate on your children.
-
13:28 - 13:31Some of the simplest things are just traversals.
-
13:31 - 13:33If I tree structure and I'd like to
-
13:33 - 13:35do something to every node,
-
13:35 - 13:37go through it and print them all,
-
13:37 - 13:39go through it and write them to a file,
-
13:39 - 13:42go through and add them all up to determine the average score, I
-
13:42 - 13:46need these things. I'm going to need to do some processing that visits every node of the
-
13:46 - 13:48tree once and exactly once,
-
13:48 - 13:50and recursion is exactly the way to do that.
-
13:50 - 13:53And there's a little bit of some choices, and when you're handling a
-
13:53 - 13:55particular part of a tree, are you handling the current node and then
-
13:55 - 13:59recurring on the left and right subtrees? Are you recurring first and then handling
-
13:59 - 14:02the nodes. So some of those things that we saw in terms of the link list
-
14:02 - 14:03that
-
14:03 - 14:06change the order of the processing, but they don't change
-
14:06 - 14:08whether you visit everybody. It's just a matter of which one you want to do
-
14:08 - 14:09before.
-
14:09 - 14:12I'm going to look at a couple of the simple traversals here,
-
14:12 - 14:14and I'll point out
-
14:14 - 14:16why that mattered.
-
14:16 - 14:18If we look at in order,
-
14:18 - 14:22so we'll consider that an in order traversal is for you to operate on your
-
14:22 - 14:24left subtree recursively,
-
14:24 - 14:27handle the current node right here, and then operate on your right subtree
-
14:27 - 14:29recursively.
-
14:29 - 14:31The base case, which is
-
14:31 - 14:34kind of hidden in here, is if T is null, we get to an empty subtree we
-
14:34 - 14:37don't do anything. So only in the nonempty case are we actually
-
14:37 - 14:39processing and making recursive calls.
-
14:39 - 14:43So if I start with my node 57 being the root of my tree, and I say
-
14:43 - 14:45print that tree starting from the root.
-
14:45 - 14:48It will print everything out of the left subtree,
-
14:48 - 14:52then print the 57, and then print everything out of the right subtree.
-
14:52 - 14:55And given that gets applied everywhere down,
-
14:55 - 14:58the effect will be 57 says first print my left subtree.
-
14:58 - 15:01And 37 says, well print my left subtree. And 15 says, well first
-
15:01 - 15:03print my left subtree.
-
15:03 - 15:06And then seven says, first print my left subtree. Well, that ones empty,
-
15:06 - 15:09and so nothing gets printed.
-
15:09 - 15:13As the recursion unwinds, it comes back to the last most call. The last most call was
-
15:13 - 15:17seven. It says, okay, print seven and now print seven's right subtree. Again that's an empty,
-
15:17 - 15:19so nothing gets printed there. It
-
15:19 - 15:22will come back and then do the same thing with 15. I've done everything
-
15:22 - 15:23to the left of 15;
-
15:23 - 15:26let's print 15 and its right.
-
15:26 - 15:29And doing that kind of all the way throughout the tree means that we will get the numbers out
-
15:29 - 15:31in sorted order;
-
15:31 - 15:34all right, five, 17, seven,
-
15:34 - 15:3715, 32, 57, 59, 62, 71,
-
15:37 - 15:4079, 93. An in order traversal
-
15:40 - 15:42in a binary search tree
-
15:42 - 15:50rediscovers, visits all the nodes from smallest to largest. [Inaudible]
-
15:50 - 15:54So duplicates, you just have to have a strategy. It turns out in the map, the duplicates
-
15:54 - 15:54overwrite.
-
15:54 - 15:58So any time you get a new value you print there. Typically you just need to decide,
-
15:58 - 16:00do they go to the left or do they go to the right?
-
16:00 - 16:03So that you know when you're equal to if you wanted to find another one
-
16:03 - 16:04of these, where would it go?
-
16:04 - 16:07In a lot of situations you just don't have duplicates, so it's not actually a big deal.
-
16:07 - 16:10You can't throw them arbitrarily on either side because you would end up having to search
-
16:10 - 16:14both sides to find them. So you want to just know that's it's less than or equal to on the
-
16:14 - 16:18left and greater than on the right. So that's
-
16:18 - 16:22pretty neat, right, that that means that this
-
16:22 - 16:22
-
16:22 - 16:25for purposes of being able to iterate for something, if
-
16:25 - 16:26you were to walk
-
16:26 - 16:30the structure in order, then you can produce them back out in sorted order, which tends to be
-
16:30 - 16:31a convenient way
-
16:31 - 16:33
-
16:33 - 16:36to print the output. There are a lot of situations where you do want that browsable
-
16:36 - 16:43list in alphanumerical order. And the tree makes that easy to do.
-
16:43 - 16:45When it comes time to delete the tree, I'm
-
16:45 - 16:48going to need to go through and delete each of those nodes individually, the
-
16:48 - 16:51same way I would on a link list to delete the whole structure.
-
16:51 - 16:54The
-
16:54 - 16:57order of traversal that's going to be most convenient for doing that is going to be to do it
-
16:57 - 16:58post order.
-
16:58 - 17:00Which is to say, when I'm looking at the root node 57,
-
17:00 - 17:05I want to delete my entire left subtree recursively, delete my entire right
-
17:05 - 17:10subtree recursively, and only after both of those pieces have been cleaned up,
-
17:10 - 17:12delete the node we're on;
-
17:12 - 17:14so deleting down below to the left, down below to the right
-
17:14 - 17:17and then delete this one. If I did it in the other order, if
-
17:17 - 17:21I tried to delete in between in the in order or in the pre order case where I did
-
17:21 - 17:23my node and then to my subtrees,
-
17:23 - 17:26I'd be reaching into parts of my memory I've deleted.
-
17:26 - 17:31So we do want to do it in that order, so
-
17:31 - 17:32recursion
-
17:32 - 17:35coming back to visit you again.
-
17:35 - 17:42Let's talk about how to make map work as a tree.
-
17:42 - 17:46So if you think about what maps hold, maps hold a key, which is string type; a
-
17:46 - 17:48value, which is client
-
17:48 - 17:51some sort of template parameter there.
-
17:51 - 17:55Now that I've built a structure that raps those two things with a pair and
-
17:55 - 17:58adds to it, right these two additional pointers to the left and to the right
-
17:58 - 18:01subtrees that point back to a recursive struct,
-
18:01 - 18:05the one data member that the map will have at the top is just the root
-
18:05 - 18:09pointer. The pointer to the node at the top of the tree,
-
18:09 - 18:12and then from there we'll traverse pointers to find our way to other ones.
-
18:12 - 18:15And we will organize that tree for the binary search property.
-
18:15 - 18:16So using the
-
18:16 - 18:20key as the discriminate of what goes left and right,
-
18:20 - 18:25we'll compare each new key to the key at the root and decide is it going to go in the left subtree
-
18:25 - 18:27root or the right subtree root? And so on down to the
-
18:27 - 18:31matching position when we're doing a find, or to the place to insert if we're ready to
-
18:31 - 18:33put a new node in.
-
18:33 - 18:36So get value and add will have very similar structures, in terms of
-
18:36 - 18:38searching that tree
-
18:38 - 18:39
-
18:39 - 18:40from the root
-
18:40 - 18:45down to the leaf to find that match along the way, or to
-
18:45 - 18:49report where to put a new one in. So
-
18:49 - 18:52let's look at the actual code; it has
-
18:52 - 18:54my struct, it
-
18:54 - 18:57has the strike key, it has the value type node,
-
18:57 - 19:00two pointers there, and then the one data member for the map
-
19:00 - 19:03itself here is the pointer to the root node. I
-
19:03 - 19:09also put a couple of helper number functions that I'm going to need.
-
19:09 - 19:10
-
19:10 - 19:13Searching the tree and entering a new node in the tree are going to require a little
-
19:13 - 19:18helper, and we're going to see why that is. That basically that the add and get value public
-
19:18 - 19:20number functions are going to be just wrappers
-
19:20 - 19:23that turn and pass the code off to these recursive helpers.
-
19:23 - 19:26And if these cases, the additional information that each of these is
-
19:26 - 19:28tracking is what node we're at
-
19:28 - 19:32as we work our way down the tree. But the initial call to add is just going to say, well
-
19:32 - 19:34here's a key to value go put it in the right place.
-
19:34 - 19:37And that while we're doing the recursion we need to know where we are in the tree. So
-
19:37 - 19:39we're adding that extra parameter.
-
19:39 - 19:43What a lot of recursive code needs is just a little more housekeeping, and the
-
19:43 - 19:44housekeeping is
-
19:44 - 19:51what part of the tree we're currently searching or trying to enter into.
-
19:51 - 19:54So this is the outer call for get value.
-
19:54 - 19:57So get value is really just a wrapper,
-
19:57 - 19:59it's going to call tree search. It's going to
-
19:59 - 20:03ask the starting value for root. It's the
-
20:03 - 20:04beginning point for the search,
-
20:04 - 20:07looking for a particular key. And what tree searches are going to return is a
-
20:07 - 20:10pointer to a node. If
-
20:10 - 20:14that node was not found, if there was no node that has a string
-
20:14 - 20:17key that matches the one that we're looking for, then we'll return null. Otherwise, we'll return
-
20:17 - 20:18the node.
-
20:18 - 20:20And so it either will error
-
20:20 - 20:23when it didn't find it or
-
20:23 - 20:25pull out the value that was out of
-
20:25 - 20:25that
-
20:25 - 20:27node structure.
-
20:27 - 20:30Well, this part looks pretty okay. The template, right, there's always just a
-
20:30 - 20:33little bit of goo up there, right, seeing kind of
-
20:33 - 20:36the template header on the top and the use of the val type, and all this other stuff.
-
20:36 - 20:39It's going to get a little bit worse here. So just kind of
-
20:39 - 20:42hopefully take a deep breath and say, okay that part actually
-
20:42 - 20:46doesn't scare me too much. We're going to look at what happens when I write the tree search code, where
-
20:46 - 20:48it really actually goes off and does the search down the tree. And there's going to be
-
20:48 - 20:49
-
20:49 - 20:52a little bit more machinations required on this. So let's just talk about it first,
-
20:52 - 20:55just what the code does. And then I'm going to come back and talk about some of the syntax that's
-
20:55 - 20:58required to make this work correctly.
-
20:58 - 21:02So the idea of tree search is, given a pointer to a particular
-
21:02 - 21:02subtree, it
-
21:02 - 21:05could be the root of the original tree or some
-
21:05 - 21:07smaller subtree further down, and a key that we're looking for, we're
-
21:07 - 21:10going to return a match when we find that node.
-
21:10 - 21:15So looking at the step here, where if the key that we have matches the key of the node
-
21:15 - 21:17we're looking at, then this is the node.
-
21:17 - 21:19This is the one we wanted.
-
21:19 - 21:23Otherwise, right, we look at two cases. If the key is less than this node's key,
-
21:23 - 21:26then we're just going to return the result from searching in the left subtree.
-
21:26 - 21:31Otherwise it must be greater than, and we're going to search in the right subtree.
-
21:31 - 21:34So this case, right, duplicates aren't allowed, so there will be exactly one match, if
-
21:34 - 21:37at all.
-
21:37 - 21:39And when we find it we return.
-
21:39 - 21:43The other case that needs to be handled as part of base case is if what if we ever get to an
-
21:43 - 21:44empty tree.
-
21:44 - 21:47So we're looking for 32 and the last node we just passed was
-
21:47 - 21:5035. We needed to be the left of it, and it turns out there was nothing down there, we
-
21:50 - 21:52have an empty tree.
-
21:52 - 21:54We'll hit this case where T is null, and we'll return null. That means there was never
-
21:54 - 21:58a match to that. There's no other place in the tree
-
21:58 - 22:01where that 32 could have been. The binary search property completely dictates the
-
22:01 - 22:04series of left and right turns. There's nowhere else to look.
-
22:04 - 22:06So that's actually, really solving
-
22:06 - 22:07
-
22:07 - 22:11a lot of our - cleaning up our work for us, streamlining where we need to go;
-
22:11 - 22:14because the tree is organized through the
-
22:14 - 22:16mid point and the quarter point and then the eighth point
-
22:16 - 22:19to narrow down to where we could go. So as
-
22:19 - 22:21it is, this code works correctly.
-
22:21 - 22:25So do you have any questions about how it's thinking about doing its work?
-
22:25 - 22:30And then I'm going to show you why the syntax is a little bit wrong though.
-
22:30 - 22:32Okay, so let's talk about this
-
22:32 - 22:34return type coming out of
-
22:34 - 22:36the function.
-
22:36 - 22:38So let's go back to this for a second.
-
22:38 - 22:41That node is defined
-
22:41 - 22:43as a private member
-
22:43 - 22:46of the map class.
-
22:46 - 22:48So node is actually not a global type.
-
22:48 - 22:53It does not exist outside the map class. Its real name is
-
22:53 - 22:56
-
22:56 - 22:58map
-
22:58 - 23:00val type::
-
23:00 - 23:03node.
-
23:03 - 23:05That's its full name,
-
23:05 - 23:08is that. Now,
-
23:08 - 23:08inside
-
23:08 - 23:10the implementation of map,
-
23:10 - 23:14which is to say inside the class map, or inside member functions that we're
-
23:14 - 23:17defining later, you'll not that we can just kind of drop
-
23:17 - 23:21the big full name. The kind of my name is mister such and such, and such and
-
23:21 - 23:26such; it's like okay, you can just call it node, the short name internal to the class
-
23:26 - 23:28because it's inside that scope. And
-
23:28 - 23:31it's able to say, okay in this scope is there something called node? Okay there is. Okay, that's the
-
23:31 - 23:34one. Outside of that scope, though, if you were trying to use that outside of the
-
23:34 - 23:37class or some other places, it's going to
-
23:37 - 23:40have a little bit more trouble with this idea of node by itself. It's going to need to see
-
23:40 - 23:42that full name. Let
-
23:42 - 23:44me show you on this piece of code,
-
23:44 - 23:46that node here
-
23:46 - 23:51happens to be in the slight quirk of the way sequel plus defines things, that that
-
23:51 - 23:56use of node is not considered in class scope yet.
-
23:56 - 23:57That
-
23:57 - 24:01the compiler leads from left to write, so we'll look at the first one. It says, template
-
24:01 - 24:04type name val type, and so what the compiler sees so far is, I'm about to
-
24:04 - 24:07see something that's templated on val type.
-
24:07 - 24:10So it's a pattern for which val type is a placeholder in.
-
24:10 - 24:12And then it sees val type
-
24:12 - 24:16as the return value, and it says, okay, that's the placeholder. And then it sees this map :: and
-
24:16 - 24:18
-
24:18 - 24:21it says, okay, this is within the - the thing
-
24:21 - 24:26that I'm defining here is within scope of the map class. And so it sees the map
-
24:26 - 24:26::
-
24:26 - 24:29and that as soon as it crosses those double colons,
-
24:29 - 24:31from that point forward
-
24:31 - 24:33it's inside the map scope.
-
24:33 - 24:34And so,
-
24:34 - 24:37when I use things like tree search, or I use the name node, it knows what I'm
-
24:37 - 24:40talking about. It says, oh there's a tree search in map class. There's a node struct in map
-
24:40 - 24:43class, it's all good.
-
24:43 - 24:47In the case of this one I've seen template type and val type, and
-
24:47 - 24:50at this point it doesn't know anything yet.
-
24:50 - 24:53And so it sees me using node, and it say,
-
24:53 - 24:54node I've
-
24:54 - 24:55never heard of node.
-
24:55 - 24:58I'm in the middle of making some template based on val type, but I haven't heard
-
24:58 - 25:01anything about node. Node isn't something that exists in the global name space.
-
25:01 - 25:03You must have made some
-
25:03 - 25:04tragic error.
-
25:04 - 25:08What it wants you to say is
-
25:08 - 25:09to give it the full name
-
25:09 - 25:12of map :: node.
-
25:12 - 25:14That the real full
-
25:14 - 25:16this is my name,
-
25:16 - 25:21and otherwise it will be quite confused.
-
25:21 - 25:24If that weren't enough,
-
25:24 - 25:26we'd like to think that we were done
-
25:26 - 25:27with
-
25:27 - 25:30placating the C compiler. But it turns out there's one other little
-
25:30 - 25:32thing that has to happen here,
-
25:32 - 25:34which is there is an obscure
-
25:34 - 25:38reason why, even though I've given it the full name, and I feel like I've done my
-
25:38 - 25:40duty in describing this thing,
-
25:40 - 25:43that the compiler still is a little bit confused
-
25:43 - 25:45by this usage.
-
25:45 - 25:48And because the C language is just immensely complicated,
-
25:48 - 25:50there are a lot of funny interactions,
-
25:50 - 25:52there are a couple of situations it can get in to,
-
25:52 - 25:53where
-
25:53 - 25:56because it's trying hard to read left to right and see the things.
-
25:56 - 25:59Sometimes it needs to know some things in advance that it doesn't have all the
-
25:59 - 26:00information for.
-
26:00 - 26:03And it turns out in this case there's actually some obscure situation where
-
26:03 - 26:06it might not realize this is really a type.
-
26:06 - 26:08It might think it was
-
26:08 - 26:10more of a global constant
-
26:10 - 26:14that we actually have to use the type name key word again here
-
26:14 - 26:16on this one
-
26:16 - 26:17construction.
-
26:17 - 26:20It is basically a hint to the compiler
-
26:20 - 26:22that says, the next thing coming up
-
26:22 - 26:25really is a type name.
-
26:25 - 26:28So given that you can't tell that, you'd think it would be obvious; in
-
26:28 - 26:31this case it doesn't seem like there's a lot for it to be confused at, but there's
-
26:31 - 26:33actually some other constructions that we can't rule out.
-
26:33 - 26:37So the compiler is a little bit confused in this situation, and it wants you to
-
26:37 - 26:38tell it in advance,
-
26:38 - 26:41and so I'll tell you that the basic rules for when you're going to need to use this is
-
26:41 - 26:45exactly in this and only this situation.
-
26:45 - 26:48And I'll tell you what the things are that are required for type name to
-
26:48 - 26:50become part of your
-
26:50 - 26:51repertoire.
-
26:51 - 26:53You have a return type
-
26:53 - 26:54on a member function
-
26:54 - 26:58that is defined as part of a class that is a template,
-
26:58 - 27:00and that type that you are using
-
27:00 - 27:03is defined within the scope of that template.
-
27:03 - 27:07So it's trying to use something that's within a template class scope
-
27:07 - 27:08outside of that scope
-
27:08 - 27:09
-
27:09 - 27:11that requires this type name.
-
27:11 - 27:14And the only place where we see things used out of their scope is on the
-
27:14 - 27:15return value.
-
27:15 - 27:19So this is kind of the one
-
27:19 - 27:21configuration that causes you to have to do this.
-
27:21 - 27:22
-
27:22 - 27:25It's really just, you know, a little bit of a C quirk.
-
27:25 - 27:28It doesn't change anything about what the code's doing or anything interesting about binary
-
27:28 - 27:31search trees, but it will come up in any situation where you have this helper member
-
27:31 - 27:34function that wants to return something that's templated. You
-
27:34 - 27:35have
-
27:35 - 27:37to placate the compiler there.
-
27:37 - 27:39The error message when you don't do it, actually turns out to be very good in
-
27:39 - 27:45GCC and very terrible in visual studio. GCC says type name required here,
-
27:45 - 27:47and visual studio says something completely
-
27:47 - 27:48mystical.
-
27:48 - 27:49So it
-
27:49 - 27:52is something to be just a little bit careful about when you're trying to write
-
27:52 - 27:56that kind of
-
27:56 - 27:58code. Question? The asterisk? The asterisk is just because it returns a pointer.
-
27:58 - 28:00
-
28:00 - 28:03In all the situations, I'm always returning a pointer to the node back
-
28:03 - 28:04there.
-
28:04 - 28:16So here's the pointer to the match I found or node when I didn't find one. All right, that's a little bit of grimy code. Let's keep going.
-
28:16 - 28:17So how is it
-
28:17 - 28:18that add
-
28:18 - 28:22works? Well, it starts like get value, that
-
28:22 - 28:23the
-
28:23 - 28:27strategy we're going to use for inserting into the tree is the really simple one that say,
-
28:27 - 28:30well don't rearrange anything that's there. For example, if I were trying to put
-
28:30 - 28:32the number 54 into this tree,
-
28:32 - 28:33you could say - well
-
28:33 - 28:36you could imagine putting it at the root and moving things around. Imagine
-
28:36 - 28:40that your goal is to put it in with a minimal amount of rearrangement.
-
28:40 - 28:44So leaving all the existing nodes connected the way they are,
-
28:44 - 28:47we're going to follow the binary search path of
-
28:47 - 28:49the get value code
-
28:49 - 28:51to find the place where 54 would have been
-
28:51 - 28:54if we left everything else in tact.
-
28:54 - 28:57Looking at 57 you could say, well it has to be to the left of
-
28:57 - 29:0157. Looking to the 32 you say, it has to be to the right of
-
29:01 - 29:0132.
-
29:01 - 29:05Here's where we get some empty tree, so we say, well that's the place where we'll put it.
-
29:05 - 29:06We're going to tack it off, kind
-
29:06 - 29:12of hang it off one of the empty trees
-
29:12 - 29:15along the front tier, the bottom of that tree.
-
29:15 - 29:18And so, we'll just walk our way down to where we fall off the tree, and put the
-
29:18 - 29:19new node in place there.
-
29:19 - 29:22So if I want the node 58 in, I'll say it
-
29:22 - 29:25has to go to the right; it has to go to the left; it has to go to the left; it has to go to the left; it
-
29:25 - 29:27will go right down there. If
-
29:27 - 29:30I want the node 100 in, it has to go right, right, right, right, right.
-
29:30 - 29:35That just kind of following the path, leaving all the current pointers and
-
29:35 - 29:40nodes in place, kind of finding a new place. And there's exactly one
-
29:40 - 29:43that given any number in a current configuration
-
29:43 - 29:46and the goal of not rearranging anything, there's exactly one place to put
-
29:46 - 29:48that new node in there
-
29:48 - 29:49that will
-
29:49 - 29:51keep the binary search property in tact.
-
29:51 - 29:55So it actually does a lot of decision making for us.
-
29:55 - 29:58We don't have to make any hard decisions about where to go. It just all this
-
29:58 - 30:05left, right, left, right based on less than and greater than. What if the values are equal? If
-
30:05 - 30:09the values are equal you have to have a plan, and it could be
-
30:09 - 30:11that you're using all the less than or equal to or greater than
-
30:11 - 30:14or equal to. You just have to have a plan. In terms of our map it turns out to
-
30:14 - 30:17equal to all the rights.
-
30:17 - 30:20So we handle it differently, which we don't every put a duplicate into the
-
30:20 - 30:23tree.
-
30:23 - 30:26So the add call is going to basically just be a wrapper
-
30:26 - 30:29that calls tree enter, starting from the root
-
30:29 - 30:31with the key and the value
-
30:31 - 30:34that then goes and does that work, walks its way down to the bottom to
-
30:34 - 30:40find the right place to stick the new one in.
-
30:40 - 30:43Here is
-
30:43 - 30:44a
-
30:44 - 30:49very dense little piece of code that will put a node into a tree
-
30:49 - 30:52using the strategy that we've just talked about of,
-
30:52 - 30:55don't rearrange any of the pointers, just walk your way down to where you find an empty
-
30:55 - 30:59tree. And once you find that empty tree, replace the empty tree with a singleton node
-
30:59 - 31:02with the new key and value that you have.
-
31:02 - 31:06So let's not look at the base case first. Let's actually look at the
-
31:06 - 31:09recursive cases, because I think that they're a little bit easier to think about.
-
31:09 - 31:13Is if we every find a match, so we have a key coming in, and it matches
-
31:13 - 31:14the current node, then we just overwrite.
-
31:14 - 31:17So that's the way the map handles
-
31:17 - 31:18insertion of a duplicate value.
-
31:18 - 31:21Now, in some other situation you might actually insert one. But in our case it turns out we
-
31:21 - 31:24never want a duplicate. We always want to take the existing value and replace
-
31:24 - 31:26it with a new one.
-
31:26 - 31:28If it matches, we're done.
-
31:28 - 31:29Otherwise,
-
31:29 - 31:32we need to decide whether we go left or right.
-
31:32 - 31:36Based on the key we're attempting to insert or the current node value, it either
-
31:36 - 31:39will go in the left of the subtree, left of the right, in which case we make
-
31:39 - 31:44exactly one recursive call either to the left or to the right depending on the
-
31:44 - 31:46relationship of key to the current
-
31:46 - 31:48key's value.
-
31:48 - 31:49And then
-
31:49 - 31:52the one base case that everything eventually gets down to, that's going to create a
-
31:52 - 31:53new node,
-
31:53 - 31:57is when you have an empty tree,
-
31:57 - 31:59then that says that you have followed the path that
-
31:59 - 32:04has lead off the tree and fallen off the bottom. It says replace that with
-
32:04 - 32:05this new singleton node.
-
32:05 - 32:07So we make a new node out in the heap.
-
32:07 - 32:09We're assigning the key
-
32:09 - 32:12based on this. We're assigning the value based on the parameter coming in. And then we set
-
32:12 - 32:14its left and right null to do
-
32:14 - 32:16
-
32:16 - 32:20a new leaf node, right, it has left and right null subtrees.
-
32:20 - 32:22And the way this got wired in the tree,
-
32:22 - 32:26you don't see who is it that's now pointing to this tree, this new
-
32:26 - 32:28singleton node that I placed into here?
-
32:28 - 32:31You don't typically see that wire that's coming into it. Who
-
32:31 - 32:33is pointing to it
-
32:33 - 32:34is actually being done
-
32:34 - 32:39by T being passed by reference.
-
32:39 - 32:42The node right above it,
-
32:42 - 32:44the left or the right
-
32:44 - 32:48of what's going to become the parent of the new node being entered,
-
32:48 - 32:51was passed by reference. It previously was null,
-
32:51 - 32:55and it got reassigned to point to this new node. So the kind of wiring in of
-
32:55 - 32:58that node is happening by the pass by reference of the pointer, which is a
-
32:58 - 33:04tricky thing to kind of watch and get your head around. Let's
-
33:04 - 33:06look at the code operating. Do you have a
-
33:06 - 33:13question? Yes. [Inaudible]. Can you
-
33:13 - 33:14explain
-
33:14 - 33:16
-
33:16 - 33:17that little step?
-
33:17 - 33:22Instructor: Well this step is just something in the case of the map, if we every find an existing entity that has that key, and
-
33:22 - 33:23we overwrite.
-
33:23 - 33:27So in the map you said previously, Bob's phone number is this, and then you install a new
-
33:27 - 33:30phone number on Bob. If it finds a map to Bob saying Bob was previously entered, it just
-
33:30 - 33:34replaces his value. So no nodes are created, no pointers are rearranged;
-
33:34 - 33:38it just commandeers that node and changes its value.
-
33:38 - 33:43That's the behaviors specified by map.
-
33:43 - 33:45So if we have this
-
33:45 - 33:46right now, we've got
-
33:46 - 33:49some handful of fruits that are installed
-
33:49 - 33:53in our tree with some value. It's doesn't matter what the value is, so I didn't even bother to fill
-
33:53 - 33:57it in to confuse you. So this is what I have, papaya is my root node, and then
-
33:57 - 34:01the next level down has banana and peach, and the pear and what not. So
-
34:01 - 34:03root is pointing to papaya right now.
-
34:03 - 34:06When I make my call to insert a new value,
-
34:06 - 34:09let's say that the one that I want to put in the tree now is orange,
-
34:09 - 34:13the very first call to tree enter looks like this,
-
34:13 - 34:16T is the reference parameter that's coming in, and
-
34:16 - 34:20it refers to root.
-
34:20 - 34:24So the very first call to tree enter says please insert
-
34:24 - 34:27orange and its value
-
34:27 - 34:28into the tree
-
34:28 - 34:31who is pointed to by
-
34:31 - 34:34root. So it actually has that by reference. It's going to need that by
-
34:34 - 34:37reference because it's actually planning on updating that
-
34:37 - 34:39when it finally installs the new node.
-
34:39 - 34:43So the first call to tree enter says, okay well is orange less than or greater than
-
34:43 - 34:45papaya.
-
34:45 - 34:47Orange precedes papaya
-
34:47 - 34:48in the alphabet,
-
34:48 - 34:50so okay, well call tree enter
-
34:50 - 34:52the second time
-
34:52 - 34:53passing
-
34:53 - 34:58the left subtree of the current value of tee. And so T
-
34:58 - 35:00was a reference to root right then,
-
35:00 - 35:04now T becomes a reference to the left subtree coming out of papaya. So it says, now
-
35:04 - 35:06okay, for the tree
-
35:06 - 35:10that we're talking about here, which is the one that points to the banana and
-
35:10 - 35:14below. That's the new root of that subtree is banana, it says, does orange go
-
35:14 - 35:15to the left or right
-
35:15 - 35:19of the root node
-
35:19 - 35:21banana? And it goes to the right. So the third stack frame
-
35:21 - 35:24of tree enter now says, well the reference pointer
-
35:24 - 35:28is now looking at the right subtree coming out of banana. And it
-
35:28 - 35:29says, well does melon go to
-
35:29 - 35:31the left or the right of
-
35:31 - 35:32that?
-
35:32 - 35:34It goes to the right,
-
35:34 - 35:36so now on this call, right,
-
35:36 - 35:40the current value of T is going to be null. It is a reference, so a synonym for
-
35:40 - 35:44what was the right field coming out of the melon node,
-
35:44 - 35:47and in this is the case where it says, if orange was in the tree,
-
35:47 - 35:50then this is where it would be hanging off of. It would be coming off of the
-
35:50 - 35:51right
-
35:51 - 35:52
-
35:52 - 35:53subtree of melon.
-
35:53 - 35:55That part is null,
-
35:55 - 35:58and so that's our place. That's the place where we're going to take out the
-
35:58 - 35:59existing null,
-
35:59 - 36:00wipe over it,
-
36:00 - 36:04and put in a new pointer to the node orange
-
36:04 - 36:06that's out here.
-
36:06 - 36:09And so that's how the thing got wired in. It's actually kind of
-
36:09 - 36:12subtle to see how that happened, because you don?t ever see a
-
36:12 - 36:16direct descendant, like T is left equals this, equals that new node; or T is
-
36:16 - 36:17right equals that new
-
36:17 - 36:20node. It happened by virtue of passing T right
-
36:20 - 36:22by reference into the
-
36:22 - 36:26calls further down the stack. That eventually, right when it hit the base
-
36:26 - 36:26case,
-
36:26 - 36:30it took what was previously a pointer to null and overwrote it with a pointer to this
-
36:30 - 36:31new cell,
-
36:31 - 36:37that then got attached into the tree. Let's take a
-
36:37 - 36:38look at this piece
-
36:38 - 36:40of code.
-
36:40 - 36:44That by reference there turns out to be really, really important.
-
36:44 - 36:46If that
-
36:46 - 36:49T was just being passed as an ordinary node star,
-
36:49 - 36:52nodes would just get lost, dropped on the floor like crazy.
-
36:52 - 36:56For example, consider the very first time when you're passing in root and root
-
36:56 - 36:58is empty. If root is null,
-
36:58 - 36:59it would say, well if root is null
-
36:59 - 37:02and then it would change this local parameter T; T is some new node of these
-
37:02 - 37:06things. But if it really wasn't making permanent changes to the root,
-
37:06 - 37:08really wasn't making root pointer of that new node, root
-
37:08 - 37:12would continue pointing to null, and that node would just get orphaned. And
-
37:12 - 37:14every subsequent one, the same thing would happen.
-
37:14 - 37:17So in order to have this be a persistent change,
-
37:17 - 37:19the pointer that's coming in,
-
37:19 - 37:22either the root in the first case or T left or T
-
37:22 - 37:23right
-
37:23 - 37:25in subsequent calls,
-
37:25 - 37:30needed to have the ability to be permanently modified when we attach that new
-
37:30 - 37:33subtree.
-
37:33 - 37:37That's really very tricky code, so I would be happy to
-
37:37 - 37:39try to answer
-
37:39 - 37:41questions or go through something to try to get your head around what
-
37:41 - 37:44it's supposed to be doing. If you could ask a question I
-
37:44 - 37:47could
-
37:47 - 37:49help you. Or you could just nod your head and say, I'm just totally fine.
-
37:49 - 37:52Could you use the tree search algorithm that you programmed before?
-
37:52 - 37:53They're
-
37:53 - 37:58very close, right, and
-
37:58 - 37:59where did I put it?
-
37:59 - 38:02It's on this slide.
-
38:02 - 38:05The problem with this one right now is it does really stop and return the
-
38:05 - 38:07null. So I could actually unify them, but it would take a little more
-
38:07 - 38:11work, and at some point you say, wow I could unify this
-
38:11 - 38:15at the expense of making this other piece of code a little more complicated. So in this case I chose
-
38:15 - 38:18to keep them separate. It's not impossible to unify them, but you end up
-
38:18 - 38:19having to point to
-
38:19 - 38:22where the node is or a null where you would insert a new node. Tree
-
38:22 - 38:26search doesn't care about, but tree enter does. It sort of would
-
38:26 - 38:29modify it to fit the needs of both at the expense of making it a little more awkward
-
38:29 - 38:38on both sides. Question here.
-
38:38 - 38:43[Inaudible] the number example.
-
38:43 - 38:47If you were inserting 54 where you said, if you called print,
-
38:47 - 38:49wouldn't it print
-
38:49 - 38:51out of order? So
-
38:51 - 38:5654 would go left of 57, right of 32, so it'd go over here.
-
38:56 - 38:59But the way in order traversals work is says,
-
38:59 - 39:03print everything in my left subtree, print everything in my left subtree, left subtree. So it would kind of print seven
-
39:03 - 39:07as it came back five, 32, and then it would print 54 before it got to the
-
39:07 - 39:0857. So
-
39:08 - 39:12an in order traversal of a binary search tree, as long as the binary search tree
-
39:12 - 39:16is properly maintained, will print them in sorted order.
-
39:16 - 39:19It's almost like, don't even think too hard about it, if
-
39:19 - 39:22everything in my left subtree is smaller, and everything in my right subtree is greater than me,
-
39:22 - 39:26if I print all of those I have to come out in sorted order.
-
39:26 - 39:30If the property is correctly maintained in the tree, then that
-
39:30 - 39:31by definition
-
39:31 - 39:34is a sorted way to pass through all the numbers. So if
-
39:34 - 39:38at any point an in order traversal is hitting numbers out of order, it would mean that the tree wasn't
-
39:38 - 39:40properly
-
39:40 - 39:51organized.
-
39:51 - 39:55So let's talk about how that works, there are a
-
39:55 - 39:59couple of other operations on the tree, things like doing the size,
-
39:59 - 40:01which would count the number of members. Which you could do by just doing a traversal, or you
-
40:01 - 40:04might do by just caching a number
-
40:04 - 40:07that you updated as you added and moved nodes. And
-
40:07 - 40:10the contains keys, that's basically just a tree search
-
40:10 - 40:14that determines whether it found a node or not.
-
40:14 - 40:17The rest of the implementation is not interest. There's also deleting, but I
-
40:17 - 40:18didn't show you
-
40:18 - 40:23that. The space use, how does this relate to the other known
-
40:23 - 40:24implementations that we have for math?
-
40:24 - 40:27It adds certainly some space overhead. We're
-
40:27 - 40:31going to have two pointers, a left and a right off every entry. So every entry is
-
40:31 - 40:35guaranteed to have an excess 8-byte capacity
-
40:35 - 40:37relative to the data being stored.
-
40:37 - 40:40It does actually add those tightly to the allocation. So like a link
-
40:40 - 40:43list, it allocates nodes on a per
-
40:43 - 40:46entry basis, so there's actually not a lot of
-
40:46 - 40:49reserve capacity that's waiting around unused the way it might been in a
-
40:49 - 40:51vector situation.
-
40:51 - 40:53And the performance of it is that it's based on the
-
40:53 - 40:55height of the tree,
-
40:55 - 41:00that both the add and the get value are doing a traversal starting from the root and
-
41:00 - 41:03working their way down to the bottom of the tree, either finding what it wanted
-
41:03 - 41:05along the way
-
41:05 - 41:07or falling off the bottom of the tree
-
41:07 - 41:11when it wasn't found or when it needs to be inserting a new node.
-
41:11 - 41:15That height of the tree, you're expecting it to be logarithmic. And so
-
41:15 - 41:17if I look at, for example, the values, I've got
-
41:17 - 41:22seven values here, two, eight, 14, 15, 20, and 21.
-
41:22 - 41:23Those seven,
-
41:23 - 41:27eight values can be inserted in a tree to get very
-
41:27 - 41:28different results.
-
41:28 - 41:30There's not one binary search tree
-
41:30 - 41:33that represents those values
-
41:33 - 41:37uniquely. Depending on what order you manage to insert them you'll have
-
41:37 - 41:38different things.
-
41:38 - 41:39For example,
-
41:39 - 41:41given the way we're doing it,
-
41:41 - 41:44whatever node is inserted first will be the root, and we never move the
-
41:44 - 41:48root. We leave the root along. So if you look at this as a historical
-
41:48 - 41:50artifact and say, well they entered a bunch of numbers,
-
41:50 - 41:52and this is the tree I got.
-
41:52 - 41:55I can tell you for sure that 15 was the very first number that they entered.
-
41:55 - 41:57There's no other possibility, 15 got entered first.
-
41:57 - 42:01The next number that got inserted was one of eight or 20, I don't know
-
42:01 - 42:03which.
-
42:03 - 42:06I can tell you, for example, that eight got inserted before 14. In terms
-
42:06 - 42:09of a level-by-level arrangement, the ones on level one
-
42:09 - 42:13were always inserted before the ones on level two.
-
42:13 - 42:16But one possibility I said here was well maybe 15 went in first,
-
42:16 - 42:18then eight, and then two,
-
42:18 - 42:21and then 20, and then 21, and then 14, and then 18. There
-
42:21 - 42:25are some other rearrangements that are also possible that would create the exact same
-
42:25 - 42:26tree.
-
42:26 - 42:30And there are other variations that would produce a different, but still valid,
-
42:30 - 42:32binary search tree.
-
42:32 - 42:33So with these seven nodes, right,
-
42:33 - 42:36this tree is height three,
-
42:36 - 42:39and is what's considered perfectly balanced. Half the nodes are on the
-
42:39 - 42:42left and the right of each tree all the way through it,
-
42:42 - 42:45leading to that just beautifully balanced
-
42:45 - 42:46
-
42:46 - 42:47result we're hoping for
-
42:47 - 42:52where both our operations, both add and get value, will be logarithmic. So
-
42:52 - 42:54if I have
-
42:54 - 42:561,000
-
42:56 - 42:59nodes inserted, that height tree should be about ten. And if everything kind of went
-
42:59 - 43:03well and we got kind of went well, and we got an even split all the way around, then it would take ten
-
43:03 - 43:06comparisons to find something in that tree or determine it wasn't there' and
-
43:06 - 43:07also to insert,
-
43:07 - 43:10which was the thing we were having trouble with at the vector case, that
-
43:10 - 43:13because that search leads us right to the place where it needs to go,
-
43:13 - 43:16and the inserting because of the flexibility of the pointer base structure is
-
43:16 - 43:17very easy,
-
43:17 - 43:21we can make both that insert and add operate logarithmically.
-
43:21 - 43:24Let's think about some cases that aren't so good.
-
43:24 - 43:28I take those same seven numbers and I insert them
-
43:28 - 43:32not quite so perfectly, and I get a not quite so even result.
-
43:32 - 43:36So on this side I can tell you that 20 was the first one inserted.
-
43:36 - 43:38It was the root and the root doesn't move,
-
43:38 - 43:41but then subsequent ones was inserted an eight, and then a 21, and then an 18,
-
43:41 - 43:46and then a 14, and then a 15, and then it went back and got that two,
-
43:46 - 43:47the one over there.
-
43:47 - 43:50Another possibility for how it got inserted
-
43:50 - 43:51
-
43:51 - 43:5318, 14, 15, and so on.
-
43:53 - 43:54These are
-
43:54 - 43:58not quite as balanced as the last one, but they're not terribly unbalanced.
-
43:58 - 44:00They're like one or two
-
44:00 - 44:03more than the perfectly balanced case.
-
44:03 - 44:09So at logarithmic plus a little constant still doesn't look
-
44:09 - 44:11that bad. Okay, how bad can it really get?
-
44:11 - 44:15It can get really bad.
-
44:15 - 44:18I can take those seven numbers and insert them in the worst
-
44:18 - 44:21case, and produce a completely degenerate tree
-
44:21 - 44:24where, for example, if I insert them in sorted order
-
44:24 - 44:25or in reverse sorted order,
-
44:25 - 44:26
-
44:26 - 44:28I really have a link list.
-
44:28 - 44:32I don't have a tree at all. It goes two, 18, 14, 15, 20, 21,
-
44:32 - 44:34that's still a valid binary search tree.
-
44:34 - 44:35If you
-
44:35 - 44:38print it in order, it still comes out in sorted order.
-
44:38 - 44:41It still can be searched using a binary search strategy.
-
44:41 - 44:45It's just because the way it's been divided up here, it's not buying
-
44:45 - 44:48us a lot. So if I'm looking for the number 22, I'll say, well
-
44:48 - 44:52is it to the left or the right of two? It's to the right. Is it to the left or right of
-
44:52 - 44:55eight? Oh, it's to the right. Is it to the left or right, you know like all the way down, so finding
-
44:55 - 44:5722 looking at every single one
-
44:57 - 45:01of the current entries in the tree, before I could conclude, as I fell off the
-
45:01 - 45:04bottom, that it wasn't there.
-
45:04 - 45:06Similarly kind of over there;
-
45:06 - 45:09it didn't buy us a lot. It has a lot to do with
-
45:09 - 45:13how we inserted it as to how good a balance we're going to get.
-
45:13 - 45:16And there are some cases that will produce just drastically unbalanced
-
45:16 - 45:17
-
45:17 - 45:21things that require linear searches, totally linear to walk that way
-
45:21 - 45:22down
-
45:22 - 45:25to find something.
-
45:25 - 45:26There's a couple of other
-
45:26 - 45:29degenerate trees, just to mention it's not only the sorted or reverse sorted.
-
45:29 - 45:33There's also these other kind of saw tooth inputs, they're called,
-
45:33 - 45:35where you just happen to at any given point to
-
45:35 - 45:39have inserted the max or the min of the elements that remain, so it
-
45:39 - 45:42keeps kind of dividing the input into one and
-
45:42 - 45:44N-1. So having the
-
45:44 - 45:47largest and the smallest, and the largest and the smallest, and then
-
45:47 - 45:48subsequent smallest down there
-
45:48 - 45:52can also produce a linear line. It's not quite all just down the left or
-
45:52 - 45:55down the right, but it does mean that every single node has an empty
-
45:55 - 45:59left or right subtree is what's considered the degenerate case here. And
-
45:59 - 46:01both have the same property where
-
46:01 - 46:04half of the pointers off
-
46:04 - 46:08of each cell are going nowhere.
-
46:08 - 46:10There is an interesting relationship here
-
46:10 - 46:12between these worst-case inputs for transversion and quick sort, and they're
-
46:12 - 46:14actually kind of exactly one to one.
-
46:14 - 46:18That what made quick sort deteriorate was the element of the input,
-
46:18 - 46:21the max or the min of what remain. And that's exactly the same case
-
46:21 - 46:25for transversion, max or min of what remains means that you are not dividing your
-
46:25 - 46:29input into two partitions of roughly equal size. You're dividing them into the
-
46:29 - 46:33one, N-1, which is getting you nowhere. What do you
-
46:33 - 46:37get to do about it?
-
46:37 - 46:40You have to decide first of all if it's a problem worth solving.
-
46:40 - 46:41
-
46:41 - 46:44So that's what the first one say. Sometimes you just say that
-
46:44 - 46:46it will sometimes happen in some
-
46:46 - 46:47inputs
-
46:47 - 46:50that I consider rare enough that it will degenerate. I don't think they're
-
46:50 - 46:52important enough or common enough to make a big deal out of.
-
46:52 - 46:56I don't think that applies here, because it turns out that getting your data in sorted order is
-
46:56 - 46:59probably not unlikely. That somebody might be refilling a map
-
46:59 - 47:02with data they read from a file that they wrote out in sorted order last time.
-
47:02 - 47:05So saying that if it is coming in sorted,
-
47:05 - 47:09then I'm probably going to tolerate that, is probably not going to work.
-
47:09 - 47:12And there's two main strategies then, if you've agreed to take action about it,
-
47:12 - 47:14about what you're going to do.
-
47:14 - 47:15
-
47:15 - 47:19The first one is to wait until
-
47:19 - 47:21the problem gets bad enough
-
47:21 - 47:23that you decide to do something about it.
-
47:23 - 47:28So you don't do any rebalancing as you go. You just let stuff go, left, right, left,
-
47:28 - 47:30right, whatever's happening. And
-
47:30 - 47:31
-
47:31 - 47:34you try to keep track of the height of your tree,
-
47:34 - 47:37and you realize when the height of your tree seems sufficiently out of whack with
-
47:37 - 47:39your expected logarithmic height,
-
47:39 - 47:42to do something about it. Then you just fix the whole tree.
-
47:42 - 47:45You can clean the whole thing up. Pull them all out into a vector, rearrange them,
-
47:45 - 47:49put them back in, so the middlemost node is the root all the way down. And kind of
-
47:49 - 47:50create the perfectly balanced tree,
-
47:50 - 47:52and then keep going
-
47:52 - 47:54and wait for it to get out of whack, and fix it again;
-
47:54 - 47:56kind
-
47:56 - 47:58of wait and see.
-
47:58 - 48:01The other strategy, and this happens to be a more common one in
-
48:01 - 48:06practice, is you just constantly do a little bit of fixing up all the time.
-
48:06 - 48:09Whenever you notice that the tree is looking just a little bit lopsided, you let
-
48:09 - 48:12it get a little bit; usually you'll let it be out of balance by one.
-
48:12 - 48:15So you might have height four on that side and height three on that side.
-
48:15 - 48:17But as soon as it starts to get to be three and five,
-
48:17 - 48:20you do a little shuffle to bring them back to both four. And it
-
48:20 - 48:21involves just a little bit of work
-
48:21 - 48:23all the time
-
48:23 - 48:25to where you just don't let
-
48:25 - 48:28the tree turn into a real mess.
-
48:28 - 48:31But there's a little bit of extra fixing up being done all the time.
-
48:31 - 48:35And so the particular kind of tree that's discussed in the text, which you can read about,
-
48:35 - 48:37but we won't cover
-
48:37 - 48:40as part of our core material, is called the ADL tree. It's interesting to
-
48:40 - 48:43read about it to see what does it take, what kind of
-
48:43 - 48:47process is required to monitor this and do something about it. And
-
48:47 - 48:50so if we have that in place,
-
48:50 - 48:51
-
48:51 - 48:53we can get it to where
-
48:53 - 48:57the binary search tree part of this is logarithmic on both
-
48:57 - 48:59inserting new values and searching for values,
-
48:59 - 49:03which is that Holy Grail of efficiency boundary. You can say, yeah logarithmic is
-
49:03 - 49:04something we can tolerate
-
49:04 - 49:08by virtue of adding 8 bytes of pointer, potentially some balance factor,
-
49:08 - 49:12and some fanciness to guarantee that performance all over,
-
49:12 - 49:15but gets us somewhere that will scale very, very well for thousands
-
49:15 - 49:18and millions of entries, right, logarithmic is still a very, very fast
-
49:18 - 49:20function. So it
-
49:20 - 49:24creates a map that could really actually go the distance in scale. So, that's
-
49:24 - 49:28your talk about binary search trees. We'll come back and talk about
-
49:28 - 49:38hashing in graphs and stuff. I will see you on Wednesday.
- Title:
- Lecture 22 | Programming Abstractions (Stanford)
- Description:
-
Lecture 22 by Julie Zelenski for the Programming Abstractions Course (CS106B) in the Stanford Computer Science Department.
Julie discusses map as a vector and describes a different, possibly better, strategy. The basics of trees and their usefulness and how they can be used with binary search is then introduced.
Complete Playlist for the Course:
http://www.youtube.com/view_play_list?p=FE6E58F856038C69CS 106B Course Website:
http://cs106b.stanford.eduStanford Center for Professional Development:
http://scpd.stanford.edu/Stanford University:
http://www.stanford.edu/Stanford University Channel on YouTube:
http://www.youtube.com/stanforduniversity/ - Video Language:
- English
- Duration:
- 49:45
N. Ueda edited English subtitles for Lecture 22 | Programming Abstractions (Stanford) | ||
Eunjeong_Kim added a translation |