Lecture 22 | Programming Abstractions (Stanford)

Edit subtitles

0:00 - 0:07
0:07 - 0:12
0:12 - 0:15

This presentation is delivered by the Stanford Center for Professional
0:15 - 0:25

Development.
0:25 - 0:27

How are we doing?
0:27 - 0:28

Spring apparently is here;
0:28 - 0:33

just like California, one day it's winter, one day it's spring. Hopefully it's here to say.
0:33 - 0:38

So we had talked about doing it, we did this last time, we did an unsorted vector
0:38 - 0:40

implementation of the map, we just did a little struct of
0:40 - 0:42

the key value pair.
0:42 - 0:44

And then we went through
0:44 - 0:48

that both add and get value are running at linear time in that form because of the need
0:48 - 0:52

to search; search to find a value to override it, and search to find a value to
0:52 - 0:54

pull out the existing
0:54 - 0:55

match to it.
0:55 - 0:58

And if we went to the sorted form of that, we could get get value to be able to
0:58 - 1:02

capitalize on that binary search property and be able to quickly find the
1:02 - 1:06

key if it exists in there or to determine it doesn't exist.
1:06 - 1:09

But the add still suffered from the shuffling problem. We can quickly
1:09 - 1:12

find, using the binary search to find where the
1:12 - 1:15

new element would need to go. But if it doesn't already go there, it doesn't
1:15 - 1:17

already exist. We're going to have to shuffle to make that space.
1:17 - 1:21

So we have still a linear factor in there that we're working on getting rid of.
1:21 - 1:24

And then I know at the bottom that none of them have any
1:24 - 1:27

built in overhead per element. There may be some capacity, excess capacity in the
1:27 - 1:30

vector that we're absorbing, but there's no
1:30 - 1:33

four, ten, eight-byte chunk that every
1:33 - 1:36

element is taking as a cost.
1:36 - 1:39

So we started to talk about this at the end on Friday, and we're going to go with this and
1:39 - 1:41

see where this takes
1:41 - 1:45

us. We had the idea of the vector, and we know what vector constraints are, right? This
1:45 - 1:48

idea of the contiguous memory, and we were going to think about the link list for a minute to
1:48 - 1:51

see if it was going to buy us anything.
1:51 - 1:54

It tends to have an inverse relationship, that once you know where you
1:54 - 1:57

want to insert something it's easy to do that rearrangement because it's just a
1:57 - 2:01

little bit of pointer rewiring that needs to happen in that situation.
2:01 - 2:03

But it's hard to find a position insert,
2:03 - 2:06

even if the link list is in sorted order,
2:06 - 2:10

knowing where the S's are, or the M's are, or the B's are is still a
2:10 - 2:13

matter of starting from the beginning; that your only access in the
2:13 - 2:16

singly linked list is to start at the beginning and work your way down.
2:16 - 2:20

And so I'm going to take you through a little bit of a thought exercise about well, what if
2:20 - 2:23

we try to do something about that? Right now having a pointer coming into
2:23 - 2:26

Bashful and then subsequent ones going in sorted order
2:26 - 2:27

isn't really very helpful.
2:27 - 2:30

But if we were willing to try to rearrange our pointers
2:30 - 2:33

to give us the access that we have in the
2:33 - 2:36

sorted array form; for example, if I had a pointer
2:36 - 2:39

to the middlemost element, in
2:39 - 2:42

this case the Grumpy that's
2:42 - 2:43

midway down the list, if I had
2:43 - 2:45

that pointer
2:45 - 2:49

and from there I could say, is the element I'm looking for
2:49 - 2:53

ahead of Grumpy or behind Grumpy. So having Grumpy split my input in
2:53 - 2:54

half.
2:54 - 2:56

If I had that, right, then that would actually get me
2:56 - 2:59

sort of moving towards that direction that I wanted to get, which is using the binary
2:59 - 3:03

search property to facilitate throwing away half the data
3:03 - 3:06

and look into just the half remaining that's interesting.
3:06 - 3:09

Well, if I had a pointer to Grumpy, that doesn't quite solve my problem. A pointer to Grumpy
3:09 - 3:12

could get me halfway down the list, but then the pointer's coming in this way. If I
3:12 - 3:16

inverted the pointers leading away from it,
3:16 - 3:17

that's also kind of helping
3:17 - 3:20

to facilitate what I'm trying to do. So if I could point toward the middle
3:20 - 3:24

and then kind of move away, and then if I actually kind of applied the same strategy not just at the topmost
3:24 - 3:28

level, but at every subsequent division down, from the half, to the
3:28 - 3:29

quarter to the eight,
3:29 - 3:32

to the sixteenth. And instead of having a pointer
3:32 - 3:34

to the front of the list, what I really maintained was a pointer to the
3:34 - 3:36

middlemost element, and then
3:36 - 3:38

that element maintained pointers
3:38 - 3:40

to the middle of the
3:40 - 3:44

upper quarter and the middle of the lower quarter, and all the way down.
3:44 - 3:46

Then I could pull up this list
3:46 - 3:50

into something that we call a binary search tree.
3:50 - 3:54

So the way I would be storing this is having a pointer to a node, which
3:54 - 3:58

is expected to the median or the middle, or somewhere right along
3:58 - 3:59

the
3:59 - 4:00

values being stored.
4:00 - 4:04

And that it has two pointers, not just a next or previous or something like that, that are related to all
4:04 - 4:08

the elements that precede this one in the ordering, and all the elements that
4:08 - 4:09

follow it are in
4:09 - 4:11

these two pointers.
4:11 - 4:14

They call these subtrees coming off the smaller trees over here with three
4:14 - 4:17

elements on this side, and three elements on that side.
4:17 - 4:19

And that each of those nodes
4:19 - 4:22

has recursively the same formulation; that it has two
4:22 - 4:25

subtree hanging off of it
4:25 - 4:27

that contain,
4:27 - 4:32

given any particular node, that there's this binary search ordering here that says
4:32 - 4:36

everything that's in that left subtree precedes this node in ordering.
4:36 - 4:38

And everything that's in that right subtree
4:38 - 4:39

follows it.
4:39 - 4:44

And that's true for every node throughout the entire arrangement here.
4:44 - 4:47

So this is called a tree, and
4:47 - 4:50

this particular form is actually called the binary search tree. I'm
4:50 - 4:53

going to throw a little bit of vocabulary at you, because I'm going to start using these words, and I just want to get
4:53 - 4:54

them out there
4:54 - 4:58

early so that you understand what those words mean when I start using them.
4:58 - 5:01

Is that each of the cells in here,
5:01 - 5:02

like the cells that are
5:02 - 5:04

allocated for a link list,
5:04 - 5:07

are individually heap allocated, we're going to have pointers that allow us to get back
5:07 - 5:08

to
5:08 - 5:12

them. We typically call those nodes; in terms of tree I'll call it a cell or node in the list kind
5:12 - 5:16

of interchangeably. I'm more likely to use node when I'm talking about tree.
5:16 - 5:19

And a tree is really just a pointer to a node.
5:19 - 5:21

The empty tree
5:21 - 5:24

would be a pointer whose value is null, so pointing at nothing.
5:24 - 5:27

A singleton tree would be a pointer to a single node
5:27 - 5:29

that has left and right subtrees
5:29 - 5:31

that are both empty or both null.
5:31 - 5:35

So Bashful by itself is a singleton, a pointer to
5:35 - 5:38

Bashful. When we talk about subtrees, it just kind of means looking at the top; Grumpy has two
5:38 - 5:40

subtrees, a left and a right,
5:40 - 5:41
5:41 - 5:42

that
5:42 - 5:45

themselves are just smaller forms of the same recursive structure.
5:45 - 5:49

And then we would say that Grumpy is the parent of Doc and Sleepy.
5:49 - 5:52

So those are the two children nodes, and we'll call them the left child and the right
5:52 - 5:53

child.
5:53 - 5:56

And the node that's at the very top,
5:56 - 6:00

the one that our external pointer is coming into and then kind of leads the way, is called
6:00 - 6:02

the root node. The
6:02 - 6:02

first node
6:02 - 6:05

of the tree where we first start our work is going to be called the root.
6:05 - 6:09

And these nodes down here at the bottom that have empty subtrees are called leaf nodes.
6:09 - 6:14

So a node that has no further subtrees, so by itself has null
6:14 - 6:16

left and right trees, is called a leaf.
6:16 - 6:18

So that actually means the tree is kind of upside down
6:18 - 6:21

if you look at it. That Grumpy's the root and these things are the
6:21 - 6:25

leaves shows you that apparently most computer scientists don't get out very much,
6:25 - 6:28

because they haven't noticed that trees actually grow the other direction.
6:28 - 6:29
6:29 - 6:32

But just kind of go with it, and
6:32 - 6:33

you'll be
6:33 - 6:36

speaking tree-speak in no time.
6:36 - 6:43

Anybody have any questions about what we've got basically set up here?
6:43 - 6:45

So there is a
6:45 - 6:48

whole family of things that come under the heading of tree in computer
6:48 - 6:52

science. We're going to look in particular at this one embodiment, the binary search
6:52 - 6:53

tree.
6:53 - 6:56

But before I get deep down in the details of that, I just want to back
6:56 - 6:59

up for a second and just talk about what trees in general mean, and what kind of
6:59 - 7:01

things in general are represented in manipulative trees.
7:01 - 7:06

So that you have a sense of what is the broad notion of tree as a
7:06 - 7:08

computer scientist would see it,
7:08 - 7:10

is just that a tree is a recursive branching structure
7:10 - 7:14

where there are nodes, which have pointers to subtrees.
7:14 - 7:17

There may be one of those subtrees. There may be ten of those subtrees. There
7:17 - 7:20

may be two; depending on the nature of the tree you're working on there may be a
7:20 - 7:22

constraint on exactly how many children they have
7:22 - 7:26

or some flexibility in the number of children they have
7:26 - 7:27
7:27 - 7:29

depending on what's being modeled here.
7:29 - 7:31

There's always a single root node.
7:31 - 7:35

There's a node that kind of sits at the top of this hierarchy that leads down to
7:35 - 7:36

these things,
7:36 - 7:41

and that to form a tree there has to be a reachable path from the root
7:41 - 7:43

to every single node.
7:43 - 7:46

So there aren't some nodes that exist way out here, and there aren't two different ways to
7:46 - 7:48

get to the same
7:48 - 7:50

node in a tree; that if the
7:50 - 7:53

midterm is kept in the folder called exams, which is in the folder
7:53 - 7:56

called cs106, which is in my home folder. There's not some other way that
7:56 - 7:59

you can get to the midterm; the midterm lives exactly that part in the tree,
7:59 - 8:01

and there's no
8:01 - 8:04

circularity or loops in the structure.
8:04 - 8:07

And so a lot of things do get modeled tree, the one I've drawn here is just the file system.
8:07 - 8:10

Think about how your files are maintained on a computer, there tends to be a
8:10 - 8:11

structure of an
8:11 - 8:14

outermost folder, which has inner folders,
8:14 - 8:17

which have inner folders. And along the way there are these files, which you can think of as leaf
8:17 - 8:21

nodes that have no further subtrees underneath them.
8:21 - 8:24

And you can talk about, say, what is the size of
8:24 - 8:27

all the folders in my home directory? Well it would kind
8:27 - 8:28

of be recursively defined as summing over
8:28 - 8:30

all the folders within it,
8:30 - 8:33

how much storage is being used by those; which when it gets down to a leaf node can say, how
8:33 - 8:38

much space does this file take up and kind of add those together.
8:38 - 8:42

Things like your family trees, right, sort of modeling genealogy.
8:42 - 8:44

Things like the decomposition tree in your program;
8:44 - 8:47

main calling,
8:47 - 8:50

read file calling, set up boggle board calling,
8:50 - 8:52

give instructions.
8:52 - 8:56

There is a tree structure to that, about what pieces use what pieces and
8:56 - 8:57

descendants from there.
8:57 - 9:01

The ones that we're going to look at in particularly are the ones that are in the family of the binary trees. So a binary
9:01 - 9:05

tree has two children, exactly two children. There's a left and a right
9:05 - 9:07

pointer coming out of each node.
9:07 - 9:11

One or both of those can be null, but there can never be more than two
9:11 - 9:11

children.
9:11 - 9:15

And there is sort of space for recording two children, as opposed to recording things which
9:15 - 9:17

are trinary, which have three
9:17 - 9:20

possibilities for children, or where there may be some four or eight or
9:20 - 9:22

some other number there.
9:22 - 9:24

And in particular it's the search tree
9:24 - 9:26

that's going to help us get
9:26 - 9:28

the work done that we want, which is
9:28 - 9:32

there's not just any old arrangement of what's on the left, and what's on the right side.
9:32 - 9:35

That the nodes are going to be organized in here to facilitate binary search by
9:35 - 9:36

keeping
9:36 - 9:37

one half
9:37 - 9:40

over on the left and one half over on the right, and that we'll know which
9:40 - 9:44

half something goes in by virtue of looking at the key. So
9:44 - 9:47

let's take a look at a simple example of a binary search tree with
9:47 - 9:49

numbers. So
9:49 - 9:53

if I have a struct node that keeps track of an integer value and has two pointers, the
9:53 - 9:55

left and the right,
9:55 - 9:59

that point away from it. And in this case my root node is 57,
9:59 - 10:03

and looking at 57 I can tell you that every value in this tree
10:03 - 10:07

that is less than 57 will be in the left subtree. There will be no values
10:07 - 10:09

that are 56 or lower that are on the
10:09 - 10:10

right side.
10:10 - 10:14

And that will be true at every step down the tree. So if I'm looking for
10:14 - 10:15

something
10:15 - 10:18

that actually gives me that immediate access we were using the vector's
10:18 - 10:19

sorting property for. Which is if
10:19 - 10:24

I'm looking for the number 70, and I start at the root node of 57, then I know
10:24 - 10:25

it's not in the left subtree,
10:25 - 10:28

so assuming that about half the nodes are
10:28 - 10:30

pointed to over on that side,
10:30 - 10:34

I've now discarded half of the input even from consideration. If
10:34 - 10:36

I start at the 70,
10:36 - 10:39

and I say okay well if 57 must be over here, then again recursively apply the
10:39 - 10:44

same search property here. It's like if I'm looking for 70, is it going to be
10:44 - 10:45

before or after 79?
10:45 - 10:49

It's going to be before, so in fact I throw away anything that leads away on
10:49 - 10:52

the right subtree of 79, and work my way one level down.
10:52 - 10:53

So each step
10:53 - 10:58

in the recursive search is represented by sort of taking a step down
10:58 - 11:02

in that search tree; working my way down from the root to the leaf nodes. And
11:02 - 11:05

I either will find the value I'm looking for along the way,
11:05 - 11:08

or I will fall off the bottom of that tree
11:08 - 11:10

when I haven't been able to find it.
11:10 - 11:13

And that will tell me; oh it couldn't have been in the tree, because the only place
11:13 - 11:16

it could have been is the one that followed the binary search property.
11:16 - 11:19

And so if I look for 70, I say is it greater or less than
11:19 - 11:21

62? It's greater. Is it
11:21 - 11:24

greater or less than 71? It's less then. And then I work off there I get to the empty
11:24 - 11:26

tree. Then I say well if there was a 60, that's where it was
11:26 - 11:31

going to be, and it wasn't there so there much not be a 70 in this tree.
11:31 - 11:32

So it is exactly designed
11:32 - 11:36

to support one kind of search, and one kind of search only, which is this
11:36 - 11:40

efficient divide and conquer binary search, work its way down.
11:40 - 11:40
11:40 - 11:42

The other kinds of things, let's say, that
11:42 - 11:44

trees
11:44 - 11:45

are used for
11:45 - 11:48

may not need this kind of support, so they may not go out of their way to do this. But the one
11:48 - 11:52

we're actually very interested in is because of maps doing a lot of
11:52 - 11:55

additions into a sorted facility and
11:55 - 11:57

searches in that sort of facility. If we can
11:57 - 12:00

bottle it as a binary search tree, we're able to get the best of both worlds. To
12:00 - 12:03

have that super fast logarithmic search,
12:03 - 12:07

as well as have the flexibility of the pointer based structure
12:07 - 12:11

to help us do an insert. So if I'm ready to put a 70 into this tree,
12:11 - 12:14

the same path that led to me to where it should be
12:14 - 12:16

tells us where to put that new node
12:16 - 12:20

in place. And if all I need to do is to wire that in place; and if all I need to do to wire that in place is to kind of create a new node in the
12:20 - 12:21

heap and
12:21 - 12:24

put the pointers coming in and out of it the way that I would in a link list,
12:24 - 12:27

then I should be able to do both searches and inserts
12:27 - 12:30

in time proportional to the height of the tree,
12:30 - 12:35

which for a relatively balanced tree would be logarithmic. So
12:35 - 12:37

it sounds like we can get everything we wanted here
12:37 - 12:40

with a little bit of work. So
12:40 - 12:43

let me tell you - just do a little bit of playing with trees, just to kind of get used to thinking
12:43 - 12:46

about how we do stuff on them, and then I'm going to go forward with
12:46 - 12:49

implementing map using a binary search tree.
12:49 - 12:52

Trees are very, very recursive. So for those of you who still feel like
12:52 - 12:54

recursion is a little bit hazy,
12:54 - 12:58

this is another great area to strengthen your thinking about
12:58 - 13:01

recursive. In fact, I think a lot of people find operating on trees very, very
13:01 - 13:03

natural. Because trees are so recursive, the
13:03 - 13:07

idea that you write recursive algorithms to them is almost like you do it without even thinking, you don't
13:07 - 13:08

have to think hard. You're like,
13:08 - 13:11

you typically need to do something to your current node,
13:11 - 13:14

and then recursively operate on your left and your right.
13:14 - 13:17

And it's so natural that you don't even have stop and think hard about
13:17 - 13:18

what that means. So
13:18 - 13:21

your base case tends to be getting to an empty tree and having a
13:21 - 13:23

pointer that points to null,
13:23 - 13:28

and in the other cases have some node operate on your children.
13:28 - 13:31

Some of the simplest things are just traversals.
13:31 - 13:33

If I tree structure and I'd like to
13:33 - 13:35

do something to every node,
13:35 - 13:37

go through it and print them all,
13:37 - 13:39

go through it and write them to a file,
13:39 - 13:42

go through and add them all up to determine the average score, I
13:42 - 13:46

need these things. I'm going to need to do some processing that visits every node of the
13:46 - 13:48

tree once and exactly once,
13:48 - 13:50

and recursion is exactly the way to do that.
13:50 - 13:53

And there's a little bit of some choices, and when you're handling a
13:53 - 13:55

particular part of a tree, are you handling the current node and then
13:55 - 13:59

recurring on the left and right subtrees? Are you recurring first and then handling
13:59 - 14:02

the nodes. So some of those things that we saw in terms of the link list
14:02 - 14:03

that
14:03 - 14:06

change the order of the processing, but they don't change
14:06 - 14:08

whether you visit everybody. It's just a matter of which one you want to do
14:08 - 14:09

before.
14:09 - 14:12

I'm going to look at a couple of the simple traversals here,
14:12 - 14:14

and I'll point out
14:14 - 14:16

why that mattered.
14:16 - 14:18

If we look at in order,
14:18 - 14:22

so we'll consider that an in order traversal is for you to operate on your
14:22 - 14:24

left subtree recursively,
14:24 - 14:27

handle the current node right here, and then operate on your right subtree
14:27 - 14:29

recursively.
14:29 - 14:31

The base case, which is
14:31 - 14:34

kind of hidden in here, is if T is null, we get to an empty subtree we
14:34 - 14:37

don't do anything. So only in the nonempty case are we actually
14:37 - 14:39

processing and making recursive calls.
14:39 - 14:43

So if I start with my node 57 being the root of my tree, and I say
14:43 - 14:45

print that tree starting from the root.
14:45 - 14:48

It will print everything out of the left subtree,
14:48 - 14:52

then print the 57, and then print everything out of the right subtree.
14:52 - 14:55

And given that gets applied everywhere down,
14:55 - 14:58

the effect will be 57 says first print my left subtree.
14:58 - 15:01

And 37 says, well print my left subtree. And 15 says, well first
15:01 - 15:03

print my left subtree.
15:03 - 15:06

And then seven says, first print my left subtree. Well, that ones empty,
15:06 - 15:09

and so nothing gets printed.
15:09 - 15:13

As the recursion unwinds, it comes back to the last most call. The last most call was
15:13 - 15:17

seven. It says, okay, print seven and now print seven's right subtree. Again that's an empty,
15:17 - 15:19

so nothing gets printed there. It
15:19 - 15:22

will come back and then do the same thing with 15. I've done everything
15:22 - 15:23

to the left of 15;
15:23 - 15:26

let's print 15 and its right.
15:26 - 15:29

And doing that kind of all the way throughout the tree means that we will get the numbers out
15:29 - 15:31

in sorted order;
15:31 - 15:34

all right, five, 17, seven,
15:34 - 15:37

15, 32, 57, 59, 62, 71,
15:37 - 15:40

79, 93. An in order traversal
15:40 - 15:42

in a binary search tree
15:42 - 15:50

rediscovers, visits all the nodes from smallest to largest. [Inaudible]
15:50 - 15:54

So duplicates, you just have to have a strategy. It turns out in the map, the duplicates
15:54 - 15:54

overwrite.
15:54 - 15:58

So any time you get a new value you print there. Typically you just need to decide,
15:58 - 16:00

do they go to the left or do they go to the right?
16:00 - 16:03

So that you know when you're equal to if you wanted to find another one
16:03 - 16:04

of these, where would it go?
16:04 - 16:07

In a lot of situations you just don't have duplicates, so it's not actually a big deal.
16:07 - 16:10

You can't throw them arbitrarily on either side because you would end up having to search
16:10 - 16:14

both sides to find them. So you want to just know that's it's less than or equal to on the
16:14 - 16:18

left and greater than on the right. So that's
16:18 - 16:22

pretty neat, right, that that means that this
16:22 - 16:22
16:22 - 16:25

for purposes of being able to iterate for something, if
16:25 - 16:26

you were to walk
16:26 - 16:30

the structure in order, then you can produce them back out in sorted order, which tends to be
16:30 - 16:31

a convenient way
16:31 - 16:33
16:33 - 16:36

to print the output. There are a lot of situations where you do want that browsable
16:36 - 16:43

list in alphanumerical order. And the tree makes that easy to do.
16:43 - 16:45

When it comes time to delete the tree, I'm
16:45 - 16:48

going to need to go through and delete each of those nodes individually, the
16:48 - 16:51

same way I would on a link list to delete the whole structure.
16:51 - 16:54

The
16:54 - 16:57

order of traversal that's going to be most convenient for doing that is going to be to do it
16:57 - 16:58

post order.
16:58 - 17:00

Which is to say, when I'm looking at the root node 57,
17:00 - 17:05

I want to delete my entire left subtree recursively, delete my entire right
17:05 - 17:10

subtree recursively, and only after both of those pieces have been cleaned up,
17:10 - 17:12

delete the node we're on;
17:12 - 17:14

so deleting down below to the left, down below to the right
17:14 - 17:17

and then delete this one. If I did it in the other order, if
17:17 - 17:21

I tried to delete in between in the in order or in the pre order case where I did
17:21 - 17:23

my node and then to my subtrees,
17:23 - 17:26

I'd be reaching into parts of my memory I've deleted.
17:26 - 17:31

So we do want to do it in that order, so
17:31 - 17:32

recursion
17:32 - 17:35

coming back to visit you again.
17:35 - 17:42

Let's talk about how to make map work as a tree.
17:42 - 17:46

So if you think about what maps hold, maps hold a key, which is string type; a
17:46 - 17:48

value, which is client
17:48 - 17:51

some sort of template parameter there.
17:51 - 17:55

Now that I've built a structure that raps those two things with a pair and
17:55 - 17:58

adds to it, right these two additional pointers to the left and to the right
17:58 - 18:01

subtrees that point back to a recursive struct,
18:01 - 18:05

the one data member that the map will have at the top is just the root
18:05 - 18:09

pointer. The pointer to the node at the top of the tree,
18:09 - 18:12

and then from there we'll traverse pointers to find our way to other ones.
18:12 - 18:15

And we will organize that tree for the binary search property.
18:15 - 18:16

So using the
18:16 - 18:20

key as the discriminate of what goes left and right,
18:20 - 18:25

we'll compare each new key to the key at the root and decide is it going to go in the left subtree
18:25 - 18:27

root or the right subtree root? And so on down to the
18:27 - 18:31

matching position when we're doing a find, or to the place to insert if we're ready to
18:31 - 18:33

put a new node in.
18:33 - 18:36

So get value and add will have very similar structures, in terms of
18:36 - 18:38

searching that tree
18:38 - 18:39
18:39 - 18:40

from the root
18:40 - 18:45

down to the leaf to find that match along the way, or to
18:45 - 18:49

report where to put a new one in. So
18:49 - 18:52

let's look at the actual code; it has
18:52 - 18:54

my struct, it
18:54 - 18:57

has the strike key, it has the value type node,
18:57 - 19:00

two pointers there, and then the one data member for the map
19:00 - 19:03

itself here is the pointer to the root node. I
19:03 - 19:09

also put a couple of helper number functions that I'm going to need.
19:09 - 19:10
19:10 - 19:13

Searching the tree and entering a new node in the tree are going to require a little
19:13 - 19:18

helper, and we're going to see why that is. That basically that the add and get value public
19:18 - 19:20

number functions are going to be just wrappers
19:20 - 19:23

that turn and pass the code off to these recursive helpers.
19:23 - 19:26

And if these cases, the additional information that each of these is
19:26 - 19:28

tracking is what node we're at
19:28 - 19:32

as we work our way down the tree. But the initial call to add is just going to say, well
19:32 - 19:34

here's a key to value go put it in the right place.
19:34 - 19:37

And that while we're doing the recursion we need to know where we are in the tree. So
19:37 - 19:39

we're adding that extra parameter.
19:39 - 19:43

What a lot of recursive code needs is just a little more housekeeping, and the
19:43 - 19:44

housekeeping is
19:44 - 19:51

what part of the tree we're currently searching or trying to enter into.
19:51 - 19:54

So this is the outer call for get value.
19:54 - 19:57

So get value is really just a wrapper,
19:57 - 19:59

it's going to call tree search. It's going to
19:59 - 20:03

ask the starting value for root. It's the
20:03 - 20:04

beginning point for the search,
20:04 - 20:07

looking for a particular key. And what tree searches are going to return is a
20:07 - 20:10

pointer to a node. If
20:10 - 20:14

that node was not found, if there was no node that has a string
20:14 - 20:17

key that matches the one that we're looking for, then we'll return null. Otherwise, we'll return
20:17 - 20:18

the node.
20:18 - 20:20

And so it either will error
20:20 - 20:23

when it didn't find it or
20:23 - 20:25

pull out the value that was out of
20:25 - 20:25

that
20:25 - 20:27

node structure.
20:27 - 20:30

Well, this part looks pretty okay. The template, right, there's always just a
20:30 - 20:33

little bit of goo up there, right, seeing kind of
20:33 - 20:36

the template header on the top and the use of the val type, and all this other stuff.
20:36 - 20:39

It's going to get a little bit worse here. So just kind of
20:39 - 20:42

hopefully take a deep breath and say, okay that part actually
20:42 - 20:46

doesn't scare me too much. We're going to look at what happens when I write the tree search code, where
20:46 - 20:48

it really actually goes off and does the search down the tree. And there's going to be
20:48 - 20:49
20:49 - 20:52

a little bit more machinations required on this. So let's just talk about it first,
20:52 - 20:55

just what the code does. And then I'm going to come back and talk about some of the syntax that's
20:55 - 20:58

required to make this work correctly.
20:58 - 21:02

So the idea of tree search is, given a pointer to a particular
21:02 - 21:02

subtree, it
21:02 - 21:05

could be the root of the original tree or some
21:05 - 21:07

smaller subtree further down, and a key that we're looking for, we're
21:07 - 21:10

going to return a match when we find that node.
21:10 - 21:15

So looking at the step here, where if the key that we have matches the key of the node
21:15 - 21:17

we're looking at, then this is the node.
21:17 - 21:19

This is the one we wanted.
21:19 - 21:23

Otherwise, right, we look at two cases. If the key is less than this node's key,
21:23 - 21:26

then we're just going to return the result from searching in the left subtree.
21:26 - 21:31

Otherwise it must be greater than, and we're going to search in the right subtree.
21:31 - 21:34

So this case, right, duplicates aren't allowed, so there will be exactly one match, if
21:34 - 21:37

at all.
21:37 - 21:39

And when we find it we return.
21:39 - 21:43

The other case that needs to be handled as part of base case is if what if we ever get to an
21:43 - 21:44

empty tree.
21:44 - 21:47

So we're looking for 32 and the last node we just passed was
21:47 - 21:50

35. We needed to be the left of it, and it turns out there was nothing down there, we
21:50 - 21:52

have an empty tree.
21:52 - 21:54

We'll hit this case where T is null, and we'll return null. That means there was never
21:54 - 21:58

a match to that. There's no other place in the tree
21:58 - 22:01

where that 32 could have been. The binary search property completely dictates the
22:01 - 22:04

series of left and right turns. There's nowhere else to look.
22:04 - 22:06

So that's actually, really solving
22:06 - 22:07
22:07 - 22:11

a lot of our - cleaning up our work for us, streamlining where we need to go;
22:11 - 22:14

because the tree is organized through the
22:14 - 22:16

mid point and the quarter point and then the eighth point
22:16 - 22:19

to narrow down to where we could go. So as
22:19 - 22:21

it is, this code works correctly.
22:21 - 22:25

So do you have any questions about how it's thinking about doing its work?
22:25 - 22:30

And then I'm going to show you why the syntax is a little bit wrong though.
22:30 - 22:32

Okay, so let's talk about this
22:32 - 22:34

return type coming out of
22:34 - 22:36

the function.
22:36 - 22:38

So let's go back to this for a second.
22:38 - 22:41

That node is defined
22:41 - 22:43

as a private member
22:43 - 22:46

of the map class.
22:46 - 22:48

So node is actually not a global type.
22:48 - 22:53

It does not exist outside the map class. Its real name is
22:53 - 22:56
22:56 - 22:58

map
22:58 - 23:00

val type::
23:00 - 23:03

node.
23:03 - 23:05

That's its full name,
23:05 - 23:08

is that. Now,
23:08 - 23:08

inside
23:08 - 23:10

the implementation of map,
23:10 - 23:14

which is to say inside the class map, or inside member functions that we're
23:14 - 23:17

defining later, you'll not that we can just kind of drop
23:17 - 23:21

the big full name. The kind of my name is mister such and such, and such and
23:21 - 23:26

such; it's like okay, you can just call it node, the short name internal to the class
23:26 - 23:28

because it's inside that scope. And
23:28 - 23:31

it's able to say, okay in this scope is there something called node? Okay there is. Okay, that's the
23:31 - 23:34

one. Outside of that scope, though, if you were trying to use that outside of the
23:34 - 23:37

class or some other places, it's going to
23:37 - 23:40

have a little bit more trouble with this idea of node by itself. It's going to need to see
23:40 - 23:42

that full name. Let
23:42 - 23:44

me show you on this piece of code,
23:44 - 23:46

that node here
23:46 - 23:51

happens to be in the slight quirk of the way sequel plus defines things, that that
23:51 - 23:56

use of node is not considered in class scope yet.
23:56 - 23:57

That
23:57 - 24:01

the compiler leads from left to write, so we'll look at the first one. It says, template
24:01 - 24:04

type name val type, and so what the compiler sees so far is, I'm about to
24:04 - 24:07

see something that's templated on val type.
24:07 - 24:10

So it's a pattern for which val type is a placeholder in.
24:10 - 24:12

And then it sees val type
24:12 - 24:16

as the return value, and it says, okay, that's the placeholder. And then it sees this map :: and
24:16 - 24:18
24:18 - 24:21

it says, okay, this is within the - the thing
24:21 - 24:26

that I'm defining here is within scope of the map class. And so it sees the map
24:26 - 24:26

::
24:26 - 24:29

and that as soon as it crosses those double colons,
24:29 - 24:31

from that point forward
24:31 - 24:33

it's inside the map scope.
24:33 - 24:34

And so,
24:34 - 24:37

when I use things like tree search, or I use the name node, it knows what I'm
24:37 - 24:40

talking about. It says, oh there's a tree search in map class. There's a node struct in map
24:40 - 24:43

class, it's all good.
24:43 - 24:47

In the case of this one I've seen template type and val type, and
24:47 - 24:50

at this point it doesn't know anything yet.
24:50 - 24:53

And so it sees me using node, and it say,
24:53 - 24:54

node I've
24:54 - 24:55

never heard of node.
24:55 - 24:58

I'm in the middle of making some template based on val type, but I haven't heard
24:58 - 25:01

anything about node. Node isn't something that exists in the global name space.
25:01 - 25:03

You must have made some
25:03 - 25:04

tragic error.
25:04 - 25:08

What it wants you to say is
25:08 - 25:09

to give it the full name
25:09 - 25:12

of map :: node.
25:12 - 25:14

That the real full
25:14 - 25:16

this is my name,
25:16 - 25:21

and otherwise it will be quite confused.
25:21 - 25:24

If that weren't enough,
25:24 - 25:26

we'd like to think that we were done
25:26 - 25:27

with
25:27 - 25:30

placating the C compiler. But it turns out there's one other little
25:30 - 25:32

thing that has to happen here,
25:32 - 25:34

which is there is an obscure
25:34 - 25:38

reason why, even though I've given it the full name, and I feel like I've done my
25:38 - 25:40

duty in describing this thing,
25:40 - 25:43

that the compiler still is a little bit confused
25:43 - 25:45

by this usage.
25:45 - 25:48

And because the C language is just immensely complicated,
25:48 - 25:50

there are a lot of funny interactions,
25:50 - 25:52

there are a couple of situations it can get in to,
25:52 - 25:53

where
25:53 - 25:56

because it's trying hard to read left to right and see the things.
25:56 - 25:59

Sometimes it needs to know some things in advance that it doesn't have all the
25:59 - 26:00

information for.
26:00 - 26:03

And it turns out in this case there's actually some obscure situation where
26:03 - 26:06

it might not realize this is really a type.
26:06 - 26:08

It might think it was
26:08 - 26:10

more of a global constant
26:10 - 26:14

that we actually have to use the type name key word again here
26:14 - 26:16

on this one
26:16 - 26:17

construction.
26:17 - 26:20

It is basically a hint to the compiler
26:20 - 26:22

that says, the next thing coming up
26:22 - 26:25

really is a type name.
26:25 - 26:28

So given that you can't tell that, you'd think it would be obvious; in
26:28 - 26:31

this case it doesn't seem like there's a lot for it to be confused at, but there's
26:31 - 26:33

actually some other constructions that we can't rule out.
26:33 - 26:37

So the compiler is a little bit confused in this situation, and it wants you to
26:37 - 26:38

tell it in advance,
26:38 - 26:41

and so I'll tell you that the basic rules for when you're going to need to use this is
26:41 - 26:45

exactly in this and only this situation.
26:45 - 26:48

And I'll tell you what the things are that are required for type name to
26:48 - 26:50

become part of your
26:50 - 26:51

repertoire.
26:51 - 26:53

You have a return type
26:53 - 26:54

on a member function
26:54 - 26:58

that is defined as part of a class that is a template,
26:58 - 27:00

and that type that you are using
27:00 - 27:03

is defined within the scope of that template.
27:03 - 27:07

So it's trying to use something that's within a template class scope
27:07 - 27:08

outside of that scope
27:08 - 27:09
27:09 - 27:11

that requires this type name.
27:11 - 27:14

And the only place where we see things used out of their scope is on the
27:14 - 27:15

return value.
27:15 - 27:19

So this is kind of the one
27:19 - 27:21

configuration that causes you to have to do this.
27:21 - 27:22
27:22 - 27:25

It's really just, you know, a little bit of a C quirk.
27:25 - 27:28

It doesn't change anything about what the code's doing or anything interesting about binary
27:28 - 27:31

search trees, but it will come up in any situation where you have this helper member
27:31 - 27:34

function that wants to return something that's templated. You
27:34 - 27:35

have
27:35 - 27:37

to placate the compiler there.
27:37 - 27:39

The error message when you don't do it, actually turns out to be very good in
27:39 - 27:45

GCC and very terrible in visual studio. GCC says type name required here,
27:45 - 27:47

and visual studio says something completely
27:47 - 27:48

mystical.
27:48 - 27:49

So it
27:49 - 27:52

is something to be just a little bit careful about when you're trying to write
27:52 - 27:56

that kind of
27:56 - 27:58

code. Question? The asterisk? The asterisk is just because it returns a pointer.
27:58 - 28:00
28:00 - 28:03

In all the situations, I'm always returning a pointer to the node back
28:03 - 28:04

there.
28:04 - 28:16

So here's the pointer to the match I found or node when I didn't find one. All right, that's a little bit of grimy code. Let's keep going.
28:16 - 28:17

So how is it
28:17 - 28:18

that add
28:18 - 28:22

works? Well, it starts like get value, that
28:22 - 28:23

the
28:23 - 28:27

strategy we're going to use for inserting into the tree is the really simple one that say,
28:27 - 28:30

well don't rearrange anything that's there. For example, if I were trying to put
28:30 - 28:32

the number 54 into this tree,
28:32 - 28:33

you could say - well
28:33 - 28:36

you could imagine putting it at the root and moving things around. Imagine
28:36 - 28:40

that your goal is to put it in with a minimal amount of rearrangement.
28:40 - 28:44

So leaving all the existing nodes connected the way they are,
28:44 - 28:47

we're going to follow the binary search path of
28:47 - 28:49

the get value code
28:49 - 28:51

to find the place where 54 would have been
28:51 - 28:54

if we left everything else in tact.
28:54 - 28:57

Looking at 57 you could say, well it has to be to the left of
28:57 - 29:01

57. Looking to the 32 you say, it has to be to the right of
29:01 - 29:01

32.
29:01 - 29:05

Here's where we get some empty tree, so we say, well that's the place where we'll put it.
29:05 - 29:06

We're going to tack it off, kind
29:06 - 29:12

of hang it off one of the empty trees
29:12 - 29:15

along the front tier, the bottom of that tree.
29:15 - 29:18

And so, we'll just walk our way down to where we fall off the tree, and put the
29:18 - 29:19

new node in place there.
29:19 - 29:22

So if I want the node 58 in, I'll say it
29:22 - 29:25

has to go to the right; it has to go to the left; it has to go to the left; it has to go to the left; it
29:25 - 29:27

will go right down there. If
29:27 - 29:30

I want the node 100 in, it has to go right, right, right, right, right.
29:30 - 29:35

That just kind of following the path, leaving all the current pointers and
29:35 - 29:40

nodes in place, kind of finding a new place. And there's exactly one
29:40 - 29:43

that given any number in a current configuration
29:43 - 29:46

and the goal of not rearranging anything, there's exactly one place to put
29:46 - 29:48

that new node in there
29:48 - 29:49

that will
29:49 - 29:51

keep the binary search property in tact.
29:51 - 29:55

So it actually does a lot of decision making for us.
29:55 - 29:58

We don't have to make any hard decisions about where to go. It just all this
29:58 - 30:05

left, right, left, right based on less than and greater than. What if the values are equal? If
30:05 - 30:09

the values are equal you have to have a plan, and it could be
30:09 - 30:11

that you're using all the less than or equal to or greater than
30:11 - 30:14

or equal to. You just have to have a plan. In terms of our map it turns out to
30:14 - 30:17

equal to all the rights.
30:17 - 30:20

So we handle it differently, which we don't every put a duplicate into the
30:20 - 30:23

tree.
30:23 - 30:26

So the add call is going to basically just be a wrapper
30:26 - 30:29

that calls tree enter, starting from the root
30:29 - 30:31

with the key and the value
30:31 - 30:34

that then goes and does that work, walks its way down to the bottom to
30:34 - 30:40

find the right place to stick the new one in.
30:40 - 30:43

Here is
30:43 - 30:44

a
30:44 - 30:49

very dense little piece of code that will put a node into a tree
30:49 - 30:52

using the strategy that we've just talked about of,
30:52 - 30:55

don't rearrange any of the pointers, just walk your way down to where you find an empty
30:55 - 30:59

tree. And once you find that empty tree, replace the empty tree with a singleton node
30:59 - 31:02

with the new key and value that you have.
31:02 - 31:06

So let's not look at the base case first. Let's actually look at the
31:06 - 31:09

recursive cases, because I think that they're a little bit easier to think about.
31:09 - 31:13

Is if we every find a match, so we have a key coming in, and it matches
31:13 - 31:14

the current node, then we just overwrite.
31:14 - 31:17

So that's the way the map handles
31:17 - 31:18

insertion of a duplicate value.
31:18 - 31:21

Now, in some other situation you might actually insert one. But in our case it turns out we
31:21 - 31:24

never want a duplicate. We always want to take the existing value and replace
31:24 - 31:26

it with a new one.
31:26 - 31:28

If it matches, we're done.
31:28 - 31:29

Otherwise,
31:29 - 31:32

we need to decide whether we go left or right.
31:32 - 31:36

Based on the key we're attempting to insert or the current node value, it either
31:36 - 31:39

will go in the left of the subtree, left of the right, in which case we make
31:39 - 31:44

exactly one recursive call either to the left or to the right depending on the
31:44 - 31:46

relationship of key to the current
31:46 - 31:48

key's value.
31:48 - 31:49

And then
31:49 - 31:52

the one base case that everything eventually gets down to, that's going to create a
31:52 - 31:53

new node,
31:53 - 31:57

is when you have an empty tree,
31:57 - 31:59

then that says that you have followed the path that
31:59 - 32:04

has lead off the tree and fallen off the bottom. It says replace that with
32:04 - 32:05

this new singleton node.
32:05 - 32:07

So we make a new node out in the heap.
32:07 - 32:09

We're assigning the key
32:09 - 32:12

based on this. We're assigning the value based on the parameter coming in. And then we set
32:12 - 32:14

its left and right null to do
32:14 - 32:16
32:16 - 32:20

a new leaf node, right, it has left and right null subtrees.
32:20 - 32:22

And the way this got wired in the tree,
32:22 - 32:26

you don't see who is it that's now pointing to this tree, this new
32:26 - 32:28

singleton node that I placed into here?
32:28 - 32:31

You don't typically see that wire that's coming into it. Who
32:31 - 32:33

is pointing to it
32:33 - 32:34

is actually being done
32:34 - 32:39

by T being passed by reference.
32:39 - 32:42

The node right above it,
32:42 - 32:44

the left or the right
32:44 - 32:48

of what's going to become the parent of the new node being entered,
32:48 - 32:51

was passed by reference. It previously was null,
32:51 - 32:55

and it got reassigned to point to this new node. So the kind of wiring in of
32:55 - 32:58

that node is happening by the pass by reference of the pointer, which is a
32:58 - 33:04

tricky thing to kind of watch and get your head around. Let's
33:04 - 33:06

look at the code operating. Do you have a
33:06 - 33:13

question? Yes. [Inaudible]. Can you
33:13 - 33:14

explain
33:14 - 33:16
33:16 - 33:17

that little step?
33:17 - 33:22

Instructor: Well this step is just something in the case of the map, if we every find an existing entity that has that key, and
33:22 - 33:23

we overwrite.
33:23 - 33:27

So in the map you said previously, Bob's phone number is this, and then you install a new
33:27 - 33:30

phone number on Bob. If it finds a map to Bob saying Bob was previously entered, it just
33:30 - 33:34

replaces his value. So no nodes are created, no pointers are rearranged;
33:34 - 33:38

it just commandeers that node and changes its value.
33:38 - 33:43

That's the behaviors specified by map.
33:43 - 33:45

So if we have this
33:45 - 33:46

right now, we've got
33:46 - 33:49

some handful of fruits that are installed
33:49 - 33:53

in our tree with some value. It's doesn't matter what the value is, so I didn't even bother to fill
33:53 - 33:57

it in to confuse you. So this is what I have, papaya is my root node, and then
33:57 - 34:01

the next level down has banana and peach, and the pear and what not. So
34:01 - 34:03

root is pointing to papaya right now.
34:03 - 34:06

When I make my call to insert a new value,
34:06 - 34:09

let's say that the one that I want to put in the tree now is orange,
34:09 - 34:13

the very first call to tree enter looks like this,
34:13 - 34:16

T is the reference parameter that's coming in, and
34:16 - 34:20

it refers to root.
34:20 - 34:24

So the very first call to tree enter says please insert
34:24 - 34:27

orange and its value
34:27 - 34:28

into the tree
34:28 - 34:31

who is pointed to by
34:31 - 34:34

root. So it actually has that by reference. It's going to need that by
34:34 - 34:37

reference because it's actually planning on updating that
34:37 - 34:39

when it finally installs the new node.
34:39 - 34:43

So the first call to tree enter says, okay well is orange less than or greater than
34:43 - 34:45

papaya.
34:45 - 34:47

Orange precedes papaya
34:47 - 34:48

in the alphabet,
34:48 - 34:50

so okay, well call tree enter
34:50 - 34:52

the second time
34:52 - 34:53

passing
34:53 - 34:58

the left subtree of the current value of tee. And so T
34:58 - 35:00

was a reference to root right then,
35:00 - 35:04

now T becomes a reference to the left subtree coming out of papaya. So it says, now
35:04 - 35:06

okay, for the tree
35:06 - 35:10

that we're talking about here, which is the one that points to the banana and
35:10 - 35:14

below. That's the new root of that subtree is banana, it says, does orange go
35:14 - 35:15

to the left or right
35:15 - 35:19

of the root node
35:19 - 35:21

banana? And it goes to the right. So the third stack frame
35:21 - 35:24

of tree enter now says, well the reference pointer
35:24 - 35:28

is now looking at the right subtree coming out of banana. And it
35:28 - 35:29

says, well does melon go to
35:29 - 35:31

the left or the right of
35:31 - 35:32

that?
35:32 - 35:34

It goes to the right,
35:34 - 35:36

so now on this call, right,
35:36 - 35:40

the current value of T is going to be null. It is a reference, so a synonym for
35:40 - 35:44

what was the right field coming out of the melon node,
35:44 - 35:47

and in this is the case where it says, if orange was in the tree,
35:47 - 35:50

then this is where it would be hanging off of. It would be coming off of the
35:50 - 35:51

right
35:51 - 35:52
35:52 - 35:53

subtree of melon.
35:53 - 35:55

That part is null,
35:55 - 35:58

and so that's our place. That's the place where we're going to take out the
35:58 - 35:59

existing null,
35:59 - 36:00

wipe over it,
36:00 - 36:04

and put in a new pointer to the node orange
36:04 - 36:06

that's out here.
36:06 - 36:09

And so that's how the thing got wired in. It's actually kind of
36:09 - 36:12

subtle to see how that happened, because you don?t ever see a
36:12 - 36:16

direct descendant, like T is left equals this, equals that new node; or T is
36:16 - 36:17

right equals that new
36:17 - 36:20

node. It happened by virtue of passing T right
36:20 - 36:22

by reference into the
36:22 - 36:26

calls further down the stack. That eventually, right when it hit the base
36:26 - 36:26

case,
36:26 - 36:30

it took what was previously a pointer to null and overwrote it with a pointer to this
36:30 - 36:31

new cell,
36:31 - 36:37

that then got attached into the tree. Let's take a
36:37 - 36:38

look at this piece
36:38 - 36:40

of code.
36:40 - 36:44

That by reference there turns out to be really, really important.
36:44 - 36:46

If that
36:46 - 36:49

T was just being passed as an ordinary node star,
36:49 - 36:52

nodes would just get lost, dropped on the floor like crazy.
36:52 - 36:56

For example, consider the very first time when you're passing in root and root
36:56 - 36:58

is empty. If root is null,
36:58 - 36:59

it would say, well if root is null
36:59 - 37:02

and then it would change this local parameter T; T is some new node of these
37:02 - 37:06

things. But if it really wasn't making permanent changes to the root,
37:06 - 37:08

really wasn't making root pointer of that new node, root
37:08 - 37:12

would continue pointing to null, and that node would just get orphaned. And
37:12 - 37:14

every subsequent one, the same thing would happen.
37:14 - 37:17

So in order to have this be a persistent change,
37:17 - 37:19

the pointer that's coming in,
37:19 - 37:22

either the root in the first case or T left or T
37:22 - 37:23

right
37:23 - 37:25

in subsequent calls,
37:25 - 37:30

needed to have the ability to be permanently modified when we attach that new
37:30 - 37:33

subtree.
37:33 - 37:37

That's really very tricky code, so I would be happy to
37:37 - 37:39

try to answer
37:39 - 37:41

questions or go through something to try to get your head around what
37:41 - 37:44

it's supposed to be doing. If you could ask a question I
37:44 - 37:47

could
37:47 - 37:49

help you. Or you could just nod your head and say, I'm just totally fine.
37:49 - 37:52

Could you use the tree search algorithm that you programmed before?
37:52 - 37:53

They're
37:53 - 37:58

very close, right, and
37:58 - 37:59

where did I put it?
37:59 - 38:02

It's on this slide.
38:02 - 38:05

The problem with this one right now is it does really stop and return the
38:05 - 38:07

null. So I could actually unify them, but it would take a little more
38:07 - 38:11

work, and at some point you say, wow I could unify this
38:11 - 38:15

at the expense of making this other piece of code a little more complicated. So in this case I chose
38:15 - 38:18

to keep them separate. It's not impossible to unify them, but you end up
38:18 - 38:19

having to point to
38:19 - 38:22

where the node is or a null where you would insert a new node. Tree
38:22 - 38:26

search doesn't care about, but tree enter does. It sort of would
38:26 - 38:29

modify it to fit the needs of both at the expense of making it a little more awkward
38:29 - 38:38

on both sides. Question here.
38:38 - 38:43

[Inaudible] the number example.
38:43 - 38:47

If you were inserting 54 where you said, if you called print,
38:47 - 38:49

wouldn't it print
38:49 - 38:51

out of order? So
38:51 - 38:56

54 would go left of 57, right of 32, so it'd go over here.
38:56 - 38:59

But the way in order traversals work is says,
38:59 - 39:03

print everything in my left subtree, print everything in my left subtree, left subtree. So it would kind of print seven
39:03 - 39:07

as it came back five, 32, and then it would print 54 before it got to the
39:07 - 39:08

57. So
39:08 - 39:12

an in order traversal of a binary search tree, as long as the binary search tree
39:12 - 39:16

is properly maintained, will print them in sorted order.
39:16 - 39:19

It's almost like, don't even think too hard about it, if
39:19 - 39:22

everything in my left subtree is smaller, and everything in my right subtree is greater than me,
39:22 - 39:26

if I print all of those I have to come out in sorted order.
39:26 - 39:30

If the property is correctly maintained in the tree, then that
39:30 - 39:31

by definition
39:31 - 39:34

is a sorted way to pass through all the numbers. So if
39:34 - 39:38

at any point an in order traversal is hitting numbers out of order, it would mean that the tree wasn't
39:38 - 39:40

properly
39:40 - 39:51

organized.
39:51 - 39:55

So let's talk about how that works, there are a
39:55 - 39:59

couple of other operations on the tree, things like doing the size,
39:59 - 40:01

which would count the number of members. Which you could do by just doing a traversal, or you
40:01 - 40:04

might do by just caching a number
40:04 - 40:07

that you updated as you added and moved nodes. And
40:07 - 40:10

the contains keys, that's basically just a tree search
40:10 - 40:14

that determines whether it found a node or not.
40:14 - 40:17

The rest of the implementation is not interest. There's also deleting, but I
40:17 - 40:18

didn't show you
40:18 - 40:23

that. The space use, how does this relate to the other known
40:23 - 40:24

implementations that we have for math?
40:24 - 40:27

It adds certainly some space overhead. We're
40:27 - 40:31

going to have two pointers, a left and a right off every entry. So every entry is
40:31 - 40:35

guaranteed to have an excess 8-byte capacity
40:35 - 40:37

relative to the data being stored.
40:37 - 40:40

It does actually add those tightly to the allocation. So like a link
40:40 - 40:43

list, it allocates nodes on a per
40:43 - 40:46

entry basis, so there's actually not a lot of
40:46 - 40:49

reserve capacity that's waiting around unused the way it might been in a
40:49 - 40:51

vector situation.
40:51 - 40:53

And the performance of it is that it's based on the
40:53 - 40:55

height of the tree,
40:55 - 41:00

that both the add and the get value are doing a traversal starting from the root and
41:00 - 41:03

working their way down to the bottom of the tree, either finding what it wanted
41:03 - 41:05

along the way
41:05 - 41:07

or falling off the bottom of the tree
41:07 - 41:11

when it wasn't found or when it needs to be inserting a new node.
41:11 - 41:15

That height of the tree, you're expecting it to be logarithmic. And so
41:15 - 41:17

if I look at, for example, the values, I've got
41:17 - 41:22

seven values here, two, eight, 14, 15, 20, and 21.
41:22 - 41:23

Those seven,
41:23 - 41:27

eight values can be inserted in a tree to get very
41:27 - 41:28

different results.
41:28 - 41:30

There's not one binary search tree
41:30 - 41:33

that represents those values
41:33 - 41:37

uniquely. Depending on what order you manage to insert them you'll have
41:37 - 41:38

different things.
41:38 - 41:39

For example,
41:39 - 41:41

given the way we're doing it,
41:41 - 41:44

whatever node is inserted first will be the root, and we never move the
41:44 - 41:48

root. We leave the root along. So if you look at this as a historical
41:48 - 41:50

artifact and say, well they entered a bunch of numbers,
41:50 - 41:52

and this is the tree I got.
41:52 - 41:55

I can tell you for sure that 15 was the very first number that they entered.
41:55 - 41:57

There's no other possibility, 15 got entered first.
41:57 - 42:01

The next number that got inserted was one of eight or 20, I don't know
42:01 - 42:03

which.
42:03 - 42:06

I can tell you, for example, that eight got inserted before 14. In terms
42:06 - 42:09

of a level-by-level arrangement, the ones on level one
42:09 - 42:13

were always inserted before the ones on level two.
42:13 - 42:16

But one possibility I said here was well maybe 15 went in first,
42:16 - 42:18

then eight, and then two,
42:18 - 42:21

and then 20, and then 21, and then 14, and then 18. There
42:21 - 42:25

are some other rearrangements that are also possible that would create the exact same
42:25 - 42:26

tree.
42:26 - 42:30

And there are other variations that would produce a different, but still valid,
42:30 - 42:32

binary search tree.
42:32 - 42:33

So with these seven nodes, right,
42:33 - 42:36

this tree is height three,
42:36 - 42:39

and is what's considered perfectly balanced. Half the nodes are on the
42:39 - 42:42

left and the right of each tree all the way through it,
42:42 - 42:45

leading to that just beautifully balanced
42:45 - 42:46
42:46 - 42:47

result we're hoping for
42:47 - 42:52

where both our operations, both add and get value, will be logarithmic. So
42:52 - 42:54

if I have
42:54 - 42:56

1,000
42:56 - 42:59

nodes inserted, that height tree should be about ten. And if everything kind of went
42:59 - 43:03

well and we got kind of went well, and we got an even split all the way around, then it would take ten
43:03 - 43:06

comparisons to find something in that tree or determine it wasn't there' and
43:06 - 43:07

also to insert,
43:07 - 43:10

which was the thing we were having trouble with at the vector case, that
43:10 - 43:13

because that search leads us right to the place where it needs to go,
43:13 - 43:16

and the inserting because of the flexibility of the pointer base structure is
43:16 - 43:17

very easy,
43:17 - 43:21

we can make both that insert and add operate logarithmically.
43:21 - 43:24

Let's think about some cases that aren't so good.
43:24 - 43:28

I take those same seven numbers and I insert them
43:28 - 43:32

not quite so perfectly, and I get a not quite so even result.
43:32 - 43:36

So on this side I can tell you that 20 was the first one inserted.
43:36 - 43:38

It was the root and the root doesn't move,
43:38 - 43:41

but then subsequent ones was inserted an eight, and then a 21, and then an 18,
43:41 - 43:46

and then a 14, and then a 15, and then it went back and got that two,
43:46 - 43:47

the one over there.
43:47 - 43:50

Another possibility for how it got inserted
43:50 - 43:51
43:51 - 43:53

18, 14, 15, and so on.
43:53 - 43:54

These are
43:54 - 43:58

not quite as balanced as the last one, but they're not terribly unbalanced.
43:58 - 44:00

They're like one or two
44:00 - 44:03

more than the perfectly balanced case.
44:03 - 44:09

So at logarithmic plus a little constant still doesn't look
44:09 - 44:11

that bad. Okay, how bad can it really get?
44:11 - 44:15

It can get really bad.
44:15 - 44:18

I can take those seven numbers and insert them in the worst
44:18 - 44:21

case, and produce a completely degenerate tree
44:21 - 44:24

where, for example, if I insert them in sorted order
44:24 - 44:25

or in reverse sorted order,
44:25 - 44:26
44:26 - 44:28

I really have a link list.
44:28 - 44:32

I don't have a tree at all. It goes two, 18, 14, 15, 20, 21,
44:32 - 44:34

that's still a valid binary search tree.
44:34 - 44:35

If you
44:35 - 44:38

print it in order, it still comes out in sorted order.
44:38 - 44:41

It still can be searched using a binary search strategy.
44:41 - 44:45

It's just because the way it's been divided up here, it's not buying
44:45 - 44:48

us a lot. So if I'm looking for the number 22, I'll say, well
44:48 - 44:52

is it to the left or the right of two? It's to the right. Is it to the left or right of
44:52 - 44:55

eight? Oh, it's to the right. Is it to the left or right, you know like all the way down, so finding
44:55 - 44:57

22 looking at every single one
44:57 - 45:01

of the current entries in the tree, before I could conclude, as I fell off the
45:01 - 45:04

bottom, that it wasn't there.
45:04 - 45:06

Similarly kind of over there;
45:06 - 45:09

it didn't buy us a lot. It has a lot to do with
45:09 - 45:13

how we inserted it as to how good a balance we're going to get.
45:13 - 45:16

And there are some cases that will produce just drastically unbalanced
45:16 - 45:17
45:17 - 45:21

things that require linear searches, totally linear to walk that way
45:21 - 45:22

down
45:22 - 45:25

to find something.
45:25 - 45:26

There's a couple of other
45:26 - 45:29

degenerate trees, just to mention it's not only the sorted or reverse sorted.
45:29 - 45:33

There's also these other kind of saw tooth inputs, they're called,
45:33 - 45:35

where you just happen to at any given point to
45:35 - 45:39

have inserted the max or the min of the elements that remain, so it
45:39 - 45:42

keeps kind of dividing the input into one and
45:42 - 45:44

N-1. So having the
45:44 - 45:47

largest and the smallest, and the largest and the smallest, and then
45:47 - 45:48

subsequent smallest down there
45:48 - 45:52

can also produce a linear line. It's not quite all just down the left or
45:52 - 45:55

down the right, but it does mean that every single node has an empty
45:55 - 45:59

left or right subtree is what's considered the degenerate case here. And
45:59 - 46:01

both have the same property where
46:01 - 46:04

half of the pointers off
46:04 - 46:08

of each cell are going nowhere.
46:08 - 46:10

There is an interesting relationship here
46:10 - 46:12

between these worst-case inputs for transversion and quick sort, and they're
46:12 - 46:14

actually kind of exactly one to one.
46:14 - 46:18

That what made quick sort deteriorate was the element of the input,
46:18 - 46:21

the max or the min of what remain. And that's exactly the same case
46:21 - 46:25

for transversion, max or min of what remains means that you are not dividing your
46:25 - 46:29

input into two partitions of roughly equal size. You're dividing them into the
46:29 - 46:33

one, N-1, which is getting you nowhere. What do you
46:33 - 46:37

get to do about it?
46:37 - 46:40

You have to decide first of all if it's a problem worth solving.
46:40 - 46:41
46:41 - 46:44

So that's what the first one say. Sometimes you just say that
46:44 - 46:46

it will sometimes happen in some
46:46 - 46:47

inputs
46:47 - 46:50

that I consider rare enough that it will degenerate. I don't think they're
46:50 - 46:52

important enough or common enough to make a big deal out of.
46:52 - 46:56

I don't think that applies here, because it turns out that getting your data in sorted order is
46:56 - 46:59

probably not unlikely. That somebody might be refilling a map
46:59 - 47:02

with data they read from a file that they wrote out in sorted order last time.
47:02 - 47:05

So saying that if it is coming in sorted,
47:05 - 47:09

then I'm probably going to tolerate that, is probably not going to work.
47:09 - 47:12

And there's two main strategies then, if you've agreed to take action about it,
47:12 - 47:14

about what you're going to do.
47:14 - 47:15
47:15 - 47:19

The first one is to wait until
47:19 - 47:21

the problem gets bad enough
47:21 - 47:23

that you decide to do something about it.
47:23 - 47:28

So you don't do any rebalancing as you go. You just let stuff go, left, right, left,
47:28 - 47:30

right, whatever's happening. And
47:30 - 47:31
47:31 - 47:34

you try to keep track of the height of your tree,
47:34 - 47:37

and you realize when the height of your tree seems sufficiently out of whack with
47:37 - 47:39

your expected logarithmic height,
47:39 - 47:42

to do something about it. Then you just fix the whole tree.
47:42 - 47:45

You can clean the whole thing up. Pull them all out into a vector, rearrange them,
47:45 - 47:49

put them back in, so the middlemost node is the root all the way down. And kind of
47:49 - 47:50

create the perfectly balanced tree,
47:50 - 47:52

and then keep going
47:52 - 47:54

and wait for it to get out of whack, and fix it again;
47:54 - 47:56

kind
47:56 - 47:58

of wait and see.
47:58 - 48:01

The other strategy, and this happens to be a more common one in
48:01 - 48:06

practice, is you just constantly do a little bit of fixing up all the time.
48:06 - 48:09

Whenever you notice that the tree is looking just a little bit lopsided, you let
48:09 - 48:12

it get a little bit; usually you'll let it be out of balance by one.
48:12 - 48:15

So you might have height four on that side and height three on that side.
48:15 - 48:17

But as soon as it starts to get to be three and five,
48:17 - 48:20

you do a little shuffle to bring them back to both four. And it
48:20 - 48:21

involves just a little bit of work
48:21 - 48:23

all the time
48:23 - 48:25

to where you just don't let
48:25 - 48:28

the tree turn into a real mess.
48:28 - 48:31

But there's a little bit of extra fixing up being done all the time.
48:31 - 48:35

And so the particular kind of tree that's discussed in the text, which you can read about,
48:35 - 48:37

but we won't cover
48:37 - 48:40

as part of our core material, is called the ADL tree. It's interesting to
48:40 - 48:43

read about it to see what does it take, what kind of
48:43 - 48:47

process is required to monitor this and do something about it. And
48:47 - 48:50

so if we have that in place,
48:50 - 48:51
48:51 - 48:53

we can get it to where
48:53 - 48:57

the binary search tree part of this is logarithmic on both
48:57 - 48:59

inserting new values and searching for values,
48:59 - 49:03

which is that Holy Grail of efficiency boundary. You can say, yeah logarithmic is
49:03 - 49:04

something we can tolerate
49:04 - 49:08

by virtue of adding 8 bytes of pointer, potentially some balance factor,
49:08 - 49:12

and some fanciness to guarantee that performance all over,
49:12 - 49:15

but gets us somewhere that will scale very, very well for thousands
49:15 - 49:18

and millions of entries, right, logarithmic is still a very, very fast
49:18 - 49:20

function. So it
49:20 - 49:24

creates a map that could really actually go the distance in scale. So, that's
49:24 - 49:28

your talk about binary search trees. We'll come back and talk about
49:28 - 49:38

hashing in graphs and stuff. I will see you on Wednesday.

Title:: Lecture 22 | Programming Abstractions (Stanford)
Description:: Lecture 22 by Julie Zelenski for the Programming Abstractions Course (CS106B) in the Stanford Computer Science Department.

Julie discusses map as a vector and describes a different, possibly better, strategy. The basics of trees and their usefulness and how they can be used with binary search is then introduced.

Complete Playlist for the Course:
http://www.youtube.com/view_play_list?p=FE6E58F856038C69

CS 106B Course Website:
http://cs106b.stanford.edu

Stanford Center for Professional Development:
http://scpd.stanford.edu/

Stanford University:
http://www.stanford.edu/

Stanford University Channel on YouTube:
http://www.youtube.com/stanforduniversity/

more » « less
Video Language:: English
Duration:: 49:45

	N. Ueda edited English subtitles for Lecture 22 \| Programming Abstractions (Stanford)
	Eunjeong_Kim added a translation

English subtitles

Revisions

Revision 2

N. Ueda

Lecture 22 | Programming Abstractions (Stanford)

Revisions

Our website uses cookies

Operating cookies (Required)