Lecture 21 | Programming Abstractions (Stanford)

Edit subtitles

0:00 - 0:02
0:02 - 0:10
0:10 - 0:13

This presentation is delivered by the Stanford Center for Professional
0:13 - 0:21

Development.
0:21 - 0:23

Let us
0:23 - 0:25

refresh about where we were at the end of Wednesday.
0:25 - 0:29

As we had talked about the vector form of the buffer, so the editor buffer that's
0:29 - 0:31

backing the
0:31 - 0:34

word processing client,
0:34 - 0:37

and the two stack version, where we're shuffling stuff, often on the before
0:37 - 0:40

and after stacks to get things done, right?
0:40 - 0:43

And when we were looking at the main six operations, we're looking at the four
0:43 - 0:47

cursor movements and the two editing operations, right, that we had traded off.
0:47 - 0:49

Where editing had been slow in the vector,
0:49 - 0:52

to where editing was now fast, but then these long distance movements were slow,
0:52 - 0:54

because of the shuffling.
0:54 - 0:57

And then there was a little discussion at the very end about how this probably
0:57 - 1:00

increases our space requirements, because we're now keeping two stacks that are
1:00 - 1:02

each capable of holding
1:02 - 1:04

the entire contents
1:04 - 1:06

as opposed to one vector here
1:06 - 1:09

with its slop.
1:09 - 1:10

So that was where we were at.
1:10 - 1:13

We are still in kind of the pursuit of this Holy Grail of could we
1:13 - 1:16

get it to where everything is fast. Is it always a matter of tradeoffs or could we just
1:16 - 1:19

invest some more smarts and more
1:19 - 1:20

cleverness
1:20 - 1:24

and hard work and squeeze it down to where we could get everything to perform
1:24 - 1:26

efficiently? So typically when computer scientists say they want something to perform
1:26 - 1:28

efficiently, they're hoping for log
1:28 - 1:32

in time or faster. Constant time is the best. Log in grows so slowly that in fact,
1:32 - 1:34

logarithm time is effectively constant
1:34 - 1:40

for all reasonable values of N up to millions and billions.
1:40 - 1:43

And so typically anything that is linear or higher is often a weak spot that you are
1:43 - 1:47

trying to look at, and certainly things that are quadratic or exponential
1:47 - 1:51

are definite opportunities for improvement.
1:51 - 1:53

So let's look at this idea of the link list,
1:53 - 1:55

so that both
1:55 - 1:56

the
1:56 - 1:58

vector and the stack
1:58 - 2:03

- we're relying on this idea that continuous memory is the
2:03 - 2:06

weak link that was causing some problems. In the vector, that shuffling to
2:06 - 2:08

move things up and down,
2:08 - 2:12

or in the case of the stack version, one side to the other
2:12 - 2:15

was part of what we were working up against.
2:15 - 2:18

And the link list promises to give us this opportunity to have these each
2:18 - 2:21

individual character being managed separately,
2:21 - 2:26

and flexibility might help resolve some of our problems here.
2:26 - 2:30

SO what we're going to look at is the idea of implementing the buffer using a link
2:30 - 2:32

list. So if I have
2:32 - 2:34

characters A, B, C, D, E,
2:34 - 2:37

I could have a link list here where I'm pointing to the head cell, which is A,
2:37 - 2:40

which points to B, and C, and D, and E, and down to the null.
2:40 - 2:43

SO what I'm modeling it as in the private data section will be a
2:43 - 2:44

little
2:44 - 2:47

link list node with a character in there, a pointer to the next one.
2:47 - 2:52

And then I'm going to keep two pointers, one to the front-most cell, and then that's going to
2:52 - 2:55

model where the cursor is within the buffer.
2:55 - 2:59

So rather than doing this like an index, an index was handy in the case of vector where we
2:59 - 3:00

had direct access, right?
3:00 - 3:02

Given the way the link list works,
3:02 - 3:05

knowing that it's at the fifth cell, the fourth cell isn't really very helpful. It would be more
3:05 - 3:09

helpful to have the pointer kind of directly in the midst of things to give us direct
3:09 - 3:12

access to where the cursor is.
3:12 - 3:14

But let's think a little bit about that cursor,
3:14 - 3:17

because there's an important decision to be made early about helping to
3:17 - 3:20

facilitate the later work we have to do.
3:20 - 3:24

I have the contents A, B, C, D, E,
3:24 - 3:28

so the cursor is actually between two characters.
3:28 - 3:31

If the cursor right now is situated
3:31 - 3:32

after the A and before the B,
3:32 - 3:33

what I'm
3:33 - 3:36

modeling is A cursor B,
3:36 - 3:38

C, D,
3:38 - 3:39

then
3:39 - 3:42

it seems like the obvious two choices for where the cursor might point is to
3:42 - 3:45

A or to B. And I
3:45 - 3:50

had said in the case of the vector that if it were index zero or index one
3:50 - 3:51

that
3:51 - 3:53

there wasn't a really strong preference for one over the other.
3:53 - 3:56

There is going to be a really good reason to pick one for the link list
3:56 - 3:59

over the other. Would you rather point to the A
3:59 - 4:11

for where the cursor is or would you rather point to the B? [Inaudible]
4:11 - 4:14

Exactly, so the idea is that when the cursor position is there, the next edit
4:14 - 4:18

actually goes in between the A and the B. So if I insert the character
4:18 - 4:21

Z, it really does go right here.
4:21 - 4:24

And if I am going to, if effect, wire that in and splice it in then list,
4:24 - 4:29

if I'm pointing to B then I'm kind of hosed, because I'm already one past where I need to be able slice in. Because I need to
4:29 - 4:33

know what was behind it to bring it in. So I'm pointing to the character behind the
4:33 - 4:34

cursor
4:34 - 4:36

gives you access to
4:36 - 4:39

wiring this down to here and this into there, and then updating the cursor to
4:39 - 4:40

point to the Z
4:40 - 4:44

without having to do any of the difficult backward looking backward
4:44 - 4:45

traversal.
4:45 - 4:48

So that strongly suggests what it is, it's going to point to
4:48 - 4:52

A when it's between A and B; B when it's between B and C; and so on.
4:52 - 4:56

There is another little point here, which is there are actually five letters
4:56 - 4:59

in the cursor, but there are six cursor positions. It could be at the very
4:59 - 5:00

beginning,
5:00 - 5:03

in between all these, or at the very end.
5:03 - 5:07

And the way we've chosen it right now, I can identify all of the five
5:07 - 5:08

positions
5:08 - 5:11

that have at least one character to the left, so being -
5:11 - 5:12

pointing to the A means it's
5:12 - 5:15

after A; pointing to the B it's after B; and so on, all the way down to the E.
5:15 - 5:18

But I don't have a cursor position for
5:18 - 5:23

representing when the cursor is at the very beginning of the buffer.
5:23 - 5:25

So
5:25 - 5:28

one strategy I could use, right, is to say I'll just use the special cursor
5:28 - 5:29

position of null. But
5:29 - 5:32

I need one more pointer value to hold onto,
5:32 - 5:36

and I could make a special case out of this because there's null. Once I've done that though,
5:36 - 5:40

I've actually committed myself to this thing that's going to require all of my code to be managing a
5:40 - 5:41

special case of,
5:41 - 5:43

well when the cursor's null do something a little different than when
5:43 - 5:46

the cursor is pointing to some cell. I'm going to
5:46 - 5:50

show you a way that we can kind of
5:50 - 5:54

avoid the need for making a special case out of it,
5:54 - 5:55

by
5:55 - 5:58

wasting a little memory and reorganizing our list
5:58 - 6:01

to allow for there to be a sixth cell,
6:01 - 6:04

and extra cell that we call the dummy cell.
6:04 - 6:07

So in this case that cell would go at the very front.
6:07 - 6:10

It would have no character, just let the character field be unstatused. It's
6:10 - 6:13

not really a character. What it is, is just a placeholder.
6:13 - 6:16

And when I created the buffer to begin with, even when it's empty, it's always going to have
6:16 - 6:20

that cell. I'm going to create that cell in advance, I'm going to have it sitting ready and waiting, and that cell
6:20 - 6:24

never gets deleted or moved or changed. In fact, the list always, the head
6:24 - 6:28

of this will always point to the same cell from the beginning of this object's lifetime
6:28 - 6:32

to the end. So it will never change this at all, and the cells that follow are
6:32 - 6:35

the ones that are going to be changing and growing and rearranging in
6:35 - 6:38

response to editing commands.
6:38 - 6:40

What this is going to buy us is a couple of things. First off, it's going to make it
6:40 - 6:42

so there is a sixth cursor position.
6:42 - 6:45

So now when the cursor is at the very beginning,
6:45 - 6:47

we can set the cursor to point to this dummy cell.
6:47 - 6:48

Okay
6:48 - 6:51

that's one thing that we solve by doing that.
6:51 - 6:55

Another thing that we've solved is actually we have made all the other
6:55 - 6:57

cells on the list, the ones that are being edited and manipulated.
6:57 - 7:00

We've made them all totally symmetric.
7:00 - 7:02

Previously there was always going to be a little bit of a special case about, well if you're
7:02 - 7:06

the front of the list, we're seeing that sometimes we're doing inserts on the
7:06 - 7:08

stack in queue link list. That
7:08 - 7:12

inserting later in the list involves kind of wiring in two pointers; one to command and one
7:12 - 7:13

to go out.
7:13 - 7:16

But there was a special case that oh, if you're the very first cell of the list, then
7:16 - 7:19

your outpointer is the same, but your incoming pointer is resetting the head.
7:19 - 7:22

That piece has actually gone away now,
7:22 - 7:26

that now every cell in there always has a predecessor. It always has a previous,
7:26 - 7:29

right? Something behind it so that actually they will always be
7:29 - 7:31

pulling the pointer in and pulling the pointer out
7:31 - 7:34

for every cell with character data that's being modified.
7:34 - 7:37

And that means some of that extra little handing of oh, if you're the
7:37 - 7:39

very first cell,
7:39 - 7:40

have gone away.
7:40 - 7:44

The very first cell is always this kind of dummy cell and doesn't require us to go
7:44 - 7:46

out of our way to do anything special for it.
7:46 - 7:49

So this is going to solve a bunch of problems for us, and it's kind of a neat technique. This thing
7:49 - 7:51

is used actually
7:51 - 7:52

fairly commonly
7:52 - 7:57

in situations just managing a link list just to buy yourself some symmetry in the
7:57 - 8:02

remaining cells.
8:02 - 8:05

So let me look at some code with you, and I'm going to actually draw some pictures as I
8:05 - 8:06

go. I'm
8:06 - 8:10

not going to write this code in real time, because I think it's actually more important to
8:10 - 8:11

see it
8:11 - 8:15

acting on things than it is to see me typing out the characters.
8:15 - 8:18

But the mechanism, right, in the case of the buffer here
8:18 - 8:21

is it's going to have a head point,
8:21 - 8:23

which points to the front most cell; and a
8:23 - 8:24

cursor,
8:24 - 8:28

which points to the character that proceeds the cursor. So if right now the
8:28 - 8:31

contents of the buffer
8:31 - 8:34

are
8:34 - 8:37

the contents A, D, C.
8:37 - 8:39

And let's say that right now
8:39 - 8:40

the
8:40 - 8:42

cursor is pointing to A.
8:42 - 8:45

Actually, I need my
8:45 - 8:50

dummy cell. Let me move everybody down one, so you just have dummy
8:50 - 8:52

cell and my A and my B.
8:52 - 8:56

And so the cursor right now is at the very front of the buffer.
8:56 - 8:59

This is kind of what it looks like. It has two characters, the A and the B.
8:59 - 9:03

I'm going to trace through the process of inserting a new character
9:03 - 9:04

into the buffer. It
9:04 - 9:07

is that my local variable CP
9:07 - 9:11

is going to point to a new cell allocated out in the heap.
9:11 - 9:15

Let's say the character I'm going to insert is the Z, so I'll
9:15 - 9:16

assign
9:16 - 9:18

CH to the character, the slot right there.
9:18 - 9:20

And then the next for this
9:20 - 9:24

gets to - what the cursor is - kind of following the cursor. So the
9:24 - 9:26

cursor is pointing to the cell before it,
9:26 - 9:29

and the next field of that cell tells us what follows it;
9:29 - 9:31

so if I trace my cursor's
9:31 - 9:33

next field, it points to A,
9:33 - 9:35

and so I'm going to go ahead and
9:35 - 9:41

set up this new cell's next field to also point to A.
9:41 - 9:44

So now I've got two different ways to get to the A cell, tracing off the dummy cell or
9:44 - 9:46

tracing off this new cell.
9:46 - 9:48

And then I
9:48 - 9:51

update the cursor's next field to point to CP.
9:51 - 9:55

So I just copied right there that CP's next field was pointing to A. I copied it down here
9:55 - 9:57

to kind of do the splice out.
9:57 - 10:02

Now I'm going to do the splice in,
10:02 - 10:05

causing it to no longer point to A,
10:05 - 10:06

but instead to
10:06 - 10:08

point to my new cell down here, CP.
10:08 - 10:10

And then I update the cursor
10:10 - 10:13

to point to this new cell. That just means that subsequent inserts, like the
10:13 - 10:18

C is kind of behind the cursor after I've made this
10:18 - 10:21
10:21 - 10:23

edit.
10:23 - 10:25

That goes away when that one ends,
10:25 - 10:28

and so now the new structure that I have here is
10:28 - 10:31

head still points to the dummy cell that never changes.
10:31 - 10:33

Dummy cell now points to Z,
10:33 - 10:36

Z points to A, Z points to B,
10:36 - 10:41

and then the cursor in this case is between the Z and the A,
10:41 - 10:44

by virtue of the fact that the cursor is pointing to Z.
10:44 - 10:50

A subsequent one would kind of do the same thing. If I typed out ZY
10:50 - 10:51

after I did this. We'd
10:51 - 10:55

make a new cell over here. It would get what
10:55 - 10:58

was currently coming out of the cursor's field. The next field, right, we would
10:58 - 11:02

update cursor's next field to point
11:02 - 11:05

to this new one. And then we would update cursor
11:05 - 11:08

to point to this guy. And
11:08 - 11:10

so now following the list, right, dummy cell,
11:10 - 11:11

Z,
11:11 - 11:16

Y, A, B, cursor pointing to the Y, which is the
11:16 - 11:20

character directly to the left of the cursor.
11:20 - 11:22

And so inserting in any position now,
11:22 - 11:25

front cell, middle cell, end cell,
11:25 - 11:29

doesn't require any special case handling. So you don't see any if of things.
11:29 - 11:33

And you notice that you'll never see an update to the head directly here.
11:33 - 11:37

That the head was set up to point to the dummy cell in the constructor
11:37 - 11:39

and never needs any adjusting.
11:39 - 11:42

It's only the cells after it that requires
11:42 - 11:43
11:43 - 11:44

the new
11:44 - 11:47

arrangement of the pointers coming in and
11:47 - 11:49

out. When I'm ready to delete a cell,
11:49 - 11:53

then the mechanism for delete in the buffer is to delete the thing that follows
11:53 - 11:55

the cursor. So if I have Z, Y, A, B,
11:55 - 11:58

this is the character that delete is supposed to delete, the one after the
11:58 - 11:59

cursor.
11:59 - 12:00

And so,
12:00 - 12:04

the first thing it looks at is to make sure that we're not already at the end.
12:04 - 12:07

In the case of the link list form of this, if the cursor was pointing to
12:07 - 12:09

the last most cell,
12:09 - 12:12

and then we looked at it's next field and saw that it was null, that would be a clue. Oh, it's
12:12 - 12:15

pointing to the very last cell, there's nothing that follows it. So we have a
12:15 - 12:17

quick test to make sure that there's something there to delete.
12:17 - 12:19

There is something following the cursor, and
12:19 - 12:24

in this case since the cursor is pointing to the Y, we're good. It says look at its
12:24 - 12:25

next field, it points to A.
12:25 - 12:27

It says okay,
12:27 - 12:29

call this thing old
12:29 - 12:30

and point to that A.
12:30 - 12:33

Now do a wire around it.
12:33 - 12:34

So take the cursor's
12:34 - 12:36

next field, and
12:36 - 12:39

so this thing is where I'm targeting my overwrite,
12:39 - 12:44

and copy out of the old its next. It's
12:44 - 12:49

no longer pointing to the A, but instead it's
12:49 - 12:50

pointing to the B.
12:50 - 12:52

And deleting the old, which
12:52 - 12:54

causes this piece of memory to
12:54 - 12:56

be reclaimed,
12:56 - 13:01

and now the contents of my buffer have been shifted over to
13:01 - 13:06

where it goes Z, Y, and straight to D. The cursor in this case doesn't move. It
13:06 - 13:07

actually deleted to the right, and so
13:07 - 13:11

whatever was to the left of the cursor just stays in place.
13:11 - 13:15

So again, no need to update anything special for the first or the last or any of those
13:15 - 13:15

cells,
13:15 - 13:19

but the symmetry of all the cells past the dummy kind of buys us some convenience
13:19 - 13:22

of handling it. What does a dummy cell look like? The
13:22 - 13:26

dummy cell looks like just any other cell. It's actually a cell, T, and what you do is
13:26 - 13:29

you just don't assign the Christian field, the character field, because it has nothing in it.
13:29 - 13:32

So it actually looks just like all the others, it's just another node in the
13:32 - 13:33

chain.
13:33 - 13:36

But this one, you know you put there purposefully just as the placeholder.
13:36 - 13:40

So it doesn't contain any character data. So if you're ready to print the
13:40 - 13:42

contents of the buffer, you have to remember the dummy cell is there,
13:42 - 13:43

and just skip over it. You
13:43 - 13:45

tend to start your list
13:45 - 13:49

at heads next, the first interesting cell to look at, then work
13:49 - 13:51

your way down to the end. So it's
13:51 - 13:54

just like all the others, but you just know
13:54 - 14:00

that the character data here is useless, it's not important. You didn't even write anything in there. When you delete, you just go left? It turns out
14:00 - 14:02

the delete in this
14:02 - 14:04

case goes forward.
14:04 - 14:07

So you can imagine that in some other world it does, but all the ones we
14:07 - 14:11

have done so far have always deleted forward. So the delete - for example, the stack case,
14:11 - 14:13

grabbed from the after stack
14:13 - 14:16

- it's done that way, because it happens to simplify some other
14:16 - 14:19

things that you can imagine that delete and reverse. What would it take? Well, you'd have to back
14:19 - 14:22

the cursor up. And we're going to find out pretty soon that would make it
14:22 - 14:24

challenging for the link list. So partly we've picked a delete that
14:24 - 14:26

made certain things work out a
14:26 - 14:29

little better for us.
14:29 - 14:31

So both of these operations
14:31 - 14:33

are O of
14:33 - 14:33

1;
14:33 - 14:37

this is where the link list really shines, right, on this sort of stuff. Because
14:37 - 14:41

if you have access to the point at which you want to make a modification, so if you are already kind of
14:41 - 14:44

in the midst of the thing you're trying to rearrange,
14:44 - 14:47

it's often to the link list's advantage, because it actually can do these this rewiring
14:47 - 14:51

in the local context. And even if there's thousands of characters that
14:51 - 14:52

follow the cursor.
14:52 - 14:55

It doesn't matter. And there can be thousands of characters preceding the cursor
14:55 - 14:56

for that matter, too.
14:56 - 15:00

The idea that there's a much larger context that's working and doesn't impact
15:00 - 15:03

its performance; it's really just doing this local little rearrangement of the pointers.
15:03 - 15:05

And that's a real strength of the link list design
15:05 - 15:11

is that flexibility that doesn't rely on everything living in contiguous memory.
15:11 - 15:15

So let's talk about the movement options for this guy.
15:15 - 15:19

So right now I have the contents
15:19 - 15:21

Z, Y with
15:21 - 15:22

the cursor
15:22 - 15:27

between that, and the B following that. So three characters worth,
15:27 - 15:29
15:29 - 15:30

and then
15:30 - 15:31

the
15:31 - 15:33

initial two operations
15:33 - 15:36

are pretty simple. The later two are going to be a little bit messier. So let's look at
15:36 - 15:37

the easy ones first.
15:37 - 15:40

Moving the cursor to the beginning, so that jump that pulls you back to the very
15:40 - 15:41

beginning,
15:41 - 15:44

very easy in a link list form. If I want to move the cursor to the beginning, all
15:44 - 15:47

I need to do is take that cursor pointer,
15:47 - 15:49

and reset it
15:49 - 15:51

to the head, which is where the dummy cell is. And so, by
15:51 - 15:54

virtue of doing that,
15:54 - 15:57
15:57 - 15:59

the character it points to is the one that's to the left,
15:59 - 16:02

right of that, and to the left of that the dummy cell is one we're not
16:02 - 16:05

counting. In fact, it's like nothingness that's there,
16:05 - 16:07

and that means that the first character that follows the cursor is Z, which
16:07 - 16:11

is the initial character of the buffer. So that reset, easy to do, we have
16:11 - 16:13

easy access to the front of the list,
16:13 - 16:14

and so
16:14 - 16:16

no
16:16 - 16:17

special
16:17 - 16:18

crazy code required.
16:18 - 16:21

Moving the cursor forward, also an easy thing to do, link
16:21 - 16:24

list are designed to make it easy to work your way from the front to
16:24 - 16:28

the back. And so moving the cursor forward is advancing that cursor by one step. So
16:28 - 16:32

in the array form of this, what was a cursor plus, plus, the
16:32 - 16:36

comparable form of that in the link list is cursor equals cursor next.
16:36 - 16:37

It has a little
16:37 - 16:40

if test there, all of the versions actually have something that is
16:40 - 16:42

comparable to this. If you're already at the end, then there's
16:42 - 16:46

nothing to be done, so you don't advance the cursor off into space.
16:46 - 16:47

You
16:47 - 16:51

check to make sure that there is something for it to point to. So in this case,
16:51 - 16:54

seeing if the cursor's next field is null, it's not, it points to Z. Then we go
16:54 - 16:58

ahead and update the cursor to there,
16:58 - 17:00

which has the effect of
17:00 - 17:03

changing things to where the Z is the character
17:03 - 17:07

to the left of the cursor, and then the YB follows it.
17:07 - 17:10

So moving this way, getting back to the beginning, and kind of walking our way down is kind of easy
17:10 - 17:11

to do.
17:11 - 17:15

Now we start to see where the link list causes us a little bit of grief,
17:15 - 17:18

moving the cursor to the end.
17:18 - 17:21

So now, starting at the beginning or the middle, or wherever I am,
17:21 - 17:25

I want to advance my way to the end. I don't know where the end is.
17:25 - 17:28

Link lists in general don't keep track of that,
17:28 - 17:31

that in the simple form of the single link list, I'm going to have to work to
17:31 - 17:34

find it. So I'm going to take where I am and walk my way down.
17:34 - 17:37

And this one actually just makes use of an existing
17:37 - 17:40

public number function to move cursor forward. It's like while the cursor next does
17:40 - 17:44

not equal null, which is to say while we are not pointing at the very last cell of the link list,
17:44 - 17:46

then keep going.
17:46 - 17:49

So advance cursor equals cursor.
17:49 - 17:50

Some will say,
17:50 - 17:54

"Is the cursor's next field null?" No, then set the cursor to be the cursor's
17:54 - 17:56

next field.
17:56 - 17:58

Is the cursor's next field null?
17:58 - 17:59
17:59 - 18:02

No it's not, so advance the cursor to the next field.
18:02 - 18:04

Is the cursor's next field null? Yes.
18:04 - 18:06

So that would
18:06 - 18:08

the last thing in the buffer,
18:08 - 18:10

and so if we have
18:10 - 18:13

hundreds or thousands or
18:13 - 18:15

tens of characters, whichever it is, right,
18:15 - 18:18

and depending on how close we already were, it will walk through
18:18 - 18:21

everything from where the current editing position was to find the end of the
18:21 - 18:22

buffer
18:22 - 18:25

one by one, working it's way down.
18:25 - 18:26

Even more painful
18:26 - 18:28

operation
18:28 - 18:32

is the simple one of just moving backwards.
18:32 - 18:34

If I'm pointing right now to that B,
18:34 - 18:37

and I'd like to get back to that Y, the
18:37 - 18:41

link list is oblivious about how you got there. It knows where to go from
18:41 - 18:45

here, but that backing up is not supported in the simple form.
18:45 - 18:48

In order to find out what preceded the V,
18:48 - 18:51

our only option right now is to go back to the beginning,
18:51 - 18:53

and find it.
18:53 - 18:56

So starting this pointer CP at the head,
18:56 - 18:57

and saying,
18:57 - 18:58

do you point to;
18:58 - 19:01

is your next field the cursor?
19:01 - 19:04

And this one says, no. And then it says, well okay advance. Does your next field point to
19:04 - 19:08

the cursor? No it's not. Go to this one, does your next field point to the same place
19:08 - 19:12

as the cursor, yes. Wise next field, points to the same place the cursor
19:12 - 19:13

does.
19:13 - 19:18

That must mean that Y was the one that preceded the cursor in the link
19:18 - 19:22

list. So after having gone back to the beginning and walked all the way down, especially if I'm near
19:22 - 19:24

the end, that's a long traversal.
19:24 - 19:26

And when I get there, I can say "Oh yeah
19:26 - 19:34

that's the one, back it up to there."
19:34 - 19:37

Walking it all the way down;
19:37 - 19:41

some good link list coding, good practice in there, but
19:41 - 19:42

again a funny,
19:42 - 19:44

funny set of trade offs, right?
19:44 - 19:47

Does anybody have any questions about any part of the code?
19:47 - 19:49

At the end
19:49 - 19:51

could you just
19:51 - 19:53

save the address of the pointer, and then the
19:53 - 19:54

person
19:54 - 19:56

who wants to go to the end could just
19:56 - 19:59

access that? That was a great idea. It's like, so what about
19:59 - 20:02

improving this? Right now all we have is a pointer to the front, and each
20:02 - 20:06

pointer has only a pointer to the next. It's like, maybe we need to add some
20:06 - 20:07

more stuff, track
20:07 - 20:10

some more things. What if we tracked the
20:10 - 20:12

tail? Would that help us?
20:12 - 20:16

If we tracked the tail, we could move to the end quickly. Would
20:16 - 20:19

that help us moving backwards?
20:19 - 20:22

No. It solves one of our problems though, so you're on to something.
20:22 - 20:25

Let's look at what we've got, and then we'll start thinking about ways to fix it, because that's going to be
20:25 - 20:27

one of the pieces we're going to think about.
20:27 - 20:29

So if I move to link list
20:29 - 20:31

relative to what I have,
20:31 - 20:33

again, it's like a shell game. The old ends moved.
20:33 - 20:36

It used to be that editing was slow and we didn't like that.
20:36 - 20:38

Then we made big movement fast,
20:38 - 20:41

and now we have this funny thing were moving in certain ways is easy and
20:41 - 20:45

moving other ways is bad. So moving
20:45 - 20:46

down the document
20:46 - 20:49

is good, moving backwards in the document not good.
20:49 - 20:53

Moving back to the beginning is good, but moving to the end is bad.
20:53 - 20:56

But then editing in all situations is good,
20:56 - 20:59

another sort of quirky set of things. It just fells like it's like that game where the
20:59 - 21:00
21:00 - 21:03

gophers pop there heads up, and you pound on them. And they just show up somewhere else. As
21:03 - 21:06

soon as you get one thing fixed, it seems like something else had
21:06 - 21:08

to give.
21:08 - 21:11

This, I think, would actually be my ideal editor, because it turns out I
21:11 - 21:14

have exactly this working style, which is like, I'll be down at the end of
21:14 - 21:17

the document, and I'll have to be generating new things. And
21:17 - 21:20

I'm getting tired and I don't like writing, and what I keep wanting to
21:20 - 21:24

do is I go back to the beginning. And I read that again, because I worked on that the most and
21:24 - 21:27

it's nice. And so I go back to the beginning, and I'm like oh yeah, look at this. Look at how lovely this
21:27 - 21:28

is,
21:28 - 21:31

and then I never want to go back to the end where all the trouble is. So I just
21:31 - 21:34

keep going back to the beginning and editing in the front, and then slowly get
21:34 - 21:37

back to the end. And then I don't like it again, and then I go back to the beginning.
21:37 - 21:39

But I never actually willingly go to the end
21:39 - 21:42

to face my fears. So
21:42 - 21:46

this is the editor I need.
21:46 - 21:47

Also
21:47 - 21:51

has a bit of kicker in terms of space.
21:51 - 21:56

So if the space for a character is 1 byte, which it typically is,
21:56 - 21:58

and the space for a pointer is 4 bytes, then
21:58 - 22:03

each of those link list cells is costing us 5 bytes, right?
22:03 - 22:03

Which is
22:03 - 22:07

sort of five times the amount of storage that the character itself
22:07 - 22:10

would have taken alone,
22:10 - 22:14

so in the vector it's fairly tight, a little bit more going up on that. But we've actually kind of blown
22:14 - 22:18

it up by another factor of two even beyond the stack, which is not quite a
22:18 - 22:22

good direction to be going.
22:22 - 22:24

So let's talk about what we can do to fix that.
22:24 - 22:27

I'm not going to go through the process of this. I'm just going to kind of give you a thought
22:27 - 22:29

exercise, so you can just kind of
22:29 - 22:30

visualize it.
22:30 - 22:34

What we have here, right, is an information problem. We
22:34 - 22:37

only are keeping track of certain
22:37 - 22:39

pathways through that link list.
22:39 - 22:42

What if we actually increased our access to the list?
22:42 - 22:46

We don't need to have the full access of a vector, being able to know the nth
22:46 - 22:49

character by number is not actually as far as we need to go. But we
22:49 - 22:53

do just need a little bit more coordination between a cell and its neighbors.
22:53 - 22:56

So the two things that would buy us out of a lot of the things that we're seeing
22:56 - 22:58

here is if we added a tail pointer.
22:58 - 23:01

So if I just add a tail pointer that points to the last cell, then I can move to the end
23:01 - 23:02

quickly.
23:02 - 23:05

You say cursor equals tail; the same that cursor equals head. So what was an O of N
23:05 - 23:07

problem now is an O of 1.
23:07 - 23:08

The backing up,
23:08 - 23:12

not as easily solved by just having one thing that's going on;
23:12 - 23:13

I also am going to need,
23:13 - 23:17

on a per cell basis this ability to move backwards.
23:17 - 23:19
23:19 - 23:22

Well, it's just symmetry, right? If A knows where B is, and B knows where C is, and C knows where D is, what's to stop
23:22 - 23:25

it from also just tracking the information going in the other way,
23:25 - 23:30

so that D knows that it came from C, and C knows it came from B, and so on.
23:30 - 23:35

And so having this completely symmetric parallel set of links that just run the other
23:35 - 23:35

direction,
23:35 - 23:39

then buys you the ability to move backwards with ease.
23:39 - 23:42

The other thing that this would give you is the tail pointer
23:42 - 23:46

is almost not quite enough to give you the exact place of the end. It gives it to
23:46 - 23:47

you, but you also have to be
23:47 - 23:49

careful about what is it going to take to update it?
23:49 - 23:53

And so inserting can easily update the tail pointer. It's the deleting that kind of
23:53 - 23:55

would get you into trouble. If you were
23:55 - 23:58

here and deleting, you would have to kind of keep track of making sure if you were deleting
23:58 - 24:01

the thing that was the tail pointer, so you have to be a little about making sure
24:01 - 24:04

you don't lose track of your tail.
24:04 - 24:06

But once I have both of these in place,
24:06 - 24:09

I suddenly have something that can move
24:09 - 24:10

forward easily, O of 1;
24:10 - 24:15

move to the front easily, O of 1; move to the tail easily, O of 1; move backwards O of 1.
24:15 - 24:20

So I can make all of my operations
24:20 - 24:22

O of 1.
24:22 - 24:24

Deleting, you're still going to have to delete the individual cells, but there's one place where
24:24 - 24:27

it suffers, but that's a much more rare occurrence.
24:27 - 24:31

And I can get all six; it moves fast, in every dimension;
24:31 - 24:32

inserts and deletes
24:32 - 24:35

all good.
24:35 - 24:37

And there are two places we paid for it.
24:37 - 24:40

One is certainly the memory.
24:40 - 24:45

So a 4 byte pointer one direction, a 4 byte pointer another direction,
24:45 - 24:49

plus the 1 byte for the character means we have a 9 byte per
24:49 - 24:50

character
24:50 - 24:52

storage usage.
24:52 - 24:55

And the other way, which is a little bit hard to represent with some number
24:55 - 24:57

in a quantitative way, but actually it's also an important factor, which is
24:57 - 25:00

your code got a lot more complicated.
25:00 - 25:02

If you looked at the code for the vector of the stack version, it was a few
25:02 - 25:03

lines here a few lines there.
25:03 - 25:06

If you looked at the code for the link list singly we looked at, it's like you've got
25:06 - 25:07

to wire up these pointers.
25:07 - 25:10

Now imagine you've got to wire up twice as many pointers.
25:10 - 25:14

Not only do you have the in and out going one direction, you'll have the in and out going the
25:14 - 25:15

other direction,
25:15 - 25:17

and just the
25:17 - 25:21

opportunity to make errors with pointers has now gone up by a factor of two.
25:21 - 25:24

The idea that you could get a vector version of this up and running in
25:24 - 25:26

no time,
25:26 - 25:30

compared to several days, let's say, of getting the doubly link one working. It
25:30 - 25:32

might be a factor to keep in mind.
25:32 - 25:36

If you're kind of investing for the long haul, sure, work hard, but know
25:36 - 25:41

that it wasn't for free.
25:41 - 25:42

So you've
25:42 - 25:45

got 89 percent overhead.
25:45 - 25:49

That's a lot of overhead for small bits of data.
25:49 - 25:53

More likely, and this is actually the way that most modern word processors
25:53 - 25:54

do do this,
25:54 - 25:58

is they don't just make one or the other; it's not just a array or a stack or a link list.
25:58 - 25:59

They
25:59 - 26:04

actually look at ways to combine the features of both to get a hybrid,
26:04 - 26:05

where you can take
26:05 - 26:10

some of the advantages of one pipe of data, but also try to avoid its worst
26:10 - 26:13

weaknesses. So the most common way they're going to do this is some kind of
26:13 - 26:15

chunking strategy,
26:15 - 26:17

where you have a segment of the
26:17 - 26:21

buffer that kind of is moved aside and worked on in one little array. It
26:21 - 26:25

might be that those arrays are ten or 20 or a sentence long, or paragraph long.
26:25 - 26:28

It might be that they are actually dictated by when you change styles.
26:28 - 26:30

All of the code that's in one font
26:30 - 26:33

kind of gets moved into one chunk, and as soon as you change styles, it breaks
26:33 - 26:37

into a new chunk that follows to kind of keep track of the formatting information. It
26:37 - 26:40

depends on some of the other features that are present in the processor
26:40 - 26:43

here, but it's likely that rather than having all 1,000 or all 10 million
26:43 - 26:45

characters in one
26:45 - 26:46

data structure,
26:46 - 26:47

they're actually kind of separated
26:47 - 26:50

to where you have a little bit of both. So the most likely thing is that
26:50 - 26:53

there is some kind of array strategy on a chunk basis.
26:53 - 26:56

There's a linking between those chunks.
26:56 - 26:59

And what this allows you do is to
26:59 - 27:01

share that overhead cost.
27:01 - 27:05

So in this case, if I had every sentence in a chunk by itself,
27:05 - 27:08

and then if I had next and previous pointer, which pointed to the next sentence and the previous
27:08 - 27:09
27:09 - 27:11

sentence, when I need to move the cursor forward or backwards
27:11 - 27:15

within a chunk, it's just in changing this index. Oh, it's in index two, but
27:15 - 27:16

now it's moving to three, or four, or five.
27:16 - 27:20

And then when it gets to the end of that chunk, to the end of that sentence, and you move forward, well then it
27:20 - 27:23

follows the next link forward.
27:23 - 27:25

Similarly moving backwards, right, so
27:25 - 27:29

within a chunk doing array-like manipulations, and then as it crosses the
27:29 - 27:32

boundary using a link list steps to get further down.
27:32 - 27:35

So when it needs to move to the front or the back, those things are easily
27:35 - 27:37

accessible using head and tail pointers.
27:37 - 27:39

When I need to do an edit,
27:39 - 27:42

it operates in this array-like kind of
27:42 - 27:44

way
27:44 - 27:46

when there's space within that chunk. So you start editing in a sentence, then it
27:46 - 27:49

would kind of do some shifting on that array. But the array itself being pretty
27:49 - 27:51

small
27:51 - 27:53

means that although you are paying that cost of shuffling,
27:53 - 27:56

you are doing it on a smaller chunk. It's not 1,000 characters that have to move, it's
27:56 - 27:59

20 or 40 that have to move within that sentence.
27:59 - 28:02

And that when
28:02 - 28:04

you need to insert a new sentence or a new chunk in the middle, you are doing
28:04 - 28:08

link list-like manipulations to build a new chunk and insert and splice,
28:08 - 28:11

so you have that flexibility, where the many characters aren't
28:11 - 28:14

all affected by it. So you have kind of local array editing
28:14 - 28:16

and global link list editing
28:16 - 28:18

that gives you kind of the best of both worlds.
28:18 - 28:19
28:19 - 28:23

But again, the factor is cost showing up in code complexity.
28:23 - 28:27

That what you are looking at is kind of space/time tradeoff. Well, I can throw space at
28:27 - 28:29

this, and most people will say that in computer science, "Well I can throw space and this and
28:29 - 28:31

get better times." So
28:31 - 28:35

the double link list is a good example of something that throws space at a problem to get better
28:35 - 28:37

time. There's a lot of overhead.
28:37 - 28:39

Moving to the chunk list says I want to get back some of that space. I'm
28:39 - 28:43

not willing to invest 90 percent overhead to get this. Can I find a way to
28:43 - 28:44

take a
28:44 - 28:47

whole sentence worth of things and add a few pointers here? That means that
28:47 - 28:48

there's only two
28:48 - 28:51

4 byte pointers per each sentence as opposed to each character,
28:51 - 28:53

and that really way reduces the overhead.
28:53 - 28:55

But then now,
28:55 - 28:58

the problem is still back on your shoulders in terms of effort,
28:58 - 29:02

now I have both an array, and a link list, and double links, and
29:02 - 29:03

a
29:03 - 29:07

lot of complicated stuff flying around to make it work.
29:07 - 29:08

But it does work out.
29:08 - 29:11

It's kind of the process that designers go through when
29:11 - 29:12

they're building these kinds of
29:12 - 29:16

key structures. It like, what are the options, and do any of the basic
29:16 - 29:18

things we know about work? What about
29:18 - 29:21

building even fancier things that kind of take the best of both world and mix
29:21 - 29:23

them together and have a little bit of the weaknesses of this a little of that, but none of
29:23 - 29:26

the really awful
29:26 - 29:27

worst cases
29:27 - 29:28

present in the end result.
29:28 - 29:31

So in this case you'd have a lot of things that were constant time where constant was
29:31 - 29:33

defined by the chunk size.
29:33 - 29:37

So O of 100, let's say, if that was the maximum chunk size
29:37 - 29:39

to do these things, as opposed to O of
29:39 - 29:42

N for the whole block net,
29:42 - 29:43

so kind of a neat
29:43 - 29:45

little thing to think through.
29:45 - 29:48

There was a time, just so you can feel
29:48 - 29:51

gratified about how I've worked you very hard, but actually in the past I've been
29:51 - 29:53

even harsher.
29:53 - 29:57

We actually had them build the doubly linked chunked list editor buffer,
29:57 - 30:00

and now we have you build the singly linked chunk list PQ, and it's
30:00 - 30:02

actually
30:02 - 30:06

still very complicated, as you'll find out, but not nearly quite as hairy as the
30:06 - 30:09

full in both directions every way
30:09 - 30:11

craziness.
30:11 - 30:14

And you can kind of imagine in your mind.
30:14 - 30:16

Any questions about edited buffer?
30:16 - 30:18

We?ve got two more data structures to implement,
30:18 - 30:20

two more things that we've been using,
30:20 - 30:25

and happy to have in our arsenal that we need to know how they work.
30:25 - 30:29

Let's talk about map first, and in
30:29 - 30:31

fact the strategy that I'm going to end up showing you for map is actually going to be
30:31 - 30:35

the same one we use for set, then we'll come back and think of an even more clever
30:35 - 30:37

way to do map, too. So
30:37 - 30:40

we've got two things to talk about, which are binary search trees and hashing. We'll
30:40 - 30:41

do binary search trees first,
30:41 - 30:44

and then we'll get back to the hashing the second time around.
30:44 - 30:46

So
30:46 - 30:49

map is, next to vector, probably the most useful thing you've got going on in
30:49 - 30:50

your class library,
30:50 - 30:53

all kinds of things that do look up based
30:53 - 30:56

activities; looking up students by ID number,
30:56 - 30:57

phone numbers,
30:57 - 31:00

by name, any of these kinds of things where you need to have this key at associated satellite
31:00 - 31:02

information. And
31:02 - 31:05

you'd like to be able to do that retrieval and update quickly.
31:05 - 31:08

The map is the one for you.
31:08 - 31:11

So if you remember it's key value fears, it's all about key being the
31:11 - 31:15

identifying mark that allows you to find the associated value. And then the value
31:15 - 31:18

is actually just being stored and kind of retrieved without using it
31:18 - 31:21

or examining it in any kind of interesting way.
31:21 - 31:25

And what we're really interested in here is how to make this guy work really efficiently. So efficiently
31:25 - 31:27

being hopefully log in or better,
31:27 - 31:31

if we could get that on both the update operations that add and remove things
31:31 - 31:33

from the map, as well as the
31:33 - 31:34

look up operation, then
31:34 - 31:37

that would please probably all our clients immensely, if we could get both of
31:37 - 31:41

those going well.
31:41 - 31:45

So I'm going to build you on that. I probably have enough time to do that.
31:45 - 31:46

That
31:46 - 31:48

attacks it from
31:48 - 31:53

using simple tools to start with, just to kind of get a base line for what we can do.
31:53 - 31:54
31:54 - 31:56

Vector, our old friend vector,
31:56 - 31:59

can't get too much mileage out of vector, apparently, because you can always find a
31:59 - 32:00

way to make it do something
32:00 - 32:01

useful for you.
32:01 - 32:02

It
32:02 - 32:03

gives us the convenience
32:03 - 32:05
32:05 - 32:07

of managing that thing with low overhead,
32:07 - 32:09

direct access is good.
32:09 - 32:11

We make a parastruct
32:11 - 32:13

that has the key and the value together.
32:13 - 32:16

We store it into a vector,
32:16 - 32:18

and then we
32:18 - 32:21

may or may not sort that vector. But if it's sorted, there's probably some good reasons. Think
32:21 - 32:22

about
32:22 - 32:26

what order to sort it in, what information to be using to sort that,
32:26 - 32:28

and then we'll get value to add. We're just going to go
32:28 - 32:30

through doing it, rather than talk about it.
32:30 - 32:32

We'll just go over and
32:32 - 32:36

make a vector happen.
32:36 - 32:38

Let's get over here in X-code.
32:38 - 32:40

Let's take a look at
32:40 - 32:42

my map dot H,
32:42 - 32:43

and all
32:43 - 32:46

I'm going to put in there is actually add and get value.
32:46 - 32:48

And
32:48 - 32:50

the other one contains key and remove."
32:50 - 32:52
32:52 - 32:55

But these are these two operations that really matter to us, getting something in there, and
32:55 - 32:57

getting something back out.
32:57 - 32:58
32:58 - 33:00

And so I'm going to start by
33:00 - 33:02

building this thing on vector. So let me go ahead and
33:02 - 33:14

indicate that I plan to use vector as part of
33:14 - 33:15

what I'm doing. So I defined a little structure
33:15 - 33:20

that's the pair. I'm going to call that "pair"
33:20 - 33:23

even, that's a nice word for it. So the pair of the key and the value that go together,
33:23 - 33:29

and then what I will store here
33:29 - 33:31

is a vector of those pairs. They'll
33:31 - 33:36

give me a key to value; I'll stick them together in that little struct, stick them into my vector, and
33:36 - 33:38

hopefully be able to retrieve them later.
33:38 - 33:42

So given that I'm depending only on vector, I don?t have any other
33:42 - 33:45

information, I don't technically need to go out of my way to disallow then
33:45 - 33:48

memory-wise copying, because vector does a deep copy. But I'm going to leave it in there, because I
33:48 - 33:52

plan on going some place that is eventually going to need it. I might as well be safe from
33:52 - 33:54

the get go.
33:54 - 33:55

The vector
33:55 - 34:00

will be set up and destroyed automatically as part of management of my
34:00 - 34:04

data member. So I actually don't have any explicit allocation/deallocation, so if I look at my
34:04 - 34:06

constructor and destructor they'll be totally empty.
34:06 - 34:11

And then add and get value are where I'm going to actually get to do some work.
34:11 - 34:14

Before I implement them, I'm just going to think for a second about
34:14 - 34:16

how this is going to play out.
34:16 - 34:20

Let's say I'm doing a vector that's storing
34:20 - 34:26

strings and their length. Let's say, it's just a map of
34:26 - 34:27

string,
34:27 - 34:30
34:30 - 34:34

and if I do M dot add
34:34 - 34:37

car. I'm going to store a three with it,
34:37 - 34:39

so on the other side of the wall,
34:39 - 34:41

what we're going to do is make a little struct
34:41 - 34:43

that has the word car
34:43 - 34:47

and the number three with it, and we package it into our ray.
34:47 - 34:51

And so then we get a second call to put something in like
34:51 - 34:55

dog, also with a three, then we'll make a second one of these and stick it in
34:55 - 34:58

our ray.
34:58 - 35:01

And so on, so the idea is to have the vector really kind of manage where the storage is going. And
35:01 - 35:02

then what
35:02 - 35:05

we'll just do is package it up and stick it in there.
35:05 - 35:08

The one thing, though, that I have to think a little bit about, because it sounds
35:08 - 35:11

like add should be pretty easy. It almost seems like what I need to do is something
35:11 - 35:13

as simple as this. And
35:13 - 35:15

I'll start to write the code, and then you can tell me why
35:15 - 35:17

this isn't what I want to do.
35:17 - 35:20

I say pair TP, and I say P dot
35:20 - 35:23

key equals key P dot
35:23 - 35:30

value equals val, and then I say entries dot add P.
35:30 - 35:32

Almost, but not quite what I want,
35:32 - 35:34

what have I forgotten
35:34 - 35:39

to take care of?
35:39 - 35:42

If it's not already in there;
35:42 - 35:46

this has to do with just what's the interface for nap. What does nap say about it if you
35:46 - 35:48

try to add
35:48 - 35:51

for a key a different value. So in the case of this one, it didn't quite make sense that I would add
35:51 - 35:55

something, but let's say I accidentally put car in there with a link to the first time that I went
35:55 - 35:57

to go fix it, right,
35:57 - 36:01

that my subsequent to ask it store car with the proper value,
36:01 - 36:03

three,
36:03 - 36:05

doesn't create a totally new entry.
36:05 - 36:09

It has to find the existing entry that already has car and overwrite it.
36:09 - 36:13

So the behavior of mad was it could add when it was not previously existing.
36:13 - 36:16

It also could be an overwrite of an existing value. So we do need
36:16 - 36:17

to find
36:17 - 36:20

if there is an entry that already has that key, and just
36:20 - 36:21

replace its value
36:21 - 36:23

rather than
36:23 - 36:26

add a second entry with the same key. So at any given point, right, in this whole
36:26 - 36:28

vector, there should be one entry
36:28 - 36:30

for each unique key.
36:30 - 36:33

So I have to do a search. And
36:33 - 36:35

let me
36:35 - 36:38

write this code, and then I'm going to mention why I'm going to decompose it in a
36:38 - 36:39

second.
36:39 - 36:42

If I do entries dot size, I plus plus
36:42 - 36:44

then if
36:44 - 36:49

entries of I dot key equals key.
36:49 - 36:51

Then
36:51 - 36:52

let's break. Let me pull out
36:52 - 37:00

I so I can get access to it when I'm done. So after this
37:00 - 37:01

loop exits,
37:01 - 37:06

if I is less than entries dot size,
37:06 - 37:06

then
37:06 - 37:09

I know that it found a match.
37:09 - 37:12

In the case where it went through all of them, and it never hit the break. Then I will
37:12 - 37:15

have gone all the way to where it is exactly equal to entries dot size; if it didn't then I
37:15 - 37:16

just
37:16 - 37:19

have a place to overwrite. And I can say entries of
37:19 - 37:21

I dot val
37:21 - 37:23

equals val.
37:23 - 37:27

Otherwise I go through this process of adding a new one. I had to
37:27 - 37:30

go hunt that thing down
37:30 - 37:33

and replace it.
37:33 - 37:35

That code,
37:35 - 37:38

when you look at it, you can say, "Gosh I feel like that code's going to be familiar."
37:38 - 37:43

Because that idea of searching to find a match to the key
37:43 - 37:45

probably is going to come in really handy
37:45 - 37:46

when I need to do get value,
37:46 - 37:50

but it has the same exact problem of oh, I've got to go find that match. So in fact,
37:50 - 37:55

that kind of motivates the idea of why don't I just take this little piece of code
37:55 - 37:59

and separate it out and do a helper that they can both use. I
37:59 - 38:02

may not have planned for this from the beginning, but once I see it happening, I might as well fix
38:02 - 38:05

it. So I can say find index
38:05 - 38:07

for key that
38:07 - 38:10

given a key will return to you
38:10 - 38:11
38:11 - 38:15

the vector index at which
38:15 - 38:19

that key is or a negative one. I don't know if I need to keep that
38:19 - 38:22

one anymore.
38:22 - 38:24

If it ever found it, it returns I otherwise
38:24 - 38:28

it returned negative one. That can be our not found.
38:28 - 38:37

I can actually use that up here,
38:37 - 38:40

and then I can say, if found
38:40 - 38:43

does not equal negative one.
38:43 - 38:44

Then we do that,
38:44 - 38:46

and then that
38:46 - 38:48

tells us that
38:48 - 38:51

this little piece of code is going to come in handy right up here,
38:51 - 38:55

get value if found does not equal
38:55 - 38:56

negative one, then we can return that
38:56 - 39:02

and what is the behavior of get value if it didn't find it? Does anybody remember?
39:02 - 39:06

I ask it to get the value in this case for lollipop and it's not
39:06 - 39:07

there,
39:07 - 39:13

does it turn zero, what does it
39:13 - 39:22

do? Untracked forget value, remember?
39:22 - 39:25

This is an error.
39:25 - 39:26

There is not sort of a
39:26 - 39:30

general case return value you can produce here
39:30 - 39:33

that will really sort of sufficiently describe to someone who's made
39:33 - 39:37

this call in error what happened, right? I can't return zero, because it happens to be
39:37 - 39:40

that sometimes I?m putting in, and sometimes I'm putting strings or structs
39:40 - 39:44

kinds of things. There's no general zero or negative one to use,
39:44 - 39:47

so the behavior of get value is if you ask it to retrieve a value for a key, it ought to be
39:47 - 39:48

there. So
39:48 - 39:52

the corresponding implementation would likely also just go through
39:52 - 39:56

the process of contains key, which we call find index for key and check.
39:56 - 39:58

It could be used by the client
39:58 - 40:01

if they need to know in advance that it's really there.
40:01 - 40:04

So let's see how I'm
40:04 - 40:06

doing. Let's go back to my
40:06 - 40:11

code here,
40:11 - 40:13

and this
40:13 - 40:15

being
40:15 - 40:16

not my specialty,
40:16 - 40:20

and just see that I can put something in and
40:20 - 40:25

get it back out would be a good first test to see that things are
40:25 - 40:29

going as they should. Oh, no it didn't like that. Oh yeah, find
40:29 - 40:30

index for key.
40:30 - 40:32

Find index for key
40:32 - 40:34

was not
40:34 - 40:38

populated clear in my dot H, so let me go move it there. I'll put
40:38 - 40:42

it in the private section, because I don?t want this to be something that is
40:42 - 40:45

exposed outside of the implementation. The idea that there is an index and
40:45 - 40:48

there is a vector is not really something that I would want to make part of my
40:48 - 40:54

public interface. It's really just an internal detail of how I'm managing stuff.
40:54 - 40:56

Something about what I did, oh yeah, I
40:56 - 41:00

used the right name for that.
41:00 - 41:05

One more case let
41:05 - 41:06

me just take a look at them both
41:06 - 41:09

before I let the compiler tell me what is and what isn't right about them. Their both
41:09 - 41:13

doing find index for key using linear search. If it comes back at not negative one
41:13 - 41:17

pulling out the matching value and overriding it, and then in the case of adding
41:17 - 41:23

or replacing the error -
41:23 - 41:24

so the
41:24 - 41:26

previously stored value for that is three,
41:26 - 41:29

and if I go and I do one of those tests it's always good to know what
41:29 - 41:34

happens if I ask it for something that doesn't exist.
41:34 - 41:37

Making sure that the error handling that I put in there does
41:37 - 41:43

what it was supposed to do, right? Are
41:43 - 41:46

there any questions about the code that I wrote here?
41:46 - 41:49

The vector is always kind of a good starting place for these things, because actually it just
41:49 - 41:53

tends to use very simple things, and it tends to lend itself to easy code.
41:53 - 41:57

But because of vector constraints being at a contiguous back structure, right, there's likely to
41:57 - 41:59

be some
41:59 - 42:01

performance implication. And in
42:01 - 42:02

particular,
42:02 - 42:04

in the unsorted case here,
42:04 - 42:07

both add and get value
42:07 - 42:09

involve doing a linear search.
42:09 - 42:12

In some cases it will find it quickly at the very beginning or even half way
42:12 - 42:13

through.
42:13 - 42:17

In other cases it won't find it at all and have to search through the whole thing. So on
42:17 - 42:20

average, if you figure it's equally likely to be in any position or not there at all,
42:20 - 42:23

it's at going to at least have to look at half of them or more.
42:23 - 42:25

And so we can say they're both linear.
42:25 - 42:28

If we change it to be sorted,
42:28 - 42:31

which is the first obvious improvement to make here,
42:31 - 42:34

then get value
42:34 - 42:37

gets a lot faster, because we can take advantage of binary search.
42:37 - 42:39

I sort it by key,
42:39 - 42:43

so ignoring the value, values are just satellite data. But stored in order of
42:43 - 42:44

key,
42:44 - 42:48

then all the A words, B words, C words, D words, what not, so
42:48 - 42:51

walking it down the middle and seeing which half you look at, then looking at the
42:51 - 42:53

quarter of there and the eight of there. It's going to very quickly narrow down on
42:53 - 42:57

where it had to be if it was there or if it wasn't there at all.
42:57 - 43:00

So we can implement that in logarithmic time, meaning that if you 1,000
43:00 - 43:04

entries, it takes ten comparisons to find it or agree that it's not there.
43:04 - 43:08

If you double that it will take one more comparison. If
43:08 - 43:12

you go to 2,000, negligibly faster, even numbers in the millions
43:12 - 43:15

are still just a handful
43:15 - 43:19

of comparisons to work it out, and say,
43:19 - 43:22

to the twentieth for example is roughly a million; so 20 comparisons in or out,
43:22 - 43:25

and so it didn't look at a million things a million times over.
43:25 - 43:27

However,
43:27 - 43:30

that didn't improve add,
43:30 - 43:31

why not?
43:31 - 43:33

Why does keeping an assorted order
43:33 - 43:37

not have the same boost for add as it did for get value?
43:37 - 43:46

Can I do the same logarithmic search? You have to
43:46 - 43:50

move all the elements ?
43:50 - 43:51

Yeah,
43:51 - 43:55

you can find the place fast, right, you can say returning the word
43:55 - 43:59

apple it's like oh, okay. I can very quickly narrow in on where it needed to be. If it
43:59 - 44:02

was there I could override it quickly, but in the case where it really needed to do
44:02 - 44:03

an add,
44:03 - 44:04

it's got to move them down.
44:04 - 44:05

So the old add
44:05 - 44:09

did all its work in the search and then just tacked it on the end if it didn't find
44:09 - 44:12

it, or overrode the middle. But the new add
44:12 - 44:16

also has to modify the array to put it in the right place, and putting it in the right place means
44:16 - 44:18

shuffling to make space.
44:18 - 44:20

So even though it found it quickly,
44:20 - 44:25

it was unable to update that quickly, so it still ends up being
44:25 - 44:26

linear in that case.
44:26 - 44:29

Now this actually not a terrible result,
44:29 - 44:30

as it stands
44:30 - 44:33

it's probably much more likely that you're going to load up the data
44:33 - 44:36

structure once and then do a lot
44:36 - 44:40

of searches on it, then infrequent additions. That's a pretty common usage
44:40 - 44:41

pattern for a map. It's kind of like you
44:41 - 44:43

build a dictionary, right,
44:43 - 44:46

loading the dictionary once could be expensive, but then people get tons and tons of
44:46 - 44:49

look ups of words later. But you don't change the definitions a lot later or add a
44:49 - 44:51

lot of new words.
44:51 - 44:51

So
44:51 - 44:56

this might actually be quite tolerable in many situations
44:56 - 44:59

that the building of it was expensive, but it gave
44:59 - 45:00

you
45:00 - 45:02

fast search access.
45:02 - 45:05

But really what you want to do is say, can we get both of those? So driving to say
45:05 - 45:08

what we do with the editor buffer, you say, "Well, what if I just want both of those operations
45:08 - 45:11

to be logarithmic or better, what can I do?" So
45:11 - 45:12

I'll leave you with kind of just
45:12 - 45:16

a little bit of a taste of where we're heading.
45:16 - 45:19

Just to think a little bit about what the link
45:19 - 45:21

list does and doesn't buy us,
45:21 - 45:24

right, that the link list gave us flexibility
45:24 - 45:29

at the editing, the local editing site, but it doesn't give us fast searching.
45:29 - 45:31

Well maybe we can take this link list and try to
45:31 - 45:33

add searching
45:33 - 45:37

to the idea of this flexibility to kind of get the best of both
45:37 - 45:41

worlds. So that's what we're going to start on Monday, building a tree. So if
45:41 - 45:44

you have time today, I'd love to have you come to Truman, and otherwise have a good
45:44 - 45:45

weekend, and
45:45 - 45:50

get your PQ work in.

Title:: Lecture 21 | Programming Abstractions (Stanford)
Description:: Lecture 21 by Julie Zelenski for the Programming Abstractions Course (CS106B) in the Stanford Computer Science Department.

Julie talks about the buffer version of vector vs. stack and follows this with an example of cursor design. She also talks about linked list insertion and deletion. Cursor movement is the next topic covered; she illustrates how the cursor points from one cell to the next.

Complete Playlist for the Course:
http://www.youtube.com/view_play_list?p=FE6E58F856038C69

CS 106B Course Website:
http://cs106b.stanford.edu

Stanford Center for Professional Development:
http://scpd.stanford.edu/

Stanford University:
http://www.stanford.edu/

Stanford University Channel on YouTube:
http://www.youtube.com/stanforduniversity/

more » « less
Video Language:: English
Duration:: 46:02

	N. Ueda edited English subtitles for Lecture 21 \| Programming Abstractions (Stanford)
	Eunjeong_Kim added a translation

English subtitles

Revisions

Revision 2

N. Ueda

Lecture 21 | Programming Abstractions (Stanford)

Revisions

Our website uses cookies

Operating cookies (Required)