WEBVTT

00:00:00.000 --> 00:00:02.000


00:00:02.000 --> 00:00:10.000


00:00:10.000 --> 00:00:13.000
This presentation is delivered by the Stanford Center for Professional

00:00:13.000 --> 00:00:21.000
Development.

00:00:21.000 --> 00:00:23.000
Let us

00:00:23.000 --> 00:00:25.000
refresh about where we were at the end of Wednesday.

00:00:25.000 --> 00:00:29.000
As we had talked about the vector form of the buffer, so the editor buffer that's

00:00:29.000 --> 00:00:31.000
backing the

00:00:31.000 --> 00:00:34.000
word processing client,

00:00:34.000 --> 00:00:37.000
and the two stack version, where we're shuffling stuff, often on the before

00:00:37.000 --> 00:00:40.000
and after stacks to get things done, right?

00:00:40.000 --> 00:00:43.000
And when we were looking at the main six operations, we're looking at the four

00:00:43.000 --> 00:00:47.000
cursor movements and the two editing operations, right, that we had traded off.

00:00:47.000 --> 00:00:49.000
Where editing had been slow in the vector,

00:00:49.000 --> 00:00:52.000
to where editing was now fast, but then these long distance movements were slow,

00:00:52.000 --> 00:00:54.000
because of the shuffling.

00:00:54.000 --> 00:00:57.000
And then there was a little discussion at the very end about how this probably

00:00:57.000 --> 00:01:00.000
increases our space requirements, because we're now keeping two stacks that are

00:01:00.000 --> 00:01:02.000
each capable of holding

00:01:02.000 --> 00:01:04.000
the entire contents

00:01:04.000 --> 00:01:06.000
as opposed to one vector here

00:01:06.000 --> 00:01:09.000
with its slop.

00:01:09.000 --> 00:01:10.000
So that was where we were at.

00:01:10.000 --> 00:01:13.000
We are still in kind of the pursuit of this Holy Grail of could we

00:01:13.000 --> 00:01:16.000
get it to where everything is fast. Is it always a matter of tradeoffs or could we just

00:01:16.000 --> 00:01:19.000
invest some more smarts and more

00:01:19.000 --> 00:01:20.000
cleverness

00:01:20.000 --> 00:01:24.000
and hard work and squeeze it down to where we could get everything to perform

00:01:24.000 --> 00:01:26.000
efficiently? So typically when computer scientists say they want something to perform

00:01:26.000 --> 00:01:28.000
efficiently, they're hoping for log

00:01:28.000 --> 00:01:32.000
in time or faster. Constant time is the best. Log in grows so slowly that in fact,

00:01:32.000 --> 00:01:34.000
logarithm time is effectively constant

00:01:34.000 --> 00:01:40.000
for all reasonable values of N up to millions and billions.

00:01:40.000 --> 00:01:43.000
And so typically anything that is linear or higher is often a weak spot that you are

00:01:43.000 --> 00:01:47.000
trying to look at, and certainly things that are quadratic or exponential

00:01:47.000 --> 00:01:51.000
are definite opportunities for improvement.

00:01:51.000 --> 00:01:53.000
So let's look at this idea of the link list,

00:01:53.000 --> 00:01:55.000
so that both

00:01:55.000 --> 00:01:56.000
the

00:01:56.000 --> 00:01:58.000
vector and the stack

00:01:58.000 --> 00:02:03.000
- we're relying on this idea that continuous memory is the

00:02:03.000 --> 00:02:06.000
weak link that was causing some problems. In the vector, that shuffling to

00:02:06.000 --> 00:02:08.000
move things up and down,

00:02:08.000 --> 00:02:12.000
or in the case of the stack version, one side to the other

00:02:12.000 --> 00:02:15.000
was part of what we were working up against.

00:02:15.000 --> 00:02:18.000
And the link list promises to give us this opportunity to have these each

00:02:18.000 --> 00:02:21.000
individual character being managed separately,

00:02:21.000 --> 00:02:26.000
and flexibility might help resolve some of our problems here.

00:02:26.000 --> 00:02:30.000
SO what we're going to look at is the idea of implementing the buffer using a link

00:02:30.000 --> 00:02:32.000
list. So if I have

00:02:32.000 --> 00:02:34.000
characters A, B, C, D, E,

00:02:34.000 --> 00:02:37.000
I could have a link list here where I'm pointing to the head cell, which is A,

00:02:37.000 --> 00:02:40.000
which points to B, and C, and D, and E, and down to the null.

00:02:40.000 --> 00:02:43.000
SO what I'm modeling it as in the private data section will be a

00:02:43.000 --> 00:02:44.000
little

00:02:44.000 --> 00:02:47.000
link list node with a character in there, a pointer to the next one.

00:02:47.000 --> 00:02:52.000
And then I'm going to keep two pointers, one to the front-most cell, and then that's going to

00:02:52.000 --> 00:02:55.000
model where the cursor is within the buffer.

00:02:55.000 --> 00:02:59.000
So rather than doing this like an index, an index was handy in the case of vector where we

00:02:59.000 --> 00:03:00.000
had direct access, right?

00:03:00.000 --> 00:03:02.000
Given the way the link list works,

00:03:02.000 --> 00:03:05.000
knowing that it's at the fifth cell, the fourth cell isn't really very helpful. It would be more

00:03:05.000 --> 00:03:09.000
helpful to have the pointer kind of directly in the midst of things to give us direct

00:03:09.000 --> 00:03:12.000
access to where the cursor is.

00:03:12.000 --> 00:03:14.000
But let's think a little bit about that cursor,

00:03:14.000 --> 00:03:17.000
because there's an important decision to be made early about helping to

00:03:17.000 --> 00:03:20.000
facilitate the later work we have to do.

00:03:20.000 --> 00:03:24.000
I have the contents A, B, C, D, E,

00:03:24.000 --> 00:03:28.000
so the cursor is actually between two characters.

00:03:28.000 --> 00:03:31.000
If the cursor right now is situated

00:03:31.000 --> 00:03:32.000
after the A and before the B,

00:03:32.000 --> 00:03:33.000
what I'm

00:03:33.000 --> 00:03:36.000
modeling is A cursor B,

00:03:36.000 --> 00:03:38.000
C, D,

00:03:38.000 --> 00:03:39.000
then

00:03:39.000 --> 00:03:42.000
it seems like the obvious two choices for where the cursor might point is to

00:03:42.000 --> 00:03:45.000
A or to B. And I

00:03:45.000 --> 00:03:50.000
had said in the case of the vector that if it were index zero or index one

00:03:50.000 --> 00:03:51.000
that

00:03:51.000 --> 00:03:53.000
there wasn't a really strong preference for one over the other.

00:03:53.000 --> 00:03:56.000
There is going to be a really good reason to pick one for the link list

00:03:56.000 --> 00:03:59.000
over the other. Would you rather point to the A

00:03:59.000 --> 00:04:11.000
for where the cursor is or would you rather point to the B? [Inaudible]

00:04:11.000 --> 00:04:14.000
Exactly, so the idea is that when the cursor position is there, the next edit

00:04:14.000 --> 00:04:18.000
actually goes in between the A and the B. So if I insert the character

00:04:18.000 --> 00:04:21.000
Z, it really does go right here.

00:04:21.000 --> 00:04:24.000
And if I am going to, if effect, wire that in and splice it in then list,

00:04:24.000 --> 00:04:29.000
if I'm pointing to B then I'm kind of hosed, because I'm already one past where I need to be able slice in. Because I need to

00:04:29.000 --> 00:04:33.000
know what was behind it to bring it in. So I'm pointing to the character behind the

00:04:33.000 --> 00:04:34.000
cursor

00:04:34.000 --> 00:04:36.000
gives you access to

00:04:36.000 --> 00:04:39.000
wiring this down to here and this into there, and then updating the cursor to

00:04:39.000 --> 00:04:40.000
point to the Z

00:04:40.000 --> 00:04:44.000
without having to do any of the difficult backward looking backward

00:04:44.000 --> 00:04:45.000
traversal.

00:04:45.000 --> 00:04:48.000
So that strongly suggests what it is, it's going to point to

00:04:48.000 --> 00:04:52.000
A when it's between A and B; B when it's between B and C; and so on.

00:04:52.000 --> 00:04:56.000
There is another little point here, which is there are actually five letters

00:04:56.000 --> 00:04:59.000
in the cursor, but there are six cursor positions. It could be at the very

00:04:59.000 --> 00:05:00.000
beginning,

00:05:00.000 --> 00:05:03.000
in between all these, or at the very end.

00:05:03.000 --> 00:05:07.000
And the way we've chosen it right now, I can identify all of the five

00:05:07.000 --> 00:05:08.000
positions

00:05:08.000 --> 00:05:11.000
that have at least one character to the left, so being -

00:05:11.000 --> 00:05:12.000
pointing to the A means it's

00:05:12.000 --> 00:05:15.000
after A; pointing to the B it's after B; and so on, all the way down to the E.

00:05:15.000 --> 00:05:18.000
But I don't have a cursor position for

00:05:18.000 --> 00:05:23.000
representing when the cursor is at the very beginning of the buffer.

00:05:23.000 --> 00:05:25.000
So

00:05:25.000 --> 00:05:28.000
one strategy I could use, right, is to say I'll just use the special cursor

00:05:28.000 --> 00:05:29.000
position of null. But

00:05:29.000 --> 00:05:32.000
I need one more pointer value to hold onto,

00:05:32.000 --> 00:05:36.000
and I could make a special case out of this because there's null. Once I've done that though,

00:05:36.000 --> 00:05:40.000
I've actually committed myself to this thing that's going to require all of my code to be managing a

00:05:40.000 --> 00:05:41.000
special case of,

00:05:41.000 --> 00:05:43.000
well when the cursor's null do something a little different than when

00:05:43.000 --> 00:05:46.000
the cursor is pointing to some cell. I'm going to

00:05:46.000 --> 00:05:50.000
show you a way that we can kind of

00:05:50.000 --> 00:05:54.000
avoid the need for making a special case out of it,

00:05:54.000 --> 00:05:55.000
by

00:05:55.000 --> 00:05:58.000
wasting a little memory and reorganizing our list

00:05:58.000 --> 00:06:01.000
to allow for there to be a sixth cell,

00:06:01.000 --> 00:06:04.000
and extra cell that we call the dummy cell.

00:06:04.000 --> 00:06:07.000
So in this case that cell would go at the very front.

00:06:07.000 --> 00:06:10.000
It would have no character, just let the character field be unstatused. It's

00:06:10.000 --> 00:06:13.000
not really a character. What it is, is just a placeholder.

00:06:13.000 --> 00:06:16.000
And when I created the buffer to begin with, even when it's empty, it's always going to have

00:06:16.000 --> 00:06:20.000
that cell. I'm going to create that cell in advance, I'm going to have it sitting ready and waiting, and that cell

00:06:20.000 --> 00:06:24.000
never gets deleted or moved or changed. In fact, the list always, the head

00:06:24.000 --> 00:06:28.000
of this will always point to the same cell from the beginning of this object's lifetime

00:06:28.000 --> 00:06:32.000
to the end. So it will never change this at all, and the cells that follow are

00:06:32.000 --> 00:06:35.000
the ones that are going to be changing and growing and rearranging in

00:06:35.000 --> 00:06:38.000
response to editing commands.

00:06:38.000 --> 00:06:40.000
What this is going to buy us is a couple of things. First off, it's going to make it

00:06:40.000 --> 00:06:42.000
so there is a sixth cursor position.

00:06:42.000 --> 00:06:45.000
So now when the cursor is at the very beginning,

00:06:45.000 --> 00:06:47.000
we can set the cursor to point to this dummy cell.

00:06:47.000 --> 00:06:48.000
Okay

00:06:48.000 --> 00:06:51.000
that's one thing that we solve by doing that.

00:06:51.000 --> 00:06:55.000
Another thing that we've solved is actually we have made all the other

00:06:55.000 --> 00:06:57.000
cells on the list, the ones that are being edited and manipulated.

00:06:57.000 --> 00:07:00.000
We've made them all totally symmetric.

00:07:00.000 --> 00:07:02.000
Previously there was always going to be a little bit of a special case about, well if you're

00:07:02.000 --> 00:07:06.000
the front of the list, we're seeing that sometimes we're doing inserts on the

00:07:06.000 --> 00:07:08.000
stack in queue link list. That

00:07:08.000 --> 00:07:12.000
inserting later in the list involves kind of wiring in two pointers; one to command and one

00:07:12.000 --> 00:07:13.000
to go out.

00:07:13.000 --> 00:07:16.000
But there was a special case that oh, if you're the very first cell of the list, then

00:07:16.000 --> 00:07:19.000
your outpointer is the same, but your incoming pointer is resetting the head.

00:07:19.000 --> 00:07:22.000
That piece has actually gone away now,

00:07:22.000 --> 00:07:26.000
that now every cell in there always has a predecessor. It always has a previous,

00:07:26.000 --> 00:07:29.000
right? Something behind it so that actually they will always be

00:07:29.000 --> 00:07:31.000
pulling the pointer in and pulling the pointer out

00:07:31.000 --> 00:07:34.000
for every cell with character data that's being modified.

00:07:34.000 --> 00:07:37.000
And that means some of that extra little handing of oh, if you're the

00:07:37.000 --> 00:07:39.000
very first cell,

00:07:39.000 --> 00:07:40.000
have gone away.

00:07:40.000 --> 00:07:44.000
The very first cell is always this kind of dummy cell and doesn't require us to go

00:07:44.000 --> 00:07:46.000
out of our way to do anything special for it.

00:07:46.000 --> 00:07:49.000
So this is going to solve a bunch of problems for us, and it's kind of a neat technique. This thing

00:07:49.000 --> 00:07:51.000
is used actually

00:07:51.000 --> 00:07:52.000
fairly commonly

00:07:52.000 --> 00:07:57.000
in situations just managing a link list just to buy yourself some symmetry in the

00:07:57.000 --> 00:08:02.000
remaining cells.

00:08:02.000 --> 00:08:05.000
So let me look at some code with you, and I'm going to actually draw some pictures as I

00:08:05.000 --> 00:08:06.000
go. I'm

00:08:06.000 --> 00:08:10.000
not going to write this code in real time, because I think it's actually more important to

00:08:10.000 --> 00:08:11.000
see it

00:08:11.000 --> 00:08:15.000
acting on things than it is to see me typing out the characters.

00:08:15.000 --> 00:08:18.000
But the mechanism, right, in the case of the buffer here

00:08:18.000 --> 00:08:21.000
is it's going to have a head point,

00:08:21.000 --> 00:08:23.000
which points to the front most cell; and a

00:08:23.000 --> 00:08:24.000
cursor,

00:08:24.000 --> 00:08:28.000
which points to the character that proceeds the cursor. So if right now the

00:08:28.000 --> 00:08:31.000
contents of the buffer

00:08:31.000 --> 00:08:34.000
are

00:08:34.000 --> 00:08:37.000
the contents A, D, C.

00:08:37.000 --> 00:08:39.000
And let's say that right now

00:08:39.000 --> 00:08:40.000
the

00:08:40.000 --> 00:08:42.000
cursor is pointing to A.

00:08:42.000 --> 00:08:45.000
Actually, I need my

00:08:45.000 --> 00:08:50.000
dummy cell. Let me move everybody down one, so you just have dummy

00:08:50.000 --> 00:08:52.000
cell and my A and my B.

00:08:52.000 --> 00:08:56.000
And so the cursor right now is at the very front of the buffer.

00:08:56.000 --> 00:08:59.000
This is kind of what it looks like. It has two characters, the A and the B.

00:08:59.000 --> 00:09:03.000
I'm going to trace through the process of inserting a new character

00:09:03.000 --> 00:09:04.000
into the buffer. It

00:09:04.000 --> 00:09:07.000
is that my local variable CP

00:09:07.000 --> 00:09:11.000
is going to point to a new cell allocated out in the heap.

00:09:11.000 --> 00:09:15.000
Let's say the character I'm going to insert is the Z, so I'll

00:09:15.000 --> 00:09:16.000
assign

00:09:16.000 --> 00:09:18.000
CH to the character, the slot right there.

00:09:18.000 --> 00:09:20.000
And then the next for this

00:09:20.000 --> 00:09:24.000
gets to - what the cursor is - kind of following the cursor. So the

00:09:24.000 --> 00:09:26.000
cursor is pointing to the cell before it,

00:09:26.000 --> 00:09:29.000
and the next field of that cell tells us what follows it;

00:09:29.000 --> 00:09:31.000
so if I trace my cursor's

00:09:31.000 --> 00:09:33.000
next field, it points to A,

00:09:33.000 --> 00:09:35.000
and so I'm going to go ahead and

00:09:35.000 --> 00:09:41.000
set up this new cell's next field to also point to A.

00:09:41.000 --> 00:09:44.000
So now I've got two different ways to get to the A cell, tracing off the dummy cell or

00:09:44.000 --> 00:09:46.000
tracing off this new cell.

00:09:46.000 --> 00:09:48.000
And then I

00:09:48.000 --> 00:09:51.000
update the cursor's next field to point to CP.

00:09:51.000 --> 00:09:55.000
So I just copied right there that CP's next field was pointing to A. I copied it down here

00:09:55.000 --> 00:09:57.000
to kind of do the splice out.

00:09:57.000 --> 00:10:02.000
Now I'm going to do the splice in,

00:10:02.000 --> 00:10:05.000
causing it to no longer point to A,

00:10:05.000 --> 00:10:06.000
but instead to

00:10:06.000 --> 00:10:08.000
point to my new cell down here, CP.

00:10:08.000 --> 00:10:10.000
And then I update the cursor

00:10:10.000 --> 00:10:13.000
to point to this new cell. That just means that subsequent inserts, like the

00:10:13.000 --> 00:10:18.000
C is kind of behind the cursor after I've made this

00:10:18.000 --> 00:10:21.000


00:10:21.000 --> 00:10:23.000
edit.

00:10:23.000 --> 00:10:25.000
That goes away when that one ends,

00:10:25.000 --> 00:10:28.000
and so now the new structure that I have here is

00:10:28.000 --> 00:10:31.000
head still points to the dummy cell that never changes.

00:10:31.000 --> 00:10:33.000
Dummy cell now points to Z,

00:10:33.000 --> 00:10:36.000
Z points to A, Z points to B,

00:10:36.000 --> 00:10:41.000
and then the cursor in this case is between the Z and the A,

00:10:41.000 --> 00:10:44.000
by virtue of the fact that the cursor is pointing to Z.

00:10:44.000 --> 00:10:50.000
A subsequent one would kind of do the same thing. If I typed out ZY

00:10:50.000 --> 00:10:51.000
after I did this. We'd

00:10:51.000 --> 00:10:55.000
make a new cell over here. It would get what

00:10:55.000 --> 00:10:58.000
was currently coming out of the cursor's field. The next field, right, we would

00:10:58.000 --> 00:11:02.000
update cursor's next field to point

00:11:02.000 --> 00:11:05.000
to this new one. And then we would update cursor

00:11:05.000 --> 00:11:08.000
to point to this guy. And

00:11:08.000 --> 00:11:10.000
so now following the list, right, dummy cell,

00:11:10.000 --> 00:11:11.000
Z,

00:11:11.000 --> 00:11:16.000
Y, A, B, cursor pointing to the Y, which is the

00:11:16.000 --> 00:11:20.000
character directly to the left of the cursor.

00:11:20.000 --> 00:11:22.000
And so inserting in any position now,

00:11:22.000 --> 00:11:25.000
front cell, middle cell, end cell,

00:11:25.000 --> 00:11:29.000
doesn't require any special case handling. So you don't see any if of things.

00:11:29.000 --> 00:11:33.000
And you notice that you'll never see an update to the head directly here.

00:11:33.000 --> 00:11:37.000
That the head was set up to point to the dummy cell in the constructor

00:11:37.000 --> 00:11:39.000
and never needs any adjusting.

00:11:39.000 --> 00:11:42.000
It's only the cells after it that requires

00:11:42.000 --> 00:11:43.000


00:11:43.000 --> 00:11:44.000
the new

00:11:44.000 --> 00:11:47.000
arrangement of the pointers coming in and

00:11:47.000 --> 00:11:49.000
out. When I'm ready to delete a cell,

00:11:49.000 --> 00:11:53.000
then the mechanism for delete in the buffer is to delete the thing that follows

00:11:53.000 --> 00:11:55.000
the cursor. So if I have Z, Y, A, B,

00:11:55.000 --> 00:11:58.000
this is the character that delete is supposed to delete, the one after the

00:11:58.000 --> 00:11:59.000
cursor.

00:11:59.000 --> 00:12:00.000
And so,

00:12:00.000 --> 00:12:04.000
the first thing it looks at is to make sure that we're not already at the end.

00:12:04.000 --> 00:12:07.000
In the case of the link list form of this, if the cursor was pointing to

00:12:07.000 --> 00:12:09.000
the last most cell,

00:12:09.000 --> 00:12:12.000
and then we looked at it's next field and saw that it was null, that would be a clue. Oh, it's

00:12:12.000 --> 00:12:15.000
pointing to the very last cell, there's nothing that follows it. So we have a

00:12:15.000 --> 00:12:17.000
quick test to make sure that there's something there to delete.

00:12:17.000 --> 00:12:19.000
There is something following the cursor, and

00:12:19.000 --> 00:12:24.000
in this case since the cursor is pointing to the Y, we're good. It says look at its

00:12:24.000 --> 00:12:25.000
next field, it points to A.

00:12:25.000 --> 00:12:27.000
It says okay,

00:12:27.000 --> 00:12:29.000
call this thing old

00:12:29.000 --> 00:12:30.000
and point to that A.

00:12:30.000 --> 00:12:33.000
Now do a wire around it.

00:12:33.000 --> 00:12:34.000
So take the cursor's

00:12:34.000 --> 00:12:36.000
next field, and

00:12:36.000 --> 00:12:39.000
so this thing is where I'm targeting my overwrite,

00:12:39.000 --> 00:12:44.000
and copy out of the old its next. It's

00:12:44.000 --> 00:12:49.000
no longer pointing to the A, but instead it's

00:12:49.000 --> 00:12:50.000
pointing to the B.

00:12:50.000 --> 00:12:52.000
And deleting the old, which

00:12:52.000 --> 00:12:54.000
causes this piece of memory to

00:12:54.000 --> 00:12:56.000
be reclaimed,

00:12:56.000 --> 00:13:01.000
and now the contents of my buffer have been shifted over to

00:13:01.000 --> 00:13:06.000
where it goes Z, Y, and straight to D. The cursor in this case doesn't move. It

00:13:06.000 --> 00:13:07.000
actually deleted to the right, and so

00:13:07.000 --> 00:13:11.000
whatever was to the left of the cursor just stays in place.

00:13:11.000 --> 00:13:15.000
So again, no need to update anything special for the first or the last or any of those

00:13:15.000 --> 00:13:15.000
cells,

00:13:15.000 --> 00:13:19.000
but the symmetry of all the cells past the dummy kind of buys us some convenience

00:13:19.000 --> 00:13:22.000
of handling it. What does a dummy cell look like? The

00:13:22.000 --> 00:13:26.000
dummy cell looks like just any other cell. It's actually a cell, T, and what you do is

00:13:26.000 --> 00:13:29.000
you just don't assign the Christian field, the character field, because it has nothing in it.

00:13:29.000 --> 00:13:32.000
So it actually looks just like all the others, it's just another node in the

00:13:32.000 --> 00:13:33.000
chain.

00:13:33.000 --> 00:13:36.000
But this one, you know you put there purposefully just as the placeholder.

00:13:36.000 --> 00:13:40.000
So it doesn't contain any character data. So if you're ready to print the

00:13:40.000 --> 00:13:42.000
contents of the buffer, you have to remember the dummy cell is there,

00:13:42.000 --> 00:13:43.000
and just skip over it. You

00:13:43.000 --> 00:13:45.000
tend to start your list

00:13:45.000 --> 00:13:49.000
at heads next, the first interesting cell to look at, then work

00:13:49.000 --> 00:13:51.000
your way down to the end. So it's

00:13:51.000 --> 00:13:54.000
just like all the others, but you just know

00:13:54.000 --> 00:14:00.000
that the character data here is useless, it's not important. You didn't even write anything in there. When you delete, you just go left? It turns out

00:14:00.000 --> 00:14:02.000
the delete in this

00:14:02.000 --> 00:14:04.000
case goes forward.

00:14:04.000 --> 00:14:07.000
So you can imagine that in some other world it does, but all the ones we

00:14:07.000 --> 00:14:11.000
have done so far have always deleted forward. So the delete - for example, the stack case,

00:14:11.000 --> 00:14:13.000
grabbed from the after stack

00:14:13.000 --> 00:14:16.000
- it's done that way, because it happens to simplify some other

00:14:16.000 --> 00:14:19.000
things that you can imagine that delete and reverse. What would it take? Well, you'd have to back

00:14:19.000 --> 00:14:22.000
the cursor up. And we're going to find out pretty soon that would make it

00:14:22.000 --> 00:14:24.000
challenging for the link list. So partly we've picked a delete that

00:14:24.000 --> 00:14:26.000
made certain things work out a

00:14:26.000 --> 00:14:29.000
little better for us.

00:14:29.000 --> 00:14:31.000
So both of these operations

00:14:31.000 --> 00:14:33.000
are O of

00:14:33.000 --> 00:14:33.000
1;

00:14:33.000 --> 00:14:37.000
this is where the link list really shines, right, on this sort of stuff. Because

00:14:37.000 --> 00:14:41.000
if you have access to the point at which you want to make a modification, so if you are already kind of

00:14:41.000 --> 00:14:44.000
in the midst of the thing you're trying to rearrange,

00:14:44.000 --> 00:14:47.000
it's often to the link list's advantage, because it actually can do these this rewiring

00:14:47.000 --> 00:14:51.000
in the local context. And even if there's thousands of characters that

00:14:51.000 --> 00:14:52.000
follow the cursor.

00:14:52.000 --> 00:14:55.000
It doesn't matter. And there can be thousands of characters preceding the cursor

00:14:55.000 --> 00:14:56.000
for that matter, too.

00:14:56.000 --> 00:15:00.000
The idea that there's a much larger context that's working and doesn't impact

00:15:00.000 --> 00:15:03.000
its performance; it's really just doing this local little rearrangement of the pointers.

00:15:03.000 --> 00:15:05.000
And that's a real strength of the link list design

00:15:05.000 --> 00:15:11.000
is that flexibility that doesn't rely on everything living in contiguous memory.

00:15:11.000 --> 00:15:15.000
So let's talk about the movement options for this guy.

00:15:15.000 --> 00:15:19.000
So right now I have the contents

00:15:19.000 --> 00:15:21.000
Z, Y with

00:15:21.000 --> 00:15:22.000
the cursor

00:15:22.000 --> 00:15:27.000
between that, and the B following that. So three characters worth,

00:15:27.000 --> 00:15:29.000


00:15:29.000 --> 00:15:30.000
and then

00:15:30.000 --> 00:15:31.000
the

00:15:31.000 --> 00:15:33.000
initial two operations

00:15:33.000 --> 00:15:36.000
are pretty simple. The later two are going to be a little bit messier. So let's look at

00:15:36.000 --> 00:15:37.000
the easy ones first.

00:15:37.000 --> 00:15:40.000
Moving the cursor to the beginning, so that jump that pulls you back to the very

00:15:40.000 --> 00:15:41.000
beginning,

00:15:41.000 --> 00:15:44.000
very easy in a link list form. If I want to move the cursor to the beginning, all

00:15:44.000 --> 00:15:47.000
I need to do is take that cursor pointer,

00:15:47.000 --> 00:15:49.000
and reset it

00:15:49.000 --> 00:15:51.000
to the head, which is where the dummy cell is. And so, by

00:15:51.000 --> 00:15:54.000
virtue of doing that,

00:15:54.000 --> 00:15:57.000


00:15:57.000 --> 00:15:59.000
the character it points to is the one that's to the left,

00:15:59.000 --> 00:16:02.000
right of that, and to the left of that the dummy cell is one we're not

00:16:02.000 --> 00:16:05.000
counting. In fact, it's like nothingness that's there,

00:16:05.000 --> 00:16:07.000
and that means that the first character that follows the cursor is Z, which

00:16:07.000 --> 00:16:11.000
is the initial character of the buffer. So that reset, easy to do, we have

00:16:11.000 --> 00:16:13.000
easy access to the front of the list,

00:16:13.000 --> 00:16:14.000
and so

00:16:14.000 --> 00:16:16.000
no

00:16:16.000 --> 00:16:17.000
special

00:16:17.000 --> 00:16:18.000
crazy code required.

00:16:18.000 --> 00:16:21.000
Moving the cursor forward, also an easy thing to do, link

00:16:21.000 --> 00:16:24.000
list are designed to make it easy to work your way from the front to

00:16:24.000 --> 00:16:28.000
the back. And so moving the cursor forward is advancing that cursor by one step. So

00:16:28.000 --> 00:16:32.000
in the array form of this, what was a cursor plus, plus, the

00:16:32.000 --> 00:16:36.000
comparable form of that in the link list is cursor equals cursor next.

00:16:36.000 --> 00:16:37.000
It has a little

00:16:37.000 --> 00:16:40.000
if test there, all of the versions actually have something that is

00:16:40.000 --> 00:16:42.000
comparable to this. If you're already at the end, then there's

00:16:42.000 --> 00:16:46.000
nothing to be done, so you don't advance the cursor off into space.

00:16:46.000 --> 00:16:47.000
You

00:16:47.000 --> 00:16:51.000
check to make sure that there is something for it to point to. So in this case,

00:16:51.000 --> 00:16:54.000
seeing if the cursor's next field is null, it's not, it points to Z. Then we go

00:16:54.000 --> 00:16:58.000
ahead and update the cursor to there,

00:16:58.000 --> 00:17:00.000
which has the effect of

00:17:00.000 --> 00:17:03.000
changing things to where the Z is the character

00:17:03.000 --> 00:17:07.000
to the left of the cursor, and then the YB follows it.

00:17:07.000 --> 00:17:10.000
So moving this way, getting back to the beginning, and kind of walking our way down is kind of easy

00:17:10.000 --> 00:17:11.000
to do.

00:17:11.000 --> 00:17:15.000
Now we start to see where the link list causes us a little bit of grief,

00:17:15.000 --> 00:17:18.000
moving the cursor to the end.

00:17:18.000 --> 00:17:21.000
So now, starting at the beginning or the middle, or wherever I am,

00:17:21.000 --> 00:17:25.000
I want to advance my way to the end. I don't know where the end is.

00:17:25.000 --> 00:17:28.000
Link lists in general don't keep track of that,

00:17:28.000 --> 00:17:31.000
that in the simple form of the single link list, I'm going to have to work to

00:17:31.000 --> 00:17:34.000
find it. So I'm going to take where I am and walk my way down.

00:17:34.000 --> 00:17:37.000
And this one actually just makes use of an existing

00:17:37.000 --> 00:17:40.000
public number function to move cursor forward. It's like while the cursor next does

00:17:40.000 --> 00:17:44.000
not equal null, which is to say while we are not pointing at the very last cell of the link list,

00:17:44.000 --> 00:17:46.000
then keep going.

00:17:46.000 --> 00:17:49.000
So advance cursor equals cursor.

00:17:49.000 --> 00:17:50.000
Some will say,

00:17:50.000 --> 00:17:54.000
"Is the cursor's next field null?" No, then set the cursor to be the cursor's

00:17:54.000 --> 00:17:56.000
next field.

00:17:56.000 --> 00:17:58.000
Is the cursor's next field null?

00:17:58.000 --> 00:17:59.000


00:17:59.000 --> 00:18:02.000
No it's not, so advance the cursor to the next field.

00:18:02.000 --> 00:18:04.000
Is the cursor's next field null? Yes.

00:18:04.000 --> 00:18:06.000
So that would

00:18:06.000 --> 00:18:08.000
the last thing in the buffer,

00:18:08.000 --> 00:18:10.000
and so if we have

00:18:10.000 --> 00:18:13.000
hundreds or thousands or

00:18:13.000 --> 00:18:15.000
tens of characters, whichever it is, right,

00:18:15.000 --> 00:18:18.000
and depending on how close we already were, it will walk through

00:18:18.000 --> 00:18:21.000
everything from where the current editing position was to find the end of the

00:18:21.000 --> 00:18:22.000
buffer

00:18:22.000 --> 00:18:25.000
one by one, working it's way down.

00:18:25.000 --> 00:18:26.000
Even more painful

00:18:26.000 --> 00:18:28.000
operation

00:18:28.000 --> 00:18:32.000
is the simple one of just moving backwards.

00:18:32.000 --> 00:18:34.000
If I'm pointing right now to that B,

00:18:34.000 --> 00:18:37.000
and I'd like to get back to that Y, the

00:18:37.000 --> 00:18:41.000
link list is oblivious about how you got there. It knows where to go from

00:18:41.000 --> 00:18:45.000
here, but that backing up is not supported in the simple form.

00:18:45.000 --> 00:18:48.000
In order to find out what preceded the V,

00:18:48.000 --> 00:18:51.000
our only option right now is to go back to the beginning,

00:18:51.000 --> 00:18:53.000
and find it.

00:18:53.000 --> 00:18:56.000
So starting this pointer CP at the head,

00:18:56.000 --> 00:18:57.000
and saying,

00:18:57.000 --> 00:18:58.000
do you point to;

00:18:58.000 --> 00:19:01.000
is your next field the cursor?

00:19:01.000 --> 00:19:04.000
And this one says, no. And then it says, well okay advance. Does your next field point to

00:19:04.000 --> 00:19:08.000
the cursor? No it's not. Go to this one, does your next field point to the same place

00:19:08.000 --> 00:19:12.000
as the cursor, yes. Wise next field, points to the same place the cursor

00:19:12.000 --> 00:19:13.000
does.

00:19:13.000 --> 00:19:18.000
That must mean that Y was the one that preceded the cursor in the link

00:19:18.000 --> 00:19:22.000
list. So after having gone back to the beginning and walked all the way down, especially if I'm near

00:19:22.000 --> 00:19:24.000
the end, that's a long traversal.

00:19:24.000 --> 00:19:26.000
And when I get there, I can say "Oh yeah

00:19:26.000 --> 00:19:34.000
that's the one, back it up to there."

00:19:34.000 --> 00:19:37.000
Walking it all the way down;

00:19:37.000 --> 00:19:41.000
some good link list coding, good practice in there, but

00:19:41.000 --> 00:19:42.000
again a funny,

00:19:42.000 --> 00:19:44.000
funny set of trade offs, right?

00:19:44.000 --> 00:19:47.000
Does anybody have any questions about any part of the code?

00:19:47.000 --> 00:19:49.000
At the end

00:19:49.000 --> 00:19:51.000
could you just

00:19:51.000 --> 00:19:53.000
save the address of the pointer, and then the

00:19:53.000 --> 00:19:54.000
person

00:19:54.000 --> 00:19:56.000
who wants to go to the end could just

00:19:56.000 --> 00:19:59.000
access that? That was a great idea. It's like, so what about

00:19:59.000 --> 00:20:02.000
improving this? Right now all we have is a pointer to the front, and each

00:20:02.000 --> 00:20:06.000
pointer has only a pointer to the next. It's like, maybe we need to add some

00:20:06.000 --> 00:20:07.000
more stuff, track

00:20:07.000 --> 00:20:10.000
some more things. What if we tracked the

00:20:10.000 --> 00:20:12.000
tail? Would that help us?

00:20:12.000 --> 00:20:16.000
If we tracked the tail, we could move to the end quickly. Would

00:20:16.000 --> 00:20:19.000
that help us moving backwards?

00:20:19.000 --> 00:20:22.000
No. It solves one of our problems though, so you're on to something.

00:20:22.000 --> 00:20:25.000
Let's look at what we've got, and then we'll start thinking about ways to fix it, because that's going to be

00:20:25.000 --> 00:20:27.000
one of the pieces we're going to think about.

00:20:27.000 --> 00:20:29.000
So if I move to link list

00:20:29.000 --> 00:20:31.000
relative to what I have,

00:20:31.000 --> 00:20:33.000
again, it's like a shell game. The old ends moved.

00:20:33.000 --> 00:20:36.000
It used to be that editing was slow and we didn't like that.

00:20:36.000 --> 00:20:38.000
Then we made big movement fast,

00:20:38.000 --> 00:20:41.000
and now we have this funny thing were moving in certain ways is easy and

00:20:41.000 --> 00:20:45.000
moving other ways is bad. So moving

00:20:45.000 --> 00:20:46.000
down the document

00:20:46.000 --> 00:20:49.000
is good, moving backwards in the document not good.

00:20:49.000 --> 00:20:53.000
Moving back to the beginning is good, but moving to the end is bad.

00:20:53.000 --> 00:20:56.000
But then editing in all situations is good,

00:20:56.000 --> 00:20:59.000
another sort of quirky set of things. It just fells like it's like that game where the

00:20:59.000 --> 00:21:00.000


00:21:00.000 --> 00:21:03.000
gophers pop there heads up, and you pound on them. And they just show up somewhere else. As

00:21:03.000 --> 00:21:06.000
soon as you get one thing fixed, it seems like something else had

00:21:06.000 --> 00:21:08.000
to give.

00:21:08.000 --> 00:21:11.000
This, I think, would actually be my ideal editor, because it turns out I

00:21:11.000 --> 00:21:14.000
have exactly this working style, which is like, I'll be down at the end of

00:21:14.000 --> 00:21:17.000
the document, and I'll have to be generating new things. And

00:21:17.000 --> 00:21:20.000
I'm getting tired and I don't like writing, and what I keep wanting to

00:21:20.000 --> 00:21:24.000
do is I go back to the beginning. And I read that again, because I worked on that the most and

00:21:24.000 --> 00:21:27.000
it's nice. And so I go back to the beginning, and I'm like oh yeah, look at this. Look at how lovely this

00:21:27.000 --> 00:21:28.000
is,

00:21:28.000 --> 00:21:31.000
and then I never want to go back to the end where all the trouble is. So I just

00:21:31.000 --> 00:21:34.000
keep going back to the beginning and editing in the front, and then slowly get

00:21:34.000 --> 00:21:37.000
back to the end. And then I don't like it again, and then I go back to the beginning.

00:21:37.000 --> 00:21:39.000
But I never actually willingly go to the end

00:21:39.000 --> 00:21:42.000
to face my fears. So

00:21:42.000 --> 00:21:46.000
this is the editor I need.

00:21:46.000 --> 00:21:47.000
Also

00:21:47.000 --> 00:21:51.000
has a bit of kicker in terms of space.

00:21:51.000 --> 00:21:56.000
So if the space for a character is 1 byte, which it typically is,

00:21:56.000 --> 00:21:58.000
and the space for a pointer is 4 bytes, then

00:21:58.000 --> 00:22:03.000
each of those link list cells is costing us 5 bytes, right?

00:22:03.000 --> 00:22:03.000
Which is

00:22:03.000 --> 00:22:07.000
sort of five times the amount of storage that the character itself

00:22:07.000 --> 00:22:10.000
would have taken alone,

00:22:10.000 --> 00:22:14.000
so in the vector it's fairly tight, a little bit more going up on that. But we've actually kind of blown

00:22:14.000 --> 00:22:18.000
it up by another factor of two even beyond the stack, which is not quite a

00:22:18.000 --> 00:22:22.000
good direction to be going.

00:22:22.000 --> 00:22:24.000
So let's talk about what we can do to fix that.

00:22:24.000 --> 00:22:27.000
I'm not going to go through the process of this. I'm just going to kind of give you a thought

00:22:27.000 --> 00:22:29.000
exercise, so you can just kind of

00:22:29.000 --> 00:22:30.000
visualize it.

00:22:30.000 --> 00:22:34.000
What we have here, right, is an information problem. We

00:22:34.000 --> 00:22:37.000
only are keeping track of certain

00:22:37.000 --> 00:22:39.000
pathways through that link list.

00:22:39.000 --> 00:22:42.000
What if we actually increased our access to the list?

00:22:42.000 --> 00:22:46.000
We don't need to have the full access of a vector, being able to know the nth

00:22:46.000 --> 00:22:49.000
character by number is not actually as far as we need to go. But we

00:22:49.000 --> 00:22:53.000
do just need a little bit more coordination between a cell and its neighbors.

00:22:53.000 --> 00:22:56.000
So the two things that would buy us out of a lot of the things that we're seeing

00:22:56.000 --> 00:22:58.000
here is if we added a tail pointer.

00:22:58.000 --> 00:23:01.000
So if I just add a tail pointer that points to the last cell, then I can move to the end

00:23:01.000 --> 00:23:02.000
quickly.

00:23:02.000 --> 00:23:05.000
You say cursor equals tail; the same that cursor equals head. So what was an O of N

00:23:05.000 --> 00:23:07.000
problem now is an O of 1.

00:23:07.000 --> 00:23:08.000
The backing up,

00:23:08.000 --> 00:23:12.000
not as easily solved by just having one thing that's going on;

00:23:12.000 --> 00:23:13.000
I also am going to need,

00:23:13.000 --> 00:23:17.000
on a per cell basis this ability to move backwards.

00:23:17.000 --> 00:23:19.000


00:23:19.000 --> 00:23:22.000
Well, it's just symmetry, right? If A knows where B is, and B knows where C is, and C knows where D is, what's to stop

00:23:22.000 --> 00:23:25.000
it from also just tracking the information going in the other way,

00:23:25.000 --> 00:23:30.000
so that D knows that it came from C, and C knows it came from B, and so on.

00:23:30.000 --> 00:23:35.000
And so having this completely symmetric parallel set of links that just run the other

00:23:35.000 --> 00:23:35.000
direction,

00:23:35.000 --> 00:23:39.000
then buys you the ability to move backwards with ease.

00:23:39.000 --> 00:23:42.000
The other thing that this would give you is the tail pointer

00:23:42.000 --> 00:23:46.000
is almost not quite enough to give you the exact place of the end. It gives it to

00:23:46.000 --> 00:23:47.000
you, but you also have to be

00:23:47.000 --> 00:23:49.000
careful about what is it going to take to update it?

00:23:49.000 --> 00:23:53.000
And so inserting can easily update the tail pointer. It's the deleting that kind of

00:23:53.000 --> 00:23:55.000
would get you into trouble. If you were

00:23:55.000 --> 00:23:58.000
here and deleting, you would have to kind of keep track of making sure if you were deleting

00:23:58.000 --> 00:24:01.000
the thing that was the tail pointer, so you have to be a little about making sure

00:24:01.000 --> 00:24:04.000
you don't lose track of your tail.

00:24:04.000 --> 00:24:06.000
But once I have both of these in place,

00:24:06.000 --> 00:24:09.000
I suddenly have something that can move

00:24:09.000 --> 00:24:10.000
forward easily, O of 1;

00:24:10.000 --> 00:24:15.000
move to the front easily, O of 1; move to the tail easily, O of 1; move backwards O of 1.

00:24:15.000 --> 00:24:20.000
So I can make all of my operations

00:24:20.000 --> 00:24:22.000
O of 1.

00:24:22.000 --> 00:24:24.000
Deleting, you're still going to have to delete the individual cells, but there's one place where

00:24:24.000 --> 00:24:27.000
it suffers, but that's a much more rare occurrence.

00:24:27.000 --> 00:24:31.000
And I can get all six; it moves fast, in every dimension;

00:24:31.000 --> 00:24:32.000
inserts and deletes

00:24:32.000 --> 00:24:35.000
all good.

00:24:35.000 --> 00:24:37.000
And there are two places we paid for it.

00:24:37.000 --> 00:24:40.000
One is certainly the memory.

00:24:40.000 --> 00:24:45.000
So a 4 byte pointer one direction, a 4 byte pointer another direction,

00:24:45.000 --> 00:24:49.000
plus the 1 byte for the character means we have a 9 byte per

00:24:49.000 --> 00:24:50.000
character

00:24:50.000 --> 00:24:52.000
storage usage.

00:24:52.000 --> 00:24:55.000
And the other way, which is a little bit hard to represent with some number

00:24:55.000 --> 00:24:57.000
in a quantitative way, but actually it's also an important factor, which is

00:24:57.000 --> 00:25:00.000
your code got a lot more complicated.

00:25:00.000 --> 00:25:02.000
If you looked at the code for the vector of the stack version, it was a few

00:25:02.000 --> 00:25:03.000
lines here a few lines there.

00:25:03.000 --> 00:25:06.000
If you looked at the code for the link list singly we looked at, it's like you've got

00:25:06.000 --> 00:25:07.000
to wire up these pointers.

00:25:07.000 --> 00:25:10.000
Now imagine you've got to wire up twice as many pointers.

00:25:10.000 --> 00:25:14.000
Not only do you have the in and out going one direction, you'll have the in and out going the

00:25:14.000 --> 00:25:15.000
other direction,

00:25:15.000 --> 00:25:17.000
and just the

00:25:17.000 --> 00:25:21.000
opportunity to make errors with pointers has now gone up by a factor of two.

00:25:21.000 --> 00:25:24.000
The idea that you could get a vector version of this up and running in

00:25:24.000 --> 00:25:26.000
no time,

00:25:26.000 --> 00:25:30.000
compared to several days, let's say, of getting the doubly link one working. It

00:25:30.000 --> 00:25:32.000
might be a factor to keep in mind.

00:25:32.000 --> 00:25:36.000
If you're kind of investing for the long haul, sure, work hard, but know

00:25:36.000 --> 00:25:41.000
that it wasn't for free.

00:25:41.000 --> 00:25:42.000
So you've

00:25:42.000 --> 00:25:45.000
got 89 percent overhead.

00:25:45.000 --> 00:25:49.000
That's a lot of overhead for small bits of data.

00:25:49.000 --> 00:25:53.000
More likely, and this is actually the way that most modern word processors

00:25:53.000 --> 00:25:54.000
do do this,

00:25:54.000 --> 00:25:58.000
is they don't just make one or the other; it's not just a array or a stack or a link list.

00:25:58.000 --> 00:25:59.000
They

00:25:59.000 --> 00:26:04.000
actually look at ways to combine the features of both to get a hybrid,

00:26:04.000 --> 00:26:05.000
where you can take

00:26:05.000 --> 00:26:10.000
some of the advantages of one pipe of data, but also try to avoid its worst

00:26:10.000 --> 00:26:13.000
weaknesses. So the most common way they're going to do this is some kind of

00:26:13.000 --> 00:26:15.000
chunking strategy,

00:26:15.000 --> 00:26:17.000
where you have a segment of the

00:26:17.000 --> 00:26:21.000
buffer that kind of is moved aside and worked on in one little array. It

00:26:21.000 --> 00:26:25.000
might be that those arrays are ten or 20 or a sentence long, or paragraph long.

00:26:25.000 --> 00:26:28.000
It might be that they are actually dictated by when you change styles.

00:26:28.000 --> 00:26:30.000
All of the code that's in one font

00:26:30.000 --> 00:26:33.000
kind of gets moved into one chunk, and as soon as you change styles, it breaks

00:26:33.000 --> 00:26:37.000
into a new chunk that follows to kind of keep track of the formatting information. It

00:26:37.000 --> 00:26:40.000
depends on some of the other features that are present in the processor

00:26:40.000 --> 00:26:43.000
here, but it's likely that rather than having all 1,000 or all 10 million

00:26:43.000 --> 00:26:45.000
characters in one

00:26:45.000 --> 00:26:46.000
data structure,

00:26:46.000 --> 00:26:47.000
they're actually kind of separated

00:26:47.000 --> 00:26:50.000
to where you have a little bit of both. So the most likely thing is that

00:26:50.000 --> 00:26:53.000
there is some kind of array strategy on a chunk basis.

00:26:53.000 --> 00:26:56.000
There's a linking between those chunks.

00:26:56.000 --> 00:26:59.000
And what this allows you do is to

00:26:59.000 --> 00:27:01.000
share that overhead cost.

00:27:01.000 --> 00:27:05.000
So in this case, if I had every sentence in a chunk by itself,

00:27:05.000 --> 00:27:08.000
and then if I had next and previous pointer, which pointed to the next sentence and the previous

00:27:08.000 --> 00:27:09.000


00:27:09.000 --> 00:27:11.000
sentence, when I need to move the cursor forward or backwards

00:27:11.000 --> 00:27:15.000
within a chunk, it's just in changing this index. Oh, it's in index two, but

00:27:15.000 --> 00:27:16.000
now it's moving to three, or four, or five.

00:27:16.000 --> 00:27:20.000
And then when it gets to the end of that chunk, to the end of that sentence, and you move forward, well then it

00:27:20.000 --> 00:27:23.000
follows the next link forward.

00:27:23.000 --> 00:27:25.000
Similarly moving backwards, right, so

00:27:25.000 --> 00:27:29.000
within a chunk doing array-like manipulations, and then as it crosses the

00:27:29.000 --> 00:27:32.000
boundary using a link list steps to get further down.

00:27:32.000 --> 00:27:35.000
So when it needs to move to the front or the back, those things are easily

00:27:35.000 --> 00:27:37.000
accessible using head and tail pointers.

00:27:37.000 --> 00:27:39.000
When I need to do an edit,

00:27:39.000 --> 00:27:42.000
it operates in this array-like kind of

00:27:42.000 --> 00:27:44.000
way

00:27:44.000 --> 00:27:46.000
when there's space within that chunk. So you start editing in a sentence, then it

00:27:46.000 --> 00:27:49.000
would kind of do some shifting on that array. But the array itself being pretty

00:27:49.000 --> 00:27:51.000
small

00:27:51.000 --> 00:27:53.000
means that although you are paying that cost of shuffling,

00:27:53.000 --> 00:27:56.000
you are doing it on a smaller chunk. It's not 1,000 characters that have to move, it's

00:27:56.000 --> 00:27:59.000
20 or 40 that have to move within that sentence.

00:27:59.000 --> 00:28:02.000
And that when

00:28:02.000 --> 00:28:04.000
you need to insert a new sentence or a new chunk in the middle, you are doing

00:28:04.000 --> 00:28:08.000
link list-like manipulations to build a new chunk and insert and splice,

00:28:08.000 --> 00:28:11.000
so you have that flexibility, where the many characters aren't

00:28:11.000 --> 00:28:14.000
all affected by it. So you have kind of local array editing

00:28:14.000 --> 00:28:16.000
and global link list editing

00:28:16.000 --> 00:28:18.000
that gives you kind of the best of both worlds.

00:28:18.000 --> 00:28:19.000


00:28:19.000 --> 00:28:23.000
But again, the factor is cost showing up in code complexity.

00:28:23.000 --> 00:28:27.000
That what you are looking at is kind of space/time tradeoff. Well, I can throw space at

00:28:27.000 --> 00:28:29.000
this, and most people will say that in computer science, "Well I can throw space and this and

00:28:29.000 --> 00:28:31.000
get better times." So

00:28:31.000 --> 00:28:35.000
the double link list is a good example of something that throws space at a problem to get better

00:28:35.000 --> 00:28:37.000
time. There's a lot of overhead.

00:28:37.000 --> 00:28:39.000
Moving to the chunk list says I want to get back some of that space. I'm

00:28:39.000 --> 00:28:43.000
not willing to invest 90 percent overhead to get this. Can I find a way to

00:28:43.000 --> 00:28:44.000
take a

00:28:44.000 --> 00:28:47.000
whole sentence worth of things and add a few pointers here? That means that

00:28:47.000 --> 00:28:48.000
there's only two

00:28:48.000 --> 00:28:51.000
4 byte pointers per each sentence as opposed to each character,

00:28:51.000 --> 00:28:53.000
and that really way reduces the overhead.

00:28:53.000 --> 00:28:55.000
But then now,

00:28:55.000 --> 00:28:58.000
the problem is still back on your shoulders in terms of effort,

00:28:58.000 --> 00:29:02.000
now I have both an array, and a link list, and double links, and

00:29:02.000 --> 00:29:03.000
a

00:29:03.000 --> 00:29:07.000
lot of complicated stuff flying around to make it work.

00:29:07.000 --> 00:29:08.000
But it does work out.

00:29:08.000 --> 00:29:11.000
It's kind of the process that designers go through when

00:29:11.000 --> 00:29:12.000
they're building these kinds of

00:29:12.000 --> 00:29:16.000
key structures. It like, what are the options, and do any of the basic

00:29:16.000 --> 00:29:18.000
things we know about work? What about

00:29:18.000 --> 00:29:21.000
building even fancier things that kind of take the best of both world and mix

00:29:21.000 --> 00:29:23.000
them together and have a little bit of the weaknesses of this a little of that, but none of

00:29:23.000 --> 00:29:26.000
the really awful

00:29:26.000 --> 00:29:27.000
worst cases

00:29:27.000 --> 00:29:28.000
present in the end result.

00:29:28.000 --> 00:29:31.000
So in this case you'd have a lot of things that were constant time where constant was

00:29:31.000 --> 00:29:33.000
defined by the chunk size.

00:29:33.000 --> 00:29:37.000
So O of 100, let's say, if that was the maximum chunk size

00:29:37.000 --> 00:29:39.000
to do these things, as opposed to O of

00:29:39.000 --> 00:29:42.000
N for the whole block net,

00:29:42.000 --> 00:29:43.000
so kind of a neat

00:29:43.000 --> 00:29:45.000
little thing to think through.

00:29:45.000 --> 00:29:48.000
There was a time, just so you can feel

00:29:48.000 --> 00:29:51.000
gratified about how I've worked you very hard, but actually in the past I've been

00:29:51.000 --> 00:29:53.000
even harsher.

00:29:53.000 --> 00:29:57.000
We actually had them build the doubly linked chunked list editor buffer,

00:29:57.000 --> 00:30:00.000
and now we have you build the singly linked chunk list PQ, and it's

00:30:00.000 --> 00:30:02.000
actually

00:30:02.000 --> 00:30:06.000
still very complicated, as you'll find out, but not nearly quite as hairy as the

00:30:06.000 --> 00:30:09.000
full in both directions every way

00:30:09.000 --> 00:30:11.000
craziness.

00:30:11.000 --> 00:30:14.000
And you can kind of imagine in your mind.

00:30:14.000 --> 00:30:16.000
Any questions about edited buffer?

00:30:16.000 --> 00:30:18.000
We?ve got two more data structures to implement,

00:30:18.000 --> 00:30:20.000
two more things that we've been using,

00:30:20.000 --> 00:30:25.000
and happy to have in our arsenal that we need to know how they work.

00:30:25.000 --> 00:30:29.000
Let's talk about map first, and in

00:30:29.000 --> 00:30:31.000
fact the strategy that I'm going to end up showing you for map is actually going to be

00:30:31.000 --> 00:30:35.000
the same one we use for set, then we'll come back and think of an even more clever

00:30:35.000 --> 00:30:37.000
way to do map, too. So

00:30:37.000 --> 00:30:40.000
we've got two things to talk about, which are binary search trees and hashing. We'll

00:30:40.000 --> 00:30:41.000
do binary search trees first,

00:30:41.000 --> 00:30:44.000
and then we'll get back to the hashing the second time around.

00:30:44.000 --> 00:30:46.000
So

00:30:46.000 --> 00:30:49.000
map is, next to vector, probably the most useful thing you've got going on in

00:30:49.000 --> 00:30:50.000
your class library,

00:30:50.000 --> 00:30:53.000
all kinds of things that do look up based

00:30:53.000 --> 00:30:56.000
activities; looking up students by ID number,

00:30:56.000 --> 00:30:57.000
phone numbers,

00:30:57.000 --> 00:31:00.000
by name, any of these kinds of things where you need to have this key at associated satellite

00:31:00.000 --> 00:31:02.000
information. And

00:31:02.000 --> 00:31:05.000
you'd like to be able to do that retrieval and update quickly.

00:31:05.000 --> 00:31:08.000
The map is the one for you.

00:31:08.000 --> 00:31:11.000
So if you remember it's key value fears, it's all about key being the

00:31:11.000 --> 00:31:15.000
identifying mark that allows you to find the associated value. And then the value

00:31:15.000 --> 00:31:18.000
is actually just being stored and kind of retrieved without using it

00:31:18.000 --> 00:31:21.000
or examining it in any kind of interesting way.

00:31:21.000 --> 00:31:25.000
And what we're really interested in here is how to make this guy work really efficiently. So efficiently

00:31:25.000 --> 00:31:27.000
being hopefully log in or better,

00:31:27.000 --> 00:31:31.000
if we could get that on both the update operations that add and remove things

00:31:31.000 --> 00:31:33.000
from the map, as well as the

00:31:33.000 --> 00:31:34.000
look up operation, then

00:31:34.000 --> 00:31:37.000
that would please probably all our clients immensely, if we could get both of

00:31:37.000 --> 00:31:41.000
those going well.

00:31:41.000 --> 00:31:45.000
So I'm going to build you on that. I probably have enough time to do that.

00:31:45.000 --> 00:31:46.000
That

00:31:46.000 --> 00:31:48.000
attacks it from

00:31:48.000 --> 00:31:53.000
using simple tools to start with, just to kind of get a base line for what we can do.

00:31:53.000 --> 00:31:54.000


00:31:54.000 --> 00:31:56.000
Vector, our old friend vector,

00:31:56.000 --> 00:31:59.000
can't get too much mileage out of vector, apparently, because you can always find a

00:31:59.000 --> 00:32:00.000
way to make it do something

00:32:00.000 --> 00:32:01.000
useful for you.

00:32:01.000 --> 00:32:02.000
It

00:32:02.000 --> 00:32:03.000
gives us the convenience

00:32:03.000 --> 00:32:05.000


00:32:05.000 --> 00:32:07.000
of managing that thing with low overhead,

00:32:07.000 --> 00:32:09.000
direct access is good.

00:32:09.000 --> 00:32:11.000
We make a parastruct

00:32:11.000 --> 00:32:13.000
that has the key and the value together.

00:32:13.000 --> 00:32:16.000
We store it into a vector,

00:32:16.000 --> 00:32:18.000
and then we

00:32:18.000 --> 00:32:21.000
may or may not sort that vector. But if it's sorted, there's probably some good reasons. Think

00:32:21.000 --> 00:32:22.000
about

00:32:22.000 --> 00:32:26.000
what order to sort it in, what information to be using to sort that,

00:32:26.000 --> 00:32:28.000
and then we'll get value to add. We're just going to go

00:32:28.000 --> 00:32:30.000
through doing it, rather than talk about it.

00:32:30.000 --> 00:32:32.000
We'll just go over and

00:32:32.000 --> 00:32:36.000
make a vector happen.

00:32:36.000 --> 00:32:38.000
Let's get over here in X-code.

00:32:38.000 --> 00:32:40.000
Let's take a look at

00:32:40.000 --> 00:32:42.000
my map dot H,

00:32:42.000 --> 00:32:43.000
and all

00:32:43.000 --> 00:32:46.000
I'm going to put in there is actually add and get value.

00:32:46.000 --> 00:32:48.000
And

00:32:48.000 --> 00:32:50.000
the other one contains key and remove."

00:32:50.000 --> 00:32:52.000


00:32:52.000 --> 00:32:55.000
But these are these two operations that really matter to us, getting something in there, and

00:32:55.000 --> 00:32:57.000
getting something back out.

00:32:57.000 --> 00:32:58.000


00:32:58.000 --> 00:33:00.000
And so I'm going to start by

00:33:00.000 --> 00:33:02.000
building this thing on vector. So let me go ahead and

00:33:02.000 --> 00:33:14.000
indicate that I plan to use vector as part of

00:33:14.000 --> 00:33:15.000
what I'm doing. So I defined a little structure

00:33:15.000 --> 00:33:20.000
that's the pair. I'm going to call that "pair"

00:33:20.000 --> 00:33:23.000
even, that's a nice word for it. So the pair of the key and the value that go together,

00:33:23.000 --> 00:33:29.000
and then what I will store here

00:33:29.000 --> 00:33:31.000
is a vector of those pairs. They'll

00:33:31.000 --> 00:33:36.000
give me a key to value; I'll stick them together in that little struct, stick them into my vector, and

00:33:36.000 --> 00:33:38.000
hopefully be able to retrieve them later.

00:33:38.000 --> 00:33:42.000
So given that I'm depending only on vector, I don?t have any other

00:33:42.000 --> 00:33:45.000
information, I don't technically need to go out of my way to disallow then

00:33:45.000 --> 00:33:48.000
memory-wise copying, because vector does a deep copy. But I'm going to leave it in there, because I

00:33:48.000 --> 00:33:52.000
plan on going some place that is eventually going to need it. I might as well be safe from

00:33:52.000 --> 00:33:54.000
the get go.

00:33:54.000 --> 00:33:55.000
The vector

00:33:55.000 --> 00:34:00.000
will be set up and destroyed automatically as part of management of my

00:34:00.000 --> 00:34:04.000
data member. So I actually don't have any explicit allocation/deallocation, so if I look at my

00:34:04.000 --> 00:34:06.000
constructor and destructor they'll be totally empty.

00:34:06.000 --> 00:34:11.000
And then add and get value are where I'm going to actually get to do some work.

00:34:11.000 --> 00:34:14.000
Before I implement them, I'm just going to think for a second about

00:34:14.000 --> 00:34:16.000
how this is going to play out.

00:34:16.000 --> 00:34:20.000
Let's say I'm doing a vector that's storing

00:34:20.000 --> 00:34:26.000
strings and their length. Let's say, it's just a map of

00:34:26.000 --> 00:34:27.000
string,

00:34:27.000 --> 00:34:30.000


00:34:30.000 --> 00:34:34.000
and if I do M dot add

00:34:34.000 --> 00:34:37.000
car. I'm going to store a three with it,

00:34:37.000 --> 00:34:39.000
so on the other side of the wall,

00:34:39.000 --> 00:34:41.000
what we're going to do is make a little struct

00:34:41.000 --> 00:34:43.000
that has the word car

00:34:43.000 --> 00:34:47.000
and the number three with it, and we package it into our ray.

00:34:47.000 --> 00:34:51.000
And so then we get a second call to put something in like

00:34:51.000 --> 00:34:55.000
dog, also with a three, then we'll make a second one of these and stick it in

00:34:55.000 --> 00:34:58.000
our ray.

00:34:58.000 --> 00:35:01.000
And so on, so the idea is to have the vector really kind of manage where the storage is going. And

00:35:01.000 --> 00:35:02.000
then what

00:35:02.000 --> 00:35:05.000
we'll just do is package it up and stick it in there.

00:35:05.000 --> 00:35:08.000
The one thing, though, that I have to think a little bit about, because it sounds

00:35:08.000 --> 00:35:11.000
like add should be pretty easy. It almost seems like what I need to do is something

00:35:11.000 --> 00:35:13.000
as simple as this. And

00:35:13.000 --> 00:35:15.000
I'll start to write the code, and then you can tell me why

00:35:15.000 --> 00:35:17.000
this isn't what I want to do.

00:35:17.000 --> 00:35:20.000
I say pair TP, and I say P dot

00:35:20.000 --> 00:35:23.000
key equals key P dot

00:35:23.000 --> 00:35:30.000
value equals val, and then I say entries dot add P.

00:35:30.000 --> 00:35:32.000
Almost, but not quite what I want,

00:35:32.000 --> 00:35:34.000
what have I forgotten

00:35:34.000 --> 00:35:39.000
to take care of?

00:35:39.000 --> 00:35:42.000
If it's not already in there;

00:35:42.000 --> 00:35:46.000
this has to do with just what's the interface for nap. What does nap say about it if you

00:35:46.000 --> 00:35:48.000
try to add

00:35:48.000 --> 00:35:51.000
for a key a different value. So in the case of this one, it didn't quite make sense that I would add

00:35:51.000 --> 00:35:55.000
something, but let's say I accidentally put car in there with a link to the first time that I went

00:35:55.000 --> 00:35:57.000
to go fix it, right,

00:35:57.000 --> 00:36:01.000
that my subsequent to ask it store car with the proper value,

00:36:01.000 --> 00:36:03.000
three,

00:36:03.000 --> 00:36:05.000
doesn't create a totally new entry.

00:36:05.000 --> 00:36:09.000
It has to find the existing entry that already has car and overwrite it.

00:36:09.000 --> 00:36:13.000
So the behavior of mad was it could add when it was not previously existing.

00:36:13.000 --> 00:36:16.000
It also could be an overwrite of an existing value. So we do need

00:36:16.000 --> 00:36:17.000
to find

00:36:17.000 --> 00:36:20.000
if there is an entry that already has that key, and just

00:36:20.000 --> 00:36:21.000
replace its value

00:36:21.000 --> 00:36:23.000
rather than

00:36:23.000 --> 00:36:26.000
add a second entry with the same key. So at any given point, right, in this whole

00:36:26.000 --> 00:36:28.000
vector, there should be one entry

00:36:28.000 --> 00:36:30.000
for each unique key.

00:36:30.000 --> 00:36:33.000
So I have to do a search. And

00:36:33.000 --> 00:36:35.000
let me

00:36:35.000 --> 00:36:38.000
write this code, and then I'm going to mention why I'm going to decompose it in a

00:36:38.000 --> 00:36:39.000
second.

00:36:39.000 --> 00:36:42.000
If I do entries dot size, I plus plus

00:36:42.000 --> 00:36:44.000
then if

00:36:44.000 --> 00:36:49.000
entries of I dot key equals key.

00:36:49.000 --> 00:36:51.000
Then

00:36:51.000 --> 00:36:52.000
let's break. Let me pull out

00:36:52.000 --> 00:37:00.000
I so I can get access to it when I'm done. So after this

00:37:00.000 --> 00:37:01.000
loop exits,

00:37:01.000 --> 00:37:06.000
if I is less than entries dot size,

00:37:06.000 --> 00:37:06.000
then

00:37:06.000 --> 00:37:09.000
I know that it found a match.

00:37:09.000 --> 00:37:12.000
In the case where it went through all of them, and it never hit the break. Then I will

00:37:12.000 --> 00:37:15.000
have gone all the way to where it is exactly equal to entries dot size; if it didn't then I

00:37:15.000 --> 00:37:16.000
just

00:37:16.000 --> 00:37:19.000
have a place to overwrite. And I can say entries of

00:37:19.000 --> 00:37:21.000
I dot val

00:37:21.000 --> 00:37:23.000
equals val.

00:37:23.000 --> 00:37:27.000
Otherwise I go through this process of adding a new one. I had to

00:37:27.000 --> 00:37:30.000
go hunt that thing down

00:37:30.000 --> 00:37:33.000
and replace it.

00:37:33.000 --> 00:37:35.000
That code,

00:37:35.000 --> 00:37:38.000
when you look at it, you can say, "Gosh I feel like that code's going to be familiar."

00:37:38.000 --> 00:37:43.000
Because that idea of searching to find a match to the key

00:37:43.000 --> 00:37:45.000
probably is going to come in really handy

00:37:45.000 --> 00:37:46.000
when I need to do get value,

00:37:46.000 --> 00:37:50.000
but it has the same exact problem of oh, I've got to go find that match. So in fact,

00:37:50.000 --> 00:37:55.000
that kind of motivates the idea of why don't I just take this little piece of code

00:37:55.000 --> 00:37:59.000
and separate it out and do a helper that they can both use. I

00:37:59.000 --> 00:38:02.000
may not have planned for this from the beginning, but once I see it happening, I might as well fix

00:38:02.000 --> 00:38:05.000
it. So I can say find index

00:38:05.000 --> 00:38:07.000
for key that

00:38:07.000 --> 00:38:10.000
given a key will return to you

00:38:10.000 --> 00:38:11.000


00:38:11.000 --> 00:38:15.000
the vector index at which

00:38:15.000 --> 00:38:19.000
that key is or a negative one. I don't know if I need to keep that

00:38:19.000 --> 00:38:22.000
one anymore.

00:38:22.000 --> 00:38:24.000
If it ever found it, it returns I otherwise

00:38:24.000 --> 00:38:28.000
it returned negative one. That can be our not found.

00:38:28.000 --> 00:38:37.000
I can actually use that up here,

00:38:37.000 --> 00:38:40.000
and then I can say, if found

00:38:40.000 --> 00:38:43.000
does not equal negative one.

00:38:43.000 --> 00:38:44.000
Then we do that,

00:38:44.000 --> 00:38:46.000
and then that

00:38:46.000 --> 00:38:48.000
tells us that

00:38:48.000 --> 00:38:51.000
this little piece of code is going to come in handy right up here,

00:38:51.000 --> 00:38:55.000
get value if found does not equal

00:38:55.000 --> 00:38:56.000
negative one, then we can return that

00:38:56.000 --> 00:39:02.000
and what is the behavior of get value if it didn't find it? Does anybody remember?

00:39:02.000 --> 00:39:06.000
I ask it to get the value in this case for lollipop and it's not

00:39:06.000 --> 00:39:07.000
there,

00:39:07.000 --> 00:39:13.000
does it turn zero, what does it

00:39:13.000 --> 00:39:22.000
do? Untracked forget value, remember?

00:39:22.000 --> 00:39:25.000
This is an error.

00:39:25.000 --> 00:39:26.000
There is not sort of a

00:39:26.000 --> 00:39:30.000
general case return value you can produce here

00:39:30.000 --> 00:39:33.000
that will really sort of sufficiently describe to someone who's made

00:39:33.000 --> 00:39:37.000
this call in error what happened, right? I can't return zero, because it happens to be

00:39:37.000 --> 00:39:40.000
that sometimes I?m putting in, and sometimes I'm putting strings or structs

00:39:40.000 --> 00:39:44.000
kinds of things. There's no general zero or negative one to use,

00:39:44.000 --> 00:39:47.000
so the behavior of get value is if you ask it to retrieve a value for a key, it ought to be

00:39:47.000 --> 00:39:48.000
there. So

00:39:48.000 --> 00:39:52.000
the corresponding implementation would likely also just go through

00:39:52.000 --> 00:39:56.000
the process of contains key, which we call find index for key and check.

00:39:56.000 --> 00:39:58.000
It could be used by the client

00:39:58.000 --> 00:40:01.000
if they need to know in advance that it's really there.

00:40:01.000 --> 00:40:04.000
So let's see how I'm

00:40:04.000 --> 00:40:06.000
doing. Let's go back to my

00:40:06.000 --> 00:40:11.000
code here,

00:40:11.000 --> 00:40:13.000
and this

00:40:13.000 --> 00:40:15.000
being

00:40:15.000 --> 00:40:16.000
not my specialty,

00:40:16.000 --> 00:40:20.000
and just see that I can put something in and

00:40:20.000 --> 00:40:25.000
get it back out would be a good first test to see that things are

00:40:25.000 --> 00:40:29.000
going as they should. Oh, no it didn't like that. Oh yeah, find

00:40:29.000 --> 00:40:30.000
index for key.

00:40:30.000 --> 00:40:32.000
Find index for key

00:40:32.000 --> 00:40:34.000
was not

00:40:34.000 --> 00:40:38.000
populated clear in my dot H, so let me go move it there. I'll put

00:40:38.000 --> 00:40:42.000
it in the private section, because I don?t want this to be something that is

00:40:42.000 --> 00:40:45.000
exposed outside of the implementation. The idea that there is an index and

00:40:45.000 --> 00:40:48.000
there is a vector is not really something that I would want to make part of my

00:40:48.000 --> 00:40:54.000
public interface. It's really just an internal detail of how I'm managing stuff.

00:40:54.000 --> 00:40:56.000
Something about what I did, oh yeah, I

00:40:56.000 --> 00:41:00.000
used the right name for that.

00:41:00.000 --> 00:41:05.000
One more case let

00:41:05.000 --> 00:41:06.000
me just take a look at them both

00:41:06.000 --> 00:41:09.000
before I let the compiler tell me what is and what isn't right about them. Their both

00:41:09.000 --> 00:41:13.000
doing find index for key using linear search. If it comes back at not negative one

00:41:13.000 --> 00:41:17.000
pulling out the matching value and overriding it, and then in the case of adding

00:41:17.000 --> 00:41:23.000
or replacing the error -

00:41:23.000 --> 00:41:24.000
so the

00:41:24.000 --> 00:41:26.000
previously stored value for that is three,

00:41:26.000 --> 00:41:29.000
and if I go and I do one of those tests it's always good to know what

00:41:29.000 --> 00:41:34.000
happens if I ask it for something that doesn't exist.

00:41:34.000 --> 00:41:37.000
Making sure that the error handling that I put in there does

00:41:37.000 --> 00:41:43.000
what it was supposed to do, right? Are

00:41:43.000 --> 00:41:46.000
there any questions about the code that I wrote here?

00:41:46.000 --> 00:41:49.000
The vector is always kind of a good starting place for these things, because actually it just

00:41:49.000 --> 00:41:53.000
tends to use very simple things, and it tends to lend itself to easy code.

00:41:53.000 --> 00:41:57.000
But because of vector constraints being at a contiguous back structure, right, there's likely to

00:41:57.000 --> 00:41:59.000
be some

00:41:59.000 --> 00:42:01.000
performance implication. And in

00:42:01.000 --> 00:42:02.000
particular,

00:42:02.000 --> 00:42:04.000
in the unsorted case here,

00:42:04.000 --> 00:42:07.000
both add and get value

00:42:07.000 --> 00:42:09.000
involve doing a linear search.

00:42:09.000 --> 00:42:12.000
In some cases it will find it quickly at the very beginning or even half way

00:42:12.000 --> 00:42:13.000
through.

00:42:13.000 --> 00:42:17.000
In other cases it won't find it at all and have to search through the whole thing. So on

00:42:17.000 --> 00:42:20.000
average, if you figure it's equally likely to be in any position or not there at all,

00:42:20.000 --> 00:42:23.000
it's at going to at least have to look at half of them or more.

00:42:23.000 --> 00:42:25.000
And so we can say they're both linear.

00:42:25.000 --> 00:42:28.000
If we change it to be sorted,

00:42:28.000 --> 00:42:31.000
which is the first obvious improvement to make here,

00:42:31.000 --> 00:42:34.000
then get value

00:42:34.000 --> 00:42:37.000
gets a lot faster, because we can take advantage of binary search.

00:42:37.000 --> 00:42:39.000
I sort it by key,

00:42:39.000 --> 00:42:43.000
so ignoring the value, values are just satellite data. But stored in order of

00:42:43.000 --> 00:42:44.000
key,

00:42:44.000 --> 00:42:48.000
then all the A words, B words, C words, D words, what not, so

00:42:48.000 --> 00:42:51.000
walking it down the middle and seeing which half you look at, then looking at the

00:42:51.000 --> 00:42:53.000
quarter of there and the eight of there. It's going to very quickly narrow down on

00:42:53.000 --> 00:42:57.000
where it had to be if it was there or if it wasn't there at all.

00:42:57.000 --> 00:43:00.000
So we can implement that in logarithmic time, meaning that if you 1,000

00:43:00.000 --> 00:43:04.000
entries, it takes ten comparisons to find it or agree that it's not there.

00:43:04.000 --> 00:43:08.000
If you double that it will take one more comparison. If

00:43:08.000 --> 00:43:12.000
you go to 2,000, negligibly faster, even numbers in the millions

00:43:12.000 --> 00:43:15.000
are still just a handful

00:43:15.000 --> 00:43:19.000
of comparisons to work it out, and say,

00:43:19.000 --> 00:43:22.000
to the twentieth for example is roughly a million; so 20 comparisons in or out,

00:43:22.000 --> 00:43:25.000
and so it didn't look at a million things a million times over.

00:43:25.000 --> 00:43:27.000
However,

00:43:27.000 --> 00:43:30.000
that didn't improve add,

00:43:30.000 --> 00:43:31.000
why not?

00:43:31.000 --> 00:43:33.000
Why does keeping an assorted order

00:43:33.000 --> 00:43:37.000
not have the same boost for add as it did for get value?

00:43:37.000 --> 00:43:46.000
Can I do the same logarithmic search? You have to

00:43:46.000 --> 00:43:50.000
move all the elements ?

00:43:50.000 --> 00:43:51.000
Yeah,

00:43:51.000 --> 00:43:55.000
you can find the place fast, right, you can say returning the word

00:43:55.000 --> 00:43:59.000
apple it's like oh, okay. I can very quickly narrow in on where it needed to be. If it

00:43:59.000 --> 00:44:02.000
was there I could override it quickly, but in the case where it really needed to do

00:44:02.000 --> 00:44:03.000
an add,

00:44:03.000 --> 00:44:04.000
it's got to move them down.

00:44:04.000 --> 00:44:05.000
So the old add

00:44:05.000 --> 00:44:09.000
did all its work in the search and then just tacked it on the end if it didn't find

00:44:09.000 --> 00:44:12.000
it, or overrode the middle. But the new add

00:44:12.000 --> 00:44:16.000
also has to modify the array to put it in the right place, and putting it in the right place means

00:44:16.000 --> 00:44:18.000
shuffling to make space.

00:44:18.000 --> 00:44:20.000
So even though it found it quickly,

00:44:20.000 --> 00:44:25.000
it was unable to update that quickly, so it still ends up being

00:44:25.000 --> 00:44:26.000
linear in that case.

00:44:26.000 --> 00:44:29.000
Now this actually not a terrible result,

00:44:29.000 --> 00:44:30.000
as it stands

00:44:30.000 --> 00:44:33.000
it's probably much more likely that you're going to load up the data

00:44:33.000 --> 00:44:36.000
structure once and then do a lot

00:44:36.000 --> 00:44:40.000
of searches on it, then infrequent additions. That's a pretty common usage

00:44:40.000 --> 00:44:41.000
pattern for a map. It's kind of like you

00:44:41.000 --> 00:44:43.000
build a dictionary, right,

00:44:43.000 --> 00:44:46.000
loading the dictionary once could be expensive, but then people get tons and tons of

00:44:46.000 --> 00:44:49.000
look ups of words later. But you don't change the definitions a lot later or add a

00:44:49.000 --> 00:44:51.000
lot of new words.

00:44:51.000 --> 00:44:51.000
So

00:44:51.000 --> 00:44:56.000
this might actually be quite tolerable in many situations

00:44:56.000 --> 00:44:59.000
that the building of it was expensive, but it gave

00:44:59.000 --> 00:45:00.000
you

00:45:00.000 --> 00:45:02.000
fast search access.

00:45:02.000 --> 00:45:05.000
But really what you want to do is say, can we get both of those? So driving to say

00:45:05.000 --> 00:45:08.000
what we do with the editor buffer, you say, "Well, what if I just want both of those operations

00:45:08.000 --> 00:45:11.000
to be logarithmic or better, what can I do?" So

00:45:11.000 --> 00:45:12.000
I'll leave you with kind of just

00:45:12.000 --> 00:45:16.000
a little bit of a taste of where we're heading.

00:45:16.000 --> 00:45:19.000
Just to think a little bit about what the link

00:45:19.000 --> 00:45:21.000
list does and doesn't buy us,

00:45:21.000 --> 00:45:24.000
right, that the link list gave us flexibility

00:45:24.000 --> 00:45:29.000
at the editing, the local editing site, but it doesn't give us fast searching.

00:45:29.000 --> 00:45:31.000
Well maybe we can take this link list and try to

00:45:31.000 --> 00:45:33.000
add searching

00:45:33.000 --> 00:45:37.000
to the idea of this flexibility to kind of get the best of both

00:45:37.000 --> 00:45:41.000
worlds. So that's what we're going to start on Monday, building a tree. So if

00:45:41.000 --> 00:45:44.000
you have time today, I'd love to have you come to Truman, and otherwise have a good

00:45:44.000 --> 00:45:45.000
weekend, and

00:45:45.000 --> 00:45:50.000
get your PQ work in.