Lecture 21 | Programming Abstractions (Stanford)
-
0:00 - 0:02
-
0:02 - 0:10
-
0:10 - 0:13This presentation is delivered by the Stanford Center for Professional
-
0:13 - 0:21Development.
-
0:21 - 0:23Let us
-
0:23 - 0:25refresh about where we were at the end of Wednesday.
-
0:25 - 0:29As we had talked about the vector form of the buffer, so the editor buffer that's
-
0:29 - 0:31backing the
-
0:31 - 0:34word processing client,
-
0:34 - 0:37and the two stack version, where we're shuffling stuff, often on the before
-
0:37 - 0:40and after stacks to get things done, right?
-
0:40 - 0:43And when we were looking at the main six operations, we're looking at the four
-
0:43 - 0:47cursor movements and the two editing operations, right, that we had traded off.
-
0:47 - 0:49Where editing had been slow in the vector,
-
0:49 - 0:52to where editing was now fast, but then these long distance movements were slow,
-
0:52 - 0:54because of the shuffling.
-
0:54 - 0:57And then there was a little discussion at the very end about how this probably
-
0:57 - 1:00increases our space requirements, because we're now keeping two stacks that are
-
1:00 - 1:02each capable of holding
-
1:02 - 1:04the entire contents
-
1:04 - 1:06as opposed to one vector here
-
1:06 - 1:09with its slop.
-
1:09 - 1:10So that was where we were at.
-
1:10 - 1:13We are still in kind of the pursuit of this Holy Grail of could we
-
1:13 - 1:16get it to where everything is fast. Is it always a matter of tradeoffs or could we just
-
1:16 - 1:19invest some more smarts and more
-
1:19 - 1:20cleverness
-
1:20 - 1:24and hard work and squeeze it down to where we could get everything to perform
-
1:24 - 1:26efficiently? So typically when computer scientists say they want something to perform
-
1:26 - 1:28efficiently, they're hoping for log
-
1:28 - 1:32in time or faster. Constant time is the best. Log in grows so slowly that in fact,
-
1:32 - 1:34logarithm time is effectively constant
-
1:34 - 1:40for all reasonable values of N up to millions and billions.
-
1:40 - 1:43And so typically anything that is linear or higher is often a weak spot that you are
-
1:43 - 1:47trying to look at, and certainly things that are quadratic or exponential
-
1:47 - 1:51are definite opportunities for improvement.
-
1:51 - 1:53So let's look at this idea of the link list,
-
1:53 - 1:55so that both
-
1:55 - 1:56the
-
1:56 - 1:58vector and the stack
-
1:58 - 2:03- we're relying on this idea that continuous memory is the
-
2:03 - 2:06weak link that was causing some problems. In the vector, that shuffling to
-
2:06 - 2:08move things up and down,
-
2:08 - 2:12or in the case of the stack version, one side to the other
-
2:12 - 2:15was part of what we were working up against.
-
2:15 - 2:18And the link list promises to give us this opportunity to have these each
-
2:18 - 2:21individual character being managed separately,
-
2:21 - 2:26and flexibility might help resolve some of our problems here.
-
2:26 - 2:30SO what we're going to look at is the idea of implementing the buffer using a link
-
2:30 - 2:32list. So if I have
-
2:32 - 2:34characters A, B, C, D, E,
-
2:34 - 2:37I could have a link list here where I'm pointing to the head cell, which is A,
-
2:37 - 2:40which points to B, and C, and D, and E, and down to the null.
-
2:40 - 2:43SO what I'm modeling it as in the private data section will be a
-
2:43 - 2:44little
-
2:44 - 2:47link list node with a character in there, a pointer to the next one.
-
2:47 - 2:52And then I'm going to keep two pointers, one to the front-most cell, and then that's going to
-
2:52 - 2:55model where the cursor is within the buffer.
-
2:55 - 2:59So rather than doing this like an index, an index was handy in the case of vector where we
-
2:59 - 3:00had direct access, right?
-
3:00 - 3:02Given the way the link list works,
-
3:02 - 3:05knowing that it's at the fifth cell, the fourth cell isn't really very helpful. It would be more
-
3:05 - 3:09helpful to have the pointer kind of directly in the midst of things to give us direct
-
3:09 - 3:12access to where the cursor is.
-
3:12 - 3:14But let's think a little bit about that cursor,
-
3:14 - 3:17because there's an important decision to be made early about helping to
-
3:17 - 3:20facilitate the later work we have to do.
-
3:20 - 3:24I have the contents A, B, C, D, E,
-
3:24 - 3:28so the cursor is actually between two characters.
-
3:28 - 3:31If the cursor right now is situated
-
3:31 - 3:32after the A and before the B,
-
3:32 - 3:33what I'm
-
3:33 - 3:36modeling is A cursor B,
-
3:36 - 3:38C, D,
-
3:38 - 3:39then
-
3:39 - 3:42it seems like the obvious two choices for where the cursor might point is to
-
3:42 - 3:45A or to B. And I
-
3:45 - 3:50had said in the case of the vector that if it were index zero or index one
-
3:50 - 3:51that
-
3:51 - 3:53there wasn't a really strong preference for one over the other.
-
3:53 - 3:56There is going to be a really good reason to pick one for the link list
-
3:56 - 3:59over the other. Would you rather point to the A
-
3:59 - 4:11for where the cursor is or would you rather point to the B? [Inaudible]
-
4:11 - 4:14Exactly, so the idea is that when the cursor position is there, the next edit
-
4:14 - 4:18actually goes in between the A and the B. So if I insert the character
-
4:18 - 4:21Z, it really does go right here.
-
4:21 - 4:24And if I am going to, if effect, wire that in and splice it in then list,
-
4:24 - 4:29if I'm pointing to B then I'm kind of hosed, because I'm already one past where I need to be able slice in. Because I need to
-
4:29 - 4:33know what was behind it to bring it in. So I'm pointing to the character behind the
-
4:33 - 4:34cursor
-
4:34 - 4:36gives you access to
-
4:36 - 4:39wiring this down to here and this into there, and then updating the cursor to
-
4:39 - 4:40point to the Z
-
4:40 - 4:44without having to do any of the difficult backward looking backward
-
4:44 - 4:45traversal.
-
4:45 - 4:48So that strongly suggests what it is, it's going to point to
-
4:48 - 4:52A when it's between A and B; B when it's between B and C; and so on.
-
4:52 - 4:56There is another little point here, which is there are actually five letters
-
4:56 - 4:59in the cursor, but there are six cursor positions. It could be at the very
-
4:59 - 5:00beginning,
-
5:00 - 5:03in between all these, or at the very end.
-
5:03 - 5:07And the way we've chosen it right now, I can identify all of the five
-
5:07 - 5:08positions
-
5:08 - 5:11that have at least one character to the left, so being -
-
5:11 - 5:12pointing to the A means it's
-
5:12 - 5:15after A; pointing to the B it's after B; and so on, all the way down to the E.
-
5:15 - 5:18But I don't have a cursor position for
-
5:18 - 5:23representing when the cursor is at the very beginning of the buffer.
-
5:23 - 5:25So
-
5:25 - 5:28one strategy I could use, right, is to say I'll just use the special cursor
-
5:28 - 5:29position of null. But
-
5:29 - 5:32I need one more pointer value to hold onto,
-
5:32 - 5:36and I could make a special case out of this because there's null. Once I've done that though,
-
5:36 - 5:40I've actually committed myself to this thing that's going to require all of my code to be managing a
-
5:40 - 5:41special case of,
-
5:41 - 5:43well when the cursor's null do something a little different than when
-
5:43 - 5:46the cursor is pointing to some cell. I'm going to
-
5:46 - 5:50show you a way that we can kind of
-
5:50 - 5:54avoid the need for making a special case out of it,
-
5:54 - 5:55by
-
5:55 - 5:58wasting a little memory and reorganizing our list
-
5:58 - 6:01to allow for there to be a sixth cell,
-
6:01 - 6:04and extra cell that we call the dummy cell.
-
6:04 - 6:07So in this case that cell would go at the very front.
-
6:07 - 6:10It would have no character, just let the character field be unstatused. It's
-
6:10 - 6:13not really a character. What it is, is just a placeholder.
-
6:13 - 6:16And when I created the buffer to begin with, even when it's empty, it's always going to have
-
6:16 - 6:20that cell. I'm going to create that cell in advance, I'm going to have it sitting ready and waiting, and that cell
-
6:20 - 6:24never gets deleted or moved or changed. In fact, the list always, the head
-
6:24 - 6:28of this will always point to the same cell from the beginning of this object's lifetime
-
6:28 - 6:32to the end. So it will never change this at all, and the cells that follow are
-
6:32 - 6:35the ones that are going to be changing and growing and rearranging in
-
6:35 - 6:38response to editing commands.
-
6:38 - 6:40What this is going to buy us is a couple of things. First off, it's going to make it
-
6:40 - 6:42so there is a sixth cursor position.
-
6:42 - 6:45So now when the cursor is at the very beginning,
-
6:45 - 6:47we can set the cursor to point to this dummy cell.
-
6:47 - 6:48Okay
-
6:48 - 6:51that's one thing that we solve by doing that.
-
6:51 - 6:55Another thing that we've solved is actually we have made all the other
-
6:55 - 6:57cells on the list, the ones that are being edited and manipulated.
-
6:57 - 7:00We've made them all totally symmetric.
-
7:00 - 7:02Previously there was always going to be a little bit of a special case about, well if you're
-
7:02 - 7:06the front of the list, we're seeing that sometimes we're doing inserts on the
-
7:06 - 7:08stack in queue link list. That
-
7:08 - 7:12inserting later in the list involves kind of wiring in two pointers; one to command and one
-
7:12 - 7:13to go out.
-
7:13 - 7:16But there was a special case that oh, if you're the very first cell of the list, then
-
7:16 - 7:19your outpointer is the same, but your incoming pointer is resetting the head.
-
7:19 - 7:22That piece has actually gone away now,
-
7:22 - 7:26that now every cell in there always has a predecessor. It always has a previous,
-
7:26 - 7:29right? Something behind it so that actually they will always be
-
7:29 - 7:31pulling the pointer in and pulling the pointer out
-
7:31 - 7:34for every cell with character data that's being modified.
-
7:34 - 7:37And that means some of that extra little handing of oh, if you're the
-
7:37 - 7:39very first cell,
-
7:39 - 7:40have gone away.
-
7:40 - 7:44The very first cell is always this kind of dummy cell and doesn't require us to go
-
7:44 - 7:46out of our way to do anything special for it.
-
7:46 - 7:49So this is going to solve a bunch of problems for us, and it's kind of a neat technique. This thing
-
7:49 - 7:51is used actually
-
7:51 - 7:52fairly commonly
-
7:52 - 7:57in situations just managing a link list just to buy yourself some symmetry in the
-
7:57 - 8:02remaining cells.
-
8:02 - 8:05So let me look at some code with you, and I'm going to actually draw some pictures as I
-
8:05 - 8:06go. I'm
-
8:06 - 8:10not going to write this code in real time, because I think it's actually more important to
-
8:10 - 8:11see it
-
8:11 - 8:15acting on things than it is to see me typing out the characters.
-
8:15 - 8:18But the mechanism, right, in the case of the buffer here
-
8:18 - 8:21is it's going to have a head point,
-
8:21 - 8:23which points to the front most cell; and a
-
8:23 - 8:24cursor,
-
8:24 - 8:28which points to the character that proceeds the cursor. So if right now the
-
8:28 - 8:31contents of the buffer
-
8:31 - 8:34are
-
8:34 - 8:37the contents A, D, C.
-
8:37 - 8:39And let's say that right now
-
8:39 - 8:40the
-
8:40 - 8:42cursor is pointing to A.
-
8:42 - 8:45Actually, I need my
-
8:45 - 8:50dummy cell. Let me move everybody down one, so you just have dummy
-
8:50 - 8:52cell and my A and my B.
-
8:52 - 8:56And so the cursor right now is at the very front of the buffer.
-
8:56 - 8:59This is kind of what it looks like. It has two characters, the A and the B.
-
8:59 - 9:03I'm going to trace through the process of inserting a new character
-
9:03 - 9:04into the buffer. It
-
9:04 - 9:07is that my local variable CP
-
9:07 - 9:11is going to point to a new cell allocated out in the heap.
-
9:11 - 9:15Let's say the character I'm going to insert is the Z, so I'll
-
9:15 - 9:16assign
-
9:16 - 9:18CH to the character, the slot right there.
-
9:18 - 9:20And then the next for this
-
9:20 - 9:24gets to - what the cursor is - kind of following the cursor. So the
-
9:24 - 9:26cursor is pointing to the cell before it,
-
9:26 - 9:29and the next field of that cell tells us what follows it;
-
9:29 - 9:31so if I trace my cursor's
-
9:31 - 9:33next field, it points to A,
-
9:33 - 9:35and so I'm going to go ahead and
-
9:35 - 9:41set up this new cell's next field to also point to A.
-
9:41 - 9:44So now I've got two different ways to get to the A cell, tracing off the dummy cell or
-
9:44 - 9:46tracing off this new cell.
-
9:46 - 9:48And then I
-
9:48 - 9:51update the cursor's next field to point to CP.
-
9:51 - 9:55So I just copied right there that CP's next field was pointing to A. I copied it down here
-
9:55 - 9:57to kind of do the splice out.
-
9:57 - 10:02Now I'm going to do the splice in,
-
10:02 - 10:05causing it to no longer point to A,
-
10:05 - 10:06but instead to
-
10:06 - 10:08point to my new cell down here, CP.
-
10:08 - 10:10And then I update the cursor
-
10:10 - 10:13to point to this new cell. That just means that subsequent inserts, like the
-
10:13 - 10:18C is kind of behind the cursor after I've made this
-
10:18 - 10:21
-
10:21 - 10:23edit.
-
10:23 - 10:25That goes away when that one ends,
-
10:25 - 10:28and so now the new structure that I have here is
-
10:28 - 10:31head still points to the dummy cell that never changes.
-
10:31 - 10:33Dummy cell now points to Z,
-
10:33 - 10:36Z points to A, Z points to B,
-
10:36 - 10:41and then the cursor in this case is between the Z and the A,
-
10:41 - 10:44by virtue of the fact that the cursor is pointing to Z.
-
10:44 - 10:50A subsequent one would kind of do the same thing. If I typed out ZY
-
10:50 - 10:51after I did this. We'd
-
10:51 - 10:55make a new cell over here. It would get what
-
10:55 - 10:58was currently coming out of the cursor's field. The next field, right, we would
-
10:58 - 11:02update cursor's next field to point
-
11:02 - 11:05to this new one. And then we would update cursor
-
11:05 - 11:08to point to this guy. And
-
11:08 - 11:10so now following the list, right, dummy cell,
-
11:10 - 11:11Z,
-
11:11 - 11:16Y, A, B, cursor pointing to the Y, which is the
-
11:16 - 11:20character directly to the left of the cursor.
-
11:20 - 11:22And so inserting in any position now,
-
11:22 - 11:25front cell, middle cell, end cell,
-
11:25 - 11:29doesn't require any special case handling. So you don't see any if of things.
-
11:29 - 11:33And you notice that you'll never see an update to the head directly here.
-
11:33 - 11:37That the head was set up to point to the dummy cell in the constructor
-
11:37 - 11:39and never needs any adjusting.
-
11:39 - 11:42It's only the cells after it that requires
-
11:42 - 11:43
-
11:43 - 11:44the new
-
11:44 - 11:47arrangement of the pointers coming in and
-
11:47 - 11:49out. When I'm ready to delete a cell,
-
11:49 - 11:53then the mechanism for delete in the buffer is to delete the thing that follows
-
11:53 - 11:55the cursor. So if I have Z, Y, A, B,
-
11:55 - 11:58this is the character that delete is supposed to delete, the one after the
-
11:58 - 11:59cursor.
-
11:59 - 12:00And so,
-
12:00 - 12:04the first thing it looks at is to make sure that we're not already at the end.
-
12:04 - 12:07In the case of the link list form of this, if the cursor was pointing to
-
12:07 - 12:09the last most cell,
-
12:09 - 12:12and then we looked at it's next field and saw that it was null, that would be a clue. Oh, it's
-
12:12 - 12:15pointing to the very last cell, there's nothing that follows it. So we have a
-
12:15 - 12:17quick test to make sure that there's something there to delete.
-
12:17 - 12:19There is something following the cursor, and
-
12:19 - 12:24in this case since the cursor is pointing to the Y, we're good. It says look at its
-
12:24 - 12:25next field, it points to A.
-
12:25 - 12:27It says okay,
-
12:27 - 12:29call this thing old
-
12:29 - 12:30and point to that A.
-
12:30 - 12:33Now do a wire around it.
-
12:33 - 12:34So take the cursor's
-
12:34 - 12:36next field, and
-
12:36 - 12:39so this thing is where I'm targeting my overwrite,
-
12:39 - 12:44and copy out of the old its next. It's
-
12:44 - 12:49no longer pointing to the A, but instead it's
-
12:49 - 12:50pointing to the B.
-
12:50 - 12:52And deleting the old, which
-
12:52 - 12:54causes this piece of memory to
-
12:54 - 12:56be reclaimed,
-
12:56 - 13:01and now the contents of my buffer have been shifted over to
-
13:01 - 13:06where it goes Z, Y, and straight to D. The cursor in this case doesn't move. It
-
13:06 - 13:07actually deleted to the right, and so
-
13:07 - 13:11whatever was to the left of the cursor just stays in place.
-
13:11 - 13:15So again, no need to update anything special for the first or the last or any of those
-
13:15 - 13:15cells,
-
13:15 - 13:19but the symmetry of all the cells past the dummy kind of buys us some convenience
-
13:19 - 13:22of handling it. What does a dummy cell look like? The
-
13:22 - 13:26dummy cell looks like just any other cell. It's actually a cell, T, and what you do is
-
13:26 - 13:29you just don't assign the Christian field, the character field, because it has nothing in it.
-
13:29 - 13:32So it actually looks just like all the others, it's just another node in the
-
13:32 - 13:33chain.
-
13:33 - 13:36But this one, you know you put there purposefully just as the placeholder.
-
13:36 - 13:40So it doesn't contain any character data. So if you're ready to print the
-
13:40 - 13:42contents of the buffer, you have to remember the dummy cell is there,
-
13:42 - 13:43and just skip over it. You
-
13:43 - 13:45tend to start your list
-
13:45 - 13:49at heads next, the first interesting cell to look at, then work
-
13:49 - 13:51your way down to the end. So it's
-
13:51 - 13:54just like all the others, but you just know
-
13:54 - 14:00that the character data here is useless, it's not important. You didn't even write anything in there. When you delete, you just go left? It turns out
-
14:00 - 14:02the delete in this
-
14:02 - 14:04case goes forward.
-
14:04 - 14:07So you can imagine that in some other world it does, but all the ones we
-
14:07 - 14:11have done so far have always deleted forward. So the delete - for example, the stack case,
-
14:11 - 14:13grabbed from the after stack
-
14:13 - 14:16- it's done that way, because it happens to simplify some other
-
14:16 - 14:19things that you can imagine that delete and reverse. What would it take? Well, you'd have to back
-
14:19 - 14:22the cursor up. And we're going to find out pretty soon that would make it
-
14:22 - 14:24challenging for the link list. So partly we've picked a delete that
-
14:24 - 14:26made certain things work out a
-
14:26 - 14:29little better for us.
-
14:29 - 14:31So both of these operations
-
14:31 - 14:33are O of
-
14:33 - 14:331;
-
14:33 - 14:37this is where the link list really shines, right, on this sort of stuff. Because
-
14:37 - 14:41if you have access to the point at which you want to make a modification, so if you are already kind of
-
14:41 - 14:44in the midst of the thing you're trying to rearrange,
-
14:44 - 14:47it's often to the link list's advantage, because it actually can do these this rewiring
-
14:47 - 14:51in the local context. And even if there's thousands of characters that
-
14:51 - 14:52follow the cursor.
-
14:52 - 14:55It doesn't matter. And there can be thousands of characters preceding the cursor
-
14:55 - 14:56for that matter, too.
-
14:56 - 15:00The idea that there's a much larger context that's working and doesn't impact
-
15:00 - 15:03its performance; it's really just doing this local little rearrangement of the pointers.
-
15:03 - 15:05And that's a real strength of the link list design
-
15:05 - 15:11is that flexibility that doesn't rely on everything living in contiguous memory.
-
15:11 - 15:15So let's talk about the movement options for this guy.
-
15:15 - 15:19So right now I have the contents
-
15:19 - 15:21Z, Y with
-
15:21 - 15:22the cursor
-
15:22 - 15:27between that, and the B following that. So three characters worth,
-
15:27 - 15:29
-
15:29 - 15:30and then
-
15:30 - 15:31the
-
15:31 - 15:33initial two operations
-
15:33 - 15:36are pretty simple. The later two are going to be a little bit messier. So let's look at
-
15:36 - 15:37the easy ones first.
-
15:37 - 15:40Moving the cursor to the beginning, so that jump that pulls you back to the very
-
15:40 - 15:41beginning,
-
15:41 - 15:44very easy in a link list form. If I want to move the cursor to the beginning, all
-
15:44 - 15:47I need to do is take that cursor pointer,
-
15:47 - 15:49and reset it
-
15:49 - 15:51to the head, which is where the dummy cell is. And so, by
-
15:51 - 15:54virtue of doing that,
-
15:54 - 15:57
-
15:57 - 15:59the character it points to is the one that's to the left,
-
15:59 - 16:02right of that, and to the left of that the dummy cell is one we're not
-
16:02 - 16:05counting. In fact, it's like nothingness that's there,
-
16:05 - 16:07and that means that the first character that follows the cursor is Z, which
-
16:07 - 16:11is the initial character of the buffer. So that reset, easy to do, we have
-
16:11 - 16:13easy access to the front of the list,
-
16:13 - 16:14and so
-
16:14 - 16:16no
-
16:16 - 16:17special
-
16:17 - 16:18crazy code required.
-
16:18 - 16:21Moving the cursor forward, also an easy thing to do, link
-
16:21 - 16:24list are designed to make it easy to work your way from the front to
-
16:24 - 16:28the back. And so moving the cursor forward is advancing that cursor by one step. So
-
16:28 - 16:32in the array form of this, what was a cursor plus, plus, the
-
16:32 - 16:36comparable form of that in the link list is cursor equals cursor next.
-
16:36 - 16:37It has a little
-
16:37 - 16:40if test there, all of the versions actually have something that is
-
16:40 - 16:42comparable to this. If you're already at the end, then there's
-
16:42 - 16:46nothing to be done, so you don't advance the cursor off into space.
-
16:46 - 16:47You
-
16:47 - 16:51check to make sure that there is something for it to point to. So in this case,
-
16:51 - 16:54seeing if the cursor's next field is null, it's not, it points to Z. Then we go
-
16:54 - 16:58ahead and update the cursor to there,
-
16:58 - 17:00which has the effect of
-
17:00 - 17:03changing things to where the Z is the character
-
17:03 - 17:07to the left of the cursor, and then the YB follows it.
-
17:07 - 17:10So moving this way, getting back to the beginning, and kind of walking our way down is kind of easy
-
17:10 - 17:11to do.
-
17:11 - 17:15Now we start to see where the link list causes us a little bit of grief,
-
17:15 - 17:18moving the cursor to the end.
-
17:18 - 17:21So now, starting at the beginning or the middle, or wherever I am,
-
17:21 - 17:25I want to advance my way to the end. I don't know where the end is.
-
17:25 - 17:28Link lists in general don't keep track of that,
-
17:28 - 17:31that in the simple form of the single link list, I'm going to have to work to
-
17:31 - 17:34find it. So I'm going to take where I am and walk my way down.
-
17:34 - 17:37And this one actually just makes use of an existing
-
17:37 - 17:40public number function to move cursor forward. It's like while the cursor next does
-
17:40 - 17:44not equal null, which is to say while we are not pointing at the very last cell of the link list,
-
17:44 - 17:46then keep going.
-
17:46 - 17:49So advance cursor equals cursor.
-
17:49 - 17:50Some will say,
-
17:50 - 17:54"Is the cursor's next field null?" No, then set the cursor to be the cursor's
-
17:54 - 17:56next field.
-
17:56 - 17:58Is the cursor's next field null?
-
17:58 - 17:59
-
17:59 - 18:02No it's not, so advance the cursor to the next field.
-
18:02 - 18:04Is the cursor's next field null? Yes.
-
18:04 - 18:06So that would
-
18:06 - 18:08the last thing in the buffer,
-
18:08 - 18:10and so if we have
-
18:10 - 18:13hundreds or thousands or
-
18:13 - 18:15tens of characters, whichever it is, right,
-
18:15 - 18:18and depending on how close we already were, it will walk through
-
18:18 - 18:21everything from where the current editing position was to find the end of the
-
18:21 - 18:22buffer
-
18:22 - 18:25one by one, working it's way down.
-
18:25 - 18:26Even more painful
-
18:26 - 18:28operation
-
18:28 - 18:32is the simple one of just moving backwards.
-
18:32 - 18:34If I'm pointing right now to that B,
-
18:34 - 18:37and I'd like to get back to that Y, the
-
18:37 - 18:41link list is oblivious about how you got there. It knows where to go from
-
18:41 - 18:45here, but that backing up is not supported in the simple form.
-
18:45 - 18:48In order to find out what preceded the V,
-
18:48 - 18:51our only option right now is to go back to the beginning,
-
18:51 - 18:53and find it.
-
18:53 - 18:56So starting this pointer CP at the head,
-
18:56 - 18:57and saying,
-
18:57 - 18:58do you point to;
-
18:58 - 19:01is your next field the cursor?
-
19:01 - 19:04And this one says, no. And then it says, well okay advance. Does your next field point to
-
19:04 - 19:08the cursor? No it's not. Go to this one, does your next field point to the same place
-
19:08 - 19:12as the cursor, yes. Wise next field, points to the same place the cursor
-
19:12 - 19:13does.
-
19:13 - 19:18That must mean that Y was the one that preceded the cursor in the link
-
19:18 - 19:22list. So after having gone back to the beginning and walked all the way down, especially if I'm near
-
19:22 - 19:24the end, that's a long traversal.
-
19:24 - 19:26And when I get there, I can say "Oh yeah
-
19:26 - 19:34that's the one, back it up to there."
-
19:34 - 19:37Walking it all the way down;
-
19:37 - 19:41some good link list coding, good practice in there, but
-
19:41 - 19:42again a funny,
-
19:42 - 19:44funny set of trade offs, right?
-
19:44 - 19:47Does anybody have any questions about any part of the code?
-
19:47 - 19:49At the end
-
19:49 - 19:51could you just
-
19:51 - 19:53save the address of the pointer, and then the
-
19:53 - 19:54person
-
19:54 - 19:56who wants to go to the end could just
-
19:56 - 19:59access that? That was a great idea. It's like, so what about
-
19:59 - 20:02improving this? Right now all we have is a pointer to the front, and each
-
20:02 - 20:06pointer has only a pointer to the next. It's like, maybe we need to add some
-
20:06 - 20:07more stuff, track
-
20:07 - 20:10some more things. What if we tracked the
-
20:10 - 20:12tail? Would that help us?
-
20:12 - 20:16If we tracked the tail, we could move to the end quickly. Would
-
20:16 - 20:19that help us moving backwards?
-
20:19 - 20:22No. It solves one of our problems though, so you're on to something.
-
20:22 - 20:25Let's look at what we've got, and then we'll start thinking about ways to fix it, because that's going to be
-
20:25 - 20:27one of the pieces we're going to think about.
-
20:27 - 20:29So if I move to link list
-
20:29 - 20:31relative to what I have,
-
20:31 - 20:33again, it's like a shell game. The old ends moved.
-
20:33 - 20:36It used to be that editing was slow and we didn't like that.
-
20:36 - 20:38Then we made big movement fast,
-
20:38 - 20:41and now we have this funny thing were moving in certain ways is easy and
-
20:41 - 20:45moving other ways is bad. So moving
-
20:45 - 20:46down the document
-
20:46 - 20:49is good, moving backwards in the document not good.
-
20:49 - 20:53Moving back to the beginning is good, but moving to the end is bad.
-
20:53 - 20:56But then editing in all situations is good,
-
20:56 - 20:59another sort of quirky set of things. It just fells like it's like that game where the
-
20:59 - 21:00
-
21:00 - 21:03gophers pop there heads up, and you pound on them. And they just show up somewhere else. As
-
21:03 - 21:06soon as you get one thing fixed, it seems like something else had
-
21:06 - 21:08to give.
-
21:08 - 21:11This, I think, would actually be my ideal editor, because it turns out I
-
21:11 - 21:14have exactly this working style, which is like, I'll be down at the end of
-
21:14 - 21:17the document, and I'll have to be generating new things. And
-
21:17 - 21:20I'm getting tired and I don't like writing, and what I keep wanting to
-
21:20 - 21:24do is I go back to the beginning. And I read that again, because I worked on that the most and
-
21:24 - 21:27it's nice. And so I go back to the beginning, and I'm like oh yeah, look at this. Look at how lovely this
-
21:27 - 21:28is,
-
21:28 - 21:31and then I never want to go back to the end where all the trouble is. So I just
-
21:31 - 21:34keep going back to the beginning and editing in the front, and then slowly get
-
21:34 - 21:37back to the end. And then I don't like it again, and then I go back to the beginning.
-
21:37 - 21:39But I never actually willingly go to the end
-
21:39 - 21:42to face my fears. So
-
21:42 - 21:46this is the editor I need.
-
21:46 - 21:47Also
-
21:47 - 21:51has a bit of kicker in terms of space.
-
21:51 - 21:56So if the space for a character is 1 byte, which it typically is,
-
21:56 - 21:58and the space for a pointer is 4 bytes, then
-
21:58 - 22:03each of those link list cells is costing us 5 bytes, right?
-
22:03 - 22:03Which is
-
22:03 - 22:07sort of five times the amount of storage that the character itself
-
22:07 - 22:10would have taken alone,
-
22:10 - 22:14so in the vector it's fairly tight, a little bit more going up on that. But we've actually kind of blown
-
22:14 - 22:18it up by another factor of two even beyond the stack, which is not quite a
-
22:18 - 22:22good direction to be going.
-
22:22 - 22:24So let's talk about what we can do to fix that.
-
22:24 - 22:27I'm not going to go through the process of this. I'm just going to kind of give you a thought
-
22:27 - 22:29exercise, so you can just kind of
-
22:29 - 22:30visualize it.
-
22:30 - 22:34What we have here, right, is an information problem. We
-
22:34 - 22:37only are keeping track of certain
-
22:37 - 22:39pathways through that link list.
-
22:39 - 22:42What if we actually increased our access to the list?
-
22:42 - 22:46We don't need to have the full access of a vector, being able to know the nth
-
22:46 - 22:49character by number is not actually as far as we need to go. But we
-
22:49 - 22:53do just need a little bit more coordination between a cell and its neighbors.
-
22:53 - 22:56So the two things that would buy us out of a lot of the things that we're seeing
-
22:56 - 22:58here is if we added a tail pointer.
-
22:58 - 23:01So if I just add a tail pointer that points to the last cell, then I can move to the end
-
23:01 - 23:02quickly.
-
23:02 - 23:05You say cursor equals tail; the same that cursor equals head. So what was an O of N
-
23:05 - 23:07problem now is an O of 1.
-
23:07 - 23:08The backing up,
-
23:08 - 23:12not as easily solved by just having one thing that's going on;
-
23:12 - 23:13I also am going to need,
-
23:13 - 23:17on a per cell basis this ability to move backwards.
-
23:17 - 23:19
-
23:19 - 23:22Well, it's just symmetry, right? If A knows where B is, and B knows where C is, and C knows where D is, what's to stop
-
23:22 - 23:25it from also just tracking the information going in the other way,
-
23:25 - 23:30so that D knows that it came from C, and C knows it came from B, and so on.
-
23:30 - 23:35And so having this completely symmetric parallel set of links that just run the other
-
23:35 - 23:35direction,
-
23:35 - 23:39then buys you the ability to move backwards with ease.
-
23:39 - 23:42The other thing that this would give you is the tail pointer
-
23:42 - 23:46is almost not quite enough to give you the exact place of the end. It gives it to
-
23:46 - 23:47you, but you also have to be
-
23:47 - 23:49careful about what is it going to take to update it?
-
23:49 - 23:53And so inserting can easily update the tail pointer. It's the deleting that kind of
-
23:53 - 23:55would get you into trouble. If you were
-
23:55 - 23:58here and deleting, you would have to kind of keep track of making sure if you were deleting
-
23:58 - 24:01the thing that was the tail pointer, so you have to be a little about making sure
-
24:01 - 24:04you don't lose track of your tail.
-
24:04 - 24:06But once I have both of these in place,
-
24:06 - 24:09I suddenly have something that can move
-
24:09 - 24:10forward easily, O of 1;
-
24:10 - 24:15move to the front easily, O of 1; move to the tail easily, O of 1; move backwards O of 1.
-
24:15 - 24:20So I can make all of my operations
-
24:20 - 24:22O of 1.
-
24:22 - 24:24Deleting, you're still going to have to delete the individual cells, but there's one place where
-
24:24 - 24:27it suffers, but that's a much more rare occurrence.
-
24:27 - 24:31And I can get all six; it moves fast, in every dimension;
-
24:31 - 24:32inserts and deletes
-
24:32 - 24:35all good.
-
24:35 - 24:37And there are two places we paid for it.
-
24:37 - 24:40One is certainly the memory.
-
24:40 - 24:45So a 4 byte pointer one direction, a 4 byte pointer another direction,
-
24:45 - 24:49plus the 1 byte for the character means we have a 9 byte per
-
24:49 - 24:50character
-
24:50 - 24:52storage usage.
-
24:52 - 24:55And the other way, which is a little bit hard to represent with some number
-
24:55 - 24:57in a quantitative way, but actually it's also an important factor, which is
-
24:57 - 25:00your code got a lot more complicated.
-
25:00 - 25:02If you looked at the code for the vector of the stack version, it was a few
-
25:02 - 25:03lines here a few lines there.
-
25:03 - 25:06If you looked at the code for the link list singly we looked at, it's like you've got
-
25:06 - 25:07to wire up these pointers.
-
25:07 - 25:10Now imagine you've got to wire up twice as many pointers.
-
25:10 - 25:14Not only do you have the in and out going one direction, you'll have the in and out going the
-
25:14 - 25:15other direction,
-
25:15 - 25:17and just the
-
25:17 - 25:21opportunity to make errors with pointers has now gone up by a factor of two.
-
25:21 - 25:24The idea that you could get a vector version of this up and running in
-
25:24 - 25:26no time,
-
25:26 - 25:30compared to several days, let's say, of getting the doubly link one working. It
-
25:30 - 25:32might be a factor to keep in mind.
-
25:32 - 25:36If you're kind of investing for the long haul, sure, work hard, but know
-
25:36 - 25:41that it wasn't for free.
-
25:41 - 25:42So you've
-
25:42 - 25:45got 89 percent overhead.
-
25:45 - 25:49That's a lot of overhead for small bits of data.
-
25:49 - 25:53More likely, and this is actually the way that most modern word processors
-
25:53 - 25:54do do this,
-
25:54 - 25:58is they don't just make one or the other; it's not just a array or a stack or a link list.
-
25:58 - 25:59They
-
25:59 - 26:04actually look at ways to combine the features of both to get a hybrid,
-
26:04 - 26:05where you can take
-
26:05 - 26:10some of the advantages of one pipe of data, but also try to avoid its worst
-
26:10 - 26:13weaknesses. So the most common way they're going to do this is some kind of
-
26:13 - 26:15chunking strategy,
-
26:15 - 26:17where you have a segment of the
-
26:17 - 26:21buffer that kind of is moved aside and worked on in one little array. It
-
26:21 - 26:25might be that those arrays are ten or 20 or a sentence long, or paragraph long.
-
26:25 - 26:28It might be that they are actually dictated by when you change styles.
-
26:28 - 26:30All of the code that's in one font
-
26:30 - 26:33kind of gets moved into one chunk, and as soon as you change styles, it breaks
-
26:33 - 26:37into a new chunk that follows to kind of keep track of the formatting information. It
-
26:37 - 26:40depends on some of the other features that are present in the processor
-
26:40 - 26:43here, but it's likely that rather than having all 1,000 or all 10 million
-
26:43 - 26:45characters in one
-
26:45 - 26:46data structure,
-
26:46 - 26:47they're actually kind of separated
-
26:47 - 26:50to where you have a little bit of both. So the most likely thing is that
-
26:50 - 26:53there is some kind of array strategy on a chunk basis.
-
26:53 - 26:56There's a linking between those chunks.
-
26:56 - 26:59And what this allows you do is to
-
26:59 - 27:01share that overhead cost.
-
27:01 - 27:05So in this case, if I had every sentence in a chunk by itself,
-
27:05 - 27:08and then if I had next and previous pointer, which pointed to the next sentence and the previous
-
27:08 - 27:09
-
27:09 - 27:11sentence, when I need to move the cursor forward or backwards
-
27:11 - 27:15within a chunk, it's just in changing this index. Oh, it's in index two, but
-
27:15 - 27:16now it's moving to three, or four, or five.
-
27:16 - 27:20And then when it gets to the end of that chunk, to the end of that sentence, and you move forward, well then it
-
27:20 - 27:23follows the next link forward.
-
27:23 - 27:25Similarly moving backwards, right, so
-
27:25 - 27:29within a chunk doing array-like manipulations, and then as it crosses the
-
27:29 - 27:32boundary using a link list steps to get further down.
-
27:32 - 27:35So when it needs to move to the front or the back, those things are easily
-
27:35 - 27:37accessible using head and tail pointers.
-
27:37 - 27:39When I need to do an edit,
-
27:39 - 27:42it operates in this array-like kind of
-
27:42 - 27:44way
-
27:44 - 27:46when there's space within that chunk. So you start editing in a sentence, then it
-
27:46 - 27:49would kind of do some shifting on that array. But the array itself being pretty
-
27:49 - 27:51small
-
27:51 - 27:53means that although you are paying that cost of shuffling,
-
27:53 - 27:56you are doing it on a smaller chunk. It's not 1,000 characters that have to move, it's
-
27:56 - 27:5920 or 40 that have to move within that sentence.
-
27:59 - 28:02And that when
-
28:02 - 28:04you need to insert a new sentence or a new chunk in the middle, you are doing
-
28:04 - 28:08link list-like manipulations to build a new chunk and insert and splice,
-
28:08 - 28:11so you have that flexibility, where the many characters aren't
-
28:11 - 28:14all affected by it. So you have kind of local array editing
-
28:14 - 28:16and global link list editing
-
28:16 - 28:18that gives you kind of the best of both worlds.
-
28:18 - 28:19
-
28:19 - 28:23But again, the factor is cost showing up in code complexity.
-
28:23 - 28:27That what you are looking at is kind of space/time tradeoff. Well, I can throw space at
-
28:27 - 28:29this, and most people will say that in computer science, "Well I can throw space and this and
-
28:29 - 28:31get better times." So
-
28:31 - 28:35the double link list is a good example of something that throws space at a problem to get better
-
28:35 - 28:37time. There's a lot of overhead.
-
28:37 - 28:39Moving to the chunk list says I want to get back some of that space. I'm
-
28:39 - 28:43not willing to invest 90 percent overhead to get this. Can I find a way to
-
28:43 - 28:44take a
-
28:44 - 28:47whole sentence worth of things and add a few pointers here? That means that
-
28:47 - 28:48there's only two
-
28:48 - 28:514 byte pointers per each sentence as opposed to each character,
-
28:51 - 28:53and that really way reduces the overhead.
-
28:53 - 28:55But then now,
-
28:55 - 28:58the problem is still back on your shoulders in terms of effort,
-
28:58 - 29:02now I have both an array, and a link list, and double links, and
-
29:02 - 29:03a
-
29:03 - 29:07lot of complicated stuff flying around to make it work.
-
29:07 - 29:08But it does work out.
-
29:08 - 29:11It's kind of the process that designers go through when
-
29:11 - 29:12they're building these kinds of
-
29:12 - 29:16key structures. It like, what are the options, and do any of the basic
-
29:16 - 29:18things we know about work? What about
-
29:18 - 29:21building even fancier things that kind of take the best of both world and mix
-
29:21 - 29:23them together and have a little bit of the weaknesses of this a little of that, but none of
-
29:23 - 29:26the really awful
-
29:26 - 29:27worst cases
-
29:27 - 29:28present in the end result.
-
29:28 - 29:31So in this case you'd have a lot of things that were constant time where constant was
-
29:31 - 29:33defined by the chunk size.
-
29:33 - 29:37So O of 100, let's say, if that was the maximum chunk size
-
29:37 - 29:39to do these things, as opposed to O of
-
29:39 - 29:42N for the whole block net,
-
29:42 - 29:43so kind of a neat
-
29:43 - 29:45little thing to think through.
-
29:45 - 29:48There was a time, just so you can feel
-
29:48 - 29:51gratified about how I've worked you very hard, but actually in the past I've been
-
29:51 - 29:53even harsher.
-
29:53 - 29:57We actually had them build the doubly linked chunked list editor buffer,
-
29:57 - 30:00and now we have you build the singly linked chunk list PQ, and it's
-
30:00 - 30:02actually
-
30:02 - 30:06still very complicated, as you'll find out, but not nearly quite as hairy as the
-
30:06 - 30:09full in both directions every way
-
30:09 - 30:11craziness.
-
30:11 - 30:14And you can kind of imagine in your mind.
-
30:14 - 30:16Any questions about edited buffer?
-
30:16 - 30:18We?ve got two more data structures to implement,
-
30:18 - 30:20two more things that we've been using,
-
30:20 - 30:25and happy to have in our arsenal that we need to know how they work.
-
30:25 - 30:29Let's talk about map first, and in
-
30:29 - 30:31fact the strategy that I'm going to end up showing you for map is actually going to be
-
30:31 - 30:35the same one we use for set, then we'll come back and think of an even more clever
-
30:35 - 30:37way to do map, too. So
-
30:37 - 30:40we've got two things to talk about, which are binary search trees and hashing. We'll
-
30:40 - 30:41do binary search trees first,
-
30:41 - 30:44and then we'll get back to the hashing the second time around.
-
30:44 - 30:46So
-
30:46 - 30:49map is, next to vector, probably the most useful thing you've got going on in
-
30:49 - 30:50your class library,
-
30:50 - 30:53all kinds of things that do look up based
-
30:53 - 30:56activities; looking up students by ID number,
-
30:56 - 30:57phone numbers,
-
30:57 - 31:00by name, any of these kinds of things where you need to have this key at associated satellite
-
31:00 - 31:02information. And
-
31:02 - 31:05you'd like to be able to do that retrieval and update quickly.
-
31:05 - 31:08The map is the one for you.
-
31:08 - 31:11So if you remember it's key value fears, it's all about key being the
-
31:11 - 31:15identifying mark that allows you to find the associated value. And then the value
-
31:15 - 31:18is actually just being stored and kind of retrieved without using it
-
31:18 - 31:21or examining it in any kind of interesting way.
-
31:21 - 31:25And what we're really interested in here is how to make this guy work really efficiently. So efficiently
-
31:25 - 31:27being hopefully log in or better,
-
31:27 - 31:31if we could get that on both the update operations that add and remove things
-
31:31 - 31:33from the map, as well as the
-
31:33 - 31:34look up operation, then
-
31:34 - 31:37that would please probably all our clients immensely, if we could get both of
-
31:37 - 31:41those going well.
-
31:41 - 31:45So I'm going to build you on that. I probably have enough time to do that.
-
31:45 - 31:46That
-
31:46 - 31:48attacks it from
-
31:48 - 31:53using simple tools to start with, just to kind of get a base line for what we can do.
-
31:53 - 31:54
-
31:54 - 31:56Vector, our old friend vector,
-
31:56 - 31:59can't get too much mileage out of vector, apparently, because you can always find a
-
31:59 - 32:00way to make it do something
-
32:00 - 32:01useful for you.
-
32:01 - 32:02It
-
32:02 - 32:03gives us the convenience
-
32:03 - 32:05
-
32:05 - 32:07of managing that thing with low overhead,
-
32:07 - 32:09direct access is good.
-
32:09 - 32:11We make a parastruct
-
32:11 - 32:13that has the key and the value together.
-
32:13 - 32:16We store it into a vector,
-
32:16 - 32:18and then we
-
32:18 - 32:21may or may not sort that vector. But if it's sorted, there's probably some good reasons. Think
-
32:21 - 32:22about
-
32:22 - 32:26what order to sort it in, what information to be using to sort that,
-
32:26 - 32:28and then we'll get value to add. We're just going to go
-
32:28 - 32:30through doing it, rather than talk about it.
-
32:30 - 32:32We'll just go over and
-
32:32 - 32:36make a vector happen.
-
32:36 - 32:38Let's get over here in X-code.
-
32:38 - 32:40Let's take a look at
-
32:40 - 32:42my map dot H,
-
32:42 - 32:43and all
-
32:43 - 32:46I'm going to put in there is actually add and get value.
-
32:46 - 32:48And
-
32:48 - 32:50the other one contains key and remove."
-
32:50 - 32:52
-
32:52 - 32:55But these are these two operations that really matter to us, getting something in there, and
-
32:55 - 32:57getting something back out.
-
32:57 - 32:58
-
32:58 - 33:00And so I'm going to start by
-
33:00 - 33:02building this thing on vector. So let me go ahead and
-
33:02 - 33:14indicate that I plan to use vector as part of
-
33:14 - 33:15what I'm doing. So I defined a little structure
-
33:15 - 33:20that's the pair. I'm going to call that "pair"
-
33:20 - 33:23even, that's a nice word for it. So the pair of the key and the value that go together,
-
33:23 - 33:29and then what I will store here
-
33:29 - 33:31is a vector of those pairs. They'll
-
33:31 - 33:36give me a key to value; I'll stick them together in that little struct, stick them into my vector, and
-
33:36 - 33:38hopefully be able to retrieve them later.
-
33:38 - 33:42So given that I'm depending only on vector, I don?t have any other
-
33:42 - 33:45information, I don't technically need to go out of my way to disallow then
-
33:45 - 33:48memory-wise copying, because vector does a deep copy. But I'm going to leave it in there, because I
-
33:48 - 33:52plan on going some place that is eventually going to need it. I might as well be safe from
-
33:52 - 33:54the get go.
-
33:54 - 33:55The vector
-
33:55 - 34:00will be set up and destroyed automatically as part of management of my
-
34:00 - 34:04data member. So I actually don't have any explicit allocation/deallocation, so if I look at my
-
34:04 - 34:06constructor and destructor they'll be totally empty.
-
34:06 - 34:11And then add and get value are where I'm going to actually get to do some work.
-
34:11 - 34:14Before I implement them, I'm just going to think for a second about
-
34:14 - 34:16how this is going to play out.
-
34:16 - 34:20Let's say I'm doing a vector that's storing
-
34:20 - 34:26strings and their length. Let's say, it's just a map of
-
34:26 - 34:27string,
-
34:27 - 34:30
-
34:30 - 34:34and if I do M dot add
-
34:34 - 34:37car. I'm going to store a three with it,
-
34:37 - 34:39so on the other side of the wall,
-
34:39 - 34:41what we're going to do is make a little struct
-
34:41 - 34:43that has the word car
-
34:43 - 34:47and the number three with it, and we package it into our ray.
-
34:47 - 34:51And so then we get a second call to put something in like
-
34:51 - 34:55dog, also with a three, then we'll make a second one of these and stick it in
-
34:55 - 34:58our ray.
-
34:58 - 35:01And so on, so the idea is to have the vector really kind of manage where the storage is going. And
-
35:01 - 35:02then what
-
35:02 - 35:05we'll just do is package it up and stick it in there.
-
35:05 - 35:08The one thing, though, that I have to think a little bit about, because it sounds
-
35:08 - 35:11like add should be pretty easy. It almost seems like what I need to do is something
-
35:11 - 35:13as simple as this. And
-
35:13 - 35:15I'll start to write the code, and then you can tell me why
-
35:15 - 35:17this isn't what I want to do.
-
35:17 - 35:20I say pair TP, and I say P dot
-
35:20 - 35:23key equals key P dot
-
35:23 - 35:30value equals val, and then I say entries dot add P.
-
35:30 - 35:32Almost, but not quite what I want,
-
35:32 - 35:34what have I forgotten
-
35:34 - 35:39to take care of?
-
35:39 - 35:42If it's not already in there;
-
35:42 - 35:46this has to do with just what's the interface for nap. What does nap say about it if you
-
35:46 - 35:48try to add
-
35:48 - 35:51for a key a different value. So in the case of this one, it didn't quite make sense that I would add
-
35:51 - 35:55something, but let's say I accidentally put car in there with a link to the first time that I went
-
35:55 - 35:57to go fix it, right,
-
35:57 - 36:01that my subsequent to ask it store car with the proper value,
-
36:01 - 36:03three,
-
36:03 - 36:05doesn't create a totally new entry.
-
36:05 - 36:09It has to find the existing entry that already has car and overwrite it.
-
36:09 - 36:13So the behavior of mad was it could add when it was not previously existing.
-
36:13 - 36:16It also could be an overwrite of an existing value. So we do need
-
36:16 - 36:17to find
-
36:17 - 36:20if there is an entry that already has that key, and just
-
36:20 - 36:21replace its value
-
36:21 - 36:23rather than
-
36:23 - 36:26add a second entry with the same key. So at any given point, right, in this whole
-
36:26 - 36:28vector, there should be one entry
-
36:28 - 36:30for each unique key.
-
36:30 - 36:33So I have to do a search. And
-
36:33 - 36:35let me
-
36:35 - 36:38write this code, and then I'm going to mention why I'm going to decompose it in a
-
36:38 - 36:39second.
-
36:39 - 36:42If I do entries dot size, I plus plus
-
36:42 - 36:44then if
-
36:44 - 36:49entries of I dot key equals key.
-
36:49 - 36:51Then
-
36:51 - 36:52let's break. Let me pull out
-
36:52 - 37:00I so I can get access to it when I'm done. So after this
-
37:00 - 37:01loop exits,
-
37:01 - 37:06if I is less than entries dot size,
-
37:06 - 37:06then
-
37:06 - 37:09I know that it found a match.
-
37:09 - 37:12In the case where it went through all of them, and it never hit the break. Then I will
-
37:12 - 37:15have gone all the way to where it is exactly equal to entries dot size; if it didn't then I
-
37:15 - 37:16just
-
37:16 - 37:19have a place to overwrite. And I can say entries of
-
37:19 - 37:21I dot val
-
37:21 - 37:23equals val.
-
37:23 - 37:27Otherwise I go through this process of adding a new one. I had to
-
37:27 - 37:30go hunt that thing down
-
37:30 - 37:33and replace it.
-
37:33 - 37:35That code,
-
37:35 - 37:38when you look at it, you can say, "Gosh I feel like that code's going to be familiar."
-
37:38 - 37:43Because that idea of searching to find a match to the key
-
37:43 - 37:45probably is going to come in really handy
-
37:45 - 37:46when I need to do get value,
-
37:46 - 37:50but it has the same exact problem of oh, I've got to go find that match. So in fact,
-
37:50 - 37:55that kind of motivates the idea of why don't I just take this little piece of code
-
37:55 - 37:59and separate it out and do a helper that they can both use. I
-
37:59 - 38:02may not have planned for this from the beginning, but once I see it happening, I might as well fix
-
38:02 - 38:05it. So I can say find index
-
38:05 - 38:07for key that
-
38:07 - 38:10given a key will return to you
-
38:10 - 38:11
-
38:11 - 38:15the vector index at which
-
38:15 - 38:19that key is or a negative one. I don't know if I need to keep that
-
38:19 - 38:22one anymore.
-
38:22 - 38:24If it ever found it, it returns I otherwise
-
38:24 - 38:28it returned negative one. That can be our not found.
-
38:28 - 38:37I can actually use that up here,
-
38:37 - 38:40and then I can say, if found
-
38:40 - 38:43does not equal negative one.
-
38:43 - 38:44Then we do that,
-
38:44 - 38:46and then that
-
38:46 - 38:48tells us that
-
38:48 - 38:51this little piece of code is going to come in handy right up here,
-
38:51 - 38:55get value if found does not equal
-
38:55 - 38:56negative one, then we can return that
-
38:56 - 39:02and what is the behavior of get value if it didn't find it? Does anybody remember?
-
39:02 - 39:06I ask it to get the value in this case for lollipop and it's not
-
39:06 - 39:07there,
-
39:07 - 39:13does it turn zero, what does it
-
39:13 - 39:22do? Untracked forget value, remember?
-
39:22 - 39:25This is an error.
-
39:25 - 39:26There is not sort of a
-
39:26 - 39:30general case return value you can produce here
-
39:30 - 39:33that will really sort of sufficiently describe to someone who's made
-
39:33 - 39:37this call in error what happened, right? I can't return zero, because it happens to be
-
39:37 - 39:40that sometimes I?m putting in, and sometimes I'm putting strings or structs
-
39:40 - 39:44kinds of things. There's no general zero or negative one to use,
-
39:44 - 39:47so the behavior of get value is if you ask it to retrieve a value for a key, it ought to be
-
39:47 - 39:48there. So
-
39:48 - 39:52the corresponding implementation would likely also just go through
-
39:52 - 39:56the process of contains key, which we call find index for key and check.
-
39:56 - 39:58It could be used by the client
-
39:58 - 40:01if they need to know in advance that it's really there.
-
40:01 - 40:04So let's see how I'm
-
40:04 - 40:06doing. Let's go back to my
-
40:06 - 40:11code here,
-
40:11 - 40:13and this
-
40:13 - 40:15being
-
40:15 - 40:16not my specialty,
-
40:16 - 40:20and just see that I can put something in and
-
40:20 - 40:25get it back out would be a good first test to see that things are
-
40:25 - 40:29going as they should. Oh, no it didn't like that. Oh yeah, find
-
40:29 - 40:30index for key.
-
40:30 - 40:32Find index for key
-
40:32 - 40:34was not
-
40:34 - 40:38populated clear in my dot H, so let me go move it there. I'll put
-
40:38 - 40:42it in the private section, because I don?t want this to be something that is
-
40:42 - 40:45exposed outside of the implementation. The idea that there is an index and
-
40:45 - 40:48there is a vector is not really something that I would want to make part of my
-
40:48 - 40:54public interface. It's really just an internal detail of how I'm managing stuff.
-
40:54 - 40:56Something about what I did, oh yeah, I
-
40:56 - 41:00used the right name for that.
-
41:00 - 41:05One more case let
-
41:05 - 41:06me just take a look at them both
-
41:06 - 41:09before I let the compiler tell me what is and what isn't right about them. Their both
-
41:09 - 41:13doing find index for key using linear search. If it comes back at not negative one
-
41:13 - 41:17pulling out the matching value and overriding it, and then in the case of adding
-
41:17 - 41:23or replacing the error -
-
41:23 - 41:24so the
-
41:24 - 41:26previously stored value for that is three,
-
41:26 - 41:29and if I go and I do one of those tests it's always good to know what
-
41:29 - 41:34happens if I ask it for something that doesn't exist.
-
41:34 - 41:37Making sure that the error handling that I put in there does
-
41:37 - 41:43what it was supposed to do, right? Are
-
41:43 - 41:46there any questions about the code that I wrote here?
-
41:46 - 41:49The vector is always kind of a good starting place for these things, because actually it just
-
41:49 - 41:53tends to use very simple things, and it tends to lend itself to easy code.
-
41:53 - 41:57But because of vector constraints being at a contiguous back structure, right, there's likely to
-
41:57 - 41:59be some
-
41:59 - 42:01performance implication. And in
-
42:01 - 42:02particular,
-
42:02 - 42:04in the unsorted case here,
-
42:04 - 42:07both add and get value
-
42:07 - 42:09involve doing a linear search.
-
42:09 - 42:12In some cases it will find it quickly at the very beginning or even half way
-
42:12 - 42:13through.
-
42:13 - 42:17In other cases it won't find it at all and have to search through the whole thing. So on
-
42:17 - 42:20average, if you figure it's equally likely to be in any position or not there at all,
-
42:20 - 42:23it's at going to at least have to look at half of them or more.
-
42:23 - 42:25And so we can say they're both linear.
-
42:25 - 42:28If we change it to be sorted,
-
42:28 - 42:31which is the first obvious improvement to make here,
-
42:31 - 42:34then get value
-
42:34 - 42:37gets a lot faster, because we can take advantage of binary search.
-
42:37 - 42:39I sort it by key,
-
42:39 - 42:43so ignoring the value, values are just satellite data. But stored in order of
-
42:43 - 42:44key,
-
42:44 - 42:48then all the A words, B words, C words, D words, what not, so
-
42:48 - 42:51walking it down the middle and seeing which half you look at, then looking at the
-
42:51 - 42:53quarter of there and the eight of there. It's going to very quickly narrow down on
-
42:53 - 42:57where it had to be if it was there or if it wasn't there at all.
-
42:57 - 43:00So we can implement that in logarithmic time, meaning that if you 1,000
-
43:00 - 43:04entries, it takes ten comparisons to find it or agree that it's not there.
-
43:04 - 43:08If you double that it will take one more comparison. If
-
43:08 - 43:12you go to 2,000, negligibly faster, even numbers in the millions
-
43:12 - 43:15are still just a handful
-
43:15 - 43:19of comparisons to work it out, and say,
-
43:19 - 43:22to the twentieth for example is roughly a million; so 20 comparisons in or out,
-
43:22 - 43:25and so it didn't look at a million things a million times over.
-
43:25 - 43:27However,
-
43:27 - 43:30that didn't improve add,
-
43:30 - 43:31why not?
-
43:31 - 43:33Why does keeping an assorted order
-
43:33 - 43:37not have the same boost for add as it did for get value?
-
43:37 - 43:46Can I do the same logarithmic search? You have to
-
43:46 - 43:50move all the elements ?
-
43:50 - 43:51Yeah,
-
43:51 - 43:55you can find the place fast, right, you can say returning the word
-
43:55 - 43:59apple it's like oh, okay. I can very quickly narrow in on where it needed to be. If it
-
43:59 - 44:02was there I could override it quickly, but in the case where it really needed to do
-
44:02 - 44:03an add,
-
44:03 - 44:04it's got to move them down.
-
44:04 - 44:05So the old add
-
44:05 - 44:09did all its work in the search and then just tacked it on the end if it didn't find
-
44:09 - 44:12it, or overrode the middle. But the new add
-
44:12 - 44:16also has to modify the array to put it in the right place, and putting it in the right place means
-
44:16 - 44:18shuffling to make space.
-
44:18 - 44:20So even though it found it quickly,
-
44:20 - 44:25it was unable to update that quickly, so it still ends up being
-
44:25 - 44:26linear in that case.
-
44:26 - 44:29Now this actually not a terrible result,
-
44:29 - 44:30as it stands
-
44:30 - 44:33it's probably much more likely that you're going to load up the data
-
44:33 - 44:36structure once and then do a lot
-
44:36 - 44:40of searches on it, then infrequent additions. That's a pretty common usage
-
44:40 - 44:41pattern for a map. It's kind of like you
-
44:41 - 44:43build a dictionary, right,
-
44:43 - 44:46loading the dictionary once could be expensive, but then people get tons and tons of
-
44:46 - 44:49look ups of words later. But you don't change the definitions a lot later or add a
-
44:49 - 44:51lot of new words.
-
44:51 - 44:51So
-
44:51 - 44:56this might actually be quite tolerable in many situations
-
44:56 - 44:59that the building of it was expensive, but it gave
-
44:59 - 45:00you
-
45:00 - 45:02fast search access.
-
45:02 - 45:05But really what you want to do is say, can we get both of those? So driving to say
-
45:05 - 45:08what we do with the editor buffer, you say, "Well, what if I just want both of those operations
-
45:08 - 45:11to be logarithmic or better, what can I do?" So
-
45:11 - 45:12I'll leave you with kind of just
-
45:12 - 45:16a little bit of a taste of where we're heading.
-
45:16 - 45:19Just to think a little bit about what the link
-
45:19 - 45:21list does and doesn't buy us,
-
45:21 - 45:24right, that the link list gave us flexibility
-
45:24 - 45:29at the editing, the local editing site, but it doesn't give us fast searching.
-
45:29 - 45:31Well maybe we can take this link list and try to
-
45:31 - 45:33add searching
-
45:33 - 45:37to the idea of this flexibility to kind of get the best of both
-
45:37 - 45:41worlds. So that's what we're going to start on Monday, building a tree. So if
-
45:41 - 45:44you have time today, I'd love to have you come to Truman, and otherwise have a good
-
45:44 - 45:45weekend, and
-
45:45 - 45:50get your PQ work in.
- Title:
- Lecture 21 | Programming Abstractions (Stanford)
- Description:
-
Lecture 21 by Julie Zelenski for the Programming Abstractions Course (CS106B) in the Stanford Computer Science Department.
Julie talks about the buffer version of vector vs. stack and follows this with an example of cursor design. She also talks about linked list insertion and deletion. Cursor movement is the next topic covered; she illustrates how the cursor points from one cell to the next.
Complete Playlist for the Course:
http://www.youtube.com/view_play_list?p=FE6E58F856038C69CS 106B Course Website:
http://cs106b.stanford.eduStanford Center for Professional Development:
http://scpd.stanford.edu/Stanford University:
http://www.stanford.edu/Stanford University Channel on YouTube:
http://www.youtube.com/stanforduniversity/ - Video Language:
- English
- Duration:
- 46:02
N. Ueda edited English subtitles for Lecture 21 | Programming Abstractions (Stanford) | ||
Eunjeong_Kim added a translation |