WEBVTT 00:00:00.000 --> 00:00:02.000 00:00:02.000 --> 00:00:10.000 00:00:10.000 --> 00:00:13.000 This presentation is delivered by the Stanford Center for Professional 00:00:13.000 --> 00:00:21.000 Development. 00:00:21.000 --> 00:00:23.000 Let us 00:00:23.000 --> 00:00:25.000 refresh about where we were at the end of Wednesday. 00:00:25.000 --> 00:00:29.000 As we had talked about the vector form of the buffer, so the editor buffer that's 00:00:29.000 --> 00:00:31.000 backing the 00:00:31.000 --> 00:00:34.000 word processing client, 00:00:34.000 --> 00:00:37.000 and the two stack version, where we're shuffling stuff, often on the before 00:00:37.000 --> 00:00:40.000 and after stacks to get things done, right? 00:00:40.000 --> 00:00:43.000 And when we were looking at the main six operations, we're looking at the four 00:00:43.000 --> 00:00:47.000 cursor movements and the two editing operations, right, that we had traded off. 00:00:47.000 --> 00:00:49.000 Where editing had been slow in the vector, 00:00:49.000 --> 00:00:52.000 to where editing was now fast, but then these long distance movements were slow, 00:00:52.000 --> 00:00:54.000 because of the shuffling. 00:00:54.000 --> 00:00:57.000 And then there was a little discussion at the very end about how this probably 00:00:57.000 --> 00:01:00.000 increases our space requirements, because we're now keeping two stacks that are 00:01:00.000 --> 00:01:02.000 each capable of holding 00:01:02.000 --> 00:01:04.000 the entire contents 00:01:04.000 --> 00:01:06.000 as opposed to one vector here 00:01:06.000 --> 00:01:09.000 with its slop. 00:01:09.000 --> 00:01:10.000 So that was where we were at. 00:01:10.000 --> 00:01:13.000 We are still in kind of the pursuit of this Holy Grail of could we 00:01:13.000 --> 00:01:16.000 get it to where everything is fast. Is it always a matter of tradeoffs or could we just 00:01:16.000 --> 00:01:19.000 invest some more smarts and more 00:01:19.000 --> 00:01:20.000 cleverness 00:01:20.000 --> 00:01:24.000 and hard work and squeeze it down to where we could get everything to perform 00:01:24.000 --> 00:01:26.000 efficiently? So typically when computer scientists say they want something to perform 00:01:26.000 --> 00:01:28.000 efficiently, they're hoping for log 00:01:28.000 --> 00:01:32.000 in time or faster. Constant time is the best. Log in grows so slowly that in fact, 00:01:32.000 --> 00:01:34.000 logarithm time is effectively constant 00:01:34.000 --> 00:01:40.000 for all reasonable values of N up to millions and billions. 00:01:40.000 --> 00:01:43.000 And so typically anything that is linear or higher is often a weak spot that you are 00:01:43.000 --> 00:01:47.000 trying to look at, and certainly things that are quadratic or exponential 00:01:47.000 --> 00:01:51.000 are definite opportunities for improvement. 00:01:51.000 --> 00:01:53.000 So let's look at this idea of the link list, 00:01:53.000 --> 00:01:55.000 so that both 00:01:55.000 --> 00:01:56.000 the 00:01:56.000 --> 00:01:58.000 vector and the stack 00:01:58.000 --> 00:02:03.000 - we're relying on this idea that continuous memory is the 00:02:03.000 --> 00:02:06.000 weak link that was causing some problems. In the vector, that shuffling to 00:02:06.000 --> 00:02:08.000 move things up and down, 00:02:08.000 --> 00:02:12.000 or in the case of the stack version, one side to the other 00:02:12.000 --> 00:02:15.000 was part of what we were working up against. 00:02:15.000 --> 00:02:18.000 And the link list promises to give us this opportunity to have these each 00:02:18.000 --> 00:02:21.000 individual character being managed separately, 00:02:21.000 --> 00:02:26.000 and flexibility might help resolve some of our problems here. 00:02:26.000 --> 00:02:30.000 SO what we're going to look at is the idea of implementing the buffer using a link 00:02:30.000 --> 00:02:32.000 list. So if I have 00:02:32.000 --> 00:02:34.000 characters A, B, C, D, E, 00:02:34.000 --> 00:02:37.000 I could have a link list here where I'm pointing to the head cell, which is A, 00:02:37.000 --> 00:02:40.000 which points to B, and C, and D, and E, and down to the null. 00:02:40.000 --> 00:02:43.000 SO what I'm modeling it as in the private data section will be a 00:02:43.000 --> 00:02:44.000 little 00:02:44.000 --> 00:02:47.000 link list node with a character in there, a pointer to the next one. 00:02:47.000 --> 00:02:52.000 And then I'm going to keep two pointers, one to the front-most cell, and then that's going to 00:02:52.000 --> 00:02:55.000 model where the cursor is within the buffer. 00:02:55.000 --> 00:02:59.000 So rather than doing this like an index, an index was handy in the case of vector where we 00:02:59.000 --> 00:03:00.000 had direct access, right? 00:03:00.000 --> 00:03:02.000 Given the way the link list works, 00:03:02.000 --> 00:03:05.000 knowing that it's at the fifth cell, the fourth cell isn't really very helpful. It would be more 00:03:05.000 --> 00:03:09.000 helpful to have the pointer kind of directly in the midst of things to give us direct 00:03:09.000 --> 00:03:12.000 access to where the cursor is. 00:03:12.000 --> 00:03:14.000 But let's think a little bit about that cursor, 00:03:14.000 --> 00:03:17.000 because there's an important decision to be made early about helping to 00:03:17.000 --> 00:03:20.000 facilitate the later work we have to do. 00:03:20.000 --> 00:03:24.000 I have the contents A, B, C, D, E, 00:03:24.000 --> 00:03:28.000 so the cursor is actually between two characters. 00:03:28.000 --> 00:03:31.000 If the cursor right now is situated 00:03:31.000 --> 00:03:32.000 after the A and before the B, 00:03:32.000 --> 00:03:33.000 what I'm 00:03:33.000 --> 00:03:36.000 modeling is A cursor B, 00:03:36.000 --> 00:03:38.000 C, D, 00:03:38.000 --> 00:03:39.000 then 00:03:39.000 --> 00:03:42.000 it seems like the obvious two choices for where the cursor might point is to 00:03:42.000 --> 00:03:45.000 A or to B. And I 00:03:45.000 --> 00:03:50.000 had said in the case of the vector that if it were index zero or index one 00:03:50.000 --> 00:03:51.000 that 00:03:51.000 --> 00:03:53.000 there wasn't a really strong preference for one over the other. 00:03:53.000 --> 00:03:56.000 There is going to be a really good reason to pick one for the link list 00:03:56.000 --> 00:03:59.000 over the other. Would you rather point to the A 00:03:59.000 --> 00:04:11.000 for where the cursor is or would you rather point to the B? [Inaudible] 00:04:11.000 --> 00:04:14.000 Exactly, so the idea is that when the cursor position is there, the next edit 00:04:14.000 --> 00:04:18.000 actually goes in between the A and the B. So if I insert the character 00:04:18.000 --> 00:04:21.000 Z, it really does go right here. 00:04:21.000 --> 00:04:24.000 And if I am going to, if effect, wire that in and splice it in then list, 00:04:24.000 --> 00:04:29.000 if I'm pointing to B then I'm kind of hosed, because I'm already one past where I need to be able slice in. Because I need to 00:04:29.000 --> 00:04:33.000 know what was behind it to bring it in. So I'm pointing to the character behind the 00:04:33.000 --> 00:04:34.000 cursor 00:04:34.000 --> 00:04:36.000 gives you access to 00:04:36.000 --> 00:04:39.000 wiring this down to here and this into there, and then updating the cursor to 00:04:39.000 --> 00:04:40.000 point to the Z 00:04:40.000 --> 00:04:44.000 without having to do any of the difficult backward looking backward 00:04:44.000 --> 00:04:45.000 traversal. 00:04:45.000 --> 00:04:48.000 So that strongly suggests what it is, it's going to point to 00:04:48.000 --> 00:04:52.000 A when it's between A and B; B when it's between B and C; and so on. 00:04:52.000 --> 00:04:56.000 There is another little point here, which is there are actually five letters 00:04:56.000 --> 00:04:59.000 in the cursor, but there are six cursor positions. It could be at the very 00:04:59.000 --> 00:05:00.000 beginning, 00:05:00.000 --> 00:05:03.000 in between all these, or at the very end. 00:05:03.000 --> 00:05:07.000 And the way we've chosen it right now, I can identify all of the five 00:05:07.000 --> 00:05:08.000 positions 00:05:08.000 --> 00:05:11.000 that have at least one character to the left, so being - 00:05:11.000 --> 00:05:12.000 pointing to the A means it's 00:05:12.000 --> 00:05:15.000 after A; pointing to the B it's after B; and so on, all the way down to the E. 00:05:15.000 --> 00:05:18.000 But I don't have a cursor position for 00:05:18.000 --> 00:05:23.000 representing when the cursor is at the very beginning of the buffer. 00:05:23.000 --> 00:05:25.000 So 00:05:25.000 --> 00:05:28.000 one strategy I could use, right, is to say I'll just use the special cursor 00:05:28.000 --> 00:05:29.000 position of null. But 00:05:29.000 --> 00:05:32.000 I need one more pointer value to hold onto, 00:05:32.000 --> 00:05:36.000 and I could make a special case out of this because there's null. Once I've done that though, 00:05:36.000 --> 00:05:40.000 I've actually committed myself to this thing that's going to require all of my code to be managing a 00:05:40.000 --> 00:05:41.000 special case of, 00:05:41.000 --> 00:05:43.000 well when the cursor's null do something a little different than when 00:05:43.000 --> 00:05:46.000 the cursor is pointing to some cell. I'm going to 00:05:46.000 --> 00:05:50.000 show you a way that we can kind of 00:05:50.000 --> 00:05:54.000 avoid the need for making a special case out of it, 00:05:54.000 --> 00:05:55.000 by 00:05:55.000 --> 00:05:58.000 wasting a little memory and reorganizing our list 00:05:58.000 --> 00:06:01.000 to allow for there to be a sixth cell, 00:06:01.000 --> 00:06:04.000 and extra cell that we call the dummy cell. 00:06:04.000 --> 00:06:07.000 So in this case that cell would go at the very front. 00:06:07.000 --> 00:06:10.000 It would have no character, just let the character field be unstatused. It's 00:06:10.000 --> 00:06:13.000 not really a character. What it is, is just a placeholder. 00:06:13.000 --> 00:06:16.000 And when I created the buffer to begin with, even when it's empty, it's always going to have 00:06:16.000 --> 00:06:20.000 that cell. I'm going to create that cell in advance, I'm going to have it sitting ready and waiting, and that cell 00:06:20.000 --> 00:06:24.000 never gets deleted or moved or changed. In fact, the list always, the head 00:06:24.000 --> 00:06:28.000 of this will always point to the same cell from the beginning of this object's lifetime 00:06:28.000 --> 00:06:32.000 to the end. So it will never change this at all, and the cells that follow are 00:06:32.000 --> 00:06:35.000 the ones that are going to be changing and growing and rearranging in 00:06:35.000 --> 00:06:38.000 response to editing commands. 00:06:38.000 --> 00:06:40.000 What this is going to buy us is a couple of things. First off, it's going to make it 00:06:40.000 --> 00:06:42.000 so there is a sixth cursor position. 00:06:42.000 --> 00:06:45.000 So now when the cursor is at the very beginning, 00:06:45.000 --> 00:06:47.000 we can set the cursor to point to this dummy cell. 00:06:47.000 --> 00:06:48.000 Okay 00:06:48.000 --> 00:06:51.000 that's one thing that we solve by doing that. 00:06:51.000 --> 00:06:55.000 Another thing that we've solved is actually we have made all the other 00:06:55.000 --> 00:06:57.000 cells on the list, the ones that are being edited and manipulated. 00:06:57.000 --> 00:07:00.000 We've made them all totally symmetric. 00:07:00.000 --> 00:07:02.000 Previously there was always going to be a little bit of a special case about, well if you're 00:07:02.000 --> 00:07:06.000 the front of the list, we're seeing that sometimes we're doing inserts on the 00:07:06.000 --> 00:07:08.000 stack in queue link list. That 00:07:08.000 --> 00:07:12.000 inserting later in the list involves kind of wiring in two pointers; one to command and one 00:07:12.000 --> 00:07:13.000 to go out. 00:07:13.000 --> 00:07:16.000 But there was a special case that oh, if you're the very first cell of the list, then 00:07:16.000 --> 00:07:19.000 your outpointer is the same, but your incoming pointer is resetting the head. 00:07:19.000 --> 00:07:22.000 That piece has actually gone away now, 00:07:22.000 --> 00:07:26.000 that now every cell in there always has a predecessor. It always has a previous, 00:07:26.000 --> 00:07:29.000 right? Something behind it so that actually they will always be 00:07:29.000 --> 00:07:31.000 pulling the pointer in and pulling the pointer out 00:07:31.000 --> 00:07:34.000 for every cell with character data that's being modified. 00:07:34.000 --> 00:07:37.000 And that means some of that extra little handing of oh, if you're the 00:07:37.000 --> 00:07:39.000 very first cell, 00:07:39.000 --> 00:07:40.000 have gone away. 00:07:40.000 --> 00:07:44.000 The very first cell is always this kind of dummy cell and doesn't require us to go 00:07:44.000 --> 00:07:46.000 out of our way to do anything special for it. 00:07:46.000 --> 00:07:49.000 So this is going to solve a bunch of problems for us, and it's kind of a neat technique. This thing 00:07:49.000 --> 00:07:51.000 is used actually 00:07:51.000 --> 00:07:52.000 fairly commonly 00:07:52.000 --> 00:07:57.000 in situations just managing a link list just to buy yourself some symmetry in the 00:07:57.000 --> 00:08:02.000 remaining cells. 00:08:02.000 --> 00:08:05.000 So let me look at some code with you, and I'm going to actually draw some pictures as I 00:08:05.000 --> 00:08:06.000 go. I'm 00:08:06.000 --> 00:08:10.000 not going to write this code in real time, because I think it's actually more important to 00:08:10.000 --> 00:08:11.000 see it 00:08:11.000 --> 00:08:15.000 acting on things than it is to see me typing out the characters. 00:08:15.000 --> 00:08:18.000 But the mechanism, right, in the case of the buffer here 00:08:18.000 --> 00:08:21.000 is it's going to have a head point, 00:08:21.000 --> 00:08:23.000 which points to the front most cell; and a 00:08:23.000 --> 00:08:24.000 cursor, 00:08:24.000 --> 00:08:28.000 which points to the character that proceeds the cursor. So if right now the 00:08:28.000 --> 00:08:31.000 contents of the buffer 00:08:31.000 --> 00:08:34.000 are 00:08:34.000 --> 00:08:37.000 the contents A, D, C. 00:08:37.000 --> 00:08:39.000 And let's say that right now 00:08:39.000 --> 00:08:40.000 the 00:08:40.000 --> 00:08:42.000 cursor is pointing to A. 00:08:42.000 --> 00:08:45.000 Actually, I need my 00:08:45.000 --> 00:08:50.000 dummy cell. Let me move everybody down one, so you just have dummy 00:08:50.000 --> 00:08:52.000 cell and my A and my B. 00:08:52.000 --> 00:08:56.000 And so the cursor right now is at the very front of the buffer. 00:08:56.000 --> 00:08:59.000 This is kind of what it looks like. It has two characters, the A and the B. 00:08:59.000 --> 00:09:03.000 I'm going to trace through the process of inserting a new character 00:09:03.000 --> 00:09:04.000 into the buffer. It 00:09:04.000 --> 00:09:07.000 is that my local variable CP 00:09:07.000 --> 00:09:11.000 is going to point to a new cell allocated out in the heap. 00:09:11.000 --> 00:09:15.000 Let's say the character I'm going to insert is the Z, so I'll 00:09:15.000 --> 00:09:16.000 assign 00:09:16.000 --> 00:09:18.000 CH to the character, the slot right there. 00:09:18.000 --> 00:09:20.000 And then the next for this 00:09:20.000 --> 00:09:24.000 gets to - what the cursor is - kind of following the cursor. So the 00:09:24.000 --> 00:09:26.000 cursor is pointing to the cell before it, 00:09:26.000 --> 00:09:29.000 and the next field of that cell tells us what follows it; 00:09:29.000 --> 00:09:31.000 so if I trace my cursor's 00:09:31.000 --> 00:09:33.000 next field, it points to A, 00:09:33.000 --> 00:09:35.000 and so I'm going to go ahead and 00:09:35.000 --> 00:09:41.000 set up this new cell's next field to also point to A. 00:09:41.000 --> 00:09:44.000 So now I've got two different ways to get to the A cell, tracing off the dummy cell or 00:09:44.000 --> 00:09:46.000 tracing off this new cell. 00:09:46.000 --> 00:09:48.000 And then I 00:09:48.000 --> 00:09:51.000 update the cursor's next field to point to CP. 00:09:51.000 --> 00:09:55.000 So I just copied right there that CP's next field was pointing to A. I copied it down here 00:09:55.000 --> 00:09:57.000 to kind of do the splice out. 00:09:57.000 --> 00:10:02.000 Now I'm going to do the splice in, 00:10:02.000 --> 00:10:05.000 causing it to no longer point to A, 00:10:05.000 --> 00:10:06.000 but instead to 00:10:06.000 --> 00:10:08.000 point to my new cell down here, CP. 00:10:08.000 --> 00:10:10.000 And then I update the cursor 00:10:10.000 --> 00:10:13.000 to point to this new cell. That just means that subsequent inserts, like the 00:10:13.000 --> 00:10:18.000 C is kind of behind the cursor after I've made this 00:10:18.000 --> 00:10:21.000 00:10:21.000 --> 00:10:23.000 edit. 00:10:23.000 --> 00:10:25.000 That goes away when that one ends, 00:10:25.000 --> 00:10:28.000 and so now the new structure that I have here is 00:10:28.000 --> 00:10:31.000 head still points to the dummy cell that never changes. 00:10:31.000 --> 00:10:33.000 Dummy cell now points to Z, 00:10:33.000 --> 00:10:36.000 Z points to A, Z points to B, 00:10:36.000 --> 00:10:41.000 and then the cursor in this case is between the Z and the A, 00:10:41.000 --> 00:10:44.000 by virtue of the fact that the cursor is pointing to Z. 00:10:44.000 --> 00:10:50.000 A subsequent one would kind of do the same thing. If I typed out ZY 00:10:50.000 --> 00:10:51.000 after I did this. We'd 00:10:51.000 --> 00:10:55.000 make a new cell over here. It would get what 00:10:55.000 --> 00:10:58.000 was currently coming out of the cursor's field. The next field, right, we would 00:10:58.000 --> 00:11:02.000 update cursor's next field to point 00:11:02.000 --> 00:11:05.000 to this new one. And then we would update cursor 00:11:05.000 --> 00:11:08.000 to point to this guy. And 00:11:08.000 --> 00:11:10.000 so now following the list, right, dummy cell, 00:11:10.000 --> 00:11:11.000 Z, 00:11:11.000 --> 00:11:16.000 Y, A, B, cursor pointing to the Y, which is the 00:11:16.000 --> 00:11:20.000 character directly to the left of the cursor. 00:11:20.000 --> 00:11:22.000 And so inserting in any position now, 00:11:22.000 --> 00:11:25.000 front cell, middle cell, end cell, 00:11:25.000 --> 00:11:29.000 doesn't require any special case handling. So you don't see any if of things. 00:11:29.000 --> 00:11:33.000 And you notice that you'll never see an update to the head directly here. 00:11:33.000 --> 00:11:37.000 That the head was set up to point to the dummy cell in the constructor 00:11:37.000 --> 00:11:39.000 and never needs any adjusting. 00:11:39.000 --> 00:11:42.000 It's only the cells after it that requires 00:11:42.000 --> 00:11:43.000 00:11:43.000 --> 00:11:44.000 the new 00:11:44.000 --> 00:11:47.000 arrangement of the pointers coming in and 00:11:47.000 --> 00:11:49.000 out. When I'm ready to delete a cell, 00:11:49.000 --> 00:11:53.000 then the mechanism for delete in the buffer is to delete the thing that follows 00:11:53.000 --> 00:11:55.000 the cursor. So if I have Z, Y, A, B, 00:11:55.000 --> 00:11:58.000 this is the character that delete is supposed to delete, the one after the 00:11:58.000 --> 00:11:59.000 cursor. 00:11:59.000 --> 00:12:00.000 And so, 00:12:00.000 --> 00:12:04.000 the first thing it looks at is to make sure that we're not already at the end. 00:12:04.000 --> 00:12:07.000 In the case of the link list form of this, if the cursor was pointing to 00:12:07.000 --> 00:12:09.000 the last most cell, 00:12:09.000 --> 00:12:12.000 and then we looked at it's next field and saw that it was null, that would be a clue. Oh, it's 00:12:12.000 --> 00:12:15.000 pointing to the very last cell, there's nothing that follows it. So we have a 00:12:15.000 --> 00:12:17.000 quick test to make sure that there's something there to delete. 00:12:17.000 --> 00:12:19.000 There is something following the cursor, and 00:12:19.000 --> 00:12:24.000 in this case since the cursor is pointing to the Y, we're good. It says look at its 00:12:24.000 --> 00:12:25.000 next field, it points to A. 00:12:25.000 --> 00:12:27.000 It says okay, 00:12:27.000 --> 00:12:29.000 call this thing old 00:12:29.000 --> 00:12:30.000 and point to that A. 00:12:30.000 --> 00:12:33.000 Now do a wire around it. 00:12:33.000 --> 00:12:34.000 So take the cursor's 00:12:34.000 --> 00:12:36.000 next field, and 00:12:36.000 --> 00:12:39.000 so this thing is where I'm targeting my overwrite, 00:12:39.000 --> 00:12:44.000 and copy out of the old its next. It's 00:12:44.000 --> 00:12:49.000 no longer pointing to the A, but instead it's 00:12:49.000 --> 00:12:50.000 pointing to the B. 00:12:50.000 --> 00:12:52.000 And deleting the old, which 00:12:52.000 --> 00:12:54.000 causes this piece of memory to 00:12:54.000 --> 00:12:56.000 be reclaimed, 00:12:56.000 --> 00:13:01.000 and now the contents of my buffer have been shifted over to 00:13:01.000 --> 00:13:06.000 where it goes Z, Y, and straight to D. The cursor in this case doesn't move. It 00:13:06.000 --> 00:13:07.000 actually deleted to the right, and so 00:13:07.000 --> 00:13:11.000 whatever was to the left of the cursor just stays in place. 00:13:11.000 --> 00:13:15.000 So again, no need to update anything special for the first or the last or any of those 00:13:15.000 --> 00:13:15.000 cells, 00:13:15.000 --> 00:13:19.000 but the symmetry of all the cells past the dummy kind of buys us some convenience 00:13:19.000 --> 00:13:22.000 of handling it. What does a dummy cell look like? The 00:13:22.000 --> 00:13:26.000 dummy cell looks like just any other cell. It's actually a cell, T, and what you do is 00:13:26.000 --> 00:13:29.000 you just don't assign the Christian field, the character field, because it has nothing in it. 00:13:29.000 --> 00:13:32.000 So it actually looks just like all the others, it's just another node in the 00:13:32.000 --> 00:13:33.000 chain. 00:13:33.000 --> 00:13:36.000 But this one, you know you put there purposefully just as the placeholder. 00:13:36.000 --> 00:13:40.000 So it doesn't contain any character data. So if you're ready to print the 00:13:40.000 --> 00:13:42.000 contents of the buffer, you have to remember the dummy cell is there, 00:13:42.000 --> 00:13:43.000 and just skip over it. You 00:13:43.000 --> 00:13:45.000 tend to start your list 00:13:45.000 --> 00:13:49.000 at heads next, the first interesting cell to look at, then work 00:13:49.000 --> 00:13:51.000 your way down to the end. So it's 00:13:51.000 --> 00:13:54.000 just like all the others, but you just know 00:13:54.000 --> 00:14:00.000 that the character data here is useless, it's not important. You didn't even write anything in there. When you delete, you just go left? It turns out 00:14:00.000 --> 00:14:02.000 the delete in this 00:14:02.000 --> 00:14:04.000 case goes forward. 00:14:04.000 --> 00:14:07.000 So you can imagine that in some other world it does, but all the ones we 00:14:07.000 --> 00:14:11.000 have done so far have always deleted forward. So the delete - for example, the stack case, 00:14:11.000 --> 00:14:13.000 grabbed from the after stack 00:14:13.000 --> 00:14:16.000 - it's done that way, because it happens to simplify some other 00:14:16.000 --> 00:14:19.000 things that you can imagine that delete and reverse. What would it take? Well, you'd have to back 00:14:19.000 --> 00:14:22.000 the cursor up. And we're going to find out pretty soon that would make it 00:14:22.000 --> 00:14:24.000 challenging for the link list. So partly we've picked a delete that 00:14:24.000 --> 00:14:26.000 made certain things work out a 00:14:26.000 --> 00:14:29.000 little better for us. 00:14:29.000 --> 00:14:31.000 So both of these operations 00:14:31.000 --> 00:14:33.000 are O of 00:14:33.000 --> 00:14:33.000 1; 00:14:33.000 --> 00:14:37.000 this is where the link list really shines, right, on this sort of stuff. Because 00:14:37.000 --> 00:14:41.000 if you have access to the point at which you want to make a modification, so if you are already kind of 00:14:41.000 --> 00:14:44.000 in the midst of the thing you're trying to rearrange, 00:14:44.000 --> 00:14:47.000 it's often to the link list's advantage, because it actually can do these this rewiring 00:14:47.000 --> 00:14:51.000 in the local context. And even if there's thousands of characters that 00:14:51.000 --> 00:14:52.000 follow the cursor. 00:14:52.000 --> 00:14:55.000 It doesn't matter. And there can be thousands of characters preceding the cursor 00:14:55.000 --> 00:14:56.000 for that matter, too. 00:14:56.000 --> 00:15:00.000 The idea that there's a much larger context that's working and doesn't impact 00:15:00.000 --> 00:15:03.000 its performance; it's really just doing this local little rearrangement of the pointers. 00:15:03.000 --> 00:15:05.000 And that's a real strength of the link list design 00:15:05.000 --> 00:15:11.000 is that flexibility that doesn't rely on everything living in contiguous memory. 00:15:11.000 --> 00:15:15.000 So let's talk about the movement options for this guy. 00:15:15.000 --> 00:15:19.000 So right now I have the contents 00:15:19.000 --> 00:15:21.000 Z, Y with 00:15:21.000 --> 00:15:22.000 the cursor 00:15:22.000 --> 00:15:27.000 between that, and the B following that. So three characters worth, 00:15:27.000 --> 00:15:29.000 00:15:29.000 --> 00:15:30.000 and then 00:15:30.000 --> 00:15:31.000 the 00:15:31.000 --> 00:15:33.000 initial two operations 00:15:33.000 --> 00:15:36.000 are pretty simple. The later two are going to be a little bit messier. So let's look at 00:15:36.000 --> 00:15:37.000 the easy ones first. 00:15:37.000 --> 00:15:40.000 Moving the cursor to the beginning, so that jump that pulls you back to the very 00:15:40.000 --> 00:15:41.000 beginning, 00:15:41.000 --> 00:15:44.000 very easy in a link list form. If I want to move the cursor to the beginning, all 00:15:44.000 --> 00:15:47.000 I need to do is take that cursor pointer, 00:15:47.000 --> 00:15:49.000 and reset it 00:15:49.000 --> 00:15:51.000 to the head, which is where the dummy cell is. And so, by 00:15:51.000 --> 00:15:54.000 virtue of doing that, 00:15:54.000 --> 00:15:57.000 00:15:57.000 --> 00:15:59.000 the character it points to is the one that's to the left, 00:15:59.000 --> 00:16:02.000 right of that, and to the left of that the dummy cell is one we're not 00:16:02.000 --> 00:16:05.000 counting. In fact, it's like nothingness that's there, 00:16:05.000 --> 00:16:07.000 and that means that the first character that follows the cursor is Z, which 00:16:07.000 --> 00:16:11.000 is the initial character of the buffer. So that reset, easy to do, we have 00:16:11.000 --> 00:16:13.000 easy access to the front of the list, 00:16:13.000 --> 00:16:14.000 and so 00:16:14.000 --> 00:16:16.000 no 00:16:16.000 --> 00:16:17.000 special 00:16:17.000 --> 00:16:18.000 crazy code required. 00:16:18.000 --> 00:16:21.000 Moving the cursor forward, also an easy thing to do, link 00:16:21.000 --> 00:16:24.000 list are designed to make it easy to work your way from the front to 00:16:24.000 --> 00:16:28.000 the back. And so moving the cursor forward is advancing that cursor by one step. So 00:16:28.000 --> 00:16:32.000 in the array form of this, what was a cursor plus, plus, the 00:16:32.000 --> 00:16:36.000 comparable form of that in the link list is cursor equals cursor next. 00:16:36.000 --> 00:16:37.000 It has a little 00:16:37.000 --> 00:16:40.000 if test there, all of the versions actually have something that is 00:16:40.000 --> 00:16:42.000 comparable to this. If you're already at the end, then there's 00:16:42.000 --> 00:16:46.000 nothing to be done, so you don't advance the cursor off into space. 00:16:46.000 --> 00:16:47.000 You 00:16:47.000 --> 00:16:51.000 check to make sure that there is something for it to point to. So in this case, 00:16:51.000 --> 00:16:54.000 seeing if the cursor's next field is null, it's not, it points to Z. Then we go 00:16:54.000 --> 00:16:58.000 ahead and update the cursor to there, 00:16:58.000 --> 00:17:00.000 which has the effect of 00:17:00.000 --> 00:17:03.000 changing things to where the Z is the character 00:17:03.000 --> 00:17:07.000 to the left of the cursor, and then the YB follows it. 00:17:07.000 --> 00:17:10.000 So moving this way, getting back to the beginning, and kind of walking our way down is kind of easy 00:17:10.000 --> 00:17:11.000 to do. 00:17:11.000 --> 00:17:15.000 Now we start to see where the link list causes us a little bit of grief, 00:17:15.000 --> 00:17:18.000 moving the cursor to the end. 00:17:18.000 --> 00:17:21.000 So now, starting at the beginning or the middle, or wherever I am, 00:17:21.000 --> 00:17:25.000 I want to advance my way to the end. I don't know where the end is. 00:17:25.000 --> 00:17:28.000 Link lists in general don't keep track of that, 00:17:28.000 --> 00:17:31.000 that in the simple form of the single link list, I'm going to have to work to 00:17:31.000 --> 00:17:34.000 find it. So I'm going to take where I am and walk my way down. 00:17:34.000 --> 00:17:37.000 And this one actually just makes use of an existing 00:17:37.000 --> 00:17:40.000 public number function to move cursor forward. It's like while the cursor next does 00:17:40.000 --> 00:17:44.000 not equal null, which is to say while we are not pointing at the very last cell of the link list, 00:17:44.000 --> 00:17:46.000 then keep going. 00:17:46.000 --> 00:17:49.000 So advance cursor equals cursor. 00:17:49.000 --> 00:17:50.000 Some will say, 00:17:50.000 --> 00:17:54.000 "Is the cursor's next field null?" No, then set the cursor to be the cursor's 00:17:54.000 --> 00:17:56.000 next field. 00:17:56.000 --> 00:17:58.000 Is the cursor's next field null? 00:17:58.000 --> 00:17:59.000 00:17:59.000 --> 00:18:02.000 No it's not, so advance the cursor to the next field. 00:18:02.000 --> 00:18:04.000 Is the cursor's next field null? Yes. 00:18:04.000 --> 00:18:06.000 So that would 00:18:06.000 --> 00:18:08.000 the last thing in the buffer, 00:18:08.000 --> 00:18:10.000 and so if we have 00:18:10.000 --> 00:18:13.000 hundreds or thousands or 00:18:13.000 --> 00:18:15.000 tens of characters, whichever it is, right, 00:18:15.000 --> 00:18:18.000 and depending on how close we already were, it will walk through 00:18:18.000 --> 00:18:21.000 everything from where the current editing position was to find the end of the 00:18:21.000 --> 00:18:22.000 buffer 00:18:22.000 --> 00:18:25.000 one by one, working it's way down. 00:18:25.000 --> 00:18:26.000 Even more painful 00:18:26.000 --> 00:18:28.000 operation 00:18:28.000 --> 00:18:32.000 is the simple one of just moving backwards. 00:18:32.000 --> 00:18:34.000 If I'm pointing right now to that B, 00:18:34.000 --> 00:18:37.000 and I'd like to get back to that Y, the 00:18:37.000 --> 00:18:41.000 link list is oblivious about how you got there. It knows where to go from 00:18:41.000 --> 00:18:45.000 here, but that backing up is not supported in the simple form. 00:18:45.000 --> 00:18:48.000 In order to find out what preceded the V, 00:18:48.000 --> 00:18:51.000 our only option right now is to go back to the beginning, 00:18:51.000 --> 00:18:53.000 and find it. 00:18:53.000 --> 00:18:56.000 So starting this pointer CP at the head, 00:18:56.000 --> 00:18:57.000 and saying, 00:18:57.000 --> 00:18:58.000 do you point to; 00:18:58.000 --> 00:19:01.000 is your next field the cursor? 00:19:01.000 --> 00:19:04.000 And this one says, no. And then it says, well okay advance. Does your next field point to 00:19:04.000 --> 00:19:08.000 the cursor? No it's not. Go to this one, does your next field point to the same place 00:19:08.000 --> 00:19:12.000 as the cursor, yes. Wise next field, points to the same place the cursor 00:19:12.000 --> 00:19:13.000 does. 00:19:13.000 --> 00:19:18.000 That must mean that Y was the one that preceded the cursor in the link 00:19:18.000 --> 00:19:22.000 list. So after having gone back to the beginning and walked all the way down, especially if I'm near 00:19:22.000 --> 00:19:24.000 the end, that's a long traversal. 00:19:24.000 --> 00:19:26.000 And when I get there, I can say "Oh yeah 00:19:26.000 --> 00:19:34.000 that's the one, back it up to there." 00:19:34.000 --> 00:19:37.000 Walking it all the way down; 00:19:37.000 --> 00:19:41.000 some good link list coding, good practice in there, but 00:19:41.000 --> 00:19:42.000 again a funny, 00:19:42.000 --> 00:19:44.000 funny set of trade offs, right? 00:19:44.000 --> 00:19:47.000 Does anybody have any questions about any part of the code? 00:19:47.000 --> 00:19:49.000 At the end 00:19:49.000 --> 00:19:51.000 could you just 00:19:51.000 --> 00:19:53.000 save the address of the pointer, and then the 00:19:53.000 --> 00:19:54.000 person 00:19:54.000 --> 00:19:56.000 who wants to go to the end could just 00:19:56.000 --> 00:19:59.000 access that? That was a great idea. It's like, so what about 00:19:59.000 --> 00:20:02.000 improving this? Right now all we have is a pointer to the front, and each 00:20:02.000 --> 00:20:06.000 pointer has only a pointer to the next. It's like, maybe we need to add some 00:20:06.000 --> 00:20:07.000 more stuff, track 00:20:07.000 --> 00:20:10.000 some more things. What if we tracked the 00:20:10.000 --> 00:20:12.000 tail? Would that help us? 00:20:12.000 --> 00:20:16.000 If we tracked the tail, we could move to the end quickly. Would 00:20:16.000 --> 00:20:19.000 that help us moving backwards? 00:20:19.000 --> 00:20:22.000 No. It solves one of our problems though, so you're on to something. 00:20:22.000 --> 00:20:25.000 Let's look at what we've got, and then we'll start thinking about ways to fix it, because that's going to be 00:20:25.000 --> 00:20:27.000 one of the pieces we're going to think about. 00:20:27.000 --> 00:20:29.000 So if I move to link list 00:20:29.000 --> 00:20:31.000 relative to what I have, 00:20:31.000 --> 00:20:33.000 again, it's like a shell game. The old ends moved. 00:20:33.000 --> 00:20:36.000 It used to be that editing was slow and we didn't like that. 00:20:36.000 --> 00:20:38.000 Then we made big movement fast, 00:20:38.000 --> 00:20:41.000 and now we have this funny thing were moving in certain ways is easy and 00:20:41.000 --> 00:20:45.000 moving other ways is bad. So moving 00:20:45.000 --> 00:20:46.000 down the document 00:20:46.000 --> 00:20:49.000 is good, moving backwards in the document not good. 00:20:49.000 --> 00:20:53.000 Moving back to the beginning is good, but moving to the end is bad. 00:20:53.000 --> 00:20:56.000 But then editing in all situations is good, 00:20:56.000 --> 00:20:59.000 another sort of quirky set of things. It just fells like it's like that game where the 00:20:59.000 --> 00:21:00.000 00:21:00.000 --> 00:21:03.000 gophers pop there heads up, and you pound on them. And they just show up somewhere else. As 00:21:03.000 --> 00:21:06.000 soon as you get one thing fixed, it seems like something else had 00:21:06.000 --> 00:21:08.000 to give. 00:21:08.000 --> 00:21:11.000 This, I think, would actually be my ideal editor, because it turns out I 00:21:11.000 --> 00:21:14.000 have exactly this working style, which is like, I'll be down at the end of 00:21:14.000 --> 00:21:17.000 the document, and I'll have to be generating new things. And 00:21:17.000 --> 00:21:20.000 I'm getting tired and I don't like writing, and what I keep wanting to 00:21:20.000 --> 00:21:24.000 do is I go back to the beginning. And I read that again, because I worked on that the most and 00:21:24.000 --> 00:21:27.000 it's nice. And so I go back to the beginning, and I'm like oh yeah, look at this. Look at how lovely this 00:21:27.000 --> 00:21:28.000 is, 00:21:28.000 --> 00:21:31.000 and then I never want to go back to the end where all the trouble is. So I just 00:21:31.000 --> 00:21:34.000 keep going back to the beginning and editing in the front, and then slowly get 00:21:34.000 --> 00:21:37.000 back to the end. And then I don't like it again, and then I go back to the beginning. 00:21:37.000 --> 00:21:39.000 But I never actually willingly go to the end 00:21:39.000 --> 00:21:42.000 to face my fears. So 00:21:42.000 --> 00:21:46.000 this is the editor I need. 00:21:46.000 --> 00:21:47.000 Also 00:21:47.000 --> 00:21:51.000 has a bit of kicker in terms of space. 00:21:51.000 --> 00:21:56.000 So if the space for a character is 1 byte, which it typically is, 00:21:56.000 --> 00:21:58.000 and the space for a pointer is 4 bytes, then 00:21:58.000 --> 00:22:03.000 each of those link list cells is costing us 5 bytes, right? 00:22:03.000 --> 00:22:03.000 Which is 00:22:03.000 --> 00:22:07.000 sort of five times the amount of storage that the character itself 00:22:07.000 --> 00:22:10.000 would have taken alone, 00:22:10.000 --> 00:22:14.000 so in the vector it's fairly tight, a little bit more going up on that. But we've actually kind of blown 00:22:14.000 --> 00:22:18.000 it up by another factor of two even beyond the stack, which is not quite a 00:22:18.000 --> 00:22:22.000 good direction to be going. 00:22:22.000 --> 00:22:24.000 So let's talk about what we can do to fix that. 00:22:24.000 --> 00:22:27.000 I'm not going to go through the process of this. I'm just going to kind of give you a thought 00:22:27.000 --> 00:22:29.000 exercise, so you can just kind of 00:22:29.000 --> 00:22:30.000 visualize it. 00:22:30.000 --> 00:22:34.000 What we have here, right, is an information problem. We 00:22:34.000 --> 00:22:37.000 only are keeping track of certain 00:22:37.000 --> 00:22:39.000 pathways through that link list. 00:22:39.000 --> 00:22:42.000 What if we actually increased our access to the list? 00:22:42.000 --> 00:22:46.000 We don't need to have the full access of a vector, being able to know the nth 00:22:46.000 --> 00:22:49.000 character by number is not actually as far as we need to go. But we 00:22:49.000 --> 00:22:53.000 do just need a little bit more coordination between a cell and its neighbors. 00:22:53.000 --> 00:22:56.000 So the two things that would buy us out of a lot of the things that we're seeing 00:22:56.000 --> 00:22:58.000 here is if we added a tail pointer. 00:22:58.000 --> 00:23:01.000 So if I just add a tail pointer that points to the last cell, then I can move to the end 00:23:01.000 --> 00:23:02.000 quickly. 00:23:02.000 --> 00:23:05.000 You say cursor equals tail; the same that cursor equals head. So what was an O of N 00:23:05.000 --> 00:23:07.000 problem now is an O of 1. 00:23:07.000 --> 00:23:08.000 The backing up, 00:23:08.000 --> 00:23:12.000 not as easily solved by just having one thing that's going on; 00:23:12.000 --> 00:23:13.000 I also am going to need, 00:23:13.000 --> 00:23:17.000 on a per cell basis this ability to move backwards. 00:23:17.000 --> 00:23:19.000 00:23:19.000 --> 00:23:22.000 Well, it's just symmetry, right? If A knows where B is, and B knows where C is, and C knows where D is, what's to stop 00:23:22.000 --> 00:23:25.000 it from also just tracking the information going in the other way, 00:23:25.000 --> 00:23:30.000 so that D knows that it came from C, and C knows it came from B, and so on. 00:23:30.000 --> 00:23:35.000 And so having this completely symmetric parallel set of links that just run the other 00:23:35.000 --> 00:23:35.000 direction, 00:23:35.000 --> 00:23:39.000 then buys you the ability to move backwards with ease. 00:23:39.000 --> 00:23:42.000 The other thing that this would give you is the tail pointer 00:23:42.000 --> 00:23:46.000 is almost not quite enough to give you the exact place of the end. It gives it to 00:23:46.000 --> 00:23:47.000 you, but you also have to be 00:23:47.000 --> 00:23:49.000 careful about what is it going to take to update it? 00:23:49.000 --> 00:23:53.000 And so inserting can easily update the tail pointer. It's the deleting that kind of 00:23:53.000 --> 00:23:55.000 would get you into trouble. If you were 00:23:55.000 --> 00:23:58.000 here and deleting, you would have to kind of keep track of making sure if you were deleting 00:23:58.000 --> 00:24:01.000 the thing that was the tail pointer, so you have to be a little about making sure 00:24:01.000 --> 00:24:04.000 you don't lose track of your tail. 00:24:04.000 --> 00:24:06.000 But once I have both of these in place, 00:24:06.000 --> 00:24:09.000 I suddenly have something that can move 00:24:09.000 --> 00:24:10.000 forward easily, O of 1; 00:24:10.000 --> 00:24:15.000 move to the front easily, O of 1; move to the tail easily, O of 1; move backwards O of 1. 00:24:15.000 --> 00:24:20.000 So I can make all of my operations 00:24:20.000 --> 00:24:22.000 O of 1. 00:24:22.000 --> 00:24:24.000 Deleting, you're still going to have to delete the individual cells, but there's one place where 00:24:24.000 --> 00:24:27.000 it suffers, but that's a much more rare occurrence. 00:24:27.000 --> 00:24:31.000 And I can get all six; it moves fast, in every dimension; 00:24:31.000 --> 00:24:32.000 inserts and deletes 00:24:32.000 --> 00:24:35.000 all good. 00:24:35.000 --> 00:24:37.000 And there are two places we paid for it. 00:24:37.000 --> 00:24:40.000 One is certainly the memory. 00:24:40.000 --> 00:24:45.000 So a 4 byte pointer one direction, a 4 byte pointer another direction, 00:24:45.000 --> 00:24:49.000 plus the 1 byte for the character means we have a 9 byte per 00:24:49.000 --> 00:24:50.000 character 00:24:50.000 --> 00:24:52.000 storage usage. 00:24:52.000 --> 00:24:55.000 And the other way, which is a little bit hard to represent with some number 00:24:55.000 --> 00:24:57.000 in a quantitative way, but actually it's also an important factor, which is 00:24:57.000 --> 00:25:00.000 your code got a lot more complicated. 00:25:00.000 --> 00:25:02.000 If you looked at the code for the vector of the stack version, it was a few 00:25:02.000 --> 00:25:03.000 lines here a few lines there. 00:25:03.000 --> 00:25:06.000 If you looked at the code for the link list singly we looked at, it's like you've got 00:25:06.000 --> 00:25:07.000 to wire up these pointers. 00:25:07.000 --> 00:25:10.000 Now imagine you've got to wire up twice as many pointers. 00:25:10.000 --> 00:25:14.000 Not only do you have the in and out going one direction, you'll have the in and out going the 00:25:14.000 --> 00:25:15.000 other direction, 00:25:15.000 --> 00:25:17.000 and just the 00:25:17.000 --> 00:25:21.000 opportunity to make errors with pointers has now gone up by a factor of two. 00:25:21.000 --> 00:25:24.000 The idea that you could get a vector version of this up and running in 00:25:24.000 --> 00:25:26.000 no time, 00:25:26.000 --> 00:25:30.000 compared to several days, let's say, of getting the doubly link one working. It 00:25:30.000 --> 00:25:32.000 might be a factor to keep in mind. 00:25:32.000 --> 00:25:36.000 If you're kind of investing for the long haul, sure, work hard, but know 00:25:36.000 --> 00:25:41.000 that it wasn't for free. 00:25:41.000 --> 00:25:42.000 So you've 00:25:42.000 --> 00:25:45.000 got 89 percent overhead. 00:25:45.000 --> 00:25:49.000 That's a lot of overhead for small bits of data. 00:25:49.000 --> 00:25:53.000 More likely, and this is actually the way that most modern word processors 00:25:53.000 --> 00:25:54.000 do do this, 00:25:54.000 --> 00:25:58.000 is they don't just make one or the other; it's not just a array or a stack or a link list. 00:25:58.000 --> 00:25:59.000 They 00:25:59.000 --> 00:26:04.000 actually look at ways to combine the features of both to get a hybrid, 00:26:04.000 --> 00:26:05.000 where you can take 00:26:05.000 --> 00:26:10.000 some of the advantages of one pipe of data, but also try to avoid its worst 00:26:10.000 --> 00:26:13.000 weaknesses. So the most common way they're going to do this is some kind of 00:26:13.000 --> 00:26:15.000 chunking strategy, 00:26:15.000 --> 00:26:17.000 where you have a segment of the 00:26:17.000 --> 00:26:21.000 buffer that kind of is moved aside and worked on in one little array. It 00:26:21.000 --> 00:26:25.000 might be that those arrays are ten or 20 or a sentence long, or paragraph long. 00:26:25.000 --> 00:26:28.000 It might be that they are actually dictated by when you change styles. 00:26:28.000 --> 00:26:30.000 All of the code that's in one font 00:26:30.000 --> 00:26:33.000 kind of gets moved into one chunk, and as soon as you change styles, it breaks 00:26:33.000 --> 00:26:37.000 into a new chunk that follows to kind of keep track of the formatting information. It 00:26:37.000 --> 00:26:40.000 depends on some of the other features that are present in the processor 00:26:40.000 --> 00:26:43.000 here, but it's likely that rather than having all 1,000 or all 10 million 00:26:43.000 --> 00:26:45.000 characters in one 00:26:45.000 --> 00:26:46.000 data structure, 00:26:46.000 --> 00:26:47.000 they're actually kind of separated 00:26:47.000 --> 00:26:50.000 to where you have a little bit of both. So the most likely thing is that 00:26:50.000 --> 00:26:53.000 there is some kind of array strategy on a chunk basis. 00:26:53.000 --> 00:26:56.000 There's a linking between those chunks. 00:26:56.000 --> 00:26:59.000 And what this allows you do is to 00:26:59.000 --> 00:27:01.000 share that overhead cost. 00:27:01.000 --> 00:27:05.000 So in this case, if I had every sentence in a chunk by itself, 00:27:05.000 --> 00:27:08.000 and then if I had next and previous pointer, which pointed to the next sentence and the previous 00:27:08.000 --> 00:27:09.000 00:27:09.000 --> 00:27:11.000 sentence, when I need to move the cursor forward or backwards 00:27:11.000 --> 00:27:15.000 within a chunk, it's just in changing this index. Oh, it's in index two, but 00:27:15.000 --> 00:27:16.000 now it's moving to three, or four, or five. 00:27:16.000 --> 00:27:20.000 And then when it gets to the end of that chunk, to the end of that sentence, and you move forward, well then it 00:27:20.000 --> 00:27:23.000 follows the next link forward. 00:27:23.000 --> 00:27:25.000 Similarly moving backwards, right, so 00:27:25.000 --> 00:27:29.000 within a chunk doing array-like manipulations, and then as it crosses the 00:27:29.000 --> 00:27:32.000 boundary using a link list steps to get further down. 00:27:32.000 --> 00:27:35.000 So when it needs to move to the front or the back, those things are easily 00:27:35.000 --> 00:27:37.000 accessible using head and tail pointers. 00:27:37.000 --> 00:27:39.000 When I need to do an edit, 00:27:39.000 --> 00:27:42.000 it operates in this array-like kind of 00:27:42.000 --> 00:27:44.000 way 00:27:44.000 --> 00:27:46.000 when there's space within that chunk. So you start editing in a sentence, then it 00:27:46.000 --> 00:27:49.000 would kind of do some shifting on that array. But the array itself being pretty 00:27:49.000 --> 00:27:51.000 small 00:27:51.000 --> 00:27:53.000 means that although you are paying that cost of shuffling, 00:27:53.000 --> 00:27:56.000 you are doing it on a smaller chunk. It's not 1,000 characters that have to move, it's 00:27:56.000 --> 00:27:59.000 20 or 40 that have to move within that sentence. 00:27:59.000 --> 00:28:02.000 And that when 00:28:02.000 --> 00:28:04.000 you need to insert a new sentence or a new chunk in the middle, you are doing 00:28:04.000 --> 00:28:08.000 link list-like manipulations to build a new chunk and insert and splice, 00:28:08.000 --> 00:28:11.000 so you have that flexibility, where the many characters aren't 00:28:11.000 --> 00:28:14.000 all affected by it. So you have kind of local array editing 00:28:14.000 --> 00:28:16.000 and global link list editing 00:28:16.000 --> 00:28:18.000 that gives you kind of the best of both worlds. 00:28:18.000 --> 00:28:19.000 00:28:19.000 --> 00:28:23.000 But again, the factor is cost showing up in code complexity. 00:28:23.000 --> 00:28:27.000 That what you are looking at is kind of space/time tradeoff. Well, I can throw space at 00:28:27.000 --> 00:28:29.000 this, and most people will say that in computer science, "Well I can throw space and this and 00:28:29.000 --> 00:28:31.000 get better times." So 00:28:31.000 --> 00:28:35.000 the double link list is a good example of something that throws space at a problem to get better 00:28:35.000 --> 00:28:37.000 time. There's a lot of overhead. 00:28:37.000 --> 00:28:39.000 Moving to the chunk list says I want to get back some of that space. I'm 00:28:39.000 --> 00:28:43.000 not willing to invest 90 percent overhead to get this. Can I find a way to 00:28:43.000 --> 00:28:44.000 take a 00:28:44.000 --> 00:28:47.000 whole sentence worth of things and add a few pointers here? That means that 00:28:47.000 --> 00:28:48.000 there's only two 00:28:48.000 --> 00:28:51.000 4 byte pointers per each sentence as opposed to each character, 00:28:51.000 --> 00:28:53.000 and that really way reduces the overhead. 00:28:53.000 --> 00:28:55.000 But then now, 00:28:55.000 --> 00:28:58.000 the problem is still back on your shoulders in terms of effort, 00:28:58.000 --> 00:29:02.000 now I have both an array, and a link list, and double links, and 00:29:02.000 --> 00:29:03.000 a 00:29:03.000 --> 00:29:07.000 lot of complicated stuff flying around to make it work. 00:29:07.000 --> 00:29:08.000 But it does work out. 00:29:08.000 --> 00:29:11.000 It's kind of the process that designers go through when 00:29:11.000 --> 00:29:12.000 they're building these kinds of 00:29:12.000 --> 00:29:16.000 key structures. It like, what are the options, and do any of the basic 00:29:16.000 --> 00:29:18.000 things we know about work? What about 00:29:18.000 --> 00:29:21.000 building even fancier things that kind of take the best of both world and mix 00:29:21.000 --> 00:29:23.000 them together and have a little bit of the weaknesses of this a little of that, but none of 00:29:23.000 --> 00:29:26.000 the really awful 00:29:26.000 --> 00:29:27.000 worst cases 00:29:27.000 --> 00:29:28.000 present in the end result. 00:29:28.000 --> 00:29:31.000 So in this case you'd have a lot of things that were constant time where constant was 00:29:31.000 --> 00:29:33.000 defined by the chunk size. 00:29:33.000 --> 00:29:37.000 So O of 100, let's say, if that was the maximum chunk size 00:29:37.000 --> 00:29:39.000 to do these things, as opposed to O of 00:29:39.000 --> 00:29:42.000 N for the whole block net, 00:29:42.000 --> 00:29:43.000 so kind of a neat 00:29:43.000 --> 00:29:45.000 little thing to think through. 00:29:45.000 --> 00:29:48.000 There was a time, just so you can feel 00:29:48.000 --> 00:29:51.000 gratified about how I've worked you very hard, but actually in the past I've been 00:29:51.000 --> 00:29:53.000 even harsher. 00:29:53.000 --> 00:29:57.000 We actually had them build the doubly linked chunked list editor buffer, 00:29:57.000 --> 00:30:00.000 and now we have you build the singly linked chunk list PQ, and it's 00:30:00.000 --> 00:30:02.000 actually 00:30:02.000 --> 00:30:06.000 still very complicated, as you'll find out, but not nearly quite as hairy as the 00:30:06.000 --> 00:30:09.000 full in both directions every way 00:30:09.000 --> 00:30:11.000 craziness. 00:30:11.000 --> 00:30:14.000 And you can kind of imagine in your mind. 00:30:14.000 --> 00:30:16.000 Any questions about edited buffer? 00:30:16.000 --> 00:30:18.000 We?ve got two more data structures to implement, 00:30:18.000 --> 00:30:20.000 two more things that we've been using, 00:30:20.000 --> 00:30:25.000 and happy to have in our arsenal that we need to know how they work. 00:30:25.000 --> 00:30:29.000 Let's talk about map first, and in 00:30:29.000 --> 00:30:31.000 fact the strategy that I'm going to end up showing you for map is actually going to be 00:30:31.000 --> 00:30:35.000 the same one we use for set, then we'll come back and think of an even more clever 00:30:35.000 --> 00:30:37.000 way to do map, too. So 00:30:37.000 --> 00:30:40.000 we've got two things to talk about, which are binary search trees and hashing. We'll 00:30:40.000 --> 00:30:41.000 do binary search trees first, 00:30:41.000 --> 00:30:44.000 and then we'll get back to the hashing the second time around. 00:30:44.000 --> 00:30:46.000 So 00:30:46.000 --> 00:30:49.000 map is, next to vector, probably the most useful thing you've got going on in 00:30:49.000 --> 00:30:50.000 your class library, 00:30:50.000 --> 00:30:53.000 all kinds of things that do look up based 00:30:53.000 --> 00:30:56.000 activities; looking up students by ID number, 00:30:56.000 --> 00:30:57.000 phone numbers, 00:30:57.000 --> 00:31:00.000 by name, any of these kinds of things where you need to have this key at associated satellite 00:31:00.000 --> 00:31:02.000 information. And 00:31:02.000 --> 00:31:05.000 you'd like to be able to do that retrieval and update quickly. 00:31:05.000 --> 00:31:08.000 The map is the one for you. 00:31:08.000 --> 00:31:11.000 So if you remember it's key value fears, it's all about key being the 00:31:11.000 --> 00:31:15.000 identifying mark that allows you to find the associated value. And then the value 00:31:15.000 --> 00:31:18.000 is actually just being stored and kind of retrieved without using it 00:31:18.000 --> 00:31:21.000 or examining it in any kind of interesting way. 00:31:21.000 --> 00:31:25.000 And what we're really interested in here is how to make this guy work really efficiently. So efficiently 00:31:25.000 --> 00:31:27.000 being hopefully log in or better, 00:31:27.000 --> 00:31:31.000 if we could get that on both the update operations that add and remove things 00:31:31.000 --> 00:31:33.000 from the map, as well as the 00:31:33.000 --> 00:31:34.000 look up operation, then 00:31:34.000 --> 00:31:37.000 that would please probably all our clients immensely, if we could get both of 00:31:37.000 --> 00:31:41.000 those going well. 00:31:41.000 --> 00:31:45.000 So I'm going to build you on that. I probably have enough time to do that. 00:31:45.000 --> 00:31:46.000 That 00:31:46.000 --> 00:31:48.000 attacks it from 00:31:48.000 --> 00:31:53.000 using simple tools to start with, just to kind of get a base line for what we can do. 00:31:53.000 --> 00:31:54.000 00:31:54.000 --> 00:31:56.000 Vector, our old friend vector, 00:31:56.000 --> 00:31:59.000 can't get too much mileage out of vector, apparently, because you can always find a 00:31:59.000 --> 00:32:00.000 way to make it do something 00:32:00.000 --> 00:32:01.000 useful for you. 00:32:01.000 --> 00:32:02.000 It 00:32:02.000 --> 00:32:03.000 gives us the convenience 00:32:03.000 --> 00:32:05.000 00:32:05.000 --> 00:32:07.000 of managing that thing with low overhead, 00:32:07.000 --> 00:32:09.000 direct access is good. 00:32:09.000 --> 00:32:11.000 We make a parastruct 00:32:11.000 --> 00:32:13.000 that has the key and the value together. 00:32:13.000 --> 00:32:16.000 We store it into a vector, 00:32:16.000 --> 00:32:18.000 and then we 00:32:18.000 --> 00:32:21.000 may or may not sort that vector. But if it's sorted, there's probably some good reasons. Think 00:32:21.000 --> 00:32:22.000 about 00:32:22.000 --> 00:32:26.000 what order to sort it in, what information to be using to sort that, 00:32:26.000 --> 00:32:28.000 and then we'll get value to add. We're just going to go 00:32:28.000 --> 00:32:30.000 through doing it, rather than talk about it. 00:32:30.000 --> 00:32:32.000 We'll just go over and 00:32:32.000 --> 00:32:36.000 make a vector happen. 00:32:36.000 --> 00:32:38.000 Let's get over here in X-code. 00:32:38.000 --> 00:32:40.000 Let's take a look at 00:32:40.000 --> 00:32:42.000 my map dot H, 00:32:42.000 --> 00:32:43.000 and all 00:32:43.000 --> 00:32:46.000 I'm going to put in there is actually add and get value. 00:32:46.000 --> 00:32:48.000 And 00:32:48.000 --> 00:32:50.000 the other one contains key and remove." 00:32:50.000 --> 00:32:52.000 00:32:52.000 --> 00:32:55.000 But these are these two operations that really matter to us, getting something in there, and 00:32:55.000 --> 00:32:57.000 getting something back out. 00:32:57.000 --> 00:32:58.000 00:32:58.000 --> 00:33:00.000 And so I'm going to start by 00:33:00.000 --> 00:33:02.000 building this thing on vector. So let me go ahead and 00:33:02.000 --> 00:33:14.000 indicate that I plan to use vector as part of 00:33:14.000 --> 00:33:15.000 what I'm doing. So I defined a little structure 00:33:15.000 --> 00:33:20.000 that's the pair. I'm going to call that "pair" 00:33:20.000 --> 00:33:23.000 even, that's a nice word for it. So the pair of the key and the value that go together, 00:33:23.000 --> 00:33:29.000 and then what I will store here 00:33:29.000 --> 00:33:31.000 is a vector of those pairs. They'll 00:33:31.000 --> 00:33:36.000 give me a key to value; I'll stick them together in that little struct, stick them into my vector, and 00:33:36.000 --> 00:33:38.000 hopefully be able to retrieve them later. 00:33:38.000 --> 00:33:42.000 So given that I'm depending only on vector, I don?t have any other 00:33:42.000 --> 00:33:45.000 information, I don't technically need to go out of my way to disallow then 00:33:45.000 --> 00:33:48.000 memory-wise copying, because vector does a deep copy. But I'm going to leave it in there, because I 00:33:48.000 --> 00:33:52.000 plan on going some place that is eventually going to need it. I might as well be safe from 00:33:52.000 --> 00:33:54.000 the get go. 00:33:54.000 --> 00:33:55.000 The vector 00:33:55.000 --> 00:34:00.000 will be set up and destroyed automatically as part of management of my 00:34:00.000 --> 00:34:04.000 data member. So I actually don't have any explicit allocation/deallocation, so if I look at my 00:34:04.000 --> 00:34:06.000 constructor and destructor they'll be totally empty. 00:34:06.000 --> 00:34:11.000 And then add and get value are where I'm going to actually get to do some work. 00:34:11.000 --> 00:34:14.000 Before I implement them, I'm just going to think for a second about 00:34:14.000 --> 00:34:16.000 how this is going to play out. 00:34:16.000 --> 00:34:20.000 Let's say I'm doing a vector that's storing 00:34:20.000 --> 00:34:26.000 strings and their length. Let's say, it's just a map of 00:34:26.000 --> 00:34:27.000 string, 00:34:27.000 --> 00:34:30.000 00:34:30.000 --> 00:34:34.000 and if I do M dot add 00:34:34.000 --> 00:34:37.000 car. I'm going to store a three with it, 00:34:37.000 --> 00:34:39.000 so on the other side of the wall, 00:34:39.000 --> 00:34:41.000 what we're going to do is make a little struct 00:34:41.000 --> 00:34:43.000 that has the word car 00:34:43.000 --> 00:34:47.000 and the number three with it, and we package it into our ray. 00:34:47.000 --> 00:34:51.000 And so then we get a second call to put something in like 00:34:51.000 --> 00:34:55.000 dog, also with a three, then we'll make a second one of these and stick it in 00:34:55.000 --> 00:34:58.000 our ray. 00:34:58.000 --> 00:35:01.000 And so on, so the idea is to have the vector really kind of manage where the storage is going. And 00:35:01.000 --> 00:35:02.000 then what 00:35:02.000 --> 00:35:05.000 we'll just do is package it up and stick it in there. 00:35:05.000 --> 00:35:08.000 The one thing, though, that I have to think a little bit about, because it sounds 00:35:08.000 --> 00:35:11.000 like add should be pretty easy. It almost seems like what I need to do is something 00:35:11.000 --> 00:35:13.000 as simple as this. And 00:35:13.000 --> 00:35:15.000 I'll start to write the code, and then you can tell me why 00:35:15.000 --> 00:35:17.000 this isn't what I want to do. 00:35:17.000 --> 00:35:20.000 I say pair TP, and I say P dot 00:35:20.000 --> 00:35:23.000 key equals key P dot 00:35:23.000 --> 00:35:30.000 value equals val, and then I say entries dot add P. 00:35:30.000 --> 00:35:32.000 Almost, but not quite what I want, 00:35:32.000 --> 00:35:34.000 what have I forgotten 00:35:34.000 --> 00:35:39.000 to take care of? 00:35:39.000 --> 00:35:42.000 If it's not already in there; 00:35:42.000 --> 00:35:46.000 this has to do with just what's the interface for nap. What does nap say about it if you 00:35:46.000 --> 00:35:48.000 try to add 00:35:48.000 --> 00:35:51.000 for a key a different value. So in the case of this one, it didn't quite make sense that I would add 00:35:51.000 --> 00:35:55.000 something, but let's say I accidentally put car in there with a link to the first time that I went 00:35:55.000 --> 00:35:57.000 to go fix it, right, 00:35:57.000 --> 00:36:01.000 that my subsequent to ask it store car with the proper value, 00:36:01.000 --> 00:36:03.000 three, 00:36:03.000 --> 00:36:05.000 doesn't create a totally new entry. 00:36:05.000 --> 00:36:09.000 It has to find the existing entry that already has car and overwrite it. 00:36:09.000 --> 00:36:13.000 So the behavior of mad was it could add when it was not previously existing. 00:36:13.000 --> 00:36:16.000 It also could be an overwrite of an existing value. So we do need 00:36:16.000 --> 00:36:17.000 to find 00:36:17.000 --> 00:36:20.000 if there is an entry that already has that key, and just 00:36:20.000 --> 00:36:21.000 replace its value 00:36:21.000 --> 00:36:23.000 rather than 00:36:23.000 --> 00:36:26.000 add a second entry with the same key. So at any given point, right, in this whole 00:36:26.000 --> 00:36:28.000 vector, there should be one entry 00:36:28.000 --> 00:36:30.000 for each unique key. 00:36:30.000 --> 00:36:33.000 So I have to do a search. And 00:36:33.000 --> 00:36:35.000 let me 00:36:35.000 --> 00:36:38.000 write this code, and then I'm going to mention why I'm going to decompose it in a 00:36:38.000 --> 00:36:39.000 second. 00:36:39.000 --> 00:36:42.000 If I do entries dot size, I plus plus 00:36:42.000 --> 00:36:44.000 then if 00:36:44.000 --> 00:36:49.000 entries of I dot key equals key. 00:36:49.000 --> 00:36:51.000 Then 00:36:51.000 --> 00:36:52.000 let's break. Let me pull out 00:36:52.000 --> 00:37:00.000 I so I can get access to it when I'm done. So after this 00:37:00.000 --> 00:37:01.000 loop exits, 00:37:01.000 --> 00:37:06.000 if I is less than entries dot size, 00:37:06.000 --> 00:37:06.000 then 00:37:06.000 --> 00:37:09.000 I know that it found a match. 00:37:09.000 --> 00:37:12.000 In the case where it went through all of them, and it never hit the break. Then I will 00:37:12.000 --> 00:37:15.000 have gone all the way to where it is exactly equal to entries dot size; if it didn't then I 00:37:15.000 --> 00:37:16.000 just 00:37:16.000 --> 00:37:19.000 have a place to overwrite. And I can say entries of 00:37:19.000 --> 00:37:21.000 I dot val 00:37:21.000 --> 00:37:23.000 equals val. 00:37:23.000 --> 00:37:27.000 Otherwise I go through this process of adding a new one. I had to 00:37:27.000 --> 00:37:30.000 go hunt that thing down 00:37:30.000 --> 00:37:33.000 and replace it. 00:37:33.000 --> 00:37:35.000 That code, 00:37:35.000 --> 00:37:38.000 when you look at it, you can say, "Gosh I feel like that code's going to be familiar." 00:37:38.000 --> 00:37:43.000 Because that idea of searching to find a match to the key 00:37:43.000 --> 00:37:45.000 probably is going to come in really handy 00:37:45.000 --> 00:37:46.000 when I need to do get value, 00:37:46.000 --> 00:37:50.000 but it has the same exact problem of oh, I've got to go find that match. So in fact, 00:37:50.000 --> 00:37:55.000 that kind of motivates the idea of why don't I just take this little piece of code 00:37:55.000 --> 00:37:59.000 and separate it out and do a helper that they can both use. I 00:37:59.000 --> 00:38:02.000 may not have planned for this from the beginning, but once I see it happening, I might as well fix 00:38:02.000 --> 00:38:05.000 it. So I can say find index 00:38:05.000 --> 00:38:07.000 for key that 00:38:07.000 --> 00:38:10.000 given a key will return to you 00:38:10.000 --> 00:38:11.000 00:38:11.000 --> 00:38:15.000 the vector index at which 00:38:15.000 --> 00:38:19.000 that key is or a negative one. I don't know if I need to keep that 00:38:19.000 --> 00:38:22.000 one anymore. 00:38:22.000 --> 00:38:24.000 If it ever found it, it returns I otherwise 00:38:24.000 --> 00:38:28.000 it returned negative one. That can be our not found. 00:38:28.000 --> 00:38:37.000 I can actually use that up here, 00:38:37.000 --> 00:38:40.000 and then I can say, if found 00:38:40.000 --> 00:38:43.000 does not equal negative one. 00:38:43.000 --> 00:38:44.000 Then we do that, 00:38:44.000 --> 00:38:46.000 and then that 00:38:46.000 --> 00:38:48.000 tells us that 00:38:48.000 --> 00:38:51.000 this little piece of code is going to come in handy right up here, 00:38:51.000 --> 00:38:55.000 get value if found does not equal 00:38:55.000 --> 00:38:56.000 negative one, then we can return that 00:38:56.000 --> 00:39:02.000 and what is the behavior of get value if it didn't find it? Does anybody remember? 00:39:02.000 --> 00:39:06.000 I ask it to get the value in this case for lollipop and it's not 00:39:06.000 --> 00:39:07.000 there, 00:39:07.000 --> 00:39:13.000 does it turn zero, what does it 00:39:13.000 --> 00:39:22.000 do? Untracked forget value, remember? 00:39:22.000 --> 00:39:25.000 This is an error. 00:39:25.000 --> 00:39:26.000 There is not sort of a 00:39:26.000 --> 00:39:30.000 general case return value you can produce here 00:39:30.000 --> 00:39:33.000 that will really sort of sufficiently describe to someone who's made 00:39:33.000 --> 00:39:37.000 this call in error what happened, right? I can't return zero, because it happens to be 00:39:37.000 --> 00:39:40.000 that sometimes I?m putting in, and sometimes I'm putting strings or structs 00:39:40.000 --> 00:39:44.000 kinds of things. There's no general zero or negative one to use, 00:39:44.000 --> 00:39:47.000 so the behavior of get value is if you ask it to retrieve a value for a key, it ought to be 00:39:47.000 --> 00:39:48.000 there. So 00:39:48.000 --> 00:39:52.000 the corresponding implementation would likely also just go through 00:39:52.000 --> 00:39:56.000 the process of contains key, which we call find index for key and check. 00:39:56.000 --> 00:39:58.000 It could be used by the client 00:39:58.000 --> 00:40:01.000 if they need to know in advance that it's really there. 00:40:01.000 --> 00:40:04.000 So let's see how I'm 00:40:04.000 --> 00:40:06.000 doing. Let's go back to my 00:40:06.000 --> 00:40:11.000 code here, 00:40:11.000 --> 00:40:13.000 and this 00:40:13.000 --> 00:40:15.000 being 00:40:15.000 --> 00:40:16.000 not my specialty, 00:40:16.000 --> 00:40:20.000 and just see that I can put something in and 00:40:20.000 --> 00:40:25.000 get it back out would be a good first test to see that things are 00:40:25.000 --> 00:40:29.000 going as they should. Oh, no it didn't like that. Oh yeah, find 00:40:29.000 --> 00:40:30.000 index for key. 00:40:30.000 --> 00:40:32.000 Find index for key 00:40:32.000 --> 00:40:34.000 was not 00:40:34.000 --> 00:40:38.000 populated clear in my dot H, so let me go move it there. I'll put 00:40:38.000 --> 00:40:42.000 it in the private section, because I don?t want this to be something that is 00:40:42.000 --> 00:40:45.000 exposed outside of the implementation. The idea that there is an index and 00:40:45.000 --> 00:40:48.000 there is a vector is not really something that I would want to make part of my 00:40:48.000 --> 00:40:54.000 public interface. It's really just an internal detail of how I'm managing stuff. 00:40:54.000 --> 00:40:56.000 Something about what I did, oh yeah, I 00:40:56.000 --> 00:41:00.000 used the right name for that. 00:41:00.000 --> 00:41:05.000 One more case let 00:41:05.000 --> 00:41:06.000 me just take a look at them both 00:41:06.000 --> 00:41:09.000 before I let the compiler tell me what is and what isn't right about them. Their both 00:41:09.000 --> 00:41:13.000 doing find index for key using linear search. If it comes back at not negative one 00:41:13.000 --> 00:41:17.000 pulling out the matching value and overriding it, and then in the case of adding 00:41:17.000 --> 00:41:23.000 or replacing the error - 00:41:23.000 --> 00:41:24.000 so the 00:41:24.000 --> 00:41:26.000 previously stored value for that is three, 00:41:26.000 --> 00:41:29.000 and if I go and I do one of those tests it's always good to know what 00:41:29.000 --> 00:41:34.000 happens if I ask it for something that doesn't exist. 00:41:34.000 --> 00:41:37.000 Making sure that the error handling that I put in there does 00:41:37.000 --> 00:41:43.000 what it was supposed to do, right? Are 00:41:43.000 --> 00:41:46.000 there any questions about the code that I wrote here? 00:41:46.000 --> 00:41:49.000 The vector is always kind of a good starting place for these things, because actually it just 00:41:49.000 --> 00:41:53.000 tends to use very simple things, and it tends to lend itself to easy code. 00:41:53.000 --> 00:41:57.000 But because of vector constraints being at a contiguous back structure, right, there's likely to 00:41:57.000 --> 00:41:59.000 be some 00:41:59.000 --> 00:42:01.000 performance implication. And in 00:42:01.000 --> 00:42:02.000 particular, 00:42:02.000 --> 00:42:04.000 in the unsorted case here, 00:42:04.000 --> 00:42:07.000 both add and get value 00:42:07.000 --> 00:42:09.000 involve doing a linear search. 00:42:09.000 --> 00:42:12.000 In some cases it will find it quickly at the very beginning or even half way 00:42:12.000 --> 00:42:13.000 through. 00:42:13.000 --> 00:42:17.000 In other cases it won't find it at all and have to search through the whole thing. So on 00:42:17.000 --> 00:42:20.000 average, if you figure it's equally likely to be in any position or not there at all, 00:42:20.000 --> 00:42:23.000 it's at going to at least have to look at half of them or more. 00:42:23.000 --> 00:42:25.000 And so we can say they're both linear. 00:42:25.000 --> 00:42:28.000 If we change it to be sorted, 00:42:28.000 --> 00:42:31.000 which is the first obvious improvement to make here, 00:42:31.000 --> 00:42:34.000 then get value 00:42:34.000 --> 00:42:37.000 gets a lot faster, because we can take advantage of binary search. 00:42:37.000 --> 00:42:39.000 I sort it by key, 00:42:39.000 --> 00:42:43.000 so ignoring the value, values are just satellite data. But stored in order of 00:42:43.000 --> 00:42:44.000 key, 00:42:44.000 --> 00:42:48.000 then all the A words, B words, C words, D words, what not, so 00:42:48.000 --> 00:42:51.000 walking it down the middle and seeing which half you look at, then looking at the 00:42:51.000 --> 00:42:53.000 quarter of there and the eight of there. It's going to very quickly narrow down on 00:42:53.000 --> 00:42:57.000 where it had to be if it was there or if it wasn't there at all. 00:42:57.000 --> 00:43:00.000 So we can implement that in logarithmic time, meaning that if you 1,000 00:43:00.000 --> 00:43:04.000 entries, it takes ten comparisons to find it or agree that it's not there. 00:43:04.000 --> 00:43:08.000 If you double that it will take one more comparison. If 00:43:08.000 --> 00:43:12.000 you go to 2,000, negligibly faster, even numbers in the millions 00:43:12.000 --> 00:43:15.000 are still just a handful 00:43:15.000 --> 00:43:19.000 of comparisons to work it out, and say, 00:43:19.000 --> 00:43:22.000 to the twentieth for example is roughly a million; so 20 comparisons in or out, 00:43:22.000 --> 00:43:25.000 and so it didn't look at a million things a million times over. 00:43:25.000 --> 00:43:27.000 However, 00:43:27.000 --> 00:43:30.000 that didn't improve add, 00:43:30.000 --> 00:43:31.000 why not? 00:43:31.000 --> 00:43:33.000 Why does keeping an assorted order 00:43:33.000 --> 00:43:37.000 not have the same boost for add as it did for get value? 00:43:37.000 --> 00:43:46.000 Can I do the same logarithmic search? You have to 00:43:46.000 --> 00:43:50.000 move all the elements ? 00:43:50.000 --> 00:43:51.000 Yeah, 00:43:51.000 --> 00:43:55.000 you can find the place fast, right, you can say returning the word 00:43:55.000 --> 00:43:59.000 apple it's like oh, okay. I can very quickly narrow in on where it needed to be. If it 00:43:59.000 --> 00:44:02.000 was there I could override it quickly, but in the case where it really needed to do 00:44:02.000 --> 00:44:03.000 an add, 00:44:03.000 --> 00:44:04.000 it's got to move them down. 00:44:04.000 --> 00:44:05.000 So the old add 00:44:05.000 --> 00:44:09.000 did all its work in the search and then just tacked it on the end if it didn't find 00:44:09.000 --> 00:44:12.000 it, or overrode the middle. But the new add 00:44:12.000 --> 00:44:16.000 also has to modify the array to put it in the right place, and putting it in the right place means 00:44:16.000 --> 00:44:18.000 shuffling to make space. 00:44:18.000 --> 00:44:20.000 So even though it found it quickly, 00:44:20.000 --> 00:44:25.000 it was unable to update that quickly, so it still ends up being 00:44:25.000 --> 00:44:26.000 linear in that case. 00:44:26.000 --> 00:44:29.000 Now this actually not a terrible result, 00:44:29.000 --> 00:44:30.000 as it stands 00:44:30.000 --> 00:44:33.000 it's probably much more likely that you're going to load up the data 00:44:33.000 --> 00:44:36.000 structure once and then do a lot 00:44:36.000 --> 00:44:40.000 of searches on it, then infrequent additions. That's a pretty common usage 00:44:40.000 --> 00:44:41.000 pattern for a map. It's kind of like you 00:44:41.000 --> 00:44:43.000 build a dictionary, right, 00:44:43.000 --> 00:44:46.000 loading the dictionary once could be expensive, but then people get tons and tons of 00:44:46.000 --> 00:44:49.000 look ups of words later. But you don't change the definitions a lot later or add a 00:44:49.000 --> 00:44:51.000 lot of new words. 00:44:51.000 --> 00:44:51.000 So 00:44:51.000 --> 00:44:56.000 this might actually be quite tolerable in many situations 00:44:56.000 --> 00:44:59.000 that the building of it was expensive, but it gave 00:44:59.000 --> 00:45:00.000 you 00:45:00.000 --> 00:45:02.000 fast search access. 00:45:02.000 --> 00:45:05.000 But really what you want to do is say, can we get both of those? So driving to say 00:45:05.000 --> 00:45:08.000 what we do with the editor buffer, you say, "Well, what if I just want both of those operations 00:45:08.000 --> 00:45:11.000 to be logarithmic or better, what can I do?" So 00:45:11.000 --> 00:45:12.000 I'll leave you with kind of just 00:45:12.000 --> 00:45:16.000 a little bit of a taste of where we're heading. 00:45:16.000 --> 00:45:19.000 Just to think a little bit about what the link 00:45:19.000 --> 00:45:21.000 list does and doesn't buy us, 00:45:21.000 --> 00:45:24.000 right, that the link list gave us flexibility 00:45:24.000 --> 00:45:29.000 at the editing, the local editing site, but it doesn't give us fast searching. 00:45:29.000 --> 00:45:31.000 Well maybe we can take this link list and try to 00:45:31.000 --> 00:45:33.000 add searching 00:45:33.000 --> 00:45:37.000 to the idea of this flexibility to kind of get the best of both 00:45:37.000 --> 00:45:41.000 worlds. So that's what we're going to start on Monday, building a tree. So if 00:45:41.000 --> 00:45:44.000 you have time today, I'd love to have you come to Truman, and otherwise have a good 00:45:44.000 --> 00:45:45.000 weekend, and 00:45:45.000 --> 00:45:50.000 get your PQ work in.