0:00:00.000,0:00:08.000 0:00:08.000,0:00:11.000 0:00:11.000,0:00:22.000 This presentation is delivered by the Stanford Center for Professional Development. 0:00:22.000,0:00:25.000 Okay, so thank you for coming out in the rain. 0:00:25.000,0:00:26.000 I 0:00:26.000,0:00:27.000 appreciate you 0:00:27.000,0:00:29.000 trudging over here and 0:00:29.000,0:00:30.000 risking 0:00:30.000,0:00:31.000 your health and life. 0:00:31.000,0:00:37.000 So I'm gonna do a little diversion here to explain this idea of functions as data before I come back 0:00:37.000,0:00:40.000 to fixing the problem we had kind of hit upon at the very end of the last lecture 0:00:40.000,0:00:42.000 about set. 0:00:42.000,0:00:44.000 So what I'm just gonna show you is 0:00:44.000,0:00:45.000 a little bit about 0:00:45.000,0:00:46.000 0:00:46.000,0:00:49.000 the design of something in C++ 0:00:49.000,0:00:52.000 that is gonna helpful in solving the problem we'd run into. 0:00:52.000,0:00:55.000 What I'm gonna show you here is two pieces of code that plot 0:00:55.000,0:00:57.000 a 0:00:57.000,0:01:01.000 single value function across the number line. So in this case, the function is that it's the 0:01:01.000,0:01:05.000 top one is applying the sine function, the sine wave as it varies across, and then the 0:01:05.000,0:01:06.000 square root function 0:01:06.000,0:01:09.000 which goes up and has a sloping curve to it, 0:01:09.000,0:01:13.000 and that both of these have exactly the same behaviors. They're designed to kind of go into the graphics 0:01:13.000,0:01:17.000 window, and they're just using a kind of a very simple kind of hand moving strategy. It 0:01:17.000,0:01:21.000 starts on a particular location based on what the value of sine is here, 0:01:21.000,0:01:23.000 and then over a particular interval 0:01:23.000,0:01:24.000 at 0.1 0:01:24.000,0:01:28.000 of one tenth of an inch, it plots the next point and connects a line between 0:01:28.000,0:01:31.000 them. So it does a simple kind of line approximation of what 0:01:31.000,0:01:34.000 that function looks like over the interval you asked it to plot. 0:01:34.000,0:01:37.000 The same code is being used here for plotting the square root. 0:01:37.000,0:01:41.000 And the thing to note and I tried to highlight by putting it in blue here was that every other 0:01:41.000,0:01:45.000 part of the code is exactly the same except for those two calls where I need to know what's 0:01:45.000,0:01:48.000 the value of square root starting at this particular 0:01:48.000,0:01:51.000 X. So starting at Value 1, 0:01:51.000,0:01:53.000 what is sine of one? What is square root of one? As I 0:01:53.000,0:01:57.000 work my way across the interval, what's a square root of 1.1, 1.2, 1.3 and 0:01:57.000,0:01:59.000 sine of those same values? 0:01:59.000,0:02:03.000 And so everything about the code is functionally identical. It's kind of frustrating 0:02:03.000,0:02:07.000 to look at it, though, and realize if I wanted to plot a bunch of other things -I also wanna be able to plot 0:02:07.000,0:02:09.000 the cosine function, or 0:02:09.000,0:02:12.000 some other function of my own creation 0:02:12.000,0:02:13.000 -across this interval 0:02:13.000,0:02:17.000 that I keep having to copy and paste this code and duplicate it. 0:02:17.000,0:02:21.000 And so one of the things that hopefully your 106A and prior experiences really 0:02:21.000,0:02:24.000 heightened your attention to is 0:02:24.000,0:02:26.000 if I have the same piece of code 0:02:26.000,0:02:30.000 in multiple places, I ought to be able to unify it. I ought to be able to make there be a plot 0:02:30.000,0:02:31.000 function 0:02:31.000,0:02:36.000 that can handle both sine and square root without actually having to distinguish it by copy and pasting, 0:02:36.000,0:02:38.000 so I wanna unify these two. 0:02:38.000,0:02:43.000 And so there is -the mechanism that we're gonna use kinda follows naturally if you don't let yourself get 0:02:43.000,0:02:45.000 too tripped up by what it means. 0:02:45.000,0:02:48.000 Just imagine that the parameter for example going into the function right now are the start and the stop, 0:02:48.000,0:02:49.000 the interval 0:02:49.000,0:02:52.000 from X is one to five. 0:02:52.000,0:02:55.000 What we'd like to do is further parameterize the function. 0:02:55.000,0:02:58.000 We'd like to add a third argument to it, which is to say 0:02:58.000,0:03:03.000 and when you're ready to know what function you're plotting, here's the one to use. 0:03:03.000,0:03:04.000 I'd like you to plot 0:03:04.000,0:03:08.000 sine over the interval start to stop. I'd like you to plot square root over that. So we added the third argument, 0:03:08.000,0:03:11.000 which was the function we wanted to invoke there. 0:03:11.000,0:03:16.000 Then we would be able to unify this down to where we had one generic plot function. 0:03:16.000,0:03:19.000 The good news is that this does actually 0:03:19.000,0:03:20.000 have 0:03:20.000,0:03:23.000 support features in the C++ language that let us do this. 0:03:23.000,0:03:26.000 The syntax for it is a little bit unusual, and 0:03:26.000,0:03:29.000 if you think about it too much, it can actually make your head hurt a little bit, think about what 0:03:29.000,0:03:33.000 it's doing. But you can actually use a function, a function's name 0:03:33.000,0:03:35.000 and the code that is associated with that name, 0:03:35.000,0:03:37.000 as a piece of data 0:03:37.000,0:03:41.000 that not just -you think of the code as we're calling this function, and 0:03:41.000,0:03:44.000 we're moving these things around, and executing things. 0:03:44.000,0:03:47.000 The things that we tend to be executing and operating on you think of as being integers, and strings, 0:03:47.000,0:03:49.000 and characters, and files. 0:03:49.000,0:03:52.000 But you can also extend your notion of what's data 0:03:52.000,0:03:54.000 to include the code you wrote 0:03:54.000,0:03:56.000 as part of the possibilities. 0:03:56.000,0:03:59.000 So in this case, I've added that third argument that I wanted to plot, 0:03:59.000,0:04:02.000 and this 0:04:02.000,0:04:06.000 syntax here that's a little bit unusual to you, and I'll kind of identify it, is that the 0:04:06.000,0:04:09.000 name of the parameter is actually FN. 0:04:09.000,0:04:12.000 Its name is enclosed in parentheses there. 0:04:12.000,0:04:16.000 And then to the right would be a list of the arguments 0:04:16.000,0:04:17.000 or the prototype 0:04:17.000,0:04:21.000 information about this function, and on the left is its return value. 0:04:21.000,0:04:24.000 So what this says is you have two doubles, the start and stop, 0:04:24.000,0:04:26.000 and the third thing in there isn't a double at all. It is a 0:04:26.000,0:04:27.000 function 0:04:27.000,0:04:32.000 of one double that returns a double, so a single value function that operates on doubles 0:04:32.000,0:04:33.000 here. 0:04:33.000,0:04:36.000 That is the syntax in C++ for specifying what you want coming in here 0:04:36.000,0:04:37.000 is 0:04:37.000,0:04:41.000 not a double, not a ray of doubles, anything funky like that. It is a function of a 0:04:41.000,0:04:43.000 double that returns a double. 0:04:43.000,0:04:45.000 And then in the body of this code, 0:04:45.000,0:04:47.000 when I say FN here 0:04:47.000,0:04:50.000 where I would've said sine, or square root, or identified 0:04:50.000,0:04:51.000 a particular function, 0:04:51.000,0:04:53.000 it's using the parameter 0:04:53.000,0:04:57.000 that was passed in by the client, so it says call the client's function 0:04:57.000,0:04:58.000 passing these values 0:04:58.000,0:05:01.000 to plot over the range 0:05:01.000,0:05:03.000 using their function 0:05:03.000,0:05:07.000 in. So the idea is that valid calls to plot now become things like plot -and then give it an 0:05:07.000,0:05:09.000 interval, zero to two, 0:05:09.000,0:05:11.000 and you give the name of a function: 0:05:11.000,0:05:15.000 the sine function, which comes out of the math library, the square root function, 0:05:15.000,0:05:16.000 also in the math library. 0:05:16.000,0:05:19.000 It could be that the function is something you wrote yourself, 0:05:19.000,0:05:21.000 the my function. 0:05:21.000,0:05:25.000 In order for this to be valid though, you can't just put any old function name there. It is 0:05:25.000,0:05:27.000 actually being quite specific about it 0:05:27.000,0:05:30.000 that plot was to find [inaudible]. It took a double, returned a double. That's the kind of function 0:05:30.000,0:05:33.000 that you can give it, so any function that has that 0:05:33.000,0:05:36.000 prototype, so it matches that 0:05:36.000,0:05:40.000 format is an acceptable one to pass in. If you try to pass something that actually 0:05:40.000,0:05:42.000 just does some other kind of function, 0:05:42.000,0:05:45.000 it doesn't have the same prototype, so the get line that takes no arguments and returns a string just 0:05:45.000,0:05:47.000 doesn't match on any front. 0:05:47.000,0:05:51.000 And if I try to say here, plot the get line function of the integral two to five, 0:05:51.000,0:05:53.000 it will quite rightfully complain to me 0:05:53.000,0:05:54.000 0:05:54.000,0:05:56.000 that that just doesn't make sense. 0:05:56.000,0:05:57.000 That's because it doesn't match. 0:05:57.000,0:06:02.000 So a little bit of syntax here, but actually kind of a very powerful thing, and it allows us to write to 0:06:02.000,0:06:05.000 an addition to kind of parameterizing on these things you think of as traditional 0:06:05.000,0:06:07.000 data, integers, and strings, and whatnot. It's 0:06:07.000,0:06:12.000 also say as part of your operations, you may need to make a call out to some other function. 0:06:12.000,0:06:15.000 Let's leave it open what that function is and allow the client to specify what function 0:06:15.000,0:06:20.000 to call at that time. So in this case for a plot, what function that you're trying to plot, let the client 0:06:20.000,0:06:21.000 tell you, 0:06:21.000,0:06:29.000 and then based on what they ask you to plot you can plot different things. All right. Way in the back. Is 0:06:29.000,0:06:34.000 there a similar setup for multivariable functions [inaudible]? Certainly. So all I would 0:06:34.000,0:06:37.000 need to do if this was a function that took a couple arguments, I would say double comma double comma int. If it 0:06:37.000,0:06:42.000 returned void, [inaudible] returned. Sometimes it looks a little bit like the prototype kinda taken out of context 0:06:42.000,0:06:43.000 and stuffed in there, 0:06:43.000,0:06:46.000 and then those parens around the function are a very important part of that 0:06:46.000,0:06:50.000 which is telling you yeah, this is a function of these 0:06:50.000,0:06:52.000 with this prototype information. 0:06:52.000,0:06:54.000 Behind you. No? You're good now. Somebody else? Over here? Is that FN a fixed 0:06:54.000,0:06:56.000 name 0:06:56.000,0:06:57.000 0:06:57.000,0:07:01.000 [inaudible]? No. It's just like any parameter name. You get to pick it. So I could've called it plot function, my function, 0:07:01.000,0:07:08.000 your function, whatever I wanted. Here 0:07:08.000,0:07:12.000 in the front. [Inaudible] Java? So Java doesn't really have a similar mechanism that looks like this. C does, so C++ inherits it 0:07:12.000,0:07:14.000 from 0:07:14.000,0:07:18.000 C. There are other ways you try to accomplish this in Java. It tries to support the same functionality in the end, but it 0:07:18.000,0:07:19.000 0:07:19.000,0:07:22.000 uses a pretty different approach than a functions as 0:07:22.000,0:07:27.000 data approach. [Inaudible]? 0:07:27.000,0:07:28.000 Can you pass like a 0:07:28.000,0:07:31.000 method, like an operator? So typically not. 0:07:31.000,0:07:35.000 This syntax that's being used here is for a free function, a function 0:07:35.000,0:07:38.000 that's kind of out in the global namespace, that level. 0:07:38.000,0:07:42.000 There is a different syntax for passing a method, which is a little bit more messy, 0:07:42.000,0:07:46.000 and we won't tend to need it, so I won't go there with you, but it does exist. It just as it stands 0:07:46.000,0:07:50.000 does not want a method. It wants a function. So 0:07:50.000,0:07:54.000 a method meaning member function of a class. 0:07:54.000,0:07:56.000 Okay, so let me -that was kind of just to 0:07:56.000,0:08:00.000 set the groundwork for the problem we were trying to solve in set, 0:08:00.000,0:08:05.000 which was set is holding this collection of elements that the client has stuffed in there. It's 0:08:05.000,0:08:05.000 a generic 0:08:05.000,0:08:07.000 templative class, so it doesn't 0:08:07.000,0:08:11.000 have any preconceived notion about what's being stored. Are the strings? Are they student structures? 0:08:11.000,0:08:13.000 Are they integers? 0:08:13.000,0:08:17.000 And that in order to perform its operations efficiently, it actually is using this 0:08:17.000,0:08:22.000 notion of ordering, keeping them in an order so that it can iterate an order. It can quickly find on the basis 0:08:22.000,0:08:27.000 of using order to quickly decide where something has to be and if it's present in the collection. 0:08:27.000,0:08:29.000 So how does it know how to compare 0:08:29.000,0:08:31.000 something that's of unknown type? 0:08:31.000,0:08:36.000 Well, what it does is it has a default strategy. It makes an assumption that 0:08:36.000,0:08:39.000 if I used equals equals and less than that would tell me kinda where to go. 0:08:39.000,0:08:42.000 And so it's got this idea that it wants to know. We'll give it two things. 0:08:42.000,0:08:47.000 Are they the same thing? In which case, it uses kinda the zero to show the ordering between them. If one 0:08:47.000,0:08:51.000 precedes the other, it wants to have some negative number that says well this one precedes it, or some 0:08:51.000,0:08:53.000 positive number 0:08:53.000,0:08:54.000 if it follows it. 0:08:54.000,0:08:57.000 So it applies this operation to the [inaudible] that 0:08:57.000,0:09:02.000 are there, and for strings, and ints, and doubles, and characters, 0:09:02.000,0:09:04.000 this works perfectly well. 0:09:04.000,0:09:08.000 And so that's how without us going out of our way, we can have sets of things 0:09:08.000,0:09:11.000 that respond to the built in relational operators 0:09:11.000,0:09:14.000 without any special effort as a client. 0:09:14.000,0:09:17.000 But what we can get into trouble with right is when 0:09:17.000,0:09:20.000 equals equals less than don't make sense for type. 0:09:20.000,0:09:24.000 So let me just go actually type some code, and I'll show you. I have it on the slide, but I'm gonna -wow, 0:09:24.000,0:09:29.000 this chair suddenly got really short. [Inaudible] fix that. Okay. 0:09:29.000,0:09:30.000 0:09:30.000,0:09:33.000 We'll go over here because I think it's better just to see it really happening, 0:09:33.000,0:09:36.000 so I'm gonna ignore this piece of code because it's not what I wanted. 0:09:36.000,0:09:40.000 But if I make some student T structure, 0:09:40.000,0:09:40.000 and it's got 0:09:40.000,0:09:44.000 the first and last name, and it's got the ID number, 0:09:44.000,0:09:48.000 and maybe that's all I want for now 0:09:48.000,0:09:48.000 -that if 0:09:48.000,0:09:51.000 down in my main -can't 0:09:51.000,0:09:54.000 find my main. There it is. I'm 0:09:54.000,0:09:56.000 gonna need that piece of code later, so I'm gonna leave it there. 0:09:56.000,0:09:58.000 If I make a set, 0:09:58.000,0:10:05.000 and I say I'd like a set of students -my class 0:10:05.000,0:10:09.000 -and I do this. And so I feel like I haven't gotten [inaudible]. I made this structure. I 0:10:09.000,0:10:14.000 say I'd like to make a set of students, that each student is in the class exactly once and I don't need any duplicates, 0:10:14.000,0:10:18.000 and I go through the process of trying to compile this, it's gonna give me some complaints. 0:10:18.000,0:10:21.000 And its complaint, which is a little bit hard to see up here, is 0:10:21.000,0:10:26.000 there's no match for operator equals equals in one, equals equals two, and operator less than in 0:10:26.000,0:10:27.000 one equals 0:10:27.000,0:10:30.000 two. So it's kind of showing me another piece of code that's kind of a 0:10:30.000,0:10:35.000 hidden piece of code I haven't actually seen directly, which is this operator compare function. 0:10:35.000,0:10:37.000 And that is the one that the set is using, as 0:10:37.000,0:10:40.000 it has this idea of what's the way it should compare two things. It says well I have some of this 0:10:40.000,0:10:46.000 operator compare that works generically on any type of things using equals equals and less than. And 0:10:46.000,0:10:50.000 if I click up here on the instantiate from here, it's gonna help me to understand what caused 0:10:50.000,0:10:52.000 this problem. 0:10:52.000,0:10:53.000 The problem was caused by 0:10:53.000,0:10:57.000 trying to create a set who was holding student T. 0:10:57.000,0:11:00.000 And so this gives you a little bit of an insight on how the template 0:11:00.000,0:11:02.000 operations are handled by the compiler, that 0:11:02.000,0:11:06.000 I have built this whole set class, and it depends on there being equals equals and less than 0:11:06.000,0:11:08.000 working for the type. 0:11:08.000,0:11:13.000 The fact that it didn't work for student T wasn't a cause for alarm until 0:11:13.000,0:11:15.000 I actually tried to instantiate 0:11:15.000,0:11:18.000 it. So at the point where I said I'd like to make a set holding student 0:11:18.000,0:11:22.000 T, the first point where the compiler actually goes through the process of generating a whole set 0:11:22.000,0:11:25.000 class, the set angle brackets student T, 0:11:25.000,0:11:29.000 filling in all the details, kinda working it all out, making the add, and contains, and whatever 0:11:29.000,0:11:32.000 operations, and then in the process of those, those 0:11:32.000,0:11:36.000 things are making calls that try to take student T objects and compare them using less than and 0:11:36.000,0:11:37.000 equals equals. 0:11:37.000,0:11:39.000 And that causes that code to fail. 0:11:39.000,0:11:42.000 So the code that's really failing is kind of somewhere in the class library, 0:11:42.000,0:11:44.000 but it's failing because 0:11:44.000,0:11:49.000 the things that we're passing through and instantiating for don't work with that 0:11:49.000,0:11:50.000 setup. 0:11:50.000,0:11:52.000 So it's something we've gotta 0:11:52.000,0:11:54.000 fix, we've gotta do something about. 0:11:54.000,0:11:59.000 Let me go back here and say what am I gonna do. Well, so 0:11:59.000,0:12:01.000 there's my error message. Same one, 0:12:01.000,0:12:04.000 right? Saying yeah, you can't do that 0:12:04.000,0:12:05.000 with a [inaudible]. Well, what 0:12:05.000,0:12:06.000 we do 0:12:06.000,0:12:09.000 is we use this notion of functions as data 0:12:09.000,0:12:12.000 to work out a solution for this problem 0:12:12.000,0:12:15.000 that if you think about kinda what's going on, the set actually knows everything about how 0:12:15.000,0:12:19.000 to store things. It's a very fancy efficient structure that says given your things, keeps them 0:12:19.000,0:12:20.000 in order, and 0:12:20.000,0:12:22.000 it manages to update, and 0:12:22.000,0:12:24.000 insert, and search that thing very efficiently. 0:12:24.000,0:12:28.000 But it doesn't know given any two random things how to compare them other than this assumption 0:12:28.000,0:12:32.000 it was making about less than and equals equals being a way to tell. 0:12:32.000,0:12:37.000 If it wants to have sort of a more sophisticated handling of that, what it needs to do is cooperate 0:12:37.000,0:12:38.000 with the client -that 0:12:38.000,0:12:43.000 the implementer of the set can't do it all. So there's these two programmers that 0:12:43.000,0:12:44.000 need to work in harmony. 0:12:44.000,0:12:50.000 So what the set does is it allows for the client to specify by providing a function. 0:12:50.000,0:12:54.000 It says well when I need to compare two things, how about you give me the name of a function 0:12:54.000,0:12:58.000 that when given two elements will return to me their ordering, this integer zero, 0:12:58.000,0:13:00.000 negative, positive that tells me how to put them in place. 0:13:00.000,0:13:05.000 And so the set kind of writes all of its operations in terms of well there's some function I can call, this function that 0:13:05.000,0:13:07.000 will compare two things. 0:13:07.000,0:13:11.000 If they don't specify one, I'll use this default one that maps them to the relationals, but if they do give 0:13:11.000,0:13:15.000 me one, I'll just ask them to do the comparison. And so then as its doing its searching and inserting 0:13:15.000,0:13:16.000 and whatnot, 0:13:16.000,0:13:19.000 it's calling back. We call that calling back to the client. 0:13:19.000,0:13:22.000 So the client writes a function. If I wanna put 0:13:22.000,0:13:25.000 student Ts into a set, 0:13:25.000,0:13:28.000 then I need to say when you compare two student Ts, what do you look at to know if they're 0:13:28.000,0:13:29.000 the same or how to order them. 0:13:29.000,0:13:31.000 So maybe I'm gonna order them by ID number. 0:13:31.000,0:13:33.000 Maybe I'm gonna use their first and last name. 0:13:33.000,0:13:37.000 Whatever it means for two things to be equal and have some sense of order, 0:13:37.000,0:13:38.000 I supply 0:13:38.000,0:13:40.000 -I write the function, 0:13:40.000,0:13:43.000 and then I pass it to the set constructor. 0:13:43.000,0:13:44.000 I say here's the function to use. 0:13:44.000,0:13:49.000 The set will hold on to that function. So I say here's the compare student structure function. 0:13:49.000,0:13:52.000 It holds onto that name, and when needed it calls back. 0:13:52.000,0:13:53.000 It says I'm 0:13:53.000,0:13:57.000 about to go look for a student structure. Is this the one? Well, I don't know if two student structures are 0:13:57.000,0:14:01.000 the same. I'll ask the client. Here's two student structures. Are they the same? 0:14:01.000,0:14:02.000 And then 0:14:02.000,0:14:06.000 as needed, it'll keep looking or 0:14:06.000,0:14:09.000 insert and add and do whatever I need to do. 0:14:09.000,0:14:13.000 So let's go back over here. 0:14:13.000,0:14:19.000 I'll write a little function. So the 0:14:19.000,0:14:21.000 prototype for it is it takes two elem Ts 0:14:21.000,0:14:24.000 and it returns an int. 0:14:24.000,0:14:31.000 That int is expected to have a value zero if they are the same, 0:14:31.000,0:14:33.000 and a value that is negative if 0:14:33.000,0:14:37.000 the first argument precedes the second. So if A is less than B, returns some negative 0:14:37.000,0:14:41.000 thing. You can return negative one, or negative 100, or negative one million, 0:14:41.000,0:14:42.000 but 0:14:42.000,0:14:45.000 you need to return some negative value. And then if A [cuts 0:14:45.000,0:14:47.000 out] some ordering, so they're not equal 0:14:47.000,0:14:51.000 [cuts out] later, it will return some positive value: 1, 10, 6 million. 0:14:51.000,0:14:55.000 So if I do [cuts out] say I use ID num as my comparison. 0:14:55.000,0:14:56.000 Based on [cuts 0:14:56.000,0:14:58.000 out] are the same, if they are 0:14:58.000,0:15:01.000 I can return zero. And if 0:15:01.000,0:15:03.000 ID num of A 0:15:03.000,0:15:07.000 is less than the ID num of B, I can 0:15:07.000,0:15:08.000 return negative one, and then in 0:15:08.000,0:15:11.000 the other case, I'll return one. 0:15:11.000,0:15:14.000 So it will compare them on the basis [cuts out] 0:15:14.000,0:15:16.000 figuring that the name field 0:15:16.000,0:15:17.000 at that point is 0:15:17.000,0:15:19.000 nothing new. 0:15:19.000,0:15:24.000 And then the way I use that is over here when I'm constructing it is there is 0:15:24.000,0:15:27.000 [cuts out] to the constructor, and so that's how I would pass [cuts out] add 0:15:27.000,0:15:29.000 parens to the [cuts 0:15:29.000,0:15:35.000 out] as I'm declaring it, and then I pass the name. Do I call it compare student or compare students? I can't remember. Compare student. Okay. 0:15:35.000,0:15:38.000 And 0:15:38.000,0:15:42.000 this then -so now there's nothing going on in the code -causes it to compile, 0:15:42.000,0:15:45.000 and that if you were to put 0:15:45.000,0:15:50.000 let's say a see out statement in your comparison function just for fun, you would find out as 0:15:50.000,0:15:53.000 you were doing adds, and compares, and removes, and whatnot on this set 0:15:53.000,0:15:58.000 that you would see that your call kept being made. It kept calling back to you as the 0:15:58.000,0:16:01.000 client saying I need to compare these things. I need to compare these things to decide where to put something, 0:16:01.000,0:16:03.000 whether it had something, 0:16:03.000,0:16:04.000 and whatnot. 0:16:04.000,0:16:08.000 And then based on your ordering, that would control for example how the iterator worked, that 0:16:08.000,0:16:12.000 the smallest one according to your function would be the one first returned by your 0:16:12.000,0:16:14.000 iterator, and it would move through larger 0:16:14.000,0:16:17.000 or sort of later in the ordering ones until the end. So 0:16:17.000,0:16:20.000 it's a very powerful mechanism 0:16:20.000,0:16:24.000 that's at work here because it means that for 0:16:24.000,0:16:28.000 anything you wanna put in a set, as long as you're willing to say how it is you compare 0:16:28.000,0:16:28.000 it, 0:16:28.000,0:16:33.000 then the set will take over and do the very efficient storing, and searching, and organizing 0:16:33.000,0:16:34.000 of that. 0:16:34.000,0:16:38.000 But you, the only piece you need to supply is this one little thing it can't figure 0:16:38.000,0:16:42.000 out for itself, which is given your type of thing, how do you compare 0:16:42.000,0:16:46.000 it. For the built in types string, and int, and double, and car, 0:16:46.000,0:16:51.000 it does have a default comparison function, that one that was called operator compare. Let 0:16:51.000,0:16:55.000 me go open it for you 0:16:55.000,0:16:56.000 [inaudible] the 106. 0:16:56.000,0:16:58.000 So there is a 0:16:58.000,0:17:00.000 compare function dot H, 0:17:00.000,0:17:02.000 and this is actually what 0:17:02.000,0:17:05.000 the default version of it looks. It's actually written as a template itself 0:17:05.000,0:17:07.000 that given two things it 0:17:07.000,0:17:08.000 just 0:17:08.000,0:17:11.000 turns around and asks the built in operators to help us out with that. 0:17:11.000,0:17:15.000 And that is the name that's being used 0:17:15.000,0:17:17.000 if I open the set 0:17:17.000,0:17:20.000 and you look at its constructor call. I had said that I would come back and tell you about 0:17:20.000,0:17:22.000 what this was 0:17:22.000,0:17:25.000 that the argument going into the set constructor 0:17:25.000,0:17:31.000 is one parameter whose name is CMP function that takes two elem type things -so here's the 0:17:31.000,0:17:33.000 one in the example of the two argument prototype 0:17:33.000,0:17:38.000 -returns an int, and then it uses a default assignment for that of operator compare, the one we just 0:17:38.000,0:17:39.000 looked at, 0:17:39.000,0:17:41.000 so that if you don't specify it, it goes 0:17:41.000,0:17:46.000 through and generates the standard comparison function for the type, 0:17:46.000,0:17:47.000 which 0:17:47.000,0:17:49.000 for built-ins will work fine, but for 0:17:49.000,0:17:53.000 user defined things is gonna create compiler errors. 0:17:53.000,0:17:56.000 So you can also choose if you don't like the default ordering works. 0:17:56.000,0:18:01.000 So for example, if you wanted to build a set of strings that was case insensitive 0:18:01.000,0:18:04.000 -so the default string handling would be to use equals equals and less than, which actually 0:18:04.000,0:18:08.000 does care about case. It doesn't think that capital [inaudible] is the same as lower case. 0:18:08.000,0:18:10.000 If you wanted it to consider them the same, 0:18:10.000,0:18:14.000 you could supply your own. A compare case insensitively function took two strings, 0:18:14.000,0:18:16.000 converted their case, and then compared them. 0:18:16.000,0:18:19.000 And then when you establish a set of strings, instead of letting the default argument take over, 0:18:19.000,0:18:24.000 go ahead and use your case insensitive compare, and then now you have a set of strings 0:18:24.000,0:18:26.000 that operates case insensitively. 0:18:26.000,0:18:28.000 So you can change 0:18:28.000,0:18:31.000 the ordering, adapt the ordering, whatever you like for the primitives, as well as supply the necessary 0:18:31.000,0:18:32.000 one for the 0:18:32.000,0:18:34.000 things the built-ins 0:18:34.000,0:18:35.000 don't have 0:18:35.000,0:18:40.000 properties for. So then that's that piece of code right 0:18:40.000,0:18:41.000 there. All right. So 0:18:41.000,0:18:44.000 does that make sense? 0:18:44.000,0:18:45.000 Well, now you know 0:18:45.000,0:18:49.000 kind of the whole range of things that are available in the class library. All 0:18:49.000,0:18:53.000 right, so we saw the four sequential containers, the vector, stack, queue, 0:18:53.000,0:18:55.000 and the grid that you kinda indexed ordering, 0:18:55.000,0:18:58.000 and kinda allowed you to throw things in and get them back out. 0:18:58.000,0:19:02.000 We went through the map, which is the sort of fancy heavy lifter that does that key value 0:19:02.000,0:19:04.000 lookup, and then we've seen the set, which does kind of 0:19:04.000,0:19:06.000 aggregate collection management, 0:19:06.000,0:19:10.000 and very efficient operations for kind of searching, retrieving, ordering, 0:19:10.000,0:19:12.000 0:19:12.000,0:19:13.000 joining with other kind of sets, and stuff 0:19:13.000,0:19:16.000 that also has a lot of high utility. I wanna do 0:19:16.000,0:19:19.000 one quick little program with you before I start recursion 0:19:19.000,0:19:21.000 just because I think it's kinda cool 0:19:21.000,0:19:23.000 is to talk a little about this idea of like once 0:19:23.000,0:19:26.000 you have these ADTs, you can solve a lot of cool problems, and that's certainly what this 0:19:26.000,0:19:29.000 week's assignment is about. It's like well here are these tasks that 0:19:29.000,0:19:31.000 if you didn't have 0:19:31.000,0:19:34.000 -So ADTs, just a reminder -I say this word as though everybody knows exactly what it means 0:19:34.000,0:19:37.000 -is just the idea of an abstract data type. 0:19:37.000,0:19:39.000 So an abstract data type is a data type 0:19:39.000,0:19:42.000 that you think of in terms of what it provides as an abstraction. 0:19:42.000,0:19:44.000 So a queue is this FIFO line, 0:19:44.000,0:19:47.000 and how it works internally, what it's implemented as, we're not worried 0:19:47.000,0:19:51.000 about it at all. We're only worried about the abstraction of enqueue and dequeue, and it coming first 0:19:51.000,0:19:53.000 in first out. So we talk 0:19:53.000,0:19:56.000 about ADTs, we say once we have a queue, a stack, or a vector, 0:19:56.000,0:19:58.000 we know what those things do, 0:19:58.000,0:20:00.000 what a mathematical set is about. 0:20:00.000,0:20:03.000 We build on top of that. We write code 0:20:03.000,0:20:06.000 that leverages those ADTs to do cool things 0:20:06.000,0:20:10.000 without having to also manage the low level details of where the memory came from for these things, 0:20:10.000,0:20:13.000 how they grow, how they search, how 0:20:13.000,0:20:15.000 they store and organize the data. 0:20:15.000,0:20:17.000 You just get to do real cool things. 0:20:17.000,0:20:21.000 So you probably got a little taste of that at the end of 106A when you get to use the array list and the 0:20:21.000,0:20:24.000 hashmap to do things. This set kinda just expands out 0:20:24.000,0:20:26.000 to fill out some other niches 0:20:26.000,0:20:28.000 where you can do a lot of really cool things 0:20:28.000,0:20:30.000 because you have these things around to build on. 0:20:30.000,0:20:34.000 So one of the things that happens a lot is you tend to do layered things, and you'll see a little bit of this 0:20:34.000,0:20:38.000 in the assignment you're doing this week where it's not just a set of something, it's a set of a map 0:20:38.000,0:20:39.000 of something, or 0:20:39.000,0:20:43.000 a vector of queues, a map of set. So I gave you a couple of examples here of 0:20:43.000,0:20:44.000 the things 0:20:44.000,0:20:48.000 that might be useful. Like if you think of what a smoothie is, it's a set of things 0:20:48.000,0:20:52.000 mixed together, some yogurt, and some different fruits, some wheatgrass, whatever it is you have in 0:20:52.000,0:20:53.000 it. 0:20:53.000,0:20:57.000 And that the menu for a smoothie shop is really just a bunch of those sets, so each set 0:20:57.000,0:21:00.000 of ingredients is a particular smoothie they have, and then 0:21:00.000,0:21:04.000 the set of all those sets is the menu that they post up on the board you can come 0:21:04.000,0:21:06.000 in and order. 0:21:06.000,0:21:07.000 The 0:21:07.000,0:21:09.000 compiler tends to use a map 0:21:09.000,0:21:13.000 to keep track of all the variables that are in scope. As you declare variables, it 0:21:13.000,0:21:17.000 adds them to the map so that when you later use that name, it knows where to find it. 0:21:17.000,0:21:19.000 Well, it also has to manage though 0:21:19.000,0:21:21.000 not just one map, 0:21:21.000,0:21:25.000 but the idea is as you enter and exit scopes, there is this 0:21:25.000,0:21:27.000 layering of open scopes. 0:21:27.000,0:21:32.000 So you have some open scope here. You go into a for loop where you open another scope. You add some 0:21:32.000,0:21:33.000 new variables 0:21:33.000,0:21:37.000 that when you look it actually shadows then at the nearest definition, so if you had two variables 0:21:37.000,0:21:41.000 of the same, it needs to look at the one that's closest. 0:21:41.000,0:21:44.000 And then when you exit that scope, it needs those variables to go away and 0:21:44.000,0:21:48.000 no longer be accessible. So one model for that could be very much a stack of 0:21:48.000,0:21:51.000 maps where each of those maps represents the scope 0:21:51.000,0:21:52.000 that's active, 0:21:52.000,0:21:56.000 a set of variable names, and maybe their types and information is stored in that map. And then 0:21:56.000,0:21:58.000 you stack them up. As you 0:21:58.000,0:22:01.000 open a scope, you push on a new empty map. 0:22:01.000,0:22:03.000 You put things into it, and then 0:22:03.000,0:22:06.000 maybe you enter another scope. You push on another new empty map. You stick things into it, but as you 0:22:06.000,0:22:11.000 exit and hit those closing braces, you pop those things from the stack to get to 0:22:11.000,0:22:13.000 this previous 0:22:13.000,0:22:16.000 environment you were in. So let me do a 0:22:16.000,0:22:19.000 little program with you. I just have 0:22:19.000,0:22:22.000 this idea of how this would work, and we'll see if 0:22:22.000,0:22:26.000 I can code something up. So I have -let's go 0:22:26.000,0:22:28.000 back over 0:22:28.000,0:22:32.000 here. I'm going to -this is the piece of code that just reads 0:22:32.000,0:22:34.000 words, so that's a 0:22:34.000,0:22:36.000 fine piece of code to have here. 0:22:36.000,0:22:38.000 I have a file here -let 0:22:38.000,0:22:39.000 me open 0:22:39.000,0:22:43.000 it up for you -that just contains the contents of the Official Scrabble Players' Dictionary, 0:22:43.000,0:22:45.000 Edition 2. 0:22:45.000,0:22:47.000 It's got a lot of words it. 0:22:47.000,0:22:48.000 It's pretty long. 0:22:48.000,0:22:49.000 It's still loading. 0:22:49.000,0:22:53.000 Let's go back and do something else while it's loading. 0:22:53.000,0:22:54.000 It happened to have 0:22:54.000,0:22:55.000 0:22:55.000,0:22:58.000 about 120,000 words I think is what it would be right now. 0:22:58.000,0:22:59.000 So it's busy loading, 0:22:59.000,0:23:01.000 and I have this question for you. 0:23:01.000,0:23:04.000 There are certain words that are anagrams of each other. 0:23:04.000,0:23:06.000 0:23:06.000,0:23:08.000 The word 0:23:08.000,0:23:10.000 cheap can be anagrammed into the word peach, 0:23:10.000,0:23:11.000 things like that. 0:23:11.000,0:23:12.000 0:23:12.000,0:23:13.000 And so I 0:23:13.000,0:23:15.000 am curious for the 0:23:15.000,0:23:16.000 Official Scrabble Players' Dictionary 0:23:16.000,0:23:20.000 what -so if you imagine that some words can be anagrammed a couple times, five 0:23:20.000,0:23:24.000 or six different words just on how you can rearrange the letters. I'm curious to know what the largest anagram 0:23:24.000,0:23:28.000 cluster is in the Official Scrabble Players' Dictionary. So I'd like to know across 0:23:28.000,0:23:30.000 all 127,000 words 0:23:30.000,0:23:34.000 that they form little clusters, and I'd like to find out what's the biggest of 0:23:34.000,0:23:39.000 those clusters. 0:23:39.000,0:23:41.000 Okay. That's my goal. So 0:23:41.000,0:23:45.000 here's what I've got going. I've got something that's gonna read them one by one. 0:23:45.000,0:23:48.000 So let's brainstorm for a second. 0:23:48.000,0:23:49.000 0:23:49.000,0:23:51.000 I want a way to take a particular word and kinda stick 0:23:51.000,0:23:55.000 it with its other anagram cluster friends. 0:23:55.000,0:23:57.000 What's a way I might do that? 0:23:57.000,0:24:03.000 Help me design my data structure. 0:24:03.000,0:24:06.000 Help me out. [Inaudible]. I've 0:24:06.000,0:24:08.000 got the word peach. 0:24:08.000,0:24:10.000 Where should I stick it so I can ? You could treat each string as like a set of 0:24:10.000,0:24:13.000 ? So I have this 0:24:13.000,0:24:17.000 string, 0:24:17.000,0:24:21.000 which represents the letters. I've 0:24:21.000,0:24:22.000 got the word peach. I wanna be able to stick peach with cheap, 0:24:22.000,0:24:26.000 so where should I stick peach in such a way that I could find it. And you've got this idea that 0:24:26.000,0:24:27.000 0:24:27.000,0:24:29.000 the letters there are a set. 0:24:29.000,0:24:32.000 They're not quite a set, though. Be careful because the word banana has a 0:24:32.000,0:24:36.000 couple As and a couple Ns, and so it's not that I'd really want it to come down to be 0:24:36.000,0:24:39.000 the set BAN. I wouldn't wanna coalesce the duplicates on that, so I really do wanna preserve all the letters that 0:24:39.000,0:24:43.000 are in there, 0:24:43.000,0:24:46.000 but your idea's getting us somewhere. It's like there is this idea of kind of like for any 0:24:46.000,0:24:49.000 particular word there's the 0:24:49.000,0:24:53.000 collection of letters that it is formed from. 0:24:53.000,0:24:56.000 And somehow if I could use that as an identifier 0:24:56.000,0:24:58.000 in a way that was reliable 0:24:58.000,0:25:01.000 -anybody 0:25:01.000,0:25:02.000 got any 0:25:02.000,0:25:03.000 ideas 0:25:03.000,0:25:08.000 about what to do with that? If you did that a vector, each letter and then the frequency of each letter in the word? So I could certainly do that. I could build kind of a vector that 0:25:08.000,0:25:12.000 had kinda frequencies, that had this little struct that maybe it was number of times it occurs. 0:25:12.000,0:25:15.000 Then I could try to build something on the basis of that vector that was like 0:25:15.000,0:25:18.000 here is -do these two vectors 0:25:18.000,0:25:22.000 match? Does banana match apple? And you'd say well no. It turns out they don't 0:25:22.000,0:25:23.000 have the same letters. I'm trying to 0:25:23.000,0:25:27.000 think of a really easy way to represent that. Your idea's good, 0:25:27.000,0:25:31.000 but I'm thinking really lazy. So somebody help me who's lazy. Could you 0:25:31.000,0:25:36.000 have a map where the key is all the letters in alphabetical order and the value is a 0:25:36.000,0:25:41.000 vector of all the ? Yeah. That is a great idea. You're taking his idea, and you're making it easier. 0:25:41.000,0:25:43.000 You're capitalizing on lazy, which is yeah 0:25:43.000,0:25:45.000 -I wanna keep track of all the letters that are in the word, 0:25:45.000,0:25:48.000 but I wanna do it in a way that makes it really easy for 0:25:48.000,0:25:52.000 example to know whether two have the same -we'll call it the signature. The signature of 0:25:52.000,0:25:54.000 the word is the letter frequency across it. 0:25:54.000,0:25:57.000 If I could come up with a way to represent the signature that was really to compare two 0:25:57.000,0:26:01.000 signatures quickly to see if they're the same, then I will have less work to do. And so your idea is a great 0:26:01.000,0:26:04.000 one. We take the letters and we alphabetize them. 0:26:04.000,0:26:08.000 So cheap and peach both turn into the same ACEHP. It's 0:26:08.000,0:26:13.000 a nonsense thing, but it's a signature. It's unique for any anagram, and 0:26:13.000,0:26:14.000 if I use a map 0:26:14.000,0:26:15.000 where that's the key, 0:26:15.000,0:26:17.000 then I can associate with every word 0:26:17.000,0:26:20.000 that had that same signature. 0:26:20.000,0:26:23.000 So 0:26:23.000,0:26:27.000 let's start building that. So let me write 0:26:27.000,0:26:29.000 -I wanna call this the signature. 0:26:29.000,0:26:31.000 Given a string, 0:26:31.000,0:26:33.000 it's going to 0:26:33.000,0:26:34.000 alphabetize it. 0:26:34.000,0:26:37.000 I'm going to write the 0:26:37.000,0:26:40.000 dumbest version of this ever, 0:26:40.000,0:26:47.000 but I'm just gonna use a simple sorting routine on this. So 0:26:47.000,0:26:51.000 smallest is gonna small index -we'll call it min index. That's a better name for it. 0:26:51.000,0:26:53.000 Min index equals I, 0:26:53.000,0:26:57.000 and then I'm gonna look through the string, and I'm gonna find the smallest letter that's 0:26:57.000,0:26:58.000 there and 0:26:58.000,0:27:03.000 move it to the front. That's basically gonna be my strategy. I've 0:27:03.000,0:27:05.000 got the wrong place to start, though. I'm gonna 0:27:05.000,0:27:11.000 look from I to the one. So if 0:27:11.000,0:27:13.000 S sub J 0:27:13.000,0:27:15.000 is less than S sub min index, 0:27:15.000,0:27:20.000 so it is a smaller letter, then the min index gets to be J. 0:27:20.000,0:27:23.000 So far, what I've done is I've run this loop that kind 0:27:23.000,0:27:27.000 of starts in this case on the very first iteration and says the first character in Slot 0, that 0:27:27.000,0:27:31.000 must be the min. And then it looks from there to the end. Is there anything smaller? Whenever it 0:27:31.000,0:27:34.000 find anything smaller, it updates the min index, so when it's done 0:27:34.000,0:27:39.000 after that second loop has fully 0:27:39.000,0:27:40.000 operated, 0:27:40.000,0:27:44.000 then min index will point to the smallest alphabetically 0:27:44.000,0:27:45.000 character in the 0:27:45.000,0:27:49.000 string that we have. Then I'm gonna swap it to the front. 0:27:49.000,0:27:52.000 So S sub I 0:27:52.000,0:27:55.000 with S sub min index, and 0:27:55.000,0:28:08.000 I'll write a swap because swap is pretty easy to write. 0:28:08.000,0:28:10.000 See about how we do this, okay. 0:28:10.000,0:28:12.000 So 0:28:12.000,0:28:13.000 if I 0:28:13.000,0:28:16.000 -I like how this works, and then let me stop and say 0:28:16.000,0:28:17.000 right here, now 0:28:17.000,0:28:19.000 is a good time to test this thing. 0:28:19.000,0:28:23.000 I just wrote signature. It probably works, but it potentially 0:28:23.000,0:28:24.000 could not. 0:28:24.000,0:28:27.000 And I'm just gonna show you a little bit of how I write code. This is good to know. As I 0:28:27.000,0:28:28.000 say, 0:28:28.000,0:28:29.000 0:28:29.000,0:28:32.000 I put some code in here that I plan on throwing away. 0:28:32.000,0:28:35.000 Enter word 0:28:35.000,0:28:42.000 -and then I throw it through my function. I think I called it 0:28:42.000,0:28:44.000 signature, didn't I? 0:28:44.000,0:28:47.000 Signature S 0:28:47.000,0:28:51.000 -so the idea being if it doesn't work, I wanna find out sooner rather than 0:28:51.000,0:28:52.000 later. It 0:28:52.000,0:28:55.000 doesn't like my use of get line. Is that because it's not 0:28:55.000,0:28:57.000 included? 0:28:57.000,0:29:00.000 Yes, it's not. So let's go get the right header files. This is half the 0:29:00.000,0:29:03.000 battle sometimes is figuring out what 0:29:03.000,0:29:04.000 headers 0:29:04.000,0:29:05.000 are needed to make 0:29:05.000,0:29:10.000 your compiler happy. So now it's up 0:29:10.000,0:29:14.000 here at enter word, and I say what does cheap come out as? 0:29:14.000,0:29:16.000 It goes into the debugger. That's good. 0:29:16.000,0:29:17.000 So it wasn't right. 0:29:17.000,0:29:19.000 Now we're gonna get to find out 0:29:19.000,0:29:21.000 what does it not like about that. 0:29:21.000,0:29:27.000 It says 0:29:27.000,0:29:29.000 -did we forget to return something? 0:29:29.000,0:29:30.000 Let's go look at our code. 0:29:30.000,0:29:34.000 So it's complaining about -oh yeah, I can see where we're gonna get in trouble here. It's 0:29:34.000,0:29:38.000 complaining that the return -it was saying that I'm trying to print something and it looks like garbage, that 0:29:38.000,0:29:40.000 what I'm trying to print didn't make sense at all. 0:29:40.000,0:29:47.000 I could say well that's funny. Let's go look at signature. [Inaudible] pass the [inaudible]. 0:29:47.000,0:29:51.000 That's probably a good idea. We ought to fix that while we're there. 0:29:51.000,0:29:54.000 Let me leave that bug in for a minute because I'm gonna fix my first bug, and then I'll come back 0:29:54.000,0:29:55.000 to that one. 0:29:55.000,0:29:57.000 So it turns out what it's really complaining about though is it has to do with 0:29:57.000,0:30:01.000 -I said well what does signature return. Somehow, what's being returned by signature is being interpreted 0:30:01.000,0:30:04.000 as total crap when it got back to main, and there's a very good reason for that 0:30:04.000,0:30:06.000 because I never returned 0:30:06.000,0:30:09.000 anything. So maybe if I had been less cavalier about the fact that it was giving me a warning 0:30:09.000,0:30:11.000 over here that said 0:30:11.000,0:30:14.000 control reaches the end of non-void function, but I was being lazy and I didn't look 0:30:14.000,0:30:17.000 -was that I 0:30:17.000,0:30:18.000 didn't pay attention to the fact. 0:30:18.000,0:30:20.000 So let's leave my other bug in that you've already pointed out 0:30:20.000,0:30:24.000 because this is exactly what it's like when you're doing it. You enter words and you say cheap, 0:30:24.000,0:30:25.000 and it says cheap, and you're like 0:30:25.000,0:30:28.000 no. 0:30:28.000,0:30:32.000 And then you're like about how about an. Hey look, that's in alphabetical order. And then you spend all your time thinking about words 0:30:32.000,0:30:37.000 for a while. Well, tux. That's in alphabetical order. It seems to work. You could do that for a while, but 0:30:37.000,0:30:39.000 then 0:30:39.000,0:30:41.000 you're like this whole cheap, 0:30:41.000,0:30:42.000 not good. 0:30:42.000,0:30:46.000 Okay, so I come back here and my friend who's already one step ahead of me has pointed out that my swap 0:30:46.000,0:30:50.000 is missing the all important pass by reference 0:30:50.000,0:30:53.000 that as it is it what swapping the copies that got passed to 0:30:53.000,0:30:56.000 the function, but of course nothing was happening back here in signature land. 0:30:56.000,0:30:57.000 So if I fix that, and 0:30:57.000,0:30:59.000 I come back in here, 0:30:59.000,0:31:01.000 I'm gonna feel better about this. Oh, 0:31:01.000,0:31:04.000 my gosh. And even tux still works. 0:31:04.000,0:31:08.000 And if I say banana, I should get a bunch of As, a bunch of B and an N, so there's various cases to test out if you have 0:31:08.000,0:31:09.000 multiple letters, and if 0:31:09.000,0:31:12.000 you have letters that are the same letters like 0:31:12.000,0:31:16.000 apple so that it doesn't lose your duplicates, and it seems to come out right. 0:31:16.000,0:31:17.000 So given this, 0:31:17.000,0:31:19.000 the word peach 0:31:19.000,0:31:22.000 should come down to the same signature as cheap, and so that seems to indicate we're on 0:31:22.000,0:31:24.000 the path 0:31:24.000,0:31:24.000 toward 0:31:24.000,0:31:26.000 building the thing we wanted to build. So I 0:31:26.000,0:31:28.000 have this read file thing that I 0:31:28.000,0:31:29.000 have left over from last time that 0:31:29.000,0:31:31.000 reads each word, 0:31:31.000,0:31:35.000 and I wanna change my vector 0:31:35.000,0:31:37.000 into a map of 0:31:37.000,0:31:40.000 set of string. 0:31:40.000,0:31:48.000 What capitalization do you not like? [Inaudible]. 0:31:48.000,0:31:52.000 Yeah. So it turns out this file happens to all be 0:31:52.000,0:32:00.000 lower case, but there's no harm in doing this. 0:32:00.000,0:32:03.000 That way even if they weren't, it'll 0:32:03.000,0:32:06.000 take care of that problem for us if we wanted it to. 0:32:06.000,0:32:09.000 [Inaudible]. 0:32:09.000,0:32:11.000 I want lower case. Oh 0:32:11.000,0:32:12.000 yeah, I do. Well, your 0:32:12.000,0:32:15.000 case. You guys are so picky. 0:32:15.000,0:32:18.000 All right. Here's my deal. I'm gonna take my map, 0:32:18.000,0:32:21.000 and this is gonna be a line of code that's gonna make your head spin. 0:32:21.000,0:32:31.000 Just go with it. 0:32:31.000,0:32:38.000 This is the do all 0:32:38.000,0:32:40.000 craziness. Okay. 0:32:40.000,0:32:41.000 So 0:32:41.000,0:32:43.000 I've got this map, 0:32:43.000,0:32:46.000 and what I wanna do is under the signature of this word, 0:32:46.000,0:32:50.000 I want to look up the set of strings that's associated with it and tack this one in. 0:32:50.000,0:32:54.000 And the add in this case with the set, I know that's gonna do non-duplication. In fact, 0:32:54.000,0:32:58.000 the file doesn't contain duplicates, but if it did, I certainly don't want it to record it 0:32:58.000,0:33:00.000 twice anyway, so I might as well do this. 0:33:00.000,0:33:04.000 Now this form of this is a heavy lifter for a small piece of 0:33:04.000,0:33:04.000 code. 0:33:04.000,0:33:09.000 The signature then went and converted into the ACEHP form. 0:33:09.000,0:33:12.000 I used that as the key into the table. 0:33:12.000,0:33:16.000 If it was already there, it's gonna retrieve me an existing set that I'm gonna just go ahead and 0:33:16.000,0:33:21.000 add a word onto. If it wasn't there, the behavior for the brackets is to create 0:33:21.000,0:33:26.000 kind of a new empty 0:33:26.000,0:33:28.000 value for that. So it'll use the key 0:33:28.000,0:33:32.000 and create a default value for type. Well, the default value for set of string -so if you just create a 0:33:32.000,0:33:32.000 set 0:33:32.000,0:33:36.000 without any other information in the default constructor will always create you a 0:33:36.000,0:33:40.000 nice clean empty set. So in fact, it will get me exactly what I want which is to put it in there with an empty 0:33:40.000,0:33:43.000 set that I will immediately add the word into. So after 0:33:43.000,0:33:46.000 I do this, I should have this fully populated map. And 0:33:46.000,0:33:49.000 then I'm 0:33:49.000,0:33:56.000 gonna do this just as a little test. I'm gonna say num words when I'm done 0:33:56.000,0:33:58.000 to feel 0:33:58.000,0:34:00.000 a little bit confident about what 0:34:00.000,0:34:04.000 got put in there. How 0:34:04.000,0:34:06.000 about I call it -that's a good idea. It's 0:34:06.000,0:34:11.000 so fussy. C++ never does what you want. 0:34:11.000,0:34:15.000 I think I called this thing OSPD2.txt that has the 0:34:15.000,0:34:20.000 words in it. 0:34:20.000,0:34:29.000 And then I need to declare the variable that I'm sticking all this stuff into is a set. So 0:34:29.000,0:34:32.000 go in, load stuff, 0:34:32.000,0:34:35.000 0:34:35.000,0:34:36.000 doing it's thing, the 0:34:36.000,0:34:38.000 number of words 112,882. Okay. 0:34:38.000,0:34:42.000 That's close enough. I can't remember the number that's in there, but that sounds like a fine 0:34:42.000,0:34:43.000 0:34:43.000,0:34:47.000 approximation of it. So I feel like it did sort of manage to do something for it. And I can actually do this if I 0:34:47.000,0:34:51.000 just wanna get a little glimpse of it is to use my iterator to look 0:34:51.000,0:34:53.000 at 0:34:53.000,0:35:04.000 something that's in there. Wait, is map a set of string? Iterator -- 0:35:04.000,0:35:05.000 file iter.hasNext. I'm gonna 0:35:05.000,0:35:07.000 say key 0:35:07.000,0:35:10.000 equals iter.next, 0:35:10.000,0:35:13.000 0:35:13.000,0:35:18.000 and I'm going to print that key, and I'm going to print the size of the 0:35:18.000,0:35:28.000 set because I'm at this 0:35:28.000,0:35:33.000 point -I should see gobs of printing come out of this thing. It 0:35:33.000,0:35:37.000 takes a little bit of a while to process the thing, and then 0:35:37.000,0:35:42.000 see gobs and gobs of stuff going by. It looks like a lot of things are ones if you can 0:35:42.000,0:35:45.000 imagine reading them as they go by because a lot of words are really just not anagrams 0:35:45.000,0:35:48.000 of anything else. But 0:35:48.000,0:35:52.000 some of the shorter ones have sort of a better chance. 0:35:52.000,0:35:56.000 So you can find out here at the end that there are EEIKLPST. I don't know what that is. Leakiest? No. I don't 0:35:56.000,0:35:58.000 know 0:35:58.000,0:35:59.000 what that word is. I 0:35:59.000,0:36:03.000 should write it in for any of them. 0:36:03.000,0:36:09.000 This dictionary has a bunch of really crazy words in it too, so it makes it especially challenging. What is that? I don't know. That one 0:36:09.000,0:36:10.000 almost looks 0:36:10.000,0:36:14.000 like beginners, but it's got an F in it. It's the F beginners, a very famous 0:36:14.000,0:36:16.000 word. You guys have probably heard 0:36:16.000,0:36:19.000 of it. So I've seen that, and now I wanna do the thing where I will pick the 0:36:19.000,0:36:21.000 largest one. I'd like to know. 0:36:21.000,0:36:23.000 Somebody should tell me. 0:36:23.000,0:36:25.000 Int max size [cuts 0:36:25.000,0:36:28.000 out] 0:36:28.000,0:36:30.000 max key, so I'll 0:36:30.000,0:36:33.000 set this to be zero, and max 0:36:33.000,0:36:34.000 key is that. 0:36:34.000,0:36:35.000 And then I'm gonna do this. If 0:36:35.000,0:36:38.000 the 0:36:38.000,0:36:40.000 size of this key is 0:36:40.000,0:36:43.000 greater than my max size, 0:36:43.000,0:36:55.000 then it's gonna get to be the new key. 0:36:55.000,0:37:06.000 So after I did all of that then [cuts out] 0:37:06.000,0:37:08.000 key. 0:37:08.000,0:37:13.000 And then I probably wanna see what they are, so why don't I go ahead and take a look. Ah, I have to go to 0:37:13.000,0:37:17.000 the type, though. [Inaudible] 0:37:17.000,0:37:21.000 to equals 0:37:21.000,0:37:22.000 M of 0:37:22.000,0:37:27.000 key -max key, I guess, dot 0:37:27.000,0:37:29.000 iterator, 0:37:29.000,0:37:30.000 [inaudible] IT 0:37:30.000,0:37:37.000 has next, CLIT.next [inaudible]. 0:37:37.000,0:37:43.000 So it went back. It found the maximum, in this case using the size of the sets as the distinguishing 0:37:43.000,0:37:47.000 feature. And 0:37:47.000,0:37:49.000 then max is AEPRS, 0:37:49.000,0:37:51.000 which 0:37:51.000,0:37:53.000 it's got a big old list of about 12. 0:37:53.000,0:37:55.000 I think that's 0:37:55.000,0:37:58.000 12, actually. Maybe 13. 0:37:58.000,0:37:59.000 So now you know. 0:37:59.000,0:38:02.000 You can impress your friends at parties. This is the kind of thing you can win bar bets on. 0:38:02.000,0:38:03.000 Oh, 0:38:03.000,0:38:06.000 yeah. What's the size of the largest 0:38:06.000,0:38:07.000 anagram cluster? 0:38:07.000,0:38:10.000 Everybody wants to know this kind of stuff. I can't believe you guys can sleep at night without 0:38:10.000,0:38:12.000 actually knowing this. 0:38:12.000,0:38:16.000 And what's neat to know though [inaudible] just to point out a couple of things that -you can use a little decomposition 0:38:16.000,0:38:18.000 on this code, 0:38:18.000,0:38:23.000 but there's kind of a very small amount of things we're having to do. For example, one of the things that's really 0:38:23.000,0:38:27.000 powerful, things like the map where we can just figure out how to key the things 0:38:27.000,0:38:29.000 to store the collection right under that key, 0:38:29.000,0:38:33.000 then looking it up and adding something is a very 0:38:33.000,0:38:36.000 efficient sort of direct operation, just building on these things and it going through and doing all 0:38:36.000,0:38:37.000 the work of 0:38:37.000,0:38:41.000 storing them, sorting them, making it efficient for us to retrieve them and look them up such that I can 0:38:41.000,0:38:44.000 process 100,000 words 0:38:44.000,0:38:49.000 in the blink of an eye, and then go back through, look at them all, find the biggest one, 0:38:49.000,0:38:51.000 get my information out. 0:38:51.000,0:38:55.000 When you make the call to M.size, is 0:38:55.000,0:39:00.000 that the number of words? [Inaudible]. That is the number of keys. Keys, right. So that's not actually [inaudible]. 0:39:00.000,0:39:01.000 Yeah. So it doesn't know anything about 0:39:01.000,0:39:04.000 everything else that was [inaudible], but in fact that's why it's 0:39:04.000,0:39:07.000 [inaudible]. I know the dictionary has about 127,000 words. It turns out they form about 0:39:07.000,0:39:08.000 0:39:08.000,0:39:11.000 112 unique 0:39:11.000,0:39:14.000 signatures, and so there's actually another 20,000 words that are 0:39:14.000,0:39:17.000 clung onto some existing signature. That's the number of unique signatures across 0:39:17.000,0:39:20.000 the dictionary, not the number of words, so that's probably the wrong name to call 0:39:20.000,0:39:22.000 it. For the M 0:39:22.000,0:39:24.000 0:39:24.000,0:39:28.000 signature word thing where [inaudible] of the default just to create a new stack, that 0:39:28.000,0:39:32.000 works as well for vectors [inaudible]? Yeah. It works for anything if you were just to declare 0:39:32.000,0:39:34.000 it on the stack and the right thing happened, 0:39:34.000,0:39:37.000 so vectors, set, maps. All those things do. 0:39:37.000,0:39:41.000 But the primitive types like int and double, it doesn't. So it would work for string. String 0:39:41.000,0:39:45.000 is an empty string. So for some of the fancier, more modern types tend to actually 0:39:45.000,0:39:49.000 know how to just default construct themselves into a good state, but the primitives don't do that. 0:39:49.000,0:39:53.000 So if you were having a map of ints and you wanted to have them start at zero, you need to really 0:39:53.000,0:39:58.000 start them at zero. You can call just M sub this. It would be garbage, and it would just be operating with 0:39:58.000,0:40:02.000 garbage from that way forward. 0:40:02.000,0:40:06.000 All right. Well, we're good. What 0:40:06.000,0:40:08.000 I'm gonna give you 0:40:08.000,0:40:10.000 is the 0:40:10.000,0:40:11.000 eight minute 0:40:11.000,0:40:12.000 discussion of recursion 0:40:12.000,0:40:14.000 that whets your appetite for 0:40:14.000,0:40:18.000 the things we're gonna be doing next week. 0:40:18.000,0:40:21.000 So recursion is one of those things I think that when you haven't yet 0:40:21.000,0:40:24.000 had a chance to explore it first hand and other people tell you about it, it 0:40:24.000,0:40:30.000 has sort of an awe inspiring sort of mystery, some fear, and whatnot. 0:40:30.000,0:40:31.000 So 0:40:31.000,0:40:34.000 first off, I wanna kinda shake that fear off. It is a little bit 0:40:34.000,0:40:35.000 hard to wrap your head around 0:40:35.000,0:40:38.000 the first time you see it, but we're gonna have a whole week's worth of time to spend on it, so 0:40:38.000,0:40:41.000 we're gonna 0:40:41.000,0:40:42.000 try to give you a lot of 0:40:42.000,0:40:46.000 different ways to think about it, and different problems to see to kinda help you do it. And I think 0:40:46.000,0:40:48.000 once you do get your head around it, it turns out then 0:40:48.000,0:40:53.000 you'll discover how infinitely powerful it is, that there is kind of a simple idea in it 0:40:53.000,0:40:57.000 that once you kinda fully get your head around, you can explore and solve lots of different 0:40:57.000,0:41:00.000 problems using this just one technique again and again. 0:41:00.000,0:41:02.000 So in itself, it's 0:41:02.000,0:41:04.000 a little bit mysterious at first glance, but then 0:41:04.000,0:41:06.000 once you kind of 0:41:06.000,0:41:09.000 master it, you'll be amazed at the kind of neat things you can do with it. 0:41:09.000,0:41:15.000 So it is certainly what I'd consider an indispensable tool in a programmer's tool kit. The kind of 0:41:15.000,0:41:19.000 problems you can solve using the techniques you have so far is fundamentally limited. And 0:41:19.000,0:41:22.000 part of what we need to do in this class is expose you to these new ways of solving 0:41:22.000,0:41:26.000 harder, more sophisticated problems that the old techniques don't work for. One of the 0:41:26.000,0:41:31.000 cool things about recursion is it actually lends very simple, elegant, short solutions 0:41:31.000,0:41:35.000 to problems that at first glance seem completely unsolvable. 0:41:35.000,0:41:36.000 That if you can 0:41:36.000,0:41:40.000 formulate a structure for [inaudible], you will discover that the code is not long 0:41:40.000,0:41:45.000 to write. It's not tedious to write. The tricky part is to figure out how 0:41:45.000,0:41:47.000 to express it, 0:41:47.000,0:41:50.000 so it's more of a thinking puzzle than it is a coding puzzle. I 0:41:50.000,0:41:55.000 certainly like thinking puzzles as much as coding puzzles if not more. 0:41:55.000,0:41:58.000 The general sort of idea is that 0:41:58.000,0:42:02.000 you are going to try to solve a problem 0:42:02.000,0:42:05.000 -instead of sort of breaking it down into component tasks like if I need to make dinner, I need 0:42:05.000,0:42:08.000 to go to the store and buy things, and I need to come home, and chop them, and get 0:42:08.000,0:42:12.000 the recipe. You think of what your standard decomposition is all about 0:42:12.000,0:42:16.000 -breaking down your tasks into A, B, and C, and D, and then you add them all together to get 0:42:16.000,0:42:17.000 0:42:17.000,0:42:20.000 the whole task done. Recursion has this kind of very different way of thinking about the problem, which is like 0:42:20.000,0:42:22.000 well if I needed to get Task A done, 0:42:22.000,0:42:26.000 and I had Task A prime, which was somehow a lot like the task I was trying to solve, but it somehow 0:42:26.000,0:42:29.000 was a little bit simpler, a little bit easier, 0:42:29.000,0:42:32.000 a little bit more manageable than the one I started out to solve, 0:42:32.000,0:42:34.000 and if I had that solution 0:42:34.000,0:42:37.000 -so if somehow I could delegate it off to some 0:42:37.000,0:42:39.000 0:42:39.000,0:42:43.000 minion who works for me, and then I could use that to solve my problem, then my 0:42:43.000,0:42:45.000 job would be made much easier by using that result of 0:42:45.000,0:42:48.000 solving a similar problem that's a little bit [inaudible]. 0:42:48.000,0:42:51.000 Okay, that seems a little bit wacky. 0:42:51.000,0:42:55.000 Let me give you sort of an example of how this might work. 0:42:55.000,0:42:58.000 So your standard problem, I said yeah, it's like you do these dissimilar sub problems. 0:42:58.000,0:43:04.000 Let's imagine I had this goal where I wanted to survey the Stanford student body. 0:43:04.000,0:43:06.000 I don't want just like a 0:43:06.000,0:43:08.000 haphazard 0:43:08.000,0:43:11.000 most of the people involved. Let's say I really wanted to get input from every single person 0:43:11.000,0:43:16.000 on campus whether they think having cardinal as your mascot is a ridiculous choice or not. So 0:43:16.000,0:43:19.000 let's imagine I really wanna hear from all 10,000 students. 0:43:19.000,0:43:21.000 Now I can stand out in White Plaza with a big 0:43:21.000,0:43:26.000 note pad and try to accost people and sort of work my way down the list. 0:43:26.000,0:43:29.000 And then I'd be there for eons and never solve my problem. 0:43:29.000,0:43:31.000 Instead, what I decide to do is I say well I'm gonna 0:43:31.000,0:43:32.000 recruit some people to help me 0:43:32.000,0:43:34.000 because I'm lazy 0:43:34.000,0:43:35.000 as we've already established, 0:43:35.000,0:43:37.000 and I would like to get 0:43:37.000,0:43:41.000 some other people to join in my quest to answer these burning questions and to 0:43:41.000,0:43:43.000 solve the survey. 0:43:43.000,0:43:45.000 So what I do is I round up ten people let's say, 0:43:45.000,0:43:49.000 and I say would you help me, and I decide to divide the campus into kind of ten 0:43:49.000,0:43:51.000 partitions. And I say if you 0:43:51.000,0:43:54.000 could survey all the people whose names begin with A, B, and C, that would really help. 0:43:54.000,0:43:56.000 And if you could do [inaudible] Gs, and 0:43:56.000,0:44:00.000 if you would do -and if I divide up the alphabet that way, give each of them two or three letters, and I 0:44:00.000,0:44:02.000 say if you would go get the data, it'd be really 0:44:02.000,0:44:05.000 easy for me to do my job then. If I just took all their 0:44:05.000,0:44:08.000 data [inaudible]. Well, being 0:44:08.000,0:44:12.000 the kind of lazy person that I am, it's likely that the ten people I recruit would have similar lazy qualities 0:44:12.000,0:44:15.000 because lazy people hang out with other lazy people. 0:44:15.000,0:44:18.000 And so the person who was in charge of A, B, C, the first thing they do is turn around and find ten 0:44:18.000,0:44:22.000 more friends, and then they divide it up and say could you do the 0:44:22.000,0:44:23.000 AA through AM and so on. 0:44:23.000,0:44:25.000 If they divide it into these pools of 0:44:25.000,0:44:30.000 one tenth of what they were responsible for, and say you can go get the information from these people, 0:44:30.000,0:44:31.000 and if they did the same thing 0:44:31.000,0:44:35.000 -so if everybody along the road. We started with 10,000. Now each person had 1,000 0:44:35.000,0:44:39.000 to survey. They asked their friend to do 100. Their friend asked ten people to do 0:44:39.000,0:44:39.000 ten. 0:44:39.000,0:44:43.000 And then at some point, the person who has ten says well I just need to ask these ten people. 0:44:43.000,0:44:45.000 0:44:45.000,0:44:47.000 Once I get their data, we don't need to do anything further. 0:44:47.000,0:44:50.000 So at some point the problem becomes so small, so simple, 0:44:50.000,0:44:55.000 even though it was kind of the same problem all along. I just reproduced the same problem we had, but in 0:44:55.000,0:44:58.000 a slightly more tractable form, but then I divided it around. Divide and conquer 0:44:58.000,0:45:00.000 sometimes they call this to where 0:45:00.000,0:45:01.000 I spread out the work 0:45:01.000,0:45:05.000 around a bunch of people to where each person's contribution is just a little part of the whole. 0:45:05.000,0:45:09.000 You had to find the ten volunteers around underneath you and 0:45:09.000,0:45:11.000 get their 0:45:11.000,0:45:14.000 help in solving the problem, but nobody had to do much work, and that's kind of a really 0:45:14.000,0:45:18.000 interesting way to solve a problem. It sounds like a very big problem of surveying 0:45:18.000,0:45:19.000 10,000 people, 0:45:19.000,0:45:23.000 but by dividing and conquer, everybody has a little tiny role to play. You leverage a lot 0:45:23.000,0:45:24.000 of people getting it done, 0:45:24.000,0:45:29.000 and there is this self-similarity to it, which is kind of intriguing that 0:45:29.000,0:45:34.000 everybody is trying to solve the same problem but just at different levels of scale. 0:45:34.000,0:45:36.000 And so this idea 0:45:36.000,0:45:38.000 applies to things like phone trees rather than 0:45:38.000,0:45:42.000 trying to get the message out to everybody in my class, it might be that I call two people who call two 0:45:42.000,0:45:45.000 friends who call two friends until everybody gets covered. 0:45:45.000,0:45:48.000 Nobody does the whole job. Everybody just does a little part of it. Sometimes you'll see these 0:45:48.000,0:45:50.000 fractal drawings where 0:45:50.000,0:45:55.000 there is a large leaf which when you look closer actually consists of littler leaves, 0:45:55.000,0:45:57.000 which themselves are littler leaves, so that at 0:45:57.000,0:45:58.000 every level of scale 0:45:58.000,0:46:01.000 the same image is being reproduced, 0:46:01.000,0:46:05.000 and the result kind of on the outside is something that in itself if you look deeper has the 0:46:05.000,0:46:08.000 same problem but smaller embedded in it. 0:46:08.000,0:46:10.000 So it's a pretty neat 0:46:10.000,0:46:11.000 sort of 0:46:11.000,0:46:13.000 way of solving things. 0:46:13.000,0:46:14.000 I am 0:46:14.000,0:46:18.000 gonna tell you a little bit about how the code looks, and then I really am not gonna be able to show you much code today. 0:46:18.000,0:46:23.000 I think it's actually even better to show you this code on Monday when we can come back fresh, 0:46:23.000,0:46:27.000 but that it involves taking -So we're looking at functional recursion first, 0:46:27.000,0:46:29.000 and functional recursion is 0:46:29.000,0:46:33.000 taking some sort of functions, a function that takes an input and returns an answers, returns 0:46:33.000,0:46:36.000 non-void thing that comes back. 0:46:36.000,0:46:39.000 And we're gonna be looking at problems where you're trying to solve this big problem, 0:46:39.000,0:46:44.000 and that if you had the answer to making a call to yourself on a smaller version of 0:46:44.000,0:46:45.000 the problem 0:46:45.000,0:46:48.000 -maybe one call, maybe two calls, maybe several calls 0:46:48.000,0:46:52.000 -that you could add those, or multiply those, or combine those in some way to answer the 0:46:52.000,0:46:53.000 bigger problem. So 0:46:53.000,0:46:56.000 if I were trying to solve 0:46:56.000,0:47:03.000 this campus survey, having the answers to these smaller campus surveys gives me that total result. And 0:47:03.000,0:47:05.000 so the -I really should not try to do this in two minutes. What 0:47:05.000,0:47:09.000 I should do is try to tell you on Monday. We'll come back and we'll talk about recursion, and it 0:47:09.000,0:47:11.000 will be an impressive week. Meanwhile, work on 0:47:11.000,0:47:14.000 your ADT homework.