WEBVTT 00:00:00.000 --> 00:00:08.000 00:00:08.000 --> 00:00:11.000 00:00:11.000 --> 00:00:22.000 This presentation is delivered by the Stanford Center for Professional Development. 00:00:22.000 --> 00:00:25.000 Okay, so thank you for coming out in the rain. 00:00:25.000 --> 00:00:26.000 I 00:00:26.000 --> 00:00:27.000 appreciate you 00:00:27.000 --> 00:00:29.000 trudging over here and 00:00:29.000 --> 00:00:30.000 risking 00:00:30.000 --> 00:00:31.000 your health and life. 00:00:31.000 --> 00:00:37.000 So I'm gonna do a little diversion here to explain this idea of functions as data before I come back 00:00:37.000 --> 00:00:40.000 to fixing the problem we had kind of hit upon at the very end of the last lecture 00:00:40.000 --> 00:00:42.000 about set. 00:00:42.000 --> 00:00:44.000 So what I'm just gonna show you is 00:00:44.000 --> 00:00:45.000 a little bit about 00:00:45.000 --> 00:00:46.000 00:00:46.000 --> 00:00:49.000 the design of something in C++ 00:00:49.000 --> 00:00:52.000 that is gonna helpful in solving the problem we'd run into. 00:00:52.000 --> 00:00:55.000 What I'm gonna show you here is two pieces of code that plot 00:00:55.000 --> 00:00:57.000 a 00:00:57.000 --> 00:01:01.000 single value function across the number line. So in this case, the function is that it's the 00:01:01.000 --> 00:01:05.000 top one is applying the sine function, the sine wave as it varies across, and then the 00:01:05.000 --> 00:01:06.000 square root function 00:01:06.000 --> 00:01:09.000 which goes up and has a sloping curve to it, 00:01:09.000 --> 00:01:13.000 and that both of these have exactly the same behaviors. They're designed to kind of go into the graphics 00:01:13.000 --> 00:01:17.000 window, and they're just using a kind of a very simple kind of hand moving strategy. It 00:01:17.000 --> 00:01:21.000 starts on a particular location based on what the value of sine is here, 00:01:21.000 --> 00:01:23.000 and then over a particular interval 00:01:23.000 --> 00:01:24.000 at 0.1 00:01:24.000 --> 00:01:28.000 of one tenth of an inch, it plots the next point and connects a line between 00:01:28.000 --> 00:01:31.000 them. So it does a simple kind of line approximation of what 00:01:31.000 --> 00:01:34.000 that function looks like over the interval you asked it to plot. 00:01:34.000 --> 00:01:37.000 The same code is being used here for plotting the square root. 00:01:37.000 --> 00:01:41.000 And the thing to note and I tried to highlight by putting it in blue here was that every other 00:01:41.000 --> 00:01:45.000 part of the code is exactly the same except for those two calls where I need to know what's 00:01:45.000 --> 00:01:48.000 the value of square root starting at this particular 00:01:48.000 --> 00:01:51.000 X. So starting at Value 1, 00:01:51.000 --> 00:01:53.000 what is sine of one? What is square root of one? As I 00:01:53.000 --> 00:01:57.000 work my way across the interval, what's a square root of 1.1, 1.2, 1.3 and 00:01:57.000 --> 00:01:59.000 sine of those same values? 00:01:59.000 --> 00:02:03.000 And so everything about the code is functionally identical. It's kind of frustrating 00:02:03.000 --> 00:02:07.000 to look at it, though, and realize if I wanted to plot a bunch of other things -I also wanna be able to plot 00:02:07.000 --> 00:02:09.000 the cosine function, or 00:02:09.000 --> 00:02:12.000 some other function of my own creation 00:02:12.000 --> 00:02:13.000 -across this interval 00:02:13.000 --> 00:02:17.000 that I keep having to copy and paste this code and duplicate it. 00:02:17.000 --> 00:02:21.000 And so one of the things that hopefully your 106A and prior experiences really 00:02:21.000 --> 00:02:24.000 heightened your attention to is 00:02:24.000 --> 00:02:26.000 if I have the same piece of code 00:02:26.000 --> 00:02:30.000 in multiple places, I ought to be able to unify it. I ought to be able to make there be a plot 00:02:30.000 --> 00:02:31.000 function 00:02:31.000 --> 00:02:36.000 that can handle both sine and square root without actually having to distinguish it by copy and pasting, 00:02:36.000 --> 00:02:38.000 so I wanna unify these two. 00:02:38.000 --> 00:02:43.000 And so there is -the mechanism that we're gonna use kinda follows naturally if you don't let yourself get 00:02:43.000 --> 00:02:45.000 too tripped up by what it means. 00:02:45.000 --> 00:02:48.000 Just imagine that the parameter for example going into the function right now are the start and the stop, 00:02:48.000 --> 00:02:49.000 the interval 00:02:49.000 --> 00:02:52.000 from X is one to five. 00:02:52.000 --> 00:02:55.000 What we'd like to do is further parameterize the function. 00:02:55.000 --> 00:02:58.000 We'd like to add a third argument to it, which is to say 00:02:58.000 --> 00:03:03.000 and when you're ready to know what function you're plotting, here's the one to use. 00:03:03.000 --> 00:03:04.000 I'd like you to plot 00:03:04.000 --> 00:03:08.000 sine over the interval start to stop. I'd like you to plot square root over that. So we added the third argument, 00:03:08.000 --> 00:03:11.000 which was the function we wanted to invoke there. 00:03:11.000 --> 00:03:16.000 Then we would be able to unify this down to where we had one generic plot function. 00:03:16.000 --> 00:03:19.000 The good news is that this does actually 00:03:19.000 --> 00:03:20.000 have 00:03:20.000 --> 00:03:23.000 support features in the C++ language that let us do this. 00:03:23.000 --> 00:03:26.000 The syntax for it is a little bit unusual, and 00:03:26.000 --> 00:03:29.000 if you think about it too much, it can actually make your head hurt a little bit, think about what 00:03:29.000 --> 00:03:33.000 it's doing. But you can actually use a function, a function's name 00:03:33.000 --> 00:03:35.000 and the code that is associated with that name, 00:03:35.000 --> 00:03:37.000 as a piece of data 00:03:37.000 --> 00:03:41.000 that not just -you think of the code as we're calling this function, and 00:03:41.000 --> 00:03:44.000 we're moving these things around, and executing things. 00:03:44.000 --> 00:03:47.000 The things that we tend to be executing and operating on you think of as being integers, and strings, 00:03:47.000 --> 00:03:49.000 and characters, and files. 00:03:49.000 --> 00:03:52.000 But you can also extend your notion of what's data 00:03:52.000 --> 00:03:54.000 to include the code you wrote 00:03:54.000 --> 00:03:56.000 as part of the possibilities. 00:03:56.000 --> 00:03:59.000 So in this case, I've added that third argument that I wanted to plot, 00:03:59.000 --> 00:04:02.000 and this 00:04:02.000 --> 00:04:06.000 syntax here that's a little bit unusual to you, and I'll kind of identify it, is that the 00:04:06.000 --> 00:04:09.000 name of the parameter is actually FN. 00:04:09.000 --> 00:04:12.000 Its name is enclosed in parentheses there. 00:04:12.000 --> 00:04:16.000 And then to the right would be a list of the arguments 00:04:16.000 --> 00:04:17.000 or the prototype 00:04:17.000 --> 00:04:21.000 information about this function, and on the left is its return value. 00:04:21.000 --> 00:04:24.000 So what this says is you have two doubles, the start and stop, 00:04:24.000 --> 00:04:26.000 and the third thing in there isn't a double at all. It is a 00:04:26.000 --> 00:04:27.000 function 00:04:27.000 --> 00:04:32.000 of one double that returns a double, so a single value function that operates on doubles 00:04:32.000 --> 00:04:33.000 here. 00:04:33.000 --> 00:04:36.000 That is the syntax in C++ for specifying what you want coming in here 00:04:36.000 --> 00:04:37.000 is 00:04:37.000 --> 00:04:41.000 not a double, not a ray of doubles, anything funky like that. It is a function of a 00:04:41.000 --> 00:04:43.000 double that returns a double. 00:04:43.000 --> 00:04:45.000 And then in the body of this code, 00:04:45.000 --> 00:04:47.000 when I say FN here 00:04:47.000 --> 00:04:50.000 where I would've said sine, or square root, or identified 00:04:50.000 --> 00:04:51.000 a particular function, 00:04:51.000 --> 00:04:53.000 it's using the parameter 00:04:53.000 --> 00:04:57.000 that was passed in by the client, so it says call the client's function 00:04:57.000 --> 00:04:58.000 passing these values 00:04:58.000 --> 00:05:01.000 to plot over the range 00:05:01.000 --> 00:05:03.000 using their function 00:05:03.000 --> 00:05:07.000 in. So the idea is that valid calls to plot now become things like plot -and then give it an 00:05:07.000 --> 00:05:09.000 interval, zero to two, 00:05:09.000 --> 00:05:11.000 and you give the name of a function: 00:05:11.000 --> 00:05:15.000 the sine function, which comes out of the math library, the square root function, 00:05:15.000 --> 00:05:16.000 also in the math library. 00:05:16.000 --> 00:05:19.000 It could be that the function is something you wrote yourself, 00:05:19.000 --> 00:05:21.000 the my function. 00:05:21.000 --> 00:05:25.000 In order for this to be valid though, you can't just put any old function name there. It is 00:05:25.000 --> 00:05:27.000 actually being quite specific about it 00:05:27.000 --> 00:05:30.000 that plot was to find [inaudible]. It took a double, returned a double. That's the kind of function 00:05:30.000 --> 00:05:33.000 that you can give it, so any function that has that 00:05:33.000 --> 00:05:36.000 prototype, so it matches that 00:05:36.000 --> 00:05:40.000 format is an acceptable one to pass in. If you try to pass something that actually 00:05:40.000 --> 00:05:42.000 just does some other kind of function, 00:05:42.000 --> 00:05:45.000 it doesn't have the same prototype, so the get line that takes no arguments and returns a string just 00:05:45.000 --> 00:05:47.000 doesn't match on any front. 00:05:47.000 --> 00:05:51.000 And if I try to say here, plot the get line function of the integral two to five, 00:05:51.000 --> 00:05:53.000 it will quite rightfully complain to me 00:05:53.000 --> 00:05:54.000 00:05:54.000 --> 00:05:56.000 that that just doesn't make sense. 00:05:56.000 --> 00:05:57.000 That's because it doesn't match. 00:05:57.000 --> 00:06:02.000 So a little bit of syntax here, but actually kind of a very powerful thing, and it allows us to write to 00:06:02.000 --> 00:06:05.000 an addition to kind of parameterizing on these things you think of as traditional 00:06:05.000 --> 00:06:07.000 data, integers, and strings, and whatnot. It's 00:06:07.000 --> 00:06:12.000 also say as part of your operations, you may need to make a call out to some other function. 00:06:12.000 --> 00:06:15.000 Let's leave it open what that function is and allow the client to specify what function 00:06:15.000 --> 00:06:20.000 to call at that time. So in this case for a plot, what function that you're trying to plot, let the client 00:06:20.000 --> 00:06:21.000 tell you, 00:06:21.000 --> 00:06:29.000 and then based on what they ask you to plot you can plot different things. All right. Way in the back. Is 00:06:29.000 --> 00:06:34.000 there a similar setup for multivariable functions [inaudible]? Certainly. So all I would 00:06:34.000 --> 00:06:37.000 need to do if this was a function that took a couple arguments, I would say double comma double comma int. If it 00:06:37.000 --> 00:06:42.000 returned void, [inaudible] returned. Sometimes it looks a little bit like the prototype kinda taken out of context 00:06:42.000 --> 00:06:43.000 and stuffed in there, 00:06:43.000 --> 00:06:46.000 and then those parens around the function are a very important part of that 00:06:46.000 --> 00:06:50.000 which is telling you yeah, this is a function of these 00:06:50.000 --> 00:06:52.000 with this prototype information. 00:06:52.000 --> 00:06:54.000 Behind you. No? You're good now. Somebody else? Over here? Is that FN a fixed 00:06:54.000 --> 00:06:56.000 name 00:06:56.000 --> 00:06:57.000 00:06:57.000 --> 00:07:01.000 [inaudible]? No. It's just like any parameter name. You get to pick it. So I could've called it plot function, my function, 00:07:01.000 --> 00:07:08.000 your function, whatever I wanted. Here 00:07:08.000 --> 00:07:12.000 in the front. [Inaudible] Java? So Java doesn't really have a similar mechanism that looks like this. C does, so C++ inherits it 00:07:12.000 --> 00:07:14.000 from 00:07:14.000 --> 00:07:18.000 C. There are other ways you try to accomplish this in Java. It tries to support the same functionality in the end, but it 00:07:18.000 --> 00:07:19.000 00:07:19.000 --> 00:07:22.000 uses a pretty different approach than a functions as 00:07:22.000 --> 00:07:27.000 data approach. [Inaudible]? 00:07:27.000 --> 00:07:28.000 Can you pass like a 00:07:28.000 --> 00:07:31.000 method, like an operator? So typically not. 00:07:31.000 --> 00:07:35.000 This syntax that's being used here is for a free function, a function 00:07:35.000 --> 00:07:38.000 that's kind of out in the global namespace, that level. 00:07:38.000 --> 00:07:42.000 There is a different syntax for passing a method, which is a little bit more messy, 00:07:42.000 --> 00:07:46.000 and we won't tend to need it, so I won't go there with you, but it does exist. It just as it stands 00:07:46.000 --> 00:07:50.000 does not want a method. It wants a function. So 00:07:50.000 --> 00:07:54.000 a method meaning member function of a class. 00:07:54.000 --> 00:07:56.000 Okay, so let me -that was kind of just to 00:07:56.000 --> 00:08:00.000 set the groundwork for the problem we were trying to solve in set, 00:08:00.000 --> 00:08:05.000 which was set is holding this collection of elements that the client has stuffed in there. It's 00:08:05.000 --> 00:08:05.000 a generic 00:08:05.000 --> 00:08:07.000 templative class, so it doesn't 00:08:07.000 --> 00:08:11.000 have any preconceived notion about what's being stored. Are the strings? Are they student structures? 00:08:11.000 --> 00:08:13.000 Are they integers? 00:08:13.000 --> 00:08:17.000 And that in order to perform its operations efficiently, it actually is using this 00:08:17.000 --> 00:08:22.000 notion of ordering, keeping them in an order so that it can iterate an order. It can quickly find on the basis 00:08:22.000 --> 00:08:27.000 of using order to quickly decide where something has to be and if it's present in the collection. 00:08:27.000 --> 00:08:29.000 So how does it know how to compare 00:08:29.000 --> 00:08:31.000 something that's of unknown type? 00:08:31.000 --> 00:08:36.000 Well, what it does is it has a default strategy. It makes an assumption that 00:08:36.000 --> 00:08:39.000 if I used equals equals and less than that would tell me kinda where to go. 00:08:39.000 --> 00:08:42.000 And so it's got this idea that it wants to know. We'll give it two things. 00:08:42.000 --> 00:08:47.000 Are they the same thing? In which case, it uses kinda the zero to show the ordering between them. If one 00:08:47.000 --> 00:08:51.000 precedes the other, it wants to have some negative number that says well this one precedes it, or some 00:08:51.000 --> 00:08:53.000 positive number 00:08:53.000 --> 00:08:54.000 if it follows it. 00:08:54.000 --> 00:08:57.000 So it applies this operation to the [inaudible] that 00:08:57.000 --> 00:09:02.000 are there, and for strings, and ints, and doubles, and characters, 00:09:02.000 --> 00:09:04.000 this works perfectly well. 00:09:04.000 --> 00:09:08.000 And so that's how without us going out of our way, we can have sets of things 00:09:08.000 --> 00:09:11.000 that respond to the built in relational operators 00:09:11.000 --> 00:09:14.000 without any special effort as a client. 00:09:14.000 --> 00:09:17.000 But what we can get into trouble with right is when 00:09:17.000 --> 00:09:20.000 equals equals less than don't make sense for type. 00:09:20.000 --> 00:09:24.000 So let me just go actually type some code, and I'll show you. I have it on the slide, but I'm gonna -wow, 00:09:24.000 --> 00:09:29.000 this chair suddenly got really short. [Inaudible] fix that. Okay. 00:09:29.000 --> 00:09:30.000 00:09:30.000 --> 00:09:33.000 We'll go over here because I think it's better just to see it really happening, 00:09:33.000 --> 00:09:36.000 so I'm gonna ignore this piece of code because it's not what I wanted. 00:09:36.000 --> 00:09:40.000 But if I make some student T structure, 00:09:40.000 --> 00:09:40.000 and it's got 00:09:40.000 --> 00:09:44.000 the first and last name, and it's got the ID number, 00:09:44.000 --> 00:09:48.000 and maybe that's all I want for now 00:09:48.000 --> 00:09:48.000 -that if 00:09:48.000 --> 00:09:51.000 down in my main -can't 00:09:51.000 --> 00:09:54.000 find my main. There it is. I'm 00:09:54.000 --> 00:09:56.000 gonna need that piece of code later, so I'm gonna leave it there. 00:09:56.000 --> 00:09:58.000 If I make a set, 00:09:58.000 --> 00:10:05.000 and I say I'd like a set of students -my class 00:10:05.000 --> 00:10:09.000 -and I do this. And so I feel like I haven't gotten [inaudible]. I made this structure. I 00:10:09.000 --> 00:10:14.000 say I'd like to make a set of students, that each student is in the class exactly once and I don't need any duplicates, 00:10:14.000 --> 00:10:18.000 and I go through the process of trying to compile this, it's gonna give me some complaints. 00:10:18.000 --> 00:10:21.000 And its complaint, which is a little bit hard to see up here, is 00:10:21.000 --> 00:10:26.000 there's no match for operator equals equals in one, equals equals two, and operator less than in 00:10:26.000 --> 00:10:27.000 one equals 00:10:27.000 --> 00:10:30.000 two. So it's kind of showing me another piece of code that's kind of a 00:10:30.000 --> 00:10:35.000 hidden piece of code I haven't actually seen directly, which is this operator compare function. 00:10:35.000 --> 00:10:37.000 And that is the one that the set is using, as 00:10:37.000 --> 00:10:40.000 it has this idea of what's the way it should compare two things. It says well I have some of this 00:10:40.000 --> 00:10:46.000 operator compare that works generically on any type of things using equals equals and less than. And 00:10:46.000 --> 00:10:50.000 if I click up here on the instantiate from here, it's gonna help me to understand what caused 00:10:50.000 --> 00:10:52.000 this problem. 00:10:52.000 --> 00:10:53.000 The problem was caused by 00:10:53.000 --> 00:10:57.000 trying to create a set who was holding student T. 00:10:57.000 --> 00:11:00.000 And so this gives you a little bit of an insight on how the template 00:11:00.000 --> 00:11:02.000 operations are handled by the compiler, that 00:11:02.000 --> 00:11:06.000 I have built this whole set class, and it depends on there being equals equals and less than 00:11:06.000 --> 00:11:08.000 working for the type. 00:11:08.000 --> 00:11:13.000 The fact that it didn't work for student T wasn't a cause for alarm until 00:11:13.000 --> 00:11:15.000 I actually tried to instantiate 00:11:15.000 --> 00:11:18.000 it. So at the point where I said I'd like to make a set holding student 00:11:18.000 --> 00:11:22.000 T, the first point where the compiler actually goes through the process of generating a whole set 00:11:22.000 --> 00:11:25.000 class, the set angle brackets student T, 00:11:25.000 --> 00:11:29.000 filling in all the details, kinda working it all out, making the add, and contains, and whatever 00:11:29.000 --> 00:11:32.000 operations, and then in the process of those, those 00:11:32.000 --> 00:11:36.000 things are making calls that try to take student T objects and compare them using less than and 00:11:36.000 --> 00:11:37.000 equals equals. 00:11:37.000 --> 00:11:39.000 And that causes that code to fail. 00:11:39.000 --> 00:11:42.000 So the code that's really failing is kind of somewhere in the class library, 00:11:42.000 --> 00:11:44.000 but it's failing because 00:11:44.000 --> 00:11:49.000 the things that we're passing through and instantiating for don't work with that 00:11:49.000 --> 00:11:50.000 setup. 00:11:50.000 --> 00:11:52.000 So it's something we've gotta 00:11:52.000 --> 00:11:54.000 fix, we've gotta do something about. 00:11:54.000 --> 00:11:59.000 Let me go back here and say what am I gonna do. Well, so 00:11:59.000 --> 00:12:01.000 there's my error message. Same one, 00:12:01.000 --> 00:12:04.000 right? Saying yeah, you can't do that 00:12:04.000 --> 00:12:05.000 with a [inaudible]. Well, what 00:12:05.000 --> 00:12:06.000 we do 00:12:06.000 --> 00:12:09.000 is we use this notion of functions as data 00:12:09.000 --> 00:12:12.000 to work out a solution for this problem 00:12:12.000 --> 00:12:15.000 that if you think about kinda what's going on, the set actually knows everything about how 00:12:15.000 --> 00:12:19.000 to store things. It's a very fancy efficient structure that says given your things, keeps them 00:12:19.000 --> 00:12:20.000 in order, and 00:12:20.000 --> 00:12:22.000 it manages to update, and 00:12:22.000 --> 00:12:24.000 insert, and search that thing very efficiently. 00:12:24.000 --> 00:12:28.000 But it doesn't know given any two random things how to compare them other than this assumption 00:12:28.000 --> 00:12:32.000 it was making about less than and equals equals being a way to tell. 00:12:32.000 --> 00:12:37.000 If it wants to have sort of a more sophisticated handling of that, what it needs to do is cooperate 00:12:37.000 --> 00:12:38.000 with the client -that 00:12:38.000 --> 00:12:43.000 the implementer of the set can't do it all. So there's these two programmers that 00:12:43.000 --> 00:12:44.000 need to work in harmony. 00:12:44.000 --> 00:12:50.000 So what the set does is it allows for the client to specify by providing a function. 00:12:50.000 --> 00:12:54.000 It says well when I need to compare two things, how about you give me the name of a function 00:12:54.000 --> 00:12:58.000 that when given two elements will return to me their ordering, this integer zero, 00:12:58.000 --> 00:13:00.000 negative, positive that tells me how to put them in place. 00:13:00.000 --> 00:13:05.000 And so the set kind of writes all of its operations in terms of well there's some function I can call, this function that 00:13:05.000 --> 00:13:07.000 will compare two things. 00:13:07.000 --> 00:13:11.000 If they don't specify one, I'll use this default one that maps them to the relationals, but if they do give 00:13:11.000 --> 00:13:15.000 me one, I'll just ask them to do the comparison. And so then as its doing its searching and inserting 00:13:15.000 --> 00:13:16.000 and whatnot, 00:13:16.000 --> 00:13:19.000 it's calling back. We call that calling back to the client. 00:13:19.000 --> 00:13:22.000 So the client writes a function. If I wanna put 00:13:22.000 --> 00:13:25.000 student Ts into a set, 00:13:25.000 --> 00:13:28.000 then I need to say when you compare two student Ts, what do you look at to know if they're 00:13:28.000 --> 00:13:29.000 the same or how to order them. 00:13:29.000 --> 00:13:31.000 So maybe I'm gonna order them by ID number. 00:13:31.000 --> 00:13:33.000 Maybe I'm gonna use their first and last name. 00:13:33.000 --> 00:13:37.000 Whatever it means for two things to be equal and have some sense of order, 00:13:37.000 --> 00:13:38.000 I supply 00:13:38.000 --> 00:13:40.000 -I write the function, 00:13:40.000 --> 00:13:43.000 and then I pass it to the set constructor. 00:13:43.000 --> 00:13:44.000 I say here's the function to use. 00:13:44.000 --> 00:13:49.000 The set will hold on to that function. So I say here's the compare student structure function. 00:13:49.000 --> 00:13:52.000 It holds onto that name, and when needed it calls back. 00:13:52.000 --> 00:13:53.000 It says I'm 00:13:53.000 --> 00:13:57.000 about to go look for a student structure. Is this the one? Well, I don't know if two student structures are 00:13:57.000 --> 00:14:01.000 the same. I'll ask the client. Here's two student structures. Are they the same? 00:14:01.000 --> 00:14:02.000 And then 00:14:02.000 --> 00:14:06.000 as needed, it'll keep looking or 00:14:06.000 --> 00:14:09.000 insert and add and do whatever I need to do. 00:14:09.000 --> 00:14:13.000 So let's go back over here. 00:14:13.000 --> 00:14:19.000 I'll write a little function. So the 00:14:19.000 --> 00:14:21.000 prototype for it is it takes two elem Ts 00:14:21.000 --> 00:14:24.000 and it returns an int. 00:14:24.000 --> 00:14:31.000 That int is expected to have a value zero if they are the same, 00:14:31.000 --> 00:14:33.000 and a value that is negative if 00:14:33.000 --> 00:14:37.000 the first argument precedes the second. So if A is less than B, returns some negative 00:14:37.000 --> 00:14:41.000 thing. You can return negative one, or negative 100, or negative one million, 00:14:41.000 --> 00:14:42.000 but 00:14:42.000 --> 00:14:45.000 you need to return some negative value. And then if A [cuts 00:14:45.000 --> 00:14:47.000 out] some ordering, so they're not equal 00:14:47.000 --> 00:14:51.000 [cuts out] later, it will return some positive value: 1, 10, 6 million. 00:14:51.000 --> 00:14:55.000 So if I do [cuts out] say I use ID num as my comparison. 00:14:55.000 --> 00:14:56.000 Based on [cuts 00:14:56.000 --> 00:14:58.000 out] are the same, if they are 00:14:58.000 --> 00:15:01.000 I can return zero. And if 00:15:01.000 --> 00:15:03.000 ID num of A 00:15:03.000 --> 00:15:07.000 is less than the ID num of B, I can 00:15:07.000 --> 00:15:08.000 return negative one, and then in 00:15:08.000 --> 00:15:11.000 the other case, I'll return one. 00:15:11.000 --> 00:15:14.000 So it will compare them on the basis [cuts out] 00:15:14.000 --> 00:15:16.000 figuring that the name field 00:15:16.000 --> 00:15:17.000 at that point is 00:15:17.000 --> 00:15:19.000 nothing new. 00:15:19.000 --> 00:15:24.000 And then the way I use that is over here when I'm constructing it is there is 00:15:24.000 --> 00:15:27.000 [cuts out] to the constructor, and so that's how I would pass [cuts out] add 00:15:27.000 --> 00:15:29.000 parens to the [cuts 00:15:29.000 --> 00:15:35.000 out] as I'm declaring it, and then I pass the name. Do I call it compare student or compare students? I can't remember. Compare student. Okay. 00:15:35.000 --> 00:15:38.000 And 00:15:38.000 --> 00:15:42.000 this then -so now there's nothing going on in the code -causes it to compile, 00:15:42.000 --> 00:15:45.000 and that if you were to put 00:15:45.000 --> 00:15:50.000 let's say a see out statement in your comparison function just for fun, you would find out as 00:15:50.000 --> 00:15:53.000 you were doing adds, and compares, and removes, and whatnot on this set 00:15:53.000 --> 00:15:58.000 that you would see that your call kept being made. It kept calling back to you as the 00:15:58.000 --> 00:16:01.000 client saying I need to compare these things. I need to compare these things to decide where to put something, 00:16:01.000 --> 00:16:03.000 whether it had something, 00:16:03.000 --> 00:16:04.000 and whatnot. 00:16:04.000 --> 00:16:08.000 And then based on your ordering, that would control for example how the iterator worked, that 00:16:08.000 --> 00:16:12.000 the smallest one according to your function would be the one first returned by your 00:16:12.000 --> 00:16:14.000 iterator, and it would move through larger 00:16:14.000 --> 00:16:17.000 or sort of later in the ordering ones until the end. So 00:16:17.000 --> 00:16:20.000 it's a very powerful mechanism 00:16:20.000 --> 00:16:24.000 that's at work here because it means that for 00:16:24.000 --> 00:16:28.000 anything you wanna put in a set, as long as you're willing to say how it is you compare 00:16:28.000 --> 00:16:28.000 it, 00:16:28.000 --> 00:16:33.000 then the set will take over and do the very efficient storing, and searching, and organizing 00:16:33.000 --> 00:16:34.000 of that. 00:16:34.000 --> 00:16:38.000 But you, the only piece you need to supply is this one little thing it can't figure 00:16:38.000 --> 00:16:42.000 out for itself, which is given your type of thing, how do you compare 00:16:42.000 --> 00:16:46.000 it. For the built in types string, and int, and double, and car, 00:16:46.000 --> 00:16:51.000 it does have a default comparison function, that one that was called operator compare. Let 00:16:51.000 --> 00:16:55.000 me go open it for you 00:16:55.000 --> 00:16:56.000 [inaudible] the 106. 00:16:56.000 --> 00:16:58.000 So there is a 00:16:58.000 --> 00:17:00.000 compare function dot H, 00:17:00.000 --> 00:17:02.000 and this is actually what 00:17:02.000 --> 00:17:05.000 the default version of it looks. It's actually written as a template itself 00:17:05.000 --> 00:17:07.000 that given two things it 00:17:07.000 --> 00:17:08.000 just 00:17:08.000 --> 00:17:11.000 turns around and asks the built in operators to help us out with that. 00:17:11.000 --> 00:17:15.000 And that is the name that's being used 00:17:15.000 --> 00:17:17.000 if I open the set 00:17:17.000 --> 00:17:20.000 and you look at its constructor call. I had said that I would come back and tell you about 00:17:20.000 --> 00:17:22.000 what this was 00:17:22.000 --> 00:17:25.000 that the argument going into the set constructor 00:17:25.000 --> 00:17:31.000 is one parameter whose name is CMP function that takes two elem type things -so here's the 00:17:31.000 --> 00:17:33.000 one in the example of the two argument prototype 00:17:33.000 --> 00:17:38.000 -returns an int, and then it uses a default assignment for that of operator compare, the one we just 00:17:38.000 --> 00:17:39.000 looked at, 00:17:39.000 --> 00:17:41.000 so that if you don't specify it, it goes 00:17:41.000 --> 00:17:46.000 through and generates the standard comparison function for the type, 00:17:46.000 --> 00:17:47.000 which 00:17:47.000 --> 00:17:49.000 for built-ins will work fine, but for 00:17:49.000 --> 00:17:53.000 user defined things is gonna create compiler errors. 00:17:53.000 --> 00:17:56.000 So you can also choose if you don't like the default ordering works. 00:17:56.000 --> 00:18:01.000 So for example, if you wanted to build a set of strings that was case insensitive 00:18:01.000 --> 00:18:04.000 -so the default string handling would be to use equals equals and less than, which actually 00:18:04.000 --> 00:18:08.000 does care about case. It doesn't think that capital [inaudible] is the same as lower case. 00:18:08.000 --> 00:18:10.000 If you wanted it to consider them the same, 00:18:10.000 --> 00:18:14.000 you could supply your own. A compare case insensitively function took two strings, 00:18:14.000 --> 00:18:16.000 converted their case, and then compared them. 00:18:16.000 --> 00:18:19.000 And then when you establish a set of strings, instead of letting the default argument take over, 00:18:19.000 --> 00:18:24.000 go ahead and use your case insensitive compare, and then now you have a set of strings 00:18:24.000 --> 00:18:26.000 that operates case insensitively. 00:18:26.000 --> 00:18:28.000 So you can change 00:18:28.000 --> 00:18:31.000 the ordering, adapt the ordering, whatever you like for the primitives, as well as supply the necessary 00:18:31.000 --> 00:18:32.000 one for the 00:18:32.000 --> 00:18:34.000 things the built-ins 00:18:34.000 --> 00:18:35.000 don't have 00:18:35.000 --> 00:18:40.000 properties for. So then that's that piece of code right 00:18:40.000 --> 00:18:41.000 there. All right. So 00:18:41.000 --> 00:18:44.000 does that make sense? 00:18:44.000 --> 00:18:45.000 Well, now you know 00:18:45.000 --> 00:18:49.000 kind of the whole range of things that are available in the class library. All 00:18:49.000 --> 00:18:53.000 right, so we saw the four sequential containers, the vector, stack, queue, 00:18:53.000 --> 00:18:55.000 and the grid that you kinda indexed ordering, 00:18:55.000 --> 00:18:58.000 and kinda allowed you to throw things in and get them back out. 00:18:58.000 --> 00:19:02.000 We went through the map, which is the sort of fancy heavy lifter that does that key value 00:19:02.000 --> 00:19:04.000 lookup, and then we've seen the set, which does kind of 00:19:04.000 --> 00:19:06.000 aggregate collection management, 00:19:06.000 --> 00:19:10.000 and very efficient operations for kind of searching, retrieving, ordering, 00:19:10.000 --> 00:19:12.000 00:19:12.000 --> 00:19:13.000 joining with other kind of sets, and stuff 00:19:13.000 --> 00:19:16.000 that also has a lot of high utility. I wanna do 00:19:16.000 --> 00:19:19.000 one quick little program with you before I start recursion 00:19:19.000 --> 00:19:21.000 just because I think it's kinda cool 00:19:21.000 --> 00:19:23.000 is to talk a little about this idea of like once 00:19:23.000 --> 00:19:26.000 you have these ADTs, you can solve a lot of cool problems, and that's certainly what this 00:19:26.000 --> 00:19:29.000 week's assignment is about. It's like well here are these tasks that 00:19:29.000 --> 00:19:31.000 if you didn't have 00:19:31.000 --> 00:19:34.000 -So ADTs, just a reminder -I say this word as though everybody knows exactly what it means 00:19:34.000 --> 00:19:37.000 -is just the idea of an abstract data type. 00:19:37.000 --> 00:19:39.000 So an abstract data type is a data type 00:19:39.000 --> 00:19:42.000 that you think of in terms of what it provides as an abstraction. 00:19:42.000 --> 00:19:44.000 So a queue is this FIFO line, 00:19:44.000 --> 00:19:47.000 and how it works internally, what it's implemented as, we're not worried 00:19:47.000 --> 00:19:51.000 about it at all. We're only worried about the abstraction of enqueue and dequeue, and it coming first 00:19:51.000 --> 00:19:53.000 in first out. So we talk 00:19:53.000 --> 00:19:56.000 about ADTs, we say once we have a queue, a stack, or a vector, 00:19:56.000 --> 00:19:58.000 we know what those things do, 00:19:58.000 --> 00:20:00.000 what a mathematical set is about. 00:20:00.000 --> 00:20:03.000 We build on top of that. We write code 00:20:03.000 --> 00:20:06.000 that leverages those ADTs to do cool things 00:20:06.000 --> 00:20:10.000 without having to also manage the low level details of where the memory came from for these things, 00:20:10.000 --> 00:20:13.000 how they grow, how they search, how 00:20:13.000 --> 00:20:15.000 they store and organize the data. 00:20:15.000 --> 00:20:17.000 You just get to do real cool things. 00:20:17.000 --> 00:20:21.000 So you probably got a little taste of that at the end of 106A when you get to use the array list and the 00:20:21.000 --> 00:20:24.000 hashmap to do things. This set kinda just expands out 00:20:24.000 --> 00:20:26.000 to fill out some other niches 00:20:26.000 --> 00:20:28.000 where you can do a lot of really cool things 00:20:28.000 --> 00:20:30.000 because you have these things around to build on. 00:20:30.000 --> 00:20:34.000 So one of the things that happens a lot is you tend to do layered things, and you'll see a little bit of this 00:20:34.000 --> 00:20:38.000 in the assignment you're doing this week where it's not just a set of something, it's a set of a map 00:20:38.000 --> 00:20:39.000 of something, or 00:20:39.000 --> 00:20:43.000 a vector of queues, a map of set. So I gave you a couple of examples here of 00:20:43.000 --> 00:20:44.000 the things 00:20:44.000 --> 00:20:48.000 that might be useful. Like if you think of what a smoothie is, it's a set of things 00:20:48.000 --> 00:20:52.000 mixed together, some yogurt, and some different fruits, some wheatgrass, whatever it is you have in 00:20:52.000 --> 00:20:53.000 it. 00:20:53.000 --> 00:20:57.000 And that the menu for a smoothie shop is really just a bunch of those sets, so each set 00:20:57.000 --> 00:21:00.000 of ingredients is a particular smoothie they have, and then 00:21:00.000 --> 00:21:04.000 the set of all those sets is the menu that they post up on the board you can come 00:21:04.000 --> 00:21:06.000 in and order. 00:21:06.000 --> 00:21:07.000 The 00:21:07.000 --> 00:21:09.000 compiler tends to use a map 00:21:09.000 --> 00:21:13.000 to keep track of all the variables that are in scope. As you declare variables, it 00:21:13.000 --> 00:21:17.000 adds them to the map so that when you later use that name, it knows where to find it. 00:21:17.000 --> 00:21:19.000 Well, it also has to manage though 00:21:19.000 --> 00:21:21.000 not just one map, 00:21:21.000 --> 00:21:25.000 but the idea is as you enter and exit scopes, there is this 00:21:25.000 --> 00:21:27.000 layering of open scopes. 00:21:27.000 --> 00:21:32.000 So you have some open scope here. You go into a for loop where you open another scope. You add some 00:21:32.000 --> 00:21:33.000 new variables 00:21:33.000 --> 00:21:37.000 that when you look it actually shadows then at the nearest definition, so if you had two variables 00:21:37.000 --> 00:21:41.000 of the same, it needs to look at the one that's closest. 00:21:41.000 --> 00:21:44.000 And then when you exit that scope, it needs those variables to go away and 00:21:44.000 --> 00:21:48.000 no longer be accessible. So one model for that could be very much a stack of 00:21:48.000 --> 00:21:51.000 maps where each of those maps represents the scope 00:21:51.000 --> 00:21:52.000 that's active, 00:21:52.000 --> 00:21:56.000 a set of variable names, and maybe their types and information is stored in that map. And then 00:21:56.000 --> 00:21:58.000 you stack them up. As you 00:21:58.000 --> 00:22:01.000 open a scope, you push on a new empty map. 00:22:01.000 --> 00:22:03.000 You put things into it, and then 00:22:03.000 --> 00:22:06.000 maybe you enter another scope. You push on another new empty map. You stick things into it, but as you 00:22:06.000 --> 00:22:11.000 exit and hit those closing braces, you pop those things from the stack to get to 00:22:11.000 --> 00:22:13.000 this previous 00:22:13.000 --> 00:22:16.000 environment you were in. So let me do a 00:22:16.000 --> 00:22:19.000 little program with you. I just have 00:22:19.000 --> 00:22:22.000 this idea of how this would work, and we'll see if 00:22:22.000 --> 00:22:26.000 I can code something up. So I have -let's go 00:22:26.000 --> 00:22:28.000 back over 00:22:28.000 --> 00:22:32.000 here. I'm going to -this is the piece of code that just reads 00:22:32.000 --> 00:22:34.000 words, so that's a 00:22:34.000 --> 00:22:36.000 fine piece of code to have here. 00:22:36.000 --> 00:22:38.000 I have a file here -let 00:22:38.000 --> 00:22:39.000 me open 00:22:39.000 --> 00:22:43.000 it up for you -that just contains the contents of the Official Scrabble Players' Dictionary, 00:22:43.000 --> 00:22:45.000 Edition 2. 00:22:45.000 --> 00:22:47.000 It's got a lot of words it. 00:22:47.000 --> 00:22:48.000 It's pretty long. 00:22:48.000 --> 00:22:49.000 It's still loading. 00:22:49.000 --> 00:22:53.000 Let's go back and do something else while it's loading. 00:22:53.000 --> 00:22:54.000 It happened to have 00:22:54.000 --> 00:22:55.000 00:22:55.000 --> 00:22:58.000 about 120,000 words I think is what it would be right now. 00:22:58.000 --> 00:22:59.000 So it's busy loading, 00:22:59.000 --> 00:23:01.000 and I have this question for you. 00:23:01.000 --> 00:23:04.000 There are certain words that are anagrams of each other. 00:23:04.000 --> 00:23:06.000 00:23:06.000 --> 00:23:08.000 The word 00:23:08.000 --> 00:23:10.000 cheap can be anagrammed into the word peach, 00:23:10.000 --> 00:23:11.000 things like that. 00:23:11.000 --> 00:23:12.000 00:23:12.000 --> 00:23:13.000 And so I 00:23:13.000 --> 00:23:15.000 am curious for the 00:23:15.000 --> 00:23:16.000 Official Scrabble Players' Dictionary 00:23:16.000 --> 00:23:20.000 what -so if you imagine that some words can be anagrammed a couple times, five 00:23:20.000 --> 00:23:24.000 or six different words just on how you can rearrange the letters. I'm curious to know what the largest anagram 00:23:24.000 --> 00:23:28.000 cluster is in the Official Scrabble Players' Dictionary. So I'd like to know across 00:23:28.000 --> 00:23:30.000 all 127,000 words 00:23:30.000 --> 00:23:34.000 that they form little clusters, and I'd like to find out what's the biggest of 00:23:34.000 --> 00:23:39.000 those clusters. 00:23:39.000 --> 00:23:41.000 Okay. That's my goal. So 00:23:41.000 --> 00:23:45.000 here's what I've got going. I've got something that's gonna read them one by one. 00:23:45.000 --> 00:23:48.000 So let's brainstorm for a second. 00:23:48.000 --> 00:23:49.000 00:23:49.000 --> 00:23:51.000 I want a way to take a particular word and kinda stick 00:23:51.000 --> 00:23:55.000 it with its other anagram cluster friends. 00:23:55.000 --> 00:23:57.000 What's a way I might do that? 00:23:57.000 --> 00:24:03.000 Help me design my data structure. 00:24:03.000 --> 00:24:06.000 Help me out. [Inaudible]. I've 00:24:06.000 --> 00:24:08.000 got the word peach. 00:24:08.000 --> 00:24:10.000 Where should I stick it so I can ? You could treat each string as like a set of 00:24:10.000 --> 00:24:13.000 ? So I have this 00:24:13.000 --> 00:24:17.000 string, 00:24:17.000 --> 00:24:21.000 which represents the letters. I've 00:24:21.000 --> 00:24:22.000 got the word peach. I wanna be able to stick peach with cheap, 00:24:22.000 --> 00:24:26.000 so where should I stick peach in such a way that I could find it. And you've got this idea that 00:24:26.000 --> 00:24:27.000 00:24:27.000 --> 00:24:29.000 the letters there are a set. 00:24:29.000 --> 00:24:32.000 They're not quite a set, though. Be careful because the word banana has a 00:24:32.000 --> 00:24:36.000 couple As and a couple Ns, and so it's not that I'd really want it to come down to be 00:24:36.000 --> 00:24:39.000 the set BAN. I wouldn't wanna coalesce the duplicates on that, so I really do wanna preserve all the letters that 00:24:39.000 --> 00:24:43.000 are in there, 00:24:43.000 --> 00:24:46.000 but your idea's getting us somewhere. It's like there is this idea of kind of like for any 00:24:46.000 --> 00:24:49.000 particular word there's the 00:24:49.000 --> 00:24:53.000 collection of letters that it is formed from. 00:24:53.000 --> 00:24:56.000 And somehow if I could use that as an identifier 00:24:56.000 --> 00:24:58.000 in a way that was reliable 00:24:58.000 --> 00:25:01.000 -anybody 00:25:01.000 --> 00:25:02.000 got any 00:25:02.000 --> 00:25:03.000 ideas 00:25:03.000 --> 00:25:08.000 about what to do with that? If you did that a vector, each letter and then the frequency of each letter in the word? So I could certainly do that. I could build kind of a vector that 00:25:08.000 --> 00:25:12.000 had kinda frequencies, that had this little struct that maybe it was number of times it occurs. 00:25:12.000 --> 00:25:15.000 Then I could try to build something on the basis of that vector that was like 00:25:15.000 --> 00:25:18.000 here is -do these two vectors 00:25:18.000 --> 00:25:22.000 match? Does banana match apple? And you'd say well no. It turns out they don't 00:25:22.000 --> 00:25:23.000 have the same letters. I'm trying to 00:25:23.000 --> 00:25:27.000 think of a really easy way to represent that. Your idea's good, 00:25:27.000 --> 00:25:31.000 but I'm thinking really lazy. So somebody help me who's lazy. Could you 00:25:31.000 --> 00:25:36.000 have a map where the key is all the letters in alphabetical order and the value is a 00:25:36.000 --> 00:25:41.000 vector of all the ? Yeah. That is a great idea. You're taking his idea, and you're making it easier. 00:25:41.000 --> 00:25:43.000 You're capitalizing on lazy, which is yeah 00:25:43.000 --> 00:25:45.000 -I wanna keep track of all the letters that are in the word, 00:25:45.000 --> 00:25:48.000 but I wanna do it in a way that makes it really easy for 00:25:48.000 --> 00:25:52.000 example to know whether two have the same -we'll call it the signature. The signature of 00:25:52.000 --> 00:25:54.000 the word is the letter frequency across it. 00:25:54.000 --> 00:25:57.000 If I could come up with a way to represent the signature that was really to compare two 00:25:57.000 --> 00:26:01.000 signatures quickly to see if they're the same, then I will have less work to do. And so your idea is a great 00:26:01.000 --> 00:26:04.000 one. We take the letters and we alphabetize them. 00:26:04.000 --> 00:26:08.000 So cheap and peach both turn into the same ACEHP. It's 00:26:08.000 --> 00:26:13.000 a nonsense thing, but it's a signature. It's unique for any anagram, and 00:26:13.000 --> 00:26:14.000 if I use a map 00:26:14.000 --> 00:26:15.000 where that's the key, 00:26:15.000 --> 00:26:17.000 then I can associate with every word 00:26:17.000 --> 00:26:20.000 that had that same signature. 00:26:20.000 --> 00:26:23.000 So 00:26:23.000 --> 00:26:27.000 let's start building that. So let me write 00:26:27.000 --> 00:26:29.000 -I wanna call this the signature. 00:26:29.000 --> 00:26:31.000 Given a string, 00:26:31.000 --> 00:26:33.000 it's going to 00:26:33.000 --> 00:26:34.000 alphabetize it. 00:26:34.000 --> 00:26:37.000 I'm going to write the 00:26:37.000 --> 00:26:40.000 dumbest version of this ever, 00:26:40.000 --> 00:26:47.000 but I'm just gonna use a simple sorting routine on this. So 00:26:47.000 --> 00:26:51.000 smallest is gonna small index -we'll call it min index. That's a better name for it. 00:26:51.000 --> 00:26:53.000 Min index equals I, 00:26:53.000 --> 00:26:57.000 and then I'm gonna look through the string, and I'm gonna find the smallest letter that's 00:26:57.000 --> 00:26:58.000 there and 00:26:58.000 --> 00:27:03.000 move it to the front. That's basically gonna be my strategy. I've 00:27:03.000 --> 00:27:05.000 got the wrong place to start, though. I'm gonna 00:27:05.000 --> 00:27:11.000 look from I to the one. So if 00:27:11.000 --> 00:27:13.000 S sub J 00:27:13.000 --> 00:27:15.000 is less than S sub min index, 00:27:15.000 --> 00:27:20.000 so it is a smaller letter, then the min index gets to be J. 00:27:20.000 --> 00:27:23.000 So far, what I've done is I've run this loop that kind 00:27:23.000 --> 00:27:27.000 of starts in this case on the very first iteration and says the first character in Slot 0, that 00:27:27.000 --> 00:27:31.000 must be the min. And then it looks from there to the end. Is there anything smaller? Whenever it 00:27:31.000 --> 00:27:34.000 find anything smaller, it updates the min index, so when it's done 00:27:34.000 --> 00:27:39.000 after that second loop has fully 00:27:39.000 --> 00:27:40.000 operated, 00:27:40.000 --> 00:27:44.000 then min index will point to the smallest alphabetically 00:27:44.000 --> 00:27:45.000 character in the 00:27:45.000 --> 00:27:49.000 string that we have. Then I'm gonna swap it to the front. 00:27:49.000 --> 00:27:52.000 So S sub I 00:27:52.000 --> 00:27:55.000 with S sub min index, and 00:27:55.000 --> 00:28:08.000 I'll write a swap because swap is pretty easy to write. 00:28:08.000 --> 00:28:10.000 See about how we do this, okay. 00:28:10.000 --> 00:28:12.000 So 00:28:12.000 --> 00:28:13.000 if I 00:28:13.000 --> 00:28:16.000 -I like how this works, and then let me stop and say 00:28:16.000 --> 00:28:17.000 right here, now 00:28:17.000 --> 00:28:19.000 is a good time to test this thing. 00:28:19.000 --> 00:28:23.000 I just wrote signature. It probably works, but it potentially 00:28:23.000 --> 00:28:24.000 could not. 00:28:24.000 --> 00:28:27.000 And I'm just gonna show you a little bit of how I write code. This is good to know. As I 00:28:27.000 --> 00:28:28.000 say, 00:28:28.000 --> 00:28:29.000 00:28:29.000 --> 00:28:32.000 I put some code in here that I plan on throwing away. 00:28:32.000 --> 00:28:35.000 Enter word 00:28:35.000 --> 00:28:42.000 -and then I throw it through my function. I think I called it 00:28:42.000 --> 00:28:44.000 signature, didn't I? 00:28:44.000 --> 00:28:47.000 Signature S 00:28:47.000 --> 00:28:51.000 -so the idea being if it doesn't work, I wanna find out sooner rather than 00:28:51.000 --> 00:28:52.000 later. It 00:28:52.000 --> 00:28:55.000 doesn't like my use of get line. Is that because it's not 00:28:55.000 --> 00:28:57.000 included? 00:28:57.000 --> 00:29:00.000 Yes, it's not. So let's go get the right header files. This is half the 00:29:00.000 --> 00:29:03.000 battle sometimes is figuring out what 00:29:03.000 --> 00:29:04.000 headers 00:29:04.000 --> 00:29:05.000 are needed to make 00:29:05.000 --> 00:29:10.000 your compiler happy. So now it's up 00:29:10.000 --> 00:29:14.000 here at enter word, and I say what does cheap come out as? 00:29:14.000 --> 00:29:16.000 It goes into the debugger. That's good. 00:29:16.000 --> 00:29:17.000 So it wasn't right. 00:29:17.000 --> 00:29:19.000 Now we're gonna get to find out 00:29:19.000 --> 00:29:21.000 what does it not like about that. 00:29:21.000 --> 00:29:27.000 It says 00:29:27.000 --> 00:29:29.000 -did we forget to return something? 00:29:29.000 --> 00:29:30.000 Let's go look at our code. 00:29:30.000 --> 00:29:34.000 So it's complaining about -oh yeah, I can see where we're gonna get in trouble here. It's 00:29:34.000 --> 00:29:38.000 complaining that the return -it was saying that I'm trying to print something and it looks like garbage, that 00:29:38.000 --> 00:29:40.000 what I'm trying to print didn't make sense at all. 00:29:40.000 --> 00:29:47.000 I could say well that's funny. Let's go look at signature. [Inaudible] pass the [inaudible]. 00:29:47.000 --> 00:29:51.000 That's probably a good idea. We ought to fix that while we're there. 00:29:51.000 --> 00:29:54.000 Let me leave that bug in for a minute because I'm gonna fix my first bug, and then I'll come back 00:29:54.000 --> 00:29:55.000 to that one. 00:29:55.000 --> 00:29:57.000 So it turns out what it's really complaining about though is it has to do with 00:29:57.000 --> 00:30:01.000 -I said well what does signature return. Somehow, what's being returned by signature is being interpreted 00:30:01.000 --> 00:30:04.000 as total crap when it got back to main, and there's a very good reason for that 00:30:04.000 --> 00:30:06.000 because I never returned 00:30:06.000 --> 00:30:09.000 anything. So maybe if I had been less cavalier about the fact that it was giving me a warning 00:30:09.000 --> 00:30:11.000 over here that said 00:30:11.000 --> 00:30:14.000 control reaches the end of non-void function, but I was being lazy and I didn't look 00:30:14.000 --> 00:30:17.000 -was that I 00:30:17.000 --> 00:30:18.000 didn't pay attention to the fact. 00:30:18.000 --> 00:30:20.000 So let's leave my other bug in that you've already pointed out 00:30:20.000 --> 00:30:24.000 because this is exactly what it's like when you're doing it. You enter words and you say cheap, 00:30:24.000 --> 00:30:25.000 and it says cheap, and you're like 00:30:25.000 --> 00:30:28.000 no. 00:30:28.000 --> 00:30:32.000 And then you're like about how about an. Hey look, that's in alphabetical order. And then you spend all your time thinking about words 00:30:32.000 --> 00:30:37.000 for a while. Well, tux. That's in alphabetical order. It seems to work. You could do that for a while, but 00:30:37.000 --> 00:30:39.000 then 00:30:39.000 --> 00:30:41.000 you're like this whole cheap, 00:30:41.000 --> 00:30:42.000 not good. 00:30:42.000 --> 00:30:46.000 Okay, so I come back here and my friend who's already one step ahead of me has pointed out that my swap 00:30:46.000 --> 00:30:50.000 is missing the all important pass by reference 00:30:50.000 --> 00:30:53.000 that as it is it what swapping the copies that got passed to 00:30:53.000 --> 00:30:56.000 the function, but of course nothing was happening back here in signature land. 00:30:56.000 --> 00:30:57.000 So if I fix that, and 00:30:57.000 --> 00:30:59.000 I come back in here, 00:30:59.000 --> 00:31:01.000 I'm gonna feel better about this. Oh, 00:31:01.000 --> 00:31:04.000 my gosh. And even tux still works. 00:31:04.000 --> 00:31:08.000 And if I say banana, I should get a bunch of As, a bunch of B and an N, so there's various cases to test out if you have 00:31:08.000 --> 00:31:09.000 multiple letters, and if 00:31:09.000 --> 00:31:12.000 you have letters that are the same letters like 00:31:12.000 --> 00:31:16.000 apple so that it doesn't lose your duplicates, and it seems to come out right. 00:31:16.000 --> 00:31:17.000 So given this, 00:31:17.000 --> 00:31:19.000 the word peach 00:31:19.000 --> 00:31:22.000 should come down to the same signature as cheap, and so that seems to indicate we're on 00:31:22.000 --> 00:31:24.000 the path 00:31:24.000 --> 00:31:24.000 toward 00:31:24.000 --> 00:31:26.000 building the thing we wanted to build. So I 00:31:26.000 --> 00:31:28.000 have this read file thing that I 00:31:28.000 --> 00:31:29.000 have left over from last time that 00:31:29.000 --> 00:31:31.000 reads each word, 00:31:31.000 --> 00:31:35.000 and I wanna change my vector 00:31:35.000 --> 00:31:37.000 into a map of 00:31:37.000 --> 00:31:40.000 set of string. 00:31:40.000 --> 00:31:48.000 What capitalization do you not like? [Inaudible]. 00:31:48.000 --> 00:31:52.000 Yeah. So it turns out this file happens to all be 00:31:52.000 --> 00:32:00.000 lower case, but there's no harm in doing this. 00:32:00.000 --> 00:32:03.000 That way even if they weren't, it'll 00:32:03.000 --> 00:32:06.000 take care of that problem for us if we wanted it to. 00:32:06.000 --> 00:32:09.000 [Inaudible]. 00:32:09.000 --> 00:32:11.000 I want lower case. Oh 00:32:11.000 --> 00:32:12.000 yeah, I do. Well, your 00:32:12.000 --> 00:32:15.000 case. You guys are so picky. 00:32:15.000 --> 00:32:18.000 All right. Here's my deal. I'm gonna take my map, 00:32:18.000 --> 00:32:21.000 and this is gonna be a line of code that's gonna make your head spin. 00:32:21.000 --> 00:32:31.000 Just go with it. 00:32:31.000 --> 00:32:38.000 This is the do all 00:32:38.000 --> 00:32:40.000 craziness. Okay. 00:32:40.000 --> 00:32:41.000 So 00:32:41.000 --> 00:32:43.000 I've got this map, 00:32:43.000 --> 00:32:46.000 and what I wanna do is under the signature of this word, 00:32:46.000 --> 00:32:50.000 I want to look up the set of strings that's associated with it and tack this one in. 00:32:50.000 --> 00:32:54.000 And the add in this case with the set, I know that's gonna do non-duplication. In fact, 00:32:54.000 --> 00:32:58.000 the file doesn't contain duplicates, but if it did, I certainly don't want it to record it 00:32:58.000 --> 00:33:00.000 twice anyway, so I might as well do this. 00:33:00.000 --> 00:33:04.000 Now this form of this is a heavy lifter for a small piece of 00:33:04.000 --> 00:33:04.000 code. 00:33:04.000 --> 00:33:09.000 The signature then went and converted into the ACEHP form. 00:33:09.000 --> 00:33:12.000 I used that as the key into the table. 00:33:12.000 --> 00:33:16.000 If it was already there, it's gonna retrieve me an existing set that I'm gonna just go ahead and 00:33:16.000 --> 00:33:21.000 add a word onto. If it wasn't there, the behavior for the brackets is to create 00:33:21.000 --> 00:33:26.000 kind of a new empty 00:33:26.000 --> 00:33:28.000 value for that. So it'll use the key 00:33:28.000 --> 00:33:32.000 and create a default value for type. Well, the default value for set of string -so if you just create a 00:33:32.000 --> 00:33:32.000 set 00:33:32.000 --> 00:33:36.000 without any other information in the default constructor will always create you a 00:33:36.000 --> 00:33:40.000 nice clean empty set. So in fact, it will get me exactly what I want which is to put it in there with an empty 00:33:40.000 --> 00:33:43.000 set that I will immediately add the word into. So after 00:33:43.000 --> 00:33:46.000 I do this, I should have this fully populated map. And 00:33:46.000 --> 00:33:49.000 then I'm 00:33:49.000 --> 00:33:56.000 gonna do this just as a little test. I'm gonna say num words when I'm done 00:33:56.000 --> 00:33:58.000 to feel 00:33:58.000 --> 00:34:00.000 a little bit confident about what 00:34:00.000 --> 00:34:04.000 got put in there. How 00:34:04.000 --> 00:34:06.000 about I call it -that's a good idea. It's 00:34:06.000 --> 00:34:11.000 so fussy. C++ never does what you want. 00:34:11.000 --> 00:34:15.000 I think I called this thing OSPD2.txt that has the 00:34:15.000 --> 00:34:20.000 words in it. 00:34:20.000 --> 00:34:29.000 And then I need to declare the variable that I'm sticking all this stuff into is a set. So 00:34:29.000 --> 00:34:32.000 go in, load stuff, 00:34:32.000 --> 00:34:35.000 00:34:35.000 --> 00:34:36.000 doing it's thing, the 00:34:36.000 --> 00:34:38.000 number of words 112,882. Okay. 00:34:38.000 --> 00:34:42.000 That's close enough. I can't remember the number that's in there, but that sounds like a fine 00:34:42.000 --> 00:34:43.000 00:34:43.000 --> 00:34:47.000 approximation of it. So I feel like it did sort of manage to do something for it. And I can actually do this if I 00:34:47.000 --> 00:34:51.000 just wanna get a little glimpse of it is to use my iterator to look 00:34:51.000 --> 00:34:53.000 at 00:34:53.000 --> 00:35:04.000 something that's in there. Wait, is map a set of string? Iterator -- 00:35:04.000 --> 00:35:05.000 file iter.hasNext. I'm gonna 00:35:05.000 --> 00:35:07.000 say key 00:35:07.000 --> 00:35:10.000 equals iter.next, 00:35:10.000 --> 00:35:13.000 00:35:13.000 --> 00:35:18.000 and I'm going to print that key, and I'm going to print the size of the 00:35:18.000 --> 00:35:28.000 set because I'm at this 00:35:28.000 --> 00:35:33.000 point -I should see gobs of printing come out of this thing. It 00:35:33.000 --> 00:35:37.000 takes a little bit of a while to process the thing, and then 00:35:37.000 --> 00:35:42.000 see gobs and gobs of stuff going by. It looks like a lot of things are ones if you can 00:35:42.000 --> 00:35:45.000 imagine reading them as they go by because a lot of words are really just not anagrams 00:35:45.000 --> 00:35:48.000 of anything else. But 00:35:48.000 --> 00:35:52.000 some of the shorter ones have sort of a better chance. 00:35:52.000 --> 00:35:56.000 So you can find out here at the end that there are EEIKLPST. I don't know what that is. Leakiest? No. I don't 00:35:56.000 --> 00:35:58.000 know 00:35:58.000 --> 00:35:59.000 what that word is. I 00:35:59.000 --> 00:36:03.000 should write it in for any of them. 00:36:03.000 --> 00:36:09.000 This dictionary has a bunch of really crazy words in it too, so it makes it especially challenging. What is that? I don't know. That one 00:36:09.000 --> 00:36:10.000 almost looks 00:36:10.000 --> 00:36:14.000 like beginners, but it's got an F in it. It's the F beginners, a very famous 00:36:14.000 --> 00:36:16.000 word. You guys have probably heard 00:36:16.000 --> 00:36:19.000 of it. So I've seen that, and now I wanna do the thing where I will pick the 00:36:19.000 --> 00:36:21.000 largest one. I'd like to know. 00:36:21.000 --> 00:36:23.000 Somebody should tell me. 00:36:23.000 --> 00:36:25.000 Int max size [cuts 00:36:25.000 --> 00:36:28.000 out] 00:36:28.000 --> 00:36:30.000 max key, so I'll 00:36:30.000 --> 00:36:33.000 set this to be zero, and max 00:36:33.000 --> 00:36:34.000 key is that. 00:36:34.000 --> 00:36:35.000 And then I'm gonna do this. If 00:36:35.000 --> 00:36:38.000 the 00:36:38.000 --> 00:36:40.000 size of this key is 00:36:40.000 --> 00:36:43.000 greater than my max size, 00:36:43.000 --> 00:36:55.000 then it's gonna get to be the new key. 00:36:55.000 --> 00:37:06.000 So after I did all of that then [cuts out] 00:37:06.000 --> 00:37:08.000 key. 00:37:08.000 --> 00:37:13.000 And then I probably wanna see what they are, so why don't I go ahead and take a look. Ah, I have to go to 00:37:13.000 --> 00:37:17.000 the type, though. [Inaudible] 00:37:17.000 --> 00:37:21.000 to equals 00:37:21.000 --> 00:37:22.000 M of 00:37:22.000 --> 00:37:27.000 key -max key, I guess, dot 00:37:27.000 --> 00:37:29.000 iterator, 00:37:29.000 --> 00:37:30.000 [inaudible] IT 00:37:30.000 --> 00:37:37.000 has next, CLIT.next [inaudible]. 00:37:37.000 --> 00:37:43.000 So it went back. It found the maximum, in this case using the size of the sets as the distinguishing 00:37:43.000 --> 00:37:47.000 feature. And 00:37:47.000 --> 00:37:49.000 then max is AEPRS, 00:37:49.000 --> 00:37:51.000 which 00:37:51.000 --> 00:37:53.000 it's got a big old list of about 12. 00:37:53.000 --> 00:37:55.000 I think that's 00:37:55.000 --> 00:37:58.000 12, actually. Maybe 13. 00:37:58.000 --> 00:37:59.000 So now you know. 00:37:59.000 --> 00:38:02.000 You can impress your friends at parties. This is the kind of thing you can win bar bets on. 00:38:02.000 --> 00:38:03.000 Oh, 00:38:03.000 --> 00:38:06.000 yeah. What's the size of the largest 00:38:06.000 --> 00:38:07.000 anagram cluster? 00:38:07.000 --> 00:38:10.000 Everybody wants to know this kind of stuff. I can't believe you guys can sleep at night without 00:38:10.000 --> 00:38:12.000 actually knowing this. 00:38:12.000 --> 00:38:16.000 And what's neat to know though [inaudible] just to point out a couple of things that -you can use a little decomposition 00:38:16.000 --> 00:38:18.000 on this code, 00:38:18.000 --> 00:38:23.000 but there's kind of a very small amount of things we're having to do. For example, one of the things that's really 00:38:23.000 --> 00:38:27.000 powerful, things like the map where we can just figure out how to key the things 00:38:27.000 --> 00:38:29.000 to store the collection right under that key, 00:38:29.000 --> 00:38:33.000 then looking it up and adding something is a very 00:38:33.000 --> 00:38:36.000 efficient sort of direct operation, just building on these things and it going through and doing all 00:38:36.000 --> 00:38:37.000 the work of 00:38:37.000 --> 00:38:41.000 storing them, sorting them, making it efficient for us to retrieve them and look them up such that I can 00:38:41.000 --> 00:38:44.000 process 100,000 words 00:38:44.000 --> 00:38:49.000 in the blink of an eye, and then go back through, look at them all, find the biggest one, 00:38:49.000 --> 00:38:51.000 get my information out. 00:38:51.000 --> 00:38:55.000 When you make the call to M.size, is 00:38:55.000 --> 00:39:00.000 that the number of words? [Inaudible]. That is the number of keys. Keys, right. So that's not actually [inaudible]. 00:39:00.000 --> 00:39:01.000 Yeah. So it doesn't know anything about 00:39:01.000 --> 00:39:04.000 everything else that was [inaudible], but in fact that's why it's 00:39:04.000 --> 00:39:07.000 [inaudible]. I know the dictionary has about 127,000 words. It turns out they form about 00:39:07.000 --> 00:39:08.000 00:39:08.000 --> 00:39:11.000 112 unique 00:39:11.000 --> 00:39:14.000 signatures, and so there's actually another 20,000 words that are 00:39:14.000 --> 00:39:17.000 clung onto some existing signature. That's the number of unique signatures across 00:39:17.000 --> 00:39:20.000 the dictionary, not the number of words, so that's probably the wrong name to call 00:39:20.000 --> 00:39:22.000 it. For the M 00:39:22.000 --> 00:39:24.000 00:39:24.000 --> 00:39:28.000 signature word thing where [inaudible] of the default just to create a new stack, that 00:39:28.000 --> 00:39:32.000 works as well for vectors [inaudible]? Yeah. It works for anything if you were just to declare 00:39:32.000 --> 00:39:34.000 it on the stack and the right thing happened, 00:39:34.000 --> 00:39:37.000 so vectors, set, maps. All those things do. 00:39:37.000 --> 00:39:41.000 But the primitive types like int and double, it doesn't. So it would work for string. String 00:39:41.000 --> 00:39:45.000 is an empty string. So for some of the fancier, more modern types tend to actually 00:39:45.000 --> 00:39:49.000 know how to just default construct themselves into a good state, but the primitives don't do that. 00:39:49.000 --> 00:39:53.000 So if you were having a map of ints and you wanted to have them start at zero, you need to really 00:39:53.000 --> 00:39:58.000 start them at zero. You can call just M sub this. It would be garbage, and it would just be operating with 00:39:58.000 --> 00:40:02.000 garbage from that way forward. 00:40:02.000 --> 00:40:06.000 All right. Well, we're good. What 00:40:06.000 --> 00:40:08.000 I'm gonna give you 00:40:08.000 --> 00:40:10.000 is the 00:40:10.000 --> 00:40:11.000 eight minute 00:40:11.000 --> 00:40:12.000 discussion of recursion 00:40:12.000 --> 00:40:14.000 that whets your appetite for 00:40:14.000 --> 00:40:18.000 the things we're gonna be doing next week. 00:40:18.000 --> 00:40:21.000 So recursion is one of those things I think that when you haven't yet 00:40:21.000 --> 00:40:24.000 had a chance to explore it first hand and other people tell you about it, it 00:40:24.000 --> 00:40:30.000 has sort of an awe inspiring sort of mystery, some fear, and whatnot. 00:40:30.000 --> 00:40:31.000 So 00:40:31.000 --> 00:40:34.000 first off, I wanna kinda shake that fear off. It is a little bit 00:40:34.000 --> 00:40:35.000 hard to wrap your head around 00:40:35.000 --> 00:40:38.000 the first time you see it, but we're gonna have a whole week's worth of time to spend on it, so 00:40:38.000 --> 00:40:41.000 we're gonna 00:40:41.000 --> 00:40:42.000 try to give you a lot of 00:40:42.000 --> 00:40:46.000 different ways to think about it, and different problems to see to kinda help you do it. And I think 00:40:46.000 --> 00:40:48.000 once you do get your head around it, it turns out then 00:40:48.000 --> 00:40:53.000 you'll discover how infinitely powerful it is, that there is kind of a simple idea in it 00:40:53.000 --> 00:40:57.000 that once you kinda fully get your head around, you can explore and solve lots of different 00:40:57.000 --> 00:41:00.000 problems using this just one technique again and again. 00:41:00.000 --> 00:41:02.000 So in itself, it's 00:41:02.000 --> 00:41:04.000 a little bit mysterious at first glance, but then 00:41:04.000 --> 00:41:06.000 once you kind of 00:41:06.000 --> 00:41:09.000 master it, you'll be amazed at the kind of neat things you can do with it. 00:41:09.000 --> 00:41:15.000 So it is certainly what I'd consider an indispensable tool in a programmer's tool kit. The kind of 00:41:15.000 --> 00:41:19.000 problems you can solve using the techniques you have so far is fundamentally limited. And 00:41:19.000 --> 00:41:22.000 part of what we need to do in this class is expose you to these new ways of solving 00:41:22.000 --> 00:41:26.000 harder, more sophisticated problems that the old techniques don't work for. One of the 00:41:26.000 --> 00:41:31.000 cool things about recursion is it actually lends very simple, elegant, short solutions 00:41:31.000 --> 00:41:35.000 to problems that at first glance seem completely unsolvable. 00:41:35.000 --> 00:41:36.000 That if you can 00:41:36.000 --> 00:41:40.000 formulate a structure for [inaudible], you will discover that the code is not long 00:41:40.000 --> 00:41:45.000 to write. It's not tedious to write. The tricky part is to figure out how 00:41:45.000 --> 00:41:47.000 to express it, 00:41:47.000 --> 00:41:50.000 so it's more of a thinking puzzle than it is a coding puzzle. I 00:41:50.000 --> 00:41:55.000 certainly like thinking puzzles as much as coding puzzles if not more. 00:41:55.000 --> 00:41:58.000 The general sort of idea is that 00:41:58.000 --> 00:42:02.000 you are going to try to solve a problem 00:42:02.000 --> 00:42:05.000 -instead of sort of breaking it down into component tasks like if I need to make dinner, I need 00:42:05.000 --> 00:42:08.000 to go to the store and buy things, and I need to come home, and chop them, and get 00:42:08.000 --> 00:42:12.000 the recipe. You think of what your standard decomposition is all about 00:42:12.000 --> 00:42:16.000 -breaking down your tasks into A, B, and C, and D, and then you add them all together to get 00:42:16.000 --> 00:42:17.000 00:42:17.000 --> 00:42:20.000 the whole task done. Recursion has this kind of very different way of thinking about the problem, which is like 00:42:20.000 --> 00:42:22.000 well if I needed to get Task A done, 00:42:22.000 --> 00:42:26.000 and I had Task A prime, which was somehow a lot like the task I was trying to solve, but it somehow 00:42:26.000 --> 00:42:29.000 was a little bit simpler, a little bit easier, 00:42:29.000 --> 00:42:32.000 a little bit more manageable than the one I started out to solve, 00:42:32.000 --> 00:42:34.000 and if I had that solution 00:42:34.000 --> 00:42:37.000 -so if somehow I could delegate it off to some 00:42:37.000 --> 00:42:39.000 00:42:39.000 --> 00:42:43.000 minion who works for me, and then I could use that to solve my problem, then my 00:42:43.000 --> 00:42:45.000 job would be made much easier by using that result of 00:42:45.000 --> 00:42:48.000 solving a similar problem that's a little bit [inaudible]. 00:42:48.000 --> 00:42:51.000 Okay, that seems a little bit wacky. 00:42:51.000 --> 00:42:55.000 Let me give you sort of an example of how this might work. 00:42:55.000 --> 00:42:58.000 So your standard problem, I said yeah, it's like you do these dissimilar sub problems. 00:42:58.000 --> 00:43:04.000 Let's imagine I had this goal where I wanted to survey the Stanford student body. 00:43:04.000 --> 00:43:06.000 I don't want just like a 00:43:06.000 --> 00:43:08.000 haphazard 00:43:08.000 --> 00:43:11.000 most of the people involved. Let's say I really wanted to get input from every single person 00:43:11.000 --> 00:43:16.000 on campus whether they think having cardinal as your mascot is a ridiculous choice or not. So 00:43:16.000 --> 00:43:19.000 let's imagine I really wanna hear from all 10,000 students. 00:43:19.000 --> 00:43:21.000 Now I can stand out in White Plaza with a big 00:43:21.000 --> 00:43:26.000 note pad and try to accost people and sort of work my way down the list. 00:43:26.000 --> 00:43:29.000 And then I'd be there for eons and never solve my problem. 00:43:29.000 --> 00:43:31.000 Instead, what I decide to do is I say well I'm gonna 00:43:31.000 --> 00:43:32.000 recruit some people to help me 00:43:32.000 --> 00:43:34.000 because I'm lazy 00:43:34.000 --> 00:43:35.000 as we've already established, 00:43:35.000 --> 00:43:37.000 and I would like to get 00:43:37.000 --> 00:43:41.000 some other people to join in my quest to answer these burning questions and to 00:43:41.000 --> 00:43:43.000 solve the survey. 00:43:43.000 --> 00:43:45.000 So what I do is I round up ten people let's say, 00:43:45.000 --> 00:43:49.000 and I say would you help me, and I decide to divide the campus into kind of ten 00:43:49.000 --> 00:43:51.000 partitions. And I say if you 00:43:51.000 --> 00:43:54.000 could survey all the people whose names begin with A, B, and C, that would really help. 00:43:54.000 --> 00:43:56.000 And if you could do [inaudible] Gs, and 00:43:56.000 --> 00:44:00.000 if you would do -and if I divide up the alphabet that way, give each of them two or three letters, and I 00:44:00.000 --> 00:44:02.000 say if you would go get the data, it'd be really 00:44:02.000 --> 00:44:05.000 easy for me to do my job then. If I just took all their 00:44:05.000 --> 00:44:08.000 data [inaudible]. Well, being 00:44:08.000 --> 00:44:12.000 the kind of lazy person that I am, it's likely that the ten people I recruit would have similar lazy qualities 00:44:12.000 --> 00:44:15.000 because lazy people hang out with other lazy people. 00:44:15.000 --> 00:44:18.000 And so the person who was in charge of A, B, C, the first thing they do is turn around and find ten 00:44:18.000 --> 00:44:22.000 more friends, and then they divide it up and say could you do the 00:44:22.000 --> 00:44:23.000 AA through AM and so on. 00:44:23.000 --> 00:44:25.000 If they divide it into these pools of 00:44:25.000 --> 00:44:30.000 one tenth of what they were responsible for, and say you can go get the information from these people, 00:44:30.000 --> 00:44:31.000 and if they did the same thing 00:44:31.000 --> 00:44:35.000 -so if everybody along the road. We started with 10,000. Now each person had 1,000 00:44:35.000 --> 00:44:39.000 to survey. They asked their friend to do 100. Their friend asked ten people to do 00:44:39.000 --> 00:44:39.000 ten. 00:44:39.000 --> 00:44:43.000 And then at some point, the person who has ten says well I just need to ask these ten people. 00:44:43.000 --> 00:44:45.000 00:44:45.000 --> 00:44:47.000 Once I get their data, we don't need to do anything further. 00:44:47.000 --> 00:44:50.000 So at some point the problem becomes so small, so simple, 00:44:50.000 --> 00:44:55.000 even though it was kind of the same problem all along. I just reproduced the same problem we had, but in 00:44:55.000 --> 00:44:58.000 a slightly more tractable form, but then I divided it around. Divide and conquer 00:44:58.000 --> 00:45:00.000 sometimes they call this to where 00:45:00.000 --> 00:45:01.000 I spread out the work 00:45:01.000 --> 00:45:05.000 around a bunch of people to where each person's contribution is just a little part of the whole. 00:45:05.000 --> 00:45:09.000 You had to find the ten volunteers around underneath you and 00:45:09.000 --> 00:45:11.000 get their 00:45:11.000 --> 00:45:14.000 help in solving the problem, but nobody had to do much work, and that's kind of a really 00:45:14.000 --> 00:45:18.000 interesting way to solve a problem. It sounds like a very big problem of surveying 00:45:18.000 --> 00:45:19.000 10,000 people, 00:45:19.000 --> 00:45:23.000 but by dividing and conquer, everybody has a little tiny role to play. You leverage a lot 00:45:23.000 --> 00:45:24.000 of people getting it done, 00:45:24.000 --> 00:45:29.000 and there is this self-similarity to it, which is kind of intriguing that 00:45:29.000 --> 00:45:34.000 everybody is trying to solve the same problem but just at different levels of scale. 00:45:34.000 --> 00:45:36.000 And so this idea 00:45:36.000 --> 00:45:38.000 applies to things like phone trees rather than 00:45:38.000 --> 00:45:42.000 trying to get the message out to everybody in my class, it might be that I call two people who call two 00:45:42.000 --> 00:45:45.000 friends who call two friends until everybody gets covered. 00:45:45.000 --> 00:45:48.000 Nobody does the whole job. Everybody just does a little part of it. Sometimes you'll see these 00:45:48.000 --> 00:45:50.000 fractal drawings where 00:45:50.000 --> 00:45:55.000 there is a large leaf which when you look closer actually consists of littler leaves, 00:45:55.000 --> 00:45:57.000 which themselves are littler leaves, so that at 00:45:57.000 --> 00:45:58.000 every level of scale 00:45:58.000 --> 00:46:01.000 the same image is being reproduced, 00:46:01.000 --> 00:46:05.000 and the result kind of on the outside is something that in itself if you look deeper has the 00:46:05.000 --> 00:46:08.000 same problem but smaller embedded in it. 00:46:08.000 --> 00:46:10.000 So it's a pretty neat 00:46:10.000 --> 00:46:11.000 sort of 00:46:11.000 --> 00:46:13.000 way of solving things. 00:46:13.000 --> 00:46:14.000 I am 00:46:14.000 --> 00:46:18.000 gonna tell you a little bit about how the code looks, and then I really am not gonna be able to show you much code today. 00:46:18.000 --> 00:46:23.000 I think it's actually even better to show you this code on Monday when we can come back fresh, 00:46:23.000 --> 00:46:27.000 but that it involves taking -So we're looking at functional recursion first, 00:46:27.000 --> 00:46:29.000 and functional recursion is 00:46:29.000 --> 00:46:33.000 taking some sort of functions, a function that takes an input and returns an answers, returns 00:46:33.000 --> 00:46:36.000 non-void thing that comes back. 00:46:36.000 --> 00:46:39.000 And we're gonna be looking at problems where you're trying to solve this big problem, 00:46:39.000 --> 00:46:44.000 and that if you had the answer to making a call to yourself on a smaller version of 00:46:44.000 --> 00:46:45.000 the problem 00:46:45.000 --> 00:46:48.000 -maybe one call, maybe two calls, maybe several calls 00:46:48.000 --> 00:46:52.000 -that you could add those, or multiply those, or combine those in some way to answer the 00:46:52.000 --> 00:46:53.000 bigger problem. So 00:46:53.000 --> 00:46:56.000 if I were trying to solve 00:46:56.000 --> 00:47:03.000 this campus survey, having the answers to these smaller campus surveys gives me that total result. And 00:47:03.000 --> 00:47:05.000 so the -I really should not try to do this in two minutes. What 00:47:05.000 --> 00:47:09.000 I should do is try to tell you on Monday. We'll come back and we'll talk about recursion, and it 00:47:09.000 --> 00:47:11.000 will be an impressive week. Meanwhile, work on 00:47:11.000 --> 00:47:14.000 your ADT homework.