0:00:00.000,0:00:08.000


0:00:08.000,0:00:11.000


0:00:11.000,0:00:22.000
This presentation is delivered by the Stanford Center for Professional Development.

0:00:22.000,0:00:25.000
Okay, so thank you for coming out in the rain.

0:00:25.000,0:00:26.000
I

0:00:26.000,0:00:27.000
appreciate you

0:00:27.000,0:00:29.000
trudging over here and

0:00:29.000,0:00:30.000
risking

0:00:30.000,0:00:31.000
your health and life.

0:00:31.000,0:00:37.000
So I'm gonna do a little diversion here to explain this idea of functions as data before I come back

0:00:37.000,0:00:40.000
to fixing the problem we had kind of hit upon at the very end of the last lecture

0:00:40.000,0:00:42.000
about set.

0:00:42.000,0:00:44.000
So what I'm just gonna show you is

0:00:44.000,0:00:45.000
a little bit about

0:00:45.000,0:00:46.000


0:00:46.000,0:00:49.000
the design of something in C++

0:00:49.000,0:00:52.000
that is gonna helpful in solving the problem we'd run into.

0:00:52.000,0:00:55.000
What I'm gonna show you here is two pieces of code that plot

0:00:55.000,0:00:57.000
a

0:00:57.000,0:01:01.000
single value function across the number line. So in this case, the function is that it's the

0:01:01.000,0:01:05.000
top one is applying the sine function, the sine wave as it varies across, and then the

0:01:05.000,0:01:06.000
square root function

0:01:06.000,0:01:09.000
which goes up and has a sloping curve to it,

0:01:09.000,0:01:13.000
and that both of these have exactly the same behaviors. They're designed to kind of go into the graphics

0:01:13.000,0:01:17.000
window, and they're just using a kind of a very simple kind of hand moving strategy. It

0:01:17.000,0:01:21.000
starts on a particular location based on what the value of sine is here,

0:01:21.000,0:01:23.000
and then over a particular interval

0:01:23.000,0:01:24.000
at 0.1

0:01:24.000,0:01:28.000
of one tenth of an inch, it plots the next point and connects a line between

0:01:28.000,0:01:31.000
them. So it does a simple kind of line approximation of what

0:01:31.000,0:01:34.000
that function looks like over the interval you asked it to plot.

0:01:34.000,0:01:37.000
The same code is being used here for plotting the square root.

0:01:37.000,0:01:41.000
And the thing to note and I tried to highlight by putting it in blue here was that every other

0:01:41.000,0:01:45.000
part of the code is exactly the same except for those two calls where I need to know what's

0:01:45.000,0:01:48.000
the value of square root starting at this particular

0:01:48.000,0:01:51.000
X. So starting at Value 1,

0:01:51.000,0:01:53.000
what is sine of one? What is square root of one? As I

0:01:53.000,0:01:57.000
work my way across the interval, what's a square root of 1.1, 1.2, 1.3 and

0:01:57.000,0:01:59.000
sine of those same values?

0:01:59.000,0:02:03.000
And so everything about the code is functionally identical. It's kind of frustrating

0:02:03.000,0:02:07.000
to look at it, though, and realize if I wanted to plot a bunch of other things -I also wanna be able to plot

0:02:07.000,0:02:09.000
the cosine function, or

0:02:09.000,0:02:12.000
some other function of my own creation

0:02:12.000,0:02:13.000
-across this interval

0:02:13.000,0:02:17.000
that I keep having to copy and paste this code and duplicate it.

0:02:17.000,0:02:21.000
And so one of the things that hopefully your 106A and prior experiences really

0:02:21.000,0:02:24.000
heightened your attention to is

0:02:24.000,0:02:26.000
if I have the same piece of code

0:02:26.000,0:02:30.000
in multiple places, I ought to be able to unify it. I ought to be able to make there be a plot

0:02:30.000,0:02:31.000
function

0:02:31.000,0:02:36.000
that can handle both sine and square root without actually having to distinguish it by copy and pasting,

0:02:36.000,0:02:38.000
so I wanna unify these two.

0:02:38.000,0:02:43.000
And so there is -the mechanism that we're gonna use kinda follows naturally if you don't let yourself get

0:02:43.000,0:02:45.000
too tripped up by what it means.

0:02:45.000,0:02:48.000
Just imagine that the parameter for example going into the function right now are the start and the stop,

0:02:48.000,0:02:49.000
the interval

0:02:49.000,0:02:52.000
from X is one to five.

0:02:52.000,0:02:55.000
What we'd like to do is further parameterize the function.

0:02:55.000,0:02:58.000
We'd like to add a third argument to it, which is to say

0:02:58.000,0:03:03.000
and when you're ready to know what function you're plotting, here's the one to use.

0:03:03.000,0:03:04.000
I'd like you to plot

0:03:04.000,0:03:08.000
sine over the interval start to stop. I'd like you to plot square root over that. So we added the third argument,

0:03:08.000,0:03:11.000
which was the function we wanted to invoke there.

0:03:11.000,0:03:16.000
Then we would be able to unify this down to where we had one generic plot function.

0:03:16.000,0:03:19.000
The good news is that this does actually

0:03:19.000,0:03:20.000
have

0:03:20.000,0:03:23.000
support features in the C++ language that let us do this.

0:03:23.000,0:03:26.000
The syntax for it is a little bit unusual, and

0:03:26.000,0:03:29.000
if you think about it too much, it can actually make your head hurt a little bit, think about what

0:03:29.000,0:03:33.000
it's doing. But you can actually use a function, a function's name

0:03:33.000,0:03:35.000
and the code that is associated with that name,

0:03:35.000,0:03:37.000
as a piece of data

0:03:37.000,0:03:41.000
that not just -you think of the code as we're calling this function, and

0:03:41.000,0:03:44.000
we're moving these things around, and executing things.

0:03:44.000,0:03:47.000
The things that we tend to be executing and operating on you think of as being integers, and strings,

0:03:47.000,0:03:49.000
and characters, and files.

0:03:49.000,0:03:52.000
But you can also extend your notion of what's data

0:03:52.000,0:03:54.000
to include the code you wrote

0:03:54.000,0:03:56.000
as part of the possibilities.

0:03:56.000,0:03:59.000
So in this case, I've added that third argument that I wanted to plot,

0:03:59.000,0:04:02.000
and this

0:04:02.000,0:04:06.000
syntax here that's a little bit unusual to you, and I'll kind of identify it, is that the

0:04:06.000,0:04:09.000
name of the parameter is actually FN.

0:04:09.000,0:04:12.000
Its name is enclosed in parentheses there.

0:04:12.000,0:04:16.000
And then to the right would be a list of the arguments

0:04:16.000,0:04:17.000
or the prototype

0:04:17.000,0:04:21.000
information about this function, and on the left is its return value.

0:04:21.000,0:04:24.000
So what this says is you have two doubles, the start and stop,

0:04:24.000,0:04:26.000
and the third thing in there isn't a double at all. It is a

0:04:26.000,0:04:27.000
function

0:04:27.000,0:04:32.000
of one double that returns a double, so a single value function that operates on doubles

0:04:32.000,0:04:33.000
here.

0:04:33.000,0:04:36.000
That is the syntax in C++ for specifying what you want coming in here

0:04:36.000,0:04:37.000
is

0:04:37.000,0:04:41.000
not a double, not a ray of doubles, anything funky like that. It is a function of a

0:04:41.000,0:04:43.000
double that returns a double.

0:04:43.000,0:04:45.000
And then in the body of this code,

0:04:45.000,0:04:47.000
when I say FN here

0:04:47.000,0:04:50.000
where I would've said sine, or square root, or identified

0:04:50.000,0:04:51.000
a particular function,

0:04:51.000,0:04:53.000
it's using the parameter

0:04:53.000,0:04:57.000
that was passed in by the client, so it says call the client's function

0:04:57.000,0:04:58.000
passing these values

0:04:58.000,0:05:01.000
to plot over the range

0:05:01.000,0:05:03.000
using their function

0:05:03.000,0:05:07.000
in. So the idea is that valid calls to plot now become things like plot -and then give it an

0:05:07.000,0:05:09.000
interval, zero to two,

0:05:09.000,0:05:11.000
and you give the name of a function:

0:05:11.000,0:05:15.000
the sine function, which comes out of the math library, the square root function,

0:05:15.000,0:05:16.000
also in the math library.

0:05:16.000,0:05:19.000
It could be that the function is something you wrote yourself,

0:05:19.000,0:05:21.000
the my function.

0:05:21.000,0:05:25.000
In order for this to be valid though, you can't just put any old function name there. It is

0:05:25.000,0:05:27.000
actually being quite specific about it

0:05:27.000,0:05:30.000
that plot was to find [inaudible]. It took a double, returned a double. That's the kind of function

0:05:30.000,0:05:33.000
that you can give it, so any function that has that

0:05:33.000,0:05:36.000
prototype, so it matches that

0:05:36.000,0:05:40.000
format is an acceptable one to pass in. If you try to pass something that actually

0:05:40.000,0:05:42.000
just does some other kind of function,

0:05:42.000,0:05:45.000
it doesn't have the same prototype, so the get line that takes no arguments and returns a string just

0:05:45.000,0:05:47.000
doesn't match on any front.

0:05:47.000,0:05:51.000
And if I try to say here, plot the get line function of the integral two to five,

0:05:51.000,0:05:53.000
it will quite rightfully complain to me

0:05:53.000,0:05:54.000


0:05:54.000,0:05:56.000
that that just doesn't make sense.

0:05:56.000,0:05:57.000
That's because it doesn't match.

0:05:57.000,0:06:02.000
So a little bit of syntax here, but actually kind of a very powerful thing, and it allows us to write to

0:06:02.000,0:06:05.000
an addition to kind of parameterizing on these things you think of as traditional

0:06:05.000,0:06:07.000
data, integers, and strings, and whatnot. It's

0:06:07.000,0:06:12.000
also say as part of your operations, you may need to make a call out to some other function.

0:06:12.000,0:06:15.000
Let's leave it open what that function is and allow the client to specify what function

0:06:15.000,0:06:20.000
to call at that time. So in this case for a plot, what function that you're trying to plot, let the client

0:06:20.000,0:06:21.000
tell you,

0:06:21.000,0:06:29.000
and then based on what they ask you to plot you can plot different things. All right. Way in the back. Is

0:06:29.000,0:06:34.000
there a similar setup for multivariable functions [inaudible]? Certainly. So all I would

0:06:34.000,0:06:37.000
need to do if this was a function that took a couple arguments, I would say double comma double comma int. If it

0:06:37.000,0:06:42.000
returned void, [inaudible] returned. Sometimes it looks a little bit like the prototype kinda taken out of context

0:06:42.000,0:06:43.000
and stuffed in there,

0:06:43.000,0:06:46.000
and then those parens around the function are a very important part of that

0:06:46.000,0:06:50.000
which is telling you yeah, this is a function of these

0:06:50.000,0:06:52.000
with this prototype information.

0:06:52.000,0:06:54.000
Behind you. No? You're good now. Somebody else? Over here? Is that FN a fixed

0:06:54.000,0:06:56.000
name

0:06:56.000,0:06:57.000


0:06:57.000,0:07:01.000
[inaudible]? No. It's just like any parameter name. You get to pick it. So I could've called it plot function, my function,

0:07:01.000,0:07:08.000
your function, whatever I wanted. Here

0:07:08.000,0:07:12.000
in the front. [Inaudible] Java? So Java doesn't really have a similar mechanism that looks like this. C does, so C++ inherits it

0:07:12.000,0:07:14.000
from

0:07:14.000,0:07:18.000
C. There are other ways you try to accomplish this in Java. It tries to support the same functionality in the end, but it

0:07:18.000,0:07:19.000


0:07:19.000,0:07:22.000
uses a pretty different approach than a functions as

0:07:22.000,0:07:27.000
data approach. [Inaudible]?

0:07:27.000,0:07:28.000
Can you pass like a

0:07:28.000,0:07:31.000
method, like an operator? So typically not.

0:07:31.000,0:07:35.000
This syntax that's being used here is for a free function, a function

0:07:35.000,0:07:38.000
that's kind of out in the global namespace, that level.

0:07:38.000,0:07:42.000
There is a different syntax for passing a method, which is a little bit more messy,

0:07:42.000,0:07:46.000
and we won't tend to need it, so I won't go there with you, but it does exist. It just as it stands

0:07:46.000,0:07:50.000
does not want a method. It wants a function. So

0:07:50.000,0:07:54.000
a method meaning member function of a class.

0:07:54.000,0:07:56.000
Okay, so let me -that was kind of just to

0:07:56.000,0:08:00.000
set the groundwork for the problem we were trying to solve in set,

0:08:00.000,0:08:05.000
which was set is holding this collection of elements that the client has stuffed in there. It's

0:08:05.000,0:08:05.000
a generic

0:08:05.000,0:08:07.000
templative class, so it doesn't

0:08:07.000,0:08:11.000
have any preconceived notion about what's being stored. Are the strings? Are they student structures?

0:08:11.000,0:08:13.000
Are they integers?

0:08:13.000,0:08:17.000
And that in order to perform its operations efficiently, it actually is using this

0:08:17.000,0:08:22.000
notion of ordering, keeping them in an order so that it can iterate an order. It can quickly find on the basis

0:08:22.000,0:08:27.000
of using order to quickly decide where something has to be and if it's present in the collection.

0:08:27.000,0:08:29.000
So how does it know how to compare

0:08:29.000,0:08:31.000
something that's of unknown type?

0:08:31.000,0:08:36.000
Well, what it does is it has a default strategy. It makes an assumption that

0:08:36.000,0:08:39.000
if I used equals equals and less than that would tell me kinda where to go.

0:08:39.000,0:08:42.000
And so it's got this idea that it wants to know. We'll give it two things.

0:08:42.000,0:08:47.000
Are they the same thing? In which case, it uses kinda the zero to show the ordering between them. If one

0:08:47.000,0:08:51.000
precedes the other, it wants to have some negative number that says well this one precedes it, or some

0:08:51.000,0:08:53.000
positive number

0:08:53.000,0:08:54.000
if it follows it.

0:08:54.000,0:08:57.000
So it applies this operation to the [inaudible] that

0:08:57.000,0:09:02.000
are there, and for strings, and ints, and doubles, and characters,

0:09:02.000,0:09:04.000
this works perfectly well.

0:09:04.000,0:09:08.000
And so that's how without us going out of our way, we can have sets of things

0:09:08.000,0:09:11.000
that respond to the built in relational operators

0:09:11.000,0:09:14.000
without any special effort as a client.

0:09:14.000,0:09:17.000
But what we can get into trouble with right is when

0:09:17.000,0:09:20.000
equals equals less than don't make sense for type.

0:09:20.000,0:09:24.000
So let me just go actually type some code, and I'll show you. I have it on the slide, but I'm gonna -wow,

0:09:24.000,0:09:29.000
this chair suddenly got really short. [Inaudible] fix that. Okay.

0:09:29.000,0:09:30.000


0:09:30.000,0:09:33.000
We'll go over here because I think it's better just to see it really happening,

0:09:33.000,0:09:36.000
so I'm gonna ignore this piece of code because it's not what I wanted.

0:09:36.000,0:09:40.000
But if I make some student T structure,

0:09:40.000,0:09:40.000
and it's got

0:09:40.000,0:09:44.000
the first and last name, and it's got the ID number,

0:09:44.000,0:09:48.000
and maybe that's all I want for now

0:09:48.000,0:09:48.000
-that if

0:09:48.000,0:09:51.000
down in my main -can't

0:09:51.000,0:09:54.000
find my main. There it is. I'm

0:09:54.000,0:09:56.000
gonna need that piece of code later, so I'm gonna leave it there.

0:09:56.000,0:09:58.000
If I make a set,

0:09:58.000,0:10:05.000
and I say I'd like a set of students -my class

0:10:05.000,0:10:09.000
-and I do this. And so I feel like I haven't gotten [inaudible]. I made this structure. I

0:10:09.000,0:10:14.000
say I'd like to make a set of students, that each student is in the class exactly once and I don't need any duplicates,

0:10:14.000,0:10:18.000
and I go through the process of trying to compile this, it's gonna give me some complaints.

0:10:18.000,0:10:21.000
And its complaint, which is a little bit hard to see up here, is

0:10:21.000,0:10:26.000
there's no match for operator equals equals in one, equals equals two, and operator less than in

0:10:26.000,0:10:27.000
one equals

0:10:27.000,0:10:30.000
two. So it's kind of showing me another piece of code that's kind of a

0:10:30.000,0:10:35.000
hidden piece of code I haven't actually seen directly, which is this operator compare function.

0:10:35.000,0:10:37.000
And that is the one that the set is using, as

0:10:37.000,0:10:40.000
it has this idea of what's the way it should compare two things. It says well I have some of this

0:10:40.000,0:10:46.000
operator compare that works generically on any type of things using equals equals and less than. And

0:10:46.000,0:10:50.000
if I click up here on the instantiate from here, it's gonna help me to understand what caused

0:10:50.000,0:10:52.000
this problem.

0:10:52.000,0:10:53.000
The problem was caused by

0:10:53.000,0:10:57.000
trying to create a set who was holding student T.

0:10:57.000,0:11:00.000
And so this gives you a little bit of an insight on how the template

0:11:00.000,0:11:02.000
operations are handled by the compiler, that

0:11:02.000,0:11:06.000
I have built this whole set class, and it depends on there being equals equals and less than

0:11:06.000,0:11:08.000
working for the type.

0:11:08.000,0:11:13.000
The fact that it didn't work for student T wasn't a cause for alarm until

0:11:13.000,0:11:15.000
I actually tried to instantiate

0:11:15.000,0:11:18.000
it. So at the point where I said I'd like to make a set holding student

0:11:18.000,0:11:22.000
T, the first point where the compiler actually goes through the process of generating a whole set

0:11:22.000,0:11:25.000
class, the set angle brackets student T,

0:11:25.000,0:11:29.000
filling in all the details, kinda working it all out, making the add, and contains, and whatever

0:11:29.000,0:11:32.000
operations, and then in the process of those, those

0:11:32.000,0:11:36.000
things are making calls that try to take student T objects and compare them using less than and

0:11:36.000,0:11:37.000
equals equals.

0:11:37.000,0:11:39.000
And that causes that code to fail.

0:11:39.000,0:11:42.000
So the code that's really failing is kind of somewhere in the class library,

0:11:42.000,0:11:44.000
but it's failing because

0:11:44.000,0:11:49.000
the things that we're passing through and instantiating for don't work with that

0:11:49.000,0:11:50.000
setup.

0:11:50.000,0:11:52.000
So it's something we've gotta

0:11:52.000,0:11:54.000
fix, we've gotta do something about.

0:11:54.000,0:11:59.000
Let me go back here and say what am I gonna do. Well, so

0:11:59.000,0:12:01.000
there's my error message. Same one,

0:12:01.000,0:12:04.000
right? Saying yeah, you can't do that

0:12:04.000,0:12:05.000
with a [inaudible]. Well, what

0:12:05.000,0:12:06.000
we do

0:12:06.000,0:12:09.000
is we use this notion of functions as data

0:12:09.000,0:12:12.000
to work out a solution for this problem

0:12:12.000,0:12:15.000
that if you think about kinda what's going on, the set actually knows everything about how

0:12:15.000,0:12:19.000
to store things. It's a very fancy efficient structure that says given your things, keeps them

0:12:19.000,0:12:20.000
in order, and

0:12:20.000,0:12:22.000
it manages to update, and

0:12:22.000,0:12:24.000
insert, and search that thing very efficiently.

0:12:24.000,0:12:28.000
But it doesn't know given any two random things how to compare them other than this assumption

0:12:28.000,0:12:32.000
it was making about less than and equals equals being a way to tell.

0:12:32.000,0:12:37.000
If it wants to have sort of a more sophisticated handling of that, what it needs to do is cooperate

0:12:37.000,0:12:38.000
with the client -that

0:12:38.000,0:12:43.000
the implementer of the set can't do it all. So there's these two programmers that

0:12:43.000,0:12:44.000
need to work in harmony.

0:12:44.000,0:12:50.000
So what the set does is it allows for the client to specify by providing a function.

0:12:50.000,0:12:54.000
It says well when I need to compare two things, how about you give me the name of a function

0:12:54.000,0:12:58.000
that when given two elements will return to me their ordering, this integer zero,

0:12:58.000,0:13:00.000
negative, positive that tells me how to put them in place.

0:13:00.000,0:13:05.000
And so the set kind of writes all of its operations in terms of well there's some function I can call, this function that

0:13:05.000,0:13:07.000
will compare two things.

0:13:07.000,0:13:11.000
If they don't specify one, I'll use this default one that maps them to the relationals, but if they do give

0:13:11.000,0:13:15.000
me one, I'll just ask them to do the comparison. And so then as its doing its searching and inserting

0:13:15.000,0:13:16.000
and whatnot,

0:13:16.000,0:13:19.000
it's calling back. We call that calling back to the client.

0:13:19.000,0:13:22.000
So the client writes a function. If I wanna put

0:13:22.000,0:13:25.000
student Ts into a set,

0:13:25.000,0:13:28.000
then I need to say when you compare two student Ts, what do you look at to know if they're

0:13:28.000,0:13:29.000
the same or how to order them.

0:13:29.000,0:13:31.000
So maybe I'm gonna order them by ID number.

0:13:31.000,0:13:33.000
Maybe I'm gonna use their first and last name.

0:13:33.000,0:13:37.000
Whatever it means for two things to be equal and have some sense of order,

0:13:37.000,0:13:38.000
I supply

0:13:38.000,0:13:40.000
-I write the function,

0:13:40.000,0:13:43.000
and then I pass it to the set constructor.

0:13:43.000,0:13:44.000
I say here's the function to use.

0:13:44.000,0:13:49.000
The set will hold on to that function. So I say here's the compare student structure function.

0:13:49.000,0:13:52.000
It holds onto that name, and when needed it calls back.

0:13:52.000,0:13:53.000
It says I'm

0:13:53.000,0:13:57.000
about to go look for a student structure. Is this the one? Well, I don't know if two student structures are

0:13:57.000,0:14:01.000
the same. I'll ask the client. Here's two student structures. Are they the same?

0:14:01.000,0:14:02.000
And then

0:14:02.000,0:14:06.000
as needed, it'll keep looking or

0:14:06.000,0:14:09.000
insert and add and do whatever I need to do.

0:14:09.000,0:14:13.000
So let's go back over here.

0:14:13.000,0:14:19.000
I'll write a little function. So the

0:14:19.000,0:14:21.000
prototype for it is it takes two elem Ts

0:14:21.000,0:14:24.000
and it returns an int.

0:14:24.000,0:14:31.000
That int is expected to have a value zero if they are the same,

0:14:31.000,0:14:33.000
and a value that is negative if

0:14:33.000,0:14:37.000
the first argument precedes the second. So if A is less than B, returns some negative

0:14:37.000,0:14:41.000
thing. You can return negative one, or negative 100, or negative one million,

0:14:41.000,0:14:42.000
but

0:14:42.000,0:14:45.000
you need to return some negative value. And then if A [cuts

0:14:45.000,0:14:47.000
out] some ordering, so they're not equal

0:14:47.000,0:14:51.000
[cuts out] later, it will return some positive value: 1, 10, 6 million.

0:14:51.000,0:14:55.000
So if I do [cuts out] say I use ID num as my comparison.

0:14:55.000,0:14:56.000
Based on [cuts

0:14:56.000,0:14:58.000
out] are the same, if they are

0:14:58.000,0:15:01.000
I can return zero. And if

0:15:01.000,0:15:03.000
ID num of A

0:15:03.000,0:15:07.000
is less than the ID num of B, I can

0:15:07.000,0:15:08.000
return negative one, and then in

0:15:08.000,0:15:11.000
the other case, I'll return one.

0:15:11.000,0:15:14.000
So it will compare them on the basis [cuts out]

0:15:14.000,0:15:16.000
figuring that the name field

0:15:16.000,0:15:17.000
at that point is

0:15:17.000,0:15:19.000
nothing new.

0:15:19.000,0:15:24.000
And then the way I use that is over here when I'm constructing it is there is

0:15:24.000,0:15:27.000
[cuts out] to the constructor, and so that's how I would pass [cuts out] add

0:15:27.000,0:15:29.000
parens to the [cuts

0:15:29.000,0:15:35.000
out] as I'm declaring it, and then I pass the name. Do I call it compare student or compare students? I can't remember. Compare student. Okay.

0:15:35.000,0:15:38.000
And

0:15:38.000,0:15:42.000
this then -so now there's nothing going on in the code -causes it to compile,

0:15:42.000,0:15:45.000
and that if you were to put

0:15:45.000,0:15:50.000
let's say a see out statement in your comparison function just for fun, you would find out as

0:15:50.000,0:15:53.000
you were doing adds, and compares, and removes, and whatnot on this set

0:15:53.000,0:15:58.000
that you would see that your call kept being made. It kept calling back to you as the

0:15:58.000,0:16:01.000
client saying I need to compare these things. I need to compare these things to decide where to put something,

0:16:01.000,0:16:03.000
whether it had something,

0:16:03.000,0:16:04.000
and whatnot.

0:16:04.000,0:16:08.000
And then based on your ordering, that would control for example how the iterator worked, that

0:16:08.000,0:16:12.000
the smallest one according to your function would be the one first returned by your

0:16:12.000,0:16:14.000
iterator, and it would move through larger

0:16:14.000,0:16:17.000
or sort of later in the ordering ones until the end. So

0:16:17.000,0:16:20.000
it's a very powerful mechanism

0:16:20.000,0:16:24.000
that's at work here because it means that for

0:16:24.000,0:16:28.000
anything you wanna put in a set, as long as you're willing to say how it is you compare

0:16:28.000,0:16:28.000
it,

0:16:28.000,0:16:33.000
then the set will take over and do the very efficient storing, and searching, and organizing

0:16:33.000,0:16:34.000
of that.

0:16:34.000,0:16:38.000
But you, the only piece you need to supply is this one little thing it can't figure

0:16:38.000,0:16:42.000
out for itself, which is given your type of thing, how do you compare

0:16:42.000,0:16:46.000
it. For the built in types string, and int, and double, and car,

0:16:46.000,0:16:51.000
it does have a default comparison function, that one that was called operator compare. Let

0:16:51.000,0:16:55.000
me go open it for you

0:16:55.000,0:16:56.000
[inaudible] the 106.

0:16:56.000,0:16:58.000
So there is a

0:16:58.000,0:17:00.000
compare function dot H,

0:17:00.000,0:17:02.000
and this is actually what

0:17:02.000,0:17:05.000
the default version of it looks. It's actually written as a template itself

0:17:05.000,0:17:07.000
that given two things it

0:17:07.000,0:17:08.000
just

0:17:08.000,0:17:11.000
turns around and asks the built in operators to help us out with that.

0:17:11.000,0:17:15.000
And that is the name that's being used

0:17:15.000,0:17:17.000
if I open the set

0:17:17.000,0:17:20.000
and you look at its constructor call. I had said that I would come back and tell you about

0:17:20.000,0:17:22.000
what this was

0:17:22.000,0:17:25.000
that the argument going into the set constructor

0:17:25.000,0:17:31.000
is one parameter whose name is CMP function that takes two elem type things -so here's the

0:17:31.000,0:17:33.000
one in the example of the two argument prototype

0:17:33.000,0:17:38.000
-returns an int, and then it uses a default assignment for that of operator compare, the one we just

0:17:38.000,0:17:39.000
looked at,

0:17:39.000,0:17:41.000
so that if you don't specify it, it goes

0:17:41.000,0:17:46.000
through and generates the standard comparison function for the type,

0:17:46.000,0:17:47.000
which

0:17:47.000,0:17:49.000
for built-ins will work fine, but for

0:17:49.000,0:17:53.000
user defined things is gonna create compiler errors.

0:17:53.000,0:17:56.000
So you can also choose if you don't like the default ordering works.

0:17:56.000,0:18:01.000
So for example, if you wanted to build a set of strings that was case insensitive

0:18:01.000,0:18:04.000
-so the default string handling would be to use equals equals and less than, which actually

0:18:04.000,0:18:08.000
does care about case. It doesn't think that capital [inaudible] is the same as lower case.

0:18:08.000,0:18:10.000
If you wanted it to consider them the same,

0:18:10.000,0:18:14.000
you could supply your own. A compare case insensitively function took two strings,

0:18:14.000,0:18:16.000
converted their case, and then compared them.

0:18:16.000,0:18:19.000
And then when you establish a set of strings, instead of letting the default argument take over,

0:18:19.000,0:18:24.000
go ahead and use your case insensitive compare, and then now you have a set of strings

0:18:24.000,0:18:26.000
that operates case insensitively.

0:18:26.000,0:18:28.000
So you can change

0:18:28.000,0:18:31.000
the ordering, adapt the ordering, whatever you like for the primitives, as well as supply the necessary

0:18:31.000,0:18:32.000
one for the

0:18:32.000,0:18:34.000
things the built-ins

0:18:34.000,0:18:35.000
don't have

0:18:35.000,0:18:40.000
properties for. So then that's that piece of code right

0:18:40.000,0:18:41.000
there. All right. So

0:18:41.000,0:18:44.000
does that make sense?

0:18:44.000,0:18:45.000
Well, now you know

0:18:45.000,0:18:49.000
kind of the whole range of things that are available in the class library. All

0:18:49.000,0:18:53.000
right, so we saw the four sequential containers, the vector, stack, queue,

0:18:53.000,0:18:55.000
and the grid that you kinda indexed ordering,

0:18:55.000,0:18:58.000
and kinda allowed you to throw things in and get them back out.

0:18:58.000,0:19:02.000
We went through the map, which is the sort of fancy heavy lifter that does that key value

0:19:02.000,0:19:04.000
lookup, and then we've seen the set, which does kind of

0:19:04.000,0:19:06.000
aggregate collection management,

0:19:06.000,0:19:10.000
and very efficient operations for kind of searching, retrieving, ordering,

0:19:10.000,0:19:12.000


0:19:12.000,0:19:13.000
joining with other kind of sets, and stuff

0:19:13.000,0:19:16.000
that also has a lot of high utility. I wanna do

0:19:16.000,0:19:19.000
one quick little program with you before I start recursion

0:19:19.000,0:19:21.000
just because I think it's kinda cool

0:19:21.000,0:19:23.000
is to talk a little about this idea of like once

0:19:23.000,0:19:26.000
you have these ADTs, you can solve a lot of cool problems, and that's certainly what this

0:19:26.000,0:19:29.000
week's assignment is about. It's like well here are these tasks that

0:19:29.000,0:19:31.000
if you didn't have

0:19:31.000,0:19:34.000
-So ADTs, just a reminder -I say this word as though everybody knows exactly what it means

0:19:34.000,0:19:37.000
-is just the idea of an abstract data type.

0:19:37.000,0:19:39.000
So an abstract data type is a data type

0:19:39.000,0:19:42.000
that you think of in terms of what it provides as an abstraction.

0:19:42.000,0:19:44.000
So a queue is this FIFO line,

0:19:44.000,0:19:47.000
and how it works internally, what it's implemented as, we're not worried

0:19:47.000,0:19:51.000
about it at all. We're only worried about the abstraction of enqueue and dequeue, and it coming first

0:19:51.000,0:19:53.000
in first out. So we talk

0:19:53.000,0:19:56.000
about ADTs, we say once we have a queue, a stack, or a vector,

0:19:56.000,0:19:58.000
we know what those things do,

0:19:58.000,0:20:00.000
what a mathematical set is about.

0:20:00.000,0:20:03.000
We build on top of that. We write code

0:20:03.000,0:20:06.000
that leverages those ADTs to do cool things

0:20:06.000,0:20:10.000
without having to also manage the low level details of where the memory came from for these things,

0:20:10.000,0:20:13.000
how they grow, how they search, how

0:20:13.000,0:20:15.000
they store and organize the data.

0:20:15.000,0:20:17.000
You just get to do real cool things.

0:20:17.000,0:20:21.000
So you probably got a little taste of that at the end of 106A when you get to use the array list and the

0:20:21.000,0:20:24.000
hashmap to do things. This set kinda just expands out

0:20:24.000,0:20:26.000
to fill out some other niches

0:20:26.000,0:20:28.000
where you can do a lot of really cool things

0:20:28.000,0:20:30.000
because you have these things around to build on.

0:20:30.000,0:20:34.000
So one of the things that happens a lot is you tend to do layered things, and you'll see a little bit of this

0:20:34.000,0:20:38.000
in the assignment you're doing this week where it's not just a set of something, it's a set of a map

0:20:38.000,0:20:39.000
of something, or

0:20:39.000,0:20:43.000
a vector of queues, a map of set. So I gave you a couple of examples here of

0:20:43.000,0:20:44.000
the things

0:20:44.000,0:20:48.000
that might be useful. Like if you think of what a smoothie is, it's a set of things

0:20:48.000,0:20:52.000
mixed together, some yogurt, and some different fruits, some wheatgrass, whatever it is you have in

0:20:52.000,0:20:53.000
it.

0:20:53.000,0:20:57.000
And that the menu for a smoothie shop is really just a bunch of those sets, so each set

0:20:57.000,0:21:00.000
of ingredients is a particular smoothie they have, and then

0:21:00.000,0:21:04.000
the set of all those sets is the menu that they post up on the board you can come

0:21:04.000,0:21:06.000
in and order.

0:21:06.000,0:21:07.000
The

0:21:07.000,0:21:09.000
compiler tends to use a map

0:21:09.000,0:21:13.000
to keep track of all the variables that are in scope. As you declare variables, it

0:21:13.000,0:21:17.000
adds them to the map so that when you later use that name, it knows where to find it.

0:21:17.000,0:21:19.000
Well, it also has to manage though

0:21:19.000,0:21:21.000
not just one map,

0:21:21.000,0:21:25.000
but the idea is as you enter and exit scopes, there is this

0:21:25.000,0:21:27.000
layering of open scopes.

0:21:27.000,0:21:32.000
So you have some open scope here. You go into a for loop where you open another scope. You add some

0:21:32.000,0:21:33.000
new variables

0:21:33.000,0:21:37.000
that when you look it actually shadows then at the nearest definition, so if you had two variables

0:21:37.000,0:21:41.000
of the same, it needs to look at the one that's closest.

0:21:41.000,0:21:44.000
And then when you exit that scope, it needs those variables to go away and

0:21:44.000,0:21:48.000
no longer be accessible. So one model for that could be very much a stack of

0:21:48.000,0:21:51.000
maps where each of those maps represents the scope

0:21:51.000,0:21:52.000
that's active,

0:21:52.000,0:21:56.000
a set of variable names, and maybe their types and information is stored in that map. And then

0:21:56.000,0:21:58.000
you stack them up. As you

0:21:58.000,0:22:01.000
open a scope, you push on a new empty map.

0:22:01.000,0:22:03.000
You put things into it, and then

0:22:03.000,0:22:06.000
maybe you enter another scope. You push on another new empty map. You stick things into it, but as you

0:22:06.000,0:22:11.000
exit and hit those closing braces, you pop those things from the stack to get to

0:22:11.000,0:22:13.000
this previous

0:22:13.000,0:22:16.000
environment you were in. So let me do a

0:22:16.000,0:22:19.000
little program with you. I just have

0:22:19.000,0:22:22.000
this idea of how this would work, and we'll see if

0:22:22.000,0:22:26.000
I can code something up. So I have -let's go

0:22:26.000,0:22:28.000
back over

0:22:28.000,0:22:32.000
here. I'm going to -this is the piece of code that just reads

0:22:32.000,0:22:34.000
words, so that's a

0:22:34.000,0:22:36.000
fine piece of code to have here.

0:22:36.000,0:22:38.000
I have a file here -let

0:22:38.000,0:22:39.000
me open

0:22:39.000,0:22:43.000
it up for you -that just contains the contents of the Official Scrabble Players' Dictionary,

0:22:43.000,0:22:45.000
Edition 2.

0:22:45.000,0:22:47.000
It's got a lot of words it.

0:22:47.000,0:22:48.000
It's pretty long.

0:22:48.000,0:22:49.000
It's still loading.

0:22:49.000,0:22:53.000
Let's go back and do something else while it's loading.

0:22:53.000,0:22:54.000
It happened to have

0:22:54.000,0:22:55.000


0:22:55.000,0:22:58.000
about 120,000 words I think is what it would be right now.

0:22:58.000,0:22:59.000
So it's busy loading,

0:22:59.000,0:23:01.000
and I have this question for you.

0:23:01.000,0:23:04.000
There are certain words that are anagrams of each other.

0:23:04.000,0:23:06.000


0:23:06.000,0:23:08.000
The word

0:23:08.000,0:23:10.000
cheap can be anagrammed into the word peach,

0:23:10.000,0:23:11.000
things like that.

0:23:11.000,0:23:12.000


0:23:12.000,0:23:13.000
And so I

0:23:13.000,0:23:15.000
am curious for the

0:23:15.000,0:23:16.000
Official Scrabble Players' Dictionary

0:23:16.000,0:23:20.000
what -so if you imagine that some words can be anagrammed a couple times, five

0:23:20.000,0:23:24.000
or six different words just on how you can rearrange the letters. I'm curious to know what the largest anagram

0:23:24.000,0:23:28.000
cluster is in the Official Scrabble Players' Dictionary. So I'd like to know across

0:23:28.000,0:23:30.000
all 127,000 words

0:23:30.000,0:23:34.000
that they form little clusters, and I'd like to find out what's the biggest of

0:23:34.000,0:23:39.000
those clusters.

0:23:39.000,0:23:41.000
Okay. That's my goal. So

0:23:41.000,0:23:45.000
here's what I've got going. I've got something that's gonna read them one by one.

0:23:45.000,0:23:48.000
So let's brainstorm for a second.

0:23:48.000,0:23:49.000


0:23:49.000,0:23:51.000
I want a way to take a particular word and kinda stick

0:23:51.000,0:23:55.000
it with its other anagram cluster friends.

0:23:55.000,0:23:57.000
What's a way I might do that?

0:23:57.000,0:24:03.000
Help me design my data structure.

0:24:03.000,0:24:06.000
Help me out. [Inaudible]. I've

0:24:06.000,0:24:08.000
got the word peach.

0:24:08.000,0:24:10.000
Where should I stick it so I can ? You could treat each string as like a set of

0:24:10.000,0:24:13.000
? So I have this

0:24:13.000,0:24:17.000
string,

0:24:17.000,0:24:21.000
which represents the letters. I've

0:24:21.000,0:24:22.000
got the word peach. I wanna be able to stick peach with cheap,

0:24:22.000,0:24:26.000
so where should I stick peach in such a way that I could find it. And you've got this idea that

0:24:26.000,0:24:27.000


0:24:27.000,0:24:29.000
the letters there are a set.

0:24:29.000,0:24:32.000
They're not quite a set, though. Be careful because the word banana has a

0:24:32.000,0:24:36.000
couple As and a couple Ns, and so it's not that I'd really want it to come down to be

0:24:36.000,0:24:39.000
the set BAN. I wouldn't wanna coalesce the duplicates on that, so I really do wanna preserve all the letters that

0:24:39.000,0:24:43.000
are in there,

0:24:43.000,0:24:46.000
but your idea's getting us somewhere. It's like there is this idea of kind of like for any

0:24:46.000,0:24:49.000
particular word there's the

0:24:49.000,0:24:53.000
collection of letters that it is formed from.

0:24:53.000,0:24:56.000
And somehow if I could use that as an identifier

0:24:56.000,0:24:58.000
in a way that was reliable

0:24:58.000,0:25:01.000
-anybody

0:25:01.000,0:25:02.000
got any

0:25:02.000,0:25:03.000
ideas

0:25:03.000,0:25:08.000
about what to do with that? If you did that a vector, each letter and then the frequency of each letter in the word? So I could certainly do that. I could build kind of a vector that

0:25:08.000,0:25:12.000
had kinda frequencies, that had this little struct that maybe it was number of times it occurs.

0:25:12.000,0:25:15.000
Then I could try to build something on the basis of that vector that was like

0:25:15.000,0:25:18.000
here is -do these two vectors

0:25:18.000,0:25:22.000
match? Does banana match apple? And you'd say well no. It turns out they don't

0:25:22.000,0:25:23.000
have the same letters. I'm trying to

0:25:23.000,0:25:27.000
think of a really easy way to represent that. Your idea's good,

0:25:27.000,0:25:31.000
but I'm thinking really lazy. So somebody help me who's lazy. Could you

0:25:31.000,0:25:36.000
have a map where the key is all the letters in alphabetical order and the value is a

0:25:36.000,0:25:41.000
vector of all the ? Yeah. That is a great idea. You're taking his idea, and you're making it easier.

0:25:41.000,0:25:43.000
You're capitalizing on lazy, which is yeah

0:25:43.000,0:25:45.000
-I wanna keep track of all the letters that are in the word,

0:25:45.000,0:25:48.000
but I wanna do it in a way that makes it really easy for

0:25:48.000,0:25:52.000
example to know whether two have the same -we'll call it the signature. The signature of

0:25:52.000,0:25:54.000
the word is the letter frequency across it.

0:25:54.000,0:25:57.000
If I could come up with a way to represent the signature that was really to compare two

0:25:57.000,0:26:01.000
signatures quickly to see if they're the same, then I will have less work to do. And so your idea is a great

0:26:01.000,0:26:04.000
one. We take the letters and we alphabetize them.

0:26:04.000,0:26:08.000
So cheap and peach both turn into the same ACEHP. It's

0:26:08.000,0:26:13.000
a nonsense thing, but it's a signature. It's unique for any anagram, and

0:26:13.000,0:26:14.000
if I use a map

0:26:14.000,0:26:15.000
where that's the key,

0:26:15.000,0:26:17.000
then I can associate with every word

0:26:17.000,0:26:20.000
that had that same signature.

0:26:20.000,0:26:23.000
So

0:26:23.000,0:26:27.000
let's start building that. So let me write

0:26:27.000,0:26:29.000
-I wanna call this the signature.

0:26:29.000,0:26:31.000
Given a string,

0:26:31.000,0:26:33.000
it's going to

0:26:33.000,0:26:34.000
alphabetize it.

0:26:34.000,0:26:37.000
I'm going to write the

0:26:37.000,0:26:40.000
dumbest version of this ever,

0:26:40.000,0:26:47.000
but I'm just gonna use a simple sorting routine on this. So

0:26:47.000,0:26:51.000
smallest is gonna small index -we'll call it min index. That's a better name for it.

0:26:51.000,0:26:53.000
Min index equals I,

0:26:53.000,0:26:57.000
and then I'm gonna look through the string, and I'm gonna find the smallest letter that's

0:26:57.000,0:26:58.000
there and

0:26:58.000,0:27:03.000
move it to the front. That's basically gonna be my strategy. I've

0:27:03.000,0:27:05.000
got the wrong place to start, though. I'm gonna

0:27:05.000,0:27:11.000
look from I to the one. So if

0:27:11.000,0:27:13.000
S sub J

0:27:13.000,0:27:15.000
is less than S sub min index,

0:27:15.000,0:27:20.000
so it is a smaller letter, then the min index gets to be J.

0:27:20.000,0:27:23.000
So far, what I've done is I've run this loop that kind

0:27:23.000,0:27:27.000
of starts in this case on the very first iteration and says the first character in Slot 0, that

0:27:27.000,0:27:31.000
must be the min. And then it looks from there to the end. Is there anything smaller? Whenever it

0:27:31.000,0:27:34.000
find anything smaller, it updates the min index, so when it's done

0:27:34.000,0:27:39.000
after that second loop has fully

0:27:39.000,0:27:40.000
operated,

0:27:40.000,0:27:44.000
then min index will point to the smallest alphabetically

0:27:44.000,0:27:45.000
character in the

0:27:45.000,0:27:49.000
string that we have. Then I'm gonna swap it to the front.

0:27:49.000,0:27:52.000
So S sub I

0:27:52.000,0:27:55.000
with S sub min index, and

0:27:55.000,0:28:08.000
I'll write a swap because swap is pretty easy to write.

0:28:08.000,0:28:10.000
See about how we do this, okay.

0:28:10.000,0:28:12.000
So

0:28:12.000,0:28:13.000
if I

0:28:13.000,0:28:16.000
-I like how this works, and then let me stop and say

0:28:16.000,0:28:17.000
right here, now

0:28:17.000,0:28:19.000
is a good time to test this thing.

0:28:19.000,0:28:23.000
I just wrote signature. It probably works, but it potentially

0:28:23.000,0:28:24.000
could not.

0:28:24.000,0:28:27.000
And I'm just gonna show you a little bit of how I write code. This is good to know. As I

0:28:27.000,0:28:28.000
say,

0:28:28.000,0:28:29.000


0:28:29.000,0:28:32.000
I put some code in here that I plan on throwing away.

0:28:32.000,0:28:35.000
Enter word

0:28:35.000,0:28:42.000
-and then I throw it through my function. I think I called it

0:28:42.000,0:28:44.000
signature, didn't I?

0:28:44.000,0:28:47.000
Signature S

0:28:47.000,0:28:51.000
-so the idea being if it doesn't work, I wanna find out sooner rather than

0:28:51.000,0:28:52.000
later. It

0:28:52.000,0:28:55.000
doesn't like my use of get line. Is that because it's not

0:28:55.000,0:28:57.000
included?

0:28:57.000,0:29:00.000
Yes, it's not. So let's go get the right header files. This is half the

0:29:00.000,0:29:03.000
battle sometimes is figuring out what

0:29:03.000,0:29:04.000
headers

0:29:04.000,0:29:05.000
are needed to make

0:29:05.000,0:29:10.000
your compiler happy. So now it's up

0:29:10.000,0:29:14.000
here at enter word, and I say what does cheap come out as?

0:29:14.000,0:29:16.000
It goes into the debugger. That's good.

0:29:16.000,0:29:17.000
So it wasn't right.

0:29:17.000,0:29:19.000
Now we're gonna get to find out

0:29:19.000,0:29:21.000
what does it not like about that.

0:29:21.000,0:29:27.000
It says

0:29:27.000,0:29:29.000
-did we forget to return something?

0:29:29.000,0:29:30.000
Let's go look at our code.

0:29:30.000,0:29:34.000
So it's complaining about -oh yeah, I can see where we're gonna get in trouble here. It's

0:29:34.000,0:29:38.000
complaining that the return -it was saying that I'm trying to print something and it looks like garbage, that

0:29:38.000,0:29:40.000
what I'm trying to print didn't make sense at all.

0:29:40.000,0:29:47.000
I could say well that's funny. Let's go look at signature. [Inaudible] pass the [inaudible].

0:29:47.000,0:29:51.000
That's probably a good idea. We ought to fix that while we're there.

0:29:51.000,0:29:54.000
Let me leave that bug in for a minute because I'm gonna fix my first bug, and then I'll come back

0:29:54.000,0:29:55.000
to that one.

0:29:55.000,0:29:57.000
So it turns out what it's really complaining about though is it has to do with

0:29:57.000,0:30:01.000
-I said well what does signature return. Somehow, what's being returned by signature is being interpreted

0:30:01.000,0:30:04.000
as total crap when it got back to main, and there's a very good reason for that

0:30:04.000,0:30:06.000
because I never returned

0:30:06.000,0:30:09.000
anything. So maybe if I had been less cavalier about the fact that it was giving me a warning

0:30:09.000,0:30:11.000
over here that said

0:30:11.000,0:30:14.000
control reaches the end of non-void function, but I was being lazy and I didn't look

0:30:14.000,0:30:17.000
-was that I

0:30:17.000,0:30:18.000
didn't pay attention to the fact.

0:30:18.000,0:30:20.000
So let's leave my other bug in that you've already pointed out

0:30:20.000,0:30:24.000
because this is exactly what it's like when you're doing it. You enter words and you say cheap,

0:30:24.000,0:30:25.000
and it says cheap, and you're like

0:30:25.000,0:30:28.000
no.

0:30:28.000,0:30:32.000
And then you're like about how about an. Hey look, that's in alphabetical order. And then you spend all your time thinking about words

0:30:32.000,0:30:37.000
for a while. Well, tux. That's in alphabetical order. It seems to work. You could do that for a while, but

0:30:37.000,0:30:39.000
then

0:30:39.000,0:30:41.000
you're like this whole cheap,

0:30:41.000,0:30:42.000
not good.

0:30:42.000,0:30:46.000
Okay, so I come back here and my friend who's already one step ahead of me has pointed out that my swap

0:30:46.000,0:30:50.000
is missing the all important pass by reference

0:30:50.000,0:30:53.000
that as it is it what swapping the copies that got passed to

0:30:53.000,0:30:56.000
the function, but of course nothing was happening back here in signature land.

0:30:56.000,0:30:57.000
So if I fix that, and

0:30:57.000,0:30:59.000
I come back in here,

0:30:59.000,0:31:01.000
I'm gonna feel better about this. Oh,

0:31:01.000,0:31:04.000
my gosh. And even tux still works.

0:31:04.000,0:31:08.000
And if I say banana, I should get a bunch of As, a bunch of B and an N, so there's various cases to test out if you have

0:31:08.000,0:31:09.000
multiple letters, and if

0:31:09.000,0:31:12.000
you have letters that are the same letters like

0:31:12.000,0:31:16.000
apple so that it doesn't lose your duplicates, and it seems to come out right.

0:31:16.000,0:31:17.000
So given this,

0:31:17.000,0:31:19.000
the word peach

0:31:19.000,0:31:22.000
should come down to the same signature as cheap, and so that seems to indicate we're on

0:31:22.000,0:31:24.000
the path

0:31:24.000,0:31:24.000
toward

0:31:24.000,0:31:26.000
building the thing we wanted to build. So I

0:31:26.000,0:31:28.000
have this read file thing that I

0:31:28.000,0:31:29.000
have left over from last time that

0:31:29.000,0:31:31.000
reads each word,

0:31:31.000,0:31:35.000
and I wanna change my vector

0:31:35.000,0:31:37.000
into a map of

0:31:37.000,0:31:40.000
set of string.

0:31:40.000,0:31:48.000
What capitalization do you not like? [Inaudible].

0:31:48.000,0:31:52.000
Yeah. So it turns out this file happens to all be

0:31:52.000,0:32:00.000
lower case, but there's no harm in doing this.

0:32:00.000,0:32:03.000
That way even if they weren't, it'll

0:32:03.000,0:32:06.000
take care of that problem for us if we wanted it to.

0:32:06.000,0:32:09.000
[Inaudible].

0:32:09.000,0:32:11.000
I want lower case. Oh

0:32:11.000,0:32:12.000
yeah, I do. Well, your

0:32:12.000,0:32:15.000
case. You guys are so picky.

0:32:15.000,0:32:18.000
All right. Here's my deal. I'm gonna take my map,

0:32:18.000,0:32:21.000
and this is gonna be a line of code that's gonna make your head spin.

0:32:21.000,0:32:31.000
Just go with it.

0:32:31.000,0:32:38.000
This is the do all

0:32:38.000,0:32:40.000
craziness. Okay.

0:32:40.000,0:32:41.000
So

0:32:41.000,0:32:43.000
I've got this map,

0:32:43.000,0:32:46.000
and what I wanna do is under the signature of this word,

0:32:46.000,0:32:50.000
I want to look up the set of strings that's associated with it and tack this one in.

0:32:50.000,0:32:54.000
And the add in this case with the set, I know that's gonna do non-duplication. In fact,

0:32:54.000,0:32:58.000
the file doesn't contain duplicates, but if it did, I certainly don't want it to record it

0:32:58.000,0:33:00.000
twice anyway, so I might as well do this.

0:33:00.000,0:33:04.000
Now this form of this is a heavy lifter for a small piece of

0:33:04.000,0:33:04.000
code.

0:33:04.000,0:33:09.000
The signature then went and converted into the ACEHP form.

0:33:09.000,0:33:12.000
I used that as the key into the table.

0:33:12.000,0:33:16.000
If it was already there, it's gonna retrieve me an existing set that I'm gonna just go ahead and

0:33:16.000,0:33:21.000
add a word onto. If it wasn't there, the behavior for the brackets is to create

0:33:21.000,0:33:26.000
kind of a new empty

0:33:26.000,0:33:28.000
value for that. So it'll use the key

0:33:28.000,0:33:32.000
and create a default value for type. Well, the default value for set of string -so if you just create a

0:33:32.000,0:33:32.000
set

0:33:32.000,0:33:36.000
without any other information in the default constructor will always create you a

0:33:36.000,0:33:40.000
nice clean empty set. So in fact, it will get me exactly what I want which is to put it in there with an empty

0:33:40.000,0:33:43.000
set that I will immediately add the word into. So after

0:33:43.000,0:33:46.000
I do this, I should have this fully populated map. And

0:33:46.000,0:33:49.000
then I'm

0:33:49.000,0:33:56.000
gonna do this just as a little test. I'm gonna say num words when I'm done

0:33:56.000,0:33:58.000
to feel

0:33:58.000,0:34:00.000
a little bit confident about what

0:34:00.000,0:34:04.000
got put in there. How

0:34:04.000,0:34:06.000
about I call it -that's a good idea. It's

0:34:06.000,0:34:11.000
so fussy. C++ never does what you want.

0:34:11.000,0:34:15.000
I think I called this thing OSPD2.txt that has the

0:34:15.000,0:34:20.000
words in it.

0:34:20.000,0:34:29.000
And then I need to declare the variable that I'm sticking all this stuff into is a set. So

0:34:29.000,0:34:32.000
go in, load stuff,

0:34:32.000,0:34:35.000


0:34:35.000,0:34:36.000
doing it's thing, the

0:34:36.000,0:34:38.000
number of words 112,882. Okay.

0:34:38.000,0:34:42.000
That's close enough. I can't remember the number that's in there, but that sounds like a fine

0:34:42.000,0:34:43.000


0:34:43.000,0:34:47.000
approximation of it. So I feel like it did sort of manage to do something for it. And I can actually do this if I

0:34:47.000,0:34:51.000
just wanna get a little glimpse of it is to use my iterator to look

0:34:51.000,0:34:53.000
at

0:34:53.000,0:35:04.000
something that's in there. Wait, is map a set of string? Iterator --

0:35:04.000,0:35:05.000
file iter.hasNext. I'm gonna

0:35:05.000,0:35:07.000
say key

0:35:07.000,0:35:10.000
equals iter.next,

0:35:10.000,0:35:13.000


0:35:13.000,0:35:18.000
and I'm going to print that key, and I'm going to print the size of the

0:35:18.000,0:35:28.000
set because I'm at this

0:35:28.000,0:35:33.000
point -I should see gobs of printing come out of this thing. It

0:35:33.000,0:35:37.000
takes a little bit of a while to process the thing, and then

0:35:37.000,0:35:42.000
see gobs and gobs of stuff going by. It looks like a lot of things are ones if you can

0:35:42.000,0:35:45.000
imagine reading them as they go by because a lot of words are really just not anagrams

0:35:45.000,0:35:48.000
of anything else. But

0:35:48.000,0:35:52.000
some of the shorter ones have sort of a better chance.

0:35:52.000,0:35:56.000
So you can find out here at the end that there are EEIKLPST. I don't know what that is. Leakiest? No. I don't

0:35:56.000,0:35:58.000
know

0:35:58.000,0:35:59.000
what that word is. I

0:35:59.000,0:36:03.000
should write it in for any of them.

0:36:03.000,0:36:09.000
This dictionary has a bunch of really crazy words in it too, so it makes it especially challenging. What is that? I don't know. That one

0:36:09.000,0:36:10.000
almost looks

0:36:10.000,0:36:14.000
like beginners, but it's got an F in it. It's the F beginners, a very famous

0:36:14.000,0:36:16.000
word. You guys have probably heard

0:36:16.000,0:36:19.000
of it. So I've seen that, and now I wanna do the thing where I will pick the

0:36:19.000,0:36:21.000
largest one. I'd like to know.

0:36:21.000,0:36:23.000
Somebody should tell me.

0:36:23.000,0:36:25.000
Int max size [cuts

0:36:25.000,0:36:28.000
out]

0:36:28.000,0:36:30.000
max key, so I'll

0:36:30.000,0:36:33.000
set this to be zero, and max

0:36:33.000,0:36:34.000
key is that.

0:36:34.000,0:36:35.000
And then I'm gonna do this. If

0:36:35.000,0:36:38.000
the

0:36:38.000,0:36:40.000
size of this key is

0:36:40.000,0:36:43.000
greater than my max size,

0:36:43.000,0:36:55.000
then it's gonna get to be the new key.

0:36:55.000,0:37:06.000
So after I did all of that then [cuts out]

0:37:06.000,0:37:08.000
key.

0:37:08.000,0:37:13.000
And then I probably wanna see what they are, so why don't I go ahead and take a look. Ah, I have to go to

0:37:13.000,0:37:17.000
the type, though. [Inaudible]

0:37:17.000,0:37:21.000
to equals

0:37:21.000,0:37:22.000
M of

0:37:22.000,0:37:27.000
key -max key, I guess, dot

0:37:27.000,0:37:29.000
iterator,

0:37:29.000,0:37:30.000
[inaudible] IT

0:37:30.000,0:37:37.000
has next, CLIT.next [inaudible].

0:37:37.000,0:37:43.000
So it went back. It found the maximum, in this case using the size of the sets as the distinguishing

0:37:43.000,0:37:47.000
feature. And

0:37:47.000,0:37:49.000
then max is AEPRS,

0:37:49.000,0:37:51.000
which

0:37:51.000,0:37:53.000
it's got a big old list of about 12.

0:37:53.000,0:37:55.000
I think that's

0:37:55.000,0:37:58.000
12, actually. Maybe 13.

0:37:58.000,0:37:59.000
So now you know.

0:37:59.000,0:38:02.000
You can impress your friends at parties. This is the kind of thing you can win bar bets on.

0:38:02.000,0:38:03.000
Oh,

0:38:03.000,0:38:06.000
yeah. What's the size of the largest

0:38:06.000,0:38:07.000
anagram cluster?

0:38:07.000,0:38:10.000
Everybody wants to know this kind of stuff. I can't believe you guys can sleep at night without

0:38:10.000,0:38:12.000
actually knowing this.

0:38:12.000,0:38:16.000
And what's neat to know though [inaudible] just to point out a couple of things that -you can use a little decomposition

0:38:16.000,0:38:18.000
on this code,

0:38:18.000,0:38:23.000
but there's kind of a very small amount of things we're having to do. For example, one of the things that's really

0:38:23.000,0:38:27.000
powerful, things like the map where we can just figure out how to key the things

0:38:27.000,0:38:29.000
to store the collection right under that key,

0:38:29.000,0:38:33.000
then looking it up and adding something is a very

0:38:33.000,0:38:36.000
efficient sort of direct operation, just building on these things and it going through and doing all

0:38:36.000,0:38:37.000
the work of

0:38:37.000,0:38:41.000
storing them, sorting them, making it efficient for us to retrieve them and look them up such that I can

0:38:41.000,0:38:44.000
process 100,000 words

0:38:44.000,0:38:49.000
in the blink of an eye, and then go back through, look at them all, find the biggest one,

0:38:49.000,0:38:51.000
get my information out.

0:38:51.000,0:38:55.000
When you make the call to M.size, is

0:38:55.000,0:39:00.000
that the number of words? [Inaudible]. That is the number of keys. Keys, right. So that's not actually [inaudible].

0:39:00.000,0:39:01.000
Yeah. So it doesn't know anything about

0:39:01.000,0:39:04.000
everything else that was [inaudible], but in fact that's why it's

0:39:04.000,0:39:07.000
[inaudible]. I know the dictionary has about 127,000 words. It turns out they form about

0:39:07.000,0:39:08.000


0:39:08.000,0:39:11.000
112 unique

0:39:11.000,0:39:14.000
signatures, and so there's actually another 20,000 words that are

0:39:14.000,0:39:17.000
clung onto some existing signature. That's the number of unique signatures across

0:39:17.000,0:39:20.000
the dictionary, not the number of words, so that's probably the wrong name to call

0:39:20.000,0:39:22.000
it. For the M

0:39:22.000,0:39:24.000


0:39:24.000,0:39:28.000
signature word thing where [inaudible] of the default just to create a new stack, that

0:39:28.000,0:39:32.000
works as well for vectors [inaudible]? Yeah. It works for anything if you were just to declare

0:39:32.000,0:39:34.000
it on the stack and the right thing happened,

0:39:34.000,0:39:37.000
so vectors, set, maps. All those things do.

0:39:37.000,0:39:41.000
But the primitive types like int and double, it doesn't. So it would work for string. String

0:39:41.000,0:39:45.000
is an empty string. So for some of the fancier, more modern types tend to actually

0:39:45.000,0:39:49.000
know how to just default construct themselves into a good state, but the primitives don't do that.

0:39:49.000,0:39:53.000
So if you were having a map of ints and you wanted to have them start at zero, you need to really

0:39:53.000,0:39:58.000
start them at zero. You can call just M sub this. It would be garbage, and it would just be operating with

0:39:58.000,0:40:02.000
garbage from that way forward.

0:40:02.000,0:40:06.000
All right. Well, we're good. What

0:40:06.000,0:40:08.000
I'm gonna give you

0:40:08.000,0:40:10.000
is the

0:40:10.000,0:40:11.000
eight minute

0:40:11.000,0:40:12.000
discussion of recursion

0:40:12.000,0:40:14.000
that whets your appetite for

0:40:14.000,0:40:18.000
the things we're gonna be doing next week.

0:40:18.000,0:40:21.000
So recursion is one of those things I think that when you haven't yet

0:40:21.000,0:40:24.000
had a chance to explore it first hand and other people tell you about it, it

0:40:24.000,0:40:30.000
has sort of an awe inspiring sort of mystery, some fear, and whatnot.

0:40:30.000,0:40:31.000
So

0:40:31.000,0:40:34.000
first off, I wanna kinda shake that fear off. It is a little bit

0:40:34.000,0:40:35.000
hard to wrap your head around

0:40:35.000,0:40:38.000
the first time you see it, but we're gonna have a whole week's worth of time to spend on it, so

0:40:38.000,0:40:41.000
we're gonna

0:40:41.000,0:40:42.000
try to give you a lot of

0:40:42.000,0:40:46.000
different ways to think about it, and different problems to see to kinda help you do it. And I think

0:40:46.000,0:40:48.000
once you do get your head around it, it turns out then

0:40:48.000,0:40:53.000
you'll discover how infinitely powerful it is, that there is kind of a simple idea in it

0:40:53.000,0:40:57.000
that once you kinda fully get your head around, you can explore and solve lots of different

0:40:57.000,0:41:00.000
problems using this just one technique again and again.

0:41:00.000,0:41:02.000
So in itself, it's

0:41:02.000,0:41:04.000
a little bit mysterious at first glance, but then

0:41:04.000,0:41:06.000
once you kind of

0:41:06.000,0:41:09.000
master it, you'll be amazed at the kind of neat things you can do with it.

0:41:09.000,0:41:15.000
So it is certainly what I'd consider an indispensable tool in a programmer's tool kit. The kind of

0:41:15.000,0:41:19.000
problems you can solve using the techniques you have so far is fundamentally limited. And

0:41:19.000,0:41:22.000
part of what we need to do in this class is expose you to these new ways of solving

0:41:22.000,0:41:26.000
harder, more sophisticated problems that the old techniques don't work for. One of the

0:41:26.000,0:41:31.000
cool things about recursion is it actually lends very simple, elegant, short solutions

0:41:31.000,0:41:35.000
to problems that at first glance seem completely unsolvable.

0:41:35.000,0:41:36.000
That if you can

0:41:36.000,0:41:40.000
formulate a structure for [inaudible], you will discover that the code is not long

0:41:40.000,0:41:45.000
to write. It's not tedious to write. The tricky part is to figure out how

0:41:45.000,0:41:47.000
to express it,

0:41:47.000,0:41:50.000
so it's more of a thinking puzzle than it is a coding puzzle. I

0:41:50.000,0:41:55.000
certainly like thinking puzzles as much as coding puzzles if not more.

0:41:55.000,0:41:58.000
The general sort of idea is that

0:41:58.000,0:42:02.000
you are going to try to solve a problem

0:42:02.000,0:42:05.000
-instead of sort of breaking it down into component tasks like if I need to make dinner, I need

0:42:05.000,0:42:08.000
to go to the store and buy things, and I need to come home, and chop them, and get

0:42:08.000,0:42:12.000
the recipe. You think of what your standard decomposition is all about

0:42:12.000,0:42:16.000
-breaking down your tasks into A, B, and C, and D, and then you add them all together to get

0:42:16.000,0:42:17.000


0:42:17.000,0:42:20.000
the whole task done. Recursion has this kind of very different way of thinking about the problem, which is like

0:42:20.000,0:42:22.000
well if I needed to get Task A done,

0:42:22.000,0:42:26.000
and I had Task A prime, which was somehow a lot like the task I was trying to solve, but it somehow

0:42:26.000,0:42:29.000
was a little bit simpler, a little bit easier,

0:42:29.000,0:42:32.000
a little bit more manageable than the one I started out to solve,

0:42:32.000,0:42:34.000
and if I had that solution

0:42:34.000,0:42:37.000
-so if somehow I could delegate it off to some

0:42:37.000,0:42:39.000


0:42:39.000,0:42:43.000
minion who works for me, and then I could use that to solve my problem, then my

0:42:43.000,0:42:45.000
job would be made much easier by using that result of

0:42:45.000,0:42:48.000
solving a similar problem that's a little bit [inaudible].

0:42:48.000,0:42:51.000
Okay, that seems a little bit wacky.

0:42:51.000,0:42:55.000
Let me give you sort of an example of how this might work.

0:42:55.000,0:42:58.000
So your standard problem, I said yeah, it's like you do these dissimilar sub problems.

0:42:58.000,0:43:04.000
Let's imagine I had this goal where I wanted to survey the Stanford student body.

0:43:04.000,0:43:06.000
I don't want just like a

0:43:06.000,0:43:08.000
haphazard

0:43:08.000,0:43:11.000
most of the people involved. Let's say I really wanted to get input from every single person

0:43:11.000,0:43:16.000
on campus whether they think having cardinal as your mascot is a ridiculous choice or not. So

0:43:16.000,0:43:19.000
let's imagine I really wanna hear from all 10,000 students.

0:43:19.000,0:43:21.000
Now I can stand out in White Plaza with a big

0:43:21.000,0:43:26.000
note pad and try to accost people and sort of work my way down the list.

0:43:26.000,0:43:29.000
And then I'd be there for eons and never solve my problem.

0:43:29.000,0:43:31.000
Instead, what I decide to do is I say well I'm gonna

0:43:31.000,0:43:32.000
recruit some people to help me

0:43:32.000,0:43:34.000
because I'm lazy

0:43:34.000,0:43:35.000
as we've already established,

0:43:35.000,0:43:37.000
and I would like to get

0:43:37.000,0:43:41.000
some other people to join in my quest to answer these burning questions and to

0:43:41.000,0:43:43.000
solve the survey.

0:43:43.000,0:43:45.000
So what I do is I round up ten people let's say,

0:43:45.000,0:43:49.000
and I say would you help me, and I decide to divide the campus into kind of ten

0:43:49.000,0:43:51.000
partitions. And I say if you

0:43:51.000,0:43:54.000
could survey all the people whose names begin with A, B, and C, that would really help.

0:43:54.000,0:43:56.000
And if you could do [inaudible] Gs, and

0:43:56.000,0:44:00.000
if you would do -and if I divide up the alphabet that way, give each of them two or three letters, and I

0:44:00.000,0:44:02.000
say if you would go get the data, it'd be really

0:44:02.000,0:44:05.000
easy for me to do my job then. If I just took all their

0:44:05.000,0:44:08.000
data [inaudible]. Well, being

0:44:08.000,0:44:12.000
the kind of lazy person that I am, it's likely that the ten people I recruit would have similar lazy qualities

0:44:12.000,0:44:15.000
because lazy people hang out with other lazy people.

0:44:15.000,0:44:18.000
And so the person who was in charge of A, B, C, the first thing they do is turn around and find ten

0:44:18.000,0:44:22.000
more friends, and then they divide it up and say could you do the

0:44:22.000,0:44:23.000
AA through AM and so on.

0:44:23.000,0:44:25.000
If they divide it into these pools of

0:44:25.000,0:44:30.000
one tenth of what they were responsible for, and say you can go get the information from these people,

0:44:30.000,0:44:31.000
and if they did the same thing

0:44:31.000,0:44:35.000
-so if everybody along the road. We started with 10,000. Now each person had 1,000

0:44:35.000,0:44:39.000
to survey. They asked their friend to do 100. Their friend asked ten people to do

0:44:39.000,0:44:39.000
ten.

0:44:39.000,0:44:43.000
And then at some point, the person who has ten says well I just need to ask these ten people.

0:44:43.000,0:44:45.000


0:44:45.000,0:44:47.000
Once I get their data, we don't need to do anything further.

0:44:47.000,0:44:50.000
So at some point the problem becomes so small, so simple,

0:44:50.000,0:44:55.000
even though it was kind of the same problem all along. I just reproduced the same problem we had, but in

0:44:55.000,0:44:58.000
a slightly more tractable form, but then I divided it around. Divide and conquer

0:44:58.000,0:45:00.000
sometimes they call this to where

0:45:00.000,0:45:01.000
I spread out the work

0:45:01.000,0:45:05.000
around a bunch of people to where each person's contribution is just a little part of the whole.

0:45:05.000,0:45:09.000
You had to find the ten volunteers around underneath you and

0:45:09.000,0:45:11.000
get their

0:45:11.000,0:45:14.000
help in solving the problem, but nobody had to do much work, and that's kind of a really

0:45:14.000,0:45:18.000
interesting way to solve a problem. It sounds like a very big problem of surveying

0:45:18.000,0:45:19.000
10,000 people,

0:45:19.000,0:45:23.000
but by dividing and conquer, everybody has a little tiny role to play. You leverage a lot

0:45:23.000,0:45:24.000
of people getting it done,

0:45:24.000,0:45:29.000
and there is this self-similarity to it, which is kind of intriguing that

0:45:29.000,0:45:34.000
everybody is trying to solve the same problem but just at different levels of scale.

0:45:34.000,0:45:36.000
And so this idea

0:45:36.000,0:45:38.000
applies to things like phone trees rather than

0:45:38.000,0:45:42.000
trying to get the message out to everybody in my class, it might be that I call two people who call two

0:45:42.000,0:45:45.000
friends who call two friends until everybody gets covered.

0:45:45.000,0:45:48.000
Nobody does the whole job. Everybody just does a little part of it. Sometimes you'll see these

0:45:48.000,0:45:50.000
fractal drawings where

0:45:50.000,0:45:55.000
there is a large leaf which when you look closer actually consists of littler leaves,

0:45:55.000,0:45:57.000
which themselves are littler leaves, so that at

0:45:57.000,0:45:58.000
every level of scale

0:45:58.000,0:46:01.000
the same image is being reproduced,

0:46:01.000,0:46:05.000
and the result kind of on the outside is something that in itself if you look deeper has the

0:46:05.000,0:46:08.000
same problem but smaller embedded in it.

0:46:08.000,0:46:10.000
So it's a pretty neat

0:46:10.000,0:46:11.000
sort of

0:46:11.000,0:46:13.000
way of solving things.

0:46:13.000,0:46:14.000
I am

0:46:14.000,0:46:18.000
gonna tell you a little bit about how the code looks, and then I really am not gonna be able to show you much code today.

0:46:18.000,0:46:23.000
I think it's actually even better to show you this code on Monday when we can come back fresh,

0:46:23.000,0:46:27.000
but that it involves taking -So we're looking at functional recursion first,

0:46:27.000,0:46:29.000
and functional recursion is

0:46:29.000,0:46:33.000
taking some sort of functions, a function that takes an input and returns an answers, returns

0:46:33.000,0:46:36.000
non-void thing that comes back.

0:46:36.000,0:46:39.000
And we're gonna be looking at problems where you're trying to solve this big problem,

0:46:39.000,0:46:44.000
and that if you had the answer to making a call to yourself on a smaller version of

0:46:44.000,0:46:45.000
the problem

0:46:45.000,0:46:48.000
-maybe one call, maybe two calls, maybe several calls

0:46:48.000,0:46:52.000
-that you could add those, or multiply those, or combine those in some way to answer the

0:46:52.000,0:46:53.000
bigger problem. So

0:46:53.000,0:46:56.000
if I were trying to solve

0:46:56.000,0:47:03.000
this campus survey, having the answers to these smaller campus surveys gives me that total result. And

0:47:03.000,0:47:05.000
so the -I really should not try to do this in two minutes. What

0:47:05.000,0:47:09.000
I should do is try to tell you on Monday. We'll come back and we'll talk about recursion, and it

0:47:09.000,0:47:11.000
will be an impressive week. Meanwhile, work on

0:47:11.000,0:47:14.000
your ADT homework.