This presentation is delivered by the Stanford Center for Professional
Development.
Hi, there. It's Wednesday. We are coming to the end of this fine quarter. So what I'm going to do today: I brought a cup filled with candy, and we'll start with that, because that's going to be helpful. What I'd like to do is tie up a couple bits of loose ends, some things that have come and gone this quarter that I wanted to pull together and reach a bit of closure on.
Then hopefully, if we have time, we'll move on to what
happens after this. If you liked 106b, what do you do? If you don't like
106b, what do you do?
What kind of options and future opportunities are there
after this
ends? I'm going to talk a little about some of the things that
[inaudible], but I do want to talk about the final just
for a second because it is kind of a
last
opportunity for you to show us your mastery of the material
and get the grade that you've been working hard for all quarter.
It does carry a certain amount of weight: 30 percent of the weight in the default system is placed on the final, and more can move onto it if you make a better showing on the final than you did on the midterm. We're happy to move some of that weight over, so it could actually account for even more of your total grade.
So there's a lot riding on it, and you want to be sure you come
ready to show us the
things that you know and that you have comprehension and mastery over.
To note, though, it is going to look a little bit different from the midterm in terms of its emphasis. You are going to see a lot more of the nitty-gritty dynamic data structures: linked lists, trees, graphs, pointer-based structures, with the inherent complications in dealing with that kind of dynamic structure. So bringing your best game to this exam is certainly going to pay off here. The coding that you will see will look a lot like what's in the assignments. If you've already had a chance to look at the practice exam, you'll see that there are themes there that echo the things you did on the assignments. We want to make sure we're connecting to the effort you've put in all along.
There will also be some opportunity to think at a higher level about making and analyzing choices for design and implementation. I'm going to talk a little bit about that today because, at any given moment, we're talking about how we might implement a stack or how we might implement a queue. Sometimes it helps to step back a level and think about the bigger picture: when to use a stack or a queue, and how to make decisions in the large about those tradeoffs that are intelligent and grounded. So I'll do a little bit of that today; some of it has been built up all quarter long.
It is open book and open notes. You can bring all your printouts, your reader, your section handouts, all those sorts of things. For some people, that translates to the idea that, well, I don't need to prepare because I'll have all my stuff there; if I need to know something in the exam, I'll just go look it up. That's certainly true for looking up details: you want to remember how something works, or you had a piece of code that you think was similar, and it could help you to brainstorm what you're trying to do this time. But you're not likely to have time to learn new things in the exam. So if there's something that you really feel you have not gotten your head around, the investment up front, before you get into the exam, is the way to get that understanding in place, rather than flipping through chapter ten during the exam, trying to learn something. That may not really play out.
I highly recommend practicing on paper in an exam-like environment, really working the way you will be working in the exam, to give yourself the best, most consistent prep for what's happening there.
Then some people have asked about extra problems, just wanting more to work on. I'll give you a couple of places where you can look for things. One is that CS106X is being offered this same quarter; it's the honors version of our class, so it has some slightly notched-up problems. But their practice and real midterms are posted on their website, as well as their practice final, and they have very similar coverage. They did some assignments that were like ours, some that were a little bit different, but it's still a good place to grab actual exam problems with their solutions. The chapter exercises and section problems often are old exam problems or eventually became exam problems, so they kind of come and go in terms of being used as examples or as exam problems. Those are definitely places to pick up extra mileage. Okay. Any questions about the final?
Okay. Let me talk about this just for a second, because I feel like we have touched on these themes again and again, but maybe it's good to have at least one moment of trying to pull this together and think a little bit about the closure and the big picture of all these things.
A lot of what we were doing in the beginning was giving you these abstractions, these cool things to do stuff with: a map, a stack, a queue, a vector, and problems to solve where those abstractions have a role to play. Then we spent a long time saying, okay, now it's our job to make those abstractions work: what techniques can we use, what algorithms and data structures will help us manipulate and model those structures in efficient ways? We certainly spent a lot of time talking about runtime performance, maybe so much that you'd think it was the only thing that really mattered. How fast can we make something? O(N) is better than O(N squared); O(log N) is better than O(N), driving toward these holy grails of bringing the time down. It's certainly a very important factor. The difference between something being linear and something being quadratic is often a matter of it being tractable at all for the problem at hand. You cannot sort a million numbers with a quadratic sort in anything like the same time; for large enough data sets, it'd be completely infeasible.
So it is important to have that big picture, to be thinking about it and know about
those tradeoffs, to understand issues
of scale here.
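To put rough numbers on that (my own back-of-the-envelope figures, not from the lecture): sorting a million elements quadratically costs on the order of 10^12 basic steps, which at 100 million steps per second is around 10,000 seconds, close to three hours. An N log N sort costs on the order of 10^6 times 20, about 2 x 10^7 steps, a fraction of a second on the same machine.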
But we also did some work on the homework with actual empirical timing results, which is another way to bolster our understanding. Big O tells us these things, but what really happens in real life matches it only imprecisely. Those constant factors we threw away, and other artifacts that happen in the real world, often give us new insights about things and the challenges that we're facing.
We also need to think about things like the mix of expected operations: having some operations be slow as the consequence of other operations being fast is often a tradeoff we're willing to make. On the editor buffer, we drove toward getting all the operations to be fast. In an ideal world, that's certainly the best place to be, but it's also a matter of at what cost, in terms of code complexity, memory used, and effort put into it. It may be that we're willing to tolerate some operations being less efficient as long as the most commonly used ones are very streamlined.
We talked about worst cases and degenerate cases: knowing when we can expect our algorithms to degrade, and whether we want to do something about that, choose a different algorithm, or provide some protection against those kinds of cases coming through. To a lesser extent, we also talked about memory use and how much overhead there is. When we went to a pointer-based linked list structure, we added four bytes of overhead for every element. We went to an eight-byte overhead once we got to binary trees, and maybe even a little bit more if we had balance factors included. Those things ride along with our data, increasing its size and memory footprint.
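As a rough sketch of where those numbers come from (illustrative declarations of my own, assuming 4-byte pointers on a 32-bit machine):

```cpp
// Each linked list cell carries one extra pointer beyond the data itself.
struct ListCell {
    int value;       // the data we actually care about
    ListCell *next;  // 4 bytes of overhead per element
};

// Each binary tree node carries two pointers (8 bytes of overhead),
// plus a little more if a balance factor rides along.
struct TreeNode {
    int value;
    TreeNode *left;   // 4 bytes
    TreeNode *right;  // 4 bytes
    int balance;      // extra overhead for a balanced tree
};
```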
So there's a certain amount of overhead built into the system. There's also this notion of excess capacity: to what extent do we over-allocate our data structures early, planning for future growth? In some sense that saves us the trouble of enlarging on demand, preparing a large allotment and then using it up as we go, and then, when we do exceed that capacity, doing the enlarging again. So where do we make those tradeoffs about how much excess capacity to keep and when to do the enlarging?
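A minimal sketch of that enlarge-on-demand strategy (hypothetical code in the spirit of the vector implementations we discussed, assuming the capacity starts at least at 1):

```cpp
#include <algorithm>  // std::copy

// Append to a heap-allocated array, doubling the capacity when full.
// Doubling, rather than growing by one, keeps appends cheap on average,
// at the cost of carrying some excess capacity.
void append(int *&arr, int &size, int &capacity, int value) {
    if (size == capacity) {
        capacity *= 2;                       // over-allocate for future growth
        int *bigger = new int[capacity];
        std::copy(arr, arr + size, bigger);  // copy the old contents over
        delete[] arr;                        // reclaim the old block
        arr = bigger;
    }
    arr[size++] = value;
}
```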
It has a lot to do with what our general memory constraints are. I would say that memory has gotten to be quite cheap on recent processors and computer systems. You don't really think about 100 bytes here, 200 bytes there, 1,000 bytes there; even megabytes you can talk about as though they're a drop in the bucket. So this is something that isn't given as much weight nowadays as it was ten years ago, when there were a lot more constraints. But with the move toward a lot more embedded-device programming, looking at the iPhone or cell phones or things like that, which don't have that same liberty to be wanton about memory, it's come back to being a little more careful and keeping an eye on it.
There's also an issue here that trades off a lot with runtime, which is redundancy versus sharing. Having two different ways to get at the same piece of data often provides quick access. For example, if you have a card catalogue for a library where you can look up by author and look up by title, you're probably maintaining two data structures: one that manages the books sorted by title, one that manages them sorted by author. They're both looking at the same books in the end. So maybe there's a set of pointers in one map over here, indexed by author, and another keyed by title over there. That means I'm repeating all my data. Hopefully I'm sharing it with a pointer, so I'm not actually repeating all the data, but I have two pointers, or potentially two or more pointers, to the same book, depending on the different ways you can get to it. That gives me fast access: if I want to know all the books written by Leo Lionni, I can find them easily, as well as finding all the books that have King in the title. I don't have to do searches on a data set that's been optimized for only one kind of access. It's definitely a tradeoff: more space thrown at it to get fast access in different ways.
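A sketch of that two-index arrangement (invented names, using the standard library map in place of the class's Map):

```cpp
#include <map>
#include <string>
#include <vector>

struct Book {
    std::string title;
    std::string author;
};

// Two indexes into the same collection: each Book is stored once, and
// both maps hold pointers, so we pay for extra pointers, not extra copies.
std::map<std::string, std::vector<Book*>> byAuthor;  // all books by an author
std::map<std::string, std::vector<Book*>> byTitle;   // all books with a title

void addBook(Book *b) {
    byAuthor[b->author].push_back(b);  // fast lookup by author
    byTitle[b->title].push_back(b);    // fast lookup by title
}
```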
Then this last one is something I mentioned along the way, but I think it's easy to overlook: it's easy to get very excited about how fast you can make it, how small you can make it, how fancy it is. That has a real downside in terms of how hard it was to write. Typically, when you go to these fancier strategies that are space-efficient and time-efficient, you had to pay for it somewhere. It wasn't for free that the easy, obvious code would do that. It tends to be complicated code that may use bit operations, that probably uses more pointers, that has a kind of density to it, which means it's going to take you longer to get it right, take you more time to debug, be more likely to have lurking bugs, and be harder to maintain. If somebody comes along and wants to add a new operation, they open up your code, and they're frightened away. It might be that that code will never get moved forward. Everybody will just open it, close it immediately, and say, whatever, we'll make it work the way it is, rather than trying to improve it or move it forward, because it actually is kind of scary to look at. So think about: what's the simplest thing that could work?
There's a movement that I think is pretty neat to think about: the idea of the agile programming methodology. It's based on the idea of not planning to build the most super-stellar infrastructure for solving all the world's problems. Say you're about to write a chess program. Actually, something even simpler: you're going to write a checkers program. You say, I'm going to sit down and design the pieces and the board. Then you think, hey, checkers is just one embodiment of this. Why don't I make the uber-board that can be used for strategic conquest, or for chess, or Chinese checkers; you could have these hexagonal things. You start building this crazy infrastructure that could handle all of these cases equally well, even though all you really needed was checkers, a very simple 2D grid. You imagine the future needs way in advance of having them, and then you design something overly complicated. Then what happens is you get bogged down in it, it never happens, and the project dies. You lead a lonely life by yourself, eating oatmeal. I like oatmeal. So one of the mottos is: design the simplest thing that could possibly work.
I think that's a neat gift to yourself: to realize that the simplest thing that could possibly work, that meets the needs you have, that has a decent enough runtime and memory footprint, meets your needs in its simplest form. It's going to be the easiest thing to get working and running. If you later decide you want to go fancy, you can swap it out, using the principles of abstraction and encapsulation, independently of the code around it, when you decide you need that fancier thing. I've got a couple of questions here because I thought this would help us frame a little bit of thinking about these things.
These are questions that I'm hoping you can answer for me. That's why I brought a cup full of candy: I'm prepared to offer bribery to people who help me answer these questions.
So let's talk about array versus vector. The built-in C++ array is what's behind the scenes of a vector, and the vector adds convenience on the outside of it. Early on, I said: once I tell you about arrays, put that in the back of your memory and just use vector instead, because everything array does, vector does nicely and cleanly, and it adds all sorts of features. Somebody give me a list of things that vector does much better than array.
Way in the back. Bounds checking. Bounds checking. What else? Come on, a Blow Pop is on the line. Help us out. You've got bounds checking. What else? More? I want more. You've got nothing else for me? I don't even want a Blow Pop. You don't want a Blow Pop? All right, I'm going to give your Blow Pop to somebody else. Right here, I'm going to give the Blow Pop to you. Tell me what makes vector good. It has operations that shuffle elements for you. Yeah, it does the insert, delete, moving stuff down. Also, it does the growing: you keep adding, and it just grows. Arrays don't do that. You want an array to grow, you grow it. You take the pointer, you copy stuff over, you delete the old one, you rearrange things. You do all the work. So vector buys you, kind of, array with benefits.
So it seems like, then, the naive takeaway would be: just use vector. Always use vector; vector's always better than array. Is there ever a situation where array is still a pretty valid choice? What does array have that vector maybe doesn't? Well, when you add things dynamically to the vector, it actually doubles the size rather than increasing it by one. If you know the exact size, and you know it's not going to change, you might as well use an array. That's it exactly: one of its big advantages. Let's say you have a very small, fixed size planned for. The one on the practice exam was a calendar, where you had 365 days in a year and you're keeping track of the events on a particular day. You know how many days are in the year; there's never going to be any change in that. Well, leap year. But you know that, and you know it in advance. So why fuss with a data structure that does all this growing and shrinking and has some overhead associated with that when, in fact, you know there's exactly this number, and that's all you need? In fact, it's almost a little more convenient to set up: you can just declare it that size, and it's ready to go. With the vector, you have to go through and add all the elements and such.
So there are a few situations where you might just say: I don't need the advantages, the vector adds some overhead I don't want to pay, and I'm perfectly happy making it a small fixed array. You do lose bounds checking, but if you are being careful, maybe you're comfortable making that tradeoff. By losing bounds checking, you actually get a little bit of speed back: the bounds check does cost you a little on every vector access that an array would not pay. That's really in the noise for most programs; I hesitate to even mention it, lest you think it matters much, but it is part of the thinking of a professional programmer. Sometimes that may be the thing it comes down to, making the constant factors streamlined enough for the operation you need.
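Going back to the calendar example for a second, here's a minimal sketch of the fixed-size case (my own illustrative code, not the exam's solution):

```cpp
#include <string>
#include <vector>

// The size is known up front and never changes, so a raw fixed-size
// array is ready to go at declaration, with no growth machinery at all.
const int kDaysInYear = 365;
std::vector<std::string> calendar[kDaysInYear];  // events for each day

void addEvent(int day, const std::string &event) {
    // Note: no bounds check on day; that's the tradeoff the raw array makes.
    calendar[day].push_back(event);
}
```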
So I put up here stack and queue versus vector. Stack and queue are really just vectors in disguise: vectors with fewer features, vectors that can't do as much. One implementation for stack and queue could easily be layered on top of a vector. Why is it good to take away things?
[Inaudible] tampering with the contents of [inaudible]. Yeah. So it's enforcing discipline on how you use it. Stacks are LIFO, queues are FIFO: the way things go in and the way they go out is very rigid. By using a stack or a queue, you are guaranteeing that you won't go in and accidentally put something at the head of the line or remove something from the middle of the stack, because you just don't have access to that. It's been tidied up and packaged into stack, which has push and pop, and queue, which has enqueue and dequeue. No other access is implied or given; it's totally encapsulated from you. So it allows you to maintain a discipline that the vector didn't strongly provide for you. Question? [Inaudible]. Another thing you could do is optimize a stack or a queue to be very good at the only operations you're doing, which is pushing and popping, or enqueuing and dequeuing. I think that's worth a Smartie, don't you? I think it is. Exactly.
So by restricting what you have available, it actually gives you choices in the implementation that wouldn't have worked out as well for the full array of operations. That's exactly true. Stack and queue only adding from one end means that something like a linked list, which isn't a good choice for implementing the general case of vector, is a perfectly good choice for the stack and queue case. So giving ourselves some freedom as the implementer, some information about how it's being used, can pay off. It also means that when you read code, if you see code that's using a stack or using a queue, you immediately know how it works. You can say: here's this search that's stuffing a bunch of things into a stack; I know they come out last in, first out. If you see them using a vector, you have to verify where they put things into the vector and where they pull them out. The code doesn't tell you the same story as it does when you see a word that has a strong meaning to it, like stack and queue.
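Here's a minimal sketch of that layering (a hypothetical implementation, not the class library's actual code):

```cpp
#include <vector>

// A stack layered on a vector: exposing only push and pop means clients
// can't disturb the middle, and the implementer stays free to swap the
// representation (say, for a linked list) without breaking anyone.
class IntStack {
public:
    void push(int value) { elems.push_back(value); }
    int pop() {                      // precondition: stack is not empty
        int top = elems.back();
        elems.pop_back();            // last in, first out
        return top;
    }
    bool isEmpty() const { return elems.empty(); }
private:
    std::vector<int> elems;          // hidden representation
};
```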
Then there are these decisions to make about sorting things or not sorting things. We spent a long time talking about sorting: how various algorithms approach the sorting problem and what they come up with. I have a question here: what sorting algorithm is best? Is there an answer to that question? Why would I ask a question that has no answer? It depends on what you expect it to be giving you. It depends. That's a great answer. It depends on what you expect it to give you. How big is the data? What are its contents? Is it random? Is it sorted? Is it almost sorted? Is it reverse sorted? Is it drawn from a range of one to 1,000 values, or is it drawn from one to three values? A huge array that has just one, two, three in it might necessitate a different approach than one that has one to max int in it.
So there is no answer to that; it depends. What do you know? If you don't know anything, give it to someone else. If you don't know, there are certain sorts that perform well: you know that in the average case you don't get degenerate behaviors. For example, [inaudible] is a great sort for the general case: an N log N sort that doesn't use any extra space the way merge sort does, and it doesn't have any degenerate cases. For an unknown set of inputs, it works pretty well.
If you happen to know a little about your data, say it's almost sorted, then even Barack Obama's not-favorite bubble sort has a chance of being in the running, because it does well on already-sorted data. Is it worth sorting? Calculating the payoff: we had a problem on one of the section exercises once that I think is a really good one to think through and revisit. The idea is that many operations are much more efficiently implemented if the data's sorted.
You can think of them off the top of your head. You want the minimum? If it's sorted, it's very easy. You want the maximum? Very easy if it's sorted. You want to find the median? Also very easy if it's sorted. You want to find duplicates? Once they're sorted, they're all lined up together. You want to find the mode, the most frequent element? Also easier if it's sorted, because they'll all be grouped together in a run, so you can do a linear pass, counting what you see, to find the longest run. If you had two arrays and you wanted to merge them, or intersect them to find their common elements, it's much easier if they're sorted. All these things that are hard to do when the data's in random order have much more streamlined algorithms once you invest in sorting it.
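For instance, here's a sketch of the mode computation as one linear pass (illustrative code of my own, assuming the input is already sorted and non-empty):

```cpp
#include <vector>

// In sorted data, duplicates sit adjacent, so the mode (most frequent
// element) falls out of a single scan that tracks the longest run.
int modeOfSorted(const std::vector<int> &v) {
    int mode = v[0], bestCount = 0, runCount = 0;
    for (size_t i = 0; i < v.size(); i++) {
        runCount = (i > 0 && v[i] == v[i - 1]) ? runCount + 1 : 1;
        if (runCount > bestCount) {  // found a longer run
            bestCount = runCount;
            mode = v[i];
        }
    }
    return mode;
}
```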
So there was a problem that asked: how do you know it's worth doing that? If you only plan on doing one merge, and you could do it in N squared, is it worth it to sort, at N log N, so you can get an O(N) merge? It has a lot to do with how often you plan on doing the merge, how big your data set is, and what your constraints on time are. It is interesting to think about that relationship. It isn't for free, but potentially it's an investment: by sorting it, you can do a lot of fast searches later.
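As a rough illustration with made-up numbers: merging two unsorted arrays of 10,000 elements each by brute-force comparison costs on the order of 10^8 steps, while sorting each one first costs roughly 10,000 x 14, about 1.4 x 10^5 steps apiece, followed by a single linear merge. Even one merge favors sorting there, and every additional merge or search amortizes the one-time sorting cost further.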
This one actually is interesting because they solve a lot of the same problems and differ in one small way: a set, which is backed by a balanced binary search tree, a data structure that is internally sorted, versus a sorted vector. If your goal were to keep track of all the students at Stanford, and you wanted to be able to look up people by name, then the contains operation of the set is doing a search down the tree, and binary search on a sorted vector follows much the same path, just working through a vector instead. So we can make the searching operations, like contains and other lookup-based things, logarithmic in both cases.
So what's the advantage of one over the other, the set or the sorted vector, which I've just been using interchangeably? Is there something one can do that the other can't, or some advantage or disadvantage assumed in that decision? You don't have duplicates in a set, but you can have duplicates in your sorted vector. Yeah, so the set has another concern in how it operates. You can't put duplicates in, so if you have two students with the same name, then in the set, it turns out, you're all coalesced down to one. Your transcripts all get merged, to your advantage or not. What else does set buy you that vector doesn't? How hard is it to edit a sorted vector? You want to put something new in there? You're shuffling. You want to take something out? You're shuffling.
The movement to the pointer-based structure, the search tree behind the set, is what gives us editability of the data structure: the convenient logarithmic time to remove and add elements using the pointer wiring. The vector doesn't have that. But it may be that your situation just doesn't involve a lot of edits, and that's actually a very common situation. The sorted vector happens to be a very underutilized, underappreciated arrangement for data, because for data that isn't taking a lot of edits, a linear-time edit isn't going to be a painful consequence.
You're doing a lot of searches, and you're getting the binary search there, where you really care about it, with very little overhead: a little bit of excess capacity, but none of the pointer-based overhead that came with the set. The set adds another eight bytes, potentially even a little more, of overhead on every element to give us that editability, which maybe isn't that important to us.
So if you're looking at it and saying, I'd like to have a sorted structure: both the set and the sorted vector let you access the elements in order. The iterator of the set goes in sorted order, and walking the vector from zero to the end gives me sorted order too. I can browse them in sorted order; I can search them using binary search. Where they differ is: what does it take to edit one? The one that invested more in memory gets faster editing.
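Here's a sketch of that logarithmic lookup on a sorted vector (illustrative code of my own, not the class library's):

```cpp
#include <string>
#include <vector>

// Binary search on a sorted vector: the same repeated halving a tree
// search does, but walking index ranges instead of following pointers.
bool contains(const std::vector<std::string> &sorted, const std::string &key) {
    int low = 0, high = (int)sorted.size() - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  // midpoint of remaining range
        if (sorted[mid] == key) return true;
        if (sorted[mid] < key)
            low = mid + 1;                 // discard the lower half
        else
            high = mid - 1;                // discard the upper half
    }
    return false;
}
```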
The P-queue is kind of interesting. It's also a sorted structure, but not quite: it's a much more weakly sorted structure. If you're using the binary heap like we did on the P-queue assignment, it gives you access to the maximum value, and then a quick reshuffling brings up the next, and so on. But it doesn't really give you the full range. For example, if you were looking for the minimum in a P-queue, where is it? It's somewhere in the bottom. It's in the bottom-most level, but where across the bottom, no knowledge, right? It could be at any of the nodes at the bottom of the heap. So in fact it gives you access to a partial ordering of the data. In this case, it's relative to the prioritization, but it's not the full sorted order the way a sorted vector has. For example, the P-queue doesn't help you with finding duplicates or finding the minimum or computing the mode; those aren't easy operations, because you have the data in heap form. I can get to the max quickly.
So if I cared about pulling them out in sorted order to browse them, I could stuff things into a P-queue and then pull them out, but what's interesting about that is it requires destroying the P-queue. The P-queue isn't really a place where you store things and plan on using them again. It's the place you stick things in order to get them out in an order whose only value is in the dequeuing of them, pulling them out: finding the most promising option to look at in a search, or the highest-priority activity to take on, or something like that. If I wanted to be able to look at all the students in my class in sorted order, well, I could stick them in a P-queue and then pull them out, but it's sort of funny: I have to put them in the P-queue just to pull them back out. It wasn't a good way to store them for that browsing operation to be easily repeated. If I put them into a sorted vector, I could just keep iterating over it whenever I need to see it again.
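A sketch of that put-them-in-just-to-pull-them-out pattern (illustrative, using the standard priority_queue where the class used its PQueue):

```cpp
#include <queue>
#include <vector>

// Browsing in sorted order via a priority queue consumes the queue:
// each pop removes the current maximum, so the queue is empty afterward.
std::vector<int> drainInSortedOrder(std::priority_queue<int> &pq) {
    std::vector<int> inOrder;
    while (!pq.empty()) {
        inOrder.push_back(pq.top());  // current maximum
        pq.pop();                     // destroys the heap as we go
    }
    return inOrder;                   // descending order
}
```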
So a lot of this comes down to having a pretty good visualization of what going to a pointer-based data structure buys you, and what it costs you, relative to contiguous memory. That's one of the tensions underneath most of these things: putting more memory into it by virtue of having a pointer wire, as opposed to a contiguous arrangement. There's no information stored to get you from array index zero to array index two; they're just contiguous in memory. So it saves you space, because you only have to know where the block starts and access contiguously from there. The pointer means there are more traversals, there's more memory, there's more opportunity to err, but it gives you flexibility. It broke down that contiguous block, which is kind of a monolith, into pieces that are more flexible in how you can rearrange them.
So thinking about these things is an important skill to come away from 106b with. There's never one right answer: you're investing to solve this one problem, and there are going to be downsides. There's no obvious triumph-over-all solution out there that is fast, cheap, small, and easy to write.
So I put up here, whittled down, the five or six things that, at the end of 106b, I want you to feel like: that's what I got. That's the heart of the goal. The MVPs, the most valuable players, of the 106b curriculum. You're looking back at it, and you go: where did I start, where did I end up, and how did my knowledge grow? I'm hoping these are the things that stand out for you, where you say, that was a real growth area for me. One is this idea of abstraction.
That was stressed early on. Here's a stack, here's a queue, here's a set; they do things for you. We don't know how they work yet, but you can make cool things happen with them. So you solve maze problems and random writers and do neat things with dictionaries without knowing how they work. You're leveraging existing material without knowledge of its internals. It helps keep the complexity down, and you can solve much more interesting problems than you could without them. Try to go back and imagine the Markov random-writer problem without the map in play. You'd have to figure out, given this character, where to look it up and find a place to store it, and then store the following characters in some other array that was growing on demand as you put stuff in there. You would never have completed that program. You would still be writing it today, having pointer errors and whatnot behind the scenes.
Having those tools already there is like having the microwave and the food processor when you're about to cook, instead of doing all your work with a dull knife and a fire. You actually have some real power there. Recursion was a big theme in the weeks that followed.
How to solve these problems that have this similar structure, looking at exhaustive traversals and choice problems that can be solved with backtracking; having this idea of how to think about things using recursive problem solving, and how to model that process on the computer, was a very important skill that keeps coming back. We looked at algorithm analysis, this idea of the sloppy math that computer scientists are fond of: talking about scalability and growth, knowing the curves and the relationships, and how to look at algorithms. As for the more formal analysis, there's quite a bit more formalism to this than we really went into; if you go on in CS and look at it, you will see there's quite a lot of rigor that can be brought to that area. We were getting more of an intuition for how to gauge things about algorithms and their growth.
Then the back side of the quarter was all about implementation strategies and tradeoffs. Now that we've benefited from those abstractions and solved some cool problems, what's it like to be the person building them? What are the techniques? What does it take to wrestle a pointer, a linked list, a tree, a heap, or a graph into doing something really cool for you? How does that inform your intuition about what to expect from a structure and the coding complexity inherent in achieving that result?
Hopefully, continuing with the theme that 106a started you on, there's this appreciation for [inaudible]. In the end, getting code that works but is ugly hopefully just isn't what you're satisfied with. You want to approach problem solving to produce beautiful, elegant, unified, clean, readable code that you would be happy to show to other people, be proud of, maintain, and live with: the kind of work that the people you work with would appreciate being involved with.
Then there are things that are not so much the most valuable players, but that came along the way: the idea of pointers, and C++. We chose C++ as our language, so as a result we get to learn some C++, and because it's so heavily invested with pointers, we also have to do some practice with pointers to get our jobs done. I think of those as being off to the side. They're not the platform of things I worship; they're things we need to get our job done.
I did put a little note about pointers here because there were quite a lot of comments on the mid-quarter evals that said: pointers still scare me, they make me crazy, I don't know enough about pointers. Here's a fact for you: you'll probably never feel like you know enough about pointers. Today, tomorrow, ten years from now, you'll still kind of feel like there's some real subtlety and trickiness in there.
They're an extremely powerful facility that gives you direct control over allocation and deallocation and all the wiring and sharing that's important to building these complicated data structures. But there are so many pitfalls. I'm sure you've run into most of these, if not all of them, at some point: you mishandle a pointer, you fail to allocate or deallocate, and the way it gets reported, whether the compiler message or the runtime error you see is helpful in sorting it out, is a real cause of long hours disappearing in the name of mastery of this.
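A few of the classic pitfalls, as a made-up sketch (each of these compiles and then misbehaves at runtime; don't imitate it):

```cpp
void pointerPitfalls() {
    int *p = new int[10];
    delete[] p;       // fine so far
    p[0] = 5;         // use after free: writes into memory we gave back

    int *q;           // never initialized
    *q = 42;          // dereferences garbage: may crash or silently corrupt

    int *r = new int(7);
    r = new int(8);   // leak: the first allocation is now unreachable
}
```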
If you're still wary of pointers, you
probably should be.
Think of this as your first go-around. We're still at the journeyman stage for programming here. Although there were pointers in 106a, they were very hidden; this is the first time you really see them, are exposed to them, and are trying to get your head around them. If you go on to 107 and 108, you'll just get more and more practice, and it will become more solid as you keep working your way through it. But at this stage it's okay, and commonplace, to still do a double take when you see the star. That's just where you should be. It's like in Chinese restaurants: they put the little red star by the food you want to be careful of. That's why they put the star on the pointer. I also wanted to zoom out farther, to a decade from now, when you are thinking back on the whole of your education and what you learned. Do I really want to remember the keywords of the C++ generic template facility? No. If you code in C++ every day, you'll know that. If you don't, you can forget it; it's long gone.
What I hope you'll carry away is a couple of concepts that broaden outside of CS. One is this idea of algorithmic thinking: how you approach something logically and stepwise, draw pictures, analyze it to understand the cases, and debug when things aren't working well. I think that's a skill that transcends the art of computer programming into other domains, any kind of logical thinking, problem solving, and analyzing in that way.
I also hope that some of the experimental things we did with the sort lab and the P-queue, and some of the big-O work we tried to do in class together, help you with back-of-the-envelope calculations. This is a pet issue of mine: in the era of computers, I think it's really easy to get distanced from numbers and start thinking, I just punch numbers into formulas and one comes out, so it must be the truth, without having any intuition as to whether that's the right number, whether it makes sense, whether it's grounded, and whether you're actually way off because you made an entry error and didn't even notice.
So become comfortable with this idea of using numbers to drive things: take some time trials, make some predictions, do some more time trials, match those up, do a little sketching with your numbers, and see whether things are working out. Be comfortable having the math and the theory reinforce each other, and don't get too distanced from the fact that those numbers do play out in the real world in ways that are interesting. Don't just let the numbers themselves tell the story; there really is more to connect them up with.
Then this idea of tradeoffs. You may not remember how to write a linked list or how the hash table does the things it does and how to get one working, but hopefully you will take away this idea: the answer to most questions begins with the phrase, it depends. If I say, is it better to use a hash table or a binary search tree, the answer will be: it depends. There's no "it's always better to use this sort" or "it's always wrong to use this approach." It's: it depends. Do you have the code already written for one, and it works well, and you don't care how long it takes? Then bubble sort actually could be the right answer, Barack Obama notwithstanding.
But knowing whether memory is at a premium or time is at a premium, what your mix of operations is, what language you're working with, what programming expertise you have, and what timeline you have for developing it: all of that can play a role in deciding what choices to make and what strategy to pursue. So be very comfortable with this idea that it's about gathering information before you commit to one strategy, as opposed to it's always going to be this or that.
So that's the 106b thing. I have a couple slides here on things that happen after 106b, and I'm just going to go ahead and show you. This would be a great time to ask questions, in particular if there's anything you're curious about: where do we go from here, and how do we make good decisions about what follows?
The most obvious following courses are from CS. If you took 106b and you hated it, I'm very sorry. Hopefully the scarring will heal over, and you'll lead a productive and happy life elsewhere. If you did like it and you think, I'm kind of interested in this, what do I do next to explore further? Where else can I go with this? I put up the intro courses with their numbers and names, and I'll talk a little bit about what each of these does, to give you an idea of what opportunities it offers you.
The obvious follow-on on the programming side: 106a and b are programming courses, building your programming mastery, and 107 follows in the same vein. We typically call those systems courses in CS nomenclature; that means practical, hands-on programming, getting more exposure to the system. So 107 builds on 106b's knowledge in a class called Programming Paradigms. Part of what it's trying to explore is different languages. You look at the language C, kind of stepping back to old school. You look at the language Scheme, which is a functional language that was developed for AI purposes and has a long history coming from the math side of things. You'll probably look at some scripting and flexible languages like Python. We mix them up a little bit, so it's not always exactly the same set, but it's looking at other languages, other ways of doing things. The other component of 107 is looking at some more low-level things. What happens when you pass your code off to the compiler? What kind of analysis does it do of your code? How does it actually generate machine code? What happens at the machine layer? You get an understanding of more of the performance implications and how things are expressed. A better understanding of pointers actually comes up, because there is a certain amount of low-level coding being explored in this class. So it builds on your programming skill as we move toward a Unix/Linux-based environment, away from the Mac and Windows, and you continue building larger programs with complicated features to grow your mastery. Internally, we call this class "programming maturity," even though that's not its title. We see it that way in the sequence, in terms of growing you into a more versatile programmer, seeing different languages and different things and getting more understanding of what happens under the hood. It is a five-unit class, typically offered fall and spring. Most people think it is about as much work as 106b, some a little more or a little less in different parts of the quarter, depending on what your background is and how 106b worked for you.
I think most people think of it as comparable. Would you guys say 107 is about as hard as 106b? In some ways it's harder; in some ways it's not. Maybe that's the deal. Then there's the 108 class, which follows onto 107, though that prerequisite is particularly strong. It's a Java class.
It moves in the other direction: rather than moving down in abstraction, it moves up in abstraction, looking at larger-scale design, modern programming techniques, using object-oriented facilities, thinking about large-scale programming. You do a big team project at the end: a three-week, three-person, massive effort that is the first really big-scale thing you'll be exposed to in the lower-division curriculum. Sometimes we offer 107L, which has some relationship to 106L in that it's more hands-on, more C++. I don't know whether it will be offered this spring or not, but if you look in Axess, it will tell you at least whether we have it tentatively on the schedule. I think that comes and goes with the interest from the students and the availability of someone to teach it.
[Inaudible] can we watch [inaudible]? Yeah, 107 and 108 are offered on TV, usually once a year. 107 is offered on TV this spring; the one in fall is not taped. It might be that the old lectures from spring are somewhere you could dig them out, but I actually don't happen to know for sure. 108 is on TV, I think, in the fall, not in the winter, so the fall lectures are probably still around for 108, and you could take a look at them. You could certainly look at their websites; there's often a lot of old materials and handouts that give you at least some idea of the things that have been covered in the most recent offering.
Certainly showing up in the first week is one way to get an idea of what you're in for. 108 is a four-unit class. Because it's in Java, certain things about Java, the safety and the attentiveness of the compiler, make the error handling a little bit easier. So I don't think most people find 108 quite as intense as 107 or 106b, but that project at the end has quite a reputation as a real time suck. So it has lots to do with staying scheduled and motivated and working through that team project at the end. I think that's where most people find the big challenge of 108.
Then on the math side, in CS nomenclature, we have this idea of theory: introducing you to more of the formalism for big O and discrete structures, doing proofs, and thinking about logic. So we have 103a and 103b, and a 103x, which is a combined, accelerated form of a and b. They serve as math classes, introducing the kind of mathematics that's important for computer scientists. For people who are thinking about a CS major, that's another good course to get in early; pretty much everything in the upper division of the CS major has these things layered in as prereqs. So those are the intro courses that serve as the gateway, prepping you to go on and look at things in a more topical way.
Looking at networks or security, HCI or artificial intelligence layers on the grounding in theory and programming that came from the intro courses. That said, we're in the midst of a really serious curriculum revision, which is on the table and being designed right now. It was voted on by our faculty and approved, and it has to go through the School of Engineering and get approval. Then there's going to be a little bit of jostling as everything gets worked out. But starting next year, you're going to see some changes in the department. Some of these courses are going to get morphed, and they won't be exactly the way they are today.
As a Stanford undergrad, you actually have the option of graduating under any set of requirements that are in play at any time you're an undergrad. So you could graduate under this year's requirements, or you could wait. I think for most people the right answer is going to be to wait, because I think the new arrangement is very much more student-friendly. It has more flexibility in the program. If you look at the CS program as it is, it has a lot of pick one of two, pick one of three, pick one of two, where you're very constrained in what you can choose. The new program is going to be more track-based, where you pick an area of interest. You say, I'm really interested in graphics; you do a little depth concentration in that, and then you have room for a very wide selection of electives to fill out the program, as opposed to a smorgasbord where we force you to pick through a bunch of different areas.
So for most students, I think it gives you flexibility and options to pick the classes that are most interesting to you and get the material you want. It's a growing recognition of the fact that CS has really broadened as a field. Our major looks the same way it did ten years ago: a classic definition of, here are these seven things that all matter. But the things aren't seven anymore; now there are 15 of them, 25 of them. If you look at the spectrum of things that computer science is now embracing, we can't say you have to take one of all these things without keeping you here for eight years. So we're allowing that our field's footprint has gotten so large that we need to let you pick and choose a little bit, to get the slice that seems most appropriate for where you're headed.
So you'll see more about that. If you are thinking about being a CS major and have not yet declared, I would advise that you add yourself to what's called the considering-CS list. You can get to it from the CS course advising page, which, off the top of my head, I think might be csadvising, but I'm not positive; I'll put the link on our webpage when I remember what it is. It is a mailing list. We have a mailing list for the declared undergrads, but we also have a list for people who are at least potentially interested in the major and just want to be kept up to date on things that are happening. There's going to be a lot of activity on this list in the upcoming months as we publish the information about the new curriculum, so you'll know what's coming down the pipe and can make good decisions in preparing for what will be [inaudible], what the transition plan will be as we move into the new courses, and how to make sure you land in the right place when all is said and done. Then I put up a little note about the other majors that exist.
CS is the home base for people who are solving problems using computers; it spans a large variety of people doing a lot of different things. Then there are a couple of other majors that we participate in, in an interdisciplinary sense, where CS is part of a larger context for doing things. There's computer systems engineering, which is between us and EE and is a little bit more about hardware. There's the symbolic systems program, which is run by linguistics and includes cooperation from psychology, philosophy, communication, and CS, looking at artificial intelligence, modeling human thought, and understanding intelligence as expressed that way. Then there's the mathematical and computational science program, which is homed in the statistics department and is joint among math, computer science, and MS&E. It's more of an applied math degree: taking mathematical thinking and using computers and statistical tools to analyze and think about things. A lot of financial people end up being drawn to that one, or people who are doing risk analysis or that sort of combination of things. CS has a role to play in all of these because it's a great tool for solving those kinds of problems. So there are different ways to get at different pieces of the major.
I've put up a couple of notes on things that I think are worth looking into, in case they're the right thing for you at some point. Section leading: joining the 106 staff and shepherding the next generation of students through the program is a great way to strengthen your own knowledge. There's nothing like trying to explain recursion to someone who doesn't understand it to make sure you understand it yourself. It can really do a lot for your personal connection to the craft, as well as getting you involved in a great community. The section leaders are a really wonderful, dedicated group of students who are fun to be around and great people to make friends with, so I highly encourage you to take a look at that. We interview for that every quarter, usually about week six, when we put out applications and bring people in for interviews. We're always looking for new people to join on a quarterly basis.
There are research opportunities all around campus. One that the CS department sponsors in particular is its [inaudible] summer institute, matching professors with undergrads to do work over the summer for slave wages. But it's a great opportunity: you can get started on thinking about how research might be part of your future, and it's one of the best preparations for thinking about grad school.
We do have an honors program. Like most, it's a chance to stake out an independent project and show your mettle. We have a coterminal master's program where you can get some of that depth before you leave this place. Not to mention, there are a million fabulous things to do at Stanford: getting involved with the arts or nonprofit organizations or sports or student leadership or whatever.
Maybe the most important memory I have from my undergrad is that the trickiest part was learning what to say no to. Realize that there are 10,000 fabulous, worthwhile opportunities, and doing all of them only means sacrificing your sanity and the quality of the things you do. So be careful and thoughtful about choosing the things you do and embracing them. Be okay with saying: I won't do this. Even though it would be great to go overseas, and I would've liked that opportunity, it's not going to fit, and I'm okay with that. Don't beat yourself up over it and cause yourself too much grief trying to do too many things.
I won't be here on Friday; Keith is going to come, and he's going to teach you about C++ in the real world. Then I'm going to see you guys at the final. I think that's all I need to say for today.