This presentation is delivered by the Stanford Center for Professional Development.
Any administration questions on your mind?
How many people have actually successfully installed a compiler?
Have stuff working - okay, so that's like a
third of you, good to know.
Remaining two thirds, you
want to get on it. Okay,
so we started to talk about this on Monday, and I'm gonna
try to finish off the things that I had started to get you thinking about;
about how input/output works in C++. We've seen the simple
forms of
using stream insertion, the << operator, to push things onto cout,
the console output stream.
cout is capable of writing all the
basic types that are built into C++, ints and doubles and chars and strings,
right,
by virtue of just putting the stream on the left and the thing you want on
the right; it will
take that thing and push it out onto the stream. You can chain those
together with lots and lots of those << to get a whole bunch
of things, and then endl is what's called a stream
manipulator that produces a newline and starts the next line of text
beneath that.
The analog to that on the reading side is the stream
extraction operator, which is the >>. When applied
to an input stream it attempts to take where the cursor position is in
the input stream and read the next characters using the expected format
given by the type of the thing you're trying to extract. So in this case
what I'm saying is cin >> x, extract an integer here, x being an integer.
What it's gonna look for in the input stream is it's going to skip
over white space. So by default
the stream extraction always skips over any leading white space. That means tabs,
newlines,
and ordinary space characters. So it
scans up to that, gets to the first non-space character
and then starts assuming that what should be there is a number,
a number being a sequence of digit characters. And in this case, because it's an
integer, it shouldn't have a dot or any of the exponent sort of things that
a real number
would. If it runs into something that's not an integer, it
runs into a letter, it runs into punctuation (with a
39.5 it would actually read the 39 and stop at the dot, so the trouble shows up on the next read),
what happens is that the stream goes into a fail state
where it says, you told me to expect an integer. What I read next wasn't an
integer.
I don't know how to make heads or tails of this. So it basically just
throws up its hands.
And so
at that point the stream requires you to intervene,
check the fail state, see that something's wrong,
clear that fail state,
decide what to do about it, restart, and pick up where you left off. It
makes for kind of messy handling
to have all that code in your face when you're trying to do that reading,
and that's actually why we've provided things like GetInteger, GetLine and
GetReal in the simpio library
to just manage that for you.
Basically what they're doing is in a loop they're trying to read that integer
off the console. And if it fails, right, resetting the stream,
going back around asking the user to
give it another try, until they get something that's well formed. So
typically we're just going to use these,
because they just provide conveniences. You could certainly use the raw extraction, but it would just
require more effort on your part to manage the error conditions and retry
and whatnot. So
that's why it's there.
The C++ file I/O; so
the console is actually just a particular instance of a stream. cout and cin
are the streams attached to the user's console there.
The same sort of mechanism is used to read files on disk, so for text files on
disk that have contents you'd like to
pull into a database, or when you want to write some information out to a file, you
use a file stream for that.
There is a header called fstream, a standard C++ header
in this case, so
enclosed in < >,
that declares ifstream and ofstream: the input file stream for reading,
the output file stream for writing.
Declaring these variables; this [inaudible]
just sets up a default stream that is not connected to anything on disk.
Before you do anything with it you really do need to attach it to some
named location, some file by name on your disk
to have the right thing happen, to read from some
contents, or to write the contents somewhere.
The operation that does that is open,
so the ifstream and the ofstream are objects,
so dot notation is used to send messages to them. In this case, telling the
input stream
to open the file whose name is "names.txt."
The behavior for open is to assume that you meant the file in the current
directory if you don't otherwise give a more fully specified path. So
this is almost always the way we're going to do this, we're just going to open a file by name. It's going to look
for it in the project directory, where your code is, where your project is, so
right there locally.
Now this will look for a file whose name is exactly names.txt,
and then from that point the file position, the cursor we
call it, is positioned at the beginning of the input stream. The first character read
will be the first character of names.txt, and as you move forward
it will read its way all the way to the end.
Similarly, doing an out.open,
it opens a file and positions the writing at the very beginning,
so the first character written will be the first character in that file when
you finish, and the rest will be written in sequence.
So this is one of those places, actually, probably the only one that this
direction is going to be relevant for. I talked a little bit last time about C-strings
and C++ strings, and you might have been a little bit
worried about why
I'm telling you you need to know that both exist.
And so last time I talked a little about
one way in which C-strings don't do what you think, in that one case of
concatenation, and how you can force a conversion from the old to the new.
Now, I also mentioned that there was a conversion that went in the
opposite direction. You had a new string, and you wanted the old one.
And
one of the first questions you might ask is well why would I ever want to do that? Why
would I ever want to go backwards? Why do I want to move back to the older yucky thing?
This is the case that comes up;
the open operation
on ifstream and ofstream
expects its argument to be specified as an old style string.
This is actually just an artifact; it has to do with the fact that
the group that was working on
designing the stream package and the group that was designing the string package were
not in sync, and they were not working together. The stream package was
finalized before the string package was ready
and so it depended on what was available at the time and that was only the old style
string.
So as a result, it wants an old style string, and that's what it takes, and you
can't give it a C++
string. So this is the case where the things in double quotes
are actually old style strings, which
in almost all situations get converted on your behalf automatically.
In this case it's not being converted and it's exactly what's wanted.
So if you have a name that's a string constant or a literal, you can just pass it
in double quotes to open.
If you have a C++ string variable,
so you've asked the user for what file to open, and you've used getline to
read it into a string,
if you try to pass that C++ string variable to open, it will not match
what it's expecting.
You do need to do that conversion, asking it to go .c_str()
to convert itself into the old style format.
So that was where I was getting to when I
positioned you to realize this was gonna
someday come up. This is the one piece of the interface that we'll interact with this
quarter that requires that old string,
where you'll have to make that effort to
convert it backwards.
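
To make that conversion concrete, here is a minimal sketch. The helper name openByName is made up for illustration; the only pieces taken from the lecture are open, c_str and fail. (Compilers that follow C++11 or later also accept a C++ string directly in open, but the c_str() form works everywhere.)

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: attach `in` to the file whose name is held in a
// C++ string. Returns true if the stream is usable afterwards.
bool openByName(std::ifstream& in, const std::string& filename) {
    in.open(filename.c_str());   // c_str() hands open the old style string it expects
    return !in.fail();           // open has no return value, so ask the stream
}
```

With a literal there is nothing to convert: in.open("names.txt") passes the old style string as is.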
Both of these operations can fail.
When you open a file and [inaudible] - question here? So how hard
[inaudible]?
You know it's obviously extremely easy to do it;
the issue has to do with compatibility.
They announced it this way, people wrote code that expected it this way
and then you change it out from under them and all this code breaks that used
to work.
And so as a result of this [inaudible]
compatibility, it's an issue of
once we published it and we told people this was how it works, we can't
really take it away from them. And that's part of what we're dealing with in C++ too,
which is that things that used to work in C still need to work in C++,
and so as a result
there's a certain amount of history that we're all carrying forward with us in a very
annoying way. I totally agree
that it seems like we could just fix it, but we would break a lot of code in the process
and anger a lot of
existing programmers.
So both of these open calls could fail; you might try to open a file and it
doesn't exist, you don't have the permissions for it, you spelled the name wrong.
Similarly trying to open it for writing, you might not have write
permission in the directory.
And
in either situation you need to know, well did it open or did it not?
There's not a return value from open that tells you that.
What there is is a member function called
.fail, so that you can ask the stream at any point, are you in a fail state. So for
operations that actually have a chance of succeeding or failing on the
stream, you'll tend to write the code as
try it,
then check in.fail. So try to read this thing, check in.fail. Try to
open this file, check in.fail, as your way
of following up on did it work and making sure that you
have good contents before you keep going.
If the in.open has failed,
then every subsequent read on it will fail.
Once the stream is in a fail state, nothing works. You can't read
or write or do anything with it until you fix the error,
and that's the in.clear
command that resets the state back into a known good state,
and then you have a chance to retry. So for example, if you were trying to open a
file that the user gave you a name for,
they might type the name wrong. So you could try in.open, check
in.fail.
If it failed, say no, no, I couldn't open that file, why don't you try again, get a new
name,
and then you'd clear the state, come back around and try another in.open
until you get one that succeeds.
Once you have one of those guys open
for reading or writing,
there are three
main ways that you can do your input/output.
We have seen this form a little bit, this one with the insertion/extraction;
these other two are more likely to be useful in the file reading case as
opposed to the interacting with the user case, and they have to do with just
breaking down the input
in a more
fine-grained way.
Let's say this first one is reading and writing single characters. It might be
that all I want to do is just go through the file and read it character by character.
Maybe what I'm trying to write is something that will just count the characters and
produce a frequency count across
the file, tell me how many A's and B's and C's are in it,
or just tell me how many characters are in the file at all.
in.get
is the member function that you send to an input file stream
that will retrieve the next character.
[Inaudible] the next character from the stream; it returns EOF when there are no
more characters. EOF is the end of file marker, it's actually capital
EOF, it's the constant that's defined with the class. And
so you could read till EOF as a way of just getting them character by
character.
Similarly there is a put on the other side, which is for when you're writing and you just
want to write a single character.
You could also do this with
out << ch, which writes the character. This actually just does a
put of the character; it's
a matching function, the analog to get on the input side,
for doing single-character I/O.
Sometimes what you're trying to do is process it line by line. Each line is the
name of somebody and you're putting those names into a database. You
don't want to just assemble it character by character, and you don't know how
many
tokens there might be,
so splitting on white space won't do: sometimes there's Julie Diane Zelenski, sometimes
there might be Julie Zelenski, you don't know how many name pieces might appear to be
there.
You can use getline to read an entire line in one chunk.
So it'll read everything up to
the first newline character it finds. It actually discards the newline and advances
past it. So
the sequence of characters that you will have read will be everything up to and not including
the newline. The newline will be consumed though so that reading will
pick up
on the next line and go forward.
getline is a free function.
It is not a member function on the stream. It takes a stream as its first
argument.
It takes a string by reference as its second argument,
and it fills in the line with the text of
the characters from here to the next newline read in the file.
If it fails the way you will find out is by checking the fail
state. You can do a getline
and then in.fail after it to see, well did it write something
in the line that was valid? If it failed, then the contents of line are
not meaningful, so don't rely on them. So
it's a way of just pulling it line by line.
This name has the same words in it as
our GetLine, capital G, capital L,
in simpio, which shows that it's a reasonable name for the kind
of thing that reads line by line, but there is a different arrangement to what
it's used for and how it's used. So our GetLine takes no arguments and returns a line read
from the console.
The lower case getline takes the stream to read from and the string to write
it into
and
does not hand the line back as its return value.
You check in.fail if you
want to know how it went. To write an entire line out, [inaudible] a putline equivalent, so
in fact you could just use
stream insertion on out here, stick that line back out with an endl to reproduce
the same line you just read.
And then these we've talked a little about, this idea of formatted
read and write, where it's expecting things by format. It's expecting to see a character,
it's expecting to see an integer, or it's expecting to see a
string.
It uses white space as the default delimiter between those things. So it's
scanning over white space and discarding it and then trying to pull the next thing out.
These are definitely much trickier to use because if the format that you're
expecting doesn't show up, it causes the stream to go into a fail state, and you
have to fix it and recover.
So often even when you expect that things are going to be, let's say, a sequence of
numbers or a name followed by a number,
you might instead choose to pull it in as a string
and then use operations on the string itself to divide it up
rather than depending on stream I/O, because with stream I/O it's just a little bit harder to get
that same effect.
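
As one possible illustration of pulling the line in as a string and dividing it up yourself, here is a sketch for a made-up record format, a name followed by a number on one line (for example "Julie Diane Zelenski 42"). The function name and the format are assumptions for the example:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical record format: "<name> <number>".
// Splits at the last space, so the name may contain spaces of its own.
// Returns false, leaving the outputs untouched, if there is no trailing number.
bool splitNameAndNumber(const std::string& line, std::string& name, int& number) {
    std::string::size_type pos = line.rfind(' ');
    if (pos == std::string::npos || pos + 1 == line.size()) return false;
    std::string digits = line.substr(pos + 1);
    // reject anything after the last space that isn't purely digits
    if (digits.find_first_not_of("0123456789") != std::string::npos) return false;
    name = line.substr(0, pos);
    number = std::atoi(digits.c_str());
    return true;
}
```

A malformed line here just produces a false return; the stream itself never goes into a fail state, which is the convenience being described.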
And then in all these cases, right, in.fail.
There is also -
you could check out.fail. It's just much less common that the
writing will fail, so you don't see it as much, but it is true for example that if
you had run out of disk space and you were writing, a
write operation could fail because it had
run out of space or some media error had happened on the disk, so
both of those
have reasons to check fail.
So let me do just a little bit of
live coding
to show you that
it works the way I'm
telling you. Yeah? So the
fail
function, is it
going
to always be the stream that's failing and not
the function that's failing? Yes,
pretty much. There are a couple rare cases where the function actually also
tells you a little bit about it, but in general fail just covers the whole general
case of anything I have just done on the stream failed,
so any of the operations
that could potentially run into some error condition will set the fail state in such a way
that your next call to in.fail will tell you about it.
And so the general model will be: make the call, check the fail
if you know that there was a chance that something could have gone
wrong, and then you want to clean up after it and do something [inaudible].
So I'm gonna show you that I'm gonna get the name of the file from the
user here,
I'm going to use in.open of that,
and I'm going to show you the error that you're gonna get when you forget to convert
it, while I'm at it.
And then I'll have like an if
in.fail, error,
file didn't open.
First I just want to show you this little simple stuff; I've got
my ifstream declared, my attempt to open it and then my check for seeing that it
failed. I'm gonna
anticipate the fact that the compiler's gonna be
complaining about the fact that it hasn't heard about fstream, so I'm gonna tell it about
fstream.
And I'm gonna let this go ahead and compile, although I know it has an error in it,
because I want to show you the things that are happening. So the first thing
it's complaining about actually is this one, which is
the fact that getline is not declared in the scope, which means I forgot one more of my
headers that I wanted. Let
me move this up a little bit because it's sitting down a little far.
And then the second thing it's complaining about is right here.
This is pretty hard to see, but I'll read it to you so you can tell what it says; it says error,
there's no matching function call
and then it has some gobbledygook that's a little bit scary,
but includes the name ifstream. The full name for ifstream is a
lot bigger than you think,
but it's saying that
it's ifstream's open, and it says that it does not
have a match for that, that there is no open call on the ifstream class, so no
member function of the ifstream class whose name is open
and whose argument is a string.
And so that cryptic little bit of information is gonna be your reminder
to jog your memory about the fact that open doesn't deal in the
new string world, it wants the old string world. It will
not take a new string,
so I will convert it
to my old string,
and then be able to get this thing compiling.
And so when it runs if I enter a file name of I say [inaudible],
it'll say error file didn't open, some file that
I don't have access to. It happens that I have one sitting here, I think, whose name is
handout.txt. I took the text of some handout and then I just
left it there. So
let me
do something with that file. Let's
just do something simple where we just count the number of lines in it. Let's say - actually I'll make
a little function,
just to talk a little bit about one of the things that's a little quirky about
ifstreams,
which is that
when you pass an ifstream you will
typically want to do so by reference.
Not only is this a good idea,
because the ifstream is changing in the process of being read, it's
updating its internal state and you want to be sure that we're not
missing this update that's going on; it's also the case that most
libraries require you to pass it by reference. It doesn't have a model for how
to take a copy of a stream and make another copy that's distinct. It really
is always referring to the same file, so in fact in most libraries you have
to pass it by reference.
So I'll go ahead and pass it by reference. I'm gonna go in here and I'm just gonna do a line-by-line
read and count as I go. I'm
gonna write this as a while [inaudible],
and I'm gonna say
read the next line
from the file into the variable,
and then if in.fail - so if it was unable to read another line,
my assumption here is gonna be that
we're done, so it will fail at EOF. It's the most common reason it could
fail. It could also fail if there was some sort of more catastrophic error, you're reading a file from a
network and the network's gone down or something like that. In our
case, right, the in.fail is going to tell us yeah, there's nothing more to read
from this file, which means we've gotten to the end.
Otherwise we advance the count. Whenever we get a good line we go back
around, so we're
using the while true in this case because we have a little bit of work to do
before we're ready to decide whether to keep going,
in this case, reading that line.
And then I return the count at the end,
and then I can down
here print out num lines
= my call to CountLines of in
and endl. Okay. Let
me move that up a little bit.
Last time I posted the code that I wrote in the
editor here, and I'll be happy to do that again today, so you
shouldn't need to worry about copying it down, I will post it later if you want to
have a copy of it for your records, but
just showing, okay, yeah, we're just doing a line by line read,
counting, and then a little bit more of the how do you open something, how do you
check for failure. And
when I put this together, what does it complain about? Well I think it complains about the fact that
I told it my function returned void, but then I made it return an int. And that
should be okay now. And so if I read the handout.txt file,
the number of lines in it happens to be 28. It's just some text I'd cut out of the handout, so
there are 28 newline characters,
is basically what it's telling me
there.
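
The code from the editor isn't reproduced in this transcript, so what follows is only a reconstruction of the line counting function as it was described: the stream passed by reference, a while true loop, a getline, a check of fail, and a count. The exact names are guesses.

```cpp
#include <fstream>
#include <string>

// Sketch of the line counter described above. The ifstream is passed by
// reference: reading changes its state, and streams can't be copied.
int countLines(std::ifstream& in) {
    int count = 0;
    while (true) {
        std::string line;
        std::getline(in, line);   // the line itself is ignored; only the count matters
        if (in.fail()) break;     // most likely the end of the file
        count++;
    }
    return count;
}
```

The caller would open the stream first and then print something like the result of countLines(in) followed by endl.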
So I can just do more things, like I could change this loop and instead use get to
do a single character count. I could say how many characters were in there.
If I used
the
tokenization and I said, well just tell me how many strings I find using string
extraction, it would count the number of non-space things that it found and
things like that.
Typically
I/O is one of those areas, I'd say, where there's a vast array of nuances to all
the different things you can do with it, but the simple things actually are
usually fairly easy, and those are the only ones that are really going to matter to us: being
able to do a little bit of simple
file reading/writing to get information into our programs. How do
you feel about that? Question? Sorry, why do
you have getline
with an empty string?
So getline,
the one that was down here? This one? No,
the one that - Oh, the
one that's up here. So yeah, let's talk about that. The getline that's here is -
the second argument to getline is being passed by reference, and so it's
filling in that line with the information it read from the file.
So I just declared the variable so I had a place to store it
and I said,
okay, read the next line from the file, store the thing you read in the line. It turns
out I don't actually care about that information, but there's no way to tell
getline to just throw it away anyway. Oh.
So I'm using it to just kinda move through line-by-line, but it happens to
be that getline requires me to store the answer somewhere, and I'm storing it.
Instead of returning it, it happens to use the design where it fills it in by
reference.
It turns out to be a little bit more efficient
to do a pass by reference and fill something in than to return it. And the
C++ libraries in general prefer that style of
getting information back out of a function as opposed to the function return,
which you'd think of as being a little more natural design. There's a slight
inefficiency to that relative to the pass by reference and the libraries tend to be
hyper-conscious of that efficiency, so they tend to prefer this
slightly more awkward style. Question?
Why in the
main [inaudible] does the
error
open [inaudible] file didn't open with [inaudible] like print error: file didn't open? You
know
it's just the way that error works. Error wants to make sure that you don't mistake what
it does, and so it actually prefixes whatever you ask it to write with this big
ERROR in uppercase letters, and so
the purpose of error is twofold; is to report what happened and to halt
processing. And so when it reports that it actually prefixes it with this big red
E-R-R-O-R just to say don't miss this, and then it halts processing
there. And it's just - the error [inaudible] libraries function, which is your way of handling any
kind of catastrophic I can't recover from this. And it's certainly
something we don't want anybody to overlook, and so we try to make it
really jump out at you when
it tells you that. So this is in simpio? It is in genlib actually. Oh. So error's actually declared
in genlib. And can we use it -
so it's global basically? It is global. It's a free function, and you will definitely have occasion
to use it. Right, it's your way of saying something happened that there's just no
recovery from and continuing on would not make sense. It says
stop and halt
and alert the user something's really wrong, so you don't
want to keep going after this because there's no way to
patch things back together. In
this case probably a more likely thing we'd do is say
give me another name, let's go back around and try again; that would be a
better way to handle that. I can
even show you how I would do that.
I could say, well while true,
enter the name,
and maybe I could change this to be well
if it didn't fail
then go ahead and break out of the loop. Otherwise, just report that the file
didn't open,
and say try again.
And then the last thing I will need to do
is clear that state.
So now it's prompting,
trying to open it.
If it didn't fail it will break and then it will move forward to counting the lines.
If it did fail it'll continue on through here reporting this message, and then
that clear, very important, because that clear gets us
back in the state where we can try again. If we don't clear the error and we try to do
another in.open, once the stream is in a fail state it stays in a fail
state until you clear it, and no subsequent operation will work whatsoever.
It's just ignoring everything you ask it to do
until you have acknowledged you have done something about the problem, which
in this case was as simple as clearing and asking to open again.
So if I do it this way
I enter some name it'll say that didn't open, try again. And then if I say
handout.txt,
it'll open it and
go ahead and read. All right,
any questions about iostreams? We're
gonna move away from this [inaudible], if there's anything about it you'd like to know I'd be happy to answer it.
So let me
get us back to
our slides,
and I'll kind
of move on to the more object-oriented features of the things we're going to be
depending on and using this quarter.
So the libraries that we have been looking at,
many of them are just provided as what we call free functions. Global functions that
aren't assigned to a particular object, they aren't part of a class, so asking for a random
integer,
reading a line, computing the square root,
gobs of things are there that just have
functionality that you can use anywhere and everywhere procedurally.
We've just started to see some things that are provided in terms of classes.
string is a class, which means that you have string objects that you're messaging and
having them manipulate themselves.
The stream objects also are classes: ifstream, ofstream, those are all classes
that you send messages like open
and fail to, to ask about that stream's state or reset its state. This idea of
a class is one that's hopefully not new to you. For most of you who are coming from Java,
this is pretty much the only mechanism for writing code in
Java: in the context of a class. Those
of you who haven't seen that as much, we're going to definitely be practicing
on this. Some simple things you need to know to get up to speed
vocabulary-wise: a class is just a way of taking a set of
fields or data
and attaching operations to it, so that it creates an
entity that has both its state and its functionality packaged
together.
So in the class interface you'll say here is a time object, and a time object has an
hour and a minute
and you can do things like
tell me if this time's before that time or what the
duration between this start time and this end time would be - there would be all these
behaviors that are like [inaudible] to do. Can you print a time, sure. Can I read a time from a
file, sure.
As long as the interface for the time class provides those things, it's this fully
fleshed out
new data type,
and then you use time objects whenever you need to work with time.
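
As a tiny sketch of what such a class might look like (the class name, member names, and the particular operations are all made up here to match the description; no such class is shown in the lecture):

```cpp
// Hypothetical Time class: state (hour and minute) packaged together
// with the operations that make sense on it.
class Time {
public:
    Time(int hour, int minute) : hour_(hour), minute_(minute) {}

    // Is this time earlier in the day than `other`?
    bool isBefore(const Time& other) const {
        return totalMinutes() < other.totalMinutes();
    }

    // Duration in minutes from this time to `end`.
    int minutesUntil(const Time& end) const {
        return end.totalMinutes() - totalMinutes();
    }

private:
    int totalMinutes() const { return hour_ * 60 + minute_; }
    int hour_;
    int minute_;
};
```

A client would write something like Time start(9, 30) and ask start.isBefore(end), without needing to know how the hour and minute are stored.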
The idea is that the client uses the object, which is the first role we're
gonna be in for a couple weeks here:
you learn what the abstraction is. What does the class provide? It provides the notion of a
sequence of characters, that's what string does. And so that sequence
has all these operations; like well tell me what characters are at this position, or
find this sub-string,
or insert these characters, remove those characters. And
internally it's obviously doing some machinations to keep track of what you
asked it to do and how to update its internal state. But what's neat is that from
the outside as a client you just think well there's a sequence of
characters there and I can ask that sequence of characters to do these
operations, and
it does what I ask,
and that I don't need to know
how it's implemented internally. What mechanisms it uses and how it responds
to those things to update its state
is very much
behind the abstraction or inside that black box, as we'll sometimes call it to
suggest to ourselves that we can't see inside of it, we don't know how it works. It's
like the microwave, you go up and you punch on the microwave and you say cook for a minute. Like
what does the microwave do? I don't know, I have no idea, but things get hot, that's what I
know.
So the nice thing about [inaudible] is you can say, yeah,
if you push this button things get hot and that's what I need to know.
[Inaudible] has become widely
industry standard in sort
of all existing languages that are out there. It seems like there's been
somebody who's gone to the trouble of trying to extend it to add these
object [inaudible] features and languages like Java that are fully object
oriented, are very much all the rage now.
And I thought it was interesting to take just a minute to talk about well why is it so
successful? Why is object oriented like the next big thing in programming?
And there are some really good valid reasons for why it is a very
sensible approach to writing programs
that is
worth thinking a little bit about.
Probably the largest
motivation for the industry has to do with this idea of taming complexity.
Certainly one of the
weaknesses of
ours as a discipline is that
the complexity can quickly spiral out of control.
As programs
get larger and larger, their interactions get harder and harder to
model and we have more
and more issues where we have bugs and security flaws and viruses
and whatnot that exploit holes in these things.
We need a way as engineers to
tighten down our discipline and really produce things that actually
don't have those kinds of holes in them.
And object-oriented programming is probably one of the ways to try to manage the complexities of
systems.
That instead of having lots and lots of code that [inaudible] things, if you can
break it down into these objects, and each
class that represents that object can be
designed and tested and worked on independently,
there's some hope that you can have a team of programmers working together,
each managing their own classes
and have them be able to not interfere with each other too much to kind of
accomplish -
get the whole end result done by having people collaborate, but without them kind
of stepping on top of each other.
It has a - the advantage of modeling the real world, that we tend to talk to talk about
classes that kind of have names that speak to us, what's a ballot, what's
a class list, what's a database, what is a
time, a string,
that - a fraction? These things kind of - we have ideas about what those things are
in the real world, and having the class
model that abstraction makes it easier to understand what the code is doing and
what that object's role is
in solving the problem.
It also has the advantage of reuse. That once you build a class and its
operations, the idea is that it can
be pulled neatly out of the one program and used in another if the
design has been done well,
and can be changed or extended fairly easily in the future if the design was
good to begin with.
So let me tell you what
kind of things we're going to be doing in our class library
that will help you to kind of just become a big fan
of having a bunch of pre-written classes around.
We have,
I think, seven classes - I think there's eight actually in our class library
that just look at certain problems that either
C++
provides in a way that's not as convenient for us, or is kind of missing,
or that can be improved on where we've
tackled those things and given you seven classes that you just get to use from
the get go
that solve problems that are likely to come up for you.
One of them is the scanner,
which I kind of separated by itself because it's a little bit of an unusual class, and
then there's a bunch of container classes on that next line, the vector,
grid, stack, queue, map and set
that are used for storing data, different kinds of collections,
and they differ in kind of
what their usage pattern is and what they're storing,
how they're storing it for you.
But that most programs need to do stuff like this, need to store some kind of
collection of data,
why not have some good tools to do it.
These tools kinda let you live higher on the food chain. They're very efficient,
they're debugged, they're commented, the abstraction's been thought about and
kind of worked out
and so they provide kinda this very useful piece of functionality, kinda written for you,
ready to go.
And then I - a little note here is that we study these - we are going to study these
abstractions twice.
We're gonna look at these seven classes
today and Friday as a client, and then start using them all through the quarter.
In about a week or so after the mid-term we're gonna come back to them
and say, well how are they implemented?
That after having used them and appreciated what they provided to you, it
will be interesting, I think, to open up the hood
and look down in there and see how they work.
I think this is - there is an interesting pedagogical
debate going on about this, about
whether
it's better to first know how to implement these things and then get to
use them, or to use them and then later know how to implement them.
And I liken it to a little bit if you think about some things we do
very clearly one way or the other in our curriculum, and it's interesting to think about
why.
That when you learn, for example, arithmetic as a
primary schooler,
they don't give you a calculator and say, here, go do some division and multiplication,
and then later try to teach you long division.
You'll never do it. You'll be like, why would I ever do this, this little box
does it for me, the black box.
So in fact they drill you on your multiplication tables and
your long division long before they let you touch a calculator,
which I think is one way of doing it. And, so - and for example, it's like
we could do that with you, make you do it the kind of painful way and then
later say, okay, well here's the way you can avoid
being bogged down by that tedium.
On the other hand, think about the way we teach you to drive.
We do not say, here's a wheel, and
then say,
let me tell you a little bit about the combustion engine, you
know, we give you some spark plugs and
try to get you to build your car from the ground up. It's like you learn to drive
and then if you
are more interested in that you might learn what's under the hood, how to
take care of your car, and eventually how to do
more serious repairs or design of your own car.
Where I think of that as being a client first model, like you learn how to use the car
and drive and get places and then if it
intrigues you, you can dig further to learn more about how the car works.
So that's definitely - our model is more of the drive one than the arithmetic one that
it's really nice to be able to drive places first. Like if I - we spent all quarter
learning how to build a combustion engine and you didn't get to go
anywhere,
I'd feel like you wouldn't have tasted what - where you're trying to get, and why that's
so fabulous. So
we will see them first as a client, and you'll get to do really neat things. You'll discover this thing called
the map where you can put thousands, millions of entries in and
have instantaneous look-up access on that.
That you can put these things in a stack or a queue and then have them maintained
for you and popped back out
and all the storage of that being managed and the safety of that being managed without
you having to kinda take any active role in that. That they provide functionality to
you, that you just get
to -
leverage from the get go, and hopefully it will cause you to be
curious though, like how does it work, why does it work so well,
and what kind
of things must happen behind the scenes and under the hood
so that when we get to that you're actually kind of inspired to know
how it did it, what it did.
So I'm gonna tell you about the scanner
and maybe even tell you a little bit about the vector today, and then we'll do the remaining
ones on Friday, perhaps even carrying over a little bit into the weeks
to get ourselves used to what we've got.
The scanner I kind of separated because the scanner's more of a task based object then it
is a
collection or a container for storing things. The scanner's job is to break
apart input into tokens. To take a string in this case that either you read from
the file or you got from the user, or you constructed some way, and just tokenize
it. It's called a tokenizer or parser.
This is something a little bit like - stream extraction kind of does this,
but stream extraction, as I said, isn't very flexible,
that it doesn't
make it easy for you to kind of - you
have to sort of fully anticipate what's coming up on the string. There's not
any way you can sort of
take a look at it and then decide what to do with it and
decide how to change your parsing strategy. And scanner has a kind of flexibility that
lets it be a little bit more
configurable about what you expect coming up and how it works.
So the idea is that basically it just takes your input, you know, this line contains ten
tokens,
and as you go into a loop saying,
give me the next token, it will
sub-string out and return to you this four character string followed by this single
character space and then this four character line
and space, and so the default behavior is to extract all the tokens to come up,
to use white-space and punctuation as delimiters. So it will kind of
aggregate letters and numbers together
and then individual spaces and new lines and tabs will come out as single
character tokens. The parenthesis and dots and number signs would all come out as single character
tokens,
and it just kind of divides it up for you.
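The default splitting rule described above can be sketched in a few lines of standard C++. This is only an illustration under assumptions: `tokenize` is a hypothetical helper, not the class library's scanner, and it treats any character that is not a letter or digit as a one-character token.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not the course library's scanner): runs of letters and
// digits are grouped into one token; every other character, such as a space,
// tab, or punctuation mark, comes out as its own single-character token.
std::vector<std::string> tokenize(const std::string& input) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        std::size_t start = i;
        if (std::isalnum(static_cast<unsigned char>(input[i]))) {
            // Aggregate consecutive letters and digits.
            while (i < input.size() &&
                   std::isalnum(static_cast<unsigned char>(input[i]))) {
                ++i;
            }
        } else {
            ++i;  // a delimiter is returned by itself
        }
        tokens.push_back(input.substr(start, i - start));
    }
    return tokens;
}
```

Under these rules, `tokenize("this line")` gives three tokens: the four-character "this", a single space, and the four-character "line".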
Okay.
It has fancy options though that let you do things like discard those space
tokens because you don't care about them. To do things like read
the fancy number formats. So it can read
integer formats and real formats, it can do the real format with exponentiation
in it, with leading minus signs, things like that,
that
you can control
with these setters and getters, like what it is you wanted to do about those things.
You can tell it things like when I see an opening quote, I want you to gather everything to
the closing quote, and so it does kind of
gather
phrases out of sequence if that's what you want. And so you have control over
when and where it decides to do those things that lets you kind of
handle a variety of kind of parsing and dividing tasks by using the scanner
to get that job done. So I listed some things you might need, if you're
reading text files, you're parsing expressions, you're processing some kind of commands, that
this scanner is a very handy way to just divide that [inaudible] up.
You could certainly do this kind of stuff manually,
for example,
like using the find on the string and finding those spaces and dividing it up, but
that the idea is just doing that
in a more convenient way for you
than you having to handle that process manually.
This is what its interface looks like.
So this is a C++ class definition. It looks
very similar to a Java class definition, but there's a little bit of
variation in some of the ways the syntax comes through in the class.
The class here is scanner;
the public colon introduces a section where everything from
here until the next access modifier is
public. So I don't actually have public repeated again and again on all the
individual entries here.
It tells us that the scanner has a constructor
that takes no arguments; it just initializes a new empty scanner.
I'm gonna skip the destructor for a second; I'll come back to it.
There is a set input member function that you give it the string that you want
scanned and then there's these two
operations that tend to be used in a loop where you keep asking are there more
tokens and if so, give me the next token, so it
just kind of pulls them out one by one. I picked
just one of the space - of the particular advanced options to show you
the format for them. There's actually about six more that deal with
some other more obscure things.
This one is
how is it you'd like it to deal with spaces,
when you see space tokens, should they be returned as ordinary tokens or should you
just discard them entirely and not even bother with them?
The default is what's called preserve spaces, so it really does return them, so if
you ask and there's only spaces left in the file, it will say there are more tokens
and as you call the next token it will return those spaces as individual tokens.
If you instead have set the space option of ignore spaces, then it will just
skip over all of those, and if all that was left in the file was white space
when you ask for more tokens, it will say no.
And when you ask for a token and there's some spaces leading up to
something it will just skip right over those and return the next non-space token.
There's a variety of these other ones that exist
that handle the floating point and the double quote and other kind of
fancy behaviors.
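To make the interface and the space option concrete, here is a minimal stand-in written with only the standard library. Everything in it is an assumption for illustration: `MiniScanner`, its member names, and the `PreserveSpaces`/`IgnoreSpaces` values are hypothetical and are not the class library's actual declarations. The `countTokens` helper shows the usual loop of asking for more tokens and pulling the next one.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for the scanner discussed in lecture.
class MiniScanner {
public:
    enum SpaceOption { PreserveSpaces, IgnoreSpaces };

    // A new scanner is empty and preserves spaces by default.
    MiniScanner() : pos_(0), spaceOption_(PreserveSpaces) {}

    void setInput(const std::string& str) { input_ = str; pos_ = 0; }
    void setSpaceOption(SpaceOption option) { spaceOption_ = option; }

    bool hasMoreTokens() {
        skipSpacesIfIgnored();
        return pos_ < input_.size();
    }

    // Returns the next token, or an empty string when none remain.
    std::string nextToken() {
        skipSpacesIfIgnored();
        if (pos_ >= input_.size()) return "";
        std::size_t start = pos_;
        if (isWordChar(input_[pos_])) {
            while (pos_ < input_.size() && isWordChar(input_[pos_])) ++pos_;
        } else {
            ++pos_;  // spaces and punctuation are single-character tokens
        }
        return input_.substr(start, pos_ - start);
    }

private:
    static bool isWordChar(char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0;
    }
    static bool isSpaceChar(char ch) {
        return std::isspace(static_cast<unsigned char>(ch)) != 0;
    }
    void skipSpacesIfIgnored() {
        if (spaceOption_ != IgnoreSpaces) return;
        while (pos_ < input_.size() && isSpaceChar(input_[pos_])) ++pos_;
    }

    std::string input_;
    std::size_t pos_;
    SpaceOption spaceOption_;
};

// Counts tokens with the "while there are more tokens, get the next" loop.
int countTokens(const std::string& line, MiniScanner::SpaceOption option) {
    MiniScanner scanner;
    scanner.setSpaceOption(option);
    scanner.setInput(line);
    int count = 0;
    while (scanner.hasMoreTokens()) {
        scanner.nextToken();
        ++count;
    }
    return count;
}
```

With this sketch, a line holding nothing but three spaces counts as three tokens when spaces are preserved and as zero tokens when they are ignored.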
There's one little detail I'll show you that's a C++ ism that isn't
- doesn't really have a Java analog,
which is the constructor which is used as the initialization function for a
class
has a
corresponding destructor.
Every class has the option of doing this.
That is
the - kind of when the object is being created, the constructor is being called. When the
object is being de-allocated or destroyed, going out of scope, the destructor is
called.
And the pairing allows sort of the constructor to do any kind of set up that needs to be
done and the destructor to do any kind of tear down that needs to be done.
In most cases there's not that much that needs to be there, but
it is part of the mechanism that allows all classes to have an option kind of at
birth and death to do what it needs to do. For example, my file
stream
object, when you -
when it goes away, closes its file automatically. So it's a place where the
destructor gets used to do cleanup as that object is no longer valid.
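The constructor and destructor pairing can be observed directly with a small sketch. `Tracker` and `runScope` are hypothetical names made up for this illustration; the class just records when each of its two special functions runs.

```cpp
#include <string>
#include <vector>

// Hypothetical class that notes its own birth and death in a shared log.
class Tracker {
public:
    explicit Tracker(std::vector<std::string>& log) : log_(log) {
        log_.push_back("constructed");  // set-up work would go here
    }
    ~Tracker() {
        log_.push_back("destroyed");    // tear-down work would go here
    }

private:
    std::vector<std::string>& log_;
};

std::vector<std::string> runScope() {
    std::vector<std::string> log;
    {
        Tracker t(log);            // constructor runs at the declaration
        log.push_back("in use");
    }                              // destructor runs here, as t leaves scope
    log.push_back("after scope");
    return log;
}
```

Nothing in `runScope` calls the destructor by name; it runs on its own at the closing brace of the inner block, before "after scope" is logged.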
So a little bit of
scanner
code
showing kind of the most common access pattern, is you declare the
scanner. So at this point the scanner is empty, it has no contents to scan.
Before I start pulling stuff out of it,
I'm typically gonna call a set input on it, passing some string. In this case the
string I'm passing is the one that was entered by the user, using getline.
And then the
ubiquitous loop that says well while the scanner has more tokens, get the next
token.
And in this case I'm not even actually paying attention to what those tokens are, I'm
just counting them.
So this one is kind of a
very simple access that just says just call the next token as many times as you can
until there
are no more tokens to pull out. Way in the back? [Inaudible] I
mean, like
in the beginning when it says scanner, scanner, do we write scanner scanner = new
scanner () or [inaudible]?
Yes.
Not exactly. So that's a very good example of like where Java and C++ are gonna
conspire to trip you up just a little bit,
that in Java objects were always created using the syntax of new. You say new
this thing, and in fact that actually does an allocation
out in what's called the heap
of that object and then from there you use it.
In C++ you actually don't have to put things in the heap, and in fact
we will rarely put things in the heap, and that's what new is for.
So we're gonna use the stack to allocate them. So when I say scanner scanner,
that really declares a scanner object right there
and in this case there are no arguments to my constructor, so I don't have anything in
parens. If there were some arguments I would put parens and put the
information there,
but the constructor is being called even without this new. New actually is
more about where the memory comes from. The constructor is called regardless of
where the memory came from. And so this is the mechanism of C++ to get
yourself an object tends to be, say the class name, say the name of the variable.
If you have arguments for the constructor, they will go in parens
after the variable's name.
So if scanner had
something, I would be putting it right here,
open paren, yada, yada.
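A short sketch of the declaration forms just described. `Label` is a hypothetical class invented for the example; the point is only where the constructor arguments go and that `new` selects heap storage rather than triggering construction.

```cpp
#include <string>

// Hypothetical class with a no-argument and a one-argument constructor.
class Label {
public:
    Label() : text_("empty") {}
    explicit Label(const std::string& text) : text_(text) {}
    std::string text() const { return text_; }

private:
    std::string text_;
};

std::string describeLabels() {
    Label plain;              // on the stack; no arguments, so no parentheses
    Label named("midterm");   // arguments go in parens after the variable name
    Label* onHeap = new Label("final");  // new picks heap storage; same constructor
    std::string result =
        plain.text() + "," + named.text() + "," + onHeap->text();
    delete onHeap;            // heap objects must be released explicitly
    return result;
}
```

All three objects are fully constructed. Only the last one needs an explicit `delete`, which is one reason stack allocation is the usual choice here.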
So that's a little
C++/Java
difference. Oh, that's good. Question over
here?
When do we have to use the destructor?
So typically you will not ever make a call that explicitly calls the
destructor. It happens for you automatically. So you're - [inaudible] you're gonna
see it in the interface as part of the completeness of the class: here's how I
set up, here's how I tear down.
When we start implementing classes we'll have a reason to think more seriously about
what goes in the destructor. But now you will never explicitly call it. Just know that
it automatically gets called for you.
The constructor kinda gets automatically called; the destructor gets automatically called, so
just know that they're there. One
of the things that's - I just want to encourage you not to get too
bogged down in is that there's a lot of syntax to C++. I'm trying to give
you the important parts that are going to matter early on, and we'll see more and
more as we go through.
Don't let it get you too overwhelmed, the feeling of it's
almost but not quite like Java and it's going to make me crazy.
Realize that
there's just a little bit of differences that you kinda got to absorb, and once you
get your head around them actually you will find yourself very able to
express yourself without getting too tripped up by it. But it's just at the beginning I'm sure
it feels like you've got this big list of here's a thousand things that are a
little bit different that -
and it will not be long before it will feel like your native language, so
hang in there with us.
So
I wanted to show you the vector before we get done today and then we'll
have a lot more chance to talk about this on Friday. That the other six
classes that come in [inaudible] class library
are all container classes. So containers are these things like they're buckets
or shells or bags. They hold things for you. You stick things into the
container and then later you can retrieve them.
This turns out to be the most common need in all programs. If you look
at all the things programs do, [inaudible] manipulating information, where are
they putting that information, where are they storing it?
One of the sorts of obvious needs is something that is just kind of a
linear collection. I need to put together the 100 students that are in
this class in a list, well what do I do - what do I use to do that?
There is a built-in kind of raw array, or primitive array in C++. I'm not
even gonna show it to you right now.
The truth is
it's functional, it does kinda what it sets out to do, but it's very weak.
It has constraints on how big it is
and how access to it works. For example, you can make an array that has 10 members
and then you can access the 12th member or the 1,500th member
without any good error reporting from either the compiler or the runtime
system.
That it's designed for kind of to be a professional's tool and it's very efficient,
but it's not very safe.
It doesn't have any convenience attached to it whatsoever. If you have a - you
create a ten-member array and later you decide you need to put 12 things into
it,
then your only recourse is to go create a new 12-member array and copy over
those ten things
and get rid of your old array and make a totally new one, that you can't take the
one you have and just grow it
in the standard language.
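For a sense of the tedium, here is roughly what growing a raw array by hand involves. `growArray` is a hypothetical helper written for this illustration, and it assumes the old array was allocated with `new[]`.

```cpp
#include <cstddef>

// Hypothetical helper: "grows" a heap-allocated raw array by making a new
// one, copying the old contents over, and throwing the old one away.
int* growArray(int* oldArray, std::size_t oldSize, std::size_t newSize) {
    int* newArray = new int[newSize]();  // the () zero-fills the new slots
    for (std::size_t i = 0; i < oldSize && i < newSize; ++i) {
        newArray[i] = oldArray[i];       // copy each existing element
    }
    delete[] oldArray;                   // the caller must stop using oldArray
    return newArray;
}
```

The caller has to remember both sizes, replace its pointer with the returned one, and eventually call `delete[]`, none of which the language checks.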
So we'll come back to see it because it turns out there's some reasons we're gonna need to
know how it works. But for now if you say if I needed to make a list what I want
to use is the vector.
So we have a vector class
in our class library
that just solves this problem of you need to collect up this sequence of
things, a bunch of scores on a test,
a bunch of students who are in a class,
a bunch of names
that are being invited to a party.
And what it does for you is the things that array does but with safety
and convenience built into it.
So it does bounds checking. If you created a vector and you put ten things
into it,
then you can ask for the zero through 9th entries, but you cannot ask
for the 22nd entry, it will raise an error and
it will use that error function, you will get a big red error message, you will not
blunder on unknowingly.
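The class library's vector is not shown here, so as a stand-in this sketch uses the standard library's `std::vector`, whose `at` member does a similar bounds check and throws `std::out_of_range` instead of printing an error message. `isRejected` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true when the checked accessor refuses the given index.
bool isRejected(const std::vector<int>& v, std::size_t index) {
    try {
        (void)v.at(index);   // at() checks the index before reading
        return false;
    } catch (const std::out_of_range&) {
        return true;         // out of bounds: reported, not silently read
    }
}
```

For a vector holding ten entries, indices 0 through 9 are accepted and index 22 is rejected.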
You can add things and insert them and then remove them. So I can go into the array and
say I'd like to put something in slot zero, it will shuffle everything over and make
that space. If I say delete the element that's at zero it will move everything
down. So it just does all this kind of handling of
keeping the integrity of the list
and its ordering maintained
on your behalf.
It also does all the
management of how much storage space is needed. So if I put ten things into
the vector and I put the 11th or the 12th or the - add
100 more,
it knows how to make the space necessary for it.
Behind the scenes it's figuring out where I can get that space and how to take
care of it. It always knows what count it has and what's going on there, but
it's doing this on our behalf in a way that the raw array just does not, that becomes
very tedious and error prone if it's our responsibility to deal with it.
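The same bookkeeping can be seen with the standard library's `std::vector`, used here as a stand-in since the class library's vector and its member names are not shown in this section. `editList` is a hypothetical function for the illustration.

```cpp
#include <vector>

std::vector<int> editList() {
    std::vector<int> scores;
    for (int i = 1; i <= 10; ++i) {
        scores.push_back(i * 10);        // ten entries: 10, 20, ..., 100
    }
    scores.push_back(110);               // an 11th entry; storage grows as needed
    scores.insert(scores.begin(), 5);    // new value at slot 0, the rest shift over
    scores.erase(scores.begin());        // remove slot 0, the rest shift back down
    return scores;                       // 11 entries: 10, 20, ..., 110
}
```

At no point does the caller allocate, copy, or free storage, and `scores.size()` always reports the current count.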
So what the vector is kind of providing is an abstraction. And this is a key word for us in
things that we're going to be talking about this quarter
is that
what you really wanted was a list.
I want a list of students and I want to be able to put it in sorted order or
find this person or print them.
The fact of where the memory came from and how it's keeping track of it is really
a tedious detail that I'd rather not have to deal with. And that's exactly
what the vector's gonna do for you, is make it so
you store things and the storage is somebody else's problem.
You use a list,
you get an abstraction.
How that - there's one little quirk, and this is
not so startling to those of you who have
worked on a recent version of Java,
is in order to make the vector generally useful,
it cannot store just one type of thing.
That you can't make a vector that stores [inaudible] and
service everyone's needs, that it has to be able to hold vectors of doubles
or vectors of strings or vectors of student structures
equally well.
And so the way the vector class is actually supplied is using a
feature in the C++ language called templates where
the vector describes what it's storing using a placeholder. It says, well this is a
vector of something and
when you put these things in they all have to be the same type of thing
and when you get one out you'll get the thing you put in,
but I will not commit to the interface saying it's always an integer,
it's always a double.
It's left open and then the client has to describe what they want when they're
ready to use it.
So this is like the Java generics. When you're using an ArrayList you said, well what
kind of things am I sticking in my array list, and then that way
the compiler can keep track of it for you and help you to use it correctly.
The interface part of this kinda
looks like what
we've seen before. It's a class vector,
it has a constructor and destructor
and it has some operations that
return things like the number of elements; you can find out whether it
has zero elements, you can get the element at index, you can set the element at
index,
you can add, insert and remove
things within there.
The one thing that's a little bit unusual about it is that every time it's
talking about the type of something that's going into the vector or
something that's coming out of the vector,
it uses this elem type
which traces its origin back to this template header up there,
that is the clue to you that the vector
doesn't
commit to I'm storing ints, I'm storing doubles, I'm storing strings, it stores some
generic elem type thing,
which when the client is ready to create a vector, they will have to make
that commitment and say this vector is gonna hold doubles, this vector is
gonna hold ints,
and from that point forward that vector knows that the
getAt on a vector of ints returns something of int type. And then add on a vector of ints
expects a parameter of int type,
which is distinct from a vector of strings or a vector
of doubles. So I'll
show you a little code and we'll have to just really talk about this more deeply on
Friday.
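As a rough picture of the placeholder idea, here is a tiny template class written against the standard library. It is a sketch under assumptions, not the class library's vector: `MiniVector` and the helper functions are hypothetical, and only a few of the operations mentioned above are included.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: ElemType is a placeholder that each client fills in
// when declaring a MiniVector.
template <typename ElemType>
class MiniVector {
public:
    int size() const { return static_cast<int>(elems_.size()); }
    bool isEmpty() const { return elems_.empty(); }

    // add expects a parameter of whatever ElemType the client chose.
    void add(const ElemType& elem) { elems_.push_back(elem); }

    // getAt returns that same ElemType, with a bounds check.
    ElemType getAt(int index) const {
        if (index < 0 || index >= size()) {
            throw std::out_of_range("MiniVector index out of range");
        }
        return elems_[static_cast<std::size_t>(index)];
    }

private:
    std::vector<ElemType> elems_;
};

int sumOfInts() {
    MiniVector<int> nums;               // this vector commits to int
    nums.add(3);
    nums.add(4);
    // nums.add("Ada");                 // would not compile: not an int
    return nums.getAt(0) + nums.getAt(1);
}

std::string firstName() {
    MiniVector<std::string> names;      // this one commits to string
    names.add("Ada");
    return names.getAt(0);
}
```

The commented-out line is the kind of type mix-up the compiler rejects, since `MiniVector<int>` and `MiniVector<std::string>` are distinct types.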
A
little bit of this is syntax for how I make a vector of [inaudible] how I make a vector of
strings, and
then
some of the things that you could try to mix up
that the template will actually
not let you get away with,
mixing those types. So
we'll see this on Friday, so don't worry,
there will be time to look at it
and meanwhile good luck getting your compiler set up.