-
In this section, we're gonna look at a new
form of data, called a table. And once we
-
look at how tables work, then we're gonna
play around with code that manipulates
-
tables. So it's very similar to the way
earlier we did images and then looked at
-
the code that manipulates images. The code
to work with tables will actually in some
-
ways look similar to the code that worked
on images. So my goal is that the real
-
patterns that make any sorta code work are
gonna start coming through. So, tables are
-
a really common way to organize data on
the computer. So as a running example for
-
this section, I'm gonna use the Social
Security baby names database. So the
-
Social Security Administration does
retirement benefits and stuff in the US.
-
But they also happen to track, every year,
what names are given to babies born in
-
that year in the US. And so that's gonna
be kind of a fun data set that we're gonna
-
use. So here I've structured this as
an example of a table. So, as I was
-
saying, a table is a way of storing data.
It's basically, you can think of it as
-
like a rectangle. So the way the table
works is that it is first organized into
-
fields. So the baby data is organized into
four fields and the fields are name, rank,
-
gender, and year. Think of the fields
as basically the columns that make this
-
thing up. And then the data is stored in
what we'll call rows. So here the first
-
row has the data for the name Jacob, so it
says the name is Jacob, the rank is one
-
for that name, and what rank one means for this
data set is that Jacob is the most popular
-
boy name for babies born in 2010. Then we
have gender boy and year 2010. So the
-
second row has another name. So each name
has its own row. So in this case it says
-
the name is Isabella, the rank is one. So
what that means is Isabella was the most
-
popular girl name for babies born in 2010.
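As a rough sketch of the rectangle idea, the first two rows described so far could be written down in plain JavaScript like this. The variable names `fields` and `rows`, and the choice to store rank and year as numbers, are assumptions for illustration; this is not the course's own table code.

```javascript
// Field names as listed in the lecture: these act as the columns.
const fields = ["name", "rank", "gender", "year"];

// The first two rows described in the lecture; each row has one value per field.
// (Storing rank and year as numbers is an assumption.)
const rows = [
  { name: "Jacob", rank: 1, gender: "boy", year: 2010 },
  { name: "Isabella", rank: 1, gender: "girl", year: 2010 },
];

// Print each row's values in field order, one row per line.
for (const row of rows) {
  console.log(fields.map((field) => row[field]).join(", "));
}
// prints "Jacob, 1, boy, 2010" then "Isabella, 1, girl, 2010"
```

Each object plays the part of one row, and the `fields` array fixes the order of the columns.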
So, then we see, Ethan has rank two for
-
boy names. Sophia has rank two for girls,
and so on. So the table just has
-
all the names. In this case they're
shown sorted by rank. So there's one row
-
per name. In this case it has the 1,000
top boy names and the 1,000 top girl
-
names. So there's 2,000 rows
overall. So as I was saying, tables are
-
really common for storing all sorts of
data on the computer. You may have heard
-
the term database. So, a database is a
related concept to this sort of simple,
-
basic idea of a table. Generally the way
this works is that for the fields, or
-
you can think of them as the categories,
the number of fields is not very big.
-
There might be eight or ten fields or
something. So they represent kinda the
-
fixed categories we wanna keep track of.
And then the number of rows could be
-
enormous. It might be millions or maybe
even billions of rows. So I'll just
-
mention a couple examples. So you could
think of your email inbox as maybe
-
stored in a table on the computer. So the
way that would work is, well, what would
-
the fields be? The fields might be
something like from, and to, and date, and
-
subject, and, you know, a few other things
that you store per message. And then one
-
row is just one message. So each message
gets its own row, and then we have this
-
fixed number of fields. So then when you
go to your inbox, well, there might be
-
10,000 rows in there for all your email,
and maybe when you go to your inbox it
-
just selects the ten most recent ones and
shows you, maybe not all the fields, but
-
maybe the most important fields from each
message. Another example is Craigslist,
-
or, you know, any sorta online auction
site, where it could be
-
stored in a table where one row is gonna
be one item for sale. And then the fields
-
would again be sorta the categories that
you want for one item. So the categories,
-
the fields might be the price, the date it
was listed, maybe a short description, and
-
a long description, and a few things like
that. So those are just a couple examples
-
of how many of the things you deal with
day to day are often, back on the computer,
-
gonna be stored in some kinda
table. Alright, so to make this real, I
-
wanna look at code to manipulate tables.
And I'm gonna use the baby name table as
-
sort of our working example for a
couple sections here. So, in this case,
-
the baby data for 2010 is stored in
baby-2010.csv. I should just mention, CSV
-
stands for Comma Separated Values. It's a
standard for storing essentially table
-
data in a text file, and it's a really
simple, fairly old standard. So it's a
-
pretty, you know, easy way to interchange
data from one program to another. So in
-
terms of the code, I'll make my analogy to
images. So for images, we had
-
for (pixel: image), and that would loop through
all the pixels in the image, and for each
-
pixel it would run whatever code was
inside the curly braces. So, for the table,
-
to be very similar, we're going to have
for (row: table), and what that's
-
going to do is it's just going to loop
through each row of the table. So, it
-
just starts from the top and goes through
each one. And for each row it's going to
-
run whatever code I put in the curly
braces. So, here is our first example.
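The on-screen code is not part of this transcript, so here is only a minimal sketch of the loop-over-rows idea in plain JavaScript. The `for (row: table)` form as spoken in the lecture is not standard JavaScript syntax, so the sketch uses `for...of` instead. The helper `parseTable`, the sample text, and the column order (name, rank, gender, year, with no header line) are assumptions, not the actual contents of baby-2010.csv.

```javascript
// Assumed sample of comma-separated text: one line per row, fields in the
// order name, rank, gender, year. Not the real file contents.
const csvText = [
  "Jacob,1,boy,2010",
  "Isabella,1,girl,2010",
  "Ethan,2,boy,2010",
  "Sophia,2,girl,2010",
].join("\n");

// Hypothetical helper: split the text into rows, and each row into values.
// A plain split on commas is enough here because no value contains a comma.
function parseTable(text) {
  return text.split("\n").map((line) => line.split(","));
}

const table = parseTable(csvText);

// Visit every row from the top down; the body runs once per row.
let rowCount = 0;
for (const row of table) {
  console.log(row.join(", "));
  rowCount += 1;
}
console.log(rowCount + " rows printed");
```

With the full data set the body of the loop would run once for each row in the file, top to bottom.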
-
That is the line, very similar to loading
an image. So that's the line that grabs
-
the table and stores it in a variable,
which I will inevitably just call
-
table. And then here I have the for loop,
sorta looping through all the rows. And in
-
this case, the simplest thing I'm gonna do
is I'm just gonna say, print row. So, I'm
-
just gonna, essentially, you know,
print each row in the data. So
-
this is the baby data, so if I run this,
there is row one and row two and so on. So
-
you can see Jacob, Isabella, Ethan,
those fairly popular names. It actually
-
made my web page quite tall because of
course there are two thousand of these
-
things. So you know, there's Courtney with
a K, the 637th most popular girl name. So it runs
-
all the way down here, as I was saying,
oops, to a thousand. So Acre and an
-
Danea. So that is one thing. I
guess this is sort of a bulk
-
output thing, but what it shows is that
line ran 2,000 times, once for each row in
-
the table. So, just as with the image, the
for loop just went through and looked at
-
each one. Alright, so here I'm gonna comment
this out and run again just to get rid of
-
the output so that my web page
won't be a mile high here. So what are we
-
gonna do with the table? Just looping
through and printing each row, that's,
-
[laugh], like for Craigslist or for your
email, that's never what you want. What we
-
want is to loop through all the rows and
just pick out the six or two of the 2,000
-
that we want. This is a very common thing to
do with table [inaudible]. It is sometimes
-
called, in database terminology, a query,
where I'm going to kind of narrow
-
down to just the rows I want. So, let's
talk about the code to do that. So
-
[inaudible] we're going to do this with an
if statement. Put an if statement inside
-
the loop, and in the if test we will write
a test to select just some of the rows. So
-
here's gonna be my first example. So here
is the for loop. So that's looping
-
through all the rows. And then inside the
for loop, I've got this if statement. So
-
what's gonna happen is, this highlighted
code is gonna run again and again and
-
again, once for each row in the thing. And
so what I've done is I've written a test
-
here, and the goal here, in this
case, is to just pick out the rows where
-
the rank is six. And so, let me talk about
how that works. So what's gonna happen is
-
that highlighted test, that test is gonna
be evaluated once for every row. So in a
-
sense 2000 times. So, what I'm gonna do is
structure the test so it's true for a row
-
I care about. And then inside of here I'll
put a print, so it'll print the ones I
-
care about. For all the other rows this
will be false, and so it
-
won't print those. All right, so how does
this work? So just as for the pixel, we
-
had getRed and getGreen and getBlue, the
row has getField. And so,
-
remember we called it a row because all
the way across it has a bunch of different
-
fields. So you can say, well, which field
do you want? The way this works is each
-
field has a name. In this case, the names
are name, rank, gender, and year. So in
-
this case, I say getField. And then,
within the parentheses, I say in a string
-
which field I want by name. So in this
case, I'm, like, oh right, I wanna go to
-
the row, and I wanna pick out the rank. So
this highlighted part goes to the
-
row and picks out the rank. Just as
before we would have pixel.getRed()
-
and that would pull the
red just out of the pixel, so this is
-
analogous but for a table. So now my goal
here for this example is I wanted to just
-
show the rows where the rank
[inaudible] required a new little bit of
-
code. So having picked the rank out here,
then I say equals, equals, which I think
-
we already used before: two equal
signs next to each other, which compares two
-
things for equality; it tests if they are
the same. And so we have
-
row.getField("rank") == 6. What that says is, get
the rank out, and test if it's six. And if
-
it's six, we'll say that the test
is true. And if it's not, we'll say it's
-
false. So, let me just try running this.
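A minimal sketch of this loop-plus-if pattern in plain JavaScript follows. The `makeRow` helper is a hypothetical stand-in for the course's row type (only the `getField` name comes from the lecture), and the sample holds just the four rows named earlier, so the test checks for rank 2 rather than rank 6.

```javascript
// Hypothetical stand-in for the course's row type: an object whose
// getField(fieldName) returns the value stored under that field name.
function makeRow(name, rank, gender, year) {
  const fields = { name, rank, gender, year };
  return { getField: (fieldName) => fields[fieldName] };
}

// Small sample table using rows named in the lecture.
const table = [
  makeRow("Jacob", 1, "boy", 2010),
  makeRow("Isabella", 1, "girl", 2010),
  makeRow("Ethan", 2, "boy", 2010),
  makeRow("Sophia", 2, "girl", 2010),
];

const selected = [];
for (const row of table) {
  // The test is evaluated once per row; it is true only for rank-2 rows.
  if (row.getField("rank") == 2) {
    selected.push(row.getField("name"));
    console.log(row.getField("name"), row.getField("rank"));
  }
}
// selected is now ["Ethan", "Sophia"]: one boy name and one girl name.
```

Plain JavaScript also has `===`, which compares without converting types; the sketch keeps `==` to match the lecture.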
So if I run it, what's happened is, it
-
went through all 2,000 rows. And for these
two rows, that test was true, because
-
that's the case where the rank was
six. And obviously, you know, I could say,
-
like, 127 here or whatever. And then
we would get two rows. It just
-
happens that each rank number has one boy name
and one girl name in this data set. So,
-
that's why I keep getting two rows here.
So let me try another example. Oh, also I
-
should mention a warning about this. So
I'll change this back to six, quick. So
-
this use of the two equals for equality is
a little odd in computer code. I think it
-
would be very reasonable to think, oh,
what, shouldn't there be just one equal
-
sign? Right? If rank equals six? And
unfortunately the single equal sign in
-
JavaScript already has been used for
variable assignment. It's kinda already
-
dedicated to meaning that. And so they
couldn't use it for equality, so that's why
-
there's this different symbol for
equality. Now, just for this class:
-
it's actually a pretty common error in coding
to sort of accidentally type a single
-
equal sign when someone meant two equal
signs for comparison. So in this case, I've
-
outfitted the run button with some special
checking code, where it notices if, in an
-
if test, it sees a single equal sign, and
it gives this error message that basically
-
says, hey, did you maybe mean to use
two equal signs? So, that is an easy error
-
to make, but hit the run button and we'll
catch it for you. That's something I
-
just did for this class. Alright, so now
let me do another example.
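To make the one-equals versus two-equals warning concrete, here is a small sketch in plain JavaScript, outside the course's run button and its special check. The variable names are made up for illustration.

```javascript
let rank = 3;

// Comparison: asks whether rank is 6 and leaves rank unchanged.
const isSix = (rank == 6);
console.log(isSix, rank); // prints: false 3

// Assignment: a single equal sign stores 6 into the variable. The whole
// expression has the value 6, which an if-test would treat as true.
let otherRank = 3;
const assignedValue = (otherRank = 6);
console.log(assignedValue, otherRank); // prints: 6 6
```

So in plain JavaScript a test written with a single equal sign quietly changes the variable instead of comparing it, which is why the mistake is easy to miss.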
-
So the test I did before tested if rank
was six, but really any kind of test, as we
-
were doing before with images, will work
here. So in this case what I'm going to do
-
is I want to go through the data set and I
want to find the data, let's just say, for
-
Alice. So as I mentioned before, for getField
you can just pass in the name for
-
any field. So, you would need to know what
the field names are. For this data set
-
they are name, rank, gender, and year.
So, here I will go to the row and say, hey,
-
give me the name field. So I'll say "name"
there. And then I'll use equals, equals to
-
test if the name is the same as "Alice".
So, if I run that, in effect what this
-
does is it just pulls out the Alice row.
It goes through all the rows, does this
-
test, and if the name is Alice, that's
the English translation of this, then it
-
prints the row out. Alright, so that's the
basic pattern. So let me just work a few
-
examples for this. So, the pattern is
gonna be, [inaudible] just as I was doing.
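The name test can be sketched the same way in plain JavaScript. As before, `makeRow` is a hypothetical stand-in for the course's row type, and the three sample rows are a tiny subset chosen for illustration.

```javascript
// Hypothetical stand-in for the course's row type.
function makeRow(name, rank, gender, year) {
  const fields = { name, rank, gender, year };
  return { getField: (fieldName) => fields[fieldName] };
}

// Tiny sample table; the real data set has 2,000 rows.
const table = [
  makeRow("Jacob", 1, "boy", 2010),
  makeRow("Isabella", 1, "girl", 2010),
  makeRow("Alice", 172, "girl", 2010),
];

const matches = [];
for (const row of table) {
  // In words: if the name field of this row is the same as "Alice"...
  if (row.getField("name") == "Alice") {
    // ...then keep and print the row.
    matches.push(row);
    console.log(row.getField("name"), row.getField("rank"));
  }
}
// prints "Alice 172"
```

Only the string inside `getField(...)` and the value after `==` change from the rank example; the loop and the if statement stay the same.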
-
We have a for loop, there's an if
statement inside of it. And then really, all
-
of the action is in the parentheses of the
test, where I say row.getField something,
-
and I have some test about it. So let's
try these. So if I run it this way,
-
it says, if name is equal, equal
to Alice, I get the Alice row. If I wanted
-
to look for something else, pull out some
other data, we could say Robert. So Alice
-
is 172. Robert is 54. Let's try Abby:
284. So, what's happening is, this
-
highlighted test is happening all 2,000
times. And it's just a question of which
-
rows we are picking out there. I
did Robert before. I'll show you something
-
kind of funny. If you do Bob and you run,
nothing appears here. What's going on
-
there is actually no one names their kid
Bob, apparently. So what's happening is
-
that we are getting no... zero printing is
happening here. This test was just never
-
true. That's sort of the pattern, I guess,
for how people name
-
babies: on the form
they put a long name, like Robert. And
-
then Bob, they don't put that on the
form. Maybe that's just what they actually
-
call the kid. Alright, so let me try a
different test. Let's say I wanna test if
-
the rank is one. So I would change getField,
and I would type rank here. And
-
then the equals, equals, I can say one,
sure. So that gives me the two rows, Jacob
-
and Isabella. We saw before, those are rank
one. So [inaudible], what was the other
-
one we did? 1,000. So say rank equals a
thousand. And we get crew ending. So the
-
tests we did earlier with images, like less
than, less than or equal to, all that stuff
-
works too. So let's say I wanna look at
if the rank is less than ten. [inaudible]
-
say less than ten, and when I run that, you
can see I get rank one, rank two, rank
-
three, rank... All these are rank numbers
where the less than ten test is true.
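Here is a minimal sketch of the less-than test in plain JavaScript. `makeRow` is again a hypothetical stand-in for the course's row type, and the sample rows are a small subset chosen so that some ranks fall below ten and some do not.

```javascript
// Hypothetical stand-in for the course's row type.
function makeRow(name, rank, gender, year) {
  const fields = { name, rank, gender, year };
  return { getField: (fieldName) => fields[fieldName] };
}

// Small sample table: four low ranks and two higher ones.
const table = [
  makeRow("Jacob", 1, "boy", 2010),
  makeRow("Isabella", 1, "girl", 2010),
  makeRow("Ethan", 2, "boy", 2010),
  makeRow("Sophia", 2, "girl", 2010),
  makeRow("Robert", 54, "boy", 2010),
  makeRow("Alice", 172, "girl", 2010),
];

const lowRankNames = [];
for (const row of table) {
  // True only when the rank is strictly below 10.
  if (row.getField("rank") < 10) {
    lowRankNames.push(row.getField("name"));
  }
}
console.log(lowRankNames.join(", "));
// prints "Jacob, Isabella, Ethan, Sophia"
```

Robert and Alice are skipped because the test is false for ranks 54 and 172.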
-
Although you'll notice the last ones I get are
Aiden and Chloe, number nine. The rows
-
where rank is ten, I don't get. And that's
because this form of less than is a strict
-
less than. So it's true for nine but it's
not true for ten. If you want, there's
-
another form of less than
where you wanna say less than or
-
equal to. And, I don't think we did this
for the images, but what you do
-
is you put in an equal sign right after
it. That means less than or equal to. So
-
if I run it now then it goes through ten.
And that works for greater than as
-
well. Alright, so let's try a
greater than one. So I could say, I would
-
like to see all the rows where the rank is
greater than 990, let's say. And
-
so I get 991, 92, da, da, da, da, up
through 1000. Okay, let me just try one
-
more. So [inaudible] examples with name
and rank. And [inaudible] inevitably, I'm
-
calling row.getField, and just changing
what string is there to pull out a
-
different field. I'll try pulling out
the gender field. In this case, the way
-
the data's coded, the gender field is
just strings. So it's
-
either the string boy or the string girl. So
if I were to say, if gender is equal,
-
equal to girl, and hit run, then I get [sound],
I mean, if you look while I scroll
-
here, what's happened is I have just
gotten all 1,000 girl rows, and none
-
of the 1,000 [inaudible]. Whoops. Alrighty.
Sorry, let me get this back. So this is
-
just a trick where I comment out
print, so it prints nothing, and run it
-
again. So then, that way, it just
blanks out the output here. So, just to
-
repeat what the pattern is. So,
these first few lines were always the
-
same. And I guess I was always [inaudible]
the row. So that was always the same.
-
What I change is the if test. And the gist
of it, the pattern, tended to be I would
-
say row.getField, whatever field I care
about. And then I would write equals,
-
equals or less than or equal to or
something, let's say on the rank, or equal,
-
equal on the name, to, in a sense, pull out
the rows. And the rule was, I'm pulling
-
out a row if this test is true. And so,
with that in mind, this can be a good
-
source of some exercises.
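As a recap, the whole pattern can be sketched once in plain JavaScript and reused with different tests. Passing the test in as a function is a convenience of this sketch, not something the lecture does. The helpers `makeRow` and `selectNames` are hypothetical. The sample rows use names and ranks mentioned in the lecture, with the gender values for Robert, Alice, and Abby filled in by assumption.

```javascript
// Hypothetical stand-in for the course's row type.
function makeRow(name, rank, gender, year) {
  const fields = { name, rank, gender, year };
  return { getField: (fieldName) => fields[fieldName] };
}

const table = [
  makeRow("Jacob", 1, "boy", 2010),
  makeRow("Isabella", 1, "girl", 2010),
  makeRow("Aiden", 9, "boy", 2010),
  makeRow("Chloe", 9, "girl", 2010),
  makeRow("Robert", 54, "boy", 2010),
  makeRow("Alice", 172, "girl", 2010),
  makeRow("Abby", 284, "girl", 2010),
];

// The fixed part of the pattern: loop over every row, run the test,
// and keep the row's name only when the test is true.
function selectNames(rows, test) {
  const names = [];
  for (const row of rows) {
    if (test(row)) {
      names.push(row.getField("name"));
    }
  }
  return names;
}

// The part that changes from example to example: the test itself.
const strictlyBelowNine = selectNames(table, (row) => row.getField("rank") < 9);
const nineOrLess = selectNames(table, (row) => row.getField("rank") <= 9);
const aboveHundred = selectNames(table, (row) => row.getField("rank") > 100);
const girls = selectNames(table, (row) => row.getField("gender") == "girl");
const bobs = selectNames(table, (row) => row.getField("name") == "Bob");

console.log(strictlyBelowNine); // Jacob and Isabella only: 9 is not < 9
console.log(nineOrLess);        // adds Aiden and Chloe: 9 <= 9 is true
console.log(aboveHundred);      // Alice and Abby
console.log(girls);             // Isabella, Chloe, Alice, Abby
console.log(bobs);              // empty: the test was never true
```

Comparing the first two results shows the difference between strict less than and less than or equal to, and the last result is the Bob case, where nothing is selected.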