Table Data (16 mins)

  • 0:00 - 0:05
    In this section, we're gonna look at a new
    form of data, called a table. And once we
  • 0:05 - 0:09
    look at how tables work, then we're gonna
    play around with code that manipulates
  • 0:09 - 0:14
    tables. So it's very similar to the way
    earlier we did images and then looked at
  • 0:14 - 0:19
    the code that manipulates images. The code
    to work with tables will actually in some
  • 0:19 - 0:24
    ways look similar to the code that worked
    on images. So my goal is that the real
  • 0:24 - 0:28
    patterns that make any sorta code work are
    gonna start coming through. So, tables are
  • 0:28 - 0:33
    a really common way to organize data on
    the computer. So as a running example for
  • 0:33 - 0:38
    this section, I'm gonna use the social
    security baby names database. So the
  • 0:38 - 0:42
    social security administration does
    retirement benefits and stuff in the US.
  • 0:42 - 0:47
    But they also happen to track, every year,
    what names are given to babies born in
  • 0:47 - 0:51
    that year in the US. And so that's gonna
    be kind of a fun data set that we're gonna
  • 0:51 - 0:56
    use. So here I've structured this as
    an example of a table. So, as I was
  • 0:56 - 1:00
    saying, a table's a way of storing data.
    It's basically, you can think of it as
  • 1:00 - 1:04
    like a rectangle. So the way the table
    works is that it is first organized into
  • 1:04 - 1:09
    fields. So the baby data is organized into
    four fields and the fields are name, rank,
  • 1:09 - 1:13
    gender and year. Look at the fields
    as basically the columns that make this
  • 1:13 - 1:19
    thing up. And then the data is stored in
    what we'll call rows. So here the first
  • 1:19 - 1:24
    row has the data for the name Jacob, so it
    says the name is Jacob, the rank is one
  • 1:24 - 1:30
    for that name, and what rank one means for
    this data set is that Jacob is the most popular
  • 1:30 - 1:36
    boy name for babies born in 2010. Then we
    have gender boy and year 2010. So the
  • 1:36 - 1:41
    second row has another name. So each name
    has its own row. So in this case it says
  • 1:41 - 1:46
    the name is Isabella, the rank is one. So
    what that means is Isabella was the most
  • 1:46 - 1:52
    popular girl name for babies born in 2010.
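
To make the field and row structure concrete, here is a minimal sketch in plain JavaScript. It assumes a table can be modeled as an array of objects, which is an illustration only and not necessarily how the course stores its data; the four sample rows are the ones named in the lecture, and the variable names `fields`, `table`, and `complete` are made up for this example.

```javascript
// The four field names from the lecture: these are the "columns".
const fields = ["name", "rank", "gender", "year"];

// A few sample rows from the lecture; the real table has 2,000 rows.
const table = [
  { name: "Jacob", rank: 1, gender: "boy", year: 2010 },
  { name: "Isabella", rank: 1, gender: "girl", year: 2010 },
  { name: "Ethan", rank: 2, gender: "boy", year: 2010 },
  { name: "Sophia", rank: 2, gender: "girl", year: 2010 },
];

// Check that every row supplies a value for every field.
const complete = table.every((row) => fields.every((f) => f in row));
console.log(table.length + " rows, " + fields.length + " fields, complete: " + complete);
```
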
    So, then we see, Ethan has rank two for
  • 1:52 - 1:57
    boy names. Sophia has rank two for girls,
    and so on. So the table just has
  • 1:57 - 2:04
    all the names. In this case they're
    shown sorted by rank. So there's one row
  • 2:04 - 2:09
    per name. In this case it has the 1,000
    top boy names and the 1,000 top girl
  • 2:09 - 2:15
    names. So there's 2,000 rows
    overall. So as I was saying, tables are
  • 2:15 - 2:19
    really common for storing all sorts of
    data on the computer. You may have heard
  • 2:19 - 2:24
    the term database. So, a database is a
    related concept to this, sort of simple,
  • 2:24 - 2:29
    basic idea of a table. Generally the way
    this works is that the number of fields, or
  • 2:29 - 2:33
    you can think of them as the categories,
    is not very big.
  • 2:34 - 2:39
    There might be eight or ten fields or
    something. So they represent kinda the
  • 2:39 - 2:43
    fixed categories we wanna keep track of.
    And then the number of rows could be
  • 2:43 - 2:48
    enormous. It might be millions or maybe
    even billions of rows. So I'll just
  • 2:48 - 2:53
    mention a couple examples. So you could
    think of your email inbox as maybe
  • 2:53 - 2:57
    stored in a table on the computer. So the
    way that would work is, well, what would
  • 2:57 - 3:01
    the fields be? The fields might be
    something like from, and to, and date, and
  • 3:01 - 3:06
    subject, and, you know, a few other things
    that you store, per message. And then one
  • 3:06 - 3:11
    row is just one message. So each message
    gets its own row, and then we have this
  • 3:11 - 3:15
    fixed number of fields. So then when you
    go to your inbox, well, there might be
  • 3:15 - 3:19
    10,000 rows in there for all your email,
    and maybe when you go to your inbox it
  • 3:19 - 3:23
    just selects the ten most recent ones and
    shows you, maybe not all the fields, but
  • 3:23 - 3:28
    maybe the most important fields from that
    message. Another example is Craigslist,
  • 3:28 - 3:32
    or, you know, any sorta online auction
    site, where maybe it could be
  • 3:32 - 3:37
    stored in a table where one row is gonna
    be one item for sale. And then the fields
  • 3:37 - 3:41
    would again be sorta the categories that
    you want for one item. So the categories,
  • 3:41 - 3:46
    the fields might be the price, the date it
    was listed, maybe a short description, and
  • 3:46 - 3:51
    a long description, and a few things like
    that. So those are just a couple examples
  • 3:51 - 3:55
    of how many of the things you deal with
    day to day are often, back on the computer,
  • 3:55 - 4:00
    gonna be stored in some kinda
    table. Alright, so to make this real, I
  • 4:00 - 4:06
    wanna look at code to manipulate tables.
    And I'm gonna use the baby name table as
  • 4:06 - 4:12
    sort of our working example for a
    couple sections here. So, in this case,
  • 4:12 - 4:18
    the baby data for 2010 is stored in
    baby-2010.csv. I should just mention, CSV
  • 4:18 - 4:23
    stands for Comma Separated Values. It's a
    standard for storing essentially table
  • 4:23 - 4:28
    data in a text file, and it's a really
    simple, fairly old standard. So it's a
  • 4:28 - 4:34
    pretty, you know, easy way to interchange
    data from one program to another. So in
  • 4:34 - 4:39
    terms of the code, I'll make my analogy to
    images. So for images, we had
  • 4:39 - 4:44
    for (pixel: image), and that would loop through
    all the pixels in the image, and for each
  • 4:44 - 4:49
    pixel it would run whatever code was
    inside the curly braces. So, for the table,
  • 4:49 - 4:54
    it's very similar: we're going to have
    for (row: table), and what that's
  • 4:54 - 4:59
    going to do is it's just going to loop
    through each row in the table. So, it
  • 4:59 - 5:03
    just starts from the top and goes through
    each one. And for each row it's going to
  • 5:03 - 5:08
    run whatever code I put in the curly
    braces. So, here is our first example.
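
The loop-over-rows idea can be sketched in plain JavaScript as follows. This is not the course's own environment: `makeRow` is a hypothetical helper standing in for the course's row objects, the table is a three-row sample rather than the 2,000-row file, and standard JavaScript writes the loop as `for (const row of table)`.

```javascript
// Hypothetical stand-in for a course row object:
// getField(name) returns one field, toString() formats the whole row.
function makeRow(name, rank, gender, year) {
  const data = { name, rank, gender, year };
  return {
    getField: (field) => data[field],
    toString: () => [name, rank, gender, year].join(", "),
  };
}

// Sample rows from the lecture.
const table = [
  makeRow("Jacob", 1, "boy", 2010),
  makeRow("Isabella", 1, "girl", 2010),
  makeRow("Ethan", 2, "boy", 2010),
];

// The loop body runs once per row, from the top of the table down.
const printed = [];
for (const row of table) {
  printed.push(String(row));
  console.log(String(row));
}
```

With three rows in the table, the body runs three times; with the full data set it would run 2,000 times.
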
  • 5:08 - 5:13
    That is the line, very similar to loading
    an image. So that's the line that grabs
  • 5:13 - 5:17
    the table and stores it in a variable,
    which I will inevitably just call the
  • 5:17 - 5:22
    table. And then here I have the for loop,
    sorta looking through all the rows. And in
  • 5:22 - 5:27
    this case, the simplest thing I'm gonna do
    is I'm just gonna say, print row. So, I'm
  • 5:27 - 5:31
    just gonna, essentially, you know,
    print each row in the data. So
  • 5:31 - 5:37
    this is the baby data, so if I run this,
    there is row one and row two and so on. So
  • 5:37 - 5:43
    you can see Jacob, Isabella, Ethan,
    those fairly popular names. It actually
  • 5:43 - 5:49
    made my web page quite tall because of
    course there are two thousand of these
  • 5:49 - 5:56
    things. So you know there's Courtney with
    a K, the 637th most popular girl name. So it runs
  • 5:56 - 6:02
    all the way down here, as I was saying,
    oops, to a thousand. So Acre and an
  • 6:02 - 6:07
    Danea. So that is one thing. I
    guess what this shows, it's sort of a bulk
  • 6:07 - 6:12
    output thing, but what it shows is that
    line ran 2,000 times. Once for each row in
  • 6:12 - 6:17
    the table. So, just as with the image, the
    for loop just went through and looked at
  • 6:17 - 6:22
    each one. Alright so here I'm gonna comment
    this out and run again just to get rid of
  • 6:22 - 6:27
    the output so my webpage
    won't be a mile high here. So what are we
  • 6:27 - 6:32
    gonna do with the table? Just looping
    through and printing each row, that's like
  • 6:32 - 6:37
    [laugh], like for Craigslist or for your
    email. That's never what you want. What we
  • 6:37 - 6:42
    want is to loop through all the rows and
    just pick out the six or two of the 2,000
  • 6:42 - 6:47
    that we want. This is a very common thing to
    do with table [inaudible]. It is sometimes
  • 6:47 - 6:52
    called, in database terminology, a query,
    where I'm going to kind of narrow
  • 6:52 - 6:57
    down to just the rows I want. So, let's
    talk about the code to do that. So
  • 6:57 - 7:03
    [inaudible] we're going to do this with an
    if statement. Put an if statement inside
  • 7:03 - 7:08
    the loop, and in the if test we will write
    a test to select just some of the rows. So
  • 7:08 - 7:13
    here's gonna be my first example. So here
    is the for loop. So that's looping
  • 7:13 - 7:18
    through all the rows. And then inside the
    for loop, I've got this if statement. So
  • 7:18 - 7:22
    what's gonna happen is, this highlighted
    code is gonna run again and again and
  • 7:22 - 7:28
    again, once for each row in the table. And
    so what I've done is I've written a test
  • 7:28 - 7:34
    here, and the goal here, in this
    case, is to just pick out the rows where
  • 7:34 - 7:40
    the rank is six. And so, let me talk about
    how that works. So what's gonna happen is
  • 7:40 - 7:44
    that highlighted test, that test is gonna
    be evaluated once for every row. So in a
  • 7:44 - 7:49
    sense 2000 times. So, what I'm gonna do is
    structure the test so it's true for a row
  • 7:49 - 7:53
    I care about. And then inside of here I'll
    put a print, so it'll print the ones I
  • 7:53 - 7:57
    care about. In all the other rows this
    will be false, and so it
  • 7:57 - 8:01
    won't print those. All right, so how does
    this work? So just as for the pixel we
  • 8:01 - 8:07
    had getRed and getGreen and getBlue, the
    row has getField. And so you could,
  • 8:07 - 8:12
    remember we called it a row because all
    the way across it has a bunch of different
  • 8:12 - 8:16
    fields. So you can say, well, which field
    do you want? The way this works is each
  • 8:16 - 8:22
    field has a name. In this case, the names
    are name, rank, gender and year. So in
  • 8:22 - 8:26
    this case, I say getField. And then,
    within the parentheses, I say in a string
  • 8:26 - 8:30
    which field I want, by name. So in this
    case, I'm like, oh right, I wanna go to
  • 8:30 - 8:35
    the row, and I wanna pick out the rank. So
    this highlighted part goes to the
  • 8:35 - 8:38
    row and picks out the rank. Just as
    before we would have pixel.getRed
  • 8:38 - 8:42
    and that would pull the
    red out of the pixel, so this is
  • 8:42 - 8:47
    analogous but for a table. So now my goal
    here for this example is I wanted to just
  • 8:47 - 8:52
    show the rows where the rank
    [inaudible], which required a new little bit of
  • 8:52 - 8:57
    code. So having picked the rank out here,
    then I say equals equals, which I think
  • 8:57 - 9:02
    we already used before: two equal
    signs next to each other compare two
  • 9:02 - 9:07
    things for equality, testing whether they are
    the same. And so row.getField("rank") == 6:
  • 9:07 - 9:12
    what that says is, get
    the rank out, and test if it's six. And if
  • 9:12 - 9:17
    it's six, we'll say that the test
    is true. And if it's not, we'll say it's
  • 9:17 - 9:22
    false. So, let me just try running this.
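
The loop-plus-if-test query can be sketched in plain JavaScript like this. It is an illustration under stated assumptions: `makeRow` is a hypothetical stand-in for the course's row objects, and because the lecture does not say which names hold rank six, the two rank-6 rows use clearly marked placeholder names.

```javascript
// Hypothetical stand-in for a course row object with a getField method.
function makeRow(name, rank, gender, year) {
  const data = { name, rank, gender, year };
  return { getField: (field) => data[field] };
}

const table = [
  makeRow("Jacob", 1, "boy", 2010),
  makeRow("Isabella", 1, "girl", 2010),
  makeRow("Ethan", 2, "boy", 2010),
  makeRow("PlaceholderBoy", 6, "boy", 2010),   // placeholder name, not real data
  makeRow("PlaceholderGirl", 6, "girl", 2010), // placeholder name, not real data
];

// The if test is evaluated once per row; only rows where it is true are kept.
const matches = [];
for (const row of table) {
  if (row.getField("rank") == 6) {
    matches.push(row.getField("name"));
    console.log(row.getField("name"));
  }
}
```

Two rows pass the test here, one boy name and one girl name, which mirrors the two-row result described in the lecture.
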
    So if I run it, what's happened is, it
  • 9:22 - 9:27
    went through all 2,000 rows. And for these
    two rows, that test was true, because
  • 9:27 - 9:31
    that's the case where the rank was
    six. And obviously, you know, I could say,
  • 9:31 - 9:36
    like, 127 here or whatever. And then
    we would get the two rows for that rank. It just
  • 9:36 - 9:40
    happens that each rank number has one boy name
    and one girl name in the data set. So,
  • 9:40 - 9:46
    that's why I keep getting two rows here.
    So let me try another example. Oh, also I
  • 9:46 - 9:54
    should mention a warning about this. So
    I'll change this back to six, quick. So
  • 9:54 - 9:58
    this use of the two equals for equality is
    a little odd in computer code. I think it
  • 9:58 - 10:03
    would be very reasonable to think, oh,
    what, shouldn't there be just one equal
  • 10:03 - 10:07
    sign? Right? If rank equals six? And
    unfortunately the single equal sign in
  • 10:07 - 10:11
    JavaScript already has been used for
    variable assignment. It's kinda already
  • 10:11 - 10:16
    dedicated to meaning that. And so they
    couldn't use it for equality, so that's why
  • 10:16 - 10:20
    there's this different symbol for
    equality. Now,
  • 10:20 - 10:25
    it's actually a pretty common coding error
    to accidentally type a single
  • 10:25 - 10:30
    equal sign when someone meant two equal
    signs for comparison. In this case, I've
  • 10:30 - 10:35
    outfitted the run button with some special
    checking code, where it notices if, in an
  • 10:35 - 10:40
    if test, it sees a single equal sign, and
    it gives this error message that basically
  • 10:40 - 10:46
    says, hey, did you maybe mean to use
    two equal signs? So, that is an easy error
  • 10:46 - 10:49
    to make, but hit the run button and we'll
    catch it for you. That's something I
  • 10:49 - 10:54
    just did for this class. Alright, so now
    let me do another example.
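
The single-versus-double equal sign warning above can be seen directly in plain JavaScript, which has no checker like the course's run button. This small sketch uses made-up variable names (`rank`, `comparison`, `taken`) and shows that `=` inside an if test quietly assigns instead of comparing.

```javascript
let rank = 3;

// Two equal signs: compares rank with 6 and leaves rank unchanged.
const comparison = (rank == 6);

// One equal sign: stores 6 in rank. The value of the assignment is 6,
// which counts as true in an if test, so the body runs for any starting rank.
let taken = false;
if (rank = 6) {
  taken = true;
}

console.log(comparison, taken, rank);
```

The comparison is false because rank started at 3, yet the second if test still runs its body and rank ends up changed to 6, which is why the mistake is easy to miss.
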
  • 10:54 - 11:00
    So the test I did before I tested if rank
    was six but really any kind of test as we
  • 11:00 - 11:05
    were doing before with images, will work
    here. So in this case what I'm going to do
  • 11:05 - 11:09
    is I want to go through the data set and I
    want to find the data, let's just say, for
  • 11:09 - 11:14
    Alice. So as I mentioned before, for
    getField you can just pass in the name for
  • 11:14 - 11:19
    any field. So, you would need to know what
    the field names are. For this data set
  • 11:19 - 11:24
    they are name, rank, gender, and year.
    So, here I will go to the row and say, hey
  • 11:24 - 11:28
    give me the name field. So I'll say, name
    there. And then I'll equals equals
  • 11:28 - 11:33
    test if the name is the same as Alice.
    So, if I run that, in effect what this
  • 11:33 - 11:37
    does is it just pulls out the Alice row.
    It goes through all the rows, does this
  • 11:37 - 11:41
    test, and if the name is Alice, let's hear
    the English translation of this, then it
  • 11:41 - 11:46
    prints the row out. Alright, so that's the
    basic pattern. So let me just work a few
  • 11:46 - 11:51
    examples for this. So, the pattern is
    gonna be, [inaudible] just as I was doing.
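
As a rough illustration of that pattern applied to the name field, here is a plain JavaScript sketch. `makeRow` and `findByName` are hypothetical helpers invented for this example, the ranks are the ones quoted in the lecture, and the genders are filled in as an assumption.

```javascript
// Hypothetical stand-in for a course row object with a getField method.
function makeRow(name, rank, gender, year) {
  const data = { name, rank, gender, year };
  return { getField: (field) => data[field] };
}

const table = [
  makeRow("Robert", 54, "boy", 2010),
  makeRow("Alice", 172, "girl", 2010),
  makeRow("Abby", 284, "girl", 2010),
];

// Loop over every row and keep the rank of each row whose name matches.
function findByName(rows, wanted) {
  const ranks = [];
  for (const row of rows) {
    if (row.getField("name") == wanted) {
      ranks.push(row.getField("rank"));
    }
  }
  return ranks;
}

const aliceRanks = findByName(table, "Alice");
console.log(aliceRanks);
```

Only the string inside the test changes from one query to the next; the loop and the if statement stay the same.
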
  • 11:51 - 11:56
    We have a for loop, there's an if
    statement inside of it. And then really, all
  • 11:56 - 12:00
    of the action is in the parentheses of the
    test, where I say row.getField something,
  • 12:00 - 12:05
    and I have some test about it. So let's
    try these. So if I run it this way, we
  • 12:05 - 12:10
    pull out, it says, if name is equal, equal
    to Alice, I get the Alice row. If I wanted
  • 12:10 - 12:15
    to look for something else, pull out some
    other data, we could say Robert. So Alice
  • 12:15 - 12:26
    is 172. Robert is 54. Let's try Abby.
    284. So, what's happening is, this
  • 12:26 - 12:31
    highlighted test is happening all 2000
    times. And it's just a question of which
  • 12:31 - 12:37
    rows are we, are we picking out there? I
    did Robert before. I'll show you something
  • 12:37 - 12:44
    kind of funny. If you do Bob and you run,
    nothing appears here. What's going on
  • 12:44 - 12:48
    there is actually no one names their kid
    Bob, apparently. So what's happening is
  • 12:48 - 12:52
    that we are getting no... zero printing is
    happening here. This test was just never
  • 12:52 - 12:56
    true. That's sort of the pattern, I
    guess, for how people name
  • 12:56 - 13:00
    babies: on the form
    they put a long name, like Robert. And
  • 13:00 - 13:05
    then Bob they don't put on the
    form. Maybe that's just what they actually
  • 13:05 - 13:10
    call the kid. Alright, so let me try a
    different test. Let's say I wanna test if
  • 13:10 - 13:15
    the rank is one. So I would change get
    field, and I would type rank here. And
  • 13:15 - 13:21
    then the equals, equals. I can say one,
    sure. So that gives me the two rows, Jacob
  • 13:21 - 13:27
    and Isabella. We saw before, those are rank
    one. So, [inaudible], what was the other
  • 13:27 - 13:32
    one we did? 1,000. So say rank equals a
    thousand. And we get crew ending. So the
  • 13:33 - 13:38
    tests we did earlier with images, like less
    than, less than or equal to, all that stuff
  • 13:38 - 13:44
    works too. So let's say I wanna look at
    if the rank is less than ten. [inaudible]
  • 13:44 - 13:50
    say less than ten, and when I run that, you
    can see I get rank one, rank two, rank
  • 13:50 - 13:54
    three, rank... All these are rank numbers
    where the less than ten test is true.
  • 13:55 - 14:00
    Although you'll notice the last I get is
    Aiden and Chloe, number nine. The rows
  • 14:00 - 14:05
    where rank is ten, I don't get. And that's
    because this form of less than is a strict
  • 14:05 - 14:10
    less than. So it's true for nine but it's
    not true for ten. If you want, there's
  • 14:10 - 14:15
    another form of less than, for when
    you wanna say less than or
  • 14:15 - 14:19
    equal to. And, I don't think we did this
    for the images but it's just, what you do
  • 14:19 - 14:23
    is you put in an equal sign right after
    it. That means less than or equal to. So
  • 14:23 - 14:30
    if I run it now then it goes through ten.
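
The difference between strict less than and less than or equal to can be checked with a small plain JavaScript sketch. `makeRow` and `namesWhere` are hypothetical helpers for this example; Aiden and Chloe at rank nine come from the lecture, while the two rank-10 rows use placeholder names because the lecture does not name them.

```javascript
// Hypothetical stand-in for a course row object with a getField method.
function makeRow(name, rank, gender, year) {
  const data = { name, rank, gender, year };
  return { getField: (field) => data[field] };
}

const table = [
  makeRow("Jacob", 1, "boy", 2010),
  makeRow("Isabella", 1, "girl", 2010),
  makeRow("Aiden", 9, "boy", 2010),
  makeRow("Chloe", 9, "girl", 2010),
  makeRow("PlaceholderBoy", 10, "boy", 2010),   // placeholder name, not real data
  makeRow("PlaceholderGirl", 10, "girl", 2010), // placeholder name, not real data
];

// Collect the names of the rows for which the given test is true.
function namesWhere(rows, test) {
  const names = [];
  for (const row of rows) {
    if (test(row)) {
      names.push(row.getField("name"));
    }
  }
  return names;
}

const strict = namesWhere(table, (row) => row.getField("rank") < 10);
const inclusive = namesWhere(table, (row) => row.getField("rank") <= 10);
console.log(strict.length, inclusive.length);
```

On this sample, the strict test stops at the rank-9 rows, and adding the equal sign brings in the two rank-10 rows as well.
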
    So that works for greater than as
  • 14:30 - 14:35
    well. Alright, so let's try a
    greater than one. So I could say, I would
  • 14:35 - 14:41
    like to see all the rows where the rank is
    greater than 990, let's say. And so
  • 14:41 - 14:45
    I get 991, 92, da, da, da, da, up
    through 1000. Okay, let me just try one
  • 14:45 - 14:50
    more. So [inaudible] examples with name
    and rank. And [inaudible] inevitably, I'm
  • 14:50 - 14:55
    calling row.getField and just changing
    what string is there to pull out a
  • 14:55 - 15:00
    different field. I'll try pulling out the
    gender field. In this case, the way
  • 15:00 - 15:05
    the data's coded, the gender field is
    just strings. So it's
  • 15:05 - 15:09
    either the string boy or the string girl. So
    if I were to say, if gender is equal
  • 15:09 - 15:15
    equal to girl, and hit run, then I get [sound],
    I mean, if you look where I scroll
  • 15:15 - 15:20
    here, what's happened is I have just
    gotten all 1,000 girl rows, and none
  • 15:20 - 15:26
    of the 1000 [inaudible]. Whoops. Alrighty.
    Sorry, let me get this back. So this is
  • 15:26 - 15:31
    just a trick where I comment out
    print, so it prints nothing, and run it
  • 15:31 - 15:36
    again. So then, that way, it just
    blanks out the output here. So, just to
  • 15:37 - 15:41
    repeat what the pattern is: these
    first few lines were always the
  • 15:41 - 15:46
    same. And I guess I was always [inaudible]
    the row. So that was always the same.
  • 15:46 - 15:51
    What I change is the if test. And the gist
    of it, the pattern tended to be I would
  • 15:51 - 15:56
    say row.getField, whatever field I care
    about. And then I would write equals
  • 15:56 - 16:01
    equals or less than or equal to or
    something, let's say on the rank, or equal
  • 16:01 - 16:06
    equal on the name, to in a sense pull out
    the rows. And the rule was, I'm pulling
  • 16:06 - 16:13
    out a row if this test is true. And so,
    with that in mind, well this can be a good
  • 16:13 - 16:15
    source of some exercises.
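
The overall pattern summarized above (a fixed loop, with only the if test changing) can be sketched as one reusable function in plain JavaScript. `makeRow` and `selectRows` are hypothetical names for this illustration, not part of the course's environment, and the sample rows are the ones quoted in the lecture with genders for Alice and Robert filled in as an assumption.

```javascript
// Hypothetical stand-in for a course row object with a getField method.
function makeRow(name, rank, gender, year) {
  const data = { name, rank, gender, year };
  return { getField: (field) => data[field] };
}

const table = [
  makeRow("Jacob", 1, "boy", 2010),
  makeRow("Isabella", 1, "girl", 2010),
  makeRow("Ethan", 2, "boy", 2010),
  makeRow("Sophia", 2, "girl", 2010),
  makeRow("Robert", 54, "boy", 2010),
  makeRow("Alice", 172, "girl", 2010),
];

// The loop and the if statement never change; only the test does.
function selectRows(rows, test) {
  const names = [];
  for (const row of rows) {
    if (test(row)) {
      names.push(row.getField("name"));
    }
  }
  return names;
}

const girls = selectRows(table, (row) => row.getField("gender") == "girl");
const topTwo = selectRows(table, (row) => row.getField("rank") <= 2);
const bobs = selectRows(table, (row) => row.getField("name") == "Bob");
console.log(girls, topTwo, bobs);
```

The query for Bob returns an empty list: the test runs for every row and is simply never true, which matches the lecture's observation that nothing prints.
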
Video Language:
English