Python for Informatics - Chapter 6 - Strings

  • 0:00 - 0:03
    Hello, and welcome to Chapter Six.
  • 0:03 - 0:05
    This chapter we're going to
    talk about strings, and
  • 0:05 - 0:09
    stuff is going to start to get real now.
  • 0:09 - 0:13
    So, as always, this material, this video,
    these
  • 0:13 - 0:16
    slides and book are copyright Creative
    Commons Attribution.
  • 0:16 - 0:17
    I want you to use these materials.
  • 0:17 - 0:19
    I want you to, somebody else, I want to
  • 0:19 - 0:22
    make more teachers, so everyone can teach
    this stuff.
  • 0:22 - 0:23
    Use it however you like.
  • 0:24 - 0:25
    Okay, so we've been playing with
  • 0:25 - 0:27
    strings from the beginning.
  • 0:27 - 0:28
    I mean, literally, if we didn't work
  • 0:28 - 0:31
    with strings, we could've never printed
    Hello World.
  • 0:31 - 0:36
    And, and lord knows, we need to print
    Hello World in a programming language.
  • 0:36 - 0:40
    And so, we've been using them, especially
    constants.
  • 0:40 - 0:42
    Now, in this chapter, we're going to dig in.
  • 0:42 - 0:47
    So, oops, so a string is a sequence of
    characters.
  • 0:47 - 0:50
    You can use either single quotes or
    double quotes in Python
  • 0:50 - 0:51
    to delimit a string.
  • 0:51 - 0:55
    And so here's two string constants, Hello
    and there,
  • 0:55 - 0:58
    and stuck into the variables str1
    and str2.
  • 0:58 - 1:01
    We can concatenate them together
    with a plus sign.
  • 1:01 - 1:03
    Python is smart enough to look and say,
  • 1:03 - 1:06
    oh, those are strings, I know what to
    do with those.
  • 1:06 - 1:10
    And you'll notice that the plus doesn't
    add any space here, because when
  • 1:10 - 1:14
    we print bob out, Hello and there are right
    next to one another.
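
A minimal sketch of the concatenation just described. It is written in Python 3 syntax, so `print` is a function, whereas the lecture's slides use the older Python 2 style. The names `str1`, `str2`, and `bob` follow the spoken description.

```python
str1 = "Hello"      # double quotes delimit a string
str2 = 'there'      # single quotes work the same way
bob = str1 + str2   # + concatenates and does not insert a space
print(bob)          # Hellothere
```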
  • 1:14 - 1:17
    If, for example, we've done some
    conversions,
  • 1:17 - 1:19
    so when we were, like, reading pay,
  • 1:19 - 1:21
    and rate, and hours, and stuff,
    we've done some conversions.
  • 1:21 - 1:23
    So this is an example of the,
    a string 1 2 3
  • 1:23 - 1:27
    Not 123, but the string, quote 1 2 3
    quote.
  • 1:27 - 1:29
    And we can't add 1 to this, we get
  • 1:29 - 1:33
    a traceback, kind of, at this point, as we
    expected.
  • 1:33 - 1:37
    And we would convert that to an integer
    using the int function that's built in.
  • 1:37 - 1:40
    See how much Python you already know?
    I mean, this is awesome, right?
  • 1:40 - 1:41
    I can just say,
  • 1:41 - 1:43
    oh, you call the int function,
    and you know what that is.
  • 1:43 - 1:46
    That's, sorry, sorry, I'm just
    awesomed out.
  • 1:46 - 1:51
    So you convert this to an integer, and
    then you add 1 to it, and then we get 124.
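
The conversion step might look like the sketch below (Python 3 syntax). The variable name `str3` is an assumption, and the `try`/`except` is there only so the example keeps running after the error the lecture describes.

```python
str3 = '123'            # the string "123", not the number 123
try:
    str3 = str3 + 1     # adding an integer to a string fails
except TypeError as err:
    print('TypeError:', err)

x = int(str3) + 1       # convert to an integer first, then add
print(x)                # 124
```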
  • 1:51 - 1:52
    So, there you go.
  • 1:52 - 1:55
    We've been doing strings all along, had to.
  • 1:55 - 1:57
    I mean, literally, strings and numeric data
  • 1:57 - 2:00
    are the two things that programs deal with.
  • 2:00 - 2:03
    So, we've been reading and converting.
  • 2:03 - 2:05
    Again, this is sort of a pattern from some
    of the earlier programs
  • 2:05 - 2:09
    where we do a raw input, you know?
  • 2:09 - 2:11
    And the raw input just takes a string and
    puts it in a variable.
  • 2:11 - 2:15
    So if I take Chuck, then the
    variable contains the string C-h-u-c-k.
  • 2:16 - 2:19
    Even if we type numbers, that is a string.
  • 2:19 - 2:24
    We can't, just because I put 1 0 0 in,
    I still can't subtract 10.
  • 2:24 - 2:28
    We get a happy little traceback, oh, happy
    little, sad-faced traceback.
  • 2:28 - 2:31
    And, and, but of course, if we convert it
  • 2:31 - 2:34
    into float or something like that.
  • 2:35 - 2:39
    We convert int or float, we can do that
    and subtract 10, and we can do that.
  • 2:39 - 2:42
    So, so we've been doing this for a while.
  • 2:42 - 2:45
    We've been doing strings and manipulating
    strings and converting strings all along.
  • 2:45 - 2:49
    So the thing we're going to start doing
    now is we're going to dive into strings.
  • 2:49 - 2:53
    We realize that strings are addressable at
    a character-by-character basis,
  • 2:53 - 2:56
    and we can do all kinds of cool
    things with that.
  • 2:56 - 3:00
    And so, a string is a sequence of
    characters, and we
  • 3:00 - 3:04
    can look inside them using what we call
    the index operator,
  • 3:04 - 3:07
    the square brackets. And we've seen
    square brackets in
  • 3:07 - 3:08
    lists, and you'll see that there's sort of
  • 3:08 - 3:12
    similarities between lists of numbers,
    and, in effect, a
  • 3:12 - 3:14
    string is a special kind of list of
    characters.
  • 3:14 - 3:17
    So if we take this string banana,
  • 3:17 - 3:21
    the string banana starts, the first
    character starts at 0.
  • 3:21 - 3:25
    So, we call this operator sub, so
    letter equals
  • 3:25 - 3:28
    fruit sub 1 and that is the second
    character.
  • 3:28 - 3:31
    Now this may seem a little weird that the
    first character
  • 3:31 - 3:34
    is a 0 and the second character is a 1.
  • 3:34 - 3:38
    It actually is kind of similar to the old
    elevator thing, where in Europe
  • 3:38 - 3:41
    the first floor is zero, then
    negative one,
  • 3:41 - 3:44
    and the second floor is one, right?
  • 3:44 - 3:46
    It's kind of the same thing.
    Actually, it turns out that
  • 3:46 - 3:50
    internally zero was a better way
    to start than one.
  • 3:50 - 3:54
    It, you'll get used to it and then after
    a while there's
  • 3:54 - 3:59
    some little cool advantages to it, but for
    now, beginning is zero.
  • 3:59 - 4:02
    Just, beginning is zero, it is the rule,
    just remember it.
  • 4:03 - 4:09
    Okay, so the 0 is b, the 1 is a, the 2 is
    n, et cetera, et cetera.
  • 4:09 - 4:11
    And we call this syntax
  • 4:11 - 4:13
    fruit sub 1, okay?
  • 4:13 - 4:17
    So that is the sub 1 character of fruit,
    and then that is an a.
  • 4:17 - 4:21
    So that fruit sub 1 says, look up in
    banana, find the 1 position,
  • 4:21 - 4:26
    and give me what's in that 1
    position, that's what's the sub.
  • 4:26 - 4:30
    And what's inside these brackets can be
    an expression.
  • 4:30 - 4:34
    So if we set n to 3, n minus 1, well
    that'll compute to 2.
  • 4:34 - 4:37
    And then fruit sub 2 is the letter n,
  • 4:37 - 4:40
    right? So that's fruit sub 2, okay?
  • 4:40 - 4:42
    It's the third character, fruit sub 2.
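
As a small illustration of "fruit sub 1" and of putting an expression inside the brackets, here is a sketch in Python 3 syntax. The name `w` for the second result is an assumption.

```python
fruit = 'banana'
letter = fruit[1]    # index 1 is the second character
print(letter)        # a

n = 3
w = fruit[n - 1]     # the expression n - 1 evaluates to 2
print(w)             # n
```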
  • 4:42 - 4:47
    So the index starts at 0, the, we read the
    brackets as sub, fruit sub 1,
  • 4:47 - 4:53
    fruit sub 2. Now, Python will
  • 4:53 - 4:58
    complain to you if you use this sub
    operator too far down the string.
  • 4:58 - 5:01
    Here is a string with 3 characters, which
    are 0, 1, and 2.
  • 5:01 - 5:05
    And if we go to sub 5, it blows up.
  • 5:05 - 5:10
    Now, you know, by now I hope that you're
    not freaking out about traceback errors.
  • 5:10 - 5:14
    Remember, traceback errors are just Python
    trying to inform you.
  • 5:14 - 5:19
    And if we just stop looking at that as
    mean Python face, and
  • 5:19 - 5:24
    instead look at that as, oh, index error,
    string index out of range.
  • 5:24 - 5:27
    Oh yeah, I stuck a five in there and
    there's only three, oh,
  • 5:27 - 5:31
    my bad, thank you, Python, appreciate it,
    thanks for the help.
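
The out-of-range lookup can be reproduced with a sketch like this one (Python 3 syntax). The variable name `zot` and the three-character value are assumptions, and the error is caught here only so its message can be printed.

```python
zot = 'abc'              # three characters: positions 0, 1, and 2
try:
    print(zot[5])        # position 5 does not exist
except IndexError as err:
    message = str(err)
    print('IndexError:', message)   # IndexError: string index out of range
```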
  • 5:31 - 5:35
    So, think of this as like, it's not a
    smiley face
  • 5:35 - 5:39
    but it's kind of like a, a quizzical face,
    right, it's like [SOUND].
  • 5:39 - 5:40
    I don't know.
  • 5:40 - 5:43
    Python's confused and it's trying to tell
    you what it's confused about, okay?
  • 5:43 - 5:47
    So don't look at these as sad faces.
    Python doesn't hate you, Python loves you.
  • 5:48 - 5:52
    And loves me too.
    So, strings have individual
  • 5:52 - 5:54
    characters that we can address with the
    index operator.
  • 5:54 - 5:56
    They also have length.
  • 5:56 - 6:00
    And there is a built-in function called
    len, that we can call and pass in
  • 6:00 - 6:04
    as a parameter the variable or a
    constant,
  • 6:04 - 6:06
    and it will tell us how many characters.
  • 6:06 - 6:10
    Now this banana has six characters in it
    that are 0 through 5.
  • 6:10 - 6:13
    So don't get a little confused, the last
    character is
  • 6:13 - 6:16
    the fifth, is sub 5, but it's also the
    sixth character.
  • 6:16 - 6:17
    So the length is really the length, it's
  • 6:17 - 6:22
    not length minus 1, okay?
    So len is like a built-in function.
  • 6:22 - 6:24
    It's not a function we have to write,
  • 6:24 - 6:27
    as we talked about in the functions
    chapter.
  • 6:27 - 6:29
    There are things that are part of Python
    that are just sitting there.
  • 6:29 - 6:31
    And so we are passing banana, the
    variable
  • 6:31 - 6:35
    fruit, into function, we're passing it
    into function.
  • 6:35 - 6:37
    And then, into the len function.
  • 6:37 - 6:42
    Then len [SOUND] does magic stuff.
    And then out comes the answer.
  • 6:42 - 6:48
    And that 6 replaces this and then the 6 goes
    into the variable x, and so x is 6.
  • 6:48 - 6:51
    I sure made that a messy looking slide.
  • 6:51 - 6:55
    And so, think of inside the len function,
    there's a def.
  • 6:55 - 7:00
    len takes a parameter, does some loopy
    things, and it does its thing.
  • 7:00 - 7:02
    So, it's a function that we might write
    except we don't
  • 7:02 - 7:07
    have to because it's already written and
    built in to Python.
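
To make the "think of a def inside len" picture concrete, here is a hedged sketch in Python 3 syntax. `my_len` is a hypothetical stand-in written for illustration only; the real built-in is already provided by Python and does not need to loop like this.

```python
def my_len(text):
    """Hypothetical stand-in for len: count characters with a loop."""
    count = 0
    for _ in text:
        count = count + 1
    return count

fruit = 'banana'
x = len(fruit)           # the built-in function
print(x)                 # 6
print(my_len(fruit))     # 6, same answer from the hand-written version
```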
  • 7:07 - 7:10
    Okay. So that's the length of the
  • 7:10 - 7:12
    string, that's getting at individual
    characters.
  • 7:12 - 7:16
    We can also loop through strings.
  • 7:16 - 7:19
    Obviously, if we can use the index
    operator, and we
  • 7:19 - 7:22
    can put a variable in there, we can
    write a loop.
  • 7:22 - 7:24
    This is an indefinite loop.
  • 7:24 - 7:27
    So we have this variable fruit, has the
    string banana in it.
  • 7:27 - 7:30
    We set the variable index to 0.
  • 7:30 - 7:33
    We make a little while loop.
    And we ask,
  • 7:33 - 7:35
    as long as index is less than the length
    of fruit.
  • 7:35 - 7:38
    Now remember, the length of fruit is
    going to be 6.
  • 7:38 - 7:40
    But we don't want to make that less than
    or equal to
  • 7:40 - 7:44
    because then we would crash, because
    the last character is 5.
  • 7:44 - 7:46
    We can say letter is equal to fruit sub
    index, so that's going to
  • 7:46 - 7:50
    start out being index of, is going to be
    0, so that's fruit sub 0.
  • 7:50 - 7:53
    Then we print index and letter, so that
    means the
  • 7:53 - 7:56
    first time through the loop we're
    going to see 0 b.
  • 7:56 - 7:58
    Then we increment our
  • 7:58 - 8:04
    iteration variable, and go up.
    And then we come out with 1 a.
  • 8:04 - 8:14
    And index advances until index is 6, but
    has printed out each of the letters.
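
The indefinite loop walked through above could be written roughly as follows (Python 3 syntax; the variable names follow the spoken description).

```python
fruit = 'banana'
index = 0
while index < len(fruit):    # strictly less than: the last valid index is 5
    letter = fruit[index]
    print(index, letter)     # 0 b, then 1 a, and so on
    index = index + 1        # advance the iteration variable
```

When the loop finishes, `index` is 6, which is why `<=` would have run one step too far.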
  • 8:14 - 8:16
    Now, we're not doing this just to
  • 8:16 - 8:19
    print them out, we will do something
    a little more valuable,
  • 8:22 - 8:23
    valuable inside that loop.
  • 8:23 - 8:29
    But this gives the sense that we can work
    through a loop just like we, we,
  • 8:29 - 8:36
    we can work through a string just like
    we work through a list of numbers, okay?
  • 8:36 - 8:39
    Now, that was how you do it with an
    indefinite loop.
  • 8:39 - 8:43
    In a definite loop, it's just far more
    awesome, okay?
  • 8:43 - 8:45
    Just like we did with the list of numbers,
  • 8:46 - 8:49
    Python understands strings and allows us
    to write
  • 8:49 - 8:53
    for loops, using for and in, that go through
    the strings.
  • 8:53 - 8:57
    So basically, for letter in fruit, now
    remember, I'm using letter as a
  • 8:57 - 9:01
    mnemonic variable here, it's just a
    choice, a wise choice of a variable name.
  • 9:01 - 9:06
    So that says, run this little block of
    text once for
  • 9:06 - 9:08
    each letter in the variable fruit, which
    means that letter's going to
  • 9:08 - 9:14
    take on the successive b-a-n-a-n-a.
  • 9:14 - 9:16
    When I look at that I always worry that I
    misspelled it.
  • 9:16 - 9:19
    I think I got these right.
  • 9:19 - 9:22
    If I rewrite this book, I'm not going to
    use banana as the example because I'm
  • 9:22 - 9:25
    terrified that I misspelled banana,
    because I don't
  • 9:25 - 9:27
    know how many n's banana has in it.
  • 9:27 - 9:32
    But, be that as it may, we are
    abstracting, we are letting Python say,
  • 9:32 - 9:36
    run this little block of text once, in
    order, for each of the letters in
  • 9:36 - 9:41
    the variable fruit, which is b-a-n-a, and
    so it prints out each of the letters.
  • 9:41 - 9:46
    So this is a much prettier version of the,
    the looping,
  • 9:46 - 9:51
    so using the definite, the for keyword
    instead of the while keyword.
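
For comparison, the definite-loop version might look like this sketch (Python 3 syntax). The list `seen` is an addition for illustration, used only to record what the loop visited.

```python
fruit = 'banana'
seen = []
for letter in fruit:     # letter takes each character in order
    print(letter)
    seen.append(letter)
```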
  • 9:51 - 9:54
    And so, we can just kind of compare these
    two things.
  • 9:54 - 9:56
    They kind of do the exact same thing.
  • 9:56 - 9:58
    And it also is kind of a, gives you a
  • 9:58 - 10:01
    sense of what the for is doing for us,
    right?
  • 10:01 - 10:02
    The for is
  • 10:02 - 10:05
    setting up this index, the for is
    looking up
  • 10:05 - 10:08
    inside of fruit, and the for is advancing
    the index.
  • 10:08 - 10:10
    So the for's doing a bunch of work for us
  • 10:10 - 10:12
    and I've characterized that, sort of, in
    the previous lecture.
  • 10:12 - 10:15
    How the for is sort of doing a bunch of
    things for us
  • 10:15 - 10:20
    and that's, it allows our code to
    be more
  • 10:20 - 10:22
    expressive and, and instead of, so this
    is, a lot of
  • 10:22 - 10:26
    this is just kind of bookkeeping crap that
    we don't really need.
  • 10:26 - 10:30
    And so the for loop helps us by doing some
    of the bookkeeping for us.
  • 10:32 - 10:35
    Okay, so we can do all those loops again.
  • 10:35 - 10:39
    We can find the largest letter, the
    smallest letter, the, how many times.
  • 10:39 - 10:45
    So, I think, what, how many n's are in
    this, or how many a's are in this.
  • 10:45 - 10:50
    So this is a simple counting pattern and,
    and a looking pattern.
  • 10:50 - 10:53
    And so, our word is banana, our count is 0.
  • 10:53 - 10:55
    For the letter in word, again, boop, boop,
  • 10:55 - 10:57
    boop, boop, boop, that comes out like that.
  • 10:57 - 11:01
    So it's going to run this little block.
    If the letter is a, add 1 to the count.
  • 11:02 - 11:08
    So this is going to basically print out at
    the end the number of a's in banana.
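
The counting pattern described here can be sketched as follows (Python 3 syntax), using the names `word`, `count`, and `letter` from the spoken description.

```python
word = 'banana'
count = 0
for letter in word:
    if letter == 'a':        # look for the letter a
        count = count + 1    # and count it
print(count)                 # 3
```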
  • 11:08 - 11:10
    It would probably be more useful, for me,
    to print out the number
  • 11:10 - 11:14
    of n's in banana, because I never know how
    many n's are in banana.
  • 11:14 - 11:15
    But it looks like there's supposed to be two,
  • 11:15 - 11:17
    or otherwise I have a typo on this slide.
  • 11:19 - 11:21
    So the in, again, I, I love the in.
  • 11:21 - 11:22
    I just absolutely
  • 11:22 - 11:25
    love this in.
    I love this syntax.
  • 11:25 - 11:31
    This for each letter in the word banana.
    Just, to me, it reads very smoothly.
  • 11:31 - 11:33
    Cognitively, it fits in my mind what's
    going on.
  • 11:33 - 11:37
    For each letter in banana, run this little
    indented block of text.
  • 11:37 - 11:43
    Again, very pretty, I love in, it's one of
    my favorite little pieces of Python.
  • 11:46 - 11:49
    So, again, with the for, it takes care of
  • 11:49 - 11:52
    all the iteration variables for us, and it
    goes through the sequence.
  • 11:52 - 11:55
    And so here's, here's an animation, right?
  • 11:55 - 11:58
    Remember that the for is going to do all
    this work for us, right?
  • 11:58 - 12:01
    Letter is going to advance through the
  • 12:01 - 12:05
    successive values, the successive letters
    in banana.
  • 12:05 - 12:12
    So letter is being moved for us by the for
    statement, okay?
  • 12:12 - 12:15
    So that's looping through.
  • 12:15 - 12:17
    Now it turns out there's a lot of
    common things that
  • 12:17 - 12:19
    we want to do that are already built into
    Python for us.
  • 12:20 - 12:24
    Clear the screen there.
    We call these slicing.
  • 12:24 - 12:29
    So the index operator looks up various
    things in a string, but we
  • 12:29 - 12:33
    can also pull substrings out, using the
    colon in addition to the square brackets.
  • 12:33 - 12:35
    Again, this is called slicing.
  • 12:36 - 12:37
    So the
  • 12:37 - 12:43
    colon operator, basically, takes a
    starting position, and then an ending
  • 12:43 - 12:48
    position, but the ending position is up to
    but not including the second one.
  • 12:48 - 12:52
    So this is, it's up to but not including,
    up to but not including.
  • 12:52 - 12:54
    Just like the zero, you get used to it
    pretty quick,
  • 12:54 - 12:56
    but the first time you see it, it's a
    little bit
  • 12:58 - 12:59
    wonky.
  • 12:59 - 13:03
    So, if we're going 0 through 4, that's how
    I read this print, s sub 0
  • 13:03 - 13:09
    through 4, or, or better, better said,
    s 0, up to but not including 4.
  • 13:09 - 13:14
    That is, print me out the chunk that is up
    to, but not including, 4.
  • 13:14 - 13:19
    So, it doesn't include 4, and so out comes
    Mont, right?
  • 13:20 - 13:23
    So the next one is 6 up to but not
    including 7, so it starts at 6,
  • 13:23 - 13:30
    up to but not including 7, so
    out comes the P.
  • 13:30 - 13:32
    And, even though you might expect that it
  • 13:32 - 13:36
    would traceback on this, Python is a
    little forgiving.
  • 13:36 - 13:37
    So here's a moment where Python is a
    little
  • 13:37 - 13:40
    forgiving, saying, you know, I'll give you
    a break here.
  • 13:40 - 13:43
    If you go 6, but up to, but not including 20,
  • 13:43 - 13:46
    I'll just stop at the end of the string.
  • 13:46 - 13:49
    So it's 6 to the end, so it, it, you can
    over-reference here and
  • 13:49 - 13:52
    you can not, you won't get yourself in
    trouble.
  • 13:52 - 13:53
    So that comes out, Python.
  • 13:53 - 13:58
    So, again, the second number is
    up to but not including,
  • 13:58 - 14:00
    and that's the, kind of the
    weird thing there.
  • 14:00 - 14:02
    Of course once you remember that
    the first character
  • 14:02 - 14:05
    is 0, 0 up through but not including.
    Nice.
  • 14:09 - 14:12
    If we leave off the first or the last
    number, leaving off the first number, the
  • 14:12 - 14:17
    last number and both of them, they mean
    the beginning and end of the string,
  • 14:17 - 14:24
    respectively.
    And so, up to but not including 2 is M-o.
  • 14:24 - 14:31
    8 colon means starting at 8 to the end of
    the string.
  • 14:31 - 14:34
    So that's, thon.
    And then, that means
  • 14:34 - 14:37
    the beginning to the end, and so it's
    just the whole string, Monty Python.
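
Leaving off one or both numbers might look like this (Python 3 syntax; the variable names on the left are assumptions).

```python
s = 'Monty Python'
start = s[:2]      # from the beginning up to but not including 2
tail = s[8:]       # from position 8 to the end
whole = s[:]       # beginning to end
print(start)       # Mo
print(tail)        # thon
print(whole)       # Monty Python
```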
  • 14:38 - 14:40
    Now we've already played with string
  • 14:40 - 14:43
    concatenation, just a thing to
    emphasize here is,
  • 14:43 - 14:49
    the concatenation operator does not
    add a space, does not add a space.
  • 14:49 - 14:52
    If you want a space, you explicitly add it.
  • 14:52 - 14:56
    So here there's no space in between the o
    and the t, but here
  • 14:56 - 15:00
    there is a space between the o and the t
    because we explicitly added it.
  • 15:00 - 15:02
    So you can concatenate more than one
    thing.
  • 15:02 - 15:05
    And you add your spaces as you want,
    okay?
  • 15:08 - 15:10
    Another thing you can do is you can ask
    questions about a string.
  • 15:10 - 15:15
    Sort of like doing a string lookup, using
    the in operator.
  • 15:15 - 15:18
    This is a little different than how we use
    it inside of a for loop.
  • 15:18 - 15:21
    This is a logical operation asking a
    question
  • 15:21 - 15:23
    like less than or greater than or
    whatever.
  • 15:23 - 15:25
    So, here's an expression.
  • 15:25 - 15:29
    So fruit is banana, as always.
    Is n in fruit?
  • 15:30 - 15:33
    And the answer is yes it is, True.
    So this
  • 15:33 - 15:35
    is a logical operation.
    It's a question.
  • 15:35 - 15:37
    It's a true or false.
  • 15:37 - 15:40
    Is m in fruit?
    No, False.
  • 15:40 - 15:42
    And you can, this can be a string, not
    just a single character.
  • 15:42 - 15:45
    Is n-a-n in fruit?
    The answer is True.
  • 15:45 - 15:50
    And you can put, sort of, you know, if,
    parts of ifs, et cetera, et cetera.
  • 15:50 - 15:54
    So, this is a logical expression that can
    be on an if,
  • 15:54 - 15:57
    you can have a while, et cetera, et
    cetera, et cetera.
  • 15:57 - 15:58
    So it's a logical,
  • 15:58 - 16:01
    logical expression and it returns
    True or False.
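
A sketch of `in` used as a logical operator (Python 3 syntax). The result names and the message printed inside the `if` are assumptions.

```python
fruit = 'banana'
has_n = 'n' in fruit        # True
has_m = 'm' in fruit        # False
has_nan = 'nan' in fruit    # True: a multi-character substring also works
print(has_n, has_m, has_nan)

if 'a' in fruit:            # usable anywhere a True/False test is expected
    print('Found it!')
```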
  • 16:04 - 16:06
    You can also do comparisons.
  • 16:06 - 16:11
    Now, the comparison operations, equals
    makes a lot of sense, less
  • 16:11 - 16:15
    than and greater than depend on the
    language that you're using with Python.
  • 16:15 - 16:20
    And so, if you're using, like, a Latin
    character set, then alphabetical matters.
  • 16:20 - 16:22
    You know, the, the way the Latin character
    set would do.
  • 16:22 - 16:24
    But if you're in a different character
    set, Python is
  • 16:24 - 16:29
    aware of multiple character sets and will
    sort strings based on
  • 16:29 - 16:32
    the sorting algorithm of the particular
    character set.
  • 16:33 - 16:38
    So you can do comparisons like equality,
    less than, and greater than.
  • 16:38 - 16:40
    And we've seen some of these things in
    previous lectures, actually.
  • 16:40 - 16:41
    We've had to use them.
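
A hedged sketch of string comparison in Python 3. One caveat worth knowing: the built-in `<` and `>` compare strings by Unicode code point, so every uppercase ASCII letter sorts before every lowercase one, and language-aware ordering needs extra tooling such as the standard `locale` module. The function name `describe` and its messages are assumptions made for this example.

```python
def describe(word):
    """Compare word against 'banana' using the built-in operators."""
    if word == 'banana':
        return 'All right, bananas.'
    if word < 'banana':
        return word + ' comes before banana.'
    return word + ' comes after banana.'

print(describe('banana'))
print(describe('apple'))     # apple comes before banana.
print(describe('cherry'))    # cherry comes after banana.
print(describe('Zebra'))     # Zebra comes before banana. (uppercase Z < lowercase b)
```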
  • 16:42 - 16:47
    So in addition, to, sort of, these sort of
    fundamental operations that we
  • 16:47 - 16:54
    can do on strings, there's an extensive
    library of built-in capabilities
  • 16:54 - 16:55
    in Python.
  • 16:55 - 16:59
    And so the, the way we see these built-in
    capabilities
  • 16:59 - 17:03
    are they're, they're actually sort of
    built in to strings.
  • 17:03 - 17:06
    So, let's go real slow here.
  • 17:06 - 17:07
    Here we have a variable called greet and
  • 17:07 - 17:10
    we're sticking the string Hello Bob
    into it.
  • 17:10 - 17:13
    Now greet is of type string, as a result
  • 17:13 - 17:17
    of this, and it contains Hello Bob as its
    value.
  • 17:17 - 17:18
    But we can actually access
  • 17:18 - 17:27
    capabilities inside of this value. So we
    can say, greet.lower().
  • 17:27 - 17:31
    This is calling something that's part of
    greet itself, it's a part of all strings.
  • 17:31 - 17:35
    The fact that greet contains a string,
    means that we can ask for,
  • 17:35 - 17:38
    hey, give me greet, which just gives you
    back what you're looking for.
  • 17:38 - 17:41
    Like here, print greet is Hello Bob.
  • 17:41 - 17:46
    Or you can say give me greet lower, so
    this is giving me a lowercase copy.
  • 17:46 - 17:51
    It doesn't convert it to lowercase.
    It gives me a lowercase copy of Hello Bob.
  • 17:51 - 17:54
    So zap is hello bob, all lowercase.
  • 17:55 - 18:00
    Now, it didn't change greet, right?
    And, you can even put this .lower on the
  • 18:00 - 18:05
    end of constants so, why you'd do this, I don't
    know, but Hi There, with H and T capitalized,
  • 18:05 - 18:11
    .lower comes out as hi there.
    So this bit is part of
  • 18:11 - 18:12
    all strings.
  • 18:12 - 18:18
    Both variables and constants have these
    string functions built into them.
  • 18:18 - 18:21
    And every instance of a string, whether it
  • 18:21 - 18:24
    be a variable or a constant, has these
    capabilities.
  • 18:24 - 18:28
    They don't modify it, they just give you
    back a copy.
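
The `lower` calls described above, sketched in Python 3 syntax. The names `greet` and `zap` follow the spoken description; `shout` is an assumption.

```python
greet = 'Hello Bob'
zap = greet.lower()          # returns a lowercase copy
print(zap)                   # hello bob
print(greet)                 # Hello Bob (the original is unchanged)

shout = 'Hi There'.lower()   # methods also work on string constants
print(shout)                 # hi there
```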
  • 18:28 - 18:32
    Now it turns out there is a, a
  • 18:32 - 18:36
    command inside Python called dir, to ask
    questions like
  • 18:36 - 18:40
    hey, well here's, you know, stuff
    has got Hello World.
  • 18:40 - 18:43
    We can say. Redo this.
  • 18:43 - 18:46
    Come here.
  • 18:46 - 18:48
    Stuff is a string.
    We can ask, hey, what are you?
  • 18:48 - 18:50
    I am a string.
  • 18:50 - 18:54
    dir is another Python built-in that asks
    the question, hey, what are all
  • 18:54 - 18:57
    the things that are built into this that I
    can make use of?
  • 18:57 - 18:58
    And here they are.
  • 18:58 - 19:01
    That's kind of a raw dump of them.
    You can also go look at
  • 19:01 - 19:06
    the online documentation for Python and
    see at the Pyth, oop, at
  • 19:06 - 19:10
    the Python website, you can see a whole
    bunch of these things.
  • 19:10 - 19:14
    And they have the calling sequence, what
    the parameters are, et cetera.
  • 19:14 - 19:18
    So when you're looking these things up,
    you can go, go read about them.
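
Asking a value what it is and what it can do might look like this (Python 3 syntax). The exact list that `dir` prints varies between Python versions, so no output is shown; the name `methods` is an assumption.

```python
stuff = 'Hello world'
print(type(stuff))       # reports that stuff is a string (str)

methods = dir(stuff)     # names of everything built into the string
print(methods)           # a long raw list that includes 'find', 'lower', 'upper'
```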
  • 19:18 - 19:19
    Here's just a few that I've pulled out,
  • 19:19 - 19:23
    capitalize, which uppercases the
    first character,
  • 19:23 - 19:27
    center, endswith, find, there's stripping.
  • 19:27 - 19:28
    So I'll look through a couple of these,
  • 19:28 - 19:31
    just the kind of things to be looking for.
  • 19:31 - 19:34
    It'll be a good idea to take a look and read
    through some of the things.
  • 19:34 - 19:38
    Here's a couple that, that we'll probably
    be using early on.
  • 19:38 - 19:44
    The find function, it's similar to in but
    it tells you where it finds the, the
  • 19:44 - 19:50
    particular thing that it's looking for.
    And so we'll put fruit is banana.
  • 19:50 - 19:52
    And I'm going to say pos, which is
    going to be an integer variable,
  • 19:52 - 19:54
    equals fruit.find("na").
  • 19:54 - 19:58
    So what it's saying is, go look inside
    this variable fruit,
  • 19:58 - 20:02
    hunt until you find the first occurrence
    of the string na.
  • 20:02 - 20:06
    Hunt, hunt, hunt, hunt, whoop, got it.
    And then return it to me.
  • 20:06 - 20:11
    So that's going to give me back 2.
    2 is where it found it, right?
  • 20:11 - 20:14
    So, where is it in the string, that's what
    find does.
  • 20:14 - 20:17
    And if you don't find anything, like
    you're looking for z,
  • 20:17 - 20:21
    no, no, no, I didn't find a z, then it
    gives me back negative 1.
  • 20:21 - 20:27
    So just, again, this is just one of many
    built-in functions in string.
  • 20:27 - 20:30
    The ability to find a substring, okay?
  • 20:30 - 20:33
    Or find, yeah, find a character or string
    within another string.
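
A short sketch of `find` in Python 3 syntax. `pos` follows the spoken description; the name `missing` is an assumption.

```python
fruit = 'banana'
pos = fruit.find('na')       # position of the first occurrence of 'na'
print(pos)                   # 2

missing = fruit.find('z')    # not present in the string
print(missing)               # -1
```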
  • 20:35 - 20:37
    There's a lower case, there's also an
  • 20:37 - 20:41
    upper case, This might be better named
    shouting.
  • 20:41 - 20:44
    Upper means give me an uppercase copy of
    this variable.
  • 20:44 - 20:50
    So Hello Bob becomes HELLO BOB, and then
    lower is hello bob, right?
  • 20:50 - 20:56
    So these are both ways to get copies of
    uppercase and lowercase versions.
  • 20:56 - 20:58
    You might think these are kind of silly,
    but one of the things
  • 20:58 - 21:01
    that you tend to use lower for is if
    you're doing searching and
  • 21:01 - 21:04
    you want to ignore case, you convert the
    whole thing
  • 21:04 - 21:06
    to lowercase, and then you search for a
    lowercase string.
  • 21:06 - 21:09
    So you, depends on if you want to ignore
    case or not.
  • 21:09 - 21:12
    So that's, that's one of the reasons that
    you have things like this.
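
One way to sketch the case-ignoring search mentioned here (Python 3 syntax). The names `loud`, `quiet`, and `found`, and the choice of searching with `in`, are assumptions for illustration.

```python
greet = 'Hello Bob'
loud = greet.upper()         # uppercase copy
quiet = greet.lower()        # lowercase copy
print(loud)                  # HELLO BOB
print(quiet)                 # hello bob

# Lowercase the text, then search for a lowercase string.
found = 'bob' in greet.lower()
print(found)                 # True
print('bob' in greet)        # False, because the original has a capital B
```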
  • 21:14 - 21:19
    There is a replace function.
    Again, it doesn't change the value.
  • 21:19 - 21:22
    Greet is going to have Hello Bob.
  • 21:22 - 21:28
    And I'm going to say, greet.replace all
    occurrences of Bob with Jane.
  • 21:28 - 21:33
    That gives me back a copy, in nstr, says
    Hello Jane.
  • 21:33 - 21:36
    So, so greet is unchanged.
  • 21:36 - 21:40
    This replace says, make a copy and then
    make that following
  • 21:40 - 21:43
    edit that you, that, that we've requested.
  • 21:43 - 21:46
    [COUGH] Now we can also say, well, I
    mean, the replace
  • 21:46 - 21:50
    is going to do all occurrences, so greet
    is still Hello Bob.
  • 21:50 - 21:52
    This is kind of redundant here.
  • 21:52 - 21:54
    I'm just doing it so you remember what it is.
  • 21:54 - 21:55
    Greet is still Hello Bob.
  • 21:55 - 21:58
    I put Hello Bob back in it and replace
  • 21:58 - 22:01
    all the occurrences of lowercase o with
    uppercase X.
  • 22:02 - 22:05
    And then that happens.
    So this says,
  • 22:05 - 22:12
    go through the whole string [SOUND] doing
    all those replaces, okay?
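
The two `replace` calls, sketched in Python 3 syntax. `greet` and `nstr` follow the spoken description; `xstr` is an assumption.

```python
greet = 'Hello Bob'
nstr = greet.replace('Bob', 'Jane')   # returns an edited copy
print(nstr)                           # Hello Jane
print(greet)                          # Hello Bob (unchanged)

xstr = greet.replace('o', 'X')        # every lowercase o is replaced
print(xstr)                           # HellX BXb
```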
  • 22:12 - 22:14
    Now another common thing that we're
    going to have to do
  • 22:14 - 22:17
    is we're going to have to throw away
    whitespace.
  • 22:17 - 22:19
    Sometimes you have a string that
  • 22:19 - 22:22
    has characters, blank characters, or other
    characters,
  • 22:22 - 22:26
    at the beginning and the end, nonprintable
    characters, and we can strip them.
  • 22:26 - 22:30
    And there's three charact, three functions
    that are built into
  • 22:30 - 22:33
    Python strings that do this for us.
  • 22:34 - 22:38
    There is lstrip, which strips from the left.
  • 22:38 - 22:44
    There is rstrip, which strips from the right.
  • 22:44 - 22:47
    So it throws away these whitespaces, so,
    Hello Bob is gone.
  • 22:48 - 22:51
    I mean, the, so it gets rid of these
    characters.
  • 22:51 - 22:53
    Oops, these are the ones that are gotten
    rid of there.
  • 22:53 - 22:56
    I need an eraser.
    And then
  • 22:56 - 22:59
    let's use white, and then strip
    basically, gets rid of
  • 22:59 - 23:03
    all the whitespace, both on the left and
    the right side.
  • 23:03 - 23:04
    And gets rid of that.
  • 23:04 - 23:07
    So we're going to, we're going to be using
    these a lot.
  • 23:07 - 23:10
    It, one of the things you tend to do in
    Python is cleaning up data.
  • 23:10 - 23:12
    Sometimes if you have spaces at the
    beginning or
  • 23:12 - 23:14
    the end, you just want to kind of ignore
    them.
  • 23:14 - 23:16
    So you strip them off, you throw them
    away.
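
The three stripping methods side by side, as a sketch in Python 3 syntax. The exact number of surrounding spaces and the result names are assumptions; `repr` is used so the remaining whitespace is visible in the output.

```python
greet = '   Hello Bob  '
left = greet.lstrip()      # strips whitespace from the left only
right = greet.rstrip()     # strips whitespace from the right only
both = greet.strip()       # strips both sides
print(repr(left))          # 'Hello Bob  '
print(repr(right))         # '   Hello Bob'
print(repr(both))          # 'Hello Bob'
```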
  • 23:18 - 23:22
    When we're looking for data, we sometimes
    are looking for a prefix, and
  • 23:22 - 23:27
    there is a startswith function [COUGH]
    that gives you a true or a false.
  • 23:27 - 23:31
    We're asking here, does this variable line
    start with the string Please.
  • 23:31 - 23:35
    And the answer is True, because it does
    start with the string Please.
  • 23:35 - 23:38
    Or, and then next, we ask, does this start
    with the letter p?
  • 23:38 - 23:41
    And the answer is False, it does not start
    with the letter p.
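
A sketch of the prefix test in Python 3 syntax. Only the leading word `Please` comes from the lecture; the rest of the sample sentence and the result names are assumptions.

```python
line = 'Please have a nice day'
polite = line.startswith('Please')   # True
lower_p = line.startswith('p')       # False: the match is case-sensitive
print(polite, lower_p)
```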
  • 23:42 - 23:43
    Okay? So there's
  • 23:43 - 23:45
    lots more of these things.
  • 23:48 - 23:53
    And reading data and tearing it apart is
    one of the things that we're going to
  • 23:53 - 23:57
    really focus on for the rest of these
    first few chapters of the book, okay?
  • 23:57 - 24:00
    Because that's one thing that Python's
    really good at is
  • 24:00 - 24:04
    tearing data into pieces and pulling the
    pieces that you want.
  • 24:04 - 24:07
    So, so let's take a look at this line.
  • 24:07 - 24:11
    So this line that we've got here is a line
    from an actual email box.
  • 24:11 - 24:14
    This is what, if you
  • 24:14 - 24:16
    looked at your email, sort of, on your hard
  • 24:16 - 24:19
    drive, email boxes would have this kind of
    a format.
  • 24:19 - 24:24
    And there are actually many lines, and soon
    we'll be reading whole files full of email.
  • 24:24 - 24:27
    But for now, let's just say we've got this
    one line, somehow.
  • 24:27 - 24:29
    And we're looking for, we don't know
    how long
  • 24:29 - 24:32
    these things are going to be, the first
    charac, the
  • 24:32 - 24:35
    first thing is from, then there's an
    email address,
  • 24:35 - 24:38
    then there's some detail about when the
    mail was sent.
  • 24:38 - 24:41
    But what we actually want is
  • 24:41 - 24:42
    we want this part right here,
  • 24:42 - 24:46
    and that's the domain name of the mail
    address, right?
  • 24:46 - 24:48
    We want to extract this out.
  • 24:48 - 24:53
    We're faced with this line, in a variable,
    and we want to extract that out.
  • 24:53 - 24:56
    So this is kind of putting all these
    things together.
  • 24:56 - 24:59
    So let's walk through how we do this.
  • 24:59 - 25:02
    So, here's this line, and it's a big long
    string.
  • 25:02 - 25:04
    Mostly we would've read this from a file,
  • 25:04 - 25:06
    rather than just put it in a constant, but
    for now we
  • 25:06 - 25:08
    put it in a constant, because we, files is
    the next chapter.
  • 25:10 - 25:12
    And so what we're going to do is we're
    going to say, you
  • 25:12 - 25:15
    know what, I'm going to look at this line
    and I'm going to go
  • 25:15 - 25:18
    find the @ sign, and I want to know where
    the @ sign is.
  • 25:18 - 25:24
    So I call data.find @ sign, and put
    the result in atpos.
  • 25:24 - 25:27
    And that gives me 21.
  • 25:27 - 25:29
    It hunts until it finds the @ sign, and
  • 25:29 - 25:34
    then tells me where I found it.
    Then what I want to look at is, starting
  • 25:34 - 25:39
    here, for the rest of the string, I want
    to find the first space afterwards.
  • 25:40 - 25:46
    So what I say is, this, sppos is my
    variable for the position of the space,
  • 25:46 - 25:51
    data.find, a blank, starting
    at the @.
  • 25:51 - 25:54
    So this is starting at 21.
    So it says, I'll start
  • 25:54 - 26:00
    at 21 and I'll look for the next blank.
    And I find that at 31.
  • 26:00 - 26:05
    So now I know where the @ sign is and I
    know where the space is.
  • 26:05 - 26:08
    And so what I'm looking at is, I want the
    stuff
  • 26:08 - 26:14
    one beyond the @ sign, up to but not
    including the space.
  • 26:14 - 26:20
    So then I can use a slicing operation, I
    can use a slicing operation.
  • 26:20 - 26:23
    Start at the @ position, add 1 to it,
  • 26:23 - 26:26
    so advance 1, that's going to be the
    letter u.
  • 26:26 - 26:31
    And then a slicing operation, up to but
    not including space.
  • 26:31 - 26:36
    Up to, this is going to work out nicely
    all of a sudden, but not
  • 26:36 - 26:42
    including, okay?
    And then
  • 26:42 - 26:46
    I'm going to take that slice, which is
    really this little bit of data right here,
  • 26:46 - 26:50
    take that slice, and put it in the variable
    host.
  • 26:50 - 26:54
    Then we print that out and we get the
    piece, okay?
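
The walk-through above can be sketched in a few lines. The variable names `data`, `atpos`, `sppos`, and `host` come from the lecture. The exact contents of `data` are an assumption, chosen to be consistent with the positions 21 and 31 and with the host starting with the letter u.

```python
# Assumed sample line in mailbox format; not quoted in the transcript itself.
data = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# Find the position of the @ sign.
atpos = data.find('@')

# Find the first space at or after the @ sign.
sppos = data.find(' ', atpos)

# Slice from one past the @ sign, up to but not including the space.
host = data[atpos + 1:sppos]

print(atpos)
print(sppos)
print(host)
```

The slice works out neatly because the end of a slice is "up to but not including": passing `sppos` directly leaves the space out without any subtraction.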
  • 26:54 - 26:57
    And so, here we have some data we want to
    tear apart.
  • 26:57 - 26:58
    We hunt for the @.
  • 26:58 - 27:00
    We find it at position 21.
  • 27:00 - 27:05
    We start at 21 and we look for the, the
    space after that.
  • 27:05 - 27:11
    31, and then we pull from 22, up to but
    not including, 31.
  • 27:11 - 27:13
    And it, it wouldn't matter where this
    thing was, because these aren't all
  • 27:13 - 27:17
    the same length when we start looking at
    them in files, but it
  • 27:17 - 27:21
    would have found the @ sign and the space
    after the @ sign,
  • 27:21 - 27:24
    and it would have reliably
    pulled out the host, okay?
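
To illustrate that the same steps work whatever the lengths involved, here is a sketch that wraps them in a function. The name `extract_host`, the two sample lines, and the handling of a missing @ sign or missing space are all assumptions added for illustration and are not part of the lecture.

```python
def extract_host(line):
    """Return the text between the first @ and the next space, or None."""
    atpos = line.find('@')
    if atpos == -1:
        # find returns -1 when the @ sign is absent.
        return None
    sppos = line.find(' ', atpos)
    if sppos == -1:
        # No space after the @ sign: take everything to the end of the line.
        sppos = len(line)
    return line[atpos + 1:sppos]


# Hypothetical lines with addresses of different lengths.
first = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'
second = 'From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008'

print(extract_host(first))
print(extract_host(second))
```

The checks for -1 matter because `find` does not raise an error when nothing is found. Without them, a line with no @ sign would quietly produce a wrong slice.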
  • 27:24 - 27:30
    So this is a basic pattern we call
    parsing.
  • 27:30 - 27:32
    Parsing text.
  • 27:32 - 27:36
    Find this, find that other thing, grab
    this thing out,
  • 27:36 - 27:40
    then look inside that thing and [SOUND].
    So it does all these things, right?
  • 27:40 - 27:45
    So, that's kind of like strings.
    Up next, we have files.
  • 27:45 - 27:47
    Files are going to be lots of strings.
  • 27:47 - 27:49
    So we're going to start putting all these
    things together.
  • 27:49 - 27:52
    And so the next chapter is a really,
    really
  • 27:52 - 27:56
    important chapter, where it starts to
    really start coming together.
  • 27:56 - 27:57
    So see you soon.
Title:
Python for Informatics - Chapter 6 - Strings
Description:

This lecture covers Chapter 6 - Strings from the book Python for Informatics: Exploring Data - www.pythonlearn.com
All Lectures: http://www.youtube.com/playlist?list=PLlRFEj9H3Oj4JXIwMwN1_ss1Tk8wZShEJ

Video Language:
English
Team:
Captions Requested
Duration:
27:58
