Python for Informatics - Chapter 6 - Strings
-
0:00 - 0:03Hello, and welcome to Chapter Six.
-
0:03 - 0:05This chapter we're going to
talk about strings, and -
0:05 - 0:09stuff is going to start to get real now.
-
0:09 - 0:13So, as always, this material, this video,
these -
0:13 - 0:16slides and book are copyright Creative
Commons Attribution. -
0:16 - 0:17I want you to use these materials.
-
0:17 - 0:19I want you to, somebody else, I want to
-
0:19 - 0:22make more teachers, so everyone can teach
this stuff. -
0:22 - 0:23Use it however you like.
-
0:24 - 0:25Okay, so we've been playing with
-
0:25 - 0:27strings from the beginning.
-
0:27 - 0:28I mean, literally, if we didn't work
-
0:28 - 0:31with strings, we could've never printed
Hello World. -
0:31 - 0:36And, and lord knows, we need to print
Hello World in a programming language. -
0:36 - 0:40And so, we've been using them, especially
constants. -
0:40 - 0:42Now, in this chapter, we're going to dig in.
-
0:42 - 0:47So, oops, so a string is a sequence of
characters. -
0:47 - 0:50You can use either single quotes or
double quotes in Python -
0:50 - 0:51to delimit a string.
-
0:51 - 0:55And so here's two string constants, Hello
and there, -
0:55 - 0:58and stuck into the variables str1
and str2. -
0:58 - 1:01We can concatenate them together
with a plus sign. -
1:01 - 1:03Python is smart enough to look and say,
-
1:03 - 1:06oh, those are strings, I know what to
do with those. -
1:06 - 1:10And you'll notice that the plus doesn't
add any space here, because when -
1:10 - 1:14we print bob out, Hello and there are right
next to one another. -
1:14 - 1:17If, for example, we've done some
conversions, -
1:17 - 1:19so when we were, like, reading pay,
-
1:19 - 1:21and rate, and hours, and stuff,
we've done some conversions. -
1:21 - 1:23So this is an example of the,
a string 1 2 3 -
1:23 - 1:27Not 123, but the string, quote 1 2 3
quote. -
1:27 - 1:29And we can't add 1 to this, we get
-
1:29 - 1:33a traceback, kind of, at this point, as we
expected. -
1:33 - 1:37And we would convert that to an integer
using the int function that's built in. -
1:37 - 1:40See how much Python you already know?
I mean, this is awesome, right? -
1:40 - 1:41I can just say,
-
1:41 - 1:43oh, you call the int function,
and you know what that is. -
1:43 - 1:46That's, sorry, sorry, I'm just
awesomed out. -
1:46 - 1:51So you convert this to an integer, and
then you add 1 to it, and then we get 124. -
1:51 - 1:52So, there you go.
-
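A minimal sketch of the concatenation and conversion steps just described. It assumes Python 3's print() function (the lecture itself uses the Python 2 print statement), and the names str3 and x are assumptions for illustration.

```python
str1 = "Hello"
str2 = 'there'
bob = str1 + str2        # + joins the two strings and adds no space
print(bob)               # Hellothere

str3 = '123'             # a string of three characters, not the number 123
# str3 + 1 would raise a TypeError: a string and an integer cannot be added
x = int(str3) + 1        # convert to an integer first, then add
print(x)                 # 124
```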
1:52 - 1:55We've been doing strings all along, had to.
-
1:55 - 1:57I mean, literally, strings and numeric data
-
1:57 - 2:00are the two things that programs deal with.
-
2:00 - 2:03So, we've been reading and converting.
-
2:03 - 2:05Again, this is sort of a pattern from some
of the earlier programs -
2:05 - 2:09where we do a raw input, you know?
-
2:09 - 2:11And the raw input just takes a string and
puts it in a variable. -
2:11 - 2:15So if I take Chuck, then the
variable contains the string C-h-u-c-k. -
2:16 - 2:19Even if we type numbers, that is a string.
-
2:19 - 2:24We can't, just because I put 1 0 0 in,
I still can't subtract 10. -
2:24 - 2:28We get a happy little traceback, oh, happy
little, sad-faced traceback. -
2:28 - 2:31And, and, but of course, if we convert it
-
2:31 - 2:34into float or something like that.
-
2:35 - 2:39We convert int or float, we can do that
and subtract 10, and we can do that. -
2:39 - 2:42So, so we've been doing this for a while.
-
2:42 - 2:45We've been doing strings and manipulating
strings and converting strings all along. -
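A small sketch of the read-then-convert pattern. To keep it runnable without a keyboard, it assumes the typed text is already in a variable; the name apple and the value '100' are assumptions standing in for what raw_input() (input() in Python 3) would return.

```python
# Stand-in for text typed at a prompt; typed input always arrives as a string.
apple = '100'

try:
    apple - 10                     # subtracting from a string fails
except TypeError as err:
    print('TypeError:', err)

x = int(apple) - 10                # convert first, then subtract
print(x)                           # 90
```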
2:45 - 2:49So the thing we're going to start doing
now is we're going to dive into strings. -
2:49 - 2:53We realize that strings are addressable on
a character-by-character basis, -
2:53 - 2:56and we can do all kinds of cool
things with that. -
2:56 - 3:00And so, a string is a sequence of
characters, and we -
3:00 - 3:04can look inside them using what we call
the index operator, -
3:04 - 3:07the square brackets. And we've seen
square brackets in -
3:07 - 3:08lists, and you'll see that there's sort of
-
3:08 - 3:12similarities between lists of numbers,
and, in effect, a -
3:12 - 3:14string is a special kind of list of
characters. -
3:14 - 3:17So if we take this string banana,
-
3:17 - 3:21the string banana starts, the first
character starts at 0. -
3:21 - 3:25So, we call this operator sub, so
letter equals -
3:25 - 3:28fruit sub 1 and that is the second
character. -
3:28 - 3:31Now this may seem a little weird that the
first character -
3:31 - 3:34is a 0 and the second character is a 1.
-
3:34 - 3:38It actually is kind of similar to the old
elevator thing, where in Europe -
3:38 - 3:41the first floor is called zero, below it is
negative one, -
3:41 - 3:44and the second floor is one, right?
-
3:44 - 3:46It's kind of the same thing.
Actually, it turns out that -
3:46 - 3:50internally zero was a better way
to start than one. -
3:50 - 3:54It, you'll get used to it and then after
a while there's -
3:54 - 3:59some little cool advantages to it, but for
now, beginning is zero. -
3:59 - 4:02Just, beginning is zero, it is the rule,
just remember it. -
4:03 - 4:09Okay, so the 0 is b, the 1 is a, the 2 is
n, et cetera, et cetera. -
4:09 - 4:11And we call this syntax
-
4:11 - 4:13fruit sub 1, okay?
-
4:13 - 4:17So that is the sub 1 character of fruit,
and then that is an a. -
4:17 - 4:21So that fruit sub 1 says, look up in
banana, find the 1 position, -
4:21 - 4:26and give me what's in that 1
position, that's what the sub is. -
4:26 - 4:30And what's inside these brackets can be
an expression. -
4:30 - 4:34So if we set n to 3, n minus 1, well
that'll compute to 2. -
4:34 - 4:37And then fruit sub 2 is the letter n,
-
4:37 - 4:40right? So that's fruit sub 2, okay?
-
4:40 - 4:42It's the third character, fruit sub 2.
-
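The index operator described here can be sketched as follows, assuming Python 3 print() syntax; the variable name w is an assumption for illustration.

```python
fruit = 'banana'
letter = fruit[1]     # "fruit sub 1": positions start at 0, so this is the second character
print(letter)         # a

n = 3
w = fruit[n - 1]      # the expression n - 1 evaluates to 2
print(w)              # n
```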
4:42 - 4:47So the index starts at 0, the, we read the
brackets as sub, fruit sub 1, -
4:47 - 4:53fruit sub 2. Now, Python will
-
4:53 - 4:58complain to you if you use this sub
operator too far down the string. -
4:58 - 5:01Here is a string with 3 characters, which
are 0, 1, and 2. -
5:01 - 5:05And if we go to sub 5, it blows up.
-
5:05 - 5:10Now, you know, by now I hope that you're
not freaking out about traceback errors. -
5:10 - 5:14Remember, traceback errors are just Python
trying to inform you. -
5:14 - 5:19And if we just stop looking at that as
mean Python face, and -
5:19 - 5:24instead look at that as, oh, index error,
string index out of range. -
5:24 - 5:27Oh yeah, I stuck a five in there and
there's only three, oh, -
5:27 - 5:31my bad, thank you, Python, appreciate it,
thanks for the help. -
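A sketch of what indexing past the end looks like. The string 'abc' and the name zot are assumptions; the error is caught here only so the example keeps running, whereas in the lecture it appears as a traceback.

```python
zot = 'abc'                  # three characters, at positions 0, 1 and 2
try:
    print(zot[5])            # there is no position 5
except IndexError as err:
    message = str(err)
    print('IndexError:', message)
```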
5:31 - 5:35So, think of this as like, it's not a
smiley face -
5:35 - 5:39but it's kind of like a, a quizzical face,
right, it's like [SOUND]. -
5:39 - 5:40I don't know.
-
5:40 - 5:43Python's confused and it's trying to tell
you what it's confused about, okay? -
5:43 - 5:47So don't look at these as sad faces.
Python doesn't hate you, Python loves you. -
5:48 - 5:52And loves me too.
So, strings have individual -
5:52 - 5:54characters that we can address with the
index operator. -
5:54 - 5:56They also have length.
-
5:56 - 6:00And there is a built-in function called
len, that we can call and pass in -
6:00 - 6:04as a parameter the variable or a
constant, -
6:04 - 6:06and it will tell us how many characters.
-
6:06 - 6:10Now this banana has six characters in it
that are 0 through 5. -
6:10 - 6:13So don't get a little confused, the last
character is -
6:13 - 6:16the fifth, is sub 5, but it's also the
sixth character. -
6:16 - 6:17So the length is really the length, it's
-
6:17 - 6:22not length minus 1, okay?
So len is like a built-in function. -
6:22 - 6:24It's not a function we have to write,
-
6:24 - 6:27as we talked about in the functions
chapter. -
6:27 - 6:29There are things that are part of Python
that are just sitting there. -
6:29 - 6:31And so we are passing banana, the
variable -
6:31 - 6:35fruit, into the function, we're passing it
into the function, -
6:35 - 6:37into the len function.
-
6:37 - 6:42Then len [SOUND] does magic stuff.
And then out comes the answer. -
6:42 - 6:48And that 6 replaces this and then the 6 goes
into the variable x, and so x is 6. -
6:48 - 6:51I sure made that a messy looking slide.
-
6:51 - 6:55And so, think of inside the len function,
there's a def. -
6:55 - 7:00len takes a parameter, does some loopy
things, and it does its thing. -
7:00 - 7:02So, it's a function that we might write
except we don't -
7:02 - 7:07have to because it's already written and
built in to Python. -
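A sketch of calling len, followed by a hand-written equivalent to show the kind of loop it saves us from writing. The function my_len is hypothetical and is not how Python implements len internally.

```python
fruit = 'banana'
x = len(fruit)        # len returns the number of characters
print(x)              # 6

def my_len(text):
    # Hypothetical stand-in for len: count the characters one at a time.
    count = 0
    for _ in text:
        count = count + 1
    return count

print(my_len(fruit))  # 6
```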
7:07 - 7:10Okay. So that's the length of the
-
7:10 - 7:12string, that's getting at individual
characters. -
7:12 - 7:16We can also loop through strings.
-
7:16 - 7:19Obviously, if we can use the index
operator, and we -
7:19 - 7:22can put a variable in there, we can
write a loop. -
7:22 - 7:24This is an indefinite loop.
-
7:24 - 7:27So we have this variable fruit, has the
string banana in it. -
7:27 - 7:30We set the variable index to 0.
-
7:30 - 7:33We make a little while loop.
And we ask, -
7:33 - 7:35as long as index is less than the length
of fruit. -
7:35 - 7:38Now remember, the length of fruit is
going to be 6. -
7:38 - 7:40But we don't want to make that less than
or equal to -
7:40 - 7:44because then we would crash, because
the last character is 5. -
7:44 - 7:46We can say letter is equal to fruit sub
index, so that's going to -
7:46 - 7:50start out being index of, is going to be
0, so that's fruit sub 0. -
7:50 - 7:53Then we print index and letter, so that
means the -
7:53 - 7:56first time through the loop we're
going to see 0 b. -
7:56 - 7:58Then we increment our
-
7:58 - 8:04iteration variable, and go up.
And then we come out with 1 a. -
8:04 - 8:14And index advances until index is 6, but
has printed out each of the letters. -
8:14 - 8:16Now, we're not doing this just to
-
8:16 - 8:19print them out, we will do something
a little more -
8:22 - 8:23valuable inside that loop.
-
8:23 - 8:29But this gives the sense that we can work
through a loop just like we, we, -
8:29 - 8:36we can work through a string just like
we work through a list of numbers, okay? -
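The indefinite loop walked through above might look like this, assuming Python 3 print() syntax.

```python
fruit = 'banana'
index = 0
while index < len(fruit):      # strictly less than: the last valid position is 5
    letter = fruit[index]
    print(index, letter)       # 0 b, then 1 a, and so on
    index = index + 1          # advance the iteration variable
```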
8:36 - 8:39Now, that was how you do it with an
indefinite loop. -
8:39 - 8:43In a definite loop, it's just far more
awesome, okay? -
8:43 - 8:45Just like we did with the list of numbers,
-
8:46 - 8:49Python understands strings and allows us
to write -
8:49 - 8:53for loops, using for and in, that go through
the strings. -
8:53 - 8:57So basically, for letter in fruit, now
remember, I'm using letter as a -
8:57 - 9:01mnemonic variable here, it's just a
choice, a wise choice of a variable name. -
9:01 - 9:06So that says, run this little block of
text once for -
9:06 - 9:08each letter in the variable fruit, which
means that letter's going to -
9:08 - 9:14take on the successive b-a-n-a-n-a.
-
9:14 - 9:16When I look at that I always worry that I
misspelled it. -
9:16 - 9:19I think I got these right.
-
9:19 - 9:22If I rewrite this book, I'm not going to
use banana as the example because I'm -
9:22 - 9:25terrified that I misspelled banana,
because I don't -
9:25 - 9:27know how many n's banana has in it.
-
9:27 - 9:32But, be that as it may, we are
abstracting, we are letting Python say, -
9:32 - 9:36run this little block of text once, in
order, for each of the letters in -
9:36 - 9:41the variable fruit, which is b-a-n-a, and
so it prints out each of the letters. -
9:41 - 9:46So this is a much prettier version of the,
the looping, -
9:46 - 9:51so using the definite, the for keyword
instead of the while keyword. -
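The same traversal as a definite loop, again assuming Python 3 print() syntax. The for statement sets up, looks up and advances the position for us.

```python
fruit = 'banana'
for letter in fruit:     # letter takes on b, a, n, a, n, a in order
    print(letter)
```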
9:51 - 9:54And so, we can just kind of compare these
two things. -
9:54 - 9:56They kind of do the exact same thing.
-
9:56 - 9:58And it also is kind of a, gives you a
-
9:58 - 10:01sense of what the for is doing for us,
right? -
10:01 - 10:02The for is
-
10:02 - 10:05setting up this index, the for is
looking up -
10:05 - 10:08inside of fruit, and the for is advancing
the index. -
10:08 - 10:10So the for's doing a bunch of work for us
-
10:10 - 10:12and I've characterized that, sort of, in
the previous lecture. -
10:12 - 10:15How the for is sort of doing a bunch of
things for us -
10:15 - 10:20and that's, it allows our code to
be more -
10:20 - 10:22expressive and, and instead of, so this
is, a lot of -
10:22 - 10:26this is just kind of bookkeeping crap that
we don't really need. -
10:26 - 10:30And so the for loop helps us by doing some
of the bookkeeping for us. -
10:32 - 10:35Okay, so we can do all those loops again.
-
10:35 - 10:39We can find the largest letter, the
smallest letter, the, how many times. -
10:39 - 10:45So, I think, what, how many n's are in
this, or how many a's are in this. -
10:45 - 10:50So this is a simple counting pattern and,
and a looking pattern. -
10:50 - 10:53And so, our word is banana, our count is 0.
-
10:53 - 10:55For the letter in word, again, boop, boop,
-
10:55 - 10:57boop, boop, boop, that comes out like that.
-
10:57 - 11:01So it's going to run this little block.
If the letter is a, add 1 to the count. -
11:02 - 11:08So this is going to basically print out at
the end the number of a's in banana. -
11:08 - 11:10It would probably be more useful, for me,
to print out the number -
11:10 - 11:14of n's in banana, because I never know how
many n's are in banana. -
11:14 - 11:15But it looks like there's supposed to be two,
-
11:15 - 11:17or otherwise I have a typo on this slide.
-
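A sketch of the counting pattern described here, assuming Python 3 print() syntax.

```python
word = 'banana'
count = 0
for letter in word:
    if letter == 'a':        # look at each letter, count only the a's
        count = count + 1
print(count)                 # 3
```

Changing 'a' to 'n' in the comparison would count the n's instead.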
11:19 - 11:21So the in, again, I, I love the in.
-
11:21 - 11:22I just absolutely
-
11:22 - 11:25love this in.
I love this syntax. -
11:25 - 11:31This for each letter in the word banana.
Just, to me, it reads very smoothly. -
11:31 - 11:33Cognitively, it fits in my mind what's
going on. -
11:33 - 11:37For each letter in banana, run this little
indented block of text. -
11:37 - 11:43Again, very pretty, I love in, it's one of
my favorite little pieces of Python. -
11:46 - 11:49So, again, with the for, it takes care of
-
11:49 - 11:52all the iteration variables for us, and it
goes through the sequence. -
11:52 - 11:55And so here's, here's an animation, right?
-
11:55 - 11:58Remember that the for is going to do all
this work for us, right? -
11:58 - 12:01Letter is going to advance through the
-
12:01 - 12:05successive values, the successive letters
in banana. -
12:05 - 12:12So letter is being moved for us by the for
statement, okay? -
12:12 - 12:15So that's looping through.
-
12:15 - 12:17Now it turns out there's a lot of
common things that -
12:17 - 12:19we want to do that are already built into
Python for us. -
12:20 - 12:24Clear the screen there.
We call this slicing. -
12:24 - 12:29So the index operator looks up various
things in a string, but we -
12:29 - 12:33can also pull substrings out, using the
colon in addition to the square brackets. -
12:33 - 12:35Again, this is called slicing.
-
12:36 - 12:37So the
-
12:37 - 12:43colon operator, basically, takes a
starting position, and then an ending -
12:43 - 12:48position, but the ending position is up to
but not including the second one. -
12:48 - 12:52So this is, it's up to but not including,
up to but not including. -
12:52 - 12:54Just like the zero, you get used to it
pretty quick, -
12:54 - 12:56but the first time you see it, it's a
little bit -
12:58 - 12:59wonky.
-
12:59 - 13:03So, if we're going 0 through 4, that's how
I read this print, s sub 0 -
13:03 - 13:09through 4, or, or better, better said,
s 0, up to but not including 4. -
13:09 - 13:14That is, print me out the chunk that is up
to, but not including, 4. -
13:14 - 13:19So, it doesn't include 4, and so out comes
Mont, right? -
13:20 - 13:23So the next one is 6 up to but not
including 7, so it starts at 6, -
13:23 - 13:30up to but not including 7, so
out comes the P. -
13:30 - 13:32And, even though you might expect that it
-
13:32 - 13:36would traceback on this, Python is a
little forgiving. -
13:36 - 13:37So here's a moment where Python is a
little -
13:37 - 13:40forgiving, saying, you know, I'll give you
a break here. -
13:40 - 13:43If you go 6, but up to, but not including 20,
-
13:43 - 13:46I'll just stop at the end of the string.
-
13:46 - 13:49So it's 6 to the end, so it, it, you can
over-reference here and -
13:49 - 13:52you can not, you won't get yourself in
trouble. -
13:52 - 13:53So that comes out, Python.
-
13:53 - 13:58So, again, the second number is
up to but not including, -
13:58 - 14:00and that's the, kind of the
weird thing there. -
14:00 - 14:02Of course once you remember that
the first character -
14:02 - 14:05is 0, 0 up through but not including.
Nice. -
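The three slices just read aloud can be sketched like this, assuming Python 3 print() syntax and the string Monty Python named in the lecture.

```python
s = 'Monty Python'
print(s[0:4])     # Mont: positions 0 through 3, up to but not including 4
print(s[6:7])     # P: position 6 only
print(s[6:20])    # Python: an end past the string just stops at the end
```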
14:09 - 14:12If we leave off the first number or the last
number, or both of them, -
14:12 - 14:17they mean
the beginning and end of the string, -
14:17 - 14:24respectively.
And so, up to but not including 2 is M-o. -
14:24 - 14:318 colon means starting at 8 to the end of
the string. -
14:31 - 14:34So that's, thon.
And then, that means -
14:34 - 14:37the beginning to the end, and so it's
just the whole string, Monty Python. -
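A short sketch of slices with a missing start or end, assuming Python 3 print() syntax.

```python
s = 'Monty Python'
print(s[:2])     # Mo: from the beginning, up to but not including 2
print(s[8:])     # thon: from position 8 to the end
print(s[:])      # Monty Python: the whole string
```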
14:38 - 14:40Now we've already played with string
-
14:40 - 14:43concatenation, just a thing to
emphasize here is, -
14:43 - 14:49the concatenation operator does not
add a space, does not add a space. -
14:49 - 14:52If you want a space, you explicitly add it.
-
14:52 - 14:56So here there's no space in between the o
and the t, but here -
14:56 - 15:00there is a space between the o and the t
because we explicitly added it. -
15:00 - 15:02So you can concatenate more than one
thing. -
15:02 - 15:05And you add your spaces as you want,
okay? -
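A sketch of concatenation with and without an explicit space. The variable names a, b and c and the strings Hello and There are assumptions for illustration.

```python
a = 'Hello'
b = a + 'There'            # no space is added for us
print(b)                   # HelloThere
c = a + ' ' + 'There'      # the space is a string we add ourselves
print(c)                   # Hello There
```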
15:08 - 15:10Another thing you can do is you can ask
questions about a string. -
15:10 - 15:15Sort of like doing a string lookup, using
the in operator. -
15:15 - 15:18This is a little different than how we use
it inside of a for loop. -
15:18 - 15:21This is a logical operation asking a
question -
15:21 - 15:23like less than or greater than or
whatever. -
15:23 - 15:25So, here's an expression.
-
15:25 - 15:29So fruit is banana, as always.
Is n in fruit? -
15:30 - 15:33And the answer is yes it is, True.
So this -
15:33 - 15:35is a logical operation.
It's a question. -
15:35 - 15:37It's a true or false.
-
15:37 - 15:40Is m in fruit?
No, False. -
15:40 - 15:42And you can, this can be a string, not
just a single character. -
15:42 - 15:45Is n-a-n in fruit?
The answer is True. -
15:45 - 15:50And you can put, sort of, you know, if,
parts of ifs, et cetera, et cetera. -
15:50 - 15:54So, this is a logical expression that can
be on an if, -
15:54 - 15:57you can have a while, et cetera, et
cetera, et cetera. -
15:57 - 15:58So it's a logical,
-
15:58 - 16:01logical expression and it returns
True or False. -
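The in operator as a logical expression can be sketched like this, assuming Python 3 print() syntax; the message printed inside the if is an assumption.

```python
fruit = 'banana'
print('n' in fruit)      # True
print('m' in fruit)      # False
print('nan' in fruit)    # True: the left side can be more than one character

if 'a' in fruit:         # usable anywhere a True/False test is expected
    print('Found it!')
```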
16:04 - 16:06You can also do comparisons.
-
16:06 - 16:11Now, the comparison operations, equals
makes a lot of sense, less -
than and greater than depend on the
language that you're using with Python. -
16:15 - 16:20And so, if you're using, like, a Latin
character set, then alphabetical matters. -
16:20 - 16:22You know, the, the way the Latin character
set would do. -
16:22 - 16:24But if you're in a different character
set, Python is -
16:24 - 16:29aware of multiple character sets and will
sort strings based on -
16:29 - 16:32the sorting algorithm of the particular
character set. -
16:33 - 16:38So you can do comparisons like equality,
less than, and greater than. -
16:38 - 16:40And we've seen some of these things in
previous lectures, actually. -
16:40 - 16:41We've had to use them.
-
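A sketch of string comparison on plain ASCII text. The helper compare is hypothetical, and the last call illustrates one detail worth knowing: uppercase letters sort ahead of lowercase ones.

```python
def compare(word):
    # Hypothetical helper: reports how word sorts relative to 'banana'.
    if word == 'banana':
        return 'equal'
    elif word < 'banana':
        return 'before'
    else:
        return 'after'

print(compare('apple'))     # before
print(compare('banana'))    # equal
print(compare('cherry'))    # after
print(compare('Zebra'))     # before: uppercase sorts ahead of lowercase
```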
16:42 - 16:47So in addition, to, sort of, these sort of
fundamental operations that we -
16:47 - 16:54can do on strings, there's an extensive
library of built-in capabilities -
16:54 - 16:55in Python.
-
16:55 - 16:59And so the, the way we see these built-in
capabilities -
16:59 - 17:03is that they're actually sort of
built in to strings. -
17:03 - 17:06So, let's go real slow here.
-
17:06 - 17:07Here we have a variable called greet and
-
17:07 - 17:10we're sticking the string Hello Bob
into it. -
17:10 - 17:13Now greet is of type string, as a result
-
17:13 - 17:17of this, and it contains Hello Bob as its
value. -
17:17 - 17:18But we can actually access
-
17:18 - 17:27capabilities inside of this value. So we
can say, greet.lower(). -
17:27 - 17:31This is calling something that's part of
greet itself, it's a part of all strings. -
17:31 - 17:35The fact that greet contains a string,
means that we can ask for, -
17:35 - 17:38hey, give me greet, which just gives you
back what you're looking for. -
17:38 - 17:41Like here, print greet is Hello Bob.
-
17:41 - 17:46Or you can say give me greet lower, so
this is giving me a lowercase copy. -
17:46 - 17:51It doesn't convert it to lowercase.
It gives me a lowercase copy of Hello Bob. -
17:51 - 17:54So zap is hello bob, all lowercase.
-
17:55 - 18:00Now, it didn't change greet, right?
And, you can even put this .lower on the -
18:00 - 18:05end of constants so, why you'd do this, I don't
know, but Hi There, with H and T capitalized, -
18:05 - 18:11.lower comes out as hi there.
So this bit is part of -
18:11 - 18:12all strings.
-
18:12 - 18:18Both variables and constants have these
string functions built into them. -
18:18 - 18:21And every instance of a string, whether it
-
18:21 - 18:24be a variable or a constant, has these
capabilities. -
18:24 - 18:28They don't modify it, they just give you
back a copy. -
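A sketch of calling lower on a variable and on a constant, assuming Python 3 print() syntax.

```python
greet = 'Hello Bob'
zap = greet.lower()          # a lowercase copy
print(zap)                   # hello bob
print(greet)                 # Hello Bob: the original is unchanged
print('Hi There'.lower())    # hi there: works on constants too
```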
18:28 - 18:32Now it turns out there is a, a
-
18:32 - 18:36command inside Python called dir, to ask
questions like -
18:36 - 18:40hey, well here's, you know, stuff
has got Hello World. -
18:40 - 18:43We can say. Redo this.
-
18:43 - 18:46Come here.
-
18:46 - 18:48Stuff is a string.
We can ask, hey, what are you? -
18:48 - 18:50I am a string.
-
18:50 - 18:54dir is another built-in Python function that asks
the question, hey, what are all -
18:54 - 18:57the things that are built into this that I
can make use of? -
18:57 - 18:58And here they are.
-
18:58 - 19:01That's kind of a raw dump of them.
You can also go look at -
19:01 - 19:06the online documentation for Python, at -
19:06 - 19:10the Python website, you can see a whole
bunch of these things. -
19:10 - 19:14And they have the calling sequence, what
the parameters are, et cetera. -
19:14 - 19:18So when you're looking these things up,
you can go, go read about them. -
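A sketch of asking a string what it is and what it can do. It assumes Python 3, where type prints as class 'str'; filtering out the names that start with an underscore is an assumption made to keep the list readable.

```python
stuff = 'Hello world'
print(type(stuff))           # <class 'str'> in Python 3

# dir() lists everything built into the value; keep only the plain method names.
names = [name for name in dir(stuff) if not name.startswith('_')]
print(names)                 # includes capitalize, find, lower, strip, upper, ...
```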
19:18 - 19:19Here's just a few that I've pulled out,
-
19:19 - 19:23capitalize, which uppercases the
first character, -
19:23 - 19:27center, endswith, find, there's stripping.
-
19:27 - 19:28So I'll look through a couple of these,
-
19:28 - 19:31just the kind of things to be looking for.
-
19:31 - 19:34It'll be a good idea to take a look and read
through some of the things. -
19:34 - 19:38Here's a couple that, that we'll probably
be using early on. -
19:38 - 19:44The find function, it's similar to in but
it tells you where it finds the, the -
19:44 - 19:50particular thing that it's looking for.
And so we'll set fruit to banana. -
19:50 - 19:52And I'm going to say pos, which is
going to be an integer variable, -
19:52 - 19:54equals fruit.find("na").
-
19:54 - 19:58So what it's saying is, go look inside
this variable fruit, -
19:58 - 20:02hunt until you find the first occurrence
of the string na. -
20:02 - 20:06Hunt, hunt, hunt, hunt, whoop, got it.
And then return it to me. -
20:06 - 20:11So that's going to give me back 2.
2 is where it found it, right? -
20:11 - 20:14So, where is it in the string, that's what
find does. -
20:14 - 20:17And if you don't find anything, like
you're looking for z, -
20:17 - 20:21no, no, no, I didn't find a z, then it
gives me back negative 1. -
20:21 - 20:27So just, again, this is just one of many
built-in functions in string. -
20:27 - 20:30The ability to find a substring, okay?
-
20:30 - 20:33Or find, yeah, find a character or string
within another string. -
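The find calls described above can be sketched like this, assuming Python 3 print() syntax; the name aa is an assumption.

```python
fruit = 'banana'
pos = fruit.find('na')     # position of the first occurrence
print(pos)                 # 2
aa = fruit.find('z')       # not present anywhere in the string
print(aa)                  # -1
```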
20:35 - 20:37There's a lower case, there's also an
-
20:37 - 20:41upper case. This might be better named
shouting. -
20:41 - 20:44Upper means give me an uppercase copy of
this variable. -
20:44 - 20:50So Hello Bob becomes HELLO BOB, and then
lower is hello bob, right? -
20:50 - 20:56So these are both ways to get copies of
uppercase and lowercase versions. -
20:56 - 20:58You might think these are kind of silly,
but one of the things -
20:58 - 21:01that you tend to use lower for is if
you're doing searching and -
21:01 - 21:04you want to ignore case, you convert the
whole thing -
21:04 - 21:06to lowercase, and then you search for a
lowercase string. -
21:06 - 21:09So you, depends on if you want to ignore
case or not. -
21:09 - 21:12So that's, that's one of the reasons that
you have things like this. -
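A sketch of upper and lower, plus the ignore-case search mentioned here. The names nnn, www and found are assumptions for illustration.

```python
greet = 'Hello Bob'
nnn = greet.upper()
print(nnn)                   # HELLO BOB
www = greet.lower()
print(www)                   # hello bob

# To ignore case, lowercase the text and search for a lowercase target.
found = 'bob' in greet.lower()
print(found)                 # True
```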
21:14 - 21:19There is a replace function.
Again, it doesn't change the value. -
21:19 - 21:22Greet is going to have Hello Bob.
-
21:22 - 21:28And I'm going to say, greet.replace all
occurrences of Bob with Jane. -
21:28 - 21:33That gives me back a copy, in nstr, says
Hello Jane. -
21:33 - 21:36So, so greet is unchanged.
-
21:36 - 21:40This replace says, make a copy and then
make that following -
21:40 - 21:43edit that you, that, that we've requested.
-
21:43 - 21:46[COUGH] Now we can also say, well, I
mean, the replace -
21:46 - 21:50is going to do all occurrences, so greet
is still Hello Bob. -
21:50 - 21:52This is kind of redundant here.
-
21:52 - 21:54I'm just doing it so you remember what it is.
-
21:54 - 21:55Greet is still Hello Bob.
-
21:55 - 21:58I put Hello Bob back in it and replace
-
21:58 - 22:01all the occurrences of lowercase o with
uppercase X. -
22:02 - 22:05And then that happens.
So this says, -
22:05 - 22:12go through the whole string [SOUND] doing
all those replaces, okay? -
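The two replace calls can be sketched like this, assuming Python 3 print() syntax.

```python
greet = 'Hello Bob'
nstr = greet.replace('Bob', 'Jane')
print(nstr)                  # Hello Jane
print(greet)                 # Hello Bob: replace returns a copy

nstr = greet.replace('o', 'X')
print(nstr)                  # HellX BXb: every lowercase o is replaced
```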
22:12 - 22:14Now another common thing that we're
going to have to do -
22:14 - 22:17is we're going to have to throw away
whitespace. -
22:17 - 22:19Sometimes you have a string that
-
22:19 - 22:22has characters, blank characters, or other
characters, -
22:22 - 22:26at the beginning and the end, nonprintable
characters, and we can strip them. -
22:26 - 22:30And there's three functions
that are built into -
22:30 - 22:33Python strings that do this for us.
-
22:34 - 22:38There is lstrip, which strips from the left.
-
22:38 - 22:44There is rstrip, which strips from the right.
-
22:44 - 22:47So it throws away this whitespace, so the
space after Hello Bob is gone. -
22:48 - 22:51I mean, the, so it gets rid of these
characters. -
22:51 - 22:53Oops, these are the ones that are gotten
rid of there. -
22:53 - 22:56I need an eraser.
And then -
22:56 - 22:59let's use white, and then strip
basically, gets rid of -
22:59 - 23:03all the whitespace, both on the left and
the right side. -
23:03 - 23:04And gets rid of that.
-
23:04 - 23:07So we're going to, we're going to be using
these a lot. -
23:07 - 23:10It, one of the things you tend to do in
Python is cleaning up data. -
23:10 - 23:12Sometimes if you have spaces at the
beginning or -
23:12 - 23:14the end, you just want to kind of ignore
them. -
23:14 - 23:16So you strip them off, you throw them
away. -
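A sketch of the three stripping functions. The exact number of spaces around Hello Bob is an assumption; repr() is used only to make the remaining spaces visible.

```python
greet = '   Hello Bob  '
print(repr(greet.lstrip()))    # 'Hello Bob  ': left side cleaned
print(repr(greet.rstrip()))    # '   Hello Bob': right side cleaned
print(repr(greet.strip()))     # 'Hello Bob': both sides cleaned
```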
23:18 - 23:22When we're looking for data, we sometimes
are looking for a prefix, and -
23:22 - 23:27there is a startswith function [COUGH]
that gives you a true or a false. -
23:27 - 23:31We're asking here, does this variable line
start with the string Please. -
23:31 - 23:35And the answer is True, because it does
start with the string Please. -
23:35 - 23:38Or, and then next, we ask, does this start
with the letter p? -
23:38 - 23:41And the answer is False, it does not start
with the letter p. -
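A sketch of the prefix test. Only the leading word Please comes from the lecture; the rest of the line is an assumption.

```python
line = 'Please have a nice day'
print(line.startswith('Please'))   # True
print(line.startswith('p'))        # False: the comparison is case-sensitive
```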
23:42 - 23:43Okay? So there's
-
23:43 - 23:45lots more of these things.
-
23:48 - 23:53And reading data and tearing it apart is
one of the things that we're going to -
23:53 - 23:57really focus on for the rest of these
first few chapters of the book, okay? -
23:57 - 24:00Because that's one thing that Python's
really good at is -
24:00 - 24:04tearing data into pieces and pulling the
pieces that you want. -
24:04 - 24:07So, so let's take a look at this line.
-
24:07 - 24:11So this line that we've got here is a line
from an actual email box. -
24:11 - 24:14This is what, if you
-
24:14 - 24:16looked at your email, sort of, on your hard
-
24:16 - 24:19drive, email boxes would have this kind of
a format. -
24:19 - 24:24And there's actually many lines, and soon
we'll be reading whole files full of email. -
24:24 - 24:27But for now, let's just say we've got this
one line, somehow. -
24:27 - 24:29And we're looking for, we don't know
how long -
24:29 - 24:32these things are going to be, the -
24:32 - 24:35first thing is From, then there's an
email address, -
24:35 - 24:38then there's some detail about when the
mail was sent. -
24:38 - 24:41But what we actually want is
-
24:41 - 24:42we want this part right here,
-
24:42 - 24:46and that's the domain name of the mail
address, right? -
24:46 - 24:48We want to extract this out.
-
24:48 - 24:53We're faced with this line, in a variable,
and we want to extract that out. -
24:53 - 24:56So this is kind of putting all these
things together. -
24:56 - 24:59So let's walk through how we do this.
-
24:59 - 25:02So, here's this line, and it's a big long
string. -
25:02 - 25:04Mostly we would've read this from a file,
-
25:04 - 25:06rather than just put it in a constant, but
for now we -
25:06 - 25:08put it in a constant, because files is
the next chapter. -
25:10 - 25:12And so what we're going to do is we're
going to say, you -
25:12 - 25:15know what, I'm going to look at this line
and I'm going to go -
25:15 - 25:18find the @ sign, and I want to know where
the @ sign is. -
25:18 - 25:24So I call data.find @ sign, and put
the result in atpos. -
25:24 - 25:27And that gives me 21.
-
25:27 - 25:29It hunts until it finds the @ sign, and
-
25:29 - 25:34then tells me where it found it.
Then what I want to look at is, starting -
25:34 - 25:39here, for the rest of the string, I want
to find the first space afterwards. -
25:40 - 25:46So what I say is, this, sppos is my
variable for the position of the space, -
25:46 - 25:51data.find, a blank, starting
at the @. -
25:51 - 25:54So this is starting at 21.
So it says, I'll start -
25:54 - 26:00at 21 and I'll look for the next blank.
And I find that at 31. -
26:00 - 26:05So now I know where the @ sign is and I
know where the space is. -
26:05 - 26:08And so what I'm looking at is, I want the
stuff -
26:08 - 26:14one beyond the @ sign, up to but not
including the space. -
26:14 - 26:20So then I can use a slicing operation, I
can use a slicing operation. -
26:20 - 26:23Start at the @ position, add 1 to it,
-
26:23 - 26:26so advance 1, that's going to be the
letter u. -
26:26 - 26:31And then a slicing operation, up to but
not including space. -
26:31 - 26:36Up to, this is going to work out nicely
all of a sudden, but not -
26:36 - 26:42including, okay?
And then -
26:42 - 26:46I'm going to take that slice, which is
really this little bit of data right here, -
26:46 - 26:50take that slice, and put it in the variable
host. -
26:50 - 26:54Then we print that out and we get the
piece, okay? -
26:54 - 26:57And so, here we have some data we want to
tear apart. -
26:57 - 26:58We hunt for the @.
-
26:58 - 27:00We find it at position 21.
-
27:00 - 27:05We start at 21 and we look for the, the
space after that. -
27:05 - 27:1131, and then we pull from 22, up to but
not including, 31. -
27:11 - 27:13And it, it wouldn't matter where this
thing was, because these aren't all -
27:13 - 27:17the same length when we start looking at
them in files, but it -
27:17 - 27:21would have found the @ sign and the space
after the @ sign, -
27:21 - 27:24and it would have reliably
pulled out the host, okay? -
27:24 - 27:30So this is a basic pattern we call
parsing. -
27:30 - 27:32Parsing text.
-
27:32 - 27:36Find this, find that other thing, grab
this thing out, -
27:36 - 27:40then look inside that thing and [SOUND].
So it does all these things, right? -
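The parsing steps can be sketched end to end like this. The sample line is a hypothetical address, constructed so that the @ sign lands at position 21 and the following space at 31, matching the numbers in the lecture; Python 3 print() syntax is assumed.

```python
# Hypothetical mailbox line, built to match the positions quoted in the lecture.
data = 'From someone.somebody@uct.ac.za Sat Jan  5 09:14:16 2008'

atpos = data.find('@')            # where is the @ sign?
print(atpos)                      # 21

sppos = data.find(' ', atpos)     # first space at or after the @ sign
print(sppos)                      # 31

host = data[atpos + 1:sppos]      # one past the @, up to but not including the space
print(host)                       # uct.ac.za
```

Because both positions are found rather than hard-coded, the same three lines would work on an address of any length.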
27:40 - 27:45So, that's kind of like strings.
Up next, we have files. -
27:45 - 27:47Files are going to be lots of strings.
-
27:47 - 27:49So we're going to start putting all these
things together. -
27:49 - 27:52And so the next chapter is a really,
really -
27:52 - 27:56important chapter, where it starts to
really start coming together. -
27:56 - 27:57So see you soon.
Title: Python for Informatics - Chapter 6 - Strings
Description: This lecture covers Chapter 6 - Strings from the book Python for Informatics: Exploring Data - www.pythonlearn.com
All Lectures: http://www.youtube.com/playlist?list=PLlRFEj9H3Oj4JXIwMwN1_ss1Tk8wZShEJ
Video Language: English
Team: Captions Requested
Duration: 27:58
Claude Almansi edited English subtitles for Python for Informatics - Chapter 6 - Strings