https:/.../4c5b7bbb-dc57-4484-b160-ad9000efd740-94330698-4916-4a28-9b48-ad9101273819.mp4?invocationId=8cd5deeb-2f09-ec11-a9e9-0a1a827ad0ec

0:05 - 0:10

In this video, I'm going to show you how to do grouping and aggregate operations in pandas. So our learning
0:10 - 0:15

outcomes are for you to be able to compute aggregate values from a pandas series,
0:15 - 0:22

compute grouped aggregate values from a pandas data frame, and also be able to order a data frame,
0:22 - 0:26

pick the rows with the largest values for some series.
0:26 - 0:32

And then finally join two pandas data frames to get context for the results that we just computed in the first part.
0:32 - 0:39

So we have a data frame. This is the MovieLens data that we used in some of the earlier videos.
0:39 - 0:48

So we have the data frame, and we've got this ratings table that has the user ID, movie ID, rating, and timestamp columns.
0:48 - 0:53

It's twenty-five million rows by four columns. So, an aggregate.
0:53 - 0:59

Say we want to ask the question: what is the mean rating? So take all of the rating values users have ever given.
0:59 - 1:06

What's the mean value? And this is the code we would use to do this.
1:06 - 1:12

And there are a few pieces. So we're using this data frame.
1:12 - 1:26

There are a few pieces. We're using this data frame.
1:26 - 1:32

We're using this data frame, and we're then selecting a column.
1:32 - 1:50

Remember, this is how we select a column. And then the result of that operation, this whole operation here, is a series.
1:50 - 1:58

And so then we call the mean method on the series and we get the mean rating.
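The pieces just described can be sketched roughly as follows. This is a minimal illustration, not the notebook from the video: the tiny `ratings` frame and the column names `userId`, `movieId`, `rating`, and `timestamp` are assumptions standing in for the real 25-million-row table.

```python
import pandas as pd

# Hypothetical stand-in for the ratings table (made-up values, assumed column names).
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 10],
    "rating": [4.0, 3.5, 5.0, 2.0, 3.0],
    "timestamp": [100, 101, 102, 103, 104],
})

rating_series = ratings["rating"]   # selecting a column gives a Series
mean_rating = rating_series.mean()  # the aggregate reduces the Series to one value
print(mean_rating)
```

The same thing is usually written in one expression, `ratings["rating"].mean()`.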
1:58 - 2:05

Think back a little to the previous video, and take a moment to think about what the conceptual problem here is. It's a common operation.
2:05 - 2:09

But there is a little conceptual problem with it in terms of what it actually means.
2:09 - 2:14

There's a variety of different aggregate functions that we have in pandas.
2:14 - 2:18

We've got mean, median, mode. We've got the minimum and the maximum.
2:18 - 2:23

You can sum, you can count. You can compute standard deviation and variance.
2:23 - 2:27

There are several others as well.
2:27 - 2:30

These are all methods on a pandas series. If you have a series, this is a method.
2:30 - 2:38

You've got the series object, dot, and then the function name and parentheses to call it, and you're going to compute that aggregate statistic.
2:38 - 2:52

So let's see these in action. So I'm going to compute the mean rating and we get three point five.
2:52 - 2:59

I can compute the sum. There's an alternate form: all of these are also available as functions in the NumPy module
2:59 - 3:04

that take an array, and a series is a kind of array. For some of the functions,
3:04 - 3:13

there are slight differences between the pandas versions and the NumPy versions, but mean and sum are the same.
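Here is a small sketch of the two forms side by side, using a made-up series rather than the real ratings. The standard deviation is included as one example of a slight difference: pandas defaults to the sample version (`ddof=1`) and NumPy to the population version (`ddof=0`). The variable names are just for illustration.

```python
import numpy as np
import pandas as pd

rating = pd.Series([4.0, 3.5, 5.0, 2.0, 3.0])  # made-up values

# Aggregates as Series methods
total = rating.sum()
average = rating.mean()
smallest, largest = rating.min(), rating.max()
pandas_std = rating.std()              # sample standard deviation (ddof=1)

# The same aggregates as NumPy functions
np_total = np.sum(rating)
np_average = np.mean(rating)
numpy_std = np.std(rating.to_numpy())  # population standard deviation (ddof=0)

print(total, np_total)
print(average, np_average)
print(pandas_std, numpy_std)
```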
3:13 - 3:20

So if we want to get the size of a series, there are a couple of different ways: we can ask the series for its size or call len on it.
3:20 - 3:26

Those are the same operation and they will give us the total length of the series, including missing values.
3:26 - 3:29

If we've got a series that has missing values and we haven't seen missing values yet,
3:29 - 3:35

but they're going to come up later and we want to count how many values we actually have.
3:35 - 3:41

That's what the series count method does. So we can see those.
3:41 - 3:51

The size, the len, those are the same. Also, a series is an array, and arrays in the NumPy world have a shape, so we can get shape,
3:51 - 3:56

which is the same as the size except as a tuple, because arrays can have more than one dimension.
3:56 - 4:02

There's this weird syntax here, where we have a number with a comma after it inside parentheses.
4:02 - 4:07

That's the Python syntax for a tuple consisting of exactly one value.
4:07 - 4:11

It's a little bit of a weird syntax, but it comes up in a few places. But that's what that means.
4:11 - 4:21

It's a tuple with exactly one value. Then we can count the number of ratings, and since we don't have any missing ratings, it returns the same number.
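To see where size and count diverge, here is a minimal sketch on a made-up series with one missing value (the real ratings have none, so there all of these agree).

```python
import numpy as np
import pandas as pd

s = pd.Series([4.0, np.nan, 3.0, 5.0])  # one missing value, for illustration only

print(s.size)     # total length, missing values included
print(len(s))     # same as size
print(s.shape)    # the length as a one-element tuple
print(s.count())  # only the values that are actually present
```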
4:21 - 4:32

So another thing we can do that's a form of aggregate is to get a quantile, and quantile takes a parameter that is a fraction.
4:32 - 4:38

And what it does is this: the parameter is a fraction, and we want to find the value
4:38 - 4:44

if we sorted the series from smallest to largest
4:44 - 4:51

and went that fraction of the way along it. So point five would be the middle, the median value. We're going to see median in the next video.
4:51 - 4:56

What's the value that's there? So we can go see those run.
4:56 - 5:06

The median rating is three point five.
5:06 - 5:10

If we ask for the quantile at point two, we get 3.0.
5:10 - 5:16

And what this means is that 3.0 is point two of the way across, so 80 percent of the ratings are 3.0 or higher.
5:16 - 5:22

That's on a five-star scale. So think a little bit about why that might be.
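As a rough sketch of how quantile behaves, here is a small made-up series, chosen only so that the numbers come out round. It is not the real rating distribution.

```python
import pandas as pd

# Made-up ratings, already sorted here just so the positions are easy to see.
rating = pd.Series([0.5, 2.0, 3.0, 3.0, 3.5, 3.5, 4.0, 4.0, 4.5, 5.0, 5.0])

median = rating.quantile(0.5)  # halfway along the sorted values
q20 = rating.quantile(0.2)     # one fifth of the way along

# Fraction of values at or above the 0.2 quantile
share_at_or_above = (rating >= q20).mean()

print(median, q20, share_at_or_above)
```

In this toy series, 9 of the 11 values are at or above the 0.2 quantile, which is at least 80 percent.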
5:22 - 5:26

So far we've seen aggregates that work over a single series: we take the series,
5:26 - 5:33

we get one value. But sometimes we want to be able to group and compute aggregates per group.
5:33 - 5:37

So remember, in this data frame,
5:37 - 5:45

the ratings are for movies, and they're provided by users. So rather than just the overall mean rating,
5:45 - 5:50

maybe what we want to do is find the number of ratings per movie.
5:50 - 5:56

This would give us a measure of popularity. We could say, well, the movie that's rated the most frequently is the most popular.
5:56 - 6:02

We could also look at the average rating per movie. And so we can do this with groupby.
6:02 - 6:10

So groupby returns an object that allows us to perform grouped operations on a data frame.
6:10 - 6:17

And so we give it the column name that we want to group by, in this case movie ID. We can group by more than one column at a time.
6:17 - 6:25

We're only doing one for now.
6:25 - 6:31

Then, in this grouping, we're going to say we only want to work on one column.
6:31 - 6:36

Otherwise, it's going to count the ratings and the timestamps and so on, and they're all going to be the same count.
6:36 - 6:40

So we're going to group by movie ID, then we're going to say that within each group
6:40 - 6:46

we only want to work with the rating. And then what we want to do is count it. All of the aggregate
6:46 - 6:52

functions that we've seen before work on a per-group basis as well.
6:52 - 6:59

Do note, though, that we are grouping the whole data frame by movie ID before we select the column.
6:59 - 7:02

If we did it the other way around, and said, okay, select rating, well,
7:02 - 7:06

now we don't have a movie ID to group by, because we've pulled the rating out of the frame.
7:06 - 7:11

This order is important. So we group by movie ID, which is another column in the frame, and we use the rating column.
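The ordering point can be sketched like this, on a tiny made-up `ratings` frame with assumed column names. Grouping first and then selecting works. Selecting `rating` first leaves a bare series with no `movieId` to group by, which in this sketch surfaces as a `KeyError`.

```python
import pandas as pd

# Hypothetical miniature of the ratings table (made-up values, assumed column names).
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 10],
    "rating": [4.0, 3.5, 5.0, 2.0, 3.0],
    "timestamp": [100, 101, 102, 103, 104],
})

# Group the whole frame first, then pick the column to aggregate.
rating_counts = ratings.groupby("movieId")["rating"].count()
rating_means = ratings.groupby("movieId")["rating"].mean()
print(rating_counts)

# The other way around: the selected Series no longer carries movieId.
try:
    ratings["rating"].groupby("movieId").count()
    wrong_order_failed = False
except KeyError:
    wrong_order_failed = True
print(wrong_order_failed)
```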
7:11 - 7:23

So let's see this in action. We want to count the number of ratings per movie, and what it gives us is a series whose index
7:23 - 7:30

is the movie ID, and whose value is the number of ratings for that movie.
7:30 - 7:34

We haven't really seen indexes yet. We haven't really worked with them much yet.
7:34 - 7:39

But that's what it's doing here. We're indexing the data frame. It's resulting in a series that's indexed.
7:39 - 7:48

And this is the thing a series adds on top of a normal NumPy array: we have this index that tells us, oh, this is for movie one,
7:48 - 7:53

this is for movie two, this is for movie two hundred nine thousand one hundred and seventy-one.
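A short sketch of what the index on a grouped result gives you, again on made-up data with an assumed `movieId` column: the values are a plain array, and the index says which movie each value belongs to.

```python
import pandas as pd

# Made-up ratings; only the two columns needed here.
ratings = pd.DataFrame({
    "movieId": [10, 20, 10, 30, 10],
    "rating": [4.0, 3.5, 5.0, 2.0, 3.0],
})

counts = ratings.groupby("movieId")["rating"].count()

print(counts.index.tolist())  # the movie IDs that label each value
print(counts.to_numpy())      # the underlying array of counts
print(counts.loc[10])         # look up a count by movie ID rather than by position
```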
7:53 - 7:57

We can also compute multiple aggregates at the same time.
7:57 - 8:11

So there's an agg method that allows you to specify multiple aggregate functions to call. You specify them by name.
8:11 - 8:18

So here I'm doing the group by that we did before, and then I'm
8:18 - 8:21

calling agg to say I want to aggregate the values of this column,
8:21 - 8:27

but I'm giving it a list of two different aggregation functions, mean and count.
8:27 - 8:39

And when I run this, I get a data frame that's indexed by movie ID, but then it has two columns, and the columns are named after the functions.
8:39 - 8:43

So we have a mean column that's the result of mean, and a count column
8:43 - 8:50

that's the result of count. And because I know that I did this on the rating column, I know these are the mean and the count of the ratings.
8:50 - 8:55

So we can see that movie ID one has a mean rating of three point eight nine,
8:55 - 9:02

and the number of ratings is fifty-seven thousand three hundred and nine.
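A minimal sketch of computing both aggregates at once, on a made-up `ratings` frame with assumed column names. The name `movie_stats` follows the way the result is referred to later in the video.

```python
import pandas as pd

# Made-up ratings; only the two columns needed here.
ratings = pd.DataFrame({
    "movieId": [10, 20, 10, 30, 10],
    "rating": [4.0, 3.5, 5.0, 2.0, 3.0],
})

# One groupby, two aggregates, each named as a string.
movie_stats = ratings.groupby("movieId")["rating"].agg(["mean", "count"])
print(movie_stats)
```

The result is a data frame indexed by movie ID with one column per function.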
9:02 - 9:09

So sometimes you want to sort the data. sort_values will re-sort an entire data frame
9:09 - 9:13

by a specific column. You give it a column name.
9:13 - 9:18

We could re-sort this whole data frame by, say, the number of ratings.
9:18 - 9:22

Sometimes we also want to just get the largest or smallest. Sometimes the reason we want to sort
9:22 - 9:28

is that I want to know the five movies with the most ratings,
9:28 - 9:33

in which case we don't necessarily need to sort the entire thing. nlargest and nsmallest
9:33 - 9:40

let us just get the rows with the n largest or smallest values for a particular column.
9:40 - 9:43

So if I go over and do this.
9:43 - 9:51

So I want to get the 10 movies with the most ratings. I can call nlargest and tell it I want 10, and I want to do it by count.
9:51 - 9:57

And it gives me the 10 movies with the most ratings, sorted in decreasing order of count.
9:57 - 10:07

And we see movie ID three hundred and fifty-six has eighty-one thousand ratings with a mean of 4.05.
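The two approaches can be compared in a small sketch. The `movie_stats` frame below is built by hand with made-up numbers, standing in for the real aggregate result.

```python
import pandas as pd

# Hand-made stand-in for the per-movie statistics (made-up numbers).
movie_stats = pd.DataFrame(
    {"mean": [3.9, 3.2, 4.1, 2.5], "count": [50, 7, 80, 12]},
    index=pd.Index([1, 2, 3, 4], name="movieId"),
)

# Rows with the 2 largest counts, in decreasing order of count
top2 = movie_stats.nlargest(2, "count")

# Same rows by sorting the whole frame and taking the head
top2_sorted = movie_stats.sort_values("count", ascending=False).head(2)

# Row with the smallest count
least_rated = movie_stats.nsmallest(1, "count")

print(top2)
print(least_rated)
```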
10:07 - 10:13

But this doesn't tell us what movie that is. What we can do.
10:13 - 10:19

Remember, we have this movies table too, which gives us the movie titles. We can join the tables together.
10:19 - 10:27

And the simplest way to join is to join on a common index. There's a set_index method that lets you set a column as the index.
10:27 - 10:32

You can also specify columns to join by. We're going to see more of this later. In particular,
10:32 - 10:37

I'm going to make a notebook that walks you through the different indexing operations.
10:37 - 10:42

And you can also read about them in more detail in the textbook.
10:42 - 10:47

But if we want to see this, here's what I'm going to do.
10:47 - 10:54

So I'm going to take our movies frame, which has a movie ID column, and I'm going to join it with movie stats. And movie stats,
10:54 - 11:01

remember, is the result of our aggregate. Its index is the movie ID. And so when I call join,
11:01 - 11:08

I'm going to tell it I want to join movies with movie stats, and I'm going to tell it on movie ID. Movies doesn't have a useful index.
11:08 - 11:15

Its index is just the positions. But when I use the on keyword in join, what it does
11:15 - 11:19

is it tells it to take
11:19 - 11:26

the left table, movies, use its movie ID column, and join it with the index in the other table.
11:26 - 11:35

So movie stats has an index, and it expects the movie ID column in movies to match up with the index in movie stats.
11:35 - 11:41

And so the resulting frame has our title and our genres,
11:41 - 11:45

and then it has the mean and the count for each of these movies.
11:45 - 11:56

So now if I say nlargest of this movie info frame, I see that the most frequently rated movie, with 81,000 ratings, is Forrest Gump.
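The join just described can be sketched as follows. Both frames are hand-made with placeholder titles and made-up numbers, and the column names `movieId`, `title`, and `genres` are assumptions.

```python
import pandas as pd

# Left table: positional index, with movieId as an ordinary column (placeholder titles).
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Comedy", "Drama", "Action"],
})

# Right table: per-movie statistics indexed by movieId (made-up numbers).
movie_stats = pd.DataFrame(
    {"mean": [3.9, 3.2, 4.1], "count": [50, 7, 80]},
    index=pd.Index([1, 2, 3], name="movieId"),
)

# on="movieId" matches the left table's column against the right table's index.
movie_info = movies.join(movie_stats, on="movieId")

most_rated = movie_info.nlargest(1, "count")
print(most_rated)
```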
11:56 - 12:00

So another thing you can do with the movie-level rating statistics we computed, this count,
12:00 - 12:03

this mean: those are just more variables.
12:03 - 12:10

Remember, earlier we talked about how some of the variables you might observe are actually aggregates of other things.
12:10 - 12:17

Well, these are just more variables. So now, if we have an observation of a movie, it has an ID, it has a title,
12:17 - 12:22

it has genres, and it has the number of people who've rated it and the mean rating.
12:22 - 12:31

These also can be aggregated. So in the downloads for this video, you're going to find the notebook that I was just using. For practice,
12:31 - 12:36

what I'd like you to do is to go in and compute the mean number of ratings per movie.
12:36 - 12:38

Maybe do some additional exploration as well.
12:38 - 12:47

But that's going to let you start to see how we can build from these aggregates into additional structures,
12:47 - 12:52

and also emphasize that a data frame is just a data frame.
12:52 - 12:59

We give it meaning in terms of observations, but the fact that a data frame resulted from an aggregate doesn't make it special in any way.
12:59 - 13:03

We can aggregate the results of an aggregate because it's just another data frame.
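As a small illustration of aggregating an aggregate, here is a sketch on made-up data that takes the per-movie counts and then summarizes those counts again. It uses the median and maximum so the practice exercise is left to you.

```python
import pandas as pd

# Made-up ratings; only the two columns needed here.
ratings = pd.DataFrame({
    "movieId": [10, 20, 10, 30, 10],
    "rating": [4.0, 3.5, 5.0, 2.0, 3.0],
})

# First aggregate: one count per movie
per_movie_counts = ratings.groupby("movieId")["rating"].count()

# Second aggregate: summarize the counts themselves
median_count = per_movie_counts.median()
max_count = per_movie_counts.max()

print(median_count, max_count)
```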
13:03 - 13:10

Everything's a data frame or a series in pandas. So to wrap up: aggregates combine a series or array into a single value.
13:10 - 13:14

That's what it means to aggregate. We can do this over an entire series.
13:14 - 13:17

We can also do this on a group-by-group basis
13:17 - 13:27

if we have another column that provides us with grouping information, so we can compute the average, the mean, the sum, or whatever per group.
13:27 - 13:34

You might have this if, for example, you have records of financial transactions. You might want to compute,
13:34 - 13:39

well, what was our total profit in each month?
13:39 - 13:42

So you could group by year, maybe group by month,
13:42 - 13:50

maybe group by year and month, and take a sum of the profit margin on each of your transactions.
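The financial example could look something like this. The `transactions` frame, its `date` and `profit` columns, and the values are all hypothetical, made up just to show grouping by more than one column.

```python
import pandas as pd

# Hypothetical transaction records (made-up dates and profits).
transactions = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03", "2024-01-10"]),
    "profit": [10.0, 5.0, 7.5, 2.0],
})

# Derive the grouping columns from the date.
transactions["year"] = transactions["date"].dt.year
transactions["month"] = transactions["date"].dt.month

# Group by year and month together, then sum the profit within each group.
monthly_profit = transactions.groupby(["year", "month"])["profit"].sum()
print(monthly_profit)
```

Grouping by year and month together keeps January 2023 separate from January 2024, which grouping by month alone would not.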
13:50 - 13:58

And then finally, join combines frames. We can start to put two frames together in order to get context for values.
13:58 - 14:00

We're going to see a lot of other uses for join later.
14:00 - 14:12

But in this context, it lets us get context for understanding what's going on in a value.

Title:: https:/.../4c5b7bbb-dc57-4484-b160-ad9000efd740-94330698-4916-4a28-9b48-ad9101273819.mp4?invocationId=8cd5deeb-2f09-ec11-a9e9-0a1a827ad0ec
Video Language:: English
Duration:: 14:11

janetlayne edited English subtitles for https:/.../4c5b7bbb-dc57-4484-b160-ad9000efd740-94330698-4916-4a28-9b48-ad9101273819.mp4?invocationId=8cd5deeb-2f09-ec11-a9e9-0a1a827ad0ec
