[Script Info] Title: [Events] Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,1\N00:00:04,870 --> 00:00:12,190\NIn this video, I'm going to introduce some of the fundamental structures and principles of doing scientific computing in Python. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,2\N00:00:12,190 --> 00:00:18,070\NIn the last couple of videos, I briefly introduced Python's core structures and core data types. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,3\N00:00:18,070 --> 00:00:23,390\NBut a lot of our work is going to use an additional set of structures, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,4\N00:00:23,390 --> 00:00:28,060\Na set of libraries known as scientific Python or the PyData stack. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,5\N00:00:28,060 --> 00:00:35,500\NSo the learning outcomes of this video are to understand the limitations of core Python data types for data science, to know Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,6\N00:00:35,500 --> 00:00:41,440\Nthree key array data types. We're focusing primarily on the NumPy ndarray. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,7\N00:00:41,440 --> 00:00:48,450\NI'll also briefly introduce Series and DataFrame; we're going to see a lot more about those next week. And then Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,8\N00:00:48,450 --> 00:00:56,390\Nto be able to perform basic vectorized operations. So in Python, we can write a list of numbers like this. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,9\N00:00:56,390 --> 00:01:01,040\NSo numbers equals, and I'm using the list syntax that we talked about in the earlier video. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,10\N00:01:01,040 --> 00:01:07,630\NAnd I've got four numbers in here that I'm storing in this list, in the variable numbers. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,11\N00:01:07,630 --> 00:01:11,260\NNow, this seems like a perfectly natural thing to do. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,12\N00:01:11,260 --> 00:01:17,860\NBut remember, I said in the previous video that everything in Python is an object. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,13\N00:01:17,860 --> 00:01:24,220\NSo this isn't just a list of numbers. If we wrote this in Java or C, we would have an array of numbers, a system array, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,14\N00:01:24,220 --> 00:01:31,180\Nand it just stores the numbers, one after the other. But in Python, that's not how it works, because everything is an object. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,15\N00:01:31,180 --> 00:01:34,960\NWhat our list stores is pointers to numbers. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,16\N00:01:34,960 --> 00:01:50,480\NSo we've got a list, and it's got a pointer to 0.3 and a pointer to 9.2, et cetera. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,17\N00:01:50,480 --> 00:01:58,510\NSo what we store is the list itself, which has these pointers, which are eight bytes each. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,18\N00:01:58,510 --> 00:02:02,650\NAnd it has the numbers themselves. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,19\N00:02:02,650 --> 00:02:08,730\NA floating point, a double-precision floating-point number, takes eight bytes. But the numbers aren't just numbers, they're objects. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,20\N00:02:08,730 --> 00:02:12,870\NAnd every Python object has at least 16 bytes. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,21\N00:02:12,870 --> 00:02:17,610\NThis is all on a 64-bit system: at least 16 bytes of header information. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,22\N00:02:17,610 --> 00:02:24,060\NAnd so this whole list of numbers takes 144 bytes, because the list has a header. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,23\N00:02:24,060 --> 00:02:28,830\NIt has pointers. The pointers point to objects that have headers in addition to the data. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,24\N00:02:28,830 --> 00:02:36,300\NAlso, the elements of a list can be different types. So when you go over the list, there's no guarantee that everything is a number. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,25\N00:02:36,300 --> 00:02:42,510\NSo if we want to sum our numbers, there is a Python function called sum that will do a sum. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,26\N00:02:42,510 --> 00:02:47,550\NBut it's basically doing this. So we'll initialize a variable called total. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,27\N00:02:47,550 --> 00:02:51,400\NWe'll then loop over all of our numbers and we'll add each one to the total. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,28\N00:02:51,400 --> 00:02:57,360\NAnd that's gonna make the total equal the total of the numbers. This works, it works just fine. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,29\N00:02:57,360 --> 00:03:02,310\NAnd for a list of four numbers, it's completely fine. But Python, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,30\N00:03:02,310 --> 00:03:07,770\Nthere's a couple of issues here. One is Python: the language itself is rather slow. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,31\N00:03:07,770 --> 00:03:11,280\NIt's quite convenient, but it's slow, and it's slow for two reasons. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,32\N00:03:11,280 --> 00:03:17,790\NOne is that it is interpreted: the Python code is compiled to an internal data structure, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,33\N00:03:17,790 --> 00:03:24,090\Nbut then there's C code that runs in a loop interpreting that data structure. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,34\N00:03:24,090 --> 00:03:32,310\NIt's also dynamically typed. So remember, I said the values in the list can have different types. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,35\N00:03:32,310 --> 00:03:38,070\NWe wrote a set of numbers there, but Python has no guarantee that they're all numbers. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,36\N00:03:38,070 --> 00:03:41,040\NAnd so rather than saying, okay, I have a number, I'm going to keep adding it, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,37\N00:03:41,040 --> 00:03:45,810\Nwhat it says is, I have a thing, and I'm going to try to add it to the thing I already have. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,38\N00:03:45,810 --> 00:03:50,130\NAnd it has to go look up how to do that, and it does that every time for each number. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,39\N00:03:50,130 --> 00:03:58,900\NThis is all very slow. Also, since it's pointers, if you've taken the computer architecture class, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,40\N00:03:58,900 --> 00:04:06,610\Nthat may ring a few alarm bells for you, because rather than just having an array of numbers, which will be loaded into our cache and very quickly accessed, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,41\N00:04:06,610 --> 00:04:12,280\Nwe have an array of pointers, and for each pointer we have to go off and look up the number in memory. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,42\N00:04:12,280 --> 00:04:16,970\NAnd those numbers might be stored next to each other, but they might be stored all over the heap. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,43\N00:04:16,970 --> 00:04:25,390\NWe're gonna have cache misses, which make this slow process even slower. So we can write code like this and it works fine, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,44\N00:04:25,390 --> 00:04:31,510\Nbut it's not an efficient way to do computation, and it matters as we get to larger and larger data sets. You've got a few hundred, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,45\N00:04:31,510 --> 00:04:37,780\Nyou've got a few thousand numbers, you're gonna be fine. When you've got a million numbers, when you have a hundred million or a billion numbers, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,46\N00:04:37,780 --> 00:04:47,500\Nthen things start to really get slow. So NumPy is a Python package that provides efficient data types for doing numeric computation. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,47\N00:04:47,500 --> 00:04:55,720\NAnd NumPy underlies almost all of the rest of the scientific Python, data science, and machine learning software for Python. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,48\N00:04:55,720 --> 00:05:00,430\NIt has a data type called an ndarray. There's a variety of different ways you can create one, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,49\N00:05:00,430 --> 00:05:06,170\Nbut here we're going to just create one using the array constructor, and we're going to pass it our list. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,50\N00:05:06,170 --> 00:05:14,890\NSo we're creating the list in this case. We are going to see later many ways to load arrays without having to go through a list. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,51\N00:05:14,890 --> 00:05:19,210\NI'm just doing this here so I can demonstrate how the array works. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,52\N00:05:19,210 --> 00:05:24,940\NBut all the elements are of the same type in an array, and they're also stored directly in the array. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,53\N00:05:24,940 --> 00:05:28,480\NSo this ndarray, it just stores the floats, one right after the other, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,54\N00:05:28,480 --> 00:05:35,140\Neight bytes each. And so we don't have the indirection through pointers. We don't have all of the overhead of storing all of these different objects. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,55\N00:05:35,140 --> 00:05:41,170\NIt's just storing the numbers, one right after the other. You can have an ndarray of objects, and that's going to store the pointers. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,56\N00:05:41,170 --> 00:05:46,900\NAnd that's useful in a few cases, especially for treating strings consistently with numbers. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,57\N00:05:46,900 --> 00:05:55,280\NBut it really shines when we're dealing with arrays of numbers for various scientific computing applications. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,58\N00:05:55,280 --> 00:06:01,880\NSo if we want to sum our numbers, we can use the NumPy sum function, and it's much shorter. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,59\N00:06:01,880 --> 00:06:08,340\NAlthough Python has a sum function, as I mentioned, that we could have used. But also, it's implemented in a compiled language. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,60\N00:06:08,340 --> 00:06:14,660\NAnd when you have a NumPy array that's storing numbers, whether they're integers, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,61\N00:06:14,660 --> 00:06:20,840\Nwhether they're floating-point numbers, it's stored internally in a format that's compatible with C or Fortran. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,62\N00:06:20,840 --> 00:06:25,570\NAnd so a lot of NumPy functions, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,63\N00:06:25,570 --> 00:06:29,650\Nwhat they're doing is they're passing the array to C code or Fortran code or Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,64\N00:06:29,650 --> 00:06:36,520\NC++ code that has a compiled loop that works on that data type and is able to very, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,65\N00:06:36,520 --> 00:06:42,070\Nvery efficiently sum up those numbers. We don't have the cache issues from the indirection. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,66\N00:06:42,070 --> 00:06:45,640\NWe don't have the overhead of Python's interpreted code. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,67\N00:06:45,640 --> 00:06:52,810\NWe don't have the overhead of having to deal with the fact that the elements of the array might be of different types. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,68\N00:06:52,810 --> 00:06:59,650\NThey're all the same type. We can work over them in a loop, in compiled Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,69\N00:06:59,650 --> 00:07:05,350\Nmachine code. So in general, don't loop. You can loop over a NumPy ndarray. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,70\N00:07:05,350 --> 00:07:09,370\NIt's iterable just like a list. But in general, you don't want to do that. 
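
As a rough sketch of the comparison described here, the following puts the explicit Python loop next to the NumPy sum. The slide code is not part of this transcript, so the list contents are partly assumed: 0.3 and 9.2 are the two values named in the video, and the other two are made-up placeholders.

```python
import numpy as np

# 0.3 and 9.2 are mentioned in the video; 4.1 and 2.7 are placeholder values.
numbers = [0.3, 9.2, 4.1, 2.7]

# Pure-Python version: the interpreter dispatches every single addition.
total = 0.0
for x in numbers:
    total += x

# NumPy version: the values are copied into one contiguous float64 buffer,
# and np.sum runs its loop over that buffer in compiled code.
arr = np.array(numbers)
np_total = np.sum(arr)

print(total, np_total)
```

For four numbers the two versions are indistinguishable in practice; the difference only becomes noticeable as the array grows to millions of elements.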
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,71\N00:07:09,370 --> 00:07:14,380\NYou want to set up your code so that NumPy can do the looping for you. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,72\N00:07:14,380 --> 00:07:24,970\NAnd effectively, what we wind up using Python as is a scripting language to tell the underlying C, C++ and Fortran code Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,73\N00:07:24,970 --> 00:07:26,500\Nwhat to do. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,74\N00:07:26,500 --> 00:07:38,240\NAnd the fact that Python is a slow language doesn't matter very much, because the vast majority of our processing time won't be spent in Python. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,75\N00:07:38,240 --> 00:07:43,190\NSo NumPy also has a feature called vectorization. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,76\N00:07:43,190 --> 00:07:47,750\NThere are a lot of operations that operate on an entire array at a time. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,77\N00:07:47,750 --> 00:07:52,580\NSo I can create another array with the linspace function here. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,78\N00:07:52,580 --> 00:08:01,180\NThe linspace function here, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,79\N00:08:01,180 --> 00:08:06,370\Nit creates an array of four values that are evenly spaced from zero to one, inclusive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,80\N00:08:06,370 --> 00:08:11,170\NAnd then the plus operator here: remember, plus between two numbers is going to add them; between two strings, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,81\N00:08:11,170 --> 00:08:18,460\Nit's going to concatenate them. Plus between two arrays requires them to be of compatible shapes. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,82\N00:08:18,460 --> 00:08:24,970\NAnd it adds the corresponding elements of the arrays to each other and returns a new array. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,83\N00:08:24,970 --> 00:08:29,500\NSo if we have one array of numbers and we have another array of numbers, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,84\N00:08:29,500 --> 00:08:39,460\Nand we want to add them together, we just add the two arrays, and it does that addition, again, in a loop written in C or Fortran. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,85\N00:08:39,460 --> 00:08:41,740\NAnd it does it very, very quickly. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,86\N00:08:41,740 --> 00:08:49,540\NYou can also add a single number, an integer or a floating-point number, to an array, and it'll add it to every element of the array. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,87\N00:08:49,540 --> 00:08:55,120\NBut this is the key point for making scientific computing with Python fast. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,88\N00:08:55,120 --> 00:09:04,090\NWe set up our code, and throughout we're gonna be trying to set it up, so that we use vectorization as much as possible. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,89\N00:09:04,090 --> 00:09:11,270\NAnd we vectorize over as much data at a time as possible, so we can allow the optimized loops in NumPy, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,90\N00:09:11,270 --> 00:09:21,400\Nin pandas and SciPy and scikit-learn to do the work, and to put as much of the work as possible into those compiled loops, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,91\N00:09:21,400 --> 00:09:28,390\Nso we're not spending a lot of time in slow Python code. Each array has three key things. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,92\N00:09:28,390 --> 00:09:33,740\NIt has a data type, called a dtype, and that says what kind of elements are in the array. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,93\N00:09:33,740 --> 00:09:39,710\NNumPy has data types for your standard integers of various sizes, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,94\N00:09:39,710 --> 00:09:44,870\Nsingle- and double-precision floating-point numbers. It also has dtypes for working with Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,95\N00:09:44,870 --> 00:09:50,080\Ndates, date-times, strings, and then storing arrays Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,96\N00:09:50,080 --> 00:09:53,000\Nthat store pointers to arbitrary Python objects. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,97\N00:09:53,000 --> 00:09:59,720\NThe array also has a shape, which is a tuple of integers that says how big the array is. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,98\N00:09:59,720 --> 00:10:04,700\NThe array may be multidimensional. So ndarray stands for N-dimensional array. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,99\N00:10:04,700 --> 00:10:15,580\NAnd it can be one, two, three, four, whatever dimensional. So if we have a 100 by 50 matrix, it's stored in a NumPy ndarray of shape Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,100\N00:10:15,580 --> 00:10:19,730\N(100, 50). And then there's the data itself. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,101\N00:10:19,730 --> 00:10:26,420\NThat's the elements of the array, the data points themselves that are stored in the array. 
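
To make the three pieces (dtype, shape, data) concrete, here is a small sketch that builds the 100 by 50 matrix mentioned above and inspects it. The variable names and the choice of a zero-filled matrix are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# A 100-by-50 matrix; np.zeros defaults to double-precision floats.
matrix = np.zeros((100, 50))

print(matrix.dtype)   # float64
print(matrix.shape)   # (100, 50)
print(matrix.nbytes)  # 40000, i.e. 100 * 50 elements at 8 bytes each

# The dtype can be chosen explicitly, e.g. 32-bit integers (4 bytes each).
ints = np.array([1, 2, 3], dtype=np.int32)
print(ints.dtype, ints.itemsize)
```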
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,102\N00:10:26,420 --> 00:10:33,170\NSo then pandas, which we're going to see next week, builds on top of arrays with two new data types. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,103\N00:10:33,170 --> 00:10:38,960\NA Series is an array with an associated index that allows us to look things up. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,104\N00:10:38,960 --> 00:10:48,260\NSo an ndarray, like a Python list, is indexed using numbers starting from zero: zero, one, two, three, four, five. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,105\N00:10:48,260 --> 00:10:55,850\NBut a lot of times we're gonna have some other natural index. If you've taken databases, it's equivalent to the primary key. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,106\N00:10:55,850 --> 00:11:00,150\NSo a Series is an array with an associated index that might be other numbers, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,107\N00:11:00,150 --> 00:11:04,580\Nthat might be strings, but some other way of accessing the points. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,108\N00:11:04,580 --> 00:11:12,470\NIt also has an efficient representation, so that you can have a Series that's indexed zero through N minus one, where N is the length of the series, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,109\N00:11:12,470 --> 00:11:18,800\Nand it does not take up a lot of space to do that. And then a DataFrame is a table where each column is a Series, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,110\N00:11:18,800 --> 00:11:25,760\Nand they all share the same index. And we're gonna see those a lot, because when we load in a set of data points, that's gonna be in a DataFrame. 
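
A minimal sketch of the Series and DataFrame ideas described here, assuming pandas is installed. The names, heights, and ages are invented purely for illustration.

```python
import pandas as pd

# A Series with a natural (string) index instead of positions 0, 1, 2.
heights = pd.Series([1.6, 1.8, 1.7], index=["ann", "bob", "cat"])
print(heights["bob"])  # look up by index label

# Without an explicit index, pandas uses a compact 0..N-1 RangeIndex.
plain = pd.Series([10, 20, 30])
print(plain.index)

# A DataFrame: each column is a Series, and all columns share one index.
ages = pd.Series([30, 25, 41], index=["ann", "bob", "cat"])
people = pd.DataFrame({"height": heights, "age": ages})
print(people.loc["cat", "age"])
```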
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,111\N00:11:25,760 --> 00:11:30,110\NNow, in Assignment 0, you're going to briefly see both of these data structures. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,112\N00:11:30,110 --> 00:11:35,930\NI walk you through everything you have to do with them in Assignment 0. And we're going to introduce them a lot more Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,113\N00:11:35,930 --> 00:11:39,560\Nwhen we're talking about how to describe data next week. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,114\N00:11:39,560 --> 00:11:47,920\NBut ndarray, the NumPy array data structure, is the fundamental core that all of these others are built on. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,115\N00:11:47,920 --> 00:11:55,740\NThe Series augments it with an index. The DataFrame collects multiple Series together with column names, like a spreadsheet table. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,116\N00:11:55,740 --> 00:12:00,950\NSo we're still going to sometimes use Python native lists and loops. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,117\N00:12:00,950 --> 00:12:05,360\NOftentimes, it's going to be because, for some reason, we need a list of arrays or data frames. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,118\N00:12:05,360 --> 00:12:07,310\NAlso, if we need to loop, if we have, say, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,119\N00:12:07,310 --> 00:12:15,950\N20 input files that we need to put together to be our data set, or we've got different groups of data, we're going to loop over those. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,120\N00:12:15,950 --> 00:12:20,390\NBut the big thing we avoid doing is looping over individual data points. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,121\N00:12:20,390 --> 00:12:24,450\NWe load in a few hundred thousand records. They're going to be in a DataFrame. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,122\N00:12:24,450 --> 00:12:33,470\NWe don't loop over the rows of a DataFrame if we can avoid it, because there's almost always a more efficient way to do that computation, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,123\N00:12:33,470 --> 00:12:41,000\Nthat pushes a lot of it into the C and C++ code and Fortran code that underlies NumPy, pandas, et cetera. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,124\N00:12:41,000 --> 00:12:46,070\NSo to wrap up: NumPy provides efficient array data structures that are more memory-compact. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,125\N00:12:46,070 --> 00:12:50,690\NThey don't take up nearly as much space, and they're also much more efficient to compute over. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,126\N00:12:50,690 --> 00:12:54,320\NThese are going to be the backbone of our data processing throughout the rest of the class. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,127\N00:12:54,320 --> 00:13:04,250\NAnd we want to prefer vectorized operations that perform these loops in native compiled machine code whenever possible. For a little bit of practice, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,128\N00:13:04,250 --> 00:13:09,320\NI encourage you to take the example code from these slides and go and try it Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,129\N00:13:09,320 --> 00:13:13,880\Nin a notebook, so you can get a little more practice creating notebooks and running code. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,130\N00:13:13,880 --> 00:13:20,967\NI will see you in class. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,
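
For anyone trying the vectorized operations from the video in a notebook, a starting point might look like the sketch below. The slides themselves are not reproduced in this transcript, so the array values are partly assumed (0.3 and 9.2 are named in the video; the rest are placeholders), as are the variable names.

```python
import numpy as np

# 0.3 and 9.2 are mentioned in the video; 4.1 and 2.7 are placeholder values.
numbers = np.array([0.3, 9.2, 4.1, 2.7])

# Four evenly spaced values from 0 to 1, endpoints included: 0, 1/3, 2/3, 1.
steps = np.linspace(0, 1, 4)

# Array + array: element-by-element addition, returning a new array.
summed = numbers + steps

# Array + single number: the number is added to every element.
shifted = numbers + 10

print(summed)
print(shifted)
```

Neither addition involves a Python-level loop; the element-wise work happens inside NumPy's compiled code.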