1 00:00:00,000 --> 00:00:04,200 So welcome back. Today we are gonna cover debugging and profiling. 2 00:00:04,720 --> 00:00:09,340 Before I get into it we're gonna make another reminder to fill in the survey. 3 00:00:09,520 --> 00:00:14,580 Just one of the main things we want to get from you is questions, because the last day 4 00:00:14,820 --> 00:00:18,080 is gonna be questions from you guys: about things that 5 00:00:18,080 --> 00:00:22,020 we haven't covered, or like you want us to kind of talk more in depth. 6 00:00:23,350 --> 00:00:26,969 The more questions we get, the more interesting we can make that section, 7 00:00:26,970 --> 00:00:28,900 so please go on and fill in the survey. 8 00:00:28,900 --> 00:00:35,660 So today's lecture is gonna be a lot of topics. All the topics revolve around the concept of 9 00:00:35,820 --> 00:00:39,920 what do you do when you have a program that has some bugs. 10 00:00:39,920 --> 00:00:42,520 Which is most of the time, like when you are programming, you're kind of thinking 11 00:00:42,720 --> 00:00:47,400 about how you implement something and there's like a half life of fixing all the issues that 12 00:00:47,620 --> 00:00:52,140 that program has. And even if your program behaves like you want, it might be that it's 13 00:00:52,390 --> 00:00:55,680 really slow, or it's taking a lot of resources in the process. 14 00:00:55,680 --> 00:01:00,569 So today we're gonna see a lot of different approaches of dealing with these problems. 15 00:01:01,300 --> 00:01:05,099 So first, the first section is on debugging. 16 00:01:06,159 --> 00:01:08,279 Debugging can be done in many different ways, 17 00:01:08,380 --> 00:01:10,119 there are all kinds of... 18 00:01:10,120 --> 00:01:13,640 The most simple approach that, pretty much, all 19 00:01:13,640 --> 00:01:17,140 CS students will go through, will be just: you have some code, and it's not behaving 20 00:01:17,160 --> 00:01:20,280 like you want, so you probe the code by adding 21 00:01:20,280 --> 00:01:23,420 print statements. This is called "printf debugging" and 22 00:01:23,440 --> 00:01:24,450 it works pretty well. 23 00:01:24,450 --> 00:01:26,680 Like, I have to be honest, 24 00:01:26,820 --> 00:01:33,120 I use it a lot of the time because of how simple to set up and how quick the feedback can be. 25 00:01:34,360 --> 00:01:39,320 One of the issues with printf debugging is that you can get a lot of output 26 00:01:39,320 --> 00:01:40,740 and maybe you don't want 27 00:01:40,800 --> 00:01:43,240 to get as much output as you're getting. 28 00:01:43,780 --> 00:01:49,349 There has... people have thought of slightly more complex ways of doing printf debugging and 29 00:01:53,920 --> 00:01:58,320 one of these ways is what is usually referred to as "logging". 30 00:01:58,420 --> 00:02:04,530 So the advantage of doing logging versus doing printf debugging is that, when you're creating logs, 31 00:02:05,080 --> 00:02:09,780 you're not necessarily creating the logs because there's a specific issue you want to fix; 32 00:02:09,780 --> 00:02:12,460 it's mostly because you have built a 33 00:02:12,480 --> 00:02:16,840 more complex software system and you want to log when some events happen. 34 00:02:17,360 --> 00:02:21,560 One of the core advantages of using a logging library is that 35 00:02:22,180 --> 00:02:27,040 you can can define severity levels, and you can filter based on those. 36 00:02:27,400 --> 00:02:31,620 Let's see an example of how we can do something like that. 37 00:02:32,320 --> 00:02:35,840 Yeah, everything fits here. This is a really silly example: 38 00:02:36,340 --> 00:02:37,520 We're just gonna 39 00:02:37,520 --> 00:02:40,980 sample random numbers and, depending on the value of the number, 40 00:02:41,120 --> 00:02:44,720 that we can interpret as a kind of "how wrong things are going". 41 00:02:44,740 --> 00:02:48,760 We're going to log the value of the number and then 42 00:02:49,340 --> 00:02:51,640 we can see what is going on. 43 00:02:52,580 --> 00:02:59,280 I need to disable these formatters... 44 00:02:59,620 --> 00:03:03,720 And if we were just to execute the code as it is, 45 00:03:04,160 --> 00:03:07,420 we just get the output and we just keep getting more and more output. 46 00:03:07,420 --> 00:03:13,599 But you have to kind of stare at it and make sense of what is going on, and we don't know 47 00:03:13,600 --> 00:03:19,629 what is the relative timing between printfs, we don't really know whether this is just an information message 48 00:03:19,630 --> 00:03:22,960 or a message of whether something went wrong. 49 00:03:23,810 --> 00:03:25,810 If we just go in, 50 00:03:27,320 --> 00:03:29,780 and undo, not that one... 51 00:03:34,220 --> 00:03:37,140 That one, we can set that formatter. 52 00:03:38,620 --> 00:03:41,600 Now the output looks something more like this 53 00:03:41,620 --> 00:03:44,840 So for example, if you have several different modules that you are programming with, 54 00:03:44,840 --> 00:03:46,940 you can identify them with like different levels. 55 00:03:46,940 --> 00:03:49,800 Here, we have, we have debug levels, 56 00:03:50,330 --> 00:03:51,890 we have critical 57 00:03:51,890 --> 00:03:57,540 info, different levels. And it might be handy because here we might only care about the error messages. 58 00:03:57,740 --> 00:04:00,640 Like those are like, the... We have been 59 00:04:00,700 --> 00:04:03,960 working on our code, so far so good, and suddenly we get some error. 60 00:04:03,960 --> 00:04:06,540 We can log that to identify where it's happening. 61 00:04:06,580 --> 00:04:11,640 But maybe there's a lot of information messages, but we can deal with that 62 00:04:12,709 --> 00:04:16,809 by just changing the level to error level. 63 00:04:17,400 --> 00:04:17,900 And 64 00:04:18,890 --> 00:04:22,960 now if we were to run this again, we are only going to get those 65 00:04:23,620 --> 00:04:28,160 errors in the output, and we can just look through those to make sense of what is going on. 66 00:04:28,920 --> 00:04:33,320 Another really useful tool when you're dealing with logs is 67 00:04:34,130 --> 00:04:36,670 As you kind of look at this, 68 00:04:36,670 --> 00:04:42,580 it has become easier because now we have this critical and error levels that we can quickly identify. 69 00:04:43,310 --> 00:04:46,750 But since humans are fairly visual creatures, 70 00:04:48,680 --> 00:04:53,109 one thing that you can do is use colors from your terminal to 71 00:04:53,630 --> 00:04:57,369 identify these things. So now, changing the formatter, 72 00:04:57,369 --> 00:05:03,320 what I've done is slightly change how the output is formatted. 73 00:05:03,580 --> 00:05:09,340 When I do that, now whenever I get a warning message, it's color coded by yellow; 74 00:05:09,340 --> 00:05:10,880 whenever I get like an error, 75 00:05:10,960 --> 00:05:16,140 faded red; and when it's critical, I have a bold red indicating something went wrong. 76 00:05:16,280 --> 00:05:22,620 And here it's a really short output, but when you start having thousands and thousands of lines of log, 77 00:05:22,620 --> 00:05:26,380 which is not unrealistic and happens every single day in a lot of apps, 78 00:05:27,140 --> 00:05:32,500 quickly browsing through them and identifying where the error or the red patches are 79 00:05:32,600 --> 00:05:35,320 can be really useful. 80 00:05:35,600 --> 00:05:41,400 A quick aside is, you might be curious about how the terminal is displaying these colors. 81 00:05:41,580 --> 00:05:45,320 At the end of the day, the terminal is only outputting characters. 82 00:05:47,160 --> 00:05:49,480 Like, how is this program or how are other programs, like LS, 83 00:05:50,060 --> 00:05:56,050 that has all these fancy colors. How are they telling the terminal that it should use these different colors? 84 00:05:56,360 --> 00:05:58,779 This is nothing extremely fancy, 85 00:05:59,440 --> 00:06:03,440 what these tools are doing, is something along these lines. 86 00:06:03,740 --> 00:06:04,540 Here we have... 87 00:06:05,420 --> 00:06:08,340 I can clear the rest of the output, so we can focus on this. 88 00:06:08,660 --> 00:06:14,000 There's some special characters, some escape characters here, 89 00:06:14,260 --> 00:06:19,740 then we have some text and then we have some other special characters. And if we execute this line 90 00:06:19,940 --> 00:06:22,360 we get a red "This is red". 91 00:06:22,480 --> 00:06:26,640 And you might have picked up on the fact that we have a "255;0;0" here, 92 00:06:26,720 --> 00:06:31,400 this is just telling the RGB values of the color we want in the terminal. 93 00:06:31,400 --> 00:06:38,100 And you pretty much can do this in any piece of code that you have, and like that you can color code the output. 94 00:06:38,100 --> 00:06:42,540 Your terminal is fairly fancy and supports a lot of different colors in the output. 95 00:06:42,550 --> 00:06:45,400 This is not even all of them, this is like a sixteenth of them. 96 00:06:46,100 --> 00:06:49,119 I think it can be fairly useful to know about that. 97 00:06:52,100 --> 00:06:55,960 Another thing is maybe you don't enjoy or you don't think 98 00:06:56,200 --> 00:06:58,620 logs are really fit for you. 99 00:06:58,620 --> 00:07:02,480 The thing is a lot of other systems that you might start using will use logs. 100 00:07:02,840 --> 00:07:05,360 As you start building larger and larger systems, 101 00:07:05,360 --> 00:07:10,140 you might rely on other dependencies. Common dependencies might be web servers or 102 00:07:10,220 --> 00:07:12,320 databases, it's a really common one. 103 00:07:12,440 --> 00:07:17,740 And those will be logging their errors or exceptions in their own logs. 104 00:07:17,740 --> 00:07:20,540 Of course, you will get some client-side error, 105 00:07:20,620 --> 00:07:25,140 but those sometimes are not informative enough for you to figure out what is going on. 106 00:07:25,900 --> 00:07:33,940 In most UNIX systems, the logs are usually placed under a folder called "/var/log" 107 00:07:33,940 --> 00:07:37,980 and if we list it, we can see there's a bunch of logs in here. 108 00:07:42,680 --> 00:07:48,040 So we have like the shutdown monitor log, or some weekly logs. 109 00:07:49,669 --> 00:07:56,199 Things related to the Wi-Fi, for example. And if we output the 110 00:07:57,560 --> 00:08:00,840 System log, which contains a lot of information about the system, 111 00:08:00,840 --> 00:08:03,940 we can get information about what's going on. 112 00:08:04,120 --> 00:08:06,780 Similarly, there are tools that will let you 113 00:08:07,460 --> 00:08:13,090 more sanely go through this output. But here, looking at the system log, 114 00:08:13,090 --> 00:08:15,520 I can look at this, and say: 115 00:08:15,760 --> 00:08:20,040 oh there's some service that is exiting with some abnormal code 116 00:08:20,420 --> 00:08:25,460 and based on that information, I can go and try to figure out what's going on, 117 00:08:25,510 --> 00:08:27,500 like what's going wrong. 118 00:08:29,020 --> 00:08:32,000 One thing to know when you're working with logs is that 119 00:08:32,000 --> 00:08:35,900 more traditionally, every software had their own 120 00:08:35,920 --> 00:08:42,540 log, but it has been increasingly more popular to have a unified system log where everything is placed. 121 00:08:43,010 --> 00:08:49,299 Pretty much any application can log into the system log, but instead of being in a plain text format, 122 00:08:49,300 --> 00:08:52,380 it will be compressed in some special format. 123 00:08:52,380 --> 00:08:56,460 An example of this, it was what we covered in the data wrangling lecture. 124 00:08:56,520 --> 00:08:59,900 In the data wrangling lecture we were using the "journalctl", 125 00:09:00,200 --> 00:09:04,280 which is accessing the log and outputting all that output. 126 00:09:04,340 --> 00:09:07,380 Here in Mac, now the command is "log show", 127 00:09:07,380 --> 00:09:10,020 which will display a lot of information. 128 00:09:10,100 --> 00:09:15,760 I'm gonna just display the last ten seconds, because logs are really, really verbose and 129 00:09:17,060 --> 00:09:23,720 just displaying the last 10 seconds is still gonna output a fairly large amount of lines. 130 00:09:23,900 --> 00:09:28,240 So if we go back through what's going on, 131 00:09:28,240 --> 00:09:33,460 we here see that a lot of Apple things are going on, since this is a macbook. 132 00:09:33,500 --> 00:09:38,460 Maybe we could find errors about like some system issue here. 133 00:09:39,280 --> 00:09:46,920 Again they're fairly verbose, so you might want to practice your data wrangling techniques here, 134 00:09:46,920 --> 00:09:50,440 like 10 seconds equal to like 500 lines of logs, so you can kind of 135 00:09:50,960 --> 00:09:54,960 get an idea of how many lines per second you're getting. 136 00:09:56,360 --> 00:10:01,060 They're not only useful for figuring out some other programs' output, 137 00:10:01,060 --> 00:10:05,619 they're also useful for you, if you want to log there instead of into your own file. 138 00:10:05,779 --> 00:10:11,319 So using the "logger" command, in both linux and mac, 139 00:10:11,839 --> 00:10:13,480 You can say okay 140 00:10:13,480 --> 00:10:18,880 I'm gonna log this "Hello Logs" into this system log. 141 00:10:18,880 --> 00:10:21,939 We execute the command and then 142 00:10:22,760 --> 00:10:27,640 we can check by going through the last minute of logs, 143 00:10:27,640 --> 00:10:31,760 since it's gonna be fairly recent, and grepping for that "Hello" 144 00:10:31,760 --> 00:10:38,260 we find our entry. Fairly recent entry, that we just created that said "Hello Logs". 145 00:10:39,220 --> 00:10:46,840 As you become more and more familiar with these tools, you will find yourself using 146 00:10:48,800 --> 00:10:51,279 the logs more and more often, since 147 00:10:51,529 --> 00:10:56,349 even if you have some bug that you haven't detected, and the program has been running for a while, 148 00:10:56,349 --> 00:11:02,240 maybe the information is already in the log and can tell you enough to figure out what is going on. 149 00:11:02,800 --> 00:11:08,260 However, printf debugging is not everything. So now I'm going to be covering debuggers. 150 00:11:08,260 --> 00:11:10,380 But first any questions on logs so far? 151 00:11:11,720 --> 00:11:15,040 So what kind of things can you figure out from the logs? 152 00:11:15,040 --> 00:11:18,800 like this Hello Logs says that you did something with Hello at that time? 153 00:11:18,940 --> 00:11:25,040 Yeah, like say, for example, I can write a bash script that detects... 154 00:11:25,060 --> 00:11:29,480 Well, that checks every time what Wi-Fi network I'm connected to. 155 00:11:29,480 --> 00:11:34,150 And every time it detects that it has changed, it makes an entry in the logs and says 156 00:11:34,150 --> 00:11:37,440 Oh now it looks like we have changed Wi-Fi networks. 157 00:11:37,440 --> 00:11:41,400 and then you might go back and parse through the logs and take like, okay 158 00:11:41,510 --> 00:11:47,559 When did my computer change from one Wi-Fi network to another. And this is just kind of a simple example 159 00:11:47,560 --> 00:11:50,260 But there are many, many ways, 160 00:11:50,660 --> 00:11:54,020 many types of information that you could be logging here. 161 00:11:54,020 --> 00:11:59,040 More commonly, you will probably want to check if your computer, for example, is 162 00:11:59,100 --> 00:12:02,540 entering sleep, for example, for some unknown reason. 163 00:12:02,680 --> 00:12:04,660 Like it's on hibernation mode. 164 00:12:04,820 --> 00:12:09,100 There's probably some information in the logs about who asked that to happen, 165 00:12:09,100 --> 00:12:10,240 or why it's that happening. 166 00:12:11,720 --> 00:12:14,880 Any other questions? Okay. 167 00:12:14,880 --> 00:12:17,380 So when printf debugging is not enough, 168 00:12:18,320 --> 00:12:22,360 the best alternative after that is using... 169 00:12:23,360 --> 00:12:25,360 [Exit that] 170 00:12:28,480 --> 00:12:30,260 So, it's using a debugger. 171 00:12:30,580 --> 00:12:37,620 So a debugger is a tool that will wrap around your code and will let you run your code, 172 00:12:38,120 --> 00:12:40,480 but it will kind of keep control over it. 173 00:12:40,480 --> 00:12:42,500 So it will let you step 174 00:12:42,500 --> 00:12:47,080 through the code and execute it and set breakpoints. 175 00:12:47,080 --> 00:12:50,020 You probably have seen debuggers in some way, if you have 176 00:12:50,020 --> 00:12:55,800 ever used something like an IDE, because IDEs have this kind of fancy: set a breakpoint here, execute, ... 177 00:12:56,080 --> 00:12:59,040 But at the end of the day what these tools are using is just 178 00:12:59,040 --> 00:13:04,740 these command line debuggers and they're just presenting them in a really fancy format. 179 00:13:04,850 --> 00:13:09,969 Here we have a completely broken bubble sort, a simple sorting algorithm. 180 00:13:10,000 --> 00:13:11,560 Don't worry about the details, 181 00:13:11,560 --> 00:13:14,980 but we just want to sort this array that we have here. 182 00:13:17,360 --> 00:13:19,460 We can try doing that by just doing 183 00:13:21,340 --> 00:13:23,340 Python bubble.py 184 00:13:23,500 --> 00:13:28,360 And when we do that... Oh there's some index error, list index out of range. 185 00:13:28,480 --> 00:13:31,200 We could start adding prints 186 00:13:31,200 --> 00:13:33,740 but if have a really long string, we can get a lot of information. 187 00:13:33,820 --> 00:13:37,820 So how about we go up to the moment that we crashed? 188 00:13:37,900 --> 00:13:41,020 We can go to that moment and examine what the 189 00:13:41,020 --> 00:13:43,360 current state of the program was. 190 00:13:43,520 --> 00:13:49,080 So for doing that I'm gonna run the program using the Python debugger. 191 00:13:49,080 --> 00:13:53,820 Here I'm using technically the ipython debugger, just because it has nice coloring syntax 192 00:13:54,060 --> 00:13:59,140 so it's probably easier for both of us to understand 193 00:13:59,300 --> 00:14:01,300 what's going on in the output. 194 00:14:01,310 --> 00:14:04,929 But they're pretty much identical anyway. 195 00:14:05,140 --> 00:14:09,400 So we execute this, and now we are given a prompt 196 00:14:09,400 --> 00:14:13,080 where we're being told that we are here, at the very first line of our program. 197 00:14:13,100 --> 00:14:15,440 And we can... 198 00:14:15,980 --> 00:14:20,380 "L" stands for "List", so as with many of these tools 199 00:14:21,140 --> 00:14:24,400 there's kind of like a language of operations that you can do, 200 00:14:24,400 --> 00:14:28,220 and they are often mnemonic, as it was the case with VIM or TMUX. 201 00:14:28,860 --> 00:14:32,940 So here, "L" is for "Listing" the code, and we can see the entire code. 202 00:14:34,540 --> 00:14:38,880 "S" is for "Step" and will let us kind of one 203 00:14:38,880 --> 00:14:42,180 line at a time, go through the execution. 204 00:14:42,300 --> 00:14:47,360 The thing is we're only triggering the error some time later. 205 00:14:47,360 --> 00:14:48,710 So 206 00:14:48,710 --> 00:14:55,150 we can restart the program and instead of trying to step until we get to the issue, 207 00:14:55,150 --> 00:15:00,820 we can just ask for the program to continue which is the "C" command and 208 00:15:01,480 --> 00:15:04,160 hey, we reached the issue. 209 00:15:04,640 --> 00:15:08,080 We got to this line where everything crashed, 210 00:15:08,080 --> 00:15:11,020 we're getting this list index out of range. 211 00:15:11,020 --> 00:15:13,560 And now that we are here we can say, huh? 212 00:15:14,120 --> 00:15:17,520 Okay, first, let's print the value of the array. 213 00:15:18,080 --> 00:15:21,520 This is the value of the current array 214 00:15:23,120 --> 00:15:26,840 So we have six items. Okay. What is the value of "J" here? 215 00:15:27,200 --> 00:15:31,929 So we look at the value of "J". "J" is 5 here, which will be the last element, but 216 00:15:32,480 --> 00:15:37,119 "J" plus 1 is going to be 6, so that's triggering the out of bounds error. 217 00:15:37,970 --> 00:15:40,389 So what we have to do is 218 00:15:40,660 --> 00:15:47,660 this "N", instead of "N" has to be "N minus one". We have identified that the error lies there. 219 00:15:47,660 --> 00:15:50,800 So we can quit, which is "Q". 220 00:15:52,010 --> 00:15:54,729 Again, because it's a post-mortem debugger. 221 00:15:56,090 --> 00:16:00,219 We go back to the code and say okay, 222 00:16:02,860 --> 00:16:06,180 we need to append this "N minus one". 223 00:16:06,760 --> 00:16:11,140 That will prevent the list index out of range and 224 00:16:11,480 --> 00:16:14,260 if we run this again without the debugger, 225 00:16:15,020 --> 00:16:18,729 okay, no errors now. But this is not our sorted list. 226 00:16:18,729 --> 00:16:21,200 This is sorted, but it's not our list. 227 00:16:21,300 --> 00:16:23,000 We are missing entries from our list, 228 00:16:23,160 --> 00:16:27,420 so there is some behavioral issue that we're reaching here. 229 00:16:27,920 --> 00:16:32,409 Again, we could start using printf debugging but kind of a hunch now 230 00:16:32,409 --> 00:16:37,940 is that probably the way we're swapping entries in the bubble sort program is wrong. 231 00:16:38,480 --> 00:16:45,920 We can use the debugger for this. We can go through them to the moment we're doing a swap and 232 00:16:46,120 --> 00:16:48,320 check how the swap is being performed. 233 00:16:48,540 --> 00:16:50,600 So a quick overview, 234 00:16:50,600 --> 00:16:56,590 we have two for loops and in the most nested loop, 235 00:16:56,720 --> 00:17:03,220 we are checking if the array is larger than the other array. The thing is if we just try to execute until this line, 236 00:17:03,589 --> 00:17:06,609 it's only going to trigger whenever we make a swap. 237 00:17:06,700 --> 00:17:11,640 So what we can do is we can set a breakpoint in the sixth line. 238 00:17:11,820 --> 00:17:15,520 We can create a breakpoint in this line and then 239 00:17:15,580 --> 00:17:20,820 the program will execute and the moment we try to swap variables is when the program is going to stop. 240 00:17:21,080 --> 00:17:22,940 So we create a breakpoint there 241 00:17:22,940 --> 00:17:27,000 and then we continue the execution of the program. The program halts 242 00:17:27,000 --> 00:17:30,520 and says hey, I have executed and I have reached this line. 243 00:17:30,820 --> 00:17:31,860 Now 244 00:17:31,920 --> 00:17:39,120 I can use "locals()", which is a Python function that returns a dictionary with all the values 245 00:17:39,120 --> 00:17:41,220 to quickly see the entire context. 246 00:17:43,100 --> 00:17:48,140 The string, the array is fine and is six, again, just the beginning and 247 00:17:48,680 --> 00:17:51,100 I step, go to the next line. 248 00:17:51,780 --> 00:17:52,620 Oh, 249 00:17:52,620 --> 00:17:57,000 and I identify the issue: I'm swapping one item at a time, instead of simultaneously, 250 00:17:57,020 --> 00:18:01,840 so that's what's triggering the fact that we're losing variables as we go through. 251 00:18:03,200 --> 00:18:06,729 That's kind of a very simple example, but 252 00:18:07,490 --> 00:18:09,050 debuggers are really powerful. 253 00:18:09,050 --> 00:18:13,320 Most programming languages will give you some sort of debugger, 254 00:18:13,540 --> 00:18:19,920 and when you go to more low level debugging you might run into tools like... 255 00:18:19,920 --> 00:18:21,920 You might want to use something like 256 00:18:25,340 --> 00:18:27,340 GDB. 257 00:18:31,580 --> 00:18:34,360 And GDB has one nice property: 258 00:18:34,460 --> 00:18:37,740 GDB works really well with C/C++ and all these C-like languages. 259 00:18:37,780 --> 00:18:42,720 But GDB actually lets you work with pretty much any binary that you can execute. 260 00:18:42,720 --> 00:18:47,800 So for example here we have sleep, which is just a program that's going to sleep for 20 seconds. 261 00:18:48,520 --> 00:18:55,340 It's loaded and then we can do run, and then we can interrupt this sending an interrupt signal. 262 00:18:55,340 --> 00:19:02,020 And GDB is displaying for us, here, very low-level information about what's going on in the program. 263 00:19:02,030 --> 00:19:06,820 So we're getting the stack trace, we're seeing we are in this nanosleep function, 264 00:19:07,060 --> 00:19:11,660 we can see the values of all the hardware registers in your machine. So 265 00:19:12,300 --> 00:19:17,160 you can get a lot of low-level detail using these tools. 266 00:19:18,560 --> 00:19:22,520 I think that's all I want to cover for debuggers. 267 00:19:22,520 --> 00:19:25,540 Any questions related to that? 268 00:19:33,520 --> 00:19:39,040 Another interesting tool when you're trying to debug is that sometimes you want to debug as if 269 00:19:39,480 --> 00:19:42,220 your program is a black box. 270 00:19:42,220 --> 00:19:46,059 So you, maybe, know what the internals of the program but at the same time 271 00:19:46,430 --> 00:19:52,119 your computer knows whenever your program is trying to do some operations. 272 00:19:52,280 --> 00:19:54,729 So this is in UNIX systems, 273 00:19:54,760 --> 00:19:58,060 there's this notion of like user level code and kernel level code. 274 00:19:58,060 --> 00:20:03,180 And when you try to do some operations like reading a file or like reading the network connection 275 00:20:03,340 --> 00:20:06,020 you will have to do something called system calls. 276 00:20:06,180 --> 00:20:12,560 You can get a program and go through those operations and ask 277 00:20:14,000 --> 00:20:18,300 what operations did this software do? 278 00:20:18,300 --> 00:20:20,920 So for example, if you have like a Python function 279 00:20:20,980 --> 00:20:26,660 that is only supposed to do a mathematical operation and you run it through this program, 280 00:20:26,660 --> 00:20:28,460 and it's actually reading files, 281 00:20:28,460 --> 00:20:31,940 Why is it reading files? It shouldn't be reading files. So, let's see. 282 00:20:34,520 --> 00:20:37,200 This is "strace". 283 00:20:37,200 --> 00:20:38,740 So for example, we can do it something like this. 284 00:20:38,740 --> 00:20:41,260 So here we're gonna run the "LS - L" 285 00:20:42,220 --> 00:20:47,900 And then we're ignoring the output of LS, but we are not ignoring the output of STRACE. 286 00:20:47,900 --> 00:20:49,740 So if we execute that... 287 00:20:52,300 --> 00:20:54,720 We're gonna get a lot of output. 288 00:20:54,920 --> 00:20:58,740 This is all the different system calls 289 00:21:00,520 --> 00:21:02,080 That this 290 00:21:02,090 --> 00:21:07,510 LS has executed. You will see a bunch of OPEN, you will see FSTAT. 291 00:21:08,150 --> 00:21:14,170 And for example, since it has to list all the properties of the files that are in this folder, we can 292 00:21:15,110 --> 00:21:20,410 check for the LSTAT call. So the LSTAT call will check for the properties of the files and 293 00:21:21,020 --> 00:21:27,420 we can see that, effectively, all the files and folders that are in this directory 294 00:21:27,700 --> 00:21:31,540 have been accessed through a system call, through LS. 295 00:21:34,120 --> 00:21:43,400 Interestingly, sometimes you actually don't need to run your code to 296 00:21:44,360 --> 00:21:47,000 figure out that there is something wrong with your code. 297 00:21:47,960 --> 00:21:52,449 So far we have seen enough ways of identifying issues by running the code, 298 00:21:52,450 --> 00:21:54,410 but what if you... 299 00:21:54,410 --> 00:21:58,980 you can look at a piece of code like this, like the one I have shown right now in this screen, 300 00:21:58,980 --> 00:22:00,560 and identify an issue. 301 00:22:00,560 --> 00:22:02,030 So for example here, 302 00:22:02,030 --> 00:22:06,670 we have some really silly piece of code. It defines a function, prints a few variables, 303 00:22:07,720 --> 00:22:11,780 multiplies some variables, it sleeps for a while and then we try to print BAZ. 304 00:22:12,020 --> 00:22:14,840 And you could try to look at this and say, hey, BAZ has 305 00:22:15,500 --> 00:22:20,650 never been defined anywhere. This is a new variable. You probably meant to say BAR 306 00:22:20,650 --> 00:22:22,540 but you just mistyped it. 307 00:22:22,540 --> 00:22:26,480 Thing is, if we try to run this program, 308 00:22:28,820 --> 00:22:36,820 it's gonna take 60 seconds, because like we have to wait until this time.sleep function finishes. Here, sleep is just for 309 00:22:37,790 --> 00:22:42,070 motivating the example but in general you may be loading a data set that takes really long 310 00:22:42,140 --> 00:22:44,740 because you have to copy everything into memory. 311 00:22:44,740 --> 00:22:48,780 And the thing is, there are programs that will take source code as input, 312 00:22:49,340 --> 00:22:54,940 will process it and will say, oh probably this is wrong about this piece of code. So in Python, 313 00:22:55,760 --> 00:23:00,600 or in general, these are called static analysis tools. 314 00:23:00,780 --> 00:23:02,860 In Python we have for example pyflakes. 315 00:23:02,860 --> 00:23:06,640 If we get this piece of code and run it through pyflakes, 316 00:23:06,860 --> 00:23:09,820 pyflakes is gonna give us a couple of issues. 317 00:23:10,040 --> 00:23:15,700 First one is the one.... The second one is the one we identified: here's an undefined name called BAZ. 318 00:23:15,700 --> 00:23:17,760 You probably should be doing something about that. 319 00:23:17,760 --> 00:23:22,720 And the other one is like oh, you're redefining the 320 00:23:23,060 --> 00:23:27,240 the FOO variable name in that line. 321 00:23:27,540 --> 00:23:31,400 So here we have a FOO function and then we are kind of 322 00:23:31,400 --> 00:23:34,620 shadowing that function by using a loop variable here. 323 00:23:34,760 --> 00:23:38,460 So now that FOO function that we defined is not accessible anymore 324 00:23:38,470 --> 00:23:41,650 and then if we try to call it afterwards, we will get into errors. 325 00:23:43,520 --> 00:23:45,520 There are other types of 326 00:23:46,250 --> 00:23:53,170 Static Analysis tools. MYPY is a different one. MYPY is gonna report the same two errors, but it's also 327 00:23:53,840 --> 00:24:00,160 going to complain about type checking. So it's gonna say, oh here you're multiplying an int by a float and 328 00:24:00,680 --> 00:24:06,320 if you care about the type checking of your code, you should not be mixing those up. 329 00:24:07,490 --> 00:24:12,219 it can be kind of inconvenient, having to run this, look at the line, going back to your 330 00:24:12,800 --> 00:24:17,409 VIM or like your editor, and figuring out what the error matches to. 331 00:24:18,380 --> 00:24:21,190 There are already solutions for that. One 332 00:24:22,340 --> 00:24:27,069 way is that you can integrate most editors with these tools and here.. 333 00:24:28,279 --> 00:24:34,059 You can see there is like some red highlighting on the bash, and it will read the last line here. 334 00:24:34,059 --> 00:24:36,059 So, undefined named 'baz'. 335 00:24:36,160 --> 00:24:39,080 So as I'm editing this piece of Python code, 336 00:24:39,080 --> 00:24:43,360 my editor is gonna give me feedback about what's going wrong with this. 337 00:24:43,560 --> 00:24:48,480 Or like here have another one saying the redefinition of unused foo. 338 00:24:49,849 --> 00:24:51,849 And 339 00:24:53,080 --> 00:24:56,060 even, there are some stylistic complaints. 340 00:24:56,060 --> 00:24:58,060 So, oh, I will expect two empty lines. 341 00:24:58,120 --> 00:25:03,660 So like in Python, you should be having two empty lines between a function definition. 342 00:25:05,779 --> 00:25:07,009 There are... 343 00:25:07,009 --> 00:25:09,280 there is a resource on the lecture notes 344 00:25:09,280 --> 00:25:13,160 about pretty much static analyzers for a lot of different programming languages. 345 00:25:13,700 --> 00:25:18,460 There are even static analyzers for English. 346 00:25:18,840 --> 00:25:24,260 So I have my notes 347 00:25:24,580 --> 00:25:30,280 for the class here, and if I run it through this static analyzer for English, that is "writegood". 348 00:25:30,409 --> 00:25:33,008 It's going to complain about some stylistic properties. 349 00:25:33,009 --> 00:25:33,489 So like, oh, 350 00:25:33,489 --> 00:25:37,460 I'm using "very", which is a weasel word and I shouldn't be using it. 351 00:25:37,480 --> 00:25:43,080 Or "quickly" can weaken meaning, and you can have this for spell checking, or for a lot of different 352 00:25:43,600 --> 00:25:48,000 types of stylistic analysis. 353 00:25:48,760 --> 00:25:52,020 Any questions so far? 354 00:25:57,500 --> 00:25:59,490 Oh, 355 00:25:59,490 --> 00:26:01,490 I forgot to mention... 356 00:26:01,640 --> 00:26:07,320 Depending on the task that you're performing, there will be different types of debuggers. 357 00:26:07,320 --> 00:26:09,740 For example, if you're doing web development, 358 00:26:09,860 --> 00:26:13,520 both Firefox and Chrome 359 00:26:13,740 --> 00:26:20,600 have a really really good set of tools for doing debugging for websites. 360 00:26:20,600 --> 00:26:23,880 So here we go and say inspect element, 361 00:26:23,880 --> 00:26:25,880 we can get the... do you know? how to make this larger... 362 00:26:27,660 --> 00:26:29,220 We're getting 363 00:26:29,220 --> 00:26:33,380 the entire source code for the web page for the class. 364 00:26:35,549 --> 00:26:37,549 Oh, yeah, here we go. 365 00:26:38,640 --> 00:26:40,640 Is that better? 366 00:26:40,799 --> 00:26:47,149 And we can actually go and change properties about the course. So we can say... we can edit the title. 367 00:26:47,400 --> 00:26:51,280 Say, this is not a class on debugging and profiling. 368 00:26:51,620 --> 00:26:53,940 And now the code for the website has changed. 369 00:26:54,120 --> 00:26:56,000 This is one of the reasons why you should never trust 370 00:26:56,200 --> 00:27:00,560 any screenshots of websites, because they can be completely modified. 371 00:27:01,320 --> 00:27:05,030 And you can also modify this style. Like, here I have things 372 00:27:06,120 --> 00:27:07,559 using the 373 00:27:07,560 --> 00:27:09,500 the dark mode preference, 374 00:27:09,680 --> 00:27:11,900 but we can alter that. 375 00:27:11,900 --> 00:27:16,560 Because at the end of the day, the browser is rendering this for us. 376 00:27:17,840 --> 00:27:21,780 We can check the cookies, but there's like a lot of different operations. 377 00:27:21,799 --> 00:27:27,619 There's also a built-in debugger for JavaScript, so you can step through JavaScript code. 378 00:27:27,620 --> 00:27:34,020 So kind of the takeaway is, depending on what you are doing, you will probably want to search for what tools 379 00:27:34,320 --> 00:27:36,820 programmers have built for them. 380 00:27:44,880 --> 00:27:47,630 Now I'm gonna switch gears and 381 00:27:48,200 --> 00:27:51,800 stop talking about debugging, which is kind of finding issues with the code, right? 382 00:27:51,800 --> 00:27:54,200 kind of more about the behavior, and then start talking 383 00:27:54,200 --> 00:27:56,860 about like how you can use profiling. 384 00:27:56,860 --> 00:27:59,240 And profiling is how to optimize the code. 385 00:28:01,100 --> 00:28:05,940 It might be because you want to optimize the CPU, the memory, the network, ... 386 00:28:06,330 --> 00:28:09,889 There are many different reasons that you want to be optimizing it. 387 00:28:10,440 --> 00:28:14,000 As it was the case with debugging, the kind of first-order approach 388 00:28:14,000 --> 00:28:16,680 that a lot of people have experience with already is 389 00:28:16,880 --> 00:28:21,880 oh, let's use just printf profiling, so to say, like we can just take... 390 00:28:22,770 --> 00:28:25,610 Let me make this larger. We can 391 00:28:26,130 --> 00:28:28,110 take the current time here, 392 00:28:28,110 --> 00:28:34,610 then we can check, we can do some execution and then we can take the time again and 393 00:28:35,060 --> 00:28:37,320 subtract it from the original time. 394 00:28:37,320 --> 00:28:39,320 And by doing this you can kind of narrow down 395 00:28:39,540 --> 00:28:46,040 and fence some different parts of your code and try to figure out what is the time taken between those two parts. 396 00:28:47,040 --> 00:28:52,639 And that's good. But sometimes it can be interesting, the results. So here, we're sleeping for 397 00:28:53,730 --> 00:28:59,809 0.5 seconds and the output is saying, oh it's 0.5 plus some extra time, 398 00:28:59,810 --> 00:29:05,929 which is kind of interesting. And if we keep running it, we see there's like some small error and the thing is 399 00:29:06,240 --> 00:29:11,680 here, what we're actually measuring is what is usually referred to as the "real time". 400 00:29:12,060 --> 00:29:14,340 Real time is as if you get 401 00:29:14,340 --> 00:29:15,930 like a 402 00:29:15,930 --> 00:29:19,249 clock, and you start it when your program starts, and you stop it when your program ends. 403 00:29:19,500 --> 00:29:23,060 But the thing is, in your computer it is not only your program that is running. 404 00:29:23,060 --> 00:29:27,460 There are many other programs running at the same time and those might 405 00:29:27,760 --> 00:29:34,640 be the ones that are taking the CPU. So, to try to make sense of that, 406 00:29:35,790 --> 00:29:39,259 A lot of... you'll see a lot of programs 407 00:29:40,620 --> 00:29:43,250 using the terminology that is 408 00:29:44,100 --> 00:29:46,760 real time, user time and system time. 409 00:29:46,760 --> 00:29:51,460 Real time is what I explained, which is kind of the entire length of time from start to finish. 410 00:29:51,840 --> 00:29:59,780 Then there is the user time, which is the amount of time your program spent on the CPU doing user level cycles. 411 00:29:59,780 --> 00:30:06,100 So as I was mentioning, in UNIX, you can be running user level code or kernel level code. 412 00:30:06,920 --> 00:30:12,940 System is kind of the opposite, it's the amount of CPU, like the amount of time that your program spent on the CPU 413 00:30:13,500 --> 00:30:18,480 executing kernel mode instructions. So let's show this with an example. 414 00:30:18,620 --> 00:30:22,180 Here I'm going to "time", which is a command, 415 00:30:22,460 --> 00:30:27,840 a shell command that's gonna get these three metrics for the following command, and then I'm just 416 00:30:28,100 --> 00:30:30,560 grabbing a URL from 417 00:30:31,160 --> 00:30:36,760 a website that is hosted in Spain. So that's gonna take some extra time to go over there and then go back. 418 00:30:37,410 --> 00:30:39,499 If we see, here, if we were to just... 419 00:30:39,780 --> 00:30:43,670 We have two prints, between the beginning and the end of the program. 420 00:30:43,670 --> 00:30:49,039 We could think that this program is taking like 600 milliseconds to execute, but actually 421 00:30:49,500 --> 00:30:56,930 most of that time was spent just waiting for the response on the other side of the network and 422 00:30:57,330 --> 00:31:04,880 we actually only spent 16 milliseconds at the user level and like 9 seconds, in total 25 milliseconds, actually 423 00:31:05,280 --> 00:31:08,149 executing CURL code. Everything else was just waiting. 424 00:31:12,090 --> 00:31:14,480 Any questions related to timing? 425 00:31:19,860 --> 00:31:21,860 Ok, so 426 00:31:21,990 --> 00:31:23,580 timing can be 427 00:31:23,580 --> 00:31:29,480 can become tricky, it's also kind of a black box solution. Or if you start adding print statements, 428 00:31:29,660 --> 00:31:35,860 it's kind of hard to add print statements, with time everywhere. So programmers have figured out better tools. 429 00:31:36,140 --> 00:31:38,700 These are usually referred to as "profilers". 430 00:31:39,980 --> 00:31:44,260 One quick note that I'm gonna make, is that 431 00:31:44,720 --> 00:31:46,720 profilers, like usually when people 432 00:31:46,800 --> 00:31:48,800 refer to profilers they usually talk about 433 00:31:49,050 --> 00:31:55,190 CPU profilers because they are the most common, at identifying where like time is being spent on the CPU. 434 00:31:56,790 --> 00:31:59,180 Profilers usually come in kind of two flavors: 435 00:31:59,180 --> 00:32:02,140 there's tracing profilers and sampling profilers. 436 00:32:02,140 --> 00:32:06,380 and it's kind of good to know the difference because the output might be different. 437 00:32:07,640 --> 00:32:10,300 Tracing profilers kind of instrument your code. 438 00:32:10,680 --> 00:32:15,799 So they kind of execute with your code and every time your code enters a function call, 439 00:32:15,800 --> 00:32:20,479 they kind of take a note of it. It's like, oh we're entering this function call at this moment in time and 440 00:32:21,860 --> 00:32:24,860 they keep going and, once they finish, they can report 441 00:32:24,860 --> 00:32:28,300 oh, you spent this much time executing in this function and 442 00:32:28,580 --> 00:32:33,760 this much time in this other function. So on, so forth, which is the example that we're gonna see now. 443 00:32:34,590 --> 00:32:38,329 Another type of tools are tracing, sorry, sampling profilers. 444 00:32:38,430 --> 00:32:44,840 The issue with tracing profilers is they add a lot of overhead. Like you might be running your code and having these kind of 445 00:32:46,280 --> 00:32:49,400 profiling next to you making all these counts, 446 00:32:49,400 --> 00:32:54,340 will hinder the performance of your program, so you might get counts that are slightly off. 447 00:32:55,380 --> 00:32:59,450 A sampling profiler, what it's gonna do is gonna execute your program and every 448 00:32:59,940 --> 00:33:05,239 100 milliseconds, 10 milliseconds, like some defined period, it's gonna stop your program. It's gonna halt it, 449 00:33:05,580 --> 00:33:12,379 it's gonna look at the stack trace and say, oh, you're right now in this point in the hierarchy, and 450 00:33:12,630 --> 00:33:15,530 identify which function is gonna be executing at that point. 451 00:33:16,260 --> 00:33:19,760 The idea is that as long as you execute this for long enough, 452 00:33:19,760 --> 00:33:24,290 you're gonna get enough statistics to know where most of the time is being spent. 453 00:33:25,800 --> 00:33:28,800 So, let's see an example of a tracing profiling. 454 00:33:28,800 --> 00:33:32,340 So here we have a piece of code that is just like a 455 00:33:33,480 --> 00:33:35,540 really simple re-implementation of grep 456 00:33:36,330 --> 00:33:38,330 done in Python. 457 00:33:38,400 --> 00:33:44,030 What we want to check is what is the bottleneck of this program? Like we're just opening a bunch of files, 458 00:33:44,900 --> 00:33:49,620 trying to match this pattern, and then printing whenever we find a match. 459 00:33:49,620 --> 00:33:52,340 And maybe it's the regex, maybe it's the print... 460 00:33:52,460 --> 00:33:53,940 We don't really know. 461 00:33:53,940 --> 00:33:59,040 So to do this in Python, we have the "cProfile". 462 00:33:59,040 --> 00:34:00,080 And 463 00:34:00,990 --> 00:34:06,620 here I'm just calling this module and saying I want to sort this by the total amount of time, that 464 00:34:06,780 --> 00:34:13,429 we're gonna see briefly. I'm calling the program we just saw in the editor. 465 00:34:13,429 --> 00:34:18,679 I'm gonna execute this a thousand times and then I want to match (the grep 466 00:34:18,960 --> 00:34:21,770 Arguments here) is I want to match these regex 467 00:34:22,919 --> 00:34:27,469 to all the Python files in here. And this is gonna output some... 468 00:34:30,780 --> 00:34:34,369 This is gonna produce some output, then we're gonna look at it. First, 469 00:34:34,369 --> 00:34:38,539 is all the output from the greps, but at the very end, we're getting 470 00:34:39,119 --> 00:34:42,979 output from the profiler itself. If we go up 471 00:34:44,129 --> 00:34:46,939 we can see that, hey, 472 00:34:47,730 --> 00:34:55,250 by sorting we can see that the total number of calls. So we did 8000 calls, because we executed this 1000 times and 473 00:34:57,360 --> 00:35:03,440 this is the total amount of time we spent in this function (cumulative time). And here we can start to identify 474 00:35:03,920 --> 00:35:06,040 where the bottleneck is. 475 00:35:06,050 --> 00:35:11,449 So here, this built-in method IO open, is saying that we're spending a lot of the time just waiting for 476 00:35:12,080 --> 00:35:14,340 reading from the disk or... 477 00:35:14,340 --> 00:35:15,680 There, we can check, hey, 478 00:35:15,680 --> 00:35:19,840 a lot of time is also being spent trying to match the regex. 479 00:35:19,840 --> 00:35:22,640 Which is something that you will expect. 480 00:35:22,640 --> 00:35:26,220 One of the caveats of using this 481 00:35:26,480 --> 00:35:29,540 tracing profiler is that, as you can see, here 482 00:35:29,540 --> 00:35:35,239 we're seeing our function but we're also seeing a lot of functions that correspond to built-ins. 483 00:35:35,240 --> 00:35:35,910 So like, 484 00:35:35,910 --> 00:35:41,899 functions that are third party functions from the libraries. And as you start building more and more complex code, 485 00:35:41,900 --> 00:35:43,560 This is gonna be much harder. 486 00:35:44,200 --> 00:35:44,760 So 487 00:35:46,080 --> 00:35:49,720 here is another piece of Python code that, 488 00:35:51,540 --> 00:35:53,779 don't read through it, what it's doing is just 489 00:35:54,420 --> 00:35:57,589 grabbing the course website and then it's printing all the... 490 00:35:58,440 --> 00:36:01,960 It's parsing it, and then it's printing all the hyperlinks that it has found. 491 00:36:01,960 --> 00:36:03,520 So there are like these two operations: 492 00:36:03,520 --> 00:36:07,800 going there, grabbing a website, and then parsing it, printing the links. 493 00:36:07,800 --> 00:36:09,740 And we might want to get a sense of 494 00:36:09,740 --> 00:36:16,180 how those two operations compare to each other. If we just try to execute the 495 00:36:16,680 --> 00:36:18,680 cProfiler here and 496 00:36:19,260 --> 00:36:24,949 we're gonna do the same, this is not gonna print anything. I'm using a tool we haven't seen so far, 497 00:36:24,950 --> 00:36:25,700 but I think it's pretty nice. 498 00:36:25,700 --> 00:36:32,810 It's "TAC", which is the opposite of "CAT", and it is going to reverse the output so I don't have to go up and look. 499 00:36:33,430 --> 00:36:35,430 So we do this and... 500 00:36:36,250 --> 00:36:39,179 Hey, we get some interesting output. 501 00:36:39,880 --> 00:36:46,200 we're spending a bunch of time in this built-in method socket_getaddr_info and like in _imp_create_dynamic and 502 00:36:46,510 --> 00:36:48,540 method_connect and posix_stat... 503 00:36:49,210 --> 00:36:55,740 nothing in my code is directly calling these functions so I don't really know what is the split between the operation of 504 00:36:56,349 --> 00:37:03,929 making a web request and parsing the output of that web request. So, for that, we can use 505 00:37:04,900 --> 00:37:07,920 a different type of profiler which is 506 00:37:09,819 --> 00:37:14,309 a line profiler. And the line profiler is just going to present the same results 507 00:37:14,310 --> 00:37:20,879 but in a more human-readable way, which is just, for this line of code, this is the amount of time things took. 508 00:37:24,819 --> 00:37:31,079 So it knows it has to do that, we have to add a decorator to the Python function, we do that. 509 00:37:34,869 --> 00:37:36,869 And as we do that, 510 00:37:37,119 --> 00:37:39,749 we now get slightly cropped output, 511 00:37:39,750 --> 00:37:46,169 but the main idea, we can look at the percentage of time and we can see that making this request, get operation, took 512 00:37:46,450 --> 00:37:52,829 88% of the time, whereas parsing the response took only 10.9% of the time. 513 00:37:54,069 --> 00:38:00,869 This can be really informative and a lot of different programming languages will support this type of a line profiling. 514 00:38:04,569 --> 00:38:07,439 Sometimes, you might not care about CPU. 515 00:38:07,440 --> 00:38:15,000 Maybe you care about the memory or like some other resource. Similarly, there are memory profilers: in Python 516 00:38:15,000 --> 00:38:21,599 there is "memory_profiler", for C you will have "Valgrind". So here is a fairly simple example, 517 00:38:21,760 --> 00:38:28,530 we just create this list with a million elements. That's going to consume like megabytes of space and 518 00:38:29,200 --> 00:38:33,920 we do the same, creating another one with 20 million elements. 519 00:38:34,860 --> 00:38:38,180 To check, what was the memory allocation? 520 00:38:38,980 --> 00:38:44,369 How it's gonna happen, what's the consumption? We can go through one memory profiler and 521 00:38:44,950 --> 00:38:46,619 we execute it, 522 00:38:46,620 --> 00:38:51,380 and it's telling us the total memory usage and the increments. 523 00:38:51,380 --> 00:38:57,980 And we can see that we have some overhead, because this is an interpreted language and when we create 524 00:38:58,450 --> 00:39:00,599 this million, 525 00:39:03,520 --> 00:39:07,340 this list with a million entries, we're gonna need this many megabytes of information. 526 00:39:07,660 --> 00:39:15,299 Then we were getting another 150 megabytes. Then, we're freeing this entry and that's decreasing the total amount. 527 00:39:15,299 --> 00:39:19,169 We are not getting a negative increment because of a bug, probably in the profiler. 528 00:39:19,509 --> 00:39:26,549 But if you know that your program is taking a huge amount of memory and you don't know why, maybe because you're copying 529 00:39:26,920 --> 00:39:30,269 objects where you should be doing things in place, then 530 00:39:31,140 --> 00:39:33,320 using a memory profiler can be really useful. 531 00:39:33,320 --> 00:39:37,780 And in fact there's an exercise that will kind of work you through that, comparing 532 00:39:37,980 --> 00:39:39,980 an in-place version of quicksort with like a 533 00:39:40,059 --> 00:39:44,008 non-inplace, that keeps making new and new copies. And if you using the memory profiler 534 00:39:44,009 --> 00:39:47,909 you can get a really good comparison between the two of them 535 00:39:51,069 --> 00:39:53,459 Any questions so far, with profiling? 536 00:39:53,460 --> 00:39:57,940 Is the memory profiler running the program in order to get that? 537 00:39:58,140 --> 00:40:03,180 Yeah... you might be able to figure out like just looking at the code. 538 00:40:03,180 --> 00:40:05,759 But as you get more and more complex (for this code at least) 539 00:40:06,009 --> 00:40:10,738 But you get more and more complex programs what this is doing is running through the program 540 00:40:10,739 --> 00:40:16,739 and for every line, at the very beginning, it's looking at the heap and saying 541 00:40:16,739 --> 00:40:19,319 "What are the objects that I have allocated now?" 542 00:40:19,319 --> 00:40:22,979 "I have seven megabytes of objects", and then goes to the next line, 543 00:40:23,190 --> 00:40:27,869 looks again, "Oh now I have 50, so I have now added 43 there". 544 00:40:28,839 --> 00:40:34,709 Again, you could do this yourself by asking for those operations in your code, every single line. 545 00:40:34,920 --> 00:40:39,899 But that's not how you should be doing things since people have already written these tools for you to use. 546 00:40:43,089 --> 00:40:46,078 As it was the case with... 547 00:40:51,480 --> 00:40:58,220 So as in the case with strace, you can do something similar in profiling. 548 00:40:58,340 --> 00:41:03,380 You might not care about the specific lines of code that you have, 549 00:41:03,440 --> 00:41:08,200 but maybe you want to check for outside events. Like, you maybe want to check how many 550 00:41:09,410 --> 00:41:14,469 CPU cycles your computer program is using, or how many page faults it's creating. 551 00:41:14,469 --> 00:41:19,239 Maybe you have like bad cache locality and that's being manifested somehow. 552 00:41:19,340 --> 00:41:22,960 So for that, there is the "perf" command. 553 00:41:22,960 --> 00:41:27,220 The perf command is gonna do this, where it is gonna run your program and it's gonna 554 00:41:28,720 --> 00:41:33,360 keep track of all these statistics and report them back to you. And this can be really helpful if you are 555 00:41:33,680 --> 00:41:36,060 working at a lower level. So 556 00:41:37,300 --> 00:41:42,840 we execute this command, I'm gonna explain briefly what it's doing. 557 00:41:48,650 --> 00:41:51,639 And this stress program is just 558 00:41:52,219 --> 00:41:54,698 running in the CPU, and it's just a program to just 559 00:41:54,829 --> 00:41:59,528 hog one CPU and like test that you can hog the CPU. And now if we Ctrl-C, 560 00:42:00,619 --> 00:42:02,708 we can go back and 561 00:42:03,410 --> 00:42:08,559 we get some information about the number of page faults that we have or the number of 562 00:42:09,769 --> 00:42:11,769 CPU cycles that we utilize, and other 563 00:42:12,469 --> 00:42:14,329 useful 564 00:42:14,329 --> 00:42:18,968 metrics from our code. For some programs you can 565 00:42:21,469 --> 00:42:25,089 look at what the functions that were being used were. 566 00:42:26,120 --> 00:42:30,140 So we can record what this program is doing, 567 00:42:30,940 --> 00:42:34,920 which we don't know about because it's a program someone else has written. 568 00:42:35,240 --> 00:42:37,240 And 569 00:42:38,180 --> 00:42:42,279 we can report what it was doing by looking at the stack trace and we can say oh, 570 00:42:42,279 --> 00:42:44,279 It's spending a bunch of time in this 571 00:42:44,660 --> 00:42:46,640 __random_r 572 00:42:46,640 --> 00:42:53,229 standard library function. And it's mainly because the way of hogging a CPU is by just creating more and more pseudo-random numbers. 573 00:42:53,779 --> 00:42:55,779 There are some other 574 00:42:55,819 --> 00:42:58,149 functions that have not been mapped, because they 575 00:42:58,369 --> 00:43:01,448 belong to the program, but if you know about your program 576 00:43:01,448 --> 00:43:05,140 you can display this information using more flags, about perf. 577 00:43:05,140 --> 00:43:10,220 There are really good tutorials online about how to use this tool. 578 00:43:12,010 --> 00:43:14,010 Oh 579 00:43:14,119 --> 00:43:17,349 One one more thing regarding profilers is, so far, 580 00:43:17,350 --> 00:43:20,109 we have seen that these profilers are really good at 581 00:43:20,510 --> 00:43:25,419 aggregating all this information and giving you a lot of these numbers so you can 582 00:43:25,790 --> 00:43:29,739 optimize your code or you can reason about what is happening, but 583 00:43:30,560 --> 00:43:31,550 the thing is 584 00:43:31,550 --> 00:43:35,949 humans are not really good at making sense of lots of numbers and since 585 00:43:36,080 --> 00:43:39,249 humans are more visual creatures, it's much 586 00:43:39,920 --> 00:43:42,980 easier to kind of have some sort of visualization. 587 00:43:42,980 --> 00:43:48,700 Again, programmers have already thought about this and have come up with solutions. 588 00:43:49,480 --> 00:43:56,160 A couple of popular ones, is a FlameGraph. A FlameGraph is a 589 00:43:56,780 --> 00:44:00,160 sampling profiler. So this is just running your code and taking samples 590 00:44:00,160 --> 00:44:03,280 And then on the y-axis here 591 00:44:03,280 --> 00:44:10,980 we have the depth of the stack so we know that the bash function called this other function, and this called this other function, 592 00:44:11,260 --> 00:44:14,480 so on, so forth. And on the x-axis it's 593 00:44:14,630 --> 00:44:17,500 not time, it's not the timestamps. 594 00:44:17,500 --> 00:44:23,290 Like it's not this function run before, but it's just time taken. Because, again, this is a sampling profiler: 595 00:44:23,290 --> 00:44:28,540 we're just getting small glimpses of what was it going on in the program. But we know that, for example, 596 00:44:29,119 --> 00:44:32,949 this main program took the most time because the 597 00:44:33,530 --> 00:44:35,530 x-axis is proportional to that. 598 00:44:36,020 --> 00:44:43,090 They are interactive and they can be really useful to identify the hot spots in your program. 599 00:44:44,720 --> 00:44:50,540 Another way of displaying information, and there is also an exercise on how to do this, is using a call graph. 600 00:44:50,720 --> 00:44:58,320 So a call graph is going to be displaying information, and it's gonna create a graph of which function called which other function. 601 00:44:58,620 --> 00:45:00,940 And then you get information about, like, 602 00:45:00,940 --> 00:45:05,770 oh, we know that "__main__" called this "Person" function ten times and 603 00:45:06,050 --> 00:45:08,919 it took this much time. And as you have 604 00:45:09,080 --> 00:45:13,029 larger and larger programs, looking at one of these call graphs can be useful to identify 605 00:45:14,270 --> 00:45:19,689 what piece of your code is calling this really expensive IO operation, for example. 606 00:45:24,560 --> 00:45:30,360 With that I'm gonna cover the last part of the lecture, which is that 607 00:45:30,360 --> 00:45:36,600 sometimes, you might not even know what exact resource is constrained in your program. 608 00:45:36,619 --> 00:45:39,019 Like how do I know how much CPU 609 00:45:39,380 --> 00:45:44,060 my program is using, and I can quickly look in there, or how much memory. 610 00:45:44,060 --> 00:45:46,680 So there are a bunch of really 611 00:45:46,700 --> 00:45:49,760 nifty tools for doing that one of them is 612 00:45:50,400 --> 00:45:53,270 HTOP. so HTOP is an 613 00:45:54,000 --> 00:45:59,810 interactive command-line tool and here it's displaying all the CPUs this machine has, 614 00:46:00,160 --> 00:46:07,740 which is 12. It's displaying the amount of memory, it says I'm consuming almost a gigabyte of the 32 gigabytes my machine has. 615 00:46:07,740 --> 00:46:11,660 And then I'm getting all the different processes. 616 00:46:11,730 --> 00:46:13,290 So for example we have 617 00:46:13,290 --> 00:46:20,300 zsh, mysql and other processes that are running in this machine, and I can sort through the amount of CPU 618 00:46:20,300 --> 00:46:24,379 they're consuming or through the priority they're running at. 619 00:46:25,980 --> 00:46:28,129 We can check this, for example. Here 620 00:46:28,130 --> 00:46:30,230 we have the stress command again 621 00:46:30,230 --> 00:46:31,470 and we're going to 622 00:46:31,470 --> 00:46:37,040 run it to take over four CPUs and check that we can see that in HTOP. 623 00:46:37,040 --> 00:46:42,880 So we did spot those four CPU jobs, and now I have seen that 624 00:46:43,710 --> 00:46:46,429 besides the ones we had before, now I have this... 625 00:46:50,310 --> 00:46:56,119 Like this "stress -c" command running and taking a bunch of our CPU. 626 00:46:56,849 --> 00:47:03,169 Even though you could use a profiler to get similar information to this, the way HTOP displays this kind of in a live interactive 627 00:47:03,329 --> 00:47:07,099 fashion can be much quicker and much easier to parse. 628 00:47:07,890 --> 00:47:09,890 In the notes, there's a 629 00:47:10,160 --> 00:47:15,180 really long list of different tools for evaluating different parts of your system. 630 00:47:15,180 --> 00:47:17,180 So that might be tools for analyzing the 631 00:47:17,180 --> 00:47:19,720 network performance, about looking the 632 00:47:20,430 --> 00:47:24,530 number of IO operations, so you know whether you're saturating the 633 00:47:26,040 --> 00:47:28,040 the reads from your disks, 634 00:47:28,829 --> 00:47:31,429 you can also look at what is the space usage. 635 00:47:32,069 --> 00:47:34,369 Which, I think, here... 636 00:47:38,690 --> 00:47:44,829 So NCDU... There's a tool called "du" which stands for "disk usage" and 637 00:47:45,440 --> 00:47:49,480 we have the "-h" flag for "human readable output". 638 00:47:51,740 --> 00:47:58,959 We can do videos and we can get output about the size of all the files in this folder. 639 00:48:08,059 --> 00:48:10,059 Yeah, there we go. 640 00:48:10,400 --> 00:48:15,040 There are also interactive versions, like HTOP was an interactive version. 641 00:48:15,280 --> 00:48:21,200 So NCDU is an interactive version that will let me navigate through the folders and I can see quickly that 642 00:48:21,200 --> 00:48:25,740 oh, we have... This is one of the folders for the video lectures, 643 00:48:26,329 --> 00:48:29,049 and we can see there are these four files 644 00:48:29,690 --> 00:48:36,579 that have like almost 9 GB each and I could quickly delete them through this interface. 645 00:48:37,760 --> 00:48:43,839 Another neat tool is "LSOF" which stands for "LIST OF OPEN FILES". 646 00:48:44,240 --> 00:48:47,500 Another pattern that you may encounter is you know 647 00:48:47,780 --> 00:48:54,609 some process is using a file, but you don't know exactly which process is using that file. Or, similarly, some process is listening in 648 00:48:55,400 --> 00:48:59,020 a port, but again, how do you find out which one it is? 649 00:48:59,020 --> 00:49:00,820 So to set an example. 650 00:49:00,820 --> 00:49:04,280 We just run a Python HTTP server on port 651 00:49:05,210 --> 00:49:06,559 444 652 00:49:06,559 --> 00:49:10,899 Running there. Maybe we don't know that that's running, but then we can 653 00:49:13,130 --> 00:49:15,130 use... 654 00:49:17,089 --> 00:49:19,089 we can use LSOF. 655 00:49:22,660 --> 00:49:29,200 Yeah, we can use LSOF, and the thing is LSOF is gonna print a lot of information. 656 00:49:30,440 --> 00:49:32,740 You need SUDO permissions because 657 00:49:34,069 --> 00:49:39,219 this is gonna ask for who has all these items. 658 00:49:39,829 --> 00:49:43,929 Since we only care about the one who is listening in this 444 port 659 00:49:44,630 --> 00:49:46,369 we can ask 660 00:49:46,369 --> 00:49:47,960 grep for that. 661 00:49:47,960 --> 00:49:55,750 And we can see, oh, there's like this Python process, with this identifier, that is using the port and then we can 662 00:49:56,660 --> 00:49:58,009 kill it, 663 00:49:58,009 --> 00:50:00,969 and that terminates that process. 664 00:50:02,299 --> 00:50:06,669 Again, there's a lot of different tools. There's even tools for 665 00:50:08,450 --> 00:50:10,569 doing what is called benchmarking. 666 00:50:11,660 --> 00:50:18,789 So in the shell tools and scripting lecture, I said like for some tasks "fd" is much faster than "find" 667 00:50:18,950 --> 00:50:21,519 But like how will you check that? 668 00:50:22,059 --> 00:50:30,038 I can test that with "hyperfine" and I have here two commands: one with "fd" that is just 669 00:50:30,500 --> 00:50:34,029 searching for JPEG files and the same one with "find". 670 00:50:34,579 --> 00:50:41,079 If I execute them, it's gonna benchmark these scripts and give me some output about 671 00:50:41,869 --> 00:50:44,108 how much faster "fd" is 672 00:50:45,380 --> 00:50:47,380 compared to "find". 673 00:50:47,660 --> 00:50:52,269 So I think that kind of concludes... yeah, like 23 times for this task. 674 00:50:52,940 --> 00:50:55,990 So that kind of concludes the whole overview. 675 00:50:56,539 --> 00:51:00,309 I know that there's like a lot of different topics and there's like a lot of 676 00:51:00,650 --> 00:51:04,539 perspectives on doing these things, but again I want to reinforce the idea 677 00:51:04,539 --> 00:51:08,499 that you don't need to be a master of all these topics but more... 678 00:51:08,750 --> 00:51:11,229 To be aware that all these things exist. 679 00:51:11,230 --> 00:51:17,559 So if you run into these issues you don't reinvent the wheel, and you reuse all that other programmers have done. 680 00:51:18,280 --> 00:51:23,700 Given that, I'm happy to take any questions related to this last section or anything in the lecture. 681 00:51:25,900 --> 00:51:30,060 Is there any way to sort of think about how long a program should take? 682 00:51:30,060 --> 00:51:33,160 You know, if it's taking a while to run 683 00:51:33,160 --> 00:51:42,840 you know, should you be worried? Or depending on your process, let me wait another ten minutes before I start looking at why it's taking so long. 684 00:51:43,220 --> 00:51:45,220 Okay, so the... 685 00:51:46,070 --> 00:51:49,089 The task of knowing how long a program 686 00:51:49,090 --> 00:51:53,920 should run is pretty infeasible to figure out. It will depend on the type of program. 687 00:51:54,290 --> 00:52:01,899 It depends on whether you're making HTTP requests or you're reading data... one thing that you can do is if you have 688 00:52:02,390 --> 00:52:02,980 for example, 689 00:52:02,980 --> 00:52:10,689 if you know you have to read two gigabytes from memory, like from disk, and load that into memory, you can make 690 00:52:11,510 --> 00:52:16,719 back-of-the-envelope calculation. So like that shouldn't take longer than like X seconds because this is 691 00:52:16,940 --> 00:52:20,050 how things are set up. Or if you are 692 00:52:20,840 --> 00:52:27,460 reading some files from the network and you know kind of what the network link is and they are taking say five times longer than 693 00:52:27,460 --> 00:52:29,460 what you would expect then you could 694 00:52:29,990 --> 00:52:31,190 try to do that. 695 00:52:31,190 --> 00:52:37,839 Otherwise, if you don't really know. Say you're trying to do some mathematical operation in your code and you're not really sure 696 00:52:37,840 --> 00:52:44,050 about how long that will take you can use something like logging and try to kind of print intermediate 697 00:52:44,570 --> 00:52:50,469 stages to get a sense of like, oh I need to do a thousand operations of this and 698 00:52:51,800 --> 00:52:53,600 three iterations 699 00:52:53,600 --> 00:53:00,700 took ten seconds. Then this is gonna take much longer than I can handle in my case. 700 00:53:00,920 --> 00:53:04,599 So I think there are there are ways, it will again like depend on the task, 701 00:53:04,600 --> 00:53:08,800 but definitely, given all the tools we've seen really, we probably have like 702 00:53:09,620 --> 00:53:13,150 a couple of really good ways to start tackling that. 703 00:53:14,750 --> 00:53:16,750 Any other questions? 704 00:53:16,750 --> 00:53:18,750 You can also do things like 705 00:53:18,750 --> 00:53:21,060 run HTOP and see if anything is running. 706 00:53:22,380 --> 00:53:25,500 Like if your CPU is at 0%, something is probably wrong. 707 00:53:31,140 --> 00:53:32,579 Okay. 708 00:53:32,579 --> 00:53:38,268 There's a lot of exercises for all the topics that we have covered in today's class, 709 00:53:38,269 --> 00:53:41,419 so feel free to do the ones that are more interesting. 710 00:53:42,180 --> 00:53:44,539 We're gonna be holding office hours again today. 711 00:53:45,059 --> 00:53:48,979 Just a reminder, office hours. You can come and ask questions about any lecture. 712 00:53:48,980 --> 00:53:53,510 Like we're not gonna expect you to kind of do the exercises in a couple of minutes. 713 00:53:53,510 --> 00:53:57,979 They take a really long while to get through them, but we're gonna be there 714 00:53:58,529 --> 00:54:04,339 to answer any questions from previous classes, or even not related to exercises. Like if you want to know more about how you 715 00:54:04,619 --> 00:54:09,889 would use TMUX in a way to kind of quickly switch between panes, anything that comes to your mind.