WEBVTT

99:59:59.999 --> 99:59:59.999
Hi everyone, I'm Gil Tene.

99:59:59.999 --> 99:59:59.999
I'm going to be talking about this subject that I call "How NOT to Measure Latency".

99:59:59.999 --> 99:59:59.999
It's a subject that I've been talking about for 3 years or so.

99:59:59.999 --> 99:59:59.999
I keep the title and change all the slides every time.

99:59:59.999 --> 99:59:59.999
A bunch of this stuff is new.

99:59:59.999 --> 99:59:59.999
So if you've seen any of my previous "How NOT to", you'll see only some things that are common.

99:59:59.999 --> 99:59:59.999
A nickname for the subject is this...

99:59:59.999 --> 99:59:59.999
Because I often will get that reaction from some people in the audience.

99:59:59.999 --> 99:59:59.999
Ever since I've told people that it's a nickname,

99:59:59.999 --> 99:59:59.999
they feel free to actually exclaim, "Oh S@%#!".

99:59:59.999 --> 99:59:59.999
And feel free to do that here in this talk.

99:59:59.999 --> 99:59:59.999
I'll prompt you in a couple of places where it is natural.

99:59:59.999 --> 99:59:59.999
But if you just have the urge, go ahead.

99:59:59.999 --> 99:59:59.999
So just a tiny bit about me.

99:59:59.999 --> 99:59:59.999
I am the co-founder of Azul Systems.

99:59:59.999 --> 99:59:59.999
I play around with garbage collection a lot.

99:59:59.999 --> 99:59:59.999
Here is some evidence of me playing around with garbage collection in my kitchen.

99:59:59.999 --> 99:59:59.999
That's a trash compactor.

99:59:59.999 --> 99:59:59.999
The compaction function wasn't working right, so I had to fix it.

99:59:59.999 --> 99:59:59.999
I thought it'd be funny to take a picture with a book.

99:59:59.999 --> 99:59:59.999
I've also built a lot of things.

99:59:59.999 --> 99:59:59.999
I've been playing with computers since the early '80s.

99:59:59.999 --> 99:59:59.999
I've built hardware.

99:59:59.999 --> 99:59:59.999
I've helped design chips.

99:59:59.999 --> 99:59:59.999
I've built software at many different levels.

99:59:59.999 --> 99:59:59.999
Operating systems, drivers... JVMs, obviously.

99:59:59.999 --> 99:59:59.999
And lots of big systems at the system level.

99:59:59.999 --> 99:59:59.999
Built our own app server in the late '90s because WebLogic wasn't around yet.

99:59:59.999 --> 99:59:59.999
So, I've made a lot of mistakes, and I've learned from a few of them.

99:59:59.999 --> 99:59:59.999
This is actually a combination of a bunch of those mistakes looking at latency.

99:59:59.999 --> 99:59:59.999
I do have this hobby of depressing people by pulling the wool up from over your eyes,

99:59:59.999 --> 99:59:59.999
and this is what this talk is about.

99:59:59.999 --> 99:59:59.999
So, I need to give you a choice right here.

99:59:59.999 --> 99:59:59.999
There's the door.

99:59:59.999 --> 99:59:59.999
You can take the blue pill, and you can leave.

99:59:59.999 --> 99:59:59.999
Tomorrow you can keep believing whatever it is you want to believe.

99:59:59.999 --> 99:59:59.999
But if you stay here and take the red pill, I will show you a glimpse of how

99:59:59.999 --> 99:59:59.999
far down the rabbit hole goes, and it will never be the same again.

99:59:59.999 --> 99:59:59.999
Let's talk about latency.

99:59:59.999 --> 99:59:59.999
And when I say latency, I'm talking about latency, response time, any of those things

99:59:59.999 --> 99:59:59.999
where you measure time from 'here to here', and you're interested in how long it took.

99:59:59.999 --> 99:59:59.999
We do this all the time, but I see a lot of mish-mash in how people

99:59:59.999 --> 99:59:59.999
treat the data, or think about it.

99:59:59.999 --> 99:59:59.999
Latency is basically the time it took something to happen once.

99:59:59.999 --> 99:59:59.999
That one time, how long did it take?

99:59:59.999 --> 99:59:59.999
And when we measure stuff, like we did a million operations in the last hour,

99:59:59.999 --> 99:59:59.999
we have a million latencies. Not one, we have a million of them.

99:59:59.999 --> 99:59:59.999
Our actual goal is to figure out how to describe that million.

99:59:59.999 --> 99:59:59.999
How did the million behave?

99:59:59.999 --> 99:59:59.999
For example, 'they're all really good, and they're all exactly the same', would be a

99:59:59.999 --> 99:59:59.999
behavior that you will never see, but that would be a great behavior.

99:59:59.999 --> 99:59:59.999
So we need to talk about how things behave, communicate, think, evaluate,

99:59:59.999 --> 99:59:59.999
set requirements for, talk to other people, but these are all common things around that.

99:59:59.999 --> 99:59:59.999
To do that, we have to describe the distribution, the set, the behavior,

99:59:59.999 --> 99:59:59.999
but not the one.

99:59:59.999 --> 99:59:59.999
For example, the behavior that says "the common case was x" is a piece of

99:59:59.999 --> 99:59:59.999
information about the behavior, but it's a tiny sliver.

99:59:59.999 --> 99:59:59.999
Usually the least relevant one.

99:59:59.999 --> 99:59:59.999
Well, there's some less relevant ones, but not a strongly relevant one,

99:59:59.999 --> 99:59:59.999
and one that people often focus on.

99:59:59.999 --> 99:59:59.999
To take a look at what we actually do with this stuff, almost on a daily basis,

99:59:59.999 --> 99:59:59.999
this is a snapshot from a monitoring system.

99:59:59.999 --> 99:59:59.999
A small dashboard on a big screen in a monitoring system.

99:59:59.999 --> 99:59:59.999
Where you're watching the response time of a system over time.

99:59:59.999 --> 99:59:59.999
This is a two-hour window.

99:59:59.999 --> 99:59:59.999
These lines that are 95th percentile, 90, 75, 50, and 25th percentiles,

99:59:59.999 --> 99:59:59.999
you can look at how they behave over time.

99:59:59.999 --> 99:59:59.999
We're a small audience here, if you look at this picture, what draws your eye?

99:59:59.999 --> 99:59:59.999
What do you want to go investigate here or pay attention to?

99:59:59.999 --> 99:59:59.999
It's the big red spike there, right?

99:59:59.999 --> 99:59:59.999
So we could look at the red spike, 'cause it's different,

99:59:59.999 --> 99:59:59.999
and say, "Whoa, the 95th percentile shot up here. And look, the 90th percentile

99:59:59.999 --> 99:59:59.999
shot up at about the same time.

99:59:59.999 --> 99:59:59.999
The rest of them didn't shoot up, so maybe something happened here

99:59:59.999 --> 99:59:59.999
that affected that much, I should probably pay attention to it

99:59:59.999 --> 99:59:59.999
because it's a monitoring system, and I like things to be calm."

99:59:59.999 --> 99:59:59.999
You could go investigate the why.

99:59:59.999 --> 99:59:59.999
At this point, I've managed to waste about 90 seconds of your life,

99:59:59.999 --> 99:59:59.999
looking at a completely meaningless chart, which unfortunately you do

99:59:59.999 --> 99:59:59.999
every day, all the time.

99:59:59.999 --> 99:59:59.999
This chart is the chart you want to show somebody if you want to

99:59:59.999 --> 99:59:59.999
hide the truth from them.

99:59:59.999 --> 99:59:59.999
If you want to pull the wool over their eyes.

99:59:59.999 --> 99:59:59.999
This is the chart of the good stuff.

99:59:59.999 --> 99:59:59.999
What's not on this chart?

99:59:59.999 --> 99:59:59.999
The 5% worst things that happened during this two hours.

99:59:59.999 --> 99:59:59.999
They're not here.

99:59:59.999 --> 99:59:59.999
This is only the good things that happened during the things.

99:59:59.999 --> 99:59:59.999
And to get this spike, that 5% had to be so bad that it even pulled

99:59:59.999 --> 99:59:59.999
the 95th percentile up.

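NOTE
A minimal sketch of the point about the chart, not something shown in the talk. It assumes hypothetical data (1000 requests that take 10 ms, some of them stalled at 5000 ms) and a nearest-rank percentile; the helper names percentile and summarize are made up for illustration.
```python
def percentile(values, pct):
    # Nearest-rank percentile of a non-empty list: the smallest sample
    # that has at least pct% of all samples at or below it.
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]
# The percentile lines drawn on the dashboard, plus the max it leaves out.
def summarize(latencies_ms):
    summary = {f"p{p}": percentile(latencies_ms, p) for p in (25, 50, 75, 90, 95)}
    summary["max"] = max(latencies_ms)
    return summary
# Hypothetical samples, assumed for illustration only.
quiet = [10] * 950 + [5000] * 50  # exactly 5% of requests stalled
spike = [10] * 940 + [5000] * 60  # 6% of requests stalled
print("quiet:", summarize(quiet))
print("spike:", summarize(spike))
```
With exactly 5% of requests stalled, every line up to p95 stays at 10 ms and only the max reveals the 5000 ms stalls. In this sketch the p95 line moves only once more than 5% of requests are slow, and p90 still reads 10 ms.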