1 99:59:59,999 --> 99:59:59,999 Hi everyone, I'm Gil Tene.
2 99:59:59,999 --> 99:59:59,999 I'm going to be talking about this subject that I call "How NOT to Measure Latency".
3 99:59:59,999 --> 99:59:59,999 It's a subject that I've been talking about for 3 years or so.
4 99:59:59,999 --> 99:59:59,999 I keep the title and change all the slides every time.
5 99:59:59,999 --> 99:59:59,999 A bunch of this stuff is new.
6 99:59:59,999 --> 99:59:59,999 So if you've seen any of my previous "How NOT to" talks, you'll see only some things that are common.
7 99:59:59,999 --> 99:59:59,999 A nickname for the subject is this...
8 99:59:59,999 --> 99:59:59,999 Because I often get that reaction from some people in the audience.
9 99:59:59,999 --> 99:59:59,999 Ever since I've told people that it's a nickname,
10 99:59:59,999 --> 99:59:59,999 they feel free to actually exclaim, "Oh S@%#!".
11 99:59:59,999 --> 99:59:59,999 And feel free to do that here in this talk.
12 99:59:59,999 --> 99:59:59,999 I'll prompt you in a couple of places where it is natural.
13 99:59:59,999 --> 99:59:59,999 But if you just have the urge, go ahead.
14 99:59:59,999 --> 99:59:59,999 So just a tiny bit about me.
15 99:59:59,999 --> 99:59:59,999 I am the co-founder of Azul Systems.
16 99:59:59,999 --> 99:59:59,999 I play around with garbage collection a lot.
17 99:59:59,999 --> 99:59:59,999 Here is some evidence of me playing around with garbage collection in my kitchen.
18 99:59:59,999 --> 99:59:59,999 That's a trash compactor.
19 99:59:59,999 --> 99:59:59,999 The compaction function wasn't working right, so I had to fix it.
20 99:59:59,999 --> 99:59:59,999 I thought it'd be funny to take a picture with a book.
21 99:59:59,999 --> 99:59:59,999 I've also built a lot of things.
22 99:59:59,999 --> 99:59:59,999 I've been playing with computers since the early '80s.
23 99:59:59,999 --> 99:59:59,999 I've built hardware.
24 99:59:59,999 --> 99:59:59,999 I've helped design chips.
25 99:59:59,999 --> 99:59:59,999 I've built software at many different levels.
26 99:59:59,999 --> 99:59:59,999 Operating systems, drivers... JVMs, obviously.
27 99:59:59,999 --> 99:59:59,999 And lots of big systems at the system level.
28 99:59:59,999 --> 99:59:59,999 We built our own app server in the late '90s because WebLogic wasn't around yet.
29 99:59:59,999 --> 99:59:59,999 So, I've made a lot of mistakes, and I've learned from a few of them.
30 99:59:59,999 --> 99:59:59,999 This talk is actually a combination of a bunch of those mistakes, looking at latency.
31 99:59:59,999 --> 99:59:59,999 I do have this hobby of depressing people by pulling the wool out from over their eyes,
32 99:59:59,999 --> 99:59:59,999 and this is what this talk is about.
33 99:59:59,999 --> 99:59:59,999 So, I need to give you a choice right here.
34 99:59:59,999 --> 99:59:59,999 There's the door.
35 99:59:59,999 --> 99:59:59,999 You can take the blue pill, and you can leave.
36 99:59:59,999 --> 99:59:59,999 Tomorrow you can keep believing whatever it is you want to believe.
37 99:59:59,999 --> 99:59:59,999 But if you stay here and take the red pill, I will show you a glimpse of how
38 99:59:59,999 --> 99:59:59,999 far down the rabbit hole goes, and things will never be the same again.
39 99:59:59,999 --> 99:59:59,999 Let's talk about latency.
40 99:59:59,999 --> 99:59:59,999 And when I say latency, I'm talking about latency, response time, any of those things
41 99:59:59,999 --> 99:59:59,999 where you measure time from 'here to here', and you're interested in how long it took.
42 99:59:59,999 --> 99:59:59,999 We do this all the time, but I see a lot of mish-mash in how people
43 99:59:59,999 --> 99:59:59,999 treat the data, or think about it.
44 99:59:59,999 --> 99:59:59,999 Latency is basically the time it took something to happen once.
45 99:59:59,999 --> 99:59:59,999 That one time: how long did it take?
46 99:59:59,999 --> 99:59:59,999 And when we measure stuff, like if we did a million operations in the last hour,
47 99:59:59,999 --> 99:59:59,999 we have a million latencies. Not one; we have a million of them.
48 99:59:59,999 --> 99:59:59,999 Our actual goal is to figure out how to describe that million.
49 99:59:59,999 --> 99:59:59,999 How did the million behave?
50 99:59:59,999 --> 99:59:59,999 For example, 'they're all really good, and they're all exactly the same' would be a
51 99:59:59,999 --> 99:59:59,999 behavior that you will never see, but that would be a great behavior.
52 99:59:59,999 --> 99:59:59,999 So we need to talk about how things behave: communicate about it, think about it, evaluate it,
53 99:59:59,999 --> 99:59:59,999 set requirements for it, talk to other people about it. These are all common things we do around latency.
54 99:59:59,999 --> 99:59:59,999 To do that, we have to describe the distribution, the set, the behavior,
55 99:59:59,999 --> 99:59:59,999 and not just the one.
56 99:59:59,999 --> 99:59:59,999 For example, the statement "the common case was X" is a piece of
57 99:59:59,999 --> 99:59:59,999 information about the behavior, but it's a tiny sliver.
58 99:59:59,999 --> 99:59:59,999 Usually it's one of the least relevant ones.
59 99:59:59,999 --> 99:59:59,999 Well, there are some less relevant ones, but it's not a strongly relevant one,
60 99:59:59,999 --> 99:59:59,999 and yet it's one that people often focus on.
61 99:59:59,999 --> 99:59:59,999 To take a look at what we actually do with this stuff, almost on a daily basis,
62 99:59:59,999 --> 99:59:59,999 this is a snapshot from a monitoring system.
63 99:59:59,999 --> 99:59:59,999 A small dashboard on a big screen in a monitoring system,
64 99:59:59,999 --> 99:59:59,999 where you're watching the response time of a system over time.
65 99:59:59,999 --> 99:59:59,999 This is a two-hour window.
66 99:59:59,999 --> 99:59:59,999 These lines are the 95th, 90th, 75th, 50th, and 25th percentiles,
67 99:59:59,999 --> 99:59:59,999 and you can look at how they behave over time.
68 99:59:59,999 --> 99:59:59,999 We're a small audience here, so: if you look at this picture, what draws your eye?
69 99:59:59,999 --> 99:59:59,999 What do you want to go investigate here or pay attention to?
70 99:59:59,999 --> 99:59:59,999 It's the big red spike there, right?
71 99:59:59,999 --> 99:59:59,999 So we could look at the red spike, because it's different,
72 99:59:59,999 --> 99:59:59,999 and say, "Whoa, the 95th percentile shot up here. And look, the 90th percentile
73 99:59:59,999 --> 99:59:59,999 shot up at about the same time.
74 99:59:59,999 --> 99:59:59,999 The rest of them didn't shoot up, so maybe something happened here
75 99:59:59,999 --> 99:59:59,999 that affected that much. I should probably pay attention to it,
76 99:59:59,999 --> 99:59:59,999 because it's a monitoring system, and I like things to be calm."
77 99:59:59,999 --> 99:59:59,999 You could go investigate why.
78 99:59:59,999 --> 99:59:59,999 At this point, I've managed to waste about 90 seconds of your life
79 99:59:59,999 --> 99:59:59,999 looking at a completely meaningless chart, which, unfortunately, is what you do
80 99:59:59,999 --> 99:59:59,999 every day, all the time.
81 99:59:59,999 --> 99:59:59,999 This chart is the chart you want to show somebody if you want to
82 99:59:59,999 --> 99:59:59,999 hide the truth from them.
83 99:59:59,999 --> 99:59:59,999 If you want to pull the wool over their eyes.
84 99:59:59,999 --> 99:59:59,999 This is the chart of the good stuff.
85 99:59:59,999 --> 99:59:59,999 What's not on this chart?
86 99:59:59,999 --> 99:59:59,999 The worst 5% of things that happened during these two hours.
87 99:59:59,999 --> 99:59:59,999 They're not here.
88 99:59:59,999 --> 99:59:59,999 These are only the good things that happened during those two hours.
89 99:59:59,999 --> 99:59:59,999 And to get this spike, that 5% had to be so bad that it even pulled
90 99:59:59,999 --> 99:59:59,999 the 95th percentile up.
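The claim that a 25th-to-95th-percentile chart hides the worst 5% can be checked with a small simulation. This is a minimal sketch under stated assumptions, not the speaker's tooling: the data is synthetic (10,000 hypothetical requests in one window, latencies in microseconds), percentiles use the nearest-rank definition, and the names `percentile`, `summarize`, `DASHBOARD`, and `FULL` are made up for this illustration. In the "stalled" window, 4% of requests take 5 seconds, yet every line the dashboard draws stays under 2 ms; only the 99th percentile and the maximum reveal the stall.

```python
import math

def percentile(sorted_values, pct):
    """Nearest-rank percentile: the smallest sample such that at least pct%
    of all samples are <= it. Expects a non-empty list sorted ascending."""
    n = len(sorted_values)
    if n == 0:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * n / 100))  # 1-based rank
    return sorted_values[rank - 1]

def summarize(latencies_us, pcts):
    """Describe the whole set of latencies by several percentiles, not one number."""
    ordered = sorted(latencies_us)
    return {p: percentile(ordered, p) for p in pcts}

DASHBOARD = (25, 50, 75, 90, 95)      # the lines a chart like this one draws
FULL = DASHBOARD + (99, 99.9, 100)    # what it leaves out; 100 is the maximum

# Hypothetical window of 10,000 requests, latencies in microseconds.
# "calm": every request takes between 1,000 and 1,990 us.
calm = [1_000 + (i % 100) * 10 for i in range(10_000)]
# "stalled": identical, except every 25th request (4% of them) takes 5 seconds.
stalled = [5_000_000 if i % 25 == 0 else v for i, v in enumerate(calm)]

for name, window in (("calm", calm), ("stalled", stalled)):
    s = summarize(window, FULL)
    print(name, "dashboard:", [s[p] for p in DASHBOARD],
          "p99:", s[99], "max:", s[100])
# calm dashboard: [1240, 1490, 1740, 1890, 1940] p99: 1980 max: 1990
# stalled dashboard: [1260, 1520, 1780, 1930, 1980] p99: 5000000 max: 5000000
```

Because fewer than 5% of the requests stalled, the 95th-percentile line barely moves (1,940 us to 1,980 us) even though 400 requests each took 5 seconds. For that line to spike, the bad requests have to make up more than 5% of the window, which is the point made at the end of this section.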