Hi everyone, I'm Gil Tene. I'm going to be talking about a subject that I call "How NOT to Measure Latency". It's a subject I've been talking about for three years or so. I keep the title and change all the slides every time, so a bunch of this stuff is new. If you've seen any of my previous "How NOT to" talks, only some things will be familiar. A nickname for the subject is this... because I often get that reaction from some people in the audience. Ever since I've told people that it's a nickname, they feel free to actually exclaim, "Oh S@%#!". Feel free to do that here in this talk. I'll prompt you in a couple of places where it is natural, but if you just have the urge, go ahead. So, just a tiny bit about me. I am the co-founder of Azul Systems. I play around with garbage collection a lot. Here is some evidence of me playing around with garbage collection in my kitchen. That's a trash compactor. The compaction function wasn't working right, so I had to fix it, and I thought it'd be funny to take a picture with a book. I've also built a lot of things. I've been playing with computers since the early '80s. I've built hardware. I've helped design chips. I've built software at many different levels: operating systems, drivers... JVMs, obviously.
And lots of big systems at the system level. We built our own app server in the late '90s because WebLogic wasn't around yet. So, I've made a lot of mistakes, and I've learned from a few of them. This talk is actually a combination of a bunch of those mistakes, looking at latency. I do have this hobby of depressing people by pulling the wool up from over their eyes, and that is what this talk is about. So, I need to give you a choice right here. There's the door. You can take the blue pill and leave. Tomorrow you can keep believing whatever it is you want to believe. But if you stay here and take the red pill, I will show you a glimpse of how far down the rabbit hole goes, and it will never be the same again. Let's talk about latency. When I say latency, I'm talking about latency, response time, any of those things where you measure the time from 'here to here' and you're interested in how long it took. We do this all the time, but I see a lot of mish-mash in how people treat the data or think about it. Latency is basically the time it took something to happen once: that one time, how long did it take? And when we measure stuff, say we did a million operations in the last hour, we have a million latencies. Not one; we have a million of them. Our actual goal is to figure out how to describe that million. How did the million behave?
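To make "describe the million" concrete, here is a minimal Python sketch that summarizes a whole set of recorded latencies with several percentiles plus the maximum, instead of a single number. It assumes the nearest-rank definition of a percentile; the function names `percentile` and `describe` and the sample data are hypothetical, made up for illustration.

```python
import math
from fractions import Fraction


def percentile(sorted_samples, p):
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it. Expects a sorted, non-empty list."""
    if not sorted_samples:
        raise ValueError("no samples recorded")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    # Fraction(str(p)) keeps values like 99.9 exact, so the rank is not
    # shifted by binary floating-point rounding.
    rank = math.ceil(Fraction(str(p)) * len(sorted_samples) / 100)
    return sorted_samples[rank - 1]


def describe(latencies_ms, percentiles=(50, 90, 99, 99.9)):
    """Summarize the behavior of a whole set of latencies, not just one of them."""
    s = sorted(latencies_ms)
    summary = {f"p{p}": percentile(s, p) for p in percentiles}
    summary["max"] = s[-1]
    summary["count"] = len(s)
    return summary


if __name__ == "__main__":
    # Hypothetical data (ms): 990 fast requests, 9 slower ones, one outlier.
    samples = [1] * 990 + [50] * 9 + [2000]
    print(describe(samples))
    # {'p50': 1, 'p90': 1, 'p99': 1, 'p99.9': 50, 'max': 2000, 'count': 1000}
```

In this made-up set the median, 90th and 99th percentiles all report 1 ms; only the 99.9th percentile and the maximum show that some requests took far longer.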
For example, 'they're all really good, and they're all exactly the same' would be a behavior that you will never see, but it would be a great behavior. So we need to talk about how things behave: communicate about it, think about it, evaluate it, set requirements for it, talk to other people about it. These are all common things we do around latency. To do that, we have to describe the distribution, the set, the behavior, not the one. For example, the statement "the common case was x" is a piece of information about the behavior, but it's a tiny sliver. It's usually the least relevant one. Well, there are some even less relevant ones, but it's not a strongly relevant one, and it's one that people often focus on. To take a look at what we actually do with this stuff, almost on a daily basis, this is a snapshot from a monitoring system: a small dashboard on a big screen, where you're watching the response time of a system over time. This is a two-hour window. These lines are the 95th, 90th, 75th, 50th, and 25th percentiles, and you can look at how they behave over time. We're a small audience here; if you look at this picture, what draws your eye? What do you want to go investigate here, or pay attention to? It's the big red spike there, right? So we could look at the red spike, because it's different, and say, "Whoa, the 95th percentile shot up here.
And look, the 90th percentile shot up at about the same time. The rest of them didn't shoot up, so maybe something happened here that affected that much. I should probably pay attention to it, because it's a monitoring system and I like things to be calm." You could go investigate the why. At this point, I've managed to waste about 90 seconds of your life looking at a completely meaningless chart, which, unfortunately, you do every day, all the time. This chart is the chart you want to show somebody if you want to hide the truth from them, if you want to pull the wool over their eyes. This is the chart of the good stuff. What's not on this chart? The 5% worst things that happened during these two hours. They're not here. This is only the good things that happened during that time. And to get this spike, that 5% had to be so bad that it even pulled the 95th percentile up.
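The point that a chart topping out at the 95th percentile never shows the worst 5% can be checked with a small Python sketch. It uses two hypothetical one-minute windows of 1,000 requests each (synthetic data, latencies in milliseconds) and the nearest-rank percentile definition; the names `window_summary`, `calm_looking` and `spiky` are made up for illustration. In the first window 4% of requests take 5 seconds, yet the 95th percentile stays at 10 ms; in the second, 7% do, and only then does the 95th percentile line move. With this definition the line only jumps once more than 5% of a window's requests are slow.

```python
import math
from fractions import Fraction


def percentile(sorted_samples, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    # Fraction(str(p)) keeps the rank computation exact.
    rank = math.ceil(Fraction(str(p)) * len(sorted_samples) / 100)
    return sorted_samples[max(rank, 1) - 1]


def window_summary(samples_ms):
    """What a p50/p95 dashboard shows for one window, plus what it leaves out."""
    s = sorted(samples_ms)
    p95 = percentile(s, 95)
    return {
        "p50": percentile(s, 50),
        "p95": p95,
        "max": s[-1],
        # Requests slower than the plotted p95 line: invisible on such a chart.
        "above_p95": sum(1 for x in s if x > p95),
    }


# Two hypothetical windows of 1,000 requests each (synthetic data, in ms).
calm_looking = [10] * 960 + [5000] * 40  # 4% of requests took 5 s
spiky = [10] * 930 + [5000] * 70         # 7% of requests took 5 s

if __name__ == "__main__":
    for name, window in (("calm-looking", calm_looking), ("spiky", spiky)):
        print(name, window_summary(window))
    # calm-looking {'p50': 10, 'p95': 10, 'max': 5000, 'above_p95': 40}
    # spiky {'p50': 10, 'p95': 5000, 'max': 5000, 'above_p95': 0}
```

The `above_p95` count is exactly the part of each window that a percentile chart capped at the 95th never draws: 40 five-second requests in the window that looks calm.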