
"How NOT to Measure Latency" by Gil Tene

  • Not Synced
    Hi everyone, I'm Gil Tene.
  • Not Synced
    I'm going to be talking about this subject
    that I call "How NOT to Measure Latency".
  • Not Synced
    It's a subject that I've been talking
    about for 3 years or so.
  • Not Synced
    I keep the title and change all
    the slides every time.
  • Not Synced
    A bunch of this stuff is new.
  • Not Synced
    So if you've seen any of my previous "How NOT to",
    you'll see only some things that are common.
  • Not Synced
    A nickname for the subject is this...
  • Not Synced
    Because I often will get that reaction
    from some people in the audience.
  • Not Synced
    Ever since I've told people that it's a
    nickname,
  • Not Synced
    They feel free to actually exclaim,
    "Oh S@%#!".
  • Not Synced
    And feel free to do that here in this talk.
  • Not Synced
    I'll prompt you in a couple of places
    where it is natural.
  • Not Synced
    But if you just have the urge, go ahead.
  • Not Synced
    So just a tiny bit about me.
  • Not Synced
    I am the co-founder of Azul Systems.
  • Not Synced
    I play around with garbage collection a lot.
  • Not Synced
    Here is some evidence of me playing around
    with garbage collection in my kitchen.
  • Not Synced
    That's a trash compactor.
  • Not Synced
    The compaction function wasn't working right,
    so I had to fix it.
  • Not Synced
    I thought it'd be funny to take a picture
    with a book.
  • Not Synced
    I've also built a lot of things.
  • Not Synced
    I've been playing with computers since
    the early 80's.
  • Not Synced
    I've built hardware.
  • Not Synced
    I've helped design chips.
  • Not Synced
    I've built software at many
    different levels.
  • Not Synced
    Operating systems, drivers...
    JVMs obviously.
  • Not Synced
    And lots of big systems at the system level.
  • Not Synced
    Built our own app server in the late 90s
    because WebLogic wasn't around yet.
  • Not Synced
    So, I've made a lot of mistakes,
    and I've learned from a few of them.
  • Not Synced
    This is actually a combination of a bunch
    of those mistakes looking at latency.
  • Not Synced
    I do have this hobby of depressing people
    by pulling the wool up from over your eyes,
  • Not Synced
    and this is what this talk is about.
  • Not Synced
    So, I need to give you a choice right here.
  • Not Synced
    There's the door.
  • Not Synced
    You can take the blue pill,
    and you can leave.
  • Not Synced
    Tomorrow you can keep believing whatever
    it is you want to believe.
  • Not Synced
    But if you stay here and take the red pill,
    I will show you a glimpse of how
  • Not Synced
    far down the rabbit hole goes,
    and it will never be the same again.
  • Not Synced
    Let's talk about latency.
  • Not Synced
    And when I say latency, I'm talking about
    latency, response time, any of those things
  • Not Synced
    where you measure time from 'here to here',
    and you're interested in how long it took.
  • Not Synced
    We do this all the time, but I see a lot
    of mish-mash in how people
  • Not Synced
    treat the data, or think about it.
  • Not Synced
    Latency is basically the time it took
    something to happen once.
  • Not Synced
    That one time, how long did it take.
  • Not Synced
    And when we measure stuff, like we did
    a million operations in the last hour,
  • Not Synced
    we have a million latencies. Not one,
    we have a million of them.
  • Not Synced
    Our actual goal is to figure out how to
    describe that million.
  • Not Synced
    How did the million behave?
  • Not Synced
    For example, 'they're all really good, and
    they're all exactly the same', would be a
  • Not Synced
    behavior that you will never see,
    but that would be a great behavior.
  • Not Synced
    So we need to talk about how things behave,
    communicate, think, evaluate,
  • Not Synced
    set requirements for, talk to other people,
    but these are all common things around that.
  • Not Synced
    To do that, we have to describe the
    distribution, the set, the behavior,
  • Not Synced
    but not the one.
  • Not Synced
    For example, the behavior that says "the
    common case was x" is a piece of
  • Not Synced
    information about the behavior,
    but it's a tiny sliver.
  • Not Synced
    Usually the least relevant one.
  • Not Synced
    Well, there's some less relevant ones,
    but not a strongly relevant one,
  • Not Synced
    and one that people often focus on.
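As a minimal sketch of describing the whole set rather than the common case: the latencies below are made-up values (not from the talk), and `percentile` is a hypothetical nearest-rank helper written for this illustration.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds (made up for illustration):
# most operations are fast, some are slower, and a few stall badly.
latencies_ms = [1.0] * 98_000 + [5.0] * 1_800 + [250.0] * 200

# Describe the distribution, not just the common case.
summary = {
    "median": percentile(latencies_ms, 50),
    "p99": percentile(latencies_ms, 99),
    "p99.9": percentile(latencies_ms, 99.9),
    "max": max(latencies_ms),
}

for name, value in summary.items():
    print(f"{name:>6}: {value} ms")
```

With this made-up data, the common case is 1 ms, yet the slowest operations took 250 ms. Reporting only the common case would describe a tiny sliver of the behavior.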
  • Not Synced
    To take a look at what we actually do
    with this stuff, almost on a daily basis,
  • Not Synced
    this is a snapshot from a monitoring system.
  • Not Synced
    A small dashboard on a big screen
    in a monitoring system.
  • Not Synced
    Where you're watching the response time of
    a system over time.
  • Not Synced
    This is a two hour window.
  • Not Synced
    These lines that are 95th percentile,
    90, 75, 50, and 25th percentiles,
  • Not Synced
    you can look at how they behave over time.
  • Not Synced
    We're a small audience here, if you look at
    this picture, what draws your eye?
  • Not Synced
    What do you want to go investigate here
    or pay attention to?
  • Not Synced
    It's the big red spike there, right?
  • Not Synced
    So we could look at the red spike,
    'cause it's different,
  • Not Synced
    and say, "Whoa, the 95th percentile shot up
    here. And look, the 90th percentile
  • Not Synced
    shot up at about the same time.
  • Not Synced
    The rest of them didn't shoot up,
    so maybe something happened here
  • Not Synced
    that affected that much, I should probably
    pay attention to it
  • Not Synced
    because it's a monitoring system, and
    I like things to be calm."
  • Not Synced
    You could go investigate the why.
  • Not Synced
    At this point, I've managed to waste
    about 90 seconds of your life,
  • Not Synced
    looking at a completely meaningless chart,
    which unfortunately you do
  • Not Synced
    every day, all the time.
  • Not Synced
    This chart is the chart you want to show
    somebody if you want to
  • Not Synced
    hide the truth from them.
  • Not Synced
    If you want to pull the wool
    over their eyes.
  • Not Synced
    This is the chart of the good stuff.
  • Not Synced
    What's not on this chart?
  • Not Synced
    The 5% worst things that happened during
    this two hours.
  • Not Synced
    They're not here.
  • Not Synced
    This is only the good things that happened
    during those two hours.
  • Not Synced
    And to get this spike, that 5% had to be
    so bad that it even pulled
  • Not Synced
    the 95th percentile all up.
  • Not Synced
    There is zero information here at all about
    what happened bad during this two hours,
  • Not Synced
    which makes it a bad fit for
    a monitoring system.
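To see why lines up to the 95th percentile say nothing about the worst 5%, here is a small sketch. Both data sets are invented for illustration, and `percentile` is a hypothetical nearest-rank helper rather than anything from the monitoring tool in the snapshot.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Two hypothetical sets of 100 response times (ms).
# They differ only in their worst 5 samples.
calm_ms = [10] * 95 + [12] * 5
ugly_ms = [10] * 95 + [12, 900, 4_000, 15_000, 60_000]

# The percentiles plotted on the dashboard described above.
plotted = (25, 50, 75, 90, 95)
calm_lines = [percentile(calm_ms, p) for p in plotted]
ugly_lines = [percentile(ugly_ms, p) for p in plotted]

print("plotted lines identical:", calm_lines == ugly_lines)
print("max calm:", max(calm_ms), "ms; max ugly:", max(ugly_ms), "ms")
```

The plotted lines come out identical for both sets, while the maximum differs by several orders of magnitude. A chart that stops at the 95th percentile cannot tell these two systems apart.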
  • Not Synced
    It's a really good thing for
    a marketing system.
  • Not Synced
    It's a great way to get the bonus from your boss,
    even though you didn't do the work.
  • Not Synced
    If you want to learn how to do that,
    we can do another talk about that.
  • Not Synced
    But this is not a good way to look at latency.
  • Not Synced
    It's the opposite of good.
  • Not Synced
    Unfortunately, this is one of the most
    common tools used for
  • Not Synced
    server monitoring on earth right now.
  • Not Synced
    That's where the snapshot is from,
    and this is what people look at.
  • Not Synced
    I find this chart to be a goldmine
    of information.
  • Not Synced
    When I first showed it in another talk
    like this, I had this really cool experience.
  • Not Synced
    Somebody came up to me and said, "Hey,
    as I was sitting here, I was texting one
  • Not Synced
    of our guys, and he was saying,
  • Not Synced
    'look, we have this issue with
    our 95th percentile'."
  • Not Synced
    And I got this chart from him!
  • Not Synced
    So I went and said, "Hey, what does the
    rest of the spectrum look like?"
  • Not Synced
    This is the actual chart they got.
  • Not Synced
    And when they looked at the rest of the
    spectrum, it looked like that.
  • Not Synced
    That's what was hiding.
  • Not Synced
    Notice the scales are a little different.
  • Not Synced
    That yellow line is that yellow line.
  • Not Synced
    So that's a much more representative number.
  • Not Synced
    Is it? Is that good enough?
  • Not Synced
    That's the 99th percentile.
  • Not Synced
    We still have another 1% of really bad
    stuff that's hiding above the blue line.
  • Not Synced
    I wonder how big that is?
  • Not Synced
    I don't know because he didn't have the data.
  • Not Synced
    So a common problem that we have is that
    we only plot what's convenient.
  • Not Synced
    We only plot what gives us nice,
    colorful graphs.
  • Not Synced
    And often, when we have to choose between
    the stuff that hides the rest of the data,
  • Not Synced
    and the stuff that is noise, we choose
    the noise to display.
  • Not Synced
    I like to rant about latency.
  • Not Synced
    This is from a blog that I don't write
    enough in, but the format for it was simple.
  • Not Synced
    I tweet a single tweet about latency,
    latency tip of the day,
  • Not Synced
    and then I rant about my own tweet.
  • Not Synced
    As an example, this chart is a goldmine
    of information because it has so many
  • Not Synced
    different things that are wrong in it,
    but we won't get into all of them.
  • Not Synced
    You can read it online.
  • Not Synced
    Anyway, this is one to take away from
    what we just said.
  • Not Synced
    If you are not measuring and showing the
    maximum value, what is it you are hiding?
  • Not Synced
    And from whom?
  • Not Synced
    If your job is to hide the truth from
    others, this is a good way to do it.
  • Not Synced
    But if you actually are interested in what's
    going on, the number one indicator
  • Not Synced
    you should never get rid of is the
    maximum value.
  • Not Synced
    That is not noise, that is the signal.
  • Not Synced
    The rest of it is noise.
  • Not Synced
    Okay, let's look at this chart for some
    more cool stuff.
  • Not Synced
    I'm gonna zoom in to a small part
    of the chart, and ask you what that means.
  • Not Synced
    What does the average of the 95th percentile
    over 2 hours mean?
  • Not Synced
    What is the math that does that?
  • Not Synced
    What does it do?
  • Not Synced
    Let's look at that, and I'll give you
    an example with another percentile.
  • Not Synced
    The 100th percentile. The max, right?
  • Not Synced
    Let's take a data set.
  • Not Synced
    Suppose this was the maximum every minute
    for 15 minutes.
  • Not Synced
    What does it mean to say that the average
    max over the last 15 minutes was 42?
  • Not Synced
    I specifically chose the data to
    make that happen.
  • Not Synced
    It's a meaningless statement.
  • Not Synced
    It's a completely meaningless statement.
  • Not Synced
    But when you see 95th percentile,
    average 184, you think that the 95th
  • Not Synced
    percentile for the last two hours
    was around 184.
  • Not Synced
    It makes you think that.
  • Not Synced
    Putting this on a piece of paper is not
    just noise and irrelevant,
  • Not Synced
    it's a way to mislead people.
  • Not Synced
    It's a way to mislead yourself, because
    you'll start to believe your own mistruths.
  • Not Synced
    This is true for any percentile.
  • Not Synced
    There is no percentile that you could do
    this math on.
  • Not Synced
    Another tip, you cannot average percentiles.
  • Not Synced
    That math doesn't happen.
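A small sketch of why averaging percentiles misleads. The per-minute maxima below are hypothetical values chosen so that their average is 42; they are not the data from the slide. The two "minutes" of response times and the nearest-rank `percentile` helper are assumptions for illustration as well.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical per-minute maximum latency (ms) for 15 minutes.
minute_max_ms = [12, 9, 15, 11, 10, 14, 8, 13, 450, 16, 12, 11, 17, 10, 22]
average_of_max = sum(minute_max_ms) / len(minute_max_ms)
actual_max = max(minute_max_ms)
print("average of per-minute max:", average_of_max, "ms")
print("actual max over 15 minutes:", actual_max, "ms")

# The same problem with the 95th percentile, using two hypothetical minutes.
quiet_minute = [10] * 1_000                   # every request takes 10 ms
busy_minute = [10] * 800 + [1_000] * 200      # 20% of requests take 1 second
average_of_p95 = (percentile(quiet_minute, 95) + percentile(busy_minute, 95)) / 2
actual_p95 = percentile(quiet_minute + busy_minute, 95)
print("average of per-minute p95:", average_of_p95, "ms")
print("actual p95 over both minutes:", actual_p95, "ms")
```

In this made-up data the "average max" of 42 ms matches no latency that occurred in any minute, while the real maximum was 450 ms. Likewise, the averaged 95th percentile (505 ms) is not the 95th percentile of the combined data (1000 ms). Getting a percentile for the whole window requires the underlying samples, or a histogram of them, not the per-interval percentiles.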
  • Not Synced
    But percentiles do matter. You really
    want to know about them.
  • Not Synced
    And a common misperception is that we want
    to look at the main part of the spectrum,
  • Not Synced
    not those outliers and perfection stuff.
  • Not Synced
    Only people that actually bet their house
    every day, or the bank on it,
  • Not Synced
    need to know about the "five nines",
    and all those.
  • Not Synced
    The 99th percentile is a pretty
    good number.
  • Not Synced
    Is 99% really rare?
  • Not Synced
    Let's look at some stuff, because we can
    ask questions like, "If I were looking
  • Not Synced
    at a webpage, what is the chance of me
    hitting the 99th percentile?"
  • Not Synced
    Of things like this: a search engine node,
    or a key-value store,
  • Not Synced
    or a database, or a CDN, right?
  • Not Synced
    Because they will report their 99th percentile.
  • Not Synced
    They won't tell you anything above that,
    but how many of the
  • Not Synced
    webpages that we go to
    actually experience this?
  • Not Synced
    You want to say 1%, right?
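One way to check that intuition is a short calculation. This is a sketch under two stated assumptions that do not come from the talk: a page load issues some number of requests to the service, and each request independently has a 1% chance of landing above the 99th percentile. The function name and the request counts are hypothetical.

```python
def chance_of_seeing_tail(requests_per_page, pct=99.0):
    """Probability that at least one of the page's requests is slower than the given percentile.

    Assumes requests are independent, which real systems only approximate.
    """
    chance_one_request_is_fine = pct / 100
    return 1 - chance_one_request_is_fine ** requests_per_page

# Hypothetical request counts per page load.
for n in (1, 10, 50, 100):
    print(f"{n:>3} requests per page: {chance_of_seeing_tail(n):.1%} chance of hitting the 99th percentile")
```

Under these assumptions the answer is 1% only when a page makes a single request. At 100 requests per page, roughly 63% of page loads see at least one response worse than the 99th percentile.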
Video Language:
English
Duration:
42:59
