0:00:05.901,0:00:10.531 So, we had a talk by a non-GitLab person[br]about GitLab. 0:00:10.531,0:00:13.057 Now, we have a talk by a GitLab person[br]on non-GitLab. 0:00:13.202,0:00:14.603 Something like that? 0:00:15.894,0:00:19.393 The CCCHH hackerspace is now open, 0:00:19.946,0:00:22.118 from now on if you want to go there,[br]that's the announcement. 0:00:22.471,0:00:25.871 And the next talk will be by Ben Kochie 0:00:26.009,0:00:28.265 on metrics-based monitoring[br]with Prometheus. 0:00:28.748,0:00:30.212 Welcome. 0:00:30.545,0:00:33.133 [Applause] 0:00:35.395,0:00:36.578 Alright, so 0:00:36.886,0:00:39.371 my name is Ben Kochie 0:00:39.845,0:00:43.870 I work on DevOps features for GitLab 0:00:44.327,0:00:48.293 and apart from working for GitLab, I also work[br]on the opensource Prometheus project. 0:00:51.163,0:00:54.355 I live in Berlin and I've been using[br]Debian since ??? 0:00:54.355,0:00:56.797 yes, quite a long time. 0:00:58.806,0:01:01.018 So, what is Metrics-based Monitoring? 0:01:02.638,0:01:05.165 If you're running software in production, 0:01:05.585,0:01:07.772 you probably want to monitor it, 0:01:07.772,0:01:10.547 because if you don't monitor it, you don't[br]know if it's right. 0:01:12.648,0:01:16.112 Monitoring breaks down into two categories: 0:01:16.112,0:01:19.146 there's blackbox monitoring and[br]there's whitebox monitoring. 0:01:19.500,0:01:24.582 Blackbox monitoring is treating[br]your software like a blackbox. 0:01:24.757,0:01:26.377 It's just checks to see, like, 0:01:26.377,0:01:29.483 is it responding, or does it ping 0:01:29.753,0:01:33.588 or ??? HTTP requests 0:01:34.348,0:01:35.669 [mic turned on] 0:01:37.760,0:01:41.379 Ah, there we go, that's better. 0:01:46.592,0:01:51.898 So, blackbox monitoring is a probe, 0:01:51.898,0:01:54.684 it just kind of looks from the outside[br]at your software 0:01:55.454,0:01:57.432 and it has no knowledge of the internals 0:01:58.133,0:02:00.699 and it's really good for end to end testing. 0:02:00.942,0:02:03.560 So if you've got a fairly complicated[br]service, 0:02:03.990,0:02:06.426 you come in from the outside, you go[br]through the load balancer, 0:02:06.721,0:02:07.975 you hit the API server, 0:02:07.975,0:02:10.145 the API server might hit a database, 0:02:10.145,0:02:12.844 and you go all the way through[br]to the back of the stack 0:02:12.844,0:02:14.536 and all the way back out 0:02:14.560,0:02:16.294 so you know that everything is working[br]end to end. 0:02:16.328,0:02:18.768 But you only know about it[br]for that one request. 0:02:19.036,0:02:22.429 So in order to find out if your service[br]is working, 0:02:22.831,0:02:27.128 from the end to end, for every single[br]request, 0:02:27.135,0:02:29.523 this requires whitebox instrumentation. 0:02:29.836,0:02:33.965 So, basically, every event that happens[br]inside your software, 0:02:33.973,0:02:36.517 inside a serving stack, 0:02:36.817,0:02:39.807 gets collected and gets counted, 0:02:40.037,0:02:43.466 so you know that every request hits[br]the load balancer, 0:02:43.466,0:02:45.656 every request hits your application[br]service, 0:02:45.702,0:02:47.329 every request hits the database. 0:02:47.789,0:02:50.832 You know that everything matches up 0:02:50.997,0:02:55.764 and this is called whitebox, or[br]metrics-based monitoring. 0:02:56.010,0:02:57.688 There are different examples of, like, 0:02:57.913,0:03:02.392 the kind of software that does blackbox[br]and whitebox monitoring.
0:03:02.572,0:03:06.680 So you have software like Nagios where[br]you can configure checks 0:03:08.826,0:03:10.012 or pingdom, 0:03:10.211,0:03:12.347 pingdom will do a ping of your website. 0:03:12.971,0:03:15.307 And then there is metrics-based monitoring, 0:03:15.517,0:03:19.293 things like Prometheus, things like[br]the TICK stack from influx data, 0:03:19.610,0:03:22.728 New Relic and other commercial solutions 0:03:23.027,0:03:25.480 but of course I like to talk about[br]the opensource solutions. 0:03:25.748,0:03:28.379 We're gonna talk a little bit about[br]Prometheus. 0:03:28.819,0:03:31.955 Prometheus came out of the idea that 0:03:32.343,0:03:37.555 we needed a monitoring system that could[br]collect all this whitebox metric data 0:03:37.941,0:03:40.786 and do something useful with it. 0:03:40.915,0:03:42.667 Not just give us a pretty graph, but[br]we also want to be able to 0:03:42.985,0:03:44.189 alert on it. 0:03:44.189,0:03:45.988 So we needed both 0:03:49.872,0:03:54.068 a data gathering and an analytics system[br]in the same instance. 0:03:54.148,0:03:58.821 To do this, we built this thing and[br]we looked at the way that 0:03:59.014,0:04:01.835 data was being generated[br]by the applications 0:04:02.369,0:04:05.204 and there are advantages and[br]disadvantages to this 0:04:05.204,0:04:07.250 push vs. pull model for metrics. 0:04:07.384,0:04:09.701 We decided to go with the pulling model 0:04:09.938,0:04:13.953 because there are some slight advantages[br]for pulling over pushing. 0:04:16.323,0:04:18.163 With pulling, you get this free[br]blackbox check 0:04:18.471,0:04:20.151 that the application is running. 0:04:20.527,0:04:24.319 When you pull your application, you know[br]that the process is running. 0:04:24.532,0:04:27.529 If you are doing push-based, you can't[br]tell the difference between 0:04:27.851,0:04:31.521 your application doing no work and[br]your application not running. 0:04:32.416,0:04:33.900 So you don't know if it's stuck, 0:04:34.140,0:04:37.878 or is it just not having to do any work. 0:04:42.671,0:04:48.940 With pulling, the pulling system knows[br]the state of your network. 0:04:49.850,0:04:52.522 If you have a defined set of services, 0:04:52.887,0:04:56.788 that inventory drives what should be there. 0:04:58.274,0:05:00.080 Again, it's like the disappearing problem, 0:05:00.288,0:05:03.950 is the process dead, or is it just[br]not doing anything? 0:05:04.205,0:05:07.117 With polling, you know for a fact[br]what processes should be there, 0:05:07.593,0:05:10.900 and it's a bit of an advantage there. 0:05:11.138,0:05:12.913 With pulling, there's really easy testing. 0:05:13.117,0:05:16.295 With push-based metrics, you have to[br]figure out 0:05:16.505,0:05:18.843 if you want to test a new version of[br]the monitoring system or 0:05:19.058,0:05:20.980 you want to test something new, 0:05:20.980,0:05:24.129 you have to tear off a copy of the data. 0:05:24.370,0:05:27.652 With pulling, you can just set up[br]another instance of your monitoring 0:05:27.676,0:05:29.189 and just test it. 0:05:29.714,0:05:31.033 Or you don't even have, 0:05:31.033,0:05:33.194 it doesn't even have to be monitoring,[br]you can just use curl 0:05:33.199,0:05:35.487 to pull the metrics endpoint. 0:05:38.417,0:05:40.436 It's significantly easier to test. 0:05:40.436,0:05:42.977 The other thing with the… 0:05:45.999,0:05:48.109 The other nice thing is that[br]the client is really simple. 0:05:48.481,0:05:51.068 The client doesn't have to know[br]where the monitoring system is.
0:05:51.272,0:05:53.669 It doesn't have to know about HA 0:05:53.820,0:05:55.720 It just has to sit and collect the data[br]about itself. 0:05:55.882,0:05:58.708 So it doesn't have to know anything about[br]the topology of the network. 0:05:59.134,0:06:03.363 As an application developer, if you're[br]writing a DNS server or 0:06:03.724,0:06:05.572 some other piece of software, 0:06:05.896,0:06:09.562 you don't have to know anything about[br]monitoring software, 0:06:09.803,0:06:12.217 you can just implement it inside[br]your application and 0:06:12.683,0:06:17.058 the monitoring software, whether it's[br]Prometheus or something else, 0:06:17.414,0:06:19.332 can just come and collect that data for you. 0:06:20.210,0:06:23.611 That's kind of similar to a very old[br]monitoring system called SNMP, 0:06:23.832,0:06:28.530 but SNMP has a significantly less friendly[br]data model for developers. 0:06:30.010,0:06:33.556 This is the basic layout[br]of a Prometheus server. 0:06:33.921,0:06:35.918 At the core, there's a Prometheus server 0:06:36.278,0:06:40.302 and it deals with all the data collection[br]and analytics. 0:06:42.941,0:06:46.697 Basically, this one binary,[br]it's all written in golang. 0:06:46.867,0:06:48.559 It's a single binary. 0:06:48.559,0:06:50.823 It knows how to read from your inventory, 0:06:50.823,0:06:52.659 there's a bunch of different methods,[br]whether you've got 0:06:53.121,0:06:58.843 a kubernetes cluster or a cloud platform 0:07:00.234,0:07:03.800 or you have your own customized thing[br]with ansible. 0:07:05.380,0:07:09.750 Ansible can take your layout, drop that[br]into a config file and 0:07:10.639,0:07:11.902 Prometheus can pick that up. 0:07:15.594,0:07:18.812 Once it has the layout, it goes out and[br]collects all the data. 0:07:18.844,0:07:24.254 It has a storage and a time series[br]database to store all that data locally. 0:07:24.462,0:07:28.228 It has a thing called PromQL, which is[br]a query language designed 0:07:28.452,0:07:31.033 for metrics and analytics. 0:07:31.500,0:07:36.779 From that PromQL, you can add frontends[br]that will, 0:07:36.985,0:07:39.319 whether it's a simple API client[br]to run reports, 0:07:40.019,0:07:42.942 you can use things like Grafana[br]for creating dashboards, 0:07:43.124,0:07:44.834 it's got a simple webUI built in. 0:07:45.031,0:07:46.920 You can plug in anything you want[br]on that side. 0:07:48.693,0:07:54.478 And then, it also has the ability to[br]continuously execute queries 0:07:54.625,0:07:56.191 called "recording rules" 0:07:56.832,0:07:59.103 and these recording rules have[br]two different modes. 0:07:59.103,0:08:01.871 You can either record, you can take[br]a query 0:08:02.150,0:08:03.711 and it will generate new data[br]from that query 0:08:04.072,0:08:06.967 or you can take a query, and[br]if it returns results, 0:08:07.354,0:08:08.910 it will return an alert. 0:08:09.176,0:08:12.506 That alert is a push message[br]to the alert manager. 0:08:12.813,0:08:18.969 This allows us to separate the generating[br]of alerts from the routing of alerts. 
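To make the pieces described above concrete, here is a minimal sketch of what a Prometheus setup with a file-based inventory, one recording rule and one alerting rule might look like on disk. This is a hypothetical example for illustration; the file paths, job name and rule names are invented and not shown in the talk:

# prometheus.yml -- hypothetical sketch
rule_files:
  - rules.yml
scrape_configs:
  - job_name: node
    file_sd_configs:
      - files:
          # a target list that something like ansible writes out
          - /etc/prometheus/targets/*.yml

# rules.yml -- continuously evaluated by the server
groups:
  - name: example
    rules:
      # recording rule: the query result is stored as a new time series
      - record: instance:node_cpu_non_idle:rate1m
        expr: sum without (cpu, mode) (rate(node_cpu_seconds_total{mode!="idle"}[1m]))
      # alerting rule: if the query returns results, an alert is pushed
      # to the alertmanager, which handles deduplication and routing
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical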
0:08:19.153,0:08:24.259 You can have one or hundreds of Prometheus[br]services, all generating alerts 0:08:24.599,0:08:28.807 and it goes into an alert manager cluster[br]and sends, does the deduplication 0:08:29.329,0:08:30.684 and the routing to the human 0:08:30.879,0:08:34.138 because, of course, the thing[br]that we want is 0:08:34.927,0:08:38.797 we had dashboards with graphs, but[br]in order to find out if something is broken 0:08:38.966,0:08:40.650 you had to have a human[br]looking at the graph. 0:08:40.830,0:08:42.942 With Prometheus, we don't have to do that[br]anymore, 0:08:43.103,0:08:47.638 we can simply let the software tell us[br]that we need to go investigate 0:08:47.638,0:08:48.650 our problems. 0:08:48.778,0:08:50.831 We don't have to sit there and[br]stare at dashboards all day, 0:08:51.035,0:08:52.380 because that's really boring. 0:08:54.519,0:08:57.556 What does it look like to actually[br]get data into Prometheus? 0:08:57.587,0:09:02.140 This is a very basic output[br]of a Prometheus metric. 0:09:02.613,0:09:03.930 This is a very simple thing. 0:09:04.086,0:09:07.572 If you know much about[br]the linux kernel, 0:09:06.883,0:09:12.779 the linux kernel tracks, in /proc/stat,[br]the state of all the CPUs 0:09:12.779,0:09:14.459 in your system 0:09:14.662,0:09:18.078 and we express this by having[br]the name of the metric, which is 0:09:22.449,0:09:26.123 'node_cpu_seconds_total' and so[br]this is a self-describing metric, 0:09:26.547,0:09:28.375 like you can just read the metric name 0:09:28.530,0:09:30.845 and you understand a little bit about[br]what's going on here. 0:09:33.241,0:09:38.521 The linux kernel and other kernels track[br]their usage by the number of seconds 0:09:38.859,0:09:41.004 spent doing different things and 0:09:41.199,0:09:46.721 that could be, whether it's in system or[br]user space or IRQs 0:09:47.065,0:09:48.690 or iowait or idle. 0:09:48.908,0:09:51.280 Actually, the kernel tracks how much[br]idle time it has. 0:09:53.660,0:09:55.309 It also tracks it by the number of CPUs. 0:09:55.997,0:10:00.067 With other monitoring systems, they used[br]to do this with a tree structure 0:10:01.021,0:10:03.688 and this caused a lot of problems,[br]like, 0:10:03.854,0:10:09.291 how do you mix and match data? So[br]by switching from 0:10:10.043,0:10:12.484 a tree structure to a tag-based structure, 0:10:12.985,0:10:16.896 we can do some really interesting[br]powerful data analytics. 0:10:18.170,0:10:25.170 Here's a nice example of taking[br]those CPU seconds counters 0:10:26.101,0:10:30.198 and then converting them into a graph[br]by using PromQL. 0:10:32.724,0:10:34.830 Now we can get into[br]Metrics-Based Alerting. 0:10:35.315,0:10:37.665 Now we have this graph, we have this thing 0:10:37.847,0:10:39.497 we can look and see here 0:10:39.999,0:10:42.920 "Oh there is some little spike here,[br]we might want to know about that." 0:10:43.191,0:10:45.849 Now we can get into Metrics-Based[br]Alerting. 0:10:46.281,0:10:51.128 I used to be a site reliability engineer,[br]I'm still a site reliability engineer at heart 0:10:52.371,0:11:00.362 and we have this concept of things that[br]you need to run a site or a service reliably 0:11:00.910,0:11:03.231 The most important thing you need is[br]down at the bottom, 0:11:03.569,0:11:06.869 Monitoring, because if you don't have[br]monitoring of your service, 0:11:07.108,0:11:08.688 how do you know it's even working?
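For reference, the raw exposition for that metric, as you would see it by fetching a node exporter's /metrics endpoint with curl, looks roughly like this (the values and label sets here are illustrative, not from the talk):

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 4.59674539e+06
node_cpu_seconds_total{cpu="0",mode="user"} 81341.23
node_cpu_seconds_total{cpu="0",mode="system"} 24087.62
node_cpu_seconds_total{cpu="0",mode="iowait"} 1120.49

A PromQL expression along the lines of rate(node_cpu_seconds_total[5m]) is then enough to turn those ever-growing counters into the kind of per-CPU, per-mode usage graph described here.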
0:11:11.628,0:11:15.235 There's a couple of techniques here, and[br]we want to alert based on data 0:11:15.693,0:11:17.644 and not just those end to end tests. 0:11:18.796,0:11:23.387 There's a couple of techniques, a thing[br]called the RED method 0:11:23.555,0:11:25.141 and there's a thing called the USE method 0:11:25.588,0:11:28.400 and there are a couple of nice[br]blog posts about this 0:11:28.695,0:11:31.306 and basically it defines that, for example, 0:11:31.484,0:11:35.000 the RED method talks about the requests[br]that your system is handling 0:11:36.421,0:11:37.604 There are three things: 0:11:37.775,0:11:40.073 There's the number of requests, there's[br]the number of errors 0:11:40.268,0:11:42.306 and there's how long it takes, the duration. 0:11:42.868,0:11:45.000 With the combination of these three things 0:11:45.341,0:11:48.368 you can determine most of[br]what your users see 0:11:48.712,0:11:53.616 "Did my request go through? Did it[br]return an error? Was it fast?" 0:11:55.492,0:11:57.971 Most people, that's all they care about. 0:11:58.205,0:12:01.965 "I made a request to a website and[br]it came back and it was fast." 0:12:04.975,0:12:06.517 It's a very simple method of just, like, 0:12:07.162,0:12:10.109 those are the important things to[br]determine if your site is healthy. 0:12:12.193,0:12:17.045 But we can go back to some more[br]traditional, sysadmin style alerts 0:12:17.309,0:12:20.553 this is basically taking the filesystem[br]available space, 0:12:20.824,0:12:26.522 divided by the filesystem size, that becomes[br]the ratio of filesystem availability 0:12:26.697,0:12:27.523 from 0 to 1. 0:12:28.241,0:12:30.759 Multiply it by 100, we now have[br]a percentage 0:12:31.016,0:12:35.659 and if it's less than or equal to 1%[br]for 15 minutes, 0:12:35.940,0:12:41.782 this is less than 1% space, we should tell[br]a sysadmin to go check 0:12:41.957,0:12:44.290 to find out why the filesystem[br]has filled up 0:12:44.635,0:12:46.168 It's super nice and simple. 0:12:46.494,0:12:49.685 We can also tag, we can include… 0:12:51.418,0:12:58.232 Every alert includes all the extraneous[br]labels that Prometheus adds to your metrics 0:12:59.488,0:13:05.461 When you add a metric in Prometheus, if[br]we go back and we look at this metric. 0:13:06.009,0:13:10.803 This metric only contains the information[br]about the internals of the application 0:13:12.942,0:13:14.995 anything about, like, what server it's on,[br]is it running in a container, 0:13:15.186,0:13:18.724 what cluster does it come from,[br]what continent is it on, 0:13:17.702,0:13:22.280 that's all extra annotations that are[br]added by the Prometheus server 0:13:22.619,0:13:23.949 at discovery time. 0:13:24.514,0:13:28.347 Unfortunately I don't have a good example [br]of what those labels look like 0:13:28.514,0:13:34.180 but every metric gets annotated[br]with location information. 0:13:36.904,0:13:41.121 That location information also comes through[br]as labels in the alert 0:13:41.300,0:13:48.074 so, if you have a message coming[br]into your alert manager, 0:13:48.269,0:13:49.899 the alert manager can look and go 0:13:50.093,0:13:51.621 "Oh, that's coming from this datacenter" 0:13:52.007,0:13:58.905 and it can include that in the email or[br]IRC message or SMS message.
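Written out as a Prometheus alerting rule, the filesystem check just described might look something like the following sketch. The metric names are the ones exposed by current node exporter versions; the alert name and severity value are invented for illustration:

groups:
  - name: filesystem
    rules:
      - alert: FilesystemAlmostFull
        # available space as a percentage of total size, 0-100
        expr: node_filesystem_avail_bytes / node_filesystem_size_bytes * 100 <= 1
        # only fire if it stays at or below 1% for 15 minutes
        for: 15m
        labels:
          severity: warning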
0:13:59.069,0:14:00.772 So you can include 0:13:59.271,0:14:04.422 "Filesystem is out of space on this host[br]from this datacenter" 0:14:04.557,0:14:07.340 All these labels get passed through and[br]then you can append 0:14:07.491,0:14:13.292 "severity: critical" to that alert and[br]include that in the message to the human 0:14:13.693,0:14:16.775 because of course, this is how you define… 0:14:16.940,0:14:20.857 Getting the message from the monitoring[br]to the human. 0:14:22.197,0:14:23.850 You can even include nice things like, 0:14:24.027,0:14:27.508 if you've got documentation, you can[br]include a link to the documentation 0:14:27.620,0:14:28.686 as an annotation 0:14:29.079,0:14:33.438 and the alert manager can take that[br]basic url and, you know, 0:14:33.467,0:14:36.806 massage it into whatever it needs[br]to look like to actually get 0:14:37.135,0:14:40.417 the operator to the correct documentation. 0:14:42.117,0:14:43.450 We can also do more fun things: 0:14:43.657,0:14:45.567 since we actually are not just checking 0:14:45.746,0:14:48.523 what is the space right now,[br]we're tracking data over time, 0:14:49.232,0:14:50.827 we can use 'predict_linear'. 0:14:52.406,0:14:55.255 'predict_linear' just takes and does[br]a simple linear regression. 0:14:55.749,0:15:00.270 This example takes the filesystem[br]available space over the last hour and 0:15:00.865,0:15:02.453 does a linear regression. 0:15:02.785,0:15:08.536 Prediction says "Well, it's going that way[br]and four hours from now, 0:15:08.749,0:15:13.112 based on one hour of history, it's gonna[br]be less than 0, which means full". 0:15:13.667,0:15:20.645 We know that within the next four hours,[br]the disk is gonna be full 0:15:20.874,0:15:24.658 so we can tell the operator ahead of time[br]that it's gonna be full 0:15:24.833,0:15:26.517 and not just tell them that it's full[br]right now. 0:15:27.113,0:15:32.303 They have some window of ability[br]to fix it before it fails. 0:15:32.674,0:15:35.369 This is really important because[br]if you're running a site 0:15:35.689,0:15:41.370 you want to be able to have alerts[br]that tell you that your system is failing 0:15:41.573,0:15:42.994 before it actually fails. 0:15:43.667,0:15:48.254 Because if it fails, you're out of SLO[br]or SLA and 0:15:48.404,0:15:50.322 your users are gonna be unhappy 0:15:50.729,0:15:52.493 and you don't want the users to tell you[br]that your site is down 0:15:52.682,0:15:54.953 you want to know about it before[br]your users can even tell. 0:15:55.193,0:15:58.491 This allows you to do that. 0:15:58.693,0:16:02.232 And also of course, Prometheus being[br]a modern system, 0:16:02.735,0:16:05.633 we fully support UTF8 in all of our labels. 0:16:08.283,0:16:12.101 Here's another one, here's a good example[br]from the USE method. 0:16:12.490,0:16:16.036 This is a rate of 500 errors coming from[br]an application 0:16:16.423,0:16:17.813 and you can simply alert that 0:16:17.977,0:16:22.555 there's more than 500 errors per second[br]coming out of the application 0:16:22.568,0:16:25.670 if that's your threshold for pain 0:16:26.041,0:16:27.298 And you can do other things, 0:16:27.501,0:16:29.338 you can convert that from just[br]a rate of errors 0:16:29.723,0:16:31.054 to a percentage of errors. 0:16:31.304,0:16:32.605 So you could say 0:16:33.053,0:16:37.336 "I have an SLA of three nines" and so you can say 0:16:37.574,0:16:46.710 "If the rate of errors divided by the rate[br]of requests is .01, 0:16:47.265,0:16:49.335 or is more than .01, then[br]that's a problem."
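Sketched out as PromQL, the two expressions described here could look roughly like this. node_filesystem_avail_bytes is a node exporter metric; http_requests_total and its status label are hypothetical application metrics used only for illustration:

# predict_linear: fit a line to the last hour of free space and
# flag it if that line would cross zero within the next four hours
predict_linear(node_filesystem_avail_bytes[1h], 4 * 3600) < 0

# error ratio: 5xx responses divided by all requests,
# flagged when the ratio goes over the .01 threshold from the talk
  rate(http_requests_total{status=~"5.."}[5m])
/
  rate(http_requests_total[5m])
> 0.01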
0:16:49.725,0:16:54.589 You can include that level of[br]error granularity. 0:16:54.797,0:16:57.622 And if you're just doing a blackbox test, 0:16:58.185,0:17:03.727 you wouldn't know this, you would only get[br]if you got an error from the system, 0:17:04.188,0:17:05.601 then you got another error from the system 0:17:05.826,0:17:06.938 then you fire an alert. 0:17:07.307,0:17:11.847 But if those checks are one minute apart[br]and you're serving 1000 requests per second 0:17:13.324,0:17:20.987 you could be serving 10,000 errors before[br]you even get an alert. 0:17:21.579,0:17:22.876 And you might miss it, because 0:17:23.104,0:17:24.993 what if you only get one random error 0:17:25.327,0:17:28.898 and then the next time, you're serving[br]25% errors, 0:17:29.094,0:17:31.571 you only have a 25% chance of that check[br]failing again. 0:17:31.800,0:17:36.230 You really need these metrics in order[br]to get 0:17:36.430,0:17:38.867 proper reports of the status of your system 0:17:43.176,0:17:43.850 There's even options 0:17:44.051,0:17:45.816 You can slice and dice those labels. 0:17:46.225,0:17:50.056 If you have a label on all of[br]your applications called 'service' 0:17:50.322,0:17:53.251 you can send that 'service' label through[br]to the message 0:17:53.523,0:17:55.857 and you can say[br]"Hey, this service is broken". 0:17:56.073,0:18:00.363 You can include that service label[br]in your alert messages. 0:18:01.426,0:18:06.723 And that's it, I can go to a demo and Q&A. 0:18:09.881,0:18:13.687 [Applause] 0:18:16.877,0:18:18.417 Any questions so far? 0:18:18.811,0:18:20.071 Or anybody want to see a demo? 0:18:29.517,0:18:35.065 [Q] Hi. Does Prometheus do metric[br]discovery inside containers 0:18:35.364,0:18:37.476 or do I have to implement the metrics[br]myself? 0:18:38.184,0:18:45.743 [A] For metrics in containers, there are[br]already things that expose 0:18:45.887,0:18:49.214 the metrics of the container system[br]itself. 0:18:49.512,0:18:52.174 There's a utility called 'cadvisor' and 0:18:52.395,0:18:57.172 cadvisor takes the Linux cgroup data[br]and exposes it as metrics 0:18:57.416,0:19:01.164 so you can get data about[br]how much CPU time is being 0:19:01.164,0:19:02.421 spent in your container, 0:19:02.683,0:19:04.139 how much memory is being used[br]by your container. 0:19:04.775,0:19:08.411 [Q] But not about the application,[br]just about the container usage? 0:19:08.597,0:19:11.355 [A] Right. Because the container[br]has no idea 0:19:11.698,0:19:15.451 whether your application is written[br]in Ruby or Go or Python or whatever, 0:19:18.698,0:19:21.602 you have to build that into[br]your application in order to get the data. 0:19:24.057,0:19:24.307 So for Prometheus, 0:19:27.890,0:19:35.031 we've written client libraries that can be[br]included in your application directly 0:19:35.195,0:19:36.413 so you can get that data out. 0:19:36.602,0:19:41.460 If you go to the Prometheus website,[br]we have a whole series of client libraries 0:19:44.936,0:19:48.913 and we cover a pretty good selection[br]of popular software. 0:19:56.569,0:19:59.537 [Q] What is the current state of[br]long-term data storage? 0:20:00.803,0:20:01.678 [A] Very good question. 0:20:02.697,0:20:04.513 There's been several… 0:20:04.913,0:20:06.521 There's actually several different methods[br]of doing this. 0:20:09.653,0:20:14.667 Prometheus stores all this data locally[br]in its own data storage 0:20:14.667,0:20:15.711 on the local disk.
0:20:16.609,0:20:19.156 But that's only as durable as[br]that server is durable. 0:20:19.423,0:20:21.627 So if you've got a really durable server, 0:20:21.812,0:20:23.357 you can store as much data as you want, 0:20:23.551,0:20:26.521 you can store years and years of data[br]locally on the Prometheus server. 0:20:26.653,0:20:28.088 That's not a problem. 0:20:28.781,0:20:32.244 There's a bunch of misconceptions because[br]of our default 0:20:32.464,0:20:34.492 and the language on our website said 0:20:34.698,0:20:36.160 "It's not long-term storage" 0:20:36.707,0:20:41.841 simply because we leave that problem[br]up to the person running the server. 0:20:43.389,0:20:46.389 But the time series database[br]that Prometheus includes 0:20:46.562,0:20:47.739 is actually quite durable. 0:20:49.157,0:20:51.069 But it's only as durable as the server[br]underneath it. 0:20:51.642,0:20:55.172 So if you've got a very large cluster and[br]you want really high durability, 0:20:55.800,0:20:57.705 you need to have some kind of[br]cluster software, 0:20:58.217,0:21:01.106 but because we want Prometheus to be[br]simple to deploy 0:21:01.701,0:21:02.911 and very simple to operate 0:21:03.355,0:21:06.774 and also very robust. 0:21:06.950,0:21:09.370 We didn't want to include any clustering[br]in Prometheus itself, 0:21:09.787,0:21:12.078 because anytime you have a clustered[br]software, 0:21:12.294,0:21:15.100 what happens if your network is[br]a little wonky. 0:21:15.586,0:21:19.470 The first thing that goes down is[br]all of your distributed systems fail. 0:21:20.328,0:21:23.048 And building distributed systems to be[br]really robust is really hard 0:21:23.445,0:21:29.142 so Prometheus is what we call[br]"uncoordinated distributed systems". 0:21:29.348,0:21:34.048 If you've got two Prometheus servers[br]monitoring all your targets in an HA mode 0:21:34.273,0:21:36.890 in a cluster, and there's a split brain, 0:21:37.131,0:21:40.363 each Prometheus can see[br]half of the cluster and 0:21:40.768,0:21:43.557 it can see that the other half[br]of the cluster is down. 0:21:43.846,0:21:46.740 They can both try to get alerts out[br]to the alert manager 0:21:46.945,0:21:50.466 and this is a really really robust way of[br]handling split brains 0:21:50.734,0:21:54.069 and bad network failures and bad problems[br]in a cluster. 0:21:54.294,0:21:57.163 It's designed to be super super robust 0:21:57.342,0:21:59.844 and so the two individual[br]Prometheus servers in your cluster 0:22:00.079,0:22:02.009 don't have to talk to each other[br]to do this, 0:22:02.193,0:22:03.994 they can just do it independently. 0:22:04.377,0:22:07.392 But if you want to be able[br]to correlate data 0:22:07.604,0:22:09.255 between many different Prometheus servers 0:22:09.439,0:22:12.185 you need an external data storage[br]to do this. 0:22:12.777,0:22:15.008 And also you may not have[br]very big servers, 0:22:15.164,0:22:17.126 you might be running your Prometheus[br]in a container 0:22:17.293,0:22:19.373 and it's only got a little bit of local[br]storage space 0:22:19.543,0:22:23.217 so you want to send all that data up[br]to a big cluster datastore 0:22:23.439,0:22:25.124 for a bigger use 0:22:25.707,0:22:27.913 We have several different ways of[br]doing this.
0:22:28.383,0:22:30.941 There's the classic way which is called[br]federation 0:22:31.156,0:22:34.875 where you have one Prometheus server[br]polling in summary data from 0:22:35.083,0:22:36.604 each of the individual Prometheus servers 0:22:36.823,0:22:40.266 and this is useful if you want to run[br]alerts against data coming 0:22:40.363,0:22:41.578 from multiple Prometheus servers. 0:22:42.488,0:22:44.240 But federation is not replication. 0:22:44.870,0:22:47.488 It only can do a little bit of data from[br]each Prometheus server. 0:22:47.715,0:22:51.078 If you've got a million metrics on[br]each Prometheus server, 0:22:51.683,0:22:55.725 you can't poll in a million metrics[br]and do… 0:22:55.725,0:22:58.850 If you've got 10 of those, you can't[br]poll in 10 million metrics 0:22:59.011,0:23:00.635 simultaneously into one Prometheus[br]server. 0:23:00.919,0:23:01.890 It's just too much data. 0:23:02.875,0:23:06.006 There are two others, a couple of other[br]nice options. 0:23:06.618,0:23:08.923 There's a piece of software called[br]Cortex. 0:23:09.132,0:23:16.033 Cortex is a Prometheus server that[br]stores its data in a database. 0:23:16.570,0:23:19.127 Specifically, a distributed database. 0:23:19.395,0:23:24.136 Things that are based on the Google[br]big table model, like Cassandra or… 0:23:25.892,0:23:27.166 What's the Amazon one? 0:23:30.332,0:23:32.667 Yeah. 0:23:32.682,0:23:33.700 Dynamodb. 0:23:34.193,0:23:37.137 If you have a dynamodb or a cassandra[br]cluster, or one of these other 0:23:37.350,0:23:39.298 really big distributed storage clusters, 0:23:39.713,0:23:44.615 Cortex can run and the Prometheus servers[br]will stream their data up to Cortex 0:23:44.907,0:23:49.384 and it will keep a copy of that across[br]all of your Prometheus servers. 0:23:49.596,0:23:51.373 And because it's based on things[br]like Cassandra, 0:23:51.709,0:23:53.150 it's super scalable. 0:23:53.436,0:23:57.862 But it's a little complex to run and 0:23:57.536,0:24:00.836 many people don't want to run that[br]complex infrastructure. 0:24:01.254,0:24:06.080 We have another new one, we just blogged[br]about it yesterday. 0:24:01.564,0:24:06.513 It's a thing called Thanos. 0:24:06.513,0:24:10.596 Thanos is Prometheus at scale. 0:24:11.143,0:24:12.356 Basically, the way it works… 0:24:12.761,0:24:15.063 Actually, why don't I bring that up? 0:24:24.122,0:24:30.519 This was developed by a company[br]called Improbable 0:24:30.935,0:24:32.632 and they wanted to… 0:24:35.489,0:24:40.063 They had billions of metrics coming from[br]hundreds of Prometheus servers. 0:24:40.604,0:24:46.645 They developed this in collaboration with[br]the Prometheus team to build 0:24:47.000,0:24:48.581 a super highly scalable Prometheus server. 0:24:49.877,0:24:55.518 Prometheus itself stores the incoming[br]metrics data in a write ahead log 0:24:56.008,0:24:59.560 and then every two hours, it creates[br]a compaction cycle 0:24:59.982,0:25:03.177 and it creates an immutable time series block[br]of data which is 0:25:03.606,0:25:06.718 all the time series blocks themselves 0:25:07.131,0:25:10.319 and then an index into that data. 0:25:10.849,0:25:13.678 Those two hour windows are all immutable 0:25:14.037,0:25:16.297 so what Thanos does,[br]it has a little sidecar binary that 0:25:16.297,0:25:18.722 [br]watches for those new directories and 0:25:18.722,0:25:20.701 uploads them into a blob store. 0:25:20.701,0:25:25.819 So you could put them in S3 or minio or[br]some other simple object storage.
0:25:26.301,0:25:32.916 And then now you have all of your data,[br]all of this index data already 0:25:32.916,0:25:34.816 ready to go 0:25:34.816,0:25:38.489 and then the final sidecar creates[br]a little mesh cluster that can read from 0:25:38.489,0:25:39.616 all of those S3 blocks. 0:25:40.123,0:25:48.470 Now, you have this super global view[br]all stored in a big bucket storage and 0:25:49.621,0:25:52.404 things like S3 or minio are… 0:25:52.995,0:25:57.669 Bucket storage is not databases so they're[br]operationally a little easier to operate. 0:25:58.405,0:26:02.183 Plus, now we have all this data in[br]a bucket store and 0:26:02.600,0:26:06.081 the Thanos sidecars can talk to each other 0:26:06.526,0:26:08.150 We can now have a single entry point. 0:26:08.418,0:26:11.915 You can query Thanos and Thanos will[br]distribute your query 0:26:12.131,0:26:13.577 across all your Prometheus servers. 0:26:13.792,0:26:16.181 So now you can do global queries across[br]all of your servers. 0:26:17.696,0:26:22.246 But it's very new, they just released[br]their first release candidate yesterday. 0:26:23.926,0:26:26.875 It is looking to be like[br]the coolest thing ever 0:26:27.448,0:26:29.341 for running large scale Prometheus. 0:26:30.315,0:26:34.779 Here's an example of how that is laid out. 0:26:36.840,0:26:39.469 This will let you have[br]a billion metric Prometheus cluster. 0:26:42.607,0:26:44.261 And it's got a bunch of other[br]cool features. 0:26:45.376,0:26:46.672 Any more questions? 0:26:55.353,0:26:57.436 Alright, maybe I'll do[br]a quick little demo. 0:27:05.407,0:27:10.547 Here is a Prometheus server that is[br]provided by this group 0:27:10.736,0:27:14.141 that just does an ansible deployment[br]for Prometheus. 0:27:15.342,0:27:19.597 And you can just simply query[br]for something like 'node_cpu'. 0:27:21.077,0:27:23.073 This is actually the old name for[br]that metric. 0:27:24.083,0:27:25.659 And you can see, here's exactly 0:27:28.078,0:27:31.250 the CPU metrics from some servers. 0:27:32.907,0:27:34.634 It's just a bunch of stuff. 0:27:35.008,0:27:37.060 There's actually two servers here, 0:27:37.445,0:27:40.660 there's an influx cloud alchemy and[br]there is a demo cloud alchemy. 0:27:42.011,0:27:43.666 [Q] Can you zoom in?[br][A] Oh yeah sure. 0:27:53.135,0:27:57.617 So you can see all the extra labels. 0:28:00.067,0:28:01.644 We can also do some things like… 0:28:02.176,0:28:04.247 Let's take a look at, say,[br]the last 30 seconds. 0:28:04.614,0:28:07.226 We can just add this little time window. 0:28:07.755,0:28:11.033 It's called a range request,[br]and you can see 0:28:11.257,0:28:12.398 the individual samples. 0:28:12.651,0:28:14.671 You can see that all Prometheus is doing 0:28:14.825,0:28:17.899 is storing the sample and a timestamp. 0:28:18.472,0:28:23.029 All the timestamps are in milliseconds[br]and it's all epoch 0:28:23.238,0:28:25.395 so it's super easy to manipulate. 0:28:25.600,0:28:30.169 But, looking at the individual samples and[br]looking at this, you can see that 0:28:30.493,0:28:36.333 if we go back and just take…[br]and look at the raw data, and 0:28:36.493,0:28:37.859 we graph the raw data… 0:28:39.961,0:28:43.026 Oops, that's a syntax error. 0:28:44.500,0:28:46.968 And we look at this graph…[br]Come on. 0:28:47.221,0:28:48.282 Here we go. 0:28:48.481,0:28:50.329 Well, that's kind of boring, it's just[br]a flat line because 0:28:50.600,0:28:52.795 it's just a counter going up very slowly.
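For reference, the two queries typed so far in the demo, roughly as entered ('node_cpu' being the older name of what is now node_cpu_seconds_total):

# instant query: the current value of every node_cpu series
node_cpu

# range selector: the raw samples from the last 30 seconds,
# each one a value plus a millisecond epoch timestamp
node_cpu[30s]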
0:28:52.992,0:28:55.999 What we really want to do, is we want to[br]take, and we want to apply 0:28:57.128,0:28:59.046 a rate function to this counter. 0:28:59.569,0:29:03.635 So let's look at the rate over[br]the last one minute. 0:29:04.493,0:29:06.772 There we go, now we get[br]a nice little graph. 0:29:08.308,0:29:14.056 And so you can see that this is[br]0.6 CPU seconds per second 0:29:15.223,0:29:18.118 for that set of labels. 0:29:18.529,0:29:21.034 But this is pretty noisy, there's a lot[br]of lines on this graph and 0:29:21.235,0:29:22.621 there's still a lot of data here. 0:29:23.137,0:29:25.842 So let's start doing some filtering. 0:29:26.194,0:29:29.434 One of the things we see here is,[br]well, there's idle. 0:29:29.720,0:29:32.296 We don't really care about[br]the machine being idle, 0:29:32.593,0:29:35.492 so let's just add a label filter[br]so we can say 0:29:35.673,0:29:42.354 'mode', it's the label name, and it's not[br]equal to 'idle'. Done. 0:29:45.089,0:29:47.560 And if I could type…[br]What did I miss? 0:29:50.555,0:29:51.126 Here we go. 0:29:51.438,0:29:53.911 So now we've removed idle from the graph. 0:29:54.164,0:29:55.907 That looks a little more sane. 0:29:56.659,0:30:01.094 Oh, wow, look at that, that's a nice[br]big spike in user space on the influx server 0:30:01.363,0:30:02.310 Okay… 0:30:03.672,0:30:05.252 Well, that's pretty cool. 0:30:05.654,0:30:06.479 What about… 0:30:06.940,0:30:08.625 This is still quite a lot of lines. 0:30:10.637,0:30:14.194 How much CPU is in use total across[br]all the servers that we have. 0:30:09.217,0:30:14.378 We can just sum up that rate. 0:30:14.378,0:30:24.457 We can just see that there is[br]a sum total of 0.6 CPU seconds/s 0:30:25.000,0:30:27.515 across the servers we have. 0:30:27.715,0:30:31.379 But that's a little too coarse. 0:30:31.733,0:30:36.698 What if we want to see it by instance? 0:30:39.155,0:30:42.156 Now, we can see the two servers,[br]we can see 0:30:42.527,0:30:45.395 that we're left with just that label. 0:30:45.959,0:30:50.229 The influx labels are the influx instance[br]and the influx demo. 0:30:50.229,0:30:53.334 That's a super easy way to see that, 0:30:53.854,0:30:56.817 but we can also do this[br]the other way around. 0:30:57.060,0:31:03.022 We can say 'without (mode,cpu)' so[br]we can drop those modes and 0:31:03.367,0:31:05.243 see all the labels that we have. 0:31:05.438,0:31:11.563 We can still see the environment label[br]and the job label on our list data. 0:31:12.182,0:31:15.640 You can go either way[br]with the summary functions. 0:31:15.812,0:31:20.210 There's a whole bunch of different functions 0:31:20.558,0:31:22.730 and it's all in our documentation. 0:31:25.124,0:31:30.113 But what if we want to see it… 0:31:30.572,0:31:33.726 What if we want to see which CPUs[br]are in use? 0:31:34.154,0:31:36.937 Now we can see that it's only CPU0 0:31:37.203,0:31:39.587 because apparently these are only[br]1-core instances. 0:31:42.276,0:31:46.660 You can add/remove labels and do[br]all these queries. 0:31:49.966,0:31:51.833 Any other questions so far? 0:31:53.965,0:31:59.056 [Q] I don't have a question, but I have[br]something to add. 0:31:59.427,0:32:03.063 Prometheus is really nice, but it's[br]a lot better if you combine it 0:32:03.389,0:32:04.954 with grafana. 0:32:05.222,0:32:06.330 [A] Yes, yes. 0:32:06.537,0:32:12.332 In the beginning, when we were creating[br]Prometheus, we actually built 0:32:12.851,0:32:14.698 a piece of dashboard software called[br]promdash.
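Pulling together the rest of the PromQL steps from the demo above, roughly as typed (still using the old 'node_cpu' metric name from that server):

# per-CPU, per-mode usage as a one-minute rate
rate(node_cpu[1m])

# the same, with idle time filtered out
rate(node_cpu{mode!="idle"}[1m])

# total non-idle CPU per instance
sum by (instance) (rate(node_cpu{mode!="idle"}[1m]))

# or aggregate away only the mode and cpu labels, keeping the rest
sum without (mode, cpu) (rate(node_cpu{mode!="idle"}[1m]))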
0:32:16.029,0:32:20.566 It was a simple little Ruby on Rails app[br]to create dashboards 0:32:20.733,0:32:22.744 and it had a bunch of JavaScript. 0:32:22.936,0:32:24.195 And then grafana came out. 0:32:25.157,0:32:25.880 And we're like 0:32:25.997,0:32:29.590 "Oh, that's interesting. It doesn't support[br]Prometheus" so we were like 0:32:29.826,0:32:31.806 "Hey, can you support Prometheus" 0:32:32.217,0:32:34.375 and they're like "Yeah, we've got[br]a REST API, get the data, done" 0:32:36.035,0:32:37.867 Now grafana supports Prometheus and[br]we're like 0:32:39.761,0:32:41.991 "Well, promdash, this is crap, delete". 0:32:44.390,0:32:46.171 The Prometheus development team, 0:32:46.395,0:32:49.485 we're all backend developers[br]and SREs and 0:32:49.731,0:32:51.463 we have no JavaScript skills at all. 0:32:52.589,0:32:54.879 So we're like "Let somebody deal[br]with that". 0:32:55.393,0:32:57.647 One of the nice things about working on[br]this kind of project is 0:32:57.862,0:33:01.648 we can do things that we're good at[br]and we don't, we don't try… 0:33:02.398,0:33:05.317 We don't have any marketing people,[br]it's just an opensource project, 0:33:06.320,0:33:09.111 there's no single company behind Prometheus. 0:33:09.914,0:33:14.452 I work for GitLab, Improbable paid for[br]the Thanos system, 0:33:15.594,0:33:25.286 other companies like Red Hat now pay[br]people that used to work on CoreOS to 0:33:25.471,0:33:26.517 work on Prometheus. 0:33:27.211,0:33:30.283 There's lots and lots of collaboration[br]between many companies 0:33:30.467,0:33:32.609 to build the Prometheus ecosystem. 0:33:35.864,0:33:37.455 But yeah, grafana is great. 0:33:38.835,0:33:44.983 Actually, grafana now has[br]two fulltime Prometheus developers. 0:33:49.185,0:33:51.031 Alright, that's it. 0:33:52.637,0:33:57.044 [Applause]