0:00:05.901,0:00:10.531 So, we had a talk by a non-GitLab person[br]about GitLab. 0:00:10.531,0:00:13.057 Now, we have a talk by a GitLab person[br]on non-GitLab. 0:00:13.202,0:00:14.603 Something like that? 0:00:15.894,0:00:19.393 The CCCHH hackerspace is now open, 0:00:19.946,0:00:22.118 from now on if you want to go there,[br]that's the announcement. 0:00:22.471,0:00:25.871 And the next talk will be by Ben Kochie 0:00:26.009,0:00:28.265 on metrics-based monitoring[br]with Prometheus. 0:00:28.748,0:00:30.212 Welcome. 0:00:30.545,0:00:33.133 [Applause] 0:00:35.395,0:00:36.578 Alright, so 0:00:36.886,0:00:39.371 my name is Ben Kochie 0:00:39.845,0:00:43.870 I work on DevOps features for GitLab 0:00:44.327,0:00:48.293 and apart from working for GitLab, I also work[br]on the opensource Prometheus project. 0:00:51.163,0:00:54.355 I live in Berlin and I've been using[br]Debian since ??? 0:00:54.355,0:00:56.797 yes, quite a long time. 0:00:58.806,0:01:01.018 So, what is Metrics-based Monitoring? 0:01:02.638,0:01:05.165 If you're running software in production, 0:01:05.585,0:01:07.772 you probably want to monitor it, 0:01:07.772,0:01:10.547 because if you don't monitor it, you don't[br]know if it's right. 0:01:12.648,0:01:16.112 Monitoring breaks down into two categories: 0:01:16.112,0:01:19.146 there's blackbox monitoring and[br]there's whitebox monitoring. 0:01:19.500,0:01:24.582 Blackbox monitoring is treating[br]your software like a blackbox. 0:01:24.757,0:01:26.377 It's just checks to see, like, 0:01:26.377,0:01:29.483 is it responding, or does it ping 0:01:29.753,0:01:33.588 or ??? HTTP requests 0:01:34.348,0:01:35.669 [mic turned on] 0:01:37.760,0:01:41.379 Ah, there we go, that's better. 0:01:46.592,0:01:51.898 So, blackbox monitoring is a probe, 0:01:51.898,0:01:54.684 it just kind of looks from the outside[br]at your software 0:01:55.454,0:01:57.432 and it has no knowledge of the internals 0:01:58.133,0:02:00.699 and it's really good for end to end testing. 0:02:00.942,0:02:03.560 So if you've got a fairly complicated[br]service, 0:02:03.990,0:02:06.426 you come in from the outside, you go[br]through the load balancer, 0:02:06.721,0:02:07.975 you hit the API server, 0:02:07.975,0:02:10.145 the API server might hit a database, 0:02:10.145,0:02:12.844 and you go all the way through[br]to the back of the stack 0:02:12.844,0:02:14.536 and all the way back out 0:02:14.560,0:02:16.294 so you know that everything is working[br]end to end. 0:02:16.328,0:02:18.768 But you only know about it[br]for that one request. 0:02:19.036,0:02:22.429 So in order to find out if your service[br]is working, 0:02:22.831,0:02:27.128 from the end to end, for every single[br]request, 0:02:27.135,0:02:29.523 this requires whitebox instrumentation. 0:02:29.836,0:02:33.965 So, basically, every event that happens[br]inside your software, 0:02:33.973,0:02:36.517 inside a serving stack, 0:02:36.817,0:02:39.807 gets collected and gets counted, 0:02:40.037,0:02:43.466 so you know that every request hits[br]the load balancer, 0:02:43.466,0:02:45.656 every request hits your application[br]service, 0:02:45.702,0:02:47.329 every request hits the database. 0:02:47.789,0:02:50.832 You know that everything matches up 0:02:50.997,0:02:55.764 and this is called whitebox, or[br]metrics-based monitoring. 0:02:56.010,0:02:57.688 There are different examples of, like, 0:02:57.913,0:03:02.392 the kind of software that does blackbox[br]and whitebox monitoring.
0:03:02.572,0:03:06.680 So you have software like Nagios where[br]you can configure checks 0:03:08.826,0:03:10.012 or pingdom, 0:03:10.211,0:03:12.347 pingdom will do a ping of your website. 0:03:12.971,0:03:15.307 And then there is metrics-based monitoring, 0:03:15.517,0:03:19.293 things like Prometheus, things like[br]the TICK stack from influx data, 0:03:19.610,0:03:22.728 New Relic and other commercial solutions 0:03:23.027,0:03:25.480 but of course I like to talk about[br]the opensource solutions. 0:03:25.748,0:03:28.379 We're gonna talk a little bit about[br]Prometheus. 0:03:28.819,0:03:31.955 Prometheus came out of the idea that 0:03:32.343,0:03:37.555 we needed a monitoring system that could[br]collect all this whitebox metric data 0:03:37.941,0:03:40.786 and do something useful with it. 0:03:40.915,0:03:42.667 Not just give us a pretty graph, but[br]we also want to be able to 0:03:42.985,0:03:44.189 alert on it. 0:03:44.189,0:03:45.988 So we needed both 0:03:49.872,0:03:54.068 a data gathering and an analytics system[br]in the same instance. 0:03:54.148,0:03:58.821 To do this, we built this thing and[br]we looked at the way that 0:03:59.014,0:04:01.835 data was being generated[br]by the applications 0:04:02.369,0:04:05.204 and there are advantages and[br]disadvantages to this 0:04:05.204,0:04:07.250 push vs. pull model for metrics. 0:04:07.384,0:04:09.701 We decided to go with the pulling model 0:04:09.938,0:04:13.953 because there are some slight advantages[br]for pulling over pushing. 0:04:16.323,0:04:18.163 With pulling, you get this free[br]blackbox check 0:04:18.471,0:04:20.151 that the application is running. 0:04:20.527,0:04:24.319 When you pull your application, you know[br]that the process is running. 0:04:24.532,0:04:27.529 If you are doing push-based, you can't[br]tell the difference between 0:04:27.851,0:04:31.521 your application doing no work and[br]your application not running. 0:04:32.416,0:04:33.900 So you don't know if it's stuck, 0:04:34.140,0:04:37.878 or is it just not having to do any work. 0:04:42.671,0:04:48.940 With pulling, the pulling system knows[br]the state of your network. 0:04:49.850,0:04:52.522 If you have a defined set of services, 0:04:52.887,0:04:56.788 that inventory drives what should be there. 0:04:58.274,0:05:00.080 Again, it's like the disappearing problem, 0:05:00.288,0:05:03.950 is the process dead, or is it just[br]not doing anything? 0:05:04.205,0:05:07.117 With polling, you know for a fact[br]what processes should be there, 0:05:07.593,0:05:10.900 and it's a bit of an advantage there. 0:05:11.138,0:05:12.913 With pulling, there's really easy testing. 0:05:13.117,0:05:16.295 With push-based metrics, you have to[br]figure out 0:05:16.505,0:05:18.843 if you want to test a new version of[br]the monitoring system or 0:05:19.058,0:05:20.980 you want to test something new, 0:05:20.980,0:05:24.129 you have to tear off a copy of the data. 0:05:24.370,0:05:27.652 With pulling, you can just set up[br]another instance of your monitoring 0:05:27.676,0:05:29.189 and just test it. 0:05:29.714,0:05:31.033 Or you don't even have, 0:05:31.033,0:05:33.194 it doesn't even have to be monitoring,[br]you can just use curl 0:05:33.199,0:05:35.487 to pull the metrics endpoint. 0:05:38.417,0:05:40.436 It's significantly easier to test. 0:05:40.436,0:05:42.977 The other thing with the… 0:05:45.999,0:05:48.109 The other nice thing is that[br]the client is really simple. 0:05:48.481,0:05:51.068 The client doesn't have to know[br]where the monitoring system is.
0:05:51.272,0:05:53.669 It doesn't have to know about HA 0:05:53.820,0:05:55.720 It just has to sit and collect the data[br]about itself. 0:05:55.882,0:05:58.708 So it doesn't have to know anything about[br]the topology of the network. 0:05:59.134,0:06:03.363 As an application developer, if you're[br]writing a DNS server or 0:06:03.724,0:06:05.572 some other piece of software, 0:06:05.896,0:06:09.562 you don't have to know anything about[br]monitoring software, 0:06:09.803,0:06:12.217 you can just implement it inside[br]your application and 0:06:12.683,0:06:17.058 the monitoring software, whether it's[br]Prometheus or something else, 0:06:17.414,0:06:19.332 can just come and collect that data for you. 0:06:20.210,0:06:23.611 That's kind of similar to a very old[br]monitoring system called SNMP, 0:06:23.832,0:06:28.530 but SNMP has a significantly less friendly[br]data model for developers. 0:06:30.010,0:06:33.556 This is the basic layout[br]of a Prometheus server. 0:06:33.921,0:06:35.918 At the core, there's a Prometheus server 0:06:36.278,0:06:40.302 and it deals with all the data collection[br]and analytics. 0:06:42.941,0:06:46.697 Basically, this one binary,[br]it's all written in golang. 0:06:46.867,0:06:48.559 It's a single binary. 0:06:48.559,0:06:50.823 It knows how to read from your inventory, 0:06:50.823,0:06:52.659 there's a bunch of different methods,[br]whether you've got 0:06:53.121,0:06:58.843 a kubernetes cluster or a cloud platform 0:07:00.234,0:07:03.800 or you have your own customized thing[br]with ansible. 0:07:05.380,0:07:09.750 Ansible can take your layout, drop that[br]into a config file and 0:07:10.639,0:07:11.902 Prometheus can pick that up. 0:07:15.594,0:07:18.812 Once it has the layout, it goes out and[br]collects all the data. 0:07:18.844,0:07:24.254 It has a storage and a time series[br]database to store all that data locally. 0:07:24.462,0:07:28.228 It has a thing called PromQL, which is[br]a query language designed 0:07:28.452,0:07:31.033 for metrics and analytics. 0:07:31.500,0:07:36.779 From that PromQL, you can add frontends[br]that will, 0:07:36.985,0:07:39.319 whether it's a simple API client[br]to run reports, 0:07:40.019,0:07:42.942 you can use things like Grafana[br]for creating dashboards, 0:07:43.124,0:07:44.834 it's got a simple webUI built in. 0:07:45.031,0:07:46.920 You can plug in anything you want[br]on that side. 0:07:48.693,0:07:54.478 And then, it also has the ability to[br]continuously execute queries 0:07:54.625,0:07:56.191 called "recording rules" 0:07:56.832,0:07:59.103 and these recording rules have[br]two different modes. 0:07:59.103,0:08:01.871 You can either record, you can take[br]a query 0:08:02.150,0:08:03.711 and it will generate new data[br]from that query 0:08:04.072,0:08:06.967 or you can take a query, and[br]if it returns results, 0:08:07.354,0:08:08.910 it will return an alert. 0:08:09.176,0:08:12.506 That alert is a push message[br]to the alert manager. 0:08:12.813,0:08:18.969 This allows us to separate the generating[br]of alerts from the routing of alerts. 
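To make the pieces described above concrete, here is a minimal sketch of what a Prometheus setup with a file-based inventory, one recording rule and one alerting rule might look like on disk. This is a hypothetical example for illustration; the file paths, job name and rule names are invented and not shown in the talk:

# prometheus.yml -- hypothetical sketch
rule_files:
  - rules.yml
scrape_configs:
  - job_name: node
    file_sd_configs:
      - files:
          # a target list that something like ansible writes out
          - /etc/prometheus/targets/*.yml

# rules.yml -- continuously evaluated by the server
groups:
  - name: example
    rules:
      # recording rule: the query result is stored as a new time series
      - record: instance:node_cpu_non_idle:rate1m
        expr: sum without (cpu, mode) (rate(node_cpu_seconds_total{mode!="idle"}[1m]))
      # alerting rule: if the query returns results, an alert is pushed
      # to the alertmanager, which handles deduplication and routing
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical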
0:08:19.153,0:08:24.259 You can have one or hundreds of Prometheus[br]services, all generating alerts 0:08:24.599,0:08:28.807 and it goes into an alert manager cluster[br]and sends, does the deduplication 0:08:29.329,0:08:30.684 and the routing to the human 0:08:30.879,0:08:34.138 because, of course, the thing[br]that we want is 0:08:34.927,0:08:38.797 we had dashboards with graphs, but[br]in order to find out if something is broken 0:08:38.966,0:08:40.650 you had to have a human[br]looking at the graph. 0:08:40.830,0:08:42.942 With Prometheus, we don't have to do that[br]anymore, 0:08:43.103,0:08:47.638 we can simply let the software tell us[br]that we need to go investigate 0:08:47.638,0:08:48.650 our problems. 0:08:48.778,0:08:50.831 We don't have to sit there and[br]stare at dashboards all day, 0:08:51.035,0:08:52.380 because that's really boring. 0:08:54.519,0:08:57.556 What does it look like to actually[br]get data into Prometheus? 0:08:57.587,0:09:02.140 This is a very basic output[br]of a Prometheus metric. 0:09:02.613,0:09:03.930 This is a very simple thing. 0:09:04.086,0:09:07.572 If you know much about[br]the linux kernel, 0:09:06.883,0:09:12.779 the linux kernel tracks, in /proc/stat,[br]the state of all the CPUs 0:09:12.779,0:09:14.459 in your system 0:09:14.662,0:09:18.078 and we express this by having[br]the name of the metric, which is 0:09:22.449,0:09:26.123 'node_cpu_seconds_total' and so[br]this is a self-describing metric, 0:09:26.547,0:09:28.375 like you can just read the metric name 0:09:28.530,0:09:30.845 and you understand a little bit about[br]what's going on here. 0:09:33.241,0:09:38.521 The linux kernel and other kernels track[br]their usage by the number of seconds 0:09:38.859,0:09:41.004 spent doing different things and 0:09:41.199,0:09:46.721 that could be, whether it's in system or[br]user space or IRQs 0:09:47.065,0:09:48.690 or iowait or idle. 0:09:48.908,0:09:51.280 Actually, the kernel tracks how much[br]idle time it has. 0:09:53.660,0:09:55.309 It also tracks it by the number of CPUs. 0:09:55.997,0:10:00.067 With other monitoring systems, they used[br]to do this with a tree structure 0:10:01.021,0:10:03.688 and this caused a lot of problems,[br]like, 0:10:03.854,0:10:09.291 how do you mix and match data? So[br]by switching from 0:10:10.043,0:10:12.484 a tree structure to a tag-based structure, 0:10:12.985,0:10:16.896 we can do some really interesting[br]powerful data analytics. 0:10:18.170,0:10:25.170 Here's a nice example of taking[br]those CPU seconds counters 0:10:26.101,0:10:30.198 and then converting them into a graph[br]by using PromQL. 0:10:32.724,0:10:34.830 Now we can get into[br]Metrics-Based Alerting. 0:10:35.315,0:10:37.665 Now we have this graph, we have this thing 0:10:37.847,0:10:39.497 we can look and see here 0:10:39.999,0:10:42.920 "Oh there is some little spike here,[br]we might want to know about that." 0:10:43.191,0:10:45.849 Now we can get into Metrics-Based[br]Alerting. 0:10:46.281,0:10:51.128 I used to be a site reliability engineer,[br]I'm still a site reliability engineer at heart 0:10:52.371,0:11:00.362 and we have this concept of things that[br]you need to run a site or a service reliably 0:11:00.910,0:11:03.231 The most important thing you need is[br]down at the bottom, 0:11:03.569,0:11:06.869 Monitoring, because if you don't have[br]monitoring of your service, 0:11:07.108,0:11:08.688 how do you know it's even working?
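For reference, the raw exposition for that metric, as you would see it by fetching a node exporter's /metrics endpoint with curl, looks roughly like this (the values and label sets here are illustrative, not from the talk):

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 4.59674539e+06
node_cpu_seconds_total{cpu="0",mode="user"} 81341.23
node_cpu_seconds_total{cpu="0",mode="system"} 24087.62
node_cpu_seconds_total{cpu="0",mode="iowait"} 1120.49

A PromQL expression along the lines of rate(node_cpu_seconds_total[5m]) is then enough to turn those ever-growing counters into the kind of per-CPU, per-mode usage graph described here.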
0:11:11.628,0:11:15.235 There's a couple of techniques here, and[br]we want to alert based on data 0:11:15.693,0:11:17.644 and not just those end to end tests. 0:11:18.796,0:11:23.387 There's a couple of techniques, a thing[br]called the RED method 0:11:23.555,0:11:25.141 and there's a thing called the USE method 0:11:25.588,0:11:28.400 and there are a couple of nice[br]blog posts about this 0:11:28.695,0:11:31.306 and basically it defines that, for example, 0:11:31.484,0:11:35.000 the RED method talks about the requests[br]that your system is handling 0:11:36.421,0:11:37.604 There are three things: 0:11:37.775,0:11:40.073 There's the number of requests, there's[br]the number of errors 0:11:40.268,0:11:42.306 and there's how long it takes, the duration. 0:11:42.868,0:11:45.000 With the combination of these three things 0:11:45.341,0:11:48.368 you can determine most of[br]what your users see 0:11:48.712,0:11:53.616 "Did my request go through? Did it[br]return an error? Was it fast?" 0:11:55.492,0:11:57.971 Most people, that's all they care about. 0:11:58.205,0:12:01.965 "I made a request to a website and[br]it came back and it was fast." 0:12:04.975,0:12:06.517 It's a very simple method of just, like, 0:12:07.162,0:12:10.109 those are the important things to[br]determine if your site is healthy. 0:12:12.193,0:12:17.045 But we can go back to some more[br]traditional, sysadmin style alerts 0:12:17.309,0:12:20.553 this is basically taking the filesystem[br]available space, 0:12:20.824,0:12:26.522 divided by the filesystem size, that becomes[br]the ratio of filesystem availability 0:12:26.697,0:12:27.523 from 0 to 1. 0:12:28.241,0:12:30.759 Multiply it by 100, we now have[br]a percentage 0:12:31.016,0:12:35.659 and if it's less than or equal to 1%[br]for 15 minutes, 0:12:35.940,0:12:41.782 this is less than 1% space, we should tell[br]a sysadmin to go check 0:12:41.957,0:12:44.290 to find out why the filesystem[br]has filled up 0:12:44.635,0:12:46.168 It's super nice and simple. 0:12:46.494,0:12:49.685 We can also tag, we can include… 0:12:51.418,0:12:58.232 Every alert includes all the extraneous[br]labels that Prometheus adds to your metrics 0:12:59.488,0:13:05.461 When you add a metric in Prometheus, if[br]we go back and we look at this metric. 0:13:06.009,0:13:10.803 This metric only contains the information[br]about the internals of the application 0:13:12.942,0:13:14.995 anything about, like, what server it's on,[br]is it running in a container, 0:13:15.186,0:13:18.724 what cluster does it come from,[br]what continent is it on, 0:13:17.702,0:13:22.280 that's all extra annotations that are[br]added by the Prometheus server 0:13:22.619,0:13:23.949 at discovery time. 0:13:24.514,0:13:28.347 Unfortunately I don't have a good example [br]of what those labels look like 0:13:28.514,0:13:34.180 but every metric gets annotated[br]with location information. 0:13:36.904,0:13:41.121 That location information also comes through[br]as labels in the alert 0:13:41.300,0:13:48.074 so, if you have a message coming[br]into your alert manager, 0:13:48.269,0:13:49.899 the alert manager can look and go 0:13:50.093,0:13:51.621 "Oh, that's coming from this datacenter" 0:13:52.007,0:13:58.905 and it can include that in the email or[br]IRC message or SMS message.
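Written out as a Prometheus alerting rule, the filesystem check just described might look something like the following sketch. The metric names are the ones exposed by current node exporter versions; the alert name and severity value are invented for illustration:

groups:
  - name: filesystem
    rules:
      - alert: FilesystemAlmostFull
        # available space as a percentage of total size, 0-100
        expr: node_filesystem_avail_bytes / node_filesystem_size_bytes * 100 <= 1
        # only fire if it stays at or below 1% for 15 minutes
        for: 15m
        labels:
          severity: warning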
0:13:59.069,0:14:00.772 So you can include 0:13:59.271,0:14:04.422 "Filesystem is out of space on this host[br]from this datacenter" 0:14:04.557,0:14:07.340 All these labels get passed through and[br]then you can append 0:14:07.491,0:14:13.292 "severity: critical" to that alert and[br]include that in the message to the human 0:14:13.693,0:14:16.775 because of course, this is how you define… 0:14:16.940,0:14:20.857 Getting the message from the monitoring[br]to the human. 0:14:22.197,0:14:23.850 You can even include nice things like, 0:14:24.027,0:14:27.508 if you've got documentation, you can[br]include a link to the documentation 0:14:27.620,0:14:28.686 as an annotation 0:14:29.079,0:14:33.438 and the alert manager can take that[br]basic url and, you know, 0:14:33.467,0:14:36.806 massage it into whatever it needs[br]to look like to actually get 0:14:37.135,0:14:40.417 the operator to the correct documentation. 0:14:42.117,0:14:43.450 We can also do more fun things: 0:14:43.657,0:14:45.567 since we actually are not just checking 0:14:45.746,0:14:48.523 what is the space right now,[br]we're tracking data over time, 0:14:49.232,0:14:50.827 we can use 'predict_linear'. 0:14:52.406,0:14:55.255 'predict_linear' just takes and does[br]a simple linear regression. 0:14:55.749,0:15:00.270 This example takes the filesystem[br]available space over the last hour and 0:15:00.865,0:15:02.453 does a linear regression. 0:15:02.785,0:15:08.536 Prediction says "Well, it's going that way[br]and four hours from now, 0:15:08.749,0:15:13.112 based on one hour of history, it's gonna[br]be less than 0, which means full". 0:15:13.667,0:15:20.645 We know that within the next four hours,[br]the disk is gonna be full 0:15:20.874,0:15:24.658 so we can tell the operator ahead of time[br]that it's gonna be full 0:15:24.833,0:15:26.517 and not just tell them that it's full[br]right now. 0:15:27.113,0:15:32.303 They have some window of ability[br]to fix it before it fails. 0:15:32.674,0:15:35.369 This is really important because[br]if you're running a site 0:15:35.689,0:15:41.370 you want to be able to have alerts[br]that tell you that your system is failing 0:15:41.573,0:15:42.994 before it actually fails. 0:15:43.667,0:15:48.254 Because if it fails, you're out of SLO[br]or SLA and 0:15:48.404,0:15:50.322 your users are gonna be unhappy 0:15:50.729,0:15:52.493 and you don't want the users to tell you[br]that your site is down 0:15:52.682,0:15:54.953 you want to know about it before[br]your users can even tell. 0:15:55.193,0:15:58.491 This allows you to do that. 0:15:58.693,0:16:02.232 And also of course, Prometheus being[br]a modern system, 0:16:02.735,0:16:05.633 we fully support UTF8 in all of our labels. 0:16:08.283,0:16:12.101 Here's another one, here's a good example[br]from the USE method. 0:16:12.490,0:16:16.036 This is a rate of 500 errors coming from[br]an application 0:16:16.423,0:16:17.813 and you can simply alert that 0:16:17.977,0:16:22.555 there's more than 500 errors per second[br]coming out of the application 0:16:22.568,0:16:25.670 if that's your threshold for pain 0:16:26.041,0:16:27.298 And you can do other things, 0:16:27.501,0:16:29.338 you can convert that from just[br]a rate of errors 0:16:29.723,0:16:31.054 to a percentage of errors. 0:16:31.304,0:16:32.605 So you could say 0:16:33.053,0:16:37.336 "I have an SLA of three nines" and so you can say 0:16:37.574,0:16:46.710 "If the rate of errors divided by the rate[br]of requests is .01, 0:16:47.265,0:16:49.335 or is more than .01, then[br]that's a problem."
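Sketched out as PromQL, the two expressions described here could look roughly like this. node_filesystem_avail_bytes is a node exporter metric; http_requests_total and its status label are hypothetical application metrics used only for illustration:

# predict_linear: fit a line to the last hour of free space and
# flag it if that line would cross zero within the next four hours
predict_linear(node_filesystem_avail_bytes[1h], 4 * 3600) < 0

# error ratio: 5xx responses divided by all requests,
# flagged when the ratio goes over the .01 threshold from the talk
  rate(http_requests_total{status=~"5.."}[5m])
/
  rate(http_requests_total[5m])
> 0.01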
0:16:49.725,0:16:54.589 You can include that level of[br]error granularity. 0:16:54.797,0:16:57.622 And if you're just doing a blackbox test, 0:16:58.185,0:17:03.727 you wouldn't know this, you would only get[br]if you got an error from the system, 0:17:04.188,0:17:05.601 then you got another error from the system 0:17:05.826,0:17:06.938 then you fire an alert. 0:17:07.307,0:17:11.847 But if those checks are one minute apart[br]and you're serving 1000 requests per second 0:17:13.324,0:17:20.987 you could be serving 10,000 errors before[br]you even get an alert. 0:17:21.579,0:17:22.876 And you might miss it, because 0:17:23.104,0:17:24.993 what if you only get one random error 0:17:25.327,0:17:28.898 and then the next time, you're serving[br]25% errors, 0:17:29.094,0:17:31.571 you only have a 25% chance of that check[br]failing again. 0:17:31.800,0:17:36.230 You really need these metrics in order[br]to get 0:17:36.430,0:17:38.867 proper reports of the status of your system 0:17:43.176,0:17:43.850 There's even options 0:17:44.051,0:17:45.816 You can slice and dice those labels. 0:17:46.225,0:17:50.056 If you have a label on all of[br]your applications called 'service' 0:17:50.322,0:17:53.251 you can send that 'service' label through[br]to the message 0:17:53.523,0:17:55.857 and you can say[br]"Hey, this service is broken". 0:17:56.073,0:18:00.363 You can include that service label[br]in your alert messages. 0:18:01.426,0:18:06.723 And that's it, I can go to a demo and Q&A. 0:18:09.881,0:18:13.687 [Applause] 0:18:16.877,0:18:18.417 Any questions so far? 0:18:18.811,0:18:20.071 Or anybody want to see a demo? 0:18:29.517,0:18:35.065 [Q] Hi. Does Prometheus do metric[br]discovery inside containers 0:18:35.364,0:18:37.476 or do I have to implement the metrics[br]myself? 0:18:38.184,0:18:45.743 [A] For metrics in containers, there are[br]already things that expose 0:18:45.887,0:18:49.214 the metrics of the container system[br]itself. 0:18:49.512,0:18:52.174 There's a utility called 'cadvisor' and 0:18:52.395,0:18:57.172 cadvisor takes the Linux cgroup data[br]and exposes it as metrics 0:18:57.416,0:19:01.164 so you can get data about[br]how much CPU time is being 0:19:01.164,0:19:02.421 spent in your container, 0:19:02.683,0:19:04.139 how much memory is being used[br]by your container. 0:19:04.775,0:19:08.411 [Q] But not about the application,[br]just about the container usage? 0:19:08.597,0:19:11.355 [A] Right. Because the container[br]has no idea 0:19:11.698,0:19:15.451 whether your application is written[br]in Ruby or Go or Python or whatever, 0:19:18.698,0:19:21.602 you have to build that into[br]your application in order to get the data. 0:19:24.057,0:19:24.307 So for Prometheus, 0:19:27.890,0:19:35.031 we've written client libraries that can be[br]included in your application directly 0:19:35.195,0:19:36.413 so you can get that data out. 0:19:36.602,0:19:41.460 If you go to the Prometheus website,[br]we have a whole series of client libraries 0:19:44.936,0:19:48.913 and we cover a pretty good selection[br]of popular software. 0:19:56.569,0:19:59.537 [Q] What is the current state of[br]long-term data storage? 0:20:00.803,0:20:01.678 [A] Very good question. 0:20:02.697,0:20:04.513 There's been several… 0:20:04.913,0:20:06.521 There's actually several different methods[br]of doing this. 0:20:09.653,0:20:14.667 Prometheus stores all this data locally[br]in its own data storage 0:20:14.667,0:20:15.711 on the local disk.
0:20:16.609,0:20:19.156 But that's only as durable as[br]that server is durable. 0:20:19.423,0:20:21.627 So if you've got a really durable server, 0:20:21.812,0:20:23.357 you can store as much data as you want, 0:20:23.551,0:20:26.521 you can store years and years of data[br]locally on the Prometheus server. 0:20:26.653,0:20:28.088 That's not a problem. 0:20:28.781,0:20:32.244 There's a bunch of misconceptions because[br]of our default 0:20:32.464,0:20:34.492 and the language on our website said 0:20:34.698,0:20:36.160 "It's not long-term storage" 0:20:36.707,0:20:41.841 simply because we leave that problem[br]up to the person running the server. 0:20:43.389,0:20:46.389 But the time series database[br]that Prometheus includes 0:20:46.562,0:20:47.739 is actually quite durable. 0:20:49.157,0:20:51.069 But it's only as durable as the server[br]underneath it. 0:20:51.642,0:20:55.172 So if you've got a very large cluster and[br]you want really high durability, 0:20:55.800,0:20:57.705 you need to have some kind of[br]cluster software, 0:20:58.217,0:21:01.106 but because we want Prometheus to be[br]simple to deploy 0:21:01.701,0:21:02.911 and very simple to operate 0:21:03.355,0:21:06.774 and also very robust. 0:21:06.950,0:21:09.370 We didn't want to include any clustering[br]in Prometheus itself, 0:21:09.787,0:21:12.078 because anytime you have a clustered[br]software, 0:21:12.294,0:21:15.100 what happens if your network is[br]a little wonky. 0:21:15.586,0:21:19.470 The first thing that goes down is[br]all of your distributed systems fail. 0:21:20.328,0:21:23.048 And building distributed systems to be[br]really robust is really hard 0:21:23.445,0:21:29.142 so Prometheus is what we call[br]"uncoordinated distributed systems". 0:21:29.348,0:21:34.048 If you've got two Prometheus servers[br]monitoring all your targets in an HA mode 0:21:34.273,0:21:36.890 in a cluster, and there's a split brain, 0:21:37.131,0:21:40.363 each Prometheus can see[br]half of the cluster and 0:21:40.768,0:21:43.557 it can see that the other half[br]of the cluster is down. 0:21:43.846,0:21:46.740 They can both try to get alerts out[br]to the alert manager 0:21:46.945,0:21:50.466 and this is a really really robust way of[br]handling split brains 0:21:50.734,0:21:54.069 and bad network failures and bad problems[br]in a cluster. 0:21:54.294,0:21:57.163 It's designed to be super super robust 0:21:57.342,0:21:59.844 and so the two individual[br]Prometheus servers in your cluster 0:22:00.079,0:22:02.009 don't have to talk to each other[br]to do this, 0:22:02.193,0:22:03.994 they can just do it independently. 0:22:04.377,0:22:07.392 But if you want to be able[br]to correlate data 0:22:07.604,0:22:09.255 between many different Prometheus servers 0:22:09.439,0:22:12.185 you need an external data storage[br]to do this. 0:22:12.777,0:22:15.008 And also you may not have[br]very big servers, 0:22:15.164,0:22:17.126 you might be running your Prometheus[br]in a container 0:22:17.293,0:22:19.373 and it's only got a little bit of local[br]storage space 0:22:19.543,0:22:23.217 so you want to send all that data up[br]to a big cluster datastore 0:22:23.439,0:22:25.124 for a bigger use 0:22:25.707,0:22:27.913 We have several different ways of[br]doing this.
0:22:28.383,0:22:30.941 There's the classic way which is called[br]federation 0:22:31.156,0:22:34.875 where you have one Prometheus server[br]polling in summary data from 0:22:35.083,0:22:36.604 each of the individual Prometheus servers 0:22:36.823,0:22:40.266 and this is useful if you want to run[br]alerts against data coming 0:22:40.363,0:22:41.578 from multiple Prometheus servers. 0:22:42.488,0:22:44.240 But federation is not replication. 0:22:44.870,0:22:47.488 It only can do a little bit of data from[br]each Prometheus server. 0:22:47.715,0:22:51.078 If you've got a million metrics on[br]each Prometheus server, 0:22:51.683,0:22:55.725 you can't poll in a million metrics[br]and do… 0:22:55.725,0:22:58.850 If you've got 10 of those, you can't[br]poll in 10 million metrics 0:22:59.011,0:23:00.635 simultaneously into one Prometheus[br]server. 0:23:00.919,0:23:01.890 It's just too much data. 0:23:02.875,0:23:06.006 There are two others, a couple of other[br]nice options. 0:23:06.618,0:23:08.923 There's a piece of software called[br]Cortex. 0:23:09.132,0:23:16.033 Cortex is a Prometheus server that[br]stores its data in a database. 0:23:16.570,0:23:19.127 Specifically, a distributed database. 0:23:19.395,0:23:24.136 Things that are based on the Google[br]big table model, like Cassandra or… 0:23:25.892,0:23:27.166 What's the Amazon one? 0:23:30.332,0:23:32.667 Yeah. 0:23:32.682,0:23:33.700 Dynamodb. 0:23:34.193,0:23:37.137 If you have a dynamodb or a cassandra[br]cluster, or one of these other 0:23:37.350,0:23:39.298 really big distributed storage clusters, 0:23:39.713,0:23:44.615 Cortex can run and the Prometheus servers[br]will stream their data up to Cortex 0:23:44.907,0:23:49.384 and it will keep a copy of that across[br]all of your Prometheus servers. 0:23:49.596,0:23:51.373 And because it's based on things[br]like Cassandra, 0:23:51.709,0:23:53.150 it's super scalable. 0:23:53.436,0:23:57.862 But it's a little complex to run and 0:23:57.536,0:24:00.836 many people don't want to run that[br]complex infrastructure. 0:24:01.254,0:24:06.080 We have another new one, we just blogged[br]about it yesterday. 0:24:01.564,0:24:06.513 It's a thing called Thanos. 0:24:06.513,0:24:10.596 Thanos is Prometheus at scale. 0:24:11.143,0:24:12.356 Basically, the way it works… 0:24:12.761,0:24:15.063 Actually, why don't I bring that up? 0:24:24.122,0:24:30.519 This was developed by a company[br]called Improbable 0:24:30.935,0:24:32.632 and they wanted to… 0:24:35.489,0:24:40.063 They had billions of metrics coming from[br]hundreds of Prometheus servers. 0:24:40.604,0:24:46.645 They developed this in collaboration with[br]the Prometheus team to build 0:24:47.000,0:24:48.581 a super highly scalable Prometheus server. 0:24:49.877,0:24:55.518 Prometheus itself stores the incoming[br]metrics data in a write ahead log 0:24:56.008,0:24:59.560 and then every two hours, it creates[br]a compaction cycle 0:24:59.982,0:25:03.177 and it creates an immutable time series block[br]of data which is 0:25:03.606,0:25:06.718 all the time series blocks themselves 0:25:07.131,0:25:10.319 and then an index into that data. 0:25:10.849,0:25:13.678 Those two hour windows are all immutable 0:25:14.037,0:25:16.297 so what Thanos does,[br]it has a little sidecar binary that 0:25:16.297,0:25:18.722 [br]watches for those new directories and 0:25:18.722,0:25:20.701 uploads them into a blob store. 0:25:20.701,0:25:25.819 So you could put them in S3 or minio or[br]some other simple object storage.
0:25:26.301,0:25:32.916 And then now you have all of your data,[br]all of this index data already 0:25:32.916,0:25:34.816 ready to go 0:25:34.816,0:25:38.489 and then the final sidecar creates[br]a little mesh cluster that can read from 0:25:38.489,0:25:39.616 all of those S3 blocks. 0:25:40.123,0:25:48.470 Now, you have this super global view[br]all stored in a big bucket storage and 0:25:49.621,0:25:52.404 things like S3 or minio are… 0:25:52.995,0:25:57.669 Bucket storage is not databases so they're[br]operationally a little easier to operate. 0:25:58.405,0:26:02.183 Plus, now we have all this data in[br]a bucket store and 0:26:02.600,0:26:06.081 the Thanos sidecars can talk to each other 0:26:06.526,0:26:08.150 We can now have a single entry point. 0:26:08.418,0:26:11.915 You can query Thanos and Thanos will[br]distribute your query 0:26:12.131,0:26:13.577 across all your Prometheus servers. 0:26:13.792,0:26:16.181 So now you can do global queries across[br]all of your servers. 0:26:17.696,0:26:22.246 But it's very new, they just released[br]their first release candidate yesterday. 0:26:23.926,0:26:26.875 It is looking to be like[br]the coolest thing ever 0:26:27.448,0:26:29.341 for running large scale Prometheus. 0:26:30.315,0:26:34.779 Here's an example of how that is laid out. 0:26:36.840,0:26:39.469 This will let you have[br]a billion metric Prometheus cluster. 0:26:42.607,0:26:44.261 And it's got a bunch of other[br]cool features. 0:26:45.376,0:26:46.672 Any more questions? 0:26:55.353,0:26:57.436 Alright, maybe I'll do[br]a quick little demo. 0:27:05.407,0:27:10.547 Here is a Prometheus server that is[br]provided by this group 0:27:10.736,0:27:14.141 that just does an ansible deployment[br]for Prometheus. 0:27:15.342,0:27:19.597 And you can just simply query[br]for something like 'node_cpu'. 0:27:21.077,0:27:23.073 This is actually the old name for[br]that metric. 0:27:24.083,0:27:25.659 And you can see, here's exactly 0:27:28.078,0:27:31.250 the CPU metrics from some servers. 0:27:32.907,0:27:34.634 It's just a bunch of stuff. 0:27:35.008,0:27:37.060 There's actually two servers here, 0:27:37.445,0:27:40.660 there's an influx cloud alchemy and[br]there is a demo cloud alchemy. 0:27:42.011,0:27:43.666 [Q] Can you zoom in?[br][A] Oh yeah sure. 0:27:53.135,0:27:57.617 So you can see all the extra labels. 0:28:00.067,0:28:01.644 We can also do some things like… 0:28:02.176,0:28:04.247 Let's take a look at, say,[br]the last 30 seconds. 0:28:04.614,0:28:07.226 We can just add this little time window. 0:28:07.755,0:28:11.033 It's called a range request,[br]and you can see 0:28:11.257,0:28:12.398 the individual samples. 0:28:12.651,0:28:14.671 You can see that all Prometheus is doing 0:28:14.825,0:28:17.899 is storing the sample and a timestamp. 0:28:18.472,0:28:23.029 All the timestamps are in milliseconds[br]and it's all epoch 0:28:23.238,0:28:25.395 so it's super easy to manipulate. 0:28:25.600,0:28:30.169 But, looking at the individual samples and[br]looking at this, you can see that 0:28:30.493,0:28:36.333 if we go back and just take…[br]and look at the raw data, and 0:28:36.493,0:28:37.859 we graph the raw data… 0:28:39.961,0:28:43.026 Oops, that's a syntax error. 0:28:44.500,0:28:46.968 And we look at this graph…[br]Come on. 0:28:47.221,0:28:48.282 Here we go. 0:28:48.481,0:28:50.329 Well, that's kind of boring, it's just[br]a flat line because 0:28:50.600,0:28:52.795 it's just a counter going up very slowly.
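For reference, the two queries typed so far in the demo, roughly as entered ('node_cpu' being the older name of what is now node_cpu_seconds_total):

# instant query: the current value of every node_cpu series
node_cpu

# range selector: the raw samples from the last 30 seconds,
# each one a value plus a millisecond epoch timestamp
node_cpu[30s]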
0:28:52.992,0:28:55.999 What we really want to do, is we want to[br]take, and we want to apply 0:28:57.128,0:28:59.046 a rate function to this counter. 0:28:59.569,0:29:03.635 So let's look at the rate over[br]the last one minute. 0:29:04.493,0:29:06.772 There we go, now we get[br]a nice little graph. 0:29:08.308,0:29:14.056 And so you can see that this is[br]0.6 CPU seconds per second 0:29:15.223,0:29:18.118 for that set of labels. 0:29:18.529,0:29:21.034 But this is pretty noisy, there's a lot[br]of lines on this graph and 0:29:21.235,0:29:22.621 there's still a lot of data here. 0:29:23.137,0:29:25.842 So let's start doing some filtering. 0:29:26.194,0:29:29.434 One of the things we see here is,[br]well, there's idle. 0:29:29.720,0:29:32.296 We don't really care about[br]the machine being idle, 0:29:32.593,0:29:35.492 so let's just add a label filter[br]so we can say 0:29:35.673,0:29:42.354 'mode', it's the label name, and it's not[br]equal to 'idle'. Done. 0:29:45.089,0:29:47.560 And if I could type…[br]What did I miss? 0:29:50.555,0:29:51.126 Here we go. 0:29:51.438,0:29:53.911 So now we've removed idle from the graph. 0:29:54.164,0:29:55.907 That looks a little more sane. 0:29:56.659,0:30:01.094 Oh, wow, look at that, that's a nice[br]big spike in user space on the influx server 0:30:01.363,0:30:02.310 Okay… 0:30:03.672,0:30:05.252 Well, that's pretty cool. 0:30:05.654,0:30:06.479 What about… 0:30:06.940,0:30:08.625 This is still quite a lot of lines. 0:30:10.637,0:30:14.194 How much CPU is in use total across[br]all the servers that we have. 0:30:09.217,0:30:14.378 We can just sum up that rate. 0:30:14.378,0:30:24.457 We can just see that there is[br]a sum total of 0.6 CPU seconds/s 0:30:25.000,0:30:27.515 across the servers we have. 0:30:27.715,0:30:31.379 But that's a little too coarse. 0:30:31.733,0:30:36.698 What if we want to see it by instance? 0:30:39.155,0:30:42.156 Now, we can see the two servers,[br]we can see 0:30:42.527,0:30:45.395 that we're left with just that label. 0:30:45.959,0:30:50.229 The influx labels are the influx instance[br]and the influx demo. 0:30:50.229,0:30:53.334 That's a super easy way to see that, 0:30:53.854,0:30:56.817 but we can also do this[br]the other way around. 0:30:57.060,0:31:03.022 We can say 'without (mode,cpu)' so[br]we can drop those modes and 0:31:03.367,0:31:05.243 see all the labels that we have. 0:31:05.438,0:31:11.563 We can still see the environment label[br]and the job label on our list data. 0:31:12.182,0:31:15.640 You can go either way[br]with the summary functions. 0:31:15.812,0:31:20.210 There's a whole bunch of different functions 0:31:20.558,0:31:22.730 and it's all in our documentation. 0:31:25.124,0:31:30.113 But what if we want to see it… 0:31:30.572,0:31:33.726 What if we want to see which CPUs[br]are in use? 0:31:34.154,0:31:36.937 Now we can see that it's only CPU0 0:31:37.203,0:31:39.587 because apparently these are only[br]1-core instances. 0:31:42.276,0:31:46.660 You can add/remove labels and do[br]all these queries. 0:31:49.966,0:31:51.833 Any other questions so far? 0:31:53.965,0:31:59.056 [Q] I don't have a question, but I have[br]something to add. 0:31:59.427,0:32:03.063 Prometheus is really nice, but it's[br]a lot better if you combine it 0:32:03.389,0:32:04.954 with grafana. 0:32:05.222,0:32:06.330 [A] Yes, yes. 0:32:06.537,0:32:12.332 In the beginning, when we were creating[br]Prometheus, we actually built 0:32:12.851,0:32:14.698 a piece of dashboard software called[br]promdash.
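Pulling together the rest of the PromQL steps from the demo above, roughly as typed (still using the old 'node_cpu' metric name from that server):

# per-CPU, per-mode usage as a one-minute rate
rate(node_cpu[1m])

# the same, with idle time filtered out
rate(node_cpu{mode!="idle"}[1m])

# total non-idle CPU per instance
sum by (instance) (rate(node_cpu{mode!="idle"}[1m]))

# or aggregate away only the mode and cpu labels, keeping the rest
sum without (mode, cpu) (rate(node_cpu{mode!="idle"}[1m]))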
0:32:16.029,0:32:20.566 It was a simple little Ruby on Rails app[br]to create dashboards 0:32:20.733,0:32:22.744 and it had a bunch of JavaScript. 0:32:22.936,0:32:24.195 And then grafana came out. 0:32:25.157,0:32:25.880 And we're like 0:32:25.997,0:32:29.590 "Oh, that's interesting. It doesn't support[br]Prometheus" so we were like 0:32:29.826,0:32:31.806 "Hey, can you support Prometheus" 0:32:32.217,0:32:34.375 and they're like "Yeah, we've got[br]a REST API, get the data, done" 0:32:36.035,0:32:37.867 Now grafana supports Prometheus and[br]we're like 0:32:39.761,0:32:41.991 "Well, promdash, this is crap, delete". 0:32:44.390,0:32:46.171 The Prometheus development team, 0:32:46.395,0:32:49.485 we're all backend developers[br]and SREs and 0:32:49.731,0:32:51.463 we have no JavaScript skills at all. 0:32:52.589,0:32:54.879 So we're like "Let somebody deal[br]with that". 0:32:55.393,0:32:57.647 One of the nice things about working on[br]this kind of project is 0:32:57.862,0:33:01.648 we can do things that we're good at[br]and we don't, we don't try… 0:33:02.398,0:33:05.317 We don't have any marketing people,[br]it's just an opensource project, 0:33:06.320,0:33:09.111 there's no single company behind Prometheus. 0:33:09.914,0:33:14.452 I work for GitLab, Improbable paid for[br]the Thanos system, 0:33:15.594,0:33:25.286 other companies like Red Hat now pay[br]people that used to work on CoreOS to 0:33:25.471,0:33:26.517 work on Prometheus. 0:33:27.211,0:33:30.283 There's lots and lots of collaboration[br]between many companies 0:33:30.467,0:33:32.609 to build the Prometheus ecosystem. 0:33:35.864,0:33:37.455 But yeah, grafana is great. 0:33:38.835,0:33:44.983 Actually, grafana now has[br]two fulltime Prometheus developers. 0:33:49.185,0:33:51.031 Alright, that's it. 0:33:52.637,0:33:57.044 [Applause]