TOM DALE: Hey, you guys ready? Thank you guys so much for coming. This is awesome. When they were putting together the schedule, I said, make sure that you put us down in the Caves of Moria. So thank you guys for coming down and making it. I'm Tom. This is Yehuda. YEHUDA KATZ: When people told me I was signed up to do back-to-back talks, I don't know what I was thinking. T.D.: Yup. So. We want to talk to you today about Skylight. But just a little bit before we do that, I want to talk about us. In 2011 we started a company called Tilde. It's this shirt. It may have made me self-conscious, because this is actually a first edition and it's printed off-center. Well, either I'm off-center or the shirt's off-center. One of the two. So we started Tilde in 2011, and we had all just left a venture-backed company, and that was a pretty traumatic experience for us, because we spent a lot of time building the company, and then we ran out of money and sold to Facebook, and we really didn't want to repeat that experience. So we decided to start Tilde. DHH and the other people at Basecamp were talking about, you know, being bootstrapped and proud, and that was a message that really resonated with us, and so we wanted to capture the same thing. There's only one problem with being bootstrapped and proud, and that is, in order to be both of those things, you actually need money, it turns out. It's not like you just say it in a blog post and then all of a sudden you are in business. So we had to think a lot about, OK, how do we make money? How do we make a profitable and, most importantly, sustainable business? Because we didn't want to just flip to Facebook in a couple of years. So, looking around, the most obvious thing that people suggested to us is, well, why don't you guys just become Ember, Inc.? Raise a few million dollars, you know, build a bunch of stuff, business model: mostly prayer. But that's not really how we want to think about building open source communities. We don't really think that that necessarily leads to the best open source communities. And if you're interested in more on that, I recommend Leah Silber, who is one of our co-founders. She's giving a talk Friday afternoon about how to build a company that is centered on open source. So if you want to learn more about how we've done that, I would really suggest you go check out her talk. So, no Ember, Inc. Not allowed. We really wanted to build something that leveraged the strengths that we thought we had: most importantly, a really deep knowledge of open source and a deep knowledge of the Rails stack; and also Carl, it turns out, is really, really good at building highly scalable big data systems. Lots of Hadoop in there. So, last year at RailsConf, we announced the private beta of Skylight. How many of you have used Skylight? Can you raise your hand if you have used it? OK. Many of you. Awesome. So Skylight is a tool for profiling and measuring the performance of your Rails applications in production. And, as a product, Skylight was built on three key breakthroughs. We didn't want to ship a product that was incrementally better than the competition. We wanted to ship a product that was dramatically better. A quantum leap. An order of magnitude better.
And, in order to do that, we spent a lot of time thinking about how we could solve most of the problems that we saw in the existing landscape. And delivering a product that does that is predicated on these three breakthroughs. So, the first one I want to talk about is honest response times. Honest response times. DHH wrote a blog post on what was then the 37signals blog, now the Basecamp blog, called "The problem with averages." How many of you have read this? Awesome. For those of you that have not - how many of you hate raising your hands at presentations? Y.K.: Just put a button in every seat. Press this button- T.D.: Press the button if you have. Yes. Great. So, if you read this blog post, the way it opens is: "Our average response time for Basecamp right now is 87ms... That sounds fantastic. And it easily leads you to believe that all is well and that we wouldn't need to spend any more time optimizing performance. But that's actually wrong. The average number is completely skewed by tons of fast responses to feed requests and other cached replies. If you have 1000 requests that return in 5ms, and then you have 200 requests taking 2000ms, or two seconds, you can still report a respectable 170ms average. That's useless." So what does DHH say that we need? DHH says the solution is histograms. For those of you like me who were sleeping through your statistics class in high school, and college, a brief primer on histograms. A histogram is very simple. Basically, you have a series of buckets along one axis, and every time a value falls into a bucket, you increment that bar by one. So, this is an example of a histogram of response times in a Rails application. You can see that there's a big cluster in the middle around 488ms, 500ms. This isn't a super speedy app, but it's not the worst thing in the world. And they're all clustered, and then as you move to the right, you can see that the response times get longer and longer and longer, and as you move to the left, response times get shorter and shorter and shorter. So, why do you want a histogram? What's the most important thing about a histogram? Y.K.: Well, I think it's because most requests don't actually look like this. T.D.: Yes. Y.K.: Most endpoints don't actually look like this. T.D.: Right. If you think about what your Rails app is doing, it's a complicated beast, right? It turns out that in Ruby, frankly, you can do branching logic. You can do a lot of things. And so what that means is that if you represent one endpoint with a single number, you are losing a lot of fidelity, to the point where it becomes, as DHH said, useless. So, for example, in a histogram, you can easily see: oh, here's a group of requests and response times where I'm hitting the cache, and here's another group where I'm missing it. And you can see that that cluster is significantly slower than the faster cache-hitting cluster. And the other thing that you get when you keep the whole distribution in the histogram is that you can look at the number at the 95th percentile. The way to think about the performance of your web application is not the average, because the average doesn't really tell you anything.
You want to think about the 95th percentile, because that's not the average response time, that's the average worst response time that a user is likely to hit. And the thing to keep in mind is that it's not as though a customer comes to your site, they issue one request, and then they're done, right? As someone is using your website, they're gonna be generating a lot of requests. And you need to look at the 95th percentile, because otherwise every request is basically you rolling the dice that they're not gonna hit one of those two-second, three-second, four-second responses, close the tab, and go to your competitor. So here's the crazy thing. Here's what I think is crazy. That blog post that DHH wrote is from 2009. It's been five years, and there's still no tool that does what DHH was asking for. So, frankly, we smelled money. We were like, holy crap. Y.K.: Yeah, why isn't that slide green? T.D.: Yeah. It should be green and dollars. I think Keynote has the make-it-rain effect I should have used. So we smelled blood in the water. We're like, this is awesome. There's only one problem that we discovered, and that is, it turns out that building this thing is actually really, really freaky hard. Really, really hard. So, we announced the private beta at RailsConf last year. Before doing that, we spent a year of research: spiking out prototypes, building prototypes, building out the beta. We launched at RailsConf, and we realized we made a lot of errors when we were building this system. So after RailsConf last year, we basically took six months to completely rewrite the backend from the ground up. And I think tying into your keynote, Yehuda, we were like, oh, we clearly have a bespoke problem. No one else is doing this. So we wrote our own custom backend. And then we had all these problems, and we realized that they had actually already all been solved by the open source community. And so we benefited tremendously by having a shared solution. Y.K.: Yeah. So our first release of this was really very bespoke, and the current release uses a tremendous number of very off-the-shelf open source projects that each solve a particular problem very effectively, very well. None of which are as easy to use as Rails, but all of which solve really thorny problems very effectively. T.D.: So let's talk, just for your own understanding, about how most performance monitoring tools work. The way that most of these work is that you run your Rails app, and running inside of your Rails app is some gem, some agent that you install. And every time the Rails app handles a request, it generates events, and those events, which include information about performance data, are passed into the agent. And then the agent sends that data to some kind of centralized server. Now, it turns out that doing a running average is actually really simple. Which is why everyone does it. Basically you can do it in a single SQL query. All you need is three columns in the database: the endpoint, the running average, and the number of requests. Those are the things that you need to keep a running average. So keeping a running average is actually really simple from a technical point of view. Y.K.: I don't think you could even do that in JavaScript, due to the lack of integers. T.D.: Yes. You probably wouldn't want to do any math in JavaScript, it turns out.
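To make the contrast concrete, here is a minimal Ruby sketch - not Skylight's code, just an illustration - of the running-average approach versus keeping the whole distribution:

    # Toy data in the spirit of DHH's example: a pile of fast cached
    # responses plus a cluster of slow ones (times in milliseconds).
    durations = [5] * 1000 + [2000] * 200

    # The "three columns in a database" approach: one running average.
    count, average = 0, 0.0
    durations.each { |ms| count += 1; average += (ms - average) / count }
    # => a single respectable-looking number that completely hides
    #    the two-second cluster

    # The histogram approach: keep the whole distribution, in 100ms buckets.
    histogram = Hash.new(0)
    durations.each { |ms| histogram[(ms / 100) * 100] += 1 }
    # => {0 => 1000, 2000 => 200} -- both clusters are plainly visible

    # And once you have the distribution, the 95th percentile -- the
    # "average worst response a user is likely to hit" -- comes for free.
    p95 = durations.sort[(durations.size * 0.95).ceil - 1]
    # => 2000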
So we took a little bit different approach. Yehuda, do you want to go over the next section? Y.K.: Yeah. Sure. So, when we first started, right at the beginning, we basically did a similar thing, where your app creates events. Most of those start off as ActiveSupport::Notifications, although it turns out that there's very limited use of ActiveSupport::Notifications, so we had to do some normalization work to get them sane, which we're gonna be upstreaming back into Rails. But one thing that's kind of unfortunate about having every single Rails app have an agent is that you end up having to do a lot of the same kind of work over and over again, and use up a lot of memory. So, for example, every one of these things is making HTTP requests. So now you have a queue of things that you're sending over HTTP in every single one of your Rails processes. And, of course, you probably don't notice this. People are used to Rails taking up hundreds and hundreds of megabytes, so you probably don't notice if you install some agent and it suddenly starts taking twenty, thirty, forty, fifty more megabytes. But we really wanted to keep the actual memory per process down to a small amount. So one of the very first things that we did, even before last year, is that we pulled all that shared logic out into a separate process called the coordinator. And the agent is basically responsible simply for collecting the trace; it's not responsible for actually talking to our server at all. And that means that only the coordinator has to do this queueing, keeping a bunch of stuff in one place, and it doesn't end up using as much memory. And I think this ended up being very effective for us. T.D.: And I think that low overhead also allows us to just collect more information in general. Y.K.: Yeah. Now, after our first attempt, we started getting a bunch of customers telling us that - so, the separate coordinator started as a good thing and a bad thing. On the one hand, there's only one of them, so it uses up only one set of memory. On the other hand, it's really easy for someone to go in and ps that process and see how many megabytes of memory it's using. So we got a lot of additional complaints that said, oh, your process is using a lot of memory. And I spent a few weeks - I know Ruby pretty well. I actually wrote a gem called Allocation Counter that basically went in to try to pinpoint exactly where the allocations were coming from. But it turns out that it's actually really, really hard to track down exactly where allocations are coming from in Ruby, because something as simple as using a regular expression in Ruby can allocate match objects under the hood. And so I was able to pare this down to some degree. But I discovered pretty quickly that trying to keep a lid on the memory allocation by doing all this stuff in Ruby is mostly fine - but for our specific use case, where we wanna be telling you, you can run the agent on your process, on your box, and it's not gonna use a lot of memory, we really needed something more efficient. And our first thought was, we'll use C++ or C. No problem. C is native. It's great. And Carl did the work. Carl is very smart. And then he said, Yehuda, it is now your turn. You need to start maintaining this.
And I said, I don't trust myself to write C++ code that's running on all of your boxes and not segfault. So that doesn't work for me. And I noticed that Rust was coming along, and what Rust really gives you is the ability to write low-level code a la C or C++, with manual memory management that keeps your memory allocation low and keeps things speedy - low resource utilization - while also giving you compile-time guarantees about not segfaulting. So, again, if your processes randomly started segfaulting because you installed our agent, I think you would stop being our customer very quickly. So having pretty much 100% guarantees about that was very important to us. And so that's why we decided to use Rust. I'll just keep going. T.D.: Keep going. Y.K.: So, we had this coordinator object. And basically the coordinator is receiving events. The events basically end up being these traces that describe what's happening in your application. And the next thing - I think in our initial work on this we used JSON to send the payload to the server, but we noticed that a lot of people have really big requests. You may have a big request with a big SQL query in it, or a lot of big SQL queries in it. Some people have traces that are hundreds and hundreds of nodes long. And so we really wanted to figure out how to shrink down the payload size to something that we could be, you know, pumping out of your box on a regular basis without running up your bandwidth costs. So one of the first things that we did early on was switch to protobuf as the transport mechanism, and that shrunk down the payloads a lot. Our earlier prototypes for actually collecting the data were written in Ruby, but I think Carl did, like, a weekend hack to just port it over to Java and got, like, 200x performance. And you don't always get 200x performance - if mostly what you're doing is database queries, you're not gonna get a huge performance swing. But mostly what we're doing is math. And algorithms and data structures. And for that - Ruby could, in theory, one day have a good JIT or something, but today, writing that code in Java didn't end up being significantly more code, cause it's just, you know, algorithms and data structures. T.D.: And I'll just note that standardizing on protobufs in our stack was actually a huge win, because we realized, hey, browsers, as it turns out, are pretty powerful these days. They can allocate memory, they can do all these types of computation. And protobuf libraries exist everywhere. So we save ourselves a lot of computation and a lot of time by just treating protobuf as the canonical serialization format: you can move payloads around the entire stack, and everything speaks the same language, so you save on serialization and deserialization. Y.K.: And JavaScript is actually surprisingly effective at taking protobufs and converting them to the format that we need efficiently. So, we basically take this data - the Java collector is basically collecting all these protobufs, and pretty much it just turns around - and this is sort of where we got into bespoke territory before, when we started rolling our own. But we realized that when you write a big, distributed, fault-tolerant system, there are a lot of problems that you really just want someone else to have thought about.
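As a rough sketch of the agent/coordinator split described earlier - with hypothetical names, and JSON over a Unix socket standing in for the real protobuf transport - the in-process side might look something like this:

    require "socket"
    require "json"
    require "active_support/notifications"

    # Sketch only: the real agent is native code. The idea is that the
    # in-process agent just collects events and hands them off; the queue
    # and the upload logic live in the single coordinator process.
    module SketchAgent
      SOCKET_PATH = "/tmp/coordinator.sock" # assumed path

      def self.install!
        ActiveSupport::Notifications.subscribe("sql.active_record") do |name, start, finish, _id, payload|
          forward(name: name, ms: ((finish - start) * 1000).round, sql: payload[:sql])
        end
      end

      def self.forward(event)
        # Hand off and return immediately; never block the request.
        UNIXSocket.open(SOCKET_PATH) { |sock| sock.puts(event.to_json) }
      rescue Errno::ECONNREFUSED, Errno::ENOENT
        # Coordinator is down: drop the event rather than slow the app.
      end
    end

The point is the shape: every Rails worker stays small because the queueing and the connection to the server exist exactly once, in the coordinator.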
So, what we do is we basically take these payloads that are coming in, we convert them into batches, and we send the batches down into the Kafka queue. And the next thing that happens - so, Kafka is basically just a queue that you can throw things into. I guess it might be considered similar to something like AMQP. It has some nice fault-tolerance properties and integrates well with Storm. But most importantly, it's just super, super high throughput. We basically didn't want to put any barrier between you giving us the data and us getting it to disk as soon as possible. T.D.: Yeah. Which we'll talk about in a bit, I think. Y.K.: So Kafka takes the data and starts sending it into Storm. And think about what has to happen in order to process a request. You have these requests - there's, you know, maybe traces that have a bunch of SQL queries - and our job is basically to take all those SQL queries and say, OK, I can see that in all of your requests, you had this SQL query, and it took around this amount of time, and it happened as a child of this other node. And the way to think about that is basically just a processing pipeline, right? You have these traces that come in one side, you start passing them through a bunch of processing steps, and then you end up on the other side with the data. And Storm is actually a way of describing that processing pipeline in sort of a functional style, and then you tell it, OK, here's how many servers I need, here's how I'm gonna handle failures, and it basically deals with distribution and scaling and all that stuff for you. And part of that is because you wrote everything in a functional style. And so what happens is Kafka sends the data into the entry spout - which is terminology in Storm for these streams that get created - and they basically go into these processing things, which very cutely are called bolts. This is definitely not the naming I would have used, but - so they're called bolts. And the idea is that every request may involve several kinds of processing. So, for example, we now automatically detect n+1 queries, and that's sort of a different kind of processing from, make a picture of the entire request, or, what is the 95th percentile across your entire app, right? These are all different kinds of processing. So we take the data and we send it into a bunch of bolts, and the cool thing about bolts is that, again, because they're just functional chaining, you can take the output from one bolt and feed it into another bolt. And that works pretty well. And you don't have to worry about - I mean, you have to worry a little bit about things like fault tolerance, failure, idempotence, but you worry about them at the abstraction level, and then the operational part is handled for you. T.D.: So it's just a very declarative way of describing how this computation works, in a way that's easy to scale. Y.K.: And Carl actually talked about this at very high speed yesterday; some of you may have been there. I would recommend watching the video when it comes out if you want to make use of this stuff in your own applications. And then, when you're finally done with all the processing, you need to actually do something with it. You need to put it somewhere so that the web app can get access to it, and for that we use Cassandra.
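The bolt-chaining idea is easy to mimic in plain Ruby. This is a toy rendering - assumed trace shapes, nothing like the real Storm topology - of why describing the pipeline as chained functions lets the framework handle distribution for you:

    # Each "bolt" is a pure function over the stream, so the output of one
    # can feed the next, and the framework can distribute and retry them.
    group_by_endpoint = ->(traces) { traces.group_by { |t| t[:endpoint] } }

    mean_query_times = ->(by_endpoint) {
      by_endpoint.transform_values do |traces|
        traces.flat_map { |t| t[:queries] }
              .group_by { |q| q[:sql] }
              .transform_values { |qs| qs.sum { |q| q[:ms] } / qs.size }
      end
    }

    traces = [
      { endpoint: "PostsController#index",
        queries: [{ sql: "SELECT * FROM posts", ms: 12 }] },
      { endpoint: "PostsController#index",
        queries: [{ sql: "SELECT * FROM posts", ms: 18 }] },
    ]

    pipeline = [group_by_endpoint, mean_query_times]
    pipeline.reduce(traces) { |data, bolt| bolt.call(data) }
    # => { "PostsController#index" => { "SELECT * FROM posts" => 15 } }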
And Cassandra, again, is mostly a dumb database, but it has high capacity, and it has some of the fault-tolerance properties that we want. T.D.: We're just very, very write-heavy, right? Like, we tend to be writing more than we're ever reading. Y.K.: Yup. And then when we're done with a particular batch, Cassandra basically kicks off the process over again. So we're basically doing these things as batches. T.D.: So these are roll-ups, is what's happening here. Basically at every minute, every ten minutes, and then at every hour, we reprocess and we re-aggregate, so that when you query us, we know exactly what to give you. Y.K.: Yup. So we sort of have this cycle where, obviously, in the first five seconds, the first minute, you really want high granularity. You want to see what's happening right now. But if you want to go back and look at data from three months ago, you probably care about, like, the day granularity, or maybe the hour granularity. So we basically do these roll-ups and cycle through the process. T.D.: So it turns out building this system required an intense amount of work. Carl spent probably six months reading PhD thesises to find- Y.K.: Theses. T.D.: Theses. To find data structures and algorithms that we could use. Because this is a huge amount of data. I think even a few months after we were in private beta, we were already handling over a billion requests per month. And obviously there's no way that we- Y.K.: Basically, the number of requests that we handle is the sum of all of the requests that you handle. T.D.: Right. Y.K.: And all of our customers handle. T.D.: Right. Right. Y.K.: So that's a lot of requests. T.D.: So obviously we can't provide a service - at least not an affordable service, an accessible service - if we have to store terabytes or exabytes of data just to tell you how your app is running. Y.K.: And I think it's also problematic if you store all the data in a database and then, every single time someone wants to learn something about it, you have to do a query. Those queries can take a very long time. They can take minutes. And we really wanted something where the feedback loop would be fast. So we wanted to find algorithms that let us handle the data in real time, and then provide it to you in real time, instead of these, like, dump the data somewhere and then do complicated queries systems. T.D.: So, hold on. This slide was not supposed to be here. It was supposed to be a Rails slide. Whoa - I went too far. OK. We'll watch that again. That's pretty cool. So then the last thing I want to say is, perhaps your takeaway from looking at this architecture diagram is, oh my gosh, these Rails guys completely- Y.K.: They really jumped the shark. T.D.: They jumped the shark. They ditched Rails. I saw, like, three Tweets yesterday - I wasn't here, I was in Portland yesterday, but I saw, like, three Tweets that were like, I'm at RailsConf and I haven't seen a single talk about, like, Rails. So that's true here, too. But I want to assure you that we are only using this stack for the heavy computation. We started in Rails. We were like, hey, what do we need? Ah, well, people probably need to authenticate and log in, and we probably need to do billing. And those are all things that Rails is really, really good at.
So we started with Rails as, basically, the starting point, and then when we realized, oh my gosh, the computation is really slow, there's no way we're gonna be able to offer this service - OK, now let's think about how we can do all of that. Y.K.: And I think, notably, a lot of people who look at Rails - there are a lot of companies that have built big stuff on Rails - and their attitude is, like, oh, this legacy terrible Rails app, I really wish we could get rid of it. If we could just write everything in Scala or Clojure or Go, everything would be amazing. That is definitely not our attitude. Our attitude is that Rails is really amazing at the kinds of things that are really common across everyone's web applications - authentication, billing, et cetera. And we really want to be using Rails for those parts of our app - even things like error-tracking, we do through the Rails app. We want to be using Rails because it's very productive at doing those things. It happens to be very slow at data crunching, so we're gonna use a different tool for that. But I don't think you'll ever see me getting up and saying, ah, I really wish we had just started writing, you know, the Rails app in Rust. T.D.: Yeah. Y.K.: That would be terrible. T.D.: So that's number one: honest response times. Which seems like it should be easy, but turns out to require storing an insane amount of data. So, the second thing that we realized when we were looking at a lot of these tools is that most of them focus on data. They focus on giving you the raw data. But I'm not a machine. I'm not a computer. I don't enjoy sifting through data. That's what computers are good for. I would rather be drinking a beer. It's really nice in Portland this time of year. So we wanted to think about: if you're trying to solve the performance problems in your application, what are the things that you would suss out with the existing tools after spending, like, four hours digging to get there? Y.K.: And I think part of this is just that people like to think that they're gonna use these tools, but when the tools require you to dig through a lot of data, people just don't use them very much. So the goal here was to build a tool that people actually use and actually like using, and not to build a tool that happens to provide a lot of data you can sift through. T.D.: Yes. So probably one of the first things that we realized is what we don't want to provide. This is a trace of a single request - you've probably seen similar UIs in other tools, for example the inspector in, like, Chrome or Safari - and it's basically a visual stack trace of where your application is spending its time. But what was important for us is showing not just a single request, because your app handles hundreds of thousands of requests, or millions of requests. So looking at a single request, statistically, is just noise. Y.K.: And it's especially bad if it's the worst request, because the worst request really is noise. It's, like, a hiccup in the network, right? T.D.: It's the outlier. Yeah. Y.K.: It's literally the outlier. T.D.: It's literally the outlier. Yup. So what we present in Skylight is something a little bit different, and it's something that we call the aggregate trace.
So the aggregate trace is basically us taking all of your requests, averaging out where each of these things spends its time, and then showing you that. So this is, like, the statue of David. It is the idealized form of the stack trace - of how your application is behaving. But, of course, you have the same problem as before, which is, if this were all that we were showing you, it would be obscuring a lot of information. You want to actually be able to tell the difference between, OK, what does my stack trace look like for fast requests, and how does that differ from requests that are slower? So - I've got a little video here - you can see that when I move the slider, the trace below it is actually updating in real time. As I move the slider around, you can see that the aggregate trace actually updates with it. And that's because we're collecting all this information. We're collecting, like I said, a lot of data. We can recompute this aggregate trace on the fly. Basically, for each bucket, we're storing a different trace, and then on the client we're reassembling that. We'll go into that a little bit. Y.K.: And I think it's really important that you be able to do these experiments quickly. If every time you think, oh, I wonder what happens if I add another histogram bucket, it requires a whole full-page refresh, then that would basically make people not want to use the tool. Not able to use the tool. So actually building something which is real-time and fast, which gets the data as it comes, was really important to us. T.D.: So that's number one. And the second thing - so we built that, and we were like, OK, well, what's next? And I think the big problem with this is that you need to know that there's a problem before you go look at it, right? So we have been working for the past few months - and the Storm infrastructure that we built makes it pretty straightforward to start building more abstractions on top of the data that we've already collected; it's a very declarative system - on a feature called inspections. And what's cool about inspections is that we can look at this tremendous volume of data that we've collected from your app, and we can automatically tease out what the problems are. So the first one that we shipped - this is in beta right now. It's not out and enabled by default, but it's behind a feature flag that we've had some users turning on. And trying out. And what we can do in this case, because we have information about all of the database queries in your app, is look and see if you have n+1 queries. Can you maybe explain what an n+1 query is? Y.K.: Yeah. So, people know, hopefully, what n+1 queries are. But it's the idea that, by accident, for some reason, instead of making one query, you asked for, like, all the posts, and then you iterated through all of them and got all the comments, and now, instead of having one query, you have one query per post, right? And what you'd like to do is eager loading, where you say, include comments, right? But you have to know that you have to do that. So there are some tools that will run in development mode, if you happen to catch it, like Bullet. This is basically a tool that's looking at every single one of your requests, and it has some thresholds: once we see that a bunch of your requests have the same exact query - so we do some work to pull out binds.
So if it's, like, where something equals one, we will automatically pull out the one and replace it with a question mark. And then we basically take all those queries, and if they're the exact same query repeated multiple times, subject to some thresholds, we'll start showing you: hey, there's an n+1 query (there's a sketch of the idea below). And you can imagine this same sort of thing being done for things like, are you missing an index, right? Or, are you using the Ruby version of JSON when you should be using the native version of JSON? These are all things that we can start detecting just because we're consuming an enormous amount of information, and we can start writing some heuristics for bubbling it up. So, the third and final breakthrough: we realized that we really, really needed a lightning-fast UI. Something really responsive. In particular, the feedback loop is critical, right? You can imagine, if the way that you dug into data was you clicked and you waited an hour and then you got your results, no one would do it. No one would ever do it. And the existing tools are OK, but you click and you wait. You look at it and you're like, oh, I want a different view, so then you go edit your query, and then you click and you wait, and it's just not a pleasant experience. So we use Ember. The UI that you're using when you log into Skylight, even though it feels just like a regular website, not like a native app, is powered by an Ember.js app - all of the routing, all of the rendering, all of the decision-making is happening there. And we pair that with D3. So all of the charts, the charts that you saw there in the aggregate trace - that is all Ember components powered by D3. And this has actually significantly cleaned up our client-side code. It makes reusability really, really awesome. Y.K.: So to give you an example, this is from our billing page. The designer came, and they had a component that was, like, a date component. And, the- T.D.: It seems really boring at first. Y.K.: It seemed really boring. But this is the implementation, right? So you could copy and paste this code over and over again, everywhere you go. Just remember to format it correctly. If you forget to format it, it's not gonna look the same everywhere. But I was like, hey, we're using this all over the place. Why don't we bundle this up into a component? And so with Ember, it was super easy. We basically just said, OK, here's a new calendar-date component. It has a property on it called date. Just set that to any JavaScript Date object - you don't have to remember about converting it or formatting it. Here's the component. Set the date and it will render the correct thing automatically. T.D.: And so the architecture of the Ember app looks a little something like this, where you have many, many different components, most of them just driven by D3, and they're plugged into the model and the controller. And the Ember app will go fetch those models from the cloud - the cloud being the Java app, which just queries Cassandra - and render them. And what's neat about this model is that turning on web sockets is super easy, right? Because all of these components are bound to a single place, when the web socket says, hey, we have updated information for you to show, it just pushes it onto the model or onto the controller, and the whole UI updates automatically. It's like magic. Y.K.: Like magic. T.D.: It's like magic.
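Here is the sketch promised above of the bind-extraction and n+1 heuristic - a toy version with made-up thresholds and regexes, not the agent's native implementation:

    # Replace literal values with "?" so that the same logical query
    # groups together regardless of its binds.
    def normalize(sql)
      sql.gsub(/'[^']*'/, "?")  # string literals
         .gsub(/\b\d+\b/, "?")  # numeric literals
    end

    queries = [
      "SELECT * FROM comments WHERE post_id = 1",
      "SELECT * FROM comments WHERE post_id = 2",
      "SELECT * FROM comments WHERE post_id = 3",
    ]

    # If the same normalized query repeats past a threshold within one
    # request, flag it as a likely n+1. (Threshold invented for the demo.)
    THRESHOLD = 3
    queries.map { |q| normalize(q) }
           .tally
           .select { |_sql, n| n >= THRESHOLD }
           .each_key { |sql| puts "Possible n+1 query: #{sql}" }
    # => Possible n+1 query: SELECT * FROM comments WHERE post_id = ?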
And when debugging, that binding is especially awesome, too - maybe I'll show a demo of the Ember inspector. It's nice. So. Yeah. So, lightning-fast UI. Reducing the feedback loop so that you can quickly play with your data makes it go from a chore to something that actually feels kind of fun. So, these were the breakthroughs that we had when we were building Skylight. The things that made us think, yes, this is actually a product that deserves to be on the market. So: one, honest response times - collect data that no one else can collect. Two, focus on answers instead of just dumping data. And three, have a lightning-fast UI to do it. So we like to think of Skylight as basically a smart profiler. It's a smart profiler that runs in production. It's like the profiler that you run on your local development machine, but instead of being on your local dev box - which has nothing to do with the performance characteristics of what your users are experiencing - we're actually running in production. So, let me just give you guys a quick demo. This is what Skylight looks like. What's under this? There we go. So, the first thing here is we've got the app dashboard. And it looks like our 95th percentile response time has peaked. Maybe you're all hammering it right now. That would be nice. So, this is a graph of your response time over time, and then on the right, this is the graph of the RPMs, the requests per minute that your app is handling. So this is app-wide. And this is live. This updates every minute. Then down below, you have a list of the endpoints in your application. And you can see that, actually, the slowest ones for us - we have an instrumentation API, and we've gone and instrumented our background workers. So we can see them here, with their response times right alongside. And we can see that we have this reporting worker that's taking, at the 95th percentile, thirteen seconds. Y.K.: So all that time used to be inside of some request somewhere, and we discovered that there was a lot of time being spent in things that we could push to the background. We probably need to update the agony index so that it doesn't rank workers very high, because spending some time in your workers is not that big of a deal. T.D.: So then, if we dive into one of these, you can see that for this request we've got the time explorer up above, and that shows a graph of response time, again at the 95th percentile. And if you want to go back and look at historical data, you just drag it like this. And this has got a brush, so you can zoom in and out on different times. And every time you change the range, you can see that it's very responsive. It's never waiting for the server. But it is going back and fetching data from the server, and then, when the data comes back, you see the whole UI just update. And we get that for free with Ember. And then down below, as we discussed, you actually have a real histogram. And this histogram, in this case - so this is for fifty-seven requests. And if we click and drag, we can just move this. And you can see that the aggregate trace below updates in response to us dragging this. And if we want to look at the fastest quartile, we just click "faster" and we'll just choose that range on the histogram. Y.K.: I think it's the fastest load. T.D.: The fastest load. And then if you click on "slower", you can see the slower requests. So this makes it really easy to compare and contrast.
OK: why are certain requests faster, and why are certain requests slower? You can see these blue areas - this is Ruby code. Right now it's not super granular. It would be nice if you could actually know what was going on in there, but it'll at least tell you where in your controller action this is happening. And then you can actually see which database queries are being executed, and what their duration is. And you can see that we actually extract the SQL and normalize it, so you can see exactly what those queries are, even if the values are totally different between them. Y.K.: Yeah. So the real query - courtesy of Rails not yet supporting bind extraction - is, like, where id equals one, or ten, or whatever. T.D.: Yup. So that's pretty cool. Y.K.: So one other thing: initially, we actually just showed the whole trace, but we discovered that, obviously, when you show whole traces, you have information that doesn't really matter that much. So we've recently started to collapse the things that don't matter so much, so that you can expand or condense the trace. And we wanted to make it so that you don't have to think about expanding or condensing individual areas - you just see what matters the most, and then you can expand to see the trivial parts. T.D.: Yup. So that's the demo of Skylight. We'd really like it if you checked it out. There is one more thing I want to show you that is, like, really freaking cool. This is coming out of Tilde labs. Carl has been hacking - he's been up until past midnight, getting almost no sleep for the past month, trying to have this ready. I don't know how many of you know this, but Ruby 2.1 has a new stack-sampling feature. So you can get really granular information about how your Ruby code is performing. I just mentioned how it would be nice if we could get more information about what your Ruby code is doing. And now we can do that. Basically, every few milliseconds, this code that Carl wrote goes into the Ruby VM, into MRI, and takes a snapshot of the stack. And because this is built in, it's very low-impact. It's not allocating any new memory. There's very little performance hit. Basically, you wouldn't even notice it. And so every few milliseconds it's sampling, and we take that information and we send it up to our servers. So it's almost like you're running a Ruby profiler on your local dev box, where you get extremely granular information about where your code is spending its time in Ruby - per method, all of these things. But it's happening in production. So this is - we enabled it in staging; you can see that we've got some rendering bugs. It's still in beta. Y.K.: Yeah, and we haven't yet collapsed things that are not important- T.D.: Yes. Y.K.: -for this particular feature. T.D.: So we want to hide things like framework code, obviously. But this gives you an incredibly, incredibly granular view of what your app is doing in production. This is an API that's built into Ruby 2.1. Because our agent is running so low-level, because we wrote it in Rust, we have the ability to do things like this, and Carl thinks that we may be able to actually backport this to older Rubies, too. So if you're not on Ruby 2.1, we think that we can actually bring this to you. But that's TBD.
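For flavor, here is what sampling looks like in pure Ruby - a stand-in illustration only, since the real sampler sits below Ruby, on that 2.1 stack-sampling API, precisely so that it allocates nothing:

    # Every few milliseconds, snapshot the main thread's stack. Over many
    # requests, the hot frames dominate the counts -- no tracing required.
    samples = Hash.new(0)

    sampler = Thread.new do
      main = Thread.main
      loop do
        sleep 0.005 # ~5ms between samples
        frame = main.backtrace&.first
        samples[frame] += 1 if frame
      end
    end

    # ... serve requests for a while (stand-in workload here) ...
    sleep 1

    sampler.kill
    samples.sort_by { |_frame, n| -n }.first(5).each do |frame, n|
      puts "#{n} samples: #{frame}"
    end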
Y.K.: Yeah - so I think the cool thing about this, in general, is - so this is a sampling profiler, right? We don't want to be burdening every single thing that you do in your program with tracing. That would be very slow. So when you normally run a sampling profiler, you have to basically create a loop: run this code a million times and keep sampling it, and eventually you get enough samples to get the information. But it turns out that your production server is a loop. Your production server is serving tons and tons of requests. So, by simply taking a few microseconds out of every request and collecting a couple of samples, over time we can actually get this really high-fidelity picture with basically no cost. And that's pretty mind-blowing. And this is the kind of stuff that we can start doing by really caring about both the user experience and the implementation, and getting really scary about it. And honestly, this is a really exciting feature that really shows what we can do as we start building this out. T.D.: Once we've got that groundwork. So if you guys want to check it out: skylight.io. It's available today. It's no longer in private beta. Everyone can sign up. No invitation token necessary. And you can get a thirty-day free trial if you haven't started one already. So if you have any questions, please come see us right now, or we have a booth in the vendor hall. Thank you guys very much.