[Script Info] Title: [Events] Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text Dialogue: 0,0:00:16.99,0:00:18.12,Default,,0000,0000,0000,,CARL LERCHE: My name is Carl Lerche and Dialogue: 0,0:00:18.12,0:00:22.01,Default,,0000,0000,0000,,I'm gonna be talking about Apache Storm. Dialogue: 0,0:00:22.01,0:00:24.09,Default,,0000,0000,0000,,I am shooting for forty-minutes. Dialogue: 0,0:00:24.09,0:00:25.60,Default,,0000,0000,0000,,It's going to be a long talk. Dialogue: 0,0:00:25.60,0:00:27.39,Default,,0000,0000,0000,,I'm gonna try to cram everything in. Dialogue: 0,0:00:27.39,0:00:28.58,Default,,0000,0000,0000,,I already got a late start. Dialogue: 0,0:00:28.58,0:00:30.70,Default,,0000,0000,0000,,There probably won't be time for questions. Dialogue: 0,0:00:30.70,0:00:32.43,Default,,0000,0000,0000,,I'm also probably going to gloss over Dialogue: 0,0:00:32.43,0:00:34.46,Default,,0000,0000,0000,,a few things that, like, hand-wave-y things, Dialogue: 0,0:00:34.46,0:00:37.43,Default,,0000,0000,0000,,so if something comes up, you have a question, Dialogue: 0,0:00:37.43,0:00:39.34,Default,,0000,0000,0000,,just send me a Tweet directly Dialogue: 0,0:00:39.34,0:00:41.47,Default,,0000,0000,0000,,and I will respond to it after the talk. Dialogue: 0,0:00:41.47,0:00:44.21,Default,,0000,0000,0000,,Or just come and talk. Dialogue: 0,0:00:44.21,0:00:47.82,Default,,0000,0000,0000,,So I work for Tilda, and our product is Dialogue: 0,0:00:47.82,0:00:50.40,Default,,0000,0000,0000,,Skylight, and we're building a smart profiler\Nfor your Dialogue: 0,0:00:50.40,0:00:54.17,Default,,0000,0000,0000,,Rails app. And I've actually built the entire\Nbackend Dialogue: 0,0:00:54.17,0:00:56.52,Default,,0000,0000,0000,,using Storm, so another thing, if you want\Nto Dialogue: 0,0:00:56.52,0:00:58.36,Default,,0000,0000,0000,,talk about how that works, how we've, I've\Nended Dialogue: 0,0:00:58.36,0:01:00.86,Default,,0000,0000,0000,,up using Storm and the, in the real world, Dialogue: 0,0:01:00.86,0:01:02.80,Default,,0000,0000,0000,,I guess. We, I'll be happy to talk about Dialogue: 0,0:01:02.80,0:01:03.97,Default,,0000,0000,0000,,it. Dialogue: 0,0:01:03.97,0:01:07.77,Default,,0000,0000,0000,,So what is Storm? Let's start with how it Dialogue: 0,0:01:07.77,0:01:10.78,Default,,0000,0000,0000,,describes itself on the web site. It calls\Nitself Dialogue: 0,0:01:10.78,0:01:15.25,Default,,0000,0000,0000,,a distributed real time computation system,\Nand that sounds Dialogue: 0,0:01:15.25,0:01:19.31,Default,,0000,0000,0000,,really, really fancy. The first time I thought\Nabout Dialogue: 0,0:01:19.31,0:01:20.58,Default,,0000,0000,0000,,using it, I was like, oh, let me go Dialogue: 0,0:01:20.58,0:01:23.11,Default,,0000,0000,0000,,look and see what Storm's all about. I saw Dialogue: 0,0:01:23.11,0:01:25.44,Default,,0000,0000,0000,,this, was like, whoa. Step back. I'm not going Dialogue: 0,0:01:25.44,0:01:27.22,Default,,0000,0000,0000,,to. This, this sounds like, this is too serious Dialogue: 0,0:01:27.22,0:01:29.83,Default,,0000,0000,0000,,business for me. I need something simpler. Dialogue: 0,0:01:29.83,0:01:31.79,Default,,0000,0000,0000,,It took me like maybe like another six months Dialogue: 0,0:01:31.79,0:01:33.37,Default,,0000,0000,0000,,of really, OK, I'm gonna take time, I'm gonna Dialogue: 0,0:01:33.37,0:01:35.78,Default,,0000,0000,0000,,learn this, and as I got to know it, Dialogue: 0,0:01:35.78,0:01:37.59,Default,,0000,0000,0000,,it really came to me. Oh, this can be Dialogue: 0,0:01:37.59,0:01:40.19,Default,,0000,0000,0000,,used for many, many other use cases. I'm not Dialogue: 0,0:01:40.19,0:01:44.01,Default,,0000,0000,0000,,going to. I might, at the end, if there's Dialogue: 0,0:01:44.01,0:01:45.98,Default,,0000,0000,0000,,time, talk about some of them, but I'm going Dialogue: 0,0:01:45.98,0:01:48.73,Default,,0000,0000,0000,,to try to get into the nitty gritty sooner Dialogue: 0,0:01:48.73,0:01:50.02,Default,,0000,0000,0000,,than later. Dialogue: 0,0:01:50.02,0:01:51.93,Default,,0000,0000,0000,,So, but for now, I'm just gonna say, Storm Dialogue: 0,0:01:51.93,0:01:55.94,Default,,0000,0000,0000,,is a really powerful worker system. So some\Nthings Dialogue: 0,0:01:55.94,0:01:59.36,Default,,0000,0000,0000,,about it. It is distributed. It's very, very,\Nvery Dialogue: 0,0:01:59.36,0:02:01.04,Default,,0000,0000,0000,,distributed, which is a good and a bad thing. Dialogue: 0,0:02:01.04,0:02:03.91,Default,,0000,0000,0000,,One, the good thing is you can, like, spread Dialogue: 0,0:02:03.91,0:02:06.55,Default,,0000,0000,0000,,your load across many servers. The bad thing\Nis, Dialogue: 0,0:02:06.55,0:02:08.87,Default,,0000,0000,0000,,your operate, like your ops system is gonna\Nlook Dialogue: 0,0:02:08.87,0:02:12.34,Default,,0000,0000,0000,,something like this. Which can turn out to\Nbe Dialogue: 0,0:02:12.34,0:02:14.63,Default,,0000,0000,0000,,somewhat of an operational headache. So, like,\Nyou'll end Dialogue: 0,0:02:14.63,0:02:17.34,Default,,0000,0000,0000,,up having to run a Zookeeper cluster. You'll\Nend Dialogue: 0,0:02:17.34,0:02:20.12,Default,,0000,0000,0000,,up having to run, like, the Storm Nimbus process Dialogue: 0,0:02:20.12,0:02:21.85,Default,,0000,0000,0000,,somewhere, and then there's gonna be a number\Nof Dialogue: 0,0:02:21.85,0:02:24.73,Default,,0000,0000,0000,,other Storm coordination processes. There's\Ngonna be, like, every Dialogue: 0,0:02:24.73,0:02:26.30,Default,,0000,0000,0000,,server's gonna have the worker process. Dialogue: 0,0:02:26.30,0:02:28.11,Default,,0000,0000,0000,,And then, like, while you're at it, you've\Ngone Dialogue: 0,0:02:28.11,0:02:29.77,Default,,0000,0000,0000,,this far, let's throw in a distributed database\Nand Dialogue: 0,0:02:29.77,0:02:32.97,Default,,0000,0000,0000,,some distributed queues in there too. But\Nthe main Dialogue: 0,0:02:32.97,0:02:35.05,Default,,0000,0000,0000,,point is there's going to be an operational\Noverhead. Dialogue: 0,0:02:35.05,0:02:38.21,Default,,0000,0000,0000,,So don't just do it for fun. There needs Dialogue: 0,0:02:38.21,0:02:39.85,Default,,0000,0000,0000,,to be a good reason. If you can get Dialogue: 0,0:02:39.85,0:02:42.41,Default,,0000,0000,0000,,the job done on a single server, go for Dialogue: 0,0:02:42.41,0:02:44.39,Default,,0000,0000,0000,,that. Dialogue: 0,0:02:44.39,0:02:47.58,Default,,0000,0000,0000,,It's fault-tolerant, and what I mean by that\Nis, Dialogue: 0,0:02:47.58,0:02:50.72,Default,,0000,0000,0000,,it is able to recover and continue making\Nprogress Dialogue: 0,0:02:50.72,0:02:52.85,Default,,0000,0000,0000,,in the event of a failure. And I know Dialogue: 0,0:02:52.85,0:02:56.60,Default,,0000,0000,0000,,a lot of existing systems claim to, claim\Nto Dialogue: 0,0:02:56.60,0:03:00.92,Default,,0000,0000,0000,,do this, but the reality is, pandering faults\Nis Dialogue: 0,0:03:00.92,0:03:06.36,Default,,0000,0000,0000,,really, really hard. So it doesn't make it,\Nlike, Dialogue: 0,0:03:06.36,0:03:08.55,Default,,0000,0000,0000,,seamless. Storm doesn't make it seamless.\NThe reality is Dialogue: 0,0:03:08.55,0:03:10.09,Default,,0000,0000,0000,,it can't be seamless. But I think, and I'm Dialogue: 0,0:03:10.09,0:03:11.79,Default,,0000,0000,0000,,going to talk about it later, I think the Dialogue: 0,0:03:11.79,0:03:14.12,Default,,0000,0000,0000,,way it does it is about as easy as Dialogue: 0,0:03:14.12,0:03:15.62,Default,,0000,0000,0000,,it can get. I hope. Dialogue: 0,0:03:15.62,0:03:19.96,Default,,0000,0000,0000,,It's really fast. Very low overhead. Some\Ncompletely useless Dialogue: 0,0:03:19.96,0:03:22.59,Default,,0000,0000,0000,,benchmark numbers. It's been clocked at, like,\Nprocessing over Dialogue: 0,0:03:22.59,0:03:26.29,Default,,0000,0000,0000,,a million messages per second on a single\Nserver. Dialogue: 0,0:03:26.29,0:03:30.13,Default,,0000,0000,0000,,Completely useless number but it sounds good. Dialogue: 0,0:03:30.13,0:03:33.93,Default,,0000,0000,0000,,Supposedly it's language agnostic. And by\Nthat I mean, Dialogue: 0,0:03:33.93,0:03:36.00,Default,,0000,0000,0000,,supposedly you can use absolutely any language\Nat all Dialogue: 0,0:03:36.00,0:03:41.47,Default,,0000,0000,0000,,to build your data processing pipeline. There\Nare examples, Dialogue: 0,0:03:41.47,0:03:44.93,Default,,0000,0000,0000,,on the website, where they use bash to, I Dialogue: 0,0:03:44.93,0:03:47.68,Default,,0000,0000,0000,,don't know why, it's probably a novelty. If\Nyou're Dialogue: 0,0:03:47.68,0:03:50.54,Default,,0000,0000,0000,,distributed bash sounds good. But I guess\Nthat means Dialogue: 0,0:03:50.54,0:03:54.90,Default,,0000,0000,0000,,you can probably use MRI. However, I've personally\Nonly Dialogue: 0,0:03:54.90,0:03:57.50,Default,,0000,0000,0000,,used it on the JVM. It's a JVM-based project. Dialogue: 0,0:03:57.50,0:04:00.01,Default,,0000,0000,0000,,It's written in Java and Clojure. It uses\Nmany Dialogue: 0,0:04:00.01,0:04:03.82,Default,,0000,0000,0000,,Java libraries. I would just recommend, if\Nyou're gonna Dialogue: 0,0:04:03.82,0:04:05.69,Default,,0000,0000,0000,,try it, try it out on the JVM. Dialogue: 0,0:04:05.69,0:04:08.40,Default,,0000,0000,0000,,And the good news is, we have JRuby, which Dialogue: 0,0:04:08.40,0:04:11.19,Default,,0000,0000,0000,,has done an extremely good job at integrating\Nwith Dialogue: 0,0:04:11.19,0:04:14.51,Default,,0000,0000,0000,,Java. So you can use these Java libraries,\Nand Dialogue: 0,0:04:14.51,0:04:17.34,Default,,0000,0000,0000,,you're just writing Ruby. Some of the things\NI Dialogue: 0,0:04:17.34,0:04:20.05,Default,,0000,0000,0000,,want to handwave also is exactly how to get, Dialogue: 0,0:04:20.05,0:04:21.91,Default,,0000,0000,0000,,like, all the details of getting it running\Non Dialogue: 0,0:04:21.91,0:04:25.34,Default,,0000,0000,0000,,JRuby. But there is a GitHub project called\Nredstorm. Dialogue: 0,0:04:25.34,0:04:28.99,Default,,0000,0000,0000,,You could probably just Google GitHub Redstorm.\NI've realized Dialogue: 0,0:04:28.99,0:04:31.46,Default,,0000,0000,0000,,I didn't actually include any links. But,\Nagain, I Dialogue: 0,0:04:31.46,0:04:32.73,Default,,0000,0000,0000,,can find them later. Dialogue: 0,0:04:32.73,0:04:35.96,Default,,0000,0000,0000,,So it does a JRuby binding to storm. It Dialogue: 0,0:04:35.96,0:04:38.36,Default,,0000,0000,0000,,also goes a bit further. It provides, like,\NRuby Dialogue: 0,0:04:38.36,0:04:41.32,Default,,0000,0000,0000,,DSLs for everything related to storm. So you\Ncan Dialogue: 0,0:04:41.32,0:04:43.49,Default,,0000,0000,0000,,define all your data transformations in terms\Nof Ruby Dialogue: 0,0:04:43.49,0:04:45.13,Default,,0000,0000,0000,,DSLs. Dialogue: 0,0:04:45.13,0:04:48.32,Default,,0000,0000,0000,,So the way I'm gonna kind of go over Dialogue: 0,0:04:48.32,0:04:49.60,Default,,0000,0000,0000,,storm is I'm going to build up a straw Dialogue: 0,0:04:49.60,0:04:50.96,Default,,0000,0000,0000,,man. That straw man is going to be, how Dialogue: 0,0:04:50.96,0:04:53.63,Default,,0000,0000,0000,,would one implement Twitter trending topics.\NYou know, it's, Dialogue: 0,0:04:53.63,0:04:57.11,Default,,0000,0000,0000,,I think I, we all hopefully know about what Dialogue: 0,0:04:57.11,0:05:00.58,Default,,0000,0000,0000,,a trending topic is. That's kind of why I Dialogue: 0,0:05:00.58,0:05:04.93,Default,,0000,0000,0000,,picked it. It's. But. Essentially, everybody\NTweets and they Dialogue: 0,0:05:04.93,0:05:08.46,Default,,0000,0000,0000,,can include these hash tags. Hash tag RailsConf.\NHash Dialogue: 0,0:05:08.46,0:05:12.36,Default,,0000,0000,0000,,tag whatever. And as the rates of single hash Dialogue: 0,0:05:12.36,0:05:14.64,Default,,0000,0000,0000,,tags increase, like the highest, the most,\Nthe hash Dialogue: 0,0:05:14.64,0:05:17.10,Default,,0000,0000,0000,,tags that occur at the highest rates end up Dialogue: 0,0:05:17.10,0:05:18.59,Default,,0000,0000,0000,,getting bubbled up, and Twitter says, ah,\Nthese are Dialogue: 0,0:05:18.59,0:05:20.66,Default,,0000,0000,0000,,the topics that are trending so that everybody\Ncan Dialogue: 0,0:05:20.66,0:05:21.29,Default,,0000,0000,0000,,see what's going on. Dialogue: 0,0:05:21.29,0:05:24.30,Default,,0000,0000,0000,,So, like, around here, you might see, RailsConf\Nis Dialogue: 0,0:05:24.30,0:05:25.100,Default,,0000,0000,0000,,trending. Another good reason, a reason why\NI picked Dialogue: 0,0:05:25.100,0:05:29.53,Default,,0000,0000,0000,,it is it's very time-sensitive, and times\Ninto, ties Dialogue: 0,0:05:29.53,0:05:31.73,Default,,0000,0000,0000,,into the real time aspect of Storm. Dialogue: 0,0:05:31.73,0:05:34.31,Default,,0000,0000,0000,,All right. So to implement this, the way I'm Dialogue: 0,0:05:34.31,0:05:37.60,Default,,0000,0000,0000,,going to calculate the rate, like how often\Na Dialogue: 0,0:05:37.60,0:05:39.97,Default,,0000,0000,0000,,Tweet is happening is by using an exponentially\Nweighted Dialogue: 0,0:05:39.97,0:05:44.29,Default,,0000,0000,0000,,moving average. Fancy. Basically what it means\Nis like, Dialogue: 0,0:05:44.29,0:05:46.83,Default,,0000,0000,0000,,as we're moving up, going, like, processing\Nthe Tweets, Dialogue: 0,0:05:46.83,0:05:48.19,Default,,0000,0000,0000,,what I'm gonna do is I'm gonna count the Dialogue: 0,0:05:48.19,0:05:50.35,Default,,0000,0000,0000,,number of occurrences of each hash tag, for\Nlike, Dialogue: 0,0:05:50.35,0:05:52.86,Default,,0000,0000,0000,,every five seconds. The interval doesn't mat,\Nreally matter. Dialogue: 0,0:05:52.86,0:05:54.57,Default,,0000,0000,0000,,Then we're gonna average them up. Dialogue: 0,0:05:54.57,0:05:57.73,Default,,0000,0000,0000,,And, but instead of doing a normal average,\Nwhich Dialogue: 0,0:05:57.73,0:05:59.81,Default,,0000,0000,0000,,is sum up all the values divided by the Dialogue: 0,0:05:59.81,0:06:01.92,Default,,0000,0000,0000,,number of values, we're going to do a weighted Dialogue: 0,0:06:01.92,0:06:05.36,Default,,0000,0000,0000,,average, which is going to cause the older,\Nlike Dialogue: 0,0:06:05.36,0:06:09.91,Default,,0000,0000,0000,,the older occurrences, to drop in weights\Nexponentially. This Dialogue: 0,0:06:09.91,0:06:12.89,Default,,0000,0000,0000,,is how Linux does their one minute, five minute, Dialogue: 0,0:06:12.89,0:06:14.20,Default,,0000,0000,0000,,and fifteen minute load averages. Dialogue: 0,0:06:14.20,0:06:16.87,Default,,0000,0000,0000,,It's a pretty nice way of smoothing out the Dialogue: 0,0:06:16.87,0:06:19.13,Default,,0000,0000,0000,,curves, and you can tune how you want the Dialogue: 0,0:06:19.13,0:06:22.69,Default,,0000,0000,0000,,curves to end up getting smoothed. So first\Nis Dialogue: 0,0:06:22.69,0:06:25.64,Default,,0000,0000,0000,,kind of build up some context. Let's kind\Nof Dialogue: 0,0:06:25.64,0:06:28.13,Default,,0000,0000,0000,,look at how one might implement this with\NSidekiq Dialogue: 0,0:06:28.13,0:06:31.89,Default,,0000,0000,0000,,or, like, Resque or whatever. Any other, like,\Nkind Dialogue: 0,0:06:31.89,0:06:34.88,Default,,0000,0000,0000,,of simpler system that just pops off of a Dialogue: 0,0:06:34.88,0:06:35.74,Default,,0000,0000,0000,,queue with a worker. Dialogue: 0,0:06:35.74,0:06:36.99,Default,,0000,0000,0000,,This is what it might look like. You've got Dialogue: 0,0:06:36.99,0:06:40.36,Default,,0000,0000,0000,,your Rails app. Pushes data into Redis and\Nthen Dialogue: 0,0:06:40.36,0:06:42.40,Default,,0000,0000,0000,,each, like, the worker says, OK, I'm gonna\Npop Dialogue: 0,0:06:42.40,0:06:45.14,Default,,0000,0000,0000,,a message off of Redis. I'm going to read Dialogue: 0,0:06:45.14,0:06:47.08,Default,,0000,0000,0000,,straight from the database. I'm gonna process\Nthe message. Dialogue: 0,0:06:47.08,0:06:49.22,Default,,0000,0000,0000,,I'm gonna write straight back out from the\Ndatabase. Dialogue: 0,0:06:49.22,0:06:51.16,Default,,0000,0000,0000,,And then the Rails app can just query the Dialogue: 0,0:06:51.16,0:06:54.57,Default,,0000,0000,0000,,result by reading it straight from the database.\NHere's Dialogue: 0,0:06:54.57,0:06:56.30,Default,,0000,0000,0000,,a, it might run, I did not. I, is Dialogue: 0,0:06:56.30,0:07:00.08,Default,,0000,0000,0000,,that bright enough? Hopefully it is. The comment\Nbasically Dialogue: 0,0:07:00.08,0:07:02.48,Default,,0000,0000,0000,,says, yes, this is very naive. I know it. Dialogue: 0,0:07:02.48,0:07:04.97,Default,,0000,0000,0000,,But that's not the point. Don't say, oh you Dialogue: 0,0:07:04.97,0:07:08.01,Default,,0000,0000,0000,,could do this one million times faster. That's\Nnot Dialogue: 0,0:07:08.01,0:07:09.33,Default,,0000,0000,0000,,the point. Dialogue: 0,0:07:09.33,0:07:10.59,Default,,0000,0000,0000,,Hopefully it's readable. Dialogue: 0,0:07:10.59,0:07:12.85,Default,,0000,0000,0000,,So tags. So the basic premise is you have Dialogue: 0,0:07:12.85,0:07:15.20,Default,,0000,0000,0000,,an entry method, which takes in a Tweet as Dialogue: 0,0:07:15.20,0:07:18.63,Default,,0000,0000,0000,,an argument, and it, first step is to extract Dialogue: 0,0:07:18.63,0:07:21.35,Default,,0000,0000,0000,,the hash tags from the body. So I get, Dialogue: 0,0:07:21.35,0:07:22.86,Default,,0000,0000,0000,,like, an array of tags, and I loop over Dialogue: 0,0:07:22.86,0:07:25.48,Default,,0000,0000,0000,,each tag and I read from the database, is Dialogue: 0,0:07:25.48,0:07:27.76,Default,,0000,0000,0000,,there already an existing record? If so I'm\Ngoing Dialogue: 0,0:07:27.76,0:07:31.03,Default,,0000,0000,0000,,to update the moving average for it and then Dialogue: 0,0:07:31.03,0:07:32.08,Default,,0000,0000,0000,,save it back. Dialogue: 0,0:07:32.08,0:07:37.73,Default,,0000,0000,0000,,Oh. Yeah. Put highlights. So this is how one Dialogue: 0,0:07:37.73,0:07:40.54,Default,,0000,0000,0000,,might. It doesn't really matter. It was kind\Nof Dialogue: 0,0:07:40.54,0:07:43.23,Default,,0000,0000,0000,,like for informational purposes. But this\Nis how one Dialogue: 0,0:07:43.23,0:07:48.17,Default,,0000,0000,0000,,might implement the moving average like function.\NSo the, Dialogue: 0,0:07:48.17,0:07:50.94,Default,,0000,0000,0000,,the most important part that I want to call, Dialogue: 0,0:07:50.94,0:07:54.40,Default,,0000,0000,0000,,like, draw attention to is that in the update Dialogue: 0,0:07:54.40,0:07:58.13,Default,,0000,0000,0000,,EWMA, the very first thing we do is catch, Dialogue: 0,0:07:58.13,0:08:01.66,Default,,0000,0000,0000,,catch p. And I didn't pass now. But in Dialogue: 0,0:08:01.66,0:08:03.13,Default,,0000,0000,0000,,theory it should pass now down to the catch Dialogue: 0,0:08:03.13,0:08:06.82,Default,,0000,0000,0000,,up and it will keep ticking to update the Dialogue: 0,0:08:06.82,0:08:09.49,Default,,0000,0000,0000,,rate for every bucket that we decided we're\Nsmoothing Dialogue: 0,0:08:09.49,0:08:12.05,Default,,0000,0000,0000,,by. So, like, every five seconds it's going\Nto Dialogue: 0,0:08:12.05,0:08:14.96,Default,,0000,0000,0000,,reupdate the int, the rate so that it will Dialogue: 0,0:08:14.96,0:08:18.95,Default,,0000,0000,0000,,downweight previous values and go on from\Nthere. Dialogue: 0,0:08:18.95,0:08:21.56,Default,,0000,0000,0000,,So this is going to expose a problem with Dialogue: 0,0:08:21.56,0:08:24.44,Default,,0000,0000,0000,,our first version of the worker, in that if Dialogue: 0,0:08:24.44,0:08:27.36,Default,,0000,0000,0000,,we do not receive any Tweets with a given Dialogue: 0,0:08:27.36,0:08:31.52,Default,,0000,0000,0000,,hash tag, what, how do we handle that? We Dialogue: 0,0:08:31.52,0:08:34.13,Default,,0000,0000,0000,,need to still update the record in order to Dialogue: 0,0:08:34.13,0:08:36.87,Default,,0000,0000,0000,,constantly update the rates so that, ah, we're\Nnot Dialogue: 0,0:08:36.87,0:08:40.50,Default,,0000,0000,0000,,resaving any data. We have to start re, lowering Dialogue: 0,0:08:40.50,0:08:42.57,Default,,0000,0000,0000,,the weight of that hashtag. Dialogue: 0,0:08:42.57,0:08:45.18,Default,,0000,0000,0000,,So, just, we'll have to end up implementing\Nsomething Dialogue: 0,0:08:45.18,0:08:47.98,Default,,0000,0000,0000,,like this. Which is going to be scheduled\Nvia Dialogue: 0,0:08:47.98,0:08:50.66,Default,,0000,0000,0000,,Chrone or just regular, some regularly scheduled\Ntask that Dialogue: 0,0:08:50.66,0:08:54.66,Default,,0000,0000,0000,,probably has to run every bucket interval.\NAnd first Dialogue: 0,0:08:54.66,0:08:58.13,Default,,0000,0000,0000,,step is just delete all the existing tags\Nthat Dialogue: 0,0:08:58.13,0:09:00.04,Default,,0000,0000,0000,,are way too old just to clear the database. Dialogue: 0,0:09:00.04,0:09:02.98,Default,,0000,0000,0000,,And then load up all the tags and update Dialogue: 0,0:09:02.98,0:09:06.66,Default,,0000,0000,0000,,the rates for them. Make sure they're caught\Nup. Dialogue: 0,0:09:06.66,0:09:09.27,Default,,0000,0000,0000,,So this is going to, one thing to point Dialogue: 0,0:09:09.27,0:09:11.03,Default,,0000,0000,0000,,out is, OK, so we now have to have Dialogue: 0,0:09:11.03,0:09:13.72,Default,,0000,0000,0000,,a task that is going to read literally every Dialogue: 0,0:09:13.72,0:09:17.31,Default,,0000,0000,0000,,record from the database every tick. And there\Nprobably Dialogue: 0,0:09:17.31,0:09:18.40,Default,,0000,0000,0000,,is a better way to do this. But that's Dialogue: 0,0:09:18.40,0:09:20.35,Default,,0000,0000,0000,,not the point. Dialogue: 0,0:09:20.35,0:09:23.13,Default,,0000,0000,0000,,So is it web scale, yet? Well, let's look. Dialogue: 0,0:09:23.13,0:09:25.58,Default,,0000,0000,0000,,So far, not yet. But what we can do Dialogue: 0,0:09:25.58,0:09:27.98,Default,,0000,0000,0000,,is we can start scaling out our workers. We're Dialogue: 0,0:09:27.98,0:09:30.75,Default,,0000,0000,0000,,just gonna like, ah, let's spawn up three\Nworkers. Dialogue: 0,0:09:30.75,0:09:33.70,Default,,0000,0000,0000,,Now we're ready to handle fifty-thousand Tweets\Nper second. Dialogue: 0,0:09:33.70,0:09:36.56,Default,,0000,0000,0000,,Well, maybe not, but we have more tricks up Dialogue: 0,0:09:36.56,0:09:38.86,Default,,0000,0000,0000,,our sleeve. Like, we know, like I mentioned,\Nwe Dialogue: 0,0:09:38.86,0:09:40.68,Default,,0000,0000,0000,,know that the database is going to be a Dialogue: 0,0:09:40.68,0:09:42.89,Default,,0000,0000,0000,,pretty big bottleneck, cause every single\NTweet we get Dialogue: 0,0:09:42.89,0:09:44.32,Default,,0000,0000,0000,,in, it's going to have to read state, process Dialogue: 0,0:09:44.32,0:09:45.78,Default,,0000,0000,0000,,it, and write it back. Dialogue: 0,0:09:45.78,0:09:49.04,Default,,0000,0000,0000,,What if we could optimize the, and, remove\Nthe Dialogue: 0,0:09:49.04,0:09:52.65,Default,,0000,0000,0000,,read state and, by caching a memory. Super\Nsimple Dialogue: 0,0:09:52.65,0:09:56.57,Default,,0000,0000,0000,,implementation, which more is represented\Nas like, oh, unless Dialogue: 0,0:09:56.57,0:09:59.30,Default,,0000,0000,0000,,like, we basically build an in memory cache\Nof Dialogue: 0,0:09:59.30,0:10:01.68,Default,,0000,0000,0000,,hash tags in the process, such that every\Ntime Dialogue: 0,0:10:01.68,0:10:02.59,Default,,0000,0000,0000,,we get a Tweet, it comes in and we Dialogue: 0,0:10:02.59,0:10:06.10,Default,,0000,0000,0000,,first check our cache before trying to, making\Na Dialogue: 0,0:10:06.10,0:10:07.91,Default,,0000,0000,0000,,new one. And we update that and save it Dialogue: 0,0:10:07.91,0:10:08.53,Default,,0000,0000,0000,,back to the database. Dialogue: 0,0:10:08.53,0:10:11.06,Default,,0000,0000,0000,,So let's run through this and see what's happening. Dialogue: 0,0:10:11.06,0:10:13.89,Default,,0000,0000,0000,,Here's the first Tweet that comes, that comes\Nthrough. Dialogue: 0,0:10:13.89,0:10:16.96,Default,,0000,0000,0000,,We include, queue up, it comes in through\Nthe Dialogue: 0,0:10:16.96,0:10:21.57,Default,,0000,0000,0000,,queue. Fancy animations, yeah. And it goes\Nto the Dialogue: 0,0:10:21.57,0:10:23.76,Default,,0000,0000,0000,,first worker. The worker processes it. It's\Nlike, OK, Dialogue: 0,0:10:23.76,0:10:24.89,Default,,0000,0000,0000,,my cache is empty. I'm gonna check in the Dialogue: 0,0:10:24.89,0:10:27.69,Default,,0000,0000,0000,,database. Nothing for RailsConf in the database.\NSo I'm Dialogue: 0,0:10:27.69,0:10:30.57,Default,,0000,0000,0000,,just gonna process that memory the count one.\NAnd Dialogue: 0,0:10:30.57,0:10:32.93,Default,,0000,0000,0000,,we're gonna write it to the database. K. Dialogue: 0,0:10:32.93,0:10:34.98,Default,,0000,0000,0000,,Next Tweet comes in. Ah, what's for lunch\Nat Dialogue: 0,0:10:34.98,0:10:38.03,Default,,0000,0000,0000,,RailsConf? Well we know now, but it's going\Nto Dialogue: 0,0:10:38.03,0:10:39.85,Default,,0000,0000,0000,,be, like, queued, like, again, queued up in\Nhere, Dialogue: 0,0:10:39.85,0:10:40.94,Default,,0000,0000,0000,,and I'm sure you know where this is going. Dialogue: 0,0:10:40.94,0:10:43.64,Default,,0000,0000,0000,,Goes to the second worker. Second worker reads\Nthe Dialogue: 0,0:10:43.64,0:10:47.25,Default,,0000,0000,0000,,counts and it works, increments count to two\Nand Dialogue: 0,0:10:47.25,0:10:49.86,Default,,0000,0000,0000,,it goes back and writes the database. Dialogue: 0,0:10:49.86,0:10:51.88,Default,,0000,0000,0000,,And here's the third Tweet. And we're gonna\Nget Dialogue: 0,0:10:51.88,0:10:54.15,Default,,0000,0000,0000,,to it. This one's going to go back to Dialogue: 0,0:10:54.15,0:10:55.89,Default,,0000,0000,0000,,the first worker, it's like, oh, I have everything Dialogue: 0,0:10:55.89,0:10:58.16,Default,,0000,0000,0000,,cached. I don't need to read from the database. Dialogue: 0,0:10:58.16,0:11:03.00,Default,,0000,0000,0000,,No probably. We'll count to two and yes. Yes. Dialogue: 0,0:11:03.00,0:11:05.93,Default,,0000,0000,0000,,This. I did this way too late. I was Dialogue: 0,0:11:05.93,0:11:07.78,Default,,0000,0000,0000,,like, ah, I'm just gonna do fancy animations.\NAll Dialogue: 0,0:11:07.78,0:11:09.17,Default,,0000,0000,0000,,right, so, caching in this kind of system\Nis Dialogue: 0,0:11:09.17,0:11:12.28,Default,,0000,0000,0000,,not obvious. We have this problem of, like,\Nwe Dialogue: 0,0:11:12.28,0:11:15.89,Default,,0000,0000,0000,,have, we can still make it work. There's many Dialogue: 0,0:11:15.89,0:11:17.58,Default,,0000,0000,0000,,things we could do. Like, we could push, like, Dialogue: 0,0:11:17.58,0:11:19.53,Default,,0000,0000,0000,,we could push for cache external. We could\Npush Dialogue: 0,0:11:19.53,0:11:21.23,Default,,0000,0000,0000,,for mem cache, but now we still have an Dialogue: 0,0:11:21.23,0:11:25.72,Default,,0000,0000,0000,,external process that is required for coordinating\Nthe work. Dialogue: 0,0:11:25.72,0:11:29.06,Default,,0000,0000,0000,,So, the main thing is, even though we have Dialogue: 0,0:11:29.06,0:11:31.06,Default,,0000,0000,0000,,many worker jobs, there is still a high amount Dialogue: 0,0:11:31.06,0:11:35.05,Default,,0000,0000,0000,,of coordination that's required to get this\Nto work. Dialogue: 0,0:11:35.05,0:11:35.63,Default,,0000,0000,0000,,K. Dialogue: 0,0:11:35.63,0:11:38.85,Default,,0000,0000,0000,,Enter Storm. And, as you can guess, probably\Nthis Dialogue: 0,0:11:38.85,0:11:40.64,Default,,0000,0000,0000,,is gonna, like, all these problems are gonna\Nmagically Dialogue: 0,0:11:40.64,0:11:44.02,Default,,0000,0000,0000,,go away when we use storm. I wish. But Dialogue: 0,0:11:44.02,0:11:47.18,Default,,0000,0000,0000,,it'll, there's something we can do. Let's\Nstart with Dialogue: 0,0:11:47.18,0:11:52.46,Default,,0000,0000,0000,,some abstractions. The core abstractions in\NStorm are the Dialogue: 0,0:11:52.46,0:11:56.34,Default,,0000,0000,0000,,streams and the tuples. A stream is, essentially,\Na Dialogue: 0,0:11:56.34,0:11:59.78,Default,,0000,0000,0000,,series of tubes through which data flows.\NAnd, but Dialogue: 0,0:11:59.78,0:12:02.59,Default,,0000,0000,0000,,they call them streams. I would, tubes would\Nhave Dialogue: 0,0:12:02.59,0:12:03.52,Default,,0000,0000,0000,,been better. Dialogue: 0,0:12:03.52,0:12:07.33,Default,,0000,0000,0000,,And tuples are just, it's just like a list Dialogue: 0,0:12:07.33,0:12:09.57,Default,,0000,0000,0000,,of values. And these values can be anything.\NThey're Dialogue: 0,0:12:09.57,0:12:11.23,Default,,0000,0000,0000,,really just the messages. So you can put in Dialogue: 0,0:12:11.23,0:12:13.12,Default,,0000,0000,0000,,tuples, you can put your strings, you can\Nput Dialogue: 0,0:12:13.12,0:12:15.05,Default,,0000,0000,0000,,integers, you can put any object you want\Nas Dialogue: 0,0:12:15.05,0:12:18.10,Default,,0000,0000,0000,,long as you can serialize it. Dialogue: 0,0:12:18.10,0:12:20.37,Default,,0000,0000,0000,,And you, you're allowed to specify custom\Nserialization. So Dialogue: 0,0:12:20.37,0:12:22.87,Default,,0000,0000,0000,,as long as you can serialize it to JSON, Dialogue: 0,0:12:22.87,0:12:25.09,Default,,0000,0000,0000,,serialize it to any serialization format,\Nyou can put Dialogue: 0,0:12:25.09,0:12:27.56,Default,,0000,0000,0000,,that object in the tuple. Dialogue: 0,0:12:27.56,0:12:29.29,Default,,0000,0000,0000,,The rest of Storm is going to be a Dialogue: 0,0:12:29.29,0:12:32.04,Default,,0000,0000,0000,,series, a bunch of primitives to take these\Nstreams Dialogue: 0,0:12:32.04,0:12:35.18,Default,,0000,0000,0000,,and tuples and to transform the data from,\Nlike, Dialogue: 0,0:12:35.18,0:12:39.34,Default,,0000,0000,0000,,one representation to the other. Dialogue: 0,0:12:39.34,0:12:42.62,Default,,0000,0000,0000,,Next, spouts and states. So the spouts are\Nthe Dialogue: 0,0:12:42.62,0:12:45.19,Default,,0000,0000,0000,,source of the streams. So this is how you Dialogue: 0,0:12:45.19,0:12:48.04,Default,,0000,0000,0000,,get data into storm. This is the starting\Npoint Dialogue: 0,0:12:48.04,0:12:51.75,Default,,0000,0000,0000,,of the streams. So anything that you want\Nto Dialogue: 0,0:12:51.75,0:12:53.51,Default,,0000,0000,0000,,read from the outside world is going to end Dialogue: 0,0:12:53.51,0:12:56.37,Default,,0000,0000,0000,,up being represented as a spout. And this\Ncan Dialogue: 0,0:12:56.37,0:13:00.30,Default,,0000,0000,0000,,be reading from redis, reading from SQS, reading\Nfrom Dialogue: 0,0:13:00.30,0:13:02.84,Default,,0000,0000,0000,,anything. But it can also be, you could implement Dialogue: 0,0:13:02.84,0:13:06.12,Default,,0000,0000,0000,,a spout that reads directly from the Twitter\NAPI. Dialogue: 0,0:13:06.12,0:13:09.45,Default,,0000,0000,0000,,You could implement a spout that reads from\Nthe Dialogue: 0,0:13:09.45,0:13:13.36,Default,,0000,0000,0000,,database, that makes HTTP requests, that gets\Nthe time Dialogue: 0,0:13:13.36,0:13:16.10,Default,,0000,0000,0000,,of the day, even. And, yes, you will want Dialogue: 0,0:13:16.10,0:13:18.33,Default,,0000,0000,0000,,to actually implement a spout that gets current\Ntime Dialogue: 0,0:13:18.33,0:13:20.13,Default,,0000,0000,0000,,of the day. And I'll talk about that later. Dialogue: 0,0:13:20.13,0:13:24.75,Default,,0000,0000,0000,,States are the opposite. State is how you\Nget Dialogue: 0,0:13:24.75,0:13:28.40,Default,,0000,0000,0000,,the result of the transformations outside\Nof Storm. So Dialogue: 0,0:13:28.40,0:13:30.55,Default,,0000,0000,0000,,exactly the opposite. If you want to write\Noutside Dialogue: 0,0:13:30.55,0:13:32.41,Default,,0000,0000,0000,,of storm, like you write to the database,\Nwrite, Dialogue: 0,0:13:32.41,0:13:36.44,Default,,0000,0000,0000,,make HTTP post requests, send email, push\Nto external Dialogue: 0,0:13:36.44,0:13:39.76,Default,,0000,0000,0000,,queues, anything like that. It's going to\Nbe implemented Dialogue: 0,0:13:39.76,0:13:43.86,Default,,0000,0000,0000,,as a state. So, so far this is what Dialogue: 0,0:13:43.86,0:13:45.51,Default,,0000,0000,0000,,we know about Storm. Dialogue: 0,0:13:45.51,0:13:48.35,Default,,0000,0000,0000,,We have Spout. It starts a stream. Data is Dialogue: 0,0:13:48.35,0:13:49.39,Default,,0000,0000,0000,,going to flow through it and it's going to Dialogue: 0,0:13:49.39,0:13:53.59,Default,,0000,0000,0000,,end up at the state. So far nothing is Dialogue: 0,0:13:53.59,0:13:55.47,Default,,0000,0000,0000,,super interesting. Dialogue: 0,0:13:55.47,0:13:57.79,Default,,0000,0000,0000,,Transform. The rest is really about how do\Nwe Dialogue: 0,0:13:57.79,0:14:01.97,Default,,0000,0000,0000,,transform the data as it flows through. And\Ntransforms Dialogue: 0,0:14:01.97,0:14:05.35,Default,,0000,0000,0000,,are gonna be purely functional operations\Non the data. Dialogue: 0,0:14:05.35,0:14:08.76,Default,,0000,0000,0000,,So they are going to be given the, given, Dialogue: 0,0:14:08.76,0:14:11.85,Default,,0000,0000,0000,,given inputs, the same, it's, they're going\Nto output Dialogue: 0,0:14:11.85,0:14:14.92,Default,,0000,0000,0000,,the same values. Anyway. Dialogue: 0,0:14:14.92,0:14:17.85,Default,,0000,0000,0000,,So let's add a couple transforms here. So\Nwe Dialogue: 0,0:14:17.85,0:14:19.97,Default,,0000,0000,0000,,have a spout. It's going to send some data Dialogue: 0,0:14:19.97,0:14:21.93,Default,,0000,0000,0000,,and we can, like, filter if we need to. Dialogue: 0,0:14:21.93,0:14:23.59,Default,,0000,0000,0000,,So we'll filter some data out and we'll aggregate Dialogue: 0,0:14:23.59,0:14:25.91,Default,,0000,0000,0000,,it. And the data will flow through and end Dialogue: 0,0:14:25.91,0:14:30.16,Default,,0000,0000,0000,,up at the state and then get persisted. Dialogue: 0,0:14:30.16,0:14:32.87,Default,,0000,0000,0000,,And then more transforms. Basically from here,\Nwhat you Dialogue: 0,0:14:32.87,0:14:35.36,Default,,0000,0000,0000,,end up doing is modeling your data in terms Dialogue: 0,0:14:35.36,0:14:38.29,Default,,0000,0000,0000,,of dataflow to get to the end point. And Dialogue: 0,0:14:38.29,0:14:40.82,Default,,0000,0000,0000,,there's really, you can, there's no limitation\Nin terms Dialogue: 0,0:14:40.82,0:14:43.93,Default,,0000,0000,0000,,of, I mean, granted you don't add a million Dialogue: 0,0:14:43.93,0:14:45.81,Default,,0000,0000,0000,,spouts because that will probably be crazy\Nand you'll Dialogue: 0,0:14:45.81,0:14:47.11,Default,,0000,0000,0000,,be, not be able to understand the code. But Dialogue: 0,0:14:47.11,0:14:49.98,Default,,0000,0000,0000,,besides that, you can add as many spouts as Dialogue: 0,0:14:49.98,0:14:51.67,Default,,0000,0000,0000,,you want. You can add as many state end Dialogue: 0,0:14:51.67,0:14:54.78,Default,,0000,0000,0000,,points. You can, like, split the streams.\NYou can Dialogue: 0,0:14:54.78,0:14:58.88,Default,,0000,0000,0000,,join them. And do whatever real transformation\Nthat you Dialogue: 0,0:14:58.88,0:15:01.38,Default,,0000,0000,0000,,want. And Storm is going to take care of Dialogue: 0,0:15:01.38,0:15:04.84,Default,,0000,0000,0000,,figuring out how to take this definition,\Nlike, the Dialogue: 0,0:15:04.84,0:15:07.84,Default,,0000,0000,0000,,state of the definition and distribute it\Nacross your Dialogue: 0,0:15:07.84,0:15:08.86,Default,,0000,0000,0000,,cluster. Dialogue: 0,0:15:08.86,0:15:13.14,Default,,0000,0000,0000,,And this entire set, basically, wrap of starting\Nat Dialogue: 0,0:15:13.14,0:15:17.49,Default,,0000,0000,0000,,stream, starting at spouts and ending at states\Nis Dialogue: 0,0:15:17.49,0:15:21.04,Default,,0000,0000,0000,,called the topology. And in a bit, I'm gonna Dialogue: 0,0:15:21.04,0:15:23.96,Default,,0000,0000,0000,,show how to define the topology. Dialogue: 0,0:15:23.96,0:15:28.01,Default,,0000,0000,0000,,So, OK. Let's break it down just a little Dialogue: 0,0:15:28.01,0:15:30.71,Default,,0000,0000,0000,,bit. So the spout is where it starts, and Dialogue: 0,0:15:30.71,0:15:33.42,Default,,0000,0000,0000,,then it's going to emit tuples to the filter. Dialogue: 0,0:15:33.42,0:15:35.98,Default,,0000,0000,0000,,Generally, as you're gonna get the spout,\Nlike, it's Dialogue: 0,0:15:35.98,0:15:38.08,Default,,0000,0000,0000,,gonna be a pretty standard thing. Like, for\Nexample, Dialogue: 0,0:15:38.08,0:15:40.07,Default,,0000,0000,0000,,if you are pulling, if you are pulling data Dialogue: 0,0:15:40.07,0:15:42.02,Default,,0000,0000,0000,,from Redis and want, you will use a Redis Dialogue: 0,0:15:42.02,0:15:44.52,Default,,0000,0000,0000,,spout that already exists. And it will literally\Njust Dialogue: 0,0:15:44.52,0:15:47.63,Default,,0000,0000,0000,,be in your definition, just use Redis spouts.\NAnd Dialogue: 0,0:15:47.63,0:15:51.73,Default,,0000,0000,0000,,filters, as well, any, any transforms can\Nusually be Dialogue: 0,0:15:51.73,0:15:55.06,Default,,0000,0000,0000,,abstracted up to higher level concepts and\Nshared. Dialogue: 0,0:15:55.06,0:15:59.35,Default,,0000,0000,0000,,So, for example, this is how you might implement Dialogue: 0,0:15:59.35,0:16:05.96,Default,,0000,0000,0000,,the my filter, the my filter function or transform Dialogue: 0,0:16:05.96,0:16:09.48,Default,,0000,0000,0000,,for Storm. So the base function is provided\Nas Dialogue: 0,0:16:09.48,0:16:12.66,Default,,0000,0000,0000,,part of the Storm API. And the requirement\Nis Dialogue: 0,0:16:12.66,0:16:15.41,Default,,0000,0000,0000,,that you define a single method that takes\Nthe Dialogue: 0,0:16:15.41,0:16:19.41,Default,,0000,0000,0000,,input as a tuple and it will take the Dialogue: 0,0:16:19.41,0:16:21.98,Default,,0000,0000,0000,,output, which represents the stream. So you\Nwill get Dialogue: 0,0:16:21.98,0:16:24.06,Default,,0000,0000,0000,,in a tuple, you're gonna process it, and you Dialogue: 0,0:16:24.06,0:16:28.16,Default,,0000,0000,0000,,will then emit zero or more output tuples\Nonto Dialogue: 0,0:16:28.16,0:16:29.12,Default,,0000,0000,0000,,the output stream. Dialogue: 0,0:16:29.12,0:16:31.84,Default,,0000,0000,0000,,So, the first step in the filter is just Dialogue: 0,0:16:31.84,0:16:34.35,Default,,0000,0000,0000,,get the, this is, get the message out of Dialogue: 0,0:16:34.35,0:16:36.15,Default,,0000,0000,0000,,the tuple, and this is somewhat of a Java-ish Dialogue: 0,0:16:36.15,0:16:40.87,Default,,0000,0000,0000,,API, but it's possible to shim. So, get value Dialogue: 0,0:16:40.87,0:16:43.25,Default,,0000,0000,0000,,by field. You, in this case, we're just creating Dialogue: 0,0:16:43.25,0:16:45.60,Default,,0000,0000,0000,,the tuple more as a hash map than anything Dialogue: 0,0:16:45.60,0:16:47.79,Default,,0000,0000,0000,,else. So we get the message. Is the message Dialogue: 0,0:16:47.79,0:16:50.81,Default,,0000,0000,0000,,awesome? If so, we just output it on the Dialogue: 0,0:16:50.81,0:16:53.84,Default,,0000,0000,0000,,output stream. And that is all that it takes Dialogue: 0,0:16:53.84,0:16:56.16,Default,,0000,0000,0000,,to implement a very simple transform. Dialogue: 0,0:16:56.16,0:17:00.27,Default,,0000,0000,0000,,K. Here's another example. And this is not\Nsomething Dialogue: 0,0:17:00.27,0:17:02.92,Default,,0000,0000,0000,,to run in production but is super helpful\Nfor Dialogue: 0,0:17:02.92,0:17:07.49,Default,,0000,0000,0000,,debugging. All right, the next step is define\Nthe Dialogue: 0,0:17:07.49,0:17:10.22,Default,,0000,0000,0000,,topology. Like, define how all the different,\Nlike, transforms Dialogue: 0,0:17:10.22,0:17:13.26,Default,,0000,0000,0000,,and spouts and everything hook up together. Dialogue: 0,0:17:13.26,0:17:16.91,Default,,0000,0000,0000,,And the important part's on the screen. So\Nthe Dialogue: 0,0:17:16.91,0:17:20.64,Default,,0000,0000,0000,,define_topology method doesn't matter, it's\Nwhatever. The main points Dialogue: 0,0:17:20.64,0:17:23.08,Default,,0000,0000,0000,,you will start at, you will get a topology Dialogue: 0,0:17:23.08,0:17:26.43,Default,,0000,0000,0000,,object, and on that you'll use a series of Dialogue: 0,0:17:26.43,0:17:29.00,Default,,0000,0000,0000,,APIs define how things flow together. So you\Nstart Dialogue: 0,0:17:29.00,0:17:31.34,Default,,0000,0000,0000,,by defining the stream and you just name it Dialogue: 0,0:17:31.34,0:17:34.32,Default,,0000,0000,0000,,as, are my-spout, and in this case, I'm saying Dialogue: 0,0:17:34.32,0:17:37.27,Default,,0000,0000,0000,,MyQueueSpout, but it would be like My, like\NRedisSpout Dialogue: 0,0:17:37.27,0:17:40.56,Default,,0000,0000,0000,,dot new specify the server, the, like, wherever\Nthe Dialogue: 0,0:17:40.56,0:17:42.03,Default,,0000,0000,0000,,server is and the topic you want to read Dialogue: 0,0:17:42.03,0:17:45.44,Default,,0000,0000,0000,,from. Or, if you're using SQS you would pass Dialogue: 0,0:17:45.44,0:17:48.82,Default,,0000,0000,0000,,in your credentials and the topic you're reading\Nfrom, Dialogue: 0,0:17:48.82,0:17:51.28,Default,,0000,0000,0000,,et cetera. Dialogue: 0,0:17:51.28,0:17:54.77,Default,,0000,0000,0000,,Next step, usually the spouts will just output\Ntuples Dialogue: 0,0:17:54.77,0:17:56.76,Default,,0000,0000,0000,,that are raw bytes. So it will just get Dialogue: 0,0:17:56.76,0:17:58.24,Default,,0000,0000,0000,,the raw bytes from Redis and output it, so Dialogue: 0,0:17:58.24,0:18:00.71,Default,,0000,0000,0000,,the next step is to deserialize the raw bytes, Dialogue: 0,0:18:00.71,0:18:03.34,Default,,0000,0000,0000,,so we are getting a message that, we are Dialogue: 0,0:18:03.34,0:18:05.68,Default,,0000,0000,0000,,saying ah, we are expecting in, as an input, Dialogue: 0,0:18:05.68,0:18:08.38,Default,,0000,0000,0000,,a tuple that contains the field bytes. We\Nare Dialogue: 0,0:18:08.38,0:18:11.82,Default,,0000,0000,0000,,gonna pass it to the queue message deserializer.\NAnd Dialogue: 0,0:18:11.82,0:18:13.77,Default,,0000,0000,0000,,the output is gonna be a tuple that contains Dialogue: 0,0:18:13.77,0:18:14.24,Default,,0000,0000,0000,,message. Dialogue: 0,0:18:14.24,0:18:17.58,Default,,0000,0000,0000,,I included the input, an example implementation\Nunder. And Dialogue: 0,0:18:17.58,0:18:20.46,Default,,0000,0000,0000,,then we're gonna chain that, and we're gonna\Nsay, Dialogue: 0,0:18:20.46,0:18:22.37,Default,,0000,0000,0000,,like, for each tuple that we get, we're gonna Dialogue: 0,0:18:22.37,0:18:24.85,Default,,0000,0000,0000,,expect, well, we're gonna expect a tuple that\Ncontains Dialogue: 0,0:18:24.85,0:18:27.45,Default,,0000,0000,0000,,the field message. We're gonna pass it to\Nmy Dialogue: 0,0:18:27.45,0:18:30.05,Default,,0000,0000,0000,,filter, which will filter it based off of\Nour Dialogue: 0,0:18:30.05,0:18:33.17,Default,,0000,0000,0000,,predicates, and we'll output another tuple\Nthat contains a Dialogue: 0,0:18:33.17,0:18:37.62,Default,,0000,0000,0000,,message. And finally we'll pass it the logger. Dialogue: 0,0:18:37.62,0:18:40.27,Default,,0000,0000,0000,,Next step, run it. And the easiest way to Dialogue: 0,0:18:40.27,0:18:43.38,Default,,0000,0000,0000,,get this running, just for exper, like, playing\Naround, Dialogue: 0,0:18:43.38,0:18:47.33,Default,,0000,0000,0000,,is just running it locally. And redstorm does\Nthis, Dialogue: 0,0:18:47.33,0:18:50.51,Default,,0000,0000,0000,,but if you want to just, like, the actual Dialogue: 0,0:18:50.51,0:18:52.85,Default,,0000,0000,0000,,AP, like, amount of code to get it started Dialogue: 0,0:18:52.85,0:18:55.27,Default,,0000,0000,0000,,is ba- is just this. You initialize a new Dialogue: 0,0:18:55.27,0:18:59.30,Default,,0000,0000,0000,,cluster object. You define new TridentTopology.\NI'm going to Dialogue: 0,0:18:59.30,0:19:00.58,Default,,0000,0000,0000,,just kind of like, wave my hands over what Dialogue: 0,0:19:00.58,0:19:03.83,Default,,0000,0000,0000,,Trident is. You can read the Wiki. Dialogue: 0,0:19:03.83,0:19:06.74,Default,,0000,0000,0000,,And doesn't really matter for the talk. You\Ncan Dialogue: 0,0:19:06.74,0:19:10.94,Default,,0000,0000,0000,,initialize a config and then there's our define_topology\Nmethod. Dialogue: 0,0:19:10.94,0:19:12.72,Default,,0000,0000,0000,,You pass in topology and then you submit it Dialogue: 0,0:19:12.72,0:19:15.26,Default,,0000,0000,0000,,to the cluster, which is just your locally\Nrunning Dialogue: 0,0:19:15.26,0:19:18.00,Default,,0000,0000,0000,,machine. And the entire thing is running.\NWinning. Dialogue: 0,0:19:18.00,0:19:20.42,Default,,0000,0000,0000,,But only a little bit winning, because we\Nhave Dialogue: 0,0:19:20.42,0:19:22.98,Default,,0000,0000,0000,,not, we're not doing anything with the data\Nyet. Dialogue: 0,0:19:22.98,0:19:25.10,Default,,0000,0000,0000,,So the next step is going to be to Dialogue: 0,0:19:25.10,0:19:29.09,Default,,0000,0000,0000,,persist the results. And, again, there are\Nhigher-level things Dialogue: 0,0:19:29.09,0:19:32.12,Default,,0000,0000,0000,,that, there are like, there are already existing\Nlibraries Dialogue: 0,0:19:32.12,0:19:33.93,Default,,0000,0000,0000,,where it's like, oh, you take in these tuples Dialogue: 0,0:19:33.93,0:19:36.24,Default,,0000,0000,0000,,and just throw them, throw them entirely in\NRedis Dialogue: 0,0:19:36.24,0:19:38.12,Default,,0000,0000,0000,,that take no code. But I am going to Dialogue: 0,0:19:38.12,0:19:40.40,Default,,0000,0000,0000,,jump directly down into how you would implement\Na Dialogue: 0,0:19:40.40,0:19:43.21,Default,,0000,0000,0000,,state from scratch. Dialogue: 0,0:19:43.21,0:19:47.40,Default,,0000,0000,0000,,So what it takes is, you just make a Dialogue: 0,0:19:47.40,0:19:50.68,Default,,0000,0000,0000,,class, inherit from state. This is provided\Nfrom, as Dialogue: 0,0:19:50.68,0:19:54.34,Default,,0000,0000,0000,,part of the Storm API as well. The only Dialogue: 0,0:19:54.34,0:19:56.50,Default,,0000,0000,0000,,methods you have to define are begin commit\Nand Dialogue: 0,0:19:56.50,0:20:00.01,Default,,0000,0000,0000,,commit, and I'm gonna cover those later. Besides\Nthat, Dialogue: 0,0:20:00.01,0:20:03.02,Default,,0000,0000,0000,,you just provide whatever you want. And as\Nan Dialogue: 0,0:20:03.02,0:20:06.78,Default,,0000,0000,0000,,API to interactive to state. Because next\Ncomponent is Dialogue: 0,0:20:06.78,0:20:10.61,Default,,0000,0000,0000,,going to be the state updater. Dialogue: 0,0:20:10.61,0:20:15.20,Default,,0000,0000,0000,,The state updater takes in the, requires you\Nto Dialogue: 0,0:20:15.20,0:20:18.71,Default,,0000,0000,0000,,define the method update_state, which is going\Nto pass Dialogue: 0,0:20:18.71,0:20:21.22,Default,,0000,0000,0000,,in the instance of your state and is gonna Dialogue: 0,0:20:21.22,0:20:25.31,Default,,0000,0000,0000,,pass in the input tuple and an optional, like, Dialogue: 0,0:20:25.31,0:20:27.00,Default,,0000,0000,0000,,it also gives you an output string in case Dialogue: 0,0:20:27.00,0:20:30.63,Default,,0000,0000,0000,,you want your state to emit more tuples. But, Dialogue: 0,0:20:30.63,0:20:32.36,Default,,0000,0000,0000,,in our case, we're gonna, all we're gonna\Ndo Dialogue: 0,0:20:32.36,0:20:34.85,Default,,0000,0000,0000,,is ex- get the message out of the tuple Dialogue: 0,0:20:34.85,0:20:38.44,Default,,0000,0000,0000,,and call persist awesomely, and that's it\Nfor now. Dialogue: 0,0:20:38.44,0:20:43.41,Default,,0000,0000,0000,,Except, cause it's Java, it's also a factory.\NBut, Dialogue: 0,0:20:43.41,0:20:48.39,Default,,0000,0000,0000,,let's not mention that. So here we have, I Dialogue: 0,0:20:48.39,0:20:50.82,Default,,0000,0000,0000,,just wanted to add the state to the topology Dialogue: 0,0:20:50.82,0:20:53.88,Default,,0000,0000,0000,,that we're defining. And we're just adding,\Nall I Dialogue: 0,0:20:53.88,0:20:56.35,Default,,0000,0000,0000,,did was replace the logger one, and instead\Nof Dialogue: 0,0:20:56.35,0:21:01.25,Default,,0000,0000,0000,,logging, I'm calling partition persist, passing\Nin the factory. Dialogue: 0,0:21:01.25,0:21:04.21,Default,,0000,0000,0000,,I'm saying, I'm expecting. Now I'll just build\Nthe Dialogue: 0,0:21:04.21,0:21:07.70,Default,,0000,0000,0000,,state. But I'm expecting a tuple that has\Nthe Dialogue: 0,0:21:07.70,0:21:13.19,Default,,0000,0000,0000,,field message, and I'm passing use the basic\Nupdater. Dialogue: 0,0:21:13.19,0:21:17.55,Default,,0000,0000,0000,,And that's it for getting a very basic topology Dialogue: 0,0:21:17.55,0:21:22.97,Default,,0000,0000,0000,,running. So the next step is let's go back Dialogue: 0,0:21:22.97,0:21:29.56,Default,,0000,0000,0000,,to our initial, like, Twitter hash tag example.\NThis Dialogue: 0,0:21:29.56,0:21:31.97,Default,,0000,0000,0000,,is somewhat what, like, this might be what\Nit Dialogue: 0,0:21:31.97,0:21:34.12,Default,,0000,0000,0000,,might look like. Ah, many mights. This is\Nwhat Dialogue: 0,0:21:34.12,0:21:37.27,Default,,0000,0000,0000,,it could look like as a Storm topology. You'd Dialogue: 0,0:21:37.27,0:21:39.39,Default,,0000,0000,0000,,start with a Tweet spout, however you decide\Nto Dialogue: 0,0:21:39.39,0:21:43.28,Default,,0000,0000,0000,,get your Tweets in. Via Redis, via directly\NTwitter Dialogue: 0,0:21:43.28,0:21:45.84,Default,,0000,0000,0000,,API, whatever. You pass it, you'd pass it\Nto Dialogue: 0,0:21:45.84,0:21:49.11,Default,,0000,0000,0000,,a transform, which the entire, which its only\Ngoal Dialogue: 0,0:21:49.11,0:21:51.65,Default,,0000,0000,0000,,is to get the Tweet body, get out the Dialogue: 0,0:21:51.65,0:21:54.31,Default,,0000,0000,0000,,hash tags and output them as tuples. The next Dialogue: 0,0:21:54.31,0:21:55.99,Default,,0000,0000,0000,,step is going to be an aggregate, and what Dialogue: 0,0:21:55.99,0:21:57.66,Default,,0000,0000,0000,,that's going to do is it's going to get Dialogue: 0,0:21:57.66,0:22:01.13,Default,,0000,0000,0000,,all the, it's going to get all the hash Dialogue: 0,0:22:01.13,0:22:04.02,Default,,0000,0000,0000,,tags and track how many counts there are.\NAnd Dialogue: 0,0:22:04.02,0:22:06.50,Default,,0000,0000,0000,,then it's gonna send that to the state, and Dialogue: 0,0:22:06.50,0:22:08.84,Default,,0000,0000,0000,,the state is going to do the moving average Dialogue: 0,0:22:08.84,0:22:11.83,Default,,0000,0000,0000,,calculation. Dialogue: 0,0:22:11.83,0:22:15.04,Default,,0000,0000,0000,,This is what it might look like. Again, pretty Dialogue: 0,0:22:15.04,0:22:18.76,Default,,0000,0000,0000,,basic. Extract hash tags inherits from base\Nfunction. We Dialogue: 0,0:22:18.76,0:22:20.96,Default,,0000,0000,0000,,define execute and what it's gonna do is get Dialogue: 0,0:22:20.96,0:22:24.40,Default,,0000,0000,0000,,the Tweet body out of the Tuple, extract hash Dialogue: 0,0:22:24.40,0:22:26.19,Default,,0000,0000,0000,,tags. I believe it's the same code. And loop Dialogue: 0,0:22:26.19,0:22:29.54,Default,,0000,0000,0000,,over it and just emit new tuples. Dialogue: 0,0:22:29.54,0:22:34.35,Default,,0000,0000,0000,,Next, the aggregator. So the first thing it\Ndoes Dialogue: 0,0:22:34.35,0:22:37.23,Default,,0000,0000,0000,,is just init, so, the. Oh, yeah. First the, Dialogue: 0,0:22:37.23,0:22:44.23,Default,,0000,0000,0000,,an aggregate function is basically just like\NRuby's in, Dialogue: 0,0:22:44.27,0:22:47.29,Default,,0000,0000,0000,,inject on enumerable. So you pass it an initial Dialogue: 0,0:22:47.29,0:22:49.44,Default,,0000,0000,0000,,state and then it's going to loop over whatever Dialogue: 0,0:22:49.44,0:22:51.72,Default,,0000,0000,0000,,it is and pass, pass you the. So the Dialogue: 0,0:22:51.72,0:22:54.37,Default,,0000,0000,0000,,aggregate is the iteration. It's going to\Npass in Dialogue: 0,0:22:54.37,0:22:56.21,Default,,0000,0000,0000,,the state that it's building up to as well Dialogue: 0,0:22:56.21,0:22:59.35,Default,,0000,0000,0000,,as each Tuple, and again, an optional output\Nstream Dialogue: 0,0:22:59.35,0:23:03.79,Default,,0000,0000,0000,,that you could output Tuples in mid-iteration. Dialogue: 0,0:23:03.79,0:23:09.08,Default,,0000,0000,0000,,And the init method, you return the initial\Nstate. Dialogue: 0,0:23:09.08,0:23:11.47,Default,,0000,0000,0000,,Finally there's going to be an extra complete\Nmethod, Dialogue: 0,0:23:11.47,0:23:15.15,Default,,0000,0000,0000,,which is oh, we are done aggregating, so let Dialogue: 0,0:23:15.15,0:23:19.14,Default,,0000,0000,0000,,us then finally output our aggregation as\Ntuples to Dialogue: 0,0:23:19.14,0:23:21.83,Default,,0000,0000,0000,,the stream. And, in this case, just going\Nto Dialogue: 0,0:23:21.83,0:23:23.59,Default,,0000,0000,0000,,loop over our, our summary, which is a hash Dialogue: 0,0:23:23.59,0:23:25.49,Default,,0000,0000,0000,,map with a hash tag and a count. I'm Dialogue: 0,0:23:25.49,0:23:28.09,Default,,0000,0000,0000,,gonna output tuples that contain the hash\Nmap, the Dialogue: 0,0:23:28.09,0:23:31.75,Default,,0000,0000,0000,,hash tag and the count. Dialogue: 0,0:23:31.75,0:23:36.48,Default,,0000,0000,0000,,And hook it all up. Pretty similar. The main Dialogue: 0,0:23:36.48,0:23:41.04,Default,,0000,0000,0000,,things with aggregates, aggregates functions\Nor aggregate transforms, you Dialogue: 0,0:23:41.04,0:23:45.71,Default,,0000,0000,0000,,want to call partition aggregate. And I will\Ncover Dialogue: 0,0:23:45.71,0:23:47.96,Default,,0000,0000,0000,,that in a bit. And also we're going to, Dialogue: 0,0:23:47.96,0:23:51.89,Default,,0000,0000,0000,,at the same time, add our trending topic state. Dialogue: 0,0:23:51.89,0:23:54.99,Default,,0000,0000,0000,,So this one might be implemented, let's again,\Nwe Dialogue: 0,0:23:54.99,0:23:57.87,Default,,0000,0000,0000,,inherit from, OK, at the very all I do Dialogue: 0,0:23:57.87,0:24:00.52,Default,,0000,0000,0000,,is trending topic state inherits from state.\NExactly the Dialogue: 0,0:24:00.52,0:24:01.44,Default,,0000,0000,0000,,same as previously. Dialogue: 0,0:24:01.44,0:24:04.79,Default,,0000,0000,0000,,I'm going to implement begin commit and commit,\Nfor Dialogue: 0,0:24:04.79,0:24:08.36,Default,,0000,0000,0000,,now. And I'm gonna leave them empty and come Dialogue: 0,0:24:08.36,0:24:10.67,Default,,0000,0000,0000,,back to it. And the update method is going Dialogue: 0,0:24:10.67,0:24:14.36,Default,,0000,0000,0000,,to take the hash tag and the count. And Dialogue: 0,0:24:14.36,0:24:15.83,Default,,0000,0000,0000,,it's going to do the exact same work that Dialogue: 0,0:24:15.83,0:24:18.85,Default,,0000,0000,0000,,it did in the Sidekiq one, for now. So Dialogue: 0,0:24:18.85,0:24:21.24,Default,,0000,0000,0000,,it's going to find an existing hash tag record Dialogue: 0,0:24:21.24,0:24:25.16,Default,,0000,0000,0000,,by name and update the moving average and\Nsave Dialogue: 0,0:24:25.16,0:24:28.39,Default,,0000,0000,0000,,it again. And the updater is going to take Dialogue: 0,0:24:28.39,0:24:29.59,Default,,0000,0000,0000,,in all the tuples and just pass it to Dialogue: 0,0:24:29.59,0:24:30.86,Default,,0000,0000,0000,,the states. Dialogue: 0,0:24:30.86,0:24:36.29,Default,,0000,0000,0000,,So you might be wondering, well, I thought\Nstreams Dialogue: 0,0:24:36.29,0:24:39.90,Default,,0000,0000,0000,,were unbounded sequences of tuples. So if\Nthat is Dialogue: 0,0:24:39.90,0:24:44.83,Default,,0000,0000,0000,,the case, when does one call the complete\Nmethod? Dialogue: 0,0:24:44.83,0:24:47.60,Default,,0000,0000,0000,,Infinite time from now? I don't know. But\Nno. Dialogue: 0,0:24:47.60,0:24:51.66,Default,,0000,0000,0000,,What, the way Storm executes its topology\Nis it Dialogue: 0,0:24:51.66,0:24:54.83,Default,,0000,0000,0000,,executes in batches. So the way it works is Dialogue: 0,0:24:54.83,0:24:59.04,Default,,0000,0000,0000,,gonna be the Storm coordinator, is going to,\Nhere Dialogue: 0,0:24:59.04,0:25:01.40,Default,,0000,0000,0000,,we go. Storm coordinator is going to tell\Nyour Dialogue: 0,0:25:01.40,0:25:04.19,Default,,0000,0000,0000,,spout, yo spout. I'm gonna just read what\Nit Dialogue: 0,0:25:04.19,0:25:08.31,Default,,0000,0000,0000,,says. Yo spout, start batch id 123. Dialogue: 0,0:25:08.31,0:25:10.86,Default,,0000,0000,0000,,And the spout is going to then be like, Dialogue: 0,0:25:10.86,0:25:13.00,Default,,0000,0000,0000,,OK. Cool. We're starting batch 123. Let's\Nget some Dialogue: 0,0:25:13.00,0:25:16.93,Default,,0000,0000,0000,,data for it. I fetched two hundred tuples\Nfrom Dialogue: 0,0:25:16.93,0:25:20.06,Default,,0000,0000,0000,,whatever source. Like, I, I popped two hundred\Nmessages Dialogue: 0,0:25:20.06,0:25:22.84,Default,,0000,0000,0000,,off of Redis. Seems good. And this is gonna Dialogue: 0,0:25:22.84,0:25:24.34,Default,,0000,0000,0000,,then go through everything we just saw, send\Nthe Dialogue: 0,0:25:24.34,0:25:26.28,Default,,0000,0000,0000,,messages down. It's gonna go to the hash tag Dialogue: 0,0:25:26.28,0:25:27.89,Default,,0000,0000,0000,,function, which will then go to the aggregate\Nfunction, Dialogue: 0,0:25:27.89,0:25:31.04,Default,,0000,0000,0000,,which then go to states. The the states gonna Dialogue: 0,0:25:31.04,0:25:34.29,Default,,0000,0000,0000,,be, persist the result and send it back to Dialogue: 0,0:25:34.29,0:25:35.08,Default,,0000,0000,0000,,Storm. Dialogue: 0,0:25:35.08,0:25:37.35,Default,,0000,0000,0000,,So since an I can tell, we can tell Dialogue: 0,0:25:37.35,0:25:40.55,Default,,0000,0000,0000,,Storm, we completed batch 123. So what's,\Nso what Dialogue: 0,0:25:40.55,0:25:46.31,Default,,0000,0000,0000,,it means is, after, whenever there's a, a\Npart Dialogue: 0,0:25:46.31,0:25:48.77,Default,,0000,0000,0000,,of the API that appears to be time-based,\Nlike Dialogue: 0,0:25:48.77,0:25:52.17,Default,,0000,0000,0000,,begin commit, commit, and in our case, the\Ncomplete Dialogue: 0,0:25:52.17,0:25:54.51,Default,,0000,0000,0000,,or the aggregate function, that's gonna be\Ncalled after Dialogue: 0,0:25:54.51,0:25:59.38,Default,,0000,0000,0000,,every single batch. So what, when we say aggregate Dialogue: 0,0:25:59.38,0:26:02.60,Default,,0000,0000,0000,,all, like, aggregate all the counts of the\Nhash Dialogue: 0,0:26:02.60,0:26:04.79,Default,,0000,0000,0000,,tags, we're really just doing it for a batch. Dialogue: 0,0:26:04.79,0:26:06.85,Default,,0000,0000,0000,,Oh yeah. And that it's going to start the Dialogue: 0,0:26:06.85,0:26:10.42,Default,,0000,0000,0000,,next batch. This just says start batch 124. Dialogue: 0,0:26:10.42,0:26:16.82,Default,,0000,0000,0000,,So is it web scale yet? Well, this is Dialogue: 0,0:26:16.82,0:26:19.06,Default,,0000,0000,0000,,obviously always the answer you have to ask\Nafter Dialogue: 0,0:26:19.06,0:26:22.71,Default,,0000,0000,0000,,every single thing. Every single hour of work. Dialogue: 0,0:26:22.71,0:26:25.00,Default,,0000,0000,0000,,Well, this is what I've got so far. This Dialogue: 0,0:26:25.00,0:26:26.55,Default,,0000,0000,0000,,is what I told you was happening. We've got Dialogue: 0,0:26:26.55,0:26:29.26,Default,,0000,0000,0000,,a Tweet spout. It's sending data down a stream, Dialogue: 0,0:26:29.26,0:26:32.49,Default,,0000,0000,0000,,a single stream, into, to the extract hash\Ntags. Dialogue: 0,0:26:32.49,0:26:36.33,Default,,0000,0000,0000,,So we have one stream and, well, can a Dialogue: 0,0:26:36.33,0:26:41.29,Default,,0000,0000,0000,,single stream handle fifty thousand Tweets\Nper second? Well, Dialogue: 0,0:26:41.29,0:26:42.58,Default,,0000,0000,0000,,let's break it down for a bit. Dialogue: 0,0:26:42.58,0:26:45.44,Default,,0000,0000,0000,,This is the most basic topology. This is what Dialogue: 0,0:26:45.44,0:26:47.60,Default,,0000,0000,0000,,I said happened. This is what conceptually\Nit happens. Dialogue: 0,0:26:47.60,0:26:53.61,Default,,0000,0000,0000,,But, the way it executes is, a stream actually Dialogue: 0,0:26:53.61,0:26:57.06,Default,,0000,0000,0000,,has many partitions. So the number of partitions\Nis Dialogue: 0,0:26:57.06,0:27:03.27,Default,,0000,0000,0000,,configurable. But every single partition can,\Nis completely independent. Dialogue: 0,0:27:03.27,0:27:06.52,Default,,0000,0000,0000,,So it can run on completely different nodes\Nin Dialogue: 0,0:27:06.52,0:27:11.08,Default,,0000,0000,0000,,your Storm cluster. So if you have eight different Dialogue: 0,0:27:11.08,0:27:14.30,Default,,0000,0000,0000,,partitions, you're going, you have the opportunity\Nof read Dialogue: 0,0:27:14.30,0:27:16.63,Default,,0000,0000,0000,,both from the spout, read from eight different\Nservers Dialogue: 0,0:27:16.63,0:27:20.19,Default,,0000,0000,0000,,sending it, like, transforming and persisting\Non eight different Dialogue: 0,0:27:20.19,0:27:21.93,Default,,0000,0000,0000,,servers. You could have up to, you can have Dialogue: 0,0:27:21.93,0:27:25.25,Default,,0000,0000,0000,,like a thousand different partitions if that's\Nhow far Dialogue: 0,0:27:25.25,0:27:26.57,Default,,0000,0000,0000,,you need to go. Dialogue: 0,0:27:26.57,0:27:29.70,Default,,0000,0000,0000,,But the main point is that you can split Dialogue: 0,0:27:29.70,0:27:32.39,Default,,0000,0000,0000,,up the work load even though you have one Dialogue: 0,0:27:32.39,0:27:35.20,Default,,0000,0000,0000,,conceptual stream, you can split it up across\Nmany Dialogue: 0,0:27:35.20,0:27:40.27,Default,,0000,0000,0000,,partitions and thus servers and threads. So\Nlet's kind Dialogue: 0,0:27:40.27,0:27:43.76,Default,,0000,0000,0000,,of try to work through, again, the hash tag Dialogue: 0,0:27:43.76,0:27:48.33,Default,,0000,0000,0000,,thing through how it might work with partitions.\NWe Dialogue: 0,0:27:48.33,0:27:52.11,Default,,0000,0000,0000,,had the spout. That says RailsConf. And it\Ncomes Dialogue: 0,0:27:52.11,0:27:54.44,Default,,0000,0000,0000,,out of the spout in one partition, and because Dialogue: 0,0:27:54.44,0:27:56.08,Default,,0000,0000,0000,,we haven't said anything yet, it's just gonna\Nsay, Dialogue: 0,0:27:56.08,0:27:58.34,Default,,0000,0000,0000,,oh, we're just gonna send it to the aggregate Dialogue: 0,0:27:58.34,0:28:03.00,Default,,0000,0000,0000,,transform in the same partition. And it's\Ngoing to Dialogue: 0,0:28:03.00,0:28:04.85,Default,,0000,0000,0000,,go to the state as well on the same Dialogue: 0,0:28:04.85,0:28:06.97,Default,,0000,0000,0000,,partition. Dialogue: 0,0:28:06.97,0:28:09.70,Default,,0000,0000,0000,,Here's another bottom left, like, on the left\Nside, Dialogue: 0,0:28:09.70,0:28:11.51,Default,,0000,0000,0000,,another RailsConf tag. It's going to, it's\Nin a Dialogue: 0,0:28:11.51,0:28:14.42,Default,,0000,0000,0000,,completely different partition. So possibly\Na different server. It's Dialogue: 0,0:28:14.42,0:28:16.67,Default,,0000,0000,0000,,going to go the aggregate in a completely\Ndifferent Dialogue: 0,0:28:16.67,0:28:20.83,Default,,0000,0000,0000,,server, and ideally what we'd want is for,\Nat Dialogue: 0,0:28:20.83,0:28:22.70,Default,,0000,0000,0000,,the very end, we can do the moving average Dialogue: 0,0:28:22.70,0:28:25.28,Default,,0000,0000,0000,,calculation on the same server. So ideally\Nit would Dialogue: 0,0:28:25.28,0:28:29.10,Default,,0000,0000,0000,,end up on the same state partition. Dialogue: 0,0:28:29.10,0:28:30.90,Default,,0000,0000,0000,,But if we have another, like this is just Dialogue: 0,0:28:30.90,0:28:33.58,Default,,0000,0000,0000,,hash tag sleep, something I would like to\Ndo, Dialogue: 0,0:28:33.58,0:28:35.76,Default,,0000,0000,0000,,it would stay on its own partition and it Dialogue: 0,0:28:35.76,0:28:37.97,Default,,0000,0000,0000,,would not, like, it doesn't need to be on Dialogue: 0,0:28:37.97,0:28:41.77,Default,,0000,0000,0000,,the, run on the same server that processes\Nthe Dialogue: 0,0:28:41.77,0:28:44.13,Default,,0000,0000,0000,,RailsConf hash tag. Dialogue: 0,0:28:44.13,0:28:45.90,Default,,0000,0000,0000,,So in, but the one thing's like that, one Dialogue: 0,0:28:45.90,0:28:49.36,Default,,0000,0000,0000,,move from the aggregate middle partition up\Nto a Dialogue: 0,0:28:49.36,0:28:53.36,Default,,0000,0000,0000,,combining into one end point state partition,\Nwe haven't Dialogue: 0,0:28:53.36,0:28:55.02,Default,,0000,0000,0000,,defined yet. Dialogue: 0,0:28:55.02,0:28:56.80,Default,,0000,0000,0000,,The way you do that is, at any point, Dialogue: 0,0:28:56.80,0:28:58.82,Default,,0000,0000,0000,,you can be, oh, at this point, I want Dialogue: 0,0:28:58.82,0:29:01.56,Default,,0000,0000,0000,,to partition my tuples by hash tag. And when Dialogue: 0,0:29:01.56,0:29:03.80,Default,,0000,0000,0000,,we say that, what we're saying is we're specifying Dialogue: 0,0:29:03.80,0:29:06.27,Default,,0000,0000,0000,,a field on the tuple, and we're saying, hey Dialogue: 0,0:29:06.27,0:29:09.48,Default,,0000,0000,0000,,storm, we would like, once we cross this point, Dialogue: 0,0:29:09.48,0:29:11.48,Default,,0000,0000,0000,,we would like any tuple that has the same Dialogue: 0,0:29:11.48,0:29:13.93,Default,,0000,0000,0000,,hash tag value to end up in the same Dialogue: 0,0:29:13.93,0:29:16.78,Default,,0000,0000,0000,,partition. So this, at this point, like, up\Nto Dialogue: 0,0:29:16.78,0:29:19.49,Default,,0000,0000,0000,,now, storm didn't, could actually run the\Nentire topology Dialogue: 0,0:29:19.49,0:29:21.21,Default,,0000,0000,0000,,and same partition on one server. Dialogue: 0,0:29:21.21,0:29:24.59,Default,,0000,0000,0000,,There was no need to, like, it might, I Dialogue: 0,0:29:24.59,0:29:26.28,Default,,0000,0000,0000,,mean you could specify, make it do that, but Dialogue: 0,0:29:26.28,0:29:28.93,Default,,0000,0000,0000,,there is actually no need to process, to have Dialogue: 0,0:29:28.93,0:29:31.05,Default,,0000,0000,0000,,a Tweet hit the network, basically. To go\Nto Dialogue: 0,0:29:31.05,0:29:33.31,Default,,0000,0000,0000,,a different node. The moment you add a partition, Dialogue: 0,0:29:33.31,0:29:36.00,Default,,0000,0000,0000,,like statement, you're saying, OK, yes, at\Nthis point, Dialogue: 0,0:29:36.00,0:29:37.39,Default,,0000,0000,0000,,we're going to have to make sure that all Dialogue: 0,0:29:37.39,0:29:39.38,Default,,0000,0000,0000,,hash tags on the same value end up on Dialogue: 0,0:29:39.38,0:29:41.92,Default,,0000,0000,0000,,the same server. So it's going to, at this Dialogue: 0,0:29:41.92,0:29:43.86,Default,,0000,0000,0000,,point, try and, like, make sure to do that, Dialogue: 0,0:29:43.86,0:29:47.70,Default,,0000,0000,0000,,and thus may hit the network for it. Dialogue: 0,0:29:47.70,0:29:49.93,Default,,0000,0000,0000,,So how many partitions should we run? Well,\Nthis Dialogue: 0,0:29:49.93,0:29:53.30,Default,,0000,0000,0000,,is very, very use-case specific. You could\Nrun eight. Dialogue: 0,0:29:53.30,0:29:57.84,Default,,0000,0000,0000,,You could run five-hundred twelve. You, it,\Nit actually Dialogue: 0,0:29:57.84,0:30:00.94,Default,,0000,0000,0000,,depends on your use case. But in order to Dialogue: 0,0:30:00.94,0:30:02.96,Default,,0000,0000,0000,,answer this better, we need to talk a bit Dialogue: 0,0:30:02.96,0:30:09.72,Default,,0000,0000,0000,,about how Storm schedules partitions on the\Ncluster. So Dialogue: 0,0:30:09.72,0:30:11.96,Default,,0000,0000,0000,,first of all, there are many servers. And\Nyou Dialogue: 0,0:30:11.96,0:30:15.17,Default,,0000,0000,0000,,have a cluster, you have many servers distributed,\Nyay. Dialogue: 0,0:30:15.17,0:30:17.69,Default,,0000,0000,0000,,So let's say, I don't know, three servers,\Nand Dialogue: 0,0:30:17.69,0:30:21.79,Default,,0000,0000,0000,,each server could have many threads per server,\Nbecause Dialogue: 0,0:30:21.79,0:30:24.54,Default,,0000,0000,0000,,it's very, very concurrent. Dialogue: 0,0:30:24.54,0:30:27.64,Default,,0000,0000,0000,,So you might have, I don't know, thirty, forty, Dialogue: 0,0:30:27.64,0:30:31.33,Default,,0000,0000,0000,,you, configurable amount of servers, threads\Nper the server. Dialogue: 0,0:30:31.33,0:30:34.24,Default,,0000,0000,0000,,And there can be many partitions per thread,\Nas Dialogue: 0,0:30:34.24,0:30:37.11,Default,,0000,0000,0000,,well. So every, a thread in Storm is considered Dialogue: 0,0:30:37.11,0:30:41.36,Default,,0000,0000,0000,,an executer. And an executer can have an arbitrary Dialogue: 0,0:30:41.36,0:30:43.31,Default,,0000,0000,0000,,number of partitions. So when you start up\Nyour Dialogue: 0,0:30:43.31,0:30:45.05,Default,,0000,0000,0000,,cluster, like let's say you've got, you want\Nto Dialogue: 0,0:30:45.05,0:30:47.57,Default,,0000,0000,0000,,boot up a Storm cluster that is three nodes, Dialogue: 0,0:30:47.57,0:30:51.18,Default,,0000,0000,0000,,you have nine partitions, three threads. So\Nit works Dialogue: 0,0:30:51.18,0:30:53.41,Default,,0000,0000,0000,,out pretty nice. This is how it might, like, Dialogue: 0,0:30:53.41,0:30:54.41,Default,,0000,0000,0000,,start initially. Dialogue: 0,0:30:54.41,0:30:59.40,Default,,0000,0000,0000,,So you boot up and you have, like, anyway. Dialogue: 0,0:30:59.40,0:31:01.76,Default,,0000,0000,0000,,So it's going to assign three partitions per\Nthread, Dialogue: 0,0:31:01.76,0:31:06.38,Default,,0000,0000,0000,,per server. So that, the nice thing of dealing Dialogue: 0,0:31:06.38,0:31:10.10,Default,,0000,0000,0000,,with partitions, dealing with partitions is\Nactually a, something Dialogue: 0,0:31:10.10,0:31:13.79,Default,,0000,0000,0000,,that basically everything that ends up being\Npretty distributed Dialogue: 0,0:31:13.79,0:31:16.70,Default,,0000,0000,0000,,does. Like, Reaq, Cassandra, Kafka, all these\Nthings go Dialogue: 0,0:31:16.70,0:31:20.37,Default,,0000,0000,0000,,via partitions, and, because it's very nice.\NBut little Dialogue: 0,0:31:20.37,0:31:21.11,Default,,0000,0000,0000,,tangent. Dialogue: 0,0:31:21.11,0:31:24.13,Default,,0000,0000,0000,,So, what happens is, when you lose a server, Dialogue: 0,0:31:24.13,0:31:27.11,Default,,0000,0000,0000,,oh. We don't have these executers anymore.\NWe have Dialogue: 0,0:31:27.11,0:31:30.96,Default,,0000,0000,0000,,all these partitions, they need to be handled\Nstill. Dialogue: 0,0:31:30.96,0:31:33.59,Default,,0000,0000,0000,,Storm is going to redistribute them across\Nexisting servers. Dialogue: 0,0:31:33.59,0:31:35.22,Default,,0000,0000,0000,,So it's gonna be like, OK, all these, all Dialogue: 0,0:31:35.22,0:31:38.97,Default,,0000,0000,0000,,these partitions which represent work, work\Nload, are gonna Dialogue: 0,0:31:38.97,0:31:42.13,Default,,0000,0000,0000,,be moved to the other servers. Dialogue: 0,0:31:42.13,0:31:45.54,Default,,0000,0000,0000,,And, inversely, that's a little, on the top\Nis Dialogue: 0,0:31:45.54,0:31:47.38,Default,,0000,0000,0000,,a little server box, and I'm saying, ah, we're Dialogue: 0,0:31:47.38,0:31:49.70,Default,,0000,0000,0000,,adding a server to our cluster. And then we're Dialogue: 0,0:31:49.70,0:31:53.15,Default,,0000,0000,0000,,gonna rebalance it, and Storm is going to\Ntake Dialogue: 0,0:31:53.15,0:31:57.06,Default,,0000,0000,0000,,the existing partitions and redistribute,\Nredistribute it across the Dialogue: 0,0:31:57.06,0:31:59.27,Default,,0000,0000,0000,,available servers. Dialogue: 0,0:31:59.27,0:32:02.55,Default,,0000,0000,0000,,Probably more evenly than my little graph\Nsays, talks Dialogue: 0,0:32:02.55,0:32:02.96,Default,,0000,0000,0000,,ab- shows. OK. Dialogue: 0,0:32:02.96,0:32:09.96,Default,,0000,0000,0000,,It's \Ntime for some real talk. Failure. Dialogue: 0,0:32:13.17,0:32:17.97,Default,,0000,0000,0000,,So, the question is not will there be failure. Dialogue: 0,0:32:17.97,0:32:21.64,Default,,0000,0000,0000,,The question is, how do we handle it? Cause, Dialogue: 0,0:32:21.64,0:32:23.91,Default,,0000,0000,0000,,if you're building a distributed system, at\Nsome point Dialogue: 0,0:32:23.91,0:32:25.23,Default,,0000,0000,0000,,there is going to be fail, there is going Dialogue: 0,0:32:25.23,0:32:29.63,Default,,0000,0000,0000,,to be something that fails. And you can either Dialogue: 0,0:32:29.63,0:32:31.68,Default,,0000,0000,0000,,close your eyes and pretend it doesn't happen\Nor Dialogue: 0,0:32:31.68,0:32:34.15,Default,,0000,0000,0000,,you can think through the problems, like how\Nwill Dialogue: 0,0:32:34.15,0:32:36.01,Default,,0000,0000,0000,,we recover? Is the system going to be in Dialogue: 0,0:32:36.01,0:32:39.14,Default,,0000,0000,0000,,an inconsistent state afterwards? Is, are\Nwe gonna lose Dialogue: 0,0:32:39.14,0:32:41.69,Default,,0000,0000,0000,,availability during this time? Dialogue: 0,0:32:41.69,0:32:44.84,Default,,0000,0000,0000,,Handling failure properly is probably the\Nhardest part of Dialogue: 0,0:32:44.84,0:32:49.87,Default,,0000,0000,0000,,building any distributed system. But it's\Nnot like you Dialogue: 0,0:32:49.87,0:32:52.04,Default,,0000,0000,0000,,have to use really fancy technologies to build\Na Dialogue: 0,0:32:52.04,0:32:54.57,Default,,0000,0000,0000,,distributed system. Like if you build a plain-old\NRails Dialogue: 0,0:32:54.57,0:32:57.46,Default,,0000,0000,0000,,app to build a distributed system, you have\Na, Dialogue: 0,0:32:57.46,0:32:59.03,Default,,0000,0000,0000,,the browser talks to the Rails app. And all Dialogue: 0,0:32:59.03,0:33:01.05,Default,,0000,0000,0000,,these problems that exist, maybe we don't\Nthink about Dialogue: 0,0:33:01.05,0:33:03.56,Default,,0000,0000,0000,,it as much because we're OK with higher error Dialogue: 0,0:33:03.56,0:33:07.42,Default,,0000,0000,0000,,rates and inconsistencies. But, let's think,\Nlike, even a Dialogue: 0,0:33:07.42,0:33:10.64,Default,,0000,0000,0000,,simple case like a signup form, like the user Dialogue: 0,0:33:10.64,0:33:15.60,Default,,0000,0000,0000,,fills up the signup form and hits submit and Dialogue: 0,0:33:15.60,0:33:18.90,Default,,0000,0000,0000,,it errors out, what happened, right? Did the,\Ndid Dialogue: 0,0:33:18.90,0:33:21.40,Default,,0000,0000,0000,,it fail reaching our server? Did it reach\Nour Dialogue: 0,0:33:21.40,0:33:24.98,Default,,0000,0000,0000,,server and we started processing it and, but\Nwe Dialogue: 0,0:33:24.98,0:33:28.59,Default,,0000,0000,0000,,had an exception? Did we, like, actually create\Nthe Dialogue: 0,0:33:28.59,0:33:31.05,Default,,0000,0000,0000,,account in the database and the next time\Nthe Dialogue: 0,0:33:31.05,0:33:32.87,Default,,0000,0000,0000,,user is going to refresh the page it'll add, Dialogue: 0,0:33:32.87,0:33:34.02,Default,,0000,0000,0000,,like, I'd like, I'd like to really signup.\NI'm Dialogue: 0,0:33:34.02,0:33:36.06,Default,,0000,0000,0000,,gonna try to signup again. Dialogue: 0,0:33:36.06,0:33:38.59,Default,,0000,0000,0000,,And it's gonna be, oh no, you can't signup. Dialogue: 0,0:33:38.59,0:33:40.91,Default,,0000,0000,0000,,Your email is taken. Well, OK. How do we Dialogue: 0,0:33:40.91,0:33:44.39,Default,,0000,0000,0000,,handle all this? And there are ways. I'm just Dialogue: 0,0:33:44.39,0:33:47.35,Default,,0000,0000,0000,,kind of bringing up this simple case, which\NI'm Dialogue: 0,0:33:47.35,0:33:51.28,Default,,0000,0000,0000,,sure everybody here has properly protected\Ntheir Rails app Dialogue: 0,0:33:51.28,0:33:54.16,Default,,0000,0000,0000,,against, because nobody in this room has ever\Nhad Dialogue: 0,0:33:54.16,0:33:56.00,Default,,0000,0000,0000,,this bug happen. Dialogue: 0,0:33:56.00,0:33:58.52,Default,,0000,0000,0000,,But, I bring it up because this is a Dialogue: 0,0:33:58.52,0:34:01.81,Default,,0000,0000,0000,,relatively simple case, and it really just\Ngets more Dialogue: 0,0:34:01.81,0:34:05.22,Default,,0000,0000,0000,,complicated from here. So, for example, in\NStorm, the Dialogue: 0,0:34:05.22,0:34:07.84,Default,,0000,0000,0000,,whole, like, it's fine. The whole point is\Nthat, Dialogue: 0,0:34:07.84,0:34:09.75,Default,,0000,0000,0000,,you may have one input tuple. Like, one message Dialogue: 0,0:34:09.75,0:34:13.05,Default,,0000,0000,0000,,might be popped up from the queue, and as Dialogue: 0,0:34:13.05,0:34:15.41,Default,,0000,0000,0000,,you process it, you're going to end up generating Dialogue: 0,0:34:15.41,0:34:17.90,Default,,0000,0000,0000,,more tuples. Like, for example, we are going\Nto, Dialogue: 0,0:34:17.90,0:34:20.34,Default,,0000,0000,0000,,from one Tweet, we are going to extract many Dialogue: 0,0:34:20.34,0:34:23.52,Default,,0000,0000,0000,,hash tag messages. And if you, as you build Dialogue: 0,0:34:23.52,0:34:26.42,Default,,0000,0000,0000,,more and more complex topologies, this is\Njust going Dialogue: 0,0:34:26.42,0:34:29.83,Default,,0000,0000,0000,,to like, this can, like, fan out. Dialogue: 0,0:34:29.83,0:34:34.34,Default,,0000,0000,0000,,So what happens if, like, this one dies? So Dialogue: 0,0:34:34.34,0:34:38.18,Default,,0000,0000,0000,,everything else, like, succeeded, but that\None, and all Dialogue: 0,0:34:38.18,0:34:43.37,Default,,0000,0000,0000,,of its children, didn't. How do we, then,\Nwhat Dialogue: 0,0:34:43.37,0:34:47.25,Default,,0000,0000,0000,,do we do? Do we try to retry just Dialogue: 0,0:34:47.25,0:34:48.95,Default,,0000,0000,0000,,that one tuple? If we do that, we then Dialogue: 0,0:34:48.95,0:34:52.71,Default,,0000,0000,0000,,have to, well, we have to make, track, that Dialogue: 0,0:34:52.71,0:34:55.09,Default,,0000,0000,0000,,is that specific one that failed, that's gonna\Ninvolve Dialogue: 0,0:34:55.09,0:34:59.96,Default,,0000,0000,0000,,a lot of bookkeeping. The, and what happens\Nif Dialogue: 0,0:34:59.96,0:35:02.37,Default,,0000,0000,0000,,tuples have multiple parents? And one parent\Nended up Dialogue: 0,0:35:02.37,0:35:04.11,Default,,0000,0000,0000,,succeeding but not the other? Dialogue: 0,0:35:04.11,0:35:07.08,Default,,0000,0000,0000,,And that's possible, with Storm. You can have\Ntuples Dialogue: 0,0:35:07.08,0:35:09.98,Default,,0000,0000,0000,,that have multiple parents because you can\Nhave joins. Dialogue: 0,0:35:09.98,0:35:12.06,Default,,0000,0000,0000,,But how does one process it? Do you retry Dialogue: 0,0:35:12.06,0:35:17.51,Default,,0000,0000,0000,,the entire batch? It's hard. Dialogue: 0,0:35:17.51,0:35:20.01,Default,,0000,0000,0000,,I will get to how you handle it. But, Dialogue: 0,0:35:20.01,0:35:22.78,Default,,0000,0000,0000,,and the details of how Storm does it internally Dialogue: 0,0:35:22.78,0:35:26.21,Default,,0000,0000,0000,,are really interesting, and I would recommend\Nyou go Dialogue: 0,0:35:26.21,0:35:28.17,Default,,0000,0000,0000,,and read the Wiki page on it, but I Dialogue: 0,0:35:28.17,0:35:31.44,Default,,0000,0000,0000,,don't have time to talk about it. So, going Dialogue: 0,0:35:31.44,0:35:33.70,Default,,0000,0000,0000,,back to the original Sidekiq thing, what happens\Nif Dialogue: 0,0:35:33.70,0:35:36.64,Default,,0000,0000,0000,,we're, like, halfway through the iteration\Nand we've persisted Dialogue: 0,0:35:36.64,0:35:39.77,Default,,0000,0000,0000,,some of them and, we, it just explodes halfway Dialogue: 0,0:35:39.77,0:35:43.01,Default,,0000,0000,0000,,through? Well, do we retry the message? Do\Nwe, Dialogue: 0,0:35:43.01,0:35:47.38,Default,,0000,0000,0000,,I don't know. What do? What now? Dialogue: 0,0:35:47.38,0:35:49.66,Default,,0000,0000,0000,,Generally, we have to think about, well, when\Nwe're Dialogue: 0,0:35:49.66,0:35:52.24,Default,,0000,0000,0000,,using a system like Sidekiq or Storm, what\Nare Dialogue: 0,0:35:52.24,0:35:55.02,Default,,0000,0000,0000,,the message processing guarantees? Because\Nit's going to change Dialogue: 0,0:35:55.02,0:35:58.56,Default,,0000,0000,0000,,how you work with it. Generally, the two,\Nthe Dialogue: 0,0:35:58.56,0:36:02.52,Default,,0000,0000,0000,,two main ones are, are messages processed\Nat least Dialogue: 0,0:36:02.52,0:36:05.20,Default,,0000,0000,0000,,once, or are messages processed at most once?\NAnd Dialogue: 0,0:36:05.20,0:36:07.16,Default,,0000,0000,0000,,you can only get one, not the other. Like Dialogue: 0,0:36:07.16,0:36:09.81,Default,,0000,0000,0000,,you can, I mean, yes, you can get exactly Dialogue: 0,0:36:09.81,0:36:11.04,Default,,0000,0000,0000,,one, see what I mean. If anyone says, ah, Dialogue: 0,0:36:11.04,0:36:13.32,Default,,0000,0000,0000,,we guarantee that a message is going to be Dialogue: 0,0:36:13.32,0:36:16.41,Default,,0000,0000,0000,,processed exactly once, they're lying. Dialogue: 0,0:36:16.41,0:36:17.55,Default,,0000,0000,0000,,It's impossible. Dialogue: 0,0:36:17.55,0:36:19.95,Default,,0000,0000,0000,,So to que, what we can try to do Dialogue: 0,0:36:19.95,0:36:22.75,Default,,0000,0000,0000,,is, well, Storm is at least once. But what Dialogue: 0,0:36:22.75,0:36:25.22,Default,,0000,0000,0000,,we can try to do is handle the case, Dialogue: 0,0:36:25.22,0:36:27.53,Default,,0000,0000,0000,,is like, can we handle, if we get messages Dialogue: 0,0:36:27.53,0:36:30.31,Default,,0000,0000,0000,,that show up multiple times, can we handle\Nthat? Dialogue: 0,0:36:30.31,0:36:32.67,Default,,0000,0000,0000,,Can we maybe not redo the processing? Can\Nwe, Dialogue: 0,0:36:32.67,0:36:36.25,Default,,0000,0000,0000,,like, can we try to ensure that we end Dialogue: 0,0:36:36.25,0:36:38.57,Default,,0000,0000,0000,,up with what, as close to as possible, as Dialogue: 0,0:36:38.57,0:36:42.25,Default,,0000,0000,0000,,close as possible to processing exactly once? Dialogue: 0,0:36:42.25,0:36:45.76,Default,,0000,0000,0000,,And the way Storm tries to approach this is Dialogue: 0,0:36:45.76,0:36:49.07,Default,,0000,0000,0000,,really pretty nice. Like I said, we have batches. Dialogue: 0,0:36:49.07,0:36:51.19,Default,,0000,0000,0000,,Everything's processed in batches. And what\NStorm will guarantee Dialogue: 0,0:36:51.19,0:36:54.15,Default,,0000,0000,0000,,is that a batch, like the next batch will Dialogue: 0,0:36:54.15,0:36:57.57,Default,,0000,0000,0000,,not complete, like, will not go, reach completion\Nuntil Dialogue: 0,0:36:57.57,0:36:59.89,Default,,0000,0000,0000,,the previous one is fully committed. And it\Nwill Dialogue: 0,0:36:59.89,0:37:03.54,Default,,0000,0000,0000,,also give monotonically increasing batch ids. Dialogue: 0,0:37:03.54,0:37:05.42,Default,,0000,0000,0000,,So it will guarantee that the next batch id Dialogue: 0,0:37:05.42,0:37:08.36,Default,,0000,0000,0000,,will have a bigger id than the previous one. Dialogue: 0,0:37:08.36,0:37:11.66,Default,,0000,0000,0000,,So we can combine this, and I will just Dialogue: 0,0:37:11.66,0:37:13.88,Default,,0000,0000,0000,,Tweet my slides and it will be easier to Dialogue: 0,0:37:13.88,0:37:16.56,Default,,0000,0000,0000,,read, but the main this is, I'm now introducing, Dialogue: 0,0:37:16.56,0:37:18.92,Default,,0000,0000,0000,,in the beginning, commit and running out of\Ntime Dialogue: 0,0:37:18.92,0:37:20.15,Default,,0000,0000,0000,,so I'm gonna try to get through this. Dialogue: 0,0:37:20.15,0:37:22.96,Default,,0000,0000,0000,,But the main thing, if we, in the beginning Dialogue: 0,0:37:22.96,0:37:24.44,Default,,0000,0000,0000,,commit, we get the transaction id, what we\Ncan Dialogue: 0,0:37:24.44,0:37:28.04,Default,,0000,0000,0000,,do is store it off, and what, and load Dialogue: 0,0:37:28.04,0:37:30.30,Default,,0000,0000,0000,,up the state and we will talk about that Dialogue: 0,0:37:30.30,0:37:31.80,Default,,0000,0000,0000,,later. But the important thing I want to reach Dialogue: 0,0:37:31.80,0:37:34.71,Default,,0000,0000,0000,,is, talk about it is here, in the update Dialogue: 0,0:37:34.71,0:37:36.84,Default,,0000,0000,0000,,function, we don't actually write to the database\Nanymore. Dialogue: 0,0:37:36.84,0:37:39.46,Default,,0000,0000,0000,,We just store that in memory. And the main Dialogue: 0,0:37:39.46,0:37:42.93,Default,,0000,0000,0000,,line is return if hash tag last transaction\Nid Dialogue: 0,0:37:42.93,0:37:45.04,Default,,0000,0000,0000,,equals transaction id. What that means is\Nwe are Dialogue: 0,0:37:45.04,0:37:46.68,Default,,0000,0000,0000,,able to then see, OK, since we know the Dialogue: 0,0:37:46.68,0:37:49.67,Default,,0000,0000,0000,,guarantees, if for some reason a message failed\Nand Dialogue: 0,0:37:49.67,0:37:52.89,Default,,0000,0000,0000,,we reprocess the batch, we, and we know, we Dialogue: 0,0:37:52.89,0:37:55.39,Default,,0000,0000,0000,,can know that this hash tag actually passed\Na Dialogue: 0,0:37:55.39,0:37:57.51,Default,,0000,0000,0000,,certain point, like passed that point, so\Nwe don't Dialogue: 0,0:37:57.51,0:38:00.96,Default,,0000,0000,0000,,need to recompute the moving average on it. Dialogue: 0,0:38:00.96,0:38:05.40,Default,,0000,0000,0000,,And get as close as possible to exactly once. Dialogue: 0,0:38:05.40,0:38:09.04,Default,,0000,0000,0000,,So then, we store all this memory, and once Dialogue: 0,0:38:09.04,0:38:10.66,Default,,0000,0000,0000,,the commit happens, which happens at the end\Nof Dialogue: 0,0:38:10.66,0:38:12.42,Default,,0000,0000,0000,,the batch, we write all of this, we do Dialogue: 0,0:38:12.42,0:38:14.84,Default,,0000,0000,0000,,basically, we do all the work that hits the Dialogue: 0,0:38:14.84,0:38:16.56,Default,,0000,0000,0000,,outside world, which is right to the database.\NWe Dialogue: 0,0:38:16.56,0:38:22.27,Default,,0000,0000,0000,,do all that there. And if we reach completion, Dialogue: 0,0:38:22.27,0:38:24.63,Default,,0000,0000,0000,,the batch will complete successfully, and\Nwe know, OK, Dialogue: 0,0:38:24.63,0:38:28.73,Default,,0000,0000,0000,,all the messages have been successfully processed.\NBut if Dialogue: 0,0:38:28.73,0:38:31.03,Default,,0000,0000,0000,,it fails during the persist, half of the them Dialogue: 0,0:38:31.03,0:38:33.76,Default,,0000,0000,0000,,will have the new transaction id and half\Nof Dialogue: 0,0:38:33.76,0:38:35.49,Default,,0000,0000,0000,,them will have the old transaction id, but\Nwhen Dialogue: 0,0:38:35.49,0:38:37.44,Default,,0000,0000,0000,,the batch gets replayed and we reload that,\Nwe Dialogue: 0,0:38:37.44,0:38:40.99,Default,,0000,0000,0000,,will see, ah, this message already finished\Nthis batch. Dialogue: 0,0:38:40.99,0:38:42.69,Default,,0000,0000,0000,,So to roll through this real quick, I know Dialogue: 0,0:38:42.69,0:38:44.77,Default,,0000,0000,0000,,it's, I believe I'm getting really close or\Npast Dialogue: 0,0:38:44.77,0:38:48.41,Default,,0000,0000,0000,,my time. But, so, yeah, let's get data. What Dialogue: 0,0:38:48.41,0:38:51.66,Default,,0000,0000,0000,,happens? Like, I fetch two hundred tuples.\NBoom. We Dialogue: 0,0:38:51.66,0:38:54.40,Default,,0000,0000,0000,,go through. Aggregate. And it explodes. Dialogue: 0,0:38:54.40,0:38:59.34,Default,,0000,0000,0000,,Eventually, the Storm coordinator's gonna\Nbe like, up above, Dialogue: 0,0:38:59.34,0:39:02.16,Default,,0000,0000,0000,,it says, guys, I did not get a completion Dialogue: 0,0:39:02.16,0:39:05.39,Default,,0000,0000,0000,,message. All right, fine. Spout, let, we're\Ngoing to Dialogue: 0,0:39:05.39,0:39:08.68,Default,,0000,0000,0000,,restart batch 123, because I did not get completion. Dialogue: 0,0:39:08.68,0:39:11.44,Default,,0000,0000,0000,,So the Tweet spout's gonna be like, OK, I Dialogue: 0,0:39:11.44,0:39:14.12,Default,,0000,0000,0000,,got this. I'm gonna re-emit exactly the same\Ntwo Dialogue: 0,0:39:14.12,0:39:15.97,Default,,0000,0000,0000,,hundred tuples in the same order that I did Dialogue: 0,0:39:15.97,0:39:16.98,Default,,0000,0000,0000,,before. Dialogue: 0,0:39:16.98,0:39:19.64,Default,,0000,0000,0000,,So what that means is if we successfully extract, Dialogue: 0,0:39:19.64,0:39:23.41,Default,,0000,0000,0000,,like, if we successfully implemented our transforms\Nto be Dialogue: 0,0:39:23.41,0:39:27.89,Default,,0000,0000,0000,,purely functional, as in completely deterministic\Nbased off the Dialogue: 0,0:39:27.89,0:39:30.08,Default,,0000,0000,0000,,input, as we go through, we're gonna get exactly Dialogue: 0,0:39:30.08,0:39:32.86,Default,,0000,0000,0000,,the same outputs. Such that, once we get to Dialogue: 0,0:39:32.86,0:39:36.04,Default,,0000,0000,0000,,the state, I'd, in theory everything's going\Nto be Dialogue: 0,0:39:36.04,0:39:38.55,Default,,0000,0000,0000,,exactly the same, such that when we actually\Nrun Dialogue: 0,0:39:38.55,0:39:41.67,Default,,0000,0000,0000,,the persist, we know that if something we\Nsuccessfully Dialogue: 0,0:39:41.67,0:39:43.74,Default,,0000,0000,0000,,persisted, and we get the same transaction\Nid, we're Dialogue: 0,0:39:43.74,0:39:45.12,Default,,0000,0000,0000,,gonna be like, well, it's going to be exactly Dialogue: 0,0:39:45.12,0:39:46.91,Default,,0000,0000,0000,,the same output. Dialogue: 0,0:39:46.91,0:39:51.75,Default,,0000,0000,0000,,All right. So the main catch is, of course, Dialogue: 0,0:39:51.75,0:39:55.35,Default,,0000,0000,0000,,transforms must be purely functional, cause\Notherwise this won't Dialogue: 0,0:39:55.35,0:39:59.30,Default,,0000,0000,0000,,work. That means not ever using time dot now Dialogue: 0,0:39:59.30,0:40:02.92,Default,,0000,0000,0000,,within your transforms. I actually changed\Nit on the Dialogue: 0,0:40:02.92,0:40:05.03,Default,,0000,0000,0000,,last slide, but I had to, like, zoom through Dialogue: 0,0:40:05.03,0:40:07.19,Default,,0000,0000,0000,,it. The main thing is, again, one, if you Dialogue: 0,0:40:07.19,0:40:09.18,Default,,0000,0000,0000,,have need to get the current time for a Dialogue: 0,0:40:09.18,0:40:10.94,Default,,0000,0000,0000,,batch, one way to do it is to have Dialogue: 0,0:40:10.94,0:40:13.38,Default,,0000,0000,0000,,a spout that only emits the current time.\NAnd Dialogue: 0,0:40:13.38,0:40:15.19,Default,,0000,0000,0000,,every, if the batch gets re-emitted, it just\Nknows, Dialogue: 0,0:40:15.19,0:40:17.92,Default,,0000,0000,0000,,it saves off somewhere. For this batch id,\NI Dialogue: 0,0:40:17.92,0:40:19.75,Default,,0000,0000,0000,,outputted this time. Dialogue: 0,0:40:19.75,0:40:22.36,Default,,0000,0000,0000,,And the next thing is that, it's going to Dialogue: 0,0:40:22.36,0:40:27.05,Default,,0000,0000,0000,,require the spouts to re-emit exactly identical\Nbatches. Which, Dialogue: 0,0:40:27.05,0:40:32.86,Default,,0000,0000,0000,,unfortunately, I think most spouts do not.\NI mean, Dialogue: 0,0:40:32.86,0:40:36.40,Default,,0000,0000,0000,,most, like Redis, does not support. So in\Norder Dialogue: 0,0:40:36.40,0:40:40.48,Default,,0000,0000,0000,,to get this level of guarantees, the only\Nqueue Dialogue: 0,0:40:40.48,0:40:43.24,Default,,0000,0000,0000,,is Kafka. And I was, would like to talk Dialogue: 0,0:40:43.24,0:40:45.67,Default,,0000,0000,0000,,more about Kafka and why it's so awesome,\Nbut Dialogue: 0,0:40:45.67,0:40:47.84,Default,,0000,0000,0000,,we can talk about that after. In general,\Neven Dialogue: 0,0:40:47.84,0:40:49.62,Default,,0000,0000,0000,,if you're not using Storm, Kafka is a really Dialogue: 0,0:40:49.62,0:40:52.63,Default,,0000,0000,0000,,amazing tool that I highly recommend. Dialogue: 0,0:40:52.63,0:40:57.12,Default,,0000,0000,0000,,So, TL;DR. Storm is a really powerful platform\Nfor Dialogue: 0,0:40:57.12,0:41:00.87,Default,,0000,0000,0000,,writing workers. It's really great for stateful\Njobs, and Dialogue: 0,0:41:00.87,0:41:03.76,Default,,0000,0000,0000,,by stateful jobs, I mean jobs that depend\Non Dialogue: 0,0:41:03.76,0:41:07.71,Default,,0000,0000,0000,,the result of previous ones. It's good for\Ncomplex Dialogue: 0,0:41:07.71,0:41:11.82,Default,,0000,0000,0000,,data processing flows. And that's it. I think.\NI Dialogue: 0,0:41:11.82,0:41:13.58,Default,,0000,0000,0000,,don't. My time wasn't up here, so I'm guessing Dialogue: 0,0:41:13.58,0:41:14.64,Default,,0000,0000,0000,,this is about right. Dialogue: 0,0:41:14.64,0:41:19.86,Default,,0000,0000,0000,,Am I really early or late? OK. Cool. Dialogue: 0,0:41:19.86,0:41:22.03,Default,,0000,0000,0000,,I didn't have a clock on either so I'm Dialogue: 0,0:41:22.03,0:41:23.25,Default,,0000,0000,0000,,like, guessing. It didn't start, I don't know.\NAll Dialogue: 0,0:41:23.25,0:41:25.89,Default,,0000,0000,0000,,right. Well, I'm done. Thanks. All right.