0:00:16.830,0:00:17.590 TOBY HEDE: Good morning everybody. 0:00:17.590,0:00:24.369 Friday. Yes. It's been a long week. I'm excited. 0:00:24.369,0:00:29.279 I'm highly caffeinated. So without further[br]ado, 0:00:29.279,0:00:34.180 I present An Ode to 17 Databases in 33 Minutes. 0:00:34.180,0:00:37.870 I'm gonna mangle a large number of metaphors. 0:00:37.870,0:00:40.820 There'll be a lot of animated gifs. 0:00:40.820,0:00:44.210 I've learned that this week, if you see it[br]like that, 0:00:44.210,0:00:47.910 there's Star Wars, Dungeons and Dragons, 0:00:47.910,0:00:49.350 and all of that's very, unfortunately, stereotypical. 0:00:49.350,0:00:51.900 So a bit of an indictment. 0:00:51.900,0:00:55.839 This whole thing started as a joke. Seventeen[br]databases. 0:00:55.839,0:00:59.159 I actually did in five minutes. Thirty-three[br]minutes is 0:00:59.159,0:01:03.799 worse. The whole thing is just a catastrophe,[br]really. 0:01:03.799,0:01:04.659 But anyway. 0:01:04.659,0:01:07.610 We're gonna cover a whole bunch of different[br]databases 0:01:07.610,0:01:09.890 and a little bit of the underlying theory,[br]and 0:01:09.890,0:01:12.670 hopefully you'll walk out and you'll understand[br]why to 0:01:12.670,0:01:13.729 use PostGres. 0:01:13.729,0:01:14.250 [laughter] 0:01:14.250,0:01:20.180 I'm Toby. You can find me on the internet. 0:01:20.180,0:01:22.260 I work at a company called Nine Fold. 0:01:22.260,0:01:26.100 V.O.: We're having a problem, there's no screen. 0:01:26.100,0:01:33.100 T.H.: Oh. No screens. Is that me? 0:01:35.960,0:01:41.210 Before it was, there was no red. So, now 0:01:41.210,0:01:43.860 there's no any, anything. 0:01:43.860,0:01:45.180 V.O.: Nothing. 0:01:45.180,0:01:46.500 T.H.: Hey. 0:01:46.500,0:01:47.820 AUDIENCE: Hey! 0:01:47.820,0:01:51.120 T.H.: I have no slides. 0:01:51.120,0:01:54.670 Well, you missed my beautiful slides. There's.[br]You missed 0:01:54.670,0:01:57.740 the first animation. That's a shame. 
You missed[br]the 0:01:57.740,0:02:01.500 list. It's awesome. You missed me and my excellent 0:02:01.500,0:02:05.070 job titles. So yes. 0:02:05.070,0:02:08.340 I work at Nine Fold. They have very kindly 0:02:08.340,0:02:12.550 flown me over here from Australia, which explains[br]why 0:02:12.550,0:02:16.690 I sound like I come from the deep south. 0:02:16.690,0:02:18.060 Cause I do. 0:02:18.060,0:02:21.260 Most of this week, this has been me. So 0:02:21.260,0:02:23.560 today I'm finally over the jetlag just in[br]time 0:02:23.560,0:02:26.670 to go home and have it all over again 0:02:26.670,0:02:27.850 next week. 0:02:27.850,0:02:32.450 So, a couple of quick facts about Straya.[br]There 0:02:32.450,0:02:39.120 are much fewer syllables than you're used[br]to using. 0:02:39.120,0:02:43.950 This is an, a genuine Australian politician.[br]He's a 0:02:43.950,0:02:48.060 mining magnate billionaire and he is currently[br]running a 0:02:48.060,0:02:52.880 MVP Jurassic theme park with giant fiberglass[br]dinosaurs. And 0:02:52.880,0:02:56.310 I, I for one am for it. So I 0:02:56.310,0:02:58.510 realize there wasn't enough Star Wars references[br]so this 0:02:58.510,0:03:00.540 is just completely gratuitous. 0:03:00.540,0:03:05.430 Anyway. So. The thrust is that distributed[br]systems are 0:03:05.430,0:03:08.470 hard and databases are fun. Pictured here[br]is a 0:03:08.470,0:03:13.530 distributed system. You can see there's two[br]app nodes 0:03:13.530,0:03:16.660 and then there's two, there's like a master/slave[br]kind 0:03:16.660,0:03:20.920 of setup going on here as well. So we're 0:03:20.920,0:03:23.950 gonna talk about some of the complexities[br]of running 0:03:23.950,0:03:27.670 these types of systems, and it's really fun[br]stuff 0:03:27.670,0:03:29.980 once you get under the cover and start thinking 0:03:29.980,0:03:32.250 about some of the complexities. 0:03:32.250,0:03:37.030 So. NoSQL is a thing. We have NewSQL now. 
0:03:37.030,0:03:38.780 I'm gonna be covering some of these things.[br]We've 0:03:38.780,0:03:44.000 also got PostSQL, Post-Rock Ambient SQL. And[br]there's a 0:03:44.000,0:03:47.120 whole gamut of these things. They all make[br]my 0:03:47.120,0:03:50.819 brain explode and the, I think the trick to 0:03:50.819,0:03:53.069 understanding all of this stuff is to actually[br]think 0:03:53.069,0:03:55.459 about some of what's happening underneath.[br]And you can 0:03:55.459,0:03:59.650 make decisions about your databases. 0:03:59.650,0:04:01.700 Hopefully you're all familiar with some of[br]the concepts 0:04:01.700,0:04:07.250 of traditional relational databases. We have[br]ACID, which provides 0:04:07.250,0:04:10.640 certain guarantees about the way that your[br]data behaves. 0:04:10.640,0:04:13.129 You can update data and be sure it was 0:04:13.129,0:04:18.099 updated. Things are isolated from each other.[br]Things persist 0:04:18.099,0:04:20.970 over time. 0:04:20.970,0:04:23.129 Another thing that you may have heard of,[br]this 0:04:23.129,0:04:25.740 is a, this is a leap that I need 0:04:25.740,0:04:27.990 to another animation, is a thing called the[br]CAP 0:04:27.990,0:04:30.879 Theorem. So this gets talked about a lot when 0:04:30.879,0:04:34.889 we start talking about this new generation[br]of databases. 0:04:34.889,0:04:39.599 CAP stands for consistency, availability,[br]and partition tolerance, and 0:04:39.599,0:04:44.430 it provides, basically, some strong foundation[br]for reasoning about 0:04:44.430,0:04:48.050 the way distributed systems behave and how[br]they interoperate 0:04:48.050,0:04:49.680 and how they communicate. So I'm gonna give[br]you 0:04:49.680,0:04:52.620 a brief introduction to how that all kind[br]of 0:04:52.620,0:04:52.969 works. 0:04:52.969,0:04:57.279 So, the original CAP Theorem, as stated, was,[br]is 0:04:57.279,0:04:59.610 called Brewer's Conjecture. A guy called Brewer[br]just sort 0:04:59.610,0:05:02.659 of had this idea. 
It's actually on some really 0:05:02.659,0:05:06.680 awesomely-designed PowerPoint slides from[br]some thing he did. And 0:05:06.680,0:05:11.789 he was saying that with consistency, availability,[br]and partition 0:05:11.789,0:05:15.210 tolerance - so the data can, can only be 0:05:15.210,0:05:17.439 two of these things at any one time. So 0:05:17.439,0:05:20.039 the data can be consistent or it can be 0:05:20.039,0:05:23.979 accessible or it can handle network failures. 0:05:23.979,0:05:28.249 So people then took this conjecture and actually[br]made 0:05:28.249,0:05:31.800 a formal kind of proof in, in much more 0:05:31.800,0:05:38.800 rigorous computer science terms. And actually[br]said, it's impossible, 0:05:38.830,0:05:42.580 in an asynchronous network model, to implement[br]a read/write 0:05:42.580,0:05:49.210 data object that is simultaneously available[br]and is also 0:05:49.210,0:05:50.979 atomically consistent. 0:05:50.979,0:05:53.110 And so all of this stuff around NewSQL and 0:05:53.110,0:05:56.939 NoSQL and bleh, all of that stuff, is about 0:05:56.939,0:06:01.610 manipulating these different variables. There's[br]also a thing called 0:06:01.610,0:06:02.990 Base but I'm not gonna talk about it cause 0:06:02.990,0:06:05.789 it's actually just a made-up acronym that[br]has no 0:06:05.789,0:06:06.900 relevance to anything. 0:06:06.900,0:06:10.059 So, what, what does CAP actually, what, what[br]are 0:06:10.059,0:06:13.649 we talking about here? And why is it important? 0:06:13.649,0:06:16.719 It's important, actually, because everything[br]is already distributed. What 0:06:16.719,0:06:20.309 we do today is inherently a distributed system.[br]You 0:06:20.309,0:06:23.229 have a browser talking to a server, an app 0:06:23.229,0:06:25.520 server, Rails server - cause we're at RailsConf[br]- 0:06:25.520,0:06:29.249 and then that's talking to a PostGres database,[br]or 0:06:29.249,0:06:33.809 a MySQL database or something even fancier[br]and shinier. 
0:06:33.809,0:06:36.110 That's a distributed system. And as we move[br]into 0:06:36.110,0:06:41.460 more heavy client-based operations, that distribution[br]is getting much 0:06:41.460,0:06:43.759 more front-loaded, so you, you've got state[br]in the 0:06:43.759,0:06:46.589 browser that's now synchronizing with state[br]on the server. 0:06:46.589,0:06:50.300 So we already actually suffer many of these[br]problems. 0:06:50.300,0:06:55.270 This is a handy and completely untrue guide[br]to 0:06:55.270,0:06:59.389 NoSQL systems and breaking them into this[br]idea of 0:06:59.389,0:07:03.039 some things are available and some things[br]are consistent. 0:07:03.039,0:07:06.990 So, all of that is almost but not quite 0:07:06.990,0:07:08.619 entirely untrue. 0:07:08.619,0:07:12.800 What the actual theorem says is that under[br]a 0:07:12.800,0:07:16.490 network failure - so you've got multiple nodes[br]and 0:07:16.490,0:07:20.159 they now can no longer communicate - you can 0:07:20.159,0:07:23.490 choose whether the data is consistent or whether[br]the 0:07:23.490,0:07:28.849 data is available. And I have some demonstrations[br]here 0:07:28.849,0:07:30.860 to just - it actually ends up being very 0:07:30.860,0:07:31.580 easy to understand. 0:07:31.580,0:07:36.009 So, here we have typical cluster of nodes[br]working 0:07:36.009,0:07:41.110 together. We're gonna model some communication[br]between them. So 0:07:41.110,0:07:45.669 there's a, there's a write on this system.[br]It 0:07:45.669,0:07:49.449 comes in, that gets replicated across, and[br]then on 0:07:49.449,0:07:51.240 the other system we now have that data coming 0:07:51.240,0:07:53.309 out. Someone's doing a read. And so this is 0:07:53.309,0:07:57.679 the kind of situation that we're talking about.[br]So 0:07:57.679,0:08:01.469 whether you're doing master/slave setup in[br]a relational database 0:08:01.469,0:08:06.129 or something trickier, this is kind of the[br]way 0:08:06.129,0:08:08.300 it works. 
A node gets some data and it 0:08:08.300,0:08:12.020 gives it to another node, and they have the 0:08:12.020,0:08:13.860 same information. 0:08:13.860,0:08:17.839 So when there's a network partition, that,[br]they no 0:08:17.839,0:08:22.119 longer can communicate. So a write comes in,[br]and 0:08:22.119,0:08:25.869 now we have to make a decision. And all 0:08:25.869,0:08:27.740 of this is actually just science, as you can 0:08:27.740,0:08:30.759 tell from this diagram. If those two nodes[br]can't 0:08:30.759,0:08:33.399 communicate, you can talk to the one that[br]got 0:08:33.399,0:08:36.890 the write - that's consistent. It got the[br]write. 0:08:36.890,0:08:39.219 It can now, can read out that same data. 0:08:39.219,0:08:40.190 That's all cool. 0:08:40.190,0:08:43.870 Or, you can have both nodes still communicating,[br]and 0:08:43.870,0:08:46.339 now you have someone reading data that is[br]no 0:08:46.339,0:08:49.100 longer in the right state. So we've got, you 0:08:49.100,0:08:51.650 know, we have updated a bank account. It's[br]got 0:08:51.650,0:08:53.680 a hundred dollars in it. It used to have 0:08:53.680,0:08:56.770 ten dollars in it. These people are reading[br]ten. 0:08:56.770,0:08:59.020 These people are reading a hundred. That's[br]available. The 0:08:59.020,0:09:01.200 data is now not consistent. But all of the 0:09:01.200,0:09:03.220 nodes can send back that data. 0:09:03.220,0:09:06.660 And so all of the discussion about CAP Theorem 0:09:06.660,0:09:11.020 and, and you know, people even claiming, we've[br]defeated 0:09:11.020,0:09:14.040 the CAP Theorem in our database at, you know, 0:09:14.040,0:09:19.430 low-low prices is incredibly awesome. Just[br]remember this image. 0:09:19.430,0:09:26.000 Two things that cannot communicate cannot[br]communicate. It's science. 
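
The partition choice described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class and function names (`TwoNodeCluster`, `Unavailable`, `demo`) are hypothetical, there are exactly two nodes, all writes go to one primary, and the bank balance goes from ten to a hundred as in the talk. It is not how any particular database implements replication.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk stale data."""


class TwoNodeCluster:
    """Toy model: one primary that takes writes, one replica that copies it."""

    PRIMARY, REPLICA = 0, 1

    def __init__(self, mode, balance=10):
        if mode not in ("consistent", "available"):
            raise ValueError(mode)
        self.mode = mode
        self.balances = [balance, balance]  # one copy of the balance per node
        self.partitioned = False            # True when the nodes cannot talk

    def write(self, balance):
        # The write always lands on the primary; replication needs the network.
        self.balances[self.PRIMARY] = balance
        if not self.partitioned:
            self.balances[self.REPLICA] = balance

    def read(self, node):
        cut_off = self.partitioned and node == self.REPLICA
        if cut_off and self.mode == "consistent":
            # Consistent choice: the cut-off replica refuses to answer.
            raise Unavailable("replica cannot reach the primary")
        # Available choice: answer anyway, even if the value may be stale.
        return self.balances[node]


def demo(mode):
    """Partition the cluster, update the balance, then read from both nodes."""
    cluster = TwoNodeCluster(mode)
    cluster.partitioned = True
    cluster.write(100)
    results = []
    for node in (TwoNodeCluster.PRIMARY, TwoNodeCluster.REPLICA):
        try:
            results.append(cluster.read(node))
        except Unavailable:
            results.append("unavailable")
    return results


print(demo("consistent"))  # [100, 'unavailable']
print(demo("available"))   # [100, 10]
```

In the consistent mode one reader gets no answer at all; in the available mode every reader gets an answer, but two readers can see different balances. Once the partition heals, both modes behave the same.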
0:09:26.000,0:09:28.250 And then when they can communicate, we're[br]back into 0:09:28.250,0:09:30.850 the realm of normal operations and things[br]get a 0:09:30.850,0:09:35.090 lot easier. If you were interested in any[br]of 0:09:35.090,0:09:39.540 the guts of how these things work, definitely[br]have 0:09:39.540,0:09:42.430 a look at a thing called Jepsen, which is 0:09:42.430,0:09:47.670 this crazy motherfucker who is just analyzing[br]the network 0:09:47.670,0:09:51.510 operations of a whole variety of distributed[br]systems, and 0:09:51.510,0:09:54.900 it will, it's just, it will blow your mind. 0:09:54.900,0:10:01.080 OK. Good. That's, that's why. Now I remember. 0:10:01.080,0:10:04.590 So, here is our cast. We're about to go 0:10:04.590,0:10:08.850 on an adventure through a tortured maze of[br]ridiculous 0:10:08.850,0:10:11.620 Dungeons and Dragons metaphors. But, first[br]of all, a 0:10:11.620,0:10:14.960 shout out to the OwlBear. Yeah. The thing[br]I 0:10:14.960,0:10:18.740 love about the OwlBear is they've taken the[br]wrong, 0:10:18.740,0:10:22.680 the least scary aspects of a bear and an 0:10:22.680,0:10:26.260 owl, like if that was an owl with, you 0:10:26.260,0:10:30.120 know, if it had a bear's head and wings, 0:10:30.120,0:10:33.620 that would be way more scary. Anyway. 0:10:33.620,0:10:37.130 It's just been bugging me for months. So. 0:10:37.130,0:10:41.670 PostGres. As we all know, it's MySQL for hipsters. 0:10:41.670,0:10:45.050 It's actually pretty good. So here's its character[br]reference 0:10:45.050,0:10:49.320 sheet. We, it's a relational database. It[br]has a 0:10:49.320,0:10:53.760 consistent model. So under conditions of network[br]partition, you 0:10:53.760,0:10:56.650 know, your, your slave is not in contact with 0:10:56.650,0:11:00.480 the master, it's, it's essentially unavailable.[br]That's the way 0:11:00.480,0:11:01.730 we treat it. 
0:11:01.730,0:11:05.370 PostGres is actually really, really interesting[br]tech, because it 0:11:05.370,0:11:10.290 has a bunch of cool stuff hidden underneath[br]it. 0:11:10.290,0:11:13.010 So there's a thing called hstore which is[br]a 0:11:13.010,0:11:15.650 key-value store that's baked right in. So[br]if you 0:11:15.650,0:11:18.720 need a lightweight key-value store and you're[br]already running 0:11:18.720,0:11:22.700 PostGres in production, you, you have one.[br]You don't 0:11:22.700,0:11:25.600 need to spin up any other thing. You can 0:11:25.600,0:11:27.450 actually do that today. 0:11:27.450,0:11:30.320 The really interesting thing about that is,[br]you can 0:11:30.320,0:11:34.100 index those keys. You can do joins across[br]an 0:11:34.100,0:11:38.230 hstore reference into, across multiple tables.[br]It looks and 0:11:38.230,0:11:40.250 feels exactly like the kind of thing that[br]you're 0:11:40.250,0:11:42.060 already working with. 0:11:42.060,0:11:46.560 We've got, there's some things already baked[br]into the 0:11:46.560,0:11:49.380 Rails ecosystem that make this really easy[br]if you're 0:11:49.380,0:11:52.800 doing that kind of information. But the really[br]exciting 0:11:52.800,0:11:55.570 thing about what PostGres is up to at the 0:11:55.570,0:12:01.720 moment is JSON. And 9.2, 9.3, and upcoming[br]9.4 0:12:01.720,0:12:05.810 have pretty much a fully baked in JSON document 0:12:05.810,0:12:11.190 database. And it is crazy awesome. The new[br]one 0:12:11.190,0:12:14.350 is super high-performance. If you were sort[br]of, it's 0:12:14.350,0:12:17.400 the same thing. If you're thinking, ah, you[br]know, 0:12:17.400,0:12:20.250 documents would be easier for this use case,[br]let's 0:12:20.250,0:12:24.730 install something else, we're actually, you[br]already have one, 0:12:24.730,0:12:26.690 and it, it has all of those same properties. 0:12:26.690,0:12:28.610 You can index. 
You can do joins across your 0:12:28.610,0:12:32.760 normal table into the documents. It's crazy[br]cool. 0:12:32.760,0:12:36.670 MySQL. It's pretty much the same as PostGres,[br]is 0:12:36.670,0:12:42.310 my answer. But there's a slight caveat. So,[br]you 0:12:42.310,0:12:47.900 know, Oracle, they're a company. Many[br]of 0:12:47.900,0:12:50.180 the same things apply. Like, this is why,[br]you 0:12:50.180,0:12:52.380 know, they're, they're kind of in the same[br]bucket. 0:12:52.380,0:12:55.950 For me, it doesn't particularly matter at[br]the end 0:12:55.950,0:12:58.190 of the day. Whatever you happen to have expertise 0:12:58.190,0:13:01.170 in, it's cool. It's got some kind of interesting 0:13:01.170,0:13:02.870 things that you can do. You can switch out 0:13:02.870,0:13:07.700 storage engines to actually get your different[br]performance profiles. 0:13:07.700,0:13:11.530 It is everywhere. It's got a thing called[br]HandlerSocket, 0:13:11.530,0:13:16.410 which is essentially raw, right, access through a 0:13:16.410,0:13:19.740 low-level socket into the table infrastructure.[br]There's some paper 0:13:19.740,0:13:24.440 with really high performance kind of things. 0:13:24.440,0:13:26.660 You can actually just sort of bypass the whole 0:13:26.660,0:13:29.560 SQL engine, which is kind of interesting.[br]The other 0:13:29.560,0:13:31.870 thing that's happened since Oracle took over,[br]which is 0:13:31.870,0:13:35.340 kind of a really good thing, is that there's 0:13:35.340,0:13:40.470 some alternatives. So MariaDB is sort of the,[br]the 0:13:40.470,0:13:44.650 more open fork. There's a semi-commercial[br]edition that has 0:13:44.650,0:13:47.900 lots of really high-performance features,[br]and they basically run 0:13:47.900,0:13:51.610 binary compatible patches, that's Percona.[br]And they have, like, 0:13:51.610,0:13:55.610 huge expertise. 
And this Toku is quite interesting.[br]It's, 0:13:55.610,0:13:58.400 they're doing all of this crazy fractal indexing[br]and 0:13:58.400,0:14:02.250 things for particular use cases on very large[br]datasets. 0:14:02.250,0:14:04.900 But it still just looks and behaves in many 0:14:04.900,0:14:07.890 ways like the MySQL that you are kind of 0:14:07.890,0:14:08.630 used to. 0:14:08.630,0:14:13.130 So, there's some interesting things happening[br]there. So these, 0:14:13.130,0:14:16.640 hopefully none of that's a huge surprise.[br]That's databases. 0:14:16.640,0:14:21.320 You use it. It comes in the box, and 0:14:21.320,0:14:22.600 ActiveRecord talks to it. 0:14:22.600,0:14:24.779 So now we're gonna get slightly off the beaten 0:14:24.779,0:14:30.370 track. So, a lot of what we know as NoSQL 0:14:30.370,0:14:35.370 comes from Dynamo, which was actually a paper[br]that 0:14:35.370,0:14:40.160 Amazon released years ago. I'm not gonna labor[br]too 0:14:40.160,0:14:42.460 much on this one. The paper's quite interesting.[br]It 0:14:42.460,0:14:48.000 talks about how you make a distributed system. 0:14:48.000,0:14:51.930 The interesting thing is actually that Riak[br]is essentially 0:14:51.930,0:14:55.300 an implementation of the underlying Dynamo[br]theory. So Riak 0:14:55.300,0:14:58.340 is crazy awesome. This is what happens to[br]you 0:14:58.340,0:15:01.580 when you run Riak in production. 0:15:01.580,0:15:02.430 [laughter] 0:15:02.430,0:15:05.930 I pretty much, like, it's a conversation I,[br]I 0:15:05.930,0:15:08.720 often have with people is like, wouldn't it[br]be 0:15:08.720,0:15:12.750 awesome to have a problem that needed Riak?[br]And 0:15:12.750,0:15:13.710 it was like, yeah, that would be so cool. 0:15:13.710,0:15:17.970 I'd be like the awesomeness engineer. 0:15:17.970,0:15:21.430 So Riak is, it's just crazy-well engineered.[br]They're doing 0:15:21.430,0:15:26.870 all sorts of interesting stuff. 
It's inherently,[br]it just 0:15:26.870,0:15:30.680 understands clustering. You know, you add[br]a new node, 0:15:30.680,0:15:35.260 it just, it's there. You know. With, with[br]those 0:15:35.260,0:15:37.610 older kind of databases, it's, it's a pain[br]in 0:15:37.610,0:15:40.170 the ass to actually get it working. 0:15:40.170,0:15:45.339 So, yeah, they're doing some really interesting[br]things. It's 0:15:45.339,0:15:47.920 got a cloud storage thing so you've got an 0:15:47.920,0:15:50.120 S3-compatible API and all of these kind of[br]stuff. 0:15:50.120,0:15:51.360 A lot of the magic of the way this 0:15:51.360,0:15:56.660 works is through consistent hashing. So, my[br]slides are 0:15:56.660,0:15:58.380 all mucked up. But anyway. 0:15:58.380,0:16:00.640 So, basically what it does is it just partitions 0:16:00.640,0:16:05.350 all of your data into a giant hash ring. 0:16:05.350,0:16:10.450 Excuse me. Physical nodes then just own parts[br]of 0:16:10.450,0:16:12.370 that hash. You add a new node or take 0:16:12.370,0:16:15.720 a node away and it repartitions all the rest 0:16:15.720,0:16:17.940 of the data across the remaining nodes. And[br]all 0:16:17.940,0:16:21.480 of that is just completely in the background[br]of 0:16:21.480,0:16:24.680 how Riak just works operationally. 0:16:24.680,0:16:27.300 So for large scale data and, you know, you, 0:16:27.300,0:16:30.710 you get away with, it has some really nice 0:16:30.710,0:16:34.170 operational characteristics that, that make[br]it quite cool to 0:16:34.170,0:16:34.750 manage. 0:16:34.750,0:16:36.529 And then the other thing is, it's a very 0:16:36.529,0:16:40.190 simple API. It's key-value store, you can[br]store JSON 0:16:40.190,0:16:42.220 documents in it, and it's just a bucket that 0:16:42.220,0:16:45.130 has keys, and then it's got other stuff on 0:16:45.130,0:16:49.510 top to retrieve data, do secondary indexes[br]and searching 0:16:49.510,0:16:51.130 and all of that kind of stuff. 
0:16:51.130,0:16:54.220 So, it's a very cool piece of tech. 0:16:54.220,0:16:59.080 So, the other one we've got is, Google. Fucking 0:16:59.080,0:17:03.810 annoying. And you'll see why in a second.[br]So, 0:17:03.810,0:17:06.980 Google had this thing called BigTable that,[br]again, kind 0:17:06.980,0:17:10.470 of comes out of the internal research. You[br]have 0:17:10.470,0:17:14.299 access to it through some of their cloud properties. 0:17:14.299,0:17:16.799 As you can see, it's got, it's actually a 0:17:16.799,0:17:21.289 sparse distributed multidimensional sorted[br]map, which is good, I 0:17:21.289,0:17:23.618 guess. I imagine. It's awesome. 0:17:23.618,0:17:27.720 The stuff they're doing with this is crazy.[br]So 0:17:27.720,0:17:30.190 this is actually a, all, a couple years old 0:17:30.190,0:17:33.409 I think now. Some of these, some of the 0:17:33.409,0:17:37.190 information, so. Hundreds of petabytes of[br]data, you know, 0:17:37.190,0:17:40.580 ridiculous numbers of operations a second.[br]You do not 0:17:40.580,0:17:42.649 have any of these problems. 0:17:42.649,0:17:46.879 So, then they, they took this stuff, they[br]were 0:17:46.879,0:17:50.210 like, ah, we've got BigTable. You know, that[br]was, 0:17:50.210,0:17:53.499 that was fucking easy. Whatever. And so now[br]they've 0:17:53.499,0:17:55.480 got two other things. They've got one called[br]Spanner 0:17:55.480,0:18:00.019 and one called F-one, where they're basically[br]doing, you 0:18:00.019,0:18:07.019 know, proper, sort of relational looking data[br]across multiple 0:18:07.350,0:18:10.320 data centers and, you know, and. They're kind[br]of 0:18:10.320,0:18:12.590 really pushing the boundaries of some of that[br]CAP 0:18:12.590,0:18:14.710 stuff that's going on. 0:18:14.710,0:18:18.490 But all you need is a GPS in every 0:18:18.490,0:18:21.379 server, a couple of atomic clocks in each[br]data 0:18:21.379,0:18:26.830 center, and you, great. 
So, Google's basically[br]telling everyone 0:18:26.830,0:18:29.720 to, you know, just fuck off. 0:18:29.720,0:18:35.169 So, another one that I really, I really like, 0:18:35.169,0:18:39.490 and have used a long, a long time ago 0:18:39.490,0:18:45.690 in, in tech land, tech time, is Cassandra.[br]Cassandra 0:18:45.690,0:18:50.110 is a column-oriented database. Eventually[br]it's awesome. It's really 0:18:50.110,0:18:54.240 all about eventual consistency. 0:18:54.240,0:18:57.519 And you can see here, this is a man, 0:18:57.519,0:18:59.259 he eventually gets it right. So that's well[br]done 0:18:59.259,0:19:02.360 to him there. So Cassandra's a lot like that. 0:19:02.360,0:19:06.019 And, again, you know, the cool thing is, it's 0:19:06.019,0:19:10.549 a sparse distributed multidimensional sorted[br]map. It, when 0:19:10.549,0:19:13.350 I was working with it, you, it was, you 0:19:13.350,0:19:16.100 had, you described your tables kind of thing[br]in 0:19:16.100,0:19:20.309 XML and hated yourself, and then every time[br]something 0:19:20.309,0:19:23.460 changed you rebooted the server and that took[br]a while 0:19:23.460,0:19:27.389 and, yeah, the whole thing was really difficult. 0:19:27.389,0:19:30.570 What it basically does is it takes the availability 0:19:30.570,0:19:33.570 side of the question. Like, that's its world[br]model. 0:19:33.570,0:19:37.830 It has, again, a very simple clustering system.[br]New 0:19:37.830,0:19:41.289 nodes, add in, the data gets streamed out.[br]It 0:19:41.289,0:19:46.070 has a data model that is really complicated,[br]and 0:19:46.070,0:19:48.470 I, even though I've used it, it's really hard 0:19:48.470,0:19:50.909 to explain how it actually works. 0:19:50.909,0:19:54.730 So column databases basically kind of invert[br]the, the 0:19:54.730,0:19:56.700 whole table structure that you're used to[br]from the 0:19:56.700,0:20:01.190 relational world. 
And the advantage is that,[br]for some 0:20:01.190,0:20:04.159 types of data, and for some queries, it is 0:20:04.159,0:20:07.600 crazy blazing fast, cause you can just. Time[br]series 0:20:07.600,0:20:08.619 are always a good one, where you can just 0:20:08.619,0:20:10.929 have long streams of time series and it will 0:20:10.929,0:20:13.490 actually put that on disk all next to each 0:20:13.490,0:20:15.600 other and you can just pull it all out. 0:20:15.600,0:20:18.509 The cool thing in the new versions of Cassandra 0:20:18.509,0:20:22.299 is that they've abstracted all of that out,[br]and 0:20:22.299,0:20:25.570 you actually just get tables, so you can create 0:20:25.570,0:20:28.200 a table and give it a primary key, and 0:20:28.200,0:20:32.239 under the covers, it's setting up rows and[br]column 0:20:32.239,0:20:35.239 families and columns and all of, all of these 0:20:35.239,0:20:39.389 really abstract concepts, and they've completely[br]made some of 0:20:39.389,0:20:41.499 that go away. Which is really nice. 0:20:41.499,0:20:43.929 So you end up with something that looks a 0:20:43.929,0:20:48.739 lot like just SQL and, you know, a normal 0:20:48.739,0:20:52.649 table kind of structure. It's just clustering[br]out lots 0:20:52.649,0:20:55.100 of nodes. It's very tunable, so you can actually 0:20:55.100,0:20:57.989 set up, you know, it writes to a node 0:20:57.989,0:21:00.019 and you can say, actually write to five nodes 0:21:00.019,0:21:02.019 and that's a quorum and now we're cool. So 0:21:02.019,0:21:06.019 you can tune how much redundancy you have. 0:21:06.019,0:21:12.590 So that's kind of cool. That is a reminder. 0:21:12.590,0:21:17.559 That went cold really fast. Thank you. 0:21:17.559,0:21:20.830 So, the next one on our list is Memcache. 
0:21:20.830,0:21:24.220 Memcache, there was, there was a talk earlier[br]in 0:21:24.220,0:21:27.529 the week that was describing using Memcache[br]and caching 0:21:27.529,0:21:29.789 and it, it had a very interesting observation,[br]which 0:21:29.789,0:21:32.669 was, it just works. He didn't even know what 0:21:32.669,0:21:36.590 version he was running in production, cause[br]neh. Doesn't 0:21:36.590,0:21:38.739 matter. That API has been stable for ages. 0:21:38.739,0:21:42.419 And I know, I know what you're saying. It's 0:21:42.419,0:21:45.559 not a database. It's a cache. Technically[br]true. But 0:21:45.559,0:21:48.049 it's interesting to think about, because the[br]moment you 0:21:48.049,0:21:51.379 add caching, even if you've been ignoring[br]the fact 0:21:51.379,0:21:54.779 that you had a distributed system before,[br]with caching 0:21:54.779,0:21:57.330 you now really have a distributed system.[br]You've got 0:21:57.330,0:21:59.980 data in one thing that may or may not 0:21:59.980,0:22:02.759 be fresh, and you've got data in your database 0:22:02.759,0:22:05.119 that, you know, you assume is up to date, 0:22:05.119,0:22:07.249 and now you've got a synchronization problem. 0:22:07.249,0:22:12.080 So, Memcache is actually really, you know,[br]it's, it's 0:22:12.080,0:22:16.659 just rock solid, old as the hills technology,[br]completely 0:22:16.659,0:22:22.279 simple. The API is everywhere. Lots of people[br]actually 0:22:22.279,0:22:26.119 have made their, you know, key-value store[br]they made 0:22:26.119,0:22:28.309 in the hacknight, which, you know, is a useful 0:22:28.309,0:22:30.739 hobby if you want to annoy everyone. 0:22:30.739,0:22:33.139 You have the, their API is actually the Memcached 0:22:33.139,0:22:36.080 API. It's got a handful of things. You can 0:22:36.080,0:22:40.129 set a key, you can replace one. 
It does 0:22:40.129,0:22:43.679 have some atomic operations so you can[br]increment and 0:22:43.679,0:22:46.149 decrement so that there is some flexibility[br]to actually 0:22:46.149,0:22:51.669 do a little bit of data storage in a, 0:22:51.669,0:22:55.779 in a more traditional sense. 0:22:55.779,0:22:59.389 It's actually a client-server model. Your,[br]your driver is 0:22:59.389,0:23:02.429 responsible for the clustering in a way, so[br]you 0:23:02.429,0:23:07.049 can have multiple Memcache nodes and the,[br]the hashing 0:23:07.049,0:23:11.279 algorithm determines which node, which node[br]a particular piece 0:23:11.279,0:23:13.440 of data is gonna be on. 0:23:13.440,0:23:15.960 That has the property of making it very, very 0:23:15.960,0:23:19.440 simple to use. And there's no cluster state.[br]There's 0:23:19.440,0:23:21.889 no coordination that nodes have. Like, a lot[br]of 0:23:21.889,0:23:23.519 the heavy lifting all of these other things[br]are 0:23:23.519,0:23:27.869 doing is about coordinating around all of[br]that information. 0:23:27.869,0:23:29.749 There's a whole bunch of awesome stuff just[br]baked 0:23:29.749,0:23:34.519 into Rails. So you can just easily cache into 0:23:34.519,0:23:38.940 Memcache, or your normal Rails fragment mutations.[br]All of 0:23:38.940,0:23:40.869 that kind of stuff. 0:23:40.869,0:23:42.409 And there's even some things we can, you can 0:23:42.409,0:23:46.289 actually put, push that into ActiveRecord[br]and have, have 0:23:46.289,0:23:48.440 caching at that level as well. 0:23:48.440,0:23:50.700 Redis is an interesting one for the, the Rails 0:23:50.700,0:23:56.580 community. Cause it's basically a queue, now.[br]Everyone seems 0:23:56.580,0:24:01.369 to be running Resque, Sidekiq, and, you know,[br]Redis 0:24:01.369,0:24:05.659 is, again, one of those just pieces of technology 0:24:05.659,0:24:12.220 that is beautifully engineered, incredibly[br]simple, incredibly robust. 
The 0:24:12.220,0:24:19.220 maintainers are just absolute, you know, scientists,[br]I guess. 0:24:19.309,0:24:22.999 Just a whole other level of crazy algorithm[br]stuff. 0:24:22.999,0:24:25.299 And they make blog posts and, you know, I'm 0:24:25.299,0:24:31.519 so stupid. I don't understand what you're[br]talking about. 0:24:31.519,0:24:35.989 It's really fast, it's slightly hard to distribute.[br]A 0:24:35.989,0:24:38.710 lot of that's in the pipeline with Redis.[br]It's 0:24:38.710,0:24:42.379 much more, it's much more simple to, to stick 0:24:42.379,0:24:46.070 it on one node and increase the RAM. It's 0:24:46.070,0:24:49.359 mu, more complicated than Memcache. It's essentially[br]just an 0:24:49.359,0:24:52.129 in-memory cache. It has a bunch of really[br]interesting 0:24:52.129,0:24:56.679 data structures, though. I think if you've[br]been confused 0:24:56.679,0:24:59.029 all week, now, which country I'm from, whether[br]I 0:24:59.029,0:25:01.720 say dayta or dahta, so now I just changed 0:25:01.720,0:25:03.710 them randomly. 0:25:03.710,0:25:08.070 So, you can, you have hashes, you have lists, 0:25:08.070,0:25:09.779 you have strings. You've got all sorts of[br]other 0:25:09.779,0:25:14.129 interesting things. You can do optimistic[br]locking and have, 0:25:14.129,0:25:17.609 you know, a bunch of operations that are essentially 0:25:17.609,0:25:22.369 batched. You can do sort of, there's long[br]ways 0:25:22.369,0:25:25.440 of doing this kind of stuff. It's Resque and 0:25:25.440,0:25:28.690 Sidekiq both just make this, make it super[br]simple 0:25:28.690,0:25:31.139 to do background tasks with Rails and install[br]the 0:25:31.139,0:25:36.960 gem, have a worker, and it's all just magic. 0:25:36.960,0:25:39.769 It has Lua baked in, which is a whole 0:25:39.769,0:25:41.850 other thing. 
But Lua is a really cool programming 0:25:41.850,0:25:44.940 language that is designed for embeddability.[br]But one of 0:25:44.940,0:25:47.210 the things that happens if you can actually[br]write 0:25:47.210,0:25:51.389 little rule, Lua scripts that end up going[br]into 0:25:51.389,0:25:54.519 the Redis server to do more complex operations.[br]So, 0:25:54.519,0:25:57.179 in this case, this is a little script that 0:25:57.179,0:26:00.269 grabs something off a sorted hash and then[br]deletes 0:26:00.269,0:26:02.789 them and then returns the first thing, like,[br]then 0:26:02.789,0:26:05.789 returns what we had done. But it's, it's an 0:26:05.789,0:26:09.529 atomic kind of transactional way. 0:26:09.529,0:26:13.320 And, good news everybody! We've just invented[br]stored procedures. 0:26:13.320,0:26:16.409 So that's very exciting. Except now they're[br]much more 0:26:16.409,0:26:18.639 hip, because it's an in-memory database with[br]a language 0:26:18.639,0:26:23.330 no one's heard of. So. We are rocking it. 0:26:23.330,0:26:28.470 Also, maybe use a queue. Just, I know it's 0:26:28.470,0:26:32.869 crazy. But, if you're actually queuing, using[br]Redis as 0:26:32.869,0:26:36.809 your queue, maybe you have a queuing problem[br]and 0:26:36.809,0:26:39.609 you have queues. They exist. They're a thing.[br]It's 0:26:39.609,0:26:41.440 ridiculous. I know. 0:26:41.440,0:26:46.379 So, RabbitMQ is sort of the gold standard,[br]and 0:26:46.379,0:26:49.129 Kafka is another one that was talked about[br]earlier 0:26:49.129,0:26:50.909 this week, and it is crazy cool. 0:26:50.909,0:26:56.129 Where am I? Man. All right. Just gonna stretch. 0:26:56.129,0:26:58.820 I've lost count, so I don't know, now I'm 0:26:58.820,0:27:02.019 just gonna talk faster. Cool. 0:27:02.019,0:27:08.369 Neo4j is really interesting. It's a graph[br]database. That's. 0:27:08.369,0:27:13.350 It's slightly hard to explain. 
But you, the[br]way 0:27:13.350,0:27:15.210 I actually think about it, we'll just jump[br]straight 0:27:15.210,0:27:17.460 to here, is it's almost but not quite entirely 0:27:17.460,0:27:22.950 unlike a relational database. The difference,[br]essentially, is that 0:27:22.950,0:27:27.409 it is optimized for the connections rather[br]than aggregated 0:27:27.409,0:27:31.710 data. So relational database, you, puts things[br]in, in 0:27:31.710,0:27:33.279 a way where you can get a sum and 0:27:33.279,0:27:35.179 a count and like, that's kind of the heritage 0:27:35.179,0:27:37.029 of that kind of world view. 0:27:37.029,0:27:40.340 Whereas what the Neo4j people are doing is[br]actually 0:27:40.340,0:27:44.739 thinking about connections between pieces[br]of data, and for 0:27:44.739,0:27:49.340 some use cases, this is actually really, really[br]amazing 0:27:49.340,0:27:52.369 stuff. So you have, a graph is basically a 0:27:52.369,0:27:56.850 collection of nodes, and those nodes can have[br]relationships 0:27:56.850,0:27:59.179 between each other, and then a node just has 0:27:59.179,0:28:01.330 properties. 0:28:01.330,0:28:03.830 It's essentially an object database in a way.[br]It's 0:28:03.830,0:28:05.639 like very similar to the way that we think 0:28:05.639,0:28:08.109 about objects. So it has some really nice[br]properties 0:28:08.109,0:28:11.859 if you're working in a language like Ruby.[br]And 0:28:11.859,0:28:17.009 then it just does stuff that, you know, in 0:28:17.009,0:28:19.090 a really intuitive way. So if we've got a 0:28:19.090,0:28:22.159 graph of movies and actors, you actually define[br]a 0:28:22.159,0:28:26.460 relationship by name. Then an actor acts in[br]a 0:28:26.460,0:28:28.700 movie. And then when you were doing your queries, 0:28:28.700,0:28:32.909 this is a language called Cypher, you actually,[br]that's 0:28:32.909,0:28:34.059 a first-class thing.
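To make the "nodes with properties, relationships with names" model concrete, here is a toy in-memory sketch in plain Ruby. It is not Neo4j or its driver: `Node`, `Relationship`, `Graph`, and the sample actor and movie are all invented for illustration.

```ruby
# A node is a label plus a bag of properties.
Node = Struct.new(:label, :properties)

# A relationship is a first-class record with its own name (type).
Relationship = Struct.new(:from, :type, :to)

class Graph
  def initialize
    @relationships = []
  end

  # Connect two nodes with a named relationship.
  def relate(from, type, to)
    @relationships << Relationship.new(from, type, to)
  end

  # Follow relationships of one named type out of a node.
  def outgoing(node, type)
    @relationships
      .select { |r| r.from.equal?(node) && r.type == type }
      .map(&:to)
  end
end

# Made-up sample data.
graph = Graph.new
actor = Node.new(:Actor, { name: "Example Actor" })
movie = Node.new(:Movie, { title: "Example Movie" })
graph.relate(actor, :ACTED_IN, movie)

puts graph.outgoing(actor, :ACTED_IN).map { |m| m.properties[:title] }.inspect
```

In Cypher the same traversal would read roughly `MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN m.title`, with the relationship name appearing directly in the query.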
0:28:34.059,0:28:36.019 Whereas in a relational world, you're, you're[br]using a 0:28:36.019,0:28:39.279 foreign key, which has no semantic meaning[br]at all. 0:28:39.279,0:28:41.330 You, you just have to remember that, you know, 0:28:41.330,0:28:43.019 an actor, you know, there's a table with an 0:28:43.019,0:28:45.729 actor id, and a movie id, and we're joining 0:28:45.729,0:28:49.919 across somewhere. Whereas Neo4j actually makes[br]those relationships first 0:28:49.919,0:28:53.359 class citizens. So if you've got problems[br]that are 0:28:53.359,0:29:00.359 graph problems, like social network friend[br]cloud stuff, some 0:29:01.549,0:29:04.799 of that stuff, Neo4j just makes trivially[br]easy in 0:29:04.799,0:29:06.070 a way that you would have had to do 0:29:06.070,0:29:10.119 a recursive self-join in PostGres and hate[br]your life 0:29:10.119,0:29:12.499 and, you know. 0:29:12.499,0:29:17.029 Couch is cool. I guess. Pretty much that's[br]my 0:29:17.029,0:29:21.029 opinion of it. It's really awesome. But, you[br]can't 0:29:21.029,0:29:25.659 query it. So cool. 0:29:25.659,0:29:28.109 That's it. That's a slight disservice to Couch[br]but, 0:29:28.109,0:29:31.970 you know, whatever. MongoDB, as we all know,[br]it 0:29:31.970,0:29:34.559 is webscale and that's excellent. If you think[br]of 0:29:34.559,0:29:39.200 it as Redis for JSON, that's good. Sixty percent 0:29:39.200,0:29:41.249 of the time, it works every time. Everyone's[br]familiar 0:29:41.249,0:29:43.169 with that. 0:29:43.169,0:29:46.929 So, the thing that's really, I mean, Mongo,[br]it 0:29:46.929,0:29:50.919 reminds me of My, MySQL. Like, Mongo is kind 0:29:50.919,0:29:54.320 of terrible, but MySQL was kind of terrible,[br]too. 
0:29:54.320,0:29:56.789 Like, when that came out, it didn't do transactions, 0:29:56.789,0:30:00.039 for example, and I, I was working in enterprise-y 0:30:00.039,0:30:04.419 land, and transactions are actually a thing.[br]And, you're 0:30:04.419,0:30:08.929 like, you script kiddies with your database. 0:30:08.929,0:30:10.789 So Mongo feels like that, and not, you know, 0:30:10.789,0:30:13.970 what we learned is, if you make something[br]that's 0:30:13.970,0:30:17.539 awesome and useful and everywhere and ubiquitous[br]and it 0:30:17.539,0:30:20.749 doesn't work, you can make it work. And eventually, 0:30:20.749,0:30:23.309 you know, MySQL is a real database. So Mongo 0:30:23.309,0:30:25.470 feels a bit like that. It's come a massive 0:30:25.470,0:30:30.690 way, right about really early on with very[br]early 0:30:30.690,0:30:32.309 versions. 0:30:32.309,0:30:34.759 It stores JSON. Well sort of it. It stores 0:30:34.759,0:30:39.710 BSON, anyway. That's just binary JSON basically.[br]And it's 0:30:39.710,0:30:42.409 a, it's a really beautiful model to work with 0:30:42.409,0:30:45.129 in a development cycle, which is why think[br]is 0:30:45.129,0:30:47.489 why there's, why there's so much appeal. You've[br]just 0:30:47.489,0:30:50.929 got kind of, people treat it like an object 0:30:50.929,0:30:53.690 database. You've just got an object that's[br]in there, 0:30:53.690,0:30:55.720 and you can pull out objects and manipulate[br]them 0:30:55.720,0:30:59.859 and do all of this kind of crazy stuff. 0:30:59.859,0:31:05.220 The people who know what they're talking about,[br]though, 0:31:05.220,0:31:08.450 with distributed systems, if the reason you're[br]using Mongo 0:31:08.450,0:31:10.299 is because you think it's a panacea for all 0:31:10.299,0:31:13.700 of this, you know, we need to be webscale 0:31:13.700,0:31:17.229 and do all of this kind of stuff, that 0:31:17.229,0:31:19.399 is not a good reason to use it. 
Cause 0:31:19.399,0:31:21.919 there, there's still a lot of operational[br]problems and, 0:31:21.919,0:31:23.739 and stuff going on. 0:31:23.739,0:31:30.179 This, this one is interesting. It's essentially,[br]RethinkDB is 0:31:30.179,0:31:33.299 coming from the PostGres world view. Cause[br]PostGres made, 0:31:33.299,0:31:36.729 you know, MySQL was like, whatever, we'll[br]fix it. 0:31:36.729,0:31:39.669 PostGres was like, we'll do it right and it, 0:31:39.669,0:31:41.629 you can't use it cause it's so slow, but 0:31:41.629,0:31:43.539 at least it's correct. And they took lots[br]of 0:31:43.539,0:31:46.539 iterations to make it usable. So Rethink is[br]kind 0:31:46.539,0:31:48.340 of that school of thought. It's like, we're[br]gonna 0:31:48.340,0:31:50.619 make it all correct first, and then we'll[br]make 0:31:50.619,0:31:55.799 it usable. So it's very similar idea. JSON,[br]you 0:31:55.799,0:31:59.429 know, they're trying to make it operationally[br]great with 0:31:59.429,0:32:02.979 automatic clustering and all this kind of[br]stuff. You 0:32:02.979,0:32:05.149 know. Who knows what it is and how it's 0:32:05.149,0:32:07.179 actually gonna behave in the real world. It's[br]still 0:32:07.179,0:32:09.159 a very early piece of tech. 0:32:09.159,0:32:11.249 And that leads me into, there's a whole world 0:32:11.249,0:32:15.479 of databases around what I'm loosely calling[br]the commercial 0:32:15.479,0:32:20.149 fringe. So Couchbase is the Couch guys and[br]sort 0:32:20.149,0:32:24.019 of some commercial Memcached guys who got[br]together to 0:32:24.019,0:32:28.409 make a hybrid something. Aerospike is, their[br]marketing is 0:32:28.409,0:32:31.519 great. That's about the best you can say about 0:32:31.519,0:32:31.869 it. 
0:32:31.869,0:32:33.289 So there's a whole bunch of people trying[br]to 0:32:33.289,0:32:36.799 solve these problems in interesting ways.[br]But all of 0:32:36.799,0:32:40.720 these ones cost money and, you know, they're,[br]the 0:32:40.720,0:32:42.200 mileage varies and all of that kind of stuff. 0:32:42.200,0:32:43.539 The cool thing about open source ones is[br]you 0:32:43.539,0:32:45.029 get it and you try it and you hate 0:32:45.029,0:32:46.570 it and you go back to PostGres so it's 0:32:46.570,0:32:48.190 all fine. 0:32:48.190,0:32:53.190 So, HyperDex. This is my favorite. Because[br]they have 0:32:53.190,0:32:58.379 HyperSpace Hashing, and it is so cool. These[br]guys 0:32:58.379,0:33:02.369 are making some really broad, amazing claims[br]about the, 0:33:02.369,0:33:06.549 the kind of things that they can do. Crazy 0:33:06.549,0:33:08.690 fast. It's, it's a key-value store but it[br]will 0:33:08.690,0:33:11.599 index, you know, it's not just a key but 0:33:11.599,0:33:14.039 it will index the properties of a value. So 0:33:14.039,0:33:16.509 now you can do que, you know, genuine queries 0:33:16.509,0:33:20.629 into the structure of objects that you're[br]storing. 0:33:20.629,0:33:23.499 They've got a whole bunch of papers around[br]what 0:33:23.499,0:33:27.299 they're doing. So, you can read that as, who 0:33:27.299,0:33:29.679 knows what it means. It maps objects to coordinates 0:33:29.679,0:33:34.529 in a multi-dimensional Euclidean space. HyperSpace.[br]And I'm like. 0:33:34.529,0:33:37.109 Take my money! 0:33:37.109,0:33:40.989 And there's a, there's a picture of HyperSpace.[br]And, 0:33:40.989,0:33:43.659 like, I've read that like eight times. I don't 0:33:43.659,0:33:49.999 understand what's going on. But if, it does[br]seem 0:33:49.999,0:33:52.070 to be true.
They're trying to solve some of 0:33:52.070,0:33:54.720 these problems and, you know, they call themselves[br]like 0:33:54.720,0:33:59.659 a second generation NoSQL thing, in a similar[br]way 0:33:59.659,0:34:01.669 to Google, you know, kind of taking all of 0:34:01.669,0:34:05.039 this stuff and trying to push the science[br]underneath 0:34:05.039,0:34:06.999 it forward. 0:34:06.999,0:34:09.510 So you can, you know, it's got a Ruby 0:34:09.510,0:34:12.960 client. You can use it now. It's got, just, 0:34:12.960,0:34:18.429 normal key-value. It's got atomic stuff. You[br]can do 0:34:18.429,0:34:22.969 conditional puts, so this is some code that's[br]basically 0:34:22.969,0:34:26.860 only updating if the, only updating the[br]current 0:34:26.860,0:34:31.969 balance if the, updating the balance if the[br]current 0:34:31.969,0:34:34.460 balance is what we think it is. Otherwise[br]some 0:34:34.460,0:34:36.460 other thread has updated it. 0:34:36.460,0:34:38.889 So there's some really interesting stuff they[br]can do. 0:34:38.889,0:34:43.650 And they're guaranteeing those operations[br]across the cluster. And 0:34:43.650,0:34:45.620 it's also got a transactional engine as well,[br]so 0:34:45.620,0:34:47.250 that's really exciting. 0:34:47.250,0:34:51.610 Running out of time. HBase and Hadoop. You[br]don't 0:34:51.610,0:34:54.679 have any of these problems. Don't worry about[br]it. 0:34:54.679,0:34:56.219 You probably don't want to have any of these 0:34:56.219,0:34:59.840 problems. Cause this just ends up, you need[br]to 0:34:59.840,0:35:03.870 install every fucking thing the Apache Foundation[br]has ever 0:35:03.870,0:35:08.240 made. And this isn't even the full list. This 0:35:08.240,0:35:09.980 is like, you probably need those.
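The conditional put described a moment ago (only update the balance if it is still what we last read) can be sketched in plain Ruby. This is not the HyperDex client API: `TinyStore` and `cond_put` are hypothetical names, and a single-process `Mutex` stands in for whatever the real cluster does to guarantee atomicity.

```ruby
# Toy in-process key-value store with a compare-and-set style write.
class TinyStore
  def initialize
    @data = {}
    @lock = Mutex.new
  end

  def put(key, value)
    @lock.synchronize { @data[key] = value }
  end

  def get(key)
    @lock.synchronize { @data[key] }
  end

  # Write new_value only if the stored value still equals expected.
  # Returns true on success, false if another writer got there first.
  def cond_put(key, expected, new_value)
    @lock.synchronize do
      return false unless @data[key] == expected
      @data[key] = new_value
      true
    end
  end
end

store = TinyStore.new
store.put("account:1", 100)

seen = store.get("account:1")                       # we read 100
puts store.cond_put("account:1", seen, seen - 30)   # balance still 100, so this applies
puts store.cond_put("account:1", seen, seen - 30)   # balance is now 70, so this is rejected
```

When the second call is rejected, the usual pattern is to re-read the value and retry, rather than overwrite someone else's update.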
0:35:09.980,0:35:12.620 I have a friend, he's a bit of a 0:35:12.620,0:35:16.870 dick, and he, he calls it, cause he, he 0:35:16.870,0:35:19.710 works in an actual big data organization,[br]and he 0:35:19.710,0:35:21.630 just, he goes, oh, you people with your small 0:35:21.630,0:35:25.970 to medium data. So, yeah, like, most of us, 0:35:25.970,0:35:27.630 we don't have big data in any sense of 0:35:27.630,0:35:31.060 the word, really. Like, if, if it's got GB 0:35:31.060,0:35:34.600 on the end of it, meh. You're not there 0:35:34.600,0:35:35.530 yet. 0:35:35.530,0:35:40.730 So, again, this is just you know, Facebook[br]is 0:35:40.730,0:35:42.270 using the hell out of this stuff, and they're 0:35:42.270,0:35:44.860 just like, this is all out of date. They're 0:35:44.860,0:35:49.590 like now just, they can't buy hard disks fast 0:35:49.590,0:35:53.930 enough. It's crazy. Yeah. There was a punch[br]line 0:35:53.930,0:35:56.380 at the end of all of that. 0:35:56.380,0:35:57.920 But my friend, the guy who I said was 0:35:57.920,0:36:00.960 a bit of a dick, he, he recommends having 0:36:00.960,0:36:04.230 a look at this. And this is his quote, 0:36:04.230,0:36:07.090 if you want to appear really cool and underground, 0:36:07.090,0:36:09.140 then I reckon the next big thing is the 0:36:09.140,0:36:12.280 Berkeley Data Analytics Stack. So, there's[br]a whole bunch 0:36:12.280,0:36:15.580 of people who are looking at that, you know, 0:36:15.580,0:36:18.180 crazy big data situation and trying to work[br]out 0:36:18.180,0:36:22.210 what that means and what the future is. 0:36:22.210,0:36:24.800 And so Apache and Berkeley are kind of in 0:36:24.800,0:36:26.940 a cold war for that at the moment. And 0:36:26.940,0:36:29.140 then there's heaps of people in the enterprise[br]space 0:36:29.140,0:36:31.850 because you can sell lots of products and[br]or 0:36:31.850,0:36:34.590 services to large companies who think they[br]have a 0:36:34.590,0:36:37.710 big data problem. 
So that's cool. 0:36:37.710,0:36:39.650 That's fine. This isn't, this is just a little 0:36:39.650,0:36:44.990 thing that's an embeddable document key-value[br]store that you 0:36:44.990,0:36:47.430 can, it's just kind of a fun thing and 0:36:47.430,0:36:49.210 has an API that looks very similar to the 0:36:49.210,0:36:52.520 Mongo one. And it just sits in process. 0:36:52.520,0:36:56.210 Oh, ElasticSearch. Every time I use it, I[br]think, 0:36:56.210,0:37:01.400 why can you not be my database? It's awesome. 0:37:01.400,0:37:03.370 But it loses a couple of points there because 0:37:03.370,0:37:08.920 of its configurationability. It went, it works[br]when you 0:37:08.920,0:37:10.830 know how to make it work, and it's crazy 0:37:10.830,0:37:12.680 complicated sometimes. 0:37:12.680,0:37:19.640 So anyway. Thirty. Four minutes over technically,[br]I think. 0:37:19.640,0:37:21.950 Yeah. So that's good. 0:37:21.950,0:37:28.950 That's databases in a nutshell. I'm Toby Hede.[br]I'm 0:37:29.160,0:37:31.280 around the conference if you want to talk[br]about 0:37:31.280,0:37:35.340 databases. I think of myself as a lapa-, a 0:37:35.340,0:37:39.320 lap- a butterfly collector, I guess, is what[br]I'm 0:37:39.320,0:37:41.200 looking for, of databases. 0:37:41.200,0:37:45.960 Yeah. So come and say hi. Cool.