WEBVTT 00:00:16.830 --> 00:00:17.590 TOBY HEDE: Good morning everybody. 00:00:17.590 --> 00:00:24.369 Friday. Yes. It's been a long week. I'm excited. 00:00:24.369 --> 00:00:29.279 I'm highly caffeinated. So without further ado, 00:00:29.279 --> 00:00:34.180 I present An Ode to 17 Databases in 33 Minutes. 00:00:34.180 --> 00:00:37.870 I'm gonna mangle a large number of metaphors. 00:00:37.870 --> 00:00:40.820 There'll be a lot of animated gifs. 00:00:40.820 --> 00:00:44.210 I've learned that this week, if you see it like that, 00:00:44.210 --> 00:00:47.910 there's Star Wars, Dungeons and Dragons, 00:00:47.910 --> 00:00:49.350 and all of that's very, unfortunately, stereotypical. 00:00:49.350 --> 00:00:51.900 So a bit of an indictment. 00:00:51.900 --> 00:00:55.839 This whole thing started as a joke. Seventeen databases. 00:00:55.839 --> 00:00:59.159 I actually did in five minutes. Thirty-three minutes is 00:00:59.159 --> 00:01:03.799 worse. The whole thing is just a catastrophe, really. 00:01:03.799 --> 00:01:04.659 But anyway. 00:01:04.659 --> 00:01:07.610 We're gonna cover a whole bunch of different databases 00:01:07.610 --> 00:01:09.890 and a little bit of the underlying theory, and 00:01:09.890 --> 00:01:12.670 hopefully you'll walk out and you'll understand why to 00:01:12.670 --> 00:01:13.729 use Postgres. 00:01:13.729 --> 00:01:14.250 [laughter] 00:01:14.250 --> 00:01:20.180 I'm Toby. You can find me on the internet. 00:01:20.180 --> 00:01:22.260 I work at a company called Ninefold. 00:01:22.260 --> 00:01:26.100 V.O.: We're having a problem, there's no screen. 00:01:26.100 --> 00:01:33.100 T.H.: Oh. No screens. Is that me? 00:01:35.960 --> 00:01:41.210 Before it was, there was no red. So, now 00:01:41.210 --> 00:01:43.860 there's no any, anything. 00:01:43.860 --> 00:01:45.180 V.O.: Nothing. 00:01:45.180 --> 00:01:46.500 T.H.: Hey. 00:01:46.500 --> 00:01:47.820 AUDIENCE: Hey! 00:01:47.820 --> 00:01:51.120 T.H.: I have no slides. 
00:01:51.120 --> 00:01:54.670 Well, you missed my beautiful slides. There's. You missed 00:01:54.670 --> 00:01:57.740 the first animation. That's a shame. You missed the 00:01:57.740 --> 00:02:01.500 list. It's awesome. You missed me and my excellent 00:02:01.500 --> 00:02:05.070 job titles. So yes. 00:02:05.070 --> 00:02:08.340 I work at Ninefold. They have very kindly 00:02:08.340 --> 00:02:12.550 flown me over here from Australia, which explains why 00:02:12.550 --> 00:02:16.690 I sound like I come from the deep south. 00:02:16.690 --> 00:02:18.060 Cause I do. 00:02:18.060 --> 00:02:21.260 Most of this week, this has been me. So 00:02:21.260 --> 00:02:23.560 today I'm finally over the jetlag just in time 00:02:23.560 --> 00:02:26.670 to go home and have it all over again 00:02:26.670 --> 00:02:27.850 next week. 00:02:27.850 --> 00:02:32.450 So, a couple of quick facts about Straya. There 00:02:32.450 --> 00:02:39.120 are much fewer syllables than you're used to using. 00:02:39.120 --> 00:02:43.950 This is an, a genuine Australian politician. He's a 00:02:43.950 --> 00:02:48.060 mining magnate billionaire and he is currently running a 00:02:48.060 --> 00:02:52.880 MVP Jurassic theme park with giant fiberglass dinosaurs. And 00:02:52.880 --> 00:02:56.310 I, I for one am for it. So I 00:02:56.310 --> 00:02:58.510 realize there wasn't enough Star Wars references so this 00:02:58.510 --> 00:03:00.540 is just completely gratuitous. 00:03:00.540 --> 00:03:05.430 Anyway. So. The thrust is that distributed systems are 00:03:05.430 --> 00:03:08.470 hard and databases are fun. Pictured here is a 00:03:08.470 --> 00:03:13.530 distributed system. You can see there's two app nodes 00:03:13.530 --> 00:03:16.660 and then there's two, there's like a master/slave kind 00:03:16.660 --> 00:03:20.920 of setup going on here as well. 
So we're 00:03:20.920 --> 00:03:23.950 gonna talk about some of the complexities of running 00:03:23.950 --> 00:03:27.670 these types of systems, and it's really fun stuff 00:03:27.670 --> 00:03:29.980 once you get under the cover and start thinking 00:03:29.980 --> 00:03:32.250 about some of the complexities. 00:03:32.250 --> 00:03:37.030 So. NoSQL is a thing. We have NewSQL now. 00:03:37.030 --> 00:03:38.780 I'm gonna be covering some of these things. We've 00:03:38.780 --> 00:03:44.000 also got PostSQL, Post-Rock Ambient SQL. And there's a 00:03:44.000 --> 00:03:47.120 whole gamut of these things. They all make my 00:03:47.120 --> 00:03:50.819 brain explode and the, I think the trick to 00:03:50.819 --> 00:03:53.069 understanding all of this stuff is to actually think 00:03:53.069 --> 00:03:55.459 about some of what's happening underneath. And you can 00:03:55.459 --> 00:03:59.650 make decisions about your databases. 00:03:59.650 --> 00:04:01.700 Hopefully you're all familiar with some of the concepts 00:04:01.700 --> 00:04:07.250 of traditional relational databases. We have ACID, which provides 00:04:07.250 --> 00:04:10.640 certain guarantees about the way that your data behaves. 00:04:10.640 --> 00:04:13.129 You can update data and be sure it was 00:04:13.129 --> 00:04:18.099 updated. Things are isolated from each other. Things persist 00:04:18.099 --> 00:04:20.970 over time. 00:04:20.970 --> 00:04:23.129 Another thing that you may have heard of, this 00:04:23.129 --> 00:04:25.740 is a, this is a leap that I need 00:04:25.740 --> 00:04:27.990 to another animation, is a thing called the CAP 00:04:27.990 --> 00:04:30.879 Theorem. So this gets talked about a lot when 00:04:30.879 --> 00:04:34.889 we start talking about this new generation of databases. 
00:04:34.889 --> 00:04:39.599 CAP stands for consistency, availability, and partition tolerance, and 00:04:39.599 --> 00:04:44.430 it provides, basically, some strong foundation for reasoning about 00:04:44.430 --> 00:04:48.050 the way distributed systems behave and how they interoperate 00:04:48.050 --> 00:04:49.680 and how they communicate. So I'm gonna give you 00:04:49.680 --> 00:04:52.620 a brief introduction to how that all kind of 00:04:52.620 --> 00:04:52.969 works. 00:04:52.969 --> 00:04:57.279 So, the original CAP Theorem, as stated, was, is 00:04:57.279 --> 00:04:59.610 called Brewer's Conjecture. A guy called Brewer just sort 00:04:59.610 --> 00:05:02.659 of had this idea. It's actually on some really 00:05:02.659 --> 00:05:06.680 awesomely-designed PowerPoint slides from some thing he did. And 00:05:06.680 --> 00:05:11.789 he was saying that with consistency, availability, and partition 00:05:11.789 --> 00:05:15.210 tolerance - so the data can, can only be 00:05:15.210 --> 00:05:17.439 two of these things at any one time. So 00:05:17.439 --> 00:05:20.039 the data can be consistent or it can be 00:05:20.039 --> 00:05:23.979 accessible or it can handle network failures. 00:05:23.979 --> 00:05:28.249 So people then took this conjecture and actually made 00:05:28.249 --> 00:05:31.800 a formal kind of proof in, in much more 00:05:31.800 --> 00:05:38.800 rigorous computer science terms. And actually said, it's impossible, 00:05:38.830 --> 00:05:42.580 in an asynchronous network model, to implement a read/write 00:05:42.580 --> 00:05:49.210 data object that is simultaneously available and is also 00:05:49.210 --> 00:05:50.979 atomically consistent. 00:05:50.979 --> 00:05:53.110 And so all of this stuff around NewSQL and 00:05:53.110 --> 00:05:56.939 NoSQL and bleh, all of that stuff, is about 00:05:56.939 --> 00:06:01.610 manipulating these different variables. 
There's also a thing called 00:06:01.610 --> 00:06:02.990 BASE but I'm not gonna talk about it cause 00:06:02.990 --> 00:06:05.789 it's actually just a made-up acronym that has no 00:06:05.789 --> 00:06:06.900 relevance to anything. 00:06:06.900 --> 00:06:10.059 So, what, what does CAP actually, what, what are 00:06:10.059 --> 00:06:13.649 we talking about here? And why is it important? 00:06:13.649 --> 00:06:16.719 It's important, actually, because everything is already distributed. What 00:06:16.719 --> 00:06:20.309 we do today is inherently a distributed system. You 00:06:20.309 --> 00:06:23.229 have a browser talking to a server, an app 00:06:23.229 --> 00:06:25.520 server, Rails server - cause we're at RailsConf - 00:06:25.520 --> 00:06:29.249 and then that's talking to a Postgres database, or 00:06:29.249 --> 00:06:33.809 a MySQL database or something even fancier and shinier. 00:06:33.809 --> 00:06:36.110 That's a distributed system. And as we move into 00:06:36.110 --> 00:06:41.460 more heavy client-based operations, that distribution is getting much 00:06:41.460 --> 00:06:43.759 more front-loaded, so you, you've got state in the 00:06:43.759 --> 00:06:46.589 browser that's now synchronizing with state on the server. 00:06:46.589 --> 00:06:50.300 So we already actually suffer many of these problems. 00:06:50.300 --> 00:06:55.270 This is a handy and completely untrue guide to 00:06:55.270 --> 00:06:59.389 NoSQL systems and breaking them into this idea of 00:06:59.389 --> 00:07:03.039 some things are available and some things are consistent. 00:07:03.039 --> 00:07:06.990 So, all of that is almost but not quite 00:07:06.990 --> 00:07:08.619 entirely untrue. 
00:07:08.619 --> 00:07:12.800 What the actual theorem says is that under a 00:07:12.800 --> 00:07:16.490 network failure - so you've got multiple nodes and 00:07:16.490 --> 00:07:20.159 they now can no longer communicate - you can 00:07:20.159 --> 00:07:23.490 choose whether the data is consistent or whether the 00:07:23.490 --> 00:07:28.849 data is available. And I have some demonstrations here 00:07:28.849 --> 00:07:30.860 to just - it actually ends up being very 00:07:30.860 --> 00:07:31.580 easy to understand. 00:07:31.580 --> 00:07:36.009 So, here we have typical cluster of nodes working 00:07:36.009 --> 00:07:41.110 together. We're gonna model some communication between them. So 00:07:41.110 --> 00:07:45.669 there's a, there's a write on this system. It 00:07:45.669 --> 00:07:49.449 comes in, that gets replicated across, and then on 00:07:49.449 --> 00:07:51.240 the other system we now have that data coming 00:07:51.240 --> 00:07:53.309 out. Someone's doing a read. And so this is 00:07:53.309 --> 00:07:57.679 the kind of situation that we're talking about. So 00:07:57.679 --> 00:08:01.469 whether you're doing master/slave setup in a relational database 00:08:01.469 --> 00:08:06.129 or something trickier, this is kind of the way 00:08:06.129 --> 00:08:08.300 it works. A node gets some data and it 00:08:08.300 --> 00:08:12.020 gives it to another node, and they have the 00:08:12.020 --> 00:08:13.860 same information. 00:08:13.860 --> 00:08:17.839 So when there's a network partition, that, they no 00:08:17.839 --> 00:08:22.119 longer can communicate. So a write comes in, and 00:08:22.119 --> 00:08:25.869 now we have to make a decision. And all 00:08:25.869 --> 00:08:27.740 of this is actually just science, as you can 00:08:27.740 --> 00:08:30.759 tell from this diagram. If those two nodes can't 00:08:30.759 --> 00:08:33.399 communicate, you can talk to the one that got 00:08:33.399 --> 00:08:36.890 the write - that's consistent. It got the write. 
00:08:36.890 --> 00:08:39.219 It can now, can read out that same data. 00:08:39.219 --> 00:08:40.190 That's all cool. 00:08:40.190 --> 00:08:43.870 Or, you can have both nodes still communicating, and 00:08:43.870 --> 00:08:46.339 now you have someone reading data that is no 00:08:46.339 --> 00:08:49.100 longer in the right state. So we've got, you 00:08:49.100 --> 00:08:51.650 know, we have updated a bank account. It's got 00:08:51.650 --> 00:08:53.680 a hundred dollars in it. It used to have 00:08:53.680 --> 00:08:56.770 ten dollars in it. These people are reading ten. 00:08:56.770 --> 00:08:59.020 These people are reading a hundred. That's available. The 00:08:59.020 --> 00:09:01.200 data is now not consistent. But all of the 00:09:01.200 --> 00:09:03.220 nodes can send back that data. 00:09:03.220 --> 00:09:06.660 And so all of the discussion about CAP Theorem 00:09:06.660 --> 00:09:11.020 and, and you know, people even claiming, we've defeated 00:09:11.020 --> 00:09:14.040 the CAP Theorem in our database at, you know, 00:09:14.040 --> 00:09:19.430 low-low prices is incredibly awesome. Just remember this image. 00:09:19.430 --> 00:09:26.000 Two things that cannot communicate cannot communicate. It's science. 00:09:26.000 --> 00:09:28.250 And then when they can communicate, we're back into 00:09:28.250 --> 00:09:30.850 the realm of normal operations and things get a 00:09:30.850 --> 00:09:35.090 lot easier. If you were interested in any of 00:09:35.090 --> 00:09:39.540 the guts of how these things work, definitely have 00:09:39.540 --> 00:09:42.430 a look at a thing called Jepsen, which is 00:09:42.430 --> 00:09:47.670 this crazy motherfucker who is just analyzing the network 00:09:47.670 --> 00:09:51.510 operations of a whole variety of distributed systems, and 00:09:51.510 --> 00:09:54.900 it will, it's just, it will blow your mind. 00:09:54.900 --> 00:10:01.080 OK. Good. That's, that's why. Now I remember. 00:10:01.080 --> 00:10:04.590 So, here is our cast. 
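The bank-account scenario above (a write of ten, a partition, then a write of a hundred) can be modelled in a few lines. This is only a toy sketch under stated assumptions: the names `TwoNodeCluster`, `Unavailable` and `demo` are made up for illustration, and the "nodes" are plain dictionaries rather than real servers. It shows the one choice the theorem leaves you during a partition: refuse to answer, or answer with possibly stale data.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk stale data."""


class TwoNodeCluster:
    """Toy model: writes hit node A and replicate to node B while the link is up."""

    def __init__(self, mode):
        if mode not in ("consistent", "available"):
            raise ValueError("mode must be 'consistent' or 'available'")
        self.mode = mode
        self.node_a = {}
        self.node_b = {}
        self.partitioned = False

    def write(self, key, value):
        self.node_a[key] = value
        if not self.partitioned:
            self.node_b[key] = value  # replication only works while connected

    def read_a(self, key):
        return self.node_a.get(key)  # A took the write, so it is always current

    def read_b(self, key):
        if self.partitioned and self.mode == "consistent":
            raise Unavailable("node B cannot reach node A")
        return self.node_b.get(key)  # may be stale in 'available' mode


def demo(mode):
    """Replay the bank-account story and report what each node returns."""
    cluster = TwoNodeCluster(mode)
    cluster.write("balance", 10)    # replicated to both nodes
    cluster.partitioned = True      # the network link goes away
    cluster.write("balance", 100)   # only node A sees this write
    try:
        seen_on_b = cluster.read_b("balance")
    except Unavailable:
        seen_on_b = "unavailable"
    return cluster.read_a("balance"), seen_on_b
```

Calling `demo("available")` has node B still answering with the old balance, while `demo("consistent")` has node B refusing to answer until the link comes back.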
We're about to go 00:10:04.590 --> 00:10:08.850 on an adventure through a tortured maze of ridiculous 00:10:08.850 --> 00:10:11.620 Dungeons and Dragons metaphors. But, first of all, a 00:10:11.620 --> 00:10:14.960 shout out to the OwlBear. Yeah. The thing I 00:10:14.960 --> 00:10:18.740 love about the OwlBear is they've taken the wrong, 00:10:18.740 --> 00:10:22.680 the least scary aspects of a bear and an 00:10:22.680 --> 00:10:26.260 owl, like if that was an owl with, you 00:10:26.260 --> 00:10:30.120 know, if it had a bear's head and wings, 00:10:30.120 --> 00:10:33.620 that would be way more scary. Anyway. 00:10:33.620 --> 00:10:37.130 It's just been bugging me for months. So. 00:10:37.130 --> 00:10:41.670 Postgres. As we all know, it's MySQL for hipsters. 00:10:41.670 --> 00:10:45.050 It's actually pretty good. So here's its character reference 00:10:45.050 --> 00:10:49.320 sheet. We, it's a relational database. It has a 00:10:49.320 --> 00:10:53.760 consistent model. So under conditions of network partition, you 00:10:53.760 --> 00:10:56.650 know, your, your slave is not in contact with 00:10:56.650 --> 00:11:00.480 the master, it's, it's essentially unavailable. That's the way 00:11:00.480 --> 00:11:01.730 we treat it. 00:11:01.730 --> 00:11:05.370 Postgres is actually really, really interesting tech, because it 00:11:05.370 --> 00:11:10.290 has a bunch of cool stuff hidden underneath it. 00:11:10.290 --> 00:11:13.010 So there's a thing called hstore which is a 00:11:13.010 --> 00:11:15.650 key-value store that's baked right in. So if you 00:11:15.650 --> 00:11:18.720 need a lightweight key-value store and you're already running 00:11:18.720 --> 00:11:22.700 Postgres in production, you, you have one. You don't 00:11:22.700 --> 00:11:25.600 need to spin up any other thing. You can 00:11:25.600 --> 00:11:27.450 actually do that today. 
00:11:27.450 --> 00:11:30.320 The really interesting thing about that is, you can 00:11:30.320 --> 00:11:34.100 index those keys. You can do joins across an 00:11:34.100 --> 00:11:38.230 hstore reference into, across multiple tables. It looks and 00:11:38.230 --> 00:11:40.250 feels exactly like the kind of thing that you're 00:11:40.250 --> 00:11:42.060 already working with. 00:11:42.060 --> 00:11:46.560 We've got, there's some things already baked into the 00:11:46.560 --> 00:11:49.380 Rails ecosystem that make this really easy if you're 00:11:49.380 --> 00:11:52.800 doing that kind of information. But the really exciting 00:11:52.800 --> 00:11:55.570 thing about what Postgres is up to at the 00:11:55.570 --> 00:12:01.720 moment is JSON. And 9.2, 9.3, and upcoming 9.4 00:12:01.720 --> 00:12:05.810 have pretty much a fully baked in JSON document 00:12:05.810 --> 00:12:11.190 database. And it is crazy awesome. The new one 00:12:11.190 --> 00:12:14.350 is super high-performance. If you were sort of, it's 00:12:14.350 --> 00:12:17.400 the same thing. If you're thinking, ah, you know, 00:12:17.400 --> 00:12:20.250 documents would be easier for this use case, let's 00:12:20.250 --> 00:12:24.730 install something else, we're actually, you already have one, 00:12:24.730 --> 00:12:26.690 and it, it has all of those same properties. 00:12:26.690 --> 00:12:28.610 You can index. You can do joins across your 00:12:28.610 --> 00:12:32.760 normal table into the documents. It's crazy cool. 00:12:32.760 --> 00:12:36.670 MySQL. It's pretty much the same as Postgres, is 00:12:36.670 --> 00:12:42.310 my answer. But there's a slight caveat. So, you 00:12:42.310 --> 00:12:47.900 know, I, I recall, they're a company. Many of 00:12:47.900 --> 00:12:50.180 the same things apply. Like, this is why, you 00:12:50.180 --> 00:12:52.380 know, they're, they're kind of in the same bucket. 
00:12:52.380 --> 00:12:55.950 For me, it doesn't particularly matter at the end 00:12:55.950 --> 00:12:58.190 of the day. Whatever you happen to have expertise 00:12:58.190 --> 00:13:01.170 in, it's cool. It's got some kind of interesting 00:13:01.170 --> 00:13:02.870 things that you can do. You can switch out 00:13:02.870 --> 00:13:07.700 storage engines to actually get your different performance profiles. 00:13:07.700 --> 00:13:11.530 It is everywhere. It's got a thing called 00:13:11.530 --> 00:13:16.410 HandlerSocket, which is essentially raw, right, access through a 00:13:16.410 --> 00:13:19.740 low-level socket into the table infrastructure. There's some paper 00:13:19.740 --> 00:13:24.440 with really high performance kind of things. 00:13:24.440 --> 00:13:26.660 You can actually just sort of bypass the whole 00:13:26.660 --> 00:13:29.560 SQL engine, which is kind of interesting. The other 00:13:29.560 --> 00:13:31.870 thing that's happened since Oracle took over, which is 00:13:31.870 --> 00:13:35.340 kind of a really good thing, is that there's 00:13:35.340 --> 00:13:40.470 some alternatives. So MariaDB is sort of the, the 00:13:40.470 --> 00:13:44.650 more open fork. There's a semi-commercial edition that has 00:13:44.650 --> 00:13:47.900 lots of really high-performance features, and they basically run 00:13:47.900 --> 00:13:51.610 binary compatible patches, that's Percona. And they have, like, 00:13:51.610 --> 00:13:55.610 huge expertise. And this Toku is quite interesting. It's, 00:13:55.610 --> 00:13:58.400 they're doing all of this crazy fractal indexing and 00:13:58.400 --> 00:14:02.250 things for particular use cases on very large datasets. 00:14:02.250 --> 00:14:04.900 But it still just looks and behaves in many 00:14:04.900 --> 00:14:07.890 ways like the MySQL that you are kind of 00:14:07.890 --> 00:14:08.630 used to. 00:14:08.630 --> 00:14:13.130 So, there's some interesting things happening there. 
So these, 00:14:13.130 --> 00:14:16.640 hopefully none of that's a huge surprise. That's databases. 00:14:16.640 --> 00:14:21.320 You use it. It comes in the box, and 00:14:21.320 --> 00:14:22.600 ActiveRecord talks to it. 00:14:22.600 --> 00:14:24.779 So now we're gonna get slightly off the beaten 00:14:24.779 --> 00:14:30.370 track. So, a lot of what we know as NoSQL 00:14:30.370 --> 00:14:35.370 comes from Dynamo, which was actually a paper that 00:14:35.370 --> 00:14:40.160 Amazon released years ago. I'm not gonna labor too 00:14:40.160 --> 00:14:42.460 much on this one. The paper's quite interesting. It 00:14:42.460 --> 00:14:48.000 talks about how you make a distributed system. 00:14:48.000 --> 00:14:51.930 The interesting thing is actually that Riak is essentially 00:14:51.930 --> 00:14:55.300 an implementation of the underlying Dynamo theory. So Riak 00:14:55.300 --> 00:14:58.340 is crazy awesome. This is what happens to you 00:14:58.340 --> 00:15:01.580 when you run Riak in production. 00:15:01.580 --> 00:15:02.430 [laughter] 00:15:02.430 --> 00:15:05.930 I pretty much, like, it's a conversation I, I 00:15:05.930 --> 00:15:08.720 often have with people is like, wouldn't it be 00:15:08.720 --> 00:15:12.750 awesome to have a problem that needed Riak? And 00:15:12.750 --> 00:15:13.710 it was like, yeah, that would be so cool. 00:15:13.710 --> 00:15:17.970 I'd be like the awesomeness engineer. 00:15:17.970 --> 00:15:21.430 So Riak is, it's just crazy-well engineered. They're doing 00:15:21.430 --> 00:15:26.870 all sorts of interesting stuff. It's inherently, it just 00:15:26.870 --> 00:15:30.680 understands clustering. You know, you add a new node, 00:15:30.680 --> 00:15:35.260 it just, it's there. You know. With, with those 00:15:35.260 --> 00:15:37.610 older kind of databases, it's, it's a pain in 00:15:37.610 --> 00:15:40.170 the ass to actually get it working. 00:15:40.170 --> 00:15:45.339 So, yeah, they're doing some really interesting things. 
It's 00:15:45.339 --> 00:15:47.920 got a cloud storage thing so you've got an 00:15:47.920 --> 00:15:50.120 S3-compatible API and all of these kind of stuff. 00:15:50.120 --> 00:15:51.360 A lot of the magic of the way this 00:15:51.360 --> 00:15:56.660 works is through consistent hashing. So, my slides are 00:15:56.660 --> 00:15:58.380 all mucked up. But anyway. 00:15:58.380 --> 00:16:00.640 So, basically what it does is it just partitions 00:16:00.640 --> 00:16:05.350 all of your data into a giant hash ring. 00:16:05.350 --> 00:16:10.450 Excuse me. Physical nodes then just own parts of 00:16:10.450 --> 00:16:12.370 that hash. You add a new node or take 00:16:12.370 --> 00:16:15.720 a node away and it repartitions all the rest 00:16:15.720 --> 00:16:17.940 of the data across the remaining nodes. And all 00:16:17.940 --> 00:16:21.480 of that is just completely in the background of 00:16:21.480 --> 00:16:24.680 how Riak just works operationally. 00:16:24.680 --> 00:16:27.300 So for large scale data and, you know, you, 00:16:27.300 --> 00:16:30.710 you get away with, it has some really nice 00:16:30.710 --> 00:16:34.170 operational characteristics that, that make it quite cool to 00:16:34.170 --> 00:16:34.750 manage. 00:16:34.750 --> 00:16:36.529 And then the other thing is, it's a very 00:16:36.529 --> 00:16:40.190 simple API. It's key-value store, you can store JSON 00:16:40.190 --> 00:16:42.220 documents in it, and it's just a bucket that 00:16:42.220 --> 00:16:45.130 has keys, and then it's got other stuff on 00:16:45.130 --> 00:16:49.510 top to retrieve data, do secondary indexes and searching 00:16:49.510 --> 00:16:51.130 and all of that kind of stuff. 00:16:51.130 --> 00:16:54.220 So, it's a very cool piece of tech. 00:16:54.220 --> 00:16:59.080 So, the other one we've got is, Google. Fucking 00:16:59.080 --> 00:17:03.810 annoying. And you'll see why in a second. 
So, 00:17:03.810 --> 00:17:06.980 Google had this thing called Bigtable that, again, kind 00:17:06.980 --> 00:17:10.470 of comes out of the internal research. You have 00:17:10.470 --> 00:17:14.299 access to it through some of their cloud properties. 00:17:14.299 --> 00:17:16.799 As you can see, it's got, it's actually a 00:17:16.799 --> 00:17:21.289 sparse distributed multidimensional sorted map, which is good, I 00:17:21.289 --> 00:17:23.618 guess. I imagine. It's awesome. 00:17:23.618 --> 00:17:27.720 The stuff they're doing with this is crazy. So 00:17:27.720 --> 00:17:30.190 this is actually a, all, a couple years old 00:17:30.190 --> 00:17:33.409 I think now. Some of these, some of the 00:17:33.409 --> 00:17:37.190 information, so. Hundreds of petabytes of data, you know, 00:17:37.190 --> 00:17:40.580 ridiculous numbers of operations a second. You do not 00:17:40.580 --> 00:17:42.649 have any of these problems. 00:17:42.649 --> 00:17:46.879 So, then they, they took this stuff, they were 00:17:46.879 --> 00:17:50.210 like, ah, we've got Bigtable. You know, that was, 00:17:50.210 --> 00:17:53.499 that was fucking easy. Whatever. And so now they've 00:17:53.499 --> 00:17:55.480 got two other things. They've got one called Spanner 00:17:55.480 --> 00:18:00.019 and one called F1, where they're basically doing, you 00:18:00.019 --> 00:18:07.019 know, proper, sort of relational looking data across multiple 00:18:07.350 --> 00:18:10.320 data centers and, you know, and. They're kind of 00:18:10.320 --> 00:18:12.590 really pushing the boundaries of some of that CAP 00:18:12.590 --> 00:18:14.710 stuff that's going on. 00:18:14.710 --> 00:18:18.490 But all you need is a GPS in every 00:18:18.490 --> 00:18:21.379 server, a couple of atomic clocks in each data 00:18:21.379 --> 00:18:26.830 center, and you, great. So, Google's basically telling everyone 00:18:26.830 --> 00:18:29.720 to, you know, just fuck off. 
00:18:29.720 --> 00:18:35.169 So, another one that I really, I really like, 00:18:35.169 --> 00:18:39.490 and have used a long, a long time ago 00:18:39.490 --> 00:18:45.690 in, in tech land, tech time, is Cassandra. Cassandra 00:18:45.690 --> 00:18:50.110 is a column-oriented database. Eventually it's awesome. It's really 00:18:50.110 --> 00:18:54.240 all about eventual consistency. 00:18:54.240 --> 00:18:57.519 And you can see here, this is a man, 00:18:57.519 --> 00:18:59.259 he eventually gets it right. So that's well done 00:18:59.259 --> 00:19:02.360 to him there. So Cassandra's a lot like that. 00:19:02.360 --> 00:19:06.019 And, again, you know, the cool thing is, it's 00:19:06.019 --> 00:19:10.549 a sparse distributed multidimensional sorted map. It, when 00:19:10.549 --> 00:19:13.350 I was working with it, you, it was, you 00:19:13.350 --> 00:19:16.100 had, you described your tables kind of thing in 00:19:16.100 --> 00:19:20.309 XML and hated yourself, and then every time something 00:19:20.309 --> 00:19:23.460 changed you rebooted the server and that took a while 00:19:23.460 --> 00:19:27.389 and, yeah, the whole thing was really difficult. 00:19:27.389 --> 00:19:30.570 What it basically does is it takes the availability 00:19:30.570 --> 00:19:33.570 side of the question. Like, that's its world model. 00:19:33.570 --> 00:19:37.830 It has, again, a very simple clustering system. New 00:19:37.830 --> 00:19:41.289 nodes, add in, the data gets streamed out. It 00:19:41.289 --> 00:19:46.070 has a data model that is really complicated, and 00:19:46.070 --> 00:19:48.470 I, even though I've used it, it's really hard 00:19:48.470 --> 00:19:50.909 to explain how it actually works. 00:19:50.909 --> 00:19:54.730 So column databases basically kind of invert the, the 00:19:54.730 --> 00:19:56.700 whole table structure that you're used to from the 00:19:56.700 --> 00:20:01.190 relational world. 
And the advantage is that, for some 00:20:01.190 --> 00:20:04.159 types of data, and for some queries, it is 00:20:04.159 --> 00:20:07.600 crazy blazing fast, cause you can just. Time series 00:20:07.600 --> 00:20:08.619 are always a good one, where you can just 00:20:08.619 --> 00:20:10.929 have long streams of time series and it will 00:20:10.929 --> 00:20:13.490 actually put that on disk all next to each 00:20:13.490 --> 00:20:15.600 other and you can just pull it all out. 00:20:15.600 --> 00:20:18.509 The cool thing in the new versions of Cassandra 00:20:18.509 --> 00:20:22.299 is that they've abstracted all of that out, and 00:20:22.299 --> 00:20:25.570 you actually just get tables, so you can create 00:20:25.570 --> 00:20:28.200 a table and give it a primary key, and 00:20:28.200 --> 00:20:32.239 under the covers, it's setting up rows and column 00:20:32.239 --> 00:20:35.239 families and columns and all of, all of these 00:20:35.239 --> 00:20:39.389 really abstract concepts, and they've completely made some of 00:20:39.389 --> 00:20:41.499 that go away. Which is really nice. 00:20:41.499 --> 00:20:43.929 So you end up with something that looks a 00:20:43.929 --> 00:20:48.739 lot like just SQL and, you know, a normal 00:20:48.739 --> 00:20:52.649 table kind of structure. It's just clustering out lots 00:20:52.649 --> 00:20:55.100 of nodes. It's very tunable, so you can actually 00:20:55.100 --> 00:20:57.989 set up, you know, it writes to a node 00:20:57.989 --> 00:21:00.019 and you can say, actually write to five nodes 00:21:00.019 --> 00:21:02.019 and that's a quorum and now we're cool. So 00:21:02.019 --> 00:21:06.019 you can tune how much redundancy you have. 00:21:06.019 --> 00:21:12.590 So that's kind of cool. That is a reminder. 00:21:12.590 --> 00:21:17.559 That went cold really fast. Thank you. 00:21:17.559 --> 00:21:20.830 So, the next one on our list is Memcache. 
00:21:20.830 --> 00:21:24.220 Memcache, there was, there was a talk earlier in 00:21:24.220 --> 00:21:27.529 the week that was describing using Memcache and caching 00:21:27.529 --> 00:21:29.789 and it, it had a very interesting observation, which 00:21:29.789 --> 00:21:32.669 was, it just works. He didn't even know what 00:21:32.669 --> 00:21:36.590 version he was running in production, cause neh. Doesn't 00:21:36.590 --> 00:21:38.739 matter. That API has been stable for ages. 00:21:38.739 --> 00:21:42.419 And I know, I know what you're saying. It's 00:21:42.419 --> 00:21:45.559 not a database. It's a cache. Technically true. But 00:21:45.559 --> 00:21:48.049 it's interesting to think about, because the moment you 00:21:48.049 --> 00:21:51.379 add caching, even if you've been ignoring the fact 00:21:51.379 --> 00:21:54.779 that you had a distributed system before, with caching 00:21:54.779 --> 00:21:57.330 you now really have a distributed system. You've got 00:21:57.330 --> 00:21:59.980 data in one thing that may or may not 00:21:59.980 --> 00:22:02.759 be fresh, and you've got data in your database 00:22:02.759 --> 00:22:05.119 that, you know, you assume is up to date, 00:22:05.119 --> 00:22:07.249 and now you've got a synchronization problem. 00:22:07.249 --> 00:22:12.080 So, Memcache is actually really, you know, it's, it's 00:22:12.080 --> 00:22:16.659 just rock solid, old as the hills technology, completely 00:22:16.659 --> 00:22:22.279 simple. The API is everywhere. Lots of people actually 00:22:22.279 --> 00:22:26.119 have made their, you know, key-value store they made 00:22:26.119 --> 00:22:28.309 in the hacknight, which, you know, is a useful 00:22:28.309 --> 00:22:30.739 hobby if you want to annoy everyone. 00:22:30.739 --> 00:22:33.139 You have the, their API is actually the Memcached 00:22:33.139 --> 00:22:36.080 API. It's got a handful of things. You can 00:22:36.080 --> 00:22:40.129 set a key, you can replace one. 
It does 00:22:40.129 --> 00:22:43.679 have some atomic operations so you can increment and 00:22:43.679 --> 00:22:46.149 decrement so that there is some flexibility to actually 00:22:46.149 --> 00:22:51.669 do a little bit of data storage in a, 00:22:51.669 --> 00:22:55.779 in a more traditional sense. 00:22:55.779 --> 00:22:59.389 It's actually a client-server model. Your, your driver is 00:22:59.389 --> 00:23:02.429 responsible for the clustering in a way, so you 00:23:02.429 --> 00:23:07.049 can have multiple Memcache nodes and the, the hashing 00:23:07.049 --> 00:23:11.279 algorithm determines which node, which node a particular piece 00:23:11.279 --> 00:23:13.440 of data is gonna be on. 00:23:13.440 --> 00:23:15.960 That has the property of making it very, very 00:23:15.960 --> 00:23:19.440 simple to use. And there's no cluster state. There's 00:23:19.440 --> 00:23:21.889 no coordination that nodes have. Like, a lot of 00:23:21.889 --> 00:23:23.519 the heavy lifting all of these other things are 00:23:23.519 --> 00:23:27.869 doing is about coordinating around all of that information. 00:23:27.869 --> 00:23:29.749 There's a whole bunch of awesome stuff just baked 00:23:29.749 --> 00:23:34.519 into Rails. So you can just easily cache into 00:23:34.519 --> 00:23:38.940 Memcache, or your normal Rails fragment caching. All of 00:23:38.940 --> 00:23:40.869 that kind of stuff. 00:23:40.869 --> 00:23:42.409 And there's even some things we can, you can 00:23:42.409 --> 00:23:46.289 actually put, push that into ActiveRecord and have, have 00:23:46.289 --> 00:23:48.440 caching at that level as well. 00:23:48.440 --> 00:23:50.700 Redis is an interesting one for the, the Rails 00:23:50.700 --> 00:23:56.580 community. Cause it's basically a queue, now. 
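On the Memcache point about the driver doing the clustering: the idea that the client alone decides which node holds a key, with no coordination between the nodes, might look something like this. It is a minimal sketch, assuming a simple CRC32-modulo scheme; the class `ShardedCacheClient` is hypothetical, the "servers" are in-memory dictionaries standing in for network connections, and real client libraries often use a consistent-hashing scheme instead so that adding a server remaps fewer keys.

```python
import zlib


class ShardedCacheClient:
    """Client-side sharding: the client, not the servers, decides where a key lives."""

    def __init__(self, server_names):
        if not server_names:
            raise ValueError("need at least one server")
        self.server_names = list(server_names)
        # One plain dict per server stands in for a real network connection.
        self._stores = {name: {} for name in self.server_names}

    def server_for(self, key):
        # Hash the key and map it onto the fixed list of servers.
        index = zlib.crc32(key.encode("utf-8")) % len(self.server_names)
        return self.server_names[index]

    def set(self, key, value):
        self._stores[self.server_for(key)][key] = value

    def get(self, key):
        return self._stores[self.server_for(key)].get(key)
```

Because every client with the same server list computes the same answer for a given key, the servers never need to know about each other.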
Everyone seems 00:23:56.580 --> 00:24:01.369 to be running Resque, Sidekiq, and, you know, Redis 00:24:01.369 --> 00:24:05.659 is, again, one of those just pieces of technology 00:24:05.659 --> 00:24:12.220 that is beautifully engineered, incredibly simple, incredibly robust. The 00:24:12.220 --> 00:24:19.220 maintainers are just absolute, you know, scientists, I guess. 00:24:19.309 --> 00:24:22.999 Just a whole other level of crazy algorithm stuff. 00:24:22.999 --> 00:24:25.299 And they make blog posts and, you know, I'm 00:24:25.299 --> 00:24:31.519 so stupid. I don't understand what you're talking about. 00:24:31.519 --> 00:24:35.989 It's really fast, it's slightly hard to distribute. A 00:24:35.989 --> 00:24:38.710 lot of that's in the pipeline with Redis. It's 00:24:38.710 --> 00:24:42.379 much more, it's much more simple to, to stick 00:24:42.379 --> 00:24:46.070 it on one node and increase the RAM. It's 00:24:46.070 --> 00:24:49.359 mu, more complicated than Memcache. It's essentially just an 00:24:49.359 --> 00:24:52.129 in-memory cache. It has a bunch of really interesting 00:24:52.129 --> 00:24:56.679 data structures, though. I think if you've been confused 00:24:56.679 --> 00:24:59.029 all week, now, which country I'm from, whether I 00:24:59.029 --> 00:25:01.720 say dayta or dahta, so now I just changed 00:25:01.720 --> 00:25:03.710 them randomly. 00:25:03.710 --> 00:25:08.070 So, you can, you have hashes, you have lists, 00:25:08.070 --> 00:25:09.779 you have strings. You've got all sorts of other 00:25:09.779 --> 00:25:14.129 interesting things. You can do optimistic locking and have, 00:25:14.129 --> 00:25:17.609 you know, a bunch of operations that are essentially 00:25:17.609 --> 00:25:22.369 batched. You can do sort of, there's long ways 00:25:22.369 --> 00:25:25.440 of doing this kind of stuff.
It's Resque and 00:25:25.440 --> 00:25:28.690 Sidekiq both just make this, make it super simple 00:25:28.690 --> 00:25:31.139 to do background tasks with Rails and install the 00:25:31.139 --> 00:25:36.960 gem, have a worker, and it's all just magic. 00:25:36.960 --> 00:25:39.769 It has Lua baked in, which is a whole 00:25:39.769 --> 00:25:41.850 other thing. But Lua is a really cool programming 00:25:41.850 --> 00:25:44.940 language that is designed for embeddability. But one of 00:25:44.940 --> 00:25:47.210 the things that happens is you can actually write 00:25:47.210 --> 00:25:51.389 little Lua scripts that end up going into 00:25:51.389 --> 00:25:54.519 the Redis server to do more complex operations. So, 00:25:54.519 --> 00:25:57.179 in this case, this is a little script that 00:25:57.179 --> 00:26:00.269 grabs something off a sorted hash and then deletes 00:26:00.269 --> 00:26:02.789 them and then returns the first thing, like, then 00:26:02.789 --> 00:26:05.789 returns what we had done. But it's, it's in an 00:26:05.789 --> 00:26:09.529 atomic kind of transactional way. 00:26:09.529 --> 00:26:13.320 And, good news everybody! We've just invented stored procedures. 00:26:13.320 --> 00:26:16.409 So that's very exciting. Except now they're much more 00:26:16.409 --> 00:26:18.639 hip, because it's an in-memory database with a language 00:26:18.639 --> 00:26:23.330 no one's heard of. So. We are rocking it. 00:26:23.330 --> 00:26:28.470 Also, maybe use a queue. Just, I know it's 00:26:28.470 --> 00:26:32.869 crazy. But, if you're actually queuing, using Redis as 00:26:32.869 --> 00:26:36.809 your queue, maybe you have a queuing problem and 00:26:36.809 --> 00:26:39.609 you have queues. They exist. They're a thing. It's 00:26:39.609 --> 00:26:41.440 ridiculous. I know.
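The script from the slide isn't in the transcript, so this is only a toy model of what was described: grab the lowest-scored items from a scored collection, delete them, and return them, all as one step nothing else can interleave with. It is plain Python rather than Lua, `TinySortedSet` and `pop_lowest` are invented names, and a lock stands in for the way Redis runs a whole script as a single atomic unit on the server.

```python
import threading


class TinySortedSet:
    """Toy in-memory model of a scored set; not Redis, just the shape of the idea."""

    def __init__(self):
        self._scores = {}              # member -> score
        self._lock = threading.Lock()  # stands in for server-side atomicity

    def add(self, member, score):
        with self._lock:
            self._scores[member] = score

    def pop_lowest(self, count=1):
        """Read the lowest-scored members, delete them, and return them in one step."""
        with self._lock:
            # Order by score, breaking ties by member name.
            ordered = sorted(self._scores, key=lambda m: (self._scores[m], m))
            taken = ordered[:count]
            for member in taken:
                del self._scores[member]
            return taken


jobs = TinySortedSet()
jobs.add("send-email", 2)
jobs.add("resize-image", 1)
print(jobs.pop_lowest())
```

Without the single atomic step, two workers could both read the same item before either deleted it, and the job would run twice.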
00:26:41.440 --> 00:26:46.379 So, RabbitMQ is sort of the gold standard, and 00:26:46.379 --> 00:26:49.129 Kafka is another one that was talked about earlier 00:26:49.129 --> 00:26:50.909 this week, and it is crazy cool. 00:26:50.909 --> 00:26:56.129 Where am I? Man. All right. Just gonna stretch. 00:26:56.129 --> 00:26:58.820 I've lost count, so I don't know, now I'm 00:26:58.820 --> 00:27:02.019 just gonna talk faster. Cool. 00:27:02.019 --> 00:27:08.369 Neo4j is really interesting. It's a graph database. That's. 00:27:08.369 --> 00:27:13.350 It's slightly hard to explain. But you, the way 00:27:13.350 --> 00:27:15.210 I actually think about it, we'll just jump straight 00:27:15.210 --> 00:27:17.460 to here, is it's almost but not quite entirely 00:27:17.460 --> 00:27:22.950 unlike a relational database. The difference, essentially, is that 00:27:22.950 --> 00:27:27.409 it is optimized for the connections rather than aggregated 00:27:27.409 --> 00:27:31.710 data. So relational database, you, puts things in, in 00:27:31.710 --> 00:27:33.279 a way where you can get a sum and 00:27:33.279 --> 00:27:35.179 a count and like, that's kind of the heritage 00:27:35.179 --> 00:27:37.029 of that kind of world view. 00:27:37.029 --> 00:27:40.340 Whereas what the Neo4j people are doing is actually 00:27:40.340 --> 00:27:44.739 thinking about connections between pieces of data, and for 00:27:44.739 --> 00:27:49.340 some use cases, this is actually really, really amazing 00:27:49.340 --> 00:27:52.369 stuff. So you have, a graph is basically a 00:27:52.369 --> 00:27:56.850 collection of nodes, and those nodes can have relationships 00:27:56.850 --> 00:27:59.179 between each other, and then a node just has 00:27:59.179 --> 00:28:01.330 properties. 00:28:01.330 --> 00:28:03.830 It's essentially an object database in a way. It's 00:28:03.830 --> 00:28:05.639 like very similar to the way that we think 00:28:05.639 --> 00:28:08.109 about objects.
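To make the nodes, relationships, and properties idea concrete, here is a minimal in-memory sketch. It is not Neo4j or its query language; the `Graph` class, the `KNOWS` relationship name, and the people in it are all invented for illustration. The point is only that a named relationship is stored as data in its own right, so following connections is a direct lookup rather than a join.

```python
from collections import defaultdict


class Graph:
    """Tiny property graph: nodes carry properties, relationships carry a name."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties dict
        self.edges = defaultdict(list)  # node id -> [(relationship name, target id)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def relate(self, source, name, target):
        self.edges[source].append((name, target))

    def related(self, source, name):
        """Direct neighbours of source along one named relationship."""
        return [target for (rel, target) in self.edges[source] if rel == name]

    def reachable(self, start, name):
        """Follow one relationship name outward until no new nodes appear."""
        seen, frontier = set(), [start]
        while frontier:
            current = frontier.pop()
            for target in self.related(current, name):
                if target not in seen:
                    seen.add(target)
                    frontier.append(target)
        return seen


g = Graph()
g.add_node("ann", name="Ann")
g.add_node("bob", name="Bob")
g.add_node("cat", name="Cat")
g.relate("ann", "KNOWS", "bob")
g.relate("bob", "KNOWS", "cat")
print(sorted(g.reachable("ann", "KNOWS")))
```

The friends-of-friends walk in `reachable` is the kind of query that needs a recursive self-join in a relational schema.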
So it has some really nice properties 00:28:08.109 --> 00:28:11.859 if you're working in a language like Ruby. And 00:28:11.859 --> 00:28:17.009 then it just does stuff that, you know, in 00:28:17.009 --> 00:28:19.090 a really intuitive way. So if we've got a 00:28:19.090 --> 00:28:22.159 graph of movies and actors, you actually define a 00:28:22.159 --> 00:28:26.460 relationship by name. Then an actor acts in a 00:28:26.460 --> 00:28:28.700 movie. And then when you were doing your queries, 00:28:28.700 --> 00:28:32.909 this is a language called Cypher, you actually, that's 00:28:32.909 --> 00:28:34.059 a first-class thing. 00:28:34.059 --> 00:28:36.019 Whereas in a relational world, you're, you're using a 00:28:36.019 --> 00:28:39.279 foreign key, which has no semantic meaning at all. 00:28:39.279 --> 00:28:41.330 You, you just have to remember that, you know, 00:28:41.330 --> 00:28:43.019 an actor, you know, there's a table with an 00:28:43.019 --> 00:28:45.729 actor id, and a movie id, and we're joining 00:28:45.729 --> 00:28:49.919 across somewhere. Whereas Neo4j actually makes those relationships first 00:28:49.919 --> 00:28:53.359 class citizens. So if you've got problems that are 00:28:53.359 --> 00:29:00.359 graph problems, like social network friend cloud stuff, some 00:29:01.549 --> 00:29:04.799 of that stuff, Neo4j just makes trivially easy in 00:29:04.799 --> 00:29:06.070 a way that you would have had to do 00:29:06.070 --> 00:29:10.119 a recursive self-join in PostGres and hate your life 00:29:10.119 --> 00:29:12.499 and, you know. 00:29:12.499 --> 00:29:17.029 Couch is cool. I guess. Pretty much that's my 00:29:17.029 --> 00:29:21.029 opinion of it. It's really awesome. But, you can't 00:29:21.029 --> 00:29:25.659 query it. So cool. 00:29:25.659 --> 00:29:28.109 That's it. That's a slight disservice to Couch but, 00:29:28.109 --> 00:29:31.970 you know, whatever. 
MongoDB, as we all know, it 00:29:31.970 --> 00:29:34.559 is webscale and that's excellent. If you think of 00:29:34.559 --> 00:29:39.200 it as Redis for JSON, that's good. Sixty percent 00:29:39.200 --> 00:29:41.249 of the time, it works every time. Everyone's familiar 00:29:41.249 --> 00:29:43.169 with that. 00:29:43.169 --> 00:29:46.929 So, the thing that's really, I mean, Mongo, it 00:29:46.929 --> 00:29:50.919 reminds me of My, MySQL. Like, Mongo is kind 00:29:50.919 --> 00:29:54.320 of terrible, but MySQL was kind of terrible, too. 00:29:54.320 --> 00:29:56.789 Like, when that came out, it didn't do transactions, 00:29:56.789 --> 00:30:00.039 for example, and I, I was working in enterprise-y 00:30:00.039 --> 00:30:04.419 land, and transactions are actually a thing. And, you're 00:30:04.419 --> 00:30:08.929 like, you script kiddies with your database. 00:30:08.929 --> 00:30:10.789 So Mongo feels like that, and not, you know, 00:30:10.789 --> 00:30:13.970 what we learned is, if you make something that's 00:30:13.970 --> 00:30:17.539 awesome and useful and everywhere and ubiquitous and it 00:30:17.539 --> 00:30:20.749 doesn't work, you can make it work. And eventually, 00:30:20.749 --> 00:30:23.309 you know, MySQL is a real database. So Mongo 00:30:23.309 --> 00:30:25.470 feels a bit like that. It's come a massive 00:30:25.470 --> 00:30:30.690 way, right about really early on with very early 00:30:30.690 --> 00:30:32.309 versions. 00:30:32.309 --> 00:30:34.759 It stores JSON. Well, sort of. It stores 00:30:34.759 --> 00:30:39.710 BSON, anyway. That's just binary JSON basically. And it's 00:30:39.710 --> 00:30:42.409 a, it's a really beautiful model to work with 00:30:42.409 --> 00:30:45.129 in a development cycle, which I think is 00:30:45.129 --> 00:30:47.489 why there's, why there's so much appeal. You've just 00:30:47.489 --> 00:30:50.929 got kind of, people treat it like an object 00:30:50.929 --> 00:30:53.690 database.
You've just got an object that's in there, 00:30:53.690 --> 00:30:55.720 and you can pull out objects and manipulate them 00:30:55.720 --> 00:30:59.859 and do all of this kind of crazy stuff. 00:30:59.859 --> 00:31:05.220 The people who know what they're talking about, though, 00:31:05.220 --> 00:31:08.450 with distributed systems, if the reason you're using Mongo 00:31:08.450 --> 00:31:10.299 is because you think it's a panacea for all 00:31:10.299 --> 00:31:13.700 of this, you know, we need to be webscale 00:31:13.700 --> 00:31:17.229 and do all of this kind of stuff, that 00:31:17.229 --> 00:31:19.399 is not a good reason to use it. Cause 00:31:19.399 --> 00:31:21.919 there, there's still a lot of operational problems and, 00:31:21.919 --> 00:31:23.739 and stuff going on. 00:31:23.739 --> 00:31:30.179 This, this one is interesting. It's essentially, RethinkDB is 00:31:30.179 --> 00:31:33.299 coming from the PostGres world view. Cause PostGres made, 00:31:33.299 --> 00:31:36.729 you know, MySQL was like, whatever, we'll fix it. 00:31:36.729 --> 00:31:39.669 PostGres was like, we'll do it right and it, 00:31:39.669 --> 00:31:41.629 you can't use it cause it's so slow, but 00:31:41.629 --> 00:31:43.539 at least it's correct. And they took lots of 00:31:43.539 --> 00:31:46.539 iterations to make it usable. So Rethink is kind 00:31:46.539 --> 00:31:48.340 of that school of thought. It's like, we're gonna 00:31:48.340 --> 00:31:50.619 make it all correct first, and then we'll make 00:31:50.619 --> 00:31:55.799 it usable. So it's very similar idea. JSON, you 00:31:55.799 --> 00:31:59.429 know, they're trying to make it operationally great with 00:31:59.429 --> 00:32:02.979 automatic clustering and all this kind of stuff. You 00:32:02.979 --> 00:32:05.149 know. Who knows what it is and how it's 00:32:05.149 --> 00:32:07.179 actually gonna behave in the real world. It's still 00:32:07.179 --> 00:32:09.159 a very early piece of tech. 
00:32:09.159 --> 00:32:11.249 And that leads me into, there's a whole world 00:32:11.249 --> 00:32:15.479 of databases around what I'm loosely calling the commercial 00:32:15.479 --> 00:32:20.149 fringe. So Couchbase is the Couch guys and sort 00:32:20.149 --> 00:32:24.019 of some commercial Memcached guys who got together to 00:32:24.019 --> 00:32:28.409 make a hybrid something. Aerospike is, their marketing is 00:32:28.409 --> 00:32:31.519 great. That's about the best you can say about 00:32:31.519 --> 00:32:31.869 it. 00:32:31.869 --> 00:32:33.289 So there's a whole bunch of people trying to 00:32:33.289 --> 00:32:36.799 solve these problems in interesting ways. But all of 00:32:36.799 --> 00:32:40.720 these ones cost money and, you know, they're, the 00:32:40.720 --> 00:32:42.200 mileage varies and all of that kind of stuff. 00:32:42.200 --> 00:32:43.539 The cool thing about open source ones is you 00:32:43.539 --> 00:32:45.029 get it and you try it and you hate 00:32:45.029 --> 00:32:46.570 it and you go back to PostGres so it's 00:32:46.570 --> 00:32:48.190 all fine. 00:32:48.190 --> 00:32:53.190 So, HyperDex. This is my favorite. Because they have 00:32:53.190 --> 00:32:58.379 HyperSpace Hashing, and it is so cool. These guys 00:32:58.379 --> 00:33:02.369 are making some really broad, amazing claims about the, 00:33:02.369 --> 00:33:06.549 the kind of things that they can do. Crazy 00:33:06.549 --> 00:33:08.690 fast. It's, it's a key-value store but it will 00:33:08.690 --> 00:33:11.599 index, you know, it's not just a key but 00:33:11.599 --> 00:33:14.039 it will index the properties of a value. So 00:33:14.039 --> 00:33:16.509 now you can do que, you know, genuine queries 00:33:16.509 --> 00:33:20.629 into the structure of objects that you're storing. 00:33:20.629 --> 00:33:23.499 They've got a whole bunch of papers around what 00:33:23.499 --> 00:33:27.299 they're doing. So, you can read that as, who 00:33:27.299 --> 00:33:29.679 knows what it means.
It maps objects to coordinates 00:33:29.679 --> 00:33:34.529 in a multi-dimensioned Euclidean space. HyperSpace. And I'm like. 00:33:34.529 --> 00:33:37.109 Take my money! 00:33:37.109 --> 00:33:40.989 And there's a, there's a picture of HyperSpace. And, 00:33:40.989 --> 00:33:43.659 like, I've read that like eight times. I don't 00:33:43.659 --> 00:33:49.999 understand what's going on. But if, it does seem 00:33:49.999 --> 00:33:52.070 to be true. They're trying to solve some of 00:33:52.070 --> 00:33:54.720 these problems and, you know, they call themselves like 00:33:54.720 --> 00:33:59.659 a second generation NoSQL thing, in a similar way 00:33:59.659 --> 00:34:01.669 to Google, you know, kind of taking all of 00:34:01.669 --> 00:34:05.039 this stuff and trying to push the science underneath 00:34:05.039 --> 00:34:06.999 it forward. 00:34:06.999 --> 00:34:09.510 So you can, you know, it's got a Ruby 00:34:09.510 --> 00:34:12.960 client. You can use it now. It's got, just, 00:34:12.960 --> 00:34:18.429 normal key-value. It's got atomic stuff. You can do 00:34:18.429 --> 00:34:22.969 conditional puts, so this is some code that basically 00:34:22.969 --> 00:34:26.860 is only updating if the, only updating the current 00:34:26.860 --> 00:34:31.969 balance if the, updating the balance if the current 00:34:31.969 --> 00:34:34.460 balance is what we think it is. Otherwise some 00:34:34.460 --> 00:34:36.460 other thread has updated it. 00:34:36.460 --> 00:34:38.889 So there's some really interesting stuff they can do. 00:34:38.889 --> 00:34:43.650 And they're guaranteeing those operations across the cluster. And 00:34:43.650 --> 00:34:45.620 it's also got a transactional engine as well, so 00:34:45.620 --> 00:34:47.250 that's really exciting. 00:34:47.250 --> 00:34:51.610 Running out of time. HBase and Hadoop. You don't 00:34:51.610 --> 00:34:54.679 have any of these problems. Don't worry about it.
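A rough sketch of the conditional update described just before the HBase remark: only write the new balance if the stored balance is still what we read, otherwise assume another writer got there first and try again. This is a single-process stand-in, not the HyperDex client. `KeyValueStore`, `cond_put`, and `withdraw` are names chosen for this sketch, and a lock replaces whatever the real store does to guarantee the check-and-write across a cluster.

```python
import threading


class KeyValueStore:
    """In-memory stand-in for a key-value store with a compare-and-set operation."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def cond_put(self, key, expected, new_value):
        """Write new_value only if the stored value still equals expected."""
        with self._lock:
            if self._data.get(key) != expected:
                return False  # someone else changed it since we read it
            self._data[key] = new_value
            return True


def withdraw(store, account, amount):
    """Retry loop: re-read and try again if another writer got in first."""
    while True:
        balance = store.get(account)
        if balance is None or balance < amount:
            return False
        if store.cond_put(account, balance, balance - amount):
            return True


store = KeyValueStore()
store.put("acct", 100)
print(withdraw(store, "acct", 30), store.get("acct"))
```

The check and the write happen under one lock here, which is the property that matters: a plain read followed by a separate write would let two threads both spend the same balance.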
00:34:54.679 --> 00:34:56.219 You probably don't want to have any of these 00:34:56.219 --> 00:34:59.840 problems. Cause this just ends up, you need to 00:34:59.840 --> 00:35:03.870 install every fucking thing the Apache foundation has ever 00:35:03.870 --> 00:35:08.240 made. And this isn't even the full list. This 00:35:08.240 --> 00:35:09.980 is like, you probably need those. 00:35:09.980 --> 00:35:12.620 I have a friend, he's a bit of a 00:35:12.620 --> 00:35:16.870 dick, and he, he calls it, cause he, he 00:35:16.870 --> 00:35:19.710 works in an actual big data organization, and he 00:35:19.710 --> 00:35:21.630 just, he goes, oh, you people with your small 00:35:21.630 --> 00:35:25.970 to medium data. So, yeah, like, most of us, 00:35:25.970 --> 00:35:27.630 we don't have big data in any sense of 00:35:27.630 --> 00:35:31.060 the word, really. Like, if, if it's got GB 00:35:31.060 --> 00:35:34.600 on the end of it, meh. You're not there 00:35:34.600 --> 00:35:35.530 yet. 00:35:35.530 --> 00:35:40.730 So, again, this is just you know, Facebook is 00:35:40.730 --> 00:35:42.270 using the hell out of this stuff, and they're 00:35:42.270 --> 00:35:44.860 just like, this is all out of date. They're 00:35:44.860 --> 00:35:49.590 like now just, they can't buy hard disks fast 00:35:49.590 --> 00:35:53.930 enough. It's crazy. Yeah. There was a punch line 00:35:53.930 --> 00:35:56.380 at the end of all of that. 00:35:56.380 --> 00:35:57.920 But my friend, the guy who I said was 00:35:57.920 --> 00:36:00.960 a bit of a dick, he, he recommends having 00:36:00.960 --> 00:36:04.230 a look at this. And this is his quote, 00:36:04.230 --> 00:36:07.090 if you want to appear really cool and underground, 00:36:07.090 --> 00:36:09.140 then I reckon the next big thing is the 00:36:09.140 --> 00:36:12.280 Berkeley Data Analytics Stack. 
So, there's a whole bunch 00:36:12.280 --> 00:36:15.580 of people who are looking at that, you know, 00:36:15.580 --> 00:36:18.180 crazy big data situation and trying to work out 00:36:18.180 --> 00:36:22.210 what that means and what the future is. 00:36:22.210 --> 00:36:24.800 And so Apache and Berkeley are kind of in 00:36:24.800 --> 00:36:26.940 a cold war for that at the moment. And 00:36:26.940 --> 00:36:29.140 then there's heaps of people in the enterprise space 00:36:29.140 --> 00:36:31.850 because you can sell lots of products and or 00:36:31.850 --> 00:36:34.590 services to large companies who think they have a 00:36:34.590 --> 00:36:37.710 big data problem. So that's cool. 00:36:37.710 --> 00:36:39.650 That's fine. This isn't, this is just a little 00:36:39.650 --> 00:36:44.990 thing that's an embeddable document key-value store that you 00:36:44.990 --> 00:36:47.430 can, it's just kind of a fun team and 00:36:47.430 --> 00:36:49.210 has an API that looks very similar to the 00:36:49.210 --> 00:36:52.520 Mongo one. And it just sits in process. 00:36:52.520 --> 00:36:56.210 Oh, ElasticSearch. Every time I use it, I think, 00:36:56.210 --> 00:37:01.400 why can you not be my database? It's awesome. 00:37:01.400 --> 00:37:03.370 But it loses a couple of points there because 00:37:03.370 --> 00:37:08.920 of its configurationability. It went, it works when you 00:37:08.920 --> 00:37:10.830 know how to make it works, and it's crazy 00:37:10.830 --> 00:37:12.680 complicated sometimes. 00:37:12.680 --> 00:37:19.640 So anyway. Thirty. Four minutes over technically, I think. 00:37:19.640 --> 00:37:21.950 Yeah. So that's good. 00:37:21.950 --> 00:37:28.950 That's databases in a nutshell. I'm Toby Hede. I'm 00:37:29.160 --> 00:37:31.280 around the conference if you want to talk about 00:37:31.280 --> 00:37:35.340 databases. 
I think of myself as a lapa-, a 00:37:35.340 --> 00:37:39.320 lap- a butterfly collector, I guess, is what I'm 00:37:39.320 --> 00:37:41.200 looking for, of databases. 00:37:41.200 --> 00:37:45.960 Yeah. So come and say hi. Cool.