1
00:00:16,830 --> 00:00:17,590
TOBY HEDE: Good morning everybody.

2
00:00:17,590 --> 00:00:24,369
Friday. Yes. It's been a long week. I'm excited.

3
00:00:24,369 --> 00:00:29,279
I'm highly caffeinated. So without further ado,

4
00:00:29,279 --> 00:00:34,180
I present An Ode to 17 Databases in 33 Minutes.

5
00:00:34,180 --> 00:00:37,870
I'm gonna mangle a large number of metaphors.

6
00:00:37,870 --> 00:00:40,820
There'll be a lot of animated GIFs.

7
00:00:40,820 --> 00:00:44,210
I've learned that this week, if you see it like that,

8
00:00:44,210 --> 00:00:47,910
there's Star Wars, Dungeons and Dragons,

9
00:00:47,910 --> 00:00:49,350
and all of that's very, unfortunately, stereotypical.

10
00:00:49,350 --> 00:00:51,900
So a bit of an indictment.

11
00:00:51,900 --> 00:00:55,839
This whole thing started as a joke. Seventeen databases,

12
00:00:55,839 --> 00:00:59,159
I actually did it in five minutes. Thirty-three minutes is

13
00:00:59,159 --> 00:01:03,799
worse. The whole thing is just a catastrophe, really.

14
00:01:03,799 --> 00:01:04,659
But anyway.

15
00:01:04,659 --> 00:01:07,610
We're gonna cover a whole bunch of different databases

16
00:01:07,610 --> 00:01:09,890
and a little bit of the underlying theory, and

17
00:01:09,890 --> 00:01:12,670
hopefully you'll walk out and you'll understand why to

18
00:01:12,670 --> 00:01:13,729
use Postgres.

19
00:01:13,729 --> 00:01:14,250
[laughter]

20
00:01:14,250 --> 00:01:20,180
I'm Toby. You can find me on the internet.

21
00:01:20,180 --> 00:01:22,260
I work at a company called Ninefold.

22
00:01:22,260 --> 00:01:26,100
V.O.: We're having a problem, there's no screen.

23
00:01:26,100 --> 00:01:33,100
T.H.: Oh. No screens. Is that me?

24
00:01:35,960 --> 00:01:41,210
Before it was, there was no red. So, now

25
00:01:41,210 --> 00:01:43,860
there's no any, anything.

26
00:01:43,860 --> 00:01:45,180
V.O.: Nothing.

27
00:01:45,180 --> 00:01:46,500
T.H.: Hey.

28
00:01:46,500 --> 00:01:47,820
AUDIENCE: Hey!
29
00:01:47,820 --> 00:01:51,120
T.H.: I have no slides.

30
00:01:51,120 --> 00:01:54,670
Well, you missed my beautiful slides. There's. You missed

31
00:01:54,670 --> 00:01:57,740
the first animation. That's a shame. You missed the

32
00:01:57,740 --> 00:02:01,500
list. It's awesome. You missed me and my excellent

33
00:02:01,500 --> 00:02:05,070
job titles. So yes.

34
00:02:05,070 --> 00:02:08,340
I work at Ninefold. They have very kindly

35
00:02:08,340 --> 00:02:12,550
flown me over here from Australia, which explains why

36
00:02:12,550 --> 00:02:16,690
I sound like I come from the deep south.

37
00:02:16,690 --> 00:02:18,060
Cause I do.

38
00:02:18,060 --> 00:02:21,260
Most of this week, this has been me. So

39
00:02:21,260 --> 00:02:23,560
today I'm finally over the jet lag just in time

40
00:02:23,560 --> 00:02:26,670
to go home and have it all over again

41
00:02:26,670 --> 00:02:27,850
next week.

42
00:02:27,850 --> 00:02:32,450
So, a couple of quick facts about Straya. There

43
00:02:32,450 --> 00:02:39,120
are far fewer syllables than you're used to using.

44
00:02:39,120 --> 00:02:43,950
This is an, a genuine Australian politician. He's a

45
00:02:43,950 --> 00:02:48,060
mining magnate billionaire and he is currently running an

46
00:02:48,060 --> 00:02:52,880
MVP Jurassic theme park with giant fiberglass dinosaurs. And

47
00:02:52,880 --> 00:02:56,310
I, I for one am for it. So I

48
00:02:56,310 --> 00:02:58,510
realize there weren't enough Star Wars references so this

49
00:02:58,510 --> 00:03:00,540
is just completely gratuitous.

50
00:03:00,540 --> 00:03:05,430
Anyway. So. The thrust is that distributed systems are

51
00:03:05,430 --> 00:03:08,470
hard and databases are fun. Pictured here is a

52
00:03:08,470 --> 00:03:13,530
distributed system. You can see there are two app nodes

53
00:03:13,530 --> 00:03:16,660
and then there's two, there's like a master/slave kind

54
00:03:16,660 --> 00:03:20,920
of setup going on here as well.
So we're

55
00:03:20,920 --> 00:03:23,950
gonna talk about some of the complexities of running

56
00:03:23,950 --> 00:03:27,670
these types of systems, and it's really fun stuff

57
00:03:27,670 --> 00:03:29,980
once you get under the covers and start thinking

58
00:03:29,980 --> 00:03:32,250
about some of the complexities.

59
00:03:32,250 --> 00:03:37,030
So. NoSQL is a thing. We have NewSQL now.

60
00:03:37,030 --> 00:03:38,780
I'm gonna be covering some of these things. We've

61
00:03:38,780 --> 00:03:44,000
also got PostSQL, Post-Rock Ambient SQL. And there's a

62
00:03:44,000 --> 00:03:47,120
whole gamut of these things. They all make my

63
00:03:47,120 --> 00:03:50,819
brain explode and the, I think the trick to

64
00:03:50,819 --> 00:03:53,069
understanding all of this stuff is to actually think

65
00:03:53,069 --> 00:03:55,459
about some of what's happening underneath. Then you can

66
00:03:55,459 --> 00:03:59,650
make decisions about your databases.

67
00:03:59,650 --> 00:04:01,700
Hopefully you're all familiar with some of the concepts

68
00:04:01,700 --> 00:04:07,250
of traditional relational databases. We have ACID, which provides

69
00:04:07,250 --> 00:04:10,640
certain guarantees about the way that your data behaves.

70
00:04:10,640 --> 00:04:13,129
You can update data and be sure it was

71
00:04:13,129 --> 00:04:18,099
updated. Things are isolated from each other. Things persist

72
00:04:18,099 --> 00:04:20,970
over time.

73
00:04:20,970 --> 00:04:23,129
Another thing that you may have heard of, this

74
00:04:23,129 --> 00:04:25,740
is a, this is a leap that I need

75
00:04:25,740 --> 00:04:27,990
to another animation, is a thing called the CAP

76
00:04:27,990 --> 00:04:30,879
Theorem. So this gets talked about a lot when

77
00:04:30,879 --> 00:04:34,889
we start talking about this new generation of databases.
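The ACID guarantees mentioned above ("you can update data and be sure it was updated") can be seen in miniature with Python's built-in `sqlite3` module. This is a minimal sketch, assuming an in-memory SQLite database, a hypothetical `accounts` table and a hypothetical `transfer` helper; it only demonstrates atomicity: a transaction that fails partway through leaves no partial write behind.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic unit (hypothetical helper)."""
    # Using the connection as a context manager commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # The debit above has already run; raising here undoes it.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 10), ("bob", 0)])

transfer(conn, "alice", "bob", 5)  # succeeds and commits
try:
    transfer(conn, "alice", "bob", 100)  # fails midway and is rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 5, 'bob': 5}
```

The failed transfer never shows up as a half-applied debit, which is the "be sure it was updated" property in its simplest form. Isolation and durability are not exercised here.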
78
00:04:34,889 --> 00:04:39,599
CAP stands for consistency, availability, and partition tolerance, and

79
00:04:39,599 --> 00:04:44,430
it provides, basically, some strong foundation for reasoning about

80
00:04:44,430 --> 00:04:48,050
the way distributed systems behave and how they interoperate

81
00:04:48,050 --> 00:04:49,680
and how they communicate. So I'm gonna give you

82
00:04:49,680 --> 00:04:52,620
a brief introduction to how that all kind of

83
00:04:52,620 --> 00:04:52,969
works.

84
00:04:52,969 --> 00:04:57,279
So, the original CAP Theorem, as stated, was, is

85
00:04:57,279 --> 00:04:59,610
called Brewer's Conjecture. A guy called Brewer just sort

86
00:04:59,610 --> 00:05:02,659
of had this idea. It's actually on some really

87
00:05:02,659 --> 00:05:06,680
awesomely-designed PowerPoint slides from some thing he did. And

88
00:05:06,680 --> 00:05:11,789
he was saying that with consistency, availability, and partition

89
00:05:11,789 --> 00:05:15,210
tolerance, the data can only be

90
00:05:15,210 --> 00:05:17,439
two of these things at any one time. So

91
00:05:17,439 --> 00:05:20,039
the data can be consistent or it can be

92
00:05:20,039 --> 00:05:23,979
accessible or it can handle network failures.

93
00:05:23,979 --> 00:05:28,249
So people then took this conjecture and actually made

94
00:05:28,249 --> 00:05:31,800
a formal kind of proof in, in much more

95
00:05:31,800 --> 00:05:38,800
rigorous computer science terms. And actually said, it's impossible,

96
00:05:38,830 --> 00:05:42,580
in an asynchronous network model, to implement a read/write

97
00:05:42,580 --> 00:05:49,210
data object that is simultaneously available and is also

98
00:05:49,210 --> 00:05:50,979
atomically consistent [when messages between nodes can be lost].

99
00:05:50,979 --> 00:05:53,110
And so all of this stuff around NewSQL and

100
00:05:53,110 --> 00:05:56,939
NoSQL and bleh, all of that stuff, is about

101
00:05:56,939 --> 00:06:01,610
manipulating these different variables.
There's also a thing called

102
00:06:01,610 --> 00:06:02,990
BASE but I'm not gonna talk about it cause

103
00:06:02,990 --> 00:06:05,789
it's actually just a made-up acronym that has no

104
00:06:05,789 --> 00:06:06,900
relevance to anything.

105
00:06:06,900 --> 00:06:10,059
So, what, what does CAP actually, what, what are

106
00:06:10,059 --> 00:06:13,649
we talking about here? And why is it important?

107
00:06:13,649 --> 00:06:16,719
It's important, actually, because everything is already distributed. What

108
00:06:16,719 --> 00:06:20,309
we do today is inherently a distributed system. You

109
00:06:20,309 --> 00:06:23,229
have a browser talking to a server, an app

110
00:06:23,229 --> 00:06:25,520
server, a Rails server, cause we're at RailsConf,

111
00:06:25,520 --> 00:06:29,249
and then that's talking to a Postgres database, or

112
00:06:29,249 --> 00:06:33,809
a MySQL database or something even fancier and shinier.

113
00:06:33,809 --> 00:06:36,110
That's a distributed system. And as we move into

114
00:06:36,110 --> 00:06:41,460
heavier client-side applications, that distribution is getting much

115
00:06:41,460 --> 00:06:43,759
more front-loaded, so you, you've got state in the

116
00:06:43,759 --> 00:06:46,589
browser that's now synchronizing with state on the server.

117
00:06:46,589 --> 00:06:50,300
So we already actually suffer many of these problems.

118
00:06:50,300 --> 00:06:55,270
This is a handy and completely untrue guide to

119
00:06:55,270 --> 00:06:59,389
NoSQL systems that breaks them into this idea of

120
00:06:59,389 --> 00:07:03,039
some things being available and some things being consistent.

121
00:07:03,039 --> 00:07:06,990
So, all of that is almost but not quite

122
00:07:06,990 --> 00:07:08,619
entirely untrue.
123
00:07:08,619 --> 00:07:12,800
What the actual theorem says is that under a

124
00:07:12,800 --> 00:07:16,490
network failure, so you've got multiple nodes and

125
00:07:16,490 --> 00:07:20,159
they can no longer communicate, you can

126
00:07:20,159 --> 00:07:23,490
choose whether the data is consistent or whether the

127
00:07:23,490 --> 00:07:28,849
data is available. And I have some demonstrations here

128
00:07:28,849 --> 00:07:30,860
to just, it actually ends up being very

129
00:07:30,860 --> 00:07:31,580
easy to understand.

130
00:07:31,580 --> 00:07:36,009
So, here we have a typical cluster of nodes working

131
00:07:36,009 --> 00:07:41,110
together. We're gonna model some communication between them. So

132
00:07:41,110 --> 00:07:45,669
there's a, there's a write on this system. It

133
00:07:45,669 --> 00:07:49,449
comes in, that gets replicated across, and then on

134
00:07:49,449 --> 00:07:51,240
the other system we now have that data coming

135
00:07:51,240 --> 00:07:53,309
out. Someone's doing a read. And so this is

136
00:07:53,309 --> 00:07:57,679
the kind of situation that we're talking about. So

137
00:07:57,679 --> 00:08:01,469
whether you're doing a master/slave setup in a relational database

138
00:08:01,469 --> 00:08:06,129
or something trickier, this is kind of the way

139
00:08:06,129 --> 00:08:08,300
it works. A node gets some data and it

140
00:08:08,300 --> 00:08:12,020
gives it to another node, and they have the

141
00:08:12,020 --> 00:08:13,860
same information.

142
00:08:13,860 --> 00:08:17,839
So when there's a network partition, they can no

143
00:08:17,839 --> 00:08:22,119
longer communicate. So a write comes in, and

144
00:08:22,119 --> 00:08:25,869
now we have to make a decision. And all

145
00:08:25,869 --> 00:08:27,740
of this is actually just science, as you can

146
00:08:27,740 --> 00:08:30,759
tell from this diagram.
If those two nodes can't

147
00:08:30,759 --> 00:08:33,399
communicate, you can talk only to the one that got

148
00:08:33,399 --> 00:08:36,890
the write, and that's consistent. It got the write.

149
00:08:36,890 --> 00:08:39,219
It can now read out that same data.

150
00:08:39,219 --> 00:08:40,190
That's all cool.

151
00:08:40,190 --> 00:08:43,870
Or, you can have both nodes still answering, and

152
00:08:43,870 --> 00:08:46,339
now you have someone reading data that is no

153
00:08:46,339 --> 00:08:49,100
longer in the right state. So we've got, you

154
00:08:49,100 --> 00:08:51,650
know, we have updated a bank account. It's got

155
00:08:51,650 --> 00:08:53,680
a hundred dollars in it. It used to have

156
00:08:53,680 --> 00:08:56,770
ten dollars in it. These people are reading ten.

157
00:08:56,770 --> 00:08:59,020
These people are reading a hundred. That's available. The

158
00:08:59,020 --> 00:09:01,200
data is now not consistent. But all of the

159
00:09:01,200 --> 00:09:03,220
nodes can send back that data.

160
00:09:03,220 --> 00:09:06,660
And so all of the discussion about the CAP Theorem

161
00:09:06,660 --> 00:09:11,020
and, and you know, people even claiming, we've defeated

162
00:09:11,020 --> 00:09:14,040
the CAP Theorem in our database at, you know,

163
00:09:14,040 --> 00:09:19,430
low-low prices is incredibly awesome. Just remember this image.

164
00:09:19,430 --> 00:09:26,000
Two things that cannot communicate cannot communicate. It's science.

165
00:09:26,000 --> 00:09:28,250
And then when they can communicate, we're back into

166
00:09:28,250 --> 00:09:30,850
the realm of normal operations and things get a

167
00:09:30,850 --> 00:09:35,090
lot easier.
If you were interested in any of 168 00:09:35,090 --> 00:09:39,540 the guts of how these things work, definitely have 169 00:09:39,540 --> 00:09:42,430 a look at a thing called jepsen, which is 170 00:09:42,430 --> 00:09:47,670 this crazy motherfucker who is just analyzing the network 171 00:09:47,670 --> 00:09:51,510 operations of a whole variety of distributed systems, and 172 00:09:51,510 --> 00:09:54,900 it will, it's just, it will blow your mind. 173 00:09:54,900 --> 00:10:01,080 OK. Good. That's, that's why. Now I remember. 174 00:10:01,080 --> 00:10:04,590 So, here is our cast. We're about to go 175 00:10:04,590 --> 00:10:08,850 on an adventure through a tortured maze of ridiculous 176 00:10:08,850 --> 00:10:11,620 Dungeons and Dragons metaphors. But, first of all, a 177 00:10:11,620 --> 00:10:14,960 shout out to the OwlBear. Yeah. The thing I 178 00:10:14,960 --> 00:10:18,740 love about the OwlBear is they've taken the wrong, 179 00:10:18,740 --> 00:10:22,680 the least scary aspects of a bear and an 180 00:10:22,680 --> 00:10:26,260 owl, like if that was an owl with, you 181 00:10:26,260 --> 00:10:30,120 know, if it had a bears head and wings, 182 00:10:30,120 --> 00:10:33,620 that would be way more scary. Anyway. 183 00:10:33,620 --> 00:10:37,130 It's just been bugging me for months. So. 184 00:10:37,130 --> 00:10:41,670 PostGres. As we all know, it's MySQL for hipsters. 185 00:10:41,670 --> 00:10:45,050 It's actually pretty good. So here's its character reference 186 00:10:45,050 --> 00:10:49,320 sheet. We, it's a relational database. It has a 187 00:10:49,320 --> 00:10:53,760 consistent model. So under conditions in network partition, you 188 00:10:53,760 --> 00:10:56,650 know, your, your slave is not in contact with 189 00:10:56,650 --> 00:11:00,480 the master, it's, it's essentially unavailable. That's the way 190 00:11:00,480 --> 00:11:01,730 we treat it. 
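The two-node partition demo above, with the bank account that reads ten dollars on one side and a hundred on the other, can be modeled as a toy simulation. This is a hypothetical sketch, not how any particular database replicates: `TwoNodeCluster`, `Unavailable` and the `mode` flag are invented for illustration. In "CP" mode a replica that has lost contact with the primary refuses reads, which is roughly how the talk describes a Postgres slave cut off from its master. In "AP" mode it keeps answering with whatever it last saw.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk returning stale data."""


class TwoNodeCluster:
    """Toy model: a primary and a replica joined by a single network link.

    mode="CP": while partitioned, the replica refuses reads (consistent, not available).
    mode="AP": while partitioned, the replica answers with what it last saw
               (available, not consistent).
    """

    def __init__(self, mode, initial):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data = {"primary": initial, "replica": initial}
        self.partitioned = False

    def write(self, value):
        # All writes land on the primary.
        self.data["primary"] = value
        if not self.partitioned:
            # Link is up: replicate straight away.
            self.data["replica"] = value

    def heal(self):
        # Link restored: the replica catches up and both nodes agree again.
        self.partitioned = False
        self.data["replica"] = self.data["primary"]

    def read(self, node):
        if node == "replica" and self.partitioned and self.mode == "CP":
            raise Unavailable("replica cannot reach the primary")
        return self.data[node]


def demo(mode):
    cluster = TwoNodeCluster(mode, initial=10)  # the account holds $10
    cluster.partitioned = True                  # the link between the nodes goes down
    cluster.write(100)                          # the deposit reaches the primary only
    try:
        replica_view = cluster.read("replica")
    except Unavailable:
        replica_view = "unavailable"
    return cluster.read("primary"), replica_view


print(demo("CP"))  # (100, 'unavailable')
print(demo("AP"))  # (100, 10)
```

Neither mode "defeats" the partition. The only choice is what the cut-off node does. Once `heal()` runs, both modes read 100 again, which is the talk's point that normal operation is easy once the nodes can talk.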
191
00:11:01,730 --> 00:11:05,370
Postgres is actually really, really interesting tech, because it

192
00:11:05,370 --> 00:11:10,290
has a bunch of cool stuff hidden underneath it.

193
00:11:10,290 --> 00:11:13,010
So there's a thing called hstore which is a

194
00:11:13,010 --> 00:11:15,650
key-value store that's baked right in. So if you

195
00:11:15,650 --> 00:11:18,720
need a lightweight key-value store and you're already running

196
00:11:18,720 --> 00:11:22,700
Postgres in production, you, you have one. You don't

197
00:11:22,700 --> 00:11:25,600
need to spin up any other thing. You can

198
00:11:25,600 --> 00:11:27,450
actually do that today.

199
00:11:27,450 --> 00:11:30,320
The really interesting thing about that is, you can

200
00:11:30,320 --> 00:11:34,100
index those keys. You can do joins from an

201
00:11:34,100 --> 00:11:38,230
hstore value across multiple tables. It looks and

202
00:11:38,230 --> 00:11:40,250
feels exactly like the kind of thing that you're

203
00:11:40,250 --> 00:11:42,060
already working with.

204
00:11:42,060 --> 00:11:46,560
We've got, there's some things already baked into the

205
00:11:46,560 --> 00:11:49,380
Rails ecosystem that make this really easy if you're

206
00:11:49,380 --> 00:11:52,800
storing that kind of information. But the really exciting

207
00:11:52,800 --> 00:11:55,570
thing about what Postgres is up to at the

208
00:11:55,570 --> 00:12:01,720
moment is JSON. And 9.2, 9.3, and upcoming 9.4

209
00:12:01,720 --> 00:12:05,810
have pretty much a fully baked-in JSON document

210
00:12:05,810 --> 00:12:11,190
database. And it is crazy awesome. The new one [jsonb]

211
00:12:11,190 --> 00:12:14,350
is super high-performance. If you were sort of, it's

212
00:12:14,350 --> 00:12:17,400
the same thing.
If you're thinking, ah, you know,

213
00:12:17,400 --> 00:12:20,250
documents would be easier for this use case, let's

214
00:12:20,250 --> 00:12:24,730
install something else, well actually, you already have one,

215
00:12:24,730 --> 00:12:26,690
and it, it has all of those same properties.

216
00:12:26,690 --> 00:12:28,610
You can index. You can do joins across your

217
00:12:28,610 --> 00:12:32,760
normal table into the documents. It's crazy cool.

218
00:12:32,760 --> 00:12:36,670
MySQL. It's pretty much the same as Postgres, is

219
00:12:36,670 --> 00:12:42,310
my answer. But there's a slight caveat. So, you

220
00:12:42,310 --> 00:12:47,900
know, Oracle, they're a company. Many of

221
00:12:47,900 --> 00:12:50,180
the same things apply. Like, this is why, you

222
00:12:50,180 --> 00:12:52,380
know, they're, they're kind of in the same bucket.

223
00:12:52,380 --> 00:12:55,950
For me, it doesn't particularly matter at the end

224
00:12:55,950 --> 00:12:58,190
of the day. Whatever you happen to have expertise

225
00:12:58,190 --> 00:13:01,170
in, it's cool. It's got some kind of interesting

226
00:13:01,170 --> 00:13:02,870
things that you can do. You can switch out

227
00:13:02,870 --> 00:13:07,700
storage engines to actually get different performance profiles.

228
00:13:07,700 --> 00:13:11,530
It is everywhere. It's got a thing called HandlerSocket,

229
00:13:11,530 --> 00:13:16,410
which is essentially raw read/write access through a

230
00:13:16,410 --> 00:13:19,740
low-level socket into the table infrastructure. There's some paper

231
00:13:19,740 --> 00:13:24,440
with really high performance kind of things.

232
00:13:24,440 --> 00:13:26,660
You can actually just sort of bypass the whole

233
00:13:26,660 --> 00:13:29,560
SQL engine, which is kind of interesting.
The other

234
00:13:29,560 --> 00:13:31,870
thing that's happened since Oracle took over, which is

235
00:13:31,870 --> 00:13:35,340
kind of a really good thing, is that there are

236
00:13:35,340 --> 00:13:40,470
some alternatives. So MariaDB is sort of the, the

237
00:13:40,470 --> 00:13:44,650
more open fork. There's a semi-commercial edition that has

238
00:13:44,650 --> 00:13:47,900
lots of really high-performance features, and they basically run

239
00:13:47,900 --> 00:13:51,610
binary-compatible patches, that's Percona. And they have, like,

240
00:13:51,610 --> 00:13:55,610
huge expertise. And then Toku is quite interesting. It's,

241
00:13:55,610 --> 00:13:58,400
they're doing all of this crazy fractal indexing and

242
00:13:58,400 --> 00:14:02,250
things for particular use cases on very large datasets.

243
00:14:02,250 --> 00:14:04,900
But it still just looks and behaves in many

244
00:14:04,900 --> 00:14:07,890
ways like the MySQL that you are kind of

245
00:14:07,890 --> 00:14:08,630
used to.

246
00:14:08,630 --> 00:14:13,130
So, there's some interesting things happening there. So these,

247
00:14:13,130 --> 00:14:16,640
hopefully none of that's a huge surprise. That's databases.

248
00:14:16,640 --> 00:14:21,320
You use it. It comes in the box, and

249
00:14:21,320 --> 00:14:22,600
ActiveRecord talks to it.

250
00:14:22,600 --> 00:14:24,779
So now we're gonna get slightly off the beaten

251
00:14:24,779 --> 00:14:30,370
track. So, a lot of what we know as NoSQL

252
00:14:30,370 --> 00:14:35,370
comes from Dynamo, which was actually a paper that

253
00:14:35,370 --> 00:14:40,160
Amazon released years ago. I'm not gonna labor too

254
00:14:40,160 --> 00:14:42,460
much on this one. The paper's quite interesting. It

255
00:14:42,460 --> 00:14:48,000
talks about how you build a highly available distributed key-value store.
256
00:14:48,000 --> 00:14:51,930
The interesting thing is actually that Riak is essentially

257
00:14:51,930 --> 00:14:55,300
an implementation of the underlying Dynamo theory. So Riak

258
00:14:55,300 --> 00:14:58,340
is crazy awesome. This is what happens to you

259
00:14:58,340 --> 00:15:01,580
when you run Riak in production.

260
00:15:01,580 --> 00:15:02,430
[laughter]

261
00:15:02,430 --> 00:15:05,930
I pretty much, like, it's a conversation I, I

262
00:15:05,930 --> 00:15:08,720
often have with people is like, wouldn't it be

263
00:15:08,720 --> 00:15:12,750
awesome to have a problem that needed Riak? And

264
00:15:12,750 --> 00:15:13,710
it was like, yeah, that would be so cool.

265
00:15:13,710 --> 00:15:17,970
I'd be like the awesomeness engineer.

266
00:15:17,970 --> 00:15:21,430
So Riak is, it's just crazy well-engineered. They're doing

267
00:15:21,430 --> 00:15:26,870
all sorts of interesting stuff. It's inherently, it just

268
00:15:26,870 --> 00:15:30,680
understands clustering. You know, you add a new node,

269
00:15:30,680 --> 00:15:35,260
it just, it's there. You know. With, with those

270
00:15:35,260 --> 00:15:37,610
older kind of databases, it's, it's a pain in

271
00:15:37,610 --> 00:15:40,170
the ass to actually get it working.

272
00:15:40,170 --> 00:15:45,339
So, yeah, they're doing some really interesting things. It's

273
00:15:45,339 --> 00:15:47,920
got a cloud storage thing so you've got an

274
00:15:47,920 --> 00:15:50,120
S3-compatible API and all that kind of stuff.

275
00:15:50,120 --> 00:15:51,360
A lot of the magic of the way this

276
00:15:51,360 --> 00:15:56,660
works is through consistent hashing. So, my slides are

277
00:15:56,660 --> 00:15:58,380
all mucked up. But anyway.

278
00:15:58,380 --> 00:16:00,640
So, basically what it does is it just partitions

279
00:16:00,640 --> 00:16:05,350
all of your data into a giant hash ring.

280
00:16:05,350 --> 00:16:10,450
Excuse me.
Physical nodes then just own parts of

281
00:16:10,450 --> 00:16:12,370
that hash ring. You add a new node or take

282
00:16:12,370 --> 00:16:15,720
a node away and only that node's share

283
00:16:15,720 --> 00:16:17,940
of the data gets moved around. And all

284
00:16:17,940 --> 00:16:21,480
of that is just completely in the background of

285
00:16:21,480 --> 00:16:24,680
how Riak just works operationally.

286
00:16:24,680 --> 00:16:27,300
So for large scale data and, you know, you,

287
00:16:27,300 --> 00:16:30,710
you get away with, it has some really nice

288
00:16:30,710 --> 00:16:34,170
operational characteristics that, that make it quite cool to

289
00:16:34,170 --> 00:16:34,750
manage.

290
00:16:34,750 --> 00:16:36,529
And then the other thing is, it's a very

291
00:16:36,529 --> 00:16:40,190
simple API. It's a key-value store, you can store JSON

292
00:16:40,190 --> 00:16:42,220
documents in it, and it's just a bucket that

293
00:16:42,220 --> 00:16:45,130
has keys, and then it's got other stuff on

294
00:16:45,130 --> 00:16:49,510
top to retrieve data, do secondary indexes and searching

295
00:16:49,510 --> 00:16:51,130
and all of that kind of stuff.

296
00:16:51,130 --> 00:16:54,220
So, it's a very cool piece of tech.

297
00:16:54,220 --> 00:16:59,080
So, the other one we've got is, Google. Fucking

298
00:16:59,080 --> 00:17:03,810
annoying. And you'll see why in a second. So,

299
00:17:03,810 --> 00:17:06,980
Google had this thing called Bigtable that, again, kind

300
00:17:06,980 --> 00:17:10,470
of comes out of their internal research. You have

301
00:17:10,470 --> 00:17:14,299
access to it through some of their cloud properties.

302
00:17:14,299 --> 00:17:16,799
As you can see, it's got, it's actually a

303
00:17:16,799 --> 00:17:21,289
sparse distributed multidimensional sorted map, which is good, I

304
00:17:21,289 --> 00:17:23,618
guess. I imagine. It's awesome.

305
00:17:23,618 --> 00:17:27,720
The stuff they're doing with this is crazy.
So

306
00:17:27,720 --> 00:17:30,190
this is actually a, all, a couple years old

307
00:17:30,190 --> 00:17:33,409
I think now. Some of these, some of the

308
00:17:33,409 --> 00:17:37,190
information, so. Hundreds of petabytes of data, you know,

309
00:17:37,190 --> 00:17:40,580
ridiculous numbers of operations a second. You do not

310
00:17:40,580 --> 00:17:42,649
have any of these problems.

311
00:17:42,649 --> 00:17:46,879
So, then they, they took this stuff, they were

312
00:17:46,879 --> 00:17:50,210
like, ah, we've got Bigtable. You know, that was,

313
00:17:50,210 --> 00:17:53,499
that was fucking easy. Whatever. And so now they've

314
00:17:53,499 --> 00:17:55,480
got two other things. They've got one called Spanner

315
00:17:55,480 --> 00:18:00,019
and one called F1, where they're basically doing, you

316
00:18:00,019 --> 00:18:07,019
know, proper, sort of relational-looking data across multiple

317
00:18:07,350 --> 00:18:10,320
data centers and, you know, and. They're kind of

318
00:18:10,320 --> 00:18:12,590
really pushing the boundaries of some of that CAP

319
00:18:12,590 --> 00:18:14,710
stuff that's going on.

320
00:18:14,710 --> 00:18:18,490
But all you need is a GPS in every

321
00:18:18,490 --> 00:18:21,379
server, a couple of atomic clocks in each data

322
00:18:21,379 --> 00:18:26,830
center, and you, great. So, Google's basically telling everyone

323
00:18:26,830 --> 00:18:29,720
to, you know, just fuck off.

324
00:18:29,720 --> 00:18:35,169
So, another one that I really, I really like,

325
00:18:35,169 --> 00:18:39,490
and have used a long, a long time ago

326
00:18:39,490 --> 00:18:45,690
in, in tech land, tech time, is Cassandra. Cassandra

327
00:18:45,690 --> 00:18:50,110
is a column-oriented database. Eventually it's awesome. It's really

328
00:18:50,110 --> 00:18:54,240
all about eventual consistency.

329
00:18:54,240 --> 00:18:57,519
And you can see here, this is a man,

330
00:18:57,519 --> 00:18:59,259
he eventually gets it right.
So that's well done

331
00:18:59,259 --> 00:19:02,360
to him there. So Cassandra's a lot like that.

332
00:19:02,360 --> 00:19:06,019
And, again, you know, the cool thing is, it's

333
00:19:06,019 --> 00:19:10,549
a sparse distributed multidimensional sorted map. It, when

334
00:19:10,549 --> 00:19:13,350
I was working with it, you, it was, you

335
00:19:13,350 --> 00:19:16,100
had, you described your tables kind of thing in

336
00:19:16,100 --> 00:19:20,309
XML and hated yourself, and then every time something

337
00:19:20,309 --> 00:19:23,460
changed you rebooted the server and that took a while

338
00:19:23,460 --> 00:19:27,389
and, yeah, the whole thing was really difficult.

339
00:19:27,389 --> 00:19:30,570
What it basically does is it takes the availability

340
00:19:30,570 --> 00:19:33,570
side of the question. Like, that's its world model.

341
00:19:33,570 --> 00:19:37,830
It has, again, a very simple clustering system. New

342
00:19:37,830 --> 00:19:41,289
nodes get added in, the data gets streamed out. It

343
00:19:41,289 --> 00:19:46,070
has a data model that is really complicated, and

344
00:19:46,070 --> 00:19:48,470
I, even though I've used it, it's really hard

345
00:19:48,470 --> 00:19:50,909
to explain how it actually works.

346
00:19:50,909 --> 00:19:54,730
So column databases basically kind of invert the, the

347
00:19:54,730 --> 00:19:56,700
whole table structure that you're used to from the

348
00:19:56,700 --> 00:20:01,190
relational world. And the advantage is that, for some

349
00:20:01,190 --> 00:20:04,159
types of data, and for some queries, it is

350
00:20:04,159 --> 00:20:07,600
crazy blazing fast, cause you can just. Time series

351
00:20:07,600 --> 00:20:08,619
are always a good one, where you can just

352
00:20:08,619 --> 00:20:10,929
have long streams of time series and it will

353
00:20:10,929 --> 00:20:13,490
actually put that on disk next to each

354
00:20:13,490 --> 00:20:15,600
other and you can just pull it all out.
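The hash ring described for Riak, and the way Cassandra streams data out when new nodes are added, both rest on consistent hashing. Below is a minimal sketch, assuming MD5 as the ring hash and 64 virtual nodes per physical node. `HashRing` and the node names are hypothetical, and real systems such as Riak and Cassandra differ in how they lay out partitions and hand data off. What it demonstrates is that adding a node moves only the keys that fall into the new node's slices of the ring, rather than reshuffling everything.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string to a point on the ring (a 128-bit integer taken from MD5)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes  # virtual nodes per physical node smooth out the spread
        self._points = []     # sorted ring positions
        self._owners = {}     # ring position -> physical node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = ring_position(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = node

    def remove(self, node):
        self._points = [p for p in self._points if self._owners[p] != node]
        self._owners = {p: n for p, n in self._owners.items() if n != node}

    def owner(self, key):
        """A key belongs to the first virtual node clockwise from its position."""
        idx = bisect.bisect_right(self._points, ring_position(key))
        if idx == len(self._points):  # wrap around past the top of the ring
            idx = 0
        return self._owners[self._points[idx]]


keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add("node-d")
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Every key that moved went to the new node; nothing shuffled between the old nodes.
print(all(after[k] == "node-d" for k in moved), len(moved) / len(keys))
```

With a naive `hash(key) % number_of_nodes` scheme, going from three nodes to four would relocate most keys. Here roughly the new node's fair share moves, and calling `remove("node-d")` gives exactly that share back to the previous owners.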
355
00:20:15,600 --> 00:20:18,509
The cool thing in the new versions of Cassandra

356
00:20:18,509 --> 00:20:22,299
is that they've abstracted all of that out, and

357
00:20:22,299 --> 00:20:25,570
you actually just get tables, so you can create

358
00:20:25,570 --> 00:20:28,200
a table and give it a primary key, and

359
00:20:28,200 --> 00:20:32,239
under the covers, it's setting up rows and column

360
00:20:32,239 --> 00:20:35,239
families and columns and all of, all of these

361
00:20:35,239 --> 00:20:39,389
really abstract concepts, and they've completely made some of

362
00:20:39,389 --> 00:20:41,499
that go away. Which is really nice.

363
00:20:41,499 --> 00:20:43,929
So you end up with something that looks a

364
00:20:43,929 --> 00:20:48,739
lot like just SQL and, you know, a normal

365
00:20:48,739 --> 00:20:52,649
table kind of structure. It's just clustered out across lots

366
00:20:52,649 --> 00:20:55,100
of nodes. It's very tunable, so you can actually

367
00:20:55,100 --> 00:20:57,989
set up, you know, it writes to a node

368
00:20:57,989 --> 00:21:00,019
and you can say, actually write to five nodes

369
00:21:00,019 --> 00:21:02,019
and that's a quorum and now we're cool. So

370
00:21:02,019 --> 00:21:06,019
you can tune how much redundancy you have.

371
00:21:06,019 --> 00:21:12,590
So that's kind of cool. That is a reminder.

372
00:21:12,590 --> 00:21:17,559
That went cold really fast. Thank you.

373
00:21:17,559 --> 00:21:20,830
So, the next one on our list is Memcache.

374
00:21:20,830 --> 00:21:24,220
Memcache, there was, there was a talk earlier in

375
00:21:24,220 --> 00:21:27,529
the week that was describing using Memcache and caching

376
00:21:27,529 --> 00:21:29,789
and it, it had a very interesting observation, which

377
00:21:29,789 --> 00:21:32,669
was, it just works. He didn't even know what

378
00:21:32,669 --> 00:21:36,590
version he was running in production, cause neh. Doesn't

379
00:21:36,590 --> 00:21:38,739
matter.
That API has been stable for ages.

380
00:21:38,739 --> 00:21:42,419
And I know, I know what you're saying. It's

381
00:21:42,419 --> 00:21:45,559
not a database. It's a cache. Technically true. But

382
00:21:45,559 --> 00:21:48,049
it's interesting to think about, because the moment you

383
00:21:48,049 --> 00:21:51,379
add caching, even if you've been ignoring the fact

384
00:21:51,379 --> 00:21:54,779
that you had a distributed system before, with caching

385
00:21:54,779 --> 00:21:57,330
you now really have a distributed system. You've got

386
00:21:57,330 --> 00:21:59,980
data in one thing that may or may not

387
00:21:59,980 --> 00:22:02,759
be fresh, and you've got data in your database

388
00:22:02,759 --> 00:22:05,119
that, you know, you assume is up to date,

389
00:22:05,119 --> 00:22:07,249
and now you've got a synchronization problem.

390
00:22:07,249 --> 00:22:12,080
So, Memcache is actually really, you know, it's, it's

391
00:22:12,080 --> 00:22:16,659
just rock-solid, old-as-the-hills technology, completely

392
00:22:16,659 --> 00:22:22,279
simple. The API is everywhere. Lots of people actually

393
00:22:22,279 --> 00:22:26,119
have made their, you know, key-value store they made

394
00:22:26,119 --> 00:22:28,309
at a hack night, which, you know, is a useful

395
00:22:28,309 --> 00:22:30,739
hobby if you want to annoy everyone.

396
00:22:30,739 --> 00:22:33,139
You have the, their API is actually the Memcached

397
00:22:33,139 --> 00:22:36,080
API. It's got a handful of things. You can

398
00:22:36,080 --> 00:22:40,129
set a key, you can replace one. It does

399
00:22:40,129 --> 00:22:43,679
have some atomic operations so you can increment and

400
00:22:43,679 --> 00:22:46,149
decrement so that there is some flexibility to actually

401
00:22:46,149 --> 00:22:51,669
do a little bit of data storage in a,

402
00:22:51,669 --> 00:22:55,779
in a more traditional sense.

403
00:22:55,779 --> 00:22:59,389
It's actually a client-server model.
Your, your driver is 404 00:22:59,389 --> 00:23:02,429 responsible for the clustering in a way, so you 405 00:23:02,429 --> 00:23:07,049 can have multiple Memcache nodes and the, the hashing 406 00:23:07,049 --> 00:23:11,279 algorithm determines which node, which node a particular piece 407 00:23:11,279 --> 00:23:13,440 of data is gonna be on. 408 00:23:13,440 --> 00:23:15,960 That has the property of making it very, very 409 00:23:15,960 --> 00:23:19,440 simple to use. And there's no cluster state. There's 410 00:23:19,440 --> 00:23:21,889 no coordination that nodes have. Like, a lot of 411 00:23:21,889 --> 00:23:23,519 the heavy lifting all of these other things are 412 00:23:23,519 --> 00:23:27,869 doing is about coordinating around all of that information. 413 00:23:27,869 --> 00:23:29,749 There's a whole bunch of awesome stuff just baked 414 00:23:29,749 --> 00:23:34,519 into Rails. So you can just easily cache into 415 00:23:34,519 --> 00:23:38,940 Memcache, or your normal Rails fragment mutations. All of 416 00:23:38,940 --> 00:23:40,869 that kind of stuff. 417 00:23:40,869 --> 00:23:42,409 And there's even some things we can, you can 418 00:23:42,409 --> 00:23:46,289 actually put, push that into ActiveRecord and have, have 419 00:23:46,289 --> 00:23:48,440 caching at that level as well. 420 00:23:48,440 --> 00:23:50,700 Redis is an interesting one for the, the Rails 421 00:23:50,700 --> 00:23:56,580 community. Cause it's basically a queue, now. Everyone seems 422 00:23:56,580 --> 00:24:01,369 to be running Resque, Sidekiq, and, you know, Redis 423 00:24:01,369 --> 00:24:05,659 is, again, one of those just pieces of technology 424 00:24:05,659 --> 00:24:12,220 that is beautifully engineered, incredibly simple, incredibly robust. The 425 00:24:12,220 --> 00:24:19,220 maintainers are just absolute, you know, scientists, I guess. 426 00:24:19,309 --> 00:24:22,999 Just a whole other level of crazy algorithm stuff.
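The driver-side clustering described earlier for Memcache can be sketched as a minimal, hypothetical Python toy (not any real Memcache client): the client hashes each key onto a consistent-hash ring to pick a node, and the nodes share no state at all. Consistent hashing is one common choice; some clients use plain modulo hashing instead. The `set`, `replace`, `incr`, and `decr` methods loosely mimic the memcached commands of those names; the class names and the number of ring points per node are assumptions.

```python
import bisect
import hashlib


class Node:
    """Stand-in for one Memcache server: just a dict (hypothetical toy)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ShardedClient:
    """Toy client: the client alone decides which node owns a key.

    Nodes never talk to each other, so there is no cluster state to coordinate.
    """

    def __init__(self, nodes, points_per_node=100):
        self.nodes = {n.name: n for n in nodes}
        # Place each node at many points on a hash ring to spread keys evenly.
        ring = sorted(
            (self._hash(f"{n.name}#{i}"), n.name)
            for n in nodes
            for i in range(points_per_node)
        )
        self._points = [h for h, _ in ring]
        self._owners = [name for _, name in ring]

    @staticmethod
    def _hash(text):
        return int(hashlib.md5(text.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self.nodes[self._owners[i]]

    def set(self, key, value):
        self.node_for(key).data[key] = value  # always stores
        return True

    def get(self, key):
        return self.node_for(key).data.get(key)

    def replace(self, key, value):
        # Mimics memcached "replace": only stores if the key already exists.
        data = self.node_for(key).data
        if key not in data:
            return False
        data[key] = value
        return True

    def incr(self, key, delta=1):
        # Mimics memcached "incr": None when the key is missing.
        # (Single-threaded toy; real memcached does this atomically server-side.)
        data = self.node_for(key).data
        if key not in data:
            return None
        data[key] = int(data[key]) + delta
        return data[key]

    def decr(self, key, delta=1):
        # Mimics memcached "decr", which never goes below zero.
        data = self.node_for(key).data
        if key not in data:
            return None
        data[key] = max(0, int(data[key]) - delta)
        return data[key]
```

Because placement is a pure function of the key and the node list, any client configured with the same nodes finds the same data without asking the cluster anything.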
427 00:24:22,999 --> 00:24:25,299 And they make blog posts and, you know, I'm 428 00:24:25,299 --> 00:24:31,519 so stupid. I don't understand what you're talking about. 429 00:24:31,519 --> 00:24:35,989 It's really fast, it's slightly hard to distribute. A 430 00:24:35,989 --> 00:24:38,710 lot of that's in the pipeline with Redis. It's 431 00:24:38,710 --> 00:24:42,379 much more, it's much more simple to, to stick 432 00:24:42,379 --> 00:24:46,070 it on one node and increase the RAM. It's 433 00:24:46,070 --> 00:24:49,359 mu, more complicated than Memcache. It's essentially just an 434 00:24:49,359 --> 00:24:52,129 in-memory cache. It has a bunch of really interesting 435 00:24:52,129 --> 00:24:56,679 data structures, though. I think if you've been confused 436 00:24:56,679 --> 00:24:59,029 all week, now, which country I'm from, whether I 437 00:24:59,029 --> 00:25:01,720 say dayta or dahta, so now I just changed 438 00:25:01,720 --> 00:25:03,710 them randomly. 439 00:25:03,710 --> 00:25:08,070 So, you can, you have hashes, you have lists, 440 00:25:08,070 --> 00:25:09,779 you have strings. You've got all sorts of other 441 00:25:09,779 --> 00:25:14,129 interesting things. You can do optimistic locking and have, 442 00:25:14,129 --> 00:25:17,609 you know, a bunch of operations that are essentially 443 00:25:17,609 --> 00:25:22,369 batched. You can do sort of, there's long ways 444 00:25:22,369 --> 00:25:25,440 of doing this kind of stuff. Resque and 445 00:25:25,440 --> 00:25:28,690 Sidekiq both just make this, make it super simple 446 00:25:28,690 --> 00:25:31,139 to do background tasks with Rails and install the 447 00:25:31,139 --> 00:25:36,960 gem, have a worker, and it's all just magic. 448 00:25:36,960 --> 00:25:39,769 It has Lua baked in, which is a whole 449 00:25:39,769 --> 00:25:41,850 other thing. But Lua is a really cool programming 450 00:25:41,850 --> 00:25:44,940 language that is designed for embeddability.
But one of 451 00:25:44,940 --> 00:25:47,210 the things that happens is you can actually write 452 00:25:47,210 --> 00:25:51,389 little Lua scripts that end up going into 453 00:25:51,389 --> 00:25:54,519 the Redis server to do more complex operations. So, 454 00:25:54,519 --> 00:25:57,179 in this case, this is a little script that 455 00:25:57,179 --> 00:26:00,269 grabs something off a sorted set and then deletes 456 00:26:00,269 --> 00:26:02,789 them and then returns the first thing, like, then 457 00:26:02,789 --> 00:26:05,789 returns what we had done. But it's, it's done in an 458 00:26:05,789 --> 00:26:09,529 atomic, transactional kind of way. 459 00:26:09,529 --> 00:26:13,320 And, good news everybody! We've just invented stored procedures. 460 00:26:13,320 --> 00:26:16,409 So that's very exciting. Except now they're much more 461 00:26:16,409 --> 00:26:18,639 hip, because it's an in-memory database with a language 462 00:26:18,639 --> 00:26:23,330 no one's heard of. So. We are rocking it. 463 00:26:23,330 --> 00:26:28,470 Also, maybe use a queue. Just, I know it's 464 00:26:28,470 --> 00:26:32,869 crazy. But, if you're actually queuing, using Redis as 465 00:26:32,869 --> 00:26:36,809 your queue, maybe you have a queuing problem and 466 00:26:36,809 --> 00:26:39,609 you have queues. They exist. They're a thing. It's 467 00:26:39,609 --> 00:26:41,440 ridiculous. I know. 468 00:26:41,440 --> 00:26:46,379 So, RabbitMQ is sort of the gold standard, and 469 00:26:46,379 --> 00:26:49,129 Kafka is another one that was talked about earlier 470 00:26:49,129 --> 00:26:50,909 this week, and it is crazy cool. 471 00:26:50,909 --> 00:26:56,129 Where am I? Man. All right. Just gonna stretch. 472 00:26:56,129 --> 00:26:58,820 I've lost count, so I don't know, now I'm 473 00:26:58,820 --> 00:27:02,019 just gonna talk faster. Cool. 474 00:27:02,019 --> 00:27:08,369 Neo4j is really interesting. It's a graph database. That's.
475 00:27:08,369 --> 00:27:13,350 It's slightly hard to explain. But you, the way 476 00:27:13,350 --> 00:27:15,210 I actually think about it, we'll just jump straight 477 00:27:15,210 --> 00:27:17,460 to here, is it's almost but not quite entirely 478 00:27:17,460 --> 00:27:22,950 unlike a relational database. The difference, essentially, is that 479 00:27:22,950 --> 00:27:27,409 it is optimized for the connections rather than aggregated 480 00:27:27,409 --> 00:27:31,710 data. So relational database, you, puts things in, in 481 00:27:31,710 --> 00:27:33,279 a way where you can get a sum and 482 00:27:33,279 --> 00:27:35,179 a count and like, that's kind of the heritage 483 00:27:35,179 --> 00:27:37,029 of that kind of world view. 484 00:27:37,029 --> 00:27:40,340 Whereas what the Neo4j people are doing is actually 485 00:27:40,340 --> 00:27:44,739 thinking about connections between pieces of data, and for 486 00:27:44,739 --> 00:27:49,340 some use cases, this is actually really, really amazing 487 00:27:49,340 --> 00:27:52,369 stuff. So you have, a graph is basically a 488 00:27:52,369 --> 00:27:56,850 collection of nodes, and those nodes can have relationships 489 00:27:56,850 --> 00:27:59,179 between each other, and then a node just has 490 00:27:59,179 --> 00:28:01,330 properties. 491 00:28:01,330 --> 00:28:03,830 It's essentially an object database in a way. It's 492 00:28:03,830 --> 00:28:05,639 like very similar to the way that we think 493 00:28:05,639 --> 00:28:08,109 about objects. So it has some really nice properties 494 00:28:08,109 --> 00:28:11,859 if you're working in a language like Ruby. And 495 00:28:11,859 --> 00:28:17,009 then it just does stuff that, you know, in 496 00:28:17,009 --> 00:28:19,090 a really intuitive way. So if we've got a 497 00:28:19,090 --> 00:28:22,159 graph of movies and actors, you actually define a 498 00:28:22,159 --> 00:28:26,460 relationship by name. Then an actor acts in a 499 00:28:26,460 --> 00:28:28,700 movie.
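To make the "relationship by name" idea concrete, here is a minimal, hypothetical Python sketch (not Neo4j's storage model or API): nodes carry properties, edges carry a name such as `ACTED_IN`, and a co-star query walks relationships by name instead of joining through an actor_id/movie_id table. The `Graph` class, its methods, and the sample data are invented for illustration; the Cypher in the docstring is only a rough equivalent.

```python
class Graph:
    """Hypothetical in-memory property graph (not Neo4j's API).

    Nodes carry a dict of properties; relationships are directed and named.
    """

    def __init__(self):
        self.props = {}     # node id -> properties
        self.outgoing = {}  # (node id, relationship name) -> [target ids]
        self.incoming = {}  # (node id, relationship name) -> [source ids]

    def add_node(self, node_id, **props):
        self.props[node_id] = props

    def relate(self, src, rel, dst):
        self.outgoing.setdefault((src, rel), []).append(dst)
        self.incoming.setdefault((dst, rel), []).append(src)

    def co_actors(self, actor):
        """Actors with an ACTED_IN edge to any movie `actor` ACTED_IN.

        Roughly what this Cypher query asks for:
          MATCH (a:Actor {name: $name})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(co:Actor)
          RETURN DISTINCT co.name
        """
        found = set()
        for movie in self.outgoing.get((actor, "ACTED_IN"), []):
            found.update(self.incoming.get((movie, "ACTED_IN"), []))
        found.discard(actor)
        return found


g = Graph()
for name in ("Ann", "Bob", "Cat"):
    g.add_node(name, label="Actor")
for title in ("Movie A", "Movie B"):
    g.add_node(title, label="Movie")
g.relate("Ann", "ACTED_IN", "Movie A")
g.relate("Bob", "ACTED_IN", "Movie A")
g.relate("Bob", "ACTED_IN", "Movie B")
g.relate("Cat", "ACTED_IN", "Movie B")
print(sorted(g.co_actors("Bob")))  # ['Ann', 'Cat']
```

Keeping both outgoing and incoming indexes means each hop is a dictionary lookup, which is the kind of traversal a graph store optimizes for.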
And then when you were doing your queries, 500 00:28:28,700 --> 00:28:32,909 this is a language called Cypher, you actually, that's 501 00:28:32,909 --> 00:28:34,059 a first-class thing. 502 00:28:34,059 --> 00:28:36,019 Whereas in a relational world, you're, you're using a 503 00:28:36,019 --> 00:28:39,279 foreign key, which has no semantic meaning at all. 504 00:28:39,279 --> 00:28:41,330 You, you just have to remember that, you know, 505 00:28:41,330 --> 00:28:43,019 an actor, you know, there's a table with an 506 00:28:43,019 --> 00:28:45,729 actor id, and a movie id, and we're joining 507 00:28:45,729 --> 00:28:49,919 across somewhere. Whereas Neo4j actually makes those relationships first 508 00:28:49,919 --> 00:28:53,359 class citizens. So if you've got problems that are 509 00:28:53,359 --> 00:29:00,359 graph problems, like social network friend cloud stuff, some 510 00:29:01,549 --> 00:29:04,799 of that stuff, Neo4j just makes trivially easy in 511 00:29:04,799 --> 00:29:06,070 a way that you would have had to do 512 00:29:06,070 --> 00:29:10,119 a recursive self-join in PostGres and hate your life 513 00:29:10,119 --> 00:29:12,499 and, you know. 514 00:29:12,499 --> 00:29:17,029 Couch is cool. I guess. Pretty much that's my 515 00:29:17,029 --> 00:29:21,029 opinion of it. It's really awesome. But, you can't 516 00:29:21,029 --> 00:29:25,659 query it. So cool. 517 00:29:25,659 --> 00:29:28,109 That's it. That's a slight disservice to Couch but, 518 00:29:28,109 --> 00:29:31,970 you know, whatever. MongoDB, as we all know, it 519 00:29:31,970 --> 00:29:34,559 is webscale and that's excellent. If you think of 520 00:29:34,559 --> 00:29:39,200 it as Redis for JSON, that's good. Sixty percent 521 00:29:39,200 --> 00:29:41,249 of the time, it works every time. Everyone's familiar 522 00:29:41,249 --> 00:29:43,169 with that. 523 00:29:43,169 --> 00:29:46,929 So, the thing that's really, I mean, Mongo, it 524 00:29:46,929 --> 00:29:50,919 reminds me of My, MySQL. 
Like, Mongo is kind 525 00:29:50,919 --> 00:29:54,320 of terrible, but MySQL was kind of terrible, too. 526 00:29:54,320 --> 00:29:56,789 Like, when that came out, it didn't do transactions, 527 00:29:56,789 --> 00:30:00,039 for example, and I, I was working in enterprise-y 528 00:30:00,039 --> 00:30:04,419 land, and transactions are actually a thing. And, you're 529 00:30:04,419 --> 00:30:08,929 like, you script kiddies with your database. 530 00:30:08,929 --> 00:30:10,789 So Mongo feels like that, and not, you know, 531 00:30:10,789 --> 00:30:13,970 what we learned is, if you make something that's 532 00:30:13,970 --> 00:30:17,539 awesome and useful and everywhere and ubiquitous and it 533 00:30:17,539 --> 00:30:20,749 doesn't work, you can make it work. And eventually, 534 00:30:20,749 --> 00:30:23,309 you know, MySQL is a real database. So Mongo 535 00:30:23,309 --> 00:30:25,470 feels a bit like that. It's come a massive 536 00:30:25,470 --> 00:30:30,690 way, right, from really early on with very early 537 00:30:30,690 --> 00:30:32,309 versions. 538 00:30:32,309 --> 00:30:34,759 It stores JSON. Well, sort of. It stores 539 00:30:34,759 --> 00:30:39,710 BSON, anyway. That's just binary JSON basically. And it's 540 00:30:39,710 --> 00:30:42,409 a, it's a really beautiful model to work with 541 00:30:42,409 --> 00:30:45,129 in a development cycle, which I think is 542 00:30:45,129 --> 00:30:47,489 why there's, why there's so much appeal. You've just 543 00:30:47,489 --> 00:30:50,929 got kind of, people treat it like an object 544 00:30:50,929 --> 00:30:53,690 database. You've just got an object that's in there, 545 00:30:53,690 --> 00:30:55,720 and you can pull out objects and manipulate them 546 00:30:55,720 --> 00:30:59,859 and do all of this kind of crazy stuff.
547 00:30:59,859 --> 00:31:05,220 The people who know what they're talking about, though, 548 00:31:05,220 --> 00:31:08,450 with distributed systems, if the reason you're using Mongo 549 00:31:08,450 --> 00:31:10,299 is because you think it's a panacea for all 550 00:31:10,299 --> 00:31:13,700 of this, you know, we need to be webscale 551 00:31:13,700 --> 00:31:17,229 and do all of this kind of stuff, that 552 00:31:17,229 --> 00:31:19,399 is not a good reason to use it. Cause 553 00:31:19,399 --> 00:31:21,919 there, there's still a lot of operational problems and, 554 00:31:21,919 --> 00:31:23,739 and stuff going on. 555 00:31:23,739 --> 00:31:30,179 This, this one is interesting. It's essentially, RethinkDB is 556 00:31:30,179 --> 00:31:33,299 coming from the PostGres world view. Cause PostGres made, 557 00:31:33,299 --> 00:31:36,729 you know, MySQL was like, whatever, we'll fix it. 558 00:31:36,729 --> 00:31:39,669 PostGres was like, we'll do it right and it, 559 00:31:39,669 --> 00:31:41,629 you can't use it cause it's so slow, but 560 00:31:41,629 --> 00:31:43,539 at least it's correct. And they took lots of 561 00:31:43,539 --> 00:31:46,539 iterations to make it usable. So Rethink is kind 562 00:31:46,539 --> 00:31:48,340 of that school of thought. It's like, we're gonna 563 00:31:48,340 --> 00:31:50,619 make it all correct first, and then we'll make 564 00:31:50,619 --> 00:31:55,799 it usable. So it's very similar idea. JSON, you 565 00:31:55,799 --> 00:31:59,429 know, they're trying to make it operationally great with 566 00:31:59,429 --> 00:32:02,979 automatic clustering and all this kind of stuff. You 567 00:32:02,979 --> 00:32:05,149 know. Who knows what it is and how it's 568 00:32:05,149 --> 00:32:07,179 actually gonna behave in the real world. It's still 569 00:32:07,179 --> 00:32:09,159 a very early piece of tech. 
570 00:32:09,159 --> 00:32:11,249 And that leads me into, there's a whole world 571 00:32:11,249 --> 00:32:15,479 of databases around what I'm loosely calling the commercial 572 00:32:15,479 --> 00:32:20,149 fringe. So Couchbase is the Couch guys and sort 573 00:32:20,149 --> 00:32:24,019 of some commercial Memcached guys who got together to 574 00:32:24,019 --> 00:32:28,409 make a hybrid something. Aerospike is, their marketing is 575 00:32:28,409 --> 00:32:31,519 great. That's about the best you can say about 576 00:32:31,519 --> 00:32:31,869 it. 577 00:32:31,869 --> 00:32:33,289 So there's a whole bunch of people trying to 578 00:32:33,289 --> 00:32:36,799 solve these problems in interesting ways. But all of 579 00:32:36,799 --> 00:32:40,720 these ones cost money and, you know, they're, the 580 00:32:40,720 --> 00:32:42,200 mileage varies and all of that kind of stuff. 581 00:32:42,200 --> 00:32:43,539 The cool thing about open source ones is you 582 00:32:43,539 --> 00:32:45,029 get it and you try it and you hate 583 00:32:45,029 --> 00:32:46,570 it and you go back to PostGres so it's 584 00:32:46,570 --> 00:32:48,190 all fine. 585 00:32:48,190 --> 00:32:53,190 So, HyperDex. This is my favorite. Because they have 586 00:32:53,190 --> 00:32:58,379 HyperSpace Hashing, and it is so cool. These guys 587 00:32:58,379 --> 00:33:02,369 are making some really broad, amazing claims about the, 588 00:33:02,369 --> 00:33:06,549 the kind of things that they can do. Crazy 589 00:33:06,549 --> 00:33:08,690 fast. It's, it's a key-value store but it will 590 00:33:08,690 --> 00:33:11,599 index, you know, it's not just a key but 591 00:33:11,599 --> 00:33:14,039 it will index the properties of a value. So 592 00:33:14,039 --> 00:33:16,509 now you can do que, you know, genuine queries 593 00:33:16,509 --> 00:33:20,629 into the structure of objects that you're storing.
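HyperDex's "hyperspace hashing" is easier to picture with a toy. The sketch below is a deliberately simplified, hypothetical illustration, not HyperDex's code or API: each attribute is one axis, an object's coordinates are hashes of its attribute values, and the space is cut into a grid of regions. A put lands in exactly one region, while a search that pins only some attributes visits just the regions along the unpinned axes rather than the whole space. The attribute names, slice count, and single-process layout are assumptions; the real system layers more on top, such as replication and mapping regions onto servers.

```python
import hashlib
from itertools import product

ATTRS = ("first", "last")  # one axis per attribute (assumed toy schema)
SLICES = 4                 # each axis cut into 4 slices -> 4 x 4 = 16 regions


def coord(value):
    """Hash an attribute value to a slice index along its axis."""
    return int(hashlib.sha1(str(value).encode()).hexdigest(), 16) % SLICES


def region_of(obj):
    """The single region (grid cell) an object is stored in."""
    return tuple(coord(obj[a]) for a in ATTRS)


def regions_for_search(criteria):
    """Regions a search must visit: pinned on given axes, every slice on the rest."""
    axes = [[coord(criteria[a])] if a in criteria else range(SLICES) for a in ATTRS]
    return set(product(*axes))


class ToyHyperspace:
    def __init__(self):
        # In a real deployment each region would map to a server; here it's a list.
        self.regions = {r: [] for r in product(range(SLICES), repeat=len(ATTRS))}

    def put(self, obj):
        self.regions[region_of(obj)].append(obj)

    def search(self, **criteria):
        """Return (matching objects, number of regions visited)."""
        visited = regions_for_search(criteria)
        hits = [
            obj
            for r in visited
            for obj in self.regions[r]
            if all(obj[a] == v for a, v in criteria.items())
        ]
        return hits, len(visited)


h = ToyHyperspace()
h.put({"first": "Ada", "last": "Lovelace"})
h.put({"first": "Alan", "last": "Turing"})
h.put({"first": "Ada", "last": "Byron"})
hits, visited = h.search(last="Turing")
print([o["first"] for o in hits], visited)  # ['Alan'] 4
```

Searching by `last` alone touches 4 of the 16 regions; pinning both attributes narrows it to exactly one.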
594 00:33:20,629 --> 00:33:23,499 They've got a whole bunch of papers around what 595 00:33:23,499 --> 00:33:27,299 they're doing. So, you can read that as, who 596 00:33:27,299 --> 00:33:29,679 knows what it means. It maps objects to coordinates 597 00:33:29,679 --> 00:33:34,529 in a multi-dimensional Euclidean space. HyperSpace. And I'm like. 598 00:33:34,529 --> 00:33:37,109 Take my money! 599 00:33:37,109 --> 00:33:40,989 And there's a, there's a picture of HyperSpace. And, 600 00:33:40,989 --> 00:33:43,659 like, I've read that like eight times. I don't 601 00:33:43,659 --> 00:33:49,999 understand what's going on. But if, it does seem 602 00:33:49,999 --> 00:33:52,070 to be true. They're trying to solve some of 603 00:33:52,070 --> 00:33:54,720 these problems and, you know, they call themselves like 604 00:33:54,720 --> 00:33:59,659 a second generation NoSQL thing, in a similar way 605 00:33:59,659 --> 00:34:01,669 to Google, you know, kind of taking all of 606 00:34:01,669 --> 00:34:05,039 this stuff and trying to push the science underneath 607 00:34:05,039 --> 00:34:06,999 it forward. 608 00:34:06,999 --> 00:34:09,510 So you can, you know, it's got a Ruby 609 00:34:09,510 --> 00:34:12,960 client. You can use it now. It's got, just, 610 00:34:12,960 --> 00:34:18,429 normal key-value. It's got atomic stuff. You can do 611 00:34:18,429 --> 00:34:22,969 conditional puts, so this is some code that's basically 612 00:34:22,969 --> 00:34:26,860 only updating if the, only updating the current 613 00:34:26,860 --> 00:34:31,969 balance if the, updating the balance if the current 614 00:34:31,969 --> 00:34:34,460 balance is what we think it is. Otherwise some 615 00:34:34,460 --> 00:34:36,460 other thread has updated it. 616 00:34:36,460 --> 00:34:38,889 So there's some really interesting stuff they can do. 617 00:34:38,889 --> 00:34:43,650 And they're guaranteeing those operations across the cluster.
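The conditional put described above is a compare-and-set. A minimal, hypothetical Python sketch of the pattern: `cond_put` writes the new balance only if the stored balance still equals the value that was read, and `withdraw` retries when another writer got in first. The `ToyStore` class, method names, and retry limit are assumptions for illustration, not HyperDex's client API, and a local lock stands in for the cluster-wide guarantee the talk attributes to HyperDex.

```python
import threading


class ToyStore:
    """Hypothetical in-process key-value store with a conditional put.

    A lock stands in for the cluster-wide guarantee; this is not the
    HyperDex client API.
    """

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return dict(self._data.get(key, {}))

    def put(self, key, attrs):
        with self._lock:
            self._data.setdefault(key, {}).update(attrs)

    def cond_put(self, key, checks, attrs):
        """Write attrs only if every attribute in checks still has that value."""
        with self._lock:
            current = self._data.get(key, {})
            if any(current.get(a) != v for a, v in checks.items()):
                return False  # someone else changed it since we read it
            self._data.setdefault(key, {}).update(attrs)
            return True


def withdraw(store, account, amount, max_retries=100):
    """Optimistic update: read, compute, then conditionally write; retry on conflict."""
    for _ in range(max_retries):
        balance = store.get(account).get("balance", 0)
        if balance < amount:
            return False
        if store.cond_put(account, {"balance": balance}, {"balance": balance - amount}):
            return True
    return False


store = ToyStore()
store.put("acct", {"balance": 100})
workers = [threading.Thread(target=withdraw, args=(store, "acct", 1)) for _ in range(50)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(store.get("acct"))  # {'balance': 50}
```

Without the condition, two threads that read the same balance would both write `balance - 1` and one withdrawal would be lost; with it, the slower writer sees the conflict and re-reads.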
And 618 00:34:43,650 --> 00:34:45,620 it's also got a transactional engine as well, so 619 00:34:45,620 --> 00:34:47,250 that's really exciting. 620 00:34:47,250 --> 00:34:51,610 Running out of time. HBase and Hadoop. You don't 621 00:34:51,610 --> 00:34:54,679 have any of these problems. Don't worry about it. 622 00:34:54,679 --> 00:34:56,219 You probably don't want to have any of these 623 00:34:56,219 --> 00:34:59,840 problems. Cause this just ends up, you need to 624 00:34:59,840 --> 00:35:03,870 install every fucking thing the Apache foundation has ever 625 00:35:03,870 --> 00:35:08,240 made. And this isn't even the full list. This 626 00:35:08,240 --> 00:35:09,980 is like, you probably need those. 627 00:35:09,980 --> 00:35:12,620 I have a friend, he's a bit of a 628 00:35:12,620 --> 00:35:16,870 dick, and he, he calls it, cause he, he 629 00:35:16,870 --> 00:35:19,710 works in an actual big data organization, and he 630 00:35:19,710 --> 00:35:21,630 just, he goes, oh, you people with your small 631 00:35:21,630 --> 00:35:25,970 to medium data. So, yeah, like, most of us, 632 00:35:25,970 --> 00:35:27,630 we don't have big data in any sense of 633 00:35:27,630 --> 00:35:31,060 the word, really. Like, if, if it's got GB 634 00:35:31,060 --> 00:35:34,600 on the end of it, meh. You're not there 635 00:35:34,600 --> 00:35:35,530 yet. 636 00:35:35,530 --> 00:35:40,730 So, again, this is just you know, Facebook is 637 00:35:40,730 --> 00:35:42,270 using the hell out of this stuff, and they're 638 00:35:42,270 --> 00:35:44,860 just like, this is all out of date. They're 639 00:35:44,860 --> 00:35:49,590 like now just, they can't buy hard disks fast 640 00:35:49,590 --> 00:35:53,930 enough. It's crazy. Yeah. There was a punch line 641 00:35:53,930 --> 00:35:56,380 at the end of all of that. 
642 00:35:56,380 --> 00:35:57,920 But my friend, the guy who I said was 643 00:35:57,920 --> 00:36:00,960 a bit of a dick, he, he recommends having 644 00:36:00,960 --> 00:36:04,230 a look at this. And this is his quote: 645 00:36:04,230 --> 00:36:07,090 "If you want to appear really cool and underground, 646 00:36:07,090 --> 00:36:09,140 then I reckon the next big thing is the 647 00:36:09,140 --> 00:36:12,280 Berkeley Data Analytics Stack." So, there's a whole bunch 648 00:36:12,280 --> 00:36:15,580 of people who are looking at that, you know, 649 00:36:15,580 --> 00:36:18,180 crazy big data situation and trying to work out 650 00:36:18,180 --> 00:36:22,210 what that means and what the future is. 651 00:36:22,210 --> 00:36:24,800 And so Apache and Berkeley are kind of in 652 00:36:24,800 --> 00:36:26,940 a cold war for that at the moment. And 653 00:36:26,940 --> 00:36:29,140 then there's heaps of people in the enterprise space 654 00:36:29,140 --> 00:36:31,850 because you can sell lots of products and/or 655 00:36:31,850 --> 00:36:34,590 services to large companies who think they have a 656 00:36:34,590 --> 00:36:37,710 big data problem. So that's cool. 657 00:36:37,710 --> 00:36:39,650 That's fine. This isn't, this is just a little 658 00:36:39,650 --> 00:36:44,990 thing that's an embeddable document key-value store that you 659 00:36:44,990 --> 00:36:47,430 can, it's just kind of a fun thing and 660 00:36:47,430 --> 00:36:49,210 has an API that looks very similar to the 661 00:36:49,210 --> 00:36:52,520 Mongo one. And it just sits in process. 662 00:36:52,520 --> 00:36:56,210 Oh, Elasticsearch. Every time I use it, I think, 663 00:36:56,210 --> 00:37:01,400 why can you not be my database? It's awesome. 664 00:37:01,400 --> 00:37:03,370 But it loses a couple of points there because 665 00:37:03,370 --> 00:37:08,920 of its configurationability.
It went, it works when you 666 00:37:08,920 --> 00:37:10,830 know how to make it work, and it's crazy 667 00:37:10,830 --> 00:37:12,680 complicated sometimes. 668 00:37:12,680 --> 00:37:19,640 So anyway. Thirty-four minutes. Over, technically, I think. 669 00:37:19,640 --> 00:37:21,950 Yeah. So that's good. 670 00:37:21,950 --> 00:37:28,950 That's databases in a nutshell. I'm Toby Hede. I'm 671 00:37:29,160 --> 00:37:31,280 around the conference if you want to talk about 672 00:37:31,280 --> 00:37:35,340 databases. I think of myself as a lapa-, a 673 00:37:35,340 --> 00:37:39,320 lap- a butterfly collector, I guess, is what I'm 674 00:37:39,320 --> 00:37:41,200 looking for, of databases. 675 00:37:41,200 --> 00:37:45,960 Yeah. So come and say hi. Cool.