WEBVTT 00:00:17.080 --> 00:00:18.190 TODD SCHNEIDER: All right. We're, we're good. Thank you. 00:00:18.190 --> 00:00:19.910 Sorry for the delay. Classic. 00:00:19.910 --> 00:00:22.270 Even in the future nothing works. Welcome. 00:00:22.270 --> 00:00:26.240 I am Todd. I'm an engineer at Rap Genius. 00:00:26.240 --> 00:00:31.640 And today's talk is going to be about data science with a live tutorial. 00:00:31.640 --> 00:00:34.360 And before we get into the live coding component, 00:00:34.360 --> 00:00:36.070 I wanted to show you all a project I 00:00:36.070 --> 00:00:39.030 built previously, which kind of serves as the inspiration 00:00:39.030 --> 00:00:41.470 for this talk. Sort of. So this is a 00:00:41.470 --> 00:00:45.440 website called weddingcrunchers.com. What is Wedding Crunchers? 00:00:45.440 --> 00:00:48.110 It's a place where you can track the, the 00:00:48.110 --> 00:00:50.979 popularity of words and phrases in the New York 00:00:50.979 --> 00:00:54.449 Times wedding section over the past thirty-some years. 00:00:54.449 --> 00:00:56.129 And a lot of you might be wondering why 00:00:56.129 --> 00:00:58.640 on earth this would be interesting or relevant or 00:00:58.640 --> 00:01:01.530 funny or anything, and I hope to convince you 00:01:01.530 --> 00:01:04.360 of that very quickly. Here is an example 00:01:04.360 --> 00:01:07.220 wedding announcement from the New York Times. This one's 00:01:07.220 --> 00:01:08.030 from 1985. 00:01:08.030 --> 00:01:08.970 In case you don't live in 00:01:08.970 --> 00:01:11.260 New York or read the New York Times, the wedding 00:01:11.260 --> 00:01:14.280 section has a certain cultural cachet. It's kind of 00:01:14.280 --> 00:01:15.720 an honor to be listed in there, and it's 00:01:15.720 --> 00:01:18.580 got a very resume-like structure. People get to brag 00:01:18.580 --> 00:01:20.110 about where they went to school and what they 00:01:20.110 --> 00:01:20.979 do.
00:01:20.979 --> 00:01:23.050 So here is an example. You know, Diane deCordova 00:01:23.050 --> 00:01:25.270 is marrying Michael Monro Lewis. They both went to 00:01:25.270 --> 00:01:28.250 Princeton. They graduated cum laude. You know, she works 00:01:28.250 --> 00:01:30.440 at Morgan Stanley. He works at Salomon Brothers in 00:01:30.440 --> 00:01:32.610 New York and they're gonna go to London. And 00:01:32.610 --> 00:01:34.430 this should be a little familiar to a bunch 00:01:34.430 --> 00:01:35.420 of you. 00:01:35.420 --> 00:01:37.870 Mr. Lewis, an associate at Salomon Brothers, is Michael Lewis. 00:01:37.870 --> 00:01:40.600 He wrote Liar's Poker, the famous book about 00:01:40.600 --> 00:01:42.810 his experience there. And before, before he was a 00:01:42.810 --> 00:01:45.710 famous writer, he was just another name in a New York Times 00:01:45.710 --> 00:01:49.630 wedding announcement. 00:01:49.630 --> 00:01:51.560 And so what Wedding Crunchers does is it takes 00:01:51.560 --> 00:01:54.560 the entire corpus of New York Times wedding announcements 00:01:54.560 --> 00:01:57.409 going back to 1981, and you can search for words 00:01:57.409 --> 00:01:59.520 and phrases and you can see how common those 00:01:59.520 --> 00:02:01.800 words and phrases are, you know, by year. It's 00:02:01.800 --> 00:02:03.320 like, this is a good one that's relevant to 00:02:03.320 --> 00:02:06.409 people here. You know, banker and programmer. You know, 00:02:06.409 --> 00:02:08.979 for example, when so-and-so is listed as a banker 00:02:08.979 --> 00:02:11.780 or as a programmer in the announcement, and you 00:02:11.780 --> 00:02:13.700 see, over time, you know, banker used to be 00:02:13.700 --> 00:02:18.450 way more commonly used than programmer in these announcements.
00:02:18.450 --> 00:02:21.140 And only just this year, in 2014, programmer has 00:02:21.140 --> 00:02:28.140 finally overtaken banker as, you know, the place 00:02:28.190 --> 00:02:29.890 that, you know, the people getting married in New York, 00:02:29.890 --> 00:02:32.770 who are part of society, come from. Another good 00:02:32.770 --> 00:02:35.170 one is, if you look at Goldman Sachs and 00:02:35.170 --> 00:02:37.600 Google. Is my internet on? Good. 00:02:37.600 --> 00:02:41.150 So here's another good one. So Goldman Sachs, you 00:02:41.150 --> 00:02:44.120 know, classic New York financial institution. Google, new kid 00:02:44.120 --> 00:02:47.160 on the block. Tech scene. Boom. Taking over. 00:02:47.160 --> 00:02:49.800 And, you know, this is obviously fun, and it's 00:02:49.800 --> 00:02:52.440 amusing. But it's also actually pretty insightful for a 00:02:52.440 --> 00:02:55.760 relatively simple concept. I mean, this one graph tells 00:02:55.760 --> 00:02:58.740 a pretty powerful story of, you know, New York, 00:02:58.740 --> 00:03:01.750 the finance capital of the world. Meanwhile, we 00:03:01.750 --> 00:03:03.550 have this sort of emerging tech scene. You know, 00:03:03.550 --> 00:03:05.150 Google may be the biggest player in the kind 00:03:05.150 --> 00:03:06.959 of new tech world. 00:03:06.959 --> 00:03:09.510 And now, when you turn to the society pages 00:03:09.510 --> 00:03:11.209 to see who's getting married, you know, there are more 00:03:11.209 --> 00:03:13.970 employees from Google than there are from Goldman Sachs. 00:03:13.970 --> 00:03:16.750 And that's, you know, a kind of interesting thing in 00:03:16.750 --> 00:03:17.739 the world.
00:03:17.739 --> 00:03:20.500 And so what we're gonna do today is build 00:03:20.500 --> 00:03:25.120 something just like Wedding Crunchers, except, instead of using 00:03:25.120 --> 00:03:28.280 the text of wedding announcements to analyze, we're going 00:03:28.280 --> 00:03:32.670 to look at all of the RailsConf talk abstracts. 00:03:32.670 --> 00:03:34.080 And so, you know, hopefully this is, this is 00:03:34.080 --> 00:03:36.550 interesting to people here. And, I always say, you 00:03:36.550 --> 00:03:38.709 know, if there's only one thing you take from 00:03:38.709 --> 00:03:41.319 this talk, really, what it should be is that, 00:03:41.319 --> 00:03:43.709 you know, work on a problem that's interesting to 00:03:43.709 --> 00:03:46.260 you. Because, especially when you're dealing with data science, 00:03:46.260 --> 00:03:47.590 a lot of it's pretty messy and then you 00:03:47.590 --> 00:03:49.290 have to go through scraping stuff, as we'll get 00:03:49.290 --> 00:03:51.879 into, and it's easy to get frustrated and kind 00:03:51.879 --> 00:03:53.810 of lost. And like, if you're not working on 00:03:53.810 --> 00:03:55.450 something that you care about, something where you 00:03:55.450 --> 00:03:58.060 really want to know, kind of, the final result, 00:03:58.060 --> 00:04:00.110 it's just much easier to get distracted and kind 00:04:00.110 --> 00:04:01.069 of, ultimately, bail. 00:04:01.069 --> 00:04:03.819 So, again, if you take one thing, just work 00:04:03.819 --> 00:04:07.550 on something that is interesting to you. So the 00:04:07.550 --> 00:04:09.819 particular kind of analysis we're gonna do is something 00:04:09.819 --> 00:04:12.680 called n-gram analysis. And I have a little example 00:04:12.680 --> 00:04:14.190 set up here. So what is an n-gram? You 00:04:14.190 --> 00:04:15.800 may have heard the word before. 00:04:15.800 --> 00:04:19.099 Really, all it means is, you know, a sequence of n 00:04:19.099 --> 00:04:23.830 consecutive words in a sentence.
So like, here's an 00:04:23.830 --> 00:04:26.030 example, a very simple one: "This talk is 00:04:26.030 --> 00:04:28.000 boring." What are the, what are the one grams 00:04:28.000 --> 00:04:30.330 in this sentence? It's just the words: this, talk, 00:04:30.330 --> 00:04:32.780 is, and boring. The two grams are every pair 00:04:32.780 --> 00:04:35.839 of consecutive words: this talk, talk is, is boring, 00:04:35.839 --> 00:04:37.219 and so on. 00:04:37.219 --> 00:04:38.150 And so what we need to be able to 00:04:38.150 --> 00:04:40.889 do in order to build, you know, a graph 00:04:40.889 --> 00:04:43.300 like this, is we need to take a term 00:04:43.300 --> 00:04:45.159 that's, you know, relevant to RailsConf, say something like 00:04:45.159 --> 00:04:46.960 Ember or whatever, and we need to be able 00:04:46.960 --> 00:04:48.759 to look up, you know, for each year, how 00:04:48.759 --> 00:04:51.300 many times this, you know, word or n-gram 00:04:51.300 --> 00:04:53.610 appears in the data. 00:04:53.610 --> 00:04:55.550 And so that is what we are going to 00:04:55.550 --> 00:04:58.789 build. And I have this brief little outline here. 00:04:58.789 --> 00:05:01.020 There are kind of three steps. And this is pretty 00:05:01.020 --> 00:05:04.629 general to, to any data project. You know, step 00:05:04.629 --> 00:05:06.719 one is gonna be just gathering the data, getting 00:05:06.719 --> 00:05:09.659 it in some usable form. Step two is gonna 00:05:09.659 --> 00:05:11.259 be kind of the analysis part where we do 00:05:11.259 --> 00:05:14.050 the n-gram calculation and store the results. And then 00:05:14.050 --> 00:05:15.789 step three is gonna be to create a nice 00:05:15.789 --> 00:05:19.259 little front-end interface that lets us investigate, visualize, and 00:05:19.259 --> 00:05:20.809 see what we've done. 00:05:20.809 --> 00:05:23.300 Now unfortunately, you know, in a, in a thirty 00:05:23.300 --> 00:05:26.020 minute talk we can't possibly do all of this.
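The n-gram definition above can be sketched in a few lines of Ruby. This is a minimal illustration, assuming the sentence has already been lowercased and split into words; `ngrams_from_words` is a hypothetical name, not a method from the talk's repo. It relies on Ruby's built-in `Enumerable#each_cons`, which yields every window of n consecutive elements.

```ruby
# Hypothetical helper (not from the talk's repo): every run of n
# consecutive words, each joined back into a space-separated phrase.
def ngrams_from_words(words, n)
  words.each_cons(n).map { |group| group.join(" ") }
end

words = %w[this talk is boring]
ngrams_from_words(words, 1)  # => ["this", "talk", "is", "boring"]
ngrams_from_words(words, 2)  # => ["this talk", "talk is", "is boring"]
ngrams_from_words(words, 3)  # => ["this talk is", "talk is boring"]
```

A sentence of k words yields k - n + 1 n-grams when n is at most k, and none otherwise.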
00:05:26.020 --> 00:05:28.689 So we're gonna focus more on items one and 00:05:28.689 --> 00:05:31.490 two and less so on three, and even then 00:05:31.490 --> 00:05:33.099 it's too much. So, you know, I sort of 00:05:33.099 --> 00:05:34.689 used the analogy, it'll be a bit like watching 00:05:34.689 --> 00:05:37.419 TV on the Food Network, where we might, you 00:05:37.419 --> 00:05:40.039 know, throw something in the oven, mysteriously something else 00:05:40.039 --> 00:05:42.009 pops out of the other oven even though it's, 00:05:42.009 --> 00:05:43.759 where did that come from? 00:05:43.759 --> 00:05:46.089 But not to worry. Everything is also on GitHub. 00:05:46.089 --> 00:05:47.869 There's a repo I'll share with you at the 00:05:47.869 --> 00:05:50.339 end. So anything that we don't cover or that 00:05:50.339 --> 00:05:51.979 we cover too quickly or something, you'll be able 00:05:51.979 --> 00:05:53.779 to see sort of the, the full version on 00:05:53.779 --> 00:05:55.740 GitHub. 00:05:55.740 --> 00:05:57.770 So let us jump in now to step one, 00:05:57.770 --> 00:06:00.189 which is, you know, gathering the data. And so 00:06:00.189 --> 00:06:01.909 let's take a look back at the, the RailsConf 00:06:01.909 --> 00:06:03.080 website again. So we have to figure out how 00:06:03.080 --> 00:06:06.460 we're gonna model a, a RailsConf talk in our 00:06:06.460 --> 00:06:09.889 database. So like, what, you know, attributes does a, 00:06:09.889 --> 00:06:13.339 do a, excuse me, does a RailsConf talk have. 00:06:13.339 --> 00:06:14.289 And it's like, one thing we see is they 00:06:14.289 --> 00:06:17.669 all have titles. So that looks like something. They 00:06:17.669 --> 00:06:20.089 have speakers. You know, there's this thing, which is 00:06:20.089 --> 00:06:23.330 the abstract, and then there's the bio. And that's 00:06:23.330 --> 00:06:25.469 probably it. That's probably all we need. 00:06:25.469 --> 00:06:27.669 So that's pretty simple. 
And, you know, I have 00:06:27.669 --> 00:06:29.999 the little migration. I've already run here. But here 00:06:29.999 --> 00:06:31.789 are attributes for talks. It's just the year, you 00:06:31.789 --> 00:06:33.909 know, what, what conference were we actually at. The 00:06:33.909 --> 00:06:36.110 title of the talk, the speaker, the abstract, and 00:06:36.110 --> 00:06:37.569 the bio. 00:06:37.569 --> 00:06:41.490 And so also, that's, again, pretty straightforward. The gemfile 00:06:41.490 --> 00:06:45.089 is also very simple. It's mostly pretty boiler plate. 00:06:45.089 --> 00:06:47.830 Rails 4, Ruby 2.1. The only gems I wanted 00:06:47.830 --> 00:06:49.409 to call out here are, we're gonna use nokogiri 00:06:49.409 --> 00:06:52.309 for, you know, fetching, or, parsing websites and kind 00:06:52.309 --> 00:06:54.229 of scraping the data we need. We're gonna use 00:06:54.229 --> 00:06:56.389 PosGres as our main data store and we're gonna 00:06:56.389 --> 00:06:58.219 use redis to build these sort of index that 00:06:58.219 --> 00:07:00.180 we can ultimately use to look up, you know, 00:07:00.180 --> 00:07:02.389 how common a word is. 00:07:02.389 --> 00:07:05.389 And so one thing that's not here is, like, 00:07:05.389 --> 00:07:09.009 you know, gem fancy data algorithm. And a lot 00:07:09.009 --> 00:07:10.689 of people, this is kind of where Ruby often 00:07:10.689 --> 00:07:13.369 gets a bad reputation of, you know, not being 00:07:13.369 --> 00:07:16.039 supportive of scientific computing or whatever. And other languages 00:07:16.039 --> 00:07:18.589 have more, more support. But my claim is that 00:07:18.589 --> 00:07:20.520 it's really not that important. You can get a 00:07:20.520 --> 00:07:23.509 ton of mileage out of very simple tools that 00:07:23.509 --> 00:07:24.210 you can build yourself. 00:07:24.210 --> 00:07:25.809 You know, you don't need a fancy gem or 00:07:25.809 --> 00:07:28.360 any fancy algorithm. 
Those things are cool too and 00:07:28.360 --> 00:07:30.740 they have their place. But they're not needed a 00:07:30.740 --> 00:07:33.349 lot of the time. And, you know, Ruby is 00:07:33.349 --> 00:07:36.210 a wonderful language for, especially, scraping stuff from the 00:07:36.210 --> 00:07:38.449 web. There's a ton of support there. And so 00:07:38.449 --> 00:07:40.979 I don't think that the, the lack of, you 00:07:40.979 --> 00:07:43.509 know, fancy algorithm gems should necessarily be a deterrent 00:07:43.509 --> 00:07:44.439 at all. 00:07:44.439 --> 00:07:46.960 And so hopefully part of this talk is convincing 00:07:46.960 --> 00:07:49.649 people that Ruby and Rails are actually quite well-suited 00:07:49.649 --> 00:07:50.939 to problems like this. 00:07:50.939 --> 00:07:53.559 OK. So now we actually need to write some 00:07:53.559 --> 00:07:56.249 code to scrape the talks. And you know, if 00:07:56.249 --> 00:07:57.419 you've ever done anything like this before, you know 00:07:57.419 --> 00:07:59.520 that Chrome Inspector is your best friend. So let's 00:07:59.520 --> 00:08:02.499 fire that up. We're gonna inspect element, and so 00:08:02.499 --> 00:08:04.069 like, what we actually need to do now 00:08:04.069 --> 00:08:06.889 is take, you know, this HTML on the page 00:08:06.889 --> 00:08:09.119 and turn it into a database record that we 00:08:09.119 --> 00:08:11.889 can then, you know, use to our advantage later. 00:08:11.889 --> 00:08:13.050 And so it looks like, you know, all the 00:08:13.050 --> 00:08:16.629 talks are in these divs with the session class. So that's something. 00:08:16.629 --> 00:08:19.849 We can look in here. This looks like something. 00:08:19.849 --> 00:08:23.469 So let's make this bigger. 00:08:23.469 --> 00:08:25.039 And you know, it helps to, well, it's kind 00:08:25.039 --> 00:08:29.059 of essential to be decent with CSS selectors here, 00:08:29.059 --> 00:08:32.149 because that's basically how we're going to find stuff.
00:08:32.149 --> 00:08:34.719 So let's see, OK, so there are eighty-one session divs. 00:08:34.719 --> 00:08:37.990 That sounds about right. I happen to know that 00:08:37.990 --> 00:08:42.229 mine is number seventy-eight, so let's, let's look at 00:08:42.229 --> 00:08:44.360 that. And so here we are. So we need 00:08:44.360 --> 00:08:46.970 to, again, the, the things we're mod- or, the 00:08:46.970 --> 00:08:50.250 attributes we're storing are the title, the speaker, the 00:08:50.250 --> 00:08:52.680 abstract, and the bio. And so we're gonna need 00:08:52.680 --> 00:08:54.850 to pull these things out. 00:08:54.850 --> 00:08:57.630 So let's see. It looks like the, the title 00:08:57.630 --> 00:09:00.490 is in this h1 element inside the header. So 00:09:00.490 --> 00:09:04.830 let's just make sure that works. You know, header 00:09:04.830 --> 00:09:08.450 h1. That looks right. 00:09:08.450 --> 00:09:13.650 The, the speaker looks to be the header h2. 00:09:13.650 --> 00:09:16.060 Cool. 00:09:16.060 --> 00:09:20.640 Now the abstract is in this p tag, so 00:09:20.640 --> 00:09:23.130 we can do something like this. But this is 00:09:23.130 --> 00:09:26.490 actually not quite right. So what's wrong with this? 00:09:26.490 --> 00:09:30.140 Well, the abstract ends with, you know, "suited to the 00:09:30.140 --> 00:09:32.310 problem." The bio here is also in a p 00:09:32.310 --> 00:09:35.310 tag: "Originally a math guy." So we've actually pulled 00:09:35.310 --> 00:09:37.010 all the p tags, and we need a way of 00:09:37.010 --> 00:09:38.940 not doing that. And this is where you just 00:09:38.940 --> 00:09:40.200 need to know a little bit of CSS. Not 00:09:40.200 --> 00:09:42.550 very complicated. But if you use the little greater 00:09:42.550 --> 00:09:44.800 than guy, what this says is only take the 00:09:44.800 --> 00:09:47.210 p tags that are immediate children of the session 00:09:47.210 --> 00:09:50.390 div.
And so now we have, you know, only 00:09:50.390 --> 00:09:51.060 the abstract. 00:09:51.060 --> 00:09:54.340 And lastly, you know, the bio is just in 00:09:54.340 --> 00:09:58.460 its own little section. So something like that. Cool. 00:09:58.460 --> 00:10:00.190 So that is the jQuery version of it. We 00:10:00.190 --> 00:10:03.180 need to do this, though, in Ruby. And as 00:10:03.180 --> 00:10:05.250 I said, this does sometimes get a little tedious. 00:10:05.250 --> 00:10:07.340 But let's, let's write the code. So I have 00:10:07.340 --> 00:10:12.160 this empty method, create_railsconf_2014_talks. And also this method 00:10:12.160 --> 00:10:14.760 I've written already called fetch_and_parse, which just gets a 00:10:14.760 --> 00:10:16.610 URL and sends it to Nokogiri, which we can 00:10:16.610 --> 00:10:17.690 then use to do our CSS selectors. 00:10:17.690 --> 00:10:20.510 So let, let's just write this. So we can 00:10:20.510 --> 00:10:27.400 say doc is fetch_and_parse. The URL is this. Let's 00:10:27.400 --> 00:10:33.940 see if this works in the console. 00:10:33.940 --> 00:10:40.940 Of course, in here. Do I have internet? Nice. 00:10:47.360 --> 00:10:52.700 So we can then check the same thing. Again. 00:10:52.700 --> 00:10:57.830 Looks right. Let's find my talk. And this part, 00:10:57.830 --> 00:10:59.310 I couldn't possibly explain: when you use the 00:10:59.310 --> 00:11:01.610 eq thing in Nokogiri, you have to add two 00:11:01.610 --> 00:11:04.330 to whatever jQuery uses. So I'm number 80 now. 00:11:04.330 --> 00:11:06.570 Don't ask me why. I couldn't possibly tell you. 00:11:06.570 --> 00:11:10.210 But maybe someone here knows. I'd be curious to find 00:11:10.210 --> 00:11:10.780 out. 00:11:10.780 --> 00:11:11.920 AUDIENCE: [inaudible] 00:11:11.920 --> 00:11:15.400 T.S.: So there it is. There's the title. So 00:11:15.400 --> 00:11:17.380 let us now write some code here. We have 00:11:17.380 --> 00:11:21.520 our, our document.
We're gonna go through each session. 00:11:21.520 --> 00:11:24.320 The css method is kind of like, you know, 00:11:24.320 --> 00:11:28.900 the selector for Nokogiri. And for each of these elements, 00:11:28.900 --> 00:11:35.370 we're gonna create a talk. 00:11:35.370 --> 00:11:38.390 And again. So the year we already know is 00:11:38.390 --> 00:11:45.390 2014. The title we're gonna say is elm.css("header h1").inner_text. 00:11:48.300 --> 00:11:55.300 Speaker, header h2, dun nuh nuh dun nuh nuh 00:12:00.460 --> 00:12:04.520 nuh. Gettin' there. 00:12:04.520 --> 00:12:09.950 All right. So I think this will probably work. 00:12:09.950 --> 00:12:13.980 Let's find out. And so we're back in here. 00:12:13.980 --> 00:12:19.470 Just to prove to you that I'm not lying, 00:12:19.470 --> 00:12:23.450 2014 dot count. There's none of them. And, what'd 00:12:23.450 --> 00:12:26.440 I call this method? This guy. Delayed::Job. 00:12:26.440 --> 00:12:33.440 All right. So we just did something. Did it 00:12:33.440 --> 00:12:40.440 work? Nice. We got eighty-one talks. Most importantly, let's see, 00:12:41.150 --> 00:12:42.390 we have my talk. That's the, that's the only 00:12:42.390 --> 00:12:46.760 one that matters anyway. And so, you know, you 00:12:46.760 --> 00:12:48.260 might be thinking now, like, you know, what the 00:12:48.260 --> 00:12:50.120 heck, I came to the, the data science talk, 00:12:50.120 --> 00:12:52.330 not the scraping talk. You know, to that, I 00:12:52.330 --> 00:12:56.020 would say, tough luck. They're the same thing. You 00:12:56.020 --> 00:12:57.880 know, you might not, you might not want to 00:12:57.880 --> 00:13:00.040 hear it, but guess what, this is usually the 00:13:00.040 --> 00:13:02.020 most important part of the entire project. 00:13:02.020 --> 00:13:04.960 It's the hardest part, you know, because guess what, 00:13:04.960 --> 00:13:07.080 just because we got the 2014 talks, you know, 00:13:07.080 --> 00:13:08.630 now we have to get the 2013 talks.
And 00:13:08.630 --> 00:13:10.880 the 2012 talks. And they're all on different websites. 00:13:10.880 --> 00:13:12.890 They all have different structures. You know, you're gonna 00:13:12.890 --> 00:13:15.090 have to write different code to get each type 00:13:15.090 --> 00:13:17.120 of website. It's a pain. And this is why 00:13:17.120 --> 00:13:19.240 I said earlier, you know, really make sure you're 00:13:19.240 --> 00:13:21.160 working on something you care about. Because it's just 00:13:21.160 --> 00:13:24.300 not fun when, like, ugh, in 2008 they 00:13:24.300 --> 00:13:26.850 separated the speakers and the abstracts. And it's like, 00:13:26.850 --> 00:13:29.260 it's just, it's annoying, but again, it's the most 00:13:29.260 --> 00:13:30.290 important part, I would say. 00:13:30.290 --> 00:13:32.920 You know, so much of data science is taking 00:13:32.920 --> 00:13:35.960 data that's either unstructured or structured in the wrong 00:13:35.960 --> 00:13:39.020 format for you and, you know, getting it into 00:13:39.020 --> 00:13:40.510 the way, you know, into the structure that you 00:13:40.510 --> 00:13:43.410 need to do whatever analysis you want to do. 00:13:43.410 --> 00:13:45.130 So in this case, that's taking, you know, HTML 00:13:45.130 --> 00:13:47.930 on a page and converting it into a Postgres 00:13:47.930 --> 00:13:49.430 database. 00:13:49.430 --> 00:13:52.800 And so we have done that now. And again, 00:13:52.800 --> 00:13:53.920 take my word that, you know, I've done this 00:13:53.920 --> 00:13:56.980 for the other years as well, going back to 2007, 00:13:56.980 --> 00:14:00.930 and so we have a total of 497 talks 00:14:00.930 --> 00:14:04.220 in here from RailsConfs over the years. And so 00:14:04.220 --> 00:14:06.870 that's cool. That's basically our dataset that we're gonna 00:14:06.870 --> 00:14:07.200 use.
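The selectors worked out above can be sketched end to end. The talk does this with Nokogiri's `css` method; since Nokogiri is not part of Ruby's standard library, this sketch swaps in REXML (shipped with Ruby) and XPath against a hypothetical, well-formed snippet modeled on the session markup described above. The markup (including the bio's `section` wrapper), the sample title and speaker, and `parse_sessions` are all assumptions, not the talk's code. Evaluating the XPath `p` from a session element plays the role of the CSS child combinator `.session > p`, while descendant matching (`//p`) reproduces the bug where the bio's paragraph came along with the abstract.

```ruby
require "rexml/document"

# Hypothetical, well-formed markup modeled on the session divs described in
# the talk; the real page's markup (especially the bio wrapper) may differ.
SAMPLE = <<~XHTML
  <div id="sessions">
    <div class="session">
      <header><h1>A Hypothetical Talk</h1><h2>Jane Speaker</h2></header>
      <p>Abstract paragraph.</p>
      <section class="bio"><p>Speaker bio paragraph.</p></section>
    </div>
  </div>
XHTML

# Turn each session div into a Hash of the talk attributes (year, title,
# speaker, abstract, bio); the year is known up front, as in the talk.
def parse_sessions(markup, year)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//div[@class='session']").map do |session|
    {
      year: year,
      title: REXML::XPath.first(session, "header/h1").text,
      speaker: REXML::XPath.first(session, "header/h2").text,
      # "p" selects direct children only, like the CSS selector ".session > p"
      abstract: REXML::XPath.match(session, "p").map(&:text).join("\n"),
      bio: REXML::XPath.match(session, "section[@class='bio']/p").map(&:text).join("\n")
    }
  end
end

talks = parse_sessions(SAMPLE, 2014)
talks.first[:abstract]  # => "Abstract paragraph."

# Descendant matching, like ".session p", also picks up the bio's paragraph.
all_paragraphs = REXML::XPath.match(REXML::Document.new(SAMPLE), "//div[@class='session']//p")
all_paragraphs.size     # => 2
```

Real-world HTML is often not well-formed XML, which is one reason a lenient HTML parser like Nokogiri is the usual choice for scraping; in the talk, each session becomes a `Talk` record rather than a Hash.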
00:14:07.200 --> 00:14:08.730 And so we can sort of move on to, 00:14:08.730 --> 00:14:11.000 you know, step two of the project here, which 00:14:11.000 --> 00:14:14.000 is, you know, do the n-gram calculation and store 00:14:14.000 --> 00:14:16.800 the results. And so let's go back to talk.rb. 00:14:16.800 --> 00:14:18.700 All this, by the way, is just in, you 00:14:18.700 --> 00:14:21.920 know, app/models/talk.rb. That's where all this code is. 00:14:21.920 --> 00:14:25.560 And I have another empty method somewhere called def 00:14:25.560 --> 00:14:27.590 ngrams. And so this method, you know, 00:14:27.590 --> 00:14:29.810 it goes on a talk. So 00:14:29.810 --> 00:14:32.400 given a value of n, calculate all the ngrams 00:14:32.400 --> 00:14:34.540 from that talk's abstract. 00:14:34.540 --> 00:14:36.160 And so, what are we gonna do here? So 00:14:36.160 --> 00:14:43.160 again, let's look at talk dot mine, dot abstract. 00:14:43.550 --> 00:14:45.160 So here's the abstract, and we need to, you 00:14:45.160 --> 00:14:48.920 know, get ngrams out of this. And so the 00:14:48.920 --> 00:14:51.060 first thing: I've written a little helper method over 00:14:51.060 --> 00:14:54.050 here, which I've just tacked onto String, called 00:14:54.050 --> 00:14:57.410 normalized_for_ngrams. And you know, what does this do? Well, 00:14:57.410 --> 00:14:59.640 it downcases it, 'cause we're gonna be case-insensitive. 00:14:59.640 --> 00:15:01.560 There might be cases where you want to keep 00:15:01.560 --> 00:15:03.820 case sensitivity. Whatever. Doesn't really matter. In this case 00:15:03.820 --> 00:15:06.060 we're gonna go case-insensitive. 00:15:06.060 --> 00:15:08.880 Squish is a nice, convenient method that will kind 00:15:08.880 --> 00:15:11.460 of standardize the whitespace for you.
So like, 00:15:11.460 --> 00:15:13.990 if there's any trailing or leading whitespace, and 00:15:13.990 --> 00:15:16.600 if there's like a bunch of whitespace in the middle, 00:15:16.600 --> 00:15:18.730 it'll kill the beginning and ending and 00:15:18.730 --> 00:15:20.630 it'll turn any run in the middle into a single 00:15:20.630 --> 00:15:21.220 space. 00:15:21.220 --> 00:15:22.230 So that way you just don't have to worry 00:15:22.230 --> 00:15:25.130 about things like double spaces or, you know, other, 00:15:25.130 --> 00:15:26.820 other weird things that can happen. 'Cause of course 00:15:26.820 --> 00:15:28.600 it's the web. Whatever can go wrong will go 00:15:28.600 --> 00:15:31.510 wrong. So make sure that your data's in some 00:15:31.510 --> 00:15:33.360 kind of standardized format. 00:15:33.360 --> 00:15:36.500 And the last thing I've done is remove punctuation. 00:15:36.500 --> 00:15:38.360 And the reason for that is just 'cause, like, 00:15:38.360 --> 00:15:40.279 you know, there are commas, periods, colons, all sorts of 00:15:40.279 --> 00:15:42.930 stuff like that. We don't really care about it. 00:15:42.930 --> 00:15:44.710 And so let's just kill any character that's not 00:15:44.710 --> 00:15:46.540 either a space or a word character. This is 00:15:46.540 --> 00:15:49.450 kind of the, little, like, Ruby special regex thing. 00:15:49.450 --> 00:15:53.040 So we're gonna kill punctuation. 00:15:53.040 --> 00:15:54.190 And so we can actually just mess with this 00:15:54.190 --> 00:15:56.610 in the console maybe. So let's take our little 00:15:56.610 --> 00:16:00.460 example sentence. You know, "This talk is boring." And 00:16:00.460 --> 00:16:04.240 let's normalize that for ngrams. OK. All it did 00:16:04.240 --> 00:16:07.710 was downcase it. And now we want to get 00:16:07.710 --> 00:16:09.410 that into an array of words, which we can 00:16:09.410 --> 00:16:13.060 just do with split. Cool.
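Here is a plain-Ruby sketch of that normalization for readers without ActiveSupport loaded. `squish` is an ActiveSupport method, so it is approximated here with `strip` plus collapsing whitespace runs. The method name is hypothetical, and the talk's `normalized_for_ngrams` may order these steps differently. Punctuation is removed before the whitespace cleanup so that a removed character never leaves a double space behind.

```ruby
# Hypothetical plain-Ruby stand-in for the talk's String#normalized_for_ngrams.
def normalize_for_ngrams(text)
  text = text.downcase
  # Drop anything that is neither whitespace nor a word character.
  text = text.gsub(/[^\s\w]/, "")
  # Like ActiveSupport's squish: trim the ends, collapse inner runs to one space.
  text.strip.gsub(/\s+/, " ")
end

normalize_for_ngrams("  This talk is   BORING!  ")  # => "this talk is boring"
normalize_for_ngrams("This talk is boring.").split  # => ["this", "talk", "is", "boring"]
```

In Ruby, `\w` matches only ASCII letters, digits, and underscore, so accented characters are stripped along with punctuation. Hyphens go too, so "n-gram" becomes "ngram".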
00:16:13.060 --> 00:16:16.830 And now there's actually this neat little Ruby Enumerable 00:16:16.830 --> 00:16:18.290 thing, which I didn't know about until pretty recently: 00:16:18.290 --> 00:16:21.800 each_cons, which stands for "each consecutive." And it 00:16:21.800 --> 00:16:25.380 takes an argument, a single number, like two, and 00:16:25.380 --> 00:16:27.279 what this says is give me all of the, 00:16:27.279 --> 00:16:29.779 you know, consecutive groups of two. So if we 00:16:29.779 --> 00:16:32.440 to_a this, now we have this array of arrays, 00:16:32.440 --> 00:16:34.180 which looks like exactly what we want: 00:16:34.180 --> 00:16:36.870 this talk, talk is, and is boring. And so 00:16:36.870 --> 00:16:38.310 the last thing we can do there is we 00:16:38.310 --> 00:16:43.690 can just map over that array and join each pair to make these plain 00:16:43.690 --> 00:16:44.190 phrases. 00:16:44.190 --> 00:16:46.860 So cool. So this is actually the entirety of 00:16:46.860 --> 00:16:49.820 our ngrams method, is just, you know, this code 00:16:49.820 --> 00:16:51.630 right here. So let's copy and paste this into 00:16:51.630 --> 00:16:56.500 the old method here. So we want. We're doing 00:16:56.500 --> 00:17:03.040 this on the abstract. Let's get some newlines 00:17:03.040 --> 00:17:04.079 here. 00:17:04.079 --> 00:17:09.839 All right, cool. So again, just to recap, you 00:17:09.839 --> 00:17:12.039 take the abstract, we normalize it, which means, you 00:17:12.039 --> 00:17:14.880 know, downcase and kill the punctuation. We split it 00:17:14.880 --> 00:17:17.289 into words. Uh, wait. Actually this should not be 00:17:17.289 --> 00:17:21.118 two. That should be n. And then we join 00:17:21.118 --> 00:17:24.220 those. So let's, let's see if this worked. 00:17:24.220 --> 00:17:31.220 So talk dot mine again. And one. OK. So 00:17:31.360 --> 00:17:32.769 here are all the one grams, which is just 00:17:32.769 --> 00:17:36.240 the sequence of words. And that looks correct.
And 00:17:36.240 --> 00:17:41.869 all of the two grams. Also looks correct, I 00:17:41.869 --> 00:17:45.369 think. Yeah. To get, get a, yeah, OK, perfect. 00:17:45.369 --> 00:17:47.619 And so this is kind of the, the method 00:17:47.619 --> 00:17:50.690 we're gonna use to decompose these talks into just, 00:17:50.690 --> 00:17:53.799 you know, an array of words and phrases. And 00:17:53.799 --> 00:17:55.929 so what is the next step, now that we 00:17:55.929 --> 00:17:57.549 have this method? Well, the next step is we 00:17:57.549 --> 00:17:59.470 have to build these indexes that we're actually gonna 00:17:59.470 --> 00:18:03.659 use to look up, you know, the final results. 00:18:03.659 --> 00:18:05.139 And so for that, we're gonna use redis. 00:18:05.139 --> 00:18:07.179 Now, we don't have sort of enough time to 00:18:07.179 --> 00:18:10.990 really get totally into the details of redis. But, 00:18:10.990 --> 00:18:12.039 you know, the, the thing that we're really gonna 00:18:12.039 --> 00:18:14.759 use is the, the sorted set data structure, which 00:18:14.759 --> 00:18:16.440 I'd definitely encourage you to check out. It's a 00:18:16.440 --> 00:18:19.159 great data structure. Great feature of redis. And so 00:18:19.159 --> 00:18:20.210 what is a sorted set? 00:18:20.210 --> 00:18:22.730 Well, it's got the word set in it, so 00:18:22.730 --> 00:18:24.720 that tells you something. It's, you know, unique elements. 00:18:24.720 --> 00:18:27.059 And the, the neat feature of a sorted set 00:18:27.059 --> 00:18:28.990 is that each element in the set also has 00:18:28.990 --> 00:18:32.360 a score associated with it. So the way we 00:18:32.360 --> 00:18:34.669 can use this is, remember, again, the question I'm 00:18:34.669 --> 00:18:36.610 gonna answer is, like, you know, if someone searches 00:18:36.610 --> 00:18:38.559 for Ember, you know, how many times was Ember 00:18:38.559 --> 00:18:40.429 mentioned in 2007. 
How many times was it mentioned 00:18:40.429 --> 00:18:42.169 in 2008? How many times was it mentioned in 00:18:42.169 --> 00:18:42.659 2009? 00:18:42.659 --> 00:18:44.610 So we're gonna have one sorted set for each 00:18:44.610 --> 00:18:47.700 year, where the members of the sorted set are 00:18:47.700 --> 00:18:50.259 all the words and phrases that appeared in RailsConf 00:18:50.259 --> 00:18:54.100 talks, and the scores are the number of times 00:18:54.100 --> 00:18:56.419 that those ngrams appeared. 00:18:56.419 --> 00:18:58.399 And then, you know, Redis is very efficient with 00:18:58.399 --> 00:19:00.249 this zscore command for lookups. It's like, 00:19:00.249 --> 00:19:02.590 this command right here would say, OK, in the 00:19:02.590 --> 00:19:05.990 sorted set for 2014, get me the score associated 00:19:05.990 --> 00:19:09.249 with the member "ember." And that's gonna tell you, 00:19:09.249 --> 00:19:11.559 you know, some number. Like, three or whatever. That's 00:19:11.559 --> 00:19:14.340 the number of times it gets mentioned. 00:19:14.340 --> 00:19:15.840 So what we have to do is build these 00:19:15.840 --> 00:19:18.799 sorted sets, one for each year again. And again 00:19:18.799 --> 00:19:23.590 I have an empty method called generate_ngram_data_by_year: iterate 00:19:23.590 --> 00:19:26.110 through all talks from a given year, you know, 00:19:26.110 --> 00:19:27.389 calculate the ngram counts, and add them to the 00:19:27.389 --> 00:19:29.940 appropriate Redis sorted set. So let's write that. 00:19:29.940 --> 00:19:32.450 So one thing we need to do is make 00:19:32.450 --> 00:19:34.460 sure we're not double counting. So if we have 00:19:34.460 --> 00:19:37.240 an old sorted set sitting around, let's delete it. 00:19:37.240 --> 00:19:40.210 So let's do redis.del(year). We need to decide what 00:19:40.210 --> 00:19:43.460 values of n we're gonna use.
So let's just 00:19:43.460 --> 00:19:46.210 say one, two, and three, meaning we're gonna calculate 00:19:46.210 --> 00:19:48.190 all the one grams, two grams, and three grams. Anything 00:19:48.190 --> 00:19:49.700 longer than that and it's sort of, like, what's 00:19:49.700 --> 00:19:51.740 even the point? You're getting into pretty specific sentences. 00:19:51.740 --> 00:19:53.110 There's not gonna be a lot of repetition. 00:19:53.110 --> 00:19:55.789 So now we need to iterate through each talk 00:19:55.789 --> 00:20:02.789 for the given year: where(:year => year).find_each. And then 00:20:05.789 --> 00:20:07.860 for each talk we need to iterate through each 00:20:07.860 --> 00:20:14.330 value of n. And then for each value of 00:20:14.330 --> 00:20:15.610 n, what do we need to do? We need 00:20:15.610 --> 00:20:17.480 to calculate the ngrams, so do talk dot ngrams. 00:20:17.480 --> 00:20:19.059 This is the method we just wrote. We're gonna 00:20:19.059 --> 00:20:19.989 pass it n, 00:20:19.989 --> 00:20:22.649 do |ngram|. 00:20:22.649 --> 00:20:26.489 And then finally, we're going to add this to 00:20:26.489 --> 00:20:29.330 the relevant Redis sorted set. So the command for 00:20:29.330 --> 00:20:30.049 that is redis.zincrby. 00:20:30.049 --> 00:20:34.669 And this goes: you give it a year, you 00:20:34.669 --> 00:20:38.769 give it a number, like one, and you give 00:20:38.769 --> 00:20:40.320 it what you're incrementing. 00:20:40.320 --> 00:20:42.779 OK. So let's look at this method now. We're 00:20:42.779 --> 00:20:45.019 gonna give it a year. We're gonna go 00:20:45.019 --> 00:20:48.419 through every talk from that year. We're gonna go 00:20:48.419 --> 00:20:50.629 through values of n, which is one, two and 00:20:50.629 --> 00:20:53.200 three, so let's say one, OK. Get the talk. 00:20:53.200 --> 00:20:55.289 Calculate all of its one grams.
And then for 00:20:55.289 --> 00:20:59.149 each one gram, add one to that ngram's score 00:20:59.149 --> 00:21:02.869 in the year's sorted set. And then 00:21:02.869 --> 00:21:05.139 do that just a bunch of times. 00:21:05.139 --> 00:21:07.549 So let's see if this works. 00:21:07.549 --> 00:21:14.480 Let's reload. Again, to prove I'm not lying, there's 00:21:14.480 --> 00:21:21.360 nothing in Redis at the moment. Oops. Gotta do 00:21:21.360 --> 00:21:22.419 talk. 00:21:22.419 --> 00:21:29.419 Let's worry about those Delayed::Jobs. Perfect. Drink break. 00:21:30.419 --> 00:21:33.019 So it's going through each year now, and each 00:21:33.019 --> 00:21:34.559 talk in each year, counting up all the words 00:21:34.559 --> 00:21:39.489 and phrases and building our sorted sets. And it 00:21:39.489 --> 00:21:40.440 is done. 00:21:40.440 --> 00:21:43.049 So let's see what we got in here now. 00:21:43.049 --> 00:21:46.779 OK, cool. So we got these keys. Let's, let's 00:21:46.779 --> 00:21:48.039 look into one of these. One of the nice 00:21:48.039 --> 00:21:49.610 things about the sorted set is you can, of 00:21:49.610 --> 00:21:52.909 course, sort by score. And so the command here 00:21:52.909 --> 00:21:55.950 is zrevrange. So we can do the 2014 sorted 00:21:55.950 --> 00:21:58.869 set. So this is gonna give us the top 00:21:58.869 --> 00:22:01.470 ten, or actually eleven, top eleven, you know, ngrams 00:22:01.470 --> 00:22:03.909 in 2014. So let's see. 00:22:03.909 --> 00:22:09.090 And we can actually add :with_scores => true. So 00:22:09.090 --> 00:22:11.759 the most common words and phrases from 2014 RailsConf 00:22:11.759 --> 00:22:16.639 talk abstracts. Not very surprising. The, to, and, a, 00:22:16.639 --> 00:22:20.200 of, in, you, how. Rails. OK. Rails makes it to 00:22:20.200 --> 00:22:21.110 number ten. 00:22:21.110 --> 00:22:23.519 So there you go. 00:22:23.519 --> 00:22:25.249 Now we can also, let's just have a little 00:22:25.249 --> 00:22:28.369 fun here.
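The indexing loop and the `zrevrange` check above can be sketched end to end in plain Ruby. Everything here is an assumption made for illustration: `Talk` is a Struct rather than the ActiveRecord model (so an array filter stands in for `Talk.where(:year => year).find_each`), its `ngrams` normalization is a hypothetical version of the method written earlier in the talk, and `SortedSets` is an in-memory stand-in mirroring redis-rb's `del`, `zincrby` and `zrevrange`.

```ruby
# Hypothetical stand-in for the Talk model: a year plus an abstract.
Talk = Struct.new(:year, :abstract) do
  # Hypothetical ngram extraction: lowercase, turn punctuation into spaces,
  # split into words, then join each run of n consecutive words.
  def ngrams(n)
    words = abstract.downcase.gsub(/[^a-z0-9'\s]/, " ").split
    words.each_cons(n).map { |gram| gram.join(" ") }
  end
end

# In-memory stand-in for Redis sorted sets (mirrors redis-rb method names).
class SortedSets
  def initialize
    @sets = Hash.new { |sets, key| sets[key] = Hash.new(0.0) }
  end

  def del(key)
    @sets.delete(key.to_s) ? 1 : 0
  end

  def zincrby(key, increment, member)
    @sets[key.to_s][member.to_s] += increment.to_f
  end

  # Members ranked from highest to lowest score, ranks start..stop inclusive
  # (-1 means "through the end"). Equal scores come out in reverse
  # lexicographic order, as with Redis's ZREVRANGE.
  def zrevrange(key, start, stop, with_scores: false)
    ranked = @sets.fetch(key.to_s, {}).sort_by { |member, score| [score, member] }.reverse
    slice = ranked[start..stop] || []
    with_scores ? slice : slice.map(&:first)
  end
end

NGRAM_SIZES = [1, 2, 3] # the talk counts one-, two- and three-grams

def generate_ngram_data_by_year(redis, talks, year)
  redis.del(year) # start fresh so rerunning does not double count
  talks.select { |talk| talk.year == year }.each do |talk|
    NGRAM_SIZES.each do |n|
      talk.ngrams(n).each { |ngram| redis.zincrby(year, 1, ngram) }
    end
  end
end

redis = SortedSets.new
talks = [Talk.new(2014, "Rails, Ruby and Rails."), Talk.new(2013, "Ruby!")]
generate_ngram_data_by_year(redis, talks, 2014)
redis.zrevrange(2014, 0, 1, with_scores: true)
# => [["rails", 2.0], ["ruby and rails", 1.0]]
```

Because the method deletes the year's set before counting, running it twice leaves the scores unchanged; asking for ranks `0..10` returns up to eleven members, which is why the talk's "top ten" query shows eleven.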
See what some of the top 00:22:28.369 --> 00:22:30.480 non-trivial ones are. Obviously you could write some code to, 00:22:30.480 --> 00:22:32.950 maybe, kill stop words, stuff like that, if you 00:22:32.950 --> 00:22:34.690 don't care about them. 00:22:34.690 --> 00:22:40.330 But, so. Rails. Can code. This talk. Most popular 00:22:40.330 --> 00:22:44.619 two-word phrase. Pretty good. How to. Ruby developers. Eh, 00:22:44.619 --> 00:22:46.399 this looks pretty, pretty relevant, right? I mean, these 00:22:46.399 --> 00:22:51.220 are not words you'd be surprised to see in 00:22:51.220 --> 00:22:53.289 a RailsConf talk abstract. 00:22:53.289 --> 00:22:56.220 So those, you know, are the most common words. 00:22:56.220 --> 00:22:57.289 So we now have this. We have this for 00:22:57.289 --> 00:22:58.509 every year, by the way. So we can also 00:22:58.509 --> 00:23:01.440 do the same thing for 2011. 00:23:01.440 --> 00:23:04.279 Whatever. And for the last piece of code we're going 00:23:04.279 --> 00:23:05.739 to write, we need to be able to 00:23:05.739 --> 00:23:06.769 query this data. 00:23:06.769 --> 00:23:08.940 So, you know, in the actual, sort of, website or 00:23:08.940 --> 00:23:11.590 finished product, you're gonna have to, you know, search 00:23:11.590 --> 00:23:13.429 for a term. And you're gonna have to go 00:23:13.429 --> 00:23:15.739 look up in your data, you know, what 00:23:15.739 --> 00:23:19.340 the relevant values are for that term. 00:23:19.340 --> 00:23:21.299 And so, how are we gonna do this? Well, the 00:23:21.299 --> 00:23:23.499 first thing we gotta remember is that we normal- 00:23:23.499 --> 00:23:27.409 remember we did this normalize for ngrams thing. So 00:23:27.409 --> 00:23:28.919 we have to do that again, because what if 00:23:28.919 --> 00:23:31.100 someone searches for a capitalized word or something 00:23:31.100 --> 00:23:32.989 with punctuation?
We have to process it the exact 00:23:32.989 --> 00:23:35.739 same way that we processed our input. Otherwise it 00:23:35.739 --> 00:23:38.889 won't match. So let's just do that. 00:23:38.889 --> 00:23:42.809 And then we have this constant ALL_YEARS. And we're 00:23:42.809 --> 00:23:45.950 gonna iterate through that with each_with_object, with a 00:23:45.950 --> 00:23:47.299 hash. Let's just build up a hash. That's probably 00:23:47.299 --> 00:23:51.999 the easy way to do it: do |year, hash|. 00:23:51.999 --> 00:23:57.549 And the, the relevant Redis command, again, is zscore. 00:23:57.549 --> 00:24:03.700 So we can do redis dot zscore(). We're gonna 00:24:03.700 --> 00:24:05.869 look up, in the sorted set for that year, the 00:24:05.869 --> 00:24:08.470 term. And we need to actually put this in 00:24:08.470 --> 00:24:13.739 the hash. And then we need to 00:24:13.739 --> 00:24:16.289 to_i that in case it's nil. 00:24:16.289 --> 00:24:19.100 OK. So now, what does this say? ALL_YEARS 00:24:19.100 --> 00:24:22.859 is just, you know, 2007 through 2014. Go through 00:24:22.859 --> 00:24:25.889 each of those years. And then build up our 00:24:25.889 --> 00:24:27.609 hash so that each key, the 00:24:27.609 --> 00:24:30.499 year, maps to the value of, you know, the 00:24:30.499 --> 00:24:33.889 number of times that term appeared in that year. 00:24:33.889 --> 00:24:38.179 So let's, again, see if that works. Talk dot 00:24:38.179 --> 00:24:43.639 query, you know, ruby or something. Cool. So in 00:24:43.639 --> 00:24:47.330 2007 it was mentioned 52 times, in 2014 22 times. 00:24:47.330 --> 00:24:50.230 Whatever. We can, I guess, we said Ember originally. 00:24:50.230 --> 00:24:54.309 And there you go. It was not mentioned until 00:24:54.309 --> 00:24:58.369 this year, which is also kind of telling. 00:24:58.369 --> 00:25:01.690 And so this is basically, you know, all of 00:25:01.690 --> 00:25:04.100 the kind of step two code you need.
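The query step can be sketched the same way. As before, this is an illustration under assumptions rather than the talk's code: `normalize_for_ngrams` is a hypothetical normalization that has to match whatever was applied at indexing time, `query` takes the store as an argument instead of being a `Talk` class method, and `SortedSets` is a small in-memory stand-in whose `zscore` mirrors redis-rb by returning nil for a missing member.

```ruby
ALL_YEARS = (2007..2014).to_a # the talk's range: 2007 through 2014

# Minimal in-memory stand-in for Redis sorted sets (only what query needs).
class SortedSets
  def initialize
    @sets = Hash.new { |sets, key| sets[key] = Hash.new(0.0) }
  end

  def zincrby(key, increment, member)
    @sets[key.to_s][member.to_s] += increment.to_f
  end

  # Score of `member` in the set at `key`, or nil if it is not a member.
  def zscore(key, member)
    set = @sets.fetch(key.to_s, {})
    set.key?(member.to_s) ? set[member.to_s] : nil
  end
end

# Hypothetical normalization; must match the one used when building the index.
def normalize_for_ngrams(text)
  text.downcase.gsub(/[^a-z0-9'\s]/, " ").split.join(" ")
end

# Map each year to the number of times `term` appeared that year.
def query(redis, term)
  term = normalize_for_ngrams(term)
  ALL_YEARS.each_with_object({}) do |year, hash|
    # zscore is nil when the term never appeared; nil.to_i is 0
    hash[year] = redis.zscore(year, term).to_i
  end
end

redis = SortedSets.new
2.times { redis.zincrby(2014, 1, "ember") }
query(redis, "Ember!")[2014] # => 2
query(redis, "Ember!")[2013] # => 0
```

Normalizing the search term with the same rules used at indexing time is what lets a search like "Ember!" find the stored member "ember".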
That's 00:25:04.100 --> 00:25:06.840 sort of the ngram calculation and storing the results. And 00:25:06.840 --> 00:25:09.840 again, I reiterate, like, everything we just did is 00:25:09.840 --> 00:25:12.830 kind of trivially simple. There are no fancy algorithms. It's 00:25:12.830 --> 00:25:15.220 just counting, you know, putting stuff in the right 00:25:15.220 --> 00:25:17.169 data structure, accessing it in sort of the right 00:25:17.169 --> 00:25:18.269 way. 00:25:18.269 --> 00:25:20.940 And I just think there's something pretty, you 00:25:20.940 --> 00:25:23.179 know, insightful about that, that you don't need to 00:25:23.179 --> 00:25:26.389 do fancy things all the time, and that often 00:25:26.389 --> 00:25:28.590 the coolest results will come from 00:25:28.590 --> 00:25:30.749 something simple. 00:25:30.749 --> 00:25:31.769 And so, as I said, the last thing we're 00:25:31.769 --> 00:25:33.139 gonna do here is create this nice front-end 00:25:33.139 --> 00:25:35.970 interface that lets us investigate the results. You know, 00:25:35.970 --> 00:25:37.989 unfortunately, we don't really have time to get into 00:25:37.989 --> 00:25:40.320 that. It's all on GitHub. But, I 00:25:40.320 --> 00:25:42.940 will tell you, I use pie charts as a 00:25:42.940 --> 00:25:46.100 nice library, front-end library that makes it very simple 00:25:46.100 --> 00:25:47.450 to get charts up and running. It's actually not 00:25:47.450 --> 00:25:48.419 that much code. 00:25:48.419 --> 00:25:49.889 And I've done this already. So let's start up 00:25:49.889 --> 00:25:54.039 a server. And, oops. Let's fire up the localhost. 00:25:54.039 --> 00:25:58.950 And so here we are. Abstractogram is our 00:25:58.950 --> 00:26:00.009 app. So what are we, what are we gonna 00:26:00.009 --> 00:26:01.080 search for here? 00:26:01.080 --> 00:26:03.919 Let's see. I, you, we or something. And there 00:26:03.919 --> 00:26:05.330 we go. So there, there it is. The number
The number 00:26:05.330 --> 00:26:08.730 of times the word you appears in each year. 00:26:08.730 --> 00:26:11.100 Looks pretty flat. So, you know, the, these are 00:26:11.100 --> 00:26:13.100 kind of constant. Anyone have any, anything else they 00:26:13.100 --> 00:26:15.539 want to search for? Let's try ember, backbone. 00:26:15.539 --> 00:26:19.369 All right. Let's say, we got, PosGres I heard. 00:26:19.369 --> 00:26:24.109 All right. I guess we could all say, let's 00:26:24.109 --> 00:26:28.639 say SQL. No one cares about PosGres this year. 00:26:28.639 --> 00:26:32.700 Service. SOA. Oh, there is sort of a rising 00:26:32.700 --> 00:26:35.850 trend of service-oriented architecture. 00:26:35.850 --> 00:26:36.320 Anything else? 00:26:36.320 --> 00:26:41.419 TDD. That's a good one. TDD. Testing. Test-driven, how 00:26:41.419 --> 00:26:48.419 about. So there we go. I'm sorry? 00:26:48.909 --> 00:26:53.739 Rest. That's a trick one though, cause rest is 00:26:53.739 --> 00:26:55.480 also like a real word that, you know, like, 00:26:55.480 --> 00:26:57.440 the rest of the time will be something else. 00:26:57.440 --> 00:27:04.149 And. Refactor. Let's see. Ooh. That's a good one. 00:27:04.149 --> 00:27:09.629 DHH. Wow. Peaked 2011, peak DHH. Let's see, we 00:27:09.629 --> 00:27:11.570 got, Heroku is a good one. On the rise. 00:27:11.570 --> 00:27:13.700 I like we can just look at Ruby and 00:27:13.700 --> 00:27:15.409 Rails. This is actually, I think, pretty relevant. It's 00:27:15.409 --> 00:27:18.980 like, what are people talking about? Not Rails anymore. 00:27:18.980 --> 00:27:20.269 We got to find something new to talk about. 00:27:20.269 --> 00:27:22.730 You know, it's like, too many RailsConfs. 
And, in 00:27:22.730 --> 00:27:25.350 fact, this actually came up at the, you know, 00:27:25.350 --> 00:27:27.119 there was a speaker meeting, whatever, and everyone was 00:27:27.119 --> 00:27:29.489 talking about how, you know, their talks weren't actually 00:27:29.489 --> 00:27:30.600 about Rails. 00:27:30.600 --> 00:27:32.879 And, you know, maybe this is actually an insightful 00:27:32.879 --> 00:27:35.639 statement, that, you know, the, the community has obviously 00:27:35.639 --> 00:27:37.710 gotten very large and there's just a ton of 00:27:37.710 --> 00:27:38.350 other stuff to talk about. People have been talking 00:27:38.350 --> 00:27:41.299 about Rails for a long time. And so, you 00:27:41.299 --> 00:27:42.909 know, here I am giving a talk that's not 00:27:42.909 --> 00:27:46.059 really directly about Rails. But, so maybe this is 00:27:46.059 --> 00:27:47.369 like a real trend that people are just finding 00:27:47.369 --> 00:27:49.039 other stuff to talk about. 00:27:49.039 --> 00:27:53.080 And that is pretty cool. So I promised that 00:27:53.080 --> 00:27:56.470 I would show you the repo or whatever on 00:27:56.470 --> 00:27:59.609 GitHub. You can just do bit.ly slash railsconfdata. It's 00:27:59.609 --> 00:28:02.059 just the code. Everything we've looked at today. Plus 00:28:02.059 --> 00:28:04.419 some more stuff. It's actually running live on the 00:28:04.419 --> 00:28:07.399 internet at abstractogram dot herokuapp dot com. 00:28:07.399 --> 00:28:09.679 I figure the internet's probably not working, but let's 00:28:09.679 --> 00:28:16.679 see. Yup. Classic. And, you know, otherwise that is 00:28:16.809 --> 00:28:19.649 it. And thank you for listening. And I think 00:28:19.649 --> 00:28:20.450 we have time for questions.