
Ruby Conf 2013 - Recommendation Engines with Redis and Ruby by Evan Light

  • 0:16 - 0:18
    EVAN LIGHT: OK, so it's 2:01.
  • 0:18 - 0:20
    I guess we better get started, cause I was
    told that,
  • 0:20 - 0:22
    all right I have to hit this little button,
  • 0:22 - 0:25
    that once I run out of time this little doo-hicky
    here
  • 0:25 - 0:26
    is gonna make lots of noise
  • 0:26 - 0:27
    and then they're gonna bring out the,
  • 0:27 - 0:29
    the gong, and it won't be pretty.
  • 0:29 - 0:34
    So yeah, that's me. And we're, so right. I'm
  • 0:34 - 0:36
    mixed up. I'm Xavier Shay. I'm here to talk
  • 0:36 - 0:38
    about Ruby Profiling. If you were looking
    for this
  • 0:38 - 0:40
    Evan Light guy, he's in that other room.
  • 0:40 - 0:42
    Oh, wait, no that's not right. Yeah, OK. We're
  • 0:42 - 0:44
    - I'm Evan Light and we're talking about recommendation
  • 0:44 - 0:46
    engines with Ruby and Redis, and why are there
  • 0:46 - 0:50
    more people in here than I expected? OK.
  • 0:50 - 0:53
    So, very briefly about me - I created and
  • 0:53 - 0:55
    run this event out in northern Virginia called
    Ruby
  • 0:55 - 0:59
    DCamp. It's a three-day nerd commune in the
    woods
  • 0:59 - 1:02
    for Ruby programmers. If you haven't heard
    about it,
  • 1:02 - 1:04
    there are a bunch of, a bunch of people,
  • 1:04 - 1:08
    participants here who have been before. So,
    but, in
  • 1:08 - 1:12
    a nutshell, you come out in the woods, you
  • 1:12 - 1:13
    hack on Ruby code, you hang out with awesome
  • 1:13 - 1:15
    programmers, you are not allowed to leave
    until the
  • 1:15 - 1:17
    very end.
  • 1:17 - 1:20
    And the attendees decide on basically everything
    and they
  • 1:20 - 1:21
    have to do all the chores. And that sounds
  • 1:21 - 1:22
    like a lot of work, but it's really an
  • 1:22 - 1:24
    awful lot of fun. Oh, and free. But you
  • 1:24 - 1:26
    have to get, you have to get a code
  • 1:26 - 1:27
    in order to be able to attend.
  • 1:27 - 1:30
    Also, I work for this little company called
    rackspace.
  • 1:30 - 1:32
    Can you guys raise your hands if you've heard
  • 1:32 - 1:34
    of us before? Oh, that's pretty good. How
    many
  • 1:34 - 1:38
    of you guys use us? Or, well, I guess
  • 1:38 - 1:43
    I'll say currently use us. Hmm. That's not
    too
  • 1:43 - 1:45
    many. We need to work on that some.
  • 1:45 - 1:47
    So I'm a, a what they call developer advocate
  • 1:47 - 1:50
    for rackspace. That is, that I'm here for
    you
  • 1:50 - 1:53
    guys. Truly. And that's why I took the job.
  • 1:53 - 1:54
    I wanted the job where I could do more
  • 1:54 - 1:57
    for the Ruby community, and they basically
    said, great,
  • 1:57 - 1:59
    that's the kind of person we want.
  • 1:59 - 2:01
    So if there's anything I can do to make
  • 2:01 - 2:03
    your lives, those few of you here, we need
  • 2:03 - 2:06
    more, who use rackspace, make your lives better
    with
  • 2:06 - 2:09
    rackspace, great. And for those of you who
    don't,
  • 2:09 - 2:11
    if there's anything you can think of that
    would
  • 2:11 - 2:13
    make you want to - yeah, we'd like to
  • 2:13 - 2:14
    hear that, too.
  • 2:14 - 2:17
    Let's see. So moving right along. In a nutshell,
  • 2:17 - 2:20
    here's what we're gonna talk about. This is
    a
  • 2:20 - 2:25
    case study of sorts for which I, a client
  • 2:25 - 2:26
    whose problem I solved with a recommendation
    engine, we'll
  • 2:26 - 2:29
    talk about that. So we'll talk about the context,
  • 2:29 - 2:31
    the solution that I used - I need to
  • 2:31 - 2:35
    not look at my phone. Some Redis-related tangents,
    because
  • 2:35 - 2:37
    this is really all about Redis and Ruby,
  • 2:37 - 2:40
    and some painful lessons I learned along the
    way.
  • 2:40 - 2:43
    So the context. The client of mine, who shall
  • 2:43 - 2:45
    remain nameless, just so that way I can be
  • 2:45 - 2:49
    a little freer with discussion. They had a
    soccer,
  • 2:49 - 2:51
    or have, I should say, a soccer social network.
  • 2:51 - 2:53
    So imagine Facebook but for soccer.
  • 2:53 - 2:56
    Like, Facebook for blah blah blah, that's
    pretty common
  • 2:56 - 2:59
    in California, right. But in their case, what
    made
  • 2:59 - 3:01
    them really interesting is that they have
    a live
  • 3:01 - 3:04
    feed of soccer data coming in all the time.
  • 3:04 - 3:06
    So, as games are being played, every time
    there's
  • 3:06 - 3:08
    a red card or a yellow card, or someone
  • 3:08 - 3:10
    scores a goal or there's a penalty, they get
  • 3:10 - 3:13
    a notification about it.
  • 3:13 - 3:14
    And what they wanted to be able to do
  • 3:14 - 3:16
    is they wanted their users to be able to
  • 3:16 - 3:21
    see popular events, popular posts on their
    site, so
  • 3:21 - 3:25
    the, the soccer event feed, as it's coming
    in,
  • 3:25 - 3:28
    would be automatically spewed out into the
    website as
  • 3:28 - 3:31
    a series of posts. And they would be contextualized,
  • 3:31 - 3:33
    that is, that they would have tags, and we'll
  • 3:33 - 3:35
    see more on that later.
  • 3:35 - 3:37
    So they wanted the, the users to be able
  • 3:37 - 3:41
    to see popular posts and relevant posts. And
    in
  • 3:41 - 3:44
    near real-time, and that near real-time part means
  • 3:44 - 3:46
    that it's a little bit more exciting.
  • 3:46 - 3:50
    So recommendation engines - I'm sure that
    most of
  • 3:50 - 3:51
    you are at least familiar with the idea, because
  • 3:51 - 3:54
    you use this thing called Google, probably.
    Maybe you've
  • 3:54 - 3:59
    heard of it. So recommendation engines are
    an approximation,
  • 3:59 - 4:02
    and they are based on, obviously, large sets
    of
  • 4:02 - 4:07
    data, ideally. And in this case, we want two
  • 4:07 - 4:09
    different kinds of recommendations.
  • 4:09 - 4:13
    Again, we want what's popular - that's pretty
    straightforward.
  • 4:13 - 4:17
    But what's relevant - and that's very subjective.
  • 4:17 - 4:19
    So they're based in statistics. But this is
    me
  • 4:19 - 4:23
    and statistics. And, and this to
    me
  • 4:23 - 4:25
    is actually what makes this talk interesting,
    because I
  • 4:25 - 4:31
    built a recommendation engine being that dog.
  • 4:31 - 4:34
    So statistics - so recommendation engines
    are canonically based
  • 4:34 - 4:38
    in the statistical methods and, yeah. Statistical
    methods and
  • 4:38 - 4:40
    I, we don't get along so great. So this
  • 4:40 - 4:41
    is basically about how you do it with brute
  • 4:41 - 4:44
    force and still get away with it.
  • 4:44 - 4:46
    So other than being ignorant to statistical
    methods, quite
  • 4:46 - 4:48
    frankly, I couldn't get the client to pay
    me
  • 4:48 - 4:50
    for a day or two of research. I asked
  • 4:50 - 4:52
    them - I said, wouldn't you like to do
  • 4:52 - 4:55
    the, the right thing rather than just
  • 4:55 - 4:59
    something probably ugly that'll work? And
    they said, no
  • 4:59 - 5:02
    basically we trust you, so just go build it.
  • 5:02 - 5:04
    But I'm telling you, it'd be better if I
  • 5:04 - 5:05
    did a little research in advance.
  • 5:05 - 5:07
    No, no - just go build it.
  • 5:07 - 5:08
    OK.
  • 5:08 - 5:11
    Cause I like being paid.
  • 5:11 - 5:13
    So, why Ruby?
  • 5:13 - 5:17
    Well, kind of the same thing there. Their
    developers
  • 5:17 - 5:20
    knew Ruby. They knew JavaScript. I said maybe
    we
  • 5:20 - 5:23
    should use something faster, you know, Java
    - which,
  • 5:23 - 5:25
    I feel is really funny to say, having been
  • 5:25 - 5:26
    a programmer for awhile. If you said Java
    was
  • 5:26 - 5:29
    fast twenty years ago, you'd, I would, or
    if
  • 5:29 - 5:30
    I'd said it, I'd be laughed out of the
  • 5:30 - 5:31
    room.
  • 5:31 - 5:33
    Nowadays, you have Java - fast. Go - fast.
  • 5:33 - 5:36
    C - fast. Even, JVM languages. I said Clojure
  • 5:36 - 5:40
    because I like Clojure. But, nope. They wanted
    Ruby.
  • 5:40 - 5:44
    So, OK, Ruby it is. No statistical methods,
    really,
  • 5:44 - 5:47
    fine. I'll figure something out.
  • 5:47 - 5:49
    So let's talk about the system a little bit.
  • 5:49 - 5:52
    Like, every social network, it has the typical
    nouns
  • 5:52 - 5:56
    of users, posts, comments. You're used to
    this. But
  • 5:56 - 5:57
    then we have a few new ones. We have
  • 5:57 - 6:01
    teams, players. I forget, I think they, they
    had
  • 6:01 - 6:03
    a match as a noun, but really a match
  • 6:03 - 6:06
    to me was just two teams playing. An event
  • 6:06 - 6:08
    with two teams on it.
  • 6:08 - 6:11
    And then we had a series of verbs. So
  • 6:11 - 6:13
    submitting a post - I'm sure you're familiar
    with
  • 6:13 - 6:16
    that. Except that I alluded to this a little
  • 6:16 - 6:20
    bit earlier - posts have tags. They're taggable
    polymorphically.
  • 6:20 - 6:23
    So you could put any old thing on them,
  • 6:23 - 6:25
    but usually you would see teams and players,
    and
  • 6:25 - 6:27
    that's really all they wanted out of the recommendation
  • 6:27 - 6:27
    engine.
  • 6:27 - 6:30
    It's important to mention. More on that later.
  • 6:30 - 6:32
    And it's not that import - it's really not
  • 6:32 - 6:34
    that important, it's just a fun point later.
  • 6:34 - 6:36
    So you can comment on a post - big
  • 6:36 - 6:38
    surprise. Again, social network, you probably
    didn't expect to
  • 6:38 - 6:41
    see that. But you can tag posts, you can
  • 6:41 - 6:44
    tag, sorry, comments, with users - kind of
    like
  • 6:44 - 6:47
    you could in Facebook. It was a little bit
  • 6:47 - 6:50
    more of a nuisance because they didn't have
    a,
  • 6:50 - 6:53
    a tagging mechanism per se for users, like
    Facebook
  • 6:53 - 6:55
    does. I just had to write something to scan
  • 6:55 - 6:58
    the text. Not entirely relevant to the rest
    of
  • 6:58 - 7:00
    the discussion, so. We'll just keep on going.
  • 7:00 - 7:03
    Other verbs that kind of mattered a bit -
  • 7:03 - 7:06
    favoriting teams or players. This isn't something
    that, that
  • 7:06 - 7:08
    Facebook had. More like a FourSquare thing,
    when you
  • 7:08 - 7:10
    say, I love this. I love this team. I
  • 7:10 - 7:14
    love this player. And then liking posts. Pretty
    typical
  • 7:14 - 7:15
    stuff.
  • 7:15 - 7:19
    So given a, a model that looks a- something
  • 7:19 - 7:22
    a little like this, leaving out comments and
    likes
  • 7:22 - 7:24
    and favorites for now. Let's say you have
    a
  • 7:24 - 7:28
    user, in this case he's user 2, he has,
  • 7:28 - 7:30
    he posted three posts. The first two posts
    are
  • 7:30 - 7:32
    really the important ones, and the first two
    posts
  • 7:32 - 7:34
    talk about tags one, two, and three.
  • 7:34 - 7:37
    So say we're given this. And maybe we have
  • 7:37 - 7:41
    something like this, but we don't initially,
    where we
  • 7:42 - 7:43
    can say this other - we have this other
  • 7:43 - 7:47
    guy and he's interested in these tags. So
    those
  • 7:47 - 7:49
    might be teams or players.
  • 7:49 - 7:53
    And they have these scalar values associated,
    this user
  • 7:53 - 7:55
    has these scalar values associated with each
    of these,
  • 7:55 - 7:58
    say, teams or players. So given those things,
    what
  • 7:58 - 8:00
    we want, ultimately, is that.
  • 8:00 - 8:02
    We want to be able to say, this user
  • 8:02 - 8:05
    is going to be interested in these posts and
  • 8:05 - 8:08
    not that post. So post three and - going
  • 8:08 - 8:13
    back two slides - had tag four. And post,
  • 8:13 - 8:15
    and user 1 only cared about tags one, two,
  • 8:15 - 8:17
    and three. Not tag four.
  • 8:17 - 8:19
    So he's only interes- he should only have
    a
  • 8:19 - 8:21
    score for post one, post two, and he shouldn't
  • 8:21 - 8:23
    have anything for post three because he just
    doesn't
  • 8:23 - 8:27
    care. So we want something like that.
  • 8:27 - 8:30
    So this part here is where the, the interesting-ness
  • 8:30 - 8:32
    came in. When the client approached me, they
    said,
  • 8:32 - 8:35
    well we have this idea for how a recommendation
  • 8:35 - 8:37
    engine would work. We'll just, we'll just
    have a
  • 8:37 - 8:40
    weight associated with each one of these events
    as
  • 8:40 - 8:41
    they occur.
  • 8:41 - 8:44
    Well, that's all well and good, but going
    from
  • 8:44 - 8:47
    the first diagram with the post and the tags
  • 8:47 - 8:49
    to, oh, I have this in a single step
  • 8:49 - 8:51
    doesn't really make any sense.
  • 8:51 - 8:53
    I needed some kind of lens in order to,
  • 8:53 - 8:56
    to figure out what the user, what content
    the
  • 8:56 - 9:00
    user would actually care about. So I needed
    intermediate
  • 9:00 - 9:01
    value - I needed some intermediate values
    to get
  • 9:02 - 9:05
    a sense of what does the user care about?
  • 9:05 - 9:10
    So moving on. We start with ActiveRecord.
    Every good
  • 9:10 - 9:13
    application does - not really. But it was
    a
  • 9:13 - 9:16
    Rails app, so yeah, we had ActiveRecord. But
    really
  • 9:16 - 9:21
    we're talking about ActiveRecord::Observers.
    So that's to say that
  • 9:21 - 9:26
    we would capture, or would capture the lifecycle
    events
  • 9:26 - 9:30
    of the nouns I described earlier. And, well,
    we
  • 9:30 - 9:32
    would have some data that we would feed into
  • 9:32 - 9:35
    something and we'll get there in just a minute.
  • 9:35 - 9:37
    So to reiterate, we cared about two different
    kinds
  • 9:37 - 9:40
    of posts. Really, they're, well, posts are
    posts. But
  • 9:40 - 9:43
    we care about quantifying them in two different
    buckets.
  • 9:43 - 9:46
    Popular, which is a global thing, and relevant,
    which
  • 9:46 - 9:48
    is subjective to the individual user.
  • 9:48 - 9:51
    So popularity is pretty straightforward. It,
    it could be
  • 9:51 - 9:54
    made a little more complex, in this. Popularity,
    if
  • 9:54 - 9:59
    I recall, is based on comments and likes.
    And
  • 9:59 - 10:03
    I forget which was worth more. Because we,
    we
  • 10:03 - 10:04
    would - and that's kind of irrelevant. The
    point
  • 10:04 - 10:06
    is, they would have different weightings.
  • 10:06 - 10:08
    So from a trendingness standpoint, a comment might
  • 10:08 - 10:10
    be worth more than a like or a like might
  • 10:10 - 10:12
    be worth more than a comment. One thing that
  • 10:12 - 10:13
    we had talked about doing that, if we had
  • 10:13 - 10:15
    done it, would have made life a lot more interesting,
  • 10:15 - 10:17
    is to have a notion of tastemakers. And
  • 10:17 - 10:20
    that is, people who are super jazzed about
    a
  • 10:20 - 10:24
    topic having their, their likes and their
    comments being
  • 10:24 - 10:29
    more valuable in terms of popularity than
    other people's.
  • 10:29 - 10:32
    If you instantly start thinking about gaming
    the system
  • 10:32 - 10:33
    when I say something like that, then you're
    basically
  • 10:33 - 10:35
    reading my mind. Because I kept going back
    to
  • 10:35 - 10:37
    the client over and over again about that,
    and
  • 10:37 - 10:40
    their response was, oh to have such problems.
    And,
  • 10:40 - 10:41
    well I had to agree with them.
  • 10:41 - 10:43
    If someone games your system, well then you're
    doing
  • 10:43 - 10:45
    pretty well for someone to care enough to
    do
  • 10:45 - 10:46
    it.
  • 10:46 - 10:48
    Relevance is really where it gets a little
    more
  • 10:48 - 10:51
    interesting, or a lot more interesting. So
    we have
  • 10:51 - 10:53
    these verbs, or I guess these statements,
    like if
  • 10:53 - 10:55
    you go and favorite DC United, or you submit
  • 10:55 - 10:58
    a post tagged with DC United - let's say
  • 10:58 - 11:00
    you like DC United, or you comment on a
  • 11:00 - 11:03
    post that is tagged with DC United, or the
  • 11:03 - 11:06
    really confusing one, you're mentioned in
    a comment on
  • 11:06 - 11:08
    a post tagged DC United.
  • 11:08 - 11:09
    If your head hurts on that one, I understand.
  • 11:09 - 11:12
    It took me awhile to wrap my brain around
  • 11:12 - 11:12
    it too.
  • 11:12 - 11:13
    So obviously, if you hadn't figured out, there
    was
  • 11:13 - 11:16
    a time in my life when I liked DC
  • 11:16 - 11:20
    United. But I'm not really much into sports
    anymore.
  • 11:20 - 11:25
    But moving right along. So, relevance is,
    in this
  • 11:25 - 11:27
    system, is defined by an algorithm kind of
    like
  • 11:27 - 11:28
    this.
  • 11:28 - 11:32
    So given an arbitrary event defined by an
    AR
  • 11:32 - 11:36
    observer, or essentially serialized by an
    AR observer, for
  • 11:36 - 11:40
    each tag on that event, for each user interested
  • 11:40 - 11:44
    in that tag, go score the user's interest
    in
  • 11:44 - 11:47
    that tag, or go rescore assuming that there's
    an
  • 11:47 - 11:49
    interest already.
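The loop he's describing can be sketched in plain Ruby. The data shapes and the flat `EVENT_WEIGHT` here are hypothetical stand-ins for illustration, not the client's actual code:

```ruby
# Sketch of the rescoring loop: for each tag on the event, for each
# user interested in that tag, bump that user's interest score.
EVENT_WEIGHT = 1.0  # invented; the real system weighted each verb differently

# event_tags:       ["DC United", ...]
# interested_users: { "DC United" => [user_id, ...] }
# interests:        { user_id => { "DC United" => score } }
def rescore(event_tags, interested_users, interests)
  event_tags.each do |tag|
    interested_users.fetch(tag, []).each do |user_id|
      interests[user_id] ||= Hash.new(0.0)
      interests[user_id][tag] += EVENT_WEIGHT  # score, or rescore an existing interest
    end
  end
  interests
end
```

Two nested loops over potentially large collections: exactly the O(N squared) problem he calls out next.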
  • 11:49 - 11:50
    So if you hadn't figured out already, that's
    a
  • 11:50 - 11:52
    Big O N squared algorithm, if you're
    in computer
  • 11:52 - 11:55
    science. And that's a bad, bad, bad thing.
    Damn
  • 11:55 - 11:57
    - I was hoping there might have been more
  • 11:57 - 12:02
    Pacific Rim fans in the audience, but. Oh
    well.
  • 12:02 - 12:06
    So yeah, Big O N squared algorithm. I'm up
  • 12:06 - 12:09
    against it, so I'm thinking there's - this,
    this,
  • 12:09 - 12:11
    this is bad. What am I gonna do with
  • 12:11 - 12:15
    this situation? Well, how could we cheat?
  • 12:15 - 12:18
    So, it occurred to me, we're talking about
    soccer
  • 12:18 - 12:22
    matches, about sports games. We're talking
    about wanting timely
  • 12:22 - 12:25
    recommendations. Why do we care about stuff
    that's in
  • 12:25 - 12:27
    the past? We shouldn't. So I went to the
  • 12:27 - 12:29
    client and said, what if we just say, have
  • 12:29 - 12:31
    a window of three days and then after that
  • 12:31 - 12:32
    we just don't care anymore?
  • 12:32 - 12:34
    And they said thumbs up, and I thought, oh
  • 12:34 - 12:36
    great, now there's a whole lot of data I
  • 12:36 - 12:38
    don't have to worry about. So, Big O N
  • 12:38 - 12:41
    squared is bad, but N just got a whole
  • 12:41 - 12:43
    lot smaller.
  • 12:43 - 12:45
    By the way, sorry, computer science parlance.
    Big O
  • 12:45 - 12:47
    of N squared is to say it's a nested
  • 12:47 - 12:51
    loop, and N is some arbitrarily large constant.
    It's
  • 12:51 - 12:54
    the largest, if, I think, we're being concrete
    about
  • 12:54 - 12:56
    it, the size of the largest data structure we'd
  • 12:56 - 12:59
    be iterating over. So the big O is worst
  • 12:59 - 13:03
    case run time of this algorithm would be looping
  • 13:03 - 13:07
    over the longest structure in a nested fashion.
    And
  • 13:07 - 13:09
    that's generally very slow - you don't want
    to
  • 13:09 - 13:11
    do that.
  • 13:11 - 13:14
    So we only care about recent posts, as I
  • 13:14 - 13:17
    said a moment ago. But now we, we've narrowed
  • 13:17 - 13:20
    down what events we care about. We need some
  • 13:20 - 13:22
    kind of event consumer. So how many of you
  • 13:22 - 13:24
    are familiar with Resque?
  • 13:24 - 13:28
    Hmm, OK. About half. I wasn't sure what to
  • 13:28 - 13:29
    expect. Interesting.
  • 13:29 - 13:32
    So Resque is a, a queuing system for processing
  • 13:32 - 13:36
    background tasks, and it's written using this
    thing called
  • 13:36 - 13:38
    Redis, which was in the talks. I assume you
  • 13:38 - 13:40
    might be vaguely interested in this thing
    called Redis.
  • 13:40 - 13:43
    How many people know about Redis, are kind
    of
  • 13:43 - 13:44
    comfortable with it?
  • 13:44 - 13:46
    OK, that's a little more than half. Pretty
    good.
  • 13:46 - 13:47
    So I'll keep this short.
  • 13:47 - 13:51
    Again, don't time me. So Redis is a key
  • 13:51 - 13:53
    value store, which is to say it's a little
  • 13:53 - 13:54
    bit like memcached, or if you just
  • 13:54 - 13:57
    want to speak more Ruby parlance, it's basically
    like
  • 13:57 - 14:00
    a glorified hash, except it runs as a server,
  • 14:00 - 14:03
    as a daemon process basically.
  • 14:03 - 14:06
    It lives, or, its storage is in-memory, but
    it
  • 14:06 - 14:08
    can persist to disk, and there are a couple
  • 14:08 - 14:10
    of different persistence options that give
    you a little
  • 14:10 - 14:15
    bit of flexibility about how often and how reliably it
  • 14:15 - 14:17
    persists.
  • 14:17 - 14:19
    And the interesting thing about Redis is it's
    not
  • 14:19 - 14:21
    just a straight key-value storage, it's not
    just a
  • 14:21 - 14:23
    hash, or I guess you could say it is
  • 14:23 - 14:24
    a lot like a Ruby hash in some ways,
  • 14:24 - 14:27
    because the value doesn't have to just be
    a
  • 14:27 - 14:28
    string. The value could be some kind of data
  • 14:28 - 14:32
    structure. And Redis supports, well, the ones
    listed here.
  • 14:32 - 14:37
    So lists, so lists allow repetition. And they're
    sorted.
  • 14:37 - 14:39
    Well, they're sorted based on insertion order,
    I should
  • 14:39 - 14:40
    say.
  • 14:40 - 14:42
    A hash, as you might expect, so actually,
    so
  • 14:42 - 14:45
    key-value is by virt- by nature a lot like
  • 14:45 - 14:47
    a hash, so basically you can have hashes in
  • 14:47 - 14:50
    your hashes. You don't necessarily want to
    use those,
  • 14:50 - 14:52
    and we'll talk about that soon.
  • 14:52 - 14:56
    Sets. So a list where the insertion order
    doesn't
  • 14:56 - 14:59
    necessarily matter, but no repetition is allowed,
    and sorted
  • 14:59 - 15:01
    sets, which are pretty darn interesting because
    they don't
  • 15:01 - 15:04
    allow repetition and they maintain a sorting
    order, and
  • 15:04 - 15:08
    you're, so you're inserting a value and some
    sortable
  • 15:08 - 15:10
    value to go with it.
  • 15:10 - 15:12
    Well, and again, more on that later.
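Assuming a running local Redis server, the four structures he just listed look like this in a `redis-cli` session (key names made up for illustration):

```sh
RPUSH match:feed goal card goal       # list: keeps insertion order, allows repeats
HSET  user:42 name "Evan" role "dev"  # hash: field/value pairs under one key
SADD  fans:dc-united 7 8 8            # set: unordered, the duplicate 8 is ignored
ZADD  trending 3.5 post:7 1.0 post:9  # sorted set: each member carries a score
ZREVRANGE trending 0 -1 WITHSCORES    # read members back, highest score first
```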
  • 15:12 - 15:14
    Maybe one of the most interesting parts to
    me
  • 15:14 - 15:17
    about Redis is that it supports adding a time
  • 15:17 - 15:21
    to live, an arbitrary time to live, user-definable,
    to
  • 15:21 - 15:24
    any given key that you put in Redis. Now
  • 15:24 - 15:27
    when I say key, I need to be very
  • 15:27 - 15:29
    specific. Key, at the macro level of key-value
    for
  • 15:29 - 15:32
    Redis. So if you store a hash, a hash
  • 15:32 - 15:35
    has a single key that refers to the whole
  • 15:35 - 15:36
    hash.
  • 15:36 - 15:38
    If you're storing a list or a set or
  • 15:38 - 15:40
    a sorted set, there is one key that points
  • 15:40 - 15:42
    to the whole thing. So you put a TTL
  • 15:42 - 15:44
    on that, and what that says is, I want
  • 15:44 - 15:46
    this value to just go away after this amount
  • 15:46 - 15:49
    of time. That can be pretty handy. So when
  • 15:49 - 15:52
    I mentioned that three day window earlier,
    the TTL
  • 15:52 - 15:55
    is very handy there.
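Assuming that same local server, the three-day window maps straight onto a TTL (the key name here is illustrative):

```sh
SET    trend:42 10       # some trendingness score for post 42
EXPIRE trend:42 259200   # 3 days = 3 * 24 * 60 * 60 = 259200 seconds
TTL    trend:42          # seconds remaining; returns -2 once the key is gone
```

`SETEX trend:42 259200 10` does the set and the expiry in one command.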
  • 15:55 - 15:58
    That's just a little too big font, font-wise.
    So
  • 15:58 - 16:02
    the AR::Observers were pushing events out
    to Resque. And
  • 16:02 - 16:04
    the event would look something like this.
    It'd be
  • 16:04 - 16:07
    pushing JSON up. So the event would have the
  • 16:07 - 16:12
    type, the noun, essentially, the action - I
    think
  • 16:12 - 16:16
    we were only concerned with creates, and occasionally
    deletes.
  • 16:16 - 16:19
    But we didn't really care about updates.
  • 16:19 - 16:21
    I offered to add that. It wasn't, this was
  • 16:21 - 16:23
    a 1.0 release, it just wasn't something
    that mattered
  • 16:23 - 16:24
    that much at the time.
  • 16:24 - 16:26
    Then we would have the ID of whatever the
  • 16:26 - 16:28
    thing was, the user ID, because that very
    much
  • 16:28 - 16:31
    matters here since we're talking about the
    user's interest
  • 16:31 - 16:34
    in things. And then the names of the tags
  • 16:34 - 16:35
    associated.
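Put together, a hypothetical serializer along those lines might look like this. The field names are guesses at the shape he describes, not the client's actual schema:

```ruby
require 'json'

# Flatten a model event into the JSON payload pushed onto Resque:
# type (the noun), the action, the record's ID, the user ID, and
# the names of the associated tags.
def serialize_event(type:, action:, id:, user_id:, tags:)
  JSON.generate(type: type, action: action, id: id, user_id: user_id, tags: tags)
end

payload = serialize_event(type: "post", action: "create",
                          id: 42, user_id: 2, tags: ["DC United"])
```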
  • 16:35 - 16:38
    But, we have all this stuff queued up, but
  • 16:38 - 16:40
    one does not simply share the load. We have
  • 16:40 - 16:44
    to define our workers. So the worker that
    I,
  • 16:44 - 16:47
    I created, I called a calculator because I
    figured
  • 16:47 - 16:52
    we're calculating a score, and the calculator
    originally was
  • 16:52 - 16:55
    just one giant class. And it was awful.
  • 16:55 - 16:58
    So TDD very quickly showed me how bad
  • 16:58 - 17:00
    of an idea this was, as my tests grew
  • 17:00 - 17:02
    to be more and more hard. So then I
  • 17:02 - 17:05
    started to break it out into three different
    kinds
  • 17:05 - 17:10
    of calculators that formed a sort of workflow.
    And
  • 17:10 - 17:13
    also I, I learned through more TDD suffering
    that
  • 17:13 - 17:16
    I shouldn't even have my calculate, individual
    calculators think
  • 17:16 - 17:20
    about persistence, because then that made
    their already busy
  • 17:20 - 17:22
    life of trying to compute things even busier
    by
  • 17:22 - 17:24
    trying to worry about, well, where do I put
  • 17:24 - 17:25
    this stuff when I'm done.
  • 17:25 - 17:29
    So instead I just had the outer level calculator
  • 17:29 - 17:32
    act as a sort of strategy, I guess, in
  • 17:32 - 17:35
    the object-oriented sense. And, so he was
    the Resque
  • 17:35 - 17:37
    worker, and he handled all the persistence,
    and he
  • 17:37 - 17:39
    just directed the other guys to do work. He
  • 17:39 - 17:41
    would call one guy, get his output, pass it
  • 17:41 - 17:43
    on to the other and so forth.
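That arrangement, the outer calculator as the Resque worker owning persistence and chaining the inner calculators, might look roughly like this; all class and method names here are hypothetical:

```ruby
# The outer calculator: owns persistence, delegates the math.
# Inner calculators only compute; they never touch the store.
class CalculatorPipeline
  def initialize(store, calculators)
    @store = store              # thin wrapper around Redis (or whatever comes later)
    @calculators = calculators  # e.g. trendingness, then user interest
  end

  def perform(event)
    # Feed each calculator the previous one's output, then persist once.
    result = @calculators.reduce(event) { |input, calc| calc.call(input) }
    @store.save(result)
    result
  end
end
```

Because only the pipeline knows about the store, each inner calculator stays small and easy to test, which is the lesson the TDD suffering taught him here.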
  • 17:43 - 17:45
    So persistence was handled by Redis, but I
    created
  • 17:45 - 17:50
    a very simple abstraction around it, just
    a class,
  • 17:50 - 17:54
    so that way the customer could decide later,
    oh
  • 17:54 - 17:56
    well, storing everything in memory is kind
    of sucking,
  • 17:56 - 17:58
    so Redis is costing us hundreds of dollars
    now.
  • 17:58 - 18:02
    A month, or more. Because Redis again is all
  • 18:02 - 18:03
    memory, and memory gets a lot more expensive
    when
  • 18:03 - 18:05
    you start getting bigger and bigger and bigger
    chunks
  • 18:05 - 18:08
    of RAM. So I thought at some point they
  • 18:08 - 18:12
    might want something like, dare I say, MongoDB
    -
  • 18:12 - 18:16
    not a big fan, but. Something like that, maybe.
  • 18:16 - 18:18
    So I put that there. It wasn't something I
  • 18:18 - 18:20
    really had to worry about too much while I
  • 18:20 - 18:22
    was working with them again for 1.0 version,
    but
  • 18:22 - 18:24
    it seemed like an easy win.
  • 18:24 - 18:29
    So getting into the individual calculators.
    The trendingness calculator,
  • 18:29 - 18:32
    just like the, the, my discussion about popularity
    earlier,
  • 18:32 - 18:34
    this guy was really straightforward. You like
    something. That
  • 18:34 - 18:37
    bumps up the score on a post. You comment
  • 18:37 - 18:38
    on something, that bumps up the score on a
  • 18:38 - 18:40
    post. Really dumb.
  • 18:40 - 18:42
    And then it outputs, so it would get the
  • 18:42 - 18:44
    event, it would output a new score for that
  • 18:44 - 18:46
    individual post.
  • 18:46 - 18:48
    The way that data was stored was just as
  • 18:48 - 18:51
    a simple key-value pair in Redis. So you would
  • 18:51 - 18:55
    have, and this was actually a, I guess as
  • 18:55 - 18:57
    a brief aside, this was a little uncommon
    for
  • 18:57 - 18:58
    me. I was trying to find lots of ways
  • 18:58 - 19:00
    to use Redis data structures, and for whatever
    reason
  • 19:00 - 19:02
    this made more sense to me as a key-value
  • 19:02 - 19:03
    pair.
  • 19:03 - 19:05
    As it turned out, it probably would have been
  • 19:05 - 19:08
    better as something else. But that's in the
    lessons
  • 19:08 - 19:09
    learned section.
  • 19:09 - 19:12
    So I would munge the keys so I could,
  • 19:12 - 19:14
    you know, name space the values I was storing.
  • 19:14 - 19:16
    Because if I just had the post ID in
  • 19:16 - 19:17
    Redis, then I would have a key of forty-two
  • 19:17 - 19:21
    and, well, if that's the, if that's the post
  • 19:21 - 19:22
    ID, if I wanted to store anything else for
  • 19:22 - 19:24
    that key, well, I would overwrite whatever
    was there
  • 19:24 - 19:26
    and that would suck.
  • 19:26 - 19:28
    So I would put something up front, like say,
  • 19:28 - 19:31
    trend for trendingness. That's pretty common
    in key-value stores
  • 19:31 - 19:35
    to have long munged names sometimes just for
    namespacing
  • 19:35 - 19:38
    purposes.
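A namespaced key is just string munging, along these lines. The `trend` prefix is from the talk; the colon separator is the usual Redis convention, not necessarily what the client code used:

```ruby
# Build a namespaced key so a trendingness score for post 42 can't
# collide with anything else stored about post 42.
def trend_key(post_id)
  "trend:#{post_id}"
end
```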
  • 19:38 - 19:41
    So let's see, right. In the key-value, the
    trendingness
  • 19:41 - 19:43
    scores had the three day TTL that I talked
  • 19:43 - 19:46
    about earlier. The one part that I regretted
    here
  • 19:46 - 19:49
    was that these values were sorted in Ruby
    at
  • 19:49 - 19:52
    run-time, when trendingness was requested.
    Now remember we're only
  • 19:52 - 19:54
    talking about three days worth of posts.
  • 19:54 - 19:56
    And this was for a fairly new social network.
  • 19:56 - 19:58
    So, again, going back to the remark I made
  • 19:58 - 20:00
    about gaming, you know, oh to have such problems,
  • 20:00 - 20:04
    where sorting in Ruby would be that painful.
    But
  • 20:04 - 20:07
    I would far rather sort in, say, something like
  • 20:07 - 20:09
    C or Java, which is like 100 times faster,
  • 20:09 - 20:12
    so that sorting wouldn't get painful as soon,
  • 20:12 - 20:14
    but alas.
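What that run-time sort amounted to, sketched with a plain Hash standing in for the trend:* keys and their scores (illustrative only, not the client code):

```ruby
# Pull every post's trendingness score, then sort in Ruby, highest
# first, every time trending posts are requested.
scores = { "trend:1" => 7.0, "trend:2" => 3.0, "trend:3" => 9.0 }
trending = scores.sort_by { |_key, score| -score }.map(&:first)
```

A Redis sorted set would have avoided this entirely: ZINCRBY bumps a post's score as events arrive, and ZREVRANGE reads the top posts back already ordered, with the sort kept server-side.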
  • 20:14 - 20:16
    So the user interest calculator. This is where
    we
  • 20:16 - 20:18
    start getting into that, that relevance business.
    Deciding which
  • 20:18 - 20:22
    users care about what. So it would get the
  • 20:22 - 20:26
    event, but it's important to mention that
    on
  • 20:26 - 20:28
    a, for a given event there might be multiple
  • 20:28 - 20:31
    users that might care about the event. And
    the
  • 20:31 - 20:33
    reason for that is because, you have the person
  • 20:33 - 20:35
    who posted the original post, but then you
    have
  • 20:35 - 20:37
    all the commenters, you have all the likers.
  • 20:37 - 20:39
    So you have to aggregate all of those people
  • 20:39 - 20:43
    together because if anything else happens
    in this event,
  • 20:43 - 20:48
    these people have expressed some degree of
    interest in
  • 20:48 - 20:50
    the tags that are involved. I don't think
    I
  • 20:50 - 20:52
    have a slide for this - so I wish
  • 20:52 - 20:54
    I had, I'll take a brief aside to mention
  • 20:54 - 20:57
    that every single one of those verbs had a
  • 20:57 - 20:59
    weighting factor associated with it.
  • 20:59 - 21:01
    So I'm just computing scalars here. I did have
  • 21:01 - 21:05
    an AI class twenty years ago, back in college.
  • 21:05 - 21:08
    So I learned a little, little bit.
  • 21:08 - 21:11
    So each one of those events would have some
  • 21:11 - 21:13
    kind of weighting associated with it, so when
    we
  • 21:13 - 21:14
    had a scalar value we would know it was
  • 21:14 - 21:17
    based primarily on this, and a little bit
    of
  • 21:17 - 21:18
    that, and a little bit less of this and
  • 21:18 - 21:20
    a little bit less of that, like, when you
  • 21:20 - 21:22
    favorite something, that's a big, large declaration
    to say,
  • 21:22 - 21:26
    I love this! When you comment on something
    -
  • 21:26 - 21:29
    well, maybe I kind of like it, and if
  • 21:29 - 21:32
    you are tagged on a comment belonging to a
  • 21:32 - 21:35
    post - eh, OK, that's a pretty weak attachment
  • 21:35 - 21:37
    but that connotes some degree of interest,
    cause you're
  • 21:37 - 21:40
    associated with someone who cares about something
    else.
  • 21:40 - 21:42
    So that's a very weak association, but it
    is
  • 21:42 - 21:43
    some form of association.
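The per-verb weighting he describes can be sketched like this. The relative ordering — favoriting beats commenting, which beats being tagged on a comment — comes from the talk; the actual numbers are invented for illustration:

```ruby
# Hypothetical weights: only the ordering is from the talk, not the values.
VERB_WEIGHTS = {
  favorite:          1.0,  # "I love this!" - the strongest declaration
  comment:           0.5,  # "maybe I kind of like it"
  tagged_on_comment: 0.1   # weak, second-hand association
}.freeze

# A user's scalar interest is just the weighted sum of their interactions.
def interest_score(verbs)
  verbs.sum { |verb| VERB_WEIGHTS.fetch(verb, 0.0) }
end
```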
  • 21:43 - 21:46
    So all of those users needed to have their
  • 21:46 - 21:50
    interests rescored. Right, and I just mentioned
    arbitrarily assigning
  • 21:50 - 21:51
    the weights for event types, so that was all
  • 21:51 - 21:53
    I had, that one bullet. So I think the
  • 21:53 - 21:56
    aside was worth it.
  • 21:56 - 21:59
    So this is how we get this structure, that
  • 21:59 - 22:01
    we have a user and we have a score
  • 22:01 - 22:05
    for each tag based on that big nasty big
  • 22:05 - 22:08
    O n squared algorithm that we defined earlier.
  • 22:08 - 22:10
    So internally in Redis this is how I would
  • 22:10 - 22:14
    store it. I would have one hash per user,
  • 22:14 - 22:19
    and the field, so that the key would be,
  • 22:19 - 22:22
    something like UI - the User ID, the user
  • 22:22 - 22:23
    interest. It's something that we would look
    up an
  • 22:23 - 22:26
    awful lot, so having a nice short key seemed
  • 22:26 - 22:29
    important. Having it munged was kind of essential
    because, again,
  • 22:29 - 22:32
    we don't want to step on User ID values
  • 22:32 - 22:35
    with something else later.
  • 22:35 - 22:39
    I think Redis calls the individual keys in
    a
  • 22:39 - 22:41
    hash fields - I don't remember if I have
  • 22:41 - 22:44
    my Redis nomenclature right. But the field
    names were
  • 22:44 - 22:47
    just the tag names, and then the values were
  • 22:47 - 22:49
    the scalar interests.
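The per-user hash he describes might look like this in code. The exact key format is a guess — the talk only says the key was short and munged around the user ID — and `redis` stands for any client responding to `hset`/`hget`, such as the redis gem's client:

```ruby
# One Redis hash per user: fields are tag names, values are scalar scores.
# "ui:<user_id>" is an assumed key format, not confirmed by the talk.
def interest_key(user_id)
  "ui:#{user_id}"  # short, munged key, looked up an awful lot
end

def record_interest(redis, user_id, tag, score)
  redis.hset(interest_key(user_id), tag, score)
end

def interest_in(redis, user_id, tag)
  redis.hget(interest_key(user_id), tag).to_f
end
```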
  • 22:49 - 22:50
    And intentionally I did not want to put any
  • 22:50 - 22:53
    kind of time to live on that hash, because
  • 22:53 - 22:55
    users interests are one thing we know are
    gonna
  • 22:55 - 22:57
    live on and on and on and on.
  • 22:57 - 22:59
    Downside is the user interests, the users'
    interests are
  • 22:59 - 23:01
    something that will live on and on and on
  • 23:01 - 23:02
    and on. So you know that they're just gonna
  • 23:02 - 23:05
    take up more and more space. Tags are not
  • 23:05 - 23:08
    something that, that leave the system very
    often, because
  • 23:08 - 23:11
    players tend to play for awhile, and even
    if
  • 23:11 - 23:13
    they retire they might get mentioned again
    in the
  • 23:13 - 23:17
    social network, so I don't know that the players
  • 23:17 - 23:19
    are gonna leave the system often. Teams likewise.
  • 23:19 - 23:21
    So it just made sense to just basically leave
  • 23:21 - 23:23
    these datastructures alone and let them grow.
    That said,
  • 23:23 - 23:26
    having those in Redis, neh, it bugged me a
  • 23:26 - 23:30
    little. But, again, for a 1.0 system, it wasn't
    a
  • 23:30 - 23:31
    big concern.
  • 23:31 - 23:36
    So post score calculator is, I think I got
  • 23:36 - 23:37
    mixed up there. Post score calculator is where
    the
  • 23:37 - 23:40
    big, big-O n-squared nastiness came in. So
  • 23:40 - 23:44
    now we've rescored all these users' interests.
    We need
  • 23:44 - 23:48
    to go propagate this throughout the system.
  • 23:48 - 23:49
    And so again we have, back to the, excuse
  • 23:49 - 23:54
    me I'm so sorry - but I discovered after
  • 23:54 - 23:57
    the fact a name for this pattern that I
  • 23:57 - 24:01
    came upon. It's called inverted indices. An
    inverted index,
  • 24:01 - 24:03
    that's a link by the way - these will
  • 24:03 - 24:04
    go up on GitHub at some point. This is
  • 24:04 - 24:06
    all HTML. You'll be able to click through.
    You
  • 24:06 - 24:08
    don't have to take any notes, if by chance
  • 24:08 - 24:09
    you are.
  • 24:09 - 24:13
    An inverted index is basically just an index
    of
  • 24:13 - 24:16
    the content to where the content is stored.
    So
  • 24:16 - 24:20
    I had a few different sets I would, for,
  • 24:20 - 24:21
    let's see, a post, I would have the set
  • 24:21 - 24:24
    of all the tags, and let's see, and I
  • 24:24 - 24:26
    actually have to read this cause I don't remember
  • 24:26 - 24:27
    off the top of my head.
  • 24:27 - 24:29
    And then I had a set of right, the
  • 24:29 - 24:32
    interested user IDs by tag. And that would
    save
  • 24:32 - 24:33
    me from having to go out to the database
  • 24:33 - 24:34
    all the time to perform a whole bunch of
  • 24:34 - 24:36
    expensive queries. I could just go to Redis
    and
  • 24:36 - 24:40
    say hey just give me and boom, there I
  • 24:40 - 24:41
    go.
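The inverted index he describes — a Redis set per tag holding the interested user IDs, so fan-out never needs a relational query — might be sketched like this. The key format is an assumption, and `redis` is any client with `sadd`/`smembers`:

```ruby
# One Redis set per tag: members are the IDs of users interested in it.
# "tag_users:<tag>" is an assumed key format.
def interested_users_key(tag)
  "tag_users:#{tag}"
end

def mark_interested(redis, tag, user_id)
  redis.sadd(interested_users_key(tag), user_id)
end

# "Hey, just give me the set" - one Redis call instead of a DB query.
def interested_user_ids(redis, tag)
  redis.smembers(interested_users_key(tag))
end
```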
  • 24:41 - 24:43
    And the user post scores were also stored
    in
  • 24:43 - 24:46
    Redis as a hash, very much like the user
  • 24:46 - 24:48
    interest scores. It's just instead of having
    the tag
  • 24:48 - 24:51
    you had a post.
  • 24:51 - 24:54
    So this structure, I, I showed you earlier,
    it's
  • 24:54 - 24:56
    a workflow, but really what it also could
    be
  • 24:56 - 24:58
    is a series of queues. It's not a big
  • 24:58 - 25:01
    truck. You don't just dump stuff on it. Thank
  • 25:01 - 25:05
    you - I'm so glad somebody appreciated that
    one.
  • 25:05 - 25:11
    So some other design considerations that came
    up as
  • 25:11 - 25:15
    we went along. So I'm, I've alluded a few
  • 25:15 - 25:18
    times. I was trying to aggressively optimize
    the RDBMS
  • 25:18 - 25:21
    out of the equation. The client very much
    did
  • 25:21 - 25:23
    not want the recommendation engine to make
    the Rails
  • 25:23 - 25:25
    app run slower, because, well, they're on,
    they, let's
  • 25:25 - 25:28
    see, they're on Heroku and, you know, ?? [00:25:27]
  • 25:28 - 25:29
    cost money.
  • 25:29 - 25:34
    And we already talked about using inverted
    indices to
  • 25:34 - 25:37
    some effect, again reducing - further reducing
    the need
  • 25:37 - 25:41
    for database queries. And I already talked
    about those
  • 25:41 - 25:42
    examples.
  • 25:42 - 25:46
    Now the other thing I, that I, I've mentioned,
  • 25:46 - 25:48
    that I broke the calculator down into a trendingness
  • 25:48 - 25:51
    calculator and a user interest score calculator
    and post
  • 25:51 - 25:55
    score calculator, and no one's made any kind
    of
  • 25:55 - 25:58
    rude gestures, but those names really suck.
    I'm sorry.
  • 25:58 - 26:00
    Post score calculator - just, it's German.
    Take a
  • 26:00 - 26:04
    whole bunch of words and mush them
    together.
  • 26:04 - 26:06
    Is it good enough? Well, it ran in production,
  • 26:06 - 26:09
    and the customer was happy. So yay.
  • 26:09 - 26:14
    Was I ashamed of a lot of the code
  • 26:14 - 26:18
    I wrote? Oh god yes. Would it scale? Well,
  • 26:18 - 26:22
    I was limited by Redis, so memory, RAM. And
  • 26:22 - 26:25
    I knew I'd be limited by CPU. But what
  • 26:25 - 26:27
    they were concerned with
    was
  • 26:27 - 26:29
    getting a 1.0 release out the door, something
    that
  • 26:29 - 26:31
    people could use right away, and if they
    were
  • 26:31 - 26:33
    successful, well, worst case they would rewrite
    it. But
  • 26:33 - 26:37
    there was potential to refactor and scale
    it further.
  • 26:37 - 26:40
    One of the little interesting things that
    happened along
  • 26:40 - 26:43
    the way is, because a post, the post is
  • 26:43 - 26:48
    polymorphically taggable, you could just throw
    anything on it.
  • 26:48 - 26:52
    So the engine originally didn't just care
    about teams
  • 26:52 - 26:53
    and players - it just took any old tag
  • 26:53 - 26:56
    you gave it. And the, the client later said
  • 26:56 - 26:57
    yeah, I really only want the other teams and
  • 26:57 - 27:01
    players, but the interesting side effect was,
    well users
  • 27:01 - 27:02
    would get thrown in there as tags too.
  • 27:02 - 27:04
    So I thought, you know, maybe a side business
  • 27:04 - 27:07
    is a sports dating site or something, because
    all
  • 27:07 - 27:08
    of a sudden it would say, hey, this is
  • 27:08 - 27:10
    how interested I am in this person versus
    that
  • 27:10 - 27:10
    person.
  • 27:10 - 27:14
    But, no, I took that out and hard-coded it
  • 27:14 - 27:16
    to just teams and players.
  • 27:16 - 27:20
    So, so lessons learned along the way.
  • 27:20 - 27:22
    Statistical methods - obviously would have
    been nice, because
  • 27:22 - 27:24
    any time I had to write a big O
  • 27:24 - 27:25
    N squared algorithm and I know N's gonna keep
  • 27:25 - 27:28
    getting bigger, I get really anxious. I did
    not
  • 27:28 - 27:30
    like writing this.
  • 27:30 - 27:33
    The lesson learned for me, if I were still
  • 27:33 - 27:34
    freelancing, but even though I'm not I work
    at
  • 27:34 - 27:37
    Rackspace, is when I know something is right,
    I
  • 27:37 - 27:38
    need to argue for it, just a little bit
  • 27:38 - 27:41
    more.
  • 27:41 - 27:45
    Prefer straight key-value over hashes. I've
    mentioned TTLs and
  • 27:45 - 27:48
    I think I mentioned TTLs and I mentioned TTLs.
  • 27:48 - 27:51
    You can't put a TTL on a field on
  • 27:51 - 27:54
    hash. So you can't say I want this for
  • 27:54 - 27:57
    this user, this post score to expire some
    time
  • 27:57 - 28:00
    in the future. No, you're stuck with that
    guy.
  • 28:00 - 28:02
    So there is a way around that. That's, I
  • 28:02 - 28:04
    think, the next slide. But that's more work.
  • 28:04 - 28:06
    What I could have done instead is just had
  • 28:06 - 28:10
    longer munged names. Say, like, user ID, user
    ID
  • 28:10 - 28:13
    blah blah blah, post ID blah blah blah, and
  • 28:13 - 28:15
    then just had a value and put a TTL
  • 28:15 - 28:17
    on that and then it would just disappear in
  • 28:17 - 28:19
    three days and life would have been better.
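That alternative — skip the hash and give each score its own munged key with a TTL, via SETEX — might look like this. The key format here is a guess:

```ruby
# Hypothetical sketch: each (user, post) score gets its own key with a
# three-day TTL, so Redis deletes it on its own - no pruning needed.
SCORE_TTL = 3 * 24 * 60 * 60  # three days, in seconds

def store_post_score(redis, user_id, post_id, score)
  # setex writes the value and the expiry in one command.
  redis.setex("ups:#{user_id}:#{post_id}", SCORE_TTL, score)
end
```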
  • 28:19 - 28:23
    Extracting small - OK, so one more slide. Extracting
  • 28:23 - 28:25
    smaller workers - when I said this is a
  • 28:25 - 28:27
    series of queues, what I was getting at was
  • 28:27 - 28:31
    I designed this system expecting that post
    score calculator
  • 28:31 - 28:34
    would inevitably get more CPU than the other
    guys.
  • 28:34 - 28:38
    So they're all written as workers, but
    but
  • 28:38 - 28:40
    only the outside calculator is a Resque worker.
  • 28:40 - 28:42
    It would have been fairly trivial to extract
    the
  • 28:42 - 28:44
    other three, give
    them
  • 28:44 - 28:46
    each their own queue. The only thing I would
  • 28:46 - 28:48
    have had to do is
  • 28:48 - 28:50
    add persistence capability to them, and that
    and that
  • 28:50 - 28:53
    could have been something dependency injectable,
    for example.
  • 28:53 - 28:56
    And that would have been kind of simple.
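The extraction he describes — each calculator as its own Resque worker with a dedicated queue — might look like this. The class and queue names are invented; only the `@queue` / `self.perform` shape is standard Resque worker convention:

```ruby
# Hypothetical worker: the real scoring logic is elided, this just
# marks the boundary where per-user post rescoring would happen.
class PostScoreWorker
  @queue = :post_scores  # its own queue, so it can get more workers

  def self.perform(post_id, interested_user_ids)
    interested_user_ids.map { |user_id| [user_id, post_id] }
  end
end

# Enqueued elsewhere with: Resque.enqueue(PostScoreWorker, post_id, ids)
```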
  • 28:56 - 28:57
    The other thing that would have been nice
    is
  • 28:57 - 29:00
    each one of those little guys basically runs
    on
  • 29:00 - 29:02
    a case statement, and that's the big giant
    oh,
  • 29:02 - 29:05
    oh, scream of please extract me, please extract
    me
  • 29:05 - 29:08
    and, well. Oh well.
  • 29:08 - 29:12
    Less chattiness with Redis. So I just was
    making
  • 29:12 - 29:14
    individual calls to Redis. If I needed to get
  • 29:14 - 29:16
    a set, I would just make a single call.
  • 29:16 - 29:18
    If I needed to do, push a key value
  • 29:18 - 29:20
    pair, again just a single call. It was enough.
  • 29:20 - 29:25
    It worked well enough. Individual calls on
    AWS from
  • 29:25 - 29:26
    Heroku to Redis - I did actually bench it.
  • 29:26 - 29:28
    It was something like 2 milliseconds.
  • 29:28 - 29:30
    They add up, but if you're, only if you're
  • 29:30 - 29:32
    making a ton of them. This was still way
  • 29:32 - 29:35
    faster than the Rails app, so it, it really
  • 29:35 - 29:37
    wasn't a big concern. But something to be
    aware
  • 29:37 - 29:37
    of.
  • 29:37 - 29:41
    Redis supports two different features -
    only one existed at
  • 29:41 - 29:44
    the time when I wrote this recommendation
    engine - that
  • 29:44 - 29:46
    would have helped here. Pipelining, which
    allows you to
  • 29:46 - 29:50
    just batch up commands. You get futures back,
    which
  • 29:50 - 29:52
    is basically saying here, this is where your
    result
  • 29:52 - 29:54
    will go later. And then all of the results
  • 29:54 - 29:57
    come back, and you just access the futures
    to
  • 29:57 - 29:59
    get the results.
  • 29:59 - 30:01
    So you send one big request with all of
  • 30:01 - 30:02
    your different commands, and then you get
    a bunch
  • 30:02 - 30:07
    of little responses back when they're ready.
    And that
  • 30:07 - 30:09
    will result in less network chattiness, which
    means your
  • 30:09 - 30:11
    app will run faster.
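A toy model of that future-based pipelining pattern — not the redis gem's actual API, just the shape of it: commands buffer up, each call hands back an empty future, and one flush (one network round trip) fills them all in. In the redis gem this is `redis.pipelined { |p| ... }`:

```ruby
class Future
  attr_accessor :value  # populated when the pipeline resolves
end

class Pipeline
  def initialize(store)
    @store  = store  # stands in for the Redis dataset
    @buffer = []     # [key, future] pairs waiting on the round trip
  end

  def get(key)
    future = Future.new
    @buffer << [key, future]
    future  # caller holds this until the batch comes back
  end

  def flush!
    # One "round trip" resolves every buffered command at once.
    @buffer.each { |key, future| future.value = @store[key] }
    @buffer.clear
  end
end
```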
  • 30:11 - 30:14
    The second one, and this one is very dangerous
  • 30:14 - 30:18
    because Redis is an evented key-value store,
    I didn't
  • 30:18 - 30:21
    mention. Which means there's one thread. You
    can script
  • 30:21 - 30:24
    inside of Redis. If you script badly inside
    of
  • 30:24 - 30:28
    Redis you might occupy that one thread for
    awhile,
  • 30:28 - 30:30
    and when you send commands to Redis, it might
  • 30:30 - 30:32
    say, sorry I'm busy right now.
  • 30:32 - 30:33
    So that would be bad. You know, like crossing
  • 30:33 - 30:36
    the streams in Ghostbusters.
  • 30:36 - 30:39
    So pruning. When I mentioned not wanting to
    use
  • 30:39 - 30:43
    hashes so much in Redis, if you are gonna
  • 30:43 - 30:45
    use something that's gonna grow and grow and
    grow
  • 30:45 - 30:47
    and nothing ever expires and you want things
    to
  • 30:47 - 30:51
    be removed eventually, one option is to put
    a,
  • 30:51 - 30:54
    a timestamp in every value that you're gonna
    put
  • 30:54 - 30:55
    in that hash.
  • 30:55 - 30:57
    So instead of just putting straight values
    into a
  • 30:57 - 31:00
    hash, just integers for example, you just
    put JSON
  • 31:00 - 31:02
    in, like we do with Resque, and that might
  • 31:02 - 31:05
    have a timestamp and then the value. And then
  • 31:05 - 31:08
    what that means is periodically, although
    I've heard that
  • 31:08 - 31:10
    the, the best practice is maybe every time
    you
  • 31:10 - 31:13
    do an insertion into a structure - also go
  • 31:13 - 31:15
    through and prune that structure, look for
    things that
  • 31:15 - 31:19
    you can remove that have outlived their usefulness.
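The pruning idea can be sketched like this: each hash value is JSON carrying its own timestamp, and every insertion (or a periodic sweep) deletes entries older than the cutoff. The field layout and names are illustrative:

```ruby
require "json"
require "time"

# Delete entries whose embedded timestamp has outlived its usefulness.
# hash_fields is the field => JSON-string map pulled from a Redis hash.
def prune!(hash_fields, max_age = 3 * 24 * 60 * 60, now = Time.now)
  hash_fields.delete_if do |_field, raw|
    entry = JSON.parse(raw)
    now - Time.parse(entry["at"]) > max_age  # expired
  end
end
```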
  • 31:19 - 31:22
    But I mentioned earlier, still, better to
    prefer a
  • 31:22 - 31:25
    key-value where you can just set a TTL than
  • 31:25 - 31:26
    to go and have to deal with pruning. Pruning
  • 31:26 - 31:28
    is more work. I told the client that was
  • 31:28 - 31:31
    something I was concerned about for later. Again, for 1.0
    for 1.0
  • 31:31 - 31:33
    they didn't care. They were like, I could
    wash
  • 31:33 - 31:34
    my hands of it and walk away, but it,
  • 31:34 - 31:37
    still, I didn't like knowing memory would
    just grow.
  • 31:37 - 31:39
    I used to code in C and Java and
  • 31:39 - 31:43
    you don't like leaking memory.
  • 31:43 - 31:46
    So one other thing I realized actually just
    today
  • 31:46 - 31:48
    was the calculator, because it was stateless,
    could have
  • 31:48 - 31:53
    benefited a lot from a functional programming
    style using
  • 31:53 - 31:56
    what's called referential transparency. And
    really that's just a
  • 31:56 - 31:58
    fancy way of saying the output from one function
  • 31:58 - 32:00
    is the input to the next function. And you
  • 32:00 - 32:04
    just accumulate state by taking that output
    from, from
  • 32:04 - 32:06
    one function, passing that as your input to
    the
  • 32:06 - 32:08
    next, along with whatever stuff you need to,
    and
  • 32:08 - 32:10
    just keep accumulating and your final output's
    what you
  • 32:10 - 32:11
    care about.
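That functional restyling might be sketched like this: each stage is a pure function whose return value feeds the next, so the pipeline is just composition (strictly, what he's describing is function composition with referential transparency). Stage names echo the calculators; the bodies and weights are placeholders, not the real algorithm:

```ruby
def score_trendingness(event)
  event.merge(trendingness: event[:interactions] * 1.0)
end

def score_user_interest(state)
  state.merge(interest: state[:trendingness] * 0.5)
end

def score_post(state)
  state.merge(post_score: state[:trendingness] + state[:interest])
end

def recommend(event)
  # Output of one function is the input to the next; state accumulates
  # only through return values.
  score_post(score_user_interest(score_trendingness(event)))
end
```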
  • 32:11 - 32:12
    That might have been pretty nice to do. It
  • 32:12 - 32:14
    might have made the code a little more readable
  • 32:14 - 32:17
    because the imperative style can be a bit
    hard
  • 32:17 - 32:20
    to follow sometimes. I know I was, as I
  • 32:20 - 32:22
    said I wasn't thrilled with the end result
    of
  • 32:22 - 32:23
    the code, and I tried really hard to make
  • 32:23 - 32:25
    it readable, but the imperative style didn't
    look too
  • 32:25 - 32:28
    good in the calculator.
  • 32:28 - 32:30
    And the final lesson learned is you do something
  • 32:30 - 32:33
    faster in Ruby.
  • 32:33 - 32:35
    So, but that, that was really just a joke.
  • 32:35 - 32:38
    Because Ruby was actually adequate to the
    task. It
  • 32:38 - 32:39
    wasn't a problem. So this is the part where
  • 32:39 - 32:41
    I get to say I've got seven minutes and
  • 32:41 - 32:45
    forty-one seconds. Are there any questions,
    heckling, or other
  • 32:45 - 32:48
    statements, remarks, something?
  • 32:49 - 32:52
    No? OK. Cool, well. Three minutes left, so
    thanks
  • 32:52 - 32:53
    very much.
Duration:
33:20
