Ruby Conf 2013 - Extreme Makeover: Rubygems Edition by André Arko

  • 0:16 - 0:20
    ANDRE ARKO: So this is Extreme Makeover: Rubygems
    Edition.
  • 0:20 - 0:24
    I'm basically gonna talk about what happened
  • 0:24 - 0:26
    to Rubygems in the last year
  • 0:26 - 0:29
    and what we're planning on doing in the near
    future.
  • 0:29 - 0:34
    I am Andre Arko. I'm indirect on all of the
    internet things.
  • 0:34 - 0:38
    I work at Cloud City Development, a mostly
    Rails,
  • 0:38 - 0:40
    but general web development shop, where we
    build apps
  • 0:40 - 0:43
    for people.
  • 0:43 - 0:46
    So let's get started.
  • 0:46 - 0:50
    Rubygems. Lots of stuff happened this year.
    It was
  • 0:50 - 0:57
    a really eventful year for RubyGems. The infrastructure
    changed
  • 0:57 - 1:00
    a lot, and - mostly in good ways. So
  • 1:00 - 1:03
    the, the first kind of, stretching the definition
    of
  • 1:03 - 1:06
    a year till last October. The first thing
    that
  • 1:06 - 1:12
    happened was Bundler kind of DDoS'd Rubygems
    dot org.
  • 1:12 - 1:16
    Sorry.
  • 1:16 - 1:19
    Arguably my fault.
  • 1:19 - 1:22
    We basically couldn't tell that it was happening
    until
  • 1:22 - 1:25
    people slowly installed Bundler 1 point 1
    and more
  • 1:25 - 1:27
    and more and more of them installed Bundler
    1
  • 1:27 - 1:30
    point 1, and eventually it was enough people
    that
  • 1:30 - 1:34
    Rubygems couldn't handle it anymore and it
    died.
  • 1:34 - 1:37
    So the dependency API that Bundler was using
    turned
  • 1:37 - 1:41
    out to be really CPU intensive compared to
    the
  • 1:41 - 1:46
    like, not very CPU intensive delivering a
    file that
  • 1:46 - 1:48
    was happening without the API.
  • 1:48 - 1:50
    So I gave a talk at Gotham Ruby this
  • 1:50 - 1:52
    year, actually, just a few months ago with
    a
  • 1:52 - 1:55
    lot of detail about that particular situation
    and what
  • 1:55 - 1:57
    happened and what we learned from it and what
  • 1:57 - 2:00
    we did about it. It's online if you guys
  • 2:00 - 2:03
    really care about that particular thing.
  • 2:03 - 2:06
    The TL;DR is that we rebuilt the API as
  • 2:06 - 2:08
    a Sinatra app. It's now hosted on Heroku,
    separate
  • 2:08 - 2:11
    from Rubygems dot org, and we throw an unbelievable
  • 2:11 - 2:14
    amount of CPU and database resources at it
    in
  • 2:14 - 2:18
    comparison to like what we had before.
  • 2:18 - 2:23
    The next relatively significant thing that
    happened was there
  • 2:23 - 2:25
    was a security breach over at Rubygems dot
    org
  • 2:25 - 2:31
    in January. Gems have YAML gemspecs,
    and
  • 2:31 - 2:37
    Rails, at the time, provided a way to use
  • 2:37 - 2:44
    YAML as an exploit against the running application.
    So someone
  • 2:44 - 2:47
    uploaded a gem to Rubygems dot org that contained
  • 2:47 - 2:51
    crafted, malicious YAML and the server executed
    it and
  • 2:51 - 2:54
    they got access to the server.
  • 2:54 - 2:56
    Like, I think they pastebinned a copy of
  • 2:56 - 3:02
    the /etc/passwd file. It was pretty bad.
  • 3:02 - 3:05
    As a result of that, potentially any gem on
  • 3:05 - 3:07
    Rubygems dot org could have been replaced
    with a
  • 3:07 - 3:09
    gem that had a Trojan in it and we
  • 3:09 - 3:11
    didn't really have a way to tell just from
  • 3:11 - 3:14
    the server logs because it could have been
    tampered
  • 3:14 - 3:14
    with.
  • 3:14 - 3:19
    So the Rubygems dot org team, mostly Evan
  • 3:19 - 3:22
    Phoenix, kind of exhaustively compared every
    gem that we
  • 3:22 - 3:25
    had to copies of those gems that had been
  • 3:25 - 3:29
    taken by mirrors of Rubygems at other times
    before
  • 3:29 - 3:33
    the exploit had happened. Happily it turned
    out that
  • 3:33 - 3:35
    all of our gems were fine and no one
  • 3:35 - 3:40
    was screwed. Yay.
  • 3:40 - 3:43
    But we didn't really have a way to trust
  • 3:43 - 3:45
    the box that Rubygems dot org had been hosted
  • 3:45 - 3:50
    on again. Not too surprisingly. So as part
    of
  • 3:50 - 3:55
    the process of rebuilding everything from
    scratch, we actually
  • 3:55 - 3:58
    rebuilt everything on a new architecture that's
    more flexible.
  • 3:58 - 4:03
    We're on EC2 now. We have redundant servers.
    We
  • 4:03 - 4:05
    have maybe, possibly, hopefully fail-over
    if some of those
  • 4:05 - 4:11
    servers stop working. It's all managed by
    Chef recipes.
  • 4:11 - 4:13
    Honestly I think it's way better than the
    set
  • 4:13 - 4:16
    up that we had before. The Chef recipes are
  • 4:16 - 4:20
    open source. Anyone can contribute fixes or
    features to
  • 4:20 - 4:22
    not - before you could contribute fixes and
    features
  • 4:22 - 4:24
    to Rubygems dot org the Rails app. Now you
  • 4:24 - 4:26
    can contribute fixes and features to the servers
    that
  • 4:26 - 4:29
    Rubygems dot org runs on as well.
  • 4:29 - 4:31
    The repo is on GitHub in the Rubygems org
  • 4:31 - 4:35
    named Rubygems dash AWS.
  • 4:35 - 4:38
    And that's actually pretty cool. Like I'm
    - as,
  • 4:38 - 4:40
    as frustrating as this was at the time, I'm
  • 4:40 - 4:41
    really happy with how things turned out and
    how
  • 4:41 - 4:44
    things are better now.
  • 4:44 - 4:48
    Another issue that plagued a lot of people
    was
  • 4:48 - 4:52
    Travis network issues connecting to Rubygems
    dot org. Like, I
  • 4:52 - 4:54
    don't know if all of you guys know what
  • 4:54 - 4:57
    Travis is. It's an automated continuous integration
    system. Lots
  • 4:57 - 5:00
    of opensource projects use it because they
    will provide
  • 5:00 - 5:03
    server-side continuous integration testing
    for free, for all opensource
  • 5:03 - 5:05
    projects.
  • 5:05 - 5:08
    Bundler uses Travis super extensively to test
    on every
  • 5:08 - 5:11
    Ruby version and every Rubygems version to
    make sure
  • 5:11 - 5:15
    it still works.
  • 5:15 - 5:19
    For a few months, it was basically a crapshoot,
  • 5:19 - 5:22
    whether you could actually install gems on
    Travis. You
  • 5:22 - 5:25
    could try, like, ten times, and sometimes
    eight of
  • 5:25 - 5:27
    those tries would work and sometimes one of
    those
  • 5:27 - 5:30
    tries would work, and it was really, really frustrating.
    And
  • 5:30 - 5:32
    basically no one knew what the problem was
    because
  • 5:32 - 5:37
    everyone said it was someone else's fault.
  • 5:37 - 5:40
    There was a, I don't know, kind of on
  • 5:40 - 5:44
    and off investigation. It turned out that
    the problem
  • 5:44 - 5:49
    was actually DNS. The Travis virtual machines
    had hard-coded
  • 5:49 - 5:51
    DNS servers that were on the opposite side
    of
  • 5:51 - 5:53
    the country from the data center where the
    Travis
  • 5:53 - 5:58
    VMs actually ran.
  • 5:58 - 6:03
    That meant that whenever Rubygems tried, or
    actually, so,
  • 6:03 - 6:07
    Rubygems hosts gems on Amazon's S3 service,
    and then
  • 6:07 - 6:10
    sends you to CloudFront, which theoretically
    gives you
  • 6:10 - 6:12
    a server that's geographically close to where
    you are.
  • 6:12 - 6:14
    The problem is, it uses your DNS servers to
  • 6:14 - 6:18
    know what geographically close to you is.
    That meant
  • 6:18 - 6:21
    that the Travis servers looked to CloudFront
    like
  • 6:21 - 6:24
    they were on the other side of the country
  • 6:24 - 6:26
    and they were getting told to use CloudFront
  • 6:26 - 6:28
    servers that were about as far apart as is
  • 6:28 - 6:30
    possible to be whilst still being inside the
    United
  • 6:30 - 6:33
    States.
  • 6:33 - 6:37
    That was not optimal. It - once we actually
  • 6:37 - 6:41
    figured out that that was the problem, Travis
    was
  • 6:41 - 6:45
    able to sort of force their DNS servers back
  • 6:45 - 6:47
    to ones that were actually inside the data
    center
  • 6:47 - 6:50
    where their VMs were hosted. And the problem
    basically
  • 6:50 - 6:53
    just went away. It's not perfect - there's
    still,
  • 6:53 - 6:56
    like, it's still like a very heavily contended
    connection
  • 6:56 - 6:59
    to Rubygems, because there's, like, so
    many jobs
  • 6:59 - 7:02
    running simultaneously. But it's way, way,
    way, way better.
  • 7:02 - 7:03
    It's now like nine or ten times out of
  • 7:03 - 7:08
    ten that it succeeds.
  • 7:08 - 7:15
    The other equally frustrating and equally
    intermittent problem that
  • 7:15 - 7:19
    happened to Rubygems this year, kind of semi-concurrently
    with
  • 7:19 - 7:22
    the Travis issues and continuing after
    that, was
  • 7:22 - 7:24
    SSL issues.
  • 7:24 - 7:26
    If you have done a lot of gem installing,
  • 7:26 - 7:29
    you have probably seen SSL errors and been,
    like,
  • 7:29 - 7:32
    I don't know why this happens. And sometimes
    they
  • 7:32 - 7:33
    just go away if you try again, which is,
  • 7:33 - 7:38
    like, the worst possible kind of bug.
  • 7:38 - 7:41
    So it turned out to actually be two different
  • 7:41 - 7:45
    bugs. There was one issue that was kind of
  • 7:45 - 7:51
    a combination of different certificate problems.
    Some, like, so,
  • 7:51 - 7:54
    some Linux machines don't ship with a new
    enough
  • 7:54 - 7:57
    certificate to verify the Rubygems dot org
    certificate, and
  • 7:57 - 8:01
    so we had to add the appropriate certificates
    to
  • 8:01 - 8:03
    Rubygems and Bundler, so that on machines
    that didn't
  • 8:03 - 8:06
    have it we can verify that Rubygems dot org
  • 8:06 - 8:09
    was the machine that we thought it was.
  • 8:09 - 8:12
    There was another certificate issue in that
    some S3
  • 8:12 - 8:16
    endpoints started using a newer SSL certificate,
    which
  • 8:16 - 8:19
    meant that what we'd done to fix that was
  • 8:19 - 8:22
    now semi-invalid, but it was kind of random
    which
  • 8:22 - 8:26
    S3 endpoint you got, so it only sometimes failed.
  • 8:26 - 8:28
    So we also updated the certificates again
    to like
  • 8:28 - 8:33
    fix that issue. And then, super frustratingly,
    at the
  • 8:33 - 8:37
    same time, there was a different SSL issue,
    where
  • 8:37 - 8:43
    if it, it turned out, eventually to be that,
  • 8:43 - 8:47
    if you were on a laggy connection, Rubygems
    dot
  • 8:47 - 8:50
    org would just stop responding to your requests
    to
  • 8:50 - 8:52
    open a connection if it took a little bit
  • 8:52 - 8:53
    too long.
  • 8:53 - 8:55
    The timeout was very short, just like a few
  • 8:55 - 8:59
    seconds. And so what you would
  • 8:59 - 9:01
    see, if the server said "nope, this took too
  • 9:01 - 9:05
    long, kill it," was an SSL error, because the
  • 9:05 - 9:10
    SSL connection had never finished setting
    up.
  • 9:10 - 9:13
    So we increased the timeout, and that also
  • 9:13 - 9:15
    has basically made that problem go away. Obviously
    it
  • 9:15 - 9:18
    can still happen if you're on an incredibly
    laggy
  • 9:18 - 9:20
    connection, but we set it to something more
    reasonable
  • 9:20 - 9:23
    that means almost everyone succeeds almost
    all the time
  • 9:23 - 9:30
    now. Which is great. Like it's so much better.
  • 9:30 - 9:34
    So that's kind of, like, a review of the
  • 9:34 - 9:38
    significant things that happened. Now I'd
    like to talk
  • 9:38 - 9:42
    about how Rubygems works and how I am working
  • 9:42 - 9:45
    on changing it to work differently. So, right.
    How
  • 9:45 - 9:47
    it works today.
  • 9:47 - 9:50
    Today, both Bundler and Rubygems download
    gem information from
  • 9:50 - 9:55
    Rubygems dot org. There's basically two ways
    to get
  • 9:55 - 9:58
    information about gems. You can either ask
    Rubygems dot
  • 9:58 - 10:00
    org for the list of all the gems that
  • 10:00 - 10:04
    exist, or you can ask the Bundler API for
  • 10:04 - 10:10
    just a, like, named list of gems.
  • 10:10 - 10:13
    Honestly neither one of these is that great.
    But
  • 10:13 - 10:17
    they both work. So we keep using them. When
  • 10:17 - 10:20
    you run gem install Rubygems downloads the
    list of
  • 10:20 - 10:23
    all of the gems, and then looks for the
  • 10:23 - 10:25
    newest version of whatever gem you asked for
    to
  • 10:25 - 10:28
    find out what it is. When you call bundle
  • 10:28 - 10:33
    install, it will first try to use the API
  • 10:33 - 10:34
    and only ask about the gems that are in
  • 10:34 - 10:37
    your Gemfile, and then the gems that those
  • 10:37 - 10:39
    gems need and the gems that those gems need,
  • 10:39 - 10:40
    and then the gems that those-
  • 10:40 - 10:46
    So it can mean a lot of requests. Both
  • 10:46 - 10:50
    of those options, like, are pretty memory
    intensive, because
  • 10:50 - 10:52
    you end up with, like, if you download the
  • 10:52 - 10:54
    whole list, you end up with a list of
  • 10:54 - 10:56
    every gem that exists, even if you didn't
    actually
  • 10:56 - 10:59
    care about any of those gems - just the
  • 10:59 - 11:01
    one.
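
The fan-out of API requests described here can be sketched roughly as follows. This is a toy model, not Bundler's actual resolver: `FAKE_INDEX`, `fetch_dependencies`, and `resolve_names` are made-up names, the dependency data is invented, and one call to `fetch_dependencies` stands in for one HTTP round trip.

```ruby
require "set"

# Invented dependency data standing in for what a dependency API might return.
FAKE_INDEX = {
  "rails"         => ["actionpack", "activerecord"],
  "actionpack"    => ["rack", "activesupport"],
  "activerecord"  => ["activesupport"],
  "activesupport" => ["i18n"],
  "rack"          => [],
  "i18n"          => [],
}.freeze

# Hypothetical stand-in for one round trip: a batch of names in, their dependencies out.
def fetch_dependencies(names)
  names.map { |name| [name, FAKE_INDEX.fetch(name, [])] }.to_h
end

# Start from the gems named in the Gemfile and keep asking about newly
# discovered dependencies until nothing new turns up.
def resolve_names(gemfile_gems)
  seen = Set.new
  pending = gemfile_gems.dup
  round_trips = 0
  until pending.empty?
    round_trips += 1
    response = fetch_dependencies(pending)
    seen.merge(pending)
    pending = response.values.flatten.uniq.reject { |name| seen.include?(name) }
  end
  [seen.to_a.sort, round_trips]
end

names, round_trips = resolve_names(["rails"])
puts "#{names.size} gems discovered in #{round_trips} round trips"
```

Even this tiny graph needs several sequential round trips, and each one pays the full latency to the server, which is why distance matters so much.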
  • 11:01 - 11:06
    So with a fast connection it's not that bad.
  • 11:06 - 11:08
    You can download the whole list of every gem
  • 11:08 - 11:10
    pretty quickly. You can make lots of requests
    to
  • 11:10 - 11:15
    the Bundler API pretty quickly. And either
    way it's
  • 11:15 - 11:20
    tolerable. It's not great, but everyone's
    pretty much OK
  • 11:20 - 11:22
    with it.
  • 11:22 - 11:25
    The problem is, Rubygems dot org, and the
    Bundler
  • 11:25 - 11:30
    API both live in AWS US East zone, which
  • 11:30 - 11:36
    is in Virginia. That means that if you're
    not
  • 11:36 - 11:38
    in the United States, you don't have a fast
  • 11:38 - 11:41
    connection. The end.
  • 11:41 - 11:45
    If you're, like, in Europe or Asia or Australia,
  • 11:45 - 11:51
    god forbid, it's gonna take a really long
    time.
  • 11:51 - 11:53
    It, it's not as bad if you're just downloading
  • 11:53 - 11:57
    the whole list all at once, but then you
  • 11:57 - 12:00
    have the, like, this is it, it's like multi-megabyte
  • 12:00 - 12:04
    file, and then you have to unmarshal- It's,
    it's
  • 12:04 - 12:08
    a big file with arrays of gems and then
  • 12:08 - 12:13
    it's marshalled into the Ruby marshal binary
    format.
  • 12:13 - 12:15
    And so you have to unmarshal the entire list
  • 12:15 - 12:16
    and then look through that list for the gem
  • 12:16 - 12:18
    that you actually cared about, and that can
    use
  • 12:18 - 12:20
    up way more memory than it needs to, and
  • 12:20 - 12:22
    way more bandwidth than it needs to. And if
  • 12:22 - 12:26
    you're using Bundler, you've probably just
    made fifty round
  • 12:26 - 12:28
    trip requests to Virginia, and if you're in
    Australia,
  • 12:28 - 12:35
    that took forever. So definitely could be
    better. This
  • 12:36 - 12:39
    is not the fastest situation.
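
As a rough illustration of why the full-index approach is wasteful, here is a minimal sketch. It assumes a simplified index made of `[name, version, platform]` string triples; the real index layout may differ, and `SAMPLE_SPECS` and `find_versions` are hypothetical names.

```ruby
# Simplified, invented index: an array of [name, version, platform] entries.
SAMPLE_SPECS = [
  ["rack", "1.5.2", "ruby"],
  ["rails", "4.0.0", "ruby"],
  ["rails", "4.0.1", "ruby"],
  ["sinatra", "1.4.4", "ruby"],
].freeze

# Marshal.load has to rebuild every entry in memory before we can look for
# the single gem we care about. (Only ever call Marshal.load on trusted data.)
def find_versions(marshalled_index, wanted_name)
  all_specs = Marshal.load(marshalled_index)
  all_specs.select { |name, _version, _platform| name == wanted_name }
           .map { |_name, version, _platform| version }
end

index_blob = Marshal.dump(SAMPLE_SPECS)
puts "index is #{index_blob.bytesize} bytes"
puts find_versions(index_blob, "rails").inspect
```

With four entries the cost is invisible; with every gem ever published, the download and the deserialized array both grow regardless of how few gems you actually asked about.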
  • 12:39 - 12:44
    So basically after setting up the new Bundler
    API
  • 12:44 - 12:49
    system earlier this year, it took us probably
    a
  • 12:49 - 12:53
    month to get the replacement up after everything
    went
  • 12:53 - 12:56
    down in October. And then kind of after that,
  • 12:56 - 13:00
    I spent, I don't know, several, several conferences
    worth
  • 13:00 - 13:03
    of time talking with the Rubygems dot org
    team
  • 13:03 - 13:06
    members, the Rubygems team members and the
    Bundler team
  • 13:06 - 13:08
    members. And we kind of, like, all pooled
    our
  • 13:08 - 13:12
    ideas for how to make this less bad. And
  • 13:12 - 13:15
    I kind of aggregated all of them together
    and
  • 13:15 - 13:19
    sanity checked the overall ideas with everyone,
    and we
  • 13:19 - 13:21
    have a plan.
  • 13:21 - 13:24
    It's relatively straightforward, but it's
    a pretty big departure
  • 13:24 - 13:28
    from how we've been doing things up until
    now.
  • 13:28 - 13:31
    So instead of using marshalled arrays, we just
  • 13:31 - 13:33
    have a plain text file that lists the gem
  • 13:33 - 13:36
    names and the versions of the gems. You can
  • 13:36 - 13:39
    parse plain text files with, like, split - you
  • 13:39 - 13:42
    don't have the dangers of Marshal or YAML or
  • 13:42 - 13:46
    have to worry about the file changing from
    the
  • 13:46 - 13:48
    beginning to end just because you added a
    single
  • 13:48 - 13:51
    thing to the end of the list.
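
A minimal sketch of the "parse it with split" idea follows. The line layout used here (`name version1,version2`) is an assumption made for illustration, not the format the team settled on, and `parse_index` is a hypothetical helper.

```ruby
# Assumed layout: one gem per line, name first, then comma-separated versions.
INDEX_TEXT = <<~TEXT
  rack 1.5.1,1.5.2
  rails 4.0.0,4.0.1
TEXT

# Plain String#split is all the parsing that is needed; nothing is
# deserialized into arbitrary objects, unlike Marshal or YAML.
def parse_index(text)
  text.each_line.each_with_object({}) do |line, index|
    name, versions = line.strip.split(" ", 2)
    next if name.nil? # skip blank lines
    index[name] = versions.to_s.split(",")
  end
end

puts parse_index(INDEX_TEXT).inspect

# Appending a new gem leaves every earlier byte of the file unchanged.
puts parse_index(INDEX_TEXT + "sinatra 1.4.4\n").keys.inspect
```

Because new entries only ever land at the end, anything a client already has on disk stays valid after a push.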
  • 13:51 - 13:54
    Those are all benefits. It's really easy to
    cache
  • 13:54 - 13:58
    plain text files. It's really easy to, you
    know,
  • 13:58 - 14:00
    like, copy them around and look at them and
  • 14:00 - 14:05
    it's much nicer in general, I think. Happily,
    we
  • 14:05 - 14:07
    figured out a way to use plain text files
  • 14:07 - 14:12
    that is very, very, like, within 5% as fast
  • 14:12 - 14:18
    as the current marshal format. So, pretty
    good.
  • 14:18 - 14:22
    So once we have that plain text format, we
  • 14:22 - 14:25
    can make some improvements. We can cache those
    files
  • 14:25 - 14:28
    on the client because it's broken down into
    individual
  • 14:28 - 14:32
    pieces, you know, like, each, each gem has
    a
  • 14:32 - 14:34
    file that lists all of the versions of that
  • 14:34 - 14:37
    gem and all of the gems that those versions
  • 14:37 - 14:40
    depend on. And then there's like a master
    list
  • 14:40 - 14:43
    that tells which gems, like, tells you about
    all
  • 14:43 - 14:46
    of the gems that exist.
  • 14:46 - 14:50
    So because those files are separate and small,
    we
  • 14:50 - 14:53
    can say, hey, I'll just keep these here on
  • 14:53 - 14:55
    my computer and I won't need to redownload
    them
  • 14:55 - 14:58
    every time, because right now both Rubygems
    and Bundler
  • 14:58 - 15:01
    redownload the entire list of gems from scratch
    because
  • 15:01 - 15:04
    it might have updated. And we had no good
  • 15:04 - 15:08
    way to incrementally add to that list with
    the
  • 15:08 - 15:13
    format that we already had.
  • 15:13 - 15:16
    So that
    obviously
  • 15:16 - 15:18
    reduces the size of the data that gets transferred.
  • 15:18 - 15:20
    But that also reduces, by a lot, the number
  • 15:20 - 15:24
    of requests that have to be made, because
    you
  • 15:24 - 15:27
    can do things like check to see if new
  • 15:27 - 15:29
    gems have been pushed since the last time
    you
  • 15:29 - 15:30
    ran, and if you get the response that's like
  • 15:30 - 15:32
    nope, no gems pushed, then you don't even
    have
  • 15:32 - 15:34
    to check to see if any of the individual
  • 15:34 - 15:36
    gems were updated. You just know that they're
    all
  • 15:36 - 15:38
    up to date.
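
The "has anything changed since last time?" check can be sketched with a content fingerprint, in the spirit of an HTTP ETag. Everything here is a stand-in: `FakeIndexServer` and `CachingClient` are invented classes that run in memory, and the fingerprint scheme is an assumption rather than what Rubygems dot org actually does.

```ruby
require "digest"

# In-memory stand-in for the server that holds the index file.
class FakeIndexServer
  attr_reader :bytes_sent

  def initialize(body)
    @body = body
    @bytes_sent = 0
  end

  # Simulates someone pushing a new gem: a line is appended to the index.
  def push(line)
    @body += line
  end

  # Returns [:not_modified, etag, nil] when the caller's copy is current,
  # otherwise [:ok, etag, body] and counts the bytes as transferred.
  def get(known_etag = nil)
    etag = Digest::MD5.hexdigest(@body)
    return [:not_modified, etag, nil] if etag == known_etag
    @bytes_sent += @body.bytesize
    [:ok, etag, @body]
  end
end

# Keeps the last copy it saw and only re-downloads when the fingerprint changes.
class CachingClient
  def initialize(server)
    @server = server
    @etag = nil
    @body = nil
  end

  def index
    status, etag, body = @server.get(@etag)
    return @body if status == :not_modified
    @etag = etag
    @body = body
  end
end

server = FakeIndexServer.new("rack 1.5.2\n")
client = CachingClient.new(server)
client.index
client.index # answered from the local copy, nothing re-downloaded
puts "bytes sent after two fetches: #{server.bytes_sent}"
```

This sketch still re-sends the whole file when something changes; with an append-only text file, a client could in principle fetch only the new tail instead.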
  • 15:38 - 15:45
    So less response data, and less requests means
    faster
  • 15:45 - 15:48
    for everyone, but it will definitely be significantly,
    noticeably,
  • 15:48 - 15:51
    like, it will be noticeably faster in the
    US,
  • 15:51 - 15:53
    but it will be incredibly faster outside of
    the
  • 15:53 - 15:56
    US.
  • 15:56 - 15:58
    Right along those lines to speed things up
    even
  • 15:58 - 16:02
    more, we're going to add CDNs in front of
  • 16:02 - 16:05
    basically everything. Right now the way that
    the architecture
  • 16:05 - 16:09
    works is all requests have to go to AWS
  • 16:09 - 16:13
    in Virginia to find out where to get the
  • 16:13 - 16:18
    data from. And that works for gems, it works
  • 16:18 - 16:21
    for gem specs, it works for the gem index
  • 16:21 - 16:26
    file itself. So everything will be CDN'd.
  • 16:26 - 16:32
    The CDN company Fastly volunteered
    to both
  • 16:32 - 16:38
    provide engineering resources and an account.
    And so we're
  • 16:38 - 16:40
    going to have the gem specs, the gems, and
  • 16:40 - 16:43
    the gem index files, which right now are not
  • 16:43 - 16:47
    cached in a CDN at all, available just from
  • 16:47 - 16:48
    Fastly. Which means that when you make a request
  • 16:48 - 16:52
    from Australia, assuming that the file hasn't
    changed, Fastly
  • 16:52 - 16:54
    will just give it to you from a server
  • 16:54 - 16:56
    in Australia. You won't have to - like, there
  • 16:56 - 16:58
    will be no requests that happen to the US
  • 16:58 - 17:01
    until the Rubygems server tells Fastly, hey,
    there's a
  • 17:01 - 17:05
    new version of this file.
  • 17:05 - 17:08
    That should remove all requests that have
    to span
  • 17:08 - 17:11
    the world to install gems, and I'm hoping
    that
  • 17:11 - 17:14
    that will make all of the international Rubyists
    super
  • 17:14 - 17:16
    happy.
  • 17:16 - 17:20
    The final part of the improvement plan is
    to
  • 17:20 - 17:24
    provide easy to install and use local mirrors
    of
  • 17:24 - 17:29
    Rubygems. Right now this is basically a nightmare.
    It's
  • 17:29 - 17:33
    super hard to do and it's not - a
  • 17:33 - 17:35
    combination of the way that it works right
    now
  • 17:35 - 17:38
    and no one having spent a huge amount of
  • 17:38 - 17:40
    time on it means that the options basically
    boil
  • 17:40 - 17:42
    down to, hey you should just put a Varnish
  • 17:42 - 17:46
    or Squid cache in front of Rubygems, and hope
  • 17:46 - 17:49
    that that does what you want.
  • 17:49 - 17:53
    So we are like as part of this plan,
  • 17:53 - 17:55
    we're building out the app that currently
    provides the
  • 17:55 - 17:59
    Bundler API to also act as a local mirror
  • 17:59 - 18:01
    of Rubygems. So you'll be able to spin up
  • 18:01 - 18:04
    a copy of that inside your data center, near
  • 18:04 - 18:08
    your machines, and, like, we're working with
    the Travis
  • 18:08 - 18:09
    guys to get this set up inside the Travis
  • 18:09 - 18:14
    data center. Various other companies that
    run enough boxes
  • 18:14 - 18:17
    that install gems that they're, like, you
    know, if
  • 18:17 - 18:21
    you're, if you either care about this performance-wise,
    because
  • 18:21 - 18:22
    you have a lot of machines doing a lot
  • 18:22 - 18:24
    of gem installs, or you care about this from
  • 18:24 - 18:27
    like a paranoia perspective, where you want
    to have
  • 18:27 - 18:29
    copies of all of your gems yourself, inside
    your
  • 18:29 - 18:31
    data center, even if Rubygems dot org is gone
  • 18:31 - 18:36
    - this will allow you to do that.
  • 18:36 - 18:40
    And it, we're hoping to - although it's not
  • 18:40 - 18:43
    done yet - have scripts that will let you
  • 18:43 - 18:46
    just like run this on a, an out of
  • 18:46 - 18:48
    the box Ubuntu VM or run this on, you
  • 18:48 - 18:54
    know, like, whatever internal setup you have
    super easily.
  • 18:54 - 18:58
    So after I hashed out all of this plan
  • 18:58 - 18:59
    and wrote it down and said, this is what
  • 18:59 - 19:01
    I'm gonna spend my free time working on for
  • 19:01 - 19:05
    the next whatever, six months or a year, ten
  • 19:05 - 19:08
    years or however long it takes to do this.
  • 19:08 - 19:12
    Ruby Central said hey, that actually sounds
    like a
  • 19:12 - 19:15
    really great idea. And we would like to give
  • 19:15 - 19:16
    a grant to work on that.
  • 19:16 - 19:19
    So, I got a grant to work on that.
  • 19:19 - 19:22
    For the last few months I have been working
  • 19:22 - 19:24
    one or two days a week, paid by Ruby Central
  • 19:24 - 19:31
    to implement that plan, which is pretty awesome.
    Yeah.
  • 19:38 - 19:44
    So I'm really excited about that. Like it's
    super
  • 19:44 - 19:46
    great that Ruby Central thought that this
    was worth
  • 19:46 - 19:48
    doing, and I am really happy to have been
  • 19:48 - 19:51
    working on it in that time. I just wanna
  • 19:51 - 19:53
    let you guys know what we've been able to
  • 19:53 - 19:57
    do so far and kind of where we're at.
  • 19:57 - 20:01
    So there had been like a couple of different
  • 20:01 - 20:04
    stabs at a new index format by various Rubygems
  • 20:04 - 20:07
    team members. But nothing that was like super
    solid
  • 20:07 - 20:10
    or that was actually getting used. I spent
    the
  • 20:10 - 20:14
    first, probably, month or so working on the
    index
  • 20:14 - 20:16
    format and making sure that it worked and
    contained
  • 20:16 - 20:19
    all the information that we needed, and it
    was,
  • 20:19 - 20:23
    you know, like usable across both Rubygems
    and Bundler
  • 20:23 - 20:25
    and could be created on the server and all
  • 20:25 - 20:28
    of that stuff.
  • 20:28 - 20:31
    And then started implementing it. So like
    right now,
  • 20:31 - 20:34
    the Bundler API Sinatra app can actually serve
    the
  • 20:34 - 20:37
    new plain text index formats, like, and it
    works.
  • 20:37 - 20:40
    It's really great. There's - I, I have a
  • 20:40 - 20:44
    prototype implementation in Bundler that lets
    Bundler install gem
  • 20:44 - 20:47
    files using only the new index format, from
    only
  • 20:47 - 20:50
    the, you know like, from a server that only
  • 20:50 - 20:55
    speaks the new index format, which is pretty
    sweet.
  • 20:55 - 20:58
    Along with working on this, we, you know,
    kind
  • 20:58 - 21:01
    of like, in that same time period, I worked
  • 21:01 - 21:04
    with the Rubygems dot org team on the SSL
  • 21:04 - 21:06
    issues that I'd previously mentioned, to figure
    out what
  • 21:06 - 21:10
    was going on and get those resolved. We've
    also
  • 21:10 - 21:12
    worked with Fastly and the Rubygems dot org
    team
  • 21:12 - 21:16
    to actually get - it's not the entire plan,
  • 21:16 - 21:18
    but right now gems and gemspecs are actually hosted
    hosted
  • 21:18 - 21:23
    by Fastly, and we've asked international Rubyists
    to benchmark
  • 21:23 - 21:26
    this change. And it's been a huge improvement.
  • 21:26 - 21:28
    So we're part of the way there on the
  • 21:28 - 21:32
    CDN thing, which is really great.
  • 21:32 - 21:35
    So having made it that far, here's what we
  • 21:35 - 21:37
    have left to do.
  • 21:37 - 21:39
    Rubygems dot org is going to but does not
  • 21:39 - 21:43
    yet serve the new index format. Rubygems itself
    is
  • 21:43 - 21:45
    going to use but does not yet use the
  • 21:45 - 21:48
    new index format. And we're going to get all
  • 21:48 - 21:51
    of those files pushed out into Fastly so that
  • 21:51 - 21:54
    no requests have to go to the Rubygems server
  • 21:54 - 22:01
    itself unless there's a cache miss on the
    CDN.
  • 22:01 - 22:08
    So. At that point, once we have done that,
  • 22:08 - 22:12
    everyone installing gems, everyone using Bundler,
    everyone using Rubygems,
  • 22:12 - 22:13
    will be able to benefit from all of those
  • 22:13 - 22:20
    changes. That's pretty exciting, actually.
    Basically, like, at that
  • 22:23 - 22:29
    point, no matter which, like, gem installing
    client you're
  • 22:29 - 22:32
    using, or how you have, you know like, the
  • 22:32 - 22:34
    server set up, you will be able to use
  • 22:34 - 22:37
    the new index format, get your data from Fastly,
  • 22:37 - 22:42
    and make as few requests as possible and just
  • 22:42 - 22:46
    get to the business of actually installing
    gems.
  • 22:46 - 22:51
    So let's talk about what that means for the
  • 22:51 - 22:52
    future.
  • 22:52 - 22:57
    I am really excited about this plan. Like,
    I
  • 22:57 - 23:02
    - even in my prototype, rudimentary testing is way
    is way
  • 23:02 - 23:06
    faster. I am super, super grateful that the
    Rubygems
  • 23:06 - 23:08
    team and the Rubygems dot org team and the
  • 23:08 - 23:12
    Bundler team have all, like, been helpful
    and supportive
  • 23:12 - 23:15
    as I've been working on this. Ruby Central
    has
  • 23:15 - 23:17
    obviously been paying for some of this work,
    which
  • 23:17 - 23:20
    is super awesome.
  • 23:20 - 23:25
    Kind of more immediately, in the future, there
    is
  • 23:25 - 23:29
    a pre-release version of Bundler right now
    that doesn't
  • 23:29 - 23:34
    include the new index format, yet. Instead
    it includes
  • 23:34 - 23:39
    a parallel installation, which is another
    huge speed increase
  • 23:39 - 23:43
    that we've been working on. If you install
    a
  • 23:43 - 23:45
    pre-release version of Bundler, and then call
    bundle install
  • 23:45 - 23:48
    with dash j and then a number, it will
  • 23:48 - 23:51
    spin up that many processes or threads to
    install
  • 23:51 - 23:52
    your gems.
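
The general idea of handing installs to a fixed number of workers can be sketched with Ruby threads and a queue. This is not Bundler's implementation: `install_all` is a hypothetical helper, the block stands in for the real work of installing one gem, and dependency ordering between gems is ignored entirely.

```ruby
require "thread"

# Runs the given block once per gem name, using `jobs` worker threads that
# pull names from a shared queue until it is empty.
def install_all(gem_names, jobs)
  queue = Queue.new
  gem_names.each { |name| queue << name }
  installed = Queue.new

  workers = Array.new(jobs) do
    Thread.new do
      loop do
        name = begin
          queue.pop(true) # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break
        end
        installed << yield(name)
      end
    end
  end

  workers.each(&:join)
  Array.new(installed.size) { installed.pop }
end

# Pretend "install" that just reports what it did.
results = install_all(%w[rack rails sinatra i18n], 2) { |name| "installed #{name}" }
puts results.sort
```

The completion order depends on thread scheduling, so the results are sorted before printing.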
  • 23:52 - 23:55
    If you have, like, four or eight cores, this
  • 23:55 - 23:57
    can make a really significant difference in
    how fast
  • 23:57 - 24:02
    your entire Gemfile gets installed. Heroku
    and Travis
  • 24:02 - 24:07
    are both testing this change and will implement
    it,
  • 24:07 - 24:10
    like, system-wide once the release is final.
  • 24:10 - 24:14
    That should make both deploys and CI runs
    noticeably
  • 24:14 - 24:19
    faster, which will be pretty great.
  • 24:19 - 24:20
    As soon as this version is out as a
  • 24:20 - 24:25
    final release, I'm going to move into the
    pre-release
  • 24:25 - 24:27
    cycle for the version of Bundler with the
    new
  • 24:27 - 24:31
    index format. It's, it's almost baked enough
    to be
  • 24:31 - 24:34
    a pre-release, and I'm kind of like parallel-y
    splitting
  • 24:34 - 24:36
    my time between getting the work that we've
    already
  • 24:36 - 24:40
    done out and working on the new stuff.
  • 24:40 - 24:43
    Once these release - like once this release
    is
  • 24:43 - 24:47
    out completely I will ask everyone to try
    out
  • 24:47 - 24:51
    the new index version of Bundler, and probably
    will
  • 24:51 - 24:53
    have to fix whatever has gone wrong that we
  • 24:53 - 24:59
    didn't notice. But it's like real soon now,
    is
  • 24:59 - 25:04
    basically the answer. That said, there is
    a lot
  • 25:04 - 25:06
    of work left to do.
  • 25:06 - 25:12
    Like, obviously, like, there's ongoing Rubygems
    and Bundler maintenance
  • 25:12 - 25:15
    as Ruby moves forward and as Rubygems moves
    forward.
  • 25:15 - 25:17
    We have to like keep all of them in
  • 25:17 - 25:22
    sync and working together. So there's ongoing
    compatibility work.
  • 25:22 - 25:25
    There's working on making Bundler faster.
    There's working on
  • 25:25 - 25:29
    making all of these things work together to
    be
  • 25:29 - 25:33
    an awesome new system like we've planned.
    Even with
  • 25:33 - 25:35
    the Ruby Central grant, I'm still only able
    to
  • 25:35 - 25:38
    work on this like two days a week.
  • 25:38 - 25:42
    The volunteer teams that work on Rubygems
    and Bundler
  • 25:42 - 25:44
    and Rubygems dot org have been super helpful
    this
  • 25:44 - 25:49
    entire time, and we could totally use more
    help.
  • 25:49 - 25:52
    If any of you are interested and able to
  • 25:52 - 25:55
    help us out, definitely hit me up. I will
  • 25:55 - 25:59
    pull you into the next generation Bundler
    group and
  • 25:59 - 26:02
    we can try and get things worked out so
  • 26:02 - 26:05
    that it happens even faster.
  • 26:05 - 26:07
    That's it.
Duration:
26:38
