(Theme Music)
This talk is about how Bundler works
How does Bundler work?
This is an interesting question.
We'll talk about it for a while.
This talk is a brief history of
dependency management in Ruby,
a discussion of how libraries and
shared code work now and worked in the past
because how it works now is directly a result
of how it used to work in the past
and trying to fix problems that happened then.
Before we get started, let me introduce myself:
My name is Andre Arko, I'm @indirect on all social media
that's my avatar, maybe you've seen me on
a webpage somewhere. As my day job
I work at Cloud City Development doing
Ruby, Rails, Ember, and web consulting.
We do web and mobile development
and I mostly do architectural consulting
and Senior Developer pairing and training.
Talk to me if your company is interested.
I also founded Ruby Together, a non-profit,
it's like npm, Inc., without the venture capital.
Ruby Together is a trade association that
takes money from companies and people who
use Ruby and Bundler and RubyGems and all
of the public infrastructure that Rubyists use
and pays for developers to work on that
so that RubyGems.org stays up, and so
people can have gems, which is pretty cool.
As part of my work for Ruby Together I work as
lead of the Bundler team. I've been working on
Bundler since before 1.0 came out, and I've
been team lead for the last four years.
Using Ruby code written by other developers,
nowadays this is actually really easy,
you add a line to your Gemfile,
you go to your terminal and run
bundle install, and you start using it.
Pretty cool, that's really easy.
The thing that I've noticed, talking to people
who use Bundler and think it's awesome
is that it's not actually clear what just happened.
Based on the text printed out by bundle install
it seems like something got downloaded
and something got installed, but it's not clear.
It's not clear what got downloaded or
installed, or where it happened.
What exactly happened there?
Nobody is really sure.
How does just putting a line in your Gemfile
mean you can just start using somebody else's code?
To explain that, we'll need a little bit of history.
We're going to go back in time.
I'm going to give you a tour from the beginning of sharing
code in Ruby up until now.
And hopefully by the end of it you'll understand
why things work the way they do now.
I'm going to start talking about require,
which came with the very first version of Ruby ever, in 1994.
And then talk about setup.rb from 2000,
and then RubyGems from 2003, and Bundler from 2009.
And that's what we're still using today.
The require method has been around since
1994, with the very first version of Ruby.
What I should say is that it's been there
since at least 1997, since that's
the oldest version-controlled Ruby source we have.
It was probably there before that though.
Require can be broken down into even
smaller concepts. Using code from
a file is basically the same as inserting
that code and having Ruby run it
as if you'd just written it in the file.
It's actually possible to implement it yourself,
with a one-line function.
This function says: I have a file
name and I want to require it,
so read the file into memory
as a string and pass that
string to eval, and Ruby runs it
and it's just like you typed that code yourself.
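The slide's code isn't in this transcript, so here is a minimal sketch of that one-line idea. The name naive_require is hypothetical, chosen so it doesn't replace Ruby's real Kernel#require, and the temp file and $greeting_loads counter exist only for the demo:

```ruby
require "tmpdir"

# Hypothetical one-line "require": read the whole file into a string and
# eval it at the top level, as if the code had been typed right there.
def naive_require(filename)
  eval(File.read(filename), TOPLEVEL_BINDING, filename)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "greeting.rb")
  File.write(path, "$greeting_loads = ($greeting_loads || 0) + 1\n")

  naive_require(path)
  naive_require(path)
  puts $greeting_loads # prints 2: nothing stops the file from running again
end
```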
There are problems with this.
Require doesn't work this way in real life.
I'm sure it's totally fine that this will
run that same piece of code over and over
if you require it over and over. You like
having lots and lots of constants that keep
getting redefined. I'm sure it's totally fine.
Working around that is pretty straightforward.
Just keep track of what you've required in an Array
and don't require something again if it's been required.
As you can see here,
you set up an Array, you check
to see if the Array already contains
the filename that just got passed in,
and if it hasn't been required,
do the same thing we did before,
read the file in, pass it to eval,
and then add it to the array,
so it's not required again later.
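A sketch of that Array-tracking version, under the same assumptions as before (hypothetical names, files given as paths Ruby can open). REQUIRED_FILES stands in for the list Ruby keeps itself:

```ruby
require "tmpdir"

# Hypothetical record of every file we've already loaded.
REQUIRED_FILES = []

# Load a file only the first time it is asked for.
# Returns true if it was loaded now, false if it had been loaded before.
def tracked_require(filename)
  return false if REQUIRED_FILES.include?(filename)

  eval(File.read(filename), TOPLEVEL_BINDING, filename)
  REQUIRED_FILES << filename # remember it so we skip it next time
  true
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "counter.rb")
  File.write(path, "$counter_loads = ($counter_loads || 0) + 1\n")

  p tracked_require(path) # => true
  p tracked_require(path) # => false
  p $counter_loads        # => 1
end
```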
In fact, this is exactly what Ruby does,
but written in C, not Ruby.
There is a $LOADED_FEATURES global variable,
and it's an Array, and it contains a list
of all the required files.
If you want to know if you've required something yet,
check the $LOADED_FEATURES array.
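You can watch Ruby's real bookkeeping with nothing but the standard library. This sketch requires a made-up throwaway.rb twice and then looks for it in $LOADED_FEATURES:

```ruby
require "tmpdir"

Dir.mktmpdir do |dir|
  path = File.join(dir, "throwaway.rb")
  File.write(path, "$throwaway_loads = ($throwaway_loads || 0) + 1\n")

  first  = require(path) # true: the file was loaded
  second = require(path) # false: Ruby already lists it in $LOADED_FEATURES
  listed = $LOADED_FEATURES.any? { |f| f.end_with?("throwaway.rb") }
  p [first, second, listed, $throwaway_loads] # => [true, false, true, 1]
end
```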
There is one more problem with this,
it only works when you pass in absolute paths.
I'm sure you don't mind typing the
full path from wherever you are to
exactly wherever the file you want to require is.
I'm sure that's fine too.
The easiest way to allow requires that
aren't absolute is to just treat
all requires as if they're relative
to the path where you started
the Ruby program. And that's easy,
but that doesn't help a lot if you
want to require Ruby files from different places.
Say you have a folder holding a library
you wrote and a folder holding an application you wrote,
and you want to use the library from the app. You can't,
because writing relative paths from wherever
you started the Ruby program would be terrible.
Instead we create an Array that holds the
list of directories we want to load
Ruby files from. In a burst of creativity
I'm just going to call that variable the
LOAD_PATH, and here's an implementation.
If you put a directory in the LOAD_PATH Array,
you can then pass require a path relative to any directory
that's in the LOAD_PATH Array, and
it will look for the file.
If you require "foo", it will look for a file
named "foo" inside each of the LOAD_PATH directories,
and the first one it finds, searching
the LOAD_PATH in order from first to last,
is the one it requires.
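The slide's implementation isn't in the transcript; this is a sketch of the same search, assuming library files end in .rb. MY_LOAD_PATH and load_path_require are hypothetical stand-ins for the real $LOAD_PATH and require, and the two demo directories show that search order decides which "foo" wins:

```ruby
require "tmpdir"

# Hypothetical load path: directories to search, in order.
MY_LOAD_PATH = []

# Find "<name>.rb" in the first directory that has it and eval that file.
# Raises LoadError when no directory on the path contains it.
def load_path_require(name)
  dir = MY_LOAD_PATH.find { |d| File.file?(File.join(d, "#{name}.rb")) }
  raise LoadError, "cannot load such file -- #{name}" unless dir

  path = File.join(dir, "#{name}.rb")
  eval(File.read(path), TOPLEVEL_BINDING, path)
  path
end

Dir.mktmpdir do |root|
  lib_a = File.join(root, "a")
  lib_b = File.join(root, "b")
  [lib_a, lib_b].each { |d| Dir.mkdir(d) }
  File.write(File.join(lib_a, "foo.rb"), "$foo_from = :a\n")
  File.write(File.join(lib_b, "foo.rb"), "$foo_from = :b\n")

  MY_LOAD_PATH.push(lib_a, lib_b)
  load_path_require("foo")
  p $foo_from # => :a, because lib_a comes first
end
```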
Coincidentally, this is exactly what
Ruby does: there is a global variable
named $LOAD_PATH, and if you put
a string that contains a path to a directory
in it, Ruby will look in that directory
whenever you require something, for a file
with that name (trying .rb if you left the extension off).
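To try the real $LOAD_PATH, this sketch writes a made-up my_helper.rb into a temporary directory, puts that directory at the front of $LOAD_PATH, and requires the file by its short name:

```ruby
require "tmpdir"

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "my_helper.rb"), <<~RUBY)
    module MyHelper
      def self.greet
        "hello from my_helper"
      end
    end
  RUBY

  $LOAD_PATH.unshift(dir) # search this directory before all the others
  require "my_helper"     # found as <dir>/my_helper.rb
  puts MyHelper.greet     # prints "hello from my_helper"
end
```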
You can totally use the LOAD_PATH to require
files from somewhere else while you're working with them.
Of course, the LOAD_PATH and LOADED_FEATURES
can both be combined, but that didn't
fit on a single slide, so I'll leave that
as an exercise to the listener.
It's pretty straightforward to be honest.
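Here is one way to do that exercise, as a toy: SEARCH_PATH and LOADED are hypothetical stand-ins for $LOAD_PATH and $LOADED_FEATURES, and only .rb files are handled. The design choice that matters is resolving the short name to a full path first and recording that full path, much as $LOADED_FEATURES holds full paths rather than the names you typed:

```ruby
require "tmpdir"

# Hypothetical stand-ins for $LOAD_PATH and $LOADED_FEATURES.
SEARCH_PATH = []
LOADED = []

# Resolve "name" through SEARCH_PATH, then load that file only if its
# full path hasn't been loaded before. Returns true/false like require.
def combined_require(name)
  dir = SEARCH_PATH.find { |d| File.file?(File.join(d, "#{name}.rb")) }
  raise LoadError, "cannot load such file -- #{name}" unless dir

  path = File.expand_path(File.join(dir, "#{name}.rb"))
  return false if LOADED.include?(path) # dedupe on the resolved path

  eval(File.read(path), TOPLEVEL_BINDING, path)
  LOADED << path
  true
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "bar.rb"), "$bar_loads = ($bar_loads || 0) + 1\n")
  SEARCH_PATH << dir

  p combined_require("bar") # => true
  p combined_require("bar") # => false
  p $bar_loads              # => 1
end
```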
Load paths are pretty cool.
They allow us to load Ruby files
even if they're spread across multiple directories.
At this point, we could even
automatically add, at the start of every script,
the directory that holds the standard library
to the load path, and then all of the
files that are part of the Ruby standard library,
like Net::HTTP, Set, the cool things that
come with Ruby, could just be available for
require automatically and you wouldn't have
to worry about putting them in the
load path yourself. That's exactly
what Ruby does, the standard library
starts on the load path when Ruby starts.
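You can check this on your own machine. RbConfig::CONFIG["rubylibdir"] is where this Ruby keeps its standard library; the sketch compares it against $LOAD_PATH after normalizing both with File.expand_path:

```ruby
require "rbconfig"

# Where this Ruby installation keeps its standard library files.
stdlib_dir = File.expand_path(RbConfig::CONFIG["rubylibdir"])

# Ruby put that directory on the load path before this script started.
on_load_path = $LOAD_PATH.any? { |dir| File.expand_path(dir) == stdlib_dir }

puts "standard library: #{stdlib_dir}"
puts "on $LOAD_PATH: #{on_load_path}"
```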
It's pretty great. This was cool, and
for several years, this was enough.
People just added things to the load path.
Or wrote scripts that added things to the
load path and required things before their
actual script ran.
The thing that got tedious about just having
load paths is that if you want to get code from
someone else, you have to find that code,
download it, put it somewhere, remember where,
put it in the load path, and then require it.
This was tedious.
Setup.rb happened next.
Around the year 2000, everyone was still
installing shared Ruby code by hand.
That wasn't so much fun.
A Japanese Ruby developer, Minero Aoki,
wrote setup.rb, and amazingly,
even though this was created in 2000,
setup.rb is still around on the Internet.
The website for this developer is
i.loveruby.net, which is pretty cool,
and you can even download setup.rb, but
to be honest, it hasn't been updated since 2005,
so I'm not sure it's super helpful to you.
How did setup.rb work?
At its core it mimicked the classic
UNIX installation pattern,
downloading a piece of software,
decompressing it, and then running
configure, make, and make install,
so setup.rb kind of copied that for Ruby.
You would run ruby setup.rb config,
ruby setup.rb setup, and ruby setup.rb install.
Libraries followed a specific directory structure,
kind of like a gem today, with
library files, bin files you could run as programs,
and support files, and setup.rb would
copy all of those files into a directory
that was already on the load path, called
site_ruby, which held the Ruby files
you had installed that were specific
to your computer.
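A rough sketch of setup.rb's core copy step, assuming a package with the conventional lib/ directory. install_lib is a hypothetical name, and the real setup.rb also handled configuration, bin files, and more. A real install would target RbConfig::CONFIG["sitelibdir"]; the sketch takes the destination as an argument so it can use a scratch directory. Notice what it never records: no version and no list of installed files, which is exactly what made uninstalling and upgrading so painful:

```ruby
require "fileutils"
require "tmpdir"

# Copy every file under <package_dir>/lib into dest_dir, keeping subfolders.
def install_lib(package_dir, dest_dir)
  lib = File.join(package_dir, "lib")
  Dir.glob("**/*", base: lib).each do |rel|
    src = File.join(lib, rel)
    next if File.directory?(src)

    target = File.join(dest_dir, rel)
    FileUtils.mkdir_p(File.dirname(target))
    FileUtils.cp(src, target)
  end
end

installed = Dir.mktmpdir do |tmp|
  # A made-up package laid out the way setup.rb expected.
  pkg = File.join(tmp, "mylib-0.1")
  FileUtils.mkdir_p(File.join(pkg, "lib", "mylib"))
  File.write(File.join(pkg, "lib", "mylib.rb"), "require 'mylib/core'\n")
  File.write(File.join(pkg, "lib", "mylib", "core.rb"), "# core code\n")

  site = File.join(tmp, "site_ruby") # stand-in for the real site_ruby directory
  install_lib(pkg, site)
  Dir.glob("**/*.rb", base: site).sort
end

p installed # => ["mylib.rb", "mylib/core.rb"]
```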
After setup.rb, using Ruby libraries
was much easier than it had been.
You still had to find a library online,
download it, untar it by hand,
and run ruby setup.rb all yourself,
but then it was all installed, and no more
manual copying, no more having to
manage all these files.
Everything was in the load path,
you could just require it after setup.rb ran.
After a little while, some of the
shortcomings of this scheme became apparent, too.
There were no versions for any libraries,
and after you run setup.rb there's not even
a way to tell what version you have, unless
you write it down, or the library author
was really nice, and put the version into
the code somehow. There was no way
to uninstall; everything was thrown into
the same directory. You'd run setup.rb
for 5 different Ruby libraries and
now all of their files are in one directory.
Good luck figuring out which file belongs to which library.
If you delete the wrong one, too bad.
Upgrading was super fun. If there was
a new version of the library (good luck
finding that out), you had
to remember the website you got
it from in the first place.
I hope you wrote all these down.
I hope you've written down every
website you've ever downloaded Ruby from.
You have to go back to that website,
remember which version you have, which
as I said before, there's nothing there unless
you wrote it down.
And then you have to download the
tarball with the new version, and
decompress it, cd into it,
run ruby setup.rb on it all, and
hope that the new version didn't remove
any files, because the old copies would still be there.
This was tedious, it was really tedious.
People frequently had no idea what
was actually happening with their libraries.
It was not uncommon for people to be like,
"Oh, this doesn't work, I'll just fix it
in my site_ruby directory. OK, everything
is great now."
Super awesome.
At some point, some people were like
this is not great. What if you could just
run gem install? That would be cool.
And so in 2003, RubyGems came to the rescue.
And it fixed all of the known problems
with setup.rb. You could check
to see if a library was installed by running gem list,
install a gem with gem install, and uninstall it with gem uninstall.
RubyGems kept each of these libraries in different directories.
You knew which libraries you had, and how to uninstall
and install new versions, all with one command.
No having to find it on the internet somewhere,
download and unpack it, and run setup.rb on it.
And RubyGems had another super cool trick
up its sleeve: versions.
RubyGems actually kept each version of each
gem in a different place. You could install
multiple versions of the same library.
And they could all be installed in your Ruby at once,
because they didn't all go into one giant folder,
they went into their own separate folders.
Folders for rails 4.1, 4.2, and 5.0.
To make this work, because require doesn't
support versioning inherently,
RubyGems added a gem method that
lets you say, I need version 1.0 of rack,
and RubyGems will check to make sure it's
installed and put that directory, just the one
with rack 1.0, into your load path.
So when you run require "rack" you'll
get rack 1.0, it's pretty cool.
Calling the gem method told RubyGems
you wanted to manipulate the load path to
load exactly the version you knew your
code wanted to talk to.
It was pretty useful.
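In real code this looked like gem "rack", "1.0" followed by require "rack". The sketch below imitates the mechanism without needing rack installed: a made-up gem, fakerack, is installed in two versions side by side (one name-version folder each, similar to how RubyGems lays out installed gems), and a hypothetical activate_version helper puts exactly one of those lib folders at the front of $LOAD_PATH:

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: put the lib directory of one specific installed
# version at the front of $LOAD_PATH, which is the heart of what `gem` arranges.
def activate_version(gems_dir, name, version)
  lib = File.join(gems_dir, "#{name}-#{version}", "lib")
  raise LoadError, "#{name} #{version} is not installed" unless File.directory?(lib)

  $LOAD_PATH.unshift(lib)
end

Dir.mktmpdir do |gems_dir|
  # Two versions of the made-up gem, each in its own folder.
  %w[1.0 1.1].each do |version|
    lib = File.join(gems_dir, "fakerack-#{version}", "lib")
    FileUtils.mkdir_p(lib)
    File.write(File.join(lib, "fakerack.rb"), "FAKERACK_VERSION = #{version.dump}\n")
  end

  activate_version(gems_dir, "fakerack", "1.0")
  require "fakerack"    # found in the 1.0 folder, not the 1.1 one
  puts FAKERACK_VERSION # prints 1.0
end
```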
RubyGems also has a way to support
versioning even in commands that
come with gems. The rack gem
comes with the rackup command, and
if you have multiple versions of rack installed,
the rackup command could come from any of those versions.
RubyGems defaults to the newest version you have
installed, hoping the newest is the right one.
If that's not what you want, RubyGems checks the first
argument to the command for something
with underscores on either side,
and takes that as the version number
that you want to use.
For example, if you run rackup _1.2.2_, you get
rackup from rack version 1.2.2, and only 1.2.2.
If you don't have that version installed, RubyGems will
make you install that version first.
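The real logic lives in the wrapper script RubyGems generates for each executable; this is a simplified, hypothetical pick_version that only shows the rule: an underscored first argument wins, otherwise the newest installed version. It compares with Gem::Version, because plain string comparison would wrongly rank 1.2.2 above 1.10.0:

```ruby
require "rubygems" # Gem::Version (already loaded in a normal Ruby)

# Decide which installed version an executable should run.
# Returns [version, remaining_arguments].
def pick_version(installed, argv)
  if (match = argv.first.to_s.match(/\A_(.+)_\z/))
    wanted = match[1]
    raise "version #{wanted} is not installed" unless installed.include?(wanted)

    [wanted, argv.drop(1)] # strip the _x.y.z_ argument before running
  else
    [installed.max_by { |v| Gem::Version.new(v) }, argv]
  end
end

installed = ["1.2.2", "1.10.0"]
p pick_version(installed, ["_1.2.2_", "config.ru"]) # => ["1.2.2", ["config.ru"]]
p pick_version(installed, ["config.ru"])            # => ["1.10.0", ["config.ru"]]
```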
RubyGems was really, really successful.
Ruby grew in popularity a lot, and RubyGems
made sharing Ruby code grow a lot too.
Today there are 100,000 gems, with 1,000,000 versions.
That's a lot of shared Ruby code.
You probably knew this was coming,
but as cool as RubyGems is, it still had
some problems. If you have multiple
applications that all use RubyGems to load
their dependencies, this can be problematic.
It's hard to coordinate across multiple applications
because each installation of Ruby itself just has
a single set of gems. If you ran gem install, now
all of those gems are in it.
If one developer runs gem install "foo" and
starts using "foo" in their application,
commits that code and checks it in,
and the next person checks it out
and tries to run the application,
it's going to explode, because their Ruby doesn't
have foo installed, and they need to fix that.
It led to an era of pure manual dependency management.
Start a new job, hooray!
This literally happened to me in 2008.
New job, welcome to the team, here's
your cool new laptop, we
expect you to have the application
running by next week.
It actually took me only 3 and a half days,
working overtime on this. It was amazing.
[Audience Laughs]
To figure out which gems to gem install,
I looked in the README and there
was a list.
And I installed all of them.
But clearly there were some that people
forgot to put in the README, and
then it kind of worked, but I wasn't
able to get images working. And then
some other developer was like,
you need to install imagemagick,
this was before Homebrew. It was terrifying.
People tried to fix this problem
by putting the gems in the README, but
How do we know if we have
written everything in the README?
"I don't know? Try it?"
Of course, you'd need a new machine
to try it on, because after 3 years
of using Ruby you generally have
installed every gem, and you have
no idea what's important and what's not.
It's terrible.
People started to work on tools to help this problem.
Rails added config.gem; this is the Rails 2.x era.
You would put all the gems you need in environment.rb.
This was super helpful if you needed
to know for sure this was the
master list of all the gems
you needed in your application, but
you could only access that list when
Rails was already loaded.
It was pretty bad.
Because RubyGems automatically uses
the newest version of each gem, just having
an older version installed, didn't mean it
would be used. And if you install
some gem a month after the other person did,
maybe there's a new version? You would
just get the new version automatically.
This is also totally a real-life experience that happened to me in 2009.
Debug a production server that just randomly throws exceptions.
For three days.
The other production servers are fine.
We can't reproduce this problem on a single developer laptop.
What is going on? This is so weird.
After 3 days I finally thought to compare
the gem list output across the production
machines, and I was like, oh, this production
server has version 1.1.3 of a gem and every
other production server and developer laptop has 1.1.4.
That was the problem.
There was a bug in 1.1.3, and only that server
had that version.
And then, like I was saying about Rails versions,
you could gem install rails, be happy,
make a new app, run your server,
everything is great. And then
you switch to another application
that already existed and wasn't
written to use that version of Rails;
it was written for some older version of Rails.
You're like, "Okay, let's go!"
"Boom", because you didn't have
the right version of Rails.
If you put your rails version in the rails config
rails would complain you had the wrong version,
but rails had to be successfully started up to
tell you that you had the wrong version, so
it didn't actually help.
Ultimately, it was a significant part of my job
to figure this shit out by hand, and it sucked.
Depending on what you did on your team,
some people on my team at the time spent
a quarter or a third of their time
doing nothing but figuring out and fixing
dependency management issues.
And I felt really, really bad for them.
Sometimes it was me and I felt really bad for me.
Then there's one more thing. Even after
you've done all of this management by hand,
there's one more problem that RubyGems has
that is another reason why Bundler was created.
Activation errors happen in RubyGems
when you load an application and start
requiring gems: RubyGems will load the newest
versions of those gems that it can.
Sometimes a gem's dependencies need
other gems, that need other gems,
and you'll get the newest version of the
child gem. And later you'll say, I also
need this gem, but that gem won't work with the version that's already loaded.
So how common can this be really?
Unfortunately, it was super common.
Not like happens to you every day common,
but like happens to you two or three times a year
and when it does you basically tear
all your hair out, delete your entire
ruby install, uninstall and reinstall all your gems,
because figuring out exactly which combo
of installed gems was causing this
problem was a total nightmare.
This is a real-life activation error.
I salvaged this from a presentation I gave in 2010
about why Bundler exists.
This is a rails app, it's loading, and
rails of course depends on ActionPack, this
was the Rails 2.3 era, ActionPack depends on Rack,
Rack is a gem that helps Rails talk to web servers.
And thin, which is a web server, also depends on rack.
So, rack is how rails talks to thin, how thin
talks to rails, but there's a problem.
thin is perfectly happy to use rack 1.1, which makes some
changes to how rack works.
ActionPack is not happy to use rack 1.1, and
can only use rack 1.0. And so
when you run your server, it loads thin
first because thin is the server.
And thin gets to work trying to load the rails app
and your rails app says "I can't use that rack, sorry"
The reason this happens is runtime resolution.
RubyGems figures out which versions
of which gems it should load
after RubyGems is already running.
You say, "Hey I need a thing", and
it's like "Okay, this version might work".
And if later on you say,
"I need a thing that doesn't work with things you've already done"
RubyGems just has to be like, can't fix that.
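A toy model of runtime resolution, with an invented index whose versions and constraints loosely follow the thin and ActionPack story above. runtime_activate is a hypothetical helper, not RubyGems' code: it activates the newest version that fits each request, and once a gem is activated its version can't change. Loading thin first grabs rack 1.1.0, so ActionPack's later request for rack ~> 1.0.0 has nowhere to go:

```ruby
require "rubygems" # Gem::Version and Gem::Requirement

# Invented index: name => { version => { dependency => requirement } }.
TOY_INDEX = {
  "rack"       => { "1.0.1" => {}, "1.1.0" => {} },
  "thin"       => { "1.2.7" => { "rack" => ">= 1.0.0" } },
  "actionpack" => { "2.3.5" => { "rack" => "~> 1.0.0" } },
}.freeze

# name => version, filled in as gems are activated and never revised.
ACTIVATED = {}

# Activate the newest version of `name` matching `requirement`, then its
# dependencies the same way. Raises if an already-active version doesn't fit.
def runtime_activate(name, requirement = ">= 0")
  req = Gem::Requirement.new(requirement)

  if (current = ACTIVATED[name])
    return if req.satisfied_by?(Gem::Version.new(current))

    raise "can't activate #{name} (#{requirement}), already activated #{name}-#{current}"
  end

  candidates = TOY_INDEX.fetch(name).keys.select { |v| req.satisfied_by?(Gem::Version.new(v)) }
  version = candidates.max_by { |v| Gem::Version.new(v) }
  raise "no version of #{name} matches #{requirement}" unless version

  ACTIVATED[name] = version
  TOY_INDEX[name][version].each { |dep, dep_req| runtime_activate(dep, dep_req) }
end

runtime_activate("thin") # the server loads first and picks rack 1.1.0

message =
  begin
    runtime_activate("actionpack") # needs rack ~> 1.0.0, but 1.1.0 is active
    "no conflict"
  rescue RuntimeError => e
    e.message
  end
puts message # can't activate rack (~> 1.0.0), already activated rack-1.1.0
```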
The fix for this problem is to figure out all the versions
before you run your application.
You have to know the versions you're going
to use are all versions that can work together.
Resolving things at install time,
knowing you're installing versions that work together.
How do we make sure all the versions we're
installing work together?
That's actually where Bundler comes in.
Before Bundler, the process of figuring out
which gems would work together
was done entirely by hand and it
consisted of gem uninstall,
gem install a slightly older version, does Rails start up yet?
Repeat the process.
When the exceptions stopped, you knew you'd won.
Unsurprisingly, computers are faster at this than people.
Computers are also good and accurate at trying
many, many, many options until one works.
This is what Bundler does.
Bundler figures out the entire list of every gem
and every version of every gem that
you need that also all
work together with one another.
This is called Dependency Graph Resolution,
and there's an entire academic literature about this.
It's kind of a well-known hard problem; it's
part of the set of problems called NP-complete,
and the totally fantastic thing, and I say
this as a person who has to fix Bundler
when it doesn't work, in theory, you can construct
a set of gems in a gemfile such that
it is not possible to find a set of gems that
work together until after the heat death of the universe.
[Audience Laughs]
Most of the time we don't have that long to wait.
We use a lot of tricks, shortcuts, and heuristics
to figure out which gems to try first and
hopefully finish before you've drunk
that cup of coffee or whatever.
We have a large built-up set of tricks over the years
and most Gemfiles resolve in less than 10 seconds.
Which is pretty cool, considering the upper bound
on that is practically infinity.
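To show the shape of that search, here is a naive backtracking resolver over the same kind of invented index (names, versions, and constraints are illustrative). It tries the newest candidate first and backs up when a choice leads to a conflict. This is not Bundler's resolver, which is far more sophisticated and uses the heuristics just mentioned; a naive search like this one can take exponential time on hostile inputs:

```ruby
require "rubygems" # Gem::Version and Gem::Requirement

# Invented index: name => { version => { dependency => requirement } }.
TOY_INDEX = {
  "rack"       => { "1.0.1" => {}, "1.1.0" => {} },
  "thin"       => { "1.2.7" => { "rack" => ">= 1.0.0" } },
  "actionpack" => { "2.3.5" => { "rack" => "~> 1.0.0" } },
}.freeze

# Satisfy every [name, requirement] pair in `pending`, given versions already
# picked in `chosen`. Returns a name => version Hash, or nil if nothing works.
def resolve(pending, chosen = {})
  return chosen if pending.empty?

  name, requirement = pending.first
  rest = pending.drop(1)
  req = Gem::Requirement.new(requirement)

  # Already picked a version: it must satisfy this requirement too.
  if chosen.key?(name)
    return nil unless req.satisfied_by?(Gem::Version.new(chosen[name]))

    return resolve(rest, chosen)
  end

  candidates = TOY_INDEX.fetch(name).keys.select { |v| req.satisfied_by?(Gem::Version.new(v)) }
  # Newest first; a dead end returns nil and we try the next candidate.
  candidates.sort_by { |v| Gem::Version.new(v) }.reverse_each do |version|
    deps = TOY_INDEX[name][version].to_a
    result = resolve(rest + deps, chosen.merge(name => version))
    return result if result
  end
  nil
end

result = resolve([["thin", ">= 0"], ["actionpack", ">= 0"]])
puts result.sort.map { |name, version| "#{name} #{version}" }
# actionpack 2.3.5
# rack 1.0.1
# thin 1.2.7
```

Where the runtime approach got stuck on rack 1.1.0, the search backs up and settles on rack 1.0.1, which both thin and ActionPack accept.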
After finding versions that work together,
because this problem was really hard
and we don't want to do it over and over,
Bundler writes down the exact versions of every gem
that did all work together, so they can be reused
by other people who are also interested in running
your application. That file is called Gemfile.lock.
It shows which gems to install,
the versions to install, and as a bonus
the lock file is what makes it possible
to install the exact same version of every
gem on every machine that's running this application.
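A simplified, hypothetical imitation of just the specs list in Gemfile.lock, where each locked gem appears as "name (version)" under a specs: heading. The real file has more sections (sources, platforms, dependencies, the Bundler version). The point is the round trip: write the exact versions down once, then read them back on another machine without resolving again:

```ruby
# Write resolved versions as a lock-style specs list (simplified format).
def write_lock(resolved)
  lines = ["GEM", "  specs:"]
  resolved.sort.each { |name, version| lines << "    #{name} (#{version})" }
  lines.join("\n") + "\n"
end

# Read "    name (version)" lines back into a name => version Hash.
def read_lock(text)
  text.scan(/^    (\S+) \(([^)]+)\)$/).to_h
end

lock = write_lock({ "thin" => "1.2.7", "rack" => "1.0.1" })
puts lock
# GEM
#   specs:
#     rack (1.0.1)
#     thin (1.2.7)

p read_lock(lock) == { "rack" => "1.0.1", "thin" => "1.2.7" } # => true
```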
That means when you develop on your laptop
you get whatever version of the gem was
newest when you ran bundle install,
because you get the newest version by default.
Because of the lock file, when you put
that on your production server, you're guaranteed
to have the same versions. And you won't
have to spend 3 days figuring out why
that production server doesn't quite
work all of the time.
It's pretty great.
Fundamentally, the core of Bundler consists of two steps:
bundle install, and bundle exec.
The steps for bundle install are simple.
They're totally understandable in plain English.
It fits on a single slide, which is great.
I edited this slide for ten minutes deleting words.
So the steps are:
1. Read the Gemfile.
2. Ask RubyGems.org for a list of all the gems we need.
3. Find versions of those gems that are allowed by the Gemfile and that all work together.
4. Once found, write them all down in the lock file and install them all.
And that's how bundle install works.
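Step 1 is possible because a Gemfile is plain Ruby. Bundler evaluates it against its own DSL object; the sketch below shows the idea with a hypothetical MiniGemfile class that only understands source and gem, so each gem line simply calls a method that records the dependency:

```ruby
# Hypothetical mini-DSL: each line of a Gemfile is a method call on this object.
class MiniGemfile
  attr_reader :sources, :dependencies

  def initialize
    @sources = []
    @dependencies = {}
  end

  # source "https://rubygems.org"
  def source(url)
    @sources << url
  end

  # gem "name", "requirement", ... (no requirement means any version)
  def gem(name, *requirements)
    @dependencies[name] = requirements.empty? ? [">= 0"] : requirements
  end

  # Evaluate Gemfile text with self set to a fresh MiniGemfile.
  def self.evaluate(text)
    dsl = new
    dsl.instance_eval(text, "Gemfile", 1)
    dsl
  end
end

gemfile = MiniGemfile.evaluate(<<~GEMFILE)
  source "https://rubygems.org"
  gem "rack", "~> 1.0"
  gem "thin"
GEMFILE

gemfile.dependencies.each { |name, reqs| puts "#{name}: #{reqs.join(', ')}" }
# rack: ~> 1.0
# thin: >= 0
```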
Bundle install uses RubyGems under the covers
to do the installation, and so every
bundle is its own little isolated RubyGems install.
Every application has its own RubyGems, thanks to Bundler.
The next step is bundle exec.
This is how we use that application's dedicated gems
instead of the global set with whatever is in it
because you ran gem install last year.
The way bundle exec works is:
1. Read the Gemfile, and the lock file if it's there.
2a. Use the locked gems if possible, OR
2b. Find versions that work together, like install would,
except bundle exec doesn't do any installing.
3. Delete any existing gems from the LOAD_PATH.
4. Add the exact version of each gem to the LOAD_PATH.
That's it. That's all bundle exec does.
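Steps 3 and 4 come down to rewriting the load path. In real Bundler this happens behind require "bundler/setup", with much more care; the toy below uses a hypothetical exec_load_path helper and hypothetical paths, and returns a new load path instead of touching the real $LOAD_PATH so the result is easy to inspect:

```ruby
# Hypothetical helper: build the load path a bundle exec-style run would use.
def exec_load_path(load_path, gems_dir, locked)
  gem_prefix = File.join(gems_dir, "") # e.g. "/opt/gems/"

  # Step 3: drop every entry that belongs to some installed gem.
  non_gem_dirs = load_path.reject { |dir| dir.start_with?(gem_prefix) }

  # Step 4: add exactly the locked version of each gem, in front.
  locked_libs = locked.map { |name, version| File.join(gems_dir, "#{name}-#{version}", "lib") }

  locked_libs + non_gem_dirs
end

# Hypothetical starting point: a stray newer rack is already on the load path.
before = ["/opt/gems/rack-1.1.0/lib", "/usr/lib/ruby/3.3.0"]
after = exec_load_path(before, "/opt/gems", { "rack" => "1.0.1", "thin" => "1.2.7" })
p after
# => ["/opt/gems/rack-1.0.1/lib", "/opt/gems/thin-1.2.7/lib", "/usr/lib/ruby/3.3.0"]
```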
Once all the gems work together, and
their exact versions are in the load path,
your application is happy. There are no
activation errors, all your requires succeed, I hope.
Everything is pretty great.
As I think I promised in the abstract for this talk,
here's a bundle exec removal pro tip.
I don't really like typing bundle exec, I find it
really annoying, but bundler provides a way
to not have to type it all the time.
And it's to create programs that map to
the RubyGems installation that
belongs to that application.
You can use the binstubs command,
bundle binstubs [some gem]
and it will create, in the bin directory,
a program for that gem, that only
runs the exact version that belongs to
that application. So if you have
rspec in your rails app, you can have
bin/rspec that will only load the rspec
for your app. This way one application's
bin/rspec can run rspec 3 while another application's
runs rspec 2. Rails has started to do this.
Rails 4 ships with bin/rails and bin/rake, which are scoped,
so when you run bin/rails, you get the exact
rails version for this application and not another one.
When you run bin/rake you get the exact version of rake.
Pretty cool, no more bundle exec.
You can check in these binstubs:
take bin/rspec, put it in git,
and it'll be mapped to that application forever,
so no one would have to type bundle exec
ever again if everyone did this.
Now we bundle install, all our gems
show up. We have versions
dedicated for individual applications.
But, as you've probably sensed
going through this history, that wasn't actually
the end. There were still problems
that showed up after Bundler came out.
The biggest problem that was left was
running bundle install took forever.
If you lived a long way from the United States,
it took a really long time.
I talked to some developers in South Africa
when I went there to give a talk
and they told me about how running
bundle install means they literally get
up to start making a cup of coffee
that they can finish before bundle install does.
To try and speed things up, bundler 1.1
created a completely different
way to get information from rubygems about gems.
And that sped things up by 50%, a big win.
We keep working on this, bundler 1.9 just
came out this month. There's a bunch more
improvements we're working on.
If you're interested in following along with that,
the Bundler website has news announcements
at bundler.io, and on Twitter we're @bundlerio.
Having said all of this, if you use Bundler,
I would totally love to have your help working on it.
It's an open source project.
We've dedicated a lot of time to making it easy
for people who don't know how to do open source
to help with Bundler, and to start working on Bundler,
and to get into open source that way.
It's a project at Github.com/bundler/bundler.
If you're interested but don't know where to start
email the bundler team at team@bundler.io
and we'll get you set up.
On the other hand, if you
have a job that means you have money,
but not time, join Ruby Together, and give
us money, and we'll work on Bundler, and it'll be
better. As Ruby Together grows, we will also be
tackling bigger community issues.
We want to add easy-to-use gem mirrors so you
don't have to go all the way to rubygems.org
for your office or data center, we want to
add better public benchmarks. There's a project
called ruby-bench that's starting to do that,
and we'd really like to expand it.
There's a bunch of other things
that Ruby Together is working on that are cool.
If you want Bundler or Ruby Together stickers,
I have a giant pile, so find me later.
That's it.
[Audience Applause]