(Theme Music)

This talk is about how Bundler works. How does Bundler work? This is an interesting question, and we'll talk about it for a while. This talk is a brief history of dependency management in Ruby, a discussion of how libraries and shared code work now and how they worked in the past, because how it works now is a direct result of how it used to work and of trying to fix the problems that happened then.

Before we get started, let me introduce myself. My name is Andre Arko, I'm @indirect on all social media, and that's my avatar; maybe you've seen me on a webpage somewhere. As my day job I work at Cloud City Development doing Ruby, Rails, Ember, and web consulting. We do web and mobile development, and I mostly do architectural consulting and senior developer pairing and training. Talk to me if your company is interested. I also founded Ruby Together, a non-profit; it's like npm, Inc. without the venture capital. Ruby Together is a trade association that takes money from companies and people who use Ruby, Bundler, RubyGems, and all of the public infrastructure that Rubyists use, and pays developers to work on that so that RubyGems.org stays up and people can have gems, which is pretty cool. As part of my work for Ruby Together I lead the Bundler team. I've been working on Bundler since before 1.0 came out, and I've been team lead for the last four years.

Using Ruby code written by other developers is actually really easy nowadays: you add a line to your Gemfile, go to your terminal and run bundle install, and start using it. Pretty cool, that's really easy. The thing that I've noticed, talking to people who use Bundler and think it's awesome, is that it's not actually clear what just happened. Based on the text printed out by bundle install, it seems like something got downloaded and something got installed, but it's not clear what got downloaded or installed, or where it happened. What exactly happened there?
Nobody is really sure. How does just putting a line in your Gemfile mean you can just start using somebody else's code? To explain that, we'll need a little bit of history. We're going to go back in time. I'm going to give you a tour from the beginning of sharing code in Ruby up until now, and hopefully by the end of it you'll understand why things work the way they do now. I'm going to start by talking about require, which came with the very first version of Ruby ever, in 1994, then talk about setup.rb from 2000, then RubyGems from 2003, and Bundler from 2009. And that's what we're still using today.

The require method has been around since 1994, with the very first version of Ruby. What I should say is that it's been there since at least 1997, since that's the oldest version-controlled Ruby we have. It was probably there before that, though. Require can be broken down into even smaller concepts. Using code from a file is basically the same as inserting that code and having Ruby run it as if you'd just written it in the file. It's actually possible to implement it yourself with a one-line function. This function says: I have a file name and I want to require it, so you read the file into memory as a string, you pass the string to eval, and Ruby runs it, and it's just like you typed that code yourself.

There are problems with this, and require doesn't work this way in real life. I'm sure it's totally fine that this will run that same piece of code over and over if you require it over and over. You like having lots and lots of constants that keep getting redefined; I'm sure it's totally fine. Working around that is pretty straightforward: just keep track of what you've required in an Array, and don't require something again if it's already been required.
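The slides are not part of this transcript, so here is a minimal sketch of the two functions being described: a naive require built on eval, and a version that remembers what it has already loaded. The names naive_require, tracked_require, and REQUIRED_FILES are made up for illustration, and this is not how Ruby's real require is implemented.

```ruby
require "tmpdir"

# Naive version: read the file and run it as if it had been typed in place.
def naive_require(path)
  eval(File.read(path), TOPLEVEL_BINDING, path)
end

# Hypothetical list of everything that has been required so far.
REQUIRED_FILES = []

# Tracked version: skip files that have already been required.
def tracked_require(path)
  return false if REQUIRED_FILES.include?(path)
  naive_require(path)
  REQUIRED_FILES << path
  true
end

# Demo: a throwaway file that counts how many times it gets run.
$run_count = 0
DEMO_FILE = File.join(Dir.mktmpdir, "counter.rb")
File.write(DEMO_FILE, "$run_count += 1\n")

2.times { naive_require(DEMO_FILE) }
NAIVE_RUNS = $run_count # the naive version ran the file both times

$run_count = 0
RESULTS = [tracked_require(DEMO_FILE), tracked_require(DEMO_FILE)]

puts "naive runs: #{NAIVE_RUNS}, tracked runs: #{$run_count}"
```

Like the real require, the tracked version returns true the first time and false when the file was already loaded.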
As you can see here, you set up an Array, you check to see if the Array already contains the filename that just got passed in, and if it hasn't been required, you do the same thing we did before: read the file in, pass it to eval, and then add it to the Array so it's not required again later. In fact, this is exactly what Ruby does, but written in C, not in Ruby. There is a $LOADED_FEATURES global variable, it's an Array, and it contains a list of all the required files. If you want to know whether you've required something yet, check the $LOADED_FEATURES Array.

There is one more problem with this: it only works when you pass in absolute paths. I'm sure you don't mind typing the full path from wherever you are to exactly wherever the file you want to require is. I'm sure that's fine too. The easiest way to allow requires that aren't absolute is to treat all requires as if they're relative to the path where you started the Ruby program. That's easy, but it doesn't help a lot if you want to require Ruby files from different places. Say you have a folder full of a library you wrote and a folder full of an application you wrote, and you want to use the library from the app. You can't, because writing relative paths from wherever you started the Ruby program would be terrible.

Instead, we create an Array that holds the list of paths we want to load Ruby files from. In a burst of creativity, I'm just going to call that variable the LOAD_PATH, and here's an implementation. If you put a directory in the LOAD_PATH Array, you can then pass require a path relative to any directory that's in the LOAD_PATH Array, and it will look for the file. If you require "foo", it will look for a file named "foo" inside each of the LOAD_PATH directories, searching the LOAD_PATH in order from first to last, and the first one we find is the one we require.
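The slide with that implementation is not in the transcript, so this is a rough sketch of the lookup just described. MY_LOAD_PATH, find_in_load_path, and path_require are hypothetical names, and the sketch assumes every file ends in .rb, which Ruby's real require does not.

```ruby
require "tmpdir"

# Hypothetical stand-in for the load path Array described in the talk.
MY_LOAD_PATH = []

# Search the directories in order and return the first matching file, or nil.
def find_in_load_path(name)
  MY_LOAD_PATH.each do |dir|
    candidate = File.join(dir, "#{name}.rb")
    return candidate if File.file?(candidate)
  end
  nil
end

# Require by relative name: look the file up, then evaluate it.
def path_require(name)
  path = find_in_load_path(name)
  raise LoadError, "cannot load such file -- #{name}" unless path
  eval(File.read(path), TOPLEVEL_BINDING, path)
  true
end

# Demo: two directories both contain foo.rb; the first one on the path wins.
LIB_A = Dir.mktmpdir
LIB_B = Dir.mktmpdir
File.write(File.join(LIB_A, "foo.rb"), "$foo_from = 'a'\n")
File.write(File.join(LIB_B, "foo.rb"), "$foo_from = 'b'\n")
MY_LOAD_PATH.push(LIB_A, LIB_B)

path_require("foo")
puts $foo_from # prints "a"
```

Order matters: swapping the two directories in MY_LOAD_PATH would load the other foo.rb instead.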
Coincidentally, this is exactly what Ruby does. There is a global variable named $LOAD_PATH, and if you put a string that contains a path to a directory in it, Ruby will look in that directory for a file with that name whenever you require something. You can totally use the $LOAD_PATH to require files from somewhere else while you're working with them. Of course, the LOAD_PATH and LOADED_FEATURES can be combined, but that didn't fit on a single slide, so I'll leave that as an exercise for the listener. It's pretty straightforward, to be honest.

Load paths are pretty cool. They allow us to load Ruby files even if they're spread across multiple places. At this point, we could even automatically add the directory that holds the standard library to the load path at the start of every script, and then all of the files that are part of the Ruby standard library, like Net::HTTP, Set, and the cool things that come with Ruby, could just be available for require automatically, and you wouldn't have to worry about putting them in the load path yourself. That's exactly what Ruby does: the standard library is on the load path when Ruby starts. It's pretty great.

This was cool, and for several years this was enough. People just added things to the load path, or wrote scripts that added things to the load path before their actual script ran. The thing that got tedious about just having load paths is that if you want to get code from someone else, you have to find that code, download it, put it somewhere, remember where, put it in the load path, and then require it.

Setup.rb happened next. Around the year 2000, everyone was still installing shared Ruby code by hand. That wasn't so much fun. A Japanese Ruby developer, Minero Aoki, wrote setup.rb, and amazingly, even though this was created in 2000, setup.rb is still around on the Internet.
The website for this developer is i.loveruby.net, which is pretty cool, and you can even download setup.rb, but to be honest, it hasn't been updated since 2005, so I'm not sure it's super helpful to you. How did setup.rb work? At its core it mimicked the classic UNIX installation pattern: downloading a piece of software, decompressing it, and then running configure, make, and make install. setup.rb kind of copied that for Ruby. You would run ruby setup.rb config, ruby setup.rb setup, and ruby setup.rb install. There was a specific directory structure, kind of like a gem today, with library files, bin files you could run as programs, and support files, and setup.rb would copy all of those files into a directory that was already in the load path, called site_ruby, which held the Ruby files you had installed that were specific to your computer.

After setup.rb, using Ruby libraries was much easier than it had been. You could find a library online, download it, untar it, and run ruby setup.rb, all by hand, but then it was installed: no more manual copying, no more having to manage all these files. Everything was in the load path, and you could just require it after setup.rb ran.

After a little while, some of the shortcomings of this scheme became apparent, too. There were no versions for any libraries, and after you ran setup.rb there wasn't even a way to tell what version you had, unless you wrote it down or the library author was really nice and put the version into the code somehow. There was no way to uninstall, because everything was thrown into the same directory. You'd run setup.rb for five different Ruby libraries, and now all of their files are in one directory. Good luck figuring out which ones belong to which. If you delete the wrong one, too bad.
Upgrading was super fun. If there was a new version of the library (and good luck finding that out), you had to remember the website you got it from in the first place. I hope you wrote all these down. I hope you've written down every website you've ever downloaded Ruby code from. You have to go back to that website and remember which version you have, which, as I said before, you can't tell unless you wrote it down. Then you have to download the tarball with the new version, decompress it, cd into it, run ruby setup.rb on it all, and hope that the new version didn't delete any files, because the old files are still there.

This was tedious, really tedious. People frequently had no idea what was actually happening with their libraries. It was not uncommon for people to be like, "Oh, this doesn't work, I'll just fix it in my site_ruby directory. OK, everything is great now." Super awesome.

At some point, some people were like, this is not great. What if you could just gem install? That would be cool. And so in 2003, RubyGems came to the rescue and fixed all of the known problems with setup.rb. You could check whether a library was installed by running gem list, install a gem with gem install, and uninstall gems with gem uninstall. RubyGems kept each of these libraries in a different directory. You knew which libraries you had, and how to uninstall them and install new versions, all with one command. No having to find it on the internet somewhere, download it, unpack it, and setup.rb it.

And RubyGems had another super cool trick up its sleeve: versions. RubyGems actually kept each version of each gem in a different place. You could install multiple versions of the same library, and they could all be in your Ruby because they didn't all go into one giant folder; they went into their own separate folders. Folders for rails 4.1, 4.2, and 5.0.
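Because each version lives in its own folder, choosing a version comes down to choosing which folder goes on the load path. The sketch below is a toy model of that idea only. The toyrack library, the folder layout, and the install_fake_gem and activate helpers are all invented for illustration and are not RubyGems' actual layout or code.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical layout: <root>/<name>-<version>/lib, one folder per version.
GEM_ROOT = Dir.mktmpdir

# Create a fake library whose only job is to record which version was loaded.
def install_fake_gem(name, version)
  lib = File.join(GEM_ROOT, "#{name}-#{version}", "lib")
  FileUtils.mkdir_p(lib)
  File.write(File.join(lib, "#{name}.rb"), "$#{name}_version = '#{version}'\n")
  lib
end

# Put exactly one version's lib folder at the front of the real load path.
def activate(name, version)
  lib = File.join(GEM_ROOT, "#{name}-#{version}", "lib")
  raise LoadError, "#{name} #{version} is not installed" unless File.directory?(lib)
  $LOAD_PATH.unshift(lib)
  lib
end

# Two versions installed side by side without overwriting each other.
install_fake_gem("toyrack", "1.0.0")
install_fake_gem("toyrack", "1.1.0")

# Pick the older one, then use the ordinary require.
activate("toyrack", "1.0.0")
require "toyrack"
puts $toyrack_version # prints "1.0.0"
```

The ordinary require never learns about versions here; it simply finds whichever folder was placed on the load path.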
To make this work, because require doesn't inherently support versioning, RubyGems added a gem method that lets you say, "I need version 1.0 of rack," and RubyGems will check to make sure it's installed and put that directory, just the one with rack 1.0, into your load path. So when you run require "rack" you'll get rack 1.0. It's pretty cool. Calling the gem method told RubyGems you wanted to manipulate the load path to load exactly the version you knew your code wanted to talk to. It was pretty useful.

RubyGems also has a way to support versioning even in commands that come with gems. The rack gem comes with the rackup command, and if you have multiple versions of rack installed, the rackup command could run any of those versions. RubyGems defaults to the newest version you have installed, hoping the newest is the right one. But if it's not, RubyGems checks the first argument to the command for something with underscores on either side and takes that as the version number you want to use. In this example, rackup _1.2.2_, we're running rackup from rack version 1.2.2, and only 1.2.2. If you don't have that version installed, RubyGems will make you install that version first.

RubyGems was really, really successful. Ruby grew in popularity a lot, but RubyGems made sharing Ruby code grow a lot. Today we have 100,000 gems with 1,000,000 versions. That's a lot of shared Ruby code.

You probably knew this was coming, but as cool as RubyGems is, it still had some problems. If you have multiple applications that all use RubyGems to load their dependencies, this can be problematic. It's hard to coordinate across multiple applications, because each installation of Ruby itself just has one set of gems. If you ran gem install, now there are all these gems.
If one developer runs gem install foo and starts using foo in their application, commits that code and checks it in, and the next person checks it out and tries to run the application, it's going to explode, because it doesn't know what foo is, and you need to fix that. This led to an era of purely manual dependency management. Start a new job, hooray! This literally happened to me in 2008. New job, welcome to the team, here's your cool new laptop, we expect you to have the application running by next week. It actually took me only three and a half days, working overtime on this. It was amazing. [Audience Laughs]

To figure out which gems to gem install, I looked in the README, and there was a list, and I installed all of them. But clearly there were some that people forgot to put in the README, and then it kind of worked, but I wasn't able to get images working. And then some other developer was like, "You need to install ImageMagick." This was before Homebrew. It was terrifying.

So do we just put the gems in the README? How do we know if we have written everything in the README? "I don't know. Try it?" Of course, you'd need a new machine to try it on, because after three years of using Ruby you have generally installed every gem, and you have no idea what's important and what's not. It's terrible.

People started to work on tools to help with this problem. Rails added config.gem; this is the Rails 2.3, 2.4 era. You would put all the gems you need in application.rb. This was super helpful if you needed to know for sure that this was the master list of all the gems you needed in your application, but you could only access that list when Rails was already loaded. It was pretty bad. Because RubyGems automatically uses the newest version of each gem, just having an older version installed didn't mean it would be used. And if you installed some gem a month after the other person did, maybe there's a new version? You would just get the new version automatically.
This is also totally a real-life experience that happened to me in 2009. Debug a production server that just randomly throws exceptions. For three days. The other production servers are fine. We can't reproduce this problem on a single developer laptop. What is going on? This is so weird. After three days I finally thought to look at the output from gem list for the entire production machine, and I was like, oh, this production server has version 1.1.3 of a gem, and every other production server and developer laptop has 1.1.4. That was the problem. There was a bug in that version, and only that server had it.

And then, like I was saying about Rails versions, you could gem install rails, be happy, make a new app, run your server, and everything is great. And then you switch to another application that already existed and wasn't written to use that version of Rails; it was written to use some older version of Rails. You're like, "Okay, let's go!" Boom, because you didn't have the right version of Rails. If you put your Rails version in the Rails config, Rails would complain that you had the wrong version, but Rails had to start up successfully to tell you that you had the wrong version, so it didn't actually help.

Ultimately, it was a significant part of my job to figure this shit out by hand, and it sucked. Depending on what you did on your team, some people on my team at the time spent a quarter or a third of their time doing nothing but figuring out and fixing dependency management issues. And I felt really, really bad for them. Sometimes it was me, and I felt really bad for me.

Then there's one more. Even after you've done all of this management by hand, there's one more problem that RubyGems has, and it's another reason why Bundler was created: activation errors. They happen in RubyGems when you load an application and start requiring gems, because RubyGems will load the newest versions of those gems that it can.
Sometimes a gem depends on other gems, which depend on other gems, and you'll get the newest version of the child gem. And later you'll say, "I also need this gem," but that gem won't work with the other. So how common can this be, really? Unfortunately, it was super common. Not happens-to-you-every-day common, but it happens to you two or three times a year, and when it does you basically tear all your hair out, delete your entire Ruby install, and reinstall all your gems, because figuring out exactly which combination of installed gems was causing the problem was a total nightmare.

This is a real-life activation error. I salvaged this from a presentation I gave in 2010 about why Bundler exists. This is a Rails app, it's loading, and Rails of course depends on ActionPack (this was the Rails 2.3 era). ActionPack depends on Rack, a gem that helps Rails talk to web servers. And thin, which is a web server, also depends on Rack. So Rack is how Rails talks to thin and how thin talks to Rails, but there's a problem. thin is perfectly happy to use rack 1.1, which makes some changes to how Rack works. ActionPack is not happy to use rack 1.1, and can only use rack 1.0. And so when you run your server, it loads thin first, because thin is the server, and thin loads the newest rack it can, which is 1.1. Then thin gets to work trying to load the Rails app, and your Rails app says, "I can't use that rack, sorry."

The reason this happens is runtime resolution. RubyGems figures out which versions of which gems it should load after it is already running. You say, "Hey, I need a thing," and it's like, "Okay, this version might work." And if later on you say, "I need a thing that doesn't work with things you've already done," RubyGems just has to be like, "Can't fix that." The fix for this problem is to figure out all the versions before you run your application. You have to know that the versions you're going to use are all versions that can work together.
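To make the difference concrete, here is a small sketch that contrasts the two approaches on a toy version of the thin and ActionPack example. The index, the version numbers, and the requirement strings are illustrative assumptions rather than real gem metadata. The activate and resolve functions are hypothetical, and neither is the actual RubyGems or Bundler algorithm. Only Gem::Version and Gem::Requirement are real RubyGems classes.

```ruby
require "rubygems"

# Toy index: gem name => version => dependencies (all values are made up).
INDEX = {
  "thin"       => { "1.2.0" => { "rack" => ">= 1.0.0" } },
  "actionpack" => { "2.3.0" => { "rack" => "~> 1.0.0" } },
  "rack"       => { "1.1.0" => {}, "1.0.0" => {} }
}

class ActivationError < StandardError; end

# Runtime style: take the newest version that fits this one request, and
# fail if a version that was activated earlier does not fit.
def activate(name, requirement, activated)
  req = Gem::Requirement.new(requirement)
  if activated.key?(name)
    return if req.satisfied_by?(Gem::Version.new(activated[name]))
    raise ActivationError,
          "already activated #{name} #{activated[name]}, but #{name} #{requirement} is required"
  end
  versions = INDEX.fetch(name).keys.map { |v| Gem::Version.new(v) }
  version = versions.select { |v| req.satisfied_by?(v) }.max
  raise ActivationError, "no version of #{name} matches #{requirement}" unless version
  activated[name] = version.to_s
  INDEX[name][version.to_s].each { |dep, dep_req| activate(dep, dep_req, activated) }
end

# Install-time style: a tiny backtracking search that tries newer versions
# first and backs up whenever a later requirement cannot be met.
def resolve(requirements, chosen = {})
  return chosen if requirements.empty?
  (name, req_string), *rest = requirements
  req = Gem::Requirement.new(req_string)
  if chosen.key?(name)
    return req.satisfied_by?(Gem::Version.new(chosen[name])) ? resolve(rest, chosen) : nil
  end
  candidates = INDEX.fetch(name).keys.sort_by { |v| Gem::Version.new(v) }.reverse
  candidates.each do |version|
    next unless req.satisfied_by?(Gem::Version.new(version))
    result = resolve(rest + INDEX[name][version].to_a, chosen.merge(name => version))
    return result if result
  end
  nil
end

# Runtime activation loads thin first, picks rack 1.1.0, then fails.
begin
  activated = {}
  activate("thin", ">= 0", activated)
  activate("actionpack", ">= 0", activated)
rescue ActivationError => e
  $runtime_error = e.message
end

# Resolving everything up front backtracks from rack 1.1.0 to rack 1.0.0.
RESOLVED = resolve([["thin", ">= 0"], ["actionpack", ">= 0"]])

puts "runtime:  #{$runtime_error}"
puts "resolved: #{RESOLVED.inspect}"
```

Given the same toy index, the runtime approach fails while the up-front search settles on rack 1.0.0, which both thin and actionpack accept.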
Resolving things at install time means knowing you're installing versions that work together. How do we make sure all the versions we're installing work together? That's actually where Bundler comes in. Before Bundler, the process of figuring out which gems would work together was done entirely by hand, and it consisted of: gem uninstall, gem install a slightly older version, does Rails start up yet? Repeat the process. When the exceptions stopped, you knew you'd won.

Unsurprisingly, computers are faster at this than people. Computers are also good and accurate at trying many, many, many options until one works. This is what Bundler does. Bundler figures out the entire list of every gem and every version of every gem that you need, such that they all work together with one another. This is called dependency graph resolution, and there's an entire academic literature about it. It's a well-known hard problem, part of the set of problems called NP-complete. And the totally fantastic thing, and I say this as a person who has to fix Bundler when it doesn't work, is that in theory you can construct a set of gems in a Gemfile such that it is not possible to find a set of versions that work together until after the heat death of the universe. [Audience Laughs]

Most of the time we don't have that long to wait. We use a lot of tricks, shortcuts, and heuristics to figure out which gems to try first and hopefully finish before you've drunk that cup of coffee or whatever. We have built up a large set of tricks over the years, and most Gemfiles resolve in less than 10 seconds. Which is pretty cool, considering the upper bound on that is practically infinity.

Because this problem was really hard, we don't want to solve it over and over. After finding versions that work together, Bundler writes down the exact versions of every gem that did all work together, so they can be reused by other people who are also interested in running your application. That file is called Gemfile.lock.
It shows which gems to install and which versions to install, and as a bonus the lock file is what makes it possible to install the exact same version of every gem on every machine that's running this application. That means when you develop on your laptop, you get whatever version of the gem was newest when you were developing, because you ran bundle install and got the newest version by default. Because of the lock file, when you put that on your production server, you're guaranteed to have the same versions. And you won't have to spend three days figuring out why that production server doesn't quite work all of the time. It's pretty great.

Fundamentally, the core of Bundler consists of two steps: bundle install and bundle exec. The steps for bundle install are simple. They're totally understandable in plain English, and they fit on a single slide, which is great. I edited this slide for ten minutes deleting words. So the steps are:

1. Read the Gemfile.
2. Ask RubyGems.org for a list of all the gems we need.
3. Find versions of those gems that are allowed by the Gemfile and that work together.
4. Once found, write them all down in the lock file and install them all.

And that's how bundle install works. bundle install uses RubyGems under the covers to do the installation, so every bundle is its own little isolated RubyGems install. Every application has its own RubyGems, thanks to Bundler.

The next step is bundle exec. This is how we use that application's dedicated RubyGems instead of the one with whatever is in it because you ran gem install last year. The way bundle exec works is:

1. Read the Gemfile, and the lock file if it's there.
2. Use the locked gems if possible, or find versions that work together like install would, except that bundle exec doesn't do any installing.
3. Remove any existing gems from the LOAD_PATH.
4. Add each gem at its exact version to the LOAD_PATH.

That's it. That's all bundle exec does. Once all the gems work together and their exact versions are in the load path, your application is happy.
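The load path part of bundle exec can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Bundler's code: the lock text is a cut-down stand-in for a real Gemfile.lock, and GEMS_DIR, lib_dir, parse_lock, and locked_load_path are hypothetical names. It also works on a plain Array so that it does not touch the real $LOAD_PATH.

```ruby
require "tmpdir"

# Hypothetical location where gems are installed, one folder per version.
GEMS_DIR = Dir.mktmpdir

def lib_dir(name, version)
  File.join(GEMS_DIR, "#{name}-#{version}", "lib")
end

# Step 1, heavily simplified: read "name (version)" lines from lock-style text.
def parse_lock(text)
  text.scan(/^[ \t]*([\w-]+) \(([\d.]+)\)[ \t]*$/).to_h
end

# Steps 3 and 4: drop every gem directory from the path, then add back
# only the exact versions named in the lock.
def locked_load_path(load_path, locked)
  kept = load_path.reject { |dir| dir.start_with?(GEMS_DIR) }
  locked.map { |name, version| lib_dir(name, version) } + kept
end

LOCK_TEXT = <<~LOCK
  rack (1.0.0)
  thin (1.2.0)
LOCK

# A made-up starting load path: a newer rack someone installed last year,
# plus a directory standing in for the standard library.
STDLIB_DIR = "/hypothetical/ruby/stdlib"
OLD_PATH = [lib_dir("rack", "1.1.0"), STDLIB_DIR]

LOCKED = parse_lock(LOCK_TEXT)
NEW_PATH = locked_load_path(OLD_PATH, LOCKED)

puts NEW_PATH
```

The stray rack 1.1.0 entry disappears and only the locked versions remain ahead of the standard library. In a real application the equivalent setup is usually triggered with require "bundler/setup" rather than written by hand.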
There are no activation errors, and all your requires succeed, I hope. Everything is pretty great.

As I think I promised in the abstract for this talk, here's a pro tip for getting rid of bundle exec. I don't really like typing bundle exec; I find it really annoying. But Bundler provides a way to not have to type it all the time, and it's to create programs that map to the RubyGems installation that belongs to that application. You can use the binstubs command, bundle binstubs [some gem], and it will create, in the bin directory, a program for that gem that only runs the exact version that belongs to that application. So if you have rspec in your Rails app, you can have bin/rspec that will only load the rspec for your app. This way bin/rspec in one application can refer to rspec 3, while another application can have rspec 2. Rails has started to do this. Rails 4 ships with bin/rails and bin/rake, which are scoped so that when you run bin/rails, you get the exact Rails version for this application and not another one. When you run bin/rake, you get the exact version of rake. Pretty cool, no more bundle exec. You can check in these binstubs, so you can take bin/rspec, put it in git, and it'll be mapped to that application forever. No one would have to type bundle exec ever again if everyone did this.

Now we bundle install, and all our gems show up. We have versions dedicated to individual applications. But, as you've probably sensed going through this history, that wasn't actually the end. There were still problems that showed up after Bundler came out. The biggest problem that was left was that running bundle install took forever. If you lived a long way from the United States, it took a really long time. I talked to some developers in South Africa when I went there to give a talk, and they told me that running bundle install means they literally get up and make a cup of coffee that they can finish before bundle install does.
To try and speed things up, Bundler 1.1 created a completely different way to get information about gems from RubyGems.org. And that sped things up by 50%, a big win. We keep working on this; Bundler 1.9 just came out this month. There are a bunch more improvements we're working on. If you're interested in following along, the Bundler website has news announcements at bundler.io, and on Twitter we're @bundlerio.

Having said all of this, if you use Bundler, I would totally love to have your help working on it. It's an open source project. We've dedicated a lot of time to making it easy for people who don't know how to do open source to help with Bundler, to start working on Bundler, and to get into open source that way. The project is at Github.com/bundler/bundler. If you're interested but don't know where to start, email the Bundler team at team@bundler.io and we'll get you set up.

On the other hand, if you have a job that means you have money but not time, join Ruby Together and give us money, and we'll work on Bundler, and it'll be better. As Ruby Together grows, we will also be tackling bigger community issues. We want to add easy-to-use gem mirrors so you don't have to go all the way to rubygems.org from your office or data center, and we want to add better public benchmarks. There's a project called ruby-bench that's starting to do that, and we'd really like to expand it. There are a bunch of other cool things that Ruby Together is working on. If you want Bundler or Ruby Together stickers, I have a giant pile, so find me later. That's it. [Audience Applause]