Next talk: Chris and Holger are going to talk to us again about reproducible builds and tell us where they're up to.

Thanks very much. The outline of this talk comes from last year, when we realised there were a lot of questions. The rough plan is to quickly go over what reproducible builds are. I guess everyone is up to speed, but getting everyone on the same page would be a good idea. Then Holger's going to jump in and give the status update, and then we're going to talk about future work, questions, etc.

What is the actual problem we're solving here? You can always inspect the source code of free software for malicious flaws, or just flaws as well. Unfortunately, distributions provide precompiled binaries to end users. So can you actually trust that this compilation process has not introduced flaws of its own?

The problem is that going after developers seems to be a very effective way to go after end users, because if you infect a developer's machine you will then infect all the users of the software they generate. There are financial incentives. There always were, but even more so these days with mobile phones etc.

You can also have very subtle flaws. In this one in particular, there was a root exploit in OpenSSH made just by changing a compare-equal, that sort of assembler jump thing, and it gives you root with only a single bit difference in the binary. Which is not too shabby. Then you have all sorts of cute demos where you load up the source code in Vim and it just looks like 'Hello world', but when you compile it with GCC your kernel rootkit just goes 'oh, I'm going to give you a different file', and it self-replicates, and things like that. So it's difficult to trust the process.

And there's some recent history as well around XcodeGhost and iOS and adverts and things like that. You can Google those things. Really scary stuff. The last example actually comes from a CIA design paper from 2012, which was then found in the wild in 2014. So these exploits are actually happening.
People are targeting developers to get users. XcodeGhost had 20 million user installations. It was probably not the CIA or NSA, but we don't know who it was. There are many people who do these exploits in the wild.

Yeah, it's not just 'Here's this cute thing we can talk about'. It's actually happening.

The motivation is to ensure no flaws are introduced during the build process. We do this by ensuring the build always produces identical results. Then multiple parties do the same thing: I build it, you build it, your friends build it, etc. An attacker would need to infect everyone simultaneously, otherwise they'd be detected. For example, if my machine was compromised I would suddenly come up with a different result. I would come up with different binaries. And you'd be 'what's going on here?', and eventually we would discover that my machine was rootkitted etc.

You probably know it, but identical means bit-by-bit identical, as that is really the same. Yeah: bit by bit, SHA, MD5, whatever you want.

There are a bunch of challenges here. The biggest one is timestamps. A lot of software just loves to include timestamps everywhere: documentation, the `__DATE__` and `__TIME__` macros, just all over the place, in file names, things like that. Builds often vary by locale and timezone: different newlines, different sorting orders, for example collations. Different versions of libraries. I'm not sure what this refers to exactly. Moving on.

Non-deterministic file ordering: for example, shell globs are not really defined to be, I say not really defined, they aren't defined to come out in any particular order. Also the readdir call: it doesn't actually promise any particular ordering. Dictionary/hash key ordering: this is in things like Perl and Python, where you use a dictionary or a hash. If you iterate over the keys of that, it comes out in a non-deterministic order.
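
As a minimal sketch of the ordering problem in Python (the function names `source_files` and `render_manifest` are made up for illustration): `os.listdir` returns entries in arbitrary order, and iterating over a set of strings can give a different order from one interpreter run to the next, so a build step that wants stable output can sort explicitly.

```python
import os
import tempfile


def source_files(directory):
    """Return the file names in `directory` in a stable order.

    os.listdir() hands back entries in arbitrary order, so we sort them
    rather than trusting whatever the filesystem returns.
    """
    return sorted(os.listdir(directory))


def render_manifest(features):
    """Render one line per feature name, always in sorted order.

    `features` may be a set; the iteration order of a set of strings can
    differ between interpreter runs, so we never iterate over it directly.
    """
    return "".join(f"feature: {name}\n" for name in sorted(features))


if __name__ == "__main__":
    # Hypothetical file and feature names, used only for the demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("b.c", "a.c", "c.c"):
            open(os.path.join(tmp, name), "w").close()
        print(source_files(tmp))
    print(render_manifest({"ssl", "zlib", "ipv6"}), end="")
```

Sorting is only one possible choice here; any fixed, documented order would do, as long as every builder ends up with the same one.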
If your build system loops over such a hash or a dictionary, then the results from this build could be non-reproducible and non-deterministic. And also things like files created as part of the build process will just absorb stuff from the surrounding environment, like umask and all that kind of stuff that lives outside there. Build paths is a very interesting one, which we cover in greater detail on another slide. Also specifying the environment: we'll cover this one in the build info slides.

So not only are there privacy and security advantages in moving towards reproducible builds, there are also technical advantages. It's faster to build if you basically keep hitting the cache. I'm pretty certain this is why Google are interested in it. Because of the amount of compilation they do, they're just going to save a whole bucketload of money just by 'Oh, we don't need to rebuild this because it's the same SHA' etc.

It's very nice for testing revisions and changes. I use all our tools when doing QA uploads or NMUs: you rebuild a package and then you compare it to the previous one. The only things that have changed should be the things that you've changed; there hasn't been all sorts of random other nonsense being reordered, with timestamps added. You can get rid of all that noise and just be 'oh yeah, brilliant, I can see that the patch I've applied here has actually changed the behaviour of the program', and only that. It hasn't done all sorts of weird, weird stuff. So you have safer uploads in that sense.

Speaking of safety, a reproducible build won't go talking to the Internet like a lot of modern package managers like to do. Maven-style ones. Also, a reproducible build will typically not have any non-deterministic failure modes. So there's a lot of tests and test suites in Debian that will try and test things like 'Oh, is this algorithm N squared or bigger than N?'.
And it will try doing that by running some sort of benchmark and fail if it doesn't meet some sort of arbitrary time difference, and that's obviously not reliable. So we get rid of all those nonsense things.

It also finds bugs in really weird locales. We build in French, Swiss-French, and it just comes up with all sorts of nonsense. Or timezones: if you build in UTC-12 then this date library doesn't work anymore, and it's like 'you had one job, to be a date library'. [audience laughter] It's pretty scary, and there are some pretty cute bugs. It also detects if your machine is, you just have a broken ??? [8:28].

We build a year and a month in the future. You find things like the maintainer has added a pre-generated SSL certificate to their tests and it expires within the year. And so it breaks. We're preemptively detecting that fail to ??? [9:01] source.
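
Going back to the challenges listed earlier (timestamps, file ordering, values absorbed from the environment such as umask) and to the idea that several parties build the same thing and compare the results bit by bit, here is a rough Python sketch. It is an illustration only, not the tooling used in Debian: the functions `build_archive`, `digest` and `builds_agree`, the file names and the builder names are all hypothetical, and pinning the timestamp to a fixed value is just one assumed way of taking "now" out of the output.

```python
import hashlib
import io
import tarfile


def build_archive(files, mtime=0):
    """Pack `files` (a name -> bytes mapping) into an uncompressed tar.

    Everything that could vary between builders is pinned explicitly.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name in sorted(files):        # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime            # fixed timestamp instead of "now"
            info.mode = 0o644             # fixed permissions, not umask-derived
            info.uid = info.gid = 0       # no builder-specific ownership
            info.uname = info.gname = ""
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def digest(artifact):
    """SHA-256 of the artifact bytes, as a hex string."""
    return hashlib.sha256(artifact).hexdigest()


def builds_agree(digests):
    """True if every builder in `digests` (builder -> digest) reported the same value."""
    return len(set(digests.values())) == 1


if __name__ == "__main__":
    # Two hypothetical builders see the same files in a different order.
    mine = build_archive({"b.txt": b"two", "a.txt": b"one"})
    yours = build_archive({"a.txt": b"one", "b.txt": b"two"})
    print(builds_agree({"me": digest(mine), "you": digest(yours)}))
```

If one builder's digest differs from the rest, that does not say which machine is at fault, only that someone needs to look, which is the detection property described in the talk.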