WEBVTT

99:59:59.999 --> 99:59:59.999
Next talk

99:59:59.999 --> 99:59:59.999
Chris and Holger are going to talk to us again

99:59:59.999 --> 99:59:59.999
about reproducible builds and tell us where they're up to.

99:59:59.999 --> 99:59:59.999
Thanks very much.

99:59:59.999 --> 99:59:59.999
The outline of this talk is, from last year we realised there were a lot of questions.

99:59:59.999 --> 99:59:59.999
The rough plan is to quickly go over what reproducible builds are.

99:59:59.999 --> 99:59:59.999
I guess everyone is up to speed

99:59:59.999 --> 99:59:59.999
but getting everyone on the same page would be a good idea.

99:59:59.999 --> 99:59:59.999
Then Holger's going to jump in and give the status update

99:59:59.999 --> 99:59:59.999
and then we're going to talk about future work, questions etc.

99:59:59.999 --> 99:59:59.999
What is the actual problem we're solving here?

99:59:59.999 --> 99:59:59.999
You can always inspect the source code of free software for malicious flaws

99:59:59.999 --> 99:59:59.999
or just flaws as well.

99:59:59.999 --> 99:59:59.999
Unfortunately distributions provide precompiled binaries to end users.

99:59:59.999 --> 99:59:59.999
So can you actually trust this compilation process has not

99:59:59.999 --> 99:59:59.999
introduced flaws of its own?

99:59:59.999 --> 99:59:59.999
The problem is it seems very effective: if you want to go after end users

99:59:59.999 --> 99:59:59.999
you can go after developers. Because if you go infect a developer's

99:59:59.999 --> 99:59:59.999
machine you will then infect all the users of the software they generate.

99:59:59.999 --> 99:59:59.999
Financial incentives. There always were but they are even more so these days

99:59:59.999 --> 99:59:59.999
with mobile phones etc.

99:59:59.999 --> 99:59:59.999
You can also have very subtle flaws. This one in particular: there was a

99:59:59.999 --> 99:59:59.999
root exploit in OpenSSH just by changing a compare equal.

99:59:59.999 --> 99:59:59.999
That sort of assembler jump thing and it gives you root

99:59:59.999 --> 99:59:59.999
but with only a single-bit difference in the binary.

99:59:59.999 --> 99:59:59.999
Which is not too shabby.

99:59:59.999 --> 99:59:59.999
Then you have all sorts of cute demos where you load up the source code in Vim

99:59:59.999 --> 99:59:59.999
and it just looks like 'Hello world' but when you compile it with GCC

99:59:59.999 --> 99:59:59.999
your kernel rootkit just goes 'oh I'm going to give you a different file'

99:59:59.999 --> 99:59:59.999
and self-replicates and things like that.

99:59:59.999 --> 99:59:59.999
Difficult to trust the process.

99:59:59.999 --> 99:59:59.999
And there's some recent history as well around XcodeGhost and iOS

99:59:59.999 --> 99:59:59.999
and adverts and things like that.

99:59:59.999 --> 99:59:59.999
You can Google those things. Really scary stuff.

99:59:59.999 --> 99:59:59.999
The last example is actually coming from a CIA design paper from 2012.

99:59:59.999 --> 99:59:59.999
Which was then found in the wild in 2014. So these exploits are actually happening.

99:59:59.999 --> 99:59:59.999
People are targeting developers to get users.

99:59:59.999 --> 99:59:59.999
XcodeGhost had 20 million user installations.

99:59:59.999 --> 99:59:59.999
It was probably not the CIA or NSA but we don't know who it was.

99:59:59.999 --> 99:59:59.999
There are many people who do these exploits in the wild.

99:59:59.999 --> 99:59:59.999
Yeah, it's not just 'Here's this cute thing we can talk about'.

99:59:59.999 --> 99:59:59.999
It's actually happening.

99:59:59.999 --> 99:59:59.999
The motivation is to ensure no flaws are introduced during the build process.

99:59:59.999 --> 99:59:59.999
We do this by ensuring the build always produces identical results.

99:59:59.999 --> 99:59:59.999
Then multiple parties do the same thing.

99:59:59.999 --> 99:59:59.999
I build it, you build it, your friends build it etc.

99:59:59.999 --> 99:59:59.999
And an attacker would need to infect everyone simultaneously

99:59:59.999 --> 99:59:59.999
otherwise they'd be detected. For example if my machine was compromised

99:59:59.999 --> 99:59:59.999
I would suddenly come up with a different result.

99:59:59.999 --> 99:59:59.999
I would come up with different binaries.

99:59:59.999 --> 99:59:59.999
And you'd be 'what's going on here' and eventually we would discover

99:59:59.999 --> 99:59:59.999
that my machine was rootkitted etc.

99:59:59.999 --> 99:59:59.999
You probably know it but identical means bit-by-bit identical.

99:59:59.999 --> 99:59:59.999
As that is really the same.

99:59:59.999 --> 99:59:59.999
Yeah, bit, SHA, MD5, whatever you want.

99:59:59.999 --> 99:59:59.999
There are a bunch of challenges here. The biggest one is timestamps.

99:59:59.999 --> 99:59:59.999
A lot of software just loves to include timestamps everywhere.

99:59:59.999 --> 99:59:59.999
Documentation, __DATE__ and __TIME__ macros.

99:59:59.999 --> 99:59:59.999
Just all over the place, in file names etc. Things like that.

99:59:59.999 --> 99:59:59.999
Builds often vary by locale and timezone.

99:59:59.999 --> 99:59:59.999
Different newlines, different sorting orders, for example collations.

99:59:59.999 --> 99:59:59.999
Different versions of libraries. I'm not sure what this refers to exactly.

99:59:59.999 --> 99:59:59.999
Moving on.

99:59:59.999 --> 99:59:59.999
Non-deterministic file ordering, for example shell globs are not really defined

99:59:59.999 --> 99:59:59.999
to be, I say not really defined, they aren't defined to come out in normal order.

99:59:59.999 --> 99:59:59.999
Also the readdir syscall, it doesn't actually promise any particular ordering.

99:59:59.999 --> 99:59:59.999
Dictionary/hash key ordering.
So this is in things like Perl and Python

99:59:59.999 --> 99:59:59.999
you use a dictionary or a hash. If you iterate over the keys with that it's a non-deterministic order.

99:59:59.999 --> 99:59:59.999
If your build system loops over such a hash or a dictionary

99:59:59.999 --> 99:59:59.999
then the results from this build could be non-reproducible and non-deterministic.

99:59:59.999 --> 99:59:59.999
And also things like files that are part of the build process will just absorb

99:59:59.999 --> 99:59:59.999
stuff from the surrounding environment like umask and all that kind of

99:59:59.999 --> 99:59:59.999
stuff that lives outside there.

99:59:59.999 --> 99:59:59.999
Build paths is a very interesting one which we cover in greater detail on another slide.

99:59:59.999 --> 99:59:59.999
Also specifying the environment, we'll also cover this one in the build info slides.

99:59:59.999 --> 99:59:59.999
So not only are there privacy and security advantages of using,

99:59:59.999 --> 99:59:59.999
moving towards reproducible builds, there are also technical advantages.

99:59:59.999 --> 99:59:59.999
It's faster to build if you basically keep hitting the cache.

99:59:59.999 --> 99:59:59.999
I'm pretty certain this is why Google are interested in it.

99:59:59.999 --> 99:59:59.999
Because of the amount of compilation they do

99:59:59.999 --> 99:59:59.999
they're just going to save a whole bucketload of money just by

99:59:59.999 --> 99:59:59.999
'Oh we don't need to rebuild this because it's the same SHA' etc.

99:59:59.999 --> 99:59:59.999
It's very nice to test revisions and changes. I use all our tools

99:59:59.999 --> 99:59:59.999
when doing QA uploads or NMUs: you rebuild a package

99:59:59.999 --> 99:59:59.999
and then you compare to the previous one. And as the only things that have changed

99:59:59.999 --> 99:59:59.999
should be the things that you've changed, there haven't been all sorts

99:59:59.999 --> 99:59:59.999
of random other nonsense being reordered with timestamps added.
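
The ordering, timestamp and environment problems listed above can be illustrated with a small Python sketch. It is not from the talk; the function names (`deterministic_tar`, `render_manifest`, `sha256_hex`) are hypothetical, and the choice of a tar archive as the build output is an assumption made only for illustration. The idea is to sort everything that the filesystem or a dictionary would otherwise order arbitrarily, and to pin metadata (mtime, mode, owner) that would otherwise be absorbed from the build machine.

```python
import hashlib
import io
import os
import tarfile
import tempfile


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest used to compare two build outputs."""
    return hashlib.sha256(data).hexdigest()


def render_manifest(fields: dict) -> str:
    """Render key=value lines in sorted key order, not dictionary order."""
    return "".join(f"{key}={fields[key]}\n" for key in sorted(fields))


def deterministic_tar(src_dir: str, fixed_mtime: int = 0) -> bytes:
    """Pack src_dir so the bytes depend only on file names and contents."""
    paths = []
    for root, dirs, files in os.walk(src_dir):
        dirs.sort()  # directory listing order is up to the filesystem
        paths.extend(os.path.join(root, name) for name in files)

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # sorted() compares code points, so the order ignores the locale
        for path in sorted(paths):
            name = os.path.relpath(path, src_dir).replace(os.sep, "/")
            info = tarfile.TarInfo(name=name)
            info.size = os.path.getsize(path)
            info.mtime = fixed_mtime       # no build-time timestamps
            info.mode = 0o644              # do not inherit the umask
            info.uid = info.gid = 0        # do not record the building user
            info.uname = info.gname = ""
            with open(path, "rb") as fh:
                tar.addfile(info, fh)
    return buf.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "hello.txt"), "w") as fh:
            fh.write("hello\n")
        first = sha256_hex(deterministic_tar(tmp))
        second = sha256_hex(deterministic_tar(tmp))
        print("identical" if first == second else "different")
```

The archive is left uncompressed on purpose: a gzip header carries its own timestamp field, which would need to be pinned separately.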

99:59:59.999 --> 99:59:59.999
You can get rid of all that noise and just be 'oh yeah brilliant I can see that

99:59:59.999 --> 99:59:59.999
the patch I've applied here has actually changed the behaviour of the program'

99:59:59.999 --> 99:59:59.999
and only that. It hasn't done all sorts of weird, weird stuff.

99:59:59.999 --> 99:59:59.999
So you have safer uploads in that sense.

99:59:59.999 --> 99:59:59.999
Speaking of safety, a reproducible build won't go talking to the Internet

99:59:59.999 --> 99:59:59.999
like a lot of modern package managers like to do. Maven-style ones.

99:59:59.999 --> 99:59:59.999
Also a reproducible build will typically not have any

99:59:59.999 --> 99:59:59.999
non-deterministic failure modes.

99:59:59.999 --> 99:59:59.999
So there's a lot of tests and test suites in Debian that will

99:59:59.999 --> 99:59:59.999
try and test things like 'Oh is this algorithm N squared or bigger than N'.

99:59:59.999 --> 99:59:59.999
And it will try doing that by running some sort of benchmark

99:59:59.999 --> 99:59:59.999
and fail if it doesn't meet some sort of arbitrary time difference and

99:59:59.999 --> 99:59:59.999
that's obviously not reliable. So we get rid of all those nonsense things.

99:59:59.999 --> 99:59:59.999
It also finds bugs in really weird locales. We build in French,

99:59:59.999 --> 99:59:59.999
Swiss-French, and it just comes up with all sorts of nonsense.

99:59:59.999 --> 99:59:59.999
Or timezones, if you build in UTC-12 then this date library doesn't work anymore

99:59:59.999 --> 99:59:59.999
and it's like 'you had one job, to be a date library'.

99:59:59.999 --> 99:59:59.999
[audience laughter]

99:59:59.999 --> 99:59:59.999
It's pretty scary and some pretty cute bugs.

99:59:59.999 --> 99:59:59.999
It also detects if your machine is, you just have a broken ??? [8:28].

99:59:59.999 --> 99:59:59.999
We build a year and a month in the future.
You find things like the

99:59:59.999 --> 99:59:59.999
maintainer has added a pre-generated SSL certificate to their tests and

99:59:59.999 --> 99:59:59.999
it expires in a year. And so it breaks.

99:59:59.999 --> 99:59:59.999
We're preemptively detecting that fail to build from source.