WEBVTT

99:59:59.999 --> 99:59:59.999
I'm here today to talk to you about diffoscope

99:59:59.999 --> 99:59:59.999
and how you can use it as a better diff

99:59:59.999 --> 99:59:59.999
or for Quality Assurance, etc., things like that.

99:59:59.999 --> 99:59:59.999
Moin!

99:59:59.999 --> 99:59:59.999
Apparently that's like a North German thing to say "welcome".

99:59:59.999 --> 99:59:59.999
North German, north Denmark, Scandinavia, that kind of thing, I'm told.

99:59:59.999 --> 99:59:59.999
People are shaking their head, so I'm going to assume that's true.

99:59:59.999 --> 99:59:59.999
This is my first PC, an IBM 5155.

99:59:59.999 --> 99:59:59.999
Sometimes, when you rebooted it, it would launch into, it would somehow revert

99:59:59.999 --> 99:59:59.999
from booting from the hard disk to booting from a BASIC ROM,

99:59:59.999 --> 99:59:59.999
as in the programming language ROM.

99:59:59.999 --> 99:59:59.999
It was on my motherboard for some reason.

99:59:59.999 --> 99:59:59.999
So, randomly, you just get a chance to program in BASIC and then,

99:59:59.999 --> 99:59:59.999
sometimes you wouldn't, I don't know why, but… yeah.

99:59:59.999 --> 99:59:59.999
It's quite fun with this kind of clicky keyboard, and that folded in

99:59:59.999 --> 99:59:59.999
and it was this kind of big desk thing.

99:59:59.999 --> 99:59:59.999
Anyway…

99:59:59.999 --> 99:59:59.999
This is my first Debian.

99:59:59.999 --> 99:59:59.999
At the time it was already old.

99:59:59.999 --> 99:59:59.999
What's this one? Is this Slink? 2.2? Yeah.

99:59:59.999 --> 99:59:59.999
And this is when we had US and non-US, so that's really dating if you remember that.

99:59:59.999 --> 99:59:59.999
This is my first contribution to Debian, 19th December 2006,

99:59:59.999 --> 99:59:59.999
sending a patch to LilyPond which is kind of interesting

99:59:59.999 --> 99:59:59.999
and the response was "Oh yeah, rock on, many thanks. I'll upload this and

99:59:59.999 --> 99:59:59.999
it'll be landing to Etch".

99:59:59.999 --> 99:59:59.999
And this was super motivating because Etch was just coming out and it was like

99:59:59.999 --> 99:59:59.999
"Great, I've got, like, one line of tiny patch in a release. This is super cool."

99:59:59.999 --> 99:59:59.999
Thomas' response was super motivating.

99:59:59.999 --> 99:59:59.999
So, after that, like that Christmas basically spent ???

99:59:59.999 --> 99:59:59.999
Debian webpages and stuff.

99:59:59.999 --> 99:59:59.999
Very well timed.

99:59:59.999 --> 99:59:59.999
That's kind of a good…

99:59:59.999 --> 99:59:59.999
You know, someone sends a patch, be like "Cool, thanks"

99:59:59.999 --> 99:59:59.999
Like a little notice in the changelog.

99:59:59.999 --> 99:59:59.999
It was, you know, so stupid but… Yeah, do that kind of thing.

99:59:59.999 --> 99:59:59.999
So, moving on.

99:59:59.999 --> 99:59:59.999
Why diffoscope? Why did we write diffoscope?

99:59:59.999 --> 99:59:59.999
What's the background here?

99:59:59.999 --> 99:59:59.999
It comes from reproducible builds.

99:59:59.999 --> 99:59:59.999
The very quick outline is that once you get the source code for free software,

99:59:59.999 --> 99:59:59.999
you download the source code for nginx or whatever,

99:59:59.999 --> 99:59:59.999
pretty much everyone just runs binaries on their servers or their systems.

99:59:59.999 --> 99:59:59.999
You know, "apt install bla", "yum install", whatever.

99:59:59.999 --> 99:59:59.999
Android Play Store, whatever.

99:59:59.999 --> 99:59:59.999
Can you actually trust whether these two things correspond with each other?

99:59:59.999 --> 99:59:59.999
You've gotten the source code, it looks alright, and then you install this binary,

99:59:59.999 --> 99:59:59.999
yeah…

99:59:59.999 --> 99:59:59.999
Who generated that? Can you trust that process?

99:59:59.999 --> 99:59:59.999
Can you trust who generated it?

99:59:59.999 --> 99:59:59.999
Even if you could trust them, could you trust them not to be exploited? Etc.

99:59:59.999 --> 99:59:59.999
This is a big problem because you can exploit a build farm and then

99:59:59.999 --> 99:59:59.999
obviously exploit all of that, you know, a trojan into the build farm,

99:59:59.999 --> 99:59:59.999
so every single binary that comes out is compromised.

99:59:59.999 --> 99:59:59.999
Kind of problematic.

99:59:59.999 --> 99:59:59.999
You could also target individual developers' machines,

99:59:59.999 --> 99:59:59.999
so I could go off to, say, your machine, add a backdoor to it,

99:59:59.999 --> 99:59:59.999
so every binary that you give to friends and things like that,

99:59:59.999 --> 99:59:59.999
are compromised in some way, stealing your bitcoins or whatever.

99:59:59.999 --> 99:59:59.999
I can also ??? and blackmail you into producing

99:59:59.999 --> 99:59:59.999
software that has compromises or extra features, shall we say,

99:59:59.999 --> 99:59:59.999
that don't exist in the source code.

99:59:59.999 --> 99:59:59.999
So what will happen there is that you'd release your source

99:59:59.999 --> 99:59:59.999
and the binaries you produce have this sort of backdoor that, you know,

99:59:59.999 --> 99:59:59.999
someone is forcing you into producing.

99:59:59.999 --> 99:59:59.999
So, you don't want to do that.

99:59:59.999 --> 99:59:59.999
Anyway

99:59:59.999 --> 99:59:59.999
enough of that.

99:59:59.999 --> 99:59:59.999
What you do for reproducible builds is you ensure that every time you build

99:59:59.999 --> 99:59:59.999
a piece of software, you get an identical result.

99:59:59.999 --> 99:59:59.999
Multiple people then compare their builds and check whether they all get

99:59:59.999 --> 99:59:59.999
the same results

99:59:59.999 --> 99:59:59.999
and this means that an attacker must either have infected everyone

99:59:59.999 --> 99:59:59.999
at the same time, or they haven't infected anyone.

99:59:59.999 --> 99:59:59.999
The point here is that you have to ensure that builds have identical results.

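The comparison step described above, where several people build the same source and check that they all get the same result, can be sketched in Python. This is only an illustration under stated assumptions: the builder names and the `builds_agree` and `dissenting_builders` helpers are hypothetical, the choice of SHA-256 is an assumption, and a majority digest is not proof that the majority is clean.

```python
import hashlib
from collections import Counter


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of one build artifact."""
    return hashlib.sha256(data).hexdigest()


def builds_agree(artifacts: dict[str, bytes]) -> bool:
    """True only if every builder produced a bit-for-bit identical artifact."""
    digests = {sha256_of(blob) for blob in artifacts.values()}
    return len(digests) == 1


def dissenting_builders(artifacts: dict[str, bytes]) -> list[str]:
    """Name the builders whose artifact differs from the most common digest."""
    if not artifacts:
        return []
    digests = {name: sha256_of(blob) for name, blob in artifacts.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, digest in digests.items() if digest != majority)


# Hypothetical builders; in practice the bytes would be read from each .deb.
builds = {"alice": b"pkg", "bob": b"pkg", "carol": b"pkg+backdoor"}
print(builds_agree(builds))
print(dissenting_builders(builds))
```

A disagreement only tells you that something differs, not why, which is the gap the rest of the talk is about.
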
99:59:59.999 --> 99:59:59.999
Ok, great.

99:59:59.999 --> 99:59:59.999
So, we started the reproducible builds project, etc.

99:59:59.999 --> 99:59:59.999
And we build 2 debs.

99:59:59.999 --> 99:59:59.999
Oh, I'm sorry about the colors there.

99:59:59.999 --> 99:59:59.999
You probably can't see that.

99:59:59.999 --> 99:59:59.999
That says "sha1sum a.deb b.deb".

99:59:59.999 --> 99:59:59.999
Anyway, we're comparing the sha1sums of 2 binary Debian files.

99:59:59.999 --> 99:59:59.999
So, these two files differ.

99:59:59.999 --> 99:59:59.999
Ok, they're not reproducible.

99:59:59.999 --> 99:59:59.999
Why is that?

99:59:59.999 --> 99:59:59.999
So we run a diff on them.

99:59:59.999 --> 99:59:59.999
Yeah…

99:59:59.999 --> 99:59:59.999
So, what can we learn from this?

99:59:59.999 --> 99:59:59.999
Well, not very much, visibly they're compressed so

99:59:59.999 --> 99:59:59.999
as soon as we see one change, we'll see they would just cascade changes

99:59:59.999 --> 99:59:59.999
because that's how compression works.

99:59:59.999 --> 99:59:59.999
I guess we know it's a deb ??? format file, not very useful.

99:59:59.999 --> 99:59:59.999
Ok, great so we're gonna have a look in

99:59:59.999 --> 99:59:59.999
We'll do a binary diff and ok, well…

99:59:59.999 --> 99:59:59.999
Again, that's not really telling us very much

99:59:59.999 --> 99:59:59.999
with the diff there.

99:59:59.999 --> 99:59:59.999
Ok, great.

99:59:59.999 --> 99:59:59.999
???

99:59:59.999 --> 99:59:59.999
"ar x" is on the new maintainer thing, "how you unpack a deb"

99:59:59.999 --> 99:59:59.999
Everyone remembers this, right?

99:59:59.999 --> 99:59:59.999
You unpack a.deb with "ar x" and you do that to b.deb

99:59:59.999 --> 99:59:59.999
and then we diff the results of that.

99:59:59.999 --> 99:59:59.999
Ok, so…yeah, 7zip.

99:59:59.999 --> 99:59:59.999
Ok, compressed content, not very useful.

99:59:59.999 --> 99:59:59.999
Ok, so let's unpack the control.tar inside these debs.

99:59:59.999 --> 99:59:59.999
And then we run diff on that.

99:59:59.999 --> 99:59:59.999
Still not really telling anything useful about how to make this package reproducible

99:59:59.999 --> 99:59:59.999
So let's unpack the tar.xz into the tar.

99:59:59.999 --> 99:59:59.999
Inside that tar, there's a file called md5sums and we start to see some differences

99:59:59.999 --> 99:59:59.999
between some files in these two debs.

99:59:59.999 --> 99:59:59.999
??? meaningful, so now we have some idea that

99:59:59.999 --> 99:59:59.999
it has something to do with this usr/bin/pmixer binary.

99:59:59.999 --> 99:59:59.999
Ok, interesting.

99:59:59.999 --> 99:59:59.999
We'll unzip that and then we do a diff on pmixer itself.

99:59:59.999 --> 99:59:59.999
Now we're back into just binary ??? mode

99:59:59.999 --> 99:59:59.999
This isn't very helpful and this is taking quite a while

99:59:59.999 --> 99:59:59.999
and if I remember correctly, Debian has a lot of packages.

99:59:59.999 --> 99:59:59.999
So this might take a little while.

99:59:59.999 --> 99:59:59.999
So, basically, ??? meme

99:59:59.999 --> 99:59:59.999
I should build a better diff.
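The manual descent walked through above (unpack the outer `ar` container, open the tarballs inside, then compare file by file) can be automated in a few lines. The following is a rough Python sketch of that idea and not diffoscope's implementation: `ar_members`, `tar_file_digests` and `differing_paths` are hypothetical helper names, the `ar` parser handles only the simple short-name layout, and it stops at "which files differ" without explaining how they differ.

```python
import hashlib
import io
import tarfile

AR_MAGIC = b"!<arch>\n"


def ar_members(blob: bytes) -> dict[str, bytes]:
    """Parse a simple ar archive (the outer container of a .deb) into {name: data}."""
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = {}
    pos = len(AR_MAGIC)
    while pos + 60 <= len(blob):
        header = blob[pos:pos + 60]  # fixed 60-byte member header
        name = header[:16].decode("ascii").strip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        pos += 60
        members[name] = blob[pos:pos + size]
        pos += size + (size % 2)  # member data is padded to an even offset
    return members


def tar_file_digests(blob: bytes) -> dict[str, str]:
    """Map each regular file in a (possibly compressed) tarball to its SHA-256."""
    digests = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as tar:
        for info in tar:
            if info.isfile():
                data = tar.extractfile(info).read()
                digests[info.name] = hashlib.sha256(data).hexdigest()
    return digests


def differing_paths(deb_a: bytes, deb_b: bytes) -> list[str]:
    """List 'member/path' entries whose contents differ between two .deb blobs."""
    a, b = ar_members(deb_a), ar_members(deb_b)
    out = []
    for member in sorted(set(a) | set(b)):
        if a.get(member) == b.get(member):
            continue  # identical bytes, nothing to descend into
        if member not in a or member not in b or ".tar" not in member:
            out.append(member)
            continue
        da, db = tar_file_digests(a[member]), tar_file_digests(b[member])
        inner = [f"{member}/{path}" for path in sorted(set(da) | set(db))
                 if da.get(path) != db.get(path)]
        # If no file content differs, the difference is in metadata or compression.
        out.extend(inner or [member])
    return out
```

Called as `differing_paths(open("a.deb", "rb").read(), open("b.deb", "rb").read())`, it would point at entries such as the `md5sums` file and the `pmixer` binary in one step, but you would still be left diffing raw binaries at the end.
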