1
99:59:59,999 --> 99:59:59,999
I'm here today to talk to you about diffoscope

2
99:59:59,999 --> 99:59:59,999
and how you can use it as a better diff

3
99:59:59,999 --> 99:59:59,999
or for Quality Assurance, etc., things like that.

4
99:59:59,999 --> 99:59:59,999
Moin!

5
99:59:59,999 --> 99:59:59,999
Apparently that's like a North German thing to say "welcome".

6
99:59:59,999 --> 99:59:59,999
North German, north Denmark, Scandinavia, that kind of thing, I'm told.

7
99:59:59,999 --> 99:59:59,999
People are shaking their head, so I'm going to assume that's true.

8
99:59:59,999 --> 99:59:59,999
This is my first PC, an IBM 5155.

9
99:59:59,999 --> 99:59:59,999
Sometimes, when you rebooted it, it would launch into, it would somehow revert

10
99:59:59,999 --> 99:59:59,999
from booting from the hard disk to booting from a BASIC ROM,

11
99:59:59,999 --> 99:59:59,999
as in the programming language ROM.

12
99:59:59,999 --> 99:59:59,999
It was on my motherboard for some reason.

13
99:59:59,999 --> 99:59:59,999
So, randomly, you just get a chance to program in BASIC and then,

14
99:59:59,999 --> 99:59:59,999
sometimes you wouldn't, I don't know why, but… yeah.

15
99:59:59,999 --> 99:59:59,999
It's quite fun with this kind of clicky keyboard, and that folded in

16
99:59:59,999 --> 99:59:59,999
and it was this kind of big desk thing.

17
99:59:59,999 --> 99:59:59,999
Anyway…

18
99:59:59,999 --> 99:59:59,999
This is my first Debian.

19
99:59:59,999 --> 99:59:59,999
At the time it was already old.

20
99:59:59,999 --> 99:59:59,999
What's this one? Is this Slink? 2.2? Yeah.

21
99:59:59,999 --> 99:59:59,999
And this is when we had US and non-US, so that's really dating if you remember that.

22
99:59:59,999 --> 99:59:59,999
This is my first contribution to Debian, 19th December 2006,

23
99:59:59,999 --> 99:59:59,999
sending a patch to lilypond which is kind of interesting

24
99:59:59,999 --> 99:59:59,999
and the response was "Oh yeah, rock on, many thanks.
I'll upload this and

25
99:59:59,999 --> 99:59:59,999
it'll be landing to Etch".

26
99:59:59,999 --> 99:59:59,999
And this was super motivating because Etch was just coming out and it was like

27
99:59:59,999 --> 99:59:59,999
"Great, I've got like one line of tiny patch in a release. This is super cool."

28
99:59:59,999 --> 99:59:59,999
Thomas' response was super motivating.

29
99:59:59,999 --> 99:59:59,999
So, after that, like that Christmas basically spent ???

30
99:59:59,999 --> 99:59:59,999
Debian webpages and stuff.

31
99:59:59,999 --> 99:59:59,999
Very well timed.

32
99:59:59,999 --> 99:59:59,999
That's kind of a good…

33
99:59:59,999 --> 99:59:59,999
You know, someone sends a patch, be like "Cool, thanks"

34
99:59:59,999 --> 99:59:59,999
Like a little notice in the changelog.

35
99:59:59,999 --> 99:59:59,999
It was, you know, so stupid but… Yeah, do that kind of thing.

36
99:59:59,999 --> 99:59:59,999
So, moving on.

37
99:59:59,999 --> 99:59:59,999
Why diffoscope? Why did we write diffoscope?

38
99:59:59,999 --> 99:59:59,999
What's the background here?

39
99:59:59,999 --> 99:59:59,999
It comes from reproducible builds.

40
99:59:59,999 --> 99:59:59,999
The very quick outline is that once you get the source code for free software,

41
99:59:59,999 --> 99:59:59,999
you download the source code for nginx or whatever,

42
99:59:59,999 --> 99:59:59,999
pretty much everyone just runs binaries on their servers or their systems.

43
99:59:59,999 --> 99:59:59,999
You know, "apt install bla", "yum install", whatever.

44
99:59:59,999 --> 99:59:59,999
Android Play Store, whatever.

45
99:59:59,999 --> 99:59:59,999
Can you actually trust whether these two things correspond with each other?

46
99:59:59,999 --> 99:59:59,999
You've gotten the source code, it looks alright, and then you install this binary,

47
99:59:59,999 --> 99:59:59,999
yeah…

48
99:59:59,999 --> 99:59:59,999
Who generated that? Can you trust that process?

49
99:59:59,999 --> 99:59:59,999
Can you trust who generated it?

50
99:59:59,999 --> 99:59:59,999
Even if you could trust them, could you trust them not to be exploited? Etc.

51
99:59:59,999 --> 99:59:59,999
This is a big problem because you can exploit a build farm and then

52
99:59:59,999 --> 99:59:59,999
obviously exploit all of that, you know, a trojan into the build farm,

53
99:59:59,999 --> 99:59:59,999
so every single binary that comes out is compromised.

54
99:59:59,999 --> 99:59:59,999
Kind of problematic.

55
99:59:59,999 --> 99:59:59,999
You could also target individual developers' machines,

56
99:59:59,999 --> 99:59:59,999
so I could go off to, say, your machine, add a backdoor to it,

57
99:59:59,999 --> 99:59:59,999
so every binary that you give to friends and things like that,

58
99:59:59,999 --> 99:59:59,999
are compromised in some way, stealing your bitcoins or whatever.

59
99:59:59,999 --> 99:59:59,999
I can also ??? and blackmail you into producing

60
99:59:59,999 --> 99:59:59,999
software that has compromises or extra features, shall we say,

61
99:59:59,999 --> 99:59:59,999
that don't exist in the source code.

62
99:59:59,999 --> 99:59:59,999
So what will happen there is that you'd release your source

63
99:59:59,999 --> 99:59:59,999
and the binaries you produce have this sort of backdoor that, you know,

64
99:59:59,999 --> 99:59:59,999
someone is forcing you into producing.

65
99:59:59,999 --> 99:59:59,999
So, you don't want to do that.

66
99:59:59,999 --> 99:59:59,999
Anyway

67
99:59:59,999 --> 99:59:59,999
enough of that.

68
99:59:59,999 --> 99:59:59,999
What you do for reproducible builds is you ensure that every time you build

69
99:59:59,999 --> 99:59:59,999
a piece of software, you get an identical result.

70
99:59:59,999 --> 99:59:59,999
Multiple people then compare their builds and check whether they all get

71
99:59:59,999 --> 99:59:59,999
the same results

72
99:59:59,999 --> 99:59:59,999
and this means that an attacker must either have infected everyone

73
99:59:59,999 --> 99:59:59,999
at the same time, or they haven't infected anyone.

74
99:59:59,999 --> 99:59:59,999
The point here is that you have to ensure that builds have identical results.

75
99:59:59,999 --> 99:59:59,999
Ok, great.

76
99:59:59,999 --> 99:59:59,999
So, we started the reproducible builds project, etc.

77
99:59:59,999 --> 99:59:59,999
And we build 2 debs.

78
99:59:59,999 --> 99:59:59,999
Oh, I'm sorry about the colors there.

79
99:59:59,999 --> 99:59:59,999
You probably can't see that.

80
99:59:59,999 --> 99:59:59,999
That says "sha1sum a.deb b.deb".

81
99:59:59,999 --> 99:59:59,999
Anyway, we're comparing the sha1sums of 2 binary Debian files.

82
99:59:59,999 --> 99:59:59,999
So, these two files differ.

83
99:59:59,999 --> 99:59:59,999
Ok, they're not reproducible.

84
99:59:59,999 --> 99:59:59,999
Why is that?

85
99:59:59,999 --> 99:59:59,999
So we run a diff on them.

86
99:59:59,999 --> 99:59:59,999
Yeah…

87
99:59:59,999 --> 99:59:59,999
So, what can we learn from this?

88
99:59:59,999 --> 99:59:59,999
Well, not very much, visibly they're compressed so

89
99:59:59,999 --> 99:59:59,999
as soon as we see one change, we'll see they would just cascade changes

90
99:59:59,999 --> 99:59:59,999
because that's how compression works.

91
99:59:59,999 --> 99:59:59,999
I guess we know it's a deb ??? format file, not very useful.

92
99:59:59,999 --> 99:59:59,999
Ok, great so we're gonna have a look in

93
99:59:59,999 --> 99:59:59,999
We'll do a binary diff and ok, well…

94
99:59:59,999 --> 99:59:59,999
Again, that's not really telling us very much

95
99:59:59,999 --> 99:59:59,999
with the diff there.

96
99:59:59,999 --> 99:59:59,999
Ok, great.

97
99:59:59,999 --> 99:59:59,999
???

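The remark above that one change cascades through compressed data can be tried out with a small sketch. It is only an illustration under stated assumptions: the "build log" is made up, the names `sample_build_log` and `count_differing_bytes` are hypothetical, and zlib stands in for whatever compression a real package uses. The two inputs differ in a single byte of an embedded timestamp; the compressed outputs typically differ in far more places, though the exact count depends on the zlib version and settings.

```python
import zlib


def sample_build_log(minute: int) -> bytes:
    """Return a made-up, highly compressible build log with an embedded timestamp."""
    header = b"Build-Date: 2000-01-01 10:%02d\n" % minute
    body = b"".join(b"compiled object %d ok\n" % i for i in range(2000))
    return header + body


def count_differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where a and b differ, plus any difference in length."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


if __name__ == "__main__":
    # Only the last digit of the minute differs between the two inputs.
    first, second = sample_build_log(0), sample_build_log(5)
    print("raw bytes differing:", count_differing_bytes(first, second))
    print(
        "compressed bytes differing:",
        count_differing_bytes(zlib.compress(first), zlib.compress(second)),
    )
```

Running a plain diff on the compressed files therefore says little about where the real difference is, which is why the walkthrough keeps unpacking one layer at a time.
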
98
99:59:59,999 --> 99:59:59,999
"ar x" is on the new maintainer thing, "how you unpack a deb"

99
99:59:59,999 --> 99:59:59,999
Everyone remembers this, right?

100
99:59:59,999 --> 99:59:59,999
You unpack a.deb with "ar x" and you do that to b.deb

101
99:59:59,999 --> 99:59:59,999
and then we diff the results of that.

102
99:59:59,999 --> 99:59:59,999
Ok, so…yeah, 7zip.

103
99:59:59,999 --> 99:59:59,999
Ok, compressed content, not very useful.

104
99:59:59,999 --> 99:59:59,999
Ok, so let's unpack the control.tar inside these debs.

105
99:59:59,999 --> 99:59:59,999
And then we run diff on that.

106
99:59:59,999 --> 99:59:59,999
Still not really telling anything useful about how to make this package reproducible.

107
99:59:59,999 --> 99:59:59,999
So let's unpack the tar.xz into the tar.

108
99:59:59,999 --> 99:59:59,999
Inside that tar, there's a file called md5sums and we start to see some differences

109
99:59:59,999 --> 99:59:59,999
between some files in these two debs.

110
99:59:59,999 --> 99:59:59,999
??? meaningful, so now we have some idea that

111
99:59:59,999 --> 99:59:59,999
it has something to do with this usr/bin/pmixer binary.

112
99:59:59,999 --> 99:59:59,999
Ok, interesting.

113
99:59:59,999 --> 99:59:59,999
We'll unzip that and then we do a diff on pmixer itself.

114
99:59:59,999 --> 99:59:59,999
Now we're back into just binary ??? mode

115
99:59:59,999 --> 99:59:59,999
This isn't very helpful and this is taking quite a while

116
99:59:59,999 --> 99:59:59,999
and if I remember correctly, Debian has a lot of packages.

117
99:59:59,999 --> 99:59:59,999
So this might take a little while.

118
99:59:59,999 --> 99:59:59,999
So, basically, ??? meme

119
99:59:59,999 --> 99:59:59,999
I should build a better diff.
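The manual procedure walked through above (compare checksums, unpack one layer, compare again, repeat) can be sketched as a small recursive function. This is a minimal sketch under stated assumptions, not how diffoscope itself is implemented: it only understands nested tar archives (a real .deb is an ar archive, which Python's standard library does not read), and `differing_paths`, `read_members`, `is_tar` and `make_tar` are hypothetical names introduced for the illustration.

```python
import hashlib
import io
import tarfile


def sha1(data: bytes) -> str:
    """Hex SHA-1 digest, the same check as running sha1sum on a file."""
    return hashlib.sha1(data).hexdigest()


def is_tar(blob: bytes) -> bool:
    """Return True if blob opens as a (possibly compressed) tar archive."""
    try:
        with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*"):
            return True
    except tarfile.TarError:
        return False


def read_members(blob: bytes) -> dict:
    """Map member name to content for the regular files in a tar archive."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
            if member.isfile()
        }


def differing_paths(a: bytes, b: bytes, prefix: str = "") -> list:
    """List the innermost paths at which two blobs differ."""
    if sha1(a) == sha1(b):
        return []
    if not (is_tar(a) and is_tar(b)):
        # Not a container we can open: report this path as the difference.
        return [prefix or "<top level>"]
    left, right = read_members(a), read_members(b)
    found = []
    for name in sorted(left.keys() | right.keys()):
        path = f"{prefix}/{name}" if prefix else name
        if name not in left or name not in right:
            found.append(path)  # member exists on one side only
        else:
            found.extend(differing_paths(left[name], right[name], path))
    # If every member matches, the archives differ only in metadata.
    return found or [prefix or "<top level>"]


def make_tar(files: dict) -> bytes:
    """Build an in-memory tar archive (demo helper, fixed metadata)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()
```

As a usage example, wrapping two inner archives that differ only in a made-up `usr/bin/pmixer` payload into two outer archives and calling `differing_paths(outer_a, outer_b)` reports the nested path of that one file instead of a wall of differing compressed bytes. Like the manual walkthrough, the sketch still stops at "this binary differs" and says nothing about why.
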