WEBVTT 99:59:59.999 --> 99:59:59.999 I'm here today to talk to you about diffoscope 99:59:59.999 --> 99:59:59.999 and how you can use it as a better diff 99:59:59.999 --> 99:59:59.999 or for Quality Assurance, etc., things like that. 99:59:59.999 --> 99:59:59.999 Moin! 99:59:59.999 --> 99:59:59.999 Apparently that's like a North German thing to say "welcome". 99:59:59.999 --> 99:59:59.999 North German, north Denmark, Scandinavia, that kind of thing, I'm told. 99:59:59.999 --> 99:59:59.999 People are shaking their heads, so I'm going to assume that's true. 99:59:59.999 --> 99:59:59.999 This is my first PC, an IBM 5155. 99:59:59.999 --> 99:59:59.999 Sometimes, when you rebooted it, it would launch into, it would somehow revert 99:59:59.999 --> 99:59:59.999 from booting from the hard disk to booting from a BASIC ROM, 99:59:59.999 --> 99:59:59.999 as in the programming language ROM. 99:59:59.999 --> 99:59:59.999 It was on my motherboard for some reason. 99:59:59.999 --> 99:59:59.999 So, randomly, you just get a chance to program in BASIC and then, 99:59:59.999 --> 99:59:59.999 sometimes you wouldn't, I don't know why, but… yeah. 99:59:59.999 --> 99:59:59.999 It's quite fun with this kind of clicky keyboard, and that folded in 99:59:59.999 --> 99:59:59.999 and it was this kind of big desk thing. 99:59:59.999 --> 99:59:59.999 Anyway… 99:59:59.999 --> 99:59:59.999 This is my first Debian. 99:59:59.999 --> 99:59:59.999 At the time it was already old. 99:59:59.999 --> 99:59:59.999 What's this one? Is this Slink? 2.2? Yeah. 99:59:59.999 --> 99:59:59.999 And this is when we had US and non-US, so that's really dating if you remember that. 99:59:59.999 --> 99:59:59.999 This is my first contribution to Debian, 19th December 2006, 99:59:59.999 --> 99:59:59.999 sending a patch to LilyPond which is kind of interesting 99:59:59.999 --> 99:59:59.999 and the response was "Oh yeah, rock on, many thanks. I'll upload this and 99:59:59.999 --> 99:59:59.999 it'll be landing to Etch".
99:59:59.999 --> 99:59:59.999 And this was super motivating because Etch was just coming out and it was like 99:59:59.999 --> 99:59:59.999 "Great, I've got, like, one line of tiny patch in a release. This is super cool." 99:59:59.999 --> 99:59:59.999 Thomas' response was super motivating. 99:59:59.999 --> 99:59:59.999 So, after that, like that Christmas basically spent ??? 99:59:59.999 --> 99:59:59.999 Debian webpages and stuff. 99:59:59.999 --> 99:59:59.999 Very well timed. 99:59:59.999 --> 99:59:59.999 That's kind of a good… 99:59:59.999 --> 99:59:59.999 You know, someone sends a patch, be like "Cool, thanks". 99:59:59.999 --> 99:59:59.999 Like a little notice in the changelog. 99:59:59.999 --> 99:59:59.999 It was, you know, so stupid but… Yeah, do that kind of thing. 99:59:59.999 --> 99:59:59.999 So, moving on. 99:59:59.999 --> 99:59:59.999 Why diffoscope? Why did we write diffoscope? 99:59:59.999 --> 99:59:59.999 What's the background here? 99:59:59.999 --> 99:59:59.999 It comes from reproducible builds. 99:59:59.999 --> 99:59:59.999 The very quick outline is that once you get the source code for free software, 99:59:59.999 --> 99:59:59.999 you download the source code for nginx or whatever, 99:59:59.999 --> 99:59:59.999 pretty much everyone just runs binaries on their servers or their systems. 99:59:59.999 --> 99:59:59.999 You know, "apt install bla", "yum install", whatever. 99:59:59.999 --> 99:59:59.999 Android Play Store, whatever. 99:59:59.999 --> 99:59:59.999 Can you actually trust whether these two things correspond with each other? 99:59:59.999 --> 99:59:59.999 You've gotten the source code, it looks alright, and then you install this binary, 99:59:59.999 --> 99:59:59.999 yeah… 99:59:59.999 --> 99:59:59.999 Who generated that? Can you trust that process? 99:59:59.999 --> 99:59:59.999 Can you trust who generated it? 99:59:59.999 --> 99:59:59.999 Even if you could trust them, could you trust them not to be exploited? Etc.
99:59:59.999 --> 99:59:59.999 This is a big problem because you can exploit a build farm and then 99:59:59.999 --> 99:59:59.999 obviously exploit all of that, you know, a trojan into the build farm, 99:59:59.999 --> 99:59:59.999 so every single binary that comes out is compromised. 99:59:59.999 --> 99:59:59.999 Kind of problematic. 99:59:59.999 --> 99:59:59.999 You could also target individual developers' machines, 99:59:59.999 --> 99:59:59.999 so I could go off to, say, your machine, add a backdoor to it, 99:59:59.999 --> 99:59:59.999 so every binary that you give to friends and things like that, 99:59:59.999 --> 99:59:59.999 is compromised in some way, stealing your bitcoins or whatever. 99:59:59.999 --> 99:59:59.999 I can also ??? and blackmail you into producing 99:59:59.999 --> 99:59:59.999 software that has compromises or extra features, shall we say, 99:59:59.999 --> 99:59:59.999 that don't exist in the source code. 99:59:59.999 --> 99:59:59.999 So what will happen there is that you'd release your source 99:59:59.999 --> 99:59:59.999 and the binaries you produce have this sort of backdoor that, you know, 99:59:59.999 --> 99:59:59.999 someone is forcing you into producing. 99:59:59.999 --> 99:59:59.999 So, you don't want to do that. 99:59:59.999 --> 99:59:59.999 Anyway 99:59:59.999 --> 99:59:59.999 enough of that. 99:59:59.999 --> 99:59:59.999 What you do for reproducible builds is you ensure that every time you build 99:59:59.999 --> 99:59:59.999 a piece of software, you get an identical result. 99:59:59.999 --> 99:59:59.999 Multiple people then compare their builds and check whether they all get 99:59:59.999 --> 99:59:59.999 the same results 99:59:59.999 --> 99:59:59.999 and this means that an attacker must either have infected everyone 99:59:59.999 --> 99:59:59.999 at the same time, or they haven't infected anyone. 99:59:59.999 --> 99:59:59.999 The point here is that you have to ensure that builds have identical results.
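The comparison step described above, where several people build the same source and check that they all got the same bytes, can be sketched in a few lines of Python. This is only an illustration of the idea, not tooling from the reproducible builds project: the builder names and the helper functions `builds_agree` and `dissenting_builders` are hypothetical, and the artifacts are passed in as in-memory bytes rather than read from real .deb files.

```python
import hashlib
from collections import Counter


def sha256_hex(data):
    """Hex digest of one build artifact (given as bytes)."""
    return hashlib.sha256(data).hexdigest()


def builds_agree(artifacts):
    """True when every builder produced byte-identical output.

    `artifacts` maps a (hypothetical) builder name to the bytes it built.
    """
    return len({sha256_hex(blob) for blob in artifacts.values()}) == 1


def dissenting_builders(artifacts):
    """Names of builders whose digest differs from the most common one."""
    digests = {name: sha256_hex(blob) for name, blob in artifacts.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, digest in digests.items() if digest != majority)


builds = {
    "alice": b"same bytes",
    "bob": b"same bytes",
    "carol": b"same bytes plus a trojan",
}
print(builds_agree(builds))         # False
print(dissenting_builders(builds))  # ['carol']
```

A mismatch only says that something differs. It does not say which builder is compromised, or whether the package is simply not reproducible yet, which is where the rest of the talk picks up.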
99:59:59.999 --> 99:59:59.999 Ok, great. 99:59:59.999 --> 99:59:59.999 So, we started the reproducible builds project, etc. 99:59:59.999 --> 99:59:59.999 And we build 2 debs. 99:59:59.999 --> 99:59:59.999 Oh, I'm sorry about the colors there. 99:59:59.999 --> 99:59:59.999 You probably can't see that. 99:59:59.999 --> 99:59:59.999 That says "sha1sum a.deb b.deb". 99:59:59.999 --> 99:59:59.999 Anyway, we're comparing the sha1sums of 2 binary Debian files. 99:59:59.999 --> 99:59:59.999 So, these two files differ. 99:59:59.999 --> 99:59:59.999 Ok, they're not reproducible. 99:59:59.999 --> 99:59:59.999 Why is that? 99:59:59.999 --> 99:59:59.999 So we run a diff on them. 99:59:59.999 --> 99:59:59.999 Yeah… 99:59:59.999 --> 99:59:59.999 So, what can we learn from this? 99:59:59.999 --> 99:59:59.999 Well, not very much, visibly they're compressed so 99:59:59.999 --> 99:59:59.999 as soon as we see one change, we'll see they would just cascade changes 99:59:59.999 --> 99:59:59.999 because that's how compression works. 99:59:59.999 --> 99:59:59.999 I guess we know it's a deb ??? format file, not very useful. 99:59:59.999 --> 99:59:59.999 Ok, great so we're gonna have a look in 99:59:59.999 --> 99:59:59.999 We'll do a binary diff and ok, well… 99:59:59.999 --> 99:59:59.999 Again, that's not really telling us very much 99:59:59.999 --> 99:59:59.999 with the diff there. 99:59:59.999 --> 99:59:59.999 Ok, great. 99:59:59.999 --> 99:59:59.999 ??? 99:59:59.999 --> 99:59:59.999 "ar x" is on the new maintainer thing, "how you unpack a deb" 99:59:59.999 --> 99:59:59.999 Everyone remembers this, right? 99:59:59.999 --> 99:59:59.999 You unpack a.deb with "ar x" and you do that to b.deb 99:59:59.999 --> 99:59:59.999 and then we diff the results of that. 99:59:59.999 --> 99:59:59.999 Ok, so…yeah, 7zip. 99:59:59.999 --> 99:59:59.999 Ok, compressed content, not very useful. 99:59:59.999 --> 99:59:59.999 Ok, so let's unpack the control.tar inside these debs. 
99:59:59.999 --> 99:59:59.999 And then we run diff on that. 99:59:59.999 --> 99:59:59.999 Still not really telling anything useful about how to make this package reproducible. 99:59:59.999 --> 99:59:59.999 So let's unpack the tar.xz into the tar. 99:59:59.999 --> 99:59:59.999 Inside that tar, there's a file called md5sums and we start to see some differences 99:59:59.999 --> 99:59:59.999 between some files in these two debs. 99:59:59.999 --> 99:59:59.999 ??? meaningful, so now we have some idea that 99:59:59.999 --> 99:59:59.999 it has something to do with this usr/bin/pmixer binary. 99:59:59.999 --> 99:59:59.999 Ok, interesting. 99:59:59.999 --> 99:59:59.999 We'll unzip that and then we do a diff on pmixer itself. 99:59:59.999 --> 99:59:59.999 Now we're back into just binary ??? mode. 99:59:59.999 --> 99:59:59.999 This isn't very helpful and this is taking quite a while 99:59:59.999 --> 99:59:59.999 and if I remember correctly, Debian has a lot of packages. 99:59:59.999 --> 99:59:59.999 So this might take a little while. 99:59:59.999 --> 99:59:59.999 So, basically, ??? meme 99:59:59.999 --> 99:59:59.999 I should build a better diff. 99:59:59.999 --> 99:59:59.999 That's not quite true, this is actually… 99:59:59.999 --> 99:59:59.999 It was Lunar that started this project 99:59:59.999 --> 99:59:59.999 and it was called debbindiff, because we wanted to diff 99:59:59.999 --> 99:59:59.999 binary Debian packages. 99:59:59.999 --> 99:59:59.999 So this is the initial commit, 2014. 99:59:59.999 --> 99:59:59.999 "The version is successfully able to report differences in two .changes files. 99:59:59.999 --> 99:59:59.999 Not with much interesting details, but it's a start." 99:59:59.999 --> 99:59:59.999 And it was a start. 99:59:59.999 --> 99:59:59.999 Fast forwarding… Oh, sorry about these colors, 99:59:59.999 --> 99:59:59.999 I don't know if we can do anything about the lights? 99:59:59.999 --> 99:59:59.999 Yeah? 99:59:59.999 --> 99:59:59.999 No?
99:59:59.999 --> 99:59:59.999 Alright, well… 99:59:59.999 --> 99:59:59.999 Basically, we're diffoscoping on… 99:59:59.999 --> 99:59:59.999 It works kind of like diff does normally, 99:59:59.999 --> 99:59:59.999 you give it two files, it outputs a unified diff. 99:59:59.999 --> 99:59:59.999 So "diffoscope a b", one file contains the word "foo", one contains the word "bar". 99:59:59.999 --> 99:59:59.999 Nothing actually out of the ordinary. 99:59:59.999 --> 99:59:59.999 It's sort of colored by default, so that's why you can't see it, but whatever. 99:59:59.999 --> 99:59:59.999 It supports archive formats, so if you give it two tar files, 99:59:59.999 --> 99:59:59.999 if we then tar up our "a" file and our "b" file into an a.tar and b.tar 99:59:59.999 --> 99:59:59.999 and then run diffoscope on those tar files 99:59:59.999 --> 99:59:59.999 we get this kind of, like, hierarchy here. 99:59:59.999 --> 99:59:59.999 So it's saying that there are differences between these files, 99:59:59.999 --> 99:59:59.999 in the file list they have different time stamps, because I made them 99:59:59.999 --> 99:59:59.999 at different times, 99:59:59.999 --> 99:59:59.999 and here are the contents, so we got "foo" there and "bar" there. 99:59:59.999 --> 99:59:59.999 So we can see the difference between them. 99:59:59.999 --> 99:59:59.999 Well, I can, I don't know if you can, you get the slide there. 99:59:59.999 --> 99:59:59.999 If we gzip these tar files and then run diffoscope on those gzip things, 99:59:59.999 --> 99:59:59.999 it'll say "ok, what we've done is unpack it first, and here's the metadata 99:59:59.999 --> 99:59:59.999 about the gzip process", 99:59:59.999 --> 99:59:59.999 and inside that are a.tar and b.tar from the previous slides. 99:59:59.999 --> 99:59:59.999 And then the "a" file and the "b" file. 99:59:59.999 --> 99:59:59.999 So, it's really going two levels deep into this tar.gz file. 99:59:59.999 --> 99:59:59.999 That's pretty cool.
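To make the "two levels deep" idea concrete, here is a minimal sketch in Python of recursive unpack-and-compare through a gzip layer and a tar layer, using only the standard library. It is not diffoscope's implementation. The function names `compare`, `read_tar` and `make_tar` are made up for this example, archive members are matched by exact name only, and the two archives are built in memory with a member called "file" (an assumption, so that the names line up).

```python
import difflib
import gzip
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"


def make_tar(member, content, mtime):
    """Build a one-file tar archive in memory (helper for the demo below)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(member)
        info.size = len(content)
        info.mtime = mtime
        tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def read_tar(data):
    """Return {name: (mtime, content)} for an uncompressed tar, else None."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
            return {
                info.name: (info.mtime, tar.extractfile(info).read())
                for info in tar.getmembers()
                if info.isfile()
            }
    except tarfile.TarError:
        return None


def compare(a, b, name, depth=0):
    """Return indented report lines describing how bytes a and b differ."""
    pad = "  " * depth
    if a == b:
        return []
    report = [pad + name + " differs"]
    # Layer 1: both sides are gzip streams, so unpack and recurse.
    if a.startswith(GZIP_MAGIC) and b.startswith(GZIP_MAGIC):
        return report + compare(gzip.decompress(a), gzip.decompress(b),
                                name + " (gunzipped)", depth + 1)
    # Layer 2: both sides are tar archives, so compare listings, then members.
    tar_a, tar_b = read_tar(a), read_tar(b)
    if tar_a is not None and tar_b is not None:
        listing_a = sorted((n, m) for n, (m, _) in tar_a.items())
        listing_b = sorted((n, m) for n, (m, _) in tar_b.items())
        if listing_a != listing_b:
            report.append(pad + "  file list differs (names or timestamps)")
        for member in sorted(tar_a.keys() & tar_b.keys()):
            report += compare(tar_a[member][1], tar_b[member][1], member, depth + 1)
        return report
    # Leaf: plain text gets a unified diff, anything else is reported as binary.
    try:
        lines_a, lines_b = a.decode().splitlines(), b.decode().splitlines()
    except UnicodeDecodeError:
        return report + [pad + "  binary content differs"]
    diff = difflib.unified_diff(lines_a, lines_b, "a", "b", lineterm="")
    return report + [pad + "  " + line for line in diff]


a_gz = gzip.compress(make_tar("file", b"foo\n", 1000))
b_gz = gzip.compress(make_tar("file", b"bar\n", 2000))
report = compare(a_gz, b_gz, "archive.tar.gz")
print("\n".join(report))
```

The printed report is nested in the same way as the slide: the gzip layer first, then the tar listing with its differing timestamps, then the "-foo" / "+bar" lines of the file itself. The real tool handles many more formats and also reports metadata of each layer; this sketch only shows why recursing on each unpacked layer gets you to a readable diff.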
99:59:59.999 --> 99:59:59.999 And it's completely recursive, I think it will actually blow out after, I think, 99:59:59.999 --> 99:59:59.999 1000 [levels]. 99:59:59.999 --> 99:59:59.999 [light is turned down for the audience to see the slides] 99:59:59.999 --> 99:59:59.999 I'll just bump back a bit, just in case. 99:59:59.999 --> 99:59:59.999 [Applause] 99:59:59.999 --> 99:59:59.999 Thank you. 99:59:59.999 --> 99:59:59.999 So that's the a and b files. 99:59:59.999 --> 99:59:59.999 We've tarred them up and so I see the hierarchy of foo and bar file layer. 99:59:59.999 --> 99:59:59.999 I've gzipped them, so this is a gzip layer. 99:59:59.999 --> 99:59:59.999 Here's the tar layer and then there's the files themselves. 99:59:59.999 --> 99:59:59.999 This is from a real .deb from the archive. 99:59:59.999 --> 99:59:59.999 Inside this .deb, there's a data.tar.xz and in that xz file there's a data.tar 99:59:59.999 --> 99:59:59.999 and inside that tar file, there's a file called aff and inside that 99:59:59.999 --> 99:59:59.999 there's a version string that is different. 99:59:59.999 --> 99:59:59.999 And that looks like a build date so we probably know that if we went back 99:59:59.999 --> 99:59:59.999 to the source package, we could very quickly work out, 99:59:59.999 --> 99:59:59.999 with a very quick grep, work out where this file is being generated from, 99:59:59.999 --> 99:59:59.999 the de_DE.aff file and then ??? probably quite obvious 99:59:59.999 --> 99:59:59.999 that it's using the current build time and then we can just patch that, fix it etc. 99:59:59.999 --> 99:59:59.999 This has gone from two rather obscure binary .debs all the way to the fix 99:59:59.999 --> 99:59:59.999 probably in about 5 minutes, and you can probably send the patch in that time 99:59:59.999 --> 99:59:59.999 because it'd be quite quick.
99:59:59.999 --> 99:59:59.999 Without diffoscope here, without this sort of recursive unpacking, 99:59:59.999 --> 99:59:59.999 you'd be just completely lost, you'd be there with "ar x" all day 99:59:59.999 --> 99:59:59.999 and working out which files are different and trying to use xxd 99:59:59.999 --> 99:59:59.999 and this kind of nonsense. 99:59:59.999 --> 99:59:59.999 diffoscope's got some other things as well 99:59:59.999 --> 99:59:59.999 if you try to do reproducible packages and things are varying just on 99:59:59.999 --> 99:59:59.999 the line ordering, we detect whether a file differs only in the line ordering. 99:59:59.999 --> 99:59:59.999 So, here's file "a", "These lines are in order". 99:59:59.999 --> 99:59:59.999 File "b" has "These order are in lines". 99:59:59.999 --> 99:59:59.999 It's very difficult to say, actually, it's like one of these tongue twisters. 99:59:59.999 --> 99:59:59.999 Run diffoscope on these two and it says it's got ordering differences only. 99:59:59.999 --> 99:59:59.999 That's interesting, so you probably need to sort, 99:59:59.999 --> 99:59:59.999 you go all the way back to the source code, work out very quickly, 99:59:59.999 --> 99:59:59.999 if you know it's just ordering differences you just kind of know 99:59:59.999 --> 99:59:59.999 what the output's gonna be, you can search for order in ??? 99:59:59.999 --> 99:59:59.999 and you get the right files, 99:59:59.999 --> 99:59:59.999 ??? sort in the right place, BAM, send it patch of (???), 99:59:59.999 --> 99:59:59.999 everything is great. 99:59:59.999 --> 99:59:59.999 Oh, and send it to upstream as well because you're good. 99:59:59.999 --> 99:59:59.999 It supports a lot more things. 99:59:59.999 --> 99:59:59.999 We've been showing the terminal text output here. 99:59:59.999 --> 99:59:59.999 It's got an HTML output mode, which is really useful in the hierarchical thing 99:59:59.999 --> 99:59:59.999 when it gets a bit more complicated.
99:59:59.999 --> 99:59:59.999 Instead of being laid on top of each other like a unified diff, 99:59:59.999 --> 99:59:59.999 you get the diff on the left and the right and you get sort of a nested 99:59:59.999 --> 99:59:59.999 thing inside with colors and lines and you can link this and various things in it 99:59:59.999 --> 99:59:59.999 including bits of metadata here, other bits here, what command you used. 99:59:59.999 --> 99:59:59.999 That's the HTML output. 99:59:59.999 --> 99:59:59.999 We also support a lot of file formats, it's not just on text, 99:59:59.999 --> 99:59:59.999 it's about all of these, so let's quickly run through some of them. 99:59:59.999 --> 99:59:59.999 You give it two Android apk files which are kind of like zips, but magic. 99:59:59.999 --> 99:59:59.999 It'll know how to compare them. 99:59:59.999 --> 99:59:59.999 There's like a Manifest file that needs decoding. 99:59:59.999 --> 99:59:59.999 It supports Berkeley DB databases, 99:59:59.999 --> 99:59:59.999 Word documents, that's a Word document with "a" and that's a Word document with "b" 99:59:59.999 --> 99:59:59.999 and it'll correctly do that. 99:59:59.999 --> 99:59:59.999 If you run that through diff normally, that ??? be a binary mess, 99:59:59.999 --> 99:59:59.999 so completely useless. 99:59:59.999 --> 99:59:59.999 E-books, there's epub, it also supports mobi. 99:59:59.999 --> 99:59:59.999 So if you give it two epub files, it'll say "They just differ in this date". 99:59:59.999 --> 99:59:59.999 Brilliant. 99:59:59.999 --> 99:59:59.999 Normally that will be completely useless diff binary ??? 99:59:59.999 --> 99:59:59.999 So you can be like "epub date, ok", grep the source code for that, 99:59:59.999 --> 99:59:59.999 make a patch really quickly. 99:59:59.999 --> 99:59:59.999 Mono binaries, git repositories, why not? 99:59:59.999 --> 99:59:59.999 Gnumeric spreadsheets, ISO images. 99:59:59.999 --> 99:59:59.999 Oh yeah, ISO images is really cool.
99:59:59.999 --> 99:59:59.999 So, it'll basically unpack the ISO, then inside that there might be a squashfs image 99:59:59.999 --> 99:59:59.999 then it'll completely go down to that and work out any differences 99:59:59.999 --> 99:59:59.999 between the two contents in the ISO file, including any metadata. 99:59:59.999 --> 99:59:59.999 This is on the squashfs metadata headers, I think. 99:59:59.999 --> 99:59:59.999 But say inside that ISO, there was a file that was a pdf, and inside that pdf was 99:59:59.999 --> 99:59:59.999 a ??? which varied, 99:59:59.999 --> 99:59:59.999 it will basically go all the way down and say "yeah, it's actually here, 99:59:59.999 --> 99:59:59.999 in this ??? that the data differs." 99:59:59.999 --> 99:59:59.999 And that means you can just go again all the way back to the source 99:59:59.999 --> 99:59:59.999 and say "ok, cool, we know how to fix this quite quickly". 99:59:59.999 --> 99:59:59.999 And this is really valuable in getting the recent Tails distribution reproducible 99:59:59.999 --> 99:59:59.999 so their ISOs are reproducible. 99:59:59.999 --> 99:59:59.999 If you build one and I build one, we get the exact same one 99:59:59.999 --> 99:59:59.999 and that's kind of useful for something like Tails where you would probably want to… 99:59:59.999 --> 99:59:59.999 of all the many projects that you might want to compromise, 99:59:59.999 --> 99:59:59.999 you might want to go after that one, because of the kind of people that are using it. 99:59:59.999 --> 99:59:59.999 We support comparing images, so this is using ??? 99:59:59.999 --> 99:59:59.999 and then just running that through diff. 99:59:59.999 --> 99:59:59.999 That is a Linux penguin and that is something else, 99:59:59.999 --> 99:59:59.999 I can't remember now. Oh, FT. 99:59:59.999 --> 99:59:59.999 It supports images.
99:59:59.999 --> 99:59:59.999 It supports JSON and pretty print, so if you give it two JSON files 99:59:59.999 --> 99:59:59.999 one with key/value… it'll do a nice diff of them. 99:59:59.999 --> 99:59:59.999 It will pretty print it first, before doing the diff, so it'll actually give you 99:59:59.999 --> 99:59:59.999 something clean, otherwise I don't know if you've ever diffed 99:59:59.999 --> 99:59:59.999 two very long JSON lines, if they differ in the middle, you just get 99:59:59.999 --> 99:59:59.999 a huge long unified diff, but here it's like "oh, just ??? things have changed". 99:59:59.999 --> 99:59:59.999 OpenDocument text formats, Ogg audio files, because why not. 99:59:59.999 --> 99:59:59.999 tcpdump capture files, that's actually quite useful. 99:59:59.999 --> 99:59:59.999 PDFs. That PDF says "Hello World" and this PDF says "Hello sick sad world", 99:59:59.999 --> 99:59:59.999 I don't know why. ??? in the demo. 99:59:59.999 --> 99:59:59.999 Again, run that through normal diff program… garbage. 99:59:59.999 --> 99:59:59.999 XML documents. Again, it'll pretty print them so it's nice, actually nice to read. 99:59:59.999 --> 99:59:59.999 If you want to get started on diffoscope, the very easiest and quickest way to do so is 99:59:59.999 --> 99:59:59.999 fire up a web browser, try.diffoscope.org, select your files, press Compare 99:59:59.999 --> 99:59:59.999 and it'll upload them and run diffoscope with all the support for all the file formats 99:59:59.999 --> 99:59:59.999 in the cloud for you and give you a nice HTML page that you can then link to people. 99:59:59.999 --> 99:59:59.999 So that's the very quickest way to get started.
99:59:59.999 --> 99:59:59.999 The next quickest way is to install trydiffoscope and then you run that 99:59:59.999 --> 99:59:59.999 on two files and it'll basically do the same thing, 99:59:59.999 --> 99:59:59.999 run it in the same cloud service as try.diffoscope.org 99:59:59.999 --> 99:59:59.999 but it'll give you the result on the command line or 99:59:59.999 --> 99:59:59.999 if you pass the webbrowser option, it will give you a URL or load your web browser, 99:59:59.999 --> 99:59:59.999 I can't remember exactly which, with the same results. 99:59:59.999 --> 99:59:59.999 This is 1kB of Python, nothing basically. 99:59:59.999 --> 99:59:59.999 That's the next easiest way. 99:59:59.999 --> 99:59:59.999 But you can then install diffoscope itself on your own machine. 99:59:59.999 --> 99:59:59.999 I recommend not installing recommends because all of those file formats 99:59:59.999 --> 99:59:59.999 might drag in extra things, about the whole of TeX, 99:59:59.999 --> 99:59:59.999 I think the whole of OpenOffice, whole of Mono, whole of Java… 99:59:59.999 --> 99:59:59.999 Android, yeah, quite big. 99:59:59.999 --> 99:59:59.999 I think there's another big one I can't think of. 99:59:59.999 --> 99:59:59.999 They're all optional, and they all say "By the way, I support TeX documents 99:59:59.999 --> 99:59:59.999 or whatever, Mono, whatever. 99:59:59.999 --> 99:59:59.999 But you need to install this package and then you get full pretty printed support", 99:59:59.999 --> 99:59:59.999 And it'll tell you that when it's missing. 99:59:59.999 --> 99:59:59.999 So, if you just start with --install-recommends disabled, 99:59:59.999 --> 99:59:59.999 run it on your file, if it says "please install this package", you can then 99:59:59.999 --> 99:59:59.999 install them as you go along, as you want, 99:59:59.999 --> 99:59:59.999 rather than installing everything. 99:59:59.999 --> 99:59:59.999 And then ???
and then works as before. 99:59:59.999 --> 99:59:59.999 You can improve all your own quality assurance and Debian packaging 99:59:59.999 --> 99:59:59.999 with diffoscope. 99:59:59.999 --> 99:59:59.999 The biggest value here is not necessarily for reproducible builds. 99:59:59.999 --> 99:59:59.999 It's for basically just seeing where you do want to have a diff or are expecting a diff 99:59:59.999 --> 99:59:59.999 and you are expecting a particular type of diff in a particular way, 99:59:59.999 --> 99:59:59.999 you can basically see those changes. 99:59:59.999 --> 99:59:59.999 And if you build two debs normally and... I'll try to demo in a second. 99:59:59.999 --> 99:59:59.999 You build a deb with a patch applied, you can ??? see a diff on the source package. 99:59:59.999 --> 99:59:59.999 But that's not very useful because the binaries are what's going to end up on 99:59:59.999 --> 99:59:59.999 people's machines. But if you run a diff on the binary itself, did that change 99:59:59.999 --> 99:59:59.999 really hit the binary? I think really... No... 99:59:59.999 --> 99:59:59.999 I'll just run through a very live demo, of course, so it's gonna fail... 99:59:59.999 --> 99:59:59.999 Check out some...
We'll get this libnetx-java. 99:59:59.999 --> 99:59:59.999 We just build that once. 99:59:59.999 --> 99:59:59.999 Let's say we are on the security team and 99:59:59.999 --> 99:59:59.999 want to apply a patch, and we want to be really sure because we are about to push it out 99:59:59.999 --> 99:59:59.999 to all our users. 99:59:59.999 --> 99:59:59.999 First we will make a changelog, 99:59:59.999 --> 99:59:59.999 closing a bug. 99:59:59.999 --> 99:59:59.999 Find some Java file to change. 99:59:59.999 --> 99:59:59.999 Let's pretend we have a real patch. 99:59:59.999 --> 99:59:59.999 Let's replace that equals equals, say that was the fix. 99:59:59.999 --> 99:59:59.999 So that's the patch from upstream. 99:59:59.999 --> 99:59:59.999 Upstream blast patch 99:59:59.999 --> 99:59:59.999 When we build this, what we wanna see is just that change in the file, 99:59:59.999 --> 99:59:59.999 we don't wanna see any nonsense changes of extended ??? but we also definitely want 99:59:59.999 --> 99:59:59.999 to see that change, 'cause if our binary, released for security reasons, doesn't have that change 99:59:59.999 --> 99:59:59.999 then we aren't fixing people's machines, they will issue a DSA ??? installed, saying 99:59:59.999 --> 99:59:59.999 And you should do proper testing as well at multiple levels. 99:59:59.999 --> 99:59:59.999 I will build that again. 99:59:59.999 --> 99:59:59.999 So we wanna diff the original one. 99:59:59.999 --> 99:59:59.999 We wanna diff that one with a fake security one. 99:59:59.999 --> 99:59:59.999 You see on the progress bar 100% 1- there are differences (there should be 99:59:59.999 --> 99:59:59.999 differences). Let's see what those differences are 99:59:59.999 --> 99:59:59.999 in our web browser, it's a nice HTML output. 99:59:59.999 --> 99:59:59.999 Let's have a look. Are we seeing what we wanna see? 99:59:59.999 --> 99:59:59.999 There are some changes in the data.tar, we kind of expect that. 99:59:59.999 --> 99:59:59.999 What's changed in our control file?
Well, the version changed, we wanted that 99:59:59.999 --> 99:59:59.999 to change. Perfect. 99:59:59.999 --> 99:59:59.999 And it's changed to ??? That's what we wanna see. 99:59:59.999 --> 99:59:59.999 No other changes here, so there was no weird control or in magic going on. 99:59:59.999 --> 99:59:59.999 In our data tar the color of the timestamp changes, we will ignore these for now. 99:59:59.999 --> 99:59:59.999 The changelog has changed, well I hope so because I have changed that entry. 99:59:59.999 --> 99:59:59.999 Here is where we're going to start seeing... We are going to see the change in the 99:59:59.999 --> 99:59:59.999 jar file, which is the Java class, Java compiled archive format. 99:59:59.999 --> 99:59:59.999 We are seeing some meaningless timestamp changes but we can ignore those, 99:59:59.999 --> 99:59:59.999 ??? 'cause it's just metadata maybe. 99:59:59.999 --> 99:59:59.999 Ok, part of a class, so if you can see here it's basically a decompilation of the 99:59:59.999 --> 99:59:59.999 Java file itself and it's basically saying "oh, I used to say ifnull and now ifnonnull". 99:59:59.999 --> 99:59:59.999 So these are the actual Java bytecode instructions and what's really... 99:59:59.999 --> 99:59:59.999 And what is really ??? here is that nothing else has changed. 99:59:59.999 --> 99:59:59.999 We were just expecting that change between the two opcodes, of ifnull and ifnonnull, 99:59:59.999 --> 99:59:59.999 which is good 'cause it's like it hasn't made any other code changes, but also crucially we can 99:59:59.999 --> 99:59:59.999 see that it has actually made a change to the code. 99:59:59.999 --> 99:59:59.999 For example, it wasn't using some cached version or something like that. 99:59:59.999 --> 99:59:59.999 This is really useful. 99:59:59.999 --> 99:59:59.999 And just running a naive diff wouldn't give that of course, because it would just 99:59:59.999 --> 99:59:59.999 come with binary garbage. And just seeing the diff had changed again 99:59:59.999 --> 99:59:59.999 ???
have told you anything, because all of the change would have changed as well. 99:59:59.999 --> 99:59:59.999 So it's like, well, yes, it's different. 99:59:59.999 --> 99:59:59.999 The meaningful change there is what actually fixes the flaw, 99:59:59.999 --> 99:59:59.999 ??? but we know it's there. 99:59:59.999 --> 99:59:59.999 That's kind of ??? Shipping this deb out I'll be quite 99:59:59.999 --> 99:59:59.999 confident that this seemed like the actual bug. 99:59:59.999 --> 99:59:59.999 I'd be quite confident pushing that out because it's a very minimal amount of changes, 99:59:59.999 --> 99:59:59.999 you wanna do that for security reasons. 99:59:59.999 --> 99:59:59.999 So this was the live demo. 99:59:59.999 --> 99:59:59.999 The other one is seeing no changes at all, so you can build once, 99:59:59.999 --> 99:59:59.999 if you build a reproducible... 99:59:59.999 --> 99:59:59.999 You can build once, change your compiler or change some other part of your toolchain. 99:59:59.999 --> 99:59:59.999 Build it again and if you got the exact same results, well great, that's what you intended. 99:59:59.999 --> 99:59:59.999 You wanna see no changes when you change some part of it. 99:59:59.999 --> 99:59:59.999 And that is really useful, if there were changes diffoscope will highlight them 99:59:59.999 --> 99:59:59.999 and show exactly why they had changed, maybe some compiler optimizations, 99:59:59.999 --> 99:59:59.999 maybe some other things as well. 99:59:59.999 --> 99:59:59.999 So you can use it in both ways, when you expect changes and when you don't expect 99:59:59.999 --> 99:59:59.999 changes, and if those don't match the expectations diffoscope will tell you exactly why. 99:59:59.999 --> 99:59:59.999 It's all ???
when other companies are doing security releases, 99:59:59.999 --> 99:59:59.999 naming no names whatsoever, but they like to release patches as, you 99:59:59.999 --> 99:59:59.999 know, just a new firmware for your router. 99:59:59.999 --> 99:59:59.999 Very large file system images, you basically have no idea what changed 99:59:59.999 --> 99:59:59.999 between these two files, again you ??? through diff, completely useless. 99:59:59.999 --> 99:59:59.999 You can start to unpack them with ??? and blah blah blah. 99:59:59.999 --> 99:59:59.999 But they're probably sort of concatenated cpio archives, so that's nonsense. 99:59:59.999 --> 99:59:59.999 But diffoscope would just chew through those and give you actually what the difference 99:59:59.999 --> 99:59:59.999 is between these two files, and say they changed this, they've removed or 99:59:59.999 --> 99:59:59.999 added some GPL-licensed code or something kind of interesting. 99:59:59.999 --> 99:59:59.999 So it's very useful for diffing those kinds of binary blobs that come from various people. 99:59:59.999 --> 99:59:59.999 So the current state of diffoscope, the development is up and down. 99:59:59.999 --> 99:59:59.999 It started around May 2014, something like that. A bunch of work here, that's idle I think. 99:59:59.999 --> 99:59:59.999 These are just for DebConfs basically. 99:59:59.999 --> 99:59:59.999 Anyway, it's going up and down, it's kind of interesting. 99:59:59.999 --> 99:59:59.999 ??? a lot of reproducible builds projects of course, so every time we do a build 99:59:59.999 --> 99:59:59.999 on the ???
reproducible builds, our testing framework, we run diffoscope 99:59:59.999 --> 99:59:59.999 on the result, if it's reproducible it just says "hey, the file is the same". 99:59:59.999 --> 99:59:59.999 But if not, we publish the diffoscopes of all your packages that are unreproducible 99:59:59.999 --> 99:59:59.999 so you can just go there and be like "what's the difference between these two things?" 99:59:59.999 --> 99:59:59.999 I invested a lot of work optimizing diffoscope, ??? rather perverse n-squared 99:59:59.999 --> 99:59:59.999 loops inside it. So I managed to cut down some of the time here, cut down here. 99:59:59.999 --> 99:59:59.999 There's been quite a few performance enhancements over the past... 99:59:59.999 --> 99:59:59.999 These are the git tags, this is version 80 and this is version 50, I just ran the same 99:59:59.999 --> 99:59:59.999 benchmark across them all. 99:59:59.999 --> 99:59:59.999 So this shows when I have introduced some rather stupid code, embarrassing, but whatever. 99:59:59.999 --> 99:59:59.999 ??? 99:59:59.999 --> 99:59:59.999 There's work being done right now on parallel processing, there's been 99:59:59.999 --> 99:59:59.999 quite a few attempts before, but adding it is kind of interesting and difficult. 99:59:59.999 --> 99:59:59.999 Luckily we have a ??? student, Liliana, is she in the room? Is she hiding?
99:59:59.999 --> 99:59:59.999 She's here, and she'll be talking tomorrow about her work on parallel processing in 99:59:59.999 --> 99:59:59.999 diffoscope, and that will be amazing because a lot of it is I/O-bound or waiting for external 99:59:59.999 --> 99:59:59.999 processes. With multi-CPU machines, you might as well just play well. 99:59:59.999 --> 99:59:59.999 While I'm standing waiting for the result of a PDF being unpacked, I may as well 99:59:59.999 --> 99:59:59.999 be running on another CPU. I think we are going to see some real performance wins 99:59:59.999 --> 99:59:59.999 as we get that parallel processing merged and working and ??? 99:59:59.999 --> 99:59:59.999 You can check out our website, diffoscope.org. Recently migrated to Salsa... yeah! 99:59:59.999 --> 99:59:59.999 And everything ??? reproducible is now on Salsa; it's kind of cool. 99:59:59.999 --> 99:59:59.999 That's quite recent... 99:59:59.999 --> 99:59:59.999 Thank you very much. Danke schön. 99:59:59.999 --> 99:59:59.999 You got any questions? About diffoscope? 99:59:59.999 --> 99:59:59.999 Thank you very much! 99:59:59.999 --> 99:59:59.999 Q: A buzzword question: can you diff container image formats? 99:59:59.999 --> 99:59:59.999 A: Depends which ones. If they are just a directory, then yes, because it's just a directory. 99:59:59.999 --> 99:59:59.999 Do you have one in particular in mind? Like Docker? 99:59:59.999 --> 99:59:59.999 Yes, there's Docker and then there's OCI, which I believe is the standard one. 99:59:59.999 --> 99:59:59.999 And that could make it buzzword-compliant. 99:59:59.999 --> 99:59:59.999 Ah, OK, we're all about buzzwords. 99:59:59.999 --> 99:59:59.999 Probably a diffoscope blockchain as well. 99:59:59.999 --> 99:59:59.999 And then run diffoscope on containers and see the difference between updates of your 99:59:59.999 --> 99:59:59.999 container images. 99:59:59.999 --> 99:59:59.999 BAM... solved. Where do I invest? 99:59:59.999 --> 99:59:59.999 I wasn't aware that OCI ...
that's what it's called? No, it doesn't support that right now. 99:59:59.999 --> 99:59:59.999 But it wouldn't be too difficult, presuming there are tools to unpack it. As soon as we have 99:59:59.999 --> 99:59:59.999 a tool to unpack it, it can then just go through that. There is a wishlist bug 99:59:59.999 --> 99:59:59.999 for Docker containers, to the point where I think it would be really nice if you 99:59:59.999 --> 99:59:59.999 could just give it, say, two image names, or whatever the noun is. 99:59:59.999 --> 99:59:59.999 So you can say "please diff these two Docker images that are available" and 99:59:59.999 --> 99:59:59.999 it can look at your local thing and do a diff on them. Currently it's not 99:59:59.999 --> 99:59:59.999 supported, but there is an open wishlist bug. 99:59:59.999 --> 99:59:59.999 Q: Shouldn't any company that releases binaries be interested in supporting 99:59:59.999 --> 99:59:59.999 diffoscope and using it? 99:59:59.999 --> 99:59:59.999 A1: Basically, when companies release binaries, they are not interested in users seeing differences... 99:59:59.999 --> 99:59:59.999 A2: Yes, I'm actually surprised that the Docker bug was only opened two months ago 99:59:59.999 --> 99:59:59.999 and there hasn't been more interest in diffing container images, but if you'd like to open 99:59:59.999 --> 99:59:59.999 one for OCI, that would be very much appreciated, and we can get on to that. That would be 99:59:59.999 --> 99:59:59.999 great. 99:59:59.999 --> 99:59:59.999 I was looking at the page for OCI; it says it's based on Docker, basically, so 99:59:59.999 --> 99:59:59.999 you'd get OCI for free once you've sorted it out for Docker, if you're lucky. 99:59:59.999 --> 99:59:59.999 The OCI image format is based on Docker images. 99:59:59.999 --> 99:59:59.999 OK, we will sort that out, and it seems like we're using Docker more and more 99:59:59.999 --> 99:59:59.999 in Debian. 99:59:59.999 --> 99:59:59.999 Any other questions?
99:59:59.999 --> 99:59:59.999 Q: Out of curiosity, which ??? are you using inside? Are you using some bioinformatics 99:59:59.999 --> 99:59:59.999 on ??? to diff trees efficiently? 99:59:59.999 --> 99:59:59.999 A: No, it's really naive. All it does is run normal diff, the normal diff tools, but 99:59:59.999 --> 99:59:59.999 it will try to identify files and unpack them first. So it uses the file utility, the identifier 99:59:59.999 --> 99:59:59.999 thing that says it's a PDF, and tries to unpack it first. It doesn't do any clever 99:59:59.999 --> 99:59:59.999 matching. The one clever thing it does do is fuzzy matching, so if you just 99:59:59.999 --> 99:59:59.999 rename a directory between the two inside a container, it will say, yeah, there's a 99:59:59.999 --> 99:59:59.999 massive match between these two files, and things like that. So that's kind of 99:59:59.999 --> 99:59:59.999 useful. ??? it's not that clever, which is kind of what you want, because if it's 99:59:59.999 --> 99:59:59.999 too clever, it would start to be a little opaque... 99:59:59.999 --> 99:59:59.999 I personally like dumb tools. 99:59:59.999 --> 99:59:59.999 Q: So one question to you is whether, if you want to do a release to stable or 99:59:59.999 --> 99:59:59.999 something like that, you can ask for the debdiff. I'm wondering if anyone... 99:59:59.999 --> 99:59:59.999 I mean, I remember doing that myself; I've been submitting diffoscope output 99:59:59.999 --> 99:59:59.999 as well, because it's just more readable and useful. So I'm not sure if anyone has any 99:59:59.999 --> 99:59:59.999 objection to people asking for those. 99:59:59.999 --> 99:59:59.999 I'll propose that to the release team and see what they say. 99:59:59.999 --> 99:59:59.999 Thank you very much. Any further questions? 99:59:59.999 --> 99:59:59.999 [Applause]