9:59:59.000,9:59:59.000 I'm here today to talk to you about[br]diffoscope 9:59:59.000,9:59:59.000 and how you can use it as a better diff 9:59:59.000,9:59:59.000 or for Quality Assurance, etc., things[br]like that. 9:59:59.000,9:59:59.000 Moin! 9:59:59.000,9:59:59.000 Apparently that's like a North German[br]thing to say "welcome". 9:59:59.000,9:59:59.000 North German, north Denmark, Scandinavia,[br]that kind of thing, I'm told. 9:59:59.000,9:59:59.000 People are shaking their head, so I'm[br]going to assume that's true. 9:59:59.000,9:59:59.000 This is my first PC, an IBM 5155. 9:59:59.000,9:59:59.000 Sometimes, when you rebooted it, it would[br]launch into, it would somehow revert 9:59:59.000,9:59:59.000 from booting from the hard disk to booting[br]from a BASIC ROM, 9:59:59.000,9:59:59.000 as in the programming language ROM. 9:59:59.000,9:59:59.000 It was on my motherboard for some reason. 9:59:59.000,9:59:59.000 So, randomly, you'd just get a chance to[br]program in BASIC and then, 9:59:59.000,9:59:59.000 sometimes you wouldn't, I don't know why,[br]but… yeah. 9:59:59.000,9:59:59.000 It's quite fun with this kind of clicky[br]keyboard, and that folded in 9:59:59.000,9:59:59.000 and it was this kind of big desk thing. 9:59:59.000,9:59:59.000 Anyway… 9:59:59.000,9:59:59.000 This is my first Debian. 9:59:59.000,9:59:59.000 At the time it was already old. 9:59:59.000,9:59:59.000 What's this one? Is this Slink? 2.2?[br]Yeah. 9:59:59.000,9:59:59.000 And this is when we had US and non-US,[br]so that's really dating you if you remember that. 9:59:59.000,9:59:59.000 This is my first contribution to Debian,[br]19th December 2006, 9:59:59.000,9:59:59.000 sending a patch to LilyPond which is kind[br]of interesting 9:59:59.000,9:59:59.000 and the response was "Oh yeah, rock on,[br]many thanks. I'll upload this and 9:59:59.000,9:59:59.000 it'll be landing to Etch".
9:59:59.000,9:59:59.000 And this was super motivating because[br]Etch was just coming out and it was like 9:59:59.000,9:59:59.000 "Great, I've got, like, a one-line tiny patch[br]in a release. This is super cool." 9:59:59.000,9:59:59.000 Thomas' response was super motivating. 9:59:59.000,9:59:59.000 So, after that, like that Christmas[br]basically spent ??? 9:59:59.000,9:59:59.000 Debian webpages and stuff. 9:59:59.000,9:59:59.000 Very well timed. 9:59:59.000,9:59:59.000 That's kind of a good… 9:59:59.000,9:59:59.000 You know, someone sends a patch, be like[br]"Cool, thanks" 9:59:59.000,9:59:59.000 Like a little notice in the changelog. 9:59:59.000,9:59:59.000 It was, you know, so stupid but…[br]Yeah, do that kind of thing. 9:59:59.000,9:59:59.000 So, moving on. 9:59:59.000,9:59:59.000 Why diffoscope?[br]Why did we write diffoscope? 9:59:59.000,9:59:59.000 What's the background here? 9:59:59.000,9:59:59.000 It comes from reproducible builds. 9:59:59.000,9:59:59.000 The very quick outline is that once you[br]get the source code for free software, 9:59:59.000,9:59:59.000 you download the source code for nginx[br]or whatever, 9:59:59.000,9:59:59.000 pretty much everyone just runs binaries[br]on their servers or their systems. 9:59:59.000,9:59:59.000 You know, "apt install bla", "yum install",[br]whatever. 9:59:59.000,9:59:59.000 Android Play Store, whatever. 9:59:59.000,9:59:59.000 Can you actually trust whether these two[br]things correspond with each other? 9:59:59.000,9:59:59.000 You've gotten the source code, it looks[br]alright, and then you install this binary, 9:59:59.000,9:59:59.000 yeah… 9:59:59.000,9:59:59.000 Who generated that? Can you trust that[br]process? 9:59:59.000,9:59:59.000 Can you trust who generated it? 9:59:59.000,9:59:59.000 Even if you could trust them, could you[br]trust them not to be exploited? Etc.
9:59:59.000,9:59:59.000 This is a big problem because you can[br]exploit a build farm and then 9:59:59.000,9:59:59.000 obviously exploit all of that, you know,[br]a trojan into the build farm, 9:59:59.000,9:59:59.000 so every single binary that comes out[br]is compromised. 9:59:59.000,9:59:59.000 Kind of problematic. 9:59:59.000,9:59:59.000 You could also target individual developers'[br]machines, 9:59:59.000,9:59:59.000 so I could go off to, say, your machine,[br]add a backdoor to it, 9:59:59.000,9:59:59.000 so every binary that you give to friends[br]and things like that 9:59:59.000,9:59:59.000 is compromised in some way, stealing[br]your bitcoins or whatever. 9:59:59.000,9:59:59.000 I can also ???[br]and blackmail you into producing 9:59:59.000,9:59:59.000 software that has compromises or extra[br]features, shall we say, 9:59:59.000,9:59:59.000 that don't exist in the source code. 9:59:59.000,9:59:59.000 So what will happen there is that you'd[br]release your source 9:59:59.000,9:59:59.000 and the binaries you produce have[br]this sort of backdoor that, you know, 9:59:59.000,9:59:59.000 someone is forcing you into producing. 9:59:59.000,9:59:59.000 So, you don't want to do that. 9:59:59.000,9:59:59.000 Anyway 9:59:59.000,9:59:59.000 enough of that. 9:59:59.000,9:59:59.000 What you do for reproducible builds is you[br]ensure that every time you build 9:59:59.000,9:59:59.000 a piece of software, you get an identical[br]result. 9:59:59.000,9:59:59.000 Multiple people then compare their builds[br]and check whether they all get 9:59:59.000,9:59:59.000 the same results 9:59:59.000,9:59:59.000 and this means that an attacker must[br]either have infected everyone 9:59:59.000,9:59:59.000 at the same time, or they haven't[br]infected anyone. 9:59:59.000,9:59:59.000 The point here is that you have to ensure[br]that builds have identical results. 9:59:59.000,9:59:59.000 Ok, great. 9:59:59.000,9:59:59.000 So, we started the reproducible builds[br]project, etc.
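[Editor's sketch] The comparison step described above, where several people build the same source and check that they all got the same result, can be illustrated roughly as follows. This is a minimal sketch, not the reproducible builds project's actual tooling; the function names `sha256_of` and `builds_agree` are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a build artifact in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_agree(artifacts):
    """Return True when every independently built artifact is bit-for-bit identical.

    `artifacts` is a list of paths, one per builder (hypothetical layout).
    """
    return len({sha256_of(path) for path in artifacts}) == 1
```

If `builds_agree` returns False, the checksums say only that something differs, not what or why, which is the gap the rest of the talk addresses.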
9:59:59.000,9:59:59.000 And we build 2 debs. 9:59:59.000,9:59:59.000 Oh, I'm sorry about the colors there. 9:59:59.000,9:59:59.000 You probably can't see that. 9:59:59.000,9:59:59.000 That says "sha1sum a.deb b.deb". 9:59:59.000,9:59:59.000 Anyway, we're comparing the sha1sums[br]of 2 binary Debian files. 9:59:59.000,9:59:59.000 So, these two files differ. 9:59:59.000,9:59:59.000 Ok, they're not reproducible. 9:59:59.000,9:59:59.000 Why is that? 9:59:59.000,9:59:59.000 So we run a diff on them. 9:59:59.000,9:59:59.000 Yeah… 9:59:59.000,9:59:59.000 So, what can we learn from this? 9:59:59.000,9:59:59.000 Well, not very much, visibly they're[br]compressed so 9:59:59.000,9:59:59.000 as soon as there is one change, we'll just[br]see cascading changes 9:59:59.000,9:59:59.000 because that's how compression works. 9:59:59.000,9:59:59.000 I guess we know it's a deb ???[br]format file, not very useful. 9:59:59.000,9:59:59.000 Ok, great so we're gonna have a look in 9:59:59.000,9:59:59.000 We'll do a binary diff and ok, well… 9:59:59.000,9:59:59.000 Again, that's not really telling us[br]very much 9:59:59.000,9:59:59.000 with the diff there. 9:59:59.000,9:59:59.000 Ok, great. 9:59:59.000,9:59:59.000 ??? 9:59:59.000,9:59:59.000 "ar x" is on the new maintainer thing,[br]"how you unpack a deb" 9:59:59.000,9:59:59.000 Everyone remembers this, right? 9:59:59.000,9:59:59.000 You unpack a.deb with "ar x" and you[br]do that to b.deb 9:59:59.000,9:59:59.000 and then we diff the results of that. 9:59:59.000,9:59:59.000 Ok, so…yeah, 7zip. 9:59:59.000,9:59:59.000 Ok, compressed content, not very useful. 9:59:59.000,9:59:59.000 Ok, so let's unpack the control.tar inside[br]these debs. 9:59:59.000,9:59:59.000 And then we run diff on that. 9:59:59.000,9:59:59.000 Still not really telling us anything useful[br]about how to make this package reproducible 9:59:59.000,9:59:59.000 So let's unpack the tar.xz into the tar.
9:59:59.000,9:59:59.000 Inside that tar, there's a file called[br]md5sums and we start to see some differences 9:59:59.000,9:59:59.000 between some files in these two debs. 9:59:59.000,9:59:59.000 ??? meaningful, so now[br]we have some idea that 9:59:59.000,9:59:59.000 it has something to do with this[br]usr/bin/pmixer binary. 9:59:59.000,9:59:59.000 Ok, interesting. 9:59:59.000,9:59:59.000 We'll unzip that and then we do a diff on[br]pmixer itself. 9:59:59.000,9:59:59.000 Now we're back into just binary[br]??? mode 9:59:59.000,9:59:59.000 This isn't very helpful and this is taking[br]quite a while 9:59:59.000,9:59:59.000 and if I remember correctly, Debian has[br]a lot of packages. 9:59:59.000,9:59:59.000 So this might take a little while. 9:59:59.000,9:59:59.000 So, basically, ??? meme 9:59:59.000,9:59:59.000 I should build a better diff. 9:59:59.000,9:59:59.000 That's not quite true, this is actually… 9:59:59.000,9:59:59.000 It was Lunar that started this project 9:59:59.000,9:59:59.000 and it was called debbindiff, because[br]we wanted to diff 9:59:59.000,9:59:59.000 binary Debian packages. 9:59:59.000,9:59:59.000 So this is the initial commit, 2014. 9:59:59.000,9:59:59.000 "The version is successfully able to report[br]differences in two .changes files. 9:59:59.000,9:59:59.000 Not with much interesting details,[br]but it's a start." 9:59:59.000,9:59:59.000 And it was a start. 9:59:59.000,9:59:59.000 Fast forwarding… Oh, sorry about these[br]colors, 9:59:59.000,9:59:59.000 I don't know if we can do anything about[br]the lights? 9:59:59.000,9:59:59.000 Yeah? 9:59:59.000,9:59:59.000 No? 9:59:59.000,9:59:59.000 Alright, well… 9:59:59.000,9:59:59.000 Basically, we're diffoscoping on… 9:59:59.000,9:59:59.000 It works kind of like diff does normally, 9:59:59.000,9:59:59.000 you give it two files, it outputs[br]a unified diff. 9:59:59.000,9:59:59.000 So "diffoscope a b", one file contains[br]the word "foo", one contains the word "bar".
9:59:59.000,9:59:59.000 Nothing actually out of the ordinary. 9:59:59.000,9:59:59.000 It's sort of colored by default, so that's[br]why you can't see it, but whatever. 9:59:59.000,9:59:59.000 It supports archive formats, so if you[br]give it two tar files, 9:59:59.000,9:59:59.000 if we then tar up our "a" file and[br]our "b" file into an a.tar and b.tar 9:59:59.000,9:59:59.000 and then run diffoscope on those tar files 9:59:59.000,9:59:59.000 we get this kind of, like, hierarchy here. 9:59:59.000,9:59:59.000 So it's saying that there are differences[br]between these files, 9:59:59.000,9:59:59.000 in the file list they have different[br]timestamps, because I made them 9:59:59.000,9:59:59.000 at different times, 9:59:59.000,9:59:59.000 and here are the contents, so we got[br]"foo" there and "bar" there. 9:59:59.000,9:59:59.000 So we can see the difference between them. 9:59:59.000,9:59:59.000 Well, I can, I don't know if you can,[br]you get the slide there. 9:59:59.000,9:59:59.000 If we gzip these tar files and then run[br]diffoscope on those gzip things, 9:59:59.000,9:59:59.000 it'll say "ok, what we've done is unpack it[br]first, and here's the metadata 9:59:59.000,9:59:59.000 about the gzip process", 9:59:59.000,9:59:59.000 and inside that are a.tar and b.tar[br]from the previous slides. 9:59:59.000,9:59:59.000 And then the "a" file and the "b" file. 9:59:59.000,9:59:59.000 So, it's really going two levels deep[br]into this tar.gz file. 9:59:59.000,9:59:59.000 That's pretty cool. 9:59:59.000,9:59:59.000 And it's completely recursive, I think[br]it will actually blow out after, I think, 9:59:59.000,9:59:59.000 1000 [levels]. 9:59:59.000,9:59:59.000 [light is turned down for the audience[br]to see the slides] 9:59:59.000,9:59:59.000 I'll just bump back a bit, just in case. 9:59:59.000,9:59:59.000 [Applause] 9:59:59.000,9:59:59.000 Thank you. 9:59:59.000,9:59:59.000 So that's the a and b files.
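[Editor's sketch] The recursive behaviour described above (gzip layer, then tar layer, then the files themselves) can be illustrated with a toy comparator. This is only a sketch of the idea using the Python standard library, not diffoscope's implementation: the names `read_tar` and `compare` are hypothetical, only gzip and uncompressed tar are handled, and unlike diffoscope it ignores metadata such as timestamps.

```python
import difflib
import gzip
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"


def read_tar(data):
    """Return {member name: bytes} for an uncompressed tar, or None if not a tar."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as archive:
            return {
                member.name: archive.extractfile(member).read()
                for member in archive.getmembers()
                if member.isfile()
            }
    except tarfile.ReadError:
        return None


def compare(name_a, data_a, name_b, data_b, depth=0):
    """Return indented report lines; an empty list means the inputs are identical."""
    if data_a == data_b:
        return []
    pad = "  " * depth
    report = [f"{pad}{name_a} vs {name_b}"]

    # gzip layer: decompress both sides and recurse into the result
    if data_a.startswith(GZIP_MAGIC) and data_b.startswith(GZIP_MAGIC):
        return report + compare(
            name_a + " (gunzipped)", gzip.decompress(data_a),
            name_b + " (gunzipped)", gzip.decompress(data_b),
            depth + 1,
        )

    # tar layer: pair members up by name and recurse into each pair
    members_a, members_b = read_tar(data_a), read_tar(data_b)
    if members_a is not None and members_b is not None:
        for name in sorted(set(members_a) | set(members_b)):
            report += compare(
                name, members_a.get(name, b""),
                name, members_b.get(name, b""),
                depth + 1,
            )
        return report

    # leaf: fall back to a plain unified diff of the text
    lines_a = data_a.decode("utf-8", "replace").splitlines()
    lines_b = data_b.decode("utf-8", "replace").splitlines()
    diff = difflib.unified_diff(lines_a, lines_b, name_a, name_b, lineterm="")
    return report + [f"{pad}  {line}" for line in diff]
```

Each nesting level adds indentation to the report, which mirrors the hierarchy shown on the slides. A real tool also needs a depth limit and many more format handlers.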
9:59:59.000,9:59:59.000 We've tarred them up and so I see[br]the hierarchy of foo and bar file layer. 9:59:59.000,9:59:59.000 I've gzipped them, so this is a gzip layer. 9:59:59.000,9:59:59.000 Here's the tar layer and then there's[br]the files themselves. 9:59:59.000,9:59:59.000 This is from a real .deb from the archive. 9:59:59.000,9:59:59.000 Inside this .deb, there's a data.tar.xz[br]and in that xz file there's a data.tar 9:59:59.000,9:59:59.000 and inside that tar file, there's a file[br]called aff and inside that 9:59:59.000,9:59:59.000 there's a version string that is different. 9:59:59.000,9:59:59.000 And that looks like a build date so we[br]probably know that if we went back 9:59:59.000,9:59:59.000 to the source package, we could very[br]quickly work out, 9:59:59.000,9:59:59.000 with a very quick grep, work out[br]where this file is being generated from, 9:59:59.000,9:59:59.000 the de_DE.aff file and then ???[br]probably quite obvious 9:59:59.000,9:59:59.000 that it's using the current build time[br]and then we can just patch that, fix it etc. 9:59:59.000,9:59:59.000 This has gone from two rather obscure[br]binary .debs all the way to the fix 9:59:59.000,9:59:59.000 probably in about 5 minutes, and you can[br]probably send the patch in that time 9:59:59.000,9:59:59.000 because it'd be quite quick. 9:59:59.000,9:59:59.000 Without diffoscope here, without this sort[br]of recursive unpacking, 9:59:59.000,9:59:59.000 you'd be just completely lost, you'd be[br]there with "ar x" all day 9:59:59.000,9:59:59.000 and working out which files are different[br]and trying to use xxd 9:59:59.000,9:59:59.000 and this kind of nonsense. 9:59:59.000,9:59:59.000 diffoscope's got some other things as well 9:59:59.000,9:59:59.000 if you try to do reproducible packages[br]and things are varying just on 9:59:59.000,9:59:59.000 the line ordering, we detect whether[br]a file differs only in the line ordering.
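[Editor's sketch] One simple way to detect that two files differ only in line ordering is to compare the lines as multisets. This is a minimal illustration under that assumption; the function name `differs_only_in_line_order` is hypothetical and this is not necessarily how diffoscope implements the check.

```python
from collections import Counter


def differs_only_in_line_order(text_a, text_b):
    """True when both texts contain exactly the same lines, but in a different order."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    # Counter compares how often each line occurs, ignoring position
    return lines_a != lines_b and Counter(lines_a) == Counter(lines_b)
```

Using a Counter rather than a set keeps duplicate lines significant, so a file that repeats a line twice is not treated as equivalent to one that contains it once.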
9:59:59.000,9:59:59.000 So, here's file "a", "These lines are in[br]order". 9:59:59.000,9:59:59.000 File "b" has "These order are in lines". 9:59:59.000,9:59:59.000 It's very difficult to say, actually,[br]it's like one of these tongue twisters. 9:59:59.000,9:59:59.000 Run diffoscope on these two and it says[br]it's got ordering differences only. 9:59:59.000,9:59:59.000 That's interesting, so you probably need[br]to sort, 9:59:59.000,9:59:59.000 you go all the way back to the source code,[br]work out very quickly, 9:59:59.000,9:59:59.000 if you know it's just ordering differences[br]you just kind of know 9:59:59.000,9:59:59.000 what the output's gonna be, you can[br]search for order in ??? 9:59:59.000,9:59:59.000 and you get the right files, 9:59:59.000,9:59:59.000 ??? sort in the right place,[br]BAM, send it patch of (???), 9:59:59.000,9:59:59.000 everything is great. 9:59:59.000,9:59:59.000 Oh, and send it to upstream as well[br]because you're good. 9:59:59.000,9:59:59.000 It supports a lot more things. 9:59:59.000,9:59:59.000 We've been showing the terminal[br]text output here. 9:59:59.000,9:59:59.000 It's got an HTML output mode, which is[br]really useful in the hierarchical thing 9:59:59.000,9:59:59.000 when it gets a bit more complicated. 9:59:59.000,9:59:59.000 Instead of being laid on top of each other[br]like a unified diff, 9:59:59.000,9:59:59.000 you get the diff on the left and the right[br]and you get sort of a nested 9:59:59.000,9:59:59.000 thing inside with colors and lines and[br]you can link this and various things in it 9:59:59.000,9:59:59.000 including bits of metadata here, other[br]bits here, what command you used. 9:59:59.000,9:59:59.000 That's the HTML output. 9:59:59.000,9:59:59.000 We also support a lot of file formats,[br]it's not just on text, 9:59:59.000,9:59:59.000 it's about all of these, so let's quickly[br]run through some of them. 9:59:59.000,9:59:59.000 You give it two Android apk files which[br]are kind of like zips, but magic.
9:59:59.000,9:59:59.000 It'll know how to compare them. 9:59:59.000,9:59:59.000 There's like a Manifest file that needs[br]decoding. 9:59:59.000,9:59:59.000 It supports Berkeley DB databases, 9:59:59.000,9:59:59.000 Word documents, that's a Word document[br]with "a" and that's a Word document with "b" 9:59:59.000,9:59:59.000 and it'll correctly do that. 9:59:59.000,9:59:59.000 If you run that through diff normally,[br]that ??? be a binary mess, 9:59:59.000,9:59:59.000 so completely useless. 9:59:59.000,9:59:59.000 E-books, there's epub, it also supports[br]mobi. 9:59:59.000,9:59:59.000 So if you give it two epub files, it'll say[br]"They just differ in this date". 9:59:59.000,9:59:59.000 Brilliant. 9:59:59.000,9:59:59.000 Normally that will be completely useless[br]diff binary ??? 9:59:59.000,9:59:59.000 So you can be like "epub date, ok", grep[br]the source code for that, 9:59:59.000,9:59:59.000 make a patch really quickly. 9:59:59.000,9:59:59.000 Mono binaries, git repositories, why not? 9:59:59.000,9:59:59.000 Gnumeric spreadsheets, ISO images. 9:59:59.000,9:59:59.000 Oh yeah, ISO images is really cool. 9:59:59.000,9:59:59.000 So, it'll basically unpack the ISO, then[br]inside that there might be a squashfs image 9:59:59.000,9:59:59.000 then it'll completely go down to that and[br]work out any differences 9:59:59.000,9:59:59.000 between the two contents in the ISO file,[br]including any metadata. 9:59:59.000,9:59:59.000 This is on the squashfs metadata headers,[br]I think. 9:59:59.000,9:59:59.000 But say inside that ISO, there was a file[br]that was a pdf, and inside that pdf was 9:59:59.000,9:59:59.000 a ??? which varied, 9:59:59.000,9:59:59.000 it will basically go all the way down[br]and say "yeah, it's actually here, 9:59:59.000,9:59:59.000 in this ??? that the data differs."
9:59:59.000,9:59:59.000 And that means you can just go again[br]all the way back to the source 9:59:59.000,9:59:59.000 and say "ok, cool, we know how to fix[br]this quite quickly" 9:59:59.000,9:59:59.000 And this was really valuable in getting[br]the recent Tails distribution reproducible 9:59:59.000,9:59:59.000 so their ISOs are reproducible. 9:59:59.000,9:59:59.000 If you build one and I build one, we get[br]the exact same one 9:59:59.000,9:59:59.000 and that's kind of useful for something[br]like Tails, where, 9:59:59.000,9:59:59.000 of all the projects that you[br]might want to compromise, 9:59:59.000,9:59:59.000 you might want to go after that one,[br]because of the kind of people that are using it. 9:59:59.000,9:59:59.000 We support comparing images, so this is[br]using ??? 9:59:59.000,9:59:59.000 and then just running that through diff. 9:59:59.000,9:59:59.000 That is a Linux penguin and that is[br]something else, 9:59:59.000,9:59:59.000 I can't remember now. Oh, FT. 9:59:59.000,9:59:59.000 It supports images. 9:59:59.000,9:59:59.000 It supports JSON and pretty-prints it,[br]so if you give it two JSON files 9:59:59.000,9:59:59.000 one with key/value… it'll do a nice[br]diff of them. 9:59:59.000,9:59:59.000 It will pretty-print it first, before[br]doing the diff, so it'll actually give you 9:59:59.000,9:59:59.000 something clean, otherwise I don't know[br]if you've ever diffed 9:59:59.000,9:59:59.000 two very long JSON lines, if they differ[br]in the middle, you just get 9:59:59.000,9:59:59.000 a huge long unified diff, but here it's[br]like "oh, just ??? things have changed" 9:59:59.000,9:59:59.000 OpenDocument text formats,[br]Ogg audio files, because why not. 9:59:59.000,9:59:59.000 tcpdump capture files, that's actually[br]quite useful. 9:59:59.000,9:59:59.000 PDFs. That PDF says "Hello World" and[br]this PDF says "Hello sick sad world", 9:59:59.000,9:59:59.000 I don't know why. ???[br]in the demo.
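[Editor's sketch] The JSON handling mentioned above, pretty-printing both documents before diffing them, can be sketched with the standard library. The function name `pretty_json_diff`, the four-space indent and the `sort_keys=True` normalisation are assumptions made for this illustration, not a statement of what diffoscope does internally.

```python
import difflib
import json


def pretty_json_diff(raw_a, raw_b):
    """Diff two JSON documents after normalising them to one key per line."""
    pretty_a = json.dumps(json.loads(raw_a), indent=4, sort_keys=True).splitlines()
    pretty_b = json.dumps(json.loads(raw_b), indent=4, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(pretty_a, pretty_b, "a.json", "b.json", lineterm="")
    )
```

Two single-line documents that differ in one value then produce a diff touching one line on each side, instead of one huge changed line.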
9:59:59.000,9:59:59.000 Again, run that through a normal diff[br]program… garbage. 9:59:59.000,9:59:59.000 XML documents. Again, it'll pretty-print[br]them so it's nice, actually nice to read. 9:59:59.000,9:59:59.000 If you want to get started on diffoscope,[br]the very easiest and quickest way to do that is 9:59:59.000,9:59:59.000 fire up a web browser, try.diffoscope.org,[br]select your files, press Compare 9:59:59.000,9:59:59.000 and it'll upload them and run diffoscope[br]with all the support for all the file formats 9:59:59.000,9:59:59.000 in the cloud for you and give you a nice[br]HTML page that you can then link to people 9:59:59.000,9:59:59.000 So that's the very quickest way to get[br]started. 9:59:59.000,9:59:59.000 The next quickest way is to install[br]trydiffoscope and then you run that 9:59:59.000,9:59:59.000 on two files and it'll basically do[br]the same thing, 9:59:59.000,9:59:59.000 run it in the same cloud service as[br]try.diffoscope.org 9:59:59.000,9:59:59.000 but it'll give you the result on the[br]command line or 9:59:59.000,9:59:59.000 if you pass the webbrowser option, it will[br]give you a URL or load your web browser, 9:59:59.000,9:59:59.000 I can't remember exactly which, with[br]the same results. 9:59:59.000,9:59:59.000 This is 1kB of Python, nothing basically. 9:59:59.000,9:59:59.000 That's the next easiest way. 9:59:59.000,9:59:59.000 But you can then install diffoscope itself[br]on your own machine. 9:59:59.000,9:59:59.000 I recommend not installing recommends[br]because all of those file formats 9:59:59.000,9:59:59.000 might drag in extra things about[br]the whole of TeX, 9:59:59.000,9:59:59.000 I think the whole of OpenOffice, whole[br]of Mono, whole of Java… 9:59:59.000,9:59:59.000 Android, yeah, quite big. 9:59:59.000,9:59:59.000 I think there's another big one I can't[br]think of. 9:59:59.000,9:59:59.000 They're all optional, and they all say[br]"By the way, I support TeX documents 9:59:59.000,9:59:59.000 or whatever, Mono, whatever.
9:59:59.000,9:59:59.000 But you need to install this package and[br]then you get full pretty-printed support", 9:59:59.000,9:59:59.000 And it'll tell you that when it's missing. 9:59:59.000,9:59:59.000 So, if you just start with[br]--install-recommends disabled, 9:59:59.000,9:59:59.000 run it on your file, and if it says[br]"please install this package", you can then 9:59:59.000,9:59:59.000 install them as you go along, as you want, 9:59:59.000,9:59:59.000 rather than installing everything. 9:59:59.000,9:59:59.000 And then ??? and then works as before 9:59:59.000,9:59:59.000 You can improve all your own quality[br]assurance and Debian packaging 9:59:59.000,9:59:59.000 with diffoscope 9:59:59.000,9:59:59.000 The biggest value here is not[br]necessarily for reproducible builds 9:59:59.000,9:59:59.000 It's for basically just seeing where you[br]do want to have a diff or are expecting a diff 9:59:59.000,9:59:59.000 and you are expecting a particular type[br]of diff in a particular way 9:59:59.000,9:59:59.000 you can basically see those changes[br] 9:59:59.000,9:59:59.000 And if you build two debs normally and[br]... I'll try to demo in a second 9:59:59.000,9:59:59.000 You build a deb with a patch applied, you[br]can ??? see a diff on the source package 9:59:59.000,9:59:59.000 But that's not very useful because the[br]binaries are going to end up on 9:59:59.000,9:59:59.000 people's machines. But if you run a diff on[br]the binary itself, did that change 9:59:59.000,9:59:59.000 really hit the binary, I think really ...[br]No.. 9:59:59.000,9:59:59.000 I'll just run through a very live demo, of[br]course, so it's gonna fail ... 9:59:59.000,9:59:59.000 Checkout some ....
We'll get this[br]libnetx-java 9:59:59.000,9:59:59.000 We just build that once 9:59:59.000,9:59:59.000 Let's say we are on the security team and 9:59:59.000,9:59:59.000 want to apply a patch, and we want to be[br]really sure because we are about to push it out 9:59:59.000,9:59:59.000 to all our users 9:59:59.000,9:59:59.000 First we will make a changelog 9:59:59.000,9:59:59.000 Closing a bug 9:59:59.000,9:59:59.000 Find some Java file to change 9:59:59.000,9:59:59.000 Let's pretend we have a real patch 9:59:59.000,9:59:59.000 Let's replace that equals equals,[br]say that was the fix 9:59:59.000,9:59:59.000 So that's the patch from upstream 9:59:59.000,9:59:59.000 Upstream blast patch 9:59:59.000,9:59:59.000 When we build this what we wanna see is[br]just that change in the file 9:59:59.000,9:59:59.000 we don't wanna see any nonsense changes of[br]extended ??? but we also definitely want 9:59:59.000,9:59:59.000 to see that change, because if our binaries,[br]for security reasons, don't have that change 9:59:59.000,9:59:59.000 then we aren't fixing people's machines,[br]they will issue a DSA ??? installed, saying 9:59:59.000,9:59:59.000 And you should do proper testing as well[br]at multiple levels 9:59:59.000,9:59:59.000 I will build that again 9:59:59.000,9:59:59.000 So we wanna diff the original one 9:59:59.000,9:59:59.000 We wanna diff that one with a fake[br]security one 9:59:59.000,9:59:59.000 You see on the progress bar 100%[br]1 - there are differences (there should be 9:59:59.000,9:59:59.000 differences)[br]Let's see what those differences are 9:59:59.000,9:59:59.000 in our web browser, it's a nice HTML output 9:59:59.000,9:59:59.000 Let's have a look.[br]Are we seeing what we wanna see? 9:59:59.000,9:59:59.000 There are some changes in the data tar, we[br]kind of expect that 9:59:59.000,9:59:59.000 What's changed in our control file?[br]Well, the version changed, we wanted that 9:59:59.000,9:59:59.000 to change.
Perfect 9:59:59.000,9:59:59.000 And it's changed to ???[br]That's what we wanna see 9:59:59.000,9:59:59.000 No other changes here so there was no[br]weird control or in magic going on 9:59:59.000,9:59:59.000 In our data tar the color of the timestamp[br]changes, we will ignore these for now 9:59:59.000,9:59:59.000 The changelog has changed, well I hope so[br]because I have changed that entry[br] 9:59:59.000,9:59:59.000 Here is where we're going to start seeing[br]We are going to see the change in the 9:59:59.000,9:59:59.000 jar file, which is the Java class, Java[br]compiled archive format 9:59:59.000,9:59:59.000 We are seeing some meaningless timestamp[br]changes but we can ignore those 9:59:59.000,9:59:59.000 ??? cause it's just metadata maybe 9:59:59.000,9:59:59.000 Ok part of a class, so if you can see here[br]it's basically a decompilation of the 9:59:59.000,9:59:59.000 Java file itself and it's basically saying[br]"oh, I used to say ifnull and now ifnonnull"[br] 9:59:59.000,9:59:59.000 So these are the actual Java[br]bytecode instructions 9:59:59.000,9:59:59.000 And what is really ??? here[br]is that nothing else has changed[br] 9:59:59.000,9:59:59.000 We were just expecting that change between[br]the two opcodes, of ifnull and ifnonnull 9:59:59.000,9:59:59.000 which is good because it hasn't made[br]any other code changes but also, crucially, we can 9:59:59.000,9:59:59.000 see that it has actually made a change[br]to the code. 9:59:59.000,9:59:59.000 For example, it wasn't using some cached[br]version or something like that 9:59:59.000,9:59:59.000 This is really useful 9:59:59.000,9:59:59.000 And just running a naive diff wouldn't[br]give that of course, because it would just 9:59:59.000,9:59:59.000 come out with binary garbage[br]And just seeing the diff had changed again 9:59:59.000,9:59:59.000 ???
have told you anything, because all of the[br]change would have changed as well 9:59:59.000,9:59:59.000 So it's like, well, yes, it's different 9:59:59.000,9:59:59.000 The meaningful change there is[br]what actually fixes the flaw 9:59:59.000,9:59:59.000 ??? but we know it's there 9:59:59.000,9:59:59.000 That's kind of ???[br]Shipping this deb out I'd be quite 9:59:59.000,9:59:59.000 confident that this fixed the[br]actual bug 9:59:59.000,9:59:59.000 I'd be quite confident pushing that out[br]because it's a very minimal amount of changes 9:59:59.000,9:59:59.000 you wanna do that for security reasons 9:59:59.000,9:59:59.000 So this was the live demo 9:59:59.000,9:59:59.000 The other one is seeing no changes[br]at all, so you can build once 9:59:59.000,9:59:59.000 if your build is reproducible 9:59:59.000,9:59:59.000 You can build once, change your compiler[br]or change some other part of your toolchain 9:59:59.000,9:59:59.000 Build it again and if you got the exact same[br]results, well great, that's what you intended 9:59:59.000,9:59:59.000 You wanna see no changes when you change[br]some part of it 9:59:59.000,9:59:59.000 And that is really useful, if there were[br]changes diffoscope will highlight them 9:59:59.000,9:59:59.000 and show exactly why they had changed,[br]maybe some compiler optimizations, 9:59:59.000,9:59:59.000 maybe some other things as well 9:59:59.000,9:59:59.000 So you can use it in both ways, when you[br]expect changes and when you don't expect 9:59:59.000,9:59:59.000 changes, and if those don't match the expectations[br]diffoscope will tell you exactly why 9:59:59.000,9:59:59.000 It's all ???
when other companies[br]are doing security releases 9:59:59.000,9:59:59.000 naming no names whatsoever,[br]but they like to release patches as, you 9:59:59.000,9:59:59.000 know, just a new firmware for your router 9:59:59.000,9:59:59.000 Very large file system images,[br]you basically have no idea what changed 9:59:59.000,9:59:59.000 between these two files, again you ???[br]through diff, completely useless 9:59:59.000,9:59:59.000 You can start to unpack them with[br]??? and blah blah blah 9:59:59.000,9:59:59.000 But they're probably sort of concatenated[br]cpio archives, so that's nonsense 9:59:59.000,9:59:59.000 But diffoscope would just chew through those[br]and give you actually what the difference 9:59:59.000,9:59:59.000 is between these two files, and say[br]they changed this, they've removed or 9:59:59.000,9:59:59.000 added some GPL-licensed code or something[br]kind of interesting 9:59:59.000,9:59:59.000 So it's very useful for diffing those kinds of[br]binary blobs that come from various people 9:59:59.000,9:59:59.000 So the current state of diffoscope,[br]the development is up and down 9:59:59.000,9:59:59.000 It started around May 2014, something like that[br]A bunch of work here, that's idle, I think 9:59:59.000,9:59:59.000 These are just for DebConfs, basically 9:59:59.000,9:59:59.000 Anyway it's going up and down, it's kind[br]of interesting 9:59:59.000,9:59:59.000 ??? a lot of reproducible builds projects[br]of course, so every time we do a build[br] 9:59:59.000,9:59:59.000 on the ??? reproducible builds[br]testing framework we run diffoscope 9:59:59.000,9:59:59.000 on the result, if it's reproducible it[br]just says, hey, the file is the same 9:59:59.000,9:59:59.000 But if not, we publish the diffoscopes of[br]all your packages that are unreproducible 9:59:59.000,9:59:59.000 so you can just go there and be like[br]what's the difference between these two things 9:59:59.000,9:59:59.000 I invested a lot of work optimizing[br]diffoscope, ???
rather perverse n-squared 9:59:59.000,9:59:59.000 loops inside it. So I managed to cut down[br]some of the time here, cut down here 9:59:59.000,9:59:59.000 There have been quite a few performance[br]enhancements over the past ... 9:59:59.000,9:59:59.000 these are the git tags, this is version 80[br]and this is version 50, I just ran the same 9:59:59.000,9:59:59.000 benchmark across them all 9:59:59.000,9:59:59.000 So this shows when I introduced some[br]rather stupid code, embarrassing, but whatever 9:59:59.000,9:59:59.000 ??? 9:59:59.000,9:59:59.000 There's work being done right now[br]on parallel processing, there have been 9:59:59.000,9:59:59.000 quite a few attempts before, but adding it[br]is kind of interesting and difficult 9:59:59.000,9:59:59.000 Luckily we have a ??? student Liliana,[br]is she in the room? Is she hiding? 9:59:59.000,9:59:59.000 She's here and she'll be talking tomorrow[br]about her work on parallel processing in 9:59:59.000,9:59:59.000 diffoscope and that will be amazing because[br]a lot of it is IO bound or waiting for external 9:59:59.000,9:59:59.000 processes; with multi-CPU machines,[br]you might as well just parallelize 9:59:59.000,9:59:59.000 while I'm standing waiting for the result[br]for a PDF to be unpacked I may as well 9:59:59.000,9:59:59.000 be running on another CPU, I think we are[br]going to see some real performance wins 9:59:59.000,9:59:59.000 as we get that parallel processing merged and[br]working and ??? 9:59:59.000,9:59:59.000 You can check out our website diffoscope.org[br]recently migrated to Salsa .... yeeaahhh 9:59:59.000,9:59:59.000 And everything ??? reproducible is now on[br]Salsa, it's kind of cool 9:59:59.000,9:59:59.000 That's quite recent... 9:59:59.000,9:59:59.000 Thank you very much, Danke schön 9:59:59.000,9:59:59.000 You got any questions?[br]About diffoscope? 9:59:59.000,9:59:59.000 Thank you very much! 9:59:59.000,9:59:59.000 Q: A buzzword question, can you diff container[br]image formats?
9:59:59.000,9:59:59.000 A: Depends which ones. So if they are just[br]a directory, then yes, because it's just a directory 9:59:59.000,9:59:59.000 Do you have a particular one in mind? Like Docker? 9:59:59.000,9:59:59.000 Yes, there's Docker and then there's[br]OCI, I believe is the standard one 9:59:59.000,9:59:59.000 And that could make it buzzword compliant[br] 9:59:59.000,9:59:59.000 Ah ok, we were all about buzzwords 9:59:59.000,9:59:59.000 Probably diffoscope blockchain as well 9:59:59.000,9:59:59.000 And then run diffoscope on containers and[br]see the difference between updates of your 9:59:59.000,9:59:59.000 container images 9:59:59.000,9:59:59.000 BAM ... solved[br]Where do I invest? 9:59:59.000,9:59:59.000 I wasn't aware of OCI ... that is what it's[br]called? No, it doesn't support that right now