1 00:00:05,326 --> 00:00:09,596 I'm here today to talk to you about diffoscope 2 00:00:11,454 --> 00:00:13,364 and how you can use it as a better diff 3 00:00:13,445 --> 00:00:17,545 or for Quality Assurance, etc., things like that. 4 99:59:59,999 --> 99:59:59,999 Moin! 5 99:59:59,999 --> 99:59:59,999 Apparently that's like a North German thing to say "welcome". 6 99:59:59,999 --> 99:59:59,999 North German, north Denmark, Scandinavia, that kind of thing, I'm told. 7 99:59:59,999 --> 99:59:59,999 People are shaking their heads, so I'm going to assume that's true. 8 99:59:59,999 --> 99:59:59,999 This is my first PC, an IBM 5155. 9 99:59:59,999 --> 99:59:59,999 Sometimes, when you rebooted it, it would launch into, it would somehow revert 10 99:59:59,999 --> 99:59:59,999 from booting from the hard disk to booting from a BASIC ROM, 11 99:59:59,999 --> 99:59:59,999 as in the programming language ROM. 12 99:59:59,999 --> 99:59:59,999 It was on my motherboard for some reason. 13 99:59:59,999 --> 99:59:59,999 So, randomly, you'd just get a chance to program in BASIC and then, 14 99:59:59,999 --> 99:59:59,999 sometimes you wouldn't, I don't know why, but… yeah. 15 99:59:59,999 --> 99:59:59,999 It's quite fun with this kind of clicky keyboard, and that folded in 16 99:59:59,999 --> 99:59:59,999 and it was this kind of big desk thing. 17 99:59:59,999 --> 99:59:59,999 Anyway… 18 99:59:59,999 --> 99:59:59,999 This is my first Debian. 19 99:59:59,999 --> 99:59:59,999 At the time it was already old. 20 99:59:59,999 --> 99:59:59,999 What's this one? Is this Slink? 2.2? Yeah. 21 99:59:59,999 --> 99:59:59,999 And this is when we had US and non-US, so that's really dating if you remember that. 22 99:59:59,999 --> 99:59:59,999 This is my first contribution to Debian, 19th December 2006, 23 99:59:59,999 --> 99:59:59,999 sending a patch to lilypond which is kind of interesting 24 99:59:59,999 --> 99:59:59,999 and the response was "Oh yeah, rock on, many thanks. 
I'll upload this and 25 99:59:59,999 --> 99:59:59,999 it'll be landing in Etch". 26 99:59:59,999 --> 99:59:59,999 And this was super motivating because Etch was just coming out and it was like 27 99:59:59,999 --> 99:59:59,999 "Great, I've got, like, one line of tiny patch in a release. This is super cool." 28 99:59:59,999 --> 99:59:59,999 Thomas' response was super motivating. 29 99:59:59,999 --> 99:59:59,999 So, after that, like that Christmas basically spent ??? 30 99:59:59,999 --> 99:59:59,999 Debian webpages and stuff. 31 99:59:59,999 --> 99:59:59,999 Very well timed. 32 99:59:59,999 --> 99:59:59,999 That's kind of a good… 33 99:59:59,999 --> 99:59:59,999 You know, someone sends a patch, be like "Cool, thanks" 34 99:59:59,999 --> 99:59:59,999 Like a little notice in the changelog. 35 99:59:59,999 --> 99:59:59,999 It was, you know, so stupid but… Yeah, do that kind of thing. 36 99:59:59,999 --> 99:59:59,999 So, moving on. 37 99:59:59,999 --> 99:59:59,999 Why diffoscope? Why did we write diffoscope? 38 99:59:59,999 --> 99:59:59,999 What's the background here? 39 99:59:59,999 --> 99:59:59,999 It comes from reproducible builds. 40 99:59:59,999 --> 99:59:59,999 The very quick outline is that once you get the source code for free software, 41 99:59:59,999 --> 99:59:59,999 you download the source code for nginx or whatever, 42 99:59:59,999 --> 99:59:59,999 pretty much everyone just runs binaries on their servers or their systems. 43 99:59:59,999 --> 99:59:59,999 You know, "apt install bla", "yum install", whatever. 44 99:59:59,999 --> 99:59:59,999 Android Play Store, whatever. 45 99:59:59,999 --> 99:59:59,999 Can you actually trust whether these two things correspond with each other? 46 99:59:59,999 --> 99:59:59,999 You've gotten the source code, it looks alright, and then you install this binary, 47 99:59:59,999 --> 99:59:59,999 yeah… 48 99:59:59,999 --> 99:59:59,999 Who generated that? Can you trust that process? 
49 99:59:59,999 --> 99:59:59,999 Can you trust who generated it? 50 99:59:59,999 --> 99:59:59,999 Even if you could trust them, could you trust them not to be exploited? Etc. 51 99:59:59,999 --> 99:59:59,999 This is a big problem because you can exploit a build farm and then 52 99:59:59,999 --> 99:59:59,999 obviously exploit all of that, you know, a trojan into the build farm, 53 99:59:59,999 --> 99:59:59,999 so every single binary that comes out is compromised. 54 99:59:59,999 --> 99:59:59,999 Kind of problematic. 55 99:59:59,999 --> 99:59:59,999 You could also target individual developers' machines, 56 99:59:59,999 --> 99:59:59,999 so I could go off to, say, your machine, add a backdoor to it, 57 99:59:59,999 --> 99:59:59,999 so every binary that you give to friends and things like that, 58 99:59:59,999 --> 99:59:59,999 is compromised in some way, stealing your bitcoins or whatever. 59 99:59:59,999 --> 99:59:59,999 I can also turn up at your door and blackmail you into producing 60 99:59:59,999 --> 99:59:59,999 software that has compromises or extra features, shall we say, 61 99:59:59,999 --> 99:59:59,999 that don't exist in the source code. 62 99:59:59,999 --> 99:59:59,999 So what will happen there is that you'd release your source 63 99:59:59,999 --> 99:59:59,999 and the binaries you produce have this sort of backdoor that, you know, 64 99:59:59,999 --> 99:59:59,999 someone is forcing you into producing. 65 99:59:59,999 --> 99:59:59,999 So, you don't want to do that. 66 99:59:59,999 --> 99:59:59,999 Anyway 67 99:59:59,999 --> 99:59:59,999 enough of that. 68 99:59:59,999 --> 99:59:59,999 What you do for reproducible builds is you ensure that every time you build 69 99:59:59,999 --> 99:59:59,999 a piece of software, you get an identical result. 
70 99:59:59,999 --> 99:59:59,999 Multiple people then compare their builds and check whether they all get 71 99:59:59,999 --> 99:59:59,999 the same results 72 99:59:59,999 --> 99:59:59,999 and this means that an attacker must either have infected everyone 73 99:59:59,999 --> 99:59:59,999 at the same time, or they haven't infected anyone. 74 99:59:59,999 --> 99:59:59,999 The point here is that you have to ensure that builds have identical results. 75 99:59:59,999 --> 99:59:59,999 Ok, great. 76 99:59:59,999 --> 99:59:59,999 So, we started the reproducible builds project, etc. 77 99:59:59,999 --> 99:59:59,999 And we build 2 debs. 78 99:59:59,999 --> 99:59:59,999 Oh, I'm sorry about the colors there. 79 99:59:59,999 --> 99:59:59,999 You probably can't see that. 80 99:59:59,999 --> 99:59:59,999 That says "sha1sum a.deb b.deb". 81 99:59:59,999 --> 99:59:59,999 Anyway, we're comparing the sha1sums of 2 binary Debian files. 82 99:59:59,999 --> 99:59:59,999 So, these two files differ. 83 99:59:59,999 --> 99:59:59,999 Ok, they're not reproducible. 84 99:59:59,999 --> 99:59:59,999 Why is that? 85 99:59:59,999 --> 99:59:59,999 So we run a diff on them. 86 99:59:59,999 --> 99:59:59,999 Yeah… 87 99:59:59,999 --> 99:59:59,999 So, what can we learn from this? 88 99:59:59,999 --> 99:59:59,999 Well, not very much, visibly they're compressed so 89 99:59:59,999 --> 99:59:59,999 as soon as we see one change, we'll see they would just cascade changes 90 99:59:59,999 --> 99:59:59,999 because that's how compression works. 91 99:59:59,999 --> 99:59:59,999 I guess we know it's a deb ??? format file, not very useful. 92 99:59:59,999 --> 99:59:59,999 Ok, great so we're gonna have a look in 93 99:59:59,999 --> 99:59:59,999 We'll do a binary diff and ok, well… 94 99:59:59,999 --> 99:59:59,999 Again, that's not really telling us very much 95 99:59:59,999 --> 99:59:59,999 with the diff there. 96 99:59:59,999 --> 99:59:59,999 Ok, great. 97 99:59:59,999 --> 99:59:59,999 ??? 
98 99:59:59,999 --> 99:59:59,999 "ar x" is on the new maintainer thing, "how you unpack a deb" 99 99:59:59,999 --> 99:59:59,999 Everyone remembers this, right? 100 99:59:59,999 --> 99:59:59,999 You unpack a.deb with "ar x" and you do that to b.deb 101 99:59:59,999 --> 99:59:59,999 and then we diff the results of that. 102 99:59:59,999 --> 99:59:59,999 Ok, so…yeah, 7zip. 103 99:59:59,999 --> 99:59:59,999 Ok, compressed content, not very useful. 104 99:59:59,999 --> 99:59:59,999 Ok, so let's unpack the control.tar inside these debs. 105 99:59:59,999 --> 99:59:59,999 And then we run diff on that. 106 99:59:59,999 --> 99:59:59,999 Still not really telling anything useful about how to make this package reproducible 107 99:59:59,999 --> 99:59:59,999 So let's unpack the tar.xz into the tar. 108 99:59:59,999 --> 99:59:59,999 Inside that tar, there's a file called md5sums and we start to see some differences 109 99:59:59,999 --> 99:59:59,999 between some files in these two debs. 110 99:59:59,999 --> 99:59:59,999 ??? meaningful, so now we have some idea that 111 99:59:59,999 --> 99:59:59,999 it has something to do with this usr/bin/pmixer binary. 112 99:59:59,999 --> 99:59:59,999 Ok, interesting. 113 99:59:59,999 --> 99:59:59,999 We'll unzip that and then we do a diff on pmixer itself. 114 99:59:59,999 --> 99:59:59,999 Now we're back into just binary ??? mode 115 99:59:59,999 --> 99:59:59,999 This isn't very helpful and this is taking quite a while 116 99:59:59,999 --> 99:59:59,999 and if I remember correctly, Debian has a lot of packages. 117 99:59:59,999 --> 99:59:59,999 So this might take a little while. 118 99:59:59,999 --> 99:59:59,999 So, basically, ??? meme 119 99:59:59,999 --> 99:59:59,999 I should build a better diff. 
120 99:59:59,999 --> 99:59:59,999 That's not quite true, this is actually… 121 99:59:59,999 --> 99:59:59,999 It was Lunar that started this project 122 99:59:59,999 --> 99:59:59,999 and it was called debbindiff, because we wanted to diff 123 99:59:59,999 --> 99:59:59,999 binary Debian packages. 124 99:59:59,999 --> 99:59:59,999 So this is the initial commit, 2014. 125 99:59:59,999 --> 99:59:59,999 "The version is successfully able to report differences in two .changes files. 126 99:59:59,999 --> 99:59:59,999 Not with much interesting details, but it's a start." 127 99:59:59,999 --> 99:59:59,999 And it was a start. 128 99:59:59,999 --> 99:59:59,999 Fast forwarding… Oh, sorry about these colors, 129 99:59:59,999 --> 99:59:59,999 I don't know if we can do anything about the lights? 130 99:59:59,999 --> 99:59:59,999 Yeah? 131 99:59:59,999 --> 99:59:59,999 No? 132 99:59:59,999 --> 99:59:59,999 Alright, well… 133 99:59:59,999 --> 99:59:59,999 Basically, we're diffoscoping on… 134 99:59:59,999 --> 99:59:59,999 It works kind of like diff does normally, 135 99:59:59,999 --> 99:59:59,999 you give it two files, it outputs a unified diff. 136 99:59:59,999 --> 99:59:59,999 So "diffoscope a b", one file contains the word "foo", one contains the word "bar". 137 99:59:59,999 --> 99:59:59,999 Nothing actually out of the ordinary. 138 99:59:59,999 --> 99:59:59,999 It's sort of colored by default, so that's why you can't see it, but whatever. 139 99:59:59,999 --> 99:59:59,999 It supports archive formats, so if you give it two tar files, 140 99:59:59,999 --> 99:59:59,999 if we then tar up our "a" file and our "b" file into an a.tar and b.tar 141 99:59:59,999 --> 99:59:59,999 and then run diffoscope on those tar files 142 99:59:59,999 --> 99:59:59,999 we get this kind of, like, hierarchy here. 
143 99:59:59,999 --> 99:59:59,999 So it's saying that there are differences between these files, 144 99:59:59,999 --> 99:59:59,999 in the file list they have different timestamps, because I made them 145 99:59:59,999 --> 99:59:59,999 at different times, 146 99:59:59,999 --> 99:59:59,999 and here are the contents, so we got "foo" there and "bar" there. 147 99:59:59,999 --> 99:59:59,999 So we can see the difference between them. 148 99:59:59,999 --> 99:59:59,999 Well, I can, I don't know if you can, you get the slide there. 149 99:59:59,999 --> 99:59:59,999 If we gzip these tar files and then run diffoscope on those gzip things, 150 99:59:59,999 --> 99:59:59,999 it'll say "ok, what we've done is unpack it first, and here's the metadata 151 99:59:59,999 --> 99:59:59,999 about the gzip process", 152 99:59:59,999 --> 99:59:59,999 and inside that are a.tar and b.tar from the previous slides. 153 99:59:59,999 --> 99:59:59,999 And then the "a" file and the "b" file. 154 99:59:59,999 --> 99:59:59,999 So, it's really going two levels deep into this tar.gz file. 155 99:59:59,999 --> 99:59:59,999 That's pretty cool. 156 99:59:59,999 --> 99:59:59,999 And it's completely recursive, I think it will actually blow out after, I think, 157 99:59:59,999 --> 99:59:59,999 1000 [levels]. 158 99:59:59,999 --> 99:59:59,999 [light is turned down for the audience to see the slides] 159 99:59:59,999 --> 99:59:59,999 I'll just bump back a bit, just in case. 160 99:59:59,999 --> 99:59:59,999 [Applause] 161 99:59:59,999 --> 99:59:59,999 Thank you. 162 99:59:59,999 --> 99:59:59,999 So that's the a and b files. 163 99:59:59,999 --> 99:59:59,999 We've tarred them up and so I see the hierarchy of foo and bar file layer. 164 99:59:59,999 --> 99:59:59,999 I've gzipped them, so this is a gzip layer. 165 99:59:59,999 --> 99:59:59,999 Here's the tar layer and then there's the files themselves. 166 99:59:59,999 --> 99:59:59,999 This is from a real .deb from the archive. 
167 99:59:59,999 --> 99:59:59,999 Inside this .deb, there's a data.tar.xz and in that xz file there's a data.tar 168 99:59:59,999 --> 99:59:59,999 and inside that tar file, there's a file called aff and inside that 169 99:59:59,999 --> 99:59:59,999 there's a version string that is different. 170 99:59:59,999 --> 99:59:59,999 And that looks like a build date so we probably know that if we went back 171 99:59:59,999 --> 99:59:59,999 to the source package, we could very quickly work out, 172 99:59:59,999 --> 99:59:59,999 with a very quick grep, work out where this file is being generated from, 173 99:59:59,999 --> 99:59:59,999 the de_DE.aff file and then ??? probably quite obvious 174 99:59:59,999 --> 99:59:59,999 that it's using the current build time and then we can just patch that, fix it etc. 175 99:59:59,999 --> 99:59:59,999 This has gone from two rather obscure binary .debs all the way to the fix 176 99:59:59,999 --> 99:59:59,999 probably in about 5 minutes, and you can probably send the patch in that time 177 99:59:59,999 --> 99:59:59,999 because it'd be quite quick. 178 99:59:59,999 --> 99:59:59,999 Without diffoscope here, without this sort of recursive unpacking, 179 99:59:59,999 --> 99:59:59,999 you'd be just completely lost, you'd be there with "ar x" all day 180 99:59:59,999 --> 99:59:59,999 and working out which files are different and trying to use xxd 181 99:59:59,999 --> 99:59:59,999 and this kind of nonsense. 182 99:59:59,999 --> 99:59:59,999 diffoscope's got some other things as well 183 99:59:59,999 --> 99:59:59,999 if you try to do reproducible packages and things are varying just on 184 99:59:59,999 --> 99:59:59,999 the line ordering, we detect whether a file differs only in the line ordering. 185 99:59:59,999 --> 99:59:59,999 So, here's file "a", "These lines are in order". 186 99:59:59,999 --> 99:59:59,999 File "b" has "These order are in lines". 
187 99:59:59,999 --> 99:59:59,999 It's very difficult to say, actually, it's like one of these tongue twisters. 188 99:59:59,999 --> 99:59:59,999 Run diffoscope on these two and it says it's got ordering differences only. 189 99:59:59,999 --> 99:59:59,999 That's interesting, so you probably need to sort, 190 99:59:59,999 --> 99:59:59,999 you go all the way back to the source code, work out very quickly, 191 99:59:59,999 --> 99:59:59,999 if you know it's just ordering differences you just kind of know 192 99:59:59,999 --> 99:59:59,999 what the output's gonna be, you can search for order in ??? 193 99:59:59,999 --> 99:59:59,999 and you get the right files, 194 99:59:59,999 --> 99:59:59,999 ??? sort in the right place, BAM, send a patch off (???), 195 99:59:59,999 --> 99:59:59,999 everything is great. 196 99:59:59,999 --> 99:59:59,999 Oh, and send it to upstream as well because you're good. 197 99:59:59,999 --> 99:59:59,999 It supports a lot more things. 198 99:59:59,999 --> 99:59:59,999 We've been showing the terminal text output here. 199 99:59:59,999 --> 99:59:59,999 It's got an HTML output mode, which is really useful in the hierarchical thing 200 99:59:59,999 --> 99:59:59,999 when it gets a bit more complicated. 201 99:59:59,999 --> 99:59:59,999 Instead of being laid on top of each other like a unified diff, 202 99:59:59,999 --> 99:59:59,999 you get the diff on the left and the right and you get sort of a nested 203 99:59:59,999 --> 99:59:59,999 thing inside with colors and lines and you can link this and various things in it 204 99:59:59,999 --> 99:59:59,999 including bits of metadata here, other bits here, what command you used. 205 99:59:59,999 --> 99:59:59,999 That's the HTML output. 206 99:59:59,999 --> 99:59:59,999 We also support a lot of file formats, it's not just text, 207 99:59:59,999 --> 99:59:59,999 it's about all of these, so let's quickly run through some of them. 
208 99:59:59,999 --> 99:59:59,999 You give it two Android apk files which are kind of like zips, but magic. 209 99:59:59,999 --> 99:59:59,999 It'll know how to compare them. 210 99:59:59,999 --> 99:59:59,999 There's like a Manifest file that needs decoding. 211 99:59:59,999 --> 99:59:59,999 It supports Berkeley DB databases, 212 99:59:59,999 --> 99:59:59,999 Word documents, that's a Word document with "a" and that's a Word document with "b" 213 99:59:59,999 --> 99:59:59,999 and it'll correctly do that. 214 99:59:59,999 --> 99:59:59,999 If you run that through diff normally, that ??? be a binary mess, 215 99:59:59,999 --> 99:59:59,999 so completely useless. 216 99:59:59,999 --> 99:59:59,999 E-books, there's epub, it also supports mobi. 217 99:59:59,999 --> 99:59:59,999 So if you give it two epub files, it'll say "They just differ in this date". 218 99:59:59,999 --> 99:59:59,999 Brilliant. 219 99:59:59,999 --> 99:59:59,999 Normally that will be completely useless diff binary ??? 220 99:59:59,999 --> 99:59:59,999 So you can be like "epub date, ok", grep the source code for that, 221 99:59:59,999 --> 99:59:59,999 make a patch really quickly. 222 99:59:59,999 --> 99:59:59,999 Mono binaries, git repositories, why not? 223 99:59:59,999 --> 99:59:59,999 Gnumeric spreadsheets, ISO images. 224 99:59:59,999 --> 99:59:59,999 Oh yeah, ISO images are really cool. 225 99:59:59,999 --> 99:59:59,999 So, it'll basically unpack the ISO, then inside that there might be a squashfs image 226 99:59:59,999 --> 99:59:59,999 then it'll completely go down to that and work out any differences 227 99:59:59,999 --> 99:59:59,999 between the two contents in the ISO file, including any metadata. 228 99:59:59,999 --> 99:59:59,999 This is on the squashfs metadata headers, I think. 229 99:59:59,999 --> 99:59:59,999 But say inside that ISO, there was a file that was a PDF, and inside that PDF was 230 99:59:59,999 --> 99:59:59,999 a ??? 
which varied, 231 99:59:59,999 --> 99:59:59,999 it will basically go all the way down and say "yeah, it's actually here, 232 99:59:59,999 --> 99:59:59,999 in this ??? that the data differs." 233 99:59:59,999 --> 99:59:59,999 And that means you can just go again all the way back to the source 234 99:59:59,999 --> 99:59:59,999 and say "ok, cool, we know how to fix this quite quickly" 235 99:59:59,999 --> 99:59:59,999 And this is really valuable in getting the recent Tails distribution reproducible 236 99:59:59,999 --> 99:59:59,999 so their ISOs are reproducible. 237 99:59:59,999 --> 99:59:59,999 If you build one and I build one, we get the exact same one 238 99:59:59,999 --> 99:59:59,999 and that's kind of useful for something like Tails where you would probably want to 239 99:59:59,999 --> 99:59:59,999 of all, there's a lot of projects that you might want to compromise, 240 99:59:59,999 --> 99:59:59,999 you might want to go after that one, because of the kind of people that are using it. 241 99:59:59,999 --> 99:59:59,999 We support comparing images, so this is using ??? 242 99:59:59,999 --> 99:59:59,999 and then just running that through diff. 243 99:59:59,999 --> 99:59:59,999 That is a linux penguin and that is something else, 244 99:59:59,999 --> 99:59:59,999 I can't remember now. Oh, FT. 245 99:59:59,999 --> 99:59:59,999 It supports images. 246 99:59:59,999 --> 99:59:59,999 It supports JSON and pretty print, so if you give it two JSON files 247 99:59:59,999 --> 99:59:59,999 one with key/value… it'll do a nice diff of them. 248 99:59:59,999 --> 99:59:59,999 It will pretty print it first, before doing the diff, so it'll actually give you 249 99:59:59,999 --> 99:59:59,999 something clean, otherwise I don't know if you've ever diffed 250 99:59:59,999 --> 99:59:59,999 two very long JSON lines, if they differ in the middle, you just get 251 99:59:59,999 --> 99:59:59,999 a huge long unified diff, but here it's like "oh, just ??? 
things have changed" 252 99:59:59,999 --> 99:59:59,999 OpenDocument text formats, Ogg audio files, because why not. 253 99:59:59,999 --> 99:59:59,999 tcpdump capture files, that's actually quite useful. 254 99:59:59,999 --> 99:59:59,999 PDFs. That PDF says "Hello World" and this PDF says "Hello sick sad world", 255 99:59:59,999 --> 99:59:59,999 I don't know why. ??? in the demo. 256 99:59:59,999 --> 99:59:59,999 Again, run that through normal diff program… garbage. 257 99:59:59,999 --> 99:59:59,999 XML documents. Again, it'll pretty print them so it's nice, actually nice to read. 258 99:59:59,999 --> 99:59:59,999 If you want to get started on diffoscope, the very easiest and quickest way to do it is 259 99:59:59,999 --> 99:59:59,999 fire up a web browser, try.diffoscope.org, select your files, press Compare 260 99:59:59,999 --> 99:59:59,999 and it'll upload them and run diffoscope with all the support for all the file formats 261 99:59:59,999 --> 99:59:59,999 in the cloud for you and give you a nice HTML page that you can then link to people. 262 99:59:59,999 --> 99:59:59,999 So that's the very quickest way to get started. 263 99:59:59,999 --> 99:59:59,999 The next quickest way is to install trydiffoscope and then you run that 264 99:59:59,999 --> 99:59:59,999 on two files and it'll basically do the same thing, 265 99:59:59,999 --> 99:59:59,999 run it in the same cloud service as try.diffoscope.org 266 99:59:59,999 --> 99:59:59,999 but it'll give you the result on the command line or 267 99:59:59,999 --> 99:59:59,999 if you pass the webbrowser option, it will give you a URL or load your web browser, 268 99:59:59,999 --> 99:59:59,999 I can't remember exactly which, with the same results. 269 99:59:59,999 --> 99:59:59,999 This is 1kB of Python, nothing basically. 270 99:59:59,999 --> 99:59:59,999 That's the next easiest way. 271 99:59:59,999 --> 99:59:59,999 But you can then install diffoscope itself on your own machine. 
272 99:59:59,999 --> 99:59:59,999 I recommend not installing recommends because all of those file formats 273 99:59:59,999 --> 99:59:59,999 might drag in extra things like the whole of TeX, 274 99:59:59,999 --> 99:59:59,999 I think the whole of OpenOffice, whole of Mono, whole of Java… 275 99:59:59,999 --> 99:59:59,999 Android, yeah, quite big. 276 99:59:59,999 --> 99:59:59,999 I think there's another big one I can't think of. 277 99:59:59,999 --> 99:59:59,999 They're all optional, and they all say "By the way, I support TeX documents 278 99:59:59,999 --> 99:59:59,999 or whatever, Mono, whatever. 279 99:59:59,999 --> 99:59:59,999 But you need to install this package and then you get full pretty printed support", 280 99:59:59,999 --> 99:59:59,999 And it'll tell you that when it's missing. 281 99:59:59,999 --> 99:59:59,999 So, if you just start with --install-recommends disabled, 282 99:59:59,999 --> 99:59:59,999 run it on your files, if it says "please install this package", you can then 283 99:59:59,999 --> 99:59:59,999 install them as you go along, as you want 284 99:59:59,999 --> 99:59:59,999 rather than installing everything. 285 99:59:59,999 --> 99:59:59,999 And then ??? and then works as before 286 99:59:59,999 --> 99:59:59,999 You can improve all your own quality assurance and Debian packaging 287 99:59:59,999 --> 99:59:59,999 with diffoscope 288 99:59:59,999 --> 99:59:59,999 The biggest value here is not necessarily for reproducible builds 289 99:59:59,999 --> 99:59:59,999 It's for basically just seeing where you do want to have a diff or are expecting a diff 290 99:59:59,999 --> 99:59:59,999 and you are expecting a particular type of diff in a particular way 291 99:59:59,999 --> 99:59:59,999 you can basically see those changes 292 99:59:59,999 --> 99:59:59,999 And if you build two debs normally and ... I'll try to demo in a second 293 99:59:59,999 --> 99:59:59,999 You build a deb with a patch applied you can ??? 
see a diff on the source package 294 99:59:59,999 --> 99:59:59,999 But that's not very useful because the binaries are going to end up on 295 99:59:59,999 --> 99:59:59,999 people's machines. But if you run a diff on the binary itself, did that change 296 99:59:59,999 --> 99:59:59,999 really hit the binary, I think really ... No.. 297 99:59:59,999 --> 99:59:59,999 I'll just run through a very live demo of course, so it's gonna fail ... 298 99:59:59,999 --> 99:59:59,999 Checkout some .... We'll get this libnetx-java 299 99:59:59,999 --> 99:59:59,999 We just build that once 300 99:59:59,999 --> 99:59:59,999 Let's say we are on the security team and 301 99:59:59,999 --> 99:59:59,999 want to apply a patch, and we want to be really sure because we are about to push it out 302 99:59:59,999 --> 99:59:59,999 to all our users 303 99:59:59,999 --> 99:59:59,999 First we will make a changelog 304 99:59:59,999 --> 99:59:59,999 Closing a bug 305 99:59:59,999 --> 99:59:59,999 Find some Java file to change 306 99:59:59,999 --> 99:59:59,999 Let's pretend we have a real patch 307 99:59:59,999 --> 99:59:59,999 Let's replace that equals equals, say that was the fix 308 99:59:59,999 --> 99:59:59,999 So that's the patch from upstream 309 99:59:59,999 --> 99:59:59,999 Upstream blast patch 310 99:59:59,999 --> 99:59:59,999 When we build this what we wanna see is just that change in the file 311 99:59:59,999 --> 99:59:59,999 we don't wanna see any nonsense changes of extended ??? but we also definitely want 312 99:59:59,999 --> 99:59:59,999 to see that change, 'cause if our binaries, for security reasons, don't have that change 313 99:59:59,999 --> 99:59:59,999 then we aren't fixing people's machines, they will issue a DSA ??? 
installed, saying 314 99:59:59,999 --> 99:59:59,999 And you should do proper testing as well at multiple levels 315 99:59:59,999 --> 99:59:59,999 I will build that again 316 99:59:59,999 --> 99:59:59,999 So we wanna diff the original one 317 99:59:59,999 --> 99:59:59,999 We wanna diff that one with a fake security one 318 99:59:59,999 --> 99:59:59,999 You see on the progress bar 100% 1- there are differences (there should be 319 99:59:59,999 --> 99:59:59,999 differences) Let's see what those differences are 320 99:59:59,999 --> 99:59:59,999 in our web browser, it's a nice HTML output 321 99:59:59,999 --> 99:59:59,999 Let's have a look. Are we seeing what we wanna see? 322 99:59:59,999 --> 99:59:59,999 There are some changes in the data tar, we kind of expect that 323 99:59:59,999 --> 99:59:59,999 What's changed in our control file? Well the version changed, we wanted that 324 99:59:59,999 --> 99:59:59,999 to change. Perfect 325 99:59:59,999 --> 99:59:59,999 And it's changed to ??? That's what we wanna see 326 99:59:59,999 --> 99:59:59,999 No other changes here so there was no weird control or in magic going on 327 99:59:59,999 --> 99:59:59,999 In our data tar the color of the timestamp changes, we will ignore these for now 328 99:59:59,999 --> 99:59:59,999 The changelog has changed, well I hope so because I have changed that entry 329 99:59:59,999 --> 99:59:59,999 Here is where we're going to start seeing... We are going to see the change in the 330 99:59:59,999 --> 99:59:59,999 jar file which is the Java class, Java compiled archive format 331 99:59:59,999 --> 99:59:59,999 We are seeing some meaningless timestamp changes but we can ignore those 332 99:59:59,999 --> 99:59:59,999 ??? 
'cause it's just metadata maybe 333 99:59:59,999 --> 99:59:59,999 Ok part of a class, so if you can see here it's basically a decompilation of the 334 99:59:59,999 --> 99:59:59,999 Java file itself and it's basically saying "oh, I used to say ifnull and now ifnonnull" 335 99:59:59,999 --> 99:59:59,999 So these are the actual Java bytecode instructions and what's really 336 99:59:59,999 --> 99:59:59,999 And what is really ??? here is that nothing else has changed 337 99:59:59,999 --> 99:59:59,999 We were just expecting that change between the two opcodes, of ifnull and ifnonnull 338 99:59:59,999 --> 99:59:59,999 which is good 'cause it's like it hasn't made any other code changes but also crucially we can 339 99:59:59,999 --> 99:59:59,999 see that it has actually made a change to the code. 340 99:59:59,999 --> 99:59:59,999 For example it wasn't using some cached version or something like that 341 99:59:59,999 --> 99:59:59,999 This is really useful 342 99:59:59,999 --> 99:59:59,999 And just running a naive diff wouldn't give that of course, because it would just 343 99:59:59,999 --> 99:59:59,999 come with binary garbage. And just seeing the diff had changed again 344 99:59:59,999 --> 99:59:59,999 ??? have told you anything, because all of the change would have changed as well 345 99:59:59,999 --> 99:59:59,999 So it's like well yes it's different 346 99:59:59,999 --> 99:59:59,999 The meaningful change there is what actually fixes the flaw 347 99:59:59,999 --> 99:59:59,999 ??? but we know it's there 348 99:59:59,999 --> 99:59:59,999 That's kind of ??? 
Shipping this deb out I'd be quite 349 99:59:59,999 --> 99:59:59,999 confident, that this seemed like the actual bug 350 99:59:59,999 --> 99:59:59,999 I'd be quite confident pushing that out because it's a very minimal amount of changes 351 99:59:59,999 --> 99:59:59,999 you wanna do that for security reasons 352 99:59:59,999 --> 99:59:59,999 So this was the live demo 353 99:59:59,999 --> 99:59:59,999 The other one is seeing no changes at all, so you can build once 354 99:59:59,999 --> 99:59:59,999 if you build a reproducible 355 99:59:59,999 --> 99:59:59,999 You can build once, change your compiler or change some other part of your toolchain 356 99:59:59,999 --> 99:59:59,999 Build it again and if you got the exact same results, well great, that's what you intended 357 99:59:59,999 --> 99:59:59,999 You wanna see no changes when you change some part of it 358 99:59:59,999 --> 99:59:59,999 And that is really useful, if there were changes diffoscope will highlight them 359 99:59:59,999 --> 99:59:59,999 and show exactly why they had changed, maybe some compiler optimizations, 360 99:59:59,999 --> 99:59:59,999 maybe some other things as well 361 99:59:59,999 --> 99:59:59,999 So you can use it in both ways, when you expect changes and when you don't expect 362 99:59:59,999 --> 99:59:59,999 changes, and if those match the expectations diffoscope will tell you exactly why 363 99:59:59,999 --> 99:59:59,999 It's all ??? when other companies are doing security releases 364 99:59:59,999 --> 99:59:59,999 naming no names whatsoever, but they like to release patches as, you 365 99:59:59,999 --> 99:59:59,999 know, just a new firmware for your router 366 99:59:59,999 --> 99:59:59,999 Very large file system images, you basically have no idea what changed 367 99:59:59,999 --> 99:59:59,999 between these two files, again you ??? through diff completely useless 368 99:59:59,999 --> 99:59:59,999 You can start to unpack them with ??? 
and blah blah blah 369 99:59:59,999 --> 99:59:59,999 But they're probably sort of concatenated cpio archives, so that's nonsense 370 99:59:59,999 --> 99:59:59,999 But diffoscope would just chew through those and give you actually what the differences 371 99:59:59,999 --> 99:59:59,999 are between these two files, and say they changed this, they've removed or 372 99:59:59,999 --> 99:59:59,999 added some GPL-licensed code or something kind of interesting 373 99:59:59,999 --> 99:59:59,999 So it's very useful for diffing those kinds of binary blobs that come from various people 374 99:59:59,999 --> 99:59:59,999 So the current state of diffoscope, the development is up and down 375 99:59:59,999 --> 99:59:59,999 It started around May 2014, something like that. A bunch of work here, that's idle, I think 376 99:59:59,999 --> 99:59:59,999 These are just for DebConfs basically 377 99:59:59,999 --> 99:59:59,999 Anyway it's going up and down, it's kind of interesting 378 99:59:59,999 --> 99:59:59,999 ??? a lot of reproducible builds projects of course, so every time we do a build 379 99:59:59,999 --> 99:59:59,999 on the ??? reproducible builds or testing framework if we run diffoscope 380 99:59:59,999 --> 99:59:59,999 on the result, if it's reproducible it just says, hey the file is the same 381 99:59:59,999 --> 99:59:59,999 But if not, we publish the diffoscopes of all your packages that are unreproducible 382 99:59:59,999 --> 99:59:59,999 so you can just go there and be like what's the difference between these two things 383 99:59:59,999 --> 99:59:59,999 I invested a lot of work optimizing diffoscope, ??? rather perverse n-squared 384 99:59:59,999 --> 99:59:59,999 loops inside it. So I managed to cut down some of the time here, cut down here 385 99:59:59,999 --> 99:59:59,999 There've been quite a few performance enhancements over the past ... 
386 99:59:59,999 --> 99:59:59,999 these are the git tags, this is version 80 and this is version 50, I just ran the same 387 99:59:59,999 --> 99:59:59,999 benchmark across them all 388 99:59:59,999 --> 99:59:59,999 So this shows when I have introduced some rather stupid code, embarrassing, but whatever 389 99:59:59,999 --> 99:59:59,999 ??? 390 99:59:59,999 --> 99:59:59,999 There's work being done right now on parallel processing, there have been 391 99:59:59,999 --> 99:59:59,999 quite a few attempts before, but adding it is kind of interesting and difficult 392 99:59:59,999 --> 99:59:59,999 Luckily we have a ??? student Liliana, is she in the room? Is she hiding? 393 99:59:59,999 --> 99:59:59,999 She's here and she'll be talking tomorrow about her work on parallel processing in 394 99:59:59,999 --> 99:59:59,999 diffoscope and that will be amazing because a lot of it is IO bound or waiting for external 395 99:59:59,999 --> 99:59:59,999 processes. With multi-CPU machines, you might as well just play well 396 99:59:59,999 --> 99:59:59,999 While I'm waiting for the result of a PDF being unpacked, I may as well 397 99:59:59,999 --> 99:59:59,999 be running on another CPU. I think we are going to see some real performance wins 398 99:59:59,999 --> 99:59:59,999 as we get that parallel processing merged and working and ??? 399 99:59:59,999 --> 99:59:59,999 You can check out our website diffoscope.org, recently migrated to Salsa .... yeeaahhh 400 99:59:59,999 --> 99:59:59,999 And everything ??? reproducible is now on Salsa, it's kind of cool 401 99:59:59,999 --> 99:59:59,999 That's quite recent... 402 99:59:59,999 --> 99:59:59,999 Thank you very much, Danke schön 403 99:59:59,999 --> 99:59:59,999 You got any questions? About diffoscope? 404 99:59:59,999 --> 99:59:59,999 Thank you very much! 405 99:59:59,999 --> 99:59:59,999 Q: A buzzword question, can you diff container image formats? 406 99:59:59,999 --> 99:59:59,999 A: Depends which ones.
So if they are just a directory, then yes, because it's just a directory 407 99:59:59,999 --> 99:59:59,999 Do you have one particularly in mind? Like Docker? 408 99:59:59,999 --> 99:59:59,999 Yes, there's Docker and then there's OCI, I believe is the standard one 409 99:59:59,999 --> 99:59:59,999 And that could make it buzzword compliant 410 99:59:59,999 --> 99:59:59,999 Ah ok, we're all about buzzwords 411 99:59:59,999 --> 99:59:59,999 Probably diffoscope blockchain as well 412 99:59:59,999 --> 99:59:59,999 And then run diffoscope on containers and see the difference between updates of your 413 99:59:59,999 --> 99:59:59,999 container images 414 99:59:59,999 --> 99:59:59,999 BAM ... solved. Where do I invest? 415 99:59:59,999 --> 99:59:59,999 I wasn't aware of OCI ... is that what it's called? No, it doesn't support that right now 416 99:59:59,999 --> 99:59:59,999 But it wouldn't be too difficult, presuming there are tools to unpack it, and as soon as we have 417 99:59:59,999 --> 99:59:59,999 a tool to unpack it, it can then just go to that. There is a wishlist bug 418 99:59:59,999 --> 99:59:59,999 for Docker containers, to the point where I think it would be really nice if you 419 99:59:59,999 --> 99:59:59,999 could just give it, say, two image names or whatever the noun is 420 99:59:59,999 --> 99:59:59,999 So you can say "please diff these two Docker images that are available" and 421 99:59:59,999 --> 99:59:59,999 it can look at your local thing and do a diff on them. Currently it's not 422 99:59:59,999 --> 99:59:59,999 supported, but there is an open wishlist bug. 423 99:59:59,999 --> 99:59:59,999 Q: Shouldn't any company that releases binaries be interested in supporting 424 99:59:59,999 --> 99:59:59,999 diffoscope and using it? 425 99:59:59,999 --> 99:59:59,999 A1: Basically, when companies release binaries they are not interested in users seeing differences...
426 99:59:59,999 --> 99:59:59,999 A2: Yes, I'm surprised that actually the Docker bug was only opened two months ago 427 99:59:59,999 --> 99:59:59,999 and there hasn't been more interest in diffing container images, but if you'd like to open 428 99:59:59,999 --> 99:59:59,999 one for OCI that would be very much appreciated, and we can get on to that, that would be 429 99:59:59,999 --> 99:59:59,999 great. 430 99:59:59,999 --> 99:59:59,999 I was looking at the page for OCI, it says it's based on Docker basically, so 431 99:59:59,999 --> 99:59:59,999 you get OCI for free once you've sorted it out for Docker, if you're lucky 432 99:59:59,999 --> 99:59:59,999 The OCI image format, they wrote it on top of Docker images 433 99:59:59,999 --> 99:59:59,999 Ok, we will sort that out, and it seems like we're using Docker more and more 434 99:59:59,999 --> 99:59:59,999 on Debian 435 99:59:59,999 --> 99:59:59,999 Any other questions? 436 99:59:59,999 --> 99:59:59,999 Q: Out of curiosity, which ??? are you using inside? Are you using some bioinformatics 437 99:59:59,999 --> 99:59:59,999 on ??? to diff trees efficiently? 438 99:59:59,999 --> 99:59:59,999 A: No, it's really naive, all it does is run normal diff, the normal diff tools, but 439 99:59:59,999 --> 99:59:59,999 it will try to identify files and unpack them first, so it uses the file utility, the identifier 440 99:59:59,999 --> 99:59:59,999 thing that says it's a PDF, and tries to unpack it first. It doesn't do any clever 441 99:59:59,999 --> 99:59:59,999 matching. The clever matching that it does do is fuzzy matching as well, so if you just 442 99:59:59,999 --> 99:59:59,999 rename a directory between the two inside a container, it will say, yeah, there's a 443 99:59:59,999 --> 99:59:59,999 massive match between these two files, and things like that. So that's kind of 444 99:59:59,999 --> 99:59:59,999 useful. ???
it's not that clever, which is kind of what you want, 'cause if it's 445 99:59:59,999 --> 99:59:59,999 too clever it would start to be a little opaque ... 446 99:59:59,999 --> 99:59:59,999 I personally like dumb tools. 447 99:59:59,999 --> 99:59:59,999 Q: So one question to you is whether, if you wanna do a release to stable or 448 99:59:59,999 --> 99:59:59,999 something like that, you can ask for the debdiff, I'm wondering if anyone 449 99:59:59,999 --> 99:59:59,999 I mean, I remember doing that myself, I've been submitting diffoscope output 450 99:59:59,999 --> 99:59:59,999 as well, because it's just more readable and useful. So I'm not sure if anyone has any 451 99:59:59,999 --> 99:59:59,999 objection to people asking for those. 452 99:59:59,999 --> 99:59:59,999 I'll propose that to the release team, see what they say 453 99:59:59,999 --> 99:59:59,999 Thank you very much, any further questions? 454 99:59:59,999 --> 99:59:59,999 [Applause]