WEBVTT 00:00:07.466 --> 00:00:10.156 I'm here today to talk to you about diffoscope 00:00:10.156 --> 00:00:13.190 and how you can use it as a better diff 00:00:14.063 --> 00:00:16.166 or for Quality Assurance, etc., things like that. 00:00:19.789 --> 00:00:20.810 Moin! 00:00:20.815 --> 00:00:24.409 Apparently that's like a North German thing to say "welcome". 00:00:25.938 --> 00:00:29.898 North German, north Denmark, Scandinavia, that kind of thing, I'm told. 00:00:31.836 --> 00:00:34.197 People are shaking their heads, so I'm going to assume that's true. 00:00:37.306 --> 00:00:40.425 This is my first PC, an IBM 5155. 00:00:41.623 --> 00:00:46.441 Sometimes, when you rebooted it, it would launch into, it would somehow revert 00:00:46.688 --> 00:00:50.971 from booting from the hard disk to booting from a BASIC ROM, 00:00:51.359 --> 00:00:52.959 as in the programming language ROM. 00:00:53.017 --> 00:00:54.320 It was on my motherboard for some reason. 00:00:54.912 --> 00:00:57.691 So, randomly, you just get a chance to program in BASIC and then, 00:00:57.957 --> 00:01:00.456 sometimes you wouldn't, I don't know why, but… yeah. 00:01:00.718 --> 00:01:05.173 It's quite fun with this kind of clicky keyboard, and that folded in 00:01:05.519 --> 00:01:07.058 and it was this kind of big desk thing. 00:01:07.058 --> 00:01:08.014 Anyway… 00:01:09.067 --> 00:01:10.187 This is my first Debian. 00:01:10.500 --> 00:01:11.837 At the time it was already old. 00:01:12.890 --> 00:01:15.908 What's this one? Is this Slink? 2.2? Yeah. 00:01:17.077 --> 00:01:22.043 And this is when we had US and non-US, so that's really dating if you remember that. 00:01:23.522 --> 00:01:28.393 This is my first contribution to Debian, 19th December 2006, 00:01:28.803 --> 00:01:33.738 sending a patch to LilyPond which is kind of interesting 00:01:34.155 --> 00:01:37.205 and the response was "Oh yeah, rock on, many thanks. I'll upload this and 00:01:37.440 --> 00:01:38.723 it'll be landing to Etch". 
00:01:39.007 --> 00:01:43.408 And this was super motivating because Etch was just coming out and it was like 00:01:43.602 --> 00:01:48.732 "Great, I've got, like, one line of tiny patch in a release. This is super cool." 00:01:49.118 --> 00:01:52.687 Thomas' response was super motivating. 00:01:52.993 --> 00:01:56.450 So, after that, like that Christmas basically spent ??? 00:01:56.675 --> 00:01:59.754 Debian webpages and stuff. 00:02:00.327 --> 00:02:01.568 Very well timed. 00:02:02.234 --> 00:02:03.566 That's kind of a good… 00:02:04.301 --> 00:02:07.379 You know, someone sends a patch, be like "Cool, thanks" 00:02:07.849 --> 00:02:09.434 Like a little notice in the changelog. 00:02:09.807 --> 00:02:14.344 It was, you know, so stupid but… Yeah, do that kind of thing. 00:02:15.558 --> 00:02:17.249 So, moving on. 00:02:17.641 --> 00:02:20.276 Why diffoscope? Why did we write diffoscope? 00:02:20.552 --> 00:02:21.880 What's the background here? 00:02:22.184 --> 00:02:24.575 It comes from reproducible builds. 00:02:24.911 --> 00:02:28.983 The very quick outline is that once you get the source code for free software, 00:02:29.208 --> 00:02:31.505 you download the source code for nginx or whatever, 00:02:31.998 --> 00:02:35.844 pretty much everyone just runs binaries on their servers or their systems. 00:02:36.110 --> 00:02:39.119 You know, "apt install bla", "yum install", whatever. 00:02:40.531 --> 00:02:41.535 Android Play Store, whatever. 00:02:42.479 --> 00:02:46.176 Can you actually trust whether these two things correspond with each other? 00:02:46.470 --> 00:02:49.926 You've gotten the source code, it looks alright, and then you install this binary, 00:02:50.847 --> 00:02:51.821 yeah… 00:02:52.459 --> 00:02:55.861 Who generated that? Can you trust that process? 00:02:56.275 --> 00:02:57.430 Can you trust who generated it? 00:02:58.351 --> 00:03:01.493 Even if you could trust them, could you trust them not to be exploited? Etc. 
00:03:02.295 --> 00:03:04.765 This is a big problem because you can exploit a build farm and then 00:03:05.160 --> 00:03:09.895 obviously exploit all of that, you know, a trojan into the build farm, 00:03:10.097 --> 00:03:13.290 so every single binary that comes out is compromised. 00:03:13.708 --> 00:03:14.792 Kind of problematic. 00:03:15.060 --> 00:03:17.686 You could also target individual developers' machines, 00:03:17.937 --> 00:03:21.288 so I could go off to, say, your machine, add a backdoor to it, 00:03:21.578 --> 00:03:25.241 so every binary that you give to friends and things like that, 00:03:26.935 --> 00:03:30.485 is compromised in some way, stealing your bitcoins or whatever. 00:03:31.802 --> 00:03:36.127 I can also turn up at your door and blackmail you into producing 00:03:38.522 --> 00:03:42.997 software that has compromises or extra features, shall we say, 00:03:43.472 --> 00:03:44.783 that don't exist in the source code. 00:03:45.133 --> 00:03:47.885 So what will happen there is that you'd release your source 00:03:48.093 --> 00:03:51.968 and the binaries you produce have this sort of backdoor that, you know, 00:03:52.435 --> 00:03:55.127 someone is forcing you into producing. 00:03:55.464 --> 00:03:56.679 So, you don't want to do that. 00:03:56.856 --> 00:03:57.505 Anyway 00:03:58.197 --> 00:03:59.228 enough of that. 00:03:59.228 --> 00:04:03.211 What you do for reproducible builds is you ensure that every time you build 00:04:03.467 --> 00:04:05.773 a piece of software, you get an identical result. 00:04:06.916 --> 00:04:10.885 Multiple people then compare their builds and check whether they all get 00:04:07.074 --> 00:04:11.068 the same results 00:04:11.068 --> 00:04:15.626 and this means that an attacker must either have infected everyone 00:04:15.626 --> 00:04:17.726 at the same time, or they haven't infected anyone. 00:04:20.673 --> 00:04:24.058 The point here is that you have to ensure that builds have identical results. 
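NOTE A minimal sketch, not from the talk, of the "multiple people compare their builds" step just described. It assumes each builder hands over the raw bytes of the package they built; the builder names and byte strings are hypothetical, and SHA-256 is used here purely as an illustration (the slides that follow use sha1sum).

```python
import hashlib
from typing import Dict


def digest(artifact: bytes) -> str:
    """Return the hex SHA-256 digest of one build artifact."""
    return hashlib.sha256(artifact).hexdigest()


def builds_agree(artifacts: Dict[str, bytes]) -> bool:
    """True only if every builder produced a bit-for-bit identical artifact."""
    return len({digest(blob) for blob in artifacts.values()}) == 1


# Hypothetical builders and artifact contents, for illustration only.
honest = {"alice": b"package bytes", "bob": b"package bytes"}
tampered = {"alice": b"package bytes", "bob": b"package bytes + backdoor"}

print(builds_agree(honest))    # True
print(builds_agree(tampered))  # False
```

A single disagreeing digest is enough to flag the build, which is why an attacker would have to compromise every builder at once.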
00:04:24.173 --> 00:04:25.163 Ok, great. 00:04:28.003 --> 00:04:32.539 So, we started the reproducible builds project, etc. 00:04:33.470 --> 00:04:34.744 And we build 2 debs. 00:04:35.112 --> 00:04:36.537 Oh, I'm sorry about the colors there. 00:04:38.067 --> 00:04:38.965 You probably can't see that. 00:04:39.349 --> 00:04:42.485 That says "sha1sum a.deb b.deb". 00:04:46.128 --> 00:04:50.775 Anyway, we're comparing the sha1sums of 2 binary Debian files. 00:04:51.424 --> 00:04:53.922 So, these two files differ. 00:04:54.222 --> 00:04:55.612 Ok, they're not reproducible. 00:04:56.807 --> 00:04:57.527 Why is that? 00:04:57.873 --> 00:04:59.656 So we run a diff on them. 00:05:00.140 --> 00:05:00.637 Yeah… 00:05:01.340 --> 00:05:04.093 So, what can we learn from this? 00:05:04.418 --> 00:05:08.508 Well, not very much, visibly they're compressed so 00:05:08.947 --> 00:05:13.012 as soon as we see one change, we'll see they would just cascade changes 00:05:13.362 --> 00:05:14.866 because that's how compression works. 00:05:16.241 --> 00:05:23.983 I guess we know it's a deb probably a ar format file, not very useful. 00:05:24.193 --> 00:05:26.005 Ok, great so we're gonna have a look in 00:05:26.492 --> 00:05:29.919 We'll do a binary diff and ok, well… 00:05:30.923 --> 00:05:32.790 Again, that's not really telling us very much 00:05:34.413 --> 00:05:36.515 with the diff there. 00:05:37.206 --> 00:05:38.426 Ok, great. 00:05:39.417 --> 00:05:40.427 ??? one level in 00:05:40.513 --> 00:05:44.834 "ar x" is on the new maintainer thing, "how you unpack a deb" 00:05:44.858 --> 00:05:46.215 Everyone remembers this, right? 00:05:48.196 --> 00:05:51.167 You unpack a.deb with "ar x" and you do that to b.deb 00:05:51.599 --> 00:05:53.606 and then we diff the results of that. 00:05:54.099 --> 00:05:57.824 Ok, so…yeah, 7zip. 00:05:58.948 --> 00:06:01.329 Ok, compressed content, not very useful. 00:06:01.897 --> 00:06:07.898 Ok, so let's unpack the control.tar inside these debs. 
00:06:08.725 --> 00:06:10.145 And then we run diff on that. 00:06:12.693 --> 00:06:16.850 Still not really telling us anything useful about how to make this package reproducible 00:06:17.487 --> 00:06:20.345 So let's unpack the tar.xz into the tar. 00:06:22.463 --> 00:06:28.348 Inside that tar, there's a file called md5sums and we start to see some differences 00:06:28.768 --> 00:06:33.370 between some files in these two debs. 00:06:33.640 --> 00:06:36.527 ??? meaningful, so now we have some idea that 00:06:36.855 --> 00:06:39.101 it has something to do with this usr/bin/pmixer binary. 00:06:39.682 --> 00:06:40.653 Ok, interesting. 00:06:41.989 --> 00:06:45.015 We'll unzip that and then we do a diff on pmixer itself. 00:06:45.914 --> 00:06:48.600 Now we're back into just binary "gobbledygook" mode 00:06:49.002 --> 00:06:51.736 This isn't very helpful and this is taking quite a while 00:06:52.399 --> 00:06:54.663 and if I remember correctly, Debian has a lot of packages. 00:06:55.182 --> 00:06:56.784 So this might take a little while. 00:06:57.601 --> 00:07:00.415 So, basically, ??? mean 00:07:00.782 --> 00:07:02.008 I should build a better diff. 00:07:03.703 --> 00:07:05.194 That's not quite true, this is actually… 00:07:05.783 --> 00:07:07.472 It was Lunar that started this project 00:07:07.801 --> 00:07:10.670 and it was called debbindiff, because we wanted to diff 00:07:11.093 --> 00:07:12.264 binary Debian packages. 00:07:13.474 --> 00:07:15.040 So this is the initial commit, 2014. 00:07:16.962 --> 00:07:20.100 "The version is successfully able to report differences in two .changes files. 00:07:20.100 --> 00:07:22.343 Not with much interesting details, but it's a start." 00:07:22.762 --> 00:07:23.806 And it was a start. 00:07:27.581 --> 00:07:29.918 Fast forwarding… Oh, sorry about these colors, 00:07:30.307 --> 00:07:31.872 I don't know if we can do anything about the lights? 00:07:34.713 --> 00:07:35.363 Yeah? 00:07:37.830 --> 00:07:38.080 No? 
00:07:42.124 --> 00:07:42.974 All right, whatever… 00:07:43.700 --> 00:07:46.410 Basically, we're diffoscoping on… 00:07:47.546 --> 00:07:49.595 It works kind of like diff does normally, 00:07:49.981 --> 00:07:51.995 you give it two files, it outputs a unified diff. 00:07:52.699 --> 00:07:59.427 So "diffoscope a b", one file contains the word "foo", one contains the word "bar". 00:08:01.241 --> 00:08:03.340 Nothing actually out of the ordinary. 00:08:03.974 --> 00:08:07.670 It's sort of colored by default, so that's why you can't see it, but whatever. 00:08:10.432 --> 00:08:14.667 It supports archive formats, so if you give it two tar files, 00:08:15.413 --> 00:08:22.263 if we then tar up our "a" file and our "b" file into a.tar and b.tar 00:08:23.206 --> 00:08:25.374 and then run diffoscope on those tar files 00:08:26.197 --> 00:08:28.395 we get this kind of, like, hierarchy here. 00:08:28.742 --> 00:08:32.006 So it's saying that there are differences between these files, 00:08:32.513 --> 00:08:37.735 in the file list they have different timestamps, because I made them 00:08:38.161 --> 00:08:39.535 at different times, 00:08:39.848 --> 00:08:42.575 and here are the contents, so we got "foo" there and "bar" there. 00:08:43.296 --> 00:08:44.781 So we can see the difference between them. 00:08:45.566 --> 00:08:48.373 Well, I can, I don't know if you can, you get the slide there. 00:08:49.311 --> 00:08:53.551 If we gzip these tar files and then run diffoscope on those gzip things, 00:08:53.888 --> 00:08:59.230 it'll say "ok, what we've done is unpack it first, and here's the metadata 00:08:59.622 --> 00:09:01.653 about the gzip process", 00:09:02.107 --> 00:09:05.941 and inside that are a.tar and b.tar from the previous slides. 00:09:07.673 --> 00:09:09.085 And then the "a" file and the "b" file. 00:09:09.365 --> 00:09:15.303 So, it's really going two levels deep into this tar.gz file. 00:09:16.162 --> 00:09:17.042 That's pretty cool. 
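NOTE A rough sketch, not from the talk and not diffoscope's actual code, of the recursive idea just described: peel off the gzip layer, then the tar layer, then diff the files inside. It uses only the Python standard library; the member name "greeting.txt" and the helper names are hypothetical.

```python
import difflib
import gzip
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"


def read_tar(data):
    """Return {member name: bytes} for a tar archive, or None if data is not a tar."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
            return {m.name: tar.extractfile(m).read()
                    for m in tar.getmembers() if m.isfile()}
    except tarfile.TarError:
        return None


def compare(a, b, label="input", depth=0):
    """Return indented report lines describing how two byte strings differ."""
    if a == b:
        return []
    pad = "  " * depth
    report = [f"{pad}{label} differs"]
    if a[:2] == GZIP_MAGIC and b[:2] == GZIP_MAGIC:
        # Both look gzip-compressed: unpack and compare the payloads.
        return report + compare(gzip.decompress(a), gzip.decompress(b),
                                "gzip payload", depth + 1)
    tar_a, tar_b = read_tar(a), read_tar(b)
    if tar_a is not None and tar_b is not None:
        # Both are tar archives: compare member by member.
        for name in sorted(set(tar_a) | set(tar_b)):
            report += compare(tar_a.get(name, b""), tar_b.get(name, b""),
                              name, depth + 1)
        return report
    # Fallback: treat both sides as text and emit a unified diff.
    diff = difflib.unified_diff(a.decode(errors="replace").splitlines(),
                                b.decode(errors="replace").splitlines(),
                                "a", "b", lineterm="")
    return report + [f"{pad}  {line}" for line in diff]


def make_tar_gz(name, content):
    """Build an in-memory .tar.gz holding one file (helper for the demo)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))
    return gzip.compress(buf.getvalue())


a_gz = make_tar_gz("greeting.txt", b"foo\n")
b_gz = make_tar_gz("greeting.txt", b"bar\n")
print("\n".join(compare(a_gz, b_gz)))
```

The indentation of the printed report mirrors the nesting on the slide: a gzip layer, a tar layer, then the file whose content changed from "foo" to "bar".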
00:09:17.291 --> 00:09:20.772 And it's completely recursive, I think it will actually blow out after, I think, 00:09:20.993 --> 00:09:21.697 1000 [levels]. 00:09:23.119 --> 00:09:25.233 [light is turned down for the audience to see the slides] 00:09:30.195 --> 00:09:32.065 I'll just bump back a bit, just in case. 00:09:35.203 --> 00:09:37.055 [Applause] 00:09:37.806 --> 00:09:38.662 Thank you. 00:09:39.907 --> 00:09:43.462 So that's the a and b files. 00:09:43.884 --> 00:09:48.077 We've tarred them up and so I see the hierarchy of foo and bar file layer. 00:09:48.472 --> 00:09:52.012 I've gzipped them, so this is a gzip layer. 00:09:52.399 --> 00:09:54.661 Here's the tar layer and then there's the files themselves. 00:09:57.315 --> 00:09:59.252 This is from a real .deb from the archive. 00:10:00.637 --> 00:10:06.542 Inside this .deb, there's a data.tar.xz and in that xz file there's a data.tar 00:10:07.294 --> 00:10:11.081 and inside that tar file, there's a file called aff and inside that 00:10:11.648 --> 00:10:13.892 there's a version string that is different. 00:10:14.174 --> 00:10:17.527 And that looks like a build date so we probably know that if we went back 00:10:17.753 --> 00:10:22.748 to the source package, we could very quickly work out, 00:10:22.922 --> 00:10:26.582 with a very quick grep, work out where this file is being generated from, 00:10:26.582 --> 00:10:31.536 the de_DE.aff file and then ??? probably quite obvious 00:10:32.285 --> 00:10:37.311 that it's using the current build time and then we can just patch that, fix it etc. 00:10:38.362 --> 00:10:45.681 This has gone from two rather obscure binary .debs all the way to the fix 00:10:46.040 --> 00:10:51.683 probably in about 5 minutes, and you can probably send the patch in that time 00:10:52.098 --> 00:10:53.086 because it'd be quite quick. 
00:10:53.860 --> 00:10:57.482 Without diffoscope here, without this sort of recursive unpacking, 00:10:58.351 --> 00:11:03.380 you'd be just completely lost, you'd be there with "ar x" all day 00:11:03.762 --> 00:11:07.109 and working out which files are different and trying to use xxd 00:11:07.859 --> 00:11:09.410 and this kind of nonsense. 00:11:10.612 --> 00:11:12.875 diffoscope's got some other things as well 00:11:13.277 --> 00:11:17.116 if you try to do reproducible packages and things are varying just on 00:11:17.381 --> 00:11:22.408 the line ordering, we detect whether a file differs only in the line ordering. 00:11:22.660 --> 00:11:26.178 So, here's file "a", "These lines are in order". 00:11:27.155 --> 00:11:30.108 File "b" has "These order are in lines". 00:11:30.630 --> 00:11:34.864 It's very difficult to say, actually, it's like one of these tongue twisters. 00:11:35.305 --> 00:11:38.862 Run diffoscope on these two and it says it's got ordering differences only. 00:11:39.210 --> 00:11:41.295 That's interesting, so you probably need to sort, 00:11:41.592 --> 00:11:45.076 you go all the way back to the source code, work out very quickly, 00:11:45.389 --> 00:11:48.381 if you know it's just ordering differences you just kind of know 00:11:48.672 --> 00:11:52.762 what the output's gonna be, you can search for order in ??? 00:11:53.166 --> 00:11:54.648 and you get the right files, 00:11:54.928 --> 00:11:57.803 ??? sort in the right place, BAM, send it patch of (???), 00:11:57.889 --> 00:11:59.358 everything is great. 00:11:59.790 --> 99:59:59.999 Oh, and send it to upstream as well because you're good. 99:59:59.999 --> 99:59:59.999 It supports a lot more things. 99:59:59.999 --> 99:59:59.999 We've been showing the terminal text output here. 99:59:59.999 --> 99:59:59.999 It's got an HTML output mode, which is really useful in the hierarchical thing 99:59:59.999 --> 99:59:59.999 when it gets a bit more complicated. 
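NOTE A small sketch, not from the talk, of one way the "ordering differences only" check mentioned a moment ago could work: two files differ only in line order when their lines are unequal as sequences but equal as multisets. This is an assumption about the idea, not a description of how diffoscope implements it, and the one-word-per-line layout of the sample files is a guess at what the slide showed.

```python
from collections import Counter


def differs_only_in_order(a: str, b: str) -> bool:
    """True if a and b contain the same lines, the same number of times, in a different order."""
    lines_a, lines_b = a.splitlines(), b.splitlines()
    return lines_a != lines_b and Counter(lines_a) == Counter(lines_b)


# Assumed layout of the example files: one word per line.
file_a = "These\nlines\nare\nin\norder\n"
file_b = "These\norder\nare\nin\nlines\n"

print(differs_only_in_order(file_a, file_b))  # True
```

When the check returns True, the likely fix upstream is to sort whatever produces the lines, as the talk suggests.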
99:59:59.999 --> 99:59:59.999 Instead of being laid on top of each other like a unified diff, 99:59:59.999 --> 99:59:59.999 you get the diff on the left and the right and you get sort of a nested 99:59:59.999 --> 99:59:59.999 thing inside with colors and lines and you can link this and various things in it 99:59:59.999 --> 99:59:59.999 including bits of metadata here, other bits here, what command you used. 99:59:59.999 --> 99:59:59.999 That's the HTML output. 99:59:59.999 --> 99:59:59.999 We also support a lot of file formats, it's not just on text, 99:59:59.999 --> 99:59:59.999 it's about all of these, so let's quickly run through some of them. 99:59:59.999 --> 99:59:59.999 You give it two Android apk files which are kind of like zips, but magic. 99:59:59.999 --> 99:59:59.999 It'll know how to compare them. 99:59:59.999 --> 99:59:59.999 There's like a Manifest file that needs decoding. 99:59:59.999 --> 99:59:59.999 It supports Berkeley DB databases, 99:59:59.999 --> 99:59:59.999 Word documents, that's a Word document with "a" and that's a Word document with "b" 99:59:59.999 --> 99:59:59.999 and it'll correctly do that. 99:59:59.999 --> 99:59:59.999 If you run that through diff normally, that ??? be a binary mess, 99:59:59.999 --> 99:59:59.999 so completely useless. 99:59:59.999 --> 99:59:59.999 E-books, there's epub, it also supports mobi. 99:59:59.999 --> 99:59:59.999 So if you give it two epub files, it'll say "They just differ in this date". 99:59:59.999 --> 99:59:59.999 Brilliant. 99:59:59.999 --> 99:59:59.999 Normally that will be completely useless diff binary ??? 99:59:59.999 --> 99:59:59.999 So you can be like "epub date, ok", grep the source code for that, 99:59:59.999 --> 99:59:59.999 make a patch really quickly. 99:59:59.999 --> 99:59:59.999 Mono binaries, git repositories, why not? 99:59:59.999 --> 99:59:59.999 Gnumeric spreadsheets, ISO images. 99:59:59.999 --> 99:59:59.999 Oh yeah, ISO images is really cool. 
99:59:59.999 --> 99:59:59.999 So, it'll basically unpack the ISO, then inside that there might be a squashfs image 99:59:59.999 --> 99:59:59.999 then it'll completely go down to that and work out any differences 99:59:59.999 --> 99:59:59.999 between the two contents in the ISO file, including any metadata. 99:59:59.999 --> 99:59:59.999 This is on the squashfs metadata headers, I think. 99:59:59.999 --> 99:59:59.999 But say inside that ISO, there was a file that was a pdf, and inside that pdf was 99:59:59.999 --> 99:59:59.999 a ??? which varied, 99:59:59.999 --> 99:59:59.999 it will basically go all the way down and say "yeah, it's actually here, 99:59:59.999 --> 99:59:59.999 in this ??? that the data differs." 99:59:59.999 --> 99:59:59.999 And that means you can just go again all the way back to the source 99:59:59.999 --> 99:59:59.999 and say "ok, cool, we know how to fix this quite quickly" 99:59:59.999 --> 99:59:59.999 And this is really valuable in getting the recent Tails distribution reproducible 99:59:59.999 --> 99:59:59.999 so their ISOs are reproducible. 99:59:59.999 --> 99:59:59.999 If you build one and I build one, we get the exact same one 99:59:59.999 --> 99:59:59.999 and that's kind of useful for something like Tails where you would probably want to 99:59:59.999 --> 99:59:59.999 of all, there's a lot of projects that you might want to compromise, 99:59:59.999 --> 99:59:59.999 you might want to go after that one, because of the kind of people that are using it. 99:59:59.999 --> 99:59:59.999 We support comparing images, so this is using ??? 99:59:59.999 --> 99:59:59.999 and then just running that through diff. 99:59:59.999 --> 99:59:59.999 That is a linux penguin and that is something else, 99:59:59.999 --> 99:59:59.999 I can't remember now. Oh, FT. 99:59:59.999 --> 99:59:59.999 It supports images. 
99:59:59.999 --> 99:59:59.999 It supports JSON and pretty print, so if you give it two JSON files 99:59:59.999 --> 99:59:59.999 one with key/value… it'll do a nice diff of them. 99:59:59.999 --> 99:59:59.999 It will pretty print it first, before doing the diff, so it'll actually give you 99:59:59.999 --> 99:59:59.999 something clean, otherwise I don't know if you've ever diffed 99:59:59.999 --> 99:59:59.999 two very long JSON lines, if they differ in the middle, you just get 99:59:59.999 --> 99:59:59.999 a huge long unified diff, but here it's like "oh, just ??? things have changed" 99:59:59.999 --> 99:59:59.999 OpenDocument text formats, Ogg audio files, because why not. 99:59:59.999 --> 99:59:59.999 tcpdump capture files, that's actually quite useful. 99:59:59.999 --> 99:59:59.999 PDFs. That PDF says "Hello World" and this PDF says "Hello sick sad world", 99:59:59.999 --> 99:59:59.999 I don't know why. ??? in the demo. 99:59:59.999 --> 99:59:59.999 Again, run that through a normal diff program… garbage. 99:59:59.999 --> 99:59:59.999 XML documents. Again, it'll pretty print them so it's nice, actually nice to read. 99:59:59.999 --> 99:59:59.999 If you want to get started on diffoscope, the very easiest and quickest way to do it is 99:59:59.999 --> 99:59:59.999 fire up a web browser, try.diffoscope.org, select your files, press Compare 99:59:59.999 --> 99:59:59.999 and it'll upload them and run diffoscope with all the support for all the file formats 99:59:59.999 --> 99:59:59.999 in the cloud for you and give you a nice HTML page that you can then link to people 99:59:59.999 --> 99:59:59.999 So that's the very quickest way to get started. 
99:59:59.999 --> 99:59:59.999 The next quickest way is to install trydiffoscope and then you run that 99:59:59.999 --> 99:59:59.999 on two files and it'll basically do the same thing, 99:59:59.999 --> 99:59:59.999 run it in the same cloud service as try.diffoscope.org 99:59:59.999 --> 99:59:59.999 but it'll give you the result on the command line or 99:59:59.999 --> 99:59:59.999 if you pass the webbrowser option, it will give you a URL or load your web browser, 99:59:59.999 --> 99:59:59.999 I can't remember exactly which, with the same results. 99:59:59.999 --> 99:59:59.999 This is 1kB of Python, nothing basically. 99:59:59.999 --> 99:59:59.999 That's the next easiest way. 99:59:59.999 --> 99:59:59.999 But you can then install diffoscope itself on your own machine. 99:59:59.999 --> 99:59:59.999 I recommend not installing recommends because all of those file formats 99:59:59.999 --> 99:59:59.999 might drag in extra things, like the whole of TeX, 99:59:59.999 --> 99:59:59.999 I think the whole of OpenOffice, whole of Mono, whole Java… 99:59:59.999 --> 99:59:59.999 Android, yeah, quite big. 99:59:59.999 --> 99:59:59.999 I think there's another big one I can't think of. 99:59:59.999 --> 99:59:59.999 They're all optional, and they all say "By the way, I support TeX documents 99:59:59.999 --> 99:59:59.999 or whatever, Mono, whatever. 99:59:59.999 --> 99:59:59.999 But you need to install this package and then you get full pretty printed support", 99:59:59.999 --> 99:59:59.999 And it'll tell you that when it's missing. 99:59:59.999 --> 99:59:59.999 So, if you just start with --install-recommends disabled, 99:59:59.999 --> 99:59:59.999 run it on your file, if it says "please install this package", you can then 99:59:59.999 --> 99:59:59.999 install them as you go along, as you want, 99:59:59.999 --> 99:59:59.999 rather than installing everything. 99:59:59.999 --> 99:59:59.999 And then ??? 
and then it works as before. 99:59:59.999 --> 99:59:59.999 You can improve all your own quality assurance and Debian packaging 99:59:59.999 --> 99:59:59.999 with diffoscope. 99:59:59.999 --> 99:59:59.999 The biggest value here is not necessarily for reproducible builds. 99:59:59.999 --> 99:59:59.999 It's for basically just seeing, where you do want to have a diff or are expecting a diff 99:59:59.999 --> 99:59:59.999 and you are expecting a particular type of diff in a particular way, 99:59:59.999 --> 99:59:59.999 you can basically see those changes. 99:59:59.999 --> 99:59:59.999 And if you build two debs normally and... I'll try to demo in a second. 99:59:59.999 --> 99:59:59.999 You build a deb with a patch applied, you can ??? see a diff on the source package. 99:59:59.999 --> 99:59:59.999 But that's not very useful because the binaries are what's going to end up on 99:59:59.999 --> 99:59:59.999 people's machines. But if you run a diff on the binary itself: did that change 99:59:59.999 --> 99:59:59.999 really hit the binary? I think really... No. 99:59:59.999 --> 99:59:59.999 I'll just run through a very live demo, of course, so it's gonna fail... 99:59:59.999 --> 99:59:59.999 Check out some... 
We'll get this libnetx-java. 99:59:59.999 --> 99:59:59.999 We just build that once. 99:59:59.999 --> 99:59:59.999 Let's say we are on the security team and 99:59:59.999 --> 99:59:59.999 want to apply a patch, and we want to be really sure because we are about to push it out 99:59:59.999 --> 99:59:59.999 to all our users. 99:59:59.999 --> 99:59:59.999 First we will make a changelog, 99:59:59.999 --> 99:59:59.999 closing a bug. 99:59:59.999 --> 99:59:59.999 Find some Java file to change. 99:59:59.999 --> 99:59:59.999 Let's pretend we have a real patch. 99:59:59.999 --> 99:59:59.999 Let's replace that equals equals, say that was the fix. 99:59:59.999 --> 99:59:59.999 So that's the patch from upstream. 99:59:59.999 --> 99:59:59.999 Upstream blast patch 99:59:59.999 --> 99:59:59.999 When we build this, what we wanna see is just that change in the file; 99:59:59.999 --> 99:59:59.999 we don't wanna see any nonsense changes of extended ??? but we also definitely want 99:59:59.999 --> 99:59:59.999 to see that change, because if our binaries, for security reasons, don't have that change 99:59:59.999 --> 99:59:59.999 then we aren't fixing people's machines, they will issue a DSA ??? installed, saying 99:59:59.999 --> 99:59:59.999 And you should do proper testing as well, at multiple levels. 99:59:59.999 --> 99:59:59.999 I will build that again. 99:59:59.999 --> 99:59:59.999 So we wanna diff the original one. 99:59:59.999 --> 99:59:59.999 We wanna diff that one with a fake security one. 99:59:59.999 --> 99:59:59.999 You see on the progress bar 100% 1- there are differences (there should be 99:59:59.999 --> 99:59:59.999 differences). Let's see what those differences are 99:59:59.999 --> 99:59:59.999 in our web browser, it's a nice HTML output. 99:59:59.999 --> 99:59:59.999 Let's have a look. Are we seeing what we wanna see? 99:59:59.999 --> 99:59:59.999 There are some changes in the data.tar, we kind of expect that. 99:59:59.999 --> 99:59:59.999 What's changed in our control file? 
Well, the version changed, we wanted that 99:59:59.999 --> 99:59:59.999 to change. Perfect. 99:59:59.999 --> 99:59:59.999 And it's changed to ??? That's what we wanna see. 99:59:59.999 --> 99:59:59.999 No other changes here, so there was no weird control or in magic going on. 99:59:59.999 --> 99:59:59.999 In our data.tar the color of the timestamp changes, we will ignore these for now. 99:59:59.999 --> 99:59:59.999 The changelog has changed, well, I hope so, because I have changed that entry. 99:59:59.999 --> 99:59:59.999 Here is where we're going to start seeing... We are going to see the change in the 99:59:59.999 --> 99:59:59.999 jar file, which is the Java class, Java compiled archive format. 99:59:59.999 --> 99:59:59.999 We are seeing some meaningless timestamp changes but we can ignore those 99:59:59.999 --> 99:59:59.999 ??? because it's just metadata, maybe. 99:59:59.999 --> 99:59:59.999 Ok, part of a class, so if you can see here it's basically a decompilation of the 99:59:59.999 --> 99:59:59.999 Java file itself and it's basically saying "oh, it used to say ifnull and now ifnonnull". 99:59:59.999 --> 99:59:59.999 So these are the actual Java bytecode instructions and what's really 99:59:59.999 --> 99:59:59.999 And what is really ??? here is that nothing else has changed. 99:59:59.999 --> 99:59:59.999 We were just expecting that change between the two opcodes, ifnull and ifnonnull, 99:59:59.999 --> 99:59:59.999 which is good because it's like it hasn't made any other code changes, but also, crucially, we can 99:59:59.999 --> 99:59:59.999 see that it has actually made a change to the code. 99:59:59.999 --> 99:59:59.999 For example, it wasn't using some cached version or something like that. 99:59:59.999 --> 99:59:59.999 This is really useful. 99:59:59.999 --> 99:59:59.999 And just running a naive diff wouldn't give that of course, because it would just 99:59:59.999 --> 99:59:59.999 come with binary garbage. And just seeing the diff had changed again 99:59:59.999 --> 99:59:59.999 ??? 
have told you anything, because all of the change would have changed as well. 99:59:59.999 --> 99:59:59.999 So it's like, well, yes, it's different. 99:59:59.999 --> 99:59:59.999 The meaningful change there is what actually fixes the flaw 99:59:59.999 --> 99:59:59.999 ??? but we know it's there. 99:59:59.999 --> 99:59:59.999 That's kind of ??? Shipping this deb out, I'd be quite 99:59:59.999 --> 99:59:59.999 confident that this fixed the actual bug. 99:59:59.999 --> 99:59:59.999 I'd be quite confident pushing that out because it's a very minimal amount of changes; 99:59:59.999 --> 99:59:59.999 you wanna do that for security reasons. 99:59:59.999 --> 99:59:59.999 So this was the live demo. 99:59:59.999 --> 99:59:59.999 The other one is seeing no changes at all, so you can build once 99:59:59.999 --> 99:59:59.999 if you build a reproducible 99:59:59.999 --> 99:59:59.999 You can build once, change your compiler or change some other part of your toolchain. 99:59:59.999 --> 99:59:59.999 Build it again and if you got the exact same results, well great, that's what you intended. 99:59:59.999 --> 99:59:59.999 You wanna see no changes when you change some part of it. 99:59:59.999 --> 99:59:59.999 And that is really useful, if there were changes diffoscope will highlight them 99:59:59.999 --> 99:59:59.999 and show exactly why they had changed, maybe some compiler optimizations, 99:59:59.999 --> 99:59:59.999 maybe some other things as well. 99:59:59.999 --> 99:59:59.999 So you can use it in both ways, when you expect changes and when you don't expect 99:59:59.999 --> 99:59:59.999 changes, and if those don't match the expectations diffoscope will tell you exactly why. 99:59:59.999 --> 99:59:59.999 It's all ??? 
when other companies are doing security releases, 99:59:59.999 --> 99:59:59.999 naming no names whatsoever, but they like to release patches as, you 99:59:59.999 --> 99:59:59.999 know, just a new firmware for your router. 99:59:59.999 --> 99:59:59.999 Very large file system images, you basically have no idea what changed 99:59:59.999 --> 99:59:59.999 between these two files, again you ??? through diff, completely useless. 99:59:59.999 --> 99:59:59.999 You can start to unpack them with ??? and blah blah blah. 99:59:59.999 --> 99:59:59.999 But they're probably sort of concatenated cpio archives, so that's nonsense. 99:59:59.999 --> 99:59:59.999 But diffoscope would just chew through those and give you actually what the difference 99:59:59.999 --> 99:59:59.999 is between these two files, and say they changed this, they've removed or 99:59:59.999 --> 99:59:59.999 added some GPL-licensed code or something kind of interesting. 99:59:59.999 --> 99:59:59.999 So it's very useful for diffing those kinds of binary blobs that come from various people. 99:59:59.999 --> 99:59:59.999 So the current state of diffoscope, the development is up and down. 99:59:59.999 --> 99:59:59.999 It started around May 2014, something like that. A bunch of work here, that's idle, I think. 99:59:59.999 --> 99:59:59.999 Those are just for DebConfs, basically. 99:59:59.999 --> 99:59:59.999 Anyway, it's going up and down, it's kind of interesting. 99:59:59.999 --> 99:59:59.999 ??? a lot of reproducible builds projects of course, so every time we do a build 99:59:59.999 --> 99:59:59.999 on the ??? 
reproducible builds or testing framework if we run diffoscope 99:59:59.999 --> 99:59:59.999 on the result, if it's reproducible it just says, hey, the file is the same. 99:59:59.999 --> 99:59:59.999 But if not, we publish the diffoscopes of all your packages that are unreproducible 99:59:59.999 --> 99:59:59.999 so you can just go there and be like, what's the difference between these two things? 99:59:59.999 --> 99:59:59.999 I invested a lot of work optimizing diffoscope, ??? rather perverse N-squared 99:59:59.999 --> 99:59:59.999 loops inside it. So I managed to cut down some of the time here, cut down here. 99:59:59.999 --> 99:59:59.999 There've been quite a few performance enhancements over the past ... 99:59:59.999 --> 99:59:59.999 these are the git tags, this is version 80 and this is version 50. I just ran the same 99:59:59.999 --> 99:59:59.999 benchmark across them all. 99:59:59.999 --> 99:59:59.999 So this shows when I have introduced some rather stupid code, embarrassing, but whatever. 99:59:59.999 --> 99:59:59.999 ??? 99:59:59.999 --> 99:59:59.999 There's work being done right now on parallel processing, there've been 99:59:59.999 --> 99:59:59.999 quite a few attempts before, but adding it is kind of interesting and difficult. 99:59:59.999 --> 99:59:59.999 Luckily we have a ??? student Liliana, is she in the room? Is she hiding? 
99:59:59.999 --> 99:59:59.999 She's here, and she'll be talking tomorrow about her work on parallel processing in 99:59:59.999 --> 99:59:59.999 diffoscope, and that will be amazing because a lot of it is I/O bound or waiting for external 99:59:59.999 --> 99:59:59.999 processes. With multiple-CPU machines, you might as well just play well: 99:59:59.999 --> 99:59:59.999 while I'm standing waiting for the result of a PDF being unpacked, I may as well 99:59:59.999 --> 99:59:59.999 be running on another CPU. I think we are going to see some real performance wins 99:59:59.999 --> 99:59:59.999 as we get that parallel processing merged and working and ??? 99:59:59.999 --> 99:59:59.999 You can check out our website, diffoscope.org. Recently migrated to Salsa... yeah! 99:59:59.999 --> 99:59:59.999 And everything ??? reproducible is now on Salsa, it's kind of cool. 99:59:59.999 --> 99:59:59.999 That's quite recent... 99:59:59.999 --> 99:59:59.999 Thank you very much, danke schön. 99:59:59.999 --> 99:59:59.999 You got any questions? About diffoscope? 99:59:59.999 --> 99:59:59.999 Thank you very much! 99:59:59.999 --> 99:59:59.999 Q: A buzzword question: can you diff container image formats? 99:59:59.999 --> 99:59:59.999 A: Depends which ones. If they're just a directory, then yes, because it's just a directory. 99:59:59.999 --> 99:59:59.999 Do you have any in particular in mind? Like Docker? 99:59:59.999 --> 99:59:59.999 Yes, there's Docker and then there's OCI, which I believe is the standard one. 99:59:59.999 --> 99:59:59.999 And that could make it buzzword compliant. 99:59:59.999 --> 99:59:59.999 Ah, OK, we're all about buzzwords. 99:59:59.999 --> 99:59:59.999 Probably diffoscope blockchain as well. 99:59:59.999 --> 99:59:59.999 And then run diffoscope on containers and see the difference between updates of your 99:59:59.999 --> 99:59:59.999 container images. 99:59:59.999 --> 99:59:59.999 BAM... solved. Where do I invest? 99:59:59.999 --> 99:59:59.999 I wasn't aware that OCI ... 
is that what it's called? No, it doesn't support that right now. 99:59:59.999 --> 99:59:59.999 But it wouldn't be too difficult, presuming there are tools to unpack it, and as soon as we have 99:59:59.999 --> 99:59:59.999 a tool to unpack it, it can then just go to that. There is a wishlist bug 99:59:59.999 --> 99:59:59.999 for Docker containers, to the point where I think it would be really nice if you 99:59:59.999 --> 99:59:59.999 could just give it, say, two image names or whatever the noun is. 99:59:59.999 --> 99:59:59.999 So you can say "please diff these two Docker images that are available" and 99:59:59.999 --> 99:59:59.999 it can look at your local thing and do a diff on them. Currently it's not 99:59:59.999 --> 99:59:59.999 supported, but there is an open wishlist bug. 99:59:59.999 --> 99:59:59.999 Q: Shouldn't any company that releases binaries be interested in supporting 99:59:59.999 --> 99:59:59.999 diffoscope and using it? 99:59:59.999 --> 99:59:59.999 A1: Basically, when companies release binaries they are not interested in users seeing differences... 99:59:59.999 --> 99:59:59.999 A2: Yes, I'm surprised that actually the Docker bug was only opened two months ago 99:59:59.999 --> 99:59:59.999 and there hasn't been more interest in diffing container images, but if you'd like to open 99:59:59.999 --> 99:59:59.999 one for OCI that would be much appreciated, and we can get on to that, that would be 99:59:59.999 --> 99:59:59.999 great. 99:59:59.999 --> 99:59:59.999 I was looking at the page for OCI; it says it's based on Docker, basically, so 99:59:59.999 --> 99:59:59.999 you'd get OCI for free once you've sorted it out for Docker, if you're lucky. 99:59:59.999 --> 99:59:59.999 The OCI image format grew out of Docker images. 99:59:59.999 --> 99:59:59.999 OK, we will sort that out, and it seems like we're using Docker more and more 99:59:59.999 --> 99:59:59.999 in Debian. 99:59:59.999 --> 99:59:59.999 Any other questions? 
99:59:59.999 --> 99:59:59.999 Q: Out of curiosity, which ??? are you using inside? Are you using some bioinformatics 99:59:59.999 --> 99:59:59.999 on ??? to diff trees efficiently? 99:59:59.999 --> 99:59:59.999 A: No, it's really naive. All it does is run normal diff, the normal diff tools, but 99:59:59.999 --> 99:59:59.999 it will try to identify files and unpack them first. So it uses the file utility, the identifier 99:59:59.999 --> 99:59:59.999 thing that says it's a PDF, and tries to unpack it first. It doesn't do any clever 99:59:59.999 --> 99:59:59.999 matching. The clever matching that it does do is fuzzy matching, so if you just 99:59:59.999 --> 99:59:59.999 rename a directory between the two inside a container, it will say, yeah, there's a 99:59:59.999 --> 99:59:59.999 massive match between these two files, and things like that. So that's kind of 99:59:59.999 --> 99:59:59.999 useful. ??? it's not that clever, which is kind of what you want, because if it's 99:59:59.999 --> 99:59:59.999 too clever it would start to be a little opaque ... 99:59:59.999 --> 99:59:59.999 I personally like dumb tools. 99:59:59.999 --> 99:59:59.999 Q: So one question to you is whether, if you want to do a release to stable or 99:59:59.999 --> 99:59:59.999 something like that, you can ask for the debdiff. I'm wondering if anyone... 99:59:59.999 --> 99:59:59.999 I mean, I remember doing that myself. I've been submitting diffoscope output 99:59:59.999 --> 99:59:59.999 as well, because it's just more readable and useful. So I'm not sure if anyone has any 99:59:59.999 --> 99:59:59.999 objection to people asking for those. 99:59:59.999 --> 99:59:59.999 I'll propose that to the release team and see what they say. 99:59:59.999 --> 99:59:59.999 Thank you very much, any further questions? 99:59:59.999 --> 99:59:59.999 [Applause]