WEBVTT 00:00:07.466 --> 00:00:10.156 I'm here today to talk to you about diffoscope 00:00:10.156 --> 00:00:13.190 and how you can use it as a better diff 00:00:14.063 --> 00:00:16.166 or for Quality Assurance, etc., things like that. 00:00:19.789 --> 00:00:20.810 Moin! 00:00:20.815 --> 00:00:24.409 Apparently that's like a North German thing to say "welcome". 00:00:25.938 --> 00:00:29.898 North German, north Denmark, Scandinavia, that kind of thing, I'm told. 00:00:31.836 --> 00:00:34.197 People are shaking their heads, so I'm going to assume that's true. 00:00:37.306 --> 00:00:40.425 This is my first PC, an IBM 5155. 00:00:41.623 --> 00:00:46.441 Sometimes, when you rebooted it, it would launch into, it would somehow revert 00:00:46.688 --> 00:00:50.971 from booting from the hard disk to booting from a BASIC ROM, 00:00:51.359 --> 00:00:52.959 as in the programming language ROM. 00:00:53.017 --> 00:00:54.320 It was on my motherboard for some reason. 00:00:54.912 --> 00:00:57.691 So, randomly, you'd just get a chance to program in BASIC and then, 00:00:57.957 --> 00:01:00.456 sometimes you wouldn't, I don't know why, but… yeah. 00:01:00.718 --> 00:01:05.173 It's quite fun with this kind of clicky keyboard, and that folded in 00:01:05.519 --> 00:01:07.058 and it was this kind of big desk thing. 00:01:07.058 --> 00:01:08.014 Anyway… 00:01:09.067 --> 00:01:10.187 This is my first Debian. 00:01:10.500 --> 00:01:11.837 At the time it was already old. 00:01:12.890 --> 00:01:15.908 What's this one? Is this Slink? 2.2? Yeah. 00:01:17.077 --> 00:01:22.043 And this is when we had US and non-US, so that's really dating if you remember that. 00:01:23.522 --> 00:01:28.393 This is my first contribution to Debian, 19th December 2006, 00:01:28.803 --> 00:01:33.738 sending a patch to LilyPond, which is kind of interesting, 00:01:34.155 --> 00:01:37.205 and the response was "Oh yeah, rock on, many thanks. I'll upload this and 00:01:37.440 --> 00:01:38.723 it'll be landing in Etch".
00:01:39.007 --> 00:01:43.408 And this was super motivating because Etch was just coming out and it was like 00:01:43.602 --> 00:01:48.732 "Great, I've got one tiny one-line patch in a release. This is super cool." 00:01:49.118 --> 00:01:52.687 Thomas' response was super motivating. 00:01:52.993 --> 00:01:56.450 So, after that, I basically spent that Christmas ??? 00:01:56.675 --> 00:01:59.754 Debian webpages and stuff. 00:02:00.327 --> 00:02:01.568 Very well timed. 00:02:02.234 --> 00:02:03.566 That's kind of a good… 00:02:04.301 --> 00:02:07.379 You know, when someone sends a patch, be like "Cool, thanks". 00:02:07.849 --> 00:02:09.434 Like a little notice in the changelog. 00:02:09.807 --> 00:02:14.344 It was, you know, so stupid but… Yeah, do that kind of thing. 00:02:15.558 --> 00:02:17.249 So, moving on. 00:02:17.641 --> 00:02:20.276 Why diffoscope? Why did we write diffoscope? 00:02:20.552 --> 00:02:21.880 What's the background here? 00:02:22.184 --> 00:02:24.575 It comes from reproducible builds. 00:02:24.911 --> 00:02:28.983 The very quick outline is that you can get the source code for free software, 00:02:29.208 --> 00:02:31.505 you download the source code for nginx or whatever, 00:02:31.998 --> 00:02:35.844 but pretty much everyone just runs binaries on their servers or their systems. 00:02:36.110 --> 00:02:39.119 You know, "apt install bla", "yum install", whatever. 00:02:40.531 --> 00:02:41.535 Android Play Store, whatever. 00:02:42.479 --> 00:02:46.176 Can you actually trust that these two things correspond with each other? 00:02:46.470 --> 00:02:49.926 You've gotten the source code, it looks all right, and then you install this binary, 00:02:50.847 --> 00:02:51.821 yeah… 00:02:52.459 --> 00:02:55.861 Who generated that? Can you trust that process? 00:02:56.275 --> 00:02:57.430 Can you trust who generated it? 00:02:58.351 --> 00:03:01.493 Even if you could trust them, could you trust them not to be exploited? Etc.
00:03:02.295 --> 00:03:04.765 This is a big problem because you can exploit a build farm and then 00:03:05.160 --> 00:03:09.895 obviously exploit all of that, you know, put a trojan into the build farm, 00:03:10.097 --> 00:03:13.290 so every single binary that comes out is compromised. 00:03:13.708 --> 00:03:14.792 Kind of problematic. 00:03:15.060 --> 00:03:17.686 You could also target individual developers' machines, 00:03:17.937 --> 00:03:21.288 so I could go off to, say, your machine, add a backdoor to it, 00:03:21.578 --> 00:03:25.241 so every binary that you give to friends and things like that 00:03:26.935 --> 00:03:30.485 is compromised in some way, stealing your bitcoins or whatever. 00:03:31.802 --> 00:03:36.127 I can also turn up at your door and blackmail you into producing 00:03:38.522 --> 00:03:42.997 software that has compromises or extra features, shall we say, 00:03:43.472 --> 00:03:44.783 that don't exist in the source code. 00:03:45.133 --> 00:03:47.885 So what will happen there is that you'd release your source 00:03:48.093 --> 00:03:51.968 and the binaries you produce have this sort of backdoor that, you know, 00:03:52.435 --> 00:03:55.127 someone is forcing you into producing. 00:03:55.464 --> 00:03:56.679 So, you don't want to do that. 00:03:56.856 --> 00:03:57.505 Anyway 00:03:58.197 --> 00:03:59.228 enough of that. 00:03:59.228 --> 00:04:03.211 What you do for reproducible builds is you ensure that every time you build 00:04:03.467 --> 00:04:05.773 a piece of software, you get an identical result. 00:04:06.916 --> 00:04:10.885 Multiple people then compare their builds and check whether they all get 00:04:07.074 --> 00:04:11.068 the same results 00:04:11.068 --> 00:04:15.626 and this means that an attacker must either have infected everyone 00:04:15.626 --> 00:04:17.726 at the same time, or they haven't infected anyone. 00:04:20.673 --> 00:04:24.058 The point here is that you have to ensure that builds have identical results.
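A minimal sketch of the "multiple people compare their builds" check described above, assuming each builder has produced one artifact file. The function names and the choice of SHA-256 are assumptions made for this illustration; they are not part of diffoscope or of any reproducible builds tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_agree(artifacts: list[Path]) -> bool:
    """True when every independently built artifact has the same digest."""
    return len({sha256_of(path) for path in artifacts}) == 1
```

A `False` result only says that the builds differ; it does not say why, which is the gap the rest of the talk addresses.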
00:04:24.173 --> 00:04:25.163 Ok, great. 00:04:28.003 --> 00:04:32.539 So, we started the reproducible builds project, etc. 00:04:33.470 --> 00:04:34.744 And we build two debs. 00:04:35.112 --> 00:04:36.537 Oh, I'm sorry about the colors there. 00:04:38.067 --> 00:04:38.965 You probably can't see that. 00:04:39.349 --> 00:04:42.485 That says "sha1sum a.deb b.deb". 00:04:46.128 --> 00:04:50.775 Anyway, we're comparing the sha1sums of two binary Debian files. 00:04:51.424 --> 00:04:53.922 So, these two files differ. 00:04:54.222 --> 00:04:55.612 Ok, they're not reproducible. 00:04:56.807 --> 00:04:57.527 Why is that? 00:04:57.873 --> 00:04:59.656 So we run a diff on them. 00:05:00.140 --> 00:05:00.637 Yeah… 00:05:01.340 --> 00:05:04.093 So, what can we learn from this? 00:05:04.418 --> 00:05:08.508 Well, not very much, visibly they're compressed so 00:05:08.947 --> 00:05:13.012 as soon as we see one change, we'll just see cascading changes 00:05:13.362 --> 00:05:14.866 because that's how compression works. 00:05:16.241 --> 00:05:23.983 I guess we know it's a deb, probably an ar format file, not very useful. 00:05:24.193 --> 00:05:26.005 Ok, great, so we're gonna have a look in. 00:05:26.492 --> 00:05:29.919 We'll do a binary diff and ok, well… 00:05:30.923 --> 00:05:32.790 Again, that's not really telling us very much 00:05:34.413 --> 00:05:36.515 with the diff there. 00:05:37.206 --> 00:05:38.426 Ok, great. 00:05:39.417 --> 00:05:40.427 ??? one level in 00:05:40.513 --> 00:05:44.834 "ar x" is on the new maintainer thing, "how you unpack a deb". 00:05:44.858 --> 00:05:46.215 Everyone remembers this, right? 00:05:48.196 --> 00:05:51.167 You unpack a.deb with "ar x" and you do that to b.deb 00:05:51.599 --> 00:05:53.606 and then we diff the results of that. 00:05:54.099 --> 00:05:57.824 Ok, so…yeah, 7zip. 00:05:58.948 --> 00:06:01.329 Ok, compressed content, not very useful. 00:06:01.897 --> 00:06:07.898 Ok, so let's unpack the control.tar inside these debs.
00:06:08.725 --> 00:06:10.145 And then we run diff on that. 00:06:12.693 --> 00:06:16.850 Still not really telling us anything useful about how to make this package reproducible. 00:06:17.487 --> 00:06:20.345 So let's unpack the tar.xz into the tar. 00:06:22.463 --> 00:06:28.348 Inside that tar, there's a file called md5sums and we start to see some differences 00:06:28.768 --> 00:06:33.370 between some files in these two debs. 00:06:33.640 --> 00:06:36.527 ??? meaningful, so now we have some idea that 00:06:36.855 --> 00:06:39.101 it has something to do with this usr/bin/pmixer binary. 00:06:39.682 --> 00:06:40.653 Ok, interesting. 00:06:41.989 --> 00:06:45.015 We'll unzip that and then we do a diff on pmixer itself. 00:06:45.914 --> 00:06:48.600 Now we're back into just binary gobbledygook mode. 00:06:49.002 --> 00:06:51.736 This isn't very helpful and this is taking quite a while 00:06:52.399 --> 00:06:54.663 and if I remember correctly, Debian has a lot of packages. 00:06:55.182 --> 00:06:56.784 So this might take a little while. 00:06:57.601 --> 00:07:00.415 So, basically, ??? mean 00:07:00.782 --> 00:07:02.008 I should build a better diff. 00:07:03.703 --> 00:07:05.194 That's not quite true, this is actually… 00:07:05.783 --> 00:07:07.472 It was Lunar that started this project 00:07:07.801 --> 00:07:10.670 and it was called debbindiff, because we wanted to diff 00:07:11.093 --> 00:07:12.264 binary Debian packages. 00:07:13.474 --> 00:07:15.040 So this is the initial commit, 2014. 00:07:16.962 --> 00:07:20.100 "The version is successfully able to report differences in two .changes files. 00:07:20.100 --> 00:07:22.343 Not with much interesting details, but it's a start." 00:07:22.762 --> 00:07:23.806 And it was a start. 00:07:27.581 --> 00:07:29.918 Fast forwarding… Oh, sorry about these colors, 00:07:30.307 --> 00:07:31.872 I don't know if we can do anything about the lights? 00:07:34.713 --> 00:07:35.363 Yeah? 00:07:37.830 --> 00:07:38.080 No?
00:07:42.124 --> 00:07:42.974 All right, whatever… 00:07:43.700 --> 00:07:46.410 Basically, we're diffoscoping on… 00:07:47.546 --> 00:07:49.595 It works kind of like diff does normally, 00:07:49.981 --> 00:07:51.995 you give it two files, it outputs a unified diff. 00:07:52.699 --> 00:07:59.427 So "diffoscope a b", one file contains the word "foo", one contains the word "bar". 00:08:01.241 --> 00:08:03.340 Nothing actually out of the ordinary. 00:08:03.974 --> 00:08:07.670 It's sort of colored by default, so that's why you can't see it, but whatever. 00:08:10.432 --> 00:08:14.667 It supports archive formats, so if you give it two tar files, 00:08:15.413 --> 00:08:22.263 if we then tar up our "a" file and our "b" file into an a.tar and b.tar 00:08:23.206 --> 00:08:25.374 and then run diffoscope on those tar files 00:08:26.197 --> 00:08:28.395 we get this kind of, like, hierarchy here. 00:08:28.742 --> 00:08:32.006 So it's saying that there are differences between these files, 00:08:32.513 --> 00:08:37.735 in the file list they have different timestamps, because I made them 00:08:38.161 --> 00:08:39.535 at different times, 00:08:39.848 --> 00:08:42.575 and here are the contents, so we got "foo" there and "bar" there. 00:08:43.296 --> 00:08:44.781 So we can see the difference between them. 00:08:45.566 --> 00:08:48.373 Well, I can, I don't know if you can, you get the slide there. 00:08:49.311 --> 00:08:53.551 If we gzip these tar files and then run diffoscope on those gzip things, 00:08:53.888 --> 00:08:59.230 it'll say "ok, what we've done is unpack it first, and here's the metadata 00:08:59.622 --> 00:09:01.653 about the gzip process", 00:09:02.107 --> 00:09:05.941 and inside that are a.tar and b.tar from the previous slides. 00:09:07.673 --> 00:09:09.085 And then the "a" file and the "b" file. 00:09:09.365 --> 00:09:15.303 So, it's really going two levels deep into this tar.gz file. 00:09:16.162 --> 00:09:17.042 That's pretty cool.
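The "two levels deep" behaviour on a tar.gz can be illustrated with a toy sketch built only on the Python standard library. This is not diffoscope's implementation: the `compare` and `tar_members` helpers are hypothetical names for this example, only gzip and tar layers are handled, and metadata such as timestamps is ignored.

```python
import difflib
import gzip
import io
import tarfile
from typing import Optional


def tar_members(data: bytes) -> Optional[dict[str, bytes]]:
    """Return {member name: contents} if data is a tar archive, else None."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data)) as tar:
            return {m.name: tar.extractfile(m).read()
                    for m in tar.getmembers() if m.isfile()}
    except tarfile.TarError:
        return None


def compare(name: str, a: bytes, b: bytes, depth: int = 0) -> list[str]:
    """Peel off gzip and tar layers recursively, then diff the leaves as text."""
    if a == b:
        return []
    pad = "  " * depth
    report = [f"{pad}{name}"]
    # gzip layer: both inputs start with the gzip magic bytes.
    if a[:2] == b[:2] == b"\x1f\x8b":
        return report + compare("(gzip contents)", gzip.decompress(a),
                                gzip.decompress(b), depth + 1)
    # tar layer: recurse into members, matched by name.
    members_a, members_b = tar_members(a), tar_members(b)
    if members_a is not None and members_b is not None:
        for member in sorted(set(members_a) | set(members_b)):
            report += compare(member, members_a.get(member, b""),
                              members_b.get(member, b""), depth + 1)
        return report
    # leaf: fall back to a plain unified diff of the text.
    lines_a = a.decode("utf-8", "replace").splitlines()
    lines_b = b.decode("utf-8", "replace").splitlines()
    report += [f"{pad}  {line}" for line in
               difflib.unified_diff(lines_a, lines_b, lineterm="", n=0)]
    return report
```

Each level of nesting adds one level of indentation to the report, which mirrors the hierarchy shown on the slides.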
00:09:17.291 --> 00:09:20.772 And it's completely recursive, I think it will actually blow out after, I think, 00:09:20.993 --> 00:09:21.697 1000 [levels]. 00:09:23.119 --> 00:09:25.233 [light is turned down for the audience to see the slides] 00:09:30.195 --> 00:09:32.065 I'll just bump back a bit, just in case. 00:09:35.203 --> 00:09:37.055 [Applause] 00:09:37.806 --> 00:09:38.662 Thank you. 00:09:39.907 --> 00:09:43.462 So that's the a and b files. 00:09:43.884 --> 00:09:48.077 We've tarred them up and so I see the hierarchy of the foo and bar file layer. 00:09:48.472 --> 00:09:52.012 I've gzipped them, so this is a gzip layer. 00:09:52.399 --> 00:09:54.661 Here's the tar layer and then there's the files themselves. 00:09:57.315 --> 00:09:59.252 This is from a real .deb from the archive. 00:10:00.637 --> 00:10:06.542 Inside this .deb, there's a data.tar.xz and in that xz file there's a data.tar 00:10:07.294 --> 00:10:11.081 and inside that tar file, there's a file called aff and inside that 00:10:11.648 --> 00:10:13.892 there's a version string that is different. 00:10:14.174 --> 00:10:17.527 And that looks like a build date so we probably know that if we went back 00:10:17.753 --> 00:10:22.748 to the source package, we could very quickly work out, 00:10:22.922 --> 00:10:26.582 with a very quick grep, where this file is being generated from, 00:10:26.582 --> 00:10:31.536 the de_DE.aff file and then ??? probably quite obvious 00:10:32.285 --> 00:10:37.311 that it's using the current build time and then we can just patch that, fix it etc. 00:10:38.362 --> 00:10:45.681 This has gone from two rather obscure binary .debs all the way to the fix 00:10:46.040 --> 00:10:51.683 probably in about 5 minutes, and you can probably send the patch in that time 00:10:52.098 --> 00:10:53.086 because it'd be quite quick.
00:10:53.860 --> 00:10:57.482 Without diffoscope here, without this sort of recursive unpacking, 00:10:58.351 --> 00:11:03.380 you'd be just completely lost, you'd be there with "ar x" all day 00:11:03.762 --> 00:11:07.109 and working out which files are different and trying to use xxd 00:11:07.859 --> 00:11:09.410 and this kind of nonsense. 00:11:10.612 --> 00:11:12.875 diffoscope's got some other things as well, 00:11:13.277 --> 00:11:17.116 if you try to do reproducible packages and things are varying just on 00:11:17.381 --> 00:11:22.408 the line ordering, we detect whether a file differs only in the line ordering. 00:11:22.660 --> 00:11:26.178 So, here's file "a", "These lines are in order". 00:11:27.155 --> 00:11:30.108 File "b" has "These order are in lines". 00:11:30.630 --> 00:11:34.864 It's very difficult to say, actually, it's like one of these tongue twisters. 00:11:35.305 --> 00:11:38.862 Run diffoscope on these two and it says it's got ordering differences only. 00:11:39.210 --> 00:11:41.295 That's interesting, so you probably need to sort, 00:11:41.592 --> 00:11:45.076 you go all the way back to the source code, work out very quickly, 00:11:45.389 --> 00:11:48.381 if you know it's just ordering differences you just kind of know 00:11:48.672 --> 00:11:52.762 what the output's gonna be, you can search for order in ??? 00:11:53.166 --> 00:11:54.648 and you get the right files, 00:11:54.928 --> 00:11:57.803 add a sort in the right place, BAM! Send the patch off, 00:11:57.889 --> 00:11:59.280 everything is great. 00:11:59.280 --> 00:12:02.720 Oh, and send it to upstream as well because you're good. 00:12:03.041 --> 00:12:04.707 It supports a lot more things. 00:12:05.509 --> 00:12:08.611 We've been showing the terminal text output here. 00:12:10.978 --> 00:12:15.950 It's got an HTML output mode, which is really useful in the hierarchical thing 00:12:16.139 --> 00:12:17.359 when it gets a bit more complicated.
00:12:19.397 --> 00:12:21.766 Instead of being laid on top of each other like a unified diff, 00:12:22.312 --> 00:12:26.811 you get the diff on the left and the right and you get sort of a nested 00:12:27.075 --> 00:12:32.372 thing inside with colors and lines and you can link this and various things in it 00:12:32.728 --> 00:12:37.547 including bits of metadata here, other bits here, what command you used. 00:12:38.951 --> 00:12:40.392 That's the HTML output. 00:12:40.659 --> 00:12:43.960 We also support a lot of file formats, it's not just text, 00:12:45.635 --> 00:12:48.958 it's about all of these, so let's quickly run through some of them. 00:12:49.298 --> 00:12:54.503 You give it two Android apk files which are kind of like zips, but magic. 00:12:55.163 --> 00:12:58.211 It'll know how to compare them. 00:12:58.570 --> 00:13:01.026 There's like a Manifest file that needs decoding. 00:13:01.617 --> 00:13:03.761 It supports Berkeley DB databases, 00:13:04.098 --> 00:13:08.247 Word documents, that's a Word document with "a" and that's a Word document with "b" 00:13:08.715 --> 00:13:10.359 and it'll correctly do that. 00:13:10.583 --> 00:13:14.311 If you run that through diff normally, that ??? be a binary mess, 00:13:14.932 --> 00:13:16.188 so completely useless. 00:13:17.503 --> 00:13:20.118 E-books, there's epub, it also supports mobi. 00:13:20.563 --> 00:13:25.958 So if you give it two epub files, it'll say "They just differ in this date". 00:13:26.463 --> 00:13:27.350 Brilliant. 00:13:28.177 --> 00:13:30.557 Normally that will be a completely useless binary diff ??? 00:13:30.794 --> 00:13:35.624 So you can be like "epub date, ok", grep the source code for that, 00:13:36.427 --> 00:13:38.350 make a patch really quickly. 00:13:39.594 --> 00:13:42.786 Mono binaries, git repositories, why not? 00:13:43.693 --> 00:13:46.222 Gnumeric spreadsheets, ISO images. 00:13:46.454 --> 00:13:47.883 Oh yeah, ISO images are really cool.
00:13:48.359 --> 00:13:55.044 So, it'll basically unpack the ISO, then inside that there might be a squashfs image 00:13:55.378 --> 00:14:01.549 then it'll completely go down into that and work out any differences 00:14:01.746 --> 00:14:06.065 between the two contents in the ISO file, including any metadata. 00:14:06.432 --> 00:14:10.607 This is on the squashfs metadata headers, I think. 00:14:11.634 --> 00:14:19.251 But say inside that ISO, there was a file that was a pdf, and inside that pdf was 00:14:19.572 --> 00:14:23.048 a ??? which varied, 00:14:23.285 --> 00:14:26.653 it will basically go all the way down and say "yeah, it's actually here, 00:14:26.909 --> 00:14:28.446 in this ??? that the data differs." 00:14:28.866 --> 00:14:32.355 And that means you can just go again all the way back to the source 00:14:32.646 --> 00:14:35.555 and say "ok, cool, we know how to fix this quite quickly". 00:14:36.076 --> 00:14:39.600 And this was really valuable in getting the recent Tails distribution reproducible 00:14:39.973 --> 00:14:43.387 so their ISOs are reproducible. 00:14:43.829 --> 00:14:46.873 If you build one and I build one, we get the exact same one 00:14:47.241 --> 00:14:51.389 and that's kind of useful for something like Tails where, 00:14:51.828 --> 00:14:54.966 of all the projects that you might want to compromise, 00:14:55.450 --> 00:14:58.792 you might want to go after that one, because of the kind of people that are using it. 00:15:01.734 --> 00:15:10.009 We support comparing images, so this is using ??? 00:15:12.043 --> 00:15:13.714 and then just running that through diff. 00:15:16.092 --> 00:15:20.272 That is a Linux penguin and that is something else, 00:15:20.627 --> 00:15:23.629 I can't remember now. Oh, FT. 00:15:24.819 --> 00:15:25.801 It supports images.
00:15:27.044 --> 00:15:33.009 It supports JSON and pretty-prints it, so if you give it two JSON files 00:15:33.485 --> 00:15:36.657 one with key/value… it'll do a nice diff of them. 00:15:38.042 --> 00:15:43.432 It will pretty-print it first, before doing the diff, so it'll actually give you 00:15:43.634 --> 00:15:46.236 something clean, otherwise, I don't know if you've ever diffed 00:15:46.978 --> 00:15:50.344 two very long JSON lines, if they differ in the middle, you just get 00:15:50.525 --> 00:15:54.737 a huge long unified diff, but here it's like "oh, just ??? things have changed". 00:15:58.875 --> 00:16:04.052 OpenDocument text formats, Ogg audio files, because why not. 00:16:05.148 --> 00:16:08.251 tcpdump capture files, that's actually quite useful. 00:16:09.019 --> 00:16:17.540 PDFs. That PDF says "Hello World" and this PDF says "Hello sick sad world", 00:16:17.995 --> 00:16:23.356 I don't know why I chose that particular text in the demo. 00:16:23.852 --> 00:16:27.058 Again, run that through a normal diff program… garbage. 00:16:28.212 --> 00:16:34.074 XML documents. Again, it'll pretty-print them so it's nice, actually nice to read. 00:16:36.117 --> 00:16:41.809 If you want to get started on diffoscope, the very easiest and quickest way to do it is 00:16:42.212 --> 00:16:47.678 fire up a web browser, try.diffoscope.org, select your files, press Compare 00:16:48.470 --> 00:16:54.883 and it'll upload them and run diffoscope with all the support for all the file formats 00:16:55.226 --> 00:16:59.096 in the cloud for you and give you a nice HTML page that you can then link to people. 00:16:59.423 --> 00:17:01.107 So that's the very quickest way to get started.
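The JSON pretty-printing step mentioned above (normalise first, then diff) can be sketched in a few lines of standard-library Python. This only illustrates the idea and is not how diffoscope itself is implemented; `json_diff` is a hypothetical helper, and sorting keys and the four-space indent are assumptions of this sketch.

```python
import difflib
import json


def json_diff(a: str, b: str) -> list[str]:
    """Pretty-print two JSON documents, then return a unified diff of the result."""
    # Re-serialising puts one value per line, so a change in the middle of a
    # long single-line document shows up as a one-line difference.
    pretty_a = json.dumps(json.loads(a), indent=4, sort_keys=True).splitlines()
    pretty_b = json.dumps(json.loads(b), indent=4, sort_keys=True).splitlines()
    return list(difflib.unified_diff(pretty_a, pretty_b,
                                     "a.json", "b.json", lineterm=""))
```

Calling `json_diff` on two single-line documents that differ in one value yields a diff touching only that value's line, instead of one huge changed line.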
00:17:02.360 --> 00:17:06.884 The next quickest way is to install trydiffoscope and then you run that 00:17:07.165 --> 00:17:09.751 on two files and it'll basically do the same thing, 00:17:10.018 --> 00:17:12.312 run it in the same cloud service as try.diffoscope.org 00:17:12.877 --> 00:17:16.672 but it'll give you the result on the command line or 00:17:16.981 --> 00:17:22.010 if you pass the webbrowser option, it will give you a URL or load your web browser, 00:17:22.228 --> 00:17:24.951 I can't remember exactly which, with the same results. 00:17:25.122 --> 00:17:29.574 This is 1kB of Python, nothing basically. 00:17:31.226 --> 00:17:33.120 That's the next easiest way. 00:17:34.262 --> 00:17:36.622 But you can then install diffoscope itself on your own machine. 00:17:37.631 --> 00:17:42.824 I recommend not installing recommends because all of those file formats 00:17:43.208 --> 00:17:46.560 might drag in extra things, like the whole of TeX, 00:17:46.820 --> 00:17:52.178 I think the whole of OpenOffice, whole of Mono, whole of Java… 00:17:57.263 --> 00:17:58.403 Android, yeah, quite big. 00:18:01.941 --> 00:18:03.489 I think there's another big one I can't think of. 00:18:04.554 --> 00:18:11.185 They're all optional, and they all say "By the way, I support TeX documents 00:18:12.046 --> 00:18:13.281 or whatever, Mono, whatever. 00:18:13.740 --> 00:18:18.954 But you need to install this package and then you get full pretty printed support". 00:18:19.846 --> 00:18:21.433 And it'll tell you that when it's missing. 00:18:21.791 --> 00:18:25.168 So, if you just start with --install-recommends disabled, 00:18:26.427 --> 00:18:29.107 run it on your file, if it says "please install this package", you can then 00:18:29.335 --> 00:18:31.239 install them as you go along, as you want, 00:18:31.722 --> 00:18:34.319 rather than installing everything. 00:18:34.630 --> 00:18:38.333 And then you just pass ???
files and then it works as before. 00:18:41.978 --> 00:18:45.869 How can you improve all your own quality assurance and Debian packaging 00:18:45.959 --> 00:18:46.713 with diffoscope? 00:18:47.582 --> 00:18:50.974 The biggest value here is not necessarily for reproducible builds. 00:18:51.771 --> 00:18:56.406 It's for basically just seeing where you do want to have a diff, or are expecting a diff, 00:18:57.078 --> 00:19:00.368 and you are expecting a particular type of diff in a particular way, 00:19:00.903 --> 00:19:02.307 you can basically see those changes. 00:19:03.539 --> 00:19:12.151 And if you build two debs normally and ... I'll try to demo in a second. 00:19:12.403 --> 00:19:16.239 You build a deb without a patch applied and then build a deb with the patch applied, 00:19:16.792 --> 00:19:19.791 you can ??? run a diff on the source package. 00:19:20.742 --> 00:19:24.455 But that's not very useful because the binaries are what is going to end up on 00:19:24.695 --> 00:19:30.698 people's machines. But if you run a diff on the binary itself, did my change actually 00:19:31.150 --> 00:19:33.205 hit the binary? I think really ... No.. 00:19:36.118 --> 00:19:39.093 I'll just run through a very live demo of course, so it's gonna fail ... 00:20:03.706 --> 00:20:07.376 Checkout some ....
We'll get this libnetx-java 00:20:11.041 --> 00:20:12.160 We just build that once 00:20:16.188 --> 00:20:19.258 Let's say we are on the security team and 00:20:19.475 --> 00:20:22.701 want to apply a patch, and we want to be really sure because we are about to push it out 00:20:22.888 --> 00:20:24.044 to all our users 00:20:25.046 --> 00:20:28.612 First we will make a changelog 00:20:38.445 --> 00:20:39.284 Closing a bug 00:20:48.105 --> 00:20:54.949 Find some Java file to change 00:20:55.688 --> 00:20:56.798 Let's pretend we have a real patch 00:21:06.374 --> 00:21:10.650 Let's replace that equals equals, say that was the fix 00:21:14.033 --> 00:21:15.512 So that's the patch from upstream 00:21:15.884 --> 00:21:16.966 Upstream blast patch 00:21:23.505 --> 00:21:26.637 When we build this, what we wanna see is just that change in the file, 00:21:27.141 --> 00:21:32.116 we don't wanna see any nonsense changes of extended dump but we also definitely want 00:21:32.293 --> 00:21:37.129 to see that change, because if our binary, for security reasons, doesn't have that change 00:21:37.129 --> 00:21:42.270 then we aren't fixing people's machines, they will issue a DSA ??? installed ??? 00:21:44.685 --> 00:21:48.766 And you should do proper testing as well at multiple levels 00:21:52.763 --> 00:21:53.799 I will build that again 00:22:23.976 --> 00:22:29.717 So we wanna diff the original one 0 5, 00:22:30.432 --> 00:22:36.212 We wanna diff that one with a fake security one 00:22:37.608 --> 00:22:43.481 You see on the progress bar 100% 1- there are differences (there should be 00:22:43.681 --> 00:22:46.304 differences) Let's see what those differences are 00:22:48.418 --> 00:22:51.828 in our web browser, it's a nice HTML output 00:23:01.180 --> 00:23:03.888 Let's have a look. Are we seeing what we wanna see? 00:23:07.147 --> 00:23:11.151 There are some changes in the data tar, we kind of expect that 00:23:14.447 --> 00:23:18.389 What's changed in our control file?
Well, the version changed, we wanted that 00:23:18.565 --> 00:23:19.656 to change. Perfect 00:23:20.535 --> 00:23:24.294 And it's changed to ??? That's what we wanna see 00:23:24.744 --> 00:23:28.370 No other changes here so there was no weird control or magic going on 00:23:32.297 --> 00:23:38.421 In our data tar the color of the timestamp changes, we will ignore those for now 00:23:40.996 --> 00:23:44.944 The changelog has changed, well I hope so because I have changed that entry 00:23:48.820 --> 00:23:51.793 Here is where we're going to start seeing... We are going to see the change in the 00:23:52.016 --> 00:23:59.455 jar file which is the Java class, Java compiled archive format 00:24:00.442 --> 00:24:05.931 We are seeing some meaningless timestamp changes but we can ignore those, 00:24:06.973 --> 00:24:08.923 let's pretend, because it's just metadata maybe 00:24:16.429 --> 00:24:24.131 Ok, part of a class, so if you can see here it's basically a decompilation of the 00:24:24.633 --> 00:24:31.500 Java file itself and it's basically saying "oh, I used to say 'if null' and now 'if not null'" 00:24:31.796 --> 00:24:35.567 So these are the actual Java bytecode instructions and what's really 00:24:35.965 --> 00:24:39.241 And what is really ??? here is that nothing else has changed 00:24:39.627 --> 00:24:44.717 We were just expecting that change between the two opcodes, of 'if null' versus 'if not null', 00:24:45.554 --> 00:24:49.557 which is good because it hasn't made any other code changes but also, crucially, we can 00:24:49.725 --> 00:24:52.076 see that it has actually made a change to the code. 00:24:55.060 --> 00:24:58.072 For example, it wasn't using some cached version or something like that 00:24:58.338 --> 00:24:59.505 This is really useful 00:25:00.326 --> 00:25:05.038 And just running a naive diff wouldn't give that of course, because it would just 00:25:05.223 --> 00:25:08.341 come out with binary garbage. And just seeing the diff had changed again 00:25:08.627 --> 00:25:12.604 ???
have told you anything, because all of the change would have changed as well 00:25:12.802 --> 00:25:15.886 So it's like, well yes, it's different 00:25:16.028 --> 00:25:19.161 The meaningful change there is what actually fixes the "flaw" 00:25:19.597 --> 00:25:21.020 ??? but we know it's there 00:25:22.945 --> 00:25:27.448 That's kind of ??? Shipping this deb out, I'd be quite 00:25:27.687 --> 00:25:30.004 confident, that this seemed like the actual bug 00:25:31.151 --> 00:25:34.721 I'd be quite confident pushing that out because it's a very minimal amount of changes, 00:25:35.218 --> 00:25:36.750 you wanna do that for security reasons 00:25:37.285 --> 00:25:40.111 So this was the live demo 00:25:43.038 --> 00:25:48.108 The other one is seeing no changes at all, so you can build once 00:25:48.108 --> 00:25:49.894 if you build reproducibly 00:25:50.491 --> 00:25:54.753 You can build once, change your compiler or change some other part of your toolchain, 00:25:55.982 --> 00:26:02.267 build it again and if you got the exact same results, well great, that's what you intended 00:26:02.534 --> 00:26:04.595 You wanna see no changes when you change some part of it 00:26:08.127 --> 00:26:11.928 And that is really useful, if there were changes diffoscope will highlight them 00:26:12.271 --> 00:26:15.993 and show exactly why they had changed, maybe some compiler optimizations, 00:26:16.393 --> 00:26:17.565 maybe some other things as well 00:26:19.056 --> 00:26:22.603 So you can use it in both ways, when you expect changes and when you don't expect 00:26:22.789 --> 00:26:26.926 changes, and if those don't match the expectations diffoscope will tell you exactly why 00:26:29.922 --> 00:26:34.355 It's all ???
when other companies are doing security releases, 00:26:35.111 --> 00:26:41.184 naming no names whatsoever, but they like to release patches as, you 00:26:41.697 --> 00:26:44.618 know, just a new firmware for your router 00:26:46.674 --> 00:26:50.629 Very large file system images, you basically have no idea what changed 00:26:51.034 --> 00:26:55.037 between these two files, again you run them through diff, completely useless 00:26:55.419 --> 00:26:59.496 You can start to unpack them with squashfs and blah blah blah 00:27:01.143 --> 00:27:05.753 But they're probably sort of concatenated cpio archives, so that's nonsense 00:27:07.223 --> 00:27:11.913 But diffoscope would just chew through those and give you actually what the differences 00:27:11.913 --> 00:27:15.197 are between these two files, and say they changed this, they've removed or 00:27:15.596 --> 00:27:19.260 added some GPL-licensed code or something kind of interesting 00:27:24.293 --> 00:27:31.212 So it's very useful for diffing those kinds of binary blobs that come from various people 00:27:33.013 --> 00:27:36.983 So the current state of diffoscope, the development is up and down 00:27:41.148 --> 00:27:51.343 It started around May 2014, something like that. A bunch of work here, that's idle I think 00:27:55.239 --> 00:27:56.841 These are just for DebConfs basically 00:28:09.157 --> 00:28:12.343 Anyway it's going up and down, it's kind of interesting 00:28:14.939 --> 00:28:19.296 ??? a lot of reproducible builds projects of course, so every time we do a build 00:28:19.621 --> 00:28:25.064 on the ???
reproducible builds testing framework, we run diffoscope 00:28:25.303 --> 00:28:29.834 on the result, if it's reproducible it just says, hey, the file is the same 00:28:31.208 --> 00:28:36.767 But if not, we publish the diffoscopes of all your packages that are unreproducible 00:28:37.092 --> 00:28:40.870 so you can just go there and be like, what's the difference between these two things 00:28:53.762 --> 00:29:02.115 I invested a lot of work optimizing diffoscope, ??? rather perverse n-squared 00:29:02.465 --> 00:29:07.556 loops inside it. So I managed to cut down some of the time here, cut down here 00:29:11.063 --> 00:29:14.214 There've been quite a few performance enhancements over the past ... 00:29:07.993 --> 00:29:14.311 these are the git tags, this is version 80 and this is version 50, I just ran the same 00:29:14.311 --> 00:29:15.662 benchmark across them all 99:59:59.999 --> 99:59:59.999 So this shows when I have introduced some rather stupid code, embarrassing, but whatever 99:59:59.999 --> 99:59:59.999 ??? 99:59:59.999 --> 99:59:59.999 There's work being done right now on parallel processing, there've been 99:59:59.999 --> 99:59:59.999 quite a few attempts before, but adding it is kind of interesting and difficult 99:59:59.999 --> 99:59:59.999 Luckily we have a ??? student Liliana, is she in the room? Is she hiding?
99:59:59.999 --> 99:59:59.999 She's here, and she's going to be talking tomorrow about her work on parallel processing in 99:59:59.999 --> 99:59:59.999 diffoscope, and that will be amazing because a lot of it is IO-bound or waiting for external 99:59:59.999 --> 99:59:59.999 processes. With multiple-CPU machines, you might as well just play well: 99:59:59.999 --> 99:59:59.999 while I stand waiting for the result of a PDF being unpacked, I may as well 99:59:59.999 --> 99:59:59.999 be running on another CPU. I think we are going to see some real performance wins 99:59:59.999 --> 99:59:59.999 as we get that parallel processing merged and working and ??? 99:59:59.999 --> 99:59:59.999 You can check out our website, diffoscope.org, recently migrated to Salsa... yeah! 99:59:59.999 --> 99:59:59.999 And everything ??? reproducible is now on Salsa, it's kind of cool. 99:59:59.999 --> 99:59:59.999 That's quite recent... 99:59:59.999 --> 99:59:59.999 Thank you very much, Danke schön. 99:59:59.999 --> 99:59:59.999 You got any questions? About diffoscope? 99:59:59.999 --> 99:59:59.999 Thank you very much! 99:59:59.999 --> 99:59:59.999 Q: A buzzword question: can you diff container image formats? 99:59:59.999 --> 99:59:59.999 A: Depends which ones. If they are just a directory, then yes, because it's just a directory. 99:59:59.999 --> 99:59:59.999 Do you have one in particular in mind? Like Docker? 99:59:59.999 --> 99:59:59.999 Yes, there's Docker and then there's OCI, which I believe is the standard one. 99:59:59.999 --> 99:59:59.999 And that would make it buzzword compliant. 99:59:59.999 --> 99:59:59.999 Ah OK, we're all about buzzwords. 99:59:59.999 --> 99:59:59.999 Probably diffoscope blockchain as well. 99:59:59.999 --> 99:59:59.999 And then run diffoscope on containers and see the difference between updates of your 99:59:59.999 --> 99:59:59.999 container images. 99:59:59.999 --> 99:59:59.999 BAM... solved. Where do I invest? 99:59:59.999 --> 99:59:59.999 I wasn't aware that OCI ...
is that what it's called? No, it doesn't support that right now. 99:59:59.999 --> 99:59:59.999 But it wouldn't be too difficult, presuming there are tools to unpack it, and as soon as we have 99:59:59.999 --> 99:59:59.999 a tool to unpack it, it can then just go to that. There is a wishlist bug 99:59:59.999 --> 99:59:59.999 for Docker containers, to the point where I think it would be really nice if you 99:59:59.999 --> 99:59:59.999 could just give it, say, two image names, or whatever the noun is. 99:59:59.999 --> 99:59:59.999 So you can say "please diff these two docker images that are available" and 99:59:59.999 --> 99:59:59.999 it can look at your local thing and do a diff on them. Currently it's not 99:59:59.999 --> 99:59:59.999 supported, but there is an open wishlist bug. 99:59:59.999 --> 99:59:59.999 Q: Shouldn't any company that releases binaries be interested in supporting 99:59:59.999 --> 99:59:59.999 diffoscope and using it? 99:59:59.999 --> 99:59:59.999 A1: Basically, when companies release binaries they are not interested in users seeing differences... 99:59:59.999 --> 99:59:59.999 A2: Yes, I'm actually surprised that the Docker bug was only opened two months ago 99:59:59.999 --> 99:59:59.999 and there hasn't been more interest in diffing container images, but if you'd like to open 99:59:59.999 --> 99:59:59.999 one for OCI, that would be much appreciated, and we can get on to that, that would be 99:59:59.999 --> 99:59:59.999 great. 99:59:59.999 --> 99:59:59.999 I was looking at the page for OCI; it says it's based on Docker, basically, so 99:59:59.999 --> 99:59:59.999 once you sort it out for Docker, you'd get OCI for free, if you're lucky. 99:59:59.999 --> 99:59:59.999 The OCI image format, they based it on Docker images. 99:59:59.999 --> 99:59:59.999 OK, we will sort that out, and it seems like we're using Docker more and more 99:59:59.999 --> 99:59:59.999 in Debian. 99:59:59.999 --> 99:59:59.999 Any other questions?
99:59:59.999 --> 99:59:59.999 Q: Out of curiosity, which ??? are you using inside? Are you using some bioinformatics 99:59:59.999 --> 99:59:59.999 on ??? to diff trees efficiently? 99:59:59.999 --> 99:59:59.999 A: No, it's really naive. All it does is run normal diff, the normal diff tools, but 99:59:59.999 --> 99:59:59.999 it will try to identify files and unpack them first. So it uses the file utility, the identifier 99:59:59.999 --> 99:59:59.999 thing that says it's a PDF, and tries to unpack it first. It doesn't do any clever 99:59:59.999 --> 99:59:59.999 matching. The clever matching that it does do is fuzzy matching, so if you just 99:59:59.999 --> 99:59:59.999 rename a directory between the two inside a container, it will say "yeah, there's a 99:59:59.999 --> 99:59:59.999 massive match between these two files", and things like that. So that's kind of 99:59:59.999 --> 99:59:59.999 useful. ??? it's not that clever, which is kind of what you want, because if it's 99:59:59.999 --> 99:59:59.999 too clever it would start to be a little opaque ... 99:59:59.999 --> 99:59:59.999 I personally like dumb tools. 99:59:59.999 --> 99:59:59.999 Q: So one question to you is whether, if you want to do a release to stable or 99:59:59.999 --> 99:59:59.999 something like that, you can ask for the debdiff. I'm wondering if anyone... 99:59:59.999 --> 99:59:59.999 I mean, I remember doing that myself, I've been submitting diffoscope output 99:59:59.999 --> 99:59:59.999 as well, because it's just more readable and useful. So I'm not sure if anyone has any 99:59:59.999 --> 99:59:59.999 objection to people asking for those. 99:59:59.999 --> 99:59:59.999 I'll propose that to the release team and see what they say. 99:59:59.999 --> 99:59:59.999 Thank you very much, any further questions? 99:59:59.999 --> 99:59:59.999 [Applause]