0:00:07.466,0:00:10.156 I'm here today to talk to you about[br]diffoscope 0:00:10.156,0:00:13.190 and how you can use it as a better diff 0:00:14.063,0:00:16.166 or for quality assurance, etc., things[br]like that. 0:00:19.789,0:00:20.810 Moin! 0:00:20.815,0:00:24.409 Apparently that's like a North German[br]thing to say "welcome". 0:00:25.938,0:00:29.898 North Germany, north Denmark, Scandinavia,[br]that kind of thing, I'm told. 0:00:31.836,0:00:34.197 People are shaking their heads, so I'm[br]going to assume that's true. 0:00:37.306,0:00:40.425 This is my first PC, an IBM 5155. 0:00:41.623,0:00:46.441 Sometimes, when you rebooted it, it would[br]launch into… it would somehow revert 0:00:46.688,0:00:50.971 from booting from the hard disk to booting[br]from a BASIC ROM, 0:00:51.359,0:00:52.959 as in the programming language, in ROM. 0:00:53.017,0:00:54.320 It was on my motherboard for some reason. 0:00:54.912,0:00:57.691 So, randomly, you'd just get a chance to[br]program in BASIC and then, 0:00:57.957,0:01:00.456 sometimes you wouldn't, I don't know why,[br]but… yeah. 0:01:00.718,0:01:05.173 It was quite fun, with this kind of clicky[br]keyboard that folded in, 0:01:05.519,0:01:07.058 and it was this kind of big desk thing. 0:01:07.058,0:01:08.014 Anyway… 0:01:09.067,0:01:10.187 This is my first Debian. 0:01:10.500,0:01:11.837 At the time it was already old. 0:01:12.890,0:01:15.908 What's this one? Is this Slink? 2.2?[br]Yeah. 0:01:17.077,0:01:22.043 And this is when we had US and non-US,[br]so that's really dating you if you remember that. 0:01:23.522,0:01:28.393 This is my first contribution to Debian,[br]19th December 2006, 0:01:28.803,0:01:33.738 sending a patch to LilyPond, which is kind[br]of interesting, 0:01:34.155,0:01:37.205 and the response was "Oh yeah, rock on,[br]many thanks. I'll upload this and 0:01:37.440,0:01:38.723 it'll be landing in Etch".
0:01:39.007,0:01:43.408 And this was super motivating because[br]Etch was just coming out and it was like 0:01:43.602,0:01:48.732 "Great, I've got one tiny one-line patch[br]in a release. This is super cool." 0:01:49.118,0:01:52.687 Thomas' response was super motivating. 0:01:52.993,0:01:56.450 So, after that, like, that Christmas I[br]basically spent ??? 0:01:56.675,0:01:59.754 Debian webpages and stuff. 0:02:00.327,0:02:01.568 Very well timed. 0:02:02.234,0:02:03.566 That's kind of a good… 0:02:04.301,0:02:07.379 You know, someone sends a patch, be like[br]"Cool, thanks". 0:02:07.849,0:02:09.434 Like a little notice in the changelog. 0:02:09.807,0:02:14.344 It was, you know, so stupid but…[br]yeah, do that kind of thing. 0:02:15.558,0:02:17.249 So, moving on. 0:02:17.641,0:02:20.276 Why diffoscope?[br]Why did we write diffoscope? 0:02:20.552,0:02:21.880 What's the background here? 0:02:22.184,0:02:24.575 It comes from reproducible builds. 0:02:24.911,0:02:28.983 The very quick outline is that even though you[br]can get the source code for free software, 0:02:29.208,0:02:31.505 you can download the source code for nginx[br]or whatever, 0:02:31.998,0:02:35.844 pretty much everyone just runs binaries[br]on their servers or their systems. 0:02:36.110,0:02:39.119 You know, "apt install bla", "yum install",[br]whatever. 0:02:40.531,0:02:41.535 Android Play Store, whatever. 0:02:42.479,0:02:46.176 Can you actually trust that these two[br]things correspond with each other? 0:02:46.470,0:02:49.926 You've got the source code, it looks[br]all right, and then you install this binary, 0:02:50.847,0:02:51.821 yeah… 0:02:52.459,0:02:55.861 Who generated that? Can you trust that[br]process? 0:02:56.275,0:02:57.430 Can you trust who generated it? 0:02:58.351,0:03:01.493 Even if you could trust them, could you[br]trust them not to be exploited? Etc.
0:03:02.295,0:03:04.765 This is a big problem because you can[br]exploit a build farm and then 0:03:05.160,0:03:09.895 obviously exploit all of that, you know,[br]put a trojan into the build farm, 0:03:10.097,0:03:13.290 so every single binary that comes out[br]is compromised. 0:03:13.708,0:03:14.792 Kind of problematic. 0:03:15.060,0:03:17.686 You could also target individual developers'[br]machines, 0:03:17.937,0:03:21.288 so I could go off to, say, your machine,[br]add a backdoor to it, 0:03:21.578,0:03:25.241 so every binary that you give to friends[br]and things like that 0:03:26.935,0:03:30.485 is compromised in some way, stealing[br]your bitcoins or whatever. 0:03:31.802,0:03:36.127 I can also turn up at your door[br]and blackmail you into producing 0:03:38.522,0:03:42.997 software that has compromises or extra[br]features, shall we say, 0:03:43.472,0:03:44.783 that don't exist in the source code. 0:03:45.133,0:03:47.885 So what will happen there is that you'd[br]release your source 0:03:48.093,0:03:51.968 and the binaries you produce have[br]this sort of backdoor that, you know, 0:03:52.435,0:03:55.127 someone is forcing you into producing. 0:03:55.464,0:03:56.679 So, you don't want to do that. 0:03:56.856,0:03:57.505 Anyway, 0:03:58.197,0:03:59.228 enough of that. 0:03:59.228,0:04:03.211 What you do for reproducible builds is you[br]ensure that every time you build 0:04:03.467,0:04:05.773 a piece of software, you get an identical[br]result. 0:04:06.916,0:04:10.885 Multiple people then compare their builds[br]and check whether they all get 0:04:07.074,0:04:11.068 the same results, 0:04:11.068,0:04:15.626 and this means that an attacker must[br]either have infected everyone 0:04:15.626,0:04:17.726 at the same time, or they haven't[br]infected anyone. 0:04:20.673,0:04:24.058 The point here is that you have to ensure[br]that builds have identical results. 0:04:24.173,0:04:25.163 Ok, great.
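The "multiple people compare their builds" check above boils down to comparing cryptographic digests of independently built artifacts. The Python sketch below is only an illustration, not the Reproducible Builds project's actual tooling; the helper names and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_agree(paths: list[Path]) -> bool:
    """True only if every independently built artifact has the same digest.

    Hypothetical helper: a single differing digest means at least one build
    environment produced something else, and the artifacts need inspecting.
    """
    if not paths:
        raise ValueError("need at least one artifact to compare")
    return len({sha256_of(p) for p in paths}) == 1
```

A rebuilder might call `builds_agree([Path("mine/foo.deb"), Path("yours/foo.deb")])`; those paths are placeholders. A mismatch only says *that* the builds differ, which is exactly the point where diffoscope comes in.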
0:04:28.003,0:04:32.539 So, we started the Reproducible Builds[br]project, etc. 0:04:33.470,0:04:34.744 And we build two debs. 0:04:35.112,0:04:36.537 Oh, I'm sorry about the colors there. 0:04:38.067,0:04:38.965 You probably can't see that. 0:04:39.349,0:04:42.485 That says "sha1sum a.deb b.deb". 0:04:46.128,0:04:50.775 Anyway, we're comparing the SHA-1 sums[br]of two binary Debian files. 0:04:51.424,0:04:53.922 So, these two files differ. 0:04:54.222,0:04:55.612 Ok, they're not reproducible. 0:04:56.807,0:04:57.527 Why is that? 0:04:57.873,0:04:59.656 So we run a diff on them. 0:05:00.140,0:05:00.637 Yeah… 0:05:01.340,0:05:04.093 So, what can we learn from this? 0:05:04.418,0:05:08.508 Well, not very much. Evidently they're[br]compressed, so 0:05:08.947,0:05:13.012 as soon as there's one change, we'll see[br]it just cascade into more changes, 0:05:13.362,0:05:14.866 because that's how compression works. 0:05:16.241,0:05:23.983 I guess we know it's a deb, probably an ar[br]format file; not very useful. 0:05:24.193,0:05:26.005 Ok, great, so we're gonna have a look inside. 0:05:26.492,0:05:29.919 We'll do a binary diff and ok, well… 0:05:30.923,0:05:32.790 Again, that's not really telling us[br]very much 0:05:34.413,0:05:36.515 with the diff there. 0:05:37.206,0:05:38.426 Ok, great. 0:05:39.417,0:05:40.427 ??? one level in. 0:05:40.513,0:05:44.834 "ar x" is in the new maintainer thing,[br]"how you unpack a deb". 0:05:44.858,0:05:46.215 Everyone remembers this, right? 0:05:48.196,0:05:51.167 You unpack a.deb with "ar x" and you[br]do that to b.deb, 0:05:51.599,0:05:53.606 and then we diff the results of that. 0:05:54.099,0:05:57.824 Ok, so… yeah, 7zip. 0:05:58.948,0:06:01.329 Ok, compressed content, not very useful. 0:06:01.897,0:06:07.898 Ok, so let's unpack the control.tar inside[br]these debs. 0:06:08.725,0:06:10.145 And then we run diff on that.
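The `ar x` step can be sketched in pure Python to show why it only narrows things down to *which* member differs. A .deb is an ar archive whose members are normally `debian-binary`, `control.tar.*` and `data.tar.*`. The parser below is a hypothetical helper, not dpkg's or diffoscope's code, and it handles only the plain ar layout (60-byte member headers, no GNU long-name table); `make_ar` exists only so the sketch can be exercised without a real package.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # name 16, mtime 12, uid 6, gid 6, mode 8, size 10, magic 2


def ar_members(data: bytes) -> dict[str, bytes]:
    """Return {member name: member bytes} for a plain ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members: dict[str, bytes] = {}
    offset = len(AR_MAGIC)
    while offset + HEADER_LEN <= len(data):
        header = data[offset:offset + HEADER_LEN]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        name = header[0:16].decode("ascii").strip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        start = offset + HEADER_LEN
        members[name] = data[start:start + size]
        offset = start + size + (size % 2)  # member data is padded to even length
    return members


def differing_members(a: bytes, b: bytes) -> list[str]:
    """Names of members that differ or exist in only one archive."""
    ma, mb = ar_members(a), ar_members(b)
    return [n for n in sorted(ma.keys() | mb.keys()) if ma.get(n) != mb.get(n)]


def make_ar(members: dict[str, bytes]) -> bytes:
    """Build a minimal ar archive (for demonstration and testing only)."""
    out = bytearray(AR_MAGIC)
    for name, payload in members.items():
        header = (f"{name:<16}{0:<12}{0:<6}{0:<6}{100644:<8}{len(payload):<10}"
                  .encode("ascii") + b"`\n")
        out += header + payload
        if len(payload) % 2:
            out += b"\n"
    return bytes(out)
```

As in the talk, learning that `data.tar.xz` differs is progress, but the member is still compressed, so a naive diff of it is as unhelpful as before.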
0:06:12.693,0:06:16.850 Still not really telling us anything useful[br]about how to make this package reproducible. 0:06:17.487,0:06:20.345 So let's unpack the tar.xz into the tar. 0:06:22.463,0:06:28.348 Inside that tar, there's a file called[br]md5sums and we start to see some differences 0:06:28.768,0:06:33.370 between some files in these two debs. 0:06:33.640,0:06:36.527 ??? meaningful, so now[br]we have some idea that 0:06:36.855,0:06:39.101 it has something to do with this[br]usr/bin/pmixer binary. 0:06:39.682,0:06:40.653 Ok, interesting. 0:06:41.989,0:06:45.015 We'll unzip that and then we do a diff on[br]pmixer itself. 0:06:45.914,0:06:48.600 Now we're back into just binary[br]"gobbledygook" mode. 0:06:49.002,0:06:51.736 This isn't very helpful and this is taking[br]quite a while, 0:06:52.399,0:06:54.663 and if I remember correctly, Debian has[br]a lot of packages. 0:06:55.182,0:06:56.784 So this might take a little while. 0:06:57.601,0:07:00.415 So, basically, ??? mean 0:07:00.782,0:07:02.008 I should build a better diff. 0:07:03.703,0:07:05.194 That's not quite true, this is actually… 0:07:05.783,0:07:07.472 It was Lunar that started this project 0:07:07.801,0:07:10.670 and it was called debbindiff, because[br]we wanted to diff 0:07:11.093,0:07:12.264 binary Debian packages. 0:07:13.474,0:07:15.040 So this is the initial commit, 2014. 0:07:16.962,0:07:20.100 "The version is successfully able to report[br]differences in two .changes files. 0:07:20.100,0:07:22.343 Not with much interesting details,[br]but it's a start." 0:07:22.762,0:07:23.806 And it was a start. 0:07:27.581,0:07:29.918 Fast forwarding… Oh, sorry about these[br]colors, 0:07:30.307,0:07:31.872 I don't know if we can do anything about[br]the lights? 0:07:34.713,0:07:35.363 Yeah? 0:07:37.830,0:07:38.080 No?
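The md5sums file found inside control.tar lists one checksum and one path per line (`<md5>  <path>`), so comparing the two files entry by entry is what points at usr/bin/pmixer. A minimal sketch with illustrative function names; any digests you feed it in a demo are made up.

```python
def parse_md5sums(text: str) -> dict[str, str]:
    """Map each path in a Debian md5sums file to its checksum.

    Each non-empty line is assumed to look like "<md5 hex>  <path>".
    """
    sums: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, path = line.split(None, 1)  # path keeps any internal spaces
        sums[path] = digest
    return sums


def changed_paths(a_text: str, b_text: str) -> list[str]:
    """Paths whose checksum differs, or that appear in only one package."""
    a, b = parse_md5sums(a_text), parse_md5sums(b_text)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

This tells you which file to look at next, but, as the talk says, diffing the pmixer binary itself is the point where plain tools stop being useful.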
0:07:42.124,0:07:42.974 All right, whatever… 0:07:43.700,0:07:46.410 Basically, we're diffoscoping on… 0:07:47.546,0:07:49.595 It works kind of like diff does normally: 0:07:49.981,0:07:51.995 you give it two files, it outputs[br]a unified diff. 0:07:52.699,0:07:59.427 So "diffoscope a b": one file contains[br]the word "foo", one contains the word "bar". 0:08:01.241,0:08:03.340 Nothing actually out of the ordinary. 0:08:03.974,0:08:07.670 It's sort of colored by default, so that's[br]why you can't see it, but whatever. 0:08:10.432,0:08:14.667 It supports archive formats, so if you[br]give it two tar files, 0:08:15.413,0:08:22.263 if we then tar up our "a" file and[br]our "b" file into a.tar and b.tar 0:08:23.206,0:08:25.374 and then run diffoscope on those tar files, 0:08:26.197,0:08:28.395 we get this kind of, like, hierarchy here. 0:08:28.742,0:08:32.006 So it's saying that there are differences[br]between these files: 0:08:32.513,0:08:37.735 in the file list they have different[br]timestamps, because I made them 0:08:38.161,0:08:39.535 at different times, 0:08:39.848,0:08:42.575 and here are the contents, so we've got[br]"foo" there and "bar" there. 0:08:43.296,0:08:44.781 So we can see the difference between them. 0:08:45.566,0:08:48.373 Well, I can, I don't know if you can;[br]you get the slide there. 0:08:49.311,0:08:53.551 If we gzip these tar files and then run[br]diffoscope on those gzip files, 0:08:53.888,0:08:59.230 it'll say "ok, what we've done is unpack it[br]first, and here's the metadata 0:08:59.622,0:09:01.653 about the gzip process", 0:09:02.107,0:09:05.941 and inside that are a.tar and b.tar[br]from the previous slides. 0:09:07.673,0:09:09.085 And then the "a" file and the "b" file. 0:09:09.365,0:09:15.303 So, it's really going two levels deep[br]into this tar.gz file. 0:09:16.162,0:09:17.042 That's pretty cool.
0:09:17.291,0:09:20.772 And it's completely recursive; I think[br]it will actually blow out after, I think, 0:09:20.993,0:09:21.697 1000 [levels]. 0:09:23.119,0:09:25.233 [light is turned down for the audience[br]to see the slides] 0:09:30.195,0:09:32.065 I'll just bump back a bit, just in case. 0:09:35.203,0:09:37.055 [Applause] 0:09:37.806,0:09:38.662 Thank you. 0:09:39.907,0:09:43.462 So that's the a and b files. 0:09:43.884,0:09:48.077 We've tarred them up, so we see[br]the hierarchy: the foo and bar file layer. 0:09:48.472,0:09:52.012 I've gzipped them, so this is a gzip layer. 0:09:52.399,0:09:54.661 Here's the tar layer and then there are[br]the files themselves. 0:09:57.315,0:09:59.252 This is from a real .deb from the archive. 0:10:00.637,0:10:06.542 Inside this .deb, there's a data.tar.xz,[br]and in that xz file there's a data.tar, 0:10:07.294,0:10:11.081 and inside that tar file, there's an .aff[br]file, and inside that 0:10:11.648,0:10:13.892 there's a version string that is different. 0:10:14.174,0:10:17.527 And that looks like a build date, so we[br]probably know that if we went back 0:10:17.753,0:10:22.748 to the source package, we could very[br]quickly work out, 0:10:22.922,0:10:26.582 with a very quick grep, where[br]this file is being generated from, 0:10:26.582,0:10:31.536 the de_DE.aff file, and then ???[br]probably quite obvious 0:10:32.285,0:10:37.311 that it's using the current build time,[br]and then we can just patch that, fix it, etc. 0:10:38.362,0:10:45.681 This has gone from two rather obscure[br]binary .debs all the way to the fix 0:10:46.040,0:10:51.683 in probably about 5 minutes, and you can[br]probably send the patch in that time 0:10:52.098,0:10:53.086 because it'd be quite quick.
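The recursive unpacking shown on the slides can be approximated in a few dozen lines: compare the bytes, and when both sides are gzip or tar containers, unpack and recurse before falling back to a text diff. This is a deliberately tiny, hypothetical "mini-diffoscope"; the function names, the depth limit and tar detection via the `ustar` magic are assumptions, and unlike the real tool it ignores metadata such as member timestamps and supports only two container formats.

```python
import difflib
import gzip
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"


def _is_tar(data: bytes) -> bool:
    # POSIX and GNU tar headers carry the "ustar" magic at byte offset 257.
    return len(data) > 262 and data[257:262] == b"ustar"


def _tar_members(data: bytes) -> dict[str, bytes]:
    """Read the regular files of an uncompressed tar archive into memory."""
    members: dict[str, bytes] = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
        for info in tar.getmembers():
            if info.isfile():
                members[info.name] = tar.extractfile(info).read()
    return members


def compare(path: str, a: bytes, b: bytes, depth: int = 0, max_depth: int = 50) -> list[str]:
    """Return difference lines, unpacking nested gzip/tar layers before diffing."""
    if a == b:
        return []
    if depth >= max_depth:
        return [f"{path}: differs (maximum nesting depth reached)"]
    if a.startswith(GZIP_MAGIC) and b.startswith(GZIP_MAGIC):
        # gzip headers embed a timestamp, so compare the decompressed payloads.
        return compare(f"{path}|gunzip", gzip.decompress(a), gzip.decompress(b),
                       depth + 1, max_depth)
    if _is_tar(a) and _is_tar(b):
        ma, mb = _tar_members(a), _tar_members(b)
        lines: list[str] = []
        for name in sorted(ma.keys() | mb.keys()):
            # A member missing on one side is compared against empty content.
            lines += compare(f"{path}/{name}", ma.get(name, b""), mb.get(name, b""),
                             depth + 1, max_depth)
        return lines
    # Leaf: fall back to an ordinary unified diff of the decoded text.
    return list(difflib.unified_diff(
        a.decode("utf-8", "replace").splitlines(),
        b.decode("utf-8", "replace").splitlines(),
        fromfile=f"a/{path}", tofile=f"b/{path}", lineterm=""))
```

The path string records each layer (`x.tar.gz|gunzip/file`), which mirrors the hierarchy the HTML and text output show: you see where in the nesting the difference lives, not just that the outer bytes differ.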
0:10:53.860,0:10:57.482 Without diffoscope here, without this sort[br]of recursive unpacking, 0:10:58.351,0:11:03.380 you'd be just completely lost; you'd be[br]there with "ar x" all day, 0:11:03.762,0:11:07.109 working out which files are different[br]and trying to use xxd 0:11:07.859,0:11:09.410 and this kind of nonsense. 0:11:10.612,0:11:12.875 diffoscope's got some other things as well: 0:11:13.277,0:11:17.116 if you try to make packages reproducible[br]and things are varying just in 0:11:17.381,0:11:22.408 the line ordering, we detect whether[br]a file differs only in the line ordering. 0:11:22.660,0:11:26.178 So, here's file "a": "These lines are in[br]order". 0:11:27.155,0:11:30.108 File "b" has "These order are in lines". 0:11:30.630,0:11:34.864 It's very difficult to say, actually,[br]it's like one of these tongue twisters. 0:11:35.305,0:11:38.862 Run diffoscope on these two and it says[br]it's got ordering differences only. 0:11:39.210,0:11:41.295 That's interesting, so you probably need[br]to sort; 0:11:41.592,0:11:45.076 you go all the way back to the source code[br]and work it out very quickly. 0:11:45.389,0:11:48.381 If you know it's just ordering differences,[br]you just kind of know 0:11:48.672,0:11:52.762 what the output's gonna be, you can[br]search for order in ??? 0:11:53.166,0:11:54.648 and you get the right files, 0:11:54.928,0:11:57.803 add a sort in the right[br]place, BAM! Send the patch off, 0:11:57.889,0:11:59.280 everything is great. 0:11:59.280,0:12:02.720 Oh, and send it upstream as well,[br]because you're good. 0:12:03.041,0:12:04.707 It supports a lot more things. 0:12:05.509,0:12:08.611 We've been showing the terminal[br]text output here. 0:12:10.978,0:12:15.950 It's got an HTML output mode, which is[br]really useful for the hierarchical thing 0:12:16.139,0:12:17.359 when it gets a bit more complicated.
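One way to implement an "ordering differences only" check is to treat each file as a multiset of lines: if the files differ but contain exactly the same lines the same number of times, only the order changed. This is a hedged sketch of that idea, not necessarily how diffoscope implements it, and it assumes the slide's example files have one word per line.

```python
from collections import Counter


def classify(a: str, b: str) -> str:
    """Classify how two text files differ: identical, reordered, or changed."""
    if a == b:
        return "identical"
    # Same lines with the same multiplicities means only their order moved.
    if Counter(a.splitlines()) == Counter(b.splitlines()):
        return "ordering differences only"
    return "content differs"
```

Counting lines rather than comparing `sorted()` lists gives the same answer; either way, a result of "ordering differences only" is a strong hint that a missing `sort` somewhere in the build is the fix.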
0:12:19.397,0:12:21.766 Instead of being laid on top of each other[br]like a unified diff, 0:12:22.312,0:12:26.811 you get the diff on the left and the right,[br]and you get a sort of nested 0:12:27.075,0:12:32.372 thing inside with colors and lines, and[br]you can link to this and various things in it, 0:12:32.728,0:12:37.547 including bits of metadata here, other[br]bits here, what command you used. 0:12:38.951,0:12:40.392 That's the HTML output. 0:12:40.659,0:12:43.960 We also support a lot of file formats;[br]it's not just text, 0:12:45.635,0:12:48.958 it's all of these, so let's quickly[br]run through some of them. 0:12:49.298,0:12:54.503 You give it two Android APK files, which[br]are kind of like zips, but magic. 0:12:55.163,0:12:58.211 It'll know how to compare them. 0:12:58.570,0:13:01.026 There's like a Manifest file that needs[br]decoding. 0:13:01.617,0:13:03.761 It supports Berkeley DB databases, 0:13:04.098,0:13:08.247 Word documents: that's a Word document[br]with "a" and that's a Word document with "b", 0:13:08.715,0:13:10.359 and it'll correctly do that. 0:13:10.583,0:13:14.311 If you run that through diff normally,[br]that ??? be a binary mess, 0:13:14.932,0:13:16.188 so completely useless. 0:13:17.503,0:13:20.118 E-books: there's EPUB, it also supports[br]MOBI. 0:13:20.563,0:13:25.958 So if you give it two EPUB files, it'll say[br]"They just differ in this date". 0:13:26.463,0:13:27.350 Brilliant. 0:13:28.177,0:13:30.557 Normally that would be a completely useless[br]diff binary ??? 0:13:30.794,0:13:35.624 So you can be like "EPUB date, ok", grep[br]the source code for that, 0:13:36.427,0:13:38.350 make a patch really quickly. 0:13:39.594,0:13:42.786 Mono binaries, Git repositories, why not? 0:13:43.693,0:13:46.222 Gnumeric spreadsheets, ISO images. 0:13:46.454,0:13:47.883 Oh yeah, ISO images are really cool.
0:13:48.359,0:13:55.044 So, it'll basically unpack the ISO; then[br]inside that there might be a squashfs image, 0:13:55.378,0:14:01.549 then it'll go completely down into that and[br]work out any differences 0:14:01.746,0:14:06.065 between the two contents in the ISO file,[br]including any metadata. 0:14:06.432,0:14:10.607 This is on the squashfs metadata headers,[br]I think. 0:14:11.634,0:14:19.251 But say inside that ISO there was a file[br]that was a PDF, and inside that PDF was 0:14:19.572,0:14:23.048 a ??? which varied; 0:14:23.285,0:14:26.653 it will basically go all the way down[br]and say "yeah, it's actually here, 0:14:26.909,0:14:28.446 in this ??? that the data differs." 0:14:28.866,0:14:32.355 And that means you can just go again[br]all the way back to the source 0:14:32.646,0:14:35.555 and say "ok, cool, we know how to fix[br]this quite quickly". 0:14:36.076,0:14:39.600 And this was really valuable in making[br]the recent Tails distribution reproducible, 0:14:39.973,0:14:43.387 so their ISOs are reproducible. 0:14:43.829,0:14:46.873 If you build one and I build one, we get[br]the exact same one, 0:14:47.241,0:14:51.389 and that's kind of useful for something[br]like Tails, because 0:14:51.828,0:14:54.966 of all the projects that you[br]might want to compromise, 0:14:55.450,0:14:58.792 you might want to go after that one,[br]because of the kind of people that are using it. 0:15:01.734,0:15:10.009 We support comparing images, so this is[br]using ??? 0:15:12.043,0:15:13.714 and then just running that through diff. 0:15:16.092,0:15:20.272 That is a Linux penguin and that is[br]something else, 0:15:20.627,0:15:23.629 I can't remember now. Oh, FT. 0:15:24.819,0:15:25.801 It supports images. 0:15:27.044,0:15:33.009 It supports JSON and pretty-printing,[br]so if you give it two JSON files, 0:15:33.485,0:15:36.657 one with key/value… it'll do a nice[br]diff of them.
0:15:38.042,0:15:43.432 It will pretty-print them first, before[br]doing the diff, so it'll actually give you 0:15:43.634,0:15:46.236 something clean. Otherwise, I don't know[br]if you've ever diffed 0:15:46.978,0:15:50.344 two very long JSON lines: if they differ[br]in the middle, you just get 0:15:50.525,0:15:54.737 a huge long unified diff, but here it's[br]like "oh, just ??? things have changed". 0:15:58.875,0:16:04.052 OpenDocument text formats,[br]Ogg audio files, because why not. 0:16:05.148,0:16:08.251 tcpdump capture files, that's actually[br]quite useful. 0:16:09.019,0:16:17.540 PDFs. That PDF says "Hello World" and[br]this PDF says "Hello sick sad world"; 0:16:17.995,0:16:23.356 I don't know why I chose that particular text[br]for the demo. 0:16:23.852,0:16:27.058 Again, run that through a normal diff[br]program… garbage. 0:16:28.212,0:16:34.074 XML documents. Again, it'll pretty-print[br]them so it's nice, actually nice to read. 0:16:36.117,0:16:41.809 If you want to get started on diffoscope,[br]the very easiest and quickest way to do it is: 0:16:42.212,0:16:47.678 fire up a web browser, go to try.diffoscope.org,[br]select your files, press Compare, 0:16:48.470,0:16:54.883 and it'll upload them and run diffoscope,[br]with support for all the file formats, 0:16:55.226,0:16:59.096 in the cloud for you and give you a nice[br]HTML page that you can then link people to. 0:16:59.423,0:17:01.107 So that's the very quickest way to get[br]started. 0:17:02.360,0:17:06.884 The next quickest way is to install[br]trydiffoscope, and then you run that 0:17:07.165,0:17:09.751 on two files and it'll basically do[br]the same thing, 0:17:10.018,0:17:12.312 run it in the same cloud service as[br]try.diffoscope.org, 0:17:12.877,0:17:16.672 but it'll give you the result on the[br]command line, or, 0:17:16.981,0:17:22.010 if you pass the webbrowser option, it will[br]give you a URL or load your web browser, 0:17:22.228,0:17:24.951 I can't remember exactly which, with[br]the same results.
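Pretty-printing before diffing is easy to approximate with Python's standard library: re-serialize both documents with indentation, so each value lands on its own line, then run an ordinary unified diff. A minimal sketch with illustrative function names; keys are deliberately kept in their original order, because sorting them would also hide pure key reordering.

```python
import difflib
import json


def pretty_lines(text: str) -> list[str]:
    """Parse JSON and re-serialize it with indentation, one value per line."""
    return json.dumps(json.loads(text), indent=2).splitlines()


def json_diff(a_text: str, b_text: str) -> list[str]:
    """Unified diff of two JSON documents after pretty-printing both."""
    return list(difflib.unified_diff(
        pretty_lines(a_text), pretty_lines(b_text),
        fromfile="a.json", tofile="b.json", lineterm=""))
```

Two single-line documents that differ only in `"version"` then produce a two-line change (`-  "version": "1.0",` / `+  "version": "1.1",`) instead of one enormous replaced line.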
0:17:25.122,0:17:29.574 This is 1kB of Python, basically nothing. 0:17:31.226,0:17:33.120 That's the next easiest way. 0:17:34.262,0:17:36.622 But you can then install diffoscope itself[br]on your own machine. 0:17:37.631,0:17:42.824 I recommend not installing the Recommends,[br]because all of those file formats 0:17:43.208,0:17:46.560 might drag in extra things like[br]the whole of TeX, 0:17:46.820,0:17:52.178 I think the whole of OpenOffice, the whole[br]of Mono, the whole of Java… 0:17:57.263,0:17:58.403 Android, yeah, quite big. 0:18:01.941,0:18:03.489 I think there's another big one I can't[br]think of. 0:18:04.554,0:18:11.185 They're all optional, and they all say[br]"By the way, I support TeX documents 0:18:12.046,0:18:13.281 or whatever, Mono, whatever, 0:18:13.740,0:18:18.954 but you need to install this package and[br]then you get full pretty-printed support". 0:18:19.846,0:18:21.433 And it'll tell you that when it's missing. 0:18:21.791,0:18:25.168 So, if you just start with[br]--install-recommends disabled, 0:18:26.427,0:18:29.107 run it on your files; if it says[br]"please install this package", you can then 0:18:29.335,0:18:31.239 install them as you go along, as you want, 0:18:31.722,0:18:34.319 rather than installing everything. 0:18:34.630,0:18:38.333 And then you just pass ??? files[br]and then it works as before. 0:18:41.978,0:18:45.869 How can you improve your own[br]quality assurance and Debian packaging 0:18:45.959,0:18:46.713 with diffoscope? 0:18:47.582,0:18:50.974 The biggest value here is not[br]necessarily for reproducible builds. 0:18:51.771,0:18:56.406 It's basically for seeing where you[br]do want to have a diff, or are expecting a diff, 0:18:57.078,0:19:00.368 and you are expecting a particular type[br]of diff in a particular way; 0:19:00.903,0:19:02.307 you can basically see those changes. 0:19:03.539,0:19:12.151 And if you build two debs normally and[br]...
I'll try to demo in a second. 0:19:12.403,0:19:16.239 You build a deb without the patch applied and[br]then build a deb with the patch applied; 0:19:16.792,0:19:19.791 you can ??? run a diff on the source package, 0:19:20.742,0:19:24.455 but that's not very useful, because the[br]binaries are what's going to end up on 0:19:24.695,0:19:30.698 people's machines. But if you run a diff on[br]the binary itself: did my change actually 0:19:31.150,0:19:33.205 hit the binary? I think really…[br]No… 0:19:36.118,0:19:39.093 I'll just run through a very live demo, of[br]course, so it's gonna fail… 0:20:03.706,0:20:07.376 Check out some… We'll get this[br]libnetx-java. 0:20:11.041,0:20:12.160 We just build that once. 0:20:16.188,0:20:19.258 Let's say we are on the security team and 0:20:19.475,0:20:22.701 want to apply a patch, and we want to be[br]really sure, because we are going to push it out 0:20:22.888,0:20:24.044 to all our users. 0:20:25.046,0:20:28.612 First we will make a changelog entry. 0:20:38.445,0:20:39.284 Closing a bug. 0:20:48.105,0:20:54.949 Find some Java file to change. 0:20:55.688,0:20:56.798 Let's pretend we have a real patch. 0:21:06.374,0:21:10.650 Let's replace that equals equals,[br]say that was the fix. 0:21:14.033,0:21:15.512 So that's the patch from upstream. 0:21:15.884,0:21:16.966 Upstream blast patch. 0:21:23.505,0:21:26.637 When we build this, what we wanna see is[br]just that change in the file. 0:21:27.141,0:21:32.116 We don't wanna see any nonsense changes of[br]extended dump, but we also definitely want 0:21:32.293,0:21:37.129 to see that change, because if our binary,[br]for security reasons, doesn't have that change, 0:21:37.129,0:21:42.270 then we aren't fixing people's machines;[br]they will issue a DSA ??? installed ???
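The check in this demo can be scripted. The sketch below shells out to the diffoscope command line with its `--html` report option and maps the exit status, assuming the convention that 0 means no differences, 1 means differences were found and anything else indicates an error. The function names are hypothetical and the file names in the usage line are placeholders.

```python
import shutil
import subprocess


def describe_exit_status(code: int) -> str:
    """Translate a diffoscope exit status into words (0/1 convention assumed)."""
    if code == 0:
        return "identical"
    if code == 1:
        return "differences found"
    return "error"


def compare_debs(old_deb: str, new_deb: str, html_report: str) -> str:
    """Run diffoscope on two builds and write an HTML report for review."""
    if shutil.which("diffoscope") is None:
        raise RuntimeError("diffoscope is not installed or not on PATH")
    result = subprocess.run(
        ["diffoscope", "--html", html_report, old_deb, new_deb],
        check=False,  # a non-zero status is expected when the builds differ
    )
    return describe_exit_status(result.returncode)
```

For a security upload you would expect `compare_debs("original.deb", "patched.deb", "report.html")` to return "differences found"; the status alone cannot say whether they are the *right* differences, so the HTML report still has to be read, as in the demo that follows.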
0:21:44.685,0:21:48.766 And you should do proper testing as well,[br]at multiple levels. 0:21:52.763,0:21:53.799 I will build that again. 0:22:23.976,0:22:29.717 So we wanna diff the original one, 0 5, 0:22:30.432,0:22:36.212 we wanna diff that one with a fake[br]security one. 0:22:37.608,0:22:43.481 You see on the progress bar: 100%.[br]1: there are differences (there should be 0:22:43.681,0:22:46.304 differences).[br]Let's see what those differences are 0:22:48.418,0:22:51.828 in our web browser; it's a nice HTML output. 0:23:01.180,0:23:03.888 Let's have a look.[br]Are we seeing what we wanna see? 0:23:07.147,0:23:11.151 There are some changes in the data tar; we[br]kind of expect that. 0:23:14.447,0:23:18.389 What's changed in our control file?[br]Well, the version changed; we wanted that 0:23:18.565,0:23:19.656 to change. Perfect. 0:23:20.535,0:23:24.294 And it's changed to ???[br]That's what we wanna see. 0:23:24.744,0:23:28.370 No other changes here, so there was no[br]weird magic going on in the control file. 0:23:32.297,0:23:38.421 In our data tar a couple of timestamps[br]change; we will ignore those for now. 0:23:40.996,0:23:44.944 The changelog has changed; well, I hope so,[br]because I changed that entry. 0:23:48.820,0:23:51.793 Here is where we're going to start seeing it.[br]We are going to see the change in the 0:23:52.016,0:23:59.455 jar file, which is the Java class,[br]compiled Java archive format. 0:24:00.442,0:24:05.931 We are seeing some meaningless timestamp[br]changes, but we can ignore those, 0:24:06.973,0:24:08.923 let's pretend, because it's just[br]metadata maybe. 0:24:16.429,0:24:24.131 Ok, part of a class, so as you can see here[br]it's basically a decompilation of the 0:24:24.633,0:24:31.500 Java file itself, and it's basically saying[br]"oh, it used to say 'if now' and now 'if not now'". 0:24:31.796,0:24:35.567 So these are the actual Java[br]bytecode instructions, and what's really… 0:24:35.965,0:24:39.241 And what is really ???
here[br]is that nothing else has changed. 0:24:39.627,0:24:44.717 We were just expecting that change between[br]the two opcodes, "if now" and "if not now", 0:24:45.554,0:24:49.557 which is good, because it hasn't made[br]any other code changes, but also, crucially, we can 0:24:49.725,0:24:52.076 see that it has actually made a change[br]to the code. 0:24:55.060,0:24:58.072 For example, it didn't use some cached[br]version or something like that. 0:24:58.338,0:24:59.505 This is really useful. 0:25:00.326,0:25:05.038 And just running a naive diff wouldn't[br]give you that, of course, because it would just 0:25:05.223,0:25:08.341 come up with binary garbage.[br]And just seeing that the diff had changed again 0:25:08.627,0:25:12.604 ??? have told you anything, because everything[br]else would have changed as well. 0:25:12.802,0:25:15.886 So it's like, well, yes, it's different. 0:25:16.028,0:25:19.161 The meaningful change there is[br]what actually fixes the "floor", 0:25:19.597,0:25:21.020 ??? but we know it's there. 0:25:22.945,0:25:27.448 That's kind of ???
[br]Shipping this deb out, I'd be quite 0:25:27.687,0:25:30.004 confident that this fixed the[br]actual bug. 0:25:31.151,0:25:34.721 I'd be quite confident pushing that out,[br]because it's a very minimal amount of change, 0:25:35.218,0:25:36.750 which is what you want for security reasons. 0:25:37.285,0:25:40.111 So this was the live demo. 0:25:43.038,0:25:48.108 The other use is seeing no changes[br]at all, so you can build once, 0:25:48.108,0:25:49.894 if your build is reproducible. 0:25:50.491,0:25:54.753 You can build once, change your compiler[br]or change some other part of your toolchain, 0:25:55.982,0:26:02.267 build it again, and if you get the exact same[br]results, well, great, that's what you intended. 0:26:02.534,0:26:04.595 You wanna see no changes when you change[br]some part of it. 0:26:08.127,0:26:11.928 And that is really useful: if there were[br]changes, diffoscope will highlight them 0:26:12.271,0:26:15.993 and show exactly why they had changed,[br]maybe some compiler optimizations, 0:26:16.393,0:26:17.565 maybe some other things as well. 0:26:19.056,0:26:22.603 So you can use it in both ways, when you[br]expect changes and when you don't expect 0:26:22.789,0:26:26.926 changes, and if those don't match your expectations,[br]diffoscope will tell you exactly why. 0:26:29.922,0:26:34.355 It's all ???
when other companies[br]are doing security releases, 0:26:35.111,0:26:41.184 naming no names whatsoever,[br]but they like to release patches as, you 0:26:41.697,0:26:44.618 know, just a new firmware for your router. 0:26:46.674,0:26:50.629 Very large file system images:[br]you basically have no idea what changed 0:26:51.034,0:26:55.037 between these two files; again, you run them[br]through diff: completely useless. 0:26:55.419,0:26:59.496 You can start to unpack them with[br]squashfs and blah blah blah, 0:27:01.143,0:27:05.753 but they're probably sort of concatenated[br]cpio archives, so that's nonsense. 0:27:07.223,0:27:11.913 But diffoscope would just chew through those[br]and give you what the actual differences 0:27:11.913,0:27:15.197 are between these two files, and say[br]they changed this, they've removed or 0:27:15.596,0:27:19.260 added some GPL-licensed code or something,[br]kind of interesting. 0:27:24.293,0:27:31.212 So it's very useful for diffing those kinds of[br]binary blobs that come from various people. 0:27:33.013,0:27:36.983 So, the current state of diffoscope:[br]the development is up and down. 0:27:41.148,0:27:51.343 It started around May 2014, something like that.[br]A bunch of work here; that's idle, I think. 0:27:55.239,0:27:56.841 These are just for DebConfs, basically. 0:28:09.157,0:28:12.343 Anyway, it's going up and down; it's kind[br]of interesting. 0:28:14.939,0:28:19.296 ??? a lot of reproducible builds projects,[br]of course, so every time we do a build 0:28:19.621,0:28:25.064 on the ??? reproducible builds testing[br]framework, we run diffoscope 0:28:25.303,0:28:29.834 on the result; if it's reproducible, it[br]just says, hey, the file is the same. 0:28:31.208,0:28:36.767 But if not, we publish the diffoscope output for[br]all your packages that are unreproducible, 0:28:37.092,0:28:40.870 so you can just go there and be like,[br]what's the difference between these two things? 0:28:53.762,0:29:02.115 I invested a lot of work optimizing[br]diffoscope, ???
rather perverse n-squared 0:29:02.465,0:29:07.556 loops inside it. So I managed to cut down[br]some of the time here, cut down here. 0:29:11.063,0:29:14.012 There have been quite a few performance[br]enhancements over the past… 0:29:16.395,0:29:21.240 These are the git tags; this is version 80[br]and this is version 50. I just ran the same 0:29:22.147,0:29:23.363 benchmark across them all. 0:29:24.705,0:29:35.180 So it shows when I introduced some[br]rather stupid code, embarrassing, but whatever. 0:29:35.703,0:29:36.424 ??? 0:29:37.482,0:29:40.522 There's work being done right now[br]on parallel processing; there have been 0:29:40.923,0:29:46.344 quite a few attempts before, but adding it[br]is kind of interesting and difficult. 0:29:47.033,0:29:51.898 Luckily we have an outreach student,[br]Liliana. Is she in the room? Is she hiding? 0:29:53.069,0:29:57.225 She's here, and she's talking tomorrow[br]about her work on parallel processing in 0:29:57.520,0:30:02.162 diffoscope, and that will be amazing, because[br]a lot of it is IO-bound or waiting for external 0:30:02.388,0:30:06.635 processes. With multiple-CPU machines,[br]you might as well just parallelize: 0:30:07.012,0:30:11.631 while I'm sitting waiting for the result[br]of a PDF being unpacked, I might as well 0:30:11.913,0:30:16.859 be running on another CPU. I think we are[br]going to see some real performance wins 0:30:17.512,0:30:22.810 as we get that parallel processing merged and[br]working and ??? 0:30:24.189,0:30:29.544 You can check out our website, diffoscope.org;[br]recently migrated to Salsa… yeeaahhh! 0:30:33.375,0:30:37.771 And everything that's reproducible is now[br]on Salsa; it's kind of cool. 0:30:38.732,0:30:42.450 That's quite recent…[br]??? 0:30:44.620,0:30:45.876 Thank you very much, danke schön. 0:30:46.560,0:30:48.733 You got any questions?[br]About diffoscope? 0:30:51.659,0:30:53.558 Thank you very much!
0:30:53.558,0:30:57.761 [Applause] 0:30:59.888,0:31:02.954 Q: A buzzword question: can you diff container[br]image formats? 0:31:04.943,0:31:14.617 A: Depends which ones. So if they are just[br]directories, then yes, because it's just a directory. 0:31:15.139,0:31:17.224 Do you have one particularly in mind? Like Docker? 0:31:19.068,0:31:25.487 Yes, there's Docker and then there's[br]OCI, I believe, which is the standard one. 0:31:26.669,0:31:30.506 And that could make it buzzword compliant. 0:31:31.286,0:31:33.028 Ah OK, we're all about buzzwords. 0:31:34.334,0:31:37.411 Probably diffoscope blockchain as well. 0:31:38.249,0:31:42.059 And then run diffoscope on containers and[br]see the difference between updates of your 0:31:42.059,0:31:43.395 container images. 0:31:43.620,0:31:46.219 BAM... solved![br]Where do I invest? 0:31:48.231,0:31:56.645 I wasn't aware of OCI... is that how it's[br]called? No, it doesn't support that right now, 0:31:58.347,0:32:02.025 but it wouldn't be too difficult, presuming[br]there are tools to unpack it. As soon as 0:32:02.297,0:32:07.761 we have a tool to unpack it, it can then[br]just go to that. There is an open wishlist 0:32:08.177,0:32:15.402 bug tool box for Docker containers, to the[br]point where I think it would be really 0:32:15.668,0:32:19.338 nice if you could just give it, say, two[br]image names or whatever the noun is, 0:32:19.835,0:32:24.083 so you can say "please diff these two[br]Docker images that are available" and 0:32:24.274,0:32:28.753 it can look at your local thing and do[br]a diff on them. Currently it's not 0:32:29.008,0:32:31.077 supported, but there is an open wishlist[br]bug. 0:32:32.345,0:32:36.860 Q: Shouldn't any company that releases[br]binaries be interested in supporting 0:32:37.183,0:32:38.544 diffoscope and using it? 0:32:51.541,0:32:58.413 A1: Basically, when companies release binaries, they are not interested in users seeing differences...
0:33:01.874,0:33:10.299 A2: Yes, I'm surprised that the[br]Docker bug was actually only opened two months ago 0:33:10.776,0:33:17.144 and there hasn't been more interest in diffing[br]container images, but if you'd like to open 0:33:17.561,0:33:24.460 one for OCI, that would be very much appreciated,[br]and we can get on to that. That would be 0:33:24.677,0:33:25.573 great. 0:33:30.038,0:33:35.465 I was looking at the page for OCI; it says[br]it's based on Docker, basically, so 0:33:35.655,0:33:40.500 once you get OCI, for free you would[br]sort it out for Docker, if you're lucky. 0:33:48.166,0:33:51.646 The OCI image format, they wrote it[br]based on Docker images. 0:33:55.429,0:34:00.232 OK, we will sort that out, and it seems like[br]we're using Docker more and more 0:34:00.279,0:34:01.451 in Debian. 0:34:07.484,0:34:09.216 Any other questions? 0:34:20.886,0:34:29.297 Q: Out of curiosity, which ??? are you using[br]inside? Are you using some bioinformatics 0:34:30.447,0:34:33.332 algorithm to diff trees efficiently? 0:34:34.200,0:34:46.781 A: No, it's really naive. All it does is run[br]normal diff, the normal diff tools, but 0:34:47.126,0:34:59.242 it will try to identify files and unpack them[br]first, so it uses the file utility to identify 0:34:59.716,0:35:06.547 that something is a PDF, and tries to[br]unpack it first. It doesn't do any clever 0:35:07.415,0:35:12.056 matching. The clever matching that it does[br]do is fuzzy matching, so if you just 0:35:12.293,0:35:18.567 rename a directory between two versions inside a[br]container, it will say, yeah, there's a 0:35:18.812,0:35:23.981 massive fuzzy match between these[br]two files, and things like that. So that's 0:35:24.241,0:35:31.110 kind of useful, but apart from that it's not clever,[br]which is kind of what you want, because 0:35:31.292,0:35:34.308 if it's too clever it would start to be a little[br]opaque... 0:35:37.749,0:35:40.046 I personally like dumb tools.
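To make the answer above concrete, here is a minimal Python sketch of that naive approach: identify a file, unpack it if it is a container and recurse into its members, fall back to a plain line diff otherwise, and fuzzy-match members that only exist on one side (for example after a rename). This is not diffoscope's code. The assumptions are loud ones: only ZIP is recognised, by its magic bytes rather than the `file` utility; the fuzzy matching uses `difflib.SequenceMatcher` with an arbitrary 0.8 threshold as a stand-in; and the helper names `unpack`, `fuzzy_pairs` and `compare` are hypothetical.

```python
from __future__ import annotations

import difflib
import io
import zipfile

# Assumption for this sketch: ZIP is the only "container" format recognised,
# identified by its magic bytes instead of the `file` utility.
ZIP_MAGIC = b"PK\x03\x04"


def unpack(data: bytes) -> dict[str, bytes] | None:
    """Return {member name: contents} if data is a readable ZIP archive, else None."""
    if not data.startswith(ZIP_MAGIC):
        return None
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return {name: zf.read(name) for name in zf.namelist()}
    except zipfile.BadZipFile:
        return None


def fuzzy_pairs(left: dict[str, bytes], right: dict[str, bytes],
                threshold: float = 0.8) -> list[tuple[str, str]]:
    """Pair members present on only one side whose contents are similar enough."""
    pairs = []
    candidates = set(right) - set(left)
    for left_name in sorted(set(left) - set(right)):
        best, best_ratio = None, threshold
        for right_name in sorted(candidates):
            ratio = difflib.SequenceMatcher(
                None, left[left_name], right[right_name], autojunk=False).ratio()
            if ratio >= best_ratio:
                best, best_ratio = right_name, ratio
        if best is not None:
            candidates.discard(best)
            pairs.append((left_name, best))
    return pairs


def compare(name: str, a: bytes, b: bytes) -> list[str]:
    """Unpack containers and recurse into their members; otherwise run a line diff.

    Archive metadata (timestamps, permissions, ordering) is ignored in this sketch.
    """
    if a == b:
        return []
    ua, ub = unpack(a), unpack(b)
    if ua is not None and ub is not None:
        out: list[str] = []
        # Members with the same name on both sides: recurse into each pair.
        for member in sorted(set(ua) & set(ub)):
            out += compare(f"{name}/{member}", ua[member], ub[member])
        # Members on one side only: try to pair them up by content similarity.
        pairs = fuzzy_pairs(ua, ub)
        for left_name, right_name in pairs:
            out.append(f"{name}: {left_name} renamed to {right_name} (fuzzy match)")
            out += compare(f"{name}/{right_name}", ua[left_name], ub[right_name])
        paired_left = {left_name for left_name, _ in pairs}
        paired_right = {right_name for _, right_name in pairs}
        for member in sorted(set(ua) - set(ub) - paired_left):
            out.append(f"{name}: only in first: {member}")
        for member in sorted(set(ub) - set(ua) - paired_right):
            out.append(f"{name}: only in second: {member}")
        return out
    # Not a pair of containers: fall back to a plain textual diff.
    lines_a = a.decode("utf-8", "replace").splitlines()
    lines_b = b.decode("utf-8", "replace").splitlines()
    return list(difflib.unified_diff(lines_a, lines_b, f"a/{name}", f"b/{name}",
                                     lineterm=""))
```

Calling `compare("fw.zip", old_bytes, new_bytes)` on two archives returns a list of report lines: unified-diff hunks for members whose text changed, plus "renamed" and "only in" lines for members that moved or appeared. The real tool covers far more formats and, unlike this sketch, also compares archive metadata; keeping the core this dumb is the point made in the answer, since the output stays easy to follow.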
0:35:43.916,0:35:51.411 Q: So one question to you is whether,[br]if you want to do a release to stable or 0:35:51.565,0:35:58.973 something like that, you can ask for the[br]debdiff. I'm wondering if anyone... 0:35:59.174,0:36:03.914 I mean, I remember doing that myself:[br]I've been submitting diffoscope output 0:36:04.119,0:36:09.516 as well, because it's just more readable and[br]useful. So I'm not sure if anyone has any 0:36:09.692,0:36:12.741 objection to people asking for those. 0:36:22.179,0:36:24.752 I'll propose that to the release team,[br]see what they say. 0:36:26.024,0:36:28.950 Thank you very much.[br]Are there any other questions? 0:36:32.634,0:36:36.787 No further questions? Then let's thank[br]Chris again! 0:36:37.137,0:36:41.940 [Applause]