https:/.../diffoscope.webm

  • 0:07 - 0:10
    I'm here today to talk to you about
    diffoscope
  • 0:10 - 0:13
    and how you can use it as a better diff
  • 0:14 - 0:16
    or for Quality Assurance, etc., things
    like that.
  • 0:20 - 0:21
    Moin!
  • 0:21 - 0:24
    Apparently that's like a North German
    thing to say "welcome".
  • 0:26 - 0:30
    North German, north Denmark, Scandinavia,
    that kind of thing, I'm told.
  • 0:32 - 0:34
    People are shaking their head, so I'm
    going to assume that's true.
  • 0:37 - 0:40
    This is my first PC, an IBM 5155.
  • 0:42 - 0:46
    Sometimes, when you rebooted it, it would
    launch into, it would somehow revert
  • 0:47 - 0:51
    from booting from the hard disk to booting
    from a BASIC ROM,
  • 0:51 - 0:53
    as in the programming language ROM.
  • 0:53 - 0:54
    It was on my motherboard for some reason.
  • 0:55 - 0:58
    So, randomly, you just get a chance to
    program in BASIC and then,
  • 0:58 - 1:00
    sometimes you wouldn't, I don't know why,
    but… yeah.
  • 1:01 - 1:05
    It's quite fun with this kind of clicky
    keyboard, and that folded in
  • 1:06 - 1:07
    and it was this kind of big desk thing.
  • 1:07 - 1:08
    Anyway…
  • 1:09 - 1:10
    This is my first Debian.
  • 1:10 - 1:12
    At the time it was already old.
  • 1:13 - 1:16
    What's this one? Is this Slink? 2.2?
    Yeah.
  • 1:17 - 1:22
    And this is when we had US and non-US,
    so that's really dating if you remember that.
  • 1:24 - 1:28
    This is my first contribution to Debian,
    19th December 2006,
  • 1:29 - 1:34
    sending a patch to lilypond which is kind
    of interesting
  • 1:34 - 1:37
    and the response was "Oh yeah, rock on,
    many thanks. I'll upload this and
  • 1:37 - 1:39
    it'll be landing to Etch".
  • 1:39 - 1:43
    And this was super motivating because
    Etch was just coming out and it was like
  • 1:44 - 1:49
    "Great, I've got, like, a tiny one-line
    patch in a release. This is super cool."
  • 1:49 - 1:53
    Thomas' response was super motivating.
  • 1:53 - 1:56
    So, after that, like that Christmas
    basically spent ???
  • 1:57 - 2:00
    Debian webpages and stuff.
  • 2:00 - 2:02
    Very well timed.
  • 2:02 - 2:04
    That's kind of a good…
  • 2:04 - 2:07
    You know, someone sends a patch, be like
    "Cool, thanks"
  • 2:08 - 2:09
    Like a little notice in the changelog.
  • 2:10 - 2:14
    It was, you know, so stupid but…
    Yeah, do that kind of thing.
  • 2:16 - 2:17
    So, moving on.
  • 2:18 - 2:20
    Why diffoscope?
    Why did we write diffoscope?
  • 2:21 - 2:22
    What's the background here?
  • 2:22 - 2:25
    It comes from reproducible builds.
  • 2:25 - 2:29
    The very quick outline is that once you
    get the source code for free software,
  • 2:29 - 2:32
    you download the source code for nginx
    or whatever,
  • 2:32 - 2:36
    pretty much everyone just runs binaries
    on their servers or their systems.
  • 2:36 - 2:39
    You know, "apt install bla", "yum install",
    whatever.
  • 2:41 - 2:42
    Android Playstore, whatever.
  • 2:42 - 2:46
    Can you actually trust whether these two
    things correspond with each other?
  • 2:46 - 2:50
    You've gotten the source code, it looks
    alright, and then you install this binary,
  • 2:51 - 2:52
    yeah…
  • 2:52 - 2:56
    Who generated that? Can you trust that
    process?
  • 2:56 - 2:57
    Can you trust who generated it?
  • 2:58 - 3:01
    Even if you could trust them, could you
    trust them not to be exploited? Etc.
  • 3:02 - 3:05
    This is a big problem because you can
    exploit a build farm and then
  • 3:05 - 3:10
    obviously exploit all of that, you know,
    a trojan into the build farm,
  • 3:10 - 3:13
    so every single binary that comes out
    is compromised.
  • 3:14 - 3:15
    Kind of problematic.
  • 3:15 - 3:18
    You could also target individual developers'
    machines,
  • 3:18 - 3:21
    so I could go off to, say, your machine,
    add a backdoor to it,
  • 3:22 - 3:25
    so every binary that you give to friends
    and things like that
  • 3:27 - 3:30
    is compromised in some way, stealing
    your bitcoins or whatever.
  • 3:32 - 3:36
    I can also turn up at your door
    and blackmail you into producing
  • 3:39 - 3:43
    software that has compromises or extra
    features, shall we say,
  • 3:43 - 3:45
    that don't exist in the source code.
  • 3:45 - 3:48
    So what will happen there is that you'd
    release your source
  • 3:48 - 3:52
    and the binaries you produce have
    this sort of backdoor that, you know,
  • 3:52 - 3:55
    someone is forcing you into producing.
  • 3:55 - 3:57
    So, you don't want to do that.
  • 3:57 - 3:58
    Anyway
  • 3:58 - 3:59
    enough of that.
  • 3:59 - 4:03
    What you do for reproducible builds is you
    ensure that every time you build
  • 4:03 - 4:06
    a piece of software, you get an identical
    result.
  • 4:07 - 4:11
    Multiple people then compare their builds
    and check whether they all get
    the same results
  • 4:11 - 4:16
    and this means that an attacker must
    either have infected everyone
  • 4:16 - 4:18
    at the same time, or they haven't
    infected anyone.
  • 4:21 - 4:24
    The point here is that you have to ensure
    that builds have identical results.
  • 4:24 - 4:25
    Ok, great.
  • 4:28 - 4:33
    So, we started the reproducible builds
    project, etc.
  • 4:33 - 4:35
    And we build 2 debs.
  • 4:35 - 4:37
    Oh, I'm sorry about the colors there.
  • 4:38 - 4:39
    You probably can't see that.
  • 4:39 - 4:42
    That says "sha1sum a.deb b.deb".
  • 4:46 - 4:51
    Anyway, we're comparing the sha1sums
    of 2 binary Debian files.
  • 4:51 - 4:54
    So, these two files differ.
  • 4:54 - 4:56
    Ok, they're not reproducible.
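
As an aside on the checksum comparison above, here is a minimal sketch in Python of the same check: hash two build artifacts and compare the digests. The function names `sha1_of` and `same_build` are made up for illustration; the talk itself simply runs "sha1sum a.deb b.deb".

```python
import hashlib


def sha1_of(path):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large packages do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_build(path_a, path_b):
    """True when both artifacts hash to the same SHA-1 digest."""
    return sha1_of(path_a) == sha1_of(path_b)
```

A result of False only says that the two files differ somewhere; it gives no hint about where or why, which is the gap the rest of the talk is about.
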
  • 4:57 - 4:58
    Why is that?
  • 4:58 - 5:00
    So we run a diff on them.
  • 5:00 - 5:01
    Yeah…
  • 5:01 - 5:04
    So, what can we learn from this?
  • 5:04 - 5:09
    Well, not very much. Visibly they're
    compressed, so
  • 5:09 - 5:13
    as soon as there is one change, we'll
    just see cascading changes
  • 5:13 - 5:15
    because that's how compression works.
  • 5:16 - 5:24
    I guess we know it's a deb, probably an ar
    format file, not very useful.
  • 5:24 - 5:26
    Ok, great so we're gonna have a look in
  • 5:26 - 5:30
    We'll do a binary diff and ok, well…
  • 5:31 - 5:33
    Again, that's not really telling us
    very much
  • 5:34 - 5:37
    with the diff there.
  • 5:37 - 5:38
    Ok, great.
  • 5:39 - 5:40
    ??? one level in
  • 5:41 - 5:45
    "ar x" is on the new maintainer thing,
    "how you unpack a deb"
  • 5:45 - 5:46
    Everyone remembers this, right?
  • 5:48 - 5:51
    You unpack a.deb with "ar x" and you
    do that to b.deb
  • 5:52 - 5:54
    and then we diff the results of that.
  • 5:54 - 5:58
    Ok, so…yeah, 7zip.
  • 5:59 - 6:01
    Ok, compressed content, not very useful.
  • 6:02 - 6:08
    Ok, so let's unpack the control.tar inside
    these debs.
  • 6:09 - 6:10
    And then we run diff on that.
  • 6:13 - 6:17
    Still not really telling us anything useful
    about how to make this package reproducible.
  • 6:17 - 6:20
    So let's unpack the tar.xz into the tar.
  • 6:22 - 6:28
    Inside that tar, there's a file called
    md5sums and we start to see some differences
  • 6:29 - 6:33
    between some files in these two debs.
  • 6:34 - 6:37
    ??? meaningful, so now
    we have some idea that
  • 6:37 - 6:39
    it has something to do with this
    usr/bin/pmixer binary.
  • 6:40 - 6:41
    Ok, interesting.
  • 6:42 - 6:45
    We'll unzip that and then we do a diff on
    pmixer itself.
  • 6:46 - 6:49
    Now we're back into just binary
    "gobbledygook" mode.
  • 6:49 - 6:52
    This isn't very helpful and this is taking
    quite a while
  • 6:52 - 6:55
    and if I remember correctly, Debian has
    a lot of packages.
  • 6:55 - 6:57
    So this might take a little while.
  • 6:58 - 7:00
    So, basically, ??? mean
  • 7:01 - 7:02
    I should build a better diff.
  • 7:04 - 7:05
    That's not quite true, this is actually…
  • 7:06 - 7:07
    It was Lunar that started this project
  • 7:08 - 7:11
    and it was called debbindiff, because
    we wanted to diff
  • 7:11 - 7:12
    binary Debian packages.
  • 7:13 - 7:15
    So this is the initial commit, 2014.
  • 7:17 - 7:20
    "The version is successfully able to report
    differences in two .changes files.
  • 7:20 - 7:22
    Not with much interesting details,
    but it's a start."
  • 7:23 - 7:24
    And it was a start.
  • 7:28 - 7:30
    Fast forwarding… Oh, sorry about these
    colors,
  • 7:30 - 7:32
    I don't know if we can do anything about
    the lights?
  • 7:35 - 7:35
    Yeah?
  • 7:38 - 7:38
    No?
  • 7:42 - 7:43
    All right, whatever…
  • 7:44 - 7:46
    Basically, we're diffoscoping on…
  • 7:48 - 7:50
    It works kind of like diff does normally,
  • 7:50 - 7:52
    you give it two files, it outputs
    a unified diff.
  • 7:53 - 7:59
    So "diffoscope a b", one file contains
    the word "foo", one contains the word "bar".
  • 8:01 - 8:03
    Nothing actually out of the ordinary.
  • 8:04 - 8:08
    It's sort of colored by default, so that's
    why you can't see it, but whatever.
  • 8:10 - 8:15
    It supports archive formats, so if you
    give it two tar files,
  • 8:15 - 8:22
    if we then tar up our "a" file and
    our "b" file into an a.tar and a b.tar
  • 8:23 - 8:25
    and then run diffoscope on those tar files
  • 8:26 - 8:28
    we get this kind of, like, hierarchy here.
  • 8:29 - 8:32
    So it's saying that there are differences
    between these files,
  • 8:33 - 8:38
    in the file list they have different time
    stamps, because I made them
  • 8:38 - 8:40
    at different times,
  • 8:40 - 8:43
    and here are the contents, so we got
    "foo" there and "bar" there.
  • 8:43 - 8:45
    So we can see the difference between them.
  • 8:46 - 8:48
    Well, I can, I don't know if you can,
    you get the slide there.
  • 8:49 - 8:54
    If we gzip these tar files and then run
    diffoscope on those gzip things,
  • 8:54 - 8:59
    it'll say "ok, what we've done is unpack it
    first, and here's the metadata
  • 9:00 - 9:02
    about the gzip process",
  • 9:02 - 9:06
    and inside that are a.tar and b.tar
    from the previous slides.
  • 9:08 - 9:09
    And then the "a" file and the "b" file.
  • 9:09 - 9:15
    So, it's really going two levels deep
    into this tar.gz file.
  • 9:16 - 9:17
    That's pretty cool.
  • 9:17 - 9:21
    And it's completely recursive, I think
    it will actually blow out after, I think,
  • 9:21 - 9:22
    1000 [levels].
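
The recursive descent described above can be sketched in a few lines of Python. This is only an illustration of the idea, not diffoscope's implementation: it handles just gzip and tar, works on in-memory bytes, and the helper names (`make_tar_gz`, `unpack`, `compare`) and the depth limit of 50 are assumptions made for the example.

```python
import gzip
import io
import tarfile
import zlib


def make_tar_gz(name, content, mtime):
    """Build an in-memory .tar.gz holding one file (hypothetical demo data)."""
    tar_bytes = io.BytesIO()
    with tarfile.open(fileobj=tar_bytes, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(content)
        info.mtime = mtime
        tar.addfile(info, io.BytesIO(content))
    return gzip.compress(tar_bytes.getvalue(), mtime=0)


def unpack(data):
    """Return {member_name: bytes} for gzip or tar data, or None for a leaf."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        try:
            return {"(gzip content)": gzip.decompress(data)}
        except (OSError, EOFError, zlib.error):
            return None
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
            return {
                member.name: tar.extractfile(member).read()
                for member in tar.getmembers()
                if member.isfile()
            }
    except tarfile.TarError:
        return None  # not a container we know how to open


def compare(a, b, path="", depth=0, max_depth=50):
    """Yield the nested path of every place where the two inputs differ."""
    if a == b:
        return
    inner_a, inner_b = unpack(a), unpack(b)
    if depth >= max_depth or inner_a is None or inner_b is None:
        yield path or "(top level)"  # plain data: the difference lives here
        return
    found = False
    for name in sorted(set(inner_a) | set(inner_b)):
        child = f"{path}/{name}" if path else name
        if name not in inner_a or name not in inner_b:
            found = True
            yield f"{child} (only on one side)"
            continue
        for hit in compare(inner_a[name], inner_b[name], child, depth + 1, max_depth):
            found = True
            yield hit
    if not found:
        # Containers differ but their contents do not, e.g. only timestamps.
        yield f"{path or '(top level)'} (metadata only)"
```

Calling `list(compare(a, b))` on two archives built with `make_tar_gz` returns the nested path down to the differing member, which mirrors the gzip layer, tar layer, file layer hierarchy shown on the slides.
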
  • 9:23 - 9:25
    [light is turned down for the audience
    to see the slides]
  • 9:30 - 9:32
    I'll just bump back a bit, just in case.
  • 9:35 - 9:37
    [Applause]
  • 9:38 - 9:39
    Thank you.
  • 9:40 - 9:43
    So that's the a and b files.
  • 9:44 - 9:48
    We've tarred them up and so I see
    the hierarchy of foo and bar file layer.
  • 9:48 - 9:52
    I've gzipped them, so this is a gzip layer.
  • 9:52 - 9:55
    Here's the tar layer and then there's
    the files themselves.
  • 9:57 - 9:59
    This is from a real .deb from the archive.
  • 10:01 - 10:07
    Inside this .deb, there's a data.tar.xz
    and in that xz file there's a data.tar
  • 10:07 - 10:11
    and inside that tar file, there's a file
    called aff and inside that
  • 10:12 - 10:14
    there's a version string that is different.
  • 10:14 - 10:18
    And that looks like a build date so we
    probably know that if we went back
  • 10:18 - 10:23
    to the source package, we could very
    quickly work out,
  • 10:23 - 10:27
    with a very quick grep, work out
    where this file is being generated from,
  • 10:27 - 10:32
    the de_DE.aff file and then ???
    probably quite obvious
  • 10:32 - 10:37
    that it's using the current build time
    and then we can just patch that, fix it etc.
  • 10:38 - 10:46
    This has gone from two rather obscure
    binary .debs all the way to the fix
  • 10:46 - 10:52
    probably in about 5 minutes, and you can
    probably send the patch in that time
  • 10:52 - 10:53
    because it'd be quite quick.
  • 10:54 - 10:57
    Without diffoscope here, without this sort
    of recursive unpacking,
  • 10:58 - 11:03
    you'd be just completely lost, you'd be
    there with "ar x" all day
  • 11:04 - 11:07
    and working out which files are different
    and trying to use xxd
  • 11:08 - 11:09
    and this kind of nonsense.
  • 11:11 - 11:13
    diffoscope's got some other things as well
  • 11:13 - 11:17
    if you try to do reproducible packages
    and things are varying just on
  • 11:17 - 11:22
    the line ordering, we detect whether
    a file differs only in the line ordering.
  • 11:23 - 11:26
    So, here's file "a", "These lines are in
    order".
  • 11:27 - 11:30
    File "b" has "These order are in lines".
  • 11:31 - 11:35
    It's very difficult to say, actually,
    it's like one of these tongue twisters.
  • 11:35 - 11:39
    Run diffoscope on these two and it says
    it's got ordering differences only.
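
The "ordering differences only" check can be illustrated with a short Python sketch. This is an assumption about how such a check might be written, not a copy of diffoscope's code, and the function name `differs_only_in_line_order` is invented for the example.

```python
from collections import Counter


def differs_only_in_line_order(text_a, text_b):
    """True when both texts hold the same lines, just in a different order."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    # Counter compares the lines as multisets, so duplicates are respected.
    return lines_a != lines_b and Counter(lines_a) == Counter(lines_b)
```

With one word per line, "These lines are in order" against "These order are in lines" returns True, which is the hint that a sort is missing somewhere in the build.
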
  • 11:39 - 11:41
    That's interesting, so you probably need
    to sort,
  • 11:42 - 11:45
    you go all the way back to the source code,
    work out very quickly,
  • 11:45 - 11:48
    if you know it's just ordering differences
    you just kind of know
  • 11:49 - 11:53
    what the output's gonna be, you can
    search for order in ???
  • 11:53 - 11:55
    and you get the right files,
  • 11:55 - 11:58
    add a sort in the right
    place, BAM! send the patch off,
  • 11:58 - 11:59
    everything is great.
  • 11:59 - 12:03
    Oh, and send it to upstream as well
    because you're good.
  • 12:03 - 12:05
    It supports a lot more things.
  • 12:06 - 12:09
    We've been showing the terminal
    text output here.
  • 12:11 - 12:16
    It's got an HTML output mode, which is
    really useful in the hierarchical thing
  • 12:16 - 12:17
    when it gets a bit more complicated.
  • 12:19 - 12:22
    Instead of being laid on top of each other
    like a unified diff,
  • 12:22 - 12:27
    you get the diff on the left and the right
    and you get sort of a nested
  • 12:27 - 12:32
    thing inside with colors and lines and
    you can link this and various things in it
  • 12:33 - 12:38
    including bits of metadata here, other
    bits here, what command you used.
  • 12:39 - 12:40
    That's the HTML output.
  • 12:41 - 12:44
    We also support a lot of file formats,
    it's not just on text,
  • 12:46 - 12:49
    it's about all of these, so let's quickly
    run through some of them.
  • 12:49 - 12:55
    You give it two Android apk files which
    are kind of like zips, but magic.
  • 12:55 - 12:58
    It'll know how to compare them.
  • 12:59 - 13:01
    There's like a Manifest file that needs
    decoding.
  • 13:02 - 13:04
    It supports Berkeley DB databases,
  • 13:04 - 13:08
    Word documents, that's a Word document
    with "a" and that's a Word document with "b"
  • 13:09 - 13:10
    and it'll correctly do that.
  • 13:11 - 13:14
    If you run that through diff normally,
    that ??? be a binary mess,
  • 13:15 - 13:16
    so completely useless.
  • 13:18 - 13:20
    E-books, there's epub, it also supports
    mobi.
  • 13:21 - 13:26
    So if you give it two epub files, it'll say
    "They just differ in this date".
  • 13:26 - 13:27
    Brilliant.
  • 13:28 - 13:31
    Normally that will be completely useless
    diff binary ???
  • 13:31 - 13:36
    So you can be like "epub date, ok", grep
    the source code for that,
  • 13:36 - 13:38
    make a patch really quickly.
  • 13:40 - 13:43
    Mono binaries, git repositories, why not?
  • 13:44 - 13:46
    Gnumeric spreadsheets, ISO images.
  • 13:46 - 13:48
    Oh yeah, ISO images is really cool.
  • 13:48 - 13:55
    So, it'll basically unpack the ISO, then
    inside that there might be a squashfs image
  • 13:55 - 14:02
    then it'll completely go down to that and
    work out any differences
  • 14:02 - 14:06
    between the two contents in the ISO file,
    including any metadata.
  • 14:06 - 14:11
    This is on the squashfs metadata headers,
    I think.
  • 14:12 - 14:19
    But say inside that ISO, there was a file
    that was a pdf, and inside that pdf was
  • 14:20 - 14:23
    a ??? which varied,
  • 14:23 - 14:27
    it will basically go all the way down
    and say "yeah, it's actually here,
  • 14:27 - 14:28
    in this ??? that the data differs."
  • 14:29 - 14:32
    And that means you can just go again
    all the way back to the source
  • 14:33 - 14:36
    and say "ok, cool, we know how to fix
    this quite quickly"
  • 14:36 - 14:40
    And this is really valuable in getting
    the recent Tails distribution reproducible
  • 14:40 - 14:43
    so their ISOs are reproducible.
  • 14:44 - 14:47
    If you build one and I build one, we get
    the exact same one
  • 14:47 - 14:51
    and that's kind of useful for something
    like Tails where you would probably want to
  • 14:52 - 14:55
    of all, there's a lot of projects that you
    might want to compromise,
  • 14:55 - 14:59
    you might want to go after that one,
    because of the kind of people that are using it.
  • 15:02 - 15:10
    We support comparing images, so this is
    using ???
  • 15:12 - 15:14
    and then just running that through diff.
  • 15:16 - 15:20
    That is a Linux penguin and that is
    something else,
  • 15:21 - 15:24
    I can't remember now. Oh, FT.
  • 15:25 - 15:26
    It supports images.
  • 15:27 - 15:33
    It supports JSON and pretty print,
    so if you give it two JSON files
  • 15:33 - 15:37
    one with key/value… it'll do a nice
    diff of them.
  • 15:38 - 15:43
    It will pretty print it first, before
    doing the diff, so it'll actually give you
  • 15:44 - 15:46
    something clean, otherwise I don't know
    if you've ever diffed
  • 15:47 - 15:50
    two very long JSON lines, if they differ
    in the middle, you just get
  • 15:51 - 15:55
    a huge long unified diff, but here it's
    like "oh, just ??? things have changed"
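
A minimal sketch of the pretty-print-then-diff idea, using only the Python standard library. The function name `json_diff`, the file labels and the choice to sort keys are assumptions made for the example; the talk only says that the JSON is pretty printed before the diff is taken.

```python
import difflib
import json


def json_diff(raw_a, raw_b):
    """Pretty print two JSON documents, then return their unified diff lines."""
    pretty_a = json.dumps(json.loads(raw_a), indent=4, sort_keys=True).splitlines()
    pretty_b = json.dumps(json.loads(raw_b), indent=4, sort_keys=True).splitlines()
    # One value per line means a single changed value shows up as a single
    # changed line instead of one enormous changed line.
    return list(
        difflib.unified_diff(pretty_a, pretty_b, "a.json", "b.json", lineterm="")
    )
```

Printing `"\n".join(json_diff(raw_a, raw_b))` for two long single-line documents gives a diff limited to the values that actually changed.
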
  • 15:59 - 16:04
    OpenDocument text formats,
    Ogg audio files, because why not.
  • 16:05 - 16:08
    tcpdump capture files, that's actually
    quite useful.
  • 16:09 - 16:18
    PDFs. That PDF says "Hello World" and
    this PDF says "Hello sick sad world",
  • 16:18 - 16:23
    I don't know why, that particular text
    in the demo.
  • 16:24 - 16:27
    Again, run that through normal diff
    program… garbage.
  • 16:28 - 16:34
    XML documents. Again, it'll pretty print
    them so it's nice, actually nice to read.
  • 16:36 - 16:42
    If you want to get started on diffoscope,
    the very easiest and quickest way to do it is
  • 16:42 - 16:48
    fire up a web browser, try.diffoscope.org,
    select your files, press Compare
  • 16:48 - 16:55
    and it'll upload them and run diffoscope
    with all the support for all the file formats
  • 16:55 - 16:59
    in the cloud for you and give you a nice
    HTML page that you can then link to people
  • 16:59 - 17:01
    So that's the very quickest way to get
    started.
  • 17:02 - 17:07
    The next quickest way is to install
    trydiffoscope and then you run that
  • 17:07 - 17:10
    on two files and it'll basically do
    the same thing,
  • 17:10 - 17:12
    run it in the same cloud service as
    trydiffoscope
  • 17:13 - 17:17
    but it'll give you the result on the
    command line or
  • 17:17 - 17:22
    if you pass the webbrowser option, it will
    give you a URL or load your web browser,
  • 17:22 - 17:25
    I can't remember exactly which, with
    the same results.
  • 17:25 - 17:30
    This is 1kB of Python, nothing basically.
  • 17:31 - 17:33
    That's the next easiest way.
  • 17:34 - 17:37
    But you can then install diffoscope itself
    on your own machine.
  • 17:38 - 17:43
    I recommend not installing recommends
    because all of those file formats
  • 17:43 - 17:47
    might drag in extra things about
    the whole of TeX,
  • 17:47 - 17:52
    I think the whole of OpenOffice, whole
    of Mono, whole Java…
  • 17:57 - 17:58
    Android, yeah, quite big.
  • 18:02 - 18:03
    I think there's another big one I can't
    think of.
  • 18:05 - 18:11
    They're all optional, and they all say
    "By the way, I support TeX documents
  • 18:12 - 18:13
    or whatever, Mono, whatever.
  • 18:14 - 18:19
    But you need to install this package and
    then you get full pretty printed support",
  • 18:20 - 18:21
    And it'll tell you that when it's missing.
  • 18:22 - 18:25
    So, if you just start with
    --install-recommends disabled,
  • 18:26 - 18:29
    run it on your files, and if it says
    "please install this package", you can then
  • 18:29 - 18:31
    install them as you go along, as you want,
  • 18:32 - 18:34
    rather than installing everything.
  • 18:35 - 18:38
    And then you just pass ??? files
    and then it works as before.
  • 18:42 - 18:46
    How can you improve your own
    quality assurance and Debian packaging
  • 18:46 - 18:47
    with diffoscope?
  • 18:48 - 18:51
    The biggest value here is not
    necessarily for reproducible builds.
  • 18:52 - 18:56
    It's basically for seeing where you
    do want to have a diff, or are expecting a diff,
  • 18:57 - 19:00
    and you are expecting a particular type
    of diff in a particular way;
  • 19:01 - 19:02
    you can basically see those changes.
  • 19:04 - 19:12
    And if you build two debs normally and
    ... I'll try to demo in a second.
  • 19:12 - 19:16
    You build a deb without a patch applied and
    then build a deb with the patch applied;
  • 19:17 - 19:20
    you can ??? run a diff on the source package.
  • 19:21 - 19:24
    But that's not very useful because the
    binaries are what is going to end up on
  • 19:25 - 19:31
    people's machines. But if you run a diff on
    the binary itself: did my change actually
  • 19:31 - 19:33
    hit the binary? I think really ...
    No..
  • 19:36 - 19:39
    I'll just run through a very live demo, of
    course, so it's gonna fail ...
  • 20:04 - 20:07
    Checkout some .... We'll get this
    libnetx-java
  • 20:11 - 20:12
    We just build that once
  • 20:16 - 20:19
    Let's say we are on the security team and
  • 20:19 - 20:23
    want to apply a patch, and we want to be
    really sure because we are about to push it out
  • 20:23 - 20:24
    to all our users.
  • 20:25 - 20:29
    First we will make a changelog
  • 20:38 - 20:39
    Closing a bug
  • 20:48 - 20:55
    Find some java file to change
  • 20:56 - 20:57
    Let's pretend we have a real patch
  • 21:06 - 21:11
    Let's replace that equals equals,
    say that was the fix
  • 21:14 - 21:16
    So that's the patch from upstream
  • 21:16 - 21:17
    Upstream blast patch
  • 21:24 - 21:27
    When we build this, what we wanna see is
    just that change in the file;
  • 21:27 - 21:32
    we don't wanna see any nonsense changes
    ???, but we also definitely want
  • 21:32 - 21:37
    to see that change, because if our binary,
    for security reasons, doesn't have that change
  • 21:37 - 21:42
    then we aren't fixing people's machines,
    they will issue a DSA ??? installed ???
  • 21:45 - 21:49
    And you should do proper testing as well
    at multiple levels
  • 21:53 - 21:54
    I will build that again
  • 22:24 - 22:30
    So we wanna diff the original one 0 5,
  • 22:30 - 22:36
    We wanna diff that one with a fake
    security one
  • 22:38 - 22:43
    You see on the progress bar 100%,
    1 - there are differences (there should be
  • 22:44 - 22:46
    differences).
    Let's see what those differences are
  • 22:48 - 22:52
    in our web browser, it's a nice HTML output.
  • 23:01 - 23:04
    Let's have a look.
    Are we seeing what we wanna see?
  • 23:07 - 23:11
    There are some changes in the data tar, we
    kind of expect that.
  • 23:14 - 23:18
    What's changed in our control file?
    Well, the version changed, we wanted that
  • 23:19 - 23:20
    to change. Perfect.
  • 23:21 - 23:24
    And it's changed to ???
    That's what we wanna see.
  • 23:25 - 23:28
    No other changes here so there was no
    weird control or in magic going on
  • 23:32 - 23:38
    In our data tar a couple of timestamp
    changes, we will ignore those for now.
  • 23:41 - 23:45
    The changelog has changed, well I hope so
    because I have changed that entry
  • 23:49 - 23:52
    Here is where we're going to start seeing it.
    We are going to see the change in the
  • 23:52 - 23:59
    jar file, which is the Java class, Java
    compiled archive format.
  • 24:00 - 24:06
    We are seeing some meaningless timestamp
    changes but we can ignore those,
  • 24:07 - 24:09
    let's pretend, because it's just
    metadata maybe.
  • 24:16 - 24:24
    Ok, part of a class, so if you can see here
    it's basically a decompilation of the
  • 24:25 - 24:32
    Java file itself and it's basically saying
    "oh, I used to say ifnull and now ifnonnull".
  • 24:32 - 24:36
    So these are the actual Java
    bytecode instructions
  • 24:36 - 24:39
    and what is really ??? here
    is that nothing else has changed.
  • 24:40 - 24:45
    We were just expecting that change between
    the two opcodes, ifnull and ifnonnull,
  • 24:46 - 24:50
    which is good because it hasn't made any
    other code changes, but also, crucially, we can
  • 24:50 - 24:52
    see that it has actually made a change
    to the code.
  • 24:55 - 24:58
    For example, it wasn't using some cached
    version or something like that.
  • 24:58 - 25:00
    This is really useful
  • 25:00 - 25:05
    And just running a naive diff wouldn't
    give that of course, because it would just
  • 25:05 - 25:08
    come up with binary garbage.
    And just seeing that the diff had changed again
  • 25:09 - 25:13
    ??? have told you anything, because all of the
    ??? would have changed as well.
  • 25:13 - 25:16
    So it's like, well, yes, it's different.
  • 25:16 - 25:19
    The meaningful change there is
    what actually fixes the flaw,
  • 25:20 - 25:21
    ??? but we know it's there
  • 25:23 - 25:27
    That's kind of ???
    Shipping this deb out, I'd be quite
  • 25:28 - 25:30
    confident that this fixed the
    actual bug.
  • 25:31 - 25:35
    I'd be quite confident pushing that out
    because it's a very minimal amount of changes;
  • 25:35 - 25:37
    you wanna do that for security reasons.
  • 25:37 - 25:40
    So this was the live demo
  • 25:43 - 25:48
    The other one is seeing no changes
    at all, so you can build once
  • 25:48 - 25:50
    if your build is reproducible.
  • 25:50 - 25:55
    You can build once, change your compiler
    or change some other part of your toolchain,
  • 25:56 - 26:02
    build it again, and if you get the exact same
    results, well great, that's what you intended.
  • 26:03 - 26:05
    You wanna see no changes when you change
    some part of it.
  • 26:08 - 26:12
    And that is really useful: if there were
    changes, diffoscope will highlight them
  • 26:12 - 26:16
    and show exactly why they had changed,
    maybe some compiler optimizations,
  • 26:16 - 26:18
    maybe some other things as well.
  • 26:19 - 26:23
    So you can use it in both ways, when you
    expect changes and when you don't expect
  • 26:23 - 26:27
    changes, and if those match the expectations
    diffoscope will tell you exactly why
  • 26:30 - 26:34
    It's all ??? when other companies
    are doing security releases
  • 26:35 - 26:41
    naming no names whatsoever,
    but they like to release patches as you
  • 26:42 - 26:45
    know just a new firmware for your router
  • 26:47 - 26:51
    Very large file system images,
    you basically have no idea what changed
  • 26:51 - 26:55
    between these two files; again, you run them
    through diff: completely useless.
  • 26:55 - 26:59
    You can start to unpack them with
    squashfs and blah blah blah.
  • 27:01 - 27:06
    But they're probably sort of concatenated
    cpio archives, so that's nonsense.
  • 27:07 - 27:12
    But diffoscope would just chew through those
    and give you what the differences actually
  • 27:12 - 27:15
    are between these two files, and say
    they changed this, they've removed or
  • 27:16 - 27:19
    added some GPL-licensed code or something
    kind of interesting.
  • 27:24 - 27:31
    So it's very useful for diffing those kinds of
    binary blobs that come from various people.
  • 27:33 - 27:37
    So the current state of diffoscope,
    the development is up and down
  • 27:41 - 27:51
    It started around May 2014, something like that.
    A bunch of work here, that's idle I think.
  • 27:55 - 27:57
    These are just for DebConfs basically.
  • 28:09 - 28:12
    Anyway, it's going up and down, it's kind
    of interesting.
  • 28:15 - 28:19
    ??? a lot of reproducible builds projects
    of course, so every time we do a build
  • 28:20 - 28:25
    on the ??? reproducible builds
    testing framework, we run diffoscope
  • 28:25 - 28:30
    on the result; if it's reproducible it
    just says, hey, the file is the same.
  • 28:31 - 28:37
    But if not, we publish the diffoscopes of
    all your packages that are unreproducible,
  • 28:37 - 28:41
    so you can just go there and be like
    "what's the difference between these two things?"
  • 28:54 - 29:02
    I invested a lot of work optimizing
    diffoscope, ??? rather perverse n-squared
  • 29:02 - 29:08
    loops inside it. So I managed to cut down
    some of the time here, cut down here.
  • 29:11 - 29:14
    There have been quite a few performance
    enhancements over the past ...
  • 29:16 - 29:21
    These are the git tags, this is version 80
    and this is version 50. I just ran the same
  • 29:22 - 29:23
    benchmark across them all.
  • 29:25 - 29:35
    So this shows when I have introduced some
    rather stupid code, embarrassing, but whatever.
  • 29:36 - 29:36
    ???
  • 29:37 - 29:41
    There's work being done right now
    on parallel processing; there have been
  • 29:41 - 29:46
    quite a few attempts before, but adding it
    is kind of interesting and difficult.
  • 29:47 - 29:52
    Luckily we have an outreach student,
    Liliana, is she in the room? Is she hiding?
  • 29:53 - 29:57
    She's here and she'll be talking tomorrow
    about her work on parallel processing in
  • 29:58 - 30:02
    diffoscope and that will be amazing because
    a lot of it is I/O bound or waiting for external
  • 30:02 - 30:07
    processes; with multiple-CPU machines,
    you might as well just ???:
  • 30:07 - 30:12
    while I'm standing waiting for the result
    of a PDF being unpacked, I may as well
  • 30:12 - 30:17
    be running on another CPU. I think we are
    going to see some real performance wins
  • 30:18 - 30:23
    as we get that parallel processing merged and
    working and ???
  • 30:24 - 30:30
    You can check out our website diffoscope.org
    recently migrated to Salsa .... yeeaahhh
  • 30:33 - 30:38
    And everything that's reproducible is now
    on Salsa, it's kind of cool
  • 30:39 - 30:42
    That's quite recent...
    ???
  • 30:45 - 30:46
    Thank you very much, danke schön
  • 30:47 - 30:49
    You got any questions?
    About diffoscope?
  • 30:52 - 30:54
    Thank you very much !
  • 30:54 - 30:58
    [Applause]
  • 31:00 - 31:03
    Q: A buzzword question, can you diff container
    image formats?
  • 31:05 - 31:15
    A: Depends which ones. So if they are just
    directories, then yes, because it's just a directory.
  • 31:15 - 31:17
    Do you have one particularly in mind? Like Docker?
  • 31:19 - 31:25
    Yes, there's Docker and then there's
    OCI, which I believe is the standard one.
  • 31:27 - 31:31
    And that could make it buzzword compliant.
  • 31:31 - 31:33
    Ah ok, we were all about buzzwords.
  • 31:34 - 31:37
    Probably diffoscope blockchain as well.
  • 31:38 - 31:42
    And then run diffoscope on containers and
    see the difference between updates of your
  • 31:42 - 31:43
    container images
  • 31:44 - 31:46
    BAM ... solved
    Where do I invest?
  • 31:48 - 31:57
    I wasn't aware of OCI... is that what it's
    called? No, it doesn't support that right now.
  • 31:58 - 32:02
    But it wouldn't be too difficult, presuming
    there are tools to unpack it. As soon as
  • 32:02 - 32:08
    we have a tool to unpack it, it can then
    just go to that. There is an open wishlist
  • 32:08 - 32:15
    bug for Docker containers, to the
    point where I think it would be really
  • 32:16 - 32:19
    nice if you could just give it, say, two
    image names, or whatever the noun is.
  • 32:20 - 32:24
    So you can say "please diff these two
    Docker images that are available" and
  • 32:24 - 32:29
    it can look at your local thing and do
    a diff on them. Currently it's not
  • 32:29 - 32:31
    supported, but there is an open wishlist
    bug.
  • 32:32 - 32:37
    Q: Shouldn't any company that releases
    binaries be interested in supporting
  • 32:37 - 32:39
    diffoscope and using it?
  • 32:52 - 32:58
    A1: Basically, when companies release binaries they are not interested in users seeing differences...
  • 33:02 - 33:10
    A2: Yes, I'm surprised that the
    Docker bug was only opened two months ago
  • 33:11 - 33:17
    and that there hasn't been more interest in diffing
    container images. But if you'd like to open
  • 33:18 - 33:24
    one for OCI, that would be very appreciated,
    and we can get on to that. That would be
  • 33:25 - 33:26
    great.
  • 33:30 - 33:35
    I was looking at the page for OCI. It says
    it's based on Docker, basically, so
  • 33:36 - 33:40
    you would get OCI for free once you have
    sorted it out for Docker, if you're lucky.
  • 33:48 - 33:52
    The OCI image format is based
    on Docker images.
  • 33:55 - 34:00
    OK, we will sort that out, and it seems like
    we're using Docker more and more
  • 34:00 - 34:01
    in Debian.
  • 34:07 - 34:09
    Any other questions?
  • 34:21 - 34:29
    Q: Out of curiosity, which ??? are you using
    inside? Are you using some bioinformatics
  • 34:30 - 34:33
    algorithm to diff trees efficiently?
  • 34:34 - 34:47
    A: No, it's really naive. All it does is run
    normal diff, the normal diff tools, but
  • 34:47 - 34:59
    it will try to identify files and unpack them
    first. So it uses the file utility, the identifier
  • 35:00 - 35:07
    thing that says it's a PDF, and tries to
    unpack it first. It doesn't do any clever
  • 35:07 - 35:12
    matching. The only clever matching that it does
    do is fuzzy matching, so if you just
  • 35:12 - 35:19
    rename a directory between the two inside a
    container, it will say, "Yeah, there's a
  • 35:19 - 35:24
    massive fuzzy match between these
    two files", and things like that. So that's
  • 35:24 - 35:31
    kind of useful, but apart from that it's not clever,
    which is kind of what you want, because
  • 35:31 - 35:34
    if it's too clever it would start to be a little
    opaque...
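The "identify, unpack, then run a normal diff" approach described in this answer can be sketched roughly as follows. This is an illustration under stated assumptions, not diffoscope's code: the function names are hypothetical, a check of leading magic bytes stands in for the file utility, and only gzip and zip are handled.

```python
import difflib
import gzip
import io
import zipfile


def identify(data):
    """Guess the container type from leading magic bytes (stand-in for `file`)."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:4] == b"PK\x03\x04":
        return "zip"
    return "plain"


def unpack(data):
    """Return {member name: bytes} for a recognised container, else {}."""
    kind = identify(data)
    if kind == "gzip":
        return {"<gzip content>": gzip.decompress(data)}
    if kind == "zip":
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return {name: archive.read(name) for name in archive.namelist()}
    return {}


def naive_diff(a, b, label="input"):
    """Unpack both sides while possible, then fall back to a plain line diff."""
    if a == b:
        return []
    members_a, members_b = unpack(a), unpack(b)
    if members_a and members_b:
        # Both sides are containers: recurse into each member by name.
        # Simplification: differences that exist only in container
        # metadata (timestamps, ordering) are not reported here.
        lines = []
        for name in sorted(set(members_a) | set(members_b)):
            lines += naive_diff(
                members_a.get(name, b""), members_b.get(name, b""), label + "/" + name
            )
        return lines
    # Nothing left to unpack: compare as text, line by line.
    text_a = a.decode("utf-8", "replace").splitlines()
    text_b = b.decode("utf-8", "replace").splitlines()
    return list(difflib.unified_diff(text_a, text_b, label, label, lineterm=""))
```

The comparison step itself stays dumb on purpose: all the work is in getting down to something a normal diff can read.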
  • 35:38 - 35:40
    I personally like dumb tools.
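As a rough illustration of the fuzzy matching mentioned in the answer above, the sketch below pairs files that exist under different names on each side when their contents are nearly the same. The talk does not say which similarity measure diffoscope uses; `difflib.SequenceMatcher` is swapped in here as an assumption, and `pair_renamed` and its threshold are hypothetical.

```python
from difflib import SequenceMatcher


def pair_renamed(old, new, threshold=0.8):
    """Pair up files that look renamed between two trees.

    `old` and `new` map file names to text content. Names present on only
    one side are compared by content, and each old name is paired with the
    most similar unclaimed new name whose score reaches `threshold`.
    Returns a list of (old name, new name, similarity score) tuples.
    """
    only_old = sorted(set(old) - set(new))
    only_new = sorted(set(new) - set(old))
    pairs = []
    for name_a in only_old:
        best, best_score = None, threshold
        for name_b in only_new:
            matcher = SequenceMatcher(None, old[name_a], new[name_b], autojunk=False)
            score = matcher.ratio()
            if score >= best_score:
                best, best_score = name_b, score
        if best is not None:
            pairs.append((name_a, best, best_score))
            only_new.remove(best)  # each new name is claimed at most once
    return pairs
```

Once two names are paired, their contents can be diffed against each other instead of being reported as one deleted file and one added file.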
  • 35:44 - 35:51
    Q: So one question to you is whether,
    if you want to do a release to stable or
  • 35:52 - 35:59
    something like that, you can ask for the
    debdiff. I'm wondering if anyone...
  • 35:59 - 36:04
    I mean, I remember doing that myself.
    I've been submitting diffoscope output
  • 36:04 - 36:10
    as well, because it's just more readable and
    useful. So I'm not sure if anyone has any
  • 36:10 - 36:13
    objection to people asking for those.
  • 36:22 - 36:25
    I'll propose that to the release team
    and see what they say.
  • 36:26 - 36:29
    Thank you very much.
    Are there any other questions?
  • 36:33 - 36:37
    No further questions? Then let's thank
    Chris again!
  • 36:37 - 36:42
    [Applause]
Title:
https:/.../diffoscope.webm
Video Language:
English
Team:
Debconf
Project:
2018_mini-debconf-hamburg
Duration:
36:48