I'm here today to talk to you about
diffoscope
and how you can use it as a better diff
or for Quality Assurance, etc., things
like that.
Moin!
Apparently that's like a North German
thing to say "welcome".
North German, north Denmark, Scandinavia,
that kind of thing, I'm told.
People are shaking their heads, so I'm
going to assume that's true.
This is my first PC, an IBM 5155.
Sometimes, when you rebooted it, it would
somehow revert
from booting from the hard disk to booting
from a BASIC ROM,
as in the programming language ROM.
It was on my motherboard for some reason.
So, randomly, you'd just get a chance to
program in BASIC and then,
sometimes you wouldn't, I don't know why,
but… yeah.
It's quite fun with this kind of clicky
keyboard, and that folded in
and it was this kind of big desk thing.
Anyway…
This is my first Debian.
At the time it was already old.
What's this one? Is this Slink? 2.2?
Yeah.
And this is when we had US and non-US,
so that's really dating if you remember that.
This is my first contribution to Debian,
19th December 2006,
sending a patch to lilypond, which is kind
of interesting
and the response was "Oh yeah, rock on,
many thanks. I'll upload this and
it'll be landing to Etch".
And this was super motivating because
Etch was just coming out and it was like
"Great, I've got a tiny one-line patch
in a release. This is super cool."
Thomas' response was super motivating.
So, after that, like that Christmas
basically spent ???
Debian webpages and stuff.
Very well timed.
That's kind of a good…
You know, someone sends a patch, be like
"Cool, thanks"
Like a little notice in the changelog.
It was, you know, so stupid but…
Yeah, do that kind of thing.
So, moving on.
Why diffoscope?
Why did we write diffoscope?
What's the background here?
It comes from reproducible builds.
The very quick outline is that once you
get the source code for free software,
you download the source code for nginx
or whatever,
pretty much everyone just runs binaries
on their servers or their systems.
You know, "apt install bla", "yum install",
whatever.
Android Play Store, whatever.
Can you actually trust whether these two
things correspond with each other?
You've gotten the source code, it looks
alright, and then you install this binary,
yeah…
Who generated that? Can you trust that
process?
Can you trust who generated it?
Even if you could trust them, could you
trust them not to be exploited? Etc.
This is a big problem because you can
exploit a build farm,
you know, get a trojan into the build farm,
so every single binary that comes out
is compromised.
Kind of problematic.
You could also target individual developers'
machines,
so I could go off to, say, your machine,
add a backdoor to it,
so every binary that you give to friends
and things like that
is compromised in some way, stealing
your bitcoins or whatever.
I can also ???
and blackmail you into producing
software that has compromises or extra
features, shall we say,
that don't exist in the source code.
So what will happen there is that you'd
release your source
and the binaries you produce have
this sort of backdoor that, you know,
someone is forcing you into producing.
So, you don't want to do that.
Anyway
enough of that.
What you do for reproducible builds is you
ensure that every time you build
a piece of software, you get an identical
result.
Multiple people then compare their builds
and check whether they all get
the same results
and this means that an attacker must
either have infected everyone
at the same time, or they haven't
infected anyone.
The point here is that you have to ensure
that builds have identical results.
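As a rough sketch of that comparison step, assume each builder hands over the artifact it built and we hash them all. The builder names, the artifact contents and the `builds_agree` helper are made up for illustration; this is not how any particular project implements it.

```python
import hashlib


def digest(artifact: bytes) -> str:
    """Return the SHA-256 hex digest of one built artifact."""
    return hashlib.sha256(artifact).hexdigest()


def builds_agree(artifacts: dict[str, bytes]) -> bool:
    """True only if every builder produced a bit-for-bit identical artifact."""
    return len({digest(data) for data in artifacts.values()}) == 1


# Hypothetical builders and artifact contents, for illustration only.
honest = {"alice": b"binary-v1", "bob": b"binary-v1", "carol": b"binary-v1"}
tampered = {"alice": b"binary-v1", "bob": b"binary-v1+backdoor", "carol": b"binary-v1"}

print(builds_agree(honest))    # True
print(builds_agree(tampered))  # False
```

A single disagreeing builder is enough to flag the build, which is why an attacker would have to compromise everyone at once.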
Ok, great.
So, we started the reproducible builds
project, etc.
And we build 2 debs.
Oh, I'm sorry about the colors there.
You probably can't see that.
That says "sha1sum a.deb b.deb".
Anyway, we're comparing the sha1sums
of 2 binary Debian files.
So, these two files differ.
Ok, they're not reproducible.
Why is that?
So we run a diff on them.
Yeah…
So, what can we learn from this?
Well, not very much. Visibly they're
compressed, so
as soon as there's one change, the changes
just cascade through the rest of the file,
because that's how compression works.
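To make the cascade concrete, here is a small experiment using Python's `zlib`. It is only a stand-in for whatever compression the real packages use, and the "build log" input is invented. It flips one byte of the input and counts how many bytes differ before and after compression; the exact counts will depend on the zlib version.

```python
import zlib

# Made-up build log standing in for package contents.
lines = [f"compiled module_{i:03d}.c in {i * 37 % 101} ms" for i in range(200)]
original = "\n".join(lines).encode()

# Flip exactly one bit in the first byte of the input.
scratch = bytearray(original)
scratch[0] ^= 0x01
modified = bytes(scratch)


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where a and b differ, plus any length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


raw_changes = differing_bytes(original, modified)
compressed_changes = differing_bytes(zlib.compress(original), zlib.compress(modified))

print("changed bytes before compression:", raw_changes)
print("changed bytes after compression: ", compressed_changes)
```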
I guess we know it's a deb ???
format file, not very useful.
Ok, great so we're gonna have a look in
We'll do a binary diff and ok, well…
Again, that's not really telling us
very much
with the diff there.
Ok, great.
???
"ar x" is on the new maintainer thing,
"how you unpack a deb"
Everyone remembers this, right?
You unpack a.deb with "ar x" and you
do that to b.deb
and then we diff the results of that.
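For reference, this is roughly what "ar x" has to do, sketched in Python. It assumes the common ar layout (an 8-byte magic followed by 60-byte member headers) and does not handle GNU long file names. The `make_ar` helper and the placeholder payloads exist only so the example runs without a real .deb.

```python
import io

AR_MAGIC = b"!<arch>\n"


def list_ar_members(blob: bytes) -> list[tuple[str, bytes]]:
    """Return (name, data) pairs from a common-format ar archive."""
    stream = io.BytesIO(blob)
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    members = []
    while True:
        header = stream.read(60)
        if len(header) < 60:
            break  # end of archive
        if header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        # Name is space-padded; GNU ar also appends a trailing slash.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))
        data = stream.read(size)
        if size % 2:
            stream.read(1)  # member data is padded to an even length
        members.append((name, data))
    return members


def make_ar(members: list[tuple[str, bytes]]) -> bytes:
    """Build a tiny ar archive in memory (illustration helper)."""
    out = bytearray(AR_MAGIC)
    for name, data in members:
        # Fields: name(16) mtime(12) uid(6) gid(6) mode(8) size(10)
        out += f"{name:<16}{0:<12}{0:<6}{0:<6}{'100644':<8}{len(data):<10}".encode("ascii")
        out += b"`\n" + data
        if len(data) % 2:
            out += b"\n"
    return bytes(out)


# Placeholder payloads; a real .deb would carry actual tarballs here.
a_deb = make_ar([
    ("debian-binary", b"2.0\n"),
    ("control.tar.xz", b"placeholder"),
    ("data.tar.xz", b"placeholder-data"),
])
for name, data in list_ar_members(a_deb):
    print(name, len(data))
```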
Ok, so…yeah, 7zip.
Ok, compressed content, not very useful.
Ok, so let's unpack the control.tar inside
these debs.
And then we run diff on that.
Still not really telling us anything useful
about how to make this package reproducible.
So let's unpack the tar.xz into the tar.
Inside that tar, there's a file called
md5sums and we start to see some differences
between some files in these two debs.
??? meaningful, so now
we have some idea that
it has something to do with this
usr/bin/pmixer binary.
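That md5sums comparison is easy to automate. A minimal sketch, assuming the usual "hash, whitespace, path" line layout; the hash values and the second path below are invented, only usr/bin/pmixer comes from the example above.

```python
def parse_md5sums(text: str) -> dict[str, str]:
    """Map each path to its checksum from 'hash  path' lines."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        checksum, path = line.split(maxsplit=1)
        entries[path] = checksum
    return entries


def changed_paths(a: str, b: str) -> list[str]:
    """Paths whose checksum differs, or that appear on one side only."""
    first, second = parse_md5sums(a), parse_md5sums(b)
    return sorted(p for p in first.keys() | second.keys() if first.get(p) != second.get(p))


# Invented checksums, for illustration only.
md5sums_a = (
    "11111111111111111111111111111111  usr/bin/pmixer\n"
    "22222222222222222222222222222222  usr/share/doc/pmixer/copyright\n"
)
md5sums_b = (
    "33333333333333333333333333333333  usr/bin/pmixer\n"
    "22222222222222222222222222222222  usr/share/doc/pmixer/copyright\n"
)

print(changed_paths(md5sums_a, md5sums_b))  # ['usr/bin/pmixer']
```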
Ok, interesting.
We'll unzip that and then we do a diff on
pmixer itself.
Now we're back into just binary
??? mode
This isn't very helpful and this is taking
quite a while
and if I remember correctly, Debian has
a lot of packages.
So this might take a little while.
So, basically, ??? meme
I should build a better diff.
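The manual routine above, unpack, diff, unpack again, can be sketched as a recursive comparison. This is a toy under loud assumptions: it only understands tar archives (via Python's `tarfile`, which also reads compressed tarballs), it uses tar as a stand-in for the outer ar layer, and it is not diffoscope's actual implementation. The `make_tar` helper and file contents are made up.

```python
import io
import tarfile


def open_tar(blob: bytes):
    """Return {member name: bytes} if blob is a tar archive, else None."""
    try:
        with tarfile.open(fileobj=io.BytesIO(blob)) as archive:
            return {m.name: archive.extractfile(m).read() for m in archive if m.isfile()}
    except tarfile.TarError:
        return None


def compare(a: bytes, b: bytes, path: str = "") -> list[str]:
    """Descend into nested archives and report the innermost paths that differ."""
    if a == b:
        return []
    left, right = open_tar(a), open_tar(b)
    if left is None or right is None:
        return [path or "<top level>"]  # cannot unpack further
    report = []
    for name in sorted(left.keys() | right.keys()):
        child = f"{path}/{name}" if path else name
        if name not in left or name not in right:
            report.append(child + " (only in one side)")
        else:
            report.extend(compare(left[name], right[name], child))
    return report


def make_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed tar in memory (illustration helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Two made-up "packages" that differ only in one nested binary.
control = make_tar({"control": b"Package: pmixer\n"})
pkg_a = make_tar({"control.tar": control, "data.tar": make_tar({"usr/bin/pmixer": b"build-1"})})
pkg_b = make_tar({"control.tar": control, "data.tar": make_tar({"usr/bin/pmixer": b"build-2"})})

print(compare(pkg_a, pkg_b))  # ['data.tar/usr/bin/pmixer']
```

The point of the sketch is the shape: keep unpacking while both sides are containers, and only report a difference at the deepest level you can reach.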