Next talk

Chris and Holger are going to talk to us again

about reproducible builds and tell us 
where they're up to.

Thanks very much

The outline of this talk is from last year 
we realised there were a lot of questions.

The rough plan is to quickly go over 
what reproducible builds are

I guess everyone is up to speed

but getting everyone on the same page 
would be a good idea.

Then Holger's going to jump in
and give the status update

and then we're going to talk about 
future work, questions etc

What is the actual problem we're 
solving here?

You can always inspect the source code of 
free software for malicious flaws

or just flaws as well.

Unfortunately distributions provide 
precompiled binaries to end users.

So can you actually trust this 
compilation process has not

introduced flaws of its own?

The problem is it seems very effective if 
you want to go after end users

you can go after developers. 
Because if you go infect a developer's

machine you will then infect all the 
users of the software they generate.

Financial incentives. There always were 
but they are even more so these days

with mobile phones etc.

You can also have very subtle flaws. 
This one in particular there was a

root exploit in OpenSSH just by changing 
a compare equal.

That sort of assembler jump thing and it 
gives you root

but with only a single bit difference in 
the binary.

Which is not too shabby.

Then you have all sorts of cute demos 
where you load up the source code in VIM

and it just looks like 'Hello world' but 
when you compile it with GCC

your kernel rootkit just goes 'oh I'm 
going to give you a different file'

and self-replicates, and things like that.

Difficult to trust the process.

And there's some recent history as well 
around Xcodeghost and iOS

and adverts and things like that.

You can Google those things. 
Really scary stuff.

The last example is actually coming from 
a CIA design paper from 2012.

Which was then found in the wild in 2014. 
So these exploits are actually happening.

People are targeting developers to get 
users.

Xcodeghost had 20 million user 
installations.

It was probably not the CIA or NSA but 
we don't know who it was.

There are many people who do these 
exploits in the wild.

Yeah it's not just 'Here's this cute 
thing we can talk about'.

It's actually happening.

The motivation is to ensure no flaws are 
introduced during the build process.

We do this by ensuring the build always 
produces identical results.

Then multiple parties do the same thing.

I build it, you build it, your friends 
build it etc

An attacker would need to infect 
everyone simultaneously

otherwise they'd be detected.
For example if my machine was compromised

I would suddenly come up with a 
different result.

I would come up with different binaries.

And you'd be 'what's going on here' and 
eventually we would discover

that my machine was rootkitted etc.

You probably know it but identically 
means bit by bit identical.

As that is really the same.

Yeah, bit, SHA, MD5 whatever you want.
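
As a rough illustration of what "bit by bit identical" means in practice, here is a minimal Python sketch that compares two build artifacts by their SHA-256 digests. The function names `sha256_of` and `builds_match` are hypothetical and are not taken from any tool mentioned in the talk.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(first: Path, second: Path) -> bool:
    """Two artifacts count as identical only if their digests are equal."""
    return sha256_of(first) == sha256_of(second)
```

If my build and your build of the same source give different digests, a single changed bit is enough to show up here, which is the point of the whole exercise.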

There are a bunch of challenges here. 
The biggest one is timestamps.

A lot of software just loves to include 
timestamps everywhere.

Documentation, the __DATE__
and __TIME__ macros.

Just all over the place, in file names etc
Things like that.
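
One widely used way to tame embedded timestamps is the `SOURCE_DATE_EPOCH` environment variable, which carries a fixed Unix timestamp for the build. The sketch below assumes a build step written in Python that wants to stamp its output; `build_timestamp` and `banner` are hypothetical names, and the "Generated on" line is only an example of the kind of text that documentation generators tend to embed.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp() -> datetime:
    """Prefer SOURCE_DATE_EPOCH over the wall clock, always in UTC."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    seconds = int(epoch) if epoch is not None else int(time.time())
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def banner() -> str:
    # Hypothetical "Generated on ..." line of the kind docs often embed.
    return "Generated on " + build_timestamp().strftime("%Y-%m-%d %H:%M:%S UTC")
```

With the variable set to the same value on every machine, the banner stops changing between builds; without it, the sketch falls back to the current time and behaves like the unreproducible original.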

Builds often vary by locale and timezone.

Different new lines, different sorting 
orders for example collations.

Different versions of libraries. I'm not 
sure what this refers to exactly.

Moving on.

Non-deterministic file ordering for 
example Shell Globs are not really defined

to be, I say not really defined they 
aren't defined to come out in normal order.

Also the readdir syscall, it doesn't actually 
promise any particular ordering.
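
A minimal sketch of the usual fix, assuming a build script in Python that collects its inputs from a directory tree: sort explicitly instead of trusting whatever order the filesystem hands back. The name `source_files` is hypothetical.

```python
import os
from pathlib import Path
from typing import List


def source_files(root: Path) -> List[str]:
    """List files under root in a stable, locale-independent order."""
    found = []
    for directory, subdirs, files in os.walk(root):
        subdirs.sort()  # fix the traversal order in place
        for name in files:
            found.append(Path(directory, name).relative_to(root).as_posix())
    # Sort by raw code points, not by the current locale's collation.
    return sorted(found)
```

Python's plain `sorted` on strings compares code points, so the result does not depend on the locale of the build machine.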

Dictionary/hash key ordering. So this is 
in things like Perl and Python

where you use a dict or a hash. If you iterate over 
the keys of that it's a non-deterministic order.

If your build system loops over such a 
hash or a dictionary

then the results from this build could be 
non-reproducible and non-deterministic.
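
A small sketch of the defensive pattern, assuming a hypothetical build step that writes out a dependency manifest. In current Python, dicts keep insertion order, but sets of strings can still iterate in a different order from one process to the next because string hashing is randomised by default, so the sketch sorts before emitting anything. `render_manifest` is an invented name.

```python
def render_manifest(dependencies: set, versions: dict) -> str:
    """Emit one 'name=version' line per dependency in sorted order."""
    lines = []
    for name in sorted(dependencies):  # never iterate the bare set
        lines.append(f"{name}={versions[name]}")
    return "\n".join(lines) + "\n"
```

The same idea applies to Perl hashes: sort the keys before looping over them whenever the loop writes something into the build output.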

And also things like files in the part of 
the build process will just absorb

stuff from the surrounding environment 
like umask and all that kind of

stuff that lives outside there.
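
To make the umask point concrete, here is a sketch that packs a directory into a tar archive while discarding metadata inherited from the build machine: owner, modification time and the permission bits that the umask influenced. It uses the `filter` argument of `tarfile.TarFile.add` from the Python standard library; the names `normalise` and `deterministic_tar`, and the choice of 0644/0755 as canonical modes, are assumptions made for this example.

```python
import io
import tarfile


def normalise(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Strip metadata inherited from the build machine."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mtime = 0
    # Ignore whatever the umask produced; keep only the executable bit.
    if info.isdir() or info.mode & 0o100:
        info.mode = 0o755
    else:
        info.mode = 0o644
    return info


def deterministic_tar(root: str) -> bytes:
    """Pack root into an uncompressed tar with normalised metadata."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as archive:
        # Since Python 3.7, add() recurses into directories in sorted order.
        archive.add(root, arcname=".", filter=normalise)
    return buffer.getvalue()
```

Two checkouts of the same files, created under different umasks and at different times, should then produce byte-identical archives.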

Build paths is a very interesting one which 
we cover in greater detail on another slide.

Also specifying the environment, we'll also
cover this one in the build info slides.

So not only are there privacy and security
advantages of using,

moving towards reproducible builds there
are also technical advantages.

It's faster to build if you basically 
keep hitting cache.

I'm pretty certain this is why Google are 
interested in it.

Because of the amount of 
compilation they do

they're just going to save a whole bucket 
load of money just by

'Oh we don't need to rebuild this because 
it's the same SHA' etc
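
The caching argument can be sketched as a toy content-addressed cache: if the inputs hash to a key that has been seen before, the stored output is reused instead of rebuilding. This is only an illustration with invented names (`BuildCache`, `compile_fn`) and is not how any particular build system is implemented. It is only sound when the build is deterministic, which is exactly what reproducible builds provide.

```python
import hashlib


class BuildCache:
    """Toy in-memory cache keyed by the SHA-256 of the build inputs."""

    def __init__(self):
        self._outputs = {}
        self.builds_run = 0

    def build(self, source: bytes, compile_fn) -> bytes:
        key = hashlib.sha256(source).hexdigest()
        if key not in self._outputs:
            # Cache miss: do the expensive work once and remember it.
            self._outputs[key] = compile_fn(source)
            self.builds_run += 1
        return self._outputs[key]
```

Calling `build` twice with the same source runs `compile_fn` once; changing a single byte of the source produces a new key and a fresh build.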

It's very nice to test revisions and 
changes. I use all our tools

when doing QA uploads or NMUs you 
rebuild a package

and then you compare to the previous one. 
And as the only things that have changed

should be the things that you've changed, 
there haven't been all sorts

of random other nonsense being 
reordered with timestamps added.

You can get rid of all that noise and 
just be 'oh yeah brilliant I can see that

the patch I've applied here has actually 
changed the behaviour of the program'

and only that. It hasn't done all sorts 
of weird, weird stuff.

So you have safer uploads in that sense.

Speaking of safety a reproducible build 
won't go talking to the Internet

like a lot of modern package managers 
like to do. Maven-style ones.

Also a reproducible build will typically 
not have any

non-deterministic failure modes.

So there's a lot of tests and test suites 
in Debian that will

try and test things like 'Oh is this 
algorithm N squared or bigger than N'.

And it will try doing that by running 
some sort of benchmark

and fail if it doesn't meet some sort of 
arbitrary time difference and

that's obviously not reliable. So we 
get rid of all those nonsense things.

It also finds bugs in really weird 
locales. We build in French,

Swiss-French, and it just comes up with 
all sorts of nonsense.

Or timezones, if you build in UTC-12 then 
this date library doesn't work anymore

and it's like 'you had one job 
to be a date library'.

[audience laughter]

It's pretty scary and some pretty 
cute bugs.

It also detects if your machine is, you 
just have a broken ??? [8:28].

We build a year and a month in the 
future. You find things like the

maintainer has added a pre-generated SSL
certificate to their tests and

it expires in the year. And so it breaks.

We're preemptively detecting that fail to 
build from source.