Stretching out for trustworthy reproducible builds: creating bit-by-bit identical binaries

0:01 - 0:02 Welcome and good morning.
0:04 - 0:07 This is the reproducible builds team, talking about
0:07 - 0:10 "Stretching out towards trustworthy computing".
0:12 - 0:20 [Applause]
0:22 - 0:26 We're 4 on stage, but actually this is a team effort.
0:26 - 0:31 All these people listed here have contributed to the project at one point.
0:31 - 0:33 The 4 of us, that's
0:33 - 0:34 Lunar, that's me,
0:34 - 0:35 there's Dhole,
0:35 - 0:36 Chris Lamb, a.k.a. lamby,
0:36 - 0:38 and Holger.
0:39 - 0:43 But actually, this is DebConf, and so a lot more of us have been or are
0:43 - 0:47 currently here, and so, if you want to thank anybody who is working on this,
0:47 - 0:49 you need to actually thank all of these folks,
0:49 - 0:51 'cause, yay.
0:51 - 0:56 [Applause]
0:57 - 1:00 [Holger] The people in blue are here.
1:04 - 1:06 [Lunar] Let's get started.
1:06 - 1:08 Quick recap on what we're talking about.
1:08 - 1:11 We have software; it's made from source.
1:11 - 1:15 Source is readable by humans, or at least by a good number of humans.
1:15 - 1:17 In this room it's good.
1:17 - 1:24 Binary is readable by computers and by some tiny fraction of humanity.
1:24 - 1:30 Going from source to binary is called the build, or building, or compiling,
1:30 - 1:33 and we're doing free software, and free software is awesome because
1:33 - 1:38 we can actually run these binaries however we want.
1:38 - 1:44 We can actually study the software, how it's been made, by studying the source,
1:44 - 1:49 and by studying the source we can assess that it does what it's supposed to do
1:49 - 1:51 and not something else, that it does not
1:51 - 1:56 have malware, or trojans, or security bugs.
1:56 - 2:01 So we have the binary that can be used, fine.
2:01 - 2:04 We have the source that can be verified.
2:04 - 2:10 The problem is that right now, the only way we know that a binary that we get…
2:10 - 2:16 We have to trust a website or a Debian repository that says
2:16 - 2:18 "Well, this binary has been made with this source".
2:18 - 2:23 But there's no way we can actually prove that.
2:23 - 2:27 This is actually a problem that was well explained by
2:27 - 2:34 Mike Perry and Seth Schoen at 31C3 in Hamburg last December.
2:34 - 2:41 For example, Seth Schoen made a proof-of-concept exploit for the Linux kernel:
2:41 - 2:52 when GCC was called, the kernel would, without modifying anything on the disk…
2:52 - 2:59 when the kernel detects that GCC is going to read a C file, it will insert some
2:59 - 3:06 extra lines of code, and these lines of code can do very bad things,
3:06 - 3:09 as in the 31C3 talk I was just recalling.
3:09 - 3:18 Actually, you can even have developers who are in very good faith, who have
3:18 - 3:21 totally secure dev machines, or think they have,
3:21 - 3:24 who have reviewed all their source code for any bugs,
3:24 - 3:31 and we would still get totally owned as soon as their computer gets compromised,
3:31 - 3:34 or one of the Debian build daemons gets compromised, for example.
3:34 - 3:41 These are not, like, hypothetical threats we're discussing here.
3:41 - 3:46 A couple of months after Seth and Mike's talk at 31C3,
3:46 - 3:49 The Intercept revealed from the Snowden leaks
3:49 - 3:56 that at a CIA conference in 2012, one of the talks that happened
3:56 - 3:59 was about a project called Strawhorse.
3:59 - 4:05 Strawhorse is about modifying Apple Xcode, which is the development environment
4:05 - 4:09 for Mac OS X and iOS applications,
4:09 - 4:11 and well, they were modifying Xcode so it would produce,
4:11 - 4:13 without the developer knowing,
4:13 - 4:23 binaries with trojans, malware, ??? binaries, lots of bad things.
4:23 - 4:25 So, the solution:
4:25 - 4:29 enable anyone to reproduce identical binary packages from a given source.
4:29 - 4:35 Because if, using the same source and the same environment,
4:35 - 4:40 multiple people on different computers, on different networks, at different times,
4:40 - 4:43 can all get the same thing from the same source,
4:43 - 4:45 all the same binary, byte for byte,
4:45 - 4:47 then there's a good chance that…
4:47 - 4:55 Well, everybody could be owned, but let's be more joyful and say that
4:55 - 4:59 probably, if everybody gets the same result, there was actually no problem
4:59 - 5:01 and everybody is safe.
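The check described above boils down to "do independent builds produce the same bytes?". A minimal Python sketch, assuming each builder has a copy of the artifact as a local file; the helper names `sha256_of` and `builds_match` are hypothetical, chosen for illustration only:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def builds_match(first: Path, second: Path) -> bool:
    """True only if two independently built artifacts are byte-for-byte identical."""
    return sha256_of(first) == sha256_of(second)
```

Comparing digests rather than the files themselves means people on different machines only need to publish and compare a short string; any single differing byte changes the digest.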
5:02 - 5:04 We call that solution "reproducible builds".
5:07 - 5:08 Yay.
5:08 - 5:11 [Applause]
5:13 - 5:15 Actually, it's not only about security.
5:15 - 5:19 For Debian: if you're doing "Multi-Arch: same" packages,
5:19 - 5:25 well, they must have the same bytes even when they are built for different architectures,
5:25 - 5:28 the files in the package.
5:28 - 5:34 Debug packages: you can create them at a later time, if you forgot to have debug packages
5:34 - 5:36 in the first place:
5:36 - 5:42 you can pass the nostrip option later, and because the package is reproducible,
5:42 - 5:47 you will get debug symbols that work for software that has already been shipped.
5:47 - 5:50 We do early detection of FTBFS that way,
5:50 - 5:54 because if we try pretty quickly to reproduce a build,
5:54 - 5:55 then it has to work.
5:55 - 5:58 It's useful for build profiles.
5:58 - 6:02 We can get smaller .deb deltas,
6:02 - 6:05 because from one version to the next we might have the same content.
6:05 - 6:09 We can do validation of cross-builds;
6:09 - 6:12 Helmut Grohne can talk to you about that.
6:12 - 6:17 And also, Niels Thykier told me that
6:17 - 6:21 he was very interested in reproducible builds because it would enable him to
6:21 - 6:24 test debhelper better, because
6:24 - 6:29 if the package builds reproducibly, then when he makes a change to debhelper,
6:29 - 6:32 he can rebuild the package ???
6:32 - 6:36 the same version of a package with a newer debhelper and see what has changed,
6:36 - 6:40 and this change can be isolated to only what he has worked on in debhelper,
6:40 - 6:42 for example.
6:43 - 6:45 And, oh my.
6:45 - 6:48 The whole world is watching us.
6:48 - 6:56 For the last two years, or year and a half, everybody I meet at security conferences,
6:56 - 6:59 at hacker conferences, at free software conferences is like
6:59 - 7:01 "Oh, you're working on that, that's awesome."
7:01 - 7:09 And, I mean, I've been the one doing quite a lot of talks, and everybody comes to me,
7:09 - 7:11 and I'm like "Wow wow, this is way bigger",
7:11 - 7:16 but we're actually leading the field here.
7:16 - 7:19 Yay Debian.
7:19 - 7:26 [Applause]
7:26 - 7:29 [Holger] So, we are not the only ones leading the field.
7:29 - 7:33 Bitcoin and Tor made their software reproducible before us.
7:33 - 7:37 Coreboot also succeeded: if you build Coreboot without any payload,
7:37 - 7:39 it's 100% reproducible.
7:39 - 7:44 FreeBSD has had a page on their wiki since 2013
7:44 - 7:49 saying there are 5 reproducibility issues in their base system.
7:49 - 7:52 At the moment we're trying to confirm this.
7:52 - 7:57 On jenkins.debian.net, I've now also set up tests for FreeBSD, NetBSD,
7:57 - 7:59 Coreboot and OpenWrt.
7:59 - 8:03 So if you go to reproducible.debian.net/
8:03 - 8:05 you can see those tests.
8:05 - 8:08 And there's more in the pipeline.
8:08 - 8:11 Other projects are interested as well.
8:11 - 8:15 NetBSD also has a variable ??? which you can set,
8:15 - 8:17 and then it builds reproducibly.
8:17 - 8:20 Though they think "I'm keeping some timestamps ??? and then
8:20 - 8:22 filtering them out later".
8:22 - 8:23 We disagree.
8:23 - 8:28 So this is what Debian looks like, Debian Sid,
8:28 - 8:30 but this is a lie.
8:30 - 8:32 This is not the truth.
8:32 - 8:34 This is just our test setup.
8:34 - 8:36 Sid is not like this.
8:36 - 8:40 For Sid, it's all orange; there's zero reproducibility in Sid today.
8:40 - 8:44 But what we'll talk about now, and in the following round table,
8:44 - 8:47 is how to actually make Sid reproducible.
8:47 - 8:52 The current status is:
8:52 - 8:58 we've been working on this in Debian for two years now.
8:58 - 9:02 We've had weekly reports about our project since May,
9:02 - 9:07 and we've given several talks, especially in the last year,
9:07 - 9:11 and all these talks, presentations, and other material are linked in the wiki.
9:11 - 9:15 There's a page with information about Debian, these BSDs,
9:15 - 9:19 other Linuxes, upstream ???, all on this wiki.
9:23 - 9:27 Since DebConf14, which is barely a year ago,
9:27 - 9:29 we've made quite a few changes.
9:29 - 9:33 We have introduced strip-nondeterminism,
9:33 - 9:39 which is called by dh at the end of the package build
9:39 - 9:45 and normalizes some things, which Chris will explain later.
9:45 - 9:50 We have decided on a fixed build path,
9:50 - 9:54 because the build path leaks into the binaries and several other places.
9:54 - 9:57 We haven't yet found a way to make the build path arbitrary.
9:57 - 10:03 We designed a way to record the build environment,
10:03 - 10:08 because to rebuild, you need to recreate the build environment.
10:08 - 10:12 We set up this Jenkins setup.
10:12 - 10:17 We wrote diffoscope, which used to be called debbindiff,
10:17 - 10:21 which shows differences between two packages, or two directories, or
10:21 - 10:24 by now even two filesystems.
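diffoscope goes much further than this: it unpacks archives recursively and shows a readable diff of what changed inside them. As a rough illustration of only the first step, finding which files differ between two unpacked build trees, here is a Python sketch; `tree_digests` and `compare_trees` are hypothetical names, not diffoscope's API:

```python
import hashlib
from pathlib import Path

def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file's path (relative to root, POSIX form) to its SHA-256."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests

def compare_trees(a: Path, b: Path) -> dict[str, list[str]]:
    """Report files only in a, only in b, and present in both with different contents."""
    da, db = tree_digests(a), tree_digests(b)
    return {
        "only_in_a": sorted(da.keys() - db.keys()),
        "only_in_b": sorted(db.keys() - da.keys()),
        "differ": sorted(k for k in da.keys() & db.keys() if da[k] != db[k]),
    }
```

A real tool then has to explain *why* a file in the "differ" list changed (an embedded date, a build path, a different file order), which is where diffoscope's format-aware comparisons come in.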
10:24 - 10:31 There's SOURCE_DATE_EPOCH, which is a way to expose to the tools
10:31 - 10:34 the date of the last modification of the source.
10:34 - 10:37 Because of the build date: people want to include the build date
10:37 - 10:39 because they think it is a meaningful indication
10:39 - 10:42 of when a build was done and which software was used.
But if the build always recreates the same results,
the build date becomes meaningless,
and the really interesting thing is the latest modification of the source.
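A minimal Python sketch of how a tool can honor SOURCE_DATE_EPOCH, assuming the variable holds an integer number of seconds since the Unix epoch. The helper name `build_timestamp` is hypothetical, and rejecting malformed values reflects our reading of the published specification rather than anything stated in the talk:

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional

def build_timestamp(environ: Optional[Mapping[str, str]] = None) -> datetime:
    """Return the timestamp a tool should embed in its output.

    If SOURCE_DATE_EPOCH is set, it is read as integer seconds since the Unix
    epoch; otherwise the current time is used. The result is always in UTC, so
    the builder's TZ setting cannot leak into it.
    """
    env = os.environ if environ is None else environ
    value = env.get("SOURCE_DATE_EPOCH")
    if value is None:
        return datetime.now(timezone.utc)
    # Reject malformed values instead of silently falling back to "now",
    # which would quietly reintroduce non-determinism.
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"invalid SOURCE_DATE_EPOCH: {value!r}")
    return datetime.fromtimestamp(int(value), tz=timezone.utc)
```

For example, `build_timestamp({"SOURCE_DATE_EPOCH": "1440000000"}).isoformat()` returns `'2015-08-19T16:00:00+00:00'` no matter when or where it runs.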
We have written patches for the tools.
[Lunar] strip-nondeterminism: is Andrew Ayer in the audience?
Yay! He did it!
It's written in Perl because we didn't want to add a new build dependency
to all Debian packages.
Basically it takes anything and tries to normalize it as much as it can,
replacing timestamps or file permissions, or removing some issues.
It works very well on many formats, and it's meant to be extensible,
so we can actually add more things, and it's run by dh at the end of the process, as Holger said.
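strip-nondeterminism itself is a Perl tool; the following is not its code, only a small Python sketch of the kind of normalization described above (replacing timestamps and file permissions), applied to a ZIP archive. The fixed date, the 0644/0755 modes and the sorted entry order are assumptions chosen for illustration:

```python
import io
import zipfile

# Assumed normalization targets: ZIP timestamps cannot predate 1980-01-01,
# so that is used as the fixed date; permissions are forced to 0644 / 0755.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)

def normalize_zip(data: bytes) -> bytes:
    """Rewrite a ZIP archive so that entry order, timestamps and permissions
    no longer depend on when, where or in which order files were added."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in sorted(src.namelist()):
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.create_system = 3  # always record "Unix", whatever the host OS
            info.compress_type = zipfile.ZIP_DEFLATED
            if name.endswith("/"):
                info.external_attr = (0o40755 << 16) | 0x10  # directory entry
            else:
                info.external_attr = 0o644 << 16  # rw-r--r--
            dst.writestr(info, src.read(name))
    return out.getvalue()
```

Two archives with the same contents but different timestamps and insertion order normalize to identical bytes, as long as the same zlib version does the compression.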
The .buildinfo is currently a proposal we have not yet totally agreed on,
but we are generating them as part of the tests we have,
and basically it's a new control file that ties together the sources, the generated binary,
the packages that were used to build this binary, and their versions.
The idea is that we can use this file to reinstall all the specific versions from snapshot,
so we recreate the same build environment; then we can just start the build from the source
that was mentioned and see if the binary that has been generated matches.
What it looks like for now: you see there is the source, the binary, and the build path,
because currently we don't have any good post-processing tool for build paths
in ELF and DWARF binaries, so we just decided to specify the build path; when we do
a later rebuild we use that path and are safe.
The source is ???, the binary is the .deb, and there is a list of packages with their versions.
We currently use the base-files version to know which Debian release is to be used
as the basis.
[Holger] The general procedure for testing is: we build the source, we save the results,
we modify the environment, we build it again, and we compare the results.
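The build, vary, rebuild, compare loop can be sketched in a few lines of Python. This is not the Jenkins setup's code: it treats a build command's standard output as its artifact, and the two variation sets (timezone, locale, user name) are assumed values loosely modelled on the variations mentioned in the talk:

```python
import os
import subprocess

# Hypothetical variation sets; the real test setup varies more than this.
VARIATION_A = {"TZ": "Etc/GMT+12", "LANG": "C", "USER": "builder1"}
VARIATION_B = {"TZ": "Etc/GMT-14", "LANG": "C.UTF-8", "USER": "builder2"}

def run_build(cmd: list[str], variation: dict[str, str]) -> bytes:
    """Run a build command with the variables overlaid on the environment; return its stdout."""
    env = {**os.environ, **variation}
    return subprocess.run(cmd, env=env, check=True, capture_output=True).stdout

def is_reproducible(cmd: list[str]) -> bool:
    """Build twice under different environments and compare the results byte for byte."""
    return run_build(cmd, VARIATION_A) == run_build(cmd, VARIATION_B)
```

A command that prints a constant passes, while one that embeds `$USER` in its output fails; a real package build would compare the produced .deb files instead of stdout.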
That was a shell script last year, which I put on Jenkins, and then it exploded a bit,
and now we have 67 Jenkins jobs running on 7 hosts.
Since last week we have 4 small armhf boards, where we will be able to test armhf,
but very slowly.
We have two new amd64 build nodes.
The code is now split into Python and bash scripts.
For all the other distro testing there's a lot of bash code now, which is mostly
boilerplate: it's 5 lines or something to build FreeBSD and 5 lines to build NetBSD,
but there are 100 lines of boilerplate around them, so it's really not that much code.
We test Testing, Unstable and Experimental.
For arm we are starting with Unstable only.
We do like hardware, so if you have hardware to donate to us, that would be great;
we basically need ssh and then root.
We are testing Coreboot, OpenWrt and the BSDs; soon I will also set up a Fedora test.
I don't want to test all 20,000 Fedora packages, just 200 or so:
the base system of Fedora, to examine how rpm works,
to really get the whole Free Software world reproducible.
This all runs on ProfitBricks hardware since 2012, so thanks to ProfitBricks.
These are the variations we do for Debian:
the hostname, username, timezone, and locale.
Chris will explain what modifications these cause, the variances...
At the moment we are not testing differences in the date, so the date is always the same;
only the time is a bit different.
[Lunar] Well, almost! Because we cheat with the timezone: we use one timezone that is
GMT-14 and then GMT+12, so they are more than 24 hours apart.
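The timezone trick can be checked with a little arithmetic. This Python sketch uses fixed-offset stand-ins for the two zones and assumes the talk's names follow the POSIX/`Etc` convention, where the sign is inverted (`Etc/GMT+12` is UTC-12, `Etc/GMT-14` is UTC+14); `local_dates` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the two test timezones. In POSIX "Etc/GMT±N"
# names the sign is inverted: Etc/GMT+12 is UTC-12, Etc/GMT-14 is UTC+14.
UTC_MINUS_12 = timezone(timedelta(hours=-12))
UTC_PLUS_14 = timezone(timedelta(hours=14))

def local_dates(instant: datetime) -> tuple[str, str]:
    """Format one instant as a local calendar date in each test timezone."""
    return (instant.astimezone(UTC_MINUS_12).strftime("%Y-%m-%d"),
            instant.astimezone(UTC_PLUS_14).strftime("%Y-%m-%d"))

# The offsets are 26 hours apart, so the local date always differs.
SPREAD = UTC_PLUS_14.utcoffset(None) - UTC_MINUS_12.utcoffset(None)
```

Because 26 hours is more than a day, any package that embeds the local date gets a different value in the two builds, even when both builds run at the same moment; at 11:00 UTC the two local dates are even two days apart.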
[Holger] On the first of the month we sometimes find new bugs, where there are packages which record the month.
We don't have variations of the CPU type at the moment.
We'll have both time and CPU type variations in about one or two weeks;
the nodes are being prepared at the moment.
Then we will test all the meaningful variations we could think of.
There will probably be some packages which build differently depending on the
number of CD drives attached, or whatever, but those will be found by you.
[Lunar] We are doing all these tests because we want that, when you rebuild a package on
your machine, even if anything is different from the Debian build daemons, you get
the same results.
We use this to detect these problems early, before you actually a ??? that we have
to investigate when someone rebuilds a package on their machine.