Stretching out for trustworthy reproducible builds creating bit by bit identical binaries
-
0:01 - 0:02Welcome and good morning
-
0:04 - 0:07This is the reproducible builds team,
talking about -
0:07 - 0:10"Stretching out towards trustworthy
computing" -
0:12 - 0:20[Applause]
-
0:22 - 0:26We're 4 on stage, but actually this is a
team effort. -
0:26 - 0:31All these people listed here have
contributed to the project at one point. -
0:31 - 0:33The 4 of us, that's
-
0:33 - 0:34Lunar − me
-
0:34 - 0:35there's Dhole,
-
0:35 - 0:36Chris Lamb − lamby
-
0:36 - 0:38and Holger.
-
0:39 - 0:43But actually, this is DebConf and so a lot
more of us have been or are -
0:43 - 0:47currently here and so, if you want to
thank anybody that is working on this -
0:47 - 0:49you need to actually thank all of
these folks -
0:49 - 0:51'cause, yay.
-
0:51 - 0:56[Applause]
-
0:57 - 1:00[Holger] The people in blue are here.
-
1:04 - 1:06[Lunar] Let's get started.
-
1:06 - 1:08Quick recap on what we're talking
about. -
1:08 - 1:11We have software, it's made from source.
-
1:11 - 1:15Source is readable by humans or at least
a good amount of humans. -
1:15 - 1:17In this room it's good.
-
1:17 - 1:24Binary, readable by computer and some
tiny fraction of humanity. -
1:24 - 1:30Going from source to binary is called
build, or like building or compiling -
1:30 - 1:33and we're doing free software and
free software is awesome because -
1:33 - 1:38we can actually run these binaries like
we want -
1:38 - 1:44We can actually study the software, how
it's been made by studying the source -
1:44 - 1:49and by studying the source we can assess
that it does what it's supposed to do -
1:49 - 1:51and not something else that does not
-
1:51 - 1:56have malware, or trojans or security bugs
-
1:56 - 2:01So we have the binary that can be used,
fine. -
2:01 - 2:04We have the source that can be verified.
-
2:04 - 2:10Problem is that right now, the only way we
know that a binary that we get… -
2:10 - 2:16We have to trust a website or a Debian
repository that says -
2:16 - 2:18"Well, this binary has been made with this
source" -
2:18 - 2:23But there's no way we can actually prove
that. -
2:23 - 2:27This is actually a problem that has been
well explained by -
2:27 - 2:34Mike Perry and Seth Schoen at the 31c3
in Hamburg last December. -
2:34 - 2:41For example, Seth Schoen made a proof of
concept exploit for the Linux kernel -
2:41 - 2:52that when GCC was called, the kernel would
without modifying anything on the disk -
2:52 - 2:59when the kernel detects that GCC is going
to read a C file, it will insert some -
2:59 - 3:06extra lines of code, and these lines of
code can be a very bad thing -
3:06 - 3:09in the case of 31c3 talk I was just
recalling. -
3:09 - 3:18Actually, you can even have developers
who are in very good faith, who have -
3:18 - 3:21totally secure dev machines, or they
thought they have, -
3:21 - 3:24who have reviewed all their source code
for any bugs -
3:24 - 3:31and we would still get totally owned as
soon as their computer gets compromised -
3:31 - 3:34or one of the build daemons from Debian
gets compromised for example. -
3:34 - 3:41This is not, like, hypothetical threats
here we're discussing -
3:41 - 3:46A couple of months after Seth and Mike's
talk at 31c3, -
3:46 - 3:49the Intercept revealed from the Snowden
leaks -
3:49 - 3:56that at a CIA conference in 2012, one
of the talks that happened -
3:56 - 3:59was about a project called Strawhorse.
-
3:59 - 4:05Strawhorse is about modifying Apple Xcode,
which is the development environment -
4:05 - 4:09for Mac OS X and iOS applications
-
4:09 - 4:11and well, they were modifying Xcode so
it would produce, -
4:11 - 4:13without the developer knowing,
-
4:13 - 4:23binaries with trojans, malware,
watermarked binaries, lots of bad things. -
4:23 - 4:25So, solution:
-
4:25 - 4:29enable anyone to reproduce identical
binary packages from a given source. -
4:29 - 4:35Because if using a source, using the same
environment, -
4:35 - 4:40multiple people on different computers, on
different networks, at different times, -
4:40 - 4:43can all get the same thing
from the same source -
4:43 - 4:45all the same binary, byte for byte,
-
4:45 - 4:47then there's a good chance that…
-
4:47 - 4:55Well, everybody could be owned,
but let's be more joyful and say that -
4:55 - 4:59probably, if everybody gets the same
result, there was actually no problem -
4:59 - 5:01and everybody is safe.
-
5:02 - 5:04We call that solution
"reproducible builds" -
5:07 - 5:08Yay.
-
5:08 - 5:11[Applause]
-
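The idea just described can be sketched in a few lines of Python: several independent builders each produce an artifact, and the build counts as reproduced only if every digest matches. This is a minimal illustration, not the project's tooling; the builder names and byte strings below are made up.

```python
import hashlib
from typing import Mapping


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a build artifact as hex."""
    return hashlib.sha256(data).hexdigest()


def builders_agree(artifacts: Mapping[str, bytes]) -> bool:
    """True when every builder produced byte-for-byte identical output."""
    digests = {sha256_hex(blob) for blob in artifacts.values()}
    return len(digests) == 1


# Hypothetical outputs from three independent rebuilders.
same = {"alice": b"\x7fELF-demo", "bob": b"\x7fELF-demo", "buildd": b"\x7fELF-demo"}
tampered = dict(same, buildd=b"\x7fELF-demo-with-extra-code")

print(builders_agree(same))      # True
print(builders_agree(tampered))  # False
```

As the speaker notes, agreement is strong evidence rather than proof: if every builder is compromised in the same way, the digests still match.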
5:13 - 5:15Actually, it's not only about security.
-
5:15 - 5:19For Debian, we have, if you're doing
"Multi-arch: same" packages, -
5:19 - 5:25well, they need to have the same bytes when
they are built for different architectures, -
5:25 - 5:28the files in the package.
-
5:28 - 5:34Debug packages, you can create at a later
time, if you forgot to have debug packages -
5:34 - 5:36in the first place,
-
5:36 - 5:42you can pass the no-strip option later and
because the package is reproducible, -
5:42 - 5:47you will get the debug symbols that work
for software that has been shipped already -
5:47 - 5:50We do early detection of FTBFS that way
-
5:50 - 5:54because if we try pretty quickly
to reproduce a build, -
5:54 - 5:55then it has to work.
-
5:55 - 5:58It's useful for build profiles.
-
5:58 - 6:02We can get smaller .deb deltas,
-
6:02 - 6:05because from one version to the next we
might have the same content. -
6:05 - 6:09We can do validation of cross-builds,
-
6:09 - 6:12Helmut Grohne can talk to you about that.
-
6:12 - 6:17And also, Niels Thykier told me that
-
6:17 - 6:21he was very interested in reproducible
builds because it would enable him to -
6:21 - 6:24test debhelper better, because
-
6:24 - 6:29if the package builds reproducibly,
then he makes a change to debhelper, -
6:29 - 6:32then he can rebuild
-
6:32 - 6:36the same version of a package with a newer
debhelper and see what has changed -
6:36 - 6:40and this change can be isolated to only
what he has worked on in debhelper -
6:40 - 6:42for example.
-
6:43 - 6:45And, oh my.
-
6:45 - 6:48The whole world is watching us.
-
6:48 - 6:56Since two years or a year and a half ago,
everybody I meet in security conference, -
6:56 - 6:59in hacker conference, in free software
conference is like -
6:59 - 7:01"Oh you're working on that,
that's awesome." -
7:01 - 7:09And, I mean, I've been the one doing quite
a lot of talks, and everybody comes to me -
7:09 - 7:11and I'm like "Wow wow, this is way bigger",
-
7:11 - 7:16but we're actually leading the field here.
-
7:16 - 7:19Yay Debian.
-
7:19 - 7:26[Applause]
-
7:26 - 7:29[Holger] So, we are not the only ones
leading the field, -
7:29 - 7:33Bitcoin and Tor made their software
reproducible before us, -
7:33 - 7:37Coreboot also succeeded, if you build
Coreboot without any payload, -
7:37 - 7:39that's 100% reproducible.
-
7:39 - 7:44FreeBSD has a page on their wiki since
2013 -
7:44 - 7:49saying there are 5 reproducibility issues
in their base system. -
7:49 - 7:52We're at the moment trying to
confirm this. -
7:52 - 7:57On jenkins.debian.net, I've also set up
now tests for FreeBSD, NetBSD, -
7:57 - 7:59Coreboot and OpenWrt.
-
7:59 - 8:03So if you go to
reproducible.debian.net/ -
8:03 - 8:05you get that tested.
-
8:05 - 8:08And there's more in the pipeline.
-
8:08 - 8:11There are other projects interested
as well. -
8:11 - 8:15NetBSD also has a variable MKREPRO
which you can set -
8:15 - 8:17and that builds reproducibly.
-
8:17 - 8:20Though they think "keeping some
timestamps is fine, and then -
8:20 - 8:22filtering them out later".
-
8:22 - 8:23We disagree.
-
8:23 - 8:28So this is what Debian looks like,
Debian Sid, -
8:28 - 8:30but this is a lie.
-
8:30 - 8:32This is not the truth.
-
8:32 - 8:34This is just our test setup.
-
8:34 - 8:36Sid is not like this.
-
8:36 - 8:40For Sid, it's all orange, there's zero
reproducibility in Sid today. -
8:40 - 8:44But what we'll talk about now and in the
following round table -
8:44 - 8:47is how to actually make Sid reproducible.
-
8:47 - 8:52The current status is
-
8:52 - 8:58we've been working on this in Debian for
two years. -
8:58 - 9:02We have weekly reports about our project
now since May -
9:02 - 9:07and we've given several talks, especially
in the last year -
9:07 - 9:11and all these talks, presentations, also
other stuff is linked in the wiki. -
9:11 - 9:15There's a page with information about
Debian, these BSDs, -
9:15 - 9:19other Linuxes, upstream softwares
all on this wiki. -
9:23 - 9:27Since DebConf14, which is merely
a year ago, -
9:27 - 9:29we've made quite some changes.
-
9:29 - 9:33We have introduced
strip-nondeterminism -
9:33 - 9:39which is called by dh at the end
of the build of the package -
9:39 - 9:45and will normalize some things
which Chris will explain later -
9:45 - 9:50We have decided on a fixed build path
-
9:50 - 9:54because the build path is leaked
in the binaries and several things -
9:54 - 9:57We didn't find a way yet to make
the build path arbitrary. -
9:57 - 10:03We designed a way to record the build
environment -
10:03 - 10:08because to rebuild, you need to recreate
the build environment. -
10:08 - 10:12We set up this Jenkins setup.
-
10:12 - 10:17We wrote diffoscope which used to be
called debbindiff -
10:17 - 10:21which shows differences between two
packages or two directories or -
10:21 - 10:24two filesystems by now.
-
10:24 - 10:31There's SOURCE_DATE_EPOCH, which is a way
that the tools expose -
10:31 - 10:34the last modification of the source.
-
10:34 - 10:37Because the build date, people want to
include the build date -
10:37 - 10:39because they think this is a
meaningful indication: -
10:39 - 10:42when a build was done,
which software used. -
10:42 - 10:46But if the build always recreates
the same results -
10:46 - 10:47the build date becomes meaningless
-
10:47 - 10:51and the really interesting thing is
the latest modification of the source. -
10:52 - 10:56We have written patches for the tools
-
10:58 - 11:04[Lunar] strip-nondeterminism:
is Andrew Ayer in the audience? -
11:04 - 11:06Yay! He did it!
-
11:06 - 11:12It's written in Perl because we didn't
want to have a new build dependency -
11:12 - 11:14in all Debian packages.
-
11:14 - 11:18Basically it takes anything and tries
to normalize it as much as it can -
11:18 - 11:27replacing timestamps or file permissions
or removing some issues. -
11:27 - 11:31It's working very well on many formats,
it's meant to be extensible -
11:31 - 11:38so we can actually add more things and
it's run by dh at the end of the process, as Holger said. -
11:38 - 11:45The .buildinfo is currently a proposal
we have not yet totally agreed -
11:45 - 11:49but we are generating them as part
of the test we have -
11:49 - 11:57and basically it's a new control file that
will tie the sources, the generated binary -
11:57 - 12:01the packages that were used to build this
binary and their version. -
12:01 - 12:09The idea is that we can use this file to
reinstall all the specific versions from snapshot -
12:09 - 12:17So we recreate the same build environment
then we can just start the build from that source -
12:17 - 12:21that was mentioned and see if the binary
that has been generated matches. -
12:23 - 12:28What it looks like for now, you see there is
a source binary, the build path -
12:28 - 12:34because currently we don't have any good
post-processing tool for buildpaths -
in ELF and DWARF binaries, we just decided
to specify the build path so when we do -
12:41 - 12:45a later rebuild we use that path and be safe.
-
12:45 - 12:52The source is .dsc, the binary is .deb and
a list of packages with the versions. -
12:53 - 13:02We currently use the base-files version
to know which Debian release is to be used -
13:02 - 13:04as the basis.
-
13:11 - 13:18[Holger] The general procedure for testing is:
we build the source, we save the results, -
13:18 - 13:23we modify the environment and we build
it again and compare the results. -
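The procedure just described (build, save, vary the environment, build again, compare) can be sketched as below. This is only an illustration under stated assumptions: a "build" is modelled as a Python function taking an environment mapping and returning bytes, and the environment values are invented, not the ones used on jenkins.debian.net.

```python
import hashlib
from typing import Callable, Mapping

Build = Callable[[Mapping[str, str]], bytes]


def is_reproducible(build: Build, env_a: Mapping[str, str],
                    env_b: Mapping[str, str]) -> bool:
    """Build twice under different environments and compare the digests."""
    first = hashlib.sha256(build(env_a)).hexdigest()   # first build, result saved
    second = hashlib.sha256(build(env_b)).hexdigest()  # second build, varied env
    return first == second


# Hypothetical variations of hostname, username, timezone and locale.
ENV_A = {"HOSTNAME": "node1", "USER": "builder1", "TZ": "Etc/GMT+12", "LANG": "C"}
ENV_B = {"HOSTNAME": "node2", "USER": "builder2", "TZ": "Etc/GMT-14",
         "LANG": "fr_CH.UTF-8"}


def clean_build(env: Mapping[str, str]) -> bytes:
    """A build whose output does not depend on the environment."""
    return b"hello world\n"


def leaky_build(env: Mapping[str, str]) -> bytes:
    """A build that records who built it, which breaks reproducibility."""
    return ("built by " + env["USER"] + "\n").encode()


print(is_reproducible(clean_build, ENV_A, ENV_B))  # True
print(is_reproducible(leaky_build, ENV_A, ENV_B))  # False
```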
13:23 - 13:32That started as a shell script last year which I
put on jenkins and then it exploded a bit -
13:32 - 13:36and now we have 67 jenkins jobs running on
7 hosts. -
13:36 - 13:45Since last week we have 4 armhf small boards
where we will be able to test armhf, -
13:45 - 13:46but very slowly.
-
13:46 - 13:49We have two new amd64 build nodes.
-
13:49 - 13:53The code is now split into Python and bash
scripts. -
13:53 - 13:59For all the other distro testing there's a
lot of bash code now which is mostly -
13:59 - 14:05boilerplate and it's 5 lines or something
to build FreeBSD and 5 lines to build NetBSD -
14:05 - 14:09but there's 100 lines of boilerplate around so it's
really not that much code. -
14:09 - 14:13We do test Testing, Unstable and Experimental.
-
14:13 - 14:16For arm we only start with Unstable.
-
14:16 - 14:22We do like hardware so if you have hardware
to donate to us, that would be great, -
14:22 - 14:25we need ssh and then root basically.
-
14:27 - 14:34We are testing Coreboot, OpenWrt and the
BSD's, soon I will also set up a Fedora test -
14:34 - 14:40I don't want to test all the 20,000 Fedora
packages but just 200 or something: -
14:40 - 14:44the base system of Fedora to examine how
rpm works -
14:44 - 14:48to get really the whole Free Software world
reproducible. -
14:48 - 14:53This is all run on ProfitBricks hardware
since 2012, so thanks to ProfitBricks. -
14:57 - 15:00This is the variations we do for Debian.
-
15:02 - 15:07It's the hostname, username, timezone,
locale. -
15:07 - 15:14Chris will explain what modifications
this causes, variances... -
15:14 - 15:19We are not testing at the moment differences
in date so the date is always the same -
15:19 - 15:20the time is a bit different.
-
15:20 - 15:26[Lunar] Well almost! Because we cheat with
the timezone, we use one timezone that is -
15:26 - 15:32GMT-14 and then GMT+12 so it's more than
24 hours apart. -
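A small sketch of why that timezone trick works: two zones whose offsets are 26 hours apart show different calendar dates for the same instant, so a package that records the local date is caught even when both builds start at the same moment. The instant below is arbitrary, and mapping the two spoken zone names to UTC+14 and UTC-12 is an assumption about the sign convention.

```python
from datetime import datetime, timedelta, timezone

east = timezone(timedelta(hours=14))   # assumed: the zone 14 hours ahead of UTC
west = timezone(timedelta(hours=-12))  # assumed: the zone 12 hours behind UTC

# One arbitrary instant, viewed from both build environments.
instant = datetime(2015, 8, 15, 11, 0, tzinfo=timezone.utc)

print(instant.astimezone(east).date())  # 2015-08-16
print(instant.astimezone(west).date())  # 2015-08-14

gap = east.utcoffset(None) - west.utcoffset(None)
print(gap)  # 1 day, 2:00:00
```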
15:33 - 15:36[Holger] On the first of the month we
sometimes find new bugs where there's -
15:36 - 15:38packages which record the month.
-
15:41 - 15:44We don't have variations of the CPU type
at the moment. -
15:46 - 15:51Both time and CPU type variations, we'll
have them in about one or two weeks -
15:51 - 15:54the nodes are being prepared at the moment.
-
15:54 - 16:01Then we will test all the meaningful
variations we could think of. -
16:01 - 16:05There will probably be some packages which
build differently according to the -
16:05 - 16:11number of CD drives attached or whatever
things, but those will be found by you. -
16:12 - 16:17[Lunar] We are doing all these tests because
we want that when you rebuild a package on -
16:17 - 16:22your machine, even if anything is different from
the build daemons in Debian, you get -
16:22 - 16:23the same results.
-
16:23 - 16:30We use this to detect these problems early,
before you actually get a false positive that we have -
16:30 - 16:34to investigate when someone rebuilds a
package on their machine. -
16:37 - 16:43To understand the difference that we found
from one build to the other. -
16:43 - 16:51It started also as a 10-line shell script
and then it felt okayish -
16:51 - 16:52and so Python!
-
16:52 - 16:58And now it's a lot of code and it actually
grew way beyond a Debian package. -
16:58 - 17:03We changed the name, it was called debbindiff
but it's absolutely not tied to Debian anymore. -
17:03 - 17:07It's called diffoscope, thanks to Jocelyn
for the name. -
17:07 - 17:12Basically what it does: it tries to get to
the bottom of what is different between -
17:12 - 17:14two archives or directories.
-
17:14 - 17:22Because it's not useful to compare bytes that
are compressed by gzip or xz, that will not -
17:22 - 17:27lead you to understand what is different
you need to uncompress and look at -
17:27 - 17:33uncompressed data, and if the thing actually
compressed is a tarball, you might actually -
17:33 - 17:35want to compare the files inside the tarball.
-
17:35 - 17:42If there is a PDF inside this archive, you
don't want to compare the bytes of the PDF -
17:42 - 17:44you want to compare the text of the PDF.
-
17:44 - 17:50So this is basically what diffoscope does,
it tries to transform anything that is -
17:50 - 17:57a container and compare things in this
container and if they can be transformed into -
17:57 - 18:01a human readable form it will try to do
that, and compare these human readable form. -
18:01 - 18:05And if it doesn't find any difference but
there are still differences in the binary, -
18:05 - 18:07it will fall back to binary comparison.
-
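The recursive approach described here can be illustrated with a toy comparer. This is not diffoscope's code, just a sketch of the idea under simplifying assumptions: it only knows one container format (gzip) and one readable form (UTF-8 text), and falls back to a plain "binary differs" message otherwise.

```python
import difflib
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def compare(a: bytes, b: bytes) -> list:
    """Return human-readable difference lines, unpacking gzip containers."""
    if a == b:
        return []
    if a.startswith(GZIP_MAGIC) and b.startswith(GZIP_MAGIC):
        # Look inside the container instead of diffing compressed bytes.
        inner = compare(gzip.decompress(a), gzip.decompress(b))
        return inner or ["gzip metadata differs, contents are identical"]
    try:
        text_a, text_b = a.decode("utf-8"), b.decode("utf-8")
    except UnicodeDecodeError:
        return ["binary contents differ"]  # fallback when no readable form exists
    diff = list(difflib.unified_diff(text_a.splitlines(), text_b.splitlines(),
                                     lineterm=""))
    return diff or ["byte-level difference only"]


old = gzip.compress(b"hello\n", mtime=1)
new = gzip.compress(b"hello\n", mtime=2)      # same content, other timestamp
changed = gzip.compress(b"world\n", mtime=1)  # different content

print(compare(old, new))
for line in compare(old, changed):
    print(line)
```

The first comparison reports that only the container metadata differs; the second prints a unified diff of the uncompressed text, which is far easier to act on than a diff of compressed bytes.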
18:08 - 18:13Try it, extend it; it's Python, it's modular,
it's great. -
18:13 - 18:23It already supports squashfs, ISO, rpm,
gettext, mo files and so many different things. -
18:23 - 18:30You can have HTML output like that,
so this is what is displayed on many -
18:30 - 18:34examples we've shown so far, and also
to make it easier for copy paste -
18:34 - 18:38and post processing we have the text output.
-
18:38 - 18:43You can also use it to review packages before
uploading them to Debian. -
18:43 - 18:49It does fuzzy matching, so even if the
directory is different in the archive it will -
18:49 - 18:52find it like git does.
-
18:52 - 18:59It has grown way beyond just reproducible
builds. A useful tool. -
19:01 - 19:07[Dhole] In order to solve timestamp issues, we are
proposing the SOURCE_DATE_EPOCH variable. -
19:07 - 19:12This is because most of the times having
the build date embedded in a package -
19:12 - 19:16is not useful for the user, because you could
take a really old package and build it today -
19:16 - 19:19and that date would not be useful.
-
19:19 - 19:26We are standardizing a replacement for build
dates so that tools can use it. -
19:26 - 19:32When this value is set, the tool instead of
embedding the current date, it will embed -
19:32 - 19:38the date taken from SOURCE_DATE_EPOCH which
will contain a Unix epoch timestamp. -
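From a tool author's side, honouring the variable could look roughly like the sketch below. The function name `build_timestamp` is made up for illustration; the only thing taken from the talk is that SOURCE_DATE_EPOCH holds a Unix epoch timestamp that replaces the current time when it is set.

```python
import os
import time
from datetime import datetime, timezone
from typing import Mapping


def build_timestamp(environ: Mapping[str, str] = os.environ) -> datetime:
    """Return the date a tool should embed in its output.

    Uses SOURCE_DATE_EPOCH when set, otherwise falls back to the current time.
    """
    epoch = environ.get("SOURCE_DATE_EPOCH")
    seconds = int(epoch) if epoch is not None else int(time.time())
    # Always format in UTC so the local timezone cannot leak into the output.
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(build_timestamp({"SOURCE_DATE_EPOCH": "1439596800"}).isoformat())
# 2015-08-15T00:00:00+00:00
```

Formatting in UTC matters as much as reading the variable: a fixed timestamp rendered in the builder's local timezone would still vary between machines.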
19:38 - 19:43This is a general solution we are trying to
standardize so that not only Debian uses it, -
19:43 - 19:48but other Free Software projects and
distributions and in the case of Debian, -
19:48 - 19:52we set this variable to the latest Debian
changelog entry timestamp. -
19:55 - 20:01We have already been sending patches to
different packages, mostly it's documentation -
20:01 - 20:06generation. So here's a list of bugs that
we have opened which have been closed -
20:06 - 20:12and merged; so it's help2man, epydoc,
ghostscript, texi2html and sphinx. -
20:12 - 20:19We are both sending these patches to Debian
and upstream so all the distributions can -
20:19 - 20:28use them, and we have also been sending
patches to other packages which are still -
20:28 - 20:32open, so we encourage you to take a look
at these packages if you are the maintainer -
20:32 - 20:35and merge the patch.
-
20:36 - 20:41[Lunar] Thanks to Daniel Kahn Gillmor and
Ximin Luo for pushing this proposal forward. -
20:41 - 20:46And also lots of these patches have been
written by Akira and Dhole as part of their -
20:46 - 20:49Google Summer of Code, and your work is really
great. -
20:52 - 20:57[Applause]
-
21:03 - 21:08[Dhole] The gcc patch is: gcc uses two
macros which are __DATE__ and __TIME__ -
21:08 - 21:14which embed the timestamp and I wrote a
patch so that if SOURCE_DATE_EPOCH is set -
21:14 - 21:19instead of adding the current time, it takes
the time from that variable. -
21:19 - 21:26I sent this patch to gcc, it's still there
forgotten with many other patches -
21:26 - 21:29but hopefully at some point they will
realize that this is interesting and they -
21:29 - 21:30will merge it.
-
21:39 - 21:46[Lamby] Hey. Let's very quickly run you
through some really simple ways -
21:46 - 21:50to fix packages. The details don't
necessarily matter, it's just to give you -
21:50 - 21:56an idea of what needs to be changed and basically
to point out that it's not rocket science. -
21:56 - 21:58So you can just come in and jump in.
-
21:58 - 22:08For example gzip, it's a very old tool
and they decided to add timestamps when -
22:08 - 22:12you generate it, but it's an easy fix, you
just add the -n flag. -
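The same fix exists when gzip output is produced from Python rather than the command line: pinning the header timestamp with `mtime=0` gives identical bytes on every run. A minimal sketch (the payload is arbitrary):

```python
import gzip

payload = b"reproducible builds\n"

# A fixed mtime keeps the current time out of the gzip header.
first = gzip.compress(payload, mtime=0)
second = gzip.compress(payload, mtime=0)

print(first == second)  # True
print(first[4:8])       # b'\x00\x00\x00\x00' (the header's timestamp field)
```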
22:12 - 22:20Some other things easy to change: some
Python stuff had tag_date=True, which -
22:20 - 22:25I don't know if you can see it but adds a
timestamp to eggs. You just change it to -
22:25 - 22:26False to get rid of it.
-
22:26 - 22:34Static libraries, they are just ar archives
so the same format as .deb, and you -
22:34 - 22:38can just use binutils or the strip-nondeterminism
tool. -
22:38 - 22:44PNG has timestamps for some reason, you can
get rid of them, that's ImageMagick and it's -
22:44 - 22:49a bit ugly, but also strip-nondeterminism
gets rid of it. -
22:49 - 22:55Tarballs are quite interesting, they will
by default capture user and group -
22:55 - 22:58you just pass --owner=root bla bla bla...
-
22:58 - 23:05Ordering, this is interesting as well, it
will usually use file system ordering -
23:05 - 23:11which is completely non-deterministic. So
you need to sort with LC_ALL=C. -
23:15 - 23:19[Lunar] Think about the locale! Because
sorting order varies from one locale to the next. -
23:23 - 23:28[Lamby] They also take timestamps, again
you can set --mtime or you can muck around -
23:28 - 23:31with find/xargs/touch bla bla...
-
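The three tarball fixes mentioned (fixed owner and group, sorted members, fixed mtime) can be combined in one place. The sketch below uses Python's `tarfile` module instead of the tar command line; the helper name `deterministic_tar` and the sample files are hypothetical, and file permissions are deliberately left as found on disk.

```python
import io
import os
import tarfile
import tempfile


def deterministic_tar(root: str, mtime: int) -> bytes:
    """Pack `root` with sorted members, root ownership and one fixed mtime."""
    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        info.uid = info.gid = 0
        info.uname = info.gname = "root"
        info.mtime = mtime
        return info

    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # sorted() compares code points, so the order ignores the locale.
        for path in sorted(paths):
            tar.add(path, arcname=os.path.relpath(path, root),
                    recursive=False, filter=normalize)
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as tree:
    for name in ("b.txt", "a.txt"):
        with open(os.path.join(tree, name), "w") as handle:
            handle.write(name)
    first = deterministic_tar(tree, mtime=0)
    os.utime(os.path.join(tree, "a.txt"), (1000, 1000))  # simulate a later touch
    second = deterministic_tar(tree, mtime=0)
    print(first == second)
```

In a real package the `mtime` argument would typically come from SOURCE_DATE_EPOCH rather than a constant.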
23:31 - 23:37Lots of other files have timestamps: Erlang
files for no reason, even upstream don't -
23:37 - 23:40know why they added a timestamp.
-
23:42 - 23:49We have now a patch for SOURCE_DATE_EPOCH,
which I think landed a couple days ago. -
23:50 - 23:57Here's an interesting one, not necessarily
the current build timestamp, so this is a -
23:57 - 24:05timezone dependent date which Ruby loads
and then saves incorrectly as your local time. -
24:05 - 24:07This gets mangled, so that's patching.
-
24:07 - 24:15I'm going from changing individual packages
to more toolchain things as you can see. -
24:15 - 24:21Upstream configure scripts, you can maybe
see the top that it just uses hostname -
24:21 - 24:26for no reason. Sometimes you can override
it in debian/rules just by exporting something -
24:26 - 24:32or passing a variable to dh_auto_build or
whatever. That's just a little bit more -
24:32 - 24:34involved, you have to look at it more
carefully. -
24:34 - 24:40Perl hash order, a lot of Perl uses
Data::Dumper to just output a bunch of stuff which -
24:40 - 24:47is just not deterministic. So often it's just
setting Sortkeys, but sometimes it's -
24:47 - 24:48a completely different solution.
-
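The talk's example is Perl, but the same class of problem and the same fix (sort before you serialize) shows up elsewhere. As an analogy only, here is the Python equivalent: iteration order of a set of strings can change from one run to the next, so output is sorted explicitly. The feature names are invented.

```python
import json

# Hypothetical feature flags collected during a build.
features = {"ssl", "zlib", "ipv6", "threads"}

unstable = ",".join(features)        # order may differ between runs
stable = ",".join(sorted(features))  # same order on every run
print(stable)  # ipv6,ssl,threads,zlib

config = {"zlib": True, "ipv6": False}
print(json.dumps(config, sort_keys=True))  # {"ipv6": false, "zlib": true}
```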
24:48 - 24:53Header files, so you can maybe see that
they are using the timestamp essentially -
24:53 - 24:59as a unique identifier, you probably have
to start rewriting these into something saner -
24:59 - 25:04because this is a wrong use of timestamp
anyway. -
25:04 - 25:13More Makefiles, the deeper the timestamp is
in the upstream package the more you have -
25:13 - 25:15to start patching, so these kind of start
sucking a little. -
25:15 - 25:21We've made a lot of toolchain changes, some
already mentioned, some of them already -
25:21 - 25:25merged, see more in this link. Again,
details don't matter, just check it out -
25:25 - 25:30it isn't crazy, it's just working out
what's different. -
25:30 - 25:35In terms of the work done we've sent these
many patches: two patches a day, -
25:35 - 25:38which is not too bad, on average.
-
25:40 - 25:46[Applause]
-
25:48 - 25:51[Holger] I can't clap because I sent three
or something like that -
25:53 - 25:54[Lamby] Holger does three per day.
-
25:55 - 26:00And this doesn't count other bugs we found
in the process of building packages, like -
26:00 - 26:01fail to build.
-
26:01 - 26:08In blue are the ones that are open and
orange are done. -
26:08 - 26:14You can see that someone went a bit crazy
in February filing bugs and eventually they -
26:14 - 26:17were being fixed; slowly.
-
26:18 - 26:24[Holger] And actually we filed more bugs
because the fail to build from source bugs -
26:24 - 26:29are excluded, I think we filed 300 FTBFS
in the last two or three months. -
26:31 - 26:34[Lamby] And those include fail to build
because of reproducibility things as well -
26:34 - 26:36but we haven't split them up.
-
26:40 - 26:47[Lunar] What's left to be done because
Holger said "the graph is a lie". -
26:47 - 26:58The main thing that is blocking a lot of
work is dpkg. Right now the output of dpkg -
26:58 - 27:09will be not deterministic 100% of the time,
because of timestamps and at least the -
27:09 - 27:15file ordering. We also have a patch that
creates these .buildinfo files that we've -
27:15 - 27:22shown that works. It's not submitted yet
to dpkg because we need to agree on the -
27:22 - 27:27format. At least we have ftpmaster or
maybe dpkg, well we have a lot of people -
27:27 - 27:30and that's what we are going to do the
next hour. -
27:30 - 27:39Debhelper also has a few changes; the make
mtimes, debhelper might also not be -
27:39 - 27:43the best place, maybe we want that in dpkg.
-
27:43 - 27:48I've been trying to put patches in tar so
we can make it easier. It's complicated to -
27:48 - 27:54see where's the best place but so far we've
been doing our tests with this frame and it works. -
27:54 - 28:00[Holger] In our repository we have these
packages with these bugs fixed so when -
28:00 - 28:04you want to test reproducibility issues on
your own machine you need to use the -
28:04 - 28:07repository which has these patches applied
at the moment. -
28:07 - 28:10In pure sid you cannot create reproducible
packages. -
28:10 - 28:18[Lunar] I heard that the SOURCE_DATE_EPOCH
patch is in git already, so it's going to happen. -
28:18 - 28:27cdbs also needed to export SOURCE_DATE_EPOCH
and we are starting to do more infrastructure -
28:27 - 28:34work: Josch mainly and Akira on sbuild,
because we wanted to have this -
28:34 - 28:40srebuild script, where you give it a
buildinfo and it will do the rebuild and -
28:40 - 28:47it needs changes in build daemon for the
build path and also a couple of changes in -
28:47 - 28:49sbuild itself.
-
28:49 - 28:53[Holger] And the script is not ready yet,
this "Finish" means it uses our repository -
28:53 - 28:57at the moment, we need to change it to only
use Sid and snapshot. -
28:57 - 29:02[Lunar] So there is the buildd issue that
we need to discuss -
29:02 - 29:09and we also need to see how we could include
or not, or somewhere give this buildinfo -
29:09 - 29:13control file to the world so they can
rebuild the packages, so it's not yet -
29:13 - 29:14clear where's the best place to store
them. -
29:14 - 29:21Because adding 22,000 files, some
people get cranky about this idea. -
29:21 - 29:26[Holger] It's more than 22,000 files, it's
22,000 source packages multiplied by -
29:26 - 29:3010 architectures; but there's a lot of
arch:all builds so that's probably 100,000 -
29:30 - 29:38buildinfo files, multiplied by Stretch and
Sid, so it's 200,000 files or more on -
29:38 - 29:40the file servers and on the mirrors we
would like to have it. -
29:40 - 29:44That's the same amount of files which are
currently there. The mirror operators are -
29:44 - 29:49currently not happy, they will not take it,
so our current idea is just concatenate -
29:49 - 29:55all these files into one file that's 140 MB
uncompressed, 40 MB compressed. -
29:55 - 29:56That's easier to handle.
-
29:56 - 30:00And then probably have a service
buildinfo.debian.org where you can -
30:00 - 30:03download individual buildinfo files if you
need them. -
30:04 - 30:10[Lunar] And so when we will be done with
all that we can maybe add a final patch -
30:10 - 30:16it would be to Debian policy, mandating
Debian packages be reproducible. -
30:20 - 30:23[Applause]
-
30:24 - 30:31I can say again that the dream of mine is
that we would stop uploading .deb when -
30:31 - 30:38we upload a package, but instead just upload
the hash of the binary, have the buildd -
30:38 - 30:43build again this package and only if these
two match they can enter the archive. -
30:43 - 30:48So we are sure that at least the two
machines, the developer machine and the -
30:48 - 30:51build daemon agree that they've built the
same thing. -
30:51 - 30:55[Applause]
-
30:58 - 31:03[Holger] I share this dream but I think
having this in policy as a must requirement -
31:03 - 31:16is sadly something only for Stretch + 1, but
I'm curious if we had fixed dpkg and -
31:16 - 31:22debhelper now, would you think we should
upgrade all these wishlist bugs to important now? -
31:23 - 31:26[Audience] Yes!
-
31:31 - 31:34[Holger] We'll talk about this later soon.
-
31:34 - 31:37[Lunar] But before that we actually have
work to do. -
31:40 - 31:44[Dhole] In order to fix your package, the
first thing you can do is go to -
31:44 - 31:51reproducible.debian.net/package, and you
get the web interface where you can see -
31:51 - 31:56notes on the package, we have tags to
identify different issues that make packages -
31:56 - 31:59not reproducible, with links to the wiki
about how to solve them. -
32:05 - 32:09[Holger] When you see this, you want to
click on this debbindiff link. -
32:09 - 32:12It's still called debbindiff not diffoscope,
this will show all the differences, -
32:12 - 32:17if there is a note. If the package is
unreproducible and there's no note -
32:17 - 32:21it will automatically display the
debbindiff, and if your package is fine -
32:21 - 32:23there's a sun here.
-
32:30 - 32:34[Dhole] You can also see an entry in the
tracker, stating if your package is -
32:34 - 32:35reproducible or not.
-
32:39 - 32:46You can also find information in DDPO and
DMD. You can find tips on the wiki it's -
32:46 - 32:54ReproducibleBuilds wiki, we are working on
a Howto to have detailed steps on different -
32:54 - 33:01issues and how to solve them. Lunar gave
a talk at CCCamp where there's many issues -
33:01 - 33:05really well explained and the solutions for
them. -
33:05 - 33:11You can also come to our irc channel which
is #debian-reproducible and ask for help -
33:11 - 33:13or go to the mailing-list.
-
33:14 - 33:21In order to test locally if your package is
reproducible right now we are using a -
33:21 - 33:29script that uses pbuilder in a custom
configuration, you need to set up our -
33:29 - 33:35reproducible repository. In the Howto in
the wiki there's the steps on how to set up -
33:35 - 33:39the chroot and everything, it's documented
in the wiki. -
33:39 - 33:44Diffoscope is in unstable and today it's
going in Stretch. -
33:44 - 33:54We plan to add these scripts to rebuild
packages in different settings in devscripts -
33:54 - 34:04once dpkg is good, and we welcome you
tomorrow to the hacking session from -
34:04 - 34:072 to 7 in Stockholm room.
-
34:10 - 34:15[Lunar] That's for fixing your packages,
please do that. If you want to have even -
34:15 - 34:19more fun than testing your own package, join
us! -
34:19 - 34:25This is the past year of my life, it has
been awesome because the team has been -
34:25 - 34:32so great, it's been a friendly atmosphere, lots of
new understanding so many things you didn't -
34:32 - 34:39want to learn about that you had to learn
about, and basically it feels very good to -
34:39 - 34:46be part of this actual changing the world
thing. It's just software but it has some -
34:46 - 34:51profound effect. I've been told that the
work we've been doing is being tossed -
34:51 - 34:58around in Cisco and Google and Facebook;
all these big dot com companies bla bla, -
34:58 - 35:02they actually want to do that as well even
though they are not doing Free Software, -
35:02 - 35:03which I find weird, but whatever.
-
35:03 - 35:10So what do we do? We review packages, we
have these notes when we actually try to -
35:10 - 35:13identify, so when the maintainer comes
they don't have to think too much about -
35:13 - 35:19the problem and just fix it. We try to
identify common trends so when many -
35:19 - 35:24packages have the same problem we make an
entry and explain and maybe think about fixes -
35:24 - 35:27that could apply to the whole archive.
-
35:27 - 35:34We work on this reproducible.debian.net
jenkins setup, the scripts. -
35:34 - 35:41We hack on the diffoscope tool, we make
strip-nondeterminism better, we propose -
35:41 - 35:45changes for the toolchains when there are
needs, some need a lot of patches, -
35:45 - 35:59most of the bugs we have reported on
individual packages have patches. -
36:01 - 36:04[Holger] Bugs have patches
[Lunar] Yes! -
36:04 - 36:09And also we are actually writing some more
general documentation from the -
36:09 - 36:15understanding of these things we have been
having, we are preparing a reproducible -
36:15 - 36:22builds Howto to explain to the Free Software
world how they can do it so it's about some -
36:22 - 36:27of what Chris explained but also more
general consideration on what if you're -
36:27 - 36:29not Debian and you want your thing
reproducible when you distribute as an -
36:29 - 36:36independent vendor. So we want to work on
reference documentation so the whole world -
36:36 - 36:37can actually do that.
-
36:39 - 36:43We do a lot of talks as you've seen and
it's been fun, and with all these -
36:43 - 36:49presentations we've made so far it's all
in git. And everybody is free to take one -
36:49 - 36:53of these slide decks and run with it
somewhere, translate it... -
36:57 - 36:59Questions?
-
37:01 - 37:04We have to run with the microphone, because
there's no mic anymore. -
37:14 - 37:17[Question] I just wanted to make two quick
comments: so first of all diffoscope is -
37:17 - 37:22really awesome, not only for reproducibility
but also for example if you change your -
37:22 - 37:27debian/rules in some way and want to see if
the package is the same afterwards because -
37:27 - 37:32you just cleaned up a bit, that's really
awesome for that, so thank you. -
37:32 - 37:37And also I think the work you're doing now
is something that in 20 years time we're -
37:37 - 37:41going to look back towards it and think,
well, of course builds should be -
37:41 - 37:44reproducible, so thank you very much for
your work! -
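The before/after check described in this comment can start with a much cruder test than diffoscope: are the two build results bit-for-bit identical at all? Here is a minimal Python sketch, assuming two artifact paths on disk (the file names in the usage note are hypothetical). It does not use or imitate diffoscope; it only compares SHA-256 digests.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(path_a, path_b):
    """True only if both files are byte-for-byte identical."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `same_artifact("before/foo.deb", "after/foo.deb")` answers the yes/no question; when it returns False, a tool like diffoscope is what explains where the two files differ.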
37:45 - 37:49[Applause]
-
37:52 - 38:03[Question] When reproducibility becomes
part of the Debian policy, will there be a -
38:03 - 38:06lintian --reproducible?
-
38:09 - 38:12[Holger] I don't think lintian can detect
that because lintian works on the source -
38:12 - 38:15package and you need to build the package
for this. -
38:16 - 38:21[Lamby] Things that could be detected by
lintian from a static analysis point of view, -
38:21 - 38:26yeah I'm sure, like looking for gzip
without -n for example, but that wouldn't -
38:26 - 38:29be conclusive from lintian point of view.
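The gzip case mentioned here comes from the timestamp that gzip stores in its header. A minimal Python sketch of the same idea, using the standard `gzip` module's `mtime` parameter instead of the command-line `-n` flag (the helper name `gzip_bytes` is made up for illustration):

```python
import gzip
import io


def gzip_bytes(data, mtime=None):
    """Compress data in memory; mtime=None embeds the current time."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


payload = b"hello reproducible world\n"

# With a fixed mtime, two runs produce identical bytes.
first = gzip_bytes(payload, mtime=0)
second = gzip_bytes(payload, mtime=0)
print(first == second)

# Different embedded timestamps change the output although the
# payload is the same.
print(gzip_bytes(payload, mtime=1) == gzip_bytes(payload, mtime=2))
```

Pinning the timestamp is what makes the compressed output independent of when the build ran.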
-
38:29 - 38:33[Lunar] One thing that I really wanted to add to
diffoscope at some point - the code is made -
38:33 - 38:38in a way that makes it possible - is to have
hints, so when it actually looks up -
38:38 - 38:44differences between two packages, it
can suggest: hey, you need -
38:44 - 38:50to remove those timestamps, or you should
sort these keys. It's not done yet, but if -
38:50 - 38:53anybody wants to do patches it's totally
doable. -
38:58 - 39:04[Question] Thank you for the work, have
you thought about reproducible images? -
39:05 - 39:06[Holger] It's on the todo list.
-
39:08 - 39:15Before images we need reproducible package
installation, and then we need reproducible -
39:15 - 39:20images like squashfs has some things which
are not reproducible, but the package -
39:20 - 39:23installation is not reproducible at the
moment because apt installs packages in -
39:23 - 39:28arbitrary order and then the postinst
scripts create, for example, users which get -
39:28 - 39:33user-ids in the order the packages are
installed, so to fix that, apt -
39:33 - 39:39needs a way to install in a deterministic
order, but it's on the todo list. -
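The user-id problem described above can be shown with a toy model. This is only a sketch: the package names and the starting uid are hypothetical, and a real installer has to respect dependency ordering, which plain alphabetical sorting ignores.

```python
def allocate_uids(install_order, first_uid=100):
    """Give each package's system user the next free uid, in install order."""
    return {pkg: first_uid + offset for offset, pkg in enumerate(install_order)}


# Same package set, two different install orders: the uid maps differ.
run_a = allocate_uids(["postfix", "avahi-daemon", "ntp"])
run_b = allocate_uids(["ntp", "postfix", "avahi-daemon"])
print(run_a == run_b)

# Fixing the order (here simply alphabetical) makes both runs agree.
stable_a = allocate_uids(sorted(["postfix", "avahi-daemon", "ntp"]))
stable_b = allocate_uids(sorted(["ntp", "postfix", "avahi-daemon"]))
print(stable_a == stable_b)
```

The point is only that any fixed ordering rule removes this source of variation; which rule an installer should use is a separate question.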
39:40 - 39:44[Lunar] Pabs started a wiki page a couple
of months ago that is called reproducible -
39:44 - 39:50install. This is very important if we want
tools like Tails to actually be reproducible -
39:50 - 39:55so some people will work on that, we do
want to work on that. -
39:55 - 39:59[Lamby] It's quite a deep problem for
example d-i will install different stuff -
39:59 - 40:02depending on your hardware, so that's
immediately not reproducible. -
40:02 - 40:05It'd be great.
-
40:07 - 40:10[Question] I've been working on a couple
of my packages to get them to build -
40:10 - 40:16reproducibly, but I was often wondering if I
should fix it in my package or whether -
40:16 - 40:23it should be fixed higher up, and I
guess I've been adding some fixes to my -
40:23 - 40:28packages which may in the future even not
be needed anymore and then it's just -
40:28 - 40:31unnecessary code as well.
-
40:31 - 40:36So how do you see where things should be
fixed and how should we as package -
40:36 - 40:38maintainers go about with this?
-
40:38 - 40:44[Holger] For many things there's
an easy fix, for example setting the timezone in -
40:44 - 40:51debhelper, or better in dpkg, to UTC, but
that will not fix the upstream bugs, so -
40:51 - 40:57actually it's better not to set the
timezone or other things deterministically -
40:57 - 41:01in these tools but rather have them fixed
upstream, that's what we want. -
41:01 - 41:07Some things we will need to fix in
dpkg to get a meaningful result, but -
41:07 - 41:12basically we are thinking of distributions
that just build from source, which don't have -
41:12 - 41:15debian/rules and just build with
upstream Makefiles; we want the fixes -
41:15 - 41:17to land there.
-
41:18 - 41:22[Lunar] We've been experimenting for two years
and this is a lot of trial and error, -
41:22 - 41:26trying something, see how it fails, or
maybe we can do better than that and -
41:26 - 41:30changing. And I know this can be frustrating
at some point because you do changes -
41:30 - 41:36and they all become unneeded, but in the
end this is how we make stuff that matters. -
41:36 - 41:41And we move forward, it's not because we're
trying to make the big picture at once, -
41:41 - 41:46and I know in Debian we sometimes try to do
that, so we experiment and learn from it. -
41:47 - 41:52[Question] An example that I'm now looking
into is actually the documentation is built -
41:52 - 41:58for this package by looking in all the files
and generating it, but for instance the -
41:58 - 42:06index file is sorted, but I guess upstream
would say: well, if you set some ordering -
42:06 - 42:11in your LC parameters you want this page
to be ordered as you want, instead of forcing -
42:11 - 42:16it in the sort, so I'm really wondering:
should I now upstream this or should -
42:16 - 42:19I just fix it in my rules because that's
the logical place? -
42:21 - 42:29[Lunar] Both. No, there's no good answer,
I'm quite a strong proponent of the idea -
42:29 - 42:35that if you use a computer you should be
able to talk and have the computer talk to -
42:35 - 42:41you in the language that you choose, so if
people want to have gcc error messages -
42:41 - 42:45in German, they should have it.
-
42:45 - 42:51But locale sorting, this is the kind of
LC_ALL setting that can be very local and that -
42:51 - 42:54you can apply to just one tool; it's fine to
do that.
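Setting LC_ALL for a single tool, rather than for the whole build, could look like the following Python sketch. It assumes a POSIX `sort` command is on the PATH; the helper name `sort_lines_c_locale` is made up for illustration.

```python
import os
import subprocess


def sort_lines_c_locale(lines):
    """Sort lines with the external `sort` tool under LC_ALL=C only."""
    # Copy the caller's environment and override the locale for this
    # one child process; the rest of the build keeps the user's locale.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        ["sort"],
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        env=env,
        check=True,
    )
    return result.stdout.splitlines()


print(sort_lines_c_locale(["b", "A", "a", "B"]))
```

Under the C locale the order is plain byte order, so the result does not depend on the language settings of whoever runs the build, while error messages from other tools stay in the user's language.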
42:56 - 42:59[Question] Do you have ideas on making
sources reproducible? Like upstreams -
42:59 - 43:04calling make dist, or these infamous
autogen.sh files? -
43:07 - 43:12[Lunar] I don't think that anybody in the
team has looked into that yet. Source -
43:12 - 43:23files are way easier to analyze than
binary packages, so while it would still be great -
43:23 - 43:30to have easier ways to make source
tarballs byte for byte identical, -
43:30 - 43:37it's not as much of an issue as it is for
binaries. If people want to look into that -
43:37 - 43:39they should.
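For anyone who does want to look into byte-for-byte identical source tarballs, the usual sources of variation are member order, timestamps and ownership. A minimal Python sketch with the standard `tarfile` module, assuming the file contents are already in memory (the helper name `deterministic_tar` and the fixed metadata values are illustrative choices, not what any particular tool does):

```python
import io
import tarfile


def deterministic_tar(files, mtime=0):
    """Build an uncompressed tar archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):        # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime            # fixed timestamp
            info.mode = 0o644             # fixed permissions
            info.uid = info.gid = 0       # fixed numeric owner
            info.uname = info.gname = ""  # no user or group names
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


one = deterministic_tar({"b.txt": b"2", "a.txt": b"1"})
two = deterministic_tar({"a.txt": b"1", "b.txt": b"2"})
print(one == two)
```

Compression is a separate step with its own metadata, so a compressed tarball needs the same care again at that layer.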
-
43:44 - 43:49[Question] Do you know a way to make git
archive build something reproducible? -
43:50 - 43:52[Lunar] Well pristine-tar
-
43:52 - 43:53[Question] Yes, but without it.
-
43:54 - 43:58[Holger] There's one tool. You want to use
a new one? Then write it. -
44:02 - 44:05Why not use that tool which does the job?
-
44:06 - 44:08pristine-tar does it.
-
44:11 - 44:17[Lunar] This is for source and so that's
another issue than what we are actually -
44:17 - 44:18currently working on.
-
44:22 - 44:26[Holger] You're welcome to join the team and
extend our scope to sources. -
44:29 - 44:31[Lunar] How many questions, two?
-
44:33 - 44:36Two more questions, two or three.
-
44:44 - 44:52[Question] So if there are a couple of other
environment variables that could be set -
44:52 - 44:59in the environment to increase
reproducibility, where to put them? -
44:59 - 45:08In the rules file? Or in the generic build
environment of all packages, or where -
45:08 - 45:10should these things be placed?
-
45:13 - 45:20[Lamby] It'd be nice if upstream fixed it,
so if we just change it in debian/rules -
45:20 - 45:29that only helps us, so taking
it upstream would often be the ideal solution. -
45:29 - 45:31Are you referring to something else?
-
45:32 - 45:40[Question] For example many hashmaps have
randomized data in the hash function, so if -
45:40 - 45:47you have some code that relies on hash
order, at least some implementations of -
45:47 - 45:57hash functions let you seed them
rather than using something random for -
45:57 - 46:03a build, but you want the randomness
in your hash functions for normal users -
46:03 - 46:11because otherwise your hashmaps are open
to attacks. -
46:12 - 46:14[Lamby] Correct, yes.
-
46:16 - 46:22[Lunar] In these cases we send patches
adding sort everywhere for the keys and -
46:22 - 46:28it's solved. For very few cases, for Perl for
example you can set an environment -
46:28 - 46:33variable and some maintainers prefer to do
that. But usually we try to push these -
46:33 - 46:36changes upstream, because they are simple
enough and they like it. -
46:36 - 46:39Actually it makes testing easier for them.
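The "add sort for the keys" kind of patch can be sketched in Python, where iteration order over a set of strings may change between runs because string hashing is randomized per process. The function name `render_index` and the sample names are made up for illustration.

```python
def render_index(names):
    """Render a generated index with one name per line, in sorted order."""
    # Iterating over set(names) directly could give a different order
    # on each run; sorted() makes the output independent of hash order.
    unique = set(names)
    return "\n".join(sorted(unique)) + "\n"


print(render_index(["zlib", "apt", "dpkg", "apt"]), end="")
```

The alternative mentioned above, fixing the seed through an environment variable at build time, has a Python counterpart in `PYTHONHASHSEED`, but sorting in the code keeps hash randomization intact for normal users, which is the concern raised in the question.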
-
46:42 - 46:45There was one in the back, there.
-
46:53 - 46:56[Lunar] That's the last question
-
46:56 - 47:00[Question] Follow up question to what we
had here before. -
47:00 - 47:10You showed an open bug report against gcc
to support SOURCE_DATE_EPOCH to cover -
47:10 - 47:20the __DATE__ and __TIME__ macros, so I have
patches to patch them out in my packages. -
47:20 - 47:24Should I remove those patches and if so,
when? -
47:26 - 47:30[Lunar] Have you seen any more emails
from the gcc maintainers? -
47:34 - 47:40[Dhole] The mail is forgotten, I guess we
should ping it again, and see if they -
47:40 - 47:50reply, because what I read from the gcc
website is that only the replies from -
47:50 - 47:55maintainers are the ones that matter, and
I think no maintainer replied to the -
47:55 - 47:58message, so we should ping again.
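Independently of what happens in gcc, a build script can honor the variable from the question, conventionally spelled `SOURCE_DATE_EPOCH`, on its own. A minimal Python sketch, assuming the variable holds seconds since the Unix epoch (the helper name `build_timestamp` is made up for illustration):

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=None):
    """Return the date to embed in build output, as a UTC datetime.

    Uses SOURCE_DATE_EPOCH when it is set and falls back to the
    current time otherwise.
    """
    environ = os.environ if environ is None else environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    seconds = int(epoch) if epoch is not None else int(time.time())
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(build_timestamp({"SOURCE_DATE_EPOCH": "0"}).isoformat())
```

Using UTC for the conversion also keeps the embedded date independent of the build machine's timezone.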
-
47:59 - 48:03[Question] That was just an example, my
question was more general. -
48:03 - 48:09At which time should I remove my patches
to fix things which were fixed higher up -
48:09 - 48:12in the toolchain? Or should I just leave
them in there? -
48:13 - 48:14[Holger] Once they are in Sid.
-
48:15 - 48:17[Question] Ok thanks!
-
48:18 - 48:20[Lunar] Ok, I guess we're out of time.
-
48:20 - 48:22Thank you for listening.
-
48:23 - 48:26[Applause]
-
48:26 - 48:29[Lunar] Fix your packages!