Stretching out for trustworthy reproducible builds creating bit by bit identical binaries
- 
0:01 - 0:02Welcome and good morning
- 
0:04 - 0:07This is the reproducible builds team,
 talking about
- 
0:07 - 0:10"Stretching out towards trustworthy
 computing"
- 
0:12 - 0:20[Applause]
- 
0:22 - 0:26We're 4 on stage, but actually this is a
 team effort.
- 
0:26 - 0:31All these people listed here have
 contributed to the project at one point.
- 
0:31 - 0:33The 4 of us, that's
- 
0:33 - 0:34Lunar − me
- 
0:34 - 0:35there's Dhole,
- 
0:35 - 0:36Chris Lamb − lamby
- 
0:36 - 0:38and Holger.
- 
0:39 - 0:43But actually, this is DebConf and so a lot
 more of us have been or are
- 
0:43 - 0:47currently here and so, if you want to
 thank anybody that is working on this
- 
0:47 - 0:49you need to actually thank all of
 these folks
- 
0:49 - 0:51'cause, yay.
- 
0:51 - 0:56[Applause]
- 
0:57 - 1:00[Holger] The people in blue are here.
- 
1:04 - 1:06[Lunar] Let's get started.
- 
1:06 - 1:08Quick recap on what we're talking
 about.
- 
1:08 - 1:11We have software, it's made from source.
- 
1:11 - 1:15Source is readable by humans or at least
 a good amount of humans.
- 
1:15 - 1:17In this room it's good.
- 
1:17 - 1:24Binary, readable by computer and some
 tiny fraction of humanity.
- 
1:24 - 1:30Going from source to binary is called
 build, or like building or compiling
- 
1:30 - 1:33and we're doing free software and
 free software is awesome because
- 
1:33 - 1:38we can actually run these binaries like
 we want
- 
1:38 - 1:44We can actually study the software, how
 it's been made by studying the source
- 
1:44 - 1:49and by studying the source we can assess
 that it does what it's supposed to do
- 
1:49 - 1:51and not something else that does not
- 
1:51 - 1:56have malware, or trojans or security bugs
- 
1:56 - 2:01So we have the binary that can be used,
 fine.
- 
2:01 - 2:04We have the source that can be verified.
- 
2:04 - 2:10Problem is that right now, the only way we
 know that a binary that we get…
- 
2:10 - 2:16We have to trust a website or a Debian
 repository that says
- 
2:16 - 2:18"Well, this binary has been made with this
 source"
- 
2:18 - 2:23But there's no way we can actually prove
 that.
- 
2:23 - 2:27This is actually a problem that has been
 well explained by
- 
2:27 - 2:34Mike Perry and Seth Schoen at the 31c3
 in Hamburg last December.
- 
2:34 - 2:41For example, Seth Schoen made a proof of
 concept exploit for the Linux kernel
- 
2:41 - 2:52that when GCC was called, the kernel would
 without modifying anything on the disk
- 
2:52 - 2:59when the kernel detects that GCC is going
 to read a C file, it will insert some
- 
2:59 - 3:06extra lines of code, and these lines of
 code can be a very bad thing
- 
3:06 - 3:09in the case of 31c3 talk I was just
 recalling.
- 
3:09 - 3:18Actually, you can even have developers
 who are in very good faith, who have
- 
3:18 - 3:21totally secure dev machines, or they
 thought they have,
- 
3:21 - 3:24who have reviewed all their source code
 for any bugs
- 
3:24 - 3:31and we would still get totally owned as
 soon as their computer gets compromised
- 
3:31 - 3:34or one of the build daemons from Debian
 gets compromised for example.
- 
3:34 - 3:41These are not, like, hypothetical threats
 here we're discussing
- 
3:41 - 3:46A couple of months after Seth and Mike's
 talk at 31c3,
- 
3:46 - 3:49the Intercept revealed from the Snowden
 leaks
- 
3:49 - 3:56that at a CIA conference in 2012, one
 of the talks that happened
- 
3:56 - 3:59was about a project called Strawhorse.
- 
3:59 - 4:05Strawhorse is about modifying Apple Xcode,
 which is the development environment
- 
4:05 - 4:09for Mac OS X and iOS applications
- 
4:09 - 4:11and well, they were modifying Xcode so
 it would produce,
- 
4:11 - 4:13without the developer knowing,
- 
4:13 - 4:23binaries with trojans, malware,
 ??? binaries, lots of bad things.
- 
4:23 - 4:25So, solution:
- 
4:25 - 4:29enable anyone to reproduce identical
 binary packages from a given source.
- 
4:29 - 4:35Because if using a source, using the same
 environment,
- 
4:35 - 4:40multiple people on different computers, on
 different networks, at different times,
- 
4:40 - 4:43can all get the same thing
 from the same source
- 
4:43 - 4:45all the same binary, byte for byte,
- 
4:45 - 4:47then there's a good chance that…
- 
4:47 - 4:55Well, everybody could be owned,
 but let's be more joyful and say that
- 
4:55 - 4:59probably, if everybody gets the same
 result, there was actually no problem
- 
4:59 - 5:01and everybody is safe.
- 
5:02 - 5:04We call that solution
 "reproducible builds"
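- 
A minimal sketch of that idea, in Python and with made-up artifact contents: each independent builder hashes what it produced, and the build counts as reproduced only when every digest is the same. The function names and the sample data are illustrative assumptions, not part of any Debian tool.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a build artifact as a hex string."""
    return hashlib.sha256(data).hexdigest()


def builds_agree(artifacts: dict[str, bytes]) -> bool:
    """True when every independent builder produced byte-identical output."""
    digests = {sha256_hex(blob) for blob in artifacts.values()}
    return len(digests) == 1


# Hypothetical results from three independent rebuilds of the same source.
results = {
    "alice": b"\x7fELF same bytes",
    "bob": b"\x7fELF same bytes",
    "buildd": b"\x7fELF same bytes",
}
print(builds_agree(results))
```

A single differing byte on any one machine changes its digest, so the check fails and someone has to investigate.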
- 
5:07 - 5:08Yay.
- 
5:08 - 5:11[Applause]
- 
5:13 - 5:15Actually, it's not only about security.
- 
5:15 - 5:19For Debian, we have, if you're doing
 "Multi-arch: same" packages,
- 
5:19 - 5:25well, they need to have the same bytes even if
 they are built for different architectures,
- 
5:25 - 5:28the files in the package.
- 
5:28 - 5:34Debug packages, you can create at a later
 time, if you forgot to have debug packages
- 
5:34 - 5:36in the first place,
- 
5:36 - 5:42you can pass the no-strip option later and
 because the package is reproducible,
- 
5:42 - 5:47you will get the debug symbols that work
 for software that has been shipped already
- 
5:47 - 5:50We do early detection of FTBFS that way
- 
5:50 - 5:54because if we try pretty quickly
 to reproduce a build,
- 
5:54 - 5:55then it has to work.
- 
5:55 - 5:58It's useful for build profiles.
- 
5:58 - 6:02We can get smaller .deb deltas,
- 
6:02 - 6:05because from one version to the next we
 might have the same content.
- 
6:05 - 6:09We can do validation of cross-builds,
- 
6:09 - 6:12Helmut Grohne can talk to you about that.
- 
6:12 - 6:17And also, Niels Thykier told me that
- 
6:17 - 6:21he was very interested in reproducible
 builds because it would enable him to
- 
6:21 - 6:24test debhelper better, because
- 
6:24 - 6:29if the package builds reproducibly,
 then he makes a change to debhelper
- 
6:29 - 6:32he can rebuild the package ???
- 
6:32 - 6:36the same version of a package with a newer
 debhelper and see what has changed
- 
6:36 - 6:40and this change can be isolated to only
 what he has worked on debhelper
- 
6:40 - 6:42for example.
- 
6:43 - 6:45And, oh my.
- 
6:45 - 6:48The whole world is watching us.
- 
6:48 - 6:56For two years or a year and a half now,
 everybody I meet at security conferences,
- 
6:56 - 6:59at hacker conferences, at free software
 conferences is like
- 
6:59 - 7:01"Oh you're working on that,
 that's awesome."
- 
7:01 - 7:09And, I mean, I've been the one doing quite
 a lot of talks, and everybody comes to me
- 
7:09 - 7:11and I'm like "Wow wow, this is way bigger",
- 
7:11 - 7:16but we're actually leading the field here.
- 
7:16 - 7:19Yay Debian.
- 
7:19 - 7:26[Applause]
- 
7:26 - 7:29[Holger] So, we are not the only ones
 leading the field,
- 
7:29 - 7:33Bitcoin and Tor made their software
 reproducible before us,
- 
7:33 - 7:37Coreboot also succeeded, if you build
 Coreboot without any payload,
- 
7:37 - 7:39that's 100% reproducible.
- 
7:39 - 7:44FreeBSD has a page on their wiki since
 2013
- 
7:44 - 7:49saying there are 5 reproducibility issues
 in their base system.
- 
7:49 - 7:52We're at the moment trying to
 confirm this.
- 
7:52 - 7:57On jenkins.debian.net, I've also set up
 now tests for FreeBSD, NetBSD,
- 
7:57 - 7:59Coreboot and OpenWrt.
- 
7:59 - 8:03So if you go to
 reproducible.debian.net/
- 
8:03 - 8:05you get that tested.
- 
8:05 - 8:08And there's more in the pipeline.
- 
8:08 - 8:11There are other projects interested
 as well.
- 
8:11 - 8:15NetBSD also has a variable ???
 which you can set
- 
8:15 - 8:17and that builds reproducibly.
- 
8:17 - 8:20Though they think "I'm keeping some
 timestamps ??? and then
- 
8:20 - 8:22filtering them out later".
- 
8:22 - 8:23We disagree.
- 
8:23 - 8:28So this is what Debian looks like,
 Debian Sid,
- 
8:28 - 8:30but this is a lie.
- 
8:30 - 8:32This is not the truth.
- 
8:32 - 8:34This is just our test setup.
- 
8:34 - 8:36Sid is not like this.
- 
8:36 - 8:40For Sid, it's all orange, there's zero
 reproducibility in Sid today.
- 
8:40 - 8:44But what we'll talk about now, and in the
 following round table,
- 
8:44 - 8:47is how to actually make Sid reproducible.
- 
8:47 - 8:52The current status is
- 
8:52 - 8:58we've been working on this in Debian for
 two years now.
- 
8:58 - 9:02We have had weekly reports about our project
 since May
- 
9:02 - 9:07and we've given several talks, especially
 in the last year
- 
9:07 - 9:11and all these talks, presentations, also
 other stuff is linked in the wiki.
- 
9:11 - 9:15There's a page with information about
 Debian, these BSDs,
- 
9:15 - 9:19other Linuxes, upstream ???
 all on this wiki.
- 
9:23 - 9:27Since DebConf14, which is merely
 a year ago,
- 
9:27 - 9:29we've made quite some changes.
- 
9:29 - 9:33We have introduced
 strip-nondeterminism
- 
9:33 - 9:39which is called by dh at the end
 of the build of the package
- 
9:39 - 9:45and will normalize some things
 which Chris will explain later
- 
9:45 - 9:50We have decided on a fixed build path
- 
9:50 - 9:54because the build path is leaked
 in the binaries and several things
- 
9:54 - 9:57We didn't find a way yet to make
 the build path arbitrary.
- 
9:57 - 10:03We designed a way to record the build
 environment
- 
10:03 - 10:08because to rebuild, you need to recreate
 the build environment.
- 
10:08 - 10:12We set up this Jenkins setup.
- 
10:12 - 10:17We wrote diffoscope which used to be
 called debbindiff
- 
10:17 - 10:21which shows differences between two
 packages or two directories or
- 
10:21 - 10:24two filesystems by now.
- 
10:24 - 10:31There's SOURCE_DATE_EPOCH, which is a way
 that the tools expose
- 
10:31 - 10:34the last modification of the source.
- 
10:34 - 10:37Because the build date, people want to
 include the build date
- 
10:37 - 10:39because they think this is a
 meaningful indication:
- 
10:39 - 10:42when a build was done,
 which software used.
- 
10:42 - 10:46But if the build always recreates
 the same results
- 
10:46 - 10:47the build date becomes meaningless
- 
10:47 - 10:51and the really interesting thing is
 the latest modification of the source.
- 
10:52 - 10:56We have written patches for the tools
- 
10:58 - 11:04[Lunar] strip-nondeterminism:
 is Andrew Ayer in the audience?
- 
11:04 - 11:06Yay! He did it!
- 
11:06 - 11:12It's written in Perl because we didn't
 want to have a new build dependency
- 
11:12 - 11:14in all Debian packages.
- 
11:14 - 11:18Basically it takes anything and tries
 to normalize it as much as it can
- 
11:18 - 11:27replacing timestamps or file permissions
 or removing some issues.
- 
11:27 - 11:31It's working very well on many formats,
 it's meant to be extensible
- 
11:31 - 11:38so we can actually add more things and
 it's run by dh at the end of the process, as Holger said.
- 
11:38 - 11:45The .buildinfo is currently a proposal
 we have not yet totally agreed on
- 
11:45 - 11:49but we are generating them as part
 of the test we have
- 
11:49 - 11:57and basically it's a new control file that
 will tie the sources, the generated binary
- 
11:57 - 12:01the packages that were used to build this
 binary and their version.
- 
12:01 - 12:09The idea is that we can use this file to
 reinstall all the specific versions from snapshot
- 
12:09 - 12:17So we recreate the same build environment
 then we can just start the build from that source
- 
12:17 - 12:21that was mentioned and see if the binary
 that has been generated matches.
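- 
A rough sketch of how such a record could be used to check a rebuild environment, written in Python. The `BuildRecord` class, its field names, and the package names and versions are hypothetical stand-ins; the actual .buildinfo format was still under discussion at the time of the talk.

```python
from dataclasses import dataclass, field


@dataclass
class BuildRecord:
    """Hypothetical stand-in for a .buildinfo file; field names are illustrative."""
    source: str
    version: str
    build_path: str
    binary_sha256: dict[str, str] = field(default_factory=dict)  # file name -> digest
    build_depends: dict[str, str] = field(default_factory=dict)  # package -> exact version


def environment_mismatches(record: BuildRecord, installed: dict[str, str]) -> list[str]:
    """List the packages whose installed version differs from the recorded one."""
    problems = []
    for package, wanted in sorted(record.build_depends.items()):
        found = installed.get(package)
        if found != wanted:
            problems.append(f"{package}: recorded {wanted}, installed {found}")
    return problems


# Made-up package names and versions, for illustration only.
record = BuildRecord(
    source="example",
    version="1.0-1",
    build_path="/build/example-1.0",
    build_depends={"compiler": "5.0-1", "helper": "9.1"},
)
print(environment_mismatches(record, {"compiler": "5.0-1", "helper": "9.0"}))
```

Only when the list comes back empty does it make sense to start the rebuild and compare the resulting binary against the recorded checksum.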
- 
12:23 - 12:28What it looks like for now, you see there is
 a source binary, the build path
- 
12:28 - 12:34because currently we don't have any good
 post-processing tool for buildpaths
- 
12:34 - 12:41in elf and dwarf binaries, we just decided
 to specify the build path so when we do
- 
12:41 - 12:45a later rebuild we use that path and be safe.
- 
12:45 - 12:52The source is ???, the binary is .deb and
 a list of packages with the versions.
- 
12:53 - 13:02We currently use the base files version
 to know which Debian release is to be used
- 
13:02 - 13:04as the basis.
- 
13:11 - 13:18[Holger] The general procedure for testing is:
 we build the source, we save the results,
- 
13:18 - 13:23we modify the environment and we build
 it again and compare the results.
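- 
The procedure just described (build, save, vary the environment, build again, compare) can be sketched in a few lines of Python. This is only an illustration under assumptions: the "build" here is a toy command whose standard output stands in for the package, and the varied variables and their values are examples, not the configuration used on jenkins.debian.net.

```python
import hashlib
import os
import subprocess
import sys


def build(command: list[str], variations: dict[str, str]) -> str:
    """Run a build command under a varied environment and hash its output."""
    env = dict(os.environ, **variations)
    result = subprocess.run(command, env=env, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def is_reproducible(command: list[str]) -> bool:
    """Build twice with different user name and timezone, then compare digests."""
    first = build(command, {"TZ": "Etc/GMT+12", "USER": "builder1"})
    second = build(command, {"TZ": "Etc/GMT-14", "USER": "builder2"})
    return first == second


# Toy builds: one ignores its environment, the other leaks the user name.
stable = [sys.executable, "-c", "print('hello')"]
leaky = [sys.executable, "-c", "import os; print(os.environ['USER'])"]
print(is_reproducible(stable), is_reproducible(leaky))
```

The second toy build fails the check because the user name ends up in its output, which is the kind of leak the test setup is meant to catch.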
- 
13:23 - 13:32That started as a shell script last year which I
 put on jenkins and then it exploded a bit
- 
13:32 - 13:36and now we have 67 jenkins jobs running on
 7 hosts.
- 
13:36 - 13:45Since last week we have 4 armhf small boards
 where we will be able to test armhf,
- 
13:45 - 13:46but very slowly.
- 
13:46 - 13:49We have two new amd64 build nodes.
- 
13:49 - 13:53The code is now split into Python and bash
 scripts.
- 
13:53 - 13:59For all the other distro testing there's a
 lot of bash code now which is mostly
- 
13:59 - 14:05boilerplate and it's 5 lines or something
 to build FreeBSD and 5 lines to build NetBSD
- 
14:05 - 14:09but there's 100 lines of boilerplate around it, so it's
 really not that much code.
- 
14:09 - 14:13We do test Testing, Unstable and Experimental.
- 
14:13 - 14:16For arm we only start with Unstable.
- 
14:16 - 14:22We do like hardware so if you have hardware
 to donate to us, that would be great,
- 
14:22 - 14:25we need ssh and then root basically.
- 
14:27 - 14:34We are testing Coreboot, OpenWrt and the
 BSD's, soon I will also set up a Fedora test
- 
14:34 - 14:40I don't want to test all the 20,000 Fedora
 packages but just 200 or something:
- 
14:40 - 14:44the base system of Fedora to examine how
 rpm works
- 
14:44 - 14:48to get really the whole Free Software world
 reproducible.
- 
14:48 - 14:53This is all run on ProfitBricks hardware
 since 2012, so thanks to ProfitBricks.
- 
14:57 - 15:00This is the variations we do for Debian.
- 
15:02 - 15:07It's the hostname, username, timezone,
 locale.
- 
15:07 - 15:14Chris will explain what modifications
 this causes, variances...
- 
15:14 - 15:19We are not testing at the moment differences
 in date so the date is always the same
- 
15:19 - 15:20the time is a bit different.
- 
15:20 - 15:26[Lunar] Well almost! Because we cheat with
 the timezone, we use one timezone that is
- 
15:26 - 15:32GMT-14 and then GMT+12 so it's more than
 24 hours apart.
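- 
To see why two timezones more than 24 hours apart are enough to vary the date, here is a small Python sketch. It uses fixed offsets of UTC-12 and UTC+14 as stand-ins for the two test timezones (an assumption about which zones are meant), and an arbitrary instant.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for the two test timezones (assumed: UTC-12 and UTC+14).
WEST = timezone(timedelta(hours=-12))
EAST = timezone(timedelta(hours=14))


def local_dates(instant: datetime) -> tuple[str, str]:
    """Return the calendar date of one instant as seen in both zones."""
    return (
        instant.astimezone(WEST).date().isoformat(),
        instant.astimezone(EAST).date().isoformat(),
    )


instant = datetime(2015, 8, 15, 11, 0, tzinfo=timezone.utc)
print(local_dates(instant))  # ('2015-08-14', '2015-08-16')
```

Because the two local clocks are 26 hours apart, they never show the same calendar date, so a package that embeds the local build date differs between the two builds even when both run at the same moment.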
- 
15:33 - 15:36[Holger] On the first of the month we
 sometimes find new bugs where there's
- 
15:36 - 15:38packages which record the month.
- 
15:41 - 15:44We don't have variations of the CPU type
 at the moment.
- 
15:46 - 15:51Both time and CPU type variations, we'll
 have them in about one or two weeks;
- 
15:51 - 15:54the nodes are being prepared at the moment.
- 
15:54 - 16:01Then we will test all the meaningful
 variations we could think of.
- 
16:01 - 16:05There will be probably some packages which
 build different according to the number of
- 
16:05 - 16:11number of CD drives attached or whatever
 things, but those will be find by you.
- 
16:12 - 16:17[Lunar] We are doing all these tests because
 we want when you rebuild a package on
- 
16:17 - 16:22your machine that, even if anything is different from
 the build daemons in Debian, you get
- 
16:22 - 16:23the same results.
- 
16:23 - 16:30We use this to detect these problems early,
 before you actually get a false positive that we have
- 
16:30 - 16:34to investigate when someone rebuilds a
 package on their machine.
- 
Not SyncedTo understand the difference that we found
 from one build to the other.
- 
Not SyncedIt started also as a 10 lines javascript???
 and then it felt okayish
- 
Not Syncedand so Python!
- 
Not SyncedAnd now it's a lot of code and it actually
 grew way beyond a Debian package.
- 
Not SyncedWe changed the name, it was called debbindiff
 but it's absolutely not tied to Debian anymore.
- 
Not SyncedIt's called diffoscope, thanks to ??? for the name.
- 
Not SyncedBasically what it does: it tries to get to
 the bottom of what is different between
- 
Not Syncedtwo archives or directories.
- 
Not SyncedBecause it's not useful to compare bytes that
 are compressed by gzip or xz, that will not
- 
Not Syncedlead you to understand what is different
 you need to uncompress and look at
- 
Not Synceduncompressed data, and if the thing actually
 compressed is a tarball, you might actually
- 
Not Syncedwant to compare the files inside the tarball.
- 
Not SyncedIf there is a PDF inside this archive, you
 don't want to compare the bytes of the PDF
- 
Not Syncedyou want to compare the text of the PDF.
- 
Not SyncedSo this is basically what diffoscope does,
 it tries to transform anything that is
- 
Not Synceda container and compare things in this
 container and if they can be transformed into
- 
Not Synceda human readable form it will try to do
 that, and compare these human readable form.
- 
Not SyncedAnd if it doesn't find any difference but
 there are still differences from the bin
- 
Not Syncedit will fall back to binary comparison.
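- 
The recursive approach can be illustrated with a toy comparator in Python. This is not diffoscope's code: it is a sketch that only knows gzip, tar and UTF-8 text, and the function names are made up for the example.

```python
import difflib
import gzip
import io
import tarfile


def _is_tar(data: bytes) -> bool:
    """Detect a tar archive by the 'ustar' magic at offset 257."""
    return data[257:262] == b"ustar"


def _members(data: bytes) -> dict[str, bytes]:
    """Map each regular file in a tar archive to its content."""
    with tarfile.open(fileobj=io.BytesIO(data)) as tar:
        return {m.name: tar.extractfile(m).read() for m in tar if m.isfile()}


def _compare_tar(name: str, a: bytes, b: bytes) -> list[str]:
    left, right = _members(a), _members(b)
    report = []
    for member in sorted(left.keys() | right.keys()):
        if member not in left or member not in right:
            report.append(f"{name}/{member}: present on one side only")
        else:
            report.extend(compare(f"{name}/{member}", left[member], right[member]))
    return report or [f"{name}: differs only in tar metadata"]


def compare(name: str, a: bytes, b: bytes) -> list[str]:
    """Recursively describe how two blobs differ, unpacking known containers."""
    if a == b:
        return []
    # gzip container: compare the uncompressed payloads, not the raw bytes.
    if a[:2] == b[:2] == b"\x1f\x8b":
        inner = compare(f"{name} (gunzipped)", gzip.decompress(a), gzip.decompress(b))
        return inner or [f"{name}: differs only in gzip metadata"]
    # tar container: compare member by member.
    if _is_tar(a) and _is_tar(b):
        return _compare_tar(name, a, b)
    # Text: produce a human-readable unified diff.
    try:
        left, right = a.decode("utf-8"), b.decode("utf-8")
    except UnicodeDecodeError:
        return [f"{name}: binary content differs"]
    diff = difflib.unified_diff(left.splitlines(), right.splitlines(), name, name, lineterm="")
    # Fall back to a plain statement when the readable form shows nothing.
    return list(diff) or [f"{name}: binary content differs"]
```

Calling `compare("pkg.tar.gz", first, second)` on two gzip-compressed tarballs returns a unified diff of the member files that changed, rather than a meaningless difference between compressed bytes.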
- 
Not SyncedTry it, extend it; it's Python, it's modular,
 it's great.
- 
Not SyncedIt already supports squashfs, ISO, rpm,
 gettext, ??? files and so many different things.
- 
Not SyncedYou can have HTML output like that,
 so this is what is displayed on many
- 
Not Syncedexamples we've shown so far, and also
 to make it easier for copy paste
- 
Not Syncedand post processing we have the text output.
- 
Not SyncedYou can also use it to review packages before
 uploading them to Debian.
- 
Not SyncedIt does fuzzy matching, so even if the
 directory is different in the archive it will
- 
Not Syncedfind it like git does.
- 
Not SyncedIt has grown way beyond just reproducible
 builds. A useful tool.
- 
Not Synced[Dhole] In order to solve timestamp issues, we are
 proposing the SOURCE_DATE_EPOCH variable.
- 
Not SyncedThis is because most of the time having
 the build date embedded in a package
- 
Not Syncedis not useful for the user, because you could
 take a really old package and build it today
- 
Not Syncedand that date would not be useful.
- 
Not SyncedWe are standardizing a replacement for build
 dates so that tools can use it.
- 
Not SyncedWhen this variable is set, the tool, instead of
 embedding the current date, will embed
- 
Not Syncedthe date taken from SOURCE_DATE_EPOCH which
 will contain a Unix epoch timestamp.
- 
Not SyncedThis is a general solution we are trying to
 standardize so that not only Debian uses it,
- 
Not Syncedbut other Free Software projects and
 distributions and in the case of Debian,
- 
Not Syncedwe set this variable to the latest Debian
 changelog entry timestamp.
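- 
A minimal sketch of how a tool might honour the variable, in Python. The function names and the footer text are invented for the example; only the variable name SOURCE_DATE_EPOCH and its meaning (a Unix epoch timestamp) come from the proposal.

```python
import os
import time


def build_timestamp(environ=os.environ) -> int:
    """Return SOURCE_DATE_EPOCH if it is set, otherwise the current time."""
    value = environ.get("SOURCE_DATE_EPOCH")
    return int(value) if value is not None else int(time.time())


def generated_footer(environ=os.environ) -> str:
    """Date line that a documentation generator might embed in its output."""
    stamp = time.gmtime(build_timestamp(environ))  # UTC, so the timezone cannot leak
    return time.strftime("Generated on %Y-%m-%d", stamp)


print(generated_footer({"SOURCE_DATE_EPOCH": "1439596800"}))  # Generated on 2015-08-15
```

With the variable set, the footer stays the same no matter when or where the build runs; without it, the tool behaves as before and embeds the current date.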
- 
Not SyncedWe have already been sending patches to
 different packages, mostly it's documentation
- 
Not Syncedgeneration. So here's a list of bugs that
 we have opened which have been closed
- 
Not Syncedand merged; so it's help2man, epydoc,
 ghostscript, texi2html and sphinx.
- 
Not SyncedWe are both sending these patches to Debian
 and upstream so all the distributions can
- 
Not Synceduse them, and we have also been sending
 patches to other packages which are still
- 
Not Syncedopen, so we encourage you to take a look
 at these packages if you are the maintainer
- 
Not Syncedand merge the patch.
- 
Not Synced[Lunar] Thanks to Daniel Kahn Gillmor and
 Ximin Luo for pushing this proposal forward.
- 
Not SyncedAnd also lots of these patches have been
 written by Akira and Dhole as part of their
- 
Not SyncedGoogle Summer of Code, and your work is really
 great.
- 
Not Synced[Applause]
- 
Not Synced[Dhole] The gcc patch is: gcc uses two
 macros which are __DATE__ and __TIME__
- 
Not Syncedwhich embed the timestamp and I wrote a
 patch so that if SOURCE_DATE_EPOCH is set
- 
Not Syncedinstead of adding the current time, it takes
 the time from that variable.
- 
Not SyncedI sent this patch to gcc, it's still there
 forgotten with many other patches
- 
Not Syncedbut hopefully at some point they will
 realize that this is interesting and they
- 
Not Syncedwill merge it.
- 
Not Synced[Lamby] Hey. Let's very quickly run you
 through some really simple ways
- 
Not Syncedto fix packages. The details don't
 necessarily matter, it's just to give you
- 
Not Syncedan idea of what needs to be changed and basically
 to point out that it's not rocket science.
- 
Not SyncedSo you can just come in and jump in.
- 
Not SyncedFor example gzip, it's a very old tool
 and they decided to add timestamps when
- 
Not Syncedyou generate it, but it's an easy fix, you
 just add -n flag.
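- 
The same effect can be shown with Python's `gzip` module, as a sketch: the `mtime` argument controls the timestamp field in the gzip header, and setting it to 0 leaves the field empty, comparable to what `gzip -n` does for the timestamp. The payload and the explicit timestamps are arbitrary.

```python
import gzip

payload = b"reproducible builds\n"

# Two builds at different times, simulated with explicit header timestamps.
stamped_a = gzip.compress(payload, mtime=1)
stamped_b = gzip.compress(payload, mtime=2)

# mtime=0 leaves the timestamp field empty, so the output is stable.
clean_a = gzip.compress(payload, mtime=0)
clean_b = gzip.compress(payload, mtime=0)

print(stamped_a == stamped_b, clean_a == clean_b)
```

Both stamped archives decompress to the same payload, yet their bytes differ, which is exactly what breaks a byte-for-byte comparison of packages.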
- 
Not SyncedSome other things easy to change: some
 Python stuff had tag_date=True, which
- 
Not SyncedI don't know if you can see it but adds a
 timestamp to eggs. You just change it to
- 
Not SyncedFalse to get rid of it.
- 
Not SyncedStatic libraries, they are just ar archives
 so the same format as .deb, and you
- 
Not Syncedcan just use binutils or strip-nondeterminism
 tool.
- 
Not SyncedPNG has timestamps for some reason, you can
 get rid of them, that's ImageMagick and it's
- 
Not Synceda bit ugly, but also strip-nondeterminism
 gets rid of it.
- 
Not SyncedTarballs are quite interesting, they will
 by default capture user and group
- 
Not Syncedyou just pass --owner=root bla bla bla...
- 
Not SyncedOrdering, this is interesting as well, it
 will usually use file system ordering
- 
Not Syncedwhich is completely non-deterministic. So
 you need to sort with LC_ALL=C.
- 
Not Synced[Lunar] Think about the locale! Because
 sorting order varies from one locale to the next.
- 
Not Synced[Lamby] They also take timestamps, again
 you can set --mtime or you can muck around
- 
Not Syncedwith find/xargs/touch bla bla...
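- 
The three tarball fixes (owner, ordering, timestamps) can be combined in one sketch using Python's `tarfile` module. The function name `deterministic_tar` and the default timestamp of 0 are choices made for this example; file modes are taken from the file system as they are.

```python
import io
import os
import tarfile


def deterministic_tar(root: str, mtime: int = 0) -> bytes:
    """Pack a directory tree so the archive bytes depend only on its contents."""

    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        info.uid = info.gid = 0           # numeric owner and group, like --owner/--group
        info.uname = info.gname = "root"  # do not capture the building user
        info.mtime = mtime                # fixed timestamp, like --mtime
        return info

    # Collect every entry relative to the root of the tree.
    paths = []
    for directory, subdirs, files in os.walk(root):
        for name in subdirs + files:
            paths.append(os.path.relpath(os.path.join(directory, name), root))

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as archive:
        # sorted() orders strings by code point, independent of the locale.
        for path in sorted(paths):
            archive.add(os.path.join(root, path), arcname=path,
                        recursive=False, filter=normalize)
    return buffer.getvalue()
```

Two trees with the same file contents and modes then give the same archive bytes, regardless of who built them, when the files were touched, or in which order the file system lists them.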
- 
Not SyncedLots of other files have timestamps: Erlang
 files for no reason, even upstream don't
- 
Not Syncedknow why they added a timestamp.
- 
Not SyncedWe have now a patch for SOURCE_DATE_EPOCH,
 which I think landed a couple days ago.
- 
Not SyncedHere's an interesting one, not necessarily
 the current build timestamp, so this is a
- 
Not Syncedtimezone dependent date which Ruby loads
 and then saves incorrectly as your local time.
- 
Not SyncedThis gets mangled, so that's patching.
- 
Not SyncedI'm going from changing individual packages
 to more toolchain things as you can see.
- 
Not SyncedUpstream configure scripts, you can maybe
 see the top that it just uses hostname
- 
Not Syncedfor no reason. Sometimes you can override
 it in debian/rules just by exporting something
- 
Not Syncedor passing a variable to dh_auto_build or
 whatever. That's just a little bit more
- 
Not Syncedinvolved, you have to look at it more
 carefully.
- 
Not SyncedPerl hash order, lot of Perl uses data
 ??? to just output a bunch of stuff which
- 
Not Syncedis just not deterministic. So often just
 setting Sortkeys, but sometimes it's
- 
Not Synceda completely different solution.
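- 
The same class of problem exists outside Perl. As an analogous sketch in Python (not the Perl fix itself): iterating over a set of strings can give a different order from one interpreter run to the next, so anything written to a file should be sorted first. The function names and sample values are invented for the example.

```python
import json


def dump_unstable(tags: set[str]) -> str:
    """Order follows set iteration, which can change between interpreter runs."""
    return ",".join(tags)


def dump_stable(tags: set[str], options: dict[str, str]) -> str:
    """Sort everything before writing it out so the output is deterministic."""
    return json.dumps({"tags": sorted(tags), "options": options}, sort_keys=True)


print(dump_stable({"zlib", "bz2", "xz"}, {"prefix": "/usr", "debug": "no"}))
# {"options": {"debug": "no", "prefix": "/usr"}, "tags": ["bz2", "xz", "zlib"]}
```

Sorting at the point of output is usually the smallest change, since the rest of the program can keep using unordered containers.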
- 
Not SyncedHeader files, so you can maybe see that
 they are using the timestamp essentially
- 
Not Syncedas a unique identifier, you probably have
 to start rewriting these into something saner
- 
Not Syncedbecause this is a wrong use of timestamp
 anyway.
- 
Not SyncedMore Makefiles, the deeper they timestamp
 in the upstream package the more you have
- 
Not Syncedto start patching, so these kind of start
 sucking a little.
- 
Not SyncedWe've made a lot of toolchain changes, some
 already mentioned, some of them already
- 
Not Syncedmerged, see more in this link. Again,
 details don't matter, just check it out
- 
Not Syncedit isn't crazy, it's just working out
 what's different.
- 
Not SyncedIn terms of the work done we've sent these
 many patches: two patches a day,
- 
Not Syncedwhich is not too bad, on average.
- 
Not Synced[Applause]
- 
Not Synced[Holger] I can't clap because I sent three
 or something like that
- 
Not Synced[Lamby] Holger does three per day.
- 
Not SyncedAnd this doesn't count other bugs we found
 in the process of building packages, like
- 
Not Syncedfail to build.
- 
Not SyncedIn blue are the ones that are open and
 in orange the ones that are done.
- 
Not SyncedYou can see that someone went a bit crazy
 in February filing bugs and eventually they
- 
Not Syncedwere being fixed; slowly.
- 
Not Synced[Holger] And actually we filed more bugs
 because the fail to build from source bugs
- 
Not Syncedare excluded, I think we filed 300 FTBFS
 in the last two or three months.
- 
Not Synced[Lamby] And those include fail to build
 because of reproducibility things as well
- 
Not Syncedbut we haven't split them up.
- 
Not Synced[Lunar] What's left to be done because
 Holger said "the graph is a lie".
- 
Not SyncedThe main thing that is blocking a lot of
 work is dpkg. Right now the output of dpkg
- 
Not Syncedwill not be deterministic 100% of the time,
 because of timestamps and at least the
- 
Not Syncedfile ordering. We also have a patch that
 creates these .buildinfo files that we've
- 
Not Syncedshown that works. It's not submitted yet
 to dpkg because we need to agree on the
- 
Not Syncedformat. At least we have ftpmaster or
 maybe dpkg, well we have a lot of people
- 
Not Syncedand that's what we are going to do the
 next hour.
- 
Not SyncedDebhelper also has a few changes; the make
 mtimes, debhelper might also not be
- 
Not Syncedthe best place, maybe we want that in dpkg.
- 
Not SyncedI've been trying to put patches in tar so
 we can make it easier. It's complicated to
- 
Not Syncedsee where's the best place but so far we've
 been doing our tests with ??? and it works.
- 
Not Synced[Holger] In our repository we have these
 packages with these bugs fixed so when
- 
Not Syncedyou want to test reproducibility issues on
 your own machine you need to use the
- 
Not Syncedrepository which has these patches applied
 at the moment.
- 
Not SyncedIn pure sid you cannot create reproducible
 packages.
- 
Not Synced[Lunar] I heard that the SOURCE_DATE_EPOCH
 patch is in git already, so it's going to happen.
- 
Not Syncedcdbs also needed to export SOURCE_DATE_EPOCH
 and we are starting to do more infrastructure
- 
Not Syncedwork: Josch mainly and Akira on sbuild,
 because we wanted to have this
- 
Not Syncedsrebuild script, where you give it a
 buildinfo and it will do the rebuild and
- 
Not Syncedit needs changes in build daemon for the
 build path and also a couple of changes in
- 
Not Syncedsbuild itself.
- 
Not Synced[Holger] And the script is not ready yet,
 this "Finish" means it uses our repository
- 
Not Syncedat the moment, we need to change it to only
 use Sid and snapshot.
- 
Not Synced[Lunar] So there is the buildd issue that
 we need to discuss
- 
Not Syncedand we also need to see how we could include
 or not, or somewhere give this buildinfo
- 
Not Syncedcontrol file to the world so they can
 rebuild the packages, so it's not yet
- 
Not Syncedclear where's the best place to store
 them.
- 
Not SyncedBecause adding 22,000 files, some
 people get cranky about this idea.
- 
Not Synced[Holger] It's more than 22,000 files, it's
 22,000 source packages multiplied by
- 
Not Synced10 architectures; but there's a lot of
 arch builds so that's probably 100,000
- 
Not Syncedbuildinfo files, multiplied by Stretch and
 Sid, so it's 200,000 files or more on
- 
Not Syncedthe file servers and on the mirrors we
 would like to have it.
- 
Not SyncedThat's the same amount of files which are
 currently there. The mirror operators are
- 
Not Syncedcurrently not happy, they will not take it,
 so our current idea is just concatenate
- 
Not Syncedall these files into one file that's 140 MB
 uncompressed, 40 MB compressed.
- 
Not SyncedThat's easier to handle.
- 
Not SyncedAnd then probably have a service
 buildinfo.debian.org where you can
- 
Not Synceddownload individual buildinfo files if you
 need them.
- 
Not Synced[Lunar] And so when we will be done with
 all that we can maybe add a final patch
- 
Not Syncedit would be to Debian policy, ???
 Debian packages be reproducible.
- 
Not Synced[Applause]
- 
Not SyncedI can say again that the dream of mine is
 that we would stop uploading .deb when
- 
Not Syncedwe upload a package, but instead just upload
 the hash of the binary, have the buildd
- 
Not Syncedbuild again this package and only if these
 two match they can enter the archive.
- 
Not SyncedSo we are sure that at least the two
 machines, the developer machine and the
- 
Not Syncedbuild daemon agree that they've built the
 same thing.
- 
Not Synced[Applause]
- 
Not Synced[Holger] I share this dream but I think
 having this in policy as a "must" requirement is
- 
Not Syncedsadly something only for Stretch + 1, but
 I'm curious if we had fixed dpkg and
- 
Not Synceddebhelper now, would you think we should
 upgrade all these wishlist bugs to important now?
- 
Not Synced[Audience] Yes!
- 
Not Synced[Holger] We'll talk about this later soon.
- 
Not Synced[Lunar] But before that we actually have
 work to do.
- 
Not Synced[Dhole] In order to fix your package, the
 first thing you can do is go to
- 
Not Syncedreproducible.debian.net/, and you
 can use the web interface, where you can see
- 
Not Syncednotes on the package, we have tags to
 identify different issues that make packages
- 
Not Syncednot reproducible, with links to the wiki
 about how to solve them.
- 
Not Synced[Holger] When you see this, you want to
 click on this debbindiff link.
- 
Not SyncedIt's still called debbindiff not diffoscope,
 this will show all the differences,
- 
Not Syncedif there is a note. If the package is
 unreproducible and there's no note
- 
Not Syncedit will automatically display the
 debbindiff, and if your package is fine
- 
Not Syncedthere's here a sun.
- 
Not Synced[Dhole] You can also see an entry in the
 tracker, stating if your package is
- 
Not Syncedreproducible or not.
- 
Not SyncedYou can also find information in DDPO and
 DMD. You can find tips on the wiki it's
- 
Not SyncedReproducibleBuilds wiki, we are working on
 a Howto to have detailed steps on different
- 
Not Syncedissues and how to solve them. Lunar gave
 a talk at CCCamp where there's many issues
- 
Not Syncedreally well explained and the solutions for
 them.
- 
Not SyncedYou can also come to our irc channel which
 is #debian-reproducible and ask for help
- 
Not Syncedor go to the mailing-list.
- 
Not SyncedIn order to test locally if your package is
 reproducible right now we are using a
- 
Not Syncedscript that uses pbuilder in a custom
 configuration, you need to set up our
- 
Not Syncedreproducible repository. In the Howto in
 the wiki there's the steps on how to set up
- 
Not Syncedthe chroot and everything, it's documented
 in the wiki.
- 
Not SyncedDiffoscope is in unstable and today it's
 going in Stretch.
- 
Not SyncedWe plan to add these scripts to rebuild
 packages in different settings in devscripts
- 
Not Syncedonce dpkg is good, and we welcome you
 tomorrow to the hacking session from
- 
Not Synced2 to 7 in Stockholm room.
- 
Not Synced[Lunar] That's for fixing your packages,
 please do that. If you want to have even
- 
Not Syncedmore fun, then test your own package, join
 us!
- 
Not SyncedThis is the past year of my life, it has
 been awesome because the team has been
- 
Not Syncedso great, it's ??? atmosphere, lots of new
 understanding so many things you didn't
- 
Not Syncedwant to learn about that you had to learn
 about, and basically it feels very good to
- 
Not Syncedbe part of this actual changing the world
 thing. It's just software but it has some
- 
Not Syncedprofound effect. I've been told that the
 work we've been doing is being tossed
- 
Not Syncedaround in Cisco and Google and Facebook;
 all these big dot com companies bla bla,
- 
Not Syncedthey actually want to do that as well even
 though they are not doing Free Software,
- 
Not Syncedwhich I find weird, but whatever.
- 
Not SyncedSo what do we do? We review packages, we
 have these notes when we actually try to
- 
Not Syncedidentify, so when the maintainer comes
 they don't have to think too much about
- 
Not Syncedthe problem and just fix it. We try to
 identify common trends so when many
- 
Not Syncedpackages have the same problem we make an
 entry and explain and maybe think about fixes
- 
Not Syncedthat could apply to the whole archive.
- 
Not SyncedWe work on this reproducible.debian.net
 jenkins setup, the scripts.
- 
Not SyncedWe hack on the diffoscope tool, we make
 strip-nondeterminism better, we propose
- 
Not Syncedchanges for the toolchains when there are
 needs, some need a lot of patches,
- 
Not Syncedmost of the bugs we have reported on
 individual packages have patches.
- 
Not Synced[Holger] Bugs have patches
 [Lunar] Yes!
- 
Not SyncedAnd also we are actually writing some more
 general documentation from the
- 
Not Syncedunderstanding of these things we have been
 having, we are preparing a reproducible
- 
Not Syncedbuilds Howto to explain to the Free Software
 world how they can do it so it's about some
- 
Not Syncedof what Chris explained but also more
 general consideration on what if you're
- 
Not Syncednot Debian and you want your thing
 reproducible when you distribute as an
- 
Not Syncedindependent vendor. So we want to work on
 different documentation so the whole world
- 
Not Syncedcan actually do that.
- 
Not SyncedWe do a lot of talks as you've seen and
 it's been fun, and with all these
- 
Not Syncedpresentations we've made so far it's all
 in git. And everybody is free to take one
- 
Not Syncedof these slide decks and run with it
 somewhere, translate it...
- 
Not SyncedQuestions?
- 
Not SyncedWe have to run with the microphone, because
 there's no mic anymore.
- 
Not Synced[Question] I just wanted to make two quick
 comments: so first of all diffoscope is
- 
Not Syncedreally awesome, not only for reproducibility
 but also for example if you change your
- 
Not Synceddebian/rules in some way and want to see if
 the package is the same afterwards because
- 
Not Syncedyou just cleaned up a bit, that's really
 awesome for that, so thank you.
- 
Not SyncedAnd also I think the work you're doing now
 is something that in 20 years time we're
- 
Not Syncedgoing to look back towards it and think,
 well, of course builds should be
- 
Not Syncedreproducible, so thank you very much for
 your work!
- 
Not Synced[Applause]
- 
Not Synced[Question] When reproducibility becomes
 part of the Debian policy, will there be a
- 
Not Syncedlintian --reproducible?
- 
Not Synced[Holger] I don't think lintian can detect
 that because lintian works on the source
- 
Not Syncedpackage and you need to build the package
 for this.
- 
Not Synced[Lamby] Things that could be detected by
 lintian from a static analysis point of view,
- 
Not Syncedyeah I'm sure, like looking for gzip
 without -n for example, but that wouldn't
- 
Not Syncedbe conclusive from lintian point of view.
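
To illustrate the gzip remark: `gzip -n` leaves the original file name and timestamp out of the gzip header. A rough Python analogue, assuming Python 3.8 or later, is to pass `mtime=0` to `gzip.compress`; the wrapper name `deterministic_gzip` is made up for this sketch.

```python
import gzip


def deterministic_gzip(data: bytes) -> bytes:
    """Compress data with a zeroed header timestamp.

    gzip.compress() never stores a file name, and mtime=0 zeroes the
    4-byte timestamp field, so the output depends only on the input.
    """
    return gzip.compress(data, mtime=0)


payload = b"reproducible builds\n"
first = deterministic_gzip(payload)
second = deterministic_gzip(payload)
assert first == second

# With differing timestamps the header bytes differ, which is the
# kind of variation a reproducibility check would flag.
assert gzip.compress(payload, mtime=1) != gzip.compress(payload, mtime=2)
```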
- 
Not Synced[Lunar] One thing that I really wanted to
 add to diffoscope at some point - the code is made
- 
Not Syncedthe way that it's possible - it's to have
 hints so when it actually looks up
- 
Not Synceddifferences between two packages then you
 can have an idea, suggest you: hey you need
- 
Not Syncedto remove those timestamps, or you should
 sort these keys. It's not done yet, but if
- 
Not Syncedanybody wants to do patches it's totally
 doable.
- 
Not Synced[Question] Thank you for the work, have
 you thought about reproducible images?
- 
Not Synced[Holger] It's on the todo list.
- 
Not SyncedBefore images we need reproducible package
 installation, and then we need reproducible
- 
Not Syncedimages like squashfs has some things which
 are not reproducible, but the package
- 
Not Syncedinstallation is not reproducible at the
 moment because apt installs packages in
- 
Not Syncedarbitrary order and then the post-inst
 create for example users which get
- 
Not Synceduser-ids in the order the packages are
 installed, so to fix that either apt
- 
Not Syncedneeds a way to install in a deterministic
 order, but it's on the todo list file.
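
The effect described here can be shown with a toy model. This is only a sketch: `assign_uids`, the package names and the starting uid are invented for illustration and do not reflect how apt or any post-inst script actually behaves.

```python
def assign_uids(install_order, creates_user, first_uid=100):
    """Hand out user ids sequentially, in the order packages are installed."""
    uids = {}
    next_uid = first_uid
    for package in install_order:
        user = creates_user.get(package)
        if user is not None and user not in uids:
            uids[user] = next_uid
            next_uid += 1
    return uids


# Hypothetical packages whose post-inst scripts each create one system user.
creates_user = {"pkg-a": "alpha", "pkg-b": "beta"}

run_one = assign_uids(["pkg-a", "pkg-b"], creates_user)
run_two = assign_uids(["pkg-b", "pkg-a"], creates_user)
assert run_one != run_two  # same package set, different resulting system

# Installing in a deterministic (here: sorted) order removes the variation.
assert assign_uids(sorted(["pkg-b", "pkg-a"]), creates_user) == run_one
```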
- 
Not Synced[Lunar] Pab??? started a wiki page a couple
 of months ago that is called reproducible
- 
Not Syncedinstall. This is very important if we want
 tools like Tails to actually be reproducible
- 
Not Syncedso some people will work on that, we do
 want to work on that.
- 
Not Synced[Lamby] It's quite a deep problem for
 example d-i will install different stuff
- 
Not Synceddepending on your hardware, so that's
 immediately not reproducible.
- 
Not SyncedIt'd be great.
- 
Not Synced[Question] I've been working on a couple
 of my packages to get them reproducible
- 
Not Syncedbuild, but I was often wondering if I
 should fix it in my package or actually
- 
Not Syncedthat it should be fixed in higher up and I
 guess I've been adding some fixes to my
- 
Not Syncedpackages which may in the future even not
 be needed anymore and then it's just
- 
Not Syncedunnecessary code as well.
- 
Not SyncedSo how do you see where things should be
 fixed and how should we as package
- 
Not Syncedmaintainers go about with this?
- 
Not Synced[Holger] There's many things which there's
 the easy fix to whatever: set the timezone in
- 
Not Synceddebhelper or better in dpkg to UTC, but
 that will not fix the upstream bugs, so
- 
Not Syncedactually it's better not to fix, set the
 timezone or other things deterministically
- 
Not Syncedin these tools but rather have them fixed
 upstream, that's what we want.
- 
Not SyncedSome things we will need to fix them in
 dpkg to get a meaningful result but
- 
Not Syncedbasically we want rather these distributions
 with just build from source which don't have
- 
Not Synceddebian/rules and they just build with
 upstream Makefiles, we want the fixes
- 
Not Syncedto land there.
- 
Not Synced[Lunar] We've been experimenting for two
 years and this is a lot of trial and error,
- 
Not Syncedtrying something, see how it fails, or
 maybe we can do better than that and
- 
Not Syncedchanging. And I know this can be frustrating
 at some point because you do changes
- 
Not Syncedand they all become unneeded, but in the
 end this is how we make stuff that matters.
- 
Not SyncedAnd we move forward, it's not because we're
 trying to make the big picture at once,
- 
Not Syncedand I know in Debian we sometimes try to do
 that, so we experiment and learn from it.
- 
Not Synced[Question] An example that I'm now looking
 into is actually the documentation is built
- 
Not Syncedfor this package by looking in all the files
 and generating but, for instance the
- 
Not Syncedindex file is sorted, but I guess upstream
 would say: well, if you set some ordering
- 
Not Syncedin your LC parameters you want this page
 to be ordered as you want, instead of forcing
- 
Not Syncedit in the sort, so I'm really wondering:
 should I now upstream this or should
- 
Not SyncedI just fix it in my rules because that's
 the logical place?
- 
Not Synced[Lunar] Both. No, there's no good answer,
 I'm quite a strong proponent on the idea
- 
Not Syncedthat if you use a computer you should be
 able to talk and have the computer talk to
- 
Not Syncedyou in the language that you choose, so if
 people want to have gcc error messages
- 
Not Syncedin German, they should have it.
- 
Not SyncedBut local sorting, this is the kind of
 LC_ALL that can be very local and that
- 
Not Syncedyou can do for just one tool, it's fine to
 do that.
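
Scoping the locale to a single tool, as suggested here, might look like the following sketch. It assumes a POSIX `sort` command is available on the PATH; the function name `sort_in_c_locale` is hypothetical. Only the child process sees `LC_ALL=C`, so the rest of the build keeps the user's language settings.

```python
import os
import subprocess


def sort_in_c_locale(lines):
    """Run the external `sort` with LC_ALL=C for this one invocation only."""
    env = dict(os.environ, LC_ALL="C")  # copy; the parent environment is untouched
    result = subprocess.run(
        ["sort"],
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        env=env,
        check=True,
    )
    return result.stdout.splitlines()


# In the C locale, ordering is by byte value, so uppercase sorts first.
print(sort_in_c_locale(["b", "B", "a"]))
```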
- 
Not Synced[Question] Do you have ideas on making
 sources reproducible? Like upstreams
- 
Not Syncedcalling make dist, or this infamous
 autogen.sh files?
- 
Not Synced[Lunar] I don't think that anybody in the
 team has looked into that yet, source
- 
Not Syncedfiles are easy to analyze way more than
 binary packages so, it would still be great
- 
Not Syncedto have easier ways; you have source
 tarballs be byte for byte identical,
- 
Not Syncedbut it's not as much of an issue as it is for
 binaries. If people want to look in that
- 
Not Syncedthey should.
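
For anyone who wants to look into byte-for-byte identical source tarballs, the usual sources of variation are member order, timestamps and ownership. A minimal sketch with Python's `tarfile` module follows; the function `deterministic_tar` and its in-memory input are assumptions made for this example, not something the team has published.

```python
import io
import tarfile


def deterministic_tar(files, mtime=0):
    """Build an uncompressed tar archive whose bytes depend only on `files`.

    `files` maps member names to their content as bytes.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed member order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime  # fixed timestamp
            info.mode = 0o644
            info.uid = info.gid = 0  # no builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


one = deterministic_tar({"b.txt": b"2", "a.txt": b"1"})
two = deterministic_tar({"a.txt": b"1", "b.txt": b"2"})
assert one == two
```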
- 
Not Synced[Question] Do you know a way to make git
 archive build something reproducible?
- 
Not Synced[Lunar] Well ???
- 
Not Synced[Question] Yes, but without it.
- 
Not Synced[Holger] There's one tool. You want to use
 a new one? Then write it.
- 
Not SyncedWhy not use that tool which does the job?
- 
Not Synced??? does it.
- 
Not Synced[Lunar] This is for source and so that's
 another issue than what we are actually
- 
Not Syncedcurrently working on.
- 
Not Synced[Holger] You're welcome to join the team and
 extend our scope to sources.
- 
Not Synced[Lunar] How many questions, two?
- 
Not SyncedTwo more questions, two or three.
- 
Not Synced[Question] So if there is a couple of other
 environment variables that could be set
- 
Not Syncedin the environment to increase
 reproducibility, where to put them?
- 
Not SyncedIn the rules file? Or in the generic build
 environment of all packages, or where
- 
Not Syncedshould these things be placed?
- 
Not Synced[Lamby] It'd be nice if upstream fixed it,
 so if we just change it in debian/rules
- 
Not Syncedthat's just only helping us, so often take
 it upstream, would be the ideal solution.
- 
Not SyncedAre you referring to something else?
- 
Not Synced[Question] For example many hashmaps have
 randomized data in the hash function, so if
- 
Not Syncedyou have some code that relies on hash
 order, at least some implementations of
- 
Not Syncedhash functions are leaving them ???
 rather than using something random for
- 
Not Synceda build thing, but you want the randomness
 in your hash functions for normal users
- 
Not Syncedbecause else your hashmaps get open
 to attacks.
- 
Not Synced[Lamby] Correct, yes.
- 
Not Synced[Lunar] In these cases we send patches
 adding sort everywhere for the keys and
- 
Not Syncedit's ???. For very few cases, for Perl for
 example you can set an environment
- 
Not Syncedvariable and some maintainers prefer to do
 that. But usually we try to push these
- 
Not Syncedchanges upstream, because they are simple
 enough and they like it.
- 
Not SyncedActually it makes testing easier for them.
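
The "add sort for the keys" kind of patch can be sketched as below. In Python, the iteration order of a set of strings can change between runs because string hashing is randomized by default, which is a close analogue of the hashmap problem raised in the question; `render_index` is a hypothetical helper.

```python
def render_index(entries):
    """Emit one entry per line in sorted order, never in hash order."""
    return "\n".join(sorted(entries)) + "\n"


# The input is a set, so its iteration order is not guaranteed to be stable
# across runs, but the sorted output always is.
print(render_index({"zeta", "alpha", "mid"}), end="")
```

Sorting at the point where output is written keeps hash randomization in place for normal use, so the program does not give up its protection against hash-collision attacks.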
- 
Not SyncedThere was one in the back, there.
- 
Not Synced[Lunar] That's the last question
- 
Not Synced[Question] Follow up question to what we
 had here before.
- 
Not SyncedYou showed an open bug report against gcc
 to support SOURCE_DATE_EPOCH to cover
- 
Not Syncedthe __DATE__ and __TIME__ timestamps, so I have
 patches to patch them out in my packages.
- 
Not SyncedShould I remove those patches and if so,
 when?
- 
Not Synced[Lunar] Have you seen any more emails
 from the gcc maintainers?
- 
Not Synced[Dhole] The mail is forgotten, I guess we
 should ping it again, and see if they
- 
Not Syncedreply, because what I read from the gcc
 website is that only the replies from
- 
Not Syncedmaintainers are the ones that matter, and
 I think no maintainer replied to the
- 
Not Syncedmessage, so we should ping again.
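
For tools that embed a build date, honoring the SOURCE_DATE_EPOCH variable mentioned in the question could look roughly like this sketch. The function name `build_timestamp` and the output format are assumptions; the idea is only that the variable, when set, replaces the current time, and that formatting is done in UTC so the builder's timezone does not leak in.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ):
    """Return the timestamp to embed in build output.

    Uses SOURCE_DATE_EPOCH (seconds since the Unix epoch) when it is set,
    and falls back to the current time otherwise.
    """
    epoch = int(environ.get("SOURCE_DATE_EPOCH", time.time()))
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


# Passing a plain dict instead of os.environ keeps the example deterministic.
print(build_timestamp({"SOURCE_DATE_EPOCH": "0"}))  # 1970-01-01 00:00:00
```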
- 
Not Synced[Question] That was just an example, my
 question was more general.
- 
Not SyncedAt which time should I remove my patches
 to fix things which were fixed higher up
- 
Not Syncedin the toolchain? Or should I just leave
 them in there?
- 
Not Synced[Holger] Once they are in Sid.
- 
Not Synced[Question] Ok thanks!
- 
Not Synced[Lunar] Ok, I guess we're out of time.
- 
Not SyncedThank you for listening.
- 
Not Synced[Applause]
- 
Not Synced[Lunar] Fix your packages!
              
