Welcome and good morning
This is the reproducible builds team,
talking about
"Stretching out towards trustworthy
computing"
[Applause]
There are four of us on stage, but this is actually a
team effort.
All these people listed here have
contributed to the project at one point.
The 4 of us, that's
Lunar − me
there's Dhole,
Chris Lamb − lamby
and Holger.
But actually, this is DebConf and so a lot
more of us have been or are
currently here and so, if you want to
thank anybody that is working on this
you need to actually thank all of
these folks
'cause, yay.
[Applause]
[Holger] The people in blue are here.
[Lunar] Let's get started.
Quick recap on what we're talking
about.
We have software, it's made from source.
Source is readable by humans or at least
a good amount of humans.
In this room it's good.
Binary, readable by computer and some
tiny fraction of humanity.
Going from source to binary is called
a build, or building, or compiling,
and we're doing free software and
free software is awesome because
we can actually run these binaries like
we want
We can actually study the software, how
it's been made by studying the source
and by studying the source we can assess
that it does what it's supposed to do
and not something else: that it does not
have malware, trojans or security bugs.
So we have the binary that can be used,
fine.
We have the source that can be verified.
The problem is that right now, to know that a
binary we get matches its source,
we have to trust a website or a Debian
repository that says
"Well, this binary has been made with this
source"
But there's no way we can actually prove
that.
This problem has been
well explained by
Mike Perry and Seth Schoen at 31C3
in Hamburg last December.
For example, Seth Schoen made a proof of
concept exploit for the Linux kernel:
without modifying anything on the disk,
when the kernel detects that GCC is going
to read a C file, it inserts some
extra lines of code, and these lines of
code can do very bad things,
as shown in the 31C3 talk I was just
recalling.
Actually, you can even have developers
who are in very good faith, who have
totally secure dev machines, or they
thought they did,
who have reviewed all their source code
for any bugs
and we would still get totally owned as
soon as their computer gets compromised
or one of the build daemons from Debian
gets compromised for example.
These are not, like, hypothetical threats
we're discussing here.
A couple of months after Seth and Mike's
talk at 31c3,
The Intercept revealed from the Snowden
leaks
that at a CIA conference in 2012, one
of the talks that happened
was about a project called Strawhorse.
Strawhorse is about modifying Apple Xcode,
which is the development environment
for Mac OS X and iOS applications,
and well, they were modifying Xcode so
it would produce,
without the developer knowing,
binaries with trojans, malware,
watermarked binaries, lots of bad things.
So, solution:
enable anyone to reproduce identical
binary packages from a given source.
Because if, using the same source and the same
environment,
multiple people on different computers, on
different networks, at different times,
can all get the same thing
from the same source
all the same binary, byte for byte,
then there's a good chance that…
Well, everybody could be owned,
but let's be more joyful and say that
probably, if everybody gets the same
result, there was actually no problem
and everybody is safe.
We call that solution
"reproducible builds"
Yay.
[Applause]
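The check that reproducible builds enable is simple once independent builds exist: hash each artifact and compare. A minimal Python sketch, assuming the artifacts built on different machines have already been copied to local files (the function names are illustrative, not taken from any Debian tool):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_agree(artifacts: list[Path]) -> bool:
    """True only if there are artifacts and all of them are byte-for-byte identical."""
    digests = {sha256_of(p) for p in artifacts}
    return len(digests) == 1
```

If builders on different machines and networks all report the same digest, an attacker would have had to compromise every one of them in exactly the same way.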
Actually, it's not only about security.
For Debian, if you're doing
"Multi-arch: same" packages,
the files shared between architectures
must have the same bytes even when
they are built for different architectures.
Debug packages: if you forgot to create
debug packages in the first place,
you can create them at a later time
by passing the nostrip option, and
because the package is reproducible,
you will get debug symbols that work
for software that has already been shipped.
We get early detection of FTBFS that way,
because if we quickly try
to reproduce a build,
then the build has to work.
It's useful for build profiles.
We can get smaller .deb deltas,
because from one version to the next we
might have the same content.
We can do validation of cross-builds,
Helmut Grohne can talk to you about that.
And also, Niels Thykier told me that
he was very interested in reproducible
builds because it would enable him to
test debhelper better, because
if the package builds reproducibly,
then he makes a change to debhelper,
then he can rebuild
the same version of a package with a newer
debhelper and see what has changed
and the changes can be attributed only to
what he has worked on in debhelper,
for example.
And, oh my.
The whole world is watching us.
For the past year and a half or two years,
everybody I meet at security conferences,
hacker conferences, free software
conferences is like
"Oh you're working on that,
that's awesome."
And, I mean, I've been the one doing quite
a lot of talks, and everybody comes to me
and I'm like "Wow wow, this is way bigger",
but we're actually leading the field here.
Yay Debian.
[Applause]
[Holger] So, we are not the only ones
leading the field,
Bitcoin and Tor made their software
reproducible before us,
Coreboot also succeeded, if you build
Coreboot without any payload,
that's 100% reproducible.
FreeBSD has had a page on their wiki since
2013
saying there are 5 reproducibility issues
in their base system.
We're at the moment trying to
confirm this.
On jenkins.debian.net, I've now also set up
tests for FreeBSD, NetBSD,
Coreboot and OpenWrt.
So if you go to
reproducible.debian.net/
you can see those test results.
And there's more in the pipeline.
There are other projects interested
as well.
NetBSD also has a variable MKREPRO
which you can set
and that builds reproducibly.
Though they think "keeping some
timestamps is fine", and then
filter them out later.
We disagree.
So this is what Debian looks like,
Debian Sid,
but this is a lie.
This is not the truth.
This is just our test setup.
Sid is not like this.
For Sid, it's all orange, there's zero
reproducibility in Sid today.
What we'll talk about now and in the following
round table
is how to actually make Sid reproducible.
The current status is:
we have been working on this in Debian for
two years.
We have published weekly reports about our project
since May,
and we've given several talks, especially
in the last year,
and all these talks, presentations and
other material are linked in the wiki.
There's a page with information about
Debian, these BSDs,
other Linux distributions and upstream software,
all on this wiki.
Since DebConf14, which is barely
a year ago,
we've made quite a few changes.
We have introduced
strip-nondeterminism
which is called by dh at the end
of the build of the package
and will normalize some things
which Chris will explain later
We have decided on a fixed build path
because the build path is leaked
in binaries and several other places.
We haven't yet found a way to make
the build path arbitrary.
We designed a way to record the build
environment
because to rebuild, you need to recreate
the build environment.
We set up this Jenkins setup.
We wrote diffoscope which used to be
called debbindiff
which shows differences between two
packages or two directories or
two filesystems by now.
There's SOURCE_DATE_EPOCH, which is a way
to expose to the tools
the time of the last modification of the source.
People want to include the build date
because they think it is a
meaningful indication of
when a build was done and
which software was used.
But if the build always recreates
the same results
the build date becomes meaningless
and the really interesting thing is
the latest modification of the source.
We have written patches for the tools
[Lunar] strip-nondeterminism:
is Andrew Ayer in the audience?
Yay! He did it!
It's written in Perl because we didn't
want to have a new build dependency
in all Debian packages.
Basically it takes anything and tries
to normalize it as much as it can
replacing timestamps or file permissions
or removing some issues.
It's working very well on many formats,
it's meant to be extensible
so we can actually add more things and
it's run by dh at the end of the process, as Holger said.
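strip-nondeterminism itself is written in Perl and covers many formats. As a rough Python illustration of the same kind of normalization (not its actual code), here is a sketch for zip-based files such as jars: entries are sorted by name, every timestamp is replaced by a fixed date, and permissions are reset. The fixed date and permission values are assumptions chosen for the example.

```python
import zipfile

# Earliest timestamp the zip format can represent; used for every entry.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def normalize_zip(src: str, dst: str) -> None:
    """Rewrite a zip archive with sorted entries, fixed timestamps and permissions."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in sorted(zin.infolist(), key=lambda i: i.filename):
            clean = zipfile.ZipInfo(info.filename, date_time=FIXED_DATE_TIME)
            clean.compress_type = info.compress_type
            clean.create_system = 3  # "Unix", independent of the build host OS
            if info.is_dir():
                clean.external_attr = (0o40755 << 16) | 0x10  # drwxr-xr-x + MS-DOS dir bit
            else:
                clean.external_attr = 0o100644 << 16  # -rw-r--r--
            zout.writestr(clean, zin.read(info.filename))
```

Two archives with the same contents but different entry order and timestamps normalize to identical bytes.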
The .buildinfo file is currently a proposal
we have not yet totally agreed on,
but we are generating them as part
of the test we have
and basically it's a new control file that
will tie the sources, the generated binary
the packages that were used to build this
binary and their version.
The idea is that we can use this file to
reinstall all the specific versions from snapshot
So we recreate the same build environment
then we can just start the build from that source
that was mentioned and see if the binary
that has been generated matches.
Here is what it looks like for now: you see
the source, the binary, and the build path.
Because currently we don't have any good
post-processing tool for build paths
in ELF and DWARF binaries, we just decided
to specify the build path, so when we do
a later rebuild we use that path and are safe.
The source is the .dsc, the binary is the .deb, and
there's a list of packages with their versions.
We currently use the base-files version
to know which Debian release is to be used
as the basis.
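To make the idea concrete, here is a sketch of rendering such a record in a deb822-like style. Only the kinds of information come from the talk (source, binary, build path, installed package versions); the field names, layout and version numbers are hypothetical, since the format was still being agreed on.

```python
import hashlib


def render_buildinfo(artifacts: dict[str, bytes], build_path: str,
                     installed: dict[str, str]) -> str:
    """Render a deb822-style record tying sources, binaries and build environment.

    Field names are illustrative only: the real .buildinfo format was still
    being discussed when this talk was given.
    """
    lines = [f"Build-Path: {build_path}", "Checksums-Sha256:"]
    for name in sorted(artifacts):  # sorted so the output itself is deterministic
        data = artifacts[name]
        lines.append(f" {hashlib.sha256(data).hexdigest()} {len(data)} {name}")
    lines.append("Build-Environment:")
    deps = [f" {pkg} (= {installed[pkg]})" for pkg in sorted(installed)]
    if deps:
        lines.append(",\n".join(deps))
    return "\n".join(lines) + "\n"
```

A rebuilder would read the Build-Environment list, reinstall exactly those versions from snapshot, rebuild in the recorded path and compare the resulting checksums.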
[Holger] The general procedure for testing is:
we build the source, we save the results,
we modify the environment and we build
it again and compare the results.
That started as a shell script last year which I
put on jenkins and then it exploded a bit
and now we have 67 jenkins jobs running on
7 hosts.
Since last week we have 4 armhf small boards
where we will be able to test armhf,
but very slowly.
We have two new amd64 build nodes.
The code is now split into Python and bash
scripts.
For all the other distro testing there's a
lot of bash code now, which is mostly
boilerplate: it's 5 lines or something
to build FreeBSD and 5 lines to build NetBSD,
but there are 100 lines of boilerplate around them, so it's
really not that much code.
We do test Testing, Unstable and Experimental.
For arm we only start with Unstable.
We do like hardware so if you have hardware
to donate to us, that would be great,
we need ssh and then root basically.
We are testing Coreboot, OpenWrt and the
BSDs; soon I will also set up a Fedora test.
I don't want to test all the 20,000 Fedora
packages but just 200 or something:
the base system of Fedora to examine how
rpm works
to get really the whole Free Software world
reproducible.
This is all run on ProfitBricks hardware
since 2012, so thanks to ProfitBricks.
These are the variations we do for Debian.
It's the hostname, username, timezone,
locale.
Chris will explain what modifications
this causes, variances...
We are not testing differences in date
at the moment, so the date is always the same;
only the time is a bit different.
[Lunar] Well, almost! Because we cheat with
the timezone: we use one timezone that is
GMT-14 and then GMT+12, so it's more than
24 hours apart.
[Holger] On the first of the month we
sometimes find new bugs where there are
packages which record the month.
We don't have variations of the CPU type
at the moment.
We'll have both time and CPU type variations
in about one or two weeks;
the nodes are being prepared at the moment.
Then we will test all the meaningful
variations we could think of.
There will probably be some packages which
build differently according to the
number of CD drives attached or whatever
things, but those will be found by you.
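The real tests rebuild packages with pbuilder on jenkins.debian.net; the principle, though, fits in a few lines. A toy Python sketch, assuming a "build" is any command whose standard output is the artifact: run it twice under deliberately different environments and compare digests. The concrete variation values are illustrative assumptions loosely modelled on the ones described above.

```python
import hashlib
import os
import subprocess

# Two deliberately different environments (illustrative values). Note that
# POSIX "Etc/GMT+12" means UTC-12 and "Etc/GMT-14" means UTC+14.
VARIATION_A = {"TZ": "Etc/GMT+12", "LC_ALL": "C", "USER": "builder1"}
VARIATION_B = {"TZ": "Etc/GMT-14", "LC_ALL": "fr_CH.UTF-8", "USER": "builder2"}


def output_digest(cmd: list[str], variation: dict[str, str]) -> str:
    """Run cmd with the variation applied on top of the current environment."""
    env = {**os.environ, **variation}
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def is_reproducible(cmd: list[str]) -> bool:
    """Build twice under different environments and compare the results."""
    return output_digest(cmd, VARIATION_A) == output_digest(cmd, VARIATION_B)
```

A command that leaks the user name, timezone or locale into its output is reported as unreproducible.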
[Lunar] We are doing all these tests because
we want you to get the same results when you
rebuild a package on your machine, even if
anything is different from the build daemons
in Debian.
We use this to detect these problems early,
before they turn into a false positive that we have
to investigate when someone rebuilds a
package on their machine.
To understand the differences that we find
from one build to the other, we wrote a tool.
It also started as a 10-line shell script,
then it felt okay-ish,
and so: Python!
And now it's a lot of code and it actually
grew way beyond Debian packages.
We changed the name: it was called debbindiff,
but it's absolutely not tied to Debian anymore.
It's called diffoscope, thanks to Jocelyn
for the name.
Basically what it does: it tries to get to
the bottom of what is different between
two archives or directories.
Because it's not useful to compare bytes that
are compressed by gzip or xz, that will not
lead you to understand what is different
you need to uncompress and look at
uncompressed data, and if the thing actually
compressed is a tarball, you might actually
want to compare the files inside the tarball.
If there is a PDF inside this archive, you
don't want to compare the bytes of the PDF
you want to compare the text of the PDF.
So this is basically what diffoscope does:
it tries to unpack anything that is
a container and compare the things in this
container, and if they can be transformed into
a human-readable form it will try to do
that, and compare these human-readable forms.
And if it doesn't find any difference but
the bytes still differ,
it will fall back to binary comparison.
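The recursive idea can be shown with a toy version that only understands one container format. This is a Python sketch of the approach, not diffoscope's code or API: gzip streams are decompressed and compared recursively, text is shown as a unified diff, and anything else falls back to a binary verdict.

```python
import difflib
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def compare(a: bytes, b: bytes, name: str = "file") -> list[str]:
    """Explain differences between two blobs, looking inside gzip containers first."""
    if a == b:
        return []
    if a[:2] == GZIP_MAGIC and b[:2] == GZIP_MAGIC:
        inner = compare(gzip.decompress(a), gzip.decompress(b), name + " (decompressed)")
        # Identical payloads mean only the gzip header (mtime, name, ...) differs.
        return inner or [f"{name}: only the gzip header differs"]
    try:
        text_a, text_b = a.decode("utf-8"), b.decode("utf-8")
    except UnicodeDecodeError:
        return [f"{name}: binary contents differ"]  # fallback: binary comparison
    return list(difflib.unified_diff(text_a.splitlines(), text_b.splitlines(),
                                     name + " (a)", name + " (b)", lineterm=""))
```

Comparing the compressed bytes directly would only say "they differ"; looking inside tells you whether it was the payload or just the header.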
Try it, extend it; it's Python, it's modular,
it's great.
It already supports squashfs, ISO, rpm,
gettext .mo files and so many different things.
You can have HTML output like that,
so this is what is displayed on many
examples we've shown so far, and also
to make it easier for copy paste
and post processing we have the text output.
You can also use it to review packages before
uploading them to Debian.
It does fuzzy matching, so even if the
directory is different in the archive it will
find it like git does.
It has grown way beyond just reproducible
builds. A useful tool.
[Dhole] In order to solve timestamp issues, we are
proposing the SOURCE_DATE_EPOCH variable.
This is because most of the time having
the build date embedded in a package
is not useful for the user, because you could
take a really old package and build it today,
and that date would not be useful.
We are standardizing a replacement for build
dates so that tools can use it.
When this variable is set, instead of
embedding the current date, the tool will embed
the date taken from SOURCE_DATE_EPOCH, which
contains a Unix epoch timestamp.
This is a general solution we are trying to
standardize so that not only Debian uses it,
but other Free Software projects and
distributions and in the case of Debian,
we set this variable to the latest Debian
changelog entry timestamp.
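A sketch of both halves in Python: deriving the value from a changelog entry, and a tool honouring it. The trailer parsing is a simplified assumption about the debian/changelog format (in Debian the packaging tools take care of this, not code like this), and the maintainer name is made up.

```python
import email.utils
import os
import time


def changelog_epoch(trailer: str) -> int:
    """Return the Unix timestamp from a debian/changelog trailer line such as
    ' -- Jane Doe <jane@example.org>  Mon, 17 Aug 2015 10:00:00 +0200'."""
    date_part = trailer.split(">", 1)[1].strip()
    return int(email.utils.parsedate_to_datetime(date_part).timestamp())


def build_timestamp() -> int:
    """Honour SOURCE_DATE_EPOCH when set; otherwise fall back to the current time."""
    value = os.environ.get("SOURCE_DATE_EPOCH")
    return int(value) if value is not None else int(time.time())
```

Because the changelog trailer carries a timezone offset, the resulting epoch is the same on every machine, whatever its local timezone.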
We have already been sending patches to
different packages, mostly it's documentation
generation. So here's a list of bugs that
we have opened which have been closed
and merged; so it's help2man, epydoc,
ghostscript, texi2html and sphinx.
We are both sending these patches to Debian
and upstream so all the distributions can
use them, and we have also been sending
patches to other packages which are still
open, so we encourage you to take a look
at these packages if you are the maintainer
and merge the patch.
[Lunar] Thanks to Daniel Kahn Gillmor and
Ximin Luo for pushing this proposal forward.
And also lots of these patches have been
written by Akira and Dhole as part of their
Google Summer of Code, and you did really
great work.
[Applause]
[Dhole] The gcc patch: gcc has two
macros, __DATE__ and __TIME__,
which embed the timestamp, and I wrote a
patch so that if SOURCE_DATE_EPOCH is set,
instead of using the current time, it takes
the time from that variable.
I sent this patch to gcc, it's still there
forgotten with many other patches
but hopefully at some point they will
realize that this is interesting and they
will merge it.
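What such a patch has to produce can be sketched in Python: given an epoch, build the two string literals in the formats the C standard defines for __DATE__ ("Mmm dd yyyy", day padded with a space) and __TIME__ ("hh:mm:ss"). This assumes the epoch is rendered in UTC and is not the gcc patch itself.

```python
import time

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def date_time_macros(epoch: int) -> tuple[str, str]:
    """Return (__DATE__, __TIME__) string literals for a given epoch, in UTC.

    __DATE__ is "Mmm dd yyyy" with a space-padded day; __TIME__ is "hh:mm:ss".
    Month names are spelled out so the result does not depend on the locale.
    """
    t = time.gmtime(epoch)
    date = f'"{MONTHS[t.tm_mon - 1]} {t.tm_mday:2d} {t.tm_year}"'
    clock = f'"{t.tm_hour:02d}:{t.tm_min:02d}:{t.tm_sec:02d}"'
    return date, clock
```

For example, `date_time_macros(1439798400)` gives `('"Aug 17 2015"', '"08:00:00"')`, the same on every machine.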
[Lamby] Hey. Let me very quickly run you
through some really simple ways
to fix packages. The details don't
necessarily matter, it's just to give you an idea
of what needs to be changed, and basically
to point out that it's not rocket science.
So you can just come in and jump in.
For example gzip: it's a very old tool
and it adds a timestamp to the file
you compress, but it's an easy fix: you
just add the -n flag.
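The effect of `-n` can be seen from Python's standard gzip module, which writes the same header fields. This sketch only mimics the command-line tool's two behaviours: by default the header stores the original file name and a modification time, while with `-n` neither is stored and the mtime field is zero.

```python
import gzip
import io


def gzip_default(data: bytes, filename: str, mtime: int) -> bytes:
    """Roughly what plain 'gzip' records: the original file name and its mtime."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=filename, mode="wb", fileobj=buf, mtime=mtime) as f:
        f.write(data)
    return buf.getvalue()


def gzip_no_name(data: bytes) -> bytes:
    """Roughly what 'gzip -n' records: no file name and a zero mtime field."""
    return gzip.compress(data, mtime=0)
```

The same input compressed at two different times differs with the first function and is byte-for-byte identical with the second.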
Some other things easy to change: some
Python stuff had tag_date=True, which
I don't know if you can see it but adds a
timestamp to eggs. You just change it to
False to get rid of it.
Static libraries: they are just ar archives,
so the same format as .deb, and you
can just use binutils or the strip-nondeterminism
tool.
PNG has timestamps for some reason; you can
get rid of them with ImageMagick, but it's
a bit ugly, and strip-nondeterminism also
gets rid of them.
Tarballs are quite interesting: by default
they capture the user and group, so
you just pass --owner=root bla bla bla...
Ordering, this is interesting as well, it
will usually use file system ordering
which is completely non-deterministic. So
you need to sort with LC_ALL=C.
[Lunar] Think about the locale! Because
sorting order varies from one locale to the next.
[Lamby] Tarballs also store timestamps; again,
you can set --mtime or you can muck around
with find/xargs/touch bla bla...
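The fixes for tar mentioned here are command-line flags (`--owner=root`, `--mtime`) plus sorting under `LC_ALL=C`. The same three normalizations can be sketched with Python's standard `tarfile` module; the fixed permission value is an assumption for the example.

```python
import io
import tarfile


def deterministic_tar(files: dict[str, bytes], mtime: int) -> bytes:
    """Build an uncompressed tar archive with fixed order, owner and timestamps."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # sorted() on str compares code points, which for UTF-8 names is the
        # same order as byte-wise sorting under LC_ALL=C.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime  # like tar --mtime
            info.mode = 0o644
            info.uid = info.gid = 0  # like tar --owner=root --group=root
            info.uname = info.gname = "root"
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Note that "B.txt" sorts before "a.txt" here, as it would under `LC_ALL=C`; a locale-aware sort could put them the other way round.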
Lots of other files have timestamps: Erlang
files for no reason, even upstream don't
know why they added a timestamp.
We now have a patch for SOURCE_DATE_EPOCH,
which I think landed a couple of days ago.
Here's an interesting one, not necessarily
the current build timestamp, so this is a
timezone dependent date which Ruby loads
and then saves incorrectly as your local time.
This gets mangled, so that needs patching.
I'm going from changing individual packages
to more toolchain things as you can see.
Upstream configure scripts: you can maybe
see at the top that it just uses the hostname
for no reason. Sometimes you can override
it in debian/rules just by exporting something
or passing a variable to dh_auto_configure or
whatever. That's just a little bit more
involved, you have to look at it more
carefully.
Perl hash order: a lot of Perl code uses
Data::Dumper to just output a bunch of stuff, which
is just not deterministic. So often it's just
setting Sortkeys, but sometimes it's
a completely different solution.
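In Perl the usual fix is `$Data::Dumper::Sortkeys = 1`. The Python analogue is below as an illustration of the same idea: Python dicts keep insertion order, but sets of strings iterate in hash order, which is randomized per process unless PYTHONHASHSEED is fixed, so anything written to a build artifact should be sorted explicitly.

```python
import json


def dump_config(config: dict) -> str:
    """Serialize with keys sorted at every level, like Sortkeys in Data::Dumper."""
    return json.dumps(config, sort_keys=True, indent=2)


def dump_names(names: set[str]) -> str:
    """Sets iterate in hash order, which can change between runs; sort before output."""
    return "\n".join(sorted(names))
```

Sorting at output time keeps the randomized hashing, and its protection against hash-flooding attacks, while making the build output stable.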
Header files, so you can maybe see that
they are using the timestamp essentially
as a unique identifier; you probably have
to start rewriting these into something saner,
because this is a wrong use of a timestamp
anyway.
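One saner replacement, sketched in Python under the assumption that the identifier only needs to be unique per header and version: derive it from a hash of the header's path and contents. The "ID_" prefix and the 16-character digest length are arbitrary choices for the example.

```python
import hashlib


def stable_header_id(path: str, contents: bytes) -> str:
    """Derive a unique-enough C identifier from a header's path and contents.

    Unlike a build timestamp, this is the same on every rebuild of the same
    source, yet still changes whenever the header itself changes.
    """
    digest = hashlib.sha256(path.encode("utf-8") + b"\0" + contents).hexdigest()[:16]
    stem = "".join(c if c.isascii() and c.isalnum() else "_" for c in path)
    return f"ID_{stem.upper()}_{digest.upper()}"
```

For `include/foo.h` this yields something like `ID_INCLUDE_FOO_H_` followed by 16 hex digits, identical across rebuilds.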
More Makefiles: the deeper the timestamps are
in the upstream package, the more you have
to start patching, so these kind of start
sucking a little.
We've made a lot of toolchain changes, some
already mentioned, some of them already
merged, see more in this link. Again,
details don't matter, just check it out
it isn't crazy, it's just working out
what's different.
In terms of the work done we've sent these
many patches: two patches a day,
which is not too bad, on average.
[Applause]
[Holger] I can't clap because I sent three
or something like that
[Lamby] Holger does three per day.
And this doesn't count other bugs we found
in the process of building packages, like
failures to build.
In blue are the ones that are open and
in orange the ones that are done.
You can see that someone went a bit crazy
in February filing bugs and eventually they
were being fixed; slowly.
[Holger] And actually we filed more bugs
because the fail to build from source bugs
are excluded, I think we filed 300 FTBFS
in the last two or three months.
[Lamby] And those include fail to build
because of reproducibility things as well
but we haven't split them up.
[Lunar] What's left to be done because
Holger said "the graph is a lie".
The main thing that is blocking a lot of
work is dpkg. Right now the output of dpkg
is not deterministic 100% of the time,
because of timestamps and at least the
file ordering. We also have a patch that
creates the .buildinfo files that we've
shown, and it works. It's not submitted yet
to dpkg because we need to agree on the
format with at least ftpmaster and
maybe the dpkg maintainers; well, with a lot of people,
and that's what we are going to do in the
next hour.
Debhelper also needs a few changes; for the file
mtimes, debhelper might not be the
best place, maybe we want that in dpkg.
I've been trying to put patches in tar so
we can make it easier. It's complicated to
see where the best place is, but so far we've
been doing our tests with this setup and it works.
[Holger] In our repository we have these
packages with these bugs fixed so when
you want to test reproducibility issues on
your own machine you need to use the
repository which has these patches applied
at the moment.
In pure sid you cannot create reproducible
packages.
[Lunar] I heard that the SOURCE_DATE_EPOCH
patch is in git already, so it's going to happen.
cdbs also needed to export SOURCE_DATE_EPOCH
and we are starting to do more infrastructure
work: Josch mainly and Akira on sbuild,
because we wanted to have this
srebuild script, where you give it a
buildinfo and it will do the rebuild and
it needs changes in build daemon for the
build path and also a couple of changes in
sbuild itself.
[Holger] And the script is not ready yet,
this "Finish" means it uses our repository
at the moment, we need to change it to only
use Sid and snapshot.
[Lunar] So there is the buildd issue that
we need to discuss
and we also need to see how we could include
or not, or somewhere give this buildinfo
control file to the world so they can
rebuild the packages, so it's not yet
clear where's the best place to store
them.
Because adding 22,000 files, some
people get cranky about this idea.
[Holger] It's more than 22,000 files: it's
22,000 source packages multiplied by
10 architectures, but not every package
is built on every architecture, so that's probably 100,000
buildinfo files; multiplied by Stretch and
Sid, that's 200,000 files or more on
the file servers and on the mirrors, where we
would like to have them.
That's about the same number of files as are
currently there. The mirror operators are
not happy, they will not take it,
so our current idea is to just concatenate
all these files into one file: that's 140 MB
uncompressed, 40 MB compressed.
That's easier to handle.
And then probably have a service
buildinfo.debian.org where you can
download individual buildinfo files if you
need them.
[Lunar] And so when we are done with
all that, we can maybe add a final patch,
to Debian policy, mandating that
Debian packages be reproducible.
[Applause]
I can say again that a dream of mine is
that we would stop uploading .debs when
we upload a package, and instead just upload
the hash of the binary, have the buildd
build this package again, and only if these
two match can it enter the archive.
So we are sure that at least the two
machines, the developer machine and the
build daemon agree that they've built the
same thing.
[Applause]
[Holger] I share this dream, but I think
making this a policy requirement is sadly
something only for Stretch + 1. But
I'm curious: if we fix dpkg and
debhelper now, do you think we should
upgrade all these wishlist bugs to important?
[Audience] Yes!
[Holger] We'll talk about this soon.
[Lunar] But before that we actually have
work to do.
[Dhole] In order to fix your package, the
first thing you can do is go to
reproducible.debian.net/package, and you
can see the web interface, where you can see
notes on the package, we have tags to
identify different issues that make packages
not reproducible, with links to the wiki
about how to solve them.
[Holger] When you see this, you want to
click on this debbindiff link.
It's still called debbindiff not diffoscope,
this will show all the differences,
if there is a note. If the package is
unreproducible and there's no note
it will automatically display the
debbindiff, and if your package is fine
there's a sun here.
[Dhole] You can also see an entry in the
tracker, stating if your package is
reproducible or not.
You can also find information in DDPO and
DMD. You can find tips on the ReproducibleBuilds
wiki; we are working on
a Howto with detailed steps on different
issues and how to solve them. Lunar gave
a talk at CCCamp where many issues
and their solutions are really
well explained.
You can also come to our irc channel which
is #debian-reproducible and ask for help
or go to the mailing-list.
In order to test locally if your package is
reproducible right now we are using a
script that uses pbuilder in a custom
configuration, you need to set up our
reproducible repository. In the Howto in
the wiki there's the steps on how to set up
the chroot and everything, it's documented
in the wiki.
Diffoscope is in unstable and today it's
going in Stretch.
We plan to add these scripts to rebuild
packages in different settings to devscripts
once dpkg is good, and we welcome you
tomorrow to the hacking session from
2 to 7 in the Stockholm room.
[Lunar] That's for fixing your packages,
please do that. If you want to have even
more fun than testing your own packages,
join us!
This is the past year of my life, it has
been awesome because the team has been
so great, it's been a friendly atmosphere, lots of
new understanding so many things you didn't
want to learn about that you had to learn
about, and basically it feels very good to
be part of this actual changing the world
thing. It's just software but it has some
profound effect. I've been told that the
work we've been doing is being tossed
around in Cisco and Google and Facebook;
all these big dot com companies bla bla,
they actually want to do that as well even
though they are not doing Free Software,
which I find weird, but whatever.
So what do we do? We review packages, we
write notes when we identify the problem,
so when the maintainer comes
they don't have to think too much about
the problem and can just fix it. We try to
identify common trends so when many
packages have the same problem we make an
entry and explain and maybe think about fixes
that could apply to the whole archive.
We work on this reproducible.debian.net
jenkins setup, the scripts.
We hack on the diffoscope tool, we make
strip-nondeterminism better, we propose
changes for the toolchains when there are
needs, some need a lot of patches,
most of the bugs we have reported on
individual packages have patches.
[Holger] Bugs have patches
[Lunar] Yes!
And also we are actually writing some more
general documentation from the
understanding of these things we have been
having, we are preparing a reproducible
builds Howto to explain to the Free Software
world how they can do it so it's about some
of what Chris explained but also more
general considerations on what to do if you're
not Debian and you want your thing
reproducible when you distribute as an
independent vendor. So we want to work on
reference documentation so the whole world
can actually do that.
We do a lot of talks as you've seen and
it's been fun, and with all these
presentations we've made so far it's all
in git. And everybody is free to take one
of these slide decks and run with it
somewhere, translate it...
Questions?
We have to run with the microphone, because
there's no mic anymore.
[Question] I just wanted to make two quick
comments: so first of all diffoscope is
really awesome, not only for reproducibility
but also for example if you change your
debian/rules in some way and want to see if
the package is the same afterwards because
you just cleaned up a bit, that's really
awesome for that, so thank you.
And also I think the work you're doing now
is something that in 20 years time we're
going to look back towards it and think,
well, of course builds should be
reproducible, so thank you very much for
your work!
[Applause]
[Question] When reproducibility becomes
part of the Debian policy, will there be a
lintian --reproducible?
[Holger] I don't think lintian can detect
that because lintian works on the source
package and you need to build the package
for this.
[Lamby] Things that could be detected by
lintian from a static analysis point of view,
yeah I'm sure, like looking for gzip
without -n for example, but that wouldn't
be conclusive from lintian point of view.
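A static heuristic of the kind Lamby describes can be sketched in a few lines. This is a hypothetical check written in Python, not lintian code or an existing lintian tag: it flags lines of a debian/rules file that appear to call `gzip` without `-n` or `--no-name`, and like any such check it can give false positives and negatives.

```python
import re

# "gzip" as a whole word (so dh_gzip is not matched), plus the rest of that
# shell command up to the next separator.
GZIP_CALL = re.compile(r"\bgzip\b([^\n;|&]*)")


def gzip_without_n(rules_text: str) -> list[int]:
    """Return 1-based line numbers that appear to call gzip without -n."""
    hits = []
    for lineno, line in enumerate(rules_text.splitlines(), start=1):
        for match in GZIP_CALL.finditer(line):
            args = match.group(1).split()
            short_flags = [a for a in args if a.startswith("-") and not a.startswith("--")]
            has_n = "--no-name" in args or any("n" in flag[1:] for flag in short_flags)
            if not has_n:
                hits.append(lineno)
    return hits
```

Combined flags such as `-9n` are accepted, while a bare `gzip -9` is reported.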
[Lunar] One thing that I really want to add to
diffoscope at some point (the code is made
in a way that makes it possible) is
hints, so when it actually looks at
differences between two packages, it
can suggest an idea: hey, you need
to remove that timestamp, or you should
sort these keys. It's not done yet, but if
anybody wants to do patches it's totally
doable.
[Question] Thank you for the work, have
you thought about reproducible images?
[Holger] It's on the todo list.
Before images we need reproducible package
installation, and then we need reproducible
images like squashfs has some things which
are not reproducible, but the package
installation is not reproducible at the
moment because apt installs packages in
arbitrary order, and then the postinst scripts
create, for example, users which get
user IDs in the order the packages are
installed. So to fix that, apt
needs a way to install in a deterministic
order. But it's on the todo list.
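The user ID problem can be reduced to a few lines. A simplified Python sketch, assuming each postinst creates one system user with the next free UID starting at 100 (both assumptions for illustration): the mapping depends on installation order, and a canonical order removes that dependence. Real installation order is constrained by dependencies, so plain sorting only illustrates the idea.

```python
def assign_uids(install_order: list[str], first_uid: int = 100) -> dict[str, int]:
    """Mimic postinst scripts that each create a system user with the next free UID."""
    return {pkg: first_uid + i for i, pkg in enumerate(install_order)}


def deterministic_uids(packages: set[str], first_uid: int = 100) -> dict[str, int]:
    """Same allocation, but over a canonical (sorted) order."""
    return assign_uids(sorted(packages), first_uid)
```

With the package names used as users, installing postfix before mysql-server or the other way round gives two different images; the sorted variant always gives the same one.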
[Lunar] Pabs started a wiki page a couple
of months ago that is called reproducible
install. This is very important if we want
tools like Tails to actually be reproducible
so some people will work on that, we do
want to work on that.
[Lamby] It's quite a deep problem for
example d-i will install different stuff
depending on your hardware, so that's
immediately not reproducible.
It'd be great.
[Question] I've been working on a couple
of my packages to get them to build
reproducibly, but I was often wondering if I
should fix it in my package or whether
it should be fixed higher up, and I
guess I've been adding some fixes to my
packages which may in the future even not
be needed anymore and then it's just
unnecessary code as well.
So how do you see where things should be
fixed and how should we as package
maintainers go about with this?
[Holger] For many things there's
an easy fix: set the timezone to UTC in
debhelper, or better in dpkg, but
that will not fix the upstream bugs. So
actually it's better not to set the
timezone or other things deterministically
in these tools, but rather to have them fixed
upstream; that's what we want.
Some things we will need to fix in
dpkg to get a meaningful result, but
basically we want the fixes to land upstream,
so that distributions which just
build from source, which don't have
debian/rules and just build with the
upstream Makefiles, get them too.
[Lunar] We've been experimenting for two years,
and this is a lot of trial and error:
trying something, seeing how it fails, or
realizing maybe we can do better than that, and
changing. And I know this can be frustrating
at some point because you make changes
and they all become unneeded, but in the
end this is how we make stuff that matters.
We move forward not by trying
to get the big picture right at once
(and I know in Debian we sometimes try to do
that), but by experimenting and learning from it.
[Question] An example that I'm now looking
into is actually the documentation is built
for this package by looking in all the files
and generating it, but for instance the
index file is sorted, and I guess upstream
would say: well, if you set some ordering
in your LC parameters, you want this page
to be ordered as you want, instead of forcing
it in the sort. So I'm really wondering:
should I now upstream this or should
I just fix it in my rules because that's
the logical place?
[Lunar] Both. No, there's no good answer.
I'm quite a strong proponent of the idea
that if you use a computer you should be
able to talk to it and have the computer talk to
you in the language that you choose, so if
people want to have gcc error messages
in German, they should have it.
But for sorting, this is the kind of
LC_ALL setting that can be very local, that
you can set for just one tool; it's fine to
do that.
[Question] Do you have ideas on making
sources reproducible? Like upstreams
calling make dist, or this infamous
autogen.sh files?
[Lunar] I don't think that anybody in the
team has looked into that yet. Source
files are way easier to analyze than
binary packages, so it would still be great
to have easier ways to make source
tarballs byte for byte identical,
but it's not as much of an issue as it is for
binaries. If people want to look into that,
they should.
[Question] Do you know a way to make git
archive build something reproducible?
[Lunar] Well pristine-tar
[Question] Yes, but without it.
[Holger] There's one tool. You want to use
a new one? Then write it.
Why not use that tool which does the job?
pristine-tar does it.
[Lunar] This is for sources, and so that's
a different issue from what we are actually
currently working on.
[Holger] You're welcome to join the team and
extend our scope to sources.
[Lunar] How many questions, two?
Two more questions, two or three.
[Question] So if there are a couple of other
environment variables that could be set
in the environment to increase
reproducibility, where to put them?
In the rules file? Or in the generic build
environment of all packages, or where
should these things be placed?
[Lamby] It'd be nice if upstream fixed it:
if we just change it in debian/rules,
that's only helping us, so taking
it upstream would be the ideal solution.
Are you referring to something else?
[Question] For example many hashmaps have
randomized data in the hash function, so if
you have some code that relies on hash
order, at least some implementations of
hash functions let them be seeded,
rather than using something random,
for a build; but you want the randomness
in your hash functions for normal users,
because otherwise your hashmaps are open
to attacks.
[Lamby] Correct, yes.
[Lunar] In these cases we send patches
adding sorting of the keys everywhere, and
it's solved. For very few cases, for Perl for
example, you can set an environment
variable, and some maintainers prefer to do
that. But usually we try to push these
changes upstream, because they are simple
enough and they like it.
Actually it makes testing easier for them.
There was one in the back, there.
[Lunar] That's the last question
[Question] Follow up question to what we
had here before.
You showed an open bug report against gcc
to support SOURCE_DATE_EPOCH to cover
the __DATE__ and __TIME__ timestamps, so I have
patches to patch them out in my packages.
Should I remove those patches and if so,
when?
[Lunar] Have you seen any more emails
from the gcc maintainers?
[Dhole] The mail is forgotten, I guess we
should ping it again, and see if they
reply, because what I read from the gcc
website is that only the replies from
maintainers are the ones that matter, and
I think no maintainer replied to the
message, so we should ping again.
[Question] That was just an example, my
question was more general.
At which time should I remove my patches
to fix things which were fixed higher up
in the toolchain? Or should I just leave
them in there?
[Holger] Once they are in Sid.
[Question] Ok thanks!
[Lunar] Ok, I guess we're out of time.
Thank you for listening.
[Applause]
[Lunar] Fix your packages!