Stretching out for trustworthy reproducible builds: creating bit by bit identical binaries

Talk about reproducible builds given by Lunar, Holger, Lamby and Dhole during DebConf15.

Showing revision 28, created 09/02/2015 by tvincent.

  1. Welcome and good morning

  2. This is the reproducible builds team,
    talking about
  3. "Stretching out towards trustworthy
    computing"
  4. [Applause]
  5. We're 4 on stage, but actually this is a
    team effort.
  6. All these people listed here have
    contributed to the project at one point.
  7. The 4 of us, that's
  8. Lunar − me
  9. there's Dhole,
  10. Chris Lamb − lamby
  11. and Holger.
  12. But actually, this is DebConf and so a lot
    more of us have been or are
  13. currently here and so, if you want to
    thank anybody that is working on this
  14. you need to actually thank all of
    these folks
  15. 'cause, yay.
  16. [Applause]
  17. [Holger] The people in blue are here.
  18. [Lunar] Let's get started.
  19. Quick recap on what we're talking
    about.
  20. We have software, it's made from source.
  21. Source is readable by humans or at least
    a good amount of humans.
  22. In this room it's good.
  23. Binary, readable by computer and some
    tiny fraction of humanity.
  24. Going from source to binary is called
    build, or like building or compiling
  25. and we're doing free software and
    free software is awesome because
  26. we can actually run these binaries like
    we want
  27. We can actually study the software, how
    it's been made by studying the source
  28. and by studying the source we can assess
    that it does what it's supposed to do
  29. and not something else, that it does not
  30. have malware, or trojans or security bugs.
  31. So we have the binary that can be used,
    fine.
  32. We have the source that can be verified.
  33. Problem is that right now, the only way we
    know that a binary that we get…
  34. We have to trust a website or a Debian
    repository that says
  35. "Well, this binary has been made with this
    source"
  36. But there's no way we can actually prove
    that.
  37. This is actually a problem that has been
    well explained by
  38. Mike Perry and Seth Schoen at the 31c3
    in Hamburg last December.
  39. For example, Seth Schoen made a proof of
    concept exploit for the Linux kernel
  40. that when GCC was called, the kernel would
    without modifying anything on the disk
  41. when the kernel detects that GCC is going
    to read a C file, it will insert some
  42. extra lines of code, and these lines of
    code can be a very bad thing
  43. in the case of 31c3 talk I was just
    recalling.
  44. Actually, you can even have developers
    who are in very good faith, who have
  45. totally secure dev machines, or they
    thought they have,
  46. who have reviewed all their source code
    for any bugs
  47. and we would still get totally owned as
    soon as their computer gets compromised
  48. or one of the build daemons from Debian
    gets compromised for example.
  49. This is not, like, hypothetical threats
    here we're discussing
  50. A couple of months after Seth and Mike's
    talk at 31c3,
  51. the Intercept revealed from the Snowden
    leaks
  52. that at a CIA conference in 2012, one
    of the talks that happened
  53. was about a project called Strawhorse.
  54. Strawhorse is about modifying Apple Xcode,
    which is the development environment
  55. for Mac OS X and iOS applications
  56. and well, they were modifying Xcode so
    it would produce,
  57. without the developer knowing,
  58. binaries with trojans, malware,
    watermarked binaries, lots of bad things.
  59. So, solution:
  60. enable anyone to reproduce identical
    binary packages from a given source.
  61. Because if using a source, using the same
    environment,
  62. multiple people on different computers, on
    different networks, at different times,
  63. can all get the same thing
    from the same source
  64. all the same binary, byte for byte,
  65. then there's a good chance that…
  66. Well, everybody could be owned,
    but let's be more joyful and say that
  67. probably, if everybody gets the same
    result, there was actually no problem
  68. and everybody is safe.
  69. We call that solution
    "reproducible builds"
  70. Yay.
  71. [Applause]
  72. Actually, it's not only about security.
  73. For Debian, we have, if you're doing
    "Multi-Arch: same" packages,
  74. well, they need to have the same bytes when
    they are built for different architectures,
  75. the files in the package.
  76. Debug packages, you can create at a later
    time, if you forgot to have debug packages
  77. in the first place,
  78. you can pass the no-strip option later and
    because the package is reproducible,
  79. you will get the debug symbols that work
    for software that has been shipped already
  80. We do early detection of FTBFS that way
  81. because if we try pretty quickly
    to reproduce a build,
  82. then it has to work.
  83. It's useful for build profiles.
  84. We can get smaller .deb deltas,
  85. because from one version to the next we
    might have the same content.
  86. We can do validation of cross-builds,
  87. Helmut Grohne can talk to you about that.
  88. And also, Niels Thykier told me that
  89. he was very interested in reproducible
    builds because it would enable him to
  90. test debhelper better, because
  91. if the package builds reproducibly,
    then he makes a change to debhelper,
  92. then he can rebuild
  93. the same version of a package with a newer
    debhelper and see what has changed
  94. and this change can be isolated to only
    what he has worked on debhelper
  95. for example.
  96. And, oh my.
  97. The whole world is watching us.
  98. For two years or a year and a half now,
    everybody I meet at security conferences,
  99. at hacker conferences, at free software
    conferences is like
  100. "Oh you're working on that,
    that's awesome."
  101. And, I mean, I've been the one doing quite
    a lot of talks, and everybody comes to me
  102. and I'm like "Wow wow, this is way bigger",
  103. but we're actually leading the field here.
  104. Yay Debian.
  105. [Applause]
  106. [Holger] So, we are not the only ones
    leading the field,
  107. Bitcoin and Tor made their software
    reproducible before us,
  108. Coreboot also succeeded, if you build
    Coreboot without any payload,
  109. that's 100% reproducible.
  110. FreeBSD has a page on their wiki since
    2013
  111. saying there are 5 reproducibility issues
    in their base system.
  112. We're at the moment trying to
    confirm this.
  113. On jenkins.debian.net, I've also set up
    now tests for FreeBSD, NetBSD,
  114. Coreboot and OpenWrt.
  115. So if you go to
    reproducible.debian.net/
  116. you get that tested.
  117. And there's more in the pipeline.
  118. There are other projects interested
    as well.
  119. NetBSD also has a variable MKREPRO
    which you can set
  120. and that builds reproducibly.
  121. Though they think "I'm keeping some
    timestamps, it's fine, and then
  122. filtering them out later".
  123. We disagree.
  124. So this is what Debian looks like,
    Debian Sid,
  125. but this is a lie.
  126. This is not the truth.
  127. This is just our test setup.
  128. Sid is not like this.
  129. For Sid, it's all orange, there's zero
    reproducibility in Sid today.
  130. But what we'll talk about now and in the
    following round table
  131. is how to actually make Sid reproducible.
  132. The current status is
  133. we've been working on this in Debian for
    two years.
  134. We have weekly reports about our project
    now since May
  135. and we've given several talks, especially
    in the last year
  136. and all these talks, presentations, also
    other stuff is linked in the wiki.
  137. There's a page with information about
    Debian, these BSDs,
  138. other Linuxes, upstream software
    all on this wiki.
  139. Since DebConf14, which is merely
    a year ago,
  140. we've made quite some changes.
  141. We have introduced
    strip-nondeterminism
  142. which is called by dh at the end
    of the build of the package
  143. and will normalize some things
    which Chris will explain later
  144. We have decided on a fixed build path
  145. because the build path is leaked
    in the binaries and several things
  146. We didn't find a way yet to make
    the build path arbitrary.
  147. We designed a way to record the build
    environment
  148. because to rebuild, you need to recreate
    the build environment.
  149. We set up this Jenkins setup.
  150. We wrote diffoscope which used to be
    called debbindiff
  151. which shows differences between two
    packages or two directories or
  152. two filesystems by now.
  153. There's SOURCE_DATE_EPOCH, which is a way
    that the tools expose
  154. the last modification of the source.
  155. Because the build date, people want to
    include the build date
  156. because they think this is a
    meaningful indication:
  157. when a build was done,
    which software used.
  158. But if the build always recreates
    the same results
  159. the build date becomes meaningless
  160. and the really interesting thing is
    the latest modification of the source.
  161. We have written patches for the tools
  162. [Lunar] strip-nondeterminism:
    is Andrew Ayer in the audience?
  163. Yay! He did it!
  164. It's written in Perl because we didn't
    want to have a new build dependency
  165. in all Debian packages.
  166. Basically it takes anything and tries
    to normalize it as much as it can
  167. replacing timestamps or file permissions
    or removing some issues.
  168. It's working very well on many formats,
    it's meant to be extensible
  169. so we can actually add more things and
    it's run by dh at the end of the process, as Holger said.
  170. The .buildinfo is currently a proposal
    we have not yet totally agreed on
  171. but we are generating them as part
    of the test we have
  172. and basically it's a new control file that
    will tie the sources, the generated binary
  173. the packages that were used to build this
    binary and their version.
  174. The idea is that we can use this file to
    reinstall all the specific versions from snapshot
  175. So we recreate the same build environment
    then we can just start the build from that source
  176. that was mentioned and see if the binary
    that has been generated matches.
  177. What it looks like for now, you see there is
    a source binary, the build path
  178. because currently we don't have any good
    post-processing tool for buildpaths
  179. in ELF and DWARF binaries, we just decided
    to specify the build path so when we do
  180. a later rebuild we use that path and are safe.
  181. The source is a .dsc, the binary is a .deb and
    a list of packages with the versions.
  182. We currently use the base-files version
    to know which Debian release is to be used
  183. as the basis.
  184. [Holger] The general procedure for testing is:
    we build the source, we save the results,
  185. we modify the environment and we build
    it again and compare the results.
  186. That started as a shell script last year which I
    put on jenkins and then it exploded a bit
  187. and now we have 67 jenkins jobs running on
    7 hosts.
  188. Since last week we have 4 armhf small boards
    where we will be able to test armhf,
  189. but very slowly.
  190. We have two new amd64 build nodes.
  191. The code is now split into Python and bash
    scripts.
  192. For all the other distro testing there's a
    lot of bash code now which is mostly
  193. boilerplate and it's 5 lines or something
    to build FreeBSD and 5 lines to build NetBSD
  194. but there's 100 lines of boilerplate around so it's
    really not that much code.
  195. We do test Testing, Unstable and Experimental.
  196. For arm we only start with Unstable.
  197. We do like hardware so if you have hardware
    to donate to us, that would be great,
  198. we need ssh and then root basically.
  199. We are testing Coreboot, OpenWrt and the
    BSD's, soon I will also set up a Fedora test
  200. I don't want to test all the 20,000 Fedora
    packages but just 200 or something:
  201. the base system of Fedora to examine how
    rpm works
  202. to get really the whole Free Software world
    reproducible.
  203. This is all run on ProfitBricks hardware
    since 2012, so thanks to ProfitBricks.
  204. These are the variations we do for Debian.
  205. It's the hostname, username, timezone,
    locale.
  206. Chris will explain what modifications
    this causes, variances...
  207. We are not testing at the moment differences
    in date so the date is always the same
  208. the time is a bit different.
  209. [Lunar] Well almost! Because we cheat with
    the timezone, we use one timezone that is
  210. GMT-14 and then GMT+12 so it's more than
    24 hours apart.
  211. [Holger] On the first of the month we
    sometimes find new bugs where there's
  212. packages which record the month.
  213. We don't have variations of the CPU type
    at the moment.
  214. Both time and CPU type variations, we'll
    have them in about one or two weeks,
  215. the nodes are being prepared at the moment.
  216. Then we will test all the meaningful
    variations we could think of.
  217. There will probably be some packages which
    build differently according to the
  218. number of CD drives attached or whatever
    things, but those will be found by you.
  219. [Lunar] We are doing all these tests because
    we want that when you rebuild a package on
  220. your machine, even if anything is different from
    the build daemons in Debian, you get
  221. the same results.
  222. We use this to detect these problems early,
    before we actually get a false positive that we have
  223. to investigate when someone rebuilds a
    package on their machine.
  224. To understand the difference that we found
    from one build to the other.
  225. It started also as a 10-line shell script
    and then it felt okayish
  226. and so Python!
  227. And now it's a lot of code and it actually
    grew way beyond a Debian package.
  228. We changed the name, it was called debbindiff
    but it's absolutely not tied to Debian anymore.
  229. It's called diffoscope, thanks to Jocelyn
    for the name.
  230. Basically what it does: it tries to get to
    the bottom of what is different between
  231. two archives or directories.
  232. Because it's not useful to compare bytes that
    are compressed by gzip or xz, that will not
  233. lead you to understand what is different
    you need to uncompress and look at
  234. uncompressed data, and if the thing actually
    compressed is a tarball, you might actually
  235. want to compare the files inside the tarball.
  236. If there is a PDF inside this archive, you
    don't want to compare the bytes of the PDF
  237. you want to compare the text of the PDF.
  238. So this is basically what diffoscope does,
    it tries to transform anything that is
  239. a container and compare things in this
    container and if they can be transformed into
  240. a human readable form it will try to do
    that, and compare these human readable forms.
  241. And if it doesn't find any difference but
    there are still differences from the bin
  242. it will fall back to binary comparison.
  243. Try it, extend it; it's Python, it's modular,
    it's great.
  244. It already supports squashfs, ISO, rpm,
    gettext, mo files and so many different things.
  245. You can have HTML output like that,
    so this is what is displayed on many
  246. examples we've shown so far, and also
    to make it easier for copy paste
  247. and post processing we have the text output.
  248. You can also use it to review packages before
    uploading them to Debian.
  249. It does fuzzy matching, so even if the
    directory is different in the archive it will
  250. find it like git does.
  251. It has grown way beyond just building
    reproducibly. A useful tool.
  252. [Dhole] In order to solve timestamp issues, we are
    proposing the SOURCE_DATE_EPOCH variable.
  253. This is because most of the time having
    the build date embedded in a package
  254. is not useful for the user, because you could
    take a really old package and build it today
  255. and that day would not be useful.
  256. We are standardizing a replacement for build
    dates so that tools can use it.
  257. When this value is set, the tool instead of
    embedding the current date, it will embed
  258. the date taken from SOURCE_DATE_EPOCH which
    will contain a Unix epoch timestamp.
  259. This is a general solution we are trying to
    standardize so that not only Debian uses it,
  260. but other Free Software projects and
    distributions and in the case of Debian,
  261. we set this variable to the latest Debian
    changelog entry timestamp.
  262. We have already been sending patches to
    different packages, mostly it's documentation
  263. generation. So here's a list of bugs that
    we have opened which have been closed
  264. and merged; so it's help2man, epydoc,
    ghostscript, texi2html and sphinx.
  265. We are both sending these patches to Debian
    and upstream so all the distributions can
  266. use them, and we have also been sending
    patches to other packages which are still
  267. open, so we encourage you to take a look
    at these packages if you are the maintainer
  268. and merge the patch.
  269. [Lunar] Thanks to Daniel Kahn Gillmor and
    Ximin Luo for pushing this proposal forward.
  270. And also lots of these patches have been
    written by Akira and Dhole as part of their
  271. Google Summer of Code, and your work is really
    great.
  272. [Applause]
  273. [Dhole] The gcc patch is: gcc uses two
    macros which are __DATE__ and __TIME__
  274. which embed the timestamp and I wrote a
    patch so that if SOURCE_DATE_EPOCH is set
  275. instead of adding the current time, it takes
    the time from that variable.
  276. I sent this patch to gcc, it's still there
    forgotten with many other patches
  277. but hopefully at some point they will
    realize that this is interesting and they
  278. will merge it.
  279. [Lamby] Hey. Let's very quickly run you
    through some really simple ways
  280. of fixing packages. The details don't
    necessarily matter, it's just to give you an idea
  281. of what needs to be changed and basically
    to point out that it's not rocket science.
  282. So you can just come in and jump in.
  283. For example gzip, it's a very old tool
    and they decided to add timestamps when
  284. you generate it, but it's an easy fix, you
    just add the -n flag.
  285. Some other things easy to change: some
    Python stuff had tag_date=True, which
  286. I don't know if you can see it but adds a
    timestamp to eggs. You just change it to
  287. False to get rid of it.
  288. Static libraries, they are just ar archives
    so the same format as .deb, and you
  289. can just use binutils or strip-nondeterminism
    tool.
  290. PNG has timestamps for some reason, you can
    get rid of them, that's ImageMagick and it's
  291. a bit ugly, but also strip-nondeterminism
    gets rid of it.
  292. Tarballs are quite interesting, they will
    by default capture user and group
  293. you just pass --owner=root bla bla bla...
  294. Ordering, this is interesting as well, it
    will usually use file system ordering
  295. which is completely non-deterministic. So
    you need to sort with LC_ALL=C.
  296. [Lunar] Think about the locale! Because
    sorting order varies from one locale to the next.
  297. [Lamby] They also take timestamps, again
    you can set --mtime or you can muck around
  298. with find/xargs/touch bla bla...
  299. Lots of other files have timestamps: Erlang
    files for no reason, even upstream don't
  300. know why they added a timestamp.
  301. We have now a patch for SOURCE_DATE_EPOCH,
    which I think landed a couple days ago.
  302. Here's an interesting one, not necessarily
    the current build timestamp, so this is a
  303. timezone dependent date which Ruby loads
    and then saves incorrectly as your local time.
  304. This gets mangled, so that's patching.
  305. I'm going from changing individual packages
    to more toolchain things as you can see.
  306. Upstream configure scripts, you can maybe
    see the top that it just uses hostname
  307. for no reason. Sometimes you can override
    it in debian/rules just by exporting something
  308. or passing a variable to dh_auto_build or
    whatever. That's just a little bit more
  309. involved, you have to look at it more
    carefully.
  310. Perl hash order, a lot of Perl uses
    Data::Dumper to just output a bunch of stuff which
  311. is just not deterministic. So often it's just
    setting Sortkeys, but sometimes it's
  312. a completely different solution.
  313. Header files, so you can maybe see that
    they are using the timestamp essentially
  314. as a unique identifier, you probably have
    to start rewriting these into something saner
  315. because this is a wrong use of timestamp
    anyway.
  316. More Makefiles, the deeper the timestamps are
    in the upstream package the more you have
  317. to start patching, so these kind of start
    sucking a little.
  318. We've made a lot of toolchain changes, some
    already mentioned, some of them already
  319. merged, see more in this link. Again,
    details don't matter, just check it out
  320. it isn't crazy, it's just working out
    what's different.
  321. In terms of the work done we've sent these
    many patches: two patches a day,
  322. which is not too bad, on average.
  323. [Applause]
  324. [Holger] I can't clap because I sent three
    or something like that
  325. [Lamby] Holger does three per day.
  326. And this doesn't count other bugs we found
    in the process of building packages, like
  327. fail to build.
  328. In blue are the ones that are open and
    orange are done.
  329. You can see that someone went a bit crazy
    in February filing bugs and eventually they
  330. were being fixed; slowly.
  331. [Holger] And actually we filed more bugs
    because the fail to build from source bugs
  332. are excluded, I think we filed 300 FTBFS
    in the last two or three months.
  333. [Lamby] And those include fail to build
    because of reproducibility things as well
  334. but we haven't split them up.
  335. [Lunar] What's left to be done because
    Holger said "the graph is a lie".
  336. The main thing that is blocking a lot of
    work is dpkg. Right now the output of dpkg
  337. will be not deterministic 100% of the time,
    because of timestamps and at least the
  338. file ordering. We also have a patch that
    creates these .buildinfo files that we've
  339. shown that works. It's not submitted yet
    to dpkg because we need to agree on the
  340. format. At least we have ftpmaster or
    maybe dpkg, well we have a lot of people
  341. and that's what we are going to do the
    next hour.
  342. Debhelper also has a few changes; the make
    mtimes, debhelper might also not be
  343. the best place, maybe we want that in dpkg.
  344. I've been trying to put patches in tar so
    we can make it easier. It's complicated to
  345. see where's the best place but so far we've
    been doing our tests with this frame and it works.
  346. [Holger] In our repository we have these
    packages with these bugs fixed so when
  347. you want to test reproducibility issues on
    your own machine you need to use the
  348. repository which has these patches applied
    at the moment.
  349. In pure sid you cannot create reproducible
    packages.
  350. [Lunar] I heard that the SOURCE_DATE_EPOCH
    patch is in git already, so it's going to happen.
  351. cdbs also needed to export SOURCE_DATE_EPOCH
    and we are starting to do more infrastructure
  352. work: Josch mainly and Akira on sbuild,
    because we wanted to have this
  353. srebuild script, where you give it a
    buildinfo and it will do the rebuild and
  354. it needs changes in build daemon for the
    build path and also a couple of changes in
  355. sbuild itself.
  356. [Holger] And the script is not ready yet,
    this "Finish" means it uses our repository
  357. at the moment, we need to change it to only
    use Sid and snapshot.
  358. [Lunar] So there is the buildd issue that
    we need to discuss
  359. and we also need to see how we could include
    or not, or somewhere give this buildinfo
  360. control file to the world so they can
    rebuild the packages, so it's not yet
  361. clear where's the best place to store
    them.
  362. Because adding 22,000 files, some
    people get cranky about this idea.
  363. [Holger] It's more than 22,000 files, it's
    22,000 source packages multiplied by
  364. 10 architectures; but there's a lot of
    arch builds so that's probably 100,000
  365. buildinfo files, multiplied by Stretch and
    Sid, so it's 200,000 files or more on
  366. the file servers and on the mirrors we
    would like to have it.
  367. That's the same amount of files which are
    currently there. The mirror operators are
  368. currently not happy, they will not take it,
    so our current idea is just concatenate
  369. all these files into one file that's 140 MB
    uncompressed, 40 MB compressed.
  370. That's easier to handle.
  371. And then probably have a service
    buildinfo.debian.org where you can
  372. download individual buildinfo files if you
    need them.
  373. [Lunar] And so when we are done with
    all that we can maybe add a final patch,
  374. it would be to Debian policy, mandating
    that Debian packages be reproducible.
  375. [Applause]
  376. I can say again that the dream of mine is
    that we would stop uploading .deb when
  377. we upload a package, but instead just upload
    the hash of the binary, have the buildd
  378. build again this package and only if these
    two match they can enter the archive.
  379. So we are sure that at least the two
    machines, the developer machine and the
  380. build daemon agree that they've built the
    same thing.
  381. [Applause]
  382. [Holger] I share this dream but I think
    having this in policy as a must requirement is
  383. sadly something only for Stretch + 1, but
    I'm curious if we had fixed dpkg and
  384. debhelper now, would you think we should
    upgrade all these wishlist bugs to important now?
  385. [Audience] Yes!
  386. [Holger] We'll talk about this later soon.
  387. [Lunar] But before that we actually have
    work to do.
  388. [Dhole] In order to fix your package, the
    first thing you can do is go to
  389. reproducible.debian.net/package, and you
    can see the web interface where you can see
  390. notes on the package, we have tags to
    identify different issues that make packages
  391. not reproducible, with links to the wiki
    about how to solve them.
  392. [Holger] When you see this, you want to
    click on this debbindiff link.
  393. It's still called debbindiff not diffoscope,
    this will show all the differences,
  394. if there is a note. If the package is
    unreproducible and there's no note
  395. it will automatically display the
    debbindiff, and if your package is fine
  396. there's a sun here.
  397. [Dhole] You can also see an entry in the
    tracker, stating if your package is
  398. reproducible or not.
  399. You can also find information in DDPO and
    DMD. You can find tips on the wiki it's
  400. ReproducibleBuilds wiki, we are working on
    a Howto to have detailed steps on different
  401. issues and how to solve them. Lunar gave
    a talk at CCCamp where there's many issues
  402. really well explained and the solutions for
    them.
  403. You can also come to our irc channel which
    is #debian-reproducible and ask for help
  404. or go to the mailing-list.
  405. In order to test locally if your package is
    reproducible right now we are using a
  406. script that uses pbuilder in a custom
    configuration, you need to set up our
  407. reproducible repository. In the Howto in
    the wiki there's the steps on how to set up
  408. the chroot and everything, it's documented
    in the wiki.
  409. Diffoscope is in unstable and today it's
    going in Stretch.
  410. We plan to add these scripts to rebuild
    packages in different settings in devscripts
  411. once dpkg is good, and we welcome you
    tomorrow to the hacking session from
  412. 2 to 7 in Stockholm room.
  413. [Lunar] That's for fixing your packages,
    please do that. If you want to have even
  414. more fun, then test your own package, join
    us!
  415. This is the past year of my life, it has
    been awesome because the team has been
  416. so great, it's been a friendly atmosphere, lots of
    new understanding so many things you didn't
  417. want to learn about that you had to learn
    about, and basically it feels very good to
  418. be part of this actual changing the world
    thing. It's just software but it has some
  419. profound effect. I've been told that the
    work we've been doing is being tossed
  420. around in Cisco and Google and Facebook;
    all these big dot com companies bla bla,
  421. they actually want to do that as well even
    though they are not doing Free Software,
  422. which I find weird, but whatever.
  423. So what do we do? We review packages, we
    have these notes when we actually try to
  424. identify, so when the maintainer comes
    they don't have to think too much about
  425. the problem and just fix it. We try to
    identify common trends so when many
  426. packages have the same problem we make an
    entry and explain and maybe think about fixes
  427. that could apply to the whole archive.
  428. We work on this reproducible.debian.net
    jenkins setup, the scripts.
  429. We hack on the diffoscope tool, we make
    strip-nondeterminism better, we propose
  430. changes for the toolchains when there are
    needs, some need a lot of patches,
  431. most of the bugs we have reported on
    individual packages have patches.
  432. [Holger] Bugs have patches
    [Lunar] Yes!
  433. And also we are actually writing some more
    general documentation from the
  434. understanding of these things we have been
    having, we are preparing a reproducible
  435. builds Howto to explain to the Free Software
    world how they can do it so it's about some
  436. of what Chris explained but also more
    general consideration on what if you're
  437. not Debian and you want your thing
    reproducible when you distribute as an
  438. independent vendor. So we want to work on
    reference documentation so the whole world
  439. can actually do that.
  440. We do a lot of talks as you've seen and
    it's been fun, and with all these
  441. presentations we've made so far it's all
    in git. And everybody is free to take one
  442. of these slide decks and run with it
    somewhere, translate it...
  443. Questions?
  444. We have to run with the microphone, because
    there's no mic anymore.
  445. [Question] I just wanted to make two quick
    comments: so first of all diffoscope is
  446. really awesome, not only for reproducibility
    but also for example if you change your
  447. debian/rules in some way and want to see if
    the package is the same afterwards because
  448. you just cleaned up a bit, that's really
    awesome for that, so thank you.
  449. And also I think the work you're doing now
    is something that in 20 years time we're
  450. going to look back towards it and think,
    well, of course builds should be
  451. reproducible, so thank you very much for
    your work!
  452. [Applause]
  453. [Question] When reproducibility becomes
    part of the Debian policy, will there be a
  454. lintian --reproducible?
  455. [Holger] I don't think lintian can detect
    that because lintian works on the source
  456. package and you need to build the package
    for this.
  457. [Lamby] Things that could be detected by
    lintian from a static analysis point of view,
  458. yeah I'm sure, like looking for gzip
    without -n for example, but that wouldn't
  459. be conclusive from lintian point of view.
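
The gzip case mentioned above can be illustrated with a minimal sketch. It uses Python's standard gzip module rather than the command-line tool, and the helper name gzip_bytes and the payload are made up for the example. Passing mtime=0 and an empty file name keeps the timestamp and the name out of the gzip header, which is the effect `gzip -n` has on the command line.

```python
import gzip
import io


def gzip_bytes(data: bytes, mtime=None) -> bytes:
    """Compress data in memory; mtime=None lets gzip record the current time."""
    buf = io.BytesIO()
    # filename="" keeps a file name out of the header and a fixed mtime
    # keeps the build time out of it.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=mtime) as f:
        f.write(data)
    return buf.getvalue()


payload = b"hello reproducible world\n"  # illustrative content

# With a fixed mtime, two compressions of the same input are identical.
first = gzip_bytes(payload, mtime=0)
second = gzip_bytes(payload, mtime=0)

# With different timestamps the outputs differ, although the content is the same.
stamped_a = gzip_bytes(payload, mtime=1)
stamped_b = gzip_bytes(payload, mtime=2)
```

The timestamp lives in bytes 4 to 7 of the gzip header, so that is where the difference between stamped_a and stamped_b shows up.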
  460. [Lunar] One thing that I really wanted to
    add to diffoscope at some point - the code is made
  461. the way that it's possible - it's to have
    hints so when it actually looks up
  462. differences between two packages then you
    can have an idea, suggest you: hey you need
  463. to remove that timestamp, or you should
    sort these keys. It's not done yet, but if
  464. anybody wants to do patches it's totally
    doable.
  465. [Question] Thank you for the work, have
    you thought about reproducible images?
  466. [Holger] It's on the todo list.
  467. Before images we need reproducible package
    installation, and then we need reproducible
  468. images like squashfs has some things which
    are not reproducible, but the package
  469. installation is not reproducible at the
    moment because apt installs packages in
  470. arbitrary order and then the post-inst
    create for example users which get
  471. user-ids in the order the packages are
    installed, so to fix that either apt
  472. needs a way to install in a deterministic
    order, but it's on the todo list file.
  473. [Lunar] Pabs started a wiki page a couple
    of months ago that is called reproducible
  474. install. This is very important if we want
    tools like Tails to actually be reproducible
  475. so some people will work on that, we do
    want to work on that.
  476. [Lamby] It's quite a deep problem for
    example d-i will install different stuff
  477. depending on your hardware, so that's
    immediately not reproducible.
  478. It'd be great.
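
The user-id problem described above can be shown with a toy model. This is not how apt or adduser work internally: the function allocate_system_uids, the package names and the starting uid of 100 are all assumptions made for the illustration. Each "package" creates one user and takes the next free uid, so the result depends on installation order.

```python
from itertools import count


def allocate_system_uids(install_order, first_uid=100):
    """Toy model: each package's post-inst creates one user with the next free uid."""
    next_uid = count(first_uid)
    return {f"{pkg}-user": next(next_uid) for pkg in install_order}


packages = ["postfix", "avahi-daemon", "ntp"]  # illustrative package set

# Same package set, two different installation orders: the uids differ.
run_a = allocate_system_uids(packages)
run_b = allocate_system_uids(list(reversed(packages)))

# Forcing one deterministic order (here simply alphabetical) makes the
# allocation independent of the order the packages were handed over in.
fixed_a = allocate_system_uids(sorted(packages))
fixed_b = allocate_system_uids(sorted(reversed(packages)))
```

Alphabetical order is only the simplest possible choice for the sketch; a real installer also has to respect dependency ordering.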
  479. [Question] I've been working on a couple
    of my packages to get them to build
  480. reproducibly, but I was often wondering if I
    should fix it in my package or actually
  481. it should be fixed higher up and I
    guess I've been adding some fixes to my
  482. packages which may in the future even not
    be needed anymore and then it's just
  483. unnecessary code as well.
  484. So how do you see where things should be
    fixed and how should we as package
  485. maintainers go about with this?
  486. [Holger] There's many things which there's
    the easy fix to whatever: set the timezone in
  487. debhelper or better in dpkg to UTC, but
    that will not fix the upstream bugs, so
  488. actually it's better not to fix, set the
    timezone or other things deterministically
  489. in these tools but rather have them fixed
    upstream, that's what we want.
  490. Some things we will need to fix them in
    dpkg to get a meaningful result but
  491. basically we want rather these distributions
    which just build from source which don't have
  492. debian/rules and they just build with
    upstream Makefiles, we want the fixes
  493. to land there.
  494. [Lunar] We've been experimenting for two
    years and this is a lot of trial and error,
  495. trying something, see how it fails, or
    maybe we can do better than that and
  496. changing. And I know this can be frustrating
    at some point because you do changes
  497. and they all become unneeded, but in the
    end this is how we make stuff that matters.
  498. And we move forward, it's not because we're
    trying to make the big picture at once,
  499. and I know in Debian we sometimes try to do
    that, so we experiment and learn from it.
  500. [Question] An example that I'm now looking
    into is actually the documentation is built
  501. for this package by looking in all the files
    and generating but, for instance, the
  502. index file is sorted, but I guess upstream
    would say: well, if you set some ordering
  503. in your LC parameters you want this page
    to be ordered as you want, instead of forcing
  504. it in the sort, so I'm really wondering:
    should I now upstream this or should
  505. I just fix it in my rules because that's
    the logical place?
  506. [Lunar] Both. No, there's no good answer,
    I'm quite a strong proponent of the idea
  507. that if you use a computer you should be
    able to talk and have the computer talk to
  508. you in the language that you choose, so if
    people want to have gcc error messages
  509. in German, they should have it.
  510. But local sorting, this is the kind of
    LC_ALL that can be very local and that
  511. you can do for just one tool, it's fine to
    do that.
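
Setting the locale for just one tool, as suggested above, might look like the following sketch. It assumes a POSIX sort(1) is available on the PATH. The helper name sort_lines_c_locale and the index entries are invented for the example. Only the child process sees LC_ALL=C; the rest of the build keeps whatever language the user chose.

```python
import os
import subprocess


def sort_lines_c_locale(lines):
    """Run sort(1) with LC_ALL=C for this one call only."""
    # Copy the environment and override the locale for the child process;
    # os.environ itself is left untouched.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        ["sort"],
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        env=env,
        check=True,
    )
    return result.stdout.splitlines()


index_entries = ["zeta", "Alpha", "beta", "Gamma"]  # illustrative index
sorted_entries = sort_lines_c_locale(index_entries)
```

In the C locale the comparison is by byte value, so uppercase ASCII letters sort before lowercase ones whatever locale the build machine is configured with.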
  512. [Question] Do you have ideas on making
    sources reproducible? Like upstreams
  513. calling make dist, or this infamous
    autogen.sh files?
  514. [Lunar] I don't think that anybody in the
    team has looked into that yet, source
  515. files are easy to analyze way more than
    binary packages so, it would still be great
  516. to have easier ways; you have source
    tarballs be byte for byte identical,
  517. but it's not as much of an issue as it is for
    binaries. If people want to look in that
  518. they should.
  519. [Question] Do you know a way to make git
    archive build something reproducible?
  520. [Lunar] Well pristine-tar
  521. [Question] Yes, but without it.
  522. [Holger] There's one tool. You want to use
    a new one? Then write it.
  523. Why not use that tool which does the job?
  524. pristine-tar does it.
  525. [Lunar] This is for source and so that's
    another issue than what we are actually
  526. currently working on.
  527. [Holger] You're welcome to join the team and
    extend our scope to sources.
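
For the question of byte-for-byte identical source tarballs, here is a minimal sketch of the usual ingredients: a fixed member order, a fixed mtime and normalized ownership. It is neither pristine-tar nor git archive. It uses Python's tarfile module, and the function name deterministic_tar and the file contents are hypothetical.

```python
import io
import tarfile


def deterministic_tar(files: dict, mtime: int = 0) -> bytes:
    """Build an uncompressed tar archive from {name: bytes} with normalized metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Sort member names so the archive order does not depend on
        # directory listing order or dict insertion order.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime            # fixed timestamp
            info.mode = 0o644             # fixed permissions
            info.uid = info.gid = 0       # no builder-specific ownership
            info.uname = info.gname = "root"
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


tree = {"src/main.c": b"int main(void) { return 0; }\n", "README": b"demo\n"}

# The same files handed over in a different order give the same bytes.
archive_a = deterministic_tar(tree)
archive_b = deterministic_tar(dict(reversed(list(tree.items()))))
```

If the result is then compressed, the compressor's own header needs the same care, for example the timestamp that gzip stores.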
  528. [Lunar] How many questions, two?
  529. Two more questions, two or three.
  530. [Question] So if there is a couple of other
    environment variables that could be set
  531. in the environment to increase
    reproducibility, where to put them?
  532. In the rules file? Or in the generic build
    environment of all packages, or where
  533. should these things be placed?
  534. [Lamby] It'd be nice if upstream fixed it,
    so if we just change it in debian/rules
  535. that's just only helping us, so often take
    it upstream, would be the ideal solution.
  536. Are you referring to something else?
  537. [Question] For example many hashmaps have
    randomized data in the hash function, so if
  538. you have some code that relies on hash
    order, at least some implementations of
  539. hash functions allow them to be seeded
    rather than using something random for
  540. a build thing, but you want the randomness
    in your hash functions for normal users
  541. because else your hashmaps get open
    to attacks.
  542. [Lamby] Correct, yes.
  543. [Lunar] In these cases we send patches
    adding sort everywhere for the keys and
  544. it's solved. For very few cases, for Perl for
    example you can set an environment
  545. variable and some maintainers prefer to do
    that. But usually we try to push these
  546. changes upstream, because they are simple
    enough and they like it.
  547. Actually it makes testing easier for them.
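
The "add sort for the keys" fix can be sketched in a few lines. In Python, string hashing is randomized per process, so the iteration order of a set of strings can change from run to run. The function render_index and the symbol names are invented for the example. Sorting at the point where output is written makes the result stable while the hash function stays randomized for normal use.

```python
def render_index(symbols) -> str:
    """Emit one line per symbol, in sorted order rather than set iteration order."""
    return "".join(f"{name}\n" for name in sorted(symbols))


symbols = {"open", "close", "read", "write"}  # iteration order may vary per run
output = render_index(symbols)
```

The environment-variable route mentioned for Perl has a Python counterpart in PYTHONHASHSEED, which fixes the hash seed for a whole process, but sorting the output keeps the fix local to the code that needs it.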
  548. There was one in the back, there.
  549. [Lunar] That's the last question
  550. [Question] Follow up question to what we
    had here before.
  551. You showed an open bug report against gcc
    to support SOURCE_DATE_EPOCH to cover
  552. the __DATE__ and __TIME__ timestamps, so I have
    patches to patch them out in my packages.
  553. Should I remove those patches and if so,
    when?
  554. [Lunar] Have you seen any more emails
    from the gcc maintainers?
  555. [Dhole] The mail is forgotten, I guess we
    should ping it again, and see if they
  556. reply, because what I read from the gcc
    website is that only the replies from
  557. maintainers are the ones that matter, and
    I think no maintainer replied to the
  558. message, so we should ping again.
  559. [Question] That was just an example, my
    question was more general.
  560. At which time should I remove my patches
    to fix things which were fixed higher up
  561. in the toolchain? Or should I just leave
    them in there?
  562. [Holger] Once they are in Sid.
  563. [Question] Ok thanks!
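
The idea behind the SOURCE_DATE_EPOCH variable discussed in this exchange can be sketched as follows. This is a minimal illustration of how a build tool could honour the variable, not the gcc patch itself, and the function names build_timestamp and build_date_string are hypothetical. The variable holds a number of seconds since the Unix epoch; when it is set, the tool embeds that time instead of the current one.

```python
import os
import time


def build_timestamp(environ=os.environ) -> int:
    """Return SOURCE_DATE_EPOCH if it is set, otherwise the current time."""
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    return int(time.time())


def build_date_string(environ=os.environ) -> str:
    """Format the build time in UTC so the builder's timezone does not leak in."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(build_timestamp(environ)))


# Passing an explicit mapping stands in for the real environment here.
pinned = build_date_string({"SOURCE_DATE_EPOCH": "0"})
```

Taking the environment as a parameter keeps the sketch easy to test; a real tool would simply read os.environ.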
  564. [Lunar] Ok, I guess we're out of time.
  565. Thank you for listening.
  566. [Applause]
  567. [Lunar] Fix your packages!