Software transparency: package security beyond signatures and reproducible builds
-
0:08 - 0:12This will be an academic talk
as announced. -
0:12 - 0:19I will try to bring some of my research
I did during my PhD into the real world. -
0:21 - 0:26We are going to talk about the security
of software distribution and -
0:26 - 0:30I'm going to propose a security feature
that adds on top of -
0:30 - 0:33the signatures we have up to today
-
0:33 - 0:40and also the reproducible builds that
we already have to a very large degree. -
0:40 - 0:47I am going to highlight a few points where
I think infrastructure changes are required -
0:47 - 0:53to accommodate this system and I would
also appreciate any feedback -
0:53 - 0:55you might have.
-
0:56 - 1:00I'm going to ??? some motivation for
what we should care about. -
1:01 - 1:03In the security of software distribution
-
1:03 - 1:06we already do have
cryptographic signatures -
1:06 - 1:11I've just put up a few examples of
recent attacks that involved -
1:11 - 1:19the distribution of software where
people who presumably thought -
1:19 - 1:24they knew what they were doing had
grave problems with software distribution. -
1:25 - 1:28For example, the Juniper backdoors,
pretty famous. -
1:29 - 1:33Juniper discovered two backdoors
in the code and -
1:33 - 1:38nobody really knew where they were
coming from. -
1:39 - 1:43Another example would be
Chrome extension developers -
1:43 - 1:48who got their credentials phished and
subsequently their extensions backdoored -
1:48 - 1:58or, as another example, a signed update to
banking software actually included -
1:58 - 2:02malware and infected several banks.
-
2:03 - 2:11I hope this is motivation for us to
consider these kinds of attacks -
2:11 - 2:15to be possible and to prepare ourselves.
-
2:17 - 2:21I have two main goals in the system
I am going to propose. -
2:21 - 2:25The first is to relax trust in the archive.
-
2:25 - 2:31In particular, what I want to achieve is
a level of security even if -
2:31 - 2:38the archive is compromised and
the specific thing I am going to do is -
2:38 - 2:41to detect targeted backdoors.
-
2:41 - 2:47That means backdoors that are distributed
only to a subset of the population and -
2:47 - 2:53what we can achieve is to force
the attacker to deliver the malware -
2:53 - 3:00to everybody, thereby greatly decreasing
their degree of stealth and increasing -
3:00 - 3:02their danger of detection.
-
3:03 - 3:05This would work to our advantage.
-
3:06 - 3:10The second goal is forensic auditability
-
3:10 - 3:17which overlaps to a surprising degree
with the first one in technical terms, -
3:17 - 3:19in terms of implementation.
-
3:20 - 3:24So, what I want to ensure is that we have
-
3:24 - 3:27inspectable source code for every binary.
-
3:28 - 3:33We do have of course the source code
available from our packages, but -
3:33 - 3:39only for the most recent version,
everything else is a best effort -
3:39 - 3:43by the code archiving services.
-
3:44 - 3:51The mapping between the sources and the binaries
can be verified once we have -
3:51 - 3:54reproducible builds to a large extent.
-
3:55 - 4:01I want to make sure that we can identify
the maintainer responsible for distribution -
4:01 - 4:08of a particular package and the system
is also interested in providing -
4:08 - 4:11attribution of where something went wrong,
-
4:11 - 4:16so that we are not in a situation where we
notice something went wrong but -
4:16 - 4:22we don't really know where we have to look
in order to find the problems -
4:22 - 4:29but that we really have specific and
secured indication of -
4:29 - 4:32where a compromise
was coming from. -
4:34 - 4:37Let's quickly recap how our software
distribution works. -
4:37 - 4:42We have the maintainers who upload
their code to the archive. -
4:42 - 4:47The archive has access to a signing key
which signs the releases. -
4:48 - 4:52More precisely, it signs metadata covering all the actual
binary packages. -
4:53 - 4:57These are distributed over
the mirror network -
4:57 - 5:02from where the apt clients will download
the package metadata. -
5:02 - 5:07That means the hash sums for the packages,
their dependencies and so on -
5:07 - 5:10as well as the actual packages themselves.
-
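The chain of hash sums described in this recap can be sketched in Python as follows. This is only an illustration under stated assumptions: the one-line-per-package index format, the function names, and the use of SHA-256 are hypothetical, and checking the archive's signature over the release metadata is left out entirely.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def parse_index(index_bytes: bytes) -> dict[str, str]:
    # Hypothetical index format: one "name sha256" pair per line.
    entries = {}
    for line in index_bytes.decode().splitlines():
        name, digest = line.split()
        entries[name] = digest
    return entries


def verify_download(package: bytes, name: str,
                    index_bytes: bytes, signed_index_hash: str) -> bool:
    # Step 1: the signed release metadata pins the hash of the package index.
    if sha256_hex(index_bytes) != signed_index_hash:
        return False
    # Step 2: the index pins the hash of every individual package.
    expected = parse_index(index_bytes).get(name)
    return expected is not None and sha256_hex(package) == expected
```

Because every step is a hash comparison rooted in signed metadata, a mirror that alters a package or the index is caught without the mirror itself being trusted.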
5:12 - 5:19This central architecture has an important
advantage, -
5:19 - 5:23namely, the mirror network need not
be trusted, right? -
5:23 - 5:29We have the signature that covers all
the contents of binary and source packages -
5:29 - 5:33and the metadata, so the mirror network
need not be trusted. -
5:33 - 5:39On the other hand, it makes the archive and
its signing key a very interesting target -
5:39 - 5:46for attackers because this central point
controls all the signing operations. -
5:46 - 5:52So this is a place where we need to be
particularly careful and perhaps -
5:52 - 5:56even do better than
cryptographic signatures. -
5:57 - 6:02This is where the main focus of this talk
will be, although I will also consider -
6:02 - 6:05the uploaders to some extent.
-
6:07 - 6:08We want to achieve two things:
-
6:08 - 6:12resistance against key compromise and
targeted backdoors and -
6:12 - 6:18to get some better support for auditing
in case things go wrong. -
6:18 - 6:22The approach that we choose to do this is
-
6:22 - 6:27we want to make sure that everybody runs
exactly the same software -
6:27 - 6:30or at least the parts of it they choose
to install. -
6:31 - 6:35If we think about that for a moment,
this gives us a number of advantages. -
6:35 - 6:40For example, all the analysis that's done
on a piece of software immediately -
6:40 - 6:44carries over to all other users of
the software, right? -
6:44 - 6:48Because if we haven't made sure that
everybody installs the same software, -
6:48 - 6:51they might not have exactly
the same version but perhaps -
6:51 - 6:53some backdoored version instead.
-
6:54 - 7:01This also ensures that we cannot suffer
targeted backdoors by increasing -
7:01 - 7:04the detection risk of attackers
-
7:04 - 7:09and we also want to have a cryptographic
proof of where something went wrong. -
7:11 - 7:19Now, to look at some pictures,
I will present the data structure that -
7:19 - 7:22we use in order to achieve these goals.
-
7:23 - 7:28The data structure is a hash tree,
a Merkle tree which is -
7:28 - 7:31a data structure that operates over a list.
-
7:31 - 7:35So we have a list of these squares here
which represent the list items. -
7:35 - 7:39In our case, this is going to be
the files containing the package metadata -
7:39 - 7:42that lists dependencies and hash sums of
packages, -
7:42 - 7:47and also the source packages themselves
are going to be elements in this list. -
7:48 - 7:49The tree works as follows.
-
7:50 - 7:52It uses a cryptographic hash function
-
7:52 - 7:56which is a collision-resistant compressing
function -
7:56 - 8:01and the labels of the inner nodes
of the tree are computed as -
8:01 - 8:05the hashes of the children. Ok?
-
8:06 - 8:10Once we have computed the root hash,
the root label, -
8:10 - 8:15we have fixed all the elements and
none of the elements can be changed -
8:15 - 8:17without changing the root hash.
-
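A minimal Python sketch of the tree computation just described. The talk does not spell out the exact construction, so the details here are assumptions: SHA-256 as the hash function, and the leaf/inner-node prefixes and split rule borrowed from the Certificate Transparency convention (RFC 6962).

```python
import hashlib


def leaf_hash(item: bytes) -> bytes:
    # Prefix 0x00 marks a leaf, so a leaf can never be confused with an inner node.
    return hashlib.sha256(b"\x00" + item).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # The label of an inner node is the hash of its two children.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(items: list[bytes]) -> bytes:
    if not items:
        return hashlib.sha256(b"").digest()
    if len(items) == 1:
        return leaf_hash(items[0])
    # Split at the largest power of two strictly smaller than the list length.
    k = 1
    while k * 2 < len(items):
        k *= 2
    return node_hash(merkle_root(items[:k]), merkle_root(items[k:]))
```

Changing any single list element changes its leaf hash and therefore every label on the way up to the root, which is what fixes the whole list once the root is known.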
8:18 - 8:22We can exploit this in order to
efficiently prove -
8:22 - 8:26the two following properties for elements.
-
8:26 - 8:31First of all, we can efficiently prove
the inclusion of a given element -
8:31 - 8:32in the list.
-
8:32 - 8:37If we know the tree root ???,
this works as follows: -
8:37 - 8:41let's make a quick example, we see
the third list item is marked with an X -
8:41 - 8:50and if I know the tree root, then
the server operating the tree structure -
8:50 - 8:55will only need to give me the three gray
marked labels, -
8:55 - 9:01the three marked node values and then
I can recompute the root hash and -
9:01 - 9:06be convinced that this element actually
was contained in the list. -
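The inclusion proof from this example can be sketched in Python as follows. As before, SHA-256 and the RFC 6962 style tree layout are assumptions, and all function names are illustrative rather than taken from the prototype. `audit_path` is what a log server would compute; `verify_inclusion` is what a client holding only the tree root would run.

```python
import hashlib


def leaf_hash(item: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + item).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    # Largest power of two strictly smaller than n (for n >= 2).
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(items: list[bytes]) -> bytes:
    if not items:
        raise ValueError("empty list")
    if len(items) == 1:
        return leaf_hash(items[0])
    k = split_point(len(items))
    return node_hash(merkle_root(items[:k]), merkle_root(items[k:]))


def audit_path(index: int, items: list[bytes]) -> list[bytes]:
    # Server side: sibling labels from the leaf up to just below the root.
    if len(items) == 1:
        return []
    k = split_point(len(items))
    if index < k:
        return audit_path(index, items[:k]) + [merkle_root(items[k:])]
    return audit_path(index - k, items[k:]) + [merkle_root(items[:k])]


def root_from_path(index: int, size: int, leaf: bytes, path: list[bytes]) -> bytes:
    # Client side: recompute the root from one leaf hash and its sibling labels.
    if size == 1:
        if path:
            raise ValueError("proof too long")
        return leaf
    if not path:
        raise ValueError("proof too short")
    k = split_point(size)
    if index < k:
        return node_hash(root_from_path(index, k, leaf, path[:-1]), path[-1])
    return node_hash(path[-1], root_from_path(index - k, size - k, leaf, path[:-1]))


def verify_inclusion(item: bytes, index: int, size: int,
                     path: list[bytes], root: bytes) -> bool:
    if not 0 <= index < size:
        return False
    try:
        return root_from_path(index, size, leaf_hash(item), path) == root
    except ValueError:
        return False
```

For a list of eight items, `audit_path(2, items)` for the third item contains exactly three hashes, which matches the three marked labels mentioned above.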
9:07 - 9:12The second property is that we can also
efficiently verify the append-only operation -
9:12 - 9:14of the list.
-
9:14 - 9:18So we can have a log server operating
this kind of structure and -
9:18 - 9:19the log server need not be trusted,
-
9:19 - 9:24it's not going to be a trusted third party
but rather, its operation can be -
9:24 - 9:26verified from the outside.
-
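The append-only property can be illustrated with a deliberately naive Python sketch: an observer that keeps a full copy of the log recomputes the old root over the old prefix of the new list. This is not the efficient check the talk refers to; logs in the Certificate Transparency style instead hand out a compact consistency proof. SHA-256, the tree layout, and the function names are assumptions.

```python
import hashlib


def leaf_hash(item: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + item).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(items: list[bytes]) -> bytes:
    if not items:
        return hashlib.sha256(b"").digest()
    if len(items) == 1:
        return leaf_hash(items[0])
    k = 1
    while k * 2 < len(items):
        k *= 2
    return node_hash(merkle_root(items[:k]), merkle_root(items[k:]))


def extends(old_size: int, old_root: bytes,
            new_items: list[bytes], new_root: bytes) -> bool:
    # The log may only grow: it must not shrink ...
    if old_size > len(new_items):
        return False
    # ... the previously seen entries must be untouched ...
    if merkle_root(new_items[:old_size]) != old_root:
        return False
    # ... and the advertised new root must match the new contents.
    return merkle_root(new_items) == new_root
```

A log that rewrites or drops an earlier entry fails the prefix check, so the misbehaviour is visible to anyone who remembered an older root.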
9:28 - 9:31So, what does this design look like?
-
9:32 - 9:36The theoretical foundation is called
a transparency overlay and -
9:36 - 9:38in our system it looks like this:
-
9:38 - 9:41We have the archive as per usual,
-
9:41 - 9:47we have a log server and the archive will
submit package metadata, the release file, -
9:47 - 9:52the packages file containing dependencies
and so on and the source code -
9:52 - 9:55into this log server.
-
9:56 - 10:04The apt client will be augmented with
an auditor component and -
10:04 - 10:10this auditor component is responsible for
verifying the correct log operation -
10:10 - 10:15as well as the inclusion of the downloaded
release into the log. -
10:16 - 10:21This is the mechanism by which we will be able
to make sure that everybody is running -
10:21 - 10:25the exact same version of the software
they installed. -
10:27 - 10:29A third component is the monitor.
-
10:30 - 10:37The monitor is necessary also to verify
log operation and also to inspect -
10:37 - 10:42the elements that are contained in the log
-
10:44 - 10:52The monitor would then be run by groups
of individuals or individuals that want to -
10:52 - 10:57make sure of certain properties in the log
-
11:01 - 11:06Alright, let's quickly recap.
-
11:07 - 11:13We have added this log server, which
can prove two properties efficiently -
11:13 - 11:16to the outside world.
-
11:17 - 11:21And we have the auditor and monitor
components. -
11:21 - 11:27The auditor is added to the apt client
and the monitor does -
11:27 - 11:30additional investigative tasks.
-
11:31 - 11:38Now, in order to make this system work,
we need to… -
11:39 - 11:41I need to make a few assumptions.
-
11:41 - 11:48The archive will need to handle
log submission and distribution of -
11:48 - 11:51certain log data structures.
-
11:52 - 11:56These are usually very small things
given to the archive -
11:56 - 11:58in response to submission.
-
11:59 - 12:04Then I'm assuming a very consistent
release frequency. -
12:05 - 12:11The archive is responsible for distributing
reproducible binaries -
12:11 - 12:13in my architecture.
-
12:14 - 12:20I'm assuming that the buildinfo files are
covered by the release file -
12:21 - 12:28I treat them as additional source metadata
so whenever the source package or -
12:28 - 12:35the buildinfo file changes, I expect
an increase in the binary version number. -
12:36 - 12:42I also assume source-only uploads and
one additional thing: that we have -
12:42 - 12:49a keyring source package treated
by the archive as authoritative, and -
12:49 - 12:53this keyring must have
the special property that -
12:53 - 12:58it is operated append-only so that
we can go back in time and see -
12:58 - 13:01what keys were authorized at different
points in time. -
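One way to picture an append-only keyring is as a list of events that is only ever extended, so the set of authorized keys at any earlier point can be replayed. The following Python sketch is purely illustrative: the event model, the `position` field, and the function name are hypothetical and not taken from the talk or from the actual Debian keyring package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyringEvent:
    position: int      # hypothetical: log position at which the event was recorded
    action: str        # "add" or "remove"
    fingerprint: str   # key fingerprint the event applies to


def authorized_at(events: list[KeyringEvent], position: int) -> set[str]:
    # Replay all events up to and including the given log position.
    keys: set[str] = set()
    for event in sorted(events, key=lambda e: e.position):
        if event.position > position:
            break
        if event.action == "add":
            keys.add(event.fingerprint)
        elif event.action == "remove":
            keys.discard(event.fingerprint)
    return keys
```

Because removals are recorded as new events instead of deleting old ones, an upload made at position 3 can still be checked later against the keys that were valid at position 3.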
13:04 - 13:11The log server is a standalone server
component that at the moment speaks -
13:11 - 13:13an HTTP-based protocol.
-
13:14 - 13:17Probably one would want to have
more than one, but -
13:17 - 13:21we are going to have, I think,
a much easier time running log servers -
13:21 - 13:24than for example, the certificate
transparency people -
13:24 - 13:30because we only have one source
of write access, -
13:30 - 13:36namely the archive, so we can easily
schedule the write access, -
13:36 - 13:41and you can have read-only frontends that
aren't quite critical. -
13:43 - 13:50The auditor component would need to be
integrated into the apt client or library. -
13:50 - 13:54It needs to do things like cryptographic
verification, -
13:54 - 14:00understand a few more file formats, and
have some more network access. -
14:03 - 14:07Parts of the proofs could probably also
be distributed over -
14:07 - 14:12the mirror network and we need
not necessarily do everything -
14:13 - 14:15??? communication with
the log server. -
14:19 - 14:23So, this covers archive, auditor and
log server. -
14:24 - 14:31The monitoring servers have a few functions
that are necessary for the verification of -
14:31 - 14:38the log itself, meaning that they verify
the append-only operation of the log -
14:38 - 14:43and they will also likely want to exchange
the tree roots with perhaps -
14:43 - 14:45other monitors and some auditors.
-
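Why exchanging tree roots helps can be shown with a small Python sketch. If two observers were shown different roots for the same tree size, the log (or whoever controls it) has presented different histories to different parties, which is exactly what a targeted backdoor requires. The observation tuples and the function name are illustrative assumptions; the talk does not describe a concrete exchange protocol.

```python
from collections import defaultdict


def find_split_views(
    observations: list[tuple[str, int, str]],
) -> dict[int, set[str]]:
    # Each observation is (observer name, tree size, root hash as hex).
    roots_by_size: dict[int, set[str]] = defaultdict(set)
    for _observer, tree_size, root_hex in observations:
        roots_by_size[tree_size].add(root_hex)
    # A single honest, append-only log has exactly one root per tree size.
    return {size: roots for size, roots in roots_by_size.items() if len(roots) > 1}
```

An empty result only means that no inconsistency was seen among the roots that were compared; roots for different tree sizes would additionally need an append-only check against each other.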
14:47 - 14:51The important verification functions
of the log server are validating -
14:51 - 14:57the metadata of the Release, Packages and
Sources files, -
14:57 - 15:01namely making sure that these are complete,
that the sources are available, -
15:01 - 15:06that the versions are incremented
correctly and so on. -
15:06 - 15:10And that's necessary to make sure that
a compromised archive can't do -
15:10 - 15:12certain attacks.
-
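Two of the metadata checks named here, available sources and version increments, could look roughly like this in Python. The dictionaries stand in for parsed metadata of two consecutive releases and are hypothetical. The version check is simplified: it only flags content that changed while the version string stayed the same, and does not implement Debian's version ordering.

```python
def find_unversioned_changes(
    old: dict[str, tuple[str, str]],
    new: dict[str, tuple[str, str]],
) -> list[str]:
    # Both arguments map a source package name to (version, content hash).
    suspicious = []
    for name, (version, digest) in new.items():
        if name in old:
            old_version, old_digest = old[name]
            # Changed content under an unchanged version is what a monitor flags.
            if digest != old_digest and version == old_version:
                suspicious.append(name)
    return sorted(suspicious)


def find_missing_sources(
    binary_to_source: dict[str, str],
    sources: set[str],
) -> list[str]:
    # Every binary package should point at a source package present in the log.
    return sorted(b for b, s in binary_to_source.items() if s not in sources)
```

A monitor would run checks like these on each new release it sees in the log and raise an alarm on any non-empty result.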
15:13 - 15:19Also in this category is the fact that
we depend on a fixed release frequency -
15:19 - 15:27and monitors will also be verifying
the upload ACL, -
15:27 - 15:29meaning which keys are authorized to
upload. -
15:30 - 15:37Monitors also would be verifying
reproducible builds in this scenario. -
15:41 - 15:49That's the monitoring functions and
I think that many different people and -
15:49 - 15:56groups in Debian could get some benefits
out of these monitoring functions -
15:56 - 16:01in order to verify that everything
worked correctly. -
16:01 - 16:05We should note that all these verifications
are completely independent of -
16:05 - 16:09the existing infrastructure because
they happen on the client side. -
16:10 - 16:16So we don't depend on the existing
infrastructure working correctly, or on -
16:16 - 16:19its notifications not being
suppressed. -
16:19 - 16:23This can be done completely
on the client side using -
16:23 - 16:26the data provided by the log server.
-
16:27 - 16:35For example, maintainers could verify that
the code uploaded builds reproducibly -
16:35 - 16:38using the corresponding build info or
-
16:38 - 16:40they could check
-
16:40 - 16:43which uploads were done using their key and
-
16:43 - 16:48which packages were modified, perhaps
by other people. -
16:48 - 16:54The keyring maintainers or account
managers could be looking at -
16:54 - 16:59the keyring: what keys are in the keyring
and what uploads were done -
16:59 - 17:01using which keys.
-
17:02 - 17:12And the archive, last but not least, has
an additional verification step available -
17:12 - 17:18to make sure all the metadata was produced
correctly and that no -
17:18 - 17:23weird things happened during the production
of a given release. -
17:28 - 17:29This thing actually exists.
-
17:29 - 17:33Well, I have programmed prototypes
for all these components, -
17:34 - 17:37meaning nothing that would be ready
to implement, -
17:37 - 17:40but to show that it actually works.
-
17:40 - 17:46I've used two years of Debian Stretch
releases and fed them into the system. -
17:47 - 17:53This resulted in a tree size of
270,000 elements and -
17:53 - 17:59the storage required was about 400GB
where almost all of that is -
17:59 - 18:00source packages.
-
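To put the tree size in perspective, the following Python snippet estimates how large an inclusion proof gets for a tree of this size. It assumes a binary Merkle tree with 32-byte hashes such as SHA-256; the talk does not state which hash function the prototype uses.

```python
def max_audit_path_length(tree_size: int) -> int:
    # Depth of a binary Merkle tree over tree_size leaves: ceil(log2(tree_size)).
    return (tree_size - 1).bit_length() if tree_size > 1 else 0


HASH_BYTES = 32  # assumption: SHA-256 sized labels
hashes = max_audit_path_length(270_000)
print(hashes, hashes * HASH_BYTES)  # 19 608
```

Under these assumptions a client needs at most 19 hashes, about 600 bytes, to check that one element is in a log of 270,000 entries, regardless of how much storage the log itself needs.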
18:01 - 18:05I would say that it's eminently feasible
to do this. -
18:05 - 18:10The monitor functions run rather cheaply.
-
18:10 - 18:17A monitor need not necessarily keep
a complete copy of the log in all cases -
18:19 - 18:26but I did notice some unexpected events
in the package metadata. -
18:28 - 18:33I have observed sources missing and
version increments missing where -
18:33 - 18:36I think there should be a version increment
-
18:37 - 18:42So I'll be looking more closely
into these cases. -
18:45 - 18:51If anybody is interested in
the theoretical side of this, -
18:51 - 18:55this would be the immediate pointers
I can give. -
18:56 - 19:01The first paper is the theoretical and
mathematical foundation and -
19:01 - 19:09the other ones are applications of
similar transparency work, but -
19:09 - 19:14with different goals.
-
19:18 - 19:26Summarizing, we can introduce a system
to detect targeted backdoors,
19:26 - 19:29even under compromise of the archive.
-
19:30 - 19:36We need to add a bit more infrastructure
and need to change how some things are done -
19:38 - 19:47We can also improve auditability, so that
we can securely identify problems when -
19:47 - 19:49things go wrong.
-
19:50 - 19:56In particular, we can make sure that for
every binary, we can get -
the source code that was used to produce
the binary -
20:03 - 20:06and then identify
the responsible maintainer. -
20:07 - 20:11There's one class of attacks I have left
out for today, -
20:11 - 20:14if anybody wants to talk about that, we
can do so too. -
20:16 - 20:21And now, I'm interested in your questions
and feedback. -
20:23 - 20:28[Applause]
-
20:34 - 20:40[Q] Did you already test the reproducibility
and how do you deal with -
20:40 - 20:44the problem of non-reproducible packages?
-
20:44 - 20:47I mean, do you not integrate some
into the log? -
20:48 - 20:53[A] For now, the implementation of
my monitor functions hasn't covered -
20:53 - 20:55reproducibility.
-
20:55 - 21:01I think the first step to do so would be
to have a blacklist of packages -
21:01 - 21:06that are known not to be built reproducibly
and then try to get on with it. -
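The approach suggested in this answer, a reproducibility check that skips a list of known non-reproducible packages, might be sketched in Python like this. According to the answer, the prototype does not implement this yet, so everything here is hypothetical: the function name, the data shapes, and the use of SHA-256.

```python
import hashlib


def check_reproducibility(
    rebuilt: dict[str, bytes],
    logged_hashes: dict[str, str],
    known_unreproducible: set[str],
) -> list[str]:
    # rebuilt maps a package name to the bytes of a locally rebuilt binary;
    # logged_hashes maps a package name to the hex hash found in the logged metadata.
    mismatches = []
    for name, artifact in rebuilt.items():
        if name in known_unreproducible:
            continue  # skip packages that are known not to build reproducibly
        if hashlib.sha256(artifact).hexdigest() != logged_hashes.get(name):
            mismatches.append(name)
    return sorted(mismatches)
```

Any package in the result was either built from different input than the log claims or is non-reproducible without being on the skip list; both cases call for a closer look.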
21:17 - 21:18[Q] Two questions.
-
21:18 - 21:21You say "authenticating metadata and
code". -
21:21 - 21:26This means signing or what is it exactly,
"authenticating"? -
21:26 - 21:28[A] At which point?
-
21:29 - 21:33[Q] It was… back. Where the tree is.
-
21:34 - 21:37Yes, yes. The tree before that.
-
21:40 - 21:40[A] Ok.
-
21:42 - 21:49This authentication here doesn't quite mean
a signature. -
21:50 - 21:57It means if I know the value of the root
of the hashtree, then -
21:57 - 22:04I can be assured that a given element
is included if I'm told -
22:04 - 22:11the value of the three gray marked
inner nodes here. -
22:11 - 22:16And that works by recomputing
the hash tree. -
22:20 - 22:24[Q] Ok, I think I have to defer this
to after the talk. -
22:24 - 22:26[A] Yeah, I can explain.
-
22:27 - 22:29[Q] Another question would be,
-
22:29 - 22:32so, detection of targeted backdoors.
-
22:33 - 22:39You mean at the stage of signing archive
or which backdoors? -
22:41 - 22:46[A] The scenario would be that
the signing key of the archive is -
22:46 - 22:53used to create an additional release file
which covers -
22:53 - 22:55a manipulated software version.
-
22:55 - 23:00And this software version and signature is
only shown to the victim population -
23:00 - 23:02and not to the general population.
-
23:03 - 23:08This means that the malicious software
would only be observed by the victim -
23:08 - 23:10and not by everybody else.
-
23:11 - 23:15My goal is to force the attacker to
distribute the malicious software -
23:15 - 23:18to the whole world in order to increase
-
23:18 - 23:22the chance that they're going to be
detected and thereby deterring perhaps -
23:22 - 23:24the attack from the beginning.
-
23:40 - 23:42[Q] Great talk. Great ideas as well.
-
23:44 - 23:47I really liked your slide on
your assumptions -
23:47 - 23:50???
honest about them like -
23:50 - 23:51"yeah we assume ???"
-
23:53 - 23:56I wouldn't underestimate how difficult
it would be to make -
23:56 - 23:57some of these changes.
-
23:57 - 24:01I mean, even ones that look simple, like
source-only uploads. -
24:01 - 24:03Everyone wants them, right?
-
24:05 - 24:11[A] Yes, sure, we have to start somewhere
and I hope if people are convinced that -
this is a great idea and we should do this
-
24:14 - 24:18then we get some more impetus
for these things that everybody wants -
24:18 - 24:21like source-only uploads.
-
24:22 - 24:25[Q] Thank you, yeah, and it will be really
pretty good to base this stuff -
24:25 - 24:29on the reproducible builds effort because
it builds on the same choices. -
24:30 - 24:31Thank you.
-
24:34 - 24:37[A] Yeah, so I'm interested in any kind
of feedback. -
24:38 - 24:43If you think it's a great idea or think
there are some problems I might have missed -
24:43 - 24:46or it might get difficult to implement.
-
24:48 - 24:52Please come talk to me in case you have
anything. -
24:53 - 24:59[Applause]
- Description:
Talk given by Benjamin Hof at Minidebconf Hamburg 18
https://meetings-archive.debian.net/pub/debian-meetings/2018/miniconf-hamburg/2018-05-19/software_transparency.webm
- Video Language:
English
- Team:
Debconf
- Project:
2018_mini-debconf-hamburg
- Duration:
25:04