This will be an academic talk, as announced. I will try to bring some of the research I did during my PhD into the real world.

We are going to talk about the security of software distribution, and I'm going to propose a security feature that adds on top of the signatures we have today and also the reproducible builds that we already have to a very large degree. I am going to highlight a few points where I think infrastructure changes are required to accommodate this system, and I would also appreciate any feedback you might have.

I'm going to give a bit of motivation for why we should care about the security of software distribution, even though we already do have cryptographic signatures. I've just put up a few examples of recent attacks that involved the distribution of software, where people who presumably thought they knew what they were doing had grave problems with software distribution.

For example, the Juniper backdoors, which are pretty famous: Juniper discovered two backdoors in their code, and nobody really knew where they were coming from. Another example would be Chrome extension developers who got their credentials phished and subsequently had their extensions backdoored. Or, as another example, a signed update to a banking software actually included malware and infected several banks. I hope this motivates us to consider these kinds of attacks possible and to prepare ourselves.

I have two main goals for the system I am going to propose. The first is to relax trust in the archive. In particular, what I want to achieve is a level of security even if the archive is compromised, and the specific thing I am going to do is to detect targeted backdoors. That means backdoors that are distributed only to a subset of the population. What we can achieve is to force the attacker to deliver the malware to everybody, thereby greatly decreasing their degree of stealth and increasing their risk of detection. This would work to our advantage.

The second goal is forensic auditability, which, in technical terms (that is, in terms of implementation), overlaps to a surprising degree with the first one. What I want to ensure is that we have inspectable source code for every binary. We do of course have the source code available for our packages, but only for the most recent version; everything else is a best effort by the code archiving services. The mapping between the sources and the binaries can be verified to a large extent once we have reproducible builds.

I want to make sure that we can identify the maintainer responsible for the distribution of a particular package, and the system is also interested in providing attribution of where something went wrong. That way we are not in a situation where we notice something went wrong but don't really know where to look in order to find the problem; instead, we have a specific and secured indication of where a compromise was coming from.

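As a rough illustration of the attribution goal, a monitor holding the logged metadata could map every binary to the source it was built from and to the key that uploaded that source, and flag binaries whose source is missing from the log. This Python sketch assumes hypothetical record types (`SourceEntry`, `BinaryEntry`) with an uploader fingerprint field; the talk does not specify the log's entry format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEntry:
    # Hypothetical log record for an uploaded source package.
    name: str
    version: str
    uploader_fingerprint: str


@dataclass(frozen=True)
class BinaryEntry:
    # Hypothetical log record for a binary and the source it claims to be built from.
    package: str
    version: str
    source_name: str
    source_version: str


def attribute(binaries, sources):
    """Map each binary to its logged source (and uploader); collect binaries without one."""
    index = {(s.name, s.version): s for s in sources}
    attributed, orphans = {}, []
    for b in binaries:
        src = index.get((b.source_name, b.source_version))
        if src is None:
            orphans.append(b)  # no inspectable source in the log: a finding for monitors
        else:
            attributed[(b.package, b.version)] = src
    return attributed, orphans


sources = [SourceEntry("hello", "2.10-3", "FPR-MAINTAINER-A")]
binaries = [
    BinaryEntry("hello", "2.10-3", "hello", "2.10-3"),
    BinaryEntry("hello-extra", "2.10-3", "hello", "2.10-2"),  # claims a source never logged
]
attributed, orphans = attribute(binaries, sources)
print(attributed[("hello", "2.10-3")].uploader_fingerprint)  # FPR-MAINTAINER-A
print([b.package for b in orphans])                          # ['hello-extra']
```
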
Let's quickly recap how our software distribution works. We have the maintainers, who upload their code to the archive. The archive has access to a signing key which signs the releases, or more precisely, the metadata covering all the actual binary packages. These are distributed over the mirror network, from where the apt clients download the package metadata (that means the hash sums for the packages, their dependencies, and so on) as well as the actual packages themselves.

This central architecture has an important advantage, namely that the mirror network need not be trusted, right? The signature covers the metadata and, through the hash sums it contains, all the contents of the binary and source packages.

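Why the mirrors need not be trusted can be shown with a small sketch. Once the client has verified the Release signature (and, through it, the hash of the Packages file), each downloaded package only has to match the `Size` and `SHA256` fields of its Packages stanza. This is a simplified Python illustration, not apt's actual code: the stanza parser ignores multi-line fields, and the signature check is assumed to have happened already.

```python
import hashlib


def parse_stanza(text: str) -> dict:
    """Parse one deb822 stanza; this simplified version keeps only single-line fields."""
    fields = {}
    for line in text.splitlines():
        if not line or line[0] in " \t":
            continue  # blank or continuation line: ignored in this sketch
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def package_matches_metadata(deb_bytes: bytes, stanza: dict) -> bool:
    """True if the downloaded bytes have exactly the Size and SHA256 from the Packages stanza."""
    if len(deb_bytes) != int(stanza["Size"]):
        return False
    return hashlib.sha256(deb_bytes).hexdigest() == stanza["SHA256"].lower()


# Made-up bytes standing in for a .deb fetched from an untrusted mirror.
deb = b"example bytes standing in for hello_2.10-3_amd64.deb"
stanza = parse_stanza(
    "Package: hello\n"
    "Version: 2.10-3\n"
    f"Size: {len(deb)}\n"
    f"SHA256: {hashlib.sha256(deb).hexdigest()}\n"
)
print(package_matches_metadata(deb, stanza))         # True
print(package_matches_metadata(deb + b"!", stanza))  # False: the mirror altered it
```

The catch, and the subject of the rest of the talk, is that this only moves all trust onto whoever controls the signing key.
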
On the other hand, it makes the archive and the signing key a very interesting target for attackers, because this central point controls all the signing operations. So this is a place where we need to be particularly careful, and perhaps even do better than cryptographic signatures. This is where the main focus of this talk will be, although I will also consider the uploaders to some extent.

We want to achieve two things: resistance against key compromise and targeted backdoors, and better support for auditing in case things go wrong. The approach we choose to do this is to make sure that everybody runs exactly the same software, or at least the parts of it they choose to install.

If we think about that for a moment, this gives us a number of advantages. For example, all the analysis that's done on a piece of software immediately carries over to all other users of the software, right? If we haven't made sure that everybody installs the same software, they might not have exactly the same version, and perhaps have some backdoored version. This also protects us against targeted backdoors by increasing the detection risk for attackers, and we also want to have a cryptographic proof of where something went wrong.

Now, to look at some pictures, I will present the data structure that we use in order to achieve these goals. The data structure is a hash tree, a Merkle tree, which is a data structure that operates over a list. So we have a list of these squares here, which represent the list items. In our case, these are going to be the files containing the package metadata (that is, dependencies, hash sums of packages, and so on), and the source packages themselves are also going to be elements in this list.

The tree works as follows. It uses a cryptographic hash function, which is a collision-resistant compressing function, and the label of each inner node of the tree is computed as the hash of its children's labels. OK? Once we have computed the root hash, the root label, we have fixed all the elements, and none of the elements can be changed without changing the root hash.

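A minimal Python sketch of such a tree, computing the root label over a list of entries. It assumes the RFC 6962 / RFC 9162 conventions (SHA-256, a 0x00 prefix for leaves and 0x01 for inner nodes, and splitting at the largest power of two below the list length); the talk does not state which hash function or tree shape the prototype uses.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    # 0x00 prefix separates leaf hashes from inner-node hashes (RFC 6962 convention, assumed).
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # An inner node's label is the hash of its children's labels (0x01 prefix).
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    """Largest power of two strictly smaller than n (n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(entries: list) -> bytes:
    """Root label of the Merkle tree over the list of entries."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    k = split_point(n)
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


entries = [b"Release", b"Packages", b"Sources", b"hello_2.10-3.dsc"]
root = merkle_root(entries)
changed = merkle_root([b"Release", b"Packages (modified)", b"Sources", b"hello_2.10-3.dsc"])
print(root != changed)  # True: modifying any element changes the root
```
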
We can exploit this in order to efficiently prove the following two properties for ???.

First of all, we can efficiently prove the inclusion of a given element in the list if we know the tree root ???. This works as follows. Let's take a quick example: we see the third list item is marked with an X, and if I know the tree root, then the server operating the tree structure only needs to give me the three gray-marked labels, the three marked node values. Then I can recompute the root hash and be convinced that this element actually was contained in the list.

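The inclusion proof just described can be sketched in Python as follows. As before, this assumes RFC 6962 / RFC 9162 conventions: the proof is the list of sibling hashes from the leaf up to the root, and verification follows RFC 9162, section 2.1.3.2. The talk's prototype may differ in these details. With eight elements and the third one marked, the proof consists of exactly three hashes, matching the three gray nodes.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    """Largest power of two strictly smaller than n (n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(entries: list) -> bytes:
    """Root over a non-empty list of entries."""
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = split_point(len(entries))
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


def inclusion_proof(index: int, entries: list) -> list:
    """Server side: sibling hashes on the path from entries[index] to the root, bottom-up."""
    if len(entries) <= 1:
        return []
    k = split_point(len(entries))
    if index < k:
        return inclusion_proof(index, entries[:k]) + [merkle_root(entries[k:])]
    return inclusion_proof(index - k, entries[k:]) + [merkle_root(entries[:k])]


def verify_inclusion(entry: bytes, index: int, size: int, proof: list, root: bytes) -> bool:
    """Client side: recompute the root from the entry and the proof."""
    if index >= size:
        return False
    fn, sn = index, size - 1
    r = leaf_hash(entry)
    for p in proof:
        if sn == 0:
            return False
        if (fn & 1) or fn == sn:
            r = node_hash(p, r)  # sibling is on the left
            if not (fn & 1):
                while fn and not (fn & 1):
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)  # sibling is on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


entries = [f"element-{i}".encode() for i in range(8)]
root = merkle_root(entries)
proof = inclusion_proof(2, entries)  # the third list item, marked X in the figure
print(len(proof))                                          # 3
print(verify_inclusion(entries[2], 2, 8, proof, root))     # True
print(verify_inclusion(b"backdoored", 2, 8, proof, root))  # False
```

The client never needs the other elements, only about log2(n) hashes, which is what makes the check cheap enough to run inside apt.
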
The second property is that we can also efficiently verify the append-only operation of the list. So we can have a log server operating this kind of structure, and the log server need not be trusted. It's not going to be a trusted third party; rather, its operation can be verified from the outside.

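Append-only operation is usually verified with a consistency proof. Given an old tree head and a new one, the log sends a few hashes from which the client recomputes both roots. This is a minimal Python sketch, assuming the RFC 6962 proof construction and the verification algorithm from RFC 9162, section 2.1.4.2; it is not necessarily what the talk's prototype implements.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(entries: list) -> bytes:
    """Root over a non-empty list of entries."""
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = split_point(len(entries))
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


def consistency_proof(old_size: int, entries: list) -> list:
    """Server side: proof that entries[:old_size] is a prefix of entries."""
    def subproof(m: int, d: list, complete: bool) -> list:
        if m == len(d):
            return [] if complete else [merkle_root(d)]
        k = split_point(len(d))
        if m <= k:
            return subproof(m, d[:k], complete) + [merkle_root(d[k:])]
        return subproof(m - k, d[k:], False) + [merkle_root(d[:k])]
    return subproof(old_size, entries, True)


def verify_consistency(old_size, new_size, old_root, new_root, proof) -> bool:
    """Client side: check that the new tree extends the old one without rewriting it."""
    if old_size == new_size:
        return not proof and old_root == new_root
    if old_size < 1 or old_size > new_size or not proof:
        return False
    path = list(proof)
    if (old_size & (old_size - 1)) == 0:  # old tree is a complete subtree
        path.insert(0, old_root)
    fn, sn = old_size - 1, new_size - 1
    while fn & 1:
        fn >>= 1
        sn >>= 1
    fr = sr = path[0]  # fr rebuilds the old root, sr the new root
    for c in path[1:]:
        if sn == 0:
            return False
        if (fn & 1) or fn == sn:
            fr = node_hash(c, fr)
            sr = node_hash(c, sr)
            if not (fn & 1):
                while fn and not (fn & 1):
                    fn >>= 1
                    sn >>= 1
        else:
            sr = node_hash(sr, c)
        fn >>= 1
        sn >>= 1
    return fr == old_root and sr == new_root and sn == 0


log = [f"release-{i}".encode() for i in range(10)]
old_root = merkle_root(log[:6])  # tree head an auditor remembered earlier
new_root = merkle_root(log)      # tree head served now
proof = consistency_proof(6, log)
print(verify_consistency(6, 10, old_root, new_root, proof))   # True
rewritten = merkle_root([b"rewritten history"] + log[1:6])
print(verify_consistency(6, 10, rewritten, new_root, proof))  # False
```
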
So, what does this design look like? The theoretical foundation is called a transparency overlay, and in our system it looks like this. We have the archive as usual, and we have a log server. The archive will submit the package metadata (the Release file, the Packages file containing dependencies, and so on) and the source code into this log server.

The apt client will be augmented with an auditor component, and this auditor component is responsible for verifying the correct log operation as well as the inclusion of the downloaded release in the log. This is the mechanism by which we will be able to make sure that everybody is running the exact same version of the software they installed.

A third component is the monitor. The monitor is also necessary to verify the log operation, and also to inspect the elements that are contained in the log. Monitors would then be run by groups of individuals, or by individuals, who want to make sure of certain properties of the log.

Not Synced
Alright, let's quickly recap.
-
Not Synced
We have added this log server, which
can prove two properties efficiently
-
Not Synced
to the outside world.
-
Not Synced
And we have the auditor and monitor
components.
-
Not Synced
The auditor is added to the apt client
and the monitor does
-
Not Synced
additional investigative tasks.
Now, in order to make this system work, I need to make a few assumptions. The archive will need to handle log submission and the distribution of certain log data structures. These are usually very small things given to the archive in response to a submission. Then, I'm assuming a very consistent release frequency.

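The talk does not spell out why a consistent release frequency matters. One plausible reading is that it lets monitors notice both withheld releases and unscheduled extra ones, such as an additional Release file made with a stolen key. This Python sketch checks logged release times against a fixed interval. The six-hour interval and 30-minute tolerance are made-up illustration values, not Debian's actual schedule, and `schedule_anomalies` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def schedule_anomalies(release_times, interval, tolerance, now=None):
    """Compare logged release times against an assumed fixed release interval.

    Returns (kind, start, end) tuples describing gaps that break the schedule.
    """
    times = sorted(release_times)
    anomalies = []
    for prev, cur in zip(times, times[1:]):
        gap = cur - prev
        if gap > interval + tolerance:
            anomalies.append(("missing release", prev, cur))
        elif gap < interval - tolerance:
            anomalies.append(("unscheduled release", prev, cur))
    # A log that stops growing could mean clients are being held on old metadata.
    if now is not None and times and now - times[-1] > interval + tolerance:
        anomalies.append(("log not updated", times[-1], now))
    return anomalies


start = datetime(2017, 6, 1)
six_hours, slack = timedelta(hours=6), timedelta(minutes=30)
logged = [start + timedelta(hours=h) for h in (0, 6, 12, 24, 30)]
found = schedule_anomalies(logged, six_hours, slack, now=start + timedelta(hours=40))
for kind, begin, end in found:
    print(kind, begin, end)
```

Here the 12-hour gap is reported as a missing release, and the 10 hours without a new entry at the end as a stalled log.
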
In my architecture, the archive is responsible for distributing reproducible binaries. I'm assuming that the buildinfo files are covered by the Release file. I treat them as additional source metadata, so whenever the source package or the buildinfo file changes, I expect an increase in the binary version number. I also assume source-only uploads.

And there is one additional thing: the keyring source package is treated by the archive as authoritative, and this keyring must have the special property that it is operated in an append-only fashion, so that we can go back in time and see what keys were authorized at different points in time.

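Because the keyring is append-only, the set of authorized keys at any past moment can be recovered by replaying its history, and anyone can check that a newer history merely extends an older one. This Python sketch assumes a hypothetical `KeyringEvent` record (time, add or remove, fingerprint) kept in log order. The talk does not describe the keyring's actual format, and the fingerprints are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class KeyringEvent:
    # Hypothetical record of one change to the keyring package.
    time: datetime
    action: str  # "add" or "remove"
    fingerprint: str


def is_append_only(old_history, new_history) -> bool:
    """The new history must extend the old one without rewriting any earlier event."""
    return list(new_history[: len(old_history)]) == list(old_history)


def authorized_keys(history, when: datetime) -> set:
    """Replay the history (assumed in time order) to get the keys authorized at `when`."""
    keys = set()
    for event in history:
        if event.time > when:
            break
        if event.action == "add":
            keys.add(event.fingerprint)
        elif event.action == "remove":
            keys.discard(event.fingerprint)
        else:
            raise ValueError(f"unknown action {event.action!r}")
    return keys


history = [
    KeyringEvent(datetime(2016, 1, 1), "add", "FPR-ALICE"),
    KeyringEvent(datetime(2016, 3, 1), "add", "FPR-BOB"),
    KeyringEvent(datetime(2017, 2, 1), "remove", "FPR-ALICE"),
]
print(sorted(authorized_keys(history, datetime(2016, 6, 1))))  # ['FPR-ALICE', 'FPR-BOB']
print(sorted(authorized_keys(history, datetime(2017, 6, 1))))  # ['FPR-BOB']
print(is_append_only(history[:2], history))                    # True
print(is_append_only(history[1:], history))                    # False
```
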
The log server is a standalone server component that currently speaks an HTTP-based protocol. Probably one would want to have more than one, but I think we are going to have a much easier time running log servers than, for example, the Certificate Transparency people, because we only have one source of write access, namely the archive. So we can easily schedule the write access, and you can have read-only frontends that aren't that critical.

The auditor component would need to be integrated into the apt client or library. It needs a few things: cryptographic verification, understanding a few more file formats, and some more network access. Parts of the proofs could probably also be distributed over the mirror network, so we need not necessarily do everything ??? communication with the log server.

So, this covers the archive, the auditor, and the log server.

The monitoring servers have a few functions that are necessary for the verification of the log itself, meaning that they verify the append-only operation of the log, and they will also likely want to exchange tree roots with other monitors and perhaps with some auditors.

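Exchanging tree roots is what turns a targeted release into detectable evidence. If the log shows a victim a different tree than everybody else, two parties will hold heads for the same tree size with different roots. This is a minimal Python sketch of that comparison. `TreeHead` and `find_split_views` are hypothetical names, the root bytes are dummies, and heads of different sizes would instead be checked with consistency proofs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeHead:
    size: int    # number of entries in the log
    root: bytes  # Merkle tree root over those entries


def find_split_views(seen_by_me, seen_by_peer):
    """Return pairs of tree heads with equal size but different roots.

    Such a pair means the log has shown two parties different histories.
    """
    mine = {head.size: head for head in seen_by_me}
    conflicts = []
    for theirs in seen_by_peer:
        ours = mine.get(theirs.size)
        if ours is not None and ours.root != theirs.root:
            conflicts.append((ours, theirs))
    return conflicts


monitor = [TreeHead(100, b"\x11" * 32), TreeHead(120, b"\x22" * 32)]
victim = [TreeHead(100, b"\x11" * 32), TreeHead(120, b"\x66" * 32)]  # targeted view
print(len(find_split_views(monitor, victim)))  # 1
print(find_split_views(monitor, monitor))      # []
```
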
The important verification functions concern the contents of the log server: validating the metadata in the Release, Packages, and Sources files, namely making sure that these are complete, that the sources are available, that the versions are incremented correctly, and so on.

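A monitor's version check could look roughly like the Python sketch below. It combines the assumption stated earlier (a changed source package or buildinfo file requires a higher binary version) with a simplified reimplementation of dpkg's version ordering: epoch, then upstream version, then Debian revision, with `~` sorting before everything. The `Entry` record, the placeholder hash strings, and the anomaly labels are hypothetical. The comparison is a sketch, not a drop-in replacement for `dpkg --compare-versions`.

```python
from dataclasses import dataclass


def _order(c: str) -> int:
    # dpkg's ordering for non-digit characters: "~" first, then letters, then the rest.
    if c.isdigit():
        return 0
    if c.isalpha():
        return ord(c)
    if c == "~":
        return -1
    return ord(c) + 256


def _verrevcmp(a: str, b: str) -> int:
    """Compare two upstream versions (or two revisions) the way dpkg does."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare character by character using _order.
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ac = _order(a[i]) if i < len(a) else 0
            bc = _order(b[j]) if j < len(b) else 0
            if ac != bc:
                return ac - bc
            i, j = i + 1, j + 1
        # Digit run: compare numerically, ignoring leading zeros.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and a[i].isdigit() and j < len(b) and b[j].isdigit():
            first_diff = first_diff or ord(a[i]) - ord(b[j])
            i, j = i + 1, j + 1
        if i < len(a) and a[i].isdigit():
            return 1  # a's number has more digits
        if j < len(b) and b[j].isdigit():
            return -1
        if first_diff:
            return first_diff
    return 0


def compare_versions(a: str, b: str) -> int:
    """Negative, zero or positive, following the epoch:upstream-revision ordering."""
    def split(v: str):
        epoch, sep, rest = v.partition(":")
        if not sep:
            epoch, rest = "0", v
        upstream, sep, revision = rest.rpartition("-")
        if not sep:
            upstream, revision = rest, ""
        return int(epoch), upstream, revision

    (ea, ua, ra), (eb, ub, rb) = split(a), split(b)
    if ea != eb:
        return ea - eb
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)


@dataclass(frozen=True)
class Entry:
    # Hypothetical log record: a binary version plus hashes of its source and buildinfo.
    package: str
    version: str
    source_sha256: str
    buildinfo_sha256: str


def version_anomalies(entries):
    """Walk entries in log order; report content changes without a version increase."""
    last, problems = {}, []
    for e in entries:
        prev = last.get(e.package)
        if prev is not None:
            cmp = compare_versions(e.version, prev.version)
            changed = (e.source_sha256, e.buildinfo_sha256) != (prev.source_sha256, prev.buildinfo_sha256)
            if cmp < 0:
                problems.append((e.package, prev.version, e.version, "version went backwards"))
            elif cmp == 0 and changed:
                problems.append((e.package, prev.version, e.version, "content changed, same version"))
        last[e.package] = e
    return problems


log = [
    Entry("hello", "2.10-1", "src-a", "bi-a"),
    Entry("hello", "2.10-2", "src-b", "bi-b"),
    Entry("hello", "2.10-2", "src-b", "bi-c"),      # new buildinfo, same version
    Entry("hello", "2.10~rc1-1", "src-c", "bi-d"),  # sorts before 2.10-2
]
for problem in version_anomalies(log):
    print(problem)
```
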
That's necessary to make sure that a compromised archive can't carry out certain attacks. Also in this category is the fact that we depend on a fixed release frequency, and monitors will also be verifying the upload ACL, meaning which keys are authorized to upload.

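Verifying the upload ACL amounts to checking each logged upload against the keys that the append-only keyring authorized at that moment. This is a self-contained Python sketch. The tuple and dict layouts, and the idea that each upload record carries a timestamp and a signing-key fingerprint, are assumptions for illustration; `FPR-…`-style names are placeholders.

```python
from datetime import datetime


def authorized_at(keyring_history, when):
    """Keys authorized at `when`, replaying (time, action, fingerprint) events in log order."""
    keys = set()
    for time, action, fingerprint in keyring_history:
        if time > when:
            break
        if action == "add":
            keys.add(fingerprint)
        elif action == "remove":
            keys.discard(fingerprint)
    return keys


def unauthorized_uploads(uploads, keyring_history):
    """Return uploads whose signing key was not in the keyring at upload time."""
    return [
        up for up in uploads
        if up["fingerprint"] not in authorized_at(keyring_history, up["time"])
    ]


keyring_history = [
    (datetime(2016, 1, 1), "add", "FPR-ALICE"),
    (datetime(2017, 2, 1), "remove", "FPR-ALICE"),
]
uploads = [
    {"package": "hello", "version": "2.10-1", "time": datetime(2016, 5, 1), "fingerprint": "FPR-ALICE"},
    {"package": "hello", "version": "2.10-2", "time": datetime(2017, 5, 1), "fingerprint": "FPR-ALICE"},
    {"package": "hello", "version": "2.10-3", "time": datetime(2017, 6, 1), "fingerprint": "FPR-MALLORY"},
]
for up in unauthorized_uploads(uploads, keyring_history):
    print(up["package"], up["version"], up["fingerprint"])
```

The second upload is flagged because the key had already been removed, and the third because the key was never in the keyring.
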
Monitors would also be verifying reproducible builds in this scenario. Those are the monitoring functions, and I think that many different people and groups in Debian could get some benefit out of them in order to verify that everything worked correctly.

We should note that all these verifications are completely independent of the existing infrastructure, because they happen on the client side. So we don't depend on notifications from the existing infrastructure saying that it works correctly, nor on such notifications not being suppressed. This can be done completely on the client side, using the data provided by the log server.

For example, maintainers could verify that the code they uploaded builds reproducibly using the corresponding buildinfo, or they could check which uploads were done using their key, and which of their packages were modified, perhaps by other people. The keyring maintainers or account managers could be looking at the keyring: what keys are in the keyring, and what uploads were done using which keys.

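Once the uploads sit in a verified log, the maintainer and keyring-maintainer checks listed above become simple queries. They are sketched here in Python over a hypothetical list of upload records (package, version, signing-key fingerprint). The function names and record layout are illustrative, not an existing tool.

```python
from collections import defaultdict


def uploads_with_key(uploads, fingerprint):
    """All logged uploads signed with the given key (to spot uploads you did not make)."""
    return [u for u in uploads if u["fingerprint"] == fingerprint]


def my_packages_changed_by_others(uploads, my_packages, fingerprint):
    """Uploads of packages I maintain that were signed by some other key."""
    return [u for u in uploads if u["package"] in my_packages and u["fingerprint"] != fingerprint]


def uploads_by_key(uploads):
    """Keyring maintainers' view: which packages were uploaded with which key."""
    grouped = defaultdict(list)
    for u in uploads:
        grouped[u["fingerprint"]].append(u["package"])
    return dict(grouped)


uploads = [
    {"package": "hello", "version": "2.10-1", "fingerprint": "FPR-ALICE"},
    {"package": "hello", "version": "2.10-1+nmu1", "fingerprint": "FPR-BOB"},
    {"package": "cowsay", "version": "3.03-1", "fingerprint": "FPR-ALICE"},
    {"package": "sl", "version": "5.02-1", "fingerprint": "FPR-CAROL"},
]
print([(u["package"], u["version"]) for u in uploads_with_key(uploads, "FPR-ALICE")])
print(my_packages_changed_by_others(uploads, {"hello"}, "FPR-ALICE"))
print(uploads_by_key(uploads))
```
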
And the archive, last but not least, has an additional verification step available to make sure all the metadata was produced correctly and to know ??? things happened during the production of a given release.

This thing actually exists. Well, I have programmed prototypes for all these components. They are nothing that would be ready for deployment, but they are enough to show that it actually works. I've taken two years of Debian Stretch releases and fed them into the system. This resulted in a tree size of 270,000 elements, and the storage required was about 400 GB, almost all of which is source packages.

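Some back-of-the-envelope arithmetic on these numbers, as a Python snippet. It assumes an RFC 6962-style tree with 32-byte SHA-256 hashes (the talk does not name the hash function) and reads "GB" as 10^9 bytes. Under these assumptions, an inclusion proof for one of 270,000 elements needs at most 19 hashes, about 600 bytes, and the average stored element is roughly 1.5 MB.

```python
tree_size = 270_000  # elements in the prototype log (from the talk)
hash_size = 32       # bytes per hash, assuming SHA-256

# Tree depth: smallest d with 2**d >= tree_size; an inclusion proof has at most d hashes.
depth = (tree_size - 1).bit_length()
max_inclusion_proof = depth * hash_size

storage_bytes = 400 * 10**9  # "about 400 GB", taking GB as 10**9 bytes
avg_per_element = storage_bytes / tree_size

print(depth)                            # 19
print(max_inclusion_proof)              # 608
print(round(avg_per_element / 1e6, 2))  # 1.48 (MB per element)
```
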
I would say that it's eminently feasible to do this. The monitor functions run rather cheaply; a monitor need not necessarily keep a complete copy of the log in all cases. But I did notice some unexpected events in the package metadata: I have observed sources missing, and version increments missing where I think there should be a version increment. So I ??? be looking more closely into these cases.

If anybody is interested in the theoretical side of this, these would be the immediate pointers I can give. The first paper is the theoretical and mathematical foundation, and the other ones are applications of similar transparency work, but with different goals.

Summarizing, we can introduce a system to detect targeted backdoors, even under compromise of the archive. We need to add a bit more infrastructure and change how some things are done. We can also improve auditability, that is, what we can securely identify when things go wrong. In particular, we can make sure that for every binary we can get the source code that was used to produce the binary, and then identify the responsible maintainer.

There's one class of attacks I have left out for today; if anybody wants to talk about that, we can do so too. And now, I'm interested in your questions and feedback.

[Applause]

[Q] Did you already test the reproducibility, and how do you deal with the problem of packages that are not reproducible? I mean, do you just not integrate some of them into the log?

[A] For now, the implementation of my monitor functions hasn't covered reproducibility. I think the first step would be to have a blacklist of packages that are known not to build reproducibly, and then try to get on with it.

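The blacklist approach from the answer could be sketched like this in Python. Compare the binary hashes recorded in the log with those of independent rebuilds, and set aside packages already known not to build reproducibly. The dictionary layout, the shortened placeholder digests, and the package named `legacy-tool` are made up for illustration.

```python
def reproducibility_report(logged, rebuilt, known_unreproducible):
    """Classify logged binaries by comparing their hashes with independent rebuilds.

    logged / rebuilt: dicts mapping (package, version, arch) -> hex digest.
    known_unreproducible: package names on the (hypothetical) blacklist.
    """
    report = {"reproducible": [], "mismatch": [], "blacklisted": [], "not rebuilt": []}
    for key in sorted(logged):
        package = key[0]
        if package in known_unreproducible:
            report["blacklisted"].append(key)
        elif key not in rebuilt:
            report["not rebuilt"].append(key)
        elif rebuilt[key] == logged[key]:
            report["reproducible"].append(key)
        else:
            report["mismatch"].append(key)
    return report


logged = {  # shortened placeholder digests
    ("hello", "2.10-3", "amd64"): "aa11",
    ("sl", "5.02-1", "amd64"): "bb22",
    ("legacy-tool", "1.0-1", "amd64"): "cc33",
    ("cowsay", "3.03-1", "all"): "dd44",
}
rebuilt = {
    ("hello", "2.10-3", "amd64"): "aa11",
    ("sl", "5.02-1", "amd64"): "ee55",
}
report = reproducibility_report(logged, rebuilt, known_unreproducible={"legacy-tool"})
for status, keys in report.items():
    print(status, keys)
```

A mismatch outside the blacklist is the interesting signal for a monitor. The blacklist keeps known offenders from drowning it out while they are being fixed.
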
[Q] Two questions. You say "authenticating metadata and code". Does this mean signing, or what exactly is "authenticating"?

[A] At which point?

[Q] It was… back. Where the tree is. Yes, yes. The tree before that.

[A] OK. This authentication here doesn't quite mean a signature. It means that if I know the value of the root of the hash tree, then I can be assured that a given element is included if I'm told the values of the three gray-marked inner nodes here. And that works by recomputing the hash tree.

[Q] OK, I think I have to defer this to after the talk.

[A] Yeah, I can explain.

[Q] Another question would be about the detection of targeted backdoors. Do you mean at the stage of the archive signing, or which backdoors?

[A] The scenario would be that the signing key of the archive is used to create an additional Release file which covers a manipulated software version. This software version and signature are only shown to the victim population and not to the general population. This means that the malicious software would only be observed by the victims and not by everybody else. My goal is to force the attacker to distribute the malicious software to the whole world in order to increase the chance that they're going to be detected, thereby perhaps deterring the attack from the beginning.

[Q] Great talk, great ideas as well. I really liked your slide on your assumptions ??? honest about them, like "yeah, we assume ???". I wouldn't underestimate how difficult it would be to make some of these changes, I mean, even ones that look simple, like source-only uploads. Everyone wants them, right?

[A] Yes, sure, we have to start somewhere, and I hope that if people are convinced that this is a great idea and we should do this, then we get some more impetus for these things that everybody wants, like source-only uploads.

[Q] Thank you. Yeah, and it would be really good to base this stuff on ??? effort ??? build on the same choices. Thank you.

[A] Yeah, so I'm interested in any kind of feedback: whether you think it's a great idea, or think there are some problems I might have missed, or that it might get difficult to implement. Please come to me in case you have anything.

[Applause]