This will be an academic talk, as announced. I will try to bring some of the research I did during my PhD into the real world. We are going to talk about the security of software distribution, and I am going to propose a security feature that adds on top of the signatures we have today and also on top of the reproducible builds that we already have to a very large degree. I am going to highlight a few points where I think infrastructure changes are required to accommodate this system, and I would also appreciate any feedback you might have.
I'm going to start with a few motivating points about why we should care. In the security of software distribution we already have cryptographic signatures, but I've put up a few examples of recent attacks involving the distribution of software, where people who presumably thought they knew what they were doing had grave problems with software distribution.
For example, the Juniper backdoors, pretty famous: Juniper discovered two backdoors in their code, and nobody really knew where they were coming from. Another example would be Chrome extension developers who got their credentials phished and subsequently their extensions backdoored. Or, as another example, a signed update to a banking software actually included malware and infected several banks. I hope this is motivation for us to consider these kinds of attacks to be possible and to prepare ourselves.
I have two main goals in the system I am going to propose. The first is to relax trust in the archive. In particular, what I want to achieve is a level of security even if the archive is compromised, and the specific thing I am going to do is to detect targeted backdoors. That means backdoors that are distributed only to a subset of the population. What we can achieve is to force the attacker to deliver the malware to everybody, thereby greatly decreasing their degree of stealth and increasing their risk of detection. This would work to our advantage.
The second goal is forensic auditability, which overlaps with the first one to a surprising degree in technical terms, in terms of implementation. What I want to ensure is that we have inspectable source code for every binary. We do, of course, have the source code available for our packages, but only for the most recent version; everything else is a best effort by the code archiving services. The mapping between those sources and the binaries can, to a large extent, be verified once we have reproducible builds. I also want to make sure that we can identify the maintainer responsible for the distribution of a particular package. The system is also interested in providing attribution of where something went wrong, so that we are not in a situation where we notice something went wrong but don't really know where to look in order to find the problem, but instead have a specific and secured indication of where a compromise was coming from.
Let's quickly recap how our software distribution works. We have the maintainers, who upload their code to the archive. The archive has access to a signing key, which signs the releases: actually, metadata covering all the actual binary packages. These are distributed over the mirror network, from where the apt clients will download the package metadata, that is, the hash sums for the packages, their dependencies and so on, as well as the actual packages themselves.
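Both links of this existing chain reduce to the same check: the signed metadata records the hash of the package index, and the index records the hash of each package. A minimal sketch, with made-up file contents purely for illustration:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_link(expected_hex: str, downloaded: bytes) -> bool:
    """One link of the chain: compare the hash recorded in an
    already-verified parent file (e.g. Release pinning Packages,
    or Packages pinning a .deb) against the bytes actually
    fetched from an untrusted mirror."""
    return sha256_hex(downloaded) == expected_hex

# Toy example: the parent metadata pins the index by hash.
packages_index = b"Package: hello\nVersion: 2.10-2\n"  # placeholder contents
pinned_hash = sha256_hex(packages_index)               # as listed in the signed file
assert verify_link(pinned_hash, packages_index)
assert not verify_link(pinned_hash, packages_index + b"tampered")
```

This is why, as noted next, the mirrors themselves never need to be trusted: every downloaded artifact is checked against a hash that chains up to the one signature.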
This central architecture has an important advantage, namely that the mirror network need not be trusted, right? We have the signature that covers all the contents of binary and source packages and the metadata, so the mirror network need not be trusted. On the other hand, it makes the archive and the signing key a very interesting target for attackers, because this central point controls all the signing operations. So this is a place where we need to be particularly careful and perhaps even do better than cryptographic signatures. This is where the main focus of this talk will be, although I will also consider the uploaders to some extent.
We want to achieve two things: resistance against key compromise and targeted backdoors, and better support for auditing in case things go wrong. The approach we choose is to make sure that everybody runs exactly the same software, or at least the parts of it they choose to install. If we think about that for a moment, this gives us a number of advantages. For example, all the analysis that's done on a piece of software immediately carries over to all other users of the software, right? Because if we haven't made sure that everybody installs the same software, they might not have exactly the same version, and perhaps even a backdoored version. This also ensures that we cannot suffer targeted backdoors, by increasing the detection risk for attackers, and we also want to have cryptographic proof of where something went wrong.
Now, to look at some pictures: I will present the data structure that we use in order to achieve these goals. The data structure is a hash tree, a Merkle tree, which is a data structure that operates over a list. We have a list of these squares here, which represent the list items. In our case, these are going to be the files containing the package metadata, that is, dependencies and hash sums of packages, and also the source packages themselves are going to be elements in this list.
The tree works as follows. It uses a cryptographic hash function, which is a collision-resistant compressing function, and the labels of the inner nodes of the tree are computed as the hashes of their children. OK? Once we have computed the root hash, the root label, we have fixed all the elements, and none of the elements can be changed without changing the root hash.
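As a sketch of this construction, here is the root computation in the style used by Certificate Transparency (RFC 6962); the leaf and inner-node prefixes and the power-of-two split are that design's conventions, not something fixed by the talk:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    """The collision-resistant hash function; SHA-256 here."""
    return hashlib.sha256(data).digest()

def merkle_root(entries: list) -> bytes:
    """Root label over a non-empty list of byte strings.
    A leaf is hash(0x00 || entry); an inner node is
    hash(0x01 || left || right), splitting at the largest
    power of two strictly smaller than the list length."""
    n = len(entries)
    if n == 1:
        return sha256(b"\x00" + entries[0])
    k = 1
    while 2 * k < n:
        k *= 2
    return sha256(b"\x01" + merkle_root(entries[:k]) + merkle_root(entries[k:]))

# Changing any single entry changes the root.
assert merkle_root([b"a", b"b", b"c"]) != merkle_root([b"a", b"B", b"c"])
```

The single 32-byte root therefore commits to the entire list of metadata files and source packages.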
We can exploit this in order to efficiently prove the following two properties of the list. First of all, we can efficiently prove the inclusion of a given element in the list, if we know the tree root. This works as follows. Let's make a quick example: we see the third list item is marked with an X, and if I know the tree root, then the server operating the tree structure only needs to give me the three grey marked labels, the three marked node values. Then I can recompute the root hash and be convinced that this element actually was contained in the list.
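The grey labels are exactly the sibling subtree hashes on the path from the leaf to the root, so the proof size is logarithmic in the list length. A self-contained sketch, again following the RFC 6962 conventions (and repeating the root computation so the snippet stands alone):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(entries: list) -> bytes:
    """RFC 6962-style root: leaf = hash(0x00||e), node = hash(0x01||l||r)."""
    n = len(entries)
    if n == 1:
        return sha256(b"\x00" + entries[0])
    k = 1
    while 2 * k < n:
        k *= 2
    return sha256(b"\x01" + merkle_root(entries[:k]) + merkle_root(entries[k:]))

def inclusion_proof(entries: list, m: int) -> list:
    """What the log server hands out: the sibling subtree hashes
    (the grey nodes) from leaf m up to the root."""
    n = len(entries)
    if n == 1:
        return []
    k = 1
    while 2 * k < n:
        k *= 2
    if m < k:
        return inclusion_proof(entries[:k], m) + [merkle_root(entries[k:])]
    return inclusion_proof(entries[k:], m - k) + [merkle_root(entries[:k])]

def verify_inclusion(entry: bytes, index: int, size: int,
                     proof: list, root: bytes) -> bool:
    """Client side: recompute the root from the entry plus the proof
    hashes and compare (the RFC 6962 verification algorithm)."""
    node = sha256(b"\x00" + entry)
    fn, sn = index, size - 1
    for p in proof:
        if sn == 0:
            return False
        if fn % 2 == 1 or fn == sn:
            node = sha256(b"\x01" + p + node)
            while fn % 2 == 0 and fn != 0:
                fn //= 2
                sn //= 2
        else:
            node = sha256(b"\x01" + node + p)
        fn //= 2
        sn //= 2
    return sn == 0 and node == root

entries = [b"meta-1", b"meta-2", b"meta-3"]
root = merkle_root(entries)
assert verify_inclusion(entries[2], 2, 3, inclusion_proof(entries, 2), root)
```

The verifier never sees the other list elements, only their subtree hashes, which is what makes the check cheap for an apt client.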
The second property is that we can also efficiently verify the append-only operation of the list. So we can have a log server operating this kind of structure, and the log server need not be trusted; it's not going to be a trusted third party, but rather its operation can be verified from the outside.
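In a real transparency log this is done with a logarithmic-size consistency proof between two roots; as a simpler illustration of what is being claimed, a monitor that mirrors the whole log can check the append-only property directly from two advertised roots (naive full-copy check, not the efficient proof):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(entries: list) -> bytes:
    """RFC 6962-style root, as in the previous construction."""
    n = len(entries)
    if n == 1:
        return sha256(b"\x00" + entries[0])
    k = 1
    while 2 * k < n:
        k *= 2
    return sha256(b"\x01" + merkle_root(entries[:k]) + merkle_root(entries[k:]))

def check_append_only(old_root: bytes, old_size: int,
                      new_root: bytes, new_entries: list) -> bool:
    """Naive append-only check over a full mirror of the log:
    the old root must be reproducible from the first old_size
    entries, and the new root from all of them. If any old entry
    was altered or removed, one of the recomputations fails."""
    return (merkle_root(new_entries[:old_size]) == old_root and
            merkle_root(new_entries) == new_root)

log = [b"release-1", b"release-2"]
old_root = merkle_root(log)
log = log + [b"release-3"]
assert check_append_only(old_root, 2, merkle_root(log), log)
```

The efficient version proves the same statement to clients that hold only the two roots, which is why the log server itself never has to be trusted.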
So, what does this design look like? The theoretical foundation is called a transparency overlay, and in our system it looks like this. We have the archive as per usual, and we have a log server; the archive will submit the package metadata, the Release file, the Packages file containing dependencies and so on, and the source code into this log server. The apt client will be augmented with an auditor component, and this auditor component is responsible for verifying correct log operation as well as the inclusion of the downloaded release in the log. This is the mechanism by which we will be able to make sure that everybody is running the exact same version of the software they installed.
A third component is the monitor. The monitor is necessary to also verify log operation and to inspect the elements that are contained in the log. The monitor would then be run by groups or individuals that want to make sure of certain properties in the log. Alright, let's quickly recap. We have added this log server, which can prove two properties efficiently to the outside world, and we have the auditor and monitor components. The auditor is added to the apt client, and the monitor does additional investigative tasks.
Now, in order to make this system work, I need to make a few assumptions. The archive will need to handle log submission and the distribution of certain log data structures; these are usually very small things given to the archive in response to a submission. Then I'm assuming a very consistent release frequency. The archive is responsible for distributing reproducible binaries in my architecture. I'm assuming that the buildinfo files are covered by the Release file; I treat them as additional source metadata, so whenever the source package or the buildinfo file changes, I expect an increase in the binary version number. I also assume source-only uploads, and one additional thing: the keyring source package is treated by the archive as authoritative, and this keyring must have the special property that it is operated append-only, so that we can go back in time and see what keys were authorized at different points in time.
The log server is a standalone server component that, at the moment, speaks an HTTP-based protocol. Probably one would want to have more than one, but we are going to have, I think, a much easier time running log servers than, for example, the Certificate Transparency people, because we only have one source of write access, namely the archive, so we can easily schedule the write access, and you can have read-only frontends that aren't quite as critical. The auditor component would need to be integrated into the apt client or library. It needs a few things, like cryptographic verification, understanding a few more file formats, and some more network access. Parts of the proofs could probably also be distributed over the mirror network, so we need not necessarily do everything in direct communication with the log server.
So, this covers the archive, auditor, and log server. The monitoring servers have a few functions that are necessary for the verification of the log itself, meaning that they verify the append-only operation of the log, and they will also likely want to exchange tree roots with other monitors and some auditors. The important verification functions over the log contents are validating the metadata of the Release, Packages, and Sources files, namely making sure that these are complete, that the sources are available, that the versions are incremented correctly, and so on.
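One of these checks can be sketched very simply. Given two successive snapshots of package-to-version metadata, a monitor can flag any version that went backwards; the dictionaries and the plain string comparison here are illustrative only, since a real monitor would parse the Packages files and use Debian's version ordering (e.g. via python-apt or dpkg --compare-versions):

```python
def version_regressions(old: dict, new: dict) -> list:
    """Return packages whose recorded version decreased between two
    successive metadata snapshots; a silent downgrade could hide a
    targeted rollback or reintroduce a known vulnerability.
    NOTE: plain string comparison is a stand-in for proper Debian
    version comparison."""
    return sorted(
        name for name, ver in old.items()
        if name in new and new[name] < ver
    )

# Toy snapshots of {package: version}:
assert version_regressions({"hello": "2.10-2"}, {"hello": "2.10-1"}) == ["hello"]
assert version_regressions({"hello": "2.10-1"}, {"hello": "2.10-2"}) == []
```

Because the log is append-only, such a regression cannot be hidden afterwards: both snapshots remain available to every monitor.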
That's necessary to make sure that a compromised archive can't carry out certain attacks. Also in this category is the fact that we depend on a fixed release frequency. Monitors will also be verifying the upload ACL, meaning which keys are authorized to upload, and monitors would also be verifying reproducible builds in this scenario. Those are the monitoring functions, and I think that many different people and groups in Debian could get some benefit out of these monitoring functions in order to verify that everything worked correctly.
We should note that all these verifications are completely independent of the existing infrastructure, because they happen on the client side. So we don't depend on any notifications from the existing infrastructure working correctly, and no notifications can be suppressed. This can be done completely on the client side using the data provided by the log server. For example, maintainers could verify that the code they uploaded builds reproducibly, using the corresponding buildinfo files, or they could check which uploads were done using their key and which of their packages were perhaps modified by other people. The keyring maintainers or account managers could be looking at the keyring: what keys are in the keyring, and what uploads were done using which keys. And the archive, last but not least, has an additional verification step available, to make sure all the metadata was produced correctly and to know whether unexpected things happened during the production of a given release.