-
This will be an academic talk, as announced. I will try to bring some of the research I did during my PhD into the real world. We are going to talk about the security of software distribution, and I'm going to propose a security feature that adds on top of the signatures we have today and the reproducible builds that we already have to a very large degree. I am going to highlight a few points where I think infrastructure changes are required to accommodate this system, and I would also appreciate any feedback you might have.
-
I'm going to give a bit of motivation: why should we care about the security of software distribution when we already have cryptographic signatures?
-
I've just put up a few examples of recent attacks that involved the distribution of software, where people who presumably knew what they were doing had grave problems with software distribution. For example, the Juniper backdoors, which are pretty famous: Juniper discovered two backdoors in its code, and nobody really knew where they were coming from. Another example would be Chrome extension developers who got their credentials phished and subsequently had their extensions backdoored. Or, as another example, a signed update to banking software actually included malware and infected several banks. I hope this motivates us to consider these kinds of attacks possible and to prepare ourselves.
-
I have two main goals for the system I am going to propose. The first is to relax trust in the archive. In particular, I want to achieve a level of security even if the archive is compromised, and the specific thing I am going to do is detect targeted backdoors, that is, backdoors that are distributed only to a subset of the population. What we can achieve is to force the attacker to deliver the malware to everybody, thereby greatly decreasing their degree of stealth and increasing their risk of detection. This works to our advantage.
-
The second goal is forensic auditability, which, in terms of technical implementation, overlaps to a surprising degree with the first one.
-
So what I want to ensure is that we have inspectable source code for every binary. Of course we do have the source code available for our packages, but only for the most recent version; everything else is best effort by the code archiving services. The mapping between that source code and the binaries can be verified to the extent that we have reproducible builds.
-
I want to make sure that we can identify the maintainer responsible for distributing a particular package, and the system also aims to provide attribution of where something came from. That way we are not in a situation where we notice something went wrong but don't really know where to look in order to find the problem; instead, we have a specific and secured indication of where a compromise came from.
-
Let's quickly recap how our software distribution works. The maintainers upload their code to the archive. The archive has access to a signing key, which signs the releases, or more precisely the metadata covering all the actual binary packages. These are distributed over the mirror network, from where the APT clients download the package metadata (that means the hash sums for the packages, their dependencies and so on) as well as the actual packages themselves.
-
This central architecture has an important advantage, namely that the mirror network need not be trusted, right? We have the signature that covers all the contents of binary and source packages and the metadata, so the mirror network need not be trusted.
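To illustrate why the mirrors can stay untrusted, here is a minimal Python sketch of the hash chain a client follows from the signed metadata down to a single package. It assumes that the signature on the Release file has already been verified against the archive key and that the Release file's SHA256 entries have been parsed into a dictionary. The Packages format is heavily simplified (only the `Package` and `SHA256` fields are read), and `parse_packages` and `verify_download` are hypothetical helpers, not APT's actual code.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def parse_packages(index: bytes) -> dict[str, str]:
    """Map package name -> SHA256 hex from a simplified Packages index.

    Only the "Package" and "SHA256" fields of each stanza are read.
    """
    hashes: dict[str, str] = {}
    for stanza in index.decode("utf-8").split("\n\n"):
        fields: dict[str, str] = {}
        for line in stanza.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "Package" in fields and "SHA256" in fields:
            hashes[fields["Package"]] = fields["SHA256"]
    return hashes


def verify_download(release_hashes: dict[str, str], index_name: str,
                    index: bytes, package: str, deb: bytes) -> bool:
    """Accept data fetched from an untrusted mirror only if it chains back
    to the signed Release file.

    ASSUMPTION: release_hashes was parsed from a Release file whose
    signature has already been checked against the archive key.
    """
    # Step 1: the Packages index must match the hash in the Release file.
    if release_hashes.get(index_name) != sha256_hex(index):
        return False
    # Step 2: the .deb must match the hash in the now-authenticated index.
    return parse_packages(index).get(package) == sha256_hex(deb)
```

In this sketch, a mirror that serves a modified `.deb` or a modified Packages file makes `verify_download` return `False`. A change slips through only if the Release file itself is altered and re-signed, which requires the archive's signing key.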
-
On the other hand, it makes the archive and the signing key a very interesting target for attackers, because this central point controls all the signing operations. So this is a place where we need to be particularly careful and perhaps even do better than cryptographic signatures. This is where the main focus of this talk will be, although I will also consider the uploaders to some extent.
-
We want to achieve two things: resistance against key compromise and targeted backdoors, and better support for auditing in case things go wrong.
-
The approach we choose to do this is to make sure that everybody runs exactly the same software, or at least the parts of it they choose to install. If we think about that for a moment, this gives us a number of advantages. For example, all the analysis that's done on a piece of software immediately carries over to all other users of the software, right? If we haven't made sure that everybody installs the same software, they might not have exactly the same version, and some might have a backdoored version. This also protects us against targeted backdoors by increasing the attackers' risk of detection, and we also want to have a cryptographic proof of where something went wrong.
-
Now, to look at some pictures, I will present the data structure that we use to achieve these goals. The data structure is a hash tree, a Merkle tree, which operates over a list. So we have a list of these squares here, which represent the list items. In our case, these are going to be the files containing package metadata, that is, dependencies and hash sums of packages, and the source packages themselves are also going to be elements in this list.
-
The tree works as follows. It uses a cryptographic hash function, which is a collision-resistant compressing function, and the labels of the inner nodes of the tree are computed as the hashes of their children. OK?
-
Once we have computed the root hash, the root label, we have fixed all the elements, and none of the elements can be changed without changing the root hash.
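A minimal Python sketch of computing the root label over a list. The hash function (SHA-256), the 0x00/0x01 prefixes that separate leaves from inner nodes, and the rule of splitting an n-element list at the largest power of two below n are borrowed from Certificate Transparency (RFC 6962); they are assumptions for illustration, as is every function name. The byte strings stand in for the metadata files and source packages.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    # Leaves are hashed with a 0x00 prefix (RFC 6962 convention, assumed here).
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # An inner node's label is the hash of its children's labels (0x01 prefix).
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    """Largest power of two strictly smaller than n (requires n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(entries: list[bytes]) -> bytes:
    """Root label of the Merkle tree over a list of entries."""
    if not entries:
        return hashlib.sha256(b"").digest()
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = split_point(len(entries))
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


# Stand-ins for file contents (the names are made up for illustration).
entries = [b"Release", b"Packages", b"hello_1.0.dsc", b"hello_1.0.tar.xz"]
root = merkle_root(entries)

# Changing any single element yields a different root.
tampered = list(entries)
tampered[2] = b"hello_1.0.dsc (backdoored)"
assert merkle_root(tampered) != root
```

The last lines show the binding property: replacing one element yields a different root, so a published root commits to the whole list.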
-
We can exploit this in order to efficiently prove the following two properties for ???. First of all, we can efficiently prove the inclusion of a given element in the list. If we know the tree root ???, this works as follows. Let's take a quick example: we see the third list item is marked with an X, and if I know the tree root, then the server operating the tree structure only needs to give me the three grey-marked labels, the three marked node values, and then I can recompute the root hash and be convinced that this element actually was contained in the list.
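The inclusion check can be sketched in Python as follows, again assuming the RFC 6962 tree layout. `inclusion_proof` builds the list of sibling labels as in RFC 6962's audit path, and `verify_inclusion` follows the verification procedure described in RFC 9162; both names are hypothetical. In an eight-item list, the proof for the third item consists of three labels, one per level of the tree.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    """Largest power of two strictly smaller than n (requires n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(entries: list[bytes]) -> bytes:
    if not entries:
        return hashlib.sha256(b"").digest()
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = split_point(len(entries))
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


def inclusion_proof(index: int, entries: list[bytes]) -> list[bytes]:
    """Server side: sibling labels on the way from entries[index] to the root."""
    if len(entries) <= 1:
        return []
    k = split_point(len(entries))
    if index < k:
        return inclusion_proof(index, entries[:k]) + [merkle_root(entries[k:])]
    return inclusion_proof(index - k, entries[k:]) + [merkle_root(entries[:k])]


def verify_inclusion(entry: bytes, index: int, size: int,
                     proof: list[bytes], root: bytes) -> bool:
    """Client side: recompute the root from the entry and the proof."""
    if index >= size:
        return False
    fn, sn = index, size - 1  # position of our node and of the last node
    r = leaf_hash(entry)
    for sibling in proof:
        if sn == 0:
            return False  # proof is longer than the tree is tall
        if fn & 1 or fn == sn:
            # Sibling is on our left: we are a right child, or the rightmost
            # node, which is promoted until it becomes one.
            r = node_hash(sibling, r)
            while fn != 0 and not fn & 1:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, sibling)  # sibling is on our right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


# Example: eight list items; the third one (index 2) is the marked element.
items = [f"item {i}".encode() for i in range(8)]
root = merkle_root(items)
proof = inclusion_proof(2, items)  # three labels, one per level
assert verify_inclusion(items[2], 2, len(items), proof, root)
```

The client needs only the element, its position, the list size and a logarithmic number of hashes; it never has to download the rest of the list.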
-
The second property is that we can also efficiently verify that the list is append-only. So we can have a log server operating this kind of structure, and the log server need not be trusted: it's not going to be a trusted third party, but rather its operation can be verified from the outside.
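One way to verify append-only operation, used in Certificate Transparency, is a consistency proof between an older root and a newer one. The sketch below assumes the RFC 6962 tree layout and proof construction, with the verification loop described in RFC 9162; whether the proposed log uses exactly this scheme is an assumption, and the function names are hypothetical. The verifier needs only the two roots, the two sizes and a logarithmic number of hashes, not the log contents.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    """Largest power of two strictly smaller than n (requires n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(entries: list[bytes]) -> bytes:
    if not entries:
        return hashlib.sha256(b"").digest()
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = split_point(len(entries))
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


def consistency_proof(m: int, entries: list[bytes]) -> list[bytes]:
    """Log server: prove the first m entries are a prefix of entries (0 < m <= n)."""
    def subproof(m: int, d: list[bytes], complete: bool) -> list[bytes]:
        if m == len(d):
            return [] if complete else [merkle_root(d)]
        k = split_point(len(d))
        if m <= k:
            return subproof(m, d[:k], complete) + [merkle_root(d[k:])]
        return subproof(m - k, d[k:], False) + [merkle_root(d[:k])]
    return subproof(m, entries, True)


def verify_consistency(m: int, old_root: bytes, n: int, new_root: bytes,
                       proof: list[bytes]) -> bool:
    """Auditor: check that the size-n tree extends the size-m tree."""
    if m == n:
        return old_root == new_root and not proof
    if not 0 < m < n or not proof:
        return False
    if (m & (m - 1)) == 0:
        # The old tree is a complete subtree, so its root is the first node.
        proof = [old_root] + proof
    fn, sn = m - 1, n - 1
    while fn & 1:  # move up to the level of the first proof node
        fn >>= 1
        sn >>= 1
    fr = sr = proof[0]  # fr rebuilds the old root, sr the new one
    for c in proof[1:]:
        if sn == 0:
            return False
        if fn & 1 or fn == sn:
            fr = node_hash(c, fr)
            sr = node_hash(c, sr)
            while fn != 0 and not fn & 1:
                fn >>= 1
                sn >>= 1
        else:
            sr = node_hash(sr, c)  # appended entries only affect the new root
        fn >>= 1
        sn >>= 1
    return fr == old_root and sr == new_root and sn == 0


# Example: the log grows from 5 to 9 entries.
log = [f"release {i}".encode() for i in range(9)]
old_root, new_root = merkle_root(log[:5]), merkle_root(log)
assert verify_consistency(5, old_root, 9, new_root, consistency_proof(5, log))

# A log that rewrites history cannot produce a valid proof for the old root.
rewritten = list(log)
rewritten[1] = b"release 1 (backdoored)"
bad_proof = consistency_proof(5, rewritten)
assert not verify_consistency(5, old_root, 9, merkle_root(rewritten), bad_proof)
```

Because anyone holding two published roots can run this check, the log server does not have to be trusted: short of finding a hash collision, a server that rewrote or removed old entries could not produce a valid proof.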
-
So, what does this design look like? The theoretical foundation is called a transparency overlay, and in our system it looks like this. We have the archive as usual, and we have a log server. The archive will submit package metadata (the Release file, the Packages file containing dependencies and so on) and the source code into this log server. The APT client will be augmented with an auditor component, and this auditor component is responsible for verifying the correct log operation as well as the inclusion of the downloaded release in the log.