Software transparency: package security beyond signatures and reproducible builds

  • Not Synced
    This will be an academic talk
    as announced.
  • Not Synced
    I will try to bring some of my research
    I did during my PhD into the real world.
  • Not Synced
    We are going to talk about the security
    of software distribution and
  • Not Synced
    I'm going to propose a security feature
    that adds on top of
  • Not Synced
    the signatures we have up to today
  • Not Synced
    and also the reproducible builds that
    we already have to very large degree.
  • Not Synced
    I am going to highlight a few points where
    I think infrastructure changes are required
  • Not Synced
    to accommodate this system and I would
    also appreciate any feedback
  • Not Synced
    you might have.
  • Not Synced
    I'm going to ??? a few motivation of
    what should we care about.
  • Not Synced
    In the security of software distribution
  • Not Synced
    we already do have
    cryptographic signatures
  • Not Synced
    I've just put up a few examples of
    recent attacks that involved
  • Not Synced
    the distribution of software where
    people who presumably thought
  • Not Synced
    they knew what they were doing had
    grave problems with software distribution.
  • Not Synced
    For example, the Juniper backdoors,
    pretty famous.
  • Not Synced
    Juniper discovered two backdoors
    in the code and
  • Not Synced
    nobody really knew where they were
    coming from.
  • Not Synced
    Another example would be
    Chrome extension developers
  • Not Synced
    who got their credentials phished and
    subsequently their extensions backdoored
  • Not Synced
    or another example, a signed update to
    a banking software actually included
  • Not Synced
    a malware and infected several banks.
  • Not Synced
    I hope this is motivation for us to
    consider these kinds of attacks
  • Not Synced
    to be possible and to prepare ourselves.
  • Not Synced
    I have two main goals in the system
    I am going to propose.
  • Not Synced
    The first is to relax trust in the archive.
  • Not Synced
    In particular, what I want to achieve is
    a level of security even if
  • Not Synced
    the archive is compromised and
    the specific thing I am going to do is
  • Not Synced
    to detect targeted backdoors.
  • Not Synced
    That means backdoors that are distributed
    only to a subset of the population and
  • Not Synced
    what we can achieve is to force
    the attacker to deliver the malware
  • Not Synced
    to everybody, thereby greatly decreasing
    their degree of stealth and increasing
  • Not Synced
    their danger of detection.
  • Not Synced
    This would work to our advantage.
  • Not Synced
    The second goal is the forensic auditability
  • Not Synced
    which overlaps to a surprising degree
    with the first one in technical terms,
  • Not Synced
    in terms of implementation.
  • Not Synced
    So, what I want to ensure is that we have
  • Not Synced
    inspectable source code for every binary.
  • Not Synced
    We do have of course the source code
    available from our packages, but
  • Not Synced
    only for the most recent version,
    everything else is a best effort
  • Not Synced
    by the code archiving services.
  • Not Synced
    The mapping between those and binary
    can be verified once we have
  • Not Synced
    reproducible builds to a large extent.
  • Not Synced
    I want to make sure that we can identify
    the maintainer responsible for distribution
  • Not Synced
    of a particular package and the system
    is also interested in providing
  • Not Synced
    attribution of where something went wrong,
  • Not Synced
    so that we are not in a situation where we
    notice something went wrong but
  • Not Synced
    we don't really know where we have to look
    in order to find the problems
  • Not Synced
    but that we really have specific and
    secured indication of
  • Not Synced
    where a compromise
    was coming from.
  • Not Synced
    Let's quickly recap how our software
    distribution works.
  • Not Synced
    We have the maintainers who upload
    their code to the archive.
  • Not Synced
    The archive has access to a signing key
    which signs the releases.
  • Not Synced
    Actually, metadata covering all the actual
    binary packages.
  • Not Synced
    These are distributed over
    the mirror network
  • Not Synced
    from where the apt clients will download
    the package metadata.
  • Not Synced
    That means the hash sums for the packages,
    their dependencies and so on
  • Not Synced
    as well as the actual packages themselves.
  • Not Synced
    This central architecture has an important
    advantage,
  • Not Synced
    namely that the mirror network need not
    be trusted, right?
  • Not Synced
    We have the signature that covers all
    the contents of binary and source packages
  • Not Synced
    and the metadata, so the mirror network
    need not be trusted.
  • Not Synced
    On the other hand, it makes the archive and
    the signing key a very interesting target
  • Not Synced
    for attackers because this central point
    controls all the signing operations.
  • Not Synced
    So this is a place where we need to be
    particularly careful and perhaps
  • Not Synced
    maybe even do better than
    cryptographic signatures.
  • Not Synced
    This is where the main focus of this talk
    will be, although I will also consider
  • Not Synced
    the uploaders to some extent.
  • Not Synced
    We want to achieve two things:
  • Not Synced
    resistance against key compromise and
    targeted backdoors and
  • Not Synced
    to get some better support for auditing
    in case things go wrong.
  • Not Synced
    The approach that we choose to do this is
  • Not Synced
    we want to make sure that everybody runs
    exactly the same software
  • Not Synced
    or at least the parts of it they choose
    to install.
  • Not Synced
    If we think about that for a moment,
    this gives us a number of advantages.
  • Not Synced
    For example, all the analysis that's done
    on a piece of software immediately
  • Not Synced
    carries over to all other users of
    the software, right?
  • Not Synced
    Because if we haven't made sure that
    everybody installs the same software,
  • Not Synced
    they might not have exactly
    the same version and perhaps
  • Not Synced
    some backdoored version.
  • Not Synced
    This also ensures that we cannot suffer
    targeted backdoors by increasing
  • Not Synced
    the detection risk of attackers
  • Not Synced
    and we also want to have a cryptographic
    proof of where something went wrong.
  • Not Synced
    Now, to look at some pictures,
    I will present the data structure that
  • Not Synced
    we use in order to achieve these goals.
  • Not Synced
    The data structure is a hash tree,
    a Merkle tree which is
  • Not Synced
    a data structure that operates over a list.
  • Not Synced
    So we have a list of these squares here
    which represent the list items.
  • Not Synced
    In our case, this is going to be
    the files containing the package metadata
  • Not Synced
    that is, dependencies and hash sums of
    packages
  • Not Synced
    and also the source packages themselves
    are going to be elements in this list.
  • Not Synced
    The tree works as follows.
  • Not Synced
    It uses a cryptographic hash function
  • Not Synced
    which is a collision-resistant compressing
    function
  • Not Synced
    and the labels of the inner nodes
    of the tree are computed as
  • Not Synced
    the hashes of the children. Ok?
  • Not Synced
    Once we have computed the root hash,
    the root label,
  • Not Synced
    we have fixed all the elements and
    none of the elements can be changed
  • Not Synced
    without changing the root hash.
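
A minimal sketch of the hash tree described above, under assumptions that are not from the talk: SHA-256 as the hash function, one-byte prefixes to separate leaf hashes from inner-node hashes (similar to what Certificate Transparency does), and an unpaired node being carried up to the next level unchanged. The names `leaf_hash`, `node_hash` and `merkle_root` are hypothetical.

```python
import hashlib
from typing import List


def leaf_hash(data: bytes) -> bytes:
    # Label of a list item (assumed prefix 0x00 for leaves).
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Label of an inner node: hash of its two children (assumed prefix 0x01).
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(items: List[bytes]) -> bytes:
    # Compute the root label bottom-up, one tree level at a time.
    if not items:
        return hashlib.sha256(b"").digest()
    level = [leaf_hash(item) for item in items]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            next_level.append(node_hash(level[i], level[i + 1]))
        if len(level) % 2 == 1:
            # An unpaired node moves up one level unchanged.
            next_level.append(level[-1])
        level = next_level
    return level[0]


# Hypothetical list items standing in for metadata files and source packages.
entries = [b"Release", b"Packages", b"Sources", b"hello_2.10.dsc"]
original_root = merkle_root(entries)
tampered_root = merkle_root([b"Release", b"Packages", b"Sources", b"hello_2.10-evil.dsc"])
```

Changing any one entry yields a different root, which is the property the log relies on.
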
  • Not Synced
    We can exploit this in order to
    efficiently prove
  • Not Synced
    the two following properties for ???
  • Not Synced
    First of all, we can efficiently prove
    the inclusion of a given element
  • Not Synced
    in the list.
  • Not Synced
    If we know the tree root ???,
    this works as follows:
  • Not Synced
    let's make a quick example, we see
    the third list item is marked with an X
  • Not Synced
    and if I know the tree root, then
    the server operating the tree structure
  • Not Synced
    will only need to give me the three grey
    marked labels,
  • Not Synced
    the three marked node values and then
    I can recompute the root hash and
  • Not Synced
    be convinced that this element actually
    was contained in the list.
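
The inclusion proof can be sketched as follows. This is an illustration under assumptions, not the speaker's implementation: SHA-256, one-byte leaf/node prefixes, a tree that splits at the largest power of two below the list size, and a proof encoded as (side, hash) pairs. All function names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    # Largest power of two strictly smaller than n (for n >= 2).
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def root(items: List[bytes]) -> bytes:
    # Root label of a non-empty list.
    if len(items) == 1:
        return leaf_hash(items[0])
    k = _split(len(items))
    return node_hash(root(items[:k]), root(items[k:]))


def inclusion_proof(items: List[bytes], index: int) -> List[Tuple[str, bytes]]:
    # Server side: collect the sibling labels on the path from leaf to root.
    # "R" means the sibling sits to the right of the path, "L" to the left.
    if len(items) == 1:
        return []
    k = _split(len(items))
    if index < k:
        return inclusion_proof(items[:k], index) + [("R", root(items[k:]))]
    return inclusion_proof(items[k:], index - k) + [("L", root(items[:k]))]


def verify_inclusion(item: bytes, proof: List[Tuple[str, bytes]], expected_root: bytes) -> bool:
    # Client side: recompute the root from the item and the sibling labels.
    h = leaf_hash(item)
    for side, sibling in proof:
        h = node_hash(h, sibling) if side == "R" else node_hash(sibling, h)
    return h == expected_root


items = [f"pkg-{i}".encode() for i in range(8)]
known_root = root(items)
proof = inclusion_proof(items, 2)  # the third list item
ok = verify_inclusion(items[2], proof, known_root)
```

For a list of eight items the proof for the third item holds three labels, matching the three marked nodes in the example; in general the proof grows with the logarithm of the list length.
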
  • Not Synced
    The second property is that we can also
    efficiently verify the append-only operation
  • Not Synced
    of the list.
  • Not Synced
    So we can have a log server operating
    this kind of structure and
  • Not Synced
    the log server need not be trusted,
  • Not Synced
    it's not going to be a trusted third party
    but rather, its operation can be
  • Not Synced
    verified from the outside.
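
To make the append-only property concrete, here is a deliberately naive check, written as an assumption-laden sketch: it presumes a verifier that holds every log entry and simply recomputes roots, which takes time linear in the log size. The efficient variant mentioned in the talk would use a short consistency proof instead (Certificate Transparency defines one); that is not implemented here. SHA-256, the prefixes and all names are hypothetical choices.

```python
import hashlib
from typing import List


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def root(items: List[bytes]) -> bytes:
    # Root label; the tree splits at the largest power of two below len(items).
    if not items:
        return hashlib.sha256(b"").digest()
    if len(items) == 1:
        return leaf_hash(items[0])
    k = 1
    while k * 2 < len(items):
        k *= 2
    return node_hash(root(items[:k]), root(items[k:]))


def is_append_only(old_root: bytes, old_size: int,
                   new_items: List[bytes], new_root: bytes) -> bool:
    # The log may only have grown, its first old_size entries must still
    # hash to the root seen earlier, and the new root must match the entries.
    if old_size > len(new_items):
        return False
    if root(new_items[:old_size]) != old_root:
        return False
    return root(new_items) == new_root


log = [b"release-1", b"release-2", b"release-3"]
seen_root = root(log)

extended = log + [b"release-4"]
rewritten = [b"release-1", b"release-2-backdoored", b"release-3", b"release-4"]

honest = is_append_only(seen_root, 3, extended, root(extended))
cheating = is_append_only(seen_root, 3, rewritten, root(rewritten))
```

A log that rewrites history cannot pass this check against a previously observed root, which is why the verifier only has to remember old roots rather than trust the server.
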
  • Not Synced
    So, what does this design look like?
  • Not Synced
    The theoretical foundation is called
    a transparency overlay and
  • Not Synced
    in our system it looks like this:
  • Not Synced
    We have the archive as per usual,
  • Not Synced
    we have a log server and the archive will
    submit package metadata, the release file,
  • Not Synced
    the packages file containing dependencies
    and so on and the source code
  • Not Synced
    into this log server.
  • Not Synced
    The apt client will be augmented with
    an auditor component and
  • Not Synced
    this auditor component is responsible for
    verifying the correct log operation
  • Not Synced
    as well as the inclusion of the downloaded
    release into the log.
  • Not Synced
    This is the mechanism by which we will be able
    to make sure that everybody is running
  • Not Synced
    the exact same version of the software
    they installed.
  • Not Synced
    A third component is the monitor.
  • Not Synced
    The monitor is necessary also to verify
    log operation and also to inspect
  • Not Synced
    the elements that are contained in the log.
  • Not Synced
    The monitor would then be run by groups
    of individuals or individuals that want to
  • Not Synced
    make sure of certain properties in the log.
  • Not Synced
    Alright, let's quickly recap.
  • Not Synced
    We have added this log server, which
    can prove two properties efficiently
  • Not Synced
    to the outside world.
  • Not Synced
    And we have the auditor and monitor
    components.
  • Not Synced
    The auditor is added to the apt client
    and the monitor does
  • Not Synced
    additional investigative tasks.
  • Not Synced
    Now, in order to make this system work,
    we need to…
  • Not Synced
    I need to make a few assumptions.
  • Not Synced
    The archive will need to handle
    log submission and distribution of
  • Not Synced
    certain log data structures.
  • Not Synced
    These are usually very small things
    given to the archive
  • Not Synced
    in response to submission.
  • Not Synced
    Then I'm assuming a very consistent
    release frequency.
  • Not Synced
    The archive is responsible for distributing
    reproducible binaries
  • Not Synced
    in my architecture.
  • Not Synced
    I'm assuming that the buildinfo files are
    covered by the release file.
  • Not Synced
    I treat them as additional source metadata
    so whenever the source package or
  • Not Synced
    the buildinfo file changes, I expect
    an increase in the binary version number.
  • Not Synced
    I also assume source-only uploads and,
    as one additional thing, that we have
  • Not Synced
    a keyring source package treated
    by the archive as authoritative, and
  • Not Synced
    this keyring must have
    the special property that
  • Not Synced
    it is operated append-only, so that
    we can go back in time and see
  • Not Synced
    what keys were authorized at different
    points in time.
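
One way to picture an append-only keyring is as a history of add and remove events that is never rewritten, only extended. The sketch below is purely illustrative: the event format, the sequence numbers and the names `KeyringEvent` and `authorized_keys` are assumptions and do not describe the actual keyring package.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class KeyringEvent:
    seq: int          # position in the append-only history (hypothetical)
    action: str       # "add" or "remove"
    fingerprint: str  # key fingerprint (placeholder strings below)


def authorized_keys(history: List[KeyringEvent], upto_seq: int) -> Set[str]:
    # Replay the history up to and including upto_seq to recover the set
    # of keys that were authorized at that point in time.
    keys: Set[str] = set()
    for event in history:
        if event.seq > upto_seq:
            break
        if event.action == "add":
            keys.add(event.fingerprint)
        elif event.action == "remove":
            keys.discard(event.fingerprint)
    return keys


history = [
    KeyringEvent(1, "add", "AAAA"),
    KeyringEvent(2, "add", "BBBB"),
    KeyringEvent(3, "remove", "AAAA"),
]

keys_then = authorized_keys(history, 2)
keys_now = authorized_keys(history, 3)
```

Because removals are recorded as new events rather than deletions, an auditor can still tell that key `AAAA` was authorized when an older upload was signed.
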
  • Not Synced
    The log server is a standalone server
    component that speaks, at the moment,
  • Not Synced
    an HTTP-based protocol.
  • Not Synced
    Probably one would want to have
    more than one, but
  • Not Synced
    we are going to have, I think,
    a much easier time running log servers
  • Not Synced
    than for example, the certificate
    transparency people
  • Not Synced
    because we only have one source
    of write access,
  • Not Synced
    namely the archive, so we can easily
    schedule the write access,
  • Not Synced
    and you can have read-only frontends that
    aren't quite critical.
  • Not Synced
    The auditor component would need to be
    integrated into the apt client or library.
  • Not Synced
    It needs to do things like cryptographic
    verifications,
  • Not Synced
    understand a few more file formats and
    have some more network access.
  • Not Synced
    Parts of the proof could also
    probably be distributed over
  • Not Synced
    the mirror network and we need
    not necessarily do everything
  • Not Synced
    ??? communication with
    the log server.
  • Not Synced
    So, this covers archive, auditor and
    log server.
  • Not Synced
    The monitoring servers have a few functions
    that are necessary for the verification of
  • Not Synced
    the log itself, meaning that they verify
    the append-only operation of the log
  • Not Synced
    and they will also likely want to exchange
    the tree roots with perhaps
  • Not Synced
    other monitors and some auditors.
  • Not Synced
    The important verification functions
    of the log server are validating
  • Not Synced
    the metadata of the Release, Packages and
    Sources files,
  • Not Synced
    namely making sure that these are complete,
    that the sources are available,
  • Not Synced
    that the versions are incremented
    correctly and so on.
  • Not Synced
    And that's necessary to make sure that
    a compromised archive can't do
  • Not Synced
    certain attacks.
  • Not Synced
    Also in this category is the fact that
    we depend on a fixed release frequency
  • Not Synced
    and monitors will also be verifying
    the upload ACL,
  • Not Synced
    meaning which keys are authorized to
    upload.
  • Not Synced
    Monitors also would be verifying
    reproducible builds in this scenario.
  • Not Synced
    That's the monitoring functions and
    I think that many different people and
  • Not Synced
    groups in Debian could get some benefits
    out of these monitoring functions
  • Not Synced
    in order to verify that everything
    worked correctly.
  • Not Synced
    We should note that all these verifications
    are completely independent of
  • Not Synced
    the existing infrastructure because
    they happen on the client side.
  • Not Synced
    So we don't depend on the existing
    infrastructure working correctly or
  • Not Synced
    on its notifications not being
    suppressed.
  • Not Synced
    This can be done completely
    on the client side using
  • Not Synced
    the data provided by the log server.
  • Not Synced
    For example, maintainers could verify that
    the code uploaded builds reproducibly
  • Not Synced
    using the corresponding buildinfo, or
  • Not Synced
    they could have checks on
  • Not Synced
    which uploads were done using their key and
  • Not Synced
    which packages were modified, perhaps
    by other people.
  • Not Synced
    The keyring maintainers or account
    managers could be looking at
  • Not Synced
    the keyring: what keys are in the keyring
    and what uploads were done
  • Not Synced
    using which keys.
  • Not Synced
    And the archive, last but not least, has
    an additional verification step available
  • Not Synced
    to make sure all the metadata was produced
    correctly and to know
  • Not Synced
    ??? things happened during the production
    of a given release.
Video Language:
English
Team:
Debconf
Project:
2018_mini-debconf-hamburg
Duration:
25:04

English subtitles

Incomplete
