
Software transparency: package security beyond signatures and reproducible builds

  • Not Synced
    This will be an academic talk
    as announced.
  • Not Synced
    I will try to bring some of my research
    I did during my PhD into the real world.
  • Not Synced
    We are going to talk about the security
    of software distribution and
  • Not Synced
    I'm going to propose a security feature
    that adds on top of
  • Not Synced
    the signatures we have up to today
  • Not Synced
    and also the reproducible builds that
    we already have to very large degree.
  • Not Synced
    I am going to highlight a few points where
    I think infrastructure changes are required
  • Not Synced
    to accommodate this system and I would
    also appreciate any feedback
  • Not Synced
    you might have.
  • Not Synced
    I'm going to ??? a bit of motivation for
    why we should care about this.
  • Not Synced
    In the security of software distribution
  • Not Synced
    we already do have
    cryptographic signatures
  • Not Synced
    I've just put up a few examples of
    recent attacks that involved
  • Not Synced
    the distribution of software where
    people who presumably thought
  • Not Synced
    they knew what they were doing had
    grave problems with software distribution.
  • Not Synced
    For example, the Juniper backdoors,
    pretty famous.
  • Not Synced
    Juniper discovered two backdoors
    in the code and
  • Not Synced
    nobody really knew where they were
    coming from.
  • Not Synced
    Another example would be
    Chrome extension developers
  • Not Synced
    who got their credentials phished and
    subsequently their extensions backdoored
  • Not Synced
    or another example, a signed update to
    a banking software actually included
  • Not Synced
    a malware and infected several banks.
  • Not Synced
    I hope this is motivation for us to
    consider these kinds of attacks
  • Not Synced
    to be possible and to prepare ourselves.
  • Not Synced
    I have two main goals in the system
    I am going to propose.
  • Not Synced
    The first is to relax trust in the archive.
  • Not Synced
    In particular, what I want to achieve is
    a level of security even if
  • Not Synced
    the archive is compromised and
    the specific thing I am going to do is
  • Not Synced
    to detect targeted backdoors.
  • Not Synced
    That means backdoors that are distributed
    only to a subset of the population and
  • Not Synced
    what we can achieve is to force
    the attacker to deliver the malware
  • Not Synced
    to everybody, thereby greatly decreasing
    their degree of stealth and increasing
  • Not Synced
    their danger of detection.
  • Not Synced
    This would work to our advantage.
  • Not Synced
    The second goal is the forensic auditability
  • Not Synced
    which overlaps to a surprising degree
    with the first one in technical terms,
  • Not Synced
    in terms of implementation.
  • Not Synced
    So, what I want to ensure is that we have
  • Not Synced
    inspectable source code for every binary.
  • Not Synced
    We do have of course the source code
    available from our packages, but
  • Not Synced
    only for the most recent version,
    everything else is a best effort
  • Not Synced
    by the code archiving services.
  • Not Synced
    The mapping between those and binary
    can be verified once we have
  • Not Synced
    reproducible builds to a large extent.
  • Not Synced
    I want to make sure that we can identify
    the maintainer responsible for distribution
  • Not Synced
    of a particular package and the system
    is also interested in providing
  • Not Synced
    attribution of where something went wrong,
  • Not Synced
    so that we are not in a situation where we
    notice something went wrong but
  • Not Synced
    we don't really know where we have to look
    in order to find the problems
  • Not Synced
    but that we really have specific and
    secured indication of
  • Not Synced
    where a compromise or problem
    was coming from.
  • Not Synced
    Let's quickly recap how our software
    distribution works.
  • Not Synced
    We have the maintainers who upload
    their code to the archive.
  • Not Synced
    The archive has access to a signing key
    which signs the releases.
  • Not Synced
    Actually, metadata covering all the actual
    binary packages.
  • Not Synced
    These are distributed over
    the mirror network
  • Not Synced
    from where the apt clients will download
    the package metadata.
  • Not Synced
    That means the hash sums for the packages,
    their dependencies and so on
  • Not Synced
    as well as the actual packages themselves.
  • Not Synced
    This central architecture has an important
    advantage,
  • Not Synced
    namely that the mirror network need not
    be trusted, right?
  • Not Synced
    We have the signature that covers all
    the contents of binary and source packages
  • Not Synced
    and the metadata, so the mirror network
    need not be trusted.
  • Not Synced
    On the other hand, it makes the archive and
    the signing key a very interesting target
  • Not Synced
    for attackers because this central point
    controls all the signing operations.
  • Not Synced
    So this is a place where we need to be
    particularly careful and perhaps
  • Not Synced
    maybe even do better than
    cryptographic signatures.
  • Not Synced
    This is where the main focus of this talk
    will be, although I will also consider
  • Not Synced
    the uploaders to some extent.
  • Not Synced
    We want to achieve two things:
  • Not Synced
    resistance against key compromise and
    targeted backdoors and
  • Not Synced
    to get some better support for auditing
    in case things go wrong.
  • Not Synced
    The approach that we choose to do this is
  • Not Synced
    we want to make sure that everybody runs
    exactly the same software
  • Not Synced
    or at least the parts of it they choose
    to install.
  • Not Synced
    If we think about that for a moment,
    this gives us a number of advantages.
  • Not Synced
    For example, all the analysis that's done
    on a piece of software immediately
  • Not Synced
    carries over to all other users of
    the software, right?
  • Not Synced
    Because if we haven't made sure that
    everybody installs the same software,
  • Not Synced
    they might not have exactly
    the same version and perhaps
  • Not Synced
    some backdoored version.
  • Not Synced
    This also ensures that we cannot suffer
    targeted backdoors by increasing
  • Not Synced
    the detection risk of attackers
  • Not Synced
    and we also want to have a cryptographic
    proof of where something went wrong.
  • Not Synced
    Now, to look at some pictures,
    I will present the data structure that
  • Not Synced
    we use in order to achieve these goals.
  • Not Synced
    The data structure is a hash tree,
    a Merkle tree which is
  • Not Synced
    a data structure that operates over a list.
  • Not Synced
    So we have a list of these squares here
    which represent the list items.
  • Not Synced
    In our case, this is going to be
    the files containing the package metadata,
  • Not Synced
    that is, the dependencies and hash sums of
    packages
  • Not Synced
    and also the source packages themselves
    are going to be elements in this list.
  • Not Synced
    The tree works as follows.
  • Not Synced
    It uses a cryptographic hash function
  • Not Synced
    which is a collision-resistant compressing
    function
  • Not Synced
    and the labels of the inner nodes
    of the tree are computed as
  • Not Synced
    the hashes of the children. Ok?
  • Not Synced
    Once we have computed the root hash,
    the root label,
  • Not Synced
    we have fixed all the elements and
    none of the elements can be changed
  • Not Synced
    without changing the root hash.
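A minimal sketch of the hash tree just described. The talk does not name a hash function or a tree layout, so SHA-256, the 0x00/0x01 prefixes that keep leaves and inner nodes apart, and the rule of promoting an unpaired node unchanged are all assumptions made for illustration.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # Label of a list item (assumed prefix 0x00 marks a leaf).
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Label of an inner node: the hash of its two children.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(items: list[bytes]) -> bytes:
    """Compute the root label over an ordered list of items."""
    if not items:
        return hashlib.sha256(b"").digest()
    level = [leaf_hash(x) for x in items]
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired node moves up unchanged
        level = nxt
    return level[0]


# Hypothetical list items standing in for metadata files and source packages.
log_items = [b"Release", b"Packages", b"Sources", b"hello_2.10.orig.tar.gz"]
root = merkle_root(log_items)
```

Changing any single item produces a different root, which is the property the rest of the design relies on.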
  • Not Synced
    We can exploit this in order to
    efficiently prove
  • Not Synced
    the two following properties for ???
  • Not Synced
    First of all, we can efficiently prove
    the inclusion of a given element
  • Not Synced
    in the list.
  • Not Synced
    If we know the tree root ???,
    this works as follows:
  • Not Synced
    let's make a quick example, we see
    the third list item is marked with an X
  • Not Synced
    and if I know the tree root, then
    the server operating the tree structure
  • Not Synced
    will only need to give me the three gray
    marked labels,
  • Not Synced
    the three marked node values and then
    I can recompute the root hash and
  • Not Synced
    be convinced that this element actually
    was contained in the list.
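The inclusion check from the example can be sketched as follows. This is an illustration under assumptions, not the prototype's code: SHA-256, the leaf/node prefixes and all function names are hypothetical. With eight items and the third one marked, the proof consists of exactly three sibling labels, matching the three gray nodes on the slide.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(items: list[bytes]) -> list[list[bytes]]:
    """All tree levels, leaves first, root last (items must be non-empty)."""
    levels = [[leaf_hash(x) for x in items]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [node_hash(cur[i], cur[i + 1]) for i in range(0, len(cur) - 1, 2)]
        if len(cur) % 2 == 1:
            nxt.append(cur[-1])  # unpaired node moves up unchanged
        levels.append(nxt)
    return levels


def inclusion_proof(items: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Server side: sibling labels on the path from leaf `index` to the root."""
    proof = []
    for level in build_levels(items)[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            # The flag records whether the sibling sits to the left.
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_inclusion(item: bytes, proof: list[tuple[bytes, bool]],
                     root: bytes) -> bool:
    """Client side: recompute the root from the item and the sibling labels."""
    h = leaf_hash(item)
    for sibling, sibling_is_left in proof:
        h = node_hash(sibling, h) if sibling_is_left else node_hash(h, sibling)
    return h == root


items = [f"item{i}".encode() for i in range(8)]
root = build_levels(items)[-1][0]
proof = inclusion_proof(items, 2)  # the third list item
```

The client only needs the root and the short proof, not the other list items.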
  • Not Synced
    The second property is that we can also
    efficiently verify the append-only operation
  • Not Synced
    of the list.
  • Not Synced
    So we can have a log server operating
    this kind of structure and
  • Not Synced
    the log server need not be trusted,
  • Not Synced
    it's not going to be a trusted third party
    but rather, its operation can be
  • Not Synced
    verified from the outside.
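To make the append-only property concrete, here is a deliberately naive check that a verifier holding a full copy of the log could run: recompute the root over the first entries and compare it with the root seen earlier. This is not the efficient, logarithmic-size proof the talk refers to. SHA-256, the prefixes and the function names are assumptions for illustration.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(items: list[bytes]) -> bytes:
    if not items:
        return hashlib.sha256(b"").digest()
    level = [leaf_hash(x) for x in items]
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def is_append_only(old_size: int, old_root: bytes,
                   new_items: list[bytes]) -> bool:
    """True if the first `old_size` entries of the new log still hash to
    the root that was observed earlier, i.e. nothing was rewritten."""
    if old_size > len(new_items):
        return False  # the log shrank
    return merkle_root(new_items[:old_size]) == old_root


old_log = [b"release-1", b"release-2", b"release-3"]
old_root = merkle_root(old_log)
honest = old_log + [b"release-4"]
rewritten = [b"release-1", b"release-2 (modified)", b"release-3", b"release-4"]
```

A log that rewrites or drops an earlier entry fails this check even if it keeps growing.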
  • Not Synced
    So, what does this design look like?
  • Not Synced
    The theoretical foundation is called
    a transparency overlay and
  • Not Synced
    in our system it looks like this:
  • Not Synced
    We have the archive as per usual,
  • Not Synced
    we have a log server and the archive will
    submit package metadata, the release file,
  • Not Synced
    the packages file containing dependencies
    and so on and the source code
  • Not Synced
    into this log server.
  • Not Synced
    The apt client will be augmented with
    an auditor component and
  • Not Synced
    this auditor component is responsible for
    verifying the correct log operation
  • Not Synced
    as well as the inclusion of the downloaded
    release into the log.
  • Not Synced
    This is the mechanism by which we will be able
    to make sure that everybody is running
  • Not Synced
    the exact same version of the software
    they installed.
  • Not Synced
    A third component is the monitor.
  • Not Synced
    The monitor is necessary also to verify
    log operation and also to inspect
  • Not Synced
    the elements that are contained in the log
  • Not Synced
    The monitor would then be run by groups
    of individuals or individuals that want to
  • Not Synced
    make sure of certain properties in the log
  • Not Synced
    Alright, let's quickly recap.
  • Not Synced
    We have added this log server, which
    can prove two properties efficiently
  • Not Synced
    to the outside world.
  • Not Synced
    And we have the auditor and monitor
    components.
  • Not Synced
    The auditor is added to the apt client
    and the monitor does
  • Not Synced
    additional investigative tasks.
  • Not Synced
    Now, in order to make this system work,
    we need to…
  • Not Synced
    I need to make a few assumptions.
  • Not Synced
    The archive will need to handle
    log submission and distribution of
  • Not Synced
    certain log data structures.
  • Not Synced
    These are usually very small things
    given to the archive
  • Not Synced
    in response to submission.
  • Not Synced
    Then I'm assuming a very consistent
    release frequency.
  • Not Synced
    The archive is responsible for distributing
    reproducible binaries
  • Not Synced
    in my architecture.
  • Not Synced
    I'm assuming that the buildinfo files are
    covered by the release file.
  • Not Synced
    I treat them as additional source metadata
    so whenever the source package or
  • Not Synced
    the buildinfo file changes, I expect
    an increase in the binary version number.
  • Not Synced
    I also assume source-only uploads and
    one additional thing that we have,
  • Not Synced
    a keyring source package treated
    by the archive as authoritative and
  • Not Synced
    this keyring must have
    the special property that
  • Not Synced
    it is operated append-only so that
    we can go back in time and see
  • Not Synced
    what keys were authorized at different
    points in time.
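One way to picture an append-only keyring is as a history of add and remove events that is only ever extended. The event format, the field names and both helper functions below are hypothetical; the talk only states the append-only requirement and the goal of looking up which keys were authorized at a given time.

```python
from typing import NamedTuple


class KeyEvent(NamedTuple):
    time: int          # e.g. a Unix timestamp; history is assumed sorted by time
    action: str        # "add" or "remove"
    fingerprint: str


def authorized_at(history: list[KeyEvent], when: int) -> set[str]:
    """Replay the history up to `when` and return the authorized keys."""
    keys: set[str] = set()
    for event in history:
        if event.time > when:
            break
        if event.action == "add":
            keys.add(event.fingerprint)
        elif event.action == "remove":
            keys.discard(event.fingerprint)
    return keys


def extends(old: list[KeyEvent], new: list[KeyEvent]) -> bool:
    """Append-only check: the old history must be a prefix of the new one."""
    return len(new) >= len(old) and new[:len(old)] == old


history = [
    KeyEvent(100, "add", "AAAA"),
    KeyEvent(200, "add", "BBBB"),
    KeyEvent(300, "remove", "AAAA"),
]
```

Because removals are recorded as events rather than deletions, an upload signed by key AAAA at time 250 can still be checked against the keyring as it was then.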
  • Not Synced
    The log server is a standalone server
    component that at the moment speaks
  • Not Synced
    an HTTP-based protocol.
  • Not Synced
    Probably one would want to have
    more than one, but
  • Not Synced
    we are going to have, I think,
    a much easier time running log servers
  • Not Synced
    than for example, the certificate
    transparency people
  • Not Synced
    because we only have one source
    of writing access,
  • Not Synced
    namely the archive, so we can easily
    schedule the write access,
  • Not Synced
    and you can have read-only frontends that
    aren't quite critical.
  • Not Synced
    The auditor component would need to be
    integrated into the apt client or library.
  • Not Synced
    It needs to do things like cryptographic
    verifications,
  • Not Synced
    understand a bit more file formats and
    some more network access.
  • Not Synced
    Parts of the proof could also
    probably be distributed over
  • Not Synced
    the mirror network and we need
    not necessarily do everything
  • Not Synced
    ??? communication with
    the log server.
  • Not Synced
    So, this covers archive auditor and
    log server.
  • Not Synced
    The monitoring servers have a few functions
    that are necessary for the verification of
  • Not Synced
    the log itself, meaning that they verify
    the append-only operation of the log
  • Not Synced
    and they will also likely want to exchange
    the tree roots with perhaps
  • Not Synced
    other monitors and some auditors.
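Exchanging tree roots is what exposes a log that shows different contents to different parties. The talk does not describe the exchange protocol, so the sketch below only shows the comparison step, with a hypothetical data layout: each party keeps a mapping from tree size to the root it was shown for that size.

```python
def conflicting_sizes(mine: dict[int, bytes],
                      theirs: dict[int, bytes]) -> list[int]:
    """Tree sizes for which two observers were shown different roots.

    Two different roots for the same tree size mean the log presented
    two different lists, which an honest log never does.
    """
    shared = mine.keys() & theirs.keys()
    return sorted(size for size in shared if mine[size] != theirs[size])


# Hypothetical observations; the byte strings stand in for real root hashes.
monitor_view = {1000: b"root-a", 1001: b"root-b", 1002: b"root-c"}
auditor_view = {1001: b"root-b", 1002: b"root-X"}
```

Here the two views disagree at size 1002, so one of the two parties was shown a log that the rest of the world did not see.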
  • Not Synced
    The important verification functions
    of the log server are validating
  • Not Synced
    the metadata of the Release, Packages and
    Sources files,
  • Not Synced
    namely making sure that these are complete,
    that the sources are available,
  • Not Synced
    that the versions are incremented
    correctly and so on.
  • Not Synced
    And that's necessary to make sure that
    a compromised archive can't do
  • Not Synced
    certain attacks.
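The metadata checks can be sketched roughly as follows. The data layout is a simplification made up for illustration: each release maps a package name to a pair of (version, source hash), and versions are plain integers rather than real Debian version strings, whose comparison rules are more involved.

```python
Release = dict[str, tuple[int, str]]  # package name -> (version, source hash)


def check_release(prev: Release, cur: Release,
                  available_sources: set[str]) -> list[str]:
    """Return a list of findings for the step from `prev` to `cur`."""
    problems = []
    for name, (version, source) in sorted(cur.items()):
        if source not in available_sources:
            problems.append(f"{name}: source missing")
        if name in prev:
            old_version, old_source = prev[name]
            if version < old_version:
                problems.append(f"{name}: version went backwards")
            elif source != old_source and version == old_version:
                problems.append(f"{name}: changed without version increment")
    return problems


previous = {"hello": (1, "src-h1"), "curl": (7, "src-c7")}
current = {"hello": (1, "src-h1-modified"), "curl": (8, "src-c8")}
sources_in_log = {"src-h1", "src-c7", "src-c8"}
findings = check_release(previous, current, sources_in_log)
```

In this made-up example the monitor reports two findings for `hello`: its source is not in the log, and its contents changed while the version stayed the same.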
  • Not Synced
    Also in this category is the fact that
    we depend on a fixed release frequency
  • Not Synced
    and monitors will also be verifying
    the upload ACL,
  • Not Synced
    meaning which keys are authorized to
    upload.
  • Not Synced
    Monitors also would be verifying
    reproducible builds in this scenario.
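At its core, the reproducibility check compares a locally rebuilt binary against the hash sum recorded in the logged package metadata. The rebuild itself is left out here; the function name is hypothetical and SHA-256 is an assumption.

```python
import hashlib


def matches_logged_hash(rebuilt_package: bytes, logged_sha256_hex: str) -> bool:
    """Compare the hash of a locally rebuilt package with the logged one."""
    return hashlib.sha256(rebuilt_package).hexdigest() == logged_sha256_hex.lower()


# Stand-in bytes for a rebuilt .deb and the hash a monitor read from the log.
rebuilt = b"contents of the rebuilt package"
logged = hashlib.sha256(rebuilt).hexdigest()
```

A mismatch means either the package does not build reproducibly or the distributed binary does not correspond to the logged source.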
  • Not Synced
    That's the monitoring functions and
    I think that many different people and
  • Not Synced
    groups in Debian could get some benefits
    out of these monitoring functions
  • Not Synced
    in order to verify that everything
    worked correctly.
  • Not Synced
    We should note that all these verifications
    are completely independent of
  • Not Synced
    the existing infrastructure because they
    happen on the client side.
  • Not Synced
    So we don't depend on the existing
    infrastructure working correctly or on
  • Not Synced
    none of its notifications being
    suppressed.
  • Not Synced
    This can be done completely
    on the client side using
  • Not Synced
    the data provided by the log server.
  • Not Synced
    For example, maintainers could verify that
    the code uploaded builds reproducibly
  • Not Synced
    using the corresponding buildinfo or
  • Not Synced
    they could have checks:
  • Not Synced
    which uploads were done using their key
  • Not Synced
    which packages were modified perhaps
    by other people
  • Not Synced
    the keyring maintainers or account
    managers could be looking at
  • Not Synced
    the keyring: what keys are in the keyring
    and what uploads were done
  • Not Synced
    using which keys.
  • Not Synced
    And the archive, last but not least, has
    an additional verification step available
  • Not Synced
    to make sure all the metadata was produced
    correctly and to know
  • Not Synced
    ??? things happened during the production
    of a given release.
  • Not Synced
    This thing actually exists.
  • Not Synced
    Well, I have programmed prototypes
    for all these components,
  • Not Synced
    meaning nothing that would be ready
    to implement,
  • Not Synced
    but to show that it actually works.
  • Not Synced
    I've used two years of Debian Stretch
    releases and fed it into the system.
  • Not Synced
    This resulted in a tree size of
    270,000 elements and
  • Not Synced
    the storage required was about 400GB
    where almost all of that is
  • Not Synced
    source packages.
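A quick back-of-the-envelope calculation on these numbers. It assumes 32-byte hashes (SHA-256), decimal gigabytes, and a binary hash tree in which a proof holds at most one sibling label per level; none of these details are stated in the talk.

```python
TREE_SIZE = 270_000          # elements in the prototype log
STORAGE_BYTES = 400 * 10**9  # about 400 GB, taken as decimal gigabytes
HASH_BYTES = 32              # assumes SHA-256

# Upper bound on labels in an inclusion proof: ceil(log2(TREE_SIZE)).
proof_hashes = (TREE_SIZE - 1).bit_length()
proof_bytes = proof_hashes * HASH_BYTES

# Average size of a log element, dominated by source packages.
average_element_mb = STORAGE_BYTES / TREE_SIZE / 10**6

print(proof_hashes, proof_bytes, round(average_element_mb, 2))
```

Under these assumptions an inclusion proof is at most 19 hashes, or 608 bytes, which is tiny compared with the roughly 1.48 MB average size of a logged element.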
  • Not Synced
    I would say that it's eminently feasible
    to do this.
  • Not Synced
    The monitor functions run rather cheaply.
  • Not Synced
    A monitor need not necessarily keep
    a complete copy of the log in all cases
  • Not Synced
    but I did notice some unexpected events
    in the package metadata.
  • Not Synced
    I have observed sources missing and
    version increments missing where
  • Not Synced
    I think there should be a version increment
  • Not Synced
    So I ??? be looking more closely
    into these cases.
  • Not Synced
    If anybody is interested in
    the theoretical side of this,
  • Not Synced
    this would be the immediate pointers
    I can give.
  • Not Synced
    The first paper is the theoretical and
    mathematical foundation and
  • Not Synced
    the other ones are applications of
    similar transparency work, but
  • Not Synced
    with different goals.
  • Not Synced
    Summarizing, we can introduce a system
    to detect targeted backdoors,
  • Not Synced
    even under compromise of the archive.
  • Not Synced
    We need to add a bit more infrastructure
    and need to change how some things are done
  • Not Synced
    We can also improve auditability, so that
    we can securely identify what happened when
  • Not Synced
    things go wrong.
  • Not Synced
    In particular, we can make sure that for
    every binary, we can get
  • Not Synced
    the source code that was used to produce
    the binary
  • Not Synced
    and then identify
    the responsible maintainer.
  • Not Synced
    There's one class of attacks I have left
    out for today,
  • Not Synced
    if anybody wants to talk about that, we
    can do so too.
  • Not Synced
    And now, I'm interested in your questions
    and feedback.
  • Not Synced
    [Applause]
  • Not Synced
    [Q] Did you already test the reproducibility
    and how do you interact with
  • Not Synced
    problems of not reproducible packages?
  • Not Synced
    I mean, do you not integrate some
    into the log?
  • Not Synced
    [A] For now, the implementation of
    my monitor functions hasn't covered
  • Not Synced
    reproducibility.
  • Not Synced
    I think the first step to do so would be
    to have a blacklist of packages
  • Not Synced
    that are known not to be built reproducibly
    and then try to get on with it.
  • Not Synced
    [Q] Two questions.
  • Not Synced
    You say "authenticating metadata and
    code".
  • Not Synced
    This means signing or what is it exactly,
    "authenticating"?
  • Not Synced
    [A] At which point?
  • Not Synced
    [Q] It was… back. Where the tree is.
  • Not Synced
    Yes, yes. The tree before that.
  • Not Synced
    [A] Ok.
  • Not Synced
    This authentication here doesn't quite mean
    a signature.
  • Not Synced
    It means if I know the value of the root
    of the hash tree, then
  • Not Synced
    I can be assured that a given element
    is included if I'm told
  • Not Synced
    the value of the three gray marked
    inner nodes here.
  • Not Synced
    And that works by recomputing
    the hash tree.
  • Not Synced
    [Q] Ok, I think I have to defer this
    to after the talk.
  • Not Synced
    [A] Yeah, I can explain.
  • Not Synced
    [Q] Another question would be,
  • Not Synced
    so, detection of targeted backdoors.
  • Not Synced
    You mean at the stage of signing archive
    or which backdoors?
  • Not Synced
    [A] The scenario would be that
    the signing key of the archive is
  • Not Synced
    used to create an additional release file
    which covers
  • Not Synced
    a manipulated software version.
  • Not Synced
    And this software version and signature is
    only shown to the victim population
  • Not Synced
    and not to the general population.
  • Not Synced
    This means that the malicious software
    would only be observed by the victim
  • Not Synced
    and not by everybody else.
  • Not Synced
    My goal is to force the attacker to
    distribute the malicious software
  • Not Synced
    to the whole world in order to increase
  • Not Synced
    the chance that they're going to be
    detected and thereby deterring perhaps
  • Not Synced
    the attack from the beginning.
  • Not Synced
    [Q] Great talk. Great ideas as well.
  • Not Synced
    I really liked your slide on
    your assumptions
  • Not Synced
    ???
    honest about them like
  • Not Synced
    "yeah we assume ???"
  • Not Synced
    I wouldn't underestimate how difficult
    it would be to make
  • Not Synced
    some of these changes.
  • Not Synced
    I mean, even ones that look simple, like
    source-only uploads.
  • Not Synced
    Everyone wants them, right?
  • Not Synced
    [A] Yes, sure, we have to start somewhere
    and I hope if people are convinced that
  • Not Synced
    this is a great idea and we should do this
  • Not Synced
    then we get some more impetus
    for these things that everybody wants
  • Not Synced
    like source-only uploads.
  • Not Synced
    [Q] Thank you, yeah, and it will be really
    pretty good to base this stuff
  • Not Synced
    on ??? effort
    ??? build on the same choices.
  • Not Synced
    Thank you.
  • Not Synced
    [A] Yeah, so I'm interested in any kind
    of feedback.
  • Not Synced
    If you think it's a great idea or think
    there are some problems I might have missed
  • Not Synced
    or it might get difficult to implement.
  • Not Synced
    Please come to me in case you have
    anything.
  • Not Synced
    [Applause]
Title:
Software transparency: package security beyond signatures and reproducible builds
Video Language:
English
Team:
Debconf
Project:
2018_mini-debconf-hamburg
Duration:
25:04

English subtitles

Incomplete
