
Software transparency: package security beyond signatures and reproducible builds

  • 0:08 - 0:12
    This will be an academic talk
    as announced.
  • 0:12 - 0:19
    I will try to bring some of my research
    I did during my PhD into the real world.
  • 0:21 - 0:26
    We are going to talk about the security
    of software distribution and
  • 0:26 - 0:30
    I'm going to propose a security feature
    that adds on top of
  • 0:30 - 0:33
    the signatures we have up to today
  • 0:33 - 0:40
    and also the reproducible builds that
    we already have to a very large degree.
  • 0:40 - 0:47
    I am going to highlight a few points where
    I think infrastructure changes are required
  • 0:47 - 0:53
    to accommodate this system and I would
    also appreciate any feedback
  • 0:53 - 0:55
    you might have.
  • 0:56 - 1:00
    I'm going to ??? a few motivations for
    what we should care about.
  • 1:01 - 1:03
    In the security of software distribution
  • 1:03 - 1:06
    we already do have
    cryptographic signatures
  • 1:06 - 1:11
    I've just put up a few examples of
    recent attacks that involved
  • 1:11 - 1:19
    the distribution of software where
    people who presumably thought
  • 1:19 - 1:24
    they knew what they were doing had
    grave problems with software distribution.
  • 1:25 - 1:28
    For example, the Juniper backdoors,
    pretty famous.
  • 1:29 - 1:33
    Juniper discovered two backdoors
    in the code and
  • 1:33 - 1:38
    nobody really knew where they were
    coming from.
  • 1:39 - 1:43
    Another example would be
    Chrome extension developers
  • 1:43 - 1:48
    who got their credentials phished and
    subsequently their extensions backdoored
  • 1:48 - 1:58
    or another example, a signed update to
    banking software actually included
  • 1:58 - 2:02
    malware and infected several banks.
  • 2:03 - 2:11
    I hope this is motivation for us to
    consider these kinds of attacks
  • 2:11 - 2:15
    to be possible and to prepare ourselves.
  • 2:17 - 2:21
    I have two main goals in the system
    I am going to propose.
  • 2:21 - 2:25
    The first is to relax trust in the archive.
  • 2:25 - 2:31
    In particular, what I want to achieve is
    a level of security even if
  • 2:31 - 2:38
    the archive is compromised and
    the specific thing I am going to do is
  • 2:38 - 2:41
    to detect targeted backdoors.
  • 2:41 - 2:47
    That means backdoors that are distributed
    only to a subset of the population and
  • 2:47 - 2:53
    what we can achieve is to force
    the attacker to deliver the malware
  • 2:53 - 3:00
    to everybody, thereby greatly decreasing
    their degree of stealth and increasing
  • 3:00 - 3:02
    their danger of detection.
  • 3:03 - 3:05
    This would work to our advantage.
  • 3:06 - 3:10
    The second goal is the forensic auditability
  • 3:10 - 3:17
    which overlaps to a surprising degree
    with the first one in technical terms,
  • 3:17 - 3:19
    in terms of implementation.
  • 3:20 - 3:24
    So, what I want to ensure is that we have
  • 3:24 - 3:27
    inspectable source code for every binary.
  • 3:28 - 3:33
    We do have of course the source code
    available from our packages, but
  • 3:33 - 3:39
    only for the most recent version,
    everything else is a best effort
  • 3:39 - 3:43
    by the code archiving services.
  • 3:44 - 3:51
    The mapping between source and binary
    can be verified once we have
  • 3:51 - 3:54
    reproducible builds to a large extent.
  • 3:55 - 4:01
    I want to make sure that we can identify
    the maintainer responsible for distribution
  • 4:01 - 4:08
    of a particular package and the system
    is also interested in providing
  • 4:08 - 4:11
    attribution of where something went wrong,
  • 4:11 - 4:16
    so that we are not in a situation where we
    notice something went wrong but
  • 4:16 - 4:22
    we don't really know where we have to look
    in order to find the problems
  • 4:22 - 4:29
    but that we really have specific and
    secured indication of
  • 4:29 - 4:32
    where a compromise or problem
    was coming from.
  • 4:34 - 4:37
    Let's quickly recap how our software
    distribution works.
  • 4:37 - 4:42
    We have the maintainers who upload
    their code to the archive.
  • 4:42 - 4:47
    The archive has access to a signing key
    which signs the releases.
  • 4:48 - 4:52
    Actually, metadata covering all the actual
    binary packages.
  • 4:53 - 4:57
    These are distributed over
    the mirror network
  • 4:57 - 5:02
    from where the apt clients will download
    the package metadata.
  • 5:02 - 5:07
    That means the hash sums for the packages,
    their dependencies and so on
  • 5:07 - 5:10
    as well as the actual packages themselves.
  • 5:12 - 5:19
    This central architecture has an important
    advantage,
  • 5:19 - 5:23
    namely the mirror network need not
    be trusted, right?
  • 5:23 - 5:29
    We have the signature that covers all
    the contents of binary and source packages
  • 5:29 - 5:33
    and the metadata, so the mirror network
    need not be trusted.
  • 5:33 - 5:39
    On the other hand, it makes the archive and
    its signing key a very interesting target
  • 5:39 - 5:46
    for attackers because this central point
    controls all the signing operations.
  • 5:46 - 5:52
    So this is a place where we need to be
    particularly careful and perhaps
  • 5:52 - 5:56
    maybe even do better than
    cryptographic signatures.
  • 5:57 - 6:02
    This is where the main focus of this talk
    will be, although I will also consider
  • 6:02 - 6:05
    the uploaders to some extent.
  • 6:07 - 6:08
    We want to achieve two things:
  • 6:08 - 6:12
    resistance against key compromise and
    targeted backdoors and
  • 6:12 - 6:18
    to get some better support for auditing
    in case things go wrong.
  • 6:18 - 6:22
    The approach that we choose to do this is
  • 6:22 - 6:27
    we want to make sure that everybody runs
    exactly the same software
  • 6:27 - 6:30
    or at least the parts of it they choose
    to install.
  • 6:31 - 6:35
    If we think about that for a moment,
    this gives us a number of advantages.
  • 6:35 - 6:40
    For example, all the analysis that's done
    on a piece of software immediately
  • 6:40 - 6:44
    carries over to all other users of
    the software, right?
  • 6:44 - 6:48
    Because if we haven't made sure that
    everybody installs the same software,
  • 6:48 - 6:51
    they might not have exactly
    the same version and perhaps
  • 6:51 - 6:53
    some backdoored version.
  • 6:54 - 7:01
    This also ensures that we cannot suffer
    targeted backdoors by increasing
  • 7:01 - 7:04
    the detection risk of attackers
  • 7:04 - 7:09
    and we also want to have a cryptographic
    proof of where something went wrong.
  • 7:11 - 7:19
    Now, to look at some pictures,
    I will present the data structure that
  • 7:19 - 7:22
    we use in order to achieve these goals.
  • 7:23 - 7:28
    The data structure is a hash tree,
    a Merkle tree which is
  • 7:28 - 7:31
    a data structure that operates over a list.
  • 7:31 - 7:35
    So we have a list of these squares here
    which represent the list items.
  • 7:35 - 7:39
    In our case, this is going to be
    the files containing the package metadata,
  • 7:39 - 7:42
    that is, just dependencies and hash sums of
    packages,
  • 7:42 - 7:47
    and also the source packages themselves
    are going to be elements in this list.
  • 7:48 - 7:49
    The tree works as follows.
  • 7:50 - 7:52
    It uses a cryptographic hash function
  • 7:52 - 7:56
    which is a collision-resistant compressing
    function
  • 7:56 - 8:01
    and the labels of the inner nodes
    of the tree are computed as
  • 8:01 - 8:05
    the hashes of the children. Ok?
  • 8:06 - 8:10
    Once we have computed the root hash,
    the root label,
  • 8:10 - 8:15
    we have fixed all the elements and
    none of the elements can be changed
  • 8:15 - 8:17
    without changing the root hash.
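
The construction just described can be sketched in a few lines of Python. This is only an illustration under assumptions the talk does not state: SHA-256 as the hash function, one-byte prefixes to separate leaves from inner nodes, and an odd node at the end of a level being carried up unchanged. The item names are made up.

```python
import hashlib


def leaf_hash(item: bytes) -> bytes:
    # Prefix 0x00 marks a leaf (an assumed domain-separation convention).
    return hashlib.sha256(b"\x00" + item).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Prefix 0x01 marks an inner node: its label is the hash of its children.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(items: list[bytes]) -> bytes:
    """Root label of a hash tree over a non-empty list of items."""
    if not items:
        raise ValueError("the list must contain at least one item")
    level = [leaf_hash(item) for item in items]
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd node moves up unchanged
        level = nxt
    return level[0]


# Hypothetical list items standing in for metadata files and source packages.
release = [b"Release", b"Packages", b"Sources", b"hello_2.10.dsc"]
root = merkle_root(release)
tampered = merkle_root(
    [b"Release", b"Packages", b"Sources", b"hello_2.10-backdoor.dsc"])
print(root.hex())
print(root != tampered)  # changing any element changes the root
```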
  • 8:18 - 8:22
    We can exploit this in order to
    efficiently prove
  • 8:22 - 8:26
    the two following properties for elements.
  • 8:26 - 8:31
    First of all, we can efficiently prove
    the inclusion of a given element
  • 8:31 - 8:32
    in the list.
  • 8:32 - 8:37
    If we know the tree root ???,
    this works as follows:
  • 8:37 - 8:41
    let's make a quick example, we see
    the third list item is marked with an X
  • 8:41 - 8:50
    and if I know the tree root, then
    the server operating the tree structure
  • 8:50 - 8:55
    will only need to give me the three gray
    marked labels,
  • 8:55 - 9:01
    the three marked node values and then
    I can recompute the root hash and
  • 9:01 - 9:06
    be convinced that this element actually
    was contained in the list.
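
A minimal sketch of this inclusion check, assuming SHA-256, one-byte leaf/node prefixes and a tree that splits at the largest power of two below the list size; none of these details are given in the talk, and the function names are hypothetical. With eight items and the third one marked, the proof consists of exactly three sibling labels, as on the slide.

```python
import hashlib


def leaf_hash(item: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + item).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    # Largest power of two strictly smaller than n (for n >= 2).
    return 1 << ((n - 1).bit_length() - 1)


def subtree_root(hashes: list[bytes]) -> bytes:
    if len(hashes) == 1:
        return hashes[0]
    k = split_point(len(hashes))
    return node_hash(subtree_root(hashes[:k]), subtree_root(hashes[k:]))


def inclusion_proof(index: int, hashes: list[bytes]) -> list[tuple[str, bytes]]:
    """Sibling labels from the leaf up to the root, tagged with their side."""
    if len(hashes) == 1:
        return []
    k = split_point(len(hashes))
    if index < k:
        sibling = ("right", subtree_root(hashes[k:]))
        return inclusion_proof(index, hashes[:k]) + [sibling]
    sibling = ("left", subtree_root(hashes[:k]))
    return inclusion_proof(index - k, hashes[k:]) + [sibling]


def verify_inclusion(item: bytes, proof, expected_root: bytes) -> bool:
    # The client recomputes the root from the item and the sibling labels.
    h = leaf_hash(item)
    for side, sibling in proof:
        h = node_hash(h, sibling) if side == "right" else node_hash(sibling, h)
    return h == expected_root


items = [f"item-{i}".encode() for i in range(8)]
hashes = [leaf_hash(x) for x in items]
root = subtree_root(hashes)
proof = inclusion_proof(2, hashes)  # the third list item

print(len(proof))                                       # 3
print(verify_inclusion(items[2], proof, root))          # True
print(verify_inclusion(b"something else", proof, root)) # False
```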
  • 9:07 - 9:12
    The second property is that we can also
    efficiently verify the append-only operation
  • 9:12 - 9:14
    of the list.
  • 9:14 - 9:18
    So we can have a log server operating
    this kind of structure and
  • 9:18 - 9:19
    the log server need not be trusted,
  • 9:19 - 9:24
    it's not going to be a trusted third party
    but rather, its operation can be
  • 9:24 - 9:26
    verified from the outside.
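
What "append-only" means for an outside verifier can be illustrated with a deliberately naive check: a verifier that remembers an earlier tree size and root recomputes the root over the same prefix of the newer list. Compact consistency proofs exist for such trees and avoid downloading the whole list; this sketch does not implement them. SHA-256, the prefixes and all names are assumptions.

```python
import hashlib


def leaf_hash(item: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + item).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(items: list[bytes]) -> bytes:
    level = [leaf_hash(item) for item in items]
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd node moves up unchanged
        level = nxt
    return level[0]


def is_append_only(old_size: int, old_root: bytes, new_items: list[bytes]) -> bool:
    """True if the first old_size entries still hash to the remembered root."""
    if old_size > len(new_items):
        return False  # entries were removed
    if old_size == 0:
        return True   # nothing was remembered yet
    return merkle_root(new_items[:old_size]) == old_root


log_v1 = [b"release-1", b"release-2", b"release-3"]
remembered_root = merkle_root(log_v1)

honest = log_v1 + [b"release-4"]
rewritten = [b"release-1", b"release-2-modified", b"release-3", b"release-4"]

print(is_append_only(3, remembered_root, honest))     # True
print(is_append_only(3, remembered_root, rewritten))  # False
```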
  • 9:28 - 9:31
    So, what does this design look like?
  • 9:32 - 9:36
    The theoretical foundation is called
    a transparency overlay and
  • 9:36 - 9:38
    in our system it looks like this:
  • 9:38 - 9:41
    We have the archive as per usual,
  • 9:41 - 9:47
    we have a log server and the archive will
    submit package metadata, the release file,
  • 9:47 - 9:52
    the packages file containing dependencies
    and so on and the source code
  • 9:52 - 9:55
    into this log server.
  • 9:56 - 10:04
    The apt client will be augmented with
    an auditor component and
  • 10:04 - 10:10
    this auditor component is responsible for
    verifying the correct log operation
  • 10:10 - 10:15
    as well as the inclusion of the downloaded
    release into the log.
  • 10:16 - 10:21
    This is the mechanism by which we will be able
    to make sure that everybody is running
  • 10:21 - 10:25
    the exact same version of the software
    they installed.
  • 10:27 - 10:29
    A third component is the monitor.
  • 10:30 - 10:37
    The monitor is necessary also to verify
    log operation and also to inspect
  • 10:37 - 10:42
    the elements that are contained in the log
  • 10:44 - 10:52
    The monitor would then be run by groups
    of individuals or individuals that want to
  • 10:52 - 10:57
    make sure of certain properties in the log
  • 11:01 - 11:06
    Alright, let's quickly recap.
  • 11:07 - 11:13
    We have added this log server, which
    can prove two properties efficiently
  • 11:13 - 11:16
    to the outside world.
  • 11:17 - 11:21
    And we have the auditor and monitor
    components.
  • 11:21 - 11:27
    The auditor is added to the apt client
    and the monitor does
  • 11:27 - 11:30
    additional investigative tasks.
  • 11:31 - 11:38
    Now, in order to make this system work,
    we need to…
  • 11:39 - 11:41
    I need to make a few assumptions.
  • 11:41 - 11:48
    The archive will need to handle
    log submission and distribution of
  • 11:48 - 11:51
    certain log data structures.
  • 11:52 - 11:56
    These are usually very small things
    given to the archive
  • 11:56 - 11:58
    in response to submission.
  • 11:59 - 12:04
    Then I'm assuming a very consistent
    release frequency.
  • 12:05 - 12:11
    The archive is responsible for distributing
    reproducible binaries
  • 12:11 - 12:13
    in my architecture.
  • 12:14 - 12:20
    I'm assuming that the buildinfo files are
    covered by the release file
  • 12:21 - 12:28
    I treat them as additional source metadata
    so whenever the source package or
  • 12:28 - 12:35
    the buildinfo file changes, I expect
    an increase in the binary version number.
  • 12:36 - 12:42
    I also assume source-only uploads and
    one additional thing that we have,
  • 12:42 - 12:49
    a keyring source package treated
    by the archive as authoritative and
  • 12:49 - 12:53
    this keyring must have
    the special property that
  • 12:53 - 12:58
    it is operated append-only so that
    we can go back in time and see
  • 12:58 - 13:01
    what keys were authorized at different
    points in time.
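
One way to picture an append-only keyring is as a list of events that is only ever extended, so that any earlier state can be replayed. The event format, the fingerprints and the function names below are hypothetical; the talk does not specify how the keyring package would encode this.

```python
# Each event is (action, key fingerprint); action is "add" or "remove".
Event = tuple[str, str]


def is_extension(old: list[Event], new: list[Event]) -> bool:
    """Append-only: the newer history must start with the older one."""
    return new[:len(old)] == old


def authorized_keys(events: list[Event], upto: int) -> set[str]:
    """Replay the first `upto` events to get the keyring at that point."""
    keys: set[str] = set()
    for action, fingerprint in events[:upto]:
        if action == "add":
            keys.add(fingerprint)
        elif action == "remove":
            keys.discard(fingerprint)
    return keys


history = [("add", "AAAA"), ("add", "BBBB"), ("remove", "AAAA")]

print(authorized_keys(history, 2))  # both keys were authorized at this point
print(authorized_keys(history, 3))  # only BBBB remains
print(is_extension(history[:2], history))       # True
print(is_extension([("add", "CCCC")], history)) # False: history was rewritten
```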
  • 13:04 - 13:11
    The log server is a standalone server
    component that at the moment speaks
  • 13:11 - 13:13
    an HTTP-based protocol.
  • 13:14 - 13:17
    Probably one would want to have
    more than one, but
  • 13:17 - 13:21
    we are going to have, I think,
    a much easier time running log servers
  • 13:21 - 13:24
    than for example, the certificate
    transparency people
  • 13:24 - 13:30
    because we only have one source
    of write access,
  • 13:30 - 13:36
    namely the archive, so we can easily
    schedule the write access,
  • 13:36 - 13:41
    and you can have read-only frontends that
    aren't quite critical.
  • 13:43 - 13:50
    The auditor component would need to be
    integrated into the apt client or library.
  • 13:50 - 13:54
    It needs to do things like cryptographic
    verifications,
  • 13:54 - 14:00
    understand a few more file formats and
    some more network access.
  • 14:03 - 14:07
    Parts of the proofs could probably also
    be distributed over
  • 14:07 - 14:12
    the mirror network and we need
    not necessarily do everything
  • 14:13 - 14:15
    ??? communication with
    the log server.
  • 14:19 - 14:23
    So, this covers archive, auditor and
    log server.
  • 14:24 - 14:31
    The monitoring servers have a few functions
    that are necessary for the verification of
  • 14:31 - 14:38
    the log itself, meaning that they verify
    the append-only operation of the log
  • 14:38 - 14:43
    and they will also likely want to exchange
    the tree roots with perhaps
  • 14:43 - 14:45
    other monitors and some auditors.
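
Why exchanging tree roots helps can be sketched as follows: if the log shows a victim a different tree than everyone else, two parties end up holding different roots for the same tree size. The observation format and names are hypothetical, and the sketch only compares roots at equal tree sizes; comparing different sizes would need a consistency proof, which is not shown.

```python
def find_split_views(observations):
    """observations: iterable of (observer, tree_size, root_hex) tuples.

    Returns (tree_size, first_observer, conflicting_observer) for every
    observation whose root differs from the first one seen at that size.
    """
    seen = {}       # tree_size -> (observer, root_hex) first reported
    conflicts = []
    for observer, size, root in observations:
        if size in seen and seen[size][1] != root:
            conflicts.append((size, seen[size][0], observer))
        else:
            seen.setdefault(size, (observer, root))
    return conflicts


observations = [
    ("monitor-a", 100, "ab12"),
    ("auditor-b", 100, "ab12"),
    ("auditor-c", 100, "ff00"),  # was shown a different tree
]
print(find_split_views(observations))
```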
  • 14:47 - 14:51
    The important verification functions
    of the log server are validating
  • 14:51 - 14:57
    the metadata of the release packages and
    sources file,
  • 14:57 - 15:01
    namely making sure that these are complete,
    that the sources are available,
  • 15:01 - 15:06
    that the versions are incremented
    correctly and so on.
  • 15:06 - 15:10
    And that's necessary to make sure that
    a compromised archive can't do
  • 15:10 - 15:12
    certain attacks.
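
Two of these checks, source availability and version increments, might look roughly like this. The data layout is invented for illustration, and the sketch only tests that the version string changed when the source changed; it does not implement Debian's version ordering.

```python
def check_release(previous: dict[str, tuple[str, str]],
                  current: dict[str, tuple[str, str]],
                  available_sources: set[str]) -> list[str]:
    """Each dict maps a package name to (version, source hash)."""
    problems = []
    for name, (version, source_hash) in current.items():
        if source_hash not in available_sources:
            problems.append(f"{name}: source missing")
        if name in previous:
            old_version, old_hash = previous[name]
            if source_hash != old_hash and version == old_version:
                problems.append(
                    f"{name}: source changed without version increment")
    return problems


# Hypothetical package data for two consecutive releases.
previous = {"hello": ("2.10-1", "h1"), "zlib": ("1.2.11-1", "z1")}
current = {
    "hello": ("2.10-1", "h2"),    # new source, same version
    "zlib": ("1.2.11-2", "z2"),   # fine
    "curl": ("7.60-1", "c1"),     # source not in the log
}
available = {"h2", "z2"}

for problem in check_release(previous, current, available):
    print(problem)
```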
  • 15:13 - 15:19
    Also in this category is the fact that
    we depend on a fixed release frequency
  • 15:19 - 15:27
    and monitors will also be verifying
    the upload ACL,
  • 15:27 - 15:29
    meaning which keys are authorized to
    upload.
  • 15:30 - 15:37
    Monitors also would be verifying
    reproducible builds in this scenario.
  • 15:41 - 15:49
    That's the monitoring functions and
    I think that many different people and
  • 15:49 - 15:56
    groups in Debian could get some benefits
    out of these monitoring functions
  • 15:56 - 16:01
    in order to verify that everything
    worked correctly.
  • 16:01 - 16:05
    We should note that all these verifications
    are completely independent of
  • 16:05 - 16:09
    the existing infrastructure because they
    happen on the client side.
  • 16:10 - 16:16
    So we don't depend on any notifications
    from the existing infrastructure, nor on
  • 16:16 - 16:19
    it working correctly and no notifications
    being suppressed.
  • 16:19 - 16:23
    This can be done completely
    on the client side using
  • 16:23 - 16:26
    the data provided by the log server.
  • 16:27 - 16:35
    For example, maintainers could verify that
    the code they uploaded builds reproducibly
  • 16:35 - 16:38
    using the corresponding buildinfo file, or
  • 16:38 - 16:40
    they could check
  • 16:40 - 16:43
    which uploads were done using their key and
  • 16:43 - 16:48
    which packages were modified, perhaps
    by other people.
  • 16:48 - 16:54
    The keyring maintainers or account
    managers could be looking at
  • 16:54 - 16:59
    the keyring: what keys are in the keyring
    and what uploads were done
  • 16:59 - 17:01
    using which keys.
  • 17:02 - 17:12
    And the archive, last but not least, has
    an additional verification step available
  • 17:12 - 17:18
    to make sure all the metadata was produced
    correctly and that no
  • 17:18 - 17:23
    weird things happened during the production
    of a given release.
  • 17:28 - 17:29
    This thing actually exists.
  • 17:29 - 17:33
    Well, I have programmed prototypes
    for all these components,
  • 17:34 - 17:37
    meaning nothing that would be ready
    to implement,
  • 17:37 - 17:40
    but to show that it actually works.
  • 17:40 - 17:46
    I've used two years of Debian Stretch
    releases and fed them into the system.
  • 17:47 - 17:53
    This resulted in a tree size of
    270,000 elements and
  • 17:53 - 17:59
    the storage required was about 400GB
    where almost all of that is
  • 17:59 - 18:00
    source packages.
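
As a back-of-the-envelope check on these figures: the average element size follows directly from the two numbers given, and an inclusion proof in a balanced hash tree needs at most about log2(n) labels. The 32-byte label size assumes SHA-256 and GB is taken as 10^9 bytes; neither is stated in the talk.

```python
import math

elements = 270_000
storage_bytes = 400 * 10**9   # 400 GB, assuming GB = 10^9 bytes
hash_bytes = 32               # assuming SHA-256 labels

average_element = storage_bytes / elements
proof_hashes = math.ceil(math.log2(elements))
proof_bytes = proof_hashes * hash_bytes

print(f"average element size: {average_element / 10**6:.2f} MB")
print(f"labels per inclusion proof: at most {proof_hashes}")
print(f"inclusion proof size: at most {proof_bytes} bytes")
```

So a client would fetch well under a kilobyte of labels per inclusion proof, while the bulk of the storage is the logged source packages themselves.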
  • 18:01 - 18:05
    I would say that it's eminently feasible
    to do this.
  • 18:05 - 18:10
    The monitor functions run rather cheaply.
  • 18:10 - 18:17
    A monitor need not necessarily keep
    a complete copy of the log in all cases,
  • 18:19 - 18:26
    but I noticed some unexpected events
    in the package metadata.
  • 18:28 - 18:33
    I have observed sources missing and
    version increments missing where
  • 18:33 - 18:36
    I think there should be a version increment
  • 18:37 - 18:42
    So I'll be looking more closely
    into these cases.
  • 18:45 - 18:51
    If anybody is interested in
    the theoretical side of this,
  • 18:51 - 18:55
    this would be the immediate pointers
    I can give.
  • 18:56 - 19:01
    The first paper is the theoretical and
    mathematical foundation and
  • 19:01 - 19:09
    the other ones are applications of
    similar transparency work, but
  • 19:09 - 19:14
    with different goals.
  • 19:18 - 19:26
    Summarizing, we can introduce a system
    to detect targeted backdoors,
  • 19:26 - 19:29
    even under compromise of the archive.
  • 19:30 - 19:36
    We need to add a bit more infrastructure
    and need to change how some things are done
  • 19:38 - 19:47
    We can also improve the auditability, in that
    we can securely identify what happened when
  • 19:47 - 19:49
    things go wrong.
  • 19:50 - 19:56
    In particular, we can make sure that for
    every binary, we can get
  • 19:56 - 20:03
    the source code that was used to produce
    the binary
  • 20:03 - 20:06
    and then identify
    the responsible maintainer.
  • 20:07 - 20:11
    There's one class of attacks I have left
    out for today,
  • 20:11 - 20:14
    if anybody wants to talk about that, we
    can do so too.
  • 20:16 - 20:21
    And now, I'm interested in your questions
    and feedback.
  • 20:23 - 20:28
    [Applause]
  • 20:34 - 20:40
    [Q] Did you already test the reproducibility
    and how do you deal with
  • 20:40 - 20:44
    the problem of non-reproducible packages?
  • 20:44 - 20:47
    I mean, do you not integrate some
    into the log?
  • 20:48 - 20:53
    [A] For now, the implementation of
    my monitor functions hasn't covered
  • 20:53 - 20:55
    reproducibility.
  • 20:55 - 21:01
    I think the first step to do so would be
    to have a blacklist of packages
  • 21:01 - 21:06
    that are known not to build reproducibly
    and then try to get on with it.
  • 21:17 - 21:18
    [Q] Two questions.
  • 21:18 - 21:21
    You say "authenticating metadata and
    code".
  • 21:21 - 21:26
    This means signing or what is it exactly,
    "authenticating"?
  • 21:26 - 21:28
    [A] At which point?
  • 21:29 - 21:33
    [Q] It was… back. Where the tree is.
  • 21:34 - 21:37
    Yes, yes. The tree before that.
  • 21:40 - 21:40
    [A] Ok.
  • 21:42 - 21:49
    This authentication here doesn't quite mean
    a signature.
  • 21:50 - 21:57
    It means if I know the value of the root
    of the hash tree, then
  • 21:57 - 22:04
    I can be assured that a given element
    is included if I'm told
  • 22:04 - 22:11
    the value of the three gray marked
    inner nodes here.
  • 22:11 - 22:16
    And that works by recomputing
    the hash tree.
  • 22:20 - 22:24
    [Q] Ok, I think I have to defer this
    to after the talk.
  • 22:24 - 22:26
    [A] Yeah, I can explain.
  • 22:27 - 22:29
    [Q] Another question would be,
  • 22:29 - 22:32
    so, detection of targeted backdoors.
  • 22:33 - 22:39
    You mean at the stage of signing the archive
    or which backdoors?
  • 22:41 - 22:46
    [A] The scenario would be that
    the signing key of the archive is
  • 22:46 - 22:53
    used to create an additional release file
    which covers
  • 22:53 - 22:55
    a manipulated software version.
  • 22:55 - 23:00
    And this software version and signature are
    only shown to the victim population
  • 23:00 - 23:02
    and not to the general population.
  • 23:03 - 23:08
    This means that the malicious software
    would only be observed by the victim
  • 23:08 - 23:10
    and not by everybody else.
  • 23:11 - 23:15
    My goal is to force the attacker to
    distribute the malicious software
  • 23:15 - 23:18
    to the whole world in order to increase
  • 23:18 - 23:22
    the chance that they're going to be
    detected, thereby perhaps deterring
  • 23:22 - 23:24
    the attack from the beginning.
  • 23:40 - 23:42
    [Q] Great talk. Great ideas as well.
  • 23:44 - 23:47
    I really liked your slide on
    your assumptions
  • 23:47 - 23:50
    ???
    honest about them like
  • 23:50 - 23:51
    "yeah we assume ???"
  • 23:53 - 23:56
    I wouldn't underestimate how difficult
    it would be to make
  • 23:56 - 23:57
    some of these changes.
  • 23:57 - 24:01
    I mean, even ones that look simple, like
    source-only uploads.
  • 24:01 - 24:03
    Everyone wants them, right?
  • 24:05 - 24:11
    [A] Yes, sure, we have to start somewhere
    and I hope if people are convinced that
  • 24:12 - 24:14
    this is a great idea and we should do this
  • 24:14 - 24:18
    then we get some more impetus
    for these things that everybody wants
  • 24:18 - 24:21
    like source-only uploads.
  • 24:22 - 24:25
    [Q] Thank you, yeah, and it will be really
    pretty good to base this stuff
  • 24:25 - 24:29
    on the reproducible builds effort because
    it builds on the same choices.
  • 24:30 - 24:31
    Thank you.
  • 24:34 - 24:37
    [A] Yeah, so I'm interested in any kind
    of feedback.
  • 24:38 - 24:43
    If you think it's a great idea or think
    there are some problems I might have missed
  • 24:43 - 24:46
    or it might get difficult to implement.
  • 24:48 - 24:52
    Please come talk to me in case you have
    anything.
  • 24:53 - 24:59
    [Applause]
Title:
Software transparency: package security beyond signatures and reproducible builds
Description:

Talk given by Benjamin Hof at Minidebconf Hamburg 18
https://meetings-archive.debian.net/pub/debian-meetings/2018/miniconf-hamburg/2018-05-19/software_transparency.webm

Video Language:
English
Team:
Debconf
Project:
2018_mini-debconf-hamburg
Duration:
25:04
