Software transparency: package security beyond signatures and reproducible builds

0:08 - 0:12

This will be an academic talk
as announced.
0:12 - 0:19

I will try to bring some of my research
I did during my PhD into the real world.
0:21 - 0:26

We are going to talk about the security
of software distribution and
0:26 - 0:30

I'm going to propose a security feature
that adds on top of
0:30 - 0:33

the signatures we have up to today
0:33 - 0:40

and also the reproducible builds that
we already have to very large degree.
0:40 - 0:47

I am going to highlight a few points where
I think infrastructure changes are required
0:47 - 0:53

to accommodate this system and I would
also appreciate any feedback
0:53 - 0:55

you might have.
0:56 - 1:00

I'm going to ??? some motivation for
what we should care about.
1:01 - 1:03

In the security of software distribution
1:03 - 1:06

we already do have
cryptographic signatures
1:06 - 1:11

I've just put up a few examples of
recent attacks that involved
1:11 - 1:19

the distribution of software where
people who presumably thought
1:19 - 1:24

they knew what they were doing had
grave problems with software distribution.
1:25 - 1:28

For example, the Juniper backdoors,
pretty famous.
1:29 - 1:33

Juniper discovered two backdoors
in the code and
1:33 - 1:38

nobody really knew where they were
coming from.
1:39 - 1:43

Another example would be
Chrome extension developers
1:43 - 1:48

who got their credentials phished and
subsequently their extensions backdoored
1:48 - 1:58

or another example, a signed update to
a banking software actually included
1:58 - 2:02

malware and infected several banks.
2:03 - 2:11

I hope this is motivation for us to
consider these kinds of attacks
2:11 - 2:15

to be possible and to prepare ourselves.
2:17 - 2:21

I have two main goals in the system
I am going to propose.
2:21 - 2:25

The first is to relax trust in the archive.
2:25 - 2:31

In particular, what I want to achieve is
a level of security even if
2:31 - 2:38

the archive is compromised and
the specific thing I am going to do is
2:38 - 2:41

to detect targeted backdoors.
2:41 - 2:47

That means backdoors that are distributed
only to a subset of the population and
2:47 - 2:53

what we can achieve is to force
the attacker to deliver the malware
2:53 - 3:00

to everybody, thereby greatly decreasing
their degree of stealth and increasing
3:00 - 3:02

their danger of detection.
3:03 - 3:05

This would work to our advantage.
3:06 - 3:10

The second goal is forensic auditability,
3:10 - 3:17

which overlaps to a surprising degree
with the first one in technical terms,
3:17 - 3:19

in terms of implementation.
3:20 - 3:24

So, what I want to ensure is that we have
3:24 - 3:27

inspectable source code for every binary.
3:28 - 3:33

We do have of course the source code
available from our packages, but
3:33 - 3:39

only for the most recent version,
everything else is a best effort
3:39 - 3:43

by the code archiving services.
3:44 - 3:51

The mapping between those and binary
can be verified once we have
3:51 - 3:54

reproducible builds to a large extent.
3:55 - 4:01

I want to make sure that we can identify
the maintainer responsible for distribution
4:01 - 4:08

of a particular package and the system
is also interested in providing
4:08 - 4:11

attribution of where something went wrong,
4:11 - 4:16

so that we are not in a situation where we
notice something went wrong but
4:16 - 4:22

we don't really know where we have to look
in order to find the problems
4:22 - 4:29

but that we really have specific and
secured indication of
4:29 - 4:32

where a compromise or problem
was coming from.
4:34 - 4:37

Let's quickly recap how our software
distribution works.
4:37 - 4:42

We have the maintainers who upload
their code to the archive.
4:42 - 4:47

The archive has access to a signing key
which signs the releases.
4:48 - 4:52

Actually, metadata covering all the actual
binary packages.
4:53 - 4:57

These are distributed over
the mirror network
4:57 - 5:02

from where the apt clients will download
the package metadata.
5:02 - 5:07

That means the hash sums for the packages,
their dependencies and so on
5:07 - 5:10

as well as the actual packages themselves.
5:12 - 5:19

This central architecture has an important
advantage,
5:19 - 5:23

namely that the mirror network need not
be trusted, right?
5:23 - 5:29

We have the signature that covers all
the contents of binary and source packages
5:29 - 5:33

and the metadata, so the mirror network
need not be trusted.
5:33 - 5:39

On the other hand, it makes the archive and
its signing key a very interesting target
5:39 - 5:46

for attackers because this central point
controls all the signing operations.
5:46 - 5:52

So this is a place where we need to be
particularly careful and perhaps
5:52 - 5:56

maybe even do better than
cryptographic signatures.
5:57 - 6:02

This is where the main focus of this talk
will be, although I will also consider
6:02 - 6:05

the uploaders to some extent.
6:07 - 6:08

We want to achieve two things:
6:08 - 6:12

resistance against key compromise and
targeted backdoors and
6:12 - 6:18

to get some better support for auditing
in case things go wrong.
6:18 - 6:22

The approach that we choose to do this is
6:22 - 6:27

we want to make sure that everybody runs
exactly the same software
6:27 - 6:30

or at least the parts of it they choose
to install.
6:31 - 6:35

If we think about that for a moment,
this gives us a number of advantages.
6:35 - 6:40

For example, all the analysis that's done
on a piece of software immediately
6:40 - 6:44

carries over to all other users of
the software, right?
6:44 - 6:48

Because if we haven't made sure that
everybody installs the same software,
6:48 - 6:51

they might not have exactly
the same version and perhaps
6:51 - 6:53

some backdoored version.
6:54 - 7:01

This also ensures that we cannot suffer
targeted backdoors by increasing
7:01 - 7:04

the detection risk of attackers
7:04 - 7:09

and we also want to have a cryptographic
proof of where something went wrong.
7:11 - 7:19

Now, to look at some pictures,
I will present the data structure that
7:19 - 7:22

we use in order to achieve these goals.
7:23 - 7:28

The data structure is a hash tree,
a Merkle tree which is
7:28 - 7:31

a data structure that operates over a list.
7:31 - 7:35

So we have a list of these squares here
which represent the list items.
7:35 - 7:39

In our case, this is going to be
the files containing the package metadata
7:39 - 7:42

that lists dependencies and hash sums of
packages,
7:42 - 7:47

and also the source packages themselves
are going to be elements in this list.
7:48 - 7:49

The tree works as follows.
7:50 - 7:52

It uses a cryptographic hash function
7:52 - 7:56

which is a collision-resistant compressing
function
7:56 - 8:01

and the labels of the inner nodes
of the tree are computed as
8:01 - 8:05

the hashes of the children. Ok?
8:06 - 8:10

Once we have computed the root hash,
the root label,
8:10 - 8:15

we have fixed all the elements and
none of the elements can be changed
8:15 - 8:17

without changing the root hash.
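
A minimal sketch of the tree computation described here. SHA-256, the one-byte prefixes that separate leaf hashes from inner-node hashes, and the rule of promoting an unpaired node to the next level are assumptions made for illustration; the talk does not specify them. The function names are hypothetical.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # Assumed convention: prefix 0x00 marks a leaf (a list item).
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Assumed convention: prefix 0x01 marks an inner node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(items: list[bytes]) -> bytes:
    """Compute the root label over a list of items."""
    if not items:
        return hashlib.sha256(b"").digest()
    level = [leaf_hash(item) for item in items]
    while len(level) > 1:
        # Label of each inner node is the hash of its two children.
        nxt = [node_hash(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired node moves up unchanged
        level = nxt
    return level[0]


items = [b"Release", b"Packages", b"Sources", b"hello_2.10.dsc"]
root = merkle_root(items)

# Changing any single element changes the root.
tampered = list(items)
tampered[2] = b"Sources (manipulated)"
print(root != merkle_root(tampered))  # True
```

The item names in the usage lines are placeholders, not real log entries.
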
8:18 - 8:22

We can exploit this in order to
efficiently prove
8:22 - 8:26

the two following properties for elements.
8:26 - 8:31

First of all, we can efficiently prove
the inclusion of a given element
8:31 - 8:32

in the list.
8:32 - 8:37

If we know the tree root ???,
this works as follows:
8:37 - 8:41

let's make a quick example, we see
the third list item is marked with an X
8:41 - 8:50

and if I know the tree root, then
the server operating the tree structure
8:50 - 8:55

will only need to give me the three gray
marked labels,
8:55 - 9:01

the three marked node values and then
I can recompute the root hash and
9:01 - 9:06

be convinced that this element actually
was contained in the list.
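
The inclusion check can be sketched as follows. This is an illustration under the same assumptions as any simple hash tree (SHA-256, prefixed leaf and node hashes, unpaired nodes promoted unchanged), not the prototype's actual protocol; all function names are hypothetical. For a list of eight items the proof consists of three hashes.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(items: list[bytes]) -> list[list[bytes]]:
    """All tree levels, from the leaf hashes up to the root."""
    levels = [[leaf_hash(item) for item in items]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [node_hash(cur[i], cur[i + 1])
               for i in range(0, len(cur) - 1, 2)]
        if len(cur) % 2 == 1:
            nxt.append(cur[-1])  # unpaired node moves up unchanged
        levels.append(nxt)
    return levels


def inclusion_proof(items: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Server side: sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in build_levels(items)[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            # Record the sibling and whether it sits to the left.
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_inclusion(item: bytes, proof: list[tuple[bytes, bool]],
                     root: bytes) -> bool:
    """Client side: recompute the root from the item and the proof."""
    h = leaf_hash(item)
    for sibling, sibling_is_left in proof:
        h = node_hash(sibling, h) if sibling_is_left else node_hash(h, sibling)
    return h == root


items = [b"item-%d" % i for i in range(8)]
root = build_levels(items)[-1][0]
proof = inclusion_proof(items, 2)          # the third list item
print(len(proof))                          # 3
print(verify_inclusion(items[2], proof, root))   # True
print(verify_inclusion(b"other", proof, root))   # False
```

The client only needs the item, the known root and the handful of hashes in the proof; it never sees the rest of the list.
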
9:07 - 9:12

The second property is that we can also
efficiently verify the append-only operation
9:12 - 9:14

of the list.
9:14 - 9:18

So we can have a log server operating
this kind of structure and
9:18 - 9:19

the log server need not be trusted,
9:19 - 9:24

it's not going to be a trusted third party
but rather, its operation can be
9:24 - 9:26

verified from the outside.
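
A naive way to check append-only operation from the outside, assuming the verifier holds a full copy of the list (as a monitor might). The efficient variant mentioned in the talk needs only a short proof from the log server and is not shown here. SHA-256, the hash prefixes and the function names are assumptions for illustration.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(items: list[bytes]) -> bytes:
    if not items:
        return hashlib.sha256(b"").digest()
    level = [leaf_hash(item) for item in items]
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired node moves up unchanged
        level = nxt
    return level[0]


def is_append_only(old_size: int, old_root: bytes,
                   new_items: list[bytes], new_root: bytes) -> bool:
    """True if the new list extends the old one without rewriting history."""
    if old_size > len(new_items):
        return False  # entries were removed
    # The first old_size items must still hash to the root seen earlier,
    # and the full list must hash to the newly announced root.
    return (merkle_root(new_items[:old_size]) == old_root
            and merkle_root(new_items) == new_root)


log_v1 = [b"release-1", b"release-2", b"release-3"]
root_v1 = merkle_root(log_v1)

log_v2 = log_v1 + [b"release-4"]
print(is_append_only(3, root_v1, log_v2, merkle_root(log_v2)))    # True

rewritten = [b"release-1", b"release-X", b"release-3", b"release-4"]
print(is_append_only(3, root_v1, rewritten, merkle_root(rewritten)))  # False
```
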
9:28 - 9:31

So, what does this design look like?
9:32 - 9:36

The theoretical foundation is called
a transparency overlay and
9:36 - 9:38

in our system it looks like this:
9:38 - 9:41

We have the archive as per usual,
9:41 - 9:47

we have a log server and the archive will
submit package metadata, the release file,
9:47 - 9:52

the packages file containing dependencies
and so on and the source code
9:52 - 9:55

into this log server.
9:56 - 10:04

The apt client will be augmented with
an auditor component and
10:04 - 10:10

this auditor component is responsible for
verifying the correct log operation
10:10 - 10:15

as well as the inclusion of the downloaded
release into the log.
10:16 - 10:21

This is the mechanism by which we will be able
to make sure that everybody is running
10:21 - 10:25

the exact same version of the software
they installed.
10:27 - 10:29

A third component is the monitor.
10:30 - 10:37

The monitor is necessary also to verify
log operation and also to inspect
10:37 - 10:42

the elements that are contained in the log
10:44 - 10:52

The monitor would then be run by groups
of individuals or individuals that want to
10:52 - 10:57

make sure of certain properties in the log
11:01 - 11:06

Alright, let's quickly recap.
11:07 - 11:13

We have added this log server, which
can prove two properties efficiently
11:13 - 11:16

to the outside world.
11:17 - 11:21

And we have the auditor and monitor
components.
11:21 - 11:27

The auditor is added to the apt client
and the monitor does
11:27 - 11:30

additional investigative tasks.
11:31 - 11:38

Now, in order to make this system work,
we need to…
11:39 - 11:41

I need to make a few assumptions.
11:41 - 11:48

The archive will need to handle
log submission and distribution of
11:48 - 11:51

certain log data structures.
11:52 - 11:56

These are usually very small things
given to the archive
11:56 - 11:58

in response to submission.
11:59 - 12:04

Then I'm assuming a very consistent
release frequency.
12:05 - 12:11

The archive is responsible for distributing
reproducible binaries
12:11 - 12:13

in my architecture.
12:14 - 12:20

I'm assuming that the buildinfo files are
covered by the release file
12:21 - 12:28

I treat them as additional source metadata
so whenever the source package or
12:28 - 12:35

the buildinfo file changes, I expect
an increase in the binary version number.
12:36 - 12:42

I also assume source-only uploads and
one additional thing that we have,
12:42 - 12:49

a keyring source package treated
by the archive as authoritative and
12:49 - 12:53

this keyring must have
the special property that
12:53 - 12:58

it is operated append-only, so that
we can go back in time and see
12:58 - 13:01

what keys were authorized at different
points in time.
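
One way to picture an append-only keyring is as a history of events that is only ever extended, so the set of authorized keys at any earlier release can be replayed. The event format, the field names and the fingerprints below are hypothetical; the talk does not describe the keyring package's layout.

```python
from typing import NamedTuple


class KeyringEvent(NamedTuple):
    release: int      # index of the release in which the change appeared
    action: str       # "add" or "revoke"
    fingerprint: str  # key fingerprint (placeholder values below)


def authorized_keys(history: list[KeyringEvent], release: int) -> set[str]:
    """Replay the history (assumed sorted by release) up to `release`."""
    keys: set[str] = set()
    for event in history:
        if event.release > release:
            break
        if event.action == "add":
            keys.add(event.fingerprint)
        elif event.action == "revoke":
            keys.discard(event.fingerprint)
    return keys


def extends(old: list[KeyringEvent], new: list[KeyringEvent]) -> bool:
    """Append-only check: the new history must start with the old one."""
    return new[:len(old)] == old


history = [
    KeyringEvent(1, "add", "AAAA"),
    KeyringEvent(2, "add", "BBBB"),
    KeyringEvent(5, "revoke", "AAAA"),
]

print(sorted(authorized_keys(history, 3)))  # ['AAAA', 'BBBB']
print(sorted(authorized_keys(history, 5)))  # ['BBBB']
```

Because revocations are recorded as new events instead of deletions, an upload signed with key AAAA at release 3 can still be judged against the keys that were valid at that time.
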
13:04 - 13:11

The log server is a standalone server
component that at the moment speaks
13:11 - 13:13

an HTTP-based protocol.
13:14 - 13:17

Probably one would want to have
more than one, but
13:17 - 13:21

we are going to have, I think,
a much easier time running log servers
13:21 - 13:24

than for example, the certificate
transparency people
13:24 - 13:30

because we only have one source
of writing access,
13:30 - 13:36

namely the archive, so we can easily
schedule the write access,
13:36 - 13:41

and you can have read-only frontends that
aren't quite critical.
13:43 - 13:50

The auditor component would need to be
integrated into the apt client or library.
13:50 - 13:54

It needs to do things like cryptographic
verifications,
13:54 - 14:00

understand a few more file formats and
some more network access.
14:03 - 14:07

Parts of the proof could also
probably be distributed over
14:07 - 14:12

the mirror network and we need
not necessarily do everything
14:13 - 14:15

??? communication with
the log server.
14:19 - 14:23

So, this covers archive, auditor and
log server.
14:24 - 14:31

The monitoring servers have a few functions
that are necessary for the verification of
14:31 - 14:38

the log itself, meaning that they verify
the append-only operation of the log
14:38 - 14:43

and they will also likely want to exchange
the tree roots with perhaps
14:43 - 14:45

other monitors and some auditors.
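
Exchanging tree roots is what exposes a log that shows different contents to different parties. A minimal sketch, assuming each party records the (tree size, root hash) pairs it has seen; in a real deployment these would be signed by the log, which is omitted here. The type and function names are hypothetical.

```python
from typing import NamedTuple


class TreeHead(NamedTuple):
    tree_size: int
    root_hash: bytes


def find_split_views(mine: list[TreeHead],
                     theirs: list[TreeHead]) -> list[tuple[TreeHead, TreeHead]]:
    """Return pairs of tree heads with equal size but different roots.

    A correctly operating append-only log has exactly one root per tree
    size, so any such pair is evidence of misbehaviour.
    """
    seen = {head.tree_size: head for head in mine}
    return [(seen[head.tree_size], head)
            for head in theirs
            if head.tree_size in seen
            and seen[head.tree_size].root_hash != head.root_hash]


monitor_view = [TreeHead(100, b"root-a"), TreeHead(120, b"root-b")]
victim_view = [TreeHead(100, b"root-a"), TreeHead(120, b"root-EVIL")]

conflicts = find_split_views(monitor_view, victim_view)
print(len(conflicts))  # 1
```

Heads of different sizes cannot be compared this way; for those, the append-only check between the two sizes is needed.
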
14:47 - 14:51

The important verification functions
of the log server are validating
14:51 - 14:57

the metadata of the release packages and
sources file,
14:57 - 15:01

namely making sure that these are complete,
that the sources are available,
15:01 - 15:06

that the versions are incremented
correctly and so on.
15:06 - 15:10

And that's necessary to make sure that
a compromised archive can't do
15:10 - 15:12

certain attacks.
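
One of these metadata checks, the rule that changed sources must come with a version increment, could look roughly like this. The snapshot format (package name mapped to version and source digest) is hypothetical, and the version comparison is deliberately simplified to dotted integers; real Debian version strings have epochs, tildes and revisions and need Debian's own comparison rules (for example via `dpkg --compare-versions`).

```python
def parse_version(version: str) -> tuple[int, ...]:
    # Simplifying assumption: versions look like "1.2.3".
    return tuple(int(part) for part in version.split("."))


def missing_increments(old: dict[str, tuple[str, str]],
                       new: dict[str, tuple[str, str]]) -> list[str]:
    """Packages whose source digest changed without a version increase.

    Both arguments map package name -> (version, source digest) for two
    consecutive releases.
    """
    problems = []
    for name, (new_version, new_digest) in new.items():
        if name not in old:
            continue  # newly introduced package, nothing to compare
        old_version, old_digest = old[name]
        if (new_digest != old_digest
                and parse_version(new_version) <= parse_version(old_version)):
            problems.append(name)
    return sorted(problems)


release_n = {"foo": ("1.0.0", "d1"), "bar": ("2.3", "d2"), "baz": ("0.9", "d3")}
release_n1 = {"foo": ("1.0.1", "d4"),   # changed, version increased: fine
              "bar": ("2.3", "d5"),     # changed, same version: flagged
              "baz": ("0.9", "d3")}     # unchanged: fine

print(missing_increments(release_n, release_n1))  # ['bar']
```
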
15:13 - 15:19

Also in this category is the fact that
we depend on a fixed release frequency
15:19 - 15:27

and monitors will also be verifying
the upload ACL,
15:27 - 15:29

meaning which keys are authorized to
upload.
15:30 - 15:37

Monitors also would be verifying
reproducible builds in this scenario.
15:41 - 15:49

That's the monitoring functions and
I think that many different people and
15:49 - 15:56

groups in Debian could get some benefits
out of these monitoring functions
15:56 - 16:01

in order to verify that everything
worked correctly.
16:01 - 16:05

We should note that all these verifications
are completely independent of
16:05 - 16:09

the existing infrastructure because
they happen on the client side.
16:10 - 16:16

So we don't depend on the assumption that
the existing infrastructure's notifications
16:16 - 16:19

work correctly and that no notifications
are stopped.
16:19 - 16:23

This can be done completely
on the client side using
16:23 - 16:26

the data provided by the log server.
16:27 - 16:35

For example, maintainers could verify that
the code uploaded builds reproducibly
16:35 - 16:38

using the corresponding build info or
16:38 - 16:40

they could have checks:
16:40 - 16:43

which uploads were done using their key
16:43 - 16:48

which packages were modified perhaps
by other people
16:48 - 16:54

the keyring maintainers or account
managers could be looking at
16:54 - 16:59

the keyring: what keys are in the keyring
and what uploads were done
16:59 - 17:01

using which keys.
17:02 - 17:12

And the archive, last but not least, has
an additional verification step available
17:12 - 17:18

to make sure all the metadata was produced
correctly and that no
17:18 - 17:23

weird things happened during the production
of a given release.
17:28 - 17:29

This thing actually exists.
17:29 - 17:33

Well, I have programmed prototypes
for all these components,
17:34 - 17:37

meaning nothing that would be ready
to implement,
17:37 - 17:40

but to show that it actually works.
17:40 - 17:46

I've used two years of Debian Stretch
releases and fed them into the system.
17:47 - 17:53

This resulted in a tree size of
270,000 elements and
17:53 - 17:59

the storage required was about 400GB
where almost all of that is
17:59 - 18:00

source packages.
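
A back-of-the-envelope calculation from these two figures. It assumes 32-byte hashes (SHA-256) and decimal gigabytes; neither is stated in the talk.

```python
import math

tree_size = 270_000            # elements in the log (from the talk)
storage_bytes = 400 * 10**9    # about 400 GB, assuming 1 GB = 10^9 bytes
hash_len = 32                  # assumption: SHA-256 digests

# An inclusion proof carries at most one sibling hash per tree level.
max_proof_hashes = math.ceil(math.log2(tree_size))
max_proof_bytes = max_proof_hashes * hash_len

# Average size of a log element, dominated by source packages.
avg_element_bytes = storage_bytes / tree_size

print(max_proof_hashes)                   # 19
print(max_proof_bytes)                    # 608
print(round(avg_element_bytes / 10**6, 2))  # 1.48 (MB)
```

So a client checking one element against a known root would download well under a kilobyte of proof data, while the log itself stores roughly 1.5 MB per element on average.
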
18:01 - 18:05

I would say that it's eminently feasible
to do this.
18:05 - 18:10

The monitor functions run rather cheaply.
18:10 - 18:17

A monitor need not necessarily keep
a complete copy of the log in all cases
18:19 - 18:26

but what I did notice were some unexpected events
in the package metadata.
18:28 - 18:33

I have observed sources missing and
version increments missing where
18:33 - 18:36

I think there should be a version increment
18:37 - 18:42

So I'll be looking more closely
into these cases.
18:45 - 18:51

If anybody is interested in
the theoretical side of this,
18:51 - 18:55

this would be the immediate pointers
I can give.
18:56 - 19:01

The first paper is the theoretical and
mathematical foundation and
19:01 - 19:09

the other ones are applications of
similar transparency work, but
19:09 - 19:14

with different goals.
19:18 - 19:26

Summarizing, we can introduce a system
to detect targeted backdoors,
19:26 - 19:29

even under compromise of the archive.
19:30 - 19:36

We need to add a bit more infrastructure
and need to change how some things are done
19:38 - 19:47

We can also improve auditability, so that
we can securely identify what happened when
19:47 - 19:49

things go wrong.
19:50 - 19:56

In particular, we can make sure that for
every binary, we can get
19:56 - 20:03

the source code that was used to produce
the binary
20:03 - 20:06

and then identify
the responsible maintainer.
20:07 - 20:11

There's one class of attacks I have left
out for today,
20:11 - 20:14

if anybody wants to talk about that, we
can do so too.
20:16 - 20:21

And now, I'm interested in your questions
and feedback.
20:23 - 20:28

[Applause]
20:34 - 20:40

[Q] Did you already test the reproducibility
and how do you interact with
20:40 - 20:44

problems of non-reproducible packages?
20:44 - 20:47

I mean, do you not integrate some
into the log?
20:48 - 20:53

[A] For now, the implementation of
my monitor functions hasn't covered
20:53 - 20:55

reproducibility.
20:55 - 21:01

I think the first step to do so would be
to have a blacklist of packages
21:01 - 21:06

that are known not to be built reproducibly
and then try to get on with it.
21:17 - 21:18

[Q] Two questions.
21:18 - 21:21

You say "authenticating metadata and
code".
21:21 - 21:26

This means signing or what is it exactly,
"authenticating"?
21:26 - 21:28

[A] At which point?
21:29 - 21:33

[Q] It was… back. Where the tree is.
21:34 - 21:37

Yes, yes. The tree before that.
21:40 - 21:40

[A] Ok.
21:42 - 21:49

This authentication here doesn't quite mean
a signature.
21:50 - 21:57

It means if I know the value of the root
of the hash tree, then
21:57 - 22:04

I can be assured that a given element
is included if I'm told
22:04 - 22:11

the value of the three gray marked
inner nodes here.
22:11 - 22:16

And that works by recomputing
the hash tree.
22:20 - 22:24

[Q] Ok, I think I have to defer this
to after the talk.
22:24 - 22:26

[A] Yeah, I can explain.
22:27 - 22:29

[Q] Another question would be,
22:29 - 22:32

so, detection of targeted backdoors.
22:33 - 22:39

You mean at the stage of signing the archive
or which backdoors?
22:41 - 22:46

[A] The scenario would be that
the signing key of the archive is
22:46 - 22:53

used to create an additional release file
which covers
22:53 - 22:55

a manipulated software version.
22:55 - 23:00

And this software version and signature are
only shown to the victim population
23:00 - 23:02

and not to the general population.
23:03 - 23:08

This means that the malicious software
would only be observed by the victim
23:08 - 23:10

and not by everybody else.
23:11 - 23:15

My goal is to force the attacker to
distribute the malicious software
23:15 - 23:18

to the whole world in order to increase
23:18 - 23:22

the chance that they're going to be
detected and thereby deterring perhaps
23:22 - 23:24

the attack from the beginning.
23:40 - 23:42

[Q] Great talk. Great ideas as well.
23:44 - 23:47

I really liked your slide on
your assumptions
23:47 - 23:50

???
honest about them like
23:50 - 23:51

"yeah we assume ???"
23:53 - 23:56

I wouldn't underestimate how difficult
it would be to make
23:56 - 23:57

some of these changes.
23:57 - 24:01

I mean, even ones that look simple, like
source-only uploads.
24:01 - 24:03

Everyone wants them, right?
24:05 - 24:11

[A] Yes, sure, we have to start somewhere
and I hope if people are convinced that
24:12 - 24:14

this is a great idea and we should do this
24:14 - 24:18

then we get some more impetus
for these things that everybody wants
24:18 - 24:21

like source-only uploads.
24:22 - 24:25

[Q] Thank you, yeah, and it will be really
pretty good to base this stuff
24:25 - 24:29

on reproducible builds effort because
it builds on the same choices.
24:30 - 24:31

Thank you.
24:34 - 24:37

[A] Yeah, so I'm interested in any kind
of feedback.
24:38 - 24:43

If you think it's a great idea or think
there are some problems I might have missed
24:43 - 24:46

or it might get difficult to implement.
24:48 - 24:52

Please come talk to me in case you have
anything.
24:53 - 24:59

[Applause]

Title:: Software transparency: package security beyond signatures and reproducible builds
Description:: Talk given by Benjamin Hof at Minidebconf Hamburg 18
https://meetings-archive.debian.net/pub/debian-meetings/2018/miniconf-hamburg/2018-05-19/software_transparency.webm

Video Language:: English
Team:: Debconf
Project:: 2018_mini-debconf-hamburg
Duration:: 25:04