36C3 - How to Break PDFs

0:00 - 0:19

36c3 preroll music
0:19 - 0:25

Herald: The next talk is on how to break
PDFs, breaking the encryption and the
0:25 - 0:33

signatures, by Fabian Ising and Vladislav
Mladenov. Their talk was accepted at CCS
0:33 - 0:38

this year in London and they had that in
November. It comes from research that
0:38 - 0:44

basically produced two different kinds of
papers and it has been... people worldwide
0:44 - 0:48

have been interested in what has been
going on. Please give them a great round
0:48 - 0:52

of applause and welcome them to the stage.
0:52 - 0:59

Applause
0:59 - 1:12

Vladi: So can you hear me? Yeah. Perfect.
OK. Now you can see the slides. My name is
1:12 - 1:15

Vladislav Mladenov, or just Vladi if you
have some questions for me, and this is
1:15 - 1:21

Fabian. And we are allowed today to talk
about how to break PDF security or more
1:21 - 1:28

specifically about how to break the
cryptographic operations in PDF files. We
1:28 - 1:37

are a large team from the University of
Bochum, Münster and Hackmanit GmbH. So as
1:37 - 1:46

I mentioned: We will talk about
cryptography and PDF files. Does it work?
1:46 - 1:58

Fabian: All right. OK. Let's try that
again. Okay.
1:58 - 2:02

Vladi: Perfect. This talk will consist of
two parts. The first part is about
2:02 - 2:08

digitally signed PDF files and how can we
recognize such files? If we open them we
2:08 - 2:16

see the information regarding that the
file was signed and all verification
2:16 - 2:21

procedures were valid. And more
information regarding the signature
2:21 - 2:27

validation panel and information about who
signed this file. This is the first part
2:27 - 2:36

of the talk and I will present this topic.
And the second part is regarding PDF
2:36 - 2:41

encrypted files and how can we recognize
such files? If you tried to open such
2:41 - 2:47

files, the first thing you see is the
password prompt. And after entering the
2:47 - 2:52

correct password, the file is decrypted
and you can read the content within this
2:52 - 2:58

file. If you open it with Adobe,
additional information regarding if this
2:58 - 3:04

file is secured or not is displayed
further. And this is the second part of
3:04 - 3:12

our talk, and Fabian will talk about how
we can break the PDF encryption. So before we
3:12 - 3:19

start with the attacks on signatures or
encryption, we first need some basics. And
3:19 - 3:23

after six slides, you will be experts
regarding PDF files and you will
3:23 - 3:29

understand everything about it. But maybe
it's a little bit boring, so be patient:
3:29 - 3:35

there are only 6 slides. So the first is
quite easy. PDF files are... the first
3:35 - 3:42

specification was in 1993 and almost at
the beginning PDF cryptography operations
3:42 - 3:49

like signatures and encryption were already
there. The last version is PDF 2.0 and it
3:49 - 3:58

was released in 2017. And according to
Adobe 1.6 billion files are on the web and
3:58 - 4:06

perhaps more are exchanged beyond the web. So
basically PDF files are everywhere. And
4:06 - 4:12

that's the reason why we consider this
topic and tried to find or to analyze the
4:12 - 4:20

security of the features. If we have some
very simple file and we open it with Adobe
4:20 - 4:25

Reader, the first thing we see is, of
course, the content. "Hello, world!" in
4:25 - 4:32

this case, and additional information
regarding the focused page and how many
4:32 - 4:40

pages this document has. But what would
happen if we don't use a PDF viewer and
4:40 - 4:48

just use some text editor? We use the
Notepad++ to open and later manipulate the
4:48 - 4:56

files. So I will zoom this thing... this
file. And the first thing we see is that
4:56 - 5:04

we can read it. Perhaps it's quite, quite
funny. But we can still extract some
5:04 - 5:11

information of this file. For example,
some information regarding the pages. And
5:11 - 5:20

here you can see the information that the
PDF file consists of one page. But more
5:20 - 5:27

interesting is that we can see the
content of the file itself. So the lessons
5:27 - 5:35

we learned is that we can use a simple
text editor to view and edit PDF files.
5:35 - 5:44

And for our attacks, we used only this
text editor. So let's go to the details.
5:44 - 5:52

How PDF files are structured and how they
are processed. PDF files consist of 4
5:52 - 5:59

parts: header, body and body is the most
important part of the PDF files. The body
5:59 - 6:04

contains the entire information presented
to the user. And 2 other sections: Xref
6:04 - 6:11

section and trailer. A very important thing
about processing PDF files is that
6:11 - 6:18

they're processed not from the top to the
bottom, but from the bottom to the top. So
6:18 - 6:24

the first thing that the PDF viewer
analyzes or processes is the trailer. So
6:24 - 6:29

let's start doing that. What information
is stored in this trailer? Basically, there
6:29 - 6:36

are two very important pieces of information. On
the one hand, there is the information:
6:36 - 6:41

what is the root element of this PDF? So
which is the first object which will be
6:41 - 6:48

processed? And the second important
information is where the Xref section
6:48 - 6:54

starts. It's just a byte offset pointing
to the position of the XRef section within
6:54 - 7:00

the PDF file. So this pointer, as
mentioned before, points to the Xref
7:00 - 7:06

section. But what is the Xref section
about? The Xref section is a catalog
7:06 - 7:11

pointing or holding the information where
the objects defined in the body are
7:11 - 7:19

contained or the byte positions of this
object. So how can we read this weird Xref
7:19 - 7:26

section? The first information we extract
is that the first object, which is defined
7:26 - 7:35

here, is the object with ID 0 and we have
5 further elements or objects which are
7:35 - 7:41

defined. So the first object is here. The
first entry is the byte position within
7:41 - 7:47

the file. The second is its generation
number. And the last character indicates whether
7:47 - 7:53

this object is used or not used. So
reading it, reading this Xref section, we
7:53 - 8:01

extract the information that the object
with ID 0 is at byte position 0 and is not
8:01 - 8:09

in use. So the object with ID 1 is at the
position 9 and so on and so forth. So for
8:09 - 8:18

the object with ID 4 and the object number
comes from counting it: 0 1, 2, 3 and 4.
8:18 - 8:29

So the object with ID 4 can be found at
the offset 184 and it's in use. In other
8:29 - 8:35

words, the PDF viewer knows where each
object will be found and can properly
8:35 - 8:42

display it and process it. Now we come to
the most important part: the body, and I
8:42 - 8:49

mentioned it that in the body the entire
content which is presented to the user is
8:49 - 8:58

contained. So let's see. Object 4 0 is
this one and as you can see, it contains
8:58 - 9:05

the word "Hello World". The other objects
are referenced, too. So each pointer
9:05 - 9:10

points exactly to the starting position of
each of the objects. And how can we read
9:10 - 9:16

this object? You see, we have an object
starting with the ID number, then the
9:16 - 9:25

generation number and the word "obj". So
you now know where the object starts
9:25 - 9:32

and when it ends. Now how can we process
this body? As I mentioned before in the
9:32 - 9:41

trailer, there was a reference regarding
the root element and this element was with
9:41 - 9:49

ID 1 and generation number 0. So, now
we start reading the document here and we
9:49 - 9:56

have a catalog and a reference to some
pages. Pages is just a description of all
9:56 - 10:03

the pages contained within the file. And
what can we see here is that we have this
10:03 - 10:10

number, count one, so we have only one page
and a reference to the page object which
10:10 - 10:15

contains the entire
description of the page. If we have
10:15 - 10:22

multiple pages, then we will have here
multiple elements. Then we have one page.
10:22 - 10:30

And here we have the contents, which is a
reference to the string we already saw.
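The bottom-up processing described above (trailer first, then the Xref section, then the objects) can be sketched in a few lines. This is only an illustration under simplifying assumptions: it handles a single classic Xref table with one subsection, and the helper names `build_demo_pdf` and `parse_xref_table` are made up for this example. The demo file is a bare skeleton and is not meant to render in a real viewer.

```python
def build_demo_pdf() -> bytes:
    """Build a tiny PDF-like skeleton with correct byte offsets (hypothetical helper)."""
    stream = b"BT (Hello World!) Tj ET"
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n",
        b"3 0 obj\n<< /Type /Page /Parent 2 0 R /Contents 4 0 R >>\nendobj\n",
        b"4 0 obj\n<< /Length %d >>\nstream\n%s\nendstream\nendobj\n"
        % (len(stream), stream),
    ]
    pdf = b"%PDF-1.4\n"                      # header: one line with the version
    offsets = []
    for obj in objects:                      # body: remember where each object starts
        offsets.append(len(pdf))
        pdf += obj
    xref_offset = len(pdf)
    pdf += b"xref\n0 5\n0000000000 65535 f \n"   # object 0 is the free entry
    for off in offsets:
        pdf += b"%010d 00000 n \n" % off         # offset, generation, in-use flag
    pdf += b"trailer\n<< /Size 5 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % xref_offset
    return pdf


def parse_xref_table(pdf: bytes) -> dict:
    """Return {object_id: (byte_offset, generation, in_use)}.

    Assumption: one classic Xref table with a single subsection.
    """
    # Start at the end of the file: 'startxref' is followed by the byte
    # offset of the Xref section.
    start = pdf.rindex(b"startxref")
    xref_offset = int(pdf[start + len(b"startxref"):].split()[0])
    lines = pdf[xref_offset:].splitlines()
    if lines[0].strip() != b"xref":
        raise ValueError("no classic Xref table at the given offset")
    first_id, count = map(int, lines[1].split())
    entries = {}
    for i in range(count):
        offset, generation, flag = lines[2 + i].split()
        entries[first_id + i] = (int(offset), int(generation), flag == b"n")
    return entries


pdf = build_demo_pdf()
entries = parse_xref_table(pdf)
offset_of_4, _, _ = entries[4]
print(pdf[offset_of_4:offset_of_4 + 7])      # the pointer lands on the object header
```

Because the header line is nine bytes long, object 1 starts at byte position 9, which matches the Xref entry discussed in the talk.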
10:30 - 10:35

Perfect. If you understand this then you
know everything or almost everything about
10:35 - 10:39

PDF files. Now you can just use your
editor and open such files and analyze
10:39 - 10:50

them. Then we need one feature... I forgot
the last part. The most simple one. The
10:50 - 10:56

header. It should be just one line stating
which version is used. For example, in our
10:56 - 11:05

case, 1.4. For the last version of Adobe
here will be stated 2.0. Now, we need this
11:05 - 11:14

one feature called "Incremental Update".
And I call this feature - do you know this
11:14 - 11:20

feature highlighting something in the PDF
file or putting some sticky notes?
11:20 - 11:24

Technically, it's called "incremental
update." I just call it reviewing master
11:24 - 11:31

and bachelor thesis of my students because
this is exactly the procedure I follow. I
11:31 - 11:38

just read the text and highlight something
and store the information I put at it.
11:38 - 11:47

Technically, by putting such a sticky note,
this additional information is appended
11:47 - 11:53

after the end of the file. So we have a
body update which contains exactly the
11:53 - 12:01

information additionally of the new
objects and of course, new Xref section
12:01 - 12:16

and a new trailer pointing to this new
object. Okay, we are done. Considering
12:16 - 12:24

incremental update, we saw that it is used
mainly for sticky notes or highlighting.
12:24 - 12:30

But we observed something which is very
important because in an incremental update we
12:30 - 12:37

can redefine existing objects, for
example, we can redefine the object with
12:37 - 12:46

ID 4 and put new content. So we replace in
this manner the word "Hello World" with
12:46 - 12:52

another sentence and of course the Xref
section and the trailer point to this new
12:52 - 13:00

object. So this is very important. With
incremental update we are not stuck to
13:00 - 13:06

only adding some highlighting or notes. We
can redefine already existing content and
13:06 - 13:14

perhaps we need this for the attacks we
will present. So let's talk about PDF
13:14 - 13:23

signatures. First, we need to distinguish
between electronic signature and digital
13:23 - 13:29

signature. Electronic signature. From a
technical point of view, it's just an
13:29 - 13:36

image. I just wrote it on my PC and put it
into the file. There is no cryptographic
13:36 - 13:41

protection. It could be me lying on the
beach doing something. From a cryptographic
13:41 - 13:46

point of view it is the same. It does not
provide any security, any cryptographic
13:46 - 13:53

security. What we will talk about here is
about digitally signed files, so if you
13:53 - 14:00

open such files, you have the additional
information regarding the validation about
14:00 - 14:08

the signatures and who signed this PDF
file. So as I mentioned before, this talk
14:08 - 14:17

will concentrate only on these digitally
signed PDF files. How? What kind of
14:17 - 14:23

process is behind digitally signing PDF
files? Imagine we have this abstract
14:23 - 14:29

overview of a PDF document. We have the
header, body, Xref section and trailer. We
14:29 - 14:35

want to sign it. What happens is that we
take this PDF file and via incremental
14:35 - 14:42

update we put additional information
regarding that. There is a new catalog and
14:42 - 14:46

more important, a new signature object
containing the signature value and
14:46 - 14:52

information about who signed this PDF
file. And of course, there is an Xref
14:52 - 14:59

section and trailer. And relevant for you:
The entire file is now protected by the
14:59 - 15:07

PDF signature. So manipulations within
this area should not be possible, right?
15:07 - 15:16

Yeah, let's talk about this: why it's not
possible and how can we break it? First,
15:16 - 15:21

we need an attack scenario. What we want
to achieve as an attacker. We assumed in
15:21 - 15:28

our research that the attacker possesses
this signed PDF file. This could be an old
15:28 - 15:36

contract, receipt or, in our case, a bill
from Amazon. And if we open this file, the
15:36 - 15:41

signature is valid. So everything is
green. No warnings are thrown and
15:41 - 15:48

everything is fine. What we tried to do is
to take this file, manipulate it somehow
15:48 - 15:56

and then send it to the victim. And now
the victim expects to receive a digitally
15:56 - 16:02

signed PDF file, so just stripping the
digital signature is a very trivial
16:02 - 16:08

scenario and we did not consider it
because it's trivial. We considered that
16:08 - 16:13

the victim expects to see that there is a
signature and it is valid. So no warnings
16:13 - 16:20

are thrown and the entire left side
is exactly the same as in the normal
16:20 - 16:28

behavior. But on the other side, the
content was exchanged so we manipulated
16:28 - 16:34

the receipt and exchanged it with another
content. The question is now: how can we
16:34 - 16:41

do it on a technical level? And we came up
with three attacks: incremental saving
16:41 - 16:46

attacks, signature wrapping and universal
signature forgery. And I will now
16:46 - 16:51

introduce the techniques and how these
attacks are working. The first attack is
16:51 - 16:57

the incremental saving attack. So I
mentioned before that via incremental
16:57 - 17:06

saving or via incremental updates, we can
add and remove and even redefine already
17:06 - 17:15

existing objects and the signature still
stays valid. Why is this happening?
17:15 - 17:21

Consider now again our case. We have some
header, body, Xref table and trailer and
17:21 - 17:28

the file is now signed and the signature
protects only the signed area. So what
17:28 - 17:33

would happen if I put a sticky note or
some highlighting? An incremental update
17:33 - 17:39

happens. If I open this file, usually this
happens: We have the information that this
17:39 - 17:46

signature is valid, when it was signed and
so on and so forth. So our first idea was
17:46 - 17:53

to just put new body updates, redefine
already existing content and with an Xref
17:53 - 17:59

table and trailer we point to the new
content. This is quite trivial because
17:59 - 18:05

it's a legitimate feature in PDF files, so
we didn't expect to be quite successful
18:05 - 18:12

and we were not so successful. But the
first idea: we applied this attack, we
18:12 - 18:22

opened it and we got this message. So it's
kind of a weird message because an
18:22 - 18:28

experienced user sees valid, but the
document has been updated and you should
18:28 - 18:34

know what does this exactly mean. But we
did not consider this attack as successful
18:34 - 18:41

because the warning is not the same or the
status of the signature validation is not
18:41 - 18:51

the same. So what we did is to evaluate
this first against this trivial case,
18:51 - 18:57

against older viewers we have, and
LibreOffice, for example, was vulnerable
18:57 - 19:02

against this trivial attack. This was the
only viewer which was vulnerable against
19:02 - 19:07

this trivial variation. But then we asked
ourselves: Okay, the other viewers are
19:07 - 19:14

quite secure. But how do they detect these
incremental updates? And from developer
19:14 - 19:22

point of view, the laziest thing we can do
is just to check if another Xref table and
19:22 - 19:28

trailer were added after the signature was
applied. So we just put our body updates
19:28 - 19:37

but just deleted the other two parts. This
is not a standard compliant PDF file. It's
19:37 - 19:45

broken. But our hope was that the PDF
viewer fixes this kind of stuff for us and
19:45 - 19:51

that these viewers are error-tolerant. And
we were quite successful because the
19:51 - 19:56

verification logic just checked: Is there
an Xref table and trailer after the
19:56 - 20:02

signature was applied? No? Okay.
Everything's fine. The signature is valid.
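The guessed "lazy" check can be illustrated with a small sketch. Note that this is an assumption about viewer behavior derived from black-box testing, not the actual code of any product; `naive_update_check` and the placeholder byte strings are hypothetical.

```python
def naive_update_check(pdf: bytes, signed_end: int) -> bool:
    """Hypothetical lazy check: report an incremental update only if an
    Xref table or a trailer appears after the signed region."""
    tail = pdf[signed_end:]
    return b"xref" in tail or b"trailer" in tail


# Placeholder for a signed document; the '...' parts stand in for real content.
signed = b"%PDF-1.4\n...signed content...\nxref\n...\ntrailer\n...\nstartxref\n0\n%%EOF\n"

# Body update that redefines object 4 with new content.
body_update = b"4 0 obj\n(Manipulated text)\nendobj\n"

# Standard-compliant update: body update plus new Xref table and trailer.
full_update = body_update + (
    b"xref\n4 1\n0000000000 00000 n \ntrailer\n<< >>\nstartxref\n0\n%%EOF\n"
)

detected_full = naive_update_check(signed + full_update, len(signed))
detected_body_only = naive_update_check(signed + body_update, len(signed))
print(detected_full, detected_body_only)
```

The body-only variant slips past this kind of check, and an error-tolerant parser may still pick up the redefined object when it repairs the broken file.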
20:02 - 20:05

No warning was thrown. But then the
application logic saw that incremental
20:05 - 20:14

updates were applied and fixed this for us
and processed these body updates and no
20:14 - 20:21

warning was thrown. Some of the viewers
required to have a trailer. I don't know
20:21 - 20:25

why - it was black-box testing. So we
just removed the Xref table, but the
20:25 - 20:32

trailer was there and we were able to
break further PDF viewers. The most
20:32 - 20:38

complex variation of the attack was the
following: We had PDF viewers that checked
20:38 - 20:47

if every incremental update contains a
signature object. But they did not check
20:47 - 20:53

if this signature is covered by the
incremental update. So we just copy-pasted
20:53 - 21:01

the signature which was provided here and
we just forced the PDF viewer to validate
21:01 - 21:10

this signed content twice - and still our
body updates were processed and for
21:10 - 21:19

example, Foxit or Master PDF were
vulnerable against this type of attack. So
21:19 - 21:25

the evaluation of our attack: We
considered as part of our evaluation 22
21:25 - 21:31

different viewers - among others, Adobe
with different versions, Foxit, and so on.
21:31 - 21:41

And as you can see 11 of 22 were
vulnerable against incremental saving. So
21:41 - 21:47

50 percent, and we were quite surprised
because we saw that the developers saw
21:47 - 21:52

that incremental updates could be
dangerous regarding the signature
21:52 - 22:01

validation. But we were still able to
bypass their considerations. We had - a
22:01 - 22:08

full signature bypass means that there is
no possibility for the victim to detect
22:08 - 22:14

the attack. A limited signature bypass
means that the victim, if the victim
22:14 - 22:23

clicks on one - at least one - additional
window and explicitly wants to validate
22:23 - 22:32

the signature, then the viewer was
vulnerable. But the most important thing
22:32 - 22:38

is by opening the file, there was a status
message that the signature validation and
22:38 - 22:44

all signatures are valid. So this was the
first layer and the viewers were
22:44 - 22:51

vulnerable against this. So let's talk
about the second attack class. We called
22:51 - 22:58

it "signature wrapping attack" and this is
the most complex attack of the 3 classes.
22:58 - 23:05

And now we have to go a little bit into
the details of how PDF signatures are
23:05 - 23:10

made. So imagine now we have a PDF file.
We have some header and the original
23:10 - 23:16

document. The original document contains
the header, the body, the Xref section and
23:16 - 23:22

so on and so forth. And we want to sign
this document. Technically, again, an
23:22 - 23:29

incremental update is provided and we have
a new catalog here. We have some other
23:29 - 23:35

objects, for example, certificates and so
on and the signature objects. And we will
23:35 - 23:39

now concentrate on this signature object
because it's essential for the attack we
23:39 - 23:45

want to carry out. And the signature
object contains a lot of information, but
23:45 - 23:51

for this attack only two elements
are relevant: The contents and the byte
23:51 - 23:58

range. The contents contains the signature
value. It's a PKCS#7 container containing
23:58 - 24:06

the signature value and the certificates
used to validate the signature and the
24:06 - 24:11

byte range. The byte range contains four
different values. Let's see how these values
24:11 - 24:23

are being used. The first two, A and B
define the first signed area. And this is
24:23 - 24:29

here from the beginning of the document
until the start of the signature value.
24:29 - 24:35

Why do we need this? Because the signature
value is part of the signed area. So we need
24:35 - 24:43

to exclude the signature value from the
document computation. And this is how the
24:43 - 24:49

byte range is used. The first part is
from the beginning of the document until
24:49 - 24:55

the signature value starts, and
after the signature ends until the end of
24:55 - 25:05

the file is the second area specified by
the two values C and D. So, now we have
25:05 - 25:14

everything protected besides the signature
value itself. What we wanted to try is to
25:14 - 25:22

create additional space for our attacks.
So our idea was to move the second signed
25:22 - 25:30

area. And how can we do it? So basically
we can do it by just defining another byte
25:30 - 25:40

range. And as you can see here, the byte
range points from area A to B. So this
25:40 - 25:47

area: we didn't make any manipulation in
this part, right? It was not modified at
25:47 - 25:53

all. So it's still valid. And the second
part, the new C value and the next D
25:53 - 26:00

bytes, we didn't change anything here,
right? So basically, we didn't change
26:00 - 26:07

anything in the signed area. And the
signature is still valid. But what we
26:07 - 26:14

created was a space for some malicious
objects; sometimes we needed some padding
26:14 - 26:21

and a new Xref section pointing to these
malicious objects. An important thing was
26:21 - 26:28

that for this malicious Xref section, the
position is defined by the trailer. And
26:28 - 26:33

since we can not modify this trailer, this
position is fixed. So this is the only
26:33 - 26:43

limitation of the attack, but it works
like a charm. And the question is now: How
26:43 - 26:50

many PDF viewers were vulnerable against
this attack? And as you can see, this is
26:50 - 26:58

the signature wrapping column. 17 out of
22 applications were vulnerable against
26:58 - 27:06

this attack. This was quite an expected
result because the attack was complex; we
27:06 - 27:15

saw that many developers were not
aware of this threat and that's the reason
27:15 - 27:23

why so many vulnerabilities were there.
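The byte range mechanism behind the signature wrapping attack can be sketched as follows. In PDF, the byte range array holds two (offset, length) pairs. The sketch uses a SHA-256 digest as a stand-in for the real PKCS#7 signature verification, and the document fragments are placeholders; `signed_digest` is a hypothetical helper. Constraints of the real attack, such as the fixed Xref position given by the signed trailer, are left out.

```python
import hashlib


def signed_digest(pdf: bytes, byte_range: list) -> str:
    """Digest over the two signed areas [a, a+b) and [c, c+d).

    Stand-in for what a viewer feeds into signature verification.
    """
    a, b, c, d = byte_range
    return hashlib.sha256(pdf[a:a + b] + pdf[c:c + d]).hexdigest()


# Placeholder fragments of a signed file.
part1 = b"%PDF-1.4 ...document, start of signature object... /Contents "
sig = b"<00112233>"                      # signature value, excluded from the hash
part2 = b" ...rest of signature object, Xref section, trailer..."

original = part1 + sig + part2
orig_range = [0, len(part1), len(part1) + len(sig), len(part2)]

# Attack idea: keep both signed areas byte-for-byte identical, but move the
# second one further back and declare a new byte range. The gap now holds
# the signature value plus attacker-controlled content.
malicious = b" ...malicious objects, padding, malicious Xref section... "
wrapped = part1 + sig + malicious + part2
new_range = [0, len(part1), len(part1) + len(sig) + len(malicious), len(part2)]

same = signed_digest(original, orig_range) == signed_digest(wrapped, new_range)
print(same)
```

A viewer that trusts the byte range found in the file hashes exactly the same bytes as before, so the digest comparison alone cannot reveal the inserted content.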
Now to the last class of attacks,
27:23 - 27:29

universal signature forgery. And we called
it universal signature forgery, but I
27:29 - 27:34

prefer to use another definition for
these attacks. I call them stupid
27:34 - 27:41

implementation flaws. We are coming from
the PenTesting area and I know a lot of
27:41 - 27:50

you are PenTesters, too. And, many of you
have experience, quite interesting
27:50 - 27:58

experience with zero bytes, null values or
some kind of weird values. And this is
27:58 - 28:06

what we tried in this kind of attacks.
Just tried to do some stupid values or
28:06 - 28:13

remove references and see what happens.
Considering the signature, there are two
28:13 - 28:18

different important elements: The contents
containing the signature value and the
28:18 - 28:25

byte range pointing to what is exactly
signed. So, what would happen if we remove
28:25 - 28:31

the contents? Our hope was that the
information regarding the signature is
28:31 - 28:38

still shown by the viewer as valid without
validating any signature because it was
28:38 - 28:45

not possible. And just removing the
signature value is quite an obvious idea. And
28:45 - 28:49

we were not successful with this kind of
attack. But let's proceed with other
28:49 - 28:57

values like for example, contents without
any value or contents like equals NULL or
28:57 - 29:05

zero bytes. And considering this last
version, we had two viewers which were
29:05 - 29:15

vulnerable against this attack. And
another, another case is, for example, by
29:15 - 29:20

removing the byte range. By removing this
byte range we have some signature value,
29:20 - 29:30

but we don't know what is exactly signed.
So, we tried this attack and of course,
29:30 - 29:38

byte range without any value or NULL bytes
or byte range with a minus or negative,
29:38 - 29:46

negative numbers. And usually this last one
crashed a lot of viewers. But the
29:46 - 29:52

most interesting is that Adobe made this
mistake by just removing the byte range.
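The kinds of malformed values listed above can be generated mechanically. The following sketch produces such variants from a signature dictionary given as text. The exact syntax of each test case is an assumption for illustration and may differ from the files used in the research; `usf_variants` and the sample dictionary are hypothetical.

```python
import re

# Simplified, hypothetical signature dictionary.
SIG = "<< /Type /Sig /Contents <3082AB> /ByteRange [0 100 200 50] >>"


def usf_variants(sig_dict: str) -> dict:
    """Return malformed variants of a signature dictionary (illustrative only)."""
    contents = re.compile(r"/Contents\s*<[0-9A-Fa-f]*>")
    byte_range = re.compile(r"/ByteRange\s*\[[^\]]*\]")
    return {
        "contents_removed": contents.sub("", sig_dict),
        "contents_empty": contents.sub("/Contents", sig_dict),
        "contents_null": contents.sub("/Contents null", sig_dict),
        "contents_zero": contents.sub("/Contents <00>", sig_dict),
        "byterange_removed": byte_range.sub("", sig_dict),
        "byterange_empty": byte_range.sub("/ByteRange", sig_dict),
        "byterange_null": byte_range.sub("/ByteRange null", sig_dict),
        "byterange_negative": byte_range.sub("/ByteRange [-1 -1 -1 -1]", sig_dict),
    }


variants = usf_variants(SIG)
for name, value in variants.items():
    print(f"{name}: {value}")
```

Each variant would then be written into a copy of the signed file and opened in the viewer under test to see whether the signature is still reported as valid.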
29:52 - 29:57

We were able to bypass the entire
security. We didn't expect this behavior,
29:57 - 30:01

but it was a stupid implementation flaw,
allowing us to do anything in this
30:01 - 30:08

document and all the exploits we show in
our presentations were made on Adobe with
30:08 - 30:15

this attack. So let's see what were the
results of this attack. As you can see,
30:15 - 30:21

only 4 of 22 viewers were vulnerable
against this attack and only Adobe
30:21 - 30:26

unlimited; for the others, there was
limitation because if you click on the
30:26 - 30:33

signature validation, then a warning was
thrown. It was very easy for Adobe to fix.
30:33 - 30:38

And as you can see, Adobe didn't
make any mistake regarding incremental
30:38 - 30:41

saving or signature wrapping, but
regarding universal signature forgery
30:41 - 30:48

they were vulnerable against this attack.
And this was the hope of our approach. In
30:48 - 30:56

summary, we were able to break 21 of 22
PDF viewers. The only
30:56 - 31:01

Applause
Thanks.
31:01 - 31:08

Applause
The only secure PDF viewer is Adobe 9,
31:08 - 31:13

which is deprecated and has remote code
execution. The only
31:13 - 31:18

Laugh
The only users allowed to use it, or who are
31:18 - 31:25

using it are Linux users, because this is
the last version available for Linux and
31:25 - 31:32

that's the reason why we considered it. So,
I'm done with the talk about PDF
31:32 - 31:37

signatures and now Fabian can talk about
PDF encryption. Thank you.
31:37 - 31:43

Fabian: Yes
Applause
31:43 - 31:47

OK, now that we have dealt with the
signatures, let's talk about another
31:47 - 31:53

cryptographic aspect in PDFs. And that is
encryption. And some of you might remember
31:53 - 31:58

our PDFex vulnerability from earlier this
year. It's, of course, an attack with a
31:58 - 32:04

logo and it presents two novel
techniques targeting PDF encryption that
32:04 - 32:08

have never been applied to PDF encryption
before. So one of them is the so-called
32:08 - 32:13

direct exfiltration where we break the
cryptography without even touching the
32:13 - 32:19

cryptography. So no ciphertext
manipulation here. The second one is the
32:19 - 32:25

so-called malleability gadgets. And those are
actually targeted modifications of the
32:25 - 32:31

ciphertext of the document. But first,
let's take a step back and again take
32:31 - 32:40

some keywords in. So PDF uses AES. OK.
Well, AES is good. Nothing can go wrong,
32:40 - 32:44

right? So let's go home. Encryption is
fine. Well, of course, we didn't stop
32:44 - 32:52

here, but took a closer look. So they use
CBC mode of operation, so cipher block
32:52 - 32:58

chaining. And, what's more important is
that they don't use any integrity
32:58 - 33:04

protection. So it's non-integrity-protected
AES-CBC. And you might remember the
33:04 - 33:09

scenario from the attacks against
encrypted e-mail, so against OpenPGP and
33:09 - 33:16

S/MIME, it's basically the same problem.
But first, who actually uses PDF
33:16 - 33:21

encryption? You might ask. For one, we
found some local banks in Germany use
33:21 - 33:26

encrypted PDFs as a drop-in replacement
for S/MIME or OpenPGP because their
33:26 - 33:35

customers might not want to deal with
the setup of encrypted e-mail.
33:35 - 33:40

The second one was some drop-in plugins for
encrypted e-mail as well. So there are some
33:40 - 33:45

companies out there that produce products
that you can put into your Outlook and you
33:45 - 33:51

can use encrypted PDF files instead of
encrypted email. We also found that some
33:51 - 33:58

scanners and medical devices were able to
send encrypted PDF files via e-mail. So
33:58 - 34:03

you can set a password on that machine and
they will send the encrypted PDF via
34:03 - 34:10

e-mail and you have to put in the
password some other way. And lastly, we
34:10 - 34:15

found that some governmental organizations
use encrypted PDF documents, for example,
34:15 - 34:20

the US Department of Justice allows
sending in some claims via
34:20 - 34:25

encrypted PDFs. And I've exactly no idea
how they get the password, but at
34:25 - 34:31

least they allow it. So as we are from
academia, let's take a step back and look
34:31 - 34:37

at our attacker model. So we've got Alice
and Bob. Alice wants to send a document to
34:37 - 34:42

Bob. And she wants to send it over an
unencrypted channel or a channel she
34:42 - 34:49

doesn't trust. So of course, she decides
to encrypt it. Second scenario is, they
34:49 - 34:53

want to upload it to a shared storage. For
example, Dropbox or any other shared
34:53 - 34:57

storage. And of course, they don't trust
the storage. So, again, they use end-to-
34:57 - 35:05

end encryption. So let's assume that this
shared storage is indeed dangerous or
35:05 - 35:11

malicious. So, Alice will, of course,
again upload the encrypted document. The
35:11 - 35:17

attacker, in this case, will perform some
targeted modification of that, and will
35:17 - 35:22

send the modified documents back to Bob,
who will happily put in the password
35:22 - 35:27

because from his point of view, it's
indistinguishable from the original
35:27 - 35:33

document and the original plain text will
be leaked back to the attacker, breaking
35:33 - 35:40

the confidentiality. So let's take a look
at the first attack on how we did that.
35:40 - 35:43

That's the direct exfiltration, so
breaking the cryptography without touching
35:43 - 35:51

any cryptography, as I like to say. But
first, encryption in a nutshell, PDF
35:51 - 35:55

encryption. So you have seen the structure
of the PDF document. There is a header
35:55 - 36:00

with a version number. There's a body
where all the interesting objects live. So
36:00 - 36:07

there is our confidential content that we
want to actually, well, to actually
36:07 - 36:15

exfiltrate as an attacker. And finally,
there is the Xref table and the trailer. So
36:15 - 36:20

what changes if we decide to encrypt this
document? Well, actually, not a whole lot.
36:20 - 36:24

So instead of confidential data, of
course, there's now some encrypted
36:24 - 36:29

ciphertext. Okay. And the rest pretty much
remains the same. The only thing that is
36:29 - 36:37

added is a new value in the trailer that
tells us how to decrypt this data again.
36:37 - 36:44

So pretty much all of the structure is
left unencrypted. And we thought about:
36:44 - 36:50

Why is this? And we took a look at the
standard. So, this is an excerpt from the
36:50 - 36:56

PDF specification and I've highlighted the
interesting parts for you. Encryption is
36:56 - 37:01

only applied to strings and streams. Well,
those are the values that actually can
37:01 - 37:08

contain any text in the document and all
other objects are not encrypted. And that
37:08 - 37:12

is because, well, they want to allow
random access to the whole document. So no
37:12 - 37:18

parsing the whole document before actually
showing page 16 of the encrypted document.
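Because only strings and streams are encrypted, the structural objects stay readable without the password. A minimal sketch of what can be read from such a file, using simple pattern matching on a hand-made sample (`structure_without_password` is a hypothetical helper, and the heuristics could miscount on real files, for instance when ciphertext bytes happen to match a pattern):

```python
import re


def structure_without_password(pdf: bytes) -> dict:
    """Collect structural facts that are visible in an encrypted PDF."""
    return {
        # Object headers look like '<id> <generation> obj'.
        "objects": len(re.findall(rb"\b\d+ \d+ obj\b", pdf)),
        # '/Type /Page' marks a page; '\b' keeps '/Pages' from matching.
        "pages": len(re.findall(rb"/Type\s*/Page\b", pdf)),
        # The trailer of an encrypted file carries an /Encrypt entry.
        "encrypted": b"/Encrypt" in pdf,
    }


# Hand-made sample: the stream content is opaque, the structure is not.
sample = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R /Contents 4 0 R >>\nendobj\n"
    b"4 0 obj\n<< /Length 8 >>\nstream\n\x8f\x11\xa0\x42\xde\xad\xbe\xef\nendstream\nendobj\n"
    b"trailer\n<< /Root 1 0 R /Encrypt 5 0 R >>\n"
)

info = structure_without_password(sample)
print(info)
```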
37:18 - 37:25

Well, that seems kind of reasonable. So,
but that also means that the whole
37:25 - 37:28

document structure is unencrypted and
only the streams and strings are
37:28 - 37:31

encrypted. This reveals a lot of
information to an attacker that he or she
37:31 - 37:36

shouldn't have probably. That's for one
the number and size of pages, that's the
37:36 - 37:43

number and size of objects in the document
and that's also including any links, so
37:43 - 37:48

any hyperlinks in the document that are
actually there. So, that's a lot of
37:48 - 37:55

information an attacker probably shouldn't
have. So, next we thought maybe we can do
37:55 - 38:01

some more stuff. Can we add our own
unencrypted content? And we took a look at
38:01 - 38:06

the standard again and found so-called
crypt filters, which provide finer
38:06 - 38:11

granularity control of the encryption.
This basically means as an attacker, I can
38:11 - 38:16

change a document to say, hey, only
strings in this document are encrypted and
38:16 - 38:21

streams are unencrypted. That's what the
identity filter is for. I have no idea why
38:21 - 38:27

they decided to add that to a document
format, but it's there. So that means
38:27 - 38:32

there's support for partial encryption and
that means attacker's content can be mixed
38:32 - 38:37

with actual encrypted content. And we
found 18 different techniques to do that
38:37 - 38:42

in different readers. So there are a lot of
ways to do that in the different readers.
38:42 - 38:48

So let's have a look at a demo. So we have
this document, this encrypted document, we
38:48 - 38:54

put in our password and get our secret
message. We now open it again in a text
38:54 - 39:00

editor. We see, in object 4 0 down here,
there's the actual ciphertext of the
39:00 - 39:06

object, so of the message, and we see it's
AES encrypted, with a 32 byte key, so it's
39:06 - 39:16

AES-256. OK. Now we decide to add a new
object that contains, well, plaintext.
39:16 - 39:22

And, well, we simply add that to the
contents array of this document. So, we
39:22 - 39:28

say "Display this on the first page", save
the document. We open it, and we'll put in
39:28 - 39:38

our password and, oh well, this is indeed
awkward. OK. So, now, we have broken the
39:38 - 39:44

integrity of an encrypted document. Well,
you might think maybe they didn't want any
39:44 - 39:49

integrity in the encrypted files. Maybe
that's the use case people have, I don't
39:49 - 39:55

know. But we thought, maybe we can somehow
exfiltrate the plaintext this way. So
39:55 - 40:00

again, we took a step back, and looked at
the PDF specification. And the first thing
40:00 - 40:06

we found were so-called submit-form
actions. And that's basically the same as
40:06 - 40:11

a form on a website. You can put in data.
You might have seen this in a contract, in
40:11 - 40:15

a PDF contract, where you can put in your
name, and your address, and so on, and so
40:15 - 40:23

on, and the data that is saved inside of
that is saved in strings and streams. And
40:23 - 40:28

now remember that is everything that is
encrypted in a document. And, of course,
40:28 - 40:32

you can also send that back to an
attacker, or well, to a legitimate use
40:32 - 40:38

case, of course, via clicking a button,
but clicking buttons is pretty lame. So we
40:38 - 40:42

again looked at the standard and found the
so-called open action. And that is an
40:42 - 40:47

action, for example, submitting a form
that can be performed upon opening a
40:47 - 40:55

document. So how might this look? This is
how a PDF form looks, already with the
40:55 - 41:01

attack applied. So, we've got a URL here
that is unencrypted, because all strings
41:01 - 41:07

in this document are unencrypted, and
we've got the value object 2 0, where the
41:07 - 41:13

actual encrypted data lives. So, that is
the value of the form fields. And what
41:13 - 41:17

will happen on the attacker side as soon
as this document is opened? Well, we'll
41:17 - 41:25

get a POST request with the confidential
content. Let's have a demo. Again, we have
41:25 - 41:31

this document. We put in our password.
It's the original document you have
41:31 - 41:36

already seen. We reopen it in a text
viewer, or a text editor, again see it's
41:36 - 41:44

encrypted, and we decide to change all
strings to the identity filter. So, no
41:44 - 41:49

encryption is applied to strings from now
on. And then we add a whole blob of
41:49 - 41:56

information for the open action, and for
the form. So this will be op- this will be
41:56 - 42:00

performed, as soon as the document is
opened. There is a URL, p.df, and the
42:00 - 42:08

value is the encrypted object 4 0. We
start an HTTP server on the domain we
42:08 - 42:13

specified, we open the document, put in
the password again, and as soon as we open
42:13 - 42:18

the document Adobe will helpfully show us
a warning, but they will already have ticked the
42:18 - 42:22

box for remembering that for the
future. And if you accept that, you will
42:22 - 42:29

see your secret message on the attacker
server. And that is pretty bad already.
42:29 - 42:36

OK. The same works for hyperlinks, so, of
course, there are links in PDF documents,
42:36 - 42:44

and as on the Web, we can define a base
URL for hyperlinks. So we can say all URLs
42:44 - 42:50

from this document start with http://p.df.
And of course we can define any object as
42:50 - 42:57

a URL. So any object we prepared this way
can be sent as a URL, and that will, of
42:57 - 43:01

course, trigger a GET request upon opening
the document again, if you defined an open
43:01 - 43:09

action for the same object. So again,
pretty bad and breaks confidentiality. And
43:09 - 43:16

of course, everybody loves JavaScript in
PDF files, and that works as well. Okay.
43:16 - 43:21

Let's talk about ciphertext attacks, so
actual cryptographic attacks, no more avoiding
43:21 - 43:29

touching the crypto. So you might remember
the efail attacks on OpenPGP and S/MIME,
43:29 - 43:34

and those had basically three
prerequisites. 1: Well, ciphertext
43:34 - 43:39

malleability, or so-called malleability
gadgets. That's why we need ciphertext
43:39 - 43:44

malleability, and we've got no integrity
protection, that's a plus. Then we need
43:44 - 43:49

some known plaintext for actual targeted
modifications. And we need an exfiltration
43:49 - 43:53

channel to send the data back to an
attacker. Well, exfiltration channels are
43:53 - 44:00

already dealt with as we have hyperlinks
and forms. So we can already check that.
44:00 - 44:06

Nice. Let's talk about ciphertext
malleability, or what we call gadgets. So,
44:06 - 44:10

some of you might remember this from
crypto 101, or whatever lecture you ever
44:10 - 44:15

had on cryptography. This is the
decryption function of CBC, so cipher
44:15 - 44:24

block chaining. And it's basically, you've
got your ciphertext up here, and your
44:24 - 44:30

plaintext down here. And it works by
simply decrypting a block of ciphertext,
44:30 - 44:36

XORing the previous block of ciphertext
onto that, and you'll get the plaintext.
44:36 - 44:41

So what happens, if you decide to change a
single bit in the ciphertext, for example,
44:41 - 44:48

the first bit of the initialization
vector? Well, that same bit will flip in
44:48 - 44:53

the actual plaintext. Wait a second. What
happens, if you happen to know a whole
44:53 - 45:00

plaintext block? Well, we can XOR that
onto the first block, and basically get
45:00 - 45:06

all zeros, or what we call a gadget, or a
blank sheet of paper, because we can write
45:06 - 45:14

on that by taking a chosen plaintext and
XORing that onto this result. And this
45:14 - 45:19

way we can, for example, construct URLs in
the actual ciphertext, or in the actual
45:19 - 45:24

resulting plaintext. What we can also do
with these gadgets is moving
45:24 - 45:29

them somewhere else in the document,
cloning them, so we can have multiple
45:29 - 45:34

gadgets, at multiple places in the
ciphertext. But remember, if you do that,
45:34 - 45:38

there's always the avalanche effect of
CBC, so you will have some random bytes in
45:38 - 45:46

here, but the URL still remains in place.
Okay. That's ciphertext malleability done.
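The gadget idea can be sketched in a few lines. This is a minimal illustration under loud assumptions: the standard library has no AES, so a toy Feistel cipher built from SHA-256 stands in for it (CBC malleability does not depend on which block cipher is used). All function names, the key and the `?x=1` suffix are hypothetical; only the `p.df` host comes from the demo.

```python
import hashlib

BLOCK = 16  # block size in bytes, the same as AES

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Round function of the toy Feistel cipher (NOT AES, illustration only).
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:BLOCK // 2]

def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, xor(left, _f(key, rnd, right))
    return left + right

def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _f(key, rnd, left)), left
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        prev = encrypt_block(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out

def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    # P_i = D(C_i) XOR C_(i-1), with the IV acting as C_0.
    prev, out = iv, b""
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out += xor(decrypt_block(key, block), prev)
        prev = block
    return out

def make_gadget(iv: bytes, known: bytes, chosen: bytes) -> bytes:
    """Return a forged IV so that the first block decrypts to `chosen`."""
    assert len(iv) == len(known) == len(chosen) == BLOCK
    # XORing the known plaintext gives all zeros; XORing `chosen` writes on it.
    return xor(xor(iv, known), chosen)

key = b"secret key the attacker never sees"
iv = bytes(range(BLOCK))
known = b"known-plaintext!"   # 16 bytes the attacker happens to know
secret = b"confidential msg"  # 16 bytes the attacker does not know
ciphertext = cbc_encrypt(key, iv, known + secret)

# Attacker side: uses only iv, ciphertext and the known block, never the key.
flipped_iv = bytes([iv[0] ^ 0x80]) + iv[1:]
chosen = b"http://p.df/?x=1"
forged_iv = make_gadget(iv, known, chosen)

flipped_plain = cbc_decrypt(key, flipped_iv, ciphertext)
forged_plain = cbc_decrypt(key, forged_iv, ciphertext[:BLOCK])
print(forged_plain)
```

Flipping one bit of the IV flips the same bit of the first plaintext block and leaves the second block alone, while the forged IV turns the known block into attacker-chosen text.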
45:46 - 45:51

As I've said we need some plaintext. We
need to have some known plaintext. And as
45:51 - 45:54

the PDF standard has been pretty helpful
up until now, in breaking PDF encryption,
45:54 - 46:02

let's take a look again. And what we found
here: Permissions. So PDF documents can
46:02 - 46:08

have different permissions for the author,
and the user of the document. This
46:08 - 46:11

basically means the author can edit the
document and the users might not be able
46:11 - 46:16

to do that. And of course, people started
to change with that- started to tamper
46:16 - 46:20

with that value, if it was left
unencrypted, so in the newest version, it
46:20 - 46:27

was decided this should be encrypted as a
16 byte value. So we've got 16 bytes. How
46:27 - 46:31

do they look? Well, at first, we need room
for extension. We need lots of
46:31 - 46:36

permissions. Then we put 4 bytes of the
actual permission value - That is also in
46:36 - 46:42

unencrypted form in the document. Then we need
one byte for encrypted metadata, and for
46:42 - 46:47

some reason we need some acronym, "adb",
I'll leave it to you to figure out what
46:47 - 46:53

that stands for. And finally, we've got
four random bytes, because we have to fill
46:53 - 47:00

up 16 bytes, and we have run out of ideas.
Okay. We take all of that, encrypt it, and
47:00 - 47:06

oh well, we know a lot of that, and that
is basically known plaintext by design.
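As a rough sketch of why so much of this block is predictable, the 16 bytes can be assembled like this. The function name `perms_plaintext` is hypothetical, and the byte order (permission value first, low byte first, then the 0xFF extension bytes) is my reading of the PDF 2.0 layout rather than something stated on the slide, so treat it as an assumption.

```python
import os
import struct
from typing import Optional

def perms_plaintext(p: int, encrypt_metadata: bool = True,
                    tail: Optional[bytes] = None) -> bytes:
    """Build the 16-byte plaintext that gets encrypted into the Perms value."""
    if tail is None:
        tail = os.urandom(4)  # the only part an attacker cannot predict
    block = (
        struct.pack("<i", p)                     # 4 bytes: permission bit field
        + b"\xff" * 4                            # 4 bytes: room for extension
        + (b"T" if encrypt_metadata else b"F")   # 1 byte: encrypted metadata flag
        + b"adb"                                 # 3 bytes: fixed acronym
        + tail                                   # 4 bytes: random filler
    )
    assert len(block) == 16
    return block

block = perms_plaintext(-4)   # -4 is the permission value shown in the demo file
known_prefix = block[:12]     # 12 of the 16 bytes are known to the attacker
print(known_prefix)
```

Because the permission value is also stored unencrypted and the metadata flag has only two options, an attacker knows 12 of the 16 bytes in advance.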
47:06 - 47:13

Which is bad. Let's look at how this looks
in a document. So, you see the perms
47:13 - 47:16

value, I've marked it down here. That is
the actual extended value I've shown you
47:16 - 47:23

on the last slide. And above that you'll
see the unencrypted value that's inside
47:23 - 47:28

this perms value, so the minus 4 in this
case, it's basically a bit field. On the
47:28 - 47:34

right side you see the actual encrypted
contents, and helpfully, all of this is
47:34 - 47:38

encrypted under the same document-wide key
in the newest version of the
47:38 - 47:44

specification. And that means we can
reuse this plaintext anywhere in the
47:44 - 47:49

document we want, and we can reuse this
to build gadgets. To sum that last point
47:49 - 47:53

up for you: Adobe decided to add
permissions to the PDF format, and people
47:53 - 47:57

thought of tampering with them. So they
decided to encrypt these permissions to
47:57 - 48:06

prevent tampering, and now known plaintext
is available to attackers. All right. So
48:06 - 48:14

that's basically all of the prerequisites
done, and let's again have a demo. So, we
48:14 - 48:20

again open this document, put in our
password, well, as soon as Chrome decides
48:20 - 48:27

to open this document, we put in our
password. It's the same as before. Now,
48:27 - 48:32

I've prepared a script for you, because I
really can't do this live, and it
48:32 - 48:35

basically does what I've told you. It's
getting a blank gadget from the perms
48:35 - 48:40

value. It's generating a URL from that.
It's generating a field name, so that it
48:40 - 48:45

will look nice on the server side, we
regenerate this document and put a form in
48:45 - 48:50

there. We start a web server, open this
modified document, put in the password
48:50 - 48:56

again and oh well, Chrome doesn't even
ask. So as soon as this document is opened
48:56 - 48:59

in Chrome and the password is put in,
we'll get our secret message delivered to
48:59 - 49:07

the attacker.
Applause
49:07 - 49:14

So we took a look at 27 viewers and found
all of them vulnerable to at least one of
49:14 - 49:18

our attacks. So some of them work with no
user interaction as we have seen in
49:18 - 49:23

Chrome. Some work with user interaction in
specific cases, as you've seen with Adobe
49:23 - 49:31

with a warning, but generally all of these
were attackable in one way or the other.
49:31 - 49:36

So what can be done about all of this?
Well, you might think signatures might
49:36 - 49:40

help. That's usually the first point
people bring up: "A signature on the
49:40 - 49:47

encrypted file will help." Well, no, not
really. Why is that? Well, for one, a
49:47 - 49:50

broken signature does not prevent opening
the document. So we'll still be able to
49:50 - 49:54

exfiltrate as soon as a password is put
in. Signatures can be stripped because
49:54 - 49:58

they're not encrypted. And as you have
seen before, they can also be forged in
49:58 - 50:03

most viewers. Signatures are not the
answer. Closing exfiltration channels is
50:03 - 50:08

also not the answer because for one, it's
hard to do. And how would you even find
50:08 - 50:15

all exfiltration channels in an 800-page
standard? And I mean, we have barely
50:15 - 50:18

scratched the surface of exfiltration
channels. And should we really remove
50:18 - 50:24

forms and hyperlinks from documents? And
should we remove JavaScript? OK, maybe we
50:24 - 50:29

should. And finally, if you have to do
that, please ask the user before
50:29 - 50:34

connecting to a web server. So let's look
at some vendor reactions. Apple decided to
50:34 - 50:39

do exactly what I've told you: to add a
dialog to warn the user and even show the
50:39 - 50:44

whole URL with the encrypted plaintext.
And Google decided to stop trying to fix
50:44 - 50:50

the unfixable in Chrome. They fixed the
automatic exfiltration, but there's really
50:50 - 50:54

nothing they can do about the standard. So
this is a problem that has to be fixed in
50:54 - 51:00

the standard. And that is basically that.
For mitigating wrapping attacks, we have
51:00 - 51:04

to deprecate partial encryption and
disallow access from unencrypted to
51:04 - 51:08

encrypted objects. And against the gadget
attacks, we have to use authenticated
51:08 - 51:16

encryption like AES-GCM. OK. And Adobe has
told us that they were escalating this to
51:16 - 51:20

the ISO working group that's now
responsible for the PDF standard and this
51:20 - 51:25

will be taken up in the next revision. So
that's a win in my book.
51:25 - 51:31

Applause
51:31 - 51:36

Herald: Thank you so much, guys. That was
really awesome. Please queue up by the
51:36 - 51:41

microphones if you have any questions, we
still have some time left for Q and A. But
51:41 - 51:45

I think your research is really, really
interesting because it opens my mind to
51:45 - 51:51

like how would this actually be able to be
misused in practice? Like, and I don't
51:51 - 51:55

know, like, what's your take? I guess
since you've been working so much with
51:55 - 51:59

this, you must have some kind of idea as
to what devious things you could come up
51:59 - 52:03

with.
Fabian: I mean, it's still an attacker
52:03 - 52:08

scenario that requires a lot of resources
and a very motivated attacker. So this
52:08 - 52:14

might not be very important to the normal
user. Let's be real here. So most of us
52:14 - 52:19

are not targeted by the NSA, I guess. So
you need an active attacker, an active man
52:19 - 52:21

in the middle to actually perform these
attacks.
52:21 - 52:26

Herald: Great. Thank you. And then I think
we have a question from microphone number
52:26 - 52:29

four, please.
Microphone 4: Yes. You said that the
52:29 - 52:33

next standard might have a fix.
Do you know a time frame on how long it
52:33 - 52:41

takes to build such a standard?
Fabian: Well, no, we don't really know. We
52:41 - 52:45

have talked with Adobe and they told us
they will show the next version of the
52:45 - 52:49

standard to us before actually releasing
that, but we have no time frame at all
52:49 - 52:52

from them.
Microphone 4: OK. Thank you.
52:52 - 52:57

Herald: Thank you.
Microphone number five, please.
52:57 - 53:02

Microphone 5: Thank you for a very
interesting talk. You showed in the first
53:02 - 53:09

part that the signature has like these
four numbers with the byte range. And why
53:09 - 53:16

is this, like four numbers, not part of a
signature? Is there a technical reason for
53:16 - 53:18

that? Because the byte offset is
predictable.
53:18 - 53:24

Vladi: It is! The byte range is protected
by the signature. But we just defined the
53:24 - 53:32

second one and just moved the signed one
to be validated later. So there are two
53:32 - 53:38

byte ranges. But only the first one, the
manipulated one, will be processed.
53:38 - 53:43

Microphone 5: Thank you.
Herald: Thank you so much. Microphone
53:43 - 53:48

number four, please.
Microphone 4: Oh, this is way too high for
53:48 - 53:54

me. OK. I have an answer and a question
for you. You mentioned during the talk
53:54 - 53:59

that you weren't sure how the Department
of Justice distributes the passwords
53:59 - 54:08

for encrypting PDFs. The answer is: in
plain text, in a separate email or as the
54:08 - 54:14

password of the week, which is distributed
through various means. That is also what
54:14 - 54:20

the Department of Homeland Security does,
and the military is somewhat less stupid.
54:20 - 54:27

As a question: I have roughly a half
terabyte of sensitive PDFs that I would
54:27 - 54:37

like to scan for your attack and also for
redaction failures. Do you know of any
54:37 - 54:46

fast, feasible ways to scan documents for
the presence of this kind of attack?
54:46 - 54:52

Fabian: I don't know of any tools, but I
mean, scanning for the gadget attacks is
54:52 - 54:58

actually possible if you try to do some
entropy detection. So, because you reuse
54:58 - 55:02

ciphertext, you will have less entropy in
your ciphertext, but that's pretty hard to
55:02 - 55:07

do. Direct exfiltration should probably be
detectable by scanning simply for words
55:07 - 55:12

like "identity". Well, beyond that, there are the 18
different techniques that we provided in
55:12 - 55:16

the paper. But I don't know of any tools
to do that automatically.
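A rough sketch of the two heuristics just mentioned could look like this. It is not an existing tool: the function names are made up, it works on raw bytes rather than parsed objects, and it will miss anything hidden inside compressed object streams or written with escaped names.

```python
import re
from collections import Counter

# Match the /Identity name but not /Identity-H or /Identity-V, which are
# common font encodings and would otherwise cause false positives.
IDENTITY = re.compile(rb"/Identity(?![A-Za-z0-9_\-])")

def identity_offsets(pdf_bytes: bytes) -> list:
    """Byte offsets where the Identity crypt filter name appears."""
    return [m.start() for m in IDENTITY.finditer(pdf_bytes)]

def repeated_blocks(ciphertext: bytes, block: int = 16) -> dict:
    """16-byte blocks that occur more than once (a hint of reused ciphertext)."""
    counts = Counter(
        ciphertext[i:i + block]
        for i in range(0, len(ciphertext) - block + 1, block)
    )
    return {blk: n for blk, n in counts.items() if n > 1}

sample = b"<< /StmF /Identity /StrF /StdCF >> /Encoding /Identity-H"
hits = identity_offsets(sample)
reused = repeated_blocks(b"A" * 16 + b"B" * 16 + b"A" * 16)
print(hits, reused)
```

A hit from either function is only a reason to look closer, since legitimate documents may use the Identity filter and repeated blocks can occur by chance in non-random data.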
55:16 - 55:22

Microphone 4: Thank you.
Herald: Great. Thank you. And microphone
55:22 - 55:24

number two, please.
Microphone 2: Thank you for your very interesting
55:24 - 55:30

presentation. I have one suggestion and
one question for the mitigation scheme. If
55:30 - 55:34

you simply run your PDF reader in a
virtual machine, that is firewalled away,
55:34 - 55:39

so your firewall won't let anything
go out. But for the signature
55:39 - 55:43

forgeries, I had an idea. I'm not sure if
this is actually a stupid idea, but did
55:43 - 55:47

you consider faking the certificate?
Because presumably the signature is
55:47 - 55:52

protected by the signer's certificate. You
make up your own, signing with that. Does
55:52 - 55:58

it catch it and how?
Vladi: We considered it but not in this
55:58 - 56:05

paper. We assume that the certificate and
the entire chain of trust for this path is
56:05 - 56:12

totally secure. It was just an assumption
to just concentrate only on the attacks we
56:12 - 56:20

already found. So, perhaps there will be
further research provided by us in the
56:20 - 56:23

next months and years.
Herald: We might just hear more from you
56:23 - 56:28

in the future. Thank you so much. And now
questions from the Internet, please.
56:28 - 56:35

Signal Angel: I have two questions to the
first part of your talk from the Internet.
56:35 - 56:41

The first one is you mentioned a few
reactions, but can you give a bit more
56:41 - 56:47

detail about your experience with vendors
while reporting these issues?
56:47 - 56:58

Vladi: Yeah. We, ... for the first time we
started, we asked the CERT team from BSI,
56:58 - 57:05

CERT-Bund, to help us because there were a
lot of affected vendors and we were not
57:05 - 57:14

able to provide the support in a feasible
way. So they supported us the entire way.
57:14 - 57:20

We first created the report with,
containing the exact description of the
57:20 - 57:26

vulnerabilities and all exploits. Then, we
distributed it to the BSI and they
57:26 - 57:33

contacted the vendors and just proxied
the communication and there was a lot of
57:33 - 57:37

communication. So I'm not aware of the
entire communication, but only about the
57:37 - 57:46

technical stuff where we were asked to
just retest the fix and so on. So there
57:46 - 57:53

was some reaction from Adobe, Foxit and a
lot of viewers reacted on our attacks and
57:53 - 57:58

contacted us, but not everybody.
Herald: Thank you so much. Unfortunately,
57:58 - 58:02

that's the only time that we have
available for questions today. I think you
58:02 - 58:06

guys might stay around for a couple of
minutes, just if someone has any more
58:06 - 58:11

questions. Fabian and Vladislav, I can't
thank you enough. Thank you so much.
58:11 - 58:13

It was very interesting. Please give them
a great round of applause.
58:13 - 58:15

Vladi: Thank you.
Applause
58:15 - 58:20

36c3 postroll music
58:20 - 58:43

subtitles created by c3subtitles.de
in the year 2019. Join, and help us!

Title:: 36C3 - How to Break PDFs
Video Language:: English
Duration:: 58:43

	C3Subtitles edited English subtitles for 36C3 - How to Break PDFs
	Florian edited English subtitles for 36C3 - How to Break PDFs