Format preserving encryption (13 min)

Edit subtitles

0:00 - 0:04

In this segment, I want to tell you about
another form of encryption, called format
0:04 - 0:07

preserving encryption. This is again
something that comes up in practice quite
0:07 - 0:11

often, especially when it comes to
encrypting credit card numbers. And, so,
0:11 - 0:14

let's see how it works. Remember that a
typical credit card number is sixteen
0:14 - 0:19

digits, broken into four blocks of four
digits each. You've probably heard before
0:19 - 0:23

that the first six digits are what's
called the BIN number, which represent the
0:23 - 0:28

issuer ID. For example, all credit cards
issued by VISA always start with the digit
0:28 - 0:34

four; all credit cards issued by
MasterCard start with digits 51 to 55, and
0:34 - 0:39

so on and so forth. The next nine digits
are actually the account number that is
0:39 - 0:43

specific to the, to the particular
customer and the last digit is a check sum
0:43 - 0:48

that's computed from the previous fifteen
digits. So there are about 20,000 BIN
0:48 - 0:53

numbers and so if you do the calculation
it turns out there are about 2 to the 42
0:53 - 0:57

possible credit card numbers which
corresponds to about 42 bits of
0:57 - 1:01

information that you need to encode if you
want to represent a credit card number
1:01 - 1:06

compactly. Now the problem is this.
Suppose we wanted to encrypt credit card
1:06 - 1:11

numbers, so that when the user swipes the
credit card number at the point of sale
1:11 - 1:15

terminal, namely at the, you know,
the merchant's cash register. The cash
1:15 - 1:19

register, this is called a point of sale
terminal, goes ahead and encrypts the
1:19 - 1:23

credit card number using a key k and
it's encrypted all the way until it goes
1:23 - 1:27

to the acquiring bank or maybe even beyond
that, maybe even all the way when it goes
1:27 - 1:31

to Visa. Now, the problem is that these
credit card numbers actually pass through
1:31 - 1:35

many, many, many processing points. All of
them expect to get basically something
1:35 - 1:40

that looks like a credit card number as a
result. So if we simply wanted to encrypt
1:40 - 1:44

something at the end point, at one end
point, and decrypt it at the other end
1:44 - 1:48

point, it's actually not that easy because
if all we did was encrypt it using AES,
1:48 - 1:53

even if we just used deterministic AES,
the output of the encrypted credit card
1:53 - 1:57

number would be 128 bit block. Sixteen
bytes that would be, that would need to be
1:57 - 2:02

sent from one system to the next, until it
reaches its destination. But each one of
2:02 - 2:06

these processors, then, would have to be
changed, because they're all expecting to
2:06 - 2:10

get credit card numbers. And so the
question is, can we encrypt at the cash
2:10 - 2:14

register, such that the resulting
encryption itself looks like a credit card
2:14 - 2:19

number. And as a result, none of these
intermediate systems would have to be
2:19 - 2:23

changed. Only the endpoints would have to
be changed. And in doing so we would
2:23 - 2:27

actually obtain end-to-end encryption
without actually having to change a lot of
2:27 - 2:32

software along the way. So the question
then is, again, can we have this mechanism
2:32 - 2:37

called format preserving encryption, where
encrypting a credit card itself produces
2:37 - 2:41

something that looks like a credit card?
So that's our goal. The encrypted credit
2:41 - 2:45

card number should look like a regular
valid credit card number. So this is the
2:45 - 2:49

goal of format preserving encryption. More
abstractly what it is we're trying to do,
2:49 - 2:54

is basically build a pseudo random
permutation on the set zero to S minus one
2:54 - 2:59

for any given S. So for the set of credit
card numbers, S would be roughly, you
2:59 - 3:04

know, two to the 42. In fact, it's gonna
be, not exactly two to the 42. It's gonna
3:04 - 3:08

be some funny numbers that's around two to
the 42. And our goal is to build a PRP
3:08 - 3:14

that acts exactly on the interval, zero to
S minus one. And what we're given as input
3:14 - 3:20

is some PRF, say, something like AES, that
acts on much larger blocks, say, acts on
3:20 - 3:24

sixteen byte blocks. So we're trying to,
in some sense, shrink the domain of the
3:24 - 3:30

PRF to make it fit the data that we're
given. And the question is basically how
3:30 - 3:34

to do that. Well, once we have such a
construction we can easily use it to
3:34 - 3:38

encrypt credit card numbers. What we would
do is we would take the, a given credit
3:38 - 3:42

card number. We would encode it such that
now it's represented as a number between
3:42 - 3:47

zero and the total number of credit card
numbers. Then we would apply our PRP so we
3:47 - 3:52

would encrypt this credit card number, you
know, using construction number two from
3:52 - 3:56

the deterministic encryption segment. And
then we would map the final number back to
3:56 - 4:01

be a regular, to look like val--, regular,
valid credit card number and then send
4:01 - 4:05

this along the way. So this is, this
is again non expanding encryption. The
4:05 - 4:09

best we can do, as we said before is to
encrypt using a PRP except again the
4:09 - 4:14

technical challenge is we need a PRP that
acts on this particular funny looking set
4:14 - 4:20

from zero to S-1 for this prespecified
value of S. So my goal is to show you how
4:20 - 4:23

to construct this and once we see how to
do it, you will also know how to encrypt
4:23 - 4:28

credit card number so that the resulting
cipher text is itself a credit card
4:28 - 4:33

number. So the construction works in two
steps. The first thing we do is we shrink
4:33 - 4:39

our PRF from the set {0,1} to the n, say
{0,1} to the 128 in the case of AES,
4:39 - 4:44

to {0,1} to the t where t is the
closest power of two to the value S.
4:45 - 4:50

So say S is some crazy number around two to
the 41. T would basically be then 42
4:50 - 4:55

because that's the closest power of two
that's just above the value S. So the
4:55 - 4:59

construction is basically gonna use the
Luby-Rackoff construction. What we need
4:59 - 5:05

is a PRF F prime that acts on blocks of
size t over two. So imagine for example
5:05 - 5:10

in the credit card case, t would be 42
because two to the 42 we said is the
5:10 - 5:15

closest power of two that's bigger than,
than, than S. S is the number of credit,
5:15 - 5:20

total number of credit cards. Then we would
need a PRF now that acts on 21-bit
5:20 - 5:26

inputs. So one way to do that is simply to
take the AES block cipher, treat it as a
5:26 - 5:30

PRF on 128-bit inputs, and then simply
truncate the output to the least
5:30 - 5:35

significant 21 bits. Okay, so we were
given a 21 bit value. We would append
5:35 - 5:40

zeros to it so that we get 128 bits as a
result. We would apply AES to that and
5:40 - 5:45

then we would truncate back to 21 bits.
Now I should tell you that that's actually
5:45 - 5:49

a simple way to do it but it's actually
not the best way to do it. There are
5:49 - 5:54

actually better ways to truncate a PRF on
n bits to a PRF on t over two bits.
5:54 - 5:58

But for our pur--, for the purposes of this
segment, the truncation method I just said
5:59 - 6:03

is good enough. So let's just use that
particular method. Okay, so now, we've
6:03 - 6:09

converted our AES block cipher into a PRF
on t over two bits, say, on 21 bits. But
6:09 - 6:14

what we really want is a PRP. And so what
we'll do is we'll plug our new PRF, F prime,
6:14 - 6:18

directly into the Luby-Rackoff
construction. If you remember, this is
6:18 - 6:22

basically a Feistel construction. We saw
this when we talked about DES. It's a,
6:22 - 6:27

Luby-Rackoff used three rounds, and we
know that this converts a secure PRF into
6:27 - 6:32

a secure PRP on twice the block size. In
other words, we started from a secure PRF
6:32 - 6:37

on 21 bits, and that will give us, and
Luby-Rackoff will give us a secure PRF on
6:37 - 6:41

42 bits, which is what we wanted. Now, I
should tell you that the error parameters
6:41 - 6:46

in the Luby-Rackoff construction are not as
good as they could be. And, in fact, a
6:46 - 6:50

better thing to do is to use seven rounds
of Feistel. So in other words, we'll do
6:50 - 6:55

this seven times; every time we'll use a
different key. So you notice, here, we're
6:55 - 6:59

only using three keys. We should be using,
we should be doing this seven times using
6:59 - 7:04

seven different keys. And then there's a
nice result, due to Patarin, that
7:04 - 7:07

shows that the seven-round construction
basically has as good an error
7:07 - 7:11

terms as possible. So the error terms in
the security theorem will basically be two
7:11 - 7:16

to the T. Okay. So now we have a pseudo
random permutation that operates on close
7:16 - 7:22

power of two to the value of S. But that's
not good enough. We actually want to get a
7:22 - 7:27

pseudo random permutation on the set zero
to S minus one. So step two will take us
7:27 - 7:31

down from {0,1} to the T, to the actual
set zero to the S minus one that we're
7:31 - 7:35

interested in. And this construction is,
again, really, really cute, so let me show
7:35 - 7:39

you how it works. So, again, we're given
this PRP that operates on a power of two.
7:39 - 7:44

And we wanna go down to a PRP that
operates on something slightly smaller.
7:44 - 7:49

Okay, so here's on the construction works.
Basically we're given input X in the set
7:49 - 7:53

zero to S minus one that we want. And what
we're going to do is we're going to
7:53 - 7:57

iterate the following procedure again
and again. So first of all we map X into
7:57 - 8:02

this temporary variable Y. And now we're
just gonna encrypt the input X and put the
8:02 - 8:07

result into Y. If Y is inside of our
target set, we stop and we output Y. If
8:07 - 8:12

not we iterate this again and again and
again and again and again until finally Y
8:12 - 8:17

falls into our target set and then we
output that value. So in a picture, let me
8:17 - 8:23

explain how this works. So we start from a
point in our target set. And now, now we
8:23 - 8:27

apply the, the function E and we might be
mapped into this point outside our target
8:27 - 8:31

set, so we're not gonna stop. So now we
apply the function E again and we might
8:31 - 8:35

be mapped into this point and now we apply
the function E again and now all of a
8:35 - 8:39

sudden we map into this point, and then we
stop, and that's our output. Okay, so
8:39 - 8:44

that's how we map points between from zero
to S minus one, to other points in the
8:44 - 8:48

zero to S minus one. It should be pretty
clear that this is invertible. To invert,
8:48 - 8:52

all I'll do is I'll just decrypt and walk,
kind of, in the opposite direction. So
8:52 - 8:56

I'll decrypt, and then decrypt, and then
decrypt, until finally, I fall into the
8:56 - 9:00

set, zero to S minus one. And that gives
me the inverse of the point that I wanted
9:00 - 9:05

to. So this is, in fact, a PRP. The
question is whether it's a secure PRP, and
9:05 - 9:09

we'll see that in just a minute. But before
that, let me ask you, how many iterations
9:09 - 9:13

do you expect that we're gonna need? And
I wanna remind you again, before I ask you
9:13 - 9:20

that question, that, in fact, S is less than
two to the T, but is more than two to the T-1.
9:20 - 9:25

So in particular S is greater than two to
the T over two. And my question to you
9:25 - 9:30

now is how many iterations are we gonna
need, on average, until this process converges?
9:33 - 9:35

So the answer is we're going to need two iterations,
9:35 - 9:39

so really just a small
constant number of iterations. And so this
9:39 - 9:43

will give us a very, very efficient
mechanism that will move us down from
9:43 - 9:49

{0,1} to the T to zero to the S-1. So now
when we talk about security, it turns out
9:49 - 9:53

this step here is tight. In other words,
there is really no additional security
9:53 - 9:59

cost to going down from two to the T to
zero to S-1. And the reason that's true,
9:59 - 10:03

it's actually again a very cute theorem
to prove, but I, I won't do it here. Maybe
10:03 - 10:08

I'll leave it as an exercise for you guys
to argue. Which says that if you give me
10:08 - 10:12

any two sets Y and X, where Y is contained
inside of X, then applying the
10:12 - 10:17

transformation that we just saw to a
random permutation from X to X actually
10:17 - 10:21

gives a random permutation from Y to Y. So
this means that if we started with a truly
10:21 - 10:26

random permutation on X, you'll end up
with a truly random permutation on Y. And
10:26 - 10:29

since, well, the permutation we're
starting with is indistinguishable from
10:29 - 10:33

random on X, we'll end up with a
permutation that's indistinguishable from
10:33 - 10:38

random on Y. Okay, so this is a very tight
construction and as I said, the first step
10:38 - 10:42

actually is basically, the analysis is the
same as the Luby-Rackoff analysis. In
10:42 - 10:46

fact, it's better to use as I said, the
Patarin analysis using seven rounds. So
10:46 - 10:50

actually here, there's a bit of cost in
security. But, overall, we get a
10:50 - 10:56

construction that is a secure PRP for
actually, not such good security
10:56 - 11:00

parameters, but nevertheless, this is a
good secure PRP that we can construct that
11:00 - 11:05

actually will operate on the range zero to
S minus one. This in turn will allow us to
11:05 - 11:09

encrypt credit card numbers such that the
cipher text looks like a credit card
11:09 - 11:13

number. And again, I want to emphasize
that there's no integrity here. The best
11:13 - 11:17

we can achieve here is just
deterministic CPA security. No cipher text
11:17 - 11:21

integrity, and no randomness at all. Okay.
So this concludes this module. And as
11:21 - 11:26

usual I want to point to a few reading
materials that you can take a look at if
11:26 - 11:30

you're interested in learning more about
anything that we discussed in this module.
11:30 - 11:34

So first of all, the HKDF construction
that we talked about for key derivation is
11:34 - 11:39

described in a paper from 2010 by Hugo
Krawczyk. Deterministic encryption, The
11:39 - 11:44

SIV mode that we described is discussed in
this paper over here. The EME mode that we
11:44 - 11:50

described that allows us to build a, a
wider block pseudo random permutation is
11:50 - 11:54

described in this paper here by Halevi and
Rogaway. Tweakable block ciphers that
11:54 - 11:59

actually led to the XDS mode of operation
that's used for disk encryption is
11:59 - 12:05

described in this paper here. And finally,
a format preserving encryption is described
12:05 - 12:10

in this last paper that I point to over
here. Okay so this concludes this module.
12:10 - 12:14

And in the next module we gonna start
looking at how to do key exchange.

Title:: Format preserving encryption (13 min)
Video Language:: English

	amyc edited English subtitles for Format preserving encryption (13 min)
	amyc edited English subtitles for Format preserving encryption (13 min)
	amyc edited English subtitles for Format preserving encryption (13 min)
	stanford-bot edited English subtitles for Format preserving encryption (13 min)
	stanford-bot edited English subtitles for Format preserving encryption (13 min)
	stanford-bot added a translation

English subtitles

Revisions

Revision 2

amyc

Format preserving encryption (13 min)

Revisions

Our website uses cookies

Operating cookies (Required)