< Return to Video

Format preserving encryption (13 min)

  • 0:00 - 0:04
    In this segment, I want to tell you about
    another form of encryption, called format
  • 0:04 - 0:07
    preserving encryption. This is again
    something that comes up in practice quite
  • 0:07 - 0:11
    often, especially when it comes to
    encrypting credit card numbers. And, so,
  • 0:11 - 0:14
    let's see how it works. Remember that a
    typical credit card number is sixteen
  • 0:14 - 0:19
    digits, broken into four blocks of four
    digits each. You've probably heard before
  • 0:19 - 0:23
    that the first six digits are what's
    called the BIN number, which represent the
  • 0:23 - 0:28
    issuer ID. For example, all credit cards
    issued by VISA always start with the digit
  • 0:28 - 0:34
    four; all credit cards issued by
    MasterCard start with digits 51 to 55, and
  • 0:34 - 0:39
    so on and so forth. The next nine digits
    are actually the account number that is
  • 0:39 - 0:43
    specific to the, to the particular
    customer and the last digit is a check sum
  • 0:43 - 0:48
    that's computed from the previous fifteen
    digits. So there are about 20,000 BIN
  • 0:48 - 0:53
    numbers and so if you do the calculation
    it turns out there are about 2 to the 42
  • 0:53 - 0:57
    possible credit card numbers which
    corresponds to about 42 bits of
  • 0:57 - 1:01
    information that you need to encode if you
    want to represent a credit card number
  • 1:01 - 1:06
    compactly. Now the problem is this.
    Suppose we wanted to encrypt credit card
  • 1:06 - 1:11
    numbers, so that when the user swipes the
    credit card number at the point of sale
  • 1:11 - 1:15
    terminal, namely at the, you know,
    the merchant's cash register. The cash
  • 1:15 - 1:19
    register, this is called a point of sale
    terminal, goes ahead and encrypts the
  • 1:19 - 1:23
    credit card number using a key k and
    it's encrypted all the way until it goes
  • 1:23 - 1:27
    to the acquiring bank or maybe even beyond
    that, maybe even all the way when it goes
  • 1:27 - 1:31
    to Visa. Now, the problem is that these
    credit card numbers actually pass through
  • 1:31 - 1:35
    many, many, many processing points. All of
    them expect to get basically something
  • 1:35 - 1:40
    that looks like a credit card number as a
    result. So if we simply wanted to encrypt
  • 1:40 - 1:44
    something at the end point, at one end
    point, and decrypt it at the other end
  • 1:44 - 1:48
    point, it's actually not that easy because
    if all we did was encrypt it using AES,
  • 1:48 - 1:53
    even if we just used deterministic AES,
    the output of the encrypted credit card
  • 1:53 - 1:57
    number would be 128 bit block. Sixteen
    bytes that would be, that would need to be
  • 1:57 - 2:02
    sent from one system to the next, until it
    reaches its destination. But each one of
  • 2:02 - 2:06
    these processors, then, would have to be
    changed, because they're all expecting to
  • 2:06 - 2:10
    get credit card numbers. And so the
    question is, can we encrypt at the cash
  • 2:10 - 2:14
    register, such that the resulting
    encryption itself looks like a credit card
  • 2:14 - 2:19
    number. And as a result, none of these
    intermediate systems would have to be
  • 2:19 - 2:23
    changed. Only the endpoints would have to
    be changed. And in doing so we would
  • 2:23 - 2:27
    actually obtain end-to-end encryption
    without actually having to change a lot of
  • 2:27 - 2:32
    software along the way. So the question
    then is, again, can we have this mechanism
  • 2:32 - 2:37
    called format preserving encryption, where
    encrypting a credit card itself produces
  • 2:37 - 2:41
    something that looks like a credit card?
    So that's our goal. The encrypted credit
  • 2:41 - 2:45
    card number should look like a regular
    valid credit card number. So this is the
  • 2:45 - 2:49
    goal of format preserving encryption. More
    abstractly what it is we're trying to do,
  • 2:49 - 2:54
    is basically build a pseudo random
    permutation on the set zero to S minus one
  • 2:54 - 2:59
    for any given S. So for the set of credit
    card numbers, S would be roughly, you
  • 2:59 - 3:04
    know, two to the 42. In fact, it's gonna
    be, not exactly two to the 42. It's gonna
  • 3:04 - 3:08
    be some funny numbers that's around two to
    the 42. And our goal is to build a PRP
  • 3:08 - 3:14
    that acts exactly on the interval, zero to
    S minus one. And what we're given as input
  • 3:14 - 3:20
    is some PRF, say, something like AES, that
    acts on much larger blocks, say, acts on
  • 3:20 - 3:24
    sixteen byte blocks. So we're trying to,
    in some sense, shrink the domain of the
  • 3:24 - 3:30
    PRF to make it fit the data that we're
    given. And the question is basically how
  • 3:30 - 3:34
    to do that. Well, once we have such a
    construction we can easily use it to
  • 3:34 - 3:38
    encrypt credit card numbers. What we would
    do is we would take the, a given credit
  • 3:38 - 3:42
    card number. We would encode it such that
    now it's represented as a number between
  • 3:42 - 3:47
    zero and the total number of credit card
    numbers. Then we would apply our PRP so we
  • 3:47 - 3:52
    would encrypt this credit card number, you
    know, using construction number two from
  • 3:52 - 3:56
    the deterministic encryption segment. And
    then we would map the final number back to
  • 3:56 - 4:01
    be a regular, to look like val--, regular,
    valid credit card number and then send
  • 4:01 - 4:05
    this along the way. So this is, this
    is again non expanding encryption. The
  • 4:05 - 4:09
    best we can do, as we said before is to
    encrypt using a PRP except again the
  • 4:09 - 4:14
    technical challenge is we need a PRP that
    acts on this particular funny looking set
  • 4:14 - 4:20
    from zero to S-1 for this prespecified
    value of S. So my goal is to show you how
  • 4:20 - 4:23
    to construct this and once we see how to
    do it, you will also know how to encrypt
  • 4:23 - 4:28
    credit card number so that the resulting
    cipher text is itself a credit card
  • 4:28 - 4:33
    number. So the construction works in two
    steps. The first thing we do is we shrink
  • 4:33 - 4:39
    our PRF from the set {0,1} to the n, say
    {0,1} to the 128 in the case of AES,
  • 4:39 - 4:44
    to {0,1} to the t where t is the
    closest power of two to the value S.
  • 4:45 - 4:50
    So say S is some crazy number around two to
    the 41. T would basically be then 42
  • 4:50 - 4:55
    because that's the closest power of two
    that's just above the value S. So the
  • 4:55 - 4:59
    construction is basically gonna use the
    Luby-Rackoff construction. What we need
  • 4:59 - 5:05
    is a PRF F prime that acts on blocks of
    size t over two. So imagine for example
  • 5:05 - 5:10
    in the credit card case, t would be 42
    because two to the 42 we said is the
  • 5:10 - 5:15
    closest power of two that's bigger than,
    than, than S. S is the number of credit,
  • 5:15 - 5:20
    total number of credit cards. Then we would
    need a PRF now that acts on 21-bit
  • 5:20 - 5:26
    inputs. So one way to do that is simply to
    take the AES block cipher, treat it as a
  • 5:26 - 5:30
    PRF on 128-bit inputs, and then simply
    truncate the output to the least
  • 5:30 - 5:35
    significant 21 bits. Okay, so we were
    given a 21 bit value. We would append
  • 5:35 - 5:40
    zeros to it so that we get 128 bits as a
    result. We would apply AES to that and
  • 5:40 - 5:45
    then we would truncate back to 21 bits.
    Now I should tell you that that's actually
  • 5:45 - 5:49
    a simple way to do it but it's actually
    not the best way to do it. There are
  • 5:49 - 5:54
    actually better ways to truncate a PRF on
    n bits to a PRF on t over two bits.
  • 5:54 - 5:58
    But for our pur--, for the purposes of this
    segment, the truncation method I just said
  • 5:59 - 6:03
    is good enough. So let's just use that
    particular method. Okay, so now, we've
  • 6:03 - 6:09
    converted our AES block cipher into a PRF
    on t over two bits, say, on 21 bits. But
  • 6:09 - 6:14
    what we really want is a PRP. And so what
    we'll do is we'll plug our new PRF, F prime,
  • 6:14 - 6:18
    directly into the Luby-Rackoff
    construction. If you remember, this is
  • 6:18 - 6:22
    basically a Feistel construction. We saw
    this when we talked about DES. It's a,
  • 6:22 - 6:27
    Luby-Rackoff used three rounds, and we
    know that this converts a secure PRF into
  • 6:27 - 6:32
    a secure PRP on twice the block size. In
    other words, we started from a secure PRF
  • 6:32 - 6:37
    on 21 bits, and that will give us, and
    Luby-Rackoff will give us a secure PRF on
  • 6:37 - 6:41
    42 bits, which is what we wanted. Now, I
    should tell you that the error parameters
  • 6:41 - 6:46
    in the Luby-Rackoff construction are not as
    good as they could be. And, in fact, a
  • 6:46 - 6:50
    better thing to do is to use seven rounds
    of Feistel. So in other words, we'll do
  • 6:50 - 6:55
    this seven times; every time we'll use a
    different key. So you notice, here, we're
  • 6:55 - 6:59
    only using three keys. We should be using,
    we should be doing this seven times using
  • 6:59 - 7:04
    seven different keys. And then there's a
    nice result, due to Patarin, that
  • 7:04 - 7:07
    shows that the seven-round construction
    basically has as good an error
  • 7:07 - 7:11
    terms as possible. So the error terms in
    the security theorem will basically be two
  • 7:11 - 7:16
    to the T. Okay. So now we have a pseudo
    random permutation that operates on close
  • 7:16 - 7:22
    power of two to the value of S. But that's
    not good enough. We actually want to get a
  • 7:22 - 7:27
    pseudo random permutation on the set zero
    to S minus one. So step two will take us
  • 7:27 - 7:31
    down from {0,1} to the T, to the actual
    set zero to the S minus one that we're
  • 7:31 - 7:35
    interested in. And this construction is,
    again, really, really cute, so let me show
  • 7:35 - 7:39
    you how it works. So, again, we're given
    this PRP that operates on a power of two.
  • 7:39 - 7:44
    And we wanna go down to a PRP that
    operates on something slightly smaller.
  • 7:44 - 7:49
    Okay, so here's on the construction works.
    Basically we're given input X in the set
  • 7:49 - 7:53
    zero to S minus one that we want. And what
    we're going to do is we're going to
  • 7:53 - 7:57
    iterate the following procedure again
    and again. So first of all we map X into
  • 7:57 - 8:02
    this temporary variable Y. And now we're
    just gonna encrypt the input X and put the
  • 8:02 - 8:07
    result into Y. If Y is inside of our
    target set, we stop and we output Y. If
  • 8:07 - 8:12
    not we iterate this again and again and
    again and again and again until finally Y
  • 8:12 - 8:17
    falls into our target set and then we
    output that value. So in a picture, let me
  • 8:17 - 8:23
    explain how this works. So we start from a
    point in our target set. And now, now we
  • 8:23 - 8:27
    apply the, the function E and we might be
    mapped into this point outside our target
  • 8:27 - 8:31
    set, so we're not gonna stop. So now we
    apply the function E again and we might
  • 8:31 - 8:35
    be mapped into this point and now we apply
    the function E again and now all of a
  • 8:35 - 8:39
    sudden we map into this point, and then we
    stop, and that's our output. Okay, so
  • 8:39 - 8:44
    that's how we map points between from zero
    to S minus one, to other points in the
  • 8:44 - 8:48
    zero to S minus one. It should be pretty
    clear that this is invertible. To invert,
  • 8:48 - 8:52
    all I'll do is I'll just decrypt and walk,
    kind of, in the opposite direction. So
  • 8:52 - 8:56
    I'll decrypt, and then decrypt, and then
    decrypt, until finally, I fall into the
  • 8:56 - 9:00
    set, zero to S minus one. And that gives
    me the inverse of the point that I wanted
  • 9:00 - 9:05
    to. So this is, in fact, a PRP. The
    question is whether it's a secure PRP, and
  • 9:05 - 9:09
    we'll see that in just a minute. But before
    that, let me ask you, how many iterations
  • 9:09 - 9:13
    do you expect that we're gonna need? And
    I wanna remind you again, before I ask you
  • 9:13 - 9:20
    that question, that, in fact, S is less than
    two to the T, but is more than two to the T-1.
  • 9:20 - 9:25
    So in particular S is greater than two to
    the T over two. And my question to you
  • 9:25 - 9:30
    now is how many iterations are we gonna
    need, on average, until this process converges?
  • 9:33 - 9:35
    So the answer is we're going to need two iterations,
  • 9:35 - 9:39
    so really just a small
    constant number of iterations. And so this
  • 9:39 - 9:43
    will give us a very, very efficient
    mechanism that will move us down from
  • 9:43 - 9:49
    {0,1} to the T to zero to the S-1. So now
    when we talk about security, it turns out
  • 9:49 - 9:53
    this step here is tight. In other words,
    there is really no additional security
  • 9:53 - 9:59
    cost to going down from two to the T to
    zero to S-1. And the reason that's true,
  • 9:59 - 10:03
    it's actually again a very cute theorem
    to prove, but I, I won't do it here. Maybe
  • 10:03 - 10:08
    I'll leave it as an exercise for you guys
    to argue. Which says that if you give me
  • 10:08 - 10:12
    any two sets Y and X, where Y is contained
    inside of X, then applying the
  • 10:12 - 10:17
    transformation that we just saw to a
    random permutation from X to X actually
  • 10:17 - 10:21
    gives a random permutation from Y to Y. So
    this means that if we started with a truly
  • 10:21 - 10:26
    random permutation on X, you'll end up
    with a truly random permutation on Y. And
  • 10:26 - 10:29
    since, well, the permutation we're
    starting with is indistinguishable from
  • 10:29 - 10:33
    random on X, we'll end up with a
    permutation that's indistinguishable from
  • 10:33 - 10:38
    random on Y. Okay, so this is a very tight
    construction and as I said, the first step
  • 10:38 - 10:42
    actually is basically, the analysis is the
    same as the Luby-Rackoff analysis. In
  • 10:42 - 10:46
    fact, it's better to use as I said, the
    Patarin analysis using seven rounds. So
  • 10:46 - 10:50
    actually here, there's a bit of cost in
    security. But, overall, we get a
  • 10:50 - 10:56
    construction that is a secure PRP for
    actually, not such good security
  • 10:56 - 11:00
    parameters, but nevertheless, this is a
    good secure PRP that we can construct that
  • 11:00 - 11:05
    actually will operate on the range zero to
    S minus one. This in turn will allow us to
  • 11:05 - 11:09
    encrypt credit card numbers such that the
    cipher text looks like a credit card
  • 11:09 - 11:13
    number. And again, I want to emphasize
    that there's no integrity here. The best
  • 11:13 - 11:17
    we can achieve here is just
    deterministic CPA security. No cipher text
  • 11:17 - 11:21
    integrity, and no randomness at all. Okay.
    So this concludes this module. And as
  • 11:21 - 11:26
    usual I want to point to a few reading
    materials that you can take a look at if
  • 11:26 - 11:30
    you're interested in learning more about
    anything that we discussed in this module.
  • 11:30 - 11:34
    So first of all, the HKDF construction
    that we talked about for key derivation is
  • 11:34 - 11:39
    described in a paper from 2010 by Hugo
    Krawczyk. Deterministic encryption, The
  • 11:39 - 11:44
    SIV mode that we described is discussed in
    this paper over here. The EME mode that we
  • 11:44 - 11:50
    described that allows us to build a, a
    wider block pseudo random permutation is
  • 11:50 - 11:54
    described in this paper here by Halevi and
    Rogaway. Tweakable block ciphers that
  • 11:54 - 11:59
    actually led to the XDS mode of operation
    that's used for disk encryption is
  • 11:59 - 12:05
    described in this paper here. And finally,
    a format preserving encryption is described
  • 12:05 - 12:10
    in this last paper that I point to over
    here. Okay so this concludes this module.
  • 12:10 - 12:14
    And in the next module we gonna start
    looking at how to do key exchange.
Title:
Format preserving encryption (13 min)
Video Language:
English

English subtitles

Revisions