#rC3 - Corona-Warn-App

  • 0:00 - 0:13
    rC3 preroll music
  • 0:13 - 0:20
    Herald: All right, CWA - three simple
    letters, but what stands behind them is
  • 0:20 - 0:25
    not simple at all. For various reasons. The
    Corona-Warn-App has been one of the most
  • 0:25 - 0:30
    talked about digital projects of the year.
    Behind its rather simplistic facade there
  • 0:30 - 0:35
    are many considerations that went into the
    App's design to protect its users and
  • 0:35 - 0:39
    their data. While they might not be
    visible to most users, these goals had a
  • 0:39 - 0:43
    direct influence on the software
    architecture. For instance, the risk
  • 0:43 - 0:48
    calculation. Here today to talk about some
    of these backend elements is one of the
  • 0:48 - 0:53
    solution architects of the Corona-Warn-App
    - Thomas Klingbeil. And I'm probably not
  • 0:53 - 1:00
    the only one here at rC3, who is an active
    user. And I'm pretty curious to hear more
  • 1:00 - 1:04
    about what's going on behind the scenes of
    the App. So without further ado, let's
  • 1:04 - 1:12
    give a warm virtual welcome to Thomas
    Klingbeil. Thomas, the stream is yours.
  • 1:15 - 1:19
    Thomas Klingbeil: Hello, everybody. I'm
    Thomas Klingbeil, and today in the
  • 1:19 - 1:23
    session, I would like to talk about the
    German Corona-Warn-App and give you a
  • 1:23 - 1:28
    little tour behind the scenes of the App
    development, the underlying technologies
  • 1:28 - 1:34
    and which things are invisible to the end
    user, but still very important for the App
  • 1:34 - 1:38
    itself. First, I would like to give you a
    short introduction to the App, the
  • 1:38 - 1:42
    underlying architecture and the technologies
    used, for example, the Exposure
  • 1:42 - 1:46
    Notification Framework. Then I would like
    to have a look at the communication
  • 1:46 - 1:52
    between the App and the backend and
    look at which possible privacy threats
  • 1:52 - 1:57
    could be found and how we mitigated them,
    of course. And then I would like to dive a
  • 1:57 - 2:02
    little bit into the risk calculation of
    the App to show you what it actually
  • 2:02 - 2:07
    means if there is a red or a green
    screen visible to the end user. First of
  • 2:07 - 2:13
    all, we can ask ourselves the question,
    what is the Corona-Warn-App, actually? So,
  • 2:13 - 2:21
    here it is. This is the German Corona-Warn-App,
    you can download it from the App stores and
  • 2:21 - 2:25
    once you have onboarded onto the App, you
    will see the following: up here it shows
  • 2:25 - 2:29
    you that the exposure logging is active,
    which means this is the currently active
  • 2:29 - 2:34
    App. Then you have this green card. Green
    means it's low risk because there have
  • 2:34 - 2:40
    been no exposures so far. The logging has
    been permanently active and it has just
  • 2:40 - 2:45
    updated this afternoon. So everything is
    all right. Let's say you have just been
  • 2:45 - 2:51
    tested at a doctor's, then you could click
    this button here and you get to the
  • 2:51 - 2:57
    screen where you're able to retrieve your test
    result digitally. To do this, you can scan
  • 2:57 - 3:02
    a QR code, which is on the form you
    received from your doctor, and then you
  • 3:02 - 3:09
    will get an update as soon as the test
    result is available. Of course, you can
  • 3:09 - 3:13
    also get more information about the active
    exposure logging when you click the button
  • 3:13 - 3:17
    up here, then you get to this screen and
    there you can learn more about the
  • 3:17 - 3:22
    transnational exposure logging, because
    the German Corona-Warn-App is not alone.
  • 3:22 - 3:27
    It is connected to other Corona-Apps of
    other countries within Europe. So users
  • 3:27 - 3:33
    from other countries can meet and they
    would be informed mutually about possible
  • 3:33 - 3:41
    encounters. So just to be sure, I would
    like to quickly dive into the terminology
  • 3:41 - 3:46
    of the exposure notification framework. So
    you know what I'm talking about during
  • 3:46 - 3:53
    this session. It all starts with a
    Temporary Exposure Key which is generated
  • 3:53 - 3:59
    on the phone and which is valid for 24
    hours. From this Temporary Exposure Key,
  • 3:59 - 4:03
    several things are derived. First, for
    example, there is the Rolling Proximity
  • 4:03 - 4:09
    Identifier Key and the Associated
    Encrypted Metadata Key. This part down
  • 4:09 - 4:13
    here, we can skip for the time being and
    look at the generation of Rolling
  • 4:13 - 4:19
    Proximity Identifiers. Those Rolling
    Proximity Identifiers are only valid for
  • 4:19 - 4:24
    10 minutes each because they are regularly
    exchanged once the Bluetooth MAC-Address
  • 4:24 - 4:29
    change takes place. So the Rolling
    Proximity Identifier is basically the
  • 4:29 - 4:33
    Bluetooth payload your phone uses, when
    the Exposure Notification Framework is
  • 4:33 - 4:39
    active and broadcasting. When I say
    broadcasting, I mean every 250
  • 4:39 - 4:45
    milliseconds your phone sends out its own
    Rolling Proximity Identifiers, so other
  • 4:45 - 4:52
    phones around, which are scanning for
    signals in the air, basically can catch them
  • 4:52 - 4:58
    and store them locally. So let's look at
    the receiving side. This is what we see
  • 4:58 - 5:02
    down here and now, as I've already
    mentioned, we've got those Bluetooth low
  • 5:02 - 5:07
    energy beacon mechanics sending out those
    Rolling Proximity Identifiers and they're
  • 5:07 - 5:13
    received down here. This is all a very
    simplified schematic, just to give you an
  • 5:13 - 5:18
    impression of what's going on there. So
    now we've got those Rolling Proximity
  • 5:18 - 5:24
    Identifiers stored on the receiving phone
    and now, somehow, this other phone needs
  • 5:24 - 5:29
    to find out that there has been a match.
    This happens by transforming those
  • 5:29 - 5:34
    Temporary Exposure Keys into Diagnosis
    Keys, which is just a renaming. But as
  • 5:34 - 5:38
    soon as someone has tested positive and a
    Temporary Exposure Key is linked to a
  • 5:38 - 5:43
    positive diagnosis, it is called Diagnosis
    Key and they are uploaded to the server.
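The key derivation described up to this point can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the framework's own code: it assumes HKDF-SHA256 with an empty salt and the info strings "EN-RPIK" and "EN-AEMK" as given in the published Exposure Notification cryptography specification, and it leaves out the AES step that turns the Rolling Proximity Identifier Key into the broadcast identifiers, because the Python standard library has no AES. All function names are made up for illustration.

```python
import hashlib
import hmac
import os


def hkdf_sha256(key_material: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256 and an empty salt."""
    # Extract: an empty salt is replaced by a block of zero bytes.
    salt = bytes(hashlib.sha256().digest_size)
    prk = hmac.new(salt, key_material, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output is available.
    output, block, counter = b"", b"", 1
    while len(output) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]


def derive_keys(temporary_exposure_key: bytes) -> tuple[bytes, bytes]:
    """Derive the two 16-byte keys from one Temporary Exposure Key."""
    rpik = hkdf_sha256(temporary_exposure_key, b"EN-RPIK", 16)  # identifier key
    aemk = hkdf_sha256(temporary_exposure_key, b"EN-AEMK", 16)  # metadata key
    return rpik, aemk


def interval_number(unix_seconds: int) -> int:
    """Index of the 10-minute interval a timestamp falls into."""
    return unix_seconds // 600


tek = os.urandom(16)  # a fresh Temporary Exposure Key, valid for 24 hours
rpik, aemk = derive_keys(tek)
```

Because the derivation is deterministic, a receiving phone that downloads a Diagnosis Key can recompute the same keys and, from them, the identifiers it may have seen.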
  • 5:44 - 5:52
    And I'm drastically simplifying here. So
    they are received by the other phone here, they're
  • 5:52 - 5:58
    downloaded, all those Diagnosis Keys are
    extracted again. And as you can see, the
  • 5:58 - 6:06
    same functions are applied again, HKDF, then
    AES, and we get a lot of Rolling Proximity
  • 6:06 - 6:13
    Identifiers for matching down here. And
    those are the ones we have stored and now
  • 6:13 - 6:17
    we can match them and find out which of
    those Rolling Proximity Identifiers we
  • 6:17 - 6:23
    have seen so far. And, of course, the
    receiving phone can also make sure that
  • 6:23 - 6:28
    the Rolling Proximity Identifiers
    belonging to a single Diagnosis Key, which
  • 6:28 - 6:33
    means they belong to one single other
    phone, are connected to each other. So we
  • 6:33 - 6:38
    can also track exposures which have lasted
    longer than 10 minutes. So, for example,
  • 6:38 - 6:43
    if you are having a meeting of 90 minutes,
    this would allow the exposure
  • 6:43 - 6:47
    notification framework to get together
    those up to nine Rolling Proximity
  • 6:47 - 6:52
    Identifiers and transform them into a
    single encounter, which then gets
  • 6:52 - 6:57
    enriched with those associated encrypted
    metadata, which is basically just the
  • 6:57 - 7:05
    transmit power. As a summary, down here. So
    now that we know which data are being
  • 7:05 - 7:10
    transferred from phone to phone, we can
    have a look at the actual architecture of
  • 7:10 - 7:17
    the App itself. This gray box here is the
    mobile phone, and down here is the German
  • 7:17 - 7:23
    Corona-Warn-App, it's a dashed line, which
    means there's more documentation available
  • 7:23 - 7:29
    online. So I can only invite you to go to
    the GitHub repository. Have a look at our code
  • 7:29 - 7:34
    and, of course, our documentation. So
    there are more diagrams available. And as
  • 7:34 - 7:39
    you can see, the App itself does not store
    a lot of data. So those boxes here are
  • 7:39 - 7:44
    storages. So it only stores something
    called a Registration Token and the
  • 7:44 - 7:50
    contact journal entries for our most
    recent version, which means that's all the
  • 7:50 - 7:56
    App stores itself. What you can see here
    is that it's connected to the operating
  • 7:56 - 8:00
    system API/SDK for the exposure
    notifications, so that's the exposure
  • 8:00 - 8:04
    notification framework to which we
    interface, which takes care of all the key
  • 8:04 - 8:10
    collecting, broadcasting and key matching
    as well. Then there's a protocol buffer
  • 8:10 - 8:16
    library which we need for the data
    transfer, and we use the operating system
  • 8:16 - 8:21
    cryptography libraries or, basically, the
    SDK. So we don't need to include external
  • 8:21 - 8:31
    libraries for that. What you can see here
    is the OS API/SDK for push messages. But
  • 8:31 - 8:36
    this is not remote push messaging, but
    only local. So the App triggers local
  • 8:36 - 8:42
    notifications and to the user, it appears
    as if the notification or push message
  • 8:42 - 8:50
    came in remotely, but actually it only
    uses local messages. But what would the
  • 8:50 - 8:56
    App be without the actual backend
    infrastructure? So you can see here,
  • 8:56 - 9:02
    that's the Corona-Warn-App server, that's
    the actual backend for managing all the
  • 9:02 - 9:08
    keys. And you see the upload path here.
    It's aggregated, then provided through
  • 9:08 - 9:13
    a content delivery network and downloaded by
    the App here. But we've got more. We've
  • 9:13 - 9:19
    got the verification server, which has the
    job of verifying a positive test result.
  • 9:19 - 9:26
    And how does it do that? There are basically
    two ways: it can either get the information
  • 9:26 - 9:33
    that a positive test is true through a
    so-called teleTAN, which is the most basic
  • 9:33 - 9:38
    way, because people call up the hotline,
    get one of those teleTANs, enter it into the
  • 9:38 - 9:45
    App and then they are able to upload the
    Diagnosis Keys or, if people use the fully
  • 9:45 - 9:50
    digital way, they get their test result
    through the App. And that's why we have
  • 9:50 - 9:55
    the test results server up here, which can
    be queried by the verification server
  • 9:55 - 10:02
    so users can get the test result through
    the infrastructure. But that's not all,
  • 10:02 - 10:07
    because as I've promised earlier, we've also
    got the connection to other European
  • 10:07 - 10:11
    countries. So down here is the European
    Federation Gateway Service, which gives us
  • 10:11 - 10:17
    the possibility to a) upload our own
    national keys to this European Federation
  • 10:17 - 10:21
    Gateway Service, so other countries can
    download them and distribute them to their
  • 10:21 - 10:26
    users, but we can also request foreign
    keys and, it gets even better, we can be
  • 10:26 - 10:31
    informed if new foreign keys are available
    for download through a callback mechanism,
  • 10:31 - 10:40
    which is just here on the right side. So
    once the app is communicating with the
  • 10:40 - 10:49
    backend, what would actually happen if
    someone is listening? So we've got our
  • 10:49 - 10:59
    dataflow here. And let's have a look at
    it, so in step one, we are actually
  • 10:59 - 11:05
    scanning the QR code with a camera of the
    phone and extracted from the QR code would
  • 11:05 - 11:11
    be a GUID, which is then fed into the
    Corona-Warn-App. You can see here it is
  • 11:11 - 11:15
    never stored within the app. That's very
    important, because we wanted to make sure
  • 11:15 - 11:19
    that as little information as possible needs
    to be stored within the app and also that
  • 11:19 - 11:24
    it's not possible to connect information
    from different sources, for example, to
  • 11:24 - 11:32
    trace back a Diagnosis Key to a GUID and
    allow identifying a person. It was very
  • 11:32 - 11:39
    important that this step is not possible.
    So we had to take care that no data is
  • 11:39 - 11:45
    stored together and data cannot be
    connected again. So in step one, we get
  • 11:45 - 11:50
    this GUID. And this is then hashed on the
    phone and sent to the verification
  • 11:50 - 11:56
    server, which in step three generates a
    so-called Registration Token and stores it
  • 11:56 - 12:02
    together. So it stores the hash(GUID) and
    the hash(Registration Token), making sure
  • 12:02 - 12:09
    that a GUID can only be used once and
    returns the unhashed Registration Token to
  • 12:09 - 12:18
    the App here. Now the App can store the
    Registration Token and use it in step five
  • 12:18 - 12:23
    for polling for test results, but the test
    results are not available directly on the
  • 12:23 - 12:27
    verification server, because we do not
    store them here. But the verification server
  • 12:27 - 12:33
    connects to the test results server by
    using the hash(GUID), which it can get from
  • 12:33 - 12:39
    the hash(Registration Token) here, and
    then it can ask the test results server. And
  • 12:39 - 12:44
    the test results server might have a data
    set connecting the hash(GUID) to the test
  • 12:44 - 12:52
    result. And this check needs to be done
    because the test results server might also
  • 12:52 - 12:57
    have no information for this hash(GUID),
    and this only means that no test result
  • 12:57 - 13:01
    has been received yet. This is what happens
    here in step A, the Lab Information
  • 13:01 - 13:07
    system, the LIS, can supply the
    test results server with a package of
  • 13:07 - 13:12
    hash(GUID) and the test result - so it's
    stored there. And if it's available already
  • 13:12 - 13:18
    on the test results server, it is returned to the
    verification server here in step 7 and
  • 13:18 - 13:25
    accordingly in step 8 to the App. You
    might have noted the test result is also
  • 13:25 - 13:30
    neither cached nor stored here on the
    verification server, which means if the
  • 13:30 - 13:36
    user then decides to upload the keys, a
    TAN is required to pass onto the backend
  • 13:36 - 13:42
    for verification of the positive test. A
    similar flow needs to be followed. So in
  • 13:42 - 13:49
    step 9, again, the Registration Token
    is passed to the TAN endpoint, the
  • 13:49 - 13:53
    verification server once more needs to
    check with the test results server
  • 13:53 - 13:57
    that it's actually a positive test result.
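The registration and polling steps described so far can be sketched as two small in-memory services. This is a rough sketch under stated assumptions, not the real backend: SHA-256 as the hash function, the token format, the "pending" marker and all class and method names are assumptions made for illustration. It only shows the property the talk stresses, namely that the verification server keeps hashes only and that a GUID can be registered once.

```python
import hashlib
import secrets


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


class ResultServer:
    """Holds hash(GUID) -> test result, as supplied by the lab."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}

    def store_from_lab(self, guid_hash: str, result: str) -> None:
        self._results[guid_hash] = result

    def lookup(self, guid_hash: str) -> str:
        # A missing entry only means that no result has been received yet.
        return self._results.get(guid_hash, "pending")


class VerificationServer:
    def __init__(self, results: ResultServer) -> None:
        self._results = results
        # hash(Registration Token) -> hash(GUID); nothing is kept unhashed.
        self._guid_hash_by_token_hash: dict[str, str] = {}

    def register(self, guid_hash: str) -> str:
        if guid_hash in self._guid_hash_by_token_hash.values():
            raise ValueError("this GUID has already been registered")
        token = secrets.token_hex(16)
        self._guid_hash_by_token_hash[sha256_hex(token)] = guid_hash
        return token  # the unhashed token goes back to the App only

    def poll_result(self, registration_token: str) -> str:
        guid_hash = self._guid_hash_by_token_hash[sha256_hex(registration_token)]
        return self._results.lookup(guid_hash)  # the result is not cached here


results = ResultServer()
verification = VerificationServer(results)
guid_hash = sha256_hex("hypothetical-guid-from-qr-code")  # hashed on the phone
token = verification.register(guid_hash)
first_poll = verification.poll_result(token)   # lab has not delivered yet
results.store_from_lab(guid_hash, "negative")
second_poll = verification.poll_result(token)
```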
    It gets back here in step 11, the TAN is
  • 13:57 - 14:02
    generated in step 12. You can see the TAN
    is not stored in plaintext, but it's
  • 14:02 - 14:08
    stored as a hash, but the plaintext is
    returned to the App, which can
  • 14:08 - 14:13
    then bundle it with Diagnosis Keys
    extracted from the exposure notification
  • 14:13 - 14:17
    framework and upload it to the Corona-
    Warn-App server or more specifically, the
  • 14:17 - 14:24
    submission service. But this also needs to
    verify that it's authentic, so it takes it in
  • 14:24 - 14:31
    step 15 to the verification server on the
    verify endpoint. Where the TAN is
  • 14:31 - 14:37
    validated and validation means it is
    marked as used already, so the same
  • 14:37 - 14:42
    TAN cannot be used twice, and then the
    response is given to the backend here,
  • 14:42 - 14:48
    which can then, if it's positive, which
    means if it's an authentic TAN, store the
  • 14:48 - 14:55
    Diagnosis Key in its own storage. And as
    you can see, only the Diagnosis Keys are
  • 14:55 - 15:00
    stored here, nothing else. So there's no
    correlation possible between Diagnosis
  • 15:00 - 15:07
    Keys, Registration Token or even GUID
    because it's completely separate. But
  • 15:07 - 15:13
    still, what could be found out about users
    if someone were to observe the network
  • 15:13 - 15:19
    traffic going on there? An important
    assumption in the beginning, the content
  • 15:19 - 15:25
    of all the messages is secure because only
    secure connections are being used and only
  • 15:25 - 15:33
    the size of the transfer is observable. So
    we can, from a network sniffing
  • 15:33 - 15:38
    perspective observe that a connection is
    created. We can observe how many bytes are
  • 15:38 - 15:42
    being transferred back and forth, but we
    cannot learn about the content of the
  • 15:42 - 15:49
    message. So here we are, we've got the
    first communication between App and server
  • 15:49 - 15:56
    in step two, because we can see: OK, if
    someone is requesting something from the
  • 15:56 - 16:01
    Registration Token endpoint, this person
    has been tested maybe on that specific
  • 16:01 - 16:09
    day. Then there is the next communication
    going on in step five, because this means
  • 16:09 - 16:13
    that the person has been tested. I mean,
    we might know that from step two already,
  • 16:13 - 16:19
    but this person has still not received the
    test result. So it might still be positive
  • 16:19 - 16:26
    or negative. If we can observe that the
    request to the TAN endpoint takes place in
  • 16:26 - 16:33
    step 9, then we know the person has
    been tested positive. So OK, this is
  • 16:33 - 16:39
    HTTPS, so we cannot actually learn which
    endpoint is being queried, but there
  • 16:39 - 16:44
    might be specific sizes to those
    individual requests which might allow us
  • 16:44 - 16:54
    to learn about the direction the request
    is going into. Just as a thought. OK, and
  • 16:54 - 16:59
    then, of course, we've got also the
    submission service in step 14 where users
  • 16:59 - 17:05
    upload their Diagnosis Keys and a TAN, and
    this is really, really without any
  • 17:05 - 17:12
    possibility for discussion, because if an
    App contacts the Corona-Warn-App server
  • 17:12 - 17:18
    and... builds up a connection - this must
    mean that the user has been tested
  • 17:18 - 17:25
    positive and is submitting Diagnosis Keys.
    Apart from that, once the user submits
  • 17:25 - 17:32
    Diagnosis Keys, and the App talks to the
    Corona-Warn-App backend - it could also be
  • 17:32 - 17:40
    possible to relate those keys to an origin
    IP-address, for example. Could there be a
  • 17:40 - 17:46
    way around that? So what we need to do in
    this scenario and what we did is to
  • 17:46 - 17:52
    establish plausible deniability, which
    basically means we generate so much noise
  • 17:52 - 17:58
    with the connections we build up that it's
    not possible to identify individuals who
  • 17:58 - 18:04
    actually use those connections to query their
    test results, to receive the test result,
  • 18:04 - 18:11
    if it's positive, to retrieve a TAN or to
    upload the Keys. So generating noise is
  • 18:11 - 18:19
    the key. So what the App actually does is:
    simulate the backend traffic by sending
  • 18:19 - 18:24
    those fake or dummy requests according to
    a so-called playbook. So we've got... we
  • 18:24 - 18:29
    call it playbook, from which the App takes
    which requests to do, how long to wait,
  • 18:29 - 18:35
    how often to repeat those requests and so
    on. And it's also interesting that those
  • 18:35 - 18:40
    requests might either be triggered by a real
    event or they might be triggered by just
  • 18:40 - 18:46
    some random trigger. So scanning a QR code
    or entering a teleTAN also triggers this
  • 18:46 - 18:51
    flow. A little bit different, but it still
    triggers it, because if you then get your
  • 18:51 - 18:56
    Registration Token, retrieve your test
    results and the retrieval of your test
  • 18:56 - 19:02
    results stops at some point, this must
    mean, OK, there has been a test result -
  • 19:02 - 19:06
    negative or positive. If it's then
    observable that you communicate to the
  • 19:06 - 19:11
    submission service - this would mean that
    it has been positive. So what the App
  • 19:11 - 19:18
    actually does is: even if it is negative,
    it continues sending out dummy requests to
  • 19:18 - 19:25
    the verification server and it might also,
    so that's all based on random decisions
  • 19:25 - 19:32
    within the App, it might also then
    retrieve a fake TAN and it might do a fake
  • 19:32 - 19:37
    upload of Diagnosis Keys. So in the end,
    you're not able to distinguish between an App
  • 19:37 - 19:44
    actually uploading real data or an App just
    doing playbook's stuff and creating noise.
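A toy version of such a playbook might look like the following. It is purely illustrative and an assumption throughout: the endpoint labels, the delay ranges and the number of trailing dummy polls are invented, and the real App's playbook is not reproduced here. The point of the sketch is only that a real upload and a dummy run produce the same observable sequence of requests.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedRequest:
    endpoint: str         # hypothetical endpoint label
    fake: bool            # True for a dummy request
    delay_seconds: float  # how long to wait before sending


def plan_flow(real_upload: bool, rng: random.Random) -> list[PlannedRequest]:
    plan = []
    # The same endpoints in the same order, whether the upload is real or not.
    for endpoint in ("tan", "submission"):
        plan.append(PlannedRequest(endpoint, not real_upload, rng.uniform(1.0, 5.0)))
    # A random number of trailing dummy polls hides where the flow ended.
    for _ in range(rng.randint(1, 3)):
        plan.append(PlannedRequest("testresult", True, rng.uniform(1.0, 5.0)))
    return plan


def observable(plan: list[PlannedRequest]) -> list[tuple[str, float]]:
    """What a network observer could see: which endpoint and when, not the flag."""
    return [(request.endpoint, request.delay_seconds) for request in plan]


real_plan = plan_flow(True, random.Random(7))
dummy_plan = plan_flow(False, random.Random(7))
```

With the same random decisions, `observable(real_plan)` and `observable(dummy_plan)` are identical; only the `fake` flag, which travels inside the encrypted request, differs.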
  • 19:44 - 19:50
    So users really uploading the Diagnosis
    Keys cannot be picked out from all the
  • 19:50 - 19:56
    noise. And to make sure that our backend
    is not just swamped with all those fake
  • 19:56 - 20:02
    and dummy requests, there's a special
    header field, which informs the backend to
  • 20:02 - 20:06
    actually ignore those requests. But if you
    would just ignore them and not send a
  • 20:06 - 20:12
    response - it could be implemented on the
    client, but then it would be observable
  • 20:12 - 20:17
    again that it's just a fake request. So
    what we do is - we let the backend skip
  • 20:17 - 20:22
    all the interaction with the underlying
    database infrastructure, do not modify any
  • 20:22 - 20:28
    data and so on, but there will be a delay
    in the response and the response will look
  • 20:28 - 20:34
    exactly the same as if it was to respond
    to a real request. Also the data, in both
  • 20:34 - 20:41
    directions from the client to the server
    and from the server to the client, get
  • 20:41 - 20:47
    some padding, so it's always the same
    size, no matter what information is contained
  • 20:47 - 20:54
    in these data packages. So observing the
    data packages... so the size does not help
  • 20:54 - 21:00
    in finding out what's actually going on.
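The combination of a marker header and fixed-size bodies can be sketched like this. It is a simplified sketch with assumed names: the header name `x-fake-request`, the body size of 512 bytes and the JSON layout are hypothetical, and the artificial response delay mentioned in the talk is left out to keep the example short.

```python
import json

FAKE_HEADER = "x-fake-request"  # hypothetical header name
BODY_SIZE = 512                 # hypothetical fixed body size in bytes


def pad_body(payload: dict, size: int = BODY_SIZE) -> bytes:
    """Serialize the payload as JSON and pad it to exactly `size` bytes."""
    empty = json.dumps({**payload, "padding": ""}, separators=(",", ":"))
    missing = size - len(empty.encode("ascii"))
    if missing < 0:
        raise ValueError("payload does not fit into the fixed body size")
    padded = json.dumps({**payload, "padding": "x" * missing}, separators=(",", ":"))
    return padded.encode("ascii")


def handle_submission(headers: dict, body: bytes, key_store: list) -> bytes:
    """Backend side: dummy requests skip storage but get the same response."""
    if headers.get(FAKE_HEADER) != "1":
        key_store.extend(json.loads(body)["keys"])  # only real requests touch storage
    return pad_body({"status": "ok"})


store: list = []
real_response = handle_submission({}, pad_body({"keys": ["key-1"]}), store)
fake_response = handle_submission({FAKE_HEADER: "1"}, pad_body({"keys": []}), store)
```

Both request bodies and both responses have the same length, so their size says nothing about whether keys were actually uploaded.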
    Now, you could say, OK, if there's so much
  • 21:00 - 21:06
    additional traffic because there are fake
    requests being sent out and fake uploads
  • 21:06 - 21:12
    being done and so on, this must cost a lot
    of data traffic to the users. That's a
  • 21:12 - 21:19
    good point. It is all zero rated with
    German mobile operators, which means it's
  • 21:19 - 21:29
    not charged to the end customers, but it's
    just being paid for. Now, there is still that
  • 21:29 - 21:35
    thing with the extraction of information from
    the metadata while uploading the Diagnosis
  • 21:35 - 21:41
    Keys and this metadata might be the source
    IP address, it might be the user agent
  • 21:41 - 21:47
    being used. So then you can distinguish
    Android from iOS and possibly you could
  • 21:47 - 21:52
    also find out about the OS version. To
    prevent that, we introduced an intermediary
  • 21:52 - 21:58
    server, which removes the metadata from
    the requests and just forwards the plain
  • 21:58 - 22:04
    content of the packages basically to the
    backend service. So the backend service,
  • 22:04 - 22:18
    the submission service is not able to tell
    where this package came from. Now,
  • 22:18 - 22:25
    for risk calculation, we can have a look
    at which information is available here. So
  • 22:25 - 22:31
    we've got the information about
    encounters, which is calculated on the device
  • 22:31 - 22:35
    receiving the Rolling Proximity
    Identifiers as mentioned earlier, and that
  • 22:35 - 22:40
    information comes in to us in 30-minute
    exposure windows. So I mentioned earlier
  • 22:40 - 22:45
    that all the Rolling Proximity Identifiers
    belonging to a single Diagnosis Key, so a
  • 22:45 - 22:50
    single UTC day basically, can be
    related to each other. But what the
  • 22:50 - 22:56
    exposure notification framework then does
    is split up those encounters into 30-minute
  • 22:56 - 23:06
    windows. So the first scan instance, where
    another device has been identified, starts
  • 23:06 - 23:10
    the exposure window and then it's filled
    up until the 30 minutes are full. And if
  • 23:10 - 23:14
    there's more encounters with the same
    Diagnosis Key basically, a new window is
  • 23:14 - 23:19
    started and so on. A single exposure
    window only contains a single device. So
  • 23:19 - 23:25
    it's a one-to-one mapping. And within that
    window we can find the number of the scan
  • 23:25 - 23:33
    instances. So scans take place every three
    to five minutes and within those scan
  • 23:33 - 23:35
    instances, there are also multiple scans.
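The way scan instances fall into exposure windows can be illustrated with a short sketch. It is an assumption-laden illustration, not the framework's implementation: scan times are plain seconds, all scans belong to one Diagnosis Key, and a scan exactly 30 minutes after the window start is assumed to open the next window.

```python
WINDOW_SECONDS = 30 * 60


def split_into_windows(scan_times: list[int]) -> list[list[int]]:
    """Group the scan instances of one Diagnosis Key into 30-minute windows."""
    windows: list[list[int]] = []
    for scan_time in sorted(scan_times):
        # The first scan of a window starts it; later scans fill it up.
        if windows and scan_time - windows[-1][0] < WINDOW_SECONDS:
            windows[-1].append(scan_time)
        else:
            windows.append([scan_time])
    return windows


# A 50-minute encounter with one scan every five minutes (0 s to 3000 s).
scans = list(range(0, 3001, 300))
windows = split_into_windows(scans)
```

For this encounter the sketch yields two windows: the first holds the six scans from 0 to 25 minutes, the second the five scans from 30 to 50 minutes.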
  • 23:35 - 23:38
    And we get the minimum and
    the average attenuation
  • 23:38 - 23:44
    per instance, and the attenuation is
    actually the reported transmit power of
  • 23:44 - 23:50
    the device minus the signal strength when
    receiving the signal. So it basically
  • 23:50 - 23:55
    tells us how much signal strength got lost
    on the way. If we talk about a low
  • 23:55 - 24:01
    attenuation, this means the other device
    has been very close. If the attenuation is
  • 24:01 - 24:08
    higher, it means the other device is farther
    away and, from the other way around, so
  • 24:08 - 24:13
    through the Diagnosis Keys, which have been
    uploaded to the server, processed on the
  • 24:13 - 24:17
    backend, provided on the CDN and came to us
    through that way, we can also get
  • 24:17 - 24:22
    information about the infectiousness of
    the user, which is encoded in something we
  • 24:22 - 24:30
    call Transmission Risk Level, which tells
    us how big the risk of infection from that
  • 24:30 - 24:38
    person on that specific day has been. So,
    the Transmission Risk Level is based on
  • 24:38 - 24:43
    the symptom status of a person and the
    symptom status means: Is the person
  • 24:43 - 24:49
    symptomatic, asymptomatic, does the
    person want to tell about the symptoms or
  • 24:49 - 24:54
    maybe do they not want to tell about the
    symptoms, and in addition to that, if
  • 24:54 - 24:59
    there have been symptoms, it can also be
    clarified whether the symptoms start was a
  • 24:59 - 25:03
    specific day, whether it has been a range
    of multiple days when the symptoms
  • 25:03 - 25:08
    started, or people could also say: "I'm
    not sure about when the symptoms started,
  • 25:08 - 25:15
    but there have been symptoms definitely".
    So this is the first case people can
  • 25:15 - 25:20
    specify when the symptoms started and we
    can say that the symptoms start down here
  • 25:20 - 25:28
    and around that date of the onset of
    symptoms, the risk of infection is basically
  • 25:28 - 25:36
    evenly spread: red means high risk,
    blue means low risk. See, when you move
  • 25:36 - 25:44
    around that symptom start day also the
    infectiousness moves around and there's
  • 25:44 - 25:48
    basically a matrix from where this
    information is derived. Again, you can
  • 25:48 - 25:54
    find that all in the code. And there's
    also the possibility to say, OK, the
  • 25:54 - 26:00
    symptoms started somewhere within the last
    seven days. That's the case up here. See,
  • 26:00 - 26:05
    it's spread a little bit differently.
    Users could also specify it started
  • 26:05 - 26:11
    somewhere from one to two weeks ago. You
    can see that here in the second chart and
  • 26:11 - 26:19
    the third chart is the case for when the
    symptoms started more than two weeks ago.
  • 26:19 - 26:24
    Now, here's the case that users specify
    that they just received a positive test
  • 26:24 - 26:28
    result. So they're definitely Corona
    positive, but they have never had
  • 26:28 - 26:33
    symptoms, which might mean they are
    asymptomatic or presymptomatic. And,
  • 26:33 - 26:40
    again, you see around the submission,
    there is an increased risk, but all the
  • 26:40 - 26:48
    time before here only has a low
    transmission level assigned. If users want
  • 26:48 - 26:52
    to specify that they can't remember when
    the symptoms started, but they definitely
  • 26:52 - 27:00
    had symptoms, then it's all spread a
    little bit differently. And equally, if
  • 27:00 - 27:03
    users do not want to share the
    information, whether they had symptoms at
  • 27:03 - 27:10
    all. So now we've got this big risk
    calculation chart here, and I would like
  • 27:10 - 27:14
    to walk you quickly through it. So on the
    left, we've got the configuration which is
  • 27:14 - 27:19
    being fed into the exposure notification
    framework by Apple / Google, because
  • 27:19 - 27:24
    there's also some mappings which the
    framework needs from us. There is some
  • 27:24 - 27:29
    internal configuration because we have
    decided to do a lot of the risk
  • 27:29 - 27:33
    calculation within the App instead of
    doing it in the framework, mainly because
  • 27:33 - 27:40
    we have decided we want eight levels,
    transmission risk levels, instead of the
  • 27:40 - 27:45
    only three levels, so low, standard and
    high, which Apple and Google provide to
  • 27:45 - 27:51
    us. For the sake of having those eight
    levels, we actually sacrifice the
  • 27:51 - 27:56
    parameters of infectiousness, which is
    derived from the parameter days since
  • 27:56 - 28:03
    onset of symptoms and the report type,
    which is always a confirmed test in Europe.
  • 28:03 - 28:08
    So we got those three bits actually, which
    we can now use as a Transmission Risk
  • 28:08 - 28:13
    Level, which is encoded on the server in
    those two fields, added to the Keys and
  • 28:13 - 28:20
    the content delivery network, downloaded
    by the App and then passed through the
  • 28:20 - 28:25
    calculation here. So it comes in here. It
    is assembled from those two parameters,
  • 28:25 - 28:31
    Report Type and Infectiousness, and now it
    goes along. So first, we need to look,
  • 28:31 - 28:38
    whether the sum of the durations at below
    73 decibels, so that's our first threshold,
  • 28:38 - 28:43
    has been less than 10 minutes. If it has
    been less than 10 minutes, just drop the
  • 28:43 - 28:49
    whole exposure window. If it has been more
    than or equal to 10 minutes, we might use it,
  • 28:49 - 28:56
    depending on whether the Transmission Risk
    Level is larger than or equal to three, and then we use
  • 28:56 - 29:06
    it. And now we actually calculate the
    relevant time and times between 60...
  • 29:06 - 29:13
    between 55 and 63 decibels are only counted
    half, because that's a medium distance and
  • 29:13 - 29:19
    times at below 55 decibels, that's up here
    are counted full, then added up. And
  • 29:19 - 29:24
    then we've got the weighted exposure time
    and now we've got this transmission risk
  • 29:24 - 29:29
    level, which leads us to a normalization
    factor, basically. And this is multiplied
  • 29:29 - 29:34
    with the weighted exposure time. What we get
    here is the normalized exposure time per
  • 29:34 - 29:40
    exposure window and those times for each
    window are added up for the whole day. And
  • 29:40 - 29:45
    then there's the threshold of 15 minutes,
    which decides whether the day had a high
  • 29:45 - 29:54
    risk of infection or a low risk. So now
    that you all know how to do those
  • 29:54 - 30:01
    calculations, we can walk through it for
    three examples. So the first example is
  • 30:01 - 30:05
    here: it's a transmission risk level of
    seven. You can see those all are pretty
  • 30:05 - 30:10
    close so our magic thresholds are here at
    73. That's for whether that's counted or
  • 30:10 - 30:18
    not. Then at 63, it's this line. And at
    55. So we see, OK, there's been a lot of
  • 30:18 - 30:23
    close contact going on and some medium
    range contact as well. So let's do the
  • 30:23 - 30:29
    pre-filtering, even though we already see
    it has been at least 10 minutes below 73
  • 30:29 - 30:36
    decibels. Yes, definitely, because each of
    those dots represents three minutes. So,
  • 30:36 - 30:41
    for this example calculation, I just
    assumed the scan windows are three minutes
  • 30:41 - 30:48
    apart. Is it at least transmission risk
    level three? Yes, it's even seven. So now
  • 30:48 - 30:54
    we do the calculation. It has been 18
    minutes at a low attenuation, so at a
  • 30:54 - 30:59
    close proximity, so that's 18 minutes and
    nine minutes those and those - three dots
  • 30:59 - 31:04
    here at a medium attenuation. So a little
    bit farther apart, they count as four and
  • 31:04 - 31:10
    a half minutes. We've got a factor here
    adding it up, it gets us to 22.5 minutes
  • 31:10 - 31:20
    multiplied by 1.4 giving us 33... 31.5
    minutes, which means red status. Already
  • 31:20 - 31:26
    with a single window. Now, in this
    example, we can already see that's pretty
  • 31:26 - 31:31
    far away and there's been one close
    encounter here, transmission risk level
  • 31:31 - 31:38
    eight even, pre-filtering: has it been at
    least 10 minutes below 73 decibels? Nope.
  • 31:38 - 31:43
    OK, then we already drop it. Now that's
    the third one. Transmission risk level
  • 31:43 - 31:51
    eight again. It has been a little bit
    away, but there's also been some close
  • 31:51 - 31:57
    contact, so we do the pre-filtering: has
    it been at least 10 minutes below 73? Now
  • 31:57 - 32:03
    we already have to look closely. So, yes.
    It is below 73, this one as well. OK, so
  • 32:03 - 32:10
    we've got four dots below 73 decibels.
    Gives us 12 minutes. Yes, transmission
  • 32:10 - 32:15
    risk level three. OK, that's easy. Yes.
    And now we can do the calculation. It has
  • 32:15 - 32:21
    been six minutes at the low attenuation -
    those two dots here. OK, they count full
  • 32:21 - 32:26
    and zero minutes at the medium
    attenuation. You see this part is empty
  • 32:26 - 32:31
    and the transmission risk level eight
    gives us a factor of 1.6. If we now
  • 32:31 - 32:37
    multiply the six minutes by 1.6, we get
    9.6 minutes. So if this has been the only
  • 32:37 - 32:42
    encounter for a day, that's still
    green. But if, for example, you had two
  • 32:42 - 32:48
    encounters of this kind, so with the same
    person or with different people, then it
  • 32:48 - 32:53
    would already turn into red because then
    it's close to 20 minutes, which is above
  • 32:53 - 33:01
    the 15 minute threshold. Now, I would like
    to thank you for listening to my session,
  • 33:01 - 33:05
    and I'm available for Q&A shortly.
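
The calculation walked through above can be sketched in a few lines of Python. This is only an illustration of the steps as described in the talk, not the App's actual code. The function and constant names are made up for this sketch, the handling of values exactly on a threshold is an assumption, and only the normalization factors for transmission risk levels 7 and 8 (1.4 and 1.6) are quoted in the talk, so other levels are deliberately left out. As in the examples, each scan is assumed to stand for three minutes.

```python
# Thresholds quoted in the talk (attenuation in decibels, time in minutes).
PREFILTER_DB = 73      # scans below this count towards the 10-minute pre-filter
MEDIUM_DB = 63         # scans from 55 up to 63 dB are counted half
CLOSE_DB = 55          # scans below 55 dB are counted full
MIN_MINUTES = 10       # a window with less time below 73 dB is dropped
MIN_TRL = 3            # windows with a lower transmission risk level are dropped
DAY_THRESHOLD = 15     # normalized minutes per day for a high-risk (red) day

# Only these two factors are stated in the talk; other levels are not guessed.
TRL_FACTOR = {7: 1.4, 8: 1.6}


def window_minutes(attenuations_db, trl, minutes_per_scan=3.0):
    """Normalized exposure time for one exposure window (hypothetical helper)."""
    below_prefilter = sum(minutes_per_scan for a in attenuations_db if a < PREFILTER_DB)
    if below_prefilter < MIN_MINUTES or trl < MIN_TRL:
        return 0.0  # pre-filtering: drop the whole window
    weighted = 0.0
    for a in attenuations_db:
        if a < CLOSE_DB:
            weighted += minutes_per_scan          # close contact, counted full
        elif a < MEDIUM_DB:
            weighted += 0.5 * minutes_per_scan    # medium distance, counted half
    return weighted * TRL_FACTOR[trl]


def day_is_high_risk(windows):
    """windows: list of (attenuations_db, trl) pairs for a single day."""
    total = sum(window_minutes(a, trl) for a, trl in windows)
    # Assumption: reaching the threshold exactly already counts as high risk.
    return total >= DAY_THRESHOLD


# The three examples from the talk, with made-up attenuation values per scan.
example_1 = ([50] * 6 + [60] * 3, 7)       # 18 min close, 9 min medium
example_2 = ([80, 80, 80, 80, 50], 8)      # only 3 min below 73 dB
example_3 = ([50, 50, 70, 70, 80, 80], 8)  # 12 min below 73 dB, 6 min close

for name, (scans, trl) in [("1", example_1), ("2", example_2), ("3", example_3)]:
    print(name, round(window_minutes(scans, trl), 1))
print(day_is_high_risk([example_3]), day_is_high_risk([example_3, example_3]))
```

With these inputs the first window comes out at roughly 31.5 normalized minutes, the second is dropped by the pre-filter, and the third gives roughly 9.6 minutes, so one such encounter stays below the daily threshold while two of them exceed it.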
  • 33:13 - 33:19
    Herald: OK, so thank you, Thomas. This was
    a prerecorded talk and the discussion was
  • 33:19 - 33:24
    very lively in the IRC during the talk,
    and I'm glad that Thomas will be here for
  • 33:24 - 33:36
    the Q&A. Maybe to start with the first
    question by MH in IRC on security and
  • 33:36 - 33:45
    replay attacks: Italy and Netherlands
    published TEKs / DKs so early that they are
  • 33:45 - 33:50
    still valid. We learned that yesterday and
    the time between presentation, how is this
  • 33:50 - 33:55
    handled in the European cooperation and
    can you make them adhere to the security
  • 33:55 - 34:03
    requirements? This is the first question
    for you, Thomas.
  • 34:03 - 34:08
    Thomas: OK, so thank you for this
    question. The way we handle Keys coming
  • 34:08 - 34:12
    in from other European countries,
    that's through the European federation
  • 34:12 - 34:15
    gateway service is, that they are handled
  • 34:15 - 34:20
    as if they were national keys,
    which means they are put in some kind of
  • 34:20 - 34:27
    embargo for two hours until... so two
    hours after the end of their validity to
  • 34:27 - 34:32
    make sure that replay attacks are not
    possible.
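
The embargo rule from this answer can be sketched as follows. This is a minimal illustration, not the server's actual implementation. It assumes that a key's validity is described by the Exposure Notification fields rolling start interval number and rolling period, both counted in 10-minute intervals since the Unix epoch; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

INTERVAL = timedelta(minutes=10)   # length of one Exposure Notification interval
EMBARGO = timedelta(hours=2)       # waiting time after the end of validity (from the talk)
EPOCH = datetime.fromtimestamp(0, tz=timezone.utc)


def publishable_from(rolling_start_interval_number, rolling_period=144):
    """Earliest UTC time at which a key may be distributed (hypothetical helper)."""
    valid_from = EPOCH + rolling_start_interval_number * INTERVAL
    valid_until = valid_from + rolling_period * INTERVAL  # 144 intervals = 24 hours
    return valid_until + EMBARGO


def is_publishable(rolling_start_interval_number, now, rolling_period=144):
    """True once the key has expired and the two-hour embargo has passed."""
    return now >= publishable_from(rolling_start_interval_number, rolling_period)
```

Applied to a key from another country, a check of this kind would hold the key back until two hours after its validity has ended, regardless of how early the originating backend published it.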
  • 34:32 - 34:38
    Herald: All right, I hope that answers
    this actually. OK, and then there was
  • 34:38 - 34:43
    another one on international
    interoperability: is it EU only or is
  • 34:43 - 34:49
    there also cooperation between EU and,
    for example, Switzerland?
  • 34:49 - 34:57
    Thomas: So so far, we've got the cooperation
    with other EU countries from [audio glitches]
  • 34:57 - 35:07
    the European Union, which interoperates
    already, and regarding the integration of
  • 35:07 - 35:14
    non-EU countries, that's basically a
    political decision that has to be made
  • 35:14 - 35:22
    from this place as well. So that's nothing
    I as an architect can drive or control. So
  • 35:22 - 35:28
    so far, it's only EU countries.
    Herald: All right. And then I have some
  • 35:28 - 35:33
    comments and also questions on community
    interaction and implementation of new
  • 35:33 - 35:38
    features, which seems a little slow for
    some. There was, for example, a proposal
  • 35:38 - 35:43
    for functionality called Crowd Notifier
    for events and restaurants to check in by
  • 35:43 - 35:50
    scanning a QR code. Can you tell us a bit
    more about this or what's there? Are you
  • 35:50 - 35:59
    aware of this?
    Thomas: So I've personally seen that there
  • 35:59 - 36:04
    are proposals online, and there is also a
    lively discussion on those issues, but
  • 36:04 - 36:10
    what you need to keep in mind is that we
    are also... we have the task of developing
  • 36:10 - 36:16
    this App for the federal ministry of
    Health, and they are basically the ones
  • 36:16 - 36:23
    requesting features and then there's some
    scoping going on. So I'm personally and so
  • 36:23 - 36:30
    to say that again, I am the architect so I
    can't decide which features are going to
  • 36:30 - 36:35
    be implemented. It's just as soon as the
    decision has been made that we need a new
  • 36:35 - 36:41
    feature, so after we've been given
    the task, then I come in and prepare the
  • 36:41 - 36:47
    architecture for that. So I'm not aware of
    the current state of those developments,
  • 36:47 - 36:50
    to be honest, because that's out of my
    personal scope.
  • 36:50 - 36:56
    Herald: All right. I mean, it's often the
    case, I suppose, with great projects, with
  • 36:56 - 37:02
    huge projects. But overall, people seem to
    be liking the fact that everything is
  • 37:02 - 37:09
    available on GitHub. But some people are
    really dedicated and seem to be a bit
  • 37:09 - 37:14
    disappointed that interaction with the
    community on GitHub seems a bit slow,
  • 37:14 - 37:20
    because some issues are not answered as
    people would hope they would be. Do you know
  • 37:20 - 37:27
    about some ideas on adding dedicated
    community managers to the GitHub community
  • 37:27 - 37:33
    around the App? So the people we speak
    with, that was one note in IRC, actually
  • 37:33 - 37:37
    seem to be changing every month. So are
    you aware of this kind of position of
  • 37:37 - 37:41
    community management?
    Thomas: So there's people definitely
  • 37:41 - 37:45
    working on the community management,
    there's also a lot of feedback and
  • 37:45 - 37:52
    comments coming in from the community, and
    I'm definitely aware that there are people
  • 37:52 - 37:59
    working on that. And, for example, I get
    asked by them to jump in on certain
  • 37:59 - 38:04
    questions where verification was needed
    from an architectural point of view. And
  • 38:04 - 38:09
    that's... if you look at GitHub, there's
    also some issues I've been answering, and
  • 38:09 - 38:15
    that's because our community team has
    asked me to jump in there. So but the
  • 38:15 - 38:20
    feedback that people are not fully
    satisfied with the way how the community
  • 38:20 - 38:23
    is handled, is something I would
    definitely take back to our team
  • 38:23 - 38:28
    internally and let them know about it.
    Herald: Yeah, that's great to know,
  • 38:28 - 38:34
    actually. So people will have some answers
    on that. Maybe one last very concrete
  • 38:34 - 38:39
    question by duffman in the IRC: Is the
    inability of the App to show the time/day
  • 38:39 - 38:43
    of exposures a limitation of the
    framework or is it an implementation
  • 38:43 - 38:47
    choice? And what would be the privacy
    implications of introducing such a
  • 38:47 - 38:51
    feature? Actually, a big question, but
    maybe you can cut it short.
  • 38:51 - 38:57
    Thomas: Yeah, OK, so the only information,
    the Exposure Notification Framework by
  • 38:57 - 39:02
    Google / Apple can give us - is the date
    of the exposure, and date always relates
  • 39:02 - 39:08
    to UTC there. And so we never get the time
    of the actual exposure back. And when
  • 39:08 - 39:14
    moving to the exposure windows, we also do
    not get the time back of the exposure
  • 39:14 - 39:19
    window. And the implications if you were
    able to tell the exact time of the
  • 39:19 - 39:24
    encounter, would be that people are often
    aware where they've been at a certain
  • 39:24 - 39:30
    time. And let's say at 11:15, you were
    meeting with a friend and you get a
  • 39:30 - 39:36
    notification that at 11:15, you had that
    exact encounter, it would be easy to tell
  • 39:36 - 39:44
    whom you've met, who's been infected. And
    that's something not desired, that you can
  • 39:44 - 39:50
    trace it back to a certain person. So the
    personification would basically then be
  • 39:50 - 39:54
    the thing.
    Herald: All right, and I hope we have time
  • 39:54 - 39:58
    for this last question asked on IRC:
    have you considered training a machine
  • 39:58 - 40:02
    learning method to classify the risk
    levels instead of the used rule-based
  • 40:02 - 40:13
    method?
    Thomas: So, I mean, classifying the risk
  • 40:13 - 40:21
    levels through machine learning is
    something I'm not aware of yet. So the
  • 40:21 - 40:26
    thing is, it's all based on basically a
    cooperation with the Fraunhofer Institute,
  • 40:26 - 40:31
    where they have basically reenacted
    certain situations, did some measurements
  • 40:31 - 40:36
    and that's what has been transferred into
    the risk model. So all those thresholds
  • 40:36 - 40:45
    are derived from, basically, practical
    tests. So no ML at the moment.
  • 40:45 - 40:53
    Herald: All right, so I suppose this was
    our last question and again, Thomas, a
  • 40:53 - 40:58
    warm round of virtual applause to you and
    thank you again, Thomas, for giving this
  • 40:58 - 41:04
    talk, for being part of this first Remote
    Chaos Experience and for giving us some
  • 41:04 - 41:09
    insight into the backend of the Corona-
    Warn-App. Thank you.
  • 41:09 - 41:12
    Thomas: Was happy to do so. Thank you for
    having me here.
  • 41:12 - 41:16
    rC3 postroll music
  • 41:16 - 41:50
    Subtitles created by c3subtitles.de
    in the year 2021. Join, and help us!
Title:
#rC3 - Corona-Warn-App
Video Language:
English
Duration:
41:52
