1 00:00:00,000 --> 00:00:12,988 rC3 preroll music 2 00:00:12,988 --> 00:00:20,400 Herald: All right, CWA - three simple letters, but what stands behind them is 3 00:00:20,400 --> 00:00:25,360 not simple at all. For various reasons. The Corona-Warn-App has been one of the most 4 00:00:25,360 --> 00:00:30,160 talked about digital project of the year. Behind its rather simplistic facade there 5 00:00:30,160 --> 00:00:34,960 are many considerations that went into the App's design to protect its users and 6 00:00:34,960 --> 00:00:39,120 their data, while they might not be visible for most users, these goals had a 7 00:00:39,120 --> 00:00:43,280 direct influence on the software architecture. For instance, the risk 8 00:00:43,280 --> 00:00:48,080 calculation. Here today to talk about some of these backend elements is one of the 9 00:00:48,080 --> 00:00:53,200 solution architects of the Corona-Warn-App - Thomas Klingbeil. And I'm probably not 10 00:00:53,200 --> 00:00:59,520 the only one here at rC3, who is an active user. And I'm pretty curious to hear more 11 00:00:59,520 --> 00:01:04,480 about what's going on behind the scenes of the App. So without further ado, let's 12 00:01:04,480 --> 00:01:11,640 give a warm virtual welcome to Thomas Klingbeil. Thomas, the stream is yours. 13 00:01:15,316 --> 00:01:18,960 Thomas Klingbeil: Hello, everybody. I'm Thomas Klingbeil, and today in the 14 00:01:18,960 --> 00:01:23,280 session, I would like to talk about the German Corona-Warn-App and give you a 15 00:01:23,280 --> 00:01:27,760 little tour behind the scenes of the App development, the underlying technologies 16 00:01:28,320 --> 00:01:33,520 and which things are invisible to the end user, but still very important for the App 17 00:01:33,520 --> 00:01:38,240 itself. 
First, I would like to give you a short introduction to the App, the 18 00:01:38,240 --> 00:01:41,600 underlying architecture and the technologies used, for example, the Exposure 19 00:01:41,600 --> 00:01:45,680 Notification Framework. Then I would like to have a look at the communication 20 00:01:45,680 --> 00:01:52,080 between the App and the backend and look at which possible privacy threats 21 00:01:52,080 --> 00:01:57,280 could be found and how we mitigated them, of course. And then I would like to dive a 22 00:01:57,280 --> 00:02:01,600 little bit into the risk calculation of the App to show you what it actually 23 00:02:01,600 --> 00:02:06,960 means if there is a red or a green screen visible to the end user. First of 24 00:02:06,960 --> 00:02:13,280 all, we can ask ourselves the question, what is the Corona-Warn-App, actually? So, 25 00:02:13,280 --> 00:02:20,960 here it is. This is the German Corona-Warn-App, you can download it from the App stores and 26 00:02:20,960 --> 00:02:24,640 once you have onboarded onto the App, you will see the following: up here it shows 27 00:02:24,640 --> 00:02:28,880 you that the exposure logging is active, which means this is the currently active 28 00:02:28,880 --> 00:02:34,160 App. Then you have this green card. Green means it's low risk because there have 29 00:02:34,160 --> 00:02:39,520 been no exposures so far. The logging has been permanently active and it has just 30 00:02:39,520 --> 00:02:45,200 updated this afternoon. So everything is all right. Let's say you have just been 31 00:02:45,200 --> 00:02:51,200 tested at a doctor's, then you could click this button here and you get to the 32 00:02:51,200 --> 00:02:57,040 screen where you're able to retrieve your test result digitally. To do this, you can scan 33 00:02:57,040 --> 00:03:01,600 a QR code, which is on the form you received from your doctor, and then you 34 00:03:01,600 --> 00:03:08,560 will get an update as soon as the test result is available. 
Of course, you can 35 00:03:08,560 --> 00:03:12,880 also get more information about the active exposure logging when you click the button 36 00:03:12,880 --> 00:03:16,800 up here, then you get to this screen and there you can learn more about the 37 00:03:16,800 --> 00:03:21,600 transnational exposure logging, because the German Corona-Warn-App is not alone. 38 00:03:21,600 --> 00:03:27,200 It is connected to the Corona apps of other countries within Europe. So users 39 00:03:27,200 --> 00:03:32,720 from other countries can meet and they would be informed mutually about possible 40 00:03:32,720 --> 00:03:40,960 encounters. So just to be sure, I would like to quickly dive into the terminology 41 00:03:40,960 --> 00:03:45,760 of the exposure notification framework, so you know what I'm talking about during 42 00:03:45,760 --> 00:03:52,560 this session. It all starts with a Temporary Exposure Key which is generated 43 00:03:52,560 --> 00:03:58,930 on the phone and which is valid for 24 hours. From this Temporary Exposure Key, 44 00:03:58,930 --> 00:04:03,200 several things are derived. First, for example, there is the Rolling Proximity 45 00:04:03,200 --> 00:04:09,040 Identifier Key and the Associated Encrypted Metadata Key. This part down 46 00:04:09,040 --> 00:04:13,280 here, we can skip for the time being and look at the generation of Rolling 47 00:04:13,280 --> 00:04:18,560 Proximity Identifiers. Those Rolling Proximity Identifiers are only valid for 48 00:04:18,560 --> 00:04:24,160 10 minutes each because they are regularly exchanged once the Bluetooth MAC address 49 00:04:24,160 --> 00:04:29,040 change takes place. So the Rolling Proximity Identifier is basically the 50 00:04:29,040 --> 00:04:33,360 Bluetooth payload your phone uses when the Exposure Notification Framework is 51 00:04:33,360 --> 00:04:38,960 active and broadcasting. 
When I say broadcasting, I mean every 250 52 00:04:38,960 --> 00:04:45,120 milliseconds your phone sends out its own Rolling Proximity Identifiers, so other 53 00:04:45,120 --> 00:04:51,600 phones around, which are scanning for signals in the air, basically can catch them 54 00:04:51,600 --> 00:04:58,400 and store them locally. So let's look at the receiving side. This is what we see 55 00:04:58,400 --> 00:05:02,240 down here and now, as I've already mentioned, we've got those Bluetooth low 56 00:05:02,240 --> 00:05:07,280 energy beacon mechanics sending out those Rolling Proximity Identifiers and they're 57 00:05:07,280 --> 00:05:13,120 received down here. This is all a very simplified schematic, just to give you an 58 00:05:13,120 --> 00:05:17,760 impression of what's going on there. So now we've got those Rolling Proximity 59 00:05:17,760 --> 00:05:23,600 Identifiers stored on the receiving phone and now, somehow, this other phone needs 60 00:05:23,600 --> 00:05:28,960 to find out that there has been a match. This happens by transforming those 61 00:05:28,960 --> 00:05:34,400 Temporary Exposure Keys into Diagnosis Keys, which is just a renaming. But as 62 00:05:34,400 --> 00:05:38,400 soon as someone has tested positive and a Temporary Exposure Key is linked to a 63 00:05:38,400 --> 00:05:43,440 positive diagnosis, it is called a Diagnosis Key and they are uploaded to the server. 64 00:05:44,240 --> 00:05:51,760 And I'm drastically simplifying here. So on the receiving phone here, they're 65 00:05:51,760 --> 00:05:57,680 downloaded, and all those Diagnosis Keys are extracted again. And as you can see, the 66 00:05:57,680 --> 00:06:06,080 same functions are applied again, HKDF, then AES, and we get a lot of Rolling Proximity 67 00:06:06,080 --> 00:06:12,560 Identifiers for matching down here. 
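The derivation chain mentioned here (Temporary Exposure Key, then HKDF, then AES) can be sketched roughly as follows. This is only an illustration under assumptions: the label strings "EN-RPIK" and "EN-RPI" and the padded-data layout follow my reading of the published Apple/Google cryptography specification and are not stated in the talk, the function names are made up for this sketch, and the final AES-128 step is left out because the Python standard library has no AES.

```python
import hashlib
import hmac
import struct


def hkdf_sha256(key: bytes, info: bytes, length: int, salt: bytes = b"") -> bytes:
    """HKDF with SHA-256 (RFC 5869): extract a pseudorandom key, then expand it."""
    if not salt:
        # RFC 5869: a missing salt is treated as HashLen zero bytes.
        salt = bytes(hashlib.sha256().digest_size)
    prk = hmac.new(salt, key, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def interval_number(unix_time: int) -> int:
    """Index of the 10-minute interval a timestamp falls into."""
    return unix_time // 600


def rolling_proximity_identifier_key(tek: bytes) -> bytes:
    """Derive the 16-byte RPI key from a Temporary Exposure Key (label is an assumption)."""
    return hkdf_sha256(tek, b"EN-RPIK", 16)


def padded_data(interval: int) -> bytes:
    """16-byte block that would be AES-encrypted with the RPI key to obtain one RPI."""
    return b"EN-RPI" + bytes(6) + struct.pack("<I", interval)
```

The receiving phone would run the same two steps on every downloaded Diagnosis Key, once per 10-minute interval of that key's day, and compare the results with the identifiers it has stored.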
And those are the ones we have stored and now 68 00:06:12,560 --> 00:06:16,560 we can match them and find out which of those Rolling Proximity Identifiers we 69 00:06:16,560 --> 00:06:23,345 have seen so far. And, of course, the receiving phone can also make sure that 70 00:06:23,345 --> 00:06:28,320 the Rolling Proximity Identifiers belonging to a single Diagnosis Key, which 71 00:06:28,320 --> 00:06:33,200 means they belong to one single other phone, are connected to each other. So we 72 00:06:33,200 --> 00:06:37,840 can also track exposures which have lasted longer than 10 minutes. So, for example, 73 00:06:37,840 --> 00:06:42,640 if you are having a meeting of 90 minutes, this would allow the exposure 74 00:06:42,640 --> 00:06:46,960 notification framework to put together those up to nine Rolling Proximity 75 00:06:46,960 --> 00:06:52,320 Identifiers and transform them into a single encounter, which then gets 76 00:06:52,320 --> 00:06:57,280 enriched with the associated encrypted metadata, which is basically just the 77 00:06:57,280 --> 00:07:04,560 transmit power. As a summary, down here. So now that we know which data are being 78 00:07:04,560 --> 00:07:10,000 transferred from phone to phone, we can have a look at the actual architecture of 79 00:07:10,000 --> 00:07:16,960 the App itself. This gray box here is the mobile phone, and down here is the German 80 00:07:16,960 --> 00:07:23,120 Corona-Warn-App, it's a dashed line, which means there's more documentation available 81 00:07:23,120 --> 00:07:28,560 online. So I can only invite you to go to the GitHub repository. Have a look at our code 82 00:07:28,560 --> 00:07:34,320 and, of course, our documentation. So there are more diagrams available. And as 83 00:07:34,320 --> 00:07:38,960 you can see, the App itself does not store a lot of data. So those boxes here are 84 00:07:38,960 --> 00:07:43,600 storages. 
So it only stores something called a Registration Token and the 85 00:07:43,600 --> 00:07:49,840 contact journal entries for our most recent version, which means that's all the 86 00:07:49,840 --> 00:07:55,520 App stores itself. What you can see here is that it's connected to the operating 87 00:07:55,520 --> 00:07:59,760 system API/SDK for the exposure notifications, so that's the exposure 88 00:07:59,760 --> 00:08:04,320 notification framework to which we interface, which takes care of all the key 89 00:08:04,320 --> 00:08:09,520 collecting, broadcasting and key matching as well. Then there's a protocol buffer 90 00:08:09,520 --> 00:08:15,760 library which we need for the data transfer, and we use the operating system 91 00:08:15,760 --> 00:08:21,440 cryptography libraries or, basically, the SDK. So we don't need to include external 92 00:08:21,440 --> 00:08:30,720 libraries for that. What you can see here is the OS API/SDK for push messages. But 93 00:08:30,720 --> 00:08:36,400 this is not remote push messaging, but only local. So the App triggers local 94 00:08:36,400 --> 00:08:42,320 notifications and to the user, it appears as if a push message 95 00:08:42,320 --> 00:08:50,264 came in remotely, but actually it only uses local messages. But what would the 96 00:08:50,264 --> 00:08:56,242 App be without the actual backend infrastructure? So you can see here, 97 00:08:56,242 --> 00:09:01,840 that's the Corona-Warn-App server, that's the actual backend for managing all the 98 00:09:01,840 --> 00:09:07,710 keys. And you see the upload path here. It's aggregated, then provided through the 99 00:09:07,710 --> 00:09:13,371 content delivery network and downloaded by the App here. But we've got more. We've 100 00:09:13,371 --> 00:09:19,209 got the verification server, which has the job of verifying a positive test result. 101 00:09:19,209 --> 00:09:26,258 And how does it do that? 
There are basically two ways: it can either get the information 102 00:09:26,258 --> 00:09:32,659 that a positive test is true through a so-called teleTAN, which is the most basic 103 00:09:32,659 --> 00:09:37,712 way, because people call up the hotline, get one of those teleTANs, enter it into the 104 00:09:37,712 --> 00:09:45,045 App and then they are able to upload the Diagnosis Keys or, if people use the fully 105 00:09:45,045 --> 00:09:49,892 digital way, they get their test result through the App. And that's why we have 106 00:09:49,892 --> 00:09:54,840 the test results server up here, which can be queried by the verification server 107 00:09:54,840 --> 00:10:01,563 so users can get the test result through the infrastructure. But that's not all, 108 00:10:01,563 --> 00:10:06,761 because as I've promised earlier, we've also got the connection to other European 109 00:10:06,761 --> 00:10:11,456 countries. So down here is the European Federation Gateway Service, which gives us 110 00:10:11,456 --> 00:10:16,815 the possibility to upload our own national keys to this European Federation 111 00:10:16,815 --> 00:10:20,756 Gateway Service, so other countries can download them and distribute them to their 112 00:10:20,756 --> 00:10:25,940 users, but we can also request foreign keys and, it gets even better, we can be 113 00:10:25,940 --> 00:10:31,264 informed if new foreign keys are available for download through a callback mechanism, 114 00:10:31,264 --> 00:10:40,430 which is just here on the right side. So once the app is communicating with the 115 00:10:40,430 --> 00:10:48,574 backend, what would actually happen if someone is listening? So we've got our 116 00:10:48,574 --> 00:10:58,903 dataflow here. Let's have a look at it. So in step one, we are actually 117 00:10:58,903 --> 00:11:05,072 scanning the QR code with the camera of the phone and extracted from the QR code would 118 00:11:05,072 --> 00:11:10,870 be a GUID, which is then fed into the Corona-Warn-App. 
You can see here it is 119 00:11:10,870 --> 00:11:14,880 never stored within the app. That's very important, because we wanted to make sure 120 00:11:14,880 --> 00:11:19,467 that as little information as possible needs to be stored within the app and also that 121 00:11:19,467 --> 00:11:23,939 it's not possible to connect information from different sources, for example, to 122 00:11:23,939 --> 00:11:31,570 trace back a Diagnosis Key to a GUID, which would allow identifying a person. It was very 123 00:11:31,570 --> 00:11:38,916 important that this step is not possible. So we had to take care that no data is 124 00:11:38,916 --> 00:11:45,319 stored together and data cannot be connected again. So in step one, we get 125 00:11:45,319 --> 00:11:50,022 this GUID. And this is then hashed on the phone and sent to the verification 126 00:11:50,022 --> 00:11:55,636 server, which in step three generates a so-called Registration Token and stores it 127 00:11:55,636 --> 00:12:02,049 together. So it stores the hash(GUID) and the hash(Registration Token), making sure 128 00:12:02,049 --> 00:12:08,748 that the GUID can only be used once, and returns the unhashed Registration Token to 129 00:12:08,748 --> 00:12:17,920 the App here. Now the App can store the Registration Token and use it in step five 130 00:12:17,920 --> 00:12:22,709 for polling for test results, but the test results are not available directly on the 131 00:12:22,709 --> 00:12:27,378 verification server, because we do not store them here. But the verification server 132 00:12:27,378 --> 00:12:32,707 connects to the test results server by using the hash(GUID), which it can get from 133 00:12:32,707 --> 00:12:38,836 the hash(Registration Token) here, and then it can ask the test results server. And 134 00:12:38,836 --> 00:12:43,831 the test results server might have a data set connecting the hash(GUID) to the test 135 00:12:43,831 --> 00:12:51,649 result. 
And this check needs to be done because the test results server might also 136 00:12:51,649 --> 00:12:56,546 have no information for this hash(GUID), and this only means that no test result 137 00:12:56,546 --> 00:13:01,436 has been received yet. This is what happens here in step A: the Lab Information 138 00:13:01,436 --> 00:13:06,820 System, the LIS, can supply the test results server with a package of 139 00:13:06,820 --> 00:13:12,044 hash(GUID) and the test result - so it's stored there. And if it's available already 140 00:13:12,044 --> 00:13:18,318 on the test results server, it is returned to the verification server here in step 7 and 141 00:13:18,318 --> 00:13:25,101 accordingly in step 8 to the App. You might have noted the test result is also 142 00:13:25,101 --> 00:13:30,000 neither cached nor stored here on the verification server, which means if the 143 00:13:30,000 --> 00:13:36,276 user then decides to upload the keys, a TAN is required to pass on to the backend 144 00:13:36,276 --> 00:13:41,877 for verification of the positive test. A similar flow needs to be followed. So in 145 00:13:41,877 --> 00:13:48,815 step 9, again, the Registration Token is passed to the TAN endpoint, the 146 00:13:48,815 --> 00:13:52,680 verification server once more needs to check with the test results server 147 00:13:52,680 --> 00:13:56,878 that it's actually a positive test result. It gets back here in step 11, and the TAN is 148 00:13:56,878 --> 00:14:02,056 generated in step 12. You can see the TAN is not stored in plaintext, but it's 149 00:14:02,056 --> 00:14:08,387 stored as a hash, but the plaintext is returned to the App, which can 150 00:14:08,387 --> 00:14:12,773 then bundle it with Diagnosis Keys extracted from the exposure notification 151 00:14:12,773 --> 00:14:17,491 framework and upload it to the Corona-Warn-App server or more specifically, the 152 00:14:17,491 --> 00:14:24,155 submission service. 
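To make the token flow described above a little more tangible, here is a small in-memory sketch. It is not the actual server code: the class and method names, the use of SHA-256, the token lengths and the "pending"/"positive" result strings are all assumptions made for illustration. What it tries to mirror from the talk is that only hashes are stored, that a GUID can be registered only once, and that the test result is looked up on demand rather than cached.

```python
import hashlib
import secrets


def sha256_hex(value: str) -> str:
    """Hex-encoded SHA-256 of a string (hash choice is an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


class ResultServer:
    """Stands in for the test results server: maps hash(GUID) to a result."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}

    def report(self, guid_hash: str, result: str) -> None:
        # Step A: the lab information system delivers hash(GUID) plus result.
        self._results[guid_hash] = result

    def lookup(self, guid_hash: str) -> str:
        # No entry simply means that no result has been received yet.
        return self._results.get(guid_hash, "pending")


class VerificationServer:
    """Stands in for the verification server; stores hashes only."""

    def __init__(self, results: ResultServer) -> None:
        self._results = results
        self._guid_hash_by_token_hash: dict[str, str] = {}
        self._tan_hashes: set[str] = set()

    def register(self, guid_hash: str) -> str:
        # Steps 2 to 4: each GUID may be registered exactly once.
        if guid_hash in self._guid_hash_by_token_hash.values():
            raise ValueError("GUID already registered")
        token = secrets.token_hex(16)
        self._guid_hash_by_token_hash[sha256_hex(token)] = guid_hash
        return token  # the unhashed token goes back to the app

    def poll_result(self, token: str) -> str:
        # Steps 5 to 8: resolve the token to hash(GUID), ask the result server.
        guid_hash = self._guid_hash_by_token_hash[sha256_hex(token)]
        return self._results.lookup(guid_hash)

    def issue_tan(self, token: str) -> str:
        # Steps 9 to 13: check the result again, store only the TAN hash.
        if self.poll_result(token) != "positive":
            raise PermissionError("no positive test result for this token")
        tan = secrets.token_hex(8)
        self._tan_hashes.add(sha256_hex(tan))
        return tan
```

In this sketch the app would call `register(sha256_hex(guid))` once, keep the returned token, poll with it, and request a TAN only after a positive result.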
But this also needs to verify that it's authentic, so takes it in 153 00:14:24,155 --> 00:14:30,598 step 15 to the verification server on the verify endpoint. Where the TAN is 154 00:14:30,598 --> 00:14:36,769 validated and validation means it is marked as used already, so at the same 155 00:14:36,769 --> 00:14:41,804 time cannot be used twice, and then the response is given to the backend here, 156 00:14:41,804 --> 00:14:48,366 which can then, if it's positive, which means if it's authentic TAN can store the 157 00:14:48,366 --> 00:14:54,560 Diagnosis Key in its own storage. And as you can see, only the Diagnosis Keys are 158 00:14:54,560 --> 00:14:59,537 stored here, nothing else. So there's no correlation possible between Diagnosis 159 00:14:59,537 --> 00:15:07,044 Keys, Registration Token or even GUID because it's completely separate. But 160 00:15:07,044 --> 00:15:13,392 still, what could be found out about users if someone were to observe the network 161 00:15:13,392 --> 00:15:18,811 traffic going on there? An important assumption in the beginning, the content 162 00:15:18,811 --> 00:15:25,200 of all the messages is secure because only secure connections are being used and only 163 00:15:25,200 --> 00:15:32,554 the size of the transfer is observable. So we can, from a network sniffing 164 00:15:32,554 --> 00:15:37,801 perspective observe that a connection is created. We can observe how many bytes are 165 00:15:37,801 --> 00:15:42,044 being transferred back and forth, but we cannot learn about the content of the 166 00:15:42,044 --> 00:15:49,412 message. So here we are, we've got the first communication between App and server 167 00:15:49,412 --> 00:15:55,520 in step two, because we can see: OK, if someone is requesting something from the 168 00:15:55,520 --> 00:16:00,739 Registration Token endpoint, this person has been tested maybe on that specific 169 00:16:00,739 --> 00:16:08,697 day. 
Then there is the next communication going on in step five, because this means 170 00:16:08,697 --> 00:16:13,448 that the person has been tested. I mean, we might know that from step two already, 171 00:16:13,448 --> 00:16:18,501 but this person has still not received the test result. So it might still be positive 172 00:16:18,501 --> 00:16:25,973 or negative. If we can observe that the request to the TAN endpoint takes place in 173 00:16:25,973 --> 00:16:33,333 step 9, then we know the person has been tested positive. So OK, this is 174 00:16:33,333 --> 00:16:38,995 HTTPS, so we cannot actually learn which endpoint is being queried, but there 175 00:16:38,995 --> 00:16:44,360 might be specific sizes to those individual requests which might allow us 176 00:16:44,360 --> 00:16:53,775 to learn about the direction the request is going into. Just as a thought. OK, and 177 00:16:53,775 --> 00:16:58,606 then, of course, we've also got the submission service in step 14 where users 178 00:16:58,606 --> 00:17:04,821 upload their Diagnosis Keys and a TAN, and this is really, really without any 179 00:17:04,821 --> 00:17:12,154 possibility for discussion, because if an app contacts the Corona-Warn-App server 180 00:17:12,154 --> 00:17:17,934 and builds up a connection - this must mean that the user has been tested 181 00:17:17,934 --> 00:17:24,641 positive and is submitting Diagnosis Keys. Apart from that, once the user submits 182 00:17:24,641 --> 00:17:31,688 Diagnosis Keys, and the App talks to the Corona-Warn-App backend - it could also be 183 00:17:31,688 --> 00:17:39,720 possible to relate those keys to an origin IP address, for example. Could there be a 184 00:17:39,720 --> 00:17:45,821 way around that? 
So what we need to do in this scenario and what we did is to 185 00:17:45,821 --> 00:17:51,758 establish plausible deniability, which basically means we generate so much noise 186 00:17:51,758 --> 00:17:58,000 with the connections we build up that it's not possible to identify individuals which 187 00:17:58,000 --> 00:18:04,157 actually use those connections to query their test results to receive the test result, 188 00:18:04,157 --> 00:18:11,016 if it's positive, to retrieve a TAN or to upload the Keys. So generating noise is 189 00:18:11,016 --> 00:18:18,576 the key. So what the App actually does is: simulate the backend traffic by sending 190 00:18:18,576 --> 00:18:24,162 those fake or dummy requests according to a so-called playbook. So we've got... we 191 00:18:24,162 --> 00:18:29,306 call it playbook, from which the App takes which requests to do, how long to wait, 192 00:18:29,306 --> 00:18:35,229 how often to repeat those requests and so on. And it's also interesting that those 193 00:18:35,229 --> 00:18:40,323 requests might either be triggered by real event or they might be triggered by just 194 00:18:40,323 --> 00:18:45,822 some random trigger. So scanning a QR code or entering a teleTAN also triggers this 195 00:18:45,822 --> 00:18:50,703 flow. A little bit different, but it still triggers it, because if you then get your 196 00:18:50,703 --> 00:18:55,600 Registration Token retrieve your test results and the retrieval of your test 197 00:18:55,600 --> 00:19:01,709 results stops at some point, this must mean, OK, there has been the test result - 198 00:19:01,709 --> 00:19:06,116 negative or positive. If it's then observable that you communicate to the 199 00:19:06,116 --> 00:19:10,698 submission service - this would mean that it has been positive. 
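As a rough illustration of the playbook idea described above, a list of dummy requests with waits and repeats could be generated like this. It is a sketch under assumptions, not the app's real playbook: the endpoint labels, the repeat and wait limits, and all names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DummyRequest:
    endpoint: str        # hypothetical label for the request type
    wait_seconds: float  # pause before the request is sent


# Order in which a real user would touch the backend (labels are assumptions).
SEQUENCE = ("registration_token", "test_result", "tan", "submission")


def build_playbook(rng: random.Random, max_repeats: int = 3,
                   max_wait: float = 5.0) -> list[DummyRequest]:
    """Return a randomized list of dummy requests that follows the real flow's order."""
    playbook = []
    for endpoint in SEQUENCE:
        # Repeat each request type a random number of times with random waits.
        for _ in range(rng.randint(1, max_repeats)):
            playbook.append(DummyRequest(endpoint, rng.uniform(0.0, max_wait)))
    return playbook
```

Calling `build_playbook(random.Random())` after a real event or after a random trigger would give the app a sequence to replay, so that an observer sees the same request pattern in both cases.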
So what the App 200 00:19:10,698 --> 00:19:17,810 actually does is: even if it is negative, it continues sending out dummy requests to 201 00:19:17,810 --> 00:19:24,660 the verification server and it might also, so that's all based on random decisions 202 00:19:24,660 --> 00:19:31,727 within the App, it might also then retrieve a fake TAN and it might do a fake 203 00:19:31,727 --> 00:19:36,944 upload of Diagnosis Keys. So in the end, you're not able to distinguish between an App 204 00:19:36,944 --> 00:19:43,709 actually uploading real data or an App just doing playbook's stuff and creating noise. 205 00:19:43,709 --> 00:19:49,814 So users really uploading the Diagnosis Keys cannot be picked out from all the 206 00:19:49,814 --> 00:19:56,371 noise. And to make sure that our backend, it's not just swamped with all those fake 207 00:19:56,371 --> 00:20:01,600 and dummy requests, there's a special header field, which informs the backend to 208 00:20:01,600 --> 00:20:06,054 actually ignore those requests. But if you would just ignore them and not send a 209 00:20:06,054 --> 00:20:11,560 response - it could be implemented on the client, but then it would be observable 210 00:20:11,560 --> 00:20:17,280 again that it's just a fake request. So what we do is - we let the backend skip 211 00:20:17,280 --> 00:20:22,318 all the interaction with the underlying database infrastructure, do not modify any 212 00:20:22,318 --> 00:20:28,021 data and so on, but there will be a delay in the response and the response will look 213 00:20:28,021 --> 00:20:34,310 exactly the same as if it was to respond to real request. Also on the data, both 214 00:20:34,310 --> 00:20:40,712 directions from the client to the server and from the server to the client, get 215 00:20:40,712 --> 00:20:46,977 some padding, so it's always the same size, no matter what information is contained 216 00:20:46,977 --> 00:20:53,986 in this data packages. So observing the data packages... 
so the size does not help 217 00:20:53,986 --> 00:21:00,484 in finding out what's actually going on. Now, you could say, OK, if there's so much 218 00:21:00,484 --> 00:21:06,162 additional traffic because there are fake requests being sent out and fake uploads 219 00:21:06,162 --> 00:21:12,283 being done and so on, this must cost a lot of data traffic to the users. That's a 220 00:21:12,283 --> 00:21:18,897 good point. It is all zero-rated with German mobile operators, which means it's 221 00:21:18,897 --> 00:21:29,040 not charged to the end customers, but it's just being paid for. Now, there is still that 222 00:21:29,040 --> 00:21:34,560 thing with the extraction of information from the metadata while uploading the Diagnosis 223 00:21:34,560 --> 00:21:41,120 Keys and this metadata might be the source IP address, it might be the user agent 224 00:21:41,120 --> 00:21:47,120 being used. So then you can distinguish Android from iOS and possibly you could 225 00:21:47,120 --> 00:21:52,480 also find out about the OS version. To prevent this, we introduced an intermediary 226 00:21:52,480 --> 00:21:58,320 server, which removes the metadata from the requests and just forwards the plain 227 00:21:58,320 --> 00:22:04,240 content of the packages basically to the backend service. So the backend service, 228 00:22:04,240 --> 00:22:17,705 the submission service, is not able to tell where this package came from. Now, 229 00:22:17,705 --> 00:22:24,613 for risk calculation, we can have a look at which information is available here. So 230 00:22:24,613 --> 00:22:30,661 we've got the information about encounters, which is calculated on the device 231 00:22:30,661 --> 00:22:34,817 receiving the Rolling Proximity Identifiers as mentioned earlier, and that 232 00:22:34,817 --> 00:22:39,820 information comes in to us in 30-minute exposure windows. So I mentioned earlier 233 00:22:39,820 --> 00:22:45,480 that all the Rolling Proximity Identifiers belonging to a single Diagnosis Key, 
So 234 00:22:45,480 --> 00:22:50,482 single day UTC basically that is, can be related to each other. But what the 235 00:22:50,482 --> 00:22:56,250 exposure notification framework then does is split up those encounters in 30 minute 236 00:22:56,250 --> 00:23:05,531 windows. So the first scan instance, where another device has been identified, starts 237 00:23:05,531 --> 00:23:09,716 the exposure window and then it's filled up until the 30 minutes are full. And if 238 00:23:09,716 --> 00:23:14,046 there's more encounters with the same Diagnosis Key basically, a new window is 239 00:23:14,046 --> 00:23:19,182 started and so on. The single exposure window only contains a single device. So 240 00:23:19,182 --> 00:23:25,039 it's one to one mapping. And within that window we can find the number of the scan 241 00:23:25,039 --> 00:23:32,897 instances. So scans take place every three to five minutes and within those scan 242 00:23:32,897 --> 00:23:35,280 instances, there are also multiple scans. 243 00:23:35,280 --> 00:23:38,455 And we get the minimum and the average attenuation 244 00:23:38,455 --> 00:23:44,229 per instance, and the attenuation is actually the reported transmit power of 245 00:23:44,229 --> 00:23:49,542 the device minus the signal strength when receiving the signal. So it basically 246 00:23:49,542 --> 00:23:55,405 tells us how much signal strength got lost on the way. If we talk about a low 247 00:23:55,405 --> 00:24:00,520 attenuation, this means the other device has been very close. 
If the attenuation is 248 00:24:00,520 --> 00:24:08,348 higher, it means the other device is farther away and, from the other way around, so 249 00:24:08,348 --> 00:24:12,951 through the Diagnosis Keys, which have been uploaded to the server, processed on the 250 00:24:12,951 --> 00:24:17,440 backend provided on CDN and came to us through that way, we can also get 251 00:24:17,440 --> 00:24:22,157 information about the infectiousness of the user, which is encoded in something we 252 00:24:22,157 --> 00:24:30,002 call Transmission Risk Level, which tells us how big the risk of infection from that 253 00:24:30,002 --> 00:24:38,240 person on that specific day has been. So, the Transmission Risk Level is based on 254 00:24:38,240 --> 00:24:43,360 the symptom status of a person and the symptom status means: Is the person 255 00:24:43,360 --> 00:24:49,082 symptomatic, asymptomatic, does the person want to tell about the symptoms or 256 00:24:49,082 --> 00:24:53,840 maybe do they not want to tell about the symptoms, and in addition to that, if 257 00:24:53,840 --> 00:24:58,640 there have been symptoms, it can also be clarified whether the symptoms start was a 258 00:24:58,640 --> 00:25:02,720 specific day, whether it has been a range of multiple days when the symptoms 259 00:25:02,720 --> 00:25:08,400 started, or people could also say: "I'm not sure about when the symptoms started, 260 00:25:08,400 --> 00:25:14,880 but there have been symptoms definitely". So this is the first case people can 261 00:25:14,880 --> 00:25:20,160 specify when the symptoms started and we can say that the symptoms start down here 262 00:25:20,160 --> 00:25:27,840 and around that date of the onset of symptoms, it's basically evenly spread the 263 00:25:27,840 --> 00:25:36,160 risk of infection: red means high risk, blue means low risk. 
See, when you move 264 00:25:36,160 --> 00:25:44,080 around that symptom start day also the infectiousness moves around and there's 265 00:25:44,080 --> 00:25:47,920 basically a matrix from where this information is derived. Again, you can 266 00:25:47,920 --> 00:25:54,400 find that all in the code. And there's also the possibility to say, OK, the 267 00:25:54,400 --> 00:26:00,080 symptoms started somewhere within the last seven days. That's the case up here. See, 268 00:26:00,080 --> 00:26:05,440 it's spread a little bit differently. Users could also specify it started 269 00:26:05,440 --> 00:26:11,440 somewhere from one to two weeks ago. You can see that here in the second chart and 270 00:26:11,440 --> 00:26:18,560 the third chart is the case for when the symptoms started more than two weeks ago. 271 00:26:18,560 --> 00:26:23,760 Now, here's the case where users specify that they just received a positive test 272 00:26:23,760 --> 00:26:28,240 result. So they're definitely Corona positive, but they have never had 273 00:26:28,240 --> 00:26:32,640 symptoms, which might mean they are asymptomatic or presymptomatic. And, 274 00:26:32,640 --> 00:26:40,160 again, you see around the submission, there is an increased risk, but all the 275 00:26:40,160 --> 00:26:48,400 time before here only has a low transmission level assigned. If users want 276 00:26:48,400 --> 00:26:52,320 to specify that they can't remember when the symptoms started, but they definitely 277 00:26:52,320 --> 00:26:59,520 had symptoms, then it's all spread a little bit differently. And equally, if 278 00:26:59,520 --> 00:27:03,200 users do not want to share the information whether they had symptoms at 279 00:27:03,200 --> 00:27:10,160 all. So now we've got this big risk calculation chart here, and I would like 280 00:27:10,160 --> 00:27:14,320 to walk you quickly through it. 
So on the left, we've got the configuration which is 281 00:27:14,320 --> 00:27:18,720 being fed into the exposure notification framework by Apple / Google, because 282 00:27:18,720 --> 00:27:24,400 there are also some mappings which the framework needs from us. There is some 283 00:27:24,400 --> 00:27:28,880 internal configuration because we have decided to do a lot of the risk 284 00:27:28,880 --> 00:27:33,360 calculation within the App instead of doing it in the framework, mainly because 285 00:27:33,360 --> 00:27:39,520 we have decided we want eight levels, transmission risk levels, instead of the 286 00:27:39,520 --> 00:27:44,720 only three levels, so low, standard and high, which Apple and Google provide to 287 00:27:44,720 --> 00:27:51,280 us. For the sake of having those eight levels, we actually sacrifice the 288 00:27:51,280 --> 00:27:55,840 parameters of infectiousness, which is derived from the parameter days since 289 00:27:55,840 --> 00:28:02,880 onset of symptoms, and the report type, which is always a confirmed test in Europe. 290 00:28:02,880 --> 00:28:08,000 So we got those three bits actually, which we can now use as a Transmission Risk 291 00:28:08,000 --> 00:28:13,440 Level, which is encoded on the server in those two fields, added to the keys on 292 00:28:13,440 --> 00:28:20,080 the content delivery network, downloaded by the App and then passed through the 293 00:28:20,080 --> 00:28:24,560 calculation here. So it comes in here. It is assembled from those two parameters, 294 00:28:24,560 --> 00:28:30,960 Report Type and Infectiousness, and now it goes along. So first, we need to look 295 00:28:30,960 --> 00:28:37,760 whether the sum of the durations at below 73 decibels, so that's our first threshold, 296 00:28:37,760 --> 00:28:42,640 has been less than 10 minutes. If it has been less than 10 minutes, just drop the 297 00:28:42,640 --> 00:28:49,120 whole exposure window. 
If it has been more than or equal to 10 minutes, we might use it, 298 00:28:49,120 --> 00:28:55,760 depending on whether the Transmission Risk Level is larger than or equal to three; if so, we use 299 00:28:55,760 --> 00:29:05,635 it. And now we actually calculate the relevant time, and times between 60... 300 00:29:05,635 --> 00:29:12,970 between 55 and 63 decibels are only counted half, because that's a medium distance, and 301 00:29:12,970 --> 00:29:19,158 times at below 55 decibels, that's up here, are counted full, then added up. And 302 00:29:19,158 --> 00:29:24,080 then we've got the weighted exposure time, and now we've got this transmission risk 303 00:29:24,080 --> 00:29:28,591 level, which leads us to a normalization factor, basically. And this is multiplied 304 00:29:28,591 --> 00:29:33,800 with the weighted exposure time. What we get here is the normalized exposure time per 305 00:29:33,800 --> 00:29:39,815 exposure window, and those times for each window are added up for the whole day. And 306 00:29:39,815 --> 00:29:44,977 then there's the threshold of 15 minutes, which decides whether the day had a high 307 00:29:44,977 --> 00:29:54,000 risk of infection or a low risk. So now that you all know how to do those 308 00:29:54,000 --> 00:30:00,880 calculations, we can walk through it for three examples. So the first example is 309 00:30:00,880 --> 00:30:05,120 here: it's a transmission risk level of seven. You can see those all are pretty 310 00:30:05,120 --> 00:30:10,400 close. So our magic thresholds are here at 73. That's for whether that's counted or 311 00:30:10,400 --> 00:30:17,680 not. Then at 63, it's this line. And at 55. So we see, OK, there's been a lot of 312 00:30:17,680 --> 00:30:23,280 close contact going on and some medium range contact as well. So let's do the 313 00:30:23,280 --> 00:30:29,360 pre-filtering, even though we already see it has been at least 10 minutes below 73 314 00:30:29,360 --> 00:30:35,600 decibels. 
Yes, definitely, because each of those dots represents three minutes. So, 315 00:30:35,600 --> 00:30:40,960 for this example calculation, I just assumed the scan windows are three minutes 316 00:30:40,960 --> 00:30:47,840 apart. Is it at least transmission risk level three? Yes, it's even seven. So now 317 00:30:47,840 --> 00:30:54,000 we do the calculation. It has been 18 minutes at a low attenuation, so at a 318 00:30:54,000 --> 00:30:59,200 close proximity, so that's 18 minutes, and nine minutes - those three dots 319 00:30:59,200 --> 00:31:04,000 here - at a medium attenuation. So a little bit farther apart, they count as four and 320 00:31:04,000 --> 00:31:09,600 a half minutes. We've got a factor here. Adding it up, it gets us to 22.5 minutes, 321 00:31:09,600 --> 00:31:19,600 multiplied by 1.4, giving us 33... 31.5 minutes, which means red status. Already 322 00:31:19,600 --> 00:31:25,920 with a single window. Now, in this example, we can already see that's pretty 323 00:31:25,920 --> 00:31:30,560 far away and there's been one close encounter here, transmission risk level 324 00:31:30,560 --> 00:31:37,680 eight even. Pre-filtering: has it been at least 10 minutes below 73 decibels? Nope. 325 00:31:37,680 --> 00:31:43,360 OK, then we already drop it. Now that's the third one. Transmission risk level 326 00:31:43,360 --> 00:31:51,440 eight again. It has been a little bit away, but there's also been some close 327 00:31:51,440 --> 00:31:57,040 contact, so we do the pre-filtering: has it been at least 10 minutes below 73? Now 328 00:31:57,040 --> 00:32:03,200 we already have to look closely. So, yes. It is below 73, this one as well. OK, so 329 00:32:03,200 --> 00:32:09,920 we've got four dots below 73 decibels. Gives us 12 minutes. Yes. Transmission 330 00:32:09,920 --> 00:32:14,880 risk level three? OK, that's easy. Yes. And now we can do the calculation. It has 331 00:32:14,880 --> 00:32:20,560 been six minutes at the low attenuation - those two dots here. 
OK, they count full 332 00:32:20,560 --> 00:32:25,760 and zero minutes at the medium attenuation. You see this part is empty 333 00:32:25,760 --> 00:32:31,040 and the transmission risk level eight gives us a factor of 1.6. If we now 334 00:32:31,040 --> 00:32:36,880 multiply the six minutes by 1.6, we get 9.6 minutes. So if this has been the only 335 00:32:36,880 --> 00:32:41,680 encounter for a day, that's still green. But if, for example, you had two 336 00:32:41,680 --> 00:32:47,840 encounters of this kind, so with the same person or with different people, then it 337 00:32:47,840 --> 00:32:53,280 would already turn into red because then it's close to 20 minutes, which is above 338 00:32:53,280 --> 00:33:00,640 the 15-minute threshold. Now, I would like to thank you for listening to my session, 339 00:33:00,640 --> 00:33:05,400 and I'm available for Q&A shortly. 340 00:33:12,510 --> 00:33:18,640 Herald: OK, so thank you, Thomas. This was a prerecorded talk and the discussion was 341 00:33:18,640 --> 00:33:24,240 very lively in the IRC during the talk, and I'm glad that Thomas will be here for 342 00:33:24,240 --> 00:33:36,080 the Q&A. Maybe to start with the first question by MH in IRC on security and 343 00:33:36,080 --> 00:33:44,519 replay attacks: Italy and Netherlands published TEKs/DKs so early they are 344 00:33:44,519 --> 00:33:50,378 still valid. We learned that yesterday and the time between presentation, how is this 345 00:33:50,378 --> 00:33:55,000 handled in the European cooperation and can you make them adhere to the security 346 00:33:55,000 --> 00:34:03,199 requirements? This is the first question for you, Thomas. 347 00:34:03,199 --> 00:34:07,979 Thomas: OK, so thank you for this question. 
The way we handle Keys coming 348 00:34:07,979 --> 00:34:12,024 in from other European countries, that's through the European Federation 349 00:34:12,024 --> 00:34:14,920 Gateway Service, is that they are handled 350 00:34:14,920 --> 00:34:19,724 as if they were national keys, which means they are put in some kind of 351 00:34:19,724 --> 00:34:26,518 embargo for two hours until... so two hours after the end of their validity to 352 00:34:26,518 --> 00:34:32,149 make sure that replay attacks are not possible. 353 00:34:32,149 --> 00:34:37,863 Herald: All right, I hope that answers this actually. OK, and then there was 354 00:34:37,863 --> 00:34:43,399 another one on international interoperability: is it EU only or is 355 00:34:43,399 --> 00:34:48,711 there also cooperation between EU and, for example, Switzerland? 356 00:34:48,711 --> 00:34:57,021 Thomas: So, so far, we've got the cooperation with other EU countries from audio glitches 357 00:34:57,021 --> 00:35:06,880 the European Union, which interoperate already, and regarding the integration of 358 00:35:06,880 --> 00:35:13,840 non-EU countries, that's basically a political decision that has to be made 359 00:35:13,840 --> 00:35:21,760 from this place as well. So that's nothing I as an architect can drive or control. So, 360 00:35:21,760 --> 00:35:27,840 so far, it's only EU countries. Herald: All right. And then I have some 361 00:35:27,840 --> 00:35:32,640 comments and also questions on community interaction and implementation of new 362 00:35:32,640 --> 00:35:38,400 features, which seems a little slow for some. There was, for example, a proposal 363 00:35:38,400 --> 00:35:43,120 for functionality called Crowd Notifier for events and restaurants to check in by 364 00:35:43,120 --> 00:35:49,680 scanning a QR code. Can you tell us a bit more about this or what's there? Are you 365 00:35:49,680 --> 00:35:58,671 aware of this? 
Thomas: So I've personally seen that there 366 00:35:58,671 --> 00:36:03,540 are proposals online, and there is also a lively discussion on those issues, but 367 00:36:03,540 --> 00:36:10,313 what you need to keep in mind is that we are also... we have the task of developing 368 00:36:10,313 --> 00:36:16,498 this App for the Federal Ministry of Health, and they are basically the ones 369 00:36:16,498 --> 00:36:23,280 requesting features, and then there's some scoping going on. So I'm personally... and so, 370 00:36:23,280 --> 00:36:29,720 to say that again, I am the architect, so I can't decide which features are going to 371 00:36:29,720 --> 00:36:34,714 be implemented. It's just as soon as the decision has been made that we need a new 372 00:36:34,714 --> 00:36:41,194 feature, so after we've been given the task, then I come in and prepare the 373 00:36:41,194 --> 00:36:46,845 architecture for that. So I'm not aware of the current state of those developments, 374 00:36:46,845 --> 00:36:49,588 to be honest, because that's out of my personal scope. 375 00:36:49,588 --> 00:36:55,557 Herald: All right. I mean, it's often the case, I suppose, with great projects, with 376 00:36:55,557 --> 00:37:02,215 huge projects. But overall, people seem to be liking the fact that everything is 377 00:37:02,215 --> 00:37:08,605 available on GitHub. But some people are really dedicated and seem to be a bit 378 00:37:08,605 --> 00:37:14,004 disappointed that interaction with the community on GitHub seems a bit slow, 379 00:37:14,004 --> 00:37:20,400 because some issues are not answered as people would hope they would be. Do you know 380 00:37:20,400 --> 00:37:27,288 about some ideas on adding dedicated community managers to the GitHub community 381 00:37:27,288 --> 00:37:32,991 around the App? So the people we speak with, that was one note in IRC, actually 382 00:37:32,991 --> 00:37:37,430 seem to be changing every month. 
So are you aware of this kind of position of 383 00:37:37,430 --> 00:37:40,960 community management? Thomas: So there are people definitely 384 00:37:40,960 --> 00:37:45,256 working on the community management, there's also a lot of feedback and 385 00:37:45,256 --> 00:37:52,378 comments coming in from the community, and I'm definitely aware that there are people 386 00:37:52,378 --> 00:37:59,305 working on that. And, for example, I get asked by them to jump in on certain 387 00:37:59,305 --> 00:38:03,800 questions where verification was needed from an architectural point of view. And 388 00:38:03,800 --> 00:38:08,565 that's... if you look at GitHub, there are also some issues I've been answering, and 389 00:38:08,565 --> 00:38:14,507 that's because our community team has asked me to jump in there. But the 390 00:38:14,507 --> 00:38:19,802 feedback that people are not fully satisfied with the way the community 391 00:38:19,802 --> 00:38:23,165 is handled is something I would definitely take back to our team 392 00:38:23,165 --> 00:38:27,513 internally and let them know about it. Herald: Yeah, that's great to know, 393 00:38:27,513 --> 00:38:33,835 actually. So people will have some answers on that. Maybe one last very concrete 394 00:38:33,835 --> 00:38:39,299 question by duffman in the IRC: Is the inability of the App to show the time/day 395 00:38:39,299 --> 00:38:42,973 of exposures a limitation of the framework or is it an implementation 396 00:38:42,973 --> 00:38:46,773 choice? And what would be the privacy implications of introducing such a 397 00:38:46,773 --> 00:38:51,053 feature? Actually, a big question, but maybe you can cut it short. 398 00:38:51,053 --> 00:38:56,537 Thomas: Yeah, OK, so the only information the exposure notification framework by 399 00:38:56,537 --> 00:39:02,128 Google / Apple can give us is the date of the exposure, and the date always relates 400 00:39:02,128 --> 00:39:08,445 to UTC there. 
And so we never get the time of the actual exposure back. And when 401 00:39:08,445 --> 00:39:13,884 moving to the exposure windows, we also do not get the time back of the exposure 402 00:39:13,884 --> 00:39:19,353 window. And the implication, if you were able to tell the exact time of the 403 00:39:19,353 --> 00:39:24,262 encounter, would be that people are often aware where they've been at a certain 404 00:39:24,262 --> 00:39:30,229 time. And let's say at 11:15, you were meeting with a friend and you get a 405 00:39:30,229 --> 00:39:36,200 notification that at 11:15, you had that exact encounter, it would be easy to tell 406 00:39:36,200 --> 00:39:44,210 whom you've met, who's been infected. And that's something not desired, that you can 407 00:39:44,210 --> 00:39:50,320 trace it back to a certain person. So the personification would basically then be 408 00:39:50,320 --> 00:39:53,680 the thing. Herald: All right, and I hope we have time 409 00:39:53,680 --> 00:39:58,080 for this last question asked on IRC: have you considered training a machine 410 00:39:58,080 --> 00:40:02,320 learning method to classify the risk levels instead of the rule-based 411 00:40:02,320 --> 00:40:12,882 method used? Thomas: So, I mean, classifying the risk 412 00:40:12,882 --> 00:40:21,358 levels through machine learning is something I'm not aware of yet. So the 413 00:40:21,358 --> 00:40:26,472 thing is, it's all based on basically a cooperation with the Fraunhofer Institute, 414 00:40:26,472 --> 00:40:30,840 where they have basically reenacted certain situations, did some measurements 415 00:40:30,840 --> 00:40:36,405 and that's what has been transferred into the risk model. So all those thresholds 416 00:40:36,405 --> 00:40:44,950 are derived from, basically, practical tests. So no ML at the moment. 
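The rule-based method discussed here, with the thresholds walked through earlier in the talk (pre-filtering at 73 decibels, weighting at 55 and 63 decibels, normalization by transmission risk level, and the 15-minute daily threshold), could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the App's actual code: all function and constant names are hypothetical, the handling of exact boundary values (strict versus inclusive comparisons) is assumed, and the talk only states the factors 1.4 for level seven and 1.6 for level eight, so the `level / 5` rule for other levels is an assumption.

```python
# Hypothetical sketch of the rule-based risk calculation described in the talk.
# Names and boundary handling are assumptions, not taken from the App's code.

LOW_ATTENUATION_DB = 55          # below this: close contact, counted in full
MEDIUM_ATTENUATION_DB = 63       # 55 to below 63: medium distance, counted half
MAX_ATTENUATION_DB = 73          # at or above this: not counted at all
MIN_MINUTES_IN_RANGE = 10        # pre-filter: minimum minutes below 73 dB
MIN_TRANSMISSION_RISK_LEVEL = 3  # pre-filter: minimum transmission risk level
HIGH_RISK_MINUTES_PER_DAY = 15   # daily threshold for a red status


def normalization_factor(level):
    """Map a transmission risk level to a factor.

    The talk gives 1.4 for level 7 and 1.6 for level 8; level / 5
    reproduces both and is assumed here for the remaining levels.
    """
    return level / 5


def normalized_window_minutes(scans, level):
    """Return the normalized exposure time of one exposure window.

    scans is a list of (attenuation_db, minutes) pairs, one per scan.
    """
    minutes_in_range = sum(m for db, m in scans if db < MAX_ATTENUATION_DB)
    if minutes_in_range < MIN_MINUTES_IN_RANGE:
        return 0.0  # window dropped by the duration pre-filter
    if level < MIN_TRANSMISSION_RISK_LEVEL:
        return 0.0  # window dropped by the risk level pre-filter
    full = sum(m for db, m in scans if db < LOW_ATTENUATION_DB)
    half = sum(m for db, m in scans
               if LOW_ATTENUATION_DB <= db < MEDIUM_ATTENUATION_DB)
    weighted = full + 0.5 * half
    return weighted * normalization_factor(level)


def day_is_high_risk(windows):
    """windows is a list of (scans, level) pairs for a single day."""
    total = sum(normalized_window_minutes(scans, level)
                for scans, level in windows)
    return total >= HIGH_RISK_MINUTES_PER_DAY


# The three examples from the talk, assuming one scan every three minutes.
example_1 = ([(50, 3)] * 6 + [(60, 3)] * 3, 7)                # 18 min close, 9 min medium
example_2 = ([(50, 3)] + [(80, 3)] * 9, 8)                    # one close scan only
example_3 = ([(50, 3)] * 2 + [(70, 3)] * 2 + [(80, 3)] * 6, 8)  # 12 min below 73 dB

for name, window in [("example 1", example_1),
                     ("example 2", example_2),
                     ("example 3", example_3)]:
    minutes = round(normalized_window_minutes(*window), 1)
    print(name, minutes, "red" if day_is_high_risk([window]) else "green")
```

Passing the third example twice to `day_is_high_risk` models the case of two such encounters on the same day, which together exceed the daily threshold.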
417 00:40:44,950 --> 00:40:52,588 Herald: All right, so I suppose this was our last question and again, Thomas, a 418 00:40:52,588 --> 00:40:58,326 warm round of virtual applause to you and thank you again, Thomas, for giving this 419 00:40:58,326 --> 00:41:03,922 talk, for being part of this first remote Chaos Experience and for giving us some 420 00:41:03,922 --> 00:41:08,723 insight into the backend of the Corona-Warn-App. Thank you. 421 00:41:08,723 --> 00:41:11,661 Thomas: Was happy to do so. Thank you for having me here. 422 00:41:11,661 --> 00:41:15,656 rC3 postroll music 423 00:41:15,656 --> 00:41:49,965 Subtitles created by c3subtitles.de in the year 2021. Join, and help us!