WEBVTT 00:00:00.000 --> 00:00:19.237 35C3 preroll music 00:00:19.237 --> 00:00:24.970 Herald Angel: All right. It's my very big pleasure to introduce Roya Ensafi to you. 00:00:24.970 --> 00:00:31.390 She's gonna talk about "Censored Planet: a Global Censorship Observatory". I'm 00:00:31.390 --> 00:00:36.230 personally very interested in learning more about this project. Sounds like it's 00:00:36.230 --> 00:00:41.490 gonna be very important. So please welcome Roya with a huge warm round of applause. 00:00:41.490 --> 00:00:42.880 Thank you. 00:00:42.880 --> 00:00:48.660 Applause 00:00:48.660 --> 00:00:56.170 Roya: It's wonderful to finally make it to CCC. I had joint talks with several of my 00:00:56.170 --> 00:01:00.219 friends over the past years, and the visa stuff never worked out. This year I 00:01:00.219 --> 00:01:06.430 applied for a visa for a conference in August, and the visa worked for coming to CCC. My name is 00:01:06.430 --> 00:01:11.170 Roya Ensafi and I'm a professor at the University of Michigan. My research 00:01:11.170 --> 00:01:18.069 focuses on security and privacy, with the goal of protecting users from adversarial 00:01:18.069 --> 00:01:27.799 networks. So basically I investigate network interference... and somebody is 00:01:27.799 --> 00:01:55.770 interfering right now. Damn it. What the heck. Cool, I'm good. Oh, no, I'm not. 00:01:55.770 --> 00:02:07.639 laughter OK. In my lab we develop techniques and systems to 00:02:07.639 --> 00:02:13.800 detect network interference, often at scale, and apply these frameworks and tools 00:02:13.800 --> 00:02:20.060 to understand the behavior of the actors that do the interference, and 00:02:20.060 --> 00:02:25.040 use this understanding to come up with defenses. Today I'm going to talk 00:02:25.040 --> 00:02:30.030 about a project that is very dear to my heart, one that I have spent six years 00:02:30.030 --> 00:02:34.560 working on.
And in this talk I'm going to talk about censorship, internet 00:02:34.560 --> 00:02:41.391 censorship. And by that I mean any action that prevents users' access to the 00:02:41.391 --> 00:02:48.720 requested content. We have heard about an alarming level of censorship happening all 00:02:48.720 --> 00:02:53.980 around the world. And while previously only a few countries were 00:02:53.980 --> 00:03:01.260 capable of using deep packet inspection (DPI) to tamper with user traffic, thanks to the 00:03:01.260 --> 00:03:08.540 commercialization of DPI boxes, now many countries are actually messing with users' 00:03:08.540 --> 00:03:16.951 data. From the moment users type CNN.com in their browsers, their 00:03:16.951 --> 00:03:22.320 traffic is subject to some level of interference by different actors. First, 00:03:22.320 --> 00:03:27.150 for example, the DNS query, where the mapping between the domain and the IP 00:03:27.150 --> 00:03:34.100 where the content is can be manipulated. For example, the DNS answer can be a dead 00:03:34.100 --> 00:03:40.900 IP where the content is not there. If DNS succeeds, then the user and the server 00:03:40.900 --> 00:03:47.500 are going to establish a connection, a TCP handshake, and that can easily be blocked. 00:03:47.500 --> 00:03:53.840 If that succeeds, then the user and the server actually start sending the 00:03:53.840 --> 00:04:00.209 actual data back and forth, and there is enough cleartext, whether the traffic is encrypted or not, 00:04:00.209 --> 00:04:06.130 that a DPI can detect a sensitive keyword and send reset packets to both sides to 00:04:06.130 --> 00:04:12.990 basically shut down the connection. Before I forget, let me tell you and 00:04:12.990 --> 00:04:18.150 emphasize that it's not just governments and the policies they impose 00:04:18.150 --> 00:04:25.400 on ISPs that lead to censorship. Actually, the server side, which provides the 00:04:25.400 --> 00:04:31.319 data, is also blocking users.
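The three places where a page load can be tampered with, as described above (the DNS answer, the TCP handshake, and keyword-triggered RST injection on cleartext), can be pictured with a toy model. This is a hypothetical sketch, not code from the talk or from CensoredPlanet: the `Censor` class, its fields, and `fetch` are illustrative names, the IPs come from documentation ranges, and no packets are actually sent.

```python
from dataclasses import dataclass, field


@dataclass
class Censor:
    """Hypothetical on-path censor; every field is an illustrative assumption."""
    poisoned_domains: dict = field(default_factory=dict)  # domain -> bogus IP
    blocked_ips: set = field(default_factory=set)          # handshakes to these IPs are dropped
    keywords: tuple = ()                                    # cleartext strings that trigger RSTs


def fetch(domain, real_dns, request, censor):
    """Walk the three stages of a page load and report where interference happens."""
    expected = real_dns[domain]
    # Stage 1: DNS. A poisoned answer points the user at an IP without the content.
    ip = censor.poisoned_domains.get(domain, expected)
    if ip != expected:
        return f"dns: got {ip}, expected {expected}"
    # Stage 2: TCP handshake. The censor can simply drop packets to a blocked IP.
    if ip in censor.blocked_ips:
        return "tcp: handshake blocked"
    # Stage 3: payload. DPI matches cleartext (an HTTP request, or the SNI in a
    # TLS ClientHello) and injects RSTs toward both endpoints.
    if any(k in request for k in censor.keywords):
        return "dpi: RST injected"
    return "ok"


dns = {"example.com": "192.0.2.10"}
req = "GET /falun HTTP/1.1\r\nHost: example.com\r\n\r\n"
print(fetch("example.com", dns, req, Censor(keywords=("falun",))))  # prints dpi: RST injected
```

Each stage only runs if the previous one succeeded, which is why a censor that poisons DNS never needs to touch the handshake or the payload.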
Especially if they are located in a region that doesn't 00:04:31.319 --> 00:04:39.580 provide any revenue. We recently investigated this issue of geoblocking 00:04:39.580 --> 00:04:49.180 in depth and provided more details about what role CDNs actually play. Imagine 00:04:49.180 --> 00:04:57.490 how many users, how many ISPs, how many transit networks and how many 00:04:57.490 --> 00:05:02.830 websites we now have, each of which is going to have its own policies for blocking users' 00:05:02.830 --> 00:05:09.859 access. Moreover, censorship changes from time to time, region to region and country to 00:05:09.859 --> 00:05:14.759 country. And for that reason many researchers, including me, have been 00:05:14.759 --> 00:05:20.660 interested in collecting data about censorship globally and 00:05:20.660 --> 00:05:29.539 continuously. Well, I grew up under severe censorship. Be it the university, the 00:05:29.539 --> 00:05:35.289 government, or, more frustratingly, the server side. And I genuinely believe that 00:05:35.289 --> 00:05:44.739 censorship takes away opportunities and degrades human dignity. It is not just 00:05:44.739 --> 00:05:54.090 China, Bahrain and Turkey that do internet censorship. Actually, with DPIs becoming 00:05:54.090 --> 00:06:02.499 cheaper and cheaper, many governments are following their lead. As a result, the 00:06:02.499 --> 00:06:06.680 Internet is becoming more and more balkanized, and users around the world 00:06:06.680 --> 00:06:09.870 are soon going to have very, very different pictures of what this Internet 00:06:09.870 --> 00:06:16.500 is. And we need to be able to collect data to know what is being 00:06:16.500 --> 00:06:25.121 censored, how it's being censored, where it's being censored and for how long. This 00:06:25.121 --> 00:06:32.509 data can then be used to bring transparency and accountability to 00:06:32.509 --> 00:06:38.779 governments or private companies that practice internet censorship.
It can help 00:06:38.779 --> 00:06:44.460 us know where circumvention tools, where defenses, need to be deployed. It 00:06:44.460 --> 00:06:49.309 can help users around the world know what their governments are 00:06:49.309 --> 00:06:59.370 up to and, more importantly, provide valid and good data for policymakers to come up 00:06:59.370 --> 00:07:07.860 with good policies. Existing research already shows that if we provide this 00:07:07.860 --> 00:07:17.860 data to users, they act of their own will to ensure Internet freedom. For many years 00:07:17.860 --> 00:07:22.619 my goal has been to come up with a weather map, a censorship weather map, where you 00:07:22.619 --> 00:07:27.199 can actually see changes in censorship over time and how some countries are 00:07:27.199 --> 00:07:34.100 different from others, and do that continuously, over a long period, and all 00:07:34.100 --> 00:07:41.710 over the world. Creating such a map was impossible with the techniques, the Internet 00:07:41.710 --> 00:07:46.919 measurement methods, that we had at that time, and even with the common 00:07:46.919 --> 00:07:53.779 techniques we use now. The usual method for measuring 00:07:53.779 --> 00:07:59.080 internet censorship is deploying software, or giving a customized 00:07:59.080 --> 00:08:05.689 Raspberry Pi, to either a client or a server, and based on that, measuring what's 00:08:05.689 --> 00:08:12.550 happening between clients and servers. Well, this approach has a lot of 00:08:12.550 --> 00:08:18.050 limitations. First, there are not that many volunteers around the whole 00:08:18.050 --> 00:08:25.409 world who are eager to download software and run it.
Second, the data 00:08:25.409 --> 00:08:33.190 collected with this approach is often not continuous, because the user's connection 00:08:33.190 --> 00:08:37.960 can die for a variety of reasons, or users may lose interest in running the 00:08:37.960 --> 00:08:45.450 software. And therefore we end up with sparse data from which we cannot establish a good 00:08:45.450 --> 00:08:53.450 baseline for internet censorship studies. Moreover, measuring sensitive domains 00:08:53.450 --> 00:08:59.800 often creates risks for the local collaborators, who might end up facing their 00:08:59.800 --> 00:09:09.810 government's retaliation. These risks are not hypothetical. When the Arab Spring was 00:09:09.810 --> 00:09:17.240 happening, I was approached by many colleagues to recruit local friends and 00:09:17.240 --> 00:09:24.340 colleagues in the Middle East to collect measurement data at a time that 00:09:24.340 --> 00:09:30.010 was very interesting for capturing the behavior of the network, and most dangerous 00:09:30.010 --> 00:09:36.450 for the locals, the volunteers, collecting it. My painting actually expresses what 00:09:36.450 --> 00:09:44.090 I felt at the time. I just can't imagine asking people on the ground to help at 00:09:44.090 --> 00:09:54.810 such times of unrest. In my opinion, conspiring to collect data against the 00:09:54.810 --> 00:10:02.450 government's interest can be seen as an act of treason. And these governments are 00:10:02.450 --> 00:10:11.770 often unpredictable. So it exposes these volunteers to severe risk. While 00:10:11.770 --> 00:10:19.030 no one has yet been arrested for measuring internet censorship, as far as we 00:10:19.030 --> 00:10:25.740 know, and I don't know how we could know that on a global scale, I think the clouds 00:10:25.740 --> 00:10:34.210 are on the horizon.
I'm still in awe of how the Turkish government used its surveillance 00:10:34.210 --> 00:10:42.410 data at the time of the coup attempt and tracked down and detained hundreds of users because 00:10:42.410 --> 00:10:49.400 there was traffic between them and ByLock, a messenger app that was used by the 00:10:49.400 --> 00:10:57.410 coup organizers. These things happen. Before I continue: if you know OONI, you 00:10:57.410 --> 00:11:08.091 might ask how OONI prevents risk. Well, with a great level of effort. And if you 00:11:08.091 --> 00:11:12.130 don't know OONI, OONI is a global community of volunteers that collects data 00:11:12.130 --> 00:11:20.840 about censorship around the world. Well, first and foremost, they provide their 00:11:20.840 --> 00:11:27.990 volunteers with very honest informed consent, telling them: "hey, if you run this 00:11:27.990 --> 00:11:34.560 software, anybody who is monitoring your traffic knows what you're up to." They also 00:11:34.560 --> 00:11:39.390 go out of their way to give these volunteers the freedom to choose which websites 00:11:39.390 --> 00:11:46.010 they want to test and what data they want to push. They establish great relationships 00:11:46.010 --> 00:11:53.940 with local activist organizations in those countries. Well, now that I've proven to 00:11:53.940 --> 00:11:59.250 you guys that I am a supporter of OONI, and I am actually friends with most of them, I 00:11:59.250 --> 00:12:05.300 want to emphasize that I still believe that consistent, continuous and global 00:12:05.300 --> 00:12:12.200 data about censorship requires a new approach that doesn't need volunteers' 00:12:12.200 --> 00:12:21.880 help. I became obsessed with solving this problem. What if we could measure 00:12:21.880 --> 00:12:29.160 without a client: test, anywhere around the world, whether a client can talk to a server without being 00:12:29.160 --> 00:12:36.290 close to the client. Somewhere from here, from the University of Michigan.
And see 00:12:36.290 --> 00:12:42.300 whether the two hosts can talk to each other, globally and remotely, off the 00:12:42.300 --> 00:12:50.220 path. When I talked to people about this, honestly, everybody was like "you 00:12:50.220 --> 00:12:54.190 don't know what you're talking about, it's really, really challenging". Well, they 00:12:54.190 --> 00:13:01.370 were right. The challenge is there, and I'm going to walk you through it. We have 00:13:01.370 --> 00:13:06.760 at least 140 million IP addresses that respond to a SYN packet. This means they 00:13:06.760 --> 00:13:15.530 speak to the world, and they blindly follow the TCP/IP protocol. So the question 00:13:15.530 --> 00:13:24.400 becomes: how can I leverage the subtle properties of TCP/IP to detect 00:13:24.400 --> 00:13:36.080 whether two hosts can talk to each other? Well, Spooky Scan is a technique that Jed 00:13:36.080 --> 00:13:43.090 Crandall from the University of New Mexico and I developed that uses TCP/IP side channels 00:13:43.090 --> 00:13:49.770 to detect whether two remote hosts can establish a TCP handshake 00:13:49.770 --> 00:13:56.890 or not, and if not, in which direction the packets are being dropped. Off the path 00:13:56.890 --> 00:14:03.780 and remotely. And I'm gonna start telling you how this works. First I have to cover 00:14:03.780 --> 00:14:10.810 some background. Any connection that is based on TCP, one of the basic 00:14:10.810 --> 00:14:15.950 communication protocols we have, needs to establish a TCP handshake. So 00:14:15.950 --> 00:14:22.730 basically you send a SYN, and in the packet you send, in the IP header, 00:14:22.730 --> 00:14:30.750 you have a field called "identification", IP_ID, and this field is used for 00:14:30.750 --> 00:14:36.610 fragmentation, and I'm going to use this field a lot in the rest of the talk.
00:14:36.610 --> 00:14:42.300 After the server receives a SYN, it is going to send a SYN-ACK back, with another IP_ID 00:14:42.300 --> 00:14:47.520 in it. And then, if I want to establish a connection, I send an ACK. Otherwise I send a 00:14:47.520 --> 00:14:56.070 RESET (RST). Part of the protocol says that if you send a SYN-ACK packet to a 00:14:56.070 --> 00:15:01.310 machine, with the port open or closed, it's going to send you a RST, telling you "what 00:15:01.310 --> 00:15:05.220 the heck are you sending me a SYN-ACK for, I didn't send you a SYN", and another part 00:15:05.220 --> 00:15:09.350 says: if you send a SYN packet to a machine with the port open, eager to 00:15:09.350 --> 00:15:13.880 establish a connection, it will send you a SYN-ACK. If you don't do anything, because 00:15:13.880 --> 00:15:20.040 TCP is reliable, it's going to send you multiple SYN-ACKs. It depends on the operating 00:15:20.040 --> 00:15:30.241 system: 3, 5, you name it. Spooky Scan requires some basic characteristics. For 00:15:30.241 --> 00:15:36.740 example, the client, the vantage point we are interested in, should maintain a 00:15:36.740 --> 00:15:44.060 global variable for the IP_ID. That means that when it receives packets and 00:15:44.060 --> 00:15:48.650 wants to send a packet out, no matter whom it's sending the packet to, this 00:15:48.650 --> 00:15:53.590 IP_ID is a shared resource, incremented by one for each packet. So just by 00:15:53.590 --> 00:15:57.900 watching the IP_ID change you can see how noisy a machine is, how much 00:15:57.900 --> 00:16:03.820 traffic a machine is sending out. The server should have a port open, let's say 80 or 00:16:03.820 --> 00:16:08.910 443, and be willing to establish a connection, and the measurement machine, me, should be 00:16:08.910 --> 00:16:15.360 able to spoof packets. That means sending packets with a source IP different from 00:16:15.360 --> 00:16:20.520 my own machine's.
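Why a global IP_ID counter leaks information can be shown with a minimal sketch under a simplified host model. The class and function names are hypothetical, and real stacks differ (many modern kernels use per-destination or randomized IP_IDs). A prober that reads the counter twice sees every packet the host sent to third parties in between; a per-destination counter hides that.

```python
from collections import defaultdict

IPID_MOD = 1 << 16  # the IP_ID header field is 16 bits wide and wraps around


class GlobalIPIDHost:
    """Host with one IP_ID counter shared by every outgoing packet."""

    def __init__(self, start=7000):
        self.ip_id = start

    def send(self, dst):
        self.ip_id = (self.ip_id + 1) % IPID_MOD
        return self.ip_id


class PerDestinationIPIDHost:
    """Host with a separate IP_ID counter per destination."""

    def __init__(self, start=7000):
        self.counters = defaultdict(lambda: start)

    def send(self, dst):
        self.counters[dst] = (self.counters[dst] + 1) % IPID_MOD
        return self.counters[dst]


def observed_delta(host, third_party_traffic):
    """IP_ID difference a prober sees between two of its own probes when the
    host sends `third_party_traffic` packets to other hosts in between."""
    first = host.send("prober")       # RST answering our first probe
    for dst in third_party_traffic:
        host.send(dst)
    second = host.send("prober")      # RST answering our second probe
    return (second - first) % IPID_MOD


others = ["203.0.113.5"] * 3
print(observed_delta(GlobalIPIDHost(), others))          # prints 4
print(observed_delta(PerDestinationIPIDHost(), others))  # prints 1
```

With the global counter the prober's delta is 1 plus the host's other traffic, which is exactly the "noise" signal the talk uses to judge how busy a vantage point is.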
To be able to do that, you need to talk to your upstream network and ask 00:16:20.520 --> 00:16:28.260 them not to drop the packets. All of these requirements I could easily satisfy with a 00:16:28.260 --> 00:16:36.560 little bit of effort. A Spooky Scan starts with the measurement machine sending a SYN-ACK 00:16:36.560 --> 00:16:41.310 packet to one of these clients with a global IP_ID; at that time, let's say, the value is 00:16:41.310 --> 00:16:49.010 7000. The client is going to send back a RST, following the protocol, revealing to 00:16:49.010 --> 00:16:53.881 me the value of its IP_ID. In the next step I'm going to send a spoofed SYN 00:16:53.881 --> 00:17:01.779 packet to a server using the client's IP. As a result, the SYN-ACK is going to be sent to 00:17:01.779 --> 00:17:06.289 the client. Again, the client is going to send a RST back, and the IP_ID is going to be 00:17:06.289 --> 00:17:11.240 incremented by 1. Next time I query the IP_ID I'm going to see a jump of two. In a 00:17:11.240 --> 00:17:17.189 noiseless model, I know that this machine talked to the server. If I query it again, 00:17:17.189 --> 00:17:25.070 I won't see any extra jump. So, delta 2, delta 1. Now imagine there is a firewall that 00:17:25.070 --> 00:17:32.520 blocks the SYN-ACKs going from the server to the client. Well, it doesn't matter how 00:17:32.520 --> 00:17:36.860 much traffic I send, it's not going to get there. 00:17:36.860 --> 00:17:44.390 So the deltas I see are 1, 1. In the third case, the packets are 00:17:44.390 --> 00:17:49.790 dropped from the client to the server. Well, my spoofed SYN gets there. The SYN-ACK 00:17:49.790 --> 00:17:55.030 gets to the client, and the client is going to send the RST back, but it's not going to 00:17:55.030 --> 00:17:59.470 get to the server. And so the server thinks that a packet got dropped, so it's going 00:17:59.470 --> 00:18:07.040 to send multiple SYN-ACKs. And as a result the client sends more and more RSTs.
And 00:18:07.040 --> 00:18:13.690 so the jumps I would see are, let's say, 2, 2. Let me put them all together. So you 00:18:13.690 --> 00:18:19.670 have 3 cases: blocking in this direction, no blocking, and blocking in the other. And 00:18:19.670 --> 00:18:25.890 you see different jumps, different deltas. So it's detectable. Yes, yes, in a 00:18:25.890 --> 00:18:31.770 noiseless model. I know the clients talk to many other hosts, and the IP_ID is going 00:18:31.770 --> 00:18:37.590 to change for a variety of reasons. I call all of that noise. And 00:18:37.590 --> 00:18:42.870 this is how we are going to deal with it. Well, thinking intuitively, we can amplify 00:18:42.870 --> 00:18:47.940 the signal: instead of sending one spoofed SYN packet, we can send 00:18:47.940 --> 00:18:55.310 n. And for a variety of reasons packets can get dropped, so we need to repeat the 00:18:55.310 --> 00:19:04.360 measurement. So here is some data from a Spooky Scan where I used the following 00:19:04.360 --> 00:19:13.300 probing method. For 30 seconds I sent queries for the IP_ID. And then, 00:19:13.300 --> 00:19:20.559 for another 30 seconds, I sent these 5 spoofed SYN packets. These are machines, or 00:19:20.559 --> 00:19:26.680 clients, in Azerbaijan, China and the United States, and we wanted to check whether they 00:19:26.680 --> 00:19:32.980 could reach the Tor relay that we had in Sweden. You can see there are different 00:19:32.980 --> 00:19:40.280 jumps, different level shifts, that you observe in the second phase. And just 00:19:40.280 --> 00:19:45.290 by looking at it visually, or by using an autoregressive moving average (ARMA) model, you 00:19:45.290 --> 00:19:51.120 can actually detect that. But there is an insight here, which is that not all the 00:19:51.120 --> 00:19:56.520 clients have the same level of noise.
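The three Spooky Scan cases, and the effect of amplifying with n spoofed SYNs, can be reproduced in a noiseless toy model. This is a sketch under stated assumptions, not the actual Spooky Scan tool: `spooky_scan_deltas` and `classify` are hypothetical helpers, the starting IP_ID of 7000 follows the example above, and the model assumes exactly one SYN-ACK retransmission per spoofed SYN lands in the second probe interval.

```python
def spooky_scan_deltas(case, n_spoofed=1):
    """Noiseless toy model of Spooky Scan.

    case: "none", "server_to_client" or "client_to_server" (direction blocked).
    Returns the client's IP_ID deltas over two probe intervals: the first
    contains our n spoofed SYNs, the second follows it.
    """
    ip_id = 7000          # the client's global IP_ID counter
    readings = []

    def probe():
        # Our SYN-ACK probe makes the client answer with one RST,
        # and that RST reveals the current IP_ID.
        nonlocal ip_id
        ip_id += 1
        readings.append(ip_id)

    def client_sends_rsts(count):
        # Each unsolicited SYN-ACK that reaches the client triggers one RST.
        nonlocal ip_id
        ip_id += count

    probe()
    # Interval 1: spoofed SYNs (source = client IP) reach the server, which
    # answers each one with a SYN-ACK addressed to the client.
    if case != "server_to_client":
        client_sends_rsts(n_spoofed)
    probe()
    # Interval 2: if the client's RSTs never reach the server, the server keeps
    # retransmitting SYN-ACKs (assumption: one per spoofed SYN in this interval).
    if case == "client_to_server":
        client_sends_rsts(n_spoofed)
    probe()
    return readings[1] - readings[0], readings[2] - readings[1]


def classify(deltas):
    """Map the (first, second) deltas of the noiseless model to a verdict."""
    first, second = deltas
    if first == 1:
        return "server-to-client blocked"   # SYN-ACKs never arrived
    if second == 1:
        return "no blocking"                 # one burst of RSTs, then quiet
    return "client-to-server blocked"        # retransmissions keep the count up


for case in ("none", "server_to_client", "client_to_server"):
    d = spooky_scan_deltas(case)
    print(case, d, classify(d))
# none (2, 1) no blocking
# server_to_client (1, 1) server-to-client blocked
# client_to_server (2, 2) client-to-server blocked
```

With `n_spoofed=5` the no-blocking deltas become (6, 1), which is the amplification idea: a larger jump stands out more clearly against background noise, which this model deliberately leaves out.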
And for some of them, especially 00:19:56.520 --> 00:20:01.630 these guys, you could easily detect it after five seconds of sending IP_ID queries and then 00:20:01.630 --> 00:20:10.770 five seconds of spoofing. So in the follow-up work we tried to use this 00:20:10.770 --> 00:20:16.480 insight to come up with a scalable and efficient technique that can 00:20:16.480 --> 00:20:24.900 be used globally. And that technique is called "Augur". Well, Augur 00:20:24.900 --> 00:20:32.920 adopts this probing method. First, for four seconds it queries the IP_ID, then within one 00:20:32.920 --> 00:20:42.160 second it sends 10 spoofed SYN packets. Then it looks at the IP_ID acceleration, or second 00:20:42.160 --> 00:20:49.600 derivative, to see whether there is a sudden jump at the time of perturbation, 00:20:49.600 --> 00:20:55.520 when we did the spoofing. How confident are we that that jump is the result of our 00:20:55.520 --> 00:21:02.290 own spoofed packets? Well, not confident? Run it again. Think so? Run 00:21:02.290 --> 00:21:09.280 it again, until you have sufficient confidence. It turns out there is a 00:21:09.280 --> 00:21:15.230 statistical analysis called "sequential hypothesis testing" that can be used to 00:21:15.230 --> 00:21:23.300 gradually improve our confidence about the case we're detecting. So I'm 00:21:23.300 --> 00:21:28.340 going to give you a very, very rough overview of how this works. For 00:21:28.340 --> 00:21:36.810 sequential hypothesis testing we need to define a random variable, and we use the 00:21:36.810 --> 00:21:42.910 IP_ID acceleration at the time of perturbation, being 1 or 0 based on whether you 00:21:42.910 --> 00:21:53.570 see a jump or not. We also need to calculate some empirical priors, known 00:21:53.570 --> 00:21:59.450 probabilities. If you look at everything, what would be the probability that you see a 00:21:59.450 --> 00:22:08.179 jump when there is actually no blocking? And so on.
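The per-trial random variable (a jump in IP_ID acceleration at the perturbation) and the "run it again until confident" loop can be sketched with Wald's sequential probability ratio test. This is an illustrative sketch, not Augur's code: the helper names, the jump threshold, and the priors in the usage lines (0.9 and 0.1) are hypothetical stand-ins for the empirical values the talk mentions, and the model only separates "SYN-ACKs reach the client" from "SYN-ACKs are blocked".

```python
import math

IPID_MOD = 1 << 16


def acceleration_at_perturbation(ip_ids):
    """Second difference of per-second IP_ID readings at the last interval.

    ip_ids: readings taken once per second; the final interval is the one in
    which the spoofed SYNs were sent. Differences are taken mod 2**16 because
    the 16-bit IP_ID field wraps.
    """
    velocity = [(b - a) % IPID_MOD for a, b in zip(ip_ids, ip_ids[1:])]
    return velocity[-1] - velocity[-2]


def trial_outcome(ip_ids, threshold):
    """Random variable X for one trial: 1 if we saw a jump, else 0."""
    return 1 if acceleration_at_perturbation(ip_ids) >= threshold else 0


def sprt(outcomes, p_jump_reachable, p_jump_blocked, alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test over 0/1 trial outcomes.

    H0: the server's SYN-ACKs reach the client (jumps expected).
    H1: they are blocked (jumps rare).
    alpha/beta are the tolerated false-positive / false-negative rates.
    Returns (verdict, number_of_trials_used).
    """
    accept_blocked = math.log((1 - beta) / alpha)
    accept_reachable = math.log(beta / (1 - alpha))
    llr = 0.0  # log-likelihood ratio of H1 over H0
    for n, x in enumerate(outcomes, start=1):
        if x:
            llr += math.log(p_jump_blocked / p_jump_reachable)
        else:
            llr += math.log((1 - p_jump_blocked) / (1 - p_jump_reachable))
        if llr >= accept_blocked:
            return "blocked", n
        if llr <= accept_reachable:
            return "reachable", n
    return "undecided", len(outcomes)  # not confident yet: run more trials


baseline = [100, 103, 106, 109, 112]                  # four quiet seconds
print(trial_outcome(baseline + [125], threshold=5))   # prints 1 (jump of 10)
print(sprt([0, 0, 0], p_jump_reachable=0.9, p_jump_blocked=0.1))  # prints ('blocked', 3)
```

The test stops as soon as the accumulated evidence crosses either threshold, so clean vantage points need only a few trials while noisy ones keep being re-run, matching the behavior described above.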
After we put all this together, 00:22:08.179 --> 00:22:16.150 we can formalize an algorithm: start by running a trial. Update the 00:22:16.150 --> 00:22:20.940 sequence of values for the random variable. Then check whether this 00:22:20.940 --> 00:22:27.320 sequence of values belongs to the distribution where blocking happens 00:22:27.320 --> 00:22:32.590 or not. What's the likelihood of that? If we're confident, if we've reached a level 00:22:32.590 --> 00:22:39.130 we are satisfied with, then we call the case. So, putting all this together, this is 00:22:39.130 --> 00:22:47.720 how Augur works. We scan the whole IPv4 space and find global IP_ID machines. And then we 00:22:47.720 --> 00:22:55.870 have some constraints: is it a stable machine? Is it noisy, does it have a noise level 00:22:55.870 --> 00:23:02.170 that we want to deal with? We also need to figure out which websites we are 00:23:02.170 --> 00:23:09.290 interested in testing reachability towards, and which countries. So after we decide 00:23:09.290 --> 00:23:18.500 all the inputs, we run a scheduler making sure that no client or server is 00:23:18.500 --> 00:23:26.160 part of two measurements at the same time, because they would mess up each other's detection. 00:23:26.160 --> 00:23:32.500 And then we use our analysis to call the case and summarize the 00:23:32.500 --> 00:23:39.191 results. I started by saying that the common methods have limitations, for 00:23:39.191 --> 00:23:45.370 example coverage, continuity and ethics. Well, when it comes to coverage, there are 00:23:45.370 --> 00:23:52.620 more than 22 million global IP_ID machines. These are Windows XP and its 00:23:52.620 --> 00:24:02.570 predecessors, and FreeBSD, for example. For comparison, 00:24:02.570 --> 00:24:07.910 one successful previous project is RIPE Atlas, and they have around 10,000 probes globally 00:24:07.910 --> 00:24:18.970 deployed. When it comes to continuity, we don't depend on the end user.
So it's much 00:24:18.970 --> 00:24:28.720 more reliable. Well, by not asking volunteers to help, we were already 00:24:28.720 --> 00:24:34.570 reducing the risk, because there are no users conspiring against their governments 00:24:34.570 --> 00:24:43.000 to collect this data. But our approach is not zero-risk either. If you look, you have a 00:24:43.000 --> 00:24:49.860 different kind of risk here: the client and server exchange SYN-ACKs and RSTs 00:24:49.860 --> 00:24:55.810 without either of them giving consent. And we don't want to ask for consent, because 00:24:55.810 --> 00:25:01.020 if we do, the dilemma comes back: it's just the same as 00:25:01.020 --> 00:25:06.850 asking volunteers. So, to cope with that and reduce the risk 00:25:06.850 --> 00:25:15.380 further, we don't use end-user IPs. We actually use routers two hops back, which with high 00:25:15.380 --> 00:25:21.650 probability are infrastructure machines, and use those as vantage points. 00:25:21.650 --> 00:25:31.486 Even under this harsh constraint we still have 53,000 global IP_ID routers. To test 00:25:31.486 --> 00:25:38.780 whether the Augur framework works, we chose 2,000 of these global IP_ID 00:25:38.780 --> 00:25:45.350 machines, uniformly selected from all the countries where we had vantage points. We 00:25:45.350 --> 00:25:52.549 selected websites from the Citizen Lab test list. Citizen Lab is a research 00:25:52.549 --> 00:25:57.710 organization at the University of Toronto, and they crowdsource websites that are 00:25:57.710 --> 00:26:03.070 potentially blocked or potentially sensitive. And then we used thousands of 00:26:03.070 --> 00:26:09.640 websites from the Alexa top 10k. And then we got Augur running for 17 days and 00:26:09.640 --> 00:26:17.050 collected this data. One of the challenges we had in validating Augur was: 00:26:17.050 --> 00:26:22.940 so, what is the truth? What is the ground truth? What would we see that makes sense?
00:26:22.940 --> 00:26:26.270 And this is the biggest and most fundamental challenge for internet 00:26:26.270 --> 00:26:33.570 censorship measurement anyway. So the first approach is leaning on intuition, which is 00:26:33.570 --> 00:26:40.049 that no client should show blocking towards all the websites. No server should 00:26:40.049 --> 00:26:45.740 show blocking for the bulk of our clients. And if anything like that happens, we just 00:26:45.740 --> 00:26:51.960 trash it. And we should see more bias towards the sensitive domains versus the 00:26:51.960 --> 00:27:01.559 ones that are popular. And so on. And we also hoped to replicate the anecdotes, the 00:27:01.559 --> 00:27:08.870 reports out there. And we did all of those. And that's how we validated Augur. 00:27:08.870 --> 00:27:17.690 So in the end, Augur is a system that is scalable, efficient and ethical, and can be 00:27:17.690 --> 00:27:24.630 used to detect TCP/IP blocking continuously. Yes, I know that is just 00:27:24.630 --> 00:27:32.310 TCP/IP. What about the other layers? Can we measure them remotely as well? Well, 00:27:32.310 --> 00:27:40.090 let me focus on DNS. You might ask: is there a way we can remotely detect 00:27:40.090 --> 00:27:46.890 DNS poisoning or manipulation? Well, let's think it out loud. From now on I'm gonna 00:27:46.890 --> 00:27:54.370 give just the highlights of the papers we worked on, for lack of time. Well, if we 00:27:54.370 --> 00:28:06.070 scan the whole IPv4 space, we find a lot of open DNS resolvers, which means that they are 00:28:06.070 --> 00:28:14.929 open to anybody sending them a query to resolve. And these open DNS resolvers can 00:28:14.929 --> 00:28:22.590 be used as vantage points. We can use open DNS resolvers in different ISPs 00:28:22.590 --> 00:28:29.830 around the world to see whether DNS queries are poisoned or not. Well, wait. 00:28:29.830 --> 00:28:35.419 We need to make sure that they don't belong to end users.
So we came up with 00:28:35.419 --> 00:28:42.760 a lot of checks to make sure that these open DNS resolvers are organizational, 00:28:42.760 --> 00:28:50.610 belonging to the ISP or to infrastructure. After we do that, we start sending all 00:28:50.610 --> 00:28:57.980 our queries to these open DNS resolvers, let's say in an ISP in Bahrain, for all 00:28:57.980 --> 00:29:03.929 the domains we're interested in, and capture which IPs we receive. The 00:29:03.929 --> 00:29:11.390 challenge then is to detect which answers are wrong. And so we had to come up 00:29:11.390 --> 00:29:19.760 with a set of heuristics. For example: is the response that 00:29:19.760 --> 00:29:28.610 we received equal to the reply we got from our control measurements, where 00:29:28.610 --> 00:29:36.500 we know the IP is not blocked or poisoned and the content is there? Or we 00:29:36.500 --> 00:29:42.060 can look at the IP that we received and see whether it has a valid 00:29:42.060 --> 00:29:50.850 HTTPS certificate, with or without SNI, Server Name Indication. 00:29:50.850 --> 00:29:55.720 And so on and so forth. So we came up with lots of heuristics to detect wrong 00:29:55.720 --> 00:30:06.840 answers. The result of all these efforts ended up being a project called 00:30:06.840 --> 00:30:12.210 "Satellite", which was started by Will Scott. I'm sure he is in the audience 00:30:12.210 --> 00:30:16.809 somewhere. A great friend of mine and a very good supporter of CensoredPlanet. 00:30:16.809 --> 00:30:24.000 Selflessly. It has been a miracle that I had the opportunity and fortune to meet 00:30:24.000 --> 00:30:31.890 him. So we have Satellite. Satellite automates all the steps that I told you about. For this 00:30:31.890 --> 00:30:37.400 work we use techniques developed in both of those works. We call it Satellite because 00:30:37.400 --> 00:30:46.421 of seniority, sticking with the older name. So how much coverage does Satellite have?
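Two of the checks just described (keeping only organizational resolvers, and deciding whether an answer is wrong by comparing it with a control resolution or by asking the returned IP for a valid certificate) can be sketched as follows. This is a hypothetical sketch, not Satellite's code: the PTR pattern is an invented stand-in, since the actual expression from the talk's slide is not in the transcript; `has_valid_cert` only covers the with-SNI variant; and the function names are illustrative.

```python
import re
import socket
import ssl

# Hypothetical stand-in for the PTR-record expression shown in the talk:
# keep resolvers whose reverse name looks like a name server, e.g. ns1.isp.example.
NAMESERVER_PTR = re.compile(r"^(ns|dns|resolver|cache)[0-9-]*\.", re.IGNORECASE)


def looks_like_nameserver(ptr_name):
    """Organizational filter: True if the PTR name matches the pattern above."""
    return bool(NAMESERVER_PTR.match(ptr_name))


def has_valid_cert(ip, domain, timeout=5.0):
    """Heuristic: does `ip` serve a certificate valid for `domain` (sent as SNI)?"""
    ctx = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((ip, 443), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=domain):
                return True
    except OSError:  # covers ssl.SSLError, certificate errors and timeouts
        return False


def classify_answer(domain, answer_ips, control_ips, cert_check=has_valid_cert):
    """Label one resolver's answer for `domain` as "consistent" or "suspect"."""
    if set(answer_ips) & set(control_ips):
        return "consistent"   # overlaps what our trusted control resolution saw
    if any(cert_check(ip, domain) for ip in answer_ips):
        return "consistent"   # the returned IP can prove it hosts the domain
    return "suspect"          # candidate manipulation; needs further checks


# Offline example with a stubbed certificate check (no network access):
print(classify_answer("example.com", ["198.51.100.1"], ["192.0.2.10"],
                      cert_check=lambda ip, d: False))  # prints suspect
```

Passing the certificate check in as a parameter keeps the classification logic testable offline; a CDN-hosted domain whose answers legitimately differ by region is the usual reason the certificate heuristic is needed alongside the control comparison.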
If you 00:30:46.421 --> 00:30:54.880 scan IPv4, you end up with 4.2 million open DNS resolvers in every country and 00:30:54.880 --> 00:31:01.079 territory. We actually need to make sure this is 00:31:01.079 --> 00:31:08.950 ethical, and for that reason we put a harsh condition on it. We say: let's only use the 00:31:08.950 --> 00:31:17.710 ones whose valid PTR record follows this expression. Basically, let's 00:31:17.710 --> 00:31:23.200 just use the open DNS resolvers that are name servers, or at least whose PTR record 00:31:23.200 --> 00:31:29.920 suggests that. This is a really harsh constraint. Actually, my students have 00:31:29.920 --> 00:31:34.430 been adding more and more regular expressions for the ones that we are sure 00:31:34.430 --> 00:31:42.610 are organizational. But for now, even being this harsh, we have 40k DNS 00:31:42.610 --> 00:31:56.830 resolvers in almost 169 countries, I guess. So censorship happens at other layers as 00:31:56.830 --> 00:32:00.700 well. How do we deal with that remotely, with a remote side 00:32:00.700 --> 00:32:12.520 channel? And especially, what about HTTP traffic, or disruption that can happen 00:32:12.520 --> 00:32:29.809 to, you know, TLS? I hate water. Oh no. Okay. So. 00:32:29.809 --> 00:32:38.220 scratching noise So, it's well documented that many DPIs, especially the Great Firewall of China, monitor 00:32:38.220 --> 00:32:43.930 the traffic, and when they see a keyword, a sensitive keyword like "Falun Gong", 00:32:43.930 --> 00:32:50.350 they act and drop the traffic or send a RST. And as I mentioned earlier, there is 00:32:50.350 --> 00:32:57.330 enough cleartext everywhere. Even in TLS handshakes the SNI is in cleartext. And for a 00:32:57.330 --> 00:33:03.590 long time I was trying to come up with a way of detecting application-layer blocking using 00:33:03.590 --> 00:33:09.320 this fancy side channel.
Like, how can I detect it when the client and server 00:33:09.320 --> 00:33:14.630 first need to establish a TCP handshake; how can the side channel jump in and then 00:33:14.630 --> 00:33:22.720 detect the rest? We were lucky enough to end up with a protocol called 00:33:22.720 --> 00:33:32.900 "Echo". It's a protocol designed in 1983 for testing; 00:33:32.900 --> 00:33:41.140 it is a debugging tool, basically. It's a predecessor to ping. And basically, 00:33:41.140 --> 00:33:50.120 after you establish a TCP handshake to port 7, whatever you send to an Echo server 00:33:50.120 --> 00:33:57.290 on port 7, it's gonna echo back. Now think about it. How can we use Echo 00:33:57.290 --> 00:34:04.570 servers to detect application-layer blocking? Well, when there is no censorship, 00:34:04.570 --> 00:34:08.490 let's say I have an Echo server in the U.S. and a measurement machine at 00:34:08.490 --> 00:34:13.890 the University of Michigan: I establish a TCP handshake and I send a GET request 00:34:13.890 --> 00:34:19.190 containing a censored keyword, for example. It's gonna send back to me the 00:34:19.190 --> 00:34:28.269 same thing I sent. But now let's put in a DPI that is gonna be triggered by it. 00:34:28.269 --> 00:34:37.150 Well, then, for sure, either I'm going to receive a RST first or something else. So 00:34:37.150 --> 00:34:43.609 we can come up with an algorithm that uses Echo servers to detect 00:34:43.609 --> 00:34:47.969 disruptions at the application layer: basically keyword blocking, URL 00:34:47.969 --> 00:34:58.530 blocking. The result of this is a tool called Quack. And Quack uses Echo 00:34:58.530 --> 00:35:06.470 servers to detect, in a scalable way, whether keywords are 00:35:06.470 --> 00:35:14.380 being blocked around the world. So what we did is first scan the whole IPv4 space. We 00:35:14.380 --> 00:35:22.910 found 47k Echo servers running around the world.
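The Echo-based test just described boils down to comparing what was sent with what came back. The following is a minimal sketch, not the Quack implementation: `echo_roundtrip` and `quack_verdict` are hypothetical names, the benign control payload is an assumed design choice for ruling out broken servers, and the retries and follow-up probes a real tool would need are omitted.

```python
import socket


def echo_roundtrip(server, payload, port=7, timeout=5.0):
    """Send `payload` to an RFC 862 Echo server and return the bytes echoed back.

    Returns None if the connection is reset, refused or times out.
    """
    try:
        with socket.create_connection((server, port), timeout=timeout) as sock:
            sock.sendall(payload)
            received = b""
            while len(received) < len(payload):
                chunk = sock.recv(4096)
                if not chunk:            # server closed the connection early
                    break
                received += chunk
            return received
    except OSError:                      # includes ConnectionResetError, timeouts
        return None


def quack_verdict(control_sent, control_echo, test_sent, test_echo):
    """Compare a harmless control payload with a test payload holding a keyword."""
    if control_echo != control_sent:
        return "inconclusive"            # server cannot even echo benign data
    if test_echo == test_sent:
        return "no interference"
    return "interference"                # RST, truncation or altered bytes


control = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
keyword_req = b"GET / HTTP/1.1\r\nHost: blocked.example\r\n\r\n"
# Against a hypothetical echo server at 192.0.2.7 this would be:
#   quack_verdict(control, echo_roundtrip("192.0.2.7", control),
#                 keyword_req, echo_roundtrip("192.0.2.7", keyword_req))
print(quack_verdict(control, control, keyword_req, None))  # prints interference
```

Sending the control first separates "this server is flaky" from "something on the path reacted to the keyword", which is the distinction the talk relies on.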
Then we need to be able to check 00:35:22.910 --> 00:35:27.270 whether or not they belong to end users. And that was a very challenging 00:35:27.270 --> 00:35:36.530 part because there is no clear signal. Around 90 percent of them are 00:35:36.530 --> 00:35:40.730 infrastructure, but there is still some portion of them that we don't know about. So 00:35:40.730 --> 00:35:46.610 what we do is look at the Freedom House reports and the countries that are 00:35:46.610 --> 00:35:52.931 rated "not free" or "partially free", as they're called. This 00:35:52.931 --> 00:35:58.720 is around 50 countries. And for those, we 00:35:58.720 --> 00:36:05.460 randomly select some of the servers and use Nmap's OS detection, which 00:36:05.460 --> 00:36:15.750 tells us whether it's a server, a switch, and so on. We use those. So with 00:36:15.750 --> 00:36:23.010 the help of so many collaborators, after almost six years we ended up with three 00:36:23.010 --> 00:36:32.420 systems that can capture TCP/IP blocking, DNS blocking, and application-layer blocking using 00:36:32.420 --> 00:36:43.480 infrastructure and organizational machines. So while it was a dream, 00:36:43.480 --> 00:36:47.810 or a vision, that we could come up with a better map and collect this data in a 00:36:47.810 --> 00:36:56.020 continuous way, thanks to the help of a lot of people, especially my students, Will, and 00:36:56.020 --> 00:37:02.060 other collaborators, we now have CensoredPlanet. CensoredPlanet collects 00:37:02.060 --> 00:37:09.020 semi-weekly snapshots of Internet censorship using our vantage points at all 00:37:09.020 --> 00:37:18.090 the layers, and now provides this data in raw format on our website. We also 00:37:18.090 --> 00:37:24.531 provide some visualizations for people to see how many vantage points 00:37:24.531 --> 00:37:29.560 we have in each country and so on.
Of course, this is the beginning of 00:37:29.560 --> 00:37:34.160 CensoredPlanet. We launched it in August, we have been collecting data for 00:37:34.160 --> 00:37:39.880 almost four months, and we have a long way to go. Right now we have users, through 00:37:39.880 --> 00:37:45.130 organizations, using our data and helping us debug by pointing out things that don't 00:37:45.130 --> 00:37:51.950 make sense. Any of you who end up using this data, please 00:37:51.950 --> 00:37:56.930 share your feedback with us; we are very responsive about changing things, 00:37:56.930 --> 00:38:03.940 though maybe not as much as you need. We have a collective of very dedicated people 00:38:03.940 --> 00:38:10.940 participating. So, now that we have CensoredPlanet, let me show you how it can 00:38:10.940 --> 00:38:19.349 help when there is a political situation going on. You all must remember that in 00:38:19.349 --> 00:38:25.410 October Jamal Khashoggi, a Washington Post journalist, disappeared and was 00:38:25.410 --> 00:38:34.530 killed at the Saudi Arabian consulate in Turkey. At the time this was happening 00:38:34.530 --> 00:38:40.540 there was a lot of media attention, and the news, especially two weeks in, 00:38:40.540 --> 00:38:46.980 spread very widely internationally. CensoredPlanet didn't know this event was 00:38:46.980 --> 00:38:52.750 going to happen; we had simply been collecting data semi-weekly for 2,000 00:38:52.750 --> 00:38:57.660 domains or so. So we went back and checked Saudi Arabia. Did we see 00:38:57.660 --> 00:39:04.830 anything interesting? And yes, we saw, for example, that two weeks in, around October 00:39:04.830 --> 00:39:12.680 16, for the domains in the news and media categories, the 00:39:12.680 --> 00:39:18.500 censorship related to those doubled.
And let me emphasize: we didn't see a 00:39:18.500 --> 00:39:23.440 block-or-no-block across the whole country; not all countries have homogeneous 00:39:23.440 --> 00:39:28.430 censorship. We saw it in multiple ISPs where we had vantage 00:39:28.430 --> 00:39:34.770 points. Actually, I freaked out when one of the activists in Saudi Arabia told us, 00:39:34.770 --> 00:39:41.869 "I don't see this". And we asked, "What ISP are you on?" And it wasn't one of the ISPs that 00:39:41.869 --> 00:39:49.160 we had vantage points in. So we were looking for hints: "Is anybody else 00:39:49.160 --> 00:39:55.720 seeing what we are seeing?" And we found that there was a commander 00:39:55.720 --> 00:40:03.560 lab project that also saw, around October 16, the number of malwares, or whatever they 00:40:03.560 --> 00:40:10.220 are testing, double or triple. I don't remember which. So something was 00:40:10.220 --> 00:40:17.180 going on two weeks in, when the news broke. Let me emphasize that the news media I am 00:40:17.180 --> 00:40:22.300 talking about are the global news media that we check, like the L.A. Times, Fox News, 00:40:22.300 --> 00:40:30.970 and so on. But we also checked Arab News, which, as the activists told us, is a 00:40:30.970 --> 00:40:38.490 Saudi Arabian propaganda newspaper, and in one of the ISPs it was being poisoned. So 00:40:38.490 --> 00:40:49.910 again, censorship measurement is a very complex problem. So where are we heading? 00:40:49.910 --> 00:40:55.580 Well, having said all that about side channels and the techniques that help us remotely 00:40:55.580 --> 00:41:01.900 collect this data, I also have to say that the data we collect doesn't give the whole 00:41:01.900 --> 00:41:06.950 picture of internet censorship. I mean, having root access on a volunteer's 00:41:06.950 --> 00:41:17.641 machine to do a detailed test is powerful.
So as the next step, in the next year, one 00:41:17.641 --> 00:41:27.720 of our goals is to join forces with OONI to integrate the data from remote and 00:41:27.720 --> 00:41:37.800 local measurements, to provide the best of both worlds. Also, we have 00:41:37.800 --> 00:41:43.990 been thinking a lot about what good visualization tools would be, ones that don't end 00:41:43.990 --> 00:41:51.391 up misrepresenting internet censorship. I literally hate that kind of map. Hate it. The 00:41:51.391 --> 00:41:56.860 number of vantage points in each country is not equal. We don't know whether all the 00:41:56.860 --> 00:42:00.980 vantage points the data came from are in one ISP or in all of the 00:42:00.980 --> 00:42:08.109 ISPs. And then we test domains that are chosen, I don't know, 00:42:08.109 --> 00:42:13.650 based on some Western values of freedom of expression. I believe in all of 00:42:13.650 --> 00:42:19.330 them, but still, culture and economy might play a role. And then we put colors on 00:42:19.330 --> 00:42:25.030 the map, rank the countries, call some countries awful, and don't give full 00:42:25.030 --> 00:42:30.849 attention to the others. So something needs to change, and thinking about it more deeply is on our 00:42:30.849 --> 00:42:37.700 horizon too. We want to have more statistical 00:42:37.700 --> 00:42:44.320 tools to spot when patterns change. We want to be able to compare 00:42:44.320 --> 00:42:49.580 countries when, for example, Telegram was being blocked in Russia. If you remember, 00:42:49.580 --> 00:42:54.910 millions of IPs were blocked. If you don't, go to my friend Leonid's talk 00:42:54.910 --> 00:43:00.020 about Russia. You're going to learn a lot there. But anyway.
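The kind of statistical tool mentioned here, one that spots when a pattern changes (like the doubling in news and media blocking seen in Saudi Arabia around October 16), could start as simply as comparing each snapshot to a trailing baseline. This is a minimal sketch, not a CensoredPlanet tool: the function name, the window, the factor, and the floor are assumed values, and a real analysis would need proper statistical testing.

```python
from statistics import mean

def flag_changes(rates, window=4, factor=2.0, min_rate=0.01):
    """Return indices of snapshots whose disruption rate is at least `factor`
    times the mean of the previous `window` snapshots.

    rates: fraction of tests showing disruption, one value per snapshot, in time order.
    window, factor, min_rate: illustrative assumptions, not published thresholds.
    """
    flagged = []
    for i in range(window, len(rates)):
        baseline = mean(rates[i - window:i])
        # Floor the baseline so that a jump from ~0 to a tiny rate is not flagged.
        if rates[i] >= factor * max(baseline, min_rate):
            flagged.append(i)
    return flagged
```

For example, with the synthetic rates `[0.05, 0.05, 0.06, 0.04, 0.12]`, `flag_changes` returns `[4]`, because 0.12 is more than twice the trailing mean of 0.05.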
So when Russia was 00:43:00.020 --> 00:43:06.520 blocking Telegram, I said to everyone, I bet that soon some other 00:43:06.520 --> 00:43:10.370 governments are going to jump in and block Telegram as well. And that's actually what 00:43:10.370 --> 00:43:15.320 we heard, rumors like that. So we need to be able to detect that automatically. And 00:43:15.320 --> 00:43:26.470 overall, I want to develop an empirical science of internet censorship 00:43:26.470 --> 00:43:36.720 based on rich data, with the help of all of you. CensoredPlanet is now being 00:43:36.720 --> 00:43:43.370 maintained by a group of dedicated students, great friends of mine, and it 00:43:43.370 --> 00:43:49.960 needs engineers and political scientists to jump on our data and help us bring 00:43:49.960 --> 00:43:57.320 meaning to what we are collecting. So if you are a good engineer or a political 00:43:57.320 --> 00:44:07.250 scientist or a dedicated person who wants to change the world, reach out to me. As 00:44:07.250 --> 00:44:11.500 a reference for those of you who are interested: these are the publications 00:44:11.500 --> 00:44:19.720 that my talk was based on. And now I am open to questions. 00:44:19.720 --> 00:44:26.180 applause 00:44:26.180 --> 00:44:31.440 Herald: All right, perfect. Thank you so much, Roya. We have some time for 00:44:31.440 --> 00:44:35.500 questions, so if you have a question in the room, please go to one of the room 00:44:35.500 --> 00:44:40.100 microphones, one, two, three, four, and five in the very back. And if you're 00:44:40.100 --> 00:44:44.490 watching the stream, you can ask questions to the signal angel via IRC or Twitter, and 00:44:44.490 --> 00:44:49.360 we'll make sure to relay those to the speaker and get them asked. So 00:44:49.360 --> 00:44:52.040 let's go ahead and start with Mic two, please. 00:44:52.040 --> 00:44:57.349 Question: Hey, great talk.
Do you worry that by publishing your methods as well as 00:44:57.349 --> 00:45:02.690 your data, you're going to get a response from governments that are 00:45:02.690 --> 00:45:05.869 censoring things that makes it more difficult for you to monitor what's 00:45:05.869 --> 00:45:08.680 being censored? Or has that already happened? 00:45:08.680 --> 00:45:14.630 Roya: It hasn't happened. We have control measures to detect that. But 00:45:14.630 --> 00:45:19.260 it's a really good question, and it often comes up after I 00:45:19.260 --> 00:45:25.490 present. I can tell you, based on my experience, it's really hard to synchronize 00:45:25.490 --> 00:45:31.490 all the ISPs in all the countries to react to the SYN-ACKs and RSTs that I'm sending. 00:45:31.490 --> 00:45:36.150 For example, for Augur, these are unsolicited packets, and if governments 00:45:36.150 --> 00:45:41.850 block them there is going to be a lot of collateral damage. You might say, 00:45:41.850 --> 00:45:45.610 well, Roya, they're going to block the IP of the University of Michigan, or of the 00:45:45.610 --> 00:45:50.770 spoofing machine. We have a measure for that: I have multiple places where I 00:45:50.770 --> 00:45:56.190 have a backup in case that happens. But overall, this is a 00:45:56.190 --> 00:46:02.800 global-scale measurement, and even in one city, or across multiple ISPs, it's really 00:46:02.800 --> 00:46:06.920 hard to synchronize blocking something and maintain it. So it is 00:46:06.920 --> 00:46:13.630 something that's on our minds. But as of now it's not a worry. 00:46:13.630 --> 00:46:16.470 Herald: All right, then let's go over to Mic one. 00:46:16.470 --> 00:46:20.510 Question: Thank you. It's kind of similar to the last question. What if you 00:46:20.510 --> 00:46:24.920 are measuring from a country that is blocking?
Do you also distribute the 00:46:24.920 --> 00:46:29.970 measurements over several countries? Roya: Absolutely. Every snapshot that we 00:46:29.970 --> 00:46:37.280 collect uses all the vantage points we have in certain countries, and a portion 00:46:37.280 --> 00:46:42.100 of the vantage points in places like China or the US, because they have millions of vantage 00:46:42.100 --> 00:46:46.220 points, or at least thousands. So basically, in each snapshot, 00:46:46.220 --> 00:46:52.340 which takes us three days, we collect data from all of the vantage points. 00:46:52.340 --> 00:46:57.580 And let's say that somebody is reacting to us: we have a benign domain that we 00:46:57.580 --> 00:47:03.250 check as well, for example, example.com or random.com. So if we see 00:47:03.250 --> 00:47:09.380 something going on there, we double-check. But it's a good point, because right now 00:47:09.380 --> 00:47:14.720 our effort involves a lot of manual labor and we're trying to automate everything, so it's 00:47:14.720 --> 00:47:18.900 still a challenge. Thank you. Herald: All right, then let's go to Mic 00:47:18.900 --> 00:47:22.859 three. Question: Hi. Have you measured how much 00:47:22.859 --> 00:47:28.140 IP-ID randomization breaks your probes? 00:47:28.140 --> 00:47:35.349 Roya: Oh. This is also a really good one. Let me give a shout-out to [name]. He's the guy who 00:47:35.349 --> 00:47:45.990 in 1998 discovered the IP-ID side channel, or published something that I ended up reading. So, 00:47:45.990 --> 00:47:54.440 for example, Linux and Ubuntu randomize it in newer versions, but there are still 00:47:54.440 --> 00:47:59.421 legacy operating systems like Windows XP and its predecessors, and FreeBSD, 00:47:59.421 --> 00:48:04.750 that still have a global IP-ID. So one argument that often comes up is, what if 00:48:04.750 --> 00:48:09.339 all these machines get updated to new operating systems that don't 00:48:09.339 --> 00:48:13.780 maintain a global IP-ID?
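Augur depends on hosts that still use a global IP-ID counter. One way to screen candidate hosts is to sample the IP-ID of their replies several times and check that the values only move forward by small steps. The sketch below is a hypothetical heuristic, not Augur's code: the function name and the `max_step` threshold are assumptions, and it assumes the probes alternated between two source addresses, because a per-destination counter would otherwise also look sequential from a single source.

```python
IPID_MODULUS = 2 ** 16  # the IPv4 Identification field is 16 bits wide

def looks_like_global_ipid(ipids, max_step=2000):
    """Heuristic check for a globally incrementing IP-ID counter.

    ipids: IP-ID values from replies, in the order the probes were sent
           (assumed to alternate between two source addresses).
    max_step: assumed upper bound on how far the counter may advance between probes.
    """
    if len(ipids) < 2:
        return False  # not enough samples to judge
    # Differences modulo 2**16 so that wraparound (65535 -> 0) counts as a small step.
    steps = [(b - a) % IPID_MODULUS for a, b in zip(ipids, ipids[1:])]
    # A shared counter only moves forward, by roughly the host's traffic between probes;
    # randomized, constant, or per-destination IDs fail this check.
    return all(0 < s <= max_step for s in steps)
```

For instance, `[100, 105, 130, 131]` and the wrapping sequence `[65530, 2, 10]` pass, while `[40000, 1200, 55000]` does not.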
And I can tell you that, well, we'll come up with another 00:48:13.780 --> 00:48:20.129 side channel. For now, this one works. But my gut feeling is that if it didn't change 00:48:20.129 --> 00:48:25.230 from 1998 until now, with everybody saying that a global IP-ID 00:48:25.230 --> 00:48:30.440 counter is a horrible idea, it's not going to change in the coming five years, so 00:48:30.440 --> 00:48:33.230 we're good. Question: Thank you. 00:48:33.230 --> 00:48:36.520 Herald: Okay, then let's move on to Mic four. 00:48:36.520 --> 00:48:41.480 Question: Thank you very much for the great talk. When you were introducing 00:48:41.480 --> 00:48:46.910 Augur, I was wondering: does detecting blocking between client and server 00:48:46.910 --> 00:48:52.190 necessarily indicate censorship? Because you were talking about validating 00:48:52.190 --> 00:48:59.130 Augur, I was wondering: if it turns out that there is a false alarm, what do you 00:48:59.130 --> 00:49:04.530 think could be the potential cause? Roya: You're absolutely right. And I tried 00:49:04.530 --> 00:49:11.630 to emphasize that what we end up collecting should be seen as a disruption: 00:49:11.630 --> 00:49:17.200 something didn't work, the SYN-ACK or RST got disrupted. That could be 00:49:17.200 --> 00:49:22.250 censorship, or it could be a random packet drop. And the way to 00:49:22.250 --> 00:49:28.290 establish confidence is to aggregate the results. Do we see this 00:49:28.290 --> 00:49:33.670 blocking across multiple routers within that country or within that AS? 00:49:33.670 --> 00:49:38.880 Because if one of them got dropped by accident, or something just didn't make sense, 00:49:38.880 --> 00:49:43.900 what about the others?
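The aggregation step described in this answer, together with the benign control domain mentioned earlier in the Q&A, can be sketched as a per-AS vote. This is an illustrative sketch, not CensoredPlanet's analysis pipeline: the record fields, the thresholds, and the function name are assumptions.

```python
from collections import defaultdict

def aggregate(results, min_vantage_points=3, min_fraction=0.5):
    """Decide per (AS, domain) whether disruption is consistent enough to report.

    results: iterable of dicts with keys 'asn', 'domain', 'disrupted' (bool) and
             'control_ok' (bool: did the benign control domain, e.g. example.com,
             succeed from the same vantage point?).
    min_vantage_points, min_fraction: illustrative thresholds, not published values.
    """
    counts = defaultdict(lambda: [0, 0])  # (asn, domain) -> [disrupted, total]
    for r in results:
        if not r["control_ok"]:
            continue  # the vantage point itself looks broken, or is reacting to us
        key = (r["asn"], r["domain"])
        counts[key][0] += int(r["disrupted"])
        counts[key][1] += 1
    # Report only keys with enough independent observations; a single drop
    # from one router is not enough to call it blocking.
    return {
        key: disrupted / total >= min_fraction
        for key, (disrupted, total) in counts.items()
        if total >= min_vantage_points
    }
```

Requiring several vantage points per AS trades coverage for confidence: an AS with only one usable vantage point simply produces no verdict instead of a possibly spurious one.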
So that's the whole idea, and this is another point that 00:49:43.900 --> 00:49:50.390 I'm so, so concerned about: most of the reports and anecdotes that we read are based 00:49:50.390 --> 00:49:55.869 on one VPN or one vantage point in the country, and then a lot 00:49:55.869 --> 00:50:00.770 of conclusions are drawn from that. And you can often argue that this vantage point might 00:50:00.770 --> 00:50:05.640 be subject to many things other than government censorship. Also, I 00:50:05.640 --> 00:50:11.980 emphasized that the censorship I mean in this talk is any action that stops 00:50:11.980 --> 00:50:17.180 users from getting to the requested content. I'm trying to get away from 00:50:17.180 --> 00:50:23.480 semantics where intention is implied. But great question. 00:50:23.480 --> 00:50:26.240 Herald: All right, then let's go back to Mic one. 00:50:26.240 --> 00:50:29.740 Question: Hi Roya. You mentioned that you have a team of students working on all of 00:50:29.740 --> 00:50:33.890 these frameworks. I was wondering whether your frameworks are open source or available 00:50:33.890 --> 00:50:37.760 online for collaboration, and if so, where those resources would be? 00:50:37.760 --> 00:50:45.040 Roya: So the data is open. The code hasn't been. One reason is that I have very low 00:50:45.040 --> 00:50:49.090 confidence in sharing code. I'm friends with Philipp Winter and Dave Fifield; 00:50:49.090 --> 00:50:54.170 these people are pro open source and they constantly blame me for not sharing. But it really 00:50:54.170 --> 00:51:00.721 requires confidence to share code. So we are working on that, at least for Quack. I 00:51:00.721 --> 00:51:06.390 think that code can be shared very easily. For Augur, we spent a heck of a lot 00:51:06.390 --> 00:51:12.109 of time making the code production-ready, and for Satellite I think it is also 00:51:12.109 --> 00:51:17.420 ready.
I can share them with you personally, but before sharing them with the world I want 00:51:17.420 --> 00:51:21.560 to have another person audit the code and make sure we're not using a curse word 00:51:21.560 --> 00:51:26.420 or something. I don't know. It's just me being a little bit 00:51:26.420 --> 00:51:31.030 conservative. But I'm happy to: if you send me an e-mail, I'll send you the code. 00:51:31.030 --> 00:51:35.640 Question: Thank you. Herald: All right, then let's move to Mic two. 00:51:35.640 --> 00:51:39.930 Question: Thanks again for sharing your great vision. I find it really 00:51:39.930 --> 00:51:47.470 fascinating. I'm not really a data scientist, but my question is: did you find 00:51:47.470 --> 00:51:56.099 any usefulness for your approaches in the spread of the Internet of Things? I 00:51:56.099 --> 00:52:06.960 understood that you used routers to make queries, but did you send to, and maybe receive 00:52:06.960 --> 00:52:11.260 data back from, washing machines, toasters...? 00:52:11.260 --> 00:52:17.480 Roya: I mean, I know being ethical and trying not to use end-user machines limits 00:52:17.480 --> 00:52:22.589 your access a lot. But that's our goal. We are going to stick with 00:52:22.589 --> 00:52:28.240 things that don't belong to end users. So it's all routers and organizational 00:52:28.240 --> 00:52:31.940 machines. I want to make sure that whatever we're using belongs to an 00:52:31.940 --> 00:52:35.349 entity that can protect itself if something goes wrong. They can just say, 00:52:35.349 --> 00:52:39.640 "Hey, this is a freaking router, it receives and sends so many things. I mean, 00:52:39.640 --> 00:52:44.740 look, let me show you a TCP (?), for example." A volunteer might not be able 00:52:44.740 --> 00:52:49.290 to defend that, because they're already conspiring in collecting this data. But 00:52:49.290 --> 00:52:53.550 good question. I wish I could, but I won't cross that line.
00:52:53.550 --> 00:52:57.380 Herald: All right. I don't see any more questions in the room right now, but we 00:52:57.380 --> 00:53:01.080 have one from the internet, so please, signal angel. 00:53:01.080 --> 00:53:06.510 Signal Angel: Yes. A question from koli585: I was in an African 00:53:06.510 --> 00:53:10.009 country where the internet had been completely shut down. How can I quickly 00:53:10.009 --> 00:53:14.709 and safely inform others about the shutdown? 00:53:14.709 --> 00:53:21.470 Roya: So while I think local users' voices are highly, highly needed, and they can use 00:53:21.470 --> 00:53:27.510 social media like Twitter to say whatever they want, there is a project called IODA. 00:53:27.510 --> 00:53:36.869 It's a project at CAIDA, at UC San Diego in the U.S., and Philipp Winter, Alberto 00:53:36.869 --> 00:53:43.160 [Dainotti] and Alistair [King] are working on it. They basically keep track of shutdowns remotely 00:53:43.160 --> 00:53:51.540 and push them out. If you look at IODA on Twitter, you can 00:53:51.540 --> 00:54:02.620 see their live feed of where the shutdowns happen. I haven't 00:54:02.620 --> 00:54:09.260 thought about how to reach users and tell them what we see, or how we can 00:54:09.260 --> 00:54:18.609 incorporate users' feedback. We are working with a group of researchers who 00:54:18.609 --> 00:54:27.000 have already developed tools to collect this data from Twitter and use it 00:54:27.000 --> 00:54:31.890 as some level of ground truth, but OONI does such a great job that I haven't felt 00:54:31.890 --> 00:54:37.220 the need. Herald: All right. Unless the signal angel 00:54:37.220 --> 00:54:43.750 has another question? No? Roya: Let me, can I add one thing?
So 00:54:43.750 --> 00:54:52.940 I was listening to a talk about how sympathetic Iranians versus Arabs were 00:54:52.940 --> 00:55:01.040 towards the Boston bombing in the United States, and a lot of assumptions and a 00:55:01.040 --> 00:55:05.819 lot of conclusions were made, like, oh, I'm completely paraphrasing, I don't 00:55:05.819 --> 00:55:09.900 remember, but "Iranians don't care because they didn't tweet as much." So 00:55:09.900 --> 00:55:17.060 basically their input data was a bunch of tweets from around the time of the Boston bombing. 00:55:17.060 --> 00:55:21.599 After the talk was over I said: you know that in this country Twitter has been 00:55:21.599 --> 00:55:28.929 blocked, and so many people couldn't tweet. applause 00:55:28.929 --> 00:55:33.490 Herald: All right. That concludes our Q&A, so thanks so much, Roya. 00:55:33.490 --> 00:55:35.436 Roya: Thank you. 00:55:35.436 --> 00:55:41.150 applause 00:55:41.150 --> 00:55:45.970 postroll music 00:55:45.970 --> 00:56:04.000 Subtitles created by c3subtitles.de in the year 2020. Join, and help us!