35C3 preroll music
Herald Angel: All right. It's my very big pleasure to introduce Roya Ensafi to you. She's gonna talk about "Censored Planet: a Global Censorship Observatory". I'm personally very interested in learning more about this project. Sounds like it's gonna be very important. So please welcome Roya with a huge, warm round of applause. Thank you.
Applause
Roya: It's wonderful to finally make it to CCC. I had joint talks planned with several of my friends over the past years, and the visa never worked out. This year I applied for a visa for a conference in August, and that visa worked for coming to CCC. My name is Roya Ensafi and I'm a professor at the University of Michigan. My research focuses on security and privacy, with the goal of protecting users from adversarial networks. So basically I investigate network interference... and somebody is interfering right now. Damn it. What the heck. Cool, I'm good. Oh, no, I'm not.
laughter
OK. In my lab we develop techniques and systems to detect network interference, often at scale, apply these frameworks and tools to understand the behavior of the actors that do the interference, and use this understanding to come up with defenses. Today I'm going to talk about a project that is very dear to my heart, one that I have spent six years working on.
And in this talk I'm going to talk about censorship, internet censorship. By that I mean any action that prevents users' access to the requested content. We have heard of an alarming level of censorship happening all around the world. While previously only a few countries were capable of using deep packet inspection (DPI) to tamper with user traffic, thanks to the commercialization of DPI boxes many countries are now actually messing with users' data. From the moment users type CNN.com into their browsers, their traffic is subject to some level of interference by different actors. First, for example, the DNS query, which maps the domain to the IP where the content is, can be manipulated: the DNS answer can be a dead IP where the content is not there. If DNS succeeds, the user and the server establish a connection, the TCP handshake, and that can easily be blocked. If that succeeds, the user and the server start sending the actual data back and forth, and there is enough clear text, whether the traffic is encrypted or not, that the DPI can detect a sensitive keyword and send RST packets to both sides to basically shut down the connection. Before I forget, let me emphasize that it's not just governments and the policies they impose on ISPs that lead to censorship.
Actually, the server side, which provides the data, is also blocking users, especially if they are located in a region that doesn't provide any revenue. We recently investigated this issue of geoblocking in depth and provided more details about the role CDNs actually play. Now imagine how many users, how many ISPs, how many transit networks and how many websites we have, each of which has its own policies on how to block users' access. Moreover, censorship changes from time to time, region to region and country to country. For that reason many researchers, including me, have been interested in collecting data about censorship globally and continuously. Well, I grew up under severe censorship, be it from the university, the government or, more frustrating, the server side. And I genuinely believe that censorship takes away opportunities and degrades human dignity. It is not just China, Bahrain and Turkey that do internet censorship. Actually, as DPIs become cheaper and cheaper, many governments are following their lead. As a result the Internet is becoming more and more balkanized, and users around the world are soon going to have very, very different pictures of what the Internet is. We need to be able to collect the data to know what is being censored, how it's being censored, where it's being censored and for how long.
This data can then be used to bring transparency and accountability to governments or private companies that practice internet censorship. It can help us know where circumvention tools and defenses need to be deployed. It can let users around the world know what their governments are up to and, more importantly, provide valid and good data for policymakers to come up with good policies. Existing research already shows that if we can provide this data to users, they act of their own will to ensure Internet freedom. For many years my goal has been to come up with a weather map, a censorship weather map, where you can actually see changes in censorship over time and how some countries differ from others, and to do that continuously and all over the world. Creating such a map was impossible with the Internet measurement techniques we had at that time, and even with the common techniques we use now. The common way of measuring internet censorship is to deploy software, or give a customized Raspberry Pi, to either a client or a server, and based on that measure what's happening between clients and servers. Well, this approach has a lot of limitations. For example, there are not that many volunteers around the world who are eager to download software and run it.
Second, the data collected with this approach is often not continuous, because the user's connection can die for a variety of reasons, or users may lose interest in running the software. Therefore we end up with sparse data, from which we cannot establish a good baseline for internet censorship studies. Moreover, measuring sensitive domains often creates risks for the local collaborators, who might face retaliation from their government. These risks are not hypothetical. When the Arab Spring was happening, I was approached by many colleagues to recruit local friends and colleagues in the Middle East to collect measurement data at a time that was very interesting for capturing the behavior of the network, and most dangerous for the locals and volunteers collecting it. My painting actually expresses what I felt at the time. I just can't imagine asking people on the ground to help in these times of unrest. In my opinion, conspiring to collect data against the government's interest can be seen as an act of treason. And these governments are often unpredictable, so this would expose these volunteers to severe risk. While no one has yet been arrested for measuring internet censorship as far as we know, and I don't know how we could know that on a global scale, I think the clouds are on the horizon.
I'm still in awe of how the Turkish government used its surveillance data at the time of the coup attempt, and tracked down and detained hundreds of users because there was traffic between them and ByLock, a messenger app that was used by the coup organizers. These things happen. Before I continue: if you know OONI, you might ask how OONI deals with this risk. Well, with a great level of effort. And if you don't know OONI, OONI is a global community of volunteers that collect data about censorship around the world. First and foremost, they give their volunteers a very honest informed consent, telling them: "Hey, if you run this software, anybody who is monitoring your traffic knows what you're up to." They also go out of their way to give these volunteers the freedom to choose which websites they want to test and what data they want to push. And they establish great relationships with local activist organizations in the countries. Well, now that I've proven to you that I am a supporter of OONI, and I am actually friends with most of them, I want to emphasize that I still believe consistent, continuous and global data about censorship requires a new approach that doesn't need volunteers' help. I became obsessed with solving this problem. What if we could measure without any client, anywhere around the world, without being close to the client or the server, from somewhere like here, from the University of Michigan?
And see whether the two hosts can talk to each other, globally, remotely and off the path. When I talked to people about this, honestly, everybody was like, "You don't know what you're talking about, it's really, really challenging." Well, they were right. The challenge is there, and I'm going to walk you through it. We have at least 140 million IP addresses that respond to a SYN packet. This means they speak to the world, and they blindly follow the TCP/IP protocol. So the question becomes: how can I leverage subtle properties of TCP/IP to detect whether two hosts can talk to each other? Well, Spooky Scan is a technique that Jed Crandall from the University of New Mexico and I developed that uses TCP/IP side channels to detect whether two remote hosts can establish a TCP handshake or not and, if not, in which direction the packets are being dropped, off the path and remotely. Let me start telling you how this works. First I have to cover some background. Any connection based on TCP, one of the basic communication protocols we have, needs to establish a TCP handshake. So basically you send a SYN, and in the IP header of that packet there is a field called "identification", the IP_ID. This field is used for fragmentation, and I'm going to use it a lot in the rest of the talk.
After the other side receives the SYN, it sends a SYN-ACK back, with another IP_ID in it. Then, if I want to establish a connection, I send an ACK; otherwise I send a RESET (RST). One part of the protocol says that if you send a SYN-ACK packet to a machine, whether the port is open or closed, it's going to send you a RST, telling you "what the heck are you sending me a SYN-ACK for, I didn't send you a SYN." Another part says that if you send a SYN packet to a machine with the port open, eager to establish a connection, it will send you a SYN-ACK. If you don't do anything, because TCP is reliable, it's going to send you multiple SYN-ACKs; how many depends on the operating system, 3, 5, you name it. Spooky Scan requires some basic characteristics. First, the client, the vantage point we are interested in, should maintain a global variable for the IP_ID. That means whenever it sends a packet out, no matter who it's sending the packet to, the IP_ID is a shared resource that is incremented by one. So just by watching the IP_ID change you can see how noisy a machine is, how much traffic it is sending out. The server should have a port open, let's say 80 or 443, and be willing to establish a connection. And the measurement machine, me, should be able to spoof packets.
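To make the IP_ID side channel concrete, here is a minimal Python sketch (not the Spooky Scan code) of how consecutive IP_IDs read from a global-counter host translate into packet counts. It assumes each sample is the IP_ID of a RST that one of our own probes elicited and that the host uses a single 16-bit counter for all destinations; the function names are hypothetical.

```python
IPID_MODULUS = 1 << 16  # the IPv4 identification field is 16 bits wide


def ipid_deltas(samples):
    """Differences between consecutive IP_ID observations, allowing for 16-bit wraparound."""
    return [(b - a) % IPID_MODULUS for a, b in zip(samples, samples[1:])]


def background_rate(samples, interval_s):
    """Estimate how many packets per second the host sent to *other* destinations.

    Assumes every sample was elicited by one of our own probes (e.g. a SYN-ACK
    that triggers a RST), so one IP_ID per interval is our own reply and is
    subtracted before dividing by the probing interval.
    """
    return [(d - 1) / interval_s for d in ipid_deltas(samples)]


# Usage: IP_IDs read from RSTs elicited once per second, wrapping past 65535
samples = [65534, 65535, 1, 4]
print(ipid_deltas(samples))
print(background_rate(samples, 1.0))
```

A host whose deltas stay at 1 is quiet and makes a good vantage point; a host with large, erratic deltas is noisy, which is the noise problem the talk returns to below.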
That means sending packets whose source IP differs from my own machine's. To do that, you need to talk to your upstream network and ask them not to drop those packets. I could satisfy all of these requirements with a little bit of effort. A Spooky Scan starts with the measurement machine sending a SYN-ACK packet to one of these clients with a global IP_ID, at a time when, let's say, its value is 7000. Following the protocol, the client sends back a RST, revealing to me the value of its IP_ID. In the next step I send a spoofed SYN packet to the server, using the client's IP as the source. As a result, the server's SYN-ACK goes to the client. Again, the client sends a RST back, so its IP_ID is incremented by 1. The next time I query the IP_ID, I see a jump of 2. In a noiseless model, I know that this machine talked to the server. If I query it again, I won't see any extra jump. So the deltas are 2, 1. Now imagine there is a firewall that blocks SYN-ACKs going from the server to the client. Well, it doesn't matter how much traffic I send, it's not going to get there. So the deltas I see are 1, 1. In the third case, packets are dropped from the client to the server. Well, my spoofed SYN gets to the server, the server's SYN-ACK gets to the client, and the client sends the RST back, but it's not going to get to the server.
So the server thinks a packet got dropped, and it sends multiple SYN-ACKs. As a result, the client sends more RSTs, and the deltas I would see are, let's say, 2, 2. Let me put them all together. You have three cases: blocking in this direction, no blocking, and blocking in the other direction. And you see different jumps, different deltas, so it's detectable. Yes, yes, in a noiseless model. I know the clients talk to many other hosts, and the IP_ID changes for a variety of reasons. I call all of that noise, and this is how we deal with it. Intuitively, we can amplify the signal: instead of sending one spoofed SYN packet, we can send n. And packets can get dropped for a variety of reasons, so we need to repeat the measurement. Here is some data from a Spooky Scan where I used the following probing method. For 30 seconds I only sent IP_ID queries, and then for another 30 seconds I sent 5 spoofed SYN packets. These are clients in Azerbaijan, China and the United States, and we wanted to check whether they could reach a Tor relay we had in Sweden. You can see different jumps, or different level shifts, in the second phase. And just by looking at it visually, or by using an autoregressive moving average (ARMA) model, you can actually detect that.
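The three noiseless cases can be written down as a toy model. This is an illustration, not the authors' implementation: `simulate_deltas` and `classify` are hypothetical names, and the number of SYN-ACK retransmissions that land in the second interval is an assumed parameter, chosen so that n = 1 reproduces the 2,1 / 1,1 / 2,2 pattern from the talk.

```python
from enum import Enum


class Case(Enum):
    NO_BLOCKING = "no blocking"
    SERVER_TO_CLIENT = "SYN-ACKs dropped on the way to the client"
    CLIENT_TO_SERVER = "RSTs dropped on the way to the server"


def simulate_deltas(case, n_spoofed=1, retransmits_in_window=1):
    """Noiseless toy model of the two IP_ID deltas Spooky Scan observes.

    Interval 1: we send n spoofed SYNs (source = client) to the server.
    Interval 2: we send nothing. Each delta includes 1 for the RST that answers
    our own IP_ID probe. `retransmits_in_window` is an assumed number of SYN-ACK
    retransmissions per spoofed SYN that arrive during interval 2.
    """
    if case is Case.NO_BLOCKING:
        return (1 + n_spoofed, 1)  # the client RSTs each SYN-ACK once
    if case is Case.SERVER_TO_CLIENT:
        return (1, 1)  # SYN-ACKs never reach the client
    # RSTs never reach the server, so it keeps retransmitting SYN-ACKs
    return (1 + n_spoofed, 1 + n_spoofed * retransmits_in_window)


def classify(deltas):
    """Map the pair of deltas back to one of the three cases."""
    d1, d2 = deltas
    if d1 <= 1:
        return Case.SERVER_TO_CLIENT
    if d2 > 1:
        return Case.CLIENT_TO_SERVER
    return Case.NO_BLOCKING


for case in Case:
    d = simulate_deltas(case)
    print(case.name, d, classify(d).name)
```

Real clients are not noiseless, so in practice the deltas are blurred by background traffic; that is why the talk amplifies the signal with n spoofed packets and repeats the measurement rather than trusting a single pair of deltas.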
But there is an insight here: not all clients have the same level of noise. For some of them, especially these ones, you could easily detect the signal after five seconds of sending IP_ID queries and then five seconds of spoofing. So in follow-up work we tried to use this insight to come up with a scalable and efficient technique that can be used globally. That technique is called Augur. Augur adopts this probing method: first, for four seconds it queries the IP_ID, then within one second it sends 10 spoofed SYN packets. Then we look at the IP_ID acceleration, the second derivative, and see whether there is a sudden jump at the time of perturbation, when we did the spoofing. How confident are we that this jump is the result of our own spoofed packets? Well, not confident? Run it again. Think so? Run it again, until you have sufficient confidence. It turns out there is a statistical technique called sequential hypothesis testing that can be used to gradually improve our confidence about the case we're detecting. I'm going to give you a very, very rough overview of how this works. For sequential hypothesis testing we need to define a random variable. We use the IP_ID acceleration at the time of perturbation, which is 1 or 0 depending on whether you see a jump or not.
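The binary random variable can be sketched as follows, assuming one IP_ID sample per second: four seconds of baseline, then one second in which the spoofed SYNs are sent. The default threshold (half the number of spoofed packets) is an assumption for illustration; the talk does not say how Augur decides what counts as a jump, and `jump_indicator` is a hypothetical name.

```python
IPID_MODULUS = 1 << 16  # 16-bit IPv4 identification field


def jump_indicator(samples, n_spoofed=10, threshold=None):
    """Return 1 if the IP_ID acceleration spikes at the perturbation, else 0.

    samples: IP_IDs read once per second at t = 0..5; the spoofed SYNs are
    sent between t = 4 and t = 5 (four seconds of baseline, one of perturbation).
    threshold: assumed cut-off on the acceleration; defaults to n_spoofed / 2.
    """
    if len(samples) != 6:
        raise ValueError("expected 6 samples: 4 s baseline + 1 s perturbation")
    # first derivative: IP_IDs consumed per second (wraparound-safe)
    velocity = [(b - a) % IPID_MODULUS for a, b in zip(samples, samples[1:])]
    # second derivative: change in that rate from one second to the next
    acceleration = [b - a for a, b in zip(velocity, velocity[1:])]
    if threshold is None:
        threshold = n_spoofed / 2
    # the last acceleration value compares the perturbation second to the one before
    return 1 if acceleration[-1] >= threshold else 0


# A host sending ~2-3 packets/s that suddenly consumes 12 IP_IDs while we spoof
print(jump_indicator([100, 103, 105, 108, 110, 122]))
```

Using the acceleration rather than the raw delta makes the check less sensitive to a host's steady background rate: a host that always sends 50 packets per second still shows an acceleration near zero until the spoofed SYN-ACKs arrive.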
We also need to calculate some empirical priors, known probabilities. For example, looking at everything, what is the probability of seeing a jump when there is actually no blocking? And so on. After we put all this together, we can formalize an algorithm: run a trial, update the sequence of values of the random variable, then check whether this sequence of values belongs to the distribution where blocking happens or not. What's the likelihood of that? If we're confident, if we have reached a level we are satisfied with, then we call the case. Putting all this together, this is how Augur works. We scan the whole IPv4 space and find global IP_ID machines. Then we apply some constraints: is it a stable machine? Is it too noisy, or does it have noise we can deal with? We also need to decide which websites we want to test reachability towards, and in which countries. After we decide all the inputs, we run a scheduler that makes sure no client or server is under measurement by two tests at the same time, because they would mess up each other's detection. Then we use our analysis to call the case and summarize the results. I started by saying that the common methods have limitations, for example in coverage, continuity and ethics. Well, when it comes to coverage, there are more than 22 million global IP_ID machines.
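The "run a trial, update, check the likelihood, stop when confident" loop can be illustrated with Wald's sequential probability ratio test (SPRT) over the 0/1 jump indicators. This is a textbook SPRT, offered as a sketch rather than Augur's code. `p_jump_reachable` and `p_jump_blocked` stand in for the empirical priors mentioned above; the 0.9 and 0.1 used below, and the default error rates, are made-up numbers. This two-hypothesis version only separates "SYN-ACKs reach the client" from "they don't"; it does not tell the blocking directions apart.

```python
import math


def sprt(trials, p_jump_reachable, p_jump_blocked, alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test over 0/1 jump indicators.

    H1 "reachable": P(jump) = p_jump_reachable.
    H0 "blocked":   P(jump) = p_jump_blocked.
    alpha: tolerated P(call "reachable" | actually blocked).
    beta:  tolerated P(call "blocked" | actually reachable).
    `trials` may be any iterable, e.g. a generator that runs one more
    measurement each time it is advanced, so probing stops at the decision.
    Returns (decision, number_of_trials_used).
    """
    upper = math.log((1 - beta) / alpha)  # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))  # accept H0 at or below this
    llr = 0.0  # running log-likelihood ratio, reachable vs blocked
    n = 0
    for n, x in enumerate(trials, start=1):
        if x:
            llr += math.log(p_jump_reachable / p_jump_blocked)
        else:
            llr += math.log((1 - p_jump_reachable) / (1 - p_jump_blocked))
        if llr >= upper:
            return "reachable", n
        if llr <= lower:
            return "blocked", n
    return "undecided", n


print(sprt([1, 1, 1, 1], 0.9, 0.1))
print(sprt([0, 0, 0, 0], 0.9, 0.1))
```

With these illustrative numbers each jump adds log 9 ≈ 2.2 to the log-likelihood ratio, so three consecutive jumps cross the threshold log 99 ≈ 4.6 and the test stops after three trials instead of running a fixed number.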
These are Windows XP machines and earlier, and FreeBSD, for example. Compare that with previous work: one successful project is RIPE Atlas, which has around 10,000 probes deployed globally. When it comes to continuity, we don't depend on end users, so this approach is much more reliable. As for ethics, by not asking volunteers to help we were already reducing the risk, because there are no users conspiring against their governments to collect this data. But our approach is not zero-risk either. There is a different kind of risk here: the client and the server exchange SYN-ACKs and RSTs without either of them giving consent. And we don't want to ask for consent, because if we do, the same dilemma exists; we are back to asking volunteers. So, to cope with that and reduce the risk further, we don't use end-user IPs. We use routers two hops back, which with high probability are infrastructure machines, as vantage points. Even with this harsh constraint we still have 53,000 global IP_ID routers. To test the framework and see whether Augur works, we chose 2,000 of these global IP_ID machines, selected uniformly across all the countries where we had vantage points. We selected websites from the Citizen Lab test list.
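One way to read "selected uniformly across all the countries" is round-robin sampling by country, so that countries with many global IP_ID machines don't dominate the sample. This is a hypothetical sketch of that interpretation, not Augur's selection code; `sample_across_countries` is an invented name, and the seeded `random.Random` only serves to make runs reproducible.

```python
import random
from collections import defaultdict


def sample_across_countries(machines, total, seed=0):
    """Pick up to `total` vantage points spread as evenly as possible over countries.

    machines: iterable of (ip, country_code) pairs.
    Countries take turns (in sorted order) contributing one randomly chosen
    machine each, until `total` is reached or every country's pool is empty.
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for ip, country in machines:
        pools[country].append(ip)
    for pool in pools.values():
        rng.shuffle(pool)  # random choice within each country
    chosen = []
    while len(chosen) < total and any(pools.values()):
        for country in sorted(pools):
            if len(chosen) == total:
                break
            if pools[country]:
                chosen.append((pools[country].pop(), country))
    return chosen


machines = [(f"us{i}", "US") for i in range(5)] + [("az0", "AZ"), ("cn0", "CN"), ("cn1", "CN")]
print(sample_across_countries(machines, 4))
```

Here a request for four machines yields one from Azerbaijan, two from China and one from the US, even though the US pool is the largest.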
This is the research organization at the University of Toronto that crowdsources websites that are potentially blocked or potentially sensitive. We also used thousands of websites from the Alexa top 10k. Then we ran Augur for 17 days and collected this data. One of the challenges in validating Augur was: what is the truth? What is the ground truth? What would we expect to see that makes sense? This is the biggest and most fundamental challenge for internet censorship measurement anyway. Our first approach was to lean on intuition: no client should show blocking towards all the websites, and no server should show blocking for the bulk of our clients. If anything like that happens, we just trash it. We should also see more bias towards the sensitive domains than towards the popular ones, and so on. We also hoped to replicate the anecdotes, the reports out there. We did all of those, and that's how we validated Augur. So in the end, Augur is a scalable, efficient and ethical system that can be used to detect TCP/IP blocking continuously. Yes, I know that's just TCP/IP. What about the other layers? Can we measure them remotely as well? Let me focus on DNS. You might ask: is there a way to remotely detect DNS poisoning or manipulation? Well, let's think it through out loud.
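The two intuition-based checks ("no client blocks everything", "no server is blocked for the bulk of clients") can be sketched as a filter over the results. This is an illustration only: `drop_implausible` is a hypothetical name, the 0.5 cut-off standing in for "the bulk of our clients" is an assumption, and the per-site fractions are computed on the raw results before any client is discarded.

```python
from collections import defaultdict


def drop_implausible(results, max_site_fraction=0.5):
    """Discard measurements that fail two plausibility checks.

    results: dict mapping (client, site) -> True if blocking was detected.
    - A client that shows blocking towards *every* site is probably broken.
    - A site blocked for more than `max_site_fraction` of clients (an assumed
      cut-off) is probably down or misbehaving rather than censored.
    """
    by_client = defaultdict(list)
    by_site = defaultdict(list)
    for (client, site), blocked in results.items():
        by_client[client].append(blocked)
        by_site[site].append(blocked)
    bad_clients = {c for c, flags in by_client.items() if all(flags)}
    bad_sites = {s for s, flags in by_site.items()
                 if sum(flags) / len(flags) > max_site_fraction}
    return {(c, s): b for (c, s), b in results.items()
            if c not in bad_clients and s not in bad_sites}


results = {
    ("A", "x"): True, ("A", "y"): True,    # client A "blocks" everything
    ("B", "x"): False, ("B", "y"): True,
    ("C", "x"): False, ("C", "y"): False,  # site y blocked for 2 of 3 clients
}
print(drop_implausible(results))
```

Only the measurements of site x from clients B and C survive: client A is discarded for blocking every site, and site y for being blocked from most clients.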
From now on, for lack of time, I'm gonna give just the highlights of the papers we worked on. If we scan the whole IPv4 space, we find a lot of open DNS resolvers, meaning they resolve queries for anybody who sends them one. These open DNS resolvers can be used as vantage points: we can use open resolvers in different ISPs around the world to see whether DNS queries are poisoned or not. But wait: we need to make sure they don't belong to end users. So we came up with a lot of checks to make sure these open DNS resolvers are organizational, belonging to the ISP or to infrastructure. After that, we start sending all our queries to these open DNS resolvers, let's say in an ISP in Bahrain, for all the domains we're interested in, and capture which IPs we receive. The challenge then is to detect which answers are wrong, so we had to come up with a set of heuristics. For example, is the response we received equal to the reply we got from our control measurements, where we know the IP is not blocked or poisoned and the content is there? Or we can look at the IP we received and see whether it serves a valid HTTPS certificate, with or without SNI (Server Name Indication), and so on and so forth.
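Two of the heuristics described above, comparing against control answers and checking for a valid HTTPS certificate, can be sketched like this. The decision order and the labels are assumptions for illustration; Satellite's real pipeline uses more heuristics than these two, and the function names are hypothetical. `tls_cert_valid` uses only the standard `ssl` and `socket` modules and opens a live connection, so call it only where active probing is appropriate.

```python
import socket
import ssl


def tls_cert_valid(ip, domain, timeout=5.0):
    """True if `ip` completes a TLS handshake for `domain` (sent as SNI) with a
    certificate that validates against the system trust store and hostname."""
    ctx = ssl.create_default_context()
    try:
        with socket.create_connection((ip, 443), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=domain):
                return True
    except OSError:  # covers ssl.SSLError (incl. verification failures) and timeouts
        return False


def classify_answer(answer_ips, control_ips, cert_ok):
    """Label one open resolver's answer for a domain.

    answer_ips:  set of IPs returned by the resolver under test.
    control_ips: set of IPs returned by trusted control resolvers.
    cert_ok:     callable(ip) -> bool, e.g. lambda ip: tls_cert_valid(ip, domain).
    """
    if not answer_ips:
        return "no answer"
    if answer_ips & control_ips:
        return "consistent"  # overlaps with the control measurement
    if all(cert_ok(ip) for ip in answer_ips):
        return "consistent"  # different IPs (e.g. another CDN edge) but a valid cert
    return "suspect"
```

The certificate check matters because large sites legitimately return different IPs in different regions; an answer that differs from the control but still presents a valid certificate for the domain is unlikely to be a poisoned one.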
So we came up with lots of heuristics to detect wrong answers. The result of all these efforts is a project called Satellite, which was started by Will Scott. I'm sure he is somewhere in the audience. He is a great friend of mine and a very good, selfless supporter of Censored Planet; it has been a miracle that I had the opportunity and fortune to meet him. Satellite automates all the steps I just described. For this work we use the science developed in both of these works, and we kept the name Satellite because it was the older project. So how much coverage does Satellite have? If you scan IPv4, you end up with 4.2 million open DNS resolvers in every country and territory. We need to make sure we are being ethical, and for that reason we put a harsh condition in place: let's only use the resolvers whose valid PTR record follows this expression. Basically, let's only use open DNS resolvers that are name servers, or at least whose PTR record suggests that. This is a really harsh constraint. Actually, my students have been adding more and more regular expressions for resolvers we are sure are organizational. But for now, even being this harsh, we have 40k DNS resolvers in about 169 countries, I guess. So censorship happens in other layers as well.
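The PTR-record constraint amounts to a reverse lookup plus a regular-expression match. The pattern below is a made-up example of a "looks like a name server" rule, since the talk does not give the actual expressions; `filter_resolvers` and the other names are hypothetical. `socket.gethostbyaddr` is the standard-library reverse lookup, and the `lookup` parameter lets you plug in a precomputed table instead of live DNS.

```python
import re
import socket

# Hypothetical pattern: the talk does not give the exact regular expression.
NAMESERVER_PTR = re.compile(r"^(ns|dns|resolver|cache)[0-9]*[.-]", re.IGNORECASE)


def ptr_name(ip):
    """Reverse-DNS (PTR) name for ip, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return None


def looks_like_nameserver(name):
    return name is not None and NAMESERVER_PTR.match(name) is not None


def filter_resolvers(ips, lookup=ptr_name):
    """Keep only open resolvers whose PTR name suggests a name server."""
    return [ip for ip in ips if looks_like_nameserver(lookup(ip))]


# Usage with a precomputed PTR table instead of live lookups
ptr_table = {
    "192.0.2.1": "ns1.isp.example",
    "192.0.2.2": "host-192-0-2-2.dyn.example",
    "192.0.2.3": None,
}
print(filter_resolvers(list(ptr_table), lookup=ptr_table.get))
```

A deliberately strict pattern trades coverage for safety: it drops many legitimate ISP resolvers, but it makes it unlikely that a home router or an individual's machine is used as a vantage point.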
How do we deal with those layers using a remote side channel? And especially, what about HTTP traffic, or the disruption that can happen to, you know, TLS? I hate water. Oh no. Okay. So.
scratching noise
It's well documented that many DPIs, especially the Great Firewall of China, monitor the traffic, and when they see a keyword, a sensitive keyword like "Falun Gong", they act: they drop the traffic or send a RST. And as I mentioned earlier, there is enough clear text everywhere. Even in TLS handshakes, the SNI is in clear text. For a long time I was trying to come up with a way of detecting application-layer blocking using this fancy side channel. The client and server first need to establish a TCP handshake, so how can the side channel jump in and detect the rest? We were lucky enough to come across a protocol called Echo. It's a protocol designed in 1983 for testing; it is basically a debugging tool, a predecessor to ping. After you establish a TCP handshake to port 7, whatever you send to the Echo server on port 7, it echoes back. Now think about it: how can we use Echo servers to detect application-layer blocking? Well, when there's no censor in the path, let's say I have an Echo server in the U.S.
and a measurement machine at 292 00:34:08,490 --> 00:34:13,890 the University of Michigan. I establish a TCP handshake and I send a GET request 293 00:34:13,890 --> 00:34:19,190 containing a censored keyword, for example. It's gonna send back to me the 294 00:34:19,190 --> 00:34:28,269 same thing I sent. But now let's put in a DPI that is gonna be triggered by it. 295 00:34:28,269 --> 00:34:37,150 Well, for sure, either I'm going to receive an RST first or something else. So 296 00:34:37,150 --> 00:34:43,609 we can actually come up with an algorithm that uses Echo servers to detect 297 00:34:43,609 --> 00:34:47,969 disruptions at the application layer: basically keyword blocking, URL 298 00:34:47,969 --> 00:34:58,530 blocking. The result of this is a tool called Quack. Quack uses Echo 299 00:34:58,530 --> 00:35:06,470 servers to detect, in a scalable way, whether keywords are 300 00:35:06,470 --> 00:35:14,380 being blocked around the world. So what we did was first scan the whole IPv4 space. We 301 00:35:14,380 --> 00:35:22,910 found 47k Echo servers running around the world. Then we needed to check 302 00:35:22,910 --> 00:35:27,270 whether or not they belong to end users. And that was a very challenging 303 00:35:27,270 --> 00:35:36,530 part because there is not a clear signal. Around 90 percent of them are 304 00:35:36,530 --> 00:35:40,730 infrastructure, but there is still some portion of them that we don't know about. So 305 00:35:40,730 --> 00:35:46,610 what we do is look at the Freedom House reports and the countries that are 306 00:35:46,610 --> 00:35:52,931 "not free" or "partly free", as they're called. This 307 00:35:52,931 --> 00:35:58,720 is around 50 countries. And for those, we 308 00:35:58,720 --> 00:36:05,460 randomly select some that we want and we use Nmap's OS detection.
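The Echo-based test can be sketched as a comparison between what was sent and what came back. This is a simplified Python illustration, not Quack's implementation: the HTTP probe format, the three outcome labels, and the helper names are assumptions, and a real system would repeat trials before calling anything blocked.

```python
import socket
from enum import Enum
from typing import Optional, Tuple

ECHO_PORT = 7  # TCP Echo service (RFC 862): the server returns whatever it receives


class Outcome(Enum):
    NOT_BLOCKED = "not_blocked"    # the server echoed exactly what we sent
    DISRUPTED = "disrupted"        # we saw a reset, or got back something different
    INCONCLUSIVE = "inconclusive"  # nothing usable came back (timeout or refused)


def build_probe(keyword: str) -> bytes:
    """Hypothetical probe: an HTTP GET request carrying the test keyword as the Host."""
    return f"GET / HTTP/1.1\r\nHost: {keyword}\r\n\r\n".encode()


def classify(sent: bytes, received: Optional[bytes], saw_reset: bool) -> Outcome:
    """Compare the echoed bytes with what was sent."""
    if saw_reset:
        return Outcome.DISRUPTED
    if received is None:
        return Outcome.INCONCLUSIVE
    return Outcome.NOT_BLOCKED if received == sent else Outcome.DISRUPTED


def send_probe(host: str, payload: bytes, timeout: float = 5.0) -> Tuple[Optional[bytes], bool]:
    """Send payload to an Echo server; return (bytes echoed or None, saw_reset)."""
    try:
        with socket.create_connection((host, ECHO_PORT), timeout=timeout) as sock:
            sock.sendall(payload)
            received = b""
            while len(received) < len(payload):
                chunk = sock.recv(4096)
                if not chunk:
                    break  # the server closed the connection early
                received += chunk
            return received, False
    except ConnectionResetError:
        return None, True  # an RST arrived, e.g. injected by a middlebox
    except ConnectionRefusedError:
        return None, False  # no Echo service listening: nothing to conclude
    except socket.timeout:
        return None, False
```

Only `classify` and `build_probe` run without a network; `send_probe` opens a real TCP connection to port 7 and should only be pointed at Echo servers you are permitted to measure.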
And it 309 00:36:05,460 --> 00:36:15,750 will tell us whether it's a server, a switch, and so on. We use those. So with 310 00:36:15,750 --> 00:36:23,010 the help of so many collaborators, after almost six years we ended up with three 311 00:36:23,010 --> 00:36:32,420 systems that can capture TCP/IP blocking, DNS blocking, and application-layer blocking using 312 00:36:32,420 --> 00:36:43,480 infrastructure and organizational machines. So while it was a dream 313 00:36:43,480 --> 00:36:47,810 or a vision that we could come up with a better way to collect this data in a 314 00:36:47,810 --> 00:36:56,020 continuous way, thanks to the help of a lot of people, especially my students, Will, and 315 00:36:56,020 --> 00:37:02,060 other collaborators, we now have CensoredPlanet. CensoredPlanet collects 316 00:37:02,060 --> 00:37:09,020 semi-weekly snapshots of Internet censorship using our vantage points in all 317 00:37:09,020 --> 00:37:18,090 the layers and provides this data in raw format on our website. We also 318 00:37:18,090 --> 00:37:24,531 provide some visualizations so people can see how many vantage points 319 00:37:24,531 --> 00:37:29,560 we have in each country and so on. Of course, this is the beginning of 320 00:37:29,560 --> 00:37:34,160 CensoredPlanet. We launched it in August, and we have been collecting data for 321 00:37:34,160 --> 00:37:39,880 almost four months, and we have a long way to go. Right now we have users from 322 00:37:39,880 --> 00:37:45,130 organizations using our data and helping us debug by finding things that don't 323 00:37:45,130 --> 00:37:51,950 make sense and pointing them out to us. Any of you who end up using this data, please 324 00:37:51,950 --> 00:37:56,930 share your feedback with us. We are very responsive and willing to change things, 325 00:37:56,930 --> 00:38:03,940 though maybe not as much as you need. We have a collective of very dedicated people 326 00:38:03,940 --> 00:38:10,940 participating.
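The vetting of Echo servers, random sampling in the roughly 50 countries Freedom House rates as not free or partly free, followed by keeping hosts that OS detection labels as infrastructure, might look like the sketch below. The label set and both helper names are hypothetical, and the device labels are assumed to have been extracted from Nmap output beforehand.

```python
import random
from typing import Dict, Iterable, List

# Hypothetical label set; the talk only says OS detection tells us
# "it's a server, it's a switch and so on".
INFRASTRUCTURE = {"router", "switch", "firewall", "server"}


def sample_for_validation(servers_by_country: Dict[str, List[str]],
                          countries: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Randomly pick up to k Echo servers from each listed country."""
    rng = random.Random(seed)  # seeded so the same sample can be reproduced
    picked: List[str] = []
    for country in countries:
        servers = servers_by_country.get(country, [])
        picked.extend(rng.sample(servers, min(k, len(servers))))
    return picked


def is_infrastructure(device_guesses: List[str]) -> bool:
    """Accept a host only if OS detection produced guesses and all are infrastructure."""
    return bool(device_guesses) and all(g in INFRASTRUCTURE for g in device_guesses)
```

Requiring every guess to be infrastructure is deliberately strict: a host whose fingerprint could also be a phone or a printer is dropped rather than risk measuring through an end user's device.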
So, now that we have CensoredPlanet, let me show you how it can 327 00:38:10,940 --> 00:38:19,349 help when there is a political situation going on. You all must remember that around 328 00:38:19,349 --> 00:38:25,410 October, Jamal Khashoggi, a Washington Post reporter, disappeared and was 329 00:38:25,410 --> 00:38:34,530 killed at the Saudi Arabian consulate in Turkey. At the time this was happening, 330 00:38:34,530 --> 00:38:40,540 there was a lot of media attention, and this news, especially two weeks in, 331 00:38:40,540 --> 00:38:46,980 spread very widely internationally. CensoredPlanet didn't know this event was 332 00:38:46,980 --> 00:38:52,750 going to happen. We had been collecting this data semi-weekly for 2,000 333 00:38:52,750 --> 00:38:57,660 domains or so. And so we went back and checked Saudi Arabia. Did we see 334 00:38:57,660 --> 00:39:04,830 anything interesting? And yes: for example, two weeks in, around October 335 00:39:04,830 --> 00:39:12,680 16, for the domains in the news and media categories, the 336 00:39:12,680 --> 00:39:18,500 censorship related to those doubled. And let me emphasize, we didn't see a 337 00:39:18,500 --> 00:39:23,440 block-or-not-block over the whole country; not all countries have homogeneous 338 00:39:23,440 --> 00:39:28,430 censorship. We saw it in multiple ISPs where we had vantage 339 00:39:28,430 --> 00:39:34,770 points. Actually, I freaked out when one of the activists in Saudi Arabia told us, 340 00:39:34,770 --> 00:39:41,869 "I don't see this". And we said, "What ISP are you on?" And it wasn't one of the ISPs that 341 00:39:41,869 --> 00:39:49,160 we had vantage points in. So we were looking for hints: "Is anybody else 342 00:39:49,160 --> 00:39:55,720 seeing what we were seeing?"
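A doubling like the one seen for news and media domains can be checked by comparing per-category blocking rates between two snapshots. A minimal sketch, assuming each snapshot is summarized as (blocked, total) measurement counts per category; the counts in the example are made up for illustration and are not CensoredPlanet data.

```python
from typing import Dict, List, Tuple

Counts = Tuple[int, int]  # (blocked measurements, total measurements)


def grown_categories(before: Dict[str, Counts], after: Dict[str, Counts],
                     factor: float = 2.0) -> List[str]:
    """Categories whose blocking rate grew by at least `factor` between two snapshots."""
    flagged = []
    for category in sorted(before.keys() & after.keys()):
        b_blocked, b_total = before[category]
        a_blocked, a_total = after[category]
        if b_total == 0 or a_total == 0:
            continue  # no measurements for this category in one of the snapshots
        if b_blocked == 0:
            if a_blocked > 0:
                flagged.append(category)  # newly blocked: treat as growth
            continue
        # (a_blocked / a_total) / (b_blocked / b_total), cross-multiplied so
        # the ratio is computed with a single division.
        ratio = (a_blocked * b_total) / (a_total * b_blocked)
        if ratio >= factor:
            flagged.append(category)
    return flagged


# Made-up counts for two snapshots.
before = {"news": (10, 200), "media": (8, 100), "shopping": (5, 100)}
after = {"news": (21, 200), "media": (16, 100), "shopping": (5, 100)}
print(grown_categories(before, after))  # ['media', 'news']
```

Aggregating per ISP rather than per country would follow the same pattern and fits the point above that censorship was not homogeneous across the country.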
And we ended up seeing that there was a commander 343 00:39:55,720 --> 00:40:03,560 lab project that also saw, around October 16, that the number of malware samples or whatever they 344 00:40:03,560 --> 00:40:10,220 are testing had doubled or tripled. I don't remember which. So something was 345 00:40:10,220 --> 00:40:17,180 going on two weeks in, when the news broke. Let me emphasize that the news media I am 346 00:40:17,180 --> 00:40:22,300 talking about are the global news media that we check, like the L.A. Times, Fox News, 347 00:40:22,300 --> 00:40:30,970 and so on. But we also checked Arab News, which, as the activists told us, is a 348 00:40:30,970 --> 00:40:38,490 Saudi Arabian propaganda newspaper, and in one of the ISPs it was being poisoned. So 349 00:40:38,490 --> 00:40:49,910 again, censorship measurement is a very complex problem. So where are we heading? 350 00:40:49,910 --> 00:40:55,580 Well, having said all that about side channels and the techniques that help us remotely 351 00:40:55,580 --> 00:41:01,900 collect this data, I also have to say that the data we collect doesn't give the full 352 00:41:01,900 --> 00:41:06,950 picture of internet censorship. I mean, having root access on a volunteer's 353 00:41:06,950 --> 00:41:17,641 machine to do a detailed test is powerful. So as a next step, in the next year, one 354 00:41:17,641 --> 00:41:27,720 of our goals is to join forces with OONI to integrate the data from remote and 355 00:41:27,720 --> 00:41:37,800 local measurements, to provide the best of both worlds. Also, we have 356 00:41:37,800 --> 00:41:43,990 been thinking a lot about what good visualization tools would look like that don't end 357 00:41:43,990 --> 00:41:51,391 up misrepresenting internet censorship. I literally hate that one. Hate it. The 358 00:41:51,391 --> 00:41:56,860 number of vantage points in each country is not equal.
We don't know whether all the 359 00:41:56,860 --> 00:42:00,980 vantage points that the data came from are in one ISP or in all of the 360 00:42:00,980 --> 00:42:08,109 ISPs. And then we test domains that are, like, benign and, I don't know, defined 361 00:42:08,109 --> 00:42:13,650 based on some Western values of freedom of expression. I believe in all of 362 00:42:13,650 --> 00:42:19,330 them, but still, culture and economy might play a role. And then we put colors on 363 00:42:19,330 --> 00:42:25,030 the map, rank the countries, call some countries awful, and don't give full 364 00:42:25,030 --> 00:42:30,849 attention to the others. So something needs to change, and it's on our 365 00:42:30,849 --> 00:42:37,700 horizon to think about it more deeply. We want to have more statistical 366 00:42:37,700 --> 00:42:44,320 tools to spot when the patterns change. We want to be able to compare 367 00:42:44,320 --> 00:42:49,580 countries, for example when Telegram was being blocked in Russia. If you remember, 368 00:42:49,580 --> 00:42:54,910 millions of IPs were blocked. If you don't, go to my friend Leonid's talk 369 00:42:54,910 --> 00:43:00,020 about Russia. You're going to learn a lot there. But anyway. So when Russia was 370 00:43:00,020 --> 00:43:06,520 blocking Telegram, I said to everyone: I bet that soon some other 371 00:43:06,520 --> 00:43:10,370 governments are going to jump in and block Telegram as well. And that's actually what 372 00:43:10,370 --> 00:43:15,320 we heard, rumors like that. So we need to be able to detect that automatically. And 373 00:43:15,320 --> 00:43:26,470 overall, I want to develop an empirical science of internet censorship 374 00:43:26,470 --> 00:43:36,720 based on rich data, with the help of all of you.
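One simple statistical tool for spotting when a pattern changes is a z-score against a country's own history. The sketch below is an assumption about what such a tool could look like, not something CensoredPlanet is described as running; the threshold `z=3.0` and the input (for example, a per-snapshot blocking rate for one domain in one country) are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def pattern_changed(history: Sequence[float], latest: float, z: float = 3.0) -> bool:
    """Flag a blocking rate that sits more than z standard deviations from its history."""
    if len(history) < 2:
        return False  # too little history to say anything
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any movement at all is a change
    return abs(latest - mu) / sigma > z


# Made-up blocking rates for one domain in one country over five snapshots.
history = [0.05, 0.06, 0.05, 0.04, 0.05]
print(pattern_changed(history, 0.10))   # True
print(pattern_changed(history, 0.055))  # False
```

Running the same check per country for one domain, such as Telegram's, would surface several countries changing at once, which is the kind of follow-on blocking the talk wants to detect automatically. Because it compares each country only with itself, it also sidesteps ranking countries against each other.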
CensoredPlanet is now being 375 00:43:36,720 --> 00:43:43,370 maintained by a group of dedicated students, great friends of mine, and it 376 00:43:43,370 --> 00:43:49,960 needs engineers and political scientists to jump on our data and help us bring 377 00:43:49,960 --> 00:43:57,320 meaning to what we are collecting. So if you are a good engineer or a political 378 00:43:57,320 --> 00:44:07,250 scientist or a dedicated person who wants to change the world, reach out to me. As 379 00:44:07,250 --> 00:44:11,500 a reference for those of you who are interested: these are the publications 380 00:44:11,500 --> 00:44:19,720 that my talk was based on. And now I am open to questions. 381 00:44:19,720 --> 00:44:26,180 applause 382 00:44:26,180 --> 00:44:31,440 Herald: All right, perfect. Thank you so much, Roya, so far. We have some time for 383 00:44:31,440 --> 00:44:35,500 questions, so if you have a question in the room, please go to one of the room 384 00:44:35,500 --> 00:44:40,100 microphones, one, two, three, four, and five in the very back. And if you're 385 00:44:40,100 --> 00:44:44,490 watching the stream, you can ask questions to the signal angel via IRC or Twitter, and 386 00:44:44,490 --> 00:44:49,360 we'll make sure to relay those to the speaker and make sure they get asked. So 387 00:44:49,360 --> 00:44:52,040 let's just go ahead and start with Mic two, please. 388 00:44:52,040 --> 00:44:57,349 Question: Hey, great talk. Do you worry that by publishing your methods as well as 389 00:44:57,349 --> 00:45:02,690 your data, you're going to get a response from governments that are 390 00:45:02,690 --> 00:45:05,869 censoring things, such that it becomes more difficult for you to monitor what's 391 00:45:05,869 --> 00:45:08,680 being censored? Or has that already happened? 392 00:45:08,680 --> 00:45:14,630 Roya: It hasn't happened. We have control measures to detect that. But 393 00:45:14,630 --> 00:45:19,260 that has been...
it's a really good question, and it often comes up after I 394 00:45:19,260 --> 00:45:25,490 present. I can tell you, based on my experience, it's really hard to synchronize 395 00:45:25,490 --> 00:45:31,490 all the ISPs in all the countries to react to the SYN-ACKs and RSTs that I'm sending. 396 00:45:31,490 --> 00:45:36,150 For example, for Augur, these are unsolicited packets, and if governments 397 00:45:36,150 --> 00:45:41,850 block them there is going to be a lot of collateral damage. You might say, 398 00:45:41,850 --> 00:45:45,610 "Well, Roya, they're going to block the IP of the University of Michigan. It's a 399 00:45:45,610 --> 00:45:50,770 spoofing machine." We have a measure for that. I have backups in multiple places 400 00:45:50,770 --> 00:45:56,190 in case that happens. But overall, this is a global 401 00:45:56,190 --> 00:46:02,800 scale measurement, and even within one city or across multiple ISPs, it's really 402 00:46:02,800 --> 00:46:06,920 hard to synchronize blocking something and maintaining that. So it is 403 00:46:06,920 --> 00:46:13,630 something that's on our minds. But as of now, it's not a worry. 404 00:46:13,630 --> 00:46:16,470 Herald: All right, then let's go over to Mic one. 405 00:46:16,470 --> 00:46:20,510 Question: Thank you. I wondered, and it's kind of similar to this question: what if you 406 00:46:20,510 --> 00:46:24,920 are measuring from a country that is blocking? Do you also distribute the 407 00:46:24,920 --> 00:46:29,970 measurements over several countries? Roya: Absolutely. Every snapshot that we 408 00:46:29,970 --> 00:46:37,280 collect is from all the vantage points we have in certain countries, and a portion 409 00:46:37,280 --> 00:46:42,100 of the vantage points in, like, China or the US, because they have millions of vantage 410 00:46:42,100 --> 00:46:46,220 points, or thousands of vantage points.
So basically, at each snapshot, 411 00:46:46,220 --> 00:46:52,340 which takes us three days, we collect data from all of the vantage points. 412 00:46:52,340 --> 00:46:57,580 And let's say that somebody is reacting to us: we have a benign domain that we 413 00:46:57,580 --> 00:47:03,250 check as well, for example example.com or random.com. So if we see 414 00:47:03,250 --> 00:47:09,380 something going on there, we double-check. But good point, because right now 415 00:47:09,380 --> 00:47:14,720 our effort is very much manual labor, and we're trying to automate everything, so it's 416 00:47:14,720 --> 00:47:18,900 still a challenge. Thank you. Herald: All right, then let's go to Mic 417 00:47:18,900 --> 00:47:22,859 three. Question: Hi. Have you measured how much 418 00:47:22,859 --> 00:47:28,140 IP-ID randomization breaks your probes? 419 00:47:28,140 --> 00:47:35,349 Roya: Oh. This is also a really good one. Let me give a shout-out to [name]. He's the guy who 420 00:47:35,349 --> 00:47:45,990 in 1998 discovered the IP-ID side channel, or published something that I ended up reading. So, 421 00:47:45,990 --> 00:47:54,440 for example, Linux or Ubuntu in newer versions randomize it, but there are still 422 00:47:54,440 --> 00:47:59,421 legacy operating systems like Windows XP and its predecessors, and FreeBSD, 423 00:47:59,421 --> 00:48:04,750 that still have a global IP-ID. So one argument that often comes up is: what if 424 00:48:04,750 --> 00:48:09,339 all these machines get updated to new operating systems that don't 425 00:48:09,339 --> 00:48:13,780 maintain a global IP-ID? And I can tell you that, well, we'll come up with another 426 00:48:13,780 --> 00:48:20,129 side channel. For now, this one works.
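The side channel depends on hosts that keep a single global IP-ID counter. A rough way to screen for that, sketched in Python under assumed parameters (the `max_step` bound is arbitrary and the helper name is hypothetical), is to probe a host several times and check that the returned IP-IDs advance by small steps modulo 65536. Seen from a single prober, a per-destination counter also looks like this, so the check is necessary but not sufficient.

```python
from typing import Sequence

IPID_SPACE = 1 << 16  # the IPv4 Identification field is 16 bits wide


def looks_like_global_counter(ipids: Sequence[int], max_step: int = 1000) -> bool:
    """Heuristic: successive IP-IDs from one host advance by small positive steps.

    Steps are taken modulo 2**16, so a counter wrapping from 65535 back to 0
    still counts as increasing. Randomized IP-IDs jump around and fail the test.
    """
    if len(ipids) < 3:
        return False  # need at least two steps to see a trend
    steps = [(b - a) % IPID_SPACE for a, b in zip(ipids, ipids[1:])]
    return all(0 < s <= max_step for s in steps)


print(looks_like_global_counter([100, 103, 104, 120]))  # True
print(looks_like_global_counter([65530, 2, 10]))        # True (wraps around)
print(looks_like_global_counter([4821, 61002, 1337]))   # False
```

A host that answers with a constant IP-ID (step 0) is rejected too, since it carries no counter information.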
But my gut feeling is that if it didn't change 427 00:48:20,129 --> 00:48:25,230 from 1998 until now, with everybody saying that a global IP-ID 428 00:48:25,230 --> 00:48:30,440 variable is a horrible idea, it's not going to change in the coming five years, so 429 00:48:30,440 --> 00:48:33,230 we're good. Question: Thank you. 430 00:48:33,230 --> 00:48:36,520 Herald: Okay, then let's just move on to Mic four. 431 00:48:36,520 --> 00:48:41,480 Question: Thank you very much for the great talk. When you were introducing 432 00:48:41,480 --> 00:48:46,910 Augur, I was wondering: does detecting blocking between client and server 433 00:48:46,910 --> 00:48:52,190 necessarily indicate censorship? Because you were talking about validating 434 00:48:52,190 --> 00:48:59,130 Augur, I was wondering: if it turns out that there is a false alarm, what do you 435 00:48:59,130 --> 00:49:04,530 think could be the potential cause? Roya: You're absolutely right. And I tried 436 00:49:04,530 --> 00:49:11,630 to emphasize that what we end up collecting can be seen as a disruption. 437 00:49:11,630 --> 00:49:17,200 Something didn't work; the SYN-ACK or RST got disrupted. Is that 438 00:49:17,200 --> 00:49:22,250 censorship, or can it be a random packet drop? The way to 439 00:49:22,250 --> 00:49:28,290 establish confidence is to aggregate the results. Do we see this 440 00:49:28,290 --> 00:49:33,670 blocking across multiple routers within that country, or within that AS? 441 00:49:33,670 --> 00:49:38,880 Because if one of them was an accident, if a packet just 442 00:49:38,880 --> 00:49:43,900 got dropped, what about the others? That's the whole idea, and this is another point that 443 00:49:43,900 --> 00:49:50,390 I'm so, so concerned about: most of the reports and anecdotes that we read are based 444 00:49:50,390 --> 00:49:55,869 on one VPN or one vantage point in the country.
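The aggregation argument, together with the benign control domain mentioned earlier, can be sketched as follows: discard vantage points whose control measurement failed, then flag an AS only if most of its remaining vantage points see the test domain disrupted. The thresholds, the `Measurement` record, and the domain names are hypothetical; this illustrates the reasoning, not CensoredPlanet's pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Measurement:
    asn: int             # AS the vantage point belongs to
    vantage_point: str   # e.g. the reflector's or resolver's IP address
    domain: str
    disrupted: bool      # True if the probe saw interference


def disrupted_ases(measurements: List[Measurement], test_domain: str,
                   control_domain: str = "example.com",
                   min_fraction: float = 0.5, min_points: int = 2) -> List[int]:
    """ASes where most trustworthy vantage points see the test domain disrupted."""
    # A vantage point is trusted only if its control measurement succeeded,
    # so a host that is simply broken does not count as censorship.
    trusted = {m.vantage_point for m in measurements
               if m.domain == control_domain and not m.disrupted}
    results: Dict[int, List[bool]] = defaultdict(list)
    for m in measurements:
        if m.domain == test_domain and m.vantage_point in trusted:
            results[m.asn].append(m.disrupted)
    flagged = []
    for asn in sorted(results):
        outcomes = results[asn]
        # One disrupted vantage point could be a random drop; require several.
        if len(outcomes) >= min_points and sum(outcomes) / len(outcomes) >= min_fraction:
            flagged.append(asn)
    return flagged
```

With `min_points=2`, a conclusion drawn from a single vantage point, the situation criticized above, is never reported on its own.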
And then a lot of 445 00:49:55,869 --> 00:50:00,770 conclusions are drawn from that. And you can often ask: well, this vantage point might 446 00:50:00,770 --> 00:50:05,640 be subject to so many things other than a government's censorship. Also, I 447 00:50:05,640 --> 00:50:11,980 emphasized that the censorship I use in this talk is any action that stops 448 00:50:11,980 --> 00:50:17,180 users from getting to the requested content. I'm trying to stay away from 449 00:50:17,180 --> 00:50:23,480 semantics where intention is implied. But great question. 450 00:50:23,480 --> 00:50:26,240 Herald: All right, then let's go back to Mic one, right. 451 00:50:26,240 --> 00:50:29,740 Question: Hi Roya. You mentioned that you have a team of students working on all of 452 00:50:29,740 --> 00:50:33,890 these frameworks. I was wondering if your frameworks are open source or available 453 00:50:33,890 --> 00:50:37,760 online for collaboration? And if so, where would those resources be? 454 00:50:37,760 --> 00:50:45,040 Roya: So the data is open. The code hasn't been. One reason is that I'm not very 455 00:50:45,040 --> 00:50:49,090 confident in sharing code. I'm friends with Philipp Winter and Dave Fifield. 456 00:50:49,090 --> 00:50:54,170 These people are pro open source, and they constantly blame me for not sharing. But it really 457 00:50:54,170 --> 00:51:00,721 requires confidence to share code. So we are working on that, at least for Quack. I 458 00:51:00,721 --> 00:51:06,390 think that code can very easily be shared. For Augur, we spent a heck of a lot 459 00:51:06,390 --> 00:51:12,109 of time making production-ready code, and for Satellite I think that is also 460 00:51:12,109 --> 00:51:17,420 ready. I can share them with you personally, but before sharing with the world I want 461 00:51:17,420 --> 00:51:21,560 to actually have another person audit it and make sure we're not using a curse word 462 00:51:21,560 --> 00:51:26,420 or something. I don't know.
It's just me being a little bit 463 00:51:26,420 --> 00:51:31,030 conservative. But I'm happy to: if you send me an e-mail, I'll send you the code. 464 00:51:31,030 --> 00:51:35,640 Question: Thank you. Herald: All right, then let's move to Mic two. 465 00:51:35,640 --> 00:51:39,930 Question: Thanks again for sharing your great vision. I find it really 466 00:51:39,930 --> 00:51:47,470 fascinating. I'm not really a data scientist, but my question is: did you find 467 00:51:47,470 --> 00:51:56,099 any usefulness in your approaches given the spread of the Internet of Things? I 468 00:51:56,099 --> 00:52:06,960 understood that you used routers to make queries, but did you send and maybe receive 469 00:52:06,960 --> 00:52:11,260 back any data from washing machines, toasters...? 470 00:52:11,260 --> 00:52:17,480 Roya: I mean, I know being ethical and trying not to use end-user machines limits 471 00:52:17,480 --> 00:52:22,589 your access a lot. But that's our goal. We are going to stick with 472 00:52:22,589 --> 00:52:28,240 things that don't belong to end users. And so it's all routers and organizational 473 00:52:28,240 --> 00:52:31,940 machines. I want to make sure that whatever we're using belongs to an 474 00:52:31,940 --> 00:52:35,349 entity that can protect itself if something goes wrong. They can just say, 475 00:52:35,349 --> 00:52:39,640 "Hey, this is a freaking router, it receives and sends so many things. I mean, 476 00:52:39,640 --> 00:52:44,740 look, let me show you a TCP (?), for example." A volunteer might not be able 477 00:52:44,740 --> 00:52:49,290 to defend that, because it would already look like they're conspiring and collecting this data. But 478 00:52:49,290 --> 00:52:53,550 good question. I wish I could, but I won't cross that line. 479 00:52:53,550 --> 00:52:57,380 Herald: All right. I don't see any more questions in the room right now, but we 480 00:52:57,380 --> 00:53:01,080 have one from the internet, so please, signal angel.
481 00:53:01,080 --> 00:53:06,510 Signal Angel: Yes. Actually, a question from koli585: I was in an African 482 00:53:06,510 --> 00:53:10,009 country where the internet had been completely shut down. How can I quickly 483 00:53:10,009 --> 00:53:14,709 and safely inform others about the shutdown? 484 00:53:14,709 --> 00:53:21,470 Roya: So while I think local users' voices are highly, highly needed, and they can use 485 00:53:21,470 --> 00:53:27,510 social media like Twitter to say whatever they see, there is a project called IODA. 486 00:53:27,510 --> 00:53:36,869 It's a project at CAIDA, UC San Diego, in the U.S., and Philipp Winter, Alberto 487 00:53:36,869 --> 00:53:43,160 [Dainotti], and Alistair [King] are working on it. They basically keep 488 00:53:43,160 --> 00:53:51,540 track of shutdowns remotely and push them out. If you look up IODA on Twitter, you can 489 00:53:51,540 --> 00:54:02,620 see their live feed of where the shutdowns happen. I haven't 490 00:54:02,620 --> 00:54:09,260 thought about how to reach users to tell them what we see, or how we can 491 00:54:09,260 --> 00:54:18,609 incorporate users' feedback. We are working with a group of researchers who 492 00:54:18,609 --> 00:54:27,000 have already developed tools to collect this data from Twitter and basically use it 493 00:54:27,000 --> 00:54:31,890 as some level of ground truth, but OONI does such a great job that I haven't felt 494 00:54:31,890 --> 00:54:37,220 the need. Herald: All right. Unless the signal angel 495 00:54:37,220 --> 00:54:43,750 has another question? No? Roya: Can I add one thing? So 496 00:54:43,750 --> 00:54:52,940 I was listening to a talk about how sympathetic Iranians versus Arabs were 497 00:54:52,940 --> 00:55:01,040 after the Boston bombing in the United States, and there were a lot of assumptions, and a 498 00:55:01,040 --> 00:55:05,819 lot of conclusions were drawn, like, oh, this... I'm completely paraphrasing.
I don't 499 00:55:05,819 --> 00:55:09,900 remember. But: Iranians don't care, because they didn't tweet as much. So 500 00:55:09,900 --> 00:55:17,060 basically, their input data was a bunch of tweets from around the time of the Boston bombing. 501 00:55:17,060 --> 00:55:21,599 After the talk was over, I said: you know that in this country Twitter has been 502 00:55:21,599 --> 00:55:28,929 blocked, and so many people couldn't tweet. applause 503 00:55:28,929 --> 00:55:33,490 Herald: All right. That concludes our Q&A, so thanks so much, Roya. 504 00:55:33,490 --> 00:55:35,436 Roya: Thank you. 505 00:55:35,436 --> 00:55:41,150 applause 506 00:55:41,150 --> 00:55:45,970 postroll music 507 00:55:45,970 --> 00:56:04,000 Subtitles created by c3subtitles.de in the year 2020. Join, and help us!