35C3 preroll music Herald Angel: All right. It's my very big pleasure to introduce Roya Ensafi to you. She's gonna talk about "Censored Planet: a Global Censorship Observatory". I'm personally very interested in learning more about this project. Sounds like it's gonna be very important. So please welcome Roya with a huge warm round of applause. Thank you. Applause Roya: It's wonderful to finally make it to CCC. I had joined talk with multiple of my friends over the past years and the visa stuff never worked out. This year I applied for a conference in August and the visa worked for coming to CCC. My name is Roya Ensafi and I'm professor at the University of Michigan. My research focuses on security and privacy with the goal of protecting users from adversarial network. So basically I investigate network interference ...and somebody is interfering right now. Damn it. What the heck. Cool, I'm good. Oh, no I'm not. laughter OK. In my lab we develop techniques and systems to be able to detect network interference often at a scale and apply these frameworks and tools to be able to understand the behaviors of these actors that do the interference and use this understanding to be able to come up with a defense. Today I'm going to talk about a project that is very dear to my heart. The one that I spent six years working on it. And in this talk I'm going to talk about censorship, internet censorship. And by that I mean any action that prevents users' access to the requested content. We have heard an alarming level of censorship happening all around the world. And while it was previously multiple countries that were capable of using deep packet inspections to tamper with user traffic thanks to commercialization of these DPIs now many countries are actually messing with users' data. For the first time that the users type CNN.com in their browsers, their traffic is subject to some level of interference by different actors. 
First, for example, the DNS query, where the mapping between the domain and the IP where the content is, can be manipulated. For example, the DNS answer can be a dead IP where the content is not there. If the DNS succeeds, then the users and the servers are going to establish a connection, a TCP handshake, and that can be easily blocked. If that succeeds, then users and servers start actually sending the actual data back and forth, and there is enough clear text, be the traffic encrypted or not, that the DPI can detect a sensitive keyword and send a reset packet to both sides to basically shut down the connection. Before I forget, let me tell you and emphasize that it's not just the governments and the policies they impose on the ISPs that lead to censorship. Actually, the server side, which provides the data, is also blocking users, especially if they are located in a region that doesn't provide any revenue. We recently investigated this issue of geo-blocking in depth and provided more details about what role CDNs actually play. Imagine now how many users, how many ISPs, how many transit networks, and how many websites we have, each of which is going to have its own policies on how to block users' access. Moreover, censorship changes from time to time, region to region, and country to country. And for that reason many researchers, including me, have been interested in collecting data about censorship globally and continuously. Well, I grew up under severe censorship, be it from the university, the government, or, more frustrating, the server side. And I genuinely believe that censorship takes away opportunities and degrades human dignity. It is not just China, Bahrain, and Turkey that do internet censorship. Actually, with the DPIs becoming cheaper and cheaper, many governments are following their lead. As a result, the Internet is becoming more and more balkanized, and users around the world are soon going to have very, very different pictures of what this Internet is. 
And we need to be able to collect the data to know what is being censored, how it's being censored, where it's being censored, and for how long. This data can then be used to bring transparency and accountability to governments or private companies that practice internet censorship. It can help us know where the circumvention, where the defense, needs to be deployed. It can help us let users around the world know what their governments are up to and, more important, provide valid and good data for policymakers to come up with good policies. Existing research already shows that if we can provide this data to users, they act of their own will to ensure Internet freedom. For many years my goal has been to come up with a weather map, a censorship weather map, where you can actually see changes in censorship over time, how some countries are different from others, and do that for a continuous duration of time and for all over the world. Creating such a map was impossible with the techniques, the Internet measurement methods, that we had at that time, and even with the common techniques we use now. The common method for measuring internet censorship is deploying software or giving a customized Raspberry Pi to either a client or a server, and based on that measuring what's happening between clients and servers. Well, this approach has a lot of limitations. For example, there are not that many volunteers around the world who are eager to download software and run it. Second, the data collected with this approach is often not continuous, because the user's connection can die for a variety of reasons or users may lose interest in running the software. And therefore we end up with sparse data where we cannot have a good baseline for internet censorship studies. 
Moreover, measuring domains that are sensitive often creates risks for the local collaborators and might end in retaliation from their governments. These risks are not hypothetical. When the Arab Spring was happening, I was approached by many colleagues to recruit local friends and colleagues in the Middle East to collect measurement data, at a time that was very interesting for capturing the behavior of the network and most dangerous for the locals and volunteers collecting it. My painting actually expresses what I felt at the time. I just can't imagine asking people on the ground to help at these times of unrest. In my opinion, conspiring to collect data against the government's interest can be seen as an act of treason. And these governments are often unpredictable. So it exposes these volunteers to a severe risk. While no one has yet been arrested because of measuring internet censorship as far as we know, and I don't know how we can know that on a global scale, I think the clouds are on the horizon. I'm still in awe of how the Turkish government used their surveillance data at the time of a coup and tracked down and detained hundreds of users because there was traffic between them and ByLock, a messenger app that was used by the coup organizers. These things happen. Before I continue, if you know OONI you might ask how OONI prevents risk. Well, with a great level of effort. And if you don't know OONI, OONI is a global community of volunteers that collects data about censorship around the world. Well, first and foremost they provide their volunteers with very honest consent, telling them "hey, if you run this software, anybody who is monitoring your traffic knows what you're up to." They also go out of their way to give these volunteers the freedom to choose what websites they want to test and what data they want to push. They have established a great relationship with local activist organizations in the countries. 
Well, now that I have proven to you guys that I am a supporter of OONI, and I am actually friends with most of them, I want to emphasize that I still believe that consistent, continuous, and global data about censorship requires a new approach that doesn't need volunteers' help. I've become obsessed with solving this problem. What if we could measure, without a client, anywhere around the world, whether a client can talk to a server, without being close to the client? From somewhere like here, from the University of Michigan, see whether the two hosts can talk to each other, globally and remotely, off the path. When I talked to people about this, honestly, everybody was like "you don't know what you're talking about, it's really, really challenging". Well, they were right. The challenge is there, and I'm going to walk you through it. We have at least 140 million IP addresses that respond to a SYN packet. This means they speak to the world, and they blindly follow the TCP/IP protocol. So the question becomes: how can I leverage the subtle properties of TCP/IP to detect whether two hosts can talk to each other? Well, Spooky Scan is a technique that Jed Crandall from the University of New Mexico and I developed that uses TCP/IP side channels to detect whether two remote hosts can establish a TCP handshake or not, and if not, in which direction the packets are being dropped. Off the path and remotely. And I'm going to start telling you how this works. First I have to cover some background. So any connection that is based on TCP, one of the basic communication protocols we have, needs to establish a TCP handshake. So basically you send a SYN, and in the packet you send, in the IP header, you have a field called "identification", the IP_ID, and this field is used for fragmentation, and I'm going to use this field a lot in the rest of the talk. After the other side receives the SYN, it is going to send a SYN-ACK back, which has another IP_ID in it. 
And then, if I want to establish a connection, I send an ACK. Otherwise I send a RESET (RST). Part of the protocol says that if you send a SYN-ACK packet to a machine, with the port open or closed, it's going to send you a RST, telling you "what the heck, you are sending me a SYN-ACK, I didn't send you a SYN", and another part says: if you send a SYN packet to a machine with the port open, eager to establish a connection, it will send you a SYN-ACK. If you don't do anything, because TCP/IP is reliable, it's going to send you multiple SYN-ACKs. It depends on the operating system: 3, 5, you name it. Spooky Scan requires some basic characteristics. For example, the client, the vantage point that we are interested in, should maintain a global variable for the IP_ID. It means that when they receive packets and want to send a packet out, no matter who they're sending the packet to, this IP_ID is a shared resource, as in it is going to be incremented by one. So by just watching the IP_ID changes you can see how noisy a machine is, how much traffic a machine is sending out. A server should have a port open, let's say 80 or 443, and want to establish a connection, and the measurement machine, me, should be able to spoof packets. That means sending packets with a source IP different from my own machine's. To be able to do that, you need to talk to your upstream network and ask them not to drop the packets. All of these requirements I could easily satisfy with a little bit of effort. A Spooky Scan starts with the measurement machine sending a SYN-ACK packet to one of these clients with a global IP_ID; at that time, let's say, the value is 7000. The client is going to send back a RST, following the protocol, revealing to me the value of the IP_ID. In the next step I'm going to send a spoofed SYN packet to a server using the client's IP. As a result, the SYN-ACK is going to be sent to the client. Again, the client is going to send a RST back, and the IP_ID is going to be incremented by 1. 
Next time I query the IP_ID I'm going to see a jump of two. In a noiseless model, I know that this machine talked to the server. If I query it again, I won't see any extra jump. So, delta 2, delta 1. Now imagine there is a firewall that blocks the SYN-ACKs going from the server to the client. Well, it doesn't matter how much traffic I send, it's not going to get there. So the deltas I see are 1, 1. In the third case the packets are dropped from the client to the server. Well, the SYN-ACK gets to the client, the client is going to send the RST back, but it's not going to get to the server. And so the server thinks that a packet got dropped, so it's going to send multiple SYN-ACKs. And as a result the client sends more RSTs and the IP_ID goes up further. And so the jumps I would see are, let's say, 2, 2. Let me put them all together. So you have 3 cases: blocking in this direction, no blocking, and blocking in the other direction. And you see different jumps, or different deltas. So it's detectable. Yes, yes, in a noiseless model. I know the clients talk to so many others and the IP_ID is going to change for a variety of reasons. I call all of that noise. And this is how we are going to deal with it. Well, thinking intuitively, we can amplify the signal. Instead of sending one spoofed SYN packet we can send n. And for a variety of reasons packets can get dropped, so we need to repeat this measurement. So here is some data from a Spooky Scan where I used the following probing method. For 30 seconds I sent queries for the IP_ID. And then for another 30 seconds I sent these 5 spoofed SYN packets. These are machines, or clients, in Azerbaijan, China, and the United States. And we wanted to check whether they could reach the Tor relay that we had in Sweden. You can see there are different jumps, or different level shifts, that you observe in the second phase. 
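The three noiseless cases can be written down as a tiny simulation. This is only an illustrative sketch under stated assumptions, not the Spooky Scan code: the names `simulate_deltas` and `classify` are hypothetical, one spoofed SYN is sent, exactly one retransmitted SYN-ACK is assumed to arrive between the second and third query, and the client sends no other traffic.

```python
def simulate_deltas(blocked=None, start_ip_id=7000):
    """Return (delta1, delta2) for a client with a global IP_ID counter.

    blocked is None, "server_to_client", or "client_to_server".
    Noiseless model: the client sends nothing except the RSTs counted here.
    """
    ip_id = start_ip_id

    def query():
        # Our SYN-ACK elicits a RST whose IP_ID reveals (and bumps) the counter.
        nonlocal ip_id
        ip_id += 1
        return ip_id

    first = query()
    # Spoofed SYN goes out; the server's SYN-ACK reaches the client
    # unless the server-to-client direction is blocked.
    if blocked != "server_to_client":
        ip_id += 1  # client answers the unexpected SYN-ACK with a RST
    second = query()
    # If the client's RST never reached the server, the server retransmits
    # its SYN-ACK (assumed: once before our next query) and the client RSTs again.
    if blocked == "client_to_server":
        ip_id += 1
    third = query()
    return second - first, third - second


def classify(delta1, delta2):
    """Map the two observed deltas onto the three cases from the talk."""
    cases = {
        (2, 1): "no blocking",
        (1, 1): "server-to-client blocked",
        (2, 2): "client-to-server blocked",
    }
    return cases.get((delta1, delta2), "inconclusive")


if __name__ == "__main__":
    for direction in (None, "server_to_client", "client_to_server"):
        print(direction, classify(*simulate_deltas(direction)))
```

In real measurements the deltas are buried in the client's other traffic, which is why the probing is amplified and repeated.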
And just by visually looking at it, or by using an autoregressive moving average (ARMA) model, you can actually detect that. But there is an insight here, which is that not all the clients have the same level of noise. And for some of them, especially these guys, you could easily detect it after five seconds of sending IP_ID queries and then five seconds of spoofing. So in the follow-up work we tried to use this insight to come up with a scalable and efficient technique that can be used in a global way. And that technique is called "Augur". Well, Augur adopts this probing method. First, for four seconds it queries the IP_ID, then in one second it sends 10 spoofed SYN packets. Then we look at the IP_ID acceleration, or second derivative, and see whether there is a jump, a sudden jump, at the time of perturbation, when we did the spoofing. How confident are we that that jump is the result of our own spoofed packets? Well, I'm not confident, run it again. I think so, run it again, until you have sufficient confidence. It turns out there is a statistical analysis called "sequential hypothesis testing" that can be used to gradually improve our confidence about the case we're detecting. So I'm going to give you a very, very rough overview of how this works. For sequential hypothesis testing we need to define a random variable. And we use the IP_ID acceleration at the time of perturbation, being 1 or 0 based on whether you see a jump or not. We also need to calculate some empirical priors, known probabilities. If you look at everything, what would be the probability that you see a jump when there is actually no blocking? And so on. After we put all this together, we can formalize an algorithm, starting by running a trial. Update the sequence of values for the random variable. Then check whether this sequence of values belongs to the distribution where the blocking happens or not. What's the likelihood of that? 
If we're confident, if we have reached the level that we are satisfied with, then we call it a case. So putting all this together, this is how Augur works. We scan the whole IPv4 space and find global IP_ID machines. And then we have some constraints: Is it a stable machine? Is it noisy, or does it have a level of noise that you can deal with? We also need to figure out what websites we are interested in testing reachability towards, and what countries we are interested in. So after we decide on all the inputs, we run a scheduler, making sure that no client and server are under measurement at the same time, because they would mess up each other's detection. And then we use our analysis to call the case and summarize the results. I started by saying that the common methods have limitations, for example in coverage, continuity, and ethics. Well, when it comes to coverage, there are more than 22 million global IP_ID machines. These are Windows XP or its predecessors, and FreeBSD, for example. Compare that to previous work: one successful project is RIPE Atlas, and they have around 10000 probes globally deployed. When it comes to continuity, we don't depend on the end user, so it's much more reliable. Well, by not asking volunteers to help we are already reducing the risk, because there are no users conspiring against their governments to collect this data. But our approach is also not zero risk. If you look, you have a different kind of risk here: the client and server exchange SYN-ACKs and RSTs without either of them giving consent. And we don't want to ask for consent, because if we do, the dilemma returns. We are back to the same thing as asking volunteers. So, to cope with that, to reduce the risk further, we don't use end IPs. We actually go 2 hops back, to routers that with high probability are infrastructure machines, and use those as vantage points. Even with this harsh constraint we still have 53000 global IP_ID routers. 
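The "run it again until you are confident" loop can be sketched as a textbook sequential probability ratio test over the 0/1 jump observations. This is a hedged illustration, not Augur's implementation: the function name `sequential_test`, the prior probabilities `p_jump_open` and `p_jump_blocked`, and the error rates `alpha` and `beta` are all assumed values chosen for the example, and only one blocking direction (server to client) is modelled.

```python
import math


def sequential_test(trials, p_jump_open=0.9, p_jump_blocked=0.1,
                    alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test on jump / no-jump trials.

    trials: iterable of 1 (jump in IP_ID acceleration seen) or 0 (no jump).
    p_jump_open / p_jump_blocked: assumed empirical priors for seeing a jump
    when the path is open or blocked.
    Returns (verdict, number_of_trials_used).
    """
    upper = math.log((1 - beta) / alpha)  # cross this: decide "blocked"
    lower = math.log(beta / (1 - alpha))  # cross this: decide "not blocked"
    llr = 0.0  # running log-likelihood ratio of "blocked" vs "not blocked"
    used = 0
    for jump in trials:
        used += 1
        if jump:
            llr += math.log(p_jump_blocked / p_jump_open)
        else:
            llr += math.log((1 - p_jump_blocked) / (1 - p_jump_open))
        if llr >= upper:
            return "blocked", used
        if llr <= lower:
            return "not blocked", used
    return "undecided", used


if __name__ == "__main__":
    print(sequential_test([0, 0, 0, 0]))
    print(sequential_test([1, 1, 1, 1]))
    print(sequential_test([1, 0, 1, 0]))
```

The point of the sequential form is that quiet vantage points reach a verdict after a few trials, while noisy ones simply keep being measured.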
To test the framework, to see whether Augur works, we chose 2000 of these global IP_ID machines, uniformly selected from all the countries where we had vantage points. We selected websites from the Citizen Lab test list. This is the research organization at the University of Toronto that crowdsources websites that are potentially being blocked or potentially sensitive. And then we used thousands of the websites from the Alexa top 10k. And then we got Augur running for 17 days and collected this data. One of the challenges we had in validating Augur was: So, what is the truth? What is the ground truth? What would we see that makes sense? And this is the biggest and most fundamental challenge for internet censorship measurement anyway. So the first approach is leaning on intuition, which is: no client should show blocking towards all the websites. No server should show blocking for the bulk of our clients. And if anything like that happens, we just trash it. And we should see more bias towards the sensitive domains versus the ones that are popular. And so on. And also we hoped to replicate the anecdotes, the reports out there. And we did all of those. And that's how we validated Augur. So in the end Augur is a system that is scalable, efficient, and ethical, and can be used to detect TCP/IP blocking continuously. Yes, I know that is just TCP/IP. What about the other layers? Can we measure them remotely as well? Well, let me focus on DNS. You might ask: Is there a way that we can remotely detect DNS poisoning or manipulation? Well, let's think out loud. From now on I'm going to give just the highlights of the papers we worked on, for lack of time. Well, if we scan the whole IPv4 space, we have a lot of open DNS resolvers, which means that they are open to anybody sending them a query to resolve. And these open DNS resolvers can be used as vantage points. We can use open DNS resolvers in different ISPs around the world to see whether DNS queries are poisoned or not. 
Well, wait. We need to make sure that they don't belong to end users. So we came up with a lot of checks to make sure that these open DNS resolvers are organizational, belonging to the ISP or infrastructure. After we do that, we start sending all our queries to these, let's say, open DNS resolvers in an ISP in Bahrain, for all the domains we're interested in, and capture what IPs we receive. The challenge is then to detect what is a wrong answer. And so we have to come up with a set of heuristics. For example, is the response that we received equal to a reply we got from our control measurements, where we know the IP is not blocked or poisoned and the content is there? Or we can actually look at the IP that we received and see whether it has a valid HTTPS certificate, with or without SNI, Server Name Indication. And so on and so forth. So we come up with lots of heuristics to detect wrong answers. The result of all these efforts ended up being a project called "Satellite", which was started by Will Scott. I'm sure he is in the audience somewhere. A great friend of mine and a very good supporter of CensoredPlanet. Selflessly, he has been a miracle, and I had the opportunity and fortune to meet him. We have Satellite. Satellite automates all the steps that I told you about. For this work we use the science that was developed in both of the works. We call it Satellite because of seniority and sticking with the name. So how much coverage does Satellite have? If you scan IPv4, you end up with 4.2 million open DNS resolvers in every country and territory. We actually need to make sure it is ethical, and for that reason we put a harsh condition. We say: let's only use the ones that have a valid PTR record that follows this expression. Basically, let's just use the open DNS resolvers that are name servers, or at least whose PTR record suggests that. This is a really harsh constraint. 
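Two of the checks described above, filtering resolvers by their PTR record and comparing answers against control measurements, might look roughly like the following. This is a sketch under stated assumptions, not Satellite's code: the regular expression is a hypothetical stand-in for the one shown on the slide, and the function names `looks_like_nameserver` and `answer_is_suspicious` are made up for illustration. The certificate check mentioned in the talk is left out.

```python
import re

# Hypothetical pattern: keep resolvers whose PTR name starts with a
# name-server style label such as "ns.", "ns1." or "nameserver2.".
NAMESERVER_PTR = re.compile(r"^(ns|nameserver)\d*\.", re.IGNORECASE)


def looks_like_nameserver(ptr_name):
    """True if the reverse-DNS name suggests an organizational name server."""
    return bool(NAMESERVER_PTR.match(ptr_name))


def answer_is_suspicious(test_ips, control_ips):
    """Flag an answer that shares no IP with the trusted control answers.

    An empty answer is also flagged. This is only a first-pass heuristic:
    CDNs legitimately return different IPs in different regions, so a real
    pipeline needs further checks before calling it manipulation.
    """
    return not (set(test_ips) & set(control_ips))


if __name__ == "__main__":
    print(looks_like_nameserver("ns1.isp.example"))          # name-server style
    print(looks_like_nameserver("dsl-203-0-113-7.example"))   # end-user style
    print(answer_is_suspicious(["192.0.2.10"], ["192.0.2.10", "192.0.2.11"]))
```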
Actually, my students have been adding more and more regular expressions for the ones that we are sure are organizational. But for now, just being this harsh, we have 40k DNS resolvers in almost 169 countries, I guess. So censorship happens in other layers as well. How do we want to deal with that with a remote side channel? And, especially, what about HTTP traffic, or disruption that can happen to, you know, TLS traffic? I hate water. Oh no. Okay. So. scratching noise It's well documented that many DPIs, especially in the Great Firewall of China, monitor the traffic, and when they see a keyword, a sensitive keyword like "Falun Gong", they act and drop the traffic or send a RST. And as I mentioned earlier, there is enough clear text everywhere. Even in TLS handshakes the SNI is in clear text. And for a long time I was trying to come up with a way of detecting application-layer blocking using this fancy side channel. Like, when the client and server need to first establish a TCP handshake, how can the side channel jump in and then detect the rest? We were lucky enough to be pointed to a protocol called "Echo". It's a protocol designed in 1983, and it's for testing reasons; it is a debugging tool, basically. It's a predecessor to ping. And basically, after you establish a TCP handshake to port 7, whatever you send to the Echo server on port 7, it's going to echo it back. Now think about it. How can we use Echo servers to detect application-layer blocking? Well, when it's not available, let's say I have an Echo server in the U.S. and a measurement machine at the University of Michigan. I establish a TCP handshake and I send a GET request using a censored keyword, for example. It's going to send back to me the same thing I sent. But now let's put in a DPI that is going to be triggered by it. Well, for sure, either I'm going to receive a RST first or something else. 
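The Echo idea can be sketched with an ordinary TCP socket: send a request containing the keyword to port 7 and check whether the same bytes come back. This is a minimal sketch, not Quack's implementation: the function names `interpret` and `echo_probe` and the verdict strings are hypothetical, there are no retries or control keywords, and a single failed probe proves nothing by itself.

```python
import socket


def interpret(sent, received):
    """Compare what came back from the Echo server with what was sent."""
    return "echoed" if received == sent else "tampered"


def echo_probe(host, payload, port=7, timeout=5.0):
    """Send payload to an Echo server and classify the outcome.

    Returns "echoed", "tampered", "reset", "timeout" or "error".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(payload)
            received = b""
            # Read until the full payload is back or the peer closes.
            while len(received) < len(payload):
                chunk = sock.recv(4096)
                if not chunk:
                    break
                received += chunk
    except ConnectionResetError:
        return "reset"    # e.g. a middlebox injected a RST
    except socket.timeout:
        return "timeout"  # packets silently dropped
    except OSError:
        return "error"    # refused, unreachable, and so on
    return interpret(payload, received)


if __name__ == "__main__":
    # Placeholder address from the documentation range; replace it with an
    # Echo server you are allowed to test against.
    request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
    print(echo_probe("192.0.2.1", request, timeout=2.0))
```

Separating `interpret` from the socket code keeps the comparison logic easy to check without any network access.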
So we can actually come up with an algorithm that uses Echo servers to detect disruptions on the application layer. Basically keyword blocking, URL blocking. The result of this is a tool called Quack. And Quack uses Echo servers to detect, in a scalable way, whether keywords are being blocked around the world. So what we do is first scan the whole IPv4 space. We find 47k Echo servers running around the world. Then we need to check whether or not they belong to end users. And that was a very challenging part, because there is not a clear signal; 90 percent of them are infrastructure, but there is still some portion of them that we don't know about. So what we do is we look at the Freedom House reports and the countries that are partially open or not open, "not free" or "partly free" as they're called. This is around 50 countries. And for those we randomly select some that we want, and we use Nmap's OS detection. And it will tell us whether it's a server, it's a switch, and so on. We use those. So with the help of so many collaborators, after almost six years we ended up with three systems that can capture TCP/IP blocking, DNS blocking, and application-layer blocking using infrastructure and organizational machines. So while it was a dream, or a vision, that we could come up with a better map and collect this data in a continuous way, thanks to the help of a lot of people, especially my students, Will, and other collaborators, we now have CensoredPlanet. CensoredPlanet collects semi-weekly snapshots of Internet censorship using our vantage points in all the layers and provides this data in a raw format on our website. We also provide some visualizations for people to see how many vantage points we have in each country and so on. Of course, this is the beginning of CensoredPlanet. 
We launched this at August and we have been collecting data for almost four months and we have a long way to go. We have users right now through organizations using our data and helping us debug by finding things that doesn't make sense pointing to us and any of you that ended up using these data, please share your feedback with us and we are very responsive to be able to change it, not as much as you need. They have a collective of very well dedicated people participating. So, now that we have this CensoredPlanet let me give you how it can help when there is a political situation going on. You all must remember around October there Jamal Khashoggi, a Washington Post reporter, disappeared, killed at the Saudi Arabian embassy in Turkey. At the time of this happening there was a lot of media attention and this, this news especially two weeks in become very internationally spread. CensoredPlanet didn't know this event was going to happen. So we have been collecting this data semi-weekly for 2000 domain or so. And so we went back and we checked the Saudi Arabia. Did we see anything interesting? And yes, we saw for example at two weeks in, around October 16, the domains that we were that was news category and media category, the censorship related to those doubled. And let me emphasize, we didn't see like a block or not block over the whole country not all the countries have a homogeneous censorship happening. We saw it in multiple of the ISPs that we had vantage point. Actually I freaked out when one of the activists in Saudi Arabia told us that "I don't see this". And we said "What ISP you are in?" And this wasn't the ISPs that we had vantage point in. So we were looking for hints that "Is anybody else seeing what we were seeing?". And so we ended up seeing there was a commander lab project that also saw around October 16 the number of malwares or whatever they are testing is also doubled or tripled. I don't know the other. 
So something was going on two weeks in, when the news broke. Let me emphasize that the news media I am talking about are the global news media that we check, like the L.A. Times, Fox News, and so on. But we also checked Arab News, which, as the activists told us, is a Saudi Arabian propaganda newspaper. That one was being poisoned in one of the ISPs. So again, censorship measurement is a very complex problem. So where are we heading? Well, having said all that about side channels and the techniques that help us remotely collect this data, I also have to say that the data we collect doesn't replicate the full picture of internet censorship. I mean, having root access on a volunteer's machine to do a detailed test is powerful. So in the next step, in the next year, one of our goals is to join forces with OONI to integrate the data from remote and local measurements to provide the best of both worlds. Also, we have been thinking a lot about what would be good visualization tools that don't end up misrepresenting internet censorship. I literally hate that one. Hate it. The number of vantage points in each country is not equal. We don't know whether all the vantage points that the data resulted from are in one ISP or in all of the ISPs. And then we test domains that are, like, benign and, I don't know, defined based on some western values of freedom of expression. I believe in all of them, but still, culture and economy might play some role. And then we put colors on the map, rank the countries, call some countries awful, and don't give full attention to the others. So something needs to change, and it's on our horizon too, to think about it more deeply. We want to have more statistical tools to spot when the patterns change. We want to be able to compare countries, for example when Telegram was being blocked in Russia. If you remember, millions of IPs were being blocked. If you don't know, go to my friend Leonid's talk about Russia. 
You're going to learn a lot there. But anyway. So when the Russia was blocking Telegram, I said to everyone I bet in the following some other governments are going to jump to block Telegram as well. And that's actually what we heard, rumors like that. So we need to be able to do that automatically. And overall, I want to be able to develop an empirical science of internet censorship based on rich data with the help of all of you. CensoredPlanet is now being maintained by a group of dedicated students, great friends that I have and needs engineers and political scientists to jump on our data and help us to bring meaning to what we are collecting. So if you are a good engineer or a political scientist or a dedicated person who wants to change the world, reach out to me. For as a reference for those of you interested: these are the publications that my talk was based on. And now I am open to questions. applause Herald: Allright, perfect. Thank you so much, Roya, so far. We have some time for questions so if you have a question in the room please go to one of the room microphones one, two, three, four, and five in the very back. And if you're watching the stream you can ask questions to the signal angel via IRC or Twitter and we'll also make sure to relay those to the speaker and make sure those get asked. So let's just go ahead and start with Mic two please. Question: Hey, great talk. Do you worry that by publishing your methods as well as your data that you're going to get a response from governments that are censoring things such that it makes it more difficult for you to monitor what's being censored? Or has that already happened? Roya: It hasn't happened. We have control measures to be able to detect that. But that has been... it's a really good question and often comes up after I present. I can tell you based on my experience it's really hard to synchronize all the ISPs in all the countries to act to the SYN-ACK and RST that I'm sending. 
Like, for example for Augur, this is unsolicited packets and for governments to block that they are going to be a lot of collateral damage. You might say that well, Roya, they're going to block the IP of the University of Michigan. They're a spoofing machine. We have a measure for that. I have multiple places that I actually have a backup if that case happened. But overall this is a global scale measurement, and even in one city or like multiple ISPs you know of it's really hard to synchronize being like blocking something and maintaining. So it is something that's in our mind thinking about. But as as of now it's not a worry. Herald: All right then let's go over to Mic one. Question: Thank you. I wondered, it's kind of similar to this question. What if you are measuring from a country that is blocking? Do you also distribute the measurements over several countries? Roya: Absolutely. Every snapshot that we collect is from all the vantage point we have in like certain countries and portion of vantage point in like China or like US because they have millions of vantage points or like thousands of vantage points. So basically at each snapshot, which takes us three days, we collect the data from all of all of the vantage point. And so let's say that somebody is reacting to us. We have a benign domain that we check as well like for example a domain example.com or random.com. So if we see something going on there we actually double check. But good point, because now our efforts is very manual labor and we're trying to automate everything so it's still a challenge. Thank you. Herald: All right then let's go to Mic three. Question: Hi. Have you measured how much does IP-ID randomization break your probes? Roya: Oh. This is also really good. Let me give a shout out to [name]. He's the guy at 1998 discovered IP-ID or published something that I ended up reading. So like for example Linux or Ubuntu in the U.S. 
version, they randomize it, but there are still these legacy operating systems, like Windows XP and its predecessors, and FreeBSD, that still have a global IP-ID. So one argument that often comes up is: what if all these machines get updated to a new operating system that doesn't maintain a global IP-ID? And I can tell you that, well, we'll come up with another side channel. For now, that works. But my gut feeling is that if it didn't change from 1998 until now, with everybody saying that the global IP-ID variable is a horrible idea, it's not going to change in the coming five years, so we're good. Question: Thank you. Herald: Okay, then let's just move on to Mic four. Question: Thank you very much for the great talk. When you were introducing Augur I was wondering, does the detection of a blockage between client and server necessarily indicate censorship? Because you were talking about validating Augur, I was wondering: if it turns out that there is a false alarm, what do you think could be the potential cause? Roya: You're absolutely right. And I tried to emphasize that what we end up collecting can be seen as a disruption. Something didn't work. The SYN-ACK or RST got disrupted. Is that censorship, or can it be a random packet drop? And the way to establish that confidence is to aggregate the results. Do we see this blocking across multiple routers within that country or within that AS? Because if one of these happened by accident, it just didn't make sense or a packet got dropped, what about the others? So the whole idea, and this is another point that I'm so, so concerned about: most of the reports and anecdotes that we read are based on one VPN or one vantage point in the country. And then a lot of conclusions are drawn out of that. And you can often ask: well, this vantage point might be subject to so many different things other than government censorship.
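The aggregation step described in this answer can be sketched roughly as follows. This is only an illustration under stated assumptions, not Censored Planet's actual code: the record layout, the function name `aggregate_by_as`, and both thresholds are hypothetical, and the AS numbers come from the range reserved for documentation.

```python
from collections import defaultdict


def aggregate_by_as(observations, min_vantage_points=3, threshold=0.8):
    """Flag (asn, domain) pairs where most vantage points saw a disruption.

    observations: iterable of (asn, vantage_point, domain, disrupted) tuples,
    with at most one observation per vantage point and domain.
    The names and both thresholds are hypothetical, chosen for illustration.
    """
    # (asn, domain) -> {vantage_point: True if the probe was disrupted}
    seen = defaultdict(dict)
    for asn, vantage_point, domain, disrupted in observations:
        seen[(asn, domain)][vantage_point] = bool(disrupted)

    flagged = {}
    for key, verdicts in seen.items():
        # A single vantage point can fail for many unrelated reasons,
        # so require several independent ones before concluding anything.
        if len(verdicts) < min_vantage_points:
            continue
        share = sum(verdicts.values()) / len(verdicts)
        if share >= threshold:
            flagged[key] = share
    return flagged


observations = [
    # AS64500: all four vantage points see the disruption.
    (64500, "r1", "blocked.example", True),
    (64500, "r2", "blocked.example", True),
    (64500, "r3", "blocked.example", True),
    (64500, "r4", "blocked.example", True),
    # AS64501: one of four fails, which looks like a random packet drop.
    (64501, "r5", "blocked.example", True),
    (64501, "r6", "blocked.example", False),
    (64501, "r7", "blocked.example", False),
    (64501, "r8", "blocked.example", False),
    # AS64502: only one vantage point, not enough to conclude anything.
    (64502, "r9", "blocked.example", True),
]

flagged = aggregate_by_as(observations)
print(flagged)  # {(64500, 'blocked.example'): 1.0}
```

Only the AS where every vantage point agrees is flagged; the isolated failure and the single-vantage-point case are both left out, which is the distinction between a consistent disruption and noise that the answer draws.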
Also, I emphasized that censorship, as I use it in this talk, is any action that stops users' access to the requested content. I'm trying to get away from semantics where intention is implied. But great question. Herald: All right, then let's go back to Mic one, right. Question: Hi Roya. You mentioned that you have a team of students working on all of these frameworks. I was wondering if your frameworks are open source or available online for collaboration? And if so, where those resources would be? Roya: So the data is open. The code hasn't been. One reason is that I have so little confidence in sharing code. I'm friends with Philipp Winter and Dave Fifield. These people are pro open source and they constantly blame me for not doing it. But it really requires confidence to share code. So we are working on that, at least for Quack. I think that code can very easily be shared. For Augur, we spent a heck of a lot of time to make production-ready code, and for Satellite I think that is also ready. I can share them personally with you, but before sharing with the world I want to have another person audit it and make sure we're not using a curse word or something. I don't know. It's just completely my mind being a little bit conservative. But I'm happy to: if you send me an e-mail, I'll send you the code. Question: Thank you. Herald: All right, then let's move to Mic two. Question: Thanks again for sharing your great vision. I find it really fascinating. I'm not really a data scientist, but my question is: did you find any usefulness for your approaches in the spreading of the Internet of Things? I understood that you used routers to make queries, but did you send and maybe receive back any data from washing machines, toasters,...? Roya: I mean, I know, being ethical and trying not to use end-user machines limits your access a lot. But that's our goal. We are going to stick with things that don't belong to the end users.
And so it's all routers, organizational machines. So I want to make sure that whatever we're using belongs to an entity that can protect itself if something goes wrong. They can just say: "Hey, this is a freaking router, it receives and sends so many things. I mean, look, let me show you a TCP (?), for example." A volunteer might not be able to defend that, because they're already conspiring and collecting this data. But good question. I wish I could, but I won't cross that line. Herald: All right. I don't see any more questions in the room right now. But we have one from the internet, so please, signal angel. Signal Angel: Yes. Actually a question from koli585: I was in an African country where the internet has been completely shut down. How can I quickly and safely inform others about the shutdown? Roya: So while I think local users' voices are highly, highly needed, and they can use social media like Twitter to say whatever, there is a project called IODA. It's a project at CAIDA, at UCSD in the U.S., and Philipp Winter, Alberto [Dainotti] and Alistair [King] are working on that. They basically keep track of shutdowns remotely and push them out. If you look at IODA on Twitter you can see their live feed of where the shutdowns happen. So I haven't thought about how to reach the users to tell them what we see, or how we can incorporate the users' feedback. We are working with a group of researchers that already developed tools to receive this data from Twitter and basically use that as some level of ground truth, but OONI does such a great job that I haven't felt a need. Herald: All right. Unless the signal angel has another question? No? Roya: And let me, can I add one thing? So I was listening to a talk about how Iranians versus Arabs were sympathetic towards the Boston bombing in the United States, and there were a lot of assumptions, and a lot of conclusions were made that, oh, this... I'm completely paraphrasing. I don't remember.
But these Iranians don't care because they didn't tweet as much. So basically their input data was a bunch of tweets around the time of the Boston bombing. After the talk was over I said: you know that in this country Twitter has been blocked, and so many people couldn't tweet. applause Herald: All right. That concludes our Q&A, so thanks so much, Roya. Roya: Thank you. applause postroll music Subtitles created by c3subtitles.de in the year 2020. Join, and help us!