-
35C3 preroll music
-
Herald Angel: All right. It's my very big
pleasure to introduce Roya Ensafi to you.
-
She's gonna talk about "Censored Planet: a
Global Censorship Observatory". I'm
-
personally very interested in learning
more about this project. Sounds like it's
-
gonna be very important. So please welcome
Roya with a huge warm round of applause.
-
Thank you.
-
Applause
-
Roya: It's wonderful to finally make it to
CCC. I had joint talks with multiple of my
-
friends over the past years and the visa
stuff never worked out. This year I
-
applied for a conference in August and the
visa worked for coming to CCC. My name is
-
Roya Ensafi and I'm a professor at the
University of Michigan. My research
-
focuses on security and privacy with the
goal of protecting users from adversarial
-
network. So basically I investigate
network interference ...and somebody is
-
interfering right now. Damn it. What the
heck. Cool, I'm good. Oh, no I'm not.
-
laughter OK. In my lab we develop
techniques and systems to be able to
-
detect network interference often at a
scale and apply these frameworks and tools
-
to be able to understand the behaviors of
these actors that do the interference and
-
use this understanding to be able to come
up with a defense. Today I'm going to talk
-
about a project that is very dear to my
heart. The one that I spent six years
-
working on it. And in this talk I'm going
to talk about censorship, internet
-
censorship. And by that I mean any action
that prevents users' access to the
-
requested content. We have heard an
alarming level of censorship happening all
-
around the world. And while it was
previously multiple countries that were
-
capable of using deep packet inspections
to tamper with user traffic thanks to
-
commercialization of these DPIs now many
countries are actually messing with users'
-
data. From the first moment that the users
type CNN.com in their browsers, their
-
traffic is subject to some level of
interference by different actors. First
-
for example the DNS query where the
mapping between the domain and the IP
-
where the content is, can be manipulated.
For example the DNS answer can be a dead
-
IP where the content is not there. If the
DNS succeeds then the users and the servers
-
are going to establish a connection, TCP
handshake and that can be easily blocked.
-
If that succeeds then users and servers
start actually sending back and forth the
-
actual data and there is enough clear
text, be the traffic encrypted or not,
-
that the DPI can detect a sensitive
keyword and send a reset packet to both,
-
basically shutting down the connection.
Before I forget let me tell you and
-
emphasize that it's not just the
governments and the policies that impose
-
on the ISPs that lead to censorship.
Actually server side which provides the
-
data are also blocking users. Especially
if they are located in a region that they
-
don't provide any revenue. We recently
investigated this issue of geo-blocking
-
in depth and provide more details about
what role CDNs actually provide. Imagine
-
now we have how many users, how many ISPs,
how many transit networks and how many
-
websites. Each of which are going to have
their own policies of how to block users'
-
access. More, censorship changes from time
to time, region to region and country to
-
country. And for that reason many
researchers including me have been
-
interested in collecting data about
censorship in a global way and
-
continuously. Well, I grew up under severe
censorship. Be it the university,
-
government, more frustrating the server
side. And I genuinely believe that
-
censorship takes away opportunities and
degrades human dignity. It is not just
-
China, Bahrain, Turkey that does internet
censorship. Actually with DPIs becoming
-
cheaper and cheaper many governments are
following their leads. As a result
-
Internet is becoming more and more
balkanized and the users around the world
-
are going to soon have a very very
different pictures of what this Internet
-
is. And we need to be able to collect the
data and to be able to know what is being
-
censored, how it's being censored, where
it's being censored and for how long. This
-
data then can be used to bring
transparency and accountability to
-
governments or private companies that
practice internet censorship. It can help
-
us to know where the circumvention to,
where the defense needs to be deployed. It
-
can help us to let the users around the
world to know what their governments are
-
up to and more important provide valid and
good data for the policymakers to come up
-
with the good policies. Existing research
already shows that if we can provide this
-
data to users they act by their own will
to ensure Internet freedom. For many years
-
my goal has been to come up with a weather
map, a censorship weather map where you
-
can actually see changes in censorship
over time, how some countries are
-
different from others and do that for a
continuous duration of time, and for all
-
over the world. Creating such a map was
impossible with the techniques, Internet
-
measurement methods that we had at that
time. At the time and even the common
-
techniques we now use. The measurement
methods to be able to use for measuring
-
internet censorship is often by deploying
a software or giving your customized
-
Raspberry Pi to either a client or a
server and based on that measure what's
-
happening between client and servers.
Well, this approach has a lot of
-
limitations. For example there are not
that many volunteers around the whole
-
world that are eager to download a
software and run it. Second, the data
-
collected from this approach are often not
continuous because the user's connection
-
can die for a variety of reasons or users
may lose interest in running the
-
software. And therefore we end up with
sparse data where we cannot have a good
-
baseline for internet censorship studies.
Moreover, measuring domains that are sensitive
-
often create risks for the local
collaborators and might end up with their
-
governments retaliating. These risks are
not hypothetical. When the Arab Spring was
-
happening I was approached by many
colleagues to recruit local friends and
-
colleagues in Middle East to be able to
collect measurement data at the time that
-
was very interesting to capture the
behavior of the network and most dangerous
-
for the locals, and volunteers to collect
that. My painting actually expressed what
-
I felt at the time. I can't just imagine
asking people on the ground to help at
-
these times of unrest. In my opinion,
conspiring to collect the data against the
-
government's interest can be seen as an
act of treason. And these governments are
-
unpredictable often. So it has exposed
these volunteers to a severe risk. While
-
no one has yet been arrested because of
measuring internet censorship as far as we
-
know, and I don't know how we can know
that on a global scale, I think the clouds
-
are on the horizon. I'm still in awe how
Turkish government used their surveillance
-
data at the time of a coup and tracked down
and detained hundreds of users because
-
there was traffic between them and
ByLock, a messenger app that was used by
-
coup administrators. These things happen.
Before I continue, if you know OONI you
-
might ask how OONI prevents risk. Well,
with a great level of efforts. And if you
-
don't know OONI, OONI is a global
community of volunteers that collect data
-
about censorship around the world. Well,
first and foremost they provide their
-
volunteers with the very honest consent,
telling them that "hey, if you run this
-
software, anybody who is monitoring your
traffic knows what you're up to." They also
-
go out of their way to give freedom to
these volunteers to choose what website
-
they want to run, what data they want to
push. They establish a great relationship
-
with the local activist organization in
the countries. Well, now that I prove to
-
you guys that I am a supporter of OONI and
I am actually friends with most of them; I
-
want to emphasize that I still believe
that consistent and continuous and global
-
data about censorship requires a new
approach that doesn't need volunteers'
-
help. I've become obsessed with solving
this problem. What if we could measure
-
without a client, in anywhere around the
world, can talk to a server without being
-
close to a client. Somewhere from here,
from University of Michigan. And see
-
whether the two hosts can talk to each
other, globally and remotely, off the
-
path. When I talk to the people about
this, honestly, everybody was like "you
-
don't know what you're talking about, it's
really really challenging". Well, they
-
were right. The challenge is there, and
I'm going to walk you through it. We have
-
at least 140 million IP addresses that
respond to a SYN packet. This means they
-
speak to the world, and they follow
blindly TCP/IP protocol. So the question
-
becomes: how can I leverage the subtle
properties of TCP/IP to be able to detect
-
that two hosts can talk to each other?
Well, Spooky Scan is a technique that Jed
-
Crandall from University of New Mexico and
I developed that uses TCP/IP side channels
-
to be able to detect whether the two
remote hosts can establish a TCP handshake
-
or not, and if not, in which direction the
packets are being dropped. Off the path
-
and remotely. And I'm gonna start telling
you how this works. First I have to cover
-
some background. So any connection that is
based on TCP, one of the basic
-
communication protocols we have,
needs to establish a TCP handshake. So
-
basically you send a SYN and
in the packet you send, in the IP header,
-
you have a field called "identification
IP_ID", and this field is used for
-
fragmentation reason, and I'm going to use
this field a lot in the rest of the talk.
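As a small illustration of where this field lives, here is a minimal sketch that packs a bare 20-byte IPv4 header and reads the 16-bit identification value back out. The helper names (`build_ipv4_header`, `parse_ip_id`, `ipid_delta`) are made up for this example, the checksum is left at zero, and the addresses are documentation addresses; it is not code from the talk.

```python
import struct


def build_ipv4_header(ip_id: int, src: bytes, dst: bytes) -> bytes:
    """Pack a minimal 20-byte IPv4 header (checksum left at 0 for brevity)."""
    version_ihl = (4 << 4) | 5          # IPv4, header length of 5 32-bit words
    total_length = 40                   # 20 bytes IP + 20 bytes TCP, illustrative
    flags_fragment = 0
    ttl, protocol, checksum = 64, 6, 0  # protocol 6 = TCP
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_length, ip_id, flags_fragment,
        ttl, protocol, checksum, src, dst,
    )


def parse_ip_id(header: bytes) -> int:
    """The identification field is the 16-bit value at byte offset 4."""
    (ip_id,) = struct.unpack_from("!H", header, 4)
    return ip_id


def ipid_delta(previous: int, current: int) -> int:
    """Difference between two observed IP_IDs, allowing for 16-bit wraparound."""
    return (current - previous) & 0xFFFF


header = build_ipv4_header(7000, bytes([192, 0, 2, 1]), bytes([198, 51, 100, 2]))
print(parse_ip_id(header))   # the value packed above
print(ipid_delta(65535, 1))  # counter wrapped around between two observations
```

Because the field is only 16 bits wide, any comparison of two observed values has to be done modulo 65536, which is what `ipid_delta` does.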
-
After the receiver gets the SYN, it is going
to send a SYN-ACK back, with another IP_ID
-
in it. And then, if I want to establish a
connection I send ACK. Otherwise I send a
-
RESET (RST). Part of the protocol says
that if you send a SYN-ACK packet to a
-
machine with a port open or closed, it's
going to send you a RST, telling you "what
-
the heck you are sending me SYN-ACK, I
didn't send you a SYN" and another part
-
said: if you send a SYN packet to a
machine with the port open, eager to
-
establish connection, it will send you a
SYN-ACK. If you don't do anything, because
-
TCP/IP is reliable, it's going to send you
multiple SYN-ACKs. It depends on operating
-
system, 3, 5, you name it. Spooky Scan
requires some basic characteristics. For
-
example, the client, the vantage points
that we are interested, should maintain a
-
global variable for the IP_ID. It means
that, when they receive the packets and
-
they want to send a packet out, no matter
who they're sending the packet to, this
-
IP_ID is going to be a shared resource, as
in it is going to be incremented by one. So by
-
just watching the IP_ID changes you can
see how much a machine is noisy, how much
-
a machine is sending traffic out. A server
should have a port open, let's say 80 or
-
443, and wants to establish a connection,
and the measurement machine, me, should be
-
able to spoof packets. It means sending
packet with the source IP different from
-
my own machine. To be able to do that, you
need to talk to upstream network and ask
-
them not to drop the packets. All of these
requirements I could easily satisfy with a
-
little bit of effort. A Spooky Scan starts
with the measurement machine sending a SYN-ACK
-
packet to one of these clients with a global
IP_ID, at a time let's say the value is
-
7000. The client is going to send back a
RST, following the protocol, revealing to
-
me the value of the IP_ID. In the next
step I'm going to send a spoofed SYN
-
packet to a server using a client IP. As a
result, the SYN-ACK is going to be sent to
-
the client. Again, client is going to send
a RST back, the IP_ID is going to be
-
incremented by 1. Next time I query IP_ID
I'm going to see a jump of two. In a
-
noiseless model, I know that this machine
talked to the server. If I query it again,
-
I won't see any jump. So, Delta 2, Delta
1. Now imagine there is a firewall that
-
blocks the SYN-ACKs going from the server
to the client. Well, it doesn't matter how
-
much of the traffic I send, it's not going
to get there. It's not going to get there.
-
So the delta I see is 1, 1. In the third
case when the packets are going to be
-
dropped from the client to the server:
Well, my SYN gets there. The SYN-ACK
-
gets to the client, the client is going to
send the RST back, but it's not going to
-
get to the server. And so server thinks
that a packet got dropped, so it's going
-
to send multiple SYN-ACKs. And as a result
the client sends more RSTs and the IP_ID goes up more. And
-
so what jump I would see is, let's say, 2,
2. Let me put them all together. So you
-
have 3 cases. Blocking in this direction.
No blocking and blocking in the other. And
-
you see different jumps or different
deltas. So it's detectable. Yes, yes, in a
-
noiseless model. I know the clients talk
to so many others and the IP_ID is going
-
to be changed because of a variety of
reason. I call all of those noise. And
-
this is how we are going to deal with it.
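Before noise enters the picture, the three noiseless cases described above can be reproduced with a toy simulation. This is only a sketch under stated assumptions: the client is modelled as a single global counter starting at 7000, the names (`Client`, `spooky_scan`, the blocking labels) are invented for this example, and exactly one retransmitted SYN-ACK is assumed to arrive between the second and third IP_ID query.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Client:
    """A host with one global IP_ID counter shared by all outgoing packets."""
    ip_id: int = 7000

    def send_packet(self) -> int:
        sent = self.ip_id
        self.ip_id = (self.ip_id + 1) & 0xFFFF
        return sent


def query_ip_id(client: Client) -> int:
    # Measurement machine sends a SYN-ACK; the client's RST reveals its IP_ID.
    return client.send_packet()


def spooky_scan(blocking: str) -> Tuple[int, int]:
    """Return the two IP_ID deltas seen around one spoofed SYN.

    blocking is "none", "server_to_client" or "client_to_server".
    """
    client = Client()
    first = query_ip_id(client)

    # Spoofed SYN reaches the server, which answers the client with a SYN-ACK.
    retransmission_pending = False
    if blocking != "server_to_client":
        client.send_packet()  # client's RST for the unexpected SYN-ACK
        # If that RST is dropped, the server retries its SYN-ACK later.
        retransmission_pending = blocking == "client_to_server"

    second = query_ip_id(client)
    if retransmission_pending:
        client.send_packet()  # RST for the retransmitted SYN-ACK (assumed timing)
    third = query_ip_id(client)
    return (second - first) & 0xFFFF, (third - second) & 0xFFFF


CASES = {
    (2, 1): "no blocking",
    (1, 1): "server-to-client blocked",
    (2, 2): "client-to-server blocked",
}

for scenario in ("none", "server_to_client", "client_to_server"):
    deltas = spooky_scan(scenario)
    print(scenario, deltas, CASES[deltas])
```

In this model each scenario leaves a different pair of deltas, which is the property the technique relies on.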
Well, intuitively thinking we can amplify
-
the signal. We can actually instead of
sending one spoofed SYN packet we can send
-
n. And for a variety of reasons packets
can get dropped. So we need to repeat this
-
measurement. So here is some data from a
Spooky Scan where I used the following
-
probing method. For 30 seconds I
sent queries for the IP_ID. And then
-
for another 30 seconds I send these 5
spoofed SYN packets. These are machines or
-
clients in Azerbaijan, China and United
States. And we wanted to check whether it
-
has reached the Tor relay that we had in
Sweden. You can see there are different
-
jumps or different level shifts that you
observe in the second phase. And just
-
visually looking at it or using auto-
regressive moving average or ARMA you
-
can actually detect that. But there is an
insight here, which is that not all the
-
clients have the same level of noise. And
for which, for some of them, especially
-
these guys, you could easily detect after
five seconds of sending IP_ID queries and then
-
five seconds of spoofing. So in the
follow-up work we tried to use this
-
insight, to be able to come up with a
scalable and efficient technique to be
-
able to use it in a global way. And that
technique is called "Augur". Well Augur
-
adopts this probing method. First, for four
seconds it queries IP_ID, then in one
-
second sends 10 spoofed SYN-packets. Then
look at the IP_ID-acceleration or second
-
derivative, and see whether we see a jump,
a sudden jump at the time of perturbation,
-
when we did the spoofing. How confident we
are that that jump is the result of our
-
own spoofed packet? Well, I'm not
confident, run it again. I think so, run
-
it again, until you have a sufficient
confidence. It turns out there is a
-
statistical analysis called "sequential
hypothesis testing" that can be used to be
-
able to gradually improve our confidence
about the case we're detecting. So I'm
-
going to give you a very, very rough
overview of how this works. But for
-
sequential hypothesis testing we need to
define a random variable. And we use
-
IP_ID-acceleration at the time of
perturbation, being 1 or 0, based on you
-
see jump or not. We also need to calculate
some empirical priors, known
-
probabilities. If you look at everything,
what would be the probability that you see
-
jump when there is actually no blocking?
And so on. After we put all this together
-
then we can formalize an algorithm
starting by running a trial. Update the
-
sequence of values for the random
variables. Then check whether this
-
sequence of values belongs to the
distribution of where the blocking happen
-
or not. What's the likelihood of that? If
you're confident, if we reached the level
-
that we are satisfied, then we call it a
case. So putting all this together this is
-
how Augur works. We scan the whole IPv4,
find global IP_ID-machines. And then we
-
have some constraint that is it a stable
machine? Is it a noisier or have a noise
-
that you want to deal with? We also need
to figure out what website are we
-
interested to test reachability towards?
What countries we are? So after we decide
-
all the input then we run a scheduler
making sure that no client and server are
-
under measurement at the same time
because they mess up each other's detection.
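The sequential hypothesis testing step described a moment ago can be sketched as a Wald-style sequential probability ratio test over the 0/1 "jump seen" variable. Everything numeric here is an assumption for illustration: the priors `p_jump_open` and `p_jump_blocked`, the error rates `alpha` and `beta`, and the function names are not taken from Augur itself. The sketch treats a jump at perturbation time as evidence that the server's SYN-ACKs reached the client.

```python
import math
from typing import Iterable, List, Tuple


def ipid_acceleration(samples: List[int]) -> List[int]:
    """Second difference of IP_ID samples, with 16-bit wraparound on the deltas."""
    deltas = [(b - a) & 0xFFFF for a, b in zip(samples, samples[1:])]
    return [later - earlier for earlier, later in zip(deltas, deltas[1:])]


def sprt(
    observations: Iterable[int],
    p_jump_open: float = 0.9,      # assumed P(jump | not blocked)
    p_jump_blocked: float = 0.1,   # assumed P(jump | blocked), i.e. noise only
    alpha: float = 0.01,           # tolerated false "blocked" rate
    beta: float = 0.01,            # tolerated false "not blocked" rate
) -> Tuple[str, int]:
    """Run trials until the log-likelihood ratio crosses a threshold."""
    upper = math.log((1 - beta) / alpha)   # cross it: decide "blocked"
    lower = math.log(beta / (1 - alpha))   # cross it: decide "not blocked"
    llr = 0.0
    trials = 0
    for trials, jump in enumerate(observations, 1):
        if jump:
            llr += math.log(p_jump_blocked / p_jump_open)
        else:
            llr += math.log((1 - p_jump_blocked) / (1 - p_jump_open))
        if llr >= upper:
            return "blocked", trials
        if llr <= lower:
            return "not blocked", trials
    return "undecided", trials


# Four quiet samples, then ten spoofed SYNs show up as a burst in the counter.
print(ipid_acceleration([100, 101, 102, 103, 104, 115, 116]))
print(sprt([0, 0, 0]))
print(sprt([1, 1, 1]))
print(sprt([1, 0, 1, 0]))
```

The point of the sequential form is that a quiet client can be decided in a handful of trials, while a noisy one simply keeps being measured until the evidence is strong enough or a trial budget runs out.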
-
And then we actually use our analysis to
be able to call the case and summarize the
-
results. I started by saying that the
common methods have this limitation, for
-
example coverage continuity and ethics.
Well, when it comes to coverage there are
-
more than 22-million global IP_ID-
machines. These are WindowsXP or
-
predecessors. And FreeBSDs for
example. Compared to the previous work,
-
one successful project is the RIPE-atlas,
and they have around 10000 probes globally
-
deployed. When it comes to continuity we
don't depend on the end user. So it's much
-
more reliable to use this. Well, by not
asking volunteers to help we were already
-
reducing the risk. Because there is no
users conspiring against their governments
-
to collect this data. But our approach is
also not zero risk. If you look you have a
-
different kind of risk here. The client
and server exchanging SYN-ACK and RST
-
without each of them giving a consent. And
we don't want to ask for consent. Because
-
if you do, the dilemma exists. We have to
go back and it's just the same as
-
asking volunteers. So, to deal with that
and cope with that, to reduce the risk
-
more, we don't use end-IPs. We actually
use 2 hops back, routers which with high
-
probability they are infrastructure
machines and use those as a vantage point.
-
Even in this harsh constraint we still
have 53000 global IP_ID-routers. To test
-
the framework to see whether Augur
works we chose 2000 of these global IP_ID-
-
machines, uniformly selected from all the
countries where we had vantage points. We
-
selected websites from Citizen Lab
Test List. This is the research
-
organization at the University of Toronto where
they crowdsourced websites that are
-
potentially being blocked or potentially
sensitive. And then we used thousands of
-
the websites from Alexa top-10k. And then
we get the Augur running for 17 days and
-
collect this data. One of the challenges
that we have to validate Augur was like:
-
So, what is the truth? What is the ground-
truth? What would we see that makes sense?
-
So, and this is the biggest and
fundamental challenge for internet-
-
censorship anyway. But so the first
approach is leaning on intuition, which is
-
like no client should show blocking
towards all the websites. No server should
-
show blocking for bulk of our clients. And
if anything happens like that we just
-
trash it. And we should see more bias
towards the sensitive domain versus the
-
ones that are popular. And so on. And also
we hope to replicate the anecdotes, the
-
reports out there. And we did all of
those. And that's how we validate Augur.
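The first two intuition checks can be written down as a simple filter over the measurement results. This is a rough sketch, not the project's validation code: the result layout, the function name `sanity_filter`, and the 80 percent cut-off for "the bulk of our clients" are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Tuple

Results = Dict[Tuple[str, str], bool]  # (client, site) -> blocked?


def sanity_filter(results: Results, bulk_fraction: float = 0.8) -> Results:
    """Drop clients blocked towards every site and sites blocked for most clients."""
    by_client = defaultdict(list)
    by_site = defaultdict(list)
    for (client, site), blocked in results.items():
        by_client[client].append(blocked)
        by_site[site].append(blocked)

    # A client that appears blocked towards everything is more likely broken.
    bad_clients = {c for c, flags in by_client.items() if all(flags)}
    # A site that appears blocked for the bulk of clients is more likely down.
    bad_sites = {
        s for s, flags in by_site.items()
        if sum(flags) / len(flags) >= bulk_fraction
    }
    return {
        key: blocked for key, blocked in results.items()
        if key[0] not in bad_clients and key[1] not in bad_sites
    }


measurements = {
    ("A", "x"): True,  ("A", "y"): True,  ("A", "z"): True,   # A: blocked everywhere
    ("B", "x"): False, ("B", "y"): False, ("B", "z"): True,
    ("C", "x"): True,  ("C", "y"): False, ("C", "z"): True,   # z: blocked for everyone
}
print(sanity_filter(measurements))
```

Only the pairs that survive both checks are kept for the later comparison between sensitive and popular domains.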
-
So at the end Augur is a system that is
scalable and efficient, ethical and can be
-
used to detect TCP/IP-blocking
continuously. Yes I know that is just
-
TCP/IP. What about the other layers? Can
we measure them remotely as well? Well,
-
let me focus on the DNS. You might ask: Is
there a way that we can remotely detect
-
DNS poisoning or manipulation? Well let's
think it out loud. From now on I'm gonna
-
give just the highlights of the papers we
worked on, for lack of time. Well, if we
-
scan the whole IPv4 we have a lot of open
DNS resolvers, which means that they are
-
open to anybody sending a query to them to
resolve. And these open DNS-resolvers can
-
be used as a vantage point. We can use
open DNS-resolvers in different ISPs
-
around the world to see whether that DNS
queries are poisoned or not. Well, wait.
-
We need to make sure that they don't
belong to the end user. So we come up with
-
a lot of checks to make sure that these
open DNS-resolvers are organizational,
-
belonging to the ISP or infrastructure.
After we do that then we start sending all
-
our queries to these, let's say, open DNS-
resolvers in the ISP in Bahrain, for all
-
the domain we're interested. And capture
what we receive what IPs we receive. The
-
challenge is then to detect what is the
wrong answer. And so we have to come up
-
with a lot of heuristics. A set of
heuristics. For example the response that
-
we received is that equal to a reply we
got from our control measurements, where
-
we know the IP is not blocked or poisoned
or something. The content is there. Or we
-
can actually look at the IP that we
received and see whether it has a valid
-
HTTPS cert, with or without the SNI, the
Server Name Indication, or something.
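The two heuristics just mentioned can be combined into a small decision function. This is a simplified sketch: the function name, the labels, and the idea of passing in a precomputed set of IPs that served a valid certificate are assumptions for illustration, and the addresses are documentation addresses rather than real measurements.

```python
from typing import Iterable, Set


def classify_dns_answer(
    answer_ips: Iterable[str],
    control_ips: Set[str],
    ips_with_valid_cert: Set[str],
) -> str:
    """Label one resolver's answer for one domain.

    control_ips: answers seen from control measurements for the same domain.
    ips_with_valid_cert: IPs that presented a valid certificate for the domain
    (assumed to have been checked separately).
    """
    answers = set(answer_ips)
    if not answers:
        return "no answer"
    if answers & control_ips:
        return "matches control"
    if answers <= ips_with_valid_cert:
        return "valid certificate"
    return "possibly manipulated"


control = {"192.0.2.10"}
valid_cert = {"198.51.100.7"}
print(classify_dns_answer(["192.0.2.10"], control, valid_cert))
print(classify_dns_answer(["198.51.100.7"], control, valid_cert))
print(classify_dns_answer(["203.0.113.1"], control, valid_cert))
```

An answer is only flagged when every heuristic fails to explain it, which keeps CDN-style answers that differ from the control but still serve the right certificate from being miscounted.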
-
And so on so forth. So we come up with
lots of heuristics to detect wrong
-
answers. The results of all these efforts
ended up being a project called
-
"Satellite", which was started by Will
Scott. I'm sure he is in the audience
-
somewhere. A great friend of mine and very
good supporter of CensoredPlanet.
-
Selflessly, he has been a miracle that I
had the opportunity and fortune to meet
-
him. We have Satellite. Satellite automates
all the steps that I told you. For this
-
work we use the science developed in both
of the works. We call it Satellite because
-
of seniority and sticking with the name. So
how much coverage does Satellite have? If you
-
scan IPv4 you end up with 4.2 million open
DNS-resolvers in every country and
-
territory. We actually
need to make sure we are
-
ethical. For that reason we put a harsh
condition. We say that let's only use the
-
ones whose valid PTR record
follows this expression. Basically let's
-
just use the open DNS-resolvers that are
name servers or at least their PTR record
-
suggests that. This is a really harsh
constraint. Actually, my students have
-
been adding more and more regular
expression for the ones that we are sure
-
they are organizational. But for now just
being this harsh we have 40k DNS-
-
resolvers in almost 169 countries I guess.
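A filter of this kind can be sketched with a regular expression over PTR names. The expression used by the project is not given in the talk, so the pattern below is a hypothetical stand-in that accepts names starting with "ns", "dns" or "nameserver" followed by optional digits; the function name is also made up for the example.

```python
import re
from typing import Optional

# Hypothetical pattern: the real expression is not stated in the talk.
NAMESERVER_PTR = re.compile(r"^(ns|dns|nameserver)[-0-9]*\.", re.IGNORECASE)


def looks_like_name_server(ptr_name: Optional[str]) -> bool:
    """True if a resolver's PTR record suggests it is an organizational name server."""
    return bool(ptr_name) and NAMESERVER_PTR.match(ptr_name) is not None


for name in ("ns1.example.net", "DNS.example.org", "dsl-203-0-113-7.example.net", None):
    print(name, looks_like_name_server(name))
```

Being this strict throws away many usable resolvers, but anything that looks like a home connection never makes it into the vantage point list.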
So censorship happened in other layers as
-
well. How do we want to deal with that
remote channel, with the remote side
-
channel? And, especially, like, what about
http traffic or disruption that can happen
-
to you know TLS centric. I hate water.
Oh no. Okay. So. So it's scratching
-
noise it's well documented that many DPIs
especially in the Great Firewall of China monitor
-
the traffic and when they see a keyword,
a sensitive keyword like "Falun Gong",
-
they act and drop traffic or send a RST.
And as I mentioned earlier there is
-
enough clear text everywhere. Even in TLS
handshakes SNI is in clear text. And for a
-
long time I was trying to come up with a
way of detecting application layer using
-
this fancy side channel. Like, how can I
detect that when the client and server
-
need to first establish a TCP handshake,
how the side channel can jump in and then
-
detect the rest? We were lucky enough that
the end pointed to a protocol called
-
"Echo". It's a protocol designed in 1983
and it's for testing reasons, for the
-
debu..it is a debugging tool, basically.
It's a predecessor to ping. And basically,
-
after you establish a TCP handshake to
port 7, whatever you send the Echo servers
-
on port 7 it's gonna echo it back. Now
think about it. How we can use Echo
-
servers to be able to detect application
layer blocking? Well, when it's not
-
available, let's say I have an Echo server
in the U.S. and a measurement machine in
-
the University of Michigan I establish a
TCP handshake and I send a GET request
-
to... using a censored keyword for
example. It's gonna get back to me the
-
same thing I sent. But now let's put the
DPI that is gonna be triggered by it.
-
Well, for sure, either I'm going to
receive a RST first or something else. So
-
we can actually come up with an algorithm
to be able to use Echo servers to detect
-
disruptions on application layer.
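The idea can be sketched with plain sockets: connect to an Echo server on port 7, send something that looks like an HTTP request carrying the keyword, and compare what comes back. This is a minimal sketch, not Quack itself: the function names, the labels, and the choice to put the keyword in the Host header are assumptions for illustration, and it should only be pointed at servers you are allowed to test.

```python
import socket
from typing import Optional


def build_probe(keyword_domain: str) -> bytes:
    """An HTTP-looking payload carrying the keyword in the Host header."""
    return f"GET / HTTP/1.1\r\nHost: {keyword_domain}\r\n\r\n".encode("ascii")


def classify_echo(sent: bytes, received: Optional[bytes], reset: bool = False) -> str:
    """Compare what came back from the Echo server with what was sent."""
    if reset:
        return "disrupted: connection reset"
    if received is None:
        return "disrupted: no reply"
    if received == sent:
        return "echoed: no interference seen"
    return "disrupted: reply differs"


def probe_echo_server(host: str, payload: bytes, port: int = 7,
                      timeout: float = 5.0) -> str:
    """Send payload to an Echo server and classify the outcome."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable: TCP handshake failed"
    received = b""
    with conn:
        try:
            conn.sendall(payload)
            while len(received) < len(payload):
                chunk = conn.recv(4096)
                if not chunk:      # peer closed the connection early
                    break
                received += chunk
        except ConnectionResetError:
            return classify_echo(payload, None, reset=True)
        except socket.timeout:
            return classify_echo(payload, received or None)
        except OSError:
            return classify_echo(payload, None)
    return classify_echo(payload, received or None)


probe = build_probe("example.com")
print(classify_echo(probe, probe))
print(classify_echo(probe, None, reset=True))
```

A control payload with a benign domain sent to the same Echo server helps separate keyword-triggered disruption from a server that is simply flaky.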
Basically keyword blocking, URL
-
blocking. Results of this is a tool called
Quack. And Quack actually uses Echo
-
servers to be able to detect in a scalable
way whether keywords are
-
being blocked around the world. So what
we did is first scan the whole IPv4. We
-
find 47k Echo servers running around the
world. Then we need to be able to check
-
whether or not they belong to the end
users. And that was a very challenging
-
part because there is not a clear signal
as it's.. there are 90 percent of them are
-
infrastructure but there is still some
portion of them that we don't know. So
-
what we do is we look at the FreedomHouse
reports and the countries that are
-
partially open or not open, not free or
partially free what they're called. This
-
is around 50... This is around 50
countries. And for those we use... we
-
randomly select some that we want and we
use OS detection of Nmap. And if you have,
-
it will give us back it's a server, it's a
switch and so on. We use those. So with
-
the help of so many collaborators after
almost six years we end up with three
-
systems that can capture TCP/IP blocking,
DNS, and application layer blocking using
-
infrastructure and organizational
machines. So while it was, it was a dream
-
or a vision that we can come up with a
better map to collect this data in a
-
continuous way, thanks to help of a lot of
people especially my students, Will, and
-
other collaborators we now have
CensoredPlanet. CensoredPlanet collects
-
semi-weekly snapshots of Internet
censorship using our vantage point in all
-
the layers and provide this data in a raw
format now in our web site. We also
-
provide some visualization way for people
to be able to see how many vantage points
-
we have in each country and so on. Of
course, this is the beginning of
-
CensoredPlanet. We launched this in August
and we have been collecting data for
-
almost four months and we have a long way
to go. We have users right now through
-
organizations using our data and helping
us debug by finding things that don't
-
make sense pointing to us and any of you
that ended up using these data, please
-
share your feedback with us and we are
very responsive to be able to change it,
-
not as much as you need. They have a
collective of very well dedicated people
-
participating. So, now that we have this
CensoredPlanet let me give you how it can
-
help when there is a political situation
going on. You all must remember around
-
October there Jamal Khashoggi, a
Washington Post reporter, disappeared,
-
killed at the Saudi Arabian embassy in
Turkey. At the time of this happening
-
there was a lot of media attention and
this news, especially two weeks in,
-
became very internationally spread.
CensoredPlanet didn't know this event was
-
going to happen. So we have been
collecting this data semi-weekly for 2000
-
domain or so. And so we went back and we
checked Saudi Arabia. Did we see
-
anything interesting? And yes, we saw for
example at two weeks in, around October
-
16, for the domains in the news
category and media category, the
-
censorship related to those doubled. And
let me emphasize, we didn't see like a
-
block or not block over the whole country.
Not all the countries have a homogeneous
-
censorship happening. We saw it in
multiple of the ISPs that we had vantage
-
point. Actually I freaked out when one of
the activists in Saudi Arabia told us that
-
"I don't see this". And we said "What ISP
you are in?" And this wasn't the ISPs that
-
we had vantage point in. So we were
looking for hints that "Is anybody else
-
seeing what we were seeing?". And so we
ended up seeing there was a commander
-
lab project that also saw around October
16 the number of malwares or whatever they
-
are testing is also doubled or tripled. I
don't know the other. So something was
-
going on two weeks in when the news broke.
Let me emphasize this news media that I am
-
talking about are the global news media
that we check like L.A. Times, Fox News
-
and so on. But we also checked Arab News
which, as the activists told us, is a
-
Saudi Arabian propaganda newspaper. That
in one of the ISPs was being poisoned. So
-
again, censorship measurement is very
complex problem. So where are we heading?
-
Well, having said that about side channels
and the techniques that help us remotely
-
collect this data I have to also say that
the data we collect doesn't replicate the
-
picture of the internet censorship. I mean
having root access on a volunteer's
-
machine to do a detailed test is powerful.
So in the next step, in the next year, one
-
of our goals is to join forces with OONI to
integrate the data from remote and
-
basically local measurements to provide
the best of both worlds. Also, we have
-
been thinking a lot about what would be a
good visualization tool that doesn't end
-
up misrepresenting internet censorship. I
literally hate that one. Hate it. The
-
number of vantage point in countries are
not equal. We don't know whether all the
-
vantage points that the data has resulted
from it is from one ISP or all of our
-
ISPs. And then we test domains that are
like benign and like I don't know defined
-
based on some western values of the
freedom of expression. I believe in all of
-
them but still culture, economy might play
some role. And then we put colors on
-
the map, rank the countries, call some
countries awful and not giving full
-
attention to the others. So something
needs to be changed and it's in our
-
horizon too. Think about it more deeply.
We want to be able to have more statistical
-
tools to be able to spot when the patterns
change. We want to be able to compare the
-
countries when for example Telegram was
being blocked in Russia. If you remember
-
millions of IPs being blocked. If you
don't know, go to my friend Leonid's talk
-
about Russia. You're going to learn a lot
there. But anyway. So when Russia was
-
blocking Telegram, I said to everyone I
bet in the following some other
-
governments are going to jump to block
Telegram as well. And that's actually what
-
we heard, rumors like that. So we need to
be able to do that automatically. And
-
overall, I want to be able to develop an
empirical science of internet censorship
-
based on rich data with the help of all of
you. CensoredPlanet is now being
-
maintained by a group of dedicated
students, great friends that I have and
-
needs engineers and political scientists
to jump on our data and help us to bring
-
meaning to what we are collecting. So if
you are a good engineer or a political
-
scientist or a dedicated person who wants
to change the world, reach out to me.
-
As a reference for those of you
interested: these are the publications
-
that my talk was based on.
And now I am open to questions.
-
applause
-
Herald: All right, perfect. Thank you so
much, Roya, so far. We have some time for
-
questions so if you have a question in the
room please go to one of the room
-
microphones one, two, three, four, and
five in the very back. And if you're
-
watching the stream you can ask questions
to the signal angel via IRC or Twitter and
-
we'll also make sure to relay those to the
speaker and make sure those get asked. So
-
let's just go ahead and
start with Mic two please.
-
Question: Hey, great talk. Do you worry
that by publishing your methods as well as
-
your data that you're going to get a
response from governments that are
-
censoring things such that it makes it
more difficult for you to monitor what's
-
being censored? Or has
that already happened?
-
Roya: It hasn't happened. We have control
measures to be able to detect that. But
-
that has been... it's a really good
question and often comes up after I
-
present. I can tell you based on my
experience it's really hard to synchronize
-
all the ISPs in all the countries to act
to the SYN-ACK and RST that I'm sending.
-
Like, for example for Augur, these are
unsolicited packets and for governments to
-
block that there is going to be a lot of
collateral damage. You might say that
-
well, Roya, they're going to block the IP
of the University of Michigan. They're a
-
spoofing machine. We have a measure for
that. I have multiple places that I
-
actually have a backup if that case
happened. But overall this is a global
-
scale measurement, and even in one city or
like multiple ISPs you know of it's really
-
hard to synchronize being like blocking
something and maintaining. So it is
-
something that's in our mind thinking
about. But as of now it's not a worry.
-
Herald: All right then let's
go over to Mic one.
-
Question: Thank you. I wondered, it's kind
of similar to this question. What if you
-
are measuring from a country that is
blocking? Do you also distribute the
-
measurements over several countries?
Roya: Absolutely. Every snapshot that we
-
collect is from all the vantage points we
have in like certain countries and a portion
-
of the vantage points in like China or like US
because they have millions of vantage
-
points or like thousands of vantage
points. So basically at each snapshot,
-
which takes us three days, we collect the
data from all of the vantage points.
-
And so let's say that somebody is reacting
to us. We have a benign domain that we
-
check as well like for example a domain
example.com or random.com. So if we see
-
something going on there we actually
double check. But good point, because now
-
our effort is very manual labor and we're
trying to automate everything so it's
-
still a challenge. Thank you.
Herald: All right then let's go to Mic
-
three.
Question: Hi. Have you measured how much
-
IP-ID randomization
breaks your probes?
-
Roya: Oh. This is also really good. Let me
give a shout out to [name]. He's the guy
-
who in 1998 discovered IP-ID or published
something that I ended up reading. So like
-
for example Linux or Ubuntu in the U.S.
version they randomized it but there are still
-
these legacy operating systems like
Windows XP and predecessors and FreeBSD
-
that still have a global IP-ID. So one
argument that often comes up is, what if
-
all these machines get updated to the new
operating system where it doesn't
-
maintain a global IP-ID? And I can tell you
that, well, we'll come up with another
-
side channel. For now, that works. But my
gut feeling is that if it didn't change
-
from 1998 until now with all the things
that everybody says that global IP-ID
-
variable is a horrible idea, it's not going
to change in the coming five years so
-
we're good.
Question: Thank you.
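The exchange above turns on whether a machine still keeps one global, incrementing IP-ID counter or randomizes the field. A minimal sketch of how that could be checked from a series of observed IP-ID values follows; the function names and the `max_step` threshold are assumptions made for illustration and are not taken from the talk or from the Augur code.

```python
from typing import List, Sequence

IPID_MODULUS = 1 << 16  # the IPv4 identification field is 16 bits wide


def ipid_deltas(ipids: Sequence[int]) -> List[int]:
    """Forward differences between consecutive IP-ID samples, modulo 2**16."""
    return [(b - a) % IPID_MODULUS for a, b in zip(ipids, ipids[1:])]


def looks_like_global_counter(ipids: Sequence[int], max_step: int = 64) -> bool:
    """Heuristic: every step is a small positive increment, wraparound allowed.

    max_step is an assumed bound on how many other packets the machine
    sends between two of our probes; a randomized IP-ID will almost
    always produce at least one step far above it.
    """
    deltas = ipid_deltas(ipids)
    # Require a few samples so a single lucky pair does not pass.
    return len(deltas) >= 3 and all(0 < d <= max_step for d in deltas)
```

A machine that passes this check is a candidate for the side channel; one that fails it is simply skipped.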
-
Herald: Okay, then let's just
move on to Mic four.
-
Question: Thank you very much for the
great talk. When you were introducing
-
Augur I was wondering, does the detection
of the blockage between client and server
-
necessarily indicate censorship? So,
because you were talking about validating
-
Augur I was wondering if it turns out that
there is like a false alarm. What do you
-
think could be the potential cause?
Roya: You're absolutely right. And I tried
-
to emphasize that what we end up
collecting can be seen as a disruption.
-
Something didn't work. The SYN-ACK or RST
got disrupted. Is that
-
censorship, or can it be a random packet
drop? And the way to be able to establish
-
that confidence is to
aggregate the results. Do we see this
-
blocking between multiple of the routers
within that country or within that AS?
-
Because if one of these is by accident,
that just didn't make sense or didn't get
-
dropped, what about the others? So the
whole idea and this is another point that
-
I'm so so concerned about: Most of the
reports and anecdotes that we read are based
-
on one VPN or one vantage point in the
country. And then there are a lot of
-
conclusions drawn out of that. And you often
can ask: well, this vantage point might
-
be subject to so many different things
other than a government's censorship. Also I
-
emphasized that the censorship that I use
in this talk is any action that stops
-
users' access to the requested
content. I'm trying to get away from the
-
semantics of the intention implied.
But great question.
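The aggregation idea in this answer can be sketched as follows: a single disrupted measurement may be random packet loss, so a site is only flagged for an AS when most of the vantage points in that AS agree. The record layout, the function names, and both thresholds are assumptions for illustration, not the project's actual pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (autonomous system, vantage point id, disrupted?) for ONE tested site.
Observation = Tuple[str, str, bool]


def summarize_by_as(observations: Iterable[Observation]) -> Dict[str, Tuple[int, int]]:
    """Map each AS to (disrupted vantage points, total vantage points)."""
    seen = defaultdict(set)
    disrupted = defaultdict(set)
    for asn, vantage_point, is_disrupted in observations:
        seen[asn].add(vantage_point)
        if is_disrupted:
            disrupted[asn].add(vantage_point)
    return {asn: (len(disrupted[asn]), len(seen[asn])) for asn in seen}


def consistently_disrupted(
    observations: Iterable[Observation],
    min_vantage_points: int = 3,   # assumed: ignore ASes with too few probes
    min_fraction: float = 0.8,     # assumed: share of probes that must agree
) -> List[str]:
    """ASes where enough independent vantage points saw the disruption."""
    summary = summarize_by_as(observations)
    return sorted(
        asn
        for asn, (bad, total) in summary.items()
        if total >= min_vantage_points and bad / total >= min_fraction
    )
```

With these defaults, an AS with one lone disrupted router is not reported, which matches the concern about drawing conclusions from a single vantage point.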
-
Herald: All right, then let's go back to
Mic one right.
-
Question: Hi Roya. You mentioned that you
have a team of students working on all of
-
these frameworks. I was wondering if your
frameworks were open source or available
-
online for collaboration? And if so, where
those resources would be?
-
Roya: So the data is open. The code hasn't
been. One reason is I have so little
-
confidence in sharing code, like I'm
friends with Philipp Winter, Dave Fifield.
-
These people are pro open source and they
constantly blame me for not. But it really
-
requires confidence to share code. So we
are working on that at least for Quack. I
-
think the code can very easily be
shared. For Augur, we spent a heck of an amount
-
of time to make production-ready code
and for Satellite I think that is also
-
ready. I can share them personally with
you but before sharing to the world I want
-
to actually have another person audit it
and make sure we're not using a curse word
-
or something. I don't know. It's just
completely my mind being a little bit
-
conservative. But if you send me an
e-mail, I'm happy to send you the code.
-
Question: Thank you.
Herald: All right then move to Mic two.
-
Question: Thanks again for sharing your
great vision. I find it really
-
fascinating. Also I'm not really a data
scientist but my question is: did you find
-
any usefulness in your approaches in
the spreading of the Internet of Things? I
-
understood that you used routers to make
queries but did you send and maybe receive
-
back any data from
washing machines, toasters,...?
-
Roya: I mean, I know, being ethical and
trying to not use end user machines limits
-
your access a lot. But that's
our goal. We are going to stick with
-
things that don't belong to the end users.
And so it's all routers, organizational
-
machines. So I want to make sure that
whatever we're using belongs to an
-
entity that can protect itself if
something went wrong. They can just say
-
"Hey this is a freaking router, it
receives and sends so many things." I mean,
-
look, let me show you a TCP (?),
for example. A volunteer might not be able
-
to defend that because it's already
conspiring and collecting this data. But
-
good questions, I wish I could
but I won't pass that line.
-
Herald: All right. I don't see any more
questions in the room right now. But we
-
have one from the internet
so please, signal angel.
-
Signal Angel: Yes. Actually a question
from koli585: I was in an African
-
country where the internet has been
completely shut down. How can I quickly
-
and safely inform others
about the shut down?
-
Roya: So while I think local users' values
are highly highly needed they can use
-
social media like Twitter to send and say
whatever, there is a project called IODA.
-
It's a project at CAIDA at UCSD in
the U.S. and Philipp Winter, Alberto
-
[Dainotti] and Alistair [King] are working
on that. They basically remotely keep
-
track of shutdowns and push them out. If
you look at IODA on Twitter you can
-
see their live feed of
where the shutdowns happen. So I haven't
-
thought about how to reach out to the users
telling them what we see or how we can
-
incorporate the users' feedback. We are
working with a group of researchers that
-
already developed tools to receive this
data from Twitter and basically use that
-
as some level of ground truth, but OONI
does such a great job that I haven't felt
-
a need.
Herald: Alright. Unless the signal angel
-
has another question? No?
Roya: And let me, can I add one thing? So
-
I was listening to a talk about how
Iranians versus Arabs were sympathetic
-
towards the Boston bombing in the United States
and there were a lot of assumptions and a
-
lot of conclusions were made that, oh
this, I'm completely paraphrasing. I don't
-
remember. But these Iranians don't care
because they didn't tweet as much. So
-
basically their input data was a bunch of
tweets around the time of the Boston bombing.
-
After the talk was over I said: you know
that in this country Twitter has been
-
blocked and so many people couldn't tweet.
applause
-
Herald: Alright. That concludes our Q&A,
so thanks so much Roya.
-
Roya: Thank you.
-
applause
-
postroll music
-
Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!