35C3 preroll music
Herald Angel: All right. It's my very big
pleasure to introduce Roya Ensafi to you.
She's gonna talk about "Censored Planet: a
Global Censorship Observatory". I'm
personally very interested in learning
more about this project. Sounds like it's
gonna be very important. So please welcome
Roya with a huge warm round of applause.
Thank you.
Applause
Roya: It's wonderful to finally make it to
CCC. Over the past years I had planned joint talks
with several of my friends, but the visa
never worked out. This year I
applied for a visa for a conference in August, and that
visa also worked for coming to CCC. My name is
Roya Ensafi and I'm a professor at the
University of Michigan. My research
focuses on security and privacy, with the
goal of protecting users from adversarial
networks. So basically I investigate
network interference ...and somebody is
interfering right now. Damn it. What the
heck. Cool, I'm good. Oh, no I'm not.
laughter OK. In my lab we develop
techniques and systems to detect
network interference, often at
scale, and we apply these frameworks and tools
to understand the behavior of
the actors that do the interference, and
use this understanding to come
up with defenses. Today I'm going to talk
about a project that is very dear to my
heart, one that I have spent six years
working on. In this talk I'm going
to talk about censorship, Internet
censorship, and by that I mean any action
that prevents users' access to
requested content. We have heard of an
alarming level of censorship happening all
around the world. While previously only a
few countries were
capable of using deep packet inspection (DPI)
to tamper with user traffic, thanks to the
commercialization of DPI, many
countries are now messing with users'
data. From the moment users
type CNN.com into their browsers, their
traffic is subject to some level of
interference by different actors. First,
for example, the DNS query, which maps the
domain to the IP
where the content is hosted, can be manipulated.
For example, the DNS answer can be a dead
IP where the content is not hosted. If the
DNS lookup succeeds, the user and the server
establish a connection with a TCP
handshake, and that can easily be blocked.
If that succeeds, the user and the server
start sending the actual data back and forth,
and whether the traffic is encrypted or not,
there is enough cleartext
that a DPI box can detect a sensitive
keyword and send reset packets to both ends
to shut down the connection.
Before I forget, let me
emphasize that it's not just
governments, and the policies they impose
on ISPs, that lead to censorship.
The server side, which provides the
data, also blocks users, especially
if they are located in a region that
doesn't generate any revenue. We recently
investigated this issue of geoblocking
in depth and provided more details about
what role CDNs actually play. Imagine
how many users, how many ISPs,
how many transit networks and how many
websites there are, each of which
has its own policies for blocking users'
access. Moreover, censorship changes from time
to time, region to region and country to
country. For that reason many
researchers, including me, have been
interested in collecting data about
censorship globally and
continuously. Well, I grew up under severe
censorship, be it from the university,
the government, or, more frustrating, the server
side. And I genuinely believe that
censorship takes away opportunities and
degrades human dignity. It is not just
China, Bahrain and Turkey that practice Internet
censorship. As DPI becomes
cheaper and cheaper, many governments are
following their lead. As a result the
Internet is becoming more and more
balkanized, and users around the world
are soon going to have very, very
different pictures of what the Internet
is. We need to be able to collect
data to know what is being
censored, how it's being censored, where
it's being censored and for how long. This
data can then be used to bring
transparency and accountability to
governments or private companies that
practice Internet censorship. It can help
us know where circumvention tools and
defenses need to be deployed. It
can help users around the
world know what their governments are
up to, and, more importantly, provide valid,
good data for policymakers to come up
with good policies. Existing research
already shows that if we provide this
data to users, they act of their own will
to ensure Internet freedom. For many years
my goal has been to come up with a weather
map, a censorship weather map, where you
can see changes in censorship
over time and how some countries are
different from others, and to do that
continuously and all
over the world. Creating such a map was
impossible with the Internet
measurement techniques that we had at that
time, and even with the common
techniques we use now. The usual method
for measuring Internet censorship is to deploy
software, or give a customized
Raspberry Pi, to either a client or a
server, and based on that measure what's
happening between clients and servers.
Well, this approach has a lot of
limitations. First, there are not
that many volunteers around the
world who are eager to download
software and run it. Second, the data
collected with this approach is often not
continuous, because the user's connection
can die for a variety of reasons, or users
may lose interest in running the
software. Therefore we end up with
sparse data where we cannot establish a good
baseline for Internet censorship studies.
Moreover, measuring sensitive domains
often creates risks for the local
collaborators, who might face their
government's retaliation. These risks are
not hypothetical. When the Arab Spring was
happening, I was approached by many
colleagues to recruit local friends and
colleagues in the Middle East to
collect measurement data, at a time that
was very interesting for capturing the
behavior of the network and most dangerous
for the locals and volunteers collecting
it. My painting expresses what
I felt at the time. I just can't imagine
asking people on the ground to help in
such times of unrest. In my opinion,
conspiring to collect data against the
government's interest can be seen as an
act of treason, and these governments are
often unpredictable. So it would expose
these volunteers to severe risk. While
no one has yet been arrested for
measuring Internet censorship, as far as we
know, and I don't know how we could know
that on a global scale, I think clouds
are on the horizon. I'm still in awe of how
the Turkish government used its surveillance
data at the time of the coup and tracked down
and detained hundreds of users because
there was traffic between them and,
by luck, a messenger app that was used by the
coup plotters. These things happen.
Before I continue: if you know OONI, you
might ask how OONI mitigates this risk. Well,
with a great deal of effort. And if you
don't know OONI, OONI is a global
community of volunteers that collects data
about censorship around the world.
First and foremost, they give their
volunteers a very honest informed consent,
telling them, "hey, if you run this
software, anybody who is monitoring your
traffic knows what you're up to." They also
go out of their way to give these
volunteers the freedom to choose which websites
they want to test and what data they want to
push. They have established great relationships
with local activist organizations in
these countries. Well, now that I've proven to
you that I am a supporter of OONI, and
I am actually friends with most of them, I
want to emphasize that I still believe
that consistent, continuous and global
data about censorship requires a new
approach that doesn't need volunteers'
help. I became obsessed with solving
this problem. What if we could measure,
without a client, whether a host anywhere in the
world can talk to a server, without being
close to that client? Measure from here,
from the University of Michigan, and see
whether two hosts can talk to each
other, globally, remotely and off the
path. When I talked to people about
this, honestly, everybody was like "you
don't know what you're talking about, it's
really, really challenging". Well, they
were right. The challenge is there, and
I'm going to walk you through it. We have
at least 140 million IP addresses that
respond to SYN packets. This means they
talk to the world, and they blindly follow the
TCP/IP protocol. So the question
becomes: how can I leverage subtle
properties of TCP/IP to detect
whether two hosts can talk to each other?
Well, Spooky Scan is a technique that Jed
Crandall from the University of New Mexico and
I developed that uses TCP/IP side channels
to detect whether two
remote hosts can establish a TCP handshake
or not, and if not, in which direction the
packets are being dropped, off the path
and remotely. I'm going to start by telling
you how this works. First I have to cover
some background. Any connection
based on TCP, one of the basic
communication protocols we have,
needs to establish a TCP handshake. So
basically you send a SYN, and
in the packet you send, in the IP header,
there is a field called "identification",
or IP_ID. This field is used for
fragmentation, and I'm going to use
this field a lot in the rest of the talk.
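To make the IP_ID field concrete, here is a minimal Python sketch (not part of the talk) that packs and parses a 20-byte IPv4 header with the standard struct module. The helper names build_ipv4_header, read_ip_id and ip_id_delta are hypothetical, and the checksum is left at zero because the sketch only illustrates where the 16-bit identification field sits and how successive readings are compared across wraparound.

```python
import socket
import struct

# IPv4 header (RFC 791), 20 bytes without options:
# version/IHL, TOS, total length, identification (IP_ID),
# flags/fragment offset, TTL, protocol, checksum, source, destination.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def build_ipv4_header(ip_id: int, src: str, dst: str, total_length: int = 40) -> bytes:
    """Pack a minimal IPv4 header for a TCP segment (checksum left at 0)."""
    version_ihl = (4 << 4) | 5  # IPv4, header length of 5 x 32-bit words
    return IPV4_HEADER.pack(version_ihl, 0, total_length, ip_id, 0, 64,
                            socket.IPPROTO_TCP, 0,
                            socket.inet_aton(src), socket.inet_aton(dst))


def read_ip_id(packet: bytes) -> int:
    """Return the 16-bit identification field (header bytes 4-5)."""
    return IPV4_HEADER.unpack_from(packet)[3]


def ip_id_delta(earlier: int, later: int) -> int:
    """Increment between two IP_ID readings, allowing for 16-bit wraparound."""
    return (later - earlier) % 65536


header = build_ipv4_header(7000, "192.0.2.1", "198.51.100.2")
print(read_ip_id(header))     # 7000
print(ip_id_delta(65535, 1))  # 2
```

Because the field is only 16 bits wide, differences between readings are taken modulo 65536; the rest of the technique only ever looks at such differences.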
After the server receives the SYN, it is going
to send a SYN-ACK back, with its own IP_ID
in it. Then, if I want to establish the
connection, I send an ACK; otherwise I send a
RESET (RST). Part of the protocol says
that if you send a SYN-ACK packet to a
machine, whether the port is open or closed, it's
going to send you a RST, telling you "what
the heck, why are you sending me a SYN-ACK? I
didn't send you a SYN." Another part
says: if you send a SYN packet to a
machine with an open port, eager to
establish a connection, it will send you a
SYN-ACK. If you don't respond, because
TCP is reliable, it's going to send you
multiple SYN-ACKs; how many depends on the operating
system, 3, 5, you name it. Spooky Scan
requires some basic characteristics. For
example, the client, the vantage point
that we are interested in, should maintain a
global counter for the IP_ID. That means
that whenever it receives packets and
sends a packet out, no matter
who it's sending the packet to, the
IP_ID is a shared resource that
is incremented by one. So just by
watching the IP_ID change you can
see how noisy a machine is, how much
traffic a machine is sending out. The server
should have a port open, let's say 80 or
443, willing to establish a connection,
and the measurement machine, me, should be
able to spoof packets, meaning send
packets with a source IP different from
my own machine's. To be able to do that, you
need to talk to your upstream network and ask
them not to drop those packets. I could
satisfy all of these requirements with a
little bit of effort. A Spooky Scan starts
with the measurement machine sending a SYN-ACK
packet to one of these clients with a global
IP_ID, whose value at that time is, let's say,
7000. The client is going to send back a
RST, following the protocol, revealing to
me the value of its IP_ID. In the next
step I send a spoofed SYN
packet to a server using the client's IP. As a
result, the SYN-ACK is going to be sent to
the client. Again, the client is going to send
a RST back, and the IP_ID is going to be
incremented by 1. The next time I query the IP_ID
I'm going to see a jump of 2. In a
noiseless model, I then know that this machine
talked to the server. If I query it again,
I won't see any extra jump. So: delta 2, delta
1. Now imagine there is a firewall that
blocks the SYN-ACKs going from the server
to the client. Well, it doesn't matter how
much traffic I send, it's not going
to get there. It's not going to get there.
So the deltas I see are 1, 1. In the third
case, the packets are
dropped from the client to the server.
Well, my spoofed SYN gets to the server, the SYN-ACK
gets to the client, and the client is going to
send the RST back, but it's not going to
get to the server. So the server thinks
that a packet got dropped, and it's going
to send multiple SYN-ACKs. As a result
the client sends even more RSTs, and
the jumps I would see are, let's say, 2,
2. Let me put them all together. You
have 3 cases: blocking in one direction,
no blocking, and blocking in the other. And
you see different jumps, or different
deltas, so it's detectable. Yes, yes, in a
noiseless model. I know clients talk
to many other hosts, and the IP_ID is going
to change for a variety of
reasons. I call all of that noise, and
this is how we are going to deal with it.
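Before turning to noise, the three noiseless cases above can be reproduced with a toy simulation. This is a sketch under stated assumptions, not the Spooky Scan implementation: GlobalIpIdHost, spooky_scan_deltas and classify are hypothetical names, the server is not modelled explicitly, only one retransmitted SYN-ACK is assumed in the client-to-server case, and there is no background traffic.

```python
class GlobalIpIdHost:
    """Toy client whose IP_ID is one counter shared by all outgoing packets."""

    def __init__(self, ip_id: int = 7000) -> None:
        self.ip_id = ip_id

    def send_rst(self) -> int:
        """Send a RST to anyone; each outgoing packet consumes one IP_ID."""
        self.ip_id = (self.ip_id + 1) % 65536
        return self.ip_id


def spooky_scan_deltas(case: str) -> tuple:
    """Return the two IP_ID deltas seen by the measurement machine (noiseless).

    case: "none", "server_to_client" (SYN-ACKs dropped) or
    "client_to_server" (RSTs dropped, so the server retransmits once).
    """
    client = GlobalIpIdHost()
    first = client.send_rst()   # our SYN-ACK probe; the RST reveals the IP_ID
    # Spoofed SYN (source = client) reaches the server, which answers with a SYN-ACK.
    if case in ("none", "client_to_server"):
        client.send_rst()       # the SYN-ACK reaches the client, which sends a RST
    second = client.send_rst()  # second probe
    if case == "client_to_server":
        client.send_rst()       # RST never reached the server: retransmitted SYN-ACK
    third = client.send_rst()   # third probe
    return (second - first) % 65536, (third - second) % 65536


def classify(deltas: tuple) -> str:
    d1, d2 = deltas
    if d1 >= 2 and d2 == 1:
        return "no blocking"
    if d1 == 1 and d2 == 1:
        return "server-to-client blocking"
    if d1 >= 2 and d2 >= 2:
        return "client-to-server blocking"
    return "inconclusive"


for case in ("none", "server_to_client", "client_to_server"):
    deltas = spooky_scan_deltas(case)
    print(case, deltas, classify(deltas))
```

Real clients add unrelated packets between probes, so a single delta of 2 proves little on its own; that is why the signal is amplified and the measurement repeated.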
Well, intuitively, we can amplify
the signal: instead of
sending one spoofed SYN packet, we can send
n. And packets can get dropped for a variety of
reasons, so we need to repeat this
measurement. So here is some data from a
Spooky Scan where I used the following
probing method. For 30 seconds I
only sent queries for the IP_ID, and then
for another 30 seconds I also sent these 5
spoofed SYN packets. These are machines, or
clients, in Azerbaijan, China and the United
States, and we wanted to check whether they
could reach a Tor relay that we had in
Sweden. You can see that there are different
jumps, or different level shifts, that you
observe in the second phase. Just
by visually looking at it, or by using an auto-
regressive moving average (ARMA) model, you
can detect that. But there is an
insight here, which is that not all
clients have the same level of noise,
and for some of them, especially
these ones, you could easily detect it after
five seconds of sending IP_ID queries and then
five seconds of spoofing. So in the
follow-up work we tried to use this
insight to come up with a
scalable and efficient technique that we could
use globally. That
technique is called "Augur". Augur
adopts this probing method: first, for four
seconds it queries the IP_ID, then in one
second it sends 10 spoofed SYN packets. Then it
looks at the IP_ID acceleration, or second
derivative, to see whether there is a jump,
a sudden jump at the time of the perturbation,
when we did the spoofing. How confident are we
that this jump is the result of our
own spoofed packets? Well, if I'm not
confident, run it again. If I think so, run
it again, until you have sufficient
confidence. It turns out there is a
statistical technique called "sequential
hypothesis testing" that can be used to
gradually improve our confidence
about the case we're detecting. I'm
going to give you a very, very rough
overview of how this works. For
sequential hypothesis testing we need to
define a random variable, and we use the
IP_ID acceleration at the time of
perturbation, being 1 or 0 depending on whether
you see a jump or not. We also need to calculate
some empirical priors, known
probabilities. Looking at all the data,
what is the probability that you see a
jump when there is actually no blocking?
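The per-trial random variable described above can be sketched as follows. This is an illustration under assumptions, not Augur's code: the function names are hypothetical, the readings are assumed to be one IP_ID sample per second (four seconds of queries, then the sample right after the spoofing second), the jump threshold of 5 is an invented value, and the prior is simply the fraction of jumps over trials on pairs whose status is already known.

```python
from typing import List


def ip_id_acceleration(readings: List[int]) -> int:
    """Second difference of per-second IP_ID readings at the last interval.

    readings[-1] is taken just after the spoofing second, so the last velocity
    covers the perturbation and the one before it serves as the baseline.
    """
    if len(readings) < 3:
        raise ValueError("need at least three readings")
    velocity = [(b - a) % 65536 for a, b in zip(readings, readings[1:])]
    return velocity[-1] - velocity[-2]


def trial_outcome(readings: List[int], threshold: int) -> int:
    """The 0/1 random variable: 1 if the acceleration shows a jump."""
    return 1 if ip_id_acceleration(readings) >= threshold else 0


def empirical_prior(outcomes: List[int]) -> float:
    """Estimate P(jump) from trials on pairs whose status is already known."""
    return sum(outcomes) / len(outcomes)


# Four seconds of IP_ID queries, then one second with 10 spoofed SYNs.
reachable = [100, 102, 104, 106, 108, 120]
blocked = [100, 102, 104, 106, 108, 110]
print(trial_outcome(reachable, threshold=5), trial_outcome(blocked, threshold=5))  # 1 0
```

Using the second difference rather than the raw IP_ID value means a host with a steady background rate still shows an acceleration near zero when nothing reaches it.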
And so on. After we put all this together,
we can formalize an algorithm:
run a trial, update the
sequence of values for the random
variable, then check whether this
sequence of values belongs to the
distribution where blocking happens
or not. What's the likelihood of that? If
we're confident, if we have reached a level
we are satisfied with, then we call the
case. Putting all this together, this is
how Augur works. We scan the whole IPv4 space
and find machines with a global IP_ID. Then we
apply some constraints: is it a stable
machine? Is it noisy, or does it have noise
that we can deal with? We also need
to figure out which websites we are
interested in testing reachability towards,
and which countries. After we decide
all the inputs, we run a scheduler,
making sure that no client or server is
part of two measurements at the same time,
because they would mess up each other's detection.
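The decision loop described above (run a trial, update the sequence, check the likelihood, stop once confident) has the shape of Wald's sequential probability ratio test. The sketch below is a textbook SPRT over 0/1 jump outcomes, not Augur's implementation; p_reachable and p_blocked stand in for the empirical priors, and the values 0.7 and 0.1 used in the example, as well as the default error rates, are invented for illustration.

```python
import math
from typing import Iterable


def sprt(outcomes: Iterable[int], p_reachable: float, p_blocked: float,
         alpha: float = 0.01, beta: float = 0.01, max_trials: int = 100) -> str:
    """Wald's sequential probability ratio test over 0/1 jump outcomes.

    p_reachable / p_blocked: P(jump) when the pair can / cannot communicate.
    alpha: tolerated rate of calling a blocked pair reachable.
    beta: tolerated rate of calling a reachable pair blocked.
    outcomes may be a generator, so a trial is only run if still needed.
    """
    upper = math.log((1 - beta) / alpha)   # at or above: accept "reachable"
    lower = math.log(beta / (1 - alpha))   # at or below: accept "blocked"
    llr = 0.0
    for n, jump in enumerate(outcomes, start=1):
        if jump:
            llr += math.log(p_reachable / p_blocked)
        else:
            llr += math.log((1 - p_reachable) / (1 - p_blocked))
        if llr >= upper:
            return "reachable"
        if llr <= lower:
            return "blocked"
        if n >= max_trials:
            break
    return "undecided"


print(sprt([1, 1, 1], p_reachable=0.7, p_blocked=0.1))  # reachable
print(sprt([0] * 5, p_reachable=0.7, p_blocked=0.1))    # blocked
```

With these illustrative priors, three jumps in a row are enough to call a pair reachable, while five trials without a jump are needed to call it blocked; clearer evidence ends the test sooner, which is what keeps the approach efficient at scale.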
Then we use our analysis to
call the case and summarize the
results. I started by saying that the
common methods have limitations, for
example in coverage, continuity and ethics.
Well, when it comes to coverage, there are
more than 22 million global IP_ID
machines. These are Windows XP and its
predecessors, and FreeBSD, for
example. Compare that to previous work:
one successful project is RIPE Atlas,
and they have around 10,000 probes globally
deployed. When it comes to continuity, we
don't depend on end users, so it's much
more reliable. Well, by not
asking volunteers to help, we are already
reducing the risk, because there are no
users conspiring against their governments
to collect this data. But our approach is
not zero-risk either. There is a
different kind of risk here: the client
and server exchange SYN-ACKs and RSTs
without either of them giving consent. And
we don't want to ask for consent, because
if we did, the same dilemma would exist: we would
be back to the same situation as
asking volunteers. So, to cope with that and
reduce the risk further, we don't use end IPs.
We use routers 2 hops back, which with high
probability are infrastructure
machines, and use those as vantage points.
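The talk does not show how candidate routers are chosen or verified, so the following is only one plausible sketch under stated assumptions: take the hop two positions before the end host on a traceroute path, then check for a single global IP_ID counter by probing it alternately from at least two of our own source addresses. Both function names and the max_step bound of 50 are hypothetical.

```python
from typing import List, Optional, Tuple


def two_hops_back(path: List[str]) -> Optional[str]:
    """Router two hops before the end host on a traceroute path (or None)."""
    return path[-3] if len(path) >= 3 else None


def looks_like_global_ip_id(replies: List[Tuple[str, int]], max_step: int = 50) -> bool:
    """Heuristic check for a single, global IP_ID counter.

    replies: (our_source_ip, ip_id) pairs in arrival order, from probes that
    alternate between at least two of our own addresses. A global counter gives
    one small, increasing sequence across both sources; per-destination or
    randomized IDs do not.
    """
    if len(replies) < 4 or len({src for src, _ in replies}) < 2:
        return False
    ids = [ip_id for _, ip_id in replies]
    steps = [(b - a) % 65536 for a, b in zip(ids, ids[1:])]
    return all(1 <= step <= max_step for step in steps)


path = ["10.0.0.1", "10.0.1.1", "10.0.2.1", "203.0.113.7"]
print(two_hops_back(path))  # 10.0.1.1
```

Probing from two sources matters because a per-destination counter would also look neatly incremental if queried from only one address.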
Even under this harsh constraint we still
have 53,000 global IP_ID routers. To test
whether the Augur framework
works, we chose 2,000 of these global IP_ID
machines, uniformly selected from all the
countries where we had vantage points. We
selected websites from the Citizen Lab
test list. Citizen Lab is a research
organization at the University of Toronto, and
they crowdsource websites that are
potentially being blocked or potentially
sensitive. We also used thousands of
websites from the Alexa top 10k. Then
we ran Augur for 17 days and
collected data. One of the challenges
we had in validating Augur was:
what is the truth? What is the ground
truth? What would we expect to see that makes sense?
This is the biggest and
most fundamental challenge for Internet
censorship measurement anyway. The first
approach is leaning on intuition, for
example: no client should show blocking
towards all the websites, and no server should
show blocking for the bulk of our clients;
if anything like that happens, we just
discard it. We should also see more blocking
of sensitive domains than of
popular ones, and so on. We also
hoped to replicate the anecdotes and
reports out there. We did all of
those, and that's how we validated Augur.
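The two intuition-based sanity rules above can be sketched as a filter over per-pair results. This is an illustration, not the actual validation code: the Results shape, the sanity_filter name and the 0.5 "bulk of clients" cutoff are assumptions, and removing suspect clients before judging servers is a design choice made here so that a single broken client cannot make every server look blocked.

```python
from collections import defaultdict
from typing import Dict, Tuple

# (client, server) -> True if the pair was called "blocked"
Results = Dict[Tuple[str, str], bool]


def sanity_filter(results: Results, server_majority: float = 0.5) -> Results:
    """Drop results that contradict the validation intuitions."""
    # 1. A client that shows blocking towards every website is suspect.
    by_client = defaultdict(list)
    for (client, _), blocked in results.items():
        by_client[client].append(blocked)
    bad_clients = {c for c, calls in by_client.items() if all(calls)}
    kept = {k: v for k, v in results.items() if k[0] not in bad_clients}

    # 2. A server that shows blocking for the bulk of remaining clients is suspect.
    by_server = defaultdict(list)
    for (_, server), blocked in kept.items():
        by_server[server].append(blocked)
    bad_servers = {s for s, calls in by_server.items()
                   if sum(calls) / len(calls) > server_majority}
    return {k: v for k, v in kept.items() if k[1] not in bad_servers}


results = {
    ("c1", "s1"): True, ("c1", "s2"): True,    # c1 "blocks" everything
    ("c2", "s1"): False, ("c2", "s2"): True,
    ("c3", "s1"): False, ("c3", "s2"): True,
    ("c4", "s1"): True, ("c4", "s2"): False,
}
print(sanity_filter(results))
```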
So in the end, Augur is a system that is
scalable, efficient and ethical, and can be
used to detect TCP/IP blocking
continuously. Yes, I know that it is just
TCP/IP. What about the other layers? Can
we measure them remotely as well? Well,
let me focus on DNS. You might ask: is
there a way to remotely detect
DNS poisoning or manipulation? Let's
think it through out loud. From now on I'm going to
give just the highlights of the papers we
worked on, for lack of time. Well, if we
scan the whole IPv4 space, we find a lot of open
DNS resolvers, meaning they are
open to anybody sending them a query to
resolve. These open DNS resolvers can
be used as vantage points. We can use
open DNS resolvers in different ISPs
around the world to see whether DNS
queries are poisoned or not. Well, wait:
we need to make sure that they don't
belong to end users. So we came up with
a lot of checks to make sure that these
open DNS resolvers are organizational,
belonging to the ISP or its infrastructure.
After we do that, we start sending all
our queries to these open DNS
resolvers, let's say in an ISP in Bahrain, for all
the domains we're interested in, and capture
which IPs we receive. The
challenge is then to detect which answers are
wrong, so we had to come up
with a set of
heuristics. For example: is the response that
we received equal to the reply we
got from our control measurements, where
we know the IP is not blocked or poisoned
and the content is there? Or we
can look at the IP that we
received and check whether it serves a valid
HTTPS certificate, with or without SNI, the
Server Name Indication.
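Two of the heuristics just mentioned can be combined into a small check. This is a sketch, not Satellite's code: answer_looks_wrong is a hypothetical name, the certificate check is passed in as a caller-supplied function (in practice it would involve a TLS handshake with the returned IP, with or without SNI), and treating an empty answer as suspicious is an assumption of this sketch. Satellite uses more heuristics than these two.

```python
from typing import Callable, Set


def answer_looks_wrong(answer_ips: Set[str], control_ips: Set[str],
                       has_valid_cert: Callable[[str], bool]) -> bool:
    """Return True if no heuristic vouches for a DNS answer.

    control_ips: addresses returned for the same domain by a trusted control resolver.
    has_valid_cert: caller-supplied check that an IP serves a valid certificate
    for the domain (hypothetical here).
    """
    if not answer_ips:
        return True                 # an empty answer is treated as suspicious
    if answer_ips & control_ips:
        return False                # matches the control measurement
    return not any(has_valid_cert(ip) for ip in answer_ips)


control = {"192.0.2.10"}
print(answer_looks_wrong({"192.0.2.10"}, control, lambda ip: False))  # False
print(answer_looks_wrong({"10.10.34.34"}, control, lambda ip: False))  # True
```

The certificate check covers answers that differ from the control for benign reasons, such as CDNs returning different addresses in different regions.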
And so on and so forth. So we came up with
lots of heuristics to detect wrong
answers. The result of all these efforts
is a project called
"Satellite", which was started by Will
Scott. I'm sure he is in the audience
somewhere: a great friend of mine and a very
good supporter of CensoredPlanet.
He has been selfless, and it has been a miracle
that I had the opportunity and fortune to meet
him. So we have Satellite. Satellite automates
all the steps that I just described. For this
work we used the techniques developed in both
of those works. We call it Satellite out of
seniority, sticking with the original name. So
how much coverage does Satellite have? If you
scan IPv4 you end up with 4.2 million open
DNS resolvers in every country and
territory. We need to make sure
our measurements are ethical, and for that reason
we put a harsh condition on them. We say, let's
only use the ones whose valid PTR record
matches this regular expression. Basically, let's
just use the open DNS resolvers that are
name servers, or at least whose PTR record
suggests that. This is a really harsh
constraint. Actually, my students have
been adding more and more regular
expressions for resolvers that we are sure
are organizational. But for now, even
being this harsh, we have 40k DNS
resolvers in about 169 countries, I believe.
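The PTR-based filter could look roughly like this. The talk does not show the actual regular expression, so NAMESERVER_PTR below is a hypothetical pattern that accepts names starting with "ns" or "dns", optionally followed by digits and then a dot or hyphen. The sketch also does not check that the PTR record is valid, which the talk additionally requires.

```python
import re
from typing import Dict, List, Optional

# Hypothetical pattern; the expression used in practice is not given in the talk.
NAMESERVER_PTR = re.compile(r"^(ns|dns)\d*[.-]", re.IGNORECASE)


def organizational_resolvers(ptr_records: Dict[str, Optional[str]]) -> List[str]:
    """Keep only open resolvers whose PTR name suggests a name server."""
    return sorted(ip for ip, ptr in ptr_records.items()
                  if ptr is not None and NAMESERVER_PTR.match(ptr))


records = {
    "192.0.2.1": "ns1.example-isp.net",
    "192.0.2.2": "dsl-192-0-2-2.example-isp.net",  # looks like an end user
    "192.0.2.3": None,                             # no PTR record
    "192.0.2.4": "DNS.example.org",
}
print(organizational_resolvers(records))  # ['192.0.2.1', '192.0.2.4']
```

A strict allow-list like this discards many legitimate resolvers, which matches the trade-off described in the talk: less coverage in exchange for lower risk to end users.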
Censorship happens in other layers as
well. How do we deal with those
remotely, with a remote side
channel? And especially, what about
HTTP traffic, or disruption that can happen
to TLS-centric traffic? I hate water.
Oh no. Okay. So.
scratching noise
It's well documented that many DPIs,
especially the Great Firewall of China, monitor
traffic, and when they see a keyword,
a sensitive keyword like "Falun Gong",
they act and drop the traffic or send a RST.
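A keyword-triggered middlebox of the kind just described can be reduced to a toy for illustration. This is not how any particular DPI product works: toy_dpi is a hypothetical function that only does a case-insensitive substring match on a single cleartext payload, with no stream reassembly or protocol parsing.

```python
from typing import Iterable


def toy_dpi(payload: bytes, keywords: Iterable[bytes]) -> str:
    """Scan a cleartext payload the way a keyword-based middlebox might."""
    haystack = payload.lower()
    if any(keyword.lower() in haystack for keyword in keywords):
        return "inject RST"  # tear down the connection towards both ends
    return "forward"


request = (b"POST /post HTTP/1.1\r\nHost: example.com\r\n\r\n"
           b"text=Falun Gong")
print(toy_dpi(request, [b"falun gong"]))                     # inject RST
print(toy_dpi(b"GET / HTTP/1.1\r\n\r\n", [b"falun gong"]))  # forward
```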
And as I mentioned earlier, there is
enough cleartext everywhere. Even in TLS
handshakes, the SNI is in cleartext. For a
long time I was trying to come up with a
way of detecting application-layer blocking using
these fancy side channels. Like, how can I
detect it, when the client and server
first need to establish a TCP handshake?
How can the side channel jump in and then
detect the rest? We were lucky enough to be
pointed to a protocol called
"Echo". It's a protocol designed in 1983
for testing purposes;
it is a debugging tool, basically,
a predecessor to ping. Basically,
after you establish a TCP handshake to
port 7, whatever you send to the Echo server
on port 7, it's going to echo back. Now
think about it: how can we use Echo
servers to detect application-
layer blocking? Well, when nothing is blocked,
let's say I have an Echo server
in the U.S. and a measurement machine at
the University of Michigan: I establish a
TCP handshake and I send a GET request
containing, for example, a censored keyword.
The server is going to send back to me the
same thing I sent. But now let's put in a
DPI that is going to be triggered by it.
Well, then for sure I'm either going to
receive a RST first, or something else. So
we can come up with an algorithm
that uses Echo servers to detect
disruptions at the application layer.
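The Echo-based reasoning above can be sketched as two small functions. This is a hedged illustration, not Quack's algorithm: classify_echo_trial interprets one exchange (an Echo server should return exactly the bytes it received), and keyword_blocked is a hypothetical decision rule that only reports blocking when every request containing the keyword fails while benign control requests to the same server succeed.

```python
from typing import List, Optional


def classify_echo_trial(sent: bytes, received: Optional[bytes], got_rst: bool) -> str:
    """Interpret one Echo (port 7) exchange.

    An Echo server returns exactly what it was sent, so a RST, a timeout with
    no data, or altered bytes all suggest interference on the path.
    """
    if got_rst:
        return "disrupted: reset"
    if received is None:
        return "disrupted: timeout"
    if received != sent:
        return "disrupted: modified"
    return "ok"


def keyword_blocked(test_trials: List[str], control_trials: List[str]) -> bool:
    """Hypothetical rule: every keyword trial fails, every control trial succeeds."""
    return (all(t != "ok" for t in test_trials)
            and all(c == "ok" for c in control_trials))


sent = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
print(classify_echo_trial(sent, sent, got_rst=False))  # ok
print(classify_echo_trial(sent, None, got_rst=True))   # disrupted: reset
```

Requiring the control requests to succeed guards against calling a server "censored" when it is simply down or flaky.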
Basically keyword blocking, URL
blocking. The result of this is a tool called
Quack. Quack uses Echo
servers to detect, in a scalable
way, whether keywords are
being blocked around the world. So what
did we do? First we scanned the whole IPv4 space. We
found 47k Echo servers running around the
world. Then we needed to check
whether or not they belong to end
users, and that was a very challenging
part, because there is no clear signal.
About 90 percent of them are
infrastructure, but there is still some
portion of them that we don't know about. So
what we did is look at the Freedom House
reports, at the countries that are
not free or
partially free, as they're called. That
is around 50
countries. For those, we
randomly select some servers, and we
use Nmap's OS detection. It
tells us whether it's a server, a
switch and so on, and we only use those. So, with
the help of many collaborators, after
almost six years we ended up with three
systems that can capture TCP/IP blocking,
DNS blocking, and application-layer blocking using
infrastructure and organizational
machines. So while it was a dream,
a vision, that we could come up with a
better map, collecting this data in a
continuous way, thanks to the help of a lot of
people, especially my students, Will, and
other collaborators, we now have
CensoredPlanet. CensoredPlanet collects
semi-weekly snapshots of Internet
censorship using our vantage points in all
the layers and provides this data in a raw
format on our website. We also
provide some visualizations for people
to see how many vantage points
we have in each country and so on. Of
course, this is just the beginning of
CensoredPlanet. We launched it in August,
we have been collecting data for
almost four months, and we have a long way
to go. Right now organizations are
using our data and helping
us debug by finding things that don't
make sense and pointing them out to us. Any of you
who end up using this data, please
share your feedback with us; we are
very responsive about changing things,
though maybe not as quickly as you need. We have a
collective of very dedicated people
participating. So, now that we have
CensoredPlanet, let me show you how it can
help when there is a political situation
going on. You all must remember that around
October, Jamal Khashoggi, a
Washington Post columnist, disappeared and was
killed at the Saudi Arabian consulate in
Istanbul. When this happened
there was a lot of media attention, and
this news, especially two weeks in,
spread very widely internationally.
CensoredPlanet didn't know this event was
going to happen; we had simply been
collecting data semi-weekly for 2,000
domains or so. So we went back and
checked Saudi Arabia. Did we see
anything interesting? Yes: for
example, two weeks in, around October
16, censorship of the domains in the news
and media categories
doubled. And
let me emphasize, we didn't see
blocking or no blocking over the whole country;
not all countries have homogeneous
censorship. We saw it in
multiple ISPs where we had vantage
points. Actually, I freaked out when one of
the activists in Saudi Arabia told us,
"I don't see this". And we said, "Which ISP
are you in?" And it wasn't one of the ISPs where
we had vantage points. So we were
looking for hints: "Is anybody else
seeing what we are seeing?" And we
found that a commander
lab project also saw, around October
16, the number of malwares or whatever they
are testing double or triple; I
don't remember which. So something was
going on two weeks in, when the news broke.
Let me emphasize that the news media I am
talking about are the global news media
that we check, like the L.A. Times, Fox News
and so on. But we also checked Arab News,
which, as the activists told us, is a
Saudi Arabian propaganda newspaper, and it
was being poisoned in one of the ISPs. So
again, censorship measurement is a very
complex problem. So where are we heading?
Well, having said all that about side channels
and the techniques that help us remotely
collect this data, I also have to say that
the data we collect doesn't give the full
picture of Internet censorship. I mean,
having root access on a volunteer's
machine to do a detailed test is powerful.
So in the next step, in the next year, one
of our goals is to join forces with OONI to
integrate the data from remote and
local measurements and provide
the best of both worlds. We have also
been thinking a lot about what would be
good visualization tools that don't end
up misrepresenting Internet censorship. I
literally hate that. Hate it. The
number of vantage points in each country is
not equal. We don't know whether all the
vantage points that the data resulted
from are in one ISP or spread across all the
ISPs. And then we test domains that are
benign, or, I don't know, defined
based on some Western values of
freedom of expression. I believe in all of
them, but culture and economy might still play
a role. And then we put colors on
the map, rank the countries, call some
countries awful and don't pay full
attention to the others. So something
needs to change, and it's on our
horizon too. We need to think about it more deeply.
We want more statistical
tools to spot when patterns
change. We want to be able to compare
countries, for example when Telegram was
being blocked in Russia. If you remember,
millions of IPs were blocked. If you
don't, go to my friend Leonid's talk
about Russia; you're going to learn a lot
there. But anyway, when Russia was
blocking Telegram, I said to everyone, I
bet that some other
governments are going to follow and block
Telegram as well. And that's actually what
we heard, rumors like that. So we need
to be able to detect that automatically. And
overall, I want to develop an
empirical science of Internet censorship
based on rich data, with the help of all of
you. CensoredPlanet is now
maintained by a group of dedicated
students, great friends of mine, and it
needs engineers and political scientists
to jump on our data and help us bring
meaning to what we are collecting. So if
you are a good engineer, a political
scientist, or a dedicated person who wants
to change the world, reach out to me. As
a reference for those of you who are
interested: these are the publications
that my talk was based on.
And now I am open to questions.
applause
Herald: All right, perfect. Thank you so
much, Roya. We have some time for
questions, so if you have a question in the
room, please go to one of the room
microphones, one, two, three, four, and
five in the very back. And if you're
watching the stream, you can ask questions
to the Signal Angel via IRC or Twitter, and
we'll make sure to relay those to the
speaker and get them asked. So
let's just go ahead and
start with Mic two, please.
Question: Hey, great talk. Do you worry
that by publishing your methods as well as
your data, you're going to get a
response from the governments that are
censoring things that makes it
more difficult for you to monitor what's
being censored? Or has
that already happened?
Roya: It hasn't happened. We have control
measures to detect that. But
it's a really good
question, and it often comes up after I
present. I can tell you, based on my
experience, that it's really hard to get
all the ISPs in all the countries to react in sync
to the SYN-ACKs and RSTs that I'm sending.
For example, for Augur, these are
unsolicited packets, and if governments
blocked them, there would be a lot of
collateral damage. You might say,
well, Roya, they're going to block the IP
of the University of Michigan, your
spoofing machine. We have a measure for
that: I have multiple places where I
have a backup in case that
happens. But overall this is a global-
scale measurement, and even in one city, or
across multiple ISPs, it's really
hard to coordinate blocking
something and keep maintaining it. So it is
something that's on our mind, that we're thinking
about, but as of now it's not a worry.
Herald: All right then let's
go over to Mic one.
Question: Thank you. I wondered, it's kind
of similar to the previous question: what
if you are measuring from a country that
is blocking? Do you also distribute the
measurements over several countries?
Roya: Absolutely. Every snapshot that we
collect is from all the vantage points we
have in certain countries, and from a
portion of the vantage points in countries
like China or the US, because they have
millions, or rather thousands, of vantage
points. So basically at each snapshot,
which takes us three days, we collect the
data from all of the vantage points. And
let's say that somebody is reacting to us:
we also check a benign domain, for example
example.com or random.com. So if we see
something going on there, we double check.
But good point, because right now our
effort is very much manual labor and we're
trying to automate everything, so it's
still a challenge. Thank you.
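The control-domain check Roya describes can be sketched roughly as follows. This is a minimal illustration, not Censored Planet's actual code: the `Observation` record, the `triage` function and its three buckets are hypothetical names, and the sketch assumes each vantage point already reports a simple pass/fail for both the test domain and a benign control domain such as example.com.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    vantage_point: str  # hypothetical identifier, e.g. a resolver's IP address
    control_ok: bool    # did the benign control domain (e.g. example.com) behave normally?
    test_ok: bool       # did the domain under test behave normally?


def triage(observations):
    """Split observations into 'anomaly', 'clean' and 'recheck' buckets.

    A failure on the test domain only counts as an anomaly if the control
    domain looked normal from the same vantage point. If the control itself
    failed, the vantage point may be broken or reacting to the measurement,
    so it is set aside for a manual double check.
    """
    buckets = {"anomaly": [], "clean": [], "recheck": []}
    for obs in observations:
        if not obs.control_ok:
            buckets["recheck"].append(obs.vantage_point)
        elif obs.test_ok:
            buckets["clean"].append(obs.vantage_point)
        else:
            buckets["anomaly"].append(obs.vantage_point)
    return buckets
```

Checking the control first is the point of the design: a vantage point that cannot even reach example.com says nothing about censorship of the test domain.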
Herald: All right then let's go to Mic
three.
Question: Hi. Have you measured how much
IP-ID randomization
breaks your probes?
Roya: Oh. This is also a really good one.
Let me give a shout out to [name]. He's
the guy who in 1998 discovered the IP-ID
side channel, or published something
that I ended up reading. For example,
Linux or Ubuntu in newer versions
randomize it, but there are still legacy
operating systems like Windows XP and its
predecessors, and FreeBSD, that still have
a global IP-ID. So one argument that often
comes up is: what if all these machines
get updated to a new operating system that
doesn't maintain a global IP-ID? And I can
tell you that, well, we'll come up with
another side channel. For now, this works.
But my gut feeling is that if it didn't
change from 1998 until now, with everybody
saying that a global IP-ID variable is a
horrible idea, it's not going to change in
the coming five years, so we're good.
Question: Thank you.
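A rough sketch of how one might tell an incrementing IP-ID counter apart from a randomized one, given a sequence of IP-ID values read from a host's replies. The function name `classify_ipid` and the `max_step` threshold are assumptions for illustration, not values from the talk. Probing from a single source like this cannot distinguish a truly global counter from a per-destination counter; that would need probes from a second address.

```python
def classify_ipid(samples, max_step=1000):
    """Guess whether a host's IP-ID looks like a shared, incrementing counter.

    samples: IP-ID values (0..65535) read from the host's replies, in order.
    A counter yields small, positive increments (computed modulo 2**16 to
    handle wraparound); a randomized IP-ID yields large, erratic jumps.
    A constant value (delta 0, e.g. IP-ID always 0) is not a usable counter.
    max_step is an assumed threshold, not a value from the talk.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    deltas = [(b - a) % 65536 for a, b in zip(samples, samples[1:])]
    if all(0 < d <= max_step for d in deltas):
        return "global"
    return "random-or-other"
```

For example, `[65534, 65535, 2]` still counts as a counter because the modular difference across the wraparound is only 3.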
Herald: Okay, then let's just
move on to Mic four.
Question: Thank you very much for the
great talk. When you were introducing
Augur, I was wondering: does detecting
blockage between a client and a server
necessarily indicate censorship? Because
you were talking about validating Augur,
I was wondering: if it turns out that
there is a false alarm, what do you
think could be the potential cause?
Roya: You're absolutely right. And I tried
to emphasize that what we end up
collecting can be seen as a disruption:
something didn't work, the SYN-ACK or RST
got disrupted. Is that censorship, or can
it be a random packet drop? The way to
establish that confidence is to aggregate
the results. Do we see this blocking
between multiple routers within that
country or within that AS? Because if one
of these was an accident, a packet that
just got dropped, what about the others?
So the whole idea, and this is another
point that I'm so, so concerned about:
most of the reports and anecdotes that we
read are based on one VPN or one vantage
point in the country, and then a lot of
conclusions are drawn from that. And you
can often ask: well, this vantage point
might be subject to many things other
than government censorship. Also, I
emphasized that the censorship I use in
this talk is any action that stops users
from getting to the requested content.
I'm trying to stay away from semantics
that imply intention. But great question.
Herald: All right, then let's go back to
Mic one.
Question: Hi Roya. You mentioned that you
have a team of students working on all of
these frameworks. I was wondering if your
frameworks are open source or available
online for collaboration? And if so, where
would those resources be?
Roya: So the data is open. The code hasn't
been. One reason is that I have very low
confidence in sharing code. I'm friends
with Philipp Winter and Dave Fifield;
these people are pro open source and they
constantly blame me for not sharing. But
it really requires confidence to share
code. So we are working on that, at least
for Quack; I think that code can very
easily be shared. For Augur, we spent a
heck of a lot of time making
production-ready code, and for Satellite
I think it is also ready. I can share
them personally with you, but before
sharing with the world I want another
person to audit it and make sure we're not
using a curse word or something. I don't
know, it's just my mind being a little bit
conservative. But if you send me an
e-mail, I'm happy to send you the code.
Question: Thank you.
Herald: All right then move to Mic two.
Question: Thanks again for sharing your
great vision. I find it really
fascinating. I'm not really a data
scientist, but my question is: did you
find any use for your approaches in the
spread of the Internet of Things? I
understood that you used routers to make
queries, but did you send, and maybe
receive back, any data from washing
machines, toasters...?
Roya: I mean, I know being ethical and
trying not to use end-user machines
limits your access a lot. But that's our
goal. We are going to stick with things
that don't belong to end users, so it's
all routers and organizational machines.
I want to make sure that whatever we're
using belongs to an entity that can
protect itself if something goes wrong.
They can just say: "Hey, this is a
freaking router, it receives and sends so
many things. Look, let me show you a
TCP (?), for example." A volunteer might
not be able to defend that, because it
already looks like they are conspiring
and collecting this data. But good
question. I wish I could, but I won't
cross that line.
Herald: All right. I don't see any more
questions in the room right now. But we
have one from the internet
so please, signal angel.
Signal Angel: Yes. Actually, a question
from koli585: I was in an African
country where the internet was
completely shut down. How can I quickly
and safely inform others
about the shutdown?
Roya: So, while I think local users'
voices are highly, highly needed, and
they can use social media like Twitter to
say whatever, there is a project called
IODA. It's a project at CAIDA at UC San
Diego in the U.S., and Philipp Winter,
Alberto [Dainotti] and Alistair [King] are
working on it. They basically remotely
keep track of shutdowns and push them
out. If you look at IODA on Twitter, you
can see their live feed of where the
shutdowns happen. I haven't thought about
how to reach out to users to tell them
what we see, or how we can incorporate
users' feedback. We are working with a
group of researchers that already
developed tools to receive this data from
Twitter and basically use it as some
level of ground truth, but OONI does such
a great job that I haven't felt a need.
Herald: All right. Unless the signal angel
has another question? No?
Roya: And let me... can I add one thing?
I was listening to a talk about how
sympathetic Iranians versus Arabs were
towards the Boston bombing in the United
States, and a lot of assumptions were
made and a lot of conclusions drawn that,
oh, this... I'm completely paraphrasing, I
don't remember, but that Iranians don't
care because they didn't tweet as much.
So basically their input data was a bunch
of tweets around the time of the Boston
bombing. After the talk was over, I said:
you know that in this country Twitter has
been blocked, so many people couldn't
tweet.
applause
Herald: All right. That concludes our Q&A,
so thanks so much, Roya.
Roya: Thank you.
applause
postroll music
Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!