<i>35C3 preroll music</i>

Herald Angel: All right. It's my very big
pleasure to introduce Roya Ensafi to you.

She's gonna talk about "Censored Planet: a
Global Censorship Observatory". I'm

personally very interested in learning
more about this project. Sounds like it's

gonna be very important. So please welcome
Roya with a huge warm round of applause.

Thank you.

<i>Applause</i>

Roya: It's wonderful to finally make it to
CCC. I had joint talks with several of my

friends over the past years and the visa
stuff never worked out. This year I

applied for a conference in August and the
visa worked for coming to CCC. My name is

Roya Ensafi and I'm a professor at the
University of Michigan. My research

focuses on security and privacy with the
goal of protecting users from adversarial

networks. So basically I investigate
network interference ...and somebody is

interfering right now. Damn it. What the
heck. Cool, I'm good. Oh, no I'm not.

<i>laughter</i> OK. In my lab we develop
techniques and systems to be able to

detect network interference, often at
scale, and apply these frameworks and tools

to be able to understand the behaviors of
these actors that do the interference and

use this understanding to be able to come
up with a defense. Today I'm going to talk

about a project that is very dear to my
heart. One that I spent six years

working on. And in this talk I'm going
to talk about censorship, internet

censorship. And by that I mean any action
that prevents users' access to the

requested content. We have heard an
alarming level of censorship happening all

around the world. And while previously
only a few countries were

capable of using deep packet inspection
to tamper with user traffic, thanks to

the commercialization of these DPIs many
countries are now messing with users'

data. From the moment that users
type CNN.com in their browsers, their

traffic is subject to some level of
interference by different actors. First,

for example, the DNS query, which gives the
mapping between the domain and the IP

where the content is, can be manipulated.
For example, the DNS answer can be a dead

IP where the content is not there. If the
DNS succeeds, then the user and the server

are going to establish a connection, a TCP
handshake, and that can be easily blocked.

If that succeeds, then the user and server
actually start sending data back and

forth, and there is enough clear text,
whether the traffic is encrypted or not,

that the DPI can detect a sensitive
keyword and send a reset packet to both

sides to basically shut down the connection.
Before I forget let me tell you and

emphasize that it's not just
governments and the policies they impose

on the ISPs that lead to censorship.
Actually the server side, which provides the

data, is also blocking users, especially
if they are located in a region that

doesn't provide any revenue. We recently
investigated this issue of geo-blocking

in depth and provide more details about
what role CDNs actually play. Imagine

now we have how many users, how many ISPs,
how many transit networks and how many

websites. Each of them is going to have
their own policies of how to block users'

access. Moreover, censorship changes from time
to time, region to region and country to

country. And for that reason many
researchers including me have been

interested in collecting data about
censorship in a global way and

continuously. Well, I grew up under severe
censorship, be it from the university, the

government or, more frustrating, the server
side. And I genuinely believe that

censorship takes away opportunities and
degrades human dignity. It is not just

China, Bahrain and Turkey that do internet
censorship. Actually with DPIs becoming

cheaper and cheaper many governments are
following their lead. As a result the

Internet is becoming more and more
balkanized and the users around the world

are soon going to have very, very
different pictures of what this Internet

is. And we need to be able to collect the
data and to be able to know what is being

censored, how it's being censored, where
it's being censored and for how long. This

data then can be used to bring
transparency and accountability to

governments or private companies that
practice internet censorship. It can help

us to know where circumvention tools,
where defenses need to be deployed. It

can help us to let the users around the
world to know what their governments are

up to and, more importantly, provide valid and
good data for policymakers to come up

with good policies. Existing research
already shows that if we can provide this

data to users they act by their own will
to ensure Internet freedom. For many years

my goal has been to come up with a weather
map, a censorship weather map where you

can actually see changes in censorship
over time, how some countries are

different from others and do that for a
continuous duration of time, and for all

over the world. Creating such a map was
impossible with the Internet

measurement techniques that we had at that
time, and even with the common

techniques we use now. The usual
method for measuring

internet censorship is to deploy
software or give a customized

Raspberry Pi to either a client or a
server and based on that measure what's

happening between client and server.
Well, this approach has a lot of

limitations. For example there are not
that many volunteers around the whole

world that are eager to download
software and run it. Second, the data

collected from this approach are often not
continuous because the user's connection

can die for a variety of reasons or users
may lose interest in running the

software. And therefore we end up with
sparse data where we cannot have a good

baseline for internet censorship studies.
Moreover, measuring sensitive domains

often creates risks for the local
collaborators and might end with their

governments retaliating. These risks are
not hypothetical. When the Arab Spring was

happening I was approached by many
colleagues to recruit local friends and

colleagues in the Middle East to
collect measurement data at a time that

was very interesting for capturing the
behavior of the network and most dangerous

for the locals and volunteers collecting
it. My painting actually expressed what

I felt at the time. I just can't imagine
asking people on the ground to help at

these times of unrest. In my opinion,
conspiring to collect the data against the

government's interest can be seen as an
act of treason. And these governments are

often unpredictable. So it exposes
these volunteers to severe risk. While

no one has yet been arrested because of
measuring internet censorship as far as we

know, and I don't know how we can know
that on a global scale, I think the clouds

are on the horizon. I'm still in awe of how
the Turkish government used its surveillance

data at the time of the coup and tracked down
and detained hundreds of users because

there was traffic between them and
ByLock, a messenger app that was used by the

coup organizers. These things happen.
Before I continue, if you know OONI you

might ask how OONI prevents risk. Well,
with a great deal of effort. And if you

don't know OONI, OONI is a global
community of volunteers that collect data

about censorship around the world. Well,
first and foremost they provide their

volunteers with very honest consent,
telling them that "hey, if you run this

software, anybody who is monitoring your
traffic knows what you're up to." They also

go out of their way to give freedom to
these volunteers to choose what websites

they want to test, what data they want to
push. They have established great relationships

with local activist organizations in
the countries. Well, now that I've proved to

you guys that I am a supporter of OONI and
I am actually friends with most of them; I

want to emphasize that I still believe
that consistent and continuous and global

data about censorship requires a new
approach that doesn't need volunteers'

help. I became obsessed with solving
this problem. What if we could measure

whether a client, anywhere around the
world, can talk to a server without us being

close to that client? From somewhere like here,
or from the University of Michigan. And see

whether the two hosts can talk to each
other, globally and remotely, off the

path. When I talked to people about
this, honestly, everybody was like "you

don't know what you're talking about, it's
really really challenging". Well, they

were right. The challenge is there, and
I'm going to walk you through it. We have

at least 140 million IP addresses that
respond to a SYN packet. This means they

speak to the world, and they blindly
follow the TCP/IP protocol. So the question

becomes: how can I leverage the subtle
properties of TCP/IP to be able to detect

that two hosts can talk to each other?
Well, Spooky Scan is a technique that Jed

Crandall from University of New Mexico and
I developed that uses TCP/IP side channels

to be able to detect whether the two
remote hosts can establish a TCP handshake

or not, and if not, in which direction the
packets are being dropped. Off the path

and remotely. And I'm gonna start telling
you how this works. First I have to cover

some background. Any connection that is
based on TCP, one of the basic

communication protocols we have,
needs to establish a TCP handshake. So

basically you send a SYN, and
in the packet you send, in the IP header,

you have a field called "identification",
or IP_ID. This field is used for

fragmentation, and I'm going to use
this field a lot in the rest of the talk.
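
To make the field concrete: in an IPv4 header the identification field is the 16-bit value at byte offsets 4 and 5. Below is a minimal Python sketch that reads it. The helper name `ip_id` and the example header bytes are made up for illustration and are not taken from any Censored Planet tool.

```python
import struct

def ip_id(ipv4_header: bytes) -> int:
    """Return the 16-bit identification field of an IPv4 header."""
    if len(ipv4_header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    # Bytes 4-5 hold the identification field, in network byte order.
    (ident,) = struct.unpack_from("!H", ipv4_header, 4)
    return ident

# Hand-built illustrative header (checksum left at 0, addresses from
# documentation ranges): version/IHL, TOS, total length, ID, flags/fragment
# offset, TTL, protocol (6 = TCP), checksum, source, destination.
example_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 40, 7000, 0, 64, 6, 0,
    bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]),
)

print(ip_id(example_header))
```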

After the other side receives the SYN, it is going
to send a SYN-ACK back, with another IP_ID

in it. And then, if I want to establish the
connection I send an ACK. Otherwise I send a

RESET (RST). Part of the protocol says
that if you send a SYN-ACK packet to a

machine with a port open or closed, it's
going to send you a RST, telling you "what

the heck, you are sending me a SYN-ACK, I
didn't send you a SYN". And another part

says: if you send a SYN packet to a
machine with the port open, eager to

establish a connection, it will send you a
SYN-ACK. If you don't do anything, because

TCP/IP is reliable, it's going to send you
multiple SYN-ACKs. It depends on the operating

system: 3, 5, you name it. Spooky Scan
requires some basic characteristics. For

example, the client, the vantage point
that we are interested in, should maintain a

global variable for the IP_ID. It means
that, when it receives packets and

wants to send a packet out, no matter
who it's sending the packet to, this

IP_ID is going to be a shared resource, as
in it's going to be incremented by one. So by

just watching the IP_ID change you can
see how noisy a machine is, how much

traffic a machine is sending out. The server
should have a port open, let's say 80 or

443, and want to establish a connection,
and the measurement machine, me, should be

able to spoof packets. It means sending
packets with a source IP different from

my own machine's. To be able to do that, you
need to talk to your upstream network and ask

them not to drop the packets. All of these
requirements I could easily satisfy with a

little bit of effort. A Spooky Scan starts
with the measurement machine sending a SYN-ACK

packet to one of these clients with a global
IP_ID; at that time let's say the value is

7000. The client is going to send back a
RST, following the protocol, revealing to

me the value of the IP_ID. In the next
step I'm going to send a spoofed SYN

packet to a server using the client's IP. As a
result, the SYN-ACK is going to be sent to

the client. Again, the client is going to send
a RST back, and the IP_ID is going to be

incremented by 1. Next time I query the IP_ID
I'm going to see a jump of two. In a

noiseless model, I know that this machine
talked to the server. If I query it again,

I won't see any extra jump. So, delta 2, delta
1. Now imagine there is a firewall that

blocks the SYN-ACKs going from the server
to the client. Well, it doesn't matter how

much of the traffic I send, it's not going
to get there. It's not going to get there.

So the delta I see is 1, 1. In the third
case, when the packets are

dropped from the client to the server:
Well, my SYN gets there. The SYN-ACK

gets to the client, the client is going to
send the RST back, but it's not going to

get to the server. And so the server thinks
that a packet got dropped, so it's going

to send multiple SYN-ACKs. And as a result
the client sends more RSTs and the IP_ID goes up more. And

so the jump I would see is, let's say, 2,
2. Let me put them all together. So you

have 3 cases. Blocking in this direction.
No blocking and blocking in the other. And

you see different jumps or different
deltas. So it's detectable. Yes, yes, in a

noiseless model. I know the clients talk
to so many others and the IP_ID is going

to change for a variety of
reasons. I call all of those noise. And

this is how we are going to deal with it.
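
Before noise enters the picture, the three noiseless cases described above can be replayed with a toy model of the client's global counter. This is a sketch under the talk's simplifying assumptions (one spoofed SYN, one retransmitted SYN-ACK between probes, no other traffic); `simulate_deltas` and the case labels are hypothetical names, not part of Spooky Scan itself.

```python
from typing import Tuple

def simulate_deltas(blocking: str, start: int = 7000) -> Tuple[int, int]:
    """Replay one noiseless scan and return the two IP_ID deltas I observe.

    blocking is one of "none", "server_to_client", "client_to_server".
    """
    counter = start  # the client's global IP_ID counter

    def client_sends_packet() -> int:
        nonlocal counter
        counter += 1
        return counter

    # Probe 1: my SYN-ACK, answered by a RST that reveals the IP_ID.
    first = client_sends_packet()

    # Spoofed SYN reaches the server, which sends a SYN-ACK to the client.
    synack_arrives = blocking != "server_to_client"
    if synack_arrives:
        client_sends_packet()  # client RSTs the unexpected SYN-ACK

    # Probe 2.
    second = client_sends_packet()

    # If the client's RST never reached the server, the server retransmits
    # its SYN-ACK and the client sends another RST.
    if synack_arrives and blocking == "client_to_server":
        client_sends_packet()

    # Probe 3.
    third = client_sends_packet()
    return second - first, third - second

# Map observed deltas back to a case, as in the talk's three-way summary.
CASES = {
    (2, 1): "no blocking",
    (1, 1): "server-to-client blocked",
    (2, 2): "client-to-server blocked",
}

for case in ("none", "server_to_client", "client_to_server"):
    print(case, simulate_deltas(case), CASES[simulate_deltas(case)])
```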
Well, intuitively thinking we can amplify

the signal. We can actually instead of
sending one spoofed SYN packet we can send

n. And for a variety of reasons packets
can get dropped. So we need to repeat this

measurement. So here is some data from a
Spooky Scan where I used the following

probing method. For 30 seconds I
just sent queries for the IP_ID. And then

for another 30 seconds I sent 5
spoofed SYN packets. These are machines or

clients in Azerbaijan, China and the United
States. And we wanted to check whether they

can reach the Tor relay that we had in
Sweden. You can see there are different

jumps or different level shifts that you
observe in the second phase. And just by

visually looking at it or using an
autoregressive moving average, or ARMA, you

can actually detect that. But there is an
insight here, which is that not all the

clients have the same level of noise. And
for some of them, especially

these guys, you could easily detect it after
five seconds of sending IP_ID queries and then

five seconds of spoofing. So in the
follow-up work we tried to use this

insight, to be able to come up with a
scalable and efficient technique to be

able to use it in a global way. And that
technique is called "Augur". Well Augur

adopts this probing method. First, for four
seconds it queries IP_ID, then in one

second it sends 10 spoofed SYN packets. Then
we look at the IP_ID acceleration, or second

derivative, and see whether there is a jump,
a sudden jump at the time of perturbation,

when we did the spoofing. How confident are
we that that jump is the result of our

own spoofed packets? Well, I'm not
confident, run it again. I think so, run

it again, until you have sufficient
confidence. It turns out there is a

statistical analysis called "sequential
hypothesis testing" that can be used to be

able to gradually improve our confidence
about the case we're detecting. So I'm

going to give you a very, very rough
overview of how this works. But for

sequential hypothesis testing we need to
define a random variable. And we use

IP_ID-acceleration at the time of
perturbation, being 1 or 0 based on whether you

see a jump or not. We also need to calculate
some empirical priors, known

probabilities. Looking across all the data,
what would be the probability that you see a

jump when there is actually no blocking?
And so on. After we put all this together

we can formalize an algorithm:
start by running a trial. Update the

sequence of values for the random
variable. Then check whether this

sequence of values belongs to the
distribution where blocking happens

or not. What's the likelihood of that? If
we're confident, if we have reached the level

that we are satisfied with, then we call it a
case. So putting all this together this is

how Augur works. We scan the whole IPv4 space,
find global IP_ID machines. And then we

have some constraints: is it a stable
machine? Is it too noisy, or does it have noise

that we can deal with? We also need
to figure out what websites we are

interested in testing reachability towards,
and what countries. So after we decide

all the inputs we run a scheduler
making sure that no client and server are

under measurement at the same time
because they mess each other's detection.
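
The sequential hypothesis test used to call a case can be sketched as a standard Wald test over the binary "jump seen" variable. This is only an illustration: the function name `sprt`, the prior values 0.1 and 0.9, and the error rates are assumptions made up for the example, not Augur's measured priors.

```python
import math
from typing import Iterable

def sprt(observations: Iterable[int],
         p_jump_if_blocked: float,
         p_jump_if_open: float,
         alpha: float = 0.01,
         beta: float = 0.01) -> str:
    """Wald's sequential probability ratio test on 0/1 trial outcomes.

    Each observation is 1 if an IP_ID jump was seen at the time of
    perturbation, else 0. Returns "blocked", "open" or "undecided".
    """
    upper = math.log((1 - beta) / alpha)   # cross it: accept "blocked"
    lower = math.log(beta / (1 - alpha))   # cross it: accept "open"
    llr = 0.0  # running log-likelihood ratio, blocked vs. open
    for saw_jump in observations:
        if saw_jump:
            llr += math.log(p_jump_if_blocked / p_jump_if_open)
        else:
            llr += math.log((1 - p_jump_if_blocked) / (1 - p_jump_if_open))
        if llr >= upper:
            return "blocked"
        if llr <= lower:
            return "open"
    return "undecided"  # not confident yet: run more trials

# Illustrative priors for server-to-client blocking, where blocking means
# the SYN-ACKs never arrive and a jump is therefore unlikely.
print(sprt([0, 0, 0, 0, 0], p_jump_if_blocked=0.1, p_jump_if_open=0.9))
print(sprt([1, 1, 1, 1, 1], p_jump_if_blocked=0.1, p_jump_if_open=0.9))
print(sprt([0, 1], p_jump_if_blocked=0.1, p_jump_if_open=0.9))
```

The point of the sequential form is that quiet vantage points reach a decision after a few trials, while noisy ones simply keep being measured until the thresholds are crossed.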

And then we actually use our analysis to
be able to call the case and summarize the

results. I started by saying that the
common methods have these limitations, for

example coverage, continuity and ethics.
Well, when it comes to coverage there are

more than 22 million global IP_ID
machines. These are Windows XP or

its predecessors, and FreeBSD, for
example. Compare that to previous work:

one successful project is RIPE Atlas,
and they have around 10000 probes globally

deployed. When it comes to continuity we
don't depend on the end user. So it's much

more reliable to use this. Well, by not
asking volunteers to help we are already

reducing the risk, because there are no
users conspiring against their governments

to collect this data. But our approach is
not zero risk either. If you look, you have a

different kind of risk here: the client
and server exchange SYN-ACKs and RSTs

without either of them giving consent. And
we don't want to ask for consent, because

if we do, the same dilemma exists. We have to
go back, and it's just the same as

asking volunteers. So, to deal with that
and cope with that, to reduce the risk

more, we don't use end IPs. We actually
go 2 hops back, to routers which with high

probability are infrastructure
machines and use those as a vantage point.

Even with this harsh constraint we still
have 53000 global IP_ID routers. To test

the framework, to see whether Augur
works, we chose 2000 of these global IP_ID

machines, uniformly selected from all the
countries where we had vantage points. We

selected websites from the Citizen Lab
test list. This is a research

organization at the University of Toronto
that crowdsources websites that are

potentially being blocked or potentially
sensitive. And then we used thousands of

the websites from the Alexa top 10k. And then
we kept Augur running for 17 days and

collected this data. One of the challenges
that we had in validating Augur was:

So, what is the truth? What is the ground
truth? What would we see that makes sense?

And this is the biggest and most
fundamental challenge for internet

censorship measurement anyway. So the first
approach is leaning on intuition, which is

that no client should show blocking
towards all the websites. No server should

show blocking for the bulk of our clients. And
if anything like that happens we just

trash it. And we should see more bias
towards the sensitive domains versus the

ones that are popular. And so on. And also
we hope to replicate the anecdotes, the

reports out there. And we did all of
those. And that's how we validate Augur.

So in the end Augur is a system that is
scalable, efficient and ethical, and can be

used to detect TCP/IP-blocking
continuously. Yes I know that is just

TCP/IP. What about the other layers? Can
we measure them remotely as well? Well,

let me focus on the DNS. You might ask: Is
there a way that we can remotely detect

DNS poisoning or manipulation? Well let's
think it out loud. From now on I'm gonna

give just the highlights of the papers we
worked on, for lack of time. Well, if we

scan the whole IPv4 we have a lot of open
DNS resolvers, which means that they are

open to anybody sending a query to them to
resolve. And these open DNS-resolvers can

be used as a vantage point. We can use
open DNS-resolvers in different ISPs

around the world to see whether DNS
queries are poisoned or not. Well, wait.

We need to make sure that they don't
belong to the end user. So we come up with

a lot of checks to make sure that these
open DNS-resolvers are organizational,

belonging to the ISP or infrastructure.
After we do that we start sending all

our queries to these, let's say, open DNS
resolvers in an ISP in Bahrain, for all

the domains we're interested in. And capture
what we receive, what IPs we receive. The

challenge is then to detect which is the
wrong answer. And so we have to come up

with a lot of heuristics, a set of
heuristics. For example: is the response that

we received equal to the reply we
got from our control measurements, where

we know the IP is not blocked or poisoned
or something, and the content is there? Or we

can actually look at the IP that we
received and see whether it has a valid

HTTPS cert, with or without the SNI or
servername identification or something.
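
The two heuristics just mentioned can be sketched as a small decision function. This is a simplified illustration, not Satellite's actual logic: `classify_dns_answer` is a hypothetical name, and the certificate check is passed in as a caller-supplied function because how it is performed is not described here.

```python
from typing import Callable, Iterable

def classify_dns_answer(domain: str,
                        answer_ips: Iterable[str],
                        control_ips: Iterable[str],
                        cert_is_valid: Callable[[str, str], bool]) -> str:
    """Label a resolver's answer as consistent or suspicious.

    control_ips are answers from trusted control measurements;
    cert_is_valid(ip, domain) is an assumed helper that reports whether the
    IP serves a valid certificate for the domain.
    """
    answers = set(answer_ips)
    if not answers:
        return "no_answer"
    # Heuristic 1: the answer overlaps with what the controls returned.
    if answers & set(control_ips):
        return "consistent"
    # Heuristic 2: an unfamiliar IP still proves it hosts the domain.
    if any(cert_is_valid(ip, domain) for ip in answers):
        return "consistent"
    return "suspicious"

# Toy usage with documentation-range addresses and a stubbed cert check.
controls = {"192.0.2.10", "192.0.2.11"}
never_valid = lambda ip, domain: False
print(classify_dns_answer("example.com", ["192.0.2.10"], controls, never_valid))
print(classify_dns_answer("example.com", ["203.0.113.99"], controls, never_valid))
```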

And so on so forth. So we come up with
lots of heuristics to detect wrong

answers. The results of all these efforts
ended up being a project called

"Satellite", which was started by Will
Scott. I'm sure he is in the audience

somewhere. A great friend of mine and very
good supporter of CensoredPlanet.

Selflessly, he has been a miracle, and I
had the opportunity and fortune to meet

him. So we have Satellite. Satellite automates
all the steps that I told you. For this

work we use the science that was developed in two
earlier works. We call it Satellite because

of seniority, sticking with that name. So
how much coverage does Satellite have? If you

scan IPv4 you end up with 4.2 million open
DNS resolvers in every country and

territory. We actually need to make
sure they are ethical to use. For that

reason we put a harsh condition.
We say, let's only use the

ones whose valid PTR record
follows this expression. Basically let's

just use the open DNS resolvers that are
name servers, or at least their PTR record

suggests that. This is a really harsh
constraint. Actually, my students have

been adding more and more regular
expressions for the ones that we are sure

are organizational. But for now, just
being this harsh, we have 40k DNS

resolvers in almost 169 countries I guess.
So censorship happens in other layers as

well. How do we deal with those
remotely, with a remote side

channel? And especially, what about
HTTP traffic, or disruption that can happen

to you know TLS centric. I hate water.
Oh no. Okay. So. So it's <i>scratching</i>

<i>noise</i> it's well documented that many DPIs
especially in the Great Firewall of China, monitor

the traffic, and when they see a keyword,
a sensitive keyword like "Falun Gong",

they act and drop the traffic or send a RST.
And as I mentioned earlier there is

enough clear text everywhere. Even in TLS
handshakes SNI is in clear text. And for a

long time I was trying to come up with a
way of detecting application-layer blocking using

this fancy side channel. Like, how can I
do that when the client and server

need to first establish a TCP handshake?
How can the side channel jump in and then

detect the rest? We were lucky enough that
the end pointed to a protocol called

"Echo". It's a protocol designed in 1983
and it's for testing reasons, for the

debu..it is a debugging tool, basically.
It's a predecessor to ping. And basically,

after you establish a TCP handshake to
port 7, whatever you send to the Echo server

on port 7 it's gonna echo it back. Now
think about it. How can we use Echo

servers to detect application
layer blocking? Well, when there is no

blocking, let's say I have an Echo server
in the U.S. and a measurement machine at

the University of Michigan. I establish a
TCP handshake and I send a GET request

using a censored keyword, for
example. It's gonna send back to me the

same thing I sent. But now let's put in a
DPI that is gonna be triggered by it.

Well, for sure, either I'm going to
receive a RST first or something else. So

we can actually come up with an algorithm
that uses Echo servers to detect

disruptions on application layer.
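
A minimal sketch of that idea in Python: send a payload to an Echo server and compare what comes back. The names `classify_echo` and `probe_echo` are hypothetical, the keyword in the payload is a placeholder, and the real tool's retries and control measurements are left out.

```python
import socket

def classify_echo(sent: bytes, received: bytes, reset: bool) -> str:
    """Interpret the outcome of one echo probe."""
    if reset:
        return "disrupted: connection reset"
    if received == sent:
        return "echoed intact"
    return "disrupted: unexpected or missing reply"

def probe_echo(host: str, payload: bytes, port: int = 7,
               timeout: float = 5.0) -> str:
    """Send payload to an Echo server (TCP port 7) and classify the reply."""
    received = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(payload)
            # Read until the full payload has been echoed or the peer closes.
            while len(received) < len(payload):
                chunk = conn.recv(4096)
                if not chunk:
                    break
                received += chunk
    except ConnectionResetError:
        return classify_echo(payload, received, reset=True)
    except OSError:
        # Timeouts and other socket errors: treat as a missing reply.
        return classify_echo(payload, received, reset=False)
    return classify_echo(payload, received, reset=False)

# The payload looks like an HTTP request so that a DPI on the path parses it;
# "sensitive-keyword" stands in for a real test keyword.
payload = b"GET /?q=sensitive-keyword HTTP/1.1\r\nHost: example.com\r\n\r\n"
print(classify_echo(payload, payload, reset=False))
print(classify_echo(payload, b"", reset=True))
```

A real measurement would call `probe_echo` against a vetted Echo server and repeat the probe with a harmless payload as a control, so that an unreachable server is not mistaken for blocking.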
Basically keyword blocking, URL

blocking. The result of this is a tool called
Quack. And Quack uses Echo

servers to detect, in a scalable
way, whether keywords are

being blocked around the world. So what
we do is first scan the whole IPv4 space. We

find 47k Echo servers running around the
world. Then we need to check

whether or not they belong to end
users. And that was a very challenging

part because there is not a clear signal.
About 90 percent of them are

infrastructure but there is still some
portion of them that we don't know. So

what we do is we look at the Freedom House
reports and the countries that are

not free or partially free,
as they're called. This

is around 50
countries. And for those we

randomly select some that we want and we
use the OS detection of Nmap. And

if it tells us it's a server, it's a
switch and so on, we use those. So with

the help of so many collaborators after
almost six years we end up with three

systems that can capture TCP/IP blocking,
DNS, and application layer blocking using

infrastructure and organizational
machines. So while it was a dream

or a vision that we could come up with a
weather map and collect this data in a

continuous way, thanks to the help of a lot of
people, especially my students, Will, and

other collaborators, we now have
CensoredPlanet. CensoredPlanet collects

semi-weekly snapshots of Internet
censorship using our vantage points in all

the layers and provides this data in raw
format on our web site. We also

provide some visualizations for people
to be able to see how many vantage points

we have in each country and so on. Of
course, this is the beginning of

CensoredPlanet. We launched this in August
and we have been collecting data for

almost four months and we have a long way
to go. We have users right now,

organizations using our data and helping
us debug by finding things that don't

make sense and pointing them out to us. If any of you
end up using this data, please

share your feedback with us. We are
very responsive and able to change it

as much as you need. We have a
collective of very dedicated people

participating. So, now that we have this
CensoredPlanet let me show you how it can

help when there is a political situation
going on. You all must remember that around

October Jamal Khashoggi, a
Washington Post reporter, disappeared and was

killed at the Saudi Arabian embassy in
Turkey. At the time this happened

there was a lot of media attention, and
this news, especially two weeks in,

spread internationally.
CensoredPlanet didn't know this event was

going to happen. But we had been
collecting this data semi-weekly for 2000

domains or so. And so we went back and we
checked Saudi Arabia. Did we see

anything interesting? And yes, we saw, for
example, that two weeks in, around October

16, for the domains in the news
category and media category, the

censorship related to those doubled. And
let me emphasize, we didn't see like a

block or not block over the whole country.
Not all countries have homogeneous

censorship. We saw it in
several of the ISPs where we had vantage

points. Actually I freaked out when one of
the activists in Saudi Arabia told us

"I don't see this". And we said "What ISP
are you in?" And it wasn't one of the ISPs that

we had vantage points in. So we were
looking for hints that "Is anybody else

seeing what we were seeing?". And so we
ended up seeing there was a commander

lab project that also saw around October
16 the number of malwares or whatever they

are testing is also doubled or tripled. I
don't know the other. So something was

going on two weeks in when the news broke.
Let me emphasize, the news media that I am

talking about are the global news media
that we check, like L.A. Times, Fox News

and so on. But we also checked Arab News,
which, as the activists told us, is a

Saudi Arabian propaganda newspaper. That,
in one of the ISPs, was being poisoned. So

again, censorship measurement is a very
complex problem. So where are we heading?

Well, having said that about side channels
and the techniques that help us remotely

collect this data I have to also say that
the data we collect doesn't capture the whole

picture of internet censorship. I mean,
having root access on a volunteer's

machine to do a detailed test is powerful.
So in the next step, in the next year, one

of our goals is to join forces with OONI to
integrate the data from remote and

local measurements to provide
the best of both worlds. Also, we have

been thinking a lot about what would be
good visualization tools that don't end

up misrepresenting internet censorship. I
literally hate that one. Hate it. The

numbers of vantage points in countries are
not equal. We don't know whether

the data came from vantage
points in one ISP or in all the

ISPs. And then we test domains that are,
like, benign, or, I don't know, defined

based on some western values of
freedom of expression. I believe in all of

them, but still culture and economy might play
some role. And then we put colors on

the map, rank the countries, call some
countries awful and don't give full

attention to the others. So something
needs to change, and that's on our

horizon too: to think about it more deeply.
We want to have more statistical

tools to be able to spot when the patterns
change. We want to be able to compare

countries, for example when Telegram was
being blocked in Russia. If you remember,

millions of IPs were being blocked. If you
don't know, go to my friend Leonid's talk

about Russia. You're going to learn a lot
there. But anyway. So when Russia was

blocking Telegram, I said to everyone, I
bet that following this some other

governments are going to jump to block
Telegram as well. And that's actually what

we heard, rumors like that. So we need to
be able to do that automatically. And

overall, I want to be able to develop an
empirical science of internet censorship

based on rich data with the help of all of
you. CensoredPlanet is now being

maintained by a group of dedicated
students, great friends that I have, and it

needs engineers and political scientists
to jump on our data and help us to bring

meaning to what we are collecting. So if
you are a good engineer or a political

scientist or a dedicated person who wants
to change the world, reach out to me.

As a reference for those of you
interested: these are the publications

that my talk was based on.
And now I am open to questions.

<i>applause</i>

Herald: All right, perfect. Thank you so
much, Roya, so far. We have some time for

questions so if you have a question in the
room please go to one of the room

microphones one, two, three, four, and
five in the very back. And if you're

watching the stream you can ask questions
to the signal angel via IRC or Twitter and

we'll also make sure to relay those to the
speaker and make sure those get asked. So

let's just go ahead and
start with Mic two please.

Question: Hey, great talk. Do you worry
that by publishing your methods as well as

your data that you're going to get a
response from governments that are

censoring things such that it makes it
more difficult for you to monitor what's

being censored? Or has
that already happened?

Roya: It hasn't happened. We have control
measures to be able to detect that. But

that has been... it's a really good
question and often comes up after I

present. I can tell you based on my
experience it's really hard to synchronize

all the ISPs in all the countries to react
to the SYN-ACKs and RSTs that I'm sending.

Like, for example for Augur, these are
unsolicited packets, and for governments to

block those there is going to be a lot of
collateral damage. You might say,

well, Roya, they're going to block the IP
of the University of Michigan, the

spoofing machine. We have a measure for
that. I have multiple places where I

actually have a backup if that
happens. But overall this is a global

scale measurement, and even in one city, or
across multiple ISPs, you know, it's really

hard to synchronize blocking
something and maintaining it. So it is

something that's on our minds, that we are thinking
about. But as of now it's not a worry.

Herald: All right then let's
go over to Mic one.

Question: Thank you. I wondered, it's kind
of similar to this question. What if you

are measuring from a country that is
blocking? Do you also distribute the

measurements over several countries?
Roya: Absolutely. Every snapshot that we

collect is from all the vantage points we
have in certain countries, and a portion

of the vantage points in, like, China or the US,
because they have millions of vantage

points or like thousands of vantage
points. So basically at each snapshot,

which takes us three days, we collect the
data from all of the vantage points.

And so let's say that somebody is reacting
to us. We have a benign domain that we

check as well like for example a domain
example.com or random.com. So if we see

something going on there we actually
double check. But good point, because now

our effort is very much manual labor and we're
trying to automate everything so it's

still a challenge. Thank you.
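
The control-domain check described in this answer can be sketched roughly as follows. This is a minimal illustration, not the Censored Planet code: the function names, the input format (a mapping from domain to whether it looked disrupted from one vantage point in one snapshot), and the use of example.com as the only control are assumptions.

```python
def needs_double_check(results, control_domains=("example.com",)):
    """Return True if any benign control domain looks disrupted.

    `results` maps a domain name to True (disrupted) or False (reachable)
    for a single vantage point in a single snapshot (hypothetical format).
    """
    return any(results.get(domain, False) for domain in control_domains)


def trusted_disruptions(results, control_domains=("example.com",)):
    """List disrupted test domains, unless the control itself failed.

    If a control domain is disrupted, something other than targeted
    blocking is probably going on, so nothing is reported until the
    vantage point has been re-checked by hand.
    """
    if needs_double_check(results, control_domains):
        return []
    return sorted(
        domain
        for domain, disrupted in results.items()
        if disrupted and domain not in control_domains
    )
```

A vantage point whose control domain fails is set aside rather than counted, which matches the "we actually double check" step in the answer.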
Herald: All right then let's go to Mic

three.
Question: Hi. Have you measured how much

IP-ID randomization
breaks your probes?

Roya: Oh. This is also really good. Let me
give a shout out to [name]. He's the guy

who in 1998 discovered IP-ID or published
something that I ended up reading. So like

for example Linux or Ubuntu in the U.S.
version they randomized it, but there are still

these legacy operating systems like
Windows XP and predecessors and FreeBSD

that still have a global IP-ID. So one
argument that often comes up is, what if

all these machines get updated to a new
operating system that doesn't

maintain a global IP-ID? And I can tell you
that, well, we'll come up with another

side channel. For now, that works. But my
gut feeling is that if it didn't change

from 1998 until now, with
everybody saying that a global IP-ID

variable is a horrible idea, it's not going
to change in the coming five years so

we're good.
Question: Thank you.
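
To make the distinction in this answer concrete, here is a minimal sketch of how one might tell a machine with a global IP-ID counter from one that randomizes it, given IP-ID values observed in replies to consecutive probes. The function names and the `max_step` threshold are assumptions for illustration; this is not the selection logic used by Augur.

```python
def ipid_deltas(samples):
    """Differences between consecutive IP-ID values, modulo 2**16.

    The IP-ID field is 16 bits wide, so a global counter wraps
    from 65535 back to 0.
    """
    return [(b - a) % 65536 for a, b in zip(samples, samples[1:])]


def looks_like_global_counter(samples, max_step=1000):
    """Heuristic: every step is positive and small.

    `max_step` is an assumed bound on how much other traffic the machine
    sends between two of our probes. Randomized or constant (all-zero)
    IP-IDs fail this test.
    """
    deltas = ipid_deltas(samples)
    return len(deltas) >= 3 and all(0 < d <= max_step for d in deltas)
```

Only machines that pass a check like this are usable for an IP-ID side channel; randomized or constant IP-IDs give no signal.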

Herald: Okay, then let's just
move on to Mic four.

Question: Thank you very much for the
great talk. When you were introducing

Augur I was wondering, does the detection
of blockage between client and server

necessarily indicate censorship? So,
because you were talking about validating

Augur I was wondering if it turns out that
there is like a false alarm. What do you

think could be the potential cause?
Roya: You're absolutely right. And I tried

to emphasize that what we end up
collecting can be seen as a disruption.

Something didn't work. The SYN-ACK or RST
got disrupted. Is that

censorship, or can it be a random packet
drop? And the way to be able to establish

that confidence is to
aggregate the results. Do we see this

blocking between multiple of the routers
within that country or within that AS?

Because if one of these is an accident,
a packet that just happened to get

dropped, what about the others? So the
whole idea and this is another point that

I'm so, so concerned about: most of the
reports and anecdotes that we read are based

on one VPN or one vantage point in the
country. And then a lot of conclusions

are drawn out of that. And you often
can ask: well, this vantage point might

be subject to so many different things
other than government censorship. Also I

emphasized that censorship, as I use it
in this talk, is any action that stops

users' access to the requested
content. I'm trying to get away from

semantics where intention is implied.
But great question.
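
The aggregation idea in this answer can be sketched as follows: a single disrupted measurement may be a random packet drop, so a verdict is only given per country or AS once several vantage points agree. This is a minimal illustration under assumed inputs, not the statistical method used by Augur; the function name, the labels, and the `min_vantage_points` and `threshold` values are hypothetical.

```python
from collections import defaultdict


def aggregate_disruptions(observations, min_vantage_points=3, threshold=0.8):
    """Summarize per-vantage-point results by country or AS.

    `observations` is an iterable of (group, disrupted) pairs, one pair
    per vantage point, where `group` is a country code or AS label and
    `disrupted` is True if the SYN-ACK or RST appeared to be dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [disrupted, total]
    for group, disrupted in observations:
        counts[group][0] += int(disrupted)
        counts[group][1] += 1

    verdicts = {}
    for group, (blocked, total) in counts.items():
        if total < min_vantage_points:
            # Too few vantage points to say anything with confidence.
            verdicts[group] = "insufficient data"
        elif blocked / total >= threshold:
            # Most vantage points agree, so a random drop is unlikely.
            verdicts[group] = "consistent disruption"
        else:
            verdicts[group] = "isolated or noisy"
    return verdicts
```

Note that even a "consistent disruption" verdict says nothing about intent, in line with the definition of censorship used in the talk.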

Herald: All right, then let's go back to
Mic one right.

Question: Hi Roya. You mentioned that you
have a team of students working on all of

these frameworks. I was wondering if your
frameworks were open source or available

online for collaboration? And if so, where
those resources would be?

Roya: So the data is open. The code hasn't
been. One reason is that I have so little

confidence in sharing code. Like, I'm
friends with Philipp Winter, Dave Fifield.

These people are pro open source and they
constantly blame me for not. But it really

requires confidence to share code. So we
are working on that. At least for Quack, I

think the code can very easily be
shared. For Augur, we spent a heck of a lot

of time to make production-ready code,
and for Satellite I think that is also

ready. I can share them personally with
you, but before sharing with the world I want

to actually have another person audit it
and make sure we're not using a curse word

or something. I don't know. It's just
completely my mind being a little bit

conservative. But if you send me an
e-mail, I'm happy to send you the code.

Question: Thank you.
Herald: All right then move to Mic two.

Question: Thanks again for sharing your
great vision. I find it really

fascinating. Also I'm not really a data
scientist but my question is: did you find

any usefulness in your approaches in
the spreading of the Internet of Things? I

understood that you used routers to make
queries but did you send and maybe receive

back any data from
washing machines, toasters,...?

Roya: I mean, I know, being ethical and
trying to not use end-user machines limits

your access a lot. But that's
our goal. We are going to stick with

things that don't belong to the end users.
And so it's all routers, organizational

machines. So I want to make sure that
whatever we're using belongs to an

entity that can protect itself if
something goes wrong. They can just say:

"Hey, this is a freaking router, it
receives and sends so many things. I mean,

look, let me show you a TCP (?),
for example." A volunteer might not be able

to defend that because it's already
conspiring and collecting this data. But

good question. I wish I could,
but I won't cross that line.

Herald: All right. I don't see any more
questions in the room right now. But we

have one from the internet
so please, signal angel.

Signal Angel: Yes. Actually a question
from koli585: I was in an African

country where the internet has been
completely shut down. How can I quickly

and safely inform others
about the shut down?

Roya: So while I think local users' values
are highly highly needed they can use

social media like Twitter to send and say
whatever, there is a project called IODA.

It's a project at CAIDA, at UCSD in the
U.S., and Philipp Winter, Alberto

[Dainotti] and Alistair [King] are working
on that. They basically remotely keep

track of shutdowns and push them out. If
you look at IODA on Twitter you can

see their live feed of
where the shutdowns happen. So I haven't

thought about how to reach out to the users
telling them what we see or how we can

incorporate the users' feedback. We are
working with a group of researchers that

already developed tools to receive this
data from Tweeters and basically use that

as some level of ground truth, but OONI
does such a great job that I haven't felt

a need.
Herald: Alright. Unless the signal angel

has another question? No?
Roya: And let me, can I add one thing? So

I was listening to a talk about how
Iranians versus Arabs were sympathetic

towards the Boston bombing in the United States
and there were a lot of assumptions and a

lot of conclusions were made that, oh
this, I'm completely paraphrasing. I don't

remember. But these Iranians don't care
because they didn't tweet as much. So

basically their input data was a bunch of
tweets around the time of the Boston bombing.

After the talk was over I said: you know
that in this country Twitter has been

blocked and so many people couldn't tweet.
<i>applause</i>

Herald: Alright. That concludes our Q&amp;A,
so thanks so much Roya.

Roya: Thank you.

<i>applause</i>

<i>postroll music</i>

Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!