WEBVTT
00:00:00.000 --> 00:00:19.237
35C3 preroll music
00:00:19.237 --> 00:00:24.970
Herald Angel: All right. It's my very big
pleasure to introduce Roya Ensafi to you.
00:00:24.970 --> 00:00:31.390
She's gonna talk about "Censored Planet: a
Global Censorship Observatory". I'm
00:00:31.390 --> 00:00:36.230
personally very interested in learning
more about this project. Sounds like it's
00:00:36.230 --> 00:00:41.490
gonna be very important. So please welcome
Roya with a huge warm round of applause.
00:00:41.490 --> 00:00:42.880
Thank you.
00:00:42.880 --> 00:00:48.660
Applause
00:00:48.660 --> 00:00:56.170
Roya: It's wonderful to finally make it to
CCC. I had joint talks with multiple of my
00:00:56.170 --> 00:01:00.219
friends over the past years and the visa
stuff never worked out. This year I
00:01:00.219 --> 00:01:06.430
applied for a conference in August and the
visa worked for coming to CCC. My name is
00:01:06.430 --> 00:01:11.170
Roya Ensafi and I'm a professor at the
University of Michigan. My research
00:01:11.170 --> 00:01:18.069
focuses on security and privacy with the
goal of protecting users from adversarial
00:01:18.069 --> 00:01:27.799
network. So basically I investigate
network interference ...and somebody is
00:01:27.799 --> 00:01:55.770
interfering right now. Damn it. What the
heck. Cool, I'm good. Oh, no I'm not.
00:01:55.770 --> 00:02:07.639
laughter OK. In my lab we develop
techniques and systems to be able to
00:02:07.639 --> 00:02:13.800
detect network interference often at a
scale and apply these frameworks and tools
00:02:13.800 --> 00:02:20.060
to be able to understand the behaviors of
these actors that do the interference and
00:02:20.060 --> 00:02:25.040
use this understanding to be able to come
up with a defense. Today I'm going to talk
00:02:25.040 --> 00:02:30.030
about a project that is very dear to my
heart. The one that I spent six years
00:02:30.030 --> 00:02:34.560
working on it. And in this talk I'm going
to talk about censorship, internet
00:02:34.560 --> 00:02:41.391
censorship. And by that I mean any action
that prevents users' access to the
00:02:41.391 --> 00:02:48.720
requested content. We have heard an
alarming level of censorship happening all
00:02:48.720 --> 00:02:53.980
around the world. And while it was
previously multiple countries that were
00:02:53.980 --> 00:03:01.260
capable of using deep packet inspection
to tamper with user traffic thanks to
00:03:01.260 --> 00:03:08.540
commercialization of these DPIs now many
countries are actually messing with users'
00:03:08.540 --> 00:03:16.951
data. From the moment that the users
type CNN.com in their browsers, their
00:03:16.951 --> 00:03:22.320
traffic is subject to some level of
interference by different actors. First
00:03:22.320 --> 00:03:27.150
for example the DNS query where the
mapping between the domain and the IP
00:03:27.150 --> 00:03:34.100
where the content is, can be manipulated.
For example the DNS answer can be a dead
00:03:34.100 --> 00:03:40.900
IP where the content is not there. If the
DNS succeeds then the users and the servers
00:03:40.900 --> 00:03:47.500
are going to establish a connection, TCP
handshake and that can be easily blocked.
00:03:47.500 --> 00:03:53.840
If that succeeds then users and servers
start actually sending back and forth the
00:03:53.840 --> 00:04:00.209
actual data and there is enough clear
text, be the traffic encrypted or not,
00:04:00.209 --> 00:04:06.130
that the DPI can detect a sensitive
keyword and send a reset packet to both
00:04:06.130 --> 00:04:12.990
sides to basically shut down the connection.
Before I forget let me tell you and
00:04:12.990 --> 00:04:18.150
emphasize that it's not just the
governments and the policies that impose
00:04:18.150 --> 00:04:25.400
on the ISPs that lead to censorship.
Actually server side which provides the
00:04:25.400 --> 00:04:31.319
data are also blocking users. Especially
if they are located in a region that they
00:04:31.319 --> 00:04:39.580
don't provide any revenue. We recently
investigated this issue of geo-blocking
00:04:39.580 --> 00:04:49.180
in depth and provide more details about
what role CDNs actually play. Imagine
00:04:49.180 --> 00:04:57.490
now we have how many users, how many ISPs,
how many transit networks and how many
00:04:57.490 --> 00:05:02.830
websites. Each of which are going to have
their own policies of how to block users'
00:05:02.830 --> 00:05:09.859
access. Moreover, censorship changes from time
to time, region to region and country to
00:05:09.859 --> 00:05:14.759
country. And for that reason many
researchers including me have been
00:05:14.759 --> 00:05:20.660
interested in collecting data about
censorship in a global way and
00:05:20.660 --> 00:05:29.539
continuously. Well, I grew up under severe
censorship. Be it the university,
00:05:29.539 --> 00:05:35.289
government, more frustrating the server
side. And I genuinely believe that
00:05:35.289 --> 00:05:44.739
censorship takes away opportunities and
degrades human dignity. It is not just
00:05:44.739 --> 00:05:54.090
China, Bahrain, Turkey that does internet
censorship. Actually with the DPIs becoming
00:05:54.090 --> 00:06:02.499
cheaper and cheaper many governments are
following their leads. As a result
00:06:02.499 --> 00:06:06.680
Internet is becoming more and more
balkanized and the users around the world
00:06:06.680 --> 00:06:09.870
are going to soon have very, very
different pictures of what this Internet
00:06:09.870 --> 00:06:16.500
is. And we need to be able to collect the
data and to be able to know what is being
00:06:16.500 --> 00:06:25.121
censored, how it's being censored, where
it's being censored and for how long. This
00:06:25.121 --> 00:06:32.509
data then can be used to bring
transparency and accountability to
00:06:32.509 --> 00:06:38.779
governments or private companies that
practice internet censorship. It can help
00:06:38.779 --> 00:06:44.460
us to know where the circumvention tools,
where the defense needs to be deployed. It
00:06:44.460 --> 00:06:49.309
can help us to let the users around the
world to know what their governments are
00:06:49.309 --> 00:06:59.370
up to and more important provide valid and
good data for the policymakers to come up
00:06:59.370 --> 00:07:07.860
with the good policies. Existing research
already shows that if we can provide this
00:07:07.860 --> 00:07:17.860
data to users they act by their own will
to ensure Internet freedom. For many years
00:07:17.860 --> 00:07:22.619
my goal has been to come up with a weather
map, a censorship weather map where you
00:07:22.619 --> 00:07:27.199
can actually see changes in censorship
over time, how some countries are
00:07:27.199 --> 00:07:34.100
different from others and do that for a
continuous duration of time, and for all
00:07:34.100 --> 00:07:41.710
over the world. Creating such a map was
impossible with the techniques, Internet
00:07:41.710 --> 00:07:46.919
measurement methods that we had at that
time. At the time and even the common
00:07:46.919 --> 00:07:53.779
techniques we now use. The measurement
methods to be able to use for measuring
00:07:53.779 --> 00:07:59.080
internet censorship is often by deploying
a software or giving your customized
00:07:59.080 --> 00:08:05.689
Raspberry Pi to either a client or a
server and based on that measure what's
00:08:05.689 --> 00:08:12.550
happening between client and servers.
Well, this approach has a lot of
00:08:12.550 --> 00:08:18.050
limitations. For example there are not
that many volunteers around the whole
00:08:18.050 --> 00:08:25.409
world that are eager to download a
software and run it. Second, the data
00:08:25.409 --> 00:08:33.190
collected from this approach are often not
continuous because the user's connection
00:08:33.190 --> 00:08:37.960
can die for a variety of reasons or users
may lose interest in running the
00:08:37.960 --> 00:08:45.450
software. And therefore we end up with
sparse data where we cannot have a good
00:08:45.450 --> 00:08:53.450
baseline for internet censorship studies.
Moreover, measuring domains that are sensitive
00:08:53.450 --> 00:08:59.800
often create risks for the local
collaborators and might end up with their
00:08:59.800 --> 00:09:09.810
governments retaliating. These risks are
not hypothetical. When the Arab Spring was
00:09:09.810 --> 00:09:17.240
happening I was approached by many
colleagues to recruit local friends and
00:09:17.240 --> 00:09:24.340
colleagues in Middle East to be able to
collect measurement data at the time that
00:09:24.340 --> 00:09:30.010
was very interesting to capture the
behavior of the network and most dangerous
00:09:30.010 --> 00:09:36.450
for the locals, and volunteers to collect
that. My painting actually expressed what
00:09:36.450 --> 00:09:44.090
I felt at the time. I can't just imagine
asking people on the ground to help at
00:09:44.090 --> 00:09:54.810
these times of unrest. In my opinion,
conspiring to collect the data against the
00:09:54.810 --> 00:10:02.450
government's interest can be seen as an
act of treason. And these governments are
00:10:02.450 --> 00:10:11.770
unpredictable often. So it has exposed
these volunteers to a severe risk. While
00:10:11.770 --> 00:10:19.030
no one has yet been arrested because of
measuring internet censorship as far as we
00:10:19.030 --> 00:10:25.740
know, and I don't know how we can know
that on a global scale, I think the clouds
00:10:25.740 --> 00:10:34.210
are on the horizon. I'm still in awe of how the
Turkish government used their surveillance
00:10:34.210 --> 00:10:42.410
data at the time of the coup and tracked down
and detained hundreds of users because
00:10:42.410 --> 00:10:49.400
there was traffic between them and
ByLock, a messenger app that was used by the
00:10:49.400 --> 00:10:57.410
coup administrators. These things happen.
Before I continue, if you know OONI you
00:10:57.410 --> 00:11:08.091
might ask how OONI prevents risk. Well,
with a great level of efforts. And if you
00:11:08.091 --> 00:11:12.130
don't know OONI, OONI is a global
community of volunteers that collect data
00:11:12.130 --> 00:11:20.840
about censorship around the world. Well,
first and foremost they provide their
00:11:20.840 --> 00:11:27.990
volunteers with the very honest consent,
telling them that "hey, if you run this
00:11:27.990 --> 00:11:34.560
software, anybody who is monitoring your
traffic knows what you're up to." They also
00:11:34.560 --> 00:11:39.390
go out of their way to give freedom to
these volunteers to choose what website
00:11:39.390 --> 00:11:46.010
they want to run, what data they want to
push. They establish a great relationship
00:11:46.010 --> 00:11:53.940
with the local activist organization in
the countries. Well, now that I prove to
00:11:53.940 --> 00:11:59.250
you guys that I am a supporter of OONI and
I am actually friends with most of them; I
00:11:59.250 --> 00:12:05.300
want to emphasize that I still believe
that consistent and continuous and global
00:12:05.300 --> 00:12:12.200
data about censorship requires a new
approach that doesn't need volunteers'
00:12:12.200 --> 00:12:21.880
help. I've become obsessed with solving
this problem. What if we could measure
00:12:21.880 --> 00:12:29.160
whether a client, anywhere around the
world, can talk to a server without being
00:12:29.160 --> 00:12:36.290
close to the client? Somewhere from here,
from University of Michigan. And see
00:12:36.290 --> 00:12:42.300
whether the two hosts can talk to each
other, globally and remotely, off the
00:12:42.300 --> 00:12:50.220
path. When I talk to the people about
this, honestly, everybody was like "you
00:12:50.220 --> 00:12:54.190
don't know what you're talking about, it's
really really challenging". Well, they
00:12:54.190 --> 00:13:01.370
were right. The challenge is there, and
I'm going to walk you through it. We have
00:13:01.370 --> 00:13:06.760
at least 140 million IP addresses that
respond to SYN packets. This means they
00:13:06.760 --> 00:13:15.530
speak to the world, and they follow
blindly TCP/IP protocol. So the question
00:13:15.530 --> 00:13:24.400
becomes: how can I leverage the subtle
properties of TCP/IP to be able to detect
00:13:24.400 --> 00:13:36.080
that two hosts can talk to each other?
Well, Spooky Scan is a technique that Jed
00:13:36.080 --> 00:13:43.090
Crandall from University of New Mexico and
I developed that uses TCP/IP side channels
00:13:43.090 --> 00:13:49.770
to be able to detect whether the two
remote hosts can establish a TCP handshake
00:13:49.770 --> 00:13:56.890
or not, and if not, in which direction the
packets are being dropped. Off the path
00:13:56.890 --> 00:14:03.780
and remotely. And I'm gonna start telling
you how this works. First I have to cover
00:14:03.780 --> 00:14:10.810
some background. So any connection that is
based on TCP, one of the basic
00:14:10.810 --> 00:14:15.950
communication protocols we have,
needs to establish a TCP handshake. So
00:14:15.950 --> 00:14:22.730
basically you should, you send a SYN and
in the packet you send, in the IP header,
00:14:22.730 --> 00:14:30.750
you have a field called "identification
IP_ID", and this field is used for
00:14:30.750 --> 00:14:36.610
fragmentation reasons, and I'm going to use
this field a lot in the rest of the talk.
00:14:36.610 --> 00:14:42.300
After the user received a SYN, it is going
to send a SYN-ACK back, have another IP_ID
00:14:42.300 --> 00:14:47.520
in it. And then, if I want to establish a
connection I send ACK. Otherwise I send a
00:14:47.520 --> 00:14:56.070
RESET (RST). Part of the protocol says
that if you send a SYN-ACK packet to a
00:14:56.070 --> 00:15:01.310
machine with a port open or closed, it's
going to send you a RST, telling you "what
00:15:01.310 --> 00:15:05.220
the heck you are sending me SYN-ACK, I
didn't send you a SYN" and another part
00:15:05.220 --> 00:15:09.350
said: if you send a SYN packet to a
machine with the port open, eager to
00:15:09.350 --> 00:15:13.880
establish connection, it will send you a
SYN-ACK. If you don't do anything, because
00:15:13.880 --> 00:15:20.040
TCP/IP is reliable, it's going to send you
multiple SYN-ACKs. It depends on the operating
00:15:20.040 --> 00:15:30.241
system, 3, 5, you name it. Spooky Scan
requires some basic characteristics. For
00:15:30.241 --> 00:15:36.740
example, the client, the vantage points
that we are interested in, should maintain a
00:15:36.740 --> 00:15:44.060
global variable for the IP_ID. It means
that, when they receive the packets and
00:15:44.060 --> 00:15:48.650
they want to send a packet out, no matter
who they're sending the packet to, this
00:15:48.650 --> 00:15:53.590
IP_ID is going to be a shared resource, as
in, it's going to be incremented by one. So by
00:15:53.590 --> 00:15:57.900
just watching the IP_ID changes you can
see how much a machine is noisy, how much
00:15:57.900 --> 00:16:03.820
a machine is sending traffic out. A server
should have a port open, let's say 80 or
00:16:03.820 --> 00:16:08.910
443, and wants to establish a connection,
and the measurement machine, me, should be
00:16:08.910 --> 00:16:15.360
able to spoof packets. It means sending
packet with the source IP different from
00:16:15.360 --> 00:16:20.520
my own machine. To be able to do that, you
need to talk to upstream network and ask
00:16:20.520 --> 00:16:28.260
them not to drop the packets. All of these
requirements I could easily satisfy with a
00:16:28.260 --> 00:16:36.560
little bit of effort. A Spooky Scan starts
with the measurement machine sending a SYN-ACK
00:16:36.560 --> 00:16:41.310
packet to one of these clients with a global
IP_ID, at a time let's say the value is
00:16:41.310 --> 00:16:49.010
7000. The client is going to send back a
RST, following the protocol, revealing to
00:16:49.010 --> 00:16:53.881
me the value of the IP_ID. In the next
step I'm going to send a spoofed SYN
00:16:53.881 --> 00:17:01.779
packet to a server using a client IP. As a
result, the SYN-ACK is going to be sent to
00:17:01.779 --> 00:17:06.289
the client. Again, client is going to send
a RST back, the IP_ID is going to be
00:17:06.289 --> 00:17:11.240
incremented by 1. Next time I query IP_ID
I'm going to see a jump of two. In a
00:17:11.240 --> 00:17:17.189
noiseless model, I know that this machine
talked to the server. If I query it again,
00:17:17.189 --> 00:17:25.070
I won't see any jump. So, Delta 2, Delta
1. Now imagine there is a firewall that
00:17:25.070 --> 00:17:32.520
blocks the SYN-ACKs going from the server
to the client. Well, it doesn't matter how
00:17:32.520 --> 00:17:36.860
much of the traffic I send, it's not going
to get there. It's not going to get there.
00:17:36.860 --> 00:17:44.390
So the delta I see is 1, 1. In the third
case when the packets are going to be
00:17:44.390 --> 00:17:49.790
dropped from the client to the server:
Well, my SYN-ACK gets there. The SYN-ACK
00:17:49.790 --> 00:17:55.030
gets to the client, the client is going to
send the RST back, but it's not going to
00:17:55.030 --> 00:17:59.470
get to the server. And so server thinks
that a packet got dropped, so it's going
00:17:59.470 --> 00:18:07.040
to send multiple SYN-ACKs. And as a result
the number of RSTs is going to go up even more. And
00:18:07.040 --> 00:18:13.690
so what jump I would see is, let's say, 2,
2. Let me put them all together. So you
00:18:13.690 --> 00:18:19.670
have 3 cases. Blocking in this direction.
No blocking and blocking in the other. And
00:18:19.670 --> 00:18:25.890
you see different jumps or different
deltas. So it's detectable. Yes, yes, in a
00:18:25.890 --> 00:18:31.770
noiseless model. I know the clients talk
to so many others and the IP_ID is going
00:18:31.770 --> 00:18:37.590
to be changed because of a variety of
reason. I call all of those noise. And
00:18:37.590 --> 00:18:42.870
this is how we are going to deal with it.
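
NOTE
The three noiseless cases described above (no blocking, server-to-client blocking, client-to-server blocking) can be sketched as a toy simulation. This is only an illustration under stated assumptions, not the measurement code from the talk: the function `ipid_deltas`, its case labels and the single SYN-ACK retransmission are hypothetical, and no packets are actually sent.
```python
from typing import List, Optional
#
def ipid_deltas(blocked: Optional[str] = None, start: int = 7000) -> List[int]:
    """Simulate the IP_ID jumps seen by the measurement machine in two probe intervals."""
    # blocked is None, "server_to_client" or "client_to_server" (assumed labels).
    ipid = start  # the client's global IP_ID counter
    # SYN-ACKs the server emits per interval: one answer to the spoofed SYN,
    # plus one retransmission if the client's RST never reaches the server.
    synacks = [1, 1] if blocked == "client_to_server" else [1, 0]
    ipid += 1  # RST answering our first SYN-ACK probe gives the baseline
    seen = [ipid]
    for count in synacks:
        if blocked != "server_to_client":
            ipid += count  # the client sends one RST per SYN-ACK that arrives
        ipid += 1  # RST answering our next SYN-ACK probe
        seen.append(ipid)
    return [later - earlier for earlier, later in zip(seen, seen[1:])]
#
for case in (None, "server_to_client", "client_to_server"):
    print(case, ipid_deltas(case))
```
In this model the deltas are 2, 1 with no blocking, 1, 1 when server-to-client packets are dropped, and 2, 2 when client-to-server packets are dropped, which matches the three cases in the talk.
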
Well, intuitively thinking we can amplify
00:18:42.870 --> 00:18:47.940
the signal. We can actually instead of
sending one spoofed SYN packet we can send
00:18:47.940 --> 00:18:55.310
n. And for a variety of reasons packets
can get dropped. So we need to repeat this
00:18:55.310 --> 00:19:04.360
measurement. So here is some data from a
Spooky Scan where I used the following
00:19:04.360 --> 00:19:13.300
probing method. For 30 seconds
I sent queries for the IP_ID. And then
00:19:13.300 --> 00:19:20.559
for another 30 seconds I send these 5
spoofed SYN packets. These are machines or
00:19:20.559 --> 00:19:26.680
clients in Azerbaijan, China and United
States. And we wanted to check whether it
00:19:26.680 --> 00:19:32.980
has reached the Tor relay that we had in
Sweden. You can see there are different
00:19:32.980 --> 00:19:40.280
jumps or different level shifts that you
observe in the second phase. And just
00:19:40.280 --> 00:19:45.290
visually looking at it or using auto-
regressive moving average or ARMA you
00:19:45.290 --> 00:19:51.120
can actually detect that. But there is an
insight here, which is that not all the
00:19:51.120 --> 00:19:56.520
clients have the same level of noise. And
for which, for some of them, especially
00:19:56.520 --> 00:20:01.630
these guys, you could easily detect after
five seconds of sending IP_ID queries and then
00:20:01.630 --> 00:20:10.770
five seconds of spoofing. So in the
follow-up work we tried to use this
00:20:10.770 --> 00:20:16.480
insight, to be able to come up with a
scalable and efficient technique to be
00:20:16.480 --> 00:20:24.900
able to use it in a global way. And that
technique is called "Augur". Well Augur
00:20:24.900 --> 00:20:32.920
adopts this probing method. First, for four
seconds it queries IP_ID, then in one
00:20:32.920 --> 00:20:42.160
second sends 10 spoofed SYN-packets. Then
look at the IP_ID-acceleration or second
00:20:42.160 --> 00:20:49.600
derivative, and see whether we see a jump,
a sudden jump at the time of perturbation,
00:20:49.600 --> 00:20:55.520
when we did the spoofing. How confident we
are that that jump is the result of our
00:20:55.520 --> 00:21:02.290
own spoofed packet? Well, I'm not
confident, run it again. I think so, run
00:21:02.290 --> 00:21:09.280
it again, until you have a sufficient
confidence. It turns out there is a
00:21:09.280 --> 00:21:15.230
statistical analysis called "sequential
hypothesis testing" that can be used to be
00:21:15.230 --> 00:21:23.300
able to gradually improve our confidence
about the case we're detecting. So I'm
00:21:23.300 --> 00:21:28.340
going to give you a very, very rough
overview of how this works. But for
00:21:28.340 --> 00:21:36.810
sequential hypothesis testing we need to
define a random variable. And we use
00:21:36.810 --> 00:21:42.910
IP_ID-acceleration at the time of
perturbation, being 1 or 0, based on whether you
00:21:42.910 --> 00:21:53.570
see a jump or not. We also need to calculate
some empirical priors, known
00:21:53.570 --> 00:21:59.450
probabilities. If you look at everything,
what would be the probability that you see
00:21:59.450 --> 00:22:08.179
jump when there is actually no blocking?
And so on. After we put all this together
00:22:08.179 --> 00:22:16.150
then we can formalize an algorithm
starting by running a trial. Update the
00:22:16.150 --> 00:22:20.940
sequence of values for the random
variables. Then check whether this
00:22:20.940 --> 00:22:27.320
sequence of values belongs to the
distribution of where the blocking happens
00:22:27.320 --> 00:22:32.590
or not. What's the likelihood of that? If
you're confident, if we reached the level
00:22:32.590 --> 00:22:39.130
that we are satisfied, then we call it a
case. So putting all this together this is
00:22:39.130 --> 00:22:47.720
how Augur works. We scan the whole IPv4,
find global IP_ID-machines. And then we
00:22:47.720 --> 00:22:55.870
have some constraints: is it a stable
machine? Is it noisy, or does it have noise
00:22:55.870 --> 00:23:02.170
that you want to deal with? We also need
to figure out what website are we
00:23:02.170 --> 00:23:09.290
interested to test reachability towards?
In which countries? So after we decide
00:23:09.290 --> 00:23:18.500
all the input then we run a scheduler
making sure that no client and server are
00:23:18.500 --> 00:23:26.160
under measurement at the same time
because they mess up each other's detection.
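
NOTE
The sequential hypothesis testing step described above can be sketched with one standard form of it, Wald's sequential probability ratio test. This is an assumption for illustration, not necessarily the exact procedure Augur uses: the function name `sequential_test`, the prior probabilities 0.9 and 0.1 and the error rates are hypothetical placeholders for the empirical priors mentioned in the talk.
```python
import math
from typing import Iterable, Tuple
#
def sequential_test(
    jumps: Iterable[bool],
    p_jump_open: float = 0.9,      # assumed P(jump | not blocked)
    p_jump_blocked: float = 0.1,   # assumed P(jump | blocked)
    alpha: float = 0.01,
    beta: float = 0.01,
) -> Tuple[str, int]:
    """Run trials until the log-likelihood ratio crosses a decision threshold."""
    accept_open = math.log((1 - beta) / alpha)
    accept_blocked = math.log(beta / (1 - alpha))
    llr = 0.0  # log-likelihood ratio of "not blocked" versus "blocked"
    trials = 0
    for jumped in jumps:
        trials += 1
        if jumped:
            llr += math.log(p_jump_open / p_jump_blocked)
        else:
            llr += math.log((1 - p_jump_open) / (1 - p_jump_blocked))
        if llr >= accept_open:
            return "not blocked", trials
        if llr <= accept_blocked:
            return "blocked", trials
    return "undecided", trials  # not confident yet: run it again
#
print(sequential_test([True, True, True]))
print(sequential_test([False, False, False]))
print(sequential_test([True, False, True, False]))
```
Each trial contributes one 1-or-0 value of the random variable (jump or no jump at the time of perturbation). Consistent observations reach a decision after a few trials, while contradictory ones leave the test undecided, which corresponds to "run it again" in the talk.
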
00:23:26.160 --> 00:23:32.500
And then we actually use our analysis to
be able to call the case and summarize the
00:23:32.500 --> 00:23:39.191
results. I started by saying that the
common methods have these limitations, for
00:23:39.191 --> 00:23:45.370
example coverage, continuity and ethics.
Well, when it comes to coverage there are
00:23:45.370 --> 00:23:52.620
more than 22-million global IP_ID-
machines. These are WindowsXP or
00:23:52.620 --> 00:24:02.570
predecessors. And FreeBSDs for
example. Compared to the previous work,
00:24:02.570 --> 00:24:07.910
one successful project is RIPE Atlas,
and they have around 10000 probes globally
00:24:07.910 --> 00:24:18.970
deployed. When it comes to continuity we
don't depend on the end user. So it's much
00:24:18.970 --> 00:24:28.720
more reliable to use this. Well, by not
asking volunteers to help we were already
00:24:28.720 --> 00:24:34.570
reducing the risk. Because there are no
users conspiring against their governments
00:24:34.570 --> 00:24:43.000
to collect this data. But our approach is
also not zero risk. If you look you have a
00:24:43.000 --> 00:24:49.860
different kind of risk here. The client
and server are exchanging SYN-ACK and RST
00:24:49.860 --> 00:24:55.810
without either of them giving consent. And
we don't want to ask for consent. Because
00:24:55.810 --> 00:25:01.020
if you do, the dilemma exists. We have to
go back and it's just the same as
00:25:01.020 --> 00:25:06.850
asking volunteers. So, to deal with that
and cope with that, to reduce the risk
00:25:06.850 --> 00:25:15.380
more, we don't use end-IPs. We actually
use routers 2 hops back, which with high
00:25:15.380 --> 00:25:21.650
probability are infrastructure
machines and use those as a vantage point.
00:25:21.650 --> 00:25:31.486
Even under this harsh constraint we still
have 53000 global IP_ID-routers. To test
00:25:31.486 --> 00:25:38.780
the framework to see whether Augur
works we chose 2000 of these global IP_ID-
00:25:38.780 --> 00:25:45.350
machines, uniformly selected from all the
countries where we had vantage points. We
00:25:45.350 --> 00:25:52.549
selected websites from Citizen Lab
test list. This is a research
00:25:52.549 --> 00:25:57.710
organization at the University of Toronto where
they crowdsourced websites that are
00:25:57.710 --> 00:26:03.070
potentially being blocked or potentially
sensitive. And then we used thousands of
00:26:03.070 --> 00:26:09.640
the websites from Alexa top-10k. And then
we got Augur running for 17 days and
00:26:09.640 --> 00:26:17.050
collected this data. One of the challenges
that we have to validate Augur was like:
00:26:17.050 --> 00:26:22.940
So, what is the truth? What is the ground-
truth? What would we see that makes sense?
00:26:22.940 --> 00:26:26.270
So, and this is the biggest and
fundamental challenge for internet-
00:26:26.270 --> 00:26:33.570
censorship anyway. But so the first
approach is leaning on intuition, which is
00:26:33.570 --> 00:26:40.049
like no client should show blocking
towards all the websites. No server should
00:26:40.049 --> 00:26:45.740
show blocking for bulk of our clients. And
if anything happens like that we just
00:26:45.740 --> 00:26:51.960
trash it. And we should see more bias
towards the sensitive domain versus the
00:26:51.960 --> 00:27:01.559
ones that are popular. And so on. And also
we hope to replicate the anecdotes, the
00:27:01.559 --> 00:27:08.870
reports out there. And we did all of
those. And that's how we validate Augur.
00:27:08.870 --> 00:27:17.690
So in the end Augur is a system that is
scalable and efficient, ethical and can be
00:27:17.690 --> 00:27:24.630
used to detect TCP/IP-blocking
continuously. Yes I know that is just
00:27:24.630 --> 00:27:32.310
TCP/IP. What about the other layers? Can
we measure them remotely as well? Well,
00:27:32.310 --> 00:27:40.090
let me focus on the DNS. You might ask: Is
there a way that we can remotely detect
00:27:40.090 --> 00:27:46.890
DNS poisoning or manipulation? Well let's
think it out loud. From now on I'm gonna
00:27:46.890 --> 00:27:54.370
give just the highlights of the papers we
worked on, for lack of time. Well, if we
00:27:54.370 --> 00:28:06.070
scan the whole IPv4 we have a lot of open
DNS resolvers, which means that they are
00:28:06.070 --> 00:28:14.929
open to anybody sending a query to them to
resolve. And these open DNS-resolvers can
00:28:14.929 --> 00:28:22.590
be used as a vantage point. We can use
open DNS-resolvers in different ISPs
00:28:22.590 --> 00:28:29.830
around the world to see whether the DNS
queries are poisoned or not. Well, wait.
00:28:29.830 --> 00:28:35.419
We need to make sure that they don't
belong to the end user. So we come up with
00:28:35.419 --> 00:28:42.760
a lot of checks to make sure that these
open DNS-resolvers are organizational,
00:28:42.760 --> 00:28:50.610
belonging to the ISP or infrastructure.
After we do that then we start sending all
00:28:50.610 --> 00:28:57.980
our queries to these, let's say, open DNS-
resolvers in the ISP in Bahrain, for all
00:28:57.980 --> 00:29:03.929
the domains we're interested in. And capture
what we receive, what IPs we receive. The
00:29:03.929 --> 00:29:11.390
challenge is then to detect what is the
wrong answer. And so we have to come up
00:29:11.390 --> 00:29:19.760
with a lot of heuristics. A set of
heuristics. For example: is the response that
00:29:19.760 --> 00:29:28.610
we received equal to a reply we
got from our control measurements, where
00:29:28.610 --> 00:29:36.500
we know the IP is not blocked or poisoned
or something. The content is there. Or we
00:29:36.500 --> 00:29:42.060
can actually look at the IP that we
received and see whether it has a valid
00:29:42.060 --> 00:29:50.850
HTTPS cert, with or without the SNI, or
Server Name Indication, or something.
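
NOTE
The two heuristics just mentioned (agreement with a control measurement, and a valid certificate at the returned IP) can be sketched as a simple decision function. This is a minimal illustration, not Satellite's implementation: `looks_manipulated` and its arguments are hypothetical, the certificate check is assumed to have been done elsewhere and passed in as a set of IPs, and the addresses come from documentation-only ranges.
```python
from typing import Iterable, Set
#
def looks_manipulated(answer_ips: Iterable[str], control_ips: Set[str], cert_ok_ips: Set[str]) -> bool:
    """Return True only when no returned IP passes any consistency heuristic."""
    for ip in answer_ips:
        if ip in control_ips:
            return False  # heuristic 1: same IP as a control measurement
        if ip in cert_ok_ips:
            return False  # heuristic 2: IP presented a valid certificate for the domain
    return True  # nothing vouches for this answer (an empty answer also lands here)
#
control = {"192.0.2.10", "192.0.2.11"}
cert_ok = {"198.51.100.7"}
print(looks_manipulated(["192.0.2.10"], control, cert_ok))
print(looks_manipulated(["198.51.100.7"], control, cert_ok))
print(looks_manipulated(["203.0.113.99"], control, cert_ok))
```
An answer is flagged only when every heuristic fails, so a CDN that returns a different but legitimate IP is not counted as poisoning in this sketch.
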
00:29:50.850 --> 00:29:55.720
And so on and so forth. So we come up with
lots of heuristics to detect wrong
00:29:55.720 --> 00:30:06.840
answers. The results of all these efforts
ended up being a project called
00:30:06.840 --> 00:30:12.210
"Satellite", which was started by Will
Scott. I'm sure he is in the audience
00:30:12.210 --> 00:30:16.809
somewhere. A great friend of mine and very
good supporter of CensoredPlanet.
00:30:16.809 --> 00:30:24.000
Selflessly, he has been a miracle that I
had the opportunity and fortune to meet
00:30:24.000 --> 00:30:31.890
him. We have Satellite. Satellite automates
all the steps that I told you. For this
00:30:31.890 --> 00:30:37.400
work we use the science that was developed in both
of the works. We call it Satellite because
00:30:37.400 --> 00:30:46.421
of seniority and sticking with the name. So
how much coverage does Satellite have? If you
00:30:46.421 --> 00:30:54.880
scan IPv4 you end up with 4.2 million open
DNS-resolvers in every country in their
00:30:54.880 --> 00:31:01.079
territories. We
actually need to make sure they are
00:31:01.079 --> 00:31:08.950
ethical. For that reason we put a harsh
condition. We say: let's only use the
00:31:08.950 --> 00:31:17.710
ones whose valid PTR record
follows this expression. Basically let's
00:31:17.710 --> 00:31:23.200
just use the open DNS-resolvers that are
name servers or at least their PTR record
00:31:23.200 --> 00:31:29.920
suggests that. This is a really harsh
constraint. Actually, my students have
00:31:29.920 --> 00:31:34.430
been adding more and more regular
expressions for the ones that we are sure
00:31:34.430 --> 00:31:42.610
they are organizational. But for now just
being this harsh we have 40k DNS
00:31:42.610 --> 00:31:56.830
resolvers in almost 169 countries, I guess.
So censorship happens in other layers as
00:31:56.830 --> 00:32:00.700
well. How do we want to deal with that
remote channel, with the remote side
00:32:00.700 --> 00:32:12.520
channel? And, especially, like, what about
http traffic or disruption that can happen
00:32:12.520 --> 00:32:29.809
to you know TLS centric. I hate water.
Oh no. Okay. So. So it's scratching
00:32:29.809 --> 00:32:38.220
noise it's well documented that many DPIs
especially in the Great Firewall of China monitor
00:32:38.220 --> 00:32:43.930
the traffic and then they see a keyword,
a sensitive keyword like "Falun Gong".
00:32:43.930 --> 00:32:50.350
They act and drop traffic or send a RST.
And as I mentioned earlier there is
00:32:50.350 --> 00:32:57.330
enough clear text everywhere. Even in TLS
handshakes SNI is in clear text. And for a
00:32:57.330 --> 00:33:03.590
long time I was trying to come up with a
way of detecting application-layer blocking using
00:33:03.590 --> 00:33:09.320
this fancy side channel. Like, how can I
detect that when the client and server
00:33:09.320 --> 00:33:14.630
need to first establish a TCP handshake,
how the side channel can jump in and then
00:33:14.630 --> 00:33:22.720
detect the rest? We were lucky enough that
the end pointed to a protocol called
00:33:22.720 --> 00:33:32.900
"Echo". It's a protocol designed in 1983
and it's for testing reasons, for the
00:33:32.900 --> 00:33:41.140
debu..it is a debugging tool, basically.
It's a predecessor to ping. And basically,
00:33:41.140 --> 00:33:50.120
after you establish a TCP handshake to
port 7, whatever you send the Echo servers
00:33:50.120 --> 00:33:57.290
on port 7 it's gonna echo it back. Now
think about it. How we can use Echo
00:33:57.290 --> 00:34:04.570
servers to be able to detect application
layer blocking? Well, when it's not
00:34:04.570 --> 00:34:08.490
available, let's say I have an Echo server
in the U.S. and a measurement machine in
00:34:08.490 --> 00:34:13.890
the University of Michigan I establish a
TCP handshake and I send a GET request
00:34:13.890 --> 00:34:19.190
to... using a censored keyword for
example. It's gonna get back to me the
00:34:19.190 --> 00:34:28.269
same thing I sent. But now let's put in a
DPI that is gonna be triggered by it.
00:34:28.269 --> 00:34:37.150
Well, for sure, either I'm going to
receive a RST first or something else. So
00:34:37.150 --> 00:34:43.609
we can actually come up with an algorithm
to be able to use Echo servers to detect
00:34:43.609 --> 00:34:47.969
disruptions at the application layer.
Basically keyword blocking, URL
00:34:47.969 --> 00:34:58.530
blocking. The result of this is a tool called
Quack. And Quack actually uses Echo
00:34:58.530 --> 00:35:06.470
servers to be able to detect in a scalable
way whether the keywords are
00:35:06.470 --> 00:35:14.380
being blocked around the world. So what
we did is first scan the whole IPv4 space. We
00:35:14.380 --> 00:35:22.910
found 47k Echo servers running around the
world. Then we need to be able to check
00:35:22.910 --> 00:35:27.270
whether or not they belong to end
users. And that was a very challenging
00:35:27.270 --> 00:35:36.530
part because there is not a clear signal
as it's... 90 percent of them are
00:35:36.530 --> 00:35:40.730
infrastructure but there is still some
portion of them that we don't know. So
00:35:40.730 --> 00:35:46.610
what we do is we look at the Freedom House
reports and the countries that are
00:35:46.610 --> 00:35:52.931
partially open or not open, not free or
partially free, as they're called. This
00:35:52.931 --> 00:35:58.720
is around 50... This is around 50
countries. And for those we use... we
00:35:58.720 --> 00:36:05.460
randomly select some that we want and we
use OS detection of Nmap. And if you have,
00:36:05.460 --> 00:36:15.750
it will give us back it's a server, it's a
switch and so on. We use those. So with
00:36:15.750 --> 00:36:23.010
the help of so many collaborators after
almost six years we ended up with three
00:36:23.010 --> 00:36:32.420
systems that can capture TCP/IP blocking,
DNS, and application layer blocking using
00:36:32.420 --> 00:36:43.480
infrastructure and organizational
machines. So while it was, it was a dream
00:36:43.480 --> 00:36:47.810
or a vision that we can come up with a
better map to collect this data in a
00:36:47.810 --> 00:36:56.020
continuous way, thanks to the help of a lot of
people especially my students, Will, and
00:36:56.020 --> 00:37:02.060
other collaborators we now have
CensoredPlanet. CensoredPlanet collects
00:37:02.060 --> 00:37:09.020
semi-weekly snapshots of Internet
censorship using our vantage points in all
00:37:09.020 --> 00:37:18.090
the layers and provides this data in a raw
format now on our web site. We also
00:37:18.090 --> 00:37:24.531
provide some visualization for people
to be able to see how many vantage points
00:37:24.531 --> 00:37:29.560
we have in each country and so on. Of
course, this is the beginning of
00:37:29.560 --> 00:37:34.160
CensoredPlanet. We launched this in August
and we have been collecting data for
00:37:34.160 --> 00:37:39.880
almost four months and we have a long way
to go. We have users right now through
00:37:39.880 --> 00:37:45.130
organizations using our data and helping
us debug by finding things that don't
00:37:45.130 --> 00:37:51.950
make sense and pointing them out to us, and any of you
that end up using this data, please
00:37:51.950 --> 00:37:56.930
share your feedback with us and we are
very responsive to be able to change it,
00:37:56.930 --> 00:38:03.940
not as much as you need. They have a
collective of very well dedicated people
00:38:03.940 --> 00:38:10.940
participating. So, now that we have this
CensoredPlanet, let me show you how it can
00:38:10.940 --> 00:38:19.349
help when there is a political situation
going on. You all must remember around
00:38:19.349 --> 00:38:25.410
October, Jamal Khashoggi, a
Washington Post reporter, disappeared,
00:38:25.410 --> 00:38:34.530
killed at the Saudi Arabian embassy in
Turkey. At the time of this happening
00:38:34.530 --> 00:38:40.540
there was a lot of media attention and
this, this news especially two weeks in
00:38:40.540 --> 00:38:46.980
became very internationally spread.
CensoredPlanet didn't know this event was
00:38:46.980 --> 00:38:52.750
going to happen. So we have been
collecting this data semi-weekly for 2000
00:38:52.750 --> 00:38:57.660
domains or so. And so we went back and we
checked Saudi Arabia. Did we see
00:38:57.660 --> 00:39:04.830
anything interesting? And yes, we saw for
example two weeks in, around October
00:39:04.830 --> 00:39:12.680
16, the domains that were in the news
category and media category, the
00:39:12.680 --> 00:39:18.500
censorship related to those doubled. And
let me emphasize, we didn't see like a
00:39:18.500 --> 00:39:23.440
block or not block over the whole country;
not all countries have homogeneous
00:39:23.440 --> 00:39:28.430
censorship happening. We saw it in
multiple of the ISPs where we had vantage
00:39:28.430 --> 00:39:34.770
points. Actually I freaked out when one of
the activists in Saudi Arabia told us that
00:39:34.770 --> 00:39:41.869
"I don't see this". And we said "What ISP
are you in?" And this wasn't one of the ISPs that
00:39:41.869 --> 00:39:49.160
we had vantage points in. So we were
looking for hints that "Is anybody else
00:39:49.160 --> 00:39:55.720
seeing what we were seeing?". And so we
ended up seeing there was a commander
00:39:55.720 --> 00:40:03.560
lab project that also saw around October
16 the number of malwares or whatever they
00:40:03.560 --> 00:40:10.220
are testing also doubled or tripled. I
don't know the other. So something was
00:40:10.220 --> 00:40:17.180
going on two weeks in when the news broke.
Let me emphasize: the news media that I am
00:40:17.180 --> 00:40:22.300
talking about are the global news media
that we check like L.A. Times, Fox News
00:40:22.300 --> 00:40:30.970
and so on. But we also checked Arab News
which, as the activists told us, is a
00:40:30.970 --> 00:40:38.490
Saudi Arabian propaganda newspaper. That
in one of the ISPs was being poisoned. So
00:40:38.490 --> 00:40:49.910
again, censorship measurement is a very
complex problem. So where are we heading?
00:40:49.910 --> 00:40:55.580
Well, having said that about side channels
and the techniques that help us remotely
00:40:55.580 --> 00:41:01.900
collect this data I have to also say that
the data we collect doesn't replicate the
00:41:01.900 --> 00:41:06.950
picture of internet censorship. I mean
having root access on a volunteer's
00:41:06.950 --> 00:41:17.641
machine to do a detailed test is powerful.
So in the next step, in the next year, one
00:41:17.641 --> 00:41:27.720
of our goals is to join forces with OONI to
integrate the data from remote and
00:41:27.720 --> 00:41:37.800
basically local measurements to provide
the best of both worlds. Also, we have
00:41:37.800 --> 00:41:43.990
been thinking a lot about what would be a
good visualization tool that doesn't end
00:41:43.990 --> 00:41:51.391
up misrepresenting internet censorship. I
literally hate that one. Hate it. The
00:41:51.391 --> 00:41:56.860
number of vantage points in countries is
not equal. We don't know whether all the
00:41:56.860 --> 00:42:00.980
vantage points that the data has resulted
from are in one ISP or all of our
00:42:00.980 --> 00:42:08.109
ISPs. And then we test domains that are
like benign and like I don't know defined
00:42:08.109 --> 00:42:13.650
based on some western values of the
freedom of expression. I believe in all of
00:42:13.650 --> 00:42:19.330
them but still culture, economy might play
some role. And then we put colors on
00:42:19.330 --> 00:42:25.030
the map, rank the countries, call some
countries awful and not give full
00:42:25.030 --> 00:42:30.849
attention to the others. So something
needs to be changed and it's on our
00:42:30.849 --> 00:42:37.700
horizon to think about it more deeply.
We want to be able to have more statistical
00:42:37.700 --> 00:42:44.320
tools to be able to spot when the patterns
change. We want to be able to compare the
00:42:44.320 --> 00:42:49.580
countries when for example Telegram was
being blocked in Russia. If you remember
00:42:49.580 --> 00:42:54.910
millions of IPs being blocked. If you
don't know, go to my friend Leonid's talk
00:42:54.910 --> 00:43:00.020
about Russia. You're going to learn a lot
there. But anyway. So when Russia was
00:43:00.020 --> 00:43:06.520
blocking Telegram, I said to everyone I
bet in the following some other
00:43:06.520 --> 00:43:10.370
governments are going to jump to block
Telegram as well. And that's actually what
00:43:10.370 --> 00:43:15.320
we heard, rumors like that. So we need to
be able to do that automatically. And
00:43:15.320 --> 00:43:26.470
overall, I want to be able to develop an
empirical science of internet censorship
00:43:26.470 --> 00:43:36.720
based on rich data with the help of all of
you. CensoredPlanet is now being
00:43:36.720 --> 00:43:43.370
maintained by a group of dedicated
students, great friends that I have and
00:43:43.370 --> 00:43:49.960
needs engineers and political scientists
to jump on our data and help us to bring
00:43:49.960 --> 00:43:57.320
meaning to what we are collecting. So if
you are a good engineer or a political
00:43:57.320 --> 00:44:07.250
scientist or a dedicated person who wants
to change the world, reach out to me.
00:44:07.250 --> 00:44:11.500
As a reference for those of you
interested: these are the publications
00:44:11.500 --> 00:44:19.720
that my talk was based on.
And now I am open to questions.
00:44:19.720 --> 00:44:26.180
applause
00:44:26.180 --> 00:44:31.440
Herald: All right, perfect. Thank you so
much, Roya, so far. We have some time for
00:44:31.440 --> 00:44:35.500
questions so if you have a question in the
room please go to one of the room
00:44:35.500 --> 00:44:40.100
microphones one, two, three, four, and
five in the very back. And if you're
00:44:40.100 --> 00:44:44.490
watching the stream you can ask questions
to the signal angel via IRC or Twitter and
00:44:44.490 --> 00:44:49.360
we'll also make sure to relay those to the
speaker and make sure those get asked. So
00:44:49.360 --> 00:44:52.040
let's just go ahead and
start with Mic two please.
00:44:52.040 --> 00:44:57.349
Question: Hey, great talk. Do you worry
that by publishing your methods as well as
00:44:57.349 --> 00:45:02.690
your data that you're going to get a
response from governments that are
00:45:02.690 --> 00:45:05.869
censoring things such that it makes it
more difficult for you to monitor what's
00:45:05.869 --> 00:45:08.680
being censored? Or has
that already happened?
00:45:08.680 --> 00:45:14.630
Roya: It hasn't happened. We have control
measures to be able to detect that. But
00:45:14.630 --> 00:45:19.260
that has been... it's a really good
question and often comes up after I
00:45:19.260 --> 00:45:25.490
present. I can tell you based on my
experience it's really hard to synchronize
00:45:25.490 --> 00:45:31.490
all the ISPs in all the countries to react
to the SYN-ACK and RST that I'm sending.
00:45:31.490 --> 00:45:36.150
Like, for example for Augur, these are
unsolicited packets and for governments to
00:45:36.150 --> 00:45:41.850
block that there is going to be a lot of
collateral damage. You might say that
00:45:41.850 --> 00:45:45.610
well, Roya, they're going to block the IP
of the University of Michigan. They're a
00:45:45.610 --> 00:45:50.770
spoofing machine. We have a measure for
that. I have multiple places that I
00:45:50.770 --> 00:45:56.190
actually have a backup if that case
happened. But overall this is a global
00:45:56.190 --> 00:46:02.800
scale measurement, and even in one city or
like multiple ISPs you know of it's really
00:46:02.800 --> 00:46:06.920
hard to synchronize being like blocking
something and maintaining. So it is
00:46:06.920 --> 00:46:13.630
something that's on our mind, thinking
about. But as of now it's not a worry.
00:46:13.630 --> 00:46:16.470
Herald: All right then let's
go over to Mic one.
00:46:16.470 --> 00:46:20.510
Question: Thank you. I wondered, it's kind
of similar to this question. What if you
00:46:20.510 --> 00:46:24.920
are measuring from a country that is
blocking? Do you also distribute the
00:46:24.920 --> 00:46:29.970
measurements over several countries?
Roya: Absolutely. Every snapshot that we
00:46:29.970 --> 00:46:37.280
collect is from all the vantage points we
have in like certain countries and a portion
00:46:37.280 --> 00:46:42.100
of vantage points in like China or like US
because they have millions of vantage
00:46:42.100 --> 00:46:46.220
points or like thousands of vantage
points. So basically at each snapshot,
00:46:46.220 --> 00:46:52.340
which takes us three days, we collect the
data from all of the vantage points.
00:46:52.340 --> 00:46:57.580
And so let's say that somebody is reacting
to us. We have a benign domain that we
00:46:57.580 --> 00:47:03.250
check as well like for example a domain
example.com or random.com. So if we see
00:47:03.250 --> 00:47:09.380
something going on there we actually
double check. But good point, because now
00:47:09.380 --> 00:47:14.720
our effort is very manual labor and we're
trying to automate everything so it's
00:47:14.720 --> 00:47:18.900
still a challenge. Thank you.
Herald: All right then let's go to Mic
00:47:18.900 --> 00:47:22.859
three.
Question: Hi. Have you measured how much
00:47:22.859 --> 00:47:28.140
IP-ID randomization
breaks your probes?
00:47:28.140 --> 00:47:35.349
Roya: Oh. This is also really good. Let me
give a shout out to [name]. He's the guy
00:47:35.349 --> 00:47:45.990
who in 1998 discovered IP-ID or published
something that I ended up reading. So like
00:47:45.990 --> 00:47:54.440
for example Linux or Ubuntu in the U.S.
version they randomized it, but there
00:47:54.440 --> 00:47:59.421
are still these legacy operating systems like
WindowsXP and predecessors and FreeBSD
00:47:59.421 --> 00:48:04.750
that still have global IP-ID. So one
argument that often comes up is, what if
00:48:04.750 --> 00:48:09.339
all these machines get updated to the new
operating system where it doesn't
00:48:09.339 --> 00:48:13.780
maintain a global IP-ID? And I can tell you
that, well, we'll come up with another
00:48:13.780 --> 00:48:20.129
side channel. For now, that works. But my
gut feeling is that if it didn't change
00:48:20.129 --> 00:48:25.230
from 1998 until now with all the things
that everybody says that global IP-ID
00:48:25.230 --> 00:48:30.440
variable is a horrible idea, it's not going
to change in the coming five years so
00:48:30.440 --> 00:48:33.230
we're good.
Question: Thank you.
00:48:33.230 --> 00:48:36.520
Herald: Okay, then let's just
move on to Mic four.
00:48:36.520 --> 00:48:41.480
Question: Thank you very much for the
great talk. When you were introducing
00:48:41.480 --> 00:48:46.910
Augur I was wondering, does the detection
of the blockage between client and server
00:48:46.910 --> 00:48:52.190
necessarily indicate censorship? So,
because you were talking about validating
00:48:52.190 --> 00:48:59.130
Augur I was wondering if it turns out that
there is like a false alarm. What do you
00:48:59.130 --> 00:49:04.530
think could be the potential cause?
Roya: You're absolutely right. And I tried
00:49:04.530 --> 00:49:11.630
to emphasize that what we end up
collecting can be seen as a disruption.
00:49:11.630 --> 00:49:17.200
Something didn't work. The SYN-ACK or RST
got disrupted. It may be that there is
00:49:17.200 --> 00:49:22.250
censorship or it can be a random packet
drop. And the way to be able to establish
00:49:22.250 --> 00:49:28.290
that confidence is to
aggregate the results. Do we see this
00:49:28.290 --> 00:49:33.670
blocking between multiple of the routers
within that country or within that AS.
00:49:33.670 --> 00:49:38.880
Because if one of these is by accident
that just didn't make sense or didn't get
00:49:38.880 --> 00:49:43.900
dropped, what about the others? So the
whole idea and this is another point that
00:49:43.900 --> 00:49:50.390
I'm so so concerned about: Most of these
reports and anecdotes that we read are based
00:49:50.390 --> 00:49:55.869
on one VPN or one vantage point in the
country. And then there are a lot
00:49:55.869 --> 00:50:00.770
of conclusions out of that. And you often
can ask that well this vantage point might
00:50:00.770 --> 00:50:05.640
be subject to so many different things
than a government's censorship. Also I
00:50:05.640 --> 00:50:11.980
emphasized that the censorship that I use
in this talk is any action that stops
00:50:11.980 --> 00:50:17.180
users' access to the requested
content. I'm trying to get away from a
00:50:17.180 --> 00:50:23.480
semantics of the intention implied.
But great question.
00:50:23.480 --> 00:50:26.240
Herald: All right, then let's go back to
Mic one right.
00:50:26.240 --> 00:50:29.740
Question: Hi Roya. You mentioned that you
have a team of students working on all of
00:50:29.740 --> 00:50:33.890
these frameworks. I was wondering if your
frameworks were open source or available
00:50:33.890 --> 00:50:37.760
online for collaboration? And if so, where
those resources would be?
00:50:37.760 --> 00:50:45.040
Roya: So the data is open. The code hasn't
been. One reason is I have so little
00:50:45.040 --> 00:50:49.090
confidence in sharing code, like I'm
friends with Philipp Winter, Dave Fifield.
00:50:49.090 --> 00:50:54.170
These people are pro open source and they
constantly blame me for not. But it really
00:50:54.170 --> 00:51:00.721
requires confidence to share code. So we
are working on that at least for Quack. I
00:51:00.721 --> 00:51:06.390
think the code can very easily be
shared. For Augur, we spent a heck of a lot
00:51:06.390 --> 00:51:12.109
of time to make production-ready code
and for Satellite I think that is also
00:51:12.109 --> 00:51:17.420
ready. I can share them personally with
you but before sharing to the world I want
00:51:17.420 --> 00:51:21.560
to actually give another person to audit
and make sure we're not using a curse word
00:51:21.560 --> 00:51:26.420
or something. I don't know. It's just
completely my mind being a little bit
00:51:26.420 --> 00:51:31.030
conservative. But happy if you send me an
e-mail I'll send you the code.
00:51:31.030 --> 00:51:35.640
Question: Thank you.
Herald: All right then move to Mic two.
00:51:35.640 --> 00:51:39.930
Question: Thanks again for sharing your
great vision. I find it really
00:51:39.930 --> 00:51:47.470
fascinating. Also I'm not really a data
scientist but my question is: did you find
00:51:47.470 --> 00:51:56.099
any usefulness in your approaches in
the spreading of the Internet of Things? I
00:51:56.099 --> 00:52:06.960
understood that you used routers to make
queries but did you send and maybe receive
00:52:06.960 --> 00:52:11.260
back any data from
washing machines, toasters,...?
00:52:11.260 --> 00:52:17.480
Roya: I mean, I know, being ethical and
trying to not use end user machines limits
00:52:17.480 --> 00:52:22.589
your access a lot. And but but but that's
our goal. We are going to stick with
00:52:22.589 --> 00:52:28.240
things that don't belong to the end users.
And so it's all routers, organizational
00:52:28.240 --> 00:52:31.940
machines. So I want to make sure that
whatever we're using belongs to an
00:52:31.940 --> 00:52:35.349
entity that can protect itself if
something went wrong. They can just say
00:52:35.349 --> 00:52:39.640
"Hey this is a freaking router, it
receives and sends so many things. I mean,
00:52:39.640 --> 00:52:44.740
look, let me show you a TCP (?),
for example." A volunteer might not be able
00:52:44.740 --> 00:52:49.290
to defend that because it's already
conspiring and collecting this data. But
00:52:49.290 --> 00:52:53.550
good question, I wish I could
but I won't pass that line.
00:52:53.550 --> 00:52:57.380
Herald: All right. I don't see any more
questions in the room right now. But we
00:52:57.380 --> 00:53:01.080
have one from the internet
so please, signal angel.
00:53:01.080 --> 00:53:06.510
Signal Angel: Yes. Actually a question
from koli585: I was in an African
00:53:06.510 --> 00:53:10.009
country where the internet has been
completely shut down. How can I quickly
00:53:10.009 --> 00:53:14.709
and safely inform others
about the shut down?
00:53:14.709 --> 00:53:21.470
Roya: So while I think local users' values
are highly highly needed they can use
00:53:21.470 --> 00:53:27.510
social media like Twitter to send and say
whatever, there is a project called IODA.
00:53:27.510 --> 00:53:36.869
It's a project at CAIDA at UCSD in the
U.S. and Philipp Winter, Alberto
00:53:36.869 --> 00:53:43.160
[Dainotti] and Alistair [King] are working
on that. They basically remotely keep
00:53:43.160 --> 00:53:51.540
track of shutdowns and push them out. If
you look at the IODA on Twitter you can
00:53:51.540 --> 00:54:02.620
see their live feed of how the shutdowns
where the shutdowns happen. So I haven't
00:54:02.620 --> 00:54:09.260
thought about how to reach out to the users
telling them what we see or how we can
00:54:09.260 --> 00:54:18.609
incorporate the users' feedback. We are
working with a group of researchers that
00:54:18.609 --> 00:54:27.000
already developed tools to receive this
data from Tweeters and basically use that
00:54:27.000 --> 00:54:31.890
as some level of ground truth, but OONI
does such a great job that I haven't felt
00:54:31.890 --> 00:54:37.220
a need.
Herald: Alright. Unless the signal angel
00:54:37.220 --> 00:54:43.750
has another question? No?
Roya: And let me, can I add one thing? So
00:54:43.750 --> 00:54:52.940
I was listening to a talk about how
Iranians versus Arabs were sympathetic
00:54:52.940 --> 00:55:01.040
towards the Boston bombing in the United States
and there were a lot of assumptions and a
00:55:01.040 --> 00:55:05.819
lot of conclusions were made that, oh
this, I'm completely paraphrasing. I don't
00:55:05.819 --> 00:55:09.900
remember. But these Iranians don't care
because they didn't tweet as much. So
00:55:09.900 --> 00:55:17.060
basically their input data was a bunch of
tweets around the time of the Boston bombing.
00:55:17.060 --> 00:55:21.599
After the talk was over I said: you know
that in this country Twitter has been
00:55:21.599 --> 00:55:28.929
blocked and so many people couldn't tweet.
applause
00:55:28.929 --> 00:55:33.490
Herald: Alright. That concludes our Q&A,
so thanks so much Roya.
00:55:33.490 --> 00:55:35.436
Roya: Thank you.
00:55:35.436 --> 00:55:41.150
applause
00:55:41.150 --> 00:55:45.970
postroll music
00:55:45.970 --> 00:56:04.000
Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!