1
00:00:00,000 --> 00:00:19,237
35C3 preroll music
2
00:00:19,237 --> 00:00:24,970
Herald Angel: All right. It's my very big
pleasure to introduce Roya Ensafi to you.
3
00:00:24,970 --> 00:00:31,390
She's gonna talk about "Censored Planet: a
Global Censorship Observatory". I'm
4
00:00:31,390 --> 00:00:36,230
personally very interested in learning
more about this project. Sounds like it's
5
00:00:36,230 --> 00:00:41,490
gonna be very important. So please welcome
Roya with a huge warm round of applause.
6
00:00:41,490 --> 00:00:42,880
Thank you.
7
00:00:42,880 --> 00:00:48,660
Applause
8
00:00:48,660 --> 00:00:56,170
Roya: It's wonderful to finally make it to
CCC. I had joint talks with several of my
9
00:00:56,170 --> 00:01:00,219
friends over the past years and the visa
stuff never worked out. This year I
10
00:01:00,219 --> 00:01:06,430
applied for a conference in August and the
visa worked for coming to CCC. My name is
11
00:01:06,430 --> 00:01:11,170
Roya Ensafi and I'm a professor at the
University of Michigan. My research
12
00:01:11,170 --> 00:01:18,069
focuses on security and privacy with the
goal of protecting users from adversarial
13
00:01:18,069 --> 00:01:27,799
networks. So basically I investigate
network interference ...and somebody is
14
00:01:27,799 --> 00:01:55,770
interfering right now. Damn it. What the
heck. Cool, I'm good. Oh, no I'm not.
15
00:01:55,770 --> 00:02:07,639
laughter OK. In my lab we develop
techniques and systems to be able to
16
00:02:07,639 --> 00:02:13,800
detect network interference often at a
scale and apply these frameworks and tools
17
00:02:13,800 --> 00:02:20,060
to be able to understand the behaviors of
these actors that do the interference and
18
00:02:20,060 --> 00:02:25,040
use this understanding to be able to come
up with a defense. Today I'm going to talk
19
00:02:25,040 --> 00:02:30,030
about a project that is very dear to my
heart. The one that I spent six years
20
00:02:30,030 --> 00:02:34,560
working on. And in this talk I'm going
to talk about censorship, internet
21
00:02:34,560 --> 00:02:41,391
censorship. And by that I mean any action
that prevents users' access to the
22
00:02:41,391 --> 00:02:48,720
requested content. We have heard an
alarming level of censorship happening all
23
00:02:48,720 --> 00:02:53,980
around the world. And while it was
previously multiple countries that were
24
00:02:53,980 --> 00:03:01,260
capable of using deep packet inspection
to tamper with user traffic, thanks to
25
00:03:01,260 --> 00:03:08,540
commercialization of these DPIs now many
countries are actually messing with users'
26
00:03:08,540 --> 00:03:16,951
data. From the moment that the users
type CNN.com in their browsers, their
27
00:03:16,951 --> 00:03:22,320
traffic is subject to some level of
interference by different actors. First
28
00:03:22,320 --> 00:03:27,150
for example the DNS query where the
mapping between the domain and the IP
29
00:03:27,150 --> 00:03:34,100
where the content is, can be manipulated.
For example the DNS answer can be a dead
30
00:03:34,100 --> 00:03:40,900
IP where the content is not there. If the
DNS succeeds then the users and the servers
31
00:03:40,900 --> 00:03:47,500
are going to establish a connection, TCP
handshake and that can be easily blocked.
32
00:03:47,500 --> 00:03:53,840
If that succeeds then users and servers
start actually sending back and forth the
33
00:03:53,840 --> 00:04:00,209
actual data, and there is enough clear
text, be the traffic encrypted or not,
34
00:04:00,209 --> 00:04:06,130
that the DPI can detect a sensitive
keyword and send a reset packet to both
35
00:04:06,130 --> 00:04:12,990
sides, basically shutting down the connection.
Before I forget let me tell you and
36
00:04:12,990 --> 00:04:18,150
emphasize that it's not just the
governments and the policies that impose
37
00:04:18,150 --> 00:04:25,400
on the ISPs that lead to censorship.
Actually server side which provides the
38
00:04:25,400 --> 00:04:31,319
data are also blocking users. Especially
if they are located in a region that
39
00:04:31,319 --> 00:04:39,580
doesn't provide any revenue. We recently
investigated this issue of geoblocking
40
00:04:39,580 --> 00:04:49,180
in depth and provided more details about
what role CDNs actually play. Imagine
41
00:04:49,180 --> 00:04:57,490
now we have how many users, how many ISPs,
how many transit networks and how many
42
00:04:57,490 --> 00:05:02,830
websites. Each of which are going to have
their own policies of how to block users'
43
00:05:02,830 --> 00:05:09,859
access. Moreover, censorship changes from time
to time, region to region and country to
44
00:05:09,859 --> 00:05:14,759
country. And for that reason many
researchers including me have been
45
00:05:14,759 --> 00:05:20,660
interested in collecting data about
censorship in a global way and
46
00:05:20,660 --> 00:05:29,539
continuously. Well, I grew up under severe
censorship. Be it the university,
47
00:05:29,539 --> 00:05:35,289
government, more frustrating the server
side. And I genuinely believe that
48
00:05:35,289 --> 00:05:44,739
censorship takes away opportunities and
degrades human dignity. It is not just
49
00:05:44,739 --> 00:05:54,090
China, Bahrain, Turkey that does internet
censorship. Actually with the DPIs becoming
50
00:05:54,090 --> 00:06:02,499
cheaper and cheaper many governments are
following their lead. As a result
51
00:06:02,499 --> 00:06:06,680
the Internet is becoming more and more
balkanized and the users around the world
52
00:06:06,680 --> 00:06:09,870
are going to soon have a very, very
different picture of what this Internet
53
00:06:09,870 --> 00:06:16,500
is. And we need to be able to collect the
data and to be able to know what is being
54
00:06:16,500 --> 00:06:25,121
censored, how it's being censored, where
it's being censored and for how long. This
55
00:06:25,121 --> 00:06:32,509
data then can be used to bring
transparency and accountability to
56
00:06:32,509 --> 00:06:38,779
governments or private companies that
practice internet censorship. It can help
57
00:06:38,779 --> 00:06:44,460
us to know where the circumvention tools,
where the defense needs to be deployed. It
58
00:06:44,460 --> 00:06:49,309
can help us to let the users around the
world to know what their governments are
59
00:06:49,309 --> 00:06:59,370
up to and, more importantly, provide valid and
good data for the policymakers to come up
60
00:06:59,370 --> 00:07:07,860
with the good policies. Existing research
already shows that if we can provide this
61
00:07:07,860 --> 00:07:17,860
data to users they act by their own will
to ensure Internet freedom. For many years
62
00:07:17,860 --> 00:07:22,619
my goal has been to come up with a weather
map, a censorship weather map where you
63
00:07:22,619 --> 00:07:27,199
can actually see changes in censorship
over time, how some countries are
64
00:07:27,199 --> 00:07:34,100
different from others and do that for a
continuous duration of time, and for all
65
00:07:34,100 --> 00:07:41,710
over the world. Creating such a map was
impossible with the techniques, Internet
66
00:07:41,710 --> 00:07:46,919
measurement methods that we had at that
time. At the time and even the common
67
00:07:46,919 --> 00:07:53,779
techniques we now use. The measurement
methods used for measuring
68
00:07:53,779 --> 00:07:59,080
internet censorship often involve deploying
software or giving a customized
69
00:07:59,080 --> 00:08:05,689
Raspberry Pi to either a client or a
server and based on that measure what's
70
00:08:05,689 --> 00:08:12,550
happening between client and servers.
Well, this approach has a lot of
71
00:08:12,550 --> 00:08:18,050
limitations. For example there are not
that many volunteers around the whole
72
00:08:18,050 --> 00:08:25,409
world that are eager to download
software and run it. Second, the data
73
00:08:25,409 --> 00:08:33,190
collected from this approach are often not
continuous because the user's connection
74
00:08:33,190 --> 00:08:37,960
can die for a variety of reasons or users
may lose interest to keep running the
75
00:08:37,960 --> 00:08:45,450
software. And therefore we end up with
sparse data where we cannot have a good
76
00:08:45,450 --> 00:08:53,450
baseline for internet censorship studies.
Moreover, measuring domains that are sensitive
77
00:08:53,450 --> 00:08:59,800
often create risks for the local
collaborators and might end up with their
78
00:08:59,800 --> 00:09:09,810
government retaliating. These risks are
not hypothetical. When the Arab Spring was
79
00:09:09,810 --> 00:09:17,240
happening I was approached by many
colleagues to recruit local friends and
80
00:09:17,240 --> 00:09:24,340
colleagues in Middle East to be able to
collect measurement data at the time that
81
00:09:24,340 --> 00:09:30,010
was very interesting to capture the
behavior of the network and most dangerous
82
00:09:30,010 --> 00:09:36,450
for the locals, and volunteers to collect
that. My painting actually expressed what
83
00:09:36,450 --> 00:09:44,090
I felt at the time. I can't just imagine
asking people on the ground to help at
84
00:09:44,090 --> 00:09:54,810
these times of unrest. In my opinion,
conspiring to collect the data against the
85
00:09:54,810 --> 00:10:02,450
government's interest can be seen as an
act of treason. And these governments are
86
00:10:02,450 --> 00:10:11,770
unpredictable often. So it has exposed
these volunteers to a severe risk. While
87
00:10:11,770 --> 00:10:19,030
no one has yet been arrested because of
measuring internet censorship as far as we
88
00:10:19,030 --> 00:10:25,740
know, and I don't know how we can know
that on a global scale, I think the clouds
89
00:10:25,740 --> 00:10:34,210
are on the horizon. I'm still in awe of how
Turkish government used their surveillance
90
00:10:34,210 --> 00:10:42,410
data at the time of a coup and tracked down
and detained hundreds of users because
91
00:10:42,410 --> 00:10:49,400
there was traffic between them and
ByLock, a messenger app that was used by coup
92
00:10:49,400 --> 00:10:57,410
administrators. These things happen.
Before I continue, if you know OONI you
93
00:10:57,410 --> 00:11:08,091
might ask how OONI prevents risk. Well,
with a great level of effort. And if you
94
00:11:08,091 --> 00:11:12,130
don't know OONI, OONI is a global
community of volunteers that collect data
95
00:11:12,130 --> 00:11:20,840
about censorship around the world. Well,
first and foremost they provide their
96
00:11:20,840 --> 00:11:27,990
volunteers with the very honest consent,
telling them that "hey, if you run this
97
00:11:27,990 --> 00:11:34,560
software, anybody who is monitoring your
traffic knows what you're up to." They also
98
00:11:34,560 --> 00:11:39,390
go out of their way to give freedom to
these volunteers to choose what website
99
00:11:39,390 --> 00:11:46,010
they want to run, what data they want to
push. They establish a great relationship
100
00:11:46,010 --> 00:11:53,940
with the local activist organizations in
the countries. Well, now that I prove to
101
00:11:53,940 --> 00:11:59,250
you guys that I am a supporter of OONI and
I am actually friends with most of them; I
102
00:11:59,250 --> 00:12:05,300
want to emphasize that I still believe
that consistent and continuous and global
103
00:12:05,300 --> 00:12:12,200
data about censorship requires a new
approach that doesn't need volunteers'
104
00:12:12,200 --> 00:12:21,880
help. I've become obsessed with solving
this problem. What if we could measure
105
00:12:21,880 --> 00:12:29,160
without a client, in anywhere around the
world, can talk to a server without being
106
00:12:29,160 --> 00:12:36,290
close to a client. Somewhere from here,
from University of Michigan. And see
107
00:12:36,290 --> 00:12:42,300
whether the two hosts can talk to each
other, globally and remotely, off the
108
00:12:42,300 --> 00:12:50,220
path. When I talked to people about
this, honestly, everybody was like "you
109
00:12:50,220 --> 00:12:54,190
don't know what you're talking about, it's
really really challenging". Well, they
110
00:12:54,190 --> 00:13:01,370
were right. The challenge is there, and
I'm going to walk you through it. We have
111
00:13:01,370 --> 00:13:06,760
at least 140 million IP addresses that
respond to same packet. This means they
112
00:13:06,760 --> 00:13:15,530
speak to the world, and they blindly
follow the TCP/IP protocol. So the question
113
00:13:15,530 --> 00:13:24,400
becomes: how can I leverage the subtle
properties of TCP/IP to be able to detect
114
00:13:24,400 --> 00:13:36,080
that two hosts can talk to each other?
Well, Spooky Scan is a technique that Jed
115
00:13:36,080 --> 00:13:43,090
Crandall from University of New Mexico and
I developed that uses TCP/IP side channels
116
00:13:43,090 --> 00:13:49,770
to be able to detect whether the two
remote hosts can establish a TCP handshake
117
00:13:49,770 --> 00:13:56,890
or not, and if not, in which direction the
packets are being dropped. Off the path
118
00:13:56,890 --> 00:14:03,780
and remotely. And I'm gonna start telling
you how this works. First I have to cover
119
00:14:03,780 --> 00:14:10,810
some background. So any connection that is
based on TCP, one of the basic
120
00:14:10,810 --> 00:14:15,950
communication protocols we have, is it
needs to establish a TCP handshake. So
121
00:14:15,950 --> 00:14:22,730
basically you should, you send a SYN and
in the packet you send, in the IP header,
122
00:14:22,730 --> 00:14:30,750
you have a field called "identification
IP_ID", and this field is used for
123
00:14:30,750 --> 00:14:36,610
fragmentation reason, and I'm going to use
this field a lot in the rest of the talk.
124
00:14:36,610 --> 00:14:42,300
After the user receives a SYN, it is going
to send a SYN-ACK back, with another IP_ID
125
00:14:42,300 --> 00:14:47,520
in it. And then, if I want to establish a
connection I send ACK. Otherwise I send a
126
00:14:47,520 --> 00:14:56,070
RESET (RST). Part of the protocol says
that if you send a SYN-ACK packet to a
127
00:14:56,070 --> 00:15:01,310
machine with a port open or closed, it's
going to send you a RST, telling you "what
128
00:15:01,310 --> 00:15:05,220
the heck you are sending me SYN-ACK, I
didn't send you a SYN" and another part
129
00:15:05,220 --> 00:15:09,350
said: if you send a SYN packet to a
machine with the port open, eager to
130
00:15:09,350 --> 00:15:13,880
establish connection, it will send you a
SYN-ACK. If you don't do anything, because
131
00:15:13,880 --> 00:15:20,040
TCP/IP is reliable, it's going to send you
multiple SYN-ACKs. It depends on operating
132
00:15:20,040 --> 00:15:30,241
system, 3, 5, you name it. Spooky Scan
requires some basic characteristics. For
133
00:15:30,241 --> 00:15:36,740
example, the client, the vantage points
that we are interested in, should maintain a
134
00:15:36,740 --> 00:15:44,060
global variable for the IP_ID. It means
that, when they receive the packets and
135
00:15:44,060 --> 00:15:48,650
they want to send a packet out, no matter
who they're sending the packet to, this
136
00:15:48,650 --> 00:15:53,590
IP_ID is going to be a shared resource, as
in it's going to be incremented by one. So by
137
00:15:53,590 --> 00:15:57,900
just watching the IP_ID changes you can
see how much a machine is noisy, how much
138
00:15:57,900 --> 00:16:03,820
a machine is sending traffic out. A server
should have a port open, let's say 80 or
139
00:16:03,820 --> 00:16:08,910
443, and wants to establish a connection,
and the measurement machine, me, should be
140
00:16:08,910 --> 00:16:15,360
able to spoof packets. It means sending
packets with a source IP different from
141
00:16:15,360 --> 00:16:20,520
my own machine. To be able to do that, you
need to talk to upstream network and ask
142
00:16:20,520 --> 00:16:28,260
them not to drop the packets. All of these
requirements I could easily satisfy with a
143
00:16:28,260 --> 00:16:36,560
little bit of effort. A Spooky Scan starts
with the measurement machine sending a SYN-ACK
144
00:16:36,560 --> 00:16:41,310
packet to one of these clients with a global
IP_ID; at that time, let's say the value is
145
00:16:41,310 --> 00:16:49,010
7000. The client is going to send back a
RST, following the protocol, revealing to
146
00:16:49,010 --> 00:16:53,881
me the value of the IP_ID. In the next
step I'm going to send a spoofed SYN
147
00:16:53,881 --> 00:17:01,779
packet to a server using a client IP. As a
result, the SYN-ACK is going to be sent to
148
00:17:01,779 --> 00:17:06,289
the client. Again, client is going to send
a RST back, the IP_ID is going to be
149
00:17:06,289 --> 00:17:11,240
incremented by 1. Next time I query IP_ID
I'm going to see a jump of two. In a
150
00:17:11,240 --> 00:17:17,189
noiseless model, I know that this machine
talked to the server. If I query it again,
151
00:17:17,189 --> 00:17:25,070
I won't see any jump. So, Delta 2, Delta
1. Now imagine there is a firewall that
152
00:17:25,070 --> 00:17:32,520
blocks the SYN-ACKs going from the server
to the client. Well, it doesn't matter how
153
00:17:32,520 --> 00:17:36,860
much of the traffic I send, it's not going
to get there. It's not going to get there.
154
00:17:36,860 --> 00:17:44,390
So the delta I see is 1, 1. In the third
case when the packets are going to be
155
00:17:44,390 --> 00:17:49,790
dropped from the client to the server:
Well, my SYN-ACK gets there. The SYN-ACK
156
00:17:49,790 --> 00:17:55,030
gets to the client, the client is going to
send the RST back, but it's not going to
157
00:17:55,030 --> 00:17:59,470
get to the server. And so server thinks
that a packet got dropped, so it's going
158
00:17:59,470 --> 00:18:07,040
to send multiple SYN-ACKs. And as a result
the RST is going to be plus plus more. And
159
00:18:07,040 --> 00:18:13,690
so what jump I would see is, let's say, 2,
2. Let me put them all together. So you
160
00:18:13,690 --> 00:18:19,670
have 3 cases. Blocking in this direction.
No blocking and blocking in the other. And
161
00:18:19,670 --> 00:18:25,890
you see different jumps or different
deltas. So it's detectable. Yes, yes, in a
162
00:18:25,890 --> 00:18:31,770
noiseless model. I know the clients talk
to so many others and the IP_ID is going
163
00:18:31,770 --> 00:18:37,590
to be changed because of a variety of
reasons. I call all of those noise. And
164
00:18:37,590 --> 00:18:42,870
this is how we are going to deal with it.
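Before turning to noise, the three noiseless cases just described can be sketched in Python. This is only an illustration under stated assumptions: the names `Client`, `observed_deltas`, `CASES` and `classify` are hypothetical and not from the talk, the starting IP_ID of 7000 is borrowed from the example above, and the server is assumed to retransmit its SYN-ACK exactly once between the second and third IP_ID queries.

```python
from dataclasses import dataclass

IP_ID_MODULUS = 65536  # the IP identification field is 16 bits wide


@dataclass
class Client:
    """A host that uses one global IP_ID counter for every packet it sends."""
    ip_id: int = 7000  # starting value borrowed from the talk's example

    def send_packet(self) -> int:
        """Send one packet and return the IP_ID it carried."""
        sent = self.ip_id
        self.ip_id = (self.ip_id + 1) % IP_ID_MODULUS
        return sent


def observed_deltas(server_to_client_blocked: bool,
                    client_to_server_blocked: bool) -> tuple:
    """Return the two IP_ID jumps the measurement machine would observe."""
    client = Client()
    first = client.send_packet()   # RST answering our first SYN-ACK query
    # One spoofed SYN now goes to the server, claiming to be the client.
    synack_arrives = not server_to_client_blocked
    if synack_arrives:
        client.send_packet()       # client RSTs the unexpected SYN-ACK
    second = client.send_packet()  # RST answering our second query
    if synack_arrives and client_to_server_blocked:
        # The server never saw the RST, so it retransmits the SYN-ACK
        # (assumed: once) and the client answers with another RST.
        client.send_packet()
    third = client.send_packet()   # RST answering our third query
    return ((second - first) % IP_ID_MODULUS,
            (third - second) % IP_ID_MODULUS)


# Delta patterns named in the talk, valid only without background traffic.
CASES = {
    (2, 1): "no blocking",
    (1, 1): "server-to-client blocked",
    (2, 2): "client-to-server blocked",
}


def classify(deltas: tuple) -> str:
    return CASES.get(deltas, "unknown")
```

Any other delta pattern falls through to "unknown", which is where the noise handling discussed next comes in.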
Well, intuitively thinking we can amplify
165
00:18:42,870 --> 00:18:47,940
the signal. We can actually instead of
sending one spoofed SYN packet we can send
166
00:18:47,940 --> 00:18:55,310
n. And for a variety of reasons packets
can get dropped. So we need to repeat this
167
00:18:55,310 --> 00:19:04,360
measurement. So here is some data from a
Spooky Scan where I used the following
168
00:19:04,360 --> 00:19:13,300
probing method. For 30 seconds I
sent queries for the IP_ID. And then
169
00:19:13,300 --> 00:19:20,559
for another 30 seconds I send these 5
spoofed SYN packets. These are machines or
170
00:19:20,559 --> 00:19:26,680
clients in Azerbaijan, China and United
States. And we wanted to check whether it
171
00:19:26,680 --> 00:19:32,980
has reached the Tor relay that we had in
Sweden. You can see there are different
172
00:19:32,980 --> 00:19:40,280
jumps or different level shifts that you
observe in the second phase. And just
173
00:19:40,280 --> 00:19:45,290
visually looking at it or using auto-
regressive moving average or ARMA you
174
00:19:45,290 --> 00:19:51,120
can actually detect that. But there is an
insight here, which is that not all the
175
00:19:51,120 --> 00:19:56,520
clients have the same level of noise. And
for which, for some of them, especially
176
00:19:56,520 --> 00:20:01,630
these guys, you could easily detect after
five seconds of sending IP_ID queries and then
177
00:20:01,630 --> 00:20:10,770
five seconds of spoofing. So in the
follow-up work we tried to use this
178
00:20:10,770 --> 00:20:16,480
insight, to be able to come up with a
scalable and efficient technique to be
179
00:20:16,480 --> 00:20:24,900
able to use it in a global way. And that
technique is called "Augur". Well Augur
180
00:20:24,900 --> 00:20:32,920
adopts this probing method. First, for four
seconds it queries IP_ID, then in one
181
00:20:32,920 --> 00:20:42,160
second sends 10 spoofed SYN-packets. Then
look at the IP_ID-acceleration or second
182
00:20:42,160 --> 00:20:49,600
derivative, and see whether we see a jump,
a sudden jump at the time of perturbation,
183
00:20:49,600 --> 00:20:55,520
when we did the spoofing. How confident we
are that that jump is the result of our
184
00:20:55,520 --> 00:21:02,290
own spoofed packet? Well, I'm not
confident, run it again. I think so, run
185
00:21:02,290 --> 00:21:09,280
it again, until you have a sufficient
confidence. It turns out there is a
186
00:21:09,280 --> 00:21:15,230
statistical analysis called "sequential
hypothesis testing" that can be used to be
187
00:21:15,230 --> 00:21:23,300
able to gradually improve our confidence
about the case we're detecting. So I'm
188
00:21:23,300 --> 00:21:28,340
going to give you a very, very rough
overview of how this works. But for
189
00:21:28,340 --> 00:21:36,810
sequential hypothesis testing we need to
define a random variable. And we use
190
00:21:36,810 --> 00:21:42,910
IP_ID-acceleration at the time of
perturbation, being 1 or 0, based on whether you
191
00:21:42,910 --> 00:21:53,570
see a jump or not. We also need to calculate
some empirical priors, known
192
00:21:53,570 --> 00:21:59,450
probabilities. If you look at everything,
what would be the probability that you see
193
00:21:59,450 --> 00:22:08,179
jump when there is actually no blocking?
And so on. After we put all this together
194
00:22:08,179 --> 00:22:16,150
then we can formalize an algorithm
starting by running a trial. Update the
195
00:22:16,150 --> 00:22:20,940
sequence of values for the random
variables. Then check whether this
196
00:22:20,940 --> 00:22:27,320
sequence of values belongs to the
distribution of where the blocking happens
197
00:22:27,320 --> 00:22:32,590
or not. What's the likelihood of that? If
you're confident, if we reached the level
198
00:22:32,590 --> 00:22:39,130
that we are satisfied, then we call it a
case. So putting all this together this is
199
00:22:39,130 --> 00:22:47,720
how Augur works. We scan the whole IPv4,
find global IP_ID-machines. And then we
200
00:22:47,720 --> 00:22:55,870
have some constraints: is it a stable
machine? Is it noisy, or does it have noise
201
00:22:55,870 --> 00:23:02,170
that you want to deal with? We also need
to figure out what website are we
202
00:23:02,170 --> 00:23:09,290
interested to test reachability towards?
What countries we are? So after we decide
203
00:23:09,290 --> 00:23:18,500
all the input then we run a scheduler
making sure that no client and server are
204
00:23:18,500 --> 00:23:26,160
under measurement at the same time
because they would mess up each other's detection.
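The trial-by-trial decision rule described above (sequential hypothesis testing) can be sketched with Wald's sequential probability ratio test, one standard form of the idea. This is a hedged sketch, not the Augur implementation: the function name `sprt`, the error rates `alpha` and `beta`, and the example priors are all assumptions made for illustration. Each observation is 1 if a jump in IP_ID acceleration was seen in that trial and 0 otherwise.

```python
import math
from typing import Sequence, Tuple


def sprt(observations: Sequence[int],
         p_jump_if_open: float,
         p_jump_if_blocked: float,
         alpha: float = 0.01,
         beta: float = 0.01) -> Tuple[str, int]:
    """Run trials until the evidence favors "blocked" or "open".

    p_jump_if_open / p_jump_if_blocked are the empirical priors: the
    probability of seeing a jump when there is no blocking / blocking.
    alpha and beta are the tolerated false-positive and false-negative
    rates (illustrative values, not taken from the talk).
    Returns the decision and the number of trials consumed.
    """
    upper = math.log((1 - beta) / alpha)  # cross it: decide "blocked"
    lower = math.log(beta / (1 - alpha))  # cross it: decide "open"
    log_likelihood_ratio = 0.0
    for trial, saw_jump in enumerate(observations, start=1):
        if saw_jump:
            log_likelihood_ratio += math.log(
                p_jump_if_blocked / p_jump_if_open)
        else:
            log_likelihood_ratio += math.log(
                (1 - p_jump_if_blocked) / (1 - p_jump_if_open))
        if log_likelihood_ratio >= upper:
            return "blocked", trial
        if log_likelihood_ratio <= lower:
            return "open", trial
    # Not confident yet: the caller would schedule more trials.
    return "undecided", len(observations)
```

With priors of 0.9 and 0.1, three consistent trials are enough to reach a decision at these error rates, while contradictory trials cancel out and leave the pair "undecided", which corresponds to the "run it again" step in the talk.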
205
00:23:26,160 --> 00:23:32,500
And then we actually use our analysis to
be able to call the case and summarize the
206
00:23:32,500 --> 00:23:39,191
results. I started by saying that the
common methods have these limitations, for
207
00:23:39,191 --> 00:23:45,370
example coverage, continuity and ethics.
Well, when it comes to coverage there are
208
00:23:45,370 --> 00:23:52,620
more than 22-million global IP_ID-
machines. These are WindowsXP or
209
00:23:52,620 --> 00:24:02,570
predecessors. And FreeBSDs for
example. Compared to previous work,
210
00:24:02,570 --> 00:24:07,910
one successful project is the RIPE Atlas,
and they have around 10000 probes globally
211
00:24:07,910 --> 00:24:18,970
deployed. When it comes to continuity we
don't depend on the end user. So it's much
212
00:24:18,970 --> 00:24:28,720
more reliable to use this. Well, by not
asking volunteers to help we were already
213
00:24:28,720 --> 00:24:34,570
reducing the risk. Because there is no
users conspiring against their governments
214
00:24:34,570 --> 00:24:43,000
to collect this data. But our approach is
also not zero risk. If you look you have a
215
00:24:43,000 --> 00:24:49,860
different kind of risk here. The client
and server exchanging SYN-ACK and RST
216
00:24:49,860 --> 00:24:55,810
without each of them giving a consent. And
we don't want to ask for consent. Because
217
00:24:55,810 --> 00:25:01,020
if you do, the dilemma exists. We have to
go back and it's just the same that's
218
00:25:01,020 --> 00:25:06,850
asking volunteers. So, to deal with that
and cope with that, to reduce the risk
219
00:25:06,850 --> 00:25:15,380
more, we don't use end-IPs. We actually
use 2 hops back, routers which with high
220
00:25:15,380 --> 00:25:21,650
probability are infrastructure
machines and use those as a vantage point.
221
00:25:21,650 --> 00:25:31,486
Even with this harsh constraint we still
have 53000 global IP_ID-routers. To test
222
00:25:31,486 --> 00:25:38,780
the framework to see that whether Augur
works we chose 2000 of these global IP_ID-
223
00:25:38,780 --> 00:25:45,350
machines, uniformly selected from all the
countries where we had vantage points. We
224
00:25:45,350 --> 00:25:52,549
selected websites from Citizen Lab
test list. This is a research
225
00:25:52,549 --> 00:25:57,710
organization at the University of Toronto where
they crowdsourced websites that are
226
00:25:57,710 --> 00:26:03,070
potentially being blocked or potentially
sensitive. And then we used thousands of
227
00:26:03,070 --> 00:26:09,640
the websites from Alexa top-10k. And then
we got Augur running for 17 days and
228
00:26:09,640 --> 00:26:17,050
collect this data. One of the challenges
that we have to validate Augur was like:
229
00:26:17,050 --> 00:26:22,940
So, what is the truth? What is the ground-
truth? What would we see that makes sense?
230
00:26:22,940 --> 00:26:26,270
So, and this is the biggest and
fundamental challenge for internet-
231
00:26:26,270 --> 00:26:33,570
censorship anyway. But so the first
approach is leaning on intuition, which is
232
00:26:33,570 --> 00:26:40,049
like no client should show blocking
towards all the websites. No server should
233
00:26:40,049 --> 00:26:45,740
show blocking for bulk of our clients. And
if anything happens like that we just
234
00:26:45,740 --> 00:26:51,960
trash it. And we should see more bias
towards the sensitive domain versus the
235
00:26:51,960 --> 00:27:01,559
ones that are popular. And so on. And also
we hope to replicate the anecdotes, the
236
00:27:01,559 --> 00:27:08,870
reports out there. And we did all of
those. And that's how we validate Augur.
237
00:27:08,870 --> 00:27:17,690
So at the end Augur is a system that is
scalable and efficient, ethical and can be
238
00:27:17,690 --> 00:27:24,630
used to detect TCP/IP-blocking
continuously. Yes I know that is just
239
00:27:24,630 --> 00:27:32,310
TCP/IP. What about the other layers? Can
we measure them remotely as well? Well,
240
00:27:32,310 --> 00:27:40,090
let me focus on the DNS. You might ask: Is
there a way that we can remotely detect
241
00:27:40,090 --> 00:27:46,890
DNS poisoning or manipulation? Well let's
think it out loud. From now on I'm gonna
242
00:27:46,890 --> 00:27:54,370
give just the highlights of the papers we
worked on, for lack of time. Well, if we
243
00:27:54,370 --> 00:28:06,070
scan the whole IPv4 we have a lot of open
DNS resolvers, which means that they are
244
00:28:06,070 --> 00:28:14,929
open to anybody sending a query to them to
resolve. And these open DNS-resolvers can
245
00:28:14,929 --> 00:28:22,590
be used as a vantage point. We can use
open DNS-resolvers in different ISPs
246
00:28:22,590 --> 00:28:29,830
around the world to see whether that DNS
queries are poisoned or not. Well, wait.
247
00:28:29,830 --> 00:28:35,419
We need to make sure that they don't
belong to the end user. So we come up with
248
00:28:35,419 --> 00:28:42,760
a lot of checks to make sure that these
open DNS-resolvers are organizational,
249
00:28:42,760 --> 00:28:50,610
belonging to the ISP or infrastructure.
After we do that then we start sending all
250
00:28:50,610 --> 00:28:57,980
our queries to these, let's say, open DNS-
resolvers in the ISP in Bahrain, for all
251
00:28:57,980 --> 00:29:03,929
the domain we're interested. And capture
what we receive, what IPs we receive. The
252
00:29:03,929 --> 00:29:11,390
challenge is then to detect what is the
wrong answer. And so we have to come up
253
00:29:11,390 --> 00:29:19,760
with a lot of heuristics. A set of
heuristics. For example, is the response that
254
00:29:19,760 --> 00:29:28,610
we received equal to a reply we
got from our control measurements, where
255
00:29:28,610 --> 00:29:36,500
we know the IP is not blocked or poisoned
or something. The content is there. Or we
256
00:29:36,500 --> 00:29:42,060
can actually look at the IP that we
received and see whether it has a valid
257
00:29:42,060 --> 00:29:50,850
HTTPS cert, with or without the SNI or
Server Name Indication or something.
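The two heuristics just mentioned can be sketched as a pure Python check. This is an illustration under assumptions, not Satellite's code: the function name `looks_manipulated` and its parameters are hypothetical, the control answers and certificate checks are assumed to have been collected already, and the IP addresses in the usage note are reserved documentation addresses.

```python
from typing import Set


def looks_manipulated(answer_ips: Set[str],
                      control_ips: Set[str],
                      ips_with_valid_cert: Set[str]) -> bool:
    """Flag a resolver's answer for one domain as possibly manipulated.

    answer_ips: IPs returned by the open resolver under test.
    control_ips: IPs returned by control measurements for the same domain.
    ips_with_valid_cert: IPs that were separately observed to present a
        valid certificate for the domain.
    """
    # Heuristic 1: the answer overlaps with what the controls returned.
    if answer_ips & control_ips:
        return False
    # Heuristic 2: some returned IP serves a valid certificate for the
    # domain, so the content is plausibly there even if the IP differs.
    if answer_ips & ips_with_valid_cert:
        return False
    # No heuristic vouches for the answer. An empty answer also ends up
    # here and gets flagged.
    return True
```

For example, an answer of `{"203.0.113.9"}` with controls `{"192.0.2.10"}` and no certificate evidence would be flagged, while the same answer would pass if `203.0.113.9` had presented a valid certificate. A flag here only marks a candidate; the talk describes combining many such heuristics.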
258
00:29:50,850 --> 00:29:55,720
And so on so forth. So we come up with
lots of heuristics to detect wrong
259
00:29:55,720 --> 00:30:06,840
answers. The results of all these efforts
ended up being a project called
260
00:30:06,840 --> 00:30:12,210
"Satellite", which was started by Will
Scott. I'm sure he is in the audience
261
00:30:12,210 --> 00:30:16,809
somewhere. A great friend of mine and very
good supporter of CensoredPlanet.
262
00:30:16,809 --> 00:30:24,000
Selflessly, he has been a miracle that I
had the opportunity and fortune to meet
263
00:30:24,000 --> 00:30:31,890
him. We have Satellite. Satellite automates
the whole steps that I told you. For this
264
00:30:31,890 --> 00:30:37,400
work we use science that developed in both
of the work. We call it Satellite because
265
00:30:37,400 --> 00:30:46,421
of seniority and sticking with the name. So
how much coverage Satellite has? If you
266
00:30:46,421 --> 00:30:54,880
scan IPv4 you end up with 4.2 million open
DNS-resolvers in every country in their
267
00:30:54,880 --> 00:31:01,079
territories. We make, we need, we
actually need to make sure there are
268
00:31:01,079 --> 00:31:08,950
ethics for that reason. If we put a harsh
condition. We say that let's only use the
269
00:31:08,950 --> 00:31:17,710
ones whose valid PTR record
follows this expression. Basically let's
270
00:31:17,710 --> 00:31:23,200
just use the open DNS-resolvers that are
name servers or at least their PTR record
271
00:31:23,200 --> 00:31:29,920
suggests that. This is a really harsh
constraint. Actually, my students have
272
00:31:29,920 --> 00:31:34,430
been adding more and more regular
expressions for the ones that we are sure
273
00:31:34,430 --> 00:31:42,610
they are organizational. But for now just
being this harsh we have 40k of DNS-
274
00:31:42,610 --> 00:31:56,830
resolvers in almost 169 countries I guess.
So censorship happens in other layers as
275
00:31:56,830 --> 00:32:00,700
well. How do we want to deal with that
remote channel, with the remote side
276
00:32:00,700 --> 00:32:12,520
channel? And, especially, like, what about
HTTP traffic or disruption that can happen
277
00:32:12,520 --> 00:32:29,809
to you know TLS centric. I hate water.
Oh no. Okay. So. So it's... scratching
278
00:32:29,809 --> 00:32:38,220
noise ...it's well documented that many DPIs
especially in the Great Firewall of China monitor
279
00:32:38,220 --> 00:32:43,930
the traffic, and when they see a keyword,
a sensitive keyword like "Falun Gong",
280
00:32:43,930 --> 00:32:50,350
they act and drop traffic or send a RST.
And as I mentioned earlier there is
281
00:32:50,350 --> 00:32:57,330
enough clear text everywhere. Even in TLS
handshakes SNI is in clear text. And for a
282
00:32:57,330 --> 00:33:03,590
long time I was trying to come up with a
way of detecting application layer using
283
00:33:03,590 --> 00:33:09,320
this fancy side channel. Like, how can I
detect that when the client and server
284
00:33:09,320 --> 00:33:14,630
need to first establish a TCP handshake,
how can the side channel jump in and then
285
00:33:14,630 --> 00:33:22,720
detect the rest? We were lucky enough that
the end pointed to a protocol called
286
00:33:22,720 --> 00:33:32,900
"Echo". It's a protocol designed in 1983
and it's for testing reasons, for the
287
00:33:32,900 --> 00:33:41,140
debu... it is a debugging tool, basically.
It's a predecessor to ping. And basically,
288
00:33:41,140 --> 00:33:50,120
after you establish a TCP handshake to
port 7, whatever you send, the Echo server
289
00:33:50,120 --> 00:33:57,290
on port 7 is gonna echo it back. Now
think about it: how can we use Echo
290
00:33:57,290 --> 00:34:04,570
servers to be able to detect application
layer blocking? Well, when it's not
291
00:34:04,570 --> 00:34:08,490
available, let's say I have an Echo server
in the U.S. and a measurement machine in
292
00:34:08,490 --> 00:34:13,890
the University of Michigan. I establish a
TCP handshake and I send a GET request
293
00:34:13,890 --> 00:34:19,190
to... using a censored keyword, for
example. It's gonna get back to me the
294
00:34:19,190 --> 00:34:28,269
same thing I sent. But now let's put the
DPI that is gonna be triggered by it.
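[Editor's sketch] A minimal illustration of the Echo test described here, not Quack's actual code. The names `http_payload` and `probe_echo`, the outcome labels, and the timeout value are assumptions made for this example. The idea is to send a request carrying the keyword to an Echo server on port 7 and check whether the same bytes come back or the connection is disturbed.

```python
import socket


def http_payload(keyword_host: str) -> bytes:
    """Build a GET request whose Host header carries the keyword under test."""
    return f"GET / HTTP/1.1\r\nHost: {keyword_host}\r\n\r\n".encode("ascii")


def probe_echo(host: str, payload: bytes, port: int = 7, timeout: float = 5.0) -> str:
    """Send payload to an Echo server and classify what happens.

    Returns one of: "echoed", "mismatch", "reset", "timeout", "error".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(payload)
            received = b""
            # Read until we have as many bytes as we sent, or the peer closes.
            while len(received) < len(payload):
                chunk = sock.recv(4096)
                if not chunk:
                    break
                received += chunk
    except ConnectionResetError:
        return "reset"      # a RST arrived, for example injected on the path
    except socket.timeout:
        return "timeout"    # nothing came back, for example packets dropped
    except OSError:
        return "error"      # other failure, for example connection refused
    return "echoed" if received == payload else "mismatch"
```

A single "reset" or "timeout" does not prove blocking on its own; comparing against a control keyword sent to the same server is one way to tell interference apart from a server that is simply down.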
295
00:34:28,269 --> 00:34:37,150
Well, for sure, either I'm going to
receive a RST first or something else. So
296
00:34:37,150 --> 00:34:43,609
we can actually come up with an algorithm
to be able to use Echo servers to detect
297
00:34:43,609 --> 00:34:47,969
disruptions at the application layer.
Basically keyword blocking, URL
298
00:34:47,969 --> 00:34:58,530
blocking. The result of this is a tool called
Quack. And Quack actually uses Echo
299
00:34:58,530 --> 00:35:06,470
servers to be able to detect, in a scalable
way, whether keywords are
300
00:35:06,470 --> 00:35:14,380
being blocked around the world. So what
we did is first scan the whole IPv4 space. We
301
00:35:14,380 --> 00:35:22,910
find 47k Echo servers running around the
world. Then we need to be able to check
302
00:35:22,910 --> 00:35:27,270
whether or not they belong to end
users. And that was a very challenging
303
00:35:27,270 --> 00:35:36,530
part because there is not a clear
signal. 90 percent of them are
304
00:35:36,530 --> 00:35:40,730
infrastructure but there is still some
portion of them that we don't know. So
305
00:35:40,730 --> 00:35:46,610
what we do is we look at the Freedom House
reports and the countries that are
306
00:35:46,610 --> 00:35:52,931
partially open or not open, "not free" or
"partially free" as they're called. This
307
00:35:52,931 --> 00:35:58,720
is around 50
countries. And for those, we
308
00:35:58,720 --> 00:36:05,460
randomly select some that we want and we
use the OS detection of Nmap. It will
309
00:36:05,460 --> 00:36:15,750
tell us whether it's a server, it's a
switch and so on. We use those. So with
310
00:36:15,750 --> 00:36:23,010
the help of so many collaborators after
almost six years we end up with three
311
00:36:23,010 --> 00:36:32,420
systems that can capture TCP/IP blocking,
DNS, and application layer blocking using
312
00:36:32,420 --> 00:36:43,480
infrastructure and organizational
machines. So while it was, it was a dream
313
00:36:43,480 --> 00:36:47,810
or a vision that we can come up with a
better map to collect this data in a
314
00:36:47,810 --> 00:36:56,020
continuous way, thanks to the help of a lot of
people especially my students, Will, and
315
00:36:56,020 --> 00:37:02,060
other collaborators we now have
CensoredPlanet. CensoredPlanet collects
316
00:37:02,060 --> 00:37:09,020
semi-weekly snapshots of Internet
censorship using our vantage points in all
317
00:37:09,020 --> 00:37:18,090
the layers and provides this data in raw
format on our website now. We also
318
00:37:18,090 --> 00:37:24,531
provide some visualizations for people
to be able to see how many vantage points
319
00:37:24,531 --> 00:37:29,560
we have in each country and so on. Of
course, this is the beginning of
320
00:37:29,560 --> 00:37:34,160
CensoredPlanet. We launched this in August
and we have been collecting data for
321
00:37:34,160 --> 00:37:39,880
almost four months and we have a long way
to go. We have users right now through
322
00:37:39,880 --> 00:37:45,130
organizations using our data and helping
us debug by finding things that don't
323
00:37:45,130 --> 00:37:51,950
make sense and pointing them out to us. Any of you
that end up using this data, please
324
00:37:51,950 --> 00:37:56,930
share your feedback with us and we are
very responsive to be able to change it,
325
00:37:56,930 --> 00:38:03,940
as much as you need. We have a
collective of very dedicated people
326
00:38:03,940 --> 00:38:10,940
participating. So, now that we have this
CensoredPlanet, let me show you how it can
327
00:38:10,940 --> 00:38:19,349
help when there is a political situation
going on. You all must remember around
328
00:38:19,349 --> 00:38:25,410
October, Jamal Khashoggi, a
Washington Post reporter, disappeared,
329
00:38:25,410 --> 00:38:34,530
and was killed at the Saudi Arabian embassy in
Turkey. At the time of this happening
330
00:38:34,530 --> 00:38:40,540
there was a lot of media attention and
this news, especially two weeks in,
331
00:38:40,540 --> 00:38:46,980
became very widespread internationally.
CensoredPlanet didn't know this event was
332
00:38:46,980 --> 00:38:52,750
going to happen, but we had been
collecting this data semi-weekly for 2000
333
00:38:52,750 --> 00:38:57,660
domains or so. And so we went back and we
checked Saudi Arabia. Did we see
334
00:38:57,660 --> 00:39:04,830
anything interesting? And yes, we saw for
example, two weeks in, around October
335
00:39:04,830 --> 00:39:12,680
16, for the domains in the news
category and media category, the
336
00:39:12,680 --> 00:39:18,500
censorship related to those doubled. And
let me emphasize, we didn't see like a
337
00:39:18,500 --> 00:39:23,440
block or not block over the whole country;
not all countries have homogeneous
338
00:39:23,440 --> 00:39:28,430
censorship happening. We saw it in
multiple of the ISPs where we had vantage
339
00:39:28,430 --> 00:39:34,770
points. Actually I freaked out when one of
the activists in Saudi Arabia told us that
340
00:39:34,770 --> 00:39:41,869
"I don't see this". And we said "What ISP
are you in?" And this wasn't one of the ISPs that
341
00:39:41,869 --> 00:39:49,160
we had vantage points in. So we were
looking for hints that "Is anybody else
342
00:39:49,160 --> 00:39:55,720
seeing what we were seeing?". And so we
ended up seeing there was a commander
343
00:39:55,720 --> 00:40:03,560
lab project that also saw around October
16 the number of malwares or whatever they
344
00:40:03,560 --> 00:40:10,220
are testing also doubled or tripled. I
don't know the other. So something was
345
00:40:10,220 --> 00:40:17,180
going on two weeks in when the news broke.
Let me emphasize: the news media that I am
346
00:40:17,180 --> 00:40:22,300
talking about are the global news media
that we check like L.A. Times, Fox News
347
00:40:22,300 --> 00:40:30,970
and so on. But we also checked Arab News
which, as the activists told us, is a
348
00:40:30,970 --> 00:40:38,490
Saudi Arabian propaganda newspaper. That,
in one of the ISPs, was being poisoned. So
349
00:40:38,490 --> 00:40:49,910
again, censorship measurement is a very
complex problem. So where are we heading?
350
00:40:49,910 --> 00:40:55,580
Well, having said that about side channels
and the techniques that help us remotely
351
00:40:55,580 --> 00:41:01,900
collect this data I have to also say that
the data we collect doesn't replicate the
352
00:41:01,900 --> 00:41:06,950
picture of internet censorship. I mean,
having root access on a volunteer's
353
00:41:06,950 --> 00:41:17,641
machine to do a detailed test is powerful.
So in the next step, in the next year, one
354
00:41:17,641 --> 00:41:27,720
of our goals is to join forces with OONI to
integrate the data from remote and
355
00:41:27,720 --> 00:41:37,800
basically local measurements to provide
the best of both worlds. Also, we have
356
00:41:37,800 --> 00:41:43,990
been thinking a lot about what would be a
good visualization tool that doesn't end
357
00:41:43,990 --> 00:41:51,391
up misrepresenting internet censorship. I
literally hate that one. Hate it. The
358
00:41:51,391 --> 00:41:56,860
number of vantage points in countries is
not equal. We don't know whether all the
359
00:41:56,860 --> 00:42:00,980
vantage points that the data resulted
from are in one ISP or in all of the
360
00:42:00,980 --> 00:42:08,109
ISPs. And then we test domains that are
like benign and like I don't know defined
361
00:42:08,109 --> 00:42:13,650
based on some western values of the
freedom of expression. I believe in all of
362
00:42:13,650 --> 00:42:19,330
them, but still culture and economy might play
a role. And then we put colors on
363
00:42:19,330 --> 00:42:25,030
the map, rank the countries, call some
countries awful, and don't give full
364
00:42:25,030 --> 00:42:30,849
attention to the others. So something
needs to be changed and it's on our
365
00:42:30,849 --> 00:42:37,700
horizon to think about it more deeply.
We want to be able to have more statistical
366
00:42:37,700 --> 00:42:44,320
tools to be able to spot when the patterns
change. We want to be able to compare the
367
00:42:44,320 --> 00:42:49,580
countries when for example Telegram was
being blocked in Russia. If you remember
368
00:42:49,580 --> 00:42:54,910
millions of IPs being blocked. If you
don't know, go to my friend Leonid's talk
369
00:42:54,910 --> 00:43:00,020
about Russia. You're going to learn a lot
there. But anyway. So when Russia was
370
00:43:00,020 --> 00:43:06,520
blocking Telegram, I said to everyone: I
bet that, following this, some other
371
00:43:06,520 --> 00:43:10,370
governments are going to jump to block
Telegram as well. And that's actually what
372
00:43:10,370 --> 00:43:15,320
we heard, rumors like that. So we need to
be able to do that automatically. And
373
00:43:15,320 --> 00:43:26,470
overall, I want to be able to develop an
empirical science of internet censorship
374
00:43:26,470 --> 00:43:36,720
based on rich data with the help of all of
you. CensoredPlanet is now being
375
00:43:36,720 --> 00:43:43,370
maintained by a group of dedicated
students, great friends that I have and
376
00:43:43,370 --> 00:43:49,960
needs engineers and political scientists
to jump on our data and help us to bring
377
00:43:49,960 --> 00:43:57,320
meaning to what we are collecting. So if
you are a good engineer or a political
378
00:43:57,320 --> 00:44:07,250
scientist or a dedicated person who wants
to change the world, reach out to me.
379
00:44:07,250 --> 00:44:11,500
As a reference for those of you
interested: these are the publications
380
00:44:11,500 --> 00:44:19,720
that my talk was based on.
And now I am open to questions.
381
00:44:19,720 --> 00:44:26,180
applause
382
00:44:26,180 --> 00:44:31,440
Herald: All right, perfect. Thank you so
much, Roya, so far. We have some time for
383
00:44:31,440 --> 00:44:35,500
questions so if you have a question in the
room please go to one of the room
384
00:44:35,500 --> 00:44:40,100
microphones one, two, three, four, and
five in the very back. And if you're
385
00:44:40,100 --> 00:44:44,490
watching the stream you can ask questions
to the signal angel via IRC or Twitter and
386
00:44:44,490 --> 00:44:49,360
we'll also make sure to relay those to the
speaker and make sure those get asked. So
387
00:44:49,360 --> 00:44:52,040
let's just go ahead and
start with Mic two please.
388
00:44:52,040 --> 00:44:57,349
Question: Hey, great talk. Do you worry
that by publishing your methods as well as
389
00:44:57,349 --> 00:45:02,690
your data that you're going to get a
response from governments that are
390
00:45:02,690 --> 00:45:05,869
censoring things such that it makes it
more difficult for you to monitor what's
391
00:45:05,869 --> 00:45:08,680
being censored? Or has
that already happened?
392
00:45:08,680 --> 00:45:14,630
Roya: It hasn't happened. We have control
measures to be able to detect that. But
393
00:45:14,630 --> 00:45:19,260
that has been... it's a really good
question and often comes up after I
394
00:45:19,260 --> 00:45:25,490
present. I can tell you based on my
experience it's really hard to synchronize
395
00:45:25,490 --> 00:45:31,490
all the ISPs in all the countries to act
to the SYN-ACK and RST that I'm sending.
396
00:45:31,490 --> 00:45:36,150
Like, for example, for Augur, these are
unsolicited packets, and for governments to
397
00:45:36,150 --> 00:45:41,850
block that, there is going to be a lot of
collateral damage. You might say that
398
00:45:41,850 --> 00:45:45,610
well, Roya, they're going to block the IP
of the University of Michigan. They're a
399
00:45:45,610 --> 00:45:50,770
spoofing machine. We have a measure for
that. I have multiple places that I
400
00:45:50,770 --> 00:45:56,190
actually have a backup if that case
happened. But overall this is a global
401
00:45:56,190 --> 00:46:02,800
scale measurement, and even in one city or
like multiple ISPs you know of it's really
402
00:46:02,800 --> 00:46:06,920
hard to synchronize blocking
something and maintaining it. So it is
403
00:46:06,920 --> 00:46:13,630
something that's on our mind, that we're thinking
about. But as of now it's not a worry.
404
00:46:13,630 --> 00:46:16,470
Herald: All right then let's
go over to Mic one.
405
00:46:16,470 --> 00:46:20,510
Question: Thank you. I wondered, it's kind
of similar to this question. What if you
406
00:46:20,510 --> 00:46:24,920
are measuring from a country that is
blocking? Do you also distribute the
407
00:46:24,920 --> 00:46:29,970
measurements over several countries?
Roya: Absolutely. Every snapshot that we
408
00:46:29,970 --> 00:46:37,280
collect is from all the vantage points we
have in certain countries and a portion
409
00:46:37,280 --> 00:46:42,100
of the vantage points in, like, China or the US
because they have millions of vantage
410
00:46:42,100 --> 00:46:46,220
points or like thousands of vantage
points. So basically at each snapshot,
411
00:46:46,220 --> 00:46:52,340
which takes us three days, we collect the
data from all of the vantage points.
412
00:46:52,340 --> 00:46:57,580
And so let's say that somebody is reacting
to us. We have a benign domain that we
413
00:46:57,580 --> 00:47:03,250
check as well like for example a domain
example.com or random.com. So if we see
414
00:47:03,250 --> 00:47:09,380
something going on there we actually
double check. But good point, because now
415
00:47:09,380 --> 00:47:14,720
our effort is very manual labor and we're
trying to automate everything so it's
416
00:47:14,720 --> 00:47:18,900
still a challenge. Thank you.
Herald: All right then let's go to Mic
417
00:47:18,900 --> 00:47:22,859
three.
Question: Hi. Have you measured how much
418
00:47:22,859 --> 00:47:28,140
does IP-ID randomization
break your probes?
419
00:47:28,140 --> 00:47:35,349
Roya: Oh. This is also really good. Let me
give a shout out to [name]. He's the guy
420
00:47:35,349 --> 00:47:45,990
who in 1998 discovered IP-ID or published
something that I ended up reading. So like
421
00:47:45,990 --> 00:47:54,440
for example Linux or Ubuntu in the newer
versions randomize it, but there are still
422
00:47:54,440 --> 00:47:59,421
these legacy operating systems like
Windows XP and predecessors and FreeBSD
423
00:47:59,421 --> 00:48:04,750
that still have global IP-ID. So one
argument that often comes up is, what if
424
00:48:04,750 --> 00:48:09,339
all these machines get updated to the new
operating system that doesn't
425
00:48:09,339 --> 00:48:13,780
maintain a global IP-ID? And I can tell you
that, well, we'll come up with another
426
00:48:13,780 --> 00:48:20,129
side channel. For now, that works. But my
gut feeling is that if it didn't change
427
00:48:20,129 --> 00:48:25,230
from 1998 until now with all the things
that everybody says that global IP-ID
428
00:48:25,230 --> 00:48:30,440
variable is a horrible idea, it's not going
to change in the coming five years so
429
00:48:30,440 --> 00:48:33,230
we're good.
Question: Thank you.
430
00:48:33,230 --> 00:48:36,520
Herald: Okay, then let's just
move on to Mic four.
431
00:48:36,520 --> 00:48:41,480
Question: Thank you very much for the
great talk. When you were introducing
432
00:48:41,480 --> 00:48:46,910
Augur I was wondering, does the detection
of the blockage between client and server
433
00:48:46,910 --> 00:48:52,190
necessarily indicate censorship? So,
because you were talking about validating
434
00:48:52,190 --> 00:48:59,130
Augur I was wondering if it turns out that
there is like a false alarm. What do you
435
00:48:59,130 --> 00:49:04,530
think could be the potential cause?
Roya: You're absolutely right. And I tried
436
00:49:04,530 --> 00:49:11,630
to emphasize that what we end up
collecting can be seen as a disruption.
437
00:49:11,630 --> 00:49:17,200
Something didn't work. The SYN-ACK or RST
got disrupted. It may be
438
00:49:17,200 --> 00:49:22,250
censorship, or it can be a random packet
drop. And the way to be able to establish
439
00:49:22,250 --> 00:49:28,290
that confidence is to
aggregate the results. Do we see this
440
00:49:28,290 --> 00:49:33,670
blocking between multiple of the routers
within that country or within that AS?
441
00:49:33,670 --> 00:49:38,880
Because if one of these is by accident,
that just didn't make sense or didn't get
442
00:49:38,880 --> 00:49:43,900
dropped, what about the others? So the
whole idea and this is another point that
443
00:49:43,900 --> 00:49:50,390
I'm so so concerned about: Most of these
reports and anecdotes that we read are based
444
00:49:50,390 --> 00:49:55,869
on one VPN or one vantage point in the
country. And then there are a lot
445
00:49:55,869 --> 00:50:00,770
of conclusions out of that. And you often
can ask that well this vantage point might
446
00:50:00,770 --> 00:50:05,640
be subject to so many different things
than a government's censorship. Also I
447
00:50:05,640 --> 00:50:11,980
emphasized that the censorship that I use
in this talk is any action that stops
448
00:50:11,980 --> 00:50:17,180
users' access to get to the requested
content. I'm trying to get away from a
449
00:50:17,180 --> 00:50:23,480
semantic where the intention is implied.
But great question.
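[Editor's sketch] One way the aggregation idea in this answer could look in code. This is not the project's actual analysis: the record layout, the name `flag_consistent_disruptions`, the minimum of three vantage points, and the 80 percent threshold are all assumptions chosen for illustration. The point is only that a single disrupted measurement is ignored unless several vantage points in the same AS agree.

```python
from collections import defaultdict


def flag_consistent_disruptions(measurements, min_vantage_points=3, threshold=0.8):
    """Group per-vantage-point results by (AS, domain) and keep consistent ones.

    measurements: iterable of (asn, vantage_point, domain, disrupted) tuples.
    Returns {(asn, domain): fraction_of_vantage_points_disrupted} for groups
    with enough vantage points where the fraction reaches the threshold.
    """
    per_group = defaultdict(dict)
    for asn, vantage_point, domain, disrupted in measurements:
        seen = per_group[(asn, domain)]
        # One verdict per vantage point: disrupted if any of its probes was.
        seen[vantage_point] = seen.get(vantage_point, False) or bool(disrupted)

    flagged = {}
    for key, verdicts in per_group.items():
        if len(verdicts) < min_vantage_points:
            continue  # too few vantage points to say anything
        fraction = sum(verdicts.values()) / len(verdicts)
        if fraction >= threshold:
            flagged[key] = fraction
    return flagged
```

With these assumed settings, an AS where one router out of three shows a drop is not flagged, which is the "what about the others?" check described in the answer.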
450
00:50:23,480 --> 00:50:26,240
Herald: All right, then let's go back to
Mic one right.
451
00:50:26,240 --> 00:50:29,740
Question: Hi Roya. You mentioned that you
have a team of students working on all of
452
00:50:29,740 --> 00:50:33,890
these frameworks. I was wondering if your
frameworks were open source or available
453
00:50:33,890 --> 00:50:37,760
online for collaboration? And if so, where
those resources would be?
454
00:50:37,760 --> 00:50:45,040
Roya: So the data is open. The code hasn't
been. One reason is I have so little
455
00:50:45,040 --> 00:50:49,090
confidence in sharing code, like I'm
friends with Philipp Winter, Dave Fifield.
456
00:50:49,090 --> 00:50:54,170
These people are pro open source and they
constantly blame me for not. But it really
457
00:50:54,170 --> 00:51:00,721
requires confidence to share code. So we
are working on that at least for Quack. I
458
00:51:00,721 --> 00:51:06,390
think the code can very easily be
shared. For Augur, we spent a heck of an amount
459
00:51:06,390 --> 00:51:12,109
of time to make a production ready code
and for Satellite I think that is also
460
00:51:12,109 --> 00:51:17,420
ready. I can share them personally with
you but before sharing to the world I want
461
00:51:17,420 --> 00:51:21,560
to actually give another person to audit
and make sure we're not using a curse word
462
00:51:21,560 --> 00:51:26,420
or something. I don't know. It's just
completely my mind being a little bit
463
00:51:26,420 --> 00:51:31,030
conservative. But I'm happy, if you send me an
e-mail, to send you the code.
464
00:51:31,030 --> 00:51:35,640
Question: Thank you.
Herald: All right then move to Mic two.
465
00:51:35,640 --> 00:51:39,930
Question: Thanks again for sharing your
great vision. I find it really
466
00:51:39,930 --> 00:51:47,470
fascinating. Also I'm not really a data
scientist but my question is: did you find
467
00:51:47,470 --> 00:51:56,099
any usefulness in your approaches in
the spreading of the Internet of Things? I
468
00:51:56,099 --> 00:52:06,960
understood that you used routers to make
queries but did you send and maybe receive
469
00:52:06,960 --> 00:52:11,260
back any data from
washing machines, toasters,...?
470
00:52:11,260 --> 00:52:17,480
Roya: I mean, I know, being ethical and
trying to not use end-user machines limits
471
00:52:17,480 --> 00:52:22,589
your access a lot. But that's
our goal. We are going to stick with
472
00:52:22,589 --> 00:52:28,240
things that don't belong to the end users.
And so it's all routers, organizational
473
00:52:28,240 --> 00:52:31,940
machines. So I want to make sure that
whatever we're using belongs to an
474
00:52:31,940 --> 00:52:35,349
entity that can protect itself if
something went wrong. They can just say
475
00:52:35,349 --> 00:52:39,640
"Hey this is a freaking router, it
receives and sends so many things. I mean,
476
00:52:39,640 --> 00:52:44,740
look, let me show you a TCP (?)",
for example. A volunteer might not be able
477
00:52:44,740 --> 00:52:49,290
to defend that because it's already
conspiring and collecting this data. But
478
00:52:49,290 --> 00:52:53,550
good question, I wish I could
but I won't pass that line.
479
00:52:53,550 --> 00:52:57,380
Herald: All right. I don't see any more
questions in the room right now. But we
480
00:52:57,380 --> 00:53:01,080
have one from the internet
so please, signal angel.
481
00:53:01,080 --> 00:53:06,510
Signal Angel: Yes. Actually a question
from koli585: I was in an African
482
00:53:06,510 --> 00:53:10,009
country where the internet has been
completely shut down. How can I quickly
483
00:53:10,009 --> 00:53:14,709
and safely inform others
about the shut down?
484
00:53:14,709 --> 00:53:21,470
Roya: So while I think local users' values
are highly highly needed they can use
485
00:53:21,470 --> 00:53:27,510
social media like Twitter to send and say
whatever, there is a project called IODA.
486
00:53:27,510 --> 00:53:36,869
It's a project at CAIDA, at UCSD in the
U.S., and Philipp Winter, Alberto
487
00:53:36,869 --> 00:53:43,160
[Dainotti] and Alistair [King] are working
on that. They basically remotely keep
488
00:53:43,160 --> 00:53:51,540
track of shutdowns and push them out. If
you look at the IODA on Twitter you can
489
00:53:51,540 --> 00:54:02,620
see their live feed of
where the shutdowns happen. So I haven't
490
00:54:02,620 --> 00:54:09,260
thought about how to reach out to the users
telling them what we see or how we can
491
00:54:09,260 --> 00:54:18,609
incorporate the users' feedback. We are
working with a group of researchers that
492
00:54:18,609 --> 00:54:27,000
already developed tools to receive this
data from Twitter and basically use that
493
00:54:27,000 --> 00:54:31,890
as some level of ground truth, but OONI
does such a great job that I haven't felt
494
00:54:31,890 --> 00:54:37,220
a need.
Herald: Alright. Unless the signal angel
495
00:54:37,220 --> 00:54:43,750
has another question? No?
Roya: And let me, can I add one thing? So
496
00:54:43,750 --> 00:54:52,940
I was listening to a talk about how
Iranians versus Arabs were sympathetic
497
00:54:52,940 --> 00:55:01,040
towards the Boston bombing in the United States
and there were a lot of assumptions and a
498
00:55:01,040 --> 00:55:05,819
lot of conclusions were made that, oh
this, I'm completely paraphrasing. I don't
499
00:55:05,819 --> 00:55:09,900
remember. But: these Iranians don't care
because they didn't tweet as much. So
500
00:55:09,900 --> 00:55:17,060
basically their input data was a bunch of
tweets around the time of the Boston bombing.
501
00:55:17,060 --> 00:55:21,599
After the talk was over I said: you know
that in this country Twitter has been
502
00:55:21,599 --> 00:55:28,929
blocked and so many people couldn't tweet.
applause
503
00:55:28,929 --> 00:55:33,490
Herald: Alright. That concludes our Q&A,
so thanks so much Roya.
504
00:55:33,490 --> 00:55:35,436
Roya: Thank you.
505
00:55:35,436 --> 00:55:41,150
applause
506
00:55:41,150 --> 00:55:45,970
postroll music
507
00:55:45,970 --> 00:56:04,000
Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!