WEBVTT
00:02:01.480 --> 00:02:03.700
Rachel Greenstadt:
pressure on or from ISPs
00:02:03.700 --> 00:02:06.950
would make it difficult or impossible
to run an exit relay
00:02:06.950 --> 00:02:11.500
however the third point is the one that
I'm gonna mostly be talking about today:
00:02:11.500 --> 00:02:15.300
Tor is not very useful if you can't
actually use it to get anywhere
00:02:15.300 --> 00:02:18.200
and there is an increasing number of
prominent sites on the internet
00:02:18.200 --> 00:02:20.750
that are restricting what you
can do through Tor
00:02:20.750 --> 00:02:24.220
and in some cases Tor is outright blocked
00:02:24.220 --> 00:02:29.310
and in other cases you're slowed down
by CAPTCHAs and other ways
00:02:29.310 --> 00:02:33.799
to sort of make it annoying to visit
00:02:33.799 --> 00:02:35.660
so a brief overview of my talk
00:02:35.660 --> 00:02:37.970
I'm gonna give a little bit of
background on Tor
00:02:37.970 --> 00:02:41.940
and discuss how it's being blocked by
internet services today
00:02:41.940 --> 00:02:43.700
then I'm gonna talk about Wikipedia
00:02:43.700 --> 00:02:47.500
which is a service or a website,
you may have heard of it
00:02:47.500 --> 00:02:51.019
[[ laughing ]]
00:02:51.019 --> 00:02:53.530
that makes it difficult to edit
through Tor
00:02:53.530 --> 00:02:54.980
and I'm gonna talk about their
relationship
00:02:54.980 --> 00:02:57.260
and then I'm gonna discuss some of the
findings that we have
00:02:57.260 --> 00:03:02.640
from our interview-study of Tor users
and Wikipedians.
00:03:02.640 --> 00:03:05.390
So here are some examples of some things
that you might see
00:03:05.390 --> 00:03:07.510
when you are browsing with Tor these days.
00:03:07.510 --> 00:03:12.620
Now, it's worth pointing out that a lot of
these are not individual sites
00:03:12.620 --> 00:03:16.480
but rather content distribution networks,
like Cloudflare and Akamai
00:03:16.480 --> 00:03:20.170
or they're hosting providers like Bluehost
or anti-spam-block-plugins
00:03:20.170 --> 00:03:25.530
that sort of affect a huge, sort of swath
of sites on the internet, not just one.
00:03:25.530 --> 00:03:27.220
There are some individual sites
00:03:27.220 --> 00:03:31.340
say like Yelp, that provide their
own blocking
00:03:31.340 --> 00:03:35.090
but they tend to be somewhat
important sites
00:03:35.090 --> 00:03:37.040
So before I go any further
00:03:37.040 --> 00:03:40.500
I should probably disclose that I'm not
exactly a neutral party here
00:03:40.500 --> 00:03:41.980
I'm married to Roger Dingledine
00:03:41.980 --> 00:03:44.630
who is one of the founders
of the Tor project
00:03:44.630 --> 00:03:48.470
This work is part of a recent experiment
of mine, doing research related to Tor
00:03:48.470 --> 00:03:50.400
while remaining happily married
00:03:50.400 --> 00:03:52.660
so far so good!
00:03:52.660 --> 00:03:56.819
furthermore, this work uses qualitative
ethnographic methods
00:03:56.819 --> 00:04:01.430
which is a bit of a departure from the
machine learning work that I usually do
00:04:01.430 --> 00:04:04.900
mitigating both of these factors is my
wonderful co-author, Andrea Forte
00:04:04.900 --> 00:04:06.919
who is trained in ethnographic methods
00:04:06.919 --> 00:04:09.500
and conducted all of the interviews that
I'm going to talk to you about
00:04:13.360 --> 00:04:17.789
So, when I was talking to Roger about this
talk, he said
00:04:17.789 --> 00:04:20.430
most people at CCC will have heard of Tor
by now
00:04:20.430 --> 00:04:22.180
I think that's probably true,
00:04:22.180 --> 00:04:25.909
and they'll be aware that it hides something
about you when you're browsing the Internet
00:04:25.909 --> 00:04:32.280
but, they might be a bit fuzzy on some of
the details, so: very quick recap
00:04:32.280 --> 00:04:35.680
When Alice starts up Tor, her client
starts by fetching a list of relays
00:04:35.680 --> 00:04:36.680
from the directory server.
00:04:36.680 --> 00:04:43.680
Then, the Tor client is gonna pick a
three-hop path to the destination server.
00:04:43.680 --> 00:04:46.840
Hop 1 is gonna know who you are
but not where you're going.
00:04:46.840 --> 00:04:49.969
Then Hop 3 knows where you're going
but not who you are.
00:04:49.969 --> 00:04:52.280
Now there is a link encrypted
from you to hop 3,
00:04:52.280 --> 00:04:55.210
and then hop 3,
which is the exit relay,
00:04:55.210 --> 00:04:57.969
actually delivers your
request to a website.
00:04:57.969 --> 00:05:02.280
Now this part is not encrypted by Tor
and as far as the website is concerned,
00:05:02.280 --> 00:05:07.440
it is actually delivering a request from
the user at the exit relay
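
The recap above can be sketched as a small model of who sees what. This is only an illustration, not how Tor is implemented: the node names are made up, and the one assumption it encodes is that each relay learns just its immediate neighbours on the path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HopView:
    """What a single relay on the path can observe."""
    relay: str
    sees_from: str  # the node that handed it the traffic
    sends_to: str   # the node it forwards the traffic to


def circuit_views(client: str, relays: list[str], destination: str) -> list[HopView]:
    """Return, for each relay, the previous and next node on the path."""
    nodes = [client, *relays, destination]
    return [
        HopView(nodes[i], nodes[i - 1], nodes[i + 1])
        for i in range(1, len(nodes) - 1)
    ]


# Hypothetical names, for illustration only.
views = circuit_views("alice", ["hop1", "hop2", "hop3"], "example.org")
for v in views:
    print(f"{v.relay}: receives from {v.sees_from}, forwards to {v.sends_to}")
```

In this toy model hop1 sees alice but never example.org, and hop3 sees example.org but never alice, which is why the website attributes the request to the exit relay.
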
00:05:07.440 --> 00:05:11.500
usually when Tor users receive the
blocking screens that I've showed earlier
00:05:11.500 --> 00:05:14.810
it's because the website is blocking
the exit relay's IP address
00:05:14.810 --> 00:05:18.190
so this can happen either because the site
is deliberately blocking Tor
00:05:18.190 --> 00:05:22.620
by downloading the directory and blocking
all of the Tor exit IPs
00:05:22.620 --> 00:05:24.680
or because someone did something
unpleasant
00:05:24.680 --> 00:05:26.919
through that exit relay in the past
00:05:26.919 --> 00:05:30.230
and it was put on a blocklist incidentally
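
A minimal sketch of the deliberate kind of blocking, assuming the site has already obtained a plain-text list of exit addresses, one per line. The list format, the helper names, and the addresses (taken from documentation ranges, not real relays) are all assumptions for illustration.

```python
import ipaddress


def parse_exit_list(text: str) -> set:
    """Parse one IP address per line, skipping blank lines and '#' comments."""
    exits = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            exits.add(ipaddress.ip_address(line))
    return exits


def is_tor_exit(client_ip: str, exits: set) -> bool:
    """True if the connecting address appears in the exit list."""
    return ipaddress.ip_address(client_ip) in exits


# Hypothetical exit list; these are documentation addresses, not real relays.
EXAMPLE_LIST = """
# exit addresses
192.0.2.10
198.51.100.7
2001:db8::1
"""

exits = parse_exit_list(EXAMPLE_LIST)
print(is_tor_exit("192.0.2.10", exits))   # True
print(is_tor_exit("203.0.113.5", exits))  # False
```

The site only ever sees the exit relay's address, so a check like this blocks every user behind that relay, whatever they were doing.
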
00:05:32.510 --> 00:05:34.930
So there's been some research on this
phenomenon
00:05:34.930 --> 00:05:39.560
and here's some cutting-edge research that
hasn't actually even been presented yet
00:05:39.560 --> 00:05:43.500
it's going to be published in the NDSS
conference in February
00:05:43.500 --> 00:05:46.310
by the people up here
00:05:46.310 --> 00:05:50.430
and it's looking sort of quantitatively
at how prevalent
00:05:50.430 --> 00:05:51.930
this blocking problem is.
00:05:51.930 --> 00:06:00.230
We found that of the top 1000 Alexa
sites, 3.5% of them were actually blocked
00:06:00.230 --> 00:06:02.460
for Tor users.
00:06:02.460 --> 00:06:06.990
You can see on this list on the right:
most of the blocking is due to
00:06:06.990 --> 00:06:11.330
aggregate blockers like these hosting
companies and CDNs
00:06:11.330 --> 00:06:13.700
it's also the case that most of the sites
00:06:13.700 --> 00:06:16.810
didn't actually
block 100% of the exit nodes
00:06:16.810 --> 00:06:19.520
But the bigger the exit is bandwidth wise
00:06:19.520 --> 00:06:21.520
thus the higher probability to be
exiting from it
00:06:21.520 --> 00:06:23.520
the more likely it was to be blocked
00:06:23.520 --> 00:06:28.969
so this graph shows, for 2000 blocked sites
from OONI data,
00:06:28.969 --> 00:06:31.520
given the exit node and how probable
it was
00:06:31.520 --> 00:06:34.189
that that exit node would be blocked.
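
The bandwidth point can be made concrete with a little arithmetic. This is a simplified sketch that assumes an exit is chosen with probability proportional to its bandwidth; the relay names and numbers are invented, and real path selection weighs more than raw bandwidth.

```python
def blocked_probability(exits: dict[str, float], blocked: set[str]) -> float:
    """Chance that a bandwidth-weighted random exit is one the site blocks."""
    total = sum(exits.values())
    return sum(bw for name, bw in exits.items() if name in blocked) / total


# Hypothetical exits and bandwidths (arbitrary units).
exits = {"big": 60.0, "medium": 30.0, "small": 10.0}

print(blocked_probability(exits, {"big"}))    # 0.6
print(blocked_probability(exits, {"small"}))  # 0.1
```

Blocking a single large exit affects far more visits than blocking a single small one, even though each is just one entry on the blocklist.
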
00:06:35.519 --> 00:06:39.440
So one website that blocks Tor users
is Wikipedia
00:06:39.440 --> 00:06:42.399
Now Wikipedia doesn't actually block Tor users
from reading Wikipedia
00:06:42.399 --> 00:06:45.599
which is very useful because it's a
resource that's important
00:06:45.599 --> 00:06:48.770
for lots of people to be able to reach,
sometimes anonymously
00:06:48.770 --> 00:06:51.140
but it does prevent them from editing.
00:06:51.140 --> 00:06:53.390
That's true even if they're logged in.
00:06:53.390 --> 00:06:57.190
So according to Wikipedia,
Wikipedia is a free access,
00:06:57.190 --> 00:07:00.020
free content Internet encyclopedia
supported and hosted by the
00:07:00.020 --> 00:07:02.789
non-profit Wikimedia Foundation
00:07:02.789 --> 00:07:05.839
Those who can access this site can
edit most of its articles
00:07:05.839 --> 00:07:08.399
and Wikipedia is ranked among the ten most
popular websites
00:07:08.399 --> 00:07:12.809
and constitutes the Internet's largest and
most popular general reference work
00:07:12.809 --> 00:07:18.559
So right now, y'know, from our vantage
point eight years...
00:07:18.799 --> 00:07:22.820
since this quote in 2007
in probably about...
00:07:22.820 --> 00:07:28.010
I'm not actually sure when Wikipedia was
founded, but some years after
00:07:28.010 --> 00:07:31.959
it's hard to realize what a radical idea
Wikipedia once was
00:07:31.959 --> 00:07:35.950
this encyclopedia that can be edited by,
well, almost anyone
00:07:35.950 --> 00:07:37.839
in 2007 the New York Times said:
00:07:37.839 --> 00:07:40.830
"The problem with Wikipedia is that it
only works in practice.
00:07:40.830 --> 00:07:43.839
In theory, it can never work."
00:07:46.039 --> 00:07:49.149
There's some sort of miracle,
that Wikipedia manages to be
00:07:49.149 --> 00:07:51.820
the resource it is, and it's the sort of
thing that researchers
00:07:51.820 --> 00:07:54.190
and economists have tried to explain
00:07:54.190 --> 00:07:56.209
and they've tried to explain it in the
same way they explain
00:07:56.209 --> 00:07:58.240
the Linux kernel
00:08:01.780 --> 00:08:04.950
this thing happens and nobody quite knows
why
00:08:04.950 --> 00:08:09.310
and it makes Wikipedians today a little
nervous about and conservative perhaps
00:08:09.310 --> 00:08:13.890
about anything that could rock the boat,
affect the quality of the encyclopedia
00:08:14.680 --> 00:08:18.310
but the fact is that Wikipedia needs its
contributors to continue to
00:08:18.310 --> 00:08:20.700
update, expand and improve the resource
00:08:20.700 --> 00:08:26.640
Wikipedia contributions peaked in 2007 and
have been in a slow and steady decline
00:08:26.640 --> 00:08:32.929
so this graph above shows the number of
active registered editors
00:08:32.929 --> 00:08:37.159
who've made more than 5 edits per month
as plotted over time
00:08:37.159 --> 00:08:40.949
and you can see this peak that happens
in 2007
00:08:42.399 --> 00:08:45.190
the reasons behind this decline are
actually an active area of research
00:08:45.190 --> 00:08:51.250
and an area of concern for the
Wikimedia foundation and so on
00:08:51.250 --> 00:08:54.880
the upshot of it is that Wikipedia can't
exactly afford to
00:08:54.880 --> 00:08:56.820
just throw away good editors.
00:08:57.690 --> 00:09:00.200
Aside from the general decline in
participation
00:09:00.200 --> 00:09:04.160
there's Wikipedia's sort of demographic
imbalance
00:09:04.160 --> 00:09:06.430
Wikipedia editors are 84-91% male
00:09:06.430 --> 00:09:08.510
depending on how you count
00:09:08.510 --> 00:09:10.510
and there is also a lot of
under-representation
00:09:10.510 --> 00:09:12.709
from global south countries
00:09:12.709 --> 00:09:16.019
and there's been a little bit of research
to show how this affects the quality
00:09:16.019 --> 00:09:17.649
of the encyclopedia.
00:09:17.649 --> 00:09:19.840
There's a group of researchers from the
GroupLens group at
00:09:19.840 --> 00:09:24.479
the University of Minnesota
and they were interested in this question
00:09:24.479 --> 00:09:28.589
they had access to a database of movie-
ratings and the gender of the raters
00:09:28.589 --> 00:09:31.899
so they compared the length of articles
about movies that were
00:09:31.899 --> 00:09:36.070
disproportionately rated by men or women
while controlling for the popularity
00:09:36.070 --> 00:09:37.720
and the rating of the movie
00:09:37.720 --> 00:09:40.899
and in this case they showed that
male-skewing movies
00:09:40.899 --> 00:09:45.420
had articles that were much longer than
articles about female-skewing movies
00:09:45.420 --> 00:09:49.779
independent of these popularity and
rating effects.
00:09:49.779 --> 00:09:53.760
Now, maybe articles about movies, it's
kind of a trivial thing,
00:09:53.760 --> 00:09:59.959
but it kind of shows you that the editor
population affects article categories
00:09:59.959 --> 00:10:04.180
that might be harder to measure
in such a rigorous way.
00:10:04.180 --> 00:10:07.740
it made us wonder how the absence of
Tor user editors
00:10:07.740 --> 00:10:09.579
affects the quality of the encyclopedia
00:10:09.579 --> 00:10:13.160
and if there's a similar skew that you
might be able to see.
00:10:16.650 --> 00:10:19.610
To help understand and answer this
question, it's worth asking
00:10:19.610 --> 00:10:22.760
what a Wikipedian would
get out of using Tor.
00:10:22.760 --> 00:10:26.060
This question is actually one that has
people kind of confused because
00:10:26.060 --> 00:10:31.659
a lot of people see Tor as a tool that you
use to hide who you are to a website
00:10:32.809 --> 00:10:35.170
and basically no one at Wikipedia is at
all interested
00:10:35.170 --> 00:10:38.660
in letting Tor users edit Wikipedia without
logging in at all.
00:10:38.660 --> 00:10:42.440
However Tor provides some benefits to
users, even when they're logged in
00:10:42.440 --> 00:10:45.210
and thus not hiding from Wikipedia.
00:10:45.210 --> 00:10:48.840
In particular it protects against certain
surveillance by your local ISP
00:10:48.840 --> 00:10:54.100
or administrative domain, and it can also
protect against government surveillance.
00:10:54.100 --> 00:10:56.830
Furthermore it prevents your IP-address
from being stored
00:10:56.830 --> 00:11:02.220
in the Wikipedia database of user IPs that
can be accessed by administrators
00:11:02.220 --> 00:11:04.470
and attackers.
00:11:04.470 --> 00:11:08.570
We've all seen plenty of cases where
attackers get access
00:11:08.570 --> 00:11:11.130
to databases they're not supposed to.
00:11:12.250 --> 00:11:18.240
Another property that is probably more
easy to think about is reachability.
00:11:18.240 --> 00:11:22.130
Internet connections could be censored,
and Tor might be the only method of
00:11:22.130 --> 00:11:24.560
actually accessing Wikipedia.
00:11:24.560 --> 00:11:28.250
And lastly a lot of Tor users use Tor for
all of their Internet use
00:11:28.250 --> 00:11:32.730
as a mechanism to diversify the user base
and provide cover for and solidarity with
00:11:32.730 --> 00:11:36.880
users that might need Tor for a
different purpose.
00:11:38.630 --> 00:11:44.900
So participation in Internet projects and
open source projects can be dangerous.
00:11:44.900 --> 00:11:47.530
Consider the case of Bassel Khartabil
00:11:47.530 --> 00:11:50.130
who's a well-known Wikipedia editor,
open source software developer
00:11:50.130 --> 00:11:53.260
and the founder of Creative Commons Syria.
00:11:53.260 --> 00:11:58.620
He was jailed for three years and he's now
disappeared, a lot of people think he's dead
00:11:58.620 --> 00:12:02.230
he's very well known for having founded
the New Palmyra project
00:12:02.230 --> 00:12:06.560
which uses satellite and high-resolution
imagery to create open 3d models
00:12:06.560 --> 00:12:07.820
of ancient structures.
00:12:07.820 --> 00:12:12.320
Now these structures were razed by Daesh,
sometimes called ISIS, some time in 2015
00:12:12.320 --> 00:12:17.050
and so this work that he's done is our
best record of these structures
00:12:17.050 --> 00:12:18.720
that now exists.
00:12:20.750 --> 00:12:26.360
In another case, Jimmy Wales announced in
2015 that the Wikipedian of the year could
00:12:26.360 --> 00:12:31.540
not be revealed publicly, because to do so
would actually put the person in danger.
00:12:31.540 --> 00:12:34.890
So, the Wikimedia foundation is also
aware that there are some cases
00:12:34.890 --> 00:12:38.620
where editors need privacy.
00:12:39.180 --> 00:12:43.400
So then, with all these risks, that
Wikipedians face, and the benefits
00:12:43.400 --> 00:12:45.840
that Tor can provide,
why would it be blocked?
00:12:45.840 --> 00:12:48.570
Well, it comes down to abuse.
00:12:48.570 --> 00:12:51.750
The problem of jerks is a real problem
on the Internet.
00:12:51.750 --> 00:12:55.440
Though the research is somewhat ambiguous
as to the degree at which it's actually
00:12:55.440 --> 00:12:56.660
made worse by anonymity,
00:12:56.660 --> 00:13:02.230
there's this very popular theory on the
Internet that if you take a normal person
00:13:02.230 --> 00:13:07.110
and add anonymity and an audience,
they become a total dickwad.
00:13:07.210 --> 00:13:11.110
Nonetheless, managing abuse is actually
somewhat harder
00:13:11.110 --> 00:13:14.250
with anonymous participants, and there's
certainly this perception that
00:13:14.250 --> 00:13:19.000
anonymity can make people more
susceptible to abusive behavior.
00:13:22.130 --> 00:13:25.040
Fortunately the cryptographic
research community has studied
00:13:25.040 --> 00:13:27.600
how to reconcile anonymity and
blacklisting of users
00:13:27.600 --> 00:13:30.880
and has found some pretty promising
solutions.
00:13:30.880 --> 00:13:35.670
The first, which I'll discuss briefly here
is Apu Kapadia's Nymble design.
00:13:35.670 --> 00:13:40.040
There have been many variants of this,
including Nymbler, ?Jackbenable?, Jack,
00:13:40.040 --> 00:13:42.120
you get the idea.
00:13:42.120 --> 00:13:46.840
Basically when Alice wants to contribute
anonymously to a website or a project
00:13:46.840 --> 00:13:49.970
she uses a pseudonym server to get
a pseudonym.
00:13:49.970 --> 00:13:53.550
Then she gives that pseudonym to the
Nymble manager
00:13:53.550 --> 00:13:55.779
and the Nymble manager
gives her a ticket.
00:13:55.779 --> 00:13:59.450
That ticket is then used to connect to the
site she wants to participate on,
00:13:59.450 --> 00:14:03.069
so it's another way to sort of distribute
the trust.
00:14:03.339 --> 00:14:07.340
But our Alice is a jerk, so
she vandalizes the website.
00:14:07.430 --> 00:14:10.760
The website then complains to the Nymble
manager which will then send the server
00:14:10.760 --> 00:14:14.089
a token that can be used to link that user
in the future.
00:14:14.089 --> 00:14:16.980
The server then adds the user to a
blacklist.
00:14:18.740 --> 00:14:21.720
So basically the way that this works is
that everything the user has done
00:14:21.720 --> 00:14:24.820
before the complaint still remains
anonymous forever,
00:14:24.820 --> 00:14:28.170
but everything that they do in the future
is linkable
00:14:28.170 --> 00:14:31.290
and thus it remains easier to block them.
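
The "past stays anonymous, future becomes linkable" property can be illustrated with a one-way hash chain. This is a toy sketch loosely modelled on that idea, not the Nymble protocol itself: the function names, the labels, and the use of SHA-256 are assumptions, and the two managers and all other protocol machinery are left out.

```python
import hashlib


def _h(label: bytes, data: bytes) -> bytes:
    """Domain-separated SHA-256."""
    return hashlib.sha256(label + data).digest()


def next_seed(seed: bytes) -> bytes:
    """Evolve the seed one time period forward (cannot be reversed)."""
    return _h(b"evolve:", seed)


def ticket(seed: bytes) -> bytes:
    """Derive the ticket shown to the website in one time period."""
    return _h(b"ticket:", seed)


def tickets_from(seed: bytes, periods: int) -> list[bytes]:
    """Tickets for `periods` consecutive time periods, starting at `seed`."""
    out = []
    for _ in range(periods):
        out.append(ticket(seed))
        seed = next_seed(seed)
    return out


# Alice's tickets for six time periods (hypothetical secret seed).
seed0 = b"hypothetical secret seed"
all_tickets = tickets_from(seed0, 6)

# A complaint arrives in period 3: the manager hands the website the
# seed for period 3, from which only later tickets can be computed.
seed3 = seed0
for _ in range(3):
    seed3 = next_seed(seed3)
linkable = set(tickets_from(seed3, 3))

print([t in linkable for t in all_tickets])
# [False, False, False, True, True, True]
```

Because the hash cannot be run backwards, the website can recognise Alice's tickets from period 3 onwards but learns nothing that ties her to periods 0 to 2.
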
00:14:32.200 --> 00:14:37.090
There has basically been no adoption of
this kind of protocol,
00:14:37.090 --> 00:14:40.160
despite a lot of iterations in the
literature.
00:14:40.320 --> 00:14:42.560
There are some reasons for this:
00:14:42.560 --> 00:14:45.380
many of the variants have no
implementation, and those that do
00:14:45.380 --> 00:14:48.050
it's research code and as the author
of some research code...
00:14:48.050 --> 00:14:50.949
I can tell you that there would be
significant work involved in
00:14:50.949 --> 00:14:53.140
actually adopting these measures.
00:14:53.140 --> 00:14:56.380
And there is a price to be paid. You have
to pick between either having
00:14:56.380 --> 00:15:00.480
a semi-trusted third party, degraded
notions of privacy,
00:15:00.480 --> 00:15:02.950
so basically pseudonymity
rather than anonymity,
00:15:02.950 --> 00:15:05.240
or high computational overhead
00:15:05.240 --> 00:15:08.160
because zero-knowledge proofs are
still kind of expensive.
00:15:08.160 --> 00:15:11.960
But it could well be done, and it's not
like you need all of these things,
00:15:11.960 --> 00:15:13.360
you only need one,
00:15:13.360 --> 00:15:17.870
but ultimately it isn't being done, and I
think this is because most sites
00:15:17.870 --> 00:15:23.060
don't really care. They believe that the
number of non-jerks might not be zero,
00:15:23.060 --> 00:15:28.350
but it's approximately zero,
and it's just not worth the bother.
00:15:29.600 --> 00:15:33.680
So we're interested in measuring this
value of anonymous participation
00:15:33.680 --> 00:15:37.740
to sort of provide motivation for sites to
actually try and solve these problems.
00:15:37.990 --> 00:15:42.120
It's not a terribly easy thing to do,
because Tor is blocked so often
00:15:42.120 --> 00:15:45.050
we're actually trying to measure
participation that doesn't happen,
00:15:45.050 --> 00:15:47.490
that might happen under
alternate circumstances.
00:15:47.490 --> 00:15:51.300
To ask this question we turned to
qualitative methods, which is
00:15:51.300 --> 00:15:53.020
basically an interview study.
00:15:53.020 --> 00:15:56.429
We talked to Tor users who participate in
open collaboration, and we talked to
00:15:56.429 --> 00:15:58.990
Wikipedia editors about their privacy
concerns.
00:16:01.510 --> 00:16:03.649
So we have two basic research questions:
00:16:03.649 --> 00:16:05.839
first, what kind of threats do
contributors
00:16:05.839 --> 00:16:09.899
to open collaboration projects perceive,
and second:
00:16:09.899 --> 00:16:13.850
how do people who contribute to open
collaboration projects manage the risk?
00:16:13.850 --> 00:16:16.990
The goal here is to get the kind of
in-depth and qualitative
00:16:16.990 --> 00:16:19.490
understanding that will help us to ask
the right questions
00:16:19.490 --> 00:16:23.000
in a larger scale study, and ensure that
we're solving the right problems
00:16:23.000 --> 00:16:28.069
when we design systems to facilitate
anonymous participation in online projects
00:16:29.219 --> 00:16:30.970
As ?Cera McDonald? Pikelet said:
00:16:30.970 --> 00:16:36.470
"They're not anecdotes, that's small
batch artisanal data..."
00:16:38.320 --> 00:16:42.730
So a little bit about our 23 participants
in our study
00:16:42.730 --> 00:16:45.339
We had 12 participants that were Tor users
00:16:45.339 --> 00:16:50.640
8 males, 3 females and 1 of fluid gender.
00:16:50.640 --> 00:16:55.410
The minimum age was 18, the maximum age
was 41 and the average was 30.
00:16:55.410 --> 00:17:01.020
3 people with a high school education, 4
current and graduated undergraduates
00:17:01.020 --> 00:17:07.048
and 5 people with post-graduate degrees or
who were graduate students.
00:17:08.398 --> 00:17:13.279
The location: 7 of the participants were
from the U.S. but we also had
00:17:13.279 --> 00:17:18.699
participants from Australia, Belgium,
Canada, South Africa and Sweden.
00:17:18.959 --> 00:17:26.169
For the Wikimedia participants, we had
again 8 males and 3 females.
00:17:26.169 --> 00:17:31.649
Actually I think the demographics of Tor
and Wikimedia might not be too different.
00:17:31.649 --> 00:17:37.159
The minimum age was 20 and the max was 53,
again the average was 30.
00:17:37.159 --> 00:17:42.360
One didn't report their education level,
we had 8 people with bachelor's degrees
00:17:42.360 --> 00:17:47.330
or undergraduate students, and 2 graduate
students or people with graduate degrees.
00:17:47.330 --> 00:17:51.620
Again we had 5 participants from the U.S.,
but we also had participants from
00:17:51.620 --> 00:17:56.309
Australia, France, Ghana, Israel
and the U.K. in this case.
00:17:56.309 --> 00:18:00.740
So we didn't have - a lot of people talked
to us - we didn't have any participants
00:18:00.740 --> 00:18:05.559
from places like Iran or China, though we
did have some Iranians who were
00:18:05.559 --> 00:18:08.520
living in the U.S. who talked to us.
00:18:08.520 --> 00:18:12.230
So types of participation
00:18:12.230 --> 00:18:15.489
Obviously we had Wikipedians,
we sought them out
00:18:15.489 --> 00:18:18.440
a number of the people that we talked
to, especially the Tor users,
00:18:18.440 --> 00:18:21.310
actually contribute to
the Tor project in some way
00:18:21.310 --> 00:18:24.559
but we asked people about their other
participation on the Internet,
00:18:24.559 --> 00:18:28.300
especially Tor users, and we found that
there are a lot of people that participate
00:18:28.300 --> 00:18:34.000
through adding web comments, participating
on forums, using Twitter...
00:18:34.000 --> 00:18:37.740
contributing open source code to projects
on Github or Sourceforge
00:18:37.740 --> 00:18:40.850
or other projects on the Internet, helping
with the Internet archive
00:18:40.850 --> 00:18:46.100
or contributing to image boards...
to sites that do that.
00:18:46.100 --> 00:18:50.120
So our interview protocol: we gave 20
dollars in compensation,
00:18:50.120 --> 00:18:51.480
gift cards or cash.
00:18:51.480 --> 00:18:58.200
30% of people declined this because we
would need to register their participation
00:18:58.200 --> 00:19:02.809
if we give them compensation, and some
people didn't want there to be
00:19:02.809 --> 00:19:03.980
as much of a record.
00:19:03.980 --> 00:19:07.509
We spoke to people over the phone, using
Skype, using
00:19:07.509 --> 00:19:11.809
various encrypted audio mechanisms,
one person was interviewed face to face.
00:19:11.809 --> 00:19:14.669
The interviews were again conducted by
Andrea Forte
00:19:14.669 --> 00:19:19.260
and we asked people to tell in-depth
stories and prompted them for detail.
00:19:19.690 --> 00:19:23.630
Our analysis of this is ongoing, it's
not done,
00:19:24.310 --> 00:19:28.319
we've transcribed all the interviews,
we've coded them to identify the themes
00:19:28.319 --> 00:19:30.480
and we grouped and merged some of these
themes.
00:19:30.480 --> 00:19:34.009
I'm going to talk to you about some of the
stuff that came out of this study,
00:19:34.009 --> 00:19:37.009
give some quotes and things like that.
00:19:37.579 --> 00:19:38.520
Interview topics.
00:19:38.520 --> 00:19:42.299
For Tor users we asked them to explain Tor
and what it's for. We asked for some
00:19:42.299 --> 00:19:44.879
current and retrospective examples of use,
00:19:44.879 --> 00:19:48.169
the story of how and why they first
started using Tor,
00:19:48.169 --> 00:19:52.139
and some examples of when they use Tor
online and when they don't use Tor online
00:19:52.139 --> 00:19:55.489
and some questions about their
participation in online projects
00:19:55.489 --> 00:19:59.480
and if they participate in Wikipedia we
asked them some of the Wikipedia questions
00:19:59.480 --> 00:20:02.249
similarly with Wikipedia people who had
used Tor.
00:20:02.249 --> 00:20:05.560
And there was some considerable overlap.
00:20:06.590 --> 00:20:09.640
For Wikipedians we asked how and why they
started editing,
00:20:09.640 --> 00:20:12.289
examples of privacy concerns associated
with their editing,
00:20:12.289 --> 00:20:15.169
steps they may have taken to protect their
privacy when editing,
00:20:15.169 --> 00:20:18.450
and examples of interactions with other
editors.
00:20:18.820 --> 00:20:24.170
Now, there's some real limitations with
this work:
00:20:24.450 --> 00:20:28.210
we may be missing participants with severe
privacy concerns.
00:20:28.940 --> 00:20:32.519
Anybody who participated in this would have
to talk to unknown parties
00:20:32.519 --> 00:20:36.700
that they couldn't necessarily trust that
we were not going to do
00:20:36.700 --> 00:20:40.199
any nefarious things with their interview.
00:20:40.279 --> 00:20:43.769
They need to speak remotely over a
communications channel in most cases
00:20:43.769 --> 00:20:48.909
we were willing to conduct some interviews
over various encrypted channels
00:20:48.909 --> 00:20:51.950
such as Jitsi or really whatever people
wanted us to do,
00:20:51.950 --> 00:20:53.519
as long as we could set it up.
00:20:53.519 --> 00:20:56.500
Though we did mention Skype in our
recruitment materials,
00:20:56.500 --> 00:20:59.899
and this actually caused a bit of a
kerfuffle on the Tor blog
00:20:59.899 --> 00:21:04.700
when people were saying we clearly don't
understand Tor
00:21:04.700 --> 00:21:08.399
and have no familiarity with the project
if we're even thinking of using Skype
00:21:08.399 --> 00:21:14.099
I know a couple of Tor users and Tor
developers that use Skype, so...
00:21:14.179 --> 00:21:17.809
but, y'know, we were willing to
use other things,
00:21:17.809 --> 00:21:20.700
and we again didn't talk to residents of
Iran or China,
00:21:20.700 --> 00:21:25.319
which is something that a lot of people
told us might be of interest.
00:21:25.319 --> 00:21:28.459
So, what does anonymity actually mean to a
00:21:28.459 --> 00:21:32.040
Wikipedian, was an interesting question.
Because it doesn't mean the same thing
00:21:32.040 --> 00:21:36.999
that it usually means to a Tor user. So,
a lot of times when people talk about
00:21:36.999 --> 00:21:40.440
anonymous edits in Wikipedia they mean
editing without logging in.
00:21:40.440 --> 00:21:45.649
And this is actually called IP editing to
Wikipedians, because what happens when you
00:21:45.649 --> 00:21:50.820
edit Wikipedia without logging in is that
the IP address is actually published
00:21:50.820 --> 00:21:53.409
as the author of that edit.
00:21:53.409 --> 00:21:57.450
The other thing that people mean when
they talk about editing anonymously is
00:21:57.450 --> 00:22:01.399
editing under a pseudonymous account while
not leaving clues about your identity.
00:22:03.300 --> 00:22:06.250
The notion of IP editing is somewhat
problematic.
00:22:06.500 --> 00:22:10.289
This was an article from Buzzfeed about
00:22:10.289 --> 00:22:15.879
the 33 most embarrassing congressional
edits to members' Wikipedia pages.
00:22:15.879 --> 00:22:20.960
The congressional offices in the U.S. all
share one IP address,
00:22:20.960 --> 00:22:24.200
so you can simply search Wikipedia for
that IP address
00:22:24.200 --> 00:22:26.980
and you can find people making revisions,
00:22:26.980 --> 00:22:32.379
for example to the liberty caucus
Wikipedia site and so on.
00:22:34.259 --> 00:22:39.659
So in terms of content-based anonymity,
according to the Wikipedians we talked to,
00:22:39.659 --> 00:22:42.490
most deanonymisation is done actually by
contextual clues.
00:22:42.490 --> 00:22:45.779
When people are outed as being this
pseudonymous Wikipedia person,
00:22:45.779 --> 00:22:48.229
it's usually because somebody
looked up things.
00:22:48.229 --> 00:22:49.960
There was a quote, someone said:
00:22:49.960 --> 00:22:53.590
"these are small things but I usually
wouldn't edit things relating to my school
00:22:53.590 --> 00:22:55.909
or places near where I lived
when I was logged in.
00:22:55.909 --> 00:22:58.720
It's actually weirdly easy to piece
together someone's identity
00:22:58.720 --> 00:23:01.220
based on the location or things like that"
00:23:01.220 --> 00:23:04.279
So Tor, it's worth pointing out the limits
of what Tor can do
00:23:04.279 --> 00:23:07.920
Tor is not gonna help with this particular
problem
00:23:07.920 --> 00:23:09.320
it will hide your IP address
00:23:09.320 --> 00:23:13.850
but not necessarily this.
00:23:16.310 --> 00:23:19.070
What is the Wikipedia policy on Tor?
00:23:19.070 --> 00:23:23.590
MediaWiki has a TorBlock extension, which
automatically blocks editing through Tor
00:23:23.590 --> 00:23:27.570
Now, it's possible to actually get an
exemption,
00:23:27.570 --> 00:23:31.970
what is called an IP block exemption, and
registered users in good standing
00:23:31.970 --> 00:23:33.559
can ask for one.
00:23:33.559 --> 00:23:36.789
The problem is, it's a little bit hard to
establish that standing
00:23:36.789 --> 00:23:41.249
it requires editing without using Tor.
00:23:41.739 --> 00:23:49.159
It's been pointed out that this is particularly
problematic for censored users,
00:23:49.159 --> 00:23:52.279
because they can't access Wikipedia to
edit in the first place,
00:23:52.279 --> 00:23:56.720
although they do provide some closed
proxies for Chinese users in particular,
00:23:56.720 --> 00:24:00.309
there are a lot of censored users that
aren't Chinese but...
00:24:00.309 --> 00:24:04.499
you can contact them to ask to use their
sort of secret proxies.
00:24:04.499 --> 00:24:06.909
I don't know how well this actually works.
00:24:06.909 --> 00:24:11.700
But we did ask our interviewees, can
Wikipedia be edited through Tor?
00:24:11.700 --> 00:24:15.649
Which is an interesting question. So,
as a convention for the rest of the talk
00:24:15.649 --> 00:24:19.109
when you see these blue boxes, they are
gonna be quotes from Wikipedians,
00:24:19.109 --> 00:24:22.009
when you see the green boxes, they're
quotes from Tor users.
00:24:22.009 --> 00:24:27.400
When we asked people, the Wikipedians
often said: if the account exists,
00:24:27.400 --> 00:24:31.019
yes, when you're doing an anonymous edit
with Tor it's really difficult
00:24:31.969 --> 00:24:34.450
they mean an IP edit there.
And then he said:
00:24:34.450 --> 00:24:36.469
I had one that came
through the mailing list
00:24:36.469 --> 00:24:39.289
in the last couple of weeks, and that
their employer had been
00:24:39.289 --> 00:24:41.700
checking up on them... we allowed that.
00:24:41.700 --> 00:24:45.349
So as an administrator I have a user bot
that allows me to get around that,
00:24:45.349 --> 00:24:49.459
but as well as feeling bad about that,
other people don't have that option.
00:24:50.759 --> 00:24:55.440
A Tor user actually said: but
sometimes, like every so many exit nodes,
00:24:55.440 --> 00:24:57.999
you sometimes have one that works...
so many sites block Tor,
00:24:57.999 --> 00:25:01.259
try to block it, it's quite annoying as
you're trying to do something.
00:25:01.259 --> 00:25:05.969
So this person sort of... saw what... in
the research of blocking Tor,
00:25:05.969 --> 00:25:09.419
not every exit node is blocked, so if
you're really determined to make that
00:25:09.419 --> 00:25:15.389
anonymous edit, you can just keep clicking
'New Identity' and get there.
00:25:16.359 --> 00:25:20.130
And then they said: we do sometimes let
people edit through them,
00:25:20.130 --> 00:25:23.139
I know we have users in China coming
through the Great Firewall
00:25:23.139 --> 00:25:25.139
and stuff like that.
00:25:25.249 --> 00:25:29.179
So then ...
[[ audio cuts out for 4 seconds ]]
00:25:29.179 --> 00:25:35.820
Tor user, y'know, well they...
[[ audio cuts out for 16 seconds ]]
00:25:35.820 --> 00:25:55.070
[[ audio cuts out for 16 seconds ]]
00:25:55.070 --> 00:25:59.670
[[ 5 seconds audio cut remaining ]]
00:25:59.670 --> 00:26:01.099
...things like that.
00:26:01.099 --> 00:26:04.340
So because you can change your IP address
with the click of a button,
00:26:04.340 --> 00:26:07.910
it's very difficult to prevent abuse.
00:26:09.110 --> 00:26:14.189
There's this sort of notion that maybe
it's important for vandalism,
00:26:14.189 --> 00:26:17.789
but maybe that's a problem, and maybe
there should be something that can be done.
00:26:17.789 --> 00:26:20.799
So then, a lot of what we asked people about
was sort of the threats
00:26:20.799 --> 00:26:23.779
that they were concerned about, from a
data privacy perspective.
00:26:23.779 --> 00:26:27.899
People talked about government threats,
businesses, organized crime,
00:26:27.899 --> 00:26:32.579
private citizens, other project members,
and project outsiders.
00:26:32.759 --> 00:26:38.179
When we group the threats, we found sort
of five or so big threats
00:26:38.179 --> 00:26:41.940
that lots of people talked about, we had
twelve different instances of
00:26:41.940 --> 00:26:45.389
people talking about surveillance concerns
or general concerns about
00:26:45.389 --> 00:26:47.739
the loss of privacy.
00:26:47.739 --> 00:26:50.969
Ten people talked specifically about the
loss of employment
00:26:50.969 --> 00:26:55.979
or economic opportunity that might happen,
9 people talked about bullying,
00:26:55.979 --> 00:26:59.700
harassment, intimidation, stalking,
this sort of thing.
00:26:59.760 --> 00:27:04.429
Another 9 people talked about personal
safety, or the safety of their loved ones.
00:27:04.429 --> 00:27:10.100
6 people that we talked to, talked about
reputation loss.
00:27:10.100 --> 00:27:12.909
I'll get into these in more detail.
00:27:13.309 --> 00:27:14.679
Surveillance.
00:27:14.679 --> 00:27:18.090
Y'know, in my country there is basically
unknown surveillance going on
00:27:18.090 --> 00:27:21.369
and I don't know what providers to use,
and at some point I decided to
00:27:21.369 --> 00:27:22.619
use Tor for everything.
00:27:22.619 --> 00:27:25.919
It's worth pointing out given the list of
countries I gave that
00:27:25.919 --> 00:27:30.850
this isn't necessarily the list and...
I think you wouldn't get this list of
00:27:30.850 --> 00:27:36.320
kinda quotes maybe before the Snowden
revelations about generalized surveillance
00:27:36.320 --> 00:27:38.029
across the world.
00:27:38.029 --> 00:27:41.160
A lot of people talked about how their
online activities were
00:27:41.160 --> 00:27:45.140
being accessed or logged without their
consent, and especially among
00:27:45.140 --> 00:27:47.669
Tor users there was this
notion of wanting to be
00:27:47.669 --> 00:27:51.189
public by effort, but private by default.
00:27:51.319 --> 00:27:57.049
And when you talk to Wikipedians, they
talked about their edit histories and how
00:27:57.049 --> 00:28:01.299
the edit histories themselves might be
somewhat sensitive.
00:28:03.809 --> 00:28:06.799
In terms of loss of employment...
00:28:06.799 --> 00:28:13.049
many many employers now look at your
online footprint before they hire you.
00:28:13.049 --> 00:28:16.719
According to Monster, one of the big
employment websites,
00:28:16.719 --> 00:28:20.730
77% of employers google prospective
employees.
00:28:22.180 --> 00:28:26.810
From a Tor user, we had someone talk about
"I am transgender, I am queer, my boss
00:28:26.810 --> 00:28:30.369
would rant for hours about this kind of
person, that kind of person, the other
00:28:30.369 --> 00:28:34.179
kind of person, all of which I happen to
be... and I decided if I was going to do
00:28:34.179 --> 00:28:37.829
anything online at all, I better look into
options for protecting myself, because
00:28:37.829 --> 00:28:40.179
I didn't want to get fired."
00:28:40.179 --> 00:28:44.529
In Wikipedia, someone said: "A friend of
mine was also involved in this discussion
00:28:44.529 --> 00:28:47.910
and he actually got it worse than I did.
He's in a position now where
00:28:47.910 --> 00:28:52.110
anyone who googles him finds allegations
that he is this awful monster, and
00:28:52.110 --> 00:28:55.369
he's terrified of having to look for work
now because you google him,
00:28:55.369 --> 00:28:57.379
and that's what you find."
00:28:57.379 --> 00:29:01.750
So these things can have a real impact
on people. So...
00:29:01.790 --> 00:29:05.989
and then there is harassment. So this is
a quote from a Wikipedian who said:
00:29:05.989 --> 00:29:10.239
"I would say that the fear of harassment
of real, of stalking and things like that
00:29:10.239 --> 00:29:13.539
is quite substantial, at least among
administrators I know,
00:29:13.539 --> 00:29:15.309
especially women."
00:29:15.309 --> 00:29:18.519
From a Tor user there was someone who
talked about "this is a map
00:29:18.519 --> 00:29:21.989
of active hate groups in the
United States"
00:29:21.989 --> 00:29:25.609
and how they had experienced problems
with these hate groups in the past
00:29:25.609 --> 00:29:29.519
and they wanted to see who was active in
their area, and they would
00:29:29.519 --> 00:29:33.320
go to the websites of these hate groups
and sort of for obvious reasons
00:29:33.320 --> 00:29:37.549
they didn't want their home IP address
to appear in the logs of these
00:29:37.549 --> 00:29:40.179
hate group websites.
00:29:42.889 --> 00:29:46.759
Safety of loved ones,
also personal safety.
00:29:47.179 --> 00:29:51.499
A lot of people talked about, y'know,
real, concrete, not just threats but
00:29:51.499 --> 00:29:54.779
things that had happened to them or to
people that they knew.
00:29:54.779 --> 00:29:59.129
In Tor there is this story: they bursted
his door down and
00:29:59.129 --> 00:30:02.149
they beat the ever living crap out of him.
He was hospitalized
00:30:02.149 --> 00:30:05.850
for two and a half weeks, and they told
him: "if you and your family wanna live,
00:30:05.850 --> 00:30:07.840
you're gonna have to stop causing trouble"
00:30:07.840 --> 00:30:09.570
and they said that to him in Farsi.
00:30:09.570 --> 00:30:12.750
I have a family so after I visited him
in the hospital, I started...
00:30:12.750 --> 00:30:15.909
well at first I started shaking, and I
went into a cold sweat
00:30:15.909 --> 00:30:20.019
and then I realized I have to start taking
my human rights activities
00:30:20.019 --> 00:30:22.459
into other identities through
the Tor network.
00:30:22.869 --> 00:30:24.659
And on the Wikipedia side:
00:30:24.659 --> 00:30:28.229
"I pulled back from some of that Wikipedia
work when I could no longer hide
00:30:28.229 --> 00:30:32.179
in quite the same way. For a long time I
lived on my own, so it's just my own
00:30:32.179 --> 00:30:36.049
personal risk I was taking with things,
now my wife lives here as well
00:30:36.049 --> 00:30:37.699
and I can't take that same risk."
00:30:41.329 --> 00:30:45.619
Lastly, people were concerned about
reputation loss.
00:30:45.619 --> 00:30:52.179
In Wikipedia there has been known to be
edit wars that escalate into vendettas
00:30:52.179 --> 00:30:55.879
here's a sort of example of an edit war
where y'know some user says:
00:30:55.879 --> 00:31:03.779
"I hate big bitch Alison," who is then
blocked indefinitely by Alison.
00:31:03.779 --> 00:31:07.220
People are worried about this sort of
thing escalating and then somebody
00:31:07.220 --> 00:31:12.179
doing something off of the Internet to
call them names, or mess with their
00:31:12.179 --> 00:31:15.599
reputation... and that would have a
negative effect on their life.
00:31:15.599 --> 00:31:21.919
In Tor there is a couple interesting cases
that sort of concerns guilt by association
00:31:21.919 --> 00:31:24.529
So there is someone who participates on
image boards,
00:31:24.529 --> 00:31:27.059
on 8chan or infinite chan,
00:31:27.059 --> 00:31:31.380
and I don't know if you guys are that
aware of this... it's sort of the place
00:31:31.380 --> 00:31:34.310
which was kind of started by people that
were blocked by 4chan,
00:31:34.310 --> 00:31:36.830
so it's the people that 4chan think are
kind of sketchy
00:31:36.830 --> 00:31:39.740
laughter
00:31:39.740 --> 00:31:43.499
and this person said: "Look, I stand
behind the material and the content that
00:31:43.499 --> 00:31:45.789
I have created, but some people
on this site,
00:31:45.789 --> 00:31:48.999
I wouldn't wanna be associated with them."
00:31:48.999 --> 00:31:53.549
So, there is another person who talked
about "look I've created some online
00:31:53.549 --> 00:31:59.249
resources about various pharmaceuticals,
but I don't wanna be very associated
00:31:59.249 --> 00:32:04.009
with the community that posts stuff about
stuff like that."
00:32:05.499 --> 00:32:07.119
So some other threats.
00:32:07.919 --> 00:32:10.929
Some people talked about diminished
project quality.
00:32:10.929 --> 00:32:15.619
In particular a lot of the Wikipedians
that we talked to
00:32:15.619 --> 00:32:18.149
were somewhat prominent in the
Wikipedia project,
00:32:18.149 --> 00:32:21.979
and in some respects had kind of achieved
some degree of like
00:32:21.979 --> 00:32:25.909
rock star status as editors, if such
things can be.
00:32:26.379 --> 00:32:30.459
They found it very difficult to edit
anymore because they'd edit a page
00:32:30.459 --> 00:32:34.059
and that page hadn't received a lot of
attention but people would see that
00:32:34.059 --> 00:32:37.510
they had edited it and there would be
sort of hordes of people that would
00:32:37.510 --> 00:32:40.479
descend on that page, and mess with it.
00:32:40.489 --> 00:32:44.420
And they found that they couldn't do that
without actually sort of harming the pages
00:32:44.420 --> 00:32:46.239
that they were trying to edit.
00:32:46.239 --> 00:32:50.599
Similarly, there were some Tor users who
talked about, y'know,
00:32:50.599 --> 00:32:54.690
not wanting to sort of... take credit for
their work because they were worried
00:32:54.690 --> 00:32:58.769
they wouldn't have the credentials to be
taken seriously in various ways,
00:32:58.769 --> 00:33:00.029
or things like that.
00:33:00.029 --> 00:33:03.940
Only two people in our project actually
talked about worrying about
00:33:03.940 --> 00:33:12.320
legal sort of sanctions, government
sanctions for their participation.
00:33:12.320 --> 00:33:16.320
There were a lot of people that talked
about computer security concerns
00:33:16.320 --> 00:33:19.769
which is not so much a privacy concern,
though it's very related, and I'm
00:33:19.769 --> 00:33:24.460
going to talk about that because this
group might be interested.
00:33:24.460 --> 00:33:27.749
On the Tor side, people liked to see
authentication properties
00:33:27.749 --> 00:33:32.440
of .onion services. The idea that when
you go to a .onion website,
00:33:32.440 --> 00:33:37.440
the address is self-authenticating, you
know where you're going.
00:33:37.440 --> 00:33:41.289
But a lot of people who use Tor talked
about the general data hygiene idea
00:33:41.289 --> 00:33:45.879
that there's sort of less data about them
in unknown websites,
00:33:45.879 --> 00:33:49.159
in unknown databases of companies
because they don't leave as many
00:33:49.159 --> 00:33:55.010
online footprints, and then you see all
these high profile break-ins that happen
00:33:55.010 --> 00:33:58.639
and these databases get stolen, if you're
using Tor, maybe you're less likely
00:33:58.639 --> 00:34:00.209
to be in those databases.
00:34:00.209 --> 00:34:02.599
That was the idea there.
00:34:02.599 --> 00:34:05.969
From Wikipedia a lot of people were
concerned about
00:34:05.969 --> 00:34:08.020
their Wikipedia credentials.
00:34:08.020 --> 00:34:12.879
They talked about not logging in on
public terminals and things like that,
00:34:12.879 --> 00:34:17.590
in particular being concerned about the
security of administrative credentials
00:34:17.590 --> 00:34:22.679
that have privileges to, for example, look
up the IP address of users who had edited
00:34:22.679 --> 00:34:25.989
and things like that, which could
be abused.
00:34:27.309 --> 00:34:30.410
So some concrete things that the people
were afraid of,
00:34:30.410 --> 00:34:31.999
not a complete list:
00:34:31.999 --> 00:34:35.069
having their head photoshopped onto porn,
something that happens
00:34:35.069 --> 00:34:37.260
sometimes to editors...
00:34:37.260 --> 00:34:40.729
being beaten up, actually a couple of Tor
people mentioned this;
00:34:40.729 --> 00:34:43.260
being swatted;
receiving pipe bombs;
00:34:43.260 --> 00:34:47.080
having fake information about them
published online.
00:34:47.320 --> 00:34:52.180
Though there were people that said, look,
I don't really see a threat.
00:34:52.180 --> 00:34:56.469
And some participants said they don't
perceive threats when they're contributing
00:34:56.469 --> 00:35:00.800
but in a lot of cases they pointed out
that they enjoyed certain privileges
00:35:00.800 --> 00:35:04.020
related to perhaps their gender, their
nationality, or the fact that
00:35:04.020 --> 00:35:05.970
their interests were fairly mainstream.
00:35:05.970 --> 00:35:08.700
So here's a quote:
"yeah I'm not that worried about it,
00:35:08.700 --> 00:35:11.960
mainly because there's pretty good support
for some of these viewpoints,
00:35:11.960 --> 00:35:15.450
kind of a mainstream discourse, and it's
not so radical, I don't think anyone's
00:35:15.450 --> 00:35:17.300
going to be knocking down on my door.
00:35:17.300 --> 00:35:20.390
But I've been in contact with activists
who have been engaged with
00:35:20.390 --> 00:35:23.440
higher risk activities, and I do wonder
about, I do have concerns
00:35:23.440 --> 00:35:27.470
about their welfare, and the desire they
have to have the tools to
00:35:27.470 --> 00:35:31.930
be able to pursue their activities without
facing consequences."
00:35:31.930 --> 00:35:38.500
So in contrast to the jerk theme, there
are a lot of people who run Tor
00:35:38.500 --> 00:35:43.330
out of a sense of altruism, to provide
cover and solidarity.
00:35:43.920 --> 00:35:47.460
Someone said: "I appreciate the need for
protecting vulnerable people
00:35:47.460 --> 00:35:51.390
around the world, so I run several relays,
some of them are exit relays,
00:35:51.390 --> 00:35:54.470
some of them are middle relays, and I
run them around the world."
00:35:54.470 --> 00:35:57.820
And someone else said:
"While you use it, you help
00:35:57.820 --> 00:36:01.950
diversify the network for those who may be
subject to traffic monitoring, and you can
00:36:01.950 --> 00:36:05.820
look up any information you like, whether
or not it's sensitive, and you'll get it,
00:36:05.820 --> 00:36:09.370
and if you live in a place where it may
not be the greatest in legal standing
00:36:09.370 --> 00:36:13.289
to look it up, you're able to find out
information."
00:36:14.459 --> 00:36:19.839
So mitigating strategies, how did people
deal with this when they wanted to
00:36:19.839 --> 00:36:26.319
participate in sites but they couldn't do
it through anonymous means, well,
00:36:26.319 --> 00:36:29.520
some people modified their participation,
and I'll talk about some of
00:36:29.520 --> 00:36:35.940
the chilling effects that we saw, and also
attempts to get anonymity in various ways
00:36:37.440 --> 00:36:40.079
So, lost editors.
00:36:40.389 --> 00:36:43.210
Several Tor users that we talked to,
actually mentioned that
00:36:43.210 --> 00:36:47.700
they had edited Wikipedia and they no
longer edited it, or they edited it
00:36:47.700 --> 00:36:50.230
less because of the difficulty of editing
through Tor.
00:36:50.230 --> 00:36:53.380
There was someone who said:
"Basically I used to edit Wikipedia
00:36:53.380 --> 00:36:57.470
prior to doing a lot of Tor, so yeah now
it's mostly reading... I used to
00:36:57.470 --> 00:37:01.730
do a lot of editing for license design
and for like some open source licenses,
00:37:01.730 --> 00:37:06.840
occasionally random forms and stuff that I
knew about, sometimes grammar."
00:37:09.780 --> 00:37:13.289
And people talked to us in particular
about the chilling effects
00:37:13.289 --> 00:37:17.910
of state surveillance, and in particular
the Snowden revelations.
00:37:17.910 --> 00:37:22.179
In March of 2015 the Wikimedia Foundation
announced that it was
00:37:22.179 --> 00:37:25.720
suing the National Security Agency.
00:37:25.720 --> 00:37:29.409
We asked people about that, and
the Wikipedians, some of them said
00:37:29.409 --> 00:37:32.929
"People aren't willing to engage with us
when they know their government is
00:37:32.929 --> 00:37:36.960
watching their every move." And they
said that in particular they can show
00:37:36.960 --> 00:37:39.960
that editing dropped off significantly on
certain articles
00:37:39.960 --> 00:37:42.680
after the Upstream program was revealed.
00:37:42.680 --> 00:37:48.329
Here's a quote from one of our Tor users
in the study that substantiates this.
00:37:48.329 --> 00:37:51.330
"For the Edward Snowden page, I've pulled
myself away from adding
00:37:51.330 --> 00:37:54.429
sensitive contributions, like different
references, because I thought
00:37:54.429 --> 00:37:59.100
that might be traced back to me
in some way. But not refraining from
00:37:59.100 --> 00:38:00.400
useful content I guess."
00:38:00.400 --> 00:38:04.779
Though, of course, adding references is
one of the things that contributes to
00:38:04.779 --> 00:38:09.819
the quality of articles and so on, and in
particular they said, articles about
00:38:09.819 --> 00:38:16.089
national security things, about terrorism
and so on, people didn't edit as much
00:38:16.089 --> 00:38:21.510
about these things anymore because they
were worried about ending up on a list.
00:38:21.510 --> 00:38:27.349
The other major topic that was chilled was
articles about women's health.
00:38:27.349 --> 00:38:31.890
So, here's a picture of a vacuum
aspiration abortion from the
00:38:31.890 --> 00:38:39.049
Wikipedia abortion article and a couple
of people told us about how, "look, any
00:38:39.049 --> 00:38:44.609
site that has to do with women or women's
issues is more contentiously edited,
00:38:44.609 --> 00:38:49.280
is more likely of inflaming people,
getting into edit wars, than other sites."
00:38:50.100 --> 00:38:53.769
There were a lot of trolls on the Internet
and there's a quote on the Internet:
00:38:53.769 --> 00:38:57.359
"Trolls have called their bosses and been
like 'Do you know that your employee
00:38:57.359 --> 00:38:59.510
was editing the clitoris article last
week?'"
00:38:59.510 --> 00:39:01.829
They will do stuff like that.
00:39:01.829 --> 00:39:07.000
So this means that, y'know, in particular
someone talked about "I was a medical
00:39:07.000 --> 00:39:10.890
student, I had my obstetrics text book
open, I was looking at the abortion
00:39:10.890 --> 00:39:14.029
article, I was thinking about making some
changes, but then I just
00:39:14.029 --> 00:39:20.460
pulled myself back and said, y'know,
I don't need that in my life."
00:39:20.460 --> 00:39:26.490
This is another area where privacy
concerns push back, cause people
00:39:26.490 --> 00:39:29.839
to not necessarily do things...
00:39:29.839 --> 00:39:36.539
And then there's this idea of a threshold
of participation, that the more involved
00:39:36.539 --> 00:39:40.529
you are, the more active you are in a
project, the more likely you're actually
00:39:40.529 --> 00:39:43.569
gonna encounter real problems.
00:39:43.569 --> 00:39:48.069
People involved in curating content,
deleting things, promoting things,
00:39:48.069 --> 00:39:51.619
arbitrating disputes, etc., they're going
to make enemies.
00:39:51.619 --> 00:39:54.200
Some of these enemies are going to make
nasty threats,
00:39:54.200 --> 00:39:56.550
and some of them are gonna act on them.
00:39:56.550 --> 00:40:00.000
Here is another quote of somebody:
"As long as I have that pseudonym ...
00:40:00.000 --> 00:40:05.330
"As long as I have that pseudonym ...
[[ see slide ]]
00:40:05.330 --> 00:40:10.549
[[ see slide ]]
... that turns up when you do that."
00:40:10.549 --> 00:40:14.720
People mention in particular, from the
Wikipedia side, that there were two sites:
00:40:14.720 --> 00:40:21.150
Wikipediocracy and The Wikipedia Review,
where people have critiques of Wikipedia
00:40:21.150 --> 00:40:27.860
and that people on these sites had done
threats and doxing of various people
00:40:27.860 --> 00:40:29.910
on the arbitration committee.
00:40:29.910 --> 00:40:33.160
Someone talked about "they found my
parents' home address, they found
00:40:33.160 --> 00:40:36.439
one of my old phone numbers, they wrote a
blog post about all of these
00:40:36.439 --> 00:40:39.330
horrible things I've done, and here's my
contact information,
00:40:39.330 --> 00:40:44.869
and for a good time call... and when it's
on the Internet it doesn't die."
00:40:45.099 --> 00:40:51.729
People that get to a certain level of
doing things, like handling abuse,
00:40:51.729 --> 00:40:53.629
had problems.
00:40:53.629 --> 00:40:57.630
So since I didn't have any privacy, I felt
limited in what I could do, I could still
00:40:57.630 --> 00:41:00.219
write articles but blocking people
was something
00:41:00.219 --> 00:41:03.209
I tried to avoid, since I didn't wanna
get angry phone calls.
00:41:03.209 --> 00:41:06.269
So someone else also talked about
activities that they used to do,
00:41:06.269 --> 00:41:08.429
but then after receiving threats and
things...
00:41:08.429 --> 00:41:12.440
I used to check for use of the N-word, the
ruder of the two F-words, one or two other
00:41:12.440 --> 00:41:16.969
things that were indicative of problems in
user space, and I deleted lots and lots of
00:41:16.969 --> 00:41:20.260
attack pages which were fairly hot in
dealing with them when they would
00:41:20.260 --> 00:41:23.779
turn up in article space, and when people
create a user account in somebody
00:41:23.779 --> 00:41:27.380
else's name and say a bunch of things
about that person they won't agree with,
00:41:27.380 --> 00:41:30.520
I used to deal with that, but then, y'know
they're not willing to
00:41:30.520 --> 00:41:33.560
deal with that anymore.
00:41:35.120 --> 00:41:37.729
Privacy measures that people took.
00:41:37.959 --> 00:41:42.730
Obviously in some cases people use Tor, we
talked to Tor users where that's possible
00:41:42.730 --> 00:41:46.460
People also talk about avoiding posting
linking information and details
00:41:46.460 --> 00:41:53.710
about who they are, not editing things
about y'know, their local things,
00:41:53.710 --> 00:41:57.710
things only they would know, etc.
00:41:57.710 --> 00:42:02.750
People talked about using Proxies or VPNs,
some people talked about HideMyAss,
00:42:02.750 --> 00:42:08.470
editing from a public computer using
multiple accounts in some cases, and
00:42:08.470 --> 00:42:18.590
using privacy browser plug-ins and
safeguards like NoScript and Ghostery
00:42:18.590 --> 00:42:23.540
We asked people, both Tor users and
not Tor users if they had used Tor,
00:42:23.540 --> 00:42:27.359
what they thought of Tor, and there was
this person who said: "I tried using Tor,
00:42:27.359 --> 00:42:31.249
I did, when I was younger, and everything
was so slow and terrible, I was just like
00:42:31.249 --> 00:42:32.850
'so not worth it'."
00:42:32.850 --> 00:42:38.470
And in fact a couple years ago, Tor was in
fact pretty slow - it's gotten better!
00:42:38.470 --> 00:42:41.349
But the Tor users still talked a
bit about latencies, but
00:42:41.349 --> 00:42:45.630
a lot of them talked about these issues of
CAPTCHAs, unusable website features,
00:42:45.630 --> 00:42:47.940
the fact that it used to be slow...
00:42:47.940 --> 00:42:51.920
and Wikipedians on Tor talked about it
being slow or too much trouble,
00:42:51.920 --> 00:42:56.069
just the need to download the software and
connect to it every time... and people,
00:42:56.069 --> 00:42:58.680
some people found it unnecessary.
00:42:58.680 --> 00:43:04.569
There were some other interesting things
that came up.
00:43:04.569 --> 00:43:06.250
Some people talked about how
00:43:06.250 --> 00:43:09.440
they used information ?revelation?
as a defense mechanism.
00:43:09.440 --> 00:43:14.559
This idea that, okay, I'm gonna give you
some information about me, so you can't
00:43:14.559 --> 00:43:18.920
really dox me because that's my address
right there, or whatever.
00:43:18.920 --> 00:43:23.740
But people talked also about the limits of
long term participation. A lot of people
00:43:23.740 --> 00:43:28.670
that talked to us had started editing or
participating in online projects
00:43:28.670 --> 00:43:32.680
as a relatively young teenager,
and a lot of people
00:43:32.680 --> 00:43:37.450
start with things like fixing typos,
before they later become a member
00:43:37.450 --> 00:43:40.630
of the arbitration committee, or something
like that.
00:43:40.630 --> 00:43:44.460
It's hard to have this long term
perspective when you're first creating
00:43:44.460 --> 00:43:48.650
your login name and your identity
and so on.
00:43:48.650 --> 00:44:06.559
"Until it happens to you ...
[[ see slide ]]
00:44:06.559 --> 00:44:10.769
[[ see slide ]]
... some serious thought."
00:44:11.849 --> 00:44:17.400
As most good, ethnographic studies do, and
as this one was intended to do,
00:44:17.400 --> 00:44:21.420
it sort of raises more questions
than answers.
00:44:21.420 --> 00:44:23.190
That was our goal.
00:44:23.190 --> 00:44:27.970
We're hoping... we learned that Tor users
and Wikipedians share some
00:44:27.970 --> 00:44:32.480
privacy concerns, but they do have some
different perspectives.
00:44:32.480 --> 00:44:36.019
And we did learn that some value of
participation is being lost when people
00:44:36.019 --> 00:44:38.779
can't participate in a private way.
00:44:38.869 --> 00:44:44.180
We'd like to use this work to do some
follow-up studies, and also perhaps
00:44:44.180 --> 00:44:48.470
build a larger survey study so we can
learn more, see things that are more
00:44:48.470 --> 00:44:53.400
quantitative about this work.
00:44:53.400 --> 00:44:56.869
If you find this topic interesting, a
short plug for
00:44:56.869 --> 00:44:59.250
the Privacy Enhancing Technologies Symposium
00:44:59.250 --> 00:45:02.779
which will be in July in Darmstadt.
00:45:02.779 --> 00:45:06.369
We're not presenting this particular
work here, but there is a lot of
00:45:06.369 --> 00:45:14.760
work on Tor, anonymity, privacy, so on
from the research community.
00:45:14.760 --> 00:45:19.480
And I'd like to thank my co-authors,
Andrea Forte and Nazanin Andalibi,
00:45:19.480 --> 00:45:25.400
our interview participants, the Wikimedia
Foundation, the Tor Project,
00:45:25.400 --> 00:45:29.039
the National Science Foundation that
funded Andrea's and my participation
00:45:29.039 --> 00:45:33.869
in this project, and all the people whose
images I've used in my slides...
00:45:33.869 --> 00:45:36.900
so... Thanks!
Any questions? Oh and by the way
00:45:36.900 --> 00:45:42.949
I'll be here for the whole conference, so
you can find me afterwards if...
00:45:42.949 --> 00:45:51.549
applause
00:45:51.549 --> 00:45:56.510
Herald Angel: Thanks a lot, Rachel
Greenstadt. And so, we hopefully have
00:45:56.510 --> 00:46:01.400
a few questions from you in the audience,
you can line up behind the microphones
00:46:01.400 --> 00:46:05.940
we have 4 of them here in the audience
and also in the back there are 2,
00:46:05.940 --> 00:46:11.650
and we also have the Signal Angel present
but he didn't get any questions yet,
00:46:11.650 --> 00:46:14.790
but maybe some comments or something?
00:46:14.790 --> 00:46:16.819
Some feedback from the crowd on the
Internet?
00:46:16.819 --> 00:46:18.660
Rachel Greenstadt: but there is somebody
with a... [inaudible]
00:46:18.660 --> 00:46:23.369
Herald Angel: then let me immediately go
to the questions in the audience.
00:46:23.369 --> 00:46:26.210
Herald Angel: We have microphone 2, please
00:46:26.210 --> 00:46:32.900
HA: And, one second, can you please be
quiet if you go outside? Because that's
00:46:32.900 --> 00:46:34.319
really rude.
00:46:34.319 --> 00:46:39.139
Question: did you find out if Wikipedia
for example treats classical VPN or
00:46:39.139 --> 00:46:40.769
proxies differently from Tor?
00:46:40.769 --> 00:46:44.029
Rachel Greenstadt: If what?
Question: if they treat them differently
00:46:44.029 --> 00:46:48.730
from Tor, so do they have the same policy
in place for blocking, let's say,
00:46:48.730 --> 00:46:54.370
private VPN which can also be used to
change your IP with the click of a button,
00:46:54.370 --> 00:46:59.239
if you want to bully someone but it might
offer less privacy than Tor, but if you
00:46:59.239 --> 00:47:01.869
really only want to bully someone,
that might be enough.
00:47:01.869 --> 00:47:06.240
Rachel Greenstadt: I think it depends,
is the answer.
00:47:06.240 --> 00:47:12.349
The extensions that they have, they do
block a lot of things from IPs so I think
00:47:12.349 --> 00:47:15.700
it depends on if there's been abuse
through that thing before,
00:47:15.700 --> 00:47:20.480
they try and block open proxies, I think
some people said certain VPNs you can
00:47:20.480 --> 00:47:23.400
still edit through, and some you couldn't,
it really depended.
00:47:23.400 --> 00:47:28.010
Herald Angel: Thanks, microphone 1 please.
00:47:28.010 --> 00:47:31.520
Question: Wikipedia is by no means an
isolated case, right?
00:47:31.520 --> 00:47:34.569
RG: No, no
Question: And there's more and more
00:47:34.569 --> 00:47:39.510
capability of blocking Tor exit nodes and
whatnot, so where's the project going?
00:47:39.510 --> 00:47:43.529
I mean, the Great Firewall for example
could very well block all its users from
00:47:43.529 --> 00:47:46.559
accessing Tor, right?
RG: It actually does.
00:47:46.559 --> 00:47:52.279
So it blocks people from accessing Tor and
it blocks people from accessing Wikipedia,
00:47:52.279 --> 00:47:56.140
in terms of the Tor project there are
mechanisms through using
00:47:56.140 --> 00:48:01.960
pluggable transports and bridge addresses,
they can actually help people still
00:48:01.960 --> 00:48:05.920
access Tor, and then they'll be able to
read Wikipedia, but then again
00:48:05.920 --> 00:48:08.049
they won't be able to edit for these
reasons.
00:48:08.049 --> 00:48:13.340
HA: So, again, we have 15 minutes of break
after this, so you can get out after this
00:48:13.340 --> 00:48:16.359
and change the room, and please be
quiet if you really have to
00:48:16.359 --> 00:48:20.439
leave the room already or if you come in
the room already. Thank you.
00:48:20.439 --> 00:48:22.430
Now to the Signal Angel, please.
00:48:22.430 --> 00:48:27.579
Signal Angel: There is one question from
the Internet, from ?Whyness?, he or she
00:48:27.579 --> 00:48:31.829
is asking if there's actually a recorded
instance of someone attempting to
00:48:31.829 --> 00:48:36.059
put a pipe bomb in the post
because of Wikipedia edits.
00:48:36.059 --> 00:48:42.519
RA: I certainly don't have such
information. This was just
00:48:42.519 --> 00:48:46.799
people telling us things that they were
concerned about, or
00:48:46.799 --> 00:48:51.000
threats that they'd
experienced.
00:48:51.000 --> 00:48:54.369
Nobody that I know of specifically
mentioned that they experienced
00:48:54.369 --> 00:48:55.369
a pipe bomb.
00:48:55.369 --> 00:49:01.470
Signal Angel: And another question from
?a_monk?: if blocked Tor traffic
00:49:01.470 --> 00:49:05.839
is a problem, why does the Tor project
publish the exit IP list, making it
00:49:05.839 --> 00:49:08.329
easy to block?
00:49:08.329 --> 00:49:16.000
RA: That would be a question for the Tor
people, my understanding of it is that
00:49:16.000 --> 00:49:20.339
the Tor project does try and be a good
Internet citizen and they don't want to
00:49:20.339 --> 00:49:26.650
encourage the kind of, sort of, arms race
that would happen with sort of...
00:49:26.650 --> 00:49:30.349
people trying to like find all the exits,
and block them versus making it
00:49:30.349 --> 00:49:34.479
just look, here it is, this is what's
going on, and... it's also very helpful
00:49:34.479 --> 00:49:37.970
when you're running an exit node, to be
able to say, look, this thing is
00:49:37.970 --> 00:49:42.819
an exit node and that's what was going on
when this thing happened
00:49:42.819 --> 00:49:49.369
through my computer. So I think, y'know,
there's the ability of the exit relay
00:49:49.369 --> 00:49:54.069
operators to be able to say what they're
doing is also an important concern.
00:49:54.069 --> 00:49:59.119
Herald Angel: So there's someone standing
at microphone 5.
00:49:59.119 --> 00:50:03.680
Question: You mentioned zero-knowledge
proofs in the beginning, is there any more
00:50:03.680 --> 00:50:05.269
research on this?
00:50:05.269 --> 00:50:13.269
RA: Uhm, yeah, so... If you look at the
research on Nymble
00:50:13.269 --> 00:50:15.639
by Apu Kapadia, there's also some people
00:50:15.639 --> 00:50:19.089
in Nick Hopper's group at the University
of Minnesota, there's also
00:50:19.089 --> 00:50:24.169
Ryan Henry at Indiana University
that's done a lot of work on this
00:50:24.169 --> 00:50:27.680
in Ian Goldberg's group at Waterloo,
those are the people that I would
00:50:27.680 --> 00:50:32.359
look up in terms of anonymous blacklisting
schemes, and I'm sure I'm forgetting
00:50:32.359 --> 00:50:35.700
some of them right now, so hopefully
they'll forgive me, but those are
00:50:35.700 --> 00:50:37.430
good places to start.
00:50:37.430 --> 00:50:41.799
Herald Angel: We have the next question at
microphone 1.
00:50:41.799 --> 00:50:49.039
Question: Do you know if Wikipedia ever
thought about hashing IP addresses,
00:50:49.039 --> 00:50:55.960
so that the contributions are still unique
but the users are anonymized?
00:50:57.610 --> 00:51:02.029
RA: Nobody at Wikipedia talked to us about
that, so I do not know if they thought
00:51:02.029 --> 00:51:04.089
about that or not.
00:51:04.089 --> 00:51:10.559
Herald Angel: And the last comment or
question at the Signal Angel microphone.
00:51:10.559 --> 00:51:14.859
Signal Angel: Thanks, not really a
question, more a comment...
00:51:14.859 --> 00:51:22.359
"I just wanted to relate, indeed Wikipedia
blocking Tor is pretty concerning
00:51:22.359 --> 00:51:28.750
also for Tor users because for instance,
the French Wikipedia articles about Tor
00:51:28.750 --> 00:51:34.650
have very, very poor quality and a lot of
people end up asking us questions about
00:51:34.650 --> 00:51:39.930
Tor and are misinformed because of that,
and I cannot fix it because I am not
00:51:39.930 --> 00:51:44.500
willing to edit Wikipedia without Tor. And
that is also a pretty big issue I think."
00:51:44.500 --> 00:51:49.109
RA: Yeah, so it would be interesting from
my perspective, using this to then look at
00:51:49.109 --> 00:51:53.230
the articles, the types of articles about
Tor, about anonymous participation,
00:51:53.230 --> 00:51:58.059
where we would suggest... we'd like to do
a bigger study, learn what articles about
00:51:58.059 --> 00:52:03.130
that anonymous users would edit if they
were going to edit Wikipedia, and then
00:52:03.130 --> 00:52:07.309
we could do an analysis like they did
about the movie sites to figure out
00:52:07.309 --> 00:52:11.739
if these articles are in some way shorter
or of lower quality than other articles
00:52:11.739 --> 00:52:13.970
because they're missing that perspective.
00:52:13.970 --> 00:52:20.569
Herald Angel: Thank you Rachel, thank you
for the questions, and warm applause again
00:52:20.569 --> 00:52:21.789
for Rachel Greenstadt.
00:52:21.959 --> 00:52:23.700
applause
00:52:23.780 --> 00:52:24.709
RA: Thanks
00:52:25.989 --> 00:52:29.831
tune playing
00:52:29.831 --> 00:52:37.000
subtitles created by c3subtitles.de
Join, and help us!