WEBVTT 00:02:01.480 --> 00:02:03.700 Rachel Greenstadt: pressure on or from ISPs 00:02:03.700 --> 00:02:06.950 would make it difficult or impossible to run an exit relay 00:02:06.950 --> 00:02:11.500 however the third point is the one that I'm gonna mostly be talking about today: 00:02:11.500 --> 00:02:15.300 Tor is not very useful if you can't actually use it to get anywhere 00:02:15.300 --> 00:02:18.200 and there is an increasing number of prominent sites on the internet 00:02:18.200 --> 00:02:20.750 that are restricting what you can do through Tor 00:02:20.750 --> 00:02:24.220 and in some cases Tor is outright blocked 00:02:24.220 --> 00:02:29.310 and in other cases you're slowed down by CAPTCHAs and other ways 00:02:29.310 --> 00:02:33.799 of sort of making it annoying to visit 00:02:33.799 --> 00:02:35.660 so a brief overview of my talk 00:02:35.660 --> 00:02:37.970 I'm gonna give a little bit of background on Tor 00:02:37.970 --> 00:02:41.940 and discuss how it's being blocked by internet services today 00:02:41.940 --> 00:02:43.700 then I'm gonna talk about Wikipedia 00:02:43.700 --> 00:02:47.500 which is a service or a website, you may have heard of it 00:02:47.500 --> 00:02:51.019 [[ laughter ]] 00:02:51.019 --> 00:02:53.530 that makes it difficult to edit through Tor 00:02:53.530 --> 00:02:54.980 and I'm gonna talk about their relationship 00:02:54.980 --> 00:02:57.260 and then I'm gonna discuss some of the findings that we have 00:02:57.260 --> 00:03:02.640 from our interview study of Tor users and Wikipedians. 00:03:02.640 --> 00:03:05.390 So here are some examples of some things that you might see 00:03:05.390 --> 00:03:07.510 when you are browsing with Tor these days.
00:03:07.510 --> 00:03:12.620 Now, it's worth pointing out that a lot of these are not individual sites 00:03:12.620 --> 00:03:16.480 but rather content distribution networks, like Cloudflare and Akamai, 00:03:16.480 --> 00:03:20.170 or they're hosting providers like Bluehost, or anti-spam blocking plugins, 00:03:20.170 --> 00:03:25.530 that affect a huge swath of sites on the internet, not just one. 00:03:25.530 --> 00:03:27.220 There are some individual sites, 00:03:27.220 --> 00:03:31.340 say Yelp, that do their own blocking, 00:03:31.340 --> 00:03:35.090 but they tend to be somewhat important sites. 00:03:35.090 --> 00:03:37.040 So before I go any further 00:03:37.040 --> 00:03:40.500 I should probably disclose that I'm not exactly a neutral party here. 00:03:40.500 --> 00:03:41.980 I'm married to Roger Dingledine, 00:03:41.980 --> 00:03:44.630 who is one of the founders of the Tor Project. 00:03:44.630 --> 00:03:48.470 This work is part of a recent experiment of mine: doing research related to Tor 00:03:48.470 --> 00:03:50.400 while remaining happily married. 00:03:50.400 --> 00:03:52.660 So far, so good!
00:03:52.660 --> 00:03:56.819 Furthermore, this work uses qualitative ethnographic methods, 00:03:56.819 --> 00:04:01.430 which is a bit of a departure from the machine learning work that I usually do. 00:04:01.430 --> 00:04:04.900 Mitigating both of these factors is my wonderful co-author, Andrea Forte, 00:04:04.900 --> 00:04:06.919 who is trained in ethnographic methods 00:04:06.919 --> 00:04:09.500 and conducted all of the interviews that I'm going to talk to you about. 00:04:13.360 --> 00:04:17.789 So, when I was talking to Roger about this talk, he said 00:04:17.789 --> 00:04:20.430 most people at CCC will have heard of Tor by now, 00:04:20.430 --> 00:04:22.180 I think that's probably true, 00:04:22.180 --> 00:04:25.909 and they'll be aware that it hides something about you when you're browsing the Internet, 00:04:25.909 --> 00:04:32.280 but they might be a bit fuzzy on some of the details. So: very quick recap. 00:04:32.280 --> 00:04:35.680 When Alice starts up Tor, her client starts by fetching a list of relays 00:04:35.680 --> 00:04:36.680 from the directory server. 00:04:36.680 --> 00:04:43.680 Then, the Tor client is gonna pick a three-hop path to the destination server. 00:04:43.680 --> 00:04:46.840 Hop 1 is gonna know who you are but not where you're going. 00:04:46.840 --> 00:04:49.969 Then hop 3 knows where you're going but not who you are. 00:04:49.969 --> 00:04:52.280 Now, the link is encrypted from you to hop 3, 00:04:52.280 --> 00:04:55.210 and then hop 3, which is the exit relay, 00:04:55.210 --> 00:04:57.969 actually delivers your request to a website.
00:04:57.969 --> 00:05:02.280 Now this part is not encrypted by Tor, and as far as the website is concerned, 00:05:02.280 --> 00:05:07.440 the request is actually coming from a user at the exit relay. 00:05:07.440 --> 00:05:11.500 Usually, when Tor users get the blocking screens that I showed earlier, 00:05:11.500 --> 00:05:14.810 it's because the website is blocking the exit relay's IP address. 00:05:14.810 --> 00:05:18.190 This can happen either because the site is deliberately blocking Tor 00:05:18.190 --> 00:05:22.620 by downloading the directory and blocking all of the Tor exit IPs, 00:05:22.620 --> 00:05:24.680 or because someone did something unpleasant 00:05:24.680 --> 00:05:26.919 through that exit relay in the past 00:05:26.919 --> 00:05:30.230 and it was put on a blocklist incidentally. 00:05:32.510 --> 00:05:34.930 So there's been some research on this phenomenon, 00:05:34.930 --> 00:05:39.560 and here's some cutting-edge research that hasn't actually even been presented yet; 00:05:39.560 --> 00:05:43.500 it's going to be published at the NDSS conference in February 00:05:43.500 --> 00:05:46.310 by the people up here, 00:05:46.310 --> 00:05:50.430 and it's looking sort of quantitatively at how prevalent 00:05:50.430 --> 00:05:51.930 this blocking problem is. 00:05:51.930 --> 00:06:00.230 We found that of the top 1000 Alexa sites, 3.5% of them were actually blocked 00:06:00.230 --> 00:06:02.460 for Tor users.
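Deliberate blocking, as described above, amounts to pulling the relay directory and refusing any request whose source address belongs to an exit. The following is a minimal Python sketch of that idea. The directory excerpt is made up and only loosely modelled on the `r` (router) and `s` (flags) lines of a Tor consensus; the field positions, relay names and documentation-range addresses are assumptions chosen for illustration. The sketch also shows why the site can only act on the exit relay's address: that address is the only one it ever sees. Tor also publishes exit lists built from observed exit addresses, because an exit's outgoing address can differ from the one it advertises, so a real blocker would likely prefer those.

```python
import ipaddress

# Hypothetical, simplified directory excerpt (illustrative values only).
# "r" lines: r <nickname> <identity> <digest> <date> <time> <IP> <ORPort> <DirPort>
# "s" lines: the status flags of the relay described by the preceding "r" line.
SAMPLE_DIRECTORY = """\
r relayA AAAA BBBB 2016-12-01 00:00:00 192.0.2.10 9001 0
s Fast Guard Running Valid
r relayB CCCC DDDD 2016-12-01 00:00:00 198.51.100.7 443 0
s Exit Fast Running Valid
r relayC EEEE FFFF 2016-12-01 00:00:00 203.0.113.25 9001 9030
s Exit Running Valid
"""


def exit_addresses(directory_text: str) -> set[str]:
    """Collect the advertised addresses of relays that carry the Exit flag."""
    exits = set()
    current_ip = None
    for line in directory_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "r" and len(parts) >= 7:
            # Normalise the address so string comparison is reliable.
            current_ip = str(ipaddress.ip_address(parts[6]))
        elif parts[0] == "s" and current_ip is not None:
            if "Exit" in parts[1:]:
                exits.add(current_ip)
            current_ip = None
    return exits


def is_blocked(client_ip: str, blocklist: set[str]) -> bool:
    """The site only sees the exit relay's address, so that is all it can check."""
    return str(ipaddress.ip_address(client_ip)) in blocklist


blocklist = exit_addresses(SAMPLE_DIRECTORY)
print(sorted(blocklist))                      # ['198.51.100.7', '203.0.113.25']
print(is_blocked("198.51.100.7", blocklist))  # True
print(is_blocked("192.0.2.10", blocklist))    # False (guard-only relay)
```

Because the check is keyed only on the exit address, every user sharing that exit is treated the same way. This matches the talk's point that blocking hits whole exits rather than individual people.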
00:06:02.460 --> 00:06:06.990 You can see on this list on the right: most of the blocking is due to 00:06:06.990 --> 00:06:11.330 aggregate blockers like these hosting companies and CDNs. 00:06:11.330 --> 00:06:13.700 It's also the case that most of the sites 00:06:13.700 --> 00:06:16.810 didn't actually block 100% of the exit nodes. 00:06:16.810 --> 00:06:19.520 But the bigger the exit was bandwidth-wise, 00:06:19.520 --> 00:06:21.520 and thus the higher the probability of exiting through it, 00:06:21.520 --> 00:06:23.520 the more likely it was to be blocked. 00:06:23.520 --> 00:06:28.969 So this graph shows, for 2,000 blocked sites from OONI data, 00:06:28.969 --> 00:06:31.520 each exit node and how probable it was 00:06:31.520 --> 00:06:34.189 that that exit node would be blocked. 00:06:35.519 --> 00:06:39.440 So one website that blocks Tor users is Wikipedia. 00:06:39.440 --> 00:06:42.399 Now, Wikipedia doesn't actually block Tor users from reading Wikipedia, 00:06:42.399 --> 00:06:45.599 which is very useful because it's a resource that's important 00:06:45.599 --> 00:06:48.770 for lots of people to be able to reach, sometimes anonymously, 00:06:48.770 --> 00:06:51.140 but it does prevent them from editing. 00:06:51.140 --> 00:06:53.390 That's true even if they're logged in. 00:06:53.390 --> 00:06:57.190 So according to Wikipedia, Wikipedia is a free-access, 00:06:57.190 --> 00:07:00.020 free-content Internet encyclopedia supported and hosted by the 00:07:00.020 --> 00:07:02.789 non-profit Wikimedia Foundation. 00:07:02.789 --> 00:07:05.839 Those who can access this site can edit most of its articles, 00:07:05.839 --> 00:07:08.399 and Wikipedia is ranked among the ten most popular websites 00:07:08.399 --> 00:07:12.809 and constitutes the Internet's largest and most popular general reference work. 00:07:12.809 --> 00:07:18.559 So right now, y'know, from our vantage point eight years... 00:07:18.799 --> 00:07:22.820 since this quote in 2007 in probably about...
00:07:22.820 --> 00:07:28.010 I'm not actually sure when Wikipedia was founded, but some years after, 00:07:28.010 --> 00:07:31.959 it's hard to realize what a radical idea Wikipedia once was: 00:07:31.959 --> 00:07:35.950 this encyclopedia that can be edited by, well, almost anyone. 00:07:35.950 --> 00:07:37.839 In 2007 the New York Times said: 00:07:37.839 --> 00:07:40.830 "The problem with Wikipedia is that it only works in practice. 00:07:40.830 --> 00:07:43.839 In theory, it can never work." 00:07:46.039 --> 00:07:49.149 It's some sort of miracle that Wikipedia manages to be 00:07:49.149 --> 00:07:51.820 the resource it is, and it's the sort of thing that researchers 00:07:51.820 --> 00:07:54.190 and economists have tried to explain, 00:07:54.190 --> 00:07:56.209 and they've tried to explain it in the same way they explain 00:07:56.209 --> 00:07:58.240 the Linux kernel: 00:08:01.780 --> 00:08:04.950 this thing happens and nobody quite knows why, 00:08:04.950 --> 00:08:09.310 and it makes Wikipedians today a little nervous, and perhaps conservative, 00:08:09.310 --> 00:08:13.890 about anything that could rock the boat and affect the quality of the encyclopedia. 00:08:14.680 --> 00:08:18.310 But the fact is that Wikipedia needs its contributors to continue to 00:08:18.310 --> 00:08:20.700 update, expand and improve the resource. 00:08:20.700 --> 00:08:26.640 Wikipedia contributions peaked in 2007 and have been in a slow and steady decline, 00:08:26.640 --> 00:08:32.929 so this graph above shows the number of active registered editors 00:08:32.929 --> 00:08:37.159 who've made more than 5 edits per month, plotted over time, 00:08:37.159 --> 00:08:40.949 and you can see this peak that happens in 2007. 00:08:42.399 --> 00:08:45.190 The reasons behind this decline are actually an active area of research 00:08:45.190 --> 00:08:51.250 and an area of concern for the Wikimedia Foundation and so on. 00:08:51.250 --> 00:08:54.880 The upshot of it is that Wikipedia can't
exactly afford to 00:08:54.880 --> 00:08:56.820 just throw away good editors. 00:08:57.690 --> 00:09:00.200 Aside from the general decline in participation, 00:09:00.200 --> 00:09:04.160 there's Wikipedia's demographic imbalance. 00:09:04.160 --> 00:09:06.430 Wikipedia editors are 84-91% male, 00:09:06.430 --> 00:09:08.510 depending on how you count, 00:09:08.510 --> 00:09:10.510 and there is also a lot of under-representation 00:09:10.510 --> 00:09:12.709 from Global South countries, 00:09:12.709 --> 00:09:16.019 and there's been a little bit of research to show how this affects the quality 00:09:16.019 --> 00:09:17.649 of the encyclopedia. 00:09:17.649 --> 00:09:19.840 There's a group of researchers from the GroupLens research group at 00:09:19.840 --> 00:09:24.479 the University of Minnesota, and they were interested in this question. 00:09:24.479 --> 00:09:28.589 They had access to a database of movie ratings and the gender of the raters, 00:09:28.589 --> 00:09:31.899 so they compared the length of articles about movies that were 00:09:31.899 --> 00:09:36.070 disproportionately rated by men or women while controlling for the popularity 00:09:36.070 --> 00:09:37.720 and the rating of the movie, 00:09:37.720 --> 00:09:40.899 and in this case they showed that male-skewing movies 00:09:40.899 --> 00:09:45.420 had articles that were much longer than articles about female-skewing movies, 00:09:45.420 --> 00:09:49.779 independent of these popularity and rating effects. 00:09:49.779 --> 00:09:53.760 Now, maybe articles about movies are kind of a trivial thing, 00:09:53.760 --> 00:09:59.959 but it kind of shows you that the editor population affects content, including in article categories 00:09:59.959 --> 00:10:04.180 that might be harder to measure in such a rigorous way.
00:10:04.180 --> 00:10:07.740 It made us wonder how the absence of Tor user editors 00:10:07.740 --> 00:10:09.579 affects the quality of the encyclopedia, 00:10:09.579 --> 00:10:13.160 and if there's a similar skew that you might be able to see. 00:10:16.650 --> 00:10:19.610 To help understand and answer this question, it's worth asking 00:10:19.610 --> 00:10:22.760 what a Wikipedian would get out of using Tor. 00:10:22.760 --> 00:10:26.060 This question actually has people kind of confused, because 00:10:26.060 --> 00:10:31.659 a lot of people see Tor as a tool that you use to hide who you are from a website, 00:10:32.809 --> 00:10:35.170 and basically no one at Wikipedia is at all interested 00:10:35.170 --> 00:10:38.660 in letting Tor users edit Wikipedia without logging in. 00:10:38.660 --> 00:10:42.440 However, Tor provides some benefits to users even when they're logged in 00:10:42.440 --> 00:10:45.210 and thus not hiding from Wikipedia. 00:10:45.210 --> 00:10:48.840 In particular, it protects against certain surveillance by your local ISP 00:10:48.840 --> 00:10:54.100 or administrative domain, and it can also protect against government surveillance. 00:10:54.100 --> 00:10:56.830 Furthermore, it prevents your IP address from being stored 00:10:56.830 --> 00:11:02.220 in the Wikipedia database of user IPs that can be accessed by administrators 00:11:02.220 --> 00:11:04.470 and attackers. 00:11:04.470 --> 00:11:08.570 We've all seen plenty of cases where attackers get access 00:11:08.570 --> 00:11:11.130 to databases they're not supposed to. 00:11:12.250 --> 00:11:18.240 Another property that is probably easier to think about is reachability. 00:11:18.240 --> 00:11:22.130 Internet connections could be censored, and Tor might be the only method of 00:11:22.130 --> 00:11:24.560 actually accessing Wikipedia.
00:11:24.560 --> 00:11:28.250 And lastly, a lot of Tor users use Tor for all of their Internet use 00:11:28.250 --> 00:11:32.730 as a mechanism to diversify the user base and provide cover for and solidarity with 00:11:32.730 --> 00:11:36.880 users that might need Tor for a different purpose. 00:11:38.630 --> 00:11:44.900 So participation in Internet projects and open source projects can be dangerous. 00:11:44.900 --> 00:11:47.530 Consider the case of Bassel Khartabil, 00:11:47.530 --> 00:11:50.130 who's a well-known Wikipedia editor, open source software developer 00:11:50.130 --> 00:11:53.260 and the founder of Creative Commons Syria. 00:11:53.260 --> 00:11:58.620 He was jailed for three years and he's now disappeared; a lot of people think he's dead. 00:11:58.620 --> 00:12:02.230 He's very well known for having founded the New Palmyra project, 00:12:02.230 --> 00:12:06.560 which uses satellite and high-resolution imagery to create open 3D models 00:12:06.560 --> 00:12:07.820 of ancient structures. 00:12:07.820 --> 00:12:12.320 Now, these structures were razed by Daesh, sometimes called ISIS, some time in 2015, 00:12:12.320 --> 00:12:17.050 and so this work that he's done is our best record of these structures 00:12:17.050 --> 00:12:18.720 that now exists. 00:12:20.750 --> 00:12:26.360 In another case, Jimmy Wales announced in 2015 that the Wikipedian of the year could 00:12:26.360 --> 00:12:31.540 not be revealed publicly, because to do so would actually put the person in danger. 00:12:31.540 --> 00:12:34.890 So the Wikimedia Foundation is also aware that there are some cases 00:12:34.890 --> 00:12:38.620 where editors need privacy. 00:12:39.180 --> 00:12:43.400 So then, with all these risks that Wikipedians face, and the benefits 00:12:43.400 --> 00:12:45.840 that Tor can provide, why would it be blocked? 00:12:45.840 --> 00:12:48.570 Well, it comes down to abuse. 00:12:48.570 --> 00:12:51.750 The problem of jerks is a real problem on the Internet.
00:12:51.750 --> 00:12:55.440 Though the research is somewhat ambiguous as to the degree to which it's actually 00:12:55.440 --> 00:12:56.660 made worse by anonymity, 00:12:56.660 --> 00:13:02.230 there's this very popular theory on the Internet that if you take a normal person 00:13:02.230 --> 00:13:07.110 and add anonymity and an audience, they become a total dickwad. 00:13:07.210 --> 00:13:11.110 Nonetheless, managing abuse is actually somewhat harder 00:13:11.110 --> 00:13:14.250 with anonymous participants, and there's certainly this perception that 00:13:14.250 --> 00:13:19.000 anonymity can make people more prone to abusive behavior. 00:13:22.130 --> 00:13:25.040 Fortunately, the cryptographic research community has studied 00:13:25.040 --> 00:13:27.600 how to reconcile anonymity and blacklisting of users 00:13:27.600 --> 00:13:30.880 and has found some pretty promising solutions. 00:13:30.880 --> 00:13:35.670 The first, which I'll discuss briefly here, is Apu Kapadia's Nymble design. 00:13:35.670 --> 00:13:40.040 There have been many variants of this, including Nymbler, ?Jackbenable?, Jack, 00:13:40.040 --> 00:13:42.120 you get the idea. 00:13:42.120 --> 00:13:46.840 Basically, when Alice wants to contribute anonymously to a website or a project, 00:13:46.840 --> 00:13:49.970 she uses a pseudonym server to get a pseudonym. 00:13:49.970 --> 00:13:53.550 Then she gives that 'nym to a nym-manager, 00:13:53.550 --> 00:13:55.779 and that nym-manager gives her a ticket. 00:13:55.779 --> 00:13:59.450 That ticket is then used to connect to the site she wants to participate on, 00:13:59.450 --> 00:14:03.069 so it's another way to sort of distribute the trust. 00:14:03.339 --> 00:14:07.340 But our Alice is a jerk, so she vandalizes the website. 00:14:07.430 --> 00:14:10.760 The website then complains to the Nymble manager, which will then send the server 00:14:10.760 --> 00:14:14.089 a token that can be used to link that user in the future.
00:14:14.089 --> 00:14:16.980 The server then adds the user to a blacklist. 00:14:18.740 --> 00:14:21.720 So basically the way that this works is that everything the user has done 00:14:21.720 --> 00:14:24.820 before the complaint still remains anonymous forever, 00:14:24.820 --> 00:14:28.170 but everything that they do in the future is linkable, 00:14:28.170 --> 00:14:31.290 and thus it becomes easier to block them. 00:14:32.200 --> 00:14:37.090 There has basically been no adoption of this kind of protocol, 00:14:37.090 --> 00:14:40.160 despite a lot of iterations in the literature. 00:14:40.320 --> 00:14:42.560 There are some reasons for this: 00:14:42.560 --> 00:14:45.380 many of the variants have no implementation, and those that do 00:14:45.380 --> 00:14:48.050 have research code, and as the author of some research code... 00:14:48.050 --> 00:14:50.949 I can tell you that there would be significant work involved in 00:14:50.949 --> 00:14:53.140 actually adopting these measures. 00:14:53.140 --> 00:14:56.380 And there is a price to be paid. You have to pick between either having 00:14:56.380 --> 00:15:00.480 a semi-trusted third party, degraded notions of privacy, 00:15:00.480 --> 00:15:02.950 so basically pseudonymity rather than anonymity, 00:15:02.950 --> 00:15:05.240 or high computational overhead, 00:15:05.240 --> 00:15:08.160 because zero-knowledge proofs are still kind of expensive. 00:15:08.160 --> 00:15:11.960 But it could well be done, and it's not like you need all of these things, 00:15:11.960 --> 00:15:13.360 you only need one, 00:15:13.360 --> 00:15:17.870 but ultimately it isn't being done, and I think this is because most sites 00:15:17.870 --> 00:15:23.060 don't really care. They believe that the number of non-jerks might not be zero, 00:15:23.060 --> 00:15:28.350 but it's approximately zero, and it's just not worth the bother.
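The property described above comes from a one-way hash chain in the Nymble design: everything before a complaint stays anonymous, while everything afterwards becomes linkable. Below is a toy Python sketch of just that chain. It assumes SHA-256 with different prefixes as the two one-way functions, called `f` here for evolving a per-period seed and `g` for deriving the value the website sees. Every name is hypothetical. The sketch leaves out the pseudonym server, ticket authentication and encryption, linkability windows and everything else the real protocol needs, so it illustrates the idea rather than reproducing the published construction.

```python
import hashlib
import hmac

# Toy model of Nymble-style forward linkability (hypothetical names throughout).
# f and g stand in for one-way functions: SHA-256 with domain-separation prefixes.


def f(seed: bytes) -> bytes:
    """Evolve a seed to the next time period; one-way, so it cannot be reversed."""
    return hashlib.sha256(b"evolve|" + seed).digest()


def g(seed: bytes) -> bytes:
    """Derive the per-period value (the "nymble") that the website sees."""
    return hashlib.sha256(b"nymble|" + seed).digest()


def initial_seed(manager_key: bytes, pseudonym: bytes, server_id: bytes) -> bytes:
    """The manager derives a per-user, per-server starting seed with a keyed MAC."""
    return hmac.new(manager_key, pseudonym + b"|" + server_id, hashlib.sha256).digest()


def nymbles(seed0: bytes, periods: int) -> list[bytes]:
    """Values for periods 0..periods-1: nymble_t = g(seed_t), seed_(t+1) = f(seed_t)."""
    out, seed = [], seed0
    for _ in range(periods):
        out.append(g(seed))
        seed = f(seed)
    return out


def linking_token(seed0: bytes, t: int) -> bytes:
    """After a complaint about period t, the manager hands over seed_t, not seed_0."""
    seed = seed0
    for _ in range(t):
        seed = f(seed)
    return seed


def blacklist_from(token: bytes, t: int, periods: int) -> set[bytes]:
    """The website can recompute nymbles for period t onward, but not earlier ones."""
    result, seed = set(), token
    for _ in range(t, periods):
        result.add(g(seed))
        seed = f(seed)
    return result


# Alice gets one nymble per time period and vandalizes the site in period 4.
seed0 = initial_seed(b"manager-secret", b"alice-pseudonym", b"wiki.example")
alice = nymbles(seed0, periods=10)
blacklist = blacklist_from(linking_token(seed0, t=4), t=4, periods=10)

print([value in blacklist for value in alice])
# [False, False, False, False, True, True, True, True, True, True]
```

Because `f` is one-way, handing over `seed_4` lets the site compute `seed_5`, `seed_6` and so on. It gives the site no way back to `seed_0` through `seed_3`, which is why Alice's earlier edits stay unlinkable.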
00:15:29.600 --> 00:15:33.680 So we're interested in measuring this value of anonymous participation 00:15:33.680 --> 00:15:37.740 to sort of provide motivation for sites to actually try and solve these problems. 00:15:37.990 --> 00:15:42.120 It's not a terribly easy thing to do, because Tor is blocked so often 00:15:42.120 --> 00:15:45.050 we're actually trying to measure participation that doesn't happen, 00:15:45.050 --> 00:15:47.490 that might happen under alternate circumstances. 00:15:47.490 --> 00:15:51.300 To ask this question we turned to qualitative methods, which is 00:15:51.300 --> 00:15:53.020 basically an interview study. 00:15:53.020 --> 00:15:56.429 We talked to Tor users who participate in open collaboration, and we talked to 00:15:56.429 --> 00:15:58.990 Wikipedia editors about their privacy concerns. 00:16:01.510 --> 00:16:03.649 So we have two basic research questions: 00:16:03.649 --> 00:16:05.839 first, what kind of threats do contributors 00:16:05.839 --> 00:16:09.899 to open collaboration projects perceive, and second: 00:16:09.899 --> 00:16:13.850 how do people who contribute to open collaboration projects manage the risk? 00:16:13.850 --> 00:16:16.990 The goal here is to get the kind of in-depth and qualitative 00:16:16.990 --> 00:16:19.490 understanding that will help us to ask the right questions 00:16:19.490 --> 00:16:23.000 in a larger scale study, and ensure that we're solving the right problems 00:16:23.000 --> 00:16:28.069 when we design systems to facilitate anonymous participation in online projects 00:16:29.219 --> 00:16:30.970 As ?Cera McDonald? Pikelet said: 00:16:30.970 --> 00:16:36.470 "They're not anecdotes, that's small batch artisanal data..." 00:16:38.320 --> 00:16:42.730 So a little bit about our 23 participants in our study 00:16:42.730 --> 00:16:45.339 We had 12 participants that were Tor users 00:16:45.339 --> 00:16:50.640 8 males, 3 females and 1 of fluid gender. 
00:16:50.640 --> 00:16:55.410 The minimum age was 18, the maximum age was 41 and the average was 30. 00:16:55.410 --> 00:17:01.020 3 people with a high school education, 4 current or graduated undergraduates 00:17:01.020 --> 00:17:07.048 and 5 people with post-graduate degrees or who were graduate students. 00:17:08.398 --> 00:17:13.279 Location: 7 of the participants were from the U.S., but we also had 00:17:13.279 --> 00:17:18.699 participants from Australia, Belgium, Canada, South Africa and Sweden. 00:17:18.959 --> 00:17:26.169 For the Wikimedia participants, we again had 8 males and 3 females. 00:17:26.169 --> 00:17:31.649 Actually I think the demographics of Tor and Wikimedia might not be too different. 00:17:31.649 --> 00:17:37.159 The minimum age was 20 and the max was 53; again the average was 30. 00:17:37.159 --> 00:17:42.360 One didn't report their education level; we had 8 people with bachelor's degrees 00:17:42.360 --> 00:17:47.330 or undergraduate students, and 2 graduate students or people with graduate degrees. 00:17:47.330 --> 00:17:51.620 Again we had 5 participants from the U.S., but we also had participants from 00:17:51.620 --> 00:17:56.309 Australia, France, Ghana, Israel and the U.K. in this case. 00:17:56.309 --> 00:18:00.740 So we didn't have - a lot of people talked to us - we didn't have any participants 00:18:00.740 --> 00:18:05.559 from places like Iran or China, though we did have some Iranians who were 00:18:05.559 --> 00:18:08.520 living in the U.S. who talked to us.
00:18:08.520 --> 00:18:12.230 So, types of participation. 00:18:12.230 --> 00:18:15.489 Obviously we had Wikipedians; we sought them out. 00:18:15.489 --> 00:18:18.440 A number of the people that we talked to, especially the Tor users, 00:18:18.440 --> 00:18:21.310 actually contribute to the Tor project in some way, 00:18:21.310 --> 00:18:24.559 but we asked people about their other participation on the Internet, 00:18:24.559 --> 00:18:28.300 especially Tor users, and we found that there are a lot of people that participate 00:18:28.300 --> 00:18:34.000 through adding web comments, participating on forums, using Twitter... 00:18:34.000 --> 00:18:37.740 contributing open source code to projects on GitHub or SourceForge 00:18:37.740 --> 00:18:40.850 or other projects on the Internet, helping with the Internet Archive 00:18:40.850 --> 00:18:46.100 or contributing to image boards... to sites that do that. 00:18:46.100 --> 00:18:50.120 So, our interview protocol: we gave 20 dollars in compensation, 00:18:50.120 --> 00:18:51.480 gift cards or cash. 00:18:51.480 --> 00:18:58.200 30% of people declined this because we would need to register their participation 00:18:58.200 --> 00:19:02.809 if we gave them compensation, and some people didn't want there to be 00:19:02.809 --> 00:19:03.980 as much of a record. 00:19:03.980 --> 00:19:07.509 We spoke to people over the phone, using Skype, using 00:19:07.509 --> 00:19:11.809 various encrypted audio mechanisms; one person was interviewed face to face. 00:19:11.809 --> 00:19:14.669 The interviews were again conducted by Andrea Forte, 00:19:14.669 --> 00:19:19.260 and we asked people to tell in-depth stories and prompted them for detail. 00:19:19.690 --> 00:19:23.630 Our analysis of this is ongoing; it's not done. 00:19:24.310 --> 00:19:28.319 We've transcribed all the interviews, we've coded them to identify the themes, 00:19:28.319 --> 00:19:30.480 and we grouped and merged some of these themes.
00:19:30.480 --> 00:19:34.009 I'm going to talk to you about some of the stuff that came out of this study, 00:19:34.009 --> 00:19:37.009 give some quotes and things like that. 00:19:37.579 --> 00:19:38.520 Interview topics. 00:19:38.520 --> 00:19:42.299 For Tor users, we asked them to explain Tor and what it's for. We asked for some 00:19:42.299 --> 00:19:44.879 current and retrospective examples of use, 00:19:44.879 --> 00:19:48.169 the story of how and why they first started using Tor, 00:19:48.169 --> 00:19:52.139 and some examples of when they use Tor online and when they don't, 00:19:52.139 --> 00:19:55.489 and some questions about their participation in online projects, 00:19:55.489 --> 00:19:59.480 and if they participated in Wikipedia we asked them some of the Wikipedia questions; 00:19:59.480 --> 00:20:02.249 similarly with Wikipedia people who had used Tor. 00:20:02.249 --> 00:20:05.560 And there was some considerable overlap. 00:20:06.590 --> 00:20:09.640 For Wikipedians we asked how and why they started editing, 00:20:09.640 --> 00:20:12.289 examples of privacy concerns associated with their editing, 00:20:12.289 --> 00:20:15.169 steps they may have taken to protect their privacy when editing, 00:20:15.169 --> 00:20:18.450 and examples of interactions with other editors. 00:20:18.820 --> 00:20:24.170 Now, there are some real limitations with this work: 00:20:24.450 --> 00:20:28.210 we may be missing participants with severe privacy concerns. 00:20:28.940 --> 00:20:32.519 Anybody who participated in this would have had to talk to unknown parties, 00:20:32.519 --> 00:20:36.700 without necessarily being able to trust that we were not going to do 00:20:36.700 --> 00:20:40.199 anything nefarious with their interview.
00:20:40.279 --> 00:20:43.769 They needed to speak remotely over a communications channel in most cases. 00:20:43.769 --> 00:20:48.909 We were willing to conduct some interviews over various encrypted channels, 00:20:48.909 --> 00:20:51.950 such as Jitsi, or really whatever people wanted us to do, 00:20:51.950 --> 00:20:53.519 as long as we could set it up. 00:20:53.519 --> 00:20:56.500 Though we did mention Skype in our recruitment materials, 00:20:56.500 --> 00:20:59.899 and this actually caused a bit of a kerfuffle on the Tor blog, 00:20:59.899 --> 00:21:04.700 with people saying we clearly don't understand Tor 00:21:04.700 --> 00:21:08.399 and have no familiarity with the project if we're even thinking of using Skype. 00:21:08.399 --> 00:21:14.099 I know a couple of Tor users and Tor developers that use Skype, so... 00:21:14.179 --> 00:21:17.809 but, y'know, we were willing to use other things, 00:21:17.809 --> 00:21:20.700 and again, we didn't talk to residents of Iran or China, 00:21:20.700 --> 00:21:25.319 which is something that a lot of people told us might be of interest. 00:21:25.319 --> 00:21:28.459 So, what does anonymity actually mean to a 00:21:28.459 --> 00:21:32.040 Wikipedian? That was an interesting question, because it doesn't mean the same thing 00:21:32.040 --> 00:21:36.999 that it usually means to a Tor user. A lot of times when people talk about 00:21:36.999 --> 00:21:40.440 anonymous edits in Wikipedia, they mean editing without logging in. 00:21:40.440 --> 00:21:45.649 And this is actually called IP editing by Wikipedians, because what happens when you 00:21:45.649 --> 00:21:50.820 edit Wikipedia without logging in is that your IP address is actually published 00:21:50.820 --> 00:21:53.409 as the author of that edit. 00:21:53.409 --> 00:21:57.450 The other thing that people mean when they talk about editing anonymously is 00:21:57.450 --> 00:22:01.399 editing under a pseudonymous account while not leaving clues about your identity.
00:22:03.300 --> 00:22:06.250 The notion of IP editing is somewhat problematic. 00:22:06.500 --> 00:22:10.289 This was an article from BuzzFeed about 00:22:10.289 --> 00:22:15.879 the 33 most embarrassing congressional edits to members' Wikipedia pages. 00:22:15.879 --> 00:22:20.960 The congressional offices in the U.S. all share one IP address, 00:22:20.960 --> 00:22:24.200 so you can simply search Wikipedia for that IP address 00:22:24.200 --> 00:22:26.980 and you can find people making revisions, 00:22:26.980 --> 00:22:32.379 for example to the Liberty Caucus Wikipedia page and so on. 00:22:34.259 --> 00:22:39.659 So in terms of content-based anonymity, according to the Wikipedians we talked to, 00:22:39.659 --> 00:22:42.490 most deanonymization is actually done through contextual clues. 00:22:42.490 --> 00:22:45.779 When people are outed as being this pseudonymous Wikipedia person, 00:22:45.779 --> 00:22:48.229 it's usually because somebody looked things up. 00:22:48.229 --> 00:22:49.960 There was a quote; someone said: 00:22:49.960 --> 00:22:53.590 "These are small things, but I usually wouldn't edit things relating to my school 00:22:53.590 --> 00:22:55.909 or places near where I lived when I was logged in. 00:22:55.909 --> 00:22:58.720 It's actually weirdly easy to piece together someone's identity 00:22:58.720 --> 00:23:01.220 based on the location or things like that." 00:23:01.220 --> 00:23:04.279 So with Tor, it's worth pointing out the limits of what Tor can do. 00:23:04.279 --> 00:23:07.920 Tor is not gonna help with this particular problem: 00:23:07.920 --> 00:23:09.320 it will hide your IP address, 00:23:09.320 --> 00:23:13.850 but not necessarily this. 00:23:16.310 --> 00:23:19.070 What is the Wikipedia policy on Tor?
00:23:19.070 --> 00:23:23.590 MediaWiki has a TorBlock extension, which automatically blocks editing through Tor. 00:23:23.590 --> 00:23:27.570 Now, it's possible to actually get an exemption, 00:23:27.570 --> 00:23:31.970 what is called an IP block exemption, and registered users in good standing 00:23:31.970 --> 00:23:33.559 can ask for one. 00:23:33.559 --> 00:23:36.789 The problem is, it's a little bit hard to establish that standing: 00:23:36.789 --> 00:23:41.249 it requires editing without using Tor. 00:23:41.739 --> 00:23:49.159 It's been pointed out that this is particularly problematic for censored users, 00:23:49.159 --> 00:23:52.279 because they can't access Wikipedia to edit in the first place, 00:23:52.279 --> 00:23:56.720 and although they do provide some closed proxies for Chinese users in particular, 00:23:56.720 --> 00:24:00.309 there are a lot of censored users that aren't Chinese, but... 00:24:00.309 --> 00:24:04.499 you can contact them to ask to use their sort of secret proxies. 00:24:04.499 --> 00:24:06.909 I don't know how well this actually works. 00:24:06.909 --> 00:24:11.700 But we did ask our interviewees: can Wikipedia be edited through Tor? 00:24:11.700 --> 00:24:15.649 Which is an interesting question. So, as a convention for the rest of the talk, 00:24:15.649 --> 00:24:19.109 when you see these blue boxes, they are gonna be quotes from Wikipedians; 00:24:19.109 --> 00:24:22.009 when you see the green boxes, they're quotes from Tor users. 00:24:22.009 --> 00:24:27.400 When we asked people, the Wikipedians often said: if the account exists, 00:24:27.400 --> 00:24:31.019 yes; when you're doing an anonymous edit with Tor, it's really difficult. 00:24:31.969 --> 00:24:34.450 They mean an IP edit there. And then he said: 00:24:34.450 --> 00:24:36.469 I had one that came through the mailing list 00:24:36.469 --> 00:24:39.289 in the last couple of weeks, and that their employer had been 00:24:39.289 --> 00:24:41.700 checking up on them... we allowed that.
00:24:41.700 --> 00:24:45.349 So as an administrator I have a user bot that allows me to get around that, 00:24:45.349 --> 00:24:49.459 but as well as feeling bad about that, other people don't have that option. 00:24:50.759 --> 00:24:55.440 From a Tor user, someone actually said: but sometimes, like every so many exit nodes, 00:24:55.440 --> 00:24:57.999 sometimes one works... so many sites block Tor, 00:24:57.999 --> 00:25:01.259 or try to block it; it's quite annoying when you're trying to do something. 00:25:01.259 --> 00:25:05.969 So this person sort of saw that, in the attempts at blocking Tor, 00:25:05.969 --> 00:25:09.419 not every exit node is blocked, so if you're really determined to make that 00:25:09.419 --> 00:25:15.389 anonymous edit, you can just keep clicking 'New Identity' and get there. 00:25:16.359 --> 00:25:20.130 And then they said: we do sometimes let people edit through them; 00:25:20.130 --> 00:25:23.139 I know we have users in China coming through the Great Firewall 00:25:23.139 --> 00:25:25.139 and stuff like that. 00:25:25.249 --> 00:25:29.179 So then ... [[ audio cuts out for 4 seconds ]] 00:25:29.179 --> 00:25:35.820 Tor user, y'know, well they... [[ audio cuts out for 16 seconds ]] 00:25:35.820 --> 00:25:55.070 [[ audio cuts out for 16 seconds ]] 00:25:55.070 --> 00:25:59.670 [[ 5 seconds audio cut remaining ]] 00:25:59.670 --> 00:26:01.099 ...things like that. 00:26:01.099 --> 00:26:04.340 So because you can change your IP address with the click of a button, 00:26:04.340 --> 00:26:07.910 it's very difficult to prevent abuse. 00:26:09.110 --> 00:26:14.189 There's this sort of notion that maybe it's important for vandalism, 00:26:14.189 --> 00:26:17.789 but maybe that's a problem, and maybe there is something that can be done. 00:26:17.789 --> 00:26:20.799 So then, a lot of what we asked people about was sort of the threats 00:26:20.799 --> 00:26:23.779 that they were concerned about, from a data privacy perspective.
00:26:23.779 --> 00:26:27.899 People talked about government threats, businesses, organized crime, 00:26:27.899 --> 00:26:32.579 private citizens, other project members, and project outsiders. 00:26:32.759 --> 00:26:38.179 When we grouped the threats, we found five or so big threats 00:26:38.179 --> 00:26:41.940 that lots of people talked about. We had twelve different instances of 00:26:41.940 --> 00:26:45.389 people talking about surveillance concerns or general concerns about 00:26:45.389 --> 00:26:47.739 the loss of privacy. 00:26:47.739 --> 00:26:50.969 Ten people talked specifically about the loss of employment 00:26:50.969 --> 00:26:55.979 or economic opportunity that might happen; nine people talked about bullying, 00:26:55.979 --> 00:26:59.700 harassment, intimidation, stalking, this sort of thing. 00:26:59.760 --> 00:27:04.429 Another nine people talked about personal safety, or the safety of their loved ones. 00:27:04.429 --> 00:27:10.100 Six people that we talked to talked about reputation loss. 00:27:10.100 --> 00:27:12.909 I'll get into these in more detail. 00:27:13.309 --> 00:27:14.679 Surveillance. 00:27:14.679 --> 00:27:18.090 "Y'know, in my country there is basically unknown surveillance going on 00:27:18.090 --> 00:27:21.369 and I don't know what providers to use, and at some point I decided to 00:27:21.369 --> 00:27:22.619 use Tor for everything." 00:27:22.619 --> 00:27:25.919 It's worth pointing out, given the list of countries I gave, that 00:27:25.919 --> 00:27:30.850 this isn't necessarily the list and... I think you wouldn't get this kind of 00:27:30.850 --> 00:27:36.320 quote, maybe, before the Snowden revelations about generalized surveillance 00:27:36.320 --> 00:27:38.029 across the world.
00:27:38.029 --> 00:27:41.160 A lot of people talked about how their online activities were 00:27:41.160 --> 00:27:45.140 being accessed or logged without their consent, and especially among 00:27:45.140 --> 00:27:47.669 Tor users there was this notion of wanting to be 00:27:47.669 --> 00:27:51.189 public by effort, but private by default. 00:27:51.319 --> 00:27:57.049 And when we talked to Wikipedians, they talked about their edit histories and how 00:27:57.049 --> 00:28:01.299 the edit histories themselves might be somewhat sensitive. 00:28:03.809 --> 00:28:06.799 In terms of loss of employment... 00:28:06.799 --> 00:28:13.049 many, many employers now look at your online footprint before they hire you. 00:28:13.049 --> 00:28:16.719 According to Monster, one of the big employment websites, 00:28:16.719 --> 00:28:20.730 77% of employers google prospective employees. 00:28:22.180 --> 00:28:26.810 From a Tor user, we had someone talk about "I am transgender, I am queer, my boss 00:28:26.810 --> 00:28:30.369 would rant for hours about this kind of person, that kind of person, the other 00:28:30.369 --> 00:28:34.179 kind of person, all of which I happen to be... and I decided if I was going to do 00:28:34.179 --> 00:28:37.829 anything online at all, I'd better look into options for protecting myself, because 00:28:37.829 --> 00:28:40.179 I didn't want to get fired." 00:28:40.179 --> 00:28:44.529 In Wikipedia, someone said: "A friend of mine was also involved in this discussion 00:28:44.529 --> 00:28:47.910 and he actually got it worse than I did. He's in a position now where 00:28:47.910 --> 00:28:52.110 anyone who googles him finds allegations that he is this awful monster, and 00:28:52.110 --> 00:28:55.369 he's terrified of having to look for work now because you google him, 00:28:55.369 --> 00:28:57.379 and that's what you find." 00:28:57.379 --> 00:29:01.750 So these things can have a real impact on people. So...
00:29:01.790 --> 00:29:05.989 And then there is harassment. So this is a quote from a Wikipedian who said: 00:29:05.989 --> 00:29:10.239 "I would say that the fear of harassment of real, of stalking and things like that 00:29:10.239 --> 00:29:13.539 is quite substantial, at least among administrators I know, 00:29:13.539 --> 00:29:15.309 especially women." 00:29:15.309 --> 00:29:18.519 From a Tor user there was someone who talked about "this is a map 00:29:18.519 --> 00:29:21.989 of active hate groups in the United States" 00:29:21.989 --> 00:29:25.609 and how they had experienced problems with these hate groups in the past 00:29:25.609 --> 00:29:29.519 and they wanted to see who was active in their area, and they would 00:29:29.519 --> 00:29:33.320 go to the websites of these hate groups, and for obvious reasons 00:29:33.320 --> 00:29:37.549 they didn't want their home IP address to appear in the logs of these 00:29:37.549 --> 00:29:40.179 hate group websites. 00:29:42.889 --> 00:29:46.759 Safety of loved ones, also personal safety. 00:29:47.179 --> 00:29:51.499 A lot of people talked about, y'know, real, concrete, not just threats but 00:29:51.499 --> 00:29:54.779 things that had happened to them or to people that they knew. 00:29:54.779 --> 00:29:59.129 From the Tor side there is this story: they burst his door down and 00:29:59.129 --> 00:30:02.149 they beat the ever-living crap out of him. He was hospitalized 00:30:02.149 --> 00:30:05.850 for two and a half weeks, and they told him: "if you and your family wanna live, 00:30:05.850 --> 00:30:07.840 you're gonna have to stop causing trouble" 00:30:07.840 --> 00:30:09.570 and they said that to him in Farsi. 00:30:09.570 --> 00:30:12.750 I have a family, so after I visited him in the hospital, I started...
00:30:12.750 --> 00:30:15.909 well, at first I started shaking, and I went into a cold sweat, 00:30:15.909 --> 00:30:20.019 and then I realized I have to start taking my human rights activities 00:30:20.019 --> 00:30:22.459 into other identities through the Tor network. 00:30:22.869 --> 00:30:24.659 And on the Wikipedia side: 00:30:24.659 --> 00:30:28.229 "I pulled back from some of that Wikipedia work when I could no longer hide 00:30:28.229 --> 00:30:32.179 in quite the same way. For a long time I lived on my own, so it was just my own 00:30:32.179 --> 00:30:36.049 personal risk I was taking with things; now my wife lives here as well 00:30:36.049 --> 00:30:37.699 and I can't take that same risk." 00:30:41.329 --> 00:30:45.619 Lastly, people were concerned about reputation loss. 00:30:45.619 --> 00:30:52.179 In Wikipedia, edit wars have been known to escalate into vendettas; 00:30:52.179 --> 00:30:55.879 here's an example of an edit war where, y'know, some user says: 00:30:55.879 --> 00:31:03.779 "I hate big bitch Alison," and is then blocked indefinitely by Alison. 00:31:03.779 --> 00:31:07.220 People are worried about this sort of thing escalating and then somebody 00:31:07.220 --> 00:31:12.179 doing something off of the Internet to call them names, or mess with their 00:31:12.179 --> 00:31:15.599 reputation... and that would have a negative effect on their life. 00:31:15.599 --> 00:31:21.919 On the Tor side, there are a couple of interesting cases that concern guilt by association. 00:31:21.919 --> 00:31:24.529 So there is someone who participates on image boards, 00:31:24.529 --> 00:31:27.059 on 8chan or infinite chan, 00:31:27.059 --> 00:31:31.380 and I don't know if you guys are that aware of this...
it's sort of the place 00:31:31.380 --> 00:31:34.310 which was kind of started by people that were blocked by 4chan, 00:31:34.310 --> 00:31:36.830 so it's the people that 4chan thinks are kind of sketchy 00:31:36.830 --> 00:31:39.740 laughter 00:31:39.740 --> 00:31:43.499 and this person said: "Look, I stand behind the material and the content that 00:31:43.499 --> 00:31:45.789 I have created, but some people on this site, 00:31:45.789 --> 00:31:48.999 I wouldn't wanna be associated with them." 00:31:48.999 --> 00:31:53.549 So, there is another person who talked about "look, I've created some online 00:31:53.549 --> 00:31:59.249 resources about various pharmaceuticals, but I don't wanna be very associated 00:31:59.249 --> 00:32:04.009 with the community that posts stuff about stuff like that." 00:32:05.499 --> 00:32:07.119 So, some other threats. 00:32:07.919 --> 00:32:10.929 Some people talked about diminished project quality. 00:32:10.929 --> 00:32:15.619 In particular, a lot of the Wikipedians that we talked to 00:32:15.619 --> 00:32:18.149 were somewhat prominent in the Wikipedia project, 00:32:18.149 --> 00:32:21.979 and in some respects had kind of achieved some degree of, like, 00:32:21.979 --> 00:32:25.909 rock star status as editors, if there can be such a thing. 00:32:26.379 --> 00:32:30.459 They found it very difficult to edit anymore because they'd edit a page 00:32:30.459 --> 00:32:34.059 that hadn't received a lot of attention, but people would see that 00:32:34.059 --> 00:32:37.510 they had edited it and there would be sort of hordes of people that would 00:32:37.510 --> 00:32:40.479 descend on that page and mess with it. 00:32:40.489 --> 00:32:44.420 And they found that they couldn't do that without actually harming the pages 00:32:44.420 --> 00:32:46.239 that they were trying to edit. 00:32:46.239 --> 00:32:50.599 Similarly, there were some Tor users who talked about, y'know, 00:32:50.599 --> 00:32:54.690 not wanting to sort of...
take credit for their work because they were worried 00:32:54.690 --> 00:32:58.769 they wouldn't have the credentials to be taken seriously in various ways, 00:32:58.769 --> 00:33:00.029 or things like that. 00:33:00.029 --> 00:33:03.940 Only two people in our project actually talked about worrying about 00:33:03.940 --> 00:33:12.320 legal sanctions, government sanctions, for their participation. 00:33:12.320 --> 00:33:16.320 There were a lot of people that talked about computer security concerns, 00:33:16.320 --> 00:33:19.769 which is not so much a privacy concern, though it's very related, and I'm 00:33:19.769 --> 00:33:24.460 going to talk about that because this group might be interested. 00:33:24.460 --> 00:33:27.749 On the Tor side, people liked the authentication properties 00:33:27.749 --> 00:33:32.440 of .onion services: the idea that when you go to a .onion website, 00:33:32.440 --> 00:33:37.440 the address is self-authenticating, so you know where you're going. 00:33:37.440 --> 00:33:41.289 But a lot of people who use Tor talked about the general data hygiene idea 00:33:41.289 --> 00:33:45.879 that there's less data about them on unknown websites, 00:33:45.879 --> 00:33:49.159 in unknown company databases, because they don't leave as many 00:33:49.159 --> 00:33:55.010 online footprints. And when you see all these high-profile break-ins that happen 00:33:55.010 --> 00:33:58.639 and these databases get stolen, if you're using Tor, maybe you're less likely 00:33:58.639 --> 00:34:00.209 to be in those databases. 00:34:00.209 --> 00:34:02.599 That was the idea there. 00:34:02.599 --> 00:34:05.969 On the Wikipedia side, a lot of people were concerned about 00:34:05.969 --> 00:34:08.020 their Wikipedia credentials.
00:34:08.020 --> 00:34:12.879 They talked about not logging in on public terminals and things like that, 00:34:12.879 --> 00:34:17.590 in particular being concerned about the security of administrative credentials 00:34:17.590 --> 00:34:22.679 that have privileges to, for example, look up the IP addresses of users who had edited 00:34:22.679 --> 00:34:25.989 and things like that, which could be abused. 00:34:27.309 --> 00:34:30.410 So, some concrete things that people were afraid of, 00:34:30.410 --> 00:34:31.999 not a complete list: 00:34:31.999 --> 00:34:35.069 having their head photoshopped onto porn, something that happens 00:34:35.069 --> 00:34:37.260 sometimes to editors... 00:34:37.260 --> 00:34:40.729 being beaten up, actually a couple of Tor people mentioned this; 00:34:40.729 --> 00:34:43.260 being swatted; receiving pipe bombs; 00:34:43.260 --> 00:34:47.080 having fake information about them published online. 00:34:47.320 --> 00:34:52.180 Though there were people that said, look, I don't really see a threat. 00:34:52.180 --> 00:34:56.469 Some participants said they don't perceive threats when they're contributing, 00:34:56.469 --> 00:35:00.800 but in a lot of cases they pointed out that they enjoyed certain privileges 00:35:00.800 --> 00:35:04.020 related to perhaps their gender, their nationality, or the fact that 00:35:04.020 --> 00:35:05.970 their interests were fairly mainstream. 00:35:05.970 --> 00:35:08.700 So here's a quote: "Yeah, I'm not that worried about it, 00:35:08.700 --> 00:35:11.960 mainly because there's pretty good support for some of these viewpoints, 00:35:11.960 --> 00:35:15.450 kind of a mainstream discourse, and it's not so radical. I don't think anyone's 00:35:15.450 --> 00:35:17.300 going to be knocking down my door.
00:35:17.300 --> 00:35:20.390 But I've been in contact with activists who have been engaged in 00:35:20.390 --> 00:35:23.440 higher-risk activities, and I do wonder about, I do have concerns 00:35:23.440 --> 00:35:27.470 about their welfare, and the desire they have to have the tools to 00:35:27.470 --> 00:35:31.930 be able to pursue their activities without facing consequences." 00:35:31.930 --> 00:35:38.500 So in contrast to the jerk theme, there are a lot of people who run Tor 00:35:38.500 --> 00:35:43.330 out of a sense of altruism, to provide cover and solidarity. 00:35:43.920 --> 00:35:47.460 Someone said: "I appreciate the need for protecting vulnerable people 00:35:47.460 --> 00:35:51.390 around the world, so I run several relays; some of them are exit relays, 00:35:51.390 --> 00:35:54.470 some of them are middle relays, and I run them around the world." 00:35:54.470 --> 00:35:57.820 And someone else said: "While you use it, you help 00:35:57.820 --> 00:36:01.950 diversify the network for those who may be subject to traffic monitoring, and you can 00:36:01.950 --> 00:36:05.820 look up any information you like, whether or not it's sensitive, and you'll get it, 00:36:05.820 --> 00:36:09.370 and if you live in a place where it may not be the greatest in legal standing 00:36:09.370 --> 00:36:13.289 to look it up, you're able to find out information." 00:36:14.459 --> 00:36:19.839 So, mitigating strategies: how did people deal with this when they wanted to 00:36:19.839 --> 00:36:26.319 participate in sites but couldn't do it through anonymous means? Well, 00:36:26.319 --> 00:36:29.520 some people modified their participation, and I'll talk about some of 00:36:29.520 --> 00:36:35.940 the chilling effects that we saw, and also attempts to get anonymity in various ways. 00:36:37.440 --> 00:36:40.079 So, lost editors.
00:36:40.389 --> 00:36:43.210 Several Tor users that we talked to actually mentioned that 00:36:43.210 --> 00:36:47.700 they had edited Wikipedia and they no longer edited it, or they edited it 00:36:47.700 --> 00:36:50.230 less because of the difficulty of editing through Tor. 00:36:50.230 --> 00:36:53.380 There was someone who said: "Basically I used to edit Wikipedia 00:36:53.380 --> 00:36:57.470 prior to doing a lot of Tor, so yeah, now it's mostly reading... I used to 00:36:57.470 --> 00:37:01.730 do a lot of editing for license design and for, like, some open source licenses, 00:37:01.730 --> 00:37:06.840 occasionally random forms and stuff that I knew about, sometimes grammar." 00:37:09.780 --> 00:37:13.289 And people talked to us in particular about the chilling effects 00:37:13.289 --> 00:37:17.910 of state surveillance, and in particular the Snowden revelations. 00:37:17.910 --> 00:37:22.179 In March of 2015 the Wikimedia Foundation announced that it was 00:37:22.179 --> 00:37:25.720 suing the National Security Agency. 00:37:25.720 --> 00:37:29.409 We asked people about that, and some of the Wikipedians said: 00:37:29.409 --> 00:37:32.929 "People aren't willing to engage with us when they know their government is 00:37:32.929 --> 00:37:36.960 watching their every move." And they said that in particular they can show 00:37:36.960 --> 00:37:39.960 that editing dropped off significantly on certain articles 00:37:39.960 --> 00:37:42.680 after the Upstream program was revealed. 00:37:42.680 --> 00:37:48.329 Here's a quote from one of the Tor users in our study that substantiates this: 00:37:48.329 --> 00:37:51.330 "For the Edward Snowden page, I've pulled myself away from adding 00:37:51.330 --> 00:37:54.429 sensitive contributions, like different references, because I thought 00:37:54.429 --> 00:37:59.100 that might be traced back to me in some way. But not refraining from 00:37:59.100 --> 00:38:00.400 useful content, I guess."
00:38:00.400 --> 00:38:04.779 Though, of course, adding references is one of the things that contributes to 00:38:04.779 --> 00:38:09.819 the quality of articles and so on. And in particular, they said, articles about 00:38:09.819 --> 00:38:16.089 national security, about terrorism and so on: people didn't edit those as much 00:38:16.089 --> 00:38:21.510 anymore because they were worried about ending up on a list. 00:38:21.510 --> 00:38:27.349 The other major topic that was chilled was articles about women's health. 00:38:27.349 --> 00:38:31.890 So, here's a picture of a vacuum aspiration abortion from the 00:38:31.890 --> 00:38:39.049 Wikipedia abortion article, and a couple of people told us about how, "look, any 00:38:39.049 --> 00:38:44.609 site that has to do with women or women's issues is more contentiously edited, 00:38:44.609 --> 00:38:49.280 is more likely to inflame people and get into edit wars than other sites." 00:38:50.100 --> 00:38:53.769 There are a lot of trolls on the Internet, and here's a quote: 00:38:53.769 --> 00:38:57.359 "Trolls have called their bosses and been like 'Do you know that your employee 00:38:57.359 --> 00:38:59.510 was editing the clitoris article last week?'" 00:38:59.510 --> 00:39:01.829 They will do stuff like that. 00:39:01.829 --> 00:39:07.000 So this means that, y'know, in particular someone talked about "I was a medical 00:39:07.000 --> 00:39:10.890 student, I had my obstetrics textbook open, I was looking at the abortion 00:39:10.890 --> 00:39:14.029 article, I was thinking about making some changes, but then I just 00:39:14.029 --> 00:39:20.460 pulled myself back and said, y'know, I don't need that in my life." 00:39:20.460 --> 00:39:26.490 This is another area where privacy concerns push back and cause people 00:39:26.490 --> 00:39:29.839 to hold back from doing things...
00:39:29.839 --> 00:39:36.539 And then there's this idea of a threshold of participation: the more involved 00:39:36.539 --> 00:39:40.529 you are, the more active you are in a project, the more likely you're actually 00:39:40.529 --> 00:39:43.569 gonna encounter real problems. 00:39:43.569 --> 00:39:48.069 People involved in curating content, deleting things, promoting things, 00:39:48.069 --> 00:39:51.619 arbitrating disputes, etc., are going to make enemies. 00:39:51.619 --> 00:39:54.200 Some of these enemies are going to make nasty threats, 00:39:54.200 --> 00:39:56.550 and some of them are gonna act on them. 00:39:56.550 --> 00:40:00.000 Here is another quote from somebody: "As long as I have that pseudonym ... 00:40:00.000 --> 00:40:05.330 ... [[ see slide ]] 00:40:05.330 --> 00:40:10.549 [[ see slide ]] ... that turns up when you do that." 00:40:10.549 --> 00:40:14.720 People mentioned in particular, from the Wikipedia side, that there were two sites, 00:40:14.720 --> 00:40:21.150 Wikipediocracy and The Wikipedia Review, where people post critiques of Wikipedia, 00:40:21.150 --> 00:40:27.860 and that people on these sites had made threats against and doxed various people 00:40:27.860 --> 00:40:29.910 on the Arbitration Committee. 00:40:29.910 --> 00:40:33.160 Someone talked about "they found my parents' home address, they found 00:40:33.160 --> 00:40:36.439 one of my old phone numbers, they wrote a blog post about all of these 00:40:36.439 --> 00:40:39.330 horrible things I've done, and here's my contact information, 00:40:39.330 --> 00:40:44.869 and for a good time call... and when it's on the Internet it doesn't die." 00:40:45.099 --> 00:40:51.729 People that got to a certain level of doing things, like handling abuse, 00:40:51.729 --> 00:40:53.629 had problems.
00:40:53.629 --> 00:40:57.630 "So since I didn't have any privacy, I felt limited in what I could do. I could still 00:40:57.630 --> 00:41:00.219 write articles, but blocking people was something 00:41:00.219 --> 00:41:03.209 I tried to avoid, since I didn't wanna get angry phone calls." 00:41:03.209 --> 00:41:06.269 Someone else also talked about activities that they used to do, 00:41:06.269 --> 00:41:08.429 but then after receiving threats and things... 00:41:08.429 --> 00:41:12.440 "I used to check for use of the N-word, the ruder of the two F-words, one or two other 00:41:12.440 --> 00:41:16.969 things that were indicative of problems in user space, and I deleted lots and lots of 00:41:16.969 --> 00:41:20.260 attack pages; I was fairly hot on dealing with them when they would 00:41:20.260 --> 00:41:23.779 turn up in article space. And when people create a user account in somebody 00:41:23.779 --> 00:41:27.380 else's name and say a bunch of things that person wouldn't agree with, 00:41:27.380 --> 00:41:30.520 I used to deal with that." But now, y'know, they're not willing to 00:41:30.520 --> 00:41:33.560 deal with that anymore. 00:41:35.120 --> 00:41:37.729 Privacy measures that people took. 00:41:37.959 --> 00:41:42.730 Obviously, in some cases people use Tor; we talked to Tor users where that's possible. 00:41:42.730 --> 00:41:46.460 People also talked about avoiding posting linking information and details 00:41:46.460 --> 00:41:53.710 about who they are, not editing things about, y'know, their local area, 00:41:53.710 --> 00:41:57.710 things only they would know, etc.
00:41:57.710 --> 00:42:02.750 People talked about using proxies or VPNs (some people talked about HideMyAss), 00:42:02.750 --> 00:42:08.470 editing from a public computer, using multiple accounts in some cases, and 00:42:08.470 --> 00:42:18.590 using privacy browser plug-ins and safeguards like NoScript and Ghostery. 00:42:18.590 --> 00:42:23.540 We asked people, both Tor users and non-Tor users, if they had used Tor 00:42:23.540 --> 00:42:27.359 and what they thought of Tor, and there was this person who said: "I tried using Tor, 00:42:27.359 --> 00:42:31.249 I did, when I was younger, and everything was so slow and terrible, I was just like 00:42:31.249 --> 00:42:32.850 'so not worth it'." 00:42:32.850 --> 00:42:38.470 And in fact, a couple of years ago Tor was pretty slow, but it's gotten better! 00:42:38.470 --> 00:42:41.349 The Tor users still talked a bit about latency, but 00:42:41.349 --> 00:42:45.630 a lot of them talked about issues like CAPTCHAs, unusable website features, 00:42:45.630 --> 00:42:47.940 the fact that it used to be slow... 00:42:47.940 --> 00:42:51.920 and Wikipedians talked about Tor being slow or too much trouble, 00:42:51.920 --> 00:42:56.069 just the need to download the software and connect to it every time... and 00:42:56.069 --> 00:42:58.680 some people found it unnecessary. 00:42:58.680 --> 00:43:04.569 There were some other interesting things that came up. 00:43:04.569 --> 00:43:06.250 Some people talked about how 00:43:06.250 --> 00:43:09.440 they used information ?revelation? as a defense mechanism: 00:43:09.440 --> 00:43:14.559 this idea that, okay, I'm gonna give you some information about me, so you can't 00:43:14.559 --> 00:43:18.920 really dox me because that's my address right there, or whatever. 00:43:18.920 --> 00:43:23.740 But people also talked about the limits of long-term participation.
A lot of people 00:43:23.740 --> 00:43:28.670 that talked to us had started editing or participating in online projects 00:43:28.670 --> 00:43:32.680 as relatively young teenagers, and a lot of people 00:43:32.680 --> 00:43:37.450 start with things like fixing typos before they later become members 00:43:37.450 --> 00:43:40.630 of the Arbitration Committee, or something like that. 00:43:40.630 --> 00:43:44.460 It's hard to have this long-term perspective when you're first creating 00:43:44.460 --> 00:43:48.650 your login name and your identity and so on. 00:43:48.650 --> 00:44:06.559 "Until it happens to you ... [[ see slide ]] 00:44:06.559 --> 00:44:10.769 [[ see slide ]] ... some serious thought." 00:44:11.849 --> 00:44:17.400 As most good ethnographic studies do, and as this one was intended to do, 00:44:17.400 --> 00:44:21.420 it raises more questions than answers. 00:44:21.420 --> 00:44:23.190 That was our goal. 00:44:23.190 --> 00:44:27.970 We're hoping... we learned that Tor users and Wikipedians share some 00:44:27.970 --> 00:44:32.480 privacy concerns, but they do have some different perspectives. 00:44:32.480 --> 00:44:36.019 And we did learn that some of the value of participation is being lost when people 00:44:36.019 --> 00:44:38.779 can't participate in a private way. 00:44:38.869 --> 00:44:44.180 We'd like to use this work to do some follow-up studies, and also perhaps 00:44:44.180 --> 00:44:48.470 build a larger survey study so we can learn more and get results that are more 00:44:48.470 --> 00:44:53.400 quantitative. 00:44:53.400 --> 00:44:56.869 If you find this topic interesting, a short plug for 00:44:56.869 --> 00:44:59.250 the Privacy Enhancing Technologies Symposium, 00:44:59.250 --> 00:45:02.779 which will be in July in Darmstadt. 00:45:02.779 --> 00:45:06.369 We're not presenting this particular work there, but there is a lot of 00:45:06.369 --> 00:45:14.760 work on Tor, anonymity, privacy and so on from the research community.
00:45:14.760 --> 00:45:19.480 And I'd like to thank my co-authors, Andrea Forte and Nazanin Andalibi, 00:45:19.480 --> 00:45:25.400 our interview participants, the Wikimedia Foundation, the Tor Project, 00:45:25.400 --> 00:45:29.039 the National Science Foundation, which funded Andrea's and my participation 00:45:29.039 --> 00:45:33.869 in this project, and all the people whose images I've used in my slides... 00:45:33.869 --> 00:45:36.900 so... Thanks! Any questions? Oh, and by the way, 00:45:36.900 --> 00:45:42.949 I'll be here for the whole conference, so you can find me afterwards if... 00:45:42.949 --> 00:45:51.549 applause 00:45:51.549 --> 00:45:56.510 Herald Angel: Thanks a lot, Rachel Greenstadt. And so, we hopefully have 00:45:56.510 --> 00:46:01.400 a few questions from you in the audience. You can line up behind the microphones: 00:46:01.400 --> 00:46:05.940 we have four of them here in the audience, and also in the back there are two, 00:46:05.940 --> 00:46:11.650 and we also have the Signal Angel present, but he didn't get any questions yet, 00:46:11.650 --> 00:46:14.790 but maybe some comments or something? 00:46:14.790 --> 00:46:16.819 Some feedback from the crowd on the Internet? 00:46:16.819 --> 00:46:18.660 Rachel Greenstadt: But there is somebody with a... [inaudible] 00:46:18.660 --> 00:46:23.369 Herald Angel: Then let me immediately go to the questions in the audience. 00:46:23.369 --> 00:46:26.210 Herald Angel: We have microphone 2, please. 00:46:26.210 --> 00:46:32.900 HA: And, one second, can you please be quiet if you go outside? Because that's 00:46:32.900 --> 00:46:34.319 really rude. 00:46:34.319 --> 00:46:39.139 Question: Did you find out if Wikipedia, for example, treats classical VPNs or 00:46:39.139 --> 00:46:40.769 proxies differently from Tor? 00:46:40.769 --> 00:46:44.029 Rachel Greenstadt: If what?
Question: If they treat them differently 00:46:44.029 --> 00:46:48.730 from Tor. So do they have the same policy in place for blocking, let's say, 00:46:48.730 --> 00:46:54.370 a private VPN, which can also be used to change your IP with the click of a button 00:46:54.370 --> 00:46:59.239 if you want to bully someone? It might offer less privacy than Tor, but if you 00:46:59.239 --> 00:47:01.869 really only want to bully someone, that might be enough. 00:47:01.869 --> 00:47:06.240 Rachel Greenstadt: I think "it depends" is the answer. 00:47:06.240 --> 00:47:12.349 The extensions that they have do block a lot of things by IP, so I think 00:47:12.349 --> 00:47:15.700 it depends on whether there's been abuse through that thing before. 00:47:15.700 --> 00:47:20.480 They try to block open proxies. I think some people said there were certain VPNs you could 00:47:20.480 --> 00:47:23.400 still edit through, and some you couldn't; it really depended. 00:47:23.400 --> 00:47:28.010 Herald Angel: Thanks. Microphone 1, please. 00:47:28.010 --> 00:47:31.520 Question: Wikipedia is by no means an isolated case, right? 00:47:31.520 --> 00:47:34.569 RA: No, no. Question: And there's more and more 00:47:34.569 --> 00:47:39.510 capability for blocking Tor exit nodes and whatnot, so where's the project going? 00:47:39.510 --> 00:47:43.529 I mean, the Great Firewall, for example, could very well block all its users from 00:47:43.529 --> 00:47:46.559 accessing Tor, right? RA: It actually does. 00:47:46.559 --> 00:47:52.279 It blocks people from accessing Tor and it blocks people from accessing Wikipedia. 00:47:52.279 --> 00:47:56.140 In terms of the Tor Project, there are mechanisms, using 00:47:56.140 --> 00:48:01.960 pluggable transports and bridge addresses, that can actually help people still 00:48:01.960 --> 00:48:05.920 access Tor, and then they'll be able to read Wikipedia, but then again 00:48:05.920 --> 00:48:08.049 they won't be able to edit for these reasons.
00:48:08.049 --> 00:48:13.340 HA: So, again, we have a 15-minute break after this, so you can get out after this 00:48:13.340 --> 00:48:16.359 and change rooms, and please be quiet if you really have to 00:48:16.359 --> 00:48:20.439 leave the room already or if you come into the room already. Thank you. 00:48:20.439 --> 00:48:22.430 Now to the Signal Angel, please. 00:48:22.430 --> 00:48:27.579 Signal Angel: There is one question from the Internet, from ?Whyness?. He or she 00:48:27.579 --> 00:48:31.829 is asking if there's actually a recorded instance of someone attempting to 00:48:31.829 --> 00:48:36.059 put a pipe bomb in the post because of Wikipedia edits. 00:48:36.059 --> 00:48:42.519 RA: I certainly don't have such information. This was just 00:48:42.519 --> 00:48:46.799 people telling us things that they were concerned about, or 00:48:46.799 --> 00:48:51.000 threats that they'd experienced. 00:48:51.000 --> 00:48:54.369 Nobody that I know of specifically mentioned that they experienced 00:48:54.369 --> 00:48:55.369 a pipe bomb. 00:48:55.369 --> 00:49:01.470 Signal Angel: And another question from ?a_monk?: if blocked Tor traffic 00:49:01.470 --> 00:49:05.839 is a problem, why does the Tor Project publish the exit IP list, making it 00:49:05.839 --> 00:49:08.329 easy to block? 00:49:08.329 --> 00:49:16.000 RA: That would be a question for the Tor people. My understanding of it is that 00:49:16.000 --> 00:49:20.339 the Tor Project does try to be a good Internet citizen, and they don't want to 00:49:20.339 --> 00:49:26.650 encourage the kind of arms race that would happen with sort of... 00:49:26.650 --> 00:49:30.349 people trying to, like, find all the exits and block them, versus making it 00:49:30.349 --> 00:49:34.479 just "look, here it is, this is what's going on," and...
it's also very helpful 00:49:34.479 --> 00:49:37.970 when you're running an exit node to be able to say, look, this thing is 00:49:37.970 --> 00:49:42.819 an exit node, and that's what was going on when this thing happened 00:49:42.819 --> 00:49:49.369 through my computer. So I think, y'know, the ability of exit relay 00:49:49.369 --> 00:49:54.069 operators to say what they're doing is also an important concern. 00:49:54.069 --> 00:49:59.119 Herald Angel: So, there's someone standing at microphone 5. 00:49:59.119 --> 00:50:03.680 Question: You mentioned zero-knowledge proofs in the beginning; is there any more 00:50:03.680 --> 00:50:05.269 research on this? 00:50:05.269 --> 00:50:13.269 RA: Uhm, yeah, so... If you look at the research on Nymble 00:50:13.269 --> 00:50:15.639 by Apu Kapadia; there are also some people 00:50:15.639 --> 00:50:19.089 in Nick Hopper's group at the University of Minnesota; there's also 00:50:19.089 --> 00:50:24.169 Ryan Henry at Indiana University, who has done a lot of work on this 00:50:24.169 --> 00:50:27.680 in Ian Goldberg's group at Waterloo. Those are the people that I would 00:50:27.680 --> 00:50:32.359 look up in terms of anonymous blacklisting schemes, and I'm sure I'm forgetting 00:50:32.359 --> 00:50:35.700 some of them right now, so hopefully they'll forgive me, but those are 00:50:35.700 --> 00:50:37.430 good places to start. 00:50:37.430 --> 00:50:41.799 Herald Angel: We have the next question at microphone 1. 00:50:41.799 --> 00:50:49.039 Question: Do you know if Wikipedia ever thought about hashing IP addresses, 00:50:49.039 --> 00:50:55.960 so that the contributions are still unique but the users are anonymized? 00:50:57.610 --> 00:51:02.029 RA: Nobody at Wikipedia talked to us about that, so I do not know if they thought 00:51:02.029 --> 00:51:04.089 about that or not. 00:51:04.089 --> 00:51:10.559 Herald Angel: And the last comment or question at the Signal Angel microphone.
00:51:10.559 --> 00:51:14.859 Signal Angel: Thanks, not really a question, more a comment... 00:51:14.859 --> 00:51:22.359 "I just wanted to relate that Wikipedia blocking Tor is indeed pretty concerning 00:51:22.359 --> 00:51:28.750 for Tor users too, because, for instance, the French Wikipedia articles about Tor 00:51:28.750 --> 00:51:34.650 are of very, very poor quality, and a lot of people end up asking us questions about 00:51:34.650 --> 00:51:39.930 Tor and are misinformed because of that, and I cannot fix it because I am not 00:51:39.930 --> 00:51:44.500 willing to edit Wikipedia without Tor. And that is also a pretty big issue, I think." 00:51:44.500 --> 00:51:49.109 RA: Yeah, so it would be interesting from my perspective to use this to then look at 00:51:49.109 --> 00:51:53.230 the articles, the types of articles about Tor, about anonymous participation, 00:51:53.230 --> 00:51:58.059 where we would suggest... we'd like to do a bigger study, learn which articles 00:51:58.059 --> 00:52:03.130 anonymous users would edit if they were going to edit Wikipedia, and then 00:52:03.130 --> 00:52:07.309 we could do an analysis like they did about the movie sites to figure out 00:52:07.309 --> 00:52:11.739 if these articles are in some way shorter or of lower quality than other articles 00:52:11.739 --> 00:52:13.970 because they're missing that perspective. 00:52:13.970 --> 00:52:20.569 Herald Angel: Thank you Rachel, thank you for the questions, and warm applause again 00:52:20.569 --> 00:52:21.789 for Rachel Greenstadt. 00:52:21.959 --> 00:52:23.700 applause 00:52:23.780 --> 00:52:24.709 RA: Thanks 00:52:25.989 --> 00:52:29.831 tune playing 00:52:29.831 --> 00:52:37.000 subtitles created by c3subtitles.de Join, and help us!