0:00:00.000,0:00:09.970 silent 31C3 preroll 0:00:09.970,0:00:13.220 Dr. Gareth Owen: Hello. Can you hear me?[br]Yes. Okay. So my name is Gareth Owen. 0:00:13.220,0:00:16.150 I’m from the University of Portsmouth.[br]I’m an academic 0:00:16.150,0:00:19.320 and I’m going to talk to you about[br]an experiment that we did 0:00:19.320,0:00:22.610 on the Tor hidden services,[br]trying to categorize them, 0:00:22.610,0:00:25.230 estimate how many they were etc. etc. 0:00:25.230,0:00:27.380 Well, as we go through the talk[br]I’m going to explain 0:00:27.380,0:00:31.120 how Tor hidden services work internally,[br]and how the data was collected. 0:00:31.120,0:00:35.320 So what sort of conclusions you can draw[br]from the data based on the way that we’ve 0:00:35.320,0:00:39.950 collected it. Just so [that] I get[br]an idea: how many of you use Tor 0:00:39.950,0:00:42.430 on a regular basis, could you[br]put your hand up for me? 0:00:42.430,0:00:46.120 So quite a big number. Keep your hand[br]up if… or put your hand up if you’re 0:00:46.120,0:00:48.320 a relay operator. 0:00:48.320,0:00:51.470 Wow, that’s quite a significant number,[br]isn’t it? And then, put your hand up 0:00:51.470,0:00:55.250 and/or keep it up if you[br]run a hidden service. 0:00:55.250,0:00:59.530 Okay, so, a fewer number, but still[br]some people run hidden services. 0:00:59.530,0:01:02.720 Okay, so, some of you may be very familiar[br]with the way Tor works, sort of, 0:01:02.720,0:01:06.700 in a low level. But I am gonna go through[br]it for those which aren’t, so they understand 0:01:06.700,0:01:10.380 just how they work. 
And as we go along,[br]because I’m explaining how 0:01:10.380,0:01:14.030 the hidden services work, I’m going[br]to tag on information on how 0:01:14.030,0:01:19.030 the Tor hidden services themselves can be[br]deanonymised and also how the users 0:01:19.030,0:01:23.090 of those hidden services can be[br]deanonymised, if you put 0:01:23.090,0:01:27.040 some strict criteria on what it is you[br]want to do with respect to them. 0:01:27.040,0:01:30.920 So the things that I’m going to go over:[br]I wanna go over how Tor works, 0:01:30.920,0:01:34.190 and then specifically how hidden services[br]work. I’m gonna talk about something 0:01:34.190,0:01:37.889 called the “Tor Distributed Hash Table”[br]for hidden services. If you’ve heard 0:01:37.889,0:01:40.560 that term and don’t know what[br]it means, don’t worry, I’ll explain 0:01:40.560,0:01:44.010 what a distributed hash table is and[br]how it works. It’s not as complicated 0:01:44.010,0:01:47.690 as it sounds. And then I wanna go over[br]Darknet data, so, data that we collected 0:01:47.690,0:01:53.030 from Tor hidden services. And as I say,[br]as we go along I will sort of explain 0:01:53.030,0:01:56.650 how you do deanonymisation of both the[br]services themselves and of the visitors 0:01:56.650,0:02:02.400 to the service. And just[br]how complicated it is. 0:02:02.400,0:02:07.370 So you may have seen this slide which[br]I think was from GCHQ, released last year 0:02:07.370,0:02:12.099 as part of the Snowden leaks where they[br]said: “You can deanonymise some users 0:02:12.099,0:02:15.560 some of the time but they’ve had[br]no success in deanonymising someone 0:02:15.560,0:02:20.109 in response to a specific request.”[br]So, given all of you e.g., I may be able 0:02:20.109,0:02:25.090 to deanonymise a small fraction of you[br]but I can’t choose precisely one person 0:02:25.090,0:02:27.499 I want to deanonymise. 
That’s what[br]I’m gonna be explaining in relation 0:02:27.499,0:02:30.940 to the deanonymisation attacks, how[br]you can deanonymise a section but 0:02:30.940,0:02:38.629 you can’t necessarily choose which section[br]of the users that you will be deanonymising. 0:02:38.629,0:02:42.740 Tor deals with just a couple[br]of different problems. On the one hand 0:02:42.740,0:02:46.239 it allows you to bypass censorship. So if[br]you’re in a country like China, which 0:02:46.239,0:02:51.010 blocks some types of traffic you can use[br]Tor to bypass their censorship blocks. 0:02:51.010,0:02:55.541 It tries to give you privacy, so, at some[br]level in the network someone can’t see 0:02:55.541,0:02:59.200 what you’re doing. And at another point[br]in the network people don’t know 0:02:59.200,0:03:02.540 who you are but may[br]be able to see what you’re doing. 0:03:02.540,0:03:07.099 Now the traditional case[br]for this is to look at VPNs. 0:03:07.099,0:03:10.669 With a VPN you have[br]sort of a single provider. 0:03:10.669,0:03:14.689 You have lots of users connecting[br]to the VPN. The VPN has sort of 0:03:14.689,0:03:18.240 a mixing effect from an outside[br]observer’s point of view. And then 0:03:18.240,0:03:22.499 out of the VPN you see requests[br]to Twitter, Wikipedia etc. etc. 0:03:22.499,0:03:26.830 And if that traffic isn’t encrypted then[br]the VPN can also read the contents 0:03:26.830,0:03:30.980 of the traffic. Now of course there is[br]a fundamental weakness with this. 0:03:30.980,0:03:35.730 If you trust the VPN provider, the VPN[br]provider knows both who you are 0:03:35.730,0:03:39.629 and what you’re doing and can[br]link those two together with absolute 0:03:39.629,0:03:43.580 certainty. 
So you don’t… whilst you do[br]get some of these properties, assuming 0:03:43.580,0:03:48.069 you’ve got a trustworthy VPN provider[br]you don’t get them in the face of 0:03:48.069,0:03:51.609 an untrustworthy VPN provider.[br]And of course: how do you trust the VPN 0:03:51.609,0:03:59.319 provider? What sort of measure do[br]you use? That’s sort of an open question. 0:03:59.319,0:04:03.729 So Tor tries to solve this problem[br]by distributing the trust. Tor is 0:04:03.729,0:04:07.500 an open source project, so you can go[br]on to their Git repository, you can 0:04:07.500,0:04:12.620 download the source code, and change it,[br]improve it, submit patches etc. 0:04:12.620,0:04:17.108 As you heard earlier, during Jacob and[br]Roger’s talk they’re currently partly 0:04:17.108,0:04:20.949 sponsored by the US Government which seems[br]a bit paradoxical, but they explained 0:04:20.949,0:04:24.770 in that talk many of the… that[br]doesn’t affect like judgment. 0:04:24.770,0:04:28.540 And indeed, they do have some funding from[br]other sources, and they design that system 0:04:28.540,0:04:30.841 – which I’ll talk about a little bit[br]later – in a way where they don’t have 0:04:30.841,0:04:34.230 to trust each other. So there’s sort of[br]some redundancy, and they’re trying 0:04:34.230,0:04:39.650 to minimize these sort of trust issues[br]related to this. Now, Tor is 0:04:39.650,0:04:43.310 a partially de-centralized network, which[br]means that it has some centralized 0:04:43.310,0:04:47.870 components which are under the control of[br]the Tor Project and some de-centralized 0:04:47.870,0:04:51.190 components which are normally the Tor[br]relays. 
If you run a relay you’re 0:04:51.190,0:04:56.290 one of those de-centralized components.[br]There is, however, no single authority 0:04:56.290,0:05:01.110 on the Tor network.[br]So no single server which is responsible, 0:05:01.110,0:05:04.290 which you’re required to trust.[br]So the trust is somewhat distributed, 0:05:04.290,0:05:12.000 but not entirely. When you establish[br]a circuit through Tor you, the user, 0:05:12.000,0:05:15.500 download a list of all of the relays[br]inside the Tor network. 0:05:15.500,0:05:19.070 And you get to pick – and I’ll tell you[br]how you do that – which relays 0:05:19.070,0:05:22.750 you’re going to use to route your traffic[br]through. So here is a typical example: 0:05:22.750,0:05:27.090 You’re here on the left hand side as the[br]user. You download a list of the relays 0:05:27.090,0:05:32.010 inside the Tor network and you select from[br]that list three nodes, a guard node 0:05:32.010,0:05:36.580 which is your entry into the Tor network,[br]a relay node which is a middle node. 0:05:36.580,0:05:39.010 Essentially, it’s going to route your[br]traffic to a third hop. And then 0:05:39.010,0:05:42.650 the third hop is the exit node where[br]your traffic essentially exits out 0:05:42.650,0:05:46.840 on the internet. Now, looking at the[br]circuit. So this is a circuit through 0:05:46.840,0:05:50.170 the Tor network through which you’re[br]going to route your traffic. There are 0:05:50.170,0:05:52.540 three layers of encryption at the[br]beginning, so between you 0:05:52.540,0:05:56.150 and the guard node. Your traffic[br]is encrypted three times. 0:05:56.150,0:05:59.330 In the first instance encrypted to the[br]guard, and then it’s encrypted again, 0:05:59.330,0:06:03.180 to the relay, and then encrypted[br]again to the exit, and as the traffic moves 0:06:03.180,0:06:08.710 through the Tor network each of those[br]layers of encryption is peeled 0:06:08.710,0:06:17.300 from the data. 
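The layering just described can be illustrated with a toy sketch. It assumes a made-up XOR keystream built from SHA-256 and hypothetical per-hop keys; it is not Tor's actual cell encryption and only shows that the client adds one layer per hop and each hop removes exactly one.

```python
import hashlib

def keystream(key, length):
    """Toy keystream: SHA-256 over key + counter (illustration only, not Tor's cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_layer(key, data):
    """Add or remove one layer; XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def wrap(payload, hop_keys):
    """Client side: add the exit's layer first and the guard's layer last (outermost)."""
    cell = payload
    for key in reversed(hop_keys):
        cell = xor_layer(key, cell)
    return cell

def unwrap_along_circuit(cell, hop_keys):
    """Each hop peels its own layer; returns what each hop holds afterwards."""
    seen = []
    for key in hop_keys:
        cell = xor_layer(key, cell)
        seen.append(cell)
    return seen

# Hypothetical keys shared between the client and each hop.
hop_keys = [b"guard-key", b"middle-key", b"exit-key"]
message = b"GET / HTTP/1.1"

cell = wrap(message, hop_keys)
views = unwrap_along_circuit(cell, hop_keys)
# views[0] and views[1] are still scrambled; only views[2] (the exit) is the plaintext.
```

In this toy, only the last hop recovers the request, which mirrors the point that the guard and middle relay forward data they cannot read.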
The Guard here in this case[br]knows who you are, and the exit relay 0:06:17.300,0:06:21.590 knows what you’re doing but neither know[br]both. And the middle relay doesn’t really 0:06:21.590,0:06:26.710 know a lot, except for which relay is[br]her guard and which relay is her exit. 0:06:26.710,0:06:31.870 Who runs an exit relay? So if you run[br]an exit relay all of the traffic which 0:06:31.870,0:06:36.210 users are sending out on the internet they[br]appear to come from your IP address. 0:06:36.210,0:06:41.360 So running an exit relay is potentially[br]risky because someone may do something 0:06:41.360,0:06:45.590 through your relay which attracts attention.[br]And then, when law enforcement 0:06:45.590,0:06:48.940 traced that back to an IP address it’s[br]going to come back to your address. 0:06:48.940,0:06:51.790 So some relay operators have had trouble[br]with this, with law enforcement coming 0:06:51.790,0:06:55.360 to them, and saying: “Hey we got this[br]traffic coming through your IP address 0:06:55.360,0:06:57.950 and you have to go and explain it.”[br]So if you want to run an exit relay 0:06:57.950,0:07:01.400 it’s a little bit risky, but we’re thankful[br]for those people that do run exit relays 0:07:01.400,0:07:04.870 because ultimately if people didn’t run[br]an exit relay you wouldn’t be able 0:07:04.870,0:07:08.000 to get out of the Tor network, and it[br]wouldn’t be terribly useful from this 0:07:08.000,0:07:20.560 point of view. So, yes.[br]applause 0:07:20.560,0:07:24.610 So every Tor relay, when you set up[br]a Tor relay you publish something called 0:07:24.610,0:07:28.780 a descriptor which describes your Tor[br]relay and how to use it to a set 0:07:28.780,0:07:33.430 of servers called the authorities. And the[br]trust in the Tor network is essentially 0:07:33.430,0:07:38.610 split across these authorities. They’re run[br]by the core Tor Project members. 0:07:38.610,0:07:42.639 And they maintain a list of all of the[br]relays in the network. 
And they observe 0:07:42.639,0:07:46.010 them over a period of time. If the relays[br]exhibit certain properties they give 0:07:46.010,0:07:50.480 the relays flags. If e.g. a relay allows[br]traffic to exit from the Tor network 0:07:50.480,0:07:54.450 it will get the ‘Exit’ flag. If they’ve been[br]switched on for a certain period of time, 0:07:54.450,0:07:58.400 or carried a certain amount of traffic they’ll[br]be allowed to become a guard relay 0:07:58.400,0:08:02.180 which is the first node in your circuit.[br]So when you build your circuit you 0:08:02.180,0:08:07.230 download a list of these descriptors from[br]one of the Directory Authorities. You look 0:08:07.230,0:08:10.120 at the flags which have been assigned to[br]each of the relays, and then you pick 0:08:10.120,0:08:14.150 your route based on that. So you’ll pick[br]the guard node from a set of relays 0:08:14.150,0:08:16.400 which have the ‘Guard’ flag, your exits[br]from the set of relays which have 0:08:16.400,0:08:20.860 the ‘Exit’ flag etc. etc. Now, as of[br]a quick count this morning there are 0:08:20.860,0:08:29.229 about 1500 guard relays, around 1000 exit[br]relays, and six relays flagged as ‘bad’ exits. 0:08:29.229,0:08:34.360 What does a ‘bad exit’ mean?[br]waits for audience to respond 0:08:34.360,0:08:37.759 That’s not good! That’s exactly[br]what it means! Yes! laughs 0:08:37.759,0:08:40.450 applause 0:08:40.450,0:08:45.569 So relays which have been flagged as ‘bad[br]exits’ your client will never choose to exit 0:08:45.569,0:08:50.660 traffic through. And examples of things[br]which may get a relay flagged as a 0:08:50.660,0:08:53.829 [bad] exit relay – if they’re fiddling with[br]the traffic which is coming out of 0:08:53.829,0:08:57.019 the Tor relay. Or doing things like[br]man-in-the-middle attacks against 0:08:57.019,0:09:01.629 SSL traffic. 
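The flag-based selection described above can be sketched roughly as follows. The relay list, nicknames and the uniform random choice are assumptions made for illustration; the talk does not describe how the real client weights its choices, so this only shows the filtering by 'Guard', 'Exit' and 'BadExit' flags.

```python
import random

# Hypothetical relay list; a real client reads flags from the downloaded descriptors.
relays = [
    {"nickname": "alpha", "flags": {"Guard"}},
    {"nickname": "bravo", "flags": {"Guard", "Exit"}},
    {"nickname": "charlie", "flags": set()},
    {"nickname": "delta", "flags": {"Exit"}},
    {"nickname": "echo", "flags": {"Exit", "BadExit"}},
    {"nickname": "foxtrot", "flags": set()},
]

def pick_circuit(relays, rng):
    """Pick guard, middle and exit by flag; uniform choice is a simplification."""
    # Exits must carry the Exit flag and must not be flagged as bad exits.
    exits = [r for r in relays
             if "Exit" in r["flags"] and "BadExit" not in r["flags"]]
    exit_relay = rng.choice(exits)
    # The guard comes from relays with the Guard flag, and differs from the exit.
    guards = [r for r in relays
              if "Guard" in r["flags"] and r is not exit_relay]
    guard = rng.choice(guards)
    # The middle hop is any remaining relay.
    middles = [r for r in relays if r is not guard and r is not exit_relay]
    middle = rng.choice(middles)
    return guard, middle, exit_relay

guard, middle, exit_relay = pick_circuit(relays, random.Random(31))
path = [guard["nickname"], middle["nickname"], exit_relay["nickname"]]
```

Whatever seed is used, the relay named "echo" is never chosen as the exit here, because it carries the hypothetical 'BadExit' flag.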
We’ve seen various things,[br]there have been relays man-in-the-middling 0:09:01.629,0:09:07.050 SSL traffic, there have very, very recently[br]been an exit relay which was patching 0:09:07.050,0:09:10.800 binaries that you downloaded from the[br]internet, inserting malware into the binaries. 0:09:10.800,0:09:14.630 So you can do these things but the Tor[br]Project tries to scan for them. And if 0:09:14.630,0:09:19.829 these things are detected then they’ll be[br]flagged as ‘Bad Exits’. It’s true to say 0:09:19.829,0:09:24.610 that the scanning mechanism is not 100%[br]fool-proof by any stretch of the imagination. 0:09:24.610,0:09:28.559 It tries to pick up common types[br]of attacks, so as a result 0:09:28.559,0:09:32.480 it won’t pick up unknown attacks or[br]attacks which haven’t been seen or 0:09:32.480,0:09:36.680 have not been known about beforehand. 0:09:36.680,0:09:45.370 So looking at this, how do you deanonymise[br]the traffic travelling through the Tor 0:09:45.370,0:09:49.449 networks? Given some traffic coming out[br]of the exit relay, how do you know 0:09:49.449,0:09:54.269 which user that corresponds to? What is[br]their IP address? You can’t actually 0:09:54.269,0:09:58.279 modify the traffic because if any of the[br]relays tried to modify the traffic 0:09:58.279,0:10:02.249 which they’re sending through the network[br]Tor will tear down the circuit through the relay. 0:10:02.249,0:10:06.290 So there’s these integrity checks, each[br]of the hops. And if you try to sort of 0:10:06.290,0:10:09.870 – because you can’t decrypt the packet[br]you can’t modify it in any meaningful way, 0:10:09.870,0:10:13.749 and because there’s an integrity check[br]at the next hop that means that you can’t 0:10:13.749,0:10:17.019 modify the packet because otherwise it’s[br]detected. So you can’t do this sort of 0:10:17.019,0:10:20.900 marker, and try and follow the marker[br]through the network. 
So instead 0:10:20.900,0:10:26.699 what you can do if you control… so let me[br]give you two cases. In the worst case 0:10:26.699,0:10:31.330 if the attacker controls all three of your[br]relays that you pick, which is an unlikely 0:10:31.330,0:10:34.739 scenario since they’d need to control quite[br]a big proportion of the network, then 0:10:34.739,0:10:39.550 it should be quite obvious that they can[br]work out who you are and also 0:10:39.550,0:10:42.369 see what you’re doing because in that[br]case they can tag the traffic, and 0:10:42.369,0:10:45.709 they can just discard these integrity[br]checks at each of the following hops. 0:10:45.709,0:10:50.709 Now in a different case, if you control[br]the Guard relay and the exit relay 0:10:50.709,0:10:54.160 but not the middle relay the Guard relay[br]can’t tamper with the traffic because 0:10:54.160,0:10:57.660 this middle relay will close down the[br]circuit as soon as it happens. 0:10:57.660,0:11:01.130 The exit relay can’t send stuff back down[br]the circuit to try and identify the user, 0:11:01.130,0:11:05.030 either. Because again, the circuit will be[br]closed down. So what can you do? 0:11:05.030,0:11:09.869 Well, you can count the number of packets[br]going through the Guard node. And you can 0:11:09.869,0:11:14.690 measure the timing differences between[br]packets, and try and spot that pattern 0:11:14.690,0:11:18.750 at the Exit relays. You’re looking at counts of[br]packets and the timing between those 0:11:18.750,0:11:22.360 packets which are being sent, and[br]essentially trying to correlate them all. 0:11:22.360,0:11:26.869 So if your user happens to pick you as[br]their Guard node, and then happens to pick 0:11:26.869,0:11:31.850 your exit relay, then you can deanonymise[br]them with very high probability using 0:11:31.850,0:11:35.649 this technique. 
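As a rough illustration of the counting-and-timing idea, the sketch below bins packet timestamps into fixed windows and compares the guard-side pattern with candidate exit-side flows using a Pearson correlation. The timestamps, the window size and the choice of Pearson correlation are all assumptions for the example; attacks in the literature use more elaborate statistics.

```python
import math

def bin_counts(timestamps, window, duration):
    """Count packets in each `window`-second slot over `duration` seconds."""
    bins = [0] * math.ceil(duration / window)
    for t in timestamps:
        if 0 <= t < duration:
            bins[int(t // window)] += 1
    return bins

def pearson(xs, ys):
    """Pearson correlation of two equal-length count vectors (0.0 if either is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Made-up packet arrival times (seconds) seen at a guard for one client.
guard_times = [0.1, 0.2, 0.3, 2.1, 2.2, 4.5, 4.6, 4.7, 4.8]
# Made-up flows seen at an exit: flow_a is the same traffic 50 ms later.
exit_flows = {
    "flow_a": [t + 0.05 for t in guard_times],
    "flow_b": [1.0, 1.1, 1.2, 1.3, 3.0, 3.1, 3.5, 5.0, 5.5],
}

WINDOW, DURATION = 1.0, 6.0
guard_bins = bin_counts(guard_times, WINDOW, DURATION)
scores = {
    name: pearson(guard_bins, bin_counts(times, WINDOW, DURATION))
    for name, times in exit_flows.items()
}
best = max(scores, key=scores.get)  # the exit flow that best matches the guard pattern
```

Note that nothing in the traffic is modified: the observer at both ends only counts and times packets, which is why the integrity checks do not help against it.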
You’re just correlating[br]the timings of packets and counting 0:11:35.649,0:11:38.889 the number of packets going through.[br]And the attacks demonstrated in the literature 0:11:38.889,0:11:44.509 are very reliable for this. We heard[br]earlier from the Tor talk about the “relay 0:11:44.509,0:11:50.739 early” tag which was the attack discovered[br]by the CERT researchers in the US. 0:11:50.739,0:11:55.050 That attack didn’t rely on timing attacks.[br]Instead, what they were able to do was 0:11:55.050,0:11:58.720 send a special type of cell containing[br]the data back down the circuit, 0:11:58.720,0:12:01.889 essentially marking this data, and saying:[br]“This is the data we’re seeing 0:12:01.889,0:12:06.149 at the Exit relay, or at the hidden[br]service”, and encode into the messages 0:12:06.149,0:12:10.049 travelling back down the circuit, what the[br]data was. And then you could pick 0:12:10.049,0:12:14.269 those up at the Guard relay and say, okay,[br]it’s this person that’s doing that. 0:12:14.269,0:12:18.370 In fact, although this technique works,[br]and yeah it was a very nice attack, 0:12:18.370,0:12:21.269 the traffic correlation attacks are[br]actually just as powerful. 0:12:21.269,0:12:25.259 So although this bug has been fixed traffic[br]correlation attacks still work and are 0:12:25.259,0:12:29.739 still fairly, fairly reliable. So the problem[br]still does exist. This is very much 0:12:29.739,0:12:33.399 an open question. How do we solve this[br]problem? We don’t know, currently, 0:12:33.399,0:12:40.040 how to solve this problem of trying[br]to tackle the traffic correlation. 0:12:40.040,0:12:45.369 There are a couple of solutions.[br]But they’re not particularly… 0:12:45.369,0:12:48.569 they’re not particularly reliable. Let me[br]just go through these, and I’ll skip back 0:12:48.569,0:12:53.061 to the few things I’ve missed. 
The first[br]thing is, high-latency networks, so 0:12:53.061,0:12:56.999 networks where packets are delayed[br]in their transit through the network. 0:12:56.999,0:13:00.740 That throws away a lot of the timing[br]information. So they promise 0:13:00.740,0:13:03.800 to potentially solve this problem.[br]But of course, if you want to visit 0:13:03.800,0:13:06.779 Google’s home page, and you have to wait[br]five minutes for it, you’re simply 0:13:06.779,0:13:11.910 just not going to use Tor. The whole point[br]is trying to make this technology usable. 0:13:11.910,0:13:14.759 And if you’ve got something which is very,[br]very slow then it doesn’t make it 0:13:14.759,0:13:18.269 attractive to use. But of course,[br]this case does work slightly better 0:13:18.269,0:13:22.059 for e-mail. If you think about it with[br]e-mail, you don’t mind if your e-mail 0:13:22.059,0:13:25.399 – well, you may not mind, you may mind –[br]you don’t mind if your e-mail is delayed 0:13:25.399,0:13:29.120 by some period of time. Which makes this[br]somewhat difficult. And as Roger said 0:13:29.120,0:13:35.130 earlier, you can also introduce padding[br]into the circuit, so these are dummy cells. 0:13:35.130,0:13:39.839 But, but… with a big caveat: some of the[br]research suggests that actually you’d 0:13:39.839,0:13:43.439 need to introduce quite a lot of padding[br]to defeat these attacks, and that would 0:13:43.439,0:13:47.179 overload the Tor network in its current[br]state. So, again, not a particularly 0:13:47.179,0:13:53.860 practical solution. 0:13:53.860,0:13:58.279 How does Tor try to solve this problem?[br]Well, Tor makes it very difficult 0:13:58.279,0:14:03.171 to become a user’s Guard relay. If you[br]can’t become a user’s Guard relay 0:14:03.171,0:14:07.839 then you don’t know who the user is, quite[br]simply. And so by making it very hard 0:14:07.839,0:14:13.249 to become the Guard relay therefore you[br]can’t do this traffic correlation attack. 
0:14:13.249,0:14:17.579 So at the moment the Tor client chooses[br]one Guard relay and keeps it for a period 0:14:17.579,0:14:22.259 of time. So if I want to sort of target[br]just one of you I would need to control 0:14:22.259,0:14:26.259 the Guard relay that you were using at[br]that particular point in time. And in fact 0:14:26.259,0:14:30.679 I’d also need to know what that Guard[br]relay is. So by making it very unlikely 0:14:30.679,0:14:34.129 that you would select a particular malicious[br]Guard relay, where the number of malicious 0:14:34.129,0:14:39.179 Guard relays is very small, that’s how Tor[br]tries to solve this problem. And 0:14:39.179,0:14:43.280 at the moment your Guard relay is your[br]barrier of security. If the attacker can’t 0:14:43.280,0:14:46.460 control the Guard relay then they won’t[br]know who you are. That doesn’t mean 0:14:46.460,0:14:50.639 they can’t try other sort of side channel[br]attacks by messing with the traffic 0:14:50.639,0:14:55.129 at the Exit relay etc. You know that you[br]may sort of e.g. download dodgy documents 0:14:55.129,0:14:59.499 and open one on your computer, and those[br]sort of things. Now the alternative 0:14:59.499,0:15:02.769 of course to having a Guard relay[br]and keeping it for a very long time 0:15:02.769,0:15:06.029 will be to have a Guard relay and[br]to change it on a regular basis. 0:15:06.029,0:15:09.929 Because you might think, well, just choosing[br]one Guard relay and sticking with it 0:15:09.929,0:15:13.399 is probably a bad idea. But actually,[br]that’s not the case. If you pick 0:15:13.399,0:15:18.370 the Guard relay, and assuming that the[br]chance of picking a Guard relay that is 0:15:18.370,0:15:22.800 malicious is very low, then, when you[br]first use your Guard relay, if you got 0:15:22.800,0:15:27.420 a good choice, then your traffic is safe.[br]If you haven’t got a good choice then 0:15:27.420,0:15:31.759 your traffic isn’t safe. 
Whereas if your[br]Tor client chooses a Guard relay 0:15:31.759,0:15:35.610 every few minutes, or every hour, or[br]something on those lines at some point 0:15:35.610,0:15:39.179 you’re gonna pick a malicious Guard relay.[br]So they’re gonna have some of your traffic 0:15:39.179,0:15:43.399 but not all of it. And so currently the[br]trade-off is that we make it very difficult 0:15:43.399,0:15:48.490 for an attacker to control a Guard relay[br]and the user picks a Guard relay and 0:15:48.490,0:15:52.449 keeps it for a long period of time. And[br]so it’s very difficult for the attackers 0:15:52.449,0:15:58.939 to pick that Guard relay when they control[br]a very small proportion of the network. 0:15:58.939,0:16:06.420 So this, currently, provides those[br]properties I described earlier, the privacy 0:16:06.420,0:16:11.410 and the anonymity when you’re browsing the[br]web, when you’re accessing websites etc. 0:16:11.410,0:16:16.519 But still you know who the website is. So[br]although you’re anonymous and the website 0:16:16.519,0:16:20.730 doesn’t know who you are you know who the[br]website is. And there may be some cases 0:16:20.730,0:16:25.499 where e.g. the website would also wish to[br]remain anonymous. You want the person 0:16:25.499,0:16:29.970 accessing the website and the website[br]itself to be anonymous to each other. 0:16:29.970,0:16:34.230 And you could think about people e.g.[br]being in countries where running 0:16:34.230,0:16:39.730 a political blog e.g. might be a dangerous[br]activity. If you run that on a regular 0:16:39.730,0:16:45.660 webserver you’re easily identified whereas,[br]if you got some way where you as 0:16:45.660,0:16:49.490 the webserver can be anonymous then[br]that allows you to do that activity without 0:16:49.490,0:16:57.480 being targeted by your government. So[br]this is what hidden services try to solve. 
0:16:57.480,0:17:03.080 Now when you first think about a problem[br]you kind of think: “Hang on a second, 0:17:03.080,0:17:06.429 the user doesn’t know who the website[br]is and the website doesn’t know 0:17:06.429,0:17:09.890 who the user is. So how on earth do they[br]talk to each other?” Well, that’s essentially 0:17:09.890,0:17:14.220 what the Tor hidden service protocol tries[br]to sort of set up. How do you identify and 0:17:14.220,0:17:19.579 connect to each other. So at the moment[br]this is what happens: We’ve got Bob 0:17:19.579,0:17:23.780 on the [right] hand side who is the hidden[br]service. And we got Alice on the left hand 0:17:23.780,0:17:28.620 side here who is the user who wishes to[br]visit the hidden service. Now when Bob 0:17:28.620,0:17:34.190 sets up his hidden service he picks three[br]nodes in the Tor network as introduction 0:17:34.190,0:17:38.831 points and builds several hop circuits to[br]them. So the introduction points don’t know 0:17:38.831,0:17:44.680 who Bob is. Bob has circuits to them. And[br]Bob says to each of these introduction points 0:17:44.680,0:17:48.240 “Will you relay traffic to me if someone[br]connects to you asking for me?” 0:17:48.240,0:17:53.030 And then those introduction points[br]do that. So then, once Bob has picked 0:17:53.030,0:17:56.840 his introduction points he publishes[br]a descriptor describing the list of his 0:17:56.840,0:18:01.310 introduction points for someone who wishes[br]to come onto his websites. And then Alice 0:18:01.310,0:18:06.700 on the left hand side wishing to visit Bob[br]will pick a rendezvous point in the network 0:18:06.700,0:18:10.030 and build a circuit to it. So this “RP”[br]here is the rendezvous point. 0:18:10.030,0:18:14.530 And she will relay a message via one of[br]the introduction points saying to Bob: 0:18:14.530,0:18:18.290 “Meet me at the rendezvous point”.[br]And then Bob will build a 3-hop-circuit 0:18:18.290,0:18:22.870 to the rendezvous point. 
So now at this[br]stage we got Alice with a multi-hop circuit 0:18:22.870,0:18:26.890 to the rendezvous point, and Bob with[br]a multi-hop circuit to the rendezvous point. 0:18:26.890,0:18:32.550 Alice and Bob haven’t connected to one[br]another directly. The rendezvous point 0:18:32.550,0:18:36.530 doesn’t know who Bob is, the rendezvous[br]point doesn’t know who Alice is. 0:18:36.530,0:18:40.261 All it’s doing is forwarding the[br]traffic. And it can’t inspect the traffic, 0:18:40.261,0:18:43.740 either, because the traffic itself[br]is encrypted. 0:18:43.740,0:18:47.530 So that’s currently how you solve this[br]problem with trying to communicate 0:18:47.530,0:18:50.820 with someone who you don’t know[br]who they are and vice versa. 0:18:50.820,0:18:55.740 drinks from the bottle 0:18:55.740,0:18:58.870 The principal thing I’m going to talk[br]about today is this database. 0:18:58.870,0:19:01.990 So I said, Bob, when he picks his[br]introduction points he builds this thing 0:19:01.990,0:19:06.080 called a descriptor, describing who his[br]introduction points are, and he publishes 0:19:06.080,0:19:10.390 it to a database. This database itself[br]is distributed throughout the Tor network. 0:19:10.390,0:19:17.860 It’s not a single server. So both Bob and[br]Alice need to be able to publish information 0:19:17.860,0:19:22.040 to this database, and also retrieve[br]information from this database. And Tor 0:19:22.040,0:19:24.820 currently uses something called[br]a distributed hash table, which I’m gonna 0:19:24.820,0:19:27.930 give an example of what this means and[br]how it works. And then I’ll talk to you 0:19:27.930,0:19:34.380 specifically about how the Tor Distributed Hash[br]Table works itself. So let’s say e.g. 0:19:34.380,0:19:39.830 you've got a set of servers. 
So here we've[br]got 26 servers and you’d like to store 0:19:39.830,0:19:44.240 your files across these different servers[br]without having a single server responsible 0:19:44.240,0:19:48.050 for deciding, “okay, that file is stored[br]on that server, and this file is stored 0:19:48.050,0:19:53.050 on that server” etc. etc. Now here is my[br]list of files. You could take a very naive 0:19:53.050,0:19:57.740 approach. And you could say: “Okay, I’ve[br]got 26 servers, I’ve got all of these file names 0:19:57.740,0:20:01.250 and they start with a letter of the alphabet.”[br]And I could say: “All of the files that begin 0:20:01.250,0:20:05.450 with A are gonna go on server A; all[br]the files that begin with B are gonna go 0:20:05.450,0:20:09.900 on server B etc.” And then when you want[br]to retrieve a file you say: “Okay, what 0:20:09.900,0:20:13.950 does my file name begin with?” And then[br]you know which server it’s stored on. 0:20:13.950,0:20:17.750 Now of course you could have a lot of[br]servers – sorry – a lot of files 0:20:17.750,0:20:22.780 which begin with a Z, an X or a Y etc. in[br]which case you’re gonna overload 0:20:22.780,0:20:27.310 that server. You’re gonna have more files[br]stored on one server than on another server 0:20:27.310,0:20:32.150 in your set. And if you have a lot of big[br]files, say e.g. beginning with B then 0:20:32.150,0:20:35.520 rather than distributing your files across[br]all the servers you’re gonna just be 0:20:35.520,0:20:39.060 overloading one or two of them. So to[br]solve this problem what we tend to do is: 0:20:39.060,0:20:42.410 we take the file name, and we run it[br]through a cryptographic hash function. 0:20:42.410,0:20:46.930 A hash function produces output which[br]looks random. A very small change 0:20:46.930,0:20:50.740 in the input to a cryptographic hash[br]function produces a very large change 0:20:50.740,0:20:55.240 in the output. And this change looks[br]random. 
So if I take all of my file names 0:20:55.240,0:20:59.820 here, and assuming I have a lot more,[br]I take a hash of them, and then I use 0:20:59.820,0:21:05.470 that hash to determine which server to[br]store the file on. Then, with high probability 0:21:05.470,0:21:09.670 my files will be distributed evenly across[br]all of the servers. And then when I want 0:21:09.670,0:21:12.990 to go and retrieve one of the files I take[br]my file name, I run it through the 0:21:12.990,0:21:15.980 cryptographic hash function, that gives me[br]the hash, and then I use that hash 0:21:15.980,0:21:19.740 to identify which server that particular[br]file is stored on. And then I go and 0:21:19.740,0:21:25.990 retrieve it. So that’s sort of a loose[br]idea of how a distributed hash table works. 0:21:25.990,0:21:29.340 There are a couple of problems with this.[br]What if you’ve got a changing size, what 0:21:29.340,0:21:34.700 if the number of servers you’ve got changes,[br]as it does in the Tor network? 0:21:34.700,0:21:42.290 It’s a very brief overview of the theory.[br]So how does it apply to the Tor network? 0:21:42.290,0:21:47.640 Well, the Tor network has a set of relays[br]and it has a set of hidden services. 0:21:47.640,0:21:52.710 Now we take all of the relays, and they[br]have a hash identity which identifies them. 0:21:52.710,0:21:57.460 And we map them onto a circle using that[br]hash value as an identifier. So you can 0:21:57.460,0:22:03.230 imagine the hash value ranging from Zero[br]to a very large number. We got a Zero point 0:22:03.230,0:22:07.280 at the very top there. And that runs all[br]the way round to the very large number. 0:22:07.280,0:22:12.130 So given the identity hash for a relay we[br]can map that to a particular point on 0:22:12.130,0:22:19.070 the [circle]. And then all we have to do[br]is also do this for hidden services. 
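The file-placement example from a moment ago can be sketched in a few lines. The file names are invented, and reducing the hash modulo the number of servers is an assumption made to keep the example short; it is exactly the part that copes badly with a changing number of servers, which is why the relays are placed on a circle instead.

```python
import hashlib
from collections import Counter

def server_for(filename, num_servers):
    """Hash the file name and reduce it to a server index (modulo is a simplification)."""
    digest = hashlib.sha1(filename.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_servers

# Invented worst case for the naive scheme: every name starts with the same letter.
filenames = ["report_%d.txt" % i for i in range(2600)]

# Naive placement by first letter: everything lands on a single server.
by_letter = Counter(name[0] for name in filenames)

# Placement by hash: the same names spread across all 26 servers.
by_hash = Counter(server_for(name, 26) for name in filenames)

# Retrieval repeats the same computation, so no central index is needed.
where = server_for("report_42.txt", 26)
```

Here `by_letter` has a single entry holding all 2600 files, while `by_hash` spreads them over the 26 servers at roughly 100 each.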
0:22:19.070,0:22:22.320 So there’s a hidden service address,[br]something.onion, so this is 0:22:22.320,0:22:27.750 one of the hidden websites that you might[br]visit. You take the – I’m not gonna describe 0:22:27.750,0:22:33.980 in too much detail how this is done but –[br]the value is done in such a way such that 0:22:33.980,0:22:38.020 it’s evenly distributed about the circle.[br]So your hidden service will have 0:22:38.020,0:22:44.240 a particular point on the circle. And the[br]relays will also be mapped onto this circle. 0:22:44.240,0:22:49.640 So there’s the relays. And the hidden[br]service. And in the case of Tor 0:22:49.640,0:22:53.460 the hidden service actually maps to two[br]positions on the circle, and it publishes 0:22:53.460,0:22:57.850 its descriptor to the three relays to the[br]right at one position, and the three relays 0:22:57.850,0:23:01.600 to the right at another position. So there[br]are actually in total six places where 0:23:01.600,0:23:05.060 this descriptor is published on the[br]circle. And then if I want to go and 0:23:05.060,0:23:09.450 fetch and connect to a hidden service[br]I go on to go and pull this hidden descriptor 0:23:09.450,0:23:13.780 down to identify what its introduction[br]points are. I take the hidden service 0:23:13.780,0:23:17.200 address, I find out where it is on the[br]circle, I map all of the relays onto 0:23:17.200,0:23:21.110 the circle, and then I identify which[br]relays on the circle are responsible 0:23:21.110,0:23:24.031 for that particular hidden service. And[br]I just connect, then I say: “Do you have 0:23:24.031,0:23:26.630 a copy of the descriptor for that[br]particular hidden service?” 0:23:26.630,0:23:29.620 And if so then we’ve got our list of[br]introduction points. And we can go 0:23:29.620,0:23:38.020 to the next steps to connect to our hidden[br]service. So I’m gonna explain how we 0:23:38.020,0:23:41.320 sort of set up our experiments. 
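Before turning to the experiments, the descriptor lookup just described can be sketched as a ring search. How the two positions are derived from the onion address is not covered in the talk, so the `descriptor_positions` helper below, the SHA-1 labels and the relay names are purely hypothetical; only the "three relays to the right of each of two positions" rule comes from the description above.

```python
import bisect
import hashlib

def ring_position(label):
    """Map a byte label to a point on the circle (a 160-bit integer)."""
    return int.from_bytes(hashlib.sha1(label).digest(), "big")

def responsible_relays(relay_positions, descriptor_position, count=3):
    """Return the `count` relay positions that follow the descriptor, wrapping around."""
    ordered = sorted(relay_positions)
    start = bisect.bisect_right(ordered, descriptor_position)
    return [ordered[(start + i) % len(ordered)] for i in range(count)]

def descriptor_positions(onion_address, replicas=2):
    """Hypothetical derivation of the two positions; not the real Tor formula."""
    return [ring_position(("%s|replica-%d" % (onion_address, r)).encode())
            for r in range(replicas)]

# Invented relays, keyed by their position on the circle.
relays = {ring_position(("relay-%d" % i).encode()): "relay-%d" % i
          for i in range(40)}

# A client and the service run the same computation and arrive at the same relays.
targets = []
for position in descriptor_positions("something.onion"):
    for relay_position in responsible_relays(relays.keys(), position):
        targets.append(relays[relay_position])
```

With two positions and three relays each, `targets` lists the six places a descriptor would be published to, and the same six a client would ask. The wrap-around in `responsible_relays` handles a descriptor that falls just before the zero point of the circle.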
What we[br]thought, or what we were interested to do, 0:23:41.320,0:23:48.181 was collect publications of hidden[br]services. So for everytime a hidden service 0:23:48.181,0:23:51.520 gets set up it publishes to this distributed[br]hash table. What we wanted to do was 0:23:51.520,0:23:55.750 collect those publications so that we[br]get a complete list of all of the hidden 0:23:55.750,0:23:59.280 services. And what we also wanted to do[br]is to find out how many times a particular 0:23:59.280,0:24:06.300 hidden service is requested. 0:24:06.300,0:24:10.540 Just one more point that[br]will become important later. 0:24:10.540,0:24:14.230 The position which the hidden service[br]appears on the circle changes 0:24:14.230,0:24:18.950 every 24 hours. So there’s not[br]a fixed position every single day. 0:24:18.950,0:24:24.370 If we run 40 nodes over a long period of[br]time we will occupy positions within 0:24:24.370,0:24:29.570 that distributed hash table. And we will be[br]able to collect publications and requests 0:24:29.570,0:24:34.300 for hidden services that are located at[br]that position inside the distributed 0:24:34.300,0:24:39.251 hash table. So in that case we ran 40 Tor[br]nodes, we had a student at university 0:24:39.251,0:24:43.950 who said: “Hey, I run a hosting company,[br]I got loads of server capacity”, and 0:24:43.950,0:24:46.580 we told him what we were doing, and he[br]said: “Well, you really helped us out, 0:24:46.580,0:24:49.820 these last couple of years…”[br]and just gave us loads of server capacity 0:24:49.820,0:24:55.500 to allow us to do this. So we spun up 40[br]Tor nodes. Each Tor node was required 0:24:55.500,0:24:59.560 to advertise a certain amount of bandwidth[br]to become a part of that distributed 0:24:59.560,0:25:02.200 hash table. It’s actually a very small[br]amount, so this didn’t matter too much. 
0:25:02.200,0:25:06.050 And then, after – this has changed[br]recently in the last few days, 0:25:06.050,0:25:10.070 it used to be 25 hours, it’s just been[br]increased as a result of one of the 0:25:10.070,0:25:14.570 attacks last week. But here… certainly[br]during our study it was 25 hours. You then 0:25:14.570,0:25:18.300 appear at a particular point inside that[br]distributed hash table. And you’re then 0:25:18.300,0:25:22.750 in a position to record publications of[br]hidden services and requests for hidden 0:25:22.750,0:25:27.810 services. So not only can you get a full[br]list of the onion addresses, you can also 0:25:27.810,0:25:32.250 find out how many times each of the[br]onion addresses is requested. 0:25:32.250,0:25:38.270 And so this is what we recorded. And then,[br]once we had a full list of… or once 0:25:38.270,0:25:41.830 we had run for a long period of time to[br]collect a long list of .onion addresses 0:25:41.830,0:25:46.850 we then built a custom crawler that would[br]visit each of the Tor hidden services 0:25:46.850,0:25:51.450 in turn, and pull down the HTML contents,[br]the text content from the web page, 0:25:51.450,0:25:54.760 so that we could go ahead and classify[br]the content. Now it’s really important 0:25:54.760,0:25:59.250 to note here, and it will become obvious[br]why a little bit later, we only pulled down 0:25:59.250,0:26:03.030 HTML content. We didn’t pull down images.[br]And there’s a very, very important reason 0:26:03.030,0:26:09.980 for that which will become clear shortly. 0:26:09.980,0:26:13.520 We had a lot of questions when we[br]first started this. No one really knew 0:26:13.520,0:26:18.000 how many hidden services there were. It had[br]been suggested to us there was a very high 0:26:18.000,0:26:21.250 turnover of hidden services. We wanted to[br]confirm whether that was true or not.
0:26:21.250,0:26:24.530 And we also wanted to ask:[br]what are the hidden services, 0:26:24.530,0:26:30.140 how popular are they, etc. etc. etc. So[br]our estimate for how many hidden services 0:26:30.140,0:26:34.770 there are, over the period in which we[br]ran our study, this is a graph plotting 0:26:34.770,0:26:38.560 our estimate for each of the individual[br]days as to how many hidden services 0:26:38.560,0:26:44.850 there were on that particular day. Now the[br]data is naturally noisy because we’re only 0:26:44.850,0:26:48.590 a very small proportion of that circle.[br]So we’re only observing a very small 0:26:48.590,0:26:53.250 proportion of the total publications and[br]requests every single day, for each of 0:26:53.250,0:26:57.260 those hidden services. And if you[br]take a long-term average for this 0:26:57.260,0:27:02.720 there are about 45,000 hidden services that[br]we think were present, on average, 0:27:02.720,0:27:07.880 each day, during our entire study. Which[br]is a large number of hidden services. 0:27:07.880,0:27:11.070 But over the entire length we[br]collected about 80,000, in total. 0:27:11.070,0:27:14.270 Some came and went etc.[br]So the next question after how many 0:27:14.270,0:27:17.750 hidden services there are is how long[br]a hidden service exists for. 0:27:17.750,0:27:20.620 Does it exist for a very long period[br]of time, does it exist for a very short 0:27:20.620,0:27:24.220 period of time etc. etc.[br]So what we did was, for every single 0:27:24.220,0:27:30.260 .onion address we plotted how many times[br]we saw a publication for that particular 0:27:30.260,0:27:34.160 hidden service during the six months.[br]How many times did we see it? 0:27:34.160,0:27:38.100 If we saw it a lot of times that suggested[br]in general the hidden service existed 0:27:38.100,0:27:42.180 for a very long period of time.
If we saw[br]a very small number of publications 0:27:42.180,0:27:45.760 for each hidden service then that[br]suggests that they were only present 0:27:45.760,0:27:51.690 for a very short period of time. This is[br]our graph. By far the largest number 0:27:51.690,0:27:55.890 of hidden services we only saw once during[br]the entire study. And we never saw them 0:27:55.890,0:28:00.390 again. This suggests that there’s a very high[br]turnover of the hidden services, they 0:28:00.390,0:28:04.520 don’t tend to exist, on average, for[br]a very long period of time. 0:28:04.520,0:28:10.730 And then you can see sort of[br]a tail here. If we plot just those 0:28:10.730,0:28:16.390 hidden services which existed for a long[br]time, so e.g. we could take hidden services 0:28:16.390,0:28:20.280 which have a high number of hit requests[br]and say: “Okay, those that have a high number 0:28:20.280,0:28:24.800 of hits probably existed for a long time.”[br]That’s not absolutely certain, but probably. 0:28:24.800,0:28:29.190 Then you see this sort of -normal- plot[br]around 4 to 5, so we saw on average 0:28:29.190,0:28:34.870 most hidden services four or five times[br]during the entire six months if they were 0:28:34.870,0:28:40.530 popular and we’re using that as a proxy[br]measure for whether they existed 0:28:40.530,0:28:48.160 for the entire time. Now, this stage was[br]over 160 days, so almost six months. 0:28:48.160,0:28:51.490 What we also wanted to do was try[br]to confirm this over a longer period. 0:28:51.490,0:28:56.310 So last year, in 2013, about February time[br]some researchers at the University 0:28:56.310,0:29:00.350 of Luxembourg also ran a similar study[br]but it ran over a very short period of time, 0:29:00.350,0:29:05.060 over a day. But they did it in such[br]a way that they could collect descriptors 0:29:05.060,0:29:08.590 across much of the circle during a single[br]day.
That was because of a bug in the way 0:29:08.590,0:29:12.020 Tor did some of the things which has[br]now been fixed so we can’t repeat that 0:29:12.020,0:29:16.520 in that particular way. So we got a list of[br].onion addresses from February 2013 0:29:16.520,0:29:18.960 from these researchers at the University[br]of Luxembourg. And then we got our list 0:29:18.960,0:29:23.670 of .onion addresses from this six months[br]which was March to September of this year. 0:29:23.670,0:29:26.700 And we wanted to say, okay, we’re given[br]these two sets of .onion addresses. 0:29:26.700,0:29:30.740 Which .onion addresses existed in their set[br]but not ours and vice versa, and which 0:29:30.740,0:29:39.740 .onion addresses existed in both sets? 0:29:39.740,0:29:45.520 So as you can see a very small minority[br]of hidden service addresses existed 0:29:45.520,0:29:50.000 in both sets. This is over an 18-month[br]period between these two collection points. 0:29:50.000,0:29:54.430 A very small number of services existed[br]in both their data set and in 0:29:54.430,0:29:58.390 our data set. Which again suggested[br]there’s a very high turnover of hidden 0:29:58.390,0:30:02.920 services that don’t tend to exist[br]for a very long period of time. 0:30:02.920,0:30:06.530 So the question is why is that?[br]Which we’ll come on to a little bit later. 0:30:06.530,0:30:11.120 It’s a very valid question, can’t answer[br]it 100%, we have some inklings as to 0:30:11.120,0:30:15.560 why that may be the case. So in terms[br]of popularity which hidden services 0:30:15.560,0:30:19.700 did we see, or which .onion addresses[br]did we see requested the most? 0:30:19.700,0:30:26.980 Which got the most number of hits? Or the[br]most number of directory requests. 0:30:26.980,0:30:30.120 So botnet Command & Control servers[br]– if you’re not familiar with what 0:30:30.120,0:30:34.340 a botnet is, the idea is to infect lots of[br]people with a piece of malware.
0:30:34.340,0:30:37.630 And this malware phones home to[br]a Command & Control server where 0:30:37.630,0:30:41.500 the botnet master can give instructions[br]to each of the bots on what to do. 0:30:41.500,0:30:46.780 So it might be e.g. to collect passwords,[br]key strokes, banking details. 0:30:46.780,0:30:51.010 Or it might be to do things like[br]Distributed Denial of Service attacks, 0:30:51.010,0:30:55.220 or to send spam, those sorts of things.[br]And a couple of years ago someone gave 0:30:55.220,0:31:00.720 a talk and said: “Well, the problem with[br]running a botnet is your C&C servers 0:31:00.720,0:31:05.750 are vulnerable.” Once a C&C server is taken[br]down you no longer have control over 0:31:05.750,0:31:10.030 your botnet. So it’s been a sort of arms[br]race between anti-virus companies and 0:31:10.030,0:31:15.130 malware authors, who try to come up[br]with techniques to run C&C servers in a way 0:31:15.130,0:31:18.490 in which they can’t be taken down. And[br]a couple of years ago someone gave a talk 0:31:18.490,0:31:22.450 at a conference that said: “You know what?[br]It would be a really good idea if botnet 0:31:22.450,0:31:25.809 C&C servers were run as Tor hidden[br]services because then no one knows 0:31:25.809,0:31:29.370 where they are, and in theory they can’t[br]be taken down.” So in fact we have this: 0:31:29.370,0:31:33.000 there are loads and loads and loads of[br]these addresses associated with several 0:31:33.000,0:31:38.122 different botnets, ‘Sefnit’ and ‘Skynet’.[br]Now Skynet is the one I wanted to talk 0:31:38.122,0:31:42.840 to you about because the guy that runs[br]Skynet had a Twitter account, and he also 0:31:42.840,0:31:47.210 did a Reddit AMA. If you’ve not heard[br]of a Reddit AMA before, that’s a Reddit 0:31:47.210,0:31:51.500 ask-me-anything. You can go on the website[br]and ask the guy anything. So this guy 0:31:51.500,0:31:54.790 wasn’t hiding in the shadows.
He’d say:[br]“Hey, I’m running this massive botnet, 0:31:54.790,0:31:58.180 here’s my Twitter account which I update[br]regularly, here is my Reddit AMA where 0:31:58.180,0:32:01.620 you can ask me questions!” etc. 0:32:01.620,0:32:04.590 He was arrested last year, which is not,[br]perhaps, a huge surprise. 0:32:04.590,0:32:11.750 laughter and applause 0:32:11.750,0:32:15.970 But… so he was arrested,[br]his C&C servers disappeared 0:32:15.970,0:32:21.600 but there were still infected hosts trying[br]to connect with the C&C servers and 0:32:21.600,0:32:24.490 request access to the C&C server. 0:32:24.490,0:32:27.570 This is why we’re saying: “A large number[br]of hits.” So all of these requests are 0:32:27.570,0:32:31.520 failed requests, i.e. we didn’t have[br]a descriptor for them because 0:32:31.520,0:32:34.910 the hidden service had gone away but[br]there were still clients requesting each 0:32:34.910,0:32:38.040 of the hidden services. 0:32:38.040,0:32:41.980 And the next thing we wanted to do was[br]to try and categorize sites. So, as I said 0:32:41.980,0:32:45.960 earlier, we crawled all of the hidden[br]services that we could, and we classified 0:32:45.960,0:32:50.230 them into different categories based[br]on what the type of content was 0:32:50.230,0:32:53.650 on the hidden service side. The first[br]graph I have is the number of sites 0:32:53.650,0:32:58.040 in each of the categories. So you can see[br]down the bottom here we got lots of 0:32:58.040,0:33:04.280 different categories. We got drugs, market[br]places, etc. on the bottom. And the graph 0:33:04.280,0:33:07.360 shows the percentage of the hidden[br]services that we crawled that fit in 0:33:07.360,0:33:12.680 to each of these categories. So e.g. looking[br]at this, drugs, the most number of sites 0:33:12.680,0:33:16.250 that we crawled were made up of[br]drugs-focused websites, followed by 0:33:16.250,0:33:20.970 market places etc. 
There’s a couple of[br]questions you might have here, 0:33:20.970,0:33:25.640 so which ones are gonna stick out, what[br]does ‘porn’ mean, well, you know 0:33:25.640,0:33:31.060 what ‘porn’ means. There are some very[br]notorious porn sites on the Tor Darknet. 0:33:31.060,0:33:34.470 There was one in particular which was[br]focused on revenge porn. It turns out 0:33:34.470,0:33:37.520 that youngsters wish to take pictures[br]of themselves, and send them to their 0:33:37.520,0:33:45.040 boyfriends or their girlfriends. And[br]when they get dumped they publish them 0:33:45.040,0:33:49.750 on these websites. So there were several[br]of these sites on the main internet 0:33:49.750,0:33:53.070 which have mostly been shut down.[br]And some of these sites were archived 0:33:53.070,0:33:58.220 on the Darknet. The second one that[br]you might wonder about 0:33:58.220,0:34:03.430 is ‘abuse’. Abuse was… every single[br]site we classified in this category 0:34:03.430,0:34:07.750 was a child abuse site. So they were in[br]some way facilitating child abuse. 0:34:07.750,0:34:10.980 And how do we know that? Well, the data[br]that came back from the crawler 0:34:10.980,0:34:14.789 made it completely unambiguous as to what[br]the content was in these sites. That was 0:34:14.789,0:34:18.918 completely obvious, from the content, from[br]the crawler as to what was on these sites. 0:34:18.918,0:34:23.449 And this is the principal reason why we[br]didn’t pull down images from sites. 0:34:23.449,0:34:26.099 In many countries it[br]would be a criminal offense to do so. 0:34:26.099,0:34:29.530 So our crawler only pulled down text[br]content from all of these sites, and that 0:34:29.530,0:34:34.470 enabled us to classify them, based on[br]that. We didn’t pull down any images.
0:34:34.470,0:34:37.880 So of course the next thing we liked to do[br]is to say: “Okay, well, given each of these 0:34:37.880,0:34:42.759 categories, what proportion of directory[br]requests went to each of the categories?” 0:34:42.759,0:34:45.489 Now the next graph is going to need some[br]explaining as to precisely what it 0:34:45.489,0:34:52.090 means, and I’m gonna give that. This is[br]the proportion of directory requests 0:34:52.090,0:34:55.830 which we saw that went to each of the[br]categories of hidden service that we 0:34:55.830,0:34:59.740 classified. As you can see, in fact, we[br]saw a very large number going to these 0:34:59.740,0:35:05.010 abuse sites. And the rest sort of[br]distributed right there, at the bottom. 0:35:05.010,0:35:07.230 And the question is: “What is it[br]we’re collecting here?” 0:35:07.230,0:35:12.070 We’re collecting successful hidden service[br]directory requests. What does a hidden 0:35:12.070,0:35:16.790 service directory request mean?[br]It probably loosely correlates with 0:35:16.790,0:35:22.230 either a visit or a visitor. So somewhere[br]in between those two. Because when you 0:35:22.230,0:35:26.790 want to visit a hidden service you make[br]a request for the hidden service descriptor 0:35:26.790,0:35:31.080 and that allows you to connect to it[br]and browse through the web site. 0:35:31.080,0:35:34.770 But there are cases where, e.g. if you[br]restart Tor, you’ll go back and you 0:35:34.770,0:35:40.100 re-fetch the descriptor. So in that case[br]we’ll count twice, for example. 0:35:40.100,0:35:43.050 What proportion of these are people,[br]and which proportion of them are 0:35:43.050,0:35:46.619 something else? The answer to that is[br]we just simply don’t know. 0:35:46.619,0:35:50.250 We've got directory requests but that doesn’t[br]tell us about what they’re doing on these 0:35:50.250,0:35:55.130 sites, what they’re fetching, or who[br]indeed they are, or what it is they are. 
0:35:55.130,0:35:58.690 So these could be automated requests,[br]they could be human beings. We can’t 0:35:58.690,0:36:03.750 distinguish between those two things. 0:36:03.750,0:36:06.420 What are the limitations? 0:36:06.420,0:36:12.170 A hidden service directory request neither[br]exactly correlates to a visit -or- a visitor. 0:36:12.170,0:36:16.380 It’s probably somewhere in between.[br]So you can’t say whether it’s exactly one 0:36:16.380,0:36:19.810 or the other. We cannot say whether[br]a hidden service directory request 0:36:19.810,0:36:26.230 is a person or something automated.[br]We can’t distinguish between those two. 0:36:26.230,0:36:31.890 Any type of site could be targeted by e.g.[br]DoS attacks, by web crawlers which would 0:36:31.890,0:36:40.040 greatly inflate the figures. If you were[br]to do a DoS attack it’s likely you’d only 0:36:40.040,0:36:44.700 request a small number of descriptors.[br]You’d actually be flooding the site itself 0:36:44.700,0:36:47.740 rather than the directories. But, in[br]theory, you could flood the directories. 0:36:47.740,0:36:52.840 But we didn’t see any sort of shutdown[br]of our directories based on flooding, e.g. 0:36:52.840,0:36:58.720 Whilst we can’t rule that out, it doesn’t[br]seem to fit too well with what we’ve got. 0:36:58.720,0:37:02.971 The other question is ‘crawlers’.[br]I obviously talked with the Tor Project 0:37:02.971,0:37:08.570 about these results and they’ve suggested[br]that there are groups, so the child 0:37:08.570,0:37:12.740 protection agencies e.g. that will crawl[br]these sites on a regular basis. And, 0:37:12.740,0:37:15.879 again, that doesn’t necessarily correlate[br]with a human being. And that could 0:37:15.879,0:37:19.830 inflate the figures. How many hidden[br]directory requests would there be 0:37:19.830,0:37:24.610 if a crawler was pointed at it. Typically,[br]if I crawl them on a single day, one request. 
0:37:24.610,0:37:27.850 But if they’ve got a large number of servers[br]doing the crawling then it could be 0:37:27.850,0:37:32.840 a request per day for every single server.[br]So, again, I can’t give you a definitive 0:37:32.840,0:37:37.930 “yes, this is human beings” or[br]“yes, this is automated requests”. 0:37:37.930,0:37:43.300 The other important point is, these two[br]content graphs are only hidden services 0:37:43.300,0:37:48.550 offering web content. There are hidden[br]services that do other things, e.g. IRC, 0:37:48.550,0:37:52.490 the instant messaging etc. Those aren’t[br]included in these figures. We’re only 0:37:52.490,0:37:57.990 concentrating on hidden services offering[br]web sites. They’re HTTP services, or HTTPS 0:37:57.990,0:38:01.640 services. Because that allows us to easily[br]classify them. And, in fact, for some of 0:38:01.640,0:38:06.080 the other types, like IRC and Jabber, the[br]result is probably not directly comparable 0:38:06.080,0:38:08.920 with web sites. The sort of use[br]case for using them is probably 0:38:08.920,0:38:16.490 slightly different. So I appreciate the[br]last graph is somewhat alarming. 0:38:16.490,0:38:20.640 If you have any questions please ask[br]either me or the Tor developers 0:38:20.640,0:38:24.810 as to how to interpret these results. It’s[br]not quite as straight-forward as it may 0:38:24.810,0:38:27.500 look when you look at the graph. You[br]might look at the graph and say: “Hey, 0:38:27.500,0:38:30.980 that looks like there’s lots of people[br]visiting these sites”. It’s difficult 0:38:30.980,0:38:40.240 to conclude that from the results. 0:38:40.240,0:38:45.990 The next slide is gonna be very[br]contentious. I will prefix it with: 0:38:45.990,0:38:50.970 “I’m not advocating -any- kind of[br]action whatsoever. I’m just trying 0:38:50.970,0:38:56.130 to describe technically what could[br]be done.
It’s not up to me to make decisions 0:38:56.130,0:39:02.869 on these types of things.” So, of course,[br]when we found this out, frankly, I think 0:39:02.869,0:39:06.190 we were stunned. I mean, it took us[br]several days, frankly, it just stunned us, 0:39:06.190,0:39:09.610 “what the hell, this is not[br]what we expected at all.” 0:39:09.610,0:39:13.210 So a natural step is, well, we think, most[br]of us think that Tor is a great thing, 0:39:13.210,0:39:18.510 it seems. Could this problem be sorted out[br]while still keeping Tor as it is? 0:39:18.510,0:39:21.510 And probably the next step to say: “Well,[br]okay, could we just block this class 0:39:21.510,0:39:26.060 of content and not other types of content?”[br]So could we block just hidden services 0:39:26.060,0:39:29.630 that are associated with these sites and[br]not other types of hidden services? 0:39:29.630,0:39:33.370 We thought there’s three ways in which[br]we could block hidden services. 0:39:33.370,0:39:36.960 And I’ll talk about whether these were[br]impossible in the coming months, 0:39:36.960,0:39:39.430 after explaining them. But during our[br]study these would have been impossible 0:39:39.430,0:39:43.590 and presently they are possible. 0:39:43.590,0:39:48.630 A single individual could shut down[br]a single hidden service by controlling 0:39:48.630,0:39:53.640 all of the relays which are responsible[br]for receiving a publication request 0:39:53.640,0:39:57.280 on that distributed hash table. It’s[br]possible to place one of your relays 0:39:57.280,0:40:01.460 at a particular position on that circle[br]and so therefore make yourself be 0:40:01.460,0:40:04.290 the responsible relay for[br]a particular hidden service. 
0:40:04.290,0:40:08.500 And if you control all of the six relays[br]which are responsible for a hidden service, 0:40:08.500,0:40:11.390 when someone comes to you and says:[br]“Can I have a descriptor for that site” 0:40:11.390,0:40:15.910 you can just say: “No, I haven’t got it”.[br]And provided you control those relays 0:40:15.910,0:40:20.580 users won’t be able to fetch those sites. 0:40:20.580,0:40:25.010 The second option is you could say:[br]“Okay, the Tor Project are blocking these” 0:40:25.010,0:40:28.941 – which I’ll talk about in a second –[br]“as a relay operator”. Could I 0:40:28.941,0:40:32.500 as a relay operator say: “Okay, as[br]a relay operator I don’t want to carry 0:40:32.500,0:40:35.930 this type of content, and I don’t want to[br]be responsible for serving up this type 0:40:35.930,0:40:39.930 of content.” A relay operator could patch[br]his relay and say: “You know what, 0:40:39.930,0:40:44.020 if anyone comes to this relay requesting[br]anyone of these sites then, again, just 0:40:44.020,0:40:48.740 refuse to do it”. The problem is a lot of[br]relay operators need to do it. So a very, 0:40:48.740,0:40:51.990 very large number of the potential relay[br]operators would need to do that 0:40:51.990,0:40:56.170 to effectively block these sites. 
The[br]final option is the Tor Project could 0:40:56.170,0:41:00.740 modify the Tor program and actually embed[br]these addresses in the Tor program itself 0:41:00.740,0:41:05.030 so that all relays by default[br]block hidden service directory requests 0:41:05.030,0:41:10.560 to these sites, and also clients themselves[br]would say: “Okay, if anyone’s requesting 0:41:10.560,0:41:15.000 these block them at the client level.”[br]Now I hasten to add: I’m not advocating 0:41:15.000,0:41:18.230 any kind of action, that is entirely up to[br]other people, because, frankly, I think 0:41:18.230,0:41:22.530 if I advocated blocking hidden services[br]I probably wouldn’t make it out alive, 0:41:22.530,0:41:27.050 so I’m just saying: this is a description[br]of what technical measures could be used 0:41:27.050,0:41:30.730 to block some classes of sites. And of[br]course there’s lots of questions here. 0:41:30.730,0:41:35.150 If e.g. the Tor Project themselves decided:[br]“Okay, we’re gonna block these sites” 0:41:35.150,0:41:38.490 that means they are essentially[br]in control of the block list. 0:41:38.490,0:41:41.360 The block list would be somewhat public[br]so everyone would be able to inspect 0:41:41.360,0:41:44.930 what the sites are that are being blocked[br]and they would be in control of some kind 0:41:44.930,0:41:54.360 of block list. Which, you know, arguably[br]is against what the Tor Project is after. 0:41:54.360,0:41:59.560 takes a sip, coughs 0:41:59.560,0:42:05.480 So how about deanonymising visitors[br]to hidden service web sites? 0:42:05.480,0:42:08.940 So in this case we’ve got a user on the[br]left-hand side who is connected to 0:42:08.940,0:42:12.630 a Guard node. We’ve got a hidden service[br]on the right-hand side who is connected 0:42:12.630,0:42:17.530 to a Guard node and on the top we’ve got[br]one of those directory servers which is 0:42:17.530,0:42:21.850 responsible for serving up those[br]hidden service directory requests.
0:42:21.850,0:42:28.660 Now, when you first want to connect to[br]a hidden service you connect through 0:42:28.660,0:42:31.619 your Guard node and through a couple of hops[br]up to the hidden service directory and 0:42:31.619,0:42:35.840 you request the descriptor from it.[br]So at this point if you are the attacker 0:42:35.840,0:42:39.440 and you control one of the hidden service[br]directory nodes for a particular site 0:42:39.440,0:42:43.100 you can send back down the circuit[br]a particular pattern of traffic. 0:42:43.100,0:42:47.740 And if you control that user’s[br]Guard node – which is a big if – 0:42:47.740,0:42:52.110 then you can spot that pattern of traffic[br]at the Guard node. The question is: 0:42:52.110,0:42:56.940 “How do you control a particular user’s[br]Guard node?” That’s very, very hard. 0:42:56.940,0:43:01.480 But if e.g. I run a hidden service and all[br]of you visit my hidden service, and 0:43:01.480,0:43:05.670 I’m running a couple of dodgy Guard relays[br]then the probability is that some of you, 0:43:05.670,0:43:09.760 certainly not all of you by any stretch, will[br]select my dodgy Guard relay, and 0:43:09.760,0:43:13.220 I could deanonymise you, but I couldn’t[br]deanonymise the rest of you. 0:43:13.220,0:43:18.260 So what we’re saying here is that[br]you can deanonymise some of the users 0:43:18.260,0:43:22.130 some of the time but you can’t pick which[br]users those are which you’re going to 0:43:22.130,0:43:26.609 deanonymise. You can’t deanonymise someone[br]specific but you can deanonymise a fraction 0:43:26.609,0:43:32.170 based on what fraction of the network you[br]control in terms of Guard capacity. 0:43:32.170,0:43:36.340 How about… so the attacker controls those[br]two – here’s a picture from researchers at 0:43:36.340,0:43:40.200 the University of Luxembourg who[br]did this.
And these are plots of 0:43:40.200,0:43:45.270 taking the IP address of a user visiting[br]a C&C server, and then geolocating it 0:43:45.270,0:43:48.480 and putting it on a map. So “where was the[br]user located when they called one of 0:43:48.480,0:43:51.620 the Tor hidden services?” So, again,[br]this is a selection, a percentage 0:43:51.620,0:43:58.060 of the users visiting C&C servers[br]using this technique. 0:43:58.060,0:44:03.770 How about deanonymising hidden services[br]themselves? Well, again, you’ve got a problem. 0:44:03.770,0:44:08.340 You’re the user. You’re gonna connect[br]through your Guard into the Tor network. 0:44:08.340,0:44:12.160 And then, eventually, through the hidden[br]service’s Guard node, and talk to 0:44:12.160,0:44:16.740 the hidden service. As the attacker you[br]need to control the hidden service’s 0:44:16.740,0:44:20.859 Guard node to do these traffic correlation[br]attacks. So again, it’s very difficult 0:44:20.859,0:44:24.390 to deanonymise a specific Tor hidden[br]service. But if you think about it, okay, 0:44:24.390,0:44:30.200 there are 1,000 Tor hidden services, if you[br]can control a percentage of the Guard nodes 0:44:30.200,0:44:34.230 then some hidden services will pick you[br]and then you’ll be able to deanonymise those. 0:44:34.230,0:44:37.330 So provided you don’t care which hidden[br]services you’re gonna deanonymise 0:44:37.330,0:44:41.400 then it becomes much more straight-forward[br]to control the Guard nodes of some hidden 0:44:41.400,0:44:44.910 services but you can’t pick exactly[br]which those are. 0:44:44.910,0:44:51.040 So what sort of data can you see[br]traversing a relay? 0:44:51.040,0:44:55.880 This is a modified Tor client which just[br]dumps cells which are coming… 0:44:55.880,0:44:58.750 essentially packets travelling down[br]a circuit, and the information you can 0:44:58.750,0:45:04.020 extract from them at a Guard node.[br]And this is done off the main Tor network.
0:45:04.020,0:45:08.590 So I’ve got a client connected to[br]a “malicious” Guard relay 0:45:08.590,0:45:14.040 and it logs every single packet – they’re[br]called ‘cells’ in the Tor protocol – 0:45:14.040,0:45:17.619 coming through the Guard relay. We can’t[br]decrypt the packet because it’s encrypted 0:45:17.619,0:45:21.780 three times. What we can record,[br]though, is the IP address of the user, 0:45:21.780,0:45:25.070 the IP address of the next hop,[br]and we can count packets travelling 0:45:25.070,0:45:29.240 in each direction down the circuit. And we[br]can also record the time at which those 0:45:29.240,0:45:32.210 packets were sent. So of course, if you’re[br]doing the traffic correlation attacks 0:45:32.210,0:45:37.970 you’re using that time in the information[br]to try and work out whether you’re seeing 0:45:37.970,0:45:42.370 traffic which you’ve sent and which[br]identifies a particular user or not. 0:45:42.370,0:45:44.810 Or indeed traffic which they’ve sent[br]which you’ve seen at a different point 0:45:44.810,0:45:49.100 in the network. 0:45:49.100,0:45:51.980 Moving on to my… 0:45:51.980,0:45:55.760 …interesting problems,[br]research questions etc. 0:45:55.760,0:45:59.250 Based on what I’ve said, I’ve said there’s[br]these directory authorities which are 0:45:59.250,0:46:05.070 controlled by the core Tor members. If[br]e.g. they were malicious then they could 0:46:05.070,0:46:08.990 manipulate the Tor… – if a big enough[br]chunk of them are malicious then 0:46:08.990,0:46:12.700 they can manipulate the consensus[br]to direct you to particular nodes. 0:46:12.700,0:46:15.920 I don’t think that’s the case, and that[br]anyone thinks that’s the case. 0:46:15.920,0:46:19.180 And Tor is designed in a way to tr…[br]I mean that you’d have to control 0:46:19.180,0:46:22.480 a certain number of the authorities[br]to be able to do anything important. 0:46:22.480,0:46:25.270 So the Tor people… I said this[br]to them a couple of days ago. 
0:46:25.270,0:46:28.780 I find it quite funny that you’d design[br]your system as if you don’t trust 0:46:28.780,0:46:31.880 each other. To which their response was:[br]“No, we design our system so that 0:46:31.880,0:46:35.620 we don’t have to trust each other.” Which[br]I think is a very good model to have, 0:46:35.620,0:46:39.430 when you have this type of system.[br]So could we eliminate these sort of 0:46:39.430,0:46:43.240 centralized servers? I think that’s[br]actually a very hard problem to do. 0:46:43.240,0:46:46.340 There are lots of attacks which could[br]potentially be deployed against 0:46:46.340,0:46:51.250 a decentralized network. At the moment the[br]Tor network is relatively well understood 0:46:51.250,0:46:54.490 both in terms of what types of attack it[br]is vulnerable to. So if we were to move 0:46:54.490,0:46:58.880 to a new architecture then we may open it[br]to a whole new class of attacks. 0:46:58.880,0:47:02.000 The Tor network has been existing[br]for quite some time and it’s been 0:47:02.000,0:47:06.820 very well studied. What about global[br]adversaries like the NSA, where you could 0:47:06.820,0:47:10.980 monitor network links all across the[br]world? It’s very difficult to defend 0:47:10.980,0:47:15.530 against that. Where they can monitor…[br]if they can identify which Guard relay 0:47:15.530,0:47:18.760 you’re using, they can monitor traffic[br]going into and out of the Guard relay, 0:47:18.760,0:47:23.259 and they log each of the subsequent hops[br]along. It’s very, very difficult to defend against 0:47:23.259,0:47:26.470 these types of things. Do we know if[br]they’re doing it? The documents that were 0:47:26.470,0:47:29.850 released yesterday – I’ve only had a very[br]brief look through them, but they suggest 0:47:29.850,0:47:32.480 that they’re not presently doing it and[br]they haven’t had much success. 
0:47:32.480,0:47:36.450 I don’t know why, there are very powerful[br]attacks described in the academic literature 0:47:36.450,0:47:40.830 which are very, very reliable and most[br]academic literature you can access for free 0:47:40.830,0:47:43.960 so it’s not even as if they have to figure[br]out how to do it. They just have to read 0:47:43.960,0:47:47.010 the academic literature and try and[br]implement some of these attacks. 0:47:47.010,0:47:52.000 I don’t know what – why they’re not. The[br]next question is how to detect malicious 0:47:52.000,0:47:57.760 relays. So in my case we’re running[br]40 relays. Our relays were on consecutive 0:47:57.760,0:48:01.570 IP addresses, so we’re running 40[br]– well, most of them are on consecutive 0:48:01.570,0:48:04.820 IP addresses in two blocks. So they’re[br]running on IP addresses numbered 0:48:04.820,0:48:09.280 e.g. 1,2,3,4,…[br]We were running two relays per IP address, 0:48:09.280,0:48:12.210 and every single relay had my name[br]plastered across it. 0:48:12.210,0:48:14.740 So after I set up these 40 relays in 0:48:14.740,0:48:17.420 a relatively short period of time[br]I expected someone from the Tor Project 0:48:17.420,0:48:22.260 to come to me and say: “Hey Gareth, what[br]are you doing?” – no one noticed, 0:48:22.260,0:48:26.090 no one noticed. So this is presently[br]an open question. On the Tor Project 0:48:26.090,0:48:28.790 they’re quite open about this. They[br]acknowledged that, in fact, last year 0:48:28.790,0:48:33.210 we had the CERT researchers launch much[br]more relays than that. The Tor Project 0:48:33.210,0:48:36.510 spotted those large number of relays[br]but chose not to do anything about it 0:48:36.510,0:48:40.119 and, in fact, they were deploying an[br]attack. But, as you know, it’s often very 0:48:40.119,0:48:43.700 difficult to defend against unknown[br]attacks. So at the moment how to detect 0:48:43.700,0:48:47.780 malicious relays is a bit of an open[br]question. 
Which I think is being 0:48:47.780,0:48:50.720 discussed on the mailing list. 0:48:50.720,0:48:54.230 The other one is defending against unknown[br]tampering at exits. If you took or take 0:48:54.230,0:48:57.220 the exit relays – the exit relay[br]can tamper with the traffic. 0:48:57.220,0:49:01.040 So we know particular types of attacks[br]doing SSL man-in-the-middles etc. 0:49:01.040,0:49:05.350 We’ve seen recently binary patching.[br]How do we detect unknown tampering 0:49:05.350,0:49:08.970 with traffic, other types of traffic? So[br]the binary tampering wasn’t spotted 0:49:08.970,0:49:12.060 until it was spotted by someone who[br]told the Tor Project. So it wasn’t 0:49:12.060,0:49:15.609 detected e.g. by the Tor Project[br]themselves, it was spotted by someone else 0:49:15.609,0:49:20.500 and notified to them. And then the final[br]one open on here is the Tor code review. 0:49:20.500,0:49:25.400 So the Tor code is open source. We know[br]from OpenSSL that, although everyone 0:49:25.400,0:49:29.260 can read source code, people don’t always[br]look at it. And OpenSSL has been 0:49:29.260,0:49:32.230 a huge mess, and there’s been[br]lots of stuff disclosed over that 0:49:32.230,0:49:35.880 over the last coming days. There are[br]lots of eyes on the Tor code but I think 0:49:35.880,0:49:41.519 always, more eyes are better. I’d say,[br]ideally if we can get people to look 0:49:41.519,0:49:45.140 at the Tor code and look for[br]vulnerabilities then… I encourage people 0:49:45.140,0:49:49.860 to do that. It’s a very useful thing to[br]do. There could be unknown vulnerabilities 0:49:49.860,0:49:53.119 as we’ve seen with the “relay early” type[br]quite recently in the Tor code which 0:49:53.119,0:49:56.990 could be quite serious. The truth is we[br]just don’t know until people do thorough 0:49:56.990,0:50:02.500 code audits, and even then it’s very[br]difficult to know for certain. 
0:50:02.500,0:50:08.170 So my last point, I think, yes, 0:50:08.170,0:50:11.130 is advice to future researchers.[br]So if you ever wanted, or are planning 0:50:11.130,0:50:16.349 on doing a study in the future, e.g. on[br]Tor, do not do what the CERT researchers 0:50:16.349,0:50:20.550 do and start deanonymising people on the[br]live Tor network and doing it in a way 0:50:20.550,0:50:25.060 which is incredibly irresponsible. I don’t[br]think… I mean, I tend, myself, to give them 0:50:25.060,0:50:28.510 the benefit of the doubt, I don’t think the[br]CERT researchers set out to be malicious. 0:50:28.510,0:50:33.320 I think they’re just very naive.[br]That’s what it was they were doing. 0:50:33.320,0:50:36.780 That was rapidly pointed out to them.[br]In my case we are running 0:50:36.780,0:50:43.090 40 relays. Our Tor relays they were forwarding[br]traffic, they were acting as good relays. 0:50:43.090,0:50:45.970 The only thing that we were doing[br]was logging publication requests 0:50:45.970,0:50:50.050 to the directories. Big question whether[br]that’s malicious or not – I don’t know. 0:50:50.050,0:50:53.330 One thing that has been pointed out to me[br]is that the .onion addresses themselves 0:50:53.330,0:50:58.270 could be considered sensitive information,[br]so any data we will be retaining 0:50:58.270,0:51:01.840 from the study is the aggregated data.[br]So we won’t be retaining information 0:51:01.840,0:51:05.400 on individual .onion addresses because[br]that could potentially be considered 0:51:05.400,0:51:08.900 sensitive information. If you think about[br]someone running an .onion address which 0:51:08.900,0:51:11.240 contains something which they don’t want[br]other people knowing about. So we won’t 0:51:11.240,0:51:15.060 be retaining that data, and[br]we’ll be destroying them. 0:51:15.060,0:51:19.920 So I think that brings me now[br]to starting the questions. 0:51:19.920,0:51:22.770 I want to say “Thanks” to a couple of[br]people. 
The student who donated 0:51:22.770,0:51:26.820 the server to us. Nick Savage who is one[br]of my colleagues who was a sounding board 0:51:26.820,0:51:30.510 during the entire study. Ivan Pustogarov[br]who is the researcher at the University 0:51:30.510,0:51:34.700 of Luxembourg who sent us the large data[br]set of .onion addresses from last year. 0:51:34.700,0:51:37.670 He’s also the chap who has demonstrated[br]those deanonymisation attacks 0:51:37.670,0:51:41.500 that I talked about. A big “Thank you” to[br]Roger Dingledine who has frankly been… 0:51:41.500,0:51:45.230 presented loads of questions to me over[br]the last couple of days and allowed me 0:51:45.230,0:51:49.410 to bounce ideas back and forth.[br]That has been a very useful process. 0:51:49.410,0:51:53.640 If you are doing future research I strongly[br]encourage you to contact the Tor Project 0:51:53.640,0:51:57.040 at the earliest opportunity. You’ll find[br]them… certainly I found them to be 0:51:57.040,0:51:59.460 extremely helpful. 0:51:59.460,0:52:04.640 Donncha also did something similar,[br]so both Ivan and Donncha have done 0:52:04.640,0:52:09.520 a similar study in trying to classify the[br]types of hidden services or work out 0:52:09.520,0:52:13.520 how many hits there are to particular[br]types of hidden service. Ivan Pustogarov 0:52:13.520,0:52:17.430 did it on a bigger scale[br]and found similar results to us. 0:52:17.430,0:52:21.910 That is that these abuse sites[br]featured frequently 0:52:21.910,0:52:26.740 in the top requested sites. That was done[br]over a year ago, and again, he was seeing 0:52:26.740,0:52:31.109 similar sorts of pattern. There were these[br]abuse sites being requested frequently. 0:52:31.109,0:52:35.450 So that also sort of corroborates[br]what we’re saying. 
0:52:35.450,0:52:38.540 The data I put online is at this address,[br]there will probably be the slides, 0:52:38.540,0:52:41.609 something called ‘The Tor Research[br]Framework’ which is an implementation 0:52:41.609,0:52:47.510 of a Java client, so an implementation[br]of a Tor client in Java specifically aimed 0:52:47.510,0:52:52.080 at researchers. So if e.g. you wanna pull[br]out data from a consensus you can do. 0:52:52.080,0:52:55.290 If you want to build custom routes[br]through the network you can do. 0:52:55.290,0:52:58.230 If you want to build routes through the[br]network and start sending padding traffic 0:52:58.230,0:53:01.720 down them you can do etc.[br]The code is designed in a way which is 0:53:01.720,0:53:06.000 designed to be easily modifiable[br]for testing lots of these things. 0:53:06.000,0:53:10.580 There is also a link to the Tor FBI[br]exploit which they deployed against 0:53:10.580,0:53:16.230 visitors to some Tor hidden services last[br]year. They exploited a Mozilla Firefox bug 0:53:16.230,0:53:20.540 and then ran code on users who were[br]visiting these hidden services, and ran 0:53:20.540,0:53:24.619 code on their computer to identify them.[br]At this address there is a link to that 0:53:24.619,0:53:29.250 including a copy of the shell code and an[br]analysis of exactly what it was doing. 0:53:29.250,0:53:31.670 And then of course a list of references,[br]with papers and things. 0:53:31.670,0:53:34.260 So I’m quite happy to take questions now. 0:53:34.260,0:53:46.960 applause 0:53:46.960,0:53:50.880 Herald: Thanks for the nice talk![br]Do we have any questions 0:53:50.880,0:53:57.000 from the internet? 0:53:57.000,0:53:59.740 Signal Angel: One question. It’s very hard[br]to block addresses since creating them 0:53:59.740,0:54:03.620 is cheap, and they can be generated[br]for each user, and rotated often. So 0:54:03.620,0:54:07.510 can you think of any other way[br]for doing the blocking? 
0:54:07.510,0:54:09.799 Gareth: That is absolutely true, so, yes.[br]If you were to block a particular .onion 0:54:09.799,0:54:13.060 address they can wail: “I want another[br].onion address.” So I don’t know of 0:54:13.060,0:54:16.760 any way to counter that now. 0:54:16.760,0:54:18.510 Herald: Another one from the internet?[br]inaudible answer from Signal Angel 0:54:18.510,0:54:22.030 Okay, then, Microphone 1, please! 0:54:22.030,0:54:26.359 Question: Thank you, that’s fascinating[br]research. You mentioned that it is 0:54:26.359,0:54:32.200 possible to influence the hash of your[br]relay node in a sense that you could 0:54:32.200,0:54:35.970 be choosing which service you are[br]advertising, or which hidden service 0:54:35.970,0:54:38.050 you are responsible for. Is that right?[br]Gareth: Yeah, correct! 0:54:38.050,0:54:40.390 Question: So could you elaborate[br]on how this is possible? 0:54:40.390,0:54:44.740 Gareth: So e.g. you just keep regenerating[br]a public key for your relay, 0:54:44.740,0:54:48.140 you’ll get closer and closer to the point[br]where you’ll be the responsible relay 0:54:48.140,0:54:51.160 for that particular hidden service. That’s[br]just – you keep regenerating your identity 0:54:51.160,0:54:54.720 hash until you’re at that particular point[br]in the relay. That’s not particularly 0:54:54.720,0:55:00.490 computationally intensive to do.[br]That was it? 0:55:00.490,0:55:04.740 Herald: Okay, next question[br]from Microphone 5, please. 0:55:04.740,0:55:09.490 Question: Hi, I was wondering for the[br]attacks where you identify a certain number 0:55:09.490,0:55:15.170 of users using a hidden service. Have[br]those attacks been used, or is there 0:55:15.170,0:55:18.880 any evidence there, and is there[br]any way of protecting against that? 0:55:18.880,0:55:22.260 Gareth: That’s a very interesting question,[br]is there any way to detect these types 0:55:22.260,0:55:24.970 of attacks? 
So some of the attacks,[br]if you’re going to generate particular 0:55:24.970,0:55:29.030 traffic patterns, one way to do that is to[br]use the padding cells. The padding cells 0:55:29.030,0:55:32.070 aren’t used at the moment by the official[br]Tor client. So the detection of those 0:55:32.070,0:55:36.510 could be indicative but it doesn’t…[br]it’s not conclusive evidence in our tool. 0:55:36.510,0:55:40.050 Question: And is there any way of[br]protecting against a government 0:55:40.050,0:55:46.510 or something trying to denial-of-service[br]hidden services? 0:55:46.510,0:55:48.180 Gareth: So I… trying to… did not… 0:55:48.180,0:55:52.500 Question: Is it possible to protect[br]against this kind of attack? 0:55:52.500,0:55:56.180 Gareth: Not that I’m aware of. The Tor[br]Project are currently revising how they 0:55:56.180,0:55:59.500 do the hidden service protocol which will[br]make e.g. what I did, enumerating 0:55:59.500,0:56:03.230 the hidden services, much more difficult.[br]And to also be in a position on the 0:56:03.230,0:56:07.470 distributed hash table in advance[br]for a particular hidden service. 0:56:07.470,0:56:10.510 So they are at the moment trying to change[br]the way it’s done, and make some of 0:56:10.510,0:56:15.270 these things more difficult. 0:56:15.270,0:56:20.290 Herald: Good. Next question[br]from Microphone 2, please. 0:56:20.290,0:56:27.220 Mic2: Hi. I’m running the Tor2Web abuse,[br]and so I used to see a lot of abuse of requests 0:56:27.220,0:56:31.130 concerning the Tor hidden service[br]being exposed on the internet through 0:56:31.130,0:56:37.270 the Tor2Web.org domain name. And I just[br]wanted to comment on, like you said, 0:56:37.270,0:56:45.410 the abuse number of the requests. 
I used[br]to speak with some of the child protection 0:56:45.410,0:56:50.070 agencies that reported abuse at[br]Tor2Web.org, and they are effectively 0:56:50.070,0:56:55.570 using crawlers that periodically look for[br]changes in order to get new images to be 0:56:55.570,0:57:00.190 put in the database. And what I was able[br]to understand is that the German agency 0:57:00.190,0:57:07.440 doing that is crawling the same sites that[br]the Italian agencies are crawling, too. 0:57:07.440,0:57:11.890 So it’s likely that in most of the[br]countries there are the child protection 0:57:11.890,0:57:16.790 agencies that are crawling those few[br]numbers of Tor hidden services that 0:57:16.790,0:57:22.760 contain child porn. And I saw it also[br]a bit from the statistics of Tor2Web 0:57:22.760,0:57:28.500 where the amount of abuse relating to[br]that kind of content, it’s relatively low. 0:57:28.500,0:57:30.000 Just as contribution! 0:57:30.000,0:57:33.500 Gareth: Yes, that’s very interesting,[br]thank you for that! 0:57:33.500,0:57:37.260 applause 0:57:37.260,0:57:39.560 Herald: Next, Microphone 4, please. 0:57:39.560,0:57:45.260 Mic4: You then attacked or deanonymised[br]users with an infected or a modified Guard 0:57:45.260,0:57:51.810 relay? Is it required to modify the Guard[br]relay if I control the entry point 0:57:51.810,0:57:57.360 of the user to the internet?[br]If I’m his ISP? 0:57:57.360,0:58:01.900 Gareth: Yes, if you observe traffic[br]travelling into a Guard relay without 0:58:01.900,0:58:04.570 controlling the Guard relay itself.[br]Mic4: Yeah. 0:58:04.570,0:58:07.500 Gareth: In theory, yes. I wouldn’t be able[br]to tell you how reliable that is 0:58:07.500,0:58:10.500 off the top of my head.[br]Mic4: Thanks! 0:58:10.500,0:58:13.630 Herald: So another question[br]from the internet! 0:58:13.630,0:58:16.339 Signal Angel: Wouldn’t the ability to[br]choose the key hash prefix give 0:58:16.339,0:58:19.980 the ability to target specific .onions? 
0:58:19.980,0:58:23.680 Gareth: So you can only target one .onion[br]address at a time. Because of the way 0:58:23.680,0:58:28.080 they are generated. So you wouldn’t be[br]able to say e.g. “Pick a key which targeted 0:58:28.080,0:58:32.339 two or more .onion addresses.” You can[br]only target one .onion address at a time 0:58:32.339,0:58:37.720 by positioning yourself at a particular[br]point on the distributed hash table. 0:58:37.720,0:58:40.260 Herald: Another one[br]from the internet? … Okay. 0:58:40.260,0:58:43.369 Then Microphone 3, please. 0:58:43.369,0:58:47.780 Mic3: Hey. Thanks for this research.[br]I think it strengthens the network. 0:58:47.780,0:58:54.300 So in the deem (?) I was wondering whether[br]you can donate this relays to be a part of 0:58:54.300,0:58:59.500 non-malicious relays pool, basically[br]use them as regular relays afterwards? 0:58:59.500,0:59:02.750 Gareth: Okay, so can I donate the relays[br]a rerun and at the Tor capacity (?) ? 0:59:02.750,0:59:05.490 Unfortunately, I said they were run by[br]a student and they were donated for 0:59:05.490,0:59:09.510 a fixed period of time. So we’ve given[br]those back to him. We are very grateful 0:59:09.510,0:59:14.790 to him, he was very generous. In fact,[br]without his contribution donating these 0:59:14.790,0:59:18.700 it would have been much more difficult[br]to collect as much data as we did. 0:59:18.700,0:59:21.490 Herald: Good, next, Microphone 5, please! 0:59:21.490,0:59:25.839 Mic5: Yeah hi, first of all thanks[br]for your talk. I think you’ve raised 0:59:25.839,0:59:29.310 some real issues that need to be[br]considered very carefully by everyone 0:59:29.310,0:59:33.950 on the Tor Project. My question: I’d like[br]to go back to the issue with so many 0:59:33.950,0:59:38.470 abuse related web sites running over[br]the Tor Project. 
I think it’s an important 0:59:38.470,0:59:41.900 issue that really needs to be considered[br]because we don’t wanna be associated 0:59:41.900,0:59:44.840 with that at the end of the day.[br]Anyone who uses Tor, who runs a relay 0:59:44.840,0:59:51.250 or an exit node. And I understand it’s[br]a bit of a censored issue, and you don’t 0:59:51.250,0:59:55.300 really have any say over whether it’s[br]implemented or not. But I’d like to get 0:59:55.300,1:00:02.410 your opinion on the implementation[br]of a distributed block-deny system 1:00:02.410,1:00:06.980 that would run in very much a similar way[br]to those of the directory authorities. 1:00:06.980,1:00:08.950 I’d just like to see what[br]you think of that. 1:00:08.950,1:00:13.200 Gareth: So you’re asking me whether I want[br]to support a particular blocking mechanism 1:00:13.200,1:00:14.200 then? 1:00:14.200,1:00:16.470 Mic5: I’d like to get your opinion on it.[br]Gareth laughs 1:00:16.470,1:00:20.540 I know it’s a sensitive issue but I think,[br]like I said, I think something… 1:00:20.540,1:00:25.700 I think it needs to be considered because[br]everyone running exit nodes and relays 1:00:25.700,1:00:30.270 and people of the Tor Project don’t[br]want to be known or associated with 1:00:30.270,1:00:34.790 these massive amount of abuse web sites[br]that currently exist within the Tor network. 1:00:34.790,1:00:40.210 Gareth: I absolutely agree, and I think[br]the Tor Project are horrified as well that 1:00:40.210,1:00:43.960 this problem exists, and they, in fact,[br]talked on it in previous years that 1:00:43.960,1:00:48.690 they have a problem with this type of[br]content. As to what, if anything, is 1:00:48.690,1:00:52.340 done about it, it’s very much up to them.[br]Could it be done in a distributed fashion? 1:00:52.340,1:00:56.240 So the example I gave was a way which[br]it could be done by relay operators. 1:00:56.240,1:00:59.770 So e.g. 
that would need the consensus of[br]a large number of relay operators to be 1:00:59.770,1:01:02.890 effective. So that is done in[br]a distributed fashion. The question is: 1:01:02.890,1:01:06.810 who gives the list of .onion addresses to[br]block to each of the relay operators? 1:01:06.810,1:01:09.640 Clearly, the relay operators aren’t going[br]to collect themselves. It needs to be 1:01:09.640,1:01:15.780 supplied by someone like the Tor Project,[br]e.g., or someone trustworthy. Yes, it can 1:01:15.780,1:01:20.480 be done in a distributed fashion.[br]It can be done in an open fashion. 1:01:20.480,1:01:21.710 Mic5: Who knows?[br]Gareth: Okay. 1:01:21.710,1:01:23.750 Mic5: Thank you. 1:01:23.750,1:01:27.260 Herald: Good. And another[br]question from the internet. 1:01:27.260,1:01:31.210 Signal Angel: Apparently there’s an option[br]in the Tor client to collect statistics 1:01:31.210,1:01:35.169 on hidden services. Do you know about[br]this, and how it relates to your research? 1:01:35.169,1:01:38.551 Gareth: Yes, I believe they’re going to[br]be… the extent to which I know about it 1:01:38.551,1:01:41.930 is they’re gonna be trying this next[br]month, to try and estimate how many 1:01:41.930,1:01:46.490 hidden services there are. So keep[br]your eye on the Tor Project web site, 1:01:46.490,1:01:50.340 I’m sure they’ll be publishing[br]their data in the coming months. 1:01:50.340,1:01:55.090 Herald: And, sadly, we are running out of[br]time, so this will be the last question, 1:01:55.090,1:01:56.980 so Microphone 4, please! 1:01:56.980,1:02:01.250 Mic4: Hi, I’m just wondering if you could[br]sort of outline what ethical clearances 1:02:01.250,1:02:04.510 you had to get from your university[br]to conduct this kind of research. 1:02:04.510,1:02:07.260 Gareth: So we have to discuss these[br]types of things before undertaking 1:02:07.260,1:02:11.970 any research. And we go through the steps[br]to make sure that we’re not e.g. 
storing 1:02:11.970,1:02:16.370 sensitive information about particular[br]people. So yes, we are very mindful 1:02:16.370,1:02:19.240 of that. And that’s why I made a[br]particular point of putting on the slides 1:02:19.240,1:02:21.510 as to some of the things to consider. 1:02:21.510,1:02:26.180 Mic4: So like… you outlined a potential[br]implementation of the traffic correlation 1:02:26.180,1:02:29.500 attack. Are you saying that[br]you performed the attack? Or… 1:02:29.500,1:02:33.180 Gareth: No, no no, absolutely not.[br]So the link I’m giving… absolutely not. 1:02:33.180,1:02:34.849 We have not engaged in any… 1:02:34.849,1:02:36.350 Mic4: It just wasn’t clear[br]from the slides. 1:02:36.350,1:02:39.380 Gareth: I apologize. So it’s absolutely[br]clear on that. No, we’re not engaging 1:02:39.380,1:02:42.860 in any deanonymisation research on the[br]Tor network. The research I showed 1:02:42.860,1:02:46.079 is linked on the references, I think,[br]which I put at the end of the slides. 1:02:46.079,1:02:52.000 You can read about it. But it’s done in[br]simulation. So e.g. there’s a way 1:02:52.000,1:02:54.730 to do simulation of the Tor network on[br]a single computer. I can’t remember 1:02:54.730,1:02:58.880 the name of the project, though.[br]Shadow! Yes, it’s a system 1:02:58.880,1:03:02.170 called Shadow, we can run a large[br]number of Tor relays on a single computer 1:03:02.170,1:03:04.579 and simulate the traffic between them.[br]If you’re going to do that type of research 1:03:04.579,1:03:09.380 then you should use that. Okay,[br]thank you very much, everyone. 1:03:09.380,1:03:17.985 applause 1:03:17.985,1:03:22.071 silent postroll titles 1:03:22.071,1:03:27.000 subtitles created by c3subtitles.de[br]Join, and help us!
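Editor's sketch, not part of the talk: in the answer to Microphone 1, Gareth describes regenerating a relay's public key until its identity hash lands at a chosen point on the distributed hash table. The Python below is a minimal illustration of that search loop under loudly stated assumptions: random 32-byte strings stand in for real relay keys, SHA-1 of those bytes stands in for the identity hash, the position of interest is taken to be "just after" the target on a 160-bit ring, and the names `fingerprint`, `find_key_near` and `closeness_bits` are hypothetical. It does not talk to the Tor network and is not the code used in the study.

```python
import hashlib
import random

RING_BITS = 160  # SHA-1 output size, used here as the size of the hash ring


def fingerprint(key_bytes: bytes) -> int:
    """Hash stand-in key material to an integer position on the ring."""
    return int.from_bytes(hashlib.sha1(key_bytes).digest(), "big")


def find_key_near(target: int, closeness_bits: int, rng: random.Random):
    """Regenerate stand-in keys until one hashes just after `target`.

    A candidate is accepted when its clockwise distance from `target`
    is below 2**(RING_BITS - closeness_bits).
    Returns (key_bytes, attempts).
    """
    limit = 1 << (RING_BITS - closeness_bits)
    attempts = 0
    while True:
        attempts += 1
        # Hypothetical placeholder for generating a real relay key pair.
        key = rng.getrandbits(256).to_bytes(32, "big")
        distance = (fingerprint(key) - target) % (1 << RING_BITS)
        if distance < limit:
            return key, attempts


if __name__ == "__main__":
    # Made-up target position; a real one would come from a service descriptor.
    target = fingerprint(b"example hidden service position")
    key, attempts = find_key_near(target, closeness_bits=12, rng=random.Random(0))
    print(f"accepted a key after {attempts} attempts")
```

In this toy model each extra bit of closeness doubles the expected number of attempts, so modest targets are cheap to reach. Generating a real key pair costs more per attempt than hashing random bytes, so the absolute timings here say nothing about the real network.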