0:00:00.000,0:00:09.970
<i>silent 31C3 preroll</i>

0:00:09.970,0:00:13.220
Dr. Gareth Owen: Hello. Can you hear me?[br]Yes. Okay. So my name is Gareth Owen.

0:00:13.220,0:00:16.150
I’m from the University of Portsmouth.[br]I’m an academic

0:00:16.150,0:00:19.320
and I’m going to talk to you about[br]an experiment that we did

0:00:19.320,0:00:22.610
on the Tor hidden services,[br]trying to categorize them,

0:00:22.610,0:00:25.230
estimate how many there were etc. etc.

0:00:25.230,0:00:27.380
Well, as we go through the talk[br]I’m going to explain

0:00:27.380,0:00:31.120
how Tor hidden services work internally,[br]and how the data was collected.

0:00:31.120,0:00:35.320
So what sort of conclusions you can draw[br]from the data based on the way that we’ve

0:00:35.320,0:00:39.950
collected it. Just so [that] I get[br]an idea: how many of you use Tor

0:00:39.950,0:00:42.430
on a regular basis, could you[br]put your hand up for me?

0:00:42.430,0:00:46.120
So quite a big number. Keep your hand[br]up if… or put your hand up if you’re

0:00:46.120,0:00:48.320
a relay operator.

0:00:48.320,0:00:51.470
Wow, that’s quite a significant number,[br]isn’t it? And then, put your hand up

0:00:51.470,0:00:55.250
and/or keep it up if you[br]run a hidden service.

0:00:55.250,0:00:59.530
Okay, so, a smaller number, but still[br]some people run hidden services.

0:00:59.530,0:01:02.720
Okay, so, some of you may be very familiar[br]with the way Tor works, sort of,

0:01:02.720,0:01:06.700
at a low level. But I am gonna go through[br]it for those who aren’t, so they understand

0:01:06.700,0:01:10.380
just how they work. And as we go along,[br]because I’m explaining how

0:01:10.380,0:01:14.030
the hidden services work, I’m going[br]to tag on information on how

0:01:14.030,0:01:19.030
the Tor hidden services themselves can be[br]deanonymised and also how the users

0:01:19.030,0:01:23.090
of those hidden services can be[br]deanonymised, if you put

0:01:23.090,0:01:27.040
some strict criteria on what it is you[br]want to do with respect to them.

0:01:27.040,0:01:30.920
So the things that I’m going to go over:[br]I wanna go over how Tor works,

0:01:30.920,0:01:34.190
and then specifically how hidden services[br]work. I’m gonna talk about something

0:01:34.190,0:01:37.889
called the “Tor Distributed Hash Table”[br]for hidden services. If you’ve heard

0:01:37.889,0:01:40.560
that term and don’t know what[br]it means, don’t worry, I’ll explain

0:01:40.560,0:01:44.010
what a distributed hash table is and[br]how it works. It’s not as complicated

0:01:44.010,0:01:47.690
as it sounds. And then I wanna go over[br]Darknet data, so, data that we collected

0:01:47.690,0:01:53.030
from Tor hidden services. And as I say,[br]as we go along I will sort of explain

0:01:53.030,0:01:56.650
how you do deanonymisation of both the[br]services themselves and of the visitors

0:01:56.650,0:02:02.400
to the service. And just[br]how complicated it is.

0:02:02.400,0:02:07.370
So you may have seen this slide which[br]I think was from GCHQ, released last year

0:02:07.370,0:02:12.099
as part of the Snowden leaks where they[br]said: “You can deanonymise some users

0:02:12.099,0:02:15.560
some of the time but they’ve had[br]no success in deanonymising someone

0:02:15.560,0:02:20.109
in response to a specific request.”[br]So, given all of you e.g., I may be able

0:02:20.109,0:02:25.090
to deanonymise a small fraction of you[br]but I can’t choose precisely one person

0:02:25.090,0:02:27.499
I want to deanonymise. That’s what[br]I’m gonna be explaining in relation

0:02:27.499,0:02:30.940
to the deanonymisation attacks, how[br]you can deanonymise a section but

0:02:30.940,0:02:38.629
you can’t necessarily choose which section[br]of the users that you will be deanonymising.

0:02:38.629,0:02:42.740
Tor tries to deal with just a couple[br]of different problems. For one thing,

0:02:42.740,0:02:46.239
it allows you to bypass censorship. So if[br]you’re in a country like China, which

0:02:46.239,0:02:51.010
blocks some types of traffic you can use[br]Tor to bypass their censorship blocks.

0:02:51.010,0:02:55.541
It tries to give you privacy, so, at some[br]level in the network someone can’t see

0:02:55.541,0:02:59.200
what you’re doing. And at another point[br]in the network people don’t know

0:02:59.200,0:03:02.540
who you are but may[br]be able to see what you’re doing.

0:03:02.540,0:03:07.099
Now the traditional case[br]for this is to look at VPNs.

0:03:07.099,0:03:10.669
With a VPN you have[br]sort of a single provider.

0:03:10.669,0:03:14.689
You have lots of users connecting[br]to the VPN. The VPN has sort of

0:03:14.689,0:03:18.240
a mixing effect from an outsider’s or[br]a server’s point of view. And then

0:03:18.240,0:03:22.499
out of the VPN you see requests[br]to Twitter, Wikipedia etc. etc.

0:03:22.499,0:03:26.830
And if that traffic isn’t encrypted then[br]the VPN can also read the contents

0:03:26.830,0:03:30.980
of the traffic. Now of course there is[br]a fundamental weakness with this.

0:03:30.980,0:03:35.730
If you trust the VPN provider the VPN[br]provider knows both who you are

0:03:35.730,0:03:39.629
and what you’re doing and can[br]link those two together with absolute

0:03:39.629,0:03:43.580
certainty. So you don’t… whilst you do[br]get some of these properties, assuming

0:03:43.580,0:03:48.069
you’ve got a trustworthy VPN provider[br]you don’t get them in the face of

0:03:48.069,0:03:51.609
an untrustworthy VPN provider.[br]And of course: how do you trust the VPN

0:03:51.609,0:03:59.319
provider? What sort of measure do[br]you use? That’s sort of an open question.

0:03:59.319,0:04:03.729
So Tor tries to solve this problem[br]by distributing the trust. Tor is

0:04:03.729,0:04:07.500
an open source project, so you can go[br]on to their Git repository, you can

0:04:07.500,0:04:12.620
download the source code, and change it,[br]improve it, submit patches etc.

0:04:12.620,0:04:17.108
As you heard earlier, during Jacob and[br]Roger’s talk they’re currently partly

0:04:17.108,0:04:20.949
sponsored by the US Government which seems[br]a bit paradoxical, but they explained

0:04:20.949,0:04:24.770
in that talk that it[br]doesn’t affect their judgment.

0:04:24.770,0:04:28.540
And indeed, they do have some funding from[br]other sources, and they designed that system

0:04:28.540,0:04:30.841
– which I’ll talk about a little bit[br]later – in a way where they don’t have

0:04:30.841,0:04:34.230
to trust each other. So there’s sort of[br]some redundancy, and they’re trying

0:04:34.230,0:04:39.650
to minimize these sort of trust issues[br]related to this. Now, Tor is

0:04:39.650,0:04:43.310
a partially de-centralized network, which[br]means that it has some centralized

0:04:43.310,0:04:47.870
components which are under the control of[br]the Tor Project and some de-centralized

0:04:47.870,0:04:51.190
components which are normally the Tor[br]relays. If you run a relay you’re

0:04:51.190,0:04:56.290
one of those de-centralized components.[br]There is, however, no single authority

0:04:56.290,0:05:01.110
on the Tor network.[br]So no single server which is responsible,

0:05:01.110,0:05:04.290
which you’re required to trust.[br]So the trust is somewhat distributed,

0:05:04.290,0:05:12.000
but not entirely. When you establish[br]a circuit through Tor you, the user,

0:05:12.000,0:05:15.500
download a list of all of the relays[br]inside the Tor network.

0:05:15.500,0:05:19.070
And you get to pick – and I’ll tell you[br]how you do that – which relays

0:05:19.070,0:05:22.750
you’re going to use to route your traffic[br]through. So here is a typical example:

0:05:22.750,0:05:27.090
You’re here on the left hand side as the[br]user. You download a list of the relays

0:05:27.090,0:05:32.010
inside the Tor network and you select from[br]that list three nodes, a guard node

0:05:32.010,0:05:36.580
which is your entry into the Tor network,[br]a relay node which is a middle node.

0:05:36.580,0:05:39.010
Essentially, it’s going to route your[br]traffic to a third hop. And then

0:05:39.010,0:05:42.650
the third hop is the exit node where[br]your traffic essentially exits out

0:05:42.650,0:05:46.840
on the internet. Now, looking at the[br]circuit. So this is a circuit through

0:05:46.840,0:05:50.170
the Tor network through which you’re[br]going to route your traffic. There are

0:05:50.170,0:05:52.540
three layers of encryption at the[br]beginning, so between you

0:05:52.540,0:05:56.150
and the guard node. Your traffic[br]is encrypted three times.

0:05:56.150,0:05:59.330
In the first instance it’s encrypted to the[br]guard, and then it’s encrypted again,

0:05:59.330,0:06:03.180
to the relay, and then encrypted[br]again to the exit, and as the traffic moves

0:06:03.180,0:06:08.710
through the Tor network each of those[br]layers of encryption is peeled away

0:06:08.710,0:06:17.300
from the data. The Guard here in this case[br]knows who you are, and the exit relay

0:06:17.300,0:06:21.590
knows what you’re doing but neither knows[br]both. And the middle relay doesn’t really

0:06:21.590,0:06:26.710
know a lot, except for which relay is[br]her guard and which relay is her exit.
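
The layering described above can be sketched in a few lines. This is only a toy illustration: the SHA-256 keystream cipher, the key values and the function names `keystream_xor`, `wrap` and `unwrap_path` are assumptions made for the example and are not Tor's actual cryptography or cell format.

```python
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256-derived keystream.

    Applying it twice with the same key restores the input.
    It is a stand-in for illustration, not what Tor uses.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)

def wrap(payload: bytes, hop_keys: list) -> bytes:
    """Client side: add one layer per hop, exit layer first, guard layer last."""
    cell = payload
    for key in reversed(hop_keys):
        cell = keystream_xor(key, cell)
    return cell

def unwrap_path(cell: bytes, hop_keys: list) -> list:
    """Each hop removes its own layer; return what each hop is left holding."""
    views = []
    for key in hop_keys:
        cell = keystream_xor(key, cell)
        views.append(cell)
    return views

# Hypothetical per-hop keys, in path order: guard, middle, exit.
hop_keys = [b"guard-key", b"middle-key", b"exit-key"]
payload = b"GET / HTTP/1.1"

cell = wrap(payload, hop_keys)
views = unwrap_path(cell, hop_keys)

print("guard sees plaintext: ", views[0] == payload)
print("middle sees plaintext:", views[1] == payload)
print("exit sees plaintext:  ", views[2] == payload)
```

Only the last hop ends up holding the original payload; the guard and the middle relay each hold data that is still wrapped in the layers of the hops after them.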

0:06:26.710,0:06:31.870
Who runs an exit relay? So if you run[br]an exit relay all of the traffic which

0:06:31.870,0:06:36.210
users are sending out on the internet[br]appears to come from your IP address.

0:06:36.210,0:06:41.360
So running an exit relay is potentially[br]risky because someone may do something

0:06:41.360,0:06:45.590
through your relay which attracts attention.[br]And then, when law enforcement

0:06:45.590,0:06:48.940
trace that back to an IP address it’s[br]going to come back to your address.

0:06:48.940,0:06:51.790
So some relay operators have had trouble[br]with this, with law enforcement coming

0:06:51.790,0:06:55.360
to them, and saying: “Hey, we got this[br]traffic coming through your IP address”,

0:06:55.360,0:06:57.950
and you have to go and explain it.[br]So if you want to run an exit relay

0:06:57.950,0:07:01.400
it’s a little bit risky, but we’re thankful[br]for those people that do run exit relays

0:07:01.400,0:07:04.870
because ultimately if people didn’t run[br]an exit relay you wouldn’t be able

0:07:04.870,0:07:08.000
to get out of the Tor network, and it[br]wouldn’t be terribly useful from this

0:07:08.000,0:07:20.560
point of view. So, yes.[br]<i>applause</i>

0:07:20.560,0:07:24.610
So every Tor relay, when you set up[br]a Tor relay you publish something called

0:07:24.610,0:07:28.780
a descriptor which describes your Tor[br]relay and how to use it to a set

0:07:28.780,0:07:33.430
of servers called the authorities. And the[br]trust in the Tor network is essentially

0:07:33.430,0:07:38.610
split across these authorities. They’re run[br]by the core Tor Project members.

0:07:38.610,0:07:42.639
And they maintain a list of all of the[br]relays in the network. And they observe

0:07:42.639,0:07:46.010
them over a period of time. If the relays[br]exhibit certain properties they give

0:07:46.010,0:07:50.480
the relays flags. If e.g. a relay allows[br]traffic to exit from the Tor network

0:07:50.480,0:07:54.450
it will get the ‘Exit’ flag. If they’ve been[br]switched on for a certain period of time,

0:07:54.450,0:07:58.400
or for a certain amount of traffic, they’ll[br]be allowed to become a guard relay

0:07:58.400,0:08:02.180
which is the first node in your circuit.[br]So when you build your circuit you

0:08:02.180,0:08:07.230
download a list of these descriptors from[br]one of the Directory Authorities. You look

0:08:07.230,0:08:10.120
at the flags which have been assigned to[br]each of the relays, and then you pick

0:08:10.120,0:08:14.150
your route based on that. So you’ll pick[br]the guard node from a set of relays

0:08:14.150,0:08:16.400
which have the ‘Guard’ flag, your exits[br]from the set of relays which have

0:08:16.400,0:08:20.860
the ‘Exit’ flag etc. etc. Now, as of[br]a quick count this morning there are

0:08:20.860,0:08:29.229
about 1500 guard relays, around 1000 exit[br]relays, and six relays flagged as ‘bad’ exits.

0:08:29.229,0:08:34.360
What does a ‘bad exit’ mean?[br]<i>waits for audience to respond</i>

0:08:34.360,0:08:37.759
That’s not good! That’s exactly[br]what it means! Yes! <i>laughs</i>

0:08:37.759,0:08:40.450
<i>applause</i>

0:08:40.450,0:08:45.569
So relays which have been flagged as ‘bad[br]exits’ your client will never choose to exit

0:08:45.569,0:08:50.660
traffic through. And examples of things[br]which may get a relay flagged as a

0:08:50.660,0:08:53.829
[bad] exit relay – if they’re fiddling with[br]the traffic which is coming out of

0:08:53.829,0:08:57.019
the Tor relay. Or doing things like[br]man-in-the-middle attacks against

0:08:57.019,0:09:01.629
SSL traffic. We’ve seen various things,[br]there have been relays man-in-the-middling

0:09:01.629,0:09:07.050
SSL traffic, there has very, very recently[br]been an exit relay which was patching

0:09:07.050,0:09:10.800
binaries that you downloaded from the[br]internet, inserting malware into the binaries.

0:09:10.800,0:09:14.630
So you can do these things but the Tor[br]Project tries to scan for them. And if

0:09:14.630,0:09:19.829
these things are detected then they’ll be[br]flagged as ‘Bad Exits’. It’s true to say

0:09:19.829,0:09:24.610
that the scanning mechanism is not 100%[br]fool-proof by any stretch of the imagination.

0:09:24.610,0:09:28.559
It tries to pick up common types[br]of attacks, so as a result

0:09:28.559,0:09:32.480
it won’t pick up unknown attacks or[br]attacks which haven’t been seen or

0:09:32.480,0:09:36.680
have not been known about beforehand.

0:09:36.680,0:09:45.370
So looking at this, how do you deanonymise[br]the traffic travelling through the Tor

0:09:45.370,0:09:49.449
network? Given some traffic coming out[br]of the exit relay, how do you know

0:09:49.449,0:09:54.269
which user that corresponds to? What is[br]their IP address? You can’t actually

0:09:54.269,0:09:58.279
modify the traffic because if any of the[br]relays tried to modify the traffic

0:09:58.279,0:10:02.249
which they’re sending through the network[br]Tor will tear down the circuit through the relay.

0:10:02.249,0:10:06.290
So there are these integrity checks at each[br]of the hops. And if you try to sort of

0:10:06.290,0:10:09.870
– because you can’t decrypt the packet[br]you can’t modify it in any meaningful way,

0:10:09.870,0:10:13.749
and because there’s an integrity check[br]at the next hop that means that you can’t

0:10:13.749,0:10:17.019
modify the packet because otherwise it’s[br]detected. So you can’t do this sort of

0:10:17.019,0:10:20.900
marker, and try and follow the marker[br]through the network. So instead

0:10:20.900,0:10:26.699
what you can do if you control… so let me[br]give you two cases. In the worst case

0:10:26.699,0:10:31.330
if the attacker controls all three of your[br]relays that you pick, which is an unlikely

0:10:31.330,0:10:34.739
scenario, as they’d need to control quite[br]a big proportion of the network, then

0:10:34.739,0:10:39.550
it should be quite obvious that they can[br]work out who you are and also

0:10:39.550,0:10:42.369
see what you’re doing because in that[br]case they can tag the traffic, and

0:10:42.369,0:10:45.709
they can just discard these integrity[br]checks at each of the following hops.

0:10:45.709,0:10:50.709
Now in a different case, if you control[br]the Guard relay and the exit relay

0:10:50.709,0:10:54.160
but not the middle relay the Guard relay[br]can’t tamper with the traffic because

0:10:54.160,0:10:57.660
this middle relay will close down the[br]circuit as soon as it happens.

0:10:57.660,0:11:01.130
The exit relay can’t send stuff back down[br]the circuit to try and identify the user,

0:11:01.130,0:11:05.030
either. Because again, the circuit will be[br]closed down. So what can you do?

0:11:05.030,0:11:09.869
Well, you can count the number of packets[br]going through the Guard node. And you can

0:11:09.869,0:11:14.690
measure the timing differences between[br]packets, and try and spot that pattern

0:11:14.690,0:11:18.750
at the Exit relays. You’re looking at counts of[br]packets and the timing between those

0:11:18.750,0:11:22.360
packets which are being sent, and[br]essentially trying to correlate them all.
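
A minimal sketch of that counting-and-timing idea, under stated assumptions: the timestamps are made up, the flow names `client_a` and `client_b` are hypothetical, and scoring by a Pearson correlation over per-second packet counts is just one simple choice, not necessarily the method used in the published attacks.

```python
from math import sqrt

def bin_counts(timestamps, window, n_bins):
    """Count packets per time window (the 'number of packets' signal)."""
    counts = [0] * n_bins
    for t in timestamps:
        idx = int(t // window)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts

def pearson(xs, ys):
    """Pearson correlation of two equal-length count vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def best_match(guard_flows, exit_flow, window=1.0, n_bins=10):
    """Score every flow seen at the guard against the flow seen at the exit."""
    exit_counts = bin_counts(exit_flow, window, n_bins)
    scores = {
        name: pearson(bin_counts(ts, window, n_bins), exit_counts)
        for name, ts in guard_flows.items()
    }
    return max(scores, key=scores.get), scores

# Made-up packet arrival times (seconds) observed at a guard relay.
guard_flows = {
    "client_a": [0.1, 0.2, 0.3, 4.0, 4.1, 8.5],
    "client_b": [1.0, 2.0, 3.0, 5.0, 6.0, 7.0],
}
# The same burst pattern as client_a, seen slightly later at the exit.
exit_flow = [t + 0.05 for t in guard_flows["client_a"]]

name, scores = best_match(guard_flows, exit_flow)
print(name, scores)
```

The observer never needs to decrypt or modify anything: the shape of the traffic alone is enough to pair the two ends, which is why this only works when the same party sees both the guard side and the exit side.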

0:11:22.360,0:11:26.869
So if a user happens to pick you as[br]their Guard node, and then happens to pick

0:11:26.869,0:11:31.850
your exit relay, then you can deanonymise[br]them with very high probability using

0:11:31.850,0:11:35.649
this technique. You’re just correlating[br]the timings of packets and counting

0:11:35.649,0:11:38.889
the number of packets going through.[br]And the attacks demonstrated in literature

0:11:38.889,0:11:44.509
are very reliable for this. We heard[br]earlier from the Tor talk about the “relay

0:11:44.509,0:11:50.739
early” tag which was the attack discovered[br]by the CERT researchers in the US.

0:11:50.739,0:11:55.050
That attack didn’t rely on timing attacks.[br]Instead, what they were able to do was

0:11:55.050,0:11:58.720
send a special type of cell containing[br]the data back down the circuit,

0:11:58.720,0:12:01.889
essentially marking this data, and saying:[br]“This is the data we’re seeing

0:12:01.889,0:12:06.149
at the Exit relay, or at the hidden[br]service", and encode into the messages

0:12:06.149,0:12:10.049
travelling back down the circuit, what the[br]data was. And then you could pick

0:12:10.049,0:12:14.269
those up at the Guard relay and say: okay,[br]it’s this person that’s doing that.

0:12:14.269,0:12:18.370
In fact, although this technique works,[br]and yeah it was a very nice attack,

0:12:18.370,0:12:21.269
the traffic correlation attacks are[br]actually just as powerful.

0:12:21.269,0:12:25.259
So although this bug has been fixed traffic[br]correlation attacks still work and are

0:12:25.259,0:12:29.739
still fairly, fairly reliable. So the problem[br]still does exist. This is very much

0:12:29.739,0:12:33.399
an open question. How do we solve this[br]problem? We don’t know, currently,

0:12:33.399,0:12:40.040
how to solve this problem of trying[br]to tackle the traffic correlation.

0:12:40.040,0:12:45.369
There are a couple of solutions.[br]But they’re not particularly…

0:12:45.369,0:12:48.569
they’re not particularly reliable. Let me[br]just go through these, and I’ll skip back

0:12:48.569,0:12:53.061
on the few things I’ve missed. The first[br]thing is, high-latency networks, so

0:12:53.061,0:12:56.999
networks where packets are delayed[br]in their transit through the network.

0:12:56.999,0:13:00.740
That throws away a lot of the timing[br]information. So they promise

0:13:00.740,0:13:03.800
to potentially solve this problem.[br]But of course, if you want to visit

0:13:03.800,0:13:06.779
Google’s home page, and you have to wait[br]five minutes for it, you’re simply

0:13:06.779,0:13:11.910
just not going to use Tor. The whole point[br]is trying to make this technology usable.

0:13:11.910,0:13:14.759
And if you got something which is very,[br]very slow then it doesn’t make it

0:13:14.759,0:13:18.269
attractive to use. But of course,[br]this case does work slightly better

0:13:18.269,0:13:22.059
for e-mail. If you think about it with[br]e-mail, you don’t mind if your e-mail

0:13:22.059,0:13:25.399
– well, you may not mind, you may mind –[br]you don’t mind if your e-mail is delayed

0:13:25.399,0:13:29.120
by some period of time. Which makes this[br]somewhat difficult. And as Roger said

0:13:29.120,0:13:35.130
earlier, you can also introduce padding[br]into the circuit, so these are dummy cells.

0:13:35.130,0:13:39.839
But, but… with a big caveat: some of the[br]research suggests that actually you’d

0:13:39.839,0:13:43.439
need to introduce quite a lot of padding[br]to defeat these attacks, and that would

0:13:43.439,0:13:47.179
overload the Tor network in its current[br]state. So, again, not a particularly

0:13:47.179,0:13:53.860
practical solution.

0:13:53.860,0:13:58.279
How does Tor try to solve this problem?[br]Well, Tor makes it very difficult

0:13:58.279,0:14:03.171
to become a user’s Guard relay. If you[br]can’t become a user’s Guard relay

0:14:03.171,0:14:07.839
then you don’t know who the user is, quite[br]simply. And so by making it very hard

0:14:07.839,0:14:13.249
to become the Guard relay, Tor stops you[br]doing this traffic correlation attack.

0:14:13.249,0:14:17.579
So at the moment the Tor client chooses[br]one Guard relay and keeps it for a period

0:14:17.579,0:14:22.259
of time. So if I want to sort of target[br]just one of you I would need to control

0:14:22.259,0:14:26.259
the Guard relay that you were using at[br]that particular point in time. And in fact

0:14:26.259,0:14:30.679
I’d also need to know what that Guard[br]relay is. So by making it very unlikely

0:14:30.679,0:14:34.129
that you would select a particular malicious[br]Guard relay, where the number of malicious

0:14:34.129,0:14:39.179
Guard relays is very small, that’s how Tor[br]tries to solve this problem. And

0:14:39.179,0:14:43.280
at the moment your Guard relay is your[br]barrier of security. If the attacker can’t

0:14:43.280,0:14:46.460
control the Guard relay then they won’t[br]know who you are. That doesn’t mean

0:14:46.460,0:14:50.639
they can’t try other sort of side channel[br]attacks by messing with the traffic

0:14:50.639,0:14:55.129
at the Exit relay etc. You know that you[br]may sort of e.g. download dodgy documents

0:14:55.129,0:14:59.499
and open one on your computer, and those[br]sort of things. Now the alternative

0:14:59.499,0:15:02.769
of course to having a Guard relay[br]and keeping it for a very long time

0:15:02.769,0:15:06.029
would be to have a Guard relay and[br]to change it on a regular basis.

0:15:06.029,0:15:09.929
Because you might think, well, just choosing[br]one Guard relay and sticking with it

0:15:09.929,0:15:13.399
is probably a bad idea. But actually,[br]that’s not the case. If you pick

0:15:13.399,0:15:18.370
the Guard relay, and assuming that the[br]chance of picking a Guard relay that is

0:15:18.370,0:15:22.800
malicious is very low, then, when you[br]first use your Guard relay, if you got

0:15:22.800,0:15:27.420
a good choice, then your traffic is safe.[br]If you haven’t got a good choice then

0:15:27.420,0:15:31.759
your traffic isn’t safe. Whereas if your[br]Tor client chooses a Guard relay

0:15:31.759,0:15:35.610
every few minutes, or every hour, or[br]something along those lines, at some point

0:15:35.610,0:15:39.179
you’re gonna pick a malicious Guard relay.[br]So they’re gonna have some of your traffic

0:15:39.179,0:15:43.399
but not all of it. And so currently the[br]trade-off is that we make it very difficult

0:15:43.399,0:15:48.490
for an attacker to control a Guard relay[br]and the user picks a Guard relay and

0:15:48.490,0:15:52.449
keeps it for a long period of time. And[br]so it’s very difficult for the attackers

0:15:52.449,0:15:58.939
to pick that Guard relay when they control[br]a very small proportion of the network.
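
The trade-off can be put into numbers with a short sketch. It assumes a simplified model that is not from the talk: every guard choice is independent and hits a malicious relay with the same fixed probability `p_malicious`, ignoring bandwidth weighting and everything else the real client does. The 1% figure is purely illustrative.

```python
def p_compromised(p_malicious: float, n_picks: int) -> float:
    """Chance that at least one of n independent guard picks is malicious."""
    return 1.0 - (1.0 - p_malicious) ** n_picks

p = 0.01  # illustrative assumption: 1% of guard selections land on a bad relay

# One long-lived guard versus picking a fresh guard many times.
for picks in (1, 10, 100, 1000):
    print(f"{picks:5d} picks -> {p_compromised(p, picks):.4f}")
```

With a single long-lived guard the risk stays at `p`; with frequent rotation it climbs towards certainty, which is the argument for keeping one guard for a long time.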

0:15:58.939,0:16:06.420
So this, currently, provides those[br]properties I described earlier, the privacy

0:16:06.420,0:16:11.410
and the anonymity when you’re browsing the[br]web, when you’re accessing websites etc.

0:16:11.410,0:16:16.519
But still you know who the website is. So[br]although you’re anonymous and the website

0:16:16.519,0:16:20.730
doesn’t know who you are you know who the[br]website is. And there may be some cases

0:16:20.730,0:16:25.499
where e.g. the website would also wish to[br]remain anonymous. You want the person

0:16:25.499,0:16:29.970
accessing the website and the website[br]itself to be anonymous to each other.

0:16:29.970,0:16:34.230
And you could think about people e.g.[br]being in countries where running

0:16:34.230,0:16:39.730
a political blog e.g. might be a dangerous[br]activity. If you run that on a regular

0:16:39.730,0:16:45.660
webserver you’re easily identified whereas,[br]if you got some way where you as

0:16:45.660,0:16:49.490
the webserver can be anonymous then[br]that allows you to do that activity without

0:16:49.490,0:16:57.480
being targeted by your government. So[br]this is what hidden services try to solve.

0:16:57.480,0:17:03.080
Now when you first think about a problem[br]you kind of think: “Hang on a second,

0:17:03.080,0:17:06.429
the user doesn’t know who the website[br]is and the website doesn’t know

0:17:06.429,0:17:09.890
who the user is. So how on earth do they[br]talk to each other?” Well, that’s essentially

0:17:09.890,0:17:14.220
what the Tor hidden service protocol tries[br]to sort of set up. How do you identify and

0:17:14.220,0:17:19.579
connect to each other. So at the moment[br]this is what happens: We’ve got Bob

0:17:19.579,0:17:23.780
on the [right] hand side who is the hidden[br]service. And we got Alice on the left hand

0:17:23.780,0:17:28.620
side here who is the user who wishes to[br]visit the hidden service. Now when Bob

0:17:28.620,0:17:34.190
sets up his hidden service he picks three[br]nodes in the Tor network as introduction

0:17:34.190,0:17:38.831
points and builds multi-hop circuits to[br]them. So the introduction points don’t know

0:17:38.831,0:17:44.680
who Bob is. Bob has circuits to them. And[br]Bob says to each of these introduction points

0:17:44.680,0:17:48.240
“Will you relay traffic to me if someone[br]connects to you asking for me?”

0:17:48.240,0:17:53.030
And then those introduction points[br]do that. So then, once Bob has picked

0:17:53.030,0:17:56.840
his introduction points he publishes[br]a descriptor describing the list of his

0:17:56.840,0:18:01.310
introduction points for someone who wishes[br]to come onto his website. And then Alice

0:18:01.310,0:18:06.700
on the left hand side wishing to visit Bob[br]will pick a rendezvous point in the network

0:18:06.700,0:18:10.030
and build a circuit to it. So this “RP”[br]here is the rendezvous point.

0:18:10.030,0:18:14.530
And she will relay a message via one of[br]the introduction points saying to Bob:

0:18:14.530,0:18:18.290
“Meet me at the rendezvous point”.[br]And then Bob will build a 3-hop circuit

0:18:18.290,0:18:22.870
to the rendezvous point. So now at this[br]stage we got Alice with a multi-hop circuit

0:18:22.870,0:18:26.890
to the rendezvous point, and Bob with[br]a multi-hop circuit to the rendezvous point.

0:18:26.890,0:18:32.550
Alice and Bob haven’t connected to one[br]another directly. The rendezvous point

0:18:32.550,0:18:36.530
doesn’t know who Bob is, the rendezvous[br]point doesn’t know who Alice is.

0:18:36.530,0:18:40.261
All it’s doing is forwarding the[br]traffic. And it can’t inspect the traffic,

0:18:40.261,0:18:43.740
either, because the traffic itself[br]is encrypted.

0:18:43.740,0:18:47.530
So that’s currently how you solve this[br]problem with trying to communicate

0:18:47.530,0:18:50.820
with someone who you don’t know[br]who they are and vice versa.

0:18:50.820,0:18:55.740
<i>drinks from the bottle</i>

0:18:55.740,0:18:58.870
The principal thing I’m going to talk[br]about today is this database.

0:18:58.870,0:19:01.990
So I said, Bob, when he picks his[br]introduction points he builds this thing

0:19:01.990,0:19:06.080
called a descriptor, describing who his[br]introduction points are, and he publishes

0:19:06.080,0:19:10.390
it to a database. This database itself[br]is distributed throughout the Tor network.

0:19:10.390,0:19:17.860
It’s not a single server. So both Bob and[br]Alice need to be able to publish information

0:19:17.860,0:19:22.040
to this database, and also retrieve[br]information from this database. And Tor

0:19:22.040,0:19:24.820
currently uses something called[br]a distributed hash table, which I’m gonna

0:19:24.820,0:19:27.930
give an example of what this means and[br]how it works. And then I’ll talk you through

0:19:27.930,0:19:34.380
specifically how the Tor Distributed Hash[br]Table itself works. So let’s say e.g.

0:19:34.380,0:19:39.830
you've got a set of servers. So here we've[br]got 26 servers and you’d like to store

0:19:39.830,0:19:44.240
your files across these different servers[br]without having a single server responsible

0:19:44.240,0:19:48.050
for deciding, “okay, that file is stored[br]on that server, and this file is stored

0:19:48.050,0:19:53.050
on that server” etc. etc. Now here is my[br]list of files. You could take a very naive

0:19:53.050,0:19:57.740
approach. And you could say: “Okay, I’ve[br]got 26 servers, I got all of these file names

0:19:57.740,0:20:01.250
and they start with a letter of the alphabet.”[br]And I could say: “All of the files that begin

0:20:01.250,0:20:05.450
with A are gonna go on server A; all[br]the files that begin with B are gonna go

0:20:05.450,0:20:09.900
on server B etc.” And then when you want[br]to retrieve a file you say: “Okay, what

0:20:09.900,0:20:13.950
does my file name begin with?” And then[br]you know which server it’s stored on.

0:20:13.950,0:20:17.750
Now of course you could have a lot of[br]servers – sorry – a lot of files

0:20:17.750,0:20:22.780
which begin with a Z, an X or a Y etc. in[br]which case you’re gonna overload

0:20:22.780,0:20:27.310
that server. You’re gonna have more files[br]stored on one server than on another server

0:20:27.310,0:20:32.150
in your set. And if you have a lot of big[br]files, say e.g. beginning with B then

0:20:32.150,0:20:35.520
rather than distributing your files across[br]all the servers you’re gonna just be

0:20:35.520,0:20:39.060
overloading one or two of them. So to[br]solve this problem what we tend to do is:

0:20:39.060,0:20:42.410
we take the file name, and we run it[br]through a cryptographic hash function.

0:20:42.410,0:20:46.930
A hash function produces output which[br]looks random; a very small change

0:20:46.930,0:20:50.740
in the input to a cryptographic hash[br]function produces a very large change

0:20:50.740,0:20:55.240
in the output. And this change looks[br]random. So if I take all of my file names

0:20:55.240,0:20:59.820
here, and assuming I have a lot more,[br]I take a hash of them, and then I use

0:20:59.820,0:21:05.470
that hash to determine which server to[br]store the file on. Then, with high probability

0:21:05.470,0:21:09.670
my files will be distributed evenly across[br]all of the servers. And then when I want

0:21:09.670,0:21:12.990
to go and retrieve one of the files I take[br]my file name, I run it through the

0:21:12.990,0:21:15.980
cryptographic hash function, that gives me[br]the hash, and then I use that hash

0:21:15.980,0:21:19.740
to identify which server that particular[br]file is stored on. And then I go and

0:21:19.740,0:21:25.990
retrieve it. So that’s sort of a loose[br]idea of how a distributed hash table works.
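
The two placement rules can be compared with a small sketch. The 26 servers named A to Z follow the example in the talk; the file names, the choice of SHA-256 and the modulo step are assumptions made for illustration, and this is not how Tor itself places descriptors.

```python
import hashlib
from collections import Counter

# 26 servers named A..Z, as in the example.
SERVERS = [chr(ord("A") + i) for i in range(26)]

def server_by_first_letter(filename: str) -> str:
    """Naive rule: the first letter of the name picks the server.

    Assumes the name starts with a letter A-Z.
    """
    return filename[0].upper()

def server_by_hash(filename: str) -> str:
    """Hash rule: a cryptographic hash of the name picks the server."""
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    return SERVERS[int.from_bytes(digest, "big") % len(SERVERS)]

# Hypothetical workload: many files that all begin with the same letter.
filenames = [f"backup_{i:04d}.tar" for i in range(2600)]

naive_load = Counter(server_by_first_letter(name) for name in filenames)
hashed_load = Counter(server_by_hash(name) for name in filenames)

print("naive :", dict(naive_load))
print("hashed:", len(hashed_load), "servers used, busiest holds",
      max(hashed_load.values()), "files")
```

Anyone who knows a file name can recompute `server_by_hash` and go straight to the right server, so no central index is needed for either storing or retrieving.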

0:21:25.990,0:21:29.340
There are a couple of problems with this.[br]What if you’ve got a changing size, what

0:21:29.340,0:21:34.700
if the number of servers you’ve got[br]changes, as it does in the Tor network?

0:21:34.700,0:21:42.290
It’s a very brief overview of the theory.[br]So how does it apply to the Tor network?

0:21:42.290,0:21:47.640
Well, the Tor network has a set of relays[br]and it has a set of hidden services.

0:21:47.640,0:21:52.710
Now we take all of the relays, and they[br]have a hash identity which identifies them.

0:21:52.710,0:21:57.460
And we map them onto a circle using that[br]hash value as an identifier. So you can

0:21:57.460,0:22:03.230
imagine the hash value ranging from zero[br]to a very large number. We’ve got a zero point

0:22:03.230,0:22:07.280
at the very top there. And that runs all[br]the way round to the very large number.

0:22:07.280,0:22:12.130
So given the identity hash for a relay we[br]can map that to a particular point on

0:22:12.130,0:22:19.070
the circle. And then all we have to do[br]is also do this for hidden services.

0:22:19.070,0:22:22.320
So there’s a hidden service address,[br]something.onion, so this is

0:22:22.320,0:22:27.750
one of the hidden websites that you might[br]visit. You take the – I’m not gonna describe

0:22:27.750,0:22:33.980
in too much detail how this is done but –[br]the value is derived in such a way that

0:22:33.980,0:22:38.020
it’s evenly distributed about the circle.[br]So your hidden service will have

0:22:38.020,0:22:44.240
a particular point on the circle. And the[br]relays will also be mapped onto this circle.

0:22:44.240,0:22:49.640
So there’s the relays. And the hidden[br]service. And in the case of Tor

0:22:49.640,0:22:53.460
the hidden service actually maps to two[br]positions on the circle, and it publishes

0:22:53.460,0:22:57.850
its descriptor to the three relays to the[br]right at one position, and the three relays

0:22:57.850,0:23:01.600
to the right at another position. So there[br]are actually in total six places where

0:23:01.600,0:23:05.060
this descriptor is published on the[br]circle. And then if I want to go and

0:23:05.060,0:23:09.450
fetch and connect to a hidden service[br]I’m going to go and pull this hidden service descriptor

0:23:09.450,0:23:13.780
down to identify what its introduction[br]points are. I take the hidden service

0:23:13.780,0:23:17.200
address, I find out where it is on the[br]circle, I map all of the relays onto

0:23:17.200,0:23:21.110
the circle, and then I identify which[br]relays on the circle are responsible

0:23:21.110,0:23:24.031
for that particular hidden service. And[br]I just connect, then I say: “Do you have

0:23:24.031,0:23:26.630
a copy of the descriptor for that[br]particular hidden service?”
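
The lookup on the circle can be sketched as below. This is a simplified model, not Tor’s actual implementation: the way a position is derived from an onion address (`ring_position` over the address plus a replica byte) is an assumption for illustration, and all function names are hypothetical. Only the two positions and the three relays per position come from the talk.

```python
import bisect
import hashlib


def ring_position(data: bytes) -> int:
    """Map arbitrary bytes to a point on the circle (a 160-bit integer)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def responsible_relays(relay_positions, point, count=3):
    """Return the next `count` relay positions clockwise from `point`."""
    ordered = sorted(relay_positions)
    # First relay strictly after the point; wrap around past the top.
    start = bisect.bisect_right(ordered, point)
    wanted = min(count, len(ordered))
    return [ordered[(start + i) % len(ordered)] for i in range(wanted)]


def descriptor_holders(relay_positions, onion_address, replicas=2, count=3):
    """Relays holding the descriptor: `count` relays at each of `replicas` points.

    Assumption: each replica point is the hash of the address plus a
    replica number. The real derivation in Tor is more involved.
    """
    holders = []
    for replica in range(replicas):
        point = ring_position(onion_address.encode("ascii") + bytes([replica]))
        holders.append(responsible_relays(relay_positions, point, count))
    return holders


if __name__ == "__main__":
    relays = [ring_position(f"relay-{i}".encode("ascii")) for i in range(20)]
    for group in descriptor_holders(relays, "something.onion"):
        print([hex(p)[:10] for p in group])
```

A client and a hidden service that share the same view of the relay list compute the same six relays, which is why no central index is needed.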

0:23:26.630,0:23:29.620
And if so then we’ve got our list of[br]introduction points. And we can go

0:23:29.620,0:23:38.020
to the next steps to connect to our hidden[br]service. So I’m gonna explain how we

0:23:38.020,0:23:41.320
sort of set up our experiments. What we[br]thought, or what we were interested to do,

0:23:41.320,0:23:48.181
was collect publications of hidden[br]services. So every time a hidden service

0:23:48.181,0:23:51.520
gets set up it publishes to this distributed[br]hash table. What we wanted to do was

0:23:51.520,0:23:55.750
collect those publications so that we[br]get a complete list of all of the hidden

0:23:55.750,0:23:59.280
services. And what we also wanted to do[br]is to find out how many times a particular

0:23:59.280,0:24:06.300
hidden service is requested.

0:24:06.300,0:24:10.540
Just one more point that[br]will become important later.

0:24:10.540,0:24:14.230
The position at which the hidden service[br]appears on the circle changes

0:24:14.230,0:24:18.950
every 24 hours. So there’s not[br]a fixed position every single day.

0:24:18.950,0:24:24.370
If we run 40 nodes over a long period of[br]time we will occupy positions within

0:24:24.370,0:24:29.570
that distributed hash table. And we will be[br]able to collect publications and requests

0:24:29.570,0:24:34.300
for hidden services that are located at[br]that position inside the distributed

0:24:34.300,0:24:39.251
hash table. So in that case we ran 40 Tor[br]nodes, we had a student at university

0:24:39.251,0:24:43.950
who said: “Hey, I run a hosting company,[br]I got loads of server capacity”, and

0:24:43.950,0:24:46.580
we told him what we were doing, and he[br]said: “Well, you really helped us out,

0:24:46.580,0:24:49.820
these last couple of years…”[br]and just gave us loads of server capacity

0:24:49.820,0:24:55.500
to allow us to do this. So we spun up 40[br]Tor nodes. Each Tor node was required

0:24:55.500,0:24:59.560
to advertise a certain amount of bandwidth[br]to become a part of that distributed

0:24:59.560,0:25:02.200
hash table. It’s actually a very small[br]amount, so this didn’t matter too much.

0:25:02.200,0:25:06.050
And then, after – this has changed[br]recently in the last few days,

0:25:06.050,0:25:10.070
it used to be 25 hours, it’s just been[br]increased as a result of one of the

0:25:10.070,0:25:14.570
attacks last week. But here… certainly[br]during our study it was 25 hours. You then

0:25:14.570,0:25:18.300
appear at a particular point inside that[br]distributed hash table. And you’re then

0:25:18.300,0:25:22.750
in a position to record publications of[br]hidden services and requests for hidden

0:25:22.750,0:25:27.810
services. So not only can you get a full[br]list of the onion addresses you can also

0:25:27.810,0:25:32.250
find out how many times each of the[br]onion addresses are requested.

0:25:32.250,0:25:38.270
And so this is what we recorded. And then,[br]once we had a full list of… or once

0:25:38.270,0:25:41.830
we had run for a long period of time to[br]collect a long list of .onion addresses

0:25:41.830,0:25:46.850
we then built a custom crawler that would[br]visit each of the Tor hidden services

0:25:46.850,0:25:51.450
in turn, and pull down the HTML contents,[br]the text content from the web page,

0:25:51.450,0:25:54.760
so that we could go ahead and classify[br]the content. Now it’s really important

0:25:54.760,0:25:59.250
to note here, and it will become obvious[br]why a little bit later: we only pulled down

0:25:59.250,0:26:03.030
HTML content. We didn’t pull down images.[br]And there’s a very, very important reason

0:26:03.030,0:26:09.980
for that which will become clear shortly.

0:26:09.980,0:26:13.520
We had a lot of questions when we[br]first started this. No one really knew

0:26:13.520,0:26:18.000
how many hidden services there were. It had[br]been suggested to us there was a very high

0:26:18.000,0:26:21.250
turnover of hidden services. We wanted to[br]confirm whether that was true or not.

0:26:21.250,0:26:24.530
And we also wanted to know:[br]what are the hidden services,

0:26:24.530,0:26:30.140
how popular are they, etc. etc. etc. So[br]our estimate for how many hidden services

0:26:30.140,0:26:34.770
there are, over the period which we[br]ran our study, this is a graph plotting

0:26:34.770,0:26:38.560
our estimate for each of the individual[br]days as to how many hidden services

0:26:38.560,0:26:44.850
there were on that particular day. Now the[br]data is naturally noisy because we’re only

0:26:44.850,0:26:48.590
a very small proportion of that circle.[br]So we’re only observing a very small

0:26:48.590,0:26:53.250
proportion of the total publications and[br]requests every single day, for each of

0:26:53.250,0:26:57.260
those hidden services. And if you[br]take a long term average for this

0:26:57.260,0:27:02.720
there are about 45,000 hidden services that[br]we think were present, on average,

0:27:02.720,0:27:07.880
each day, during our entire study. Which[br]is a large number of hidden services.

0:27:07.880,0:27:11.070
But over the entire length we[br]collected about 80,000, in total.

0:27:11.070,0:27:14.270
Some came and went etc.[br]So the next question after how many

0:27:14.270,0:27:17.750
hidden services there are is how long[br]the hidden service exists for.

0:27:17.750,0:27:20.620
Does it exist for a very long period[br]of time, does it exist for a very short

0:27:20.620,0:27:24.220
period of time etc. etc.[br]So what we did was, for every single

0:27:24.220,0:27:30.260
.onion address we plotted how many times[br]we saw a publication for that particular

0:27:30.260,0:27:34.160
hidden service during the six months.[br]How many times did we see it.

0:27:34.160,0:27:38.100
If we saw it a lot of times that suggested[br]in general the hidden service existed

0:27:38.100,0:27:42.180
for a very long period of time. If we saw[br]a very small number of publications

0:27:42.180,0:27:45.760
for each hidden service then that[br]suggests that they were only present

0:27:45.760,0:27:51.690
for a very short period of time. This is[br]our graph. By far the largest number

0:27:51.690,0:27:55.890
of hidden services we only saw once during[br]the entire study. And we never saw them

0:27:55.890,0:28:00.390
again. This suggests that there’s a very high[br]turnover of the hidden services, i.e. they

0:28:00.390,0:28:04.520
don’t tend to exist, on average, for[br]a very long period of time.

0:28:04.520,0:28:10.730
And then you can see a sort of[br]tail here. If we plot just those

0:28:10.730,0:28:16.390
hidden services which existed for a long[br]time, so e.g. we could take hidden services

0:28:16.390,0:28:20.280
which have a high number of hit requests[br]and say: “Okay, those that have a high number

0:28:20.280,0:28:24.800
of hits probably existed for a long time.”[br]That’s not absolutely certain, but probably.

0:28:24.800,0:28:29.190
Then you see this sort of -normal- plot[br]around 4 to 5, so we saw on average

0:28:29.190,0:28:34.870
most hidden services four or five times[br]during the entire six months if they were

0:28:34.870,0:28:40.530
popular and we’re using that as a proxy[br]measure for whether they existed

0:28:40.530,0:28:48.160
for the entire time. Now, this study ran[br]over 160 days, so almost six months.

0:28:48.160,0:28:51.490
What we also wanted to do was try[br]to confirm this over a longer period.

0:28:51.490,0:28:56.310
So last year, in 2013, about February time,[br]some researchers at the University

0:28:56.310,0:29:00.350
of Luxembourg also ran a similar study,[br]but it ran over a very short period of time,

0:29:00.350,0:29:05.060
over a single day. But they did it in such[br]a way that it could collect descriptors

0:29:05.060,0:29:08.590
across much of the circle during a single[br]day. That was because of a bug in the way

0:29:08.590,0:29:12.020
Tor did some things, which has[br]now been fixed, so we can’t repeat it

0:29:12.020,0:29:16.520
in that particular way. So we got a list of[br].onion addresses from February 2013

0:29:16.520,0:29:18.960
from these researchers at the University[br]of Luxembourg. And then we got our list

0:29:18.960,0:29:23.670
of .onion addresses from this six months[br]which was March to September of this year.

0:29:23.670,0:29:26.700
And we wanted to say, okay, we’re given[br]these two sets of .onion addresses.

0:29:26.700,0:29:30.740
Which .onion addresses existed in their set[br]but not ours and vice versa, and which

0:29:30.740,0:29:39.740
.onion addresses existed in both sets?
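
The comparison of the two collections is plain set arithmetic. A minimal sketch, with placeholder addresses and a hypothetical function name (the real address lists are not part of this transcript):

```python
def compare_address_sets(earlier, later):
    """Split two collections of .onion addresses into three groups."""
    earlier, later = set(earlier), set(later)
    return {
        "only_earlier": earlier - later,  # seen in the first collection only
        "only_later": later - earlier,    # seen in the second collection only
        "both": earlier & later,          # seen in both collections
    }


if __name__ == "__main__":
    # Placeholder addresses, not real hidden services.
    feb_2013 = ["aaaa.onion", "bbbb.onion", "cccc.onion"]
    mar_sep_2014 = ["cccc.onion", "dddd.onion"]
    result = compare_address_sets(feb_2013, mar_sep_2014)
    for group, addresses in result.items():
        print(group, sorted(addresses))
```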

0:29:39.740,0:29:45.520
So as you can see a very small minority[br]of hidden service addresses existed

0:29:45.520,0:29:50.000
in both sets. This is over an 18-month[br]period between these two collection points.

0:29:50.000,0:29:54.430
A very small number of services existed[br]in both their data set and in

0:29:54.430,0:29:58.390
our data set. Which again suggested[br]there’s a very high turnover of hidden

0:29:58.390,0:30:02.920
services that don’t tend to exist[br]for a very long period of time.

0:30:02.920,0:30:06.530
So the question is why is that?[br]Which we’ll come on to a little bit later.

0:30:06.530,0:30:11.120
It’s a very valid question, we can’t answer[br]it 100%, but we have some inklings as to

0:30:11.120,0:30:15.560
why that may be the case. So in terms[br]of popularity which hidden services

0:30:15.560,0:30:19.700
did we see, or which .onion addresses[br]did we see requested the most?

0:30:19.700,0:30:26.980
Which got the most number of hits? Or the[br]most number of directory requests.

0:30:26.980,0:30:30.120
So botnet Command &amp; Control servers[br]– if you’re not familiar with what

0:30:30.120,0:30:34.340
a botnet is, the idea is to infect lots of[br]people with a piece of malware.

0:30:34.340,0:30:37.630
And this malware phones home to[br]a Command &amp; Control server where

0:30:37.630,0:30:41.500
the botnet master can give instructions[br]to each of the bots to do things.

0:30:41.500,0:30:46.780
So it might be e.g. to collect passwords,[br]key strokes, banking details.

0:30:46.780,0:30:51.010
Or it might be to do things like[br]Distributed Denial of Service attacks,

0:30:51.010,0:30:55.220
or to send spam, those sorts of things.[br]And a couple of years ago someone gave

0:30:55.220,0:31:00.720
a talk and said: “Well, the problem with[br]running a botnet is your C&amp;C servers

0:31:00.720,0:31:05.750
are vulnerable.” Once a C&amp;C server is taken[br]down you no longer have control over

0:31:05.750,0:31:10.030
your botnet. So it’s been a sort of arms[br]race between anti-virus companies and

0:31:10.030,0:31:15.130
malware authors, who try to come up[br]with techniques to run C&amp;C servers in a way

0:31:15.130,0:31:18.490
in which they can’t be taken down. And[br]a couple of years ago someone gave a talk

0:31:18.490,0:31:22.450
at a conference that said: “You know what?[br]It would be a really good idea if botnet

0:31:22.450,0:31:25.809
C&amp;C servers were run as Tor hidden[br]services because then no one knows

0:31:25.809,0:31:29.370
where they are, and in theory they can’t[br]be taken down.” So in fact we have this:

0:31:29.370,0:31:33.000
there are loads and loads and loads of[br]these addresses associated with several

0:31:33.000,0:31:38.122
different botnets, ‘Sefnit’ and ‘Skynet’.[br]Now Skynet is the one I wanted to talk

0:31:38.122,0:31:42.840
to you about because the guy that runs[br]Skynet had a twitter account, and he also

0:31:42.840,0:31:47.210
did a Reddit AMA. If you’ve not heard[br]of a Reddit AMA before, that’s a Reddit

0:31:47.210,0:31:51.500
ask-me-anything. You can go on the website[br]and ask the guy anything. So this guy

0:31:51.500,0:31:54.790
wasn’t hiding in the shadows. He’d say:[br]“Hey, I’m running this massive botnet,

0:31:54.790,0:31:58.180
here’s my Twitter account which I update[br]regularly, here is my Reddit AMA where

0:31:58.180,0:32:01.620
you can ask me questions!” etc.

0:32:01.620,0:32:04.590
He was arrested last year, which is not,[br]perhaps, a huge surprise.

0:32:04.590,0:32:11.750
<i>laughter and applause</i>

0:32:11.750,0:32:15.970
But… so he was arrested,[br]his C&amp;C servers disappeared

0:32:15.970,0:32:21.600
but there were still infected hosts trying[br]to connect with the C&amp;C servers and

0:32:21.600,0:32:24.490
request access to the C&amp;C server.

0:32:24.490,0:32:27.570
This is why we’re seeing a large number[br]of hits. So all of these requests are

0:32:27.570,0:32:31.520
failed requests, i.e. we didn’t have[br]a descriptor for them because

0:32:31.520,0:32:34.910
the hidden service had gone away but[br]there were still clients requesting each

0:32:34.910,0:32:38.040
of the hidden services.

0:32:38.040,0:32:41.980
And the next thing we wanted to do was[br]to try and categorize sites. So, as I said

0:32:41.980,0:32:45.960
earlier, we crawled all of the hidden[br]services that we could, and we classified

0:32:45.960,0:32:50.230
them into different categories based[br]on what the type of content was

0:32:50.230,0:32:53.650
on the hidden service site. The first[br]graph I have is the number of sites

0:32:53.650,0:32:58.040
in each of the categories. So you can see[br]down the bottom here we got lots of

0:32:58.040,0:33:04.280
different categories. We got drugs, market[br]places, etc. on the bottom. And the graph

0:33:04.280,0:33:07.360
shows the percentage of the hidden[br]services that we crawled that fit in

0:33:07.360,0:33:12.680
to each of these categories. So e.g. looking[br]at this, drugs, the largest number of sites

0:33:12.680,0:33:16.250
that we crawled were made up of[br]drugs-focused websites, followed by

0:33:16.250,0:33:20.970
market places etc. There’s a couple of[br]questions you might have here,

0:33:20.970,0:33:25.640
so which ones are gonna stick out, what[br]does ‘porn’ mean, well, you know

0:33:25.640,0:33:31.060
what ‘porn’ means. There are some very[br]notorious porn sites on the Tor Darknet.

0:33:31.060,0:33:34.470
There was one in particular which was[br]focused on revenge porn. It turns out

0:33:34.470,0:33:37.520
that youngsters wish to take pictures[br]of themselves, and send them to their

0:33:37.520,0:33:45.040
boyfriends or their girlfriends. And[br]when they get dumped they publish them

0:33:45.040,0:33:49.750
on these websites. So there were several[br]of these sites on the main internet

0:33:49.750,0:33:53.070
which have mostly been shut down.[br]And some of these sites were archived

0:33:53.070,0:33:58.220
on the Darknet. The second one that[br]you’re probably wondering about

0:33:58.220,0:34:03.430
is ‘abuse’. Abuse was… every single[br]site we classified in this category

0:34:03.430,0:34:07.750
was a child abuse site. So they were in[br]some way facilitating child abuse.

0:34:07.750,0:34:10.980
And how do we know that? Well, the data[br]that came back from the crawler

0:34:10.980,0:34:14.789
made it completely unambiguous as to what[br]the content was in these sites. That was

0:34:14.789,0:34:18.918
completely obvious, from the content, from[br]the crawler as to what was on these sites.

0:34:18.918,0:34:23.449
And this is the principal reason why we[br]didn’t pull down images from sites.

0:34:23.449,0:34:26.099
In many countries it[br]would be a criminal offense to do so.

0:34:26.099,0:34:29.530
So our crawler only pulled down text[br]content from all of these sites, and that

0:34:29.530,0:34:34.470
enabled us to classify them, based on[br]that. We didn’t pull down any images.

0:34:34.470,0:34:37.880
So of course the next thing we’d like to do[br]is to say: “Okay, well, given each of these

0:34:37.880,0:34:42.759
categories, what proportion of directory[br]requests went to each of the categories?”

0:34:42.759,0:34:45.489
Now the next graph is going to need some[br]explaining as to precisely what it

0:34:45.489,0:34:52.090
means, and I’m gonna give that. This is[br]the proportion of directory requests

0:34:52.090,0:34:55.830
which we saw that went to each of the[br]categories of hidden service that we

0:34:55.830,0:34:59.740
classified. As you can see, in fact, we[br]saw a very large number going to these

0:34:59.740,0:35:05.010
abuse sites. And the rest sort of[br]distributed right there, at the bottom.

0:35:05.010,0:35:07.230
And the question is: “What is it[br]we’re collecting here?”

0:35:07.230,0:35:12.070
We’re collecting successful hidden service[br]directory requests. What does a hidden

0:35:12.070,0:35:16.790
service directory request mean?[br]It probably loosely correlates with

0:35:16.790,0:35:22.230
either a visit or a visitor. So somewhere[br]in between those two. Because when you

0:35:22.230,0:35:26.790
want to visit a hidden service you make[br]a request for the hidden service descriptor

0:35:26.790,0:35:31.080
and that allows you to connect to it[br]and browse through the web site.

0:35:31.080,0:35:34.770
But there are cases where, e.g. if you[br]restart Tor, you’ll go back and you

0:35:34.770,0:35:40.100
re-fetch the descriptor. So in that case[br]we’ll count twice, for example.

0:35:40.100,0:35:43.050
What proportion of these are people,[br]and which proportion of them are

0:35:43.050,0:35:46.619
something else? The answer to that is[br]we just simply don’t know.

0:35:46.619,0:35:50.250
We've got directory requests but that doesn’t[br]tell us about what they’re doing on these

0:35:50.250,0:35:55.130
sites, what they’re fetching, or who[br]indeed they are, or what it is they are.

0:35:55.130,0:35:58.690
So these could be automated requests,[br]they could be human beings. We can’t

0:35:58.690,0:36:03.750
distinguish between those two things.

0:36:03.750,0:36:06.420
What are the limitations?

0:36:06.420,0:36:12.170
A hidden service directory request doesn’t[br]exactly correlate to either a visit -or- a visitor.

0:36:12.170,0:36:16.380
It’s probably somewhere in between.[br]So you can’t say whether it’s exactly one

0:36:16.380,0:36:19.810
or the other. We cannot say whether[br]a hidden service directory request

0:36:19.810,0:36:26.230
is a person or something automated.[br]We can’t distinguish between those two.

0:36:26.230,0:36:31.890
Any type of site could be targeted by e.g.[br]DoS attacks, by web crawlers which would

0:36:31.890,0:36:40.040
greatly inflate the figures. If you were[br]to do a DoS attack it’s likely you’d only

0:36:40.040,0:36:44.700
request a small number of descriptors.[br]You’d actually be flooding the site itself

0:36:44.700,0:36:47.740
rather than the directories. But, in[br]theory, you could flood the directories.

0:36:47.740,0:36:52.840
But we didn’t see any sort of shutdown[br]of our directories based on flooding, e.g.

0:36:52.840,0:36:58.720
Whilst we can’t rule that out, it doesn’t[br]seem to fit too well with what we’ve got.

0:36:58.720,0:37:02.971
The other question is ‘crawlers’.[br]I obviously talked with the Tor Project

0:37:02.971,0:37:08.570
about these results and they’ve suggested[br]that there are groups, so the child

0:37:08.570,0:37:12.740
protection agencies e.g. that will crawl[br]these sites on a regular basis. And,

0:37:12.740,0:37:15.879
again, that doesn’t necessarily correlate[br]with a human being. And that could

0:37:15.879,0:37:19.830
inflate the figures. How many hidden service[br]directory requests would there be

0:37:19.830,0:37:24.610
if a crawler was pointed at it? Typically,[br]if I crawl them on a single day, one request.

0:37:24.610,0:37:27.850
But if they got a large number of servers[br]doing the crawling then it could be

0:37:27.850,0:37:32.840
a request per day for every single server.[br]So, again, I can’t give you a definitive

0:37:32.840,0:37:37.930
“yes, this is human beings” or[br]“yes, this is automated requests”.

0:37:37.930,0:37:43.300
The other important point is, these two[br]content graphs cover only hidden services

0:37:43.300,0:37:48.550
offering web content. There are hidden[br]services that do things, e.g. IRC,

0:37:48.550,0:37:52.490
the instant messaging etc. Those aren’t[br]included in these figures. We’re only

0:37:52.490,0:37:57.990
concentrating on hidden services offering[br]web sites. They’re HTTP services, or HTTPS

0:37:57.990,0:38:01.640
services. Because that allows us to easily[br]classify them. And, in fact, for some of

0:38:01.640,0:38:06.080
the other types, like IRC and Jabber, the[br]results are probably not directly comparable

0:38:06.080,0:38:08.920
with web sites. The use[br]case for using them is probably

0:38:08.920,0:38:16.490
slightly different. So I appreciate the[br]last graph is somewhat alarming.

0:38:16.490,0:38:20.640
If you have any questions please ask[br]either me or the Tor developers

0:38:20.640,0:38:24.810
as to how to interpret these results. It’s[br]not quite as straight-forward as it may

0:38:24.810,0:38:27.500
look when you look at the graph. You[br]might look at the graph and say: “Hey,

0:38:27.500,0:38:30.980
that looks like there’s lots of people[br]visiting these sites”. It’s difficult

0:38:30.980,0:38:40.240
to conclude that from the results.

0:38:40.240,0:38:45.990
The next slide is gonna be very[br]contentious. I will prefix it with:

0:38:45.990,0:38:50.970
“I’m not advocating -any- kind of[br]action whatsoever. I’m just trying

0:38:50.970,0:38:56.130
to describe technically as to what could[br]be done. It’s not up to me to make decisions

0:38:56.130,0:39:02.869
on these types of things.” So, of course,[br]when we found this out, frankly, I think

0:39:02.869,0:39:06.190
we were stunned. I mean, it took us[br]several days, frankly, it just stunned us,

0:39:06.190,0:39:09.610
“what the hell, this is not[br]what we expected at all.”

0:39:09.610,0:39:13.210
So a natural step is, well, we think, most[br]of us think that Tor is a great thing,

0:39:13.210,0:39:18.510
it seems. Could this problem be sorted out[br]while still keeping Tor as it is?

0:39:18.510,0:39:21.510
And probably the next step is to say: “Well,[br]okay, could we just block this class

0:39:21.510,0:39:26.060
of content and not other types of content?”[br]So could we block just hidden services

0:39:26.060,0:39:29.630
that are associated with these sites and[br]not other types of hidden services?

0:39:29.630,0:39:33.370
We thought there’s three ways in which[br]we could block hidden services.

0:39:33.370,0:39:36.960
And I’ll talk about whether these will be[br]possible in the coming months,

0:39:36.960,0:39:39.430
after explaining them. But during our[br]study these would have been possible

0:39:39.430,0:39:43.590
and presently they are possible.

0:39:43.590,0:39:48.630
A single individual could shut down[br]a single hidden service by controlling

0:39:48.630,0:39:53.640
all of the relays which are responsible[br]for receiving a publication request

0:39:53.640,0:39:57.280
on that distributed hash table. It’s[br]possible to place one of your relays

0:39:57.280,0:40:01.460
at a particular position on that circle[br]and so therefore make yourself be

0:40:01.460,0:40:04.290
the responsible relay for[br]a particular hidden service.

0:40:04.290,0:40:08.500
And if you control all of the six relays[br]which are responsible for a hidden service,

0:40:08.500,0:40:11.390
when someone comes to you and says:[br]“Can I have a descriptor for that site”

0:40:11.390,0:40:15.910
you can just say: “No, I haven’t got it”.[br]And provided you control those relays

0:40:15.910,0:40:20.580
users won’t be able to fetch those sites.

0:40:20.580,0:40:25.010
The second option is you could say:[br]“Okay, the Tor Project are blocking these”

0:40:25.010,0:40:28.941
– which I’ll talk about in a second –[br]“as a relay operator”. Could I

0:40:28.941,0:40:32.500
as a relay operator say: “Okay, as[br]a relay operator I don’t want to carry

0:40:32.500,0:40:35.930
this type of content, and I don’t want to[br]be responsible for serving up this type

0:40:35.930,0:40:39.930
of content.” A relay operator could patch[br]his relay and say: “You know what,

0:40:39.930,0:40:44.020
if anyone comes to this relay requesting[br]any one of these sites then, again, just

0:40:44.020,0:40:48.740
refuse to do it”. The problem is a lot of[br]relay operators need to do it. So a very,

0:40:48.740,0:40:51.990
very large number of the potential relay[br]operators would need to do that

0:40:51.990,0:40:56.170
to effectively block these sites. The[br]final option is the Tor Project could

0:40:56.170,0:41:00.740
modify the Tor program and actually embed[br]these addresses in the Tor program itself

0:41:00.740,0:41:05.030
so that all relays by default both[br]block hidden service directory requests

0:41:05.030,0:41:10.560
to these sites, and also clients themselves[br]would say: “Okay, if anyone’s requesting

0:41:10.560,0:41:15.000
these block them at the client level.”[br]Now I hasten to add: I’m not advocating

0:41:15.000,0:41:18.230
any kind of action, that is entirely up to[br]other people because, frankly, I think

0:41:18.230,0:41:22.530
if I advocated blocking hidden services[br]I probably wouldn’t make it out alive,

0:41:22.530,0:41:27.050
so I’m just saying: this is a description[br]of what technical measures could be used

0:41:27.050,0:41:30.730
to block some classes of sites. And of[br]course there’s lots of questions here.

0:41:30.730,0:41:35.150
If e.g. the Tor Project themselves decided:[br]“Okay, we’re gonna block these sites”

0:41:35.150,0:41:38.490
that means they are essentially[br]in control of the block list.

0:41:38.490,0:41:41.360
The block list would be somewhat public[br]so everyone would be able to inspect

0:41:41.360,0:41:44.930
what the sites are that are being blocked[br]and they would be in control of some kind

0:41:44.930,0:41:54.360
of block list. Which, you know, arguably[br]is against what the Tor Project is after.

0:41:54.360,0:41:59.560
<i>takes a sip, coughs</i>

0:41:59.560,0:42:05.480
So how about deanonymising visitors[br]to hidden service web sites?

0:42:05.480,0:42:08.940
So in this case we got a user on the[br]left-hand side who is connected to

0:42:08.940,0:42:12.630
a Guard node. We’ve got a hidden service[br]on the right-hand side who is connected

0:42:12.630,0:42:17.530
to a Guard node and on the top we got[br]one of those directory servers which is

0:42:17.530,0:42:21.850
responsible for serving up those[br]hidden service directory requests.

0:42:21.850,0:42:28.660
Now, when you first want to connect to[br]a hidden service you connect through

0:42:28.660,0:42:31.619
your Guard node and through a couple of hops[br]up to the hidden service directory and

0:42:31.619,0:42:35.840
you request the descriptor off of them.[br]So at this point if you are the attacker

0:42:35.840,0:42:39.440
and you control one of the hidden service[br]directory nodes for a particular site

0:42:39.440,0:42:43.100
you can send back down the circuit[br]a particular pattern of traffic.

0:42:43.100,0:42:47.740
And if you control that user’s[br]Guard node – which is a big if –

0:42:47.740,0:42:52.110
then you can spot that pattern of traffic[br]at the Guard node. The question is:

0:42:52.110,0:42:56.940
“How do you control a particular user’s[br]Guard node?” That’s very, very hard.

0:42:56.940,0:43:01.480
But if e.g. I run a hidden service and all[br]of you visit my hidden service, and

0:43:01.480,0:43:05.670
I’m running a couple of dodgy Guard relays[br]then the probability is that some of you,

0:43:05.670,0:43:09.760
certainly not all of you by any stretch, will[br]select my dodgy Guard relay, and

0:43:09.760,0:43:13.220
I could deanonymise you, but I couldn’t[br]deanonymise the rest of you.

0:43:13.220,0:43:18.260
So what we’re saying here is that[br]you can deanonymise some of the users

0:43:18.260,0:43:22.130
some of the time but you can’t pick which[br]users those are which you’re going to

0:43:22.130,0:43:26.609
deanonymise. You can’t deanonymise someone[br]specific but you can deanonymise a fraction

0:43:26.609,0:43:32.170
based on what fraction of the network you[br]control in terms of Guard capacity.
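
The relationship between Guard capacity and the share of users affected can be put into numbers with a rough model. The sketch below assumes that every user picks each of their Guards independently, with probability equal to the attacker’s share of total Guard capacity; that is a simplification of Tor’s real Guard selection, and the function names and example figures are hypothetical.

```python
def chance_of_attacker_guard(attacker_guard_fraction: float,
                             guards_per_user: int = 1) -> float:
    """Probability that at least one of a user's Guards is the attacker's.

    Simplified model: each Guard is chosen independently, weighted by
    capacity, so one pick lands on the attacker with probability
    `attacker_guard_fraction`.
    """
    if not 0.0 <= attacker_guard_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if guards_per_user < 1:
        raise ValueError("a user needs at least one Guard")
    return 1.0 - (1.0 - attacker_guard_fraction) ** guards_per_user


def expected_deanonymised(visitors: int,
                          attacker_guard_fraction: float,
                          guards_per_user: int = 1) -> float:
    """Expected number of visitors exposed; the attacker cannot choose which."""
    return visitors * chance_of_attacker_guard(attacker_guard_fraction,
                                               guards_per_user)


if __name__ == "__main__":
    # Example figures only: 1,000 visitors, attacker holds 5% of Guard capacity.
    print(expected_deanonymised(1000, 0.05))
```

The model gives an expected count only. It says nothing about which particular users end up behind the attacker’s Guard, which matches the point that a specific person cannot be targeted this way.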

0:43:32.170,0:43:36.340
How about… so the attacker controls those[br]two – here’s a picture from researchers at

0:43:36.340,0:43:40.200
the University of Luxembourg who[br]did this. And these are plots of

0:43:40.200,0:43:45.270
taking the user’s IP address visiting[br]a C&amp;C server, and then geolocating it

0:43:45.270,0:43:48.480
and putting it on a map. So “where was the[br]user located when they called one of

0:43:48.480,0:43:51.620
the Tor hidden services?” So, again,[br]this is a selection, a percentage

0:43:51.620,0:43:58.060
of the users visiting C&C servers[br]using this technique.

0:43:58.060,0:44:03.770
How about deanonymising hidden services[br]themselves? Well, again, you got a problem.

0:44:03.770,0:44:08.340
You’re the user. You’re gonna connect[br]through your Guard into the Tor network.

0:44:08.340,0:44:12.160
And then, eventually, through the hidden[br]service’s Guard node, and talk to

0:44:12.160,0:44:16.740
the hidden service. As the attacker you[br]need to control the hidden service’s

0:44:16.740,0:44:20.859
Guard node to do these traffic correlation[br]attacks. So again, it’s very difficult

0:44:20.859,0:44:24.390
to deanonymise a specific Tor hidden[br]service. But if you think about, okay,

0:44:24.390,0:44:30.200
there are 1,000 Tor hidden services, if you[br]can control a percentage of the Guard nodes

0:44:30.200,0:44:34.230
then some hidden services will pick you[br]and then you’ll be able to deanonymise those.

0:44:34.230,0:44:37.330
So provided you don’t care which hidden[br]services you’re gonna deanonymise

0:44:37.330,0:44:41.400
then it becomes much more straightforward[br]to control the Guard nodes of some hidden

0:44:41.400,0:44:44.910
services but you can’t pick exactly[br]what those are.

0:44:44.910,0:44:51.040
So what sort of data can you see[br]traversing a relay?

0:44:51.040,0:44:55.880
This is a modified Tor client which just[br]dumps cells which are coming…

0:44:55.880,0:44:58.750
essentially packets travelling down[br]a circuit, and the information you can

0:44:58.750,0:45:04.020
extract from them at a Guard node.[br]And this is done off the main Tor network.

0:45:04.020,0:45:08.590
So I’ve got a client connected to[br]a “malicious” Guard relay

0:45:08.590,0:45:14.040
and it logs every single packet – they’re[br]called ‘cells’ in the Tor protocol –

0:45:14.040,0:45:17.619
coming through the Guard relay. We can’t[br]decrypt the packet because it’s encrypted

0:45:17.619,0:45:21.780
three times. What we can record,[br]though, is the IP address of the user,

0:45:21.780,0:45:25.070
the IP address of the next hop,[br]and we can count packets travelling

0:45:25.070,0:45:29.240
in each direction down the circuit. And we[br]can also record the time at which those

0:45:29.240,0:45:32.210
packets were sent. So of course, if you’re[br]doing the traffic correlation attacks

0:45:32.210,0:45:37.970
you’re using that timing information[br]to try and work out whether you’re seeing

0:45:37.970,0:45:42.370
traffic which you’ve sent and which[br]identifies a particular user or not.

0:45:42.370,0:45:44.810
Or indeed traffic which they’ve sent[br]which you’ve seen at a different point

0:45:44.810,0:45:49.100
in the network.
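
As a rough illustration of that timing comparison, here is a minimal Python sketch. It is not the method used in the research that was cited; it assumes the attacker has the per-circuit cell timestamps logged at the Guard and knows the burst pattern that was sent down the circuit. The window size, function names and data are all hypothetical.

```python
def bin_counts(timestamps, window, n_bins, start=0.0):
    """Count cells per fixed-size time window, beginning at `start` seconds."""
    counts = [0] * n_bins
    for t in timestamps:
        i = int((t - start) // window)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


def pearson(a, b):
    """Pearson correlation of two equal-length count sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def matches_signature(cell_times, signature, window=1.0, threshold=0.9):
    """True if the cells seen on one circuit follow the injected burst pattern."""
    observed = bin_counts(cell_times, window, len(signature))
    return pearson(observed, signature) >= threshold


if __name__ == "__main__":
    signature = [5, 0, 5, 0, 5, 0]  # injected pattern: bursts in windows 0, 2 and 4
    marked = [w + 0.1 * k for w in (0, 2, 4) for k in range(1, 6)]
    print(matches_signature(marked, signature))
```

Only counts and times are used here, which matches what the Guard can see: the cell contents stay encrypted.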

0:45:49.100,0:45:51.980
Moving on to my…

0:45:51.980,0:45:55.760
…interesting problems,[br]research questions etc.

0:45:55.760,0:45:59.250
Based on what I’ve said, I’ve said there’s[br]these directory authorities which are

0:45:59.250,0:46:05.070
controlled by the core Tor members. If[br]e.g. they were malicious then they could

0:46:05.070,0:46:08.990
manipulate the Tor… – if a big enough[br]chunk of them are malicious then

0:46:08.990,0:46:12.700
they can manipulate the consensus[br]to direct you to particular nodes.

0:46:12.700,0:46:15.920
I don’t think that’s the case, and that[br]anyone thinks that’s the case.

0:46:15.920,0:46:19.180
And Tor is designed in a way to tr…[br]I mean that you’d have to control

0:46:19.180,0:46:22.480
a certain number of the authorities[br]to be able to do anything important.

0:46:22.480,0:46:25.270
So the Tor people… I said this[br]to them a couple of days ago.

0:46:25.270,0:46:28.780
I find it quite funny that you’d design[br]your system as if you don’t trust

0:46:28.780,0:46:31.880
each other. To which their response was:[br]“No, we design our system so that

0:46:31.880,0:46:35.620
we don’t have to trust each other.” Which[br]I think is a very good model to have,

0:46:35.620,0:46:39.430
when you have this type of system.[br]So could we eliminate these sort of

0:46:39.430,0:46:43.240
centralized servers? I think that’s[br]actually a very hard problem to do.

0:46:43.240,0:46:46.340
There are lots of attacks which could[br]potentially be deployed against

0:46:46.340,0:46:51.250
a decentralized network. At the moment the[br]Tor network is relatively well understood

0:46:51.250,0:46:54.490
in terms of what types of attack it[br]is vulnerable to. So if we were to move

0:46:54.490,0:46:58.880
to a new architecture then we may open it[br]to a whole new class of attacks.

0:46:58.880,0:47:02.000
The Tor network has been existing[br]for quite some time and it’s been

0:47:02.000,0:47:06.820
very well studied. What about global[br]adversaries like the NSA, where you could

0:47:06.820,0:47:10.980
monitor network links all across the[br]world? It’s very difficult to defend

0:47:10.980,0:47:15.530
against that. Where they can monitor…[br]if they can identify which Guard relay

0:47:15.530,0:47:18.760
you’re using, they can monitor traffic[br]going into and out of the Guard relay,

0:47:18.760,0:47:23.259
and they log each of the subsequent hops[br]along. It’s very, very difficult to defend against

0:47:23.259,0:47:26.470
these types of things. Do we know if[br]they’re doing it? The documents that were

0:47:26.470,0:47:29.850
released yesterday – I’ve only had a very[br]brief look through them, but they suggest

0:47:29.850,0:47:32.480
that they’re not presently doing it and[br]they haven’t had much success.

0:47:32.480,0:47:36.450
I don’t know why, there are very powerful[br]attacks described in the academic literature

0:47:36.450,0:47:40.830
which are very, very reliable and most[br]academic literature you can access for free

0:47:40.830,0:47:43.960
so it’s not even as if they have to figure[br]out how to do it. They just have to read

0:47:43.960,0:47:47.010
the academic literature and try and[br]implement some of these attacks.

0:47:47.010,0:47:52.000
I don’t know what – why they’re not. The[br]next question is how to detect malicious

0:47:52.000,0:47:57.760
relays. So in my case we’re running[br]40 relays. Our relays were on consecutive

0:47:57.760,0:48:01.570
IP addresses, so we’re running 40[br]– well, most of them are on consecutive

0:48:01.570,0:48:04.820
IP addresses in two blocks. So they’re[br]running on IP addresses numbered

0:48:04.820,0:48:09.280
e.g. 1,2,3,4,…[br]We were running two relays per IP address,

0:48:09.280,0:48:12.210
and every single relay had my name[br]plastered across it.

0:48:12.210,0:48:14.740
So after I set up these 40 relays in

0:48:14.740,0:48:17.420
a relatively short period of time[br]I expected someone from the Tor Project

0:48:17.420,0:48:22.260
to come to me and say: “Hey Gareth, what[br]are you doing?” – no one noticed,

0:48:22.260,0:48:26.090
no one noticed. So this is presently[br]an open question. On the Tor Project

0:48:26.090,0:48:28.790
they’re quite open about this. They[br]acknowledged that, in fact, last year

0:48:28.790,0:48:33.210
we had the CERT researchers launch many[br]more relays than that. The Tor Project

0:48:33.210,0:48:36.510
spotted that large number of relays[br]but chose not to do anything about it

0:48:36.510,0:48:40.119
and, in fact, they were deploying an[br]attack. But, as you know, it’s often very

0:48:40.119,0:48:43.700
difficult to defend against unknown[br]attacks. So at the moment how to detect

0:48:43.700,0:48:47.780
malicious relays is a bit of an open[br]question. Which, I think, is being

0:48:47.780,0:48:50.720
discussed on the mailing list.
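
One naive heuristic for the case described here (many relays on neighbouring IP addresses with the same contact details) could look like the Python sketch below. It is only an illustration with made-up data; it is not how the Tor Project detects relays, and an attacker who spreads relays across networks or varies the contact details would evade it.

```python
import ipaddress
from collections import defaultdict


def flag_clusters(relays, threshold=10):
    """Group relays by (/24 network, contact) and return groups of at least `threshold`.

    relays: iterable of (nickname, ipv4_address, contact) tuples (hypothetical input shape).
    """
    groups = defaultdict(list)
    for nickname, ip, contact in relays:
        # strict=False lets us derive the enclosing /24 from a host address.
        network = ipaddress.ip_network(f"{ip}/24", strict=False)
        groups[(str(network), contact)].append(nickname)
    return {key: names for key, names in groups.items() if len(names) >= threshold}


if __name__ == "__main__":
    # 40 relays, two per IP address, on consecutive addresses with one contact.
    relays = [
        (f"relay{host}_{n}", f"192.0.2.{host}", "researcher@example.org")
        for host in range(1, 21)
        for n in (1, 2)
    ]
    relays.append(("unrelated", "198.51.100.7", "someone@example.net"))
    for (network, contact), names in flag_clusters(relays).items():
        print(network, contact, len(names))
```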

0:48:50.720,0:48:54.230
The other one is defending against unknown[br]tampering at exits. If you took or take

0:48:54.230,0:48:57.220
the exit relays – the exit relay[br]can tamper with the traffic.

0:48:57.220,0:49:01.040
So we know particular types of attacks[br]doing SSL man-in-the-middles etc.

0:49:01.040,0:49:05.350
We’ve seen recently binary patching.[br]How do we detect unknown tampering

0:49:05.350,0:49:08.970
with traffic, other types of traffic? So[br]the binary tampering wasn’t spotted

0:49:08.970,0:49:12.060
until it was spotted by someone who[br]told the Tor Project. So it wasn’t

0:49:12.060,0:49:15.609
detected e.g. by the Tor Project[br]themselves, it was spotted by someone else

0:49:15.609,0:49:20.500
and notified to them. And then the final[br]one open on here is the Tor code review.

0:49:20.500,0:49:25.400
So the Tor code is open source. We know[br]from OpenSSL that, although everyone

0:49:25.400,0:49:29.260
can read source code, people don’t always[br]look at it. And OpenSSL has been

0:49:29.260,0:49:32.230
a huge mess, and there’s been[br]lots of stuff disclosed over that

0:49:32.230,0:49:35.880
over the last few days. There are[br]lots of eyes on the Tor code but I think

0:49:35.880,0:49:41.519
always, more eyes are better. I’d say,[br]ideally if we can get people to look

0:49:41.519,0:49:45.140
at the Tor code and look for[br]vulnerabilities then… I encourage people

0:49:45.140,0:49:49.860
to do that. It’s a very useful thing to[br]do. There could be unknown vulnerabilities

0:49:49.860,0:49:53.119
as we’ve seen with the “relay early” type[br]quite recently in the Tor code which

0:49:53.119,0:49:56.990
could be quite serious. The truth is we[br]just don’t know until people do thorough

0:49:56.990,0:50:02.500
code audits, and even then it’s very[br]difficult to know for certain.

0:50:02.500,0:50:08.170
So my last point, I think, yes,

0:50:08.170,0:50:11.130
is advice to future researchers.[br]So if you ever wanted, or are planning

0:50:11.130,0:50:16.349
on doing a study in the future, e.g. on[br]Tor, do not do what the CERT researchers

0:50:16.349,0:50:20.550
did and start deanonymising people on the[br]live Tor network and doing it in a way

0:50:20.550,0:50:25.060
which is incredibly irresponsible. I don’t[br]think… I mean, I tend, myself, to give them

0:50:25.060,0:50:28.510
the benefit of the doubt, I don’t think the[br]CERT researchers set out to be malicious.

0:50:28.510,0:50:33.320
I think they’re just very naive.[br]That’s what it was they were doing.

0:50:33.320,0:50:36.780
That was rapidly pointed out to them.[br]In my case we are running

0:50:36.780,0:50:43.090
40 relays. Our Tor relays were forwarding[br]traffic, they were acting as good relays.

0:50:43.090,0:50:45.970
The only thing that we were doing[br]was logging publication requests

0:50:45.970,0:50:50.050
to the directories. Big question whether[br]that’s malicious or not – I don’t know.

0:50:50.050,0:50:53.330
One thing that has been pointed out to me[br]is that the .onion addresses themselves

0:50:53.330,0:50:58.270
could be considered sensitive information,[br]so any data we will be retaining

0:50:58.270,0:51:01.840
from the study is the aggregated data.[br]So we won’t be retaining information

0:51:01.840,0:51:05.400
on individual .onion addresses because[br]that could potentially be considered

0:51:05.400,0:51:08.900
sensitive information. If you think about[br]someone running an .onion address which

0:51:08.900,0:51:11.240
contains something which they don’t want[br]other people knowing about. So we won’t

0:51:11.240,0:51:15.060
be retaining that data, and[br]we’ll be destroying it.

0:51:15.060,0:51:19.920
So I think that brings me now[br]to starting the questions.

0:51:19.920,0:51:22.770
I want to say “Thanks” to a couple of[br]people. The student who donated

0:51:22.770,0:51:26.820
the server to us. Nick Savage who is one[br]of my colleagues who was a sounding board

0:51:26.820,0:51:30.510
during the entire study. Ivan Pustogarov[br]who is the researcher at the University

0:51:30.510,0:51:34.700
of Luxembourg who sent us the large data[br]set of .onion addresses from last year.

0:51:34.700,0:51:37.670
He’s also the chap who has demonstrated[br]those deanonymisation attacks

0:51:37.670,0:51:41.500
that I talked about. A big “Thank you” to[br]Roger Dingledine who has frankly been…

0:51:41.500,0:51:45.230
presented loads of questions to me over[br]the last couple of days and allowed me

0:51:45.230,0:51:49.410
to bounce ideas back and forth.[br]That has been a very useful process.

0:51:49.410,0:51:53.640
If you are doing future research I strongly[br]encourage you to contact the Tor Project

0:51:53.640,0:51:57.040
at the earliest opportunity. You’ll find[br]them… certainly I found them to be

0:51:57.040,0:51:59.460
extremely helpful.

0:51:59.460,0:52:04.640
Donncha also did something similar,[br]so both Ivan and Donncha have done

0:52:04.640,0:52:09.520
a similar study in trying to classify the[br]types of hidden services or work out

0:52:09.520,0:52:13.520
how many hits there are to particular[br]types of hidden service. Ivan Pustogarov

0:52:13.520,0:52:17.430
did it on a bigger scale[br]and found similar results to us.

0:52:17.430,0:52:21.910
That is that these abuse sites[br]featured frequently

0:52:21.910,0:52:26.740
in the top requested sites. That was done[br]over a year ago, and again, he was seeing

0:52:26.740,0:52:31.109
similar sorts of pattern. There were these[br]abuse sites being requested frequently.

0:52:31.109,0:52:35.450
So that also sort of corroborates[br]what we’re saying.

0:52:35.450,0:52:38.540
The data I put online is at this address,[br]there will probably be the slides,

0:52:38.540,0:52:41.609
something called ‘The Tor Research[br]Framework’ which is an implementation

0:52:41.609,0:52:47.510
of a Java client, so an implementation[br]of a Tor client in Java specifically aimed

0:52:47.510,0:52:52.080
at researchers. So if e.g. you wanna pull[br]out data from a consensus you can do.

0:52:52.080,0:52:55.290
If you want to build custom routes[br]through the network you can do.

0:52:55.290,0:52:58.230
If you want to build routes through the[br]network and start sending padding traffic

0:52:58.230,0:53:01.720
down them you can do etc.[br]The code is designed in a way which is

0:53:01.720,0:53:06.000
designed to be easily modifiable[br]for testing lots of these things.

0:53:06.000,0:53:10.580
There is also a link to the Tor FBI[br]exploit which they deployed against

0:53:10.580,0:53:16.230
visitors to some Tor hidden services last[br]year. They exploited a Mozilla Firefox bug

0:53:16.230,0:53:20.540
and then ran code on users who were[br]visiting these hidden services, and ran

0:53:20.540,0:53:24.619
code on their computer to identify them.[br]At this address there is a link to that

0:53:24.619,0:53:29.250
including a copy of the shell code and an[br]analysis of exactly what it was doing.

0:53:29.250,0:53:31.670
And then of course a list of references,[br]with papers and things.

0:53:31.670,0:53:34.260
So I’m quite happy to take questions now.

0:53:34.260,0:53:46.960
<i>applause</i>

0:53:46.960,0:53:50.880
Herald: Thanks for the nice talk![br]Do we have any questions

0:53:50.880,0:53:57.000
from the internet?

0:53:57.000,0:53:59.740
Signal Angel: One question. It’s very hard[br]to block addresses since creating them

0:53:59.740,0:54:03.620
is cheap, and they can be generated[br]for each user, and rotated often. So

0:54:03.620,0:54:07.510
can you think of any other way[br]for doing the blocking?

0:54:07.510,0:54:09.799
Gareth: That is absolutely true, so, yes.[br]If you were to block a particular .onion

0:54:09.799,0:54:13.060
address they can say: “I want another[br].onion address.” So I don’t know of

0:54:13.060,0:54:16.760
any way to counter that now.

0:54:16.760,0:54:18.510
Herald: Another one from the internet?[br]<i>inaudible answer from Signal Angel</i>

0:54:18.510,0:54:22.030
Okay, then, Microphone 1, please!

0:54:22.030,0:54:26.359
Question: Thank you, that’s fascinating[br]research. You mentioned that it is

0:54:26.359,0:54:32.200
possible to influence the hash of your[br]relay node in a sense that you could

0:54:32.200,0:54:35.970
be choosing which service you are[br]advertising, or which hidden service

0:54:35.970,0:54:38.050
you are responsible for. Is that right?[br]Gareth: Yeah, correct!

0:54:38.050,0:54:40.390
Question: So could you elaborate[br]on how this is possible?

0:54:40.390,0:54:44.740
Gareth: So e.g. you just keep regenerating[br]a public key for your relay,

0:54:44.740,0:54:48.140
you’ll get closer and closer to the point[br]where you’ll be the responsible relay

0:54:48.140,0:54:51.160
for that particular hidden service. That’s[br]just – you keep regenerating your identity

0:54:51.160,0:54:54.720
hash until you’re at that particular point[br]in the relay. That’s not particularly

0:54:54.720,0:55:00.490
computationally intensive to do.[br]That was it?
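
The search described in that answer can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the relay's position on the ring is taken to be the SHA-1 hash of its public key, and random bytes stand in for real keys so the example stays short. The function names, the target value and the gap size are hypothetical.

```python
import hashlib
import os

RING_SIZE = 1 << 160  # a SHA-1 digest is 160 bits, so positions wrap at 2**160


def ring_distance(target: int, fingerprint: int) -> int:
    """How far `fingerprint` sits after `target` going around the ring."""
    return (fingerprint - target) % RING_SIZE


def find_key_after(target: int, max_gap: int, max_tries: int = 1_000_000):
    """Keep generating stand-in keys until one hashes to within `max_gap` after `target`."""
    for tries in range(1, max_tries + 1):
        fake_key = os.urandom(32)  # placeholder for a freshly generated relay key
        fingerprint = int.from_bytes(hashlib.sha1(fake_key).digest(), "big")
        if ring_distance(target, fingerprint) < max_gap:
            return fake_key, fingerprint, tries
    raise RuntimeError("no suitable key found within max_tries")


if __name__ == "__main__":
    # Hypothetical position of the hidden service's descriptor on the ring.
    target = int.from_bytes(hashlib.sha1(b"example descriptor id").digest(), "big")
    key, fingerprint, tries = find_key_after(target, RING_SIZE >> 10)
    print(f"found after {tries} tries, distance {ring_distance(target, fingerprint)}")
```

Each halving of the gap doubles the expected number of tries, so landing just after the target takes more attempts the more relays are already on the ring.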

0:55:00.490,0:55:04.740
Herald: Okay, next question[br]from Microphone 5, please.

0:55:04.740,0:55:09.490
Question: Hi, I was wondering for the[br]attacks where you identify a certain number

0:55:09.490,0:55:15.170
of users using a hidden service. Have[br]those attacks been used, or is there

0:55:15.170,0:55:18.880
any evidence there, and is there[br]any way of protecting against that?

0:55:18.880,0:55:22.260
Gareth: That’s a very interesting question,[br]is there any way to detect these types

0:55:22.260,0:55:24.970
of attacks? So some of the attacks,[br]if you’re going to generate particular

0:55:24.970,0:55:29.030
traffic patterns, one way to do that is to[br]use the padding cells. The padding cells

0:55:29.030,0:55:32.070
aren’t used at the moment by the official[br]Tor client. So the detection of those

0:55:32.070,0:55:36.510
could be indicative but it doesn’t…[br]it’s not conclusive evidence in our tool.

0:55:36.510,0:55:40.050
Question: And is there any way of[br]protecting against a government

0:55:40.050,0:55:46.510
or something trying to denial-of-service[br]hidden services?

0:55:46.510,0:55:48.180
Gareth: So I… trying to… did not…

0:55:48.180,0:55:52.500
Question: Is it possible to protect[br]against this kind of attack?

0:55:52.500,0:55:56.180
Gareth: Not that I’m aware of. The Tor[br]Project are currently revising how they

0:55:56.180,0:55:59.500
do the hidden service protocol which will[br]make e.g. what I did, enumerating

0:55:59.500,0:56:03.230
the hidden services, much more difficult.[br]And to also be in a position on the

0:56:03.230,0:56:07.470
distributed hash table in advance[br]for a particular hidden service.

0:56:07.470,0:56:10.510
So they are at the moment trying to change[br]the way it’s done, and make some of

0:56:10.510,0:56:15.270
these things more difficult.

0:56:15.270,0:56:20.290
Herald: Good. Next question[br]from Microphone 2, please.

0:56:20.290,0:56:27.220
Mic2: Hi. I’m running the Tor2Web abuse,[br]and so I used to see a lot of abuse requests

0:56:27.220,0:56:31.130
concerning the Tor hidden service[br]being exposed on the internet through

0:56:31.130,0:56:37.270
the Tor2Web.org domain name. And I just[br]wanted to comment on, like you said,

0:56:37.270,0:56:45.410
the abuse number of the requests. I used[br]to speak with some of the child protection

0:56:45.410,0:56:50.070
agencies that reported abuse at[br]Tor2Web.org, and they are effectively

0:56:50.070,0:56:55.570
using crawlers that periodically look for[br]changes in order to get new images to be

0:56:55.570,0:57:00.190
put in the database. And what I was able[br]to understand is that the German agency

0:57:00.190,0:57:07.440
doing that is crawling the same sites that[br]the Italian agencies are crawling, too.

0:57:07.440,0:57:11.890
So it’s likely that in most of the[br]countries there are the child protection

0:57:11.890,0:57:16.790
agencies that are crawling that small[br]number of Tor hidden services that

0:57:16.790,0:57:22.760
contain child porn. And I saw it also[br]a bit from the statistics of Tor2Web

0:57:22.760,0:57:28.500
where the amount of abuse relating to[br]that kind of content, it’s relatively low.

0:57:28.500,0:57:30.000
Just as contribution!

0:57:30.000,0:57:33.500
Gareth: Yes, that’s very interesting,[br]thank you for that!

0:57:33.500,0:57:37.260
<i>applause</i>

0:57:37.260,0:57:39.560
Herald: Next, Microphone 4, please.

0:57:39.560,0:57:45.260
Mic4: You then attacked or deanonymised[br]users with an infected or a modified Guard

0:57:45.260,0:57:51.810
relay? Is it required to modify the Guard[br]relay if I control the entry point

0:57:51.810,0:57:57.360
of the user to the internet?[br]If I’m his ISP?

0:57:57.360,0:58:01.900
Gareth: Yes, if you observe traffic[br]travelling into a Guard relay without

0:58:01.900,0:58:04.570
controlling the Guard relay itself.[br]Mic4: Yeah.

0:58:04.570,0:58:07.500
Gareth: In theory, yes. I wouldn’t be able[br]to tell you how reliable that is

0:58:07.500,0:58:10.500
off the top of my head.[br]Mic4: Thanks!

0:58:10.500,0:58:13.630
Herald: So another question[br]from the internet!

0:58:13.630,0:58:16.339
Signal Angel: Wouldn’t the ability to[br]choose the key hash prefix give

0:58:16.339,0:58:19.980
the ability to target specific .onions?

0:58:19.980,0:58:23.680
Gareth: So you can only target one .onion[br]address at a time. Because of the way

0:58:23.680,0:58:28.080
they are generated. So you wouldn’t be[br]able to say e.g. “Pick a key which targeted

0:58:28.080,0:58:32.339
two or more .onion addresses.” You can[br]only target one .onion address at a time

0:58:32.339,0:58:37.720
by positioning yourself at a particular[br]point on the distributed hash table.

0:58:37.720,0:58:40.260
Herald: Another one[br]from the internet? … Okay.

0:58:40.260,0:58:43.369
Then Microphone 3, please.

0:58:43.369,0:58:47.780
Mic3: Hey. Thanks for this research.[br]I think it strengthens the network.

0:58:47.780,0:58:54.300
So in the deem (?) I was wondering whether[br]you can donate these relays to be a part of

0:58:54.300,0:58:59.500
the non-malicious relay pool, basically[br]use them as regular relays afterwards?

0:58:59.500,0:59:02.750
Gareth: Okay, so can I donate the relays[br]a rerun and at the Tor capacity (?) ?

0:59:02.750,0:59:05.490
Unfortunately, I said they were run by[br]a student and they were donated for

0:59:05.490,0:59:09.510
a fixed period of time. So we’ve given[br]those back to him. We are very grateful

0:59:09.510,0:59:14.790
to him, he was very generous. In fact,[br]without his contribution donating these

0:59:14.790,0:59:18.700
it would have been much more difficult[br]to collect as much data as we did.

0:59:18.700,0:59:21.490
Herald: Good, next, Microphone 5, please!

0:59:21.490,0:59:25.839
Mic5: Yeah hi, first of all thanks[br]for your talk. I think you’ve raised

0:59:25.839,0:59:29.310
some real issues that need to be[br]considered very carefully by everyone

0:59:29.310,0:59:33.950
on the Tor Project. My question: I’d like[br]to go back to the issue with so many

0:59:33.950,0:59:38.470
abuse related web sites running over[br]the Tor Project. I think it’s an important

0:59:38.470,0:59:41.900
issue that really needs to be considered[br]because we don’t wanna be associated

0:59:41.900,0:59:44.840
with that at the end of the day.[br]Anyone who uses Tor, who runs a relay

0:59:44.840,0:59:51.250
or an exit node. And I understand it’s[br]a bit of a sensitive issue, and you don’t

0:59:51.250,0:59:55.300
really have any say over whether it’s[br]implemented or not. But I’d like to get

0:59:55.300,1:00:02.410
your opinion on the implementation[br]of a distributed block-deny system

1:00:02.410,1:00:06.980
that would run in very much a similar way[br]to those of the directory authorities.

1:00:06.980,1:00:08.950
I’d just like to see what[br]you think of that.

1:00:08.950,1:00:13.200
Gareth: So you’re asking me whether I want[br]to support a particular blocking mechanism

1:00:13.200,1:00:14.200
then?

1:00:14.200,1:00:16.470
Mic5: I’d like to get your opinion on it.[br]<i>Gareth laughs</i>

1:00:16.470,1:00:20.540
I know it’s a sensitive issue but I think,[br]like I said, I think something…

1:00:20.540,1:00:25.700
I think it needs to be considered because[br]everyone running exit nodes and relays

1:00:25.700,1:00:30.270
and people of the Tor Project don’t[br]want to be known or associated with

1:00:30.270,1:00:34.790
this massive amount of abuse web sites[br]that currently exist within the Tor network.

1:00:34.790,1:00:40.210
Gareth: I absolutely agree, and I think[br]the Tor Project are horrified as well that

1:00:40.210,1:00:43.960
this problem exists, and they, in fact,[br]talked on it in previous years that

1:00:43.960,1:00:48.690
they have a problem with this type of[br]content. As to what, if anything, is

1:00:48.690,1:00:52.340
done about it, it’s very much up to them.[br]Could it be done in a distributed fashion?

1:00:52.340,1:00:56.240
So the example I gave was a way which[br]it could be done by relay operators.

1:00:56.240,1:00:59.770
So e.g. that would need the consensus of[br]a large number of relay operators to be

1:00:59.770,1:01:02.890
effective. So that is done in[br]a distributed fashion. The question is:

1:01:02.890,1:01:06.810
who gives the list of .onion addresses to[br]block to each of the relay operators?

1:01:06.810,1:01:09.640
Clearly, the relay operators aren’t going[br]to collect it themselves. It needs to be

1:01:09.640,1:01:15.780
supplied by someone like the Tor Project,[br]e.g., or someone trustworthy. Yes, it can

1:01:15.780,1:01:20.480
be done in a distributed fashion.[br]It can be done in an open fashion.

1:01:20.480,1:01:21.710
Mic5: Who knows?[br]Gareth: Okay.

1:01:21.710,1:01:23.750
Mic5: Thank you.

1:01:23.750,1:01:27.260
Herald: Good. And another[br]question from the internet.

1:01:27.260,1:01:31.210
Signal Angel: Apparently there’s an option[br]in the Tor client to collect statistics

1:01:31.210,1:01:35.169
on hidden services. Do you know about[br]this, and how it relates to your research?

1:01:35.169,1:01:38.551
Gareth: Yes, I believe they’re going to[br]be… the extent to which I know about it

1:01:38.551,1:01:41.930
is they’re gonna be trying this next[br]month, to try and estimate how many

1:01:41.930,1:01:46.490
hidden services there are. So keep[br]your eye on the Tor Project web site,

1:01:46.490,1:01:50.340
I’m sure they’ll be publishing[br]their data in the coming months.

1:01:50.340,1:01:55.090
Herald: And, sadly, we are running out of[br]time, so this will be the last question,

1:01:55.090,1:01:56.980
so Microphone 4, please!

1:01:56.980,1:02:01.250
Mic4: Hi, I’m just wondering if you could[br]sort of outline what ethical clearances

1:02:01.250,1:02:04.510
you had to get from your university[br]to conduct this kind of research.

1:02:04.510,1:02:07.260
Gareth: So we have to discuss these[br]types of things before undertaking

1:02:07.260,1:02:11.970
any research. And we go through the steps[br]to make sure that we’re not e.g. storing

1:02:11.970,1:02:16.370
sensitive information about particular[br]people. So yes, we are very mindful

1:02:16.370,1:02:19.240
of that. And that’s why I made a[br]particular point of putting on the slides

1:02:19.240,1:02:21.510
as to some of the things to consider.

1:02:21.510,1:02:26.180
Mic4: So like… you outlined a potential[br]implementation of the traffic correlation

1:02:26.180,1:02:29.500
attack. Are you saying that[br]you performed the attack? Or…

1:02:29.500,1:02:33.180
Gareth: No, no no, absolutely not.[br]So the link I’m giving… absolutely not.

1:02:33.180,1:02:34.849
We have not engaged in any…

1:02:34.849,1:02:36.350
Mic4: It just wasn’t clear[br]from the slides.

1:02:36.350,1:02:39.380
Gareth: I apologize. So it’s absolutely[br]clear on that. No, we’re not engaging

1:02:39.380,1:02:42.860
in any deanonymisation research on the[br]Tor network. The research I showed

1:02:42.860,1:02:46.079
is linked on the references, I think,[br]which I put at the end of the slides.

1:02:46.079,1:02:52.000
You can read about it. But it’s done in[br]simulation. So e.g. there’s a way

1:02:52.000,1:02:54.730
to do simulation of the Tor network on[br]a single computer. I can’t remember

1:02:54.730,1:02:58.880
the name of the project, though.[br]Shadow! Yes, it’s a system

1:02:58.880,1:03:02.170
called Shadow, we can run a large[br]number of Tor relays on a single computer

1:03:02.170,1:03:04.579
and simulate the traffic between them.[br]If you’re going to do that type of research

1:03:04.579,1:03:09.380
then you should use that. Okay,[br]thank you very much, everyone.

1:03:09.380,1:03:17.985
<i>applause</i>

1:03:17.985,1:03:22.071
<i>silent postroll titles</i>

1:03:22.071,1:03:27.000
subtitles created by c3subtitles.de[br]Join, and help us!