silent 31C3 preroll Dr. Gareth Owen: Hello. Can you hear me? Yes. Okay. So my name is Gareth Owen. I’m from the University of Portsmouth. I’m an academic and I’m going to talk to you about an experiment that we did on the Tor hidden services, trying to categorize them, estimate how many there were, and so on. As we go through the talk I’m going to explain how Tor hidden services work internally, how the data was collected, and what sort of conclusions you can draw from the data given the way that we collected it. Just so I get an idea: how many of you use Tor on a regular basis, could you put your hand up for me? So quite a big number. Keep your hand up if… or put your hand up if you’re a relay operator. Wow, that’s quite a significant number, isn’t it? And then, put your hand up, or keep it up, if you run a hidden service. Okay, so, a smaller number, but still some people run hidden services. Okay, so, some of you may be very familiar with the way Tor works at a low level. But I am gonna go through it for those who aren’t, so they understand just how it works. And as I explain how the hidden services work, I’m going to tag on information on how the hidden services themselves can be deanonymised, and also how the users of those hidden services can be deanonymised, if you put some strict criteria on what it is you want to do with respect to them. So the things that I’m going to go over: I wanna go over how Tor works, and then specifically how hidden services work. I’m gonna talk about something called the “Tor Distributed Hash Table” for hidden services. If you’ve heard that term and don’t know what it means, don’t worry, I’ll explain what a distributed hash table is and how it works. It’s not as complicated as it sounds. And then I wanna go over Darknet data, so, data that we collected from Tor hidden services.
And as I say, as we go along I will explain how you do deanonymisation of both the services themselves and of the visitors to the services, and just how complicated it is. So you may have seen this slide, which I think was from GCHQ, released last year as part of the Snowden leaks, where they said: “We can deanonymise some users some of the time”, but they’ve had no success in deanonymising someone in response to a specific request. So, for example, given all of you, I may be able to deanonymise a small fraction of you, but I can’t choose precisely one person I want to deanonymise. That’s what I’m gonna be explaining in relation to the deanonymisation attacks: how you can deanonymise a section of the users, but you can’t necessarily choose which section you will be deanonymising. Tor tries to solve a couple of different problems. On one hand it allows you to bypass censorship. So if you’re in a country like China, which blocks some types of traffic, you can use Tor to bypass their censorship blocks. It tries to give you privacy, so at one point in the network someone may know who you are but can’t see what you’re doing, and at another point in the network someone may be able to see what you’re doing but doesn’t know who you are. Now the traditional case for this is to look at VPNs. With a VPN you have a single provider. You have lots of users connecting to the VPN. The VPN has a sort of mixing effect from an outsider’s or a server’s point of view. And then out of the VPN you see requests to Twitter, Wikipedia etc. And if that traffic isn’t encrypted then the VPN can also read the contents of the traffic. Now of course there is a fundamental weakness with this: you have to trust the VPN provider, because the VPN provider knows both who you are and what you’re doing and can link those two together with absolute certainty.
So whilst you do get some of these properties, assuming you’ve got a trustworthy VPN provider, you don’t get them in the face of an untrustworthy VPN provider. And of course: how do you trust the VPN provider? What sort of measure do you use? That’s sort of an open question. So Tor tries to solve this problem by distributing the trust. Tor is an open source project, so you can go on to their Git repository, you can download the source code, and change it, improve it, submit patches etc. As you heard earlier, during Jacob and Roger’s talk, they’re currently partly sponsored by the US Government, which seems a bit paradoxical, but as they explained in that talk, that doesn’t affect their judgement. And indeed, they do have some funding from other sources, and they design the system – which I’ll talk about a little bit later – in a way where they don’t have to trust each other. So there’s some redundancy, and they’re trying to minimize these sorts of trust issues. Now, Tor is a partially de-centralized network, which means that it has some centralized components which are under the control of the Tor Project, and some de-centralized components which are normally the Tor relays. If you run a relay you’re one of those de-centralized components. There is, however, no single authority on the Tor network – no single server which is responsible, which you’re required to trust. So the trust is somewhat distributed, but not entirely. When you establish a circuit through Tor you, the user, download a list of all of the relays inside the Tor network. And you get to pick – and I’ll tell you how you do that – which relays you’re going to use to route your traffic through. So here is a typical example: You’re here on the left hand side as the user. You download a list of the relays inside the Tor network and you select from that list three nodes: a guard node which is your entry into the Tor network, a relay node which is a middle node.
Essentially, it’s going to route your traffic to a third hop. And then the third hop is the exit node, where your traffic essentially exits out onto the internet. Now, looking at the circuit – so this is a circuit through the Tor network through which you’re going to route your traffic – there are three layers of encryption at the beginning, between you and the guard node. Your traffic is encrypted three times: in the first instance encrypted to the guard, then encrypted again to the relay, and then encrypted again to the exit. And as the traffic moves through the Tor network each of those layers of encryption is peeled off the data. The guard here in this case knows who you are, and the exit relay knows what you’re doing, but neither knows both. And the middle relay doesn’t really know a lot, except which relay is the guard and which relay is the exit. Who runs an exit relay? If you run an exit relay, all of the traffic which users are sending out onto the internet appears to come from your IP address. So running an exit relay is potentially risky, because someone may do something through your relay which attracts attention. And then, when law enforcement trace that back to an IP address, it’s going to come back to your address. So some relay operators have had trouble with this, with law enforcement coming to them and saying: “Hey, we got this traffic coming through your IP address and you have to go and explain it.” So if you want to run an exit relay it’s a little bit risky, but we’re thankful for those people that do run exit relays, because ultimately if people didn’t run exit relays you wouldn’t be able to get out of the Tor network, and it wouldn’t be terribly useful from that point of view. So, yes. applause So when you set up a Tor relay, you publish something called a descriptor – which describes your Tor relay and how to use it – to a set of servers called the authorities.
And the trust in the Tor network is essentially split across these authorities. They’re run by the core Tor Project members. And they maintain a list of all of the relays in the network. And they observe them over a period of time. If the relays exhibit certain properties they give the relays flags. If e.g. a relay allows traffic to exit from the Tor network it will get the ‘Exit’ flag. If a relay has been switched on for a certain period of time, and carried a certain amount of traffic, it will be allowed to become a guard relay, which is the first node in your circuit. So when you build your circuit you download a list of these descriptors from one of the Directory Authorities. You look at the flags which have been assigned to each of the relays, and then you pick your route based on that. So you’ll pick your guard node from the set of relays which have the ‘Guard’ flag, your exit from the set of relays which have the ‘Exit’ flag, etc. Now, as of a quick count this morning there are about 1500 guard relays, around 1000 exit relays, and six relays flagged as ‘bad’ exits. What does a ‘bad exit’ mean? waits for audience to respond That’s not good! That’s exactly what it means! Yes! laughs applause Relays which have been flagged as ‘bad exits’ your client will never choose to exit traffic through. Examples of things which may get a relay flagged as a bad exit: fiddling with the traffic which is coming out of the relay, or doing things like man-in-the-middle attacks against SSL traffic. We’ve seen various things: there have been relays man-in-the-middling SSL traffic, and very recently there was an exit relay which was patching binaries that you downloaded from the internet, inserting malware into the binaries. So you can do these things, but the Tor Project tries to scan for them. And if these things are detected then the relays will be flagged as ‘bad exits’.
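The flag-based route selection just described can be sketched roughly as follows. The relay list here is invented, and a real Tor client also weights its choice by bandwidth and applies further constraints; this is only a toy illustration of filtering by flags, including the rule that a ‘BadExit’ relay is never chosen as the exit:

```python
import random

# Invented relay list; a real client parses the consensus it
# downloads from a directory authority.
relays = [
    {"nick": "relayA", "flags": {"Guard", "Fast", "Running"}},
    {"nick": "relayB", "flags": {"Exit", "Fast", "Running"}},
    {"nick": "relayC", "flags": {"Fast", "Running"}},
    {"nick": "relayD", "flags": {"Exit", "BadExit", "Running"}},
]

def pick(need, avoid=frozenset()):
    """Pick one relay carrying all `need` flags and none of `avoid`."""
    candidates = [r for r in relays
                  if need <= r["flags"] and not (avoid & r["flags"])]
    return random.choice(candidates)

guard = pick({"Guard"})
exit_relay = pick({"Exit"}, avoid={"BadExit"})  # bad exits never selected
middle = pick({"Running"})                      # any running relay will do
print(guard["nick"], middle["nick"], exit_relay["nick"])
```

In this toy consensus only relayA can be the guard and only relayB the exit, since relayD carries the ‘BadExit’ flag and is filtered out.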
It’s true to say that the scanning mechanism is not 100% fool-proof by any stretch of the imagination. It tries to pick up common types of attacks, so as a result it won’t pick up unknown attacks, or attacks which haven’t been seen or known about beforehand. So looking at this, how do you deanonymise the traffic travelling through the Tor network? Given some traffic coming out of the exit relay, how do you know which user that corresponds to? What is their IP address? You can’t actually modify the traffic, because if any of the relays tries to modify the traffic which they’re sending through the network, Tor will tear down the circuit. There are integrity checks at each of the hops. Because you can’t decrypt the packet you can’t modify it in any meaningful way, and because there’s an integrity check at the next hop any modification is detected. So you can’t insert a marker and try to follow the marker through the network. So instead, what can you do? Let me give you two cases. In the worst case, the attacker controls all three of the relays that you pick, which is an unlikely scenario because they’d need to control quite a big proportion of the network. Then it should be quite obvious that they can work out who you are and also see what you’re doing, because in that case they can tag the traffic and just discard these integrity checks at each of the following hops. Now in a different case, if you control the guard relay and the exit relay but not the middle relay: the guard relay can’t tamper with the traffic, because the middle relay will close down the circuit as soon as that happens. The exit relay can’t send stuff back down the circuit to try and identify the user, either, because again the circuit will be closed down. So what can you do? Well, you can count the number of packets going through the guard node.
And you can measure the timing differences between packets, and try and spot that pattern at the exit relay. You’re looking at counts of packets and the timing between those packets which are being sent, and essentially trying to correlate them. So if the user happens to pick your guard node, and then happens to pick your exit relay, then you can deanonymise them with very high probability using this technique. You’re just correlating the timings of packets and counting the number of packets going through. And the attacks demonstrated in the literature are very reliable at this. We heard earlier in the Tor talk about the “relay early” tag, which was the attack discovered by the CERT researchers in the US. That attack didn’t rely on timing. Instead, what they were able to do was send a special type of cell back down the circuit, essentially marking the data, saying: “This is the data we’re seeing at the exit relay, or at the hidden service”, and encoding into the messages travelling back down the circuit what the data was. And then you could pick those up at the guard relay and say, okay, this is the person requesting that data. In fact, although this technique works, and it was a very nice attack, the traffic correlation attacks are actually just as powerful. So although this bug has been fixed, traffic correlation attacks still work and are still fairly reliable. So the problem still exists. This is very much an open question. We don’t currently know how to solve this problem of traffic correlation. There are a couple of proposed solutions, but they’re not particularly practical. Let me just go through these, and I’ll skip back on the few things I’ve missed. The first thing is high-latency networks, so networks where packets are delayed in their transit through the network. That throws away a lot of the timing information.
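The correlation idea just described can be sketched very crudely. The timestamps below are invented, and real attacks use proper statistical correlation over large flows, but the principle is the same: compare packet counts and inter-packet delays seen at the guard with those seen at candidate exits, and the matching flow stands out:

```python
# Toy sketch of a traffic-correlation attack: an observer who sees
# packet timestamps at both a guard and an exit scores how well two
# flows line up.  All timestamps here are invented.

def inter_packet_delays(timestamps):
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def similarity(guard_ts, exit_ts):
    """Crude score: same packet count plus similar delay pattern."""
    if len(guard_ts) != len(exit_ts):
        return 0.0
    g = inter_packet_delays(guard_ts)
    e = inter_packet_delays(exit_ts)
    # Sum of absolute differences between the delay patterns;
    # lower error means a more similar flow.
    error = sum(abs(a - b) for a, b in zip(g, e))
    return 1.0 / (1.0 + error)

guard_flow = [0.00, 0.10, 0.25, 0.90]            # flow seen at the guard
exit_match = [0.30, 0.40, 0.55, 1.20]            # same pattern, shifted by transit delay
exit_other = [0.30, 0.80, 0.85, 1.20]            # unrelated flow

print(similarity(guard_flow, exit_match))        # near 1.0: the flows line up
print(similarity(guard_flow, exit_other))        # much lower
```

Note the shift by a constant transit delay does not matter, because only the gaps between packets are compared.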
So they promise to potentially solve this problem. But of course, if you want to visit Google’s home page and you have to wait five minutes for it, you’re simply just not going to use Tor. The whole point is trying to make this technology usable, and if you’ve got something which is very, very slow then it’s not attractive to use. This case does work slightly better for e-mail, though. If you think about it, with e-mail you don’t mind – well, you may not mind, you may mind – you don’t mind if your e-mail is delayed by some period of time. And as Roger said earlier, you can also introduce padding into the circuit, so these are dummy cells. But, with a big caveat: some of the research suggests that actually you’d need to introduce quite a lot of padding to defeat these attacks, and that would overload the Tor network in its current state. So, again, not a particularly practical solution. How does Tor try to solve this problem? Well, Tor makes it very difficult to become a user’s guard relay. If you can’t become a user’s guard relay then you don’t know who the user is, quite simply. And so by making it very hard to become the guard relay, Tor prevents this traffic correlation attack. At the moment the Tor client chooses one guard relay and keeps it for a period of time. So if I wanted to target just one of you, I would need to control the guard relay that you were using at that particular point in time. And in fact I’d also need to know what that guard relay is. So by making it very unlikely that you would select a particular malicious guard relay – where the number of malicious guard relays is very small – that’s how Tor tries to solve this problem. And at the moment your guard relay is your barrier of security. If the attacker can’t control the guard relay then they won’t know who you are.
That doesn’t mean they can’t try other sorts of side channel attacks, by messing with the traffic at the exit relay etc. You know, you may e.g. download dodgy documents and open one on your computer, those sorts of things. Now the alternative, of course, to having a guard relay and keeping it for a very long time would be to have a guard relay and change it on a regular basis. Because you might think, well, just choosing one guard relay and sticking with it is probably a bad idea. But actually, that’s not the case. Assuming that the chance of picking a malicious guard relay is very low: when you pick your guard relay, if you made a good choice then your traffic is safe; if you didn’t make a good choice then your traffic isn’t safe. Whereas if your Tor client chooses a new guard relay every few minutes, or every hour, or something along those lines, at some point you’re gonna pick a malicious guard relay. So they’re gonna have some of your traffic, but not all of it. And so currently the trade-off is that we make it very difficult for an attacker to control a guard relay, and the user picks a guard relay and keeps it for a long period of time. That makes it very difficult for attackers controlling a very small proportion of the network to end up as your guard relay. So this currently provides those properties I described earlier, the privacy and the anonymity, when you’re browsing the web, when you’re accessing websites etc. But you still know who the website is. So although you’re anonymous, and the website doesn’t know who you are, you know who the website is. And there may be some cases where the website would also wish to remain anonymous. You want the person accessing the website and the website itself to be anonymous to each other. And you could think about people e.g. in countries where running a political blog might be a dangerous activity.
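The trade-off just described (one long-lived guard versus frequent rotation) can be put in rough numbers. The 1% attacker share and the daily rotation schedule below are invented, purely for illustration:

```python
# Suppose (invented figure) an attacker controls 1% of guard capacity,
# so each fresh guard pick is malicious with probability p.
p = 0.01

# Strategy 1: pick one guard and keep it.  The chance that any of
# your traffic is ever exposed is just p.
keep_one = p

# Strategy 2: rotate to a fresh guard daily for a year.  The chance
# that at least one pick is malicious, exposing some of your traffic:
rotations = 365
rotate_daily = 1 - (1 - p) ** rotations

print(f"fixed guard:  {keep_one:.1%} chance of any exposure")
print(f"daily rotate: {rotate_daily:.1%} chance of some exposure")
```

Under these made-up numbers the rotating client is almost certain to hand some traffic to the attacker within a year, while the fixed-guard client is exposed only 1% of the time, which is the intuition behind keeping one guard.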
If you run such a blog on a regular webserver you’re easily identified, whereas if you’ve got some way where you, as the webserver, can be anonymous, then that allows you to do that activity without being targeted by your government. So this is what hidden services try to solve. Now when you first think about the problem you kind of think: “Hang on a second, the user doesn’t know who the website is and the website doesn’t know who the user is. So how on earth do they talk to each other?” Well, that’s essentially what the Tor hidden service protocol tries to set up: how do you identify and connect to each other? So at the moment this is what happens: We’ve got Bob on the [right] hand side, who is the hidden service. And we’ve got Alice on the left hand side here, who is the user who wishes to visit the hidden service. Now when Bob sets up his hidden service he picks three nodes in the Tor network as introduction points and builds multi-hop circuits to them. So the introduction points don’t know who Bob is; Bob has circuits to them. And Bob says to each of these introduction points: “Will you relay traffic to me if someone connects to you asking for me?” And those introduction points do that. Then, once Bob has picked his introduction points, he publishes a descriptor listing his introduction points, for anyone who wishes to visit his website. And then Alice, on the left hand side, wishing to visit Bob, will pick a rendezvous point in the network and build a circuit to it. So this “RP” here is the rendezvous point. And she will relay a message to Bob via one of the introduction points saying: “Meet me at the rendezvous point”. And then Bob will build a 3-hop circuit to the rendezvous point. So now at this stage we’ve got Alice with a multi-hop circuit to the rendezvous point, and Bob with a multi-hop circuit to the rendezvous point. Alice and Bob haven’t connected to one another directly.
The rendezvous point doesn’t know who Bob is, and the rendezvous point doesn’t know who Alice is. All it’s doing is forwarding the traffic. And it can’t inspect the traffic, either, because the traffic itself is encrypted. So that’s how you currently solve this problem of communicating with someone when you don’t know who they are and vice versa. drinks from the bottle The principal thing I’m going to talk about today is this database. As I said, when Bob picks his introduction points he builds this thing called a descriptor, describing who his introduction points are, and he publishes it to a database. This database itself is distributed throughout the Tor network – it’s not a single server. Both Bob and Alice need to be able to publish information to this database, and also to retrieve information from it. And Tor currently uses something called a distributed hash table. I’m gonna give an example of what this means and how it works, and then I’ll talk about how the Tor Distributed Hash Table itself works. So let’s say e.g. you’ve got a set of servers. Here we’ve got 26 servers, and you’d like to store your files across these different servers without having a single server responsible for deciding “okay, that file is stored on that server, and this file is stored on that server” etc. Now here is my list of files. You could take a very naive approach. You could say: “Okay, I’ve got 26 servers, and all of these file names start with a letter of the alphabet. All of the files that begin with A are gonna go on server A; all of the files that begin with B are gonna go on server B; etc.” And then when you want to retrieve a file you say: “Okay, what does my file name begin with?” And then you know which server it’s stored on. Now of course you could have a lot of servers – sorry – a lot of files which begin with a Z, an X or a Y etc., in which case you’re gonna overload that server.
You’re gonna have more files stored on one server than on another server in your set. And if you have a lot of big files, say e.g. beginning with B, then rather than distributing your files across all the servers you’re gonna just be overloading one or two of them. So to solve this problem what we tend to do is: we take the file name, and we run it through a cryptographic hash function. A cryptographic hash function produces output which looks random: very small changes in the input produce a very large change in the output, and this change looks random. So if I take all of my file names here – and assume I have a lot more – I take a hash of each one, and then I use that hash to determine which server to store the file on. Then, with high probability, my files will be distributed evenly across all of the servers. And when I want to go and retrieve one of the files, I take my file name, I run it through the cryptographic hash function, that gives me the hash, and then I use that hash to identify which server that particular file is stored on. And then I go and retrieve it. So that’s a loose idea of how a distributed hash table works. There are a couple of problems with this: what if the set of servers changes in size, as it does in the Tor network? But that’s a very brief overview of the theory. So how does it apply to the Tor network? Well, the Tor network has a set of relays and it has a set of hidden services. Now we take all of the relays, and each has an identity hash which identifies it. And we map them onto a circle using that hash value as an identifier. So you can imagine the hash value ranging from zero to a very large number. We’ve got a zero point at the very top there, and that runs all the way round to the very large number. So given the identity hash for a relay we can map that to a particular point on the circle. And then all we have to do is also do this for hidden services.
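The hash-based file placement described a moment ago can be sketched in a few lines. The file name is invented, and SHA-1 is used here only as an example of a cryptographic hash:

```python
import hashlib

SERVERS = 26  # the 26 servers from the example above

def server_for(filename: str) -> int:
    """Hash the file name and reduce it to a server index; the hash
    spreads names evenly, whatever letter they happen to start with."""
    digest = hashlib.sha1(filename.encode()).digest()
    return int.from_bytes(digest, "big") % SERVERS

# Storing and retrieving use the same computation, so any client can
# work out where a file lives without asking a central coordinator:
idx = server_for("holiday-photos.zip")
print(idx)
```

Because the hash output is effectively random, hashing a large batch of names lands roughly the same number of files on each of the 26 servers, which is exactly the load-balancing property the lettered scheme lacked.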
So there’s a hidden service address, something.onion – this is one of the hidden websites that you might visit. You take the – I’m not gonna describe in too much detail how this is done, but – the address is hashed in such a way that it’s evenly distributed around the circle. So your hidden service will have a particular point on the circle. And the relays will also be mapped onto this circle. So there are the relays, and the hidden service. And in the case of Tor the hidden service actually maps to two positions on the circle, and it publishes its descriptor to the three relays to the right at one position, and the three relays to the right at the other position. So there are in total six places where this descriptor is published on the circle. And then if I want to go and connect to a hidden service, I go and pull this descriptor down to identify what its introduction points are. I take the hidden service address, I find out where it is on the circle, I map all of the relays onto the circle, and then I identify which relays on the circle are responsible for that particular hidden service. And then I just connect and say: “Do you have a copy of the descriptor for that particular hidden service?” And if so, then we’ve got our list of introduction points, and we can go through the next steps to connect to our hidden service. So, I’m gonna explain how we set up our experiments. What we were interested in doing was collecting publications of hidden services. Every time a hidden service gets set up, it publishes to this distributed hash table. What we wanted to do was collect those publications so that we get a complete list of all of the hidden services. And we also wanted to find out how many times a particular hidden service is requested. Just one more point that will become important later: the position at which the hidden service appears on the circle changes every 24 hours.
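The “three relays to the right” rule can be sketched as follows. The identities here are invented, and real Tor derives each descriptor’s position from the onion address together with a replica number and the date (which is why the position moves every day); this only illustrates the successor lookup on the circle:

```python
import hashlib

def position(identity: bytes) -> int:
    """Map an identity onto the circle as a number in [0, 2**160)."""
    return int.from_bytes(hashlib.sha1(identity).digest(), "big")

def responsible(descriptor_pos, relay_positions, n=3):
    """The n relays clockwise ('to the right') of the descriptor's
    position, wrapping past the circle's zero point if necessary."""
    ring = sorted(relay_positions)
    successors = [r for r in ring if r >= descriptor_pos]
    return (successors + ring)[:n]

# Invented relay identities and a made-up descriptor position:
relay_ring = [position(f"relay-{i}".encode()) for i in range(20)]
desc = position(b"example descriptor, replica 0")
print(responsible(desc, relay_ring))
```

A client fetching the descriptor runs the same computation over the relay list from the consensus, so publisher and fetcher independently agree on which six relays (three per replica) hold the descriptor.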
So it’s not in a fixed position every single day. If we run 40 nodes over a long period of time, we will occupy positions within that distributed hash table, and we will be able to collect publications of, and requests for, hidden services that are located at those positions inside the distributed hash table. So we ran 40 Tor nodes. We had a student at university who said: “Hey, I run a hosting company, I’ve got loads of server capacity”, and we told him what we were doing, and he said: “Well, you really helped us out these last couple of years…” and just gave us loads of server capacity to allow us to do this. So we spun up 40 Tor nodes. Each Tor node was required to advertise a certain amount of bandwidth to become a part of that distributed hash table. It’s actually a very small amount, so this didn’t matter too much. And then, after 25 hours – this has changed recently, in the last few days; it’s just been increased as a result of one of the attacks last week, but certainly during our study it was 25 hours – you appear at a particular point inside that distributed hash table. And you’re then in a position to record publications of hidden services and requests for hidden services. So not only can you get a full list of the onion addresses, you can also find out how many times each of the onion addresses is requested. And so this is what we recorded. And then, once we had run for a long period of time to collect a long list of .onion addresses, we built a custom crawler that would visit each of the Tor hidden services in turn, and pull down the HTML content – the text content from the web page – so that we could go ahead and classify the content. Now it’s really important to note here, and it will become obvious why a little bit later: we only pulled down HTML content. We didn’t pull down images. And there’s a very, very important reason for that which will become clear shortly.
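Our actual crawler isn’t reproduced here, but the text-only idea can be sketched with Python’s standard HTML parser: extract the page text and the links to follow, and simply never issue a request for an image resource. The example page is invented:

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collects page text and outgoing links from an HTML document.
    It records nothing for <img> tags and never fetches them, so no
    image data ever reaches the crawler."""
    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

p = TextOnly()
p.feed('<html><body><h1>Hello</h1><img src="pic.png">'
       '<a href="/next">more</a></body></html>')
print(p.text)    # the page text only
print(p.links)   # links the crawler could follow next
```

The `<img>` tag passes through `handle_starttag` without being recorded, so the crawl frontier only ever contains page URLs, never image URLs.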
We had a lot of questions when we first started this. No one really knew how many hidden services there were. It had been suggested to us that there was a very high turnover of hidden services. We wanted to confirm whether that was true or not. And we also wanted to know what the hidden services are, how popular they are, etc. So, our estimate for how many hidden services there are: over the period which we ran our study, this is a graph plotting our estimate, for each individual day, of how many hidden services there were on that particular day. Now the data is naturally noisy, because we’re only a very small proportion of that circle. So we’re only observing a very small proportion of the total publications and requests every single day, for each of those hidden services. If you take a long-term average of this, there are about 45,000 hidden services that we think were present, on average, each day, during our entire study. Which is a large number of hidden services. But over the entire length we collected about 80,000 in total. Some came and went, etc. So the next question, after how many hidden services there are, is how long a hidden service exists for. Does it exist for a very long period of time, does it exist for a very short period of time, etc.? So what we did was, for every single .onion address, we plotted how many times we saw a publication for that particular hidden service during the six months. How many times did we see it? If we saw it a lot of times, that suggests the hidden service existed for a long period of time. If we saw only a small number of publications for a hidden service, that suggests it was only present for a short period of time. This is our graph. By far the largest number of hidden services we saw only once during the entire study, and we never saw them again. That suggests there’s a very high turnover of hidden services; they don’t tend to exist, on average,
for a very long period of time. And then you can see the sort of tail here. If we plot just those hidden services which existed for a long time – so e.g. we could take hidden services which have a high number of hit requests and say: “Okay, those that have a high number of hits probably existed for a long time”; that’s not absolutely certain, but probable – then you see this sort of normal distribution around 4 or 5. So we saw most hidden services four or five times, on average, during the entire six months if they were popular, and we’re using that as a proxy measure for whether they existed the entire time. Now, this study ran over 160 days, so almost six months. What we also wanted to do was try to confirm this over a longer period. Last year, in 2013, around February, some researchers at the University of Luxembourg also ran a similar study, but it ran over a very short period of time – about a day. They did it in such a way that they could collect descriptors across much of the circle during a single day. That was possible because of a bug in the way Tor did some things, which has now been fixed, so we can’t repeat that particular approach. So we got a list of .onion addresses from February 2013 from these researchers at the University of Luxembourg. And then we got our list of .onion addresses from our six months, which was March to September of this year. And we wanted to say: okay, given these two sets of .onion addresses, which .onion addresses existed in their set but not ours, and vice versa, and which .onion addresses existed in both sets? As you can see, a very small minority of hidden service addresses existed in both sets. This is over an 18-month period between these two collection points. A very small number of services existed both in their data set and in our data set. Which again suggests there’s a very high turnover of hidden services, and that they don’t tend to exist for a very long period of time. So the question is: why is that?
Which we’ll come on to a little bit later. It’s a very valid question; we can’t answer it 100%, but we have some inklings as to why that may be the case. So in terms of popularity, which hidden services, or which .onion addresses, did we see requested the most? Which got the largest number of hits, or the largest number of directory requests? Botnet Command & Control servers. If you’re not familiar with what a botnet is: the idea is to infect lots of people with a piece of malware. And this malware phones home to a Command & Control server, where the botnet master can give instructions to each of the bots to do things. That might be e.g. to collect passwords, key strokes, banking details. Or it might be to do things like Distributed Denial of Service attacks, or to send spam, those sorts of things. Now, the problem with running a botnet is that your C&C servers are vulnerable: once a C&C server is taken down, you no longer have control over your botnet. So there’s been a sort of arms race between anti-virus companies and malware authors, with the authors trying to come up with techniques to run C&C servers in a way in which they can’t be taken down. And a couple of years ago someone gave a talk at a conference that said: “You know what? It would be a really good idea if botnet C&C servers were run as Tor hidden services, because then no one knows where they are, and in theory they can’t be taken down.” And in fact that’s what we see: there are loads and loads of these addresses associated with several different botnets, ‘Sefnit’ and ‘Skynet’. Now, Skynet is the one I wanted to talk to you about, because the guy that runs Skynet had a Twitter account, and he also did a Reddit AMA. If you’ve not heard of a Reddit AMA before, that’s a Reddit ask-me-anything: you can go on the website and ask the guy anything. So this guy wasn’t hiding in the shadows.
He’d say: “Hey, I’m running this massive botnet, here’s my Twitter account which I update regularly, here’s my Reddit AMA where you can ask me questions!” etc. He was arrested last year, which is not, perhaps, a huge surprise. laughter and applause But… so he was arrested and his C&C servers disappeared, but there were still infected hosts trying to connect to the C&C servers and requesting access to them. This is why we saw such a large number of hits: all of these requests were failed requests, i.e. we didn’t have a descriptor for them because the hidden service had gone away, but there were still clients requesting each of the hidden services. The next thing we wanted to do was to try to categorize sites. As I said earlier, we crawled all of the hidden services that we could, and we classified them into different categories based on the type of content on the hidden service site. The first graph I have is the number of sites in each of the categories. You can see down the bottom here we’ve got lots of different categories – drugs, marketplaces, etc. – and the graph shows the percentage of the hidden services we crawled that fit into each of these categories. So, e.g., looking at this, the largest number of sites that we crawled were drugs-focused websites, followed by marketplaces, etc. There are a couple of categories here which stick out and need explaining. What does ‘porn’ mean? Well, you know what ‘porn’ means. There are some very notorious porn sites on the Tor Darknet. There was one in particular which was focused on revenge porn. It turns out that youngsters like to take pictures of themselves and send them to their boyfriends or their girlfriends, and when they get dumped, those pictures get published on these websites. There were several of these sites on the main internet which have mostly been shut down, and some of them were archived on the Darknet.
The second one you’re probably wondering about is ‘abuse’. Every single site we classified in this category was a child abuse site: they were in some way facilitating child abuse. And how do we know that? Well, the data that came back from the crawler made it completely unambiguous what the content on these sites was. It was completely obvious, from the text content alone, what was on these sites. And this is the principal reason why we didn’t pull down images from sites: in many countries it would be a criminal offense to do so. So our crawler only pulled down text content from all of these sites, and that enabled us to classify them. We didn’t pull down any images. Of course, the next thing we’d like to do is to say: “Okay, given each of these categories, what proportion of directory requests went to each of them?” Now, the next graph is going to need some explaining as to precisely what it means, and I’m going to give that. This is the proportion of directory requests we saw that went to each of the categories of hidden service that we classified. As you can see, we in fact saw a very large number going to these abuse sites, with the rest distributed down there at the bottom. And the question is: what is it we’re collecting here? We’re collecting successful hidden service directory requests. What does a hidden service directory request mean? It probably loosely correlates with either a visit or a visitor – somewhere in between those two. Because when you want to visit a hidden service, you make a request for the hidden service descriptor, and that allows you to connect to it and browse the web site. But there are cases where, e.g. if you restart Tor, you’ll go back and re-fetch the descriptor – in that case we’d count you twice, for example. What proportion of these are people, and what proportion are something else?
The answer to that is we simply don’t know. We’ve got directory requests, but that doesn’t tell us what they’re doing on these sites, what they’re fetching, who they are, or indeed what they are. These could be automated requests, or they could be human beings; we can’t distinguish between those two. So what are the limitations? A hidden service directory request correlates exactly with neither a visit nor a visitor – it’s probably somewhere in between, so you can’t say it’s exactly one or the other. We cannot say whether a hidden service directory request comes from a person or something automated. And any type of site could be targeted by, e.g., DoS attacks or web crawlers, which would greatly inflate the figures. If you were to do a DoS attack, it’s likely you’d only request a small number of descriptors – you’d actually be flooding the site itself rather than the directories. In theory you could flood the directories, but we didn’t see any sort of shutdown of our directories from flooding, for example. Whilst we can’t rule that out, it doesn’t seem to fit too well with what we’ve got. The other question is crawlers. I obviously talked with the Tor Project about these results, and they’ve suggested that there are groups – child protection agencies, e.g. – that will crawl these sites on a regular basis. Again, that doesn’t necessarily correlate with a human being, and it could inflate the figures. How many hidden service directory requests would there be if a crawler were pointed at a site? Typically, if I crawl it on a single day, one request. But if they’ve got a large number of servers doing the crawling, then it could be one request per day for every single server. So, again, I can’t give you a definitive “yes, this is human beings” or “yes, this is automated requests”. The other important point is that these two content graphs only cover hidden services offering web content.
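As an aside, the text-only classification described earlier could be sketched as a simple keyword scorer. The talk doesn’t detail the actual method used in the study, so the categories and keywords below are invented purely for illustration:

```python
# Toy keyword-based categoriser for crawled page text.
# Categories and keywords are illustrative assumptions, not the
# actual classifier used in the study.
CATEGORY_KEYWORDS = {
    "drugs": {"cannabis", "mdma", "pharmacy", "gram"},
    "marketplace": {"escrow", "vendor", "listing", "shipping"},
}

def classify(page_text):
    """Assign the category whose keyword set overlaps the page text most."""
    words = set(page_text.lower().split())
    scores = {cat: len(words & kw) for cat, kw in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"

print(classify("trusted vendor with escrow and tracked shipping"))  # marketplace
print(classify("a page about kittens"))                             # other
```

A real classifier would need much more care (tokenisation, weighting, manual review), but the point stands that text content alone was sufficient to categorise sites.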
There are hidden services that do other things, e.g. IRC, instant messaging, etc. Those aren’t included in these figures. We’re only concentrating on hidden services offering web sites – HTTP or HTTPS services – because that allows us to easily classify them. And in fact, for some of the other types, like IRC and Jabber, the results would probably not be directly comparable with web sites: the use case for using them is probably slightly different. So, I appreciate the last graph is somewhat alarming. If you have any questions, please ask either me or the Tor developers as to how to interpret these results; it’s not quite as straightforward as it may look. You might look at the graph and say: “Hey, that looks like there’s lots of people visiting these sites.” It’s difficult to conclude that from the results. The next slide is going to be very contentious, and I will prefix it with this: I’m not advocating any kind of action whatsoever. I’m just trying to describe, technically, what could be done. It’s not up to me to make decisions on these types of things. So, of course, when we found this out, frankly, I think we were stunned. It took us several days – “what the hell, this is not what we expected at all.” A natural step is – well, most of us think that Tor is a great thing, it seems – could this problem be sorted out while still keeping Tor as it is? And probably the next step is to say: “Well, okay, could we block just this class of content and not other types of content?” So could we block just the hidden services that are associated with these sites, and not other types of hidden services? We thought there are three ways in which hidden services could be blocked. I’ll talk about whether these will remain possible in the coming months after explaining them; during our study, and at present, they are possible.
A single individual could shut down a single hidden service by controlling all of the relays which are responsible for receiving publication requests for it on the distributed hash table. It’s possible to place one of your relays at a particular position on that circle, and thereby make yourself the responsible relay for a particular hidden service. And if you control all six of the relays which are responsible for a hidden service, then when someone comes to you and says “Can I have the descriptor for that site?”, you can just say “No, I haven’t got it.” Provided you control those relays, users won’t be able to fetch those sites. The second option concerns relay operators. Could I, as a relay operator, say: “I don’t want to carry this type of content, and I don’t want to be responsible for serving up this type of content”? A relay operator could patch his relay so that if anyone comes to it requesting any of these sites, it simply refuses. The problem is that a lot of relay operators would need to do it – a very, very large number of relay operators would need to do that to effectively block these sites. The final option is that the Tor Project could modify the Tor program and embed these addresses in the program itself, so that all relays by default block hidden service directory requests to these sites, and clients themselves would also block requests for them at the client level. Now, I hasten to add: I’m not advocating any kind of action; that is entirely up to other people – because, frankly, I think if I advocated blocking hidden services I probably wouldn’t make it out of here alive. I’m just describing what technical measures could be used to block some classes of sites. And of course there are lots of questions here. If, e.g.,
the Tor Project themselves decided “Okay, we’re going to block these sites”, that means they are essentially in control of the block list. The block list would be somewhat public, so everyone would be able to inspect what sites are being blocked, but they would be in control of some kind of block list – which, you know, arguably is against what the Tor Project is after. takes a sip, coughs So how about deanonymising visitors to hidden service web sites? In this case we’ve got a user on the left-hand side who is connected to a Guard node; we’ve got a hidden service on the right-hand side which is connected to its Guard node; and at the top we’ve got one of those directory servers which is responsible for serving up hidden service directory requests. Now, when you first want to connect to a hidden service, you connect through your Guard node and through a couple of hops up to the hidden service directory, and you request the descriptor from it. So at this point, if you are the attacker and you control one of the hidden service directory nodes for a particular site, you can send a particular pattern of traffic back down the circuit. And if you control that user’s Guard node – which is a big if – then you can spot that pattern of traffic at the Guard node. The question is: how do you control a particular user’s Guard node? That’s very, very hard. But if, e.g., I run a hidden service and all of you visit my hidden service, and I’m running a couple of dodgy Guard relays, then the probability is that some of you – certainly not all of you by any stretch – will select my dodgy Guard relay, and I could deanonymise you, but I couldn’t deanonymise the rest. So what we’re saying here is that you can deanonymise some of the users some of the time, but you can’t pick which users those are.
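The “some users, some of the time” point can be illustrated with a toy simulation: if guards are picked roughly in proportion to bandwidth, an attacker holding a fraction f of guard capacity catches about a fraction f of users, without getting to choose which ones. (This is a simplification – real guard selection involves bandwidth weights, guard rotation periods, etc.)

```python
import random

# Toy model: each user independently picks a malicious guard with
# probability equal to the attacker's share of guard capacity.
def fraction_caught(num_users, malicious_share, seed=1234):
    rng = random.Random(seed)
    caught = sum(rng.random() < malicious_share for _ in range(num_users))
    return caught / num_users

# Controlling 5% of guard capacity deanonymises roughly 5% of users --
# but a *random* 5%, not any chosen target.
print(round(fraction_caught(100_000, 0.05), 3))
```

This is exactly the asymmetry in the GCHQ slide mentioned at the start: a fraction can be deanonymised, but not a specific person on request.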
You can’t deanonymise someone specific, but you can deanonymise a fraction, based on what fraction of the network you control in terms of Guard capacity. So, the attacker controls those two – here’s a picture from researchers at the University of Luxembourg who did this. These are plots of taking the IP addresses of users visiting a C&C server, geolocating them and putting them on a map. So: “where was the user located when they visited one of these Tor hidden services?” Again, this is a selection – a percentage of the users visiting C&C servers, obtained using this technique. How about deanonymising hidden services themselves? Well, again, you’ve got a problem. You’re the user. You connect through your Guard into the Tor network, and then eventually through the hidden service’s Guard node, and talk to the hidden service. As the attacker, you need to control the hidden service’s Guard node to do these traffic correlation attacks. So again, it’s very difficult to deanonymise a specific Tor hidden service. But if you think about it – okay, there are 1,000 Tor hidden services; if you control a percentage of the Guard nodes, then some hidden services will pick you, and you’ll be able to deanonymise those. So provided you don’t care which hidden services you’re going to deanonymise, it becomes much more straightforward – you’ll control the Guard nodes of some hidden services, but you can’t pick exactly which ones. So what sort of data can you see traversing a relay? This is a modified Tor client which just dumps cells – essentially packets travelling down a circuit – and the information you can extract from them at a Guard node. This is done off the main Tor network: I’ve got a client connected to a “malicious” Guard relay, and it logs every single packet – they’re called ‘cells’ in the Tor protocol – coming through the Guard relay. We can’t decrypt the packets, because they’re encrypted three times.
What we can record, though, is the IP address of the user, the IP address of the next hop, and a count of the packets travelling in each direction down the circuit. We can also record the time at which those packets were sent. Of course, if you’re doing the traffic correlation attacks, you’re using that timing information to try to work out whether you’re seeing traffic which you’ve sent and which identifies a particular user – or indeed traffic which they’ve sent and which you’ve seen at a different point in the network. Moving on to my… interesting problems, research questions, etc. Based on what I’ve said: there are these directory authorities which are controlled by the core Tor members. If, e.g., a big enough chunk of them were malicious, then they could manipulate the consensus to direct you to particular nodes. I don’t think that’s the case, and I don’t think anyone thinks that’s the case. And Tor is designed in a way… I mean, you’d have to control a certain number of the authorities to be able to do anything important. So – the Tor people, I said this to them a couple of days ago – I find it quite funny that you’d design your system as if you don’t trust each other. To which their response was: “No, we design our system so that we don’t have to trust each other.” Which I think is a very good model to have for this type of system. So could we eliminate these sorts of centralized servers? I think that’s actually a very hard problem. There are lots of attacks which could potentially be deployed against a decentralized network. At the moment the Tor network is relatively well understood in terms of what types of attack it is vulnerable to; if we were to move to a new architecture, we might open it up to a whole new class of attacks. The Tor network has existed for quite some time and has been very well studied.
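To illustrate the timing-correlation idea from a moment ago: a sketch with toy numbers (and a far cruder measure than the attacks in the literature) that bins the cell timestamps logged at two observation points and compares the resulting count vectors.

```python
import math

# Bin cell timestamps into fixed-width windows and count cells per bin.
def bin_counts(timestamps, bin_size=0.5, n_bins=10):
    counts = [0] * n_bins
    for t in timestamps:
        i = int(t // bin_size)
        if i < n_bins:
            counts[i] += 1
    return counts

# Cosine similarity between two count vectors: near 1.0 for matching flows.
def similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

sent = [0.10, 0.15, 1.20, 1.25, 3.00, 3.05]   # pattern injected at one point
seen = [t + 0.02 for t in sent]               # same flow observed at the guard
other = [0.50, 1.70, 2.20, 2.80, 3.90, 4.40]  # an unrelated circuit

base = bin_counts(sent)
print(similarity(base, bin_counts(seen)))    # high: likely the same flow
print(similarity(base, bin_counts(other)))   # low: different flow
```

Published attacks use much more robust statistics and handle network jitter and cover traffic, but the principle is the same: match the timing pattern seen at one relay against the pattern seen at another.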
What about global adversaries like the NSA, which can monitor network links all across the world? It’s very difficult to defend against that. If they can identify which Guard relay you’re using, they can monitor traffic going into and out of the Guard relay, and they can log each of the subsequent hops along the circuit. It’s very, very difficult to defend against these types of things. Do we know if they’re doing it? The documents that were released yesterday – I’ve only had a very brief look through them, but they suggest that they’re not presently doing it and that they haven’t had much success. I don’t know why: there are very powerful attacks described in the academic literature which are very, very reliable, and most academic literature you can access for free, so it’s not even as if they have to figure out how to do it. They just have to read the literature and implement some of these attacks. I don’t know why they’re not. The next question is how to detect malicious relays. In my case we were running 40 relays. Most of them were on consecutive IP addresses, in two blocks – so running on IP addresses numbered e.g. 1, 2, 3, 4, … We were running two relays per IP address, and every single relay had my name plastered across it. So after I set up these 40 relays in a relatively short period of time, I expected someone from the Tor Project to come to me and say: “Hey Gareth, what are you doing?” No one noticed. No one noticed. So this is presently an open question, and the Tor Project are quite open about this. They acknowledged that, in fact, last year the CERT researchers launched many more relays than that. The Tor Project spotted that large number of relays but chose not to do anything about it – and in fact those relays were deploying an attack. But, as you know, it’s often very difficult to defend against unknown attacks.
So at the moment, how to detect malicious relays is a bit of an open question – which I think is being discussed on the mailing list. The other one is defending against unknown tampering at exits. If you take the exit relays: an exit relay can tamper with the traffic. We know about particular types of attack – SSL man-in-the-middle, etc. – and we’ve seen binary patching recently. How do we detect unknown tampering with other types of traffic? The binary tampering wasn’t detected by the Tor Project themselves; it was spotted by someone else, who notified them. And the final open question here is Tor code review. The Tor code is open source. We know from OpenSSL that, although everyone can read source code, people don’t always look at it. OpenSSL has been a huge mess, and a lot about that has come out recently. There are lots of eyes on the Tor code, but I think more eyes are always better. Ideally we’d get more people to look at the Tor code and look for vulnerabilities – I encourage people to do that, it’s a very useful thing to do. There could be unknown vulnerabilities, as we’ve seen quite recently with the “relay early” bug in the Tor code, which could be quite serious. The truth is we just don’t know until people do thorough code audits, and even then it’s very difficult to know for certain. So my last point, I think, yes, is advice to future researchers. If you’re planning on doing a study in the future, e.g. on Tor, do not do what the CERT researchers did and start deanonymising people on the live Tor network, doing it in a way which is incredibly irresponsible. I mean, I tend, myself, to give them the benefit of the doubt: I don’t think the CERT researchers set out to be malicious. I think they were just very naive in what they were doing.
That was rapidly pointed out to them. In my case we were running 40 relays. Our Tor relays were forwarding traffic; they were acting as good relays. The only thing we were doing was logging publication requests to the directories. It’s a big question whether that’s malicious or not – I don’t know. One thing that has been pointed out to me is that the .onion addresses themselves could be considered sensitive information, so the only data we will be retaining from the study is the aggregated data. We won’t be retaining information on individual .onion addresses, because that could potentially be considered sensitive – if you think about someone running an .onion address which contains something they don’t want other people knowing about. So we won’t be retaining that data; we’ll be destroying it. So I think that brings me now to the questions, but first I want to say thanks to a couple of people. The student who donated the server to us. Nick Savage, one of my colleagues, who was a sounding board during the entire study. Ivan Pustogarov, the researcher at the University of Luxembourg who sent us the large data set of .onion addresses from last year – he’s also the chap who demonstrated those deanonymisation attacks I talked about. A big “thank you” to Roger Dingledine, who has frankly… presented loads of questions to me over the last couple of days and allowed me to bounce ideas back and forth; that has been a very useful process. If you are doing future research, I strongly encourage you to contact the Tor Project at the earliest opportunity – certainly I found them to be extremely helpful. Donncha also did something similar: both Ivan and Donncha have done similar studies trying to classify the types of hidden services, or work out how many hits there are to particular types of hidden service. Ivan Pustogarov did it on a bigger scale and found similar results to ours.
That is, these abuse sites featured frequently among the top requested sites. That was done over a year ago and, again, he was seeing similar sorts of patterns: these abuse sites were being requested frequently. So that also corroborates what we’re saying. The data I’ve put online is at this address. There will probably be the slides; something called ‘The Tor Research Framework’, which is an implementation of a Tor client in Java aimed specifically at researchers – so if, e.g., you want to pull data out of a consensus, you can do; if you want to build custom routes through the network, you can do; if you want to build routes through the network and start sending padding traffic down them, you can do, etc. The code is designed to be easily modifiable for testing lots of these things. There is also a link to the FBI’s Tor exploit, which they deployed against visitors to some Tor hidden services last year. They exploited a Mozilla Firefox bug and ran code on the computers of users who were visiting these hidden services, in order to identify them. At this address there is a link to that, including a copy of the shellcode and an analysis of exactly what it was doing. And then of course a list of references, with papers and things. So I’m quite happy to take questions now. applause Herald: Thanks for the nice talk! Do we have any questions from the internet? Signal Angel: One question. It’s very hard to block addresses, since creating them is cheap, and they can be generated for each user and rotated often. So can you think of any other way of doing the blocking? Gareth: That is absolutely true, yes. If you were to block a particular .onion address, they can simply generate another .onion address. So I don’t know of any way to counter that right now. Herald: Another one from the internet? inaudible answer from Signal Angel Okay, then, Microphone 1, please!
Question: Thank you, that’s fascinating research. You mentioned that it is possible to influence the hash of your relay node, in the sense that you could choose which hidden service you are responsible for. Is that right? Gareth: Yeah, correct! Question: So could you elaborate on how this is possible? Gareth: So, e.g., you just keep regenerating the public key for your relay, and you’ll get closer and closer to the point where you’ll be the responsible relay for that particular hidden service. You keep regenerating your identity hash until you’re at that particular point on the circle. That’s not particularly computationally intensive to do. Was that it? Herald: Okay, next question from Microphone 5, please. Question: Hi, I was wondering, for the attacks where you identify a certain number of users using a hidden service: have those attacks been used, is there any evidence of that, and is there any way of protecting against them? Gareth: That’s a very interesting question – is there any way to detect these types of attacks? For some of the attacks, if you’re going to generate particular traffic patterns, one way to do that is to use the padding cells. The padding cells aren’t used at the moment by the official Tor client, so the detection of those could be indicative, but it’s not conclusive evidence on its own. Question: And is there any way of protecting against a government or something trying to denial-of-service hidden services? Gareth: So I… trying to… sorry? Question: Is it possible to protect against this kind of attack? Gareth: Not that I’m aware of. The Tor Project are currently revising how they do the hidden service protocol, which will make, e.g., what I did – enumerating the hidden services – much more difficult, and likewise positioning yourself on the distributed hash table in advance for a particular hidden service.
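The key-grinding Gareth describes here can be sketched as a brute-force search. This is a simplification: real Tor relay identities are RSA keys, and the responsible-relay rule is “the fingerprints closest after the descriptor ID on the circle”, not a literal prefix match – the prefix here just stands in for landing at a chosen position.

```python
import hashlib
import os

# Keep generating random 'keys' until the identity hash starts with the
# target hex prefix -- a stand-in for landing at a chosen DHT position.
def grind_identity(target_prefix, max_tries=1_000_000):
    for _ in range(max_tries):
        key = os.urandom(32)                      # stand-in for a real keypair
        ident = hashlib.sha1(key).hexdigest()
        if ident.startswith(target_prefix):
            return key, ident
    raise RuntimeError("no match within budget")

# A 2-hex-digit prefix needs ~256 tries on average; each extra digit
# multiplies the work by 16. The search only hits one target position,
# which is why you can only aim at one hidden service at a time.
_, ident = grind_identity("ab")
print(ident[:2])  # ab
```

This also shows why the attack scales per-service: grinding toward one point on the circle says nothing about any other point.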
So they are at the moment trying to change the way it’s done, and make some of these things more difficult. Herald: Good. Next question from Microphone 2, please. Mic2: Hi. I run the Tor2Web abuse desk, so I used to see a lot of abuse requests concerning Tor hidden services being exposed on the internet through the Tor2Web.org domain name. And I just wanted to comment on, like you said, the number of abuse requests. I’ve spoken with some of the child protection agencies that reported abuse to Tor2Web.org, and they are effectively using crawlers that periodically look for changes, in order to get new images to put in their database. And what I was able to understand is that the German agency doing that is crawling the same sites that the Italian agencies are crawling, too. So it’s likely that in most countries the child protection agencies are crawling that small number of Tor hidden services that contain child porn. And I also saw from the statistics of Tor2Web that the amount of abuse relating to that kind of content is relatively low. Just as a contribution! Gareth: Yes, that’s very interesting, thank you for that! applause Herald: Next, Microphone 4, please. Mic4: You deanonymised users with an infected or modified Guard relay. Is it required to modify the Guard relay, if I control the entry point of the user to the internet – if I’m his ISP? Gareth: So – if you observe traffic travelling into a Guard relay without controlling the Guard relay itself? Mic4: Yeah. Gareth: In theory, yes. I wouldn’t be able to tell you how reliable that is off the top of my head. Mic4: Thanks! Herald: So, another question from the internet! Signal Angel: Wouldn’t the ability to choose the key hash prefix give the ability to target specific .onions? Gareth: So, you can only target one .onion address at a time, because of the way they are generated. You wouldn’t be able to say, e.g.:
“Pick a key which targets two or more .onion addresses.” You can only target one .onion address at a time by positioning yourself at a particular point on the distributed hash table. Herald: Another one from the internet? … Okay. Then Microphone 3, please. Mic3: Hey. Thanks for this research – I think it strengthens the network. So in the deem (?) I was wondering whether you can donate these relays to be part of the non-malicious relay pool – basically use them as regular relays afterwards? Gareth: Okay, so can I donate the relays a rerun and at the Tor capacity (?)? Unfortunately, as I said, they were run on hardware donated by a student for a fixed period of time, so we’ve given it back to him. We are very grateful to him; he was very generous. In fact, without his contribution it would have been much more difficult to collect as much data as we did. Herald: Good, next, Microphone 5, please! Mic5: Yeah, hi, first of all thanks for your talk. I think you’ve raised some real issues that need to be considered very carefully by everyone on the Tor Project. My question: I’d like to go back to the issue of so many abuse-related web sites running over the Tor network. I think it’s an important issue that really needs to be considered, because we don’t want to be associated with that at the end of the day – anyone who uses Tor, who runs a relay or an exit node. And I understand it’s a bit of a sensitive issue, and you don’t really have any say over whether it’s implemented or not. But I’d like to get your opinion on the implementation of a distributed block/deny system that would run in very much a similar way to the directory authorities. I’d just like to see what you think of that. Gareth: So you’re asking me whether I support a particular blocking mechanism, then? Mic5: I’d like to get your opinion on it.
Gareth laughs I know it’s a sensitive issue, but like I said, I think it needs to be considered, because everyone running exit nodes and relays, and the people of the Tor Project, don’t want to be known for or associated with this massive number of abuse web sites that currently exists within the Tor network. Gareth: I absolutely agree, and I think the Tor Project are horrified as well that this problem exists – they have in fact talked in previous years about having a problem with this type of content. As to what, if anything, is done about it, that’s very much up to them. Could it be done in a distributed fashion? The example I gave was a way in which it could be done by relay operators – that would need the consensus of a large number of relay operators to be effective, so that is done in a distributed fashion. The question is: who supplies the list of .onion addresses to block to each of the relay operators? Clearly, the relay operators aren’t going to collect it themselves. It needs to be supplied by someone like the Tor Project, e.g., or someone trustworthy. Yes, it can be done in a distributed fashion. It can be done in an open fashion. Mic5: Who knows? Gareth: Okay. Mic5: Thank you. Herald: Good. And another question from the internet. Signal Angel: Apparently there’s an option in the Tor client to collect statistics on hidden services. Do you know about this, and how does it relate to your research? Gareth: Yes – the extent to which I know about it is that they’re going to be trying this next month, to try to estimate how many hidden services there are. So keep your eye on the Tor Project web site; I’m sure they’ll be publishing their data in the coming months. Herald: And, sadly, we are running out of time, so this will be the last question. Microphone 4, please!
Mic4: Hi, I’m just wondering if you could outline what ethical clearances you had to get from your university to conduct this kind of research. Gareth: So, we have to discuss these types of things before undertaking any research, and we go through steps to make sure that we’re not, e.g., storing sensitive information about particular people. So yes, we are very mindful of that, and that’s why I made a particular point of putting some of the things to consider on the slides. Mic4: So, like… you outlined a potential implementation of the traffic correlation attack. Are you saying that you performed the attack? Or… Gareth: No, no, no, absolutely not. The link I’m giving… absolutely not. We have not engaged in any… Mic4: It just wasn’t clear from the slides. Gareth: I apologize. So, to be absolutely clear on that: no, we did not engage in any deanonymisation research on the Tor network. The research I showed is linked in the references, which I put at the end of the slides – you can read about it, but it was done in simulation. There’s a way to do simulation of the Tor network on a single computer… I can’t remember the name of the project. Shadow! Yes, it’s a system called Shadow: you can run a large number of Tor relays on a single computer and simulate the traffic between them. If you’re going to do that type of research, then you should use that. Okay, thank you very much, everyone. applause silent postroll titles subtitles created by c3subtitles.de Join, and help us!