silent 31C3 preroll
Dr. Gareth Owen: Hello. Can you hear me?
Yes. Okay. So my name is Gareth Owen.
I’m from the University of Portsmouth.
I’m an academic
and I’m going to talk to you about
an experiment that we did
on the Tor hidden services,
trying to categorize them,
estimate how many there were, etc.
Well, as we go through the talk
I’m going to explain
how Tor hidden services work internally,
and how the data was collected.
and what sort of conclusions you can draw
from the data based on the way that we’ve
collected it. Just so [that] I get
an idea: how many of you use Tor
on a regular basis, could you
put your hand up for me?
So quite a big number. Keep your hand
up if… or put your hand up if you’re
a relay operator.
Wow, that’s quite a significant number,
isn’t it? And then, put your hand up
and/or keep it up if you
run a hidden service.
Okay, so, a smaller number, but still
some people run hidden services.
Okay, so, some of you may be very familiar
with the way Tor works, sort of,
at a low level. But I am gonna go through
it for those who aren’t, so they understand
just how they work. And as we go along,
because I’m explaining how
the hidden services work, I’m going
to tag on information on how
the Tor hidden services themselves can be
deanonymised and also how the users
of those hidden services can be
deanonymised, if you put
some strict criteria on what it is you
want to do with respect to them.
So the things that I’m going to go over:
I wanna go over how Tor works,
and then specifically how hidden services
work. I’m gonna talk about something
called the “Tor Distributed Hash Table”
for hidden services. If you’ve heard
that term and don’t know what
it means, don’t worry, I’ll explain
what a distributed hash table is and
how it works. It’s not as complicated
as it sounds. And then I wanna go over
Darknet data, so, data that we collected
from Tor hidden services. And as I say,
as we go along I will sort of explain
how you do deanonymisation of both the
services themselves and of the visitors
to the service. And just
how complicated it is.
So you may have seen this slide which
I think was from GCHQ, released last year
as part of the Snowden leaks where they
said: “You can deanonymise some users
some of the time but they’ve had
no success in deanonymising someone
in response to a specific request.”
So, given all of you e.g., I may be able
to deanonymise a small fraction of you
but I can’t choose precisely one person
I want to deanonymise. That’s what
I’m gonna be explaining in relation
to the deanonymisation attacks, how
you can deanonymise a section but
you can’t necessarily choose which section
of the users that you will be deanonymising.
Tor tries to solve a couple
of different problems. On one part
it allows you to bypass censorship. So if
you’re in a country like China, which
blocks some types of traffic, you can use
Tor to bypass their censorship blocks.
It also tries to give you privacy: at one
point in the network someone can see
who you are but not what you’re doing, and
at another point in the network people
may be able to see what you’re doing
but don’t know who you are.
Now the traditional case
for this is to look at VPNs.
With a VPN you have
sort of a single provider.
You have lots of users connecting
to the VPN. The VPN has sort of
a mixing effect from an outside or
a server’s point of view. And then
out of the VPN you see requests
to Twitter, Wikipedia etc. etc.
And if that traffic isn’t encrypted then
the VPN can also read the contents
of the traffic. Now of course there is
a fundamental weakness with this.
Even if you trust the VPN provider, the VPN
provider knows both who you are
and what you’re doing and can
link those two together with absolute
certainty. So whilst you do
get some of these properties, assuming
you’ve got a trustworthy VPN provider
you don’t get them in the face of
an untrustworthy VPN provider.
And of course: how do you trust the VPN
provider? What sort of measure do
you use? That’s sort of an open question.
So Tor tries to solve this problem
by distributing the trust. Tor is
an open source project, so you can go
on to their Git repository, you can
download the source code, and change it,
improve it, submit patches etc.
As you heard earlier, during Jacob and
Roger’s talk they’re currently partly
sponsored by the US Government which seems
a bit paradoxical, but they explained
in that talk that that
doesn’t affect their judgment.
And indeed, they do have some funding from
other sources, and they designed the system
– which I’ll talk about a little bit
later – in a way where they don’t have
to trust each other. So there’s sort of
some redundancy, and they’re trying
to minimize these sort of trust issues
related to this. Now, Tor is
a partially de-centralized network, which
means that it has some centralized
components which are under the control of
the Tor Project and some de-centralized
components which are normally the Tor
relays. If you run a relay you’re
one of those de-centralized components.
There is, however, no single authority
on the Tor network.
So no single server which is responsible,
which you’re required to trust.
So the trust is somewhat distributed,
but not entirely. When you establish
a circuit through Tor you, the user,
download a list of all of the relays
inside the Tor network.
And you get to pick – and I’ll tell you
how you do that – which relays
you’re going to use to route your traffic
through. So here is a typical example:
You’re here on the left hand side as the
user. You download a list of the relays
inside the Tor network and you select from
that list three nodes, a guard node
which is your entry into the Tor network,
a relay node which is a middle node.
Essentially, it’s going to route your
traffic to a third hop. And then
the third hop is the exit node where
your traffic essentially exits out
on the internet. Now, looking at the
circuit. So this is a circuit through
the Tor network through which you’re
going to route your traffic. There are
three layers of encryption at the
beginning, so between you
and the guard node. Your traffic
is encrypted three times.
In the first instance it’s encrypted to the
guard, then it’s encrypted again
to the relay, and then encrypted
again to the exit, and as the traffic moves
through the Tor network each of those
layers of encryption is peeled away
from the data. The Guard here in this case
knows who you are, and the exit relay
knows what you’re doing but neither know
both. And the middle relay doesn’t really
know a lot, except which relay is
the guard and which relay is the exit.
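That layering can be sketched with a toy cipher. This is only an illustration: a hash-based XOR keystream stands in for Tor’s real ciphers, and the keys and hop names are made up.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream by repeated hashing -- NOT a real cipher.
    out, block = b"", key
    while len(out) < length:
        block = hashlib.sha256(block).digest()
        out += block
    return out[:length]

def xor_layer(data: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so the same call adds or peels a layer.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

keys = {"guard": b"k-guard", "middle": b"k-middle", "exit": b"k-exit"}
payload = b"GET / HTTP/1.1"

# The client wraps the payload once per hop, innermost (exit) layer first.
cell = payload
for hop in ("exit", "middle", "guard"):
    cell = xor_layer(cell, keys[hop])

# Each relay then peels exactly one layer as the cell passes through.
for hop in ("guard", "middle", "exit"):
    cell = xor_layer(cell, keys[hop])

assert cell == payload  # only the exit ends up with the plaintext request
```

The point of the sketch: the guard only ever sees the fully wrapped cell, and the exit only ever sees the innermost content.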
Who runs an exit relay? So if you run
an exit relay, all of the traffic which
users are sending out on the internet
appears to come from your IP address.
So running an exit relay is potentially
risky because someone may do something
through your relay which attracts attention.
And then, when law enforcement
traces that back to an IP address it’s
going to come back to your address.
So some relay operators have had trouble
with this, with law enforcement coming
to them, and saying: “Hey we got this
traffic coming through your IP address
and you have to go and explain it.”
So if you want to run an exit relay
it’s a little bit risky, but we’re thankful
for those people that do run exit relays
because ultimately if people didn’t run
an exit relay you wouldn’t be able
to get out of the Tor network, and it
wouldn’t be terribly useful from this
point of view. So, yes.
applause
So, when you set up
a Tor relay, you publish something called
a descriptor – which describes your Tor
relay and how to use it – to a set
of servers called the authorities. And the
trust in the Tor network is essentially
split across these authorities. They’re run
by the core Tor Project members.
And they maintain a list of all of the
relays in the network. And they observe
them over a period of time. If the relays
exhibit certain properties they give
the relays flags. If e.g. a relay allows
traffic to exit from the Tor network
it will get the ‘Exit’ flag. If they’ve been
switched on for a certain period of time,
or for a certain amount of traffic they’ll
be allowed to become the guard relay
which is the first node in your circuit.
So when you build your circuit you
download a list of these descriptors from
one of the Directory Authorities. You look
at the flags which have been assigned to
each of the relays, and then you pick
your route based on that. So you’ll pick
the guard node from a set of relays
which have the ‘Guard’ flag, your exits
from the set of relays which have
the ‘Exit’ flag etc. etc. Now, as of
a quick count this morning there are
about 1500 guard relays, around 1000 exit
relays, and six relays flagged as ‘bad’ exits.
What does a ‘bad exit’ mean?
waits for audience to respond
That’s not good! That’s exactly
what it means! Yes! laughs
applause
So relays which have been flagged as ‘bad
exits’ your client will never choose to exit
traffic through. And examples of things
which may get a relay flagged as a bad
exit – if they’re fiddling with
the traffic which is coming out of
the Tor relay. Or doing things like
man-in-the-middle attacks against
SSL traffic. We’ve seen various things,
there have been relays man-in-the-middling
SSL traffic, and very, very recently there
was an exit relay which was patching
binaries that you downloaded from the
internet, inserting malware into the binaries.
So you can do these things but the Tor
Project tries to scan for them. And if
these things are detected then they’ll be
flagged as ‘Bad Exits’. It’s true to say
that the scanning mechanism is not 100%
fool-proof by any stretch of the imagination.
It tries to pick up common types
of attacks, so as a result
it won’t pick up unknown attacks or
attacks which haven’t been seen or
have not been known about beforehand.
So looking at this, how do you deanonymise
the traffic travelling through the Tor
network? Given some traffic coming out
of the exit relay, how do you know
which user that corresponds to? What is
their IP address? You can’t actually
modify the traffic because if any of the
relays tried to modify the traffic
which they’re sending through the network
Tor will tear down the circuit through the relay.
So there are these integrity checks at each
of the hops. Because you can’t decrypt
the packet you can’t modify it in any
meaningful way, and because there’s
an integrity check at the next hop, any
modification would be detected. So you
can’t apply this sort of marker, and try
and follow the marker through the network.
So instead, let me
give you two cases. In the worst case
if the attacker controls all three of the
relays that you pick, which is an unlikely
scenario as they’d need to control quite
a big proportion of the network. Then
it should be quite obvious that they can
work out who you are and also
see what you’re doing because in that
case they can tag the traffic, and
they can just discard these integrity
checks at each of the following hops.
Now in a different case, if you control
the Guard relay and the exit relay
but not the middle relay the Guard relay
can’t tamper with the traffic because
this middle relay will close down the
circuit as soon as it happens.
The exit relay can’t send stuff back down
the circuit to try and identify the user,
either. Because again, the circuit will be
closed down. So what can you do?
Well, you can count the number of packets
going through the Guard node. And you can
measure the timing differences between
packets, and try and spot that pattern
at the Exit relays. You’re looking at counts of
packets and the timing between those
packets which are being sent, and
essentially trying to correlate them all.
So if a user happens to pick you as
their Guard node, and then happens to pick
your exit relay, then you can deanonymise
them with very high probability using
this technique. You’re just correlating
the timings of packets and counting
the number of packets going through.
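That correlation step can be shown with a toy example. The per-window packet counts here are invented; the attacker simply compares what the guard sees against what the exit sees.

```python
def pearson(xs, ys):
    # Plain Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Packets per 100 ms window as observed at a guard relay (made up).
guard_counts = [3, 0, 7, 2, 9, 1, 4, 0, 6, 2]
# The same stream seen at the exit, slightly perturbed by jitter.
exit_counts = [3, 1, 6, 2, 9, 1, 5, 0, 6, 2]
# An unrelated user's stream, for comparison.
unrelated = [5, 5, 4, 6, 5, 4, 6, 5, 5, 4]

print(pearson(guard_counts, exit_counts))  # close to 1: same circuit
print(pearson(guard_counts, unrelated))    # near 0: different user
```

Real attacks in the literature are considerably more sophisticated than a raw Pearson coefficient, but the principle is the same: match the traffic pattern at both ends.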
And the attacks demonstrated in the literature
are very reliable for this. We heard
earlier from the Tor talk about the “relay
early” tag which was the attack discovered
by the CERT researchers in the US.
That attack didn’t rely on timing attacks.
Instead, what they were able to do was
send a special type of cell containing
the data back down the circuit,
essentially marking this data, and saying:
“This is the data we’re seeing
at the Exit relay, or at the hidden
service", and encode into the messages
travelling back down the circuit, what the
data was. And then you could pick
those up at the Guard relay and say, okay,
it’s this person that’s doing that.
In fact, although this technique works,
and yeah it was a very nice attack,
the traffic correlation attacks are
actually just as powerful.
So although this bug has been fixed traffic
correlation attacks still work and are
still fairly, fairly reliable. So the problem
still does exist. This is very much
an open question. How do we solve this
problem? We don’t know, currently,
how to solve this problem of trying
to tackle the traffic correlation.
There are a couple of solutions,
but they’re not particularly reliable. Let me
just go through these, and I’ll skip back
on the few things I’ve missed. The first
thing is, high-latency networks, so
networks where packets are delayed
in their transit through the network.
That throws away a lot of the timing
information. So they promise
to potentially solve this problem.
But of course, if you want to visit
Google’s home page, and you have to wait
five minutes for it, you’re simply
just not going to use Tor. The whole point
is trying to make this technology usable.
And if you’ve got something which is very,
very slow then it doesn’t make it
attractive to use. But of course,
this case does work slightly better
for e-mail. If you think about it with
e-mail, you don’t mind if your e-mail
– well, you may not mind, you may mind –
you don’t mind if your e-mail is delayed
by some period of time. Which makes this
somewhat difficult. And as Roger said
earlier, you can also introduce padding
into the circuit, so these are dummy cells.
But, but… with a big caveat: some of the
research suggests that actually you’d
need to introduce quite a lot of padding
to defeat these attacks, and that would
overload the Tor network in its current
state. So, again, not a particularly
practical solution.
How does Tor try to solve this problem?
Well, Tor makes it very difficult
to become a user’s Guard relay. If you
can’t become a user’s Guard relay
then you don’t know who the user is, quite
simply. And so by making it very hard
to become the Guard relay, you therefore
can’t do this traffic correlation attack.
So at the moment the Tor client chooses
one Guard relay and keeps it for a period
of time. So if I want to sort of target
just one of you I would need to control
the Guard relay that you were using at
that particular point in time. And in fact
I’d also need to know what that Guard
relay is. So by making it very unlikely
that you would select a particular malicious
Guard relay – given that the number of malicious
Guard relays is very small – that’s how Tor
tries to solve this problem. And
at the moment your Guard relay is your
barrier of security. If the attacker can’t
control the Guard relay then they won’t
know who you are. That doesn’t mean
they can’t try other sort of side channel
attacks by messing with the traffic
at the Exit relay etc. You know that you
may sort of e.g. download dodgy documents
and open one on your computer, and those
sort of things. Now the alternative
of course to having a Guard relay
and keeping it for a very long time
will be to have a Guard relay and
to change it on a regular basis.
Because you might think, well, just choosing
one Guard relay and sticking with it
is probably a bad idea. But actually,
that’s not the case. If you pick
a Guard relay, and assuming that the
chance of picking a Guard relay that is
malicious is very low, then, when you
first use your Guard relay, if you made
a good choice, your traffic is safe.
If you haven’t made a good choice then
your traffic isn’t safe. Whereas if your
Tor client chooses a new Guard relay
every few minutes, or every hour, or
something along those lines, at some point
you’re gonna pick a malicious Guard relay.
So they’re gonna have some of your traffic
but not all of it. And so currently the
trade-off is that we make it very difficult
for an attacker to control a Guard relay
and the user picks a Guard relay and
keeps it for a long period of time. And
so it’s very difficult for the attackers
to pick that Guard relay when they control
a very small proportion of the network.
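The arithmetic behind that trade-off is simple. Suppose, purely hypothetically, an attacker controls 1% of guard capacity:

```python
p = 0.01  # assumed fraction of guard capacity that is malicious

def ever_compromised(rotations: int) -> float:
    # Chance that at least one of the guards you ever pick is malicious.
    return 1 - (1 - p) ** rotations

print(ever_compromised(1))    # stick with one guard: 1% risk
print(ever_compromised(365))  # rotate daily for a year: ~97% risk
```

Sticking with one guard means a small chance of total compromise; rotating means near-certainty that some of your traffic is eventually observed.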
So this, currently, provides those
properties I described earlier, the privacy
and the anonymity when you’re browsing the
web, when you’re accessing websites etc.
But still you know who the website is. So
although you’re anonymous and the website
doesn’t know who you are you know who the
website is. And there may be some cases
where e.g. the website would also wish to
remain anonymous. You want the person
accessing the website and the website
itself to be anonymous to each other.
And you could think about people e.g.
being in countries where running
a political blog e.g. might be a dangerous
activity. If you run that on a regular
webserver you’re easily identified, whereas
if you’ve got some way where you as
the webserver can be anonymous then
that allows you to do that activity without
being targeted by your government. So
this is what hidden services try to solve.
Now when you first think about a problem
you kind of think: “Hang on a second,
the user doesn’t know who the website
is and the website doesn’t know
who the user is. So how on earth do they
talk to each other?” Well, that’s essentially
what the Tor hidden service protocol tries
to sort of set up. How do you identify and
connect to each other. So at the moment
this is what happens: We’ve got Bob
on the [right] hand side who is the hidden
service. And we got Alice on the left hand
side here who is the user who wishes to
visit the hidden service. Now when Bob
sets up his hidden service he picks three
nodes in the Tor network as introduction
points and builds multi-hop circuits to
them. So the introduction points don’t know
who Bob is. Bob has circuits to them. And
Bob says to each of these introduction points
“Will you relay traffic to me if someone
connects to you asking for me?”
And then those introduction points
do that. So then, once Bob has picked
his introduction points he publishes
a descriptor describing the list of his
introduction points for someone who wishes
to come onto his website. And then Alice
on the left hand side wishing to visit Bob
will pick a rendezvous point in the network
and build a circuit to it. So this “RP”
here is the rendezvous point.
And she will relay a message via one of
the introduction points saying to Bob:
“Meet me at the rendezvous point”.
And then Bob will build a 3-hop-circuit
to the rendezvous point. So now at this
stage we got Alice with a multi-hop circuit
to the rendezvous point, and Bob with
a multi-hop circuit to the rendezvous point.
Alice and Bob haven’t connected to one
another directly. The rendezvous point
doesn’t know who Bob is, the rendezvous
point doesn’t know who Alice is.
All they’re doing is forwarding the
traffic. And they can’t inspect the traffic,
either, because the traffic itself
is encrypted.
So that’s currently how you solve this
problem with trying to communicate
with someone who you don’t know
who they are and vice versa.
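The message flow above can be sketched as a toy walk-through. The step descriptions and names here are illustrative, not real Tor cell types.

```python
# A toy walk-through of the rendezvous protocol.
log = []

def send(path, msg):
    log.append(f"{path}: {msg}")

# 1. Bob builds multi-hop circuits to his chosen introduction points,
#    so the introduction points never learn who Bob is.
send("Bob -> IntroPoint", "relay traffic to me if someone asks for me")

# 2. Bob publishes a descriptor listing his introduction points.
directory = {"example.onion": ["IntroPoint"]}

# 3. Alice picks a rendezvous point and builds a circuit to it.
send("Alice -> RP", "wait here; rendezvous cookie = XYZ")

# 4. Alice relays a message via an intro point: meet me at the RP.
intro_points = directory["example.onion"]
send(f"Alice -> {intro_points[0]} -> Bob", "meet me at RP, cookie = XYZ")

# 5. Bob builds his own 3-hop circuit to the rendezvous point.
send("Bob -> RP", "joining with cookie = XYZ")

# 6. The RP splices the two circuits together: neither end learns the
#    other's IP address, and the relayed traffic is encrypted end to end.
for line in log:
    print(line)
```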
drinks from the bottle
The principle thing I’m going to talk
about today is this database.
So I said, Bob, when he picks his
introduction points he builds this thing
called a descriptor, describing who his
introduction points are, and he publishes
them to a database. This database itself
is distributed throughout the Tor network.
It’s not a single server. So both, Bob and
Alice need to be able to publish information
to this database, and also retrieve
information from this database. And Tor
currently uses something called
a distributed hash table. I’m gonna
give an example of what this means and
how it works. And then I’ll talk to you
specifically about how the Tor Distributed Hash
Table itself works. So let’s say e.g.
you've got a set of servers. So here we've
got 26 servers and you’d like to store
your files across these different servers
without having a single server responsible
for deciding, “okay, that file is stored
on that server, and this file is stored
on that server” etc. etc. Now here is my
list of files. You could take a very naive
approach. And you could say: “Okay, I’ve
got 26 servers, and all of these file names
start with a letter of the alphabet.”
And I could say: “All of the files that begin
with A are gonna go on server A; all of
the files that begin with B are gonna go
on server B etc.” And then when you want
to retrieve a file you say: “Okay, what
does my file name begin with?” And then
you know which server it’s stored on.
Now of course you could have a lot of
servers – sorry – a lot of files
which begin with a Z, an X or a Y etc. in
which case you’re gonna overload
that server. You’re gonna have more files
stored on one server than on another server
in your set. And if you have a lot of big
files, say e.g. beginning with B then
rather than distributing your files across
all the servers you’re gonna just be
overloading one or two of them. So to
solve this problem what we tend to do is:
we take the file name, and we run it
through a cryptographic hash function.
A hash function produces output which
looks random: a very small change
in the input to a cryptographic hash
function produces a very large change
in the output. And this change looks
random. So if I take all of my file names
here, and assuming I have a lot more,
I take a hash of them, and then I use
that hash to determine which server to
store the file on. Then, with high probability
my files will be distributed evenly across
all of the servers. And then when I want
to go and retrieve one of the files I take
my file name, I run it through the
cryptographic hash function, that gives me
the hash, and then I use that hash
to identify which server that particular
file is stored on. And then I go and
retrieve it. So that’s the sort of a loose
idea of how a distributed hash table works.
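A minimal sketch of that idea, assuming 26 servers named A to Z (the file names are invented):

```python
import hashlib

servers = [chr(ord("A") + i) for i in range(26)]  # servers A..Z

def server_for(filename: str) -> str:
    # Hash the file name; use the digest to pick a server uniformly.
    digest = hashlib.sha256(filename.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

# Files starting with the same letter now spread across the servers,
# instead of all landing on server Z as in the naive scheme.
for f in ["zebra.txt", "zoo.txt", "zen.txt", "apple.txt"]:
    print(f, "->", server_for(f))
```

Because the hash output looks random, the load is balanced with high probability no matter how the file names are distributed.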
There are a couple of problems with this.
What if the number of servers you’ve got
changes in size, as it does in the Tor network?
That’s a very brief overview of the theory.
So how does it apply for the Tor network?
Well, the Tor network has a set of relays
and it has a set of hidden services.
Now we take all of the relays, and they
have a hash identity which identifies them.
And we map them onto a circle using that
hash value as an identifier. So you can
imagine the hash value ranging from Zero
to a very large number. We got a Zero point
at the very top there. And that runs all
the way round to the very large number.
So given the identity hash for a relay we
can map that to a particular point on
the circle. And then all we have to do
is also do this for hidden services.
So there’s a hidden service address,
something.onion, so this is
one of the hidden websites that you might
visit. You take the – I’m not gonna describe
in too much detail how this is done but –
the value is derived in such a way that
it’s evenly distributed about the circle.
So your hidden service will have
a particular point on the circle. And the
relays will also be mapped onto this circle.
So there’s the relays. And the hidden
service. And in the case of Tor
the hidden service actually maps to two
positions on the circle, and it publishes
its descriptor to the three relays to the
right at one position, and the three relays
to the right at another position. So there
are actually in total six places where
this descriptor is published on the
circle. And then if I want to go and
fetch and connect to a hidden service
I go and pull this hidden service descriptor
down to identify what its introduction
points are. I take the hidden service
address, I find out where it is on the
circle, I map all of the relays onto
the circle, and then I identify which
relays on the circle are responsible
for that particular hidden service. And
I just connect, then I say: “Do you have
a copy of the descriptor for that
particular hidden service?”
And if so then we’ve got our list of
introduction points. And we can go
to the next steps to connect to our hidden
service. So I’m gonna explain how we
sort of set up our experiments. What we
thought, or what we were interested in doing,
was to collect publications of hidden
services. So every time a hidden service
gets set up it publishes to this distributed
hash table. What we wanted to do was
collect those publications so that we
get a complete list of all of the hidden
services. And what we also wanted to do
is to find out how many times a particular
hidden service is requested.
Just one more point that
will become important later.
The position which the hidden service
appears on the circle changes
every 24 hours. So there’s not
a fixed position every single day.
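A toy version of that lookup follows. This is not Tor’s exact descriptor-ID formula, just the shape of it: the position on the circle depends on the onion address, a replica number, and the current day, and the relays clockwise from that position are responsible.

```python
import hashlib
from bisect import bisect_right

def h(data: str) -> int:
    return int.from_bytes(hashlib.sha1(data.encode()).digest(), "big")

# Relays mapped onto the circle by their identity hash (made-up names).
relays = sorted((h(f"relay-{i}"), f"relay-{i}") for i in range(40))
points = [p for p, _ in relays]

def responsible_relays(onion: str, day: str, replica: int, k: int = 3):
    # Simplified stand-in for the descriptor-ID derivation: the
    # position moves every day because the date is hashed in.
    pos = h(f"{onion}|{replica}|{day}")
    i = bisect_right(points, pos)
    # The k relays "to the right" on the circle, wrapping around.
    return [relays[(i + j) % len(relays)][1] for j in range(k)]

today = responsible_relays("abcdef.onion", "2014-12-30", replica=0)
tomorrow = responsible_relays("abcdef.onion", "2014-12-31", replica=0)
print(today)
print(tomorrow)  # usually a different set of relays
```

In Tor there are two replicas, so the descriptor lands at two positions with three relays each: the six places mentioned above.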
If we run 40 nodes over a long period of
time we will occupy positions within
that distributed hash table. And we will be
able to collect publications and requests
for hidden services that are located at
that position inside the distributed
hash table. So in that case we ran 40 Tor
nodes, we had a student at university
who said: “Hey, I run a hosting company,
I got loads of server capacity”, and
we told him what we were doing, and he
said: “Well, you really helped us out,
these last couple of years…”
and just gave us loads of server capacity
to allow us to do this. So we spun up 40
Tor nodes. Each Tor node was required
to advertise a certain amount of bandwidth
to become a part of that distributed
hash table. It’s actually a very small
amount, so this didn’t matter too much.
And then, after 25 hours – this has changed
recently in the last few days; it’s just been
increased as a result of one of the
attacks last week, but certainly
during our study it was 25 hours – you then
appear at a particular point inside that
distributed hash table. And you’re then
in a position to record publications of
hidden services and requests for hidden
services. So not only can you get a full
list of the onion addresses you can also
find out how many times each of the
onion addresses is requested.
And so this is what we recorded. And then,
once we had a full list of… or once
we had run for a long period of time to
collect a long list of .onion addresses
we then built a custom crawler that would
visit each of the Tor hidden services
in turn, and pull down the HTML contents,
the text content from the web page,
so that we could go ahead and classify
the content. Now it’s really important
to note here, and it will become obvious
why a little bit later, we only pulled down
HTML content. We didn’t pull down images.
And there’s a very, very important reason
for that which will become clear shortly.
We had a lot of questions when we
first started this. No one really knew
how many hidden services there were. It had
been suggested to us there was a very high
turn-over of hidden services. We wanted to
confirm whether that was true or not.
And we also wanted to find out:
what are the hidden services,
how popular are they, etc. So
our estimate for how many hidden services
there are, over the period which we
ran our study, this is a graph plotting
our estimate for each of the individual
days as to how many hidden services
there were on that particular day. Now the
data is naturally noisy because we’re only
a very small proportion of that circle.
So we’re only observing a very small
proportion of the total publications and
requests every single day, for each of
those hidden services. And if you
take a long term average for this
there’s about 45,000 hidden services that
we think were present, on average,
each day, during our entire study. Which
is a large number of hidden services.
But over the entire length we
collected about 80,000, in total.
Some came and went etc.
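To give a feel for how such an estimate can be formed from partial observations of the circle, here is one plausible scaling calculation; the coverage fraction and the daily count below are invented, not the study’s actual figures.

```python
# If our relays cover a fraction f of the DHT circle, a hidden service
# publishing to 6 positions is seen with probability roughly
# 1 - (1 - f)**6, so an observed daily count can be scaled up.
f = 0.004          # fraction of the circle we occupy (assumed)
seen_today = 1060  # distinct onion addresses observed today (assumed)

p_seen = 1 - (1 - f) ** 6  # chance we catch at least one of 6 replicas
estimate = seen_today / p_seen
print(round(p_seen, 4), round(estimate))
```

Scaling a noisy daily observation like this is why the day-to-day figures jump around while the long-term average settles down.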
So the next question after how many
hidden services there are is how long
the hidden service exists for.
Does it exist for a very long period
of time, does it exist for a very short
period of time etc. etc.
So what we did was, for every single
.onion address we plotted how many times
we saw a publication for that particular
hidden service during the six months.
How many times did we see it.
If we saw it a lot of times that suggested
in general the hidden service existed
for a very long period of time. If we saw
a very small number of publications
for each hidden service then that
suggests that they were only present
for a very short period of time. This is
our graph. By far the largest number
of hidden services we only saw once during
the entire study. And we never saw them
again. That suggests that there’s a very high
turnover of the hidden services; they
don’t tend, on average, to exist for
a very long period of time.
And then you can see this sort of
tail here. If we plot just those
hidden services which existed for a long
time, so e.g. we could take hidden services
which have a high number of hit requests
and say: “Okay, those that have a high number
of hits probably existed for a long time.”
That’s not absolutely certain, but probably.
Then you see this sort of normal-looking plot
around 4 or 5, so we saw on average
most hidden services four or five times
during the entire six months if they were
popular and we’re using that as a proxy
measure for whether they existed
for the entire time. Now, this study was
over 160 days, so almost six months.
What we also wanted to do was to try
and confirm this over a longer period.
So last year, in 2013, about February time,
some researchers at the University
of Luxembourg also ran a similar study,
but it ran over a very short period of time,
a single day. But they did it in such
a way that it could collect descriptors
across much of the circle during a single
day. That was because of a bug in the way
Tor did some of the things, which has
now been fixed, so we can’t repeat that
approach. So we got a list of
.onion addresses from February 2013
from these researchers at the University
of Luxemburg. And then we got our list
of .onion addresses from this six months
which was March to September of this year.
And we wanted to say, okay, we’re given
these two sets of .onion addresses.
Which .onion addresses existed in their set
but not ours and vice versa, and which
.onion addresses existed in both sets?
So as you can see a very small minority
of hidden service addresses existed
in both sets. This is over an 18 month
period between these two collection points.
A very small number of services existed
in both their data set and in
our data set. Which again suggested
there’s a very high turnover of hidden
services that don’t tend to exist
for a very long period of time.
So the question is why is that?
Which we’ll come on to a little bit later.
It’s a very valid question; we can’t answer
it 100%, but we have some ideas as to
why that may be the case. So in terms
of popularity which hidden services
did we see, or which .onion addresses
did we see requested the most?
Which got the most hits? Or the
most directory requests.
So botnet Command & Control servers
– if you’re not familiar with what
a botnet is, the idea is to infect lots of
people with a piece of malware.
And this malware phones home to
a Command & Control server where
the botnet master can give instructions
to each of the bots to do things.
So it might be e.g. to collect passwords,
key strokes, banking details.
Or it might be to do things like
Distributed Denial of Service attacks,
or to send spam, those sorts of things.
And a couple of years ago someone gave
a talk and said: “Well, the problem with
running a botnet is your C&C servers
are vulnerable.” Once a C&C server is taken
down you no longer have control over
your botnet. So it’s been a sort of arms
race against anti-virus companies and
against malware authors to try and come up
with techniques to run C&C servers in a way
which they can’t be taken down. And
a couple of years ago someone gave a talk
at a conference that said: “You know what?
It would be a really good idea if botnet
C&C servers were run as Tor hidden
services because then no one knows
where they are, and in theory they can’t
be taken down.” And in fact we see
loads and loads and loads of
these addresses associated with several
different botnets, like ‘Sefnit’ and ‘Skynet’.
Now Skynet is the one I wanted to talk
to you about because the guy that runs
Skynet had a twitter account, and he also
did a Reddit AMA. If you’ve not heard
of a Reddit AMA before, that’s a Reddit
ask-me-anything. You can go on the website
and ask the guy anything. So this guy
wasn’t hiding in the shadows. He’d say:
“Hey, I’m running this massive botnet,
here’s my Twitter account which I update
regularly, here is my Reddit AMA where
you can ask me questions!” etc.
He was arrested last year, which is not,
perhaps, a huge surprise.
laughter and applause
But… so he was arrested,
his C&C servers disappeared
but there were still infected hosts trying
to connect with the C&C servers and
request access to the C&C server.
This is why we’re seeing a large number
of hits. So all of these requests are
failed requests, i.e. we didn’t have
a descriptor for them because
the hidden service had gone away but
there were still clients requesting each
of the hidden services.
And the next thing we wanted to do was
to try and categorize sites. So, as I said
earlier, we crawled all of the hidden
services that we could, and we classified
them into different categories based
on what the type of content was
on the hidden service side. The first
graph I have is the number of sites
in each of the categories. So you can see
down the bottom here we got lots of
different categories. We got drugs, market
places, etc. on the bottom. And the graph
shows the percentage of the hidden
services that we crawled that fit in
to each of these categories. So, e.g., looking
at this: the largest number of sites
we crawled were
drugs-focused websites, followed by
market places, etc. There are a couple of
questions you might have here
about which categories stick out. What
does ‘porn’ mean? Well, you know
what ‘porn’ means. There are some very
notorious porn sites on the Tor Darknet.
There was one in particular which was
focused on revenge porn. It turns out
that youngsters wish to take pictures
of themselves and send them to their
boyfriends or their girlfriends. And
when they get dumped they publish them
on these websites. So there were several
of these sites on the main internet
which have mostly been shut down.
And some of these sites were archived
on the Darknet. The second one you
should probably wonder about
is ‘abuse’. Every single
site we classified in this category
was a child abuse site. They were in
some way facilitating child abuse.
And how do we know that? Well, the data
that came back from the crawler
made it completely unambiguous what
the content on these sites was. It was
completely obvious, from the content
the crawler returned, what was on these sites.
And this is the principal reason why we
didn’t pull down images from sites.
In many countries it
would be a criminal offense to do so.
So our crawler only pulled down text
content from all of these sites, and that
enabled us to classify them, based on
that. We didn’t pull down any images.
So of course the next thing we’d like to do
is to say: “Okay, well, given each of these
categories, what proportion of directory
requests went to each of the categories?”
Now the next graph is going to need some
explaining as to precisely what it
means, and I’m gonna give that. This is
the proportion of directory requests
which we saw that went to each of the
categories of hidden service that we
classified. As you can see, in fact, we
saw a very large number going to these
abuse sites. And the rest sort of
distributed right there, at the bottom.
And the question is: “What is it
we’re collecting here?”
We’re collecting successful hidden service
directory requests. What does a hidden
service directory request mean?
It probably loosely correlates with
either a visit or a visitor. So somewhere
in between those two. Because when you
want to visit a hidden service you make
a request for the hidden service descriptor
and that allows you to connect to it
and browse through the web site.
But there are cases where, e.g. if you
restart Tor, you’ll go back and you
re-fetch the descriptor. So in that case
we’ll count twice, for example.
What proportion of these are people,
and which proportion of them are
something else? The answer to that is
we just simply don’t know.
We've got directory requests but that doesn’t
tell us about what they’re doing on these
sites, what they’re fetching, or who
indeed they are, or what it is they are.
So these could be automated requests,
they could be human beings. We can’t
distinguish between those two things.
What are the limitations?
A hidden service directory request
correlates exactly with neither a visit nor a visitor.
It’s probably somewhere in between.
So you can’t say whether it’s exactly one
or the other. We cannot say whether
a hidden service directory request
is a person or something automated.
We can’t distinguish between those two.
Any type of site could be targeted by e.g.
DoS attacks, by web crawlers which would
greatly inflate the figures. If you were
to do a DoS attack it’s likely you’d only
request a small number of descriptors.
You’d actually be flooding the site itself
rather than the directories. But, in
theory, you could flood the directories.
But we didn’t see any sort of shutdown
of our directories from flooding, for example.
Whilst we can’t rule that out, it doesn’t
seem to fit too well with what we’ve got.
The other question is ‘crawlers’.
I obviously talked with the Tor Project
about these results and they’ve suggested
that there are groups, e.g. child
protection agencies, that will crawl
these sites on a regular basis. And,
again, that doesn’t necessarily correlate
with a human being. And that could
inflate the figures. How many hidden service
directory requests would there be
if a crawler was pointed at a site? Typically,
if I crawl them on a single day, one request.
But if they got a large number of servers
doing the crawling then it could be
a request per day for every single server.
So, again, I can’t give you, definitive,
“yes, this is human beings” or
“yes, this is automated requests”.
The other important point is, these two
content graphs are only hidden services
offering web content. There are hidden
services that do things, e.g. IRC,
the instant messaging etc. Those aren’t
included in these figures. We’re only
concentrating on hidden services offering
web sites. They’re HTTP services, or HTTPS
services. Because that allows us to easily
classify them. And, in fact, for some of
the other types, like IRC and Jabber, the
results would probably not be directly comparable
with web sites. The use
case for using them is probably
slightly different. So I appreciate the
last graph is somewhat alarming.
If you have any questions please ask
either me or the Tor developers
as to how to interpret these results. It’s
not quite as straight-forward as it may
look when you look at the graph. You
might look at the graph and say: “Hey,
that looks like there’s lots of people
visiting these sites”. It’s difficult
to conclude that from the results.
The next slide is gonna be very
contentious. I will prefix it with:
“I’m not advocating -any- kind of
action whatsoever. I’m just trying
to describe technically as to what could
be done. It’s not up to me to make decisions
on these types of things.” So, of course,
when we found this out, frankly, I think
we were stunned. I mean, it took us
several days to take in; it just stunned us,
“what the hell, this is not
what we expected at all.”
So a natural step is, well, we think, most
of us think that Tor is a great thing,
it seems. Could this problem be sorted out
while still keeping Tor as it is?
And probably the next step to say: “Well,
okay, could we just block this class
of content and not other types of content?”
So could we block just hidden services
that are associated with these sites and
not other types of hidden services?
We thought there are three ways in which
we could block hidden services.
And I’ll talk about whether these will
be possible in the coming months,
after explaining them. But during our
study these would have been possible
and presently they are possible.
A single individual could shut down
a single hidden service by controlling
all of the relays which are responsible
for receiving a publication request
on that distributed hash table. It’s
possible to place one of your relays
at a particular position on that circle
and so therefore make yourself be
the responsible relay for
a particular hidden service.
And if you control all of the six relays
which are responsible for a hidden service,
when someone comes to you and says:
“Can I have a descriptor for that site”
you can just say: “No, I haven’t got it”.
And provided you control those relays
users won’t be able to fetch those sites.
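For reference, the six responsible relays come from the v2 hidden service scheme: two descriptor IDs (one per replica) are derived by hashing, and each maps to the three HSDirs that follow it on the circle. A rough sketch of that computation, simplified from my reading of the rend-spec (no descriptor cookie; the fingerprints are hypothetical):

```python
import bisect
import hashlib
import struct

def descriptor_id(permanent_id: bytes, time_period: int, replica: int) -> bytes:
    """Simplified v2 descriptor ID (no descriptor cookie):
    H(permanent-id | H(time-period | replica))."""
    secret_id_part = hashlib.sha1(struct.pack(">IB", time_period, replica)).digest()
    return hashlib.sha1(permanent_id + secret_id_part).digest()

def responsible_hsdirs(desc_id: bytes, fingerprints: list) -> list:
    """The three HSDir fingerprints following desc_id clockwise on the
    circle; with two replicas a service gets six responsible relays."""
    ring = sorted(fingerprints)
    start = bisect.bisect_right(ring, desc_id)
    return [ring[(start + k) % len(ring)] for k in range(3)]
```

An attacker who occupies all six of those positions can answer “no such descriptor” to every fetch, which is the blocking approach just described.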
The second option is at the relay operator
level – the Tor Project blocking these
is something I’ll talk about in a second.
Could I
as a relay operator say: “Okay,
I don’t want to carry
this type of content, and I don’t want to
be responsible for serving up this type
of content.” A relay operator could patch
his relay and say: “You know what,
if anyone comes to this relay requesting
any one of these sites then, again, just
refuse to do it”. The problem is a lot of
relay operators need to do it. So a very,
very large number of the potential relay
operators would need to do that
to effectively block these sites. The
final option is the Tor Project could
modify the Tor program and actually embed
these addresses in the Tor program itself
so that all relays by default
block hidden service directory requests
for these sites, and also clients themselves
would say: “Okay, if anyone’s requesting
these, block them at the client level.”
Now I hasten to add: I’m not advocating
any kind of action; that is entirely up to
other people because, frankly, I think
if I advocated blocking hidden services
I probably wouldn’t make it out alive,
so I’m just saying: this is a description
of what technical measures could be used
to block some classes of sites. And of
course there’s lots of questions here.
If e.g. the Tor Project themselves decided:
“Okay, we’re gonna block these sites”
that means they are essentially
in control of the block list.
The block list would be somewhat public,
so everyone would be able to inspect
what the sites are that are being blocked,
but they would be in control of some kind
of block list. Which, you know, arguably
is against what the Tor Project is after.
takes a sip, coughs
So how about deanonymising visitors
to hidden service web sites?
So in this case we got a user on the
left-hand side who is connected to
a Guard node. We’ve got a hidden service
on the right-hand side who is connected
to a Guard node and on the top we got
one of those directory servers which is
responsible for serving up those
hidden service directory requests.
Now, when you first want to connect to
a hidden service you connect through
your Guard node and through a couple of hops
up to the hidden service directory and
you request the descriptor off of them.
So at this point if you are the attacker
and you control one of the hidden service
directory nodes for a particular site
you can send back down the circuit
a particular pattern of traffic.
And if you control that user’s
Guard node – which is a big if –
then you can spot that pattern of traffic
at the Guard node. The question is:
“How do you control a particular user’s
Guard node?” That’s very, very hard.
But if e.g. I run a hidden service and all
of you visit my hidden service, and
I’m running a couple of dodgy Guard relays
then the probability is that some of you,
certainly not all of you by any stretch will
select my dodgy Guard relay, and
I could deanonymise you, but I couldn’t
deanonymise the rest of you.
So what we’re saying here is that
you can deanonymise some of the users
some of the time but you can’t pick which
users those are which you’re going to
deanonymise. You can’t deanonymise someone
specific but you can deanonymise a fraction
based on what fraction of the network you
control in terms of Guard capacity.
How about… so the attacker controls those
two. Here’s a picture from research at
the University of Luxembourg which
did this. And these are plots of
taking the user’s IP address visiting
a C&C server, and then geolocating it
and putting it on a map. So “where was the
user located when they called one of
the Tor hidden services?” So, again,
this is a selection, a percentage
of the users visiting C&C servers
using this technique.
How about deanonymising hidden services
themselves? Well, again, you got a problem.
You’re the user. You’re gonna connect
through your Guard into the Tor network.
And then, eventually, through the hidden
service’s Guard node, and talk to
the hidden service. As the attacker you
need to control the hidden service’s
Guard node to do these traffic correlation
attacks. So again, it’s very difficult
to deanonymise a specific Tor hidden
service. But if you think about, okay,
there are 1,000 Tor hidden services, if you
can control a percentage of the Guard nodes
then some hidden services will pick you
and then you’ll be able to deanonymise those.
So provided you don’t care which hidden
services you’re gonna deanonymise
then it becomes much more straight-forward
to control the Guard nodes of some hidden
services but you can’t pick exactly
what those are.
So what sort of data can you see
traversing a relay?
This is a modified Tor client which just
dumps cells which are coming…
essentially packets travelling down
a circuit, and the information you can
extract from them at a Guard node.
And this is done off the main Tor network.
So I’ve got a client connected to
a “malicious” Guard relay
and it logs every single packet – they’re
called ‘cells’ in the Tor protocol –
coming through the Guard relay. We can’t
decrypt the packet because it’s encrypted
three times. What we can record,
though, is the IP address of the user,
the IP address of the next hop,
and we can count packets travelling
in each direction down the circuit. And we
can also record the time at which those
packets were sent. So of course, if you’re
doing the traffic correlation attacks
you’re using that timing information
to try and work out whether you’re seeing
traffic which you’ve sent and which
identifies a particular user or not.
Or indeed traffic which they’ve sent
which you’ve seen at a different point
in the network.
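The kind of metadata such a logging Guard records, and the timing correlation it enables, can be illustrated with a toy sketch. This is not Tor code: the one-second bins and the cosine-similarity measure are my own simplifications of the correlation statistics used in the literature.

```python
from collections import Counter

def bin_cells(timestamps, bin_width=1.0):
    """Bucket cell arrival times into fixed-width bins -> cells per bin."""
    return Counter(int(t / bin_width) for t in timestamps)

def timing_similarity(ts_a, ts_b, bin_width=1.0):
    """Cosine similarity of two streams' per-bin cell counts, as seen
    at two observation points. Values near 1.0 suggest both points
    saw the same traffic; a toy stand-in for real correlation tests."""
    a, b = bin_cells(ts_a, bin_width), bin_cells(ts_b, bin_width)
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```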
Moving on to my…
…interesting problems,
research questions etc.
Based on what I’ve said, I’ve said there’s
these directory authorities which are
controlled by the core Tor members. If
e.g. they were malicious then they could
manipulate the Tor… – if a big enough
chunk of them are malicious then
they can manipulate the consensus
to direct you to particular nodes.
I don’t think that’s the case, and I don’t
think anyone thinks that’s the case.
And Tor is designed in a way to tr…
I mean that you’d have to control
a certain number of the authorities
to be able to do anything important.
So the Tor people… I said this
to them a couple of days ago.
I find it quite funny that you’d design
your system as if you don’t trust
each other. To which their response was:
“No, we design our system so that
we don’t have to trust each other.” Which
I think is a very good model to have,
when you have this type of system.
So could we eliminate these sorts of
centralized servers? I think that’s
actually a very hard problem.
There are lots of attacks which could
potentially be deployed against
a decentralized network. At the moment the
Tor network is relatively well understood
in terms of what types of attack it
is vulnerable to. So if we were to move
to a new architecture then we may open it
to a whole new class of attacks.
The Tor network has existed
for quite some time and it’s been
very well studied. What about global
adversaries like the NSA, who can
monitor network links all across the
world? It’s very difficult to defend
against that. If they can identify
which Guard relay
you’re using, they can monitor traffic
going into and out of the Guard relay,
and log each of the subsequent hops
along. It’s very, very difficult to defend against
these types of things. Do we know if
they’re doing it? The documents that were
released yesterday – I’ve only had a very
brief look through them, but they suggest
that they’re not presently doing it and
they haven’t had much success.
I don’t know why, there are very powerful
attacks described in the academic literature
which are very, very reliable and most
academic literature you can access for free
so it’s not even as if they have to figure
out how to do it. They just have to read
the academic literature and try and
implement some of these attacks.
I don’t know why they’re not. The
next question is how to detect malicious
relays. So in our case we were running
40 relays. Our relays were on consecutive
IP addresses – well, most of them were,
in two blocks. So they were
running on IP addresses numbered
e.g. 1,2,3,4,…
We were running two relays per IP address,
and every single relay had my name
plastered across it.
So after I set up these 40 relays in
a relatively short period of time
I expected someone from the Tor Project
to come to me and say: “Hey Gareth, what
are you doing?” – no one noticed,
no one noticed. So this is presently
an open question. The Tor Project
are quite open about this. They
acknowledged that, in fact, last year
we had the CERT researchers launch many
more relays than that. The Tor Project
spotted that large number of relays
but chose not to do anything about it
and, in fact, those relays were deploying an
attack. But, as you know, it’s often very
difficult to defend against unknown
attacks. So at the moment how to detect
malicious relays is a bit of an open
question. Which as I think is being
discussed on the mailing list.
The other one is defending against unknown
tampering at exits. If you take
the exit relays – an exit relay
can tamper with the traffic.
So we know particular types of attacks
doing SSL man-in-the-middles etc.
We’ve seen recently binary patching.
How do we detect unknown tampering
with traffic, other types of traffic? So
the binary tampering wasn’t spotted
by the Tor Project
themselves; it was spotted by someone else
who notified them. And then the final
one open on here is the Tor code review.
So the Tor code is open source. We know
from OpenSSL that, although everyone
can read source code, people don’t always
look at it. And OpenSSL has been
a huge mess, and there’s been
lots of stuff disclosed about that
over the last few days. There are
lots of eyes on the Tor code but I think
always, more eyes are better. I’d say,
ideally if we can get people to look
at the Tor code and look for
vulnerabilities then… I encourage people
to do that. It’s a very useful thing to
do. There could be unknown vulnerabilities
as we’ve seen with the “relay early” type
quite recently in the Tor code which
could be quite serious. The truth is we
just don’t know until people do thorough
code audits, and even then it’s very
difficult to know for certain.
So my last point, I think, yes,
is advice to future researchers.
So if you ever wanted, or are planning
on doing a study in the future, e.g. on
Tor, do not do what the CERT researchers
did and start deanonymising people on the
live Tor network, doing it in a way
which is incredibly irresponsible. I don’t
think… I mean, I tend, myself, to give them
the benefit of the doubt; I don’t think the
CERT researchers set out to be malicious.
I think they were just very naive
in what it was they were doing.
That was rapidly pointed out to them.
In our case we were running
40 relays. Our Tor relays were forwarding
traffic; they were acting as good relays.
The only thing that we were doing
was logging publication requests
to the directories. Big question whether
that’s malicious or not – I don’t know.
One thing that has been pointed out to me
is that the .onion addresses themselves
could be considered sensitive information,
so the only data we will be retaining
from the study is the aggregated data.
So we won't be retaining information
on individual .onion addresses because
that could potentially be considered
sensitive information. If you think about
someone running an .onion address which
contains something which they don’t want
other people knowing about. So we won’t
be retaining that data, and
we’ll be destroying it.
So I think that brings me now
to starting the questions.
I want to say “Thanks” to a couple of
people. The student who donated
the server to us. Nick Savage who is one
of my colleagues who was a sounding board
during the entire study. Ivan Pustogarov
who is the researcher at the University
of Luxembourg who sent us the large data
set of .onion addresses from last year.
He’s also the chap who has demonstrated
those deanonymisation attacks
that I talked about. A big "Thank you" to
Roger Dingledine who has frankly been…
presented loads of questions to me over
the last couple of days and allowed me
to bounce ideas back and forth.
That has been a very useful process.
If you are doing future research I strongly
encourage you to contact the Tor Project
at the earliest opportunity. You’ll find
them… certainly I found them to be
extremely helpful.
Donncha also did something similar,
so both Ivan and Donncha have done
a similar study in trying to classify the
types of hidden services or work out
how many hits there are to particular
types of hidden service. Ivan Pustogarov
did it on a bigger scale
and found similar results to us.
That is that these abuse sites
featured frequently
in the top requested sites. That was done
over a year ago, and again, he was seeing
similar sorts of pattern. There were these
abuse sites being requested frequently.
So that also sort of corroborates
what we’re saying.
The data I put online is at this address,
there will probably be the slides,
something called ‘The Tor Research
Framework’, which is an implementation
of a Tor client in Java specifically aimed
at researchers. So if e.g. you wanna pull
out data from a consensus you can do.
If you want to build custom routes
through the network you can do.
If you want to build routes through the
network and start sending padding traffic
down them you can do etc.
The code is structured in a way which is
designed to be easily modifiable
for testing lots of these things.
There is also a link to the Tor FBI
exploit which they deployed against
visitors to some Tor hidden services last
year. They exploited a Mozilla Firefox bug
and then ran code on the computers of users
who were visiting these hidden services
in order to identify them.
At this address there is a link to that
including a copy of the shell code and an
analysis of exactly what it was doing.
And then of course a list of references,
with papers and things.
So I’m quite happy to take questions now.
applause
Herald: Thanks for the nice talk!
Do we have any questions
from the internet?
Signal Angel: One question. It’s very hard
to block addresses since creating them
is cheap, and they can be generated
for each user, and rotated often. So
can you think of any other way
for doing the blocking?
Gareth: That is absolutely true, so, yes.
If you were to block a particular .onion
address they can just say: “I’ll get another
.onion address.” So I don’t know of
any way to counter that right now.
Herald: Another one from the internet?
inaudible answer from Signal Angel
Okay, then, Microphone 1, please!
Question: Thank you, that’s fascinating
research. You mentioned that it is
possible to influence the hash of your
relay node in the sense that you could
be choosing which service you are
advertising, or which hidden service
you are responsible for. Is that right?
Gareth: Yeah, correct!
Question: So could you elaborate
on how this is possible?
Gareth: So e.g. you just keep regenerating
the public key for your relay;
you’ll get closer and closer to the point
where you’ll be the responsible relay
for that particular hidden service. That’s
just it – you keep regenerating your identity
hash until you’re at that particular point
on the circle. That’s not particularly
computationally intensive to do.
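The brute-force search just described can be sketched as follows; here random nonces stand in for freshly generated RSA identity keys, and the target prefix is hypothetical:

```python
import hashlib
import os

def grind_identity(target_prefix: bytes, max_tries=500_000):
    """Keep generating fresh key material (random nonces stand in for
    real RSA identity keys) until its SHA-1 'fingerprint' starts with
    target_prefix, i.e. lands at the desired point on the circle just
    in front of the target's descriptor ID."""
    for _ in range(max_tries):
        key_material = os.urandom(32)   # stands in for a fresh key
        fingerprint = hashlib.sha1(key_material).digest()
        if fingerprint.startswith(target_prefix):
            return key_material, fingerprint
    raise RuntimeError("no matching fingerprint within the budget")
```

A one-byte prefix takes about 256 attempts on average, and each extra byte multiplies the work by 256, which is why landing on a single target position is computationally cheap.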
That was it?
Herald: Okay, next question
from Microphone 5, please.
Question: Hi, I was wondering for the
attacks where you identify a certain number
of users using a hidden service. Have
those attacks been used, or is there
any evidence there, and is there
any way of protecting against that?
Gareth: That’s a very interesting question,
is there any way to detect these types
of attacks? So some of the attacks,
if you’re going to generate particular
traffic patterns, one way to do that is to
use the padding cells. The padding cells
aren’t used at the moment by the official
Tor client. So the detection of those
could be indicative but it doesn’t…
it’s not conclusive evidence on its own.
Question: And is there any way of
protecting against a government
or something trying to denial-of-service
hidden services?
Gareth: So I… trying to… did not…
Question: Is it possible to protect
against this kind of attack?
Gareth: Not that I’m aware of. The Tor
Project are currently revising how they
do the hidden service protocol which will
make e.g. what I did, enumerating
the hidden services, much more difficult.
And to also be in a position on the
distributed hash table in advance
for a particular hidden service.
So they are at the moment trying to change
the way it’s done, and make some of
these things more difficult.
Herald: Good. Next question
from Microphone 2, please.
Mic2: Hi. I handle the Tor2Web abuse requests,
and so I used to see a lot of abuse requests
concerning the Tor hidden services
being exposed on the internet through
the Tor2Web.org domain name. And I just
wanted to comment on, like you said,
the number of abuse requests. I used
to speak with some of the child protection
agencies that reported abuse at
Tor2Web.org, and they are effectively
using crawlers that periodically look for
changes in order to get new images to be
put in the database. And what I was able
to understand is that the German agency
doing that is crawling the same sites that
the Italian agencies are crawling, too.
So it’s likely that in most of the
countries there are the child protection
agencies that are crawling those few
numbers of Tor hidden services that
contain child porn. And I saw it also
a bit from the statistics of Tor2Web
where the amount of abuse relating to
that kind of content, it’s relatively low.
Just as contribution!
Gareth: Yes, that’s very interesting,
thank you for that!
applause
Herald: Next, Microphone 4, please.
Mic4: You then attacked or deanonymised
users with an infected or a modified Guard
relay? Is it required to modify the Guard
relay if I control the entry point
of the user to the internet?
If I’m his ISP?
Gareth: Yes, if you observe traffic
travelling into a Guard relay without
controlling the Guard relay itself.
Mic4: Yeah.
Gareth: In theory, yes. I wouldn’t be able
to tell you how reliable that is
off the top of my head.
Mic4: Thanks!
Herald: So another question
from the internet!
Signal Angel: Wouldn’t the ability to
choose the key hash prefix give
the ability to target specific .onions?
Gareth: So you can only target one .onion
address at a time. Because of the way
they are generated. So you wouldn’t be
able to say e.g. “Pick a key which targeted
two or more .onion addresses.” You can
only target one .onion address at a time
by positioning yourself at a particular
point on the distributed hash table.
Herald: Another one
from the internet? … Okay.
Then Microphone 3, please.
Mic3: Hey. Thanks for this research.
I think it strengthens the network.
So in the deem (?) I was wondering whether
you can donate these relays to be part of
the non-malicious relay pool, basically
use them as regular relays afterwards?
Gareth: Okay, so can I donate the relays
to be rerun as Tor capacity (?)?
Unfortunately, I said they were run by
a student and they were donated for
a fixed period of time. So we’ve given
those back to him. We are very grateful
to him, he was very generous. In fact,
without his contribution donating these
it would have been much more difficult
to collect as much data as we did.
Herald: Good, next, Microphone 5, please!
Mic5: Yeah hi, first of all thanks
for your talk. I think you’ve raised
some real issues that need to be
considered very carefully by everyone
on the Tor Project. My question: I’d like
to go back to the issue with so many
abuse related web sites running over
the Tor Project. I think it’s an important
issue that really needs to be considered
because we don’t wanna be associated
with that at the end of the day.
Anyone who uses Tor, who runs a relay
or an exit node. And I understand it’s
a bit of a sensitive issue, and you don’t
really have any say over whether it’s
implemented or not. But I’d like to get
your opinion on the implementation
of a distributed block-deny system
that would run in very much a similar way
to those of the directory authorities.
I’d just like to see what
you think of that.
Gareth: So you’re asking me whether I want
to support a particular blocking mechanism
then?
Mic5: I’d like to get your opinion on it.
Gareth laughs
I know it’s a sensitive issue but I think,
like I said, I think something…
I think it needs to be considered because
everyone running exit nodes and relays
and people of the Tor Project don’t
want to be known or associated with
these massive amount of abuse web sites
that currently exist within the Tor network.
Gareth: I absolutely agree, and I think
the Tor Project are horrified as well that
this problem exists; they have, in fact,
talked in previous years about
having a problem with this type of
content. As to what, if anything, is
done about it, it’s very much up to them.
Could it be done in a distributed fashion?
So the example I gave was a way which
it could be done by relay operators.
So e.g. that would need the consensus of
a large number of relay operators to be
effective. So that is done in
a distributed fashion. The question is:
who gives the list of .onion addresses to
block to each of the relay operators?
Clearly, the relay operators aren’t going
to compile it themselves. It needs to be
supplied by someone like the Tor Project,
e.g., or someone trustworthy. Yes, it can
be done in a distributed fashion.
It can be done in an open fashion.
Mic5: Who knows?
Gareth: Okay.
Mic5: Thank you.
Herald: Good. And another
question from the internet.
Signal Angel: Apparently there’s an option
in the Tor client to collect statistics
on hidden services. Do you know about
this, and how it relates to your research?
Gareth: Yes, I believe they’re going to
be… the extent to which I know about it
is they’re gonna be trying this next
month, to try and estimate how many
hidden services there are. So keep
your eye on the Tor Project web site,
I’m sure they’ll be publishing
their data in the coming months.
Herald: And, sadly, we are running out of
time, so this will be the last question,
so Microphone 4, please!
Mic4: Hi, I’m just wondering if you could
sort of outline what ethical clearances
you had to get from your university
to conduct this kind of research.
Gareth: So we have to discuss these
types of things before undertaking
any research. And we go through the steps
to make sure that we’re not e.g. storing
sensitive information about particular
people. So yes, we are very mindful
of that. And that’s why I made a
particular point of putting on the slides
as to some of the things to consider.
Mic4: So like… you outlined a potential
implementation of the traffic correlation
attack. Are you saying that
you performed the attack? Or…
Gareth: No, no no, absolutely not.
So the link I’m giving… absolutely not.
We have not engaged in any…
Mic4: It just wasn’t clear
from the slides.
Gareth: I apologize. So it’s absolutely
clear on that. No, we’re not engaging
in any deanonymisation research on the
Tor network. The research I showed
is linked on the references, I think,
which I put at the end of the slides.
You can read about it. But it’s done in
simulation. So e.g. there’s a way
to do simulation of the Tor network on
a single computer. I can’t remember
the name of the project, though.
Shadow! Yes, it’s a system
called Shadow, where you can run a large
number of Tor relays on a single computer
and simulate the traffic between them.
If you’re going to do that type of research
then you should use that. Okay,
thank you very much, everyone.
applause
silent postroll titles
subtitles created by c3subtitles.de
Join, and help us!