-
silent 31C3 preroll
-
Dr. Gareth Owen: Hello. Can you hear me?
Yes. Okay. So my name is Gareth Owen.
-
I’m from the University of Portsmouth.
I’m an academic
-
and I’m going to talk to you about
an experiment that we did
-
on the Tor hidden services,
trying to categorize them,
-
estimate how many there were, etc.
-
Well, as we go through the talk
I’m going to explain
-
how Tor hidden services work internally,
and how the data was collected.
-
So what sort of conclusions you can draw
from the data based on the way that we’ve
-
collected it. Just so [that] I get
an idea: how many of you use Tor
-
on a regular basis, could you
put your hand up for me?
-
So quite a big number. Keep your hand
up if… or put your hand up if you’re
-
a relay operator.
-
Wow, that’s quite a significant number,
isn’t it? And then, put your hand up
-
and/or keep it up if you
run a hidden service.
-
Okay, so, a smaller number, but still
some people run hidden services.
-
Okay, so, some of you may be very familiar
with the way Tor works, sort of,
-
at a low level. But I am gonna go through
it for those who aren’t, so they understand
-
just how they work. And as we go along,
because I’m explaining how
-
the hidden services work, I’m going
to tag on information on how
-
the Tor hidden services themselves can be
deanonymised and also how the users
-
of those hidden services can be
deanonymised, if you put
-
some strict criteria on what it is you
want to do with respect to them.
-
So the things that I’m going to go over:
I wanna go over how Tor works,
-
and then specifically how hidden services
work. I’m gonna talk about something
-
called the “Tor Distributed Hash Table”
for hidden services. If you’ve heard
-
that term and don’t know what
it means, don’t worry, I’ll explain
-
what a distributed hash table is and
how it works. It’s not as complicated
-
as it sounds. And then I wanna go over
Darknet data, so, data that we collected
-
from Tor hidden services. And as I say,
as we go along I will sort of explain
-
how you do deanonymisation of both the
services themselves and of the visitors
-
to the service. And just
how complicated it is.
-
So you may have seen this slide which
I think was from GCHQ, released last year
-
as part of the Snowden leaks where they
said: “You can deanonymise some users
-
some of the time but they’ve had
no success in deanonymising someone
-
in response to a specific request.”
So, given all of you, for example, I may be able
-
to deanonymise a small fraction of you
but I can’t choose precisely one person
-
I want to deanonymise. That’s what
I’m gonna be explaining in relation
-
to the deanonymisation attacks, how
you can deanonymise a section but
-
you can’t necessarily choose which section
of the users that you will be deanonymising.
-
Tor tries to tackle a couple
of different problems. On the one hand
-
it allows you to bypass censorship. So if
you’re in a country like China, which
-
blocks some types of traffic you can use
Tor to bypass their censorship blocks.
-
It tries to give you privacy, so, at some
level in the network someone can’t see
-
what you’re doing. And at another point
in the network people who don’t know
-
who you are may nevertheless
be able to see what you’re doing.
-
Now the traditional case
for this is to look at VPNs.
-
With a VPN you have
sort of a single provider.
-
You have lots of users connecting
to the VPN. The VPN has sort of
-
a mixing effect from an outside or
a server’s point of view. And then
-
out of the VPN you see requests
to Twitter, Wikipedia etc. etc.
-
And if that traffic isn’t encrypted then
the VPN can also read the contents
-
of the traffic. Now of course there is
a fundamental weakness with this.
-
Even if you trust the VPN provider, the VPN
provider knows both who you are
-
and what you’re doing and can
link those two together with absolute
-
certainty. So whilst you do
get some of these properties, assuming
-
you’ve got a trustworthy VPN provider
you don’t get them in the face of
-
an untrustworthy VPN provider.
And of course: how do you trust the VPN
-
provider? What sort of measure do
you use? That’s sort of an open question.
-
So Tor tries to solve this problem
by distributing the trust. Tor is
-
an open source project, so you can go
on to their Git repository, you can
-
download the source code, and change it,
improve it, submit patches etc.
-
As you heard earlier, during Jacob and
Roger’s talk they’re currently partly
-
sponsored by the US Government which seems
a bit paradoxical, but they explained
-
in that talk why that
doesn’t affect their judgment.
-
And indeed, they do have some funding from
other sources, and they designed the system
-
– which I’ll talk about a little bit
later – in a way where they don’t have
-
to trust each other. So there’s sort of
some redundancy, and they’re trying
-
to minimize these sort of trust issues
related to this. Now, Tor is
-
a partially de-centralized network, which
means that it has some centralized
-
components which are under the control of
the Tor Project and some de-centralized
-
components which are normally the Tor
relays. If you run a relay you’re
-
one of those de-centralized components.
There is, however, no single authority
-
on the Tor network.
So no single server which is responsible,
-
which you’re required to trust.
So the trust is somewhat distributed,
-
but not entirely. When you establish
a circuit through Tor you, the user,
-
download a list of all of the relays
inside the Tor network.
-
And you get to pick – and I’ll tell you
how you do that – which relays
-
you’re going to use to route your traffic
through. So here is a typical example:
-
You’re here on the left hand side as the
user. You download a list of the relays
-
inside the Tor network and you select from
that list three nodes, a guard node
-
which is your entry into the Tor network,
a relay node which is a middle node.
-
Essentially, it’s going to route your
traffic to a third hop. And then
-
the third hop is the exit node where
your traffic essentially exits out
-
on the internet. Now, looking at the
circuit. So this is a circuit through
-
the Tor network through which you’re
going to route your traffic. There are
-
three layers of encryption at the
beginning, so between you
-
and the guard node. Your traffic
is encrypted three times.
-
In the first instance it’s encrypted to the
guard, and then it’s encrypted again
-
to the relay, and then encrypted
again to the exit, and as the traffic moves
-
through the Tor network each of those
layers of encryption is peeled
-
from the data. The Guard here in this case
knows who you are, and the exit relay
-
knows what you’re doing but neither know
both. And the middle relay doesn’t really
-
know a lot, except for which relay is
its guard and which relay is its exit.
-
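The layered encryption just described can be sketched with a toy cipher. To be clear about assumptions: real Tor uses AES in counter mode with keys negotiated hop by hop, not the XOR keystream below; the key values and payload are invented purely to show how each relay peels exactly one layer.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Expand a key into a pseudo-random keystream (toy construction,
    # NOT Tor's real cipher).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_layer(data: bytes, key: bytes) -> bytes:
    # XOR with the keystream; the same call adds or removes a layer.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# One shared key per hop (invented values for the sketch).
guard_key, middle_key, exit_key = b"k-guard", b"k-middle", b"k-exit"

payload = b"GET / HTTP/1.1"

# The client wraps the payload three times: innermost layer for the
# exit, outermost for the guard.
cell = xor_layer(xor_layer(xor_layer(payload, exit_key), middle_key), guard_key)

at_middle = xor_layer(cell, guard_key)       # guard peels its layer
at_exit = xor_layer(at_middle, middle_key)   # middle peels its layer
plaintext = xor_layer(at_exit, exit_key)     # exit recovers the payload

assert plaintext == payload and cell != payload
```

The guard handles only `cell` (plus your IP address), while the exit sees the payload but not your IP, which is exactly the who/what split described above.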
Who runs an exit relay? So if you run
an exit relay all of the traffic which
-
users are sending out on the internet
appears to come from your IP address.
-
So running an exit relay is potentially
risky because someone may do something
-
through your relay which attracts attention.
And then, when law enforcement
-
traced that back to an IP address it’s
going to come back to your address.
-
So some relay operators have had trouble
with this, with law enforcement coming
-
to them, and saying: “Hey we got this
traffic coming through your IP address
-
and you have to go and explain it.”
So if you want to run an exit relay
-
it’s a little bit risky, but we’re thankful
for those people that do run exit relays
-
because ultimately if people didn’t run
an exit relay you wouldn’t be able
-
to get out of the Tor network, and it
wouldn’t be terribly useful from this
-
point of view. So, yes.
applause
-
So every Tor relay, when you set up
a Tor relay you publish something called
-
a descriptor which describes your Tor
relay and how to use it to a set
-
of servers called the authorities. And the
trust in the Tor network is essentially
-
split across these authorities. They’re run
by the core Tor Project members.
-
And they maintain a list of all of the
relays in the network. And they observe
-
them over a period of time. If the relays
exhibit certain properties they give
-
the relays flags. If e.g. a relay allows
traffic to exit from the Tor network
-
it will get the ‘Exit’ flag. If they’ve been
switched on for a certain period of time,
-
or for a certain amount of traffic they’ll
be allowed to become the guard relay
-
which is the first node in your circuit.
So when you build your circuit you
-
download a list of these descriptors from
one of the Directory Authorities. You look
-
at the flags which have been assigned to
each of the relays, and then you pick
-
your route based on that. So you’ll pick
the guard node from a set of relays
-
which have the ‘Guard’ flag, your exits
from the set of relays which have
-
the ‘Exit’ flag etc. etc. Now, as of
a quick count this morning there are
-
about 1500 guard relays, around 1000 exit
relays, and six relays flagged as ‘bad’ exits.
-
What does a ‘bad exit’ mean?
waits for audience to respond
-
That’s not good! That’s exactly
what it means! Yes! laughs
-
applause
-
So relays which have been flagged as ‘bad
exits’ your client will never choose to exit
-
traffic through. And examples of things
which may get a relay flagged as a bad
-
exit – if they’re fiddling with
the traffic which is coming out of
-
the Tor relay. Or doing things like
man-in-the-middle attacks against
-
SSL traffic. We’ve seen various things,
there have been relays man-in-the-middling
-
SSL traffic, and very, very recently there
was an exit relay which was patching
-
binaries that you downloaded from the
internet, inserting malware into the binaries.
-
So you can do these things but the Tor
Project tries to scan for them. And if
-
these things are detected then they’ll be
flagged as ‘Bad Exits’. It’s true to say
-
that the scanning mechanism is not 100%
fool-proof by any stretch of the imagination.
-
It tries to pick up common types
of attacks, so as a result
-
it won’t pick up unknown attacks or
attacks which haven’t been seen or
-
have not been known about beforehand.
-
So looking at this, how do you deanonymise
the traffic travelling through the Tor
-
networks? Given some traffic coming out
of the exit relay, how do you know
-
which user that corresponds to? What is
their IP address? You can’t actually
-
modify the traffic because if any of the
relays tried to modify the traffic
-
which they’re sending through the network
Tor will tear down the circuit through the relay.
-
So there are these integrity checks at each
of the hops. And if you try to sort of
-
– because you can’t decrypt the packet
you can’t modify it in any meaningful way,
-
and because there’s an integrity check
at the next hop that means that you can’t
-
modify the packet because otherwise it’s
detected. So you can’t do this sort of
-
marker, and try and follow the marker
through the network. So instead
-
what you can do if you control… so let me
give you two cases. In the worst case
-
if the attacker controls all three of your
relays that you pick, which is an unlikely
-
scenario since it requires controlling quite
a big proportion of the network. Then
-
it should be quite obvious that they can
work out who you are and also
-
see what you’re doing because in that
case they can tag the traffic, and
-
they can just discard these integrity
checks at each of the following hops.
-
Now in a different case, if you control
the Guard relay and the exit relay
-
but not the middle relay the Guard relay
can’t tamper with the traffic because
-
this middle relay will close down the
circuit as soon as it happens.
-
The exit relay can’t send stuff back down
the circuit to try and identify the user,
-
either. Because again, the circuit will be
closed down. So what can you do?
-
Well, you can count the number of packets
going through the Guard node. And you can
-
measure the timing differences between
packets, and try and spot that pattern
-
at the Exit relays. You’re looking at counts of
packets and the timing between those
-
packets which are being sent, and
essentially trying to correlate them all.
-
So if a user happens to pick you as
their Guard node, and then happens to pick
-
your exit relay, then you can deanonymise
them with very high probability using
-
this technique. You’re just correlating
the timings of packets and counting
-
the number of packets going through.
And the attacks demonstrated in literature
-
are very reliable for this. We heard
earlier from the Tor talk about the “relay
-
early” tag which was the attack discovered
by the CERT researchers in the US.
-
That attack didn’t rely on timing attacks.
Instead, what they were able to do was
-
send a special type of cell containing
the data back down the circuit,
-
essentially marking this data, and saying:
“This is the data we’re seeing
-
at the Exit relay, or at the hidden
service", and encode into the messages
-
travelling back down the circuit, what the
data was. And then you could pick
-
those up at the Guard relay and say, okay,
it’s this person that’s doing that.
-
In fact, although this technique works,
and yeah it was a very nice attack,
-
the traffic correlation attacks are
actually just as powerful.
-
So although this bug has been fixed traffic
correlation attacks still work and are
-
still fairly, fairly reliable. So the problem
still does exist. This is very much
-
an open question. How do we solve this
problem? We don’t know, currently,
-
how to solve this problem of trying
to tackle the traffic correlation.
-
There are a couple of solutions.
But they’re not particularly…
-
they’re not particularly reliable. Let me
just go through these, and I’ll skip back
-
on the few things I’ve missed. The first
thing is, high-latency networks, so
-
networks where packets are delayed
in their transit through the network.
-
That throws away a lot of the timing
information. So they promise
-
to potentially solve this problem.
But of course, if you want to visit
-
Google’s home page, and you have to wait
five minutes for it, you’re simply
-
just not going to use Tor. The whole point
is trying to make this technology usable.
-
And if you got something which is very,
very slow then it doesn’t make it
-
attractive to use. But of course,
this case does work slightly better
-
for e-mail. If you think about it with
e-mail, you don’t mind if your e-mail
-
– well, you may not mind, you may mind –
you don’t mind if your e-mail is delayed
-
by some period of time. Which makes this
somewhat difficult. And as Roger said
-
earlier, you can also introduce padding
into the circuit, so these are dummy cells.
-
But, but… with a big caveat: some of the
research suggests that actually you’d
-
need to introduce quite a lot of padding
to defeat these attacks, and that would
-
overload the Tor network in its current
state. So, again, not a particular
-
practical solution.
-
How does Tor try to solve this problem?
Well, Tor makes it very difficult
-
to become a user’s Guard relay. If you
can’t become a user’s Guard relay
-
then you don’t know who the user is, quite
simply. And so by making it very hard
-
to become the Guard relay therefore you
can’t do this traffic correlation attack.
-
So at the moment the Tor client chooses
one Guard relay and keeps it for a period
-
of time. So if I want to sort of target
just one of you I would need to control
-
the Guard relay that you were using at
that particular point in time. And in fact
-
I’d also need to know what that Guard
relay is. So by making it very unlikely
-
that you would select a particular malicious
Guard relay, where the number of malicious
-
Guard relays is very small, that’s how Tor
tries to solve this problem. And
-
at the moment your Guard relay is your
barrier of security. If the attacker can’t
-
control the Guard relay then they won’t
know who you are. That doesn’t mean
-
they can’t try other sort of side channel
attacks by messing with the traffic
-
at the Exit relay etc. You know that you
may sort of e.g. download dodgy documents
-
and open one on your computer, and those
sort of things. Now the alternative
-
of course to having a Guard relay
and keeping it for a very long time
-
will be to have a Guard relay and
to change it on a regular basis.
-
Because you might think, well, just choosing
one Guard relay and sticking with it
-
is probably a bad idea. But actually,
that’s not the case. If you pick
-
the Guard relay, and assuming that the
chance of picking a Guard relay that is
-
malicious is very low, then, when you
first use your Guard relay, if you got
-
a good choice, then your traffic is safe.
If you haven’t got a good choice then
-
your traffic isn’t safe. Whereas if your
Tor client chooses a Guard relay
-
every few minutes, or every hour, or
something on those lines at some point
-
you’re gonna pick a malicious Guard relay.
So they’re gonna have some of your traffic
-
but not all of it. And so currently the
trade-off is that we make it very difficult
-
for an attacker to control a Guard relay
and the user picks a Guard relay and
-
keeps it for a long period of time. And
so it’s very difficult for the attackers
-
to pick that Guard relay when they control
a very small proportion of the network.
-
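The trade-off between one long-lived guard and frequent rotation comes down to simple probability. A sketch, assuming (hypothetically) that 1% of guard capacity is malicious:

```python
# With one long-lived guard you gamble once: compromised with
# probability p. Rotating guards n times means the chance of *ever*
# landing on a malicious guard climbs towards 1. p = 0.01 is an
# illustrative assumption, not a measured figure.
p = 0.01

def prob_ever_compromised(rotations: int) -> float:
    # Probability that at least one of `rotations` independent guard
    # choices lands on a malicious relay.
    return 1 - (1 - p) ** rotations

assert abs(prob_ever_compromised(1) - 0.01) < 1e-9  # one fixed guard
assert prob_ever_compromised(365) > 0.95            # rotating daily for a year
```

This is why the client keeps one guard for a long period: a bad draw is possible, but repeated draws make eventual compromise near-certain.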
So this, currently, provides those
properties I described earlier, the privacy
-
and the anonymity when you’re browsing the
web, when you’re accessing websites etc.
-
But still you know who the website is. So
although you’re anonymous and the website
-
doesn’t know who you are you know who the
website is. And there may be some cases
-
where e.g. the website would also wish to
remain anonymous. You want the person
-
accessing the website and the website
itself to be anonymous to each other.
-
And you could think about people e.g.
being in countries where running
-
a political blog e.g. might be a dangerous
activity. If you run that on a regular
-
webserver you’re easily identified whereas,
if you got some way where you as
-
the webserver can be anonymous then
that allows you to do that activity without
-
being targeted by your government. So
this is what hidden services try to solve.
-
Now when you first think about a problem
you kind of think: “Hang on a second,
-
the user doesn’t know who the website
is and the website doesn’t know
-
who the user is. So how on earth do they
talk to each other?” Well, that’s essentially
-
what the Tor hidden service protocol tries
to sort of set up. How do you identify and
-
connect to each other. So at the moment
this is what happens: We’ve got Bob
-
on the [right] hand side who is the hidden
service. And we got Alice on the left hand
-
side here who is the user who wishes to
visit the hidden service. Now when Bob
-
sets up his hidden service he picks three
nodes in the Tor network as introduction
-
points and builds multi-hop circuits to
them. So the introduction points don’t know
-
who Bob is. Bob has circuits to them. And
Bob says to each of these introduction points
-
“Will you relay traffic to me if someone
connects to you asking for me?”
-
And then those introduction points
do that. So then, once Bob has picked
-
his introduction points he publishes
a descriptor describing the list of his
-
introduction points for someone who wishes
to come onto his website. And then Alice
-
on the left hand side wishing to visit Bob
will pick a rendezvous point in the network
-
and build a circuit to it. So this “RP”
here is the rendezvous point.
-
And she will relay a message via one of
the introduction points saying to Bob:
-
“Meet me at the rendezvous point”.
And then Bob will build a 3-hop-circuit
-
to the rendezvous point. So now at this
stage we got Alice with a multi-hop circuit
-
to the rendezvous point, and Bob with
a multi-hop circuit to the rendezvous point.
-
Alice and Bob haven’t connected to one
another directly. The rendezvous point
-
doesn’t know who Bob is, the rendezvous
point doesn’t know who Alice is.
-
All they’re doing is forwarding the
traffic. And they can’t inspect the traffic,
-
either, because the traffic itself
is encrypted.
-
So that’s currently how you solve this
problem with trying to communicate
-
with someone who you don’t know
who they are and vice versa.
-
drinks from the bottle
-
The principle thing I’m going to talk
about today is this database.
-
So I said, Bob, when he picks his
introduction points he builds this thing
-
called a descriptor, describing who his
introduction points are, and he publishes
-
them to a database. This database itself
is distributed throughout the Tor network.
-
It’s not a single server. So both, Bob and
Alice need to be able to publish information
-
to this database, and also retrieve
information from this database. And Tor
-
currently uses something called
a distributed hash table, which I’m gonna
-
give an example of what this means and
how it works. And then I’ll talk to you
-
specifically how the Tor Distributed Hash
Table works itself. So let’s say e.g.
-
you've got a set of servers. So here we've
got 26 servers and you’d like to store
-
your files across these different servers
without having a single server responsible
-
for deciding, “okay, that file is stored
on that server, and this file is stored
-
on that server” etc. etc. Now here is my
list of files. You could take a very naive
-
approach. And you could say: “Okay, I’ve
got 26 servers, I got all of these file names
-
and start with the letter of the alphabet.”
And I could say: “All of the files that begin
-
with A are gonna go under server A; or
the files that begin with B are gonna go
-
on server B etc.” And then when you want
to retrieve a file you say: “Okay, what
-
does my file name begin with?” And then
you know which server it’s stored on.
-
Now of course you could have a lot of
servers – sorry – a lot of files
-
which begin with a Z, an X or a Y etc. in
which case you’re gonna overload
-
that server. You’re gonna have more files
stored on one server than on another server
-
in your set. And if you have a lot of big
files, say e.g. beginning with B then
-
rather than distributing your files across
all the servers you’re gonna just be
-
overloading one or two of them. So to
solve this problem what we tend to do is:
-
we take the file name, and we run it
through a cryptographic hash function.
-
A cryptographic hash function produces
output which looks random: very small changes
-
in the input produce a very large change
-
in the output. And this change looks
random. So if I take all of my file names
-
here, and assuming I have a lot more,
I take a hash of them, and then I use
-
that hash to determine which server to
store the file on. Then, with high probability
-
my files will be distributed evenly across
all of the servers. And then when I want
-
to go and retrieve one of the files I take
my file name, I run it through the
-
cryptographic hash function, that gives me
the hash, and then I use that hash
-
to identify which server that particular
file is stored on. And then I go and
-
retrieve it. So that’s the sort of a loose
idea of how a distributed hash table works.
-
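The two schemes just described – first-letter assignment versus hash-based assignment – can be compared directly. The file names and the 26-server setup mirror the example above; the use of SHA-1 is just for illustration.

```python
import hashlib

files = ["alpha.txt", "apple.txt", "axiom.txt", "avocado.txt",
         "banana.txt", "zebra.txt", "notes.txt", "cat.txt"]
num_servers = 26

# Naive scheme: the first letter decides the server, so all the a*
# files pile up on one server.
naive = {}
for name in files:
    server = ord(name[0].lower()) - ord("a")
    naive.setdefault(server, []).append(name)

# Hash scheme: a cryptographic hash spreads names (near-)uniformly,
# regardless of what the names look like.
hashed = {}
for name in files:
    server = hashlib.sha1(name.encode()).digest()[0] % num_servers
    hashed.setdefault(server, []).append(name)

# To retrieve "notes.txt", anyone can recompute the hash and ask the
# right server directly -- no central index is needed.
lookup = hashlib.sha1(b"notes.txt").digest()[0] % num_servers
assert "notes.txt" in hashed[lookup]
assert len(naive[0]) == 4  # four files crowded onto server 'a'
```

The lookup works the same way for everyone because the hash function is deterministic, which is the property the Tor directory ring relies on.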
There are a couple of problems with this.
What if the number of servers you’ve got
-
changes in size, as it does
in the Tor network?
-
That’s a very brief overview of the theory.
So how does it apply for the Tor network?
-
Well, the Tor network has a set of relays
and it has a set of hidden services.
-
Now we take all of the relays, and they
have a hash identity which identifies them.
-
And we map them onto a circle using that
hash value as an identifier. So you can
-
imagine the hash value ranging from Zero
to a very large number. We got a Zero point
-
at the very top there. And that runs all
the way round to the very large number.
-
So given the identity hash for a relay we
can map that to a particular point on
-
the circle. And then all we have to do
is also do this for hidden services.
-
So there’s a hidden service address,
something.onion, so this is
-
one of the hidden websites that you might
visit. You take the – I’m not gonna describe
-
in too much detail how this is done but –
the value is computed in such a way that
-
it’s evenly distributed about the circle.
So your hidden service will have
-
a particular point on the circle. And the
relays will also be mapped onto this circle.
-
So there’s the relays. And the hidden
service. And in the case of Tor
-
the hidden service actually maps to two
positions on the circle, and it publishes
-
its descriptor to the three relays to the
right at one position, and the three relays
-
to the right at another position. So there
are actually in total six places where
-
this descriptor is published on the
circle. And then if I want to go and
-
fetch and connect to a hidden service
I go and pull its descriptor
-
down to identify what its introduction
points are. I take the hidden service
-
address, I find out where it is on the
circle, I map all of the relays onto
-
the circle, and then I identify which
relays on the circle are responsible
-
for that particular hidden service. And
I just connect, then I say: “Do you have
-
a copy of the descriptor for that
particular hidden service?”
-
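A minimal model of that lookup: relays and the hidden service are mapped onto the same circle, and the three relays clockwise ("to the right") of the service's position hold the descriptor. Real Tor derives the descriptor ID from the onion address plus a time period and replica number; a plain hash of made-up names stands in for that here.

```python
import hashlib
from bisect import bisect_right

def point(value: bytes) -> int:
    # Map an identity onto the circle [0, 2^160) via its hash.
    return int.from_bytes(hashlib.sha1(value).digest(), "big")

# Fifty toy relays placed on the ring by identity hash.
relays = [f"relay-{i}".encode() for i in range(50)]
ring = sorted(point(r) for r in relays)
by_point = {point(r): r for r in relays}

def responsible_relays(descriptor_id: int, count: int = 3):
    # The first `count` relays at or after the descriptor's position,
    # wrapping around past the top of the circle.
    start = bisect_right(ring, descriptor_id)
    return [by_point[ring[(start + i) % len(ring)]] for i in range(count)]

# Hypothetical onion address, hashed onto the circle.
desc_id = point(b"abcdefghijklmnop.onion")
store_on = responsible_relays(desc_id)
assert len(set(store_on)) == 3

# A client recomputes the same positions and asks the same relays.
assert responsible_relays(desc_id) == store_on
```

Because publisher and client run the same deterministic computation, they agree on which relays to contact without any central coordinator (Tor additionally uses two positions, giving the six storage places mentioned above).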
And if so then we’ve got our list of
introduction points. And we can go
-
to the next steps to connect to our hidden
service. So I’m gonna explain how we
-
sort of set up our experiments. What we
thought, or what we were interested in doing,
-
was collect publications of hidden
services. So every time a hidden service
-
gets set up it publishes to this distributed
hash table. What we wanted to do was
-
collect those publications so that we
get a complete list of all of the hidden
-
services. And what we also wanted to do
is to find out how many times a particular
-
hidden service is requested.
-
Just one more point that
will become important later.
-
The position which the hidden service
appears on the circle changes
-
every 24 hours. So there’s not
a fixed position every single day.
-
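Why the position moves: the descriptor ID mixes the service's identity with the current time period, so the set of responsible relays rotates every day. This sketch mirrors the idea rather than Tor's exact byte layout; the identity string and timestamps are invented.

```python
import hashlib

def descriptor_id(onion_identity: bytes, unix_time: int, replica: int) -> bytes:
    # The time period changes every 24 hours, so the hash -- and hence
    # the position on the circle -- changes with it.
    time_period = unix_time // 86400
    material = onion_identity + time_period.to_bytes(4, "big") + bytes([replica])
    return hashlib.sha1(material).digest()

identity = b"toy-onion-identity"
today = 1_700_000_000
tomorrow = today + 86400

# Same service, same replica, different day -> different position,
# so different relays end up holding the descriptor.
assert descriptor_id(identity, today, 0) != descriptor_id(identity, tomorrow, 0)
# Two replicas on the same day give the two positions on the circle.
assert descriptor_id(identity, today, 0) != descriptor_id(identity, today, 1)
```

This daily rotation is what lets stationary listening nodes, over a long enough run, see publications for services all around the circle rather than only their fixed neighbourhood.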
If we run 40 nodes over a long period of
time we will occupy positions within
-
that distributed hash table. And we will be
able to collect publications and requests
-
for hidden services that are located at
that position inside the distributed
-
hash table. So in that case we ran 40 Tor
nodes, we had a student at university
-
who said: “Hey, I run a hosting company,
I got loads of server capacity”, and
-
we told him what we were doing, and he
said: “Well, you really helped us out,
-
these last couple of years…”
and just gave us loads of server capacity
-
to allow us to do this. So we spun up 40
Tor nodes. Each Tor node was required
-
to advertise a certain amount of bandwidth
to become a part of that distributed
-
hash table. It’s actually a very small
amount, so this didn’t matter too much.
-
And then, after – this has changed
recently in the last few days,
-
it used to be 25 hours, it’s just been
increased as a result of one of the
-
attacks last week. But here… certainly
during our study it was 25 hours. You then
-
appear at a particular point inside that
distributed hash table. And you’re then
-
in a position to record publications of
hidden services and requests for hidden
-
services. So not only can you get a full
list of the onion addresses you can also
-
find out how many times each of the
onion addresses are requested.
-
And so this is what we recorded. And then,
once we had a full list of… or once
-
we had run for a long period of time to
collect a long list of .onion addresses
-
we then built a custom crawler that would
visit each of the Tor hidden services
-
in turn, and pull down the HTML contents,
the text content from the web page,
-
so that we could go ahead and classify
the content. Now it’s really important
-
to know here, and it will become obvious
why a little bit later, we only pulled down
-
HTML content. We didn’t pull out images.
And there’s a very, very important reason
-
for that which will become clear shortly.
-
We had a lot of questions when we
first started this. No one really knew
-
how many hidden services there were. It had
been suggested to us there was a very high
-
turn-over of hidden services. We wanted to
confirm whether that was true or not.
-
And we also wanted to find out
what the hidden services are,
-
how popular they are, etc. So
our estimate for how many hidden services
-
there are, over the period which we
ran our study, this is a graph plotting
-
our estimate for each of the individual
days as to how many hidden services
-
there were on that particular day. Now the
data is naturally noisy because we’re only
-
a very small proportion of that circle.
So we’re only observing a very small
-
proportion of the total publications and
requests every single day, for each of
-
those hidden services. And if you
take a long term average for this
-
there’s about 45,000 hidden services that
we think were present, on average,
-
each day, during our entire study. Which
is a large number of hidden services.
-
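The talk doesn't spell out the estimator, but the obvious way to turn a small slice of the circle into a daily total is to scale the observed count by the fraction of the ring the 40 nodes covered. Every number below is invented for illustration:

```python
# Scale up the services seen by the listening nodes by the share of
# the distributed hash table those nodes occupied that day.
def estimate_total(observed: int, fraction_of_ring: float) -> float:
    return observed / fraction_of_ring

fraction = 40 / 6000          # hypothetical: 40 of ~6000 directory slots
observed_publications = 300   # hypothetical unique services seen that day

assert abs(estimate_total(observed_publications, fraction) - 45000) < 1e-6
```

The smaller the observed fraction, the noisier the scaled-up figure, which is why the daily estimates in the graph jump around and a long-term average is the more trustworthy number.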
But over the entire length we
collected about 80,000 in total.
-
Some came and went etc.
So the next question after how many
-
hidden services there are is how long
the hidden service exists for.
-
Does it exist for a very long period
of time, does it exist for a very short
-
period of time etc. etc.
So what we did was, for every single
-
.onion address we plotted how many times
we saw a publication for that particular
-
hidden service during the six months.
How many times did we see it.
-
If we saw it a lot of times that suggested
in general the hidden service existed
-
for a very long period of time. If we saw
a very small number of publications
-
for each hidden service then that
suggests that they were only present
-
for a very short period of time. This is
our graph. By far the largest number
-
of hidden services we saw only once during
the entire study. And we never saw them
-
again. This suggests that there’s a very high
turnover of the hidden services; they
-
don’t, on average, tend to exist for
a very long period of time.
-
And then you can see the sort of
a tail here. If we plot just those
-
hidden services which existed for a long
time, so e.g. we could take hidden services
-
which have a high number of hit requests
and say: “Okay, those that have a high number
-
of hits probably existed for a long time.”
That’s not absolutely certain, but probably.
-
Then you see this sort of normal-shaped plot
around 4 or 5, so we saw on average
-
most hidden services four or five times
during the entire six months if they were
-
popular and we’re using that as a proxy
measure for whether they existed
-
for the entire time. Now, this stage was
over 160 days, so almost six months.
-
What we also wanted to do was trying
to confirm this over a longer period.
-
So last year, in 2013, about February time
some researchers of the University
-
of Luxemburg also ran a similar study
but it only ran over a very short period,
-
about a day. But they did it in such
a way that it could collect descriptors
-
across much of the circle during a single
day. That was because of a bug in the way
-
Tor did some of these things, which has
now been fixed, so we can’t repeat that
-
approach today. So we got a list of
.onion addresses from February 2013
-
from these researchers at the University
of Luxemburg. And then we got our list
-
of .onion addresses from this six months
which was March to September of this year.
-
And we wanted to say, okay, we’re given
these two sets of .onion addresses.
-
Which .onion addresses existed in their set
but not ours and vice versa, and which
-
.onion addresses existed in both sets?
-
So as you can see a very small minority
of hidden service addresses existed
-
in both sets. This is over an 18 month
period between these two collection points.
-
A very small number of services existed
in both their data set and in
-
our data set. Which again suggested
there’s a very high turnover of hidden
-
services that don’t tend to exist
for a very long period of time.
-
So the question is why is that?
Which we’ll come on to a little bit later.
-
It’s a very valid question; we can’t answer
it 100%, but we have some ideas as to
-
why that may be the case. So in terms
of popularity: which hidden services
-
did we see, or which .onion addresses
did we see requested the most?
-
Which got the largest number of hits, or the
largest number of directory requests?
-
So botnet Command & Control servers
– if you’re not familiar with what
-
a botnet is, the idea is to infect lots of
people with a piece of malware.
-
And this malware phones home to
a Command & Control server where
-
the botnet master can give instructions
to each of the bots to do things.
-
So it might be e.g. to collect passwords,
key strokes, banking details.
-
Or it might be to do things like
Distributed Denial of Service attacks,
-
or to send spam, those sorts of things.
And a couple of years ago someone gave
-
a talk and said: “Well, the problem with
running a botnet is your C&C servers
-
are vulnerable.” Once a C&C server is taken
down you no longer have control over
-
your botnet. So it’s been a sort of arms
race between anti-virus companies and
-
malware authors, trying to come up
with techniques to run C&C servers in a way
-
in which they can’t be taken down. And
a couple of years ago someone gave a talk
-
at a conference that said: “You know what?
It would be a really good idea if botnet
-
C&C servers were run as Tor hidden
services because then no one knows
-
where they are, and in theory they can’t
be taken down.” And in fact we saw this:
-
there are loads and loads and loads of
these addresses associated with several
-
different botnets, ‘Sefnit’ and ‘Skynet’.
Now Skynet is the one I wanted to talk
-
to you about because the guy that runs
Skynet had a twitter account, and he also
-
did a Reddit AMA. If you’ve not heard
of a Reddit AMA before, that’s a Reddit
-
ask-me-anything. You can go on the website
and ask the guy anything. So this guy
-
wasn’t hiding in the shadows. He’d say:
“Hey, I’m running this massive botnet,
-
here’s my Twitter account which I update
regularly, here is my Reddit AMA where
-
you can ask me questions!” etc.
-
He was arrested last year, which is not,
perhaps, a huge surprise.
-
laughter and applause
-
But… so he was arrested,
his C&C servers disappeared
-
but there were still infected hosts trying
to connect with the C&C servers and
-
request access to the C&C server.
-
This is why we’re seeing “a large number
of hits”. So all of these requests are
-
failed requests, i.e. we didn’t have
a descriptor for them because
-
the hidden service had gone away but
there were still clients requesting each
-
of the hidden services.
-
And the next thing we wanted to do was
to try and categorize sites. So, as I said
-
earlier, we crawled all of the hidden
services that we could, and we classified
-
them into different categories based
on what the type of content was
-
on the hidden service side. The first
graph I have is the number of sites
-
in each of the categories. So you can see
down the bottom here we got lots of
-
different categories. We got drugs, market
places, etc. on the bottom. And the graph
-
shows the percentage of the hidden
services that we crawled that fit in
-
to each of these categories. So e.g. looking
at drugs: the largest number of sites
-
that we crawled were made up of
drug-focused websites, followed by
-
market places etc. There’s a couple of
questions you might have here,
-
so which ones are gonna stick out? What
does ‘porn’ mean? Well, you know
-
what ‘porn’ means. There are some very
notorious porn sites on the Tor Darknet.
-
There was one in particular which was
focused on revenge porn. It turns out
-
that youngsters like to take pictures
of themselves and send them to their
-
boyfriends or their girlfriends. And
when they get dumped they publish them
-
on these websites. So there were several
of these sites on the main internet
-
which have mostly been shut down.
And some of these sites were archived
-
on the Darknet. The second one you
should probably wonder about
-
is ‘abuse’. Every single
site we classified in this category
-
was a child abuse site. They were in
some way facilitating child abuse.
-
And how do we know that? Well, the data
that came back from the crawler
-
made it completely unambiguous as to what
the content was on these sites. It was
-
completely obvious, from the content the
crawler returned, what was on these sites.
-
And this is the principal reason why we
didn’t pull down images from sites.
-
In many countries it would be
a criminal offense to do so.
-
So our crawler only pulled down text
content from all of these sites, and that
-
enabled us to classify them, based on
that. We didn’t pull down any images.
-
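The text-only classification just described can be sketched as simple keyword matching; the categories and keyword lists below are hypothetical stand-ins, since the talk doesn't specify the actual classification method used:

```python
# Hypothetical keyword lists; the study's real classifier isn't described.
CATEGORY_KEYWORDS = {
    "drugs": {"cannabis", "mdma", "pharmacy", "gram"},
    "marketplace": {"escrow", "vendor", "listing", "marketplace"},
}

def classify_page(text: str) -> str:
    """Assign the category whose keyword list overlaps most with the
    page's words; fall back to 'other' when nothing matches."""
    words = set(text.lower().split())
    best_category, best_hits = "other", 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_category, best_hits = category, hits
    return best_category
```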
So of course the next thing we wanted to do
was to say: “Okay, well, given each of these
-
categories, what proportion of directory
requests went to each of the categories?”
-
Now the next graph is going to need some
explaining as to precisely what it
-
means, and I’m gonna give that. This is
the proportion of directory requests
-
which we saw that went to each of the
categories of hidden service that we
-
classified. As you can see, in fact, we
saw a very large number going to these
-
abuse sites. And the rest sort of
distributed right there, at the bottom.
-
And the question is: “What is it
we’re collecting here?”
-
We’re collecting successful hidden service
directory requests. What does a hidden
-
service directory request mean?
It probably loosely correlates with
-
either a visit or a visitor. So somewhere
in between those two. Because when you
-
want to visit a hidden service you make
a request for the hidden service descriptor
-
and that allows you to connect to it
and browse through the web site.
-
But there are cases where, e.g. if you
restart Tor, you’ll go back and you
-
re-fetch the descriptor. So in that case
we’ll count it twice, for example.
-
What proportion of these are people,
and which proportion of them are
-
something else? The answer to that is
we just simply don’t know.
-
We've got directory requests but that doesn’t
tell us about what they’re doing on these
-
sites, what they’re fetching, or who
indeed they are, or what it is they are.
-
So these could be automated requests,
they could be human beings. We can’t
-
distinguish between those two things.
-
What are the limitations?
-
A hidden service directory request doesn’t
exactly correlate to either a visit or a visitor.
-
It’s probably somewhere in between.
So you can’t say whether it’s exactly one
-
or the other. We cannot say whether
a hidden service directory request
-
is a person or something automated.
We can’t distinguish between those two.
-
Any type of site could be targeted by e.g.
DoS attacks or by web crawlers, which would
-
greatly inflate the figures. If you were
to do a DoS attack it’s likely you’d only
-
request a small number of descriptors.
You’d actually be flooding the site itself
-
rather than the directories. But, in
theory, you could flood the directories.
-
But we didn’t see any sort of shutdown
of our directories based on flooding, e.g.
-
Whilst we can’t rule that out, it doesn’t
seem to fit too well with what we’ve got.
-
The other question is ‘crawlers’.
I obviously talked with the Tor Project
-
about these results and they’ve suggested
that there are groups, such as the child
-
protection agencies, that will crawl
these sites on a regular basis. And,
-
again, that doesn’t necessarily correlate
with a human being. And that could
-
inflate the figures. How many hidden service
directory requests would there be
-
if a crawler was pointed at a site? Typically,
if I crawl it on a single day: one request.
-
But if they’ve got a large number of servers
doing the crawling then it could be
-
a request per day for every single server.
So, again, I can’t give you, definitive,
-
“yes, this is human beings” or
“yes, this is automated requests”.
-
The other important point is: these two
content graphs cover only hidden services
-
offering web content. There are hidden
services that do other things, e.g. IRC,
-
instant messaging etc. Those aren’t
included in these figures. We’re only
-
concentrating on hidden services offering
web sites. They’re HTTP services, or HTTPS
-
services, because that allows us to easily
classify them. And, in fact, for some of
-
the other types, like IRC and Jabber, the
results would probably not be directly comparable
-
with web sites. The use
case for using them is probably
-
slightly different. So I appreciate the
last graph is somewhat alarming.
-
If you have any questions please ask
either me or the Tor developers
-
as to how to interpret these results. It’s
not quite as straight-forward as it may
-
look when you look at the graph. You
might look at the graph and say: “Hey,
-
that looks like there’s lots of people
visiting these sites”. It’s difficult
-
to conclude that from the results.
-
The next slide is gonna be very
contentious. I will prefix it with:
-
“I’m not advocating -any- kind of
action whatsoever. I’m just trying
-
to describe technically as to what could
be done. It’s not up to me to make decisions
-
on these types of things.” So, of course,
when we found this out, frankly, I think
-
we were stunned. I mean, for
several days, frankly, it just stunned us:
-
“what the hell, this is not
what we expected at all.”
-
So a natural step is, well, we think, most
of us think that Tor is a great thing,
-
it seems. Could this problem be sorted out
while still keeping Tor as it is?
-
And probably the next step to say: “Well,
okay, could we just block this class
-
of content and not other types of content?”
So could we block just hidden services
-
that are associated with these sites and
not other types of hidden services?
-
We thought there are three ways in which
we could block hidden services.
-
And I’ll talk about whether these will
still be possible in the coming months,
-
after explaining them. But during our
study these would have been possible,
-
and presently they are possible.
-
A single individual could shut down
a single hidden service by controlling
-
all of the relays which are responsible
for receiving a publication request
-
on that distributed hash table. It’s
possible to place one of your relays
-
at a particular position on that circle
and so therefore make yourself be
-
the responsible relay for
a particular hidden service.
-
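Placing a relay next to a target on the circle amounts to grinding identity keys until the relay's fingerprint lands where you want it. A toy sketch of the idea (random bytes stand in for real RSA key generation, and matching only a short 12-bit prefix keeps the demo fast; a real attack would need enough matching bits to out-position competing relays):

```python
import hashlib
import os

def grind_identity(target_id: bytes, prefix_bits: int = 12) -> bytes:
    """Generate candidate key material until its SHA-1 fingerprint
    shares the top `prefix_bits` bits with the target descriptor ID,
    i.e. lands right beside it on the DHT circle."""
    target_prefix = int.from_bytes(target_id, "big") >> (160 - prefix_bits)
    while True:
        candidate = os.urandom(128)  # stand-in for a fresh RSA key
        fingerprint = hashlib.sha1(candidate).digest()
        if int.from_bytes(fingerprint, "big") >> (160 - prefix_bits) == target_prefix:
            return fingerprint
```

Each extra bit of prefix doubles the expected work, so even a much longer match stays cheap on commodity hardware, which is why this positioning is not computationally intensive.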
And if you control all of the six relays
which are responsible for a hidden service,
-
when someone comes to you and says:
“Can I have a descriptor for that site”
-
you can just say: “No, I haven’t got it”.
And provided you control those relays
-
users won’t be able to fetch those sites.
-
The second option is you could say:
“Okay, the Tor Project are blocking these”
-
– which I’ll talk about in a second.
Or could I
-
as a relay operator say: “Okay,
I don’t want to carry
-
this type of content, and I don’t want to
be responsible for serving up this type
-
of content.” A relay operator could patch
his relay and say: “You know what,
-
if anyone comes to this relay requesting
any one of these sites then, again, just
-
refuse to do it”. The problem is a lot of
relay operators would need to do it. A very,
-
very large number of the potential relay
operators would need to do that
-
to effectively block these sites. The
final option is the Tor Project could
-
modify the Tor program and actually embed
these addresses in the Tor program itself
-
so that all relays by default both
block hidden service directory requests
-
to these sites, and also clients themselves
would say: “Okay, if anyone’s requesting
-
these, block them at the client level.”
Now I hasten to add: I’m not advocating
-
any kind of action; that is entirely up to
other people because, frankly, I think
-
if I advocated blocking hidden services
I probably wouldn’t make it out alive,
-
so I’m just saying: this is a description
of what technical measures could be used
-
to block some classes of sites. And of
course there’s lots of questions here.
-
If e.g. the Tor Project themselves decided:
“Okay, we’re gonna block these sites”
-
that means they are essentially
in control of the block list.
-
The block list would be somewhat public
so everyone would be able to inspect
-
what the sites are that are being blocked
and they would be in control of some kind
-
of block list. Which, you know, arguably
is against what the Tor Project is after.
-
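As a purely illustrative sketch of the relay-operator option described above (real Tor is written in C, no such blocklist feature exists in it, and the descriptor IDs here are made up):

```python
# Hypothetical blocklist of descriptor identifiers; not a real Tor feature.
BLOCKED_DESC_IDS = {"blockedexampleid0000"}

def handle_descriptor_fetch(desc_id: str, store: dict):
    """A patched directory relay could serve stored descriptors as
    normal, but pretend not to have anything on the blocklist."""
    if desc_id in BLOCKED_DESC_IDS:
        return None  # answer "I haven't got it"
    return store.get(desc_id)
```

Note this only works if the relays responsible for a given service all apply the same list, which is exactly the coordination problem discussed above.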
takes a sip, coughs
-
So how about deanonymising visitors
to hidden service web sites?
-
So in this case we got a user on the
left-hand side who is connected to
-
a Guard node. We’ve got a hidden service
on the right-hand side who is connected
-
to a Guard node and on the top we got
one of those directory servers which is
-
responsible for serving up those
hidden service directory requests.
-
Now, when you first want to connect to
a hidden service you connect through
-
your Guard node and through a couple of hops
up to the hidden service directory and
-
you request the descriptor from them.
So at this point if you are the attacker
-
and you control one of the hidden service
directory nodes for a particular site
-
you can send back down the circuit
a particular pattern of traffic.
-
And if you control that user’s
Guard node – which is a big if –
-
then you can spot that pattern of traffic
at the Guard node. The question is:
-
“How do you control a particular user’s
Guard node?” That’s very, very hard.
-
But if e.g. I run a hidden service and all
of you visit my hidden service, and
-
I’m running a couple of dodgy Guard relays
then the probability is that some of you,
-
certainly not all of you by any stretch will
select my dodgy Guard relay, and
-
I could deanonymise you, but I couldn’t
deanonymise the rest of them.
-
So what we’re saying here is that
you can deanonymise some of the users
-
some of the time, but you can’t pick which
users those are that you’re going to
-
deanonymise. You can’t deanonymise someone
specific but you can deanonymise a fraction
-
based on what fraction of the network you
control in terms of Guard capacity.
-
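That "a fraction, not a chosen target" point can be illustrated with a tiny Monte-Carlo sketch: if an attacker holds a given share of guard capacity, roughly that share of users ends up exposed, but which users is random (this simplifies real guard selection, which is sticky over time):

```python
import random

def exposed_fraction(num_users: int, malicious_share: float,
                     seed: int = 1) -> float:
    """Each user independently picks a guard weighted by bandwidth;
    a user is exposed iff they land on a malicious guard."""
    rng = random.Random(seed)
    exposed = sum(rng.random() < malicious_share for _ in range(num_users))
    return exposed / num_users
```

With e.g. a 5% share you expect about 5% of users exposed, and repeating the draw exposes a different random 5% each time.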
How about… so the attacker controls those
two – here’s a picture from researchers at
-
the University of Luxembourg who
did this. And these are plots of
-
taking the user’s IP address visiting
a C&C server, and then geolocating it
-
and putting it on a map. So “where was the
user located when they called one of
-
the Tor hidden services?” So, again,
this is a selection, a percentage
-
of the users visiting C&C servers
using this technique.
-
How about deanonymising hidden services
themselves? Well, again, you got a problem.
-
You’re the user. You’re gonna connect
through your Guard into the Tor network.
-
And then, eventually, through the hidden
service’s Guard node, and talk to
-
the hidden service. As the attacker you
need to control the hidden service’s
-
Guard node to do these traffic correlation
attacks. So again, it’s very difficult
-
to deanonymise a specific Tor hidden
service. But if you think about, okay,
-
there are 1,000 Tor hidden services: if you
can control a percentage of the Guard nodes
-
then some hidden services will pick you
and then you’ll be able to deanonymise those.
-
So provided you don’t care which hidden
services you’re gonna deanonymise
-
then it becomes much more straight-forward
to control the Guard nodes of some hidden
-
services but you can’t pick exactly
what those are.
-
So what sort of data can you see
traversing a relay?
-
This is a modified Tor client which just
dumps cells which are coming…
-
essentially packets travelling down
a circuit, and the information you can
-
extract from them at a Guard node.
And this is done off the main Tor network.
-
So I’ve got a client connected to
a “malicious” Guard relay
-
and it logs every single packet – they’re
called ‘cells’ in the Tor protocol –
-
coming through the Guard relay. We can’t
decrypt the packet because it’s encrypted
-
three times. What we can record,
though, is the IP address of the user,
-
the IP address of the next hop,
and we can count packets travelling
-
in each direction down the circuit. And we
can also record the time at which those
-
packets were sent. So of course, if you’re
doing the traffic correlation attacks
-
you’re using that time in the information
to try and work out whether you’re seeing
-
traffic which you’ve sent and which
identifies a particular user or not.
-
Or indeed traffic which they’ve sent
which you’ve seen at a different point
-
in the network.
-
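What such a logging relay records, and how two vantage points are compared, can be sketched roughly like this (a toy model only, not Tor's actual cell format or a real correlation algorithm; the IP addresses are placeholders):

```python
from dataclasses import dataclass

@dataclass
class CellRecord:
    """Metadata visible at a relay for one cell: the payload stays
    encrypted, but endpoints, direction and timing are observable."""
    prev_ip: str
    next_ip: str
    outbound: bool
    timestamp: float

def windowed_counts(records, window: float, horizon: float) -> list:
    """Bucket cell timestamps into fixed windows; correlation attacks
    compare these count sequences from two observation points."""
    counts = [0] * int(horizon / window)
    for record in records:
        index = int(record.timestamp / window)
        if 0 <= index < len(counts):
            counts[index] += 1
    return counts

def match_score(counts_a: list, counts_b: list) -> int:
    """Toy similarity score: higher when traffic bursts line up in time."""
    return sum(a * b for a, b in zip(counts_a, counts_b))
```

The attack intuition is just that two observations of the same circuit produce count sequences that line up, while unrelated circuits don't.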
Moving on to my…
-
…interesting problems,
research questions etc.
-
Based on what I’ve said: there are
these directory authorities which are
-
controlled by the core Tor members. If
e.g. they were malicious then they could
-
manipulate the Tor… – if a big enough
chunk of them are malicious then
-
they can manipulate the consensus
to direct you to particular nodes.
-
I don’t think that’s the case, nor does
anyone think that’s the case.
-
And Tor is designed in a way…
I mean, such that you’d have to control
-
a certain number of the authorities
to be able to do anything important.
-
So the Tor people… I said this
to them a couple of days ago.
-
I find it quite funny that you’d design
your system as if you don’t trust
-
each other. To which their response was:
“No, we design our system so that
-
we don’t have to trust each other.” Which
I think is a very good model to have,
-
when you have this type of system.
So could we eliminate these sort of
-
centralized servers? I think that’s
actually a very hard problem to do.
-
There are lots of attacks which could
potentially be deployed against
-
a decentralized network. At the moment the
Tor network is relatively well understood
-
both in terms of what types of attack it
is vulnerable to. So if we were to move
-
to a new architecture then we may open it
to a whole new class of attacks.
-
The Tor network has existed
for quite some time and it’s been
-
very well studied. What about global
adversaries like the NSA, who can
-
monitor network links all across the
world? It’s very difficult to defend
-
against that. Where they can monitor…
if they can identify which Guard relay
-
you’re using, they can monitor traffic
going into and out of the Guard relay,
-
and log each of the subsequent hops
along the way. It’s very, very difficult to defend against
-
these types of things. Do we know if
they’re doing it? The documents that were
-
released yesterday – I’ve only had a very
brief look through them, but they suggest
-
that they’re not presently doing it and
they haven’t had much success.
-
I don’t know why; there are very powerful
attacks described in the academic literature
-
which are very, very reliable and most
academic literature you can access for free
-
so it’s not even as if they have to figure
out how to do it. They just have to read
-
the academic literature and try and
implement some of these attacks.
-
I don’t know why they’re not. The
next question is how to detect malicious
-
relays. So in my case we’re running
40 relays. Our relays were on consecutive
-
IP addresses, so we’re running 40
– well, most of them are on consecutive
-
IP addresses in two blocks. So they’re
running on IP addresses numbered
-
e.g. 1,2,3,4,…
We were running two relays per IP address,
-
and every single relay had my name
plastered across it.
-
So after I set up these 40 relays in
-
a relatively short period of time
I expected someone from the Tor Project
-
to come to me and say: “Hey Gareth, what
are you doing?” – no one noticed,
-
no one noticed. So this is presently
an open question. On the Tor Project
-
they’re quite open about this. They
acknowledged that, in fact, last year
-
we had the CERT researchers launch many
more relays than that. The Tor Project
-
spotted that large number of relays
but chose not to do anything about it
-
and, in fact, they were deploying an
attack. But, as you know, it’s often very
-
difficult to defend against unknown
attacks. So at the moment how to detect
-
malicious relays is a bit of an open
question, which I think is being
-
discussed on the mailing list.
-
The other one is defending against unknown
tampering at exits. If you take
-
the exit relays – the exit relay
can tamper with the traffic.
-
So we know particular types of attacks
doing SSL man-in-the-middles etc.
-
We’ve seen recently binary patching.
How do we detect unknown tampering
-
with traffic, other types of traffic? So
the binary tampering wasn’t spotted
-
until it was spotted by someone who
told the Tor Project. So it wasn’t
-
detected e.g. by the Tor Project
themselves, it was spotted by someone else
-
and notified to them. And then the final
one open on here is the Tor code review.
-
So the Tor code is open source. We know
from OpenSSL that, although everyone
-
can read source code, people don’t always
look at it. And OpenSSL has been
-
a huge mess, and there’s been
lots of stuff disclosed about that
-
over the last few days. There are
lots of eyes on the Tor code but I think
-
always, more eyes are better. I’d say,
ideally if we can get people to look
-
at the Tor code and look for
vulnerabilities then… I encourage people
-
to do that. It’s a very useful thing to
do. There could be unknown vulnerabilities
-
as we’ve seen with the “relay early” type
quite recently in the Tor code which
-
could be quite serious. The truth is we
just don’t know until people do thorough
-
code audits, and even then it’s very
difficult to know for certain.
-
So my last point, I think, yes,
-
is advice to future researchers.
So if you ever wanted, or are planning
-
on doing a study in the future, e.g. on
Tor, do not do what the CERT researchers
-
did and start deanonymising people on the
live Tor network and doing it in a way
-
which is incredibly irresponsible. I don’t
think… I mean, I tend, myself, to give them
-
the benefit of the doubt; I don’t think the
CERT researchers set out to be malicious.
-
I think they were just very naive
in what it was they were doing.
-
That was rapidly pointed out to them.
In my case we were running
-
40 relays. Our Tor relays were forwarding
traffic; they were acting as good relays.
-
The only thing that we were doing
was logging publication requests
-
to the directories. Big question whether
that’s malicious or not – I don’t know.
-
One thing that has been pointed out to me
is that the .onion addresses themselves
-
could be considered sensitive information,
so the only data we will be retaining
-
from the study is the aggregated data.
So we won't be retaining information
-
on individual .onion addresses because
that could potentially be considered
-
sensitive information. If you think about
someone running an .onion address which
-
contains something which they don’t want
other people knowing about. So we won’t
-
be retaining that data, and
we’ll be destroying it.
-
So I think that brings me now
to starting the questions.
-
I want to say “Thanks” to a couple of
people. The student who donated
-
the server to us. Nick Savage who is one
of my colleagues who was a sounding board
-
during the entire study. Ivan Pustogarov
who is the researcher at the University
-
of Luxembourg who sent us the large data
set of .onion addresses from last year.
-
He’s also the chap who has demonstrated
those deanonymisation attacks
-
that I talked about. A big "Thank you" to
Roger Dingledine who has, frankly…
-
I’ve presented loads of questions to him over
the last couple of days and he’s allowed me
-
to bounce ideas back and forth.
That has been a very useful process.
-
If you are doing future research I strongly
encourage you to contact the Tor Project
-
at the earliest opportunity. You’ll find
them… certainly I found them to be
-
extremely helpful.
-
Donncha also did something similar,
so both Ivan and Donncha have done
-
a similar study in trying to classify the
types of hidden services or work out
-
how many hits there are to particular
types of hidden service. Ivan Pustogarov
-
did it on a bigger scale
and found similar results to us.
-
That is that these abuse sites
featured frequently
-
in the top requested sites. That was done
over a year ago, and again, he was seeing
-
similar sorts of pattern. There were these
abuse sites being requested frequently.
-
So that also sort of corroborates
what we’re saying.
-
The data I put online is at this address;
there will also be the slides,
-
something called ‘The Tor Research
Framework’ which is an implementation
-
of a Java client, so an implementation
of a Tor client in Java specifically aimed
-
at researchers. So if e.g. you wanna pull
out data from a consensus you can do.
-
If you want to build custom routes
through the network you can do.
-
If you want to build routes through the
network and start sending padding traffic
-
down them you can do etc.
The code is designed in a way which is
-
easily modifiable
for testing lots of these things.
-
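For a flavour of the kind of task such a research client handles, here's a minimal parser for the "r" lines that describe each relay in a v3 network-status consensus (field layout per Tor's dir-spec; the sample line in the test is fabricated, not a real relay):

```python
def parse_router_status(line: str) -> dict:
    """Parse an 'r' line from a v3 network-status consensus:
    r nickname identity digest published-date published-time
      IP ORPort DirPort"""
    parts = line.split()
    if len(parts) != 9 or parts[0] != "r":
        raise ValueError("not a router status line")
    return {
        "nickname": parts[1],
        "identity_b64": parts[2],
        "digest_b64": parts[3],
        "published": parts[4] + " " + parts[5],
        "address": parts[6],
        "or_port": int(parts[7]),
        "dir_port": int(parts[8]),
    }
```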
There is also a link to the Tor FBI
exploit which they deployed against
-
visitors to some Tor hidden services last
year. They exploited a Mozilla Firefox bug
-
and then ran code on users who were
visiting these hidden services, and ran
-
code on their computer to identify them.
At this address there is a link to that
-
including a copy of the shell code and an
analysis of exactly what it was doing.
-
And then of course a list of references,
with papers and things.
-
So I’m quite happy to take questions now.
-
applause
-
Herald: Thanks for the nice talk!
Do we have any questions
-
from the internet?
-
Signal Angel: One question. It’s very hard
to block addresses since creating them
-
is cheap, and they can be generated
for each user, and rotated often. So
-
can you think of any other way
for doing the blocking?
-
Gareth: That is absolutely true, so, yes.
If you were to block a particular .onion
-
address they can just say: “Well, I’ll get another
.onion address.” So I don’t know of
-
any way to counter that now.
-
Herald: Another one from the internet?
inaudible answer from Signal Angel
-
Okay, then, Microphone 1, please!
-
Question: Thank you, that’s fascinating
research. You mentioned that it is
-
possible to influence the hash of your
relay node in a sense that you could
-
choose which service you are
advertising, or which hidden service
-
you are responsible for. Is that right?
Gareth: Yeah, correct!
-
Question: So could you elaborate
on how this is possible?
-
Gareth: So e.g. you just keep regenerating
a public key for your relay,
-
you’ll get closer and closer to the point
where you’ll be the responsible relay
-
for that particular hidden service. That’s
just – you keep regenerating your identity
-
hash until you’re at that particular point
on the circle. That’s not particularly
-
computationally intensive to do.
That was it?
-
Herald: Okay, next question
from Microphone 5, please.
-
Question: Hi, I was wondering for the
attacks where you identify a certain number
-
of users using a hidden service. Have
those attacks been used, or is there
-
any evidence there, and is there
any way of protecting against that?
-
Gareth: That’s a very interesting question,
is there any way to detect these types
-
of attacks? So some of the attacks,
if you’re going to generate particular
-
traffic patterns, one way to do that is to
use the padding cells. The padding cells
-
aren’t used at the moment by the official
Tor client. So the detection of those
-
could be indicative, but it’s
not conclusive evidence on its own.
-
Question: And is there any way of
protecting against a government
-
or something trying to denial-of-service
hidden services?
-
Gareth: So I… trying to… did not…
-
Question: Is it possible to protect
against this kind of attack?
-
Gareth: Not that I’m aware of. The Tor
Project are currently revising how they
-
do the hidden service protocol which will
make e.g. what I did, enumerating
-
the hidden services, much more difficult.
And also to position yourself on the
-
distributed hash table in advance
for a particular hidden service.
-
So they are at the moment trying to change
the way it’s done, and make some of
-
these things more difficult.
-
Herald: Good. Next question
from Microphone 2, please.
-
Mic2: Hi. I’m running the Tor2web abuse desk,
and so I used to see a lot of abuse requests
-
concerning the Tor hidden service
being exposed on the internet through
-
the Tor2Web.org domain name. And I just
wanted to comment on, like you said,
-
the number of abuse requests. I used
to speak with some of the child protection
-
agencies that reported abuse at
Tor2Web.org, and they are effectively
-
using crawlers that periodically look for
changes in order to get new images to be
-
put in the database. And what I was able
to understand is that the German agency
-
doing that is crawling the same sites that
the Italian agencies are crawling, too.
-
So it’s likely that in most of the
countries there are the child protection
-
agencies that are crawling those few
numbers of Tor hidden services that
-
contain child porn. And I saw it also
a bit from the statistics of Tor2Web
-
where the amount of abuse relating to
that kind of content is relatively low.
-
Just as contribution!
-
Gareth: Yes, that’s very interesting,
thank you for that!
-
applause
-
Herald: Next, Microphone 4, please.
-
Mic4: You then attacked or deanonymised
users with an infected or a modified Guard
-
relay? Is it required to modify the Guard
relay if I control the entry point
-
of the user to the internet?
If I’m his ISP?
-
Gareth: Yes, if you observe traffic
travelling into a Guard relay without
-
controlling the Guard relay itself.
Mic4: Yeah.
-
Gareth: In theory, yes. I wouldn’t be able
to tell you how reliable that is
-
off the top of my head.
Mic4: Thanks!
-
Herald: So another question
from the internet!
-
Signal Angel: Wouldn’t the ability to
choose the key hash prefix give
-
the ability to target specific .onions?
-
Gareth: So you can only target one .onion
address at a time. Because of the way
-
they are generated. So you wouldn’t be
able to say e.g. “Pick a key which targeted
-
two or more .onion addresses.” You can
only target one .onion address at a time
-
by positioning yourself at a particular
point on the distributed hash table.
-
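The reason a ground position covers only one address at a time is that each service's descriptor IDs are derived from its own permanent ID. A sketch of the v2 computation (following rend-spec-v2, omitting the optional descriptor cookie):

```python
import hashlib
import struct

def v2_descriptor_id(permanent_id: bytes, replica: int,
                     unix_time: int) -> bytes:
    """Derive a v2 hidden-service descriptor ID: the permanent ID is
    the first 10 bytes of SHA-1 of the service's public key (i.e. the
    decoded .onion address); the ID rotates with a daily time period
    offset by the first ID byte, and differs per replica."""
    time_period = (unix_time + (permanent_id[0] * 86400) // 256) // 86400
    secret_id_part = hashlib.sha1(
        struct.pack(">I", time_period) + bytes([replica])).digest()
    return hashlib.sha1(permanent_id + secret_id_part).digest()
```

Because the output is a fresh SHA-1 digest per service, a single ground fingerprint can only sit beside one service's descriptor IDs at once.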
Herald: Another one
from the internet? … Okay.
-
Then Microphone 3, please.
-
Mic3: Hey. Thanks for this research.
I think it strengthens the network.
-
So in the deem (?) I was wondering whether
you can donate these relays to be part of
-
non-malicious relays pool, basically
use them as regular relays afterwards?
-
Gareth: Okay, so can I donate the relays
to be rerun as Tor capacity (?)?
-
Unfortunately, I said they were run by
a student and they were donated for
-
a fixed period of time. So we’ve given
those back to him. We are very grateful
-
to him, he was very generous. In fact,
without his contribution donating these
-
it would have been much more difficult
to collect as much data as we did.
-
Herald: Good, next, Microphone 5, please!
-
Mic5: Yeah hi, first of all thanks
for your talk. I think you’ve raised
-
some real issues that need to be
considered very carefully by everyone
-
on the Tor Project. My question: I’d like
to go back to the issue with so many
-
abuse related web sites running over
the Tor Project. I think it’s an important
-
issue that really needs to be considered
because we don’t wanna be associated
-
with that at the end of the day.
Anyone who uses Tor, who runs a relay
-
or an exit node. And I understand it’s
a bit of a sensitive issue, and you don’t
-
really have any say over whether it’s
implemented or not. But I’d like to get
-
your opinion on the implementation
of a distributed block-deny system
-
that would run in very much a similar way
to those of the directory authorities.
-
I’d just like to see what
you think of that.
-
Gareth: So you’re asking me whether I want
to support a particular blocking mechanism
-
then?
-
Mic5: I’d like to get your opinion on it.
Gareth laughs
-
I know it’s a sensitive issue but I think,
like I said, I think something…
-
I think it needs to be considered because
everyone running exit nodes and relays
-
and people of the Tor Project don’t
want to be known or associated with
-
the massive number of abusive web sites
that currently exist within the Tor network.
-
Gareth: I absolutely agree, and I think
the Tor Project are horrified as well that
-
this problem exists; in fact, they have
talked in previous years about having
-
a problem with this type of content.
As to what, if anything, is
-
done about it, it’s very much up to them.
Could it be done in a distributed fashion?
-
So the example I gave was a way which
it could be done by relay operators.
-
So e.g. that would need the consensus of
a large number of relay operators to be
-
effective. So that is done in
a distributed fashion. The question is:
-
who gives the list of .onion addresses to
block to each of the relay operators?
-
Clearly, the relay operators aren’t going
to collect it themselves. It needs to be
-
supplied by someone trustworthy, such as
the Tor Project. Yes, it can
-
be done in a distributed fashion.
It can be done in an open fashion.
-
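For illustration only, a relay-side block of the kind discussed could look roughly like this. This is a hypothetical sketch, not anything implemented in Tor: it derives a v2 .onion name from an uploaded descriptor's DER-encoded public key (base32 of the first 80 bits of its SHA1) and checks it against a locally held blocklist. Who supplies and vouches for that list is exactly the open question raised above.

```python
import base64
import hashlib

def onion_name(der_pubkey):
    # v2 .onion name: base32 of the first 80 bits of SHA1(public key).
    permanent_id = hashlib.sha1(der_pubkey).digest()[:10]
    return base64.b32encode(permanent_id).decode("ascii").lower()

def accept_descriptor(der_pubkey, blocklist):
    # Hypothetical HSDir policy: store the uploaded descriptor only if the
    # service's .onion name is not on the operator's blocklist.
    return onion_name(der_pubkey) not in blocklist
```

Such a check is only effective if a large fraction of relay operators apply the same list, which is the consensus requirement Gareth mentions.
-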
Mic5: Who knows?
Gareth: Okay.
-
Mic5: Thank you.
-
Herald: Good. And another
question from the internet.
-
Signal Angel: Apparently there’s an option
in the Tor client to collect statistics
-
on hidden services. Do you know about
this, and how it relates to your research?
-
Gareth: Yes. To the extent that I know
about it, they’re gonna be trying this
-
next month, to try and estimate how many
-
hidden services there are. So keep
your eye on the Tor Project web site,
-
I’m sure they’ll be publishing
their data in the coming months.
-
Herald: And, sadly, we are running out of
time, so this will be the last question,
-
so Microphone 4, please!
-
Mic4: Hi, I’m just wondering if you could
sort of outline what ethical clearances
-
you had to get from your university
to conduct this kind of research.
-
Gareth: So we have to discuss these
types of things before undertaking
-
any research. And we go through the steps
to make sure that we’re not e.g. storing
-
sensitive information about particular
people. So yes, we are very mindful
-
of that. And that’s why I made a
particular point of putting on the slides
-
as to some of the things to consider.
-
Mic4: So like… you outlined a potential
implementation of the traffic correlation
-
attack. Are you saying that
you performed the attack? Or…
-
Gareth: No, no no, absolutely not.
So the link I’m giving… absolutely not.
-
We have not engaged in any…
-
Mic4: It just wasn’t clear
from the slides.
-
Gareth: I apologize. So, to be absolutely
clear on that: no, we’re not engaging
-
in any deanonymisation research on the
Tor network. The research I showed
-
is linked in the references, I think,
which I put at the end of the slides.
-
You can read about it. But it’s done in
simulation. So e.g. there’s a way
-
to do simulation of the Tor network on
a single computer. I can’t remember
-
the name of the project, though.
Shadow! Yes, it’s a system
-
called Shadow, which can run a large
number of Tor relays on a single computer
-
and simulate the traffic between them.
If you’re going to do that type of research
-
then you should use that. Okay,
thank you very much, everyone.
-
applause
-
silent postroll titles
-
subtitles created by c3subtitles.de
Join, and help us!