Return to Video

Dr Gareth Owen: Tor: Hidden Services and Deanonymisation

  • 0:00 - 0:10
    silent 31C3 preroll
  • 0:10 - 0:13
    Dr. Gareth Owen: Hello. Can you hear me?
    Yes. Okay. So my name is Gareth Owen.
  • 0:13 - 0:16
    I’m from the University of Portsmouth.
    I’m an academic
  • 0:16 - 0:19
    and I’m going to talk to you about
    an experiment that we did
  • 0:19 - 0:23
    on the Tor hidden services,
    trying to categorize them,
  • 0:23 - 0:25
    estimate how many there were, etc.
  • 0:25 - 0:27
    Well, as we go through the talk
    I’m going to explain
  • 0:27 - 0:31
    how Tor hidden services work internally,
    and how the data was collected.
  • 0:31 - 0:35
    So what sort of conclusions you can draw
    from the data based on the way that we’ve
  • 0:35 - 0:40
    collected it. Just so [that] I get
    an idea: how many of you use Tor
  • 0:40 - 0:42
    on a regular basis, could you
    put your hand up for me?
  • 0:42 - 0:46
    So quite a big number. Keep your hand
    up if… or put your hand up if you’re
  • 0:46 - 0:48
    a relay operator.
  • 0:48 - 0:51
    Wow, that’s quite a significant number,
    isn’t it? And then, put your hand up
  • 0:51 - 0:55
    and/or keep it up if you
    run a hidden service.
  • 0:55 - 1:00
    Okay, so, a smaller number, but still
    some people run hidden services.
  • 1:00 - 1:03
    Okay, so, some of you may be very familiar
    with the way Tor works, sort of,
  • 1:03 - 1:07
    at a low level. But I am gonna go through
    it for those who aren’t, so they understand
  • 1:07 - 1:10
    just how they work. And as we go along,
    because I’m explaining how
  • 1:10 - 1:14
    the hidden services work, I’m going
    to tag on information on how
  • 1:14 - 1:19
    the Tor hidden services themselves can be
    deanonymised and also how the users
  • 1:19 - 1:23
    of those hidden services can be
    deanonymised, if you put
  • 1:23 - 1:27
    some strict criteria on what it is you
    want to do with respect to them.
  • 1:27 - 1:31
    So the things that I’m going to go over:
    I wanna go over how Tor works,
  • 1:31 - 1:34
    and then specifically how hidden services
    work. I’m gonna talk about something
  • 1:34 - 1:38
    called the “Tor Distributed Hash Table”
    for hidden services. If you’ve heard
  • 1:38 - 1:41
    that term and don’t know what
    it means, don’t worry, I’ll explain
  • 1:41 - 1:44
    what a distributed hash table is and
    how it works. It’s not as complicated
  • 1:44 - 1:48
    as it sounds. And then I wanna go over
    Darknet data, so, data that we collected
  • 1:48 - 1:53
    from Tor hidden services. And as I say,
    as we go along I will sort of explain
  • 1:53 - 1:57
    how you do deanonymisation of both the
    services themselves and of the visitors
  • 1:57 - 2:02
    to the service. And just
    how complicated it is.
  • 2:02 - 2:07
    So you may have seen this slide which
    I think was from GCHQ, released last year
  • 2:07 - 2:12
    as part of the Snowden leaks where they
    said: “You can deanonymise some users
  • 2:12 - 2:16
    some of the time”, but they’ve had
    no success in deanonymising someone
  • 2:16 - 2:20
    in response to a specific request.
    So, given all of you, for example, I may be able
  • 2:20 - 2:25
    to deanonymise a small fraction of you
    but I can’t choose precisely one person
  • 2:25 - 2:27
    I want to deanonymise. That’s what
    I’m gonna be explaining in relation
  • 2:27 - 2:31
    to the deanonymisation attacks, how
    you can deanonymise a section but
  • 2:31 - 2:39
    you can’t necessarily choose which section
    of the users that you will be deanonymising.
  • 2:39 - 2:43
    Tor tries to solve a couple
    of different problems. On one hand
  • 2:43 - 2:46
    it allows you to bypass censorship. So if
    you’re in a country like China, which
  • 2:46 - 2:51
    blocks some types of traffic you can use
    Tor to bypass their censorship blocks.
  • 2:51 - 2:56
    It tries to give you privacy, so, at some
    level in the network someone can’t see
  • 2:56 - 2:59
    what you’re doing. And at another point
    in the network there are people who don’t know
  • 2:59 - 3:03
    who you are but may nevertheless
    be able to see what you’re doing.
  • 3:03 - 3:07
    Now the traditional case
    for this is to look at VPNs.
  • 3:07 - 3:11
    With a VPN you have
    sort of a single provider.
  • 3:11 - 3:15
    You have lots of users connecting
    to the VPN. The VPN has sort of
  • 3:15 - 3:18
    a mixing effect from an outside or
    a server’s point of view. And then
  • 3:18 - 3:22
    out of the VPN you see requests
    to Twitter, Wikipedia etc. etc.
  • 3:22 - 3:27
    And if that traffic isn’t encrypted then
    the VPN can also read the contents
  • 3:27 - 3:31
    of the traffic. Now of course there is
    a fundamental weakness with this.
  • 3:31 - 3:36
    You have to trust the VPN provider: the VPN
    provider knows both who you are
  • 3:36 - 3:40
    and what you’re doing and can
    link those two together with absolute
  • 3:40 - 3:44
    certainty. So whilst you do
    get some of these properties, assuming
  • 3:44 - 3:48
    you’ve got a trustworthy VPN provider
    you don’t get them in the face of
  • 3:48 - 3:52
    an untrustworthy VPN provider.
    And of course: how do you trust the VPN
  • 3:52 - 3:59
    provider? What sort of measure do
    you use? That’s sort of an open question.
  • 3:59 - 4:04
    So Tor tries to solve this problem
    by distributing the trust. Tor is
  • 4:04 - 4:08
    an open source project, so you can go
    on to their Git repository, you can
  • 4:08 - 4:13
    download the source code, and change it,
    improve it, submit patches etc.
  • 4:13 - 4:17
    As you heard earlier, during Jacob and
    Roger’s talk they’re currently partly
  • 4:17 - 4:21
    sponsored by the US Government which seems
    a bit paradoxical, but they explained
  • 4:21 - 4:25
    in that talk why that
    doesn’t affect their judgment.
  • 4:25 - 4:29
    And indeed, they do have some funding from
    other sources, and they designed the system
  • 4:29 - 4:31
    – which I’ll talk about a little bit
    later – in a way where they don’t have
  • 4:31 - 4:34
    to trust each other. So there’s sort of
    some redundancy, and they’re trying
  • 4:34 - 4:40
    to minimize these sort of trust issues
    related to this. Now, Tor is
  • 4:40 - 4:43
    a partially de-centralized network, which
    means that it has some centralized
  • 4:43 - 4:48
    components which are under the control of
    the Tor Project and some de-centralized
  • 4:48 - 4:51
    components which are normally the Tor
    relays. If you run a relay you’re
  • 4:51 - 4:56
    one of those de-centralized components.
    There is, however, no single authority
  • 4:56 - 5:01
    on the Tor network.
    So no single server which is responsible,
  • 5:01 - 5:04
    which you’re required to trust.
    So the trust is somewhat distributed,
  • 5:04 - 5:12
    but not entirely. When you establish
    a circuit through Tor you, the user,
  • 5:12 - 5:16
    download a list of all of the relays
    inside the Tor network.
  • 5:16 - 5:19
    And you get to pick – and I’ll tell you
    how you do that – which relays
  • 5:19 - 5:23
    you’re going to use to route your traffic
    through. So here is a typical example:
  • 5:23 - 5:27
    You’re here on the left hand side as the
    user. You download a list of the relays
  • 5:27 - 5:32
    inside the Tor network and you select from
    that list three nodes, a guard node
  • 5:32 - 5:37
    which is your entry into the Tor network,
    a relay node which is a middle node.
  • 5:37 - 5:39
    Essentially, it’s going to route your
    traffic to a third hop. And then
  • 5:39 - 5:43
    the third hop is the exit node where
    your traffic essentially exits out
  • 5:43 - 5:47
    on the internet. Now, looking at the
    circuit. So this is a circuit through
  • 5:47 - 5:50
    the Tor network through which you’re
    going to route your traffic. There are
  • 5:50 - 5:53
    three layers of encryption at the
    beginning, so between you
  • 5:53 - 5:56
    and the guard node. Your traffic
    is encrypted three times.
  • 5:56 - 5:59
    In the first instance encrypted to the
    guard, and then it’s encrypted again
  • 5:59 - 6:03
    to the relay, and then encrypted
    again to the exit, and as the traffic moves
  • 6:03 - 6:09
    through the Tor network each of those
    layers of encryption is peeled away
  • 6:09 - 6:17
    from the data. The Guard here in this case
    knows who you are, and the exit relay
  • 6:17 - 6:22
    knows what you’re doing but neither know
    both. And the middle relay doesn’t really
  • 6:22 - 6:27
    know a lot, except for which relay is
    her guard and which relay is her exit.
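A minimal sketch in Python of the layered encryption just described. This is illustrative only: real Tor uses AES-CTR inside fixed-size cells rather than Fernet, and every name here is invented.

```python
# Toy onion layering: the client wraps the payload once per hop and
# each relay peels exactly one layer. Requires the third-party
# 'cryptography' package; not Tor's actual cipher suite.
from cryptography.fernet import Fernet

hops = ["guard", "middle", "exit"]
keys = {hop: Fernet(Fernet.generate_key()) for hop in hops}

payload = b"GET / HTTP/1.1"
cell = payload
for hop in reversed(hops):      # innermost layer belongs to the exit
    cell = keys[hop].encrypt(cell)

for hop in hops:                # guard peels first, exit peels last
    cell = keys[hop].decrypt(cell)
assert cell == payload          # the exit sees the original request
```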
  • 6:27 - 6:32
    Who runs an exit relay? So if you run
    an exit relay all of the traffic which
  • 6:32 - 6:36
    users are sending out on the internet
    appears to come from your IP address.
  • 6:36 - 6:41
    So running an exit relay is potentially
    risky because someone may do something
  • 6:41 - 6:46
    through your relay which attracts attention.
    And then, when law enforcement
  • 6:46 - 6:49
    trace that back to an IP address it’s
    going to come back to your address.
  • 6:49 - 6:52
    So some relay operators have had trouble
    with this, with law enforcement coming
  • 6:52 - 6:55
    to them, and saying: “Hey we got this
    traffic coming through your IP address
  • 6:55 - 6:58
    and you have to go and explain it.”
    So if you want to run an exit relay
  • 6:58 - 7:01
    it’s a little bit risky, but we’re thankful
    for those people that do run exit relays
  • 7:01 - 7:05
    because ultimately if people didn’t run
    an exit relay you wouldn’t be able
  • 7:05 - 7:08
    to get out of the Tor network, and it
    wouldn’t be terribly useful from this
  • 7:08 - 7:21
    point of view. So, yes.
    applause
  • 7:21 - 7:25
    So every Tor relay, when you set up
    a Tor relay you publish something called
  • 7:25 - 7:29
    a descriptor which describes your Tor
    relay and how to use it to a set
  • 7:29 - 7:33
    of servers called the authorities. And the
    trust in the Tor network is essentially
  • 7:33 - 7:39
    split across these authorities. They’re run
    by the core Tor Project members.
  • 7:39 - 7:43
    And they maintain a list of all of the
    relays in the network. And they observe
  • 7:43 - 7:46
    them over a period of time. If the relays
    exhibit certain properties they give
  • 7:46 - 7:50
    the relays flags. If e.g. a relay allows
    traffic to exit from the Tor network
  • 7:50 - 7:54
    it will get the ‘Exit’ flag. If they’ve been
    switched on for a certain period of time,
  • 7:54 - 7:58
    or for a certain amount of traffic they’ll
    be allowed to become the guard relay
  • 7:58 - 8:02
    which is the first node in your circuit.
    So when you build your circuit you
  • 8:02 - 8:07
    download a list of these descriptors from
    one of the Directory Authorities. You look
  • 8:07 - 8:10
    at the flags which have been assigned to
    each of the relays, and then you pick
  • 8:10 - 8:14
    your route based on that. So you’ll pick
    the guard node from a set of relays
  • 8:14 - 8:16
    which have the ‘Guard’ flag, your exits
    from the set of relays which have
  • 8:16 - 8:21
    the ‘Exit’ flag etc. etc. Now, as of
    a quick count this morning there are
  • 8:21 - 8:29
    about 1500 guard relays, around 1000 exit
    relays, and six relays flagged as ‘bad’ exits.
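As a rough sketch of that flag-driven selection, assuming a toy consensus with invented relays and flags (real Tor also weights its choices by bandwidth):

```python
# Pick guard, middle and exit by flag, never reusing a relay.
# Relay data is invented for illustration.
import random

relays = [
    {"name": "alpha", "flags": {"Guard", "Fast", "Stable"}},
    {"name": "bravo", "flags": {"Fast"}},
    {"name": "carol", "flags": {"Exit", "Fast"}},
    {"name": "delta", "flags": {"Guard", "Exit", "Fast"}},
]

def pick(flag, exclude=()):
    pool = [r["name"] for r in relays
            if flag in r["flags"] and r["name"] not in exclude]
    return random.choice(pool)

guard = pick("Guard")
exit_node = pick("Exit", exclude={guard})
middle = pick("Fast", exclude={guard, exit_node})
print(guard, "->", middle, "->", exit_node)
```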
  • 8:29 - 8:34
    What does a ‘bad exit’ mean?
    waits for audience to respond
  • 8:34 - 8:38
    That’s not good! That’s exactly
    what it means! Yes! laughs
  • 8:38 - 8:40
    applause
  • 8:40 - 8:46
    So relays which have been flagged as ‘bad
    exits’ your client will never choose to exit
  • 8:46 - 8:51
    traffic through. And examples of things
    which may get a relay flagged as a
  • 8:51 - 8:54
    bad exit – if they’re fiddling with
    the traffic which is coming out of
  • 8:54 - 8:57
    the Tor relay. Or doing things like
    man-in-the-middle attacks against
  • 8:57 - 9:02
    SSL traffic. We’ve seen various things,
    there have been relays man-in-the-middling
  • 9:02 - 9:07
    SSL traffic, there has very, very recently
    been an exit relay which was patching
  • 9:07 - 9:11
    binaries that you downloaded from the
    internet, inserting malware into the binaries.
  • 9:11 - 9:15
    So you can do these things but the Tor
    Project tries to scan for them. And if
  • 9:15 - 9:20
    these things are detected then they’ll be
    flagged as ‘Bad Exits’. It’s true to say
  • 9:20 - 9:25
    that the scanning mechanism is not 100%
    fool-proof by any stretch of the imagination.
  • 9:25 - 9:29
    It tries to pick up common types
    of attacks, so as a result
  • 9:29 - 9:32
    it won’t pick up unknown attacks or
    attacks which haven’t been seen or
  • 9:32 - 9:37
    have not been known about beforehand.
  • 9:37 - 9:45
    So looking at this, how do you deanonymise
    the traffic travelling through the Tor
  • 9:45 - 9:49
    network? Given some traffic coming out
    of the exit relay, how do you know
  • 9:49 - 9:54
    which user that corresponds to? What is
    their IP address? You can’t actually
  • 9:54 - 9:58
    modify the traffic because if any of the
    relays tried to modify the traffic
  • 9:58 - 10:02
    which they’re sending through the network
    Tor will tear down the circuit through the relay.
  • 10:02 - 10:06
    So there are these integrity checks at each
    of the hops. And if you try to sort of
  • 10:06 - 10:10
    – because you can’t decrypt the packet
    you can’t modify it in any meaningful way,
  • 10:10 - 10:14
    and because there’s an integrity check
    at the next hop that means that you can’t
  • 10:14 - 10:17
    modify the packet because otherwise it’s
    detected. So you can’t do this sort of
  • 10:17 - 10:21
    marker, and try and follow the marker
    through the network. So instead
  • 10:21 - 10:27
    what you can do if you control… so let me
    give you two cases. In the worst case
  • 10:27 - 10:31
    if the attacker controls all three of your
    relays that you pick, which is an unlikely
  • 10:31 - 10:35
    scenario, as they’d need to control quite
    a big proportion of the network. Then
  • 10:35 - 10:40
    it should be quite obvious that they can
    work out who you are and also
  • 10:40 - 10:42
    see what you’re doing because in that
    case they can tag the traffic, and
  • 10:42 - 10:46
    they can just discard these integrity
    checks at each of the following hops.
  • 10:46 - 10:51
    Now in a different case, if you control
    the Guard relay and the exit relay
  • 10:51 - 10:54
    but not the middle relay the Guard relay
    can’t tamper with the traffic because
  • 10:54 - 10:58
    this middle relay will close down the
    circuit as soon as it happens.
  • 10:58 - 11:01
    The exit relay can’t send stuff back down
    the circuit to try and identify the user,
  • 11:01 - 11:05
    either. Because again, the circuit will be
    closed down. So what can you do?
  • 11:05 - 11:10
    Well, you can count the number of packets
    going through the Guard node. And you can
  • 11:10 - 11:15
    measure the timing differences between
    packets, and try and spot that pattern
  • 11:15 - 11:19
    at the Exit relays. You’re looking at counts of
    packets and the timing between those
  • 11:19 - 11:22
    packets which are being sent, and
    essentially trying to correlate them all.
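A toy version of that correlation, with made-up timestamps; real attacks use far more refined statistics, but the principle is the same:

```python
# Bucket packet timestamps into windows and correlate guard-side and
# exit-side counts; a high score suggests the same flow. Numbers are
# invented for illustration. Requires Python 3.10+ for 'correlation'.
from statistics import correlation

def bucket_counts(timestamps, window=0.5, horizon=10.0):
    counts = [0] * int(horizon / window)
    for t in timestamps:
        if 0 <= t < horizon:
            counts[int(t / window)] += 1
    return counts

guard_ts = [0.10, 0.15, 1.20, 1.25, 1.30, 4.00, 4.05, 7.70]
exit_ts = [t + 0.08 for t in guard_ts]   # same flow, shifted by latency

score = correlation(bucket_counts(guard_ts), bucket_counts(exit_ts))
print(f"correlation: {score:.2f}")       # close to 1.0 => likely a match
```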
  • 11:22 - 11:27
    So if a user happens to pick you as
    their Guard node, and then happens to pick
  • 11:27 - 11:32
    your exit relay, then you can deanonymise
    them with very high probability using
  • 11:32 - 11:36
    this technique. You’re just correlating
    the timings of packets and counting
  • 11:36 - 11:39
    the number of packets going through.
    And the attacks demonstrated in literature
  • 11:39 - 11:45
    are very reliable for this. We heard
    earlier from the Tor talk about the “relay
  • 11:45 - 11:51
    early” tag which was the attack discovered
    by the CERT researchers in the US.
  • 11:51 - 11:55
    That attack didn’t rely on timing attacks.
    Instead, what they were able to do was
  • 11:55 - 11:59
    send a special type of cell containing
    the data back down the circuit,
  • 11:59 - 12:02
    essentially marking this data, and saying:
    “This is the data we’re seeing
  • 12:02 - 12:06
    at the Exit relay, or at the hidden
    service", and encode into the messages
  • 12:06 - 12:10
    travelling back down the circuit, what the
    data was. And then you could pick
  • 12:10 - 12:14
    those up at the Guard relay and say, okay,
    whether it’s this person that’s doing that.
  • 12:14 - 12:18
    In fact, although this technique works,
    and yeah it was a very nice attack,
  • 12:18 - 12:21
    the traffic correlation attacks are
    actually just as powerful.
  • 12:21 - 12:25
    So although this bug has been fixed traffic
    correlation attacks still work and are
  • 12:25 - 12:30
    still fairly, fairly reliable. So the problem
    still does exist. This is very much
  • 12:30 - 12:33
    an open question. How do we solve this
    problem? We don’t know, currently,
  • 12:33 - 12:40
    how to solve this problem of trying
    to tackle the traffic correlation.
  • 12:40 - 12:45
    There are a couple of solutions.
    But they’re not particularly…
  • 12:45 - 12:49
    they’re not particularly reliable. Let me
    just go through these, and I’ll skip back
  • 12:49 - 12:53
    on the few things I’ve missed. The first
    thing is, high-latency networks, so
  • 12:53 - 12:57
    networks where packets are delayed
    in their transit through the network.
  • 12:57 - 13:01
    That throws away a lot of the timing
    information. So they promise
  • 13:01 - 13:04
    to potentially solve this problem.
    But of course, if you want to visit
  • 13:04 - 13:07
    Google’s home page, and you have to wait
    five minutes for it, you’re simply
  • 13:07 - 13:12
    just not going to use Tor. The whole point
    is trying to make this technology usable.
  • 13:12 - 13:15
    And if you got something which is very,
    very slow then it doesn’t make it
  • 13:15 - 13:18
    attractive to use. But of course,
    this case does work slightly better
  • 13:18 - 13:22
    for e-mail. If you think about it with
    e-mail, you don’t mind if your e-mail
  • 13:22 - 13:25
    – well, you may not mind, you may mind –
    you don’t mind if your e-mail is delayed
  • 13:25 - 13:29
    by some period of time. Which makes this
    somewhat difficult. And as Roger said
  • 13:29 - 13:35
    earlier, you can also introduce padding
    into the circuit, so these are dummy cells.
  • 13:35 - 13:40
    But, but… with a big caveat: some of the
    research suggests that actually you’d
  • 13:40 - 13:43
    need to introduce quite a lot of padding
    to defeat these attacks, and that would
  • 13:43 - 13:47
    overload the Tor network in its current
    state. So, again, not a particular
  • 13:47 - 13:54
    practical solution.
  • 13:54 - 13:58
    How does Tor try to solve this problem?
    Well, Tor makes it very difficult
  • 13:58 - 14:03
    to become a user’s Guard relay. If you
    can’t become a user’s Guard relay
  • 14:03 - 14:08
    then you don’t know who the user is, quite
    simply. And so by making it very hard
  • 14:08 - 14:13
    to become the Guard relay therefore you
    can’t do this traffic correlation attack.
  • 14:13 - 14:18
    So at the moment the Tor client chooses
    one Guard relay and keeps it for a period
  • 14:18 - 14:22
    of time. So if I want to sort of target
    just one of you I would need to control
  • 14:22 - 14:26
    the Guard relay that you were using at
    that particular point in time. And in fact
  • 14:26 - 14:31
    I’d also need to know what that Guard
    relay is. So by making it very unlikely
  • 14:31 - 14:34
    that you would select a particular malicious
    Guard relay, where the number of malicious
  • 14:34 - 14:39
    Guard relays is very small, that’s how Tor
    tries to solve this problem. And
  • 14:39 - 14:43
    at the moment your Guard relay is your
    barrier of security. If the attacker can’t
  • 14:43 - 14:46
    control the Guard relay then they won’t
    know who you are. That doesn’t mean
  • 14:46 - 14:51
    they can’t try other sort of side channel
    attacks by messing with the traffic
  • 14:51 - 14:55
    at the Exit relay etc. You know that you
    may sort of e.g. download dodgy documents
  • 14:55 - 14:59
    and open one on your computer, and those
    sort of things. Now the alternative
  • 14:59 - 15:03
    of course to having a Guard relay
    and keeping it for a very long time
  • 15:03 - 15:06
    will be to have a Guard relay and
    to change it on a regular basis.
  • 15:06 - 15:10
    Because you might think, well, just choosing
    one Guard relay and sticking with it
  • 15:10 - 15:13
    is probably a bad idea. But actually,
    that’s not the case. If you pick
  • 15:13 - 15:18
    the Guard relay, and assuming that the
    chance of picking a Guard relay that is
  • 15:18 - 15:23
    malicious is very low, then, when you
    first use your Guard relay, if you’ve made
  • 15:23 - 15:27
    a good choice, then your traffic is safe.
    If you haven’t made a good choice then
  • 15:27 - 15:32
    your traffic isn’t safe. Whereas if your
    Tor client chooses a Guard relay
  • 15:32 - 15:36
    every few minutes, or every hour, or
    something along those lines, at some point
  • 15:36 - 15:39
    you’re gonna pick a malicious Guard relay.
    So they’re gonna have some of your traffic
  • 15:39 - 15:43
    but not all of it. And so currently the
    trade-off is that we make it very difficult
  • 15:43 - 15:48
    for an attacker to control a Guard relay
    and the user picks a Guard relay and
  • 15:48 - 15:52
    keeps it for a long period of time. And
    so it’s very difficult for the attackers
  • 15:52 - 15:59
    to pick that Guard relay when they control
    a very small proportion of the network.
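A back-of-envelope simulation of that trade-off, assuming an invented 1% attacker share of guard capacity: with one long-lived guard most users stay safe the whole time, while daily rotation means nearly everyone eventually touches a bad guard.

```python
# Compare one long-lived guard against daily rotation. The attacker
# share and rotation rate are assumptions, not measured Tor values.
import random

BAD_FRACTION = 0.01            # attacker's assumed share of guard picks
ROTATIONS_PER_YEAR = 365       # one new guard per day

def ever_compromised(picks: int) -> bool:
    return any(random.random() < BAD_FRACTION for _ in range(picks))

users = 100_000
fixed = sum(ever_compromised(1) for _ in range(users))
rotating = sum(ever_compromised(ROTATIONS_PER_YEAR) for _ in range(users))
print(f"fixed guard:    {fixed / users:.1%} ever exposed")
print(f"daily rotation: {rotating / users:.1%} ever exposed")
```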
  • 15:59 - 16:06
    So this, currently, provides those
    properties I described earlier, the privacy
  • 16:06 - 16:11
    and the anonymity when you’re browsing the
    web, when you’re accessing websites etc.
  • 16:11 - 16:17
    But still you know who the website is. So
    although you’re anonymous and the website
  • 16:17 - 16:21
    doesn’t know who you are you know who the
    website is. And there may be some cases
  • 16:21 - 16:25
    where e.g. the website would also wish to
    remain anonymous. You want the person
  • 16:25 - 16:30
    accessing the website and the website
    itself to be anonymous to each other.
  • 16:30 - 16:34
    And you could think about people e.g.
    being in countries where running
  • 16:34 - 16:40
    a political blog e.g. might be a dangerous
    activity. If you run that on a regular
  • 16:40 - 16:46
    webserver you’re easily identified whereas,
    if you got some way where you as
  • 16:46 - 16:49
    the webserver can be anonymous then
    that allows you to do that activity without
  • 16:49 - 16:57
    being targeted by your government. So
    this is what hidden services try to solve.
  • 16:57 - 17:03
    Now when you first think about this problem
    you kind of think: “Hang on a second,
  • 17:03 - 17:06
    the user doesn’t know who the website
    is and the website doesn’t know
  • 17:06 - 17:10
    who the user is. So how on earth do they
    talk to each other?” Well, that’s essentially
  • 17:10 - 17:14
    what the Tor hidden service protocol tries
    to sort of set up. How do you identify and
  • 17:14 - 17:20
    connect to each other. So at the moment
    this is what happens: We’ve got Bob
  • 17:20 - 17:24
    on the [right] hand side who is the hidden
    service. And we got Alice on the left hand
  • 17:24 - 17:29
    side here who is the user who wishes to
    visit the hidden service. Now when Bob
  • 17:29 - 17:34
    sets up his hidden service he picks three
    nodes in the Tor network as introduction
  • 17:34 - 17:39
    points and builds several-hop circuits to
    them. So the introduction points don’t know
  • 17:39 - 17:45
    who Bob is. Bob has circuits to them. And
    Bob says to each of these introduction points
  • 17:45 - 17:48
    “Will you relay traffic to me if someone
    connects to you asking for me?”
  • 17:48 - 17:53
    And then those introduction points
    do that. So then, once Bob has picked
  • 17:53 - 17:57
    his introduction points he publishes
    a descriptor describing the list of his
  • 17:57 - 18:01
    introduction points for someone who wishes
    to come onto his website. And then Alice
  • 18:01 - 18:07
    on the left hand side wishing to visit Bob
    will pick a rendezvous point in the network
  • 18:07 - 18:10
    and build a circuit to it. So this “RP”
    here is the rendezvous point.
  • 18:10 - 18:15
    And she will relay a message via one of
    the introduction points saying to Bob:
  • 18:15 - 18:18
    “Meet me at the rendezvous point”.
    And then Bob will build a 3-hop-circuit
  • 18:18 - 18:23
    to the rendezvous point. So now at this
    stage we got Alice with a multi-hop circuit
  • 18:23 - 18:27
    to the rendezvous point, and Bob with
    a multi-hop circuit to the rendezvous point.
  • 18:27 - 18:33
    Alice and Bob haven’t connected to one
    another directly. The rendezvous point
  • 18:33 - 18:37
    doesn’t know who Bob is, the rendezvous
    point doesn’t know who Alice is.
  • 18:37 - 18:40
    All they’re doing is forwarding the
    traffic. And they can’t inspect the traffic,
  • 18:40 - 18:44
    either, because the traffic itself
    is encrypted.
  • 18:44 - 18:48
    So that’s currently how you solve this
    problem with trying to communicate
  • 18:48 - 18:51
    with someone who you don’t know
    who they are and vice versa.
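The message flow just described, condensed into a small Python sketch. There is no crypto or networking here, just the bookkeeping; the onion address and relay names are invented.

```python
# Minimal rendezvous walkthrough: Bob publishes a descriptor listing
# his introduction points; Alice fetches it and asks an introduction
# point to invite Bob to her chosen rendezvous point.
dht = {}   # stands in for the distributed descriptor database

def bob_setup(onion, intro_points):
    dht[onion] = {"intro_points": intro_points}

def alice_connect(onion, rendezvous_point):
    descriptor = dht[onion]                  # fetched over a circuit
    intro = descriptor["intro_points"][0]
    return f"via {intro}: meet me at {rendezvous_point}"

bob_setup("exampleonionaddr.onion", ["relay-17", "relay-42", "relay-99"])
print(alice_connect("exampleonionaddr.onion", "relay-7"))
# Bob now builds a multi-hop circuit to relay-7 and traffic flows,
# with neither side ever learning the other's IP address.
```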
  • 18:51 - 18:56
    drinks from the bottle
  • 18:56 - 18:59
    The principal thing I’m going to talk
    about today is this database.
  • 18:59 - 19:02
    So I said, Bob, when he picks his
    introduction points he builds this thing
  • 19:02 - 19:06
    called a descriptor, describing who his
    introduction points are, and he publishes
  • 19:06 - 19:10
    it to a database. This database itself
    is distributed throughout the Tor network.
  • 19:10 - 19:18
    It’s not a single server. So both Bob and
    Alice need to be able to publish information
  • 19:18 - 19:22
    to this database, and also retrieve
    information from this database. And Tor
  • 19:22 - 19:25
    currently uses something called
    a distributed hash table, which I’m gonna
  • 19:25 - 19:28
    give an example of what this means and
    how it works. And then I’ll talk to you
  • 19:28 - 19:34
    specifically how the Tor Distributed Hash
    Table works itself. So let’s say e.g.
  • 19:34 - 19:40
    you've got a set of servers. So here we've
    got 26 servers and you’d like to store
  • 19:40 - 19:44
    your files across these different servers
    without having a single server responsible
  • 19:44 - 19:48
    for deciding, “okay, that file is stored
    on that server, and this file is stored
  • 19:48 - 19:53
    on that server” etc. etc. Now here is my
    list of files. You could take a very naive
  • 19:53 - 19:58
    approach. And you could say: “Okay, I’ve
    got 26 servers, and all of these file names
  • 19:58 - 20:01
    each starting with a letter of the alphabet.”
    And I could say: “All of the files that begin
  • 20:01 - 20:05
    with A are gonna go on server A; all
    the files that begin with B are gonna go
  • 20:05 - 20:10
    on server B etc.” And then when you want
    to retrieve a file you say: “Okay, what
  • 20:10 - 20:14
    does my file name begin with?” And then
    you know which server it’s stored on.
  • 20:14 - 20:18
    Now of course you could have a lot of
    servers – sorry – a lot of files
  • 20:18 - 20:23
    which begin with a Z, an X or a Y etc. in
    which case you’re gonna overload
  • 20:23 - 20:27
    that server. You’re gonna have more files
    stored on one server than on another server
  • 20:27 - 20:32
    in your set. And if you have a lot of big
    files, say e.g. beginning with B then
  • 20:32 - 20:36
    rather than distributing your files across
    all the servers you’re gonna just be
  • 20:36 - 20:39
    overloading one or two of them. So to
    solve this problem what we tend to do is:
  • 20:39 - 20:42
    we take the file name, and we run it
    through a cryptographic hash function.
  • 20:42 - 20:47
    A hash function produces output which
    looks random: very small changes
  • 20:47 - 20:51
    in the input to a cryptographic hash
    function produce a very large change
  • 20:51 - 20:55
    in the output. And this change looks
    random. So if I take all of my file names
  • 20:55 - 21:00
    here, and assuming I have a lot more,
    I take a hash of them, and then I use
  • 21:00 - 21:05
    that hash to determine which server to
    store the file on. Then, with high probability
  • 21:05 - 21:10
    my files will be distributed evenly across
    all of the servers. And then when I want
  • 21:10 - 21:13
    to go and retrieve one of the files I take
    my file name, I run it through the
  • 21:13 - 21:16
    cryptographic hash function, that gives me
    the hash, and then I use that hash
  • 21:16 - 21:20
    to identify which server that particular
    file is stored on. And then I go and
  • 21:20 - 21:26
    retrieve it. So that’s sort of a loose
    idea of how a distributed hash table works.
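The hashing trick in miniature, assuming 26 invented servers (a simple modulo placement; real DHTs use consistent hashing, which copes better with servers joining and leaving):

```python
# Hash the file name and use the digest to pick a server; store and
# retrieve use the same mapping, so no central index is needed.
import hashlib

SERVERS = [f"server-{c}" for c in "abcdefghijklmnopqrstuvwxyz"]

def server_for(filename: str) -> str:
    digest = hashlib.sha256(filename.encode()).digest()
    return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]

print(server_for("holiday-photos.zip"))   # even spread, whatever the names
```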
  • 21:26 - 21:29
    There are a couple of problems with this.
    What if you’ve got a changing set: what
  • 21:29 - 21:35
    if the number of servers you’ve got changes
    over time, as it does in the Tor network?
  • 21:35 - 21:42
    That’s a very brief overview of the theory.
    So how does it apply for the Tor network?
  • 21:42 - 21:48
    Well, the Tor network has a set of relays
    and it has a set of hidden services.
  • 21:48 - 21:53
    Now we take all of the relays, and they
    have a hash identity which identifies them.
  • 21:53 - 21:57
    And we map them onto a circle using that
    hash value as an identifier. So you can
  • 21:57 - 22:03
    imagine the hash value ranging from zero
    to a very large number. We’ve got a zero point
  • 22:03 - 22:07
    at the very top there. And that runs all
    the way round to the very large number.
  • 22:07 - 22:12
    So given the identity hash for a relay we
    can map that to a particular point on
  • 22:12 - 22:19
    the circle. And then all we have to do
    is also do this for hidden services.
  • 22:19 - 22:22
    So there’s a hidden service address,
    something.onion, so this is
  • 22:22 - 22:28
    one of the hidden websites that you might
    visit. You take the – I’m not gonna describe
  • 22:28 - 22:34
    in too much detail how this is done but –
    the value is derived in such a way that
  • 22:34 - 22:38
    it’s evenly distributed around the circle.
    So your hidden service will have
  • 22:38 - 22:44
    a particular point on the circle. And the
    relays will also be mapped onto this circle.
  • 22:44 - 22:50
    So there’s the relays. And the hidden
    service. And in the case of Tor
  • 22:50 - 22:53
    the hidden service actually maps to two
    positions on the circle, and it publishes
  • 22:53 - 22:58
    its descriptor to the three relays to the
    right at one position, and the three relays
  • 22:58 - 23:02
    to the right at another position. So there
    are actually in total six places where
  • 23:02 - 23:05
    this descriptor is published on the
    circle. And then if I want to go and
  • 23:05 - 23:09
    fetch and connect to a hidden service
    I go and pull this hidden service descriptor
  • 23:09 - 23:14
    down to identify what its introduction
    points are. I take the hidden service
  • 23:14 - 23:17
    address, I find out where it is on the
    circle, I map all of the relays onto
  • 23:17 - 23:21
    the circle, and then I identify which
    relays on the circle are responsible
  • 23:21 - 23:24
    for that particular hidden service. And
    I just connect, then I say: “Do you have
  • 23:24 - 23:27
    a copy of the descriptor for that
    particular hidden service?”
  • 23:27 - 23:30
    And if so then we’ve got our list of
    introduction points. And we can go
  • 23:30 - 23:38
    to the next steps to connect to our hidden
    service.
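A sketch of locating the responsible directories on the circle, with toy hashing and invented names. Real Tor derives the descriptor ID from the service's key plus a daily time period, which is why the position moves every 24 hours; only one of the two replica positions is shown.

```python
# Map relays and the service onto the circle and take the three relays
# clockwise of the service's position.
import hashlib

def point(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

relays = sorted((f"relay-{i}" for i in range(40)), key=point)
service = point("exampleonionaddr.onion/2014-12-30")  # day-dependent

responsible = [r for r in relays if point(r) > service][:3]
responsible += relays[:3 - len(responsible)]   # wrap past zero if needed
print(responsible)
```

    So I’m gonna explain how we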
  • 23:38 - 23:41
    sort of set up our experiments. What we
    thought, or what we were interested in doing,
  • 23:41 - 23:48
    was collect publications of hidden
    services. So every time a hidden service
  • 23:48 - 23:52
    gets set up it publishes to this distributed
    hash table. What we wanted to do was
  • 23:52 - 23:56
    collect those publications so that we
    get a complete list of all of the hidden
  • 23:56 - 23:59
    services. And what we also wanted to do
    is to find out how many times a particular
  • 23:59 - 24:06
    hidden service is requested.
  • 24:06 - 24:11
    Just one more point that
    will become important later.
  • 24:11 - 24:14
    The position which the hidden service
    appears on the circle changes
  • 24:14 - 24:19
    every 24 hours. So there’s not
    a fixed position every single day.
  • 24:19 - 24:24
    If we run 40 nodes over a long period of
    time we will occupy positions within
  • 24:24 - 24:30
    that distributed hash table. And we will be
    able to collect publications and requests
  • 24:30 - 24:34
    for hidden services that are located at
    that position inside the distributed
  • 24:34 - 24:39
    hash table. So in that case we ran 40 Tor
    nodes, we had a student at university
  • 24:39 - 24:44
    who said: “Hey, I run a hosting company,
    I got loads of server capacity”, and
  • 24:44 - 24:47
    we told him what we were doing, and he
    said: “Well, you really helped us out,
  • 24:47 - 24:50
    these last couple of years…”
    and just gave us loads of server capacity
  • 24:50 - 24:56
    to allow us to do this. So we spun up 40
    Tor nodes. Each Tor node was required
  • 24:56 - 25:00
    to advertise a certain amount of bandwidth
    to become a part of that distributed
  • 25:00 - 25:02
    hash table. It’s actually a very small
    amount, so this didn’t matter too much.
  • 25:02 - 25:06
    And then, after 25 hours – this has changed
    in the last few days,
  • 25:06 - 25:10
    it used to be 25 hours, it’s just been
    increased as a result of one of the
  • 25:10 - 25:15
    attacks last week. But certainly
    during our study it was 25 hours. You then
  • 25:15 - 25:18
    appear at a particular point inside that
    distributed hash table. And you’re then
  • 25:18 - 25:23
    in a position to record publications of
    hidden services and requests for hidden
  • 25:23 - 25:28
    services. So not only can you get a full
    list of the onion addresses you can also
  • 25:28 - 25:32
    find out how many times each of the
    onion addresses are requested.
  • 25:32 - 25:38
    And so this is what we recorded. And then,
    once we had a full list of… or once
  • 25:38 - 25:42
    we had run for a long period of time to
    collect a long list of .onion addresses
  • 25:42 - 25:47
    we then built a custom crawler that would
    visit each of the Tor hidden services
  • 25:47 - 25:51
    in turn, and pull down the HTML contents,
    the text content from the web page,
  • 25:51 - 25:55
    so that we could go ahead and classify
    the content. Now it’s really important
  • 25:55 - 25:59
    to know here, and it will become obvious
    why a little bit later, we only pulled down
  • 25:59 - 26:03
    HTML content. We didn’t pull out images.
    And there’s a very, very important reason
  • 26:03 - 26:10
    for that which will become clear shortly.
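A sketch of a text-only fetch in the spirit of that crawler. It assumes a local Tor client listening on port 9050 and the third-party requests package with SOCKS support; the onion address is a placeholder.

```python
# Fetch a page through Tor's SOCKS proxy and keep only HTML; images
# and other media are never requested.
import requests

PROXIES = {"http": "socks5h://127.0.0.1:9050",
           "https": "socks5h://127.0.0.1:9050"}

def fetch_text(onion_url):
    resp = requests.get(onion_url, proxies=PROXIES, timeout=60)
    if "text/html" not in resp.headers.get("Content-Type", ""):
        return None               # skip anything that isn't a web page
    return resp.text

html = fetch_text("http://exampleonionaddr.onion/")  # placeholder address
```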
  • 26:10 - 26:14
    We had a lot of questions when we
    first started this. No one really knew
  • 26:14 - 26:18
    how many hidden services there were. It had
    been suggested to us there was a very high
  • 26:18 - 26:21
    turnover of hidden services. We wanted to
    confirm whether that was true or not.
  • 26:21 - 26:25
    And we also wanted to find out
    what the hidden services are,
  • 26:25 - 26:30
    how popular they are, etc. So
    our estimate for how many hidden services
  • 26:30 - 26:35
    there are, over the period which we
    ran our study, this is a graph plotting
  • 26:35 - 26:39
    our estimate for each of the individual
    days as to how many hidden services
  • 26:39 - 26:45
    there were on that particular day. Now the
    data is naturally noisy because we only
  • 26:45 - 26:49
    occupy a very small proportion of that circle.
    So we’re only observing a very small
  • 26:49 - 26:53
    proportion of the total publications and
    requests every single day, for each of
  • 26:53 - 26:57
    those hidden services. And if you
    take a long term average for this
  • 26:57 - 27:03
    there are about 45,000 hidden services that
    we think were present, on average,
  • 27:03 - 27:08
    each day, during our entire study. Which
    is a large number of hidden services.
  • 27:08 - 27:11
    But over the entire length we
    collected about 80,000 in total.
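The scaling logic behind an estimate like that is roughly the following; the figures in the example are invented placeholders, not the study's actual counts.

```python
# If our relays cover a fraction of the descriptor circle, scale the
# observed count up by the inverse of that coverage.
def estimate_total(observed: int, coverage: float) -> int:
    return round(observed / coverage)

# e.g. ~900 distinct services seen while covering ~2% of the circle
print(estimate_total(observed=900, coverage=0.02))   # -> 45000
```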
  • 27:11 - 27:14
    Some came and went etc.
    So the next question after how many
  • 27:14 - 27:18
    hidden services there are is how long
    the hidden service exists for.
  • 27:18 - 27:21
    Does it exist for a very long period
    of time, does it exist for a very short
  • 27:21 - 27:24
    period of time etc. etc.
    So what we did was, for every single
  • 27:24 - 27:30
    .onion address we plotted how many times
    we saw a publication for that particular
  • 27:30 - 27:34
    hidden service during the six months.
    How many times did we see it.
  • 27:34 - 27:38
    If we saw it a lot of times that suggested
    in general the hidden service existed
  • 27:38 - 27:42
    for a very long period of time. If we saw
    a very small number of publications
  • 27:42 - 27:46
    for each hidden service then that
    suggests that they were only present
  • 27:46 - 27:52
    for a very short period of time. This is
    our graph. By far the largest number
  • 27:52 - 27:56
    of hidden services we only saw once during
    the entire study. And we never saw them
  • 27:56 - 28:00
    again. This suggests that there’s a very high
    turnover of hidden services: they
  • 28:00 - 28:05
    don’t tend, on average, to exist for
    a very long period of time.
  • 28:05 - 28:11
    And then you can see the sort of
    a tail here. If we plot just those
  • 28:11 - 28:16
    hidden services which existed for a long
    time, so e.g. we could take hidden services
  • 28:16 - 28:20
    which have a high number of hit requests
    and say: “Okay, those that have a high number
  • 28:20 - 28:25
    of hits probably existed for a long time.”
    That’s not absolutely certain, but probably.
  • 28:25 - 28:29
    Then you see this sort of normal-looking plot
    peaking at about four to five, so we saw on average
  • 28:29 - 28:35
    most hidden services four or five times
    during the entire six months if they were
  • 28:35 - 28:41
    popular and we’re using that as a proxy
    measure for whether they existed
  • 28:41 - 28:48
    for the entire time. Now, this study was
    over 160 days, so almost six months.
  • 28:48 - 28:51
    What we also wanted to do was to try
    to confirm this over a longer period.
  • 28:51 - 28:56
    So last year, in 2013, about February time
    some researchers at the University
  • 28:56 - 29:00
    of Luxembourg also ran a similar study
    but it ran over a very short period of time
  • 29:00 - 29:05
    – a single day. But they did it in such
    a way that they could collect descriptors
  • 29:05 - 29:09
    across much of the circle during that single
    day. That was because of a bug in the way
  • 29:09 - 29:12
    Tor did some things, which has
    now been fixed, so we can’t repeat that
  • 29:12 - 29:17
    particular approach. So we got a list of
    .onion addresses from February 2013
  • 29:17 - 29:19
    from these researchers at the University
    of Luxemburg. And then we got our list
  • 29:19 - 29:24
    of .onion addresses from this six months
    which was March to September of this year.
  • 29:24 - 29:27
    And we wanted to say, okay, we’re given
    these two sets of .onion addresses.
  • 29:27 - 29:31
    Which .onion addresses existed in their set
    but not ours and vice versa, and which
  • 29:31 - 29:40
    .onion addresses existed in both sets?
  • 29:40 - 29:46
    So as you can see a very small minority
    of hidden service addresses existed
  • 29:46 - 29:50
    in both sets. This is over an 18 month
    period between these two collection points.
  • 29:50 - 29:54
    A very small number of services existed
    in both their data set and in
  • 29:54 - 29:58
    our data set. Which again suggested
    there’s a very high turnover of hidden
  • 29:58 - 30:03
    services that don’t tend to exist
    for a very long period of time.
  • 30:03 - 30:07
    So the question is why is that?
    Which we’ll come on to a little bit later.
  • 30:07 - 30:11
    It’s a very valid question, can’t answer
    it 100%, but we have some inkling as to
  • 30:11 - 30:16
    why that may be the case. So in terms
    of popularity which hidden services
  • 30:16 - 30:20
    did we see, or which .onion addresses
    did we see requested the most?
  • 30:20 - 30:27
    Which got the largest number of hits? Or the
    largest number of directory requests?
  • 30:27 - 30:30
    So botnet Command & Control servers
    – if you’re not familiar with what
  • 30:30 - 30:34
    a botnet is, the idea is to infect lots of
    people with a piece of malware.
  • 30:34 - 30:38
    And this malware phones home to
    a Command & Control server where
  • 30:38 - 30:42
    the botnet master can give instructions
    to each of the bots to do things.
  • 30:42 - 30:47
    So it might be e.g. to collect passwords,
    key strokes, banking details.
  • 30:47 - 30:51
    Or it might be to do things like
    Distributed Denial of Service attacks,
  • 30:51 - 30:55
    or to send spam, those sorts of things.
    And a couple of years ago someone gave
  • 30:55 - 31:01
    a talk and said: “Well, the problem with
    running a botnet is your C&C servers
  • 31:01 - 31:06
    are vulnerable.” Once a C&C server is taken
    down you no longer have control over
  • 31:06 - 31:10
    your botnet. So it’s been a sort of arms
    race between anti-virus companies and
  • 31:10 - 31:15
    malware authors, who try to come up
    with techniques to run C&C servers in a way
  • 31:15 - 31:18
    which they can’t be taken down. And
    a couple of years ago someone gave a talk
  • 31:18 - 31:22
    at a conference that said: “You know what?
    It would be a really good idea if botnet
  • 31:22 - 31:26
    C&C servers were run as Tor hidden
    services because then no one knows
  • 31:26 - 31:29
    where they are, and in theory they can’t
    be taken down.” So in fact we see this:
  • 31:29 - 31:33
    there are loads and loads and loads of
    these addresses associated with several
  • 31:33 - 31:38
    different botnets, ‘Sefnit’ and ‘Skynet’.
    Now Skynet is the one I wanted to talk
  • 31:38 - 31:43
    to you about because the guy that runs
    Skynet had a twitter account, and he also
  • 31:43 - 31:47
    did a Reddit AMA. If you’ve not heard
    of a Reddit AMA before, that’s a Reddit
  • 31:47 - 31:52
    ask-me-anything. You can go on the website
    and ask the guy anything. So this guy
  • 31:52 - 31:55
    wasn’t hiding in the shadows. He’d say:
    “Hey, I’m running this massive botnet,
  • 31:55 - 31:58
    here’s my Twitter account which I update
    regularly, here is my Reddit AMA where
  • 31:58 - 32:02
    you can ask me questions!” etc.
  • 32:02 - 32:05
    He was arrested last year, which is not,
    perhaps, a huge surprise.
  • 32:05 - 32:12
    laughter and applause
  • 32:12 - 32:16
    But… so he was arrested,
    his C&C servers disappeared
  • 32:16 - 32:22
    but there were still infected hosts trying
    to connect with the C&C servers and
  • 32:22 - 32:24
    request access to the C&C server.
  • 32:24 - 32:28
    This is why we’re saying: “A large number
    of hits.” So all of these requests are
  • 32:28 - 32:32
    failed requests, i.e. we didn’t have
    a descriptor for them because
  • 32:32 - 32:35
    the hidden service had gone away but
    there were still clients requesting each
  • 32:35 - 32:38
    of the hidden services.
  • 32:38 - 32:42
    And the next thing we wanted to do was
    to try and categorize sites. So, as I said
  • 32:42 - 32:46
    earlier, we crawled all of the hidden
    services that we could, and we classified
  • 32:46 - 32:50
    them into different categories based
    on what the type of content was
  • 32:50 - 32:54
    on the hidden service site. The first
    graph I have is the number of sites
  • 32:54 - 32:58
    in each of the categories. So you can see
    down the bottom here we got lots of
  • 32:58 - 33:04
    different categories. We got drugs, market
    places, etc. on the bottom. And the graph
  • 33:04 - 33:07
    shows the percentage of the hidden
    services that we crawled that fit in
  • 33:07 - 33:13
    to each of these categories. So e.g. looking
    at this: the largest number of sites
  • 33:13 - 33:16
    that we crawled were
    drugs-focused websites, followed by
  • 33:16 - 33:21
    market places etc. There’s a couple of
    questions you might have here,
  • 33:21 - 33:26
    so which ones are gonna stick out, what
    does ‘porn’ mean, well, you know
  • 33:26 - 33:31
    what ‘porn’ means. There are some very
    notorious porn sites on the Tor Darknet.
  • 33:31 - 33:34
    There was one in particular which was
    focused on revenge porn. It turns out
  • 33:34 - 33:38
    that youngsters wish to take pictures
    of themselves, and send it to their
  • 33:38 - 33:45
    boyfriends or their girlfriends. And
    when they get dumped they publish them
  • 33:45 - 33:50
    on these websites. So there were several
    of these sites on the main internet
  • 33:50 - 33:53
    which have mostly been shut down.
    And some of these sites were archived
  • 33:53 - 33:58
    on the Darknet. The second one you
    should probably wonder about
  • 33:58 - 34:03
    is ‘abuse’. Abuse was… every single
    site we classified in this category
  • 34:03 - 34:08
    was a child abuse site. So they were in
    some way facilitating child abuse.
  • 34:08 - 34:11
    And how do we know that? Well, the data
    that came back from the crawler
  • 34:11 - 34:15
    made it completely unambiguous as to what
    the content was in these sites. That was
  • 34:15 - 34:19
    completely obvious, from the content, from
    the crawler as to what was on these sites.
  • 34:19 - 34:23
    And this is the principal reason why we
    didn’t pull down images from sites.
  • 34:23 - 34:26
    There are many countries where it
    would be a criminal offence to do so.
  • 34:26 - 34:30
    So our crawler only pulled down text
    content from all of these sites, and that
  • 34:30 - 34:34
    enabled us to classify them, based on
    that. We didn’t pull down any images.
  • 34:34 - 34:38
    So of course the next thing we’d like to do
    is to say: “Okay, well, given each of these
  • 34:38 - 34:43
    categories, what proportion of directory
    requests went to each of the categories?”
  • 34:43 - 34:45
    Now the next graph is going to need some
    explaining as to precisely what it
  • 34:45 - 34:52
    means, and I’m gonna give that. This is
    the proportion of directory requests
  • 34:52 - 34:56
    which we saw that went to each of the
    categories of hidden service that we
  • 34:56 - 35:00
    classified. As you can see, in fact, we
    saw a very large number going to these
  • 35:00 - 35:05
    abuse sites. And the rest sort of
    distributed right there, at the bottom.
  • 35:05 - 35:07
    And the question is: “What is it
    we’re collecting here?”
  • 35:07 - 35:12
    We’re collecting successful hidden service
    directory requests. What does a hidden
  • 35:12 - 35:17
    service directory request mean?
    It probably loosely correlates with
  • 35:17 - 35:22
    either a visit or a visitor. So somewhere
    in between those two. Because when you
  • 35:22 - 35:27
    want to visit a hidden service you make
    a request for the hidden service descriptor
  • 35:27 - 35:31
    and that allows you to connect to it
    and browse through the web site.
  • 35:31 - 35:35
    But there are cases where, e.g. if you
    restart Tor, you’ll go back and you
  • 35:35 - 35:40
    re-fetch the descriptor. So in that case
    we’ll count that twice, for example.
  • 35:40 - 35:43
    What proportion of these are people,
    and which proportion of them are
  • 35:43 - 35:47
    something else? The answer to that is
    we just simply don’t know.
  • 35:47 - 35:50
    We've got directory requests but that doesn’t
    tell us about what they’re doing on these
  • 35:50 - 35:55
    sites, what they’re fetching, or who
    indeed they are, or what it is they are.
  • 35:55 - 35:59
    So these could be automated requests,
    they could be human beings. We can’t
  • 35:59 - 36:04
    distinguish between those two things.
  • 36:04 - 36:06
    What are the limitations?
  • 36:06 - 36:12
    A hidden service directory request neither
    exactly correlates to a visit nor a visitor.
  • 36:12 - 36:16
    It’s probably somewhere in between.
    So you can’t say whether it’s exactly one
  • 36:16 - 36:20
    or the other. We cannot say whether
    a hidden service directory request
  • 36:20 - 36:26
    is a person or something automated.
    We can’t distinguish between those two.
  • 36:26 - 36:32
    Any type of site could be targeted by e.g.
    DoS attacks, by web crawlers which would
  • 36:32 - 36:40
    greatly inflate the figures. If you were
    to do a DoS attack it’s likely you’d only
  • 36:40 - 36:45
    request a small number of descriptors.
    You’d actually be flooding the site itself
  • 36:45 - 36:48
    rather than the directories. But, in
    theory, you could flood the directories.
  • 36:48 - 36:53
    But we didn’t see any sort of shutdown
    of our directories based on flooding, e.g.
  • 36:53 - 36:59
    Whilst we can’t rule that out, it doesn’t
    seem to fit too well with what we’ve got.
  • 36:59 - 37:03
    The other question is ‘crawlers’.
    I obviously talked with the Tor Project
  • 37:03 - 37:09
    about these results and they’ve suggested
    that there are groups, so the child
  • 37:09 - 37:13
    protection agencies e.g. that will crawl
    these sites on a regular basis. And,
  • 37:13 - 37:16
    again, that doesn’t necessarily correlate
    with a human being. And that could
  • 37:16 - 37:20
    inflate the figures. How many hidden service
    directory requests would there be
  • 37:20 - 37:25
    if a crawler were pointed at a site? Typically,
    if I crawl them on a single day, one request.
  • 37:25 - 37:28
    But if they got a large number of servers
    doing the crawling then it could be
  • 37:28 - 37:33
    a request per day for every single server.
    So, again, I can’t give you, definitive,
  • 37:33 - 37:38
    “yes, this is human beings” or
    “yes, this is automated requests”.
  • 37:38 - 37:43
    The other important point is, these two
    content graphs are only hidden services
  • 37:43 - 37:49
    offering web content. There are hidden
    services that do things, e.g. IRC,
  • 37:49 - 37:52
    instant messaging, etc. Those aren’t
    included in these figures. We’re only
  • 37:52 - 37:58
    concentrating on hidden services offering
    web sites. They’re HTTP services, or HTTPS
  • 37:58 - 38:02
    services. Because that allows us to easily
    classify them. And in fact, for some of
  • 38:02 - 38:06
    the other types, like IRC and Jabber, the
    results would probably not be directly comparable
  • 38:06 - 38:09
    with web sites. The use
    case for using them is probably
  • 38:09 - 38:16
    slightly different. So I appreciate the
    last graph is somewhat alarming.
  • 38:16 - 38:21
    If you have any questions please ask
    either me or the Tor developers
  • 38:21 - 38:25
    as to how to interpret these results. It’s
    not quite as straightforward as it may
  • 38:25 - 38:28
    look when you look at the graph. You
    might look at the graph and say: “Hey,
  • 38:28 - 38:31
    that looks like there’s lots of people
    visiting these sites”. It’s difficult
  • 38:31 - 38:40
    to conclude that from the results.
  • 38:40 - 38:46
    The next slide is gonna be very
    contentious. I will prefix it with:
  • 38:46 - 38:51
    “I’m not advocating -any- kind of
    action whatsoever. I’m just trying
  • 38:51 - 38:56
    to describe technically as to what could
    be done. It’s not up to me to make decisions
  • 38:56 - 39:03
    on these types of things.” So, of course,
    when we found this out, frankly, I think
  • 39:03 - 39:06
    we were stunned. I mean, it took us
    several days, frankly, it just stunned us,
  • 39:06 - 39:10
    “what the hell, this is not
    what we expected at all.”
  • 39:10 - 39:13
    So a natural step is, well, we think, most
    of us think that Tor is a great thing,
  • 39:13 - 39:19
    it seems. Could this problem be sorted out
    while still keeping Tor as it is?
  • 39:19 - 39:22
    And probably the next step is to say: “Well,
    okay, could we just block this class
  • 39:22 - 39:26
    of content and not other types of content?”
    So could we block just hidden services
  • 39:26 - 39:30
    that are associated with these sites and
    not other types of hidden services?
  • 39:30 - 39:33
    We thought there are three ways in which
    we could block hidden services.
  • 39:33 - 39:37
    And I’ll talk about whether these will
    still be possible in the coming months,
  • 39:37 - 39:39
    after explaining them. But during our
    study these would have been possible
  • 39:39 - 39:44
    and presently they are possible.
  • 39:44 - 39:49
    A single individual could shut down
    a single hidden service by controlling
  • 39:49 - 39:54
    all of the relays which are responsible
    for receiving a publication request
  • 39:54 - 39:57
    on that distributed hash table. It’s
    possible to place one of your relays
  • 39:57 - 40:01
    at a particular position on that circle
    and so therefore make yourself be
  • 40:01 - 40:04
    the responsible relay for
    a particular hidden service.
  • 40:04 - 40:08
    And if you control all of the six relays
    which are responsible for a hidden service,
  • 40:08 - 40:11
    when someone comes to you and says:
    “Can I have a descriptor for that site”
  • 40:11 - 40:16
    you can just say: “No, I haven’t got it”.
    And provided you control those relays
  • 40:16 - 40:21
    users won’t be able to fetch those sites.
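A sketch of that positioning step under the toy circle model from earlier: grind candidate identities until one hashes just clockwise of the target's descriptor ID. In real Tor the position comes from the relay's identity key fingerprint, so the grinding is over keys.

```python
# Brute-force an identity that lands within a window clockwise of the
# target's position on the circle. Toy hashing, invented names.
import hashlib
from itertools import count

def point(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

target = point("target-descriptor-id")        # today's descriptor position

def grind_identity(window: int) -> str:
    for i in count():
        candidate = f"my-relay-{i}"
        if target < point(candidate) <= target + window:
            return candidate                  # now among the responsible set

print(grind_identity(window=2**152))          # wide toy window, halts fast
```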
  • 40:21 - 40:25
    The second option – I’ll talk about
    the Tor Project blocking these in a second –
  • 40:25 - 40:29
    is at the relay operator level. Could I
  • 40:29 - 40:32
    as a relay operator say: “Okay, as
    a relay operator I don’t want to carry
  • 40:32 - 40:36
    this type of content, and I don’t want to
    be responsible for serving up this type
  • 40:36 - 40:40
    of content.” A relay operator could patch
    his relay and say: “You know what,
  • 40:40 - 40:44
    if anyone comes to this relay requesting
    anyone of these sites then, again, just
  • 40:44 - 40:49
    refuse to do it”. The problem is a lot of
    relay operators need to do it. So a very,
  • 40:49 - 40:52
    very large number of the potential relay
    operators would need to do that
  • 40:52 - 40:56
    to effectively block these sites. The
    final option is the Tor Project could
  • 40:56 - 41:01
    modify the Tor program and actually embed
    these addresses in the Tor program itself
  • 41:01 - 41:05
    so that all relays by default both
    block hidden service directory requests
  • 41:05 - 41:11
    to these sites, and also clients themselves
    would say: “Okay, if anyone’s requesting
  • 41:11 - 41:15
    these block them at the client level.”
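    As a sketch of what such a filter could look like – purely hypothetical,
    since Tor itself is written in C and this hook is invented for
    illustration:

```java
import java.util.Set;

// Sketch of the "relay operator patch": refuse to store or serve
// descriptors whose ID is on a blocklist. The allow() hook is hypothetical,
// not a real Tor internal.
public class BlockingDirectory {
    private final Set<String> blockedDescriptorIds; // supplied by a curator

    BlockingDirectory(Set<String> blockedDescriptorIds) {
        this.blockedDescriptorIds = blockedDescriptorIds;
    }

    // Called when a client publishes or fetches a descriptor at this HSDir.
    boolean allow(String descriptorIdBase32) {
        // pretend we simply don't have it: "No, I haven't got it"
        return !blockedDescriptorIds.contains(descriptorIdBase32);
    }
}
```

    Because descriptor IDs rotate with the time period, the blocklist would
    have to be recomputed from the .onion names every day – one more reason
    the question of who curates the list matters.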
    Now I hasten to add: I’m not advocating
  • 41:15 - 41:18
    any kind of action; that is entirely up to
    other people, because, frankly, I think
  • 41:18 - 41:23
    if I advocated blocking hidden services
    I probably wouldn’t make it out alive,
  • 41:23 - 41:27
    so I’m just saying: this is a description
    of what technical measures could be used
  • 41:27 - 41:31
    to block some classes of sites. And of
    course there’s lots of questions here.
  • 41:31 - 41:35
    If e.g. the Tor Project themselves decided:
    “Okay, we’re gonna block these sites”
  • 41:35 - 41:38
    that means they are essentially
    in control of the block list.
  • 41:38 - 41:41
    The block list would be somewhat public
    so everyone would be able to inspect
  • 41:41 - 41:45
    what the sites are that are being blocked
    and they would be in control of some kind
  • 41:45 - 41:54
    of block list. Which, you know, arguably
    is against what the Tor Project is after.
  • 41:54 - 42:00
    takes a sip, coughs
  • 42:00 - 42:05
    So how about deanonymising visitors
    to hidden service web sites?
  • 42:05 - 42:09
    So in this case we got a user on the
    left-hand side who is connected to
  • 42:09 - 42:13
    a Guard node. We’ve got a hidden service
    on the right-hand side who is connected
  • 42:13 - 42:18
    to a Guard node and on the top we got
    one of those directory servers which is
  • 42:18 - 42:22
    responsible for serving up those
    hidden service directory requests.
  • 42:22 - 42:29
    Now, when you first want to connect to
    a hidden service you connect through
  • 42:29 - 42:32
    your Guard node and through a couple of hops
    up to the hidden service directory and
  • 42:32 - 42:36
    you request the descriptor from them.
    So at this point if you are the attacker
  • 42:36 - 42:39
    and you control one of the hidden service
    directory nodes for a particular site
  • 42:39 - 42:43
    you can send back down the circuit
    a particular pattern of traffic.
  • 42:43 - 42:48
    And if you control that user’s
    Guard node – which is a big if –
  • 42:48 - 42:52
    then you can spot that pattern of traffic
    at the Guard node. The question is:
  • 42:52 - 42:57
    “How do you control a particular user’s
    Guard node?” That’s very, very hard.
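    For illustration, the Guard-side matching step might look something like
    the following; the signature (burst gaps, tolerance) is entirely made up
    here, and real correlation attacks in the literature are considerably
    more robust.

```java
import java.util.List;

// Toy illustration of spotting an injected traffic signature at a Guard:
// the malicious directory sends cells separated by distinctive gaps, and
// the Guard looks for those gaps on each circuit it carries.
public class SignatureMatcher {
    static final long[] SIGNATURE_GAPS_MS = {300, 600, 300}; // made up
    static final long TOLERANCE_MS = 50;

    // timestamps (ms) of cells observed on one circuit at the Guard
    static boolean matches(List<Long> cellTimesMs) {
        outer:
        for (int s = 0; s + SIGNATURE_GAPS_MS.length < cellTimesMs.size(); s++) {
            for (int i = 0; i < SIGNATURE_GAPS_MS.length; i++) {
                long gap = cellTimesMs.get(s + i + 1) - cellTimesMs.get(s + i);
                if (Math.abs(gap - SIGNATURE_GAPS_MS[i]) > TOLERANCE_MS) {
                    continue outer;
                }
            }
            return true; // this circuit carried the pattern, and the Guard
                         // sees the user's real IP address
        }
        return false;
    }
}
```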
  • 42:57 - 43:01
    But if e.g. I run a hidden service and all
    of you visit my hidden service, and
  • 43:01 - 43:06
    I’m running a couple of dodgy Guard relays
    then the probability is that some of you,
  • 43:06 - 43:10
    certainly not all of you by any stretch will
    select my dodgy Guard relay, and
  • 43:10 - 43:13
    I could deanonymise you, but I couldn’t
    deanonymise the rest of you.
  • 43:13 - 43:18
    So what we’re saying here is that
    you can deanonymise some of the users
  • 43:18 - 43:22
    some of the time but you can’t pick which
    users those are that you’re going to
  • 43:22 - 43:27
    deanonymise. You can’t deanonymise someone
    specific but you can deanonymise a fraction
  • 43:27 - 43:32
    based on what fraction of the network you
    control in terms of Guard capacity.
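    A back-of-the-envelope way to see this, assuming guard selection is
    simply weighted by bandwidth and ignoring guard rotation and
    load-balancing details:

```java
// If you control fraction p of Guard capacity, roughly that fraction of
// clients picks your Guard; with k guards per client the chance that at
// least one of them is yours is 1 - (1 - p)^k.
public class GuardOdds {
    public static void main(String[] args) {
        double p = 0.01; // assume we run 1% of Guard capacity
        for (int k : new int[]{1, 3}) {
            double atLeastOne = 1 - Math.pow(1 - p, k);
            System.out.printf("k=%d guards: P(at least one malicious) = %.4f%n",
                    k, atLeastOne);
        }
        // ~1% of users with one guard, ~3% with three - but never a user
        // of your choosing
    }
}
```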
  • 43:32 - 43:36
    How about… so the attacker controls those
    two – here’s a picture from research at
  • 43:36 - 43:40
    the University of Luxembourg which
    did this. And these are plots of
  • 43:40 - 43:45
    taking the user’s IP address visiting
    a C&C server, and then geolocating it
  • 43:45 - 43:48
    and putting it on a map. So “where was the
    user located when they called one of
  • 43:48 - 43:52
    the Tor hidden services?” So, again,
    this is a selection, a percentage
  • 43:52 - 43:58
    of the users visiting C&C servers
    using this technique.
  • 43:58 - 44:04
    How about deanonymising hidden services
    themselves? Well, again, you got a problem.
  • 44:04 - 44:08
    You’re the user. You’re gonna connect
    through your Guard into the Tor network.
  • 44:08 - 44:12
    And then, eventually, through the hidden
    service’s Guard node, and talk to
  • 44:12 - 44:17
    the hidden service. As the attacker you
    need to control the hidden service’s
  • 44:17 - 44:21
    Guard node to do these traffic correlation
    attacks. So again, it’s very difficult
  • 44:21 - 44:24
    to deanonymise a specific Tor hidden
    service. But if you think about, okay,
  • 44:24 - 44:30
    there are 1,000 Tor hidden services, if you
    can control a percentage of the Guard nodes
  • 44:30 - 44:34
    then some hidden services will pick you
    and then you’ll be able to deanonymise those.
  • 44:34 - 44:37
    So provided you don’t care which hidden
    services you’re going to deanonymise
  • 44:37 - 44:41
    then it becomes much more straightforward
    to control the Guard nodes of some hidden
  • 44:41 - 44:45
    services but you can’t pick exactly
    what those are.
  • 44:45 - 44:51
    So what sort of data can you see
    traversing a relay?
  • 44:51 - 44:56
    This is a modified Tor client which just
    dumps cells which are coming…
  • 44:56 - 44:59
    essentially packets travelling down
    a circuit, and the information you can
  • 44:59 - 45:04
    extract from them at a Guard node.
    And this is done off the main Tor network.
  • 45:04 - 45:09
    So I’ve got a client connected to
    a “malicious” Guard relay
  • 45:09 - 45:14
    and it logs every single packet – they’re
    called ‘cells’ in the Tor protocol –
  • 45:14 - 45:18
    coming through the Guard relay. We can’t
    decrypt the packet because it’s encrypted
  • 45:18 - 45:22
    three times. What we can record,
    though, is the IP address of the user,
  • 45:22 - 45:25
    the IP address of the next hop,
    and we can count packets travelling
  • 45:25 - 45:29
    in each direction down the circuit. And we
    can also record the time at which those
  • 45:29 - 45:32
    packets were sent. So of course, if you’re
    doing the traffic correlation attacks
  • 45:32 - 45:38
    you’re using that timing information
    to try and work out whether you’re seeing
  • 45:38 - 45:42
    traffic which you’ve sent and which
    identifies a particular user or not.
  • 45:42 - 45:45
    Or indeed traffic which they’ve sent
    which you’ve seen at a different point
  • 45:45 - 45:49
    in the network.
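    The per-cell record such a logging relay keeps might look like the sketch
    below; the field names are mine, but the point is what is visible at a
    Guard – endpoints, direction, counts and timing – and what is not, namely
    the payload.

```java
import java.time.Instant;

// Sketch of what a modified Guard can log per cell: no plaintext (it is
// still encrypted three times), but IPs, direction and timing are visible.
public class CellLog {
    record CellRecord(Instant when, String prevHopIp, String nextHopIp,
                      int circuitId, boolean outbound) {}

    static void log(CellRecord r) {
        // timing and per-circuit counts are exactly the inputs
        // a traffic correlation attack needs
        System.out.printf("%s circ=%d %s -> %s out=%b%n",
                r.when(), r.circuitId(), r.prevHopIp(), r.nextHopIp(),
                r.outbound());
    }

    public static void main(String[] args) {
        log(new CellRecord(Instant.now(), "198.51.100.7", "203.0.113.9",
                42, true));
    }
}
```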
  • 45:49 - 45:52
    Moving on to my…
  • 45:52 - 45:56
    …interesting problems,
    research questions etc.
  • 45:56 - 45:59
    Based on what I’ve said, I’ve said there’s
    these directory authorities which are
  • 45:59 - 46:05
    controlled by the core Tor members. If
    e.g. they were malicious then they could
  • 46:05 - 46:09
    manipulate the Tor… – if a big enough
    chunk of them are malicious then
  • 46:09 - 46:13
    they can manipulate the consensus
    to direct you to particular nodes.
  • 46:13 - 46:16
    I don’t think that’s the case, and I don’t think
    anyone thinks that’s the case.
  • 46:16 - 46:19
    And Tor is designed in a way that…
    I mean, you’d have to control
  • 46:19 - 46:22
    a certain number of the authorities
    to be able to do anything important.
  • 46:22 - 46:25
    So the Tor people… I said this
    to them a couple of days ago.
  • 46:25 - 46:29
    I find it quite funny that you’d design
    your system as if you don’t trust
  • 46:29 - 46:32
    each other. To which their response was:
    “No, we design our system so that
  • 46:32 - 46:36
    we don’t have to trust each other.” Which
    I think is a very good model to have,
  • 46:36 - 46:39
    when you have this type of system.
    So could we eliminate these sort of
  • 46:39 - 46:43
    centralized servers? I think that’s
    actually a very hard problem to do.
  • 46:43 - 46:46
    There are lots of attacks which could
    potentially be deployed against
  • 46:46 - 46:51
    a decentralized network. At the moment the
    Tor network is relatively well understood
  • 46:51 - 46:54
    both in terms of what types of attack it
    is vulnerable to. So if we were to move
  • 46:54 - 46:59
    to a new architecture then we may open it
    to a whole new class of attacks.
  • 46:59 - 47:02
    The Tor network has existed
    for quite some time and it’s been
  • 47:02 - 47:07
    very well studied. What about global
    adversaries like the NSA, which can
  • 47:07 - 47:11
    monitor network links all across the
    world? It’s very difficult to defend
  • 47:11 - 47:16
    against that.
    If they can identify which Guard relay
  • 47:16 - 47:19
    you’re using, they can monitor traffic
    going into and out of the Guard relay,
  • 47:19 - 47:23
    and they log each of the subsequent hops
    along. It’s very, very difficult to defend against
  • 47:23 - 47:26
    these types of things. Do we know if
    they’re doing it? The documents that were
  • 47:26 - 47:30
    released yesterday – I’ve only had a very
    brief look through them, but they suggest
  • 47:30 - 47:32
    that they’re not presently doing it and
    they haven’t had much success.
  • 47:32 - 47:36
    I don’t know why, there are very powerful
    attacks described in the academic literature
  • 47:36 - 47:41
    which are very, very reliable and most
    academic literature you can access for free
  • 47:41 - 47:44
    so it’s not even as if they have to figure
    out how to do it. They just have to read
  • 47:44 - 47:47
    the academic literature and try and
    implement some of these attacks.
  • 47:47 - 47:52
    I don’t know why they’re not. The
    next question is how to detect malicious
  • 47:52 - 47:58
    relays. So in my case we were running
    40 relays. Our relays were on consecutive
  • 47:58 - 48:02
    IP addresses – well, most of them were
    on consecutive
  • 48:02 - 48:05
    IP addresses in two blocks. So they were
    running on IP addresses numbered
  • 48:05 - 48:09
    e.g. 1, 2, 3, 4, …
    We were running two relays per IP address,
  • 48:09 - 48:12
    and every single relay had my name
    plastered across it.
  • 48:12 - 48:15
    So after I set up these 40 relays in
  • 48:15 - 48:17
    a relatively short period of time
    I expected someone from the Tor Project
  • 48:17 - 48:22
    to come to me and say: “Hey Gareth, what
    are you doing?” – no one noticed,
  • 48:22 - 48:26
    no one noticed. So this is presently
    an open question. The Tor Project
  • 48:26 - 48:29
    are quite open about this. They
    acknowledged that. In fact, last year
  • 48:29 - 48:33
    we had the CERT researchers launch many
    more relays than that. The Tor Project
  • 48:33 - 48:37
    spotted that large number of relays
    but chose not to do anything about it
  • 48:37 - 48:40
    and, in fact, those relays were deploying an
    attack. But, as you know, it’s often very
  • 48:40 - 48:44
    difficult to defend against unknown
    attacks. So at the moment how to detect
  • 48:44 - 48:48
    malicious relays is a bit of an open
    question, which I think is being
  • 48:48 - 48:51
    discussed on the mailing list.
  • 48:51 - 48:54
    The other one is defending against unknown
    tampering at exits. If you take
  • 48:54 - 48:57
    the exit relays – an exit relay
    can tamper with the traffic.
  • 48:57 - 49:01
    So we know about particular types of attack
    – SSL man-in-the-middle etc.
  • 49:01 - 49:05
    We’ve seen recently binary patching.
    How do we detect unknown tampering
  • 49:05 - 49:09
    with traffic, other types of traffic? So
    the binary tampering went unnoticed
  • 49:09 - 49:12
    until it was spotted by someone who
    told the Tor Project. So it wasn’t
  • 49:12 - 49:16
    detected e.g. by the Tor Project
    themselves; it was spotted by someone else
  • 49:16 - 49:20
    who notified them. And then the final
    open one here is Tor code review.
  • 49:20 - 49:25
    So the Tor code is open source. We know
    from OpenSSL that, although everyone
  • 49:25 - 49:29
    can read source code, people don’t always
    look at it. And OpenSSL has been
  • 49:29 - 49:32
    a huge mess, and there’s been
    lots of stuff disclosed over that
  • 49:32 - 49:36
    over the last few days. There are
    lots of eyes on the Tor code but I think
  • 49:36 - 49:42
    always, more eyes are better. I’d say,
    ideally if we can get people to look
  • 49:42 - 49:45
    at the Tor code and look for
    vulnerabilities then… I encourage people
  • 49:45 - 49:50
    to do that. It’s a very useful thing to
    do. There could be unknown vulnerabilities
  • 49:50 - 49:53
    as we’ve seen with the “relay early” attack
    quite recently in the Tor code which
  • 49:53 - 49:57
    could be quite serious. The truth is we
    just don’t know until people do thorough
  • 49:57 - 50:02
    code audits, and even then it’s very
    difficult to know for certain.
  • 50:02 - 50:08
    So my last point, I think, yes,
  • 50:08 - 50:11
    is advice to future researchers.
    So if you ever wanted, or are planning
  • 50:11 - 50:16
    on doing a study in the future, e.g. on
    Tor, do not do what the CERT researchers
  • 50:16 - 50:21
    did and start deanonymising people on the
    live Tor network and doing it in a way
  • 50:21 - 50:25
    which is incredibly irresponsible. I don’t
    think… I mean, I tend, myself, to give them
  • 50:25 - 50:29
    the benefit of the doubt; I don’t think the
    CERT researchers set out to be malicious.
  • 50:29 - 50:33
    I think they were just very naive.
    That’s what it was.
  • 50:33 - 50:37
    That was rapidly pointed out to them.
    In my case we were running
  • 50:37 - 50:43
    40 relays. Our Tor relays were forwarding
    traffic; they were acting as good relays.
  • 50:43 - 50:46
    The only thing that we were doing
    was logging publication requests
  • 50:46 - 50:50
    to the directories. Big question whether
    that’s malicious or not – I don’t know.
  • 50:50 - 50:53
    One thing that has been pointed out to me
    is that the .onion addresses themselves
  • 50:53 - 50:58
    could be considered sensitive information,
    so the only data we will be retaining
  • 50:58 - 51:02
    from the study is the aggregated data.
    So we won't be retaining information
  • 51:02 - 51:05
    on individual .onion addresses because
    that could potentially be considered
  • 51:05 - 51:09
    sensitive information. If you think about
    someone running an .onion address which
  • 51:09 - 51:11
    contains something which they don’t want
    other people knowing about. So we won’t
  • 51:11 - 51:15
    be retaining that data, and
    we’ll be destroying it.
  • 51:15 - 51:20
    So I think that brings me now
    to starting the questions.
  • 51:20 - 51:23
    I want to say “Thanks” to a couple of
    people. The student who donated
  • 51:23 - 51:27
    the server to us. Nick Savage who is one
    of my colleagues who was a sounding board
  • 51:27 - 51:31
    during the entire study. Ivan Pustogarov
    who is the researcher at the University
  • 51:31 - 51:35
    of Luxembourg who sent us the large data
    set of .onion addresses from last year.
  • 51:35 - 51:38
    He’s also the chap who has demonstrated
    those deanonymisation attacks
  • 51:38 - 51:42
    that I talked about. A big "Thank you" to
    Roger Dingledine who has, frankly,
  • 51:42 - 51:45
    presented loads of questions to me over
    the last couple of days and allowed me
  • 51:45 - 51:49
    to bounce ideas back and forth.
    That has been a very useful process.
  • 51:49 - 51:54
    If you are doing future research I strongly
    encourage you to contact the Tor Project
  • 51:54 - 51:57
    at the earliest opportunity. You’ll find
    them… certainly I found them to be
  • 51:57 - 51:59
    extremely helpful.
  • 51:59 - 52:05
    Donncha also did something similar,
    so both Ivan and Donncha have done
  • 52:05 - 52:10
    a similar study in trying to classify the
    types of hidden services or work out
  • 52:10 - 52:14
    how many hits there are to particular
    types of hidden service. Ivan Pustogarov
  • 52:14 - 52:17
    did it on a bigger scale
    and found similar results to us.
  • 52:17 - 52:22
    That is that these abuse sites
    featured frequently
  • 52:22 - 52:27
    in the top requested sites. That was done
    over a year ago, and again, he was seeing
  • 52:27 - 52:31
    similar sorts of pattern. There were these
    abuse sites being requested frequently.
  • 52:31 - 52:35
    So that also sort of corroborates
    what we’re saying.
  • 52:35 - 52:39
    The data I put online is at this address,
    there will probably be the slides,
  • 52:39 - 52:42
    something called ‘The Tor Research
    Framework’ which is an implementation
  • 52:42 - 52:48
    of a Java client, so an implementation
    of a Tor client in Java specifically aimed
  • 52:48 - 52:52
    at researchers. So if e.g. you wanna pull
    out data from a consensus you can do.
  • 52:52 - 52:55
    If you want to build custom routes
    through the network you can do.
  • 52:55 - 52:58
    If you want to build routes through the
    network and start sending padding traffic
  • 52:58 - 53:02
    down them you can do etc.
    The code is designed
  • 53:02 - 53:06
    to be easily modifiable
    for testing lots of these things.
  • 53:06 - 53:11
    There is also a link to the FBI’s Tor
    exploit which they deployed against
  • 53:11 - 53:16
    visitors to some Tor hidden services last
    year. They exploited a Mozilla Firefox bug
  • 53:16 - 53:21
    and then ran code on users who were
    visiting these hidden services, and ran
  • 53:21 - 53:25
    code on their computers to identify them.
    At this address there is a link to that
  • 53:25 - 53:29
    including a copy of the shell code and an
    analysis of exactly what it was doing.
  • 53:29 - 53:32
    And then of course a list of references,
    with papers and things.
  • 53:32 - 53:34
    So I’m quite happy to take questions now.
  • 53:34 - 53:47
    applause
  • 53:47 - 53:51
    Herald: Thanks for the nice talk!
    Do we have any questions
  • 53:51 - 53:57
    from the internet?
  • 53:57 - 54:00
    Signal Angel: One question. It’s very hard
    to block addresses since creating them
  • 54:00 - 54:04
    is cheap, and they can be generated
    for each user, and rotated often. So
  • 54:04 - 54:08
    can you think of any other way
    for doing the blocking?
  • 54:08 - 54:10
    Gareth: That is absolutely true, so, yes.
    If you were to block a particular .onion
  • 54:10 - 54:13
    address they can just say: “I want another
    .onion address.” So I don’t know of
  • 54:13 - 54:17
    any way to counter that now.
  • 54:17 - 54:19
    Herald: Another one from the internet?
    inaudible answer from Signal Angel
  • 54:19 - 54:22
    Okay, then, Microphone 1, please!
  • 54:22 - 54:26
    Question: Thank you, that’s fascinating
    research. You mentioned that it is
  • 54:26 - 54:32
    possible to influence the hash of your
    relay node in a sense that you could
  • 54:32 - 54:36
    to be choosing which service you are
    advertising, or which hidden service
  • 54:36 - 54:38
    you are responsible for. Is that right?
    Gareth: Yeah, correct!
  • 54:38 - 54:40
    Question: So could you elaborate
    on how this is possible?
  • 54:40 - 54:45
    Gareth: So e.g. you just keep regenerating
    a public key for your relay,
  • 54:45 - 54:48
    you’ll get closer and closer to the point
    where you’ll be the responsible relay
  • 54:48 - 54:51
    for that particular hidden service. That’s
    just – you keep regenerating your identity
  • 54:51 - 54:55
    hash until you’re at that particular point
    on the circle. That’s not particularly
  • 54:55 - 55:00
    computationally intensive to do.
    That was it?
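    A sketch of that brute force, assuming – as in the v2 design – that the
    relay fingerprint is, roughly speaking, the SHA-1 hash of the identity
    public key; the two-byte target prefix is arbitrary:

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;

// Keep regenerating an RSA identity key until the fingerprint lands just
// after the target descriptor ID on the ring. Matching a 2-byte prefix
// takes ~2^16 keys on average; in practice you match enough bytes to sit
// closer to the target than any other relay.
public class FingerprintGrinder {
    public static void main(String[] args) throws Exception {
        byte[] target = {(byte) 0xAB, (byte) 0xCD}; // assumed target prefix
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(1024); // Tor identity keys were 1024-bit RSA then
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");

        for (long tries = 1; ; tries++) {
            KeyPair kp = gen.generateKeyPair();
            byte[] fp = sha1.digest(kp.getPublic().getEncoded());
            if (fp[0] == target[0] && fp[1] == target[1]) {
                System.out.println("found after " + tries + " keys");
                break;
            }
        }
    }
}
```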
  • 55:00 - 55:05
    Herald: Okay, next question
    from Microphone 5, please.
  • 55:05 - 55:09
    Question: Hi, I was wondering for the
    attacks where you identify a certain number
  • 55:09 - 55:15
    of users using a hidden service. Have
    those attacks been used, or is there
  • 55:15 - 55:19
    any evidence there, and is there
    any way of protecting against that?
  • 55:19 - 55:22
    Gareth: That’s a very interesting question,
    is there any way to detect these types
  • 55:22 - 55:25
    of attacks? So some of the attacks,
    if you’re going to generate particular
  • 55:25 - 55:29
    traffic patterns, one way to do that is to
    use the padding cells. The padding cells
  • 55:29 - 55:32
    aren’t used at the moment by the official
    Tor client. So the detection of those
  • 55:32 - 55:37
    could be indicative, but it’s not
    conclusive evidence on its own.
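    A toy version of that heuristic – the threshold is invented for
    illustration, and PADDING is command 0 in the Tor cell protocol as far
    as I know:

```java
// Count PADDING cells per circuit: the stock client doesn't send them,
// so a burst of them could be indicative - though never conclusive -
// evidence of someone injecting a traffic pattern.
public class PaddingDetector {
    static final int CELL_CMD_PADDING = 0; // per tor-spec, to my knowledge

    private int paddingCells = 0;
    private int totalCells = 0;

    void onCell(int command) {
        totalCells++;
        if (command == CELL_CMD_PADDING) paddingCells++;
    }

    boolean suspicious() {
        // arbitrary threshold, for illustration only
        return totalCells > 100 && paddingCells * 10 > totalCells;
    }
}
```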
  • 55:37 - 55:40
    Question: And is there any way of
    protecting against a government
  • 55:40 - 55:47
    or something trying to denial-of-service
    hidden services?
  • 55:47 - 55:48
    Gareth: So I… trying to… did not…
  • 55:48 - 55:52
    Question: Is it possible to protect
    against this kind of attack?
  • 55:52 - 55:56
    Gareth: Not that I’m aware of. The Tor
    Project are currently revising how they
  • 55:56 - 56:00
    do the hidden service protocol which will
    make e.g. what I did, enumerating
  • 56:00 - 56:03
    the hidden services, much more difficult.
    And also to position yourself on the
  • 56:03 - 56:07
    distributed hash table in advance
    for a particular hidden service.
  • 56:07 - 56:11
    So they are at the moment trying to change
    the way it’s done, and make some of
  • 56:11 - 56:15
    these things more difficult.
  • 56:15 - 56:20
    Herald: Good. Next question
    from Microphone 2, please.
  • 56:20 - 56:27
    Mic2: Hi. I’m handling Tor2Web abuse,
    and so I used to see a lot of abuse requests
  • 56:27 - 56:31
    concerning the Tor hidden service
    being exposed on the internet through
  • 56:31 - 56:37
    the Tor2Web.org domain name. And I just
    wanted to comment on, like you said,
  • 56:37 - 56:45
    the number of abuse requests. I used
    to speak with some of the child protection
  • 56:45 - 56:50
    agencies that reported abuse at
    Tor2Web.org, and they are effectively
  • 56:50 - 56:56
    using crawlers that periodically look for
    changes in order to get new images to be
  • 56:56 - 57:00
    put in the database. And what I was able
    to understand is that the German agency
  • 57:00 - 57:07
    doing that is crawling the same sites that
    the Italian agencies are crawling, too.
  • 57:07 - 57:12
    So it’s likely that in most of the
    countries there are the child protection
  • 57:12 - 57:17
    agencies that are crawling those few
    numbers of Tor hidden services that
  • 57:17 - 57:23
    contain child porn. And I saw it also
    a bit from the statistics of Tor2Web
  • 57:23 - 57:28
    where the amount of abuse relating to
    that kind of content is relatively low.
  • 57:28 - 57:30
    Just as a contribution!
  • 57:30 - 57:34
    Gareth: Yes, that’s very interesting,
    thank you for that!
  • 57:34 - 57:37
    applause
  • 57:37 - 57:40
    Herald: Next, Microphone 4, please.
  • 57:40 - 57:45
    Mic4: You then attacked or deanonymised
    users with an infected or a modified Guard
  • 57:45 - 57:52
    relay? Is it required to modify the Guard
    relay if I control the entry point
  • 57:52 - 57:57
    of the user to the internet?
    If I’m his ISP?
  • 57:57 - 58:02
    Gareth: Yes, if you observe traffic
    travelling into a Guard relay without
  • 58:02 - 58:05
    controlling the Guard relay itself.
    Mic4: Yeah.
  • 58:05 - 58:08
    Gareth: In theory, yes. I wouldn’t be able
    to tell you how reliable that is
  • 58:08 - 58:10
    off the top of my head.
    Mic4: Thanks!
  • 58:10 - 58:14
    Herald: So another question
    from the internet!
  • 58:14 - 58:16
    Signal Angel: Wouldn’t the ability to
    choose the key hash prefix give
  • 58:16 - 58:20
    the ability to target specific .onions?
  • 58:20 - 58:24
    Gareth: So you can only target one .onion
    address at a time. Because of the way
  • 58:24 - 58:28
    they are generated. So you wouldn’t be
    able to say e.g. “Pick a key which targeted
  • 58:28 - 58:32
    two or more .onion addresses.” You can
    only target one .onion address at a time
  • 58:32 - 58:38
    by positioning yourself at a particular
    point on the distributed hash table.
  • 58:38 - 58:40
    Herald: Another one
    from the internet? … Okay.
  • 58:40 - 58:43
    Then Microphone 3, please.
  • 58:43 - 58:48
    Mic3: Hey. Thanks for this research.
    I think it strengthens the network.
  • 58:48 - 58:54
    So in the deem (?) I was wondering whether
    you can donate these relays to be part of a
  • 58:54 - 59:00
    non-malicious relay pool, basically
    use them as regular relays afterwards?
  • 59:00 - 59:03
    Gareth: Okay, so can I donate the relays,
    re-run them as part of the Tor capacity (?)?
  • 59:03 - 59:05
    Unfortunately, I said they were run by
    a student and they were donated for
  • 59:05 - 59:10
    a fixed period of time. So we’ve given
    those back to him. We are very grateful
  • 59:10 - 59:15
    to him, he was very generous. In fact,
    without his contribution donating these
  • 59:15 - 59:19
    it would have been much more difficult
    to collect as much data as we did.
  • 59:19 - 59:21
    Herald: Good, next, Microphone 5, please!
  • 59:21 - 59:26
    Mic5: Yeah hi, first of all thanks
    for your talk. I think you’ve raised
  • 59:26 - 59:29
    some real issues that need to be
    considered very carefully by everyone
  • 59:29 - 59:34
    on the Tor Project. My question: I’d like
    to go back to the issue with so many
  • 59:34 - 59:38
    abuse-related web sites running over
    the Tor network. I think it’s an important
  • 59:38 - 59:42
    issue that really needs to be considered
    because we don’t wanna be associated
  • 59:42 - 59:45
    with that at the end of the day.
    Anyone who uses Tor, who runs a relay
  • 59:45 - 59:51
    or an exit node. And I understand it’s
    a bit of a sensitive issue, and you don’t
  • 59:51 - 59:55
    really have any say over whether it’s
    implemented or not. But I’d like to get
  • 59:55 - 60:02
    your opinion on the implementation
    of a distributed block-deny system
  • 60:02 - 60:07
    that would run in very much a similar way
    to those of the directory authorities.
  • 60:07 - 60:09
    I’d just like to see what
    you think of that.
  • 60:09 - 60:13
    Gareth: So you’re asking me whether I want
    to support a particular blocking mechanism
  • 60:13 - 60:14
    then?
  • 60:14 - 60:16
    Mic5: I’d like to get your opinion on it.
    Gareth laughs
  • 60:16 - 60:21
    I know it’s a sensitive issue but I think,
    like I said, I think something…
  • 60:21 - 60:26
    I think it needs to be considered because
    everyone running exit nodes and relays
  • 60:26 - 60:30
    and people of the Tor Project don’t
    want to be known or associated with
  • 60:30 - 60:35
    these massive amount of abuse web sites
    that currently exist within the Tor network.
  • 60:35 - 60:40
    Gareth: I absolutely agree, and I think
    the Tor Project are horrified as well that
  • 60:40 - 60:44
    this problem exists, and they, in fact,
    talked in previous years about how
  • 60:44 - 60:49
    they have a problem with this type of
    content. As to what, if anything, is
  • 60:49 - 60:52
    done about it, that’s very much up to them.
    Could it be done in a distributed fashion?
  • 60:52 - 60:56
    So the example I gave was a way which
    it could be done by relay operators.
  • 60:56 - 61:00
    So e.g. that would need the consensus of
    a large number of relay operators to be
  • 61:00 - 61:03
    effective. So that is done in
    a distributed fashion. The question is:
  • 61:03 - 61:07
    who gives the list of .onion addresses to
    block to each of the relay operators?
  • 61:07 - 61:10
    Clearly, the relay operators aren’t going
    to collect it themselves. It needs to be
  • 61:10 - 61:16
    supplied by someone like the Tor Project,
    for example, or someone trustworthy. Yes, it can
  • 61:16 - 61:20
    be done in a distributed fashion.
    It can be done in an open fashion.
  • 61:20 - 61:22
    Mic5: Who knows?
    Gareth: Okay.
  • 61:22 - 61:24
    Mic5: Thank you.
  • 61:24 - 61:27
    Herald: Good. And another
    question from the internet.
  • 61:27 - 61:31
    Signal Angel: Apparently there’s an option
    in the Tor client to collect statistics
  • 61:31 - 61:35
    on hidden services. Do you know about
    this, and how it relates to your research?
  • 61:35 - 61:39
    Gareth: Yes, I believe they’re going to
    be… the extent to which I know about it
  • 61:39 - 61:42
    is they’re gonna be trying this next
    month, to try and estimate how many
  • 61:42 - 61:46
    hidden services there are. So keep
    your eye on the Tor Project web site,
  • 61:46 - 61:50
    I’m sure they’ll be publishing
    their data in the coming months.
  • 61:50 - 61:55
    Herald: And, sadly, we are running out of
    time, so this will be the last question,
  • 61:55 - 61:57
    so Microphone 4, please!
  • 61:57 - 62:01
    Mic4: Hi, I’m just wondering if you could
    sort of outline what ethical clearances
  • 62:01 - 62:05
    you had to get from your university
    to conduct this kind of research.
  • 62:05 - 62:07
    Gareth: So we have to discuss these
    types of things before undertaking
  • 62:07 - 62:12
    any research. And we go through the steps
    to make sure that we’re not e.g. storing
  • 62:12 - 62:16
    sensitive information about particular
    people. So yes, we are very mindful
  • 62:16 - 62:19
    of that. And that’s why I made a
    particular point of putting on the slides
  • 62:19 - 62:22
    as to some of the things to consider.
  • 62:22 - 62:26
    Mic4: So like… you outlined a potential
    implementation of the traffic correlation
  • 62:26 - 62:30
    attack. Are you saying that
    you performed the attack? Or…
  • 62:30 - 62:33
    Gareth: No, no no, absolutely not.
    So the link I’m giving… absolutely not.
  • 62:33 - 62:35
    We have not engaged in any…
  • 62:35 - 62:36
    Mic4: It just wasn’t clear
    from the slides.
  • 62:36 - 62:39
    Gareth: I apologize. So, to be absolutely
    clear on that: no, we’re not engaging
  • 62:39 - 62:43
    in any deanonymisation research on the
    Tor network. The research I showed
  • 62:43 - 62:46
    is linked on the references, I think,
    which I put at the end of the slides.
  • 62:46 - 62:52
    You can read about it. But it’s done in
    simulation. So e.g. there’s a way
  • 62:52 - 62:55
    to do simulation of the Tor network on
    a single computer. I can’t remember
  • 62:55 - 62:59
    the name of the project, though.
    Shadow! Yes, it’s a system
  • 62:59 - 63:02
    called Shadow, where you can run a large
    number of Tor relays on a single computer
  • 63:02 - 63:05
    and simulate the traffic between them.
    If you’re going to do that type of research
  • 63:05 - 63:09
    then you should use that. Okay,
    thank you very much, everyone.
  • 63:09 - 63:18
    applause
  • 63:18 - 63:22
    silent postroll titles
  • 63:22 - 63:27
    subtitles created by c3subtitles.de
    Join, and help us!