WEBVTT
00:00:00.000 --> 00:00:09.550
34c3 preroll music
00:00:15.565 --> 00:00:18.230
Herald: ...and I will let Katherine take
the stage now.
00:00:18.589 --> 00:00:21.430
Katharine Jarmul, kjam: Awesome! Well,
thank you so much for the introduction and
00:00:21.430 --> 00:00:25.310
thank you so much for being here, taking
your time. I know that Congress is really
00:00:25.310 --> 00:00:29.800
exciting, so I really appreciate you
spending some time with me today. It's my
00:00:29.800 --> 00:00:34.470
first ever Congress, so I'm also really
excited and I want to meet new people. So
00:00:34.470 --> 00:00:39.930
if you wanna come say hi to me later, I'm
somewhat friendly, so we can maybe be
00:00:39.930 --> 00:00:44.680
friends later. Today what we're going to
talk about is deep learning blind spots or
00:00:44.680 --> 00:00:49.890
how to fool "artificial intelligence". I
like to put "artificial intelligence" in
00:00:49.890 --> 00:00:55.270
quotes, because.. yeah, we'll talk about
that, but I think it should be in quotes.
00:00:55.270 --> 00:00:59.570
And today we're going to talk a little bit
about deep learning, how it works and how
00:00:59.570 --> 00:01:07.640
you can maybe fool it. So I ask us: Is AI
becoming more intelligent?
00:01:07.640 --> 00:01:11.078
And I ask this because when I open a
browser and, of course, often it's Chrome
00:01:11.078 --> 00:01:16.979
and Google is already prompting me
for what I should look at
00:01:16.979 --> 00:01:20.260
and it knows that I work with machine
learning, right?
00:01:20.260 --> 00:01:23.830
And these are the headlines
that I see every day:
00:01:23.830 --> 00:01:29.399
"Are Computers Already Smarter Than
Humans?"
00:01:29.399 --> 00:01:32.289
If so, I think we could just pack up and
go home, right?
00:01:32.289 --> 00:01:36.140
Like, we fixed computers,
right? If a computer is smarter than me,
00:01:36.140 --> 00:01:39.780
then I already fixed it, we can go home,
there's no need to talk about computers
00:01:39.780 --> 00:01:47.750
anymore, let's just move on with life. But
that's not true, right? We know, because
00:01:47.750 --> 00:01:51.010
we work with computers and we know how
stupid computers are sometimes. They're
00:01:51.010 --> 00:01:55.890
pretty bad. Computers do only what we tell
them to do, generally, so I don't think a
00:01:55.890 --> 00:02:01.090
computer can think and be smarter than me.
So with the same types of headlines that
00:02:01.090 --> 00:02:11.690
you see this, then you also see this: And
yeah, so Apple recently released their
00:02:11.690 --> 00:02:17.500
face ID and this unlocks your phone with
your face and it seems like a great idea,
00:02:17.500 --> 00:02:22.451
right? You have a unique face, you have a
face, nobody else can take your face. But
00:02:22.451 --> 00:02:28.300
unfortunately what we find out about
computers is that they're awful sometimes,
00:02:28.300 --> 00:02:32.480
and for these women.. for this Chinese
woman that owned an iPhone,
00:02:32.480 --> 00:02:35.960
her coworker was able to unlock her phone.
00:02:35.964 --> 00:02:39.320
And I think Hendrick and Karin
talked about, if you were here for the
00:02:39.320 --> 00:02:41.590
last talk ("Beeinflussung durch künstliche
Intelligenz"). We have a lot of problems
00:02:41.590 --> 00:02:46.379
in machine learning and one of them is
stereotypes and prejudice that are within
00:02:46.379 --> 00:02:52.340
our training data or within our minds that
leak into our models. And perhaps they
00:02:52.340 --> 00:02:57.739
didn't do adequate training data on
determining different features of Chinese
00:02:57.739 --> 00:03:03.160
folks. And perhaps it's other problems
with their model or their training data or
00:03:03.160 --> 00:03:07.500
whatever they're trying to do. But they
clearly have some issues, right? So when
00:03:07.500 --> 00:03:12.050
somebody asked me: "Is AI gonna take over
the world and is there a super robot
00:03:12.050 --> 00:03:17.300
that's gonna come and be my new, you know,
leader or so to speak?" I tell them we
00:03:17.300 --> 00:03:21.710
can't even figure out the stuff that we
already have in production. So if we can't
00:03:21.710 --> 00:03:25.690
even figure out the stuff we already have
in production, I'm a little bit less
00:03:25.690 --> 00:03:33.209
worried of the super robot coming to kill
me. That said, unfortunately the powers
00:03:33.209 --> 00:03:38.190
that be, the powers that be a lot of times
they believe in this and they believe
00:03:38.190 --> 00:03:44.540
strongly in "artificial intelligence" and
machine learning. They're collecting data
00:03:44.540 --> 00:03:50.800
every day about you and me and everyone
else. And they're gonna use this data to
00:03:50.800 --> 00:03:56.349
build even better models. This is because
the revolution that we're seeing now in
00:03:56.349 --> 00:04:02.080
machine learning has really not much to do
with new algorithms or architectures. It
00:04:02.080 --> 00:04:09.630
has a lot more to do with heavy compute
and with massive, massive data sets. And
00:04:09.630 --> 00:04:15.740
the more that we have training data of
petabytes per 24 hours or even less, the
00:04:15.740 --> 00:04:22.690
more we're able to essentially fix up the
parts that don't work so well. The
00:04:22.690 --> 00:04:25.979
companies that we see here are companies
that are investing heavily in machine
00:04:25.979 --> 00:04:30.979
learning and AI. Part of how they're
investing heavily is, they're collecting
00:04:30.979 --> 00:04:37.999
more and more data about you and me and
everyone else. Google and Facebook, more
00:04:37.999 --> 00:04:42.789
than 1 billion active users. I was
surprised to know that in Germany the
00:04:42.789 --> 00:04:48.159
desktop search traffic for Google is
higher than most of the rest of the world.
00:04:48.159 --> 00:04:53.259
And for Baidu they're growing with the
speed that broadband is available. And so,
00:04:53.259 --> 00:04:56.970
what we see is, these people are
collecting this data and they also are
00:04:56.970 --> 00:05:02.779
using new technologies like GPUs and TPUs
in new ways to parallelize workflows
00:05:02.779 --> 00:05:09.449
and with this they're able to mess up
less, right? They're still messing up, but
00:05:09.449 --> 00:05:14.960
they mess up slightly less. And they're
not going to get uninterested in this
00:05:14.960 --> 00:05:20.550
topic, so we need to kind of start to
prepare how we respond to this type of
00:05:20.550 --> 00:05:25.860
behavior. One of the things that has been
a big area of research, actually also for
00:05:25.860 --> 00:05:30.080
a lot of these companies, is what we'll
talk about today and that's adversarial
00:05:30.080 --> 00:05:36.800
machine learning. But the first thing that
we'll start with is what is behind what we
00:05:36.800 --> 00:05:44.009
call AI. So most of the time when you
think of AI or something like Siri and so
00:05:44.009 --> 00:05:48.979
forth, you are actually potentially
talking about an old-school rule-based
00:05:48.979 --> 00:05:53.930
system. This is a rule, like you say a
particular thing and then Siri is like:
00:05:53.930 --> 00:05:58.129
"Yes, I know how to respond to this". And
we even hard program these types of things
00:05:58.129 --> 00:06:02.880
in, right? That is one version of AI, is
essentially: It's been pre-programmed to
00:06:02.880 --> 00:06:08.839
do and understand certain things. Another
form that usually, for example for the
00:06:08.839 --> 00:06:12.619
people that are trying to build AI robots
and the people that are trying to build
00:06:12.619 --> 00:06:17.110
what we call "general AI", so this is
something that can maybe learn like a
00:06:17.110 --> 00:06:20.190
human, they'll use reinforcement learning.
00:06:20.190 --> 00:06:22.200
I don't specialize in reinforcement
learning.
00:06:22.200 --> 00:06:26.401
But what it does is it essentially
tries to reward you for
00:06:26.401 --> 00:06:32.429
behaviour that you're expected to do. So
if you complete a task, you get a a
00:06:32.429 --> 00:06:36.099
cookie. You complete two other tasks, you
get two or three more cookies depending on
00:06:36.099 --> 00:06:41.759
how important the task is. And this will
help you learn how to behave to get more
00:06:41.759 --> 00:06:45.990
points and it's used a lot in robots and
gaming and so forth. And I'm not really
00:06:45.990 --> 00:06:49.340
going to talk about that today because
most of that is still not really something
00:06:49.340 --> 00:06:54.880
that you or I interact with. Well, what I
am gonna talk about today is neural
00:06:54.880 --> 00:06:59.680
networks, or as some people like to call
them "deep learning", right? So deep
00:06:59.680 --> 00:07:04.119
learning 1: The neural network versus deep
learning battle awhile ago. So here's an
00:07:04.119 --> 00:07:09.949
example neural network: we have an input
layer and that's where we essentially make
00:07:09.949 --> 00:07:14.550
a quantitative version of whatever our
data is. So we need to make it into
00:07:14.550 --> 00:07:19.890
numbers. Then we have a hidden layer and
we might have multiple hidden layers. And
00:07:19.890 --> 00:07:23.759
depending on how deep our network is, or a
network inside a network, right, which is
00:07:23.759 --> 00:07:28.179
possible. We might have very much
different layers there and they may even
00:07:28.179 --> 00:07:33.539
act in cyclical ways. And then that's
where all the weights and the variables
00:07:33.539 --> 00:07:39.259
and the learning happens. So that has..
holds a lot of information and data that
00:07:39.259 --> 00:07:43.979
we eventually want to train there. And
finally we have an output layer. And
00:07:43.979 --> 00:07:47.529
depending on the network and what we're
trying to do the output layer can vary
00:07:47.529 --> 00:07:51.539
between something that looks like the
input, like for example if we want to
00:07:51.539 --> 00:07:55.719
machine translate, then I want the output
to look like the input, right, I want it
00:07:55.719 --> 00:07:59.909
to just be in a different language, or the
output could be a different class. It can
00:07:59.909 --> 00:08:05.749
be, you know, this is a car or this is a
train and so forth. So it really depends
00:08:05.749 --> 00:08:10.610
what you're trying to solve, but the
output layer gives us the answer. And how
00:08:10.610 --> 00:08:17.159
we train this is, we use backpropagation.
Backpropagation is nothing new and neither
00:08:17.159 --> 00:08:21.139
is one of the most popular methods to do
so, which is called stochastic gradient
00:08:21.139 --> 00:08:26.459
descent. What we do when we go through
that part of the training, is we go from
00:08:26.459 --> 00:08:29.759
the output layer and we go backwards
through the network. That's why it's
00:08:29.759 --> 00:08:34.828
called backpropagation, right? And as we
go backwards through the network, in the
00:08:34.828 --> 00:08:39.139
most simple way, we upvote and downvote
what's working and what's not working. So
00:08:39.139 --> 00:08:42.729
we say: "oh you got it right, you get a
little bit more importance", or "you got
00:08:42.729 --> 00:08:46.040
it wrong, you get a little bit less
importance". And eventually we hope
00:08:46.040 --> 00:08:50.481
over time, that they essentially correct
each other's errors enough that we get a
00:08:50.481 --> 00:08:57.550
right answer. So that's a very general
overview of how it works and the cool
00:08:57.550 --> 00:09:02.720
thing is: Because it works that way, we
can fool it. And people have been
00:09:02.720 --> 00:09:08.269
researching ways to fool it for quite some
time. So I give you a brief overview of
00:09:08.269 --> 00:09:13.290
the history of this field, so we can kind
of know where we're working from and maybe
00:09:13.290 --> 00:09:19.220
hopefully then where we're going to. In
2005 was one of the first most important
00:09:19.220 --> 00:09:24.740
papers to approach adversarial learning
and it was written by a series of
00:09:24.740 --> 00:09:29.630
researchers and they wanted to see, if
they could act as an informed attacker and
00:09:29.630 --> 00:09:34.440
attack a linear classifier. So this is
just a spam filter and they're like can I
00:09:34.440 --> 00:09:37.850
send spam to my friend? I don't know why
they would want to do this, but: "Can I
00:09:37.850 --> 00:09:43.209
send spam to my friend, if I tried testing
out a few ideas?" And what they were able
00:09:43.209 --> 00:09:47.639
to show is: Yes, rather than just, you
know, trial and error which anybody can do
00:09:47.639 --> 00:09:52.120
or a brute force attack of just like send
a thousand emails and see what happens,
00:09:52.120 --> 00:09:56.370
they were able to craft a few algorithms
that they could use to try and find
00:09:56.370 --> 00:10:03.240
important words to change, to make it go
through the spam filter. In 2007 NIPS,
00:10:03.240 --> 00:10:08.019
which is a very popular machine learning
conference, had one of their first all-day
00:10:08.019 --> 00:10:12.930
workshops on computer security. And when
they did so, they had a bunch of different
00:10:12.930 --> 00:10:16.780
people that were working on machine
learning in computer security: From
00:10:16.780 --> 00:10:21.430
malware detection, to network intrusion
detection, to of course spam. And they
00:10:21.430 --> 00:10:25.190
also had a few talks on this type of
adversarial learning. So how do you act as
00:10:25.190 --> 00:10:29.980
an adversary to your own model? And then
how do you learn how to counter that
00:10:29.980 --> 00:10:35.650
adversary? In 2013 there was a really
great paper that got a lot of people's
00:10:35.650 --> 00:10:40.001
attention called "Poisoning Attacks
against Support Vector Machines". Now
00:10:40.001 --> 00:10:45.290
support vector machines are essentially
usually a linear classifier and we use
00:10:45.290 --> 00:10:50.121
them a lot to say, "this is a member of
this class, that, or another", when we
00:10:50.121 --> 00:10:54.940
pertain to text. So I have a text and I
want to know what the text is about or I
00:10:54.940 --> 00:10:58.610
want to know if it's a positive or
negative sentiment, a lot of times I'll
00:10:58.610 --> 00:11:05.160
use a support vector machine. We call them
SVM's as well. Battista Biggio was the
00:11:05.160 --> 00:11:08.319
main researcher and he has actually
written quite a lot about these poisoning
00:11:08.319 --> 00:11:15.569
attacks and he poisoned the training data.
So for a lot of these systems, sometimes
00:11:15.569 --> 00:11:20.820
they have active learning. This means, you
or I, when we classify our emails as spam,
00:11:20.820 --> 00:11:26.290
we're helping train the network. So he
poisoned the training data and was able to
00:11:26.290 --> 00:11:32.360
show that by poisoning it in a particular
way, that he was able to then send spam
00:11:32.360 --> 00:11:37.810
email because he knew what words were then
benign, essentially. He went on to study a
00:11:37.810 --> 00:11:43.220
few other things about biometric data if
you're interested in biometrics. But then
00:11:43.220 --> 00:11:49.329
in 2014 Christian Szegedy, Ian Goodfellow,
and a few other main researchers at Google
00:11:49.329 --> 00:11:55.350
Brain released "Intriguing Properties of
Neural Networks." That really became the
00:11:55.350 --> 00:12:00.040
explosion of what we're seeing today in
adversarial learning. And what they were
00:12:00.040 --> 00:12:04.629
able to do, is they were able to say "We
believe there's linear properties of these
00:12:04.629 --> 00:12:08.790
neural networks, even if they're not
necessarily linear networks.
00:12:08.790 --> 00:12:15.560
And we believe we can exploit them to fool
them". And they first introduced then the
00:12:15.560 --> 00:12:23.189
fast gradient sign method, which we'll
talk about later today. So how does it
00:12:23.189 --> 00:12:28.830
work? First I want us to get a little bit
of an intuition around how this works.
00:12:28.830 --> 00:12:35.310
Here's a graphic of gradient descent. And
in gradient descent we have this vertical
00:12:35.310 --> 00:12:40.339
axis is our cost function. And what we're
trying to do is: We're trying to minimize
00:12:40.339 --> 00:12:47.400
cost, we want to minimize the error. And
so when we start out, we just chose random
00:12:47.400 --> 00:12:51.790
weights and variables, so all of our
hidden layers, they just have maybe random
00:12:51.790 --> 00:12:57.339
weights or random distribution. And then
we want to get to a place where the
00:12:57.339 --> 00:13:01.740
weights have meaning, right? We want our
network to know something, even if it's
00:13:01.740 --> 00:13:08.740
just a mathematical pattern, right? So we
start in the high area of the graph, or
00:13:08.740 --> 00:13:13.819
the reddish area, and that's where we
started, we have high error there. And
00:13:13.819 --> 00:13:21.209
then we try to get to the lowest area of
the graph, or here the dark blue that is
00:13:21.209 --> 00:13:26.889
right about here. But sometimes what
happens: As we learn, as we go through
00:13:26.889 --> 00:13:33.300
epochs and training, we're moving slowly
down and hopefully we're optimizing. But
00:13:33.300 --> 00:13:37.370
what we might end up in instead of this
global minimum, we might end up in the
00:13:37.370 --> 00:13:43.800
local minimum which is the other trail.
And that's fine, because it's still zero
00:13:43.800 --> 00:13:49.889
error, right? So we're still probably
going to be able to succeed, but we might
00:13:49.889 --> 00:13:56.139
not get the best answer all the time. What
adversarial tries to do in the most basic
00:13:56.139 --> 00:14:01.980
of ways, it essentially tries to push the
error rate back up the hill for as many
00:14:01.980 --> 00:14:07.709
units as it can. So it essentially tries
to increase the error slowly through
00:14:07.709 --> 00:14:14.600
perturbations. And by disrupting, let's
say, the weakest links like the one that
00:14:14.600 --> 00:14:19.060
did not find the global minimum but
instead found a local minimum, we can
00:14:19.060 --> 00:14:23.069
hopefully fool the network, because we're
finding those weak spots and we're
00:14:23.069 --> 00:14:25.629
capitalizing on them, essentially.
00:14:31.252 --> 00:14:34.140
So what does an adversarial example
actually look like?
00:14:34.140 --> 00:14:37.430
You may have already seen this
because it's very popular on the
00:14:37.430 --> 00:14:45.221
Twittersphere and a few other places, but
this was a series of researches at MIT. It
00:14:45.221 --> 00:14:51.059
was debated whether you could do adverse..
adversarial learning in the real world. A
00:14:51.059 --> 00:14:57.339
lot of the research has just been a still
image. And what they were able to show:
00:14:57.339 --> 00:15:03.079
They created a 3D-printed turtle. I mean
it looks like a turtle to you as well,
00:15:03.079 --> 00:15:09.910
correct? And this 3D-printed turtle by the
Inception Network, which is a very popular
00:15:09.910 --> 00:15:16.790
computer vision network, is a rifle and it
is a rifle in every angle that you can
00:15:16.790 --> 00:15:21.959
see. And the way they were able to do this
and, I don't know the next time it goes
00:15:21.959 --> 00:15:25.910
around you can see perhaps, and it's a
little bit easier on the video which I'll
00:15:25.910 --> 00:15:29.790
have posted, I'll share at the end, you
can see perhaps that there's a slight
00:15:29.790 --> 00:15:35.529
discoloration of the shell. They messed
with the texture. By messing with this
00:15:35.529 --> 00:15:39.910
texture and the colors they were able to
fool the neural network, they were able to
00:15:39.910 --> 00:15:45.259
activate different neurons that were not
supposed to be activated. Units, I should
00:15:45.259 --> 00:15:51.129
say. So what we see here is, yeah, it can
be done in the real world, and when I saw
00:15:51.129 --> 00:15:56.339
this I started getting really excited.
Because, video surveillance is a real
00:15:56.339 --> 00:16:02.529
thing, right? So if we can start fooling
3D objects, we can perhaps start fooling
00:16:02.529 --> 00:16:08.040
other things in the real world that we
would like to fool.
00:16:08.040 --> 00:16:12.440
applause
00:16:12.440 --> 00:16:19.149
kjam: So why do adversarial examples
exist? We're going to talk a little bit
00:16:19.149 --> 00:16:23.879
about some things that are approximations
of what's actually happening, so please
00:16:23.879 --> 00:16:27.610
forgive me for not being always exact, but
I would rather us all have a general
00:16:27.610 --> 00:16:33.660
understanding of what's happening. Across
the top row we have an input layer and
00:16:33.660 --> 00:16:39.480
these images to the left, we can see, are
the source images and this source image is
00:16:39.480 --> 00:16:43.380
like a piece of farming equipment or
something. And on the right we have our
00:16:43.380 --> 00:16:48.800
guide image. This is what we're trying to
get the network to see we want it to
00:16:48.800 --> 00:16:55.070
missclassify this farm equipment as a pink
bird. So what these researchers did is
00:16:55.070 --> 00:16:59.019
they targeted different layers of the
network. And they said: "Okay, we're going
00:16:59.019 --> 00:17:02.410
to use this method to target this
particular layer and we'll see what
00:17:02.410 --> 00:17:07.569
happens". And so as they targeted these
different layers you can see what's
00:17:07.569 --> 00:17:12.109
happening on the internal visualization.
Now neural networks can't see, right?
00:17:12.109 --> 00:17:17.939
They're looking at matrices of numbers but
what we can do is we can use those
00:17:17.939 --> 00:17:26.559
internal values to try and see with our
human eyes what they are learning. And we
00:17:26.559 --> 00:17:31.370
can see here clearly inside the network,
we no longer see the farming equipment,
00:17:31.370 --> 00:17:39.550
right? We see a pink bird. And this is not
visible to our human eyes. Now if you
00:17:39.550 --> 00:17:43.570
really study and if you enlarge the image
you can start to see okay there's a little
00:17:43.570 --> 00:17:48.190
bit of pink here or greens, I don't know
what's happening, but we can still see it
00:17:48.190 --> 00:17:56.510
in the neural network we have tricked. Now
people don't exactly know yet why these
00:17:56.510 --> 00:18:03.159
blind spots exist. So it's still an area
of active research exactly why we can fool
00:18:03.159 --> 00:18:09.429
neural networks so easily. There are some
prominent researchers that believe that
00:18:09.429 --> 00:18:14.450
neural networks are essentially very
linear and that we can use this simple
00:18:14.450 --> 00:18:20.840
linearity to misclassify to jump into
another area. But there are others that
00:18:20.840 --> 00:18:24.820
believe that there's these pockets or
blind spots and that we can then find
00:18:24.820 --> 00:18:28.500
these blind spots where these neurons
really are the weakest links and they
00:18:28.500 --> 00:18:33.160
maybe even haven't learned anything and if
we change their activation then we can
00:18:33.160 --> 00:18:37.580
fool the network easily. So this is still
an area of active research and let's say
00:18:37.580 --> 00:18:44.320
you're looking for your thesis, this would
be a pretty neat thing to work on. So
00:18:44.320 --> 00:18:49.399
we'll get into just a brief overview of
some of the math behind the most popular
00:18:49.399 --> 00:18:55.571
methods. First we have the fast gradient
sign method and that is was used in the
00:18:55.571 --> 00:18:59.950
initial paper and now there's been many
iterations on it. And what we do is we
00:18:59.950 --> 00:19:05.120
have our same cost function, so this is
the same way that we're trying to train
00:19:05.120 --> 00:19:13.110
our network and it's trying to learn. And
we take the gradient sign of that and if
00:19:13.110 --> 00:19:16.330
you can think, it's okay, if you're not
used to doing vector calculus, and
00:19:16.330 --> 00:19:20.250
especially not without a pen and paper in
front of you, but what you think we're
00:19:20.250 --> 00:19:24.140
doing is we're essentially trying to
calculate some approximation of a
00:19:24.140 --> 00:19:29.700
derivative of the function. And this can
kind of tell us, where is it going. And if
00:19:29.700 --> 00:19:37.299
we know where it's going, we can maybe
anticipate that and change. And then to
00:19:37.299 --> 00:19:41.480
create the adversarial images, we then
take the original input plus a small
00:19:41.480 --> 00:19:48.770
number epsilon times that gradient's sign.
For the Jacobian Saliency Map, this is a
00:19:48.770 --> 00:19:55.010
newer method and it's a little bit more
effective, but it takes a little bit more
00:19:55.010 --> 00:20:02.250
compute. This Jacobian Saliency Map uses a
Jacobian matrix and if you remember also,
00:20:02.250 --> 00:20:07.649
and it's okay if you don't, a Jacobian
matrix will look at the full derivative of
00:20:07.649 --> 00:20:12.049
a function, so you take the full
derivative of a cost function
00:20:12.049 --> 00:20:18.269
at that vector, and it gives you a matrix
that is a pointwise approximation,
00:20:18.269 --> 00:20:22.550
if the function is differentiable
at that input vector. Don't
00:20:22.550 --> 00:20:28.320
worry you can review this later too. But
the Jacobian matrix then we use to create
00:20:28.320 --> 00:20:33.059
this saliency map the same way where we're
essentially trying some sort of linear
00:20:33.059 --> 00:20:38.830
approximation, or pointwise approximation,
and we then want to find two pixels that
00:20:38.830 --> 00:20:43.860
we can perturb that cause the most
disruption. And then we continue to the
00:20:43.860 --> 00:20:48.970
next. Unfortunately this is currently a
O(n²) problem, but there's a few people
00:20:48.970 --> 00:20:53.910
that are trying to essentially find ways
that we can approximate this and make it
00:20:53.910 --> 00:21:01.320
faster. So maybe now you want to fool a
network too and I hope you do, because
00:21:01.320 --> 00:21:06.580
that's what we're going to talk about.
First you need to pick a problem or a
00:21:06.580 --> 00:21:13.460
network type you may already know. But you
may want to investigate what perhaps is
00:21:13.460 --> 00:21:19.019
this company using, what perhaps is this
method using and do a little bit of
00:21:19.019 --> 00:21:23.730
research, because that's going to help
you. Then you want to research state-of-
00:21:23.730 --> 00:21:28.610
the-art methods and this is like a typical
research statement that you have a new
00:21:28.610 --> 00:21:32.360
state-of-the-art method, but the good news
is is that the state-of-the-art two to
00:21:32.360 --> 00:21:38.179
three years ago is most likely in
production or in systems today. So once
00:21:38.179 --> 00:21:44.480
they find ways to speed it up, some
approximation of that is deployed. And a
00:21:44.480 --> 00:21:48.279
lot of times these are then publicly
available models, so a lot of times, if
00:21:48.279 --> 00:21:51.480
you're already working with the deep
learning framework they'll come
00:21:51.480 --> 00:21:56.450
prepackaged with a few of the different
popular models, so you can even use that.
00:21:56.450 --> 00:22:00.691
If you're already building neural networks
of course you can build your own. An
00:22:00.691 --> 00:22:05.510
optional step, but one that might be
recommended, is to fine-tune your model
00:22:05.510 --> 00:22:10.750
and what this means is to essentially take
a new training data set, maybe data that
00:22:10.750 --> 00:22:15.490
you think this company is using or that
you think this network is using, and
00:22:15.490 --> 00:22:19.300
you're going to remove the last few layers
of the neural network and you're going to
00:22:19.300 --> 00:22:24.809
retrain it. So you essentially are nicely
piggybacking on the work of the pre
00:22:24.809 --> 00:22:30.650
trained model and you're using the final
layers to create finesse. This essentially
00:22:30.650 --> 00:22:37.169
makes your model better at the task that
you have for it. Finally then you use a
00:22:37.169 --> 00:22:40.260
library, and we'll go through a few of
them, but some of the ones that I have
00:22:40.260 --> 00:22:46.450
used myself is cleverhans, DeepFool and
deep-pwning, and these all come with nice
00:22:46.450 --> 00:22:51.580
built-in features for you to use for let's
say the fast gradient sign method, the
00:22:51.580 --> 00:22:56.740
Jacobian saliency map and a few other
methods that are available. Finally it's
00:22:56.740 --> 00:23:01.550
not going to always work so depending on
your source and your target, you won't
00:23:01.550 --> 00:23:05.840
always necessarily find a match. What
researchers have shown is it's a lot
00:23:05.840 --> 00:23:10.950
easier to fool a network that a cat is a
dog than it is to fool in networks that a
00:23:10.950 --> 00:23:16.030
cat is an airplane. And this is just like
we can make these intuitive, so you might
00:23:16.030 --> 00:23:21.830
want to pick an input that's not super
dissimilar from where you want to go, but
00:23:21.830 --> 00:23:28.260
is dissimilar enough. And you want to test
it locally and then finally test the one
00:23:28.260 --> 00:23:38.149
for the highest misclassification rates on
the target network. And you might say
00:23:38.149 --> 00:23:44.230
Katharine, or you can call me kjam, that's
okay. You might say: "I don't know what
00:23:44.230 --> 00:23:50.049
the person is using", "I don't know what
the company is using" and I will say "it's
00:23:50.049 --> 00:23:56.750
okay", because what's been proven: You can
attack a blackbox model, you do not have
00:23:56.750 --> 00:24:01.950
to know what they're using, you do not
have to know exactly how it works, you
00:24:01.950 --> 00:24:06.760
don't even have to know their training
data, because what you can do is if it
00:24:06.760 --> 00:24:12.710
has.. okay, addendum it has to have some
API you can interface with. But if it has
00:24:12.710 --> 00:24:18.130
an API you can interface with or even any
API you can interact with, that uses the
00:24:18.130 --> 00:24:24.840
same type of learning, you can collect
training data by querying the API. And
00:24:24.840 --> 00:24:28.700
then you're training your local model on
that data that you're collecting. So
00:24:28.700 --> 00:24:32.890
you're collecting the data, you're
training your local model, and as your
00:24:32.890 --> 00:24:37.299
local model gets more accurate and more
similar to the deployed black box that you
00:24:37.299 --> 00:24:43.409
don't know how it works, you are then
still able to fool it. And what this paper
00:24:43.409 --> 00:24:49.730
proved, Nicolas Papanov and a few other
great researchers, is that with usually
00:24:49.730 --> 00:24:56.527
less than six thousand queries they were
able to fool the network between 84% and 97% certainty
00:24:59.301 --> 00:25:03.419
And what the same group
of researchers also studied is the ability
00:25:03.419 --> 00:25:09.241
to transfer the ability to fool one
network into another network and they
00:25:09.241 --> 00:25:14.910
called that transfer ability. So I can
take a certain type of network and I can
00:25:14.910 --> 00:25:19.320
use adversarial examples against this
network to fool a different type of
00:25:19.320 --> 00:25:26.269
machine learning technique. Here we have
their matrix, their heat map, that shows
00:25:26.269 --> 00:25:32.730
us exactly what they were able to fool. So
we have across the left-hand side here the
00:25:32.730 --> 00:25:37.740
source machine learning technique, we have
deep learning, logistic regression, SVM's
00:25:37.740 --> 00:25:43.380
like we talked about, decision trees and
K-nearest-neighbors. And across the bottom
00:25:43.380 --> 00:25:47.340
we have the target machine learning, so
what were they targeting. They created the
00:25:47.340 --> 00:25:51.470
adversaries with the left hand side and
they targeted across the bottom. We
00:25:51.470 --> 00:25:56.700
finally have an ensemble model at the end.
And what they were able to show is like,
00:25:56.700 --> 00:26:03.130
for example, SVM's and decision trees are
quite easy to fool, but logistic
00:26:03.130 --> 00:26:08.480
regression a little bit less so, but still
strong, for deep learning and K-nearest-
00:26:08.480 --> 00:26:13.460
neighbors, if you train a deep learning
model or a K-nearest-neighbor model, then
00:26:13.460 --> 00:26:18.179
that performs fairly well against itself.
And so what they're able to show is that
00:26:18.179 --> 00:26:23.320
you don't necessarily need to know the
target machine and you don't even have to
00:26:23.320 --> 00:26:28.050
get it right, even if you do know, you can
use a different type of machine learning
00:26:28.050 --> 00:26:30.437
technique to target the network.
00:26:34.314 --> 00:26:39.204
So we'll
look at six lines of Python here and in
00:26:39.204 --> 00:26:44.559
these six lines of Python I'm using the
cleverhans library and in six lines of
00:26:44.559 --> 00:26:52.419
Python I can both generate my adversarial
input and I can even predict on it. So if
00:26:52.419 --> 00:27:02.350
you don't code Python, it's pretty easy to
learn and pick up. And for example here we
00:27:02.350 --> 00:27:06.830
have Keras and Keras is a very popular
deep learning library in Python, it
00:27:06.830 --> 00:27:12.070
usually works with a theano or a
tensorflow backend and we can just wrap
00:27:12.070 --> 00:27:19.250
our model, pass it to the fast gradient
method, class and then set up some
00:27:19.250 --> 00:27:24.630
parameters, so here's our epsilon and a
few extra parameters, this is to tune our
00:27:24.630 --> 00:27:30.860
adversary, and finally we can generate our
adversarial examples and then predict on
00:27:30.860 --> 00:27:39.865
them. So in a very small amount of Python
we're able to target and trick a network.
00:27:40.710 --> 00:27:45.791
If you're already using tensorflow or
Keras, it already works with those libraries.
00:27:48.828 --> 00:27:52.610
Deep-pwning is one of the first
libraries that I heard about in this space
00:27:52.610 --> 00:27:58.200
and it was presented at Def Con in 2016
and what it comes with is a bunch of
00:27:58.200 --> 00:28:03.320
tensorflow built-in code. It even comes
with a way that you can train the model
00:28:03.320 --> 00:28:06.730
yourself, so it has a few different
models, a few different convolutional
00:28:06.730 --> 00:28:12.130
neural networks and these are
predominantly used in computer vision.
00:28:12.130 --> 00:28:18.090
It also however has a semantic model and I
normally work in NLP and I was pretty
00:28:18.090 --> 00:28:24.240
excited to try it out. What it comes built
with is the Rotten Tomatoes sentiment, so
00:28:24.240 --> 00:28:29.900
this is Rotten Tomatoes movie reviews that
try to learn is it positive or negative.
00:28:30.470 --> 00:28:35.269
So the original text that I input in, when
I was generating my adversarial networks
00:28:35.269 --> 00:28:41.500
was "more trifle than triumph", which is a
real review and the adversarial text that
00:28:41.500 --> 00:28:46.080
it gave me was "jonah refreshing haunting
leaky"
00:28:49.470 --> 00:28:52.660
...Yeah.. so I was able to fool my network
00:28:52.660 --> 00:28:57.559
but I lost any type of meaning and
this is really the problem when we think
00:28:57.559 --> 00:29:03.539
about how we apply adversarial learning to
different tasks is, it's easy for an image
00:29:03.539 --> 00:29:08.960
if we make a few changes for it to retain
its image, right? It's many, many pixels,
00:29:08.960 --> 00:29:14.139
but when we start going into language, if
we change one word and then another word
00:29:14.139 --> 00:29:18.950
and another word or maybe we changed all
of the words, we no longer understand as
00:29:18.950 --> 00:29:23.120
humans. And I would say this is garbage
in, garbage out, this is not actual
00:29:23.120 --> 00:29:28.759
adversarial learning. So we have a long
way to go when it comes to language tasks
00:29:28.759 --> 00:29:32.740
and being able to do adversarial learning
and there is some research in this, but
00:29:32.740 --> 00:29:37.279
it's not really advanced yet. So hopefully
this is something that we can continue to
00:29:37.279 --> 00:29:42.429
work on and advance further and if so we
need to support a few different types of
00:29:42.429 --> 00:29:47.426
networks that are more common in NLP than
they are in computer vision.
00:29:50.331 --> 00:29:54.759
There's some other notable open-source libraries that
are available to you and I'll cover just a
00:29:54.759 --> 00:29:59.610
few here. There's a "Vanderbilt
computational economics research lab" that
00:29:59.610 --> 00:30:03.679
has adlib and this allows you to do
poisoning attacks. So if you want to
00:30:03.679 --> 00:30:09.429
target training data and poison it, then
you can do so with that and use scikit-
00:30:09.429 --> 00:30:16.590
learn. DeepFool allows you to do the fast
gradient sign method, but it tries to do
00:30:16.590 --> 00:30:21.590
smaller perturbations, it tries to be less
detectable to us humans.
00:30:23.171 --> 00:30:28.284
It's based on Theano, which is another library that I believe uses Lua as well as Python.
00:30:29.669 --> 00:30:34.049
"FoolBox" is kind of neat because I only
heard about it last week, but it collects
00:30:34.049 --> 00:30:39.309
a bunch of different techniques all in one
library and you could use it with one
00:30:39.309 --> 00:30:43.160
interface. So if you want to experiment
with a few different ones at once, I would
00:30:43.160 --> 00:30:47.460
recommend taking a look at that and
finally for something that we'll talk
00:30:47.460 --> 00:30:53.600
about briefly in a short period of time we
have "Evolving AI Lab", which release a
00:30:53.600 --> 00:30:59.710
fooling library and this fooling library
is able to generate images that you or I
00:30:59.710 --> 00:31:04.573
can't tell what it is, but that the neural
network is convinced it is something.
00:31:05.298 --> 00:31:09.940
So this we'll talk about maybe some
applications of this in a moment, but they
00:31:09.940 --> 00:31:13.559
also open sourced all of their code and
they're researchers, who open sourced
00:31:13.559 --> 00:31:19.649
their code, which is always very exciting.
As you may have known from some of the
00:31:19.649 --> 00:31:25.500
research I already cited, most of the
studies and the research in this area has
00:31:25.500 --> 00:31:29.830
been on malicious attacks. So there's very
few people trying to figure out how to do
00:31:29.830 --> 00:31:33.769
this for what I would call benevolent
purposes. Most of them are trying to act
00:31:33.769 --> 00:31:39.539
as an adversary in the traditional
computer security sense. They're perhaps
00:31:39.539 --> 00:31:43.889
studying spam filters and how spammers can
get by them. They're perhaps looking at
00:31:43.889 --> 00:31:48.669
network intrusion or botnet-attacks and so
forth. They're perhaps looking at self-
00:31:48.669 --> 00:31:53.390
driving cars so and I know that was
referenced earlier as well at Henrick and
00:31:53.390 --> 00:31:57.889
Karen's talk, they're perhaps trying to
make a yield sign look like a stop sign or
00:31:57.889 --> 00:32:02.760
a stop sign look like a yield sign or a
speed limit, and so forth, and scarily
00:32:02.760 --> 00:32:07.669
they are quite successful at this. Or
perhaps they're looking at data poisoning,
00:32:07.669 --> 00:32:12.441
so how do we poison the model so we render
it useless? In a particular context, so we
00:32:12.441 --> 00:32:17.990
can utilize that. And finally for malware.
So what a few researchers were able to
00:32:17.990 --> 00:32:22.669
show is, by just changing a few things in
the malware they were able to upload their
00:32:22.669 --> 00:32:26.270
malware to Google Mail and send it to
someone and this was still fully
00:32:26.270 --> 00:32:31.580
functional malware. In that same sense
there's the malGAN project, which uses a
00:32:31.580 --> 00:32:38.549
generative adversarial network to create
malware that works, I guess. So there's a
00:32:38.549 --> 00:32:43.326
lot of research of these kind of malicious
attacks within adversarial learning.
00:32:44.984 --> 00:32:51.929
But what I wonder is how might we use this for
good. And I put "good" in quotation marks,
00:32:51.929 --> 00:32:56.179
because we all have different ethical and
moral systems we use. And what you may
00:32:56.179 --> 00:33:00.289
decide is ethical for you might be
different. But I think as a community,
00:33:00.289 --> 00:33:05.450
especially at a conference like this,
hopefully we can converge on some ethical
00:33:05.450 --> 00:33:10.183
privacy concerned version of using these
networks.
00:33:13.237 --> 00:33:20.990
So I've composed a few ideas and I hope that this is just a starting list of a longer conversation.
00:33:22.889 --> 00:33:30.010
One idea is that we can perhaps use this type of adversarial learning to fool surveillance.
00:33:30.830 --> 00:33:36.470
As surveillance affects you and I it even
disproportionately affects people that
00:33:36.470 --> 00:33:41.870
most likely can't be here. So whether or
not we're personally affected, we can care
00:33:41.870 --> 00:33:46.419
about the many lives that are affected by
this type of surveillance. And we can try
00:33:46.419 --> 00:33:49.667
and build ways to fool surveillance
systems.
00:33:50.937 --> 00:33:52.120
Stenography:
00:33:52.120 --> 00:33:55.223
So we could potentially, in a world where more and more people
00:33:55.223 --> 00:33:58.780
have less of a private way of sending messages to one another
00:33:58.780 --> 00:34:03.080
We can perhaps use adversarial learning to send private messages.
00:34:03.830 --> 00:34:08.310
Adware fooling: So
again, where I might have quite a lot of
00:34:08.310 --> 00:34:13.859
privilege and I don't actually see ads
that are predatory on me as much, there is
00:34:13.859 --> 00:34:19.449
a lot of people in the world that face
predatory advertising. And so how can we
00:34:19.449 --> 00:34:23.604
help those problems by developing
adversarial techniques?
00:34:24.638 --> 00:34:26.520
Poisoning your own private data:
00:34:27.386 --> 00:34:30.600
This depends on whether you
actually need to use the service and
00:34:30.600 --> 00:34:34.590
whether you like how the service is
helping you with the machine learning, but
00:34:34.590 --> 00:34:40.110
if you don't care or if you need to
essentially have a burn box of your data.
00:34:40.110 --> 00:34:45.760
Then potentially you could poison your own
private data. Finally, I want us to use it
00:34:45.760 --> 00:34:51.139
to investigate deployed models. So even
if we don't actually need a use for
00:34:51.139 --> 00:34:56.010
fooling this particular network, the more
we know about what's deployed and how we
00:34:56.010 --> 00:35:00.350
can fool it, the more we're able to keep
up with this technology as it continues to
00:35:00.350 --> 00:35:04.630
evolve. So the more that we're practicing,
the more that we're ready for whatever
00:35:04.630 --> 00:35:09.800
might happen next. And finally I really
want to hear your ideas as well. So I'll
00:35:09.800 --> 00:35:13.940
be here throughout the whole Congress and
of course you can share during the Q&A
00:35:13.940 --> 00:35:17.073
time. If you have great ideas, I really
want to hear them.
00:35:20.635 --> 00:35:26.085
So I decided to play around a little bit with some of my ideas.
00:35:26.810 --> 00:35:32.720
And I was convinced perhaps that I could make Facebook think I was a cat.
00:35:33.305 --> 00:35:36.499
This is my goal. Can Facebook think I'm a cat?
00:35:37.816 --> 00:35:40.704
Because nobody really likes Facebook. I
mean let's be honest, right?
00:35:41.549 --> 00:35:44.166
But I have to be on it because my mom messages me there
00:35:44.166 --> 00:35:46.020
and she doesn't use the email anymore.
00:35:46.020 --> 00:35:47.890
So I'm on Facebook. Anyways.
00:35:48.479 --> 00:35:55.151
So I used a pre-trained Inception model and Keras and I fine-tuned the layers.
00:35:55.151 --> 00:35:57.190
And I'm not a
computer vision person really. But it
00:35:57.190 --> 00:36:01.770
took me like a day of figuring out how
computer vision people transfer their data
00:36:01.770 --> 00:36:06.350
into something I can put inside of a
network figure that out and I was able to
00:36:06.350 --> 00:36:12.040
quickly train a model and the model could
only distinguish between people and cats.
00:36:12.040 --> 00:36:15.140
That's all the model knew how to do. I
give it a picture it says it's a person or
00:36:15.140 --> 00:36:19.630
it's a cat. I actually didn't try just
giving it an image of something else, it
00:36:19.630 --> 00:36:25.380
would probably guess it's a person or a
cat maybe, 50/50, who knows. What I did
00:36:25.380 --> 00:36:31.930
was, I used an image of myself and
eventually I had my fast gradient sign
00:36:31.930 --> 00:36:37.700
method, I used cleverhans, and I was able
to slowly increase the epsilon and so the
00:36:37.700 --> 00:36:44.100
epsilon as it's low, you and I can't see
the perturbations, but also the network
00:36:44.100 --> 00:36:48.920
can't see the perturbations. So we need to
increase it, and of course as we increase
00:36:48.920 --> 00:36:53.300
it, when we're using a technique like
FGSM, we are also increasing the noise
00:36:53.300 --> 00:37:00.830
that we see. And when I got 2.21 epsilon
and I kept uploading it to Facebook and
00:37:00.830 --> 00:37:02.350
Facebook kept saying: "Yeah, do you want
to tag yourself?" and I'm like:
00:37:02.370 --> 00:37:04.222
"no Idon't, I'm just testing".
00:37:05.123 --> 00:37:11.379
Finally I got deployed to an epsilon and Facebook no longer knew I was a face
00:37:11.379 --> 00:37:15.323
So I was just a
book, I was a cat book, maybe.
00:37:15.340 --> 00:37:19.590
applause
00:37:21.311 --> 00:37:24.740
kjam: So, unfortunately, as we see, I
didn't actually become a cat, because that
00:37:24.740 --> 00:37:30.630
would be pretty neat. But I was able to
fool it. I spoke with the computer visions
00:37:30.630 --> 00:37:34.760
specialists that I know and she actually
works in this and I was like: "What
00:37:34.760 --> 00:37:39.020
methods do you think Facebook was using?
Did I really fool the neural network or
00:37:39.020 --> 00:37:43.140
what did I do?" And she's convinced most
likely that they're actually using a
00:37:43.140 --> 00:37:47.580
statistical method called Viola-Jones,
which takes a look at the statistical
00:37:47.580 --> 00:37:53.280
distribution of your face and tries to
guess if there's really a face there. But
00:37:53.280 --> 00:37:58.800
what I was able to show: transferability.
That is, I can use my neural network even
00:37:58.800 --> 00:38:05.380
to fool this statistical model, so now I
have a very noisy but happy photo on FB
00:38:08.548 --> 00:38:14.140
Another use case potentially is
adversarial stenography and I was really
00:38:14.140 --> 00:38:18.590
excited reading this paper. What this
paper covered and they actually released
00:38:18.590 --> 00:38:22.860
the library, as I mentioned. They study
the ability of a neural network to be
00:38:22.860 --> 00:38:26.309
convinced that something's there that's
not actually there.
00:38:27.149 --> 00:38:30.177
And what they used, they used the MNIST training set.
00:38:30.240 --> 00:38:33.420
I'm sorry, if that's like a trigger word
00:38:33.420 --> 00:38:38.410
if you've used MNIST a million times, then
I'm sorry for this, but what they use is
00:38:38.410 --> 00:38:43.290
MNIST, which is zero through nine of
digits, and what they were able to show
00:38:43.290 --> 00:38:48.790
using evolutionary networks is they were
able to generate things that to us look
00:38:48.790 --> 00:38:53.280
maybe like art and they actually used it
on the CIFAR data set too, which has
00:38:53.280 --> 00:38:57.320
colors, and it was quite beautiful. Some
of what they created in fact they showed
00:38:57.320 --> 00:39:04.340
in a gallery. And what the network sees
here is the digits across the top. They
00:39:04.340 --> 00:39:12.170
see that digit, they are more than 99%
convinced that that digit is there and
00:39:12.170 --> 00:39:15.476
what we see is pretty patterns or just
noise.
00:39:16.778 --> 00:39:19.698
When I was reading this paper I was thinking,
00:39:19.698 --> 00:39:23.620
how can we use this to send
messages to each other that nobody else
00:39:23.620 --> 00:39:28.511
will know is there? I'm just sending
really nice.., I'm an artist and this is
00:39:28.511 --> 00:39:35.200
my art and I'm sharing it with my friend.
And in a world where I'm afraid to go home
00:39:35.200 --> 00:39:42.360
because there's a crazy person in charge
and I'm afraid that they might look at my
00:39:42.360 --> 00:39:47.040
phone, in my computer, and a million other
things and I just want to make sure that
00:39:47.040 --> 00:39:51.650
my friend has my pin number or this or
that or whatever. I see a use case for my
00:39:51.650 --> 00:39:56.120
life, but again I leave a fairly
privileged life, there are other people
00:39:56.120 --> 00:40:01.690
where their actual life and livelihood and
security might depend on using a technique
00:40:01.690 --> 00:40:06.150
like this. And I think we could use
adversarial learning to create a new form
00:40:06.150 --> 00:40:07.359
of stenography.
00:40:11.289 --> 00:40:17.070
Finally I cannot impress
enough that the more information we have
00:40:17.070 --> 00:40:20.620
about the systems that we interact with
every day, that our machine learning
00:40:20.620 --> 00:40:24.850
systems, that our AI systems, or whatever
you want to call it, that our deep
00:40:24.850 --> 00:40:29.701
networks, the more information we have,
the better we can fight them, right. We
00:40:29.701 --> 00:40:33.920
don't need perfect knowledge, but the more
knowledge that we have, the better an
00:40:33.920 --> 00:40:41.360
adversary we can be. I thankfully now live
in Germany and if you are also a European
00:40:41.360 --> 00:40:46.770
resident: We have GDPR, which is the
general data protection regulation and it
00:40:46.770 --> 00:40:55.650
goes into effect in May of 2018. We can
use gdpr to make requests about our data,
00:40:55.650 --> 00:41:00.450
we can use GDPR to make requests about
machine learning systems that we interact
00:41:00.450 --> 00:41:07.840
with, this is a right that we have. And in
recital 71 of the GDPR it states: "The
00:41:07.840 --> 00:41:12.550
data subject should have the right to not
be subject to a decision, which may
00:41:12.550 --> 00:41:17.730
include a measure, evaluating personal
aspects relating to him or her which is
00:41:17.730 --> 00:41:22.880
based solely on automated processing and
which produces legal effects concerning
00:41:22.880 --> 00:41:28.010
him or her or similarly significantly
affects him or her, such as automatic
00:41:28.010 --> 00:41:33.620
refusal of an online credit application or
e-recruiting practices without any human
00:41:33.620 --> 00:41:39.270
intervention." And I'm not a lawyer and I
don't know how this will be implemented
00:41:39.270 --> 00:41:43.990
and it's a recital, so we don't even know,
if it will be in force the same way, but
00:41:43.990 --> 00:41:50.720
the good news is: Pieces of this same
sentiment are in the actual amendments and
00:41:50.720 --> 00:41:55.580
if they're in the amendments, then we can
legally use them. And what it also says
00:41:55.580 --> 00:41:59.920
is, we can ask companies to port our data
other places, we can ask companies to
00:41:59.920 --> 00:42:03.890
delete our data, we can ask for
information about how our data is
00:42:03.890 --> 00:42:09.010
processed, we can ask for information
about what different automated decisions
00:42:09.010 --> 00:42:15.750
are being made, and the more we all here
ask for that data, the more we can also
00:42:15.750 --> 00:42:20.530
share that same information with people
worldwide. Because the systems that we
00:42:20.530 --> 00:42:25.091
interact with, they're not special to us,
they're the same types of systems that are
00:42:25.091 --> 00:42:30.610
being deployed everywhere in the world. So
we can help our fellow humans outside of
00:42:30.610 --> 00:42:36.400
Europe by being good caretakers and using
our rights to make more information
00:42:36.400 --> 00:42:41.960
available to the entire world and to use
this information, to find ways to use
00:42:41.960 --> 00:42:46.242
adversarial learning to fool these types
of systems.
00:42:47.512 --> 00:42:56.500
applause
00:42:56.662 --> 00:43:03.360
So how else might we be able to harness
this for good? I cannot focus enough on
00:43:03.360 --> 00:43:08.260
GDPR and our right to collect more
information about the information they're
00:43:08.260 --> 00:43:14.110
already collecting about us and everyone
else. So use it, let's find ways to share
00:43:14.110 --> 00:43:17.740
the information we gain from it. So I
don't want it to just be that one person
00:43:17.740 --> 00:43:21.020
requests it and they learn something. Se
have to find ways to share this
00:43:21.020 --> 00:43:28.080
information with one another. Test low-
tech ways. I'm so excited about the maker
00:43:28.080 --> 00:43:32.850
space here and maker culture and other
low-tech or human-crafted ways to fool
00:43:32.850 --> 00:43:37.890
networks. We can use adversarial learning
perhaps to get good ideas on how to fool
00:43:37.890 --> 00:43:43.350
networks, to get lower tech ways. What if
I painted red pixels all over my face?
00:43:43.350 --> 00:43:48.600
Would I still be recognized? Would I not?
Let's experiment with things that we learn
00:43:48.600 --> 00:43:53.570
from adversarial learning and try to find
other lower-tech solutions to the same problem
00:43:55.428 --> 00:43:59.930
Finally. or nearly finally, we
need to increase the research beyond just
00:43:59.930 --> 00:44:04.010
computer vision. Quite a lot of
adversarial learning has been only in
00:44:04.010 --> 00:44:08.220
computer vision and while I think that's
important and it's also been very
00:44:08.220 --> 00:44:12.030
practical, because we can start to see how
we can fool something, we need to figure
00:44:12.030 --> 00:44:15.920
out natural language processing, we need
to figure out other ways that machine
00:44:15.920 --> 00:44:19.933
learning systems are being used, and we
need to come up with clever ways to fool them.
00:44:21.797 --> 00:44:26.000
Finally, spread the word! So I don't
want the conversation to end here, I don't
00:44:26.000 --> 00:44:30.950
want the conversation to end at Congress,
I want you to go back to your hacker
00:44:30.950 --> 00:44:36.530
collective, your local CCC, the people
that you talk with, your co-workers and I
00:44:36.530 --> 00:44:41.340
want you to spread the word. I want you to
do workshops on adversarial learning, I
00:44:41.340 --> 00:44:47.930
want more people to not treat this AI as
something mystical and powerful, because
00:44:47.930 --> 00:44:52.340
unfortunately it is powerful, but it's not
mystical! So we need to demystify this
00:44:52.340 --> 00:44:57.040
space, we need to experiment, we need to
hack on it and we need to find ways to
00:44:57.040 --> 00:45:02.310
play with it and spread the word to other
people. Finally, I really want to hear
00:45:02.310 --> 00:45:10.480
your other ideas and before I leave today
have to say a little bit about why I
00:45:10.480 --> 00:45:15.820
decided to join the resiliency track this
year. I read about the resiliency track
00:45:15.820 --> 00:45:21.910
and I was really excited. It spoke to me.
And I said I want to live in a world
00:45:21.910 --> 00:45:27.230
where, even if there's an entire burning
trash fire around me, I know that there
00:45:27.230 --> 00:45:32.010
are other people that I care about, that I
can count on, that I can work with to try
00:45:32.010 --> 00:45:37.840
and at least protect portions of our
world. To try and protect ourselves, to
00:45:37.840 --> 00:45:43.940
try and protect people that do not have as
much privilege. So, what I want to be a
00:45:43.940 --> 00:45:49.240
part of, is something that can use maybe
the skills I have and the skills you have
00:45:49.240 --> 00:45:56.590
to do something with that. And your data
is a big source of value for everyone.
00:45:56.590 --> 00:46:02.820
Any free service you use, they are selling
your data. OK, I don't know that for a
00:46:02.820 --> 00:46:08.420
fact, but it is very certain, I feel very
certain about the fact that they're most
00:46:08.420 --> 00:46:12.560
likely selling your data. And if they're
selling your data, they might also be
00:46:12.560 --> 00:46:17.730
buying your data. And there is a whole
market, that's legal, that's freely
00:46:17.730 --> 00:46:22.670
available, to buy and sell your data. And
they make money off of that, and they mine
00:46:22.670 --> 00:46:28.910
more information, and make more money off
of that, and so forth. So, I will read a
00:46:28.910 --> 00:46:35.410
little bit of my opinions that I put forth
on this. Determine who you share your data
00:46:35.410 --> 00:46:41.910
with and for what reasons. GDPR and data
portability give us European residents
00:46:41.910 --> 00:46:44.410
stronger rights than most of the world.
00:46:44.920 --> 00:46:47.940
Let's use them. Let's choose privacy
00:46:47.940 --> 00:46:52.800
concerned ethical data companies over
corporations that are entirely built on
00:46:52.800 --> 00:46:58.260
selling ads. Let's build start-ups,
organizations, open-source tools and
00:46:58.260 --> 00:47:05.691
systems that we can be truly proud of. And
let's port our data to those.
00:47:05.910 --> 00:47:15.310
Applause
00:47:15.409 --> 00:47:18.940
Herald: Amazing. We have,
we have time for a few questions.
00:47:18.940 --> 00:47:21.860
K.J.: I'm not done yet, sorry, it's fine.
Herald: I'm so sorry.
00:47:21.860 --> 00:47:24.750
K.J.: Laughs It's cool.
No big deal.
00:47:24.750 --> 00:47:31.520
So, machine learning. Closing remarks is
brief round up. Closing remarks. There is
00:47:31.520 --> 00:47:35.250
that machine learning is not very
intelligent. I think artificial
00:47:35.250 --> 00:47:39.330
intelligence is a misnomer in a lot of
ways, but this doesn't mean that people
00:47:39.330 --> 00:47:43.830
are going to stop using it. In fact
there's very smart, powerful, and rich
00:47:43.830 --> 00:47:49.850
people that are investing more than ever
in it. So it's not going anywhere. And
00:47:49.850 --> 00:47:53.620
it's going to be something that
potentially becomes more dangerous over
00:47:53.620 --> 00:47:58.570
time. Because as we hand over more of
these to these systems, it could
00:47:58.570 --> 00:48:04.240
potentially control more and more of our
lives. We can use, however, adversarial
00:48:04.240 --> 00:48:09.320
machine learning techniques to find ways
to fool "black box" networks. So we can
00:48:09.320 --> 00:48:14.400
use these and we know we don't have to
have perfect knowledge. However,
00:48:14.400 --> 00:48:18.930
information is powerful. And the more
information that we do have, the more were
00:48:18.930 --> 00:48:25.860
able to become a good GDPR based
adversary. So please use GDPR and let's
00:48:25.860 --> 00:48:31.230
discuss ways where we can share
information. Finally, please support open-
00:48:31.230 --> 00:48:35.590
source tools and research in this space,
because we need to keep up with where the
00:48:35.590 --> 00:48:41.790
state of the art is. So we need to keep
ourselves moving and open in that way. And
00:48:41.790 --> 00:48:46.670
please, support ethical data companies. Or
start one. If you come to me and you say
00:48:46.670 --> 00:48:50.240
"Katharine, I'm going to charge you this
much money, but I will never sell your
00:48:50.240 --> 00:48:56.520
data. And I will never buy your data." I
would much rather you handle my data. So I
00:48:56.520 --> 00:49:03.390
want us, especially those within the EU,
to start a new economy around trust, and
00:49:03.390 --> 00:49:12.740
privacy, and ethical data use.
Applause
00:49:12.740 --> 00:49:15.830
Thank you very much.
Thank you.
00:49:15.830 --> 00:49:18.050
Herald: OK. We still have time for a few
questions.
00:49:18.050 --> 00:49:20.390
K.J.: No, no, no. No worries, no worries.
Herald: Less than the last time I walked
00:49:20.390 --> 00:49:23.870
up here, but we do.
K.J.: Yeah, now I'm really done.
00:49:23.870 --> 00:49:27.730
Herald: Come up to one of the mics in the
front section and raise your hand. Can we
00:49:27.730 --> 00:49:31.584
take a question from mic one.
Question: Thank you very much for the very
00:49:31.584 --> 00:49:37.860
interesting talk. One impression that I
got during the talk was, with the
00:49:37.860 --> 00:49:42.420
adversarial learning approach aren't we
just doing pen testing and Quality
00:49:42.420 --> 00:49:47.920
Assurance for the AI companies they're
just going to build better machines.
00:49:47.920 --> 00:49:52.910
Answer: That's a very good question and of
course most of this research right now is
00:49:52.910 --> 00:49:56.780
coming from those companies, because
they're worried about this. What, however,
00:49:56.780 --> 00:50:02.290
they've shown is, they don't really have a
good way to fool, to learn how to fool
00:50:02.290 --> 00:50:08.710
this. Most likely they will need to use a
different type of network, eventually. So
00:50:08.710 --> 00:50:13.440
probably, whether it's the blind spots or
the linearity of these networks, they are
00:50:13.440 --> 00:50:18.000
easy to fool and they will have to come up
with a different method for generating
00:50:18.000 --> 00:50:24.520
something that is robust enough to not be
tricked. So, to some degree yes, its a
00:50:24.520 --> 00:50:28.520
cat-and-mouse game, right. But that's why
I want the research and the open source to
00:50:28.520 --> 00:50:33.410
continue as well. And I would be highly
suspect if they all of a sudden figure out
00:50:33.410 --> 00:50:38.170
a way to make a neural network which has
proven linear relationships, that we can
00:50:38.170 --> 00:50:42.560
exploit, nonlinear. And if so, it's
usually a different type of network that's
00:50:42.560 --> 00:50:47.430
a lot more expensive to train and that
doesn't actually generalize well. So we're
00:50:47.430 --> 00:50:51.280
going to really hit them in a way where
they're going to have to be more specific,
00:50:51.280 --> 00:50:59.620
try harder, and I would rather do that
than just kind of give up.
00:50:59.620 --> 00:51:02.560
Herald: Next one.
Mic 2
00:51:02.560 --> 00:51:07.840
Q: Hello. Thank you for the nice talk. I
wanted to ask, have you ever tried looking
00:51:07.840 --> 00:51:14.720
at from the other direction? Like, just
trying to feed the companies falsely
00:51:14.720 --> 00:51:21.560
classified data. And just do it with so
massive amounts of data, so that they
00:51:21.560 --> 00:51:25.380
learn from it at a certain point.
A: Yes, that's these poisoning attacks. So
00:51:25.380 --> 00:51:30.020
when we talk about poison attacks, we are
essentially feeding bad training data and
00:51:30.020 --> 00:51:35.120
we're trying to get them to learn bad
things. Or I wouldn't say bad things, but
00:51:35.120 --> 00:51:37.540
we're trying to get them to learn false
information.
00:51:37.540 --> 00:51:42.781
And that already happens on accident all
the time so I think the more to we can, if
00:51:42.781 --> 00:51:46.491
we share information and they have a
publicly available API, where they're
00:51:46.491 --> 00:51:49.970
actually actively learning from our
information, then yes I would say
00:51:49.970 --> 00:51:55.180
poisoning is a great attack way. And we
can also share information of maybe how
00:51:55.180 --> 00:51:58.360
that works.
So especially I would be intrigued if we
00:51:58.360 --> 00:52:02.330
can do poisoning for adware and malicious
ad targeting.
00:52:02.330 --> 00:52:07.300
Mic 2: OK, thank you.
Herald: One more question from the
00:52:07.300 --> 00:52:12.300
internet and then we run out of time.
K.J. Oh no, sorry
00:52:12.300 --> 00:52:14.290
Herald: So you can find Katherine after.
Signal-Angel: Thank you. One question from
00:52:14.290 --> 00:52:18.210
the internet. What exactly can I do to
harden my model against adversarial
00:52:18.210 --> 00:52:21.210
samples?
K.J.: Sorry?
00:52:21.210 --> 00:52:27.080
Signal: What exactly can I do to harden my
model against adversarial samples?
00:52:27.080 --> 00:52:33.340
K.J.: Not much. What they have shown is,
that if you train on a mixture of real
00:52:33.340 --> 00:52:39.300
training data and adversarial data it's a
little bit harder to fool, but that just
00:52:39.300 --> 00:52:44.720
means that you have to try more iterations
of adversarial input. So right now, the
00:52:44.720 --> 00:52:51.520
recommendation is to train on a mixture of
adversarial and real training data and to
00:52:51.520 --> 00:52:56.330
continue to do that over time. And I would
argue that you need to maybe do data
00:52:56.330 --> 00:53:00.400
validation on input. And if you do data
validation on input maybe you can
00:53:00.400 --> 00:53:05.100
recognize abnormalities. But that's
because I come from mainly like production
00:53:05.100 --> 00:53:09.220
levels not theoretical, and I think maybe
you should just test things, and see if
00:53:09.220 --> 00:53:15.210
look weird you should maybe not take them
into the system.
00:53:15.210 --> 00:53:19.340
Herald: And that's all for the questions.
I wish we had more time but we just don't.
00:53:19.340 --> 00:53:21.660
Please give it up for Katharine Jarmul
00:53:21.660 --> 00:53:26.200
Applause
00:53:26.200 --> 00:53:31.050
34c3 postroll music
00:53:31.050 --> 00:53:47.950
subtitles created by c3subtitles.de
in the year 2019. Join, and help us!