What we hope to do with this meetup,
given the spread of
the questionnaire results,
is have something which is kind of
for people who don't know what
deep learning is
and want an introduction to TensorFlow,
but also something which is more of a
crowd pleaser, something
which is more cutting edge
I am not going to say that this
thing is particularly cutting edge
because once we saw the responses,
we dialed things down a bit
But there will be more cutting edge stuff
and maybe we'll start to do other
events in other formats
So it could be that we have
an experts' paper-reading meeting
or we could split it, now that we can see
the size of the crowd
Anyway, let me talk a little bit about
going deeper with transfer learning
Unfortunately, this is something
some of you
would have seen me do before
This is the first time I have
done it in tensorflow
and let me just explain that
Before, I had been programming this stuff
in Theano with the
Lasagne layers library on top
and Theano is a research-based
deep learning framework, out of Montreal
but what I have concluded
since last summer
is that TensorFlow is probably the winner
of this framework race,
at least for the foreseeable future,
with all this nice industrial stuff,
so I should be retooling into TensorFlow
That's what I am taking the opportunity
to do for this
So, about me, sorry here we go
I have come up through finance,
startups and stuff
I took a year out basically in 2014
just for fun
I have been doing serious kind of
natural language processing since then
Basically, this is the overview for the
"something more challenging" talk,
which will probably be 20 to 30 minutes
depending on how it goes
I want to take a state-of-the-art
TensorFlow model
I want to solve a problem that
it wasn't trained for
And I am going to be using
deep learning as a component
of my solution rather than the
primary focus of what I am trying to build
So this is, in a way more of an industrial
or commercial kind of application
for what's going on here
So the goal for this kind of problem is
I want to distinguish pictures
of classic and modern sports cars
you will see some pictures of
classic and modern cars a bit later
It's not that easy to say what
the difference is
obviously, this could be done with
different types of images
and with lots of
different classes
I am just doing a very simple
two-class thing
but these are complicated images
what I want to do is
I want to have a very small training time
so I don't want to be retraining
some huge network
In particular, in this case
I have only got 20 training examples
So I am not gonna do any fantastic
million image training
I have got 20 images with me
and I also want to be able to
put this in production
so I can just run it as a component of
something else
Basically, one of the things that is
carrying the deep learning world forward
is an image classification task
called ImageNet
this has been a competition where
they have 15 million labeled images
from 22,000 categories
and you can see some of them here
if we go to this.
this is a picture of a hotdog in a bun
and here are some of the categories,
which will be food of some kind
these are hotdogs, lots of
different pictures of hotdogs
lots of different pictures of cheeseburgers
lots of different pictures of plates
so the task for ImageNet is to classify,
for any one of these images,
which of a thousand different
categories it is from
and it used to be that people could
score adequately well
and were making incremental improvements in
how well they could do this
but the deep learning people came along
and kind of tore this to shreds
and Google came up with GoogLeNet,
which is what we are actually going
to use here, back in 2014
suddenly, this stuff is now being done,
with further iterations
of this kind of thing,
better than humans can do it
So the way you can measure whether
something is better than humans
is you take a human and see
whether the model beats them
the question there is
whether there are labeling errors,
which is why you need a committee of humans
so the way they label these things is
by running them through Mechanical Turk and
asking people which category a given
cheeseburger is in
The network we are going to use here
is the 2014 state-of-the-art GoogLeNet,
also called Inception version 1
The nice thing about this is that
there is an existing model
already trained for this task
and it's available for download
it's all free
and there are lots of different
models out there
there's a model zoo for TensorFlow
So, what I have on my machine
is a small model,
it's a 20-megabyte kind of model
So it is not a very big model
Inception v4 is a 200 MB kind of model
which is a bit heavy
I am working here on my laptop
you are gonna see it working in real-time
and the trick here is instead of
a softmax layer at the end
I will show you the diagram, it should be
clear to anyone who's following along
instead of using the logits to get me
the probabilities
I am going to strip that away
and I am going to train
a support vector machine
to distinguish these classes
I am not going to retrain the
Inception network at all
I am going to just use it as a component
strip off the top classification piece
and replace it with an SVM
Now, SVMs are pretty well understood
here I am just using Inception
as a featurizer for images
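Roughly, the idea is just this, sketched with scikit-learn and with random numbers standing in for the real Inception logits:

    import numpy as np
    from sklearn import svm

    # stand-in features: in reality each row would be the ~1000 logits
    # Inception produces for one car photo
    features = np.random.rand(20, 1000)
    labels = ['classic'] * 10 + ['modern'] * 10

    clf = svm.LinearSVC()      # a plain SVM on top of the frozen network
    clf.fit(features, labels)  # trains in a blink with only 20 examples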
So here's a network picture
Basically, this is what the ImageNet
network is designed for
you put in an image at the bottom
there is this black box which is the
Inception network
which is a bunch of CNNs or
convolutional neural networks
followed by a dense network
followed by these logits
and this logits layer is essentially
the same as the ten outputs, 0 to 9,
that Sam had for his digits, except here
it's 1 to 1000 for the ImageNet classes
To actually get the ImageNet output,
it uses a softmax function and
then chooses the highest one of these
to tell you which class
the image is in
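In TensorFlow 1.x terms, that last step is just a softmax followed by an argmax, sketched here with a placeholder standing in for the network's logits:

    import tensorflow as tf

    logits = tf.placeholder(tf.float32, [None, 1000])   # stand-in for Inception's logits
    probabilities = tf.nn.softmax(logits)                # 1000-way class probabilities
    predicted_class = tf.argmax(probabilities, axis=1)   # index of the most likely class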
What I am going to do is
I am going to ignore this
neat piece of classification technology
that they have got
and instead use these outputs as inputs
to an SVM, just treating them as features
Now if we pick out one of these
this class could be cheeseburger
and this class could be parrot
and this other class could be Husky dog
there are all sorts of classes in here
but basically what I will be doing is that
I will be extracting out the features
of these photos
saying how much of this photo
is like a parrot
how much of this is like a Husky dog
Now it turns out that modern cars and
classic cars can be distinguished that way
Let me go to some code
Ok this code is all up on GitHub
Can everyone see this well enough?
So basically, I am pulling in TensorFlow
I pull in this model
Here is what the Inception architecture is
It feeds forward this way,
here you put your image
it goes through lots and lots of
convolutional layers
all the way up to the end
with softmax and the output
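The setup is roughly the usual TF-Slim pattern rather than the exact notebook code; it assumes the TF-Slim image-model library from the tensorflow/models repository is on the Python path:

    import numpy as np
    import tensorflow as tf

    slim = tf.contrib.slim
    # from the tensorflow/models "slim" image-model library (assumed to be on the path)
    from nets import inception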
So having done that, what I will do is
actually I have a download
for the checkpoint
this is the checkpoint here which
is a tar file, I have it locally stored
It doesn't download it now
but it is all there, even the
big models are up there from Google
they have pre-trained these;
the Inception thing takes about a week
to train on a bunch of GPUs,
it could be 64 GPUs
so you don't really want to be
training this thing on your own
you also need the ImageNet training set
it is a 140 GB file
which is no fun to download
what I am doing here is basically
there is also an Inception library
which is part of TF-Slim
this thing is designed such that
it already knows the network
and can preload it
this has loaded it,
I can get some labels
This is loading up the ImageNet labels
I need to know which location
corresponds to which class like the digits
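The label lookup is roughly this, assuming the helper that ships with the TF-Slim models repo and a locally unpacked Inception v1 checkpoint:

    # also from the TF-Slim models repo (assumed to be on the path)
    from datasets import imagenet

    # maps class index -> human-readable name, e.g. the hotdog and
    # cheeseburger categories shown earlier
    imagenet_labels = imagenet.create_readable_names_for_imagenet_labels()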
Here we are going through
basically the same steps
as the MNIST example in that
we reset the default graph
we create a placeholder which is
where my images are going to go
this is the input
but from this image input
I am then going to do some TensorFlow steps
because TensorFlow
has various preprocessing
or graphics handling commands
because a lot of this stuff
works with images
so there's all sorts of clipping
and rotating stuff
so it can preprocess these images
I am also going to pull out a numpy image
so I can see what it is actually looking at
here with this Inception version 1
I am going to pull in the entire
Inception version 1 model
My net function, rather than being
given just random weights,
is going to be assigned the weights
from this checkpoint
so when I run the init op
in my session, it won't initialize
everything from random,
it will initialize everything from disk
so this will define the model
and now let's proceed
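Continuing the sketch above, defining the model and pointing the initializer at the checkpoint looks roughly like this (checkpoint file name assumed):

    tf.reset_default_graph()
    # placeholder where the 224 x 224 RGB images will go
    images = tf.placeholder(tf.float32, [None, 224, 224, 3])

    # build the whole Inception v1 graph with its published arg_scope
    with slim.arg_scope(inception.inception_v1_arg_scope()):
        logits, end_points = inception.inception_v1(
            images, num_classes=1001, is_training=False)
    probabilities = tf.nn.softmax(logits)

    # instead of random initialization, restore every InceptionV1
    # variable from the downloaded checkpoint
    init_fn = slim.assign_from_checkpoint_fn(
        'inception_v1.ckpt', slim.get_model_variables('InceptionV1'))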
one of the issues with having this
on a nice TensorFlow graph
is it just says input, Inception1, output
so there's a big block there
you can delve into it if you want
let me just show you
let's go back a bit
So this is the code
behind the Inception1 model
so this is actually smaller than
Inception v2 and Inception v3
basically, we have a kind of a base
Inception piece, just this
and these are combined together
and this is a detailed model put together
by many smart people in 2014
it's got much more complicated since then
fortunately, they have written the code
and we don't have to
So here what I am gonna do is
I am gonna load an example image
just to show you
one of the things here is that
TensorFlow, in order to be efficient,
wants to do the loading itself
So in order to get this pumping
information through
it wants you to set up queues of images
it will then handle the whole ingestion
process itself
the problem with that is
it's kind of complicated to do
in a Jupyter notebook right here
so here I am going to do
the very simplest thing
which is load a numpy image
and stuff the numpy image in
but what TensorFlow would love me to do
is create, as you see in this one,
a filename queue, and it will
then run the queue, do the matching
and do all of this stuff itself
because then it can lay it out across
a potentially distributed cluster
and do everything just right
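That queue-based route would look roughly like this with standard TF 1.x input ops (paths assumed), which is why it is awkward to demo in a notebook:

    # TensorFlow-managed ingestion: a queue of file names, a reader,
    # and a decoder, all run by the framework itself
    filename_queue = tf.train.string_input_producer(
        tf.train.match_filenames_once('cars/classic/*.jpg'))
    reader = tf.WholeFileReader()
    _, file_contents = reader.read(filename_queue)
    decoded_image = tf.image.decode_jpeg(file_contents, channels=3)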
here I do kind of the simple read the image
so this image is a tensor
which is 224 by 224 by RGB
this is kind of a sanity check of
what kind of numbers I've got in the corner
and then what I am gonna do is
I am going to crop out the
middle section of it
this happens to be the right size already
basically, if you've got odd shapes
you need to think about
how you are going to handle it:
are you going to pad it,
what do you do
because in order to make this efficient
TensorFlow wants to lay it out without
all this variability in image size
one set of parameters and it's then going
to blast it across your GPU
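The simple numpy path is roughly this (file name assumed; the exact pixel scaling depends on the preprocessing the notebook applies):

    import numpy as np
    import matplotlib.pyplot as plt

    img = plt.imread('images/cat.jpg').astype(np.float32) / 255.0  # H x W x 3 in [0, 1]
    h, w, _ = img.shape
    top, left = (h - 224) // 2, (w - 224) // 2
    crop = img[top:top + 224, left:left + 224, :]   # central 224 x 224 x RGB crop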
so let's just run this thing
so now we have defined the network
here I am going to pick a session
here I am going to init the session
it loads the data, and then I am going
to pick up the numpy image and the
probabilities from the top layer
I am just gonna show it
here is the image
this is the image I pulled off the disk
you can see here the probabilities,
the highest probability is Tabby cat
which is good, it's also interesting that
the next ones in line are Tiger cat,
Egyptian cat, lynx
so it's got a fair idea that it is a cat
in particular, it is getting it right
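Running that step looks roughly like this, continuing the sketches above:

    with tf.Session() as sess:
        init_fn(sess)   # load the pre-trained Inception v1 weights from disk
        probs = sess.run(probabilities,
                         feed_dict={images: crop[np.newaxis, :, :, :]})[0]

    # print the five most likely ImageNet classes
    for i in probs.argsort()[-5:][::-1]:
        print(imagenet_labels[i], probs[i])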
ok so this is the same diagram
we have had before
what you have seen is this going in this
black box, coming out and telling us
the probabilities here, so what we are
now gonna do is go
from the image to the black box and
just learn a bunch of features
let me just show you this on disk
so I have a cars directory here
and inside this thing,
I have surprisingly little data
In this directory, I just have a
bunch of car images
and I have two sets of images
one of which is called classic
and the other is called modern
so basically I picked some
photos off Flickr
I put these into two separate directories
I am going to use those directory names
as the classification for these images
In the upper directory here
I've got a bunch of test images
which I don't know the labels for
this picks out the list of classes; there
is a classic and a modern directory
I am gonna go through every file
in this directory
I am gonna crop it, I am gonna find
the logits level which is
all the classes and then I am just gonna
add these to features
So basically I am gonna do something
like a scikit-learn model,
I am gonna fit an SVM
so basically, this is featurizing
all these pictures
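The featurizing loop is roughly this, with a hypothetical crop_middle() helper standing in for the cropping step shown earlier:

    import glob
    import os

    features, labels = [], []
    with tf.Session() as sess:
        init_fn(sess)
        for label in ['classic', 'modern']:
            for path in glob.glob(os.path.join('cars', label, '*.jpg')):
                img = crop_middle(plt.imread(path))   # hypothetical crop helper
                logit_values = sess.run(logits,
                                        feed_dict={images: img[np.newaxis]})
                features.append(logit_values[0])      # ~1000 numbers per photo
                labels.append(label)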
so here we go with the training data
here's some training
classic cars,
it went through the classic directory
modern cars,
it went through the modern directory
it's thinking hard
what I am gonna do now is
build an SVM over those features
[recording jumps to 21:36]
I restarted this thing
the actual training for this SVM
takes next to no time,
it is very quick, essentially 20 images'
worth of a thousand features
so there was no big training loop to do
then I can run this on the actual models
in the directory, in the test set
so here, these are images that it has never
seen before
it thinks that this is a modern car
this one it thinks is a classic car,
this one is classified as modern
so this is actually doing quite a good job
out of just 10 examples of each
it actually thinks this one is modern
it's not a sports car but anyway
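The fit-and-predict step is roughly this, assuming test_features was built from the unlabelled test images the same way as the training features:

    from sklearn import svm

    clf = svm.LinearSVC()
    clf.fit(features, labels)   # 20 rows x ~1000 logit features: fits almost instantly

    # test_features: logits for the unlabelled images in the top-level cars/ directory
    print(clf.predict(test_features))   # array of 'classic' / 'modern' guesses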
so this is showing that the SVM we trained
can classify based on the features that
Inception is producing because
Inception "understands"
what images are about
so if I go back to here,
the code is on GitHub
Conclusions: okay, this thing really works
we didn't have to train
a deep neural network
we could plug this TensorFlow model
into an existing pipeline
and this is actually something where
the TensorFlow Summit has something
to say about these pipelines
because not only are they talking
about deep learning
they are talking about the whole
cloud-based learning
and setting up proper processes
I guess, time for questions quickly
we can then do the
TensorFlow Summit wrap-up
"I am assuming that there is no
backpropagation here"
Right, this involves no backpropagation
"End result is a feature"
I am just assuming that Inception,
you can imagine if the ImageNet thing
had focused more on products,
it could be even better
if it focused on man-made things
The ImageNet training set has an awful
lot of dogs in it, not that many cats
So, on the other hand it may be that
it has quite a lot of flowers
or maybe it is saying, I label this car
as a modern car
because it's got petals for wheels
whereas the other one, the classic cars
tend to have round things for wheels
So it is abstractly doing this
It doesn't know about sports cars or
what they look like
But it does know about curves
"So for SVM, you don't use
TensorFlow anymore ?"
No, basically I have used TensorFlow to
create some features
Now, I don't want to throw it away
because hopefully I have got
a streaming process where
more and more images are chugged
through this thing
[Question inaudible]
There is some example code called
TensorFlow for Poets
where they actually say,
let's load up one of these networks
and then we will do some fine tuning
there you get involved in tuning
these neurons with some gradient descent
and you are taking some steps
and all this kind of thing
maybe you end up having broad implications
across the whole network
which could be good if you have got
tons of data and tons of time
but this is a very simple way of just
tricking it to get it done
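For contrast, the fine-tuning route can be sketched with assumed names: bolt a fresh two-class layer onto an upper Inception activation and train just that layer (dropping the var_list restriction would fine-tune the whole network):

    # 'Mixed_5c' is one of the upper end_points of the slim Inception v1 model
    bottleneck = end_points['Mixed_5c']
    pooled = tf.reduce_mean(bottleneck, axis=[1, 2])          # global average pool
    new_logits = slim.fully_connected(pooled, 2,
                                      activation_fn=None, scope='new_classes')

    label_ids = tf.placeholder(tf.int32, [None])              # 0 = classic, 1 = modern
    loss = tf.losses.sparse_softmax_cross_entropy(labels=label_ids, logits=new_logits)
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
        loss,
        var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'new_classes'))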
[Comment inaudible]
it will be a very small network
because an SVM is essentially fairly shallow
[Question inaudible]
Even though TensorFlow has imported
this large Inception network,
as far as I am concerned,
I am using an f(x) = y and that's it
but you can inquire what it would say
at this particular level,
and at these bunches of levels with various
component points along the way
I could take out other levels,
I haven't tried it to have a look
There you get more like pictures'
worth of features rather than
this string of a thousand numbers
but each intermediate level
will be pictures with CNN kind of features
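Pulling out one of those intermediate levels would look roughly like this inside an open session (layer name and shape are illustrative):

    # end_points is the dict of intermediate activations returned by inception_v1
    mid_layer = end_points['Mixed_4b']
    mid_values = sess.run(mid_layer, feed_dict={images: crop[np.newaxis]})
    print(mid_values.shape)   # a picture-like feature map, e.g. (1, 14, 14, 512)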
on the other hand, if you want
to play around with this thing
there's this nice stuff called
DeepDream and those kinds of things
where they try and match images to
being interesting images
then you do the featurizing that looks at
different levels:
the highest level is a cat, but I want all
the local features to be as fishy as possible
and then you get something like a fish-faced cat
that's the kind of thing you can do with
these kinds of features in models