What we hope to do with this meetup, given the spread of the questionnaire results, is have something for people who don't know what deep learning is and want an introduction to TensorFlow, but also something which is more of a crowd-pleaser, something more cutting edge. I'm not going to claim this talk is particularly cutting edge, because once we saw the responses we dialled things down a bit, but there will be more cutting-edge stuff, and maybe we'll start to do other meetup events in other formats. We could have an experts' paper-reading meeting, or we could split things up, now that we can see the size of the crowd.

Anyway, let me talk a little about going deeper with transfer learning. Unfortunately, some of you will have seen me do this before, but it's the first time I've done it in TensorFlow, so let me explain that. Previously I programmed this stuff in Theano with the Lasagne layers library on top. Theano is a research-oriented deep learning framework out of Montreal, but what I've concluded since last summer is that TensorFlow is probably the winner of this framework race, at least for the foreseeable future, with all this nice industrial tooling, so I should be retooling into TensorFlow. That's what I'm taking the opportunity to do here.

About me: I came up through finance, startups and so on, took a year out in 2014 basically for fun, and I've been doing serious natural language processing since then.

The overview for this "something more challenging" talk, which will probably run 20 to 30 minutes depending on how it goes: I want to take a state-of-the-art TensorFlow model and solve a problem it wasn't trained for, using deep learning as a component of my solution rather than as the primary focus of what I'm trying to build. So in a way this is more of an industrial or commercial kind of application.

The goal is to distinguish pictures of classic and modern sports cars; you'll see some pictures of classic and modern cars a bit later. It's not that easy to say what the difference is. Obviously it could be different types of images and lots of different classes; I'm just doing a very simple two-class task, but with complicated images. I also want a very small training time, so I don't want to be retraining some huge network. In particular, I've only got 20 training examples in this case, so I'm not going to do any fantastic million-image training run. I've got 20 images, and I also want to be able to put this in production, so I can run it as a component of something else.

One of the things that has carried the deep learning world forward is an image classification task called ImageNet. This has been a competition with 15 million labelled images from 22,000 categories, and you can see some of them here if we go to this.
This is a picture of a hot dog in a bun, and here are some of the categories, which happen to be food: lots of different pictures of hot dogs, lots of different pictures of cheeseburgers, lots of different pictures of plates. The task for ImageNet is to classify, for any one of these images, which of a thousand different categories it belongs to. It used to be that people could score adequately well and were making incremental improvements, but then the deep learning people came along and tore this to shreds, and Google came up with GoogLeNet, which is what we're actually going to use here, back in 2014. Suddenly this stuff, with further iterations of this kind of thing, is being done better than humans can do it. The way you measure whether something is better than humans is you take a human and see whether it beats him; the question there is whether there are labelling errors, so you need a committee of humans. The way they label these things is by running it on Mechanical Turk and asking people what category this cheeseburger is in.

The network we're going to use here is the 2014 state of the art, GoogLeNet, also called Inception version 1. The nice thing is that there's an existing model already trained for this task, available for download, all free, and there are lots of different models out there; there's a model zoo for TensorFlow. What I have on my machine is a small model, about 20 megabytes, so it's not very big. Inception 4 is a 200 MB kind of model, which is a bit heavy; I'm working here on my laptop, and you're going to see it working in real time.

The trick here is that instead of the softmax layer at the end (I'll show you the diagram, so it should be clear to anyone following along), instead of using the logits to get me the probabilities, I'm going to strip that away and train a support vector machine to distinguish these classes. I'm not going to retrain the Inception network at all; I'm just going to use it as a component, strip off the top classification piece and replace it with an SVM. SVMs are pretty well understood; here I'm just using Inception as a featurizer for images.

So here's a network picture. This is what the ImageNet network is designed for: you put in an image at the bottom, there's this black box which is the Inception network, a bunch of CNNs (convolutional neural networks) followed by a dense network, followed by these logits. The logits layer is essentially the same as the 0 to 9 that Sam had for his digits, except it's 1 to 1000 for the different ImageNet classes. To actually get the ImageNet output it applies a softmax function and then chooses the highest one, to tell you which class the image is in. What I'm going to do is ignore that neat piece of classification technology, use these outputs as inputs to an SVM, and just treat them as features. If we pick out one of these, this class could be cheeseburger, this class could be parrot, this other class could be Husky dog; there are all sorts of classes in here. Basically I'll be extracting out the features of these photos, saying how much of this photo is like a parrot, how much is like a Husky dog. It turns out that modern cars and classic cars can be distinguished that way.
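To make the idea of stripping off the softmax concrete, here is a minimal sketch (my illustration, not the talk's notebook code) of the two paths in plain numpy: the ImageNet path applies a softmax to the 1000 logits and takes the argmax, while the transfer-learning path keeps the raw logits vector as features for a downstream classifier. The `logits` array here is a hypothetical stand-in for the network's output.

```python
import numpy as np

# Hypothetical 1000-dimensional logits vector for one image,
# i.e. the layer just below Inception's softmax.
logits = np.random.randn(1000)

# ImageNet path: softmax turns logits into class probabilities,
# then we pick the most likely of the 1000 classes.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted_class = int(np.argmax(probs))

# Transfer-learning path: skip the softmax entirely and treat the
# raw logits as a fixed-length feature vector for an SVM.
feature_vector = logits            # shape (1000,)
print(predicted_class, feature_vector.shape)
```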
Let me go to some code. OK, this code is all up on GitHub. Can everyone see this well enough? So basically, I'm pulling in TensorFlow and I pull in this model. Here's what the Inception architecture is: it feeds forward this way, you put your image in here, it goes through lots and lots of convolutional layers, all the way up to the end with the softmax and the output. Having done that, what I actually have is a download for the checkpoint; this is the checkpoint here, which is a tar file I have locally stored. It doesn't download it now, but it's all there; even the big models are up there from Google, because they have pretrained these. The Inception model takes about a week to train on a bunch of GPUs, it could be 64 GPUs, so you don't really want to be training this thing on your own. You'd also need the ImageNet training set, which is a 140 GB file and no fun to download.

What I'm doing here is using the Inception library which is part of TF-Slim. It's designed so that it already knows the network and can preload it. This has loaded it, and I can get some labels. This is loading up the ImageNet labels, because I need to know which position corresponds to which class, just like with the digits.

Here we go through basically the same steps as the MNIST example: we reset the default graph and we create a placeholder, which is where my images are going to go as an input. From this image input I'm then going to do some TensorFlow steps, because TensorFlow has various preprocessing and graphics-handling commands; a lot of this stuff works with images, so there's all sorts of clipping and rotating functionality, and it can preprocess these images. I'm also going to pull out a numpy image so I can see what it's actually looking at. Then I'm going to pull in the entire Inception version 1 model. My net function, rather than picking random weights, is going to be assigned from this checkpoint; when I run the init step for my graph in my session, it won't initialize everything at random, it will initialize everything from disk. So this defines the model, and now we can proceed.

One of the issues with having this on a nice TensorFlow graph is that it just says input, Inception1, output, so there's a big block there. You can delve into it if you want; let me just show you, let's go back a bit. This is the code behind the Inception v1 model, which is actually smaller than Inception 2 and Inception 3. Basically there's a kind of base Inception piece, just this, and these are combined together; it's a detailed model put together by many smart people in 2014, and it's got much more complicated since then. Fortunately they have written the code, so we don't have to.

Here what I'm going to do is load an example image, just to show you. One of the things here is that TensorFlow, in order to be efficient, wants to do the loading itself. To get information pumping through, it wants you to set up queues of images, and it will then handle the whole ingestion process itself. The problem with that is that it's kind of complicated to do in a Jupyter notebook, so here I'm doing the very simplest thing, which is to load a numpy image and stuff that numpy image in. What TensorFlow would love me to do, as you can see in this one, is create a filename queue; it will then run the queue, do the matching and do all of this itself, because then it can lay the work out across a potentially distributed cluster and do everything just right. Here I just do the simple thing and read the image, so this image is a tensor which is 224 by 224 by RGB.
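For reference, here is a hedged sketch of what those steps (reset the graph, create the image placeholder, build Inception v1 and restore it from the checkpoint) can look like with the TF-Slim API of that era; the checkpoint path and the exact calls are assumptions based on the public TF-Slim model zoo, not necessarily the talk's actual notebook:

```python
import tensorflow as tf
from tensorflow.contrib import slim
from tensorflow.contrib.slim.nets import inception

IMAGE_SIZE = 224                     # Inception v1 input size
CHECKPOINT = 'inception_v1.ckpt'     # assumed local path to the downloaded checkpoint

tf.reset_default_graph()

# Placeholder where the (already cropped) RGB image batch goes in.
images = tf.placeholder(tf.float32, [None, IMAGE_SIZE, IMAGE_SIZE, 3])

# Build the whole Inception v1 graph; num_classes=1001 matches the
# released checkpoint (1000 ImageNet classes plus a background class).
with slim.arg_scope(inception.inception_v1_arg_scope()):
    logits, end_points = inception.inception_v1(
        images, num_classes=1001, is_training=False)
probabilities = tf.nn.softmax(logits)

# Instead of initializing weights at random, restore them from disk.
init_fn = slim.assign_from_checkpoint_fn(
    CHECKPOINT, slim.get_model_variables('InceptionV1'))

sess = tf.Session()
init_fn(sess)                        # loads the ~20 MB of pretrained weights
```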
This is a kind of sanity check of what numbers I've got in the corner. Then I'm going to crop out the middle section of it; this one happens to be the right size already. Basically, if you've got odd shapes you need to think about how to handle them: am I going to pad it, what do you do? In order to make this efficient, TensorFlow wants to lay everything out with one set of parameters, without all this variability in image size, and it's then going to blast it across your GPU.

So let's just run this thing. Now that we've defined the network, I create a session, I init the session, it loads the data, and then I pick up the numpy image and the probabilities from the top layer and just show them. Here's the image, the one I pulled off disk, and you can see the probabilities: the highest probability is tabby cat, which is good, and it's also interesting that the next in line are tiger cat, Egyptian cat and lynx, so it's got a fair idea that it's a cat; in particular, it's getting it right.

OK, so this is the same diagram we had before. What you've seen is the image going into this black box, coming out, and telling us the probabilities here. What we're now going to do is go from the image to the black box and just learn a bunch of features. Let me show you this on disk: I have a cars directory here, and inside it I have surprisingly little data. In this directory I just have a bunch of car images, in two sets, one called classic and the other called modern. Basically, I picked some photos off Flickr, put them into two separate directories, and I'm going to use those directory names as the classification for these images. In the upper directory I've got a bunch of test images which I don't know the labels for.

This picks out the list of classes; there's a classic and a modern directory. I'm going to go through every file in each directory, crop it, find the logits level, which is all the classes, and then just add these to the features. So basically I'm going to do something like a scikit-learn model: I'm going to fit an SVM. This is featurizing all these pictures. Here we go with the training data: here are some training classic cars, it went through the classic directory; modern cars, it went through the modern directory; it's thinking hard. What I'm going to do now is build an SVM over those features.

(jump to 21:36) I restarted this thing. The actual training for this SVM takes that long: it's very quick, essentially 20 images' worth of a thousand features, so there was no big training loop to do. Then I can run this on the actual images in the test directory, the test set. These are images it has never seen before: it thinks this is a modern car, this one it thinks is a classic car, this one is classified as modern. So it's actually doing quite a good job from just 10 examples of each. It actually thinks this one is modern; it's not a sports car, but anyway. So this shows that the SVM we trained can classify based on the features that Inception is producing, because Inception "understands" what images are about.
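For anyone following along at home, here is a hedged sketch of the featurize-and-fit loop just described, reusing `sess`, `images` and `logits` from the earlier TF-Slim sketch. The `cars/classic` and `cars/modern` directory layout matches the talk, but the helper names, the preprocessing details and the choice of scikit-learn's `LinearSVC` are my assumptions, not necessarily what the GitHub notebook does:

```python
import os
import numpy as np
from PIL import Image
from sklearn.svm import LinearSVC

def load_224(path):
    """Load an image, resize to 224x224 RGB (a simple stand-in for a proper
    centre crop) and scale roughly to the [-1, 1] range the slim checkpoints expect."""
    img = Image.open(path).convert('RGB').resize((224, 224))
    return np.asarray(img, dtype=np.float32) / 255.0 * 2.0 - 1.0

def featurize(path):
    """Run one image through Inception and return its logits as a feature vector."""
    batch = load_224(path)[np.newaxis, :, :, :]          # shape (1, 224, 224, 3)
    return sess.run(logits, feed_dict={images: batch})[0]

# Build the tiny training set: directory names are the labels.
features, labels = [], []
for label in ['classic', 'modern']:
    folder = os.path.join('cars', label)
    for fname in os.listdir(folder):
        features.append(featurize(os.path.join(folder, fname)))
        labels.append(label)

# Fitting the SVM on 20 examples of ~1000 features is essentially instantaneous.
svm = LinearSVC()
svm.fit(np.array(features), labels)

# Classify the unlabelled test images sitting in the parent directory.
for fname in sorted(os.listdir('cars')):
    path = os.path.join('cars', fname)
    if os.path.isfile(path):
        print(fname, '->', svm.predict([featurize(path)])[0])
```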
So if I go back to here: the code is on GitHub. Conclusions: this thing really works, we didn't have to train a deep neural network, and we could plug this TensorFlow model into an existing pipeline. This is actually something the TensorFlow Summit has a lot to say about, these pipelines: not only are they talking about deep learning, they're talking about whole cloud-based learning and setting up proper processes. I guess it's time for questions, quickly, and then we can do the TensorFlow Summit wrap-up.

"I am assuming that there is no backpropagation here?" This involves no backpropagation.

"The end result is a feature?" Yes. You can imagine that if the ImageNet task had focused more on products, on man-made things, this could be even better. The ImageNet training set has an awful lot of dogs in it and not that many cats. On the other hand, it may have quite a lot of flowers, so maybe it's saying "I like this car as a modern car because it's got petals for wheels", whereas the classic cars tend to have round things for wheels. So it's doing this abstractly: it doesn't know about sports cars or what they look like, but it does know about curves.

"So for the SVM, you don't use TensorFlow any more?" No. Basically I've used TensorFlow to create some features. I don't want to throw it away, though, because hopefully I've got a streaming process where more and more images get run through this thing.

(Could not hear the question properly.) There is example code called TensorFlow for Poets where they say: let's load up one of these networks and then do some fine-tuning. There you get involved in tuning the neurons with some gradient descent, taking steps and all that kind of thing; maybe you're having broad implications across the whole network, which could be good if you've got tons of data and tons of time, but this is a very simple way of just tricking it to get the job done.

(Could not hear the comment properly.) It will be a very small network, because the SVM is essentially fairly shallow.

(Could not hear the question.) Even though TensorFlow has imported this large Inception network, as far as I'm concerned I'm using f(x) = y and that's it. But you can inquire what it would say at a particular level, or at bunches of levels, at various points along the way; I could take out other levels, I just haven't tried it, to have a look. There you'd get more like pictures' worth of features rather than this string of a thousand numbers; each intermediate level will be pictures with CNN kinds of features. On the other hand, if you want to play around with this, there's the nice DeepDream kind of thing, where they try to match images to being interesting images. There you do the featurizing looking at different levels: the highest level says "cat", but I want all the local features to be as fishy as possible, and then you get a fish-faced cat. That's the kind of thing you can do with these kinds of features in models.
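On that last point about intermediate levels: with the TF-Slim graph from the earlier sketch, the `end_points` dictionary already exposes the intermediate layers, so pulling out a mid-network activation map instead of the final logits is only a couple of lines. This is a speculative illustration; `Mixed_4b` is one of the named blocks in the slim Inception v1 code, and `batch` is an image batch prepared as in the earlier sketch.

```python
# end_points maps layer names (e.g. 'Mixed_3b', 'Mixed_4b', 'Mixed_5c')
# to the tensors produced at that point in the network.
print(sorted(end_points.keys()))

# A mid-level block whose output is still spatial: a stack of small
# feature "pictures" rather than a single 1000-number vector.
mid_layer = end_points['Mixed_4b']

# `batch` is a (1, 224, 224, 3) image batch prepared as in the earlier sketch.
mid_features = sess.run(mid_layer, feed_dict={images: batch})
print(mid_features.shape)   # roughly (1, 14, 14, 512): height x width x channels
```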