Go Deeper: Transfer Learning - TensorFlow and Deep Learning Singapore

  • 0:00 - 0:03
    What we hope to do with this meetup
  • 0:04 - 0:11
    is have something, given the spread of
    the questionnaire results
  • 0:11 - 0:13
    we hope to do something which is kind of
  • 0:13 - 0:16
    for people who don't know what
    deep learning is
  • 0:16 - 0:18
    and want an introduction to TensorFlow
  • 0:18 - 0:20
    but also something which is more of a
  • 0:20 - 0:24
    like a crowd pleaser or something
    which is more cutting edge
  • 0:24 - 0:27
    I am not going to say that this
    thing is particularly cutting edge
  • 0:27 - 0:32
    because once we saw the responses,
    we dialed things down a bit
  • 0:32 - 0:38
    But there will be more cutting edge stuff
  • 0:38 - 0:43
    and maybe we start to do other meetups
    events in other formats
  • 0:43 - 0:49
    So it could be like we have
    an experts' paper meeting
  • 0:49 - 0:53
    or we could split it, now that we can
    see the size of the crowd
  • 0:53 - 0:58
    Anyway, let me talk a little bit about
    going deeper with transfer learning
  • 0:58 - 1:00
    Unfortunately, this is something
    some of you
  • 1:00 - 1:03
    would have seen me do before
  • 1:03 - 1:05
    This is the first time I have
    done it in TensorFlow
  • 1:05 - 1:07
    and let me just explain that
  • 1:07 - 1:10
    Before, I have been programming this stuff
  • 1:10 - 1:13
    in Theano with the
    Lasagne layers library on top
  • 1:13 - 1:19
    and Theano is a research-based
    deep learning framework, out of Montreal
  • 1:19 - 1:23
    but what I have concluded
    since last summer
  • 1:23 - 1:27
    is that TensorFlow is probably the winner
    of this framework race
  • 1:27 - 1:29
    at least for the foreseeable future
  • 1:29 - 1:32
    with all this nice industrial stuff
  • 1:32 - 1:35
    I should be retooling into TensorFlow
  • 1:36 - 1:37
    That's what I am taking the opportunity
    to do for this
  • 1:41 - 1:43
    So, about me; sorry, here we go
  • 1:43 - 1:46
    I have come up through finance,
    startups and stuff
  • 1:46 - 1:50
    I took a year out basically in 2014
    just for fun
  • 1:50 - 1:54
    I have been doing serious kind of
    natural language processing since then
  • 2:01 - 2:05
    Basically, the overview for this
    somewhat more challenging talk
  • 2:05 - 2:09
    which will probably be 20 mins, 30 mins
    depending on how it goes
  • 2:09 - 2:14
    I want to take a state-of-the-art
    TensorFlow model
  • 2:14 - 2:17
    I want to solve a problem that
    it wasn't trained for
  • 2:17 - 2:21
    And I am going to be using
    deep learning as a component
  • 2:21 - 2:26
    of my solution rather than the
    primary focus of what I am trying to build
  • 2:26 - 2:33
    So this is, in a way, more of an industrial
    or commercial kind of application
  • 2:33 - 2:35
    for what's going on here
  • 2:35 - 2:39
    So the goal for this kind of problem is
  • 2:39 - 2:43
    I want to distinguish pictures
    of classic and modern sports cars
  • 2:43 - 2:47
    you will see some pictures of
    classic and modern cars a bit later
  • 2:48 - 2:52
    It's not that easy to say what
    the difference is
  • 2:52 - 2:55
    obviously, it could be
    different types of images
  • 2:55 - 2:57
    and it could be lots of
    different classes
  • 2:57 - 3:01
    I am just doing a very simple
    two class thing
  • 3:01 - 3:03
    but it's complicated images
  • 3:03 - 3:05
    what I want to do is
  • 3:05 - 3:06
    I want to have a very small training time
  • 3:06 - 3:08
    so I don't want to be retraining
    some huge network
  • 3:08 - 3:13
    Particularly, I have only got
    in this case, 20 training examples
  • 3:13 - 3:18
    So I am not gonna do any fantastic
    million image training
  • 3:18 - 3:21
    I have got 20 images with me
  • 3:21 - 3:25
    and I also want to be able to
    put this in production
  • 3:25 - 3:30
    so I can just run it as a component of
    something else
  • 3:30 - 3:36
    Basically, one of the things that is
    carrying the deep learning world forward
  • 3:36 - 3:40
    is an image classification task
    called ImageNet
  • 3:40 - 3:42
    this has been a competition where
  • 3:42 - 3:47
    they have 15 million labeled images
    from 22,000 categories
  • 3:47 - 3:50
    and you can see some of them here
  • 3:50 - 3:56
    if we go for this,
    this is a picture of a hotdog in a bun
  • 3:56 - 3:58
    and here are some of the categories
  • 3:58 - 4:03
    which will be some food I don't know
  • 4:03 - 4:06
    these are hotdogs, lots of
    different pictures of hotdogs
  • 4:06 - 4:09
    lots of different pictures of cheeseburgers
  • 4:09 - 4:12
    lots of different pictures of plates
  • 4:12 - 4:15
    so the task for ImageNet is to classify
  • 4:15 - 4:18
    for any given, any one of these images
  • 4:18 - 4:20
    which of a thousand different
    categories it is from
  • 4:20 - 4:25
    and it used to be that people could
    score adequately well
  • 4:25 - 4:29
    and were making incremental changes in
  • 4:29 - 4:31
    how well they can do this
  • 4:31 - 4:33
    but the deep learning people came along
  • 4:33 - 4:35
    and kind of tore this to shreds
  • 4:35 - 4:40
    and Google came up with GoogLeNet
  • 4:40 - 4:44
    which is what we are actually going
    to use here, back in 2014
  • 4:44 - 4:50
    suddenly, this stuff is now being done
    with further iterations
  • 4:50 - 4:53
    of this kind of thing,
    better than humans can
  • 4:53 - 4:57
    So the way you can measure whether
    someone is better than humans
  • 4:57 - 4:59
    is, you take a human and see
    whether it beats him
  • 4:59 - 5:02
    the question there is
    are there labeling errors
  • 5:02 - 5:04
    there you need a committee of humans
  • 5:04 - 5:06
    so the way they label these things is
  • 5:06 - 5:09
    by running it on Mechanical Turk and
  • 5:09 - 5:12
    asking people what category is this
    cheeseburger in
  • 5:15 - 5:16
    The network we are going to use here
  • 5:16 - 5:23
    is the 2014 state-of-the-art GoogLeNet,
    also called Inception version 1
  • 5:23 - 5:26
    The nice thing about this is that
  • 5:26 - 5:31
    there is an existing model
    already trained for this task
  • 5:31 - 5:34
    and it's available for download
    it's all free
  • 5:34 - 5:39
    and there are lots of different
    models out there
  • 5:39 - 5:41
    there's a model zoo for TensorFlow
  • 5:41 - 5:44
    So, what I have on my machine
  • 5:44 - 5:49
    and this is a small model,
    it's a 20 megabytes kind of model
  • 5:49 - 5:50
    So it is not a very big model
  • 5:50 - 5:57
    Inception 4 is a 200 MB kind of model
    which is a bit heavy
  • 5:57 - 5:59
    I am working here on my laptop
  • 5:59 - 6:01
    you are gonna see it working in real-time
  • 6:01 - 6:07
    and the trick here is instead of
    a softmax layer at the end
  • 6:07 - 6:13
    I will show you the diagram, it should be
    clear to anyone who's following along
  • 6:13 - 6:19
    instead of using logits to get me
    the probabilities
  • 6:19 - 6:21
    I am going to strip that away
  • 6:21 - 6:23
    and I am going to train
    a support vector machine
  • 6:23 - 6:25
    to distinguish these classes
  • 6:25 - 6:30
    I am not going to retrain the
    Inception network at all
  • 6:30 - 6:32
    I am going to just use it as a component
  • 6:32 - 6:35
    strip off the top classification piece
  • 6:35 - 6:38
    and replace it with an SVM
  • 6:38 - 6:40
    Now, SVMs are pretty well understood
  • 6:40 - 6:45
    here I am just using Inception
    as a featurizer for images
  • 6:45 - 6:47
    So here's a network picture
  • 6:47 - 6:52
    Basically, this is what the ImageNet
    network is designed for
  • 6:52 - 6:54
    you put in an image at the bottom
  • 6:54 - 6:57
    there is this black box which is the
    Inception network
  • 6:57 - 7:01
    which is a bunch of CNNs or
    convolutional neural networks
  • 7:01 - 7:03
    followed by a dense network
  • 7:03 - 7:05
    followed by these logits
  • 7:05 - 7:08
    and this logits layer is essentially
    the same as the 0 to 9
  • 7:08 - 7:17
    that Sam had for his digits, 1 to 1000
    for the different classes for ImageNet
  • 7:17 - 7:20
    To actually get the ImageNet output
  • 7:20 - 7:27
    it uses a softmax function and
    then chooses the highest one of these
  • 7:27 - 7:29
    to give you this is the class
    that this is in
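
A minimal sketch of that final step (the variable names here are illustrative, not the talk's notebook code): softmax turns the raw logits into probabilities, and the highest one is the predicted class.

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.random.randn(1000)    # stand-in for the network's logits output
probs = softmax(logits)
print(int(np.argmax(probs)), float(probs.max()))  # top class and its probability
```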
  • 7:29 - 7:32
    What I am going to do is
    I am going to ignore this
  • 7:32 - 7:35
    neat piece of classification technology
    that they have got
  • 7:35 - 7:44
    let's say we use these outputs as inputs
    to an SVM and just treat these as features
  • 7:44 - 7:47
    Now if we pick out one of these
  • 7:47 - 7:51
    this class could be cheeseburger
    and this class could be parrot
  • 7:51 - 7:54
    and this other class could be Husky dog
  • 7:54 - 7:57
    there is all sorts of classes in here
  • 7:57 - 8:00
    but basically what I will be doing is that
  • 8:00 - 8:02
    I will be extracting out the features
    of these photos
  • 8:02 - 8:05
    saying how much of this photo
    is like a parrot
  • 8:05 - 8:09
    how much of this is like a Husky dog
  • 8:09 - 8:13
    Now it turns out that modern cars and
    classic cars can be distinguished that way
  • 8:13 - 8:19
    Let me go to some code
  • 8:19 - 8:21
    Ok this code is all up on GitHub
  • 8:31 - 8:34
    Can everyone see this well enough?
  • 8:38 - 8:42
    So basically, I am pulling in TensorFlow
  • 8:45 - 8:49
    I pull in this model
  • 8:49 - 8:53
    Here is what the Inception architecture is
  • 8:53 - 8:57
    It feeds forward this way,
    here you put your image
  • 8:57 - 9:00
    it goes through lots and lots of
    convolutional layers
  • 9:00 - 9:03
    all the way up to the end
    with softmax and the output
  • 9:03 - 9:07
    So having done that, what I will do is
  • 9:07 - 9:10
    actually I have a download
    for the checkpoint
  • 9:10 - 9:17
    this is the checkpoint here which
    is a tar file, I have it locally stored
  • 9:17 - 9:18
    It doesn't download it now
  • 9:18 - 9:25
    but it is all there, even the
    big models are there up from Google
  • 9:25 - 9:28
    so they have retrained these
  • 9:28 - 9:30
    so the Inception thing takes about a week
  • 9:30 - 9:34
    to retrain on a bunch of,
    it could be 64 GPUs
  • 9:34 - 9:37
    so you don't really want to be
    training this thing on your own
  • 9:37 - 9:41
    you also need the ImageNet training set
  • 9:41 - 9:48
    it is a 140 GB file
    which is no fun to download
  • 9:51 - 9:57
    what I am doing here is basically
    there is also an Inception library
  • 9:57 - 10:04
    which is part of TF-Slim;
    this thing is designed such that
  • 10:04 - 10:08
    it already knows the network
    it can preload it
  • 10:08 - 10:12
    this has loaded it,
    I can get some labels
  • 10:12 - 10:17
    This is loading up the ImageNet labels
  • 10:17 - 10:26
    I need to know which location
    corresponds to which class like the digits
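
A hedged sketch of that label loading; the file name 'imagenet_labels.txt' (one label per line, in class-index order) is an assumption, since the notebook uses its own download helper.

```python
# Assumed file: 'imagenet_labels.txt', one label per line, in class-index order.
with open('imagenet_labels.txt') as f:
    imagenet_labels = [line.strip() for line in f]

print(len(imagenet_labels), imagenet_labels[:3])  # sanity-check the mapping
```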
  • 10:31 - 10:33
    Here we are going through
    basically the same steps
  • 10:33 - 10:39
    as the MNIST example in that
    we reset the default graph
  • 10:39 - 10:45
    we create a placeholder which is
    where my images are going to go
  • 10:45 - 10:48
    this is as an input
    but from this image input
  • 10:48 - 10:50
    I am then going to do some TensorFlow steps
  • 10:50 - 10:52
    because TensorFlow
    has various preprocessing
  • 10:52 - 10:56
    or graphics handling commands
  • 10:56 - 10:58
    because a lot of this stuff
    works with images
  • 10:58 - 11:03
    so there's all sorts of clipping
    and rotating stuff
  • 11:03 - 11:05
    so it can preprocess these images
  • 11:05 - 11:08
    I am also going to pull out a numpy image
  • 11:08 - 11:11
    so I can see what it is actually looking at
  • 11:11 - 11:15
    here with this Inception version 1
  • 11:15 - 11:21
    I am going to pull in the entire
    Inception version 1 model
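
A sketch of that graph setup, assuming TF 1.x with the TF-Slim `nets` package from the tensorflow/models repository on the path; the preprocessing shown is the standard Inception scaling to [-1, 1].

```python
import tensorflow as tf
from nets import inception   # assumption: tensorflow/models 'slim' dir on sys.path

slim = tf.contrib.slim

tf.reset_default_graph()

# Placeholder where each raw RGB image will go.
input_image = tf.placeholder(tf.uint8, shape=(224, 224, 3))

# Standard Inception preprocessing: float in [-1, 1], plus a batch dimension.
processed = tf.image.convert_image_dtype(input_image, dtype=tf.float32)
processed = (processed - 0.5) * 2.0
processed = tf.expand_dims(processed, 0)

# Pull in the entire Inception v1 model; the logits and all intermediate
# end-points come back, so the network can be tapped at any level.
with slim.arg_scope(inception.inception_v1_arg_scope()):
    logits, end_points = inception.inception_v1(
        processed, num_classes=1001, is_training=False)
probabilities = tf.nn.softmax(logits)
```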
  • 11:23 - 11:27
    My net function, rather than just
    picking random weights,
  • 11:27 - 11:30
    is gonna be assigned this
    from this checkpoint
  • 11:30 - 11:34
    when I run the init thing from my graph
  • 11:34 - 11:37
    or in my session, it won't initialize
    everything from random
  • 11:37 - 11:39
    it will initialize everything from disk
  • 11:39 - 11:42
    so this will define the model
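
A sketch of that initialize-from-disk step, using TF-Slim's checkpoint helper; the local path 'inception_v1.ckpt' is an assumption.

```python
# Instead of tf.global_variables_initializer(), build an init function that
# assigns every InceptionV1 variable its trained value from the checkpoint.
init_fn = slim.assign_from_checkpoint_fn(
    'inception_v1.ckpt',                       # assumed local checkpoint path
    slim.get_model_variables('InceptionV1'))
```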
  • 11:42 - 11:45
    and now let's proceed
  • 11:45 - 11:52
    one of the issues with having this
    on a nice TensorFlow graph
  • 11:52 - 11:57
    is it just says input, Inception1, output
  • 11:57 - 12:00
    so there's a big block there
    you can delve into it if you want
  • 12:00 - 12:06
    let me just show you
    let's go back a bit
  • 12:08 - 12:11
    So this is the code
    behind the Inception1 model
  • 12:11 - 12:16
    so this is actually smaller than the
    Inception2 and Inception3
  • 12:16 - 12:22
    basically, we have a kind of a base
    Inception piece, just this
  • 12:22 - 12:25
    and these are combined together
  • 12:25 - 12:33
    and this is a detailed model put together
    by many smart people in 2014
  • 12:33 - 12:35
    it's got much more complicated since then
  • 12:35 - 12:39
    fortunately, they have written the code
    and we don't have to
  • 12:43 - 12:46
    So here what I am gonna do is
    I am gonna load an example image
  • 12:46 - 12:51
    just to show you
    one of the things here is
  • 12:51 - 12:56
    TensorFlow, in order to be efficient,
    wants to do the loading itself
  • 12:56 - 13:01
    So in order to get this pumping
    information through
  • 13:01 - 13:04
    it wants you to set up queues of images
  • 13:04 - 13:10
    it will then handle the whole ingestion
    process itself
  • 13:10 - 13:14
    the problem with that is
    it's kind of complicated to do
  • 13:14 - 13:16
    in a Jupyter notebook right here
  • 13:16 - 13:19
    so here I am going to do
    the very simplest thing
  • 13:19 - 13:22
    which is load a numpy image
    and stuff the numpy image in
  • 13:22 - 13:25
    but what TensorFlow would love me to do
  • 13:25 - 13:29
    is create, as you see in this one
  • 13:29 - 13:34
    create a file name queue and it will
  • 13:34 - 13:35
    then run the queue, do the matching
  • 13:35 - 13:37
    and do all of this stuff itself
  • 13:37 - 13:41
    because then it can lay it out across
    potentially distributed cluster
  • 13:41 - 13:43
    and do everything just right
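
For reference, a hedged sketch of the queue-based route being described here (TF 1.x queue API; the glob pattern is an assumption):

```python
# TensorFlow-preferred ingestion: a filename queue that the runtime drains
# itself, potentially across a distributed cluster. Running this also needs
# tf.train.start_queue_runners() inside a session.
filename_queue = tf.train.string_input_producer(
    tf.train.match_filenames_once('./images/cars/*.jpg'))  # assumed pattern
reader = tf.WholeFileReader()
_, file_contents = reader.read(filename_queue)
queued_image = tf.image.decode_jpeg(file_contents, channels=3)
```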
  • 13:43 - 13:50
    here I do kind of the simple read the image
  • 13:50 - 14:00
    so this image is a tensor
    which is 224 by 224 by RGB
  • 14:00 - 14:03
    this is kind of a sanity check of
    what kind of numbers I got in the corner
  • 14:03 - 14:06
    and then what I am gonna do is
  • 14:06 - 14:08
    I am going to crop out the
    middle section of it
  • 14:08 - 14:11
    this happens to be the right size already
  • 14:11 - 14:13
    basically if you got odd shapes
  • 14:13 - 14:15
    you need to think about
    how am I gonna do it
  • 14:15 - 14:19
    am I going to pad it
    what do you do
  • 14:19 - 14:22
    because in order to make this efficient
  • 14:22 - 14:29
    TensorFlow wants to lay it out without
    all this variability in image size
  • 14:29 - 14:34
    one set of parameters and it's then going
    to blast it across your GPU
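
A minimal numpy sketch of that central crop (the function name is illustrative):

```python
import numpy as np

def central_crop(img, size=224):
    # Take the size x size patch from the middle of the image.
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size, :]
```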
  • 14:34 - 14:38
    so let's just run this thing
  • 14:38 - 14:40
    so now we have defined the network
  • 14:40 - 14:46
    here I am going to pick a session
    here I am going to init the session
  • 14:46 - 14:48
    it loads the data, and then I am going
  • 14:48 - 14:52
    to pick up the numpy image and the
    probabilities from the top layer
  • 14:52 - 14:55
    I am just gonna show it
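
Putting the pieces together, a sketch of that run, building on the earlier sketch cells ('tabby.jpg' is a hypothetical file name):

```python
import matplotlib.pyplot as plt

crop = central_crop(plt.imread('tabby.jpg'))    # hypothetical example image

with tf.Session() as sess:
    init_fn(sess)                                # weights come from disk, not random
    probs = sess.run(probabilities, feed_dict={input_image: crop})[0]
    # Note: some 1001-class checkpoints reserve index 0 for a background class.
    for idx in probs.argsort()[-5:][::-1]:       # top-5 classes
        print(imagenet_labels[idx], probs[idx])
```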
  • 14:58 - 15:01
    here is the image
    this is the image I pulled off the disk
  • 15:01 - 15:06
    you can see here the probabilities,
    the highest probability is Tabby cat
  • 15:06 - 15:10
    which is good, it's also interesting that
  • 15:10 - 15:15
    the second in line things are Tiger cat,
    Egyptian cat, lynx
  • 15:15 - 15:21
    so it's got a fair idea that it is a cat
    and, in particular, it is getting it right
  • 15:21 - 15:26
    ok so this is the same diagram
    we have had before
  • 15:26 - 15:33
    what you have seen is the image going into
    this black box, coming out and telling us
  • 15:33 - 15:36
    the probabilities here, so what we are
    now gonna do is
  • 15:36 - 15:42
    go from the image to the black box and
    just learn a bunch of features
  • 15:50 - 15:53
    let me just show you this on disk
  • 16:11 - 16:13
    so I have a cars directory here
  • 16:14 - 16:18
    and inside this thing,
  • 16:24 - 16:26
    I have surprisingly little data
  • 16:37 - 16:40
    In this directory, I just have a
    bunch of car images
  • 16:40 - 16:42
    and I have two sets of images
  • 16:42 - 16:48
    one of which is called classic
    and the other is called modern
  • 16:48 - 16:52
    so basically I picked some
    photos off Flickr
  • 16:52 - 16:54
    I put these into two separate directories
  • 16:54 - 16:56
    I am going to use those directory names
  • 16:56 - 17:00
    as the classification for these images
  • 17:00 - 17:05
    In the upper directory here
    I got a bunch of test images
  • 17:05 - 17:07
    which I don't know the labels for
  • 17:13 - 17:17
    this picks out the list of classes; there
    is a classic and a modern directory
  • 17:17 - 17:22
    I am gonna go through every file
    in this directory
  • 17:22 - 17:28
    I am gonna crop it, I am gonna find
    the logits level which is
  • 17:28 - 17:33
    all the classes and then I am just gonna
    add these to features
  • 17:33 - 17:37
    So basically I am gonna do something
    like a scikit-learn model
  • 17:37 - 17:38
    I am gonna fit an SVM
  • 17:38 - 17:42
    so basically, this is featurizing
    all these pictures
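
A hedged sketch of that featurizing loop, reusing the earlier sketch cells; the directory layout is assumed as described in the talk.

```python
import os
import matplotlib.pyplot as plt

features, labels = [], []
with tf.Session() as sess:
    init_fn(sess)
    for class_name in ['classic', 'modern']:     # directory names are the labels
        folder = os.path.join('./images/cars', class_name)  # assumed layout
        for fname in os.listdir(folder):
            img = central_crop(plt.imread(os.path.join(folder, fname)))
            logit_values = sess.run(logits, feed_dict={input_image: img})
            features.append(logit_values[0])     # the logits vector as features
            labels.append(class_name)
```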
  • 17:48 - 17:50
    so here we go with the training data
  • 17:56 - 17:57
    here's some training
  • 18:02 - 18:06
    classic cars,
    it went through the classic directory
  • 18:06 - 18:09
    modern cars,
    it went through the modern directory
  • 18:15 - 18:17
    it's thinking hard
  • 18:18 - 18:25
    what I am gonna do now is
    build an SVM over those features
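
The scikit-learn side is then the standard fit call, sketched here with a linear SVM (the exact estimator in the notebook may differ):

```python
from sklearn import svm

classifier = svm.LinearSVC()
classifier.fit(features, labels)   # 20 images x ~1000 logit features: very fast
```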
  • 18:31 - 18:40
    jump to 21:36
  • 21:35 - 21:44
    I restarted this thing
  • 21:44 - 21:50
    the actual training for this SVM thing
    takes that long,
  • 21:50 - 21:58
    this is very quick, essentially 20 images
    worth of a thousand features
  • 21:58 - 22:02
    so there was no big training loop to do
  • 22:02 - 22:09
    then I can run this on the actual models
    in the directory, in the test set
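
A sketch of that test pass, again reusing the earlier cells; the test images are assumed to sit above the two class directories.

```python
test_dir = './images/cars'                       # assumed: test jpgs live here
with tf.Session() as sess:
    init_fn(sess)
    for fname in sorted(os.listdir(test_dir)):
        if not fname.endswith('.jpg'):
            continue                             # skip the class subdirectories
        img = central_crop(plt.imread(os.path.join(test_dir, fname)))
        feats = sess.run(logits, feed_dict={input_image: img})
        print(fname, classifier.predict(feats)[0])
```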
  • 22:09 - 22:13
    so here this is images that it has never
    seen before
  • 22:13 - 22:16
    it thinks that this is a modern car
  • 22:16 - 22:19
    this one it thinks is a classic car,
    this one is classified as modern
  • 22:19 - 22:26
    so this is actually doing quite a good job
    out of just 10 examples of each
  • 22:26 - 22:33
    it actually thinks this one is modern
    it's not a sports car but anyway
  • 22:33 - 22:39
    so this is showing that the SVM we trained
  • 22:39 - 22:43
    can classify based on the features that
    Inception is producing because
  • 22:43 - 22:47
    Inception understands "understands"
    what images are about
  • 22:47 - 22:51
    so if I go back to here,
    code is on GitHub
  • 22:51 - 22:54
    conclusions okay, this thing really works
  • 22:54 - 22:58
    we didn't have to train
    a deep neural network
  • 22:58 - 23:02
    we could plug this TensorFlow model
    into an existing pipeline
  • 23:02 - 23:05
    and this is actually something where
  • 23:05 - 23:09
    the TensorFlow Summit has something
    to say about these pipelines
  • 23:09 - 23:11
    because not only are they talking
    about deep learning
  • 23:11 - 23:15
    they are talking about the whole
    cloud-based learning
  • 23:15 - 23:19
    and setting up proper processes
  • 23:19 - 23:24
    I guess, time for questions quickly
  • 23:24 - 23:29
    we can then do the
    TensorFlow Summit wrap-up
  • 23:33 - 23:37
    "I am assuming that there is no
    backpropagation here"
  • 23:37 - 23:40
    This includes no backpropagation
  • 23:40 - 23:43
    "End result is a feature"
  • 23:46 - 23:53
    I am just assuming that Inception,
    you can imagine if the ImageNet thing
  • 23:53 - 23:56
    had focused more on products,
    it could be even better
  • 23:56 - 23:59
    if it focused on man-made things
  • 23:59 - 24:05
    The ImageNet training set has an awful
    lot of dogs in it, not that many cats
  • 24:05 - 24:09
    So, on the other hand it may be that
    it has quite a lot of flowers
  • 24:09 - 24:14
    or maybe it is saying, I like this car
    as a modern car
  • 24:14 - 24:16
    because it's got petals for wheels
  • 24:16 - 24:20
    whereas the other one, the classic cars
    tend to have round things for wheels
  • 24:20 - 24:25
    So it is abstractly doing this
  • 24:25 - 24:30
    It doesn't know about sports cars or
    what they look like
  • 24:30 - 24:32
    But it does know about curves
  • 24:35 - 24:38
    "So for SVM, you don't use
    TensorFlow anymore ?"
  • 24:38 - 24:43
    No, basically I have used TensorFlow to
    create some features
  • 24:43 - 24:45
    Now, I don't want to throw it away
  • 24:45 - 24:48
    because hopefully I have got
    a streaming process where
  • 24:48 - 24:52
    more and more images are chugged
    through this thing
  • 24:52 - 25:05
    [could not hear the question properly]
  • 25:07 - 25:10
    There is an example code called
    TensorFlow for poets
  • 25:10 - 25:13
    where they actually say that,
    let's load up one of these networks
  • 25:13 - 25:15
    and then we will do some fine tuning
  • 25:15 - 25:22
    there you get involved in tuning
    these neurons with some gradient descent
  • 25:22 - 25:25
    and you are taking some steps
    and all this kind of thing
  • 25:25 - 25:28
    maybe you are having broad implications
    across the whole network
  • 25:28 - 25:33
    which could be good if you have got
    tons of data and tons of time
  • 25:33 - 25:37
    but this is a very simple way of just
    tricking it to get it done
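
For contrast, a hedged sketch of that fine-tuning route: bolt a fresh two-class head onto the frozen Inception body from the earlier setup, and train only the new variables (the layer name and hyperparameters here are assumptions):

```python
# Global-average-pool the last mixed layer, add a new 2-class head, and
# pass only the head's variables to the optimizer so the body stays frozen.
feats = tf.reduce_mean(end_points['Mixed_5c'], axis=[1, 2])
new_logits = tf.layers.dense(feats, 2, name='new_head')
label_ph = tf.placeholder(tf.int32, shape=(None,))
loss = tf.losses.sparse_softmax_cross_entropy(labels=label_ph, logits=new_logits)
head_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='new_head')
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss, var_list=head_vars)
```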
  • 25:37 - 25:47
    [could not hear the comment properly]
  • 25:47 - 25:54
    it will be a very small network
    because an SVM is essentially fairly shallow
  • 25:54 - 26:07
    [could not hear the question]
  • 26:07 - 26:14
    TensorFlow even though it has imported
    this large Inception network
  • 26:14 - 26:21
    as far as I am concerned,
    I am using it as f(x) = y and that's it
  • 26:21 - 26:25
    but you can inquire what would it say
    at this particular level
  • 26:25 - 26:30
    and these bunches of levels with various
    component points along the way
  • 26:30 - 26:34
    I could take out other levels
  • 26:34 - 26:36
    I haven't tried it to have a look
  • 26:36 - 26:40
    There you get more like pictures
    worth of features rather than
  • 26:40 - 26:43
    this string of 1000 numbers
  • 26:43 - 26:49
    but each intermediate level
    will be pictures with CNN kind of features
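
A sketch of tapping one of those intermediate levels via the end-points dict from the earlier setup ('Mixed_4b' is one of Inception v1's layer names; the image file is hypothetical):

```python
with tf.Session() as sess:
    init_fn(sess)
    img = central_crop(plt.imread('car.jpg'))    # hypothetical image
    mid = sess.run(end_points['Mixed_4b'], feed_dict={input_image: img})
    print(mid.shape)   # spatial CNN feature maps, e.g. (1, 14, 14, 512)
```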
  • 26:49 - 26:54
    on the other hand, if you want
    to play around with this thing
  • 26:54 - 26:58
    there's this nice stuff called
    the DeepDream kind of things
  • 26:58 - 27:03
    where they try and match images to
    being interesting images
  • 27:03 - 27:06
    then you do the featurizing that looks at
    different levels
  • 27:06 - 27:12
    the highest level is a cat but I want all
    local features to be as fishy as possible
  • 27:12 - 27:16
    then you get like a fish-faced cat
  • 27:16 - 27:20
    that's the kind of thing you can do with
    these kinds of features in models
Title:
Go Deeper: Transfer Learning - TensorFlow and Deep Learning Singapore
Description:

Speaker: Martin Andrews

Event Page: https://www.meetup.com/TensorFlow-and-Deep-Learning-Singapore/events/237032130/

Produced by Engineers.SG

English subtitles by: Sindhu Shetty

Video Language:
English
Duration:
27:36
