Go Deeper: Transfer Learning - TensorFlow and Deep Learning Singapore
-
0:00 - 0:03What we hope to do with this meetup
-
0:04 - 0:11is have something, given the spread of
the questionnaire results -
0:11 - 0:13we hope to do something which is kind of
-
0:13 - 0:16for people who don't know what
deep learning is -
0:16 - 0:18and want an introduction to TensorFlow
-
0:18 - 0:20but also something which is more of a
-
0:20 - 0:24like a crowd pleaser or something
which is more cutting edge -
0:24 - 0:27I am not going to say that this
thing is particularly cutting edge -
0:27 - 0:32because once we saw the responses,
we dialed things down a bit -
0:32 - 0:38But there will be more cutting edge stuff
-
0:38 - 0:43and maybe we start to do other meetups
events in other formats -
0:43 - 0:49So it could be like we have
an experts' paper meeting -
0:49 - 0:53or we could split it, now that we can see
the size of the crowd -
0:53 - 0:58Anyway, let me talk a little bit about
going deeper with transfer learning -
0:58 - 1:00Unfortunately, this is something
some of you people -
1:00 - 1:03would have seen me do before
-
1:03 - 1:05This is the first time I have
done it in tensorflow -
1:05 - 1:07and let me just explain that
-
1:07 - 1:10Before, I have been programming this stuff
-
1:10 - 1:13in Theano with the
Lasagne layers thing on top -
1:13 - 1:19and Theano is a research-based
deep learning framework, out of Montreal -
1:19 - 1:23but what I have concluded
since last summer -
1:23 - 1:27is that TensorFlow is probably the winner
of this framework race -
1:27 - 1:29at least for the foreseeable future
-
1:29 - 1:32with all this nice industrial stuff
-
1:32 - 1:35I should be retooling into TensorFlow
-
1:36 - 1:37That's what I am taking the opportunity
to do for this -
1:41 - 1:43So, about me, sorry here we go
-
1:43 - 1:46I have come up through finance,
startups and stuff -
1:46 - 1:50I took a year out basically in 2014
just for fun -
1:50 - 1:54I have been doing serious kind of
natural language processing since then -
2:01 - 2:05Basically, the overview for this
"something more challenging" talk -
2:05 - 2:09which will probably be 20 mins, 30 mins
depending on how it goes -
2:09 - 2:14I want to take a state-of-the-art
TensorFlow model -
2:14 - 2:17I want to solve a problem that
it wasn't trained for -
2:17 - 2:21And I am going to be using
deep learning as a component -
2:21 - 2:26of my solution rather than the
primary focus of what I am trying to build -
2:26 - 2:33So this is, in a way more of an industrial
or commercial kind of application -
2:33 - 2:35for what's going on here
-
2:35 - 2:39So the goal for this kind of problem is
-
2:39 - 2:43I want to distinguish pictures
of classic and modern sports cars -
2:43 - 2:47you will see some pictures of
classic and modern cars a bit later -
2:48 - 2:52It's not that easy to say what
the difference is -
2:52 - 2:55obviously, it could be
different types of images -
2:55 - 2:57and it could be lots of
different classes -
2:57 - 3:01I am just doing a very simple
two class thing -
3:01 - 3:03but it's complicated images
-
3:03 - 3:05what I want to do is
-
3:05 - 3:06I want to have a very small training time
-
3:06 - 3:08so I don't want to be retraining
some huge network -
3:08 - 3:13Particularly, I have only got
in this case, 20 training examples -
3:13 - 3:18So I am not gonna do any fantastic
million image training -
3:18 - 3:21I have got 20 images with me
-
3:21 - 3:25and I also want to be able to
put this in production -
3:25 - 3:30so I can just run it as a component of
something else -
3:30 - 3:36Basically, one of the things that is
carrying the deep learning world forward -
3:36 - 3:40is an image classification task
called ImageNet -
3:40 - 3:42this has been a competition where
-
3:42 - 3:47they have 15 million labeled images
from 22,000 categories -
3:47 - 3:50and you can see some of them here
-
3:50 - 3:56if we go for this,
this is a picture of a hotdog in a bun -
3:56 - 3:58and here are some of the categories
-
3:58 - 4:03which will be some food I don't know
-
4:03 - 4:06these are hotdogs, lots of
different pictures of hotdogs -
4:06 - 4:09lots of different pictures of cheeseburgers
-
4:09 - 4:12lots of different pictures of plates
-
4:12 - 4:15so the task for ImageNet is to classify
-
4:15 - 4:18for any given, any one of these images
-
4:18 - 4:20which of a thousand different
categories it is from -
4:20 - 4:25and it used to be that people could
score adequately well -
4:25 - 4:29and were making incremental changes in
-
4:29 - 4:31how well they can do this
-
4:31 - 4:33but the deep learning people came along
-
4:33 - 4:35and kind of tore this to shreds
-
4:35 - 4:40and Google came up with GoogLeNet
-
4:40 - 4:44what we are actually going to use here,
back in 2014 -
4:44 - 4:50suddenly, this stuff is now being done
with further iterations -
4:50 - 4:53of this kind of thing,
better than humans can -
4:53 - 4:57So the way you can measure whether
someone is better than humans -
4:57 - 4:59is, you take a human and see
whether the model beats them -
4:59 - 5:02the question there is
are there labeling errors -
5:02 - 5:04there you need a committee of humans
-
5:04 - 5:06so the way they label these things is
-
5:06 - 5:09by running it on Mechanical Turk and
-
5:09 - 5:12asking people what category is this
cheeseburger in -
5:15 - 5:16The network we are going to use here
-
5:16 - 5:23is the 2014 state-of-the-art GoogLeNet,
also called Inception version 1 -
5:23 - 5:26The nice thing about this is that
-
5:26 - 5:31there is an existing model
already trained for this task -
5:31 - 5:34and it's available for download
it's all free -
5:34 - 5:39and there are lots of different
models out there -
5:39 - 5:41there's a model zoo for TensorFlow
-
5:41 - 5:44So, what I have on my machine
-
5:44 - 5:49and this is a small model,
it's a 20 megabytes kind of model -
5:49 - 5:50So it is not a very big model
-
5:50 - 5:57Inception 4 is a 200 MB kind of model
which is a bit heavy -
5:57 - 5:59I am working here on my laptop
-
5:59 - 6:01you are gonna see it working in real-time
-
6:01 - 6:07and the trick here is instead of
a softmax layer at the end -
6:07 - 6:13I will show you the diagram, it should be
clear to anyone who's following along -
6:13 - 6:19instead of using logits to get me
the probabilities -
6:19 - 6:21I am going to strip that away
-
6:21 - 6:23and I am going to train
a support vector machine -
6:23 - 6:25to distinguish these classes
-
6:25 - 6:30I am not going to retrain the
Inception network at all -
6:30 - 6:32I am going to just use it as a component
-
6:32 - 6:35strip off the top classification piece
-
6:35 - 6:38and replace it with an SVM
-
6:38 - 6:40Now, SVMs are pretty well understood
-
6:40 - 6:45here I am just using Inception
as a featurizer for images -
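The idea above can be sketched in a few lines: treat the pretrained network as a fixed, frozen function from image to feature vector, and never take a gradient step through it. Note that `inception_logits` here is a hypothetical stand-in (it just returns deterministic random numbers), not the real pretrained model.

```python
import numpy as np

def inception_logits(image):
    """Hypothetical stand-in for the frozen Inception network:
    maps a (224, 224, 3) image to a 1000-dim logit vector.
    The real vector would come from the pretrained model; this is a mock."""
    rng = np.random.RandomState(int(image.sum()) % (2 ** 31))
    return rng.randn(1000)

image = np.zeros((224, 224, 3))
features = inception_logits(image)   # no retraining: the network is a fixed f(x)
print(features.shape)                # (1000,)
```

Anything downstream (here, an SVM) only ever sees that 1000-number vector.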
6:45 - 6:47So here's a network picture
-
6:47 - 6:52Basically, this is what the ImageNet
network is designed for -
6:52 - 6:54you put in an image at the bottom
-
6:54 - 6:57there is this black box which is the
Inception network -
6:57 - 7:01which is a bunch of CNNs or
convolutional neural networks -
7:01 - 7:03followed by Dense network
-
7:03 - 7:05followed by these logits
-
7:05 - 7:08and this logits layer is essentially
the same as the 0 to 9 -
7:08 - 7:17outputs that Sam had for his digits, but
1,000 long, one for each ImageNet class -
7:17 - 7:20To actually get the ImageNet output
-
7:20 - 7:27it uses a softmax function and
then chooses the highest one of these -
7:27 - 7:29to give you this is the class
that this is in -
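The softmax-then-argmax step described here is generic and easy to write down; this is a plain numpy sketch of it, not the model's own code.

```python
import numpy as np

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])    # toy 3-class logits
probs = softmax(logits)
print(probs.argmax())                  # index of the most likely class → 0
```

Because softmax is monotonic, the argmax of the probabilities is the argmax of the raw logits, which is why the logits alone carry all the ranking information.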
7:29 - 7:32What I am going to do is
I am going to ignore this -
7:32 - 7:35neat piece of classification technology
that they have got -
7:35 - 7:44let's say we use these outputs as inputs
to SVM and just treat these as features -
7:44 - 7:47Now if we pick out one of these
-
7:47 - 7:51this class could be cheeseburger
and this class could be parrot -
7:51 - 7:54and this other class could be Husky dog
-
7:54 - 7:57there is all sorts of classes in here
-
7:57 - 8:00but basically what I will be doing is that
-
8:00 - 8:02I will be extracting out the features
of these photos -
8:02 - 8:05saying how much of this photo
is like a parrot -
8:05 - 8:09how much of this is like a Husky dog
-
8:09 - 8:13Now it turns out that modern cars and
classic cars can be distinguished that way -
8:13 - 8:19Let me go to some code
-
8:19 - 8:21Ok this code is all up on GitHub
-
8:31 - 8:34Can everyone see this enough
-
8:38 - 8:42So basically, I am pulling in TensorFlow
-
8:45 - 8:49I pull in this model
-
8:49 - 8:53Here is what the Inception architecture is
-
8:53 - 8:57It feeds forward this way,
here you put your image -
8:57 - 9:00it goes through lots and lots of
convolutional layers -
9:00 - 9:03all the way up to the end
with softmax and the output -
9:03 - 9:07So having done that, what I will do is
-
9:07 - 9:10actually I have a download
for the checkpoint -
9:10 - 9:17this is the checkpoint here which
is a tar file, I have it locally stored -
9:17 - 9:18It doesn't download it now
-
9:18 - 9:25but it is all there, even the
big models are there up from Google -
9:25 - 9:28so they have pretrained these
-
9:28 - 9:30so the Inception thing takes about a week
-
9:30 - 9:34to retrain on a bunch of,
it could be 64 GPUs -
9:34 - 9:37so you don't really want to be
training this thing on your own -
9:37 - 9:41you also need the ImageNet training set
-
9:41 - 9:48it is a 140 GB file
which is no fun to download -
9:51 - 9:57what I am doing here is basically
there is also an Inception library -
9:57 - 10:04which is part of TF-Slim,
and this thing is designed such that -
10:04 - 10:08it already knows the network
it can preload it -
10:08 - 10:12this has loaded it,
I can get some labels -
10:12 - 10:17This is loading up the ImageNet labels
-
10:17 - 10:26I need to know which location
corresponds to which class like the digits -
10:31 - 10:33Here we are going through
basically the same steps -
10:33 - 10:39as the MNIST example in that
we reset the default graph -
10:39 - 10:45we create a placeholder which is
where my images are going to go -
10:45 - 10:48this is as an input
but from this image input -
10:48 - 10:50I am then going to do some TensorFlow steps
-
10:50 - 10:52because TensorFlow
has various preprocessing -
10:52 - 10:56or graphics handling commands
-
10:56 - 10:58because a lot of this stuff
works with images -
10:58 - 11:03so there's all sorts of clipping
and rotating stuff -
11:03 - 11:05so it can preprocess these images
-
11:05 - 11:08I am also going to pull out a numpy image
-
11:08 - 11:11so I can see what it is actually looking at
-
11:11 - 11:15here with this Inception version 1
-
11:15 - 11:21I am going to pull in the entire
Inception version 1 model -
11:23 - 11:27My net function, rather than
just picking random weights -
11:27 - 11:30is gonna be assigned this
from this checkpoint -
11:30 - 11:34when I run the init thing from my graph
-
11:34 - 11:37or in my session, it won't initialize
everything from random -
11:37 - 11:39it will initialize everything from disk
-
11:39 - 11:42so this will define the model
-
11:42 - 11:45and now let's proceed
-
11:45 - 11:52one of the issues with having this
on a nice TensorFlow graph -
11:52 - 11:57is it just says input, Inception1, output
-
11:57 - 12:00so there's a big block there
you can delve into it if you want -
12:00 - 12:06let me just show you
let's go back a bit -
12:08 - 12:11So this is the code
behind the Inception1 model -
12:11 - 12:16so this is actually smaller than the
Inception2 and Inception3 -
12:16 - 12:22basically, we have a kind of a base
Inception piece, just this -
12:22 - 12:25and these are combined together
-
12:25 - 12:33and this is a detailed model put together
by many smart people in 2014 -
12:33 - 12:35it's got much more complicated since then
-
12:35 - 12:39fortunately, they have written the code
and we don't have to -
12:43 - 12:46So here what I am gonna do is
I am gonna load an example image -
12:46 - 12:51just to show you
one of the the things here is -
12:51 - 12:56TensorFlow in order to become efficient
wants to do the loading itself -
12:56 - 13:01So in order to get this pumping
information through -
13:01 - 13:04it wants you to set up queues of images
-
13:04 - 13:10it will then handle the whole ingestion
process itself -
13:10 - 13:14the problem with that is
it's kind of complicated to do -
13:14 - 13:16in a Jupyter notebook right here
-
13:16 - 13:19so here I am going to do
the very simplest thing -
13:19 - 13:22which is load a numpy image
and stuff the numpy image in -
13:22 - 13:25but what TensorFlow would love me to do
-
13:25 - 13:29is create, as you see in this one
-
13:29 - 13:34create a file name queue and it will
-
13:34 - 13:35then run the queue, do the matching
-
13:35 - 13:37and do all of this stuff itself
-
13:37 - 13:41because then it can lay it out across
potentially distributed cluster -
13:41 - 13:43and do everything just right
-
13:43 - 13:50here I do kind of the simple read the image
-
13:50 - 14:00so this image is a tensor
which is 224 by 224 by RGB -
14:00 - 14:03this is kind of sanity check
what kind of numbers I got in the corner -
14:03 - 14:06and then what I am gonna do is
-
14:06 - 14:08i am going to crop out the
middle section of it -
14:08 - 14:11this happens to be the right size already
-
14:11 - 14:13basically if you got odd shapes
-
14:13 - 14:15you need to think about
how am I gonna do it -
14:15 - 14:19am I going to pad it
what do you do -
14:19 - 14:22because in order to make this efficient
-
14:22 - 14:29TensorFlow wants to lay it out without
all this variability in image size -
14:29 - 14:34one set of parameters and it's then going
to blast it across your GPU -
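The central crop described here is simple array slicing; this is a minimal numpy sketch of cropping to the 224×224 input size the network expects (the real notebook uses TensorFlow's image ops, so the function name here is just illustrative).

```python
import numpy as np

def central_crop(image, size=224):
    """Crop an (H, W, 3) image array to a centred size x size square."""
    h, w = image.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return image[top:top + size, left:left + size]

img = np.zeros((300, 400, 3), dtype=np.uint8)   # dummy image, taller than wide
cropped = central_crop(img)
print(cropped.shape)                             # (224, 224, 3)
```

Fixing every input to one shape is what lets the framework lay the batch out efficiently on the GPU.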
14:34 - 14:38so let's just run this thing
-
14:38 - 14:40so now we have defined the network
-
14:40 - 14:46here I am going to pick a session
here I am going to init the session -
14:46 - 14:48it loads the data, and then I am going
-
14:48 - 14:52to pick up the numpy image and the
probabilities from the top layer -
14:52 - 14:55I am just gonna show it
-
14:58 - 15:01here is the image,
this is the image I pulled off the disk -
15:01 - 15:06you can see here the probabilities,
the highest probability is Tabby cat -
15:06 - 15:10which is good, it's also interesting that
-
15:10 - 15:15the next ones in line are Tiger cat,
Egyptian cat, lynx -
15:15 - 15:21so it's got a fair idea that it is a cat
in particular, it is getting it right -
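Reading off the top few classes from the probability vector is just a sort; here is a small sketch with hypothetical labels and probabilities standing in for the 1000 ImageNet classes.

```python
import numpy as np

# Hypothetical stand-ins for the 1000 ImageNet labels and their probabilities
labels = ["tabby cat", "tiger cat", "Egyptian cat", "lynx", "cheeseburger"]
probs = np.array([0.60, 0.20, 0.10, 0.05, 0.05])

top = np.argsort(probs)[::-1][:3]   # indices of the 3 highest probabilities
for i in top:                        # prints the three best guesses in order
    print(f"{labels[i]}: {probs[i]:.2f}")
```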
15:21 - 15:26ok so this is the same diagram
we have had before -
15:26 - 15:33what you have seen is the image going into this
black box, coming out and telling us -
15:33 - 15:36the probabilities here, so what we are
now gonna do is -
15:36 - 15:42go from the image to the black box and
just extract a bunch of features -
15:50 - 15:53let me just show you this on disk
-
16:11 - 16:13so I have a cars directory here
-
16:14 - 16:18and inside this thing,
-
16:24 - 16:26I have surprisingly little data
-
16:37 - 16:40In this directory, I just have a
bunch of car images -
16:40 - 16:42and I have two sets of images
-
16:42 - 16:48one of which is called classic
and the other is called modern -
16:48 - 16:52so basically I picked some
photos off Flickr -
16:52 - 16:54I put these into two separate directories
-
16:54 - 16:56I am going to use those directory names
-
16:56 - 17:00as the classification for these images
-
17:00 - 17:05In the upper directory here
I got a bunch of test images -
17:05 - 17:07which I don't know the labels for
-
17:13 - 17:17this picks out the list of classes; there
is a classic and a modern directory -
17:17 - 17:22I am gonna go through every file
in this directory -
17:22 - 17:28I am gonna crop it, I am gonna find
the logits level which is -
17:28 - 17:33all the classes and then I am just gonna
add these to features -
17:33 - 17:37So basically I am gonna do something
like a scikit-learn model -
17:37 - 17:38I am gonna fit SVM
-
17:38 - 17:42so basically, this is featurizing
all these pictures -
17:48 - 17:50so here we go with the training data
-
17:56 - 17:57here's some training
-
18:02 - 18:06classic cars,
it went through the classic directory -
18:06 - 18:09modern cars,
it went through the modern directory -
18:15 - 18:17it's thinking hard
-
18:18 - 18:25what I am gonna do now is
build SVM over those features -
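The SVM fit over those features is a one-liner in scikit-learn; this sketch uses synthetic random vectors as stand-ins for the 1000-dimensional Inception logit features of the 10 classic and 10 modern car photos.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)

# Synthetic stand-ins for the 1000-dim logit features of 10 classic
# and 10 modern car photos (the real features come from Inception)
classic_feats = rng.randn(10, 1000) + 1.0   # one shifted cluster per class
modern_feats = rng.randn(10, 1000) - 1.0
X = np.vstack([classic_feats, modern_feats])
y = ["classic"] * 10 + ["modern"] * 10

# Tiny fit: 20 examples of 1000 features each, no deep training loop
clf = LinearSVC(C=1.0).fit(X, y)
print(clf.score(X, y))
```

With so little data the fit takes a fraction of a second, which is the whole point of freezing the network and only training the top.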
18:31 - 18:40[recording jumps to 21:36]
-
21:35 - 21:44I restarted this thing
-
21:44 - 21:50the actual training for this SVM thing
takes that long, -
21:50 - 21:58this is very quick, essentially 20 images
worth of a thousand features -
21:58 - 22:02so there was no big training loop to do
-
22:02 - 22:09then I can run this on the actual models
in the directory, in the test set -
22:09 - 22:13so here this is images that it has never
seen before -
22:13 - 22:16it thinks that this is a modern car
-
22:16 - 22:19this one it thinks is a classic car,
this one is classified as modern -
22:19 - 22:26so this is actually doing quite a good job
out of just 10 examples of each -
22:26 - 22:33it actually thinks this one is modern
it's not a sports car but anyway -
22:33 - 22:39so this is showing that the SVM we trained
-
22:39 - 22:43can classify based on the features that
Inception is producing because -
22:43 - 22:47Inception "understands"
what images are about -
22:47 - 22:51so if I go back to here,
code is on GitHub -
22:51 - 22:54conclusions okay, this thing really works
-
22:54 - 22:58we didn't have to train
a deep neural network -
22:58 - 23:02we could plug this TensorFlow model
into an existing pipeline -
23:02 - 23:05and this is actually something where
-
23:05 - 23:09the TensorFlow Summit has something
to say about these pipelines -
23:09 - 23:11because not only are they talking
about deep learning -
23:11 - 23:15they are talking about the whole
cloud-based learning -
23:15 - 23:19and setting up proper processes
-
23:19 - 23:24I guess, time for questions quickly
-
23:24 - 23:29we can then do the
TensorFlow Summit wrap-up -
23:33 - 23:37"I am assuming that there is no
backpropagation here" -
23:37 - 23:40This includes no backpropagation
-
23:40 - 23:43"End result is a feature"
-
23:46 - 23:53I am just assuming that Inception,
you can imagine if the ImageNet thing -
23:53 - 23:56had focused more on products,
it could be even better -
23:56 - 23:59if it focused on man-made things
-
23:59 - 24:05The ImageNet training set has an awful
lot of dogs in it, not that many cats -
24:05 - 24:09So, on the other hand it may be that
it has quite a lot of flowers -
24:09 - 24:14or maybe it is saying, I like this car
as a modern car -
24:14 - 24:16because it's got petals for wheels
-
24:16 - 24:20whereas the other one, the classic cars
tend to have round things for wheels -
24:20 - 24:25So it is abstractly doing this
-
24:25 - 24:30It doesn't know about sports cars or
what they look like -
24:30 - 24:32But it does know about curves
-
24:35 - 24:38"So for SVM, you don't use
TensorFlow anymore ?" -
24:38 - 24:43No, basically I have used TensorFlow to
create some features -
24:43 - 24:45Now, I don't want to throw it away
-
24:45 - 24:48because hopefully I have got
a streaming process where -
24:48 - 24:52more and more images are chugged
through this thing -
24:52 - 25:05[could not hear the question properly]
-
25:07 - 25:10There is an example code called
TensorFlow for poets -
25:10 - 25:13where they actually say that,
let's load up one of these networks -
25:13 - 25:15and then we will do some fine tuning
-
25:15 - 25:22there you get involved in tuning
these neurons with some gradient descent -
25:22 - 25:25and you are taking some steps
and all this kind of thing -
25:25 - 25:28maybe you are making broad changes
across the whole network -
25:28 - 25:33which could be good if you have got
tons of data and tons of time -
25:33 - 25:37but this is a very simple way of just
tricking it to get it done -
25:37 - 25:47[could not hear the comment properly]
-
25:47 - 25:54it will be a very small network
because SVM is essentially fairly shallow -
25:54 - 26:07[could not hear the question]
-
26:07 - 26:14TensorFlow even though it has imported
this large Inception network -
26:14 - 26:21as far as I am concerned,
I am just using it as f(x) = y and that's it -
26:21 - 26:25but you can inquire what would it say
at this particular level -
26:25 - 26:30and these bunches of levels with various
component points along the way -
26:30 - 26:34I could take out other levels
-
26:34 - 26:36I haven't tried it to have a look
-
26:36 - 26:40There you get more like pictures
worth of features rather than -
26:40 - 26:43this string of a 1000 numbers
-
26:43 - 26:49but each intermediate levels
will be pictures with CNN kind of features -
26:49 - 26:54on the other hand, if you want
to play around with this thing -
26:54 - 26:58there's this nice stuff called
the DeepDream kind of things -
26:58 - 27:03where they try and match images to
being interesting images -
27:03 - 27:06then you do the featurizing that looks at
different levels -
27:06 - 27:12the highest level is a cat but I want all
local features to be as fishy as possible -
27:12 - 27:16then you get like a fish-faced cat
-
27:16 - 27:20that's the kind of thing you can do with
these kinds of features in models
Title: Go Deeper: Transfer Learning - TensorFlow and Deep Learning Singapore
Description:
Speaker: Martin Andrews
Event Page: https://www.meetup.com/TensorFlow-and-Deep-Learning-Singapore/events/237032130/
Produced by Engineers.SG
English subtitles by: Sindhu Shetty
Video Language: English
Duration: 27:36