Go Deeper: Transfer Learning - TensorFlow and Deep Learning Singapore

  • 0:00 - 0:03
    What we hope to do with this meetup
  • 0:04 - 0:11
    is have something, given the spread of
    the questionnaire results
  • 0:11 - 0:13
    we hope to do something which is kind of
  • 0:13 - 0:16
    for people who don't know what
    deep learning is
  • 0:16 - 0:18
    and want an introduction to TensorFlow
  • 0:18 - 0:20
    but also something which is more of a
  • 0:20 - 0:24
    like a crowd pleaser or something
    which is more cutting edge
  • 0:24 - 0:27
    I am not going to say that this
    thing is particularly cutting edge
  • 0:27 - 0:32
    because once we saw the responses,
    we dialed things down a bit
  • 0:32 - 0:38
    But there will be more cutting edge stuff
  • 0:38 - 0:43
    and maybe we start to do other meetups
    events in other formats
  • 0:43 - 0:49
    So it could be like we have
    an experts' paper meeting
  • 0:49 - 0:53
    or we could split it, now that we can
    see the size of the crowd
  • 0:53 - 0:58
    Anyway, let me talk a little bit about
    going deeper with transfer learning
  • 0:58 - 1:00
    Unfortunately, this is something
    some of you
  • 1:00 - 1:03
    would have seen me do before
  • 1:03 - 1:05
    This is the first time I have
    done it in TensorFlow
  • 1:05 - 1:07
    and let me just explain that
  • 1:07 - 1:10
    Before, I have been programming this stuff
  • 1:10 - 1:13
    in Theano with the
    Lasagne layers library on top
  • 1:13 - 1:19
    and Theano is a research-based
    deep learning framework, out of Montreal
  • 1:19 - 1:23
    but what I have concluded
    since last summer
  • 1:23 - 1:27
    is that TensorFlow is probably the winner
    of this framework race
  • 1:27 - 1:29
    at least for the foreseeable future
  • 1:29 - 1:32
    with all this nice industrial stuff
  • 1:32 - 1:35
    I should be retooling into TensorFlow
  • 1:36 - 1:37
    That's what I am taking the opportunity
    to do for this
  • 1:41 - 1:43
    So, about me; sorry, here we go
  • 1:43 - 1:46
    I have come up through finance,
    startups and stuff
  • 1:46 - 1:50
    I took a year out basically in 2014
    just for fun
  • 1:50 - 1:54
    I have been doing serious kind of
    natural language processing since then
  • 2:01 - 2:05
    Basically, the overview for this
    somewhat more challenging talk
  • 2:05 - 2:09
    which will probably be 20 mins, 30 mins
    depending on how it goes
  • 2:09 - 2:14
    I want to take a state-of-the-art
    TensorFlow model
  • 2:14 - 2:17
    I want to solve a problem that
    it wasn't trained for
  • 2:17 - 2:21
    And I am going to be using
    deep learning as a component
  • 2:21 - 2:26
    of my solution rather than the
    primary focus of what I am trying to build
  • 2:26 - 2:33
    So this is, in a way, more of an industrial
    or commercial kind of application
  • 2:33 - 2:35
    for what's going on here
  • 2:35 - 2:39
    So the goal for this kind of problem is
  • 2:39 - 2:43
    I want to distinguish pictures
    of classic and modern sports cars
  • 2:43 - 2:47
    you will see some pictures of
    classic and modern cars a bit later
  • 2:48 - 2:52
    It's not that easy to say what
    the difference is
  • 2:52 - 2:55
    obviously, it could be
    different types of images
  • 2:55 - 2:57
    and it could be lots of
    different classes
  • 2:57 - 3:01
    I am just doing a very simple
    two class thing
  • 3:01 - 3:03
    but it's complicated images
  • 3:03 - 3:05
    what I want to do is
  • 3:05 - 3:06
    I want to have a very small training time
  • 3:06 - 3:08
    so I don't want to be retraining
    some huge network
  • 3:08 - 3:13
    Particularly, I have only got
    in this case, 20 training examples
  • 3:13 - 3:18
    So I am not gonna do any fantastic
    million image training
  • 3:18 - 3:21
    I have got 20 images with me
  • 3:21 - 3:25
    and I also want to be able to
    put this in production
  • 3:25 - 3:30
    so I can just run it as a component of
    something else
  • 3:30 - 3:36
    Basically, one of the things that is
    carrying the deep learning world forward
  • 3:36 - 3:40
    is an image classification task
    called ImageNet
  • 3:40 - 3:42
    this has been a competition where
  • 3:42 - 3:47
    they have 15 million labeled images
    from 22,000 categories
  • 3:47 - 3:50
    and you can see some of them here
  • 3:50 - 3:56
    if we go for this,
    this is a picture of a hotdog in a bun
  • 3:56 - 3:58
    and here are some of the categories
  • 3:58 - 4:03
    which will be some food I don't know
  • 4:03 - 4:06
    these are hotdogs, lots of
    different pictures of hotdogs
  • 4:06 - 4:09
    lots of different pictures of cheeseburgers
  • 4:09 - 4:12
    lots of different pictures of plates
  • 4:12 - 4:15
    so the task for ImageNet is to classify
  • 4:15 - 4:18
    for any given, any one of these images
  • 4:18 - 4:20
    which of a thousand different
    categories it is from
  • 4:20 - 4:25
    and it used to be that people could
    score adequately well
  • 4:25 - 4:29
    and were making incremental changes in
  • 4:29 - 4:31
    how well they can do this
  • 4:31 - 4:33
    but the deep learning people came along
  • 4:33 - 4:35
    and kind of tore this to shreds
  • 4:35 - 4:40
    and Google came up with GoogLeNet
  • 4:40 - 4:44
    which is what we are actually going
    to use here, back in 2014
  • 4:44 - 4:50
    suddenly, this stuff is now being done
    with further iterations
  • 4:50 - 4:53
    of this kind of thing,
    better than humans can
  • 4:53 - 4:57
    So the way you can measure whether
    someone is better than humans
  • 4:57 - 4:59
    is, you take a human and see
    whether it beats him
  • 4:59 - 5:02
    the question there is
    are there labeling errors
  • 5:02 - 5:04
    there you need a committee of humans
  • 5:04 - 5:06
    so the way they label these things is
  • 5:06 - 5:09
    by running it on Mechanical Turk and
  • 5:09 - 5:12
    asking people what category is this
    cheeseburger in
  • 5:15 - 5:16
    The network we are going to use here
  • 5:16 - 5:23
    is the 2014 state-of-the-art GoogLeNet,
    also called Inception version 1
  • 5:23 - 5:26
    The nice thing about this is that
  • 5:26 - 5:31
    there is an existing model
    already trained for this task
  • 5:31 - 5:34
    and it's available for download
    it's all free
  • 5:34 - 5:39
    and there are lots of different
    models out there
  • 5:39 - 5:41
    there's a model zoo for TensorFlow
  • 5:41 - 5:44
    So, what I have on my machine
  • 5:44 - 5:49
    and this is a small model,
    it's a 20 megabytes kind of model
  • 5:49 - 5:50
    So it is not a very big model
  • 5:50 - 5:57
    Inception 4 is a 200 MB kind of model
    which is a bit heavy
  • 5:57 - 5:59
    I am working here on my laptop
  • 5:59 - 6:01
    you are gonna see it working in real-time
  • 6:01 - 6:07
    and the trick here is instead of
    a softmax layer at the end
  • 6:07 - 6:13
    I will show you the diagram, it should be
    clear to anyone who's following along
  • 6:13 - 6:19
    instead of using logits to get me
    the probabilities
  • 6:19 - 6:21
    I am going to strip that away
  • 6:21 - 6:23
    and I am going to train
    a support vector machine
  • 6:23 - 6:25
    to distinguish these classes
  • 6:25 - 6:30
    I am not going to retrain the
    Inception network at all
  • 6:30 - 6:32
    I am going to just use it as a component
  • 6:32 - 6:35
    strip off the top classification piece
  • 6:35 - 6:38
    and replace it with an SVM
  • 6:38 - 6:40
    Now, SVMs are pretty well understood
  • 6:40 - 6:45
    here I am just using Inception
    as a featurizer for images
  • 6:45 - 6:47
    So here's a network picture
  • 6:47 - 6:52
    Basically, this is what the ImageNet
    network is designed for
  • 6:52 - 6:54
    you put in an image at the bottom
  • 6:54 - 6:57
    there is this black box which is the
    Inception network
  • 6:57 - 7:01
    which is a bunch of CNNs or
    convolutional neural networks
  • 7:01 - 7:03
    followed by a dense network
  • 7:03 - 7:05
    followed by these logits
  • 7:05 - 7:08
    and this logits layer is essentially
    the same as the 0 to 9
  • 7:08 - 7:17
    that Sam had for his digits, 1 to 1000
    for the different classes for ImageNet
  • 7:17 - 7:20
    To actually get the ImageNet output
  • 7:20 - 7:27
    it uses a softmax function and
    then chooses the highest one of these
  • 7:27 - 7:29
    to give you this is the class
    that this is in
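
A minimal sketch of that final step (the variable names here are illustrative, not the talk's notebook code): softmax turns the raw logits into probabilities, and the highest one is the predicted class.

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.random.randn(1000)    # stand-in for the network's logits output
probs = softmax(logits)
print(int(np.argmax(probs)), float(probs.max()))  # top class and its probability
```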
  • 7:29 - 7:32
    What I am going to do is
    I am going to ignore this
  • 7:32 - 7:35
    neat piece of classification technology
    that they have got
  • 7:35 - 7:44
    let's say we use these outputs as inputs
    to an SVM and just treat these as features
  • 7:44 - 7:47
    Now if we pick out one of these
  • 7:47 - 7:51
    this class could be cheeseburger
    and this class could be parrot
  • 7:51 - 7:54
    and this other class could be Husky dog
  • 7:54 - 7:57
    there is all sorts of classes in here
  • 7:57 - 8:00
    but basically what I will be doing is that
  • 8:00 - 8:02
    I will be extracting out the features
    of these photos
  • 8:02 - 8:05
    saying how much of this photo
    is like a parrot
  • 8:05 - 8:09
    how much of this is like a Husky dog
  • 8:09 - 8:13
    Now it turns out that modern cars and
    classic cars can be distinguished that way
  • 8:13 - 8:19
    Let me go to some code
  • 8:19 - 8:21
    Ok this code is all up on GitHub
  • 8:31 - 8:34
    Can everyone see this well enough?
  • 8:38 - 8:42
    So basically, I am pulling in TensorFlow
  • 8:45 - 8:49
    I pull in this model
  • 8:49 - 8:53
    Here is what the Inception architecture is
  • 8:53 - 8:57
    It feeds forward this way,
    here you put your image
  • 8:57 - 9:00
    it goes through lots and lots of
    convolutional layers
  • 9:00 - 9:03
    all the way up to the end
    with softmax and the output
  • 9:03 - 9:07
    So having done that, what I will do is
  • 9:07 - 9:10
    actually I have a download
    for the checkpoint
  • 9:10 - 9:17
    this is the checkpoint here which
    is a tar file, I have it locally stored
  • 9:17 - 9:18
    It doesn't download it now
  • 9:18 - 9:25
    but it is all there, even the
    big models are there up from Google
  • 9:25 - 9:28
    so they have retrained these
  • 9:28 - 9:30
    so the Inception thing takes about a week
  • 9:30 - 9:34
    to retrain on a bunch of,
    it could be 64 GPUs
  • 9:34 - 9:37
    so you don't really want to be
    training this thing on your own
  • 9:37 - 9:41
    you also need the ImageNet training set
  • 9:41 - 9:48
    it is a 140 GB file
    which is no fun to download
  • 9:51 - 9:57
    what I am doing here is basically
    there is also an Inception library
  • 9:57 - 10:04
    which is part of TF-Slim;
    this thing is designed such that
  • 10:04 - 10:08
    it already knows the network
    it can preload it
  • 10:08 - 10:12
    this has loaded it,
    I can get some labels
  • 10:12 - 10:17
    This is loading up the ImageNet labels
  • 10:17 - 10:26
    I need to know which location
    corresponds to which class like the digits
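
A hedged sketch of that label loading; the file name 'imagenet_labels.txt' (one label per line, in class-index order) is an assumption, since the notebook uses its own download helper.

```python
# Assumed file: 'imagenet_labels.txt', one label per line, in class-index order.
with open('imagenet_labels.txt') as f:
    imagenet_labels = [line.strip() for line in f]

print(len(imagenet_labels), imagenet_labels[:3])  # sanity-check the mapping
```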
  • 10:31 - 10:33
    Here we are going through
    basically the same steps
  • 10:33 - 10:39
    as the MNIST example in that
    we reset the default graph
  • 10:39 - 10:45
    we create a placeholder which is
    where my images are going to go
  • 10:45 - 10:48
    this is as an input
    but from this image input
  • 10:48 - 10:50
    I am then going to do some TensorFlow steps
  • 10:50 - 10:52
    because TensorFlow
    has various preprocessing
  • 10:52 - 10:56
    or graphics handling commands
  • 10:56 - 10:58
    because a lot of this stuff
    works with images
  • 10:58 - 11:03
    so there's all sorts of clipping
    and rotating stuff
  • 11:03 - 11:05
    so it can preprocess these images
  • 11:05 - 11:08
    I am also going to pull out a numpy image
  • 11:08 - 11:11
    so I can see what it is actually looking at
  • 11:11 - 11:15
    here with this Inception version 1
  • 11:15 - 11:21
    I am going to pull in the entire
    Inception version 1 model
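
A sketch of that graph setup, assuming TF 1.x with the TF-Slim `nets` package from the tensorflow/models repository on the path; the preprocessing shown is the standard Inception scaling to [-1, 1].

```python
import tensorflow as tf
from nets import inception   # assumption: tensorflow/models 'slim' dir on sys.path

slim = tf.contrib.slim

tf.reset_default_graph()

# Placeholder where each raw RGB image will go.
input_image = tf.placeholder(tf.uint8, shape=(224, 224, 3))

# Standard Inception preprocessing: float in [-1, 1], plus a batch dimension.
processed = tf.image.convert_image_dtype(input_image, dtype=tf.float32)
processed = (processed - 0.5) * 2.0
processed = tf.expand_dims(processed, 0)

# Pull in the entire Inception v1 model; the logits and all intermediate
# end-points come back, so the network can be tapped at any level.
with slim.arg_scope(inception.inception_v1_arg_scope()):
    logits, end_points = inception.inception_v1(
        processed, num_classes=1001, is_training=False)
probabilities = tf.nn.softmax(logits)
```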
  • 11:23 - 11:27
    My net function, rather than just
    picking random weights,
  • 11:27 - 11:30
    is gonna be assigned this
    from this checkpoint
  • 11:30 - 11:34
    when I run the init thing from my graph
  • 11:34 - 11:37
    or in my session, it won't initialize
    everything from random
  • 11:37 - 11:39
    it will initialize everything from disk
  • 11:39 - 11:42
    so this will define the model
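
A sketch of that initialize-from-disk step, using TF-Slim's checkpoint helper; the local path 'inception_v1.ckpt' is an assumption.

```python
# Instead of tf.global_variables_initializer(), build an init function that
# assigns every InceptionV1 variable its trained value from the checkpoint.
init_fn = slim.assign_from_checkpoint_fn(
    'inception_v1.ckpt',                       # assumed local checkpoint path
    slim.get_model_variables('InceptionV1'))
```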
  • 11:42 - 11:45
    and now let's proceed
  • 11:45 - 11:52
    one of the issues with having this
    on a nice TensorFlow graph
  • 11:52 - 11:57
    is it just says input, Inception1, output
  • 11:57 - 12:00
    so there's a big block there
    you can delve into it if you want
  • 12:00 - 12:06
    let me just show you
    let's go back a bit
  • 12:08 - 12:11
    So this is the code
    behind the Inception1 model
  • 12:11 - 12:16
    so this is actually smaller than the
    Inception2 and Inception3
  • 12:16 - 12:22
    basically, we have a kind of a base
    Inception piece, just this
  • 12:22 - 12:25
    and these are combined together
  • 12:25 - 12:33
    and this is a detailed model put together
    by many smart people in 2014
  • 12:33 - 12:35
    it's got much more complicated since then
  • 12:35 - 12:39
    fortunately, they have written the code
    and we don't have to
  • 12:43 - 12:46
    So here what I am gonna do is
    I am gonna load an example image
  • 12:46 - 12:51
    just to show you
    one of the things here is
  • 12:51 - 12:56
    TensorFlow, in order to be efficient,
    wants to do the loading itself
  • 12:56 - 13:01
    So in order to get this pumping
    information through
  • 13:01 - 13:04
    it wants you to set up queues of images
  • 13:04 - 13:10
    it will then handle the whole ingestion
    process itself
  • 13:10 - 13:14
    the problem with that is
    it's kind of complicated to do
  • 13:14 - 13:16
    in a Jupyter notebook right here
  • 13:16 - 13:19
    so here I am going to do
    the very simplest thing
  • 13:19 - 13:22
    which is load a numpy image
    and stuff the numpy image in
  • 13:22 - 13:25
    but what TensorFlow would love me to do
  • 13:25 - 13:29
    is create, as you see in this one
  • 13:29 - 13:34
    create a file name queue and it will
  • 13:34 - 13:35
    then run the queue, do the matching
  • 13:35 - 13:37
    and do all of this stuff itself
  • 13:37 - 13:41
    because then it can lay it out across
    potentially distributed cluster
  • 13:41 - 13:43
    and do everything just right
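
For reference, a hedged sketch of the queue-based route being described here (TF 1.x queue API; the glob pattern is an assumption):

```python
# TensorFlow-preferred ingestion: a filename queue that the runtime drains
# itself, potentially across a distributed cluster. Running this also needs
# tf.train.start_queue_runners() inside a session.
filename_queue = tf.train.string_input_producer(
    tf.train.match_filenames_once('./images/cars/*.jpg'))  # assumed pattern
reader = tf.WholeFileReader()
_, file_contents = reader.read(filename_queue)
queued_image = tf.image.decode_jpeg(file_contents, channels=3)
```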
  • 13:43 - 13:50
    here I do kind of the simple read the image
  • 13:50 - 14:00
    so this image is a tensor
    which is 224 by 224 by RGB
  • 14:00 - 14:03
    this is kind of a sanity check of
    what kind of numbers I got in the corner
  • 14:03 - 14:06
    and then what I am gonna do is
  • 14:06 - 14:08
    I am going to crop out the
    middle section of it
  • 14:08 - 14:11
    this happens to be the right size already
  • 14:11 - 14:13
    basically if you got odd shapes
  • 14:13 - 14:15
    you need to think about
    how am I gonna do it
  • 14:15 - 14:19
    am I going to pad it
    what do you do
  • 14:19 - 14:22
    because in order to make this efficient
  • 14:22 - 14:29
    TensorFlow wants to lay it out without
    all this variability in image size
  • 14:29 - 14:34
    one set of parameters and it's then going
    to blast it across your GPU
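
A minimal numpy sketch of that central crop (the function name is illustrative):

```python
import numpy as np

def central_crop(img, size=224):
    # Take the size x size patch from the middle of the image.
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size, :]
```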
  • 14:34 - 14:38
    so let's just run this thing
  • 14:38 - 14:40
    so now we have defined the network
  • 14:40 - 14:46
    here I am going to pick a session
    here I am going to init the session
  • 14:46 - 14:48
    it loads the data, and then I am going
  • 14:48 - 14:52
    to pick up the numpy image and the
    probabilities from the top layer
  • 14:52 - 14:55
    I am just gonna show it
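
Putting the pieces together, a sketch of that run, building on the earlier sketch cells ('tabby.jpg' is a hypothetical file name):

```python
import matplotlib.pyplot as plt

crop = central_crop(plt.imread('tabby.jpg'))    # hypothetical example image

with tf.Session() as sess:
    init_fn(sess)                                # weights come from disk, not random
    probs = sess.run(probabilities, feed_dict={input_image: crop})[0]
    # Note: some 1001-class checkpoints reserve index 0 for a background class.
    for idx in probs.argsort()[-5:][::-1]:       # top-5 classes
        print(imagenet_labels[idx], probs[idx])
```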
  • 14:58 - 15:01
    here is the image
    this is the image I pulled off the disk
  • 15:01 - 15:06
    you can see here the probabilities,
    the highest probability is Tabby cat
  • 15:06 - 15:10
    which is good, it's also interesting that
  • 15:10 - 15:15
    the second in line things are Tiger cat,
    Egyptian cat, lynx
  • 15:15 - 15:21
    so it's got a fair idea that it is a cat
    and, in particular, it is getting it right
  • 15:21 - 15:26
    ok so this is the same diagram
    we have had before
  • 15:26 - 15:33
    what you have seen is the image going into
    this black box, coming out and telling us
  • 15:33 - 15:36
    the probabilities here, so what we are
    now gonna do is
  • 15:36 - 15:42
    go from the image to the black box and
    just learn a bunch of features
  • 15:50 - 15:53
    let me just show you this on disk
  • 16:11 - 16:13
    so I have a cars directory here
  • 16:14 - 16:18
    and inside this thing,
  • 16:24 - 16:26
    I have surprisingly little data
  • 16:37 - 16:40
    In this directory, I just have a
    bunch of car images
  • 16:40 - 16:42
    and I have two sets of images
  • 16:42 - 16:48
    one of which is called classic
    and the other is called modern
  • 16:48 - 16:52
    so basically I picked some
    photos off Flickr
  • 16:52 - 16:54
    I put these into two separate directories
  • 16:54 - 16:56
    I am going to use those directory names
  • 16:56 - 17:00
    as the classification for these images
  • 17:00 - 17:05
    In the upper directory here
    I got a bunch of test images
  • 17:05 - 17:07
    which I don't know the labels for
  • 17:13 - 17:17
    this picks out the list of classes; there
    is a classic and a modern directory
  • 17:17 - 17:22
    I am gonna go through every file
    in this directory
  • 17:22 - 17:28
    I am gonna crop it, I am gonna find
    the logits level which is
  • 17:28 - 17:33
    all the classes and then I am just gonna
    add these to features
  • 17:33 - 17:37
    So basically I am gonna do something
    like a scikit-learn model
  • 17:37 - 17:38
    I am gonna fit an SVM
  • 17:38 - 17:42
    so basically, this is featurizing
    all these pictures
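
A hedged sketch of that featurizing loop, reusing the earlier sketch cells; the directory layout is assumed as described in the talk.

```python
import os
import matplotlib.pyplot as plt

features, labels = [], []
with tf.Session() as sess:
    init_fn(sess)
    for class_name in ['classic', 'modern']:     # directory names are the labels
        folder = os.path.join('./images/cars', class_name)  # assumed layout
        for fname in os.listdir(folder):
            img = central_crop(plt.imread(os.path.join(folder, fname)))
            logit_values = sess.run(logits, feed_dict={input_image: img})
            features.append(logit_values[0])     # the logits vector as features
            labels.append(class_name)
```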
  • 17:48 - 17:50
    so here we go with the training data
  • 17:56 - 17:57
    here's some training
  • 18:02 - 18:06
    classic cars,
    it went through the classic directory
  • 18:06 - 18:09
    modern cars,
    it went through the modern directory
  • 18:15 - 18:17
    it's thinking hard
  • 18:18 - 18:25
    what I am gonna do now is
    build an SVM over those features
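
The scikit-learn side is then the standard fit call, sketched here with a linear SVM (the exact estimator in the notebook may differ):

```python
from sklearn import svm

classifier = svm.LinearSVC()
classifier.fit(features, labels)   # 20 images x ~1000 logit features: very fast
```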
  • 18:31 - 18:40
    jump to 21:36
  • 21:35 - 21:44
    I restarted this thing
  • 21:44 - 21:50
    the actual training for this SVM thing
    takes that long,
  • 21:50 - 21:58
    this is very quick, essentially 20 images
    worth of a thousand features
  • 21:58 - 22:02
    so there was no big training loop to do
  • 22:02 - 22:09
    then I can run this on the actual models
    in the directory, in the test set
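
A sketch of that test pass, again reusing the earlier cells; the test images are assumed to sit above the two class directories.

```python
test_dir = './images/cars'                       # assumed: test jpgs live here
with tf.Session() as sess:
    init_fn(sess)
    for fname in sorted(os.listdir(test_dir)):
        if not fname.endswith('.jpg'):
            continue                             # skip the class subdirectories
        img = central_crop(plt.imread(os.path.join(test_dir, fname)))
        feats = sess.run(logits, feed_dict={input_image: img})
        print(fname, classifier.predict(feats)[0])
```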
  • 22:09 - 22:13
    so here this is images that it has never
    seen before
  • 22:13 - 22:16
    it thinks that this is a modern car
  • 22:16 - 22:19
    this one it thinks is a classic car,
    this one is classified as modern
  • 22:19 - 22:26
    so this is actually doing quite a good job
    out of just 10 examples of each
  • 22:26 - 22:33
    it actually thinks this one is modern
    it's not a sports car but anyway
  • 22:33 - 22:39
    so this is showing that the SVM we trained
  • 22:39 - 22:43
    can classify based on the features that
    Inception is producing because
  • 22:43 - 22:47
    Inception understands "understands"
    what images are about
  • 22:47 - 22:51
    so if I go back to here,
    code is on GitHub
  • 22:51 - 22:54
    conclusions okay, this thing really works
  • 22:54 - 22:58
    we didn't have to train
    a deep neural network
  • 22:58 - 23:02
    we could plug this TensorFlow model
    into an existing pipeline
  • 23:02 - 23:05
    and this is actually something where
  • 23:05 - 23:09
    the TensorFlow Summit has something
    to say about these pipelines
  • 23:09 - 23:11
    because not only are they talking
    about deep learning
  • 23:11 - 23:15
    they are talking about the whole
    cloud-based learning
  • 23:15 - 23:19
    and setting up proper processes
  • 23:19 - 23:24
    I guess, time for questions quickly
  • 23:24 - 23:29
    we can then do the
    TensorFlow Summit wrap-up
  • 23:33 - 23:37
    "I am assuming that there is no
    backpropagation here"
  • 23:37 - 23:40
    This includes no backpropagation
  • 23:40 - 23:43
    "End result is a feature"
  • 23:46 - 23:53
    I am just assuming that Inception,
    you can imagine if the ImageNet thing
  • 23:53 - 23:56
    had focused more on products,
    it could be even better
  • 23:56 - 23:59
    if it focused on man-made things
  • 23:59 - 24:05
    The ImageNet training set has an awful
    lot of dogs in it, not that many cats
  • 24:05 - 24:09
    So, on the other hand it may be that
    it has quite a lot of flowers
  • 24:09 - 24:14
    or maybe it is saying, I like this car
    as a modern car
  • 24:14 - 24:16
    because it's got petals for wheels
  • 24:16 - 24:20
    whereas the other one, the classic cars
    tend to have round things for wheels
  • 24:20 - 24:25
    So it is abstractly doing this
  • 24:25 - 24:30
    It doesn't know about sports cars or
    what they look like
  • 24:30 - 24:32
    But it does know about curves
  • 24:35 - 24:38
    "So for SVM, you don't use
    TensorFlow anymore ?"
  • 24:38 - 24:43
    No, basically I have used TensorFlow to
    create some features
  • 24:43 - 24:45
    Now, I don't want to throw it away
  • 24:45 - 24:48
    because hopefully I have got
    a streaming process where
  • 24:48 - 24:52
    more and more images are chugged
    through this thing
  • 24:52 - 25:05
    [could not hear the question properly]
  • 25:07 - 25:10
    There is an example code called
    TensorFlow for poets
  • 25:10 - 25:13
    where they actually say that,
    let's load up one of these networks
  • 25:13 - 25:15
    and then we will do some fine tuning
  • 25:15 - 25:22
    there you get involved in tuning
    these neurons with some gradient descent
  • 25:22 - 25:25
    and you are taking some steps
    and all this kind of thing
  • 25:25 - 25:28
    maybe you are having broad implications
    across the whole network
  • 25:28 - 25:33
    which could be good if you have got
    tons of data and tons of time
  • 25:33 - 25:37
    but this is a very simple way of just
    tricking it to get it done
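
For contrast, a hedged sketch of that fine-tuning route: bolt a fresh two-class head onto the frozen Inception body from the earlier setup, and train only the new variables (the layer name and hyperparameters here are assumptions):

```python
# Global-average-pool the last mixed layer, add a new 2-class head, and
# pass only the head's variables to the optimizer so the body stays frozen.
feats = tf.reduce_mean(end_points['Mixed_5c'], axis=[1, 2])
new_logits = tf.layers.dense(feats, 2, name='new_head')
label_ph = tf.placeholder(tf.int32, shape=(None,))
loss = tf.losses.sparse_softmax_cross_entropy(labels=label_ph, logits=new_logits)
head_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='new_head')
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss, var_list=head_vars)
```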
  • 25:37 - 25:47
    [could not hear the comment properly]
  • 25:47 - 25:54
    it will be a very small network
    because an SVM is essentially fairly shallow
  • 25:54 - 26:07
    [could not hear the question]
  • 26:07 - 26:14
    TensorFlow even though it has imported
    this large Inception network
  • 26:14 - 26:21
    as far as I am concerned,
    I am using it as f(x) = y and that's it
  • 26:21 - 26:25
    but you can inquire what would it say
    at this particular level
  • 26:25 - 26:30
    and these bunches of levels with various
    component points along the way
  • 26:30 - 26:34
    I could take out other levels
  • 26:34 - 26:36
    I haven't tried it to have a look
  • 26:36 - 26:40
    There you get more like pictures
    worth of features rather than
  • 26:40 - 26:43
    this string of 1000 numbers
  • 26:43 - 26:49
    but each intermediate level
    will be pictures with CNN kind of features
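
A sketch of tapping one of those intermediate levels via the end-points dict from the earlier setup ('Mixed_4b' is one of Inception v1's layer names; the image file is hypothetical):

```python
with tf.Session() as sess:
    init_fn(sess)
    img = central_crop(plt.imread('car.jpg'))    # hypothetical image
    mid = sess.run(end_points['Mixed_4b'], feed_dict={input_image: img})
    print(mid.shape)   # spatial CNN feature maps, e.g. (1, 14, 14, 512)
```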
  • 26:49 - 26:54
    on the other hand, if you want
    to play around with this thing
  • 26:54 - 26:58
    there's this nice stuff called
    the DeepDream kind of things
  • 26:58 - 27:03
    where they try and match images to
    being interesting images
  • 27:03 - 27:06
    then you do the featurizing that looks at
    different levels
  • 27:06 - 27:12
    the highest level is a cat but I want all
    local features to be as fishy as possible
  • 27:12 - 27:16
    then you get like a fish-faced cat
  • 27:16 - 27:20
    that's the kind of thing you can do with
    these kinds of features in models
Title:
Go Deeper: Transfer Learning - TensorFlow and Deep Learning Singapore
Description:

Speaker: Martin Andrews

Event Page: https://www.meetup.com/TensorFlow-and-Deep-Learning-Singapore/events/237032130/

Produced by Engineers.SG

English subtitles by: Sindhu Shetty

Video Language:
English
Duration:
27:36
