WEBVTT
00:00:00.259 --> 00:00:03.421
What we hope to do with this meetup
00:00:03.851 --> 00:00:10.620
is, given the spread of
the questionnaire results,
00:00:10.620 --> 00:00:12.681
to do something which is partly
00:00:12.681 --> 00:00:15.721
for people who don't know what
deep learning is
00:00:15.721 --> 00:00:17.992
and want an introduction to TensorFlow
00:00:17.992 --> 00:00:20.322
but also something which is more of a
00:00:20.322 --> 00:00:24.021
crowd-pleaser, something
which is more cutting edge
00:00:24.021 --> 00:00:27.081
I am not going to say that this
thing is particularly cutting edge
00:00:27.081 --> 00:00:31.553
because once we saw the responses,
we dialed things down a bit
00:00:31.553 --> 00:00:37.803
But there will be more cutting edge stuff
00:00:37.803 --> 00:00:42.811
and maybe we will start to do other meetup
events in other formats
00:00:42.811 --> 00:00:48.824
So it could be like we have
an experts' paper meeting
00:00:48.824 --> 00:00:52.864
or we could split it, now that we can see
the size of the crowd
00:00:52.864 --> 00:00:57.824
Anyway, let me talk a little bit about
going deeper with transfer learning
00:00:57.824 --> 00:01:00.164
Unfortunately, this is something
some of you
00:01:00.164 --> 00:01:02.503
would have seen me do before
00:01:02.503 --> 00:01:04.883
This is the first time I have
done it in TensorFlow
00:01:05.122 --> 00:01:07.272
and let me just explain that
00:01:07.272 --> 00:01:10.162
Before this, I had been programming this stuff
00:01:10.162 --> 00:01:13.382
in Theano with the
Lasagne layers library on top
00:01:13.382 --> 00:01:19.431
and Theano is a research-based
deep learning framework, out of Montreal
00:01:19.431 --> 00:01:22.683
but what I have concluded
since last summer
00:01:22.683 --> 00:01:26.643
is that TensorFlow is probably the winner
of this framework race
00:01:26.643 --> 00:01:29.434
at least for the foreseeable future
00:01:29.434 --> 00:01:31.861
with all this nice industrial stuff
00:01:31.891 --> 00:01:35.154
I should be retooling into TensorFlow
00:01:35.518 --> 00:01:37.483
That's what I am taking the opportunity
to do for this
00:01:40.951 --> 00:01:43.067
So, about me. Sorry, here we go
00:01:43.067 --> 00:01:45.678
I have come up through finance,
startups and stuff
00:01:45.678 --> 00:01:49.649
I took a year out basically in 2014
just for fun
00:01:49.649 --> 00:01:53.559
I have been doing serious
natural language processing since then
00:02:00.909 --> 00:02:04.629
Basically, here is the overview for this
"something more challenging" talk
00:02:04.629 --> 00:02:08.669
which will probably be 20 to 30 minutes,
depending on how it goes
00:02:08.669 --> 00:02:13.889
I want to take a state-of-the-art
TensorFlow model
00:02:13.889 --> 00:02:16.769
I want to solve a problem that
it wasn't trained for
00:02:16.769 --> 00:02:20.928
And I am going to be using
deep learning as a component
00:02:20.928 --> 00:02:25.960
of my solution rather than the
primary focus of what I am trying to build
00:02:25.960 --> 00:02:32.900
So this is, in a way, more of an industrial
or commercial kind of application
00:02:32.912 --> 00:02:35.190
for what's going on here
00:02:35.190 --> 00:02:38.510
So the goal for this kind of problem is
00:02:38.510 --> 00:02:42.530
I want to distinguish pictures
of classic and modern sports cars
00:02:42.530 --> 00:02:47.051
you will see some pictures of
classic and modern cars a bit later
00:02:48.433 --> 00:02:51.722
It's not that easy to say what
the difference is
00:02:51.722 --> 00:02:55.211
obviously, it could be
different types of images
00:02:55.211 --> 00:02:57.454
and it could be lots of
different classes
00:02:57.454 --> 00:03:00.992
I am just doing a very simple
two-class thing
00:03:00.992 --> 00:03:03.145
but they are complicated images
00:03:03.145 --> 00:03:04.824
what I want to do is
00:03:04.824 --> 00:03:06.114
I want to have a very small training time
00:03:06.114 --> 00:03:08.381
so I don't want to be retraining
some huge network
00:03:08.381 --> 00:03:12.895
In particular, I have only got,
in this case, 20 training examples
00:03:12.895 --> 00:03:18.195
So I am not gonna do any fantastic
million-image training
00:03:18.195 --> 00:03:20.863
I have got 20 images with me
00:03:20.863 --> 00:03:24.705
and I also want to be able to
put this in production
00:03:24.705 --> 00:03:30.118
so I can just run it as a component of
something else
00:03:30.118 --> 00:03:36.395
Basically, one of the things that is
carrying the deep learning world forward
00:03:36.395 --> 00:03:40.196
is an image classification task
called ImageNet
00:03:40.196 --> 00:03:42.406
this has been a competition where
00:03:42.406 --> 00:03:47.407
they have 15 million labeled images
from 22,000 categories
00:03:47.407 --> 00:03:49.858
and you can see some of them here
00:03:49.858 --> 00:03:55.817
if we go for this,
this is a picture of a hotdog in a bun
00:03:55.817 --> 00:03:57.786
and here are some of the categories
00:03:57.786 --> 00:04:02.538
which will be some food, I don't know
00:04:02.538 --> 00:04:06.107
these are hotdogs, lots of
different pictures of hotdogs
00:04:06.107 --> 00:04:09.058
lots of different pictures of cheeseburgers
00:04:09.058 --> 00:04:11.848
lots of different pictures of plates
00:04:11.848 --> 00:04:15.338
so the task for ImageNet is to classify
00:04:15.338 --> 00:04:18.267
for any one of these images
00:04:18.267 --> 00:04:20.447
which of a thousand different
categories it is from
00:04:20.447 --> 00:04:25.328
and it used to be that people could
score adequately well
00:04:25.328 --> 00:04:28.558
and were making incremental changes in
00:04:28.558 --> 00:04:30.558
how well they could do this
00:04:30.558 --> 00:04:32.998
but the deep learning people came along
00:04:32.998 --> 00:04:35.488
and kind of tore this to shreds
00:04:35.488 --> 00:04:40.149
and Google came up with GoogLeNet
00:04:40.149 --> 00:04:43.909
which is what we are actually going to use
here, back in 2014
00:04:43.909 --> 00:04:49.649
suddenly, this stuff is being done
with further iterations
00:04:49.649 --> 00:04:52.808
of this kind of thing,
better than humans can
00:04:52.808 --> 00:04:56.795
So the way you can measure whether
someone is better than humans
00:04:56.795 --> 00:04:59.069
is, you take a human and see
whether the machine beats them
00:04:59.069 --> 00:05:01.560
the question there is
whether there are labeling errors
00:05:01.560 --> 00:05:03.720
for that you need a committee of humans
00:05:03.720 --> 00:05:06.250
so the way they label these things is
00:05:06.250 --> 00:05:08.740
by running it on Mechanical Turk and
00:05:08.740 --> 00:05:12.490
asking people which category this
cheeseburger is in
00:05:14.820 --> 00:05:16.380
The network we are going to use here
00:05:16.380 --> 00:05:23.421
is the 2014 state-of-the-art GoogLeNet,
also called Inception version 1
00:05:23.421 --> 00:05:25.690
The nice thing about this is that
00:05:25.690 --> 00:05:30.942
there is an existing model
already trained for this task
00:05:30.942 --> 00:05:33.772
and it's available for download
it's all free
00:05:33.772 --> 00:05:38.952
and there are lots of different
models out there
00:05:38.952 --> 00:05:41.362
there's a model zoo for TensorFlow
00:05:41.362 --> 00:05:44.351
So, what I have on my machine
00:05:44.351 --> 00:05:48.531
and this is a small model,
it's a 20 megabytes kind of model
00:05:48.531 --> 00:05:50.276
So it is not a very big model
00:05:50.276 --> 00:05:57.291
Inception 4 is a 200 MB kind of model
which is a bit heavy
00:05:57.291 --> 00:05:59.423
I am working here on my laptop
00:05:59.423 --> 00:06:01.212
you are gonna see it working in real-time
00:06:01.212 --> 00:06:07.254
and the trick here is instead of
a softmax layer at the end
00:06:07.254 --> 00:06:12.984
I will show you the diagram, it should be
clear to anyone who's following along
00:06:12.984 --> 00:06:19.082
instead of using the logits to get me
the probabilities
00:06:19.082 --> 00:06:21.133
I am going to strip that away
00:06:21.133 --> 00:06:23.074
and I am going to train
a support vector machine
00:06:23.074 --> 00:06:24.884
to distinguish these classes
00:06:24.884 --> 00:06:29.854
I am not going to retrain the
Inception network at all
00:06:29.854 --> 00:06:32.474
I am going to just use it as a component
00:06:32.474 --> 00:06:34.913
strip off the top classification piece
00:06:34.913 --> 00:06:38.234
and replace it with an SVM
00:06:38.234 --> 00:06:40.384
Now, SVMs are pretty well understood
00:06:40.384 --> 00:06:44.624
here I am just using Inception
as a featurizer for images
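NOTE
A minimal Python sketch of this recipe, not the speaker's exact notebook code: a hypothetical featurize() helper stands in for running one image through the pretrained Inception v1 and returning its 1000 logits, and scikit-learn supplies the SVM; train_images is an assumed list of arrays.
import numpy as np
from sklearn.svm import LinearSVC
def featurize(image):
    # hypothetical helper: run the image through the pretrained Inception v1
    # and return its 1000-dimensional logits vector as the feature vector
    raise NotImplementedError
# 20 training images (assumed list), labels taken from the directory names
X = np.stack([featurize(img) for img in train_images])  # shape (20, 1000)
y = ['classic'] * 10 + ['modern'] * 10
clf = LinearSVC().fit(X, y)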
00:06:44.624 --> 00:06:47.285
So here's a network picture
00:06:47.285 --> 00:06:52.015
Basically, this is what the ImageNet
network is designed for
00:06:52.015 --> 00:06:54.334
you put in an image at the bottom
00:06:54.334 --> 00:06:57.445
there is this black box which is the
Inception network
00:06:57.445 --> 00:07:00.745
which is a bunch of CNNs or
convolutional neural networks
00:07:00.745 --> 00:07:02.596
followed by a dense network
00:07:02.596 --> 00:07:04.846
followed by these logits
00:07:04.846 --> 00:07:07.976
and this logits layer is essentially
the same as the 0 to 9
00:07:07.976 --> 00:07:17.037
that Sam had for his digits, but 1 to 1000
for the different classes of ImageNet
00:07:17.037 --> 00:07:20.418
To actually get the ImageNet output
00:07:20.418 --> 00:07:27.387
it uses a softmax function and
then chooses the highest one of these
00:07:27.387 --> 00:07:28.908
to tell you which class
this is in
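NOTE
Illustratively, in plain numpy with stand-in logits, the softmax-then-argmax step at the head of the network looks like this:
import numpy as np
logits = np.random.randn(1000)         # stand-in for the network's 1000 logits
probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
probs /= probs.sum()                   # now a probability distribution over classes
print(probs.argmax(), probs.max())     # predicted class index and its probability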
00:07:28.908 --> 00:07:32.167
What I am going to do is
I am going to ignore this
00:07:32.167 --> 00:07:35.337
neat piece of classification technology
that they have got
00:07:35.337 --> 00:07:44.148
let's say we use these outputs as inputs
to an SVM and just treat them as features
00:07:44.148 --> 00:07:46.567
Now if we pick out one of these
00:07:46.567 --> 00:07:50.698
this class could be cheeseburger
and this class could be parrot
00:07:50.698 --> 00:07:54.067
and this other class could be Husky dog
00:07:54.067 --> 00:07:57.178
there are all sorts of classes in here
00:07:57.178 --> 00:07:59.709
but basically what I will be doing is
00:07:59.709 --> 00:08:02.248
I will be extracting out the features
of these photos
00:08:02.248 --> 00:08:04.948
saying how much of this photo
is like a parrot
00:08:04.948 --> 00:08:08.938
how much of this is like a Husky dog
00:08:08.938 --> 00:08:13.229
Now it turns out that modern cars and
classic cars can be distinguished that way
00:08:13.229 --> 00:08:18.659
Let me go to some code
00:08:18.659 --> 00:08:20.600
OK, this code is all up on GitHub
00:08:30.950 --> 00:08:34.300
Can everyone see this well enough?
00:08:38.380 --> 00:08:42.230
So basically, I am pulling in TensorFlow
00:08:45.400 --> 00:08:49.251
I pull in this model
00:08:49.251 --> 00:08:52.780
Here is what the Inception architecture is
00:08:52.780 --> 00:08:56.971
It feeds forward this way,
here you put your image
00:08:56.971 --> 00:08:59.901
it goes through lots and lots of
convolutional layers
00:08:59.901 --> 00:09:03.490
all the way up to the end
with softmax and the output
00:09:03.490 --> 00:09:06.922
So having done that, what I will do is
00:09:06.922 --> 00:09:09.741
actually I have a download
for the checkpoint
00:09:09.741 --> 00:09:16.562
this is the checkpoint here which
is a tar file, I have it locally stored
00:09:16.562 --> 00:09:18.500
It doesn't download it now
00:09:18.500 --> 00:09:25.262
but it is all there, even the
big models are up there from Google
00:09:25.262 --> 00:09:27.762
so they have pre-trained these
00:09:27.762 --> 00:09:30.483
so the Inception thing takes about a week
00:09:30.483 --> 00:09:33.792
to train on a bunch of,
it could be, 64 GPUs
00:09:33.792 --> 00:09:36.864
so you don't really want to be
training this thing on your own
00:09:36.864 --> 00:09:40.793
you also need the ImageNet training set
00:09:40.793 --> 00:09:48.384
it is a 140 GB file
which is no fun to download
00:09:50.824 --> 00:09:57.185
what I am doing here is basically
there is also an Inception library
00:09:57.185 --> 00:10:04.043
which is part of TF-Slim;
this thing is designed such that
00:10:04.043 --> 00:10:08.264
it already knows the network
it can preload it
00:10:08.264 --> 00:10:12.290
this has loaded it,
I can get some labels
00:10:12.290 --> 00:10:17.184
This is loading up the ImageNet labels
00:10:17.184 --> 00:10:25.565
I need to know which position
corresponds to which class, like the digits
00:10:31.285 --> 00:10:33.305
Here we are going through
basically the same steps
00:10:33.305 --> 00:10:39.068
as the MNIST example in that
we reset the default graph
00:10:39.068 --> 00:10:44.586
we create a placeholder which is
where my images are going to go
00:10:44.586 --> 00:10:47.575
this is as an input
but from this image input
00:10:47.575 --> 00:10:49.904
I am then going to do some TensorFlow steps
00:10:49.904 --> 00:10:52.286
because TensorFlow
has various preprocessing
00:10:52.286 --> 00:10:55.767
or graphics handling commands
00:10:55.767 --> 00:10:57.747
because a lot of this stuff
works with images
00:10:57.747 --> 00:11:02.547
so there's all sorts of clipping
and rotating stuff
00:11:02.547 --> 00:11:04.778
so it can preprocess these images
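NOTE
For example, a few of TensorFlow's (1.x-era) image ops; the sizes and fractions here are illustrative, not the notebook's exact values:
import tensorflow as tf
image = tf.placeholder(tf.float32, shape=[480, 640, 3])        # an RGB image
cropped = tf.image.central_crop(image, central_fraction=0.875)
resized = tf.image.resize_images(cropped, [224, 224])          # Inception v1's input size
flipped = tf.image.random_flip_left_right(resized)             # a typical augmentation op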
00:11:04.778 --> 00:11:08.485
I am also going to pull out a numpy image
00:11:08.485 --> 00:11:10.828
so I can see what it is actually looking at
00:11:10.828 --> 00:11:14.850
here with this Inception version 1
00:11:14.850 --> 00:11:20.906
I am going to pull in the entire
Inception version 1 model
00:11:23.356 --> 00:11:26.568
My net function, rather than
just picking random weights,
00:11:26.568 --> 00:11:29.978
is gonna be assigned this
from this checkpoint
00:11:29.978 --> 00:11:34.418
when I run the init thing from my graph
00:11:34.418 --> 00:11:37.478
or in my session, it won't initialize
everything from random
00:11:37.478 --> 00:11:39.479
it will initialize everything from disk
00:11:39.479 --> 00:11:42.028
so this will define the model
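NOTE
A hedged sketch of that initialize-from-disk step with TF-Slim (TF 1.x); the checkpoint path is an assumed local file:
import tensorflow as tf
slim = tf.contrib.slim
init_fn = slim.assign_from_checkpoint_fn(
    'inception_v1.ckpt',                      # assumed local checkpoint path
    slim.get_model_variables('InceptionV1'))  # restore these instead of random init
with tf.Session() as sess:
    init_fn(sess)  # weights now come from disk, not a random initializer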
00:11:42.028 --> 00:11:45.358
and now let's proceed
00:11:45.358 --> 00:11:51.609
one of the issues with having this
on a nice TensorFlow graph
00:11:51.609 --> 00:11:56.658
is it just says input, Inception1, output
00:11:56.658 --> 00:11:59.939
so there's a big block there
you can delve into it if you want
00:11:59.939 --> 00:12:05.790
let me just show you
let's go back a bit
00:12:08.320 --> 00:12:11.300
So this is the code
behind the Inception1 model
00:12:11.300 --> 00:12:16.060
so this is actually smaller than
Inception 2 and Inception 3
00:12:16.060 --> 00:12:22.331
basically, we have a kind of a base
Inception piece, just this
00:12:22.331 --> 00:12:24.971
and these are combined together
00:12:24.971 --> 00:12:33.441
and this is a detailed model put together
by many smart people in 2014
00:12:33.441 --> 00:12:35.472
it's got much more complicated since then
00:12:35.472 --> 00:12:38.912
fortunately, they have written the code
and we don't have to
00:12:43.422 --> 00:12:46.321
So here what I am gonna do is
I am gonna load an example image
00:12:46.321 --> 00:12:50.581
just to show you;
one of the things here is
00:12:50.581 --> 00:12:56.396
TensorFlow, in order to be efficient,
wants to do the loading itself
00:12:56.396 --> 00:13:01.344
So in order to get this pumping
information through
00:13:01.344 --> 00:13:03.633
it wants you to set up queues of images
00:13:03.633 --> 00:13:10.263
it will then handle the whole ingestion
process itself
00:13:10.263 --> 00:13:14.153
the problem with that is
it's kind of complicated to do
00:13:14.153 --> 00:13:16.023
in a Jupyter notebook right here
00:13:16.023 --> 00:13:19.133
so here I am going to do
the very simplest thing
00:13:19.133 --> 00:13:22.393
which is load a numpy image
and stuff the numpy image in
00:13:22.393 --> 00:13:24.883
but what TensorFlow would love me to do
00:13:24.883 --> 00:13:29.413
is, as you see in this one,
00:13:29.413 --> 00:13:34.024
create a filename queue, and it will
00:13:34.024 --> 00:13:35.314
then run the queue, do the matching
00:13:35.314 --> 00:13:36.674
and do all of this stuff itself
00:13:36.674 --> 00:13:41.093
because then it can lay it out across
a potentially distributed cluster
00:13:41.093 --> 00:13:43.414
and do everything just right
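NOTE
A sketch of that queue-based pattern in the TF 1.x input-pipeline style; the glob is illustrative and the queue-runner startup is omitted:
import tensorflow as tf
filenames = tf.train.match_filenames_once('cars/classic/*.jpg')  # illustrative glob
filename_queue = tf.train.string_input_producer(filenames)
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)         # TensorFlow does the reading itself
image = tf.image.decode_jpeg(value, channels=3)  # decoded as part of the graph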
00:13:43.414 --> 00:13:50.254
here I just do the simple thing and read the image
00:13:50.254 --> 00:13:59.507
so this image is a tensor
which is 224 by 224 by RGB
00:13:59.507 --> 00:14:03.478
this is a kind of sanity check on
what numbers I got in the corner
00:14:03.478 --> 00:14:05.667
and then what I am gonna do is
00:14:05.667 --> 00:14:08.016
I am going to crop out the
middle section of it
00:14:08.016 --> 00:14:10.761
this happens to be the right size already
00:14:10.761 --> 00:14:13.495
basically if you have got odd shapes
00:14:13.495 --> 00:14:15.136
you need to think about
how am I gonna do it
00:14:15.136 --> 00:14:18.956
am I going to pad it
what do you do
00:14:18.956 --> 00:14:21.947
because in order to make this efficient
00:14:21.947 --> 00:14:29.056
TensorFlow wants to lay it out without
all this variability in image size
00:14:29.056 --> 00:14:34.475
with one set of parameters, and it's then going
to blast it across your GPU
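NOTE
An illustrative numpy helper for the simple center-crop case (assumed, not the speaker's exact code); padding odd shapes is left as the open question raised above:
import numpy as np
def center_crop(img, size=224):
    # crop the middle size-by-size patch out of an RGB image array
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size, :]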
00:14:34.475 --> 00:14:37.865
so let's just run this thing
00:14:37.865 --> 00:14:39.697
so now we have defined the network
00:14:39.697 --> 00:14:45.767
here I am going to create a session
and here I am going to init the session
00:14:45.767 --> 00:14:47.839
it loads the data, and then I am going
00:14:47.839 --> 00:14:52.037
to pick up the numpy image and the
probabilities from the top layer
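NOTE
That step, sketched with assumed tensor and placeholder names carried over from the setup above:
import tensorflow as tf
with tf.Session() as sess:
    init_fn(sess)  # restore the pretrained weights, as in the earlier sketch
    # one run fetches both the cropped numpy image and the class probabilities;
    # numpy_image_op, probabilities_op and input_placeholder are assumed names
    np_image, probs = sess.run([numpy_image_op, probabilities_op],
                               feed_dict={input_placeholder: raw_image})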
00:14:52.037 --> 00:14:54.677
I am just gonna show it
00:14:57.507 --> 00:15:01.366
here is the image,
this is the image I pulled off the disk
00:15:01.366 --> 00:15:06.327
you can see here the probabilities,
the highest probability is Tabby cat
00:15:06.327 --> 00:15:10.487
which is good, it's also interesting that
00:15:10.487 --> 00:15:15.263
the second in line things are Tiger cat,
Egyptian cat, lynx
00:15:15.263 --> 00:15:21.037
so it's got a fair idea that it is a cat
and, in particular, it is getting it right
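NOTE
The top-5 readout is a few lines of numpy; the probabilities and the id-to-name lookup below are stand-ins:
import numpy as np
probs = np.random.dirichlet(np.ones(1000))                  # stand-in for the real output
imagenet_labels = {i: 'class_%d' % i for i in range(1000)}  # stand-in label lookup
for i in probs.argsort()[::-1][:5]:                         # five most probable classes
    print('%.4f  %s' % (probs[i], imagenet_labels[i]))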
00:15:21.037 --> 00:15:26.169
ok so this is the same diagram
we have had before
00:15:26.169 --> 00:15:32.729
what you have seen is this image going into this
black box, coming out and telling us
00:15:32.729 --> 00:15:35.868
the probabilities here, so what we are
now gonna do is
00:15:35.868 --> 00:15:41.910
go from the image into the black box and
just pull out a bunch of features
00:15:50.030 --> 00:15:52.720
let me just show you this on disk
00:16:11.300 --> 00:16:13.304
so I have a cars directory here
00:16:13.957 --> 00:16:17.848
and inside this thing,
00:16:24.238 --> 00:16:25.788
I have surprisingly little data
00:16:36.648 --> 00:16:39.863
In this directory, I just have a
bunch of car images
00:16:39.863 --> 00:16:42.189
and I have two sets of images
00:16:42.189 --> 00:16:47.659
one of which is called classic
and the other is called modern
00:16:47.659 --> 00:16:52.010
so basically I picked some
photos off Flickr
00:16:52.010 --> 00:16:54.439
I put these into two separate directories
00:16:54.439 --> 00:16:56.309
I am going to use those directory names
00:16:56.309 --> 00:17:00.431
as the classification for these images
00:17:00.431 --> 00:17:05.160
In the upper directory here
I got a bunch of test images
00:17:05.160 --> 00:17:06.830
which I don't know the labels for
00:17:12.610 --> 00:17:17.261
this picks out the list of classics; there
is a classic and a modern directory
00:17:17.261 --> 00:17:21.990
I am gonna go through every file
in this directory
00:17:21.990 --> 00:17:28.470
I am gonna crop it, I am gonna find
the logits level which is
00:17:28.470 --> 00:17:33.441
all the classes, and then I am just gonna
add these to the features
00:17:33.441 --> 00:17:36.601
So basically I am gonna do something
like a scikit-learn model
00:17:36.601 --> 00:17:38.311
I am gonna fit an SVM
00:17:38.311 --> 00:17:42.111
so basically, this is featurizing
all these pictures
00:17:47.911 --> 00:17:49.961
so here we go with the training data
00:17:55.571 --> 00:17:56.972
here's some training
00:18:02.272 --> 00:18:05.622
classic cars,
it went through the classic directory
00:18:05.622 --> 00:18:08.782
modern cars,
it went through the modern directory
00:18:15.292 --> 00:18:16.752
it's thinking hard
00:18:18.392 --> 00:18:25.284
what I am gonna do now is
build an SVM over those features
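NOTE
A hedged sketch of this featurize-then-fit loop, reusing the assumed featurize() helper and a hypothetical load_images() loader:
import numpy as np
from sklearn.svm import LinearSVC
features, labels = [], []
for class_name in ['classic', 'modern']:           # the directory names are the labels
    for img in load_images('cars/' + class_name):  # hypothetical loader
        features.append(featurize(img))            # 1000 logits per image
        labels.append(class_name)
clf = LinearSVC().fit(np.array(features), labels)  # fast: 20 rows by 1000 columns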
00:18:31.016 --> 00:18:40.180
[recording jumps to 21:36]
00:21:35.478 --> 00:21:43.839
I restarted this thing
00:21:43.839 --> 00:21:49.619
the actual training for this SVM thing
takes hardly any time,
00:21:49.619 --> 00:21:58.018
this is very quick, essentially 20 images
worth of a thousand features
00:21:58.018 --> 00:22:01.840
so there was no big training loop to do
00:22:01.840 --> 00:22:09.070
then I can run this on the actual models
in the directory, in the test set
00:22:09.070 --> 00:22:12.680
so here this is images that it has never
seen before
00:22:12.680 --> 00:22:16.440
it thinks that this is a modern car
00:22:16.440 --> 00:22:19.020
this one it thinks is a classic car,
this one is classified as modern
00:22:19.020 --> 00:22:26.301
so this is actually doing quite a good job
out of just 10 examples of each
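NOTE
And the test-time step, again with the assumed helpers; each unlabeled image is featurized and handed to the fitted SVM:
for img in load_images('cars'):              # hypothetical loader for the test directory
    print(clf.predict([featurize(img)])[0])  # prints 'classic' or 'modern'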
00:22:26.301 --> 00:22:32.770
it actually thinks this one is modern
it's not a sports car but anyway
00:22:32.770 --> 00:22:38.939
so this is showing that the SVM we trained
00:22:38.939 --> 00:22:42.901
can classify based on the features that
Inception is producing because
00:22:42.901 --> 00:22:47.231
Inception "understands"
what images are about
00:22:47.231 --> 00:22:50.801
so if I go back to here,
code is on GitHub
00:22:50.801 --> 00:22:53.992
conclusions: okay, this thing really works
00:22:53.992 --> 00:22:58.402
we didn't have to train
a deep neural network
00:22:58.402 --> 00:23:01.876
we could plug this TensorFlow model
into an existing pipeline
00:23:01.876 --> 00:23:04.760
and this is actually something where
00:23:04.760 --> 00:23:08.532
the TensorFlow Summit has something
to say about these pipelines
00:23:08.532 --> 00:23:11.013
because not only are they talking
about deep learning
00:23:11.013 --> 00:23:14.753
they are talking about the whole
cloud-based learning
00:23:14.753 --> 00:23:19.453
and setting up proper processes
00:23:19.453 --> 00:23:23.965
I guess it's time for questions, quickly
00:23:23.965 --> 00:23:29.142
we can then do the
TensorFlow Summit wrap-up
00:23:33.212 --> 00:23:37.144
"I am assuming that there is no
backpropagation here"
00:23:37.144 --> 00:23:40.034
This involves no backpropagation
00:23:40.034 --> 00:23:42.504
"End result is a feature"
00:23:45.884 --> 00:23:53.135
I am just assuming that Inception,
you can imagine if the ImageNet thing
00:23:53.135 --> 00:23:56.265
had focused more on products,
it could be even better
00:23:56.265 --> 00:23:58.914
if it focused on man-made things
00:23:58.914 --> 00:24:04.915
The ImageNet training set has an awful
lot of dogs in it, not that many cats
00:24:04.915 --> 00:24:09.426
So, on the other hand it may be that
it has quite a lot of flowers
00:24:09.426 --> 00:24:13.826
or maybe it is saying: I like this car
as a modern car
00:24:13.826 --> 00:24:16.046
because it's got petals for wheels
00:24:16.046 --> 00:24:20.385
whereas the other one, the classic cars
tend to have round things for wheels
00:24:20.385 --> 00:24:25.146
So it is abstractly doing this
00:24:25.146 --> 00:24:29.918
It doesn't know about sports cars or
what they look like
00:24:29.918 --> 00:24:31.587
But it does know about curves
00:24:34.607 --> 00:24:37.527
"So for SVM, you don't use
TensorFlow anymore?"
00:24:37.527 --> 00:24:43.157
No, basically I have used TensorFlow to
create some features
00:24:43.157 --> 00:24:45.308
Now, I don't want to throw it away
00:24:45.308 --> 00:24:47.687
because hopefully I have got
a streaming process where
00:24:47.687 --> 00:24:52.177
more and more images are chugged
through this thing
00:24:52.177 --> 00:25:04.528
[question inaudible]
00:25:07.058 --> 00:25:10.068
There is some example code called
TensorFlow for Poets
00:25:10.068 --> 00:25:13.296
where they actually say that,
let's load up one of these networks
00:25:13.296 --> 00:25:15.369
and then we will do some fine tuning
00:25:15.369 --> 00:25:21.977
there you get involved in tuning
these neurons with some gradient descent
00:25:21.977 --> 00:25:24.819
and you are taking some steps
and all this kind of thing
00:25:24.819 --> 00:25:28.328
maybe you are having broad implications
across the whole network
00:25:28.328 --> 00:25:32.819
which could be good if you have got
tons of data and tons of time
00:25:32.819 --> 00:25:36.948
but this is a very simple way of just
tricking it to get it done
00:25:36.948 --> 00:25:47.382
[comment inaudible]
00:25:47.382 --> 00:25:54.033
it will be a very small network
because an SVM is essentially fairly shallow
00:25:54.033 --> 00:26:06.532
[question inaudible]
00:26:06.532 --> 00:26:13.752
TensorFlow even though it has imported
this large Inception network
00:26:13.752 --> 00:26:20.572
as far as I am concerned,
I am using an f(x) = y and that's it
00:26:20.572 --> 00:26:25.062
but you can inquire what it would say
at this particular level
00:26:25.062 --> 00:26:30.473
and these bunches of levels with various
component points along the way
00:26:30.473 --> 00:26:33.654
I could take out other levels
00:26:33.654 --> 00:26:35.783
I haven't tried it to have a look
00:26:35.783 --> 00:26:40.083
There you get more like pictures
worth of features rather than
00:26:40.083 --> 00:26:43.094
this string of a thousand numbers
00:26:43.094 --> 00:26:48.884
but each intermediate level
will be pictures with CNN kind of features
00:26:48.884 --> 00:26:53.544
on the other hand, if you want
to play around with this thing
00:26:53.544 --> 00:26:57.654
there's this nice stuff called
the DeepDream kind of things
00:26:57.654 --> 00:27:02.559
where they try and match images to
being interesting images
00:27:02.559 --> 00:27:06.454
then you do the featurizing that looks at
different levels
00:27:06.454 --> 00:27:12.415
the highest level is a cat but I want all
local features to be as fishy as possible
00:27:12.415 --> 00:27:15.561
then you get like a fish-faced cat
00:27:15.561 --> 00:27:20.010
that's the kind of thing you can do with
these kinds of features in models