WEBVTT
00:00:00.259 --> 00:00:03.421
What we hope to do with this meetup
00:00:03.851 --> 00:00:10.620
is, given the spread of
the questionnaire results,
00:00:10.620 --> 00:00:12.681
to do something which is partly
00:00:12.681 --> 00:00:15.721
for people who don't know what
deep learning is
00:00:15.721 --> 00:00:17.992
and want an introduction to TensorFlow
00:00:17.992 --> 00:00:20.322
but also something which is more of a
00:00:20.322 --> 00:00:24.021
crowd-pleaser, something
which is more cutting edge
00:00:24.021 --> 00:00:27.081
I am not going to say that this
thing is particularly cutting edge
00:00:27.081 --> 00:00:31.553
because once we saw the responses,
we dialed things down a bit
00:00:31.553 --> 00:00:37.803
But there will be more cutting edge stuff
00:00:37.803 --> 00:00:42.811
and maybe we will start to do other meetup
events in other formats
00:00:42.811 --> 00:00:48.824
So it could be like we have
an experts' paper meeting
00:00:48.824 --> 00:00:52.864
or we could split it, now that we can see
the size of the crowd
00:00:52.864 --> 00:00:57.824
Anyway, let me talk a little bit about
going deeper with transfer learning
00:00:57.824 --> 00:01:00.164
Unfortunately, this is something
some of you
00:01:00.164 --> 00:01:02.503
would have seen me do before
00:01:02.503 --> 00:01:04.883
This is the first time I have
done it in TensorFlow
00:01:05.122 --> 00:01:07.272
and let me just explain that
00:01:07.272 --> 00:01:10.162
Before this, I had been programming this stuff
00:01:10.162 --> 00:01:13.382
in Theano with the
Lasagne layers library on top
00:01:13.382 --> 00:01:19.431
and Theano is a research-based
deep learning framework, out of Montreal
00:01:19.431 --> 00:01:22.683
but what I have concluded
since last summer
00:01:22.683 --> 00:01:26.643
is that TensorFlow is probably the winner
of this framework race
00:01:26.643 --> 00:01:29.434
at least for the foreseeable future
00:01:29.434 --> 00:01:31.861
with all this nice industrial stuff
00:01:31.891 --> 00:01:35.154
I should be retooling into TensorFlow
00:01:35.518 --> 00:01:37.483
That's what I am taking the opportunity
to do for this
00:01:40.951 --> 00:01:43.067
So, about me. Sorry, here we go
00:01:43.067 --> 00:01:45.678
I have come up through finance,
startups and stuff
00:01:45.678 --> 00:01:49.649
I took a year out basically in 2014
just for fun
00:01:49.649 --> 00:01:53.559
I have been doing serious
natural language processing since then
00:02:00.909 --> 00:02:04.629
Basically, here is the overview for this
"something more challenging" talk
00:02:04.629 --> 00:02:08.669
which will probably be 20 to 30 minutes,
depending on how it goes
00:02:08.669 --> 00:02:13.889
I want to take a state-of-the-art
TensorFlow model
00:02:13.889 --> 00:02:16.769
I want to solve a problem that
it wasn't trained for
00:02:16.769 --> 00:02:20.928
And I am going to be using
deep learning as a component
00:02:20.928 --> 00:02:25.960
of my solution rather than the
primary focus of what I am trying to build
00:02:25.960 --> 00:02:32.900
So this is, in a way, more of an industrial
or commercial kind of application
00:02:32.912 --> 00:02:35.190
for what's going on here
00:02:35.190 --> 00:02:38.510
So the goal for this kind of problem is
00:02:38.510 --> 00:02:42.530
I want to distinguish pictures
of classic and modern sports cars
00:02:42.530 --> 00:02:47.051
you will see some pictures of
classic and modern cars a bit later
00:02:48.433 --> 00:02:51.722
It's not that easy to say what
the difference is
00:02:51.722 --> 00:02:55.211
obviously, it could be
different types of images
00:02:55.211 --> 00:02:57.454
and it could be lots of
different classes
00:02:57.454 --> 00:03:00.992
I am just doing a very simple
two-class thing
00:03:00.992 --> 00:03:03.145
but they are complicated images
00:03:03.145 --> 00:03:04.824
what I want to do is
00:03:04.824 --> 00:03:06.114
I want to have a very small training time
00:03:06.114 --> 00:03:08.381
so I don't want to be retraining
some huge network
00:03:08.381 --> 00:03:12.895
In particular, I have only got,
in this case, 20 training examples
00:03:12.895 --> 00:03:18.195
So I am not gonna do any fantastic
million-image training
00:03:18.195 --> 00:03:20.863
I have got 20 images with me
00:03:20.863 --> 00:03:24.705
and I also want to be able to
put this in production
00:03:24.705 --> 00:03:30.118
so I can just run it as a component of
something else
00:03:30.118 --> 00:03:36.395
Basically, one of the things that is
carrying the deep learning world forward
00:03:36.395 --> 00:03:40.196
is an image classification task
called ImageNet
00:03:40.196 --> 00:03:42.406
this has been a competition where
00:03:42.406 --> 00:03:47.407
they have 15 million labeled images
from 22,000 categories
00:03:47.407 --> 00:03:49.858
and you can see some of them here
00:03:49.858 --> 00:03:55.817
if we go for this,
this is a picture of a hotdog in a bun
00:03:55.817 --> 00:03:57.786
and here are some of the categories
00:03:57.786 --> 00:04:02.538
which will be some food, I don't know
00:04:02.538 --> 00:04:06.107
these are hotdogs, lots of
different pictures of hotdogs
00:04:06.107 --> 00:04:09.058
lots of different pictures of cheeseburgers
00:04:09.058 --> 00:04:11.848
lots of different pictures of plates
00:04:11.848 --> 00:04:15.338
so the task for ImageNet is to classify
00:04:15.338 --> 00:04:18.267
for any one of these images
00:04:18.267 --> 00:04:20.447
which of a thousand different
categories it is from
00:04:20.447 --> 00:04:25.328
and it used to be that people could
score adequately well
00:04:25.328 --> 00:04:28.558
and were making incremental changes in
00:04:28.558 --> 00:04:30.558
how well they could do this
00:04:30.558 --> 00:04:32.998
but the deep learning people came along
00:04:32.998 --> 00:04:35.488
and kind of tore this to shreds
00:04:35.488 --> 00:04:40.149
and Google came up with GoogLeNet
00:04:40.149 --> 00:04:43.909
which is what we are actually going to use
here, back in 2014
00:04:43.909 --> 00:04:49.649
suddenly, this stuff is being done
with further iterations
00:04:49.649 --> 00:04:52.808
of this kind of thing,
better than humans can
00:04:52.808 --> 00:04:56.795
So the way you can measure whether
someone is better than humans
00:04:56.795 --> 00:04:59.069
is, you take a human and see
whether the machine beats them
00:04:59.069 --> 00:05:01.560
the question there is
whether there are labeling errors
00:05:01.560 --> 00:05:03.720
for that you need a committee of humans
00:05:03.720 --> 00:05:06.250
so the way they label these things is
00:05:06.250 --> 00:05:08.740
by running it on Mechanical Turk and
00:05:08.740 --> 00:05:12.490
asking people which category this
cheeseburger is in
00:05:14.820 --> 00:05:16.380
The network we are going to use here
00:05:16.380 --> 00:05:23.421
is the 2014 state-of-the-art GoogLeNet,
also called Inception version 1
00:05:23.421 --> 00:05:25.690
The nice thing about this is that
00:05:25.690 --> 00:05:30.942
there is an existing model
already trained for this task
00:05:30.942 --> 00:05:33.772
and it's available for download
it's all free
00:05:33.772 --> 00:05:38.952
and there are lots of different
models out there
00:05:38.952 --> 00:05:41.362
there's a model zoo for TensorFlow
00:05:41.362 --> 00:05:44.351
So, what I have on my machine
00:05:44.351 --> 00:05:48.531
and this is a small model,
it's a 20 megabytes kind of model
00:05:48.531 --> 00:05:50.276
So it is not a very big model
00:05:50.276 --> 00:05:57.291
Inception 4 is a 200 MB kind of model
which is a bit heavy
00:05:57.291 --> 00:05:59.423
I am working here on my laptop
00:05:59.423 --> 00:06:01.212
you are gonna see it working in real-time
00:06:01.212 --> 00:06:07.254
and the trick here is instead of
a softmax layer at the end
00:06:07.254 --> 00:06:12.984
I will show you the diagram, it should be
clear to anyone who's following along
00:06:12.984 --> 00:06:19.082
instead of using the logits to get me
the probabilities
00:06:19.082 --> 00:06:21.133
I am going to strip that away
00:06:21.133 --> 00:06:23.074
and I am going to train
a support vector machine
00:06:23.074 --> 00:06:24.884
to distinguish these classes
00:06:24.884 --> 00:06:29.854
I am not going to retrain the
Inception network at all
00:06:29.854 --> 00:06:32.474
I am going to just use it as a component
00:06:32.474 --> 00:06:34.913
strip off the top classification piece
00:06:34.913 --> 00:06:38.234
and replace it with an SVM
00:06:38.234 --> 00:06:40.384
Now, SVMs are pretty well understood
00:06:40.384 --> 00:06:44.624
here I am just using Inception
as a featurizer for images
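NOTE
A minimal Python sketch of this recipe, not the speaker's exact notebook code: a hypothetical featurize() helper stands in for running one image through the pretrained Inception v1 and returning its 1000 logits, and scikit-learn supplies the SVM; train_images is an assumed list of arrays.
import numpy as np
from sklearn.svm import LinearSVC
def featurize(image):
    # hypothetical helper: run the image through the pretrained Inception v1
    # and return its 1000-dimensional logits vector as the feature vector
    raise NotImplementedError
# 20 training images (assumed list), labels taken from the directory names
X = np.stack([featurize(img) for img in train_images])  # shape (20, 1000)
y = ['classic'] * 10 + ['modern'] * 10
clf = LinearSVC().fit(X, y)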
00:06:44.624 --> 00:06:47.285
So here's a network picture
00:06:47.285 --> 00:06:52.015
Basically, this is what the ImageNet
network is designed for
00:06:52.015 --> 00:06:54.334
you put in an image at the bottom
00:06:54.334 --> 00:06:57.445
there is this black box which is the
Inception network
00:06:57.445 --> 00:07:00.745
which is a bunch of CNNs or
convolutional neural networks
00:07:00.745 --> 00:07:02.596
followed by a dense network
00:07:02.596 --> 00:07:04.846
followed by these logits
00:07:04.846 --> 00:07:07.976
and this logits layer is essentially
the same as the 0 to 9
00:07:07.976 --> 00:07:17.037
that Sam had for his digits, but 1 to 1000
for the different classes of ImageNet
00:07:17.037 --> 00:07:20.418
To actually get the ImageNet output
00:07:20.418 --> 00:07:27.387
it uses a softmax function and
then chooses the highest one of these
00:07:27.387 --> 00:07:28.908
to tell you which class
this is in
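NOTE
Illustratively, in plain numpy with stand-in logits, the softmax-then-argmax step at the head of the network looks like this:
import numpy as np
logits = np.random.randn(1000)         # stand-in for the network's 1000 logits
probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
probs /= probs.sum()                   # now a probability distribution over classes
print(probs.argmax(), probs.max())     # predicted class index and its probability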
00:07:28.908 --> 00:07:32.167
What I am going to do is
I am going to ignore this
00:07:32.167 --> 00:07:35.337
neat piece of classification technology
that they have got
00:07:35.337 --> 00:07:44.148
let's say we use these outputs as inputs
to an SVM and just treat them as features
00:07:44.148 --> 00:07:46.567
Now if we pick out one of these
00:07:46.567 --> 00:07:50.698
this class could be cheeseburger
and this class could be parrot
00:07:50.698 --> 00:07:54.067
and this other class could be Husky dog
00:07:54.067 --> 00:07:57.178
there are all sorts of classes in here
00:07:57.178 --> 00:07:59.709
but basically what I will be doing is
00:07:59.709 --> 00:08:02.248
I will be extracting out the features
of these photos
00:08:02.248 --> 00:08:04.948
saying how much of this photo
is like a parrot
00:08:04.948 --> 00:08:08.938
how much of this is like a Husky dog
00:08:08.938 --> 00:08:13.229
Now it turns out that modern cars and
classic cars can be distinguished that way
00:08:13.229 --> 00:08:18.659
Let me go to some code
00:08:18.659 --> 00:08:20.600
OK, this code is all up on GitHub
00:08:30.950 --> 00:08:34.300
Can everyone see this well enough?
00:08:38.380 --> 00:08:42.230
So basically, I am pulling in TensorFlow
00:08:45.400 --> 00:08:49.251
I pull in this model
00:08:49.251 --> 00:08:52.780
Here is what the Inception architecture is
00:08:52.780 --> 00:08:56.971
It feeds forward this way,
here you put your image
00:08:56.971 --> 00:08:59.901
it goes through lots and lots of
convolutional layers
00:08:59.901 --> 00:09:03.490
all the way up to the end
with softmax and the output
00:09:03.490 --> 00:09:06.922
So having done that, what I will do is
00:09:06.922 --> 00:09:09.741
actually I have a download
for the checkpoint
00:09:09.741 --> 00:09:16.562
this is the checkpoint here which
is a tar file, I have it locally stored
00:09:16.562 --> 00:09:18.500
It doesn't download it now
00:09:18.500 --> 00:09:25.262
but it is all there, even the
big models are up there from Google
00:09:25.262 --> 00:09:27.762
so they have pre-trained these
00:09:27.762 --> 00:09:30.483
so the Inception thing takes about a week
00:09:30.483 --> 00:09:33.792
to train on a bunch of,
it could be, 64 GPUs
00:09:33.792 --> 00:09:36.864
so you don't really want to be
training this thing on your own
00:09:36.864 --> 00:09:40.793
you also need the ImageNet training set
00:09:40.793 --> 00:09:48.384
it is a 140 GB file
which is no fun to download
00:09:50.824 --> 00:09:57.185
what I am doing here is basically
there is also an Inception library
00:09:57.185 --> 00:10:04.043
which is part of TF-Slim;
this thing is designed such that
00:10:04.043 --> 00:10:08.264
it already knows the network
it can preload it
00:10:08.264 --> 00:10:12.290
this has loaded it,
I can get some labels
00:10:12.290 --> 00:10:17.184
This is loading up the ImageNet labels
00:10:17.184 --> 00:10:25.565
I need to know which position
corresponds to which class, like the digits
00:10:31.285 --> 00:10:33.305
Here we are going through
basically the same steps
00:10:33.305 --> 00:10:39.068
as the MNIST example in that
we reset the default graph
00:10:39.068 --> 00:10:44.586
we create a placeholder which is
where my images are going to go
00:10:44.586 --> 00:10:47.575
this is as an input
but from this image input
00:10:47.575 --> 00:10:49.904
I am then going to do some TensorFlow steps
00:10:49.904 --> 00:10:52.286
because TensorFlow
has various preprocessing
00:10:52.286 --> 00:10:55.767
or graphics handling commands
00:10:55.767 --> 00:10:57.747
because a lot of this stuff
works with images
00:10:57.747 --> 00:11:02.547
so there's all sorts of clipping
and rotating stuff
00:11:02.547 --> 00:11:04.778
so it can preprocess these images
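NOTE
For example, a few of TensorFlow's (1.x-era) image ops; the sizes and fractions here are illustrative, not the notebook's exact values:
import tensorflow as tf
image = tf.placeholder(tf.float32, shape=[480, 640, 3])        # an RGB image
cropped = tf.image.central_crop(image, central_fraction=0.875)
resized = tf.image.resize_images(cropped, [224, 224])          # Inception v1's input size
flipped = tf.image.random_flip_left_right(resized)             # a typical augmentation op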
00:11:04.778 --> 00:11:08.485
I am also going to pull out a numpy image
00:11:08.485 --> 00:11:10.828
so I can see what it is actually looking at
00:11:10.828 --> 00:11:14.850
here with this Inception version 1
00:11:14.850 --> 00:11:20.906
I am going to pull in the entire
Inception version 1 model
00:11:23.356 --> 00:11:26.568
My net function, rather than
just picking random weights,
00:11:26.568 --> 00:11:29.978
is gonna be assigned this
from this checkpoint
00:11:29.978 --> 00:11:34.418
when I run the init thing from my graph
00:11:34.418 --> 00:11:37.478
or in my session, it won't initialize
everything from random
00:11:37.478 --> 00:11:39.479
it will initialize everything from disk
00:11:39.479 --> 00:11:42.028
so this will define the model
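NOTE
A hedged sketch of that initialize-from-disk step with TF-Slim (TF 1.x); the checkpoint path is an assumed local file:
import tensorflow as tf
slim = tf.contrib.slim
init_fn = slim.assign_from_checkpoint_fn(
    'inception_v1.ckpt',                      # assumed local checkpoint path
    slim.get_model_variables('InceptionV1'))  # restore these instead of random init
with tf.Session() as sess:
    init_fn(sess)  # weights now come from disk, not a random initializer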
00:11:42.028 --> 00:11:45.358
and now let's proceed
00:11:45.358 --> 00:11:51.609
one of the issues with having this
on a nice TensorFlow graph
00:11:51.609 --> 00:11:56.658
is it just says input, Inception1, output
00:11:56.658 --> 00:11:59.939
so there's a big block there
you can delve into it if you want
00:11:59.939 --> 00:12:05.790
let me just show you
let's go back a bit
00:12:08.320 --> 00:12:11.300
So this is the code
behind the Inception1 model
00:12:11.300 --> 00:12:16.060
so this is actually smaller than
Inception 2 and Inception 3
00:12:16.060 --> 00:12:22.331
basically, we have a kind of a base
Inception piece, just this
00:12:22.331 --> 00:12:24.971
and these are combined together
00:12:24.971 --> 00:12:33.441
and this is a detailed model put together
by many smart people in 2014
00:12:33.441 --> 00:12:35.472
it's got much more complicated since then
00:12:35.472 --> 00:12:38.912
fortunately, they have written the code
and we don't have to
00:12:43.422 --> 00:12:46.321
So here what I am gonna do is
I am gonna load an example image
00:12:46.321 --> 00:12:50.581
just to show you;
one of the things here is
00:12:50.581 --> 00:12:56.396
TensorFlow, in order to be efficient,
wants to do the loading itself
00:12:56.396 --> 00:13:01.344
So in order to get this pumping
information through
00:13:01.344 --> 00:13:03.633
it wants you to set up queues of images
00:13:03.633 --> 00:13:10.263
it will then handle the whole ingestion
process itself
00:13:10.263 --> 00:13:14.153
the problem with that is
it's kind of complicated to do
00:13:14.153 --> 00:13:16.023
in a Jupyter notebook right here
00:13:16.023 --> 00:13:19.133
so here I am going to do
the very simplest thing
00:13:19.133 --> 00:13:22.393
which is load a numpy image
and stuff the numpy image in
00:13:22.393 --> 00:13:24.883
but what TensorFlow would love me to do
00:13:24.883 --> 00:13:29.413
is, as you see in this one,
00:13:29.413 --> 00:13:34.024
create a filename queue, and it will
00:13:34.024 --> 00:13:35.314
then run the queue, do the matching
00:13:35.314 --> 00:13:36.674
and do all of this stuff itself
00:13:36.674 --> 00:13:41.093
because then it can lay it out across
a potentially distributed cluster
00:13:41.093 --> 00:13:43.414
and do everything just right
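NOTE
A sketch of that queue-based pattern in the TF 1.x input-pipeline style; the glob is illustrative and the queue-runner startup is omitted:
import tensorflow as tf
filenames = tf.train.match_filenames_once('cars/classic/*.jpg')  # illustrative glob
filename_queue = tf.train.string_input_producer(filenames)
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)         # TensorFlow does the reading itself
image = tf.image.decode_jpeg(value, channels=3)  # decoded as part of the graph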
00:13:43.414 --> 00:13:50.254
here I just do the simple thing and read the image
00:13:50.254 --> 00:13:59.507
so this image is a tensor
which is 224 by 224 by RGB
00:13:59.507 --> 00:14:03.478
this is a kind of sanity check on
what numbers I got in the corner
00:14:03.478 --> 00:14:05.667
and then what I am gonna do is
00:14:05.667 --> 00:14:08.016
I am going to crop out the
middle section of it
00:14:08.016 --> 00:14:10.761
this happens to be the right size already
00:14:10.761 --> 00:14:13.495
basically if you have got odd shapes
00:14:13.495 --> 00:14:15.136
you need to think about
how am I gonna do it
00:14:15.136 --> 00:14:18.956
am I going to pad it
what do you do
00:14:18.956 --> 00:14:21.947
because in order to make this efficient
00:14:21.947 --> 00:14:29.056
TensorFlow wants to lay it out without
all this variability in image size
00:14:29.056 --> 00:14:34.475
with one set of parameters, and it's then going
to blast it across your GPU
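NOTE
An illustrative numpy helper for the simple center-crop case (assumed, not the speaker's exact code); padding odd shapes is left as the open question raised above:
import numpy as np
def center_crop(img, size=224):
    # crop the middle size-by-size patch out of an RGB image array
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size, :]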
00:14:34.475 --> 00:14:37.865
so let's just run this thing
00:14:37.865 --> 00:14:39.697
so now we have defined the network
00:14:39.697 --> 00:14:45.767
here I am going to create a session
and here I am going to init the session
00:14:45.767 --> 00:14:47.839
it loads the data, and then I am going
00:14:47.839 --> 00:14:52.037
to pick up the numpy image and the
probabilities from the top layer
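NOTE
That step, sketched with assumed tensor and placeholder names carried over from the setup above:
import tensorflow as tf
with tf.Session() as sess:
    init_fn(sess)  # restore the pretrained weights, as in the earlier sketch
    # one run fetches both the cropped numpy image and the class probabilities;
    # numpy_image_op, probabilities_op and input_placeholder are assumed names
    np_image, probs = sess.run([numpy_image_op, probabilities_op],
                               feed_dict={input_placeholder: raw_image})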
00:14:52.037 --> 00:14:54.677
I am just gonna show it
00:14:57.507 --> 00:15:01.366
here is the image,
this is the image I pulled off the disk
00:15:01.366 --> 00:15:06.327
you can see here the probabilities,
the highest probability is Tabby cat
00:15:06.327 --> 00:15:10.487
which is good, it's also interesting that
00:15:10.487 --> 00:15:15.263
the second in line things are Tiger cat,
Egyptian cat, lynx
00:15:15.263 --> 00:15:21.037
so it's got a fair idea that it is a cat
and, in particular, it is getting it right
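NOTE
The top-5 readout is a few lines of numpy; the probabilities and the id-to-name lookup below are stand-ins:
import numpy as np
probs = np.random.dirichlet(np.ones(1000))                  # stand-in for the real output
imagenet_labels = {i: 'class_%d' % i for i in range(1000)}  # stand-in label lookup
for i in probs.argsort()[::-1][:5]:                         # five most probable classes
    print('%.4f  %s' % (probs[i], imagenet_labels[i]))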
00:15:21.037 --> 00:15:26.169
ok so this is the same diagram
we have had before
00:15:26.169 --> 00:15:32.729
what you have seen is this image going into this
black box, coming out and telling us
00:15:32.729 --> 00:15:35.868
the probabilities here, so what we are
now gonna do is
00:15:35.868 --> 00:15:41.910
go from the image into the black box and
just pull out a bunch of features
00:15:50.030 --> 00:15:52.720
let me just show you this on disk
00:16:11.300 --> 00:16:13.304
so I have a cars directory here
00:16:13.957 --> 00:16:17.848
and inside this thing,
00:16:24.238 --> 00:16:25.788
I have surprisingly little data
00:16:36.648 --> 00:16:39.863
In this directory, I just have a
bunch of car images
00:16:39.863 --> 00:16:42.189
and I have two sets of images
00:16:42.189 --> 00:16:47.659
one of which is called classic
and the other is called modern
00:16:47.659 --> 00:16:52.010
so basically I picked some
photos off Flickr
00:16:52.010 --> 00:16:54.439
I put these into two separate directories
00:16:54.439 --> 00:16:56.309
I am going to use those directory names
00:16:56.309 --> 00:17:00.431
as the classification for these images
00:17:00.431 --> 00:17:05.160
In the upper directory here
I got a bunch of test images
00:17:05.160 --> 00:17:06.830
which I don't know the labels for
00:17:12.610 --> 00:17:17.261
this picks out the list of classics; there
is a classic and a modern directory
00:17:17.261 --> 00:17:21.990
I am gonna go through every file
in this directory
00:17:21.990 --> 00:17:28.470
I am gonna crop it, I am gonna find
the logits level which is
00:17:28.470 --> 00:17:33.441
all the classes, and then I am just gonna
add these to the features
00:17:33.441 --> 00:17:36.601
So basically I am gonna do something
like a scikit-learn model
00:17:36.601 --> 00:17:38.311
I am gonna fit an SVM
00:17:38.311 --> 00:17:42.111
so basically, this is featurizing
all these pictures
00:17:47.911 --> 00:17:49.961
so here we go with the training data
00:17:55.571 --> 00:17:56.972
here's some training
00:18:02.272 --> 00:18:05.622
classic cars,
it went through the classic directory
00:18:05.622 --> 00:18:08.782
modern cars,
it went through the modern directory
00:18:15.292 --> 00:18:16.752
it's thinking hard
00:18:18.392 --> 00:18:25.284
what I am gonna do now is
build an SVM over those features
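NOTE
A hedged sketch of this featurize-then-fit loop, reusing the assumed featurize() helper and a hypothetical load_images() loader:
import numpy as np
from sklearn.svm import LinearSVC
features, labels = [], []
for class_name in ['classic', 'modern']:           # the directory names are the labels
    for img in load_images('cars/' + class_name):  # hypothetical loader
        features.append(featurize(img))            # 1000 logits per image
        labels.append(class_name)
clf = LinearSVC().fit(np.array(features), labels)  # fast: 20 rows by 1000 columns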
00:18:31.016 --> 00:18:40.180
[recording jumps to 21:36]
00:21:35.478 --> 00:21:43.839
I restarted this thing
00:21:43.839 --> 00:21:49.619
the actual training for this SVM thing
takes hardly any time,
00:21:49.619 --> 00:21:58.018
this is very quick, essentially 20 images
worth of a thousand features
00:21:58.018 --> 00:22:01.840
so there was no big training loop to do
00:22:01.840 --> 00:22:09.070
then I can run this on the actual models
in the directory, in the test set
00:22:09.070 --> 00:22:12.680
so here this is images that it has never
seen before
00:22:12.680 --> 00:22:16.440
it thinks that this is a modern car
00:22:16.440 --> 00:22:19.020
this one it thinks is a classic car,
this one is classified as modern
00:22:19.020 --> 00:22:26.301
so this is actually doing quite a good job
out of just 10 examples of each
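NOTE
And the test-time step, again with the assumed helpers; each unlabeled image is featurized and handed to the fitted SVM:
for img in load_images('cars'):              # hypothetical loader for the test directory
    print(clf.predict([featurize(img)])[0])  # prints 'classic' or 'modern'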
00:22:26.301 --> 00:22:32.770
it actually thinks this one is modern
it's not a sports car but anyway
00:22:32.770 --> 00:22:38.939
so this is showing that the SVM we trained
00:22:38.939 --> 00:22:42.901
can classify based on the features that
Inception is producing because
00:22:42.901 --> 00:22:47.231
Inception "understands"
what images are about
00:22:47.231 --> 00:22:50.801
so if I go back to here,
code is on GitHub
00:22:50.801 --> 00:22:53.992
conclusions: okay, this thing really works
00:22:53.992 --> 00:22:58.402
we didn't have to train
a deep neural network
00:22:58.402 --> 00:23:01.876
we could plug this TensorFlow model
into an existing pipeline
00:23:01.876 --> 00:23:04.760
and this is actually something where
00:23:04.760 --> 00:23:08.532
the TensorFlow Summit has something
to say about these pipelines
00:23:08.532 --> 00:23:11.013
because not only are they talking
about deep learning
00:23:11.013 --> 00:23:14.753
they are talking about the whole
cloud-based learning
00:23:14.753 --> 00:23:19.453
and setting up proper processes
00:23:19.453 --> 00:23:23.965
I guess it's time for questions, quickly
00:23:23.965 --> 00:23:29.142
we can then do the
TensorFlow Summit wrap-up
00:23:33.212 --> 00:23:37.144
"I am assuming that there is no
backpropagation here"
00:23:37.144 --> 00:23:40.034
This involves no backpropagation
00:23:40.034 --> 00:23:42.504
"End result is a feature"
00:23:45.884 --> 00:23:53.135
I am just assuming that Inception,
you can imagine if the ImageNet thing
00:23:53.135 --> 00:23:56.265
had focused more on products,
it could be even better
00:23:56.265 --> 00:23:58.914
if it focused on man-made things
00:23:58.914 --> 00:24:04.915
The ImageNet training set has an awful
lot of dogs in it, not that many cats
00:24:04.915 --> 00:24:09.426
So, on the other hand it may be that
it has quite a lot of flowers
00:24:09.426 --> 00:24:13.826
or maybe it is saying: I like this car
as a modern car
00:24:13.826 --> 00:24:16.046
because it's got petals for wheels
00:24:16.046 --> 00:24:20.385
whereas the other one, the classic cars
tend to have round things for wheels
00:24:20.385 --> 00:24:25.146
So it is abstractly doing this
00:24:25.146 --> 00:24:29.918
It doesn't know about sports cars or
what they look like
00:24:29.918 --> 00:24:31.587
But it does know about curves
00:24:34.607 --> 00:24:37.527
"So for SVM, you don't use
TensorFlow anymore?"
00:24:37.527 --> 00:24:43.157
No, basically I have used TensorFlow to
create some features
00:24:43.157 --> 00:24:45.308
Now, I don't want to throw it away
00:24:45.308 --> 00:24:47.687
because hopefully I have got
a streaming process where
00:24:47.687 --> 00:24:52.177
more and more images are chugged
through this thing
00:24:52.177 --> 00:25:04.528
[question inaudible]
00:25:07.058 --> 00:25:10.068
There is some example code called
TensorFlow for Poets
00:25:10.068 --> 00:25:13.296
where they actually say that,
let's load up one of these networks
00:25:13.296 --> 00:25:15.369
and then we will do some fine tuning
00:25:15.369 --> 00:25:21.977
there you get involved in tuning
these neurons with some gradient descent
00:25:21.977 --> 00:25:24.819
and you are taking some steps
and all this kind of thing
00:25:24.819 --> 00:25:28.328
maybe you are having broad implications
across the whole network
00:25:28.328 --> 00:25:32.819
which could be good if you have got
tons of data and tons of time
00:25:32.819 --> 00:25:36.948
but this is a very simple way of just
tricking it to get it done
00:25:36.948 --> 00:25:47.382
[comment inaudible]
00:25:47.382 --> 00:25:54.033
it will be a very small network
because an SVM is essentially fairly shallow
00:25:54.033 --> 00:26:06.532
[question inaudible]
00:26:06.532 --> 00:26:13.752
TensorFlow even though it has imported
this large Inception network
00:26:13.752 --> 00:26:20.572
as far as I am concerned,
I am using an f(x) = y and that's it
00:26:20.572 --> 00:26:25.062
but you can inquire what it would say
at this particular level
00:26:25.062 --> 00:26:30.473
and these bunches of levels with various
component points along the way
00:26:30.473 --> 00:26:33.654
I could take out other levels
00:26:33.654 --> 00:26:35.783
I haven't tried it to have a look
00:26:35.783 --> 00:26:40.083
There you get more like pictures
worth of features rather than
00:26:40.083 --> 00:26:43.094
this string of a thousand numbers
00:26:43.094 --> 00:26:48.884
but each intermediate level
will be pictures with CNN kind of features
00:26:48.884 --> 00:26:53.544
on the other hand, if you want
to play around with this thing
00:26:53.544 --> 00:26:57.654
there's this nice stuff called
the DeepDream kind of things
00:26:57.654 --> 00:27:02.559
where they try and match images to
being interesting images
00:27:02.559 --> 00:27:06.454
then you do the featurizing that looks at
different levels
00:27:06.454 --> 00:27:12.415
the highest level is a cat but I want all
local features to be as fishy as possible
00:27:12.415 --> 00:27:15.561
then you get like a fish-faced cat
00:27:15.561 --> 00:27:20.010
that's the kind of thing you can do with
these kinds of features in models