1
00:00:00,259 --> 00:00:03,421
What we hope to do with this meetup
2
00:00:03,851 --> 00:00:10,620
is, given the spread of
the questionnaire results,
3
00:00:10,620 --> 00:00:12,681
to do something which is kind of
4
00:00:12,681 --> 00:00:15,721
for people who don't know what
deep learning is
5
00:00:15,721 --> 00:00:17,992
and want an introduction to TensorFlow
6
00:00:17,992 --> 00:00:20,322
but also something which is more of a
7
00:00:20,322 --> 00:00:24,021
like a crowd pleaser or something
which is more cutting edge
8
00:00:24,021 --> 00:00:27,081
I am not going to say that this
thing is particularly cutting edge
9
00:00:27,081 --> 00:00:31,553
because once we saw the responses,
we dialed things down a bit
10
00:00:31,553 --> 00:00:37,803
But there will be more cutting edge stuff
11
00:00:37,803 --> 00:00:42,811
and maybe we'll start to do other meetup
events in other formats
12
00:00:42,811 --> 00:00:48,824
So it could be like we have
an experts' paper meeting
13
00:00:48,824 --> 00:00:52,864
or we could split it, now that we can
see the size of the crowd
14
00:00:52,864 --> 00:00:57,824
Anyway, let me talk a little bit about
going deeper with transfer learning
15
00:00:57,824 --> 00:01:00,164
Unfortunately, this is something
some of you
16
00:01:00,164 --> 00:01:02,503
would have seen me do before
17
00:01:02,503 --> 00:01:04,883
This is the first time I have
done it in TensorFlow
18
00:01:05,122 --> 00:01:07,272
and let me just explain that
19
00:01:07,272 --> 00:01:10,162
Before, I have been programming this stuff
20
00:01:10,162 --> 00:01:13,382
in Theano with the
Lasagne layers thing on top
21
00:01:13,382 --> 00:01:19,431
and Theano is a research-based
deep learning framework, out of Montreal
22
00:01:19,431 --> 00:01:22,683
but what I have concluded
since last summer
23
00:01:22,683 --> 00:01:26,643
is that TensorFlow is probably the winner
of this framework race
24
00:01:26,643 --> 00:01:29,434
at least for the foreseeable future
25
00:01:29,434 --> 00:01:31,861
with all this nice industrial stuff
26
00:01:31,891 --> 00:01:35,154
I should be retooling into TensorFlow
27
00:01:35,518 --> 00:01:37,483
That's what I am taking the opportunity
to do for this
28
00:01:40,951 --> 00:01:43,067
So, about me, sorry here we go
29
00:01:43,067 --> 00:01:45,678
I have come up through finance,
startups and stuff
30
00:01:45,678 --> 00:01:49,649
I took a year out basically in 2014
just for fun
31
00:01:49,649 --> 00:01:53,559
I have been doing serious kind of
natural language processing since then
32
00:02:00,909 --> 00:02:04,629
Basically, the overview for this
"something more challenging" talk
33
00:02:04,629 --> 00:02:08,669
which will probably be 20 mins, 30 mins
depending on how it goes
34
00:02:08,669 --> 00:02:13,889
I want to take a state-of-the-art
TensorFlow model
35
00:02:13,889 --> 00:02:16,769
I want to solve a problem that
it wasn't trained for
36
00:02:16,769 --> 00:02:20,928
And I am going to be using
deep learning as a component
37
00:02:20,928 --> 00:02:25,960
of my solution rather than the
primary focus of what I am trying to build
38
00:02:25,960 --> 00:02:32,900
So this is, in a way, more of an industrial
or commercial kind of application
39
00:02:32,912 --> 00:02:35,190
for what's going on here
40
00:02:35,190 --> 00:02:38,510
So the goal for this kind of problem is
41
00:02:38,510 --> 00:02:42,530
I want to distinguish pictures
of classic and modern sports cars
42
00:02:42,530 --> 00:02:47,051
you will see some pictures of
classic and modern cars a bit later
43
00:02:48,433 --> 00:02:51,722
It's not that easy to say what
the difference is
44
00:02:51,722 --> 00:02:55,211
obviously, it could be
different types of images
45
00:02:55,211 --> 00:02:57,454
and it could be lots of
different classes
46
00:02:57,454 --> 00:03:00,992
I am just doing a very simple
two class thing
47
00:03:00,992 --> 00:03:03,145
but they're complicated images
48
00:03:03,145 --> 00:03:04,824
what I want to do is
49
00:03:04,824 --> 00:03:06,114
I want to have a very small training time
50
00:03:06,114 --> 00:03:08,381
so I don't want to be retraining
some huge network
51
00:03:08,381 --> 00:03:12,895
Particularly, I have only got
in this case, 20 training examples
52
00:03:12,895 --> 00:03:18,195
So I am not gonna do any fantastic
million image training
53
00:03:18,195 --> 00:03:20,863
I have got 20 images with me
54
00:03:20,863 --> 00:03:24,705
and I also want to be able to
put this in production
55
00:03:24,705 --> 00:03:30,118
so I can just run it as a component of
something else
56
00:03:30,118 --> 00:03:36,395
Basically, one of the things that is
carrying the deep learning world forward
57
00:03:36,395 --> 00:03:40,196
is an image classification task
called ImageNet
58
00:03:40,196 --> 00:03:42,406
this has been a competition where
59
00:03:42,406 --> 00:03:47,407
they have 15 million labeled images
from 22,000 categories
60
00:03:47,407 --> 00:03:49,858
and you can see some of them here
61
00:03:49,858 --> 00:03:55,817
if we go for this,
this is a picture of a hotdog in a bun
62
00:03:55,817 --> 00:03:57,786
and here are some of the categories
63
00:03:57,786 --> 00:04:02,538
which will be some food, I don't know
64
00:04:02,538 --> 00:04:06,107
these are hotdogs, lots of
different pictures of hotdogs
65
00:04:06,107 --> 00:04:09,058
lots of different pictures of cheeseburgers
66
00:04:09,058 --> 00:04:11,848
lots of different pictures of plates
67
00:04:11,848 --> 00:04:15,338
so the task for ImageNet is to classify
68
00:04:15,338 --> 00:04:18,267
for any one of these images
69
00:04:18,267 --> 00:04:20,447
which of a thousand different
categories it is from
70
00:04:20,447 --> 00:04:25,328
and it used to be that people could
score adequately well
71
00:04:25,328 --> 00:04:28,558
and were making incremental changes in
72
00:04:28,558 --> 00:04:30,558
how well they can do this
73
00:04:30,558 --> 00:04:32,998
but the deep learning people came along
74
00:04:32,998 --> 00:04:35,488
and kind of tore this to shreds
75
00:04:35,488 --> 00:04:40,149
and Google came up with GoogLeNet
76
00:04:40,149 --> 00:04:43,909
what we are actually going to use here,
back in 2014
77
00:04:43,909 --> 00:04:49,649
suddenly, this stuff is now being done
with further iterations
78
00:04:49,649 --> 00:04:52,808
of this kind of thing,
better than humans can
79
00:04:52,808 --> 00:04:56,795
So the way you can measure whether
something is better than humans
80
00:04:56,795 --> 00:04:59,069
is you take a human and see
whether the system beats them
81
00:04:59,069 --> 00:05:01,560
the question there is,
are there labeling errors?
82
00:05:01,560 --> 00:05:03,720
so you need a committee of humans
83
00:05:03,720 --> 00:05:06,250
so the way they label these things is
84
00:05:06,250 --> 00:05:08,740
by running it on Mechanical Turk and
85
00:05:08,740 --> 00:05:12,490
asking people what category is this
cheeseburger in
86
00:05:14,820 --> 00:05:16,380
The network we are going to use here
87
00:05:16,380 --> 00:05:23,421
is the 2014 state-of-the-art GoogLeNet,
also called Inception version 1
88
00:05:23,421 --> 00:05:25,690
The nice thing about this is that
89
00:05:25,690 --> 00:05:30,942
there is an existing model
already trained for this task
90
00:05:30,942 --> 00:05:33,772
and it's available for download
it's all free
91
00:05:33,772 --> 00:05:38,952
and there are lots of different
models out there
92
00:05:38,952 --> 00:05:41,362
there's a model zoo for TensorFlow
93
00:05:41,362 --> 00:05:44,351
So, this is what I have on my machine
94
00:05:44,351 --> 00:05:48,531
and it's a small model,
about 20 megabytes
95
00:05:48,531 --> 00:05:50,276
So it is not a very big model
96
00:05:50,276 --> 00:05:57,291
Inception v4 is a 200 MB kind of model
which is a bit heavy
97
00:05:57,291 --> 00:05:59,423
I am working here on my laptop
98
00:05:59,423 --> 00:06:01,212
you are gonna see it working in real-time
99
00:06:01,212 --> 00:06:07,254
and the trick here is instead of
a softmax layer at the end
100
00:06:07,254 --> 00:06:12,984
I will show you the diagram, it should be
clear to anyone who's following along
101
00:06:12,984 --> 00:06:19,082
instead of using the logits to get me
the probabilities
102
00:06:19,082 --> 00:06:21,133
I am going to strip that away
103
00:06:21,133 --> 00:06:23,074
and I am going to train
a support vector machine
104
00:06:23,074 --> 00:06:24,884
to distinguish these classes
105
00:06:24,884 --> 00:06:29,854
I am not going to retrain the
Inception network at all
106
00:06:29,854 --> 00:06:32,474
I am going to just use it as a component
107
00:06:32,474 --> 00:06:34,913
strip off the top classification piece
108
00:06:34,913 --> 00:06:38,234
and replace it with an SVM
109
00:06:38,234 --> 00:06:40,384
Now, SVMs are pretty well understood
110
00:06:40,384 --> 00:06:44,624
here I am just using Inception
as a featurizer for images
111
00:06:44,624 --> 00:06:47,285
So here's a network picture
112
00:06:47,285 --> 00:06:52,015
Basically, this is what the ImageNet
network is designed for
113
00:06:52,015 --> 00:06:54,334
you put in an image at the bottom
114
00:06:54,334 --> 00:06:57,445
there is this black box which is the
Inception network
115
00:06:57,445 --> 00:07:00,745
which is a bunch of CNNs or
convolutional neural networks
116
00:07:00,745 --> 00:07:02,596
followed by a dense network
117
00:07:02,596 --> 00:07:04,846
followed by these logits
118
00:07:04,846 --> 00:07:07,976
and this logits layer is essentially
the same as the 0 to 9
119
00:07:07,976 --> 00:07:17,037
that Sam had for his digits, but 1 to 1000
for the different classes of ImageNet
120
00:07:17,037 --> 00:07:20,418
To actually get the ImageNet output
121
00:07:20,418 --> 00:07:27,387
it uses a softmax function and
then chooses the highest one of these
122
00:07:27,387 --> 00:07:28,908
to tell you which class it's in
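The softmax step described here can be sketched in NumPy; this is a minimal sketch with toy logits standing in for the network's real 1000-class output, not the actual TensorFlow graph op:

```python
import numpy as np

def softmax(logits):
    # shift by the max for numerical stability, then normalize
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

# toy logits standing in for the network's 1000-class output
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
predicted = int(np.argmax(probs))  # index of the highest probability
```

Stripping this layer away means working with `logits` directly instead of `probs`.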
123
00:07:28,908 --> 00:07:32,167
What I am going to do is
I am going to ignore this
124
00:07:32,167 --> 00:07:35,337
neat piece of classification technology
that they have got
125
00:07:35,337 --> 00:07:44,148
let's say we use these outputs as inputs
to an SVM and just treat them as features
126
00:07:44,148 --> 00:07:46,567
Now if we pick out one of these
127
00:07:46,567 --> 00:07:50,698
this class could be cheeseburger
and this class could be parrot
128
00:07:50,698 --> 00:07:54,067
and this other class could be Husky dog
129
00:07:54,067 --> 00:07:57,178
there is all sorts of classes in here
130
00:07:57,178 --> 00:07:59,709
but basically what I will be doing is that
131
00:07:59,709 --> 00:08:02,248
I will be extracting out the features
of these photos
132
00:08:02,248 --> 00:08:04,948
saying how much of this photo
is like a parrot
133
00:08:04,948 --> 00:08:08,938
how much of this is like a Husky dog
134
00:08:08,938 --> 00:08:13,229
Now it turns out that modern cars and
classic cars can be distinguished that way
135
00:08:13,229 --> 00:08:18,659
Let me go to some code
136
00:08:18,659 --> 00:08:20,600
Ok this code is all up on GitHub
137
00:08:30,950 --> 00:08:34,300
Can everyone see this well enough?
138
00:08:38,380 --> 00:08:42,230
So basically, I am pulling in TensorFlow
139
00:08:45,400 --> 00:08:49,251
I pull in this model
140
00:08:49,251 --> 00:08:52,780
Here is what the Inception architecture is
141
00:08:52,780 --> 00:08:56,971
It feeds forward this way,
here you put your image
142
00:08:56,971 --> 00:08:59,901
it goes through lots and lots of
convolutional layers
143
00:08:59,901 --> 00:09:03,490
all the way up to the end
with softmax and the output
144
00:09:03,490 --> 00:09:06,922
So having done that, what I will do is
145
00:09:06,922 --> 00:09:09,741
actually I have a download
for the checkpoint
146
00:09:09,741 --> 00:09:16,562
this is the checkpoint here which
is a tar file, I have it locally stored
147
00:09:16,562 --> 00:09:18,500
It doesn't download it now
148
00:09:18,500 --> 00:09:25,262
but it is all there, even the
big models are there up from Google
149
00:09:25,262 --> 00:09:27,762
so they have pre-trained these
150
00:09:27,762 --> 00:09:30,483
so the Inception thing takes about a week
151
00:09:30,483 --> 00:09:33,792
to train on a cluster of,
it could be 64 GPUs
152
00:09:33,792 --> 00:09:36,864
so you don't really want to be
training this thing on your own
153
00:09:36,864 --> 00:09:40,793
you also need the ImageNet training set
154
00:09:40,793 --> 00:09:48,384
it is a 140 GB file
which is no fun to download
155
00:09:50,824 --> 00:09:57,185
what I am doing here is basically
there is also an Inception library
156
00:09:57,185 --> 00:10:04,043
which is part of TF-Slim;
this thing is designed such that
157
00:10:04,043 --> 00:10:08,264
it already knows the network
it can preload it
158
00:10:08,264 --> 00:10:12,290
this has loaded it,
I can get some labels
159
00:10:12,290 --> 00:10:17,184
This is loading up the ImageNet labels
160
00:10:17,184 --> 00:10:25,565
I need to know which location
corresponds to which class like the digits
161
00:10:31,285 --> 00:10:33,305
Here we are going through
basically the same steps
162
00:10:33,305 --> 00:10:39,068
as the MNIST example in that
we reset the default graph
163
00:10:39,068 --> 00:10:44,586
we create a placeholder which is
where my images are going to go
164
00:10:44,586 --> 00:10:47,575
this is as an input
but from this image input
165
00:10:47,575 --> 00:10:49,904
I am then going to do some TensorFlow steps
166
00:10:49,904 --> 00:10:52,286
because TensorFlow
has various preprocessing
167
00:10:52,286 --> 00:10:55,767
or graphics handling commands
168
00:10:55,767 --> 00:10:57,747
because a lot of this stuff
works with images
169
00:10:57,747 --> 00:11:02,547
so there's all sorts of clipping
and rotating stuff
170
00:11:02,547 --> 00:11:04,778
so it can preprocess these images
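The talk uses TensorFlow's own image ops for this preprocessing; as a hedged illustration only, the central crop applied later can be sketched in plain NumPy:

```python
import numpy as np

def central_crop(image, size=224):
    # take a size x size window from the middle of an H x W x 3 array
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

# a dummy 300 x 400 RGB image stands in for a real photo
img = np.zeros((300, 400, 3), dtype=np.uint8)
crop = central_crop(img)  # shape (224, 224, 3), Inception v1's input size
```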
171
00:11:04,778 --> 00:11:08,485
I am also going to pull out a numpy image
172
00:11:08,485 --> 00:11:10,828
so I can see what it is actually looking at
173
00:11:10,828 --> 00:11:14,850
here with this Inception version 1
174
00:11:14,850 --> 00:11:20,906
I am going to pull in the entire
Inception version 1 model
175
00:11:23,356 --> 00:11:26,568
My net function, rather than
just picking random weights
176
00:11:26,568 --> 00:11:29,978
is gonna be assigned weights
from this checkpoint
177
00:11:29,978 --> 00:11:34,418
when I run the init thing from my graph
178
00:11:34,418 --> 00:11:37,478
or in my session, it won't initialize
everything from random
179
00:11:37,478 --> 00:11:39,479
it will initialize everything from disk
180
00:11:39,479 --> 00:11:42,028
so this will define the model
181
00:11:42,028 --> 00:11:45,358
and now let's proceed
182
00:11:45,358 --> 00:11:51,609
one of the issues with having this
on a nice TensorFlow graph
183
00:11:51,609 --> 00:11:56,658
is it just says input, Inception1, output
184
00:11:56,658 --> 00:11:59,939
so there's a big block there
you can delve into it if you want
185
00:11:59,939 --> 00:12:05,790
let me just show you
let's go back a bit
186
00:12:08,320 --> 00:12:11,300
So this is the code
behind the Inception1 model
187
00:12:11,300 --> 00:12:16,060
so this is actually smaller than the
Inception2 and Inception3
188
00:12:16,060 --> 00:12:22,331
basically, we have a kind of a base
Inception piece, just this
189
00:12:22,331 --> 00:12:24,971
and these are combined together
190
00:12:24,971 --> 00:12:33,441
and this is a detailed model put together
by many smart people in 2014
191
00:12:33,441 --> 00:12:35,472
it's got much more complicated since then
192
00:12:35,472 --> 00:12:38,912
fortunately, they have written the code
and we don't have to
193
00:12:43,422 --> 00:12:46,321
So here what I am gonna do is
I am gonna load an example image
194
00:12:46,321 --> 00:12:50,581
just to show you
one of the the things here is
195
00:12:50,581 --> 00:12:56,396
TensorFlow, in order to be efficient,
wants to do the loading itself
196
00:12:56,396 --> 00:13:01,344
So in order to get this pumping
information through
197
00:13:01,344 --> 00:13:03,633
it wants you to set up queues of images
198
00:13:03,633 --> 00:13:10,263
it will then handle the whole ingestion
process itself
199
00:13:10,263 --> 00:13:14,153
the problem with that is
it's kind of complicated to do
200
00:13:14,153 --> 00:13:16,023
in a Jupyter notebook right here
201
00:13:16,023 --> 00:13:19,133
so here I am going to do
the very simplest thing
202
00:13:19,133 --> 00:13:22,393
which is load a numpy image
and stuff the numpy image in
203
00:13:22,393 --> 00:13:24,883
but what TensorFlow would love me to do
204
00:13:24,883 --> 00:13:29,413
is to create, as you see in this one
205
00:13:29,413 --> 00:13:34,024
create a file name queue and it will
206
00:13:34,024 --> 00:13:35,314
then run the queue, do the matching
207
00:13:35,314 --> 00:13:36,674
and do all of this stuff itself
208
00:13:36,674 --> 00:13:41,093
because then it can lay it out across
a potentially distributed cluster
209
00:13:41,093 --> 00:13:43,414
and do everything just right
210
00:13:43,414 --> 00:13:50,254
here I just do the simple thing and read the image
211
00:13:50,254 --> 00:13:59,507
so this image is a tensor
which is 224 by 224 by RGB
212
00:13:59,507 --> 00:14:03,478
this is a kind of sanity check on
what numbers I have got in the corner
213
00:14:03,478 --> 00:14:05,667
and then what I am gonna do is
214
00:14:05,667 --> 00:14:08,016
I am going to crop out the
middle section of it
215
00:14:08,016 --> 00:14:10,761
this happens to be the right size already
216
00:14:10,761 --> 00:14:13,495
basically, if you have got odd shapes
217
00:14:13,495 --> 00:14:15,136
you need to think about
how you are gonna handle it
218
00:14:15,136 --> 00:14:18,956
are you going to pad it,
what do you do?
219
00:14:18,956 --> 00:14:21,947
because in order to make this efficient
220
00:14:21,947 --> 00:14:29,056
TensorFlow wants to lay it out without
all this variability in image size
221
00:14:29,056 --> 00:14:34,475
one set of parameters and it's then going
to blast it across your GPU
222
00:14:34,475 --> 00:14:37,865
so let's just run this thing
223
00:14:37,865 --> 00:14:39,697
so now we have defined the network
224
00:14:39,697 --> 00:14:45,767
here I am going to create a session
and init the session
225
00:14:45,767 --> 00:14:47,839
it loads the data, and then I am going
226
00:14:47,839 --> 00:14:52,037
to pick up the numpy image and the
probabilities from the top layer
227
00:14:52,037 --> 00:14:54,677
I am just gonna show it
228
00:14:57,507 --> 00:15:01,366
here is the image
this is the image I pulled off the disk
229
00:15:01,366 --> 00:15:06,327
you can see here the probabilities,
the highest probability is Tabby cat
230
00:15:06,327 --> 00:15:10,487
which is good, it's also interesting that
231
00:15:10,487 --> 00:15:15,263
the next in line are Tiger cat,
Egyptian cat, lynx
232
00:15:15,263 --> 00:15:21,037
so it's got a fair idea that it is a cat
in particular, it is getting it right
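Reading off the top few labels from the probability vector, as shown on screen, is just an argsort; the labels and numbers below are made up for illustration, standing in for the 1000 ImageNet classes:

```python
import numpy as np

# hypothetical stand-ins for the ImageNet labels and probabilities
labels = ["tabby cat", "tiger cat", "Egyptian cat", "lynx", "cheeseburger"]
probs = np.array([0.61, 0.18, 0.10, 0.06, 0.05])

# argsort is ascending, so reverse the order and take the first three
top3 = [labels[i] for i in np.argsort(probs)[::-1][:3]]
```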
233
00:15:21,037 --> 00:15:26,169
ok so this is the same diagram
we have had before
234
00:15:26,169 --> 00:15:32,729
what you have seen is the image going into
this black box, coming out and telling us
235
00:15:32,729 --> 00:15:35,868
the probabilities here, so what we are
now gonna do is
236
00:15:35,868 --> 00:15:41,910
from the image to the black box and
just learn a bunch of features
237
00:15:50,030 --> 00:15:52,720
let me just show you this on disk
238
00:16:11,300 --> 00:16:13,304
so I have a cars directory here
239
00:16:13,957 --> 00:16:17,848
and inside this thing,
240
00:16:24,238 --> 00:16:25,788
I have surprisingly little data
241
00:16:36,648 --> 00:16:39,863
In this directory, I just have a
bunch of car images
242
00:16:39,863 --> 00:16:42,189
and I have two sets of images
243
00:16:42,189 --> 00:16:47,659
one of which is called classic
and the other is called modern
244
00:16:47,659 --> 00:16:52,010
so basically I picked some
photos off Flickr
245
00:16:52,010 --> 00:16:54,439
I put these into two separate directories
246
00:16:54,439 --> 00:16:56,309
I am going to use those directory names
247
00:16:56,309 --> 00:17:00,431
as the classification for these images
248
00:17:00,431 --> 00:17:05,160
In the upper directory here
I got a bunch of test images
249
00:17:05,160 --> 00:17:06,830
which I don't know the labels for
250
00:17:12,610 --> 00:17:17,261
this picks out the list of classes; there
is a classic and a modern directory
251
00:17:17,261 --> 00:17:21,990
I am gonna go through every file
in this directory
252
00:17:21,990 --> 00:17:28,470
I am gonna crop it, I am gonna find
the logits level which is
253
00:17:28,470 --> 00:17:33,441
all the classes and then I am just gonna
add these to features
254
00:17:33,441 --> 00:17:36,601
So basically I am gonna do something
like a scikit-learn model
255
00:17:36,601 --> 00:17:38,311
I am gonna fit an SVM
256
00:17:38,311 --> 00:17:42,111
so basically, this is featurizing
all these pictures
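The featurize-then-fit step can be sketched with scikit-learn; the random matrix below is only a stand-in for the 20 x 1000 array of logits that Inception actually produces for the training images:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# stand-in features: one 1000-dim logits vector per training image
X_train = rng.normal(size=(20, 1000))
y_train = np.array([0] * 10 + [1] * 10)  # 0 = classic, 1 = modern

clf = LinearSVC()          # a linear SVM, as in the talk
clf.fit(X_train, y_train)  # fast: only 20 examples, no deep training loop

# predict the class of an unseen feature vector
pred = clf.predict(rng.normal(size=(1, 1000)))
```

This is why the training step in the demo is nearly instant: the SVM only sees 20 feature vectors, never the raw pixels.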
257
00:17:47,911 --> 00:17:49,961
so here we go with the training data
258
00:17:55,571 --> 00:17:56,972
here's some training
259
00:18:02,272 --> 00:18:05,622
classic cars,
it went through the classic directory
260
00:18:05,622 --> 00:18:08,782
modern cars,
it went through the modern directory
261
00:18:15,292 --> 00:18:16,752
it's thinking hard
262
00:18:18,392 --> 00:18:25,284
what I am gonna do now is
build SVM over those features
263
00:18:31,016 --> 00:18:40,180
jump to 21:36
264
00:21:35,478 --> 00:21:43,839
I restarted this thing
265
00:21:43,839 --> 00:21:49,619
the actual training for this SVM thing
takes hardly any time,
266
00:21:49,619 --> 00:21:58,018
this is very quick, essentially 20 images
worth of a thousand features
267
00:21:58,018 --> 00:22:01,840
so there was no big training loop to do
268
00:22:01,840 --> 00:22:09,070
then I can run this on the actual images
in the test directory, the test set
269
00:22:09,070 --> 00:22:12,680
so here this is images that it has never
seen before
270
00:22:12,680 --> 00:22:16,440
it thinks that this is a modern car
271
00:22:16,440 --> 00:22:19,020
this one it thinks is a classic car,
this one is classified as modern
272
00:22:19,020 --> 00:22:26,301
so this is actually doing quite a good job
out of just 10 examples of each
273
00:22:26,301 --> 00:22:32,770
it actually thinks this one is modern
it's not a sports car but anyway
274
00:22:32,770 --> 00:22:38,939
so this is showing that the SVM we trained
275
00:22:38,939 --> 00:22:42,901
can classify based on the features that
Inception is producing because
276
00:22:42,901 --> 00:22:47,231
Inception "understands"
what images are about
277
00:22:47,231 --> 00:22:50,801
so if I go back to here,
code is on GitHub
278
00:22:50,801 --> 00:22:53,992
conclusions: okay, this thing really works
279
00:22:53,992 --> 00:22:58,402
we didn't have to train
a deep neural network
280
00:22:58,402 --> 00:23:01,876
we could plug this TensorFlow model
into an existing pipeline
281
00:23:01,876 --> 00:23:04,760
and this is actually something where
282
00:23:04,760 --> 00:23:08,532
the TensorFlow Summit has something
to say about these pipelines
283
00:23:08,532 --> 00:23:11,013
because not only are they talking
about deep learning
284
00:23:11,013 --> 00:23:14,753
they are talking about the whole
cloud-based learning
285
00:23:14,753 --> 00:23:19,453
and setting up proper processes
286
00:23:19,453 --> 00:23:23,965
I guess, time for questions quickly
287
00:23:23,965 --> 00:23:29,142
we can then do the
TensorFlow Summit wrap-up
288
00:23:33,212 --> 00:23:37,144
"I am assuming that there is no
backpropagation here"
289
00:23:37,144 --> 00:23:40,034
This involves no backpropagation
290
00:23:40,034 --> 00:23:42,504
"End result is a feature"
291
00:23:45,884 --> 00:23:53,135
I am just assuming that Inception,
you can imagine if the ImageNet thing
292
00:23:53,135 --> 00:23:56,265
had focused more on products,
it could be even better
293
00:23:56,265 --> 00:23:58,914
if it focused on man-made things
294
00:23:58,914 --> 00:24:04,915
The ImageNet training set has an awful
lot of dogs in it, not that many cats
295
00:24:04,915 --> 00:24:09,426
So, on the other hand it may be that
it has quite a lot of flowers
296
00:24:09,426 --> 00:24:13,826
or maybe it is saying, I classify this car
as a modern car
297
00:24:13,826 --> 00:24:16,046
because it's got petals for wheels
298
00:24:16,046 --> 00:24:20,385
whereas the other one, the classic cars
tend to have round things for wheels
299
00:24:20,385 --> 00:24:25,146
So it is abstractly doing this
300
00:24:25,146 --> 00:24:29,918
It doesn't know about sports cars or
what they look like
301
00:24:29,918 --> 00:24:31,587
But it does know about curves
302
00:24:34,607 --> 00:24:37,527
"So for SVM, you don't use
TensorFlow anymore ?"
303
00:24:37,527 --> 00:24:43,157
No, basically I have used TensorFlow to
create some features
304
00:24:43,157 --> 00:24:45,308
Now, I don't want to throw it away
305
00:24:45,308 --> 00:24:47,687
because hopefully I have got
a streaming process where
306
00:24:47,687 --> 00:24:52,177
more and more images are chucked
through this thing
307
00:24:52,177 --> 00:25:04,528
[could not hear the question properly]
308
00:25:07,058 --> 00:25:10,068
There is example code called
TensorFlow for Poets
309
00:25:10,068 --> 00:25:13,296
where they actually say that,
let's load up one of these networks
310
00:25:13,296 --> 00:25:15,369
and then we will do some fine tuning
311
00:25:15,369 --> 00:25:21,977
there you get involved in tuning
these neurons with some gradient descent
312
00:25:21,977 --> 00:25:24,819
and you are taking some steps
and all this kind of thing
313
00:25:24,819 --> 00:25:28,328
maybe that has broad implications
across the whole network
314
00:25:28,328 --> 00:25:32,819
which could be good if you have got
tons of data and tons of time
315
00:25:32,819 --> 00:25:36,948
but this is a very simple way of just
tricking it to get it done
316
00:25:36,948 --> 00:25:47,382
[could not hear the comment properly]
317
00:25:47,382 --> 00:25:54,033
it will be a very small network
because SVM is essentially fairly shallow
318
00:25:54,033 --> 00:26:06,532
[could not hear the question]
319
00:26:06,532 --> 00:26:13,752
TensorFlow even though it has imported
this large Inception network
320
00:26:13,752 --> 00:26:20,572
as far as I am concerned,
I am using it as f(x) = y and that's it
321
00:26:20,572 --> 00:26:25,062
but you can inquire what would it say
at this particular level
322
00:26:25,062 --> 00:26:30,473
and there are bunches of levels with
various points to tap along the way
323
00:26:30,473 --> 00:26:33,654
I could take out other levels
324
00:26:33,654 --> 00:26:35,783
I haven't tried it to have a look
325
00:26:35,783 --> 00:26:40,083
There you get more picture-like
features rather than
326
00:26:40,083 --> 00:26:43,094
this string of a 1000 numbers
327
00:26:43,094 --> 00:26:48,884
but each intermediate level
will be pictures with CNN kinds of features
328
00:26:48,884 --> 00:26:53,544
on the other hand, if you want
to play around with this thing
329
00:26:53,544 --> 00:26:57,654
there's this nice stuff called
the DeepDream kind of things
330
00:26:57,654 --> 00:27:02,559
where they try and match images to
being interesting images
331
00:27:02,559 --> 00:27:06,454
then you do the featurizing that looks at
different levels
332
00:27:06,454 --> 00:27:12,415
the highest level is a cat but I want all
local features to be as fishy as possible
333
00:27:12,415 --> 00:27:15,561
then you get like a fish-faced cat
334
00:27:15,561 --> 00:27:20,010
that's the kind of thing you can do with
these kinds of features in models