1
00:00:00,259 --> 00:00:03,421
What we hope to do with this meetup
2
00:00:03,851 --> 00:00:10,620
is, given the spread of
the questionnaire results,
3
00:00:10,620 --> 00:00:12,681
to do something which is kind of
4
00:00:12,681 --> 00:00:15,721
for people who don't know what
deep learning is
5
00:00:15,721 --> 00:00:17,992
and want an introduction to TensorFlow
6
00:00:17,992 --> 00:00:20,322
but also something which is more of a
7
00:00:20,322 --> 00:00:24,021
like a crowd pleaser or something
which is more cutting edge
8
00:00:24,021 --> 00:00:27,081
I am not going to say that this
thing is particularly cutting edge
9
00:00:27,081 --> 00:00:31,553
because once we saw the responses,
we dialed things down a bit
10
00:00:31,553 --> 00:00:37,803
But there will be more cutting edge stuff
11
00:00:37,803 --> 00:00:42,811
and maybe we'll start to do other meetup
events in other formats
12
00:00:42,811 --> 00:00:48,824
So it could be like we have
an experts' paper meeting
13
00:00:48,824 --> 00:00:52,864
or we could split it, now that we can
see the size of the crowd
14
00:00:52,864 --> 00:00:57,824
Anyway, let me talk a little bit about
going deeper with transfer learning
15
00:00:57,824 --> 00:01:00,164
Unfortunately, this is something
some of you
16
00:01:00,164 --> 00:01:02,503
would have seen me do before
17
00:01:02,503 --> 00:01:04,883
This is the first time I have
done it in TensorFlow
18
00:01:05,122 --> 00:01:07,272
and let me just explain that
19
00:01:07,272 --> 00:01:10,162
Before, I have been programming this stuff
20
00:01:10,162 --> 00:01:13,382
in Theano with the
Lasagne layers thing on top
21
00:01:13,382 --> 00:01:19,431
and Theano is a research-based
deep learning framework, out of Montreal
22
00:01:19,431 --> 00:01:22,683
but what I have concluded
since last summer
23
00:01:22,683 --> 00:01:26,643
is that TensorFlow is probably the winner
of this framework race
24
00:01:26,643 --> 00:01:29,434
at least for the foreseeable future
25
00:01:29,434 --> 00:01:31,861
with all this nice industrial stuff
26
00:01:31,891 --> 00:01:35,154
I should be retooling into TensorFlow
27
00:01:35,518 --> 00:01:37,483
That's what I am taking the opportunity
to do for this
28
00:01:40,951 --> 00:01:43,067
So, about me, sorry here we go
29
00:01:43,067 --> 00:01:45,678
I have come up through finance,
startups and stuff
30
00:01:45,678 --> 00:01:49,649
I took a year out basically in 2014
just for fun
31
00:01:49,649 --> 00:01:53,559
I have been doing serious kind of
natural language processing since then
32
00:02:00,909 --> 00:02:04,629
Basically, the overview for this
"something more challenging" talk
33
00:02:04,629 --> 00:02:08,669
which will probably be 20 mins, 30 mins
depending on how it goes
34
00:02:08,669 --> 00:02:13,889
I want to take a state-of-the-art
TensorFlow model
35
00:02:13,889 --> 00:02:16,769
I want to solve a problem that
it wasn't trained for
36
00:02:16,769 --> 00:02:20,928
And I am going to be using
deep learning as a component
37
00:02:20,928 --> 00:02:25,960
of my solution rather than the
primary focus of what I am trying to build
38
00:02:25,960 --> 00:02:32,900
So this is, in a way, more of an industrial
or commercial kind of application
39
00:02:32,912 --> 00:02:35,190
for what's going on here
40
00:02:35,190 --> 00:02:38,510
So the goal for this kind of problem is
41
00:02:38,510 --> 00:02:42,530
I want to distinguish pictures
of classic and modern sports cars
42
00:02:42,530 --> 00:02:47,051
you will see some pictures of
classic and modern cars a bit later
43
00:02:48,433 --> 00:02:51,722
It's not that easy to say what
the difference is
44
00:02:51,722 --> 00:02:55,211
obviously, it could be
different types of images
45
00:02:55,211 --> 00:02:57,454
and it could be lots of
different classes
46
00:02:57,454 --> 00:03:00,992
I am just doing a very simple
two class thing
47
00:03:00,992 --> 00:03:03,145
but they're complicated images
48
00:03:03,145 --> 00:03:04,824
what I want to do is
49
00:03:04,824 --> 00:03:06,114
I want to have a very small training time
50
00:03:06,114 --> 00:03:08,381
so I don't want to be retraining
some huge network
51
00:03:08,381 --> 00:03:12,895
Particularly, I have only got
in this case, 20 training examples
52
00:03:12,895 --> 00:03:18,195
So I am not gonna do any fantastic
million image training
53
00:03:18,195 --> 00:03:20,863
I have got 20 images with me
54
00:03:20,863 --> 00:03:24,705
and I also want to be able to
put this in production
55
00:03:24,705 --> 00:03:30,118
so I can just run it as a component of
something else
56
00:03:30,118 --> 00:03:36,395
Basically, one of the things that is
carrying the deep learning world forward
57
00:03:36,395 --> 00:03:40,196
is an image classification task
called ImageNet
58
00:03:40,196 --> 00:03:42,406
this has been a competition where
59
00:03:42,406 --> 00:03:47,407
they have 15 million labeled images
from 22,000 categories
60
00:03:47,407 --> 00:03:49,858
and you can see some of them here
61
00:03:49,858 --> 00:03:55,817
if we go for this,
this is a picture of a hotdog in a bun
62
00:03:55,817 --> 00:03:57,786
and here are some of the categories
63
00:03:57,786 --> 00:04:02,538
which will be some food, I don't know
64
00:04:02,538 --> 00:04:06,107
these are hotdogs, lots of
different pictures of hotdogs
65
00:04:06,107 --> 00:04:09,058
lots of different pictures of cheeseburgers
66
00:04:09,058 --> 00:04:11,848
lots of different pictures of plates
67
00:04:11,848 --> 00:04:15,338
so the task for ImageNet is to classify
68
00:04:15,338 --> 00:04:18,267
for any one of these images
69
00:04:18,267 --> 00:04:20,447
which of a thousand different
categories it is from
70
00:04:20,447 --> 00:04:25,328
and it used to be that people could
score adequately well
71
00:04:25,328 --> 00:04:28,558
and were making incremental changes in
72
00:04:28,558 --> 00:04:30,558
how well they can do this
73
00:04:30,558 --> 00:04:32,998
but the deep learning people came along
74
00:04:32,998 --> 00:04:35,488
and kind of tore this to shreds
75
00:04:35,488 --> 00:04:40,149
and Google came up with GoogLeNet
76
00:04:40,149 --> 00:04:43,909
what we are actually going to use here,
back in 2014
77
00:04:43,909 --> 00:04:49,649
suddenly, this stuff is now being done
with further iterations
78
00:04:49,649 --> 00:04:52,808
of this kind of thing,
better than humans can
79
00:04:52,808 --> 00:04:56,795
So the way you can measure whether
something is better than humans
80
00:04:56,795 --> 00:04:59,069
is you take a human and see
whether the system beats them
81
00:04:59,069 --> 00:05:01,560
the question there is,
are there labeling errors?
82
00:05:01,560 --> 00:05:03,720
so you need a committee of humans
83
00:05:03,720 --> 00:05:06,250
so the way they label these things is
84
00:05:06,250 --> 00:05:08,740
by running it on Mechanical Turk and
85
00:05:08,740 --> 00:05:12,490
asking people what category is this
cheeseburger in
86
00:05:14,820 --> 00:05:16,380
The network we are going to use here
87
00:05:16,380 --> 00:05:23,421
is the 2014 state-of-the-art GoogLeNet,
also called Inception version 1
88
00:05:23,421 --> 00:05:25,690
The nice thing about this is that
89
00:05:25,690 --> 00:05:30,942
there is an existing model
already trained for this task
90
00:05:30,942 --> 00:05:33,772
and it's available for download
it's all free
91
00:05:33,772 --> 00:05:38,952
and there are lots of different
models out there
92
00:05:38,952 --> 00:05:41,362
there's a model zoo for TensorFlow
93
00:05:41,362 --> 00:05:44,351
So, this is what I have on my machine
94
00:05:44,351 --> 00:05:48,531
and it's a small model,
about 20 megabytes
95
00:05:48,531 --> 00:05:50,276
So it is not a very big model
96
00:05:50,276 --> 00:05:57,291
Inception v4 is a 200 MB kind of model
which is a bit heavy
97
00:05:57,291 --> 00:05:59,423
I am working here on my laptop
98
00:05:59,423 --> 00:06:01,212
you are gonna see it working in real-time
99
00:06:01,212 --> 00:06:07,254
and the trick here is instead of
a softmax layer at the end
100
00:06:07,254 --> 00:06:12,984
I will show you the diagram, it should be
clear to anyone who's following along
101
00:06:12,984 --> 00:06:19,082
instead of using the logits to get me
the probabilities
102
00:06:19,082 --> 00:06:21,133
I am going to strip that away
103
00:06:21,133 --> 00:06:23,074
and I am going to train
a support vector machine
104
00:06:23,074 --> 00:06:24,884
to distinguish these classes
105
00:06:24,884 --> 00:06:29,854
I am not going to retrain the
Inception network at all
106
00:06:29,854 --> 00:06:32,474
I am going to just use it as a component
107
00:06:32,474 --> 00:06:34,913
strip off the top classification piece
108
00:06:34,913 --> 00:06:38,234
and replace it with an SVM
109
00:06:38,234 --> 00:06:40,384
Now, SVMs are pretty well understood
110
00:06:40,384 --> 00:06:44,624
here I am just using Inception
as a featurizer for images
111
00:06:44,624 --> 00:06:47,285
So here's a network picture
112
00:06:47,285 --> 00:06:52,015
Basically, this is what the ImageNet
network is designed for
113
00:06:52,015 --> 00:06:54,334
you put in an image at the bottom
114
00:06:54,334 --> 00:06:57,445
there is this black box which is the
Inception network
115
00:06:57,445 --> 00:07:00,745
which is a bunch of CNNs or
convolutional neural networks
116
00:07:00,745 --> 00:07:02,596
followed by a dense network
117
00:07:02,596 --> 00:07:04,846
followed by these logits
118
00:07:04,846 --> 00:07:07,976
and this logits layer is essentially
the same as the 0 to 9
119
00:07:07,976 --> 00:07:17,037
that Sam had for his digits, but 1 to 1000
for the different classes of ImageNet
120
00:07:17,037 --> 00:07:20,418
To actually get the ImageNet output
121
00:07:20,418 --> 00:07:27,387
it uses a softmax function and
then chooses the highest one of these
122
00:07:27,387 --> 00:07:28,908
to tell you which class it's in
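The softmax step described here can be sketched in NumPy; this is a minimal sketch with toy logits standing in for the network's real 1000-class output, not the actual TensorFlow graph op:

```python
import numpy as np

def softmax(logits):
    # shift by the max for numerical stability, then normalize
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

# toy logits standing in for the network's 1000-class output
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
predicted = int(np.argmax(probs))  # index of the highest probability
```

Stripping this layer away means working with `logits` directly instead of `probs`.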
123
00:07:28,908 --> 00:07:32,167
What I am going to do is
I am going to ignore this
124
00:07:32,167 --> 00:07:35,337
neat piece of classification technology
that they have got
125
00:07:35,337 --> 00:07:44,148
let's say we use these outputs as inputs
to an SVM and just treat them as features
126
00:07:44,148 --> 00:07:46,567
Now if we pick out one of these
127
00:07:46,567 --> 00:07:50,698
this class could be cheeseburger
and this class could be parrot
128
00:07:50,698 --> 00:07:54,067
and this other class could be Husky dog
129
00:07:54,067 --> 00:07:57,178
there is all sorts of classes in here
130
00:07:57,178 --> 00:07:59,709
but basically what I will be doing is that
131
00:07:59,709 --> 00:08:02,248
I will be extracting out the features
of these photos
132
00:08:02,248 --> 00:08:04,948
saying how much of this photo
is like a parrot
133
00:08:04,948 --> 00:08:08,938
how much of this is like a Husky dog
134
00:08:08,938 --> 00:08:13,229
Now it turns out that modern cars and
classic cars can be distinguished that way
135
00:08:13,229 --> 00:08:18,659
Let me go to some code
136
00:08:18,659 --> 00:08:20,600
Ok this code is all up on GitHub
137
00:08:30,950 --> 00:08:34,300
Can everyone see this well enough?
138
00:08:38,380 --> 00:08:42,230
So basically, I am pulling in TensorFlow
139
00:08:45,400 --> 00:08:49,251
I pull in this model
140
00:08:49,251 --> 00:08:52,780
Here is what the Inception architecture is
141
00:08:52,780 --> 00:08:56,971
It feeds forward this way,
here you put your image
142
00:08:56,971 --> 00:08:59,901
it goes through lots and lots of
convolutional layers
143
00:08:59,901 --> 00:09:03,490
all the way up to the end
with softmax and the output
144
00:09:03,490 --> 00:09:06,922
So having done that, what I will do is
145
00:09:06,922 --> 00:09:09,741
actually I have a download
for the checkpoint
146
00:09:09,741 --> 00:09:16,562
this is the checkpoint here which
is a tar file, I have it locally stored
147
00:09:16,562 --> 00:09:18,500
It doesn't download it now
148
00:09:18,500 --> 00:09:25,262
but it is all there, even the
big models are there up from Google
149
00:09:25,262 --> 00:09:27,762
so they have pre-trained these
150
00:09:27,762 --> 00:09:30,483
so the Inception thing takes about a week
151
00:09:30,483 --> 00:09:33,792
to train on a cluster of,
it could be 64 GPUs
152
00:09:33,792 --> 00:09:36,864
so you don't really want to be
training this thing on your own
153
00:09:36,864 --> 00:09:40,793
you also need the ImageNet training set
154
00:09:40,793 --> 00:09:48,384
it is a 140 GB file
which is no fun to download
155
00:09:50,824 --> 00:09:57,185
what I am doing here is basically
there is also an Inception library
156
00:09:57,185 --> 00:10:04,043
which is part of TF-Slim;
this thing is designed such that
157
00:10:04,043 --> 00:10:08,264
it already knows the network
it can preload it
158
00:10:08,264 --> 00:10:12,290
this has loaded it,
I can get some labels
159
00:10:12,290 --> 00:10:17,184
This is loading up the ImageNet labels
160
00:10:17,184 --> 00:10:25,565
I need to know which location
corresponds to which class like the digits
161
00:10:31,285 --> 00:10:33,305
Here we are going through
basically the same steps
162
00:10:33,305 --> 00:10:39,068
as the MNIST example in that
we reset the default graph
163
00:10:39,068 --> 00:10:44,586
we create a placeholder which is
where my images are going to go
164
00:10:44,586 --> 00:10:47,575
this is as an input
but from this image input
165
00:10:47,575 --> 00:10:49,904
I am then going to do some TensorFlow steps
166
00:10:49,904 --> 00:10:52,286
because TensorFlow
has various preprocessing
167
00:10:52,286 --> 00:10:55,767
or graphics handling commands
168
00:10:55,767 --> 00:10:57,747
because a lot of this stuff
works with images
169
00:10:57,747 --> 00:11:02,547
so there's all sorts of clipping
and rotating stuff
170
00:11:02,547 --> 00:11:04,778
so it can preprocess these images
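The talk uses TensorFlow's own image ops for this preprocessing; as a hedged illustration only, the central crop applied later can be sketched in plain NumPy:

```python
import numpy as np

def central_crop(image, size=224):
    # take a size x size window from the middle of an H x W x 3 array
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

# a dummy 300 x 400 RGB image stands in for a real photo
img = np.zeros((300, 400, 3), dtype=np.uint8)
crop = central_crop(img)  # shape (224, 224, 3), Inception v1's input size
```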
171
00:11:04,778 --> 00:11:08,485
I am also going to pull out a numpy image
172
00:11:08,485 --> 00:11:10,828
so I can see what it is actually looking at
173
00:11:10,828 --> 00:11:14,850
here with this Inception version 1
174
00:11:14,850 --> 00:11:20,906
I am going to pull in the entire
Inception version 1 model
175
00:11:23,356 --> 00:11:26,568
My net function, rather than
just picking random weights
176
00:11:26,568 --> 00:11:29,978
is gonna be assigned weights
from this checkpoint
177
00:11:29,978 --> 00:11:34,418
when I run the init thing from my graph
178
00:11:34,418 --> 00:11:37,478
or in my session, it won't initialize
everything from random
179
00:11:37,478 --> 00:11:39,479
it will initialize everything from disk
180
00:11:39,479 --> 00:11:42,028
so this will define the model
181
00:11:42,028 --> 00:11:45,358
and now let's proceed
182
00:11:45,358 --> 00:11:51,609
one of the issues with having this
on a nice TensorFlow graph
183
00:11:51,609 --> 00:11:56,658
is it just says input, Inception1, output
184
00:11:56,658 --> 00:11:59,939
so there's a big block there
you can delve into it if you want
185
00:11:59,939 --> 00:12:05,790
let me just show you
let's go back a bit
186
00:12:08,320 --> 00:12:11,300
So this is the code
behind the Inception1 model
187
00:12:11,300 --> 00:12:16,060
so this is actually smaller than the
Inception2 and Inception3
188
00:12:16,060 --> 00:12:22,331
basically, we have a kind of a base
Inception piece, just this
189
00:12:22,331 --> 00:12:24,971
and these are combined together
190
00:12:24,971 --> 00:12:33,441
and this is a detailed model put together
by many smart people in 2014
191
00:12:33,441 --> 00:12:35,472
it's got much more complicated since then
192
00:12:35,472 --> 00:12:38,912
fortunately, they have written the code
and we don't have to
193
00:12:43,422 --> 00:12:46,321
So here what I am gonna do is
I am gonna load an example image
194
00:12:46,321 --> 00:12:50,581
just to show you
one of the the things here is
195
00:12:50,581 --> 00:12:56,396
TensorFlow, in order to be efficient,
wants to do the loading itself
196
00:12:56,396 --> 00:13:01,344
So in order to get this pumping
information through
197
00:13:01,344 --> 00:13:03,633
it wants you to set up queues of images
198
00:13:03,633 --> 00:13:10,263
it will then handle the whole ingestion
process itself
199
00:13:10,263 --> 00:13:14,153
the problem with that is
it's kind of complicated to do
200
00:13:14,153 --> 00:13:16,023
in a Jupyter notebook right here
201
00:13:16,023 --> 00:13:19,133
so here I am going to do
the very simplest thing
202
00:13:19,133 --> 00:13:22,393
which is load a numpy image
and stuff the numpy image in
203
00:13:22,393 --> 00:13:24,883
but what TensorFlow would love me to do
204
00:13:24,883 --> 00:13:29,413
is to create, as you see in this one
205
00:13:29,413 --> 00:13:34,024
create a file name queue and it will
206
00:13:34,024 --> 00:13:35,314
then run the queue, do the matching
207
00:13:35,314 --> 00:13:36,674
and do all of this stuff itself
208
00:13:36,674 --> 00:13:41,093
because then it can lay it out across
a potentially distributed cluster
209
00:13:41,093 --> 00:13:43,414
and do everything just right
210
00:13:43,414 --> 00:13:50,254
here I just do the simple thing and read the image
211
00:13:50,254 --> 00:13:59,507
so this image is a tensor
which is 224 by 224 by RGB
212
00:13:59,507 --> 00:14:03,478
this is a kind of sanity check on
what numbers I have got in the corner
213
00:14:03,478 --> 00:14:05,667
and then what I am gonna do is
214
00:14:05,667 --> 00:14:08,016
I am going to crop out the
middle section of it
215
00:14:08,016 --> 00:14:10,761
this happens to be the right size already
216
00:14:10,761 --> 00:14:13,495
basically, if you have got odd shapes
217
00:14:13,495 --> 00:14:15,136
you need to think about
how you are gonna handle it
218
00:14:15,136 --> 00:14:18,956
are you going to pad it,
what do you do?
219
00:14:18,956 --> 00:14:21,947
because in order to make this efficient
220
00:14:21,947 --> 00:14:29,056
TensorFlow wants to lay it out without
all this variability in image size
221
00:14:29,056 --> 00:14:34,475
one set of parameters and it's then going
to blast it across your GPU
222
00:14:34,475 --> 00:14:37,865
so let's just run this thing
223
00:14:37,865 --> 00:14:39,697
so now we have defined the network
224
00:14:39,697 --> 00:14:45,767
here I am going to create a session
and init the session
225
00:14:45,767 --> 00:14:47,839
it loads the data, and then I am going
226
00:14:47,839 --> 00:14:52,037
to pick up the numpy image and the
probabilities from the top layer
227
00:14:52,037 --> 00:14:54,677
I am just gonna show it
228
00:14:57,507 --> 00:15:01,366
here is the image
this is the image I pulled off the disk
229
00:15:01,366 --> 00:15:06,327
you can see here the probabilities,
the highest probability is Tabby cat
230
00:15:06,327 --> 00:15:10,487
which is good, it's also interesting that
231
00:15:10,487 --> 00:15:15,263
the next in line are Tiger cat,
Egyptian cat, lynx
232
00:15:15,263 --> 00:15:21,037
so it's got a fair idea that it is a cat
in particular, it is getting it right
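Reading off the top few labels from the probability vector, as shown on screen, is just an argsort; the labels and numbers below are made up for illustration, standing in for the 1000 ImageNet classes:

```python
import numpy as np

# hypothetical stand-ins for the ImageNet labels and probabilities
labels = ["tabby cat", "tiger cat", "Egyptian cat", "lynx", "cheeseburger"]
probs = np.array([0.61, 0.18, 0.10, 0.06, 0.05])

# argsort is ascending, so reverse the order and take the first three
top3 = [labels[i] for i in np.argsort(probs)[::-1][:3]]
```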
233
00:15:21,037 --> 00:15:26,169
ok so this is the same diagram
we have had before
234
00:15:26,169 --> 00:15:32,729
what you have seen is the image going into
this black box, coming out and telling us
235
00:15:32,729 --> 00:15:35,868
the probabilities here, so what we are
now gonna do is
236
00:15:35,868 --> 00:15:41,910
from the image to the black box and
just learn a bunch of features
237
00:15:50,030 --> 00:15:52,720
let me just show you this on disk
238
00:16:11,300 --> 00:16:13,304
so I have a cars directory here
239
00:16:13,957 --> 00:16:17,848
and inside this thing,
240
00:16:24,238 --> 00:16:25,788
I have surprisingly little data
241
00:16:36,648 --> 00:16:39,863
In this directory, I just have a
bunch of car images
242
00:16:39,863 --> 00:16:42,189
and I have two sets of images
243
00:16:42,189 --> 00:16:47,659
one of which is called classic
and the other is called modern
244
00:16:47,659 --> 00:16:52,010
so basically I picked some
photos off Flickr
245
00:16:52,010 --> 00:16:54,439
I put these into two separate directories
246
00:16:54,439 --> 00:16:56,309
I am going to use those directory names
247
00:16:56,309 --> 00:17:00,431
as the classification for these images
248
00:17:00,431 --> 00:17:05,160
In the upper directory here
I got a bunch of test images
249
00:17:05,160 --> 00:17:06,830
which I don't know the labels for
250
00:17:12,610 --> 00:17:17,261
this picks out the list of classes; there
is a classic and a modern directory
251
00:17:17,261 --> 00:17:21,990
I am gonna go through every file
in this directory
252
00:17:21,990 --> 00:17:28,470
I am gonna crop it, I am gonna find
the logits level which is
253
00:17:28,470 --> 00:17:33,441
all the classes and then I am just gonna
add these to features
254
00:17:33,441 --> 00:17:36,601
So basically I am gonna do something
like a scikit-learn model
255
00:17:36,601 --> 00:17:38,311
I am gonna fit an SVM
256
00:17:38,311 --> 00:17:42,111
so basically, this is featurizing
all these pictures
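The featurize-then-fit step can be sketched with scikit-learn; the random matrix below is only a stand-in for the 20 x 1000 array of logits that Inception actually produces for the training images:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# stand-in features: one 1000-dim logits vector per training image
X_train = rng.normal(size=(20, 1000))
y_train = np.array([0] * 10 + [1] * 10)  # 0 = classic, 1 = modern

clf = LinearSVC()          # a linear SVM, as in the talk
clf.fit(X_train, y_train)  # fast: only 20 examples, no deep training loop

# predict the class of an unseen feature vector
pred = clf.predict(rng.normal(size=(1, 1000)))
```

This is why the training step in the demo is nearly instant: the SVM only sees 20 feature vectors, never the raw pixels.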
257
00:17:47,911 --> 00:17:49,961
so here we go with the training data
258
00:17:55,571 --> 00:17:56,972
here's some training
259
00:18:02,272 --> 00:18:05,622
classic cars,
it went through the classic directory
260
00:18:05,622 --> 00:18:08,782
modern cars,
it went through the modern directory
261
00:18:15,292 --> 00:18:16,752
it's thinking hard
262
00:18:18,392 --> 00:18:25,284
what I am gonna do now is
build SVM over those features
263
00:18:31,016 --> 00:18:40,180
jump to 21:36
264
00:21:35,478 --> 00:21:43,839
I restarted this thing
265
00:21:43,839 --> 00:21:49,619
the actual training for this SVM thing
takes hardly any time,
266
00:21:49,619 --> 00:21:58,018
this is very quick, essentially 20 images
worth of a thousand features
267
00:21:58,018 --> 00:22:01,840
so there was no big training loop to do
268
00:22:01,840 --> 00:22:09,070
then I can run this on the actual images
in the test directory, the test set
269
00:22:09,070 --> 00:22:12,680
so here this is images that it has never
seen before
270
00:22:12,680 --> 00:22:16,440
it thinks that this is a modern car
271
00:22:16,440 --> 00:22:19,020
this one it thinks is a classic car,
this one is classified as modern
272
00:22:19,020 --> 00:22:26,301
so this is actually doing quite a good job
out of just 10 examples of each
273
00:22:26,301 --> 00:22:32,770
it actually thinks this one is modern
it's not a sports car but anyway
274
00:22:32,770 --> 00:22:38,939
so this is showing that the SVM we trained
275
00:22:38,939 --> 00:22:42,901
can classify based on the features that
Inception is producing because
276
00:22:42,901 --> 00:22:47,231
Inception "understands"
what images are about
277
00:22:47,231 --> 00:22:50,801
so if I go back to here,
code is on GitHub
278
00:22:50,801 --> 00:22:53,992
conclusions: okay, this thing really works
279
00:22:53,992 --> 00:22:58,402
we didn't have to train
a deep neural network
280
00:22:58,402 --> 00:23:01,876
we could plug this TensorFlow model
into an existing pipeline
281
00:23:01,876 --> 00:23:04,760
and this is actually something where
282
00:23:04,760 --> 00:23:08,532
the TensorFlow Summit has something
to say about these pipelines
283
00:23:08,532 --> 00:23:11,013
because not only are they talking
about deep learning
284
00:23:11,013 --> 00:23:14,753
they are talking about the whole
cloud-based learning
285
00:23:14,753 --> 00:23:19,453
and setting up proper processes
286
00:23:19,453 --> 00:23:23,965
I guess, time for questions quickly
287
00:23:23,965 --> 00:23:29,142
we can then do the
TensorFlow Summit wrap-up
288
00:23:33,212 --> 00:23:37,144
"I am assuming that there is no
backpropagation here"
289
00:23:37,144 --> 00:23:40,034
This involves no backpropagation
290
00:23:40,034 --> 00:23:42,504
"End result is a feature"
291
00:23:45,884 --> 00:23:53,135
I am just assuming that Inception,
you can imagine if the ImageNet thing
292
00:23:53,135 --> 00:23:56,265
had focused more on products,
it could be even better
293
00:23:56,265 --> 00:23:58,914
if it focused on man-made things
294
00:23:58,914 --> 00:24:04,915
The ImageNet training set has an awful
lot of dogs in it, not that many cats
295
00:24:04,915 --> 00:24:09,426
So, on the other hand it may be that
it has quite a lot of flowers
296
00:24:09,426 --> 00:24:13,826
or maybe it is saying, I classify this car
as a modern car
297
00:24:13,826 --> 00:24:16,046
because it's got petals for wheels
298
00:24:16,046 --> 00:24:20,385
whereas the other one, the classic cars
tend to have round things for wheels
299
00:24:20,385 --> 00:24:25,146
So it is abstractly doing this
300
00:24:25,146 --> 00:24:29,918
It doesn't know about sports cars or
what they look like
301
00:24:29,918 --> 00:24:31,587
But it does know about curves
302
00:24:34,607 --> 00:24:37,527
"So for SVM, you don't use
TensorFlow anymore ?"
303
00:24:37,527 --> 00:24:43,157
No, basically I have used TensorFlow to
create some features
304
00:24:43,157 --> 00:24:45,308
Now, I don't want to throw it away
305
00:24:45,308 --> 00:24:47,687
because hopefully I have got
a streaming process where
306
00:24:47,687 --> 00:24:52,177
more and more images are chucked
through this thing
307
00:24:52,177 --> 00:25:04,528
[could not hear the question properly]
308
00:25:07,058 --> 00:25:10,068
There is example code called
TensorFlow for Poets
309
00:25:10,068 --> 00:25:13,296
where they actually say that,
let's load up one of these networks
310
00:25:13,296 --> 00:25:15,369
and then we will do some fine tuning
311
00:25:15,369 --> 00:25:21,977
there you get involved in tuning
these neurons with some gradient descent
312
00:25:21,977 --> 00:25:24,819
and you are taking some steps
and all this kind of thing
313
00:25:24,819 --> 00:25:28,328
maybe that has broad implications
across the whole network
314
00:25:28,328 --> 00:25:32,819
which could be good if you have got
tons of data and tons of time
315
00:25:32,819 --> 00:25:36,948
but this is a very simple way of just
tricking it to get it done
316
00:25:36,948 --> 00:25:47,382
[could not hear the comment properly]
317
00:25:47,382 --> 00:25:54,033
it will be a very small network
because SVM is essentially fairly shallow
318
00:25:54,033 --> 00:26:06,532
[could not hear the question]
319
00:26:06,532 --> 00:26:13,752
TensorFlow even though it has imported
this large Inception network
320
00:26:13,752 --> 00:26:20,572
as far as I am concerned,
I am using it as f(x) = y and that's it
321
00:26:20,572 --> 00:26:25,062
but you can inquire what would it say
at this particular level
322
00:26:25,062 --> 00:26:30,473
and there are bunches of levels with
various points to tap along the way
323
00:26:30,473 --> 00:26:33,654
I could take out other levels
324
00:26:33,654 --> 00:26:35,783
I haven't tried it to have a look
325
00:26:35,783 --> 00:26:40,083
There you get more picture-like
features rather than
326
00:26:40,083 --> 00:26:43,094
this string of a 1000 numbers
327
00:26:43,094 --> 00:26:48,884
but each intermediate level
will be pictures with CNN kinds of features
328
00:26:48,884 --> 00:26:53,544
on the other hand, if you want
to play around with this thing
329
00:26:53,544 --> 00:26:57,654
there's this nice stuff called
the DeepDream kind of things
330
00:26:57,654 --> 00:27:02,559
where they try and match images to
being interesting images
331
00:27:02,559 --> 00:27:06,454
then you do the featurizing that looks at
different levels
332
00:27:06,454 --> 00:27:12,415
the highest level is a cat but I want all
local features to be as fishy as possible
333
00:27:12,415 --> 00:27:15,561
then you get like a fish-faced cat
334
00:27:15,561 --> 00:27:20,010
that's the kind of thing you can do with
these kinds of features in models