What we hope to do with this meetup, given the spread of the questionnaire results, is have something for people who don't know what deep learning is and want an introduction to TensorFlow, but also something which is more of a crowd-pleaser, something more cutting edge. I'm not going to claim this talk is particularly cutting edge, because once we saw the responses we dialled things down a bit. But there will be more cutting-edge material, and maybe we'll start to run other meetup events in other formats. It could be that we have an experts' paper-reading meeting, or we could split the group, now that we can see the size of the crowd.

Anyway, let me talk a little about going deeper with transfer learning. Unfortunately, this is something some of you will have seen me do before, but it's the first time I've done it in TensorFlow, so let me just explain that. Before, I had been programming this stuff in Theano with the Lasagne layers library on top; Theano is a research-oriented deep learning framework out of Montreal. But what I've concluded since last summer is that TensorFlow is probably the winner of this framework race, at least for the
foreseeable future, with all this nice industrial tooling around it, so I should be retooling into TensorFlow. That's what I'm taking the opportunity to do here.

So, about me: I came up through finance, startups and so on. I took a year out in 2014, basically just for fun, and I've been doing serious natural language processing work since then.

The overview for this "something more challenging" talk, which will probably be 20 to 30 minutes depending on how it goes: I want to take a state-of-the-art TensorFlow model and solve a problem it wasn't trained for, and I'm going to use deep learning as a component of my solution rather than the primary focus of what I'm trying to build. So this is, in a way, more of an industrial or commercial kind of application.

The goal for this problem is to distinguish pictures of classic and modern sports cars; you'll see some pictures of classic and modern cars a bit later. It's not that easy to say what the difference is. Obviously it could be different types of images, and it could be lots of different classes — I'm just doing a very simple two-class thing — but they are complicated images. What
I want to do is keep the training time very small, so I don't want to be retraining some huge network. In particular, in this case I've only got 20 training examples, so I'm not going to do any fantastic million-image training run — I've got 20 images. I also want to be able to put this in production, so I can run it as a component of something else.

Basically, one of the things carrying the deep learning world forward is an image classification task called ImageNet. This has been a competition with 15 million labeled images from 22,000 categories, and you can see some of them here.
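The recipe just described — take features from a network someone else trained, and fit a shallow classifier on a handful of examples — can be sketched like this. This is a hedged illustration rather than the talk's actual notebook: the random matrix stands in for the Inception logits, and the 20-by-1000 shapes mirror the "20 images, a thousand features" setup that comes up later.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-in for Inception: in the real pipeline each 1000-dim row would be
# the logits the pretrained network produces for one car photo.
features = rng.normal(size=(20, 1000))      # 20 training images
labels = ["classic"] * 10 + ["modern"] * 10

# The only training we do: a linear SVM on top of the frozen features.
clf = LinearSVC().fit(features, labels)

# With 20 points in 1000 dimensions the classes are comfortably
# linearly separable, so training accuracy should be essentially perfect.
print(clf.score(features, labels))
```

The point of the sketch is the shape of the work: no epochs, no GPU, just one small SVM fit.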
This is a picture of a hotdog in a bun, and here are some of the categories, which happen to be food: lots of different pictures of hotdogs, lots of different pictures of cheeseburgers, lots of different pictures of plates. The task for ImageNet is to classify, for any one of these images, which of a thousand different categories it comes from. It used to be that people could score adequately well and were making incremental improvements in how well they could do this, but then the deep learning people came along and tore the benchmark to shreds. Google came up with GoogLeNet — which is what we're actually going to use here — back in 2014, and suddenly this task is being done, with further iterations of this kind of architecture, better than humans can do it.

The way you measure whether something is better than humans is that you take a human and see whether the model beats them. The question there is whether there are labeling errors; for that you need a committee of humans. The way they label these things is by running it on Mechanical Turk and asking people what category this cheeseburger is in.

The network we're going to use here is
the 2014 state-of-the-art GoogLeNet, also called Inception version 1. The nice thing about this is that there is an existing model already trained for this task, and it's available for download, all free; there are lots of different models out there in the model zoo for TensorFlow. What I have on my machine is a small model, about 20 megabytes, so not very big; Inception v4 is more like a 200 MB model, which is a bit heavy. I'm working here on my laptop, and you're going to see it working in real time.

The trick here — I'll show you the diagram, which should make it clear to anyone following along — is that instead of the softmax layer at the end, instead of using the logits to get me the probabilities, I'm going to strip that away and train a support vector machine to distinguish my classes. I'm not going to retrain the Inception network at all. I'm just going to use it as a component: strip off the top classification piece and replace it with an SVM. SVMs are pretty well understood; here I'm simply using Inception as a featurizer for images.

So here's a network picture. Basically, this is what the
ImageNet network is designed for: you put in an image at the bottom; there's this black box which is the Inception network — a bunch of CNNs, convolutional neural networks, followed by a dense network, followed by these logits — and the logits layer is essentially the same as the 0-to-9 outputs Sam had for his digits, except it's 1 to 1000 for the different ImageNet classes. To actually get the ImageNet output, it applies a softmax function and then chooses the highest of these to tell you which class the image is in.

What I'm going to do is ignore that neat piece of classification technology and instead use these outputs as inputs to an SVM — just treat them as features. If we pick out one of these, this class could be cheeseburger, this one could be parrot, this other one could be Husky dog; there are all sorts of classes in here. Basically I'll be extracting out the features of these photos: saying how much this photo is like a parrot, how much it is like a Husky dog. It turns out that modern cars and classic cars can be distinguished that way.

Let me go to some code. OK, this
code is all up on GitHub. Can everyone see this well enough?

So basically, I'm pulling in TensorFlow, and I pull in this model. Here is what the Inception architecture looks like: it feeds forward this way — you put your image in here, it goes through lots and lots of convolutional layers, all the way up to the softmax and the output at the end. Having done that, what I do is download the checkpoint; this is the checkpoint here, which is a tar file I have stored locally. It doesn't download it now, but it's all there — even the big models are up there from Google, pre-trained. Training the Inception model takes about a week on a bunch of GPUs, maybe 64, so you don't really want to be training this thing on your own. You'd also need the ImageNet training set, which is a 140 GB file that is no fun to download.

What I'm doing here is using the Inception library that is part of TF-Slim. It's designed so that it already knows the network and can preload it; this has loaded it, and I can get some labels. This is loading up the ImageNet labels: I need to know which location corresponds to which class, like the digits.
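The mapping from logit positions to label strings, plus the softmax that turns logits into probabilities, can be sketched in plain numpy. This is an illustration rather than the notebook's code, and the three-entry label table is a made-up miniature of the real thousand-class list:

```python
import numpy as np

# Hypothetical miniature of the ImageNet label table: index -> class name.
labels = {0: "tabby cat", 1: "cheeseburger", 2: "hotdog"}

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.5])     # pretend network output
probs = softmax(logits)
print(labels[int(np.argmax(probs))])   # -> tabby cat
```

The real model does exactly this at the top, just over 1000 positions instead of 3.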
Here we are going through basically the same steps as the MNIST example: we reset the default graph, and we create a placeholder which is where my images are going to go as input. From this image input I'm then going to do some TensorFlow steps, because TensorFlow has various preprocessing and graphics-handling commands — a lot of this stuff works with images, so there's all sorts of clipping and rotating functionality for preprocessing them. I'm also going to pull out a numpy image, so I can see what the network is actually looking at.

Here, with Inception version 1, I'm going to pull in the entire Inception v1 model. My net function, rather than just picking random weights, is going to be assigned them from this checkpoint: when I run the init step in my session, it won't initialize everything from random, it will initialize everything from disk. So this defines the model; now let's proceed.

One of the issues with having this on a nice TensorFlow graph is that it just says input, Inception1, output — there's one big block there. You can delve into it if you want; let me just show you, let's go
back a bit. So this is the code behind the Inception v1 model — it's actually smaller than Inception v2 and v3. Basically there's a kind of base Inception piece, just this, and these are combined together; it's a detailed model put together by many smart people in 2014, and it's got much more complicated since then. Fortunately, they have written the code so we don't have to.

So here what I'm going to do is load an example image, just to show you. One of the things here is that TensorFlow, in order to be efficient, wants to do the loading itself. To keep information pumping through, it wants you to set up queues of images, and it will then handle the whole ingestion process itself. The problem with that is that it's kind of complicated to do in a Jupyter notebook, so here I'm going to do the very simplest thing, which is to load a numpy image and stuff that in. What TensorFlow would love me to do, as you see in this one, is create a filename queue; it would then run the queue, do the matching, and do all of this itself, because then it can lay the work out across a potentially distributed cluster and do everything
just right. Here I do the simple thing and just read the image. This image is a tensor which is 224 by 224 by RGB, and this is a sanity check on what kind of numbers I've got in the corner. Then I'm going to crop out the middle section of it. This one happens to be the right size already, but if you've got odd shapes you need to think about how you're going to handle them — are you going to pad, and so on — because to make this efficient, TensorFlow wants to lay the computation out without all this variability in image size: one set of parameters, which it then blasts across your GPU.

So let's run this thing. Now we've defined the network; here I create a session and init it, it loads the data, and then I'm going to pick up the numpy image and the probabilities from the top layer and just show them. Here is the image I pulled off the disk, and you can see the probabilities: the highest probability is tabby cat, which is good. It's also interesting that the next in line are tiger cat, Egyptian cat, and lynx, so it has a fair idea that this is a cat in particular; it's getting it right. OK, so this
is the same diagram we had before. What you have seen is this image going into the black box, coming out, and telling us the probabilities. What we're now going to do is go from the image to the black box and just learn a bunch of features.

Let me show you this on disk. I have a cars directory here, and inside it, surprisingly little data. In this directory I just have a bunch of car images, in two sets: one called classic and the other called modern. Basically I picked some photos off Flickr and put them into two separate directories, and I'm going to use those directory names as the classification for these images. In the upper directory I've got a bunch of test images which I don't know the labels for.

This picks out the list — there is a classic and a modern directory — and I'm going to go through every file in each directory, crop it, find the logits level, which is all the classes, and then add these to the features. So basically I'm going to do something like a scikit-learn model: I'm going to fit an SVM. This is featurizing all these pictures. So here we go with the
training data. Here's some training: classic cars — it went through the classic directory; modern cars — it went through the modern directory. It's thinking hard. What I'm going to do now is build an SVM over those features.

[jump to 21:36]

I restarted this thing. The actual training for the SVM doesn't take long at all — it's very quick, essentially 20 images' worth of a thousand features each, so there was no big training loop to do. Then I can run it on the actual images in the test directory — images it has never seen before. It thinks this is a modern car; this one it thinks is a classic car; this one is classified as modern. So it's actually doing quite a good job from just 10 examples of each. It does think this one is modern — it's not a sports car, but anyway. This shows that the SVM we trained can classify based on the features Inception is producing, because Inception "understands" what images are about.

So if I go back to here — the code is on GitHub — conclusions: OK, this thing really works. We didn't have to train a deep neural network; we could plug this TensorFlow model into an existing pipeline, and this is actually
something where the TensorFlow Summit has something to say about these pipelines, because they're not only talking about deep learning, they're talking about whole cloud-based learning setups and putting proper processes in place.

I guess it's time for questions, quickly, and then we can do the TensorFlow Summit wrap-up.

"I am assuming that there is no backpropagation here?"

Right — this involves no backpropagation.

"So the end result is a feature?"

I'm just assuming that Inception works here. You can imagine that if the ImageNet task had focused more on products, on man-made things, it could be even better. The ImageNet training set has an awful lot of dogs in it, and not that many cats. On the other hand, it may have quite a lot of flowers, so maybe it's saying "I like this car as a modern car because it's got petals for wheels", whereas the classic cars tend to have round things for wheels. It's doing this abstractly: it doesn't know about sports cars or what they look like, but it does know about curves.

"So for the SVM, you don't use TensorFlow anymore?"
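The division of labour this question is getting at — TensorFlow runs only a forward pass to produce features, and everything trainable lives in the shallow scikit-learn model — can be sketched as a single production component. The featurize function here is a hypothetical stand-in (a fixed random projection) for the real session.run on the Inception logits:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
proj = rng.normal(size=(27, 1000))   # frozen "network": never trained here

def featurize(image):
    # Stand-in for the TensorFlow forward pass: image -> 1000 "logit" features.
    # No gradients, no backprop -- the weights (proj) stay fixed throughout.
    return image.reshape(-1) @ proj

# Fit the only trainable part, the SVM, on 20 featurized examples.
train_images = rng.normal(size=(20, 3, 3, 3))
labels = ["classic"] * 10 + ["modern"] * 10
clf = LinearSVC().fit([featurize(im) for im in train_images], labels)

def classify(image):
    """The deployable component: deep net as featurizer, SVM on top."""
    return clf.predict([featurize(image)])[0]
```

In a streaming setup, classify is all that runs per image: one forward pass, one SVM prediction.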
No — basically I have used TensorFlow to create some features. Now, I don't want to throw it away, because hopefully I've got a streaming process where more and more images get run through this thing.

[could not hear the question properly]

There is example code called TensorFlow for Poets where they say: let's load up one of these networks and then do some fine-tuning. There you get involved in tuning the neurons with some gradient descent, taking training steps and so on, with potentially broad implications across the whole network — which can be good if you've got tons of data and tons of time. But this is a very simple way of just tricking it into getting the job done.

[could not hear the comment properly]

It would be a very small network, because an SVM is essentially fairly shallow.

[could not hear the question]

Even though TensorFlow has imported this large Inception network, as far as I'm concerned I'm using it as f(x) = y, and that's it. But you can inquire what it would say at a particular level — there are bunches of levels, with various component points along the way — and I could take out other levels; I haven't tried it to have a look. There you get more like
pictures' worth of features, rather than this string of a thousand numbers; the intermediate levels are pictures with CNN-style features. On the other hand, if you want to play around with this, there's the nice DeepDream kind of thing, where they try to optimize images to be interesting. You do the featurizing looking at different levels and say "the highest level is a cat, but I want all the local features to be as fishy as possible", and then you get a fish-faced cat. That's the kind of thing you can do with these kinds of features and models.
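That DeepDream-style trick — holding the network's weights fixed and nudging the input to drive some chosen activation higher — can be illustrated with a toy one-layer stand-in. Everything here is made up for the sketch; the real thing does the same gradient ascent through a full convolutional network:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=64)    # frozen "weights": the feature direction we chose
x = rng.normal(size=64)    # the "image" we are allowed to modify

def activation(x):
    # Toy stand-in for one unit's activation somewhere in the network.
    return float(w @ x)

before = activation(x)
for _ in range(100):
    x = x + 0.1 * w        # gradient ascent on the input: d(w.x)/dx = w
after = activation(x)
print(before, "->", after)
```

Each step adds a positive multiple of the squared norm of w to the activation, so the chosen "feature" gets steadily stronger while the weights never change — the same asymmetry the whole talk relies on.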