Delving into Deep Learning – Part 2

Repurpose a model through transfer learning.


The code for this blog is available at


In the previous installment, we talked a bit about ResNet and how it can be used as an image classifier.


My use-case is to exploit what this Neural Network already knows for my own purposes. It can classify between 1000 different categories. I need it to classify between only three: a Car, an SUV or a Van.


Let’s have a look at the ResNet topology.


To achieve this, I will use a Keras built-in function. But I need the prerequisites first:

Now we can run this visualisation code:
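The original snippet is not reproduced here, so the following is only a minimal sketch of how such a visualisation could be produced. It assumes the Keras bundled with TensorFlow, and that the prerequisites are the pydot package plus the Graphviz binaries, which plot_model relies on. The helper name draw_resnet50 is hypothetical.

```python
ARCHITECTURE_IMAGE = "RESNET50_architecture.png"


def draw_resnet50(to_file=ARCHITECTURE_IMAGE):
    """Load the pre-trained ResNet50 and write a diagram of its layers."""
    # Imported lazily: TensorFlow is heavy and only needed when this runs.
    from tensorflow.keras.applications.resnet50 import ResNet50
    from tensorflow.keras.utils import plot_model

    model = ResNet50(weights="imagenet")        # full network, 1000-class top included
    plot_model(model, to_file=to_file, show_shapes=True)
    model.summary()                             # text listing of every layer
    return model
```

Calling draw_resnet50() downloads the ImageNet weights on first use, so expect a short wait.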

Looking at the model image RESNET50_architecture.png, it looks like ResNet repeats one pattern over and over: apply an activation, split into two branches, run convolutions on one branch, then add the result back to the other branch. Interesting. What is of particular interest is the last Dense layer:

We can get more detail from the output of model.summary():

So after a Flatten layer, we have a Dense layer spanning the 1000 ResNet pre-trained classes.

The idea is to leave what ResNet knows about image deconstruction and identification intact, and retrain the last layer to remap to my three new classes. As such, I want to leave the network and weights of all but the last Dense layer intact and swap out that Dense 1000 layer for my own Dense 3 layer.


We will need data. Lots of data!


I found a sizable set of vehicle images at

You can download the dataset from

And the meta-data from

And the full annotation list from

Unfortunately, cars_annos.mat is in MATLAB format, so I had to convert it into annotations.json.

Using these annotations, I manually classified the set into Car, SUV and Van by creating my own ClassificationMap.csv.


I’m going to eventually feed this data to our model using a Keras function called flow_from_directory. It’s nifty in that it understands that you’re attempting a classification problem, but it requires your data to be split up into a training set that will be used to train the model and a validation set that will be used to verify the model’s correctness.
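As a rough sketch of the split that flow_from_directory expects, the snippet below shuffles a list of file names reproducibly and divides it into training and validation sets. The 80/20 ratio, the seed, and the folder names "Images", "train" and "validation" are assumptions for illustration, not values taken from the sorting script.

```python
import random
from pathlib import Path


def split_train_validation(filenames, validation_fraction=0.2, seed=42):
    """Shuffle reproducibly and return (training, validation) lists."""
    shuffled = sorted(filenames)             # sort first so the seed alone decides the order
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * validation_fraction)
    return shuffled[cut:], shuffled[:cut]


def destination(root, subset, label, filename):
    """flow_from_directory infers each label from the sub-folder name."""
    return Path(root) / subset / label / filename


files = [f"{i:05d}.jpg" for i in range(10)]
train_files, validation_files = split_train_validation(files)
print(len(train_files), len(validation_files))
print(destination("Images", "train", "SUV", train_files[0]).as_posix())
```

Keeping one sub-folder per class under each subset is what lets Keras work out the labels without a separate label file.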


I also need to keep poor-quality images out of the dataset. And I want to train using as much signal as I can get from the images; in other words, ignore the backgrounds as much as possible. Luckily the dataset includes bounding boxes in the annotation data which I can use to crop the images.
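The cropping and filtering could look something like the sketch below. It assumes each bounding box is an (x1, y1, x2, y2) tuple in pixels and that "poor quality" means a crop smaller than some minimum side length. The 64-pixel threshold and all function names are hypothetical, and the Pillow library is assumed for the actual image work.

```python
def clamp_box(box, width, height):
    """Clip an (x1, y1, x2, y2) box to the image bounds."""
    x1, y1, x2, y2 = box
    return (max(0, x1), max(0, y1), min(width, x2), min(height, y2))


def is_usable(box, min_side=64):
    """Reject crops whose width or height falls below min_side pixels."""
    x1, y1, x2, y2 = box
    return (x2 - x1) >= min_side and (y2 - y1) >= min_side


def crop_vehicle(source_path, target_path, box, min_side=64):
    """Crop one image to its bounding box; return False if it was filtered out."""
    from PIL import Image  # imported lazily so the helpers above need no extra packages

    with Image.open(source_path) as image:
        clipped = clamp_box(box, image.width, image.height)
        if not is_usable(clipped, min_side):
            return False
        image.crop(clipped).save(target_path)
    return True
```

Clamping before the size check means a box that spills over the image edge is judged on the pixels that actually exist.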


After putting the raw images in an ImagesUnclassified folder, I ran my sorting script that achieved all the data sorting, cropping and filtering requirements in one go.


After a couple of minutes of running the script, my folder structure looks like this:


Because we will be using a ton of data, my poor CPU was not sufficient to train the model. As such, I had to execute the training bit of this exercise on a computer with a GPU. And to do that I had to install a GPU-enabled version of the Keras backend. My initial implementation used the GPU distribution of CNTK. However, the most commonly found literature seems to point to TensorFlow as the industry standard, so I eventually opted to use that instead.


TensorFlow does have its own dependencies, such as NVIDIA CUDA, so please refer to this site for installation guidelines:

A word of caution: note the very specific versions of Python, cuDNN and CUDA drivers needed. I had quite a tough time after getting these wrong!


Once GPU-enabled TensorFlow was up, I could finally start the model training cycle by running script


Let’s have a look at some of the code:

Here I use a pre-processing function to normalise the images to what ResNet was trained on. I also get more bang for my buck by generating extra images from the ones I supplied: randomly rotated by up to 30 degrees, randomly sheared, and sometimes flipped horizontally.

Here I set up a data generator over the directories I supplied, resizing the images to the input size ResNet expects and telling the generator that it should treat the data as categorical.
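A sketch of both steps, assuming the Keras bundled with TensorFlow. The rotation of 30 degrees, the horizontal flip and the categorical class mode come from the text; the shear value, batch size and the helper name make_generators are assumptions. Note that ImageDataGenerator is marked as deprecated in recent TensorFlow releases, so treat this as an illustration of the approach rather than a recommendation.

```python
# Augmentation settings: rotation and flip follow the text, the shear value is a guess.
AUGMENTATION = {"rotation_range": 30, "shear_range": 0.2, "horizontal_flip": True}
IMAGE_SIZE = (224, 224)  # input size used by the stock ResNet50


def make_generators(train_dir, validation_dir, batch_size=32):
    """Return (training, validation) generators that yield one-hot labelled batches."""
    from tensorflow.keras.applications.resnet50 import preprocess_input
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Only the training images are augmented; validation images are just normalised.
    train_data = ImageDataGenerator(preprocessing_function=preprocess_input, **AUGMENTATION)
    validation_data = ImageDataGenerator(preprocessing_function=preprocess_input)

    train_generator = train_data.flow_from_directory(
        train_dir, target_size=IMAGE_SIZE, batch_size=batch_size, class_mode="categorical")
    validation_generator = validation_data.flow_from_directory(
        validation_dir, target_size=IMAGE_SIZE, batch_size=batch_size,
        class_mode="categorical", shuffle=False)
    return train_generator, validation_generator
```

The class_indices attribute on either generator shows which output index was assigned to which folder name.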

Now we load up the ResNet model, but we exclude the “top”, i.e. the last fully connected layer. I supply my own top layer: a Dense layer with class_count outputs, which in our case is 3.

I make sure that only my own layer is trainable. I don’t want to muck with ResNet too much. Not yet anyway.
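Put together, the last two steps might be sketched as follows, again assuming the Keras bundled with TensorFlow. The global average pooling before the new Dense layer, the adam optimizer, the layer name and the helper name build_transfer_model are assumptions; the text only establishes that the top is excluded, a Dense layer of class_count outputs is added, and everything else is frozen.

```python
CLASS_NAMES = ("Car", "SUV", "Van")


def build_transfer_model(class_count=len(CLASS_NAMES)):
    """ResNet50 without its 1000-class top, plus a new trainable Dense layer."""
    from tensorflow.keras.applications.resnet50 import ResNet50
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.models import Model

    base = ResNet50(weights="imagenet", include_top=False,
                    pooling="avg", input_shape=(224, 224, 3))
    for layer in base.layers:
        layer.trainable = False                 # keep what ResNet already knows

    outputs = Dense(class_count, activation="softmax", name="vehicle_classes")(base.output)
    model = Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

After building, model.summary() reports the trainable parameter count, which should cover only the new Dense layer.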

This statement will kick off the training session.

And here we save the trained model so we can easily load it later.


I set the number of epochs (learning iterations) to 50 and executed the script. After a long training session, this is the final output:

What this is saying is that we achieved 70.75% accuracy on our training data and 81.38% accuracy on our validation data. The model had never seen the validation data until we used it as a measure, so this means it must be generalising ideas from our training set.


As a demonstration, I’m going to give it these two pictures to classify:








This is the output from the prediction script

[('SUV', 0.98928076), ('Van', 0.008739293), ('Car', 0.0019799978)]

[('SUV', 0.64701205), ('Van', 0.3395496), ('Car', 0.0134383505)]


So the first image is spot-on: the model is 98.9% sure it’s an SUV.

The second image, however, needs some more attention: it is a Van, yet the model leans towards SUV at 64.7%, with Van at only 34.0%.
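Ranked lists like the ones above can be built from the raw model output in a few lines. This is a sketch, not the prediction script itself: it assumes the probabilities come from model.predict for a single image and that the label-to-index mapping matches the class_indices dictionary of a flow_from_directory generator, which orders folder names alphabetically. The function name rank_predictions is hypothetical.

```python
def rank_predictions(probabilities, class_indices):
    """Pair each probability with its label and sort from most to least likely."""
    labels = sorted(class_indices, key=class_indices.get)   # labels in output-index order
    pairs = zip(labels, probabilities)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


class_indices = {"Car": 0, "SUV": 1, "Van": 2}
first_image = [0.0019799978, 0.98928076, 0.008739293]
print(rank_predictions(first_image, class_indices))
# → [('SUV', 0.98928076), ('Van', 0.008739293), ('Car', 0.0019799978)]
```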


The final tool in our toolbox for today is a technique called Fine Tuning.


Now that our last fully connected layer is starting to do what we want it to, we can attempt to retrain some of the deeper ResNet layers.


Script was written for this purpose. It looks similar to our previous training script, but with these important changes:

Here we want the last 50 layers to be marked as trainable.
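One way to express this is a small helper that freezes everything and then re-enables only the tail of the layer list. The helper name unfreeze_last is hypothetical, and the demonstration uses stand-in objects (120 is an arbitrary count) so it can be tried without loading a model; in practice the argument would be model.layers.

```python
from types import SimpleNamespace


def unfreeze_last(layers, count=50):
    """Mark only the last `count` layers as trainable; return how many are trainable."""
    layers = list(layers)
    for layer in layers:
        layer.trainable = False
    if count > 0:                       # guard: layers[-0:] would select every layer
        for layer in layers[-count:]:
            layer.trainable = True
    return sum(1 for layer in layers if layer.trainable)


# Stand-in objects with a `trainable` attribute, mimicking Keras layers.
demo_layers = [SimpleNamespace(trainable=False) for _ in range(120)]
trainable_count = unfreeze_last(demo_layers, 50)
print(trainable_count)
```

With a real Keras model, the change to the trainable flags only takes effect once the model is compiled again.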

And we use a different optimizer: one where we can specify a very tiny learning rate. If the steps are too big, our model will not be able to converge on an answer.
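The effect of the step size can be seen on a toy problem: plain gradient descent on f(x) = x², whose gradient is 2x. This is an illustration of the principle, not the training code. The Keras part below it is a sketch that assumes SGD with momentum and a learning rate of 1e-4; the text does not say which optimizer or value was actually used.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimise f(x) = x**2 and return the final x."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x      # step against the gradient 2x
    return x


print(abs(gradient_descent(0.1)))   # small steps: x shrinks towards the minimum at 0
print(abs(gradient_descent(1.1)))   # oversized steps: x overshoots and grows every step


def compile_for_fine_tuning(model, learning_rate=1e-4):
    """Recompile a Keras model with a deliberately small learning rate."""
    from tensorflow.keras.optimizers import SGD

    model.compile(optimizer=SGD(learning_rate=learning_rate, momentum=0.9),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

With a learning rate of 0.1 each step multiplies x by 0.8, so it decays; with 1.1 each step multiplies it by -1.2, so it oscillates and diverges.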


After a very long training session we arrive at this output:


99.9% accuracy on training data and 85.4% on validation. I rerun our previous two samples and get this result:

[('SUV', 0.94335765), ('Van', 0.04256247), ('Car', 0.0140798)]

[('Van', 0.50210416), ('Car', 0.4541535), ('SUV', 0.043742355)]


Success! It was able to classify our Van example as a Van.


It seems as though our fine-tuned model does what we want it to. From here, we can train it some more, or even train additional layers, to improve accuracy.


I hope this was a useful introduction to the application of Neural Nets.


Want to be part of the movement? Follow us on Slack at #data


Author: Pieter Coetzee – Senior .Net Developer


Training Images from

3D Object Representations for Fine-Grained Categorization

 Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei

4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13). Sydney, Australia. Dec. 8, 2013.