Delving into Deep Learning – Part 1

I recently attended a Microsoft Hack session on Computer Vision. I’ve been interested in this topic for as long as I can remember, and I coded my first neural network in 2000 as a computer opponent for a game of Tic-Tac-Toe. It trained itself on the fly and could completely flat-line a 1 GHz AMD processor, and it still lost the game.

Things have certainly progressed since then, and Deep Learning has become the new buzz-phrase. Since the ever-looming artificiapocalypse (yes, I made that word up) is apparently coming, I figured I should start investing some time in this phenomenon.

This blog post is an attempt to document part of my journey step by step so that you can follow along. The code is available at

My very first use-case was quite simple to verbalise: from a picture of a vehicle, identify if MobeeWash should classify it as Car, SUV, or Bus.

To accomplish this, I needed a Convolutional Neural Network (CNN), the type of network that research has shown to perform best at image classification.

The idea is that you train up a set of small convolution matrices (kernels) and slide each one across the image: at every position you multiply the kernel element-wise with the patch of pixels beneath it and sum the result, which produces a feature map. A Rectified Linear Unit (ReLU) then replaces every negative value with zero, giving a rectified feature map. Next, a pooling operation (for example, keeping only the maximum value in each small window) shrinks the feature map, which reduces the complexity while keeping the important extracted features. Repeat this for a couple of layers, and finally pass the result through fully connected layers and a softmax, which turns the output into a probability for each of the categories you’re training on.
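To make those steps concrete, here is a minimal NumPy sketch of one layer’s worth of that pipeline: convolve, apply ReLU, max-pool, then a softmax over the categories. It is a toy under stated assumptions, not how Keras or CNTK implement it: the kernel is hand-picked to detect a vertical edge (a real network learns its kernels during training), the dense-layer weights mapping features to Car, SUV and Bus are made-up random numbers, and the helper names (convolve2d, relu, max_pool, softmax) are hypothetical.

```python
import numpy as np


def convolve2d(img, kernel):
    """'Valid' 2D convolution as CNN layers compute it (no kernel flip)."""
    kh, kw = kernel.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel element-wise with the patch under it, then sum.
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Rectified Linear Unit: set every negative value to zero."""
    return np.maximum(x, 0)


def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    h = x.shape[0] // size * size
    w = x.shape[1] // size * size
    x = x[:h, :w]  # drop edge rows/columns that don't fill a whole window
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))


def softmax(z):
    """Turn raw class scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtracting the max avoids overflow
    return e / e.sum()


# Toy forward pass: a 6x6 "image" that is dark on the left and bright on the right.
image = np.array([[0, 0, 0, 1, 1, 1]] * 6, dtype=float)
# Hand-picked vertical-edge kernel (a real CNN learns its kernels in training).
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

feature_map = relu(convolve2d(image, kernel))  # rectified feature map, shape (4, 4)
pooled = max_pool(feature_map)                 # shape (2, 2)

# Hypothetical dense-layer weights: one row of scores per category.
categories = ["Car", "SUV", "Bus"]
weights = np.random.default_rng(0).normal(size=(len(categories), pooled.size))
probs = softmax(weights @ pooled.ravel())
print(dict(zip(categories, probs.round(3))))
```

The edge kernel lights up only where the dark-to-bright boundary sits, and pooling halves each dimension while keeping those strong responses. Because the weights are random, the printed probabilities mean nothing; training is what makes them meaningful.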

This article does a lot more justice to the topic than I can and I recommend giving it a thorough read:

The term “training” in this case means feeding the model a ton of data and calculating what these convolution matrices and the final fully connected layers should look like. Luckily, this work has already been done on millions of images in the form of pre-trained models, so I will repurpose such a model. As a start, I will use the model in its vanilla state, as its authors intended.

The first thing I realised is that I would have to become intimately familiar with a scripting language. The two most widely used languages in the machine learning space are Python and R. I decided on the former as it, well, looked shinier. From a utility point of view there really is no difference, so long as you pick one and stick to it.

Next was the decision on which IDE to use. Visual Studio Code was the obvious choice as I wanted to stick to the Microsoft tool-chain. You can get it here: for free, and it deserves a blog post all to itself. It really is a work of art, and if you are using any other text editor I strongly advise swapping it out for this bad boy. It will change your life!

I then needed to get Python working inside VS Code. My Visual Studio 2017 installation came with a Python distribution called Anaconda. You can also get it as a standalone install from here:

The recommended approach is to isolate your ML code from other Python code in case their dependencies interfere with one another. From an elevated command line, I created a new conda environment called MachineLearning with the latest Python interpreter at the time (3.6):
conda create -n MachineLearning python=3.6 anaconda

After that, I installed the VS Code extension called Python, pointed it at my MachineLearning environment, and I was up and running.

The two major ML engines competing at the moment are Google’s TensorFlow and Microsoft’s Cognitive Toolkit (aka CNTK). I opted for the latter as I am a Microsoft evangelist at heart. For completeness’ sake, there’s also a third engine called Theano, but it seems to be nearing the end of its life cycle.

Installing CNTK into Python is as simple as running “pip install” with the wheel (a pre-built Python package) that matches your setup, according to

Since I’m using Python 3.6 on a Windows PC without a graphics card, I needed to use:
pip install

CNTK itself was too low-level for a newbie starting out, so I needed a higher-level API on top of it. Keras came to the rescue, and installing it was simple:
pip install keras

Keras defaults to using TensorFlow as its backend. You need to spin up Keras at least once for it to generate the config file “keras.json”. In other words, after running (and failing) the sample app, hunt for this file at %USERPROFILE%/.keras/keras.json. We want to change “backend” to “cntk”, so the file ends up looking like this:
{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "cntk",
    "image_data_format": "channels_last"
}
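If you’d rather not edit the file by hand, a small Python script can make the same change. This is a sketch assuming the Keras 2.x layout described above (keras.json in a .keras folder under your home directory, which is %USERPROFILE% on Windows); set_keras_backend is a hypothetical helper, not part of Keras.

```python
import json
import os


def set_keras_backend(backend="cntk", config_path=None):
    """Point Keras at a different backend by rewriting keras.json (hypothetical helper)."""
    if config_path is None:
        # Default Keras 2.x location: ~/.keras/keras.json,
        # i.e. %USERPROFILE%\.keras\keras.json on Windows.
        config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")

    # keras.json only exists after Keras has been imported at least once.
    with open(config_path) as f:
        config = json.load(f)

    config["backend"] = backend

    # Write the whole config back, leaving the other settings untouched.
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```

Calling set_keras_backend() with no arguments switches the default file to CNTK. As far as I know, Keras 2.x also honours a KERAS_BACKEND environment variable, which overrides the file for a single session, and keras.backend.backend() reports which backend is actually in use.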

The model we will use is called ResNet50, a residual Convolutional Neural Network consisting of 50 layers. It has been pre-trained on ImageNet images, and the meat of the model, its weights, can be downloaded using the ResNet50 constructor. Our code will, therefore, be simple:
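Here is a minimal sketch of that code, assuming the Keras 2.x API of the time: ResNet50, preprocess_input and decode_predictions from keras.applications.resnet50, plus the load_img and img_to_array helpers in keras.preprocessing.image. The function names classify_image and format_predictions are hypothetical, and the Keras imports sit inside classify_image so the formatting helper works even without Keras installed.

```python
def format_predictions(decoded):
    """Turn (wordnet_id, label, probability) tuples into readable percentages."""
    return [f"{label}: {prob * 100:.1f}%" for _, label, prob in decoded]


def classify_image(path, top=3):
    """Classify one image with the ImageNet-trained ResNet50 (Keras 2.x API)."""
    # Heavy imports live here so format_predictions works without Keras.
    import numpy as np
    from keras.applications.resnet50 import (
        ResNet50, decode_predictions, preprocess_input)
    from keras.preprocessing import image

    # Downloads the 'imagenet' weights on first use and caches them locally.
    # Note: this rebuilds the model on every call, fine for a one-off test.
    model = ResNet50(weights="imagenet")

    # ResNet50 expects 224x224 RGB input.
    img = image.load_img(path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)  # add a batch dimension: (1, 224, 224, 3)
    x = preprocess_input(x)        # apply the same preprocessing used in training

    preds = model.predict(x)
    # decode_predictions returns one list of (id, label, prob) per input image.
    return decode_predictions(preds, top=top)[0]
```

Calling classify_image("test_image.jpg") returns the top three (WordNet ID, label, probability) tuples, and format_predictions turns them into percentages.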

I found a cute picture of a VW minibus from and saved it as “test_image.jpg”.

If we run this, Keras will download and cache the 'imagenet' weights once and then produce this output:
[('n03769881', 'minibus', 0.97781414), ('n03770679', 'minivan', 0.0099776555), ('n04065272', 'recreational_vehicle', 0.008111931)]

In other words, it is about 97.8% sure that this image is a minibus. Neato!

But ResNet50 was trained on 1,000 different categories, not just vehicles. I found this little guy on

And according to ResNet it’s a:
[('n03916031', 'perfume', 0.50275147), ('n01729322', 'hognose_snake', 0.37831724), ('n03690938', 'lotion', 0.034052484)]
The results give it a 50.3% chance of being perfume and a 37.8% chance of being a hognose snake. So we’re not exactly at Terminator 2-level cognitive abilities yet, but it’s a good start.

Coming in the next blog post: re-purposing this pre-trained model using transfer learning.

Want to be part of the movement? Follow us on Slack at #data

Author: Pieter Coetzee – Senior .Net Developer