My 3 months with Computer Vision — Part 3— Simple Neural Network from scratch for MNIST
Let’s start with Project 1: the MNIST dataset. MNIST is a dataset of handwritten digits. Each sample is a 28×28-pixel grayscale image, and your job is to predict which digit, from 0 to 9, it shows.
1. Loading The Data
Let’s start by downloading the data from Kaggle: https://www.kaggle.com/c/digit-recognizer. We will also create a submission for Kaggle and see how our model performs.
You can start by cloning my repository https://github.com/angadp/DeepLearning
Let’s load the data and see a few samples:
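A minimal loader might look like the sketch below. It assumes the Kaggle train.csv layout (a label column plus 784 pixel columns); the function name load_mnist is my own, not from the repository.

```python
import numpy as np
import pandas as pd

def load_mnist(csv_path):
    """Load the Kaggle digit-recognizer CSV: a 'label' column plus 784 pixel columns."""
    df = pd.read_csv(csv_path)
    y = df["label"].values
    # Reshape each flat 784-pixel row into a 28x28x1 image and scale to [0, 1].
    X = df.drop(columns=["label"]).values.reshape(-1, 28, 28, 1).astype("float32") / 255.0
    return X, y

# X_train, y_train = load_mnist("train.csv")
```

Scaling to [0, 1] here is a common convenience; it keeps the pixel values in a range that trains smoothly.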
2. Augmenting the data
We will use Keras’s ImageDataGenerator for this. You pass it images and it hands back the same images with random transformations applied. This helps the model learn features that generalize rather than memorizing exact pixel positions: it sees the same digits, just presented in slightly different ways each epoch.
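A sketch of how that looks; the specific augmentation ranges below are illustrative guesses, not values from this project.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All ranges here are illustrative; tune them for your own data.
datagen = ImageDataGenerator(
    rotation_range=10,       # rotate up to 10 degrees either way
    width_shift_range=0.1,   # shift horizontally by up to 10% of the width
    height_shift_range=0.1,  # shift vertically by up to 10% of the height
    zoom_range=0.1,          # zoom in or out by up to 10%
)

# Each call to the iterator yields a freshly augmented batch of the same images:
# batches = datagen.flow(X_train, y_train, batch_size=64)
```

Note that flips are deliberately left out: a horizontally flipped 2 or a vertically flipped 6 is no longer the same digit.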
3. Model
Let’s go to the model now. As covered in the previous article, we want a convolutional model that starts big and narrows down: the spatial output shrinks layer by layer until a small dense head makes the final prediction.
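Here is a sketch that reproduces the summary printed below. The layer sizes come straight from that summary; the activations and dropout rates are my assumptions, since the summary does not show them.

```python
from tensorflow.keras import layers, models

# Layer widths match the printed summary; activations/dropout rates are assumptions.
model = models.Sequential([
    layers.Conv2D(256, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.BatchNormalization(),
    layers.Conv2D(96, (3, 3), activation="relu"),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.25),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),  # one probability per digit
])
model.summary()
```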
Please go through the Keras documentation; you can experiment with more parameters of the Conv2D and MaxPooling2D layers. Just make sure the output shape keeps shrinking from one layer to the next. If you visualize this model you will see what I mean:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 26, 26, 256) 2560
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 256) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 11, 11, 128) 295040
_________________________________________________________________
batch_normalization (BatchNo (None, 11, 11, 128) 512
_________________________________________________________________
conv2d_2 (Conv2D) (None, 9, 9, 96) 110688
_________________________________________________________________
conv2d_3 (Conv2D) (None, 7, 7, 64) 55360
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 3, 3, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 3, 3, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 576) 0
_________________________________________________________________
dense (Dense) (None, 128) 73856
_________________________________________________________________
dropout_1 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 32) 4128
_________________________________________________________________
dense_2 (Dense) (None, 10) 330
=================================================================
Total params: 542,474
Trainable params: 542,218
Non-trainable params: 256
_________________________________________________________________
As you can see, the output shape shrinks as we move from the first layer toward the last. Now the rest of the work is done by the training step.
4. Compile
We are predicting one of ten categories, so the loss is categorical_crossentropy.
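The compile call might look like this. The optimizer choice is an assumption (the article only names the loss), and a tiny stand-in model is used so the snippet runs on its own; in practice you would compile the CNN from the previous section the same way.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.utils import to_categorical

# categorical_crossentropy expects one-hot labels, so encode the integer digits first:
# y_train = to_categorical(y_train, num_classes=10)

# Tiny stand-in model so this snippet is self-contained.
model = models.Sequential([layers.Dense(10, activation="softmax", input_shape=(784,))])

model.compile(
    optimizer="adam",                 # optimizer is an assumption, not from the article
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```

If you would rather keep the labels as plain integers, sparse_categorical_crossentropy does the same job without the one-hot step.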
5. Training and Visualizing
You define the batch size and the number of epochs, and Keras takes care of the rest.
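A runnable sketch of the call pattern, using random stand-in data and a tiny model so it works end to end; with the real project you would fit the CNN on the augmented generator instead, and the batch size and epoch counts here are illustrative only.

```python
import numpy as np
from tensorflow.keras import layers, models

# Random stand-in data in the same shape as MNIST (32 images, one-hot labels).
X = np.random.rand(32, 28, 28, 1).astype("float32")
y = np.zeros((32, 10), dtype="float32")
y[np.arange(32), np.random.randint(0, 10, 32)] = 1.0

model = models.Sequential([layers.Flatten(input_shape=(28, 28, 1)),
                           layers.Dense(10, activation="softmax")])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# batch_size and epochs are the two knobs you set; values here are illustrative.
history = model.fit(X, y, batch_size=8, epochs=2, validation_split=0.25, verbose=0)
```

The returned History object records loss and accuracy per epoch, which is what we plot next.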
After training we will see a graph like this.
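A sketch of how such a graph can be produced with matplotlib from the History object that model.fit returns. The key names "accuracy" and "val_accuracy" assume you compiled with metrics=["accuracy"] and passed validation data.

```python
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot training and validation accuracy per epoch from a Keras History object."""
    plt.plot(history.history["accuracy"], label="train accuracy")
    plt.plot(history.history["val_accuracy"], label="validation accuracy")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()

# plot_history(history)
```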
6. Predicting
You can use the model you have trained to predict new items as shown below.
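Something along these lines; the helper name predict_and_save is my own, and it assumes X_test was loaded and reshaped from Kaggle’s test.csv the same way as the training images.

```python
import numpy as np
import pandas as pd

def predict_and_save(model, X_test, out_path="submission.csv"):
    """Predict a digit for each test image and write a Kaggle-style submission file."""
    probs = model.predict(X_test, verbose=0)   # one row of 10 class probabilities per image
    digits = np.argmax(probs, axis=1)          # pick the most likely digit
    # The digit-recognizer competition expects a 1-based ImageId column plus Label.
    pd.DataFrame({"ImageId": np.arange(1, len(digits) + 1),
                  "Label": digits}).to_csv(out_path, index=False)
    return digits
```

The resulting submission.csv is what you upload to the competition page.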
Next, let’s move on to CatsVDogs.