My 3 months with Computer Vision — Part 3— Simple Neural Network from scratch for MNIST
Let’s start with Project 1: the MNIST dataset. MNIST is a dataset of handwritten digits. Each sample is a 28×28-pixel grayscale image, and your job is to predict which digit, from 0 to 9, it shows.
1. Loading The Data
Let’s start by downloading the data from Kaggle: https://www.kaggle.com/c/digit-recognizer. We will also create a submission for Kaggle and see how our model performs.
You can start by cloning my repository https://github.com/angadp/DeepLearning
Let’s load the data and see a few samples:
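A minimal loader might look like the sketch below. It assumes the Kaggle train.csv layout (a label column plus 784 pixel columns); the function name load_mnist is my own, not from the repository.

```python
import numpy as np
import pandas as pd

def load_mnist(csv_path):
    """Load the Kaggle digit-recognizer CSV: a 'label' column plus 784 pixel columns."""
    df = pd.read_csv(csv_path)
    y = df["label"].values
    # Reshape each flat 784-pixel row into a 28x28x1 image and scale to [0, 1].
    X = df.drop(columns=["label"]).values.reshape(-1, 28, 28, 1).astype("float32") / 255.0
    return X, y

# X_train, y_train = load_mnist("train.csv")
```

Scaling to [0, 1] here is a common convenience; it keeps the pixel values in a range that trains smoothly.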
2. Augmenting the data
We will use Keras’s ImageDataGenerator for this. You pass it images and it hands back the same images with random transformations applied. This helps the model learn features that generalize rather than memorizing exact pixel positions: it sees the same digits, just presented in slightly different ways each epoch.
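A sketch of how that looks; the specific augmentation ranges below are illustrative guesses, not values from this project.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All ranges here are illustrative; tune them for your own data.
datagen = ImageDataGenerator(
    rotation_range=10,       # rotate up to 10 degrees either way
    width_shift_range=0.1,   # shift horizontally by up to 10% of the width
    height_shift_range=0.1,  # shift vertically by up to 10% of the height
    zoom_range=0.1,          # zoom in or out by up to 10%
)

# Each call to the iterator yields a freshly augmented batch of the same images:
# batches = datagen.flow(X_train, y_train, batch_size=64)
```

Note that flips are deliberately left out: a horizontally flipped 2 or a vertically flipped 6 is no longer the same digit.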
3. Model
Let’s go to the model now. As covered in the previous article, we want a convolutional model that starts big and narrows down: the spatial output shrinks layer by layer until a small dense head makes the final prediction.
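Here is a sketch that reproduces the summary printed below. The layer sizes come straight from that summary; the activations and dropout rates are my assumptions, since the summary does not show them.

```python
from tensorflow.keras import layers, models

# Layer widths match the printed summary; activations/dropout rates are assumptions.
model = models.Sequential([
    layers.Conv2D(256, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.BatchNormalization(),
    layers.Conv2D(96, (3, 3), activation="relu"),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.25),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),  # one probability per digit
])
model.summary()
```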
Please go through the Keras documentation; you can experiment with more parameters of the Conv2D and MaxPooling2D layers. Just make sure the output shape keeps shrinking from one layer to the next. If you visualize this model you will see what I mean:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 26, 26, 256) 2560
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 256) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 11, 11, 128) 295040
_________________________________________________________________
batch_normalization (BatchNo (None, 11, 11, 128) 512
_________________________________________________________________
conv2d_2 (Conv2D) (None, 9, 9, 96) 110688
_________________________________________________________________
conv2d_3 (Conv2D) (None, 7, 7, 64) 55360
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 3, 3, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 3, 3, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 576) 0
_________________________________________________________________
dense (Dense) (None, 128) 73856
_________________________________________________________________
dropout_1 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 32) 4128
_________________________________________________________________
dense_2 (Dense) (None, 10) 330
=================================================================
Total params: 542,474
Trainable params: 542,218
Non-trainable params: 256
_________________________________________________________________
As you can see, the output shape shrinks as we move from the first layer toward the last. Now the rest of the work is done by the training step.
4. Compile
We are predicting one of ten categories, so the loss is categorical_crossentropy.
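The compile call might look like this. The optimizer choice is an assumption (the article only names the loss), and a tiny stand-in model is used so the snippet runs on its own; in practice you would compile the CNN from the previous section the same way.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.utils import to_categorical

# categorical_crossentropy expects one-hot labels, so encode the integer digits first:
# y_train = to_categorical(y_train, num_classes=10)

# Tiny stand-in model so this snippet is self-contained.
model = models.Sequential([layers.Dense(10, activation="softmax", input_shape=(784,))])

model.compile(
    optimizer="adam",                 # optimizer is an assumption, not from the article
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```

If you would rather keep the labels as plain integers, sparse_categorical_crossentropy does the same job without the one-hot step.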
5. Training and Visualizing
You define the batch size and the number of epochs, and Keras takes care of the rest.
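A runnable sketch of the call pattern, using random stand-in data and a tiny model so it works end to end; with the real project you would fit the CNN on the augmented generator instead, and the batch size and epoch counts here are illustrative only.

```python
import numpy as np
from tensorflow.keras import layers, models

# Random stand-in data in the same shape as MNIST (32 images, one-hot labels).
X = np.random.rand(32, 28, 28, 1).astype("float32")
y = np.zeros((32, 10), dtype="float32")
y[np.arange(32), np.random.randint(0, 10, 32)] = 1.0

model = models.Sequential([layers.Flatten(input_shape=(28, 28, 1)),
                           layers.Dense(10, activation="softmax")])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# batch_size and epochs are the two knobs you set; values here are illustrative.
history = model.fit(X, y, batch_size=8, epochs=2, validation_split=0.25, verbose=0)
```

The returned History object records loss and accuracy per epoch, which is what we plot next.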
After training we will see a graph like this.
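A sketch of how such a graph can be produced with matplotlib from the History object that model.fit returns. The key names "accuracy" and "val_accuracy" assume you compiled with metrics=["accuracy"] and passed validation data.

```python
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot training and validation accuracy per epoch from a Keras History object."""
    plt.plot(history.history["accuracy"], label="train accuracy")
    plt.plot(history.history["val_accuracy"], label="validation accuracy")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()

# plot_history(history)
```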
6. Predicting
You can use the model you have trained to predict new items as shown below.
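Something along these lines; the helper name predict_and_save is my own, and it assumes X_test was loaded and reshaped from Kaggle’s test.csv the same way as the training images.

```python
import numpy as np
import pandas as pd

def predict_and_save(model, X_test, out_path="submission.csv"):
    """Predict a digit for each test image and write a Kaggle-style submission file."""
    probs = model.predict(X_test, verbose=0)   # one row of 10 class probabilities per image
    digits = np.argmax(probs, axis=1)          # pick the most likely digit
    # The digit-recognizer competition expects a 1-based ImageId column plus Label.
    pd.DataFrame({"ImageId": np.arange(1, len(digits) + 1),
                  "Label": digits}).to_csv(out_path, index=False)
    return digits
```

The resulting submission.csv is what you upload to the competition page.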
Next, let’s move on to CatsVDogs.