Build VGG16 from scratch : part II

In the first part of this article on VGG16, we described the role of each layer in the network. Now we will implement it with Keras. Note that Keras has a pre-trained VGG16 model: I have used it in this article. But this time we will build the network ourselves with the Sequential() model of Keras! If one of the arguments of a Keras function doesn't make sense to you, refer to part I, where each layer is explained step by step.

Note that we will not train the model, only build its architecture, then use the weights provided here.

The information about the architecture is in this table from the original article:

I. The architecture

1 . Convolutional block

A. Padding

In the original article on VGG16, the authors write: “The convolution stride is fixed to 1 pixel; the spatial padding of conv. layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is 1 pixel for 3 × 3 conv. layers”.

To do that with Keras we have two options :

  • Add a padding layer before the conv layer: model.add(ZeroPadding2D((1, 1)))
  • Or specify it when adding the conv layer: padding='same' in the arguments, which pads the input so that the output has the same spatial size as the input.
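The two options are equivalent. A minimal sketch (using the tensorflow.keras namespace; with the standalone keras package only the imports change):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import ZeroPadding2D, Conv2D

# Option 1: explicit padding layer, then a conv with no padding ('valid')
m1 = Sequential([
    ZeroPadding2D((1, 1), input_shape=(224, 224, 3)),
    Conv2D(64, (3, 3), activation='relu', padding='valid'),
])

# Option 2: let the conv layer pad its own input
m2 = Sequential([
    Conv2D(64, (3, 3), activation='relu', padding='same',
           input_shape=(224, 224, 3)),
])

# Both preserve the 224 x 224 spatial resolution
print(m1.output_shape)  # (None, 224, 224, 64)
print(m2.output_shape)  # (None, 224, 224, 64)
```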

B. Convolutional layer

The convolution layer for convolution over images in Keras is Conv2D. There are two arguments to specify: filters and kernel_size. The first is the number of filters produced by the convolution; the second is the shape of the patches. (Conv layers are explained in part I!) This information is in the table at the beginning of the post.

For example the first one will be: model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
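As a sanity check, the parameter count of a conv layer can be computed by hand: each filter has kernel_height × kernel_width × input_channels weights, plus one bias. A small illustration (the helper function is mine, not part of Keras):

```python
def conv2d_params(in_channels, filters, kernel=(3, 3)):
    """Weights per filter (kernel area x input channels) plus one bias each."""
    kh, kw = kernel
    return filters * (kh * kw * in_channels + 1)

# First VGG16 conv layer: RGB input (3 channels), 64 filters of 3x3
print(conv2d_params(3, 64))   # 1792
# Second conv layer: 64 input channels, 64 filters
print(conv2d_params(64, 64))  # 36928
```

These two numbers match the first lines of the model.summary() output shown later in the post.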

C. Max Pooling

At the end of each convolutional block there is a max-pooling layer. The Keras function is MaxPooling2D and its arguments are pool_size and strides. The pool_size is the size of the patches (again, if max pooling doesn’t make sense to you, it is explained in the first part of the article). The values for both arguments are given in the article: “Max-pooling is performed over a 2 × 2 pixel window, with stride 2.”

The first max-pooling layer will be model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
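Putting the pieces together, the first convolutional block looks like this (a sketch, assuming a 224 × 224 RGB input as in the paper):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D

model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', padding='same',
                 input_shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

# The 2x2 / stride-2 pooling halves the spatial resolution
print(model.output_shape)  # (None, 112, 112, 64)
```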

2. Fully-connected / Dense block

A. Flatten layer

Because we want the dense layers to work on a 1D vector (for the predictions), their input should be a 1D vector too. That’s why we need a Flatten() layer between the convolutional blocks and the dense block.

B. Dense (or FC) layer

For this layer we just need to specify the output size with the units argument and the activation function that follows. According to the paper, the ReLU activation function is used.

For the first dense layer : Dense(4096, activation='relu')

C. Predictions

Finally, to get our predictions, we use the softmax function as described in the table: model.add(Dense(1000, activation='softmax'))
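The whole dense block then reads as below (a sketch; the input_shape here stands in for the 7 × 7 × 512 feature map coming out of the last pooling layer):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

head = Sequential()
head.add(Flatten(input_shape=(7, 7, 512)))   # 7*7*512 = 25088 values
head.add(Dense(4096, activation='relu'))
head.add(Dense(4096, activation='relu'))
head.add(Dense(1000, activation='softmax'))  # one probability per ImageNet class

print(head.output_shape)  # (None, 1000)
```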

II. Put the pieces together

1 . Write the stack of layers

First we have to check that the layers match the original architecture with model.summary():
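The full stack, following the table from the original article, can be written as follows (a sketch using the tensorflow.keras namespace; with the standalone keras package only the imports change):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# Block 1
model.add(Conv2D(64, (3, 3), activation='relu', padding='same',
                 input_shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Block 2
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Block 3
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Block 4
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Block 5
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Classifier
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))
model.add(Dense(1000, activation='softmax'))

model.summary()
```

Calling model.summary() should print the layer table below, with a total of 138,357,544 parameters.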

 _________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 224, 224, 64)      1792      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 224, 224, 64)      36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 112, 112, 64)      0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 112, 112, 128)     73856     
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 112, 112, 128)     147584    
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 56, 56, 128)       0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 56, 56, 256)       295168    
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 56, 56, 256)       590080    
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 56, 56, 256)       590080    
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 28, 28, 256)       0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 28, 28, 512)       1180160   
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 28, 28, 512)       2359808   
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 28, 28, 512)       2359808   
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 14, 14, 512)       0         
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 14, 14, 512)       2359808   
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 14, 14, 512)       2359808   
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 14, 14, 512)       2359808   
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 7, 7, 512)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 25088)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 4096)              102764544 
_________________________________________________________________
dense_2 (Dense)              (None, 4096)              16781312  
_________________________________________________________________
dense_3 (Dense)              (None, 1000)              4097000   
=================================================================
Total params: 138,357,544.0
Trainable params: 138,357,544.0
Non-trainable params: 0.0
_________________________________________________________________

2. Load the weights

We load the weights with the load_weights method, using the file provided here. The file you have to download is vgg16_weights_tf_dim_ordering_tf_kernels.h5.

3. Fine-tuning

I want to compare the accuracy of my ‘home-made’ model to the Keras model. I already used the VGG16() model of Keras for the Cats-Vs-Dogs-Redux challenge, so I will use the same data.

First we should adapt the VGG16 model (built for the ImageNet competition, which requires 1000 classes) to our classes: we have 2 categories, so the final output should be a vector of length 2.

model.add(Dense(2, activation='softmax'))
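With the Sequential API, one common way to do this is to first drop the 1000-way softmax with model.pop() before adding the new layer (otherwise the 2-class layer would simply be stacked on top of it). A sketch on a toy stand-in model, since rebuilding the full stack here would be overkill:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Toy stand-in for the loaded VGG16 stack, ending in the 1000-way softmax
model = Sequential()
model.add(Dense(16, activation='relu', input_shape=(8,)))
model.add(Dense(1000, activation='softmax'))

# Drop the ImageNet classifier and put a 2-class softmax in its place
model.pop()
model.add(Dense(2, activation='softmax'))

print(model.output_shape)  # (None, 2)
```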

To obtain the best results I am reusing the fine-tuning method of this article.

I put the entire code in this gist. I finally got this accuracy :

Epoch 1/2
703/703 [==============================] - 1106s - loss: 0.0454 - acc: 0.9841 - val_loss: 0.0613 - val_acc: 0.9797
Epoch 2/2
703/703 [==============================] - 1100s - loss: 0.0279 - acc: 0.9906 - val_loss: 0.0601 - val_acc: 0.9805

 

Conclusion

Building VGG16 from scratch is an opportunity to revisit the function of each layer. It is also a chance to learn to use the Sequential() model of Keras. Moreover, it gave 98% accuracy on the validation set, which is quite good!
