In the first part of this article on VGG16 we described the role of each layer in the network. Now we will implement it with Keras. Note that Keras has a pre-trained VGG16 model: I have used it in this article. But this time we will use the
Sequential() model of Keras to build it! If one of the arguments of a Keras function doesn’t make sense to you, refer to part I, where each layer is explained step by step.
Note that we will not train the model but only build its architecture, then use the weights provided here.
The information about the architecture is in this table from the original article:
I. The architecture
1. Convolutional block
A. Padding
In the original article on VGG16, the authors write: “The convolution stride is fixed to 1 pixel; the spatial padding of conv. layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is 1 pixel for 3 × 3 conv. layers”.
To do that with Keras we have two options:
- add a ZeroPadding2D layer before the conv layer;
- or specify padding='same' in the arguments when adding the conv layer: Keras will pad the input so that the output has the same spatial size as the input.
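As a sketch, the two options might look like this (I am using tensorflow.keras imports here; the original post used standalone Keras, but the layer names are the same):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import ZeroPadding2D, Conv2D

# Option 1: pad explicitly with a ZeroPadding2D layer before the conv layer
model = Sequential()
model.add(ZeroPadding2D(padding=(1, 1), input_shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation='relu'))

# Option 2: let Keras compute the padding with padding='same'
model2 = Sequential()
model2.add(Conv2D(64, (3, 3), activation='relu', padding='same',
                  input_shape=(224, 224, 3)))
```

Both options preserve the 224 × 224 spatial resolution after the 3 × 3 convolution.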
B. Convolutional layer
The Keras layer for convolution over images is
Conv2D. There are two arguments to specify: filters and
kernel_size. The first one is the number of filters that result from the convolution. The second one is the shape of the patches. (Conv layers are explained in part I!) This information is in the table at the beginning of the post.
For example the first one will be:
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
C. Max Pooling
At the end of each convolutional block there is a max-pooling layer. The Keras layer is
MaxPooling2D and it takes two arguments:
pool_size, the size of the patches (again, if max pooling doesn’t make sense to you, it is explained in the first part of the article), and
strides. Both values are given in the article: “Max-pooling is performed over a 2 × 2 pixel window, with stride 2.”
The first max-pooling layer will be
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
2. Fully-connected / Dense block
A. Flatten layer
Because we want the
dense layer to return a 1D vector (for the predictions), its input should be a 1D vector. That’s why we need to flatten the input with the
Flatten() layer.
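A minimal sketch (again assuming tensorflow.keras imports): the last 7 × 7 × 512 feature map of VGG16 becomes a vector of length 7 × 7 × 512 = 25088.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten

# Flatten a 7x7x512 feature map into a vector of length 25088
model = Sequential()
model.add(Flatten(input_shape=(7, 7, 512)))
```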
B. Dense (or FC) layer
For this layer we just need to specify the output size with the
units argument and the activation function that follows. According to the paper, it’s the ReLU activation function that is used.
For the first dense layer :
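Following the table, the first dense layer has 4096 units with ReLU; a sketch (the input shape of 25088 is what the Flatten layer produces):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# First dense layer of VGG16: 4096 units with ReLU activation
model = Sequential()
model.add(Dense(4096, activation='relu', input_shape=(25088,)))
```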
Finally, to get our predictions, we use the
softmax function, as described in the table.
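That final layer has one unit per ImageNet class; softmax turns the scores into probabilities. A sketch:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Final layer: 1000 ImageNet classes, softmax gives a probability per class
model = Sequential()
model.add(Dense(1000, activation='softmax', input_shape=(4096,)))
```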
II. Put the pieces together
1. Write the stack of layers
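Following the table from the original article, the full stack might be written like this (a sketch using tensorflow.keras imports; the code in my gist may differ in details):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# Block 1: two 64-filter conv layers, then max pooling
model.add(Conv2D(64, (3, 3), activation='relu', padding='same',
                 input_shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Block 2: two 128-filter conv layers
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Block 3: three 256-filter conv layers
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Block 4: three 512-filter conv layers
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Block 5: three more 512-filter conv layers
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Classifier: flatten, two 4096-unit dense layers, 1000-way softmax
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))
model.add(Dense(1000, activation='softmax'))
```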
First we can check that the layers are the same as in the original architecture with model.summary():
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 224, 224, 64)      1792
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 224, 224, 64)      36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 112, 112, 64)      0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 112, 112, 128)     73856
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 112, 112, 128)     147584
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 56, 56, 128)       0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 56, 56, 256)       295168
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 56, 56, 256)       590080
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 56, 56, 256)       590080
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 28, 28, 256)       0
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 28, 28, 512)       1180160
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 28, 28, 512)       2359808
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 28, 28, 512)       2359808
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 14, 14, 512)       0
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 14, 14, 512)       2359808
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 14, 14, 512)       2359808
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 14, 14, 512)       2359808
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 7, 7, 512)         0
_________________________________________________________________
flatten_1 (Flatten)          (None, 25088)             0
_________________________________________________________________
dense_1 (Dense)              (None, 4096)              102764544
_________________________________________________________________
dense_2 (Dense)              (None, 4096)              16781312
_________________________________________________________________
dense_3 (Dense)              (None, 1000)              4097000
=================================================================
Total params: 138,357,544.0
Trainable params: 138,357,544.0
Non-trainable params: 0.0
_________________________________________________________________
2. Load the weights
We load the weights with the
load_weights method, using the weights provided here. The file to download is: vgg16_weights_tf_dim_ordering_tf_kernels.h5
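The call itself is a one-liner on the model built above: model.load_weights('vgg16_weights_tf_dim_ordering_tf_kernels.h5'). As a self-contained sketch of the save/load round trip (on a tiny stand-in model, since the real weights file must be downloaded first):

```python
import os
import tempfile
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Tiny stand-in model; with VGG16 you would instead pass the path of the
# downloaded 'vgg16_weights_tf_dim_ordering_tf_kernels.h5' file.
model = Sequential([Dense(4, activation='relu', input_shape=(3,))])
path = os.path.join(tempfile.mkdtemp(), 'demo.weights.h5')
model.save_weights(path)

# A second model with the SAME architecture can load those weights
clone = Sequential([Dense(4, activation='relu', input_shape=(3,))])
clone.load_weights(path)
```

Note that load_weights requires the architecture to match the one the weights were saved from, which is why the stack above must follow the table exactly.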
I want to compare the accuracy of my ‘home-made’ model to the Keras model. I have already used the
VGG16() model of Keras for the Cats-Vs-Dogs-Redux challenge, so I will use the same data.
First we should adapt the VGG16 model (built for the ImageNet competition, which requires 1000 classes) to our classes: we have 2 categories, so the final output should be a vector of length 2.
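One way to do this with a Sequential model is to pop the last layer and add a new 2-way softmax (shown here on a tiny stand-in model so the snippet is self-contained; with the real network, model would be the VGG16 stack built above, and freezing the pretrained layers is one common choice before fine-tuning):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Stand-in for the VGG16 stack: the last layer plays the role of the
# 1000-way ImageNet classifier
model = Sequential([
    Dense(8, activation='relu', input_shape=(16,)),
    Dense(1000, activation='softmax'),
])

# Replace the ImageNet classifier with a 2-way softmax for cats vs. dogs
model.pop()
for layer in model.layers:
    layer.trainable = False      # keep the pretrained weights fixed
model.add(Dense(2, activation='softmax'))
```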
To obtain the best results I am reusing the fine-tuning method of this article.
I put the entire code in this gist. I finally got this accuracy:
Epoch 1/2
703/703 [==============================] - 1106s - loss: 0.0454 - acc: 0.9841 - val_loss: 0.0613 - val_acc: 0.9797
Epoch 2/2
703/703 [==============================] - 1100s - loss: 0.0279 - acc: 0.9906 - val_loss: 0.0601 - val_acc: 0.9805
Building VGG16 from scratch is an opportunity to revisit the function of each layer. It is also a chance to learn to use the
Sequential() model of Keras. Moreover, it gave 98% accuracy on the validation set, which is quite good!