In the first post, I created a simple model with Keras that gave quite good results: more than 96% accuracy on the Dogs vs. Cats Redux data from Kaggle. However, the accuracy can easily be improved by changing the way I fine-tune the model.
The model is based on the pre-trained VGG16 model. The main idea for improving it is simple: instead of training only the last layer, I will train multiple layers.
NB: I am not going to detail the beginning of the process; it is explained in the first post.
Found 22500 images belonging to 2 classes.
Found 2500 images belonging to 2 classes.
1. Fine-tune the last layer slightly differently
With include_top=False in the Keras VGG16 model, the final layer is removed, but so are the last two FC (fully-connected) layers (more about this). I noticed that keeping these two layers gave me better results, so I specified include_top=True and removed the predictions layer afterwards.
The VGG-16 model is trained on the 1000 categories of ImageNet. We are going to add a dense layer and fit the model so that it adapts to our two categories. At this stage, we only fine-tune this last layer.
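As a rough illustration of this step, here is a minimal sketch written against the `tensorflow.keras` API, which may differ from the Keras version used for this post. The function name `build_model`, the layer name `cats_dogs`, the RMSprop optimizer and its learning rate are my own assumptions, not necessarily what the original code used.

```python
from tensorflow import keras


def build_model(weights="imagenet", num_classes=2):
    """VGG16 with fc1/fc2 kept and the 1000-way output replaced (sketch)."""
    # include_top=True keeps fc1, fc2 and the ImageNet "predictions" layer.
    base = keras.applications.VGG16(include_top=True, weights=weights)

    # Drop "predictions" by branching off the layer just before it (fc2).
    features = base.layers[-2].output

    # New dense layer for our two categories ("cats_dogs" is a made-up name).
    outputs = keras.layers.Dense(
        num_classes, activation="softmax", name="cats_dogs"
    )(features)
    model = keras.Model(inputs=base.input, outputs=outputs)

    # Freeze everything except the new last layer.
    for layer in model.layers[:-1]:
        layer.trainable = False

    # Optimizer and learning rate are assumptions for illustration.
    model.compile(
        optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_model()` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random weights, which is handy for checking the layer structure.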
Epoch 1/3
22500/22500 [==============================] - 517s - loss: 0.1292 - acc: 0.9589 - val_loss: 0.1002 - val_acc: 0.9668
Epoch 2/3
22500/22500 [==============================] - 517s - loss: 0.0859 - acc: 0.9713 - val_loss: 0.1048 - val_acc: 0.9684
Epoch 3/3
22500/22500 [==============================] - 517s - loss: 0.0620 - acc: 0.9790 - val_loss: 0.0841 - val_acc: 0.9716
2. Fine-tune the other layers
So far, we have only fine-tuned the last layer, but we can also fine-tune earlier layers of the model. We are going to “freeze” the first 10 layers and train all the others.
Now that the last layer is already optimized, we can use a lower learning rate.
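The freezing step could look something like the sketch below, again assuming the `tensorflow.keras` API. The helper name `unfreeze_after`, the SGD optimizer with momentum, and the value `1e-5` are assumptions chosen for illustration; the post only says that the learning rate is lower than before.

```python
from tensorflow import keras


def unfreeze_after(model, n_frozen=10, learning_rate=1e-5):
    """Freeze the first n_frozen layers and make the rest trainable (sketch)."""
    for layer in model.layers[:n_frozen]:
        layer.trainable = False
    for layer in model.layers[n_frozen:]:
        layer.trainable = True

    # Recompile so the new trainable flags are taken into account.
    # Optimizer choice and learning rate are assumptions, not the post's values.
    model.compile(
        optimizer=keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

The model has to be compiled again after changing `trainable`, otherwise training keeps using the previous set of trainable weights.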
Epoch 1/20
22500/22500 [==============================] - 837s - loss: 0.0327 - acc: 0.9888 - val_loss: 0.0749 - val_acc: 0.9752
Epoch 2/20
22500/22500 [==============================] - 836s - loss: 0.0217 - acc: 0.9932 - val_loss: 0.0743 - val_acc: 0.9776
Epoch 3/20
22500/22500 [==============================] - 836s - loss: 0.0162 - acc: 0.9961 - val_loss: 0.0742 - val_acc: 0.9776
Epoch 4/20
22500/22500 [==============================] - 835s - loss: 0.0125 - acc: 0.9974 - val_loss: 0.0743 - val_acc: 0.9768
Epoch 5/20
22500/22500 [==============================] - 835s - loss: 0.0099 - acc: 0.9984 - val_loss: 0.0741 - val_acc: 0.9768
Epoch 6/20
22500/22500 [==============================] - 834s - loss: 0.0081 - acc: 0.9987 - val_loss: 0.0745 - val_acc: 0.9772
Finally, we can use our model to run predictions on new, unlabeled data.
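A prediction step might be sketched as follows. The helper name `predict_dog_probability` is hypothetical, and `dog_index=1` assumes that the classes were indexed alphabetically (cats = 0, dogs = 1), which should be checked against the class indices of the training generator.

```python
import numpy as np


def predict_dog_probability(model, images, dog_index=1):
    """Return the predicted probability of the "dog" class for each image."""
    # images: a batch preprocessed exactly like the training data.
    probs = model.predict(np.asarray(images), verbose=0)
    # Keep only the column of the assumed "dog" class.
    return probs[:, dog_index]
```

The result is one value between 0 and 1 per image, which can then be written to a submission file.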
By fine-tuning multiple layers, we improve our first simple model and reach a validation accuracy of almost 98%. However, it looks like we could improve it a little more by exploring our data or by preventing the model from over-fitting … we’ll talk about that in the next posts 🙂
NB: the entire code can be found here.