Fine-tuning a pre-trained model with Keras

In the first post, I created a simple model with Keras that gave quite good results: more than 96% accuracy on the Dogs vs. Cats Redux data from Kaggle. However, the accuracy can easily be improved by changing the way I fine-tune the model.

The model is based on the pre-trained VGG16 model. The main idea for improving it is simple: instead of training only the last layer, I will train multiple layers.

NB: I am not going to detail the beginning of the process; it is explained in the first post.

Found 22500 images belonging to 2 classes.
Found 2500 images belonging to 2 classes.

1. Fine-tune the last layer slightly differently

When using include_top=False with the Keras VGG16 model, the final prediction layer is removed, but so are the last two fully-connected (FC) layers. I noticed that keeping these two layers gave me better results, so I specified include_top=True and removed the predictions layer afterwards.

VGG16 is trained on the 1000 categories of ImageNet. We are going to add a dense layer and fit the model so that it adapts to our two categories. At this stage we fine-tune only this last layer.
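The model described above could be built roughly as follows. This is a minimal sketch under assumptions, not necessarily the code behind the results in this post: it assumes the Keras bundled with TensorFlow 2 (tensorflow.keras), the functional API, and the rmsprop optimizer. The names build_model, freeze_all_but_last and cats_vs_dogs are made up for the illustration; fc2 is the name Keras gives to the second fully-connected layer of VGG16.

```python
def freeze_all_but_last(layers):
    """Mark every layer except the last one as non-trainable; return how many were frozen."""
    frozen = layers[:-1]
    for layer in frozen:
        layer.trainable = False
    return len(frozen)


def build_model(num_classes=2, weights="imagenet"):
    """VGG16 with its two FC layers kept and a new softmax layer replacing 'predictions'."""
    # Imported lazily so the helper above can be used without TensorFlow installed.
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.models import Model

    vgg = VGG16(weights=weights, include_top=True)
    # Branch off at 'fc2', i.e. just before the 1000-way ImageNet 'predictions' layer.
    fc2_output = vgg.get_layer("fc2").output
    outputs = Dense(num_classes, activation="softmax", name="cats_vs_dogs")(fc2_output)
    model = Model(inputs=vgg.input, outputs=outputs)

    # Only the new dense layer is trained at this stage.
    freeze_all_but_last(model.layers)
    # Optimizer choice is an assumption; the post does not state it.
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

With such a model, training for three epochs would look like model.fit(train_batches, validation_data=valid_batches, epochs=3), where train_batches and valid_batches are the generators created in the first post.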

Epoch 1/3
22500/22500 [==============================] - 517s - loss: 0.1292 - acc: 0.9589 - val_loss: 0.1002 - val_acc: 0.9668
Epoch 2/3
22500/22500 [==============================] - 517s - loss: 0.0859 - acc: 0.9713 - val_loss: 0.1048 - val_acc: 0.9684
Epoch 3/3
22500/22500 [==============================] - 517s - loss: 0.0620 - acc: 0.9790 - val_loss: 0.0841 - val_acc: 0.9716

2. Fine-tune the other layers

So far, we’ve fine-tuned only the last layer. But we can also fine-tune the rest of the dense layers of our model. We are going to “freeze” the first 10 layers and train the others.
Now that the last layer is already optimized, we can use a lower learning rate, so that the updates stay small and do not wreck the pre-trained weights.
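Here is a hedged sketch of this second stage. The post does not give the optimizer or the exact learning rate, so SGD with momentum and a learning rate of 1e-4 are assumptions, and freeze_first_layers and prepare_second_stage are illustrative names. It again assumes tensorflow.keras from TensorFlow 2.

```python
def freeze_first_layers(layers, n_frozen=10):
    """Freeze the first n_frozen layers and make all the following ones trainable."""
    for index, layer in enumerate(layers):
        layer.trainable = index >= n_frozen
    return [layer.trainable for layer in layers]


def prepare_second_stage(model, n_frozen=10, learning_rate=1e-4):
    """Unfreeze everything after the first n_frozen layers and recompile with a small step size."""
    # Imported lazily so freeze_first_layers can be used without TensorFlow installed.
    from tensorflow.keras.optimizers import SGD

    freeze_first_layers(model.layers, n_frozen)
    # Changes to `trainable` are only taken into account after compiling again.
    model.compile(
        optimizer=SGD(learning_rate=learning_rate, momentum=0.9),  # assumed values
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

After prepare_second_stage(model), a call such as model.fit(train_batches, validation_data=valid_batches, epochs=20) continues training from the weights obtained in step 1.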

Epoch 1/20
22500/22500 [==============================] - 837s - loss: 0.0327 - acc: 0.9888 - val_loss: 0.0749 - val_acc: 0.9752
Epoch 2/20
22500/22500 [==============================] - 836s - loss: 0.0217 - acc: 0.9932 - val_loss: 0.0743 - val_acc: 0.9776
Epoch 3/20
22500/22500 [==============================] - 836s - loss: 0.0162 - acc: 0.9961 - val_loss: 0.0742 - val_acc: 0.9776
Epoch 4/20
22500/22500 [==============================] - 835s - loss: 0.0125 - acc: 0.9974 - val_loss: 0.0743 - val_acc: 0.9768
Epoch 5/20
22500/22500 [==============================] - 835s - loss: 0.0099 - acc: 0.9984 - val_loss: 0.0741 - val_acc: 0.9768
Epoch 6/20
22500/22500 [==============================] - 834s - loss: 0.0081 - acc: 0.9987 - val_loss: 0.0745 - val_acc: 0.9772

3. Predictions

Finally, we can use our model to run predictions on new, unlabeled data.
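As an illustration, the sketch below predicts a class name for every image of the test directory. It is written under assumptions: tensorflow.keras from TensorFlow 2 (where model.predict accepts a generator), VGG16's preprocess_input as the preprocessing used during training, and a class_indices mapping such as {"cats": 0, "dogs": 1} taken from the training generator. The function names are hypothetical.

```python
def to_class_names(predictions, class_indices):
    """Turn rows of class probabilities into class names using a name -> column mapping."""
    index_to_name = {index: name for name, index in class_indices.items()}
    names = []
    for row in predictions:
        best = max(range(len(row)), key=lambda i: row[i])  # column with the highest probability
        names.append(index_to_name[best])
    return names


def predict_class_names(model, test_dir, class_indices, batch_size=64):
    """Return (filename, predicted class name) pairs for every image found under test_dir."""
    # Imported lazily so to_class_names can be used without TensorFlow installed.
    from tensorflow.keras.applications.vgg16 import preprocess_input
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # The preprocessing must match what was used for training (assumed: preprocess_input).
    generator = ImageDataGenerator(preprocessing_function=preprocess_input)
    test_batches = generator.flow_from_directory(
        test_dir,
        target_size=(224, 224),
        batch_size=batch_size,
        class_mode=None,   # unlabeled data: yield images only
        shuffle=False,     # keep predictions aligned with test_batches.filenames
    )
    predictions = model.predict(test_batches)
    return list(zip(test_batches.filenames, to_class_names(predictions, class_indices)))
```

Keeping shuffle=False matters here: otherwise the order of the predictions no longer matches the order of the file names.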

Conclusion

By fine-tuning multiple layers, we improve our first simple model and reach a validation accuracy of almost 98%. However, training accuracy climbs to 99.87% while validation accuracy plateaus around 97.7%, so it looks like we could improve the model a little more by exploring our data or by preventing over-fitting. We’ll talk about that in the next posts 🙂

NB: the entire code can be found here.

A simple classifier using a pre-trained model with Keras

In this article, I am going to create a simple classifier in a few lines of Python. I am using the data from the Dogs vs. Cats Redux Kaggle competition, but the same approach can be used for other image classification tasks.

To build this model I will use Keras. Keras is an API for creating neural networks or using pre-trained networks, and it can run on top of TensorFlow or Theano. I run my script on an AWS machine (P2 instance), but you can run it on any computer (it will just take a little more time…).

0. Setup

To use the main functions of Keras easily, the image directory should have a specific structure: each data subdirectory (train, valid) should contain one folder per class (i.e. per possible prediction).

 ├── sample 
 │   ├── test 
 │   ├── train 
 │   └── valid 
 ├── test 
 │   └── unknown 
 ├── train 
 │   ├── cats 
 │   └── dogs 
 └── valid 
     ├── cats 
     └── dogs

NB: The test directory should also contain a subdirectory, here called unknown, which holds all the test images, because flow_from_directory() only looks for images inside subfolders.

The sample directory is not necessary but it’s useful to test the entire process before you launch it with all the data.
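If you start from the raw Kaggle archive, a small script can create this layout. The sketch below is only one way to do it and rests on assumptions: the training files are named cat.0.jpg, dog.0.jpg, and so on, as in the Kaggle download, and 10% of each class is held out for validation (with 25,000 training images this gives 22,500 and 2,500 images). The function names and the valid_fraction and seed parameters are illustrative.

```python
import random
import shutil
from pathlib import Path


def organize_dataset(raw_train_dir, data_dir, valid_fraction=0.1, seed=0):
    """Copy cat.*.jpg / dog.*.jpg files into train/ and valid/ folders, one folder per class."""
    raw_train_dir, data_dir = Path(raw_train_dir), Path(data_dir)
    counts = {}
    for prefix, class_name in (("cat", "cats"), ("dog", "dogs")):
        files = sorted(raw_train_dir.glob(f"{prefix}.*.jpg"))
        random.Random(seed).shuffle(files)  # reproducible random split
        n_valid = int(len(files) * valid_fraction)
        for split, subset in (("valid", files[:n_valid]), ("train", files[n_valid:])):
            target = data_dir / split / class_name
            target.mkdir(parents=True, exist_ok=True)
            for path in subset:
                shutil.copy2(path, target / path.name)
            counts[(split, class_name)] = len(subset)
    return counts


def organize_test_set(raw_test_dir, data_dir):
    """Copy every test image into test/unknown and return the number of files copied."""
    target = Path(data_dir) / "test" / "unknown"
    target.mkdir(parents=True, exist_ok=True)
    files = sorted(Path(raw_test_dir).glob("*.jpg"))
    for path in files:
        shutil.copy2(path, target / path.name)
    return len(files)
```

The files are copied rather than moved, so the original download stays untouched and the split can be redone with another seed.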

VGG16

For our classifier we are going to use a specific architecture: VGG16. This model was developed for the ImageNet competition by the Visual Geometry Group (VGG) at Oxford, and it contains only 16 weight layers (13 convolutional and 3 fully-connected).

VGG16 architecture (picture from here)

 

(224, 224) is the input image size (height, width in pixels) that VGG16 expects.

1. Generation of batches of data

First, we create batches of data with flow_from_directory(). This article by F. Chollet, the author of Keras, explains the method. We need to split the training, validation, and test data into batches.
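A minimal sketch of this step, assuming tensorflow.keras from a TensorFlow 2 release where ImageDataGenerator is still available. The batch size of 64, the use of VGG16's preprocess_input, and the helper names get_batches and steps_per_epoch are my assumptions for the example.

```python
def steps_per_epoch(n_images, batch_size):
    """Number of batches needed to see every image once (the last batch may be smaller)."""
    return -(-n_images // batch_size)  # ceiling division


def get_batches(directory, batch_size=64, shuffle=True, class_mode="categorical"):
    """Create a generator that yields batches of 224x224 images read from class subfolders."""
    # Imported lazily so steps_per_epoch can be used without TensorFlow installed.
    from tensorflow.keras.applications.vgg16 import preprocess_input
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # preprocess_input applies the same normalization VGG16 was trained with (assumed choice).
    generator = ImageDataGenerator(preprocessing_function=preprocess_input)
    return generator.flow_from_directory(
        directory,
        target_size=(224, 224),
        batch_size=batch_size,
        class_mode=class_mode,
        shuffle=shuffle,
    )
```

Typical calls would be get_batches("train"), get_batches("valid", shuffle=False) and, for the unlabeled test images, get_batches("test", shuffle=False, class_mode=None). Each call prints how many images and classes it found.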

Found 22500 images belonging to 2 classes.
Found 2500 images belonging to 2 classes.

2. Fine-tune the model

VGG16 is trained with the 1000 categories of ImageNet, but we need to customize the model for our categories (cats and dogs). To do that, we fine-tune it. The idea is to remove the last layer (which is the prediction layer), add a dense layer and train this new layer with our data. The other layers of the VGG16 model remain the same.

The Keras documentation gives an example of fine-tuning with another pre-trained model (InceptionV3).

Now that we have frozen the pre-trained layers, we can train the last one (which will be the predictions layer).
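The idea can be sketched as follows. This is an assumption-laden example rather than the exact code of this post: it uses tensorflow.keras from TensorFlow 2, loads VGG16 with include_top=False, and puts a Flatten layer plus a softmax dense layer on top. The rmsprop optimizer and the names freeze and build_simple_classifier are illustrative.

```python
def freeze(layers):
    """Mark the given layers as non-trainable so their pre-trained weights stay fixed."""
    for layer in layers:
        layer.trainable = False
    return len(layers)


def build_simple_classifier(num_classes=2, weights="imagenet"):
    """Frozen VGG16 convolutional base with a new trainable dense prediction layer."""
    # Imported lazily so freeze can be used without TensorFlow installed.
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import Dense, Flatten
    from tensorflow.keras.models import Model

    base = VGG16(weights=weights, include_top=False, input_shape=(224, 224, 3))
    freeze(base.layers)  # the pre-trained layers remain the same

    x = Flatten()(base.output)
    outputs = Dense(num_classes, activation="softmax")(x)  # the new predictions layer
    model = Model(inputs=base.input, outputs=outputs)
    # Optimizer choice is an assumption; the post does not state it.
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Training the new layer for one epoch would then be model.fit(train_batches, validation_data=valid_batches, epochs=1).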

Epoch 1/1
22500/22500 [==============================] - 491s - loss: 0.9527 - acc: 0.9346 - val_loss: 0.5594 - val_acc: 0.9624

3. Predictions

Finally, we can use our model to make predictions on unseen data.
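For the Kaggle competition, the predictions have to be written to a CSV file with an id column and a label column, where label is the predicted probability that the image is a dog. Here is a small sketch of that last step. It assumes the test file names look like unknown/123.jpg and that dog_probs is the "dogs" column of the model's predictions; write_submission is an illustrative name.

```python
import csv
from pathlib import Path


def write_submission(filenames, dog_probs, csv_path):
    """Write an id,label CSV (label = probability of 'dog'), sorted by image id."""
    # The image id is the file name without directory and extension, e.g. unknown/12.jpg -> 12.
    rows = sorted((int(Path(name).stem), float(p)) for name, p in zip(filenames, dog_probs))
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "label"])
        writer.writerows(rows)
    return rows
```

With a generator created with shuffle=False, the file names come from test_batches.filenames and the probabilities from the column of model.predict(test_batches) given by train_batches.class_indices["dogs"].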

Conclusion

We learned how to build a simple model with Keras, and we obtained about 96% validation accuracy with it. However, the final accuracy could be better with a few tips from the next post 🙂

PS: I included the entire code here