My Introduction to Transfer Learning

John Cook
4 min read · Jul 3, 2020

Abstract:

I built a Keras model, using transfer learning, that is about 87% accurate on the CIFAR-10 data set; my goal was at least 88%. I knew that to get reasonable accuracy it is best to preprocess the data to match the pretrained network's original training data as closely as possible. My constraints were to use Python 3.5.2 and TensorFlow 1.12.0, and to import only tensorflow.keras as K. I believe that with some fine-tuning of the learning rate and a dropout layer or two I could have reached around 90% accuracy, but I ran out of time. I would also note that I am limited by what my personal computer can do at this time.

Introduction:

The goal of this exercise was to create a Keras model, using transfer learning, that is at least 88% accurate on the CIFAR-10 data set. I was able to reach 87% without any overfitting countermeasures, aside from an adaptive learning rate that has not been fine-tuned yet. After talking with my peers and mentor, and after trying some other architectures that either did not converge or trained too slowly, I settled on the Xception architecture.
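For context, this is a minimal sketch of the kind of adaptive learning rate I mean, assuming the TensorFlow 1.12 tf.keras API; the starting rate and decay factor are placeholder values, not tuned ones.

    import tensorflow.keras as K

    def lr_schedule(epoch):
        """Inverse-time decay: shrink the step size as epochs accumulate."""
        initial_lr = 1e-3  # placeholder starting rate, not tuned
        return initial_lr / (1 + 0.5 * epoch)

    # Passed to model.fit(...) via callbacks=[lr_callback]
    lr_callback = K.callbacks.LearningRateScheduler(lr_schedule)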

Materials and Methods:

My constraints were to use Python 3.5.2 and TensorFlow 1.12.0, and to import only tensorflow.keras as K. Starting out, I knew I wanted to match the CIFAR-10 data to the chosen architecture as closely as possible without exceeding my system's limits. I am working on a virtual machine with about 32 GB of RAM and 6 processor cores that run at around 3.6 GHz on average. I was also given only about a week to meet the goal, having only a basic understanding of CNNs at the time.

As stated, I knew I had to match the data to the architecture as closely as possible, so I use a Lambda layer to resize the input images to seven times their original scale (32x32 up to 224x224), close to the input size the chosen architecture was pretrained on. This alone results in about 68% accuracy without any additional measures. I then save this model, load it back, and run a prediction with it; this saves about an hour and a half of training time on each subsequent run, although the preprocessing and prediction still take an hour. Finally, I added a few fully connected layers on top and got results close to my original goal.
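Putting those steps together, a minimal sketch of the pipeline might look like the following, assuming the TF 1.12 tf.keras API. The layer sizes, optimizer settings, batch size, and epoch count here are illustrative placeholders rather than the exact values I used.

    import tensorflow.keras as K

    def preprocess_data(X, Y):
        """Scale pixels the way Xception expects and one-hot encode labels."""
        X_p = K.applications.xception.preprocess_input(X.astype('float32'))
        Y_p = K.utils.to_categorical(Y, 10)
        return X_p, Y_p

    (x_train, y_train), (x_test, y_test) = K.datasets.cifar10.load_data()
    x_train, y_train = preprocess_data(x_train, y_train)
    x_test, y_test = preprocess_data(x_test, y_test)

    # Lambda layer upscales the 32x32 CIFAR-10 images by a factor of 7,
    # to 224x224, close to the resolution Xception was pretrained on.
    inputs = K.Input(shape=(32, 32, 3))
    resized = K.layers.Lambda(
        lambda x: K.backend.resize_images(x, 7, 7, 'channels_last'))(inputs)

    base = K.applications.Xception(include_top=False, weights='imagenet',
                                   input_tensor=resized, pooling='avg')
    for layer in base.layers:
        layer.trainable = False  # freeze the pretrained weights

    # A few fully connected layers on top of the frozen features.
    x = K.layers.Dense(512, activation='relu')(base.output)
    outputs = K.layers.Dense(10, activation='softmax')(x)

    model = K.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer=K.optimizers.Adam(lr=1e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, batch_size=128, epochs=5,
              validation_data=(x_test, y_test))
    model.save('cifar10.h5')

Because the Lambda resize lives inside the model itself, the saved .h5 file carries that step with it, and a loaded copy can run predictions on raw 32x32 images directly.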

Results

After much trial and error, I had to settle for 87% accuracy, just shy of the original goal, because I ran out of time. However, I am pleased with the results given the complications that arose along the way. The preprocessing takes just over an hour on my VM, while the actual training takes mere seconds. I am pleased with this because, if it were in the scope of the exercise, I could save the predictions to a CSV or pickle file and load them from disk, allowing for rapid training and further development.
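That caching idea would only take a few lines. Here is a hypothetical sketch; numpy is outside the exercise's import constraint, so this is what I would have done rather than what I did, and base and x_train refer to the pipeline sketch above.

    import numpy as np

    # Run the frozen convolutional base once and cache its outputs; the
    # slow, hour-long pass then only ever happens on the first run.
    features = base.predict(x_train, batch_size=128)
    np.save('train_features.npy', features)

    # On later runs, skip the expensive pass and train the dense head
    # directly against the cached features loaded from disk.
    features = np.load('train_features.npy')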

Discussion

The most difficult part of this process was saving the model and then loading it to make a prediction. I would have liked to use other libraries to save the predictions to disk and speed up development, but that was outside the constraints of the exercise. I also struggled with choosing an architecture; I tried several before deciding to stick with the one my mentor recommended after a discussion on the topic. I would also like to use dropout layers, but I get an attribute error that has to do with how I am saving and loading the model and how we were required to import Keras. I would like to add at least one dropout layer because I know I have an overfitting problem. I also cannot, at this time, get the main file that was supplied to us to print out the loss and accuracy, and I am unsure why. Despite not having a proper server to work on, this project was educational, and I believe I could reach at least 90% accuracy given more time. I would also have liked to get TensorBoard working, but due to time constraints I did not make an attempt.
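For reference, this is where a dropout layer would slot into the dense head from the earlier sketch, once the save/load issue is resolved; the 0.3 rate is a guess, not a tuned value.

    import tensorflow.keras as K

    # Hypothetical head with dropout between the dense layers; 'base' is
    # the frozen Xception model from the pipeline sketch above.
    x = K.layers.Dense(512, activation='relu')(base.output)
    x = K.layers.Dropout(0.3)(x)  # randomly zero 30% of activations
    outputs = K.layers.Dense(10, activation='softmax')(x)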
