Today I decided to learn more about hyperparameter optimization and how to implement it. First, let's explain what a hyperparameter is. A hyperparameter is a value in machine learning that is set before the model is trained, rather than learned from the data. It's a constant that the machine uses while calculating other values, such as the model's weights.
I learned about Gaussian processes and Bayesian optimization, which are applied as follows. Imagine a continuous function. We do not know the shape of the line or curve it will make, but we can probe it by inputting a value and getting a value in return. These pairs can be plotted on a graph as X and Y coordinates. As more points are added, the graph produces an approximation of the actual function. The goal of this process is to find the function's maximum or minimum, ideally the global one, in as few probes as possible.
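The probing loop described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the implementation used for the task: `black_box` is a made-up function standing in for the unknown one, the Gaussian process uses a fixed RBF kernel with a zero mean, and the next probe is picked with an upper confidence bound rule, which is just one of several possible acquisition functions.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_seen, y_seen, x_query, noise=1e-5):
    """Posterior mean and standard deviation of a zero-mean GP."""
    k_ss = rbf_kernel(x_seen, x_seen) + noise * np.eye(len(x_seen))
    k_sq = rbf_kernel(x_seen, x_query)
    solved = np.linalg.solve(k_ss, k_sq)           # K^-1 k*
    mean = solved.T @ y_seen
    var = 1.0 - np.sum(k_sq * solved, axis=0)      # prior variance is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))

def black_box(x):
    """Hypothetical stand-in for the function we can only probe."""
    return np.sin(3 * x) + 0.5 * x

def bayes_opt(f, low, high, n_init=3, n_iter=12, kappa=2.0, seed=0):
    """Maximize f on [low, high] by repeatedly probing where the GP is promising."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(low, high, 200)             # candidate probe locations
    x_seen = rng.uniform(low, high, n_init)        # a few random starting probes
    y_seen = f(x_seen)
    for _ in range(n_iter):
        mean, std = gp_posterior(x_seen, y_seen, grid)
        ucb = mean + kappa * std                   # favor high mean or high uncertainty
        x_next = grid[np.argmax(ucb)]
        x_seen = np.append(x_seen, x_next)
        y_seen = np.append(y_seen, f(x_next))
    best = np.argmax(y_seen)
    return x_seen[best], y_seen[best], x_seen, y_seen

best_x, best_y, xs, ys = bayes_opt(black_box, 0.0, 3.0)
print(best_x, best_y)
```

The `kappa` value controls the trade-off: a larger value spends more probes on regions the model is unsure about, while a smaller one keeps probing near the best value found so far.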
I started off the day by choosing a piece of code I am somewhat familiar with, my Transfer Learning code, and decided to optimize it, or at least try. Over the course of the day I learned that my code was already pretty well optimized and that I could not improve it much without drastically altering it, which was not in the spirit of the challenge I set myself for this task. In the end I was unable to get above 87% accuracy. However, I am much more comfortable with hyperparameter optimization now. I also cleaned up my code a little so it does not take forever to run multiple times.
I will confess I have done this before and with better results, but I did not learn as much. I chose to focus on the learning rate and set it to decay very slowly. I also chose to modify the dropout rate and the number of nodes per layer on three separate layers of the model. I could have included weights and regularization as well, but decided not to, as once again I find myself short on time before the day is up.
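As a rough sketch, a search space over these hyperparameters could be written down like this. Every name and range here is hypothetical and chosen only for illustration, not taken from my actual code, and inverse time decay is assumed as one common way to make a learning rate shrink slowly.

```python
import random

# Hypothetical search space: names and ranges are illustrative only.
SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-2),
    "dropout_rate": (0.1, 0.5),
    "units_layer_1": (64, 512),
    "units_layer_2": (64, 512),
    "units_layer_3": (64, 512),
}

def sample_config(rng):
    """Draw one random configuration from the search space."""
    config = {}
    for name, (low, high) in SEARCH_SPACE.items():
        if isinstance(low, int):
            config[name] = rng.randint(low, high)   # whole number of nodes
        else:
            config[name] = rng.uniform(low, high)   # continuous value
    return config

def decayed_learning_rate(initial_lr, step, decay_rate=0.01, decay_steps=100):
    """Inverse time decay: the rate shrinks slowly as training steps grow."""
    return initial_lr / (1 + decay_rate * step / decay_steps)

rng = random.Random(0)
config = sample_config(rng)
print(config)
print(decayed_learning_rate(config["learning_rate"], step=1000))
```

An optimizer would propose configurations like `config` instead of sampling them at random, train the model with each one, and record the resulting metric.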
I chose to use the evaluation loss of my model as the metric to measure it by. I could have just as easily chosen evaluation accuracy, but at the end of the day I liked the graph that the loss produced slightly more than the accuracy graph, as the data points were more diverse and not a flat line. I know that if I had chosen a different model or rewritten my code significantly, I would have had a curve instead of a flat line.
I think I would like to play around with optimization more, but there is much more to learn, and, as I learned today, a model can only be optimized so much. I would say that if I failed today, it was only because of my choice to use something I was familiar with instead of learning something new.
Hyperparameter optimization is an excellent way to automate the tuning of a machine learning model. I find it quite enjoyable and see several parallels with physics. I especially like how you may not know the exact black box function you are working with, but you can narrow down the probability of what it is by measuring it. I guess the difference is that measuring it does not change the properties of the function, while it does in quantum physics. I think it would be fun to use the Gaussian process with Bayesian optimization to try to simulate the “hyperparameters” of the universe, such as pi and e. For now my studies must remain focused on machine learning, but hopefully some day I will get around to physics. I enjoyed this challenge and learned quite a lot from it. I wish I had more time, but part of the challenge was seeing what I could achieve in a day.
The code (specifically task 6)