Regularization for machine learning in terms a child could understand.

John Cook
3 min read · May 30, 2020

I am going to explain what regularization means for supervised learning, in very, very simple terms, without much math. Or in other words, a few ways to make computers smarter. I will be using simple terms because I was asked to do so. So, with that out of the way, here is what this article will cover: L1 regularization, L2 regularization, dropout, data augmentation (doing more with less), and early stopping.

First, let's explain why we need to regularize. Regularization is basically the process of studying, but not studying any one thing too much. If we spend all our time on English homework and none on math, we can never hope to do cool machine learning. If we do the opposite, we will not be able to write articles about code. So the real goal of regularization is to find a happy balance and not over-train in one area. It turns out computers learn in a somewhat predictable way, and if they study one thing too much they end up bad at other things. If we want them to do more things, we need regularization.

Now that we have explained why we care about regularization, let's explain what L1 and L2 regularization are. L1 regularization makes it so that sometimes you do not study one thing at all. Whereas L2 regularization makes it so that you almost do not study it at all. If I were to offer a six-year-old no ice cream or almost no ice cream, they would appreciate the difference, even though they would want more either way. When I say "sometimes," I mean that some of the things the model pays attention to (its weights) can be pushed all the way down to nothing by L1, or pushed down to almost nothing by L2. So the main distinction is: L1 is possibly nothing, and L2 is possibly, almost, nothing.
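If you are curious what this looks like in code (the analogy above is still the important part), here is a minimal sketch using Keras. The layer sizes and the penalty strength of 0.01 are made-up numbers I picked just for illustration, not a recommendation.

```python
# A tiny sketch of L1 vs L2 regularization with Keras.
# The penalty strength (0.01) and layer sizes are illustrative, not tuned.
import tensorflow as tf

model = tf.keras.Sequential([
    # L1 penalty: can push some weights all the way to zero ("possibly nothing").
    tf.keras.layers.Dense(
        64, activation="relu", input_shape=(20,),
        kernel_regularizer=tf.keras.regularizers.l1(0.01)),
    # L2 penalty: pushes weights toward zero but rarely exactly zero
    # ("possibly, almost, nothing").
    tf.keras.layers.Dense(
        64, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```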

Dropout is another way of training. It's like learning your neighborhood and all the paths through it. If you want to get to one place, there are many different ways to get there. So let's say there is construction on the quickest path to where you are going. You are smart and know another way to get there. Computers can learn in a similar way: if you temporarily take away some of their "neurons" during training, they will find new paths to get to where they are going. This can actually make them generalize better, and it acts as a form of regularization.
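Again, purely optional for the curious: a minimal Keras sketch of dropout. The 0.5 rate and layer sizes are illustrative numbers I chose, not part of any recipe.

```python
# A tiny sketch of dropout with Keras.
# Dropout(0.5) randomly ignores half of a layer's "neurons" on each training
# step, so the network has to learn more than one path to the answer.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.5),  # half the neurons are switched off each step
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```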

Next is data augmentation. Let's say we want to teach a computer to recognize a cat, but we only have one cat and a handful of pictures of it. We can flip, rotate, and crop those pictures so the cat shows up in many different positions, and this teaches the computer what a cat is better than just one picture of one cat. The more pictures or data we have, the better the computer learns. So by making the most of the cat pictures we do have, it gets better at learning what a cat is.
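If you want to see how "one cat, many pictures" looks in code, here is a minimal sketch using Keras preprocessing layers. The flip, rotation, and zoom amounts are illustrative, and `images` is a hypothetical batch of photos you would supply yourself.

```python
# A tiny sketch of image data augmentation with Keras preprocessing layers.
# Each original cat photo is randomly flipped, rotated, and zoomed, so one
# picture becomes many slightly different training examples.
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),  # mirror the cat left/right
    tf.keras.layers.RandomRotation(0.1),       # tilt it a little
    tf.keras.layers.RandomZoom(0.1),           # move a bit closer or farther
])

# Hypothetical usage: `images` is a batch shaped (batch, height, width, channels).
# augmented = augment(images, training=True)
```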

Early stopping is like when you are studying and get sleepy: maybe you know what you know, but learning new things is hard, and you start memorizing instead of understanding. The same sort of thing happens with computers. If a model trains for too long on the same examples, it starts memorizing them and does not perform as well on new examples it has not seen. So we want to stop the computer before it gets too "tired," which in practice means stopping once it stops improving on data held out for checking.
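For the curious, here is a minimal sketch of early stopping as a Keras callback. The `patience=3` value is an illustrative choice, and `x_train`, `y_train`, and `model` are hypothetical names standing in for your own data and compiled model.

```python
# A tiny sketch of early stopping with a Keras callback.
# Training halts once performance on held-out data stops improving,
# i.e. before the model gets "sleepy" and starts memorizing.
import tensorflow as tf

stop_early = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # watch the loss on data the model does not train on
    patience=3,                  # allow 3 epochs with no improvement before stopping
    restore_best_weights=True,   # go back to the best version the model reached
)

# Hypothetical usage, assuming x_train, y_train, and a compiled `model` exist:
# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=100, callbacks=[stop_early])
```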

To cover everything we have learned: computers can learn in different ways, and regularization keeps their education well balanced so that they can perform well at all tasks. L1 is possibly nothing, while L2 is possibly, almost, nothing. Dropout is about finding new paths when the usual path is blocked. Data augmentation is making the best of what you have, and early stopping is stopping before the computer gets "sleepy." If you already know about these topics, congratulations, and I hope you could explain them in such simple terms too. I know my analogies are not perfect, but I was asked to be as simple as possible. If you do not know much about computers or machine learning, then I hope this article was informative. Thanks for reading!
