I recently gave a seminar talk on model compression in machine learning. It gives an overview of different techniques developed to compress the final model obtained after training, in order to reduce its memory footprint and speed up prediction. I covered pruning methods such as Optimal Brain Damage (OBD) and Optimal Brain Surgeon (OBS), primarily for tree-based models and ANNs, Deep Compression for DNNs, and a generic method known as Knowledge Distillation by Geoffrey Hinton. This talk was part of the Advanced Machine Learning seminar at the University of Heidelberg in the summer term 2016.
You can find a copy of the slides on this website and download the seminar report if you are interested in further reading. Note that the latter focuses on ANNs. The following gives a short summary of what to expect:
Over the last decades, very powerful algorithms for prediction on various kinds of problems have been developed, but until recently, people have mostly spent their effort on improving predictive performance. Unfortunately, deploying a model in practice requires considering far more factors than just the model’s predictive performance. Often, the available hardware is very limited, especially on mobile and embedded devices, and other metrics such as memory complexity, time complexity or energy consumption become increasingly important.
This report provides an overview of different model compression approaches, specifically Neural Network Pruning, Knowledge Distillation and Deep Compression. All of the methods discussed are able to compress artificial neural networks, while some of them are much more generic and able to compress other types of models as well. Typically, these techniques allow for a significant reduction in memory footprint and impressive speed-ups for both training and prediction, with little or no loss in predictive performance.
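To give a rough feeling for what pruning means in practice, here is a minimal sketch of magnitude-based pruning: the weights with the smallest absolute values are set to zero. This is only an illustration and is not taken from the report. The function name `prune_by_magnitude`, the `sparsity` parameter and the toy weight list are all made up for this example. Note that OBD and OBS do not use plain magnitudes but rank weights using second-order information about the loss.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries zeroed.

    `sparsity` is the fraction of weights to remove (hypothetical parameter,
    chosen for this illustration only).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")

    n_prune = int(len(weights) * sparsity)

    # Indices ordered from smallest to largest absolute weight.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))

    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Toy example: remove half of the weights of a tiny "layer".
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = prune_by_magnitude(weights, sparsity=0.5)
remaining = sum(1 for w in pruned if w != 0.0)
print(pruned, remaining)
```

The zeroed weights only save memory once the result is stored in a sparse format, and in practice the network is usually retrained after pruning to recover accuracy.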
The most important contributions are linked in the first paragraph. Needless to say, there is much more! If you do not know where to start, maybe the following list will help:
- Do Deep Nets Really Need to be Deep?
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
- Github: SqueezeNet-Deep-Compression
- Good summary of different methods in the section Related Work
- Pruning of Neural Networks
- Pruning algorithms of neural networks — a comparative study
- A Comparative Study of Neural Network Optimization Techniques
- Compression of Deep Convolutional Neural Networks for Fast And Low Power Mobile Applications