Abstract:
Penalty-based regularization is extremely popular in ML. However, this powerful technique can require an expensive trial-and-error process for tuning the penalty coefficient. In this paper, we take sparse training of deep neural networks as a case study to illustrate the advantages of a constrained optimization approach: improved tunability, and a more interpretable hyperparameter. Our proposed technique (i) has a negligible computational overhead, (ii) reliably achieves arbitrary sparsity targets “in one shot” while retaining high accuracy, and (iii) scales successfully to large residual models and datasets.
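As a rough schematic (not the paper's exact formulation; the sparsity measure and symbols below are illustrative), the penalty approach folds the sparsity pressure into the objective through a coefficient λ whose relationship to the achieved sparsity is indirect, whereas the constrained approach exposes the sparsity target k directly and treats λ as a Lagrange multiplier in a min-max problem:

```latex
% Penalty approach: lambda must be tuned by trial and error.
\min_{\theta} \; L(\theta) + \lambda \, \|\theta\|_0

% Constrained approach: the hyperparameter is the sparsity target k itself,
% typically tackled via the associated Lagrangian (in practice with a
% differentiable surrogate for the \ell_0 "norm").
\min_{\theta} \; L(\theta) \quad \text{s.t.} \quad \|\theta\|_0 \le k
\qquad\Longleftrightarrow\qquad
\min_{\theta} \, \max_{\lambda \ge 0} \; L(\theta) + \lambda \bigl( \|\theta\|_0 - k \bigr)
```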
In this talk, I will also give a brief introduction to Cooper, a general-purpose, deep learning-first library for constrained optimization in PyTorch. Cooper was developed as part of the research direction above, and was born out of the need to handle constrained optimization problems for which the loss or constraints may not be “nicely behaved” or “theoretically tractable”, as is often the case in DL.
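To give a flavour of the primal-dual pattern that a library like Cooper automates, here is a minimal, self-contained PyTorch sketch of simultaneous gradient descent on the model parameters and gradient ascent on a Lagrange multiplier. This is not Cooper's API; all names (constraint_defect, target_density, etc.) are hypothetical, and the density proxy is only a stand-in for a proper sparsity constraint.

```python
# Illustrative sketch (NOT Cooper's API) of a toy constrained problem:
#   min_theta  loss(theta)   s.t.   density(theta) <= target_density
import torch

torch.manual_seed(0)
X, y = torch.randn(256, 20), torch.randn(256, 1)
model = torch.nn.Linear(20, 1)

target_density = 0.5  # interpretable hyperparameter: desired fraction of "active" weights

def constraint_defect(m):
    # Soft, differentiable proxy for weight density (illustrative only).
    density = torch.sigmoid(m.weight.abs()).mean()
    return density - target_density  # positive value => constraint violated

primal_opt = torch.optim.SGD(model.parameters(), lr=1e-2)
multiplier = torch.zeros(1, requires_grad=True)                   # Lagrange multiplier, lambda >= 0
dual_opt = torch.optim.SGD([multiplier], lr=1e-2, maximize=True)  # gradient ASCENT on lambda

for step in range(1000):
    primal_opt.zero_grad()
    dual_opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X), y)
    defect = constraint_defect(model)
    lagrangian = loss + multiplier * defect
    lagrangian.backward()
    primal_opt.step()   # descent step on the model parameters
    dual_opt.step()     # ascent step on the multiplier
    with torch.no_grad():
        multiplier.clamp_(min=0.0)  # project lambda back to the feasible set (inequality constraint)
```

The key point of the pattern: the multiplier grows automatically while the constraint is violated and shrinks toward zero once it is satisfied, so the user specifies the constraint level (here, a target density) rather than hand-tuning a penalty coefficient.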
Discussion