Communications on Applied Mathematics and Computation ›› 2021, Vol. 3 ›› Issue (2): 293-311. DOI: 10.1007/s42967-020-00085-3

• ORIGINAL PAPER •

Drop-Activation: Implicit Parameter Reduction and Harmonious Regularization

Senwei Liang1, Yuehaw Khoo2, Haizhao Yang1   

  1 Department of Mathematics, Purdue University, West Lafayette, IN 47907, USA;
  2 Department of Statistics and the College, The University of Chicago, Chicago, IL 60637, USA
  • Received: 2019-12-09  Revised: 2020-03-27  Online: 2021-06-20  Published: 2021-05-26
  • Contact: Haizhao Yang, Senwei Liang, Yuehaw Khoo  E-mail: haizhao@purdue.edu; liang339@purdue.edu; ykhoo@galton.uchicago.edu

Abstract: Overfitting frequently occurs in deep learning. In this paper, we propose a novel regularization method called drop-activation to reduce overfitting and improve generalization. The key idea is to drop nonlinear activation functions by setting them to be identity functions randomly during training time. During testing, we use a deterministic network with a new activation function to encode the average effect of dropping activations randomly. Our theoretical analyses support the regularization effect of drop-activation as implicit parameter reduction and verify its capability to be used together with batch normalization (Ioffe and Szegedy in Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015). The experimental results on CIFAR10, CIFAR100, SVHN, EMNIST, and ImageNet show that drop-activation generally improves the performance of popular neural network architectures for the image classification task. Furthermore, as a regularizer, drop-activation can be used in harmony with standard training and regularization techniques such as batch normalization and AutoAugment (Cubuk et al. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 113-123, 2019). The code is available at https://github.com/LeungSamWai/Drop-Activation.

Key words: Deep learning, Image classification, Overfitting, Regularization
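
The mechanism described in the abstract, keeping each nonlinearity with some probability during training and using the deterministic average of the two branches at test time, can be illustrated with a minimal PyTorch sketch. This is only an illustration of the idea, not the authors' released implementation: the module name DropActivation and the retain-probability argument p are assumed names, and the exact probability convention may differ from the paper's code.

import torch
import torch.nn as nn

class DropActivation(nn.Module):
    # Sketch of drop-activation: during training, each ReLU is kept with
    # probability p and replaced by the identity with probability 1 - p;
    # at test time, the deterministic average p*ReLU(x) + (1 - p)*x is used.
    def __init__(self, p: float = 0.95):
        super().__init__()
        self.p = p  # probability of keeping the nonlinearity (assumed hyperparameter name)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Bernoulli mask per element: 1 keeps the ReLU, 0 drops it to the identity.
            keep = (torch.rand_like(x) < self.p).to(x.dtype)
            return keep * torch.relu(x) + (1.0 - keep) * x
        # Test time: deterministic, leaky-ReLU-like average of the two branches.
        return self.p * torch.relu(x) + (1.0 - self.p) * x

In use, such a module would simply replace the ReLU layers of an existing architecture (e.g., swapping nn.ReLU() for DropActivation(p)); the test-time average is what the abstract calls the "new activation function" that encodes the effect of randomly dropping activations.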
