Communications on Applied Mathematics and Computation ›› 2021, Vol. 3 ›› Issue (2): 293-311. DOI: 10.1007/s42967-020-00085-3

• ORIGINAL PAPER •

Drop-Activation: Implicit Parameter Reduction and Harmonious Regularization

Senwei Liang1, Yuehaw Khoo2, Haizhao Yang1   

  1 Department of Mathematics, Purdue University, West Lafayette, IN 47907, USA;
  2 Department of Statistics and the College, The University of Chicago, Chicago, IL 60637, USA
  • Received: 2019-12-09  Revised: 2020-03-27  Online: 2021-06-20  Published: 2021-05-26
  • Contact: Haizhao Yang, Senwei Liang, Yuehaw Khoo  E-mail: haizhao@purdue.edu; liang339@purdue.edu; ykhoo@galton.uchicago.edu

Abstract: Overfitting frequently occurs in deep learning. In this paper, we propose a novel regularization method called drop-activation to reduce overfitting and improve generalization. The key idea is to drop nonlinear activation functions by setting them to be identity functions randomly during training time. During testing, we use a deterministic network with a new activation function to encode the average effect of dropping activations randomly. Our theoretical analyses support the regularization effect of drop-activation as implicit parameter reduction and verify its capability to be used together with batch normalization (Ioffe and Szegedy in Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015). The experimental results on CIFAR10, CIFAR100, SVHN, EMNIST, and ImageNet show that drop-activation generally improves the performance of popular neural network architectures for the image classification task. Furthermore, as a regularizer, drop-activation can be used in harmony with standard training and regularization techniques such as batch normalization and AutoAugment (Cubuk et al. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 113-123, 2019). The code is available at https://github.com/LeungSamWai/Drop-Activation.

Key words: Deep learning, Image classification, Overfitting, Regularization
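
The mechanism described in the abstract, keeping each nonlinearity with some probability during training and using the deterministic average of the two branches at test time, can be illustrated with a minimal PyTorch sketch. This is only an illustration of the idea, not the authors' released implementation: the module name DropActivation and the retain-probability argument p are assumed names, and the exact probability convention may differ from the paper's code.

import torch
import torch.nn as nn

class DropActivation(nn.Module):
    # Sketch of drop-activation: during training, each ReLU is kept with
    # probability p and replaced by the identity with probability 1 - p;
    # at test time, the deterministic average p*ReLU(x) + (1 - p)*x is used.
    def __init__(self, p: float = 0.95):
        super().__init__()
        self.p = p  # probability of keeping the nonlinearity (assumed hyperparameter name)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Bernoulli mask per element: 1 keeps the ReLU, 0 drops it to the identity.
            keep = (torch.rand_like(x) < self.p).to(x.dtype)
            return keep * torch.relu(x) + (1.0 - keep) * x
        # Test time: deterministic, leaky-ReLU-like average of the two branches.
        return self.p * torch.relu(x) + (1.0 - self.p) * x

In use, such a module would simply replace the ReLU layers of an existing architecture (e.g., swapping nn.ReLU() for DropActivation(p)); the test-time average is what the abstract calls the "new activation function" that encodes the effect of randomly dropping activations.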
