ORIGINAL PAPER

Drop-Activation: Implicit Parameter Reduction and Harmonious Regularization

1 Department of Mathematics, Purdue University, West Lafayette, IN 47907, USA; 2 Department of Statistics and the College, The University of Chicago, Chicago, IL 60637, USA

Received date: 2019-12-09

Revised date: 2020-03-27

Online published: 2021-05-26

Abstract

Overfitting frequently occurs in deep learning. In this paper, we propose a novel regularization method called drop-activation to reduce overfitting and improve generalization. The key idea is to drop nonlinear activation functions by setting them to be identity functions randomly during training time. During testing, we use a deterministic network with a new activation function to encode the average effect of dropping activations randomly. Our theoretical analyses support the regularization effect of drop-activation as implicit parameter reduction and verify its capability to be used together with batch normalization (Ioffe and Szegedy in Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015). The experimental results on CIFAR10, CIFAR100, SVHN, EMNIST, and ImageNet show that drop-activation generally improves the performance of popular neural network architectures for the image classification task. Furthermore, as a regularizer, drop-activation can be used in harmony with standard training and regularization techniques such as batch normalization and AutoAugment (Cubuk et al. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 113-123, 2019). The code is available at https://github.com/LeungSamWai/Drop-Activation.
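To make the train/test recipe in the abstract concrete, below is a minimal PyTorch-style sketch. It is an illustration under stated assumptions, not the authors' released implementation (see the linked repository for that): we assume element-wise Bernoulli sampling with keep probability p, so each ReLU is independently replaced by the identity with probability 1 - p during training, and the test-time activation is the expectation p * relu(x) + (1 - p) * x, which is a leaky ReLU with negative slope 1 - p.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DropActivation(nn.Module):
    # Sketch of a drop-activation layer: during training, each ReLU is
    # randomly dropped to the identity; at test time the layer applies
    # the deterministic average p * relu(x) + (1 - p) * x.
    def __init__(self, p=0.95):
        super().__init__()
        self.p = p  # probability of keeping the nonlinearity

    def forward(self, x):
        if self.training:
            # Element-wise Bernoulli mask: 1 -> apply ReLU, 0 -> identity
            mask = torch.bernoulli(torch.full_like(x, self.p))
            return mask * F.relu(x) + (1.0 - mask) * x
        # Test time: the expectation p * relu(x) + (1 - p) * x equals a
        # leaky ReLU with negative slope 1 - p.
        return F.leaky_relu(x, negative_slope=1.0 - self.p)

In use, each nn.ReLU() in an architecture such as ResNet would be swapped for DropActivation(p); setting p = 1 recovers the standard ReLU network, while p = 0 makes every activation linear.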

Cite this article

Senwei Liang, Yuehaw Khoo, Haizhao Yang. Drop-Activation: Implicit Parameter Reduction and Harmonious Regularization[J]. Communications on Applied Mathematics and Computation, 2021, 3(2): 293-311. DOI: 10.1007/s42967-020-00085-3

References

1. Cohen, G., Afshar, S., Tapson, J., van Schaik, A.: EMNIST: an extension of MNIST to handwritten letters (2017). arXiv:1702.05373
2. Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., Le, Q.V.: AutoAugment: learning augmentation strategies from data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 113-123 (2019)
3. DeVries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout (2017). arXiv:1708.04552
4. Gastaldi, X.: Shake-shake regularization (2017). arXiv:1705.07485
5. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778 (2016)
6. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: European Conference on Computer Vision, pp. 630-645. Springer, Cham (2016)
7. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132-7141 (2018)
8. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700-4708 (2017)
9. Huang, G., Sun, Y., Liu, Z., Sedra, D., Weinberger, K.Q.: Deep networks with stochastic depth. In: European Conference on Computer Vision, pp. 646-661. Springer, Cham (2016)
10. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift (2015). arXiv:1502.03167
11. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical Report TR-2009, University of Toronto, Toronto (2009)
12. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097-1105 (2012)
13. Krueger, D., Maharaj, T., Kramár, J., Pezeshki, M., Ballas, N., Ke, N.R., Goyal, A., Bengio, Y., Courville, A., Pal, C.: Zoneout: regularizing RNNs by randomly preserving hidden activations (2016). arXiv:1606.01305
14. Lee, J., Xiao, L., Schoenholz, S., Bahri, Y., Novak, R., Sohl-Dickstein, J., Pennington, J.: Wide neural networks of any depth evolve as linear models under gradient descent. In: Advances in Neural Information Processing Systems, pp. 8570-8581 (2019)
15. Li, X., Chen, S., Hu, X., Yang, J.: Understanding the disharmony between dropout and batch normalization by variance shift. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2682-2690 (2019)
16. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Granada, Spain (2011)
17. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Li, F.F.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211-252 (2015)
18. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv:1409.1556
19. Singh, S., Hoiem, D., Forsyth, D.: Swapout: learning an ensemble of deep architectures. In: Advances in Neural Information Processing Systems, pp. 28-36 (2016)
20. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929-1958 (2014)
21. Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: International Conference on Machine Learning, pp. 1139-1147 (2013)
22. Wager, S., Wang, S., Liang, P.S.: Dropout training as adaptive regularization. In: Advances in Neural Information Processing Systems, pp. 351-359 (2013)
23. Wan, L., Zeiler, M., Zhang, S., LeCun, Y., Fergus, R.: Regularization of neural networks using DropConnect. In: International Conference on Machine Learning, pp. 1058-1066 (2013)
24. Xie, L., Wang, J., Wei, Z., Wang, M., Tian, Q.: DisturbLabel: regularizing CNN on the loss layer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4753-4762 (2016)
25. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492-1500 (2017)
26. Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in convolutional network (2015). arXiv:1505.00853
27. Yamada, Y., Iwamura, M., Akiba, T., Kise, K.: ShakeDrop regularization for deep residual learning (2018). arXiv:1802.02375
28. Zagoruyko, S., Komodakis, N.: Wide residual networks (2016). arXiv:1605.07146
29. Zeiler, M.D., Fergus, R.: Stochastic pooling for regularization of deep convolutional neural networks (2013). arXiv:1301.3557
30. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision, pp. 818-833. Springer, Cham (2014)
31. Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: Mixup: beyond empirical risk minimization (2017). arXiv:1710.09412