ORIGINAL PAPER

Learning the Mapping $x \mapsto \sum_{i=1}^{d} x_i^2$: the Cost of Finding the Needle in a Haystack

1. Department of Mathematics, University of California, Berkeley, Berkeley, CA 94720, USA
2. Department of Mathematics, University of Wisconsin-Madison, Madison, WI 53706, USA
3. Department of Mathematics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
4. Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

Received date: 2020-02-24

Revised date: 2020-05-16

Online published: 2021-05-26

Abstract

The task of using machine learning to approximate the mapping $x \mapsto \sum_{i=1}^{d} x_i^2$ with $x_i \in [-1, 1]$ seems to be a trivial one. Given the knowledge of the separable structure of the function, one can design a sparse network to represent the function very accurately, or even exactly. When such structural information is not available, and we may only use a dense neural network, the optimization procedure to find the sparse network embedded in the dense network from a given number of samples of the function is similar to finding a needle in a haystack. We demonstrate that the cost (measured by sample complexity) of finding the needle is directly related to the Barron norm of the function. While only a small number of samples are needed to train a sparse network, the dense network trained with the same number of samples exhibits a large test loss and a large generalization gap. To control the size of the generalization gap, we find that the use of explicit regularization becomes increasingly important as $d$ increases. The numerically observed sample complexity with explicit regularization scales as $\mathcal{O}(d^{2.5})$, which is in fact better than the theoretically predicted sample complexity that scales as $\mathcal{O}(d^{4})$. Without explicit regularization (i.e., relying only on implicit regularization), the numerically observed sample complexity is significantly higher and is close to $\mathcal{O}(d^{4.5})$.
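To make the setting concrete, the following is a minimal sketch (not the authors' exact architecture, regularizer, or hyperparameters): a dense two-layer ReLU network is trained on samples of $f(x) = \sum_{i=1}^{d} x_i^2$ with an explicit weighted l1 "path norm" penalty, a commonly used proxy for the Barron norm mentioned in the abstract. The sizes d, width, n_samples and the penalty strength lam are illustrative assumptions.

# Minimal sketch: dense two-layer ReLU network trained on samples of
# f(x) = sum_i x_i^2, with an explicit path-norm penalty as a stand-in
# for a Barron-norm-type regularizer. All sizes are illustrative.
import numpy as np
import tensorflow as tf

d, width, n_samples = 10, 256, 2000   # illustrative sizes, not from the paper
lam = 1e-4                            # hypothetical regularization strength

rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=(n_samples, d)).astype(np.float32)
y_train = np.sum(x_train ** 2, axis=1, keepdims=True).astype(np.float32)

hidden = tf.keras.layers.Dense(width, activation="relu")
readout = tf.keras.layers.Dense(1, use_bias=False)
model = tf.keras.Sequential([tf.keras.Input(shape=(d,)), hidden, readout])

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        mse = tf.reduce_mean(tf.square(model(x, training=True) - y))
        # Explicit penalty: sum_k |a_k| * (||w_k||_1 + |b_k|), the weighted l1
        # path norm of the two-layer network.
        w, b = hidden.kernel, hidden.bias        # shapes (d, width), (width,)
        a = readout.kernel[:, 0]                 # shape (width,)
        path_norm = tf.reduce_sum(
            tf.abs(a) * (tf.reduce_sum(tf.abs(w), axis=0) + tf.abs(b))
        )
        loss = mse + lam * path_norm
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return mse

for step in range(2000):
    mse = train_step(x_train, y_train)
print("final training MSE:", float(mse))

Under a penalty of this kind, many hidden units are driven toward zero, which is one way to picture the sparse "needle" inside the dense "haystack" that the abstract describes.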

Cite this article

Jiefu Zhang, Leonardo Zepeda-Núñez, Yuan Yao, Lin Lin. Learning the Mapping $x \mapsto \sum_{i=1}^{d} x_i^2$: the Cost of Finding the Needle in a Haystack[J]. Communications on Applied Mathematics and Computation, 2021, 3(2): 313-335. DOI: 10.1007/s42967-020-00078-2
