Communications on Applied Mathematics and Computation ›› 2021, Vol. 3 ›› Issue (2): 313-335.doi: 10.1007/s42967-020-00078-2

• ORIGINAL PAPER • Previous Articles     Next Articles

Learning the Mapping x ↦ $\sum\limits_{i = 1}^d {x_i^2}$: the Cost of Finding the Needle in a Haystack

Jiefu Zhang1, Leonardo Zepeda-Núñez2, Yuan Yao3, Lin Lin1,4   

  1. 1 Department of Mathematics, University of California, Berkeley, Berkeley, CA 94720, USA;
    2 Department of Mathematics, University of Wisconsin-Madison, Madison, WI 53706, USA;
    3 Department of Mathematics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China;
    4 Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  • Received:2020-02-24 Revised:2020-05-16 Online:2021-06-20 Published:2021-05-26
  • Contact: Jiefu Zhang, Leonardo Zepeda, Núñez, Yuan Yao, Lin Lin E-mail:jiefuzhang@berkeley.edu;lzepeda@math.wisc.edu;yuany@ust.hk;linlin@math.berkeley.edu

Abstract: The task of using the machine learning to approximate the mapping x ↦ $\sum\limits_{i = 1}^d {x_i^2}$ with xi ∈ [-1, 1] seems to be a trivial one. Given the knowledge of the separable structure of the function, one can design a sparse network to represent the function very accurately, or even exactly. When such structural information is not available, and we may only use a dense neural network, the optimization procedure to fnd the sparse network embedded in the dense network is similar to fnding the needle in a haystack, using a given number of samples of the function. We demonstrate that the cost (measured by sample complexity) of fnding the needle is directly related to the Barron norm of the function. While only a small number of samples are needed to train a sparse network, the dense network trained with the same number of samples exhibits large test loss and a large generalization gap. To control the size of the generalization gap, we fnd that the use of the explicit regularization becomes increasingly more important as d increases. The numerically observed sample complexity with explicit regularization scales as $\mathcal{O}$(d2.5), which is in fact better than the theoretically predicted sample complexity that scales as $\mathcal{O}$(d4). Without the explicit regularization (also called the implicit regularization), the numerically observed sample complexity is signifcantly higher and is close to $\mathcal{O}$(d4.5).

Key words: Machine learning, Sample complexity, Generalization error

CLC Number: