[1] Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: Schölkopf, B., Platt, J.C., Hoffman, T. (eds) NIPS'06: Proceedings of the 20th International Conference on Neural Information Processing Systems, pp. 153–160. MIT Press, Cambridge (2006)
[2] Bottou, L.: Online algorithms and stochastic approximations. In: Saad, D. (ed) Online Learning and Neural Networks. Cambridge University Press, Cambridge (1998)
[3] Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. In: Platt, J.C., Koller, D., Singer, Y., Roweis, S.T. (eds) NIPS'07: Proceedings of the 21st International Conference on Neural Information Processing Systems, pp. 161–168. Curran Associates Inc., New York (2007)
[4] Brownlee, J.: Probability for Machine Learning: Discover How to Harness Uncertainty with Python. Machine Learning Mastery (2019)
[5] Chen, Q., Huang, N., Riemenschneider, S., Xu, Y.: A B-spline approach for empirical mode decompositions. Adv. Comput. Math. 24(1), 171–195 (2006)
[6] Chen, Z., Micchelli, C.A., Xu, Y.: A construction of interpolating wavelets on invariant sets. Math. Comput. 68(228), 1569–1587 (1999)
[7] Chen, Z., Micchelli, C.A., Xu, Y.: Multiscale Methods for Fredholm Integral Equations, vol. 28. Cambridge University Press, Cambridge (2015)
[8] Chen, Z., Wu, B., Xu, Y.: Fast multilevel augmentation methods for solving Hammerstein equations. SIAM J. Numer. Anal. 47(3), 2321–2346 (2009)
[9] Chizat, L., Bach, F.: On the global convergence of gradient descent for over-parameterized models using optimal transport. In: Bengio, S., Wallach, H.M. (eds) NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 3040–3050. Curran Associates Inc., New York (2018)
[10] Daubechies, I.: Ten Lectures on Wavelets. SIAM, Philadelphia (1992)
[11] Daubechies, I., DeVore, R., Foucart, S., Hanin, B., Petrova, G.: Nonlinear approximation and (deep) ReLU networks. Constr. Approx. 55(1), 127–172 (2022)
[12] Deutsch, F.: Best Approximation in Inner Product Spaces, vol. 7. Springer, New York (2001)
[13] Du, S., Lee, J., Li, H., Wang, L., Zhai, X.: Gradient descent finds global minima of deep neural networks. In: International Conference on Machine Learning (ICML), pp. 1675–1685. PMLR (2019)
[14] Elsken, T., Metzen, J.H., Hutter, F.: Neural architecture search: a survey. J. Mach. Learn. Res. 20(55), 1–21 (2019)
[15] Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256. JMLR Workshop and Conference Proceedings (2010)
[16] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
[17] Häggström, I., Schmidtlein, C.R., Campanella, G., Fuchs, T.J.: DeepPET: a deep encoder-decoder network for directly solving the PET image reconstruction inverse problem. Med. Image Anal. 54, 253–262 (2019)
[18] Han, B., Yao, Q., Yu, X., Niu, G., Xu, M., Hu, W., Tsang, I., Sugiyama, M.: Co-teaching: robust training of deep neural networks with extremely noisy labels. Adv. Neural Inf. Process. Syst. 31, 1–11 (2018)
[19] Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, New York (2012)
[20] Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, H.H., Zheng, Q., Yen, N.C., Tung, C.C., Liu, H.H.: The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. A 454(1971), 903–995 (1998)
[21] Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR), San Diego, CA, USA (2015)
[22] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)
[23] Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. In: 5th International Conference on Learning Representations (ICLR) (2017)
[24] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
[25] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
[26] Liu, Q., Wang, R., Xu, Y., Yan, M.: Parameter choices for sparse regularization with the ℓ1 norm. Inverse Probl. 39(2), 025004 (2023)
[27] Mallat, S.G., Zhang, Z.: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993)
[28] Micchelli, C.A., Xu, Y.: Using the matrix refinement equation for the construction of wavelets on invariant sets. Appl. Comput. Harmon. Anal. 1(4), 391–401 (1994)
[29] Natarajan, N., Dhillon, I.S., Ravikumar, P.K., Tewari, A.: Learning with noisy labels. Adv. Neural Inf. Process. Syst. 26, 1–9 (2013)
[30] Patrini, G., Rozza, A., Krishna Menon, A., Nock, R., Qu, L.: Making deep neural networks robust to label noise: a loss correction approach. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1944–1952 (2017)
[31] Rahaman, N., Baratin, A., Arpit, D., Draxler, F., Lin, M., Hamprecht, F., Bengio, Y., Courville, A.: On the spectral bias of neural networks. In: International Conference on Machine Learning (2019)
[32] Raissi, M.: Deep hidden physics models: deep learning of nonlinear partial differential equations. J. Mach. Learn. Res. 19(25), 1–24 (2018)
[33] Rice, L., Wong, E., Kolter, Z.: Overfitting in adversarially robust deep learning. In: Daumé III, H., Singh, A. (eds) Proceedings of the 37th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 119, pp. 8093–8104. PMLR (2020)
[34] van Rooyen, B., Menon, A.K., Williamson, R.C.: Learning with symmetric label noise: the importance of being unhinged. In: NIPS'15: Proceedings of the 29th International Conference on Neural Information Processing Systems, vol. 1, pp. 10–18 (2015)
[35] Shen, D., Wu, G., Suk, H.I.: Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19(1), 221–248 (2017)
[36] Shen, Z., Yang, H., Zhang, S.: Deep network with approximation error being reciprocal of width to power of square root of depth. Neural Comput. 33(4), 1005–1036 (2021)
[37] Torlai, G., Mazzola, G., Carrasquilla, J., Troyer, M., Melko, R., Carleo, G.: Neural-network quantum state tomography. Nat. Phys. 14(5), 447–450 (2018)
[38] Wu, W., Feng, G., Li, Z., Xu, Y.: Deterministic convergence of an online gradient method for BP neural networks. IEEE Trans. Neural Netw. 16(3), 533–540 (2005)
[39] Wu, W., Xu, Y.: Deterministic convergence of an online gradient method for neural networks. J. Comput. Appl. Math. 144(1/2), 335–347 (2002)
[40] Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747 (2017)
[41] Xu, Y., Liu, B., Liu, J., Riemenschneider, S.: Two-dimensional empirical mode decomposition by finite elements. Proc. R. Soc. A Math. Phys. Eng. Sci. 462(2074), 3081–3096 (2006)
[42] Xu, Y., Zhang, H.: Convergence of deep ReLU networks. Neurocomputing 571, 127174 (2024)
[43] Xu, Y., Zeng, T.: Sparse deep neural network for nonlinear partial differential equations. Numer. Math. Theor. Meth. Appl. 16(1), 58–78 (2023)
[44] Xu, Z.Q.J., Zhang, Y., Luo, T.: Overview frequency principle/spectral bias in deep learning. Commun. Appl. Math. Comput. (2024). https://doi.org/10.1007/s42967-024-00398-7
[45] Xu, Z.Q.J., Zhang, Y., Xiao, Y.: Training behavior of deep neural network in frequency domain. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol. 11953. Springer, Cham (2019)