1. Abadi, M., Barham, P., Chen, J., et al.: TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 265-283 (2016)
2. Arora, S., Du, S.S., Hu, W., Li, Z., Wang, R.: Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. arXiv:1901.08584 (2019)
3. Bach, F.: Breaking the curse of dimensionality with convex neural networks. J. Mach. Learn. Res. 18(1), 629-681 (2017)
4. Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inform. Theory 39, 930-945 (1993)
5. Chollet, F., et al.: Keras. https://keras.io (2015)
6. Cohen, N., Sharir, O., Shashua, A.: On the expressive power of deep learning: a tensor analysis. In: Conference on Learning Theory, pp. 698-728 (2016)
7. D'Ascoli, S., Sagun, L., Bruna, J., Biroli, G.: Finding the needle in the haystack with convolutions: on the benefits of architectural bias. arXiv:1906.06766 (2019)
8. Frankle, J., Dziugaite, G.K., Roy, D.M., Carbin, M.: The lottery ticket hypothesis at scale. arXiv:1903.01611 (2019)
9. Frankle, J., Carbin, M.: The lottery ticket hypothesis: finding sparse, trainable neural networks. In: International Conference on Learning Representations (ICLR). arXiv:1803.03635 (2019)
10. Freeman, C.D., Bruna, J.: Topology and geometry of half-rectified network optimization. In: International Conference on Learning Representations (ICLR). arXiv:1611.01540 (2017)
11. He, K., Sun, J.: Convolutional neural networks at constrained time cost. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5353-5360 (2015)
12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778 (2016)
13. Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82-97 (2012)
14. Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2), 251-257 (1991)
15. Kawaguchi, K.: Deep learning without poor local minima. arXiv:1605.07110 (2016)
16. Khrulkov, V., Novikov, A., Oseledets, I.: Expressive power of recurrent neural networks. arXiv:1711.00811 (2017)
17. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: The 3rd International Conference for Learning Representations (ICLR). arXiv:1412.6980v8 (2015)
18. Klusowski, J.M., Barron, A.R.: Risk bounds for high-dimensional ridge function combinations including neural networks. arXiv:1607.01434 (2018)
19. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, vol. 1, pp. 1097-1105 (2012)
20. Kuditipudi, R., Wang, X., Lee, H., Zhang, Y., Li, Z., Hu, W., Arora, S., Ge, R.: Explaining landscape connectivity of low-cost solutions for multilayer nets. In: NeurIPS (2019)
21. Leung, M.K.K., Xiong, H.Y., Lee, L.J., Frey, B.J.: Deep learning of the tissue-regulated splicing code. Bioinformatics 30(12), i121-i129 (2014)
22. Liu, Z., Sun, M., Zhou, T., Huang, G., Darrell, T.: Rethinking the value of network pruning. In: ICLR (2019). arXiv:1810.05270
23. Livni, R., Shalev-Shwartz, S., Shamir, O.: On the computational efficiency of training neural networks. In: Advances in Neural Information Processing Systems. arXiv:1410.1141 (2014)
24. E, W., Ma, C., Wu, L.: A priori estimates for two-layer neural networks. arXiv:1810.06397 (2018)
25. E, W., Ma, C., Wu, L.: Barron spaces and the compositional function spaces for neural network models. arXiv:1906.08039 (2019)
26. E, W., Ma, C., Wang, Q.: A priori estimates of the population risk for residual networks. arXiv:1903.02154 (2019)
27. Ma, J., Sheridan, R.P., Liaw, A., Dahl, G.E., Svetnik, V.: Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model. 55(2), 263-274 (2015)
28. Mhaskar, H., Liao, Q., Poggio, T.: Learning functions: when is deep better than shallow. arXiv:1603.00988 (2016)
29. Nagarajan, V., Kolter, J.Z.: Uniform convergence may be unable to explain generalization in deep learning. In: Advances in Neural Information Processing Systems, pp. 11611-11622 (2019)
30. Venturi, L., Bandeira, A.S., Bruna, J.: Spurious valleys in two-layer neural network optimization landscapes. arXiv:1802.06384 (2018)
31. Wei, Y., Yang, F., Wainwright, M.J.: Early stopping for kernel boosting algorithms: a general analysis with localized complexities. In: NIPS. arXiv:1707.01543 (2017)
32. Yao, Y., Rosasco, L., Caponnetto, A.: On early stopping in gradient descent learning. Constr. Approx. 26, 289-315 (2007)
33. Yarotsky, D.: Error bounds for approximations with deep ReLU networks. Neural Netw. 94, 103-114 (2017)
34. Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R.R., Smola, A.J.: Deep sets. arXiv:1703.06114 (2017)
35. Zhang, L., Han, J., Wang, H., Car, R., E, W.: DeePCG: constructing coarse-grained models via deep neural networks. arXiv:1802.08549 (2018)