Communications on Applied Mathematics and Computation ›› 2024, Vol. 6 ›› Issue (2): 1217-1240. doi: 10.1007/s42967-023-00322-5

• ORIGINAL PAPERS •

Optimization in Machine Learning: a Distribution-Space Approach

Yongqiang Cai1, Qianxiao Li2, Zuowei Shen2   

  1. School of Mathematical Sciences, Laboratory of Mathematics and Complex Systems, MOE, Beijing Normal University, Beijing 100875, China;
    2. Department of Mathematics, National University of Singapore, 21 Lower Kent Ridge Road, Singapore 119077, Singapore
  • Received: 2022-08-26 Revised: 2023-09-15 Accepted: 2023-09-16 Online: 2024-03-01 Published: 2024-03-01
  • Corresponding authors: Yongqiang Cai, E-mail: caiyq.math@bnu.edu.cn; Qianxiao Li, E-mail: qianxiao@nus.edu.sg; Zuowei Shen, E-mail: matzuows@nus.edu.sg
  • Supported by:
    Yongqiang Cai is supported by the National Natural Science Foundation of China (Grant No. 12201053). Qianxiao Li is supported by the National Research Foundation, Singapore, under the NRF fellowship (Project No. NRF-NRFF13-2021-0005).

Abstract: We present the viewpoint that optimization problems encountered in machine learning can often be interpreted as minimizing a convex functional over a function space, but with a non-convex constraint set introduced by model parameterization. This observation allows us to recast such problems, via a suitable relaxation, as convex optimization problems in the space of distributions over the training parameters. We derive some simple relationships between the distribution-space problem and the original problem, e.g., a distribution-space solution is at least as good as a solution in the original space. Moreover, we develop a numerical algorithm based on mixture distributions to perform approximate optimization directly in the distribution space. Consistency of this approximation is established, and the numerical efficacy of the proposed algorithm is illustrated on simple examples. In both theory and practice, this formulation provides an alternative approach to large-scale optimization in machine learning.

Key words: Machine learning, Convex relaxation, Optimization, Distribution space
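The lifting described in the abstract can be seen in a toy setting. The sketch below is our own illustrative example, not one of the paper's experiments: the model class cos(θx), the two-frequency target, and the grid search are all assumptions made for illustration. The squared loss is convex in the predictor, yet no single parameter θ can fit a target containing two frequencies; a two-atom mixture over θ, which is a feasible point of the relaxed distribution-space problem, fits it exactly and is therefore at least as good as any solution in the original parameter space.

```python
import math

# Hypothesis class f_theta(x) = cos(theta * x). The squared loss is
# convex in the function f, but the set {f_theta} is non-convex in
# theta, so optimizing over theta can be beaten by optimizing over
# distributions of theta.
xs = [2 * math.pi * i / 200 for i in range(200)]
target = [0.5 * math.cos(x) + 0.5 * math.cos(3 * x) for x in xs]

def loss(pred):
    # Mean squared error against the fixed target function.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(xs)

# Best single-parameter model, found by a dense grid search over theta.
best_single = min(
    loss([math.cos(t * x) for x in xs])
    for t in [5.0 * k / 2000 for k in range(2001)]
)

# Two-atom mixture 0.5*delta_{theta=1} + 0.5*delta_{theta=3}: a point
# in the relaxed (distribution-space) feasible set that fits exactly.
mixture = [0.5 * math.cos(x) + 0.5 * math.cos(3 * x) for x in xs]
mixture_loss = loss(mixture)

print(f"best single-theta loss: {best_single:.4f}")
print(f"mixture loss: {mixture_loss:.2e}")
```

Here the mixture is a convex combination of two members of the original class, so its loss lower-bounds what any single parameter can achieve, mirroring the paper's observation that a distribution-space solution is at least as good as an original-space one.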
