[1] Almeida, L.B.: A learning rule for asynchronous perceptrons with feedback in a combinatorial environment. In: Diederich, J. (ed.) Artificial Neural Networks: Concept Learning, pp. 102-111. IEEE Press, Los Alamitos (1990). https://doi.org/10.5555/104134.104145
[2] Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, New York (2017)
[3] Beck, A.: First-Order Methods in Optimization. SIAM, Philadelphia (2017)
[4] Berahas, A.S., Byrd, R.H., Nocedal, J.: Derivative-free optimization of noisy functions via quasi-Newton methods. SIAM J. Optim. 29(2), 965-993 (2019)
[5] Bisong, E.: Google Colaboratory. In: Building Machine Learning and Deep Learning Models on Google Cloud Platform, pp. 59-64. Springer, New York (2019)
[6] Brooks, S.H.: A discussion of random methods for seeking maxima. Oper. Res. 6(2), 244-251 (1958)
[7] Cai, H., Lou, Y., McKenzie, D., Yin, W.: A zeroth-order block coordinate descent algorithm for huge-scale black-box optimization. In: International Conference on Machine Learning, pp. 1193-1203. PMLR (2021)
[8] Cai, H., McKenzie, D., Yin, W., Zhang, Z.: A one-bit, comparison-based gradient estimator. Appl. Comput. Harmon. Anal. 60, 242-266 (2022)
[9] Cai, H., McKenzie, D., Yin, W., Zhang, Z.: Zeroth-order regularized optimization (ZORO): approximately sparse gradients and adaptive sampling. SIAM J. Optim. 32(2), 687-714 (2022)
[10] Chaudhari, P., Choromanska, A., Soatto, S., LeCun, Y., Baldassi, C., Borgs, C., Chayes, J., Sagun, L., Zecchina, R.: Entropy-SGD: biasing gradient descent into wide valleys. J. Stat. Mech. 2019(12), 124018 (2019)
[11] Chaudhari, P., Oberman, A., Osher, S., Soatto, S., Carlier, G.: Deep relaxation: partial differential equations for optimizing deep neural networks. Res. Math. Sci. 5(3), 1-30 (2018)
[12] Crandall, M.G., Lions, P.-L.: Two approximations of solutions of Hamilton-Jacobi equations. Math. Comput. 43(167), 1-19 (1984)
[13] Davis, D., Drusvyatskiy, D.: Stochastic subgradient method converges at the rate O(k^{-1/4}) on weakly convex functions. arXiv:1802.02988 (2018)
[14] Davis, D., Drusvyatskiy, D.: Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1), 207-239 (2019)
[15] Davis, D., Drusvyatskiy, D., MacPhee, K.J., Paquette, C.: Subgradient methods for sharp weakly convex functions. J. Optim. Theory Appl. 179(3), 962-982 (2018)
[16] Davis, D., Díaz, M., Drusvyatskiy, D.: Escaping strict saddle points of the Moreau envelope in nonsmooth optimization. SIAM J. Optim. 32(3), 1958-1983 (2022)
[17] Ermoliev, Y.M., Wets, R.J.-B.: Numerical Techniques for Stochastic Optimization. Springer-Verlag, New York (1988)
[18] Evans, L.C.: Partial Differential Equations. Graduate Studies in Mathematics, vol. 19, 2nd edn. American Mathematical Society, Providence, RI (2010)
[19] Evans, L.C.: Envelopes and nonconvex Hamilton-Jacobi equations. Calc. Var. Partial. Differ. Equ. 50(1), 257-282 (2014)
[20] Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, vol. 2. Springer, New York (2009)
[21] Jones, D.R., Perttunen, C.D., Stuckman, B.E.: Lipschitzian optimization without the Lipschitz constant. J. Optim. Theory Appl. 79(1), 157-181 (1993)
[22] Jones, E., Oliphant, T., Peterson, P., et al.: SciPy: open source scientific tools for Python. http://www.scipy.org/ (2001)
[23] Jourani, A., Thibault, L., Zagrodny, D.: Differential properties of the Moreau envelope. J. Funct. Anal. 266(3), 1185-1237 (2014)
[24] Kim, B., Cai, H., McKenzie, D., Yin, W.: Curvature-aware derivative-free optimization. arXiv:2109.13391 (2021)
[25] Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR 2015, San Diego, CA (2015)
[26] Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671-680 (1983)
[27] Kozak, D., Becker, S., Doostan, A., Tenorio, L.: Stochastic subspace descent. arXiv:1904.01145 (2019)
[28] Kozak, D., Becker, S., Doostan, A., Tenorio, L.: A stochastic subspace approach to gradient-free optimization in high dimensions. Comput. Optim. Appl. 79(2), 339-368 (2021)
[29] Kozak, D., Molinari, C., Rosasco, L., Tenorio, L., Villa, S.: Zeroth order optimization with orthogonal random directions. arXiv:2107.03941 (2021)
[30] Larson, J., Menickelly, M., Wild, S.M.: Derivative-free optimization methods. Acta Numer. 28, 287-404 (2019)
[31] Le Digabel, S.: Algorithm 909: NOMAD: nonlinear optimization with the MADS algorithm. ACM Trans. Math. Softw. (TOMS) 37(4), 1-15 (2011)
[32] Liu, X.D., Osher, S., Chan, T.: Weighted essentially non-oscillatory schemes. J. Comput. Phys. 115(1), 200-212 (1994)
[33] Moré, J., Wild, S.: Benchmarking derivative-free optimization algorithms. SIAM J. Optim. 20(1), 172-191 (2009)
[34] Moreau, J.J.: Décomposition orthogonale d'un espace hilbertien selon deux cônes mutuellement polaires. C. R. Hebd. Seances Acad. Sci. 255, 238-240 (1962)
[35] Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273-299 (1965)
[36] Olson, B., Hashmi, I., Molloy, K., Shehu, A.: Basin hopping as a general and versatile optimization framework for the characterization of biological macromolecules. Adv. Artif. Intell. 2012, 674832 (2012)
[37] Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79(1), 12-49 (1988)
[38] Osher, S., Shu, C.-W.: High-order essentially nonoscillatory schemes for Hamilton-Jacobi equations. SIAM J. Numer. Anal. 28(4), 907-922 (1991)
[39] Rastrigin, L.: The convergence of the random search method in the extremal control of a many parameter system. Automat. Remote Control 24, 1337-1342 (1963)
[40] Rockafellar, R.T.: Convex Analysis, vol. 18. Princeton University Press, Princeton (1970)
[41] Scaman, K., Dos Santos, L., Barlier, M., Colin, I.: A simple and efficient smoothing method for faster optimization and local exploration. Adv. Neural. Inf. Process. Syst. 33, 6503-6513 (2020)
[42] Shi, H.-J.M., Xie, Y., Xuan, M.Q., Nocedal, J.: Adaptive finite-difference interval estimation for noisy derivative-free optimization. arXiv:2110.06380 (2021)
[43] Shi, H.-J.M., Xuan, M.Q., Oztoprak, F., Nocedal, J.: On the numerical performance of derivative-free optimization methods based on finite-difference approximations. arXiv:2102.09762 (2021)
[44] Storn, R., Price, K.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11(4), 341-359 (1997)
[45] Surjanovic, S., Bingham, D.: Virtual library of simulation experiments: test functions and datasets. http://www.sfu.ca/~ssurjano. Retrieved June 23, 2021
[46] Wales, D.J., Doye, J.P.: Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms. J. Phys. Chem. A 101(28), 5111-5116 (1997)