2024 | Original Paper | Book Chapter

Probabilistic Guarantees of Stochastic Recursive Gradient in Non-convex Finite Sum Problems

Authors: Yanjie Zhong, Jiaqi Li, Soumendra Lahiri

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore

Abstract

This paper develops a new dimension-free Azuma-Hoeffding-type bound on the norm of the sum of a martingale difference sequence with random individual bounds. With this novel result, we provide high-probability bounds for the gradient norm estimator in the proposed algorithm Prob-SARAH, a modified version of the StochAstic Recursive grAdient algoritHm (SARAH), a state-of-the-art variance-reduced algorithm that achieves optimal computational complexity in expectation for the finite-sum problem. The in-probability complexity achieved by Prob-SARAH matches the best in-expectation result up to logarithmic factors. Empirical experiments on real datasets demonstrate the superior probabilistic performance of Prob-SARAH compared to other popular algorithms.
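For context, the sketch below illustrates the recursive gradient estimator underlying SARAH, which Prob-SARAH modifies: each outer loop anchors the estimator with a full gradient, and each inner step corrects the running estimate with a single-component gradient difference, which keeps the estimator's variance small between anchors. The toy least-squares objective, step size, and loop lengths are illustrative assumptions only, not the Prob-SARAH procedure, stopping rule, or parameters analyzed in the paper.

```python
# Minimal sketch of the SARAH-style recursive gradient estimator that Prob-SARAH
# builds on (Nguyen et al., 2017). Objective, step size, and loop lengths are
# illustrative assumptions, not the parameters analyzed in the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-sum problem: f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)**2
n, d = 200, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):
    """Gradient of the i-th component function at x."""
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    """Full gradient of the finite-sum objective at x."""
    return A.T @ (A @ x - b) / n

eta, m, epochs = 0.01, 50, 5   # step size, inner-loop length, outer loops
x = np.zeros(d)

for s in range(epochs):
    # Outer loop: anchor the recursive estimator with the full gradient.
    v = full_grad(x)
    x_prev, x = x, x - eta * v
    for _ in range(m):
        i = rng.integers(n)
        # Inner loop: correct the previous estimate with a single-component
        # gradient difference (the SARAH recursion).
        v = grad_i(x, i) - grad_i(x_prev, i) + v
        x_prev, x = x, x - eta * v
    print(f"epoch {s}: ||grad f(x)|| = {np.linalg.norm(full_grad(x)):.4f}")
```

The recursion reuses correlated gradient differences rather than independent stochastic gradients, which is the source of the variance reduction that the paper's high-probability analysis quantifies.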

Metadata
Title
Probabilistic Guarantees of Stochastic Recursive Gradient in Non-convex Finite Sum Problems
Authors
Yanjie Zhong
Jiaqi Li
Soumendra Lahiri
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2259-4_11
