2024 | Original Paper | Book Chapter

Probabilistic Guarantees of Stochastic Recursive Gradient in Non-convex Finite Sum Problems

Authors: Yanjie Zhong, Jiaqi Li, Soumendra Lahiri

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore

Abstract

This paper develops a new dimension-free Azuma-Hoeffding-type bound on the norm of the sum of a martingale difference sequence with random individual bounds. With this novel result, we provide high-probability bounds for the gradient norm estimator in the proposed algorithm Prob-SARAH, a modified version of the StochAstic Recursive grAdient algoritHm (SARAH), a state-of-the-art variance-reduced algorithm that achieves optimal computational complexity in expectation for the finite-sum problem. The in-probability complexity achieved by Prob-SARAH matches the best in-expectation result up to logarithmic factors. Empirical experiments on real datasets demonstrate the superior probabilistic performance of Prob-SARAH compared to other popular algorithms.
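For context, the sketch below illustrates the recursive gradient estimator underlying SARAH, which Prob-SARAH modifies: each outer loop anchors the estimator with a full gradient, and each inner step corrects the running estimate with a single-component gradient difference, which keeps the estimator's variance small between anchors. The toy least-squares objective, step size, and loop lengths are illustrative assumptions only, not the Prob-SARAH procedure, stopping rule, or parameters analyzed in the paper.

```python
# Minimal sketch of the SARAH-style recursive gradient estimator that Prob-SARAH
# builds on (Nguyen et al., 2017). Objective, step size, and loop lengths are
# illustrative assumptions, not the parameters analyzed in the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-sum problem: f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)**2
n, d = 200, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):
    """Gradient of the i-th component function at x."""
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    """Full gradient of the finite-sum objective at x."""
    return A.T @ (A @ x - b) / n

eta, m, epochs = 0.01, 50, 5   # step size, inner-loop length, outer loops
x = np.zeros(d)

for s in range(epochs):
    # Outer loop: anchor the recursive estimator with the full gradient.
    v = full_grad(x)
    x_prev, x = x, x - eta * v
    for _ in range(m):
        i = rng.integers(n)
        # Inner loop: correct the previous estimate with a single-component
        # gradient difference (the SARAH recursion).
        v = grad_i(x, i) - grad_i(x_prev, i) + v
        x_prev, x = x, x - eta * v
    print(f"epoch {s}: ||grad f(x)|| = {np.linalg.norm(full_grad(x)):.4f}")
```

The recursion reuses correlated gradient differences rather than independent stochastic gradients, which is the source of the variance reduction that the paper's high-probability analysis quantifies.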

Metadata
Title
Probabilistic Guarantees of Stochastic Recursive Gradient in Non-convex Finite Sum Problems
Authors
Yanjie Zhong
Jiaqi Li
Soumendra Lahiri
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2259-4_11
