
2024 | Original Paper | Book Chapter

RPH-PGD: Randomly Projected Hessian for Perturbed Gradient Descent

Authors: Chi-Chang Li, Jay Huang, Wing-Kai Hon, Che-Rung Lee

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore


Abstract

The perturbed gradient descent (PGD) method, which adds random noise to the search directions, has been widely used for solving large-scale optimization problems owing to its ability to escape from saddle points. However, it is sometimes inefficient, for two reasons. First, the random noise may not point in a descent direction, so PGD may still stagnate around saddle points. Second, the size of the random noise, which is controlled by the radius of the perturbation ball, may not be properly configured, so convergence is slow. In this paper, we propose a method, called RPH-PGD (Randomly Projected Hessian for Perturbed Gradient Descent), to improve the performance of PGD. The randomly projected Hessian (RPH) is created by projecting the Hessian matrix into a relatively small subspace that contains rich information about the eigenvectors of the original Hessian matrix. RPH-PGD uses the eigenvalues and eigenvectors of the randomly projected Hessian to identify negative curvature directions, and uses the projected matrix itself to estimate changes in the Hessian, which is the information needed to adjust the radius dynamically during the computation. In addition, RPH-PGD employs the finite difference method to approximate Hessian-vector products instead of constructing the Hessian explicitly. An amortized analysis shows that the time complexity of RPH-PGD is only slightly higher than that of PGD. The experimental results show that RPH-PGD not only converges faster than PGD, but also converges in cases where PGD cannot.
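The abstract outlines the main ingredients of RPH-PGD: ordinary gradient steps away from stationary points, a randomly projected Hessian whose eigenpairs expose negative curvature near saddle points, and finite-difference Hessian-vector products so the full Hessian is never formed. The following is only a minimal sketch of how those pieces could fit together, assuming a plain random subspace and fixed step sizes; the paper's actual subspace enrichment, radius-adaptation rule, and parameter choices are not reproduced here, and the names rph_pgd_sketch and hvp_fd are illustrative, not from the paper.

```python
import numpy as np


def hvp_fd(grad, x, v, eps=1e-5):
    """Approximate the Hessian-vector product H(x) @ v with a forward
    finite difference of the gradient, as described in the abstract."""
    return (grad(x + eps * v) - grad(x)) / eps


def rph_pgd_sketch(grad, x0, lr=1e-2, k=5, g_tol=1e-4, neg_tol=-1e-3,
                   max_iter=5000, seed=0):
    """Illustrative loop: take plain gradient steps while the gradient is
    large; near a stationary point, inspect a randomly projected Hessian
    for negative curvature and step along such a direction if one exists."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) > g_tol:
            x = x - lr * g                      # ordinary gradient step
            continue
        # Small-gradient region: build a random k-dimensional subspace Q.
        n = x.size
        Q, _ = np.linalg.qr(rng.standard_normal((n, min(k, n))))
        # Projected Hessian T = Q^T H Q from k Hessian-vector products;
        # the full n-by-n Hessian is never formed explicitly.
        HQ = np.column_stack([hvp_fd(grad, x, Q[:, j])
                              for j in range(Q.shape[1])])
        T = Q.T @ HQ
        T = 0.5 * (T + T.T)                     # symmetrize numerical noise
        evals, evecs = np.linalg.eigh(T)        # eigenvalues in ascending order
        if evals[0] < neg_tol:
            d = Q @ evecs[:, 0]                 # Ritz direction in full space
            if d @ g > 0:                       # pick the descent-aligned sign
                d = -d
            x = x + lr * d                      # escape along negative curvature
        else:
            break                               # approximate local minimum
    return x


# Toy usage: f(x, y) = x^2 + y^4/4 - y^2/2 has a saddle at the origin and
# minima at (0, +/-1); started near the saddle, the sketch escapes it.
grad_f = lambda z: np.array([2.0 * z[0], z[1] ** 3 - z[1]])
print(rph_pgd_sketch(grad_f, np.array([0.0, 1e-6]), k=2))
```

In this sketch the projected matrix T is only k-by-k, so its eigendecomposition is cheap; the k Hessian-vector products dominate the extra cost, which is consistent with the abstract's claim that the amortized overhead over PGD is small.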


Metadata
Title
RPH-PGD: Randomly Projected Hessian for Perturbed Gradient Descent
Authors
Chi-Chang Li
Jay Huang
Wing-Kai Hon
Che-Rung Lee
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2253-2_20
