
2024 | Original Paper | Book Chapter

Enhancing Policy Gradient for Traveling Salesman Problem with Data Augmented Behavior Cloning

Authors: Yunchao Zhang, Kewen Liao, Zhibin Liao, Longkun Guo

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore


Abstract

The use of deep reinforcement learning (DRL) techniques to solve classical combinatorial optimization problems such as the Traveling Salesman Problem (TSP) has garnered considerable attention, owing to the flexibility and speed of model-based inference. However, DRL training often suffers from low efficiency and poor scalability, which hinders model generalization. This paper proposes a simple yet effective pre-training method that uses behavior cloning to initialize the neural network parameters for policy gradient DRL. To alleviate the need for large numbers of demonstrations in behavior cloning, we exploit the symmetry of TSP solutions for data augmentation. We demonstrate our method by enhancing the state-of-the-art policy gradient models Attention and POMO for the TSP. Experimental results show that the optimality gap of the solutions is significantly reduced while the DRL training time is greatly shortened, which also enables effective and efficient solving of larger TSP instances.
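
To make the symmetry-based augmentation concrete, the following is a minimal Python sketch, not the authors' implementation: a single optimal tour is expanded into up to 2n equivalent expert demonstrations by rotating the starting city and reversing the traversal direction, and each demonstration is split into (partial tour, next city) pairs for cross-entropy behavior cloning. The helper names augment_tour and behavior_cloning_targets are hypothetical, chosen only for illustration.

    import numpy as np

    def augment_tour(tour: np.ndarray) -> list:
        """Generate tours equivalent to `tour` under TSP symmetries.

        A TSP tour of n cities has the same length under any cyclic
        rotation of its starting city and under reversal of the
        traversal direction, giving up to 2n equivalent expert
        demonstrations from a single optimal solution.
        """
        n = len(tour)
        augmented = []
        for direction in (tour, tour[::-1]):      # forward and reversed traversal
            for start in range(n):                # every city as the starting point
                augmented.append(np.roll(direction, -start))
        return augmented

    def behavior_cloning_targets(tour: np.ndarray) -> list:
        """Turn one tour into (partial-tour, next-city) supervision pairs
        for cross-entropy behavior cloning of an autoregressive policy."""
        return [(tour[:t], tour[t]) for t in range(1, len(tour))]

    if __name__ == "__main__":
        optimal_tour = np.array([0, 3, 1, 4, 2])  # a toy 5-city optimal tour
        demos = augment_tour(optimal_tour)
        print(f"{len(demos)} equivalent demonstrations from one tour")
        # Each augmented tour contributes its own prefix/next-city pairs,
        # which would be used to pre-train the policy network with a
        # cross-entropy loss before policy gradient fine-tuning.
        pairs = [p for d in demos for p in behavior_cloning_targets(d)]
        print(f"{len(pairs)} (state, action) supervision pairs")

In this sketch the pre-training stage would fit the policy to these pairs with a standard cross-entropy loss, after which the initialized network continues training with the policy gradient objective (e.g. the REINFORCE-style updates used by Attention and POMO).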

Metadata
Title
Enhancing Policy Gradient for Traveling Salesman Problem with Data Augmented Behavior Cloning
Authors
Yunchao Zhang
Kewen Liao
Zhibin Liao
Longkun Guo
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2253-2_26
