
2024 | Original Paper | Book Chapter

Enhancing Policy Gradient for Traveling Salesman Problem with Data Augmented Behavior Cloning

Authors: Yunchao Zhang, Kewen Liao, Zhibin Liao, Longkun Guo

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore


Abstract

The use of deep reinforcement learning (DRL) techniques to solve classical combinatorial optimization problems such as the Traveling Salesman Problem (TSP) has garnered considerable attention, owing to the flexibility and speed of model-based inference. However, DRL training often suffers from low efficiency and poor scalability, which hinders model generalization. This paper proposes a simple yet effective pre-training method that uses behavior cloning to initialize the neural network parameters for policy gradient DRL. To alleviate the need for large numbers of demonstrations in behavior cloning, we exploit the symmetry of TSP solutions for data augmentation. We demonstrate our method by enhancing the state-of-the-art policy gradient models Attention and POMO for the TSP. Experimental results show that the optimality gap of the solutions is significantly reduced while the DRL training time is greatly shortened, which also enables effective and efficient solving of larger TSP instances.
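
To make the symmetry-based augmentation concrete, the following is a minimal Python sketch, not the authors' implementation: a single optimal tour is expanded into up to 2n equivalent expert demonstrations by rotating the starting city and reversing the traversal direction, and each demonstration is split into (partial tour, next city) pairs for cross-entropy behavior cloning. The helper names augment_tour and behavior_cloning_targets are hypothetical, chosen only for illustration.

    import numpy as np

    def augment_tour(tour: np.ndarray) -> list:
        """Generate tours equivalent to `tour` under TSP symmetries.

        A TSP tour of n cities has the same length under any cyclic
        rotation of its starting city and under reversal of the
        traversal direction, giving up to 2n equivalent expert
        demonstrations from a single optimal solution.
        """
        n = len(tour)
        augmented = []
        for direction in (tour, tour[::-1]):      # forward and reversed traversal
            for start in range(n):                # every city as the starting point
                augmented.append(np.roll(direction, -start))
        return augmented

    def behavior_cloning_targets(tour: np.ndarray) -> list:
        """Turn one tour into (partial-tour, next-city) supervision pairs
        for cross-entropy behavior cloning of an autoregressive policy."""
        return [(tour[:t], tour[t]) for t in range(1, len(tour))]

    if __name__ == "__main__":
        optimal_tour = np.array([0, 3, 1, 4, 2])  # a toy 5-city optimal tour
        demos = augment_tour(optimal_tour)
        print(f"{len(demos)} equivalent demonstrations from one tour")
        # Each augmented tour contributes its own prefix/next-city pairs,
        # which would be used to pre-train the policy network with a
        # cross-entropy loss before policy gradient fine-tuning.
        pairs = [p for d in demos for p in behavior_cloning_targets(d)]
        print(f"{len(pairs)} (state, action) supervision pairs")

In this sketch the pre-training stage would fit the policy to these pairs with a standard cross-entropy loss, after which the initialized network continues training with the policy gradient objective (e.g. the REINFORCE-style updates used by Attention and POMO).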

Metadata
Title
Enhancing Policy Gradient for Traveling Salesman Problem with Data Augmented Behavior Cloning
Authors
Yunchao Zhang
Kewen Liao
Zhibin Liao
Longkun Guo
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2253-2_26
