2023 | Original Paper | Book Chapter

Heuristic Search Optimisation Using Planning and Curriculum Learning Techniques

Authors: Leah Chrestien, Tomáš Pevný, Stefan Edelkamp, Antonín Komenda

Published in: Progress in Artificial Intelligence

Publisher: Springer Nature Switzerland


Abstract

Learning a well-informed heuristic function for hard planning domains is an elusive problem. Although there are known neural network architectures to represent such heuristic knowledge, it is not obvious what concrete information is learned and whether techniques aimed at understanding the structure help in improving the quality of the heuristics. This paper presents a network model that learns a heuristic function capable of relating distant parts of the state space via optimal plan imitation using the attention mechanism, which drastically improves the learning of a good heuristic function. The learning of this heuristic function is further improved by curriculum learning, where newly solved problem instances are added to the training set, which in turn helps to solve problems of higher complexity and to train on harder problem instances. The methods presented in this paper far exceed the performance of all existing baselines, including known deep learning approaches and classical planning heuristics. We demonstrate their effectiveness on grid-type PDDL domains, namely Sokoban, maze-with-teleports and sliding-tile puzzles.
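
The curriculum-learning loop described in the abstract can be sketched in a few lines. The Python sketch below is an illustration only, not the authors' implementation; the callables solve, extract, train and harder_instances are hypothetical placeholders for the paper's planner, plan-to-training-data extraction, heuristic trainer and problem generator.

from typing import Any, Callable, List, Tuple


def curriculum_learning(
    instances: List[Any],
    heuristic: Any,
    solve: Callable[[Any, Any], Any],                      # (problem, heuristic) -> optimal plan or None
    extract: Callable[[Any], List[Tuple[Any, float]]],     # plan -> (state, cost-to-go) training pairs
    train: Callable[[Any, List[Tuple[Any, float]]], Any],  # retrain the heuristic on the pairs
    harder_instances: Callable[[int], List[Any]],          # generator of harder problem instances
    n_rounds: int = 10,
) -> Any:
    """Curriculum loop: imitate plans of solved instances, then attempt harder ones."""
    training_set: List[Tuple[Any, float]] = []
    for round_idx in range(n_rounds):
        solved_plans, unsolved = [], []
        for problem in instances:
            plan = solve(problem, heuristic)   # e.g. A*/GBFS guided by the current heuristic
            if plan is not None:
                solved_plans.append(plan)
            else:
                unsolved.append(problem)
        for plan in solved_plans:              # newly solved instances are added ...
            training_set.extend(extract(plan)) # ... to the training set (optimal plan imitation)
        heuristic = train(heuristic, training_set)
        instances = unsolved + harder_instances(round_idx)  # move on to harder problems
    return heuristic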


Footnotes
1
The composability of harmonic functions is based on the following property: \(\cos (\theta _1 + \theta _2) = \cos (\theta _1)\cos (\theta _2) - \sin (\theta _1)\sin (\theta _2) = (\cos (\theta _1), \sin (\theta _1)) \cdot (\cos (\theta _2), -\sin (\theta _2)),\) where \(\cdot \) denotes the inner product of two vectors, which appears in Eq. (1) in the inner product of \({\textbf {q}}_{u,v}\) and \({\textbf {k}}_{r,s}\).
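
The identity above can be checked numerically. The short Python check below is only a sanity check of the trigonometric identity; Eq. (1) itself is not reproduced on this page, so the connection to \({\textbf {q}}_{u,v}\) and \({\textbf {k}}_{r,s}\) is stated in a comment rather than shown.

import math
import random

# Check: cos(t1 + t2) = cos(t1)cos(t2) - sin(t1)sin(t2)
#                     = (cos(t1), sin(t1)) . (cos(t2), -sin(t2)).
# This is the property that lets an inner product of query and key vectors
# (q_{u,v} and k_{r,s} in Eq. (1)) depend on the relative offset of two positions.
for _ in range(1000):
    t1 = random.uniform(-math.pi, math.pi)
    t2 = random.uniform(-math.pi, math.pi)
    lhs = math.cos(t1 + t2)
    rhs = math.cos(t1) * math.cos(t2) + math.sin(t1) * (-math.sin(t2))
    assert abs(lhs - rhs) < 1e-9
print("angle-addition identity verified on 1000 random pairs")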
 
2
Convolution layers are appropriately padded to preserve sizes.
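
As a minimal illustration of such "same" padding (assuming stride-1 convolutions and odd kernel sizes, which this page does not state), padding of (k - 1) // 2 on each side keeps the spatial dimensions unchanged:

import torch
import torch.nn as nn

# With stride 1 and an odd kernel size k, padding (k - 1) // 2 preserves H x W.
x = torch.randn(1, 4, 10, 10)  # e.g. a 10x10 grid encoded with 4 channels
conv = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, padding=1)
y = conv(x)
assert y.shape[-2:] == x.shape[-2:]  # height and width stay 10x10
print(tuple(x.shape), "->", tuple(y.shape))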
 
8
The planners and NNs were given 10 minutes to solve each maze instance.
 
9
The planners and NNs were given 10 minutes to solve each maze instance.
 
Metadata
Title
Heuristic Search Optimisation Using Planning and Curriculum Learning Techniques
Authors
Leah Chrestien
Tomáš Pevný
Stefan Edelkamp
Antonín Komenda
Copyright Year
2023
DOI
https://doi.org/10.1007/978-3-031-49008-8_39
