
2024 | OriginalPaper | Book chapter

Meta-Reinforcement Learning Algorithm Based on Reward and Dynamic Inference

Authors: Jinhao Chen, Chunhong Zhang, Zheng Hu

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore


Abstract

Meta-reinforcement learning aims to adapt rapidly to unseen tasks that share a common structure. However, agents rely heavily on large amounts of experience during meta-training, which makes high sample efficiency a formidable challenge. Current methods typically adapt to novel tasks within the meta-reinforcement learning framework through task inference. Unfortunately, these approaches remain limited when the task space is highly complex. In this paper, we propose a meta-reinforcement learning method based on reward and dynamic inference. We introduce independent reward-inference and dynamics-inference encoders, which sample specific context information to capture deep features of the task goals and the task dynamics, respectively. By reducing the task inference space, the agent effectively learns the structure shared across tasks and gains a deeper understanding of the differences between tasks. We illustrate the performance degradation caused by high task-inference complexity and demonstrate that our method outperforms previous algorithms in sample efficiency.
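The abstract does not specify how the two encoders are built. As a rough illustration only, the Python sketch below assumes a PEARL-style design (Rakelly et al., 2019): each encoder maps every context transition to an independent Gaussian factor, the factors are combined by a product of Gaussians, and a latent is drawn with the reparameterisation trick. The reward encoder sees (s, a, r), the dynamics encoder sees (s, a, s'), and the two latents are concatenated. The names `GaussianContextEncoder`, `product_of_gaussians` and `infer_task` are hypothetical, random linear maps stand in for trained networks, and no training objective is shown; this is not the authors' architecture.

```python
import math
import random

# One context transition: (state, action, reward, next_state).
Transition = tuple[list[float], list[float], float, list[float]]


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)), used to keep variances positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def product_of_gaussians(mus: list[float], variances: list[float]) -> tuple[float, float]:
    # Combine independent factors N(mu_i, var_i) for one latent dimension:
    # precisions add, and the mean is the precision-weighted mean.
    precision = sum(1.0 / v for v in variances)
    mean = sum(m / v for m, v in zip(mus, variances)) / precision
    return mean, 1.0 / precision


class GaussianContextEncoder:
    """Hypothetical per-transition encoder: linear mean, softplus variance."""

    def __init__(self, in_dim: int, latent_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random weights stand in for a trained network (assumption).
        self.w_mu = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(latent_dim)]
        self.w_var = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(latent_dim)]

    def _factor(self, x: list[float]) -> tuple[list[float], list[float]]:
        mu = [sum(w * xi for w, xi in zip(row, x)) for row in self.w_mu]
        var = [softplus(sum(w * xi for w, xi in zip(row, x))) + 1e-6 for row in self.w_var]
        return mu, var

    def posterior(self, inputs: list[list[float]]) -> tuple[list[float], list[float]]:
        # Order-invariant aggregation over all context transitions, per dimension.
        factors = [self._factor(x) for x in inputs]
        means, variances = [], []
        for d in range(len(self.w_mu)):
            m, v = product_of_gaussians([f[0][d] for f in factors], [f[1][d] for f in factors])
            means.append(m)
            variances.append(v)
        return means, variances


def sample(means: list[float], variances: list[float], rng: random.Random) -> list[float]:
    # Reparameterised sample: z = mu + sigma * eps, eps ~ N(0, 1).
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(means, variances)]


def infer_task(context: list[Transition], reward_enc: GaussianContextEncoder,
               dynamics_enc: GaussianContextEncoder, rng: random.Random) -> list[float]:
    # The reward encoder sees (s, a, r); the dynamics encoder sees (s, a, s').
    reward_inputs = [s + a + [r] for s, a, r, _ in context]
    dynamics_inputs = [s + a + s_next for s, a, _, s_next in context]
    z_reward = sample(*reward_enc.posterior(reward_inputs), rng)
    z_dynamics = sample(*dynamics_enc.posterior(dynamics_inputs), rng)
    # The concatenated latent would then condition the policy and critic.
    return z_reward + z_dynamics


# Usage with random placeholder transitions.
state_dim, action_dim, latent_dim = 3, 2, 4
rng = random.Random(42)
reward_enc = GaussianContextEncoder(state_dim + action_dim + 1, latent_dim, seed=1)
dynamics_enc = GaussianContextEncoder(2 * state_dim + action_dim, latent_dim, seed=2)
context = [
    ([rng.random() for _ in range(state_dim)],
     [rng.random() for _ in range(action_dim)],
     rng.random(),
     [rng.random() for _ in range(state_dim)])
    for _ in range(8)
]
z = infer_task(context, reward_enc, dynamics_enc, rng)
print(len(z))  # 8: four reward dimensions plus four dynamics dimensions
```

Giving each encoder only the part of a transition that carries its kind of task information is one plausible reading of reducing the task inference space. The product-of-Gaussians aggregation also makes each posterior independent of the order in which context transitions arrive.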



Print ISBN: 978-981-9722-61-7

Electronic ISBN: 978-981-9722-59-4

Copyright year: 2024
DOI: https://doi.org/10.1007/978-981-97-2259-4_17
