
2024 | Original Paper | Book Chapter

Towards Offline Reinforcement Learning with Pessimistic Value Priors

Authors: Filippo Valdettaro, A. Aldo Faisal

Published in: Epistemic Uncertainty in Artificial Intelligence

Publisher: Springer Nature Switzerland


Abstract

Offline reinforcement learning (RL) seeks to train agents in sequential decision-making tasks using only previously collected data, without direct interaction with the environment. As the agent tries to improve on the policy present in the dataset, it can introduce distributional shift between the training data and its proposed policy, which can lead to poor performance. To prevent the agent from assigning high values to out-of-distribution actions, successful offline RL requires some form of conservatism to be introduced. Here we present a model-free inference framework that encodes this conservatism in the prior belief over the value function: by carrying out policy evaluation with a pessimistic prior, we ensure that only actions directly supported by the offline dataset are modelled as having high value. In contrast to other methods, we do not need to introduce heuristic policy constraints, value regularisation or uncertainty penalties to obtain successful offline RL policies in a toy environment. An additional consequence of our work is a principled quantification of Bayesian uncertainty in off-policy returns in model-free RL. While we present an implementation of this framework and verify its behaviour in the exact-inference setting with Gaussian processes on a toy problem, we identify the scalability issues it suffers from as the central avenue for further work. We discuss these limitations in more detail and consider future directions to improve the scalability of this framework beyond the vanilla Gaussian process implementation, proposing a path towards improving offline RL algorithms in a principled way.
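
To make the role of a pessimistic value prior concrete, the following Python sketch illustrates the core idea in the exact-inference Gaussian-process setting: value estimates for state-action pairs far from the offline data revert to a low prior mean rather than an optimistic one. This is a minimal illustration, not the authors' implementation or their toy environment; the kernel, lengthscale, noise level and the prior mean m0 are illustrative assumptions.

import numpy as np

def rbf_kernel(A, B, lengthscale=0.5, variance=1.0):
    # Squared-exponential kernel between rows of A and rows of B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, X_query, m0=-10.0, noise=1e-2):
    # GP posterior mean and variance with a constant pessimistic prior mean m0
    # (illustrative value): queries with little kernel support revert to m0.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_query, X_train)
    K_ss = rbf_kernel(X_query, X_query)
    alpha = np.linalg.solve(K, y_train - m0)          # work in prior-mean-centred space
    mean = m0 + K_s @ alpha                           # off-data queries fall back to m0
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)

# Toy illustration: returns are observed only near x = 0; a query far from the
# data is assigned the pessimistic prior value instead of an optimistic one.
X = np.array([[0.0], [0.1], [0.2]])
y = np.array([1.0, 1.2, 0.9])                         # in-distribution observed returns
Xq = np.array([[0.05], [2.0]])                        # in- vs out-of-distribution queries
mu, var = gp_posterior(X, y, Xq)
print(mu)   # first entry close to the observed returns (~1); second close to m0 = -10

In the exact-inference setting this fallback behaviour is what keeps out-of-distribution actions from being modelled as high-value during policy evaluation; a full implementation would additionally account for the temporal-difference structure of the returns, which this sketch omits.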

Metadata
Title
Towards Offline Reinforcement Learning with Pessimistic Value Priors
Authors
Filippo Valdettaro
A. Aldo Faisal
Copyright Year
2024
DOI
https://doi.org/10.1007/978-3-031-57963-9_7
