Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty

Ariel Neufeld*, Julian Sester

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

We present a novel Q-learning algorithm tailored to solving distributionally robust Markov decision problems in which the ambiguity set of transition probabilities for the underlying Markov decision process is a Wasserstein ball around a (possibly estimated) reference measure. We prove convergence of the proposed algorithm and provide several examples, including ones using real data, that illustrate both the tractability of our algorithm and the benefits of accounting for distributional robustness when solving stochastic optimal control problems, in particular when the estimated distributions turn out to be misspecified in practice.
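The idea described in the abstract, replacing the standard Bellman target with a worst case over transition kernels in a Wasserstein ball around a reference measure, can be illustrated with a small tabular sketch. The following is not the paper's algorithm; it is a hypothetical, simplified illustration in which the Wasserstein ball is approximated by a finite set of sampled candidate kernels, and all problem data (states, rewards, reference kernel) are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 3, 2
gamma, alpha, eps = 0.9, 0.1, 0.2  # discount, learning rate, ball radius

# Made-up reference transition kernel P_ref[s, a] (a distribution over next
# states) and reward table R, standing in for an estimated model.
P_ref = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def wasserstein1(p, q):
    # On an ordered finite state space with ground cost |i - j|, the
    # Wasserstein-1 distance equals the L1 distance between the CDFs.
    return np.abs(np.cumsum(p - q)[:-1]).sum()

def ambiguity_set(p, radius, n_candidates=50):
    # Crude approximation of the Wasserstein ball: sample perturbed
    # distributions and keep those within the radius (always including p).
    cands = [p]
    for _ in range(n_candidates):
        q = rng.dirichlet(10.0 * p + 0.1)
        if wasserstein1(p, q) <= radius:
            cands.append(q)
    return cands

# Precompute a candidate set per state-action pair.
balls = [[ambiguity_set(P_ref[s, a], eps) for a in range(n_actions)]
         for s in range(n_states)]

Q = np.zeros((n_states, n_actions))
s = 0
for _ in range(5000):
    # Epsilon-greedy action selection.
    a = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(Q[s].argmax())
    s_next = rng.choice(n_states, p=P_ref[s, a])
    # Robust Bellman target: worst case over the candidate kernels.
    worst = min(q @ Q.max(axis=1) for q in balls[s][a])
    Q[s, a] += alpha * (R[s, a] + gamma * worst - Q[s, a])
    s = s_next
```

Because the inner minimization shrinks the continuation value, the resulting Q-values are conservative relative to non-robust Q-learning on the reference kernel; the paper replaces this crude sampling approximation with a principled treatment of the Wasserstein ball and a convergence proof.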

Original language: English
Article number: 111825
Journal: Automatica
Volume: 168
DOIs
Publication status: Published - Oct 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2024 Elsevier Ltd

ASJC Scopus Subject Areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering

Keywords

  • Distributionally robust optimization
  • Markov decision process
  • Q-learning
  • Reinforcement learning
  • Wasserstein uncertainty

