Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty

Ariel Neufeld; Julian Sester

doi:10.1016/j.automatica.2024.111825

Corresponding author: Ariel Neufeld

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

We present a novel Q-learning algorithm tailored to solve distributionally robust Markov decision problems where the corresponding ambiguity set of transition probabilities for the underlying Markov decision process is a Wasserstein ball around a (possibly estimated) reference measure. We prove convergence of the presented algorithm and provide several examples also using real data to illustrate both the tractability of our algorithm as well as the benefits of considering distributional robustness when solving stochastic optimal control problems, in particular when the estimated distributions turn out to be misspecified in practice.
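The abstract describes the problem only in words. To make it concrete, here is a minimal Python sketch built on our own simplifying assumptions. It is not the authors' algorithm. The assumptions are:

- there are finitely many states, with a distance d between them;
- the reference kernel p_hat is known;
- alternative kernels are restricted to the same finite support;
- the ambiguity set is an order-1 Wasserstein ball of radius eps;
- future rewards are discounted by a factor alpha.

The worst-case expectation over the ball is computed exactly through the linear-programming duality: the infimum of E_q[f] over all q with W1(p, q) ≤ eps equals the supremum over λ ≥ 0 of ( −λ·eps + Σ_i p_i min_j ( f_j + λ d(x_i, x_j) ) ). The resulting robust Bellman operator is then iterated to its fixed point. The helper names worst_case_expectation and robust_q_iteration are hypothetical. A Q-learning scheme like the one the abstract describes would instead approximate this fixed point from sampled transitions, and the sketch does not reproduce that part.

```python
from itertools import product
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Reward = Callable[[int, int, int], float]  # reward(state, action, next_state)


def worst_case_expectation(f: Sequence[float], p: Sequence[float],
                           d: Matrix, eps: float) -> float:
    """Smallest E_q[f] over all q on the same finite support with W1(p, q) <= eps.

    Evaluates the dual  sup_{lam >= 0} ( -lam * eps + sum_i p_i * min_j (f_j + lam * d_ij) ).
    The dual objective is concave and piecewise linear in lam, so its maximum is
    attained at lam = 0 or where two of the lines f_j + lam * d_ij intersect.
    """
    n = len(f)

    def dual(lam: float) -> float:
        return -lam * eps + sum(
            p[i] * min(f[j] + lam * d[i][j] for j in range(n)) for i in range(n))

    # Candidate maximisers: lam = 0 and every positive pairwise line intersection.
    candidates = {0.0}
    for i, j, k in product(range(n), repeat=3):
        if d[i][j] != d[i][k]:
            lam = (f[k] - f[j]) / (d[i][j] - d[i][k])
            if lam > 0:
                candidates.add(lam)
    return max(dual(lam) for lam in candidates)


def robust_q_iteration(reward: Reward, p_hat: List[Matrix], d: Matrix, eps: float,
                       alpha: float, tol: float = 1e-10, max_iter: int = 10_000) -> Matrix:
    """Iterate Q <- T Q with a (hypothetical, model-based) robust Bellman operator
    (T Q)(s, a) = inf_{q in W1-ball(p_hat[s][a], eps)} E_q[reward(s, a, X') + alpha * max_b Q(X', b)].
    T is an alpha-contraction in the sup norm, so the loop converges for 0 <= alpha < 1.
    """
    n_states, n_actions = len(p_hat), len(p_hat[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iter):
        v = [max(row) for row in q]  # v(x') = max_b Q(x', b)
        new_q = [[worst_case_expectation(
                      [reward(s, a, x) + alpha * v[x] for x in range(n_states)],
                      p_hat[s][a], d, eps)
                  for a in range(n_actions)]
                 for s in range(n_states)]
        diff = max(abs(new_q[s][a] - q[s][a])
                   for s in range(n_states) for a in range(n_actions))
        q = new_q
        if diff < tol:
            break
    return q


if __name__ == "__main__":
    # Toy example: two states at positions 0 and 1, one action, and a reference
    # kernel that always moves to state 1; the reward is the next state's position.
    positions = [0.0, 1.0]
    dist = [[abs(x - y) for y in positions] for x in positions]
    kernel = [[[0.0, 1.0]], [[0.0, 1.0]]]

    def reward(s: int, a: int, x: int) -> float:
        return positions[x]

    print(robust_q_iteration(reward, kernel, dist, eps=0.0, alpha=0.5))  # Q close to 2
    print(robust_q_iteration(reward, kernel, dist, eps=0.5, alpha=0.5))  # Q close to 1
```

In the toy run, robustness lowers the value. With eps = 0, the fixed point solves Q = 1 + 0.5·Q, so Q = 2 in both states. With eps = 0.5, the adversary can shift half of the mass to the zero-reward state, so Q = 0.5 + 0.5·Q and therefore Q = 1.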

Original language	English
Article number	111825
Journal	Automatica
Volume	168
DOIs	https://doi.org/10.1016/j.automatica.2024.111825
Publication status	Published - Oct 2024
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2024 Elsevier Ltd

ASJC Scopus Subject Areas

Control and Systems Engineering
Electrical and Electronic Engineering

Keywords

Distributionally robust optimization
Markov decision process
Q-learning
Reinforcement learning
Wasserstein uncertainty

Access to Document

10.1016/j.automatica.2024.111825

Cite this

@article{8beeec0312c64f41a2bf0630001463f6,
  title = "Robust {Q}-learning algorithm for {Markov} decision processes under {Wasserstein} uncertainty",
  abstract = "We present a novel Q-learning algorithm tailored to solve distributionally robust Markov decision problems where the corresponding ambiguity set of transition probabilities for the underlying Markov decision process is a Wasserstein ball around a (possibly estimated) reference measure. We prove convergence of the presented algorithm and provide several examples also using real data to illustrate both the tractability of our algorithm as well as the benefits of considering distributional robustness when solving stochastic optimal control problems, in particular when the estimated distributions turn out to be misspecified in practice.",
  keywords = "Distributionally robust optimization, Markov decision process, Q-learning, Reinforcement learning, Wasserstein uncertainty",
  author = "Ariel Neufeld and Julian Sester",
  note = "Publisher Copyright: {\textcopyright} 2024 Elsevier Ltd",
  year = "2024",
  month = oct,
  doi = "10.1016/j.automatica.2024.111825",
  language = "English",
  volume = "168",
  pages = "111825",
  journal = "Automatica",
  issn = "0005-1098",
  publisher = "Elsevier Limited",
}

TY  - JOUR
T1  - Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty
AU  - Neufeld, Ariel
AU  - Sester, Julian
PY  - 2024/10
Y1  - 2024/10
N2  - We present a novel Q-learning algorithm tailored to solve distributionally robust Markov decision problems where the corresponding ambiguity set of transition probabilities for the underlying Markov decision process is a Wasserstein ball around a (possibly estimated) reference measure. We prove convergence of the presented algorithm and provide several examples also using real data to illustrate both the tractability of our algorithm as well as the benefits of considering distributional robustness when solving stochastic optimal control problems, in particular when the estimated distributions turn out to be misspecified in practice.
AB  - We present a novel Q-learning algorithm tailored to solve distributionally robust Markov decision problems where the corresponding ambiguity set of transition probabilities for the underlying Markov decision process is a Wasserstein ball around a (possibly estimated) reference measure. We prove convergence of the presented algorithm and provide several examples also using real data to illustrate both the tractability of our algorithm as well as the benefits of considering distributional robustness when solving stochastic optimal control problems, in particular when the estimated distributions turn out to be misspecified in practice.
KW  - Distributionally robust optimization
KW  - Markov decision process
KW  - Q-learning
KW  - Reinforcement learning
KW  - Wasserstein uncertainty
UR  - http://www.scopus.com/inward/record.url?scp=85200048519&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85200048519&partnerID=8YFLogxK
U2  - 10.1016/j.automatica.2024.111825
DO  - 10.1016/j.automatica.2024.111825
M3  - Article
AN  - SCOPUS:85200048519
SN  - 0005-1098
VL  - 168
JO  - Automatica
JF  - Automatica
M1  - 111825
ER  - 
