Abstract
In this note we provide an upper bound for the difference between the value function of a distributionally robust Markov decision problem and the value function of a non-robust Markov decision problem. The ambiguity set of probability kernels of the distributionally robust Markov decision process is described by a Wasserstein ball around some reference kernel, while the non-robust Markov decision process behaves according to a fixed probability kernel contained in that ambiguity set. Our upper bound for the difference between the value functions is dimension-free and depends linearly on the radius of the Wasserstein ball.
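For orientation, a schematic form of such a bound can be sketched as below. This is an illustrative shape only, not the paper's exact statement: the constant, the supremum over states, and the underlying assumptions (e.g. a discounted setting with Lipschitz data) are placeholders inferred from the abstract's claim that the bound is dimension-free and linear in the radius.

```latex
% Schematic form of the bound (illustrative only; constants and assumptions are
% placeholders, not the precise statement of the paper).
%
% V^{rob} : value function of the distributionally robust MDP whose ambiguity set is
%           a Wasserstein ball B_\varepsilon(\widehat{P}) of radius \varepsilon
%           around a reference kernel \widehat{P};
% V^{P}   : value function of the non-robust MDP under a fixed kernel
%           P \in B_\varepsilon(\widehat{P}).
\[
  \sup_{x}\,\bigl| V^{\mathrm{rob}}(x) - V^{P}(x) \bigr| \;\le\; C\,\varepsilon,
\]
% where C is a constant independent of the dimension of the state space
% (depending, e.g., on the discount factor and Lipschitz constants of the data),
% so the gap vanishes linearly as the Wasserstein radius \varepsilon tends to 0.
```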
Original language | English |
---|---|
Journal | Journal of Applied Probability |
DOIs | |
Publication status | Accepted/In press - 2024 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © The Author(s), 2024.
ASJC Scopus Subject Areas
- Statistics and Probability
- General Mathematics
- Statistics, Probability and Uncertainty
Keywords
- distributionally robust optimization
- Markov decision process
- reinforcement learning
- Wasserstein uncertainty