Prioritized Sweeping Converges to the Optimal Value Function
Technical documentation   Open access

Lihong Li and Michael Littman
Rutgers University
2008
DOI: https://doi.org/10.7282/T3TX3JSX

Abstract

Keywords: Prioritized Sweeping, Asynchronous Dynamic Programming, Asymptotic Convergence, Decision-Theoretic Planning, Markov Decision Process
Prioritized sweeping (PS) and its variants are model-based reinforcement-learning algorithms that have demonstrated superior computational and experience efficiency in practice. This note establishes what is, to the best of our knowledge, the first formal proof that they converge to the optimal value function when used as planning algorithms. We also describe applications of this result to provably efficient model-based reinforcement learning in the PAC-MDP framework. We do not address the rate of convergence in the present paper.
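To make the planning setting concrete, the following is a minimal sketch of prioritized sweeping on a known MDP. It is an illustration under assumptions, not the authors' exact algorithm: the tiny two-state MDP (`P`, `R`), the threshold `theta`, and the backup budget are all hypothetical choices made for this example. States are backed up in order of the magnitude of their Bellman error, and a backup of a state re-prioritizes its predecessors.

```python
import heapq

# Hypothetical 2-state, 2-action MDP for illustration only.
# P[s][a] = list of (next_state, prob); R[s][a] = reward.
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: 0.0, 1: 1.0},
     1: {0: 0.0, 1: 2.0}}
gamma = 0.9
states, actions = [0, 1], [0, 1]

# Predecessor sets: pred[s'] holds every s that can transition into s'.
pred = {s: set() for s in states}
for s in states:
    for a in actions:
        for s2, p in P[s][a]:
            if p > 0:
                pred[s2].add(s)

def backup(V, s):
    """One Bellman optimality backup; returns the new value of state s."""
    return max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
               for a in actions)

def prioritized_sweeping(theta=1e-8, max_backups=10000):
    V = {s: 0.0 for s in states}
    # Priority = |Bellman error|; heapq is a min-heap, so store negatives
    # to pop the state with the largest error first.
    pq = [(-abs(backup(V, s) - V[s]), s) for s in states]
    heapq.heapify(pq)
    for _ in range(max_backups):
        if not pq:
            break
        neg_prio, s = heapq.heappop(pq)
        if -neg_prio < theta:
            break
        V[s] = backup(V, s)
        # Re-prioritize predecessors whose Bellman error may have changed.
        for sp in pred[s]:
            err = abs(backup(V, sp) - V[sp])
            if err > theta:
                heapq.heappush(pq, (-err, sp))
    return V
```

On this toy MDP the fixed point is V*(1) = 2/(1 - 0.9) = 20 and V*(0) = 1 + 0.9 * 20 = 19, which the sweep recovers to within `theta`. The priority queue is maintained lazily (stale entries are simply re-processed), a common simplification in PS implementations.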

