Abstract
Prioritized sweeping (PS) and its variants are model-based reinforcement-learning algorithms that have demonstrated superior computational and experience efficiency in practice. This note establishes, to the best of our knowledge, the first formal proof of convergence to the optimal value function when they are used as planning algorithms. We also describe applications of this result to provably efficient model-based reinforcement learning in the PAC-MDP framework. We do not address the issue of convergence rate in the present paper.