Dual Temporal Difference Learning

Min Yang; Yuxi Li; Dale Schuurmans

Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5:631-638, 2009.

Abstract

Recently, researchers have investigated novel dual representations as a basis for dynamic programming and reinforcement learning algorithms. Although the convergence properties of classical dynamic programming algorithms have been established for dual representations, temporal difference learning algorithms have not yet been analyzed. In this paper, we study the convergence properties of temporal difference learning using dual representations. We contribute significant progress by proving the convergence of dual temporal difference learning with eligibility traces. Experimental results suggest that the dual algorithms seem to demonstrate empirical benefits over standard primal algorithms.

Cite this Paper

BibTeX


@InProceedings{pmlr-v5-yang09a,
  title = 	 {Dual Temporal Difference Learning},
  author = 	 {Yang, Min and Li, Yuxi and Schuurmans, Dale},
  booktitle = 	 {Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {631--638},
  year = 	 {2009},
  editor = 	 {van Dyk, David and Welling, Max},
  volume = 	 {5},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Hilton Clearwater Beach Resort, Clearwater Beach, Florida, USA},
  month = 	 {16--18 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v5/yang09a/yang09a.pdf},
  url = 	 {https://proceedings.mlr.press/v5/yang09a.html},
  abstract = 	 {Recently, researchers have investigated novel dual representations  as a basis for dynamic programming and reinforcement learning algorithms.  Although the convergence properties of classical dynamic programming  algorithms have been established for dual representations, temporal  difference learning algorithms have not yet been analyzed.  In this paper,  we study the convergence properties of temporal difference learning using  dual representations.  We contribute significant progress by proving the  convergence of dual temporal difference learning with eligibility traces.  Experimental results suggest that the dual algorithms seem to demonstrate  empirical benefits over standard primal algorithms.}
}

Endnote

%0 Conference Paper
%T Dual Temporal Difference Learning
%A Min Yang
%A Yuxi Li
%A Dale Schuurmans
%B Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2009
%E David van Dyk
%E Max Welling
%F pmlr-v5-yang09a
%I PMLR
%P 631--638
%U https://proceedings.mlr.press/v5/yang09a.html
%V 5
%X Recently, researchers have investigated novel dual representations  as a basis for dynamic programming and reinforcement learning algorithms.  Although the convergence properties of classical dynamic programming  algorithms have been established for dual representations, temporal  difference learning algorithms have not yet been analyzed.  In this paper,  we study the convergence properties of temporal difference learning using  dual representations.  We contribute significant progress by proving the  convergence of dual temporal difference learning with eligibility traces.  Experimental results suggest that the dual algorithms seem to demonstrate  empirical benefits over standard primal algorithms.

RIS


TY  - CPAPER
TI  - Dual Temporal Difference Learning
AU  - Min Yang
AU  - Yuxi Li
AU  - Dale Schuurmans
BT  - Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
DA  - 2009/04/15
ED  - David van Dyk
ED  - Max Welling
ID  - pmlr-v5-yang09a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 5
SP  - 631
EP  - 638
L1  - http://proceedings.mlr.press/v5/yang09a/yang09a.pdf
UR  - https://proceedings.mlr.press/v5/yang09a.html
AB  - Recently, researchers have investigated novel dual representations  as a basis for dynamic programming and reinforcement learning algorithms.  Although the convergence properties of classical dynamic programming  algorithms have been established for dual representations, temporal  difference learning algorithms have not yet been analyzed.  In this paper,  we study the convergence properties of temporal difference learning using  dual representations.  We contribute significant progress by proving the  convergence of dual temporal difference learning with eligibility traces.  Experimental results suggest that the dual algorithms seem to demonstrate  empirical benefits over standard primal algorithms.
ER  -

APA


Yang, M., Li, Y. & Schuurmans, D. (2009). Dual Temporal Difference Learning. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 5:631-638. Available from https://proceedings.mlr.press/v5/yang09a.html.
