---
references:
- id: AjayConditional2023
accessed:
- year: 2023
month: 7
day: 16
author:
- family: Ajay
given: Anurag
- family: Du
given: Yilun
- family: Gupta
given: Abhi
- family: Tenenbaum
given: Joshua
- family: Jaakkola
given: Tommi
- family: Agrawal
given: Pulkit
citation-key: AjayConditional2023
DOI: 10.48550/arXiv.2211.15657
event-title: ICLR
issued:
- year: 2023
month: 7
day: 10
publisher: arXiv
title: Is Conditional Generative Modeling all you need for Decision-Making?
type: paper-conference
URL: http://arxiv.org/abs/2211.15657
- id: BensoussanMachine2020
accessed:
- year: 2020
month: 11
day: 30
author:
- family: Bensoussan
given: Alain
- family: Li
given: Yiqun
- family: Nguyen
given: Dinh Phan Cao
- family: Tran
given: Minh-Binh
- family: Yam
given: Sheung Chi Phillip
- family: Zhou
given: Xiang
citation-key: BensoussanMachine2020
container-title: arXiv:2006.05604 [cs, math, stat]
issued:
- year: 2020
month: 6
day: 9
title: Machine Learning and Control Theory
type: article-journal
URL: http://arxiv.org/abs/2006.05604
- id: BrockmanOpenAI2016
accessed:
- year: 2022
month: 5
day: 10
author:
- family: Brockman
given: Greg
- family: Cheung
given: Vicki
- family: Pettersson
given: Ludwig
- family: Schneider
given: Jonas
- family: Schulman
given: John
- family: Tang
given: Jie
- family: Zaremba
given: Wojciech
citation-key: BrockmanOpenAI2016
container-title: arXiv:1606.01540 [cs]
issued:
- year: 2016
month: 6
day: 5
language: en
title: OpenAI Gym
type: article-journal
URL: http://arxiv.org/abs/1606.01540
- id: CliftonQLearning2020
accessed:
- year: 2020
month: 3
day: 11
author:
- family: Clifton
given: Jesse
- family: Laber
given: Eric
citation-key: CliftonQLearning2020
container-title: Annual Review of Statistics and Its Application
DOI: 10.1146/annurev-statistics-031219-041220
issue: '1'
issued:
- year: 2020
page: 279-301
title: 'Q-Learning: Theory and Applications'
type: article-journal
volume: '7'
- id: DayanReinforcement
author:
- family: Dayan
given: Peter
- family: Watkins
given: Christopher JCH
citation-key: DayanReinforcement
container-title: Encyclopedia of Cognitive Science
title: Reinforcement Learning
type: chapter
- id: DroriDeep2022
author:
- family: Drori
given: Iddo
citation-key: DroriDeep2022
container-author:
- family: Drori
given: Iddo
container-title: The science of deep learning
issued:
- year: 2022
publisher: Cambridge University Press
title: Deep reinforcement learning
type: chapter
- id: DroriReinforcement2022
author:
- family: Drori
given: Iddo
citation-key: DroriReinforcement2022
container-author:
- family: Drori
given: Iddo
container-title: The science of deep learning
issued:
- year: 2022
publisher: Cambridge University Press
title: Reinforcement learning
type: chapter
- id: DroriScience2022
author:
- family: Drori
given: Iddo
citation-key: DroriScience2022
issued:
- year: 2022
publisher: Cambridge University Press
title: The science of deep learning
type: book
URL: http://www.dlbook.org
- id: FellowsVIREL2019
accessed:
- year: 2024
month: 5
day: 28
author:
- family: Fellows
given: Matthew
- family: Mahajan
given: Anuj
- family: Rudner
given: Tim G. J.
- family: Whiteson
given: Shimon
citation-key: FellowsVIREL2019
container-title: Advances in Neural Information Processing Systems
issued:
- year: 2019
publisher: Curran Associates, Inc.
title: 'VIREL: A Variational Inference Framework for Reinforcement Learning'
type: paper-conference
URL: >-
https://proceedings.neurips.cc/paper_files/paper/2019/hash/582967e09f1b30ca2539968da0a174fa-Abstract.html
volume: '32'
- id: JaakkolaReinforcement1995
accessed:
- year: 2017
month: 9
day: 13
author:
- family: Jaakkola
given: Tommi
- family: Singh
given: Satinder P.
- family: Jordan
given: Michael I.
citation-key: JaakkolaReinforcement1995
container-title: Advances in neural information processing systems
issued:
- year: 1995
page: 345–352
title: >-
Reinforcement learning algorithm for partially observable Markov decision
problems
type: paper-conference
URL: >-
http://papers.nips.cc/paper/951-reinforcement-learning-algorithm-for-partially-observable-markov-decision-problems.pdf
- id: KaelblingReinforcement1996
accessed:
- year: 2014
month: 11
day: 27
author:
- family: Kaelbling
given: L. P.
- family: Littman
given: M. L.
- family: Moore
given: A. W.
citation-key: KaelblingReinforcement1996
container-title: Journal of Artificial Intelligence Research
issued:
- year: 1996
month: 4
day: 30
title: 'Reinforcement Learning: A Survey'
type: article-journal
URL: http://arxiv.org/abs/cs/9605103
volume: '4'
- id: KochenderferAlgorithms2022
author:
- family: Kochenderfer
given: Mykel J.
- family: Wheeler
given: Tim Allan
- family: Wray
given: Kyle H.
citation-key: KochenderferAlgorithms2022
event-place: Cambridge, Massachusetts; London, UK
ISBN: 978-0-262-04701-2
issued:
- year: 2022
language: en
number-of-pages: '678'
publisher: Massachusetts Institute of Technology
publisher-place: Cambridge, Massachusetts; London, UK
title: Algorithms for decision making
type: book
- id: KorbakRL2022
accessed:
- year: 2024
month: 5
day: 28
author:
- family: Korbak
given: Tomasz
- family: Perez
given: Ethan
- family: Buckley
given: Christopher L.
citation-key: KorbakRL2022
DOI: 10.48550/arXiv.2205.11275
issued:
- year: 2022
month: 10
day: 21
number: arXiv:2205.11275
publisher: arXiv
title: RL with KL penalties is better viewed as Bayesian inference
type: article
URL: http://arxiv.org/abs/2205.11275
- id: KrakovskyReinforcement2016
accessed:
- year: 2016
month: 7
day: 29
author:
- family: Krakovsky
given: Marina
citation-key: KrakovskyReinforcement2016
container-title: Commun. ACM
DOI: 10.1145/2949662
ISSN: 0001-0782
issue: '8'
issued:
- year: 2016
month: 7
page: 12–14
title: Reinforcement Renaissance
type: article-journal
volume: '59'
- id: KrishnamurthyContextualMDPs2016
accessed:
- year: 2016
month: 3
day: 26
author:
- family: Krishnamurthy
given: Akshay
- family: Agarwal
given: Alekh
- family: Langford
given: John
citation-key: KrishnamurthyContextualMDPs2016
container-title: arXiv:1602.02722 [cs, stat]
issued:
- year: 2016
month: 2
day: 8
title: Contextual-MDPs for PAC-Reinforcement Learning with Rich Observations
type: article-journal
URL: http://arxiv.org/abs/1602.02722
- id: LehmanEvolution2022
accessed:
- year: 2023
month: 9
day: 7
author:
- family: Lehman
given: Joel
- family: Gordon
given: Jonathan
- family: Jain
given: Shawn
- family: Ndousse
given: Kamal
- family: Yeh
given: Cathy
- family: Stanley
given: Kenneth O.
citation-key: LehmanEvolution2022
DOI: 10.48550/arXiv.2206.08896
issued:
- year: 2022
month: 6
day: 17
number: arXiv:2206.08896
publisher: arXiv
title: Evolution through Large Models
type: article
URL: http://arxiv.org/abs/2206.08896
- id: LevineReinforcement2018
accessed:
- year: 2021
month: 11
day: 15
author:
- family: Levine
given: Sergey
citation-key: LevineReinforcement2018
container-title: arXiv:1805.00909 [cs, stat]
issued:
- year: 2018
month: 5
day: 20
language: en
title: >-
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and
Review
type: article-journal
URL: http://arxiv.org/abs/1805.00909
- id: ManiaSimple2018
accessed:
- year: 2021
month: 3
day: 31
author:
- family: Mania
given: Horia
- family: Guy
given: Aurelia
- family: Recht
given: Benjamin
citation-key: ManiaSimple2018
container-title: arXiv:1803.07055 [cs, math, stat]
issued:
- year: 2018
month: 3
day: 19
title: >-
Simple random search provides a competitive approach to reinforcement
learning
type: article-journal
URL: http://arxiv.org/abs/1803.07055
- id: MukherjeeBridging2023
accessed:
- year: 2023
month: 7
day: 17
author:
- family: Mukherjee
given: Amartya
- family: Liu
given: Jun
citation-key: MukherjeeBridging2023
DOI: 10.48550/arXiv.2302.00237
issued:
- year: 2023
month: 1
day: 31
number: arXiv:2302.00237
publisher: arXiv
title: >-
Bridging Physics-Informed Neural Networks with Reinforcement Learning:
Hamilton-Jacobi-Bellman Proximal Policy Optimization (HJBPPO)
type: article
URL: http://arxiv.org/abs/2302.00237
- id: ParisottoNeural2017
author:
- family: Parisotto
given: Emilio
- family: Salakhutdinov
given: Ruslan
citation-key: ParisottoNeural2017
container-title: arXiv:1702.08360 [cs]
issued:
- year: 2017
month: 2
day: 27
title: 'Neural Map: Structured Memory for Deep Reinforcement Learning'
type: article-journal
URL: http://arxiv.org/abs/1702.08360
- id: PfauConnecting2016
accessed:
- year: 2019
month: 5
day: 29
author:
- family: Pfau
given: David
- family: Vinyals
given: Oriol
citation-key: PfauConnecting2016
container-title: arXiv:1610.01945 [cs, stat]
issued:
- year: 2016
month: 10
day: 6
title: Connecting Generative Adversarial Networks and Actor-Critic Methods
type: article-journal
URL: http://arxiv.org/abs/1610.01945
- id: RenSpectral2023
accessed:
- year: 2023
month: 4
day: 22
author:
- family: Ren
given: Tongzheng
- family: Zhang
given: Tianjun
- family: Lee
given: Lisa
- family: Gonzalez
given: Joseph E.
- family: Schuurmans
given: Dale
- family: Dai
given: Bo
citation-key: RenSpectral2023
DOI: 10.48550/arXiv.2208.09515
issued:
- year: 2023
month: 3
day: 7
number: arXiv:2208.09515
publisher: arXiv
title: Spectral Decomposition Representation for Reinforcement Learning
type: article
URL: http://arxiv.org/abs/2208.09515
- id: RingstromReward2022
accessed:
- year: 2023
month: 1
day: 26
author:
- family: Ringstrom
given: Thomas J.
citation-key: RingstromReward2022
DOI: 10.48550/arXiv.2211.10851
issued:
- year: 2022
month: 11
day: 22
number: arXiv:2211.10851
publisher: arXiv
title: >-
Reward is not Necessary: How to Create a Compositional Self-Preserving Agent
for Life-Long Learning
type: article
URL: http://arxiv.org/abs/2211.10851
- id: SalimansEvolution2017
author:
- family: Salimans
given: Tim
- family: Ho
given: Jonathan
- family: Chen
given: Xi
- family: Sutskever
given: Ilya
citation-key: SalimansEvolution2017
container-title: arXiv:1703.03864 [cs, stat]
issued:
- year: 2017
month: 3
day: 10
title: Evolution Strategies as a Scalable Alternative to Reinforcement Learning
type: article-journal
URL: http://arxiv.org/abs/1703.03864
- id: SchulmanProximal2017
accessed:
- year: 2024
month: 8
day: 29
author:
- family: Schulman
given: John
- family: Wolski
given: Filip
- family: Dhariwal
given: Prafulla
- family: Radford
given: Alec
- family: Klimov
given: Oleg
citation-key: SchulmanProximal2017
DOI: 10.48550/arXiv.1707.06347
issued:
- year: 2017
month: 8
day: 28
number: arXiv:1707.06347
publisher: arXiv
title: Proximal Policy Optimization Algorithms
type: article
URL: http://arxiv.org/abs/1707.06347
- id: ShibataProbabilistic2006
accessed:
- year: 2014
month: 9
day: 9
author:
- family: Shibata
given: Takeshi
- family: Yoshinaka
given: Ryo
- family: Chikayama
given: Takashi
citation-key: ShibataProbabilistic2006
collection-number: '4264'
collection-title: Lecture Notes in Computer Science
container-title: Algorithmic Learning Theory
editor:
- family: Balcázar
given: José L.
- family: Long
given: Philip M.
- family: Stephan
given: Frank
ISBN: 978-3-540-46649-9 978-3-540-46650-5
issued:
- year: 2006
month: 1
day: 1
language: en
page: 348-362
publisher: Springer Berlin Heidelberg
title: >-
Probabilistic Generalization of Simple Grammars and Its Application to
Reinforcement Learning
type: chapter
URL: http://link.springer.com/chapter/10.1007/11894841_28
- id: SilverReward2021
accessed:
- year: 2022
month: 5
day: 27
author:
- family: Silver
given: David
- family: Singh
given: Satinder
- family: Precup
given: Doina
- family: Sutton
given: Richard S.
citation-key: SilverReward2021
container-title: Artificial Intelligence
container-title-short: Artificial Intelligence
DOI: 10.1016/j.artint.2021.103535
ISSN: 0004-3702
issued:
- year: 2021
month: 10
day: 1
language: en
page: '103535'
title: Reward is enough
type: article-journal
volume: '299'
- id: SuttonPolicy2000
accessed:
- year: 2017
month: 9
day: 13
author:
- family: Sutton
given: Richard S.
- family: McAllester
given: David A.
- family: Singh
given: Satinder P.
- family: Mansour
given: Yishay
citation-key: SuttonPolicy2000
container-title: Advances in neural information processing systems
issued:
- year: 2000
page: 1057–1063
title: >-
Policy gradient methods for reinforcement learning with function
approximation
type: paper-conference
URL: >-
http://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf
- id: SuttonReinforcement1998
author:
- family: Sutton
given: Richard S
- family: Barto
given: Andrew G
citation-key: SuttonReinforcement1998
event-place: Cambridge, Mass.
ISBN: 0-262-19398-1
issued:
- year: 1998
publisher: MIT Press
publisher-place: Cambridge, Mass.
title: Reinforcement learning
type: book
URL: http://lccn.loc.gov/97026416
- id: SuttonReinforcement2018
author:
- family: Sutton
given: Richard S.
- family: Barto
given: Andrew G.
citation-key: SuttonReinforcement2018
edition: 2nd edition
event-place: Cambridge, Massachusetts; London, England
ISBN: 978-0-262-03924-6
issued:
- year: 2018
month: 11
day: 13
language: en
number-of-pages: '552'
publisher: Bradford Books
publisher-place: Cambridge, Massachusetts; London, England
title: 'Reinforcement Learning, second edition: An Introduction'
type: book
URL: http://incompleteideas.net/book/the-book.html
- id: ThrunEfficient1992
author:
- family: Thrun
given: Sebastian B.
citation-key: ThrunEfficient1992
issued:
- year: 1992
title: Efficient Exploration In Reinforcement Learning
type: report
URL: >-
http://www.ri.cmu.edu/pub_files/pub1/thrun_sebastian_1992_1/thrun_sebastian_1992_1.pdf
...