Mixture Weighted Policy Cover
M.S. in Electrical Engineering
Department of Electrical and Computer Engineering
Exploration plays a major role in the performance of reinforcement learning algorithms. Successful exploration should drive the agent toward parts of the state-action space to which it has not been heavily exposed, allowing it to discover trajectories that yield higher value. Exploration becomes considerably more difficult, however, when the environment is nonstationary. This is the case in multiagent reinforcement learning, where other agents are also learning and therefore change the dynamics of the environment from the perspective of any single agent. The upper-confidence-bound-style reward bonus common to many reinforcement learning algorithms does not account for this nonstationarity and therefore cannot be successfully applied to the multiagent setting. In this thesis, we propose Mixture-Weighted Policy Cover, a policy iteration algorithm using an upper-confidence-bound-based intrinsic exploration bonus that encourages exploration in episodic multiagent settings by defining a policy cover that favors newer policies.
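To make the idea concrete, the following is a minimal sketch of a recency-weighted policy cover producing a UCB-style intrinsic bonus. The class name, the geometric decay scheme, and the pseudo-count bonus form are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

class MixtureWeightedCover:
    """Hypothetical sketch: visit counts from past policies are mixed with
    weights that favor newer policies, and the exploration bonus is large
    where the weighted cover is thin."""

    def __init__(self, n_states, n_actions, decay=0.5, beta=1.0):
        self.decay = decay   # older policies are down-weighted by this factor
        self.beta = beta     # scale of the intrinsic bonus
        self.counts = []     # one (n_states, n_actions) count table per past policy

    def add_policy_counts(self, visits):
        # visits: state-action visit counts gathered while running one policy
        self.counts.append(np.asarray(visits, dtype=float))

    def bonus(self, s, a):
        k = len(self.counts)
        if k == 0:
            return self.beta  # nothing covered yet: maximal bonus everywhere
        # Mixture weights favor newer policies: w_i ∝ decay^(age of policy i)
        weights = np.array([self.decay ** (k - 1 - i) for i in range(k)])
        weights /= weights.sum()
        pseudo = sum(w * c[s, a] for w, c in zip(weights, self.counts))
        # UCB-style bonus: decays as the mixture pseudo-count grows
        return self.beta / np.sqrt(pseudo + 1.0)
```

Because the mixture weights decay with policy age, coverage contributed by recent policies suppresses the bonus more strongly than equally large coverage from old policies, which is one way to keep the bonus responsive to the nonstationarity that other learning agents induce.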
Copyright © 2022, author
Miller, Dylan, "Mixture Weighted Policy Cover" (2022). Graduate Theses and Dissertations. 7106.