Mixture-Weighted Policy Cover

Date of Award


Degree Name

M.S. in Electrical Engineering


Department of Electrical and Computer Engineering


Advisor

Raul Ordóñez


Abstract

Exploration plays a major role in the performance of reinforcement learning algorithms. Successful exploration should drive the agent toward parts of the state-action space to which it has not been heavily exposed, allowing it to find trajectories that yield potentially higher value. Exploration becomes much more difficult, however, when the environment is nonstationary. This is the case in multiagent reinforcement learning, where other agents also learn and thus change the dynamics of the environment from the perspective of any single agent. The upper confidence bound (UCB)-style reward bonus common to many reinforcement learning algorithms does not account for this nonstationarity and therefore cannot be successfully applied to the multiagent setting. In this thesis, we propose Mixture-Weighted Policy Cover, a policy iteration algorithm with a UCB-based intrinsic exploration bonus that encourages exploration in episodic multiagent settings by defining a policy cover that favors newer policies.
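The two ingredients the abstract names can be sketched in a few lines. This is not the thesis's actual algorithm, only an illustrative toy under assumed conventions: a count-based UCB-style bonus that is large where visit counts are small, and a hypothetical recency weighting (the `decay` parameter and function names are invented here) that down-weights visitation counts gathered by older policies in the cover, so the bonus reflects what the *current* mixture of policies has explored.

```python
import numpy as np

def ucb_bonus(counts, beta=1.0):
    # UCB-style intrinsic bonus: inversely proportional to the square root
    # of the visit count, so rarely visited state-actions get a large bonus.
    # beta (hypothetical name) scales exploration strength.
    return beta / np.sqrt(np.maximum(counts, 1))

def mixture_weighted_counts(per_policy_counts, decay=0.5):
    # Hypothetical recency weighting over a policy cover: counts collected
    # by newer policies receive geometrically larger weight, so stale data
    # from early policies contributes less to the effective visit counts.
    n = len(per_policy_counts)
    weights = np.array([decay ** (n - 1 - i) for i in range(n)])
    weights /= weights.sum()
    return sum(w * c for w, c in zip(weights, per_policy_counts))

# Example: two policies in the cover; the newer one (index 1) dominates
# the effective counts, so its well-visited regions get a smaller bonus.
counts = mixture_weighted_counts([np.array([3.0, 0.0]),
                                  np.array([0.0, 3.0])])
bonus = ucb_bonus(counts)
```

In a nonstationary multiagent setting, weighting the cover toward newer policies matters because visitation statistics gathered under old opponent behavior may no longer describe the current environment dynamics.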


Keywords

Artificial Intelligence

Rights Statement

Copyright © 2022, author