Natasha Jaques
Interpretability
Concept-based Understanding of Emergent Multi-Agent Behavior
Interpreting whether multi-agent reinforcement learning (MARL) agents have successfully learned to coordinate with each other, rather than finding some other way to exploit the reward function, is a longstanding problem. We develop a novel interpretability method for MARL based on concept bottlenecks, which makes it possible to detect which agents are truly coordinating, determine which environments require coordination, and identify lazy agents.
N. Grupen, Natasha Jaques, B. Kim, S. Omidshafiei
Preprint, 2022