Concept-based Understanding of Emergent Multi-Agent Behavior


This work studies concept-based interpretability in the context of multi-agent learning. Unlike supervised learning, where there have been efforts to understand a model's decisions, multi-agent interpretability remains under-investigated. This is in part due to the increased complexity of the multi-agent setting—interpreting the decisions of multiple agents over time is combinatorially more complex than understanding individual, static decisions—but is also a reflection of the limited availability of tools for understanding multi-agent behavior. Interactions between agents, and coordination generally, remain difficult to gauge in MARL. In this work, we propose Concept Bottleneck Policies (CBPs) as a method for learning intrinsically interpretable, concept-based policies with MARL. We demonstrate that, by conditioning each agent's action on a set of human-understandable concepts, our method enables post-hoc behavioral analysis via concept intervention that is infeasible with standard policy architectures. Experiments show that concept interventions over CBPs reliably detect when agents have learned to coordinate with each other in environments that do not demand coordination, and detect those environments in which coordination is required. Moreover, we find evidence that CBPs can detect coordination failures (such as lazy agents) and expose the low-level inter-agent information that underpins emergent coordination. Finally, we demonstrate that our approach matches the performance of standard, non-concept-based policies, thereby achieving interpretability without sacrificing performance.
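To make the core idea concrete, here is a minimal, hypothetical sketch of a concept bottleneck policy and a post-hoc concept intervention. All layer shapes, weights, and function names are illustrative assumptions, not the paper's implementation: the only property borrowed from the abstract is that the action depends on the observation *only* through a small vector of human-understandable concepts, so overwriting a concept value directly probes how behavior depends on it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper).
OBS_DIM, N_CONCEPTS, N_ACTIONS = 8, 4, 3

# Two linear heads stand in for trained networks:
# observation -> concept scores, and concept scores -> action logits.
W_concept = rng.normal(size=(OBS_DIM, N_CONCEPTS))
W_policy = rng.normal(size=(N_CONCEPTS, N_ACTIONS))


def predict_concepts(obs):
    """Concept bottleneck: each unit is trained to match a labeled,
    human-understandable concept; sigmoid keeps scores in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-obs @ W_concept))


def act(obs, intervention=None):
    """Pick an action from concepts only. `intervention` maps a concept
    index to a value that overwrites the predicted score post hoc."""
    concepts = predict_concepts(obs)
    if intervention is not None:
        for idx, value in intervention.items():
            concepts[idx] = value  # concept intervention
    logits = concepts @ W_policy
    return int(np.argmax(logits)), concepts


obs = rng.normal(size=OBS_DIM)
a_normal, _ = act(obs)
# Force concept 0 off and concept 1 on; a changed action would indicate
# the policy's behavior depends on those concepts.
a_intervened, _ = act(obs, intervention={0: 0.0, 1: 1.0})
```

Because the bottleneck is the policy's only input, sweeping interventions over concepts (e.g. those encoding other agents' states) gives a direct, behavioral test of whether an agent is actually using inter-agent information, which is the mechanism behind the coordination analyses described above.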

Preprint
Natasha Jaques

My research is focused on Social Reinforcement Learning: developing algorithms that use insights from social learning to improve AI agents' learning, generalization, coordination, and human-AI interaction.