Concept-based Understanding of Emergent Multi-Agent Behavior

Determining whether multi-agent reinforcement learning (MARL) agents have learned to coordinate with each other, rather than finding some other way to exploit the reward function, is a longstanding problem. We develop a novel interpretability method for MARL based on concept bottlenecks, which makes it possible to detect which agents are truly coordinating, establish which environments require coordination, and identify lazy agents.

N. Grupen, Natasha Jaques, B. Kim, S. Omidshafiei

2022 In Preprint

Moral Foundations of Large Language Models

Moral Foundations theory decomposes human moral reasoning into five factors, which vary reliably across different human populations and political affiliations. We use moral foundations to analyze large language models like GPT-3 to determine what, if any, consistent moral values they bring to conversations, whether these can be deliberately manipulated, and whether holding a particular moral stance affects downstream tasks.

M. Abdulhai, C. Crepy, D. Valter, J. Canny, S. Levine, Natasha Jaques

2022 In Preprint

In the ZONE: Measuring difficulty and progression in curriculum generation

Past work on curriculum generation in RL has focused on training a teacher agent to generate tasks for a student agent that accelerate student learning and improve generalization. In this work, we create a mathematical framework that formalizes these concepts and subsumes prior work, taking inspiration from the psychological concept of the Zone of Proximal Development. We propose two new techniques based on rejection sampling and maximizing the student’s gradient norm that improve curriculum learning.

R. E. Wang, J. Mu, D. Arumugam, Natasha Jaques, N. Goodman

2022 In Preprint

Multi-Agent Reinforcement Learning for Hardware Architecture Search: A Case Study on Domain-Specific DRAM Memory Controller Design

Reinforcement Learning can potentially be a powerful tool for solving complex combinatorial optimization problems, such as microprocessor design. Here, we show that a multi-agent RL approach outperforms past work using single-agent RL, since the problem can easily be decomposed into designing independent sub-systems.

S. Krishnan, Natasha Jaques, S. Omidshafiei, D. Zhang, I. Gur, V. J. Reddi, S. Faust

2022 In Preprint

Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience

Using inverse reinforcement learning to infer human preferences is challenging, because it is an underspecified problem. We use multi-task RL pre-training and successor features to learn a strong prior over the space of reasonable goals in an environment—which we call a basis—that enables rapidly inferring an expert’s reward function in only 100 samples.

M. Abdulhai, Natasha Jaques, S. Levine

2022 In Preprint

A Comparison of Random Forests and Dropout Nets for Sign Language Recognition with the Kinect

We conduct a study in which participants form American Sign Language hand signs while being recorded with a Microsoft Kinect. The resulting infrared distance data are used to train both neural networks with dropout (dropout NNs) and Random Forests; the dropout NNs perform significantly better.

Natasha Jaques, J. Nutini

2013 In Unpublished manuscript

Emotionally Adaptive Intelligent Tutoring Systems using POMDPs

An emerging field in user-adaptive systems is affect adaptivity: modeling and responding to an estimation of the user’s emotional state. Prior work used Dynamic Bayesian Networks to obtain adaptivity, but in this paper we represent the problem as a Partially Observable Markov Decision Process (POMDP) and find solutions that compute a plan of interventions for an Intelligent Tutoring System to take given an estimation of the user’s mood and goals.

Natasha Jaques

2013 In Unpublished manuscript

Fast Johnson–Lindenstrauss transform for classification of high dimensional data

This paper investigates the utility of using the Fast Johnson-Lindenstrauss Transform (FJLT) to produce a low-dimensional random projection of eye-tracking data features that can be used for classifying emotion in an Intelligent Tutoring System. Interestingly, the FJLT provides similar or superior performance to more computationally expensive techniques.

Natasha Jaques

2013 In Unpublished manuscript
