In the ZONE: Measuring difficulty and progression in curriculum generation


A common strategy in curriculum generation for reinforcement learning is to train a teacher network to generate tasks that enable student learning. But, what kind of tasks enables this? One answer is tasks belonging to a student’s zone of proximal development (ZPD), a concept from developmental psychology. These are tasks that are not too easy and not too hard for the student. Albeit intuitive, ZPD is not well understood computationally. We propose ZONE, a novel computational framework that operationalizes ZPD. It formalizes ZPD through the language of Bayesian probability theory, revealing that tasks should be selected by difficulty (the student’s probability of task success) and learning progression (the degree of change in the student’s model parameters). ZONE instantiates two techniques that enforce the teacher to pick tasks within the student’s ZPD. One is REJECT , which rejects tasks outside of a difficulty scope, and the other is GRAD, which prioritizes tasks that maximize the student’s gradient norm. We apply these techniques to existing curriculum learning algorithms. We show that they improve the student’s generalization performance on discrete MiniGrid environments and continuous control MuJoCo domains with up to 9× higher success. ZONE also accelerates the student’s learning by training with 10× less data

In Preprint
Natasha Jaques
Natasha Jaques

My research is focused on Social Reinforcement Learning–developing algorithms that use insights from social learning to improve AI agents’ learning, generalization, coordination, and human-AI interaction.