Hierarchical Reinforcement Learning

Hierarchical Reinforcement Learning for Open-Domain Dialog

For the first time, we use hierarchical reinforcement learning to train open-domain dialog models. This enables the optimization of long-term conversational rewards, including reducing the toxicity of generated language. Our approach provides significant improvements over state-of-the-art dialog models.

A. Saleh*, Natasha Jaques*, A. Ghandeharioun, J. H. Shen, R. Picard

2019. In Association for the Advancement of Artificial Intelligence (AAAI). Oral presentation (top 7.8% of submissions).
