Offline RL

Human-Centric Dialog Training via Offline Reinforcement Learning

We train dialog models on interactive data from conversations with real humans, using a novel offline RL technique based on KL-control. Rather than relying on manual ratings, we learn from implicit signals such as sentiment, and show that this improves performance.

Natasha Jaques*, J. H. Shen*, A. Ghandeharioun, C. Ferguson, A. Lapedriza, N. Jones, S. Gu, R. Picard

2020 In Empirical Methods in Natural Language Processing (EMNLP)
