Sequence Modeling

Hierarchical Reinforcement Learning for Open-Domain Dialog

For the first time, we use hierarchical reinforcement learning to train open-domain dialog models, enabling the optimization of long-term conversational rewards, including reducing the toxicity of generated language. Our approach provides significant improvements over state-of-the-art dialog models.

A. Saleh*, Natasha Jaques*, A. Ghandeharioun, J. H. Shen, R. Picard

2019 In Association for the Advancement of Artificial Intelligence (AAAI) Oral (top 7.8% of submissions)


Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

To combine supervised learning on data with reinforcement learning, we pre-train a supervised data prior and penalize KL-divergence from this model during RL training. This enables effective learning of complex sequence-modeling problems in which we wish to match the data while optimizing external metrics such as drug effectiveness. The approach produces compelling results in the disparate domains of music generation and drug discovery.

Natasha Jaques, S. Gu, D. Bahdanau, J. M. Hernandez-Lobato, R. E. Turner, D. Eck

2017 In International Conference on Machine Learning (ICML)
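The KL-penalty idea described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the `kl_weight` parameter, and the toy distributions are all hypothetical, and the sketch assumes a single decision step with discrete actions. It shows a task reward reduced by the log-ratio between the RL policy and the pre-trained prior, a penalty whose expectation under the policy equals the KL-divergence from the prior.

```python
import math


def kl_divergence(policy_probs, prior_probs):
    """KL(policy || prior) for two discrete distributions over the same actions."""
    return sum(
        q * math.log(q / p)
        for q, p in zip(policy_probs, prior_probs)
        if q > 0
    )


def kl_penalized_reward(task_reward, policy_probs, prior_probs, action, kl_weight=1.0):
    """Task reward minus a weighted log-ratio penalty for the sampled action.

    Averaged over actions sampled from the policy, the penalty term equals
    kl_weight * KL(policy || prior), so drifting away from the supervised
    prior lowers the expected reward. (Hypothetical helper, for illustration.)
    """
    log_ratio = math.log(policy_probs[action]) - math.log(prior_probs[action])
    return task_reward - kl_weight * log_ratio


# Toy example: the prior (trained on data) favors action 1,
# while the RL policy has drifted toward a uniform distribution.
prior = [0.25, 0.75]
policy = [0.5, 0.5]

expected_penalty = sum(
    policy[a] * (1.0 - kl_penalized_reward(1.0, policy, prior, a))
    for a in range(len(policy))
)
```

In this toy setting, `expected_penalty` matches `kl_divergence(policy, prior)`, and a policy identical to the prior incurs no penalty at all. A larger `kl_weight` keeps the policy closer to the data distribution, while a smaller one gives more weight to the external metric.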


Interactive Musical Improvisation with Magenta

This demo deployed RL Tuner and other Magenta music generation models in an interactive interface in which users can collaborate creatively with a machine learning model. The interface supports call-and-response interaction, automatic generation of an accompaniment to the user’s melody, and melody morphing, in which the model responds with both variations on the user’s melody and a bass accompaniment.

A. Roberts, J. Engel, C. Hawthorne, I. Simon, E. Waite, S. Oore, Natasha Jaques, C. Resnick, D. Eck

2016 In Neural Information Processing Systems (NeurIPS) Best Demo


Tuning Recurrent Neural Networks with Reinforcement Learning

Traditional supervised sequence models for music generation suffer from known failure modes, including the inability to produce coherent global structure. Music is an interesting sequence generation problem because musical compositions adhere to known rules. We impose these rules with a novel algorithm combining RL and supervised learning.

Natasha Jaques, S. Gu, R. E. Turner, D. Eck

2016 In International Conference on Learning Representations (ICLR) - workshop
