Natasha Jaques
Sequence Modeling
Hierarchical Reinforcement Learning for Open-Domain Dialog
For the first time, we use hierarchical reinforcement learning to train open-domain dialog models, enabling the optimization of long-term conversational rewards, including reducing the toxicity of generated language. Our approach provides significant improvements over state-of-the-art dialog models.
A. Saleh*, Natasha Jaques*, A. Ghandeharioun, J. H. Shen, R. Picard
2019
In Association for the Advancement of Artificial Intelligence (AAAI)
Oral (top 7.8% of submissions)
PDF
Code
Dataset
Talk
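To make "long-term conversational rewards" concrete: in a hierarchical setup, a higher-level policy can act once per utterance, so rewards are assigned per conversation turn and discounted across turns rather than across individual tokens. The sketch below illustrates only that idea under stated assumptions and is not the paper's method or reward design. The per-utterance toxicity score (assumed to lie in [0, 1] and to come from some external classifier), the helper names `toxicity_reward` and `turn_level_returns`, and the discount `gamma=0.9` are all hypothetical.

```python
from typing import List


def toxicity_reward(toxicity_score: float) -> float:
    """Hypothetical per-utterance reward: the negated toxicity score.

    `toxicity_score` is assumed to lie in [0, 1] and to come from an external
    toxicity classifier (not part of this sketch); higher means more toxic.
    """
    if not 0.0 <= toxicity_score <= 1.0:
        raise ValueError("toxicity_score must be in [0, 1]")
    return -toxicity_score


def turn_level_returns(turn_rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Discounted return for each conversation turn: G_t = r_t + gamma * G_{t+1}.

    There is one reward per utterance, so discounting happens across turns,
    not across the individual tokens inside an utterance.
    """
    returns = [0.0] * len(turn_rewards)
    running = 0.0
    # Walk backwards so each turn's return includes all later turns.
    for t in reversed(range(len(turn_rewards))):
        running = turn_rewards[t] + gamma * running
        returns[t] = running
    return returns


# Usage: three bot utterances, the second one judged moderately toxic.
rewards = [toxicity_reward(s) for s in [0.0, 0.5, 0.0]]
print(turn_level_returns(rewards))
```

Because the toxic second turn also lowers the return of the first turn, a turn-level policy trained on these returns is pushed away from conversation states that tend to lead to toxic replies, not just from the toxic reply itself.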
Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
To combine supervised learning on data with reinforcement learning, we pre-train a supervised model as a data prior, then fine-tune it with RL while penalizing the KL divergence from this prior. This enables effective learning of complex sequence-modeling problems in which we want to match the data while optimizing external metrics such as drug effectiveness. The approach produces compelling results in the disparate domains of music generation and drug discovery.
Natasha Jaques, S. Gu, D. Bahdanau, J. M. Hernandez-Lobato, R. E. Turner, D. Eck
2017
In International Conference on Machine Learning (ICML)
PDF
Code
ICML talk
Generated music
Magenta blog
MIT Tech Review article
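One way to apply a KL penalty during RL fine-tuning is to fold it into the per-step reward. The sketch below assumes a generic token-level formulation; the function name `kl_penalized_rewards`, the coefficient `beta`, and the example numbers are illustrative, and the paper's exact objective and RL algorithm may differ. Each step's task reward is reduced by `beta` times the log-probability ratio between the policy being fine-tuned and the frozen supervised prior. For actions sampled from the policy, that ratio averages to the KL divergence from the prior.

```python
import math
from typing import List, Sequence


def kl_penalized_rewards(
    task_rewards: Sequence[float],
    policy_logprobs: Sequence[float],
    prior_logprobs: Sequence[float],
    beta: float = 0.1,
) -> List[float]:
    """Per-step reward r_t - beta * (log pi(a_t|s_t) - log p(a_t|s_t)).

    pi is the RL policy being fine-tuned and p is the frozen, pre-trained
    supervised prior. For actions sampled from pi, the expected value of the
    log-ratio is KL(pi || p), so the penalty discourages drifting from the data.
    """
    if not (len(task_rewards) == len(policy_logprobs) == len(prior_logprobs)):
        raise ValueError("all inputs must have one entry per generated token")
    return [
        r - beta * (lp_pi - lp_prior)
        for r, lp_pi, lp_prior in zip(task_rewards, policy_logprobs, prior_logprobs)
    ]


# Usage: two generated tokens. A hypothetical external metric rewards only the
# final token, and the policy has become twice as confident as the prior
# about that token, so it pays a penalty of beta * log(2) there.
rewards = kl_penalized_rewards(
    task_rewards=[0.0, 1.0],
    policy_logprobs=[math.log(0.5), math.log(0.8)],
    prior_logprobs=[math.log(0.5), math.log(0.4)],
    beta=0.1,
)
print(rewards)
```

A larger `beta` keeps generated sequences closer to the data prior; a smaller one lets the external metric dominate, at the risk of drifting toward degenerate outputs the prior would consider unlikely.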
Interactive Musical Improvisation with Magenta
This demo deployed RL Tuner and other Magenta music generation models in an interactive interface in which users can collaborate creatively with a machine learning model. The interface supports call-and-response interaction, automatic accompaniment of the user’s melody, and melody morphing, which responds with both variations on the user’s melody and a bass accompaniment.
A. Roberts, J. Engel, C. Hawthorne, I. Simon, E. Waite, S. Oore, Natasha Jaques, C. Resnick, D. Eck
2016
In Neural Information Processing Systems (NeurIPS)
Best Demo
Code
Video
NeurIPS Demo
Magenta
Blog post
Tuning Recurrent Neural Networks with Reinforcement Learning
Generating music with traditional supervised sequence models suffers from known failure modes, including the inability to produce coherent global structure. Music is an interesting sequence-generation problem because musical compositions adhere to known rules. We impose these rules with a novel algorithm that combines RL and supervised learning.
Natasha Jaques, S. Gu, R. E. Turner, D. Eck
2016
In International Conference on Learning Representations (ICLR) Workshop
PDF
Code
Magenta blog
MIT Tech Review article
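As an illustration of how known musical rules can be encoded as a reward, here is a minimal sketch over MIDI pitch numbers with two hypothetical rules: stay in C major, and avoid repeating one pitch more than `max_repeats` times in a row. The rule set, the reward values, the names `rule_reward` and `combined_reward`, and the constant `c` are assumptions for illustration, not the paper's reward. The combination with the supervised model assumes a reward of the form log p(a|s) + r_rules / c, which trades off staying close to the pre-trained model against following the rules.

```python
from typing import List

# Hypothetical rule set: pitch classes of the C major scale (C D E F G A B).
C_MAJOR_PITCH_CLASSES = {0, 2, 4, 5, 7, 9, 11}


def rule_reward(melody: List[int], t: int, max_repeats: int = 3) -> float:
    """Hypothetical music-theory reward for the note at position t (MIDI pitches)."""
    if not 0 <= t < len(melody):
        raise IndexError("t is outside the melody")
    note = melody[t]
    reward = 0.0
    # Rule 1: reward in-key notes, penalize out-of-key notes.
    reward += 1.0 if note % 12 in C_MAJOR_PITCH_CLASSES else -1.0
    # Rule 2: penalize a run of identical pitches longer than max_repeats.
    run, i = 1, t - 1
    while i >= 0 and melody[i] == note:
        run += 1
        i -= 1
    if run > max_repeats:
        reward -= 1.0
    return reward


def combined_reward(prior_logprob: float, melody: List[int], t: int, c: float = 0.5) -> float:
    """Assumed combination: log-probability under the supervised prior plus scaled rule reward."""
    return prior_logprob + rule_reward(melody, t) / c


# Usage: C, D, C#, then C repeated four times.
melody = [60, 62, 61, 60, 60, 60, 60]
print([rule_reward(melody, t) for t in range(len(melody))])
```

Dividing the rule reward by `c` sets how strongly the rules override the supervised model: a small `c` makes the rules dominate, while a large `c` keeps the tuned model close to what it learned from data.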