Natasha Jaques
KL-control
Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
To combine supervised learning on data with reinforcement learning, we pre-train a supervised model as a data prior and penalize KL-divergence from this prior during RL training. This enables effective learning of complex sequence-modeling problems in which we wish to match the data while optimizing external metrics such as drug effectiveness. The approach produces compelling results in the disparate domains of music generation and drug discovery.
Natasha Jaques, S. Gu, D. Bahdanau, J. M. Hernandez-Lobato, R. E. Turner, D. Eck
In International Conference on Machine Learning (ICML), 2017
PDF
Code
ICML talk
Generated music
Magenta blog
MIT Tech Review article
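The idea summarized in the abstract, rewarding a policy for an external metric while penalizing its KL-divergence from a pre-trained data prior, can be illustrated with a minimal single-step sketch. This is a generic KL-regularized objective written under assumptions, not the authors' implementation: the function names, the scaling constant `c`, and the toy logits and rewards are all hypothetical and chosen only for illustration.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))


def kl_divergence(policy_logits, prior_logits):
    """KL(policy || prior) between two categorical distributions given as logits."""
    log_pi = log_softmax(policy_logits)
    log_p = log_softmax(prior_logits)
    return np.sum(np.exp(log_pi) * (log_pi - log_p), axis=-1)


def kl_regularized_objective(policy_logits, prior_logits, task_reward, c=1.0):
    """Expected task reward under the policy, scaled by 1/c, minus KL from the prior.

    `c` (hypothetical name) trades off the external reward against
    staying close to the pre-trained prior: larger c weights the prior more.
    """
    pi = np.exp(log_softmax(policy_logits))
    expected_reward = np.sum(pi * task_reward, axis=-1)
    return expected_reward / c - kl_divergence(policy_logits, prior_logits)


# Toy example: three possible next tokens, with reward only for the first.
prior_logits = np.zeros(3)                 # stand-in for the supervised prior
policy_logits = np.array([2.0, 0.0, 0.0])  # policy shifted toward the rewarded token
task_reward = np.array([1.0, 0.0, 0.0])

print(kl_regularized_objective(policy_logits, prior_logits, task_reward, c=1.0))
```

When the policy equals the prior, the KL term vanishes and the objective reduces to the scaled expected reward; moving probability mass toward rewarded tokens raises the reward term but incurs a growing KL penalty.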