KL-control

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

To combine supervised learning on data with reinforcement learning, we pre-train a supervised model to serve as a data prior, then fine-tune with RL while penalizing KL-divergence from that prior. This enables effective learning of complex sequence-modeling problems in which we wish to stay close to the data while optimizing external metrics such as drug effectiveness. The approach produces compelling results in the disparate domains of music generation and drug discovery.
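The KL penalty described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy three-token distributions, and the reward-scaling constant `c` are assumptions made for illustration. The idea shown is that each sampled action receives the task reward plus the log-probability under the pre-trained prior minus the log-probability under the current policy; averaged over the policy's own samples, that correction equals the negative KL-divergence from the prior.

```python
import math


def kl_divergence(policy, prior):
    """KL(policy || prior) for two discrete distributions over the same tokens."""
    return sum(q * math.log(q / p) for q, p in zip(policy, prior) if q > 0)


def kl_penalized_reward(task_reward, policy, prior, action, c=1.0):
    """Per-step reward with a KL penalty toward the prior (illustrative form).

    task_reward: external metric for the chosen action (hypothetical value).
    policy, prior: next-token probabilities from the RL policy and the
        pre-trained supervised model (toy lists here, not real model outputs).
    c: assumed constant trading off the task reward against the prior.
    """
    return task_reward / c + math.log(prior[action]) - math.log(policy[action])


# Toy distributions over three tokens (made up for this example).
prior = [0.7, 0.2, 0.1]
policy = [0.1, 0.2, 0.7]

# The policy strongly prefers token 2, which the prior considers unlikely,
# so the reward for choosing it is reduced below the raw task reward.
shaped = kl_penalized_reward(1.0, policy, prior, action=2)
print(shaped < 1.0, kl_divergence(policy, prior) > 0)
```

When the policy matches the prior exactly, the correction vanishes and only the scaled task reward remains, which is what keeps the fine-tuning conservative.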

Natasha Jaques, S. Gu, D. Bahdanau, J. M. Hernandez-Lobato, R. E. Turner, D. Eck

In International Conference on Machine Learning (ICML), 2017
