Environment Generation for Zero-Shot Compositional Reinforcement Learning
We analyze and improve upon PAIRED in the case of learning to generate challenging compositional tasks. We apply our improved algorithm to the complex task of training RL agents to navigate websites, and find that it is able to generating a challenging curriculum of novel sites. We achieve a 4x improvement over the strongest web navigation baselines, and deploy our model to navigate real-world websites..
Neural Information Processing Systems (NeurIPS)