blog Feb 21, 2018

RESEARCH | Exploring deep recurrent models with reinforcement learning for molecule design

Authors: Daniel Neil, Marwin Segler, Laura Guasch, Mohamed Ahmed, Dean Plumbley, Matthew Sellwood, Nathan Brown

The design of small molecules with bespoke properties is of central importance to drug discovery. However, significant challenges remain for computational methods, despite recent advances such as deep recurrent networks and reinforcement learning strategies for sequence generation, and it can be difficult to compare results across different works.



This work proposes 19 benchmarks selected by subject experts, expands the smaller datasets used previously to approximately 1.1 million training molecules, and explores how to apply new reinforcement learning techniques effectively for molecular design. The benchmarks, built as OpenAI Gym environments, will be open-sourced to encourage innovation in molecular design algorithms and to make them usable by those without a background in chemistry. Finally, this work explores recent developments in reinforcement learning methods with excellent sample complexity (the A2C and PPO algorithms) and investigates their behavior in molecular generation, demonstrating significant performance gains compared to standard reinforcement learning techniques.
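To give a feel for what "built as OpenAI Gym environments" can mean for sequence generation, here is a minimal sketch in plain Python that mimics the classic Gym interface (`reset()` returning an observation, `step(action)` returning observation, reward, done flag, and info). Everything in it is an assumption made for illustration: the class name `ToyMoleculeEnv`, the tiny vocabulary, and the carbon-counting reward are hypothetical and are not the benchmarks or the API released with this work.

```python
import random


class ToyMoleculeEnv:
    """Hypothetical Gym-style environment: the agent emits one token per step."""

    # Illustrative vocabulary only; not the vocabulary used in the paper.
    VOCAB = ["C", "N", "O", "(", ")", "=", "<EOS>"]

    def __init__(self, max_len=20, target_carbons=4):
        self.max_len = max_len
        self.target_carbons = target_carbons
        self.tokens = []

    def reset(self):
        """Start a new, empty string and return it as the observation."""
        self.tokens = []
        return tuple(self.tokens)

    def step(self, action):
        """Append the chosen token; the episode ends on <EOS> or at max_len."""
        token = self.VOCAB[action]
        done = token == "<EOS>" or len(self.tokens) + 1 >= self.max_len
        if token != "<EOS>":
            self.tokens.append(token)
        # The reward is only given once the whole string is finished.
        reward = self._score() if done else 0.0
        return tuple(self.tokens), reward, done, {}

    def _score(self):
        # Toy objective: highest reward when the carbon count hits the target.
        n_carbons = self.tokens.count("C")
        return 1.0 / (1.0 + abs(n_carbons - self.target_carbons))


def random_policy(observation, rng):
    """Baseline policy that ignores the observation and picks a random token."""
    return rng.randrange(len(ToyMoleculeEnv.VOCAB))


def run_episode(env, policy, seed=0):
    """Roll out one episode and return the generated string and its return."""
    rng = random.Random(seed)
    observation = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        action = policy(observation, rng)
        observation, reward, done, _ = env.step(action)
        total_reward += reward
    return "".join(observation), total_reward


if __name__ == "__main__":
    print(run_episode(ToyMoleculeEnv(), random_policy, seed=0))
```

Because the reward arrives only at the end of the episode, an agent has to assign credit across every token it emitted, which is one reason sample-efficient algorithms such as A2C and PPO are of interest in this setting. A learned policy would replace `random_policy` in the rollout loop.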

Keywords: reinforcement learning, molecule design, de novo design, ppo, sample-efficient reinforcement learning, chemistry



Novel drugs are developed using design-make-test cycles: molecules are designed, synthesized in the laboratory, and then tested for their biological effect. The insights gained from these tests then inform the design for the next iteration. The objective of de novo design methodologies is to perform this cycle with computational methods (Brown, 2015; Schneider, 2013). The test phase was the first to be automated, using the broad class of machine learning models known as quantitative structure-activity/property relationships (QSAR/QSPR) to predict the activity of a molecule against a certain biological target, or its physicochemical properties. To make virtual molecules, symbolic approaches based on graph rewriting have been used, which are domain-specific and rely on extensive hand-engineering by experts. To optimize the properties of a molecule, for example its activity against a biological target (design), global optimization approaches such as evolutionary algorithms or ant colony optimization have been used (Brown, 2015; Schneider, 2013). Symbolic approaches have been criticized either for generating unrealistic molecules that would be difficult to synthesize, or for being too conservative and therefore not sufficiently exploring the space of tractable molecules (Schneider, 2013; Brown & Boström, 2016).

Recently, generative models have been proposed to learn the distribution of real druglike molecules from data, and then to generate chemical structures that are appropriate for the application domain (White & Wilson, 2010). Interestingly, the generation of molecules is related to natural language generation (NLG). Two classic problems of NLG (preserving coherent long-range dependencies, and syntactic and semantic correctness) map directly to molecules.

Current investigations draw heavily from tools developed for language tasks, including variational autoencoders (VAE) (Gómez-Bombarelli et al., 2016; Kusner et al., 2017), recurrent neural network (RNN) models (Segler et al., 2017; Jaques et al., 2017; Olivecrona et al., 2017), generative adversarial networks (GAN) (Guimaraes et al., 2017) and Monte Carlo Tree Search (MCTS) (Yang et al., 2017).
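The parallel with language can be made concrete under one assumption: that molecules are written as SMILES strings, a text notation commonly used by sequence models of this kind (this excerpt does not name the representation). In SMILES, a branch opened with `(` must later be closed with `)`, and a ring opened with a digit must be closed by the same digit, possibly many characters later. The sketch below checks only those two rules. The function name `check_smiles_syntax` is hypothetical, and the check is deliberately simplified: it ignores bracket atoms, `%nn` ring labels, valence, and aromaticity, so it is not a substitute for a real cheminformatics parser.

```python
def check_smiles_syntax(smiles):
    """Simplified syntax check for a SMILES-like string.

    Verifies two long-range constraints only:
      * every '(' is matched by a later ')'
      * every single-digit ring label appears in open/close pairs
    Bracket atoms, '%nn' ring labels, and chemistry rules are ignored.
    """
    branch_depth = 0
    open_rings = set()
    for ch in smiles:
        if ch == "(":
            branch_depth += 1
        elif ch == ")":
            branch_depth -= 1
            if branch_depth < 0:
                # A branch was closed before it was opened.
                return False
        elif ch.isdigit():
            # First occurrence of a digit opens a ring, the next closes it.
            if ch in open_rings:
                open_rings.remove(ch)
            else:
                open_rings.add(ch)
    return branch_depth == 0 and not open_rings


if __name__ == "__main__":
    for candidate in ["c1ccccc1", "c1ccccc", "CC(=O)O", "CC(=O"]:
        print(candidate, check_smiles_syntax(candidate))
```

A generative model has to remember each open ring label and branch until it emits the matching close, which is the molecular counterpart of keeping long-range dependencies coherent in generated text. Passing this check says nothing about semantic validity, such as whether the atoms have chemically sensible valences.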


Full publication available on OpenReview