De novo design seeks to generate molecules with required property profiles through virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, neural-network-based models for molecular design have recently appeared and shown promising results. However, the new models have not been profiled on consistent tasks, and comparative studies against well-established algorithms have only seldom been performed. To standardise the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardised benchmarks. The benchmark tasks encompass measuring how faithfully the models reproduce the property distributions of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single- and multi-objective optimisation tasks. The benchmarking framework is available as an open-source Python package.
Machine Learning for Molecules and Materials, NeurIPS 2018 Workshop
DEFactor: Differentiable Edge Factorization-based Probabilistic Graph Generation
Rim Assouel, Mohamed Ahmed, Marwin H Segler and Amir Saffari (BenevolentAI), Yoshua Bengio (MILA)
Generating novel molecules with optimal properties is a crucial step in many industries such as drug discovery. Recently, deep generative models have shown a promising way of performing de novo molecular design. Although graph generative models are currently available, they either have a number of parameters that depends on the graph size, limiting their use to very small graphs, or are formulated as a sequence of discrete actions needed to construct a graph, which makes the output graph non-differentiable with respect to the model parameters and therefore prevents their use in scenarios such as conditional graph generation. In this work we propose a model for conditional graph generation that is computationally efficient and enables direct optimisation of the graph. We demonstrate favourable performance of our model on prototype-based molecular graph conditional generation tasks.
Machine Learning in Health Workshop, NeurIPS 2018
Adjusting for Confounding in Unsupervised Latent Representations of Images
Craig A. Glastonbury (BenevolentAI), Michael Ferlaino, Christoffer Nellåker and Cecilia M. Lindgren (Big Data Institute, University of Oxford)
Biological imaging data are often partially confounded or contain unwanted variability. Examples of such phenomena include variable lighting across microscopy image captures, stain intensity variation in histological slides, and batch effects in high-throughput drug screening assays. Therefore, to develop "fair" models that generalise well to unseen examples, it is crucial to learn data representations that are insensitive to nuisance factors of variation. In this paper, we present a strategy based on adversarial training that is capable of learning unsupervised representations invariant to confounders. As an empirical validation of our method, we use deep convolutional autoencoders to learn unbiased cellular representations from microscopy images.
Machine Learning in Health Workshop, NeurIPS 2018
Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs
Daniel Neil, Joss Briody, Alix Lacoste, Aaron Sim, Paidi Creed, Amir Saffari
In this work, we provide a new formulation of Graph Convolutional Neural Networks (GCNNs) for link prediction on graph data that addresses common challenges for biomedical knowledge graphs (KGs). We introduce a regularized attention mechanism to GCNNs that not only improves performance on clean datasets but also favorably accommodates noise in KGs, a pervasive issue in real-world applications. Further, we explore new visualization methods for interpretable modelling and illustrate how the learned representation can be exploited to automate dataset denoising. The results are demonstrated on a synthetic dataset, the common benchmark dataset FB15k-237, and a large biomedical knowledge graph derived from a combination of noisy and clean data sources. Using these improvements, we visualize a learned model's representation of the disease cystic fibrosis and demonstrate how to interrogate a neural network to show the potential of PPARG as a candidate therapeutic target for rheumatoid arthritis.
Future Medicinal Chemistry, 13 August 2018
Artificial intelligence in drug discovery
Matthew A Sellwood, Mohamed Ahmed, Marwin HS Segler & Nathan Brown
There has been a great deal of hype surrounding the resurgence of Artificial Intelligence and Machine Learning. This commentary, published in Future Medicinal Chemistry, gives a brief overview of the AI and ML domains and their relevance to different aspects of drug discovery and, importantly, reflects on managing expectations from different quarters. The key themes covered are molecular design approaches (including our recent paper on de novo design models), predictive modelling, synthesis planning, and closing the feedback loop to learn from our decisions.
ClinicalTrials.gov is the world’s largest primary registry of clinical studies. For almost two decades, it has been helping physicians, patients, and regulators identify relevant trials and collect evidence. It also offers a unique opportunity to explore, examine, and monitor the clinical research landscape. In our recent research paper, we used the ClinicalTrials.gov registry data to conduct a comprehensive large-scale analysis of registered clinical trials and investigate trends in their design and transparency.
Progress in Medicinal Chemistry, Volume 57, Elsevier, 10 April 2018
Chapter Five - Big Data in Drug Discovery
Nathan Brown, Jean Cambruzzi, Peter J. Cox, Mark Davies, James Dunbar, Dean Plumbley, Matthew A. Sellwood, Aaron Sim, Bryn I. Williams-Jones, Magdalena Zwierzyna, David W. Sheppard
Modern scientific discovery is driven by data and by learning from those data. This book chapter offers an overview of the data sources relevant to drug discovery and of how these data can, and do, have an impact on our research and predictions. They enable better-informed decisions and faster changes in how discovery research is conducted, helping to progress drugs to the clinic.
Ian Churcher, VP Drug Discovery, recently published a paper in Nature highlighting how organic synthesis could represent an opportunity for the pharmaceutical industry to improve drug development. He presents the current challenges that the industry needs to overcome and explains why new technologies and industry-academia collaborations are essential to progress.
Nature, 28 March 2018
Planning chemical syntheses with deep neural networks and symbolic AI
Marwin Segler et al.
The AI technology developed by Marwin uses deep neural networks to learn from essentially every published chemical reaction (12.4 million of them). Combined with modern tree search algorithms, this makes it possible to plan the synthesis of novel molecules. The technology augments chemists' ability to make molecules faster, increases the success rate of synthetic chemistry, and improves the speed and efficiency of drug development in general.
OpenReview, ICLR 2018, 27 March 2018
Exploring deep recurrent models with reinforcement learning for molecule design
Daniel Neil, Marwin Segler, Laura Guasch, Mohamed Ahmed, Dean Plumbley, Matthew Sellwood, Nathan Brown
The essence of molecular design is to create molecules that fulfil a property profile desirable for a drug. In this paper we consider a number of different generative models for the design of new molecular structures that satisfy multiple specific objectives desirable for a particular drug discovery project. In addition to evaluating multiple generative models, as part of this work we also present a benchmarking dataset to the community, with the aim of providing an objective basis for evaluating new de novo molecular design models.
ChemMedChem, 20 March 2018
Special Issue: Cheminformatics in Drug Discovery
Andreas Bender, Nathan Brown
In early 2018, BenevolentAI's Head of Cheminformatics, Nathan Brown, guest edited a special issue of ChemMedChem in collaboration with Andreas Bender of the University of Cambridge. The special issue consisted of twenty original research papers from leading names in the field and was introduced by a guest editorial written by Nathan and Andreas. It covered a broad range of topics in cheminformatics, from recent work on machine learning in drug discovery to large-scale data analyses of protein structures and ligand binding.