Recent publications.



Interpret: Constructing large scale biomedical knowledge bases from scratch with rapid annotation of interpretable patterns

Julien Fauqueur, Ashok Thillaisundaram, Theodosia Togia

Knowledge base construction is crucial for summarising, understanding and inferring relationships between biomedical entities. However, for many practical applications such as drug discovery, the scarcity of relevant facts (e.g. gene X is a therapeutic target for disease Y) severely limits a domain expert's ability to create a usable knowledge base, either directly or by training a relation extraction model.
In this paper, we present a simple and effective method of extracting new facts with a pre-specified binary relationship type from the biomedical literature, without requiring any training data or hand-crafted rules. Our system discovers, ranks and presents the most salient patterns to domain experts in an interpretable form. By marking patterns as compatible with the desired relationship type, experts indirectly batch-annotate candidate pairs whose relationship is expressed with such patterns in the literature.

Journal of Chemical Information and Modeling, 19 March 2019

GuacaMol: Benchmarking Models for De Novo Molecular Design

Nathan Brown, Marco Fiscato, Marwin H.S. Segler, and Alain C. Vaucher

De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks have appeared recently and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies against well-established algorithms have only seldom been performed. To standardise the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardised benchmarks. The benchmark tasks encompass measuring the fidelity of the models to reproduce the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single and multi-objective optimisation tasks. The benchmarking framework is available as an open-source Python package.

Machine Learning for Molecules and Materials Workshop, NeurIPS 2018

DEFactor: Differentiable Edge Factorization-based Probabilistic Graph Generation

Rim Assouel, Mohamed Ahmed, Marwin H Segler and Amir Saffari (BenevolentAI), Yoshua Bengio (MILA)


Generating novel molecules with optimal properties is a crucial step in many industries such as drug discovery. Recently, deep generative models have shown a promising way of performing de novo molecular design. Although graph generative models are currently available, they either have a number of parameters that depends on graph size, limiting their use to very small graphs, or are formulated as a sequence of discrete actions needed to construct a graph, which makes the output graph non-differentiable with respect to the model parameters and therefore prevents their use in scenarios such as conditional graph generation. In this work, we propose a model for conditional graph generation that is computationally efficient and enables direct optimisation of the graph. We demonstrate favourable performance of our model on prototype-based molecular graph conditional generation tasks.

Machine Learning in Health Workshop, NeurIPS 2018

Adjusting for Confounding in Unsupervised Latent Representations of Images

Craig A. Glastonbury (BenevolentAI), Michael Ferlaino, Christoffer Nellåker and Cecilia M. Lindgren (Big Data Institute, University of Oxford)


Biological imaging data are often partially confounded or contain unwanted variability. Examples of such phenomena include variable lighting across microscopy image captures, stain intensity variation in histological slides, and batch effects for high throughput drug screening assays. Therefore, to develop "fair" models which generalise well to unseen examples, it is crucial to learn data representations that are insensitive to nuisance factors of variation. In this paper, we present a strategy based on adversarial training, capable of learning unsupervised representations invariant to confounders. As an empirical validation of our method, we use deep convolutional autoencoders to learn unbiased cellular representations from microscopy imaging.

Machine Learning in Health Workshop, NeurIPS 2018

Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs

Daniel Neil, Joss Briody, Alix Lacoste, Aaron Sim, Paidi Creed, Amir Saffari


In this work, we provide a new formulation for Graph Convolutional Neural Networks (GCNNs) for link prediction on graph data that addresses common challenges for biomedical knowledge graphs (KGs). We introduce a regularized attention mechanism to GCNNs that not only improves performance on clean datasets, but also favorably accommodates noise in KGs, a pervasive issue in real-world applications. Further, we explore new visualization methods for interpretable modelling and to illustrate how the learned representation can be exploited to automate dataset denoising. The results are demonstrated on a synthetic dataset, the common benchmark dataset FB15k-237, and a large biomedical knowledge graph derived from a combination of noisy and clean data sources. Using these improvements, we visualize a learned model's representation of the disease cystic fibrosis and demonstrate how to interrogate a neural network to show the potential of PPARG as a candidate therapeutic target for rheumatoid arthritis.

Future Medicinal Chemistry, 13 Aug 2018

Artificial intelligence in drug discovery

Matthew A Sellwood, Mohamed Ahmed, Marwin HS Segler & Nathan Brown


There has been a great deal of hype surrounding the resurgence of Artificial Intelligence and Machine Learning. This commentary was published in Future Medicinal Chemistry as a brief overview of the AI and ML domains, their relevance in different aspects of drug discovery and, importantly, reflecting on managing expectations from different quarters. The key themes covered are molecular design approaches, including our recent paper on de novo design models, predictive modelling, synthesis planning, and closing the feedback loop to learn from our decisions.

British Medical Journal, 7 June 2018

Clinical trial design and dissemination: comprehensive analysis of ClinicalTrials.gov and PubMed data since 2005

Magdalena Zwierzyna, Mark Davies, Aroon D Hingorani, Jackie Hunter

ClinicalTrials.gov is the world’s largest primary registry of clinical studies. For almost two decades now, it has been helping physicians, patients, and regulators identify relevant trials and collect evidence. It also offers a unique opportunity to explore, examine, and monitor the clinical research landscape. In our recent research paper, we used the registry data to conduct a comprehensive large-scale analysis of registered clinical trials and investigate trends in their design and transparency.

Progress in Medicinal Chemistry, Volume 57, Elsevier, 10 April 2018

Chapter Five - Big Data in Drug Discovery

Nathan Brown, Jean Cambruzzi, Peter J. Cox, Mark Davies, James Dunbar, Dean Plumbley, Matthew A. Sellwood, Aaron Sim, Bryn I. Williams-Jones, Magdalena Zwierzyna, David W. Sheppard


Modern scientific discovery is driven by data and by learning from those data. This book chapter offers an overview of the data sources relevant to drug discovery and of how they can and do make an impact on our research and predictions, supporting better-informed decisions that more rapidly change our discovery research ethic and progress drugs to the clinic.

Nature Chemistry, 4 April 2018

Organic synthesis provides opportunities to transform drug discovery

Ian Churcher et al.

Ian Churcher, VP Drug Discovery, recently published a paper in Nature Chemistry highlighting how organic synthesis could represent an opportunity for the pharmaceutical industry to improve drug development. He presents the current challenges that the industry needs to overcome and explains how new technologies and industry-academia collaborations are essential to progress.

Nature, 28 March 2018

Planning chemical syntheses with deep neural networks and symbolic AI

Marwin Segler et al.


The AI technology developed by Marwin uses deep neural networks to learn from every chemical reaction ever performed (12.4 million of them). Combined with modern tree search algorithms, this makes it possible to plan the synthesis of novel molecules. The technology augments the ability of chemists to make molecules faster, and increases the success rate of synthetic chemistry and the speed and efficiency of drug development in general.

OpenReview, ICLR 2018, 27 March 2018

Exploring deep recurrent models with reinforcement learning for molecule design

Daniel Neil, Marwin Segler, Laura Guasch, Mohamed Ahmed, Dean Plumbley, Matthew Sellwood, Nathan Brown


The essence of molecular design is to effectively fulfill a molecular property profile that is desirable in a drug. In this paper, we consider a number of different generative models for the design of new molecular structures that satisfy the multiple specific objectives desirable for a particular drug discovery project. In addition to evaluating multiple generative models, we also present a benchmarking dataset to the community, with the aim of providing an objective set on which other new de novo molecular design models can be evaluated appropriately.

ChemMedChem, 20 March 2018

Special Issue: Cheminformatics in Drug Discovery

Andreas Bender, Nathan Brown


BenevolentAI guest-edited a special issue of ChemMedChem in early 2018, with our Head of Cheminformatics, Nathan Brown, working in collaboration with Andreas Bender at the University of Cambridge. The special issue consisted of twenty original research papers from leading names in the field and opened with a guest editorial by Nathan and Andreas introducing the content. It covered a broad range of topics in cheminformatics, from recent work on machine learning in drug discovery to large-scale data analyses of protein structures and ligand binding.