Joshua Meyers, Cheminformatics Data Scientist at BenevolentAI, will be presenting at the Royal Society of Chemistry on our latest paper, "DeeplyTough: Learning to structurally compare protein binding sites".
The similarity principle is a cornerstone of small molecule drug discovery. Identifying structurally similar protein binding sites can help guide hit finding and the understanding of polypharmacology. Many approaches for binding site comparison (pocket matching) aim to quantify the likelihood that a pair of protein pockets binds the same ligands [1]. Methods for pocket matching have traditionally relied on human intuition and have employed a broad variety of algorithms and representations of the input protein structures. Because neural networks can learn latent representations directly from the input data, they offer an opportunity to remove the bias associated with intuition-based featurisation schemes and to learn more effectively from existing protein structure data.
DeeplyTough is a convolutional neural network (CNN) that encodes a three-dimensional representation of a protein binding site into a descriptor vector. Descriptors can then be compared efficiently, without any structural alignment, by computing pairwise Euclidean distances. The network is trained on the recently released TOUGH-M1 dataset, which contains positive and negative pairs of protein pockets [2]. During training, the network learns to: (i) assign similar descriptors to similar (positive) pockets, (ii) separate the descriptors of dissimilar (negative) pockets by at least a minimum margin, and (iii) be robust to nuisance variations. The method is evaluated on three large-scale benchmark datasets. It shows excellent performance on held-out data drawn from the training distribution, and competitive performance when the trained network must generalise to independently constructed datasets.
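Training objectives (i) and (ii) are the shape of a standard contrastive loss: positive pairs are pulled together, and negative pairs are pushed apart until they are at least a margin away. The sketch below is a generic contrastive loss under that assumption, not necessarily the exact formulation used in DeeplyTough. The descriptor vectors stand in for CNN outputs, and the margin value is a hypothetical default. Objective (iii), robustness to nuisance variations, is not shown.

```python
import math
from typing import Iterable, Sequence, Tuple

# A descriptor is the fixed-length vector a network would output for one pocket.
Descriptor = Sequence[float]


def contrastive_loss(a: Descriptor, b: Descriptor, is_positive: bool, margin: float = 1.0) -> float:
    """Generic contrastive loss for a single pocket pair (margin is a hypothetical default).

    Positive pairs are penalised by their squared Euclidean distance, pulling
    their descriptors together. Negative pairs are penalised only when they are
    closer than `margin`, pushing them at least `margin` apart.
    """
    d = math.dist(a, b)  # alignment-free comparison: plain Euclidean distance
    if is_positive:
        return d ** 2
    return max(0.0, margin - d) ** 2


def mean_contrastive_loss(
    pairs: Iterable[Tuple[Descriptor, Descriptor, bool]], margin: float = 1.0
) -> float:
    """Average the per-pair loss over a batch of (descriptor_a, descriptor_b, is_positive) pairs."""
    losses = [contrastive_loss(a, b, pos, margin) for a, b, pos in pairs]
    return sum(losses) / len(losses)
```

For two descriptors at distance 5, a positive pair costs 25, while a negative pair costs nothing once the margin is 5 or less. The loss only acts on negatives that sit inside the margin, so the network is not pushed to spread dissimilar pockets arbitrarily far apart.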
A further advantage of using a CNN for pocket comparison is the efficiency of inferring the similarity between two pockets. We envision that this increased throughput, combined with sufficient accuracy in identifying similar pockets that bind similar ligands across unrelated protein folds, will enable more powerful large-scale analyses at the proteome level [3].
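The throughput argument follows from the descriptor design. Each pocket passes through the network once, and every later comparison is a single distance computation on stored vectors. An all-vs-all comparison of N pockets therefore needs N forward passes plus N(N-1)/2 cheap distance calculations, rather than N(N-1)/2 structural alignments. The sketch below assumes descriptors have already been computed. The function name, pocket IDs and vectors are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def rank_pockets(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the top_k library pockets whose descriptors are closest to the query.

    Descriptors are assumed to be precomputed, so each comparison is a single
    Euclidean distance rather than a structural alignment.
    """
    scored = ((pocket_id, math.dist(query, desc)) for pocket_id, desc in library.items())
    return heapq.nsmallest(top_k, scored, key=lambda item: item[1])


# Hypothetical precomputed descriptors (real ones would come from the trained network).
library = {
    "pocket_A": [0.0, 0.0],
    "pocket_B": [3.0, 4.0],
    "pocket_C": [1.0, 0.0],
}
print(rank_pockets([0.0, 0.0], library, top_k=2))  # → [('pocket_A', 0.0), ('pocket_C', 1.0)]
```

`heapq.nsmallest` keeps only the `top_k` best hits rather than sorting the whole library, which matters when the library spans a proteome.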
Joshua is an experienced data scientist with a demonstrated history of working in the biotechnology industry, skilled in Python, machine learning, structural bioinformatics, chemoinformatics and medicinal chemistry. Joshua holds a PhD in chemoinformatics from The Institute of Cancer Research, University of London.
[1] Naderi, M., Lemoine, J. M., Govindaraj, R. G., Kana, O. Z., Feinstein, W. P., & Brylinski, M. (2018). Binding site matching in rational drug design: algorithms and applications. Briefings in Bioinformatics, bby078. doi:10.1093/bib/bby078
[2] Govindaraj, R. G., & Brylinski, M. (2018). Comparative assessment of strategies to identify similar ligand-binding pockets in proteins. BMC Bioinformatics, 19(1), 91. doi:10.1186/s12859-018-2109-2
[3] Meyers, J., Brown, N., & Blagg, J. (2016). Mapping the 3D structures of small molecule binding sites. Journal of Cheminformatics, 8, 70. doi:10.1186/s13321-016-0180-0