Our team will be heading to the BioNLP workshop at the end of July to present a poster on our latest paper: Constructing large scale biomedical knowledge bases from scratch with rapid annotation of interpretable patterns.
Knowledge base construction is crucial for summarising, understanding and inferring relationships between biomedical entities. However, for many practical applications such as drug discovery, the scarcity of relevant facts (e.g. gene X is a therapeutic target for disease Y) severely limits a domain expert's ability to create a usable knowledge base, either directly or by training a relation extraction model.
In this paper, we present a simple and effective method of extracting new facts with a pre-specified binary relationship type from the biomedical literature, without requiring any training data or hand-crafted rules. Our system discovers, ranks and presents the most salient patterns to domain experts in an interpretable form. By marking patterns as compatible with the desired relationship type, experts indirectly batch-annotate candidate pairs whose relationship is expressed with such patterns in the literature. Even with a complete absence of seed data, experts are able to discover thousands of high-quality pairs with the desired relationship within minutes. When a small number of relevant pairs do exist - even when their relationship is more general (e.g. gene X is biologically associated with disease Y) than the relationship of interest - our system leverages them in order to i) learn a better ranking of the patterns to be annotated or ii) generate weakly labelled pairs in a fully automated manner.
We evaluate our method both intrinsically and via a downstream knowledge base completion task, and show that it is an effective way of constructing knowledge bases when few or no relevant facts are already available.
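To make the pattern-annotation workflow described in the abstract more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation from the paper: the toy `mentions` data, the pattern strings, the function names and the ranking criterion (number of distinct entity pairs per pattern) are all hypothetical. The paper's actual pattern representation and ranking method are not specified in this post.

```python
from collections import defaultdict

# Hypothetical toy input: (gene, disease, pattern) triples, where `pattern`
# is a simplified, human-readable pattern linking the two entities.
mentions = [
    ("GENE_A", "DISEASE_1", "inhibition of X reduces Y"),
    ("GENE_B", "DISEASE_2", "inhibition of X reduces Y"),
    ("GENE_C", "DISEASE_1", "X is expressed in Y"),
    ("GENE_A", "DISEASE_3", "X is expressed in Y"),
    ("GENE_D", "DISEASE_4", "X is expressed in Y"),
    ("GENE_E", "DISEASE_5", "X was not associated with Y"),
]


def group_pairs_by_pattern(mentions):
    """Map each pattern to the set of distinct entity pairs it connects."""
    pairs = defaultdict(set)
    for gene, disease, pattern in mentions:
        pairs[pattern].add((gene, disease))
    return dict(pairs)


def rank_patterns(pairs_by_pattern):
    """Order patterns for expert review.

    Assumption: salience is approximated by the number of distinct pairs a
    pattern covers, with ties broken alphabetically. This stands in for
    whatever ranking the real system uses.
    """
    return sorted(pairs_by_pattern, key=lambda p: (-len(pairs_by_pattern[p]), p))


def batch_annotate(pairs_by_pattern, approved_patterns):
    """Collect every pair expressed through at least one approved pattern."""
    accepted = set()
    for pattern in approved_patterns:
        accepted |= pairs_by_pattern.get(pattern, set())
    return accepted


pairs_by_pattern = group_pairs_by_pattern(mentions)
ranked = rank_patterns(pairs_by_pattern)

# The expert reviews the ranked patterns and marks those compatible with the
# target relationship (here, a therapeutic relationship).
approved = {"inhibition of X reduces Y"}
extracted = batch_annotate(pairs_by_pattern, approved)
```

The point of the sketch is the leverage a single decision provides: approving one pattern labels every pair expressed through it, so the expert's effort scales with the number of patterns rather than the number of candidate pairs.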
Ashok Thillaisundaram is a machine learning researcher at BenevolentAI. He read mathematics for his first degree and then completed a PhD in applied mathematics, both at the University of Cambridge. His recent work in machine learning has focused on reinforcement learning and natural language processing.
Sia is a machine learning researcher specialising in Natural Language Processing. She obtained a PhD in 2015 from the University of Cambridge and previously an MSc in Speech and Language Processing and an MPhil in Linguistics (from the University of Edinburgh and University of Cambridge respectively). For the last five years, she has been working in the tech industry focusing on information extraction, information retrieval and knowledge representation.
Julien Fauqueur is an AI research manager at BenevolentAI developing Natural Language Processing methods for Biomedical Information Extraction. He holds a PhD in Computer Science from INRIA (France). As a research associate at the University of Cambridge, and later as a researcher in start-ups, he developed new methods and products using computer vision and machine learning. His work has appeared in over 20 publications and led to 4 patents.