Meet Ashok at EMNLP for an oral presentation of his latest paper, co-authored with Sia Togia: “Biomedical relation extraction with pre-trained language representations and minimal task-specific architecture”.
This paper presents our participation in the AGAC Track of the 2019 BioNLP Open Shared Tasks. We provide a solution for Task 3, which aims to extract “gene – function change – disease” triples, where “gene” and “disease” are mentions of particular genes and diseases respectively, and “function change” is one of four pre-defined relationship types.
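To make the task shape concrete, here is a minimal sketch of how a pairwise classifier's output could be turned into triples. It assumes that every gene-disease mention pair in the same text span becomes one candidate instance, and that pairs labelled “no relation” are simply dropped. The helper names (`candidate_pairs`, `to_triples`, `Triple`) are hypothetical, and the function-change label names `LOF`, `GOF`, `REG`, `COM` are an assumption, since the abstract only says there are four pre-defined types.

```python
from dataclasses import dataclass
from itertools import product

# Assumed label names: the abstract only states that there are four
# pre-defined function-change types plus a "no relation" class.
FUNCTION_CHANGES = ("LOF", "GOF", "REG", "COM")
NO_RELATION = "NO_RELATION"


@dataclass(frozen=True)
class Triple:
    gene: str
    function_change: str
    disease: str


def candidate_pairs(genes, diseases):
    """Assumption: every gene-disease mention pair is one classification instance."""
    return list(product(genes, diseases))


def to_triples(pairs, labels):
    """Keep only pairs whose predicted label is one of the function changes."""
    triples = []
    for (gene, disease), label in zip(pairs, labels):
        if label == NO_RELATION:
            continue
        if label not in FUNCTION_CHANGES:
            raise ValueError(f"unknown label: {label}")
        triples.append(Triple(gene, label, disease))
    return triples


pairs = candidate_pairs(["BRCA1", "TP53"], ["breast cancer"])
# Hypothetical classifier output, one label per candidate pair.
labels = ["LOF", NO_RELATION]
print(to_triples(pairs, labels))
# [Triple(gene='BRCA1', function_change='LOF', disease='breast cancer')]
```

Framing extraction as classification over candidate pairs is what lets a single five-way classifier produce the triples; the enumeration strategy shown here is illustrative only.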
Our system extends BERT (Devlin et al., 2018), a state-of-the-art language model that learns contextual language representations from a large unlabelled corpus and whose parameters can be fine-tuned to solve specific tasks with minimal additional architecture.
We encode the pair of mentions and their textual context as two consecutive sequences in BERT, separated by a special symbol. We then use a single linear layer to classify their relationship into five classes (four pre-defined, as well as ‘no relation’). Despite considerable class imbalance, our system significantly outperforms a random baseline while relying on an extremely simple setup with no specially engineered features.
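The input encoding and classification head described above can be sketched in plain Python, under several stated assumptions. The mention pair is placed in the first segment and the context in the second, using BERT's standard `[CLS] A [SEP] B [SEP]` layout with segment (token type) ids 0 and 1. Whitespace splitting stands in for BERT's WordPiece tokenizer, the linear layer is assumed to read the `[CLS]` vector, and the label names are assumed. `encode_pair` and `linear_classify` are hypothetical helpers, not the authors' code, and the toy 2-dimensional vector replaces BERT's real hidden states.

```python
import math

# Assumed label names: four function-change types plus "no relation".
LABELS = ("LOF", "GOF", "REG", "COM", "NO_RELATION")


def encode_pair(gene, disease, context, tokenize=str.split):
    """Pack the mention pair (segment A) and its context (segment B) in the
    [CLS] A [SEP] B [SEP] layout; whitespace splitting stands in for WordPiece."""
    segment_a = ["[CLS]"] + tokenize(gene) + tokenize(disease) + ["[SEP]"]
    segment_b = tokenize(context) + ["[SEP]"]
    tokens = segment_a + segment_b
    token_type_ids = [0] * len(segment_a) + [1] * len(segment_b)
    return tokens, token_type_ids


def linear_classify(cls_vector, weights, bias):
    """A single linear layer (one weight row per class) over the assumed
    [CLS] representation, followed by a softmax over the five classes."""
    logits = [sum(w * x for w, x in zip(row, cls_vector)) + b
              for row, b in zip(weights, bias)]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


tokens, types = encode_pair("BRCA1", "breast cancer",
                            "BRCA1 mutations are linked to breast cancer")
print(tokens)
print(types)
# Toy 2-dimensional "[CLS]" vector and weights; real BERT-base uses 768 dims.
label, probs = linear_classify([1.0, 0.0],
                               [[2, 0], [0, 0], [0, 0], [0, 0], [1, 0]],
                               [0.0] * 5)
print(label)  # LOF
```

In actual fine-tuning, the weights of this layer would be learned jointly with all of BERT's parameters; the sketch only shows the data flow, and it omits truncation to BERT's maximum sequence length.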
Ashok Thillaisundaram is a machine learning researcher at BenevolentAI. He read mathematics for his first degree and then completed a PhD in applied mathematics, both at the University of Cambridge. His recent work in machine learning has focused on reinforcement learning and natural language processing.
Sia is a senior machine learning researcher specialising in natural language processing. She obtained a PhD from the University of Cambridge in 2015, having previously earned an MSc in Speech and Language Processing from the University of Edinburgh and an MPhil in Linguistics from the University of Cambridge. For the last five years, she has been working in the tech industry, focusing on information extraction, information retrieval and knowledge representation.