
Technical Report 266, c4e-Preprint Series, Cambridge

A Question Answering System for Chemistry

Reference: Technical Report 266, c4e-Preprint Series, Cambridge, 2021



Highlights
  • A proof-of-concept Question Answering system for chemical data is built.
  • A novel design that integrates a topic model for better accuracy is investigated.
  • A training set of 80085 questions is automatically generated.
  • The training set is effective for training both the question classification model and the entity extraction model.
Abstract

This paper describes the implementation and evaluation of a proof-of-concept Question Answering system for accessing chemical data from knowledge graphs that offer data ranging from chemical kinetics to the chemical and physical properties of species. We trained a question type classification model and an entity extraction model to interpret chemistry questions of interest. The system has a novel design that applies a topic model to identify the question-to-ontology affiliation, that is, which ontology a question relates to. The topic model helps the system provide more accurate answers. We also implemented a new method that automatically generates training questions from ontologies. The generated training set contains 80085 questions of 8 types and proved effective for training both the question type classification model and the entity extraction model. We evaluated the system using the Google search engine as the baseline and found that it can answer 114 questions of interest to which neither Google nor Wolfram Alpha gives a direct answer. Moreover, applying the topic model was found to increase the accuracy of constructing the correct queries.
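The abstract does not say how training questions are generated from the ontologies. As a rough illustration only, the sketch below assumes a simple template-filling approach, which is not necessarily the authors' method. It shows why automatically generated questions can train both models at once: each question comes with its type label (for the classifier) and the character spans of its entities (for entity extraction). The question types, templates, and vocabulary here are hypothetical stand-ins for labels that would be drawn from an ontology.

```python
from dataclasses import dataclass
from itertools import product
from string import Formatter


@dataclass
class TrainingExample:
    text: str
    question_type: str                    # label for the question type classifier
    entities: list[tuple[int, int, str]]  # (start, end, slot) spans for entity extraction


# Hypothetical question types and templates; slots are written as {name}.
TEMPLATES = {
    "property_of_species": [
        "what is the {attribute} of {species}",
        "{species} {attribute}",
    ],
    "reactions_of_species": ["what reactions produce {species}"],
}

# Hypothetical slot vocabulary, standing in for labels taken from an ontology.
VOCAB = {
    "species": ["methane", "benzene"],
    "attribute": ["molecular weight", "boiling point"],
}


def fill(template: str, values: dict[str, str]) -> tuple[str, list[tuple[int, int, str]]]:
    """Fill a template left to right, recording the character span of each slot value."""
    parts: list[str] = []
    entities: list[tuple[int, int, str]] = []
    length = 0
    for literal, field, _, _ in Formatter().parse(template):
        parts.append(literal)
        length += len(literal)
        if field is not None:
            value = values[field]
            entities.append((length, length + len(value), field))
            parts.append(value)
            length += len(value)
    return "".join(parts), entities


def generate(templates: dict[str, list[str]], vocab: dict[str, list[str]]) -> list[TrainingExample]:
    """Expand every template with every combination of slot values."""
    examples: list[TrainingExample] = []
    for qtype, patterns in templates.items():
        for pattern in patterns:
            fields = [f for _, f, _, _ in Formatter().parse(pattern) if f is not None]
            for combo in product(*(vocab[f] for f in fields)):
                text, entities = fill(pattern, dict(zip(fields, combo)))
                examples.append(TrainingExample(text, qtype, entities))
    return examples


if __name__ == "__main__":
    for example in generate(TEMPLATES, VOCAB)[:2]:
        print(example)
```

Because the number of questions is the product of the slot vocabulary sizes per template, even a small set of templates over a large ontology yields a large labelled set; here 3 templates and 2 values per slot give 10 questions.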

Material from this preprint has been published in Journal of Chemical Information and Modeling.
