Question Answering System for Chemistry
- A proof-of-concept Question Answering system for chemical data is built.
- A novel design that integrates a topic model for better accuracy is investigated.
- A training set of 432,989 questions is automatically generated.
- The training set is effective for training both the question classification and the entity extraction models.
This paper describes the implementation and evaluation of a proof-of-concept Question Answering (QA) system for accessing chemical data from knowledge graphs (KGs), which offer data ranging from chemical kinetics to the chemical and physical properties of species. We trained question classification and named entity recognition models that specialize in interpreting chemistry questions. The system has a novel design that applies a topic model to identify the ontology to which each question relates, which allows it to handle ontologies with different structures. The topic model also helps the system provide higher-quality answers. In addition, we implemented a new method that automatically generates training questions from ontologies. The generated training set contains 432,989 questions of 11 types and proved effective for training both the question classification model and the named entity recognition model. We evaluated the system using other KGQA systems as baselines. The system outperforms the chosen KGQA baselines in answering chemistry-related questions. The QA system is also compared with the Google search engine and the WolframAlpha engine. The comparison shows that the QA system can answer certain types of questions better than these engines.
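To illustrate how a single generated question set can serve both question classification and named entity recognition, the following is a minimal sketch of template-based question generation. It is not the method used in the paper, whose details are not given here. The species and property labels, the template wording, the question-type name `property_of_species`, and the span labels `species` and `attribute` are all hypothetical; in practice the labels would be read from the ontologies.

```python
from itertools import product

# Hypothetical labels standing in for terms read from an ontology.
SPECIES = ["benzene", "methane", "ethanol"]
PROPERTIES = ["molecular weight", "boiling point"]

# Hypothetical templates, keyed by an illustrative question type.
TEMPLATES = {
    "property_of_species": [
        "what is the {prop} of {species}",
        "show me the {prop} of {species}",
    ],
}


def generate_questions(species, properties, templates):
    """Yield (question, question_type, entity_spans) training examples.

    The question type is the label for a question classifier; the
    character spans are the labels for a named entity recognizer.
    """
    for qtype, patterns in templates.items():
        for pattern, s, p in product(patterns, species, properties):
            question = pattern.format(prop=p, species=s)
            spans = []
            for label, text in (("attribute", p), ("species", s)):
                # Assumes each label occurs exactly once in the question.
                start = question.index(text)
                spans.append((start, start + len(text), label))
            yield question, qtype, sorted(spans)


examples = list(generate_questions(SPECIES, PROPERTIES, TEMPLATES))
```

Each template is filled with every combination of labels, so the number of examples grows as the product of templates, species, and properties. This is how a small set of templates over a large ontology could plausibly yield a training set on the order of hundreds of thousands of questions.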