Technical Report 306, c4e-Preprint Series, Cambridge

A chemical species ontology for data integration and knowledge discovery

Reference: Technical Report 306, c4e-Preprint Series, Cambridge, 2023



Highlights
  • The OntoSpecies ontology is developed to represent chemical species and their properties.
  • OntoSpecies serves as a core component of The World Avatar and is linked to the existing ontologies in The World Avatar chemistry domain.
  • A software agent is developed to dynamically collect data from PubChem and ChEBI.
  • The ontological format permits advanced queries as well as easy analysis and visualization of chemical information.
Abstract

Web ontologies are important tools in modern scientific research, as they provide a standardized way to represent and manage large amounts of complex data. In chemistry, a comprehensive and reliable semantic database of chemical species is essential for accurate analysis and prediction of chemical behavior. This paper presents OntoSpecies, a web ontology designed to semantically represent chemical species and their properties. The ontology serves as a core component of The World Avatar knowledge-graph chemistry domain and includes a wide range of identifiers, chemical and physical properties, chemical classifications and applications, as well as spectral information associated with each species. The ontology also includes provenance and attribution metadata, ensuring the reliability and traceability of the data. Most of the information about a chemical species is collected by a software agent from the PubChem and ChEBI webpages of the respective compound, making OntoSpecies the most comprehensive semantic database on chemical species. Access to this reliable source of chemical data is provided through a SPARQL endpoint. The paper presents several use cases to demonstrate the usefulness of OntoSpecies in solving complex tasks that require information at a deep level of knowledge, making it an invaluable tool for scientific research. Overall, the approach presented in this paper is a significant advancement in the field of chemical data management, offering a powerful tool for representing, navigating and analyzing chemical information.
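
As a rough illustration of what access through a SPARQL endpoint can look like, the Python sketch below builds a query and wraps it in a SPARQL 1.1 Protocol GET request. It is only a sketch under stated assumptions: the endpoint URL is a placeholder (the abstract does not give the real address), the helper names are hypothetical, and the query relies solely on the standard rdfs:label property because the OntoSpecies class and property names are not listed here. No network call is made.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint (assumption): the real address is not given in the abstract.
ENDPOINT = "https://example.org/ontospecies/sparql"


def build_label_query(label: str, limit: int = 10) -> str:
    """Return a SPARQL query for resources whose rdfs:label equals `label`.

    Only the standard rdfs:label property is used; OntoSpecies-specific
    terms are deliberately left out because they are not stated in the text.
    """
    # Escape backslashes and double quotes so the label is a valid SPARQL literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?species WHERE {\n"
        "  ?species rdfs:label ?name .\n"
        f'  FILTER(LCASE(STR(?name)) = LCASE("{escaped}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Wrap the query in a GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Build (but do not send) a request for a species labelled "benzene".
request = build_request(build_label_query("benzene"))
print(request.full_url)
```

Against a real endpoint, the request could be sent with `urllib.request.urlopen(request)` and the JSON response parsed with `json.load`; the query itself would normally use the ontology's own classes and properties rather than labels alone.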

Material from this preprint has been published in Journal of Chemical Information and Modeling.
