
Extraction of chemical synthesis information using The World Avatar

Reference: Digital Discovery 4(10), 2893-2909, (2025)

Highlights
  • Developed a synthesis ontology for structured chemical procedures.
  • Automated data extraction using LLMs and ontology-based JSON schemas.
  • Integrated synthesis data into The World Avatar (TWA).
  • TWA allows interoperability and connection with existing knowledge.
Abstract

This work presents a generalisable process that transforms unstructured synthesis descriptions of metal–organic polyhedra (MOPs) – a class of organometallic nanocages – into machine-readable, structured representations and integrates them into The World Avatar (TWA), a universal knowledge representation encompassing physical, abstract, and conceptual entities. TWA makes use of knowledge graphs and semantic agents. While previous work established rational design principles for MOPs in the context of TWA, experimental verification remains a bottleneck due to the lack of accessible and structured synthesis data. Synthesis information in the literature is often sparse, ambiguous, and embedded with implicit knowledge, making direct translation into structured formats a significant challenge. To address this, a synthesis ontology was developed to standardise the representation of chemical synthesis procedures by building on existing standardisation efforts. We then designed an LLM-based pipeline with advanced prompt engineering strategies to automate data extraction and created workflows for seamless integration into a knowledge representation within TWA. Using this approach, we extracted and uploaded nearly 300 synthesis procedures, automatically linking reactants, chemical building units, and MOPs to related entities across interconnected knowledge graphs. Over 90% of publications were processed successfully through the fully automated pipeline without manual intervention. The demonstrated use cases show that this framework supports chemists in designing and executing experiments and enables data-driven retrosynthetic analysis, laying the groundwork for autonomous, knowledge-guided discovery in reticular chemistry.
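As a rough illustration of the extraction step described in the abstract, the sketch below checks JSON returned by an LLM against a schema and turns the accepted steps into subject-predicate-object triples of the kind a knowledge graph could store. It is a minimal sketch under stated assumptions, not the pipeline used in the paper: the schema fields (`action`, `chemicals`, `temperature_c`, `duration_h`), the predicate names, and the example record are all hypothetical placeholders, since the actual ontology and JSON schemas are not given here.

```python
import json

# Hypothetical, simplified schema: field name -> expected Python type(s).
# The real ontology-derived JSON schemas are not reproduced in this text.
STEP_SCHEMA = {
    "action": str,
    "chemicals": list,
    "temperature_c": (int, float),
    "duration_h": (int, float),
}


def validate_step(step):
    """Return a list of problems; an empty list means the step fits the schema."""
    problems = []
    for field, expected in STEP_SCHEMA.items():
        if field not in step:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(step[field], bool) or not isinstance(step[field], expected):
            problems.append(f"wrong type for field: {field}")
    return problems


def parse_llm_output(raw):
    """Parse a JSON string and split its steps into valid and rejected ones."""
    record = json.loads(raw)
    valid, rejected = [], []
    for step in record.get("steps", []):
        problems = validate_step(step)
        (rejected if problems else valid).append((step, problems))
    return record.get("product"), valid, rejected


def to_triples(product, valid_steps):
    """Convert validated steps into (subject, predicate, object) triples.

    Predicate names are illustrative, not terms from the synthesis ontology.
    """
    triples = []
    for i, (step, _) in enumerate(valid_steps, start=1):
        step_id = f"{product}/step{i}"
        triples.append((product, "hasSynthesisStep", step_id))
        triples.append((step_id, "hasAction", step["action"]))
        for chemical in step["chemicals"]:
            triples.append((step_id, "usesChemical", chemical))
    return triples


# Placeholder record standing in for LLM output; not data from the paper.
EXAMPLE = json.dumps({
    "product": "MOP-example",
    "steps": [
        {"action": "heat", "chemicals": ["linker A", "metal salt B"],
         "temperature_c": 85, "duration_h": 24},
        {"action": "filter", "chemicals": []},  # incomplete: no temperature or duration
    ],
})

if __name__ == "__main__":
    product, valid, rejected = parse_llm_output(EXAMPLE)
    for triple in to_triples(product, valid):
        print(triple)
    for step, problems in rejected:
        print("rejected:", step["action"], problems)
```

Rejecting incomplete steps rather than guessing missing values reflects the abstract's point that literature procedures are often sparse and ambiguous; a real pipeline might instead route such steps to a follow-up prompt or manual review.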



*Corresponding author:
Telephone: +44 (0)1223 762784 (Dept) 769010 (CHU)
Address: Department of Chemical Engineering and Biotechnology
University of Cambridge
West Cambridge Site
Philippa Fawcett Drive
Cambridge CB3 0AS
United Kingdom