Technical Report 327, c4e-Preprint Series, Cambridge

Natural Language Access Point to Digital Metal-Organic Polyhedra Chemistry in The World Avatar

Reference: Technical Report 327, c4e-Preprint Series, Cambridge, 2024

Highlights
  • Metal-organic polyhedron (MOP) data made accessible via natural language.
  • Visualisation of MOP geometries, including empirically verified and machine-predicted structures.
  • Accelerated adaptation of the QA system to new data and domains using few-shot learning methods.
Abstract

Metal-organic polyhedra (MOPs) are discrete, porous metal-organic assemblies known for their wide-ranging applications in separation, drug delivery, and catalysis. As part of The World Avatar (TWA) project, a universal and interoperable knowledge model, we have previously systematised known MOPs and expanded the explorable MOP space with novel targets. Although this data can be retrieved with a complex query language, a more user-friendly interface is desirable to improve accessibility. To address a similar challenge in other chemistry domains, the natural language question-answering system 'Marie' has been developed; however, its reliance on supervised fine-tuning limits its scalability and hinders its adaptation to new knowledge domains. In this paper, we introduce an enhanced database of MOPs and a first-of-its-kind question-answering system tailored for MOP chemistry. By augmenting TWA's MOP database with geometry data, we enable the visualisation of not just empirically verified MOP structures but also machine-predicted ones. In addition, we reworked Marie's semantic parser to use in-context few-shot learning, allowing seamless interaction with TWA's extensive MOP repository. These advancements significantly improve the accessibility and versatility of TWA, marking an important step toward accelerating and automating the development of reticular materials with the aid of digital assistants.
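As a rough illustration of what in-context few-shot learning can look like for a semantic parser, the sketch below picks the stored question-query pairs most similar to a new question and assembles them into a prompt for a language model. It is not the implementation described in the report: the example pool, the `ex:` predicates, the word-overlap similarity measure, and all function names are assumptions made purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    question: str
    query: str  # placeholder structured query, NOT TWA's real schema


# Hypothetical example pool; the ex: predicates are invented for illustration.
EXAMPLE_POOL = [
    Example("Which MOPs contain a copper metal site?",
            'SELECT ?mop WHERE { ?mop ex:hasMetal "Cu" }'),
    Example("What is the molecular weight of this assembly?",
            "SELECT ?mw WHERE { ex:SomeAssembly ex:hasMolecularWeight ?mw }"),
    Example("List MOPs with cuboctahedral shape.",
            'SELECT ?mop WHERE { ?mop ex:hasShape "cuboctahedron" }'),
]


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return set(cleaned.split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(question: str, pool: list[Example], k: int = 2) -> list[Example]:
    """Return the k pool entries whose questions overlap most with the input."""
    q = tokens(question)
    ranked = sorted(pool, key=lambda ex: jaccard(q, tokens(ex.question)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, pool: list[Example], k: int = 2) -> str:
    """Assemble an instruction, k worked examples, and the new question."""
    lines = ["Translate each question into a structured query.", ""]
    for ex in select_examples(question, pool, k):
        lines += [f"Question: {ex.question}", f"Query: {ex.query}", ""]
    lines += [f"Question: {question}", "Query:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("Which MOPs contain a zinc metal site?", EXAMPLE_POOL))
```

In a setup like this, supporting a new knowledge domain means adding question-query pairs to the pool rather than retraining a model, which is the kind of adaptability the abstract attributes to few-shot learning. A production system would likely replace the word-overlap measure with embedding-based retrieval.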
