
Technical Report 293, c4e-Preprint Series, Cambridge

A Simple and Efficient Approach to Unsupervised Instance Matching and its Application to Linked Data of Power Plants

Authors: Andreas Eibeck, Shaocong Zhang, Mei Qi Lim, and Markus Kraft*

Reference: Technical Report 293, c4e-Preprint Series, Cambridge, 2023



Highlights
  • A simple and efficient instance matcher designed for automated environments.
  • Performance competitive with unsupervised Machine Learning methods for instance matching.
  • Integration of power plant instances into the knowledge-graph based World Avatar.
Abstract

Knowledge graphs store and link semantically annotated data about real-world entities from a variety of domains and on a large scale. The World Avatar is based on a dynamic, decentralised knowledge graph and uses semantic technologies to realise complex cross-domain scenarios. Accurate computational results for such scenarios require complete, high-quality data. This work focuses on instance matching, one of the subtasks of automatically populating the knowledge graph with data from a wide spectrum of external sources. Instance matching compares two data sets and seeks to identify instances (data, records) that refer to the same real-world entity. We introduce AutoCal, a new instance matcher that does not require labelled data and runs out of the box for a wide range of domains without tuning method-specific parameters. AutoCal achieves results competitive with recently proposed unsupervised matchers from the field of Machine Learning. We also select an unsupervised state-of-the-art matcher from the field of Deep Learning for a thorough comparison. Our results show that neither AutoCal nor the state-of-the-art matcher is superior in matching quality, while AutoCal has only moderate hardware requirements and runs 2.7 to 60 times faster. In summary, AutoCal is particularly well suited for use in an automated environment. We present its prototypical integration into the World Avatar and apply AutoCal to the domain of power plants, which is relevant for practical environmental scenarios of the World Avatar.
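To make the instance matching task concrete, the following is a minimal, generic sketch of unsupervised matching between two small data sets of power plant records. It is NOT AutoCal: the abstract does not describe AutoCal's scoring, so the similarity measure (Python's difflib ratio), the plain averaging over properties, the fixed threshold of 0.8 and the helper names `similarity`, `record_score` and `match` are all illustrative assumptions. Note that AutoCal is described as avoiding exactly this kind of hand-set, method-specific parameter.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio; an assumed choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_score(r1: dict, r2: dict, props: list[str]) -> float:
    """Average the per-property similarities of two records (hypothetical helper)."""
    return sum(similarity(str(r1[p]), str(r2[p])) for p in props) / len(props)


def match(source: list[dict], target: list[dict], props: list[str],
          threshold: float = 0.8) -> dict[int, int]:
    """Map each source index to its best-scoring target index if the score
    reaches the (assumed, fixed) threshold; no one-to-one constraint is enforced."""
    matches: dict[int, int] = {}
    if not target:
        return matches
    for i, s in enumerate(source):
        scores = [record_score(s, t, props) for t in target]
        j = max(range(len(target)), key=scores.__getitem__)
        if scores[j] >= threshold:
            matches[i] = j
    return matches


if __name__ == "__main__":
    # Hypothetical example records; not data from the report.
    source = [{"name": "Drax Power Station", "fuel": "coal"},
              {"name": "Heysham 1", "fuel": "nuclear"}]
    target = [{"name": "heysham 1", "fuel": "Nuclear"},
              {"name": "DRAX POWER STATION", "fuel": "Coal"}]
    print(match(source, target, ["name", "fuel"]))  # {0: 1, 1: 0}
```

Each source record is paired with its most similar target record, so the two differently ordered and differently capitalised data sets are aligned. In a real pipeline, the threshold and the weighting of properties are precisely the choices that an unsupervised matcher has to determine without labelled data.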
