Technical Report 338, c4e-Preprint Series, Cambridge
Parliamentary debates in The World Avatar: A hybrid retrieval-augmented generation system
Reference: Technical Report 338, c4e-Preprint Series, Cambridge, 2025
- A retrieval-augmented generation (RAG) system for German parliamentary debates.
- Hybrid approach employs both vector store and knowledge graph retrieval.
- Automated pipeline for ontology generation and instantiation from debate minutes.
Parliamentary debates form a cornerstone of democratic governance, but their complexity and volume make them difficult for citizens, journalists, and researchers to access in meaningful ways. This paper presents a proof-of-concept system that enables natural language questions to be asked of records of debates in the German Bundestag. The approach combines two complementary methods: retrieval from unstructured text (parliamentary speeches) and retrieval from structured metadata represented in a knowledge graph. By integrating both sources, the system can answer questions that go beyond simple keyword searches, such as identifying which parties most often supported one another or when particular topics drew strong audience reactions. All data are drawn from open parliamentary records, which are automatically processed into machine-readable form. The system is embedded in The World Avatar, a dynamic knowledge graph designed to connect heterogeneous data sources across domains. We argue that this hybrid approach illustrates the potential of artificial intelligence to enhance transparency, public accountability, and citizen engagement with democratic institutions, while also highlighting challenges related to accuracy, interpretation, and responsible design.