Partners: University of Bern, University of Fribourg, DODIS
This project develops novel algorithms for conversational AI and summarization to enable accurate, natural-language interaction with the Diplomatic Documents of Switzerland archive, overcoming OCR errors and LLM limitations to improve historical document analysis and accessibility.
It is based on a collaboration between the applicants’ research groups and the research center Diplomatic Documents of Switzerland (Dodis), the center of excellence for the study of the history of Swiss foreign policy.
The overarching goal of the project is to research novel algorithms to automate and support the process of studying, understanding, selecting, and editing documents at Dodis. In particular, we strive for a unique solution that allows both experts and interested laypersons to have a written conversation with an artificial system that, in turn, has access to Dodis’ large document corpus. We aim for a framework composed of two building blocks: first, a conversational artificial intelligence (CAI) that accurately answers user questions and follow-up questions and challenges incorrect assumptions; and second, a high-quality summarization component that provides accurate and useful summaries of documents while avoiding hallucinations.
Both fields of research (i.e., summarization and CAI) have gained momentum in recent years, and elaborate models are available. However, to further push the boundaries of current understanding and implement a solution that is actually usable for Dodis, substantial research efforts are required. First, although Dodis will contribute an extensive corpus of real-world documents including ground truth transcripts, we face the major challenge that state-of-the-art OCR is prone to errors on historical typewritten documents and is largely unable to extract (implicit) meta-information from the documents that could be valuable for subsequent research (e.g., the document type or recipient). Second, existing (locally executable) models for summarization produce questionable summaries that are hardly usable for Dodis’ research purposes. While public LLMs such as GPT provide grammatically and linguistically sound summaries, the results often do not reflect the historically important aspects of the document, and they are still not precise enough, i.e., they contain hallucinations. Third, we observe that open platforms like Dodis are based on keyword search, which cannot answer specific questions posed in natural language. Generic CAI platforms, on the other hand, provide very vague or even false answers to specific questions on Dodis documents (as they do not have good access to the necessary documents). A major research challenge of this project is therefore to enable natural language interaction with a novel CAI system based on the large number of documents available. The aim is to provide highly accurate and verified answers to specific user questions without hallucinations.
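To make the OCR challenge concrete: with ground truth transcripts available, OCR quality is commonly quantified by the character error rate (CER), i.e. the edit distance between the OCR output and the transcript divided by the transcript length. The following is a minimal sketch of that standard metric, not the project's own evaluation code; the function names and the example strings are assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance normalized by the length of the ground truth transcript."""
    if not ground_truth:
        raise ValueError("ground truth transcript must not be empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


# Hypothetical example: a typical OCR confusion of "rn" with "m".
cer = character_error_rate("Bem", "Bern")
print(f"CER: {cer:.2f}")
```

A CER of 0 means the OCR output matches the transcript exactly; values can exceed 1 when the OCR output is much longer than the transcript.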
We start our research by selecting and preparing large-scale ground truth training sets for building the envisaged systems. We will first research and empirically evaluate existing models for all tasks and systematically document all limitations (e.g., fabricated facts). The major contribution of the present project is then to develop and research novel approaches that use large knowledge graphs to guide and improve both the summarization and CAI processes. In the experimental phase, we use both automatic evaluation metrics and human evaluations. Finally, the writing of scientific papers and doctoral theses is also part of the present project.
The significance and impact of the proposed project are manifold, ranging from unique and novel data sets for document analysis to more robust document workflows using LLMs. One of the most important impacts, however, is that the present project might result in radically different services, which can be interpreted as a first, yet significant, step towards a more natural human-computer interaction with large archives of historical documents.
Combining Image and Graph-Based Neural Networks for Handwriting Recognition
This project develops deep learning methods for document analysis that combine visual and structural representations of handwriting, reducing the need for manual annotations and enabling applications in historical and low-resource settings.
This proposal aims to develop models, data synthesis and training methods for document analysis tasks such as classification or retrieval that allow the application of deep learning models without the need for manually created annotations. Especially for historical documents or low-resource languages, the heavy demand for labeled data hinders the application of learning-based methodology. A key factor in the proposed project is the exploitation of the structural nature of handwriting. In addition to the visual appearance, the underlying structure can be represented as a graph. Recent developments in geometric deep learning make it possible to integrate this structural element at the level of model design. Additionally, explicitly modeling the geometric component serves as a form of regularization, increasing the adaptability and generalization capabilities of the developed models. Training of a model combining visual and geometrical information is then performed with as little supervision as possible. To this end, data synthesis approaches will be developed that also include structural representations. This project extends the state of the art in deep learning-based document analysis by developing methods that explicitly model the geometric components of handwriting. The resulting reduction in training data demand opens up a diverse set of application areas and tasks.
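As an illustration of what a graph representation of handwriting might look like, the sketch below stores pen-stroke points as nodes and connects consecutive points with edges. The abstract does not specify how the project builds its graphs, so the class `HandwritingGraph`, its methods, and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HandwritingGraph:
    """Hypothetical structure: nodes are (x, y) points, edges link node indices."""
    nodes: list[tuple[float, float]] = field(default_factory=list)
    edges: set[tuple[int, int]] = field(default_factory=set)

    def add_stroke(self, points: list[tuple[float, float]]) -> None:
        """Add one pen stroke: a node per point, an edge between consecutive points."""
        start = len(self.nodes)
        self.nodes.extend(points)
        for i in range(start, start + len(points) - 1):
            self.edges.add((i, i + 1))

    def adjacency(self) -> list[list[int]]:
        """Symmetric 0/1 adjacency matrix, a common input format for graph networks."""
        n = len(self.nodes)
        matrix = [[0] * n for _ in range(n)]
        for i, j in self.edges:
            matrix[i][j] = matrix[j][i] = 1
        return matrix


# A letter "T" drawn as two strokes: a horizontal bar and a vertical stem.
graph = HandwritingGraph()
graph.add_stroke([(0.0, 2.0), (1.0, 2.0), (2.0, 2.0)])
graph.add_stroke([(1.0, 2.0), (1.0, 1.0), (1.0, 0.0)])
print(len(graph.nodes), "nodes,", len(graph.edges), "edges")
```

In this simplified version, strokes that cross or touch do not share nodes; a fuller representation would likely merge such points or add edges between them.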
Automated HVAC-Concept Audit and Optimisation Using AI
This project aims to close the energy performance gap in modern HVAC systems by developing an AI-driven software solution that optimizes energy efficiency, comfort, and sustainability from the start of building operation.
Modern Heating, Ventilation and Air Conditioning (HVAC) systems are incredibly sophisticated, but rarely live up to their potential. There is a significant Energy Performance Gap (EPG) between the energy performance calculated at the time of planning and the actual energy performance of the building once the equipment is in operation. In our experience, this gap is up to 30%, which is equivalent to over 4% of global energy usage.
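The gap described above can be expressed as a simple ratio. The sketch below assumes the EPG is measured relative to the planned consumption, which the text does not state explicitly; the function name and the kWh figures are illustrative, not project data.

```python
def energy_performance_gap(planned_kwh: float, actual_kwh: float) -> float:
    """Relative gap between actual and planned energy use.

    Assumed definition: (actual - planned) / planned, so 0.30 means the
    building uses 30% more energy than calculated at planning time.
    """
    if planned_kwh <= 0:
        raise ValueError("planned consumption must be positive")
    return (actual_kwh - planned_kwh) / planned_kwh


# Hypothetical building: planned 100,000 kWh per year, measured 130,000 kWh.
gap = energy_performance_gap(planned_kwh=100_000, actual_kwh=130_000)
print(f"EPG: {gap:.0%}")
```

A negative result would indicate a building that performs better than planned.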
Our vision under this project is to tackle the EPG from Day 1 by developing an automated software solution that uses AI and data-driven analytics to maximise the energy efficiency, comfort and environmental sustainability of buildings.