Adaptive Natural Language Processing with the Help of Large Language Models


National project, funded by ARIS (the Slovenian Research and Innovation Agency).


01.09.2023 - 30.06.2026


Daniel Vladušič, PhD

At its core, PoVeJMo builds on the success of extremely large language models while acknowledging and addressing the constraints posed by limited data and resources for training and use. A further problem with the prevailing commercial models developed by giants such as Google, Microsoft, and OpenAI is their closed nature.

So we’re taking a different approach: using smaller open-source models such as LLaMA, we address resource scarcity and limited data availability while promoting openness, all while delivering results comparable to those achieved by much larger models.

We’re doing two main things in the project. First, we’re creating a computationally efficient open-source large language model for Slovene, the first such model for our morphologically rich language. Second, we’re building on those results to develop specialized models and software in several areas:

1. Museum applications: Preparation of museum materials and descriptions.

2. Slovenian speech recognition and synthesis: A baseline, serving as a starting point for advanced industrial applications.

3. Medical applications: Specialization in medical texts and instructions for clinical use.

4. Infrastructure code generation: Applying the computationally efficient large language model and the project’s pipelines for building instruction-following datasets to the generation of infrastructure code.

XLAB’s role

XLAB builds resource- and data-efficient models, focusing on specialized models for generating infrastructure code. XLAB is a leader in this area with Steampunk Spotter, its Ansible Playbook Platform, which ensures trustworthy code generation.