<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.1109/ICASSP.2018.8461368</article-id>
      <title-group>
        <article-title>ADAPT-IA: Towards an Adaptive AI in Language Technologies applied to RIS3 Industrial sectors</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aitor Álvarez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arantza del Pozo</string-name>
          <email>adelpozo@vicomtech.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Montse Cuadros</string-name>
          <email>mcuadros@vicomtech.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thierry Etchegoyhen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan Camilo Vásquez-Correa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ander González-Docasal</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Santiago Andrés Moreno-Acevedo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harritxu Gete</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victor Ruiz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aingeru Bellido</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Platas</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maia Agirre</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ariane Méndez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Javier Mikel Olaso</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Aparicio</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Asier López</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. Inés Torres</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Begoña Arrate</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joxean Zapirain</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oscar Montserrat</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José María Echevarría</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Ignacio Hormaetxe</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pedro Fortea</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sofia Olaso</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jon Nuñez</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Automotive Intelligence Centre - AIC</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fundación Vicomtech, Basque Research and Technology Alliance (BRTA)</institution>
          ,
          <addr-line>Donostia - San Sebastián</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Zaragoza, Department of Electronics, Engineering and Communications</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <fpage>305</fpage>
      <lpage>309</lpage>
      <abstract>
        <p>Despite the fact that Language Technologies are increasingly benefiting Industry by accelerating and automating processes, Basque remains a largely overlooked language in these technologies, even though it is an official language of the Autonomous Community of the Basque Country and has a growing number of speakers. With the aim of promoting its use in industrial environments, ADAPT-IA was launched as a research and development project focused on Adaptive AI technologies applied to Language Technologies in multiple languages, including Basque, aiming to enhance their integration into industrial processes across key sectors in the Basque Country. We detail six use cases related to the development of advanced prototypes within the Automotive, Energy, Machine Tool, and Railway industries. These prototypes were developed through a collaborative effort involving the applied research center Vicomtech, the University of the Basque Country, the Basque Center for Terminology and Lexicography, and the corresponding industry clusters. This work demonstrates the potential of multilingual Language Technologies to address the specific needs of different industrial sectors.</p>
      </abstract>
      <kwd-group>
        <kwd>Adaptive AI</kwd>
        <kwd>Voice and text Assistants</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Adaptive Machine Translation</kwd>
        <kwd>Language resources</kwd>
        <kwd>MLOps</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The exponential growth of new technologies, driven by Artificial Intelligence (AI) and its increasing
democratization in society, is becoming a pivotal factor in the digitization of Industry [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In recent
years, Language Technologies (LT) have increasingly benefited industry across various areas. These
advancements include many applications, such as the use of machine translation for translating technical
manuals of industrial machinery [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], automatic classification of technical documents, manuals and
reports [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], data mining and extraction of key information from large volumes of technical texts [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
        AI assistants to enhance operator interaction with intelligent systems [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and speech processing
technologies [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], among others.
      </p>
      <p>AI technology is currently experiencing a golden age. AI systems from leading developers such as
OpenAI, DeepSeek, Mistral, Google, and Meta, and model families such as Claude and Qwen, among others, are
increasingly exhibiting remarkable generative capabilities in LT. Although the Large Language Models
(LLMs) integrated in these systems possess vast knowledge from being trained on extensive datasets
(often accessible only to major industry players), their performance and quality tend to decrease when
applied to specific domains, languages, or data types that were not present during training. Despite
their great potential, they still suffer from one of the main challenges of AI in general: models must be
adapted with domain-specific data to perform optimally in specialized scenarios, industrial sectors, or
low-resource minority languages such as Basque.</p>
      <p>Within this context, the ADAPT-IA project aimed to advance research and development in Adaptive
AI technologies applied to LT in several languages, including Basque, with the objective of facilitating
their seamless integration into processes across various Basque Industrial sectors, such as automotive,
machine tool, energy, and the railway industry, aligned with the objectives of RIS3 Euskadi’s Research
and Innovation Strategy for Smart Specialization.</p>
      <p>We initially identified specific areas of interest to develop high-value use cases for each industrial
sector using LT, in collaboration with the representative clusters participating in the consortium. The
final selected use cases were mainly focused on applications of (1) voice interaction between the
user/operator and machines, (2) domain-expert and generic voice/text assistants for specialized and
open-ended queries about machines, their operation and maintenance, (3) a report classification system
to ensure that unstructured inputs converge into a standardized text with consistent meaning, (4) voice
control technology for managing and controlling a production line through spoken queries and data
reporting, and (5) adaptive machine translation systems.</p>
      <p>The Adaptive AI-related activities were accompanied by a significant effort to collect and generate
in-domain data from large amounts of heterogeneous sources, including technical reports, manuals,
specialized journals, technical glossaries, and more general digital repositories. Generative
AI technology was also employed to produce synthetic data for training and adapting AI models. These
data resources will be freely available to the community for research purposes.</p>
      <p>The R&amp;D activities were performed within the basic research project ADAPT-IA, partially supported
by the Department of Economic Development of the Basque Government. The project started in April
2023 and ended in December 2024, and was carried out by the following consortium: Vicomtech<sup>1</sup>
(project coordinator), University of the Basque Country (UPV/EHU)<sup>2</sup>, the Basque Centre for
Terminology and Lexicography (UZEI)<sup>3</sup>, INVEMA<sup>4</sup>, Cluster de Energía (ACE)<sup>5</sup>, Industria Ferroviaria Española
(MAFEX)<sup>6</sup>, and Automotive Intelligence Center (AIC)<sup>7</sup>.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Main objectives</title>
      <p>The main objective of the ADAPT-IA project was the research and development of Adaptive AI
technologies applied to Language Technologies in multiple languages, with the aim of enhancing and facilitating
their integration into industrial processes across various sectors in the Basque Country. The project also
sought to explore emerging methodologies that optimize the maintenance, adaptation, and continuous
deployment of neural models in production, leveraging the growing potential of AI in combination
with human knowledge and creativity. Figure 1 provides a simplified overview of the core concept of
the project.</p>
      <p><sup>1</sup> https://www.vicomtech.org/en; <sup>2</sup> https://www.ehu.eus/; <sup>3</sup> https://uzei.eus/; <sup>4</sup> https://www.afm.es/; <sup>5</sup> https://www.clusterenergia.com/; <sup>6</sup> https://mafex.es/; <sup>7</sup> https://www.aicenter.eu/</p>
      <p>More specifically, the project focused on the following key technical objectives:</p>
      <list list-type="bullet">
        <list-item><p>R&amp;D in Adaptive AI technologies applied to LTs in several languages, including paradigms such
as Active Learning, Reinforcement Learning, and Transfer Learning, as well as their combination
to enhance synergies and leverage the unique benefits of each approach.</p></list-item>
        <list-item><p>R&amp;D in neural AI modeling for each language technology required in the project’s priority use
cases, with an emphasis on emerging neural architectures and a balance between performance
and efficiency, ensuring the highest possible quality with minimal resource consumption.</p></list-item>
        <list-item><p>Generation of linguistic resources in several languages to enable the implementation of LT
systems based on Adaptive AI, while promoting the use of Basque and its terminological
standardization in key industrial sectors of the Basque Country.</p></list-item>
        <list-item><p>Development of a prototype by each consortium member and sector, integrating the R&amp;D activities
in Adaptive AI applied to LTs.</p></list-item>
        <list-item><p>Validation of the prototypes with real users, including training and capacity-building processes.</p></list-item>
        <list-item><p>Scientific dissemination of the results in leading international forums, journals, and conferences.</p></list-item>
      </list>
      <p>The R&amp;D activities, along with the development of prototypes for each sector and consortium member,
required the compilation and generation of new data for training, adaptation, and evaluation purposes.
These data serve as the foundation for building the systems developed for the following final use cases
(UC):</p>
      <list list-type="bullet">
        <list-item><p>UC1: A Multilingual Speech-Based Driver Assistant</p></list-item>
        <list-item><p>UC2: Technical Norm Documentation Assistant</p></list-item>
        <list-item><p>UC3: Software Documentation Assistant</p></list-item>
        <list-item><p>UC4: PCB Assembling Assistant</p></list-item>
        <list-item><p>UC5: CNC Voice Assistant</p></list-item>
        <list-item><p>UC6: Adaptive Machine Translation</p></list-item>
      </list>
    </sec>
    <sec id="sec-3">
      <title>3. Use cases and data resources</title>
      <p>This section presents the use cases implemented in the ADAPT-IA project, along with the data resources
compiled and generated for each. These use cases were defined through collaboration between the
technological partners and the industrial clusters, considering the needs and relevant applications
identified by the latter.</p>
      <sec id="sec-3-1">
        <title>3.1. A Multilingual Speech-Based Driver Assistant</title>
        <p>This prototype, developed collaboratively by Vicomtech, UPV/EHU, UZEI and AIC within the automotive
domain, consists of a multilingual, speech-based driver assistance system operating in both Basque and
English. Integrated into AIC’s industrial vehicle simulator, the assistant enables conversational access
to real-time vehicle data, including speed, traffic conditions, tire pressure, and battery status.</p>
        <p>
          To support the system, domain-specific corpora were created for both languages. For Basque, UZEI
linguists generated sentence templates covering varied word orders, synonyms, omissions, and linguistic
registers. These templates produced over 20 million sentence variations, which were reduced to a diverse
and balanced set of 14K sentences using a cluster-based active learning approach [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. An additional 612
naturally phrased sentences were collected from volunteers to enhance coverage of user intents.
        </p>
        <p>For English, recent advancements in LLMs enabled efficient generation of artificial in-domain data. A
small set of manually curated example queries served as seed data, reflecting typical user interactions
with a car assistant. Based on these examples, additional utterances were generated using
state-of-the-art LLMs (primarily ChatGPT-4o), resulting in a dataset of 1,700 unique and carefully validated
sentences. Both corpora will be made publicly available upon project completion.</p>
        <p>
          The assistant comprises three main components: an Automatic Speech Recognition (ASR) system, a
Natural Language Processing (NLP) module, and a Text-to-Speech (TTS) unit. The ASR component was
based on NVIDIA’s Parakeet models, which use state-of-the-art recurrent neural network transducers
(RNN-T) [
          <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
          ]. Both the Basque and English systems employ the large fast-conformer architecture with
0.6 billion parameters [12]. The Basque model was trained from scratch on 1,258 hours of transcribed
audio and fine-tuned with 12.8 hours of synthesized in-domain data generated using Vicomtech’s
proprietary TTS system. The English model was adapted from NVIDIA’s pre-trained version and
fine-tuned with 7 hours of synthesized in-domain audio (8,223 user queries).
        </p>
        <p>The Natural Language Understanding (NLU) unit within the NLP component employs
language-specific strategies due to the limited availability of robust Basque NLP models. For English,
user queries are embedded using the 384-dimensional all-MiniLM-L12-v2 model from
SentenceTransformers [13]. For Basque, embeddings are extracted using Latxa 7B [14], a language-specific LLM.
In both cases, multilayer neural networks classify the embeddings into one or more user intents. A
rule-based Dialog Manager governs system responses, offering enhanced controllability and robustness
in domain-specific settings [15]. Finally, a template-based Natural Language Generation (NLG) module
generates text responses, which are synthesized into speech via a proprietary Tacotron-2-based TTS
system [16].</p>
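        <p>As an illustration of how a rule-based dialog policy and template-based NLG can be combined, the following minimal Python sketch maps already-classified intents to response templates filled with vehicle data. The intent names, template wording and vehicle fields are hypothetical and are not taken from the prototype.</p>
        <preformat><![CDATA[
```python
from typing import Dict, List

# Hypothetical intent-to-template table (assumption, not the prototype's data).
TEMPLATES: Dict[str, str] = {
    "speed": "You are driving at {speed_kmh} km/h.",
    "tire_pressure": "Tire pressure is {tire_bar} bar.",
    "battery": "The battery is at {battery_pct} percent.",
}
FALLBACK = "Sorry, I did not understand the request."


def respond(intents: List[str], vehicle_state: Dict[str, object]) -> str:
    """Rule-based policy: answer every recognised intent, otherwise fall back."""
    parts = [TEMPLATES[i].format(**vehicle_state) for i in intents if i in TEMPLATES]
    return " ".join(parts) if parts else FALLBACK


state = {"speed_kmh": 87, "tire_bar": 2.4, "battery_pct": 63}
print(respond(["speed", "battery"], state))
```
]]></preformat>
        <p>Because every response comes from a fixed template, the output of such a policy stays fully predictable, which is the controllability argument made above.</p>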
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Technical Norm Documentation Assistant</title>
        <p>This prototype, developed by Vicomtech with INVEMA, is designed to assist technicians with questions
about technical norms, pointing them to the documents where the relevant information appears. It is a
Retrieval-Augmented Generation (RAG) chatbot tailored to the UNE regulations on tools and machines that
users own, allowing them to query these regulations and obtain quick and accurate answers.</p>
        <p>The RAG system was populated by segmenting the provided regulations using an optimized
sentence-window chunking method. These segments were then vectorized with the BAAI/bge-m3 embedding
model [17] and stored in a Qdrant vector database [18], managed by LlamaIndex [19] for storage and
retrieval. The generative process directly integrated the retrieved contextual data with the user’s query,
using a locally hosted Phi-4 model [20], a small-scale LM suitable for deployment on systems with
limited GPU memory.</p>
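        <p>The general idea behind sentence-window chunking can be sketched as follows: each sentence becomes a retrievable unit, and a window of neighbouring sentences is kept alongside it as context for generation. This is a simplified, standard-library illustration under our own assumptions (naive sentence splitting, a window of one sentence on each side, invented sample text); it is not the optimized configuration used in the prototype.</p>
        <preformat><![CDATA[
```python
import re
from typing import Dict, List


def sentence_window_chunks(text: str, window: int = 1) -> List[Dict[str, str]]:
    """Return one chunk per sentence together with its surrounding window."""
    # Naive splitter on ., ! or ? followed by whitespace (assumption);
    # real regulations would need a proper sentence tokenizer.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window)
        hi = min(len(sentences), i + window + 1)
        chunks.append({"sentence": sentence, "window": " ".join(sentences[lo:hi])})
    return chunks


# Invented sample text, used only to show the chunk structure.
doc = "Guards must be fixed. Openings are limited. Check them yearly."
for chunk in sentence_window_chunks(doc, window=1):
    print(chunk)
```
]]></preformat>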
        <p>A test dataset of 20 question-answer pairs, created by INVEMA professionals, was used to assess various
chunking strategies and retrieval processes. This dataset allowed both human and LLM evaluation of
different RAG approaches. The LLM evaluation used metrics derived from a RAG Triad strategy with
the nuclia/REMi-v0 model [21], fine-tuned for RAG evaluation. These methodologies were integrated to
deliver the final version of the RAG system.</p>
        <p>The INVEMA professionals can access this RAG through a REST API and a Gradio frontend
interface [22]. The interface further grants access to documents referenced in the RAG output and allows
user feedback for each system response.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Software Documentation Assistant</title>
        <p>Closely resembling the previous prototype, the RAG system developed by Vicomtech for a company of
the ACE cluster is a domain-specific chatbot designed to assist users in navigating and understanding
product documentation. This system leverages software manuals as its primary knowledge source,
enabling efficient and context-aware access to technical information embedded within these documents.</p>
        <p>To populate the RAG system, the provided manuals were segmented using an optimized
sentence-window chunking methodology. The resulting chunks were then vectorized
with the BAAI/bge-m3 [17] embedding model and stored in a Qdrant vector database, managed through
LlamaIndex for both storage and retrieval operations. The generative process involved direct integration
of the retrieved contextual data with the user’s query, employing a locally hosted Mistral 7B Instruct
v0.3 [23] model, a small-scale LM suitable for deployment on systems with limited GPU memory. A
24-question test dataset, developed by Optimitive software professionals, was employed to evaluate various
chunking strategies and retrieval processes. This dataset enabled both human and LLM evaluation
of different RAG approaches. For the LLM evaluation, metrics were derived from a
RAG Triad strategy [24] with Nuclia’s REMi-v0 model, fine-tuned for RAG evaluation. Through the
integration of these methodologies, the final version of the RAG system was delivered.</p>
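        <p>The retrieve-then-generate step described above can be illustrated with a small, dependency-free sketch: chunks are ranked by cosine similarity to the query vector, and the best ones are placed directly in the prompt next to the user’s question. The toy three-dimensional vectors, sample sentences and prompt wording are assumptions made for illustration; the prototype relies on real embeddings, a vector database and a locally hosted model instead.</p>
        <preformat><![CDATA[
```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          index: List[Tuple[str, Sequence[float]]], k: int = 2) -> List[str]:
    """Return the k chunk texts most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, contexts: List[str]) -> str:
    """Place the retrieved context directly in front of the user question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only the context below.\n{context_block}\nQuestion: {question}"


# Toy vectors stand in for real embedding output (assumption).
index = [
    ("Open the export menu to save a report.", [0.9, 0.1, 0.0]),
    ("Licences are renewed from the admin panel.", [0.0, 0.2, 0.9]),
    ("Reports can be scheduled weekly.", [0.8, 0.3, 0.1]),
]
print(build_prompt("How do I save a report?", top_k([1.0, 0.1, 0.0], index)))
```
]]></preformat>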
        <p>The company professionals can access this RAG through a REST API and a Gradio frontend interface.
The interface further grants access to documents referenced in the RAG output and allows user feedback
for each system response.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. PCB Assembling Assistant</title>
        <p>This prototype, developed by Vicomtech with a company from the ACE cluster, is designed to assist
technicians in assembling electrical industrial components through a multimodal guidance system. The
primary objective was to develop an intelligent assistant capable of guiding users through the assembly
process step by step while allowing flexible navigation —both forward and backward— within the
instructions. Additionally, the assistant provides supplementary information upon request, enhancing
user support throughout the process.</p>
        <p>The system was initially supplied with three separate manuals in Spanish, containing detailed
component specifications, sequential assembly instructions, verification procedures, and testing report
guidelines, supplemented with numerous interspersed images. However, the original manuals were
not structured for direct and natural interaction, requiring manual editing and reorganization to break
down excessively long assembly steps into a more user-friendly format.</p>
        <p>In order to develop a system capable of assisting technicians through these guidelines, various LLMs
were evaluated on their ability to process, interpret, and relay instructional content while maintaining
accuracy, step order and displaying images when needed. The models tested include LLaMA 3.1-8B,
Phi-4 14B, DeepSeek 7B, and Mistral 7B. Performance was assessed using a test set of approximately 25
queries, structured to include the previous step, the user’s query, and the corresponding ground truth.
The evaluation metrics applied include BLEU, ROUGE-L, Cosine Similarity, Exact Match, Levenshtein
Distance, BERTScore, BLEURT, and G-EVAL.</p>
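        <p>Two of the string-based metrics listed above, Exact Match and Levenshtein Distance, are simple enough to sketch in a few lines of Python. The case and whitespace normalisation applied before the exact-match comparison is our assumption; the evaluation in the project may have used a different normalisation or an existing library.</p>
        <preformat><![CDATA[
```python
def exact_match(prediction: str, reference: str) -> bool:
    """True when both strings are equal after trimming and lowercasing."""
    return prediction.strip().lower() == reference.strip().lower()


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(exact_match(" Step 2 ", "step 2"), levenshtein("kitten", "sitting"))
```
]]></preformat>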
        <p>Different prompting strategies were explored to optimize LLM performance for the procedural
guidance task. Experiments included consolidating all manuals as context into a single prompt, splitting
them into separate prompts, and varying the level of specificity with which instructions were provided
to the model. Initial methods involved minimal instructions, specifying only that the assistant should
guide users step by step. More advanced approaches incorporated memory mechanisms, instructing the
model to retain the previous step and generate the next accordingly. To support visual guidance, image
placeholders with corresponding identifiers were embedded in the system prompt, which were later
replaced with actual images when the model referenced them in its step-by-step instructions. Results
indicate that while more structured and detailed prompts can improve response accuracy, they also
introduce higher latency, and do not always lead to better task execution. Further refinements are being
explored to balance response quality and efficiency.</p>
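        <p>The placeholder mechanism for images can be sketched as a post-processing step applied to the model output. The placeholder syntax, identifiers and file paths below are hypothetical, since the actual format used in the system prompt is not specified here.</p>
        <preformat><![CDATA[
```python
import re
from typing import Dict

# Hypothetical placeholder syntax, e.g. "[IMG:fig3]" (assumption).
PLACEHOLDER = re.compile(r"\[IMG:(\w+)\]")


def render_images(model_output: str, images: Dict[str, str]) -> str:
    """Replace known image placeholders with a reference to the image file."""

    def substitute(match: re.Match) -> str:
        path = images.get(match.group(1))
        # Unknown identifiers are left untouched so the problem stays visible.
        return f"(image: {path})" if path else match.group(0)

    return PLACEHOLDER.sub(substitute, model_output)


print(render_images("Step 2: attach the board [IMG:fig3].", {"fig3": "img/fig3.png"}))
```
]]></preformat>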
      </sec>
      <sec id="sec-3-5">
        <title>3.5. CNC Voice Assistant</title>
        <p>Developed by Vicomtech for a company of INVEMA, this prototype is an advanced solution designed to
streamline communication between users and a Computer Numerical Control (CNC) system. It enables
seamless interaction to efficiently obtain crucial parameter values, such as the number of machine axes
and heads, their positions, maximum speeds, and the range of motion for different axes, among others.</p>
        <p>The system integrates multiple components. First, an ASR module transcribes spoken user input
into text. This text is then processed by an LLM, which extracts user intent, relevant entities and a
corresponding CNC string code. The extracted information is passed to a CNC interface, which acts as
a bridge between the system and the CNC machine, retrieving the corresponding data from it. Finally,
the retrieved information is presented to the user through a dedicated user interface. Additionally, the
prototype includes simulation software provided to replicate CNC operations [25].</p>
        <p>The LLM component is based on LLaMA 3.1-8B, which has been specifically instructed to map user
input into intents, relevant entities, and CNC string code. For each possible user intent considered
in the system, the LLM was provided with examples of both expected input and the corresponding
output format and data. To ensure robustness, a validation mechanism was incorporated to verify
that the LLM’s responses conform to the expected JSON structure and permitted values. The ASR
system, implemented using the Kaldi toolkit [26], utilizes a neural network-based architecture with
iVector-based speaker adaptation to improve accuracy. For the acoustic and language models, we
employed general-purpose pretrained models. Additionally, the language model was fine-tuned with
task-specific sentences to enhance transcription accuracy. The CNC interface uses the Windows COM
protocol to communicate with the CNC machine simulation software. By utilizing the API functions
defined for the system, it retrieves the necessary data. Finally, the User Interface is a Gradio application
that enables voice interaction. It includes several text fields where the relevant information is displayed
to the user, providing an intuitive and straightforward way to present data retrieved from the CNC
system.</p>
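        <p>A validation step of the kind mentioned above can be sketched as follows: the raw LLM response is parsed as JSON and rejected unless it has the expected keys and a permitted intent. The key names, intent labels and example code string are hypothetical, as the prototype’s actual schema is not given in this paper.</p>
        <preformat><![CDATA[
```python
import json
from typing import Optional

# Hypothetical schema (assumption): real intents and keys may differ.
ALLOWED_INTENTS = {"get_axis_count", "get_axis_position", "get_max_speed"}
REQUIRED_KEYS = {"intent", "entities", "cnc_code"}


def validate_llm_output(raw: str) -> Optional[dict]:
    """Return the parsed response if it is well formed, otherwise None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != REQUIRED_KEYS:
        return None
    if data["intent"] not in ALLOWED_INTENTS:
        return None
    if not isinstance(data["entities"], dict) or not isinstance(data["cnc_code"], str):
        return None
    return data


good = '{"intent": "get_max_speed", "entities": {"axis": "X"}, "cnc_code": "EXAMPLE"}'
print(validate_llm_output(good))
print(validate_llm_output("not json"))
```
]]></preformat>
        <p>Returning None instead of raising lets the caller decide whether to re-prompt the model or to ask the user to repeat the request.</p>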
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Adaptive Machine Translation</title>
        <p>This use case covered all four industrial sectors in the project and had, as its main objective, the
development of resources and systems for adaptive Machine Translation (MT), focusing on the
Basque-Spanish translation pair in both directions. Three main aspects were tackled within the project: (i)
resource creation, (ii) machine translation modelling, and (iii) system deployment. We briefly describe
each aspect in turn below.</p>
        <p>Bilingual terminology is a critical component to reach quality translations in specific domains. For the
four sectors of the project, namely Energy, Automotive, Railways and Machine Tool, no terminological
resources were publicly available at the onset of the project, and part of the effort centred on building new
term bases. For this purpose, a corpus was compiled for each industrial sector, which consisted of
domain-specific private documents provided by ADAPT-IA participants from each sector and terminological
reference resources such as dictionaries in Spanish and/or Basque. Bilingual terminologies were
manually crafted and validated according to ISO standards, in particular those of ISO/TC 37, with around 600
new terminological records per sector. An additional objective was the development of sector-specific
evaluation datasets to assess MT quality across the four industrial domains. To this end,
we selected Spanish-language documents representative of each sector, prioritizing high terminological
density and ensuring they were suitable for open sharing with the scientific community. For each test
set, our target was to extract a minimum of 50,000 words—corresponding to over 2,000 sentences—to
enable robust and meaningful evaluation outcomes. An initial assessment of the shareable evaluation
corpora from each sector indicated insufficient volumes of text including relevant terminology overall.
We thus complemented the data with two additional sources. We first selected public Wikipedia entries dedicated
to topics from each sector, as they contained significant volumes of relevant terms. Additionally, we
generated synthetic text by querying both ChatGPT and DeepSeek to generate text involving a list
of provided terms. The selected texts were professionally translated, and data were aligned at both
paragraph and sentence levels. The newly created term bases and evaluation datasets form part of the
ADAPTIA-MT suite, which will be shared with the scientific community under a CC-BY-NC-ND 4.0
license to foster research and development in adaptive MT for industrial sectors.</p>
        <p>To exploit the resources created within the project, we developed several MT models, contrasting
Neural Machine Translation (NMT) and LLM-based MT (LLMMT) approaches. The NMT models
were adapted for terminologically-sound translation with variants of the approach in [27], where
target language terms are injected in the source text wherever the input features matching terms. The
models, based on an encoder-decoder Transformer architecture [28], were trained to respond to term
injection, identified with specific tags, leading to significant improvements in translation accuracy in
both translation directions. LLMMT models were based on Latxa [14], performing adaptation via
few-shot in-context learning. We evaluated both approaches on the domain-specific test sets, on standard
metrics. In terms of general translation quality, NMT models significantly outperformed LLMMT variants,
whereas in terms of terminological accuracy the latter achieved better results. We performed additional
human evaluations by experts in terminological translation for this language pair, which also indicated
significant preference for NMT translations overall. Further details are provided in [29].</p>
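        <p>The term-injection preprocessing described above can be illustrated with a short sketch in which every glossary match in the source sentence is tagged and followed by its target-language term. The tag names and the two-entry Spanish-Basque glossary are illustrative assumptions; this is not the exact scheme of [27] nor the tag set used to train the project’s models.</p>
        <preformat><![CDATA[
```python
import re
from typing import Dict


def inject_terms(source: str, glossary: Dict[str, str]) -> str:
    """Tag each source term found and append its target-language term."""
    if not glossary:
        return source
    lookup = {term.lower(): target for term, target in glossary.items()}
    # Longest entries first, so multi-word terms win over their sub-terms.
    alternation = "|".join(re.escape(t) for t in sorted(lookup, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternation})\b", flags=re.IGNORECASE)
    # A single substitution pass avoids re-matching inside injected text.
    return pattern.sub(
        lambda m: f"<term>{m.group(0)}</term><trans>{lookup[m.group(0).lower()]}</trans>",
        source,
    )


# Illustrative glossary and hypothetical tag names (assumptions).
glossary = {"máquina herramienta": "makina-erreminta", "máquina": "makina"}
print(inject_terms("La máquina herramienta está parada.", glossary))
```
]]></preformat>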
        <p>To provide an operative environment, we adapted Vicomtech’s Batua MT ecosystem [30], which
provides both text and document translation, the latter via private access. The environment was made
accessible to all participants of the project and related companies within the clusters. The system
leverages Vicomtech’s proprietary MT solution, Itzuli, which was adapted to support the methods and
models for adaptive MT that were developed within the project. Participants could thus translate any
content in their respective sectors, contrasting generic translation with terminology-enhanced MT in
their domain. Both frontend and backend components of the Batua application were adapted to support
terminology integration and selection, as well as model selection. Beyond the previously described
controlled evaluations, to assess MT model quality, end-users performed subjective evaluations by
translating and reviewing documents in their specific domains, with positive feedback overall.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>In this work, we described ADAPT-IA, a research and development project focused on Adaptive
AI technologies applied to multilingual Language Technologies, including Basque, with the goal of
promoting their adoption and integration into industrial processes across multiple sectors in the
Basque Country. We presented six use cases related to the development of advanced prototypes for the
Automotive, Energy, Machine Tool, and Railway sectors, implemented through collaboration between
the applied research center Vicomtech, the University of the Basque Country, the Basque Center for
Terminology and Lexicography, and the respective industry clusters.</p>
      <p>This research has highlighted the importance of applying Language Technologies in these sectors,
including the development of technologies for languages such as Basque. Future work will focus on
the research and development of Small Language Models, with the aim of assessing their performance
in these sectors, which would facilitate their use and deployment. Additionally, the research will be
expanded to new sectors, with the goal of developing further technologies and prototypes, as
well as generating domain-specific linguistic resources and making them available to the research
community.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was partially supported by the Department of Economic Development and Competitiveness
of the Basque Government (Spri Group) through funding for the ADAPT-IA project (KK-2023/00035).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT-4 for grammar and spelling checks.
After using this tool, the authors reviewed and edited the content as needed and take full responsibility
for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Aldoseri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. N.</given-names>
            <surname>Al-Khalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Hamouda</surname>
          </string-name>
          ,
          <article-title>AI-powered innovation in digital transformation: Key pillars and industry impact</article-title>
          ,
          <source>Sustainability</source>
          <volume>16</volume>
          (
          <year>2024</year>
          )
          <fpage>1790</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Herold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ney</surname>
          </string-name>
          ,
          <article-title>On search strategies for document-level neural machine translation</article-title>
          , in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL 2023</source>
          , Association for Computational Linguistics, Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>12827</fpage>
          -
          <lpage>12836</lpage>
          . URL: https://aclanthology.org/2023.findings-acl.811/. doi:10.18653/v1/2023.findings-acl.811.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Magee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <article-title>Deep learning for technical document classification</article-title>
          ,
          <source>IEEE Transactions on Engineering Management</source>
          <volume>71</volume>
          (
          <year>2022</year>
          )
          <fpage>1163</fpage>
          -
          <lpage>1179</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Madan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schilder</surname>
          </string-name>
          ,
          <article-title>Multi-label legal document classification: A deep learning-based approach with label-attention and domain-specific pre-training</article-title>
          ,
          <source>Information Systems</source>
          <volume>106</volume>
          (
          <year>2022</year>
          )
          <fpage>101718</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Starly</surname>
          </string-name>
          ,
          <article-title>“FabNER”: information extraction from manufacturing process science domain literature using named entity recognition</article-title>
          ,
          <source>Journal of Intelligent Manufacturing</source>
          <volume>33</volume>
          (
          <year>2022</year>
          )
          <fpage>2393</fpage>
          -
          <lpage>2407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Borji</surname>
          </string-name>
          ,
          <article-title>A categorical archive of ChatGPT failures</article-title>
          ,
          <source>arXiv preprint arXiv:2302.03494</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ludwig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kühn</surname>
          </string-name>
          ,
          <article-title>Voice user interfaces in manufacturing logistics: a literature review</article-title>
          ,
          <source>International Journal of Speech Technology</source>
          <volume>26</volume>
          (
          <year>2023</year>
          )
          <fpage>627</fpage>
          -
          <lpage>639</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Multimodal human-robot interaction for human-centric smart manufacturing: a survey</article-title>
          ,
          <source>Advanced Intelligent Systems</source>
          <volume>6</volume>
          (
          <year>2024</year>
          )
          <fpage>2300359</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Moreno-Acevedo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Vasquez-Correa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Martín-Doñas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Álvarez</surname>
          </string-name>
          ,
          <article-title>Stream-based active learning for speech emotion recognition via hybrid data selection and continuous learning</article-title>
          , in: International Conference on Text, Speech, and Dialogue, Springer,
          <year>2024</year>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.</given-names>
            <surname>Srivastav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Koluguri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moumen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gandhi</surname>
          </string-name>
          , et al.,
          <article-title>Open automatic speech recognition leaderboard</article-title>
          , https://huggingface.co/spaces/hf-audio/open_asr_leaderboard,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>