<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fine-tuning BERT Models on Demand for Information Systems Explained Using Training Data from Pre-modern Arabic</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Asselborn</string-name>
          <email>asselborn@ifis.uni-luebeck.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sylvia Melzer</string-name>
          <email>sylvia.melzer@uni-hamburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Said Aljoumani</string-name>
          <email>saidaljomani@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Magnus Bender</string-name>
          <email>bender@ifis.uni-luebeck.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Florian Andreas Marwitz</string-name>
          <email>f.marwitz@uni-luebeck.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konrad Hirschler</string-name>
          <email>konrad.hirschler@uni-hamburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ralf Möller</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universität Hamburg, Centre for the Study of Manuscript Cultures</institution>
          ,
          <addr-line>Warburgstraße 26, 20354 Hamburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Lübeck, Institute of Information Systems</institution>
          ,
          <addr-line>Ratzeburger Allee 160, 23562 Lübeck</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Humanities scholars can use Large Language Models (LLMs) to simplify text analysis and pattern recognition. Fine-tuning LLMs for specific humanities tasks can be challenging due to limited training data. However, the humanities have a growing number of information systems with research data that can be used for this purpose. This article outlines how to fine-tune Bidirectional Encoder Representations from Transformers (BERT) models using pre-modern Arabic data available in an information system. We also introduce the Humanities Aligned Chatbot (ChatHA) for user-friendly interaction with the fine-tuned model to break down the barriers to the application of LLMs in the humanities. As a result, research data archived in a research data repository can be used to fine-tune models in a short time without requiring IT expertise. Additionally, users can chat with ChatHA, which provides them with more precise answers. This success is also attributed to the availability of well-structured data in canonical form, enabling us to precisely define the mapping of entity types to labels. In addition, we use a manifest file which serves as the cornerstone for structuring and organizing training data to automate the Fine-tuning on Demand (FToD) process. The results we obtained show that the FToD process can be done in just a few minutes using a sample dataset and BERT. The FToD process identified names of people, places, or dates written in pre-modern Arabic that could not be recognised by the pre-trained model.</p>
      </abstract>
      <kwd-group>
        <kwd>Fine-tuning on demand</kwd>
        <kwd>BERT</kwd>
        <kwd>pre-modern Arabic</kwd>
        <kwd>manifest file</kwd>
        <kwd>ChatHA</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Large Language Models (LLMs) are a subfield of Natural Language Processing (NLP) and are
based on transformer architecture neural networks that use deep learning algorithms [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ].
They are pre-trained on huge amounts of text and fine-tuned for specific tasks. Some of the most
popular LLMs are Bidirectional Encoder Representations from Transformers (BERT) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
Generative Pre-trained Transformer 3 (GPT-3) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and Text-to-Text Transfer Transformer (T5) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
BERT is an LLM specifically designed for text processing and is capable of modelling semantic
relationships between words and sentences. It is used for various applications, such as automatic
text summarisation, question-answering generation, and sentiment analysis. Although BERT
has been pre-trained on about 3,300M words from BooksCorpus and English Wikipedia [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], there are
limitations to the application in specific domains. The limitations arise because assumptions
that a model picks up from its training data do not hold equally in every domain. In Arab countries, for example, the
names with first name and surname are adapted to the Western naming pattern (e.g. first name:
Hamza, surname: ibn Omar ibn Mustafa). In early Arabic texts, as in texts from
other world regions, however, there are different naming patterns. A name could also include, for
example, membership of a tribe, a place of origin, or even the affiliation to a legal school or
an occupational title. A place name can thus also be part of a person’s name and should therefore
be labelled as such. It also happens that places are paraphrased, such as the Golden Mosque, the
Glittering School or the Big Tower. The indication of a date also differs from the Gregorian
calendar, as can be seen from the following sentence: “Hamza ibn Omar ibn Mustafa went to
Marrakesh on the ninth day of the month of Shawwāl.”
      </p>
      <p>Such different date representations can be found in the collection of so-called audition
certificates written in pre-modern Arabic. Audition certificates are notes on a written artefact
(such as a codex, a scroll, or a sheet of paper) that document the authorised transfer of the
book’s text from teacher(s) to student(s). At the Centre for the Study of Manuscript Cultures (CSMC) at the
Universität Hamburg, research data was collected from over 3,000 audition certificates. A total
of over 50,000 annotations were made in the texts of the audition certificates. These annotations
mark persons, places, and dates as well as their role in the reading session. Annotations and
other research data were stored in an information system called Audition Certificates Platform
(ACP)1.</p>
      <p>At the CSMC, there are about 13 other information systems which represent research data in
different, project-specific data models. Each project has therefore collected and stored
project-specific data. It is exactly this data that can now be used as fine-tuning data to adapt LLMs to the
needs of the projects.</p>
      <p>Humanities scholars involved in a new project who aim to assess textual artefacts
in terms of person names, locations, dates, or other types of entities, but lack their own
fine-tuning data, have the option to use fine-tuning data obtained from other projects.
Looking outside the CSMC, there are research data repositories like the Research Data Repository
(RDR) or Zenodo that also hold large amounts of archived data that could be used for fine-tuning.</p>
      <p>Humanities scholars, as well as other researchers without specific IT knowledge, are already
producing labelled data, i.e., texts with tags or categories assigned to specific words or sentences
indicating meaning or relevance. These data might, however, be in a format that cannot be
used directly to fine-tune an LLM, e.g., Microsoft Word docx. To use the data properly, certain
pre-processing steps might be needed, which can be time-consuming and hard to understand,
requiring an IT expert to do the task.</p>
      <sec id="sec-1-1">
        <title>1https://www.audition-certificates-platform.org/</title>
        <p>Various tools and libraries are available for the process of building a fine-tuned model.
However, the biggest challenge is to empower scholars to use them without requiring the
assistance of an IT expert. This would enable scholars to build their own fine-tuned models
using project-specific data archived in research data repositories. Therefore, an approach is
needed to allow scholars to build fine-tuned models intuitively and independently. To address
this need, we have developed the Fine-Tuning on Demand (FToD) approach, which enables
scholars to build project-specific models on demand in just a few minutes and with minimal
resources. In this article, we explain the FToD approach using training data from pre-modern
Arabic. The FToD approach can also be applied to other data from different domains, making it
universally applicable in the humanities as well as in other fields.</p>
        <p>The first contribution of this article is to define the FToD process to reduce the time needed
to pre-process a dataset to fine-tune a BERT model to a specific task. The users of the system
do not need to know the specifics of the BERT models and the libraries used.</p>
        <p>The second contribution of this article is to show how fine-tuned models can
also be used to provide a chatbot that answers natural language questions about ancient
texts and the humanities. In contrast to publicly available chatbots, we propose the Humanities
Aligned Chatbot (ChatHA), which will be fine-tuned to the specific data in the RDR. Hence,
ChatHA can provide detailed answers based on the available data. Moreover, ChatHA’s ability
to handle queries in natural language and provide answers in a conversational form makes it
easier for humanities scholars to use the system and lowers barriers to access.</p>
        <p>The results we obtained show that the fine-tuning process can be done in just a few minutes
using a sample dataset and BERT. The FToD process identified names of people, places or dates
that could not be recognised by the pre-trained model. In the end, we provide an extended
outlook to ChatHA, a Humanities Aligned Chatbot trained on the same dataset as BERT.</p>
        <sec id="sec-1-1-1">
          <title>1.1. Related Work</title>
          <p>Data Management Plans (DMPs) have a focus on archiving research data for a long time and
making it accessible to other researchers. To fulfil the requirements of funding programs or
regulations, tools and research data repositories (RDRs) have been developed. These tools
provide a structured description of the data, eliminating the need for individual researchers to
handle it themselves. As a result, RDRs are well-suited for data archiving. Users can reuse this
archived data to train LLMs, for example.</p>
          <p>
            Scholars can use LLMs to simplify their work to analyze texts or recognize patterns in data.
If domain-specific problems need to be solved, the existing models need to be fine-tuned. There
is limited information on the application of fine-tuning LLMs in the humanities. A multilingual
transformer model called CAMeLBERT [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] for segmenting Arabic text without punctuation is
employed. The results showed that the proposed model outperformed other models in terms
of accuracy and demonstrated the potential of fine-tuning LLMs in the humanities for various
tasks. Further research is needed to explore the full potential of these models.
          </p>
          <p>
            In [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], the survey of NLP approaches presents recent work that uses LLMs to solve NLP tasks
via pre-training and fine-tuning, prompting and other techniques.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Audition Certificates</title>
      <p>Audition certificates are a salient feature of Arabic manuscript cultures. They are notes written
on an artefact that document the authorised transmission of texts from teacher(s) to student(s).
Specifically, texts were read out aloud (by a teacher or one of the students) and at the end of a
reading session, one of the members of this reading group added the audition certificate to the
book. By virtue of their participation, all students now had the right to act as teachers in future
reading sessions.</p>
      <p>Audition certificates can include: the name of the teacher(s), the name of the student(s), the
name of the reader, the name of the writer of the certificate, the name of the book’s owner, the
date of the reading, the place of the reading, and many other data.</p>
      <p>An audition certificate, both in original Arabic script and translation, is presented below. The
example was taken from the manuscript Bibliothèque nationale de France, arabe 708, fol. 38v and
can be found as a digital copy online2 as well as in an information system at the Universität
Hamburg3. In the following text, persons are highlighted in green, dates in pink and locations
in blue.</p>
      <p>Original Arabic text:
ماملإا نيلجلأا نيخيشلا ىلع هدعب ثلاثلا ءزجلا عيمجو دواد يبلأ ننسلا باتك نم يناثلا وهو ؤزجلا اذه عيمج تأرق
روكذملا هدنسب ينطلاسقلا يلع نب دمحأ سابعلا يبأ ةودقلا خيشلا نب دمحم ركب يبأ نيدلا بطق لضافلا ظفاحلا ملاعلا
يقشمدلا ىيحي نبا فسوي نب ميحرلا دبع لضفلا يبأ نيدلا باهش دنسملا لضافلاو لولأا ءزجلا يف عامسلا ةقبط يف
نيدلا ردص و دمحأ سابعلا وبأ نيدلا نيز ءجلالأا ةداسلا امهعمسف دزربط نم هعامسب ةزملا بيطخ نباب فرع يعفاشلا
يعفاشلا نيزر نب نيسحلا نب دمحم اله دبع يبأ نيملسملا يتفم نيدلا يقت ةاضقلا يضاق انديس ادلو ربلا دبع ريخلا وبأ
تاكربلا وبأ نيدلا ءاهبو دمحأ سابعلا وبأ نيدلا فرش و دمحم نيدلا سمش نب نامثع ورمع وبأ نيدلا رخف امهتخأ دلوو
دمحمو دمحم يلاعملا يبأ نيدلا ني مأ نب يلع نيدلا رون امهيخأ نباو ينطلاسقلا نبا نيدلا بطق خيشلا ادلو قحلا دبع
نيدلا سمش و يعبرلا ليمج نبا ]مسلالا[ دبع نب مساقلا يبأ نب دمحم نيدلا سمش لضافلا هيقفلاو يلع نب دمحأ نب
عسات ءاعبرلأا موي يف تبثو كلذ حصو لصلأا يف ةعامجلا ءامسأ تبثم وهو يفوصلا يبلحلا ليلخ نب ناردب نب ليلخ
رادب هتياور ]اذك[ هل زوجي ام عيمج نيروكذمللو يل ناعمسملا زاجأو ةئامتسو نيعبسو عبس ةنس نم ىلولأا ىدامج
ىلع هتاولصو هدحو له دمحلاو يمخللا يلع نب نسحلا نب ىسيع نب يلع نب نسحلا هبتك ةرهاقلاب ةيلماكلا ثيدحلا
.همسلاو هبحصو هلآو دمحم انديس
Translation:
“I have read all of this part, and it is the second from the book Al-Sunan by Abu Dawud, and
all of the subsequent third part under the authority of the two eminent sheikhs, the scholar,
the hafiz, the virtuous Qutb al-Din Abi Bakr Muhammad ibn al-Sheikh al-Qudwa Abi al-Abbas
Ahmad ibn Ali al-Qastalani according to his chain of transmission mentioned in the audition
certificate that is in the first part, and the virtuous Musnad Shihab al-Din Abi al-Fadl Abd
al-Rahim ibn Yusuf ibn Yahya al-Dimashqi al-Shafi’i known as Ibn al-Khatib al-Mizza by virtue
of the authorised transmission from Tabarzad, distinguished masters Zain al-Din Abu al-Abbas
Ahmad and Sadr al-Din Abu al-Khair Abd al-Barr the sons of our Master the Chief Judge
Taqi al-Din the mufti of the Muslims Abi Abd Allah Muhammad ibn al-Husain ibn Ruzayq
al-Shafi’i and their nephew Fakhr al-Din Abu Amr Othman ibn Shams Al-Din Muhammad
and Sharaf Al-Din Abu Al-Abbas Ahmed and Baha’ Al-Din Abu Al-Barakat Abdul-Haq, the
sons of Sheikh Qutbuddin ibn Al-Qastalani and their nephew Nur Al-Din Ali ibn Amin Al-Din
Abi Al-Ma’ali Muhammad and Muhammad ibn Ahmad ibn Ali and the virtuous jurist Shams
Al-Din Muhammad ibn Abi Al-Qasim ibn Abd [al-Salam] ibn Jamil al-Raba’i and Shams al-Din
Khalil ibn Badran ibn Khalil al-Halabi al-Sufi, and he established the names of the group in
the original, and that was confirmed on Wednesday, the ninth of Jumada al-Awwal of the year
seventy-seven and six hundred. The two authorised teachers gave me and those mentioned the
authority to transmit all that had been authorised to them. [This reading took place] in the Dar
al-Hadith al-Kamiliyah in Cairo, written by Al-Hassan ibn Ali ibn Isa ibn Al-Hassan ibn Ali
Al-Lakhmi.”</p>
      <p>2https://gallica.bnf.fr/ark:/12148/btv1b11000356s/f19.item.r=%22arabe%20708%22.zoom#
3https://www.audition-certificates-platform.org/ac/245</p>
      <sec id="sec-2-1">
        <title>Audition Certificates Platform</title>
        <p>Audition certificates are the result of complex documentary practices and they were written by
highly specialised communities. We have collected research data related to audition certificates
and archived them in a database. On top of the database, we built an information system which
is called the Audition Certificates Platform (ACP).</p>
        <p>ACP was implemented as an information system using the database management tool Heurist4.
The objective is to analyze the collected research data using Heurist and make it accessible to
users through a website. Once the project is completed, the information system can be exported
in formats such as CSV, XML, or JSON and archived in an RDR. The archived data is now
available as training data.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Fine-Tuning on Demand</title>
      <p>Figure 1: The FToD process in five steps: (1) collect research data, (2) archive in research
data repository, (3) fine-tune Transformer model, (4) display results to expert, (5) validate and
store results.</p>
      <p>The FToD process is presented in Figure 1. It consists of five steps. As a first step,
researchers collect research data, which can be stored in formats such as JSON, XML, CSV, or in
a database. Once the researchers have collected all of their data, the data are archived in an RDR
together with a manifest file. As the manifest file, we use a Metadata Encoding &amp;
Transmission Standard (METS) file.</p>
      <sec id="sec-3-1">
        <title>4https://heurist.fdm.uni-hamburg.de/index_en.html</title>
        <p>METS is a standard that enables the exchange of digitized documents between cultural
heritage institutions and is an XML schema for the creation of digital objects. A digital object
can consist of one or more digital files, which can be in different formats and describe a detailed
internal structure. A METS file consists of seven major sections. Here, we present only the
sections that are important for FToD. The other sections are described in the standard5.</p>
        <list list-type="bullet">
          <list-item><p>METS Header: The METS Header is a section within the METS file that contains metadata
about the METS file itself. It includes information such as the creator and editor of the file.</p></list-item>
          <list-item><p>File Section: The file section in METS lists all the files that make up the digital object.
It includes &lt;file&gt; elements that can be grouped within &lt;fileGrp&gt; elements to organize
the files based on different versions or categories of the object. Additionally, it includes
functions, for example, what data is to be harvested (e.g. a JSON file) or which process is to
be executed; in this case FToD. The function is represented with the attribute “USE”, with
the assignment USE=“json” or USE=“FToD”.</p></list-item>
          <list-item><p>Structural Map: The structural map is a crucial component of a METS file. It outlines
the hierarchical structure of the digital library object, defining the relationships between
different elements. It also links the elements to the corresponding content files and
metadata. The Structural Map is built in a tree-like structure using multiple nested DIV
elements, with the root DIV element containing all other DIV elements. The DIV elements
directly under the root DIV element represent the possible models for training (e.g. BERT).
Within the DIV element of a model, a Filepointer
element points to the File element of the research data file (e.g. JSON).</p></list-item>
          <list-item><p>As an extension of Tilp’s METS Generator (see [<xref ref-type="bibr" rid="ref9">9</xref>]), we also use the Structmap element to
do the label assignment for the FToD process. Each DIV is assigned all the labels that are
entered as input in BERT. The Filepointers then represent the field names. Several
Filepointers can be specified. Thus, the fields “Region” and “Territory” can both be labelled
with “Place”.</p></list-item>
        </list>
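        <p>The label assignment described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the manifest below is hypothetical, and in particular the use of the CONTENTIDS attribute on the Filepointer to carry a field name and of LABEL on the DIV to carry the target label are assumptions made for illustration only. Only the METS namespace, the nesting of DIV elements in the Structural Map, and the USE attribute follow the description in the text.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical manifest. The placement of field names (CONTENTIDS) and
# labels (LABEL) is an assumption for this sketch, not the published layout.
MANIFEST = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="FToD">
      <mets:file ID="data1" USE="json"/>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="root">
      <mets:div TYPE="BERT">
        <mets:div LABEL="Place">
          <mets:fptr FILEID="data1" CONTENTIDS="Region"/>
          <mets:fptr FILEID="data1" CONTENTIDS="Territory"/>
        </mets:div>
        <mets:div LABEL="Person">
          <mets:fptr FILEID="data1" CONTENTIDS="Name"/>
        </mets:div>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def field_to_label(manifest_xml, model="BERT"):
    """Return a dict mapping data field names to training labels for one model."""
    root = ET.fromstring(manifest_xml)
    mapping = {}
    # DIVs directly under the root DIV stand for the models that can be trained.
    for model_div in root.iterfind(".//mets:structMap/mets:div/mets:div", METS_NS):
        if model_div.get("TYPE") != model:
            continue
        # Each nested DIV carries one label; its Filepointers name the fields.
        for label_div in model_div.iterfind("mets:div", METS_NS):
            for fptr in label_div.iterfind("mets:fptr", METS_NS):
                mapping[fptr.get("CONTENTIDS")] = label_div.get("LABEL")
    return mapping


mapping = field_to_label(MANIFEST)
print(mapping)
```
]]></preformat>
        <p>In this sketch, both “Region” and “Territory” end up mapped to “Place”, mirroring the example given in the text.</p>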
        <p>
          METS files can also be used for other processes such as the Databasing on Demand (DBoD)
processes [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. With the help of a METS file, data stored in, e.g., a CSV file and archived in an
RDR are transferred to a database at the press of a button. (The button was added to the RDR in
a prototype implementation.) More information on the practical application of the METS file
and the DBoD process is described in Stahl’s bachelor thesis [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>Once everything is archived in the repository, a registered and authorised user will see a
button with the text “Train my model” (see Figure 2). The wording “Train my model” was chosen over
“Fine-tune my model” because it may be easier for non-IT experts to understand. Once an FToD
process is started, a screen is shown asking the user to be patient (see Figure 3). Since the
fine-tuning process might take a long time, depending on the dataset, hardware configuration,
specific task, etc., the user will additionally receive an e-mail once the process has finished (see
Figure 3). The same information is also presented on the screen.</p>
      </sec>
      <sec id="sec-3-2">
        <title>5https://www.loc.gov/standards/mets/METSOverview.v2.html</title>
        <p>
          Once the fine-tuning process has been completed, the user is redirected to a screen where
they can verify the results (see Figure 4). The screen in Figure 4 is shown when the BERT
named entity recognition task has been chosen. Verification screens for other tasks will be
implemented in the future as necessary. Additionally, a user also receives an email with a link
to this screen. In some circumstances, the process can take a little longer, so we have included
a function in a prototype implementation that informs users by email when the process has
been completed successfully. More details on the implementation of a notification function are
described in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>On the screen, it is possible to enter a new text and see the predictions for two models. One is
the pre-trained model, denoted as “Standard”, while the other is the fine-tuned model denoted
as “My model”.</p>
        <p>The overview of both models enables users to quickly determine whether fine-tuning the
model on the specified data set has improved performance to meet their particular requirements.
In our example, Islamic dates are now correctly identified which was not the case before.</p>
        <p>If a user is satisfied with their model, they will be able to save it in the RDR again in a future
version of the implementation. Such archived models will empower other users with a similar
task, e.g. other Arabic texts that are not yet labelled but should be, to reuse the model for
automatic labelling.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Application</title>
      <p>Our demo application uses the audition certificate data stored in JSON as the data set for the
FToD process.</p>
      <p>
        We have used the HuggingFace Transformers library [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to execute the FToD process. The
pre-trained model of choice was “CAMeL-Lab/bert-base-arabic-camelbert-ca-ner”6 which is
a BERT model specifically pre-trained on classical Arabic texts and then fine-tuned on the
ANERCorp dataset7.
      </p>
      <p>1800 of the 3000 annotated audition certificates in JSON format in a Zip file together with the
METS file have been loaded into a demo version of the RDR. Annotations are persons, locations
and dates.</p>
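      <p>To make the step from annotated records to token-level training labels concrete, the following Python sketch converts character-offset annotations into BIO tags. The record layout (start, end, type) and the whitespace tokenisation are assumptions for illustration; the actual JSON export of the platform may be structured differently. The label names PERS, LOC and DAT follow the labels reported in the evaluation.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Assumed mapping from annotation types to the labels used in the evaluation.
TYPE_TO_LABEL = {"person": "PERS", "place": "LOC", "date": "DAT"}


def span(text, phrase, entity_type):
    """Build a hypothetical annotation record for the first match of phrase."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "type": entity_type}


def to_bio(text, annotations):
    """Split text on whitespace and tag each token with a BIO label."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for ann in annotations:
            inside = ann["start"] <= match.start() and match.end() <= ann["end"]
            if inside:
                prefix = "B" if match.start() == ann["start"] else "I"
                tag = prefix + "-" + TYPE_TO_LABEL[ann["type"]]
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


text = "Hamza ibn Omar ibn Mustafa went to Marrakesh"
annotations = [
    span(text, "Hamza ibn Omar ibn Mustafa", "person"),
    span(text, "Marrakesh", "place"),
]
tokens, tags = to_bio(text, annotations)
for token, tag in zip(tokens, tags):
    print(token, tag)
```
]]></preformat>
      <p>A subword tokenizer, as used by BERT models, would additionally require the tags to be aligned to word pieces; that step is omitted here.</p>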
      <sec id="sec-4-1">
        <title>6https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-ca-ner 7https://camel.abudhabi.nyu.edu/anercorp/</title>
        <p>After all the necessary data have been successfully uploaded, and the user is granted
permission to fine-tune a model with specified data, the RDR displays a new button labelled “Train my
model” (Figure 2). After activating the button, a script is executed which reads out the METS file
and executes a corresponding behaviour depending on the configuration data. For our example,
the following parameters have been chosen:</p>
        <list list-type="bullet">
          <list-item><p>a train-test split of 80% and 20%,</p></list-item>
          <list-item><p>a learning rate of 10<sup>−4</sup>,</p></list-item>
          <list-item><p>a weight decay of 10<sup>−5</sup>, and</p></list-item>
          <list-item><p>5 epochs.</p></list-item>
        </list>
        <p>All other parameters are used as they are defined in the HuggingFace library, but the parameters
can also be changed by users via a METS file if required. Fine-tuning on an NVIDIA DGX2 with
selected parameters took only a few minutes.</p>
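        <p>The chosen parameters can be written down as a small, library-independent Python sketch. The dictionary keys are picked to match the keyword arguments of the same names in HuggingFace's TrainingArguments (learning_rate, weight_decay, num_train_epochs); the split helper, the seed, and the use of integers as stand-ins for certificates are assumptions for illustration and not part of the described system.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random

# Values taken from the text; key names mirror TrainingArguments keywords.
HYPERPARAMETERS = {
    "learning_rate": 1e-4,
    "weight_decay": 1e-5,
    "num_train_epochs": 5,
}
TEST_FRACTION = 0.2  # 80% training data, 20% test data


def train_test_split(records, test_fraction=TEST_FRACTION, seed=42):
    """Shuffle reproducibly and split into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 1800 placeholder records stand in for the uploaded certificates.
train, test = train_test_split(range(1800))
print(len(train), len(test))
```
]]></preformat>
        <p>With 1800 records, this split yields 1440 training and 360 test records.</p>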
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Evaluation</title>
      <p>
        In many cases, using custom models for fine-tuning can improve results, but this is not always
the case [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. After we labelled paraphrased places, such as the Golden Mosque, as locations,
the results became worse, as places like Baghdad were no longer identified as locations. At this
point, experts would have to evaluate the result and repeat the fine-tuning process.
      </p>
      <p>The performance of both the pre-trained CAMeLBERT and our fine-tuned model has
been evaluated on the test data set using precision,
recall, and F1 score. Table 1 shows the results using the pre-trained model. The precision, recall
and F1 score for the labels DAT, MISC and ORG are zero. The labels “MISC” and
“ORG” have all-zero scores because neither category has been labelled in our data
set. “DAT”, i.e., date, was labelled 319 times, but the pre-trained model was not able to detect
any of them. Thus, this label also got a zero score for all metrics. Locations as well as persons got some
matches, but the performance on the test data set is not good enough.</p>
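      <p>For reference, the three metrics can be computed from per-label counts as in the following Python sketch. It uses the standard definitions of precision, recall and F1 score, with a score of zero whenever a denominator is zero; the counts in the second call are invented for illustration and are not results from our evaluation.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Standard precision, recall and F1 score; 0.0 where undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# A label with 319 labelled entities of which none is detected scores zero.
dat_scores = precision_recall_f1(0, 0, 319)

# Invented counts: 8 correct detections, 2 spurious ones, 2 missed entities.
example_scores = precision_recall_f1(8, 2, 2)
print(dat_scores, example_scores)
```
]]></preformat>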
      <p>Table 1: Precision, recall, and F1 score of the pre-trained model for the labels DAT, LOC,
MISC, ORG, and PERS, and overall.</p>
      <p>Table 2 provides the results using our fine-tuned model. Since there are no entities
labelled “MISC” and “ORG”, both labels
have been omitted. Performance has increased on all metrics for all remaining labels. The model is now
able to detect dates, which was not possible before. For persons in particular, the detection
is now in a good range compared to the pre-trained model.</p>
      <p>The use of LLMs is a big challenge, especially for non-IT experts. Getting an FToD process
up and running normally requires prior knowledge, or a user has to laboriously gather all the
necessary components before the process can be executed. In this article, we presented how to fine-tune a model
with a single click, starting from archived data.</p>
      <p>
        Labelling is a time-consuming and monotonous task. Thus, it is prone to small
errors and mistakes [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. One mistake found in our data set is the labelling of the conjunction و (Arabic
for “and”) as part of a person’s name. Incorrect labels have an impact on both the
fine-tuning of the model itself and the computation of the metrics used to evaluate the
models.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. ChatHA</title>
      <p>Having employed an FToD system for identifying names of persons, places, or dates, we aim to go
further and outline a more feature-rich and intuitive information system for humanities scholars.
Currently, the model for executing the FToD process is made available to an information system
providing a Graphical User Interface (GUI). However, it is sometimes difficult to deal with GUIs
and each interface needs to be created for the specific tasks of humanities scholars.</p>
      <p>Additionally, scholars have to become familiar with each new interface for different tasks.
Instead, we present a system with a more universal interface, enabling scholars to engage with
the data in a manner that feels more intuitive. This system ensures that scholars can interact
naturally with a model that is fine-tuned to meet their specific needs on demand, eliminating
the requirement to adapt to multiple distinct interfaces.</p>
      <p>
        As a solution, we outline ChatHA, a Humanities Aligned Chatbot. We adopt the already
presented mechanism to fine-tune an LLM, e.g., GPT-4 [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. GPT-4 is a better choice for a
chatbot compared to BERT because it was pre-trained to generate text and answer queries of
various forms. With ChatHA it is not necessary to create a GUI for a task, as each task can be
sent as a textual question.
      </p>
      <p>A system like ChatHA is built automatically in an FToD process. Therefore, it is built with less
supervision, which also brings some problems: LLMs have no true understanding of research
data archived in RDRs and its content. LLMs try to combine the best answers based on texts
they processed during training. Therefore, LLMs are prone to hallucinations, i.e. erroneous
answers invented by the LLM. Furthermore, LLMs do not cite their sources, at least not directly
in the raw LLM output. To combat the problems, we do not output the raw LLM output during
question answering, but rather post-process it to include citations. The obtained result is then
displayed to users and the new citations allow the user to validate the answer.</p>
      <p>This post-processing and the overall workflow are described in more detail in the following:
First, we choose some pre-trained LLM. We opt to use a pre-trained version to include basic
natural language understanding and general query answering. Second, a user, in our case a
humanities scholar, chooses which types of texts the LLM should be fine-tuned on. Texts
can be, e.g., Arabic or Tamil8 or the whole RDR. This second step composes the corpus for our
LLM and ensures the alignment of the fine-tuned LLM with the humanities. Third, we fine-tune
the LLM with the selected data and create the chatbot by this step. However, we still have the
issue of hallucinations and missing citations.</p>
      <p>
        As a solution, we apply Subjective Content Descriptions (SCDs) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. SCDs are additional data
attached to locations in text documents, i.e., an SCD contains additional data like descriptions,
links, or labels which themselves may be created automatically or by humans and each SCD is
attached to one or more sentences of a text document. Additionally, the sentences to which
an SCD is attached are represented in an SCD word-distribution matrix. Using this SCD
word-distribution matrix, an SCD can be identified by the Most Probably Suited SCD (MPS2CD)
algorithm [17] for any new and unseen sentence. MPS2CD identifies the most suitable SCD from
the set of known SCDs attached to the text documents. Hence, using MPS2CD it is possible to
create a link from a new and unseen sentence to an SCD and all sentences this SCD is attached
to. Generally, the theory of SCDs is not restricted to text documents with attached SCDs
containing additional text.
      </p>
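      <p>The idea of matching a new sentence against an SCD word-distribution matrix can be illustrated with a minimal Python sketch. It is not the published MPS2CD algorithm: the cosine similarity over raw word counts, the function names, the SCD identifiers, and the toy sentences are all assumptions made for illustration only.</p>
      <preformat>
```python
from collections import Counter
import math


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in sentence)
    return cleaned.split()


def build_matrix(scd_sentences):
    """Build one word-count row per SCD from the sentences attached to it."""
    return {
        scd: Counter(word for sent in sents for word in tokenize(sent))
        for scd, sents in scd_sentences.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_suited_scd(sentence, matrix):
    """Return (scd_id, score) for the matrix row most similar to the sentence."""
    vector = Counter(tokenize(sentence))
    scored = ((scd, cosine(vector, row)) for scd, row in matrix.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy corpus: two SCDs, each attached to two sentences.
corpus = {
    "scd_paper": ["The manuscript is written on paper.",
                  "Paper was produced in Damascus."],
    "scd_owner": ["The owner sold the book.",
                  "A note names the owner of the book."],
}
matrix = build_matrix(corpus)
best_scd, score = most_suited_scd("Who was the owner of this book?", matrix)
```
      </preformat>
      <p>In this toy setting, the question about the owner of a book is linked to the SCD whose attached sentences share the most vocabulary with it, and through that SCD to the corpus sentences themselves.</p>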
      <p>Coming back to ChatHA, we lack SCDs for the corpus used for fine-tuning. The UnSupervised
Estimator of SCD Matrices (USEM) [18] automatically creates SCDs for a corpus and
attaches them to the sentences in the corpus. Thus, using USEM we add SCDs to the corpus
used for fine-tuning, such that each sentence gets one SCD. Each SCD represents a topic or concept
mentioned in the corpus, and all sentences about that topic or concept belong to the same SCD.
Our SCDs therefore represent the various topics or concepts in the corpus and the sentences that
mention them.</p>
      <p>Using SCDs we can solve the issue of hallucinations and missing citations in the output of
the LLM. For each sentence in the output, MPS2CD identifies an SCD from the corpus. In doing
so, a link is created from the output of the LLM to the SCDs of the corpus, and further on to the
sentences of the corpus. These links can now be used as citations shown in the output, pointing
to the relevant sentence in the corpus used for fine-tuning.</p>
      <p>Finally, ChatHA is ready to be used: A humanities scholar enters a question about Arabic
or Tamil texts in natural language. The question is first sent to the LLM, and the LLM's output
is post-processed in the following way: We apply MPS2CD to the raw output to identify an
SCD for each sentence. Sentences for which no SCD is found may be hallucinations and are
omitted because no SCD supports them. The processed output is then displayed to the
user alongside the SCD for each sentence. For each sentence and SCD, ChatHA offers the
possibility to view this SCD in the information system for USEM [19] or to open each of the
sentences in the corpus used for fine-tuning that are attached to the same SCD. Additionally,
if further visualization is available for an SCD or a sentence, ChatHA offers to visualize it in the
corresponding information system, e.g., for Arabic texts the system shown in Figure 4.</p>
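      <p>The post-processing step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the ChatHA implementation: the matcher is a toy Jaccard word overlap standing in for MPS2CD, the threshold of 0.2 is an arbitrary choice for deciding that no SCD is found, and all function names and example sentences are hypothetical.</p>
      <preformat>
```python
import re


def split_sentences(text):
    """Split text into sentences that end with '.', '!' or '?'."""
    parts = re.findall(r"[^.!?]+[.!?]?", text)
    return [part.strip() for part in parts if part.strip()]


def words(sentence):
    """Return the set of lowercase word tokens in a sentence."""
    return set(re.findall(r"\w+", sentence.lower()))


def match_scd(sentence, scd_sentences, threshold):
    """Toy stand-in for MPS2CD: best SCD by Jaccard overlap, or None."""
    query = words(sentence)
    best_scd, best_score = None, 0.0
    for scd, sents in scd_sentences.items():
        vocab = set().union(*(words(s) for s in sents))
        union = query.union(vocab)
        score = len(query.intersection(vocab)) / len(union) if union else 0.0
        if score > best_score:
            best_scd, best_score = scd, score
    return best_scd if best_score >= threshold else None


def post_process(raw_output, scd_sentences, threshold=0.2):
    """Keep only sentences backed by an SCD and attach corpus citations."""
    kept = []
    for sentence in split_sentences(raw_output):
        scd = match_scd(sentence, scd_sentences, threshold)
        if scd is None:
            continue  # no supporting SCD: possible hallucination, omit
        kept.append({"sentence": sentence, "scd": scd,
                     "citations": scd_sentences[scd]})
    return kept


# Hypothetical corpus with SCDs and a raw LLM answer with one unsupported claim.
corpus = {
    "scd_paper": ["The manuscript is written on paper.",
                  "Paper was produced in Damascus."],
    "scd_owner": ["The owner sold the book.",
                  "A note names the owner of the book."],
}
raw = "The owner sold the book. Dragons guarded the library at night."
answer = post_process(raw, corpus)
```
      </preformat>
      <p>Here the second sentence of the raw answer is dropped because it overlaps too little with any SCD, while the first sentence is returned together with the corpus sentences that serve as its citations.</p>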
      <p>All in all, ChatHA can be used to query RDRs for research tasks in the humanities. The
output includes citations, so we not only reduce hallucinations but also give references for a
closer look.</p>
      <sec id="sec-6-1">
        <title>8https://www.awhamburg.de/forschung/langzeitvorhaben/tamilex.html</title>
        <p>Some questions might go beyond the information available in the corpus and the research data
repository. For this case, it should be possible to include corresponding academic publications
and further web resources about the humanities in the corpus. Such resources might be obtained by
crawling the web and asking ChatHA multiple questions, possibly combined with
a second fine-tuning run on the new corpus. AutoGPT [20] is a chatbot for more sophisticated tasks
that may require multiple questions. It splits a task into multiple questions and runs each
question and the implied tasks before it combines the answers with an automated answer
combination mechanism. Thus, AutoGPT can be used to answer more challenging questions
that require deeper understanding and expressiveness.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>This paper introduces the FToD process, which helps people who are not experts in AI
train a custom NLP model. Data already available in RDRs including a METS file can be
used directly without knowing the specifics of the libraries used. Using the ACP
training data and CAMeLBERT, we showed that the process can improve results using labelled data already
available in the repository.</p>
      <p>After fine-tuning a model, an expert can evaluate the model. In the future, we also plan
to use the evaluation screen for experts to enter new samples into an RDR. When a new text
is submitted, the results from the model are shown. The system should then allow for label
adjustments when necessary and the option to select labels from the pre-trained model if they
are deemed more accurate. Afterwards, this new example will be integrated into the data set
which can be used to fine-tune a potentially improved model. Future work also focuses on the
implementation and refinement of ChatHA.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The research for this contribution was funded by the Deutsche Forschungsgemeinschaft (DFG,
German Research Foundation) under Germany’s Excellence Strategy - EXC 2176 ‘Understanding
Written Artefacts: Material, Interaction and Transmission in Manuscript Cultures’, project no.
390893796.</p>
      <p>[17] F. Kuhr, M. Bender, T. Braun, R. Möller, Augmenting and automating corpus enrichment,
Int. J. Semantic Computing 14 (2020) 173–197. doi:10.1142/S1793351X20400061.</p>
      <p>[18] M. Bender, T. Braun, R. Möller, M. Gehrke, Unsupervised estimation of subjective
content descriptions, Proceedings of the 17th IEEE International Conference on Semantic
Computing (ICSC-23) (2023). doi:10.1109/ICSC56153.2023.00052.</p>
      <p>[19] M. Bender, T. Braun, R. Möller, M. Gehrke, Unsupervised estimation of subjective content
descriptions in an information system, Int. J. Semantic Computing (2023).</p>
      <p>[20] H. Yang, S. Yue, Y. He, Auto-GPT for online decision making: Benchmarks and additional
opinions, 2023. arXiv:2306.02224.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
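          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>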
          ,
          <article-title>Attention is all you need</article-title>
          , in: I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , R. Garnett (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>30</volume>
          ,
          <publisher-name>Curran Associates, Inc.</publisher-name>
          ,
          <year>2017</year>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lei</surname>
          </string-name>
          , Unsupervised Learning: Word Vector, Springer Singapore, Singapore,
          <year>2021</year>
          , pp.
          <fpage>95</fpage>
          -
          <lpage>149</lpage>
          . URL: https://doi.org/10.1007/978-981-16-2233-5_7. doi:10.1007/978-981-16-2233-5_7.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Rothman</surname>
          </string-name>
          ,
          <article-title>Transformers for Natural Language Processing: Build innovative deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, RoBERTa, and more</article-title>
          ,
          <publisher-name>Packt Publishing</publisher-name>
          ,
          <year>2021</year>
          . URL: https://books.google.de/books?id=Cr0YEAAAQBAJ.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          , CoRR abs/1810.04805 (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1810.04805. arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kublik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saboo</surname>
          </string-name>
          ,
          <source>GPT-3: The Ultimate Guide To Building NLP Products With OpenAI API</source>
          ,
          <publisher-name>Packt Publishing</publisher-name>
          ,
          <year>2023</year>
          . URL: https://books.google.de/books?id=fgutEAAAQBAJ.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer</article-title>
          , CoRR abs/1910.10683 (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1910.10683. arXiv:1910.10683.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Inoue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Alhafni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Baimukan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bouamor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Habash</surname>
          </string-name>
          ,
          <article-title>The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models</article-title>
          , in:
          <source>Proceedings of the Sixth Arabic Natural Language Processing Workshop</source>
          , Association for Computational Linguistics, Kyiv, Ukraine (Online),
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sulem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P. B.</given-names>
            <surname>Veyseh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sainz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Heinz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>Recent advances in natural language processing via large pre-trained language models: A survey</article-title>
          ,
          <year>2021</year>
          . arXiv:2111.01243.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Tilp</surname>
          </string-name>
          ,
          <article-title>Compilation and preview of archieved research data with a manifest-file</article-title>
          , Bachelor thesis,
          <source>Universität zu Lübeck</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schiffer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Melzer</surname>
          </string-name>
          , E. Wilden,
          <string-name>
            <given-names>R.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <article-title>TEI-Based Interactive Critical Editions</article-title>
          , in:
          <string-name>
            <given-names>S.</given-names>
            <surname>Uchida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Barney</surname>
          </string-name>
          , V. Eglin (Eds.),
          <source>Document Analysis Systems</source>
          , Springer International Publishing, Cham,
          <year>2022</year>
          , pp.
          <fpage>230</fpage>
          -
          <lpage>244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Stahl</surname>
          </string-name>
          ,
          <article-title>New concepts for previewing of data management repositories</article-title>
          , Bachelor thesis,
          <source>Universität zu Lübeck</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
          , P. von Platen, C. Ma,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Le</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gugger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Drame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lhoest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>Transformers: State-of-the-art natural language processing</article-title>
          , in:
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          . URL: https://aclanthology.org/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Webber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Olgiati</surname>
          </string-name>
          ,
          <article-title>Pretrain Vision and Large Language Models in Python: End-to-end techniques for building and deploying foundation models on AWS</article-title>
          , 1st ed.,
          <publisher-name>Packt Publishing</publisher-name>
          ,
          <year>2023</year>
          , pp.
          <fpage>133</fpage>
          -
          <lpage>219</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Northcutt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Athalye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <article-title>Pervasive label errors in test sets destabilize machine learning benchmarks</article-title>
          ,
          <year>2021</year>
          . arXiv:2103.14749.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>OpenAI</surname>
          </string-name>
          ,
          <source>GPT-4 technical report</source>
          ,
          <year>2023</year>
          . arXiv:2303.08774.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>F.</given-names>
            <surname>Kuhr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <article-title>To Extend or not to Extend? Context-specific Corpus Enrichment</article-title>
          ,
          <source>Proceedings of AI: Advances in Artificial Intelligence</source>
          (
          <year>2019</year>
          )
          <fpage>357</fpage>
          -
          <lpage>368</lpage>
          . doi:10.1007/978-3-030-35288-2_29.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>