<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Understanding High-complexity Technical and Regulatory Documents with State-of-the-Art Models: A Pilot Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bernardo Magnini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Dal Pozzo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Zanoli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Rete Ferroviaria Italiana S.p.A</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We explore the potential of state-of-the-art Large Language Models (LLMs) to reason on the content of high-complexity documents written in Italian. We focus on both technical documents (e.g., describing civil engineering works) and regulatory documents (e.g., describing procedures). While civil engineering documents contain crucial information that supports critical decision-making in construction, transportation and infrastructure projects, procedural documents outline essential guidelines and protocols that ensure efficient operations, adherence to safety standards and effective incident management. Although LLMs offer a promising solution for automating the extraction and comprehension of high-complexity documents, potentially transforming our interaction with technical information, LLMs may encounter significant challenges when processing such documents due to their complex structure, specialized terminology and strong reliance on graphical and visual elements. Moreover, LLMs are known to sometimes produce unexpected or incorrect analyses, a phenomenon referred to as hallucination. The goal of the paper is to conduct an assessment of LLM capabilities along several dimensions, including the format of the document (i.e., selectable text PDFs versus scanned OCR PDFs), the structure of the documents (e.g., number of pages, date of the document), the graphical elements (e.g., tables, graphs, photos), the interpretation of text portions (e.g., producing a summary), and the need for external knowledge (e.g., to interpret a mathematical expression). To run the assessment, we took advantage of GPT-4omni, a large multi-modal model pre-trained on a variety of different data. Our findings suggest that there is great potential for real-world applications for high-complexity documents, although LLMs may still be susceptible to producing misleading information.</p>
      </abstract>
      <kwd-group>
<kwd>LLMs</kwd>
        <kwd>GPT-4omni</kwd>
        <kwd>Information extraction</kwd>
        <kwd>Technical documents</kwd>
        <kwd>Procedural documents</kwd>
        <kwd>Civil engineering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy.
* Corresponding author.
† These authors contributed equally.
magnini@fbk.eu (B. Magnini); a.dalpozzo@rfi.it (A. Dal Pozzo); zanoli@fbk.eu (R. Zanoli)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>As LLMs become more and more powerful, our research questions aim at assessing their ability to extract and interpret key information, thereby reducing the need for manual reviews by human experts. To this end, we have defined a simple question-answer evaluation framework tailored to technical and regulatory documents. As an example, we ask the model questions such as "Provide a general summary of the technical specifications in the document" and then we manually check the model's answer. We also consider the potential for LLMs/LMMs to generate content that is not grounded in the document, an issue often referred to as model confabulation or hallucination [<xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>]. To assess confabulations, we included "trap" questions mentioning non-existing objects in the document. Finally, the assessment considers both selectable text PDFs, which are extractable and editable, and scanned OCR PDFs, where the text is derived from scanning or OCR.</p>
      <p>A state-of-the-art survey of articles published between 2000 and 2021, focusing on the applications of text mining in the construction industry, was presented in [<xref ref-type="bibr" rid="ref3">3</xref>]. [<xref ref-type="bibr" rid="ref4">4</xref>] and [<xref ref-type="bibr" rid="ref5">5</xref>] explored NLP applications and development in construction. Various machine learning and deep learning-based NLP techniques, and their applications in construction research, are documented in [<xref ref-type="bibr" rid="ref6">6</xref>].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Assessment Framework</title>
      <p>We defined a series of questions to assess the model's proficiency in interpreting written text and visual content, including images and graphs. Table 1 lists the queries designed to evaluate how well the model understands textual content, assessing its performance across categories like "Bibliographic Information", "Document Structure" and "Text Interpretation". Similarly, Table 2 presents the list of queries aimed at assessing the model's ability to interpret graphical content, including "Table", "Photo", "Figure", "Mathematical Expression" and "Graph".</p>
      <p>Additionally, we investigated the potential for the model to experience hallucinations by designing "trap" questions intended to induce incorrect responses. For example, a question such as "How tall is the pylon of the Zambana Vecchia-Fai della Paganella cableway mentioned in paragraph 12.6?" was posed, even though neither the specified paragraph nor the whole document contains any information about cableways. Other instances include queries like "What is the highest value in the fifth column of Table 12.8.1-1?", despite the specified table having only four columns. Trap questions are highlighted in bold in the tables.</p>
      <p>Figure 1: Drainage outlets used at the junction points between the bituminous membrane and the rainwater downpipe.</p>
      <p>Human evaluators subsequently reviewed and analyzed all responses provided by the model. Each response generated by the model was evaluated based on the following scoring:
• 2 points for fully accurate responses: the answer meets the prompt's requirements completely, such as providing a full list of figures or a comprehensive summary of the document's key content.
• 1 point for partially correct responses: the answer is incomplete, such as a list of figures missing some entries or a summary that covers some important points but omits others.
• 0 points for incorrect responses: the answer fails to meet requirements, such as a mostly incomplete or missing list of figures or a summary that does not accurately match the document's content.</p>
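<p>The 0-2 point scheme described above maps directly to an aggregate percentage (points earned over the maximum obtainable). A minimal sketch of that aggregation step; the helper names and judgement labels are ours, not from the paper:</p>

```python
def score_response(judgement: str) -> int:
    """Map a human judgement to the paper's 0-2 point scale."""
    points = {"fully_accurate": 2, "partially_correct": 1, "incorrect": 0}
    return points[judgement]

def accuracy(judgements: list[str]) -> float:
    """Percentage of points earned out of the maximum obtainable (2 per question)."""
    earned = sum(score_response(j) for j in judgements)
    return 100.0 * earned / (2 * len(judgements))

# Three fully accurate answers and one partially correct answer
print(accuracy(["fully_accurate"] * 3 + ["partially_correct"]))  # 87.5
```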
<p>There are several potential real-world applications of LLMs in supporting and enhancing various sectors. Construction firms can exploit LLMs to assist in reviewing technical documents for safety regulations and building codes, helping simplify compliance checks. Additionally, organizations with large document archives can leverage LLMs to identify potential inconsistencies or conflicts in procedures, providing valuable insights for further human review and ensuring adherence to unified operational protocols.</p>
      <sec id="sec-2-1">
        <title>2.1. Model</title>
        <p>For our experiments we use GPT-4omni [<xref ref-type="bibr" rid="ref7">7</xref>], available from OpenAI since April 2024, which represents a significant advance in AI innovation by becoming the first truly multimodal model capable of interpreting and generating various types of data, including text, images and audio.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Dataset</title>
        <p>The dataset for our pilot experiments includes four high-complexity documents: two technical specifications and two regulatory documents. More specifically:
• A 96-page technical specification document for civil engineering works from the Italian railways [<xref ref-type="bibr" rid="ref8">8</xref>].
• A 32-page document on the design of an outdoor swimming pool in Trentino-Alto Adige [<xref ref-type="bibr" rid="ref9">9</xref>].
• A 49-page regulatory document from RFI outlining procedures for investigating railway incidents.
• A 12-page regulatory document from RFI focusing on managing prescriptions and supervising activities by ANSFISA (Agenzia Nazionale per la Sicurezza Ferroviaria).</p>
        <p>Table 1: Queries used to assess textual content (trap questions are highlighted in bold in the original).
Bibliographic Information: Estrai il nome completo degli autori del documento. Estrai il titolo completo del documento. Estrai la data di pubblicazione del documento.
Document Structure: Riporta l'esatto numero di pagine del documento. Riporta l'indice delle tabelle presenti nel documento. Riporta l'indice delle figure presenti nel documento.
Text Interpretation (Documento): Fai un riassunto generale del capitolato tecnico. Quali normative e regolamenti devono essere rispettati secondo il capitolato tecnico? Qual è la timeline del progetto come delineata nel capitolato tecnico? Qual è la lunghezza della fune portante della funivia descritta nel capitolato tecnico?
Text Interpretation (Paragrafo): Riassumi il paragrafo II.12 PROCESSO DI CONDIVISIONE DELLE INDAGINI del documento seguente utilizzando un linguaggio tecnico. Includi tutte le informazioni pertinenti e fornisci un livello di dettaglio approfondito. Indica chiaramente eventuali riferimenti a documenti e procedure pertinenti. Come sono suddivise le attività di manutenzione ordinaria?</p>
        <p>Table 2: Queries used to assess graphical content (trap questions are highlighted in bold in the original).
Table: Qual è il valore richiesto della resistenza a rottura per trazione su un provino longitudinale per la membrana inferiore da 4 mm? Cosa rappresenta la Tabella 12.8.1-2? Quali caratteristiche della membrana sono riportate nella Tabella 12.8.1-1 rispetto alla Tabella 12.8.1-2? Quale è il valore più alto nella quinta colonna della Tabella 12.8.1-1? Per quante tipologie di eventi di cui alla tabella allegato 9 è previsto l'invio dell'Avviso di Accadimento (AA)?
Photo: Descrivi gli oggetti o le persone presenti nella figura 12.8.4.2.6.a. Il tubo verde nella figura passa sopra oppure sotto alla rotaia? Quanti alberi ci sono nella figura?
Figure: Descrivi il contenuto della figura 12.8.4.2.5.c. Nella figura 12.8.4.2.5.c dove va posizionato il bocchettone in HDPN? Cosa rappresenta l'oggetto di colore rosso presente nella figura?
Mathematical Expression: Descrivi a cosa fa riferimento l'espressione matematica 11 ≤ n ≤ 40 riportata nella Tabella 12.14.3.7. Cosa significa il simbolo ≤ nell'espressione matematica? Come si interpreta il prodotto che è presente nell'espressione matematica?
Graph: Cosa è rappresentato nel grafico di figura 1? Cosa rappresenta l'asse delle X e l'asse delle Y del grafico? Quale unità di misura è utilizzata per esprimere i valori sull'asse delle Y? A quale valore della curva del grafico corrisponde il valore 100 delle X?</p>
        <p>The two technical documents are licensed for unrestricted use in non-commercial, educational, or research contexts. In contrast, the two procedural documents related to the Italian railway system are intended only for internal RFI use and cannot be distributed.</p>
        <p>As for the content of the four documents, the first page provides general (bibliographic) information about the document, including publication date and authors. An example is reported in Figure 2.</p>
        <p>Furthermore, the documents contain a combination of photos, figures and tables, exemplified by Figures 1, 3 and 4, respectively. These visual elements are important for explaining technical details and the logical structure of procedures, often substituting written descriptions. This means that the model frequently needs to interpret these visual elements without relying on explanations provided in the text.</p>
        <p>An important feature of our dataset is that it includes both selectable text PDFs and scanned OCR PDFs. More specifically, the three RFI documents are selectable text PDFs, where the text is digital, searchable and can be copied, typically created by word processors or digital publishing software. These documents contain pages with tables and figures, with some tables spanning multiple pages and others presented as images. Certain figures and tables include captions, while others do not. The documents also include formulas and graphs, such as those in Figures 5 and 6. On the other hand, the swimming pool document is a scanned OCR PDF, which is not directly selectable and searchable. Some pages in this document are misaligned compared to the standard orientation, and it also includes tables and figures across the document.</p>
        <p>Table 3 shows a comparison of the key characteristics of these documents.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Contamination Test</title>
        <p>We ran a contamination test to verify that GPT-4omni did not use the documents of our dataset in its pre-training. The test was carried out on the two publicly available technical documents; for the regulatory documents, which are internal to RFI, it was not necessary. For the contamination test, we masked document elements, such as numbers and paragraph identifiers in the text, and asked the model to fill in these gaps. For instance, we prompted the model with tasks like "Replace the MASK marker with the missing paragraph number in the following text". Results indicate that the model was unable to identify the missing words, suggesting that it is likely not to have encountered these documents in the pre-training phase. Moreover, even if prior exposure to the documents could improve GPT's performance, its unfamiliarity with the specific questions and answers should limit its accuracy in responding.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Experimental Setup</title>
      </sec>
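<p>The masking step of the contamination test can be sketched as a small preprocessing routine. This is a minimal illustration under our own assumptions (the regular expression and helper names are ours; the paper does not specify the exact masking rule):</p>

```python
import re

# Pattern for dotted paragraph/table identifiers such as "12.8.1" (our assumption
# about the identifier format used in the masked documents)
IDENTIFIER = re.compile(r"\b\d+(?:\.\d+)+\b")

def mask_identifiers(text: str) -> tuple[str, list[str]]:
    """Replace each identifier with a [MASK] marker and return the hidden values,
    so the model's fill-in answers can later be checked against them."""
    hidden = IDENTIFIER.findall(text)
    return IDENTIFIER.sub("[MASK]", text), hidden

masked, hidden = mask_identifiers("Si veda il paragrafo 12.8.1 per i dettagli.")
print(masked)  # Si veda il paragrafo [MASK] per i dettagli.
```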
    </sec>
    <sec id="sec-3">
      <title>3. Results and Discussion</title>
<p>GPT-4omni achieves an average accuracy of 83.66% on textual content and 88.00% on visual content, resulting in an overall accuracy of 85.83%. However, accuracy drops significantly, to 80.25%, when the model is presented with questions specifically designed to induce errors ("trap" questions). GPT-4omni's scores for both textual content and graphical elements, ranging from 0 (indicating no accuracy) to 1 (indicating perfect accuracy), are provided separately for regular questions (Table 4) and for "trap" questions (Table 5).</p>
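<p>The overall figure is consistent with the unweighted mean of the two per-modality accuracies (an assumption on our part, but the arithmetic matches the reported value):</p>

```python
# Per-modality accuracies reported in the text (percent)
text_acc = 83.66
visual_acc = 88.00

# Overall accuracy as the unweighted mean of the two modalities
overall = round((text_acc + visual_acc) / 2, 2)
print(overall)  # 85.83
```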
      <sec id="sec-3-1">
<title>Bibliographic Information</title>
        <p>A perfect score for both technical and regulatory documents indicates that the model consistently retrieved bibliographic information (author, title, date) accurately.</p>
        <p>Document Structure. GPT-4omni is not perfect at
detecting the structure of the documents. For example, the
model sometimes includes invented entries or omits the
entire index of the technical railway documents. This
could be attributed to the document’s complexity,
containing lengthy table labels (e.g., Table 12.8.2.1-1), a large
number of figures and tables (51), the absence of captions
for some of them, and a high page count (96). We observe
that the model is highly sensitive to the prompts used.
For instance, when prompted with:</p>
        <p>Report the number of tables present in
the document
for a regulatory document, the model inaccurately
returns a result of just one table. In contrast, when we
refined the prompt as:</p>
        <p>Identify all the tables present in the
following document. For each table
found, provide the page number where
it is located and the total number of
tables in the document
the model accurately lists the tables along with their
corresponding pages and correctly identifies six tables.
As for the pool document, the model did not extract the
exact number of pages, likely due to the absence of page
numbers.</p>
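<p>Since the refined wording made the difference, it can be kept as a reusable template; a minimal sketch (the helper and its default wording are ours, adapted from the refined prompt quoted above):</p>

```python
def table_inventory_prompt(scope: str = "the following document") -> str:
    """Build the refined table-enumeration prompt: ask for each table's page
    and the overall count, instead of only asking for a number."""
    return (
        f"Identify all the tables present in {scope}. "
        "For each table found, provide the page number where it is located "
        "and the total number of tables in the document."
    )

print(table_inventory_prompt())
```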
<p>Text Interpretation. The model performs better on the pool document than on the railway documents in
text interpretation. In particular, GPT-4omni makes a
mistake in a paragraph-level “trap" question. When asked
about the height of the cable car pylon mentioned in
paragraph 12.6, the model incorrectly claims it was 43
meters tall, despite neither the paragraph nor the entire
document containing any references to cable cars. As
in the previous case, we found that the model is highly
sensitive to prompt phrasing. For example, when asked
to:</p>
<p>Riassumi il contenuto del paragrafo II.12 PROCESSO DI CONDIVISIONE DELLE INDAGINI</p>
        <p>the model provides a somewhat brief and general response. However, when the prompt was made more specific, such as:</p>
<p>Riassumi il paragrafo II.12 'PROCESSO DI CONDIVISIONE DELLE INDAGINI' del documento seguente utilizzando un linguaggio tecnico. Includi tutte le informazioni pertinenti e fornisci un livello di dettaglio approfondito. Indica chiaramente eventuali riferimenti a documenti e procedure pertinenti</p>
        <p>the model produces a much more accurate and detailed summary.</p>
        <p>Tables. As for interpreting table content, GPT-4omni
performs well in both document types. However, in the
railway document, the model falls into the “trap" question
by attempting to answer the query about the value in the
fifth column of a table with only four columns. When
prompted with:
Quale è il valore più alto nella quinta
colonna della Tabella 12.8.1-1?
the model produced:</p>
        <p>Nella quinta colonna della Tabella 12.8.1-1,
che rappresenta le tolleranze, il valore più
alto è ± 20% per la resistenza a rottura per
trazione su provino longitudinale e
trasversale, e per la stabilità di forma a caldo
despite the absence of a fifth column. The model’s
answer was so detailed that, without verifying the
document, even a human might find it difficult to recognize
that the response had no basis in the actual content.
Photos. GPT-4omni can describe objects in photos of
documents. However, in the railway technical document,
the model missed crucial details, like a green tube and
railway track in Figure 3, which affected its ability to
answer questions about their relative position.
Interestingly, when provided only with the relevant page, the
model correctly identified all objects, including key
elements, suggesting that the document context or photo
placement may influence its comprehension of images.
Figures. In the railway document, GPT-4omni fell into
a “trap" question and additionally failed to accurately
describe the content of Figure 1. Regarding the “trap"
question:</p>
        <p>Cosa rappresenta l’oggetto di colore
rosso presente nella figura?
the model identified a red object that was not present
in the figure and even provided a detailed description of
it:</p>
        <p>L’oggetto di colore rosso presente nella
figura rappresenta il "rivestimento
impermeabile" dello strato di colore rosso, come
indicato nella legenda e nel contesto delle
spiegazioni tecniche fornite nella sezione
pertinente del capitolato tecnico, in
particolare nelle fasi di impermeabilizzazione
straordinaria e ripresa del rivestimento
impermeabile
As for the image description:</p>
        <p>Descrivi il contenuto della figura
12.8.4.2.5.c. Nella figura 12.8.4.2.5.c
dove va posizionato il bocchettone in</p>
        <p>HDPN?</p>
<p>GPT-4omni initially states that it cannot describe the image, as if it did not exist:</p>
        <p>La figura 12.8.4.2.5.c non è visibile nel documento condiviso, quindi non posso fornire una descrizione dei suoi contenuti specifici</p>
        <p>However, in the subsequent question about the placement of the nozzle, the model correctly described it:</p>
        <p>Nella figura 12.8.4.2.5.c, il bocchettone in HDPM va posizionato in corrispondenza dei fori di scarico, come indicato nella figura stessa</p>
        <p>Math Expressions. GPT-4omni demonstrates good performance in interpreting mathematical expressions in technical documents. However, in the railway document, the model made a mistake on the "trap" question asking about multiplication:</p>
        <p>Come si interpreta il prodotto che è presente nell'espressione matematica?</p>
        <p>posed about a formula that did not contain any multiplication:</p>
        <p>Il prodotto presente nell'espressione matematica 11&lt;n&lt;40 non rappresenta un'operazione di moltiplicazione, ma indica semplicemente che la variabile n deve rispettare entrambi i limiti indicati</p>
        <p>This suggests that the model might have misinterpreted the word "product" in the mathematical context.</p>
        <p>Graphs. The results table shows a perfect score for the railway document in interpreting graphs. There is no data for the other documents.</p>
        <p>This study suggests several practical applications of LLMs in various sectors. Automating Compliance Checks for Construction Projects: LLMs can help construction companies review technical documents for safety regulations and building codes. By analyzing specifications, the model can identify parts that may comply with or violate local laws. While this can make compliance easier, human experts must verify the model's findings, because LLMs can make errors or generate false information.</p>
        <p>Identifying Conflicting Procedures in Large Document Archives: Organizations with extensive procedural document archives can use LLMs to find inconsistencies or conflicts between procedures. The model can scan large amounts of text and highlight contradictions, providing a basis for human review. This helps companies resolve discrepancies efficiently.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>We showed that GPT-4omni has a high potential for analyzing technical and regulatory documents. However, the model tends to make factual errors, to generate inaccurate details and to provide misleading answers supported by technical explanations. These observations highlight potential limitations when handling long and complex documents, and further research is needed to better understand and address these challenges. Our study has some limitations that should be considered.</p>
      <p>Limited Sample Size. The evaluation was based on a dataset of four documents, which may not be representative of the broader range of technical documents.</p>
      <p>Query Format. We employed a multi-question prompt format, grouping multiple questions within a single prompt. We plan to explore an approach where each question is presented as an individual prompt.</p>
      <p>Examining Positional Bias. There is a possibility that the answer location within the document (beginning, middle, or end) might affect the model's performance.</p>
      <p>Contextual Sensitivity Analysis. The amount of context provided could influence GPT in answering questions related to specific document elements. We plan to systematically compare the model's accuracy when presented with the entire document versus just the relevant page containing the answer.</p>
      <p>Playground vs. API Analysis. We primarily used the OpenAI API for evaluation. It would be valuable to explore whether analyzing documents through OpenAI's Playground interface yields similar results.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by the PNRR
project FAIR - Future AI Research (PE00000013), under
the NRRP MUR program funded by the
NextGenerationEU.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
<mixed-citation>[1] Y. Xiao, W. Y. Wang, On hallucination and predictive uncertainty in conditional language generation, in: P. Merlo, J. Tiedemann, R. Tsarfaty (Eds.), Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, Online, 2021, pp. 2734-2744. URL: https://aclanthology.org/2021.eacl-main.236. doi:10.18653/v1/2021.eacl-main.236.</mixed-citation>
      </ref>
      <ref id="ref2">
<mixed-citation>[2] A. Rohrbach, L. A. Hendricks, K. Burns, T. Darrell, K. Saenko, Object hallucination in image captioning, in: E. Riloff, D. Chiang, J. Hockenmaier, J. Tsujii (Eds.), Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 4035-4045. URL: https://aclanthology.org/D18-1437. doi:10.18653/v1/D18-1437.</mixed-citation>
      </ref>
      <ref id="ref3">
<mixed-citation>[3] H. Yan, M. Ma, Y. Wu, H. Fan, C. Dong, Overview and analysis of the text mining applications in the construction industry, Heliyon 8 (2022) e12088. URL: https://www.sciencedirect.com/science/article/pii/S240584402203376X. doi:10.1016/j.heliyon.2022.e12088.</mixed-citation>
      </ref>
      <ref id="ref4">
<mixed-citation>[4] Y. Ding, J. Ma, X. Luo, Applications of natural language processing in construction, Automation in Construction 136 (2022) 104169. URL: https://www.sciencedirect.com/science/article/pii/S0926580522000425. doi:10.1016/j.autcon.2022.104169.</mixed-citation>
      </ref>
      <ref id="ref5">
<mixed-citation>[5] A. Shamshiri, K. R. Ryu, J. Y. Park, Text mining and natural language processing in construction, Automation in Construction 158 (2024) 105200. URL: https://www.sciencedirect.com/science/article/pii/S0926580523004600. doi:10.1016/j.autcon.2023.105200.</mixed-citation>
      </ref>
      <ref id="ref6">
<mixed-citation>[6] A. Erfani, Q. Cui, Natural language processing application in construction domain: An integrative review and algorithms comparison, 2022, pp. 26-33. doi:10.1061/9780784483893.004.</mixed-citation>
      </ref>
      <ref id="ref7">
<mixed-citation>[7] OpenAI, GPT-4 technical report, 2024. URL: https://arxiv.org/abs/2303.08774. arXiv:2303.08774.</mixed-citation>
      </ref>
      <ref id="ref8">
<mixed-citation>[8] A. Annicchiarico, Capitolato - parte II - sezione 12 - ponti, viadotti, sottovia e cavalcavia, Pubblica Amministrazione, 2020. URL: https://condivisionext.rfi.it/mimse/Documenti%20condivisi/PFTE%20Velocizzazione%20Roma-Pescara%20-%20Lotto%201%20-%20Interporto-Manoppello/Riscontro%20osservazioni%20Comitato%20Speciale%20CSLLPP/Integrazione%20documentale/1_Capitolato%20generale%20tecnico%20OOCC/Capitolato%20-%20Parte%20II%20-%20Sezione%2012%20-%20Ponti,%20Viadotti,%20Sottovia%20e%20Cavalcavia.pdf, accessed: July 18, 2024.</mixed-citation>
      </ref>
      <ref id="ref9">
<mixed-citation>[9] R. Luciano, Riqualificazione punto natatorio, Comune di Lavis, 2016. URL: https://apl.provincia.tn.it/content/download/12939/230226/version/1/file/Riqualificazione+punto+natatorio.pdf, accessed: July 18, 2024.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>