ArchiBERTo: a Hierarchization Quality Objectives NLP Tool in the
Italian Architecture, Engineering and Construction Sector
Mirko Locatelli 1, Lavinia C. Tagliabue 2 and Giuseppe M. Di Giuda 3
1 Politecnico di Milano, Department of Architecture, Built Environment and Construction Engineering, 20133 Milan, Italy
2 Università degli Studi di Torino, Department of Computer Science, 10149 Turin, Italy
3 Università degli Studi di Torino, Department of Management, 10134 Turin, Italy


Abstract
Natural language is the main means of communication during the pre-design phase, and effective communication among the actors during this crucial phase is essential to the success of the design project. In the proposed study, textual data is processed via an NLP tool (ArchiBERTo) specifically developed for the elaboration of Design Guidance Documents (DIPs), pivotal documents in the pre-design stages of the design and construction procurement process in Italy. The DIP defines the demands and objectives of the public appointing party. The tool processes the DIP sentences related to quality objectives and translates them into a list of hierarchized objectives and criteria. To evaluate ArchiBERTo's performance, the outputs generated by the tool are compared with the objective rankings provided by a group of architecture and construction experts. The results show that the tool is able to mirror, to a good degree, the collective capability and sensitivity of the group of experts in the design and construction domain.

Keywords
                 Collective intelligence, BERT, School buildings

1. Introduction
1.1. Natural language and pre-design phase in Architecture, Engineering and Construction sector
    Pre-design is the initial, and a crucial, phase of the architectural design and construction process, and it has a significant impact on the project's value [1]. During the pre-design phase, project goals and objectives are defined and conveyed to the designers in order to reach a consensus between the stakeholders' needs and demands and the designers' proposals. Effective communication, and the consequent proper understanding of requests and requirements by all the involved parties, is the main goal of the pre-design phase [2], since the definition, communication, and understanding of objectives are critical factors for the success of design and construction projects [3]. In the pre-design phase, communication mainly takes place through verbal expressions collected and shared in multiple text documents [4], so natural language is the main source of information at this stage of the design and construction process. However, natural language can lead to misinterpretations, or at least to different interpretations and complexities [5], primarily in defining the relative importance of the quality needs and objectives to be pursued in the project. In fact, the hierarchy assigned to demands, especially qualitative ones, varies greatly from person to person, being an individual judgment influenced by countless factors and biases. In addition, since direct communication between the actors involved in a public call for tenders is prohibited, the aforementioned obstacles inherent in the use of natural language are exacerbated [6].

AIxPA 2022: 1st Workshop on AI for Public Administration, December 2nd, 2022, Udine, IT
EMAIL: mirko.locatelli@polimi.it (M. Locatelli); laviniachiara.tagliabue@unito.it (L.C. Tagliabue); giuseppemartino.digiuda@unito.it
(G.M. Di Giuda)
ORCID: 0000-0003-0100-3169 (M. Locatelli); 0000-0002-3059-4204 (L.C. Tagliabue); 0000-0002-2294-0402 (G.M. Di Giuda)
              ©️ 2022 Copyright for this paper by its authors.
              Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
              CEUR Workshop Proceedings (CEUR-WS.org)
1.2. Italian public design call for tenders: actors, documents, and procedure
    An overview of the design call for tenders procedure and of the roles of the actors involved in the Italian design and construction sector is presented to better identify the critical points in the information flow linked with the mandatory steps of the tender procedure. The research context and objectives are also introduced in the following sections. In the Italian context, the procedure of a public design call for tenders involves mandatory steps and documents, and the participation of three main actors:
        • Appointing public party: it identifies the needs, objectives, and requirements to be pursued by the design project. Qualitative and quantitative objectives are defined and shared via a Design Guidance Document called Documento di Indirizzo alla Progettazione (DIP).
        • Design teams: teams of designers competing in the call for tenders. To win the tender, they aim to submit a design proposal that meets the public actor's requirements and needs as defined in the DIP.
        • External committee: appointed by the appointing public party, the committee evaluates the design teams' bids to identify the best design project, i.e., the one that most comprehensively complies with the systemic demands and requirements declared as priorities, via natural language expressions, in the DIP.
    From this point of view, the DIP is both the instrument the public party uses to communicate its quality demands and expectations regarding the design and construction of the building and the benchmark adopted to evaluate the design proposals submitted by the competing design teams [6]. Consequently, the DIP aims to ensure that the interventions meet the administration's needs and objectives, which must be clearly identified and stated in the document. To reach this goal, the DIP should give the designers a deep understanding of the needs and objectives, which ought to be properly communicated and shared to lead the design proposals towards the correct goals, and, at the same time, support the external committee in the evaluation of the bids by defining the relative priority and hierarchy of the quality objectives. The DIP contents are regulated in Italian legislation by D.P.R. 207/2010, and the document is divided into two main sections: a quantitative section about the state of the premises, technical requirements, and regulations, which can be defined and supported by identifying alphanumerical parameters; and a qualitative section that describes quality objectives and expectations (e.g., sociocultural value of the project, architectural and landscape quality of the intervention, flexibility of spaces, perceptual comfort, etc.).
    Regarding the quantitative section, architects and engineers are already prepared and equipped with specific digital tools and calculation methods to deal with alphanumerical information, mainly through the Building Information Modelling (BIM) approach. BIM is a methodology to digitally manage the design and construction process that allows a physical asset, such as a building or building components, to be modelled and represented in a virtual environment [7]. The BIM methodology and related enabling tools have been used as a design management approach by the design and construction industry to improve collaboration and communication among construction players, as well as the management of documentation in construction projects, helping to achieve efficiency and effectiveness [8]. Consequently, the external committee's evaluation of the design proposals with respect to standards and technical requirements is currently feasible by applying, and requiring the application of, the BIM methodology, since this evaluation relies on procedures that process and compare parameters and numerical values. From this point of view, the evaluation of technical needs can be supported by requiring the designers to deliver specific Building Information Models. Conversely, since the qualitative section of the DIP is expressed and shared via natural language expressions, the processing and digital management of quality characteristics cannot be handled via traditional BIM methods and digital tools.

1.3. Natural Language Processing in Architecture, Engineering and Construction sector
   As explained, a major part of the quality-related information is expressed in natural-language sentences and exchanged via written text documents. Unstructured data and information, such as written natural language, can be processed and digitally managed using Natural Language Processing (NLP) systems and techniques. NLP aims to enable computers to process human natural language and knowledge [9–13]. NLP systems have already been applied and assessed in several fields of the Architecture, Engineering, Construction and Operations (AECO) sector [14]. A brief overview of the existing NLP applications in AECO is provided to frame existing studies and to detect possible deficiencies and research gaps.
    Articles about NLP applications in AECO are listed below according to major application fields (i.e., Procurement management, Safety management, and Project and construction risk management) and scope/task:
         • Procurement management: Legal clause classification [15,16]; Contract document risk detection [17–19]; Automated detection of contract changes [20]; Dispute resolution facilitation [21].
         • Safety management: Safety risk prediction [22,23]; Construction site accident analysis [24]; Safety incompatibility prediction [25]; Accident and injury prediction [26–28].
         • Project and construction risk management (other than safety and legal risks): Requirement defect detection [29]; Estimation of non-compliance [30–33]; Support for project and construction risk management [34]; Support for or automation of compliance checking [35,36].
    The analysis highlights the lack of applications involving documents belonging to the pre-design or preliminary stages of the design and construction process. For a detailed and in-depth scientometric analysis of the use of NLP and BIM in the AECO sector, the reader is referred to Locatelli et al. [37].

1.4. NLP tool for automatic classification of qualitative objectives
   The study aims to automatically process the quality objectives expressed in DIPs, using an ad-hoc NLP tool (ArchiBERTo) [6,38], and to translate them into a list of hierarchized objectives that supports the evaluation of design proposals for school buildings. The use of ArchiBERTo aims to establish a consensus among the actors on the relative hierarchy of quality needs and objectives, as shown in Figure 1. The manuscript explains the development and assessment steps adopted and the evaluation of the tool, which measures its degree of subjectivity and its customization capability, two features that are fundamental to properly translating sentences related to quality objectives and needs into a list of hierarchized objectives and criteria. The ultimate goal of ArchiBERTo is to minimize the possible different interpretations, by the design teams and by the external committee, of the hierarchy of the appointing party's objectives, thereby enhancing effective communication during the pre-design phase.


Figure 1: Schema of ArchiBERTo DIP processing.

2. Methodology
   For clarity, the methodology section is divided into two main parts. The first explains the development and assessment steps adopted to train and develop the model and provides the metrics used to measure ArchiBERTo's performance. The second focuses on evaluating the NLP tool by calculating its degree of subjectivity in generating objective and criteria rankings and by assessing its capability to produce goal hierarchies customized to the content of the processed documents.

2.1. ArchiBERTo development and assessment steps
    The NLP-based tool (ArchiBERTo) is developed as a multi-label classifier based on the BERT (Bidirectional Encoder Representations from Transformers) language representation model, which provides contextualized embeddings [39]. A multi-label classifier was chosen because it can automatically apply more than one classification label to a single text or sentence, allowing the prediction of multiple, mutually non-exclusive classes [40], which coincide with a defined list of labels. The capability to automatically assign labels and to weight them is the basis for generating the priority ranking of the quality objectives of the DIPs used as a case study. The NLP tool is trained (using data from already validated DIPs) to classify sentences according to the set of predefined labels, which represent the demands and quality requests of the appointing party and the end-users. The main activities to develop and assess the tool are listed below and explained in the following paragraphs:
        • labels definition
        • training and validation dataset production
        • model fine-tuning
        • performance evaluation
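The multi-label behavior described above can be sketched as follows. This is a minimal illustration, assuming the common setup in which each label receives an independent sigmoid probability and a fixed decision threshold (0.5 here) is applied; the paper does not report its threshold, and the label subset and logits are hypothetical.

```python
import math

# Hypothetical subset of label codes from the DIP label set (illustration only).
LABELS = ["A.1", "F.1", "L.1", "O.1"]


def sigmoid(x: float) -> float:
    """Logistic function mapping a raw logit to an independent probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits: list[float], threshold: float = 0.5) -> dict[str, float]:
    """Return every label whose independent probability reaches the threshold.

    Unlike a softmax (single-label) classifier, each label is scored on its own,
    so zero, one, or several labels can be assigned to the same sentence.
    """
    probs = {label: sigmoid(z) for label, z in zip(LABELS, logits)}
    return {label: p for label, p in probs.items() if p >= threshold}


# One sentence whose logits are high for two labels at the same time.
predicted = predict_labels([2.0, -1.5, 0.8, -3.0])
print(sorted(predicted))  # ['A.1', 'L.1']
```

The retained probabilities are the per-label scores that later feed the label weights.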

        2.1.1. Labels definition
    As stated, the NLP tool is trained to classify sentences according to a set of predefined labels. In this phase, it is fundamental to reach a consensus on the number and definition of the labels, which represent the interests and quality objectives of the appointing party and the end-users. Consequently, to reach a consensus among the stakeholders involved, the set of labels must be defined jointly by the appointing party, the end-users (when possible), and the domain experts, in this specific context architects, building engineers, and designers. For the selected case study (Progetto Iscol@), a list of predefined labels defined by the appointing party is already available. The labels result from a cooperation among different experts (i.e., architects, designers, pedagogues, and agronomists) and end-users (i.e., primary and secondary school teachers, principals, and school building janitors), and therefore represent a collective effort of the different stakeholders.

        2.1.2. Training and validation datasets definition
   Since the proposed NLP tool is based on the BERT language model, the model must be fine-tuned to solve the multi-label classification problem in the architecture and design knowledge domain, which requires a certain amount of training and validation data. A general dataset is defined by selecting DIP sentences and manually assigning labels to them; it is then randomly split into a training and a validation dataset at a 0.8:0.2 ratio.
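A minimal sketch of the random 0.8:0.2 split, assuming the labelled sentences are held in memory as (sentence, label set) pairs. The row format, the seed value, and the function name `split_dataset` are illustrative assumptions, not details of the released dataset.

```python
import random


def split_dataset(rows, train_ratio=0.8, seed=42):
    """Randomly split labelled sentences into training and validation sets."""
    shuffled = list(rows)                  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labelled sentences: (sentence, set of label codes).
rows = [(f"sentence {k}", {"A.1"} if k % 2 else {"F.1", "M.4"}) for k in range(10)]
train, valid = split_dataset(rows)
print(len(train), len(valid))  # 8 2
```

Fixing the seed makes the split reproducible, which matters when comparing fine-tuning runs.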
   Manual labelling is a critical task that influences both the accuracy of the NLP tool and its capability to automatically process and properly label sentences about needs and quality objectives. Since the dataset is the knowledge source of the tool, the general dataset is produced through a collaboration between experts with knowledge in the architectural, design, and construction fields. In addition, since the proposed NLP system is applied to a specific case study (Progetto Iscol@), the experts must have a deep knowledge of the strategic objectives of Progetto Iscol@ to correctly label the training sentences. Since Iscol@ members could not be directly involved in the project, a preliminary study of the overall goals, guidelines, and context of Iscol@ is conducted by the selected experts before the training sentences are labelled. Moreover, to avoid biases in the production of the dataset, each expert is asked to independently propose a labelling hypothesis for each sentence. The experts then share their hypotheses and, in case of disagreement on some labels, they are asked to explain the motivation of their label choices and to converge on a single common proposal. The construction of the dataset by different experts aims to allow the model to represent and use their collective knowledge in the labelling activity.
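The independent-labelling step can be sketched as a simple check that separates unanimous annotations from those that need discussion. This is an illustrative assumption about how the workflow could be supported in code (the paper describes a discussion among experts, not software); the data structure, sentence ids, and the function name `separate_agreements` are hypothetical.

```python
def separate_agreements(annotations):
    """Split sentences into unanimous label sets and sentences to discuss.

    annotations maps each sentence id to a list of label sets, one per expert.
    Sentences where all experts proposed the same set are accepted directly;
    the others are returned so the experts can converge on a common proposal.
    """
    agreed, to_discuss = {}, []
    for sentence_id, label_sets in annotations.items():
        if all(s == label_sets[0] for s in label_sets):
            agreed[sentence_id] = set(label_sets[0])
        else:
            to_discuss.append(sentence_id)
    return agreed, to_discuss


# Hypothetical labelling hypotheses from three experts.
annotations = {
    "s1": [{"F.1"}, {"F.1"}, {"F.1"}],
    "s2": [{"L.1", "O.1"}, {"L.1"}, {"L.1", "O.1"}],
}
agreed, to_discuss = separate_agreements(annotations)
print(agreed, to_discuss)  # {'s1': {'F.1'}} ['s2']
```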
   The NLP tool aims to avoid subjectivity in the interpretation of textual information by representing the collective intelligence of a group of experts rather than that of a single expert. Furthermore, since it represents the knowledge of a group of experts, ArchiBERTo aims to outperform a single expert in managing the complexity of analyzing many sentences.

        2.1.3. Model fine-tuning parameters
   Once the dataset is defined, a set of hyperparameters must be set to properly train the BERT model. A hyperparameter is a configuration variable that is external to the model and whose value is not estimated from the data but chosen through a trial-and-error cycle. The list and description of the hyperparameters used to train the NLP tool are provided in Table 1.

Table 1
Hyperparameters description and values setting.
  Hyperparameter        Description
  MaximumLength         Maximum number of tokens per input sequence processed during training
  TrainingBatchSize     Number of training examples used in one iteration
  ValidationBatchSize   Number of examples used for validation in one iteration
  EpochsNumber          Number of epochs; an epoch is one complete pass of the training data through the algorithm
  LearningRate          Step size of the adjustment of the neural network weights along the gradient of the loss function

        2.1.4. Performance assessment metrics and learning curves
    To measure the accuracy of the NLP tool, the model predictions are compared with the human annotation of the validation dataset. Precision (P), Recall (R), and F1-score (F1) metrics are selected to measure the model performance [41].
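For a single label in a multi-label setting, the three metrics can be computed by treating each sentence as a binary decision for that label. The sketch below assumes gold and predicted annotations are stored as one label set per sentence (a hypothetical format) and also checks that F1 is the harmonic mean of P and R using the A.1 values reported in Table 4.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def per_label_scores(gold, pred, label):
    """Precision, recall and F1 of one label over paired gold/predicted label sets."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if label in g and label in p)      # correctly predicted
    fp = sum(1 for g, p in pairs if label not in g and label in p)  # predicted, not annotated
    fn = sum(1 for g, p in pairs if label in g and label not in p)  # annotated, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical validation sentences: human annotation vs model prediction.
gold = [{"A.1"}, {"A.1", "F.1"}, {"F.1"}, {"A.1"}]
pred = [{"A.1"}, {"F.1"}, {"A.1", "F.1"}, {"A.1"}]
print([round(x, 2) for x in per_label_scores(gold, pred, "A.1")])  # [0.67, 0.67, 0.67]

# Consistency check against Table 4, label A.1 (P = 0.71, R = 0.92).
print(round(f1_score(0.71, 0.92), 2))  # 0.8
```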
    Learning curves are also plotted. Learning curves display the training error as a function of the number of iterations in the optimization process, allowing the optimality of a model to be monitored and problems to be diagnosed so that predictions can be optimized [42]. Specifically, training and validation loss curves are plotted to verify that the model is properly trained by identifying possible underfitting or overfitting behaviors.
    In an underfitted model, the training and validation loss curves do not decrease with the number of iterations or epochs. An underfitted model is highly biased and does not capture the relevant information in the data. On the other hand, an overfitted model shows a decreasing training loss curve, reaching low error values as iterations or epochs increase; however, its validation loss decreases until a minimum turning point and then starts to increase. This minimum point marks the beginning of the overfitting behavior of the model. An overfitted model captures and learns from the training data but performs poorly on new and unseen data, showing poor generalization. Consequently, to properly train a model, it is necessary to stop the training process at the global minimum point, i.e., where the validation error trend changes from descending to ascending. In summary, if the training process is stopped before the global minimum point the model is underfitted, and if it is stopped after the global minimum point the model is overfitted, as shown in Figure 2.
Figure 2: Underfitting, overfitting and optimal training zone.
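Selecting the stopping point in Figure 2 amounts to finding the epoch with the lowest validation loss. The sketch below assumes the validation loss has been recorded once per epoch; the loss values are hypothetical. It selects the minimum after the fact, whereas practical early-stopping callbacks usually halt once the validation loss has not improved for a set number of epochs (a patience window).

```python
def best_stopping_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss (the global minimum)."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1


def training_zone(stop_epoch, val_losses):
    """Classify a stopping epoch relative to the validation-loss minimum (Figure 2)."""
    best = best_stopping_epoch(val_losses)
    if stop_epoch < best:
        return "underfitting"
    if stop_epoch > best:
        return "overfitting"
    return "optimal"


# Hypothetical validation losses recorded at the end of each epoch.
val_losses = [0.62, 0.41, 0.30, 0.27, 0.29, 0.34, 0.40]
print(best_stopping_epoch(val_losses))  # 4
print(training_zone(2, val_losses), training_zone(6, val_losses))  # underfitting overfitting
```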

        2.1.5. ArchiBERTo outputs and ranking calculation
   Once ArchiBERTo is fine-tuned and its performance assessed, the NLP tool can be tested by processing new sentences and assigning to each of them the labels and the accuracy degree (the score the classifier assigns to each label) with which each label is associated with the sentence. The accuracy degree values of each processed sentence represent the weights of the labels, and thus the relative priority of the labels/quality objectives for that single sentence. The accuracy values/weights of the labels obtained by processing all the sentences of a document are summed and normalized to define the total weight of each label for the entire DIP (Formula (1)).

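$$\text{Label Weight}_i = \frac{\sum (L_i)}{\sum (L_1) + \sum (L_2) + \sum (L_3) + \dots + \sum (L_n)} \tag{1}$$
where L_i denotes the accuracy value of the i-th label, each sum runs over all the processed sentences of the DIP, and i = (A.1, B.1, C.1…, P.1).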

   The total weight of each label represents the relative importance of the corresponding quality objective to be pursued by the design teams in defining their design proposals and, at the same time, the evaluation criterion to be used by the external committee to evaluate the design proposals.
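Formula (1) can be sketched as follows, assuming each processed sentence yields a mapping from label code to its score and that every returned score is summed (the paper does not say whether scores below a decision threshold are discarded). The DIP content and the scores are hypothetical.

```python
from collections import defaultdict


def label_weights(sentence_scores):
    """Formula (1): sum each label's scores over all sentences, then normalize to 1."""
    totals = defaultdict(float)
    for scores in sentence_scores:          # one dict of label -> score per sentence
        for label, value in scores.items():
            totals[label] += value
    grand_total = sum(totals.values())      # denominator: sum over all n labels
    return {label: total / grand_total for label, total in totals.items()}


def objective_ranking(weights):
    """Labels ordered from highest to lowest total weight."""
    return sorted(weights, key=weights.get, reverse=True)


# Hypothetical classifier scores for the three sentences of a toy DIP.
dip = [
    {"L.1": 0.9, "O.1": 0.6},
    {"F.1": 0.8},
    {"L.1": 0.7, "F.1": 0.2},
]
weights = label_weights(dip)
print(objective_ranking(weights))  # ['L.1', 'F.1', 'O.1']
```

Labels that never receive a score simply do not appear, which is equivalent to a weight of zero at the bottom of the ranking.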

2.2. ArchiBERTo evaluation
        2.2.1. ArchiBERTo subjectivity degree
   As stated, once the development and assessment of the NLP tool are complete, its degree of subjectivity is evaluated. To evaluate ArchiBERTo's subjectivity, the contents of different DIPs related to quality objectives are processed by ArchiBERTo, and the corresponding ranking of objectives/criteria is produced.
   The same DIPs are also analyzed individually by three experts. Each of them hierarchizes the objectives individually, providing a ranking; then the same DIPs are read and analyzed by the three experts collectively, and the group provides a ranking.
   To measure the subjectivity degree of the tool, the rankings generated by the NLP model and the rankings provided by the single experts are compared with the ranking provided collectively by the group of three experts, which is taken as the benchmark. To calculate the discrepancy of each individual expert's evaluation and of the NLP tool's evaluation from the collective evaluation, a score is assigned to each objective, from 1 (last goal/label in the ranking) to 21 (first and most important objective in the ranking). The discrepancy is the difference between the score/position of each objective in the ranking generated by each single expert or by the NLP tool and its score/position in the ranking provided collectively by the group of experts (Figure 3).
Figure 3: Single experts and ArchiBERTo ranking subjectivity measurement.

   Consequently, two outcomes are possible:
      • ArchiBERTo ranking discrepancy for each label > individual expert ranking discrepancy for each label: the tool has a higher degree of subjectivity than the single experts and therefore does not adequately represent the collective capability to translate quality objectives expressed in natural language into the corresponding ranking.
      • ArchiBERTo ranking discrepancy for each label < individual expert ranking discrepancy for each label: the tool is less affected by subjective biases and represents the collective capability of the group of experts to translate natural language expressions into a ranking of objectives.
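The discrepancy measure can be sketched as follows, assuming the discrepancy is the absolute difference between position scores and that every ranking contains the same set of labels. The five-label rankings are hypothetical; in the study, 21 labels are ranked and scored from 21 to 1.

```python
def position_scores(ranking):
    """Score objectives from n (first, most important) down to 1 (last)."""
    n = len(ranking)
    return {label: n - pos for pos, label in enumerate(ranking)}


def discrepancies(candidate, benchmark):
    """Absolute score difference per label between a ranking and the group benchmark."""
    cand, bench = position_scores(candidate), position_scores(benchmark)
    return {label: abs(cand[label] - bench[label]) for label in bench}


# Hypothetical five-label rankings.
group = ["L.1", "O.1", "F.1", "A.1", "M.4"]   # collective benchmark
tool = ["L.1", "F.1", "O.1", "A.1", "M.4"]    # ArchiBERTo
expert = ["F.1", "A.1", "L.1", "M.4", "O.1"]  # one individual expert

d_tool, d_expert = discrepancies(tool, group), discrepancies(expert, group)
print(sum(d_tool.values()), sum(d_expert.values()))  # 2 10
print(all(d_tool[label] <= d_expert[label] for label in group))  # True
```

The per-label comparison mirrors the two conditions above; the summed totals are an additional aggregate view.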

        2.2.2. ArchiBERTo customization degree
   Three different DIPs are processed, and three rankings are produced using the NLP system, while a single ranking is produced by a group of three experts from the analysis of one DIP. The aim is to measure the capability of the NLP system to generate a ranking customized to the content of each DIP, by measuring the variation of the rankings of the different DIPs with respect to the fixed ranking produced collectively by the group of experts (Figure 4). The described experiment measures the flexibility of the proposed system, which must not flatten the rankings of objectives, thus demonstrating the capability of ArchiBERTo to provide a customized prioritization of objectives for different DIPs that mirrors the semantic content of each document.
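One way to quantify this variation is sketched below: each DIP ranking is compared with the fixed expert ranking, and with the other DIP rankings, by summing the absolute changes in position score. The metric, the DIP names, and the rankings are illustrative assumptions; a tool that flattened the rankings would return zero for every comparison.

```python
from itertools import combinations


def position_scores(ranking):
    """Score objectives from n (first) down to 1 (last)."""
    n = len(ranking)
    return {label: n - pos for pos, label in enumerate(ranking)}


def total_shift(a, b):
    """Sum over labels of the absolute change in position score between two rankings."""
    sa, sb = position_scores(a), position_scores(b)
    return sum(abs(sa[label] - sb[label]) for label in sb)


fixed = ["L.1", "O.1", "F.1", "A.1"]  # hypothetical fixed expert ranking
dip_rankings = {                      # hypothetical tool rankings for three DIPs
    "DIP_1": ["L.1", "O.1", "F.1", "A.1"],
    "DIP_2": ["A.1", "L.1", "O.1", "F.1"],
    "DIP_3": ["F.1", "O.1", "L.1", "A.1"],
}
shift_from_fixed = {dip: total_shift(r, fixed) for dip, r in dip_rankings.items()}
pairwise = {
    (a, b): total_shift(dip_rankings[a], dip_rankings[b])
    for a, b in combinations(dip_rankings, 2)
}
print(shift_from_fixed)            # {'DIP_1': 0, 'DIP_2': 6, 'DIP_3': 4}
print(min(pairwise.values()) > 0)  # True
```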


Figure 4: NLP tool ranking customization capability measurement.

3. Case study
3.1. Case study: Progetto Iscol@
    The case study concerns the processing of DIPs regarding the design and construction of school buildings. In particular, Progetto Iscol@, introduced by the Regional Council of Sardinia to address the backwardness of the regional education system by modernizing and expanding the regional school building stock, is chosen as a suitable case study. The public investment aims to achieve high standards of architectural quality and of social and environmental sustainability of the educational facilities. General directions and regional guidelines are shared with the municipalities involved in the interventions in the early stages of Progetto Iscol@. The use of guidelines ensures that all DIPs follow the regional directives, homogenizing the quality objectives of the interventions on the island's school building stock through a shared list and ranking of quality objectives defined by the Iscol@ working group. The completion of the first round of calls for tenders and pilot projects highlighted the impact of using a standardized list and ranking of objectives, with a relative priority fixed for all projects. On the one hand, this proved to be an effective approach for leading different projects toward criteria and objectives in line with the Iscol@ strategic goals. On the other hand, it turned out to be an excessively rigid approach for properly supporting the designers and the external committee in the evaluation of school building projects that differ in geographical-environmental and socio-cultural context, and consequently in quality demands. In fact, given the intrinsic specificity of design and construction projects, which are closely related to and influenced by their context and by different socio-economic and territorial needs, a fixed hierarchy of objectives tends to flatten and eliminate the individual specificities and needs of the different projects. In such a context, NLP can therefore play a crucial role by customizing the prioritization of objectives for each call, mirroring the semantic content of each DIP. The use of the ad-hoc NLP tool ArchiBERTo thus aims to reintroduce proper flexibility and compliance with the specific demands and requests of each project.

4. Results and discussion
4.1. ArchiBERTo development and assessment steps
    As stated, the NLP-based tool ArchiBERTo is developed as a multi-label classifier based on the BERT (Bidirectional Encoder Representations from Transformers) language representation model. The results and details of each step of the development and assessment of the tool are provided in the following sections, followed by a section on the results of the tool evaluation (i.e., subjectivity degree and customization capability).

        4.1.1. Dataset construction
   As stated, a BERT language model is fine-tuned to solve the multi-label classification problem in the architecture and construction knowledge domain. Among the DIPs currently validated and published by the municipalities, twenty-one DIPs are used to develop and assess the NLP tool. From the DIP collection, a dataset is produced and split (with a 0.8:0.2 ratio) into a training and a validation dataset. Table 2 shows the label topics and the number of manually labelled sentences for each label.
   The dataset is available at the following link in .csv format: Github_ArchiBERTo_dataset.

Table 2
Training and validation dataset overview.
  Label   Label/Objective topic   Sentences
  A.1)    Capability of the school building to be used as a Civic Center   133
  B.1)    Visibility and integration of sustainable design choices (educational medium) and integration of the intervention into nature and application of landscape enhancement strategies   45
  C.1)    Possibility of personalization of spaces and equipment to prevent vandalism, creating a feeling of belonging in users   25
  D.1)    Spatial and volumetric integration of the intervention in the context and with existing buildings (shape, materials, colors, connections, etc.) and proper mediation with the demand for visibility and architectural quality of the intervention as a building containing public functions   68
  E.1)    Articulation of spaces and accesses with a focus on simple and clear identification of the various functions, including through colors and signage   61
  E.2)    Presence of green spaces as an integral part of the design   34
  F.1)    Perceptual quality (natural and artificial light) and psychophysical comfort (visual, thermo-hygrometric, acoustic, etc.) to promote comfort and learning   144
  F.2)    Indoor air quality and healthiness   29
  G.1)    Cleanability, durability, maintainability, and replaceability of landscaping, materials, and greenery to reduce operating and maintenance costs   41
  I.1)    Integration of the intervention with the road system and distinction between driveways, bicycle, and pedestrian paths; provision of areas and equipment to encourage slow and non-motorized mobility   36
  I.2)    Ensuring accessibility and usability for people with disabilities   40
  L.1)    Fostering interactions between students and teachers, group work, and peer learning (collaborative learning and peer tutoring) by supporting innovative and inclusive teaching. Architecture should support the idea of space as a “third teacher”   197
  L.2)    Visual and spatial continuity between outdoor (green and non-green) and indoor environments to encourage outdoor educational activities and enhance contact with the natural environment (outdoor space can be used as a second classroom). Connection between classroom and circulation spaces. The architecture should support the concept of openness of the traditional classroom and the concept of learning landscape   108
  M.1)    Use of renewable, natural (non-harmful), local materials or materials with recycled content   46
  M.2)    Minimization of the impact of the building on the surrounding environment (noise, light, water pollution, heat island effect, minimization of land consumption and use of soil defense strategies, etc.)   89
  M.3)    Integration between design and renewable energy production systems and exploitation/management of solar, light, and natural cooling and heating inputs   48
  M.4)    Requests regarding energy standards and minimization of consumption (energy, water, etc.), including through monitoring systems   96
  N.1)    Ensuring safety during school activities and separation between activities conducted by people not belonging to the school staff, maintenance activities (spaces and paths). Adequate delimitation of the school perimeter and need for control/supervision   26
  O.1)    Spatial flexibility (furniture, facilities, etc.)   198
  O.2)    Temporal flexibility, possibility of use during curricular and extracurricular hours by citizens and long-term temporal flexibility, adaptability of spaces (readiness for change, adaptability)   103
  P.1)    Usability of technological devices and integration with learning theories. Integration of space and technology; widespread presence of ICT technologies   103

         4.1.2. Model fine-tuning parameters
   Descriptions and values of the hyperparameters used for fine-tuning the NLP model are provided in
Table 3.

Table 3
Hyperparameter descriptions and values.
  Hyperparameter        Description                                                            Value
  MaximumLength         Maximum input sequence length (in tokens) processed during training       85
  TrainingBatchSize     Number of training examples used in one iteration                          2
  ValidationBatchSize   Number of validation examples used in one iteration                       32
  EpochsNumber          Number of complete passes of the training data through the algorithm      20
  LearningRate          Step size of the network weight updates along the gradient of the loss  2E-05
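A minimal sketch, assuming a Python training script, of how the values in Table 3 might be grouped into one configuration object. The field names are hypothetical and are not taken from the ArchiBERTo notebook; in the Hugging Face `transformers` library the same quantities would typically be passed as the tokenizer's `max_length` and the `TrainingArguments` fields `per_device_train_batch_size`, `per_device_eval_batch_size`, `num_train_epochs` and `learning_rate`. The helper methods show how batch size and number of epochs determine the number of weight updates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuningConfig:
    """Hyperparameter values reported in Table 3 (field names are hypothetical)."""
    max_length: int = 85          # maximum input sequence length, in tokens
    train_batch_size: int = 2     # training examples per optimization step
    valid_batch_size: int = 32    # examples per validation step
    epochs: int = 20              # complete passes over the training data
    learning_rate: float = 2e-5   # step size of each weight update

    def steps_per_epoch(self, n_train: int) -> int:
        # The last batch of an epoch may be smaller than train_batch_size.
        return math.ceil(n_train / self.train_batch_size)

    def total_steps(self, n_train: int) -> int:
        # Total number of weight updates over the whole fine-tuning run.
        return self.steps_per_epoch(n_train) * self.epochs


cfg = FineTuningConfig()
n_train = 1000  # hypothetical training-set size, not reported in this section
print(cfg.steps_per_epoch(n_train), cfg.total_steps(n_train))  # 500 10000
```

With a training batch size of 2 and 20 epochs, the total number of weight updates is roughly ten times the number of training sentences, which is why the small learning rate of 2E-05 matters for stable fine-tuning.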

        4.1.3. Fine-tuned model evaluation
   As stated, Precision (P), Recall (R) and F1-score (F1) are calculated on the validation dataset to
measure the performance of the NLP tool predictions (Table 4).

Table 4
Model Precision, Recall, and F1-score values per label.
        Label                     Recall (R)              Precision (P)           F1-score (F1)
         A.1)                       0.92                      0.71                    0.80
         B.1)                       0.75                      0.75                    0.75
         C.1)                       1.00                      0.75                    0.86
         D.1)                       0.42                      0.56                    0.48
         E.1)                       0.64                      0.88                    0.74
         E.2)                       0.33                      0.67                    0.44
         F.1)                       0.77                      0.68                    0.72
         F.2)                       1.00                      1.00                    1.00
         G.1)                       0.90                      1.00                    0.95
         I.1)                       0.67                      0.67                    0.67
         I.2)                       0.86                      0.55                    0.67
         L.1)                       0.80                      0.67                    0.73
         L.2)                       0.39                      0.85                    0.54
         M.1)                       1.00                      0.67                    0.80
         M.2)                       0.90                      0.82                    0.86
         M.3)                       0.86                      0.67                    0.75
         M.4)                       0.60                      0.64                    0.62
         N.1)                       0.50                      1.00                    0.67
         O.1)                       0.71                      0.88                    0.78
         O.2)                       0.78                      0.44                    0.56
         P.1)                       1.00                      0.65                    0.79

   The results indicate that the NLP model is adequately fine-tuned: only labels D.1) and E.2) have an
F1-score lower than 0.5.
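The per-label scores in Table 4 follow the usual one-vs-rest definitions for multi-label classification: for each label, P = TP/(TP+FP), R = TP/(TP+FN), and F1 is the harmonic mean of P and R. The plain-Python sketch below illustrates these definitions on binary indicator rows (one row per sentence, one column per label); it is not the evaluation code of the ArchiBERTo notebook, and the toy data are hypothetical. For the same kind of input, scikit-learn's `precision_recall_fscore_support(..., average=None)` returns the same per-label quantities.

```python
from typing import Dict, List, Sequence


def per_label_prf(y_true: Sequence[Sequence[int]],
                  y_pred: Sequence[Sequence[int]],
                  labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest precision, recall and F1 for each label column."""
    scores: Dict[str, Dict[str, float]] = {}
    for j, label in enumerate(labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = {"P": precision, "R": recall, "F1": f1}
    return scores


# Hypothetical validation data: 4 sentences, 2 labels.
labels: List[str] = ["A.1)", "B.1)"]
y_true = [[1, 0], [1, 1], [0, 1], [1, 0]]
y_pred = [[1, 0], [0, 1], [1, 1], [1, 0]]
for name, s in per_label_prf(y_true, y_pred, labels).items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

The same relation can be checked against Table 4: for A.1), 2 × 0.71 × 0.92 / (0.71 + 0.92) ≈ 0.80, matching the reported F1-score. For D.1) and E.2), the lower of the two components is recall (0.42 and 0.33), which pulls the harmonic mean below 0.5.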
   In addition, the training and validation loss curves are plotted to detect possible overfitting or
underfitting (Figure 5).
Figure 5: Training and validation loss charts.

   Figure 5 shows that the training and validation loss curves gradually decrease and flatten, moving
close to each other; at the global minimum, the validation loss is only slightly greater than the
training loss (validation loss = 0.1031, training loss = 0.08937). Consequently, the model can be
considered fine-tuned, with no evident overfitting or underfitting.
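The reasoning applied to Figure 5 can be expressed as a simple heuristic check: locate the epoch with the minimum validation loss, measure the gap between validation and training loss at that epoch, and check whether the validation loss rises again afterwards, a typical overfitting signal. The sketch below is hypothetical: the function names, the tolerance value and the loss histories are assumptions for illustration, not part of the paper or its notebook.

```python
from typing import List, NamedTuple


class LossDiagnostics(NamedTuple):
    best_epoch: int    # index of the minimum validation loss
    gap: float         # validation minus training loss at best_epoch
    val_rise: float    # increase of the last validation loss over its minimum


def diagnose(train_loss: List[float], val_loss: List[float]) -> LossDiagnostics:
    """Summarize two per-epoch loss histories of equal length."""
    if not val_loss or len(train_loss) != len(val_loss):
        raise ValueError("loss histories must be non-empty and of equal length")
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    return LossDiagnostics(
        best_epoch=best,
        gap=val_loss[best] - train_loss[best],
        val_rise=val_loss[-1] - val_loss[best],
    )


def overfitting_suspected(d: LossDiagnostics, tol: float = 0.02) -> bool:
    # Hypothetical rule: a validation loss that climbs well above its minimum
    # by the end of training suggests the model is memorizing the training data.
    return d.val_rise > tol


# Hypothetical loss histories (not the ArchiBERTo curves).
train = [0.45, 0.20, 0.12, 0.09]
val = [0.48, 0.24, 0.14, 0.10]
d = diagnose(train, val)
print(d, overfitting_suspected(d))
```

At the minimum reported in the text, the gap is 0.1031 - 0.08937 ≈ 0.0137, i.e. the validation loss is only slightly above the training loss, which is the pattern this check treats as a well-fitted model.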
   The code is available as a Jupyter notebook (.ipynb) at the link: Github_ArchiBERTo_code.

        4.1.4. ArchiBERTo output example
    After the fine-tuning and assessment phase, the capability of ArchiBERTo to label new, unseen
sentences is evaluated qualitatively. Figure 6 shows an example of the processing and labelling output
for two quality-related sentences.
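Assuming, as is common for BERT-based multi-label classifiers, that the fine-tuned model outputs one logit per label, a new sentence can be labelled by applying a sigmoid to each logit and keeping every label whose probability reaches a threshold. The sketch below illustrates only this post-processing step; the 0.5 threshold, the function names and the logits are hypothetical, and the actual ArchiBERTo decision rule may differ.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_labels(logits: Sequence[float],
                  labels: Sequence[str],
                  threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (label, probability) pairs at or above threshold, most likely first."""
    if len(logits) != len(labels):
        raise ValueError("one logit per label is required")
    probs = [(lab, sigmoid(z)) for lab, z in zip(labels, logits)]
    kept = [(lab, p) for lab, p in probs if p >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical logits for one DIP sentence over three of the 21 labels.
labels = ["L.2)", "M.1)", "O.1)"]
logits = [2.0, -1.5, 0.3]
for label, prob in decode_labels(logits, labels):
    print(f"{label}: {prob:.2f}")
```

Because each label gets an independent probability, a single sentence can receive several labels (here L.2) and O.1)) or none at all, which suits DIP sentences that often express more than one quality objective.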


Figure 6: Outputs of the NLP tool processing and labelling of DIP sentences.
    4.2.        ArchiBERTo evaluation: subjectivity degree
   The following sections present and discuss the results of the evaluation of the tool's subjectivity
and customization degree, in order to demonstrate the capability of ArchiBERTo to mirror the collective
intelligence of the group of experts and to customize the objectives rankings according to the content of
different DIPs.

        4.2.1. DIP number 1: Sassari primary school
   ArchiBERTo is applied to the DIP of a primary school located in the municipality of Sassari,
Sardinia. For 12 of the 21 labels/objectives, the ranking generated by the NLP tool has the lowest
discrepancy; for 4 labels, its discrepancy is lower than that of two of the experts; for another 4 labels,
B.1), D.1), F.1) and O.2), it is lower than that of one expert; and for only 1 objective, A.1), the NLP
tool has the highest discrepancy with respect to the benchmark.
Moreover, considering the discrepancy (d) of the ranking provided by the NLP tool:
   •     15 labels show           d < 10%
   •     4 labels show            10% < d < 20%
   •     4 labels show            20% < d < 30%
   •     1 label shows            d > 30%
   Figure 7 shows that the average discrepancy of the ranking generated by the NLP tool is 10%, the
lowest value compared with the average discrepancies of the individual experts' rankings. Consequently,
the tool appears less subjective than the individual experts, mirroring the collective capability of the
group to translate quality-related natural language expressions.


Figure 7: Single experts and NLP tool total discrepancy, DIP_01.
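The counts reported above can be derived mechanically once a per-label discrepancy has been computed for the tool and for each of the three experts. The sketch below takes those discrepancies as given, since their definition is not restated in this section; it assumes half-open bins (e.g. 10% ≤ d < 20%) so that values falling exactly on a boundary are counted once, and it reports, for each label, the tool's position among the four rankings. All numbers and function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence

BIN_EDGES = (0.10, 0.20, 0.30)  # assumed half-open bins: [0, 10%), [10%, 20%), ...
BIN_NAMES = ("d < 10%", "10% <= d < 20%", "20% <= d < 30%", "d >= 30%")


def bin_counts(d: Sequence[float]) -> Dict[str, int]:
    """Count labels per discrepancy bin (d expressed as a fraction)."""
    counts = {name: 0 for name in BIN_NAMES}
    for x in d:
        idx = sum(x >= edge for edge in BIN_EDGES)
        counts[BIN_NAMES[idx]] += 1
    return counts


def tool_positions(tool: Sequence[float],
                   experts: Sequence[Sequence[float]]) -> List[int]:
    """Per label: 1 + number of experts with a strictly lower discrepancy."""
    return [1 + sum(e[i] < tool[i] for e in experts) for i in range(len(tool))]


# Hypothetical discrepancies for 5 labels: the tool and three experts.
tool = [0.05, 0.12, 0.08, 0.25, 0.31]
experts = [[0.10, 0.15, 0.20, 0.22, 0.40],
           [0.07, 0.11, 0.09, 0.35, 0.35],
           [0.20, 0.30, 0.12, 0.28, 0.33]]
print(bin_counts(tool))
print(tool_positions(tool, experts))
print(f"tool average: {mean(tool):.0%}", [f"{mean(e):.0%}" for e in experts])
```

The position list maps onto the breakdown used in the text: 1 = lowest discrepancy, 2 = lower than two experts, 3 = lower than one expert, 4 = highest. Ties are resolved in the tool's favour here, which is a further assumption.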

        4.2.2. DIP number 2: Nuoro secondary school
   ArchiBERTo is applied to the DIP of a secondary school located in the municipality of Tortolì,
province of Nuoro, Sardinia. For 10 of the 21 labels/objectives, the ranking generated by the NLP tool
has the lowest discrepancy; for 7 labels, its discrepancy is lower than that of two of the experts; for
3 labels, C.1), E.1) and M.2), it is lower than that of one expert; and for only 1 objective, G.1), the
NLP tool has the highest discrepancy with respect to the benchmark. Moreover, considering the
discrepancy (d) of the ranking provided by the NLP tool:
   •    13 labels show            d < 10%
   •    7 labels show             10% < d < 20%
   •     3 labels show           20% < d < 30%
   •     1 label shows           d > 30%
   Figure 8 shows that the average discrepancy of the ranking generated by the NLP tool is 11%, the
lowest value compared with the average discrepancies of the individual experts' rankings. The results
for the second DIP confirm the lower subjectivity of the NLP tool and its capability to mirror the
collective knowledge of the group of experts in the objectives ranking task.


Figure 8: Single experts and NLP tool total discrepancy, DIP_02.

4.3.    ArchiBERTo evaluation: customization degree
        4.3.1. DIP number 3-4-5
   The capability of the NLP system to generate rankings customized to different DIPs is evaluated by
measuring the variation of the rankings generated for three different DIPs with respect to a fixed
ranking produced collectively by the three experts. The variation obtained for each DIP is shown in
Figure 9.


Figure 9: DIPs ranking variation comparison with a fixed ranking.
   The DIP that the group of experts analyzed and translated into the fixed ranking, against which the
ArchiBERTo outputs for the three DIPs are compared, concerns the design and construction of a
secondary school building. DIP_03 and DIP_05 also concern the design of a secondary school, while
DIP_04 concerns the construction of a new primary school.
   The NLP tool appears to show a good degree of customization, with an average variation of 20%, 15%,
and 17% for the three DIPs analyzed, respectively.
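This section does not restate how the variation between two rankings is computed. Purely as an illustration, one simple choice is the normalized Spearman footrule: the sum of the absolute differences between each objective's rank positions in the two rankings, divided by its maximum possible value, ⌊n²/2⌋. This is an assumption for illustration and not necessarily the measure behind Figure 9; it also assumes rankings without ties (positions 1..n). The function name and example rankings are hypothetical.

```python
from typing import Dict


def footrule_variation(fixed: Dict[str, int], ranking: Dict[str, int]) -> float:
    """Normalized Spearman footrule between two rankings of the same objectives.

    Each dict maps an objective label to its rank position (1 = top priority).
    Returns 0.0 for identical rankings and 1.0 for the maximum possible
    total displacement, floor(n^2 / 2).
    """
    if fixed.keys() != ranking.keys():
        raise ValueError("rankings must cover the same objectives")
    n = len(fixed)
    if n < 2:
        return 0.0
    total = sum(abs(fixed[k] - ranking[k]) for k in fixed)
    return total / (n * n // 2)


# Hypothetical rankings of four objectives.
fixed = {"A.1)": 1, "L.2)": 2, "M.4)": 3, "O.1)": 4}
dip = {"A.1)": 2, "L.2)": 1, "M.4)": 3, "O.1)": 4}
print(f"{footrule_variation(fixed, dip):.0%}")
```

Under such a measure, a variation of 0% would mean the tool returns the fixed ranking regardless of the DIP, while larger values indicate stronger adaptation to each document's content.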

5. Conclusion
    The study aims to demonstrate the capability of an ad-hoc developed NLP tool (ArchiBERTo) to
automatically process and translate quality objectives expressed in DIPs of school building projects into
a list of hierarchized objectives to support the evaluation process of the design proposals in the Italian
design and construction context of Progetto Iscol@. The use of ArchiBERTo aims to establish a
consensus among the actors involved (i.e., the public appointing party, the design teams, and the
evaluation committee) on the relative hierarchy of quality needs and objectives by providing a list of
ranked objectives and criteria, thereby enhancing effective communication during the pre-design phase.
    According to the results, ArchiBERTo seems capable of mirroring the collective capability and
sensitivity of a group of experts in the architecture and construction knowledge domain when ranking
sentences related to quality objectives, reducing subjectivity in the interpretation of textual
information. Moreover, since ArchiBERTo represents the knowledge of a group of experts, it is less
affected by subjectivity and outperforms a single expert in handling the complexity of analyzing the
many sentences contained in a DIP document.
    Regarding the capability to produce customized rankings based on the content of different DIPs,
ArchiBERTo shows a good degree of customization. This confirms the flexibility of the proposed system,
which does not flatten the ranking of objectives and departs from the use of a fixed evaluation grid, as
was instead the case in the evaluation of the first cycle of the call for tenders and pilot projects of
Progetto Iscol@. ArchiBERTo provides a customized prioritization of objectives for each processed DIP,
mirroring the semantic content of each document and thus reintroducing the flexibility needed to comply
with the specific quality demands of different projects.
    In the Italian public tender procedure, the prioritization ranking generated by the NLP tool can be
shared, along with the tender documentation, with the design teams participating in the call for tenders,
improving communication and giving designers a full understanding of the appointing party's needs and
quality objectives, and with the external committee, supporting the evaluation and comparison of the
design projects.
    Moreover, considering the aforementioned limitation of the BIM approach in managing unstructured
information, such as the natural language information in the qualitative section of a DIP document,
the proposed methodology is an attempt to extend the digitalization of the design and construction
process to unstructured natural language data. The combined use of BIM and NLP methodologies and tools
could help architects and engineers to digitally manage both sides of the design and construction data:
the alphanumeric (structured) side and the non-alphanumeric (unstructured) side, the management of the
latter being a key step in the digitalization of the design and construction industry.

6. Acknowledgements
   The authors take the opportunity to acknowledge and thank Valerio Basile, assistant professor in the
Content-Centered Computing group at the University of Turin, for the technical support in the
development and assessment of ArchiBERTo.
7. References
[1]    R. R. Senescu, J. R. Haymaker, S. Meža, M. A. Fischer, Design Process Communication
       Methodology: Improving the Effectiveness and Efficiency of Collaboration, Sharing, and
       Understanding, Journal of Architectural Engineering. 20 (2014) 1–14.
       doi:10.1061/(ASCE)AE.1943-5568.0000122.
[2]    N. Norouzi, M. Shabak, M. R. Bin Embi, T. H. Khan, The Architect, the Client and Effective
       Communication in Architectural Design Practice, in: Procedia - Social and Behavioral Sciences,
       Elsevier B.V., 2015, pp. 635–642. doi:10.1016/j.sbspro.2015.01.413.
[3]    H. Taleb, S. Ismail, M. H. Wahab, W. N. M. W. M. Rani, Communication management between
       architects and clients, in: AIP Conference Proceedings, 2017, pp. 1–6. doi:10.1063/1.5005469.
[4]    G. M. Di Giuda, M. Locatelli, E. Seghezzi, Natural Language Processing and BIM In AECO
       Sector: A State Of The Art, in: Proceedings of the Fifth Australasia and South-East Asia
       Structural Engineering and Construction Conference, ISEC Press, Christchurch, New Zealand,
       2020, pp. 1–6. doi:10.14455/ISEC.2020.7(2).CON-22.
[5]    S. Sun, L. Li, Application of Deep Learning Model Based on Big Data in Semantic Sentiment
       Analysis, in: The 2021 International Conference on Machine Learning and Big Data Analytics
       for IoT Security and Privacy. SPIoT 2021. Lecture Notes on Data Engineering and
       Communications Technologies, Springer Science and Business Media Deutschland GmbH,
       Shanghai, China, 2022, pp. 590–597. doi:10.1007/978-3-030-89508-2_76.
[6]    M. Locatelli, G. Pattini, E. Seghezzi, L. C. Tagliabue, G. M. Di Giuda, NLP-based
       system for automatic processing of quality demands in Italian public procedure: a system
       engineering formalization, in: 2022 European Conference on Computing in Construction, Ixia,
       Rhodes, Greece, 2022, pp. 1–8. doi:10.35490/EC3.2022.176.
[7]    F. Ameziane, Information system for building production management, International Journal of
       Production Economics. 64 (2000) 345–358. doi:10.1016/S0925-5273(99)00071-7.
[8]    A. A. Latiffi, J. Brahim, M. S. Fathi, The Development of Building Information Modeling (BIM)
       Definition, Applied Mechanics and Materials. 567 (2014) 625–630.
       doi:10.4028/www.scientific.net/AMM.567.625.
[9]    T. Young, D. Hazarika, S. Poria, E. Cambria, Recent trends in deep learning based natural
       language processing, IEEE Computational Intelligence Magazine. 13 (2018) 55–75.
       doi:10.1109/MCI.2018.2840738.
[10]   C. A. Montgomery, Linguistics and Automated Language Processing, in: International
       Conference on Computational Linguistics COLING, 1969, pp. 1–25.
[11]   M. Pacak, A. W. Pratt, The function of semantics in automated language processing, in: SIGIR
       ’71: Proceedings of the 1971 International ACM SIGIR Conference on Information Storage and
       Retrieval, 1971, pp. 5–18. doi:10.1145/511285.511288.
[12]   J. Barnett, K. Knight, I. Mani, E. Rich, Knowledge and Natural Language Processing,
       Communications of the ACM. 33 (1990) 49–63. doi:10.1145/79173.79177.
[13]   A. Lenci, S. Montemagni, V. Pirelli, Testo e computer. Elementi di linguistica computazionale,
       6th ed., Carocci editore@Aulamagna, Rome, 2005.
[14]   C. Wu, X. Li, Y. Guo, J. Wang, Z. Ren, M. Wang, Z. Yang, Natural language processing for
       smart construction: Current status and future directions, Automation in Construction. 134
       (2022). doi:10.1016/j.autcon.2021.104059.
[15]   F. U. Hassan, T. Le, Automated Requirements Identification from Construction Contract
       Documents Using Natural Language Processing, Journal of Legal Affairs and Dispute
       Resolution in Engineering and Construction. 12 (2020). doi:10.1061/(ASCE)LA.1943-4170.0000379.
[16]   F. ul Hassan, T. Le, Computer-assisted separation of design-build contract requirements to
       support subcontract drafting, Automation in Construction. 122 (2021).
       doi:10.1016/j.autcon.2020.103479.
[17]   T. Mahfouz, A. Kandil, S. Davlyatov, Identification of latent legal knowledge in differing site
       condition (DSC) litigations, Automation in Construction. 94 (2018) 104–111.
       doi:10.1016/j.autcon.2018.06.011.
[18]   J. Lee, J.-S. Yi, J. Son, Development of Automatic-Extraction Model of Poisonous Clauses in
       International Construction Contracts Using Rule-Based NLP, Journal of Computing in Civil
       Engineering. 33 (2019) 04019003. doi:10.1061/(ASCE)CP.1943-5487.0000807.
[19]   J. Lee, Y. Ham, J.-S. Yi, J. Son, Effective Risk Positioning through Automated Identification of
       Missing Contract Conditions from the Contractor’s Perspective Based on FIDIC Contract Cases,
       Journal of Management in Engineering. 36 (2020) 1–11. doi:10.1061/(ASCE)ME.1943-5479.0000757.
[20]   R. Khalef, I. H. El-adaway, Automated Identification of Substantial Changes in Construction
       Projects of Airport Improvement Program: Machine Learning and Natural Language Processing
       Comparative Analysis, Journal of Management in Engineering. 37 (2021) 1–15.
       doi:10.1061/(ASCE)ME.1943-5479.0000959.
[21]   H. Fan, H. Li, Retrieving similar cases for alternative dispute resolution in construction
       accidents using text mining techniques, Automation in Construction. 34 (2013) 85–91.
       doi:10.1016/j.autcon.2012.10.014.
[22]   B. Zhong, X. Pan, P. E. D. Love, L. Ding, W. Fang, Deep Learning and network analysis:
       Classifying and visualizing accident narratives in construction, Automation in Construction. 113
       (2020) 103089. doi:10.1016/j.autcon.2020.103089.
[23]   A. Ajayi, L. Oyedele, H. Owolabi, O. Akinade, M. Bilal, J. M. Davila Delgado, L. Akanbi, Deep
       Learning Models for Health and Safety Risk Prediction in Power Infrastructure Projects, Risk
       Analysis. 40 (2020) 2019–2039. doi:10.1111/risa.13425.
[24]   F. Zhang, H. Fleyeh, X. Wang, M. Lu, Construction site accident analysis using text mining and
       natural language processing techniques, Automation in Construction. 99 (2019) 238–248.
       doi:10.1016/j.autcon.2018.12.016.
[25]   A. J. P. Tixier, M. R. Hallowell, B. Rajagopalan, D. Bowman, Construction Safety Clash
       Detection: Identifying Safety Incompatibilities among Fundamental Attributes using Data
       Mining, Automation in Construction. 74 (2017) 39–54. doi:10.1016/j.autcon.2016.11.001.
[26]   A. J. P. Tixier, M. R. Hallowell, B. Rajagopalan, D. Bowman, Application of machine learning
       to construction injury prediction, Automation in Construction. 69 (2016) 102–114.
       doi:10.1016/j.autcon.2016.05.016.
[27]   H. Baker, M. R. Hallowell, A. J. P. Tixier, AI-based prediction of independent construction
       safety outcomes from universal attributes, Automation in Construction. 118 (2020) 103146.
       doi:10.1016/j.autcon.2020.103146.
[28]   A. J. P. Tixier, M. R. Hallowell, B. Rajagopalan, D. Bowman, Automated content analysis for
       construction safety: A natural language processing system to extract precursors and outcomes
       from unstructured injury reports, Automation in Construction. 62 (2016) 45–56.
       doi:10.1016/j.autcon.2015.11.001.
[29]   A. Ferrari, G. Gori, B. Rosadini, I. Trotta, S. Bacherini, A. Fantechi, S. Gnesi, Detecting
       requirements defects with NLP patterns: an industrial experience in the railway domain,
       Empirical Software Engineering. 23 (2018) 3684–3733. doi:10.1007/s10664-018-9596-7.
[30]   A. Faraji, M. Rashidi, S. Perera, Text Mining Risk Assessment–Based Model to Conduct
       Uncertainty Analysis of the General Conditions of Contract in Housing Construction Projects:
       Case Study of the NSW GC21, Journal of Architectural Engineering. 27 (2021) 1–17.
       doi:10.1061/(asce)ae.1943-5568.0000489.
[31]   J. Lee, J.-S. Yi, Predicting Project’s Uncertainty Risk in the Bidding Process by Integrating
       Unstructured Text Data and Structured Numerical Data Using Text Mining, Applied Sciences.
       7 (2017) 1–15. doi:10.3390/app7111141.
[32]   M. Bilal, L. O. Oyedele, Big Data with deep learning for benchmarking profitability
       performance in project tendering, Expert Systems with Applications. 147 (2020) 1–19.
       doi:10.1016/j.eswa.2020.113194.
[33]   M. F. F. Siu, W. Y. J. Leung, W. M. D. Chan, A data-driven approach to identify-quantify-
       analyse construction risk for Hong Kong NEC projects, Journal of Civil Engineering and
       Management. 24 (2018) 592–606. doi:10.3846/jcem.2018.6483.
[34]   Y. Zou, A. Kiviniemi, S. W. Jones, Retrieving similar cases for construction project risk
       management using Natural Language Processing techniques, Automation in Construction. 80
       (2017) 66–76. doi:10.1016/j.autcon.2017.04.003.
[35]   R. Zhang, N. El-Gohary, A machine learning-based method for building code requirement
       hierarchy extraction, in: Canadian Society for Civil Engineering Annual Conference, CSCE
       2019, Laval, Canada, 2019, pp. 1–10.
[36]   J. Zhang, N. M. El-Gohary, Integrating semantic NLP and logic reasoning into a unified system
       for fully-automated code checking, Automation in Construction. 73 (2017) 45–57.
       doi:10.1016/j.autcon.2016.08.027.
[37]   M. Locatelli, E. Seghezzi, L. Pellegrini, L. C. Tagliabue, G. M. Di Giuda, Exploring
       Natural Language Processing in Construction and Integration with Building Information
       Modeling: A Scientometric Analysis, Buildings. 11 (2021) 1–33.
       doi:10.3390/buildings11120583.
[38]   M. Locatelli, G. Pattini, L. Pellegrini, S. Meschini, D. Accardo, Fostering the consensus: a
       BERT-based Multi-label Text Classifier to support agreement in Public design call for tenders,
       2022. To appear.
[39]   J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional
       Transformers for Language Understanding, ArXiv. (2019) 16. doi:10.48550/arXiv.1810.04805.
[40]   R. Venkatesan, M. J. Er, Multi-label classification method based on extreme learning machines,
       in: 2014 13th International Conference on Control Automation Robotics and Vision, ICARCV
       2014, 2014, pp. 619–624. doi:10.1109/ICARCV.2014.7064375.
[41]   M. Sokolova, G. Lapalme, A systematic analysis of performance measures for classification
       tasks, Information Processing and Management. 45 (2009) 427–437.
       doi:10.1016/j.ipm.2009.03.002.
[42]   M. L. Osborne, A Modification of Veto Logic for a Committee of Threshold Logic Units and
       the Use of 2-Class Classifiers for Function Estimation, Oregon State University, 1975.