<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Ital-IA</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Organisation-specific Transformers via Semantic Pre-training</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniele Margiotta</string-name>
          <email>margiotta@revealsrl.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danilo Croce</string-name>
          <email>croce@info.uniroma2.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Rotoloni</string-name>
          <email>m.rotoloni@abilab.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barbara Cacciamani</string-name>
          <email>b.cacciamani@abilab.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Basili</string-name>
          <email>basili@info.uniroma2.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ABI Lab</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Roma</institution>
          ,
          <addr-line>Tor Vergata</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <institution>Reveal s.r.l.</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>3</volume>
      <fpage>29</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>AI approaches to business knowledge management have often neglected the role of documents, which are the backbone of the expertise, norms, and optimal practices that every organisation implicitly encodes in its large-scale document collections. Banks are no exception and have to deal with operational documents on business process engineering, as well as with norms on legal compliance aspects. They are thus particularly interested in mining the huge body of knowledge implicitly stored in their text archives, i.e. in their document assets. Extracting semantic metadata from raw bank documents is therefore central for supporting effective governance, business engineering, and legal monitoring processes in an accurate and profitable manner. In this paper, a weakly-supervised neural methodology for creating semantic metadata from bank documents and its application to different banking organisations is presented. Based on a neural pre-training methodology driven by the knowledge models of individual banks, it is shown to improve with respect to previously presented inductive approaches, which are domain-specific but organisation-independent. Its application to business process design in different Italian banks is tested here, and the observed impact, assessed through measurements, confirms its wide applicability at the level of banks, as well as to other business organisations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivations</title>
      <p>Traditional banking technologies focus on transaction processing and data analysis. Artificial Intelligence is promoting the adoption of data-driven methods that can induce expert rules and accurate predictions for financial forecasting tasks, such as the estimation of future values for bonds and equities, the identification of market opportunities, or anti-money laundering decisions [<xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>]. However, dealing with massive unstructured information poses challenges, especially with non-numerical data. Financial information management applications are responding by transforming unstructured data into structured data to support information labeling and searching, and to promote industry development. The banking and financial industry heavily relies on internal documentation to record and regulate processes and organisational units. These texts include regulatory documents, reference models, and terminologies, making up a valuable repository of core data for business analysis and strategic planning.</p>
      <p>In [<xref ref-type="bibr" rid="ref6">6</xref>] a neural architecture was proposed for classifying texts into Process hierarchy classes. This architecture, known as ABILaBERT, was able to associate texts with nodes in the Banking Process Tree provided by ABI Lab. Most notably, ABILaBERT was trained without the need for any labeled text, but rather through a process called textification, where the target taxonomy and the semantic relations between its concepts (i.e. processes) were used to generate a large-scale corpus made of their corresponding textual descriptions. Notice that the ABI Lab taxonomy is representative of a generic bank and can be used for pre-training BERT before fine-tuning is carried out for text classification. However, while this method was shown effective in classifying bank-specific texts through the neutral taxonomy (i.e. the ABI Lab one, [<xref ref-type="bibr" rid="ref6">6</xref>]), it was never applied to bank-specific taxonomies.</p>
      <p>In this paper, we aim at answering the following Research Questions: “Is a unified ABILaBERT model sufficiently accurate for a set of different banks B1, …, Bn?”; “Is fine-tuning of the ABILaBERT model possible against a bank-specific knowledge model?”, or, in other words, “Is a specialization of ABILaBERT towards a bank Bi through pre-training possible and effective to induce a bank-specific model, such as ABILaBERT_Bi?”; “Which kind of fine-tuning is applicable to ABILaBERT_Bi in order to get specific and optimal classifiers for the individual banks?”.</p>
      <p>The experimental evaluation confirms that the combination of pre-training on bank-specific taxonomies and fine-tuning over (semi-automatically annotated) documents is highly beneficial, demonstrating that a bank-specific ABILaBERT can be extremely effective in the automatic classification with respect to different and heterogeneous taxonomies.</p>
      <p>In the rest of the paper, Section 2 summarizes the ABILaBERT approach, Section 3 shows how it was applied to different banks, Section 4 reports the experimental evaluation, while Section 5 derives the conclusions.</p>
      <sec id="sec-1-1">
        <title>2. The ABILaBERT approach</title>
        <p>The timely and precise sharing of information is crucial for business-related problems in banks like Legal Governance, Financial Planning, or Risk Assessment. This is usually ensured through rigorous Business Process Management (BPM) frameworks. Processes are thus defined by specialists, consultants, and banking leaders, and are mainly designed through unstructured or semi-structured data, such as documents or process case templates. Maintaining an efficient BPM system is a crucial activity for banks. Typically, they obtain machine-readable forms of processes through semi-formal specifications, then document them in process management platforms. Bank analysts use process-related information, such as norms or activity obligations, in their document and information management processes. The overall BPM system gives rise to a hierarchy of processes that formalize tasks and obligations at different abstraction levels.</p>
        <p>The organisational regulations are thus expressed in a semi-formal manner. The ABI Lab Process Tree Taxonomy (available at: https://www.abilab.it/tassonomia-processi-bancari) is a bank-independent formalization of the processes active in the Italian bank eco-system that aims to map all areas of activity at a common level of detail across different banks and financial organisations, without referencing organisational structures, products, or delivery channels. The process taxonomy defines process types and their subsumption relation, with specific properties of each process including a label and a textual description. The process names and descriptions are in Italian, even though all examples will be reported through their English translations in the rest of the paper.</p>
        <p>More formally, the process taxonomy T defines conceptualized process types, i.e., taxonomy nodes p ∈ T, and a subsumption relation ⊑ in T × T. Specific properties of a process p include at least the label, i.e., the process naming term label(p), and its textual description, namely desc(p). As an example, a process p has label(p): “Definition of the Company Vision”, while its description desc(p) is: “The process of Defining, at an abstract level, some company objectives towards the different stakeholders, the expected company positioning and the policies to be adopted to achieve them”.</p>
        <p>The automatic association of a text s (e.g., a paragraph from a document or the entire document itself) to nodes in this Taxonomy is traditionally modeled as a text classification task f : S → T. However, in order to train a classifier that approximates f, a training set of texts manually associated with nodes in the taxonomy is required. Unfortunately, this manual annotation is a costly activity, especially when the size of T grows, since the effort needed to provide examples of such complex associations is significant.</p>
        <p>In [<xref ref-type="bibr" rid="ref6">6</xref>] a Zero-Shot Learning (ZSL) technique is proposed to inject information from T directly into a text classifier without the need for annotated documents. In particular, an approach based on textification is applied. The idea is to capitalize on the (textual) information about the nodes of the taxonomy T to initialize a neural-based classifier, similarly to pre-training stages, as discussed in [<xref ref-type="bibr" rid="ref5">5</xref>].</p>
        <p>Language Modeling (e.g., [<xref ref-type="bibr" rid="ref5 ref7">5, 7</xref>]) has been largely used as an effective pre-training method for large-scale neural networks. However, the auxiliary tasks adopted (such as Masked Language Modeling) just emphasize general language properties, and the resulting models capture de facto task-independent and, more importantly, domain-independent information. Domain-specific knowledge is particularly important in certain inferences, such as entity recognition and metadata creation in the financial domain: in our case, the use of a process tree as a source of information for pre-training neural networks has been shown to be beneficial. Since all the nodes and properties of the process tree have a linguistic nature, they can be mapped into text units useful to tagger inference tasks. Specifically, a subsumption relation p1 ⊑ p2 between two processes can be mapped into the binary text classification task of accepting a sentence like “(p1) is a process more specific than (p2)” while rejecting its inverse statement “(p2) is a process more specific than (p1)”.</p>
        <p>Notice that the different information made explicit in the taxonomy gives rise to auxiliary text classification tasks that can be seen as a form of pre-training of neural transformer models. Positive and negative examples for the tasks can be automatically derived from the taxonomy and its related textual properties. Training the neural network to understand how processes are defined and how they subsume other processes corresponds to injecting domain-specific knowledge through a stage of domain-specific pre-training.</p>
        <p>Figure 1: Adapting ABILaBERT to specific banks.</p>
        <p>Once the model is pre-trained on thousands of statements automatically derived from T, it can be used in a ZSL fashion (i.e., no text is labeled in the process) to classify texts, i.e., prompting the model with the question whether a statement “s is a valid association to a node (p)” is true. This approach aims to avoid the manual labeling stage typical of supervised learning, and at the same time fosters different auxiliary tasks, sensitive to the knowledge implicit in the process tree.</p>
        <p>The objective is to allow the system to encode free sentences from domain documents in an informed manner and to support classification, i.e. the association of the proper processes from T to input texts. The resulting model is called ABILaBERT: given an incoming text s, it exploits the Transformer-based architecture to first generate an embedding for s (contained in the vector in the first position, [CLS]) and then make it available for the classification step, possibly fine-tuned with labeled examples. For every sentence, or paragraph, s and process p ∈ T the system can estimate an auxiliary function, such as Definition Recognition DR(s, label(p)), that corresponds to accepting (or rejecting) a sentence sp such as: “s is a valid description for the process (p)”.</p>
        <p>In [<xref ref-type="bibr" rid="ref6">6</xref>], ABILaBERT was demonstrated to be effective in associating texts with the ABI Lab Process Taxonomy. In the remainder of the paper, we explore how ABILaBERT can be successfully adapted to individual bank-specific process taxonomies while maintaining its ZSL approach. We also investigate the possibility of extending the training process with a set of labeled examples in a weakly supervised manner. By exploring these avenues, we aim to enhance the applicability of ABILaBERT to a wider range of banking-specific domains, while also improving its performance in classifying texts with respect to their associated process trees.</p>
      </sec>
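<p>The textification step described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the Process class and the textify function are assumptions of this sketch, which turns definition and subsumption facts into accept/reject statements.</p>

```python
# Hypothetical sketch of "textification": taxonomy nodes (label, description)
# and subsumption links become natural-language statements that a BERT-style
# model can be pre-trained to accept (1) or reject (0).
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Process:
    label: str
    desc: str
    parent: Optional[str] = None  # subsumption: parent is the more generic process

def textify(taxonomy):
    """Generate (statement, is_true) pairs for Definition Recognition and subsumption."""
    examples = []
    for p in taxonomy:
        # Definition Recognition: the true description is a positive example ...
        examples.append((f'"{p.desc}" is a valid description for the process ({p.label})', 1))
        # ... and a randomly chosen wrong description is a negative one.
        wrong = random.choice([q for q in taxonomy if q.label != p.label])
        examples.append((f'"{wrong.desc}" is a valid description for the process ({p.label})', 0))
        # Subsumption: accept "p1 is more specific than p2", reject the inverse.
        if p.parent is not None:
            examples.append((f'({p.label}) is a process more specific than ({p.parent})', 1))
            examples.append((f'({p.parent}) is a process more specific than ({p.label})', 0))
    return examples

taxonomy = [
    Process("Credit", "Process of credit management"),
    Process("Loan origination", "Evaluation and granting of loans", parent="Credit"),
]
pairs = textify(taxonomy)
```

<p>Each node yields one positive and one negative Definition Recognition pair, plus one accepted and one rejected subsumption statement per parent link, mirroring the auxiliary tasks discussed above.</p>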
      <sec id="sec-1-2">
        <p>In this way, the training of a classifier corresponds to learning the function DR(s, label(p)), which holds if the sentence sp: “s is a valid description for the process (p)” is true. These statements promote the node p as a good candidate to represent the semantics of a sentence s with respect to the process taxonomy T. Note that a document is usually made of complex textual units (e.g., paragraphs) made of more than one sentence. As a consequence, ABILaBERT can be used to automatically extract rich metadata from a document by applying DR(s, label(p)) to individual paragraphs s.</p>
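<p>The zero-shot use of the Definition Recognition function can be sketched as follows. This is a hedged illustration: dr_score stands in for the actual ABILaBERT model and is an assumption of this sketch, not the authors' code.</p>

```python
# Zero-shot classification via Definition Recognition: score the statement
# '"s" is a valid description for the process (p)' for every process p and
# rank the processes by that score; no labeled paragraph is needed.
def zsl_classify(paragraph, processes, dr_score, k=3):
    """Return the k process labels whose DR statements score highest."""
    scored = sorted(
        processes,
        key=lambda p: dr_score(f'"{paragraph}" is a valid description for the process ({p})'),
        reverse=True,
    )
    return scored[:k]
```

<p>Prompting with every candidate process and keeping the top-k mirrors how the pre-trained model is queried in a ZSL fashion before any fine-tuning.</p>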
        <sec id="sec-1-2-2">
          <title>3. Adapting ABILaBERT to specific banks</title>
          <p>To specialize the ABILaBERT model for a specific bank, denoted by Bi, we developed the strategy outlined in Figure 1. We began by pre-training a standard BERT-based model using information derived from the ABI Lab Process Taxonomy, resulting in the “ABILaBERT” model, as demonstrated in [<xref ref-type="bibr" rid="ref6">6</xref>] (step (a) in Figure 1).</p>
          <p>Next, we utilized the Process Taxonomy specific to a bank Bi to create a ZSL approach for deriving a bank-specific ABILaBERT model (step (b) in Figure 1), denoted by “ABILaBERT_Bi”. This specialized model is taxonomy-driven, meaning it is exclusively exposed to information derived from the taxonomy. The proposed strategy offers a straightforward way to tailor the language of processes that ABILaBERT was pre-trained on, which was specific to ABI Lab, to the language, definitions, and semantic relationships that are specific to a particular bank.</p>
          <p>Finally, when annotated documents become available, we can fine-tune the model in a “supervised” manner by incorporating the labeled examples (step (c) in Figure 1), denoted by “ABILaBERT_Bi,doc”. This further improves the model’s performance, since fine-tuning with labeled examples exposes the model to the “language” used in a specific bank. However, we can still avoid requiring that the annotation is completely made manually. We refer to this latter approach as weakly supervised because we do not require all paragraphs from a document to be labeled with a specific process. A subset of paragraphs can be manually annotated by the analysts, but we can also adopt ABILaBERT to annotate paragraphs within a document, thus avoiding the need for costly manual annotations. This strategy allows us to limit the need for annotated examples, by using the bank-independent ABILaBERT as an already available supervised classifier.</p>
          <p>It is worth noting that in some cases the weakly-supervised strategy used by ABILaBERT may not be immediately applicable, namely when the bank-specific taxonomy contains processes with different names or descriptions compared to the ABI Lab taxonomy, even if they express the same process. As an example, the ABI Lab tree defines the process called “Gestione servizi di banca virtuale” (in English: “Management of Virtual Banking Services”), while in the bank-specific hierarchy the process is called “Gestione Digital banking e servizi remoti alla clientela” (in English: “Digital banking and remote customer services management”). However, ABILaBERT can be used to support this mapping. As a text encoder [<xref ref-type="bibr" rid="ref5">5</xref>], ABILaBERT can support the mapping between the ABI Lab and bank-specific taxonomies, in order to reuse paragraphs labeled by ABILaBERT, by simply assigning the corresponding process in the targeted bank-specific taxonomy. This mapping is derived by exploiting process definitions: first, ABILaBERT is applied to derive the embeddings of the process definitions of both taxonomies (i.e., extracting the embeddings that encode the respective [CLS] token); then, the semantic similarity between individual nodes of the two process trees is estimated through the cosine similarity between such embeddings. For each process in a bank’s taxonomy, ABILaBERT is used to select the most similar candidate processes from the ABI Lab taxonomy. The resulting pairings are not numerous (up to 5/10 candidates for each input process) and can be validated by the bank’s expert analysts. The analysts can prompt ABILaBERT with the definition of a process, rank all the processes according to their cosine similarity, and easily retrieve the corresponding one. In this way, all labels associated with the ABI Lab taxonomy are translated to the processes of the new taxonomy: this enables their reuse to fine-tune the final transformer against the bank-specific knowledge model. This approach is cost-effective, assuming that the BERT-based model is robust enough against the potential noise introduced by the proposed automatic labeling process; more details are in [<xref ref-type="bibr" rid="ref6">6</xref>].</p>
          <p>Table 1. Statistics on taxonomies and documents shared by banks. Bank B1: 25 processes, 250 documents; Bank B2: 15 processes, 30 documents; Bank B3: 10 processes, 48 documents; Bank B4: 28 processes, 236 documents.</p>
        </sec>
        <sec id="sec-1-2-3">
          <title>4. Experimental Evaluation</title>
          <p>In this section, the experimental evaluation is reported. The objective is to study the effects of tuning ABILaBERT on both bank-specific taxonomies and documents, and to measure its benefits.</p>
        </sec>
        <sec id="sec-1-2-4">
          <title>4.1. Data and Hyperparameters</title>
          <p>We worked with documents and process trees provided by four banks, referred to as B1, B2, B3, and B4 for the sake of simplicity. While B4 uses the same process taxonomy as ABI Lab, the internal process taxonomies of B1, B2, and B3 are different from that of ABI Lab. Therefore, ABILaBERT will be fine-tuned only on internal documents from B4, whereas for the other three banks both pre-training on the taxonomy and fine-tuning on the documents will be performed. Table 1 provides a summary of the number of processes considered within the process tree of each bank, along with the corresponding number of provided documents.</p>
          <p>ABILaBERT is a language model based on BERT, a popular transformer-based model used for natural language processing tasks. According to [<xref ref-type="bibr" rid="ref6">6</xref>], ABILaBERT has been built on top of GilBERTo, then pre-trained on the texts expressing the knowledge from the ABI Lab taxonomy. To tailor ABILaBERT to document classification for a specific bank, it was further tuned, for each bank, on the texts derived from the respective internal taxonomy (pre-training) as well as on annotated documents. This latter fine-tuning process involved training the specialized ABILaBERT model on the bank documents for 10 epochs, with a learning rate of 5e-7.</p>
          <p>Differences in the internal taxonomies of the involved banks resulted in different pre-training and fine-tuning stages. As summarized in Table 1, each bank makes reference to a different number of processes as targets for the document classification stage. Specifically, B1 provided 250 documents that were representative of 25 internal macro-processes, B2 provided 30 documents that represented 15 macro-processes, B3 provided 48 documents, with reference to 10 macro-processes, and finally B4 provided 236 documents; in this case the reference taxonomy was the same as the ABI Lab one, with 28 active macro-processes.</p>
          <p>Using these documents and macro-processes, a dataset was generated for each bank. One dataset refers to the pre-training phase on the taxonomy (Table 2), and one dataset refers to the fine-tuning phase of the model on document paragraphs (Table 3). In Table 2 the number of “textified” examples derived from each bank-specific taxonomy is reported. It should be noted that the auxiliary relationship task used for the generation of the pre-training dataset on the taxonomy is only that of Definition Recognition of a process, as it is then the one used (and the most effective) in the classification phase (as described in [<xref ref-type="bibr" rid="ref6">6</xref>]). Each process generates a positive example when paired with its definition, while the association with a random incorrect definition generates negative examples. Moreover, for each process, we also added examples derived by considering all its subsumed nodes.</p>
          <p>In Table 3, data referring to the fine-tuning of the ABILaBERT model on the bank-specific documents is presented. In particular, for each bank, documents were split into 90% for training and 10% for testing. In the training documents, paragraphs were labeled by ABILaBERT, and each labeled paragraph represents a positive example for the associated macro-process. When the process assigned by ABILaBERT from the ABI Lab Process Tree does not exist in the target taxonomy, it is derived using the “mapping” strategy described in Section 3. For each paragraph s that showed a positive association with the macro-process pi ∈ T, a positive example ⟨s, pi⟩ and several negative examples ⟨s, pj⟩ were generated, where pj are all the other macro-processes present in the taxonomy (so excluding the correct macro-process pi); in this way, for each positive example we have N − 1 negative examples, where N is the number of different macro-processes in the reference taxonomy. Although the paragraphs in the training set were automatically annotated, the processes associated with the paragraphs in the test dataset were manually checked by bank analysts to ensure reliable measurements.</p>
        </sec>
        <sec id="sec-1-2-5">
          <title>4.2. Cross-bank Evaluation of Organisation-specific Transformers</title>
          <p>The results are presented in Table 4, and the classification process used is the same as the one described in [<xref ref-type="bibr" rid="ref6">6</xref>]. To summarize, ABILaBERT first applies a filtering phase to identify a subset of processes that may be evoked by the input paragraph. The filtered subset is then classified using ABILaBERT_Bi, and the resulting candidates are ranked by their classification confidence. This ranking enables the selection of the top k ordered processes. For each bank, the recall at k (R@k) is reported as a measure of classification performance: R@k represents the percentage of paragraphs that were correctly associated with a process among the k processes proposed by the model. By reporting R@k for each bank, we gain insight into how well ABILaBERT performs for different banking domains. First of all, the results in Table 4 confirm the outcomes in [<xref ref-type="bibr" rid="ref6">6</xref>]: the original GilBERTo diverges on bank-specific documents, with a low R@k that is comparable to a baseline where processes are randomly assigned. The ABILaBERT model pre-trained in [<xref ref-type="bibr" rid="ref6">6</xref>] shows significant improvements, suggesting that the pre-training step on the ABI Lab taxonomy is highly beneficial (since ABILaBERT returns processes consistent with the ABI Lab taxonomy, the mapping procedure is applied to derive the bank-specific ones). The ABILaBERT_Bi rows show the systematic improvement due to the bank-specific pre-training. With the ABILaBERT_Bi,doc model a more significant boost is obtained, with an average improvement of 44% in terms of R@1. This improvement is confirmed also for B4, where no additional taxonomy is provided. The experimental R@3 results from the different banks indicate that, on average, over 81% of texts can be accurately assigned to the correct process within the bank when three processes are suggested, even without prior document labeling in the overall process. We believe that this approach would be effective in supporting an annotation process that scales up to fully supervised ones.</p>
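<p>The [CLS]-embedding mapping used in Section 3 to translate labels between taxonomies can be sketched as follows. This is a minimal illustration under assumptions: the encode function stands in for the ABILaBERT encoder and simply has to return a definition embedding as a list of floats.</p>

```python
# Rank ABI Lab processes by cosine similarity of definition embeddings and keep
# the top-k candidates for each bank-specific process, for analyst validation.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def map_taxonomies(bank_defs, abilab_defs, encode, k=5):
    """bank_defs / abilab_defs: {process name: textual definition}.
    Returns, for each bank process, the k most similar ABI Lab processes."""
    abilab_vecs = {name: encode(d) for name, d in abilab_defs.items()}
    mapping = {}
    for name, definition in bank_defs.items():
        v = encode(definition)
        ranked = sorted(abilab_vecs, key=lambda n: cosine(v, abilab_vecs[n]), reverse=True)
        mapping[name] = ranked[:k]  # candidates shown to the bank's analysts
    return mapping
```

<p>Keeping only a handful of candidates per process matches the paper's observation that the resulting pairings (up to 5/10 candidates) remain small enough for expert validation.</p>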
          <p>Error Analysis. A manual error analysis shows that a non-fine-tuned model such as ABILaBERT provides processes that are topically related to the texts, but these are in general too vague (i.e., “too high” in the process tree). For example, a text like “Il codice interno della nuova linea è il ’234’, l’importo minimo conferibile è pari a 12.000 € e le commissioni di gestione si attestano all’1,40% + Iva (in base all’aliquota tempo per tempo vigente)” (in English: “The internal code of the new line is ’234’, the minimum amount that can be conferred is €12,000, and management fees are set at 1.40% + VAT (based on the rate in force at the time)”) is incorrectly classified by the original ABILaBERT with the process “Amministrazione” (in English, “Financial reporting”: management of the accounting, tax, and reporting requirements borne by the bank and its group), while the fine-tuned model correctly associates it with “Credito” (in English, “Credit management process”: credit management, in its funding and origination components, towards different recipients). The process “Amministrazione” is indeed topically related to the text, but it is too vague. The fine-tuned model ABILaBERT_Bi,doc provides a more specific and consistent labeling.</p>
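<p>The R@k measure reported above can be made concrete with a short sketch; the helper name recall_at_k is an assumption of this illustration, not the authors' code.</p>

```python
# R@k: fraction of paragraphs whose gold process appears among the top-k
# processes ranked by the classifier's confidence (most confident first).
def recall_at_k(predictions, gold, k=3):
    """predictions: list of ranked process lists, one per paragraph;
    gold: list of the correct process for each paragraph."""
    hits = sum(1 for ranked, g in zip(predictions, gold) if g in ranked[:k])
    return hits / len(gold)
```

<p>With k=1 this reduces to plain accuracy over the top-ranked process, while k=3 corresponds to the setting in which three candidate processes are suggested to the analyst.</p>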
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>5. Conclusions</title>
      <p>This paper has presented a novel weakly-supervised neural methodology for creating semantic metadata from bank documents, and its successful application to various banking organisations. Our approach is based on a neural pre-training methodology driven by knowledge models specific to individual banks, and it has been shown to outperform inductive approaches that are merely specialized to the domain but independent from the organisation. Our experiments on four different Italian banks have demonstrated that the proposed methodology can significantly impact the design of business processes within an organisation. The observed results suggest that our methodology has wide applicability to other banks, as well as to other types of business organisations. This work highlights the potential of deep learning-based techniques to cost-effectively automate the process of extracting semantic information from business documents, thereby reducing the manual effort required to design effective machine reading tools beneficial to the overall efficiency of business operations.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <sec id="sec-3-1">
        <p>The authors would like to thank the Special Interest Group of “ABI Lab” for actively supporting the research and experimentation presented in this paper. In particular, we thank the following banks: Monte dei Paschi di Siena (in particular, Massimiliano Ugolini), Mediolanum (Paolo Crocè, Francesco Fasano, Demetrio Migliorati, Gaetano Silletti), Banca Nazionale del Lavoro (Emanuele Tango, Ciro Esposito) and Banca Popolare di Sondrio (Roberta Besseghini, Gianpaolo Mura, Sergio Pozzi). We acknowledge financial support from the PNRR MUR project PE0000013-FAIR.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Balch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Veloso</surname>
          </string-name>
          ,
          <article-title>Trading via image classification</article-title>
          , CoRR abs/1907.10046 (2019). arXiv:1907.10046.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Tsai</surname>
          </string-name>
          ,
          <article-title>Encoding candlesticks as images for patterns classification using convolutional neural networks</article-title>
          ,
          <year>2020</year>
          . arXiv:1901.05237.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Saúde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Reddy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Veloso</surname>
          </string-name>
          ,
          <article-title>Classifying and understanding financial data using graph neural network</article-title>
          ,
          <source>in: AAAI-20 Workshop on Knowledge Discovery from Unstructured Data in Financial Services</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Borrajo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Veloso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <article-title>Simulating and classifying behavior in adversarial environments based on action-state traces: an application to money laundering</article-title>
          , CoRR abs/2011.01826 (
          <year>2020</year>
          ). arXiv:2011.01826.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , CoRR abs/1810.04805 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Margiotta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Croce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rotoloni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cacciamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Basili</surname>
          </string-name>
          ,
          <article-title>Knowledge-based neural pre-training for intelligent document management</article-title>
          ,
          <source>in: 20th International Conference of the Italian Association for Artificial Intelligence, Virtual Event, December 1-3</source>
          ,
          <year>2021</year>
          , volume
          <volume>13196</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2021</year>
          , pp.
          <fpage>564</fpage>
          -
          <lpage>579</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          , CoRR abs/1907.11692 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>