<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Results at SwissText 2022</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Claudio Giorgio Giancaterino</string-name>
          <email>claudio.giancaterino@intesasanpaolovita.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Intesa SanPaolo Vita</institution>
          ,
          <addr-line>Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lugano</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Workshop Proce dings</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>tion, Part of Speech Tagging, N-grams analysis</institution>
          ,
          <addr-line>Topic</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Natural Language Processing (NLP) will lead the Artificial Intelligence revolution in the Insurance industry over the next years. There are several opportunities to employ NLP in insurance activities, from claims processing to fraud detection and chatbots. In the marketing field, NLP can be used to monitor the sentiment of the feedback that people publish on different social networks, to better consider insured needs or to extract risk insights. Textual analysis and classification of claims can simplify claims processing, reducing treatment time and operational errors, and can provide help in fraud detection. The underwriting process can be improved by a better textual assessment. The workshop had the goal to show NLP techniques for fraud detection using the disaster Tweets data set from the Kaggle classification competition.</p>
      </abstract>
      <kwd-group>
        <kwd>Named Entity Recognition</kwd>
        <kwd>Part of Speech Tagging</kwd>
        <kwd>N-grams analysis</kwd>
        <kwd>Topic Modelling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        ance industry. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
      </p>
      <p>-Marketing: NLP can be used to monitor the sentiment
analysis of feedback to better consider insured needs, to
monitor risks insight on what people are thinking about a
particular product, to extract information about expected
trends to improve marketing strategy.</p>
      <p>-Underwriting: using Optical Character Recognition
(OCR) and NLP is possible to extract information from
medical reports and help underwriters in a better quote
of the insurance coverage. NLP can categorize patients’
diseases and retrieves correlation between some
symptoms and the likely cost of treatment for the Insurance
Company.</p>
      <p>-Reserving: the analysis of claim reports during the
ifrst notification of loss can improve the reserving process
for severe claims.</p>
      <p>-Claims processing: textual analysis can simplify the
trial reducing the time treatment of the process and
reducing operational mistakes.</p>
      <p>-Risk management: text classification is useful in the
risk assessment giving help in fraud detection.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Exploratory Data Analysis</title>
      <sec id="sec-2-1">
        <title>Before to start with the application of any Machine Learn</title>
        <p>ing model, is better to understand data involved in the
project, and Exploratory Data Analysis is a block between
data cleaning and data modeling with the goal to
undercheck relationships between variables of a data set with</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>1. Introduction</title>
      <sec id="sec-3-1">
        <title>The workshop started with an introduction of the Natural</title>
      </sec>
      <sec id="sec-3-2">
        <title>Language Processing (NLP) explaining use cases in the</title>
      </sec>
      <sec id="sec-3-3">
        <title>Insurance world.</title>
      </sec>
      <sec id="sec-3-4">
        <title>NLP can find a slot in broad Insurance fields, from Marketing to Underwriting, from Claims processing to Risk assessment, and also it can be applied in the traditional actuarial Reserving area.</title>
        <p>Kaggle classification competition. 1</p>
      </sec>
      <sec id="sec-3-5">
        <title>The workshop was organized in the manner to show</title>
        <p>an application of NLP techniques in the Insurance Fraud</p>
      </sec>
      <sec id="sec-3-6">
        <title>Detection by the disaster Tweets data set retrieved from</title>
      </sec>
      <sec id="sec-3-7">
        <title>The first approach was to apply an Exploratory Data</title>
      </sec>
      <sec id="sec-3-8">
        <title>Analysis by the use of the word cloud, statistics of tweets and the language used in the data set.</title>
      </sec>
      <sec id="sec-3-9">
        <title>After the pre-processing activity the work went ahead deeply into the text discovering Named Entity Recognition, Part of Speech Tagging, N-grams analysis, Topic Modelling and Word Embedding.</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>2. Natural Language Processing in</title>
      <sec id="sec-4-1">
        <title>3.1. Activity</title>
        <p>the Insurance world</p>
        <sec id="sec-4-1-1">
          <title>Natural Language Processing is a branch of Artificial</title>
        </sec>
        <sec id="sec-4-1-2">
          <title>Intelligence with the aim to design some models allowing</title>
          <p>CEUR
htp:/ceur-ws.org
ISN1613-073</p>
          <p>CEUR</p>
          <p>Workshop Proceedings (CEUR-WS.org)
https://www.kaggle.com/competitions/nlp-getting-started</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Results</title>
        <sec id="sec-4-2-1">
          <title>The data set has 7.613 rows and 5 columns with ”id”,</title>
          <p>”text”, ”target” and other 2 columns: ”keyword” and
”location”.</p>
          <p>Text of the tweets are written in English for the almost
whole data frame.</p>
          <p>The data set has a length of 7.613 tweets with an
average of 15 words per tweet, an average of 6 characters
word length, and roughly 5 stop words per tweet.</p>
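        <p>As a rough illustration (not the workshop notebook itself), statistics of this kind can be computed with pandas and NLTK; the file and column names below follow the Kaggle data set.</p>
        <preformat>
# Sketch: basic statistics on the disaster Tweets data set.
import pandas as pd
from nltk.corpus import stopwords          # requires nltk.download("stopwords")

df = pd.read_csv("train.csv")              # columns: id, keyword, location, text, target
stops = set(stopwords.words("english"))

tokens = df["text"].str.split()
words_per_tweet = tokens.str.len()
avg_word_len = tokens.apply(lambda ws: sum(len(w) for w in ws) / max(len(ws), 1))
stops_per_tweet = tokens.apply(lambda ws: sum(w.lower() in stops for w in ws))

print(len(df), words_per_tweet.mean(), avg_word_len.mean(), stops_per_tweet.mean())
        </preformat>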
        <p>Tweets largely come from the USA, followed by the UK and Canada. From the word cloud tool, “earthquake”, “deeds”, “reason”, “forest” and “fire” appear as the most common terms.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Text Analysis</title>
      <sec id="sec-5-1">
        <title>4.1. Activity</title>
        <p>Text Analysis tools are used by Companies to extract valuable insights from text data.</p>
        <sec id="sec-5-1-1">
          <title>4.1.1. Named Entity Recognition and Part of Speech Tagging</title>
          <p>The first approach followed was to explore Named Entity Recognition (NER) and Part of Speech (POS) Tagging with the use of the “Spacy” library.</p>
          <p>A named entity is a proper noun that refers to a specific entity like a location, person, date, time, organization, etc. Named Entity Recognition systems are used to identify and segment named entities in text, and they can be found in applications such as question answering, information retrieval, and machine translation. In Insurance, NER applications can be found in customer care to classify customer complaints.</p>
          <p>POS Tagging is the process of marking tokens with a part of speech like nouns, adverbs, verbs, punctuation, etc. The purpose of a POS tagger is to assign grammatical information to each token, and in this way it is possible to generate a tree representation of the sentence. POS tagging works as a prerequisite for further NLP analysis such as chunking, syntax parsing, information extraction, machine translation, sentiment analysis, grammar analysis and word-sense disambiguation.</p>
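          <p>A minimal spaCy sketch of both steps on a sample tweet; the small English pipeline en_core_web_sm is assumed here, not necessarily the model used in the workshop.</p>
          <preformat>
# Sketch: NER and POS tagging with spaCy on a single tweet.
import spacy

nlp = spacy.load("en_core_web_sm")   # python -m spacy download en_core_web_sm

doc = nlp("Forest fire near La Ronge Sask. Canada")
for ent in doc.ents:                 # named entities: span text plus label (GPE, DATE, ...)
    print(ent.text, ent.label_)
for token in doc:                    # part-of-speech tag for every token
    print(token.text, token.pos_)
          </preformat>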
        <sec id="sec-5-1-1">
          <title>At this point is required a pre-processing step, known</title>
          <p>as text cleaning, to generate features, extracts patterns
in text and further analysis. The aim of this process
is preparing raw text for Natural Language Processing.</p>
          <p>Text cleaning is based on several steps, starting with text normalization, proceeding with the removal of Unicode characters and stop words, and ending with stemming/lemmatization.</p>
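          <p>A compact sketch of such a cleaning pipeline with NLTK follows; the exact steps and libraries used in the workshop may differ.</p>
          <preformat>
# Sketch: normalize, strip non-ASCII characters, drop stop words, lemmatize.
import re
from nltk.corpus import stopwords            # requires nltk.download("stopwords")
from nltk.stem import WordNetLemmatizer      # requires nltk.download("wordnet")

stops = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean(text):
    text = text.lower()                                   # normalization
    text = text.encode("ascii", "ignore").decode()        # remove Unicode characters
    text = re.sub(r"http\S+|[^a-z\s]", " ", text)         # strip URLs, punctuation, digits
    words = [w for w in text.split() if w not in stops]   # remove stop words
    return " ".join(lemmatizer.lemmatize(w) for w in words)

print(clean("Our Deeds are the Reason of this #earthquake May ALLAH Forgive us all"))
          </preformat>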
          <p>The analysis went ahead with N-grams: a contiguous sequence of n items generated from a given sample of text, where the items can be words. N-grams can be used as features extracted from a text corpus for Machine Learning models; moreover, they are useful in the autocompletion of sentences or in speech recognition, helping to predict the next word that should occur in a sequence.</p>
          <p>In the workshop N-grams were produced with a Bag-of-Words (BoW) approach, retrieving the top occurrences of unigrams, bigrams and trigrams in the data set.</p>
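          <p>For instance, the top n-grams can be pulled out with a simple count-based vectorizer; scikit-learn is assumed here purely for illustration.</p>
          <preformat>
# Sketch: top-N n-grams from the tweets with a Bag-of-Words count matrix.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

df = pd.read_csv("train.csv")

def top_ngrams(texts, n=1, top=10):
    vec = CountVectorizer(ngram_range=(n, n), stop_words="english")
    counts = vec.fit_transform(texts).sum(axis=0).A1   # total count of each n-gram
    vocab = vec.get_feature_names_out()
    return sorted(zip(vocab, counts), key=lambda x: -x[1])[:top]

print(top_ngrams(df["text"], n=2))   # top bigrams in the tweets
          </preformat>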
          <p>
            Bag-of-Words [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] belongs to a first level of techniques used to transform raw text into numerical features by vectorization. BoW describes the occurrence of words taken from a vocabulary obtained from a corpus text, labelling it in a binary vector.
          </p>
          <p>Each word is represented by a one-hot vector, a sparse vector in the size of the vocabulary. The Bag-of-Words feature vector is the sum of all one-hot vectors of the words.</p>
        </sec>
        <sec id="sec-5-1-3">
          <title>4.1.3. Word Embedding</title>
          <p>
            The second level of transforming raw text into numerical features is given by Word Embedding, with the goal to generate vectors encoding semantic meanings: individual words are represented by vectors in a predefined vector space [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. It allows words with similar meaning to have a similar representation. In the workshop Word Embedding was performed with Word2vec using the “gensim” library. Word2vec is a neural network that tries to maximize the probability of seeing a word in a context window, measured by the cosine similarity between two vectors. This task was explored with two architectures: the continuous bag-of-words (CBOW) model, which tries to predict a word from its context, and the continuous skip-gram model, which tries to predict the context from a word.
          </p>
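          <p>A minimal gensim sketch of both architectures; the hyper-parameters below are illustrative rather than the workshop settings.</p>
          <preformat>
# Sketch: Word2vec on tokenized tweets, CBOW vs. skip-gram, then similarity queries.
import pandas as pd
from gensim.models import Word2Vec

df = pd.read_csv("train.csv")
sentences = df["text"].str.lower().str.split().tolist()   # naively tokenized tweets

cbow = Word2Vec(sentences, vector_size=100, window=5, min_count=5, sg=0)  # sg=0: CBOW
skip = Word2Vec(sentences, vector_size=100, window=5, min_count=5, sg=1)  # sg=1: skip-gram

print(cbow.wv.most_similar("earthquake", topn=5))
print(skip.wv.similarity("fire", "forest"))                # cosine similarity of two words
          </preformat>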
        </sec>
        <sec id="sec-5-1-2">
          <title>The activity of this chapter ended with Topic Modelling,</title>
          <p>a form of unsupervised learning that works
discovering hidden relationships in the text, more precisely the
purpose is to identify topics in a document.</p>
          <p>The purpose of the workshop was to discover new
available and best performing tools for NLP, and for this
activity was explored the use of BERTopic developed by
Maarten Grootendorst. 3</p>
          <p>
            There are four key components in BERTopic [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ], and
can be considered as an ensemble of models.
          </p>
          <p>It starts by generating document embeddings to represent the meaning of the sentences, using a pre-trained language model: BERT (Bidirectional Encoder Representations from Transformers).</p>
          <p>Given the huge dimension of the vectors generated, a general non-linear dimensionality reduction technique is applied: UMAP (Uniform Manifold Approximation and Projection).</p>
          <p>At this point the reduced embeddings are clustered with HDBSCAN (Hierarchical Density-Based Spatial Clustering), which finds clusters of variable densities by converting DBSCAN into a hierarchical clustering algorithm.</p>
          <p>To retrieve topics from the clustered documents, a modified version of TF-IDF (Term Frequency-Inverse Document Frequency) is applied: the class-based TF-IDF procedure, where the class represents the collection of documents merged into a single document for each cluster.</p>
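          <p>A minimal BERTopic sketch of this pipeline; by default the library wires a pre-trained embedding model, UMAP, HDBSCAN and the class-based TF-IDF together, and the number of topics below is illustrative.</p>
          <preformat>
# Sketch: document embeddings -> UMAP -> HDBSCAN -> class-based TF-IDF, via BERTopic.
import pandas as pd
from bertopic import BERTopic

df = pd.read_csv("train.csv")
docs = df["text"].tolist()                         # raw tweets

topic_model = BERTopic(nr_topics=10)               # reduce to roughly ten topics
topics, probs = topic_model.fit_transform(docs)

print(topic_model.get_topic_info().head())         # size and top words of each topic
print(topic_model.get_topic(0))                    # most representative terms of topic 0
          </preformat>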
          <p>
            Input features for BERTopic have been generated by TF-IDF, an extension of Bag of Words where terms are weighted so that words carrying useful information are highlighted [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ].
          </p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Results</title>
        <p>From the N-grams analysis the top occurrence words are: “people”, “video”, “crash”, “emergency”, and “disaster”. Looking at the bigrams they are: “suicide bomber”, “youtube video”, “northern california”, “california wildfire”, “bombe detonate”, and “natural disaster”.</p>
        <p>Interesting results come from Word Embedding, where the vocabulary is similar between the two architectures, with these relevant words: “earthquake”, “forest”, “evacuation”, “people”, “wildfire”, “california”, “flood”, “disaster”, “emergency”, “damage”.</p>
        <p>What changes is the similarity between words: the CBOW model usually provides a lower similarity between words than the one provided by the Skip-Gram model.</p>
        <p>The Word Embedding exploration ended by reducing the vector dimensions with Principal Component Analysis, giving the opportunity to visualize words in two dimensions.</p>
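        <p>A sketch of that reduction for the skip-gram vectors trained in the earlier Word2vec sketch, with scikit-learn and matplotlib assumed as the plotting stack:</p>
        <preformat>
# Sketch: project Word2vec vectors to 2-D with PCA and plot a few words.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

words = ["earthquake", "forest", "evacuation", "wildfire", "flood", "disaster"]
vectors = [skip.wv[w] for w in words]      # `skip` is the skip-gram model trained above

xy = PCA(n_components=2).fit_transform(vectors)
plt.scatter(xy[:, 0], xy[:, 1])
for (x, y), w in zip(xy, words):
    plt.annotate(w, (x, y))
plt.show()
        </preformat>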
        <p>After that, it was the turn of Topic Modelling with BERTopic. The tool was used in an easy way, without fine tuning parameters, using TF-IDF features as input and with the arbitrary choice of ten topics.</p>
        <p>
          Participants asked how to trust in the results, so the job has been completed with the coherence score evaluation of BERTopic and the comparison with the Latent Dirichlet Allocation (LDA) model [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], the common model employed in Topic Modelling.
        </p>
        <p>The same relevant words from BERTopic appear also in LDA, but with different probabilities. The evaluation has been done with the UMass coherence score, which calculates how often two words appear together in the corpus. The perfect coherence is at 0, and it usually decreases as the number of topics rises. The issue with the LDA model was that the number of trained documents in each chunk had to be reduced to make it converge. From the results, BERTopic shows a number closer to 0 (roughly -14) than the LDA model (roughly -18), though the result can be improved by tuning the model.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5. Text Classification</title>
      <sec id="sec-7-1">
        <title>5.1. Activity</title>
        <p>The approach followed in this chapter is the classification task: given a target variable, the aim is to predict if a tweet can be considered a “disaster” or otherwise “not disaster”. Twitter has become an important emergency communication channel, because people with smartphones are able to announce an emergency they are observing in real-time. For this reason, more agencies and Insurance Companies are interested in monitoring Twitter. Moreover, it is not always clear whether a person’s words are actually announcing a disaster, so this task can be linked to a Fraud Detection task.</p>
        <p>
          The approach followed was to use Transfer Learning [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], a Machine Learning method where a model developed for a task is reused as the starting point for a model on a second task. In the traditional Supervised Learning approach, Machine Learning models are trained on labelled data sets and are expected to perform well on unseen data of the same task and domain. The traditional approach falls down when there is not enough labelled data to perform training for the task or domain of interest.
        </p>
        <p>The idea behind Transfer Learning is to try to store the knowledge gained in solving the source task in the source domain and apply it to another similar problem of interest; it is the same concept as learning by experience, so the aim is to exploit pre-trained models that can be fine-tuned on smaller task-specific data sets.</p>
        <p>
          Bidirectional Encoder Representation from Transformers (BERT) is one of the most popular state-of-the-art NLP approaches for Transfer Learning, published by Google in 2018 [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>
          BERT is a bidirectional multi-layer Transformer model that exploits the attention mechanism [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. A basic Transformer uses an encoder-decoder architecture: the encoder learns the representation from the input sentence, and the decoder receives the representation and produces a prediction for the task.
        </p>
        <p>The attention mechanism was introduced to improve the performance of the encoder-decoder model for machine translation. The idea behind the attention mechanism was to permit the decoder to use the most relevant parts of the input sequence in a flexible manner, by a weighted combination of all of the encoded input vectors.</p>
        <p>
          With Transformers we have reached the third level of vectorization technique in NLP, the Contextual Embedding [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Both traditional Word Embedding (word2vec, GloVe) and Contextual Embedding (ELMo, BERT) aim to learn a continuous (vector) representation for each word in the documents.
        </p>
        <p>The Word Embedding method builds a global vocabulary using unique words in the documents, ignoring the meaning of words in different contexts. Hence, given a word, its embedding is always the same in whichever sentence it occurs, and for this reason the pre-trained Word Embeddings are static.</p>
        <p>Contextual Embedding methods are used to learn sequence-level semantics by considering the sequence of all words in the documents. The embeddings are obtained by passing the entire sentence to the pre-trained model, so the embedding generated for each word depends on the other words in a given sentence. The Transformer based models work on the attention mechanism, and attention is a way to look at the relation between a word and its neighbours; for this reason, pre-trained Contextual Embeddings are dynamic.</p>
        <p>BERT’s goal is to generate a language representation model. It uses a rich input embedding representation, derived from a sequence of tokens, which is converted into vectors; then three embedding layers are combined to obtain a fixed-length vector processed in the Neural Network. BERT is pre-trained using two Unsupervised Learning tasks: Masked LM (MLM) and Next Sentence Prediction (NSP).</p>
        <p>The usual workflow of BERT consists of two stages: pre-training and fine-tuning.</p>
        <p>
          The attention mechanism in the Transformer allows BERT to model many downstream tasks, such as sentiment analysis, question answering, paraphrase detection and more. In this workshop the “ktrain” [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] library has been used, a low-code library developed by Arun S. Maiya that provides a lightweight wrapper for “Keras”, making it easier to build, train, and deploy Deep Learning models.
        </p>
      </sec>
      <sec id="sec-7-2">
        <title>5.2. Results</title>
        <p>Thanks to the “ktrain” low-code library the implementation of BERT was easy: it was enough to split the data set into a train and test set, then send the train set as input into the “ktrain” pre-processing and Deep Learning model.</p>
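        <p>A minimal sketch of that workflow with ktrain; the split, maxlen and learning-rate/epoch choices below are illustrative, not the workshop settings.</p>
        <preformat>
# Sketch: fine-tune BERT on the disaster Tweets with the ktrain wrapper around Keras.
import pandas as pd
import ktrain
from ktrain import text
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")
x_tr, x_te, y_tr, y_te = train_test_split(df["text"].values, df["target"].values,
                                          test_size=0.2, random_state=0)

trn, val, preproc = text.texts_from_array(
    x_train=x_tr, y_train=y_tr, x_test=x_te, y_test=y_te,
    class_names=["not disaster", "disaster"],
    preprocess_mode="bert", maxlen=160)

model = text.text_classifier("bert", train_data=trn, preproc=preproc)
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
learner.fit_onecycle(2e-5, 1)        # one epoch at a BERT-typical learning rate
learner.validate(class_names=["not disaster", "disaster"])   # accuracy, F1, confusion matrix
        </preformat>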
        <p>The Transformer based model shows, as expected, a good performance both on the train set and on the test set, with an accuracy greater than 82%. Given the imbalanced data set, the performance was also evaluated with the F1 score, with roughly the same results.</p>
        <p>Participants asked for an evaluation against other models, so the job has been completed with a common classification Machine Learning model, Logistic Regression, again with the “ktrain” library.</p>
        <p>Implementation of this model is easy, but the performance is poor: roughly 63% on the train set and roughly 58% on the validation set.</p>
        <p>Looking at the confusion matrix, the BERT model shows a large number of elements on the diagonal and a small number of elements off the diagonal, so a better matrix than the one of Logistic Regression.</p>
        <p>The last step was the inference of the model, testing the predictions on unseen test tweets; also in this situation BERT outperformed the Logistic Regression model, with all predictions correct.</p>
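        <p>A sketch of that last inference step, again via ktrain, building on the training sketch above; the second tweet is an invented example.</p>
        <preformat>
# Sketch: wrap the fine-tuned model in a predictor and classify new tweets.
predictor = ktrain.get_predictor(learner.model, preproc)
print(predictor.predict("Forest fire near La Ronge Sask. Canada"))
print(predictor.predict("I love the new fire emoji on my phone"))
# predict() returns one of the class_names defined above.
        </preformat>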
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <sec id="sec-6-1">
        <title>With this workshop there has been the opportunity to</title>
        <p>have an overview of Natural Language Processing
applications in the Insurance world, and with the disaster
Tweets data set there has been the opportunity to
discover NLP applications in Fraud Detection.</p>
        <p>In this work the development of Natural Language
Processing has been retracted, from Tokenization to Bag of
Words, from Word Embedding to Contextual Embedding.</p>
        <p>Transfer Learning has been explored, looking on its
potential that outperforms benchmark models both on
Topic Modelling and Classification prediction.
[7] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova,
Bert: Pre-training of deep bidirectional
transformers for language understanding, arXiv preprint
arXiv:1810.04805 (2018).
[8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit,
L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin,
Attention is all you need, Advances in neural
information processing systems 30 (2017).
[9] Q. Liu, M. J. Kusner, P. Blunsom, A
survey on contextual embeddings, arXiv preprint
arXiv:2003.07278 (2020).
[10] A. S. Maiya, ktrain: A low-code library for
augmented machine learning, J. Mach. Learn. Res 23
(2020) 1–6.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Uthayasooriyar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A survey on natural language processing (nlp) and applications in insurance</article-title>
          ,
          <source>arXiv preprint arXiv:2010.00462</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferrario</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nägelin</surname>
          </string-name>
          ,
          <article-title>The art of natural language processing: classical, modern and contemporary approaches to text document classification</article-title>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Corrado,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Efficient estimation of word representations in vector space</article-title>
          ,
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Grootendorst</surname>
          </string-name>
          ,
          <article-title>Bertopic: Neural topic modeling with a class-based tf-idf procedure</article-title>
          ,
          <source>arXiv preprint arXiv:2203.05794</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jelodar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Latent dirichlet allocation (lda) and topic modeling: models, applications, a survey</article-title>
          ,
          <source>Multimedia Tools and Applications</source>
          <volume>78</volume>
          (
          <year>2019</year>
          )
          <fpage>15169</fpage>
          -
          <lpage>15211</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Malte</surname>
          </string-name>
          , P. Ratadiya,
          <article-title>Evolution of transfer learning in natural language processing</article-title>
          ,
          <source>arXiv preprint arXiv:1910.07370</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>J.</given-names> <surname>Devlin</surname></string-name>,
          <string-name><given-names>M.-W.</given-names> <surname>Chang</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Lee</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Toutanova</surname></string-name>,
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>,
          <source>arXiv preprint arXiv:1810.04805</source>
          (<year>2018</year>).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>A.</given-names> <surname>Vaswani</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Shazeer</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Parmar</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Uszkoreit</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Jones</surname></string-name>,
          <string-name><given-names>A. N.</given-names> <surname>Gomez</surname></string-name>,
          <string-name><given-names>Ł.</given-names> <surname>Kaiser</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Polosukhin</surname></string-name>,
          <article-title>Attention is all you need</article-title>,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          (<year>2017</year>).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>Q.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>M. J.</given-names> <surname>Kusner</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Blunsom</surname></string-name>,
          <article-title>A survey on contextual embeddings</article-title>,
          <source>arXiv preprint arXiv:2003.07278</source>
          (<year>2020</year>).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><given-names>A. S.</given-names> <surname>Maiya</surname></string-name>,
          <article-title>ktrain: A low-code library for augmented machine learning</article-title>,
          <source>J. Mach. Learn. Res.</source>
          <volume>23</volume>
          (<year>2020</year>)
          <fpage>1</fpage>-<lpage>6</lpage>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>