Improving Spare Part Search for Maintenance Services
using Topic Modelling
Anastasiia Grishinaa , Milosh Stolikjb , Qi Gaob and Milan Petkovica,b
a Eindhoven University of Technology, Den Dolech 2, 5612 AZ, Eindhoven, The Netherlands
b Philips Research, High Tech Campus 34, 5656 AE, Eindhoven, The Netherlands



Abstract

To support the decision-making process in various industrial applications, many companies use knowledge management and Information Retrieval (IR). In an industrial setting, knowledge is extracted from data that is often stored in a semi-structured or unstructured format. As a result, Natural Language Processing (NLP) methods have been applied to a number of IR steps. In this work, we explore how NLP, and particularly topic modelling, can be used to improve the relevance of spare part retrieval in the context of maintenance services. The proposed methodology extracts topics from short maintenance service reports that also include part replacement data. The intuition behind it is that every topic should represent a specific root cause. Experiments were conducted for an ad-hoc retrieval system of service case descriptions and spare parts. The results show that our modification improves the baseline system, thus boosting the performance of maintenance service solution recommendation.

                                          Keywords
                                          Entity retrieval, spare part search, decision support, maintenance services, natural language processing, topic modelling


Proceedings of the CIKM 2020 Workshops, October 19-20, 2020, Galway, Ireland
a.grishina@tue.nl (A. Grishina); m.stolikj@philips.com (M. Stolikj); q.gao@philips.com (Q. Gao); milan.petkovic@philips.com (M. Petkovic)
ORCID: 0000-0003-3139-0200 (A. Grishina)
Β© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Information retrieval systems are gaining importance in various industrial applications. We can observe the emergence of knowledge-based systems that support the decision-making process in construction, aviation, equipment maintenance and other areas [1, 2]. In these settings, knowledge is frequently extracted from data that is captured in legacy systems using natural language and stored in a semi-structured or unstructured format. As a result, linguistic and statistical NLP methods have been applied to a number of IR steps, such as document and query modelling, query expansion and search result clustering based on semantic similarities [3, 4, 5, 6].

   In this work, we explore how NLP, and particularly topic modelling, can be used to improve spare part retrieval for medical equipment maintenance. In particular, we focus on remote system diagnostics that takes place when the equipment malfunctions, i.e. stops working according to its specification. The problem may be resolved in several ways, one of which is the replacement of one or more (malfunctioning) parts. We conducted our research in the context of an ad-hoc entity retrieval system which helps engineers to search for relevant historical service reports and identify the most probable service solution. Therefore, the target retrieval entities are equipment components, i.e. parts to be replaced. In practice, one case may require multiple parts to be replaced.

   To address the challenge of spare part retrieval, we create an NLP pipeline that pre-processes short textual descriptions of maintenance activities and applies topic modelling to categorize the descriptions of past cases. From relevant maintenance service reports, the proposed methodology extracts topics, each of which may indicate a specific root cause. Once categorized, cases and parts are easier to examine and more relevant to a particular type of failure. An engineer can address topics sequentially and choose among parts related to the same topic. Therefore, we exploit term co-occurrences and their semantic correspondences using topic modelling to enhance the relevance of target entity retrieval. Although the use case assumes that a number of parts will ultimately be suggested based on past maintenance records, the problem statement does not fall under the vastly explored area of recommender systems, which involves user preference modelling.

   To evaluate the difference introduced by the proposed component, we use IR metrics that are customized to characterize the relevance and completeness of a set of retrieved entities. They measure how far in the list of search results all the required parts are present, indicate whether at least one required entity is retrieved, and whether all needed parts are present among the top K search results.
   The main contributions of this work are as follows:

     β€’ we enhance the performance of an industrial entity retrieval system by learning semantic correspondences between short historical descriptions of events associated with the entities;

     β€’ we approach the challenge of spare part retrieval in remote system diagnostics and maintenance of industrial equipment using topic modelling to group extracted historical cases and parts under topics that should represent failure root causes;

     β€’ we evaluate the proposed method on a real-world dataset using customized information retrieval metrics.

   The remainder of this paper is organized as follows. We present the problem formulation and a baseline part retrieval system in Section 2. The methodology of combining the text mining pipeline and the entity retrieval process is described in Section 3. Section 4 is dedicated to the dataset description and the implementation of the methods. We discuss experimental results in Section 5 and related work in Section 6. The paper is concluded in Section 7, where we also mention possible directions for future work.

2. Problem Description

In the scope of this work, entity descriptions are composed of equipment characteristics and represented by maintenance case reports registered in the retrieval system. Entities to be retrieved are the parts recommended for replacement to troubleshoot a machine referred to in a new malfunction report. Queries may contain various characteristics of a new maintenance case that should be treated by a maintenance service team. An entity, i.e. a spare part, is identified with a unique ID and is related to a case description. One historical maintenance case can have several parts associated with it; similarly, a new service case may require a set of different parts.

   The knowledge base of maintenance cases is updated with the help of service engineers. They submit maintenance reports for every equipment failure or customer complaint as short technical texts, often in multiple languages (English and a locally spoken language). Each historical report includes a number of logs, such as the time of customer complaint registration, a textual description of maintenance activities and the IDs of parts used to solve the issue. Hence, the reports might contain abbreviations, software logs sent by a machine as well as natural language descriptions of the machine state at every step of the maintenance process. Closed cases are uploaded to the collection of historical cases that can be mined using the above-mentioned ER system.

   To present the setting in a formal way, let π‘ž be a query performed by a service engineer while working on a case. We will use the term query case to indicate such cases. Each query is associated with a single maintenance case. The list of parts replaced in a case 𝑐 is 𝑃(𝑐). We use 𝐢(π‘ž) to denote the list of cases retrieved for the query π‘ž. The set of parts replaced in all retrieved cases is denoted by 𝑃(π‘ž) = βˆͺπ‘βˆˆπΆ(π‘ž) 𝑃(𝑐), and the set of ranked parts recommended for replacement is expressed by 𝑃𝑅(π‘ž) βŠ† 𝑃(π‘ž).

3. Methodology

The method proposed in this work combines a baseline entity retrieval setting and an add-on topic modelling component, as described below.

3.1. Baseline Entity Retrieval System

The baseline entity search system in question is empowered with a two-step retrieval mechanism built on top of a database of entity descriptions. It consists of entity description retrieval followed by the final entity retrieval and ranking, as explained in detail below.

3.1.1. Retrieval of Entity Descriptions

At the first step of the entity search, the system retrieves relevant descriptions using a Vector Space Model (VSM) with the Okapi BM25 similarity score [7, 8]. VSM is a document and query representation model that converts texts to N-dimensional vectors of term weights, where N is the number of words in a dictionary. Terms are simply the words or groups of words present in the collection of documents. The dictionary is built from a text corpus and includes distinct terms. The intuition behind VSM is that retrieved documents will be ranked according to a similarity function computed for a query and a document, i.e. vectors in a vector space.

   In the context of our problem description, for a query π‘ž containing keywords π‘ž1, …, π‘žπ‘› and a maintenance case description 𝑐 with fields 𝑐1, …, π‘π‘š, the Okapi
BM25 similarity score can be expressed as follows:

   𝐡𝑀25(π‘ž, 𝑐) = Ξ£_{𝑗=1..π‘š} Ξ£_{𝑖=1..𝑛} 𝐼𝐷𝐹(π‘žπ‘–) Β· 𝑓(π‘žπ‘–, 𝑐𝑗) Β· (π‘˜1 + 1) / ( 𝑓(π‘žπ‘–, 𝑐𝑗) + π‘˜1 Β· (1 βˆ’ 𝑏 + 𝑏 Β· 𝐿𝑐𝑗 / πΏπ‘Žπ‘£π‘”π‘—) )      (1)

   Here, 𝑓(π‘žπ‘–, 𝑐𝑗) is the frequency of the keyword π‘žπ‘– in a field 𝑐𝑗 of the case description 𝑐, 𝐿𝑐𝑗 is the length of the field 𝑐𝑗 in terms of words, and πΏπ‘Žπ‘£π‘”π‘— is the average length of the field 𝑗 in the descriptions of all cases in the collection 𝐢. The variables π‘˜1 and 𝑏 are tuning parameters that control how much every new occurrence of a term impacts the score and the document length scaling, respectively. Inverse Document Frequency is calculated as:

   𝐼𝐷𝐹(π‘žπ‘–) = log( (𝑀 βˆ’ 𝑛(π‘žπ‘–) + 0.5) / (𝑛(π‘žπ‘–) + 0.5) )      (2)

where 𝑀 is the total number of cases, i.e. 𝑀 = |𝐢(π‘ž)|, and 𝑛(π‘žπ‘–) is the number of case descriptions that contain the query term π‘žπ‘–. Therefore, the case 𝑐𝑖1 ∈ 𝐢(π‘ž) is ranked higher than 𝑐𝑖2 ∈ 𝐢(π‘ž) iff 𝐡𝑀25(π‘ž, 𝑐𝑖1) > 𝐡𝑀25(π‘ž, 𝑐𝑖2).
3.1.2. Entity Retrieval and Ranking

The second step realizes the entity retrieval. It ranks the spare parts associated with the retrieved cases based on the frequency of their occurrence and the rank of the cases in which they occur. Thus, the most frequent parts that occur in top-ranked cases appear higher in the final list of retrieved parts than parts that appear the same number of times lower in the case list. Several proprietary filters are applied as well, but they do not affect the methodology. The algorithm for part recommendation is presented in Algorithm 1.

Algorithm 1 Part Recommendation
Input: query π‘ž associated with a maintenance case 𝑐, number of parts to recommend 𝐾
Output: a list of recommended parts
   π‘π‘œπ‘’π‘›π‘‘ ← {}                  ⊳ occurrences of part combinations
   𝑃(π‘ž) ← {}                   ⊳ retrieved parts
   𝑃𝑅(π‘ž) ← {}                  ⊳ recommended parts
   for 𝑐 ∈ 𝐢(π‘ž) do
       𝑃(𝑐) ← get_part_IDs(𝑐)
       if 𝑃(𝑐) ∈ 𝑃(π‘ž) then
           π‘π‘œπ‘’π‘›π‘‘(𝑃(𝑐)) ← π‘π‘œπ‘’π‘›π‘‘(𝑃(𝑐)) + 1
       else
           𝑃(π‘ž) ← 𝑃(π‘ž) βˆͺ 𝑃(𝑐)
           π‘π‘œπ‘’π‘›π‘‘(𝑃(𝑐)) ← 1
       end if
       sort(𝑃(π‘ž), using=π‘π‘œπ‘’π‘›π‘‘(𝑃(𝑐)), order=DESC)
       for 𝑃(𝑐) ∈ 𝑃(π‘ž) do
           𝑃(π‘ž) ← 𝑃(π‘ž) βˆͺ 𝑃(𝑐)
           drop_duplicates(𝑃(π‘ž))
       end for
   end for
   𝑃𝑅(π‘ž) ← top_K(𝑃(π‘ž))
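   Read as executable pseudocode, Algorithm 1 can be sketched in Python as follows. The case["parts"] field is a hypothetical accessor for the replaced part IDs, and the proprietary filters mentioned above are omitted.

def recommend_parts(retrieved_cases, K):
    """Sketch of Algorithm 1: rank parts from the cases retrieved for a query.

    retrieved_cases: cases C(q) in BM25 rank order; each case is assumed to
                     expose the list of replaced part IDs as case["parts"]
    K:               number of parts to recommend
    """
    count = {}        # occurrences of each part combination P(c)
    order = []        # combinations in the rank order of first appearance
    for case in retrieved_cases:
        combo = tuple(sorted(case["parts"]))          # P(c)
        if combo in count:
            count[combo] += 1
        else:
            count[combo] = 1
            order.append(combo)

    # Most frequent combinations first; ties keep the order of the
    # higher-ranked case because the sort is stable.
    ranked_combos = sorted(order, key=lambda c: count[c], reverse=True)

    recommended = []  # flattened, de-duplicated part list
    for combo in ranked_combos:
        for part in combo:
            if part not in recommended:
                recommended.append(part)
    return recommended[:K]                            # P_R(q)

   The stable sort reproduces the intent of ranking parts by both their frequency and the rank of the case in which they occur.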
3.2. Topic Modelling Component

The transformation of the historical case and part retrieval pipeline is performed by adding a component that groups retrieved cases under a number of topics and ranks the parts within the topics. Figure 1 shows the baseline architecture (a) and the modification that includes the proposed topic modelling component (b).

   The topic modelling component can be considered as an individual NLP pipeline with a number of steps. The pipeline includes tokenization, lemmatization, removal of stop phrases, building a dictionary of tokens, term weighting and topic modelling. Tokenization of the text refers to splitting it into units or tokens that represent individual words or sometimes groups of words [9]. The process of lemmatization involves finding the initial forms of the inflected words, also referred to as root forms or lemmas. A lemma is a word in its canonical form that exists in the dictionary of the used language. For example, the lemma for do, doing, did is the word do. Next, term weighting refers to assigning weights to tokens. We utilize term frequency, or bag-of-words, weights as the term weighting scheme. It associates a term with a weight proportional to the frequency of the term occurrence in the corpus of documents. For topic modelling, we use Latent Dirichlet Allocation (LDA), one of the most popular algorithms for automatically extracting topics. LDA is based on a generative probabilistic language model [10]. The purpose of LDA is to learn the representation of a fixed number of topics and derive the topic distribution for every document in a collection. Every maintenance service case is assigned a topic according to the maximum probability of the case belonging to a topic.
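   A condensed sketch of this pipeline with Gensim and spaCy, the libraries named in Section 4.3, is given below. The spaCy model name, the placeholder stop-phrase list and the prior settings are assumptions of the example rather than the exact production configuration.

import spacy
from gensim import corpora, models
from gensim.parsing.preprocessing import STOPWORDS

nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])   # assumed spaCy model
STOP_PHRASES = ["please describe the problem"]   # placeholder for corpus-specific phrases

def preprocess(text):
    """Tokenize, lemmatize and drop stop words and stop phrases."""
    for phrase in STOP_PHRASES:
        text = text.replace(phrase, " ")
    return [tok.lemma_.lower() for tok in nlp(text)
            if tok.is_alpha and tok.lemma_.lower() not in STOPWORDS]

def fit_lda(case_texts, num_topics, seed=0):
    """Build the token dictionary, bag-of-words corpus and LDA model."""
    docs = [preprocess(t) for t in case_texts]
    dictionary = corpora.Dictionary(docs)
    bows = [dictionary.doc2bow(d) for d in docs]          # term-frequency weights
    lda = models.LdaModel(bows, id2word=dictionary, num_topics=num_topics,
                          iterations=100, random_state=seed,
                          alpha="auto", eta="auto")       # priors learned from data (Sec. 4.3)
    return lda, dictionary, docs, bows

def assign_topic(lda, dictionary, text):
    """Assign the topic with the highest probability to a case description."""
    bow = dictionary.doc2bow(preprocess(text))
    topics = lda.get_document_topics(bow)
    return max(topics, key=lambda t: t[1])[0] if topics else None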
Figure 1: Integration of the topic modelling component (b) in a baseline two-step document and part retrieval system (a).

4. Evaluation

In this section, we describe the real-world dataset that is extracted from the baseline part retrieval system. We also discuss the metrics used to evaluate the performance of the baseline system and compare it to the configuration with the integrated topic modelling component.

4.1. Dataset Description

For our experiments, we use a proprietary dataset composed of historical maintenance cases. The textual fields of the case descriptions have been aggregated into one field per maintenance case and serve as input to LDA during the training and testing stages. The majority of cases are written in mixed languages. Figure 2 presents the distribution of the number of queries over their characteristics: the number of retrieved service cases, of retrieved ranked parts to replace, and of parts replaced in the query case. The majority of queries retrieved up to 200 similar case descriptions; however, this number could reach 1000 cases. The number of unique recommended parts retrieved from these cases was below 350 in general, while the majority of queries retrieved 0-10 parts. The number of parts required to treat the maintenance case associated with the query was equal to 5 or less in most of the query cases.

Figure 2: Distribution of queries over the number of cases and parts retrieved in response to the queries.

   For building the LDA model, we use a subset of historical cases written in English. The training set contains data from 101,026 different maintenance cases. For the test set, we use a sample of 1,564 queries performed by service engineers, together with the corresponding cases returned as search results: (π‘ž, 𝐢(π‘ž)). Cases returned for the queries may have a non-empty intersection with the training dataset; however, the cases for which the queries had been created were excluded from the training set.

4.2. Evaluation Metrics

Top 𝐾 ranked parts are used to estimate the 𝑠𝑒𝑐𝑐𝑒𝑠𝑠, π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ , π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ and π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜ metrics. Metric@K is computed for a set of retrieved parts with |𝑃𝑅(π‘ž)| ≀ 𝐾. The operator |Β·| applied to a set denotes the number of its elements. The metrics are calculated as follows:

   π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ @𝐾(π‘ž) = 1 if 𝑃(𝑐) βŠ† 𝑃𝑅(π‘ž), and 0 otherwise;

   𝑠𝑒𝑐𝑐𝑒𝑠𝑠@𝐾(π‘ž) = 1 if |𝑃(𝑐) ∩ 𝑃𝑅(π‘ž)| > 0, and 0 otherwise;

   π‘Ÿπ‘’π‘π‘Žπ‘™π‘™@𝐾(π‘ž) = |𝑃(𝑐) ∩ 𝑃𝑅(π‘ž)| / |𝑃(𝑐)|;

   π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜@𝐾(π‘ž) = the smallest π‘˜ ≀ 𝐾 such that π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ @π‘˜(π‘ž) = 1.
   πΆπ‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘  measures whether all the used parts          which spans from 2 to 20 in our experiments. All the
were suggested for a troubleshooting report, 𝑠𝑒𝑐𝑐𝑒𝑠𝑠         metrics presented in the paper are evaluated at top 𝐾
shows if any consumed part was listed among                  retrieved parts, 𝐾 = 5, 10. The algorithm is set up to
retrieved parts and π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ indicates the ratio of            learn symmetric 𝛼, a document-topic prior, from data
retrieved parts that were consumed to the total              as well as πœ‚, a topic-word prior. The number of
number of consumed parts. An additional metric               iterations is fixed at 100.
π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜ is used to estimate how far in the list of            In addition, we set an empirical parameter for the
retrieved parts one could find the full list of              ratio of English words appearing in the case
consumed parts in the query case and returns null if         description 𝑅𝐸𝑛 = 30%. A topic will be derived by
such π‘˜ does not exist. As a baseline, we use the initial     LDA trained on the entirely English corpus in case
part retrieval strategy and its statistics for the whole     the description contains at least 𝑅𝐸𝑛 English words,
set of retrieved and ranked parts 𝑃𝑅 (π‘ž). Once topics        otherwise the maintenance case will be marked as
are computed, the metrics are estimated for parts            β€œtopic undefined”.
associated with the cases in every topic 𝑑, i.e. a subset
of        cases      and,        therefore,         parts:
𝑃𝑅 (π‘ž)(𝑑) = {𝑃(π‘ž) | 𝑐 ∈ 𝐢(π‘ž) & 𝑐 ∈ 𝑑 } instead of 𝑃𝑅 (π‘ž).    5. Results and Discussion
We discard query cases that did not include
                                                       In this section, we compare the results of the initial
information whether some parts were consumed or
                                                       ER architecture evaluation to the results of the
not (i.e. missing data). If a case did not require any
                                                       modified architecture with the topic modelling
part replacement, we utilize an artificial part called
                                                       component as well as to the best possible results for
β€œNo parts” and assign an ID to it. In this way, for
                                                       the dataset of maintenance cases. We group queries
query cases that were solved without part
                                                       by levels of generalization, which stands for the
replacement it is possible to evaluate the
                                                       number of matched cases and retrieved parts in our
performance of part retrieval. The top ranked part in
                                                       setting. Moreover, since a number of topics is a
this situation should be β€œNo parts”.
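   For one query, these definitions can be computed directly from the ground-truth parts 𝑃(𝑐) and the ranked recommendation list; the following sketch only illustrates the definitions above and makes no assumption beyond having both lists available.

def metrics_at_k(true_parts, ranked_parts, K):
    """completeness@K, success@K, recall@K and min_top_k@K for one query.

    true_parts:   parts replaced in the query case, P(c); cases solved without
                  replacement carry the artificial "No parts" ID
    ranked_parts: recommended parts in rank order; the top K form P_R(q)
    """
    true_set = set(true_parts)
    top_k = set(ranked_parts[:K])

    completeness = 1 if true_set <= top_k else 0
    success = 1 if true_set & top_k else 0
    recall = len(true_set & top_k) / len(true_set)

    # smallest k <= K whose top-k list already contains all used parts, else None
    min_top_k = next((k for k in range(1, K + 1)
                      if true_set <= set(ranked_parts[:k])), None)
    return completeness, success, recall, min_top_k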
4.3. Implementation

The first step of the initial ER system is powered by Elasticsearch [11]. It performs the indexing of the documents in the knowledge base and retrieves them according to the Okapi BM25 ranking with the default tuning parameters π‘˜1 = 1.2 and 𝑏 = 0.75.

   For the add-on topic modelling component, we utilize Python NLP libraries: Gensim [12] for all the steps including topic modelling, and spaCy [13] for lemmatization. One step that is also customized to the maintenance application is the removal of stop phrases. We use the collection of English stop words pre-defined by Gensim and corpus-specific common phrases, such as questionnaire forms repeated across the majority of cases, since question formulations do not characterize individual cases.

   One characteristic of the LDA model is that it produces different topic distributions depending on the random seed used in its initialization. Therefore, every LDA model with the same set of parameters, except for the random seed, is computed several times; these repetitions are referred to as runs further in the text. Afterwards, all the metrics are averaged over the runs to get consistent results and minimize the influence of the algorithm's stochastic behavior. Another control parameter is the number of topics, which spans from 2 to 20 in our experiments. All the metrics presented in the paper are evaluated at top 𝐾 retrieved parts, 𝐾 = 5, 10. The algorithm is set up to learn the symmetric 𝛼, a document-topic prior, from data, as well as πœ‚, a topic-word prior. The number of iterations is fixed at 100.

   In addition, we set an empirical parameter for the ratio of English words appearing in the case description, 𝑅𝐸𝑛 = 30%. A topic is derived by the LDA model trained on the entirely English corpus if the description contains at least 𝑅𝐸𝑛 English words; otherwise the maintenance case is marked as β€œtopic undefined”.
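   As a small illustration of this language gate, the sketch below reuses the preprocess and assign_topic helpers from the Section 3.2 sketch; english_vocab stands for any available list of English words (for instance, the tokens of the English training dictionary) and is an assumption of the example, not a detail of the production system.

R_EN = 0.30   # empirical ratio of English words (Section 4.3)

def topic_or_undefined(lda, dictionary, text, english_vocab):
    """Assign a topic only when the description is sufficiently English."""
    tokens = preprocess(text)
    if not tokens:
        return "topic undefined"
    english_ratio = sum(tok in english_vocab for tok in tokens) / len(tokens)
    if english_ratio < R_EN:
        return "topic undefined"
    return assign_topic(lda, dictionary, text)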
5. Results and Discussion

In this section, we compare the results of the evaluation of the initial ER architecture to the results of the modified architecture with the topic modelling component, as well as to the best possible results for the dataset of maintenance cases. We group queries by levels of generalization, which in our setting stands for the number of matched cases and retrieved parts. Moreover, since the number of topics is a hyper-parameter that is not learned via training, we discuss the estimation of a possible number of topics using NLP coherence metrics and compare it with observations of the retrieval system's performance.

5.1. Retrieval Performance at Top K Parts

The performance of maintenance case and part retrieval in the initial configuration of the part retrieval system (Baseline) and in the configuration with the LDA topic modelling component (LDA) is evaluated using the above described metrics at different 𝐾. These results are also compared to the best possible results on the test dataset, computed at 𝐾 = ∞. We report a 95%-level confidence interval of the mean values of 5 runs with different random seeds for the LDA initialization in Figure 3. In addition, we show in Figure 4 the ratio of test queries for which the metrics improved with the topic modelling component in comparison to the baseline implementation.

Figure 3: Comparison of different metrics computed for LDA and baseline results in a part retrieval task. Confidence interval of 95% is shown as a box around LDA values.

   Comparing baseline results at different top 𝐾 retrieved parts, it can be seen that the values of 𝑠𝑒𝑐𝑐𝑒𝑠𝑠, π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ , π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ and π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜ increase with higher 𝐾 and achieve the possible maximum at 𝐾 = ∞. 𝑀𝑖𝑛_π‘‘π‘œπ‘_π‘˜@∞ is not the target value for this metric, since it is higher than the values of π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜@𝐾 for any 𝐾 β‰  ∞, while the goal is to minimize it. Since we target the lowest π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜@𝐾 possible, this metric is improved when the average value decreases.

   Overall improvement is observed for the experimental configuration with the topic modelling component. For metrics evaluated at 𝐾 = 10, the improvement reached 54.5%, 52.6% and 51.8% of the maximum possible improvement for π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ , π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ and 𝑠𝑒𝑐𝑐𝑒𝑠𝑠, respectively. This indicates that the introduced component effectively captures similar cases and, therefore, parts, too. The performance improvement brought by topic modelling is more prominent at smaller values of 𝐾, as can be seen from the difference between the average baseline values of π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ , π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ and 𝑠𝑒𝑐𝑐𝑒𝑠𝑠 and those of LDA in Figure 3.

   There is an increase in the ratio of improved queries for π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ , π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ and 𝑠𝑒𝑐𝑐𝑒𝑠𝑠 calculated at smaller 𝐾, as depicted in Figure 4: for example, from less than 4% of queries for π‘Ÿπ‘’π‘π‘Žπ‘™π‘™@10 to around 5.45% for π‘Ÿπ‘’π‘π‘Žπ‘™π‘™@5. Turning now to the ratio of queries with improved π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜@𝐾, it is higher for larger 𝐾, since the set of top ranked parts grows with greater 𝐾, as does the probability of finding all of the necessary parts among the top 𝐾 parts. Yet, it is the metric with the most prominent progress according to the ratio of queries that were improved using topic modelling: 10.49% to 11.20% for the LDA configuration.

Figure 4: Ratio of queries for which the performance metrics improved by the topic modelling component. Confidence interval of 95% is shown as a box around LDA values.

   While for some queries the metrics were improved by the introduction of the LDA component, 0.007% to 0.5% of queries experienced a deterioration of π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ , π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ and 𝑠𝑒𝑐𝑐𝑒𝑠𝑠 at different 𝐾, and 0.8% to 3.2% of queries experienced a deterioration of π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜@𝐾. This happens, for example, when documents with the right part suggestions do not appear in the same group. A possible solution (as well as a future work direction) is to integrate domain knowledge into the system and pre-define the number of topics and their characteristic terms so that they always appear in the same topic.

5.1.1. Performance Evaluation for Queries Grouped Based on the Number of Retrieved Cases and Parts

The queries are grouped by the number of parts used in the query case and of retrieved parts, as well as by the number of retrieved service cases, as demonstrated in Figure A in the Appendix. Similarly to Figure 3, the results are reported with a 95%-level confidence interval around the average over the runs. We distinguish the queries made for service cases that did not require any part replacement and mark them as |𝑃(𝑐)| = 0. The groups of queries that benefited the most from the integration of the topic modelling component are the following:

    1. queries with a number of retrieved cases |𝐢(π‘ž)| > 100,

    2. queries associated with cases that required 1 ≀ |𝑃(𝑐)| ≀ 10 parts,

    3. queries with retrieved and ranked parts 10 < |𝑃𝑅(π‘ž)| ≀ 100.

Therefore, topic modelling has a positive effect on the queries that result in extensive lists of cases and, thus, of parts appearing in those cases. Comparing this result to the distribution of queries in our experimental setting (Figure 2), the positive effect concerns the largest groups of queries.
Figure 5: Coherence metric 𝐢𝑣 and IR metric π‘Ÿπ‘’π‘π‘Žπ‘™π‘™@𝐾, 𝐾 = 5, 10, computed for 2–20 topics derived by LDA.

5.2. Number of Topics

LDA requires the number of topics to be passed as an input parameter. In some applications, this value is available as expert knowledge or is motivated by the dataset [14]. Alternatively, a set of coherence metrics can be used to indicate the semantic correspondences within and throughout the derived topics and to evaluate their quality [15]. When a target number of topics is unknown, it can be suggested by the elbow method applied to coherence measures. In our case, the coherence score 𝐢𝑣, estimated over 5 LDA instantiations with 2–20 topics, resulted in an elbow point between 5 and 9 topics, as shown in Figure 5. However, the best results in terms of the IR evaluation metrics were obtained in the majority of experiments with LDA at 𝐾 = 5 for 19 topics and at 𝐾 = 10 for 14 topics, as also demonstrated for π‘Ÿπ‘’π‘π‘Žπ‘™π‘™@𝐾 in Figure 5. In general, the models perform well with 13 or more topics in our experiment. The impact of the number of topics on the chosen evaluation metrics is smaller for 13 or more topics than for numbers of topics from 2 to 12.
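   Such a coherence sweep can be sketched with Gensim as follows, reusing the documents, dictionary and bag-of-words corpus from the Section 3.2 sketch; averaging over five seeds mirrors the runs of Section 4.3, while the remaining training settings are assumptions of the example.

from gensim.models import CoherenceModel, LdaModel

def coherence_by_num_topics(docs, dictionary, bows,
                            topic_range=range(2, 21), seeds=range(5)):
    """Average C_v coherence over several LDA runs per candidate topic count."""
    scores = {}
    for num_topics in topic_range:
        run_scores = []
        for seed in seeds:
            lda = LdaModel(bows, id2word=dictionary, num_topics=num_topics,
                           iterations=100, random_state=seed)
            cm = CoherenceModel(model=lda, texts=docs,
                                dictionary=dictionary, coherence="c_v")
            run_scores.append(cm.get_coherence())
        scores[num_topics] = sum(run_scores) / len(run_scores)
    return scores   # plot and look for an elbow, e.g. between 5 and 9 topics here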
6. Related Work

Areas related to our research span entity retrieval and knowledge management in industrial applications, which correspond to the scope of our work, while the use of topic modelling in IR is related to the methodology used in this paper.

6.1. Entity Retrieval Overview

Entity retrieval (ER) is defined in [16] as β€œthe task of answering queries with a ranked list of entities.” The area of entity retrieval is closely connected to IR and NLP as well as to database search and the Semantic Web. Both IR and ER are usually enabled with a search engine, a user interface and an available knowledge base. However, while IR aims at document retrieval, the target of ER is to provide a list of ranked entities, such as people, places or specific concepts and things. An entity is characterized by a unique ID, a name and possibly a set of attributes. Data that describes the entity can be stored in natural text or in a more structured form. NLP techniques are used for the representation of unstructured texts in a knowledge base, query processing and expansion, and query-document modelling. They also facilitate context capturing, named entity recognition and topic-oriented filtering in IR and ER [17, 16, 18]. Considering the classification in [16], our work can be categorized as a study on the improvement of an ad-hoc entity retrieval system that uses semantically enriched term representation and preserves topical relations among search results. Examples of industrial entity retrieval often include knowledge representation in the form of an ontology, as shown in [3, 4, 5].

6.2. Knowledge Management for Industrial Applications

Industries have been adopting process planning and knowledge-based systems for machine manufacturing and maintenance over the recent years [1, 2, 19]. In the literature review on spare part demand forecasting [20], it has been found that a large part of the research work has been dedicated to the analysis of historical demand using installed base information and reports.

   The work on technical support that utilizes a historical case base is particularly relevant to our research [21, 22, 23]. The goal of the paper [21] is to aid telecom technical support teams with a fast and accurate search over the solution base for previously registered cases and solutions from other technical texts. A method of populating an existing ontology has been proposed, using text segmentation and scoring, to serve the use case of Telecom Hardware remote user assistance. The authors in [24] propose a two-step method for spare part demand forecasting that predicts the number of repairs and the number of parts needed for a repair. Our work also involves the processing of a historical case base, but is not focused on spare part demand forecasting for general planning. It rather considers individual maintenance cases and addresses a lower level of granularity.

   Processing of Technical Documents. Studies apply NLP as a tool for extracting knowledge from natural texts in industrial log mining [25, 22, 26],
mining technical documentation [27], and classification of system failures and preventive maintenance [28, 23].

   The study [22] applies an NLP approach to maintenance data concerning a part of the Swedish railway system and identifies frequent failure cases on the railways. Text mining and NLP techniques are applied in [23] to analyze and classify construction site accidents using data from the Occupational Safety and Health Administration. In this setting, an ensemble method was used to obtain the Tfidf matrix, and a sequential quadratic programming method was used to assign weights to 5 classifiers.

   The work [29] focuses on building Machine Learning (ML) models to estimate the future duration of maintenance activities by identifying problem, solution and item features via text mining for pre-processing, followed by neural networks and decision trees for prediction. In [30], NLP is used to mine electronic documents composed of free-form text in order to extract terms of interest and the hierarchy of their contexts, and to form a set of normalized terms, including multi-word terms, for further data analysis.

   Therefore, the problems addressed in the maintenance services application domain are diverse in nature. However, to the best of our knowledge, the current paper is the first attempt to use entity retrieval techniques for spare part management.

6.3. Use of NLP and Topic Modelling in IR Systems

The effectiveness of IR systems can be improved by topic modelling that mines term associations in a collection of documents. Topic modelling can be integrated into IR tasks to smooth the document model with a document term prior estimated using term distributions over topics [31]. The work [32] explores the possibilities of modelling term associations as a way of integrating related terms into document models and proposes a model of probabilistic term association using the joint probability of terms. A combination of term indexing and topic modelling approaches is proposed in [33]. In the proposed model, every query term in a document is weighted using the LDA algorithm and IR indexing methods, and the best experimental results were obtained with the LDA-BM25 version. In our paper, by contrast, the similarity is computed using a vector space model and the retrieval results are combined using topic relations mined from a historical case base. Therefore, topic modelling is used as a clustering or grouping method on top of an ER system.

   In a number of research works, a combination of topic modelling and IR is applied to short texts [34]. For instance, the paper [35] describes a method that first pools similar tweets using an IR approach, merges relevant short texts into a larger document and trains an LDA model on the concatenated documents, thus obtaining richer topics. By contrast, our method addresses a domain-specific collection of short texts written in a so-called telegraphic style, with spelling mistakes and domain-related abbreviations.

   Search Results Clustering. To date, several studies have investigated document and language models based on topics and clusters. The work [36] explored cluster-based retrieval of documents, a mechanism that returns a relevant cluster of documents, and proposed two language models for ranking the clusters of documents and smoothing the documents using clusters. By contrast, some works cluster search results using traditional ML, graph-based and rank-based clustering techniques [6, 37]. For instance, the Lingo algorithm [38] focuses on learning phrases to represent clusters in a human-readable way; it then discovers topics using Tfidf weighting, performs term-document matrix reduction with SVD and matches the extracted phrases with topics. In comparison to these approaches, our work aims at retrieving entities rather than documents, and the user can explore all the retrieved parts within all the clusters instead of only one cluster.

7. Conclusion

In this work, we explored a way of improving a spare part retrieval system for remote diagnostics and maintenance of medical equipment by applying topic modelling to search results. The topic modelling component was used to cluster the results of a baseline retrieval system and improve the relevance of the search results. We aimed to support the decision-making process of maintenance service teams that search in a historical collection of troubleshooting reports and retrieve parts needed for a new similar issue.

   The experimental dataset was constructed from query-result pairs pointing at the historical case base and the parts used in the cases. We adjusted several IR metrics to evaluate the results of spare part retrieval in the baseline architecture and in the topic modelling component modification. The major enhancement was observed for the metric that estimates the minimum number of top ranked parts that are sufficient for the full treatment of a service case associated with a
performed query.

   A natural progression of this work is to apply online topic learning and to automatically recommend the topic that performs best for a given query. Input from domain experts would help fix the number of topics and the characteristic terms that should appear under one topic. Furthermore, additional domain knowledge could be combined with the entity retrieval system under consideration to suggest actions beyond part replacement, such as troubleshooting tests for remote and on-site diagnostics.

Acknowledgments

The authors would like to acknowledge the gracious support of this work through the local authorities under grant agreement β€œITEA-2018-17030-Daytime”.

References

 [1] G.-F. Liang, J.-T. Lin, S.-L. Hwang, E. M.-y. Wang, P. Patterson, Preventing human errors in aviation maintenance using an on-line maintenance assistance platform, International Journal of Industrial Ergonomics 40 (2010) 356–367. URL: https://linkinghub.elsevier.com/retrieve/pii/S0169814110000028. doi:10.1016/j.ergon.2010.01.001.
 [2] E. Ruschel, E. A. P. Santos, E. d. F. R. Loures, Industrial maintenance decision-making: A systematic literature review, Journal of Manufacturing Systems 45 (2017) 180–194. URL: https://doi.org/10.1016/j.jmsy.2017.09.003. doi:10.1016/j.jmsy.2017.09.003.
 [3] Z. Li, K. Ramani, Ontology-based design information extraction and retrieval, Artificial Intelligence for Engineering Design, Analysis and Manufacturing 21 (2007) 137–154. URL: https://www.cambridge.org/core/product/identifier/S0890060407070199/type/journal_article. doi:10.1017/S0890060407070199.
 [4] K. Ponnalagu, Ontology-driven root-cause analytics for user-reported symptoms in managed IT systems, IBM Journal of Research and Development 61 (2017) 53–61. doi:10.1147/JRD.2016.2629319.
 [5] M. Sharp, T. Sexton, M. P. Brundage, Toward Semi-autonomous Information Extraction for Unstructured Maintenance Data in Root Cause Analysis, in: IFIP Advances in Information and Communication Technology, volume 513, 2017, pp. 425–432. URL: http://link.springer.com/10.1007/978-3-319-66923-6_50. doi:10.1007/978-3-319-66923-6_50.
 [6] H. Toda, R. Kataoka, M. Oku, Search Result Clustering Using Informatively Named Entities, International Journal of Human-Computer Interaction 23 (2007) 3–23. URL: http://www.tandfonline.com/doi/abs/10.1080/10447310701360995. doi:10.1080/10447310701360995.
 [7] S. E. Robertson, S. Walker, K. S. Jones, M. M. Hancock-Beaulieu, Okapi at TREC-3, Proceedings of the Third Text REtrieval Conference (1994).
 [8] C. D. Manning, P. Raghavan, H. SchΓΌtze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, 2008. URL: http://ebooks.cambridge.org/ref/id/CBO9780511809071. doi:10.1017/CBO9780511809071.
 [9] C. D. Manning, H. SchΓΌtze, Foundations of Statistical Natural Language Processing, The MIT Press, 1999. URL: https://nlp.stanford.edu/fsnlp/.
[10] D. M. Blei, A. Y. Ng, M. I. Jordan, Latent Dirichlet Allocation, Journal of Machine Learning Research 3 (2003) 993–1022.
[11] Elasticsearch B.V., Elasticsearch. URL: https://www.elastic.co/.
[12] R. ŘehΕ―Ε™ek, P. Sojka, Software Framework for Topic Modelling with Large Corpora, in: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, ELRA, Valletta, Malta, 2010, pp. 45–50.
[13] M. Honnibal, I. Montani, spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing, 2017. To appear.
[14] R. J. Gallagher, K. Reing, D. Kale, G. Ver Steeg, Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge, Transactions of the Association for Computational Linguistics 5 (2017) 529–542. URL: https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00078. doi:10.1162/tacl_a_00078.
[15] M. RΓΆder, A. Both, A. Hinneburg, Exploring the Space of Topic Coherence Measures, in: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining - WSDM ’15, ACM Press, New York, New York, USA, 2015, pp. 399–408. URL: http://dl.acm.org/citation.cfm?doid=2684822.2685324. doi:10.1145/2684822.2685324.
[16] K. Balog, Entity-Oriented Search, volume 39 of The Information Retrieval Series, Springer International Publishing, Stavanger, Norway, 2018. URL: https://eos-book.org; http://link.springer.com/10.1007/978-3-319-93935-3. doi:10.1007/978-3-319-93935-3.
[17] S. BΓΌttcher, C. L. A. Clarke, G. V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, The MIT Press, 2010.
[18] Z. A. Merrouni, B. Frikh, B. Ouhbi, Toward Contextual Information Retrieval: A Review And Trends, Procedia Computer Science 148 (2019) 191–200. URL: https://linkinghub.elsevier.com/retrieve/pii/S1877050919300365. doi:10.1016/j.procs.2019.01.036.
[19] S. P. Leo Kumar, Knowledge-based expert system in manufacturing planning: state-of-the-art review, International Journal of Production Research 57 (2019) 4766–4790. doi:10.1080/00207543.2018.1424372.
[20] S. Van der Auweraer, R. N. Boute, A. A. Syntetos, Forecasting spare part demand with installed base information: A review, International Journal of Forecasting (2019). doi:10.1016/j.ijforecast.2018.09.002.
[21] A. Kouznetsov, J. B. Laurila, C. J. Baker, B. Shoebottom, Algorithm for Population of Object Property Assertions Derived from Telecom Contact Centre Product Support Documentation, in: 2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications, IEEE, 2011, pp. 41–46. URL: http://ieeexplore.ieee.org/document/5763435/. doi:10.1109/WAINA.2011.135.
[22] C. StenstrΓΆm, M. Aljumaili, A. Parida, Natural language processing of maintenance records data, International Journal of COMADEM 18 (2015) 33–37.
[23] F. Zhang, H. Fleyeh, X. Wang, M. Lu, Construction site accident analysis using text mining and natural language processing techniques, Automation in Construction 99 (2019) 238–248. URL: https://doi.org/10.1016/j.autcon.2018.12.016. doi:10.1016/j.autcon.2018.12.016.
[24] W. Romeijnders, R. Teunter, W. Van Jaarsveld, A two-step method for forecasting spare parts demand using information on component repairs, European Journal of Operational Research (2012). doi:10.1016/j.ejor.2012.01.019.
[25] R. Sipos, D. Fradkin, F. Moerchen, Z. Wang, Log-based predictive maintenance, in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’14, ACM Press, New York, New York, USA, 2014, pp. 1867–1876. URL: http://dl.acm.org/citation.cfm?doid=2623330.2623340. doi:10.1145/2623330.2623340.
[26] S. Agarwal, V. Aggarwal, A. R. Akula, G. B. Dasgupta, G. Sridhara, Automatic problem extraction and analysis from unstructured text in IT tickets, IBM Journal of Research and Development 61 (2017) 41–52. doi:10.1147/JRD.2016.2629318.
[27] K. Richardson, J. Kuhn, Learning semantic correspondences in technical documentation, ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) 1 (2017) 1612–1622. doi:10.18653/v1/P17-1148.
[28] K. Arif-Uz-Zaman, M. E. Cholette, L. Ma, A. Karim, Extracting failure time data from industrial maintenance records using text mining, Advanced Engineering Informatics 33 (2017) 388–396. URL: http://dx.doi.org/10.1016/j.aei.2016.11.004. doi:10.1016/j.aei.2016.11.004.
[29] M. Navinchandran, M. E. Sharp, M. P. Brundage, T. B. Sexton, Studies to predict maintenance time duration and important factors from maintenance workorder data, in: Proceedings of the Annual Conference of the Prognostics and Health Management Society, PHM, 2019. doi:10.36001/phmconf.2019.v11i1.792.
[30] A. Kao, N. B. Niraula, D. I. Whyatt, Text mining a dataset of electronic documents to discover terms of interest, 2020.
[31] L. Azzopardi, M. Girolami, C. van Rijsbergen, Topic based language models for ad hoc information retrieval, in: 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541), volume 4, IEEE, 2004, pp. 3281–3286. URL: http://ieeexplore.ieee.org/document/1381205/. doi:10.1109/IJCNN.2004.1381205.
[32] X. Wei, W. B. Croft, Modeling Term Associations for Ad-Hoc Retrieval Performance Within Language Modeling Framework, in: Advances in Information Retrieval, Springer Berlin Heidelberg, Berlin, Heidelberg, 2007, pp. 52–63. URL: http://link.springer.com/10.1007/978-3-540-71496-5_8. doi:10.1007/978-3-540-71496-5_8.
[33] F. Jian, J. X. Huang, J. Zhao, T. He, P. Hu, A Simple Enhancement for Ad-hoc Information Retrieval via Topic Modelling, in: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval - SIGIR ’16, ACM Press, New York, New York, USA, 2016, pp. 733–736. URL: http://dl.acm.org/citation.cfm?doid=2911451.2914748. doi:10.1145/2911451.2914748.
[34] J. Qiang, Z. Qian, Y. Li, Y. Yuan, X. Wu, Short Text Topic Modeling Techniques, Applications, and Performance: A Survey, IEEE Transactions on Knowledge and Data Engineering 14 (2020) 1–17. URL: https://ieeexplore.ieee.org/document/9086136/. doi:10.1109/TKDE.2020.2992485.
[35] M. Hajjem, C. Latiri, Combining IR and LDA Topic Modeling for Filtering Microblogs, in: Procedia Computer Science, 2017. doi:10.1016/j.procs.2017.08.166.
[36] X. Liu, W. B. Croft, Cluster-based retrieval using language models, in: Proceedings of the 27th annual international conference on Research and development in information retrieval - SIGIR ’04, ACM Press, New York, New York, USA, 2004, pp. 1–8. URL: http://portal.acm.org/citation.cfm?doid=1008992.1009026. doi:10.1145/1008992.1009026.
[37] K. Sadaf, Web Search Result Clustering - A Review, International Journal of Computer Science & Engineering Survey 3 (2012) 85–92. URL: http://www.airccse.org/journal/ijcses/papers/3412ijcses07.pdf. doi:10.5121/ijcses.2012.3407.
[38] S. OsiΕ„ski, J. Stefanowski, D. Weiss, Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition, in: Intelligent Information Processing and Web Mining, Springer Berlin Heidelberg, Berlin, Heidelberg, 2004, pp. 359–368. URL: http://link.springer.com/10.1007/978-3-540-39985-8_37. doi:10.1007/978-3-540-39985-8_37.




A. Topic Modelling Component Performance Evaluation for Grouped Queries

Figure A: Comparison of different metrics computed for LDA and baseline results in a part retrieval task. Queries are divided into groups using the number of retrieved cases, as well as used and retrieved parts. Confidence interval of 95% is shown as a box around LDA values.