Improving Spare Part Search for Maintenance Services
using Topic Modelling
Anastasiia Grishinaa , Milosh Stolikjb , Qi Gaob and Milan Petkovica,b
a Eindhoven University of Technology, Den Dolech 2, 5612 AZ, Eindhoven, The Netherlands
b Philips Research, High Tech Campus 34, 5656 AE, Eindhoven, The Netherlands



Abstract

To support the decision-making process in various industrial applications, many companies use knowledge management and Information Retrieval (IR). In an industrial setting, knowledge is extracted from data that is often stored in a semi-structured or unstructured format. As a result, Natural Language Processing (NLP) methods have been applied to a number of IR steps. In this work, we explore how NLP, and particularly topic modelling, can be used to improve the relevance of spare part retrieval in the context of maintenance services. The proposed methodology extracts topics from short maintenance service reports that also include part replacement data. The intuition behind it is that every topic should represent a specific root cause. Experiments were conducted for an ad-hoc retrieval system of service case descriptions and spare parts. The results show that our modification improves the baseline system, thus boosting the performance of maintenance service solution recommendation.

                                          Keywords
                                          Entity retrieval, spare part search, decision support, maintenance services, natural language processing, topic modelling


Proceedings of the CIKM 2020 Workshops, October 19-20, 2020, Galway, Ireland
a.grishina@tue.nl (A. Grishina); m.stolikj@philips.com (M. Stolikj); q.gao@philips.com (Q. Gao); milan.petkovic@philips.com (M. Petkovic)
ORCID: 0000-0003-3139-0200 (A. Grishina)
Β© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Information retrieval systems are gaining importance in various industrial applications. We can observe the emergence of knowledge-based systems that support the decision-making process in construction, aviation, equipment maintenance and other areas [1, 2]. In these settings, knowledge is frequently extracted from data that is captured in legacy systems using natural language and stored in a semi-structured or unstructured format. As a result, linguistic and statistical NLP methods have been applied to a number of IR steps, such as document and query modelling, query expansion and search result clustering based on semantic similarities [3, 4, 5, 6].

   In this work, we explore how NLP, and particularly topic modelling, can be used to improve spare part retrieval for medical equipment maintenance. In particular, we focus on remote system diagnostics that takes place when the equipment malfunctions, i.e. stops working according to its specification. The problem may be resolved in several ways, one of which is the replacement of one or more (malfunctioning) parts. We conducted our research in the context of an ad-hoc entity retrieval system which helps engineers to search for relevant historical service reports and identify the most probable service solution. Therefore, the target retrieval entities are equipment components, i.e. parts to be replaced. In practice, one case may require multiple parts to be replaced.

   To address the challenge of spare part retrieval, we create an NLP pipeline that pre-processes short textual descriptions of maintenance activities and applies topic modelling to categorize the descriptions of past cases. From relevant maintenance service reports, the proposed methodology extracts topics, each of which may indicate a specific root cause. Once categorized, cases and parts are easier to examine and more relevant to a particular type of failure. An engineer can address topics sequentially and choose among parts related to the same topic. Therefore, we exploit term co-occurrences and their semantic correspondences using topic modelling to enhance the relevance of target entity retrieval. Although the use case assumes that a number of parts will ultimately be suggested based on past maintenance records, the problem statement does not fall under the vastly explored area of recommender systems, which involves user preference modelling.

   To evaluate the difference introduced by the proposed component, we use IR metrics that are customized to characterize the relevance and completeness of a set of retrieved entities. They measure how far in the list of search results all the required parts are present, indicate whether at least one required entity is retrieved, and whether all needed parts are present among the top K search results.
   The main contributions of this work are as follows:

     β€’ we enhance the performance of an industrial entity retrieval system by learning semantic correspondences between short historical descriptions of events associated with the entities;

     β€’ we approach the challenge of spare part retrieval in remote system diagnostics and maintenance of industrial equipment using topic modelling to group extracted historical cases and parts under topics that should represent failure root causes;

     β€’ we evaluate the proposed method on a real-world dataset using customized information retrieval metrics.

   The remainder of this paper is organized as follows. We present the problem formulation and a baseline part retrieval system in Section 2. The methodology of combining the text mining pipeline and the entity retrieval process is described in Section 3. Section 4 is dedicated to the dataset description and the implementation of the methods. We discuss experimental results in Section 5 and related work in Section 6. The paper is concluded in Section 7, where we also mention possible directions for future work.

2. Problem Description

In the scope of this work, entity descriptions are composed of equipment characteristics and represented by maintenance case reports registered in the retrieval system. Entities to be retrieved are the parts recommended for replacement to troubleshoot a machine referred to in a new malfunction report. Queries may contain various characteristics of a new maintenance case that should be treated by a maintenance service team. An entity, i.e. a spare part, is identified with a unique ID and is related to a case description. One historical maintenance case can have several parts associated with it; similarly, a new service case may require a set of different parts.

   The knowledge base of maintenance cases is updated with the help of service engineers. They submit maintenance reports for every equipment failure or customer complaint as short technical texts, often in multiple languages (English and a locally spoken language). Each historical report includes a number of logs, such as the time of customer complaint registration, a textual description of maintenance activities and the IDs of parts used to solve the issue. Hence, the reports might contain abbreviations, software logs sent by a machine as well as natural language descriptions of the machine state at every step of the maintenance process. Closed cases are uploaded to the collection of historical cases that can be mined using the above-mentioned ER system.

   To present the setting in a formal way, let π‘ž be a query performed by a service engineer while working on a case. We will use the term query case to indicate such cases. Each query is associated with a single maintenance case. The list of parts replaced in a case 𝑐 is 𝑃(𝑐). We use 𝐢(π‘ž) to denote the list of cases retrieved for the query π‘ž. The set of parts replaced in all retrieved cases is denoted by 𝑃(π‘ž) = βˆͺπ‘βˆˆπΆ(π‘ž) 𝑃(𝑐), and the set of ranked parts recommended for replacement is expressed by 𝑃𝑅(π‘ž) βŠ† 𝑃(π‘ž).

3. Methodology

The method proposed in this work combines a baseline entity retrieval setting and an add-on topic modelling component, as described below.

3.1. Baseline Entity Retrieval System

The baseline entity search system in question is empowered with a two-step retrieval mechanism built on top of a database of entity descriptions. It consists of entity description retrieval followed by the final entity retrieval and ranking, as explained in detail below.

3.1.1. Retrieval of Entity Descriptions

At the first step of the entity search, the system retrieves relevant descriptions using a Vector Space Model (VSM) with the Okapi BM25 similarity score [7, 8]. VSM is a document and query representation model that converts texts to N-dimensional vectors of term weights, where N is the number of words in a dictionary. Terms are simply the words or groups of words present in the collection of documents. The dictionary is built from a text corpus and includes distinct terms. The intuition behind VSM is that retrieved documents will be ranked according to a similarity function computed for a query and a document, i.e. vectors in a vector space.

   In the context of our problem description, for a query π‘ž containing keywords π‘ž1, …, π‘žπ‘› and a maintenance case description 𝑐 with fields 𝑐1, …, π‘π‘š, the Okapi
BM25 similarity score can be expressed as follows:

   𝐡𝑀25(π‘ž, 𝑐) = Ξ£_{𝑗=1..π‘š} Ξ£_{𝑖=1..𝑛} 𝐼𝐷𝐹(π‘žπ‘–) Β· 𝑓(π‘žπ‘–, 𝑐𝑗) Β· (π‘˜1 + 1) / ( 𝑓(π‘žπ‘–, 𝑐𝑗) + π‘˜1 Β· (1 βˆ’ 𝑏 + 𝑏 Β· 𝐿𝑐𝑗 / πΏπ‘Žπ‘£π‘”π‘—) )      (1)

   Here, 𝑓(π‘žπ‘–, 𝑐𝑗) is the frequency of the keyword π‘žπ‘– in a field 𝑐𝑗 of the case description 𝑐, 𝐿𝑐𝑗 is the length of the field 𝑐𝑗 in terms of words, and πΏπ‘Žπ‘£π‘”π‘— is the average length of the field 𝑗 in the descriptions of all cases in the collection 𝐢. The variables π‘˜1 and 𝑏 are tuning parameters that control how much every new occurrence of a term impacts the score and the document length scaling, respectively. Inverse Document Frequency is calculated as:

   𝐼𝐷𝐹(π‘žπ‘–) = log( (𝑀 βˆ’ 𝑛(π‘žπ‘–) + 0.5) / (𝑛(π‘žπ‘–) + 0.5) )      (2)

where 𝑀 is the total number of cases, i.e. 𝑀 = |𝐢(π‘ž)|, and 𝑛(π‘žπ‘–) is the number of case descriptions that contain the query term π‘žπ‘–. Therefore, the case 𝑐𝑖1 ∈ 𝐢(π‘ž) is ranked higher than 𝑐𝑖2 ∈ 𝐢(π‘ž) iff 𝐡𝑀25(π‘ž, 𝑐𝑖1) > 𝐡𝑀25(π‘ž, 𝑐𝑖2).
3.1.2. Entity Retrieval and Ranking

The second step realizes the entity retrieval. It ranks the spare parts associated with the retrieved cases based on the frequency of their occurrence and the rank of the cases in which they occur. Thus, the most frequent parts that occur in top-ranked cases appear higher in the final list of retrieved parts than parts that appear the same number of times lower in the case list. Several proprietary filters are applied as well, but they do not affect the methodology. The algorithm for part recommendation is presented in Algorithm 1.

Algorithm 1 Part Recommendation
Input: query π‘ž associated with a maintenance case 𝑐, number of parts to recommend 𝐾
Output: a list of recommended parts
   π‘π‘œπ‘’π‘›π‘‘ ← {}                  ⊳ occurrences of part combinations
   𝑃(π‘ž) ← {}                   ⊳ retrieved parts
   𝑃𝑅(π‘ž) ← {}                  ⊳ recommended parts
   for 𝑐 ∈ 𝐢(π‘ž) do
       𝑃(𝑐) ← get_part_IDs(𝑐)
       if 𝑃(𝑐) ∈ 𝑃(π‘ž) then
           π‘π‘œπ‘’π‘›π‘‘(𝑃(𝑐)) ← π‘π‘œπ‘’π‘›π‘‘(𝑃(𝑐)) + 1
       else
           𝑃(π‘ž) ← 𝑃(π‘ž) βˆͺ 𝑃(𝑐)
           π‘π‘œπ‘’π‘›π‘‘(𝑃(𝑐)) ← 1
       end if
       sort(𝑃(π‘ž), using=π‘π‘œπ‘’π‘›π‘‘(𝑃(𝑐)), order=DESC)
       for 𝑃(𝑐) ∈ 𝑃(π‘ž) do
           𝑃(π‘ž) ← 𝑃(π‘ž) βˆͺ 𝑃(𝑐)
           drop_duplicates(𝑃(π‘ž))
       end for
   end for
   𝑃𝑅(π‘ž) ← top_K(𝑃(π‘ž))
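   Read as executable pseudocode, Algorithm 1 can be sketched in Python as follows. The case["parts"] field is a hypothetical accessor for the replaced part IDs, and the proprietary filters mentioned above are omitted.

def recommend_parts(retrieved_cases, K):
    """Sketch of Algorithm 1: rank parts from the cases retrieved for a query.

    retrieved_cases: cases C(q) in BM25 rank order; each case is assumed to
                     expose the list of replaced part IDs as case["parts"]
    K:               number of parts to recommend
    """
    count = {}        # occurrences of each part combination P(c)
    order = []        # combinations in the rank order of first appearance
    for case in retrieved_cases:
        combo = tuple(sorted(case["parts"]))          # P(c)
        if combo in count:
            count[combo] += 1
        else:
            count[combo] = 1
            order.append(combo)

    # Most frequent combinations first; ties keep the order of the
    # higher-ranked case because the sort is stable.
    ranked_combos = sorted(order, key=lambda c: count[c], reverse=True)

    recommended = []  # flattened, de-duplicated part list
    for combo in ranked_combos:
        for part in combo:
            if part not in recommended:
                recommended.append(part)
    return recommended[:K]                            # P_R(q)

   The stable sort reproduces the intent of ranking parts by both their frequency and the rank of the case in which they occur.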
3.2. Topic Modelling Component

The transformation of the historical case and part retrieval pipeline is performed by adding a component that groups retrieved cases under a number of topics and ranks the parts within the topics. Figure 1 shows the baseline architecture (a) and the modification that includes the proposed topic modelling component (b).

   The topic modelling component can be considered as an individual NLP pipeline with a number of steps. The pipeline includes tokenization, lemmatization, removal of stop phrases, building a dictionary of tokens, term weighting and topic modelling. Tokenization of the text refers to splitting it into units or tokens that represent individual words or sometimes groups of words [9]. The process of lemmatization involves finding the initial forms of the inflected words, also referred to as root forms or lemmas. A lemma is a word in its canonical form that exists in the dictionary of the used language. For example, the lemma for do, doing, did is the word do. Next, term weighting refers to assigning weights to tokens. We utilize term frequency, or bag-of-words, weights as the term weighting scheme. It associates a term with a weight proportional to the frequency of the term occurrence in the corpus of documents. For topic modelling, we use Latent Dirichlet Allocation (LDA), one of the most popular algorithms for automatically extracting topics. LDA is based on a generative probabilistic language model [10]. The purpose of LDA is to learn the representation of a fixed number of topics and derive the topic distribution for every document in a collection. Every maintenance service case is assigned a topic according to the maximum probability of the case belonging to a topic.
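   A condensed sketch of this pipeline with Gensim and spaCy, the libraries named in Section 4.3, is given below. The spaCy model name, the placeholder stop-phrase list and the prior settings are assumptions of the example rather than the exact production configuration.

import spacy
from gensim import corpora, models
from gensim.parsing.preprocessing import STOPWORDS

nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])   # assumed spaCy model
STOP_PHRASES = ["please describe the problem"]   # placeholder for corpus-specific phrases

def preprocess(text):
    """Tokenize, lemmatize and drop stop words and stop phrases."""
    for phrase in STOP_PHRASES:
        text = text.replace(phrase, " ")
    return [tok.lemma_.lower() for tok in nlp(text)
            if tok.is_alpha and tok.lemma_.lower() not in STOPWORDS]

def fit_lda(case_texts, num_topics, seed=0):
    """Build the token dictionary, bag-of-words corpus and LDA model."""
    docs = [preprocess(t) for t in case_texts]
    dictionary = corpora.Dictionary(docs)
    bows = [dictionary.doc2bow(d) for d in docs]          # term-frequency weights
    lda = models.LdaModel(bows, id2word=dictionary, num_topics=num_topics,
                          iterations=100, random_state=seed,
                          alpha="auto", eta="auto")       # priors learned from data (Sec. 4.3)
    return lda, dictionary, docs, bows

def assign_topic(lda, dictionary, text):
    """Assign the topic with the highest probability to a case description."""
    bow = dictionary.doc2bow(preprocess(text))
    topics = lda.get_document_topics(bow)
    return max(topics, key=lambda t: t[1])[0] if topics else None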
Figure 1: Integration of the topic modelling component (b) in a baseline two-step document and part retrieval system (a).

4. Evaluation

In this section, we describe the real-world dataset that is extracted from the baseline part retrieval system. We also discuss the metrics used to evaluate the performance of the baseline system and compare it to the configuration with the integrated topic modelling component.

4.1. Dataset Description

For our experiments, we use a proprietary dataset composed of historical maintenance cases. The textual fields of the case descriptions have been aggregated into one field per maintenance case and serve as input to LDA during the training and testing stages. The majority of cases are written in mixed languages. Figure 2 presents the distribution of the number of queries over their characteristics: the number of retrieved service cases, of retrieved ranked parts to replace, and of parts replaced in the query case. The majority of queries retrieved up to 200 similar case descriptions; however, this number could reach 1000 cases. The number of unique recommended parts retrieved from these cases was below 350 in general, while the majority of queries retrieved 0-10 parts. The number of parts required to treat the maintenance case associated with the query was equal to 5 or less in most of the query cases.

Figure 2: Distribution of queries over the number of cases and parts retrieved in response to the queries.

   For building the LDA model, we use a subset of historical cases written in English. The training set contains data from 101,026 different maintenance cases. For the test set, we use a sample of 1,564 queries performed by service engineers, together with the corresponding cases returned as search results: (π‘ž, 𝐢(π‘ž)). Cases returned for the queries may have a non-empty intersection with the training dataset; however, the cases for which the queries had been created were excluded from the training set.

4.2. Evaluation Metrics

Top 𝐾 ranked parts are used to estimate the 𝑠𝑒𝑐𝑐𝑒𝑠𝑠, π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ , π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ and π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜ metrics. Metric@K is computed for a set of retrieved parts with |𝑃𝑅(π‘ž)| ≀ 𝐾. The operator |Β·| applied to a set denotes the number of its elements. The metrics are calculated as follows:

   π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ @𝐾(π‘ž) = 1 if 𝑃(𝑐) βŠ† 𝑃𝑅(π‘ž), and 0 otherwise;

   𝑠𝑒𝑐𝑐𝑒𝑠𝑠@𝐾(π‘ž) = 1 if |𝑃(𝑐) ∩ 𝑃𝑅(π‘ž)| > 0, and 0 otherwise;

   π‘Ÿπ‘’π‘π‘Žπ‘™π‘™@𝐾(π‘ž) = |𝑃(𝑐) ∩ 𝑃𝑅(π‘ž)| / |𝑃(𝑐)|;

   π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜@𝐾(π‘ž) = the smallest π‘˜ ≀ 𝐾 such that π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ @π‘˜(π‘ž) = 1.
   πΆπ‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘  measures whether all the used parts          which spans from 2 to 20 in our experiments. All the
were suggested for a troubleshooting report, 𝑠𝑒𝑐𝑐𝑒𝑠𝑠         metrics presented in the paper are evaluated at top 𝐾
shows if any consumed part was listed among                  retrieved parts, 𝐾 = 5, 10. The algorithm is set up to
retrieved parts and π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ indicates the ratio of            learn symmetric 𝛼, a document-topic prior, from data
retrieved parts that were consumed to the total              as well as πœ‚, a topic-word prior. The number of
number of consumed parts. An additional metric               iterations is fixed at 100.
π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜ is used to estimate how far in the list of            In addition, we set an empirical parameter for the
retrieved parts one could find the full list of              ratio of English words appearing in the case
consumed parts in the query case and returns null if         description 𝑅𝐸𝑛 = 30%. A topic will be derived by
such π‘˜ does not exist. As a baseline, we use the initial     LDA trained on the entirely English corpus in case
part retrieval strategy and its statistics for the whole     the description contains at least 𝑅𝐸𝑛 English words,
set of retrieved and ranked parts 𝑃𝑅 (π‘ž). Once topics        otherwise the maintenance case will be marked as
are computed, the metrics are estimated for parts            β€œtopic undefined”.
associated with the cases in every topic 𝑑, i.e. a subset
of        cases      and,        therefore,         parts:
𝑃𝑅 (π‘ž)(𝑑) = {𝑃(π‘ž) | 𝑐 ∈ 𝐢(π‘ž) & 𝑐 ∈ 𝑑 } instead of 𝑃𝑅 (π‘ž).    5. Results and Discussion
We discard query cases that did not include
                                                       In this section, we compare the results of the initial
information whether some parts were consumed or
                                                       ER architecture evaluation to the results of the
not (i.e. missing data). If a case did not require any
                                                       modified architecture with the topic modelling
part replacement, we utilize an artificial part called
                                                       component as well as to the best possible results for
β€œNo parts” and assign an ID to it. In this way, for
                                                       the dataset of maintenance cases. We group queries
query cases that were solved without part
                                                       by levels of generalization, which stands for the
replacement it is possible to evaluate the
                                                       number of matched cases and retrieved parts in our
performance of part retrieval. The top ranked part in
                                                       setting. Moreover, since a number of topics is a
this situation should be β€œNo parts”.
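   For one query, these definitions can be computed directly from the ground-truth parts 𝑃(𝑐) and the ranked recommendation list; the following sketch only illustrates the definitions above and makes no assumption beyond having both lists available.

def metrics_at_k(true_parts, ranked_parts, K):
    """completeness@K, success@K, recall@K and min_top_k@K for one query.

    true_parts:   parts replaced in the query case, P(c); cases solved without
                  replacement carry the artificial "No parts" ID
    ranked_parts: recommended parts in rank order; the top K form P_R(q)
    """
    true_set = set(true_parts)
    top_k = set(ranked_parts[:K])

    completeness = 1 if true_set <= top_k else 0
    success = 1 if true_set & top_k else 0
    recall = len(true_set & top_k) / len(true_set)

    # smallest k <= K whose top-k list already contains all used parts, else None
    min_top_k = next((k for k in range(1, K + 1)
                      if true_set <= set(ranked_parts[:k])), None)
    return completeness, success, recall, min_top_k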
4.3. Implementation

The first step of the initial ER system is powered by Elasticsearch [11]. It performs the indexing of the documents in the knowledge base and retrieves them according to the Okapi BM25 ranking with the default tuning parameters π‘˜1 = 1.2 and 𝑏 = 0.75.

   For the add-on topic modelling component, we utilize Python NLP libraries: Gensim [12] for all the steps including topic modelling, and spaCy [13] for lemmatization. One step that is also customized to the maintenance application is the removal of stop phrases. We use the collection of English stop words pre-defined by Gensim and corpus-specific common phrases, such as questionnaire forms repeated across the majority of cases, since question formulations do not characterize individual cases.

   One characteristic of the LDA model is that it produces different topic distributions depending on the random seed used in its initialization. Therefore, every LDA model with the same set of parameters, except for the random seed, is computed several times; these repetitions are referred to as runs further in the text. Afterwards, all the metrics are averaged over the runs to get consistent results and minimize the influence of the algorithm's stochastic behavior. Another control parameter is the number of topics, which spans from 2 to 20 in our experiments. All the metrics presented in the paper are evaluated at top 𝐾 retrieved parts, 𝐾 = 5, 10. The algorithm is set up to learn the symmetric 𝛼, a document-topic prior, from data, as well as πœ‚, a topic-word prior. The number of iterations is fixed at 100.

   In addition, we set an empirical parameter for the ratio of English words appearing in the case description, 𝑅𝐸𝑛 = 30%. A topic is derived by the LDA model trained on the entirely English corpus if the description contains at least 𝑅𝐸𝑛 English words; otherwise the maintenance case is marked as β€œtopic undefined”.
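   As a small illustration of this language gate, the sketch below reuses the preprocess and assign_topic helpers from the Section 3.2 sketch; english_vocab stands for any available list of English words (for instance, the tokens of the English training dictionary) and is an assumption of the example, not a detail of the production system.

R_EN = 0.30   # empirical ratio of English words (Section 4.3)

def topic_or_undefined(lda, dictionary, text, english_vocab):
    """Assign a topic only when the description is sufficiently English."""
    tokens = preprocess(text)
    if not tokens:
        return "topic undefined"
    english_ratio = sum(tok in english_vocab for tok in tokens) / len(tokens)
    if english_ratio < R_EN:
        return "topic undefined"
    return assign_topic(lda, dictionary, text)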
5. Results and Discussion

In this section, we compare the results of the evaluation of the initial ER architecture to the results of the modified architecture with the topic modelling component, as well as to the best possible results for the dataset of maintenance cases. We group queries by levels of generalization, which in our setting stands for the number of matched cases and retrieved parts. Moreover, since the number of topics is a hyper-parameter that is not learned via training, we discuss the estimation of a possible number of topics using NLP coherence metrics and compare it with observations of the retrieval system's performance.

5.1. Retrieval Performance at Top K Parts

The performance of maintenance case and part retrieval in the initial configuration of the part retrieval system (Baseline) and in the configuration with the LDA topic modelling component (LDA) is evaluated using the above described metrics at different 𝐾. These results are also compared to the best possible results on the test dataset, computed at 𝐾 = ∞. We report a 95%-level confidence interval of the mean values of 5 runs with different random seeds for the LDA initialization in Figure 3. In addition, we show in Figure 4 the ratio of test queries for which the metrics improved with the topic modelling component in comparison to the baseline implementation.

Figure 3: Comparison of different metrics computed for LDA and baseline results in a part retrieval task. Confidence interval of 95% is shown as a box around LDA values.

   Comparing baseline results at different top 𝐾 retrieved parts, it can be seen that the values of 𝑠𝑒𝑐𝑐𝑒𝑠𝑠, π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ , π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ and π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜ increase with higher 𝐾 and achieve the possible maximum at 𝐾 = ∞. 𝑀𝑖𝑛_π‘‘π‘œπ‘_π‘˜@∞ is not the target value for this metric, since it is higher than the values of π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜@𝐾 for any 𝐾 β‰  ∞, while the goal is to minimize it. Since we target the lowest π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜@𝐾 possible, this metric is improved when the average value decreases.

   Overall improvement is observed for the experimental configuration with the topic modelling component. For metrics evaluated at 𝐾 = 10, the improvement reached 54.5%, 52.6% and 51.8% of the maximum possible improvement for π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ , π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ and 𝑠𝑒𝑐𝑐𝑒𝑠𝑠, respectively. This indicates that the introduced component effectively captures similar cases and, therefore, parts, too. The performance improvement brought by topic modelling is more prominent at smaller values of 𝐾, as can be seen from the difference between the average baseline values of π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ , π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ and 𝑠𝑒𝑐𝑐𝑒𝑠𝑠 and those of LDA in Figure 3.

   There is an increase in the ratio of improved queries for π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ , π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ and 𝑠𝑒𝑐𝑐𝑒𝑠𝑠 calculated at smaller 𝐾, as depicted in Figure 4: for example, from less than 4% of queries for π‘Ÿπ‘’π‘π‘Žπ‘™π‘™@10 to around 5.45% for π‘Ÿπ‘’π‘π‘Žπ‘™π‘™@5. Turning now to the ratio of queries with improved π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜@𝐾, it is higher for larger 𝐾, since the set of top ranked parts grows with greater 𝐾, as does the probability of finding all of the necessary parts among the top 𝐾 parts. Yet, it is the metric with the most prominent progress according to the ratio of queries that were improved using topic modelling: 10.49% to 11.20% for the LDA configuration.

Figure 4: Ratio of queries for which the performance metrics improved by the topic modelling component. Confidence interval of 95% is shown as a box around LDA values.

   While for some queries the metrics were improved by the introduction of the LDA component, 0.007% to 0.5% of queries experienced a deterioration of π‘π‘œπ‘šπ‘π‘™π‘’π‘‘π‘’π‘›π‘’π‘ π‘ , π‘Ÿπ‘’π‘π‘Žπ‘™π‘™ and 𝑠𝑒𝑐𝑐𝑒𝑠𝑠 at different 𝐾, and 0.8% to 3.2% of queries experienced a deterioration of π‘šπ‘–π‘›_π‘‘π‘œπ‘_π‘˜@𝐾. This happens, for example, when documents with the right part suggestions do not appear in the same group. A possible solution (as well as a future work direction) is to integrate domain knowledge into the system and pre-define the number of topics and their characteristic terms so that they always appear in the same topic.

5.1.1. Performance Evaluation for Queries Grouped Based on the Number of Retrieved Cases and Parts

The queries are grouped by the number of parts used in the query case and of retrieved parts, as well as by the number of retrieved service cases, as demonstrated in Figure A in the Appendix. Similarly to Figure 3, the results are reported with a 95%-level confidence interval around the average over the runs. We distinguish the queries made for service cases that did not require any part replacement and mark them as |𝑃(𝑐)| = 0. The groups of queries that benefited the most from the integration of the topic modelling component are the following:

    1. queries with a number of retrieved cases |𝐢(π‘ž)| > 100,

    2. queries associated with cases that required 1 ≀ |𝑃(𝑐)| ≀ 10 parts,

    3. queries with retrieved and ranked parts 10 < |𝑃𝑅(π‘ž)| ≀ 100.

Therefore, topic modelling has a positive effect on the queries that result in extensive lists of cases and, thus, of parts appearing in those cases. Comparing this result to the distribution of queries in our experimental setting (Figure 2), the positive effect concerns the largest groups of queries.
Figure 5: Coherence metric 𝐢𝑣 and IR metric π‘Ÿπ‘’π‘π‘Žπ‘™π‘™@𝐾, 𝐾 = 5, 10, computed for 2–20 topics derived by LDA.

5.2. Number of Topics

LDA requires the number of topics to be passed as an input parameter. In some applications, this value is available as expert knowledge or is motivated by the dataset [14]. Alternatively, a set of coherence metrics can be used to indicate the semantic correspondences within and throughout the derived topics and to evaluate their quality [15]. When a target number of topics is unknown, it can be suggested by the elbow method applied to coherence measures. In our case, the coherence score 𝐢𝑣, estimated over 5 LDA instantiations with 2–20 topics, resulted in an elbow point between 5 and 9 topics, as shown in Figure 5. However, the best results in terms of the IR evaluation metrics were obtained in the majority of experiments with LDA at 𝐾 = 5 for 19 topics and at 𝐾 = 10 for 14 topics, as also demonstrated for π‘Ÿπ‘’π‘π‘Žπ‘™π‘™@𝐾 in Figure 5. In general, the models perform well with 13 or more topics in our experiment. The impact of the number of topics on the chosen evaluation metrics is smaller for 13 or more topics than for numbers of topics from 2 to 12.
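   Such a coherence sweep can be sketched with Gensim as follows, reusing the documents, dictionary and bag-of-words corpus from the Section 3.2 sketch; averaging over five seeds mirrors the runs of Section 4.3, while the remaining training settings are assumptions of the example.

from gensim.models import CoherenceModel, LdaModel

def coherence_by_num_topics(docs, dictionary, bows,
                            topic_range=range(2, 21), seeds=range(5)):
    """Average C_v coherence over several LDA runs per candidate topic count."""
    scores = {}
    for num_topics in topic_range:
        run_scores = []
        for seed in seeds:
            lda = LdaModel(bows, id2word=dictionary, num_topics=num_topics,
                           iterations=100, random_state=seed)
            cm = CoherenceModel(model=lda, texts=docs,
                                dictionary=dictionary, coherence="c_v")
            run_scores.append(cm.get_coherence())
        scores[num_topics] = sum(run_scores) / len(run_scores)
    return scores   # plot and look for an elbow, e.g. between 5 and 9 topics here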
6. Related Work

Areas related to our research span entity retrieval and knowledge management in industrial applications, which correspond to the scope of our work, while the use of topic modelling in IR is related to the methodology used in this paper.

6.1. Entity Retrieval Overview

Entity retrieval (ER) is defined in [16] as β€œthe task of answering queries with a ranked list of entities.” The area of entity retrieval is closely connected to IR and NLP as well as to database search and the Semantic Web. Both IR and ER are usually enabled with a search engine, a user interface and an available knowledge base. However, while IR aims at document retrieval, the target of ER is to provide a list of ranked entities, such as people, places or specific concepts and things. An entity is characterized by a unique ID, a name and possibly a set of attributes. Data that describes the entity can be stored in natural text or in a more structured form. NLP techniques are used for the representation of unstructured texts in a knowledge base, query processing and expansion, and query-document modelling. They also facilitate context capturing, named entity recognition and topic-oriented filtering in IR and ER [17, 16, 18]. Considering the classification in [16], our work can be categorized as a study on the improvement of an ad-hoc entity retrieval system that uses semantically enriched term representation and preserves topical relations among search results. Examples of industrial entity retrieval often include knowledge representation in the form of an ontology, as shown in [3, 4, 5].

6.2. Knowledge Management for Industrial Applications

Industries have been adopting process planning and knowledge-based systems for machine manufacturing and maintenance over the recent years [1, 2, 19]. In the literature review on spare part demand forecasting [20], it has been found that a large part of the research work has been dedicated to the analysis of historical demand using installed base information and reports.

   The work on technical support that utilizes a historical case base is particularly relevant to our research [21, 22, 23]. The goal of the paper [21] is to aid telecom technical support teams with a fast and accurate search over the solution base for previously registered cases and solutions from other technical texts. A method of populating an existing ontology has been proposed, using text segmentation and scoring, to serve the use case of Telecom Hardware remote user assistance. The authors in [24] propose a two-step method for spare part demand forecasting that predicts the number of repairs and the number of parts needed for a repair. Our work also involves the processing of a historical case base, but is not focused on spare part demand forecasting for general planning. It rather considers individual maintenance cases and addresses a lower level of granularity.

   Processing of Technical Documents. Studies apply NLP as a tool for extracting knowledge from natural texts in industrial log mining [25, 22, 26],
mining technical documentation [27], and classification of system failures and preventive maintenance [28, 23].

   The study [22] applies an NLP approach to maintenance data concerning a part of the Swedish railway system and identifies frequent failure cases on the railways. Text mining and NLP techniques are applied in [23] to analyze and classify construction site accidents using data from the Occupational Safety and Health Administration. In this setting, an ensemble method was used to obtain the Tfidf matrix, and a sequential quadratic programming method was used to assign weights to 5 classifiers.

   The work [29] focuses on building Machine Learning (ML) models to estimate the future duration of maintenance activities by identifying problem, solution and item features via text mining for pre-processing, followed by neural networks and decision trees for prediction. In [30], NLP is used to mine electronic documents composed of free-form text in order to extract terms of interest and the hierarchy of their contexts, and to form a set of normalized terms, including multi-word terms, for further data analysis.

   Therefore, the problems addressed in the maintenance services application domain are diverse in nature. However, to the best of our knowledge, the current paper is the first attempt to use entity retrieval techniques for spare part management.

6.3. Use of NLP and Topic Modelling in IR Systems

The effectiveness of IR systems can be improved by topic modelling that mines term associations in a collection of documents. Topic modelling can be integrated into IR tasks to smooth the document model with a document term prior estimated using term distributions over topics [31]. The work [32] explores the possibilities of modelling term associations as a way of integrating related terms into document models and proposes a model of probabilistic term association using the joint probability of terms. A combination of term indexing and topic modelling approaches is proposed in [33]. In the proposed model, every query term in a document is weighted using the LDA algorithm and IR indexing methods, and the best experimental results were obtained with the LDA-BM25 version. In our paper, by contrast, the similarity is computed using a vector space model and the retrieval results are combined using topic relations mined from a historical case base. Therefore, topic modelling is used as a clustering or grouping method on top of an ER system.

   In a number of research works, a combination of topic modelling and IR is applied to short texts [34]. For instance, the paper [35] describes a method that first pools similar tweets using an IR approach, merges relevant short texts into a larger document and trains an LDA model on the concatenated documents, thus obtaining richer topics. By contrast, our method addresses a domain-specific collection of short texts written in a so-called telegraphic style, with spelling mistakes and domain-related abbreviations.

   Search Results Clustering. To date, several studies have investigated document and language models based on topics and clusters. The work [36] explored cluster-based retrieval of documents, a mechanism that returns a relevant cluster of documents, and proposed two language models for ranking the clusters of documents and smoothing the documents using clusters. By contrast, some works cluster search results using traditional ML, graph-based and rank-based clustering techniques [6, 37]. For instance, the Lingo algorithm [38] focuses on learning phrases to represent clusters in a human-readable way; it then discovers topics using Tfidf weighting, performs term-document matrix reduction with SVD and matches the extracted phrases with topics. In comparison to these approaches, our work aims at retrieving entities rather than documents, and the user can explore all the retrieved parts within all the clusters instead of only one cluster.

7. Conclusion

In this work, we explored a way of improving a spare part retrieval system for remote diagnostics and maintenance of medical equipment by applying topic modelling to search results. The topic modelling component was used to cluster the results of a baseline retrieval system and improve the relevance of the search results. We aimed to support the decision-making process of maintenance service teams that search in a historical collection of troubleshooting reports and retrieve parts needed for a new similar issue.

   The experimental dataset was constructed from query-result pairs pointing at the historical case base and the parts used in the cases. We adjusted several IR metrics to evaluate the results of spare part retrieval in the baseline architecture and in the topic modelling component modification. The major enhancement was observed for the metric that estimates the minimum number of top ranked parts that are sufficient for the full treatment of a service case associated with a
performed query.

   A natural progression of this work is to apply online topic learning and to automatically recommend the topic that performs best for a given query. Input from domain experts would help fix the number of topics and the characteristic terms that should appear under one topic. Furthermore, additional domain knowledge could be combined with the entity retrieval system under consideration to suggest actions beyond part replacement, such as troubleshooting tests for remote and on-site diagnostics.

Acknowledgments

The authors would like to acknowledge the gracious support of this work through the local authorities under grant agreement β€œITEA-2018-17030-Daytime”.

References

 [1] G.-F. Liang, J.-T. Lin, S.-L. Hwang, E. M.-y. Wang, P. Patterson, Preventing human errors in aviation maintenance using an on-line maintenance assistance platform, International Journal of Industrial Ergonomics 40 (2010) 356–367. URL: https://linkinghub.elsevier.com/retrieve/pii/S0169814110000028. doi:10.1016/j.ergon.2010.01.001.
 [2] E. Ruschel, E. A. P. Santos, E. d. F. R. Loures, Industrial maintenance decision-making: A systematic literature review, Journal of Manufacturing Systems 45 (2017) 180–194. URL: https://doi.org/10.1016/j.jmsy.2017.09.003. doi:10.1016/j.jmsy.2017.09.003.
 [3] Z. Li, K. Ramani, Ontology-based design information extraction and retrieval, Artificial Intelligence for Engineering Design, Analysis and Manufacturing 21 (2007) 137–154. URL: https://www.cambridge.org/core/product/identifier/S0890060407070199/type/journal_article. doi:10.1017/S0890060407070199.
 [4] K. Ponnalagu, Ontology-driven root-cause analytics for user-reported symptoms in managed IT systems, IBM Journal of Research and Development 61 (2017) 53–61. doi:10.1147/JRD.2016.2629319.
 [5] M. Sharp, T. Sexton, M. P. Brundage, Toward Semi-autonomous Information Extraction for Unstructured Maintenance Data in Root Cause Analysis, in: IFIP Advances in Information and Communication Technology, volume 513, 2017, pp. 425–432. URL: http://link.springer.com/10.1007/978-3-319-66923-6_50. doi:10.1007/978-3-319-66923-6_50.
 [6] H. Toda, R. Kataoka, M. Oku, Search Result Clustering Using Informatively Named Entities, International Journal of Human-Computer Interaction 23 (2007) 3–23. URL: http://www.tandfonline.com/doi/abs/10.1080/10447310701360995. doi:10.1080/10447310701360995.
 [7] S. E. Robertson, S. Walker, K. S. Jones, M. M. Hancock-Beaulieu, Okapi at TREC-3, Proceedings of the Third Text REtrieval Conference (1994).
 [8] C. D. Manning, P. Raghavan, H. SchΓΌtze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, 2008. URL: http://ebooks.cambridge.org/ref/id/CBO9780511809071. doi:10.1017/CBO9780511809071.
 [9] C. D. Manning, H. SchΓΌtze, Foundations of Statistical Natural Language Processing, The MIT Press, 1999. URL: https://nlp.stanford.edu/fsnlp/.
[10] D. M. Blei, A. Y. Ng, M. I. Jordan, Latent Dirichlet Allocation, Journal of Machine Learning Research 3 (2003) 993–1022.
[11] Elasticsearch B.V., Elasticsearch. URL: https://www.elastic.co/.
[12] R. ŘehΕ―Ε™ek, P. Sojka, Software Framework for Topic Modelling with Large Corpora, in: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, ELRA, Valletta, Malta, 2010, pp. 45–50.
[13] M. Honnibal, I. Montani, spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing, 2017. To appear.
[14] R. J. Gallagher, K. Reing, D. Kale, G. Ver Steeg, Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge, Transactions of the Association for Computational Linguistics 5 (2017) 529–542. URL: https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00078. doi:10.1162/tacl_a_00078.
[15] M. RΓΆder, A. Both, A. Hinneburg, Exploring the Space of Topic Coherence Measures, in: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining - WSDM ’15, ACM Press, New York, New York, USA, 2015, pp. 399–408. URL: http://dl.acm.org/citation.cfm?doid=2684822.2685324. doi:10.1145/2684822.2685324.
[16] K. Balog, Entity-Oriented Search, volume 39 of The Information Retrieval Series, Springer International Publishing, Stavanger, Norway, 2018. URL: https://eos-book.org; http://link.springer.com/10.1007/978-3-319-93935-3. doi:10.1007/978-3-319-93935-3.
[17] S. BΓΌttcher, C. L. A. Clarke, G. V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, The MIT Press, 2010.
[18] Z. A. Merrouni, B. Frikh, B. Ouhbi, Toward Contextual Information Retrieval: A Review And Trends, Procedia Computer Science 148 (2019) 191–200. URL: https://linkinghub.elsevier.com/retrieve/pii/S1877050919300365. doi:10.1016/j.procs.2019.01.036.
[19] S. P. Leo Kumar, Knowledge-based expert system in manufacturing planning: state-of-the-art review, International Journal of Production Research 57 (2019) 4766–4790. doi:10.1080/00207543.2018.1424372.
[20] S. Van der Auweraer, R. N. Boute, A. A. Syntetos, Forecasting spare part demand with installed base information: A review, International Journal of Forecasting (2019). doi:10.1016/j.ijforecast.2018.09.002.
[21] A. Kouznetsov, J. B. Laurila, C. J. Baker, B. Shoebottom, Algorithm for Population of Object Property Assertions Derived from Telecom Contact Centre Product Support Documentation, in: 2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications, IEEE, 2011, pp. 41–46. URL: http://ieeexplore.ieee.org/document/5763435/. doi:10.1109/WAINA.2011.135.
[22] C. StenstrΓΆm, M. Aljumaili, A. Parida, Natural language processing of maintenance records data, International Journal of COMADEM 18 (2015) 33–37.
[23] F. Zhang, H. Fleyeh, X. Wang, M. Lu, Construction site accident analysis using text mining and natural language processing techniques, Automation in Construction 99 (2019) 238–248. URL: https://doi.org/10.1016/j.autcon.2018.12.016. doi:10.1016/j.autcon.2018.12.016.
[24] W. Romeijnders, R. Teunter, W. Van Jaarsveld, A two-step method for forecasting spare parts demand using information on component repairs, European Journal of Operational Research (2012). doi:10.1016/j.ejor.2012.01.019.
[25] R. Sipos, D. Fradkin, F. Moerchen, Z. Wang, Log-based predictive maintenance, in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’14, ACM Press, New York, New York, USA, 2014, pp. 1867–1876. URL: http://dl.acm.org/citation.cfm?doid=2623330.2623340. doi:10.1145/2623330.2623340.
[26] S. Agarwal, V. Aggarwal, A. R. Akula, G. B. Dasgupta, G. Sridhara, Automatic problem extraction and analysis from unstructured text in IT tickets, IBM Journal of Research and Development 61 (2017) 41–52. doi:10.1147/JRD.2016.2629318.
[27] K. Richardson, J. Kuhn, Learning semantic correspondences in technical documentation, ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) 1 (2017) 1612–1622. doi:10.18653/v1/P17-1148.
[28] K. Arif-Uz-Zaman, M. E. Cholette, L. Ma, A. Karim, Extracting failure time data from industrial maintenance records using text mining, Advanced Engineering Informatics 33 (2017) 388–396. URL: http://dx.doi.org/10.1016/j.aei.2016.11.004. doi:10.1016/j.aei.2016.11.004.
[29] M. Navinchandran, M. E. Sharp, M. P. Brundage, T. B. Sexton, Studies to predict maintenance time duration and important factors from maintenance workorder data, in: Proceedings of the Annual Conference of the Prognostics and Health Management Society, PHM, 2019. doi:10.36001/phmconf.2019.v11i1.792.
[30] A. Kao, N. B. Niraula, D. I. Whyatt, Text mining a dataset of electronic documents to discover terms of interest, 2020.
[31] L. Azzopardi, M. Girolami, C. van Rijsbergen, Topic based language models for ad hoc information retrieval, in: 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541), volume 4, IEEE, 2004, pp. 3281–3286. URL: http://ieeexplore.ieee.org/document/1381205/. doi:10.1109/IJCNN.2004.1381205.
[32] X. Wei, W. B. Croft, Modeling Term Associations for Ad-Hoc Retrieval Performance Within Language Modeling Framework, in: Advances in Information Retrieval, Springer Berlin Heidelberg, Berlin, Heidelberg, 2007, pp. 52–63. URL: http://link.springer.com/10.1007/978-3-540-71496-5_8. doi:10.1007/978-3-540-71496-5_8.
[33] F. Jian, J. X. Huang, J. Zhao, T. He, P. Hu, A Simple Enhancement for Ad-hoc Information Retrieval via Topic Modelling, in: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval - SIGIR ’16, ACM Press, New York, New York, USA, 2016, pp. 733–736. URL: http://dl.acm.org/citation.cfm?doid=2911451.2914748. doi:10.1145/2911451.2914748.
[34] J. Qiang, Z. Qian, Y. Li, Y. Yuan, X. Wu, Short Text Topic Modeling Techniques, Applications, and Performance: A Survey, IEEE Transactions on Knowledge and Data Engineering 14 (2020) 1–17. URL: https://ieeexplore.ieee.org/document/9086136/. doi:10.1109/TKDE.2020.2992485.
[35] M. Hajjem, C. Latiri, Combining IR and LDA Topic Modeling for Filtering Microblogs, in: Procedia Computer Science, 2017. doi:10.1016/j.procs.2017.08.166.
[36] X. Liu, W. B. Croft, Cluster-based retrieval using language models, in: Proceedings of the 27th annual international conference on Research and development in information retrieval - SIGIR ’04, ACM Press, New York, New York, USA, 2004, pp. 1–8. URL: http://portal.acm.org/citation.cfm?doid=1008992.1009026. doi:10.1145/1008992.1009026.
[37] K. Sadaf, Web Search Result Clustering - A Review, International Journal of Computer Science & Engineering Survey 3 (2012) 85–92. URL: http://www.airccse.org/journal/ijcses/papers/3412ijcses07.pdf. doi:10.5121/ijcses.2012.3407.
[38] S. OsiΕ„ski, J. Stefanowski, D. Weiss, Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition, in: Intelligent Information Processing and Web Mining, Springer Berlin Heidelberg, Berlin, Heidelberg, 2004, pp. 359–368. URL: http://link.springer.com/10.1007/978-3-540-39985-8_37. doi:10.1007/978-3-540-39985-8_37.




A. Topic Modelling Component Performance Evaluation for Grouped Queries

Figure A: Comparison of different metrics computed for LDA and baseline results in a part retrieval task. Queries are divided into groups using the number of retrieved cases, as well as used and retrieved parts. Confidence interval of 95% is shown as a box around LDA values.